TRANSCRIPT
2012 Storage Developer Conference. © Intel. All Rights Reserved.
Programming Models to Enable Persistent Memory
Andy Rudoff, Intel
Agenda
Goals: The Programming Model
Block-oriented Capabilities
Memory-oriented Capabilities
Discovery
Forums for Driving the Ecosystem
Target Audience
Those involved in SW development who…
• Want to use NVM as more than just a disk
• Are interested in emerging NVM features
• Want to understand the NVM Programming Model being discussed by a group of interested companies

Particularly interesting for those involved in:
• File systems
• Applications that need high-performance caches
• Latency-sensitive applications
• Processing of very large data sets

Fairly technical (code examples, SW stack diagrams)
Problem Statement
NVM features and performance are outgrowing the existing storage model
• Sending block Read/Write operations down the traditional SCSI stack is no longer sufficient
  – Well, it is if you’re just using NVM as a traditional disk
  – But not if you want to leverage higher-order NVM operations
• We believe NVM technology is advancing into something less like storage, more like memory
  – Need a programming model that comprehends this
• Need to feed these ideas into a group of like-minded industry leaders in the NVM space
  – Produce an NVM Programming Model for the ecosystem
Emerging APIs
[Figure: three applications, each bound to a different emerging API (API 1, API 2, API 3), sitting atop evolving NVM features and performance.]
Goals: The Programming Model
Encourage a common ecosystem
• Without limiting the ability to innovate

Components using NVM need:
• Common ideas they can depend on
• An evolutionary path
• Flexibility

Programming Model
• A published spec of capabilities and the approach to NVM
• Stops short of defining the API
Programming Model Versus API
OSVs own their kernel APIs
• Cannot define these in a committee and push them on OSVs
• Cannot define one API for multiple OS platforms
  – There are serious differences in how things work in each kernel
  – It goes against independent innovation

The next best thing is to agree on an overall model
• With OSV collaboration
• Then engage each OSV to define and implement the API

A similar situation exists in user space
• A common API doesn’t always make sense
  – It violates the “when in Rome” design principle
  – Example: the UNIX versus the Windows event models

Ultimately: we want OSVs to ship and maintain the API
Building Towards APIs
[Figure: evolving NVM feeds an extensible programming model, which in turn builds toward an API.]

Concepts…
• Sync and Async I/O
• Atomic Updates
• Memory-mapping
• Names
• Permissions
• Backing up Data
Programming Model Targets
[Figure: software stack spanning User, Kernel, and Hardware layers. NVM hardware sits beneath an NVM driver stack. Above it, the Open NVM Kernel API serves optimized kernel modules, the NVM User-Space API serves optimized applications, and the NVM Management API serves management applications (GUI, CLI, CIM); existing applications continue to use existing/unchanged infrastructure. The SNIA NVM Programming TWG defines the model; the Linux* open source project, Microsoft*, and other OSVs implement it.]
Agenda
Goals: The Programming Model
Block-oriented Capabilities
Memory-oriented Capabilities
Discovery
Forums for Driving the Ecosystem
Block I/O
Traditional block stacks haven’t changed much in decades
• A recent example of change for NVM is TRIM
  – The block stack had to be modified to allow this through
  – Typically a way to detect the capability was added too

Emerging NVM features:
• Anticipated usage patterns
• Compressibility
• Discardable protection information
• Submitting “fused” block commands
• I/O barriers
This list must be extensible
Example: Linux* Block Stack
Flag: struct bio, bi_flags
Test macro: blk_queue_discard()
Use: blkdev_issue_discard()

[Figure: the same NVM software stack diagram as earlier, with the NVM driver stack highlighted.]
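The kernel macros above (blk_queue_discard(), blkdev_issue_discard()) can only run inside the kernel, but the same detect-then-use pattern is visible from user space through the real Linux BLKDISCARD ioctl. The sketch below is an illustrative analogue, not code from the talk: it attempts a discard and falls back cleanly when the target does not support the capability.

```c
/* User-space analogue of the kernel's detect-then-use pattern for TRIM:
 * try the Linux BLKDISCARD ioctl and fall back when unsupported. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>          /* BLKDISCARD */

/* Returns 1 if the range was discarded, 0 if the device/file does not
 * support discard (caller falls back to ordinary writes), -1 on error. */
int try_discard(int fd, uint64_t offset, uint64_t len)
{
    uint64_t range[2] = { offset, len };

    if (ioctl(fd, BLKDISCARD, &range) == 0)
        return 1;               /* capability present and used */
    if (errno == ENOTTY || errno == EOPNOTSUPP)
        return 0;               /* capability absent */
    return -1;                  /* real failure */
}
```

On a regular file the ioctl fails with ENOTTY, so the helper reports "unsupported" rather than erroring out; in-kernel code instead checks blk_queue_discard() up front to avoid submitting the request at all.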
An NVM User-mode Programming Model
Operation: Purpose
NVM Create: Create a “channel” to NVM
NVM Destroy: Destroy the channel
NVM Submit: Submit batches of async I/O (a wider interface than typical Read/Write)
NVM Reap: Check/block for completion
NVM Handler: Set up a handler function
NVM Capability: Get device characteristics
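The operation names in the table come from the talk, but no signatures are given, so everything below is invented for illustration: the “channel” is stubbed as a plain file descriptor and nvm_submit executes each operation synchronously with pread(2)/pwrite(2), where a real implementation would queue a truly asynchronous batch.

```c
/* Sketch of the user-mode model above; all signatures are assumptions. */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

enum nvm_op { NVM_OP_READ, NVM_OP_WRITE };

struct nvm_io {
    enum nvm_op op;
    void       *buf;
    size_t      len;
    off_t       offset;
    ssize_t     result;          /* filled in at completion */
};

typedef struct { int fd; int completed; } nvm_channel;

/* "NVM Create": open a channel to a region (here, just a file). */
int nvm_create(nvm_channel *ch, const char *path)
{
    ch->fd = open(path, O_RDWR | O_CREAT, 0600);
    ch->completed = 0;
    return ch->fd < 0 ? -1 : 0;
}

/* "NVM Submit": submit a batch; stubbed to run synchronously, in order. */
int nvm_submit(nvm_channel *ch, struct nvm_io *ios, int n)
{
    for (int i = 0; i < n; i++) {
        ios[i].result = (ios[i].op == NVM_OP_WRITE)
            ? pwrite(ch->fd, ios[i].buf, ios[i].len, ios[i].offset)
            : pread(ch->fd, ios[i].buf, ios[i].len, ios[i].offset);
        ch->completed++;
    }
    return 0;
}

/* "NVM Reap": return the number of completions accumulated so far. */
int nvm_reap(nvm_channel *ch)
{
    int n = ch->completed;
    ch->completed = 0;
    return n;
}

/* "NVM Destroy": tear the channel down. */
void nvm_destroy(nvm_channel *ch) { close(ch->fd); }
```

Batching is the point of the wider interface: one submission can carry many operations, and completions are harvested separately with the reap call or delivered to a handler.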
Agenda
Goals: The Programming Model
Block-oriented Capabilities
Memory-oriented Capabilities
Discovery
Forums for Driving the Ecosystem
Application Memory Allocation
[Figure: an application in user space calls ptr = malloc(); memory management in kernel space backs the allocation with RAM.]
• Well-worn interface, around for decades
• Memory is gone when the application exits
  – Or the machine goes down
Application NVM Allocation
[Figure: an application in user space calls ptr = pm_malloc(); memory management in kernel space backs the allocation with NVM.]
• Simple, familiar interface, but then what?
  – Persistent, so apps want to “attach” to regions
  – Need to manage permissions for regions
  – Need to resize, remove, …, back up the data
NVM Accessed as System Memory
Emerging NVM technologies enable this model
• Is there a “malloc()” for Persistent Memory?
• How is it named and managed?
• What’s the permission model?
• The issues start to line up with the file I/O model

Examples:
• Open a blob of NVM by name
• Map a blob of NVM
• Sync a blob of NVM
Example: Linux* Memory Mapping
/* volatile allocation … */
ptr = malloc(len);

/* non-volatile allocation … */
fd = open("/my/persistent/blob", …);
ptr = mmap(…, len, fd, …);
… use *ptr …
msync(ptr, len, …);
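The fragment above, filled out into a complete, runnable sketch. The path and function name are illustrative (on a real PM-aware stack the blob would live on NVM): create and size the blob, map it, store through the pointer, and msync() to push the data to the media.

```c
/* "Non-volatile allocation" via the file APIs: create, size, map, store,
 * sync. Returns 0 on success, -1 on failure. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int pm_blob_demo(const char *path)
{
    size_t len = 4096;

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) {          /* size the blob */
        close(fd);
        return -1;
    }

    char *ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ptr == MAP_FAILED) {
        close(fd);
        return -1;
    }

    strcpy(ptr, "survives process exit");          /* … use *ptr … */

    if (msync(ptr, len, MS_SYNC) < 0)              /* flush to media */
        return -1;
    munmap(ptr, len);
    close(fd);
    return 0;
}
```

Unlike malloc(), the data is still there on the next open/mmap of the same name, which is exactly the property the next slide's file-semantics argument builds on.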
Memory-mapping NVM like Files
No new “naming” mechanism
• The file namespace is well-understood
No new permission model
• Mature permissions and access lists are in place
A clear administrative model
• Create, delete, rename, etc.
Off-the-shelf back-up tools work!
Example: Linux* Kernel Memory Mapping
All the reasons given above also apply to in-kernel access to NVM
Pushing these ideas with individual OSVs, starting with a public discussion in the Linux community: http://thread.gmane.org/gmane.linux.kernel/1297432
Linux* Kernel Proposal: nvm_map
Linux* Kernel Proposal: nvm_protect
Linux* Kernel Proposal: nvm_sync
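The transcript does not preserve the code from the three proposal slides above, but the names suggest direct analogues of mmap/mprotect/msync. The user-space stand-ins below are assumptions sketched over a file, intended only to show the shape such an interface might take; the real proposal is an in-kernel interface.

```c
/* Assumed shape of the proposed calls, sketched as user-space stand-ins
 * over the existing mmap/mprotect/msync file APIs. */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* nvm_map: attach to a named NVM region of the given size. */
void *nvm_map(const char *name, size_t len)
{
    int fd = open(name, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping outlives the fd */
    return p == MAP_FAILED ? NULL : p;
}

/* nvm_protect: change access to a mapped region (e.g. PROT_READ). */
int nvm_protect(void *addr, size_t len, int prot)
{
    return mprotect(addr, len, prot);
}

/* nvm_sync: make stores to the region durable. */
int nvm_sync(void *addr, size_t len)
{
    return msync(addr, len, MS_SYNC);
}
```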
A Proposed Programming Model Covering All Three Paths
[Figure: proposed programming model covering all three paths, spanning User, Kernel, and Hardware layers above NVM hardware. Path 1, standard access: applications use file access through a file system and block layer to the NVM driver. Path 2, NVM regions exposed as files: optimized applications use file access through a naming layer via the NVM user-space API. Path 3, raw access: middleware (e.g. a JVM) uses the NVM API directly. Customer kernel modules use file or raw access through the Open NVM Kernel API; management applications (GUI, CLI, CIM) use the NVM Management API. Data paths and control paths are distinguished.]
The Full User-Space Programming Model (Linux* Example)
[Figure: four user-space access methods shown side by side.
A. “SSD mode”: open a disk device (or build a file system and open a file), then read/write (standard APIs)
B. I/O to app buffers: open, then read/write (standard APIs)
C. Memory mapped: open, mmap, load/store, msync (standard APIs)
D. New async I/O API: nvm_create, nvm_submit, with completions delivered to a handler set up via nvm_handler]
Agenda
Goals: The Programming Model
Block-oriented Capabilities
Memory-oriented Capabilities
Discovery
Forums for Driving the Ecosystem
Discovery
Perhaps the most important idea here
• Components can discover hardware capabilities
• Examples:
  – Capacity
  – Performance
  – Write-atomicity
  – New funky feature X
• Must be extensible, for an evolving world
Reacting to Discovery
[Figure: an optimized application in user space calls nvm_capability(…) through the NVM user-space API, which maps to an OS-specific capability call into the NVM driver stack via the Open NVM Kernel API. Applications react through use flags and weak binding.]
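nvm_capability() is the name used in the talk; its signature, the table-backed stub, and the numeric values below are all invented so the pattern can run anywhere. The point is “weak binding”: the application queries at run time and adapts, with a conservative default when a capability is unknown, instead of compiling in assumptions about the device.

```c
/* Weak binding to discovered capabilities (stubbed device). */
#include <stdint.h>

enum { NVM_CAP_MAXQUEUE, NVM_CAP_MINIO, NVM_CAP_WRITE_ATOMICITY };

/* A real implementation would ask the driver; this fakes one device. */
static const int64_t fake_caps[] = {
    [NVM_CAP_MAXQUEUE]        = 128,
    [NVM_CAP_MINIO]           = 512,
    [NVM_CAP_WRITE_ATOMICITY] = 4096,
};

/* Returns the capability value, or -1 if the device doesn't report it. */
int64_t nvm_capability(int channel, int cap)
{
    (void)channel;
    if (cap < 0 || cap >= (int)(sizeof fake_caps / sizeof fake_caps[0]))
        return -1;
    return fake_caps[cap];
}

/* Adapt to what the device offers instead of assuming a fixed depth. */
int choose_queue_depth(int channel, int desired)
{
    int64_t max = nvm_capability(channel, NVM_CAP_MAXQUEUE);
    if (max <= 0)
        return 1;                 /* unknown capability: be conservative */
    return desired < max ? desired : (int)max;
}
```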
Discoverable Capabilities
Capability: Meaning
NVM_CAP_MAXQUEUE: The maximum I/O submission queue depth for the channel is returned. Applications can use this to tune the number of in-flight I/O operations they expect to have outstanding at any given time.
NVM_CAP_MINIO: Minimum allowed I/O size, in bytes. Submitting an I/O smaller than this size will result in an error (EINVAL).
NVM_CAP_MAXIO: Maximum allowed I/O size, in bytes. Submitting an I/O larger than this size will result in an error (EINVAL).
NVM_CAP_ALIGNIO: Required I/O alignment, in bytes. Submitting an I/O for an offset into the underlying NVM that is not aligned to this size will result in an error (EINVAL).
NVM_CAP_GRANIO: Granularity of I/O size, in bytes. Submitting an I/O for a size that is not a multiple of this will result in an error (EINVAL).
NVM_CAP_DIF: True if the device supports DIF. When true, I/O operations submitted for this channel are expected to include an additional 8 bytes of DIF data.
NVM_CAP_METADATASIZE: Non-zero if the device supports metadata bytes (e.g. DIX) associated with I/O operations.
Discoverable Capabilities (cont)
Capability: Meaning
NVM_CAP_SIZE: The total size of the region in bytes is returned. The last byte in the region that the application can access is the returned value minus 1. The last block that the application can do I/O to begins at the returned value minus 1, rounded down to the block size.
NVM_CAP_CAPACITY: The maximum number of logical blocks that may be allocated in the region at one time is returned. If 0 is returned, the region is not available for use.
NVM_CAP_UTILIZATION: The current number of blocks allocated in the region is returned.
NVM_CAP_PROVISION: Returns 1 if thin provisioning is supported, 0 if it is not. If thin provisioning is supported and NVM_CAP_SIZE and NVM_CAP_CAPACITY are not equal, the region is sparse.
NVM_CAP_WRITE_ATOMICITY: The atomicity of a single write with respect to power failure. Writes of this size or smaller cannot be “torn” by a system crash or loss of power: either the entire write takes place or none of it does.
NVM_CAP_BUFALIGN: Required alignment in bytes of the data buffer. Submitting an I/O with a buffer not aligned to this value will result in an error (EINVAL).
NVM_CAP_METADATA_ALIGN: Required alignment in bytes of the metadata buffer. Submitting an I/O with a metadata buffer not aligned to this value will result in an error (EINVAL).
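The size and alignment capabilities above specify exactly when a submission must fail with EINVAL. The checker below applies those rules verbatim; the caps struct is an invented convenience for holding the discovered values, and the numbers used in any test are illustrative.

```c
/* Validate an I/O against the discovered size/alignment capabilities.
 * The rules come from the capability table; the struct is a convenience. */
#include <errno.h>
#include <stdint.h>

struct nvm_io_caps {
    uint64_t minio;    /* NVM_CAP_MINIO   */
    uint64_t maxio;    /* NVM_CAP_MAXIO   */
    uint64_t alignio;  /* NVM_CAP_ALIGNIO */
    uint64_t granio;   /* NVM_CAP_GRANIO  */
};

/* Returns 0 if the I/O is acceptable, EINVAL otherwise. */
int nvm_check_io(const struct nvm_io_caps *c, uint64_t offset, uint64_t len)
{
    if (len < c->minio || len > c->maxio)
        return EINVAL;            /* outside [MINIO, MAXIO] */
    if (offset % c->alignio != 0)
        return EINVAL;            /* offset not aligned to ALIGNIO */
    if (len % c->granio != 0)
        return EINVAL;            /* size not a multiple of GRANIO */
    return 0;
}
```

A capability-aware application would run a check like this (or let the library do it) before submission, rather than discovering EINVAL one failed I/O at a time.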
Agenda
Goals: The Programming Model
Block-oriented Capabilities
Memory-oriented Capabilities
Discovery
Forums for Driving the Ecosystem
Forums
Standards versus community
• Linux* issues
• Microsoft* issues
• Everyone else

A multi-pronged strategy
• Begin with the programming model
• Input from:
  – NVM vendors
  – OS vendors
  – Software vendors
• Forge ahead with OSVs on APIs
SNIA: NVM Programming TWG Status
• Founding members
  − Dell*, EMC*, Fujitsu*, HP*, IBM*, Intel, NetApp*, Oracle*, QLogic*, Symantec*
• Charter: Develop specifications for new software “programming models” as NVM becomes a standard feature of platforms
  − Scope:
    In-kernel NVM programming models
    Kernel-to-application programming models
  − Programming models specify the exact technical behavior, up to (but not including) the OS-specific API semantics
• APIs
  − Each OSV maps the programming models to its own OS-specific APIs
  − A Linux open source project is underway to provide the Linux implementation of this effort
  − Example: the SNIA NVM Programming TWG plus the Linux open source project provides the full solution for Linux
Description of the NVM Programming TWG
The NVM Programming TWG is specifying OS extensions to support NVM hardware. The OS extensions enable applications and OS components to access and use features of NVM accessed as a disk, as well as emerging NVM products accessed as memory. The functionality described in the TWG’s specifications enables OSes to support development of NVM-aware SW across a broad range of NVM hardware. The specifications describe the relationships of the OS extensions to underlying standards (such as those from T10 and NVM Express) as well as functionality common across NVM vendors. Participants in this group include architects and developers of NVM-aware OS components and applications.
TWG Activities in the Software Stack
Timeline
[Timeline: Specs for NVM Extensions between OS Components: first revision September ’12 to Q4 ’12, 2nd revision Q2 ’13 to Q3 ’13. Specs for NVM Application Extensions: first revision Q4 ’12 to Q2 ’13, 2nd revision Q3 ’13 to Q4 ’13.]
Summary
A common NVM programming model helps the ecosystem
• Avoids lock-in on a single API
• Enables feature discovery

The memory model is key
• Leverage well-understood file semantics

Call to action:
• Start architecting SW for emerging NVM technologies (especially Persistent Memory)
• Join these organizations to participate in this movement
  – http://www.snia.org/
  – Linux kernel mailing lists
Open Questions and Work in Progress…
How do SW features from the storage stack translate to Persistent Memory?
• RAID
• Remote replication
• Deduplication
• Running-host serviceability

What implicit assumptions have ISVs made due to the block-based nature of storage for so many decades?
• New algorithms to be designed
• Operating at a new scale