pNFS/RDMA: Possibilities
Chuck Lever
Oracle Corporation
2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.
The opinions expressed in this presentation are the presenter’s own, and do not represent the
views of Oracle or anyone else.
What If . . . ?
❒ Given these storage trends:
  ❒ Throughput of networks is increasing
  ❒ Latency of persistent storage is dropping exponentially
  ❒ Capacity is off the charts
❒ How can NFS make good use of our new Persistent Memory overlords?
Traditional NFS
Traditional NFS Operation
❒ Each NFS file resides on one server
❒ Applications locate files via a POSIX directory structure
❒ Clients access data via NFS READ and WRITE operations
Traditional NFS Server Storage Topology
[Figure: topology diagram. Labels: NFS clients, Ethernet, NFS server, XFS, SAN]
Traditional NFS Weaknesses
❒ One RPC issued at a time per TCP socket
❒ Typically one or a few TCP sockets are shared across a server’s shares
❒ Data throughput is constrained by the server
Traditional NFS FILE_SYNC WRITE
[Figure: sequence diagram between NFS Client and NFS Server. Labels: Application writes; TCP send (repeated); Server updates durable storage; Write is complete]
Two-phase Commit
❒ To avoid waiting for durable storage on every WRITE, NFSv3 introduced unstable WRITE plus COMMIT
  ❒ Client flushes data to server asynchronously
  ❒ Client sends COMMIT
  ❒ Server makes written data durable
❒ Transport bottlenecks remained
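The unstable WRITE plus COMMIT sequence can be sketched with a toy in-memory model. This is only an illustration: `ToyServer`, `flush_file`, and the dictionary-based storage are hypothetical names invented here, not part of any NFS implementation. The sketch also assumes the NFSv3 write verifier, which lets a client notice that a server restarted and lost uncommitted data.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Toy stand-in for an NFS server (hypothetical, for illustration only)."""
    verifier: int = 1                            # changes when cached writes are lost
    cache: dict = field(default_factory=dict)    # unstable (volatile) data
    durable: dict = field(default_factory=dict)  # data on durable storage

    def write(self, offset: int, data: bytes, stable: bool) -> int:
        if stable:
            self.durable[offset] = data          # FILE_SYNC style: durable before reply
        else:
            self.cache[offset] = data            # UNSTABLE: reply before durability
        return self.verifier

    def commit(self) -> int:
        self.durable.update(self.cache)          # make all cached writes durable
        self.cache.clear()
        return self.verifier

    def reboot(self) -> None:
        self.cache.clear()                       # uncommitted data is gone
        self.verifier += 1


def flush_file(server: ToyServer, chunks: dict) -> None:
    """Send unstable WRITEs, then COMMIT; resend everything if the verifier changed."""
    if not chunks:
        return
    while True:
        seen = {server.write(off, data, stable=False) for off, data in chunks.items()}
        if seen == {server.commit()}:
            return
```

In this model the client pays for durable storage once per COMMIT rather than once per WRITE, which is the saving the slide describes.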
What Is pNFS?
Data / Metadata Separation
❒ NFS protocol manages metadata
  ❒ Directory structure
  ❒ File open and lock state
  ❒ File data layout information
  ❒ Fall-back I/O mechanism
❒ Separate protocol and transports handle I/O
pNFS Layout Types
❒ A layout type:
  ❒ Specifies which transport protocol to use
  ❒ Specifies how to locate file data
  ❒ Is specified separately from the NFS protocol
❒ A layout instance tells where a file’s data resides
  ❒ Which NFS server and file, or
  ❒ Which SCSI LUN at which LBA
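One way to picture the difference between a layout type and a layout instance is as a pair of record types. The sketch below is an illustration only: the class names, field names, and `describe_io_path` helper are hypothetical and do not reflect the on-the-wire layout formats defined by the pNFS specifications.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class FileLayoutSegment:
    """Layout instance for a file-style layout: which NFS server and file."""
    data_server: str
    file_handle: bytes


@dataclass(frozen=True)
class BlockLayoutSegment:
    """Layout instance for a block-style layout: which SCSI LUN at which LBA."""
    lun: str
    lba: int


LayoutSegment = Union[FileLayoutSegment, BlockLayoutSegment]


def describe_io_path(segment: LayoutSegment) -> str:
    """The layout type (the class) selects the transport; the instance says where."""
    if isinstance(segment, FileLayoutSegment):
        return f"NFS I/O to {segment.data_server}"
    if isinstance(segment, BlockLayoutSegment):
        return f"SCSI I/O to LUN {segment.lun} at LBA {segment.lba}"
    raise TypeError(f"unknown layout type: {type(segment).__name__}")
```

Here each class plays the role of a layout type, and each object of that class plays the role of a layout instance handed to a client.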
Parallel NFS In A Nutshell
❒ Applications retain single-server view of files
❒ NFS server manages data layout
❒ Each NFS client can stripe file I/O across multiple storage services
❒ Data and metadata operations run concurrently
❒ Clients and servers share a storage fabric
  ❒ SCSI, iSCSI, iSER, SRP
  ❒ Object-based storage
  ❒ NFS
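Striping file I/O across storage services comes down to mapping a file offset to a device and an offset on that device. The function below is a minimal sketch that assumes simple round-robin striping with a fixed stripe unit; the name `locate_stripe` and its parameters are hypothetical, and real pNFS layouts carry more detail than this.

```python
from typing import Tuple


def locate_stripe(offset: int, stripe_unit: int, num_devices: int) -> Tuple[int, int]:
    """Map a file offset to (device index, offset on that device).

    Assumes round-robin striping: stripe unit 0 goes to device 0,
    unit 1 to device 1, and so on, wrapping around after the last device.
    """
    if offset < 0 or stripe_unit <= 0 or num_devices <= 0:
        raise ValueError("offset must be >= 0; stripe_unit and num_devices must be > 0")
    unit_number = offset // stripe_unit            # which stripe unit of the file
    device = unit_number % num_devices             # which device holds that unit
    units_before = unit_number // num_devices      # earlier units stored on that device
    device_offset = units_before * stripe_unit + offset % stripe_unit
    return device, device_offset
```

With a stripe unit of 4 bytes over 3 devices, byte 13 of the file falls in stripe unit 3, which wraps back to device 0 at device offset 5.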
pNFS Server Storage Topology
[Figure: topology diagram. Labels: NFS clients, Ethernet, NFS server, XFS, SCSI, SAN]
Example Usage Scenarios
❒ High Performance Computing
  ❒ Parallel I/O
  ❒ Greater file capacity
❒ Deployments where storage clients and servers share a storage fabric
  ❒ Each client can be directed to a particular server
  ❒ Each file can be placed on a particular server
What Is NFS/RDMA?
What Is Remote Direct Memory Access?
❒ I/O-like access to the physical memory on another host
  ❒ Strong ordering of operations
  ❒ Asynchronous: completion fires when an operation finishes
  ❒ Datagram channel: SEND and RECV
  ❒ Data transfer: READ and WRITE
RDMA Ready For 100Gbps Fabrics
❒ Zero-copy is possible on both send and receive
  ❒ No CPU cache footprint until app accesses data
❒ Transport resources are pre-allocated
  ❒ No resource allocation in data path
  ❒ Reduced opportunity for deadlock
❒ Data transfer is concurrent with other transport operations
NFS/RDMA Concepts
❒ Each RPC is conveyed by RDMA operations
  ❒ Ultra-low round-trip latency
❒ RNICs handle bulk data transfer
  ❒ Low CPU overhead
  ❒ High bandwidth
Data / Metadata Separation
❒ Non-I/O operations are conveyed via RDMA SEND
  ❒ GETATTR, LOOKUP, and so on
❒ Data operations (i.e. NFS READ and WRITE) utilize RDMA READ and WRITE
  ❒ Server initiates all RDMA transfers
  ❒ After that, neither host CPU is involved
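The split between SEND for non-I/O operations and server-initiated RDMA READ and WRITE for bulk data can be summarized as a small lookup. This is a simplified sketch: the `rdma_steps` function and its tuple format are hypothetical, and it ignores details such as small payloads that a real transport may send inline.

```python
from typing import List, Tuple

# Bulk-data step the server initiates for each NFS data operation:
# for NFS WRITE the server pulls the payload from client memory (RDMA READ);
# for NFS READ the server pushes the payload into client memory (RDMA WRITE).
BULK_TRANSFER = {"READ": "RDMA WRITE", "WRITE": "RDMA READ"}


def rdma_steps(nfs_op: str) -> List[Tuple[str, str]]:
    """Return (initiator, RDMA operation) pairs for one NFS operation."""
    steps = [("client", "RDMA SEND")]        # the RPC call
    bulk = BULK_TRANSFER.get(nfs_op)
    if bulk is not None:
        steps.append(("server", bulk))       # server moves the data payload
    steps.append(("server", "RDMA SEND"))    # the RPC reply
    return steps
```

For example, `rdma_steps("GETATTR")` contains only the two SEND steps, while `rdma_steps("WRITE")` adds a server-initiated RDMA READ between them.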
NFS/RDMA FILE_SYNC WRITE
[Figure: sequence diagram between NFS Client and NFS Server. Labels: Application writes; RDMA SEND; RDMA READ and READ result (repeated); Server updates durable storage; RDMA SEND; Write is complete]
Example Usage Scenarios
❒ Use NFS/RDMA instead of NFS/TCP on IPoIB
  ❒ See “RDMA On 100Gbps Fabrics”
❒ Latency-sensitive SLAs
❒ CPU-intensive client workloads
❒ One-time bulk-data movement (e.g. backup)
pNFS and NFS/RDMA
Why pNFS/RDMA?
❒ Client gets direct access to durable storage
  ❒ E.g. ultra-low latency Persistent Memory
❒ No protocol translation overhead
  ❒ Data not even read into server DRAM
Why pNFS/RDMA?
❒ Multiple transport connections per client mount point
  ❒ Multiple QPs
  ❒ Multiple RNICs
Why pNFS/RDMA?
❒ Single converged fabric shared between pNFS clients and servers
  ❒ Rather than “pNFS/TCP with SCSI”
  ❒ Instead use “pNFS/RDMA with SRP”
pNFS/RDMA Server Storage Topology
[Figure: topology diagram. Labels: NFS clients, RDMA Fabric, NFS server, XFS]
Next Steps
What’s Needed For NFS/RDMA
❒ NFSv4.1 on RDMA is a prerequisite
  ❒ Bi-directional RPC-over-RDMA
  ❒ Lots of backchannel session slots
  ❒ NFSv4.1 Upper Layer Binding to RPC-over-RDMA
What’s Needed For pNFS
❒ A new pNFS layout type is not required for operation with SRP or iSER
❒ Proposal: a new pNFS layout type for accessing remote Persistent Memory devices directly
  ❒ Device naming
  ❒ Ensuring data durability
  ❒ Error handling and fencing
  ❒ Authentication, data privacy
Questions / Discussion
Appendix
NFS Reference Material
❒ pNFS Standards
  ❒ NFSv4.1: RFC 5661
  ❒ pNFS layouts: RFCs 5662 - 5665
❒ NFS/RDMA Standards
  ❒ RPC-over-RDMA: RFC 5666
  ❒ NFS/RDMA ULB: RFC 5667