pnfs over rdma - possibilities - snia · nfs server manages data layout each nfs client can stripe...

33
pNFS/RDMA: Possibilities Chuck Lever Oracle Corporation

Upload: others

Post on 03-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. © Insert Your Company Name. All Rights Reserved.

pNFS/RDMA: Possibilities

Chuck LeverOracle Corporation

Page 2: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

The opinions expressed in this presentation are the presenter’s own, and do not represent the

views of Oracle or anyone else.

Page 3: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ Given these storage trends: ❒ Throughput of networks is increasing ❒ Latency of persistent storage is dropping

exponentially ❒ Capacity is off the charts

❒ How can NFS make good use of our new Persistent Memory overlords?

What If . . . ?

3

Page 4: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

Traditional NFS

Page 5: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ Each NFS file resides on one server

❒ Applications locate files via a POSIX directory structure

❒ Clients access data via NFS READ and WRITE operations

Traditional NFS Operation

5

Page 6: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

Traditional NFS Server Storage Topology

6

SAN

Ethernet

NFS server

NFS clients

XFS

Page 7: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ One RPC issued at a time per TCP socket

❒ Typically one or a few TCP sockets are shared across a server’s shares

❒ Data throughput is constrained by the server

Traditional NFS Weaknesses

7

Page 8: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

Traditional NFS FILE_SYNC WRITE

8

NFS Client NFS ServerTCP send

TCP send

Server updates durable storage

Application writes

Write is complete

TCP sendTCP send

TCP send

. . .

Page 9: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ To avoid waiting for durable storage on every WRITE, NFSv3 introduced unstable WRITE plus COMMIT ❒ Client flushes data to server asynchronously ❒ Client sends COMMIT ❒ Server makes written data durable

❒ Transport bottlenecks remained

Two-phase Commit

9

Page 10: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

What Is pNFS?

Page 11: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ NFS protocol manages metadata ❒ Directory structure ❒ File open and lock state ❒ File data layout information ❒ Fall-back I/O mechanism

❒ Separate protocol and transports handle I/O

Data / Metadata Separation

11

Page 12: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ A layout type: ❒ Specifies which transport protocol to use ❒ How to locate file data ❒ Specified separately from NFS protocol

❒ A layout instance tells where a file’s data resides ❒ Which NFS server and file, or ❒ Which SCSI LUN at which LBA

pNFS Layout Types

12

Page 13: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ Applications retain single-server view of files ❒ NFS server manages data layout ❒ Each NFS client can stripe file I/O across multiple

storage services ❒ Data and metadata operations run concurrently ❒ Clients and servers share a storage fabric

❒ SCSI, iSCSI, iSER, SRP ❒ Object-based storage ❒ NFS

Parallel NFS In A Nutshell

13

Page 14: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

pNFS Server Storage Topology

14

SAN

Ethernet

NFS server

NFS clients

XFSSCSI

Page 15: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ High Performance Computing ❒ Parallel I/O ❒ Greater file capacity

❒ Deployments where storage clients and servers share a storage fabric ❒ Each client can be directed to a particular

server ❒ Each file can be placed on a particular server

Example Usage Scenarios

15

Page 16: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

What Is NFS/RDMA?

Page 17: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ I/O-like access of the physical memory on another host ❒ Strong ordering of operations ❒ Asynchronous: completion fires when an

operation finishes ❒ Datagram channel: SEND and RECV ❒ Data transfer: READ and WRITE

What Is Remote Direct Memory Access?

17

Page 18: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ Zero-copy is possible on both send and receive ❒ No CPU cache footprint until app accesses

data ❒ Transport resources are pre-allocated

❒ No resource allocation in data path ❒ Reduced opportunity for deadlock

❒ Data transfer is concurrent with other transport operations

RDMA Ready For 100Gbps Fabrics

18

Page 19: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ Each RPC is conveyed by RDMA operations ❒ Ultra-low round-trip latency

❒ RNICs handle bulk data transfer ❒ Low CPU overhead ❒ High bandwidth

NFS/RDMA Concepts

19

Page 20: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ Non-I/O operations conveyed via RDMA SEND ❒ GETATTR, LOOKUP, and so on

❒ Data operations (i.e. NFS READ and WRITE) utilize RDMA READ and WRITE ❒ Server initiates all RDMA transfer ❒ After that, neither host CPU is involved

Data / Metadata Separation

20

Page 21: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

NFS/RDMA FILE_SYNC WRITE

21

NFS Client NFS ServerRDMA SEND

RDMA READREAD result

RDMA SEND

RDMA READREAD result

Server updates durable storage

Application writes

Write is complete

Page 22: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ Use NFS/RDMA instead of NFS/TCP on IPoIB ❒ See “RDMA On 100Gbps Fabrics”

❒ Latency-sensitive SLAs

❒ CPU-intensive client workloads

❒ One-time bulk-data movement (e.g. backup)

Example Usage Scenarios

22

Page 23: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

pNFS and NFS/RDMA

Page 24: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ Client gets direct access to durable storage

❒ E.g. ultra-low latency Persistent Memory

❒ No protocol translation overhead

❒ Data not even read into server DRAM

Why pNFS/RDMA?

24

Page 25: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ Multiple transport connections per client mount point

❒ Multiple QPs

❒ Multiple RNICs

Why pNFS/RDMA?

25

Page 26: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ Single converged fabric shared between pNFS clients and servers

❒ Rather than “pNFS/TCP with SCSI”

❒ Instead use “pNFS/RDMA with SRP”

Why pNFS/RDMA?

26

Page 27: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

pNFS/RDMA Server Storage Topology

27

RDMA Fabric

NFS server

NFS clients

XFS

Page 28: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

Next Steps

Page 29: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ NFSv4.1 on RDMA is a pre-requisite

❒ Bi-directional RPC-over-RDMA ❒ Lots of backchannel session slots ❒ NFSv4.1 Upper Layer Binding to RPC-over-

RDMA

What’s Needed For NFS/RDMA

29

Page 30: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ A new pNFS layout type is not required for operation with SRP or iSER

❒ Proposal: a new pNFS layout type for accessing remote Persistent Memory devices directly ❒ Device naming ❒ Ensuring data durability ❒ Error handling and fencing ❒ Authentication, data privacy

What’s Needed For pNFS

30

Page 31: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

Questions / Discussion

Page 32: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

Appendix

Page 33: pNFS over RDMA - Possibilities - SNIA · NFS server manages data layout Each NFS client can stripe file I/O across multiple storage services Data and metadata operations run concurrently

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

❒ pNFS Standards ❒ NFSv4.1: RFC 5661 ❒ pNFS layouts: RFCs 5662 - 5665

❒ NFS/RDMA Standards ❒ RPC-over-RDMA: RFC 5666 ❒ NFS/RDMA ULB: RFC 5667

NFS Reference Material

33