
Page 1: NVMe Over Fabrics—High Performance Flash Moves to Ethernet

PCIe Storage Track
John F. Kim, Director of Storage Marketing at Mellanox Technologies
Flash Memory Summit 2016, Santa Clara, CA

Page 2: Traditional Ways to Connect Flash

§ Inside the Server
• SCSI (SATA, SAS)
• PCIe (proprietary standards)
• USB

§ Outside the Server
• DAS: SAS, SATA, USB, InfiniBand
• Fibre Channel SAN
• FCoE
• Ethernet TCP (iSCSI, NAS, Object)

Page 3: New Ways to Connect Flash

§ Inside the Server
• PCIe with standardized NVMe
• NVDIMM

§ Outside the Server
• DAS: Thunderbolt, USB 3.0
• Ethernet RDMA (iSER, SMB Direct)
• Clustered file system (InfiniBand or Ethernet)
• Hyper-converged (Ethernet)
• NVMe over Fabrics

Page 4: Options for the NVMf Network

§ InfiniBand
• Best performance

§ Ethernet (RoCE or iWARP)
• Most widely used

§ Fibre Channel (FC-NVMe standard in progress)
• Most popular network for flash arrays in the enterprise

Page 5: Why NVMe Over Fabrics

End-to-end NVMe semantics across a range of topologies. [Diagram from SNIA ESF webcast]

Page 6: NVMe Queuing Operational Model

1. Host driver enqueues Submission Queue Entries (SQEs) into the Submission Queue (SQ)
2. NVMe controller dequeues the Submission Queue Entries
3. NVMe controller enqueues Completion Queue Entries (CQEs) into the Completion Queue (CQ)
4. Host driver dequeues the Completion Queue Entries

[Diagram: the host-side NVMe host driver and the subsystem-side NVMe controller of an NVM subsystem, joined through a fabric port. Each SQE carries Command Id, OpCode, NSID, Buffer Address (PRP/SGL), and Command Parameters; each CQE carries command-specific parameters, SQ Head Ptr, Command Status, Command Id, and a Phase (P) bit.]
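The two entry formats in the diagram are fixed-size ring entries. Below is a minimal C sketch of the 64-byte SQE and 16-byte CQE, limited to the fields named above; the struct names are illustrative, and the layout follows the NVMe base specification.

```c
#include <stdint.h>

/* Sketch of the 64-byte NVMe Submission Queue Entry (SQE). */
struct nvme_sqe {
    uint8_t  opcode;       /* OpCode: which command (e.g., Read, Write)   */
    uint8_t  flags;        /* fused-op and PRP-vs-SGL selector bits       */
    uint16_t command_id;   /* Command Id: echoed back in the completion   */
    uint32_t nsid;         /* NSID: namespace the command targets         */
    uint64_t reserved;
    uint64_t metadata;     /* metadata pointer                            */
    uint64_t prp1;         /* Buffer Address (PRP/SGL), entry 1           */
    uint64_t prp2;         /* Buffer Address (PRP/SGL), entry 2           */
    uint32_t cdw10_15[6];  /* Command Parameters: command dwords 10-15    */
};

/* Sketch of the 16-byte Completion Queue Entry (CQE). */
struct nvme_cqe {
    uint32_t result;       /* command-specific result                     */
    uint32_t reserved;
    uint16_t sq_head;      /* SQ Head Ptr: how far the controller consumed */
    uint16_t sq_id;        /* which SQ this completion belongs to         */
    uint16_t command_id;   /* Command Id: matches the originating SQE     */
    uint16_t status;       /* Command Status (bits 15:1) + Phase P (bit 0) */
};

_Static_assert(sizeof(struct nvme_sqe) == 64, "SQE must be 64 bytes");
_Static_assert(sizeof(struct nvme_cqe) == 16, "CQE must be 16 bytes");
```

Because each CQE carries the Command Id of its originating SQE, completions can arrive out of order, and the same queue-pair model maps naturally onto a fabric: NVMe over Fabrics carries SQEs and CQEs in capsules rather than in host memory.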

Page 7: Why Ethernet for NVMf?

§ Ubiquity
§ 25/50/100GbE today, 200GbE soon
§ RDMA (RoCE and iWARP)
§ Scalability to hundreds of thousands of nodes
§ Standards-based and multi-vendor
§ Lowest price per port
§ Converged-fabric capability
§ Standard cables, extendable to long distances

Page 8: RDMA Options on Ethernet

§ RoCE uses UDP/IP on Ethernet, with no TCP
• The InfiniBand transport layered on top of Ethernet

§ iWARP layers on top of TCP/IP
• Offloaded TCP/IP flow control and management
• May include a TCP Offload Engine (TOE)

§ Both RoCE and iWARP support RDMA verbs (see the sketch after this list)
• NVMf written to verbs can run on top of either transport

§ Other options exist without RDMA

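To make the portability point concrete, here is a minimal libibverbs sketch: it enumerates RDMA devices and allocates a protection domain, and the calls are identical whether the NIC underneath speaks RoCE, iWARP, or InfiniBand. This is an illustrative probe, not part of any NVMf implementation; compile with -libverbs.

```c
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num;
    /* List every RDMA-capable device visible to the verbs stack. */
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs) {
        perror("ibv_get_device_list");
        return 1;
    }
    for (int i = 0; i < num; i++) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        if (!ctx)
            continue;
        /* A protection domain (and later memory registration, queue
         * pairs, etc.) looks the same on every transport; NVMf sits
         * on top of these same verbs. */
        struct ibv_pd *pd = ibv_alloc_pd(ctx);
        printf("%s: %s\n", ibv_get_device_name(devs[i]),
               pd ? "verbs usable" : "no protection domain");
        if (pd)
            ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
    }
    ibv_free_device_list(devs);
    return 0;
}
```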

Page 9: Who Supports RDMA on Ethernet?

| Vendor               | RoCE            | iWARP           | 25/50/100GbE  |
|----------------------|-----------------|-----------------|---------------|
| Mellanox             | Yes             | —               | Yes (GA)      |
| Broadcom (Emulex)    | Yes (in beta)   | —               | ?             |
| Broadcom (NetXtreme) | ?               | —               | ?             |
| Chelsio              | —               | Yes (GA)        | Announced     |
| QLogic               | Yes (announced) | Yes (announced) | Yes (in beta) |


Page 10: Mellanox NVMf Demo on RoCE

§ Topology
• Two compute nodes, each with a ConnectX-4 Lx 25Gb/s port
• One storage node with a ConnectX-4 Lx 50Gb/s port and 4 Intel NVMe SSDs (P3700/750 series)
• Nodes connected through a Mellanox SN2700 Spectrum switch

§ Results (target side), with BS = 4KB, 16 jobs, IO depth = 64:

| Metric               | Value    |
|----------------------|----------|
| Added fabric latency | ~12 µs   |
| Bandwidth            | 5.2 GB/s |
| IOPS                 | 1.3M     |
| Online cores         | 4        |
| Per-core utilization | 50%      |

(The numbers are consistent: 1.3M IOPS × 4KB ≈ 5.2 GB/s.)

[Diagram: two initiators with ConnectX-4 Lx 25GbE NICs and a target with a ConnectX-4 Lx 50GbE NIC and 4 Intel P3700 NVMe SSDs, connected through a Mellanox SN2700 Spectrum switch.]

Page 11: Intel SPDK NVMf Demo

§ Intel SPDK software with a user-space target and polling mode
§ Mellanox ConnectX-4 Lx NICs running RoCE
§ 4 Intel NVMe P3700 SSDs connected to a shared PCIe Gen3 x16 bus
§ Intel Xeon-D 1567 CPU on the target side; Intel Xeon E5 2600 v3 on the initiator side

[Diagram: NVMf initiator and NVMf target, each with a dual-port Mellanox ConnectX-4 Lx on PCIe 3.0 x8, linked by 2 x 25GbE with RDMA (RoCE); 4 Intel 2TB NVMe P3700 SSDs on PCIe 3.0 x16 at the target.]
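Polling mode deserves a word: instead of waiting for an interrupt, a user-space target like SPDK's spins on the completion queue and uses the Phase (P) bit to detect new entries. Below is a hedged sketch of that detection logic, restating the CQE layout from the Page 6 example; the function name and signature are illustrative, not SPDK's actual API, and a real driver would also need a read barrier before consuming the entry.

```c
#include <stdint.h>

/* Completion Queue Entry, as sketched on Page 6. */
struct nvme_cqe {
    uint32_t result;
    uint32_t reserved;
    uint16_t sq_head;
    uint16_t sq_id;
    uint16_t command_id;
    uint16_t status;          /* Command Status (15:1) + Phase P (bit 0) */
};

/* Poll one slot of a completion-queue ring of size `depth`.
 * Returns 0 if a new completion was consumed, -1 if the ring is idle. */
static int cq_poll_one(volatile struct nvme_cqe *cq, uint16_t depth,
                       uint16_t *head, uint8_t *phase /* starts at 1 */)
{
    volatile struct nvme_cqe *cqe = &cq[*head];

    /* An entry is new only when its Phase bit matches the phase we
     * expect; the controller flips the bit on every pass of the ring. */
    if ((cqe->status & 1) != *phase)
        return -1;            /* nothing new yet; keep polling */

    /* ... match cqe->command_id to its SQE, check cqe->status >> 1 ... */

    if (++*head == depth) {   /* wrap the ring and flip the expected phase */
        *head = 0;
        *phase ^= 1;
    }
    return 0;
}
```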

Page 12: E8 Storage: 10x Performance, Half the TCO

[Chart: millions of IOPS (MIOPS) on a 0-10 scale, comparing a traditional all-flash array (AFA), a single NVMe SSD, and the E8 D24.]

Page 13: NVMe over 100GbE RoCE Fabric Demo (NVMf in Action)

§ 12-, 16-, and 32-TB flash arrays
§ 2.25M / 1.75M read/write IOPS
§ 10.0 GB/s / 6.0 GB/s read/write bandwidth
§ < 200 µs latency

The FIO test tool suite demonstrates Mangstor's NX6320 high-performance, low-latency NVMe over Fabrics devices. @FMS16 Hall D.

[Diagram: an ESXi 6.0 hypervisor hosting VM1-VM8, each with its own volume, attached over a ConnectX-4 100Gb/s adapter with SR-IOV RoCE.]

Page 14: NVMe Over Fabrics—High Performance Flash Moves to · PDF fileNVMe Queuing Operational Model 1. Host Driver enqueues the Submission Queue Entries into the SQ 2. NVMe Controller dequeues

Copyright©PavilionDataSystems,ConfidenOal

LowCostOfOwnership+EnterpriseReliability

14

Thin-Provisioned,virtualNVMeLUNspresentedonhost

Oer

ü NoCustomSoTwareorHardwarerequiredinhosts/applicaOonOer

ü StandardEthernetConnecOvity

ü ErasureCodedVolumes

ü ThinProvisionedVolumes

ü QOSPolicyperVolume

Page 15: NVMe Over Fabrics—High Performance Flash Moves to · PDF fileNVMe Queuing Operational Model 1. Host Driver enqueues the Submission Queue Entries into the SQ 2. NVMe Controller dequeues

Copyright©PavilionDataSystems,ConfidenOal 15

PowerANDDensity,inanEnterprise-ClassArray

120 GB/S Bandwidth

20 Million

4K IOPS

Half PB Capacity (4U)

*1 PB in Q1-2017

Up To 40 x 40 GB/S Ports

High Speed Data Copies/

Clones

Thin Provisioning

Pay-As-You Grow Scalability in 4U Chassis

Zero Host

Footprint

Erasure Coding

Hot Swappable Components

Page 16: NVMf: High Performance Flash Moves to Ethernet

Thank You!

Flash Memory Summit 2016 Santa Clara, CA