NVMe over Fabrics Demystified - SNIA · 2019. 12. 21.

Rob Davis, Mellanox

2019 Storage Developer Conference India © All Rights Reserved.

© 2019 Mellanox Technologies

Why NVMe over Fabrics?

[Chart: access time in microseconds (axis ticks 0.1, 10, 1000) by storage media technology: HDD, SSD, NVM/PM]

NVMe Technology

▪ Optimized for flash and PM
  ▪ Traditional SCSI interfaces designed for spinning disk
  ▪ NVMe bypasses unneeded layers
▪ NVMe Flash Outperforms SAS/SATA Flash
  ▪ 2.5x more bandwidth, 50% lower latency, 3x more IOPS

“NVMe over Fabrics” was the Logical and Historical next step

▪ Sharing NVMe based storage across multiple servers/CPUs was the next step
  ▪ Better utilization: capacity, rack space, power
  ▪ Scalability, management, fault isolation
▪ NVMe over Fabrics standard
  ▪ 50+ contributors
  ▪ Version 1.0 released in June 2016
▪ Pre-standard demos in 2014
▪ Able to almost match local NVMe performance

[Chart with axis in Gb/s]

NVMe over Fabrics (NVMe-oF) Transports

▪ The NVMe-oF standard is not Fabric specific
▪ Instead there is a separate Transport Binding specification for each Transport Layer
  ▪ RDMA was 1st
  ▪ Later Fibre Channel
  ▪ NVM Express (nvmexpress.org) just released a new binding specification for TCP

[Diagram label: InfiniBand]
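To make the per-transport binding idea concrete, here is a minimal Python sketch that builds (but does not run) an nvme-cli `nvme connect` command line for the RDMA and TCP transports. The helper name `connect_args`, the target address and the subsystem NQN are hypothetical placeholders; 4420 is the port registered for NVMe-oF. The point it illustrates is that the NVMe-level arguments stay the same and only the transport type changes.

```python
# Hypothetical helper: assembles an nvme-cli "connect" argument list.
# It only builds the list; nothing is executed.
SUPPORTED_TRANSPORTS = ("rdma", "tcp")


def connect_args(transport, traddr, nqn, trsvcid="4420"):
    """Return the argv for `nvme connect` over the given transport."""
    if transport not in SUPPORTED_TRANSPORTS:
        raise ValueError(f"unsupported transport in this sketch: {transport}")
    return [
        "nvme", "connect",
        "-t", transport,   # transport type
        "-a", traddr,      # transport address of the target
        "-s", trsvcid,     # transport service ID (port)
        "-n", nqn,         # subsystem NVMe Qualified Name
    ]


if __name__ == "__main__":
    # Placeholder address and NQN, for illustration only.
    for transport in SUPPORTED_TRANSPORTS:
        print(" ".join(connect_args(transport, "192.0.2.10",
                                    "nqn.2019-12.com.example:subsys1")))
```

Fibre Channel is left out of the sketch because its addressing differs from the IP-based transports.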

How Does NVMe-oF Maintain NVMe Performance?

▪ By extending NVMe efficiency over a fabric
  ▪ NVMe commands and data structures are transferred end to end
▪ Bypassing legacy stacks for performance
▪ First products and early demos all used RDMA
▪ Performance is impressive

[Diagram labels: SAS/SATA; Device; over Fabrics; NVMe/RDMA; NVMe/TCP; Transport; or IB]

https://www.theregister.co.uk/2018/08/16/pavilion_fabrics_performance/

[Additional diagram labels: Fibre Channel; NVMe/TCP; NVMe/FC; over Fabrics; ~150]

Faster Storage Needs a Faster Network

[Chart labels: 10GbE; Fibre Channel]

Faster Network Wires Solve Some of the Network Bottleneck Problem…

Ethernet & InfiniBand

End-to-End 25, 40, 50, 56, 100, 200Gb

Going to 400Gb
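As a rough illustration of why link speed matters, the Python sketch below estimates how many NVMe SSDs it takes to fill a link at each speed listed above, plus 10GbE for comparison. The per-SSD throughput of 3 GB/s is an assumed, hypothetical figure (it is not from the slides), and protocol overhead is ignored.

```python
# Link speeds in Gb/s: 10GbE for comparison, plus the speeds listed above.
LINK_SPEEDS_GBPS = [10, 25, 40, 50, 56, 100, 200]

# ASSUMPTION: a hypothetical NVMe SSD sustaining 3 GB/s; not from the slides.
ASSUMED_SSD_GBYTES_PER_SEC = 3.0


def ssds_to_fill_link(link_gbps, ssd_gbytes_per_sec=ASSUMED_SSD_GBYTES_PER_SEC):
    """Number of SSDs whose combined throughput equals the raw link rate."""
    link_gbytes_per_sec = link_gbps / 8.0  # bits to bytes, no overhead modeled
    return link_gbytes_per_sec / ssd_gbytes_per_sec


if __name__ == "__main__":
    for speed in LINK_SPEEDS_GBPS:
        print(f"{speed:>3} Gb/s link ~ {ssds_to_fill_link(speed):.2f} SSDs")
```

Under this assumption a single SSD already exceeds what a 10 Gb/s link can carry, which is the bottleneck the slide title refers to.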

Faster Protocols Solve the Rest

NVMe, NVMe-oF, and RDMA Protocols

NVMe/RDMA

Adapter-based transport

1) Ethernet
   ▪ RoCE
   ▪ iWARP
2) InfiniBand
3) OmniPath

[Diagram label: NVMe-oF over RoCE]

NVMe Commands Encapsulated

[Sequence diagram: an NVMe Initiator and an NVMe Target, each with an RNIC, exchange messages across the Network. Labels in order of appearance:]

▪ Post Send (CC)
▪ Send – Command Capsule
▪ Ack
▪ Completion; Completion
▪ Post NVMe command; wait for completion; free receive buffer
▪ Post Send (RC)
▪ Send – Response Capsule
▪ Completion; Ack
▪ Completion; free send buffer
▪ Free send buffer
▪ Post Send (Write data)
▪ Write first
▪ Write last
▪ Ack
▪ Completion; free allocated buffer
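To show what "encapsulated" means at the byte level, here is a minimal Python sketch of the two capsules named in the diagram. It assumes the standard NVMe sizes (a 64-byte submission queue entry in the command capsule, a 16-byte completion queue entry in the response capsule) and a simplified read command; the function names are hypothetical, and the data pointer (SGL) and flags that a real transport would fill in are left as zeros.

```python
import struct

NVME_OPC_READ = 0x02  # NVM command set read opcode

# Simplified 64-byte SQE: opcode, flags, command ID, namespace ID,
# 8 reserved bytes, 8-byte metadata pointer, 16-byte data pointer (zeroed
# here), starting LBA (CDW10-11), zero-based block count and control bits
# (CDW12), then CDW13-15 as padding.
SQE_FORMAT = "<BBHI8x8x16xQHH12x"

# 16-byte CQE: DW0, DW1, SQ head pointer, SQ ID, command ID, status word.
CQE_FORMAT = "<IIHHHH"


def build_read_command_capsule(cid, nsid, start_lba, num_blocks):
    """Return a 64-byte command capsule carrying a simplified read SQE."""
    if num_blocks < 1:
        raise ValueError("num_blocks must be at least 1")
    return struct.pack(SQE_FORMAT, NVME_OPC_READ, 0, cid, nsid,
                       start_lba, num_blocks - 1, 0)


def parse_response_capsule(capsule):
    """Extract command ID, SQ head and status field from a 16-byte CQE."""
    _dw0, _dw1, sq_head, _sqid, cid, status = struct.unpack(
        CQE_FORMAT, capsule[:16])
    # Bit 0 of the status word is the phase tag; the rest is the status field.
    return {"cid": cid, "sq_head": sq_head, "status": status >> 1}


if __name__ == "__main__":
    capsule = build_read_command_capsule(cid=7, nsid=1, start_lba=2048,
                                         num_blocks=8)
    print(len(capsule), capsule[:8].hex())
```

The command ID is what lets the initiator match a response capsule to the command capsule it sent earlier.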

Importance of Latency with NVMe-oF

[Chart, logarithmic scale. Labels: Common Switch & Adapter; Low Latency Switch & Adapter; Request/Response; Newest NVMe SSD]

Network hops multiply latency
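The "hops multiply latency" point can be sketched as simple arithmetic: a request and its response each cross every switch hop and both adapters, so the network contribution is counted twice on top of the SSD access time. The Python sketch below uses this simplified model; the function name and all the latency values are hypothetical and chosen only to show the effect.

```python
def round_trip_latency_us(media_us, hops, per_hop_us, adapter_us):
    """Estimate I/O latency: media time plus request and response traversals.

    Simplified model: each direction crosses `hops` switches and two
    adapters (one at each end). Queuing and software time are ignored.
    """
    one_way_network_us = hops * per_hop_us + 2 * adapter_us
    return media_us + 2 * one_way_network_us


if __name__ == "__main__":
    # ASSUMPTION: illustrative numbers only, not measurements from the talk.
    media_us = 10.0
    for label, per_hop_us, adapter_us in (("common", 5.0, 5.0),
                                          ("low latency", 0.5, 1.0)):
        for hops in (1, 3):
            total = round_trip_latency_us(media_us, hops, per_hop_us,
                                          adapter_us)
            print(f"{label:>11}, {hops} hop(s): {total:.1f} us")
```

With a fast SSD the media term is small, so the network terms decide whether remote access feels close to local.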

Composable Infrastructure Use Case

▪ Also called Compute Storage Disaggregation and Rack Scale
▪ Dramatically improves data center efficiency
▪ NVMe over Fabrics enables Composable Infrastructure
  ▪ Low latency
  ▪ High bandwidth
  ▪ Nearly local disk performance

[Diagram: groups of Compute nodes, each group attached to a Switch]

Hyperconverged and Scale-Out Storage Use Case

▪ Scale-out
  ▪ Cluster of commodity servers
  ▪ Software provides storage functions
▪ Hyperconverged collapses compute & storage
  ▪ Integrated compute-storage nodes & software
▪ NVMe-oF performs like local/direct-attached SSD

[Diagram labels: Scale out Storage; Mellanox x86 Switch; Compute Nodes; Storage Application; VM; NVMe; Storage App; HCI Nodes]

Backend Scale Out Use Case

[Diagram labels: Backend Network; JBOF; Frontend]

NVMe-oF Use Cases: Classic SAN

▪ SAN features at higher performance
  ▪ Better utilization: capacity, rack space, and power
  ▪ Scalability
  ▪ Management
  ▪ Fault isolation

NVMe-oF Target Hardware Offloads

No Offload Mode

How Target Offload Works

▪ Offload
  ▪ Only control path, management and exceptions go through Target CPU software
  ▪ Data path and NVMe commands handled by the network adapter
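The split described above can be sketched as a small classification function. This is an illustrative Python model only, not how any adapter firmware actually decides: the `Command` type, the `handled_by` function and the choice of which opcodes count as data path are assumptions made for the example. The only NVMe facts it relies on are that queue 0 is the admin queue and that 0x01 and 0x02 are the write and read opcodes.

```python
from dataclasses import dataclass

# ASSUMPTION: only read and write are treated as data-path commands here.
DATA_PATH_OPCODES = {0x01: "write", 0x02: "read"}


@dataclass
class Command:
    queue_id: int        # 0 is the NVMe admin queue
    opcode: int
    error: bool = False  # set when the command needs exception handling


def handled_by(cmd):
    """Return which side processes the command in this simplified model."""
    is_control = cmd.queue_id == 0
    is_data_path = cmd.opcode in DATA_PATH_OPCODES
    if is_control or cmd.error or not is_data_path:
        return "target CPU software"
    return "network adapter"


if __name__ == "__main__":
    for cmd in (Command(1, 0x02), Command(0, 0x06), Command(2, 0x01, True)):
        print(cmd, "->", handled_by(cmd))
```

In this model the target CPU only sees the rare cases, which is the reason offload frees CPU cycles on the target.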

Offload vs No Offload Performance

[Diagrams, data path. Labels: DDR4; PCIe Switch; NVMe SSD; Initiator x86 with ConnectX-5 (two); Target; SOC (offload diagram only)]

No Offload (Target, 2 100Gb Initiators)
▪ 6M IOPs, 512B block size
▪ 2M IOPs, 4K block size
▪ ~15 usec latency (not including SSD)

Offload (Target, 2 100Gb Initiators)
▪ 8M IOPs, 512B block size
▪ 5M IOPs, 4K block size
▪ ~5 usec latency (not including SSD)
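The IOPS figures above can be converted to payload bandwidth and compared with the 2 x 100Gb of initiator links. The Python sketch below does that arithmetic; it assumes "4K" means 4096 bytes and counts payload only, ignoring protocol headers.

```python
# IOPS figures from the slide, keyed by block size in bytes.
# ASSUMPTION: "4K" is taken as 4096 bytes.
RESULTS = {
    "no offload": {512: 6_000_000, 4096: 2_000_000},
    "offload": {512: 8_000_000, 4096: 5_000_000},
}
LINK_CAPACITY_GBPS = 2 * 100  # two 100Gb initiators


def payload_gbps(iops, block_bytes):
    """Payload bandwidth in Gb/s for a given IOPS rate and block size."""
    return iops * block_bytes * 8 / 1e9


if __name__ == "__main__":
    for mode, by_block in RESULTS.items():
        for block_bytes, iops in by_block.items():
            gbps = payload_gbps(iops, block_bytes)
            share = 100 * gbps / LINK_CAPACITY_GBPS
            print(f"{mode:>10} {block_bytes:>4}B: {gbps:7.2f} Gb/s "
                  f"({share:.0f}% of {LINK_CAPACITY_GBPS} Gb/s)")
```

At 4K blocks the offload case works out to about 164 Gb/s of payload, against roughly 66 Gb/s without offload.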

NVMe Emulation

Local Physical Storage to Hardware Emulated Storage

[Diagram, Physical Local NVMe Storage. Labels: Host Server; OS/Hypervisor; NVMe Standard Driver; PCIe Bus; NVMe; Physical Local Storage]

[Diagram, NVMe Drive Emulation. Labels: Host Server; OS/Hypervisor; NVMe Standard Driver; PCIe Bus; NVMe Emulated Storage; Remote Storage]

NVMe/TCP

▪ NVMe-oF commands are sent over standard TCP/IP sockets
▪ Each NVMe queue pair is mapped to a TCP connection
▪ Easy to support NVMe over TCP with no changes
▪ Good for distance, stranded server, and out of band management connectivity
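The one-queue-pair-per-connection mapping can be sketched in a few lines of Python. This is a toy model, not an NVMe/TCP implementation: the `QueueConnections` class is hypothetical, and `socket.socketpair()` stands in for a real TCP connection to a target so the example runs without a network.

```python
import socket


class QueueConnections:
    """Keeps exactly one stream socket per NVMe queue ID."""

    def __init__(self):
        self._by_qid = {}

    def attach(self, qid, sock):
        if qid in self._by_qid:
            raise ValueError(f"queue {qid} already has a connection")
        self._by_qid[qid] = sock

    def send(self, qid, payload):
        # Traffic for a queue only ever travels on that queue's connection.
        self._by_qid[qid].sendall(payload)

    def close(self):
        for sock in self._by_qid.values():
            sock.close()
        self._by_qid.clear()


def demo():
    """Attach three queues and send one message on queue 1."""
    conns = QueueConnections()
    peers = {}
    for qid in (0, 1, 2):  # queue 0 plays the admin queue
        local, remote = socket.socketpair()  # stand-in for a TCP connection
        conns.attach(qid, local)
        peers[qid] = remote
    conns.send(1, b"command for queue 1")
    received = peers[1].recv(64)
    conns.close()
    for peer in peers.values():
        peer.close()
    return received


if __name__ == "__main__":
    print(demo())
```

Because each queue has its own connection, queues do not block one another at the transport level.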

Latency: NVMe-RDMA vs NVMe-TCP

[Chart: latency distribution. Series: Local SSD Write; RDMA Write; TCP Write. Axis: fraction of IOs with this or less latency. Annotation: Tail Latency]
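The chart's axis, "fraction of IOs with this or less latency", is an empirical cumulative distribution, and tail latency is read off its upper percentiles. The Python sketch below computes both from a list of latency samples; the sample values are made up for illustration and are not data from the talk.

```python
def fraction_at_or_below(samples_us, threshold_us):
    """Fraction of IOs that completed within threshold_us."""
    return sum(1 for s in samples_us if s <= threshold_us) / len(samples_us)


def percentile(samples_us, percent):
    """Nearest-rank percentile; percent is an integer from 1 to 100."""
    ordered = sorted(samples_us)
    rank = max(1, (percent * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


if __name__ == "__main__":
    # ASSUMPTION: synthetic samples, 98 fast IOs and 2 slow outliers.
    samples = [20] * 98 + [150, 900]
    print("median:", percentile(samples, 50), "us")
    print("p99:", percentile(samples, 99), "us")
    print("fraction <= 20 us:", fraction_at_or_below(samples, 20))
```

Two transports can share a similar median while differing widely at the 99th percentile, which is why the tail is called out separately.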

NVMe over Fabrics Maturity

▪ UNH-IOL, a neutral environment for multi-vendor interoperability since 1988
▪ Four plug fests for NVMe-oF since May 2017
▪ Tests require participating vendors to mix and match in both Target and Initiator positions
▪ June 2018 test included Mellanox, Broadcom and Marvell ASIC solutions
▪ URL to list of vendors who OK public results: https://www.iol.unh.edu/registry/nvmeof

NVMe Market Projection – $60B by 2021

▪ ~$20B in NVMe-oF revenue projected by 2021
▪ NVMe-oF adapter shipments will exceed 1.5M units by 2021
  ▪ This does not include ASICs, Custom Mezz Cards, etc. inside AFAs and other Storage Appliances

Some NVMe-oF Storage Players

Conclusions

▪ NVMe-oF brings the value of networked storage to NVMe based solutions
▪ NVMe-oF is supported across many network technologies
▪ The performance advantages of NVMe are not lost with NVMe-oF
  ▪ Especially with RDMA
▪ There are many suppliers of NVMe-oF solutions across a variety of important data center use cases

Thank You
