NVMe over Fabrics

Wael Noureddine, Ph.D. Chelsio Communications Inc.

2015 Storage Developer Conference. © Chelsio Communications Inc. All Rights Reserved.

Overview

- Introduction
- NVMe/Fabrics and iWARP
- RDMA Block Device and results
- Closing notes


NVMe Devices

- Accelerating pace of progress
  - Capacity
  - Reliability
  - Efficiency
  - Performance
    - Latency
    - Bandwidth
    - IOPS – Random Access

Source: Samsung Keynote, Flash Memory Summit, Santa Clara, August 2015


A Decade of NIC Evolution

2003: 7 Gbps aggregate; PCI-X; Physical Functions: 1; DMA Queues: 1; Interrupt: Line; Logical NICs: 1; lock-based synchronization
2007: 11 Gbps bidirectional; PCIe Gen1; Physical Functions: 1; DMA Queues: 8; Interrupt: MSI; Logical NICs: 8; up to 8 cores
2013: 55 Gbps bidirectional; PCIe Gen3; Physical Functions: 8; DMA Queues: 64K; Interrupt: MSI-X; Logical NICs: 128; lock-free user space I/O


NVMe – Non-Volatile Memory Express

- Modern PCIe interface architecture
  - Benefits from experience with other high performance devices
  - Scales with exponential speed increases
- Standardized for enhanced plug-and-play
  - Register set
  - Feature set
  - Command set
- Parallels to the high performance NIC interface
  - 64K deep queues
  - MSI-X
  - Lock-free operation
- Security
- Data integrity protection


iWARP RDMA over Ethernet

- Open, transparent and mature
  - IETF standard (2007)
- Plug-and-play
  - Standard and proven TCP/IP
  - Built-in reliability, congestion control, flow control
  - Natively routable
- Cost effective
  - Regular switches
  - Same network appliances
  - Works with DCB but NOT required
  - No need for lossless configuration
- No network restrictions
  - Architecture, scale, distance, RTT, link speeds
- Hardware performance
  - Exception processing in HW
  - Ultra low latency
  - High packet rate and bandwidth

[Figure: two hosts, each with a Unified Wire RNIC processing RDMA, TCP, IP and Ethernet in hardware. Payloads are placed directly into host memory and notifications are posted to CQs; TCP/IP/Ethernet frames cross any switches/routers in the LAN/data center or WAN/long haul, with end-to-end CRC.]


NVMe over Fabrics

- Address PCI reach and scalability limits
- Extend native NVMe over a large fabric
- iWARP RDMA over Ethernet is the first proof point
  - Intel NVMe drives
  - Chelsio 40GbE iWARP
  - Compared to local disk:
    - Less than 8 usec additional RTT latency
    - Same IOPS performance


NVMe/Ethernet Fabric with iWARP

[Figure: an appliance frontend with T5 Unified Adapters bridges an Ethernet front-end fabric and an Ethernet back-end fabric, each built from any switches/routers. On the front end: NVMe-optimized clients, Linux clients over NVMe/Fabrics, and Windows clients over SMB3. On the back end: NVM storage nodes reached over NVMe/Fabrics and iSCSI, plus remote replication.]

T5 supports iSCSI, SMBDirect and NVMe/Fabrics offload simultaneously.


T5 Storage Protocol Support

[Figure: T5 hardware provides NIC, TCP offload, RDMA offload, iSCSI offload, FCoE offload and T10-DIX. Network, RDMA, iSCSI and FCoE drivers sit above the hardware, over a common lower layer driver, supporting NFS, iSCSI, iSER, SMB Direct (NDK), FCoE, NBD/OS block IO, and NVMe/Fabrics and RBD RPC.]


NVMe/Fabrics Layering

[Figure: on the initiator, the Linux file system and Linux block IO sit over the NVMe/Fabrics layer, RDMA verbs and the RDMA driver in the kernel, with RDMA offload and TCP offload in the T5 NIC. On the target, the T5 NIC's RDMA/TCP offload feeds the kernel RDMA driver, RDMA verbs and NVMe/Fabrics layer, which drives the NVMe device driver and the NVMe drive.]


RBD Layering

[Figure: on the initiator, the Linux file system and Linux block IO sit over the RDMA block driver, RDMA verbs and the RDMA driver in the kernel, with RDMA offload and TCP offload in the T5 NIC. On the target, the T5 NIC's RDMA/TCP offload feeds the kernel RDMA driver, RDMA verbs and RDMA block driver, which submits IO through Linux block IO to the block device driver for the backing device (NVMe drive, SAS drive, etc.).]


Generic RDMA Block Device (RBD)

- Open source Chelsio Linux driver
  - Part of Chelsio's Unified Wire SW
- Virtualizes block devices over an RDMA connection
- Target
  - Claims the physical device using blkdev_get_by_path()
  - Submits IOs from the Initiator via submit_bio()
- Initiator
  - Registers as a BIO device driver
  - Registers connected Target devices as BIO devices using add_disk()
- Current RBD protocol limits
  - Max RBD IO size is 128KB
  - 256 max outstanding IOs in flight
  - Initiator physical/logical sector size is 4KB
- Management via the rbdctl command
  - Initiator provides a character device command interface for adding/removing devices
  - A simple command, rbdctl, is used to connect, login, and register the target devices
- Examples (see the sketch below)
  - Target: load the rbdt module
  - Initiator: load the rbdi module
    - Add a new target: rbdctl -n -a <ipaddr> -d /dev/nvme0n1
    - Remove an RBD device: rbdctl -r -d /dev/rbdi0
    - List connected devices: rbdctl -l
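A minimal end-to-end sketch of the workflow above, built from the commands on this slide. Loading the modules with modprobe and checking the sector size with blockdev are illustrative assumptions, not taken from the slide; <ipaddr> stands for the target's address, as in the original example.

  # Target: load the RBD target module so block devices can be exported
  modprobe rbdt

  # Initiator: load the RBD initiator module
  modprobe rbdi

  # Connect to the target and register its NVMe namespace as a local RBD device
  rbdctl -n -a <ipaddr> -d /dev/nvme0n1

  # List connected devices; the new device should appear as /dev/rbdi0
  rbdctl -l

  # Confirm the 4KB logical/physical sector size exposed by the initiator
  blockdev --getss --getpbsz /dev/rbdi0

  # Tear down: remove the RBD device
  rbdctl -r -d /dev/rbdi0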


RBD vs. NBD Throughput – 40GbE

[Figure: throughput (Gbps) vs. I/O size (4KB to 1MB) for RBD_Read, NBD_Read, RBD_Write and NBD_Write.]

Test setup: Intel Xeon CPU E5-1660 v2 6-core processor clocked at 3.70GHz (HT enabled) with 64 GB of RAM, Chelsio T580-CR, RHEL 6.5 (kernel 3.18.14), standard MTU of 1500B, using RAMdisks on target (4).

fio --name=<test_type> --iodepth=32 --rw=<test_type> --size=800m --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --group_reporting --ioengine=libaio --numjobs=4 --bs=<block_size> --runtime=30 --time_based --filename=/dev/rbdi0
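For instance, the 4KB read point on the RBD curve would be produced by filling in the placeholders roughly as follows. The specific test_type and block_size values here are illustrative; the NBD curves presumably use the same parameters pointed at the NBD device instead of /dev/rbdi0.

  fio --name=read --iodepth=32 --rw=read --size=800m --direct=1 \
      --invalidate=1 --fsync_on_close=1 --norandommap --group_reporting \
      --ioengine=libaio --numjobs=4 --bs=4k --runtime=30 --time_based \
      --filename=/dev/rbdi0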


RBD vs. NBD Latency – 40GbE

[Figure: latency (us) vs. I/O size (4KB to 128KB) for RBD_Read, NBD_Read, RBD_Write and NBD_Write.]

Test setup: Intel Xeon CPU E5-1660 v2 6-core processor clocked at 3.70GHz (HT enabled) with 64 GB of RAM, Chelsio T580-CR, RHEL 6.5 (kernel 3.18.14), standard MTU of 1500B, using a RAMdisk on target (1).

fio --name=<test_type> --iodepth=1 --rw=<test_type> --size=800m --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --group_reporting --ioengine=libaio --numjobs=1 --bs=<block_size> --runtime=30 --time_based --filename=/dev/rbdi0


SSD Latency Results

[Figure: random read latency (usec) vs. I/O size (4KB to 256KB), Local vs. RBD.]

Test setup: Supermicro MB, 16 cores, Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz, 64GB memory, Chelsio T580-CR, RHEL 6.5, linux-3.18.14 kernel, standard MTU of 1500B, through an Ethernet switch, NVMe SSD target. fio-2.1.10, 1 thread, direct IO, 30 second runs, libaio IO engine, IO depth = 1, random read, write, and read-write.


SSD Bandwidth Results

[Figure: random read throughput (Gbps), 1 thread, vs. I/O size (4KB to 256KB), Local vs. RBD.]

Test setup: Supermicro MB, 16 cores, Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz, 64GB memory, Chelsio T580-CR, RHEL 6.5, linux-3.18.14 kernel, standard MTU of 1500B, through an Ethernet switch, SSD target. fio-2.1.10, 1 thread, direct IO, 30 second runs, libaio IO engine, IO depth = 32, random read, write, and read-write.


Local vs. RBD Bandwidth – 2 SSD

[Figure: two panels, Read and Write throughput (Gbps) vs. I/O size (4KB to 256KB), Local vs. RBD.]

Test setup: Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz, 2x6 cores, 64GB RAM, 2x NVMe SSD, Chelsio T580-CR, CentOS Linux release 7.1.1503, kernel 3.17.8, standard MTU of 1500B, through a 40Gb switch.

fio --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --group_reporting --name=foo --bs=$2 --size=800m --numjobs=8 --ioengine=libaio --iodepth=16 --rw=randwrite --runtime=30 --time_based --filename_format=/dev/rbdi\$jobnum --directory=/


RDMA over Fabrics Evolution

[Figure: three stages of attaching SSDs and an RNIC to CPU and memory over PCIe.]
- YESTERDAY: DMA to/from host memory; CPU handles command transfer
- TODAY: peer-to-peer DMA through the PCI bus; no host memory use
- TOMORROW: network-integrated NVMe storage; no CPU or memory involvement


Conclusions

- NVMe over Fabrics for scalable, high performance and high efficiency fabrics that leverage disruptive solid-state storage
- Standard interfaces and middleware
  - Ease of use and widespread support
- iWARP RDMA ideal for NVMe/Fabrics
  - Familiar and proven foundation
  - Plug-and-play
  - Works with DCB but does not require it
    - No DCB tax and complexity
  - Scalability, reliability, routability
    - Any switches and any routers over any distance
  - Scales from 40 to 100Gbps and beyond
- NVMe/Fabrics devices with built-in processing power