NETW-101: Discussion Group on Networking (Networking Track)
NVMe-oF with iWARP and NVMe/TCP
Chelsio Communications
Bob Dugan – Director of Engineering
NVMe Developer Days 2018, San Diego, CA
Outline
• Chelsio
  o Overview
  o ASICs
  o Ethernet Adapters
• iWARP Goodness
• NVMe-oF with iWARP + NVMe/TOE
  o NVMe-oF with iWARP Layering
  o Software Availability
• Benchmarks
  o NVMe-oF with iWARP
  o NVMe/TCP & NVMe/TOE
  o Latency Summary
Chelsio Overview
• Company Facts
  o Founded in 2000
  o 130-strong staff
  o Offices
    § Sunnyvale, CA, USA → R&D & Corporate Headquarters
    § Bangalore, India → R&D Center
  o ISO 9001 certified
• Market
  o Ethernet adapter & ASIC solutions: 10/25/40/50/100GbE
  o Customers are server and storage OEMs, ODMs, and channel
  o 1.3+ million ports shipped
  o Subset of major customers: Dell EMC, NetApp, IBM, Netflix
Chelsio Products
ASICs
10/25/40/50/100 GbE Ethernet Adapters
https://www.chelsio.com/products/
Chelsio ASIC: Silicon History
ASIC generations, 2003–2018:
• T1 – 10Gb, TOE, PCI-X 1.0
• T2 – 10Gb, PCI-X 1.1
• T3 – 10Gb, PCIe Gen1, +iSCSI, +iWARP (7µs latency)
• T4 – 10Gb, PCIe Gen2, +FCoE, +TM, +Video Offload, +Filter, +vSwitch (3µs latency)
• T5 – +40Gb, PCIe Gen3, +T10-DIX (2µs latency)
• T6 – +25/50/100Gb, +crypto (1.5µs latency)
Chelsio ASIC: T6 Architecture
[T6 block diagram: embedded Layer 2 Ethernet switch; lookup, filtering and firewall; cut-through RX/TX memory; data-flow protocol engine; traffic manager; application co-processors (TX/RX); DMA engine; PCIe x16 Gen3; general-purpose processor; external DDR3/4 memory; on-chip DRAM memory controller; 1G/10G/25G/40G/50G/100G ports; TLS/SSL/IPsec co-processor]
Chelsio Ethernet Adapters
iWARP Goodness
• Two flavors of Ethernet RDMA: iWARP and RoCE
• iWARP sits on top of TCP/IP and utilizes its flow- and congestion-control goodness
• Works across standard Ethernet without the complexity of DCB & PFC
• iWARP is standards-based
  o Initially based on five IETF RFCs, published in 2007
  o Three enhancement RFCs published since
  o Fully compliant with OpenFabrics
• Why is iWARP important and relevant?
  o Easy to deploy and maintain – best in class
  o Uses legacy infrastructure (switches, routers, etc.)
  o Routable across thousands of kilometers
iWARP Goodness cont.
• iWARP also...
  o Scales link speeds with Ethernet
    § Current products range from 1 to 100Gbps
  o Utilizes RDMA's high performance
    § Low latency, high throughput, high IOPS, low CPU utilization
  o Easily scales to thousands of connections and thousands of nodes
• iWARP vendors – list is growing
  o Chelsio
  o Intel
  o Marvell/Cavium/QLogic
  o Kazan Networks
NVMe-oF with iWARP + NVMe/TOE
NVMe-oF with iWARP: Layering
NVMe-oF & NVMe/TOE Software: Chelsio Driver Availability
NVMe-oF (iWARP)
• Windows Server 2016/2019 (Initiator)
• Linux (Initiator & Target)
  o RHEL 7.5
  o SLES 15
  o Ubuntu 18.04
  o Kernel.org 4.9
• VMware (Initiator)
  o ESX 6.7

NVMe/TOE (out-of-box drivers)
• Linux (Initiator & Target) – Alpha
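On a Linux initiator, the in-box NVMe-oF support makes attaching to an iWARP target a few commands with nvme-cli. A minimal sketch, assuming an already-configured target; the IP address, port, and NQN below are placeholders, not values from this deck:

```shell
# Load the initiator-side RDMA transport (iWARP uses the standard
# "rdma" transport type in nvme-cli, the same as RoCE).
modprobe nvme-rdma

# Discover subsystems exported by the target (address/port are examples).
nvme discover -t rdma -a 10.0.0.1 -s 4420

# Connect to a discovered subsystem; its namespaces then show up
# as local /dev/nvmeXnY block devices.
nvme connect -t rdma -a 10.0.0.1 -s 4420 -n nqn.2018-11.example:jbof
nvme list

# Tear down the association when done.
nvme disconnect -n nqn.2018-11.example:jbof
```

The analogous flow for NVMe/TCP uses `-t tcp` in place of `-t rdma`.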
Benchmarks
Benchmark: 100G NVMe-oF with iWARP – Read/Write Throughput & IOPS
https://www.chelsio.com/wp-content/uploads/resources/t6-100g-nvme-jbof.pdf
Test Setup
Benchmark: 100G NVMe-oF with iWARP – Read/Write Throughput & IOPS
https://www.chelsio.com/wp-content/uploads/resources/t6-100g-nvme-jbof.pdf
Summary
• Left graph is with a null block device (no storage); right graph is with NVMe storage.
• Throughput reaches 93 Gbps for reads using the null block device, with a high of 91.5 Gbps using NVMe.
• Read IOPS exceed 2.4M using the null device and 1.9M using NVMe.
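As a sanity check on how the IOPS and throughput figures relate, the arithmetic below converts an IOPS figure at a given I/O size into implied line throughput (the 4 KiB size for the IOPS peak is an assumption; the underlying graphs sweep multiple I/O sizes):

```python
def gbps(iops: float, io_bytes: int) -> float:
    """Line throughput implied by an IOPS figure at a fixed I/O size."""
    return iops * io_bytes * 8 / 1e9  # bytes -> bits -> Gbps

# 2.4M read IOPS at 4 KiB implies ~78.6 Gbps, so the 93 Gbps peak
# must come from larger I/O sizes, where per-I/O overhead amortizes.
print(round(gbps(2.4e6, 4096), 1))  # -> 78.6
```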
Benchmark: 100G NVMe-oF with iWARP – Latency at 4K I/O Size Using a JBOF
https://www.chelsio.com/wp-content/uploads/resources/t6-100g-nvmeof-jbof.pdf
Summary
• Average latency difference between remote and local NVMe is ~9 µs
• Latency delta does not exceed 12 µs
Benchmark: 100G NVMe/TCP – Host Stack vs TOE
[Topology: initiator and target machines connected through a 100 Gb switch over 100 Gb links]
Test Setup
• The target machine
  o 2× Intel Xeon E5-2687W v4
    § 12-core @ 3.00GHz (HT disabled)
  o 128GB of RAM
  o Chelsio T62100-CR (2 × 100Gbps)
  o RHEL 7.3 (4.18.0-rc6 kernel)
• The initiator machines
  o 1× Intel Xeon E5-1620 v4
    § 4-core @ 3.50GHz (HT enabled)
  o 32GB of RAM
  o Chelsio T62100-CR (2 × 100Gbps)
  o RHEL 7.3 (4.18.0-rc6 kernel)
Benchmark: 100G NVMe/TCP – Host Stack TCP vs TOE
[Chart: IOPS (millions, 0–4.0) and throughput (Gbps, 0–100) vs I/O size (4K, 8K, 64K, 256K, 512K), comparing NVMe-oF over TOE vs host-stack TCP for reads and writes]
Summary
• Read throughput is line rate for TOE
• Write throughput is near line rate for TOE
• 2.6M IOPS at 4K I/O size for TOE
• ~12 µs delta latency between local and remote storage
Benchmark: Latency Summary – Delta of Local vs Remote Latencies (µs)
Transport                    Read    Write
NVMe-oF iWARP (SPDK)          5.48    8.98
NVMe-oF iWARP (kernel)        9.49    8.08
NVMe/TCP TOE (SPDK)          14.20   13.60
NVMe/TCP TOE (kernel)        14.08   12.54
NVMe/TCP (host TCP stack)    20.75   16.50
• These numbers are the latest available but remain a work in progress; tuning continues.
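To make the comparison concrete, the sketch below encodes the table and ranks transports by read-side overhead relative to the host TCP stack (the values are copied from the table; the ranking logic is illustrative, not part of the deck):

```python
# Delta of local vs remote latency in microseconds, from the table above:
# transport -> (read delta, write delta).
deltas = {
    "NVMe-oF iWARP (SPDK)":      (5.48,  8.98),
    "NVMe-oF iWARP (kernel)":    (9.49,  8.08),
    "NVMe/TCP TOE (SPDK)":       (14.20, 13.60),
    "NVMe/TCP TOE (kernel)":     (14.08, 12.54),
    "NVMe/TCP (host TCP stack)": (20.75, 16.50),
}

baseline = deltas["NVMe/TCP (host TCP stack)"][0]
for name, (read, write) in sorted(deltas.items(), key=lambda kv: kv[1][0]):
    saved = 1 - read / baseline
    print(f"{name:27s} read +{read:5.2f} µs  ({saved:4.0%} below host stack)")
```

iWARP with SPDK shows the smallest remote-access penalty; even the kernel TOE path roughly halves the overhead of the host TCP stack.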