
2014 Storage Developer Conference. © Chelsio Communications, Inc. All Rights Reserved.

NFS/RDMA over 40Gbps iWARP

Wael Noureddine

Chelsio Communications


Outline

- RDMA
  - Motivating trends
  - iWARP
- NFS over RDMA
  - Overview
  - Chelsio T5 support
  - Performance results


Adoption Rate of 40GbE

[Chart: 40GbE adoption rate. Source: Crehan Research, 2Q14 Crehan Quarterly Market Share Tables]


Software Defined Everything

[Figure source: European Telecommunications Standards Institute, NFV white paper, http://portal.etsi.org/nfv/nfv_white_paper.pdf, October 2012]


Motivating Trends

- Unprecedented curve in 40GbE growth (and pricing)
- Consolidation and virtualization
- Software defined storage (everything) using commodity hardware
- Rise of the data center
- Power efficiency
- High speed, ultra low latency SSDs
- Need for a high performance, high efficiency fabric
  - Ethernet remains the preferred technology
  - TCP/IP for scalability, reliability, robustness and reach
  - iWARP: RDMA over Ethernet


RDMA Overview

- Direct memory-to-memory transfer
- All protocol processing handled by the NIC (must be in hardware)
- Protection handled by the NIC (user space access requires both local and remote enforcement)
- Asynchronous communication model: reduced host involvement
- Performance
  - Latency: polling
  - Throughput
- Efficiency
  - Zero copy
  - Kernel bypass (user space I/O)
  - CPU bypass

[Diagram: two hosts with T5 adapters connected over a wireless/LAN/datacenter/WAN network. The NIC handles protection and protocol processing, moving payload directly between host memories and posting notifications to completion queues (CQs) on each host.]

Performance and efficiency in return for a new communication paradigm
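The asynchronous model above can be sketched as a toy completion-queue loop. This is an illustration of the programming pattern only: the names `post_send` and `poll_cq` echo the RDMA verbs API in spirit, not in signature, and no real hardware is involved.

```python
from collections import deque

# Toy model of the asynchronous RDMA pattern described above: the host
# posts work requests and later polls a completion queue (CQ), instead
# of blocking in the kernel for each transfer. Hypothetical names; this
# mimics the model, not the real verbs API.
class ToyQueuePair:
    def __init__(self):
        self.cq = deque()  # completion queue

    def post_send(self, buf):
        # A real NIC would DMA 'buf' from registered memory and complete
        # it later; the toy model completes the work request immediately.
        self.cq.append(("SEND_OK", len(buf)))

    def poll_cq(self):
        # Non-blocking poll: a completion if one is ready, else None.
        return self.cq.popleft() if self.cq else None

qp = ToyQueuePair()
qp.post_send(b"payload")                 # asynchronous post, returns at once
while (wc := qp.poll_cq()) is None:
    pass                                 # latency-sensitive apps spin-poll here
print(wc)                                # -> ('SEND_OK', 7)
```

Real verbs code additionally registers memory regions up front so the NIC can DMA payloads directly, which is what makes the zero-copy and CPU-bypass properties on this slide possible.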


iWARP: RDMA over Ethernet

- Open standard
  - IETF RFCs in 2007
  - Multiple vendors
- Ongoing standardization
  - Extensions to maintain API uniformity with InfiniBand
  - Recent RFC 7306 by Broadcom, Chelsio and Intel
- Mature stack
  - 3rd generation hardware
- RDMA over TCP/IP/Ethernet
  - TCP reliability, scalability, congestion and flow control
  - IP routability
  - Ethernet ubiquity
- Wireless ready
  - Near 10Gbps, low latency
- Cloud ready
  - Standard TCP/IP foundation
  - No network restrictions
- Full featured implementation
  - All RDMA benefits
- High performance
  - High packet rate
  - Low latency (1.5 usec user-to-user)
  - Line rate 40Gb with a single connection


iWARP Benefits

- Convergence
  - Coexists with all other traffic on the same port
  - No special treatment needed
- Familiar protocol stack
  - Standard tools for monitoring/debugging
  - Standard network function appliances (security, load balancing…)
- Plug-and-play
  - No need for lossless network operation
  - Leverages existing infrastructure
  - Less expensive network hardware
  - Easy to deploy and manage
- Leverages decades of TCP/IP experience
  - Congestion avoidance and control, critical for network stability
  - Reliability at hardware speeds: retransmission and re-ordering
- Routable: goes wherever IP is spoken
- Scalable across network size, network architecture and distance

Reliable, robust, scalable


Linux RDMA Architecture

[Diagram: the OpenFabrics stack. InfiniBand HCAs and iWARP R-NICs sit beneath hardware-specific drivers and a common mid-layer: kernel- and user-level OpenFabrics verbs/APIs, the Connection Manager Abstraction (CMA) over the InfiniBand and iWARP connection managers, and InfiniBand management components (MAD, SMA, PMA, SA). Upper layer protocols include NFS-RDMA/RPC, cluster file systems, SDP, IPoIB, SRP, iSER, RDS and uDAPL, plus user-space libraries (SDP lib, user-level MAD API, OpenSM, diagnostic tools). These serve applications and access methods ranging from various MPIs and clustered database access to block storage, file system, sockets-based and IP-based access, with kernel bypass available from user space.]


NFS over RDMA Timeline

- NetApp/Sun, 2007
- IETF RFCs
  - RFC 5532: problem statement, 2009
  - RFC 5666: RDMA for RPC, 2010
  - RFC 5667: NFS DDP, 2010
- Renewed effort with the rise in RDMA interest
- Under active development (mostly client side) by Chelsio, Emulex, Intel, LANL, Mellanox, NASA, NetApp, Oracle…


NFS over RDMA Overview

- NFS extensions to use an RDMA fabric (for NFSv2, 3, 4)
- Client sends RPCs in RDMA messages
- Server initiates RDMA data transfer transactions
- Reduces client side CPU utilization
- Eliminates client side data copies
- Leverages the low latency fabric
- Requires a NIC with RDMA offload at both the server and client ends
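As a concrete sketch of deployment: the Linux kernel's nfs-rdma documentation of this era describes enabling the RPC/RDMA transport roughly as below. The module names, the 20049 port number and the /proc path follow that documentation; `server:/share` and `/mnt` are placeholders, and exact steps may differ by distribution and kernel version.

```shell
# Server side: load the NFS/RDMA server transport and tell nfsd to
# listen for RDMA connections on the standard NFS/RDMA port (20049).
modprobe svcrdma
echo "rdma 20049" > /proc/fs/nfsd/portlist

# Client side: load the RPC/RDMA client transport and mount the export
# over RDMA ("server" and /share are placeholders for the real export).
modprobe xprtrdma
mount -t nfs -o proto=rdma,port=20049 server:/share /mnt
```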


NFS Client Stack

[Diagram: the NFS client sits on the RPC transport switch, which selects either the TCP/IP or UDP/IP RPC path through the host stack and network device driver, or the RDMA RPC path through RDMA verbs, the RDMA CM (with IB CM and iW CM) and the RDMA driver. On the T5 NIC, RDMA offload runs on top of TCP offload, alongside the TCP offload module.]


Chelsio T5 Ethernet Controller ASIC

- Single processor, data-flow pipelined architecture
- Up to 1M connections
- Concurrent multi-protocol operation
- Full TCP/IPv4|IPv6 offload in 4 clocks @ 500MHz

[Block diagram: 1G/10G/40G and 100M/1G/10G MACs feed a lookup, filtering and firewall block, cut-through RX/TX memories, a data-flow protocol engine, traffic manager and application co-processors (TX and RX), with an embedded Layer 2 Ethernet switch, DMA engine, PCIe x8 Gen 3 interface, general purpose processor, on-chip DRAM memory controller and optional external DDR3 memory.]


T5 Storage Protocol Support

[Diagram: NFS/RPC, iSER with its lower layer driver, iSCSI, SMB with SMB Direct (NDK), FCoE and NVMe stacks map onto the network, RDMA, iSCSI and FCoE drivers. The T5 network controller provides NIC, RDMA offload, TCP offload, iSCSI offload, FCoE offload and T10-DIX support.]


Test Configuration

Clients (x4):
- OS: RHEL 6.5
- Kernel: 3.16.0, NFSv4 + latest NFS/RDMA fixes
- Processor: Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz
- Number of processors: 2
- Total cores: 16 (HT disabled)
- RAM: 64 GB
- Card type: T580-CR
- Card core clock: 500MHz

Server:
- OS: RHEL 6.1
- Kernel: 3.16.0, NFSv4 + latest NFS/RDMA fixes
- Processor: Intel(R) Xeon(R) CPU E5-2687W @ 3.10GHz
- Number of processors: 2
- Total cores: 16 (HT disabled)
- RAM: 64 GB
- Card type: T580-CR
- Card core clock: 500MHz
- Share: 32GB ramdisk with ext2 filesystem

Notes:
- Clients connected through a switch to the server, with all links at 40Gbps
- Sequential direct I/O (no buffer caching)
- OFED 3.12+ needed for 40G iWARP support

[Diagram: four clients connect over 40Gb links through a 40Gb switch to the NFS/RDMA server.]
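A quick sanity check on the scale of the throughput charts that follow: a 40Gbps link carries at most about 5000 MB/s, so curves approaching that value are effectively at line rate (actual goodput is slightly lower due to protocol overhead).

```python
# Ceiling for the throughput charts below: convert the 40Gbps line
# rate to decimal megabytes per second (ignoring protocol overhead).
line_rate_bps = 40e9                     # 40 Gbps link
max_mb_per_s = line_rate_bps / 8 / 1e6   # bits -> bytes -> MB
print(round(max_mb_per_s))               # -> 5000
```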


NFS Write – iWARP vs. L2 NIC

[Chart: write throughput in MB/sec vs. I/O size (4KB–4096KB) for RDMA and NIC; y-axis 0–5000 MB/sec.]


NFS Write Client Ints/sec – iWARP vs. L2 NIC

[Chart: client interrupts/sec during writes vs. I/O size (4KB–4096KB) for RDMA and NIC clients; y-axis 0–50000.]


NFS Read – iWARP vs. L2 NIC

[Chart: read throughput in MB/sec vs. I/O size (4KB–4096KB) for RDMA and NIC; y-axis 0–5000 MB/sec.]


NFS Read Client Ints/sec – iWARP vs. L2 NIC

[Chart: client interrupts/sec during reads vs. I/O size (4KB–4096KB) for RDMA and NIC clients; y-axis 0–50000.]


NFS Write – iWARP vs. InfiniBand

[Chart: write throughput in MB/sec vs. I/O size (4KB–4096KB) for iWARP (IW) and InfiniBand (IB); y-axis 0–5000 MB/sec.]

Setup: RHEL 6.4; NFS share: 40GB ramdisk, ext2 file system; kernel 3.16.0 + NFSv4 + latest NFSRDMA/cxgb4 fixes, default settings; CPU: Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz, 64GB RAM, 2 CPUs, 16 cores total, no HT; iWARP HW: Chelsio Communications Inc T580-LP-CR Unified Wire Ethernet Controller; IB HW: Mellanox Technologies MT27500 Family [ConnectX-3] FDR


NFS Write – iWARP vs. FDR InfiniBand

[Chart: idle CPU percentage during writes vs. I/O size (4KB–4096KB) for the iWARP and InfiniBand server and clients (IW-Srv, IB-Srv, IW-Clis, IB-Clis); y-axis 0–100%.]


NFS Read – iWARP vs. InfiniBand

[Chart: read throughput in MB/sec vs. I/O size (4KB–4096KB) for iWARP (IW) and InfiniBand (IB); y-axis 0–5000 MB/sec.]


NFS Read – iWARP vs. InfiniBand

[Chart: idle CPU percentage during reads vs. I/O size (4KB–4096KB) for the iWARP and InfiniBand server and clients (IW-Srv, IB-Srv, IW-Clis, IB-Clis); y-axis 0–100%.]


Conclusions

- RDMA fabrics offer potential for improved efficiency
  - SMB v3.0's RDMA transport demonstrated significant gains
- Renewed interest in NFS/RDMA
  - Work in progress
  - Performance benefits compared to a plain NIC
- iWARP RDMA is shipping at 40Gbps
  - A high performance Ethernet alternative to InfiniBand
- Chelsio adapters enable simultaneous operation of RDMA, NIC, TOE, iSCSI, FCoE…
  - TCP/IP for wireless, LAN, datacenter and cloud networking
  - Remains “a great all-in-one adapter”*
- Call to action
  - Contribute to RDMA and NFS/RDMA in Linux
  - Mailing lists: linux-rdma and linux-nfs on vger.kernel.org

* Helen Chen et al., OFA NFS/RDMA presentation, 2007


Thank You

Please visit www.chelsio.com for more info
