High Performance File Serving with SMB3 and RDMA via SMB Direct Tom Talpey, Microsoft Greg Kramer, Microsoft


Page 1: High Performance File Serving with SMB3 and RDMA via SMB

2012 Storage Developer Conference. © Microsoft Corporation. All Rights Reserved.

High Performance File Serving with SMB3 and RDMA via SMB Direct

Tom Talpey, Microsoft Greg Kramer, Microsoft

Page 2: High Performance File Serving with SMB3 and RDMA via SMB

Protocol

• SMB Direct
  • New protocol supporting SMB 3.0 over RDMA
  • Minimal CPU overhead
  • High bandwidth, low latency
• Fabric agnostic
  • iWARP, InfiniBand, RoCE
  • IP addressing
  • IANA port (smbdirect 5445)

[Figure: a file client (application in user mode above an SMB3 client in the kernel) and a file server (SMB3 server in the kernel above NTFS, SCSI and disk), each with an R-NIC, connected by a network with RDMA support.]

Page 3: High Performance File Serving with SMB3 and RDMA via SMB

Documented

• MS-SMBD: http://msdn.microsoft.com/en-us/library/hh536346.aspx
• MS-SMB2: http://msdn.microsoft.com/en-us/library/cc246482.aspx
• Windows kRDMA API
  • NDKPI: http://msdn.microsoft.com/en-us/library/windows/hardware/jj206456.aspx
  • Part of Windows Driver Kit
  • Network Direct (and Verbs) heritage

Page 4: High Performance File Serving with SMB3 and RDMA via SMB

Implemented

• Windows Server 2012
• SMB 3.0 over SMB Direct supports:
  • Multichannel
  • Continuous availability
  • All other SMB 3.0 features

Page 5: High Performance File Serving with SMB3 and RDMA via SMB

Basics

• SMB Direct is a transport framing
  • Only 3 message types
• 2-way full duplex transport which supports:
  • Datagram-type send/receive exchange
    • With fragmentation/reassembly for “large” messages
  • Direct RDMA Read/Write
• SMB 3.0 binding defines transport use:
  • Client buffer advertisement for READ and WRITE
  • Server RDMA buffer access (push/pull)

Page 6: High Performance File Serving with SMB3 and RDMA via SMB

Use

• Discovery via SMB 3.0 Multichannel
  • “RDMA” attribute of interface
• Negotiated capabilities
  • SMB Direct version
  • Message and RDMA Region sizes
  • Credits
• Messages
• RDMA Read operations (via NDK provider)

Page 7: High Performance File Serving with SMB3 and RDMA via SMB

Three messages

Each layout below is drawn 4 octets wide (Octet 0 | Octet 1 | Octet 2 | Octet 3); one line per row of the diagram.

SMB Direct Negotiate Request (sent once)
  MinVersion | MaxVersion
  Reserved | CreditsRequested
  PreferredSendSize
  MaxReceiveSize
  MaxFragmentedReceiveSize

SMB Direct Negotiate Response (sent once)
  MinVersion | MaxVersion
  NegotiatedVersion | Reserved
  CreditsRequested | CreditsGranted
  Status
  MaxReadWriteSize
  PreferredSendSize
  MaxReceiveSize
  MaxFragmentedReceiveSize

SMB Direct Data Transfer Header (everything else)
  CreditsRequested | CreditsGranted
  Flags | Reserved
  RemainingDataLength
  DataOffset
  DataLength
  Padding
  Data (variable)
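The layouts above can be sketched with Python's `struct` module. This is an illustration, not the MS-SMBD reference encoding: it assumes that two names on one 4-octet row are two 16-bit fields, that a single name on a row is one 32-bit field, and that integers are little-endian. The function names and the sample values are hypothetical.

```python
import struct

# Assumed widths (see lead-in): paired names are 16-bit, single names 32-bit.
NEGOTIATE_REQUEST = struct.Struct("<HHHHIII")        # 20 bytes
NEGOTIATE_RESPONSE = struct.Struct("<HHHHHHIIIII")   # 32 bytes
DATA_TRANSFER_FIXED = struct.Struct("<HHHHIII")      # 20 bytes before Padding


def pack_negotiate_request(min_version, max_version, credits_requested,
                           preferred_send_size, max_receive_size,
                           max_fragmented_receive_size):
    """Build the request that is exchanged once per connection."""
    reserved = 0
    return NEGOTIATE_REQUEST.pack(min_version, max_version, reserved,
                                  credits_requested, preferred_send_size,
                                  max_receive_size, max_fragmented_receive_size)


def pack_data_transfer(credits_requested, credits_granted, flags,
                       remaining_data_length, data, data_offset=24):
    """Build a data transfer message; Padding fills the gap up to DataOffset."""
    if data_offset < DATA_TRANSFER_FIXED.size:
        raise ValueError("DataOffset cannot point inside the fixed header")
    if data:
        padding = b"\x00" * (data_offset - DATA_TRANSFER_FIXED.size)
    else:
        # Assumption: a header-only message (e.g. a credit grant) carries
        # no padding and sets DataOffset to 0.
        data_offset, padding = 0, b""
    fixed = DATA_TRANSFER_FIXED.pack(credits_requested, credits_granted, flags,
                                     0, remaining_data_length, data_offset,
                                     len(data))
    return fixed + padding + data


def unpack_data_transfer(message):
    """Return (header fields as a dict, payload bytes)."""
    names = ("CreditsRequested", "CreditsGranted", "Flags", "Reserved",
             "RemainingDataLength", "DataOffset", "DataLength")
    header = dict(zip(names, DATA_TRANSFER_FIXED.unpack_from(message)))
    start = header["DataOffset"]
    return header, message[start:start + header["DataLength"]]
```

With these assumptions the fixed data transfer fields occupy 20 bytes, and 4 bytes of padding bring the payload to offset 24, which matches the 24-byte header used in the send-transfer example later in the deck.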

Page 8: High Performance File Serving with SMB3 and RDMA via SMB

Transfers

• Send/Receive model
  • Single logical message
  • Possibly sent as fragmentation “train”
    • Using ordering properties of RDMA
  • Implements crediting
  • All SMB 3.0 operations use this
• Direct placement model
  • Advertises RDMA regions in scatter/gather list
  • SMB 3.0 uses for SMB2_READ and SMB2_WRITE only
    • Piggyback on existing “Channel”

Page 9: High Performance File Serving with SMB3 and RDMA via SMB

Send transfers

A 2048-byte SMB3 message sent as a train of three fragments:

Send | Contents | DataOffset | DataLength | RemainingDataLength
Send 0 | SMB Direct HDR (24 bytes) + SMB3 message bytes 0 - 999 | 24 | 1000 | 1048
Send 1 | SMB Direct HDR (24 bytes) + SMB3 message bytes 1000 - 1999 | 24 | 1000 | 48
Send 2 | SMB Direct HDR (24 bytes) + SMB3 message bytes 2000 - 2047 | 24 | 48 | 0
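The fragment arithmetic on this slide can be reproduced with a short sketch. The helper names `fragment` and `reassemble` are hypothetical, and the 1000-byte payload limit is simply the value used in the slide's example, not a protocol constant.

```python
def fragment(message, max_payload):
    """Split one logical message into (DataLength, RemainingDataLength, chunk) sends."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    sends = []
    for start in range(0, len(message), max_payload):
        chunk = message[start:start + max_payload]
        remaining = len(message) - start - len(chunk)
        sends.append((len(chunk), remaining, chunk))
    return sends


def reassemble(sends):
    """Concatenate in-order fragments; the train ends when RemainingDataLength is 0."""
    buffer = bytearray()
    for data_length, remaining, chunk in sends:
        if data_length != len(chunk):
            raise ValueError("DataLength does not match the fragment payload")
        buffer += chunk
        if remaining == 0:
            return bytes(buffer)
    raise ValueError("train ended before RemainingDataLength reached 0")


message = bytes(range(256)) * 8             # 2048 bytes, as on the slide
train = fragment(message, max_payload=1000)
print([(length, remaining) for length, remaining, _ in train])
# [(1000, 1048), (1000, 48), (48, 0)]
```

The receiver needs no sequence numbers in this sketch because, as the Transfers slide notes, the train relies on the ordering properties of RDMA sends.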

Page 10: High Performance File Serving with SMB3 and RDMA via SMB

SMB3 Reads and Writes

Each layout below is drawn 4 octets wide (Octet 0 | Octet 1 | Octet 2 | Octet 3); one line per row of the diagram. Callout on the slide: previously reserved fields.

SMB3 WRITE REQUEST
  StructureSize | DataOffset
  Length
  Offset
  FileId
  Channel
  RemainingBytes
  WriteChannelInfoOffset | WriteChannelInfoLength
  Flags
  Buffer (variable)

SMB3 READ REQUEST
  StructureSize | Padding | Reserved
  Length
  Offset
  FileId
  MinimumCount
  Channel
  RemainingBytes
  ReadChannelInfoOffset | ReadChannelInfoLength
  Flags
  Buffer (variable)

Channel array (one entry per advertised buffer)
  Address
  Token
  Length
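A channel array entry can be sketched as a fixed-size record. The widths here are assumptions in the spirit of the MS-SMBD buffer descriptor (a 64-bit Address, a 32-bit Token and a 32-bit Length, little-endian); the helper names and sample values are hypothetical.

```python
import struct

# Assumed entry layout: Address (64-bit), Token (32-bit), Length (32-bit).
DESCRIPTOR = struct.Struct("<QII")   # 16 bytes per entry


def pack_channel_array(regions):
    """Serialize a scatter/gather list of (address, token, length) tuples."""
    return b"".join(DESCRIPTOR.pack(address, token, length)
                    for address, token, length in regions)


def unpack_channel_array(blob):
    """Parse the bytes referenced by the ChannelInfoOffset/Length fields."""
    if len(blob) % DESCRIPTOR.size:
        raise ValueError("channel array is not a whole number of descriptors")
    return [DESCRIPTOR.unpack_from(blob, offset)
            for offset in range(0, len(blob), DESCRIPTOR.size)]


# Two registered regions that together advertise 128 KiB of client memory.
regions = [(0x7F00_0000_0000, 0x1234, 65536), (0x7F00_0001_0000, 0x1235, 65536)]
blob = pack_channel_array(regions)
```

In this sketch the resulting blob would be placed in the request buffer, with its position and size recorded in the WriteChannelInfoOffset/WriteChannelInfoLength (or Read equivalents) fields shown above.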

Page 11: High Performance File Serving with SMB3 and RDMA via SMB

RDMA transfers

[Figure: message flows between client and server.]

SMB Direct WRITE
  1. Client to server, Send: SMB Direct HDR + SMB3 HDR + SMB3 WRITE REQ + MEMORY DESCRIPTORS
  2. Server, RDMA Read: DATA
  3. Server to client, Send: SMB Direct HDR + SMB3 HDR + SMB3 WRITE RESP

SMB Direct READ
  1. Client to server, Send: SMB Direct HDR + SMB3 HDR + SMB3 READ REQ + MEMORY DESCRIPTORS
  2. Server, RDMA Write: DATA
  3. Server to client, Send: SMB Direct HDR + SMB3 HDR + SMB3 READ RESP

Page 12: High Performance File Serving with SMB3 and RDMA via SMB

Credits

• Bi-directional
• Count of ready receive buffers offered
• Dynamic: can increase or decrease at any time
  • Optional to do so
• Used only to control low-level SMBD message exchanges
• Recycled independently of SMB operations
• Relatively small number required (hundreds, even for deep random workloads)

Page 13: High Performance File Serving with SMB3 and RDMA via SMB

Quirks

• Interesting corner cases
  • “Last credit”
    • Always need 1 in each endpoint to avoid deadlock (but see details in spec!)
  • Bi-directional: no requirement for same both ways
  • Async/Cancel/Errors
    • No reply, multiple reply, unexpected large reply
  • NOT an RPC-like interface, much as it may resemble one
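The “last credit” corner case can be illustrated with a toy credit tracker. This is one simplified reading of the slide, not the exact rule in MS-SMBD: the sender here refuses to spend its final send credit unless the message also grants at least one credit to the peer, so neither side is left unable to hand out new credits. The class and method names are hypothetical.

```python
class CreditState:
    """Per-connection credit bookkeeping for one endpoint (simplified)."""

    def __init__(self, send_credits, unadvertised_receives=0):
        self.send_credits = send_credits                    # granted to us by the peer
        self.unadvertised_receives = unadvertised_receives  # posted, not yet granted

    def try_send(self):
        """Consume one send credit; return CreditsGranted for the message, or None."""
        grant = self.unadvertised_receives
        if self.send_credits == 0:
            return None
        if self.send_credits == 1 and grant == 0:
            return None   # hold the last credit until it can carry a grant
        self.send_credits -= 1
        self.unadvertised_receives = 0
        return grant

    def on_receive(self, credits_granted, receives_reposted=1):
        """Account for an incoming message and the receive buffer reposted for it."""
        self.send_credits += credits_granted
        self.unadvertised_receives += receives_reposted
```

Because credits count receive buffers rather than SMB operations, the two directions are tracked independently and need not be symmetric, as the slide points out.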

Page 14: High Performance File Serving with SMB3 and RDMA via SMB

Efficiency

• True bi-directional and streaming sends
• Can be exposed as sockets-like interface
  • With register/unregister/RDMA rw extensions
• RDMA operations / completions
• Datamover offload to RNIC
• Server “pull” model improves performance
• Many options for RDMA efficiency
  • FRMR, silent completions, coalescing, etc.
• Resources bounded by credits and sizes

Page 15: High Performance File Serving with SMB3 and RDMA via SMB

Performance

Page 16: High Performance File Serving with SMB3 and RDMA via SMB

SDC 2011 performance results

[Figure: two machines connected through an InfiniBand switch. Nehalem: 1 socket x 4 cores @ 2.26 GHz. Westmere: 2 sockets x 6 cores @ 2.66 GHz. Two storage sets are shown, each RAID 0 across 12 SSDs.]

• Single 32 Gbps InfiniBand link
• 160,000 IOPS (1 KiB random reads)
• 3200 MiB/sec (512 KiB sequential reads)

Page 17: High Performance File Serving with SMB3 and RDMA via SMB

Current performance results

[Figure: a file client (SMB 3.0) running SQLIO and a file server (SMB 3.0), connected through RDMA NICs (four shown). On the server, two SAS HBAs each attach over SAS to a JBOD of eight SSDs; the disks sit under Storage Spaces and NTFS.]

Link shown on the slide: http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=20163

Page 18: High Performance File Serving with SMB3 and RDMA via SMB

Current performance results…

512 KiB sequential IO (server fully utilized)
Avg. MB/sec* | Avg. IOs/sec (512 KiB) | Avg. %CPU (client) | Avg. latency (ms)
7,340 | ~14K | 8.6 | 1
sqlio2.exe -T100 -t2 -s60 -b512 -o4 -fsequential -BN -LS (1 file per volume)

8 KiB random IO (server fully utilized)
Avg. MB/sec* | Avg. IOs/sec (8 KiB) | Avg. %CPU (client) | Avg. latency (ms)
3,711 | ~453K | 60 | < 1
sqlio2.exe -T100 -t16 -s60 -b8 -o4 -frandom -BN -LS (four files per volume)

* 1 MB = 1,000,000 bytes
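As a consistency check, the IOs/sec figures follow from the MB/sec figures once the units are lined up: throughput is reported in decimal megabytes while IO sizes are in KiB. The function name below is hypothetical.

```python
MB = 1_000_000          # the slide's footnote: 1 MB = 1,000,000 bytes
KIB = 1024


def ios_per_second(mb_per_sec, io_size_kib):
    """Convert a throughput in MB/sec into IOs/sec for a fixed IO size."""
    return mb_per_sec * MB / (io_size_kib * KIB)


sequential = ios_per_second(7_340, 512)   # about 14.0K, matching "~14K"
random_8k = ios_per_second(3_711, 8)      # about 453K, matching "~453K"
print(f"{sequential:,.0f} IOs/sec, {random_8k:,.0f} IOs/sec")
```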

Page 19: High Performance File Serving with SMB3 and RDMA via SMB

Let’s take it to 11!

[Figure: the same file client (SMB 3.0, running SQLIO) and file server (SMB 3.0, Storage Spaces, NTFS) as before, scaled up: six RDMA NICs are shown, and six SAS HBAs each attach over SAS to a JBOD of eight SSDs.]

Page 20: High Performance File Serving with SMB3 and RDMA via SMB

Let’s take it to 11 16!

Avg. MB/sec* | Avg. IOs/sec (512 KiB) | Avg. %CPU (client) | Avg. latency (ms)
16,253 | ~31K | 15 | 1
sqlio2.exe -T100 -t2 -s60 -b512 -o4 -fsequential -BN -LS (1 file per volume)

16 GigaBYTES (not bits) of storage throughput!

* 1 MB = 1,000,000 bytes

Page 21: High Performance File Serving with SMB3 and RDMA via SMB

NUMA effects on performance

• At these speeds, NUMA effects cannot be ignored
• To achieve peak performance, the SMB3 / SMB Direct stack must avoid cross-NUMA node memory accesses whenever possible.

Test case | Avg. MB/sec* | Avg. IOs/sec (8 KiB) | Avg. %CPU (client) | Avg. latency (ms)
NUMA aware multichannel dispatcher | 3,711 | 453K | 60 | < 1
NUMA unaware multichannel dispatcher | 3,719 | 454K | 76 | < 1

sqlio2.exe -T100 -t16 -s60 -b8 -o4 -frandom -BN -LS (four files per volume)

* 1 MB = 1,000,000 bytes
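Throughput is nearly identical in the two test cases, so the NUMA effect shows up as client CPU spent per IO. A small worked calculation, using only the numbers in the table (the function name is hypothetical):

```python
def cpu_percent_per_kiops(cpu_percent, kiops):
    """Client CPU (in percent of the machine) spent per thousand IOs/sec."""
    return cpu_percent / kiops


aware = cpu_percent_per_kiops(60, 453)      # NUMA aware dispatcher
unaware = cpu_percent_per_kiops(76, 454)    # NUMA unaware dispatcher
extra_cost = unaware / aware - 1
print(f"NUMA-unaware dispatch costs {extra_cost:.0%} more client CPU per IO")
```

On these figures the unaware dispatcher spends roughly a quarter more client CPU for the same work.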

Page 22: High Performance File Serving with SMB3 and RDMA via SMB

NUMA and SMB3 Multichannel

• SMB3 Multichannel can be used to improve performance on NUMA systems
  • SMB3 session is split across multiple channels
  • Channels affinitized to a set of NUMA nodes
  • Client dispatches IO requests to maximize performance and minimize cross NUMA node memory accesses
• One example of how the Windows Server 2012 SMB3 / SMB Direct stack has been optimized for high performance on NUMA systems
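The dispatch idea can be sketched as follows. This is a toy model, not the Windows implementation: the `Channel` and `NumaAwareDispatcher` names, the round-robin policy within a node, and the fallback to any channel are all assumptions made for illustration.

```python
from itertools import cycle


class Channel:
    """One SMB3 channel, affinitized to a NUMA node (hypothetical model)."""

    def __init__(self, name, numa_node):
        self.name = name
        self.numa_node = numa_node


class NumaAwareDispatcher:
    """Prefer a channel on the issuing thread's node; otherwise use any channel."""

    def __init__(self, channels):
        by_node = {}
        for channel in channels:
            by_node.setdefault(channel.numa_node, []).append(channel)
        # Round-robin among the channels of each node, and among all channels.
        self._local = {node: cycle(group) for node, group in by_node.items()}
        self._any = cycle(list(channels))

    def pick(self, issuing_node):
        local = self._local.get(issuing_node)
        return next(local) if local is not None else next(self._any)


channels = [Channel("a0", 0), Channel("a1", 0), Channel("b0", 1)]
dispatcher = NumaAwareDispatcher(channels)
```

A request issued on node 0 alternates between `a0` and `a1`, a request on node 1 always uses `b0`, and a node with no affinitized channel falls back to the global rotation.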

Page 23: High Performance File Serving with SMB3 and RDMA via SMB

That’s great! Now what?

• Are there simple improvements we could make to the SMB Direct protocol?
• Goals:
  • Ease of implementation
  • Increase IOPS
  • Decrease latency
  • Decrease CPU utilization

Page 24: High Performance File Serving with SMB3 and RDMA via SMB

Where can we reduce IO costs?

[Sequence diagram; participants: App, SMB Client, Client RNIC, Server RNIC. Operations shown for one read:
• ReadFile()
• Register buffer, Register status
• Send SMB request, Send status
• RDMA write data
• Send SMB response
• Invalidate registration, Invalidate status
• ReadFile() status
Callout: “Consumes CPU cycles”.]

Aggressive invalidation:
• Consumes CPU cycles
• Consumes RNIC/bus cycles
• Increases interrupts/sec
• Increases IO latency

Page 25: High Performance File Serving with SMB3 and RDMA via SMB

Why aggressively invalidate?

• Application will likely reuse same buffers for subsequent IO requests.
• Why not cache and reuse buffer registrations?
  • Peer can RDMA write after IO has completed
    • Data corruption / system crash / connection loss
  • Peer can RDMA read after IO has completed
    • Data leak / connection loss
• Registration caches are not robust enough for storage and enterprise server applications.

Page 26: High Performance File Serving with SMB3 and RDMA via SMB

Why aggressively invalidate?

• Invalidation provides strict correctness guarantees with respect to data:
  • Data is in a consistent state following DMA
    • Application can safely access its data
  • Peer no longer has access to the region
    • No data corruption, crashes, or leaks due to peer-initiated RDMA operations
• Aggressive invalidation is a necessary expense, but we might be able to reduce its cost…

Page 27: High Performance File Serving with SMB3 and RDMA via SMB

Use Send with Invalidate?

[Sequence diagram; participants: App, SMB Client, Client RNIC, Server RNIC. Operations shown for one read:
• ReadFile()
• Register buffer, Register status
• Send SMB request, Send status
• RDMA write data
• Send SMB response with token to invalidate
• RNIC invalidates registration before indicating received data
• ReadFile() status
Callout: “Consumes CPU cycles”.]

Page 28: High Performance File Serving with SMB3 and RDMA via SMB

Benefits of send with invalidate...

• Reduces RNIC work requests by 1/3rd for small IOs (IOs that require one memory descriptor)
  • Fewer CPU cycles
  • Fewer RNIC/bus cycles
  • Fewer interrupts
  • Lower IO latency
• Already supported by major RDMA standards
  • iWARP
  • InfiniBand
  • RoCE
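The one-third figure follows from counting the client-side work requests in the two sequence diagrams. The model below is an assumption built from those diagrams: each memory descriptor costs one registration and one invalidation, each IO costs one send, and a send with invalidate removes exactly one local invalidation per IO. The function names are hypothetical.

```python
from fractions import Fraction


def client_work_requests(descriptors, server_invalidates):
    """Client RNIC work requests per IO under the assumed model."""
    registers = descriptors
    sends = 1
    # Assumption: one send with invalidate retires a single registration.
    invalidates = descriptors - 1 if server_invalidates else descriptors
    return registers + sends + invalidates


def reduction(descriptors):
    """Fraction of work requests saved when the server invalidates remotely."""
    baseline = client_work_requests(descriptors, server_invalidates=False)
    improved = client_work_requests(descriptors, server_invalidates=True)
    return Fraction(baseline - improved, baseline)


print(reduction(1))   # 1/3 for an IO that needs one memory descriptor
print(reduction(4))   # 1/9 under the same assumptions for four descriptors
```

Under this model the benefit shrinks as an IO needs more descriptors, which is consistent with the slide limiting the claim to small IOs.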

Page 29: High Performance File Serving with SMB3 and RDMA via SMB

Benefits of send with invalidate…

• No change to SMB Direct protocol
  • Make send with invalidate an optional feature. Client continues to invalidate the buffer if the server does not.
• Minimal change to SMB3 protocol
  • SMB3 read/write request indicates when the server is requested to invalidate a request’s memory descriptor via the server’s response.
• Not a committed plan (investigation only)
  • Feedback?
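The optional nature of the feature implies a client-side fallback, which can be sketched as below. The function and parameter names are hypothetical; in a real stack the remotely invalidated token would be reported by the RNIC's receive completion, which is not modeled here.

```python
def tokens_to_invalidate_locally(registered_tokens, remotely_invalidated_token=None):
    """Return the tokens the client must still invalidate before completing the IO.

    registered_tokens: tokens advertised in the request's memory descriptors.
    remotely_invalidated_token: token the server's send invalidated, or None
    if the server answered with a plain send.
    """
    return [token for token in registered_tokens
            if token != remotely_invalidated_token]


# Server used send with invalidate: nothing left to do for a one-descriptor IO.
print(tokens_to_invalidate_locally([0x1234], remotely_invalidated_token=0x1234))
# Server did not: the client invalidates as it does today.
print(tokens_to_invalidate_locally([0x1234]))
```

Either way the application only sees its IO complete after every registration for that IO is invalid, so the correctness guarantees described earlier are preserved.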

Page 30: High Performance File Serving with SMB3 and RDMA via SMB

Summary

• SMB3 and SMB Direct allow Windows Server 2012 to efficiently host enterprise application workloads.
• SMB3 / SMB Direct protocols could be enhanced in simple ways to further improve performance.
  • Increase IOPS
  • Decrease CPU overhead
  • Decrease latency

Page 31: High Performance File Serving with SMB3 and RDMA via SMB

Questions?

http://smb3.info