smb direct, the secret decoder ring · 2015-06-18 · • scatter/gather entries support (reading...

40
SMB Direct, The Secret Decoder Ring E2EVC 2015 BERLIN June 12 th -14 th

Upload: others

Post on 11-Jul-2020

1 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

SMB Direct, The Secret Decoder Ring

E2EVC 2015 BERLINJune 12th-14th

Page 2: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Didier Van HoyeTechnical Architect & Technology Strategist

Microsoft MVP in Hyper-V

MEET Member

DELL TechCenter Rockstar

http://workinghardinit.wordpress.com @workinghardinit

Page 3: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

What?

Why?

WHERE?

How?

Who? When?

Page 4: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

What is SMB DIRECT?

Page 5: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

SMB Direct is SMB over RDMA

So what is RDMA?

Didier Van Hoye - WorkingHardInIT

Direct Memory Access (DMA) allows access (read/write) to the host memory directly, without the intervention of the CPU(s).

Remote Direct Memory Access (RDMA) extends this ability to remote systems.

Page 6: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

What makes RDMA “great”?

Page 7: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

What makes RDMA “great”?

Didier Van Hoye - WorkingHardInIT

Kernel Bypass: applications can perform data transfer directly from user space without the need to perform context switches over the kernel.

Zero-Copy: applications can transfer data from one host to another without use of the software network stack. Data is being read & written, sent/received directly to memory buffers without being copied over the different network layers.

Page 8: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

What makes RDMA “great”?

Didier Van Hoye - WorkingHardInIT

CPU Bypass / Offload• Applications can access remote memory without consuming any host CPU cycles

in the remote machine. The remote memory machine will be read/written without involving remote processes (meaning CPU cycles).

• The caches in the remote CPU(s) won't be filled with the accessed memory content.

Transport protocol acceleration• Message based transactions (TCP is stream based).• Scatter/gather entries support (reading multiple memory buffers and sending

them as one or reading one and writing it to multiple memory buffers).

Page 9: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Didier Van Hoye - WorkingHardInIT

Page 10: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Q:Where did RDMA grow up?

Page 11: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

A: Where the benefits were worth the

premium price

Didier Van Hoye - WorkingHardInIT

1. Low latency

2. High throughput

3. Reduced CPU footprint

4. “lossless” fabrics (you have to provide this)

HPC, Financial transactions, medical imaging, storage, backup, cloud computing, …

Page 12: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Why do I care?

Page 13: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Why is RDMA important to us?

Didier Van Hoye - WorkingHardInIT

1. The BIG surge in East-West traffic due to virtualization induces a performance & scalability hit.

2. Storage over file system shares is also highly demanding.

In other words not using RDMA: Slows us down Loses us money (can’t leverage capabilities of hard & software)

Page 14: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Modern commodity network

hardware …

Didier Van Hoye - WorkingHardInIT

Page 15: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

A modern cloud OS, servers &

storage

Didier Van Hoye - WorkingHardInIT

Page 16: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Flavors of RDMA

Didier Van Hoye - WorkingHardInIT

Type (Cards*) Pros Cons

Non-RDMA Ethernet

(wide variety of NICs)

• TCP/IP-based protocol

• Works with any Ethernet switch

• Wide variety of vendors and models

• Support for in-box NIC teaming

• High CPU Utilization under load• High latency

iWARP

(Chelsio T4/T5)Lo

w C

PU

Uti

lizat

ion

un

der

load

Low

late

ncy

• TCP/IP-based protocol

• Works with any Ethernet switch

• RDMA traffic routable

• Offers up to 40Gbps per NIC port today

• Requires enabling firewall rules

RoCE(Mellanox ConnectX-3,

Mellanox ConnectX-3Pro,Mellanox ConnectX-4)

• Ethernet-based protocol• Works with Ethernet switches with DCB support• Routable RoCEv2 is here• Offers up to 100Gbps per NIC port today

• RoCEv1 non routable• Requires DCB switch with Priority Flow Control (PFC)

InfiniBand(Mellanox ConnectX-3,

Mellanox ConnectX-3Pro,Mellanox ConnectX-4,Mellanox Connect-IB)

• Switches typically less expensive per port• Switches offer high speed Ethernet uplinks• Commonly used in HPC environments• Offers up to 100Gbps per NIC port today

• Not an Ethernet-based protocol• RDMA traffic not routable via IP infrastructure• Requires InfiniBand switches• Requires a subnet manager (typically on the switch)

Page 17: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

What is DCB?

Didier Van Hoye - WorkingHardInIT

• Priority-based Flow Control (PFC). Provides a link-level, flow-control mechanism that can be independently controlled for each priority to ensure zero-loss due to converged-network congestion.

• Quantified Congestion Notification (QCN). Provides end-to-end congestion management for protocols without built-in congestion-control mechanisms. It’s also expected to benefit protocols with existing congestion management by providing more timely reactions to network congestion.

• Enhanced Transmission Selection (ETS). Provides a common management framework for bandwidth assignment to traffic classes.

• Data Center Bridging Exchange Protocol (DCBx). A discovery and capability exchange protocol used to convey capabilities and configurations of the other three DCB features between neighbors to ensure consistent configuration across the network.

Page 18: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

How does PFC work?

Didier Van Hoye - WorkingHardInIT

Page 19: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

How does ETS work?

Didier Van Hoye - WorkingHardInIT

Page 20: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

What about QCN?

Didier Van Hoye - WorkingHardInIT

Page 21: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

What about QCN?

Didier Van Hoye - WorkingHardInIT

• Missing in Windows & a lot of switches for now • Complexity to achieve this across all hops & end to end• Is the lack of end to end congestion notification an issue?• End to end might not be needed (hop based in FC – ingress rate limiting -

does the job, doesn’t work in FCoE due to FCF/Layer 3 MAC rewriting ) … • Apparently not (you won’t hear anyone put that in writing) as RoCE v2 is

now being used in public clouds without problems …• Some state the industry silicon is ready but … the

standard/implementation has to come to many switches still…• Some say it’s a worry, it depends on the design & protocol• Devil in details= layer 2 so not routable ;-)

Page 22: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

What about DCBX?

Didier Van Hoye - WorkingHardInIT

• Missing in Windows for now.• It’s a convenience issue solved by automation of your DCB

configuration (PowerShell).• But convenience is important and I expect Microsoft to look

into this and possibly provide it in future versions of Windows.• The benefit is that the DCB configuration is learned from the

switches.

Page 23: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

How do I configure DCB?

Didier Van Hoye - WorkingHardInIT

Page 24: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Configure OS, Application &

Network

Didier Van Hoye - WorkingHardInIT

• Don’t mix QoS types

• Configure QoS for all payloads

• You need rNICs (RDMA capable NICs)

• Configure your switches end to end (ports, uplinks)

• Configure your host & rNICs

• You must do PFC (this requires tagged VLANson hosts & switches)

• You can do ETS (harder to test & make work in heterogeneous environments)

Page 25: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Didier Van Hoye - WorkingHardInIT

Page 26: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Disable 802.3x flow control (global

pause)

Didier Van Hoye - WorkingHardInIT

FTOS#configure

FTOS(conf)#interface range tengigabitethernet 0/0 -47

FTOS(conf-if-range-te-0/0-47)#no flowcontrol rx on tx on

FTOS(conf-if-range-te-0/0-47)#exit

FTOS(conf)#interface range fortyGigE 0/48 , fortyGigE 0/52

FTOS(conf-if-range-fo-0/48-52)#no flowcontrol rx on tx off

FTOS(conf-if-range-fo-0/48-52)#exit

Page 27: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Enable DCB & Configure VLANs

Didier Van Hoye - WorkingHardInIT

FTOS(conf)#service-class dynamic dot1p

FTOS(conf)#dcb enable

FTOS(conf)#exit

FTOS#copy running-config startup-config

FTOS#reload

FTOS#configure

FTOS(conf)#interface vlan 110

FTOS (conf-if-vl-vlan-id*)#tagged tengigabitethernet 0/0-47

FTOS (conf-if-vl-vlan-id*)#tagged port-channel 3

FTOS (conf-if-vl-vlan-id*)#exit

Page 28: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Create & configure DCB Map Policy

Didier Van Hoye - WorkingHardInIT

FTOS(conf)#dcb-map SMBDIRECT

FTOS(conf-dcbmap-profile-name*)#priority-group 0 bandwidth 90 pfc on

FTOS(conf-dcbmap-profile-name*)#priority-group 1 bandwidth 10 pfc off

FTOS(conf-dcbmap-profile-name*)#priority-pgid 1 1 1 1 0 1 1 1

FTOS(conf-dcb-profile-name*)#exit PFC => IEEE 802.1Qb

Page 29: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Apply DCB map to the switch ports

& uplinks

Didier Van Hoye - WorkingHardInIT

FTOS(conf)#interface range ten 0/0 – 47

FTOS(conf-if-range-te-0/0-47)#dcb-map SMBDIRECT

FTOS(conf-if-range-te-0/0-47)#exit

FTOS(conf)#interface range fortyGigE 0/48 , fortyGigE 0/52

FTOS(conf-if-range-fo-0/48,fo-0/52)#dcb-map SMBDIRECT

FTOS(conf-if-range-fo-0/48,fo-0/52)#exit

FTOS(conf)#exit

FTOS#copy running-config startup-config

Page 30: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Demo Time

Didier Van Hoye - WorkingHardInIT

Page 31: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Jumbo Frames?

Didier Van Hoye - WorkingHardInIT

• Infiniband?– Exists, MTU Size up to 4K.

– Impact on SMB Direct?

• RoCE?– Yes, familiar 1500 MTU size up to 9K (or more depending on your NIC/Swithes).

– Impact on SMB Direct?

– Also useful if you fall back to non RDMA (TCP/IP)

• iWarp? – Yes, familiar 1500 MTU size up to 9K (or more depending on your NIC/Swithes

– Impact on SMB Direct?

– Also same as above, handy during fail back to non RDMA (TCP/IP)!

https://workinghardinit.wordpress.com/2013/11/25/live-migration-can-benefit-from-jumbo-frames/

Page 32: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Jumbo Frames?

Didier Van Hoye - WorkingHardInIT

https://workinghardinit.wordpress.com/2013/11/25/live-migration-can-benefit-from-jumbo-frames/

Page 33: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Configure SMB Direct On The

Windows Host

Didier Van Hoye - WorkingHardInIT

• Infiniband?

– https://technet.microsoft.com/en-us/library/dn583823.aspx

• RoCE?

– https://technet.microsoft.com/en-us/library/dn583822.aspx

• iWarp?

– https://technet.microsoft.com/en-us/library/dn583825.aspx

Page 34: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Prevent live migration starving

storage traffic

Didier Van Hoye - WorkingHardInIT

Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 10GB

Set-SmbBandwidthLimit -Category VirtualMachine -BytesPerSecond 10GB

https://workinghardinit.wordpress.com/2013/09/03/preventing-live-migration-over-smb-starving-csv-traffic-in-windows-server-2012-r2-with-set-smbbandwidthlimit/

Page 35: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

PerfMon is your friend

Didier Van Hoye - WorkingHardInIT

Page 36: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

SMB Multichannel & Direct “Break”

sometimes

Didier Van Hoye - WorkingHardInIT

• Enabling/Disabling RDMA

• Enabling/Disabling Multichannel

• Enabling/Disabling the NICs

• In rare case a firmware upgrade to the switches can require

a reboot of the host(s)

Keep the above in mind when testing & experimenting or trouble shouting. Always start from a clean, known state.

Page 37: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

SMB Direct requires SMB Multichannel

Didier Van Hoye - WorkingHardInIT

Multiple RDMA NICs

SMB Server

SMB Client

Switch10GbE/IB

NIC10GbE/IB

NIC10GbE/IB

Switch10GbE/IB

NIC10GbE/IB

NIC10GbE/IB

Page 38: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

BIOS Power Optimizations still

apply!

Didier Van Hoye - WorkingHardInIT

• Disable C states in BIOS / UEFI

• Disable C1E states in BIOS / UEFI

• Disable PCIe Link Power Management in BIOS, basically set

all power optimizations to max performance

• Optimize memory for speed

• The faster the cards & bus speeds the better …

https://workinghardinit.wordpress.com/2013/06/10/still-need-to-optimizing-power-settings-on-dell-12th-generation-servers-for-lightning-fast-hyper-v-live-migrations/

Page 39: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

Didier Van Hoye - WorkingHardInIT

Page 40: SMB Direct, The Secret Decoder Ring · 2015-06-18 · • Scatter/gather entries support (reading multiple memory buffers and sending them as one or reading one and writing it to

The Agony of Choice …

Didier Van Hoye - WorkingHardInIT

• Infiniband will be around for a long time (just like FC)– What will they’ll do to offset Ethernet speed growth?

strategy seems RoCE but iWarp is giving them a fight for their money.

• Ethernet is likely to grow fast in– New deployments (no infiniband in place)

– > 10Gbps deployments only if price/Gbps drops low enough compared to Infiniband, that’s where Mellanox is outperforming Chelsio (cheap swithes & cards),

• What Flavor Should I use?– RoCE has shown great potential as the official heir to infiniband but must address

concerns around real life loss less routability

complexity of DCB (PFC/ETS) in a heterogeneous world

– iWarp has the advantage of TCP/IP routability & ease of deployment but must address concerns around need of DCB configurations in larger / high performance deployment & the overhead of relying

on the TCP/IP stack

– Remember … there are successful SMB 3 / SOFS / Storage Spaces deployments out there without SMB Direct … (it all depends) but build for the future …