
High Throughput File Servers with SMB Direct, Using the 3 Flavors of RDMA network adapters

Jose Barreto, Principal Program Manager

Microsoft Corporation

Abstract

In Windows Server 2012, we introduce the "SMB Direct" protocol, which allows file servers to use high-throughput, low-latency RDMA network interfaces.

However, there are three distinct flavors of RDMA, each with its own requirements, advantages, pros and cons.

In this session, we'll look at iWARP, InfiniBand and RoCE and outline the differences between them. We'll also list the vendors that offer each technology and provide step-by-step instructions for anyone planning to deploy them.

The talk will also include an update on RDMA performance and a customer case study.

Summary

• Overview of SMB Direct (SMB over RDMA)

• Three flavors of RDMA

• Setting up SMB Direct

• SMB Direct Performance

• SMB Direct Case Study

SMB Direct (SMB over RDMA)

• New class of SMB file storage for the Enterprise
  – Minimal CPU utilization for file storage processing
  – Low latency and ability to leverage high-speed NICs
  – Fibre Channel-equivalent solution at a lower cost

• Traditional advantages of SMB file storage
  – Easy to provision, manage and migrate
  – Leverages converged network
  – No application change or administrator configuration

• Required hardware
  – RDMA-capable network interface (R-NIC)
  – Support for iWARP, InfiniBand and RoCE

• Uses SMB Multichannel for Load Balancing/Failover

[Diagram: SMB Direct architecture. On the file client: the application, SMB client (user/kernel boundary) and an R-NIC. On the file server: the SMB server, NTFS/SCSI, disk and an R-NIC. The two R-NICs are connected by a network with RDMA support.]
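Since SMB Direct rides on top of SMB Multichannel, a quick way to confirm that a deployment is actually taking the RDMA path is to query the client with a few in-box PowerShell cmdlets. A minimal sketch (Windows Server 2012 cmdlets; run after accessing a share on the file server):

```powershell
# List local NICs that report RDMA capability and whether RDMA is enabled
Get-NetAdapterRdma

# Show the interfaces the SMB client sees, including their RDMA capability
Get-SmbClientNetworkInterface

# After opening a file on the share, list the interfaces SMB Multichannel selected
Get-SmbMultichannelConnection | Format-List *
```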

What is RDMA?

• Remote Direct Memory Access Protocol
  – Accelerated IO delivery model that works by allowing application software to bypass most layers of software and communicate directly with the hardware

• RDMA benefits
  – Low latency
  – High throughput
  – Zero copy capability
  – OS / stack bypass

• RDMA hardware technologies
  – InfiniBand
  – iWARP: RDMA over TCP/IP
  – RoCE: RDMA over Converged Ethernet

SMB Direct

[Diagram: A client and a file server, each with an RDMA NIC. The SMB client and SMB server talk over SMB Direct via the NDKPI interface, and the RDMA NICs transfer data directly between the two machines' memory over Ethernet or InfiniBand.]

1. Application (Hyper-V, SQL Server) does not need to change.
2. SMB client makes the decision to use SMB Direct at run time.
3. NDKPI provides a much thinner layer than TCP/IP.
4. Remote Direct Memory Access is performed by the network interfaces.
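Because the SMB client makes this decision at run time, an easy way to compare SMB over TCP/IP and SMB Direct on the same hardware is to toggle RDMA on the interface between runs. A minimal sketch, assuming an adapter alias of "RDMA1" (a placeholder name); SMB may need to re-establish its connections before the change takes effect:

```powershell
# Force SMB to fall back to TCP/IP by disabling RDMA on the interface
Disable-NetAdapterRdma -Name "RDMA1"

# ...run the workload over TCP/IP, then restore RDMA so SMB Direct is used again
Enable-NetAdapterRdma -Name "RDMA1"

# Confirm the current RDMA state of the interface
Get-NetAdapterRdma -Name "RDMA1"
```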

SMB over TCP and RDMA

[Diagram: Side-by-side comparison of the SMB over TCP/IP path and the SMB Direct path on client and server. The application uses an unchanged API; SMB runs either over TCP/IP through a regular NIC or over NDKPI through an RDMA NIC, with memory-to-memory transfers across Ethernet and/or InfiniBand. Steps 1-4 from the previous slide are marked on both paths.]

Comparing RDMA Technologies

Non-RDMA Ethernet (wide variety of NICs)
• Pros: TCP/IP-based protocol; works with any Ethernet switch; wide variety of vendors and models; support for in-box NIC teaming (LBFO)
• Cons: currently limited to 10Gbps per NIC port; high CPU utilization under load; high latency

iWARP (Intel NE020*, Chelsio T4)
• Pros: low CPU utilization under load; low latency; TCP/IP-based protocol; works with any 10GbE switch; RDMA traffic routable
• Cons: currently limited to 10Gbps per NIC port*

RoCE (Mellanox ConnectX-2, Mellanox ConnectX-3*)
• Pros: low CPU utilization under load; low latency; Ethernet-based protocol; works with high-end 10GbE/40GbE switches; offers up to 40Gbps per NIC port today*
• Cons: RDMA traffic not routable via existing IP infrastructure; requires a DCB switch with Priority Flow Control (PFC)

InfiniBand (Mellanox ConnectX-2, Mellanox ConnectX-3*)
• Pros: low CPU utilization under load; low latency; offers up to 54Gbps per NIC port today*; switches typically less expensive per port than 10GbE switches*; switches offer 10GbE or 40GbE uplinks; commonly used in HPC environments
• Cons: not an Ethernet-based protocol; RDMA traffic not routable via existing IP infrastructure; requires InfiniBand switches; requires a subnet manager (on the switch or the host)

* This is current as of the release of Windows Server 2012 RC. Information on this slide is subject to change as technologies evolve and new cards become available.

Mellanox ConnectX®-3 Dual-Port Adapter with VPI (InfiniBand and Ethernet)

• Mellanox provides end-to-end InfiniBand and Ethernet connectivity solutions (adapters, switches, cables)

– Connecting data center servers and storage

• Up to 56Gb/s InfiniBand and 40Gb/s Ethernet per port

– Low latency, Low CPU overhead, RDMA

– InfiniBand to Ethernet Gateways for seamless operation

• Windows Server 2012 exposes the great value of InfiniBand for storage traffic, virtualization and low latency

– InfiniBand and Ethernet (with RoCE) integration

– Highest Efficiency, Performance and return on investment

• For more information:

– http://www.mellanox.com/content/pages.php?pg=file_server

– Gilad Shainer, [email protected], [email protected]

Intel 10GbE iWARP Adapter - NE020

• In production today
  – Supports Microsoft's MPI via ND in Windows Server 2008 R2 and beyond
  – See Intel's Download site (http://downloadcenter.intel.com) for drivers (search "NE020")

• Drivers inbox since Beta for Windows Server 2012
  – Supports Microsoft's SMB Direct via NDK
  – Uses the IETF's iWARP RDMA technology, which is built on top of IP
  – The only WAN-routable, "cloud-ready" RDMA technology
  – Uses standard Ethernet switches
  – Beta drivers available from Intel's Download site (http://downloadcenter.intel.com) (search "NE020")

• For more information: [email protected]

Setting up SMB Direct

• Install hardware and drivers
  – Get-NetAdapter
  – Get-NetAdapterRdma

• Configure IP addresses
  – Get-SmbServerNetworkInterface
  – Get-SmbClientNetworkInterface

• Establish an SMB connection
  – Get-SmbConnection
  – Get-SmbMultichannelConnection

• Similar to configuring SMB for regular network interfaces

• Verify client performance counters
  – RDMA Activity – 1/interface
  – SMB Direct Connection – 1/connection
  – SMB Client Shares – 1/share

• Verify server performance counters
  – RDMA Activity – 1/interface
  – SMB Direct Connection – 1/connection
  – SMB Server Shares – 1/share
  – SMB Server Session – 1/session
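The checks above can be strung together into a single verification pass. A minimal sketch, assuming the RDMA NICs and their drivers are already installed; the counter-set names are the ones listed above:

```powershell
# 1. Confirm the adapters are present and report RDMA capability
Get-NetAdapter
Get-NetAdapterRdma

# 2. Confirm SMB sees RDMA-capable interfaces on both ends
Get-SmbServerNetworkInterface   # run on the file server
Get-SmbClientNetworkInterface   # run on the file client

# 3. After opening a file on the share, confirm the connection and its channels
Get-SmbConnection
Get-SmbMultichannelConnection

# 4. Inspect the RDMA-related performance counter sets
Get-Counter -ListSet "RDMA Activity", "SMB Direct Connection" |
    Select-Object -ExpandProperty Counter
```

If the multichannel output shows no RDMA-capable interfaces, work through the per-flavor checklists that follow.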

InfiniBand details

• Cards
  – Mellanox ConnectX-2
  – Mellanox ConnectX-3

• Configure a subnet manager on the switch
  – Use a managed switch with a built-in subnet manager

• Or use OpenSM on Windows Server 2012
  – Included as part of the Mellanox package
  – New-Service -Name "OpenSM" -BinaryPathName "`"C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe`" --service -L 128" -DisplayName "OpenSM" -Description "OpenSM" -StartupType Automatic
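For reference, here is the OpenSM service setup from the slide as a small, runnable script, with the service started and verified afterwards. This is a sketch: the opensm.exe path assumes the default Mellanox MLNX_VPI install location, so adjust it to match your package.

```powershell
# Register OpenSM as a Windows service. The embedded `" quotes keep the path
# with spaces intact; the --service -L 128 arguments are taken from the slide.
New-Service -Name "OpenSM" `
    -BinaryPathName "`"C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe`" --service -L 128" `
    -DisplayName "OpenSM" -Description "OpenSM" -StartupType Automatic

# Start the subnet manager now and confirm it is running
Start-Service -Name "OpenSM"
Get-Service -Name "OpenSM"
```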

iWARP details

• Cards
  – Intel NE020
  – Chelsio T4

• Configure the firewall
  – SMB Direct with iWARP uses TCP port 5445
  – Enable-NetFirewallRule FPSSMBD-iWARP-In-TCP

• Allow cross-subnet access (optional)
  – iWARP RDMA technology can be routed across IP subnets
  – Set-NetOffloadGlobalSetting -NetworkDirectAcrossIPSubnets Allow
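Both iWARP-specific settings can be applied and verified in one place. A minimal sketch, run on each machine; the firewall rule name is the in-box rule referenced above:

```powershell
# Open TCP port 5445 for SMB Direct over iWARP using the in-box rule
Enable-NetFirewallRule -Name "FPSSMBD-iWARP-In-TCP"
Get-NetFirewallRule -Name "FPSSMBD-iWARP-In-TCP" | Select-Object Name, Enabled

# Optional: let SMB Direct (NetworkDirect) traffic cross IP subnets
Set-NetOffloadGlobalSetting -NetworkDirectAcrossIPSubnets Allow
Get-NetOffloadGlobalSetting
```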

RoCE details

• Cards
  – Mellanox ConnectX-3
  – Make sure to configure the NIC for Ethernet

• Configuring Priority Flow Control (PFC) on Windows
  – Install-WindowsFeature Data-Center-Bridging
  – New-NetQosPolicy "RoCE" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 4
  – Enable-NetQosFlowControl -Priority 4
  – Enable-NetAdapterQos -InterfaceAlias RDMA1
  – Set-NetQosDcbxSetting -Willing 0
  – New-NetQosTrafficClass "RoCE" -Priority 4 -Bandwidth 60 -Algorithm ETS

• Configuring PFC on the switch
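The host-side PFC steps above, consolidated into one script. This is a sketch under the slide's example values: the adapter alias "RDMA1" is a placeholder for your own interface name, priority 4 and a 60% ETS reservation are the values shown above, and the switch side still has to be configured for PFC separately (not shown here).

```powershell
# Install the Data Center Bridging feature
Install-WindowsFeature Data-Center-Bridging

# Tag SMB Direct (NetworkDirect, port 445) traffic with 802.1p priority 4
New-NetQosPolicy "RoCE" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 4

# Enable Priority Flow Control for that priority and apply QoS on the RDMA NIC
Enable-NetQosFlowControl -Priority 4
Enable-NetAdapterQos -InterfaceAlias "RDMA1"   # example alias; use your adapter name

# Do not accept DCBX settings from the switch; use the local configuration
Set-NetQosDcbxSetting -Willing 0

# Reserve 60% of bandwidth for the RoCE traffic class via ETS
New-NetQosTrafficClass "RoCE" -Priority 4 -Bandwidth 60 -Algorithm ETS
```

The 60% bandwidth reservation is just the slide's example; size the traffic class to your own traffic mix.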

SMB Direct Performance – 1 x 54GbIB

[Diagram: Four test configurations, each running an IO micro-benchmark against four Fusion IO cards — a single local server, plus SMB client/server pairs connected by 10GbE, InfiniBand FDR, and InfiniBand QDR.]

SMB Direct Performance – 1 x 54GbIB

Preliminary results from two Intel Romley machines with 2 sockets each and 8 cores per socket. Both client and server use a single port of a Mellanox network interface in a PCIe Gen3 x8 slot.

Data goes all the way to persistent storage, using 4 FusionIO ioDrive 2 cards.

Preliminary results based on Windows Server 2012 beta.

Workload: 512KB IOs, 8 threads, 8 outstanding

Configuration                    BW (MB/sec)   IOPS (512KB IOs/sec)   %CPU Privileged
Non-RDMA (Ethernet, 10Gbps)      1,129         2,259                  ~9.8
RDMA (InfiniBand QDR, 32Gbps)    3,754         7,508                  ~3.5
RDMA (InfiniBand FDR, 54Gbps)    5,792         11,565                 ~4.8
Local                            5,808         11,616                 ~6.6

Workload: 8KB IOs, 16 threads, 16 outstanding

Configuration                    BW (MB/sec)   IOPS (8KB IOs/sec)     %CPU Privileged
Non-RDMA (Ethernet, 10Gbps)      571           73,160                 ~21.0
RDMA (InfiniBand QDR, 32Gbps)    2,620         335,446                ~85.9
RDMA (InfiniBand FDR, 54Gbps)    2,683         343,388                ~84.7
Local                            4,103         525,225                ~90.4


SMB Direct Performance – 2 x 54GbIB

[Diagram: Three test configurations — (1) a single server running SQLIO locally, (2) a file client (SMB 3.0) running SQLIO against a file server (SMB 3.0) over two RDMA NICs per machine, and (3) a Hyper-V host (SMB 3.0) running SQLIO in a VM against a file server (SMB 3.0) over two RDMA NICs per machine. In each configuration, the storage server uses four SAS RAID controllers, each attached to a JBOD of SSDs.]

SMB Direct Performance – 2 x 54GbIB

Preliminary results based on Windows Server 2012 RC

Configuration     BW (MB/sec)   IOPS (512KB IOs/sec)   %CPU Privileged   Latency (ms)
1 – Local         10,090        38,492                 ~2.5%             ~3 ms
2 – Remote        9,852         37,584                 ~5.1%             ~3 ms
3 – Remote VM     10,367        39,548                 ~4.6%             ~3 ms

SMB Direct Performance – 3 x 54GbIB

[Diagram: A file client (SMB 3.0) running SQLIO against a file server (SMB 3.0), each machine with three RDMA NICs. On the file server, Storage Spaces aggregates six SAS-attached JBODs of SSDs — one behind a RAID controller, five behind SAS HBAs.]

Workload                        BW (MB/sec)   IOPS (IOs/sec)   %CPU Privileged   Latency (ms)
512KB IOs, 100% read, 2t, 8o    16,778        32,002           ~11%              ~2 ms
8KB IOs, 100% read, 16t, 2o     4,027         491,665          ~65%              <1 ms

Preliminary results based on Windows Server 2012 RC

Case Study

Summary

• Overview of SMB Direct (SMB over RDMA)

• Three flavors of RDMA

• Setting up SMB Direct

• SMB Direct Performance

• SMB Direct Case Study

Related Content

• Blog posts: http://smb3.info

• TechEd talks:
  – WSV328 The Path to Continuous Availability with Windows Server 2012
  – VIR306 Hyper-V over SMB: Remote File Storage Support in Windows Server 2012 Hyper-V
  – WSV314 Windows Server 2012 NIC Teaming and SMB Multichannel Solutions
  – WSV334 Windows Server 2012 File and Storage Services Management
  – WSV303 Windows Server 2012 High-Performance, Highly-Available Storage Using SMB
  – WSV330 How to Increase SQL Availability and Performance Using WS 2012 SMB 3.0 Solutions
  – WSV410 Continuously Available File Server: Under the Hood
  – WSV310 Windows Server 2012: Cluster-in-a-Box, RDMA, and More