High-Performance Networking With NDIS 6.0, TCP Chimney Offload, and RSS
Vik Desai
Program Manager
Windows Networking
Microsoft Corporation
Appropriate Audience
Who should attend this session?
Networking product builders
Product decision makers
Hardware and software engineers
Architects
Network designers and deployers
IT managers
IT consultants
Venture capitalists and private investors
Industry analysts
Agenda
Networking stack challenges
Scalable networking goals
Scalable networking architecture
Receive Side Scaling (RSS)
TCP Chimney Offload
Scalable networking demo
NetXen demo: Vikram Karvat
Broadcom demo: Uri Elzur
Offload roadmap
Summary and call to action
Networking Challenges
Receive processing is limited to a single CPU on a multiprocessor system
CPU utilization for protocol processing increases with physical-layer speed
Data movement between network and application buffers is a bottleneck
Large number of interrupts, even with interrupt moderation
Scalable Networking Goals
Boost application scalability on 1 Gb and 10 Gb Ethernet with an integrated architecture that:
Preserves standard infrastructure (1500-byte MTU)
Maintains standard network and server management practices
Does not compromise security, server reliability, or application compatibility
Enable Ethernet fabric convergence
Robustly support a new class of protocol offload NICs in Microsoft Windows
Receive Side Scaling
Networking challenge: receive processing is limited to a single CPU on a multiprocessor system
Solution: parallelize receive processing by queuing incoming packets to multiple CPUs
Implementing the solution via RSS:
The NIC manages multiple hardware queues
The NIC hashes incoming TCP segments to different hardware queues
The NIC driver requests DPCs on the appropriate CPUs
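A minimal sketch of the hashing and queue-selection steps, assuming the TCP/IPv4 4-tuple is hashed in the order source address, destination address, source port, destination port, and using the publicly documented Microsoft RSS verification key. The Toeplitz hash matches the one named in the RSS figure; the 4-entry indirection table, HASH_MASK, and all helper names are hypothetical illustrations, not NDIS APIs.

```python
import ipaddress
import struct

# Microsoft's published 40-byte RSS verification key (used here for testing).
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0d0ca2bcb"
    "ae7b30b477cb2da38030f20c6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: for every set input bit (MSB first), XOR in the
    32-bit window of the key that starts at that bit position."""
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 32:
        raise ValueError("key too short for input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        byte, bit = divmod(i, 8)
        if data[byte] & (0x80 >> bit):
            # Leftmost 32 bits of the key after shifting it left by i bits.
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def tcp4_hash_input(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Source address, destination address, source port, destination port,
    all in network byte order."""
    return (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!HH", src_port, dst_port)
    )


# Hypothetical indirection table: 4 entries mapping to CPUs 0..3.
INDIRECTION_TABLE = [0, 1, 2, 3]
HASH_MASK = len(INDIRECTION_TABLE) - 1  # low-order hash bits index the table


def select_cpu(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    h = toeplitz_hash(RSS_KEY, tcp4_hash_input(src_ip, dst_ip, src_port, dst_port))
    return INDIRECTION_TABLE[h & HASH_MASK]
```

Because the hash depends only on the 4-tuple, every segment of a given connection lands on the same queue and CPU, so per-connection processing stays in order while different connections spread across CPUs.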
RSS Description: Non-RSS-Capable NIC
[Figure: A regular NIC with a single receive FIFO and interrupt logic. Incoming packets are handled on Processor 0, which runs the ISR, the DPC, NDIS, TCP/IP, and the application.]
RSS Description: RSS-Capable NIC
[Figure: An RSS-capable NIC with multiple receive FIFOs, interrupt logic, and a Toeplitz hash applied to incoming packets. Processors 0, 1, and 2 each run a DPC, NDIS, TCP/IP, and the application; the ISR runs on Processor 0.]
TCP Chimney Offload
Networking challenges:
Data movement between network and application buffers is a bottleneck
Large number of interrupts, even with interrupt moderation
CPU utilization for protocol processing increases with physical-layer speed
Solution:
Provide a zero-copy solution for pre-posted buffers
Change interrupts from a per-packet basis to a per-segment basis
Offload protocol processing to hardware
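A back-of-the-envelope count of why per-segment interrupts matter. This assumes a 10 Gb/s line rate, counts each 1500-byte MTU packet as one event (header and framing overhead ignored), and uses a hypothetical 64 KB application segment; it is a counting model, not a measurement, and real NICs further moderate interrupts.

```python
def events_per_second(line_rate_bps: float, bytes_per_event: int) -> float:
    """Completion events per second if one event is raised per
    `bytes_per_event` bytes at full line rate (framing overhead ignored)."""
    return line_rate_bps / 8 / bytes_per_event


LINE_RATE = 10e9        # 10 Gb/s, assumed
PACKET = 1500           # standard MTU, per the goals slide
SEGMENT = 64 * 1024     # hypothetical application I/O size

per_packet = events_per_second(LINE_RATE, PACKET)
per_segment = events_per_second(LINE_RATE, SEGMENT)
print(f"per packet:  {per_packet:,.0f}/s")                  # per packet:  833,333/s
print(f"per segment: {per_segment:,.0f}/s")                 # per segment: 19,073/s
print(f"reduction:   {per_packet / per_segment:.1f}x")      # reduction:   43.7x
```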
TCP Chimney Architecture
[Figure: Components shown are applications; a switch; the host stack layers (transport layer (TCP), path layer (IPv4 or IPv6), framing layer (Ethernet), and other miscellaneous layers); NDIS 5.2 / 6.0; the NDIS miniport driver; and TCP Chimney offload-capable hardware. The TCP Chimney interfaces carry state updates and data transfer.]
TCP Chimney Interface Details
TCP/IP state is divided into:
Const state: does not change during the connection's lifetime
Cached state: controlled by the host stack and updated on the offload target as appropriate
Delegated state: controlled by the offload target
NDIS supports:
Offload capability advertisement
An interface to transfer and update state information
An interface to query statistics
An interface to transfer data
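The const/cached/delegated split can be modeled as three record types with different owners. This is a sketch only: the field selection is a hypothetical subset chosen for illustration and is not the NDIS structure layout.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ConstState:
    """Does not change for the lifetime of the connection."""
    local_port: int
    remote_port: int
    remote_mss: int


@dataclass
class CachedState:
    """Owned by the host stack; changes are pushed to the offload target."""
    ttl: int
    keepalive_interval_ms: int


@dataclass
class DelegatedState:
    """Owned by the offload target while the connection is offloaded."""
    snd_una: int
    snd_nxt: int
    rcv_nxt: int
    cwnd: int


@dataclass
class OffloadedConnection:
    const: ConstState
    cached: CachedState
    delegated: DelegatedState

    def host_update_cached(self, **changes) -> CachedState:
        """Host-side update: only cached fields may change. Unknown field
        names (including delegated ones such as snd_una) raise TypeError."""
        self.cached = replace(self.cached, **changes)
        return self.cached  # the update that would be pushed to the target
```

Freezing the const record and routing host updates only through the cached record mirrors the ownership rules: the host never writes delegated state while the target owns it.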
TCP Chimney Initialization
The Offload Manager determines whether a connection is suitable for offload
State from each layer is captured and transferred to the offload target
Incoming data packets and outgoing sends are queued
If the offload attempt succeeds, the queued data packets are replayed to the offload target
If the offload attempt fails, the queued data packets are processed by the host stack
Data transfer begins
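The queue-and-replay step above can be sketched as a small state holder. The class and method names are hypothetical; the sketch only shows that traffic arriving during the attempt is held in arrival order and then drained to whichever path owns the connection afterward.

```python
from collections import deque
from typing import Callable


class OffloadAttempt:
    """Tracks one connection while its state is being handed to the target."""

    def __init__(self, connection_id: int):
        self.connection_id = connection_id
        self.pending = deque()   # incoming packets and outgoing sends, in order
        self.in_progress = True

    def on_traffic(self, item: str) -> None:
        # While the offload is in flight, neither path may process the item.
        if not self.in_progress:
            raise RuntimeError("offload attempt already completed")
        self.pending.append(item)

    def complete(self, succeeded: bool,
                 to_offload_target: Callable[[str], None],
                 to_host_stack: Callable[[str], None]) -> None:
        # Success: replay to the offload target. Failure: the host stack keeps
        # the connection and processes the queued items itself.
        sink = to_offload_target if succeeded else to_host_stack
        while self.pending:
            sink(self.pending.popleft())
        self.in_progress = False
```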
TCP Chimney Data Transfer
Sends:
Segments are passed to the offload target for completion
Send completions occur after the end-to-end TCP ACK
Receives:
If no receive buffers are posted, the data is indicated to the host
If receive buffers are posted, the indication occurs as appropriate
OOB/urgent data is passed to the host stack
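A simplified model of the offloaded data path, assuming hypothetical callbacks for each outcome. Sends complete only once the peer's ACK covers their last byte; received data fills posted buffers first and is indicated otherwise; urgent data goes to the host stack. For brevity it completes a posted buffer as soon as any data lands in it and ignores sequence-number wraparound.

```python
from collections import deque


class OffloadedDataPath:
    def __init__(self, indicate, complete_recv, complete_send, to_host_stack):
        self.indicate = indicate            # no buffer posted: indicate data
        self.complete_recv = complete_recv  # posted buffer filled
        self.complete_send = complete_send  # send acknowledged end to end
        self.to_host_stack = to_host_stack  # OOB/urgent data
        self.posted = deque()               # capacities of posted receive buffers
        self.sends = deque()                # (send_id, end offset of the send)
        self.snd_nxt = 0

    def post_receive_buffer(self, capacity: int) -> None:
        self.posted.append(capacity)

    def on_receive(self, data: bytes, urgent: bool = False) -> None:
        if urgent:
            self.to_host_stack(data)
            return
        while data and self.posted:
            cap = self.posted.popleft()
            chunk, data = data[:cap], data[cap:]
            self.complete_recv(chunk)       # placed directly into the buffer
        if data:
            self.indicate(data)             # nothing posted for the remainder

    def send(self, send_id: str, length: int) -> None:
        self.snd_nxt += length
        self.sends.append((send_id, self.snd_nxt))

    def on_ack(self, ack: int) -> None:
        # Complete only sends whose final byte the peer has acknowledged.
        while self.sends and self.sends[0][1] <= ack:
            self.complete_send(self.sends.popleft()[0])
```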
TCP Chimney Connection Teardown
Connections can be uploaded or offloaded at any time
The Heuristics Manager tracks which connections are appropriate for upload or offload
Half-closed connections are not uploaded
Upload requests can be initiated by the offload target
The offload target provides the delegated state to the host stack
The offload target keeps the connection state until the host sends the upload call
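The upload rules above can be sketched as a two-party handshake: the target may flag a connection for upload but keeps its delegated state until the host's upload call, at which point the state moves to the host. All class and method names are hypothetical; the half-closed rule comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class TargetConnection:
    delegated: dict              # state the target owns while offloaded
    half_closed: bool = False
    upload_requested: bool = False


class OffloadTarget:
    def __init__(self):
        self.connections = {}

    def request_upload(self, conn_id) -> None:
        # Target-initiated request; the target still owns the state.
        self.connections[conn_id].upload_requested = True

    def upload(self, conn_id) -> dict:
        # Host's upload call: hand back the delegated state, then forget it.
        return self.connections.pop(conn_id).delegated


class HostStack:
    def __init__(self, target: OffloadTarget):
        self.target = target
        self.owned = {}   # conn_id -> delegated state now held by the host

    def upload(self, conn_id) -> bool:
        conn = self.target.connections.get(conn_id)
        if conn is None or conn.half_closed:
            return False  # half-closed connections are not uploaded
        self.owned[conn_id] = self.target.upload(conn_id)
        return True
```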
TCP Chimney Implications
IPsec Chimney is required for IPsec traffic
Does not work with:
IM drivers that do not understand the Chimney interfaces
Hooking firewalls
Greatest benefit for:
Long-lived connections
Pre-posted receive buffers
Large application I/O sizes
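One way a heuristics manager could encode the three favorable conditions is a simple score. This is a hypothetical filter: the thresholds, names, and the "at least two of three" rule are illustrative assumptions, not the values or logic any Windows component uses.

```python
from dataclasses import dataclass

MIN_AGE_S = 5.0            # hypothetical threshold for "long-lived"
MIN_AVG_IO_BYTES = 16_384  # hypothetical threshold for "large application I/O"


@dataclass
class ConnStats:
    age_s: float
    bytes_transferred: int
    io_count: int
    posts_receive_buffers: bool


def offload_benefit(stats: ConnStats) -> int:
    """Count how many of the slide's favorable conditions hold (0 to 3)."""
    avg_io = stats.bytes_transferred / stats.io_count if stats.io_count else 0
    return sum((
        stats.age_s >= MIN_AGE_S,
        avg_io >= MIN_AVG_IO_BYTES,
        stats.posts_receive_buffers,
    ))


def should_offload(stats: ConnStats, required: int = 2) -> bool:
    return offload_benefit(stats) >= required
```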
10GbE Chimney Offload
Vikram Karvat, VP Marketing, NetXen
Faisal Latif, Principal Software Engineer, NetXen
NetXen
Next-generation Ethernet silicon provider focused on server OEMs
Chips, boards, software
Founded February 2002
Top-tier investors: Accel, Benchmark, Integral Capital
Expertise in semiconductors, software, systems, and servers
Intelligent NIC™ product line launched March 27, 2006
REAL products, REAL customers
Intelligent NIC Architecture
Single chip
Dual 10GbE
Quad GbE
Protocol features: TCP/IP, RDMA, iSCSI, virtualization, security
Native 8X PCI Express (1X/4X/8X)
[Figure: Intelligent NIC block diagram with 10GbE and GbE ports, flow classifier, protocol processing engine, core interconnect fabric, L2 caches, CAM, DDR, QDR, QM, and a PCI-E 8X host interface.]
NetXen 10GbE Chimney
[Figure: Test setup. A transmit (Tx) host and a receive (Rx) host, each running Windows Server 2003 SP1 with SNP, connected through a 10GbE switch; host CPU: 3.4 GHz Xeon.]
10GbE Chimney Results
[Chart: Throughput (Mb/s, 0 to 10,000) and CPU utilization (%, 0 to 70) for NIC (1500-byte frames), NIC (jumbo frames), and Chimney (1500-byte frames).]
Configuration: DP Xeon, 3.4 GHz, HT off, 2 GB
60% throughput
800% processor efficiency
Demo Conclusion
10GbE is happening NOW
Chimney enables:
Scalability with balanced system design
Increased datacenter power efficiency
The agile datacenter requires adaptability, scalability, and intelligence
Broadcom
Uri Elzur, Director, Advanced Technology, Broadcom
Gururaj Ananthateerta, Senior Staff Engineer, Broadcom
Scalable TCP Chimney Enables Convergence Over Ethernet
Scalable TCP Chimney is the basis for convergence over Ethernet
TCP-based: sockets applications, iSCSI, iSCSI boot, iWARP (RDMA)
Microsoft’s SNP enables convergence over Ethernet
A secure (network-based security), robust, and standards-compliant implementation is required
Ethernet requires Layer 2 functionality: VLAN, WoL, power management
Integrated management
[Figure: Converged networking and storage stack over a Broadcom C-NIC. Components shown, split across user mode and kernel mode: sockets applications, Windows Sockets, Windows Socket Switch, RDMA provider, RDMA driver, storage applications, file system, class driver, iSCSI port driver (iscsiprt.sys), iSCSI miniport, TCP/IP, NDIS, NDIS IM driver, and NDIS miniport. The C-NIC is partitioned into NIC, HBA, and RNIC functions.]
Broadcom’s C-NIC 2.5G/s
NTTTCP over 2.5 Gb/s TCP Chimney
[Figure: Demo setup. Servers S1 and S2 (each TX/RX), each with two BCM5708S NICs, connected by fiber cable through a Broadcom 2.5G switch (BCM56580 StrataXGS III). Labels also show NTTTCP and C-NIC Perfmon.]
Each server: HP DL380 G4, 3.4 GHz Intel Xeon CPU, 1 GB RAM, Windows Server 2003 SP1-SNP build 2670, two BCM5708S NICs, Broadcom miniport driver v2.6.14*
[Chart: Less CPU, TOE vs. L2 @2.5G. CPU utilization [%] for two Broadcom C-NICs with TOE and two Broadcom C-NICs with L2.]
[Chart: More throughput, TOE vs. L2 @2.5G. Bandwidth [Gb/s] for two Broadcom C-NICs with TOE and two Broadcom C-NICs with L2.]
TCP Chimney scales…
• At 2.5 Gb/s, TOE delivers more bandwidth than non-TOE at 1/6 of the CPU utilization
• Microsoft’s SNP combined with the BCM5708 provides 7.5 times better P/E
• Performance efficiency (P/E) is network throughput divided by CPU utilization
• At gigabit speeds and beyond, TCP Chimney is critical to free up CPU cycles for applications
[Charts: Bandwidth improvement, TOE vs. L2 (higher is better); CPU utilization reduction, TOE vs. L2 (lower is better). Demo: NTTTCP.]
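Taking the bandwidth and CPU figures above at face value, the P/E definition lets them be cross-checked: P/E(TOE) / P/E(L2) equals the bandwidth ratio divided by the CPU ratio, so 7.5x P/E at 1/6 of the CPU implies a bandwidth ratio of 7.5 x 1/6 = 1.25, roughly 25% more bandwidth. This sketch assumes both figures come from the same NTTTCP runs, which the slide does not state; the function names are hypothetical. Exact fractions avoid floating-point noise in the ratios.

```python
from fractions import Fraction


def performance_efficiency(throughput, cpu_utilization):
    """P/E as defined on the slide: throughput divided by CPU utilization."""
    return Fraction(throughput) / Fraction(cpu_utilization)


def pe_improvement(bw_ratio, cpu_ratio):
    """P/E(TOE) / P/E(L2) from BW(TOE)/BW(L2) and CPU(TOE)/CPU(L2)."""
    return Fraction(bw_ratio) / Fraction(cpu_ratio)


def implied_bw_ratio(pe_ratio, cpu_ratio):
    """Invert pe_improvement: BW(TOE)/BW(L2) = P/E ratio * CPU ratio."""
    return Fraction(pe_ratio) * Fraction(cpu_ratio)


# Slide figures: TOE uses 1/6 of the L2 CPU utilization and gives 7.5x the P/E.
bw_ratio = implied_bw_ratio(Fraction(15, 2), Fraction(1, 6))
print(bw_ratio)  # 5/4, i.e. 25% more bandwidth
```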
RSS Improves SMP Scalability
With RSS, web traffic is distributed more evenly across multiple CPUs
Web Bench delivers up to 50% more requests/sec
Demo: Web Bench 5.0
[Chart: Requests/sec (0 to 60,000) versus number of connections, RSS enabled vs. RSS disabled.]
Demo Conclusion
Broadcom’s C-NIC with Microsoft’s TCP Chimney is here TODAY
TCP Chimney scales to accommodate the needs of servers and applications
TCP Chimney is the basis for the future of networking in Windows
The architecture allows for IPsec-based security
RSS provides better load spreading on SMP servers
Scalable Networking Pack Partners
Future Chimney Offloads
IPsec Chimney
RDMA Chimney
SSL Chimney
Call to Action
Develop low-cost TCP Chimney Offload and RSS hardware for Windows Vista and Windows Server codenamed “Longhorn”
Deploy TCP Chimney Offload and RSS hardware in enterprise and personal computing environments
Additional Resources
Web resources: documentation, white papers, and software bits are available today for TCP Chimney Offload and RSS: http://support.microsoft.com/?kbid=912222
Specs: the DDK and documentation will be available at www.microsoft.com/whdc
White paper: http://www.microsoft.com/whdc/device/network/scale.mspx
Other resources: www.microsoft.com/snp and http://www.microsoft.com/whdc/device/network/netintro.mspx
Related sessions: Net088, Technical Overview of Microsoft’s NetDMA Architecture
Please send e-mail with questions to ndis6fb@microsoft.com
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions,
it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.