recent linux tcp updates, and how to tune your 100g...

41
Recent Linux TCP Updates, and how to tune your 100G host Nate Hanford, Brian Tierney, ESnet [email protected] http://fasterdata.es.net Internet2 Technology Exchange, Sept 27, 2016, Miami, FL

Upload: lehuong

Post on 14-Mar-2018

237 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

RecentLinuxTCPUpdates,andhowtotuneyour100GhostNateHanford,BrianTierney,[email protected]://fasterdata.es.net

Internet2TechnologyExchange,

Sept27,2016,Miami,FL

Page 2: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

Observation#1

• TCPismorestableinCentOS7vs CentOS6– Throughputrampsupmuchquicker

• Moreaggressiveslowstart– Lessvariabilityoverlifeoftheflow

10/24/162

Page 3: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

BerkeleytoAmsterdam

Page 4: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

10/24/164

NewYorktoTexas

Page 5: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

Observation#2

• TurningonFQhelpsthroughputevenmore– TCPisevenmorestable– Worksbetterwithsmallbufferdevices

• Pacingtomatchbottlenecklinkworksbetteryet

10/24/165

Page 6: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

NewTCPoption:FairQueuingScheduler (FQ)AvailableinLinuxkernel3.11(releasedlate2013)orhigher

– availableinFedora20,Debian 8,andUbuntu13.10– Backported to3.10.0-327kernel inv7.2CentOS/RHEL(Dec 2015)

ToenableFairQueuing(whichisoffbydefault),do:– tc qdisc adddev $ETHrootfq

Tobothpaceandshapethebandwidth:– tc qdisc adddev $ETHrootfq maxrate Ngbit

• Canreliablypaceuptoamaxrate of32Gbps

http://fasterdata.es.net/host-tuning/linux/fair-queuing-scheduler/

Page 7: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

FQBackground

• Lotsofdiscussionaround‘bufferbloat’startingin2011– https://www.bufferbloat.net/

• Googlewantedtobeabletogethigherutilizationontheirnetwork– Paper:“B4:ExperiencewithaGlobally-DeployedSoftwareDefinedWAN,SIGCOMM2013

• GooglehiredsomeverysmartTCPpeople• VanJacobson,MattMathis,EricDumazet,andothers

• Result:LotsofimprovementstotheTCPstackin2013-14,includingmostnotablythe‘fairqueuing’pacer

10/24/167

Page 8: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

10/24/168

NewYorktoTexas:WithPacing

Page 9: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

100GHostTuning

10/24/169

Page 10: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

TestEnvironment• Hosts:

– Supermicro X10DRiDTNs– IntelXeonE5-2643v3,2sockets,6coreseach– CentOS 7.2runningKernel3.10.0-327.el7.x86_64– Mellanox ConnectX-4EN/VPI100GNICswithportsinENmode– Mellanox OFEDDriver3.3-1.0.4(03Jul2016),Firmware12.16.1020

• Topology– BothsystemsconnectedtoDellZ9100100GbpsONTop-of-RackSwitch– Uplinktonersc-tb1ALUSR7750Routerrunning100GlooptoStarlightandback

• 92msRTT– UsingTagged802.1qtoswitchbetweenLoopandLocalVLANs– LANhad54usecRTT

• Configuration:– MTUwas9000B– irqbalance,tuned,andnumad wereoff– coreaffinitywassettocores7and8(ontheNUMAnodeclosesttotheNIC)– AlltestsareIPV4unlessotherwisestated

10/24/1610

Page 11: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

nersc-tb1

Dell z9100

nersc-tbn-4 nersc-tbn-5

star-tb1

100G loop: RTT = 92ms

100G

StarLight (Chicago)

Oakland, CA

Each host has:• Mellanox ConnectX-4 (100G)• Mellanox ConnectX-3 (40G)

Alcatel 7750 Router

TestbedTopologyAlcatel 7750 Router

40G

100G 100G40G

Page 12: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

OurCurrentBestSingleFlowResults• TCP

– LAN:79Gbps– WAN(RTT=92ms):36.5Gbps,49Gbpsusing‘sendfile’API(‘zero-copy’)– Testcommands:

• LAN:nuttcp -i1-xc7/7–w1m-T30hostname• WAN:nuttcp -i1-xc7/7–w900M-T30hostname

• UDP:– LANandWAN:33Gbps– Testcommand:nuttcp -l8972-T30-u-w4m-Ru -i1-xc7/7hostname

Othershavereportedupto85GbpsLANperformancewithsimilarhardware

10/24/1612

Page 13: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

CPUgovernorLinuxCPUgovernor(P-States)settingmakesabig difference:

RHEL: cpupower frequency-set -g performanceDebian:cpufreq-set -r -g performance

57Gbps defaultsettings(powersave)vs.79Gbps ‘performance’modeontheLANTowatchtheCPUgovernorinaction:watch -n 1 grep MHz /proc/cpuinfo

cpu MHz:1281.109cpu MHz:1199.960cpu MHz:1299.968cpu MHz:1199.960cpu MHz:1291.601cpu MHz:3700.000cpu MHz:2295.796cpu MHz:1381.250cpu MHz:1778.492

10/24/1613

Page 14: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

CPUfrequency• Driver:KernelmoduleorcodethatmakesCPUfrequencycallstohardware• Governor:Driversettingthatdetermineshowthefrequencywillbeset• PerformanceGovernor:Biastowardshigherfrequencies• Userspace Governor:Allowusertospecifyexactcoreandpackagefrequencies• OnlytheIntelP-StatesDrivercanmakeuseofTurboBoost• Checkcurrentsettings:cpupower frequency-info

10/24/1614

P-StatesPerformance

ACPI-CPUfreqPerformance

ACPI-CPUfreqUserspace

LAN 79G 72G 67G

WAN 36G 36G 27G

Page 15: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

TCPBuffers# add to /etc/sysctl.conf

# allow testing with 2GB buffers

net.core.rmem_max = 2147483647net.core.wmem_max = 2147483647

# allow auto-tuning up to 2GB buffers

net.ipv4.tcp_rmem = 4096 87380 2147483647net.ipv4.tcp_wmem = 4096 65536 2147483647

2GBisthemaxallowableunderLinuxWANBDP=12.5GB/s*92ms=1150MB(autotuningsetthisto1136MB)LANBDP=12.5GB/s*54us=675KB(autotuningsetthisto2-9MB)ManualbuffertuningmadeabigdifferenceontheLAN:

– 50-60Gbpsvs 79Gbps

10/24/1615

Page 16: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

zerocopy (sendfile)results

• iperf3–Zoption

• NosignificantdifferenceontheLAN

• SignificantimprovementontheWAN– 36.5Gbpsvs 49Gbps

10/24/1616

Page 17: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

IPv4vs IPv6results

• IPV6isslightlyfasterontheWAN,slightlyslowerontheLAN

• LAN:– IPV4:79Gbps– IPV6:77.2Gbps

• WAN– IPV4:36.5Gbps– IPV6:37.3Gbps

10/24/1617

Page 18: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

Don’tForgetaboutNUMAIssues

10/24/1618

• Upto2xperformancedifferenceifyouusethewrongcore.

• Ifyouhavea2CPUsocketNUMAhost,besureto:– Turnoffirqbalance– FigureoutwhatsocketyourNICisconnectedto:

cat /sys/class/net/ethN/device/numa_node

– RunMellanox IRQscript:/usr/sbin/set_irq_affinity_bynode.sh 1 ethN

– BindyourprogramtothesameCPUsocketastheNIC:numactl -N 1 program_name

• WhichcoresbelongtoaNUMAsocket?– cat/sys/devices/system/node/node0/cpulist– (note:onsomeDellservers,thatmightbe:0,2,4,6,...)

Page 19: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

SettingstoleavealoneinCentOS7

Recommendleavingtheseatthedefaultsettings,andnoneoftheseseemtoimpactperformancemuch

• InterruptCoalescence

• RingBuffersize

• LRO(off)andGRO(on)

• net.core.netdev_max_backlog

• txqueuelen

• tcp_timestamps

10/24/1619

Page 20: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

ToolSelection• Bothnuttcp andiperf3havedifferentstrengths.

• nuttcp isabout10%fasteronLANtests

• iperf3JSONoutputoptionisgreatforproducingplots

• Useboth!Botharepartofthe‘perfsonar-tools’package– Installationinstructionsat:http://fasterdata.es.net/performance-testing/network-troubleshooting-tools/

10/24/1620

Page 21: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

OSComparisons• CentOS7(3.10kernel)vs.Ubuntu14.04(4.2kernel)vs.Ubuntu16.04(4.4kernel)

– Note:4.2kernelareabout5-10%slower(senderandreceiver)

• SampleResults:• CentOS7toCentOS7:79Gbps

• CentOS7toUbuntu14.04(4.2.0kernel):69Gbps

• Ubuntu14.04(4.2)toCentOS7:71Gbps

• CentOS7toUbuntu16.04(4.4kernel):73Gbps

• Ubuntu16.04(4.4kernel)toCentOS7:75Gbps

• CentOS7toDebian 8.4with4.4.6kernel:73.6G

• Debian 8.4with4.4.6KerneltoCentOS7:76G

10/24/1621

Page 22: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

BIOSSetting

• DCA/IOAT/DDIO:ON– AllowstheNICtodirectlyaddressthecacheinDMAtransfers

• PCIe MaxReadRequest:Turnitupto4096,butourresultssuggestitdoesn’tseemtohurtorhelp

• Turboboost:ON

• Hyperthreading:OFF– AddedexcessivevariabilityinLANperformance(51Gto77G)

• node/memoryinterleaving:??

10/24/1622

Page 23: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

PCIBusCommandsMakesureyou’reinstallingtheNICintherightslot.Usefulcommandsinclude:

FindyourPCIslot:lspci | grep Ethernet

81:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

ConfirmthatthisslotisPCIeGen3x16:lspci -s 81:00.0 -vvv | grep PCIeGen

[V0] Vendor specific: PCIeGen3 x16

ConfirmthatPCIMaxReadReq is4096Blspci -s 81:00.0 -vvv | grep MaxReadReq

MaxPayload 256 bytes, MaxReadReq 4096 bytes

Ifnot,youcanincreaseitusing‘setpci’

• Formoredetails,see:https://community.mellanox.com/docs/DOC-2496

10/24/1623

Page 24: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

Benchmarkingvs.ProductionHostSettings

Therearesomesettingsthatwillgiveyoumoreconsistentresultsforbenchmarking,butyoumaynotwanttorunonaproductionDTNBenchmarking:• UseaspecificcoreforIRQs:

/usr/sbin/set_irq_affinity_cpulist.sh 8 ethN• Useafixedclockspeed(settothemaxforyourprocessor):

– /bin/cpupower -c all frequency-set -f 3.4GHzProductionDTN:

/usr/sbin/set_irq_affinity_bynode.sh 1 ethN/bin/cpupower frequency-set -g performance

10/24/1624

Page 25: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

FQon100GHosts

10/24/1625

Page 26: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

100GHost,ParallelStreams:nopacingvs 20Gpacing

10/24/1626

WealsoseeconsistentlossontheLANwith4streams,nopacingPacketlossduetosmallbuffersinDellZ9100switch?

Page 27: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

100GHostto10GHost

10/24/1627

Page 28: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

FastHosttoSlowhost

10/24/1628

Throttledthereceivehostusing‘cpupower’command:/bin/cpupower -c all frequency-set -f 1.2GHz

Page 29: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

Summaryofour100Gresults

• NewEnhancementstoLinuxKernelmaketuningeasieringeneral.

• Afewofthestandard10Gtuningknobsnolongerapply

• TCPbufferautotuningdoesnotworkwell100GLAN

• Usethe‘performance’CPUgovernor

• UseFQPacingtomatchreceivehostspeedifpossible

• ImportanttobeusingtheLatestdriverfromMellanox– version:3.3-1.0.4(03Jul2016),firmware-version:12.16.1020

10/24/1629

Page 30: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

What’snextintheTCPworld?

• TCPBBR(BottleneckBandwidthandRTT)fromGoogle– https://patchwork.ozlabs.org/patch/671069/– GoogleGroup:https://groups.google.com/forum/#!topic/bbr-dev

• AdetaileddescriptionofBBRwillbepublishedinACMQueue,Vol.14No.5,September-October2016:– "BBR:Congestion-BasedCongestionControl".

• Googlereports2-4ordersofmagnitudeperformanceimprovementonapathwith1%lossand100msRTT.– Sampleresult:cubic:3.3Mbps,BBR:9150Mbps!!– EarlytestingonESnetlessconclusive,butseemstohelponsomepaths

10/24/1630

Page 31: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

InitialBBRTCPresults(bwctl,3streams,40sectest)RemoteHost Throughput Retransmits

perfsonar.cc.trincoll.edu htcp:4.52bbr:410

htcp:266bbr:179864

perfsonar.nssl.noaa.gov htcp:183bbr:803

htcp:1070bbr:240340

kstar-ps.nfri.re.kr htcp:4301bbr:4430

htcp:1641bbr:98329

ps1.jpl.net htcp:940bbr:935

htcp:1247bbr:399110

uhmanoa-tp.ps.uhnet.net htcp:5051bbr:3095

htcp:5364bbr:412348

10/24/1631

Page 32: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

MoreInformation

• https://fasterdata.es.net/host-tuning/linux/fair-queuing-scheduler/• https://fasterdata.es.net/host-tuning/40g-tuning/• TalkonSwitchBuffersizeexperiments:

– http://meetings.internet2.edu/2015-technology-exchange/detail/10003941/

• Mellanox TuningGuide:– https://community.mellanox.com/docs/DOC-1523

• Email:[email protected]

10/24/1632

Page 33: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

ExtraSlides

10/24/1633

Page 34: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

mlnx_tune command

• See:https://community.mellanox.com/docs/DOC-1523

10/24/1634

Page 35: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

Coalescing Parameters

• Varies by manufacturer• usecs: Wait this amount of microseconds after 1st packet is

received/transmitted• frames: Interrupt after this many frames are received or transmitted• tx-usecs and tx-frames aren’t as important as rx-usecs• Due to the higher line rate, lower is better, until interrupts get in the way

(at 100G, we are sending almost 14 frames/usec• Default settings seem best for most cases

8/30/201635

Page 36: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

AsmallamountofpacketlossmakesahugedifferenceinTCPperformance

MetroArea

Local(LAN)

Regional Continental

International

Measured(TCPReno) Measured(HTCP) Theoretical(TCPReno) Measured(noloss)

Withloss,highperformance beyondmetrodistancesisessentiallyimpossible

Page 37: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

TCP’sCongestionControl

© 2015 Internet2

50ms simulated RTTCongestion w/ 2Gbps UDP trafficHTCP / Linux 2.6.32

SlidefromMichaelSmitasin,LBLnet

Page 38: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

FQPacketsaremuchmoreevenlyspacedtcptrace/xplotoutput:FQonleft,StandardTCPonright

38

Page 39: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

FairQueuingandandSmallSwitchBuffersTCPThroughputonSmallBufferSwitch(Congestionw/2GbpsUDPbackgroundtraffic)

RequiresCentOS 7.2orhigher

tc qdisc add dev EthN root fqEnableFairQueuing

PacingsideeffectofFairQueuingyields~1.25Gbpsincreaseinthroughput@10Gbpsonourhosts

TSOdifferencesstillnegligibleonourhostsw/IntelX520

SlidefromMichaelSmitasin,LBL

Page 40: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

ESnet’s100GSDNTestbedTopology

40

ESnet

ESnet

ESnet

ESnet

ESnet

ESnet

ESnet

ESnet

ESnet

ESnetDedicated 100G

2 x 10G2 x 40G

8 x 10G

2 x 10G

5 x 10G

2 x 10G

SDN POP

SDN POP

SDN POP

100G POP

SDN POP

2 x 10G

SDN POP

8 x 10G

SDN POP

5 x 10G

SDN POP

100G

Oakland, California

Denver, Colorado

Chicago, Illinois

New York, New York

Washington, D.C.

Atlanta, Georgia

Amsterdam, Netherlands

CERN, Geneva Switzerland

exoGENI Rack

1 x 10G

1 x 40G

2 x 10G 2 x 40G

4 x 10G

4 x 10G

2 x 10G

4 x 10G

2 x 10G

2 x 10G

6 x 10G

5 x 10G

exoGENI Rack 40G

2 x 10G

2 x 10G

OpenFlow 1.0

OpenFlow 1.0

SDN Switch

SDN Switch

SDN Switch

SDN Switch

SDN Switch

SDN Switch

SDN Switch

denv-cr5

SDN POPreserved for SDX

2 x 10G

OpenFlow 1.0

2 x 10G

ESnet 100G SDN Testbed

3 x 10G

Berkeley, California

Argonne

Page 41: Recent Linux TCP Updates, and how to tune your 100G hostmeetings.internet2.edu/.../20160927-tierney-improving-performance... · Recent Linux TCP Updates, and how to tune your

TestbedAccess• Proposalprocesstogainaccessdescribedat:• http://www.es.net/RandD/100g-testbed/proposal-process/• Testbedisavailabletoanyone:

– DOEresearchers– Othergovernmentagencies– Industry

• MustsubmitashortproposaltoESnet (2pages)• ReviewCriteria:

– Project“readiness”– Couldtheexperimenteasilybedoneelsewhere?

41