ac dc tcp: virtual congestion control enforcement for...
TRANSCRIPT
AC⚡DCTCP:VirtualCongestionControlEnforcementforDatacenterNetworks
Keqiang He,EricRozner,Kanak Agarwal,YuGu,WesFelter,JohnCarter,AdityaAkella
1
DatacenterNetworkCongestionControl
• Congestionisnotrareindatacenternetworks[Singh,SIGCOMM’15]
• Taillatencyishuge• 99.9th-tilelatencyisordersofmagnitudehigherthanthemedian[Mogul,HotOS’15]• Queueinglatencyisthemajorcontributor[Jang,SIGCOMM’15]
• NewdatacenterTCPcongestioncontrolschemeshavebeenproposed• E.g.,DCTCP,TIMELY,DCQCN,TCP-Bolt,ICTCP,etc
2
But,WeCanNotControlVMTCPStacks
• Inmulti-tenantdatacenters,adminscannotcontrolVMTCPstacks• BecauseVMsaresetupandmanagedbydifferententities
3
Virtualization
Servers
Storage
Networking
Tenant1VM
Tenant2VM
Tenant3VM
Infrastructure
TCP/IPstack TCP/IPstack TCP/IPstackTherefore,outdated,inefficient,ormisconfiguredTCPstackscanbeimplementedintheVMs.
Thisleadsto2mainproblems.
Problem#1:LargeQueueingLatency
4
switchqueue
sender receiver
Noqueueinglatency,TCPRTTisaround60to200microsecondsTCPRTTcanreachtensofmillisecondsbecauseofpacketqueueing.
PPPPPPPP
Problem#2:TCPUnfairness
• ECNandnon-ECNcoexistenceproblem[Judd,NSDI’15]• Non-ECN:e.g.,CUBIC• ECN:e.g.,DCTCP
5
Problem#2:TCPUnfairness(cont.)
• Differentcongestioncontrolalgorithmsleadtounfairness
65flowswithdifferentCCalgorithmscongesta10GlinkDumbbelltopology
send
ers
receivers
CC:CongestionControl
AC!V⚡DCTCP:AdministratorControloverDataCenterTCP
7
ImplementsTCPcongestioncontrolintheVirtualSwitch
EnsuresVMTCPstackscannotimpactthenetwork
AC⚡DC:HighLevelView
8
OS OS OS
AppsApps Apps
ControlplaneDatapath(AC/DC)
vNIC vNIC vNIC
DatacenterNetwork
vSwitch
VirtualMachinesAC/DC(sender)
AC/DC(receiver)
Unifo
rmper-flow
CC
Per-flowCCfeed
back
Server
Casestudy:DCTCPCCinthevSwitch
AC⚡DCBenefits• NomodificationstoVMsorhardware
• Lowlatencyprovidedbystate-of-the-artCCalgorithms
• ImprovedTCPfairness andsupportbothECNandnon-ECNflows
• Enforceper-flowdifferentiationviacongestioncontrol,e.g.,• East-westandnorth-southflowscanusedifferentCCs(webserver)• Givehigherpriorityto“mission-critical”traffic(backendVM)
9
AC⚡DCDesign• ObtainingCongestionControlState
• DCTCPCongestionControlinthevSwitch
• EnforcingCongestionControl
• Per-flowDifferentiationviaCongestionControl
10
ObtainingCongestionControlState
• Per-flowconnectiontracking• Alltrafficgoesthroughthevirtualswitch• WecanreconstructCCviamonitoringallthepacketsofaconnection
• Maintainper-flowcongestioncontrolvariables• E.g.,CC-relatedsequencenumbers,dupack counteretc
11
PacketFlow
classificationUpdatingCCvariables
DCTCPCongestionControlinthevSwitch
• UniversalECNmarking
• GetECNfeedback
12
UniversalECNMarking
• Why?• NotallVMsrunECN-CapableTransports(ECT)likeDCTCP
• UniversalECNMarking• AllpacketsenteringthefabricshouldbeECN-markedbythevirtualswitch• SolvestheECNandnon-ECNcoexistenceproblem
13
GetECNFeedback
14
congestedswitch
Needawaytocarrythecongestioninformationback.
CongestionExperienced(CE)marked
Senderside
Receiverside
GetECNFeedback
15
congestedswitch
AC/DCAC/DC
sender receiver
Congestionfeedbackisencodedas8bytes:{ECN_bytes,Total_bytes}.
PiggybackedonanexistingTCPACK(PACK).
CongestionExperienced(CE)marked
DCTCPCongestionControlinthevSwitch
16
ExtractCCinfoifitisPACK;
IncomingACK
Updateconnectiontrackingvariables;Update⍺ onceeveryRTT;
Congestion?
tcp_cong_avoid();
No
Loss?Yes
⍺=max_alpha;
Yes
No
wnd=wnd*(1- ⍺/2);
AC/DCenforcesCContheflow;SendACKtoVM;
Cutwnd inlastRTT?
Yes
No
DCTCPCongestionControlLaw
EnforcingCongestionControl
• TCPsendsmin(CWND,RWND)• CWNDiscongestioncontrolwindow(congestioncontrol)• RWNDisreceiver’sadvertisedwindow(flowcontrol)
• AC⚡DCreusesRWNDforcongestioncontrolpurpose• VMswithunalteredTCPstackswillnaturallyfollowourenforcement
• Non-conformingflowscanbepolicedbydroppinganyexcesspacketsnotallowedbythecalculatedcongestionwindow• Losshastoberecoverede2e,thisincentivizestenantstorespectstandards
17
ControlLawforPer-flowDifferentiation
18
𝑅𝑊𝑁𝐷 = 𝑅𝑊𝑁𝐷 ∗ (1 −𝛼2)
𝑅𝑊𝑁𝐷 = 𝑅𝑊𝑁𝐷 ∗ (1 − (𝛼 −𝛼𝛽2 ))
When𝛽 iscloseto1,itbecomesDCTCP.When𝛽 iscloseto0,itbacks-offaggressively.
Larger𝛽 forhigherprioritytraffic.
DCTCP:
AC⚡DCTCP:
Implementation• PrototypeimplementationinOpenvSwitchkerneldatapath• ~1200LoCadded
• Ourdesignleveragesavailabletechniquestoimproveperformance• RCU-enabledhashtablestoperformconnectiontracking• AC⚡DCmanipulatesTCPsegments,insteadofMTU-sizedpackets• AC⚡DCleveragesNICchecksummingsotheTCPchecksumdoesnothavetoberecomputedafterheaderfieldsaremodified
19NIC
HypervisorTCP/IP
AC⚡DC
VM1Stack
VM2Stack
ManipulatesTCPsegments
NICrecalculatesTCPchecksum
TCPsegment
P P P P
TSO
Evaluation
• Testbed:17servers(6-core,60GBmemory),6 10Gbpsswitches• Microbenchmark topologies
20send
ers
receiver
Incast topologyDumbbelltopology
send
ers
receivers
Evaluation• Macrobechmark topology
• Metrics:TCPRTT,lossrate,FlowCompletionTime(FCT)21
17serversattachedtoa10Gswitch.
ExperimentSetting(compared3schemes)• CUBIC• CUBICstackontopofstandardOVS
• DCTCP• DCTCPstackontopofstandardOVS
• AC⚡DC• CUBIC/Reno/Vegas/HighSpeed/IllinoisstacksontopofAC⚡DC
22
VM VM VM
CUBIC DCTCP Any
CUBIC DCTCP AC⚡DC
OVS OVS AC⚡DC
VMs
Hypervisor
TrackingWindowSize
23
RunningDCTCPstackontopofAC⚡DC,onlyoutputscalculatedRWNDwithoutenforcement.AC⚡DCcloselytracksthewindowsizeofDCTCP.
Dumbbelltopology
send
ers
receivers
Convergence
24
CUBIC DCTCP AC/DC
AC/DChascomparableconvergencepropertiesasDCTCPandisbetterthanCUBIC.
Dumbbelltopology
send
ers
receivers
AC⚡DCimprovesfairnesswhenVMsusedifferentCCs
25StandardOVS AC⚡DC
Dumbbelltopology
send
ers
receivers
Overhead(CPUandMemory)
26
Senderside
Lessthan1%additionalCPUoverheadcomparedwiththebaseline.Eachconnectionuses320bytestomaintainCCvariables(10kconnectionsuse3.2MB).
Dumbbelltopology
send
ers
receivers
TCPIncast RTT&droprate
27
50th percentileRTT 99.9th percentileRTT Packetdroprate
AC⚡DCtrackstheperformanceofDCTCPclosely.
send
ers
receiver
Incast topology
Flowcompletiontimewithtrace-drivenworkloads
28
Web-searchingworkload(DCTCP) Data-miningworkload(CONGA)
AC⚡DCobtainssameperformanceasDCTCP.AC⚡DCcanreduceFCTby36%-76%comparedwithdefaultCUBIC.
17serversattachedtoa10Gswitch.
Summary
• AC⚡DCallowsadministratorstoregaincontroloverarbitrarytenantTCPstacksbyenforcingcongestioncontrolinthevirtualswitch
• AC⚡DCrequiresnochangestoVMsornetworkhardware
• AC⚡DCisscalable,light-weight(<1%CPUoverhead)andflexible
29
Thanks!
30
BackupSlides
31
RelatedWork
• DCTCP• ECN-basedcongestioncontrolforDCNs
• TIMELY• Latency-basedcongestioncontrolforDCNs• AccuratelatencymeasurementprovidedbyaccurateNICtimestamps
• vCC• vCCandAC⚡DCarecloselyrelatedworksbytwoindependentteamsJ
32
ECNandnon-ECNCoexistence
33
SwitchconfiguredwithWRED/ECN
ECN
Non-ECN
Whenqueueoccupancyislargerthanmarkingthreshold,non-ECNpacketsaredropped
IPSec
• AC⚡DCisnotabletoinspecttheTCPheadersforIPSectraffic
• Mayperformapproximatingratelimitingbasedoncongestionfeedbackinformation.
34