TRANSCRIPT
Fast and Cautious: Leveraging Multi-path Diversity for Transport Loss Recovery in Data Centers
Guo Chen, Yuanwei Lu, Yuan Meng, Bojie Li, Kun Tan, Dan Pei, Peng Cheng, Layong (Larry) Luo, Yongqiang Xiong, Xiaoliang Wang, and Youjian Zhao
Motivation
• Services care about the tail flow completion time (tail FCT)
  – A large number of flows is generated in each operation
  – Overall performance is governed by the last completed flows
[Figure: a large-scale web application hosted in a Data Center Network (DCN); many App Logic servers cooperate to respond to a single user request]
• But packet loss hurts the tail FCT
  – A real case in a Microsoft Azure DCN: a spine switch with a 2% random drop rate caused an increase in the 99th-percentile latency of all users
[Figure: DCN tail latency visualization [Pingmesh (SIGCOMM'15)]: (a) normal, (b) spine failure]
Outline
• Motivation
• Packet Loss in DCN
• Impact of Packet Loss
• Challenge for Loss Recovery
• FUSO Design
• Evaluation
• Summary
Packet Loss in DCN
• Loss characteristics, measured in a Microsoft production DCN during Dec. 1-5, 2015
[Figure: loss rate and location distribution of lossy links (loss rate > 1%): mean loss rate 4%; 78% of lossy links are above the ToR; the distribution is similar across the 5 days]
1) Loss happens frequently, even though the overall loss rate is low
2) Most losses happen inside the network rather than at the edge
Packet Loss in DCN
• Reasons causing loss
  – Congestion loss: bursty and transient
    · Uneven load balance
    · Incast
    · Greatly mitigated in modern DCNs (e.g., 1% -> 0.01%) [Jupiter Rising, SIGCOMM'15]
  – Failure loss: complex and hard to detect
    · Silent random drop
    · Packet black-hole
    · Still common, with a huge impact on performance [Pingmesh, SIGCOMM'15]
Impact of Packet Loss
• Why does loss hurt the tail?
• How badly does loss hurt?
How Does TCP Handle Loss?
• Fast recovery
  – Wait for a certain number of duplicate ACKs (DupACKs) to detect the loss, then retransmit
[Figure: sender/receiver timelines. Left (fast recovery): packets 1-2 are ACKed, packets 3-6 are sent, DupACKs for packet 3 arrive, and the sender retransmits packet 3 within about one extra RTT. Right (timeout): not enough DupACKs return, so packet 3 is retransmitted only after a timeout.]
• Timeout (RTO)
  – If not enough DupACKs return, retransmit only after a timeout
  – RTO >> RTT: e.g., RTO = 5 ms while RTT < 100 µs [Pingmesh (SIGCOMM'15), DCTCP (SIGCOMM'10)]
• Encountering even one RTO dramatically increases the FCT
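To see why, here is a back-of-the-envelope sketch in C (the 1 Gbps link rate and 100 KB flow size are borrowed from later slides; modeling the flow as a single serialized burst is my simplification):

```c
#include <stdio.h>

/* Cost of a single RTO with the talk's numbers: RTO = 5 ms,
 * RTT ~ 100 us, a 100 KB flow on a 1 Gbps link (link rate and flow
 * size taken from the testbed slides later in the deck). */
int main(void) {
    double flow_bits = 100e3 * 8;          /* 100 KB flow */
    double line_rate = 1e9;                /* 1 Gbps, bits per second */
    double rtt_us = 100.0, rto_us = 5000.0;

    double serialize_us = flow_bits / line_rate * 1e6;
    double lossless_us = rtt_us + serialize_us;  /* idealized FCT */
    printf("lossless FCT  ~ %.0f us\n", lossless_us);
    printf("FCT + one RTO ~ %.0f us (%.1fx worse)\n",
           lossless_us + rto_us, (lossless_us + rto_us) / lossless_us);
    return 0;
}
```

Under these assumptions, a single RTO turns a ~0.9 ms transfer into a ~5.9 ms one: roughly a 7x inflation from one timeout.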
Loss Incurs Timeout
[Figure: timeout probability of flows of different sizes (10 KB and 100 KB; testbed and analysis) traversing a path, versus the path's packet loss rate. Annotations: 3% loss → ~10% of flows time out; the 99th-percentile FCT exceeds the RTO.]
1. 1% loss → more than 1% of flows time out
2. For larger flows (e.g., 100 KB), the timeout ratio grows sharply once the loss rate exceeds 1%
• A little loss causes enough timeouts to hurt the tail FCT
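For intuition about why a little loss translates into so many timeouts, here is a deliberately simplified model; this is my own simplification, not the paper's exact analysis. Assume a flow times out if any of its last 3 packets is lost (too few DupACKs for fast recovery) or if a retransmission is lost again:

```c
#include <math.h>
#include <stdio.h>

/* Simplified timeout model (an illustrative assumption): a flow of n
 * packets with per-packet loss rate p times out if any of its last 3
 * packets is lost (not enough DupACKs to trigger fast recovery) or if
 * a retransmission itself is lost again. */
double p_timeout(int n, double p) {
    double tail_loss   = 1.0 - pow(1.0 - p, 3); /* last-3-packets term */
    double rexmit_loss = n * p * p;             /* lost-retransmit term */
    double pt = tail_loss + rexmit_loss;
    return pt > 1.0 ? 1.0 : pt;
}

int main(void) {
    /* ~7 packets for a 10 KB flow, ~67 for 100 KB (1.5 KB MSS assumed) */
    double rates[] = {0.01, 0.02, 0.03, 0.04};
    for (int i = 0; i < 4; i++)
        printf("p = %.0f%%: 10 KB -> %4.1f%%  100 KB -> %4.1f%%\n",
               rates[i] * 100, 100 * p_timeout(7, rates[i]),
               100 * p_timeout(67, rates[i]));
    return 0;
}
```

Even this crude model reproduces the slide's trend: percent-level loss rates already push the timeout probability into the percent-to-tens-of-percent range, with larger flows degrading faster.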
• To avoid the RTO, losses must be recovered before the timer fires
Challenge for TCP Loss Recovery
• Prior works add aggressiveness to congestion control to recover losses before the timeout (RTO)
  – Tail Loss Probe (TLP) [SIGCOMM'13, RFC 5827]
    · Transmit one probe after 2 RTT
  – TCP Instant Recovery (TCP-IR) [SIGCOMM'13, RFC 5827]
    · Generate an FEC packet for every group of packets (up to 16)
    · FEC packets also act as probes, delayed 1/4 RTT before being sent
  – Proactive / RepFlow [SIGCOMM'13, INFOCOM'14]
    · Duplicate every packet / every flow
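To make "added aggressiveness" concrete, here is a minimal TLP-style probe-timer sketch. The 2·RTT delay is from the slide; the structure, field names, and re-arming policy are my assumptions, not Linux's actual implementation:

```c
#include <stdbool.h>

/* Minimal TLP-style probe logic (illustrative sketch only): keep a
 * probe timer of ~2*SRTT armed while data is in flight. If it fires
 * before new ACKs arrive, retransmit the last unacknowledged segment
 * to elicit DupACK/SACK feedback well before the much larger RTO. */
struct conn {
    unsigned srtt_us;        /* smoothed RTT estimate, microseconds */
    unsigned probe_timer_us; /* time until the probe fires */
    bool     data_in_flight;
};

void arm_probe_timer(struct conn *c) {
    if (c->data_in_flight)
        c->probe_timer_us = 2 * c->srtt_us;  /* "one probe after 2 RTT" */
}

void on_probe_timer_fired(struct conn *c) {
    /* retransmit_last_unacked_segment(c);  -- hypothetical helper */
    arm_probe_timer(c);  /* re-arm in case the probe itself is lost */
}
```

Note that the extra transmission here happens regardless of whether the network is congested; this fixed aggressiveness is exactly the behavior the next slides question.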
Challenge for TCP Loss Recovery
• How long should the sender wait before sending recovery packets?
  – For congestion loss: it should delay long enough, or it risks worsening the congestion
    · Congestion loss is bursty and leads to multiple consecutive losses [Incast (WREN'09), DCTCP (SIGCOMM'10)]
  – For failure loss such as random drop: it should recover as fast as possible; otherwise the FCT has already increased
    · Waiting 2 RTT is too costly, and accurate, high-precision RTT measurement is challenging [TLP SIGCOMM'13, RFC 5827]
• How can we accelerate loss recovery as much as possible, under various loss conditions, without causing congestion?
Brief Summary
• Loss easily incurs timeouts that hurt the tail FCT
• To prevent timeouts, prior works add a fixed amount of aggressiveness to recover losses before the timeout
• This is hard to adapt to various loss conditions
  – Recovery should be fast for failure loss
  – Recovery should be cautious for congestion loss
FUSO: Fast Multi-path Loss Recovery
• Utilize the “good” paths to proactively conduct loss recovery for the “bad” paths
  – Leverages path diversity: there are multiple paths, and only a few encounter loss
• Fast and Cautious
  – Fast: proactive (immediate) recovery of potential packet loss, utilizing spare transmission opportunities
  – Cautious: strictly follow congestion control, without adding aggressiveness
Multi-path Transport Background
[Figure: a sender and receiver connected by sub-flows SF1-SF3, which are implicitly/explicitly mapped to physical paths. App data is distributed across the sub-flows, and multi-path congestion control maintains per-sub-flow windows CWND1-CWND3 under a coupled total window CWNDtotal.]
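A minimal sketch of that data-distribution step (the types and the first-fit policy are my assumptions; real multi-path schedulers are more sophisticated):

```c
#include <stddef.h>

/* Illustrative multi-path sender state (assumed types; not MPTCP's
 * actual structures): each sub-flow has its own congestion window,
 * and the scheduler may place the next packet on any sub-flow whose
 * window has room. CWNDtotal is coupled across sub-flows by the
 * multi-path congestion controller. */
struct subflow {
    unsigned cwnd;      /* per-sub-flow congestion window, in packets */
    unsigned inflight;  /* packets sent but not yet ACKed */
};

/* First-fit choice of a sub-flow with spare window for the next new
 * data packet; returns -1 when every window is full. */
int pick_subflow(const struct subflow *sf, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (sf[i].inflight < sf[i].cwnd)
            return (int)i;
    return -1;
}
```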
FUSO
[Animation (slides 21-43): the sender stripes new data P1-P5 across sub-flows SF1-SF3, each governed by its own window CWND1-CWND3 under the coupled CWNDtotal. P2 is lost on its sub-flow while the other packets are delivered and ACKed (ACK P3, ACK P1, ACK P4 & P5). The returning ACKs open spare CWND, and no new data is left to send. FUSO treats the sub-flow still holding un-ACKed data as the “worst” sub-flow (the likely loss victim) and uses the spare window of the “best” sub-flow to proactively retransmit P2. The recovery copy of P2 arrives at the receiver, and the transfer completes without a timeout. Done!]
Standard MPTCP
[Figure (slide 44): the same scenario under standard MPTCP. The lost P2 is retransmitted only after an RTO, even though other sub-flows had spare window, so the whole transfer is delayed by the timeout.]
FUSO: Path Selection
[Figure (slides 45-47): the sender ranks its sub-flows by their likelihood of having encountered loss.]
• “Worst” sub-flow
  – Has un-ACKed data
  – Most likely to have suffered loss
• “Best” sub-flow
  – Has spare CWND
  – Least likely to have suffered loss
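A sketch of this selection (the two criteria follow the slide; the loss_score field, its source, and the tie-breaking are my assumptions):

```c
/* Illustrative FUSO path ranking (the two criteria are from the
 * slide; the loss_score metric and tie-breaking are assumptions). */
struct fuso_subflow {
    unsigned cwnd, inflight;
    unsigned unacked;    /* un-ACKed (possibly lost) packets */
    double   loss_score; /* higher = more likely lossy (assumed metric) */
};

/* "Worst" sub-flow: holds un-ACKed data and is most likely lossy. */
int pick_worst(const struct fuso_subflow *sf, int n) {
    int w = -1;
    for (int i = 0; i < n; i++)
        if (sf[i].unacked > 0 &&
            (w < 0 || sf[i].loss_score > sf[w].loss_score))
            w = i;
    return w;
}

/* "Best" sub-flow: has spare CWND and is least likely lossy. */
int pick_best(const struct fuso_subflow *sf, int n) {
    int b = -1;
    for (int i = 0; i < n; i++)
        if (sf[i].inflight < sf[i].cwnd &&
            (b < 0 || sf[i].loss_score < sf[b].loss_score))
            b = i;
    return b;
}
```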
FUSO in 1 Slide
• If (there is spare CWND) && (there is no new data to send)
  – Utilize the transmission opportunity to proactively recover potential loss
  – Use “good” paths to help “bad” paths
• Multi-path diversity offers many such transmission opportunities
  – “Good” paths have spare window
[Figure: app data passes through multi-path congestion control into sub-flows 1..N. A sub-flow with spare window carries a recovery packet R for another sub-flow's un-ACKed data; the receiver uses R to recover the lost packet (e.g., P3 on sub-flow 2).]
FUSO Implementation
• Implemented in the Linux kernel; ~900 lines of code
• Source code: https://github.com/1989chenguo/FUSO
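Condensing the rule above into runnable form, here is a user-level sketch of FUSO's core trigger; this is not the actual ~900-line kernel patch, all helper names are hypothetical, and the best/worst indices are assumed to come from the selection logic sketched earlier:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real kernel machinery: */
bool has_new_app_data(void)          { return false; } /* app done sending */
void send_new_data_on(int sf)        { printf("new data -> SF%d\n", sf); }
void send_recovery_on(int sf, int p) { printf("recover P%d -> SF%d\n", p, sf); }

/* FUSO's core rule as stated on the slide: when congestion control
 * grants a transmission opportunity (spare CWND on a "best" sub-flow)
 * but there is no new app data, spend the opportunity on proactively
 * recovering the "worst" sub-flow's un-ACKed data. New data always
 * has priority, and nothing is sent beyond the CWND: cautious. */
void fuso_on_send_opportunity(int best, int worst, int oldest_unacked) {
    if (best < 0)
        return;                   /* no spare CWND anywhere: do nothing */
    if (has_new_app_data()) {
        send_new_data_on(best);   /* normal transmission comes first */
        return;
    }
    if (worst >= 0)               /* a sub-flow still has un-ACKed data */
        send_recovery_on(best, oldest_unacked);
}

int main(void) {
    /* e.g., SF3 (index 2) is "best", SF1 (index 0) is "worst",
     * and P2 is SF1's oldest un-ACKed packet */
    fuso_on_send_opportunity(2, 0, 2);
    return 0;
}
```

The design choice to gate recovery on "spare CWND && no new data" is what keeps FUSO cautious: recovery packets only ever use transmission opportunities that congestion control has already granted.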
Evaluation
Testbed Settings
• Network: 1 Gbps fabric and 1 Gbps hosts; ECMP routing; ECN enabled
• TCP: init_cwnd = 16; min_RTO = 5 ms
[The result figures below report the 99th-percentile FCT and the percentage of flows encountering timeout; lower is better.]
Testbed Results
• Failure loss (random drop); loss rates 0.125%-4%; latency-sensitive flows
[Figure: FUSO is fast. It reduces the 99th-percentile FCT by up to ~82.3% and reduces the number of timeout flows by up to 100% (lower is better).]
Testbed Results
• Congestion loss (incast); concurrent responses
[Figure: FUSO is cautious. It performs the best among the compared schemes (lower is better).]
Testbed Results
• Failure loss & congestion loss; workloads range from failure-loss-dominated to congestion-loss-dominated; loss rate 2%; latency-sensitive flows plus background long flows
[Figure: FUSO adapts to the various loss conditions (lower is better).]
Larger-scale Simulations
• Simulation settings
  – NS2 simulator; 3-layer, 4-port FatTree fabric
  – 40 Gbps fabric, 10 Gbps hosts; 64 hosts, 20 switches
  – Empirical failure generation
[Figure: latency-sensitive flows and background long flows under random failures (lower is better).]
• FUSO reduces the average FCT by up to ~60.3% and the 99th-percentile FCT by up to ~87.4%
Summary
• Loss hurts tail latency
  – Loss is not uncommon
  – A little loss leads to enough timeouts to hurt the tail
• Challenge for loss recovery
  – How to accelerate loss recovery under various loss conditions without causing congestion?
• Philosophy of FUSO
  – Being fast and being cautious are equally important
  – Fast: proactive loss recovery utilizing spare transmission opportunities, leveraging multi-path diversity
  – Cautious: strictly follows congestion control, without adding aggressiveness
Q&A?
Thanks