![Page 1: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/1.jpg)
Re-architecting Congestion Management in Lossless Ethernet
Wenxue Cheng, Kun Qian, Wanchun Jiang (CSU), Tong Zhang, Fengyuan Ren
NNS group @ Department of Computer Science and Technology, Tsinghua University
![Page 2: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/2.jpg)
Data Center Networks
Large Scale: > 10000 machines
Short Messages: < 1 KB
Special Protocols: RDMA
Small RTT: < 100 μs
High Bandwidth: 10/40 ~ 100/400 Gbps
Shallow Buffer: < 30 MB for ToR
Packet Loss ↑
Performance ↓
![Page 3: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/3.jpg)
Data Center Networks
Small RTT: < 100 μs
High Bandwidth: 10/40 ~ 100/400 Gbps
Shallow Buffer: < 30 MB for ToR
Large Scale: > 10000 machines
Short Messages: < 1 KB
Special Protocols: RDMA
Lossless Ethernet
![Page 4: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/4.jpg)
Priority-based Flow Control (PFC)
XOFF XON
Upstream Port(Sender)
Downstream Port(Receiver)
![Page 5: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/5.jpg)
PAUSE
Priority-based Flow Control (PFC)
PAUSE
XOFF XON
Upstream Port(Sender)
Downstream Port(Receiver)
![Page 6: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/6.jpg)
Priority-based Flow Control (PFC)
PAUSE
XOFF XON
Upstream Port(Sender)
Downstream Port(Receiver)
Congestion Spreading
![Page 7: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/7.jpg)
RESUME RESUME
Priority-based Flow Control (PFC)
XOFF XON
Upstream Port(Sender)
Downstream Port(Receiver)
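The XOFF/XON behaviour shown on the PFC slides can be sketched as a small queue model. This is only an illustration: the class name `PfcIngressQueue`, the packet-count thresholds, and the `frames` log are assumptions made for the example, not part of the talk or of any switch API.

```python
from dataclasses import dataclass, field


@dataclass
class PfcIngressQueue:
    """Hypothetical model of a downstream (receiver) port queue."""
    xoff: int  # queue length (packets) at or above which PAUSE is sent
    xon: int   # queue length (packets) at or below which RESUME is sent
    length: int = 0
    paused_upstream: bool = False
    frames: list = field(default_factory=list)  # control frames sent upstream

    def enqueue(self) -> None:
        self.length += 1
        # Crossing XOFF tells the upstream port to stop sending.
        if not self.paused_upstream and self.length >= self.xoff:
            self.paused_upstream = True
            self.frames.append("PAUSE")

    def dequeue(self) -> None:
        if self.length == 0:
            return
        self.length -= 1
        # Draining to XON lets the upstream port send again.
        if self.paused_upstream and self.length <= self.xon:
            self.paused_upstream = False
            self.frames.append("RESUME")


q = PfcIngressQueue(xoff=4, xon=2)
for _ in range(4):
    q.enqueue()
for _ in range(2):
    q.dequeue()
print(q.frames)  # ['PAUSE', 'RESUME']
```

While `paused_upstream` is set, the upstream port holds all traffic of that priority, which is what lets congestion spread hop by hop.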
![Page 8: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/8.jpg)
PFC Issues: Congestion Spreading & Head-of-Line Blocking
[Figure: network topology; labels: F0, F1, Burst, H0, H1, H2, …, H15, R0, R1, S0, S1, P0, P1, P2]
Congestion tree from P2 to H0 and H1.
F0 is a victim flow.
![Page 9: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/9.jpg)
Congestion Control Schemes
[Figure: network topology; labels: F0, F1, Burst, H0, H1, H2, …, H15, R0, R1, S0, S1, P0, P1, P2]
Congestion control schemes are needed, e.g. QCN [IEEE 802.1], DCQCN [RoCEv2] and TIMELY [SIGCOMM 2015].
Congestion Notification; Rate Adjustment
[Plot: Rate versus Time]
![Page 10: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/10.jpg)
Experimental Observation
[Figure: timeline for H0, H1, H2, …, H15; x-axis: Time (ms), 0 to 3]
16 Messages of 64KB
![Page 11: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/11.jpg)
Experimental Observation
(1) Congestion spreading still exists.
Sending Rate of F1
Evolution-based rate decrease is slower than PFC’s effect.
![Page 12: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/12.jpg)
Experimental Observation
Sending Rate of F0 (Gbps)
(1) Congestion spreading still exists.
(2) F0 is also victimized by CC.
PFC interferes with the congestion detection of congestion control schemes.
![Page 13: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/13.jpg)
Experimental Observation
(1) Congestion spreading still exists.
(2) F0 is also victimized by CC.
(3) Rate recovery does not adapt to dynamic network conditions.
Linear rate increase method and tuning parameters.
![Page 14: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/14.jpg)
Basic Idea
Re-architecting Congestion Management
Congestion Detection
• Congested Flows ⟷ Victim Flows
Rate Adjustment
• Fast Rate Decrease
• Automatic Rate Increase
![Page 15: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/15.jpg)
Congestion Detection
Real-Congestion (P2): ∑R > C
Non-Congestion (P3): ∑R < C
Quasi-Congestion (P0): ∑R ? C
[Figure labels: PAUSE, RESUME]
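The three port states on this slide can be written as a small decision function. This is a sketch under stated assumptions: `classify_port` and the `pause_seen` flag are hypothetical names, and treating ∑R = C as non-congestion is a choice made here because the slide only gives the strict inequalities.

```python
def classify_port(input_rates_gbps, capacity_gbps, pause_seen):
    """Label an egress port using the three cases on the slide.

    pause_seen is a hypothetical flag: True if the port was paused by PFC
    during the observation window, in which case its queue build-up says
    nothing about whether sum(R) exceeds C.
    """
    if pause_seen:
        return "quasi-congestion"
    if sum(input_rates_gbps) > capacity_gbps:
        return "real-congestion"
    # Assumption: sum(R) == C is treated as non-congestion.
    return "non-congestion"


print(classify_port([20.0, 40.0], 40.0, pause_seen=False))  # real-congestion
print(classify_port([10.0, 10.0], 40.0, pause_seen=False))  # non-congestion
print(classify_port([20.0, 40.0], 40.0, pause_seen=True))   # quasi-congestion
```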
![Page 16: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/16.jpg)
Congestion Detection
Quasi-Congestion (P0): ECN bits 1 1 1 1
[Figure labels: PAUSE, RESUME]
Real-Congestion (P2): ECN bits 1 1 1 1 1 1
Explicit Congestion Notification (ECN)
• Only based on queue length
• Fails to distinguish quasi-congestion from real-congestion
![Page 17: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/17.jpg)
Congestion Detection
Non-Paused ECN (NP-ECN)
• Don't change ECN for packets that have been paused
• Counter PN: number of packets that have been paused
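The PN counter can be sketched as follows. The slides define PN only as the number of packets that have been paused, so several details here are assumptions: PN is set to the queue length when RESUME arrives, the ECN decision for non-paused packets looks at how many packets are still waiting behind the dequeued one, and the default threshold of 0 is illustrative. The class and method names are hypothetical.

```python
from collections import deque


class NpEcnEgressQueue:
    """Hypothetical egress queue applying NP-ECN marking at dequeue time."""

    def __init__(self, ecn_threshold=0):
        self.queue = deque()
        self.pn = 0            # packets still in the queue that were paused
        self.paused = False
        self.ecn_threshold = ecn_threshold

    def enqueue(self, pkt_id):
        self.queue.append(pkt_id)

    def on_pause(self):
        self.paused = True

    def on_resume(self):
        self.paused = False
        # Assumption: every packet queued at RESUME time counts as paused.
        self.pn = len(self.queue)

    def dequeue(self):
        """Return (pkt_id, ecn_bit), or None when paused or empty."""
        if self.paused or not self.queue:
            return None
        pkt = self.queue.popleft()
        if self.pn > 0:
            self.pn -= 1
            return pkt, 0      # paused packet: leave ECN untouched
        # Non-paused packet: ordinary queue-length-based marking.
        ecn = 1 if len(self.queue) > self.ecn_threshold else 0
        return pkt, ecn


q = NpEcnEgressQueue()
q.on_pause()
for pkt in ("a", "b", "c"):
    q.enqueue(pkt)             # these three wait out the PAUSE
q.on_resume()                  # PN = 3
for pkt in ("d", "e"):
    q.enqueue(pkt)             # arrive after RESUME
marks = [q.dequeue()[1] for _ in range(5)]
print(marks)  # [0, 0, 0, 1, 0]
```

The three paused packets leave unmarked while PN counts down from 3 to 0, and only a later packet that finds a standing queue gets marked.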
![Page 18: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/18.jpg)
Congestion Detection
Quasi-Congestion (P0)
Real-Congestion (P2)
Non-Paused ECN (NP-ECN)
• Don't change ECN for packets that have been paused
• Counter PN: number of packets that have been paused
[Figure: PAUSE, RESUME; PN=3, PN=2, PN=1, PN=0; ECN bits 0 0 0]
![Page 19: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/19.jpg)
Congestion Detection
Quasi-Congestion (P0)
Real-Congestion (P2)
Non-Paused ECN (NP-ECN)
• Don't change ECN for packets that have been paused
• Counter PN: number of packets that have been paused
[Figure: PAUSE, RESUME; PN=3, PN=2, PN=1, PN=0; ECN bits 1 0 0 0]
![Page 20: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/20.jpg)
Congestion Detection
Quasi-Congestion (P0)
Real-Congestion (P2)
Non-Paused ECN (NP-ECN)
• Don't change ECN for packets that have been paused
• Counter PN: number of packets that have been paused
[Figure: PAUSE, RESUME; PN=3, PN=2, PN=1; ECN bits 0 0]
![Page 21: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/21.jpg)
Congestion Detection
Quasi-Congestion (P0): ECN bits 0 0 1 0
[Figure labels: PAUSE, RESUME]
Real-Congestion (P2): ECN bits 1 1 1 1 1 1
Partially marked with ECN
Non-Paused ECN (NP-ECN)
• Don't change ECN for packets that have been paused
• Counter PN: number of packets that have been paused
![Page 22: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/22.jpg)
Congestion Detection
Quasi-Congestion (P0): ECN bits 0 0 1 0
[Figure labels: PAUSE, RESUME]
Real-Congestion (P2): ECN bits 1 1 1 1 1 1, PN=0
Non-Paused ECN (NP-ECN)
• Don't change ECN for packets that have been paused
• Counter PN: number of packets that have been paused
![Page 23: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/23.jpg)
Congestion Detection
Quasi-Congestion (P0): ECN bits 0 0 1 0
[Figure labels: PAUSE, RESUME]
Real-Congestion (P2): ECN bits 1 1 1 1 1 1
Continuously marked with ECN
Non-Paused ECN (NP-ECN)
• Don't change ECN for packets that have been paused
• Counter PN: number of packets that have been paused
![Page 24: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/24.jpg)
Congestion Detection
Quasi-Congestion (P0): ECN bits 0 0 1 0
[Figure labels: PAUSE, RESUME]
Real-Congestion (P2): ECN bits 1 1 1 1 1 1
Non-Paused ECN (NP-ECN)
• Don't change ECN for packets that have been paused
• Counter PN: number of packets that have been paused
Partially marked with ECN: Victim Flows
Continuously marked with ECN: Congested Flows
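A receiver could separate the two classes by looking at what fraction of a flow's packets carry an ECN mark within one observation window. The sketch below is illustrative only: `classify_flow` is a hypothetical helper, and the `congested_fraction` cut-off of 0.95 is an assumed stand-in for "continuously marked", not a value given on the slides.

```python
def classify_flow(ecn_bits, congested_fraction=0.95):
    """Classify a flow from the ECN bits of packets seen in one window.

    ecn_bits: list of 0/1 values, one per received packet.
    congested_fraction: assumed cut-off for "continuously marked".
    """
    if not ecn_bits:
        return "no-data"
    marked = sum(ecn_bits) / len(ecn_bits)
    if marked >= congested_fraction:
        return "congested"       # continuously marked with ECN
    if marked > 0:
        return "victim"          # only partially marked with ECN
    return "non-congested"       # no marks at all


print(classify_flow([0, 0, 1, 0]))        # victim
print(classify_flow([1, 1, 1, 1, 1, 1]))  # congested
print(classify_flow([0, 0, 0, 0]))        # non-congested
```

The first two inputs are the bit patterns drawn on the slide for the quasi-congested port P0 and the really congested port P2.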
![Page 25: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/25.jpg)
Rate Adjustment
How to adjust the rates of
• Congested Flows → target?
• Victim Flows → no decrease?
• Non-congested Flows
Burst = 40 Gbps, F0 = 20 Gbps, reduce F1's rate
![Page 26: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/26.jpg)
F0 = 20 Gbps, reduce F1's rate
Rate Adjustment
How to adjust the rates of
• Congested Flows → reduce to receiving rate immediately
• Victim Flows & Non-congested Flows → rate increase
![Page 27: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/27.jpg)
Rate Adjustment
How to adjust the rates of
• Congested Flows → reduce to receiving rate immediately
• Victim Flows & Non-congested Flows → rate increase
Receiver-Driven Rate Decrease
• $sendRate \leftarrow \min\{sendRate,\ (1 - w_{min}) \cdot recRate\}$
• No PFC & no serious throughput loss & 1 control loop
![Page 28: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/28.jpg)
Rate Adjustment
How to adjust the rates of
• Congested Flows → reduce to receiving rate immediately
• Victim Flows & Non-congested Flows → rate increase
Receiver-Driven Rate Decrease
• $sendRate \leftarrow \min\{sendRate,\ (1 - w_{min}) \cdot recRate\}$
• No congestion & no PFC triggers in one control loop
Self-weighted Rate Increase
• $sendRate \leftarrow sendRate \cdot (1 - w) + MaxRate \cdot w$, then $w \leftarrow w \cdot (1 - w) + w_{max} \cdot w$
• Automatic gentle-to-aggressive
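The two update rules on this slide can be exercised numerically with a short sketch. The class name `PcnSenderRate`, the values `w_min=0.01` and `w_max=0.5`, and the reset of `w` to `w_min` after a decrease are assumptions made for illustration; the slide gives only the two formulas.

```python
class PcnSenderRate:
    """Hypothetical sender-side state for the two rate rules on the slide."""

    def __init__(self, max_rate, w_min=0.01, w_max=0.5):
        self.max_rate = max_rate   # MaxRate, e.g. line rate in Gbps
        self.w_min = w_min         # illustrative value, not from the slides
        self.w_max = w_max         # illustrative value, not from the slides
        self.send_rate = max_rate
        self.w = w_min

    def on_congested(self, rec_rate):
        # Receiver-driven decrease: drop just below the measured receive rate.
        self.send_rate = min(self.send_rate, (1 - self.w_min) * rec_rate)
        self.w = self.w_min        # assumption: restart with a gentle weight

    def on_not_congested(self):
        # Self-weighted increase: move toward MaxRate by weight w,
        # then let w itself grow toward w_max.
        self.send_rate = self.send_rate * (1 - self.w) + self.max_rate * self.w
        self.w = self.w * (1 - self.w) + self.w_max * self.w


s = PcnSenderRate(max_rate=40.0)
s.on_congested(rec_rate=20.0)
rate_after_decrease = s.send_rate      # about 19.8 Gbps

rates = [s.send_rate]
for _ in range(5):
    s.on_not_congested()
    rates.append(s.send_rate)
steps = [b - a for a, b in zip(rates, rates[1:])]
print(round(rate_after_decrease, 3), [round(x, 3) for x in steps])
```

Because `w` starts small and grows after every uncongested period, the early steps are small and later ones larger, which is the gentle-to-aggressive behaviour the slide describes; the rate is always a weighted average of its old value and `MaxRate`, so it never overshoots `MaxRate`.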
![Page 29: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/29.jpg)
Photonic Congestion Notification (PCN)
Sender
• Receiver-driven Rate Decrease
• Self-weighted Rate Increase
Switches
• NP-ECN
Receiver
• Identify Congested Flows
• Rate Estimator
Congestion Notification Packet (CNP), Period T
![Page 30: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/30.jpg)
PCN’s Benefit
[Figure: timeline for H0, H1, H2, …, H15; x-axis: Time (ms), 0 to 3]
16 Messages of 64KB
No congestion tree
No serious throughput loss
![Page 31: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/31.jpg)
Benefit
[Figure: timeline for H0, H1, H2, …, H15; x-axis: Time (ms), 0 to 3]
16 Messages of 64KB
[Plots: Sending Rate of F0 (Gbps); Sending Rate of F1 (Gbps)]
F1 is reduced in one loop
F0 is not victimized by PCN
![Page 32: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/32.jpg)
Evaluation Setup
Testbed Setup
• Dumbbell topology
• Implementation on DPDK (Intel 82599)
• 4 hosts (PowerEdge R530) connected to single ToR
• 10Gbps
NS-3 Simulation Setup
• Clos topology
• 512 hosts / 32 ToRs / 16 Leafs / 8 Spines
• 10Gbps / 40Gbps
![Page 33: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/33.jpg)
Evaluations
Basic Properties
• Convergence
• Fairness
• Stability
Workbench
• Burst Tolerance
• Parameter Sensitivity
• Realistic Workloads
Special Cases
• Flow Scalability
• Adversarial Traffic
• Multiple Bottlenecks
• Multiple Priorities
• Deadlock
Testbed
NS-3 Simulations
![Page 34: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/34.jpg)
Evaluation: Large-Scale Simulations
Simulation Setup
W1: Web-server workload
W2: Hadoop cluster workload
[Figure: topology from Pod 0 to Pod 7, 512 hosts]
![Page 35: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/35.jpg)
Evaluation: Large-Scale Simulations
Web-server Workload
[Plots: PAUSE Rate (Mbps); Flow Completion Time (ms); Flow Completion Rate (Kps)]
![Page 36: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/36.jpg)
Evaluation: Large-Scale Simulations
Hadoop Workload
[Plots: PAUSE Rate (Mbps); Flow Completion Time (ms); Flow Completion Rate (Kps)]
![Page 37: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME](https://reader036.vdocuments.us/reader036/viewer/2022062602/5f01f5607e708231d401dff8/html5/thumbnails/37.jpg)
Conclusion
Evaluations on the testbed and in ns-3 simulations show that PCN triggers fewer PFC PAUSEs and achieves lower flow completion time.
Re-architecting congestion management
Proposing Photonic Congestion Notification (PCN)
• NP-ECN → victim flows / congested flows
• Receiver-driven rate decrease → no PFC in 1 loop
• Automatic rate increase