TRANSCRIPT
![Page 1](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/1.jpg)
![Page 2](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/2.jpg)
Yibo Zhu, Monia Ghobadi, Jitendra Padhye (all Microsoft)
![Page 3](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/3.jpg)
![Page 4](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/4.jpg)
[Figure: Throughput (Gbps) vs. message size (4KB to 4MB); series: TCP]
Small messages: CPU is the bottleneck
Larger msgs: ~3 CPU cores are burnt by TCP
Sender Receiver
[Figure: Time to transfer 2KB (ms); bars: TCP, RDMA (read/write), RDMA (send)]
[Figure: CPU utilization (%) vs. message size (4KB to 4MB); series: TCP]
![Page 5](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/5.jpg)
RDMA bypasses host OS stack: frees host CPU, lowers latency
[Diagram: Sender (Application, Memory, NIC) and Receiver (Application, Memory, NIC)]
• Sender: Allocate Buffer A
• Receiver: Allocate Buffer B
• Write local buffer at address A to remote buffer at address B
• DMA (sender NIC, receiver NIC)
• Buffer B is filled
![Page 6](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/6.jpg)
RDMA single thread ~40Gbps
RDMA CPU ~0%
RDMA latency 1~2 μs
[Figure: Time to transfer 2KB (ms); bars: TCP, RDMA (read/write), RDMA (send)]
[Figure: CPU utilization (%) vs. message size (4KB to 4MB); series: TCP, RDMA]
[Figure: Throughput (Gbps) vs. message size (4KB to 4MB); series: TCP, RDMA]
![Page 7](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/7.jpg)
• Problem:
• Solution:
![Page 8](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/8.jpg)
Enter DCQCN and TIMELY: Congestion Control for RoCEv2
• DCQCN: ECN
• TIMELY: Delay
![Page 9](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/9.jpg)
![Page 10](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/10.jpg)
![Page 11](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/11.jpg)
Takeaway:
DCQCN is a little too complicated
![Page 12](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/12.jpg)
DCQCN model matches simulations and implementation
TIMELY model matches simulations
![Page 13](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/13.jpg)
• Stability
• Rate of convergence
• Fairness
• High utilization
• Low flow completion time
![Page 14](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/14.jpg)
We don’t have an intuitive explanation
![Page 15](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/15.jpg)
![Page 16](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/16.jpg)
Load factor = 0.8
![Page 17](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/17.jpg)
![Page 18](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/18.jpg)
• Feedback is delayed as queue builds up
![Page 19](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/19.jpg)
Marking threshold = 4 packets
• T0, Q = 2: Blue packet is about to arrive
• T1, Q = 3: Blue packet arrival complete
• T2, Q = 4: Blue packet ready to depart … and is marked, reflecting state of queue at T2
![Page 20](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/20.jpg)
• T0, Q = 2: Blue packet is about to arrive
• T1, Q = 3: Blue packet arrival complete … timer starts
• T2, Q = 4: Blue packet ready to depart … and reflects state of queue at T0
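The two walkthroughs above differ only in which instant of the queue the feedback describes. A minimal sketch of that difference, assuming a FIFO queue, a fixed per-packet service time, and that Q counts the blue packet once it has arrived. The function name, its parameters, and the 1 µs service time are illustrative and not from the slides:

```python
def feedback_for_tagged_packet(queue_before_arrival, arrivals_while_waiting,
                               mark_threshold, service_time_us):
    """Return (ecn_marked, queue_at_departure, queueing_delay_us) for one packet."""
    # ECN: the switch decides at departure, so it sees the tagged packet at
    # the head of the queue plus everything that arrived behind it.
    queue_at_departure = 1 + arrivals_while_waiting
    ecn_marked = queue_at_departure >= mark_threshold

    # Delay: the packet waits only for the packets that were ahead of it
    # when it arrived, so the measured delay encodes the older queue length.
    queueing_delay_us = queue_before_arrival * service_time_us
    return ecn_marked, queue_at_departure, queueing_delay_us


# Numbers from the slides: Q = 2 before arrival, Q = 4 at departure
# (the blue packet plus 3 later arrivals), marking threshold = 4 packets.
marked, q_depart, delay_us = feedback_for_tagged_packet(
    queue_before_arrival=2, arrivals_while_waiting=3,
    mark_threshold=4, service_time_us=1.0)
print(marked, q_depart, delay_us)
```

In this model the mark is decided from the queue of 4 at T2, while the delay sample corresponds to the 2 packets that were queued at T0.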
![Page 21](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/21.jpg)
• Delay inherently reports “stale” information
• The staleness is affected by queue length!
• Longer queue → more stale feedback
• This can lead to instability
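As a rough worked example of the bullets above: under a FIFO model, a delay sample describes the queue as it was when the packet arrived, which is roughly one queue-drain time before the packet departs. The sketch below computes that drain time. The function name and the example queue sizes are assumptions for illustration, and KB is taken as 1000 bytes:

```python
def feedback_age_us(queue_bytes, link_gbps):
    """Time to drain queue_bytes at link_gbps, in microseconds."""
    bits = queue_bytes * 8
    bits_per_us = link_gbps * 1000  # 1 Gbps = 1000 bits per microsecond
    return bits / bits_per_us


for queue_kb in (10, 50, 100):
    age = feedback_age_us(queue_kb * 1000, link_gbps=10)
    print(f"{queue_kb:>3} KB queue at 10 Gbps: sample is about {age:.0f} us old")
```

Doubling the queue doubles the age of the sample, so the delay in the control loop grows as the queue builds up.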
![Page 22](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/22.jpg)
• Can have fixed queue or fairness – but not both!
![Page 23](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/23.jpg)
Bottleneck queue is a function of number of flows.
[Figures: DCQCN (40Gbps link); TIMELY (10Gbps link)]
![Page 24](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/24.jpg)
![Page 25](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/25.jpg)
DCQCN with RED-like marking
DCQCN with PI-like marking
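For readers unfamiliar with the two marking schemes named above, here is a generic sketch of each: a RED-style curve that maps the current queue length to a marking probability, and a textbook discrete PI update that keeps adjusting the marking probability until the queue sits at a reference length. All names, thresholds, and gains are hypothetical. They are not the settings behind the plots on this slide.

```python
def red_mark_probability(queue_kb, k_min_kb, k_max_kb, p_max):
    """RED-style curve: 0 up to k_min, linear up to p_max at k_max, 1 beyond."""
    if queue_kb <= k_min_kb:
        return 0.0
    if queue_kb > k_max_kb:
        return 1.0
    return p_max * (queue_kb - k_min_kb) / (k_max_kb - k_min_kb)


class PIMarker:
    """Discrete PI controller that steers the queue toward q_ref_kb."""

    def __init__(self, q_ref_kb, a, b):
        self.q_ref_kb = q_ref_kb
        self.a = a  # gain on the current queue error
        self.b = b  # gain on the previous queue error (assumes a > b > 0)
        self.prev_error = 0.0
        self.p = 0.0

    def update(self, queue_kb):
        error = queue_kb - self.q_ref_kb
        # p keeps moving while the queue stays away from q_ref_kb, so the
        # only resting point of this update is queue == q_ref_kb.
        self.p += self.a * error - self.b * self.prev_error
        self.p = min(1.0, max(0.0, self.p))  # keep p a valid probability
        self.prev_error = error
        return self.p


print(red_mark_probability(100, k_min_kb=5, k_max_kb=200, p_max=0.01))
pi = PIMarker(q_ref_kb=50, a=0.002, b=0.001)
print([round(pi.update(q), 3) for q in (60, 60, 50, 50)])
```

With the RED-style curve the marking probability is tied to the queue length, so a load that needs a different marking rate settles at a different queue. The PI update has no such tie, because it only stops changing the probability once the queue is back at the reference.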
![Page 26](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/26.jpg)
![Page 27](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/27.jpg)
![Page 28](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/28.jpg)
• Can have fixed queue or fairness – but not both!
• ECN marking is resistant to feedback jitter
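One way to read the second bullet: a delay-based sender infers queue length from measured RTT, so jitter in the feedback path shows up directly as error in that estimate. An ECN mark is a bit set at the switch, and jitter changes only when it arrives, not what it says. A small arithmetic sketch under those assumptions, where the function name and the 10 µs jitter figure are illustrative and not from the slides:

```python
def queue_estimate_error_bytes(jitter_us, link_gbps):
    """Bytes of apparent queue that jitter_us of extra RTT looks like."""
    bits_per_us = link_gbps * 1000  # 1 Gbps = 1000 bits per microsecond
    return jitter_us * bits_per_us / 8


for gbps in (10, 40):
    err_kb = queue_estimate_error_bytes(10, gbps) / 1000
    print(f"10 us of jitter at {gbps} Gbps looks like {err_kb:.1f} KB of queue")
```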
![Page 29](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/29.jpg)
[Figure: Queue (KB) vs. Time (s), 0 to 0.2 s; series: TIMELY, DCQCN]
![Page 30](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/30.jpg)
![Page 31](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/31.jpg)
![Page 32](https://reader033.vdocuments.us/reader033/viewer/2022041920/5e6ba528c5223720de76942d/html5/thumbnails/32.jpg)