![Page 1: EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud Vimal Mohammad Alizadeh Balaji Prabhakar David Mazières Changhoon](https://reader035.vdocuments.us/reader035/viewer/2022062404/551c1ab65503469e4f8b584b/html5/thumbnails/1.jpg)
EyeQ:(An engineer’s approach to)
Taming network performance unpredictability in the Cloud
VimalMohammad Alizadeh
Balaji PrabhakarDavid Mazières
Changhoon KimAlbert Greenberg
![Page 2: EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud Vimal Mohammad Alizadeh Balaji Prabhakar David Mazières Changhoon](https://reader035.vdocuments.us/reader035/viewer/2022062404/551c1ab65503469e4f8b584b/html5/thumbnails/2.jpg)
2
What are we depending on?
http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
5 Lessons We’ve Learned Using AWS
… in the Netflix data centers, we have a high capacity, super fast, highly reliablenetwork. This has afforded us the luxury of designing around chatty APIs to remote systems. AWS networking has more variable latency.
Overhaul appsto deal with variability
Many customersdon’t even realise network issues:
Just “spin up more VMs!”Makes app more network dep.
![Page 3: EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud Vimal Mohammad Alizadeh Balaji Prabhakar David Mazières Changhoon](https://reader035.vdocuments.us/reader035/viewer/2022062404/551c1ab65503469e4f8b584b/html5/thumbnails/3.jpg)
3
Cloud: Warehouse Scale ComputerMulti-tenancy: To increase cluster utilisation
6/11/12
http://research.google.com/people/jeff/latency.html
Provisioning the WarehouseCPU, memory, disk
Network
![Page 4: EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud Vimal Mohammad Alizadeh Balaji Prabhakar David Mazières Changhoon](https://reader035.vdocuments.us/reader035/viewer/2022062404/551c1ab65503469e4f8b584b/html5/thumbnails/4.jpg)
4
Sharing the Network
• Policy– Sharing model
• Mechanism– Computing rates– Enforcing rates on entities…• Per-VM (multi-tenant)• Per-service (search, map-reduce, etc.)
6/11/12
Can we achieve this?
2Ghz VCPU15GB memory1Gbps network
Tenant X’s Virtual Switch
VM1 VM2 VMnVM3 …
Tenant Y’s Virtual Switch
VM1 VM2 VMiVM3 …
Customer X specifiesthe thickness of each pipe.No traffic matrix.(Hose Model)
![Page 5: EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud Vimal Mohammad Alizadeh Balaji Prabhakar David Mazières Changhoon](https://reader035.vdocuments.us/reader035/viewer/2022062404/551c1ab65503469e4f8b584b/html5/thumbnails/5.jpg)
5
Why is it hard? (1)
• Bandwidth demands can be…– Random, bursty– Short: few millisecond requests
• Timescales matter!– Need guarantees on the order of few RTTs (ms)
6/11/12
• Default policy insufficient: 1 vs many TCP flows, UDP, etc.• Poor scalability of traditional QoS mechanisms
10–100KB 10–100MB
![Page 6: EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud Vimal Mohammad Alizadeh Balaji Prabhakar David Mazières Changhoon](https://reader035.vdocuments.us/reader035/viewer/2022062404/551c1ab65503469e4f8b584b/html5/thumbnails/6.jpg)
6
Seconds: Eternity
6/11/12
Switch
1 Long livedTCP flow
Bursty UDP sessionON: 5msOFF: 15ms
Shared10G pipe
![Page 7: EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud Vimal Mohammad Alizadeh Balaji Prabhakar David Mazières Changhoon](https://reader035.vdocuments.us/reader035/viewer/2022062404/551c1ab65503469e4f8b584b/html5/thumbnails/7.jpg)
7
Under the hood
6/11/12
Switch
![Page 8: EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud Vimal Mohammad Alizadeh Balaji Prabhakar David Mazières Changhoon](https://reader035.vdocuments.us/reader035/viewer/2022062404/551c1ab65503469e4f8b584b/html5/thumbnails/8.jpg)
8
Why is it hard? (2)
6/11/12
Switch
• Switch sees contention, but lacks VM state• Receiver-host has VM state, but does not see contention
(1) Drops in network: servers don’t see true demand
(2) Elusive TCP (back-off) makes true demand detection harder
![Page 9: EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud Vimal Mohammad Alizadeh Balaji Prabhakar David Mazières Changhoon](https://reader035.vdocuments.us/reader035/viewer/2022062404/551c1ab65503469e4f8b584b/html5/thumbnails/9.jpg)
9
Key Idea: Bandwidth Headroom• Bandwidth guarantees: managing congestion• Congestion: link util reaches 100%
– At millisecond timescales• Don’t allow 100% util
– 10% headroom: Early detection at receiver
6/11/12
N x 10G
UDP
TCP
Shared pipeLimit to 9G
Single Switch: Headroom
What about a network?
![Page 10: EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud Vimal Mohammad Alizadeh Balaji Prabhakar David Mazières Changhoon](https://reader035.vdocuments.us/reader035/viewer/2022062404/551c1ab65503469e4f8b584b/html5/thumbnails/10.jpg)
10
Network design: the old
6/11/12
http://bradhedlund.com/2012/04/30/network-that-doesnt-suck-for-cloud-and-big-data-interop-2012-session-
teaser/
Over-subscription
![Page 11: EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud Vimal Mohammad Alizadeh Balaji Prabhakar David Mazières Changhoon](https://reader035.vdocuments.us/reader035/viewer/2022062404/551c1ab65503469e4f8b584b/html5/thumbnails/11.jpg)
11
Network design: the new
6/11/12
http://bradhedlund.com/2012/04/30/network-that-doesnt-suck-for-cloud-and-big-data-interop-2012-session-
teaser/
(1) Uniform capacity across racks
(2) Over-subscription only atTop-of-Rack
![Page 12: EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud Vimal Mohammad Alizadeh Balaji Prabhakar David Mazières Changhoon](https://reader035.vdocuments.us/reader035/viewer/2022062404/551c1ab65503469e4f8b584b/html5/thumbnails/12.jpg)
12
Mitigating Congestion in a Network
6/11/12
Load balancing + Admissibility =Hotspot free network core
[VL2, FatTree, Hedera, MicroTE]
Aggregate rate > 10GbpsFabric gets congested
Server
VM
10Gbps pipe
Fabric
Aggregate rate < 10GbpsCongestion free Fabric
Server
VM
10Gbps pipe
FabricLoad balancing: ECMP, etc.
Admissibility: e2e congestion control (EyeQ)
![Page 13: EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud Vimal Mohammad Alizadeh Balaji Prabhakar David Mazières Changhoon](https://reader035.vdocuments.us/reader035/viewer/2022062404/551c1ab65503469e4f8b584b/html5/thumbnails/13.jpg)
13
EyeQ Platform
6/11/12
TX packets
VMVM
TX
VM
Software VSwitchAdaptive Rate
Limiters
untrusted
RX
3Gbps6Gbps
RX packets
Software VSwitch
VM
Congestion Detectors
untrusted VM
RX componentdetects
TX componentreacts
End-to-endflow control
(VSwitch—VSwitch)
DataCentreFabric
Congestion Feedback
![Page 14: EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud Vimal Mohammad Alizadeh Balaji Prabhakar David Mazières Changhoon](https://reader035.vdocuments.us/reader035/viewer/2022062404/551c1ab65503469e4f8b584b/html5/thumbnails/14.jpg)
14
Does it work?
6/11/12
Without EyeQ With EyeQ
Improves utilisation
Provides protection
TCP: 6GbpsUDP: 3Gbps
![Page 15: EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud Vimal Mohammad Alizadeh Balaji Prabhakar David Mazières Changhoon](https://reader035.vdocuments.us/reader035/viewer/2022062404/551c1ab65503469e4f8b584b/html5/thumbnails/15.jpg)
15
State: only at edge
EyeQ
One Big Switch
![Page 16: EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud Vimal Mohammad Alizadeh Balaji Prabhakar David Mazières Changhoon](https://reader035.vdocuments.us/reader035/viewer/2022062404/551c1ab65503469e4f8b584b/html5/thumbnails/16.jpg)
16
EyeQ Load balancing+ Bandwidth headroom+ Admissibility at millisec timescales= Network as one big switch= Bandwidth sharing at edge
Linux, Windows implementation for 10Gbps~1700 lines C codehttp://github.com/jvimal/perfiso_10g (Linux kmod)No documentation, yet.
![Page 17: EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud Vimal Mohammad Alizadeh Balaji Prabhakar David Mazières Changhoon](https://reader035.vdocuments.us/reader035/viewer/2022062404/551c1ab65503469e4f8b584b/html5/thumbnails/17.jpg)
176/11/12