![Page 1: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/1.jpg)
FastandEfficientFlowandLossMeasurementforDataCenterNetworks
YuliangLiRui Miao Changhoon Kim Minlan Yu
1
![Page 2: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/2.jpg)
FlowRadar:Captureallflowsonafinetimescale
2
![Page 3: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/3.jpg)
Flowcoverage
• Weneedtoinspecteachindividualflow– Definedby5tuples:source,dest IPs,ports,protocol
3
Transientloops/blackholes Fine-grainedtrafficanalysis
0%
10%
20%
30%
40%
50%
60%
1 10 100 1000 10000
Distrib
ution
Size (Byte)
![Page 4: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/4.jpg)
Temporalcoverage
• Weneedflowinformationonafinetimescale
4
0
1
2
3
4
5
6
0 50 100
#Loss
Time (ms)
ShorttimescaleLossrates Timelyattackdetection
DoS
![Page 5: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/5.jpg)
Key insight: division of labor
• Goal:Monitoralltheflowsonafinetimescale
5
Overhead atthecollector
Overhead at theswitches
NetFlow
Mirroring
Limitedper-packetprocessingtimeLimitedmemory(10sofMB)
Collectorhaslimitedbandwidthandstorage
Needssampling
![Page 6: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/6.jpg)
Key insight: division of labor
• Goal:Monitoralltheflowsonafinetimescale
6
FlowRadar
Usefixedoperationsperpacketatswitches
Overhead atthecollector
Overhead at theswitches
NetFlow
Mirroring
Smalldatastructures
![Page 7: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/7.jpg)
FlowRadar architecture
7
Extractper-flowcountersacrossswitchesCollector
Switches
AnalysisApplications
Flows&counters
Compactflowcounterswithfixedper-packetoperations
Periodicreport
![Page 8: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/8.jpg)
Challenge:Howtohandlecollisions?
• Handling hashcollisionsishard• Large hash tablesà highmemoryusage• Linked list/Cuckoohashingàmultiple,non-constant#memoryaccesses
8
Flow a b cPacketCount … … …
Flowd
d1
collision
![Page 9: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/9.jpg)
Solution:Embracescollisions
• Handling hashcollisionsishard• Large hash tablesà highmemoryusage• Linked list/Cuckoohashingàmultiple,non-constant#memoryaccesses
• Embracingthecollisions• Lessmemoryandconstant#memoryaccesses
9
![Page 10: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/10.jpg)
FlowXorFlowCountPacketCount
Solution:Embracescollisions
• Embracingthecollisions:xor upalltheflows• Lessmemoryandconstant#memoryaccesses
10
Countingtable
flowa:S(a) Flowb:S(b) Flowc:S(c)a a b b c c
[InvertibleBloomLookupTable(arXiv 2011)]
S(x):#packetsinx
H1 H2 H3
⊕ba a1S(a)
1S(a)
a1S(a)
b1
S(b)+S(b) +S(c)
⊕c b⊕c1
S(b)+S(c)
c1S(c)
2 2 2
![Page 11: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/11.jpg)
FlowFiltertoidentifynewflows
• 1.Checkandupdatetheflowfilter• 2.Updatecountingtable– Thefirstpacketfromanewflow,updateallfields– SubsequentpacketsupdateonlyPacketCount
11
bloomfilter:identifynewflow
FlowXor a a⊕b b⊕c b⊕c a cFlowCount 1 2 2 2 1 1PacketCount S(a) S(a)+S(b) S(b)+S(c) S(b)+S(c) S(a) S(c)
+1+1 +1
d
⊕d⊕d ⊕d+1+1 +1
+1+1 +1
d
Flowfilter
[InvertibleBloomLookupTable(arXiv 2011)]
EncodedFlowset
Countingtable
H1 H2 H3
![Page 12: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/12.jpg)
Easytoimplementinmerchantsilicon
• Switchdataplane– Fixedoperations per packet– Smallmemory: 2.36MBfor100Kflows
• Switchcontrolplane– Collectsthesmallencodedflowset every 10ms
• WeimplementeditusingP4Language.– Portabletomanyp4-compatibleswitchchips
12
![Page 13: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/13.jpg)
Flows&counters
Extractper-flowcountersacrossswitches
AnalysisApplications
FlowRadar architecture
13
Compactflowcounterswithfixedper-packetoperations
Collector
Switches
Periodicreport
Encodedflowset
Encodedflowset
Encodedflowset
Stage1.SingleDecode Stage2.Network-wideDecode
![Page 14: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/14.jpg)
Stage1.SingleDecode
14
FlowXor …FlowCount …PacketCount …
Bloomfilter
Countingtable
FlowfilterFlow #packeta S(a)… …
Input:anencodedflowset fromasingleswitch
Output:per-flowcounters
![Page 15: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/15.jpg)
Flowfilter
Flow #packeta 5
FlowXor a a⊕b b⊕c⊕d b⊕c⊕d a c⊕dFlowCount 1 2 3 3 1 2PacketCount 5 12 13 13 5 6
Stage1.SingleDecode
• Findapurecell:acellwithoneflow• Removetheflowfromallcells
15
Purecell
Decoded:
-1 -1 -1-5 -5 -5
H1 H2 H3
![Page 16: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/16.jpg)
FlowXor 0 b b⊕c⊕d b⊕c⊕d 0 c⊕dFlowCount 0 1 3 3 0 2PacketCount 0 7 13 13 0 6
Flowfilter
Stage1.SingleDecode
• Findapurecell:acellwithoneflow• Removetheflowfromallcells– Createmorepurecells
• Iterateuntilnopurecells
16
Flow #packeta 5Decoded:
H1 H2 H3
![Page 17: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/17.jpg)
Stage1.SingleDecode
17
FlowXor 0 0 c⊕d c⊕d 0 c⊕dFlowCount 0 0 2 2 0 2PacketCount 0 0 6 6 0 6
Flowfilter
Flow #packeta 5b 7
Decoded:
H1 H2 H3
![Page 18: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/18.jpg)
Flows&counters
FlowRadar architecture
18
Collector
Switches
Stage1.SingleDecode Stage2.Network-wideDecode
Periodicreport
Encodedflowset
Encodedflowset
Encodedflowset
AnalysisApplications
![Page 19: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/19.jpg)
Keyinsight:overlappingsetsofflows
• Thesetsofflowsoverlapacrosshops–Wecanusetheredundancytodecodemoreflows
• Usedifferenthashfunctionsacrosshops
19
abcd
abcd
a a⊕c⊕d
b⊕c⊕d
a⊕b⊕c b⊕d
1 3 3 3 2
a⊕d a⊕c b⊕c⊕d
a⊕b⊕c b⊕d
2 2 3 3 2
FlowXor
FlowCountPktCount
FlowXor
FlowCountPktCount
Use5cellstodecode4flows10
Collectorcanleverageflowsets fromallswitchestodecodemoreflows
![Page 20: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/20.jpg)
Keyinsight:overlappingsetsofflows
• Thesetsofflowsoverlapacrosshops–Wecanusetheredundancytodecodemoreflows
• Usedifferenthashfunctionsacrosshops• Provisionswitchmemorybasedonavg(#flows),notmax(#flows)– SingleDecode fornormalcases– Network-widedecodeforburstsofflows
20
![Page 21: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/21.jpg)
Challenge1:setsofflowsnotfullyoverlapped
• Flowsfromoneswitchmaygotodifferentnexthops• Oneswitchreceivesflows frommultiplehops
21
abcd
a
e
cd
![Page 22: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/22.jpg)
Solution:ZigzagdecodewithFlowFilters
• Generalizetotheentirenetwork– Noneedforroutinginformation– Supportincrementaldeployment
22
Flowfilter Flowfilter
a
a b c d
Decoded:
a ec d
ab cc d d eSingleDecode SingleDecodeRemovefromtheother
FlowDecode
![Page 23: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/23.jpg)
Challenge2:differentcountervalues
• Thecounters ofthesameflowmaybedifferentacrosshops– Somepacketsmaygetlost– Somepacketsmaybeonthefly
23drop
![Page 24: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/24.jpg)
Solution:calculatecountersforeachswitch
• Solvealinearequationsystemforeachswitch– Alreadyknowthefullsetofflowsateachswitch– Solvablewithnflowsand>=ncombinationsofcounters
24
a b c d
a⊕b⊕d …
3 …14 …
FlowXor
FlowCountPktCount
Flow #pkta 5b 7c 4d 2
!𝑆(𝑎)+𝑆(𝑏)+𝑆(𝑑)=14…
![Page 25: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/25.jpg)
Solving the linearequationsystem
• Challenge:solvingthelinearequationsystemis notfast• Thematrixisverysparse,eachcolumnhask“1”s.• We use iterative solvers• Speeduptheiteration:– Starttheiterationfromcloseapproximationsofcounters,whichisgotfromtheFlowDecode
– Stoptheiterationwhentheresultisfloatingwithin±0.5aninteger.
25
![Page 26: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/26.jpg)
Flows&counters
FlowRadar architecture
26
Collector
Switches
Stage1.SingleDecode Stage2.Network-wideDecode
Periodicreport
Encodedflowset
Encodedflowset
Encodedflowset
Stage2.1FlowDecode
Stage2.2CounterDecode
FlowXor …FlowCount …PacketCount …
Flow filter
Flow filter
Flow #pktabc
CounterDecode
Flow #pkta 5b 7c 4
FlowDecode
CounterDecode
FlowXor …FlowCount …PacketCount …
Flow #pktxyz
Flow #pktx 3y 5z 6
AnalysisApplications
![Page 27: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/27.jpg)
Evaluations
27
AnalysisApplicationsFlows&countersCollector
Switches
Stage1.SingleDecode
Stage2.1FlowDecode
Stage2.2CounterDecode
Periodicreport
Encodedflowset
Encodedflowset
Encodedflowset
• Memoryusageisefficient• Bandwidthusageislow• NetDecode vsSingleDecode
![Page 28: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/28.jpg)
Evaluation
• Simulationofk=8FatTree (80switches,128hosts)inns3• Configurethememorybaseonavg(#flow),– whenburstofflowshappens,usenetwork-widedecode
• Theworstcaseisallswitchesarepushedtomax(#flow)– Traffic:eachswitchhassamenumberofflows,andthussamememory
• Eachswitchreportstheflowsets every10ms.
28
![Page 29: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/29.jpg)
�
�
��
��
��
��
��
� �� ��� ����
�����������������������
� ����� ��� ������ ���
��������� �� ���������
Memoryefficiency
29
�
�
��
��
��
��
��
� �� ��� ����
�����������������������
� ����� ��� ������ ���
��������� �� ���������������� ���� �� ��
#cell=#flow(Impractical) FlowRadar:2.36MB
FlowRadar:24.8MB
![Page 30: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/30.jpg)
Bandwidthusage
• EvaluatedbasedonFacebookdatacenters(SIGCOMM’ 15)– EachToR haslessthan100Kflowsper10msà lessthan2.3Gbps– EachToR talksto44*10G-hosts– FlowRadar consumes0.52%oftotalToR bandwidth
30
![Page 31: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/31.jpg)
NetDecode vs. SingleDecode
31
���
�
��
���
����
�����
�� �� �� �� ��� ��� ���
����������������������������
� ����� ��� ������ ���
SingleDecode NetDecode
SingleDecode:100Kflow,10ms
NetDecodeneedsmoretime
NetDecode:126.8Kflow,3sec
![Page 32: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/32.jpg)
NetDecode
• TheCounterDecode takesthemajorityofthedecodingtime• TheCounterDecode limitsthe#counterscouldbedecoded– If#flows>126.8K,wecanstilldecodeallflows(upto152K)withoutcountersusingFlowDecode
32
![Page 33: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/33.jpg)
AnalysisApplicationsFlows&counters
FlowRadar analyzer
33
Collector
Switches
Stage1.SingleDecode Stage2.Network-wideDecode
Periodicreport
Encodedflowset
Encodedflowset
Encodedflowset
![Page 34: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/34.jpg)
Analysisapplications
• Flowcoverage– Transientloops/blackholes– Errorinmatch-actiontable– Fine-grainedtrafficanalysis
• Temporalcoverage– Shorttimescaleper-flowlossrate– ECMPloadimbalance– Timelyattackdetection
34
![Page 35: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/35.jpg)
Per-flowlossmap:bettertemporalcoverage
• Detectlossesaftereveryflowlet
35
012345
012345
1 2 3 4 5 6 7 8 9 10 11
Switch1
Switch2
15packets
14packets
35packets
34packets
NetFlowdetectsloss
FlowRadardetectsloss
![Page 36: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/36.jpg)
FlowRadar conclusion
• Reportallper-flowcountersonafinetimescale• Fullyleveragebothswitchesandthecollector– Switches:Encodedflowsets withfixedper-packetprocessingtime,andlowmemoryusage
– Collector:Network-widedecoding
36
FlowRadar
Overhead atthecollector
Overhead at switches
NetFlow
Mirroring
![Page 37: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/37.jpg)
LossRadar:Detectindividuallossesonafinetimescale
37
![Page 38: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/38.jpg)
Losseshavesignificantimpact
• Significantlatencyincreaseandthroughputdrop– ViolatingSLAanddroprevenue
• Takesoperatorsuptotensofhourstofindandfixtheproblem
38
Takeaway:WewanttoknowlossASAP
![Page 39: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/39.jpg)
Challenge:manytypesoflosses
39
OutputPort 1OutputPortn
InputPort1InputPortn
Input buffer
… parser Ingressmatch-actions(L2->L3->ACL)
Sharedbuffer
Egressmatch-actions
…
SwitchCPU
Configure
corruptions
misconfigurations updates
ResourceshortageResourceshortage corruptions
Greyfailure
Takeaway:Weneedagenerictool
![Page 40: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/40.jpg)
Challenge: losses could happen everywhere
40
TCP:I have losses
But I don’t knowwhere
ECMP:multiplepossible paths
Takeaway:Weneedthelocationsofthelosses
![Page 41: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/41.jpg)
Challenge:lackofdetailsoflosses
• Difficulttoinfertherootcauses
41
Flow1
Flow2 Pass
Drop
0
1
2
3
4
5
6
0 50 100
#Loss
Time (ms)
Differentflowsexperiencedifferentlosspatterns
Losspatternschangeovertime
Takeaway:Needtheinformationofindividuallosses
![Page 42: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/42.jpg)
LossRadar overview
• Fastdetectionà Periodicallysendtrafficdigesttocollector• Knowinglocationà Install meters to monitor traffic• Needinfoofindividuallossesà Includedetailsofeachloss• Beinggenericà Comparethesetsofpackets
42
DigestCollector
TrafficDigest
TrafficDigest
TrafficDigest
![Page 43: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/43.jpg)
Howtocoverthewholenetwork
• Need to coverallpipelines
43
UM
DMUM
DM
DMUMIngresspipeline
sharedbuffer
Egresspipeline
Cover ingress pipeline Cover egress pipeline
UM:UpstreammeterDM:Downstreammeter
S2
S1S3
![Page 44: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/44.jpg)
LossRadar overview
• Fastdetectionà Periodicallysendtrafficdigesttocollector• Knowinglocationà Install meters to monitor traffic• Needinfoofindividuallossesà Includedetailsofeachloss• Beinggenericà Comparethesetsofpackets
44
DigestCollector
TrafficDigest
TrafficDigest
TrafficDigest
![Page 45: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/45.jpg)
Howtocomparethesetsofpackets
• UsingInvertibleBloomFilter(IBF)[Sigcomm’11]• OnlyO(#loss)memory– Eachendkeepsasmallpieceofmemory– Same packetsat both ends will cancel out– Only the differences remain
45
upstream
downstream
Collector
![Page 46: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/46.jpg)
How to achieve low memory usage
46
xorSumcount
00
00
00
00
00
xorSumcount
00
00
00
00
00
upstream
downstream
Collector
![Page 47: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/47.jpg)
How to achieve low memory usage
47
a a axorSumcount
a1
00
a1
a1
00
xorSumcount
00
00
00
00
00
upstream
downstream
Collector
a
drop
a=5-tuple+IPID
![Page 48: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/48.jpg)
How to achieve low memory usage
48
bb
bxorSumcount
a1
b1
a⊕b2
a1
b1
xorSumcount
00
b1
b1
00
b1
a a a upstream
downstream
Collector
b
bb b b
![Page 49: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/49.jpg)
How to achieve low memory usage
49
c c cupstream
downstream
Collector
xorSumcount
a⊕c2
b⊕c2
a⊕b2
a⊕c2
b1
xorSumcount
00
b1
b1
00
b1
bb
ba a a
b b b
c
drop
![Page 50: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/50.jpg)
How to achieve low memory usage
50
d dd
upstream
downstream
Collector
xorSumcount
a⊕c⊕d3
b⊕c2
a⊕b⊕d3
a⊕c2
b⊕d2
xorSumcount
d1
b1
b⊕d2
00
b⊕d2
c c cb
bba a a
b b bdd d
d
d
xorSumcount
a⊕c2
c1
a1
a⊕c2
00
ca a ac
c
Collectordoessubtraction
![Page 51: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/51.jpg)
Benefits
• Memory-efficient– Proportionalto#losses– 10KBpermeter(trafficpatternreportedinDCTCP)
• Extend to collect more packet information– TTL:helpidentifyloops– Timestamp:taggedinthepacketheadersattheupstream– Anyotherfieldsthatprogrammableswitchescanconfigure
51
![Page 52: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/52.jpg)
Challenges
• HowtoensureUMandDMcapturethesamesetofpackets– UsethepacketheadertocarrybatchID
• Incrementaldeployment– Put UMandDMaroundtheblackbox– Comparesum(UMi)andsum(DMj)
52
![Page 53: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/53.jpg)
LossRadar conclusion
• Designafastandefficientlossdetectionsystem– Providelocationsanddetailsofindividuallosses– Generictoanytypesoflosses– Memoryonlyproportionalto#losses
• Futurework:designananalyzertodiagnoseproblems– Temporalanalysis,correlationacrossflowsandswitches– Combinewithinfoprovidedbyhosts
53
![Page 54: Fast and Efficient Flow and Loss Measurement for Data ...netseminar.stanford.edu/seminars/05_23_16.pdf · – Small memory:2.36MB for 100K flows • Switch control plane – Collects](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57fb52ceec351c757c7643/html5/thumbnails/54.jpg)
Conclusion
• FullvisibilityofDataCenters– Reportalltheflowsandallthelostpacketsinafewmillisecondsacrossalltheswitches
• Efficientdatastructuresonprogrammableswitches– BothFlowRadar andLossRadar areimplementedonP4– TechnologytransfertoBarefootswitches(Seemydemotomorrow)
54