![Page 1: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/1.jpg)
I Know What Your Packet Did Last Hop: Using Packet Histories
to Troubleshoot Networks
Nikhil Handigol
With Brandon Heller, Vimal Jeyakumar, David Mazières, Nick McKeown
NSDI 2014, Seattle, WA
April 2, 2014
![Page 2: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/2.jpg)
2
Bug Story: Incomplete Handover
[Figure: A, B, Switch X, WiFi AP Y, WiFi AP Z]
Match: Action
Src A, Dst B: Output to Y
![Page 3: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/3.jpg)
3
Network Outages make news headlines
Hosting.com's New Jersey data center was taken down on June 1, 2010, igniting a cloud outage and connectivity loss for nearly two hours… Hosting.com said the connectivity loss was due to a software bug in a Cisco switch that caused the switch to fail.
On April 26, 2010, NetSuite suffered a service outage that rendered its cloud-based applications inaccessible to customers worldwide for 30 minutes… NetSuite blamed a network issue for the downtime.
The Planet was rocked by a pair of network outages that knocked it off line for about 90 minutes on May 2, 2010. The outages caused disruptions for another 90 minutes the following morning.... Investigation found that the outage was caused by a fault in a router in one of the company's data centers.
![Page 4: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/4.jpg)
4
Troubleshooting Networks is Hard Today
Tools used today: ping, traceroute, tcpdump/SPAN/sFlow, SNMP
[Figure: network elements, each labeled Forwarding State, and lots and lots of graphs]
• Tedious and ad hoc
• Requires skill and experience
• Not guaranteed to provide helpful answers
(source: NANOG Survey in “Automatic Test Packet Generation”, Hongyi Zeng, et al.)
![Page 5: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/5.jpg)
5
We want complete network visibility
[Figure: Visibility Spectrum, running from 0 to 100, with ping, traceroute, sFlow, and SNMP marked along it]
Complete visibility: every event that ever happened to every packet
We want to be here
![Page 6: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/6.jpg)
6
Talk Outline
1. How to achieve complete network visibility
– An abstraction: Packet History
– A platform: NetSight
2. Why achieving complete visibility is feasible
– Data compression
– MapReduce-style scale-out design
![Page 7: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/7.jpg)
7
Packet History
[Figure: network elements, each labeled Forwarding State]
Packet history = Path taken by a packet + Header modifications + Switch state encountered
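A minimal sketch of how a packet history could be represented as a data structure, following the three parts of the definition (path, header modifications, switch state). The class and field names (`Hop`, `PacketHistory`, `state_version`, and so on) are assumptions made for illustration, not NetSight's actual types.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Hop:
    """What one switch did to the packet (field names are illustrative)."""
    switch_id: str
    inport: str
    outports: List[str]
    mods: Dict[str, str] = field(default_factory=dict)  # header modifications
    state_version: int = 0  # version of the switch state the packet encountered


@dataclass
class PacketHistory:
    header: Dict[str, str]  # header of the packet as it entered the network
    hops: List[Hop] = field(default_factory=list)

    def path(self) -> List[str]:
        """Path taken by the packet, as an ordered list of switch IDs."""
        return [hop.switch_id for hop in self.hops]


# Hypothetical two-hop history: the packet crossed switch X, then Y.
history = PacketHistory(
    header={"src": "A", "dst": "B"},
    hops=[
        Hop("X", inport="p0", outports=["p1"], state_version=3),
        Hop("Y", inport="p1", outports=["p3"]),
    ],
)
print(history.path())
```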
![Page 8: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/8.jpg)
8
Our Troubleshooting Workflow
[Figure: network elements, each labeled Forwarding State]
1. Record and store all packet histories
2. Query and use packet histories of errant packets
![Page 9: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/9.jpg)
9
NetSight
A platform to capture and filter packet histories of interest
![Page 10: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/10.jpg)
10
Postcard Collector
Control Plane
Flow Table State Recorder
![Page 11: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/11.jpg)
11
Postcard Collector
Control Plane
Flow Table State Recorder
[Figure: flow table entries, each labeled Match / ACT]
![Page 13: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/13.jpg)
13
Postcard Collector
Control Plane
Flow Table State Recorder (Version -> Flow Table State)
Step 1: Generate postcards
Postcard fields: Packet Header, Switch ID, Output port, Version
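As a rough illustration of Step 1, the sketch below has a toy switch emit one postcard per forwarded packet, carrying the four fields listed on the slide. `ToySwitch`, its `install`/`forward` methods, and the destination-only flow table are assumptions made for this example. For brevity the version-to-state recorder is folded into the switch object, whereas the slides draw it as a separate component next to the control plane.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Postcard:
    packet_header: bytes
    switch_id: str
    output_port: Optional[int]  # None models a dropped packet
    version: int                # flow table version that handled the packet


class ToySwitch:
    def __init__(self, switch_id: str) -> None:
        self.switch_id = switch_id
        self.version = 0
        self.flow_table: Dict[str, int] = {}  # dst -> output port
        # Recorder: version -> snapshot of the flow table at that version.
        self.recorder: Dict[int, Dict[str, int]] = {0: {}}

    def install(self, dst: str, port: int) -> None:
        """Change the flow table, bump the version, and record the new state."""
        self.flow_table[dst] = port
        self.version += 1
        self.recorder[self.version] = dict(self.flow_table)

    def forward(self, packet_header: bytes, dst: str) -> Postcard:
        """Forward a packet and emit the postcard describing this hop."""
        port = self.flow_table.get(dst)
        return Postcard(packet_header, self.switch_id, port, self.version)


switch_x = ToySwitch("X")
switch_x.install("B", port=1)
card = switch_x.forward(b"example-header", dst="B")
# The version in the postcard lets a collector look up the exact state used.
state_seen = switch_x.recorder[card.version]
```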
![Page 14: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/14.jpg)
14
Reconstructing Packet Histories
Step 2: Group postcards by generating packet
[Figure: four postcards, each carrying Packet Header, Switch ID, Output port, Version]
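A small sketch of the grouping step. It assumes, purely for illustration, that the packet header bytes carried in each postcard are identical at every hop and can serve as the grouping key; a real collector would need a key built from fields that stay invariant when switches rewrite headers. `group_by_packet` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Postcard(NamedTuple):
    packet_header: bytes
    switch_id: str
    output_port: int
    version: int


def group_by_packet(postcards: List[Postcard]) -> Dict[bytes, List[Postcard]]:
    """Collect all postcards generated by the same packet under one key."""
    groups: Dict[bytes, List[Postcard]] = defaultdict(list)
    for card in postcards:
        groups[card.packet_header].append(card)
    return dict(groups)


# Postcards from two packets arrive interleaved and out of order.
arrivals = [
    Postcard(b"pkt-1", "Y", 3, 1),
    Postcard(b"pkt-2", "X", 1, 3),
    Postcard(b"pkt-1", "X", 1, 3),
]
groups = group_by_packet(arrivals)
```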
![Page 15: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/15.jpg)
15
Reconstructing Packet Histories
Step 3: Sort postcards using topology (topo-sort)
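The ordering step can be sketched as follows for the simplest case: a unicast, loop-free path where each switch forwards the packet out of one port. The `links` map from (switch, output port) to the next switch is an assumed representation of the topology, and `sort_postcards` is a hypothetical name. A general topological sort would also be needed for multicast or multi-output histories.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Postcard(NamedTuple):
    switch_id: str
    output_port: int
    version: int


def sort_postcards(
    cards: List[Postcard], links: Dict[Tuple[str, int], str]
) -> List[Postcard]:
    """Order one packet's postcards along the path implied by the topology."""
    by_switch = {card.switch_id: card for card in cards}
    # Switches that some postcard in this group forwards to.
    downstream = {links.get((c.switch_id, c.output_port)) for c in cards}
    starts = [c for c in cards if c.switch_id not in downstream]
    if len(starts) != 1:
        raise ValueError("expected exactly one first hop (no loops, no gaps)")

    ordered: List[Postcard] = []
    current: Optional[Postcard] = starts[0]
    while current is not None and len(ordered) < len(cards):
        ordered.append(current)
        next_switch = links.get((current.switch_id, current.output_port))
        current = by_switch.get(next_switch)
    return ordered


# Assumed topology: X port 1 connects to Y, Y port 3 connects to Z.
links = {("X", 1): "Y", ("Y", 3): "Z"}
unordered = [Postcard("Z", 2, 1), Postcard("X", 1, 3), Postcard("Y", 3, 1)]
ordered = sort_postcards(unordered, links)
```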
![Page 16: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/16.jpg)
16
Postcard Collector
Control Plane
Flow Table State Recorder
[Figure: four recorded flow tables, each a numbered list of <Match, Action> entries (1., 2., 3., …)]
![Page 17: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/17.jpg)
17
Control Plane
Flow Table State Recorder
Postcard Collector
NetSight API
Packet History Filter: A regular-expression-like language to specify packet histories of interest
Troubleshooting Apps
• Reachability errors
• Isolation violation
• Black holes
• Waypoint routing violation
[Figure labels: Postcards, Packet History Assembly, Filtered Packet Histories, Troubleshooting Applications]
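To give a feel for what a regular-expression-like filter over packet histories might do, the sketch below encodes a history's path as a space-separated string of switch IDs and matches it with Python's standard `re` module. This is not the Packet History Filter syntax, which the slides do not show. The switch names `S1`, `S9`, `FW` and both patterns are hypothetical.

```python
import re
from typing import List


def history_matches(pattern: str, path: List[str]) -> bool:
    """True if the whole path, written as 'S1 S2 ...', matches the pattern."""
    return re.fullmatch(pattern, " ".join(path)) is not None


# Any path that starts at S1 and ends at S9.
S1_TO_S9 = r"S1( \S+)* S9"
# A path from S1 to S9 that passes through the waypoint FW on the way.
VIA_FIREWALL = r"S1 (\S+ )*FW( \S+)* S9"


def waypoint_violation(path: List[str]) -> bool:
    """Flag S1-to-S9 histories that never traversed the FW waypoint."""
    return history_matches(S1_TO_S9, path) and not history_matches(
        VIA_FIREWALL, path
    )


good_path = ["S1", "S4", "FW", "S9"]
bad_path = ["S1", "S4", "S9"]
flagged = [p for p in (good_path, bad_path) if waypoint_violation(p)]
```

A troubleshooting app for waypoint routing violations could then subscribe only to the flagged histories rather than to all traffic.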
![Page 18: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/18.jpg)
18
Bug Story: Incomplete Handover
[Figure: Switch X, WiFi AP Y, WiFi AP Z]
Packet History Filter: “Pkts from server not reaching the client”
Packet History:
Switch X: inport: p0, outports: [p1], mods: [...], state version: 3
Switch Y: inport: p1, outports: [p3], mods: ...
…
![Page 19: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/19.jpg)
19
Troubleshooting Apps
• ndb: Interactive network debugger
• netwatch: Live network invariant monitor
• netshark: Network-wide wireshark
• nprof: Hierarchical network profiler
![Page 20: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/20.jpg)
20
But will it scale?
![Page 21: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/21.jpg)
21
Why generating postcards for every packet at every hop is crazy!
Network Overhead
– 64-byte postcard/pkt/hop
– Stanford Network: 5 hops avg, 1031 byte avg pkt
– 31% extra traffic!
Processing Overhead
– Packet history assembly and filtering
Storage Overhead
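The 31% figure follows directly from the three numbers on the slide: each packet produces one 64-byte postcard per hop, so 5 hops add 320 bytes to an average 1031-byte packet. The helper name `postcard_overhead` is made up for this worked check.

```python
def postcard_overhead(postcard_bytes: int, avg_hops: int, avg_pkt_bytes: int) -> float:
    """Extra traffic from postcards as a fraction of the original traffic."""
    return postcard_bytes * avg_hops / avg_pkt_bytes


# Numbers quoted on the slide for the Stanford network.
overhead = postcard_overhead(postcard_bytes=64, avg_hops=5, avg_pkt_bytes=1031)
print(f"{overhead:.0%}")  # 320 / 1031, which rounds to 31%
```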
![Page 22: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/22.jpg)
22
Why generating postcards for every packet at every hop is not crazy!
Cost is OK for low-utilization networks
– E.g., test networks, “bring-up phase” networks
– Single server can handle entire Stanford traffic
![Page 23: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/23.jpg)
23
Why generating postcards for every packet at every hop is not crazy!
Huge redundancy in packet header fields
– Only a few fields change: IP ID, TCP seq. no.
– Postcards can be compressed to 10-20 bytes/pkt
Diff-based compression
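A minimal sketch of the idea behind diff-based compression: keep the first header of a flow in full and, for every later header, store only the fields whose values changed. The dictionary representation and the `compress`/`decompress` names are assumptions for illustration. The slides do not show NetSight's actual encoding, and this sketch assumes every header has the same set of fields.

```python
from typing import Dict, List

Header = Dict[str, int]


def compress(headers: List[Header]) -> List[Header]:
    """Replace each header by the fields that differ from the previous one."""
    diffs: List[Header] = []
    previous: Header = {}
    for header in headers:
        diffs.append({k: v for k, v in header.items() if previous.get(k) != v})
        previous = header
    return diffs


def decompress(diffs: List[Header]) -> List[Header]:
    """Rebuild the full headers by replaying the diffs in order."""
    headers: List[Header] = []
    current: Header = {}
    for diff in diffs:
        current = {**current, **diff}
        headers.append(current)
    return headers


# Two packets of one flow: only the IP ID and TCP sequence number change.
flow = [
    {"src": 1, "dst": 2, "sport": 5000, "dport": 80, "ip_id": 10, "seq": 100},
    {"src": 1, "dst": 2, "sport": 5000, "dport": 80, "ip_id": 11, "seq": 1548},
]
diffs = compress(flow)
```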
![Page 24: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/24.jpg)
24
Why generating postcards for every packet at every hop is not crazy!
Postcard processing is embarrassingly parallel
– Each packet history can be processed independently of other packet histories
[Figure: Assembly, Filtering]
![Page 25: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/25.jpg)
25
Scaling NetSight Performance
[Figure: Switches send Postcards to NetSight Servers; a Shuffle stage exchanges Compressed Postcard Lists between NetSight Servers, which write Compressed Packet Histories to Disk]
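The shuffle stage can be sketched as hash partitioning: every postcard is routed to a server chosen from its packet key, so all postcards of one packet meet on the same server and each history can be assembled independently. The `(packet_key, switch_id)` tuple form, the function names, and the use of CRC32 as the hash are assumptions for this example, not details taken from the slides.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

PostcardRef = Tuple[bytes, str]  # (packet key, switch ID), illustrative only


def server_for(packet_key: bytes, num_servers: int) -> int:
    """Pick the server responsible for a packet; stable across processes."""
    return zlib.crc32(packet_key) % num_servers


def shuffle(postcards: List[PostcardRef], num_servers: int) -> Dict[int, List[PostcardRef]]:
    """Partition postcards so each packet's postcards land on one server."""
    buckets: Dict[int, List[PostcardRef]] = defaultdict(list)
    for key, switch_id in postcards:
        buckets[server_for(key, num_servers)].append((key, switch_id))
    return dict(buckets)


cards = [(b"pkt-1", "X"), (b"pkt-2", "X"), (b"pkt-1", "Y"), (b"pkt-2", "Y")]
buckets = shuffle(cards, num_servers=3)
```

Because no server needs postcards held by another server after the shuffle, adding servers increases capacity without changing the assembly or filtering logic.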
![Page 29: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/29.jpg)
29
NetSight Variants
![Page 30: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/30.jpg)
30
NetSight-SwitchAssist moves postcard compression to switches
[Figure: Switches, NetSight Servers, Shuffle, Disks]
Move postcard compression to switches with simple hardware mechanisms
![Page 31: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/31.jpg)
31
NetSight-HostAssist exploits visibility from the hypervisor
[Figure: hypervisor (HV) holding the Packet Header, with Switches, NetSight Servers, Shuffle, Disks]
(1) Store packet header at the hypervisor
(2) Add unique pkt ID
Mini-postcards contain only unique pkt ID and switch state version
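A toy sketch of the two hypervisor steps and the resulting mini-postcard. How the unique ID is carried inside the packet is not stated on the slide, so it is left out here. The `Hypervisor` class, its `send` method, and the `MiniPostcard` tuple are hypothetical names.

```python
import itertools
from typing import Dict, NamedTuple


class MiniPostcard(NamedTuple):
    pkt_id: int   # unique packet ID added at the hypervisor
    version: int  # switch state version seen by the packet


class Hypervisor:
    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.headers: Dict[int, bytes] = {}

    def send(self, header: bytes) -> int:
        """(1) Store the packet header, (2) assign a unique packet ID."""
        pkt_id = next(self._next_id)
        self.headers[pkt_id] = header
        return pkt_id


hv = Hypervisor()
pkt_id = hv.send(b"full-packet-header")
# A switch only needs to report the ID and its state version.
card = MiniPostcard(pkt_id=pkt_id, version=3)
# The full header is recovered later by joining on the packet ID.
recovered_header = hv.headers[card.pkt_id]
```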
![Page 32: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/32.jpg)
32
Overhead Reduction in NetSight
Basic (naïve) NetSight: 31% extra traffic in Stanford backbone network
NetSight Switch-Assist: 7%
NetSight Host-Assist: 3%
![Page 33: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/33.jpg)
33
Takeaways
Complete network visibility is possible
– Packet History: a powerful troubleshooting abstraction that gives complete visibility
– NetSight: a platform to capture and filter packet histories of interest
Complete network visibility is feasible
– It is possible to collect and filter packet histories at scale
![Page 34: I Know W hat Y our Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks](https://reader036.vdocuments.us/reader036/viewer/2022062410/5681634e550346895dd3ee07/html5/thumbnails/34.jpg)
34
Every
NetSight API
http://yuba.stanford.edu/netsight