9/19/20151 automating cross-layer diagnosis of enterprise 802.11 wireless networks geoffrey m....
Automating Cross-Layer Diagnosis of Enterprise 802.11 Wireless Networks
Geoffrey M. Voelker
Mikhail Afanasyev, John Bellardo, Peter Benko, Yu-Chung Cheng,
Jennifer Chiang, Patrick Verkaik, Alex Snoeren, Stefan Savage
Diagnosing distributed systems
- Simple systems
  - Few components
  - Inputs/outputs observed
  - Cause of failure usually obvious
- Distributed systems
  - Many interdependent components
  - Hard to monitor all interactions
  - Cause of failure/degradation is non-obvious
The promise of enterprise 802.11
Blanket AP coverage = seamless connectivity
A familiar story...
User: “The wireless is being flaky.”
Support: “Flaky how?”
User: “Well, my connections got dropped earlier and now things seem very sloooow.”
Support: “OK, we will take a look.”
User: “Wait, wait … it’s ok now.”
Support: “Mmm… well, let us know if you have any more problems.”
Our story: new CSE building at UCSD
- 150k square feet
  - 4 floors + basement
  - >500 occupants
- Building-wide WiFi
  - 40 APs (802.11b/g)
  - Channels 1, 6, 11
- Users have complained about wireless performance since we moved in (July 2005)
- Admins and vendors cannot solve the issues
Why is it hard to figure out?
- Problems can be anywhere
  - Across layers - protocols
    - Even in the same layer - 802.11 {a,b,f,g,h,i,n,s}
  - Software incompatibilities - vendor variations
  - Transient or persistent - time
  - Radio propagates in free space - locations
  - Radio spreads across channels - frequencies
    - Shared spectrum makes it worse
  - APs bridge wireless and wired worlds - infrastructure
- To diagnose
  - Gather data everywhere
  - Analyze across all layers
- Want a system to do this job automatically
Better world
User: “The wireless is being flaky.”
System: “Your SSH sessions average over 200 ms response time, and 8% of TCP packets are lost due to interference from the microwave oven nearby. This problem has been logged for the sys admins.”
Shaman
- Goal: Develop a system to automatically diagnose problems in wireless networks
- Pervasive data collection (Jigsaw)
  - Extensive passive monitoring system
  - Observe all transmissions across locations, channels, and time
  - Provides a unified synchronized trace of every packet transmission
- Explicitly model protocols on critical path
  - DHCP, 802.11 MAC, TCP, etc.
  - Provides complete delay and loss breakdown
    - For every packet transmission, all protocol stages
- Framework for diagnostic tools
  - Use model outputs to determine root cause of problems
  - Users can query on demand, also alert admins
Shaman system architecture
[Figure: monitors (wired gateway monitor, wireless monitors …) feeding a three-stage pipeline]
- Trace sync & merging: gather and merge traces from monitors into one global trace
- Protocol modeling: infer protocol states
- Critical path diagnosis: identify problems on the critical path
- Do all in real-time
Why pervasive monitoring?
- Protocol states are often not directly observable
  - Inferred from packet traces and protocol state machines
- Packet delay and losses
  - PHY/MAC interactions with each other and the environment
- Capturing all wireless events provides the ground truth to model protocol states
  - Requires a global perspective = one clock
  - Requires high-resolution timestamps for 802.11 timing analysis
- How?
[Figure: DHCP req / DHCP rsp exchange]
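As a rough illustration of inferring protocol state from a packet trace, the sketch below replays one client's captured DHCP messages through a small state machine. The state and message names follow the standard DHCP client exchange; the function name, the `(timestamp_us, message)` input format, and the choice to skip messages that do not fit are assumptions made for this example, not a description of how Shaman models DHCP.

```python
# Client-side DHCP transitions for a basic address acquisition.
# (state, observed message) -> next state
TRANSITIONS = {
    ("INIT", "DISCOVER"): "SELECTING",
    ("SELECTING", "DISCOVER"): "SELECTING",   # retransmitted DISCOVER
    ("SELECTING", "OFFER"): "SELECTING",
    ("SELECTING", "REQUEST"): "REQUESTING",
    ("REQUESTING", "REQUEST"): "REQUESTING",  # retransmitted REQUEST
    ("REQUESTING", "ACK"): "BOUND",
    ("REQUESTING", "NAK"): "INIT",
}


def infer_dhcp_state(frames):
    """Replay time-ordered (timestamp_us, message) pairs for one client.

    Returns (final_state, delay_us), where delay_us is the time from the
    first DISCOVER to the ACK, or None if the client never reached BOUND.
    """
    state = "INIT"
    start = None
    done = None
    for ts, msg in frames:
        nxt = TRANSITIONS.get((state, msg))
        if nxt is None:
            continue  # assumption: skip messages that do not fit the model
        if msg == "DISCOVER" and start is None:
            start = ts
        state = nxt
        if state == "BOUND":
            done = ts
            break
    delay = done - start if start is not None and done is not None else None
    return state, delay
```

A client whose trace ends in `SELECTING` or `REQUESTING` is stuck waiting on the server, which is the kind of hidden state a single packet cannot reveal.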
Jigsaw passive monitor system
- Overlays existing WiFi network
  - Series of passive monitors
  - Blanket deployment for best coverage
- Monitor
  - PoE box w/ 266 MHz P4 + 128 MB RAM
  - 2 b/g radios
- 96 monitors (192 radios)
  - Monitors are paired in each location
  - Covering all channels in use
- Captures all 802.11 activity (including PHY/CRC errors)
- Stream back to centralized storage
Trace merging (ideal)
[Figure; axis: Time]
Not all monitors see all packets
Trace merging (reality)
[Figure; axis: Time. Plot: clock diff (us) vs. time (s)]
Challenge 1: sync at 10us precision
- Why 10us precision?
  - Critical evidence for 802.11 layer analysis
  - 802.11 channel access mechanism
    - Carrier-sense multiple access (CSMA)
    - Channel busy: wait
    - Channel idle: send
    - Timing unit is ~10us
  - Precise trace timestamps reveal 802.11 internal states
    - Ex1: if A and B send at the same time, they could interfere => A can't hear B
    - Ex2: if A sends right after B's transmission => A can hear B
- How?
  - Create a global clock
  - Monitors timestamp packets w/ local HW clocks
    - 802.11 HW clocks have 1us granularity
  - Estimate the offset between local and global clock for each monitor
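The offset estimate in the last step can be sketched as follows, assuming two radios have captured some of the same frames and each frame carries an identifier that both traces agree on. The function names, the `frame_id -> timestamp` dictionaries, and the use of a median are illustrative assumptions; the slides do not say which estimator Jigsaw uses.

```python
from statistics import median


def estimate_offset_us(local_ts, reference_ts):
    """Estimate the offset to add to local timestamps to match the reference.

    local_ts, reference_ts: dict mapping a hypothetical frame_id to the
    hardware timestamp (us) at which each radio captured that frame.
    Returns None when the two radios heard no frame in common.
    """
    common = local_ts.keys() & reference_ts.keys()
    if not common:
        return None
    # Median is an assumption here: it tolerates a few bad timestamps.
    return median(reference_ts[f] - local_ts[f] for f in common)


def to_reference_time(ts_us, offset_us):
    """Translate one local timestamp onto the reference clock."""
    return ts_us + offset_us
```

Because hardware clocks drift, an estimate like this would have to be refreshed continuously rather than computed once.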
Challenge 2: sync across 192 radios
- Goal: estimate the offset between local and global clock for each monitor
  - Time route from one monitor to the other
- Sync across channels
  - Ch. 1 monitor does not hear packets sent in ch. 6
  - Dual radios on same monitor slaved to same clock
[Figure labels: To, ∆t1, ∆t2; 0 1 2 3]
Jigsaw: Solving the Puzzle of Enterprise 802.11 Analysis. Cheng, Bellardo, Benko, Snoeren, Voelker, and Savage. SIGCOMM 2006
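One way to picture a "time route" is as a path through a graph whose nodes are radios and whose edges are pairwise clock offsets between radios that heard common frames. The sketch below propagates offsets outward from a reference radio with a breadth-first search. The data layout, radio names, and function name are assumptions for illustration; two radios on the same monitor are modeled as an edge with offset 0 because they share a clock.

```python
from collections import deque


def propagate_offsets(pair_offsets, reference):
    """Compute, for every reachable radio, the offset to the reference clock.

    pair_offsets: dict mapping (a, b) -> d, meaning t_b = t_a + d for the
    same event as timestamped by radios a and b.
    Returns dict radio -> offset such that t_reference = t_radio + offset.
    Radios with no time route to the reference are left out.
    """
    adj = {}
    for (a, b), d in pair_offsets.items():
        adj.setdefault(a, []).append((b, d))
        adj.setdefault(b, []).append((a, -d))

    to_ref = {reference: 0}
    queue = deque([reference])
    while queue:
        u = queue.popleft()
        for v, d in adj.get(u, []):
            if v not in to_ref:
                # t_v = t_u + d, so t_ref = t_u + to_ref[u] = t_v - d + to_ref[u]
                to_ref[v] = to_ref[u] - d
                queue.append(v)
    return to_ref
```

The zero-offset edge is what lets a route cross from a channel 1 radio to a channel 6 radio even though neither hears the other's frames.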
Trace merging (reality)
[Figure: frames 1-5 from individual monitor traces combined into the Shaman sync'd trace; axis: Time]
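A minimal sketch of the merge step, assuming per-monitor clock offsets are already known: shift every trace onto the global clock, interleave them in time order, and fold together copies of the same frame heard by several monitors. The `frame_key` (for example, a hash of the frame contents), the data layout, and the 10us matching window are assumptions for this example rather than the actual Jigsaw algorithm.

```python
import heapq


def merge_traces(traces, offsets_us, window_us=10):
    """Merge per-monitor traces into one global, de-duplicated trace.

    traces: dict monitor -> list of (local_ts_us, frame_key), time-ordered.
    offsets_us: dict monitor -> offset to add to reach the global clock.
    Returns a list of (global_ts_us, frame_key, monitors_that_heard_it).
    """
    streams = [
        [(ts + offsets_us[m], key, m) for ts, key in frames]
        for m, frames in traces.items()
    ]
    merged = []  # entries are [first_seen_ts, frame_key, [monitors]]
    for ts, key, mon in heapq.merge(*streams):
        match = None
        # Look back only over entries that fall inside the matching window.
        for entry in reversed(merged):
            if ts - entry[0] > window_us:
                break
            if entry[1] == key:
                match = entry
                break
        if match is not None:
            match[2].append(mon)
        else:
            merged.append([ts, key, [mon]])
    return [(ts, key, sorted(mons)) for ts, key, mons in merged]
```

Frames heard by only one monitor still appear in the output, which matters because not all monitors see all packets.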
Part of a sync’d trace
[Figure: traces synchronized; rows labeled User 1 and User 2]
Shaman system architecture
[Figure: monitors (wired gateway monitor, wireless monitors …) feeding a three-stage pipeline]
- Trace sync & merging: gather and merge traces from monitors into one global trace
- Protocol modeling: infer protocol states
- Critical path diagnosis: identify problems on the critical path
Modeling protocols
- Now we have fully sync’d global traces
- What protocols must we model?
  - Critical path
    - Mobility management
      - Scan/associate w/ AP
      - DHCP
      - ARP
      - Portal page login
    - Data transport protocols
      - TCP
      - 802.11 MAC delay/loss
Mobility Management
- Create the illusion of a single AP
  - Proprietary system w/ site-specific policy
- Most components are simple protocols
  - 1-2 request-response transactions
  - Easier to model compared to TCP
- Very reliable in wired network
  - ARP, DHCP, DNS
  - Seldom suspected as the culprits in wireless
- Users expect seamless connectivity
  - People often suspend/resume laptops while moving in the office building
Mobility overhead in UCSD CSE
Major Problem: (Gratuitous) ARPs & Scans
Protocol Modeling
- Distribution is not enough
  - Want to diagnose any user’s problem by finding the root cause
- Need to track per-packet delay and loss
  - Essential to model e2e protocols like TCP
    - Complex mechanisms to accommodate delay and loss
- Example: slow SSH response → high TCP losses → most 802.11 retries failed → microwave ovens operating nearby
The journey of a packet in 802.11
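[Figure: timeline of a packet traveling from CNN.com through the wireless gateway and AP to the user. Legend: wired packet, 802.11 data, 802.11 ACK. Delay components: queuing, channel busy, exponential backoff. Axis: Time]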
Modeling 802.11 packet delays
- Emulate AP queue
  - Based on input/output events
- Events observed directly
  - Ethernet packet on wires
  - 802.11 data/ack on wireless
- Need to infer when a packet …
  - Reaches head of TxQ
  - Is scheduled to the TxQ
  - Is received by the AP
[Figure: AP queue; legend: directly observed vs. inferred/modeled]
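The queue emulation can be sketched under strong simplifying assumptions: the AP has a single FIFO transmit queue, a packet enters the queue the moment it is seen on the wire, and the AP starts working on the next packet as soon as the previous one completes. Under those assumptions the unobserved "reaches head of TxQ" time is the later of the packet's wired arrival and the previous packet's wireless completion. The function name and input format are hypothetical.

```python
def emulate_ap_queue(packets):
    """Split each packet's AP delay into queuing and MAC components.

    packets: list of (wired_arrival_us, wireless_done_us) in FIFO order,
    where wireless_done_us is when the exchange finished on the air
    (for example, when the 802.11 ACK was observed).
    Returns one dict per packet with "queue_us" and "mac_us".
    """
    result = []
    prev_done = None
    for arrival, done in packets:
        # Inferred event: the packet reaches the head of the queue once it
        # has arrived and the previous packet has left.
        head = arrival if prev_done is None else max(arrival, prev_done)
        result.append({"queue_us": head - arrival, "mac_us": done - head})
        prev_done = done
    return result
```

For example, a packet that arrives at 100us while the previous one occupies the radio until 500us is charged 400us of queuing delay, and the rest of its delay is attributed to the MAC (channel busy, backoff, retries).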
Applying 802.11 delays to TCP diagnosis
- Scenario
  - 3 users downloading the same large tarball through the same AP from the CSE website
  - 1 user complains about download performance in spite of having 54 Mbps 802.11g connectivity
- Major performance bottleneck is queue competition
Modeling packet losses
- Delay alone is not enough for diagnosis
  - Loss is another major factor
- 802.11 performs retransmission on loss
  - Loss happens both ways
    - Data or Ack
  - Must model 802.11 conversations
- Loss causes
  - Attenuation (e.g. not enough signal strength)
  - Interference from other 802.11 devices (hidden terminals)
  - Interference from other devices in 2.4GHz
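To make "model 802.11 conversations" concrete, the sketch below groups data frames in a merged trace by sender and sequence number, counts transmission attempts, and records whether an ACK addressed to the sender followed an attempt. The event tuples and function name are assumptions for illustration. A real model would also need frame timing and would have to cope with frames that no monitor captured, which this sketch ignores.

```python
def summarize_exchanges(events):
    """Count attempts and ACK status per (sender, sequence number).

    events: time-ordered tuples, either ("DATA", sender, seq) or
    ("ACK", addressed_to). An ACK is credited to the most recent data
    frame only if it is addressed to that frame's sender.
    """
    stats = {}
    pending = None  # key of the data frame currently awaiting an ACK
    for ev in events:
        if ev[0] == "DATA":
            key = (ev[1], ev[2])
            entry = stats.setdefault(key, {"attempts": 0, "acked": False})
            entry["attempts"] += 1
            pending = key
        elif ev[0] == "ACK":
            if pending is not None and pending[0] == ev[1]:
                stats[pending]["acked"] = True
            pending = None
    return stats
```

Exchanges with several attempts, or with no ACK at all, are the ones worth correlating with attenuation or interference.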
Broadband interference
[Figure annotations: ~9 am, 12-2 pm]
Interference fingerprints
TCP performance measures
- Want to measure TCP performance bottlenecks
  - Compares actual goodput with (modeled) ideal goodput [JP98]
- Major problems in UCSD CSE TCP bulk flows
  - 30% small receiver window
  - 19% AP retry bug
  - 30% AP 802.11b/g compatibility policy (protected mode)
[Figure labels: Rcv wnd = 0, Internet Delay/Loss, Wireless Loss, Wireless Delay, Over-protected, Best Rate?]
Automating Cross-Layer Diagnosis of Enterprise Wireless Networks. Cheng, Afanasyev, Benko, Verkaik, Snoeren, Voelker, and Savage. SIGCOMM 2007
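As a simple illustration of comparing actual goodput against a modeled ceiling, the sketch below uses the standard bound that a TCP sender can have at most one receiver window in flight per round trip, so goodput cannot exceed rwnd / RTT. This is not the [JP98] model the slide refers to; it only shows why a small receiver window caps a flow regardless of the 802.11 link rate. Function and parameter names are hypothetical.

```python
def window_limited_goodput_bps(rwnd_bytes, rtt_s):
    """Upper bound on goodput when the receiver window is the bottleneck."""
    return rwnd_bytes * 8 / rtt_s


def goodput_ratio(bytes_delivered, duration_s, rwnd_bytes, rtt_s):
    """Measured goodput as a fraction of the window-limited bound."""
    actual_bps = bytes_delivered * 8 / duration_s
    return actual_bps / window_limited_goodput_bps(rwnd_bytes, rtt_s)
```

A ratio near 1 suggests the receiver window is the limiting factor; a much lower ratio points to another bottleneck such as wireless loss or delay.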
Putting everything together
[Figure: critical path diagnosis. Components: Scan/Association, DHCP, ARP, TCP, 802.11 Delay/Loss, Broadband Interference; output: Diagnose]
System status
- Real-time monitoring and diagnosis of UCSD CSE wireless network
  - 30 seconds delay
- Serving UCSD CSE wireless users
  - Resolved 67 tickets
  - Validated manually
- Discovered various implementation bugs and protocol problems
  - Only retry once
  - Do not respect CSMA, burst frames in a row
  - Very large transmission duration
  - Overly conservative 802.11g protection policy
  - …
- Working w/ vendors and admins to fix AP bugs
- Re-deployed in city 802.11 mesh network in the Bay Area
Conclusions
- Wireless diagnosis is challenging
  - Especially for large enterprise networks
  - Need to check a lot of factors
  - Need a system to do the job automatically
- Shaman: an automatic comprehensive wireless diagnosis system
  - Large-scale 24x7 monitoring
  - High-resolution synchronization
  - Models protocol states on the critical path
    - Mobility management
    - TCP
    - 802.11 delay and losses
  - Automatically diagnose user problems in real-time
Q & A
Source code, data, live traffic monitoring: sysnet.ucsd.edu/wireless/
Related Work
- WiFiProfiler [Mobisys06]
  - Peer diagnosis among clients
- DAIR [NSDI07]
  - Distributed monitors but application-specific traffic summaries
  - No centralized merging/sync
  - Fine-grained location system
- Wit [SIGCOMM06]
  - Automatic protocol states inference engine
- Airmagnet
  - Troubleshooting user problems (PHY/MAC)
  - Detect interferences, security problems, protocol incompatibilities
  - Special devices to perform active probes
- Airtight/Airdefense/Kismet
  - Detect rogue APs and security problems
Other work
- Metropolitan-scale Wi-Fi location system. Cheng, Chawathe, LaMarca, Krumm. Mobisys 2005
- Monkey See, Monkey Do: A tool for TCP Tracing and Replaying. Cheng, Hoezle, Cardwell, Savage, Voelker. USENIX 2004
- Fatih: Detecting and Isolating Malicious Routers. Mizrak, Cheng, Marzullo, Savage. DSN 2005
- Total Recall: System Support for Automated Availability Management. Bhagwan, Tati, Cheng, Savage, Voelker. NSDI 2003