Post on 08-Jan-2018
Magellan: A Tool for Unicast Fault Isolation
Cengiz Alaettinoglu, Packet Design LLC
Ramesh Govindan, Information Sciences Institute
John Mehringer, Information Sciences Institute
Motivation
Why can't I reach www.cnn.com? Why is the Internet soooo slow today? It was fine yesterday!
Goals
User's perspective: what is of interest to the user
Internet-wide routing monitoring, not just an AS
History of route changes, not just a snapshot
Fault diagnosis: link/router failure/repair
Challenges
Scaling: directed search by correlating destinations; shared learning
Automated heuristics for fault isolation: route change; location of link/router failure/repair; oscillations; others?
Data Collection
Select targets interesting to the user: tcpdump/libpcap; weighting/aging (not implemented)
Initial path to targets: traceroute
Monitoring paths: carefully constructed ICMP probes
Snapshot
Monitoring
Construct a routing graph. Nodes: routers. Links: (to, from, source, destination, hop, statistics...)
Probe each link: send two ICMP Echo Request packets to the destination
For ttl = hop - 1 and ttl = hop, verify the incident routers (to, from)
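The link record and the two-probe check can be sketched as follows. This is a minimal illustration, not Magellan's code: the `Link` class, `probe_plan`, and `verify` are hypothetical names, and the sketch assumes that `hop` is the hop count of the `to` router as seen from the source, so that a packet with ttl = hop - 1 expires at `from` and one with ttl = hop expires at `to`. Sending the ICMP packets themselves is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """One link of the routing graph (field names follow the slide)."""
    from_router: str          # upstream end of the link
    to_router: str            # downstream end of the link
    source: str               # monitoring host
    destination: str          # target whose path crosses this link
    hop: int                  # ASSUMPTION: hop count of to_router from source (>= 2)
    statistics: dict = field(default_factory=dict)


def probe_plan(link):
    """Return the two (destination, ttl) probes used to test the link."""
    return [(link.destination, link.hop - 1), (link.destination, link.hop)]


def verify(link, replies):
    """Check that the routers answering the two probes are the expected ones.

    `replies` maps ttl -> address of the router that answered; a missing
    entry stands for a timeout.
    """
    return (replies.get(link.hop - 1) == link.from_router
            and replies.get(link.hop) == link.to_router)


if __name__ == "__main__":
    link = Link("r3", "r4", "src", "dst", hop=4)
    print(probe_plan(link))
    print(verify(link, {3: "r3", 4: "r4"}))
```

A test is positive when both expected routers answer; any other answer, or a timeout, counts as negative.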
Scheduling Probes
WRR (weighted round-robin): schedule a probe for each link; limits the rate of probe packets
Weights: some links are more important/interesting
Distance to link; number of destinations using it; history of volatility (exponentially averaged)
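One way to picture the scheduling is the sketch below. It uses the "smooth" weighted round-robin variant and a plain exponential average; both are stand-ins chosen for illustration, since the slides do not say which WRR variant or averaging constant Magellan uses. The names `LinkState`, `next_link`, and `ewma`, and the value of `alpha`, are assumptions.

```python
from dataclasses import dataclass


@dataclass
class LinkState:
    name: str
    weight: float        # larger weight = probed more often
    current: float = 0.0  # running credit used by the scheduler


def next_link(links):
    """Pick the next link to probe (smooth weighted round-robin).

    Every call adds each link's weight to its credit, picks the link with
    the most credit, and charges it the sum of all weights.
    """
    total = sum(link.weight for link in links)
    for link in links:
        link.current += link.weight
    chosen = max(links, key=lambda link: link.current)
    chosen.current -= total
    return chosen


def ewma(previous, sample, alpha=0.25):
    """Exponentially averaged volatility; alpha is an assumed constant."""
    return alpha * sample + (1 - alpha) * previous


if __name__ == "__main__":
    links = [LinkState("near-link", 3.0), LinkState("far-link", 1.0)]
    print([next_link(links).name for _ in range(8)])
```

Calling `next_link` once per probe slot keeps the overall probe rate fixed by the caller while the weights decide how the slots are shared among links.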
Test Result
Positive: do nothing
Negative: determine new path
Incremental traceroute from the link, upstream and downstream
Determine cause: automatic, heuristics based
Active Fault Isolation
Link failure: probe the link using other destinations that use it; correlate results
Router failure: generalize on link failure
Oscillations: history of old routes; back and forth between a set of routes
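The three heuristics above might be sketched as below. The decision rules are assumptions made for illustration (the slides do not give thresholds): a link is called failed when probes through it fail for every destination that uses it, a router is called failed when every link incident on it has failed, and a path is called oscillating when it returns to a previously seen route at least `min_returns` times. All function names are hypothetical.

```python
def link_failed(results):
    """`results` maps destination -> True if the probe via the link succeeded."""
    return bool(results) and not any(results.values())


def router_failed(router, link_results):
    """Generalize on link failure: all links touching `router` have failed.

    `link_results` maps (from_router, to_router) -> per-destination results.
    """
    incident = [res for (frm, to), res in link_results.items()
                if router in (frm, to)]
    return bool(incident) and all(link_failed(res) for res in incident)


def is_oscillating(history, min_returns=2):
    """Detect back-and-forth movement among a set of routes.

    `history` is the time-ordered list of routes (tuples of routers).
    """
    seen = set()
    returns = 0
    previous = None
    for route in history:
        if route != previous:
            if route in seen:
                returns += 1   # changed back to a route used earlier
            seen.add(route)
        previous = route
    return returns >= min_returns


if __name__ == "__main__":
    a, b = ("r1", "r2", "r4"), ("r1", "r3", "r4")
    print(is_oscillating([a, b, a, b]))
    print(link_failed({"cnn": False, "yahoo": False}))
```

Correlating over several destinations is what separates a failed link from a single unreachable target: one successful probe through the link is enough to rule the link out.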
Magellan Components
Magellan Nam
Perl Script
Visualization: offline or real-time; great for debugging/tuning
Snapshot
Link or router failure I want the nam buttons, etc...
Effectiveness thru Measurement
Picked 500 popular web sites: Yahoo, msn, aol, cnn, ... (www.web100.com)
Monitored routes to these destinations for 7 days
Measurements
Number of link probes: 839694 (1.39 probes per second)
Total failures: 2078 (router failures: 334; link failures: 951; unknown cause: 793)
Transients: number of oscillations: 541
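The reported figures are mutually consistent, as a short check shows. The only assumption is that probing ran continuously for the full 7 days; variable names are illustrative.

```python
# Figures taken from the Measurements slide.
link_probes = 839694
days_monitored = 7  # ASSUMPTION: probing ran continuously for all 7 days
failures = {"router": 334, "link": 951, "unknown": 793}

seconds = days_monitored * 24 * 60 * 60
probe_rate = link_probes / seconds          # probes per second
total_failures = sum(failures.values())
shares = {cause: count / total_failures for cause, count in failures.items()}

if __name__ == "__main__":
    print(f"probe rate: {probe_rate:.2f} per second")
    print(f"total failures: {total_failures}")
    for cause, share in shares.items():
        print(f"{cause}: {share:.0%}")
```

The rate comes out at about 1.39 probes per second and the three causes sum to the 2078 total, with roughly 46% link failures, 16% router failures, and 38% of unknown cause.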
Number of Path Changes
Effect of Path Length
Dominant Path
Cumulative Dominant Path
Future work: Distributed Magellan
Magellan 1
Magellan 2
Weight to probe inversely proportional to ratio of distances
Shared learning
Related Work
Topology maps: router/AS level interconnections; Mercator, skitter, AT&T; not all links are usable (routing policy/metrics)
Routing topology: effect of policy/metrics; Npd (Vern Paxson's work); focus is on measurement
Conclusions
Unicast fault isolation: user's perspective; automated heuristics; history of changes
http://www.isi.edu/scan