magellan: a tool for unicast fault isolation cengiz alaettinoglu packet design llc ramesh govindan...

Post on 08-Jan-2018


Magellan: A Tool for Unicast Fault Isolation

Cengiz Alaettinoglu, Packet Design LLC

Ramesh Govindan, Information Sciences Institute

John Mehringer, Information Sciences Institute

Motivation

Why can't I reach www.cnn.com? Why is the Internet soooo slow today? It was fine yesterday!

Goals

User's perspective: what is of interest to the user

Internet-wide routing monitoring, not just a single AS

History of route changes, not just a snapshot

Fault diagnosis: link/router failure/repair

Challenges

Scaling: directed search by correlating destinations; shared learning

Automated heuristics for fault isolation: route changes; location of link/router failure/repair; oscillations; others?

Data Collection

Select targets of interest to the user: tcpdump/libpcap; weighting/aging (not implemented)

Initial path to targets: traceroute

Monitoring paths: carefully constructed ICMP probes
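
The slides list weighting and aging of targets as not implemented. As a sketch of how it could work, assume destination addresses have already been pulled out of a tcpdump/libpcap capture as (timestamp, destination) pairs. The exponential decay, the `half_life` parameter, and the names `score_targets` and `select_targets` are hypothetical, not part of Magellan.

```python
import math
from collections import defaultdict


def score_targets(observations, now, half_life):
    """Score each destination by its exponentially aged packet count.

    observations: iterable of (timestamp_seconds, destination) pairs, assumed
    to be extracted from a tcpdump/libpcap capture (hypothetical input format).
    """
    decay = math.log(2) / half_life
    scores = defaultdict(float)
    for ts, dst in observations:
        # A packet seen `age` seconds ago contributes 2 ** (-age / half_life).
        scores[dst] += math.exp(-decay * (now - ts))
    return dict(scores)


def select_targets(observations, now, half_life, k):
    """Return the k highest-scoring destinations (hypothetical selection rule)."""
    scores = score_targets(observations, now, half_life)
    # Highest score first; ties broken by destination for a stable order.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:k]
```

For example, `select_targets([(0, "a"), (0, "a"), (90, "b"), (100, "c")], now=100, half_life=10, k=2)` returns `["c", "b"]`: the two old packets for "a" have decayed to about 0.002 combined, so frequently but long-ago visited sites age out.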

Snapshot

Monitoring

Construct a routing graph. Nodes: routers. Links: (to, from, source, destination, hop, statistics, ...)

Probe each link: send two ICMP Echo Request packets to the destination

With TTL = hop - 1 and TTL = hop, verify that the replies come from the link's incident routers (to, from)
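
A minimal sketch of one link record and its two-probe test follows. It assumes `hop` is the TTL at which the link's `to` router answers, so the `from` router answers at TTL hop - 1; the slides do not spell out this convention. The `Link` class, `probe_plan`, and `link_ok` are hypothetical names, the statistics field is omitted, and sending real ICMP packets is left out: `replies` stands in for the addresses that answered each TTL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One edge of the routing graph (statistics field omitted for brevity)."""
    from_router: str
    to_router: str
    source: str
    destination: str
    hop: int  # assumed: TTL at which to_router answers


def probe_plan(link):
    """Return (destination, ttl, expected_router) for the two Echo Requests."""
    return [
        (link.destination, link.hop - 1, link.from_router),
        (link.destination, link.hop, link.to_router),
    ]


def link_ok(link, replies):
    """replies maps ttl -> address that answered that probe (None if silent)."""
    return all(replies.get(ttl) == router for _dst, ttl, router in probe_plan(link))
```

Only two probes per link are needed because the rest of the path is covered by tests on the other links.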

Scheduling Probes

WRR: schedule a probe for each link; this limits the rate of probe packets. Weights: some links are more important/interesting than others

Distance to link; number of destinations using it; history of volatility

Exponentially averaged
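
The scheduling idea can be sketched with smooth weighted round robin, a common WRR variant; the slides do not say which WRR algorithm Magellan uses. Calling `pick` once per probe interval caps the probe rate while probing heavier links more often. The slides do not say how distance, destination count, and volatility combine, so `link_weight` is a hypothetical formula, and `ewma` with `alpha = 0.25` is an assumed smoothing factor for the volatility history.

```python
def ewma(previous, sample, alpha=0.25):
    """Exponentially weighted moving average (alpha is an assumed value)."""
    return alpha * sample + (1 - alpha) * previous


def link_weight(distance, n_destinations, volatility):
    """Hypothetical weight: nearer, busier, more volatile links score higher."""
    return n_destinations * (1.0 + volatility) / max(distance, 1)


class SmoothWRR:
    """Smooth weighted round robin: picks each key in proportion to its weight."""

    def __init__(self, weights):
        self.weights = dict(weights)
        self.current = {key: 0.0 for key in self.weights}

    def pick(self):
        total = sum(self.weights.values())
        # Every key earns credit equal to its weight on each round...
        for key, weight in self.weights.items():
            self.current[key] += weight
        # ...and the key with the most credit is chosen and pays back the total.
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= total
        return chosen
```

With weights `{"a": 3, "b": 1}`, eight calls to `pick` yield a, a, b, a, a, a, b, a, interleaving the lighter link instead of bunching it. After each test, a volatility estimate could be updated as `ewma(volatility, 1.0 if route_changed else 0.0)` and fed back into `link_weight`.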

Test Result

Positive: do nothing

Negative: determine the new path

Incremental traceroute from the link, upstream and downstream

Determine the cause: automatic, heuristics-based
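
On a negative result, only the changed part of the path needs re-tracing. A sketch under stated assumptions: `hop_at(ttl)` is a hypothetical probe function returning the router that now answers at that TTL (or None), `old_path[k]` is the router previously seen at TTL k + 1, and `start` is the TTL of the far end of the failed link. The walk moves upstream until a hop matches the old path and downstream until a router from the old path reappears.

```python
def incremental_traceroute(old_path, hop_at, start, max_ttl=30):
    """Re-trace only the changed segment of a path around a failed link.

    Returns (first_ttl, new_segment): the TTL where the path first differs and
    the routers seen from there until it rejoins the old route.
    """
    # Upstream: move toward the source while hops disagree with the old path.
    upstream = []
    first = start
    while first > 1:
        router = hop_at(first - 1)
        if router == old_path[first - 2]:
            break
        upstream.insert(0, router)
        first -= 1

    # Downstream: probe outward until a router from the old path reappears
    # (possibly at a different TTL if the path length changed) or no reply.
    downstream = []
    remaining = set(old_path[start - 1:])
    for ttl in range(start, max_ttl + 1):
        router = hop_at(ttl)
        if router is None or router in remaining:
            break
        downstream.append(router)
    return first, upstream + downstream
```

If the old path was a, b, c, d, e and the new one is a, b, x, y, d, e, a failed test at TTL 3 costs five probes and returns `(3, ["x", "y"])`, instead of a full traceroute.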

Active Fault Isolation

Link failure: probe the link using other destinations that use it; correlate the results

Router failure: generalize from link failure
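
The correlation step might look like this. The decision rules are assumptions, not Magellan's published heuristics: a link counts as failed only if every destination crossing it failed its test (a mix of passes and failures suggests a route change instead), and a router counts as failed only if it has at least two tested links and all of them failed.

```python
from collections import defaultdict


def classify_failures(results):
    """results: iterable of ((from_router, to_router), destination, passed) tuples.

    Hypothetical rules: a link has failed if every destination crossing it failed;
    a router has failed if it has at least two tested links and all of them failed.
    """
    per_link = defaultdict(list)
    for link, _destination, passed in results:
        per_link[link].append(passed)
    failed_links = {link for link, outcomes in per_link.items() if not any(outcomes)}

    # Generalize to routers: collect the failure flag of every link touching each router.
    per_router = defaultdict(list)
    for link in per_link:
        for router in link:
            per_router[router].append(link in failed_links)
    failed_routers = {
        router for router, flags in per_router.items() if len(flags) >= 2 and all(flags)
    }
    return failed_links, failed_routers
```

Requiring two or more links keeps a single dead link from being blamed on the router at its end.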

Oscillations: keep a history of old routes; detect paths going back and forth between a set of routes
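
Oscillation detection needs only the route history per destination. As a hypothetical criterion (the slides do not define one precisely), this sketch counts each route change that returns to a previously seen route, so A, B, A counts once.

```python
def count_oscillations(history):
    """history: chronological list of routes (e.g. tuples of routers) to one destination.

    Counts changes that return to a route seen earlier (hypothetical criterion).
    """
    seen = set()
    oscillations = 0
    previous = None
    for route in history:
        if route == previous:
            continue  # no change since the last observation
        if route in seen:
            oscillations += 1  # went back to an older route
        seen.add(route)
        previous = route
    return oscillations
```

A history A, A, B, A, B, C yields 2: the returns to A and to B count, while the first appearances of B and C are ordinary route changes.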

Magellan Components

Magellan Nam

Perl Script

Visualization: offline or real-time; great for debugging/tuning

Snapshot

Link or router failure (I want the nam buttons, etc.)

Effectiveness through Measurement

Picked 500 popular web sites: Yahoo, MSN, AOL, CNN, ... (www.web100.com)

Monitored routes to these destinations for 7 days

Measurements

Number of link probes: 839,694; probes per second: 1.39

Total failures: 2078 (router failures: 334; link failures: 951; unknown cause: 793)

Transients: number of oscillations: 541
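
The reported figures are internally consistent, as a quick check confirms: seven days is 604,800 seconds, giving about 1.39 probes per second, and the three failure categories sum to the total. The percentage shares (roughly 16% router, 46% link, 38% unknown) are derived here, not reported on the slides, and continuous monitoring over the seven days is assumed.

```python
# Figures from the Measurements slide.
link_probes = 839_694
seconds = 7 * 24 * 3600  # seven days, assumed to be monitored continuously
probe_rate = link_probes / seconds  # about 1.39 probes per second

failures = {"router": 334, "link": 951, "unknown": 793}
total_failures = sum(failures.values())  # 2078, matching the reported total
shares = {cause: count / total_failures for cause, count in failures.items()}

for cause, share in shares.items():
    print(f"{cause}: {share:.1%}")
```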

Number of Path Changes

Effect of Path Length

Dominant Path

Cumulative Dominant Path

Future work: Distributed Magellan

Magellan 1

Magellan 2

Probing weight inversely proportional to the ratio of distances
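
One hypothetical reading of this weighting: when several Magellan instances can probe the same link, each instance's share of the probing is proportional to 1/d for its distance d to the link, so an instance three times closer probes three times as often. `probe_shares` is an illustrative name, not part of Magellan.

```python
def probe_shares(distances):
    """distances: mapping Magellan instance -> hop distance to a shared link.

    Hypothetical rule: shares are proportional to 1 / distance and sum to 1.
    """
    inverse = {name: 1.0 / d for name, d in distances.items()}
    total = sum(inverse.values())
    return {name: v / total for name, v in inverse.items()}
```

With distances of 2 and 6 hops, the nearer instance takes 75% of the probes and the farther one 25%, so the ratio of shares equals the inverse ratio of distances.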

Shared learning

Related Work

Topology maps: router/AS-level interconnections (Mercator, skitter, AT&T); not all links are usable (routing policy/metrics)

Routing topology: effect of policy/metrics (Npd, Vern Paxson's work); focus is on measurement

Conclusions

Unicast fault isolation; user's perspective; automated heuristics; history of changes

http://www.isi.edu/scan
