Posted on 20-Dec-2015
Internet Iso-bar: A Scalable Overlay Distance Monitoring System
Yan Chen, Lili Qiu, Chris Overton and Randy H. Katz
Motivations
• Applications of end-to-end distance monitoring/estimation
  – Overlay routing/location
  – Peer-to-peer systems
  – VPN management/provisioning
  – Service redirection/placement
  – Cache-infrastructure configuration
• Requirements for an E2E distance monitoring system
  – Scalable: a small amount of probing traffic and system load
  – Accurate: capture congestion/failures, plus latency estimation
  – Fast: little computation, for real-time estimation
  – Incrementally deployable
  – Easy to use
• Benefit applications
  – Application-driven measurement
  – Inference techniques for troubleshooting and root cause analysis
  – Improve application performance and reliability
E2E Estimation/Monitoring Systems Comparison
(N hosts, AP address prefixes, K landmarks, C clusters; N > AP ≫ C, C ≥ K)

| Properties | GNP | Akamai | IDMaps | RON | Internet Iso-bar |
|---|---|---|---|---|---|
| Dynamic monitoring | Static estimation | Yes | Yes | Yes | Yes |
| Scalability | O(N·K) probes; each landmark takes O(N) | O(F·AP) probes, F = number of CDN edge server farms | Clustering needs pairwise distances between all pairs of APs; O(C² + AP) probes | O(N²) probes | O(C² + N) probes |
| Estimation accuracy | Accurate, but only symmetric distance | No existing comparison | Inaccurate: triangle inequality & proximity-based clustering | Exact measurements; most accurate | Similar accuracy to GNP |
| Monitors deployment | End hosts | CDN edge servers | Transit ASes (hard to deploy) | End hosts | End hosts |
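To make the scalability row concrete, here is a back-of-envelope Python sketch that counts probes per measurement round for three of the schemes. It assumes, hypothetically, that each probe is one-directional and sent once per round, and that Iso-bar monitors form a full mesh and each probes the non-monitor hosts of its own cluster; the function names are illustrative only.

```python
# Rough per-round probe counts for three schemes in the comparison table.
# Assumption (hypothetical): every probe is one-directional and sent once per round.


def ron_probes(n: int) -> int:
    """RON-style full mesh: each of the n hosts probes every other host."""
    return n * (n - 1)


def gnp_probes(n: int, k: int) -> int:
    """GNP-style: each of the n hosts probes each of the k landmarks."""
    return n * k


def isobar_probes(n: int, c: int) -> int:
    """Iso-bar-style: full mesh among the c monitors, plus one probe from
    each monitor to every non-monitor host in its cluster."""
    return c * (c - 1) + (n - c)


if __name__ == "__main__":
    n, k, c = 104, 15, 15  # 104 AMP sites; 15 landmarks / 15 clusters
    iso, ron = isobar_probes(n, c), ron_probes(n)
    print(f"RON: {ron}, GNP: {gnp_probes(n, k)}, Iso-bar: {iso}")
    print(f"Iso-bar / RON = {iso / ron:.3f}")
```

With these inputs the counts are 10,712 for RON, 1,560 for GNP and 299 for Iso-bar, so the Iso-bar count is under 3% of RON's. This is only a rough count under the stated assumptions, not the method behind the overhead figure in the results.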
Problem Formulation
• Given N end hosts, how can we select a subset of them as monitors and build a scalable overlay distance monitoring service without knowing the underlying topology?
• Distance info desired: report congestion/failure if one occurs; otherwise, report latency
E2E Congestion/Failures Analysis
• Based on the National Laboratory for Applied Network Research (NLANR) AMP data set
  – 104 sites in the US (including Alaska and Hawaii) & Australia; every host pings all other hosts every minute
  – Sliding window of 10 samples, using the minimum RTT as the latency sample
  – 105M measurements, 6/25/01 – 7/1/01
  – Congestion/failures (uniformly denoted as congestion) defined as measurement “loss” or (latency > geo mean × geo stdev)
• Congestion is not common: only 0.96% of samples
• A few congested links dominate E2E congestion
  – Apart from congestion at the last mile, E2E congestion exhibits strong spatial correlation
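A minimal Python sketch of the congestion definition above. It assumes raw pings arrive as RTTs in ms with None for a lost ping, that a sliding window in which every ping was lost counts as a “loss”, and that the geometric mean and (population) geometric standard deviation are computed over the path's own non-lost samples. The slides do not pin down these details, so treat them as assumptions.

```python
import math
from typing import List, Optional, Tuple

WINDOW = 10  # sliding window of 10 raw samples


def latency_samples(rtts: List[Optional[float]], window: int = WINDOW) -> List[Optional[float]]:
    """Minimum RTT over each sliding window of raw pings (None = lost ping).

    Assumption: a window in which every ping was lost yields None ("loss").
    """
    samples = []
    for start in range(len(rtts) - window + 1):
        received = [r for r in rtts[start:start + window] if r is not None]
        samples.append(min(received) if received else None)
    return samples


def geo_mean_and_stdev(values: List[float]) -> Tuple[float, float]:
    """Geometric mean and geometric standard deviation (population form)."""
    logs = [math.log(v) for v in values]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return math.exp(mu), math.exp(sigma)


def congestion_flags(samples: List[Optional[float]]) -> List[bool]:
    """Flag a sample as congested if it is a loss or exceeds geo mean x geo stdev.

    Assumption: the statistics come from the path's non-lost samples.
    """
    received = [s for s in samples if s is not None]
    if not received:
        return [True] * len(samples)  # nothing but losses
    gmean, gstdev = geo_mean_and_stdev(received)
    threshold = gmean * gstdev  # gstdev >= 1, so threshold >= gmean
    return [s is None or s > threshold for s in samples]
```

Because the geometric standard deviation is at least 1, a path with perfectly stable latency never flags its normal samples; only losses or clear outliers cross the threshold.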
[Figure: NLANR AMP sites]
Internet Iso-bar
• Procedures
  1. Cluster hosts that perceive similar performance to a small set of sites (landmarks)
  2. For each cluster, select a monitor for active and continuous probing
  3. Estimate the distance between any pair of hosts using inter- and intra-cluster distance
Internet Iso-bar (I): Host Clustering
• Define a correlation distance between each pair of hosts
  – Existing work uses network proximity: cor_dist(i, j) = net_dist(i, j) (denoted $p_{ij}$)
  – Iso-bar uses the network distance vector (the k landmarks are used for clustering only): $\mathrm{netV}_i = [p_{i1}, p_{i2}, \ldots, p_{ik}]^T$
    • Euclidean distance based:
$$\mathrm{cor\_dist}(i,j) = |\mathrm{netV}_i - \mathrm{netV}_j| = \sqrt{\sum_{m=1}^{k} (p_{im} - p_{jm})^2}$$
    • Cosine vector similarity based:
$$\mathrm{cor\_dist}(i,j) = 1 - \frac{\mathrm{netV}_i \cdot \mathrm{netV}_j}{|\mathrm{netV}_i|\,|\mathrm{netV}_j|} = 1 - \frac{\sum_{m=1}^{k} p_{im}\, p_{jm}}{\sqrt{\sum_{m=1}^{k} p_{im}^2}\,\sqrt{\sum_{m=1}^{k} p_{jm}^2}}$$
• Apply generic clustering methods
  – Optimize the worst case: minimize the maximum radius of all clusters (limit_num_minRmax)
  – Optimize the average case: minimize the total host-monitor distance (limit_num_minDistSum)
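The two correlation distances translate directly into code. For the clustering step, the sketch below swaps in the classic farthest-first traversal heuristic for the k-center problem as one way to approximate the limit_num_minRmax objective; the slides do not say which algorithm Iso-bar actually uses, and the function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean_cor_dist(u: Vector, v: Vector) -> float:
    """|netV_i - netV_j|: Euclidean distance between landmark-distance vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_cor_dist(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; 0 when the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def farthest_first_clusters(
    vectors: Dict[str, Vector],
    c: int,
    dist: Callable[[Vector, Vector], float] = euclidean_cor_dist,
) -> Dict[str, List[str]]:
    """Greedy k-center heuristic (farthest-first traversal) for a
    minRmax-style objective. Returns {center host: [member hosts]}."""
    hosts = sorted(vectors)
    centers = [hosts[0]]  # arbitrary but deterministic first center
    while len(centers) < min(c, len(hosts)):
        candidates = [h for h in hosts if h not in centers]
        # Next center: the host whose nearest existing center is farthest away.
        centers.append(max(candidates,
                           key=lambda h: min(dist(vectors[h], vectors[m]) for m in centers)))
    clusters: Dict[str, List[str]] = {m: [] for m in centers}
    for h in hosts:
        # Assign every host to its nearest center.
        nearest = min(centers, key=lambda m: dist(vectors[h], vectors[m]))
        clusters[nearest].append(h)
    return clusters
```

Note the design difference between the two distances: cosine distance treats proportional vectors such as [1, 2] and [2, 4] as identical (distance 0), grouping hosts by the shape of their landmark profile, while Euclidean distance also separates them by magnitude.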
[Figure: Diagram of Internet Iso-bar. End hosts are grouped into clusters A, B and C, with landmarks shown; each cluster has a monitor, distance probes run from each monitor to its hosts, and the monitors probe one another.]
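Once hosts are clustered, one natural way to pick each cluster's monitor under the minDistSum objective is the cluster medoid: the member with the smallest total distance to the other members. This is an assumption for illustration, not necessarily the authors' selection rule; the sketch also reports the two objectives (Rmax and DistSum) for a candidate clustering, using Euclidean distance via `math.dist` (Python 3.8+).

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def pick_monitor(members: List[str], vectors: Dict[str, Vector]) -> str:
    """Medoid: the member whose total distance to all members is smallest."""
    return min(members, key=lambda m: sum(math.dist(vectors[m], vectors[h]) for h in members))


def monitor_objectives(
    clusters: List[List[str]], vectors: Dict[str, Vector]
) -> Tuple[List[str], float, float]:
    """Pick one monitor per cluster and report both clustering objectives:
    Rmax (largest host-monitor distance) and DistSum (total host-monitor distance)."""
    monitors = [pick_monitor(members, vectors) for members in clusters]
    dists = [
        math.dist(vectors[m], vectors[h])
        for m, members in zip(monitors, clusters)
        for h in members
    ]
    return monitors, max(dists), sum(dists)
```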
Internet Iso-bar (II): Distance Estimation
(pDist: predicted distance; mDist: measured distance)
• Intra-cluster estimation
  – If path(m, i) or path(m, j) is congested, report path(i, j) as congested
  – Otherwise, pDist(i, j) = (mDist(m, i) + mDist(m, j)) / 2
• Inter-cluster estimation
  – If path(m_i, i), path(m_i, m_j), or path(m_j, j) is congested, report path(i, j) as congested
  – Otherwise, pDist(i, j) = mDist(m_i, m_j)
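[Figure: intra-cluster case (hosts i and j share monitor m) and inter-cluster case (host i with monitor m_i, host j with monitor m_j)]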
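The estimation rules can be sketched in Python as follows. The data layout is hypothetical: each host maps to its monitor, each non-monitor host has its latest monitor-to-host probe, and monitor-to-monitor probes are keyed by ordered (m_i, m_j) pairs. A monitor is treated as being at distance 0 from itself, and the function returns None wherever the rules say to report congestion.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Probe:
    latency_ms: float  # latest measured distance (mDist)
    congested: bool    # whether the probe observed congestion/failure


def _probe_to_host(host: str, monitor_of: Dict[str, str], host_probes: Dict[str, Probe]) -> Probe:
    # A monitor is at distance 0 from itself and never congested to itself.
    if monitor_of[host] == host:
        return Probe(0.0, False)
    return host_probes[host]


def estimate(
    i: str,
    j: str,
    monitor_of: Dict[str, str],
    host_probes: Dict[str, Probe],
    monitor_probes: Dict[Tuple[str, str], Probe],
) -> Optional[float]:
    """Predicted distance between hosts i and j, or None to report congestion."""
    mi, mj = monitor_of[i], monitor_of[j]
    pi = _probe_to_host(i, monitor_of, host_probes)
    pj = _probe_to_host(j, monitor_of, host_probes)
    if mi == mj:  # intra-cluster: average of the two monitor-to-host distances
        if pi.congested or pj.congested:
            return None
        return (pi.latency_ms + pj.latency_ms) / 2
    pm = monitor_probes[(mi, mj)]  # inter-cluster: monitor-to-monitor distance
    if pi.congested or pm.congested or pj.congested:
        return None
    return pm.latency_ms
```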
Evaluation Methodology
• Internet measurement data
  – NLANR AMP data set
    • Clustering uses the geometric mean of measurements on the training date
    • Estimation dates: 6/25/01 – 7/24/01, 12/06/01
  – Keynote CDN measurement data
    • 63 agents covering all major ISPs in the US, Europe, Asia & Australia
    • 2 targets (CDN re-directors) in Boston and Texas
    • Measure TCP connection time (2/3 of the handshake) from each agent to each target every minute
    • Training date: 10/21/2002
    • Estimation dates: 10/21/2002 – 11/25/2002
• Latency estimation results are similar for both datasets; NLANR results are presented here
Evaluation Methodology (II)
• Estimation metrics
  – Relative accuracy error for uncongested latency
  – Stability
  – For dynamic monitoring systems, the amount of congestion captured and the false-positive ratio
• Internet distance estimation techniques evaluated
  – Omniscient: use the g-mean data of (source, dest) on the training date
  – Global Network Positioning (GNP)
  – Clustering with network distance vector (Iso-bar)
  – Clustering with network proximity
• 15 clusters vs. 15 landmarks of GNP
$$\mathrm{relative\_error} = \mathrm{Avg}\left[\,\left|\log_2 \frac{\mathrm{predicted\_dist}}{\mathrm{measured\_dist}}\right|\,\right]$$
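A sketch of the evaluation metric and the Omniscient baseline. It assumes the metric is Avg[|log2(predicted_dist / measured_dist)|], averaged over a pair's estimation-day measurements, and it uses a nearest-rank percentile for the 90th-percentile summary (our choice; the slides do not specify the percentile method). The sample data in the main block is made up for illustration.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def geo_mean(values: List[float]) -> float:
    return math.exp(sum(math.log(v) for v in values) / len(values))


def omniscient_predictor(training: Dict[Pair, List[float]]) -> Dict[Pair, float]:
    """Omniscient baseline: predict each (source, dest) pair by the geometric
    mean of its own training-date measurements."""
    return {pair: geo_mean(samples) for pair, samples in training.items()}


def relative_error(predicted: float, measured: List[float]) -> float:
    """Average of |log2(predicted / measured)|; 0 means an exact prediction."""
    return sum(abs(math.log2(predicted / m)) for m in measured) / len(measured)


def percentile(values: List[float], q: float) -> float:
    """Nearest-rank percentile (e.g. q=90 for the 90th percentile)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    training = {("a", "b"): [10.0, 40.0], ("a", "c"): [50.0, 50.0]}
    estimation_day = {("a", "b"): [20.0, 25.0], ("a", "c"): [100.0]}
    predictions = omniscient_predictor(training)
    errors = [relative_error(predictions[p], estimation_day[p]) for p in estimation_day]
    print("90th percentile relative error:", percentile(errors, 90))
```

The log ratio penalizes over- and under-prediction symmetrically: predicting 20 ms for a 10 ms path and 10 ms for a 20 ms path both give an error of 1.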
Latency Prediction Accuracy & Stability
• Training date: 06/25/01
• Estimation dates: 06/25/01 - 12/06/01
• Summary of the 90th percentile relative error for various distance estimation methods
[Figure: Cumulative probability of relative error (training date 06/25/01, estimation date 07/24/01) for Omniscient, GNP, Iso-bar: minDistSum, Iso-bar: minRmax, and proximity-based clustering]
[Figure: Relative error vs. interval between training and estimation dates (0–180 days) for Omniscient, GNP, Iso-bar: minDistSum, Iso-bar: minRmax, and proximity-based clustering]
Distance Estimation Results
• Latency estimation when uncongested
  – Omniscient is the most accurate, but unscalable
  – GNP and Iso-bar come second
    • Both have good accuracy and stability for distance estimation
    • GNP is a static approach and is unscalable for online monitoring
  – Iso-bar outperforms proximity-based clustering by 50%
    • 90th-percentile relative error < 0.5: e.g., for a 60 ms latency, 45 ms < prediction < 90 ms
• Congestion/failure estimation
  – 6/25/01 – 7/01/01: on average 148K congested measurements per day
  – Iso-bar captures 78% of them, with a 32% false-positive ratio
  – Only 3% of the monitoring overhead of RON
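The congestion results rest on two ratios. The sketch below shows one way to compute them from per-measurement ground truth and Iso-bar's reports; it assumes the false-positive ratio is the fraction of reported congestion that was not actually congested, a definition the slides leave open.

```python
from typing import List, Tuple


def detection_stats(actual: List[bool], reported: List[bool]) -> Tuple[float, float]:
    """Return (capture rate, false-positive ratio) for congestion reports.

    capture rate         = reported true congestion / all true congestion
    false-positive ratio = reported but not congested / all reports
    (the second definition is an assumption)
    """
    true_pos = sum(1 for a, r in zip(actual, reported) if a and r)
    false_pos = sum(1 for a, r in zip(actual, reported) if r and not a)
    n_actual = sum(actual)
    n_reported = sum(reported)
    capture = true_pos / n_actual if n_actual else 0.0
    fp_ratio = false_pos / n_reported if n_reported else 0.0
    return capture, fp_ratio
```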
Conclusions
• Propose Internet Iso-bar
  – Cluster hosts based on network similarity
  – Inter- and intra-cluster latency estimation, with a first-step heuristic for congestion/failure detection
• Preliminary results are promising
  – High accuracy & stability for normal latency estimation
  – Simple congestion-estimation heuristics capture 78% of congestion, with a 32% false-positive ratio and only 3% of RON's monitoring overhead
Ongoing Work
• Current focus is shifting from latency estimation to congestion/failure estimation
  – Apply topology information, e.g., lossy link detection with network tomography
  – Cluster and choose monitors based on the lossy links
• Benefit applications
  – Dynamic node join/leave for P2P systems
    • A joining client pings the landmark sites to get its distance vector, compares it with those of the monitors, and joins the closest one
    • Split/merge clusters
  – Multi-path selection
• More comprehensive evaluation
  – Simulate with a large network
  – Deploy on PlanetLab, and operate at a finer level
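The join procedure for P2P systems can be sketched as below. `ping` is a hypothetical callback returning the client's measured RTT to a landmark, and Euclidean distance stands in for the correlation distance; the cosine variant would slot in the same way.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def choose_cluster(client_vector: Vector, monitor_vectors: Dict[str, Vector]) -> str:
    """Return the monitor whose landmark-distance vector is closest to the
    joining client's vector."""
    return min(monitor_vectors, key=lambda m: math.dist(client_vector, monitor_vectors[m]))


def join(ping: Callable[[str], float], landmarks: List[str],
         monitor_vectors: Dict[str, Vector]) -> str:
    """Measure the client's distance vector to the landmarks (via the
    hypothetical `ping`) and pick the closest monitor's cluster."""
    client_vector = [ping(landmark) for landmark in landmarks]
    return choose_cluster(client_vector, monitor_vectors)
```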
Internet Iso-bar
• Problem formulation: given N end hosts, how can we select a subset of them as monitors and build a scalable overlay distance monitoring service without knowing the underlying topology?
  – Distance info desired: report congestion/failure if one occurs; otherwise, report latency
• Our approach:
  1. Cluster hosts that perceive similar performance to a small set of sites (landmarks)
  2. For each cluster, select a monitor for active and continuous probing
  3. Estimate the distance between any pair of hosts using inter- and intra-cluster distance
• Performance evaluation
  – Using real Internet measurement data
  – Compared with other distance estimation services: GNP, RON
  – Performance metrics: accuracy and stability
Internet Iso-bar (II): Distance Estimation
• Congestion/failure analysis
  – Congestion/failures (uniformly denoted as congestion) are not common
    • Defined as measurement “loss” or (latency > geo mean × geo stdev)
    • Only 0.96% of 105M NLANR ping measurements over a week
  – This suggests that a few congested links dominate E2E congestion
    • Apart from congestion at the last mile, E2E congestion exhibits strong spatial correlation
• Estimation algorithms
  – Intra-cluster estimation (i and j use the same monitor m)
    • If path(m, i) or path(m, j) is congested, report path(i, j) as congested
    • Otherwise, predictedDist(i, j) = (measuredDist(m, i) + measuredDist(m, j)) / 2
  – Inter-cluster distance estimation
    • If path(monitor_i, i), path(monitor_i, monitor_j), or path(monitor_j, j) is congested, report path(i, j) as congested
    • Otherwise, predictedDist(i, j) = measuredDist(monitor_i, monitor_j)
  – Self-diagnostics of monitors: check for last-mile congestion