
Internet Iso-bar: A Scalable Overlay Distance Monitoring System

Yan Chen, Lili Qiu, Chris Overton and Randy H. Katz

Motivations

• Applications of end-to-end distance monitoring/estimation
– Overlay Routing/Location
– Peer-to-peer Systems
– VPN Management/Provisioning
– Service Redirection/Placement
– Cache-infrastructure Configuration

• Requirements for E2E distance monitoring system
– Scalable: a small amount of probing traffic and system load
– Accurate: capture congestion/failures + latency estimation
– Fast: small computation for real-time estimation
– Incrementally deployable
– Easy to use

• Benefit applications
– Application-driven measurement
– Inference techniques for troubleshooting, root cause analysis
– Improve application performance and reliability

E2E Estimation/Monitoring Systems Comparison

Systems compared: GNP, Akamai, IDMaps, RON, Internet Iso-bar
Notation: N hosts, AP address prefixes, K landmarks, C clusters; N > AP ≫ C, C ≥ K

• Dynamic monitoring
– GNP: Static estimation
– Akamai: Yes
– IDMaps: Yes
– RON: Yes
– Internet Iso-bar: Yes

• Scalability
– GNP: O(N·K) probes, each landmark takes O(N)
– Akamai: O(F·AP) probes, F = number of CDN edge server farms
– IDMaps: Clustering needs pair-wise distance between all pairs of APs, O(C² + AP) probes
– RON: O(N²) probes
– Internet Iso-bar: O(C² + N) probes

• Estimation accuracy
– GNP: Accurate, but only symmetric distance
– Akamai: No existing comparison
– IDMaps: Inaccurate: Triangulation inequality & proximity-based clustering
– RON: Exact measurements, most accurate
– Internet Iso-bar: Similar accuracy to GNP

• Monitors deployment
– GNP: End hosts
– Akamai: CDN edge servers
– IDMaps: Transit AS’s (hard to deploy)
– RON: End hosts
– Internet Iso-bar: End hosts
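To make the scalability row concrete, the sketch below plugs sample sizes into the probe-count expressions from the comparison. It takes the big-O terms literally (constants and self-pairs ignored), which is an assumption of this example and not a formula stated on the slides. The values N = 104 and C = 15 are borrowed from the NLANR AMP site count and the cluster count used in the evaluation; the function names are illustrative.

```python
def probes_gnp(n_hosts: int, k_landmarks: int) -> int:
    """O(N*K): every host probes every landmark."""
    return n_hosts * k_landmarks


def probes_ron(n_hosts: int) -> int:
    """O(N^2): every host probes every other host."""
    return n_hosts * n_hosts


def probes_isobar(n_hosts: int, c_clusters: int) -> int:
    """O(C^2 + N): monitors probe each other, plus one probe per host."""
    return c_clusters * c_clusters + n_hosts


n, c = 104, 15  # NLANR AMP sites and clusters used in the evaluation
ratio = probes_isobar(n, c) / probes_ron(n)
print(f"Iso-bar / RON probe ratio: {ratio:.1%}")
```

With these inputs the ratio comes out at roughly 3%, which is in line with the monitoring overhead relative to RON reported in the results slides, although the deck does not say how that figure was derived.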

Problem Formulation

• Given N end hosts, how to select a subset of them as monitors and build a scalable overlay distance monitoring service without knowing the underlying topology?

• Distance info desired: report congestion/failure if it occurs, otherwise latency

E2E Congestion/Failures Analysis

• Based on National Laboratory for Applied Network Research (NLANR) AMP data set
– 104 sites in US (including Alaska, Hawaii) & Australia, every host pings all other hosts every minute
– Sliding window of 10 samples, use minimum RTT as latency sample
– 105M measurements, 6/25/01 – 7/1/01
– Congestion/failures (uniformly denoted as congestion) defined as measurement “loss” or (latency > geo mean × geo stdev)

• Congestion is not common: only 0.96% of samples
• A few congested links dominate the E2E congestion
– Apart from congestion at the last mile, E2E congestion exhibits strong spatial correlation
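The sampling and congestion rule described above can be sketched as follows. This is a minimal illustration and not the authors' tooling: the function names are hypothetical, a lost probe is represented as `None`, and a window in which every probe is lost is assumed to yield a lost sample (the slides do not say how loss interacts with the sliding window).

```python
import math


def min_rtt_samples(rtts, window=10):
    """Latency samples: minimum RTT over each sliding window of probes.

    A lost probe is None; a window with no successful probe gives None
    (an assumption of this sketch).
    """
    samples = []
    for end in range(window, len(rtts) + 1):
        valid = [r for r in rtts[end - window:end] if r is not None]
        samples.append(min(valid) if valid else None)
    return samples


def geo_stats(latencies):
    """Geometric mean and geometric standard deviation of positive values."""
    logs = [math.log(x) for x in latencies]
    mu = sum(logs) / len(logs)
    var = sum((v - mu) ** 2 for v in logs) / len(logs)
    return math.exp(mu), math.exp(math.sqrt(var))


def is_congested(sample, geo_mean, geo_stdev):
    """Congestion: measurement loss, or latency > geo mean x geo stdev."""
    return sample is None or sample > geo_mean * geo_stdev
```

For example, a path whose historical latencies have geometric mean 10 ms and geometric standard deviation 10 would be flagged only for samples above about 100 ms, or when the sample is lost.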

NLANR AMP Sites

Internet Iso-bar

• Procedures
1. Cluster hosts that perceive similar performance to a small set of sites (landmarks)
2. For each cluster, select a monitor for active and continuous probing
3. Estimate distance between any pair of hosts using inter- and intra-cluster distance

Internet Iso-bar (I): Host Clustering

• Define correlation distance between each pair of hosts
– Existing work uses network proximity: cor_dist(i,j) = net_dist(i,j) (denoted p_ij)
– Iso-bar uses network distance vector (k landmarks for clustering only): netV_i = [p_i1, p_i2, …, p_ik]^T
  • Euclidean distance based:

$$\mathrm{cor\_dist}(i,j) = |netV_i - netV_j| = \sqrt{\sum_{m=1}^{k} (p_{im} - p_{jm})^2}$$

  • Cosine vector similarity based:

$$\mathrm{cor\_dist}(i,j) = 1 - \frac{netV_i \cdot netV_j}{|netV_i|\,|netV_j|} = 1 - \frac{\sum_{m=1}^{k} p_{im}\, p_{jm}}{\sqrt{\sum_{m=1}^{k} p_{im}^2}\,\sqrt{\sum_{m=1}^{k} p_{jm}^2}}$$

• Apply generic clustering methods
– Optimize the worst case: minimize the maximum radius of all clusters (limit_num_minRmax)
– Optimize the average case: minimize the sum of total host-monitor distance (limit_num_minDistSum)
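The two correlation distances above translate directly into code, shown here as a minimal sketch over plain lists of per-landmark latencies. The slides only say that "generic clustering methods" are applied, so the `greedy_k_center` function below is a stand-in chosen for this example: it is the standard farthest-point heuristic for the minimize-maximum-radius objective, not necessarily the method the authors used. All function names are illustrative.

```python
import math


def euclidean_cor_dist(v_i, v_j):
    """|netV_i - netV_j|: Euclidean distance between distance vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v_i, v_j)))


def cosine_cor_dist(v_i, v_j):
    """1 - cosine similarity of the two distance vectors."""
    dot = sum(a * b for a, b in zip(v_i, v_j))
    norm_i = math.sqrt(sum(a * a for a in v_i))
    norm_j = math.sqrt(sum(b * b for b in v_j))
    return 1.0 - dot / (norm_i * norm_j)


def greedy_k_center(vectors, k, dist):
    """Farthest-point heuristic (stand-in for a min-max-radius clustering).

    Returns (centers, assignment): indices of the chosen centers, and for
    each host the index of its nearest center.
    """
    centers = [0]  # start from an arbitrary host
    while len(centers) < k:
        # Next center: the host farthest from all centers chosen so far.
        farthest = max(
            range(len(vectors)),
            key=lambda h: min(dist(vectors[h], vectors[c]) for c in centers),
        )
        centers.append(farthest)
    assignment = [
        min(centers, key=lambda c: dist(vectors[h], vectors[c]))
        for h in range(len(vectors))
    ]
    return centers, assignment


# Four hosts, each described by latencies (ms) to k = 2 landmarks.
hosts = [[10, 50], [12, 52], [80, 20], [82, 21]]
centers, assignment = greedy_k_center(hosts, 2, euclidean_cor_dist)
```

Note the design difference between the two metrics: the cosine form treats hosts whose distance vectors are proportional (for example [10, 20] and [20, 40]) as identical, while the Euclidean form separates them.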

[Figure: Diagram of Internet Iso-bar. Labels: End Host, Landmark, Monitor, Cluster A, Cluster B, Cluster C. Legend: distance probes from monitor to its hosts; distance probes among monitors.]

Internet Iso-bar (II): Distance Estimation

• Intra-cluster estimation (hosts i and j share monitor m)
– If path(m, i) or path(m, j) is congested, report path(i, j) as congested
– Otherwise pDist(i,j) = (mDist(m, i) + mDist(m, j)) / 2

• Inter-cluster estimation (host i has monitor m_i, host j has monitor m_j)
– If path(m_i, i), path(m_i, m_j) or path(m_j, j) is congested, report path(i, j) as congested
– Otherwise pDist(i,j) = mDist(m_i, m_j)
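The intra- and inter-cluster rules above can be sketched as a single lookup function. The data layout is an assumption of this example: measured distances and congestion flags are keyed by unordered host pairs (so paths are treated as symmetric), and a monitor's distance to itself is taken as zero. None of these names come from the slides.

```python
CONGESTED = "congested"


def _measured(a, b, m_dist):
    """Measured distance on a monitored path; zero from a node to itself."""
    return 0.0 if a == b else m_dist[frozenset((a, b))]


def _flagged(a, b, congested):
    """True if the monitored path a-b is currently marked congested."""
    return a != b and frozenset((a, b)) in congested


def estimate_distance(i, j, monitor_of, m_dist, congested):
    """Predict the distance between hosts i and j from monitored paths only.

    monitor_of: host -> monitor of its cluster (a monitor maps to itself)
    m_dist:     frozenset({a, b}) -> measured latency on a monitored path
    congested:  set of frozenset({a, b}) paths currently congested
    """
    m_i, m_j = monitor_of[i], monitor_of[j]
    same_cluster = m_i == m_j
    legs = [(m_i, i), (m_i, j)] if same_cluster else [(m_i, i), (m_i, m_j), (m_j, j)]

    # Any congested leg marks the whole end-to-end path as congested.
    if any(_flagged(a, b, congested) for a, b in legs):
        return CONGESTED
    if same_cluster:
        return (_measured(m_i, i, m_dist) + _measured(m_i, j, m_dist)) / 2
    return _measured(m_i, m_j, m_dist)


monitor_of = {"a1": "mA", "a2": "mA", "mA": "mA", "b1": "mB", "mB": "mB"}
m_dist = {
    frozenset(("mA", "a1")): 10.0,
    frozenset(("mA", "a2")): 20.0,
    frozenset(("mB", "b1")): 5.0,
    frozenset(("mA", "mB")): 70.0,
}
```

In this toy setup, `estimate_distance("a1", "a2", monitor_of, m_dist, set())` averages the two host-monitor legs, while a query across clusters returns the monitor-to-monitor measurement unless one of the three legs is flagged.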

Evaluation Methodology

• Internet measurement data
– NLANR AMP data set
  • Clustering with geometric mean of training date
  • Estimation dates: 6/25/01 – 7/24/01, 12/06/01
– Keynote CDN measurement data
  • 63 agents covering all major ISPs in US, Europe, Asia & Australia
  • 2 targets (CDN re-directors) in Boston and Texas
  • Measure TCP connection time (2/3 of handshake) from each agent to target every minute
  • Training date: 10/21/2002
  • Estimation dates: 10/21/2002 – 11/25/2002

• Similar latency estimation results for both datasets; NLANR results are presented

Evaluation Methodology (II)

• Estimation metric
– Relative accuracy error for un-congested latency:

$$\mathrm{relative\_error} = \mathrm{Avg}\left[\,\left|\log_2 \frac{\mathrm{predicted\_dist}}{\mathrm{measured\_dist}}\right|\,\right]$$

– Stability
– For dynamic monitoring systems, amount of congestion captured and false positive ratio

• Internet distance estimation techniques evaluated
– Omniscient: use g-mean data of (source, dest) on training date
– Global Network Positioning (GNP)
– Clustering with network distance vector (Iso-bar)
– Clustering with network proximity
  • 15 clusters vs. 15 landmarks of GNP
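A short sketch of the relative error metric, read here as the average absolute base-2 logarithm of the predicted-to-measured ratio. The function name and the list-based interface are assumptions of this example.

```python
import math


def relative_error(predicted, measured):
    """Avg[|log2(predicted_dist / measured_dist)|] over paired samples."""
    errors = [abs(math.log2(p / m)) for p, m in zip(predicted, measured)]
    return sum(errors) / len(errors)
```

Under this reading the metric is symmetric in over- and under-estimation: predicting 120 ms or 30 ms for a measured 60 ms both give an error of 1, and a perfect prediction gives 0.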

Latency Prediction Accuracy & Stability

• Training date: 06/25/01
• Estimation dates: 06/25/01 - 12/06/01
• Summary of the 90th percentile relative error for various distance estimation methods

[Figure: cumulative probability vs. relative error; training date 06/25/01, estimation date 07/24/01. Curves: Omniscient, GNP, Iso-bar: minDistSum, Iso-bar: minRmax, Proximity-based clustering.]

[Figure: relative error vs. interval between training and estimation dates (days). Curves: Omniscient, GNP, Iso-bar: minDistSum, Iso-bar: minRmax, Proximity-based clustering.]

Distance Estimation Results

• Latency estimation when un-congested
– Omniscient is the most accurate, but unscalable
– GNP and Iso-bar are second
  • Both have good accuracy and stability for distance estimation
  • GNP is unscalable for online monitoring (static approach)
– Iso-bar outperforms proximity-based clustering by 50%
  • 90th percentile < 0.5, if 60ms latency, 45ms < prediction < 90ms

• Congestion/failures estimation
– 6/25/01 – 7/01/01, on average 148K congested measurements per day
– Iso-bar captures 78% of them, 32% false positive ratio
– Only 3% of monitoring overhead compared with RON

Conclusions

• Propose Internet Iso-bar
• Cluster hosts based on the network similarity
• Inter- and intra-cluster latency estimation with first-step heuristic for congestion/failure detection

• Preliminary results promising
– High accuracy & stability for normal latency estimation
– Simple heuristics of congestion estimation capture 78% of congestion, with 32% false positives, and only 3% of the monitoring overhead of RON

Ongoing Work

• Current focus switches from latency estimation to congestion/failures estimation
– Apply topology information, e.g. lossy link detection with network tomography
– Cluster and choose monitors based on the lossy links

• Benefit applications
– Dynamic node join/leave for P2P systems
  • Joining client pings landmark sites to get its distance vector, compares it with those of the monitors, and chooses the closest one to join
  • Split/merge clusters
– Multi-path selection

• More comprehensive evaluation
– Simulate with large network
– Deploy on PlanetLab, and operate at finer level
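The node-join step listed under dynamic join/leave can be sketched as a nearest-vector lookup. This is a hypothetical illustration: the slides do not say which distance is used to compare vectors, so Euclidean distance is assumed here, and the function and variable names are invented for the example.

```python
import math


def choose_monitor(client_vector, monitor_vectors):
    """Return the monitor whose landmark distance vector is closest.

    client_vector:   latencies from the joining client to each landmark
    monitor_vectors: monitor name -> latencies from it to the same landmarks
    Closeness is Euclidean distance between vectors (an assumption).
    """
    return min(
        monitor_vectors,
        key=lambda name: math.dist(client_vector, monitor_vectors[name]),
    )


monitors = {"mA": [10.0, 50.0], "mB": [80.0, 20.0]}
joined = choose_monitor([11.0, 51.0], monitors)
```

Only k landmark probes are needed for a join in this scheme, so the existing clusters do not have to be recomputed until a split or merge is triggered.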

Internet Iso-bar

Problem formulation:

Given N end hosts, how to select a subset of them as monitors and build a scalable overlay distance monitoring service without knowing the underlying topology?

Distance info desired: report congestion/failure if it occurs, otherwise latency

Our approach:
1. Cluster hosts that perceive similar performance to a small set of sites (landmarks)
2. For each cluster, select a monitor for active and continuous probing
3. Estimate distance between any pair of hosts using inter- and intra-cluster distance

Performance evaluation
– Using real Internet measurement data
– Compared with other distance estimation services: GNP, RON
– Performance metrics: accuracy and stability

Internet Iso-bar (II): Distance Estimation

• Congestion/failures analysis
– Congestion/failures (uniformly denoted as congestion) not common
  • Defined as measurement “loss” or (latency > geo mean × geo stdev)
  • Only 0.96% out of 105M NLANR ping measurements over a week
– Suggests a few congested links dominate the E2E congestion
  • Apart from congestion at the last mile, E2E congestion exhibits strong spatial correlation

• Estimation algorithms
– Intra-cluster estimation (i and j use the same monitor m)
  • If path(m, i) or path(m, j) is congested, report path(i, j) as congested
  • Otherwise predictedDist(i,j) = (measuredDist(m, i) + measuredDist(m, j)) / 2
– Inter-cluster distance estimation
  • If path(monitor_i, i), path(monitor_i, monitor_j) or path(monitor_j, j) is congested, report path(i, j) as congested
  • Otherwise predictedDist(i,j) = measuredDist(monitor_i, monitor_j)
– Self-diagnostics of monitors, check for last-mile congestion
