sonia fahmy students: minseok kwon, ossama younis

Post on 18-Nov-2014


1

Sonia Fahmy
Students: Minseok Kwon, Ossama Younis

Department of Computer Sciences, Purdue University

For slides, technical reports, and implementations, please see:

http://www.cs.purdue.edu/~fahmy/

This work was supported by NSF ANI-0238294 (CAREER) and the Schlumberger Foundation

Overlay Networks: A Dual Layer View

2

Why Overlays?

• Overlay networks help overcome deployment barriers to network-level solutions

• The advantages of overlays include flexibility, adaptivity, and ease of deployment

• Applications
  • Application-level multicast (e.g., End System Multicast/Narada)
  • Inter-domain routing pathology solutions (e.g., Resilient Overlay Networks)
  • Content distribution
  • Peer-to-peer networks

3

Overlay Multicast

Overlay link

Source

Routers and underlying links

Receivers

4

Why Characterize Overlays?

• Overlay multicast consumes additional network bandwidth and increases latency over IP multicast → quantify the overlay performance penalty

• Little work has been done on characterizing overlay multicast tree structure, especially large trees

• Such characterization gives insight into overlay properties and their causes, and a deeper understanding of different overlay multicast approaches → better overlay design

[Figure: Characterizing Overlay Networks, combining real data from ESM experiments, simulations, and analytical models]

5

Our Hypothesis

• Observations
  • Many high-degree, high-bandwidth routers are heavily utilized in the upper levels of ESM/TAG trees, where overlay links tend to be longer. Many hosts are connected to lower-degree, low-bandwidth routers, clustered close together at the lower levels of the trees. This lowers multicast cost.
• Causes
  • Overlay host distribution
  • Overlay protocol (full/partial info/overhead, delay/bandwidth/diameter/degree, source-based/shared tree/trees/mesh)
  • Topology (connectivity and degrees)

6

Overlay Tree Metrics

• Overlay cost = number of underlying hops traversed by every overlay link

• Link stress = total number of identical copies of a packet over the same underlying link

• Overlay cost = ∑ stress(i) for all router-to-router links i
• Number of hops and delays between parent and child hosts in an overlay tree
• Degree of hosts = host contribution to the link stress of the host-to-first-router link
• Degree of routers and hop-by-hop delays of underlying links traversed by overlay links
• Mean bottleneck bandwidth between the source and receivers
• Relative Delay Penalty (RDP), mean/longest latency

7

Metrics: Examples

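• Overlay cost = 12
• Link stress on A = 2
• RDP of B = (15+15+10)/20 = 2

[Figure: overlay tree from the source to receivers A, B, and C; the overlay path to B has delays of 15 ms, 15 ms, and 10 ms, versus 20 ms for the direct path]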
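The three metrics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the topology below is hypothetical (it is not the one in the slide figure), both directions of a link are counted as the same link, and the function names are made up for this example. The RDP call reuses the delays from the slide.

```python
from collections import Counter

def link_stress(overlay_paths):
    """Count the copies of a packet crossing each underlying link.

    Assumption: a link is undirected, so both directions count together.
    """
    stress = Counter()
    for hops in overlay_paths.values():
        for u, v in hops:
            stress[frozenset((u, v))] += 1
    return stress

def overlay_cost(stress, routers):
    """Sum stress over router-to-router links only (host access links are skipped)."""
    return sum(count for link, count in stress.items() if link <= routers)

def relative_delay_penalty(overlay_delays_ms, unicast_delay_ms):
    """Ratio of the delay along the overlay path to the direct unicast delay."""
    return sum(overlay_delays_ms) / unicast_delay_ms

# Hypothetical topology: hosts S, A, B and routers r1..r3.
ROUTERS = frozenset({"r1", "r2", "r3"})
OVERLAY_PATHS = {
    ("S", "A"): [("S", "r1"), ("r1", "r2"), ("r2", "A")],
    ("A", "B"): [("A", "r2"), ("r2", "r3"), ("r3", "B")],
}

stress = link_stress(OVERLAY_PATHS)
print(stress[frozenset(("A", "r2"))])            # 2: the packet enters and leaves A over one link
print(overlay_cost(stress, ROUTERS))             # 2: links r1-r2 and r2-r3, one copy each
print(relative_delay_penalty([15, 15, 10], 20))  # 2.0, the RDP of B from the slide
```

Restricting the cost to router-to-router links follows the "∑ stress(i)" definition on the metrics slide; host access links show up only in the per-link stress.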

8

Overlay Tree Structure

• Questions
  • What do overlay multicast trees look like? Why?
  • How much additional cost do they incur over IP multicast?
• Methodology
  • Use overlay trees (65 hosts) in ESM experiments (from CMU) in November 2002. Use public traceroute servers and synthesize approximate routes. (Most university hosts are connected to the Internet2 backbone network.)
  • PlanetLab experiments and tree/traceroute data

9

Results: End System Multicast

• Number of hops between two hosts versus level of host in overlay trees

• Distributions of per-hop delay for different overlay tree levels

(a) Tree level 1

(b) Tree levels 4-6

10

Results: End System Multicast

• Frequency of occurrence of number of hop values between two hosts

• Degree of host versus level of host in overlay tree

11

Experiments on PlanetLab

• Internet Experiments
  • Implement and experiment with TAG (Topology-Aware Grouping) on the PlanetLab (http://www.planet-lab.org) wide-area platform
  • Additional experiments with NICE and HyperCast
  • Run several sets of experiments with nodes in the United States, Europe, and Asia

12

Overlay Tree Structure: Simulations

• Topologies
  • 4k routers connected in ways consistent with router-level power-law and small-world properties
  • GT-ITM topology with 4k routers
  • Delays and bandwidths according to realistic distributions
• Overlay multicast algorithms
  • ESM (End System Multicast) [SIGCOMM 2001]
    • A host has an upper degree bound (we use 6) on the number of its neighbors
  • TAG (Topology-Aware Grouping) [extended NOSSDAV 2002]
    • Uses ulimit=6 and bwthresh=100 kbps for partial path matching
  • MDDBST (Minimum Diameter Degree-Bounded Spanning Tree) [NOSSDAV 2001, INFOCOM 2003]
    • Minimizes the number of hops in the longest path, and bounds the degree of hosts in overlay trees (degree bound = edge bw/min bw)

13

Results: Number of Hops

• Uniform host distribution
• Non-uniform host distribution

MDDBST results are not as clear as ESM's, because MDDBST minimizes the maximum cost

14

Results: Isolation of Topology Effects

• Variability in router degrees

• Clustering (small world)

15

Results: Degree

• Router degree versus overlay tree level of destination host

• Frequency of router degree

16

Results: Latency and Bandwidth

• Relative delay penalty (RDP)

ESM achieves a good balance, but scalability is a concern

• Mean bottleneck bandwidth

17

Overlay Multicast Tree Cost

• Network Model
  • L_O(h,k,n) denotes the overlay cost for an overlay O, where n is the number of hosts
  • We only count hops in router subsequences
  • We use n instead of m
• Why an underlying tree model?
  • Simple analysis
  • Consistency with real topologies [Radoslavov00]
  • Transformation from a graph to a k-ary tree with minimum cost tree
• Why least cost tree?
  • Modeling and analysis are simplified
  • Many overlay multicast algorithms optimize a delay-related metric, which is typically also optimized by underlying intra-domain routing protocols
  • A lower bound on the overlay tree cost can be computed

[Figure: tree rooted at the source, labeled with h and k, with hosts and receivers]

18

Network Models with Unary Nodes

Self-similar Tree Model (k=2, θ=1, h=3)

• To incorporate the number-of-hops distribution, use a self-similar tree model [SODA2002]
• Unary node: a node with only one child
• Number of unary nodes created between adjacent nodes at levels i-1 and i: $k^{(h-i)\theta} - 1$

[Figure: self-similar tree with branching nodes and unary nodes]
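As a rough numeric check of the self-similar tree model, the sketch below assumes that the number of unary nodes between adjacent branching nodes at levels i-1 and i is k^((h-i)θ) - 1, so that the link between those levels spans k^((h-i)θ) hops. The function names are illustrative only, and the formula is a reading of the slide rather than a verified statement of the model.

```python
def unary_nodes(k, theta, h, i):
    """Unary nodes between adjacent branching nodes at levels i-1 and i.

    Assumption: the count is k**((h - i) * theta) - 1.
    """
    return k ** ((h - i) * theta) - 1

def source_to_leaf_hops(k, theta, h):
    """Hops from the source (level 0) to a leaf (level h).

    Each level contributes its unary nodes plus one branching node.
    """
    return sum(unary_nodes(k, theta, h, i) + 1 for i in range(1, h + 1))

# Parameters from the slide title: k=2, theta=1, h=3.
for level in range(1, 4):
    print(level, unary_nodes(2, 1, 3, level))  # level 1 -> 3, level 2 -> 1, level 3 -> 0
print(source_to_leaf_hops(2, 1, 3))            # 4 + 2 + 1 = 7
```

Under this reading, links near the source are long and links near the leaves are short, which matches the observation that hop counts between parent and child hosts shrink at lower tree levels.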

19

Receivers at Leaf Nodes

[Figure, panels (a) and (b): k-ary tree rooted at the source with receivers at leaf nodes; labels include Source, Receiver, Overlay link, Level l, k, h, and α]

20

Receivers at Leaf Nodes

• The overlay cost L_o(h,k,n) is the sum of the overlay cost in (a) and the overlay cost in (b)
• The overlay cost in (b) is expressed through a function g(l) that is defined piecewise
• n^(1-θ) is observed

[Equations: overlay cost in (a), overlay cost in (b), and their sum L_o(h,k,n)]

21

Receivers at Leaf Nodes

$R_o(h,k,n) = \dfrac{L_o(h,k,n)}{U_o(h,k)}$, where $U_o(h,k) = \sum_{i=1}^{h} k^{(h-i)\theta}$

[Plot: θ = 0.15]

22

Receivers at Leaf or Non-leaf Nodes

$M = \dfrac{k^{h+1} - k}{k - 1}$, $p = 1 - \left(1 - \dfrac{1}{M}\right)^{n}$

[Figure, panels (a) and (b): tree of height h whose branches are labeled kp and k(1-p), with subtree costs L(h-1,k,n), L(h-2,k,n), and L(h-3,k,n); panel (b) marks level l, regions (A) and (B), and T(l); other labels include α and β]

23

Receivers at Leaf or Non-leaf Nodes

The overlay cost in (a): $k p \left( k^{(h-1)\theta} + L(h-1,k,n) \right)$

The overlay cost in (b): [equation in terms of B(·), p, and L(·,k,n)]

The sum of (a) and (b): L(h,k,n)

24

Receivers at Leaf or Non-leaf Nodes

$R(h,k,n) = \dfrac{L(h,k,n)}{U(h,k)}$, where $U(h,k) = \dfrac{1}{M} \sum_{l=1}^{h} k^{l} \sum_{i=1}^{l} k^{(h-i)\theta}$

[Plot: θ = 0.15]

25

Cost Model Validation

• The analytical results are validated using traceroute-based simulation topologies and our earlier topologies

• Normalized overlay cost via simulations

• ESM and MDDBST have n^0.8 to n^0.9; TAG has a slightly higher cost due to partial path matching

• Cost with GT-ITM/uniform hosts is higher than with non-uniform/power-law/small-world

• The normalized overlay tree cost for the real ESM tree is n^0.945

26

Related Work

• Chuang and Sirbu (1998) found that the ratio between the total number of multicast links and the average unicast path length exhibits a power law (m^0.8)

• Chalmers and Almeroth (2001) found the ratio to be around m^0.7 and that multicast trees have a high frequency of unary nodes

• Phillips et al. (1999), Adjih et al. (2002), and Van Mieghem et al. (2001) mathematically model the efficiency of IP multicast

• Radoslavov et al. (2000) characterized real and generated topologies with respect to neighborhood size growth, robustness, and increase in path lengths due to link failure. They analyzed the impact of topology on heuristic overlay multicast strategies

• Jin and Bestavros (2002) have shown that both Internet AS-level and router-level graphs exhibit small-world behavior. They also outlined how small-world behavior affects the overlay multicast tree size

• Overlay multicast algorithms include End System Multicast (2000,2001), CAN-based multicast (2002), MDDBST (2001,2003), TAG (2001), etc.

27

Conclusions

• We have investigated the efficiency of overlay multicast using theoretical models, experimental data, and simulations. We find that:
  • The number of routers/delay between parent and child hosts tends to decrease as the level of the host in the ESM/TAG overlay tree increases → lower cost
  • Routing features in overlay multicast protocols, non-uniform host distribution, along with power-law and small-world topology characteristics contribute to these phenomena
  • We can quantify potential bandwidth savings of overlay multicast compared to unicast (n^0.9 < n) and the bandwidth penalty of overlay multicast compared to IP multicast (n^0.9 > n^0.8)
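To make the scaling comparison concrete, the sketch below evaluates the three growth rates for a few group sizes. It assumes a proportionality constant of 1 for every scheme, which the slides do not state, so only the relative ordering is meaningful; the function name is illustrative.

```python
def normalized_costs(n):
    """Normalized tree cost for n receivers under each scheme.

    Assumption: unit proportionality constants, so only ratios matter.
    """
    return {
        "unicast": float(n),
        "overlay_multicast": n ** 0.9,
        "ip_multicast": n ** 0.8,
    }

for n in (10, 100, 1000):
    costs = normalized_costs(n)
    print(n, {name: round(value, 1) for name, value in costs.items()})
```

For n = 1000 this gives roughly 501 for overlay multicast and 251 for IP multicast against 1000 for unicast: about half the unicast cost, and about twice the IP multicast cost.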

28

Ongoing Work

• We are conducting larger scale simulations and experiments using PlanetLab

• We are examining other and more dynamic metrics with other overlay protocols

• We will precisely formulate the relationship between the overlay trees, overlay protocols and Internet topology characteristics

• We are investigating the possibility of inter-overlay cooperation to further reduce the overlay performance penalty

29

Other Work…

• Exploiting network tomography for monitoring and traffic engineering
  • FlowMate on-line passive flow clustering: design and implementation [ICNP 2002, ToN]
  • Distributed network delay and loss monitoring [CC 2003]

• Testing security mechanisms [Computers&Security 2003, CACM 2004]

• Sensor networks [INFOCOM 2004, IWQoS 2004]

30

Overlays A and B may or may not cooperate.


Cooperative Overlays

Overlay A

Overlay B

Co-located nodes

Shared routers and links

31

A Spectrum of Overlay Cooperation

[Figure: spectrum from less cooperation to more cooperation]

• Independent overlays
• Sharing information (e.g., control info, queries)
• Shared measurement
• Cooperative forwarding
• Inter-overlay traffic engineering
• Merged overlays

32

Cooperative Forwarding

Overlay B

Overlay A

Route X

Route Y

• Route Y is better than route X, which uses only hosts in overlay A. Cooperation can be proactive or reactive for long-lived flows.

33

Cooperation Mechanisms

• Privilege levels

• Full privileges and obligations: a host (active member) is authorized to use all the services provided by its home overlay network(s).

• Limited privileges and obligations: a host (passive member) has limited capabilities such as routing and replication.

• Each overlay selects a set of nodes that other overlays can exploit as passive members (transit nodes) according to peering relationships.

• Inter-overlay agents selected according to:

  • Number of co-located overlay nodes
  • Number of different overlays represented in neighbor nodes
  • Minimum maximum delay to other hosts in home overlay

• Passive membership selected according to:
  • Performance improvement of other overlays, e.g., number of members that have this passive member as their next hop (maximum-next-hop)
  • Compatibility and loads of other overlays
  • Trust-based priority to determine which overlays are cooperative

34

Additional Cooperative Services

• Shared measurement service

• Control information sharing (e.g., randomized routing time intervals and traffic equilibrium computation for multiple overlays)

• Query forwarding in peer-to-peer networks

• Inter-overlay traffic engineering

35

Related Work

• Overlay broadcasting (Y. Chawathe et al.)

• Studies possibility of cooperation among overlays.

• Routing underlay (A. Nakao et al.)

• Provides shared network layer information to overlay nodes.

• Tomography-based overlay network monitoring (Y. Chen et al.)

• Requires O(n) measurements for all the O(n^2) overlay paths.

• Selfish source and overlay routing (L. Qiu et al.)

• Other overlay networks

• Include RON, Detour, End System Multicast, etc.

36

Current Plan

• We are designing a cooperative overlay architecture for heterogeneous overlay networks to collaborate.

• Our goal is to prove that overlay cooperation reduces competition, improves overall performance, and preserves heterogeneity [ICNP2003 poster].

• Ongoing Work

• We are currently implementing our algorithms on PlanetLab.

• We will examine other types of overlay cooperation services with particular attention to the complexity, scalability, and security issues.

37

Is “On-line” Tomography Useful and at What Time Scale?

What is “tomography”? A method of producing (inferring) an image of the internal structures of a solid object by the observation and recording of the differences in the effects on the passage of waves of energy impinging on those structures.

What is “network tomography”? Internet mapping (routes, per-segment delays, per-segment losses, per-segment bandwidth, shared bottlenecks) via composing end-to-end measurements.

38

Why FlowMate?

[Figure: one source sending flows to several receivers]

39

Why FlowMate?

• Partitioning flows emerging from the same source (busy server) according to shared bottlenecks is useful for:
  • Customized, fairer, and more responsive coordinated congestion management
  • Overlay networks (e.g., application-layer multicast and peer-to-peer applications)
  • Load balancing
  • Pricing
  • Traffic engineering and admission control

40

The Problem

• Input:
  • A set of flows (micro or macro), F, originating at the same source, where F = {f_1, f_2, …, f_n}
• Required:
  • Periodically map each flow f_i (1 ≤ i ≤ n) to a cluster g_j (1 ≤ j ≤ m), G = {g_1, g_2, …, g_m}, m ≤ n, where all flows f ∈ g_j ∈ G share a common bottleneck

41

FlowMate Features

• Employs passive probing to reduce probe generation and processing overhead, and network load with a large number of flows.

• Employs on-line clustering based on constantly changing shared bottlenecks.

• Works with or without receiver timestamp support (and no router support).

• Reduces overhead using representatives.

• Uses limited history for stability (no samples).

42

Architecture

Transport layer implementation enables more accurate timestamping

43

Basic Algorithm [O(NG)]

Initialize: Empty cluster list and flow table.
Repeat forever:
- Collect delay information.
- Check triggering condition.
- If (triggered): cluster flows and generate clusters.
- Delete delay samples and maintain compact history information.
Partitioning:
- Select delay samples.
- Assign a representative flow for each cluster.
- Each flow is tested against each representative, and joins the cluster with highest correlation.
- A flow either joins a cluster or forms a new one.
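The partitioning step can be sketched as follows. This is a simplified illustration, not FlowMate's implementation: it substitutes a plain Pearson correlation of delay samples for the shared bottleneck test, and the 0.5 threshold, the function names, and the sample data are all assumptions. It does keep the O(NG) shape, since each of the N flows is compared only against the G cluster representatives.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two delay sequences, truncated to equal length."""
    n = min(len(xs), len(ys))
    xs, ys = xs[:n], ys[:n]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no correlation signal
    return cov / sqrt(var_x * var_y)

def cluster_flows(delays, threshold=0.5):
    """Partition flows; the first flow placed in a cluster is its representative.

    delays maps a flow id to its list of delay samples. The threshold is a
    hypothetical tuning knob, not a value taken from the slides.
    """
    clusters = []
    for flow, samples in delays.items():
        best, best_corr = None, threshold
        for cluster in clusters:
            corr = pearson(samples, delays[cluster[0]])
            if corr > best_corr:
                best, best_corr = cluster, corr
        if best is None:
            clusters.append([flow])  # no representative correlates well enough
        else:
            best.append(flow)        # join the most correlated cluster
    return clusters

# Made-up delay samples in milliseconds.
samples = {
    "f1": [10, 12, 15, 11, 18],
    "f2": [11, 13, 16, 12, 19],  # tracks f1 closely
    "f3": [20, 14, 12, 19, 11],  # moves against f1
}
print(cluster_flows(samples))    # [['f1', 'f2'], ['f3']]
```

Comparing against representatives instead of against every other flow is what keeps the cost linear in the number of clusters.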

44

Shared Bottleneck Test

For two flows f1 and f2 sharing a common bottleneck [Rubenstein00]:

The cross-correlation measure of multiplexed (f1, f2) packets, spaced apart by time t > 0, is higher than the auto-correlation measure of packets of f1 or f2, spaced apart by time T > t.

[Figure: path of the two flows, with endpoints labeled s and r]

45

In-Band Delay Sampling

• One-way delay (reasonable clock skew OK).
• Extend the time-stamped ACK (RFC 1323) to include packet reception time.
• Select samples according to inter-packet spacing.

[Plot: Performance of FlowMate vs. RTT usage; x-axis: Time (sec), y-axis: % correctness; series: FlowMate, Using RTT]

[Figure: timeline showing the samples chosen as probes]

46

Triggering Clustering

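[Figure: timeline starting at t, the last time clustering was invoked, with marks at d_min and d_max]

• Before d_min has elapsed since t: clustering is not invoked
• Between d_min and d_max: clustering may be invoked if there are enough samples for all flows
• At d_max: clustering must be invoked if it has not been invoked since t
• Every flow with at least M samples is considered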
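The triggering rule can be written as a small predicate. This sketch assumes that d_min and d_max are offsets from the last invocation time t and that "enough samples" means at least M samples per flow; the function and parameter names are illustrative, not taken from FlowMate.

```python
def should_cluster(now, last_run, d_min, d_max, sample_counts, min_samples):
    """Decide whether to invoke clustering at time `now`.

    Assumptions: d_min and d_max are offsets from last_run, and
    sample_counts maps each flow id to its number of delay samples.
    """
    elapsed = now - last_run
    if elapsed < d_min:
        return False  # too soon after the last clustering
    if elapsed >= d_max:
        return True   # clustering must not be postponed any longer
    # In between: cluster only if every flow has enough samples.
    return bool(sample_counts) and all(
        count >= min_samples for count in sample_counts.values()
    )

counts = {"f1": 12, "f2": 9}
print(should_cluster(now=3.0, last_run=0.0, d_min=2.0, d_max=10.0,
                     sample_counts=counts, min_samples=10))  # False: f2 has too few samples
print(should_cluster(now=10.0, last_run=0.0, d_min=2.0, d_max=10.0,
                     sample_counts=counts, min_samples=10))  # True: d_max reached
```

The lower bound limits clustering overhead, while the upper bound keeps clusters from going stale when some flows produce few samples.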

47

Our Accuracy Index

• Sources of inaccuracies: false sharing and cluster splits

• A cluster split is not as harmful as false sharing

• Let k_j denote the resulting number of splits of a correct cluster

• Example: correct: {1,2,3},{4,5,6}; result: {1,2},{3,4,5},{6}; I = 0.67

48

Simulation Configuration

Configuration:

• Cross and reverse traffic: CBR sources

• Forward traffic: FTP, Telnet, or HTTP/1.1

• Background traffic: 3 “StarWars” flows (self-similar traffic)

[Figure: simulation topology with a source, destinations D1 to D12, a cross-traffic generator and cross-traffic destination(s), and nodes numbered 1 to 11; link bandwidths of 1.5, 3, and 10 Mbps, link delays between 1 ms and 22 ms, and the bottleneck links marked]

49

Foreground Load

FlowMate accuracy (using a simpler topology)

[Plots: different loads; staggered start times]

Correlation periods: 1, 2, 4, 6, 8, 10 seconds.

50

Background Load

• Load and on/off periods have little impact on average accuracy

51

Bursty Flows

Telnet traffic HTTP/1.1 traffic

Sampling: flow lifetime (P2P FTPs (elephants), HTTP/1.0 vs. 1.1), packet interleaving patterns, delayed ACKs

52

Router Buffering

Buffer size vs avg index Drop policy

53

Sample Application

• Naïve coordinated congestion management demonstrates better fairness and responsiveness

54

Related Work

• Two-flow correlation tests based on delay or loss of all Poisson probe samples [Rubenstein et al., SIGMETRICS 2000].

• Semi-active Bayesian probing (using shared packet loss correlations) [Harfoush et al., ICNP 2000].

• Shannon or Renyi entropy-based flow clustering [Katabi et al., MIT-TR-2001 and IC3N01].

• Other tomography work, e.g., [AT&T, UMass, BU, Rice, Berkeley].

• Congestion Management schemes, e.g., Congestion Manager (CM) [Balakrishnan et al, SIGCOMM 99], Ensemble, Int, FastStart.

55

Conclusions

• FlowMate is an on-line flow partitioning scheme that does not require active probing. Partitioning is periodically performed at the flow origin for a large set of flows.

• FlowMate appears to be robust under heavy background load and has low overhead.

• High burstiness of flows to be partitioned is the main factor that degrades performance.

• FlowMate can be useful to many applications, such as overlay networks, congestion management, load balancing, and pricing.

• We have integrated FlowMate into Linux v2.4.17 and performed experiments on Emulab and PlanetLab.

56

Distributed Clustering for Sensor Networks

Goals:

Scalability (to thousands of nodes)?

Prolonged network lifetime?

Data and state aggregation?

Robustness in the face of unexpected failures?

Security of sensor communications?

Approach: Clustering

Requirements:

Completely distributed

O(1) iterations to terminate

Low message/processing overhead

High energy, well-spread cluster heads

Balanced clusters

Approaches: HEED (Hybrid, Energy-Efficient, Distributed clustering) and READ (Robust, Energy-Aware Distributed clustering)

Network:

Rectangular field with a large number of dispersed sensor nodes

Sensor nodes:

Location un-aware and quasi-stationary

Homogeneous

Unattended (infeasible to recharge)

Example applications:

Seismic monitoring or field surveillance.

57

Anomaly Detection and Security Testing

• Tomography-based anomaly detection:

1. Infer per-segment delays, losses and traffic properties through tomography among a set of cooperating end hosts

2. Detect attacks, configuration problems, and flash crowds on-line based on inferred properties

• Firewall testing:

1. Develop a vulnerability type versus firewall operation matrix

2. Place Common Vulnerabilities and Exposures (CVE) and other firewall problems in appropriate matrix cells

3. Find clusters in matrix; predict problems; automate firewall testing

[Figure: firewall operations between Packet Ingress and Packet Egress: NAT/PAT, Dynamic Rule Set, Sanity Checks, Port Filtering, Packet Reassembly, Application Level, Routing Decision, NAT/PAT, Address Lookup; a packet or stream may be dropped along the way]
