

Page 1: Analyzing Peer-to-Peer Traffic Across Large Networks Jia Wang Joint work with Subhabrata Sen AT&T Labs - Research

Analyzing Peer-to-Peer Traffic Across Large Networks

Jia Wang

Joint work with Subhabrata Sen

AT&T Labs - Research


P2P applications

Distributed file sharing
  Napster, Gnutella, FastTrack, eDonkey, DirectConnect…
  Searching vs. data fetching phases
  All communications occur over default ports
  SuperNodes and Hubs
Why is this interesting?
  Large and growing traffic volume


Outline

Methodology
  Data collection
  Characterization metrics
Analysis results
  Traffic volume and overlay topology
  System dynamics
  Traffic characterization
P2P vs Web


Methodology

Challenges
  Decentralized system
  Transient peer membership
  Some popular closed proprietary protocols
Large-scale passive measurement
  Flow-level data from routers across a large tier-1 ISP backbone
  Analyze both signaling and data fetching traffic
  3 levels of granularity: IP, prefix, AS
P2P protocols and their default ports
  FastTrack: 1214 (including Morpheus)
  Gnutella: 6346/6347
  DirectConnect: 411/412
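A minimal sketch of how flow records might be labeled by default port, as the methodology describes. The port numbers come from the slide; the function name `classify_flow`, the dictionary layout, and the choice to match on either the source or the destination port are assumptions made for illustration, not details given in the talk.

```python
from typing import Optional

# Default ports taken from the slide; the table layout is an assumption.
P2P_PORTS = {
    1214: "FastTrack",      # includes Morpheus, per the slide
    6346: "Gnutella",
    6347: "Gnutella",
    411: "DirectConnect",
    412: "DirectConnect",
}


def classify_flow(src_port: int, dst_port: int) -> Optional[str]:
    """Return the P2P system name if either endpoint uses a known default port.

    Matching on either endpoint is an assumption; the slides do not say
    which side of the flow was checked.
    """
    for port in (src_port, dst_port):
        name = P2P_PORTS.get(port)
        if name is not None:
            return name
    return None  # not recognized as P2P under the port heuristic
```

For example, `classify_flow(1214, 40000)` labels the flow as FastTrack, while a flow between ports 80 and 50000 is left unlabeled.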


Methodology: Discussion

Advantages
  Requires minimal knowledge of P2P protocols: port number
  Large-scale non-intrusive measurement
  More complete view of P2P traffic
  Allows localized analysis
Limitations
  Flow-level data: no application-level details
  Incomplete traffic flows
Other issues
  DHCP, NAT, proxy
  Host-to-IP mapping
  Asymmetric IP routing


Measurements

Characterization
  Overlay network topology
  Traffic distribution
  Dynamic behavior
Metrics
  Host distribution
  Host connectivity
  Traffic volume
  Mean bandwidth usage
  Traffic pattern over time
  Connection duration and on-time


Data cleaning

Invalid IPs
  10.0.0.0-10.255.255.255
  172.16.0.0-172.31.255.255
  192.168.0.0-192.168.255.255
No matched prefixes in routing tables
Invalid AS numbers
  > 64512
Removed 4% of flows
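The cleaning rules above can be sketched as a single validity check per flow endpoint. This is an illustrative sketch, not the authors' code: the function name `is_valid_endpoint`, its arguments, and the idea that a routing-table lookup has already produced `asn` and `has_prefix` are all assumptions. The three address ranges are written in CIDR form, and the AS threshold is applied exactly as written on the slide.

```python
import ipaddress

# The three address ranges from the slide, expressed as CIDR blocks.
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),       # 10.0.0.0-10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),    # 172.16.0.0-172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),   # 192.168.0.0-192.168.255.255
]
AS_LIMIT = 64512  # slide: AS numbers "> 64512" are treated as invalid


def is_valid_endpoint(ip: str, asn: int, has_prefix: bool) -> bool:
    """Apply the slide's three cleaning rules to one flow endpoint.

    `asn` and `has_prefix` are assumed to come from a routing-table
    lookup done elsewhere (hypothetical inputs, not from the talk).
    """
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in PRIVATE_NETS):
        return False          # invalid (private) IP
    if not has_prefix:
        return False          # no matched prefix in the routing tables
    if asn > AS_LIMIT:
        return False          # invalid AS number
    return True
```

A flow would then be kept only if both its source and destination pass the check.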


Overview of P2P traffic

Total 800 million flow records
FastTrack is the most popular one

Date (2001)                9/10-9/15   10/9-10/13   12/10-12/16
# flows                    111M        184M         341M
# IPs                      3.4M        4.5M         5.9M
# IPs / day                1M          1.5M         1.9M
Total traffic (GB/day)     773         1153         1776
Traffic per IP (MB/day)    1.6         1.6          1.8


Host distribution


Host connectivity

Connectivity is very small for most hosts, very high for few hosts

Distribution is less skewed at prefix and AS levels

FastTrack (9/14/2001)
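As a rough illustration of the connectivity metric, the sketch below counts, for each entity, how many distinct other entities it exchanged flows with. The slides do not give a formal definition, so this reading of "connectivity", the function name `connectivity`, and the `key` parameter used to aggregate IPs to a coarser level are assumptions. The `/24` key in the usage note is only a stand-in; the study mapped IPs to routing-table prefixes and ASes.

```python
from collections import defaultdict


def connectivity(flows, key=lambda ip: ip):
    """Map each entity to the number of distinct entities it talked to.

    flows: iterable of (src_ip, dst_ip) pairs.
    key:   maps an IP to its aggregation level (identity = per-IP).
    """
    peers = defaultdict(set)
    for src, dst in flows:
        a, b = key(src), key(dst)
        if a == b:
            continue  # traffic inside one entity adds no external peer
        peers[a].add(b)
        peers[b].add(a)
    return {entity: len(others) for entity, others in peers.items()}
```

Calling `connectivity(flows, key=lambda ip: ip.rsplit(".", 1)[0])` groups hosts by their first three octets, which shows how aggregation merges peers and flattens the distribution.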


Traffic volume distribution

Significant skews in traffic volume across granularities

Few entities source most of the traffic

Few entities receive most of the traffic

FastTrack (9/14/2001)
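One simple way to quantify this kind of skew is the share of total volume contributed by the top fraction of entities. The sketch below is illustrative only; the function name `top_share` and the choice to round the cutoff up to at least one entity are assumptions, not the authors' procedure.

```python
import math


def top_share(volumes, fraction):
    """Share of total volume contributed by the top `fraction` of entities.

    volumes:  per-entity traffic volumes (any consistent unit).
    fraction: e.g. 0.01 for the top 1% of entities.
    """
    if not volumes:
        return 0.0
    ranked = sorted(volumes, reverse=True)
    # Always include at least one entity so tiny fractions stay defined.
    k = max(1, math.ceil(fraction * len(ranked)))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0
```

Running this separately on bytes sourced and bytes received, and at the IP, prefix, and AS levels, would give the comparison across granularities that the slide describes.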


Mean bandwidth usage

Upstream usage < downstream usage. Possible causes:
  Asymmetric available bandwidth, e.g., DSL, cable
  Users/ISPs rate-limiting upstream data transfers

FastTrack (9/14/2001)


Time of day effect

Traffic volume exhibits a very strong time-of-day effect

Milder time-of-day variation for # hosts in the system

FastTrack (9/14/2001 GMT)
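A time-of-day profile like the one plotted can be approximated by binning flows by hour and tracking both bytes and distinct hosts per bin. This is a sketch under stated assumptions: the record layout `(start_epoch_seconds, host, byte_count)`, the function name `hourly_profile`, and attributing each flow entirely to the hour in which it starts are all illustrative choices, not taken from the talk.

```python
from collections import defaultdict


def hourly_profile(flows):
    """Return (bytes per hour, distinct hosts per hour) for hours 0-23 GMT.

    flows: iterable of (start_epoch_seconds, host, byte_count).
    Each flow is attributed to the hour in which it starts (an assumption).
    """
    volume = defaultdict(int)
    hosts = defaultdict(set)
    for ts, host, nbytes in flows:
        hour = int(ts // 3600) % 24   # epoch seconds are already in GMT
        volume[hour] += nbytes
        hosts[hour].add(host)
    return (
        {h: volume[h] for h in range(24)},
        {h: len(hosts[h]) for h in range(24)},
    )
```

Comparing the two returned series side by side is one way to see whether volume varies more strongly over the day than the host count does.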


Host connection duration & on-time

Substantial transience: most hosts stay in the system for a short time

Distribution less skewed at the prefix and AS levels

Using per-cluster or per-AS indexing/caching nodes may help

FastTrack (9/14/2001) thd=30min
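The caption's "thd=30min" suggests an inactivity threshold used when computing on-time. The sketch below shows one plausible reading: flows from the same host separated by a gap of at most the threshold are merged into one session, and on-time is the sum of session lengths. The exact definition is not given in the slides, so this rule, the function name `on_time`, and the `(start, end)` interval format are assumptions.

```python
def on_time(intervals, threshold=30 * 60):
    """Total on-time in seconds for one host.

    intervals: list of (start, end) flow times in seconds.
    Flows separated by a gap of at most `threshold` seconds are merged
    into a single session; longer gaps start a new session.
    """
    total = 0
    session_start = session_end = None
    for start, end in sorted(intervals):
        if session_start is None:
            session_start, session_end = start, end
        elif start - session_end <= threshold:
            session_end = max(session_end, end)   # extend current session
        else:
            total += session_end - session_start  # close the old session
            session_start, session_end = start, end
    if session_start is not None:
        total += session_end - session_start
    return total
```

Under this reading, idle gaps shorter than the threshold count toward on-time, so a larger threshold yields longer on-times for the same flow data.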


Traffic characterization

The power law
  May not be a suitable model for P2P traffic
Relationship between metrics
  Traffic volume
  Number of IPs
  On-time
  Mean bandwidth usage


Traffic volume vs. on-time

1. Volume heavy hitters tend to have long on-times

2. Hosts with short on-times contribute small traffic volumes

FastTrack (9/14/2001): top 1% hosts (73% volume)


Connectivity vs. on-time

FastTrack (9/14/2001): top 1% hosts (73% volume)

1. Hosts with high connectivity have long on-times

2. Hosts with short on-times communicate with few other hosts


P2P vs Web

Observations
  97% of prefixes contributing P2P traffic also contribute Web traffic
  Heavy hitter prefixes for P2P traffic tend to be heavy hitters for Web traffic
Prefix stability: the daily traffic volume (in %) from the prefix does not change over days
Experiments: the top 0.01%, 0.1%, 1%, 10% heavy hitters => 10%, 30%, 50%, 90% of the traffic volume


Traffic stability

March 2002

Panels: top 0.01% prefixes, top 1% prefixes

P2P traffic contributed by the top heavy hitter prefixes is more stable than either Web or total traffic


Summary

Measure and characterize P2P traffic across a large network
  Three popular P2P systems
  Significant increase in both number of users and traffic volume
  Traffic distributions are highly skewed
  High-level system dynamics
  P2P is a significant but stable component of Internet traffic


Acknowledgement

AT&T Labs
  Matt Grossglauser, Carsten Lund, Jennifer Rexford, Matt Roughan, Fred True
External
  Steve Gribble