Presentation by Nanda Kishore Lella (lella.2@wright.edu)

Post on 21-Jan-2016



1

A Measurement Study of Peer-to-Peer File Sharing Systems

by

Stefan Saroiu, P. Krishna Gummadi

Steven D. Gribble

Presentation by

Nanda Kishore Lella (lella.2@wright.edu)

2

Outline

• P2P Overview
  – What is a peer?
  – Example applications
  – Benefits of P2P
• P2P Content Sharing
  – Challenges
  – Group management/data placement approaches
  – Measurement studies
• Conclusion

3

What is Peer-to-Peer (P2P)?

• Most people think of P2P as music sharing

Examples:

• Napster

• Gnutella

4

What is a peer?

• Contrast with the client-server model

• Servers are centrally maintained and administered

• A client typically has fewer resources than a server

5

What is a peer?

• A peer’s resources are similar to the resources of the other participants

• P2P – peers communicating directly with other peers and sharing resources

6

P2P Application Taxonomy

P2P Systems

– Distributed Computing
– File Sharing
– Collaboration
– Platforms (e.g., JXTA)

7

P2P Goals/Benefits

• Cost sharing

• Resource aggregation

• Improved scalability/reliability

• Increased autonomy

• Anonymity/privacy

• Dynamism

• Ad-hoc communication

8

P2P File Sharing

• Content exchange
  – Gnutella
• File systems
  – OceanStore
• Filtering/mining
  – OpenCola

9

Research Areas

• Peer discovery and group management

• Data location and placement

• Reliable and efficient file exchange

• Security/privacy/anonymity/trust

10

Current Research

• Group management and data placement
  – Chord, CAN, Tapestry, Pastry
• Anonymity
  – Publius
• Performance studies
  – Gnutella measurement study, etc.

11

Management/Placement Challenges

• Per-node state

• Bandwidth usage

• Search time

• Fault tolerance/resiliency

12

Approaches

• Centralized

• Flooding

• Document Routing

13

Centralized

• Napster model
• Benefits:
  – Efficient search
  – Limited bandwidth usage
  – Efficient network handling
• Drawbacks:
  – Central point of failure
  – Limited scale

(Figure: peers Bob, Alice, Jane, and Judy connected through a central index server)
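The centralized model above can be sketched in a few lines; the class and peer addresses below are hypothetical, not Napster's actual protocol:

```python
# A minimal sketch of a Napster-style central index. The server maps file
# names to the peers that share them; search is a single dictionary lookup,
# which is why it is efficient -- and why the index is a single point of
# failure.
from collections import defaultdict

class CentralIndex:
    def __init__(self):
        self.files = defaultdict(set)  # file name -> set of peer addresses

    def register(self, peer, file_names):
        # A peer announces the files it shares when it connects.
        for name in file_names:
            self.files[name].add(peer)

    def search(self, file_name):
        # O(1) lookup; the download itself happens peer-to-peer.
        return sorted(self.files.get(file_name, set()))

index = CentralIndex()
index.register("alice:6699", ["song.mp3", "demo.mp3"])
index.register("bob:6699", ["song.mp3"])
print(index.search("song.mp3"))  # → ['alice:6699', 'bob:6699']
```

Search is one table lookup, which explains both the efficiency and the drawback: if this one index goes down, no peer can find anything.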

14

Flooding

• Gnutella model
• Benefits:
  – No central point of failure
  – Limited per-node state
• Drawbacks:
  – Slow searches
  – Bandwidth intensive

(Figure: peers Bob, Alice, Jane, Judy, and Carl forwarding queries to one another)
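The flooding model can be sketched as a breadth-first search with a hop budget; the topology, peer names, and file names below are invented for illustration:

```python
# A minimal sketch of Gnutella-style query flooding. Each peer forwards a
# query to all of its neighbors until the TTL runs out; every peer holding
# the file answers.
def flood_search(topology, shared, start, query, ttl):
    """topology: peer -> list of neighbor peers; shared: peer -> set of files."""
    hits, seen = set(), {start}
    frontier = [start]
    while frontier and ttl > 0:
        next_frontier = []
        for peer in frontier:
            for nb in topology[peer]:
                if nb in seen:
                    continue          # duplicate suppression, as in Gnutella
                seen.add(nb)
                if query in shared[nb]:
                    hits.add(nb)
                next_frontier.append(nb)
        frontier = next_frontier
        ttl -= 1                      # every hop decrements the TTL
    return hits

topology = {"Bob": ["Alice", "Carl"], "Alice": ["Bob", "Jane"],
            "Carl": ["Bob"], "Jane": ["Alice", "Judy"], "Judy": ["Jane"]}
shared = {"Bob": set(), "Alice": {"song.mp3"}, "Carl": set(),
          "Jane": set(), "Judy": {"song.mp3"}}
print(flood_search(topology, shared, "Bob", "song.mp3", ttl=2))
# → {'Alice'} (Judy also has the file but sits 3 hops away, beyond the TTL)
```

Each extra TTL hop multiplies the number of messages, which is why flooding is bandwidth-intensive, and a query can still miss content that lies beyond the TTL horizon.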

15

Document Routing

• FreeNet, Chord, CAN, Tapestry, Pastry model
• Benefits:
  – More efficient searching
  – Limited per-node state
• Drawbacks:
  – Limited fault tolerance vs. redundancy

(Figure: nodes with ids 001, 012, 212, 305, 332; a query for id 212 is routed toward the responsible node)

16

Document Routing – CAN

• Associate to each node and each item a unique id in a d-dimensional space
• Goals
  – Scale to hundreds of thousands of nodes
  – Handle rapid arrival and failure of nodes
• Properties
  – Routing table size O(d)
  – Guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes

Slide modified from another presentation
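A sketch of the id-to-point mapping CAN relies on; the zone layout and hash choice below are assumptions for illustration, not CAN's actual implementation:

```python
# Each node owns a rectangular zone of a d-dimensional space; an item's key
# is hashed to a point, and the node whose zone contains that point stores
# the item.
import hashlib

SIZE = 8  # coordinate space is [0, 8) x [0, 8), as on the example slides

def to_point(key, size=SIZE):
    h = hashlib.sha1(key.encode()).digest()
    return (h[0] % size, h[1] % size)   # one hash byte per dimension

# zone = (x_lo, x_hi, y_lo, y_hi), half-open on the high side
zones = {"n1": (0, 4, 0, 8), "n2": (4, 8, 0, 4), "n4": (4, 8, 4, 8)}

def owner(point):
    x, y = point
    for node, (x0, x1, y0, y1) in zones.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return node

p = to_point("song.mp3")
print(p, "is stored by", owner(p))
```

Because the zones tile the whole space, every possible point has exactly one owner, so every item has exactly one responsible node.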

17

CAN Example: Two Dimensional Space

• Space is divided between nodes
• Together, the nodes' zones cover the entire space
• Each node covers either a square or a rectangular area
• Example:
  – Node n1:(1, 2) is the first node to join and covers the entire space

(Figure: 7×7 coordinate grid, entirely owned by n1)


18

CAN Example: Two Dimensional Space

• Node n2:(4, 2) joins; the space is divided between n1 and n2

(Figure: grid divided between n1 and n2)


19

CAN Example: Two Dimensional Space

• Node n3:(3, 5) joins; the space is further divided among n1, n2, and n3

(Figure: grid divided among n1, n2, and n3)


20

CAN Example: Two Dimensional Space

• Nodes n4:(5, 5) and n5:(6, 6) join

(Figure: grid divided among n1, n2, n3, n4, and n5)


21

CAN Example: Two Dimensional Space

• Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
• Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)

(Figure: grid with nodes n1–n5 and items f1–f4 plotted at their coordinates)


22

CAN Example: Two Dimensional Space

• Each item is stored by the node that owns its mapping in the space

(Figure: grid with each item shown inside the zone of the node that stores it)


23

CAN: Query Example

• Each node knows its neighbors in the d-space
• Forward the query to the neighbor that is closest to the query id
• Can route around some failures
  – some failures require local flooding
• Example: assume n1 queries f4

(Figure: coordinate grid with nodes n1–n5 and items f1–f4; the query starts at n1)

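The greedy forwarding in the query example can be sketched as follows; the node coordinates reuse the example slides, but the neighbor lists are invented:

```python
# At each hop the query moves to the neighbor whose coordinates are closest
# to the target point, which yields the O(d * n^(1/d)) average path length.
nodes = {"n1": (1, 2), "n2": (4, 2), "n3": (3, 5), "n4": (5, 5), "n5": (6, 6)}
neighbors = {"n1": ["n2", "n3"], "n2": ["n1", "n4"], "n3": ["n1", "n4"],
             "n4": ["n2", "n3", "n5"], "n5": ["n4"]}

def dist2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def route(start, target):
    """Greedily forward toward the target point; return the path of nodes."""
    path, current = [start], start
    while True:
        best = min(neighbors[current], key=lambda n: dist2(nodes[n], target))
        if dist2(nodes[best], target) >= dist2(nodes[current], target):
            return path            # no neighbor is closer: this node answers
        path.append(best)
        current = best

print(route("n1", (7, 5)))  # query for f4:(7,5) → ['n1', 'n3', 'n4', 'n5']
```

Routing stops when no neighbor improves the distance, which is exactly the node whose zone contains the target point in a well-formed CAN.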

24

CAN: Query Example

(Figure: the query is forwarded to the neighbor closest to f4)


25

CAN: Query Example

(Figure: the query advances another hop toward f4)


26

CAN: Query Example

(Figure: the query reaches the node that owns f4)


27

Node Failure Recovery

• Simple failures
  – know your neighbor's neighbors
  – when a node fails, one of its neighbors takes over its zone
• More complex failure modes
  – simultaneous failure of multiple adjacent nodes
  – scoped flooding to discover neighbors
  – hopefully, a rare event


28

Document Routing – Chord

• MIT project
• Uni-dimensional id space
• Each node keeps track of log N other nodes
• Search through log N nodes to find the desired key

(Figure: Chord ring with nodes N5, N10, N20, N32, N60, N80, N99, N110 and key K19)

29

Document Routing – Chord(2)

• Each node and key is assigned an id
• A node that needs a key searches its table of log N nodes for the key
• If that fails, it goes to the last node in its table and repeats until it finds the key
• Search through log N nodes to find the desired key

(Figure: Chord ring with nodes N5, N10, N20, N32, N60, N80, N99, N110 and key K19, as on the previous slide)
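A lookup in the style described above can be sketched on the ring from the figure (node ids N5–N110, key K19); the code below is an illustrative simplification, not the Chord protocol itself:

```python
# Each node keeps "fingers" at power-of-two offsets around the ring; lookup
# repeatedly jumps to a finger that lies between the current node and the
# key, which takes O(log N) hops.
M = 7                                    # 7-bit id space: ids 0..127
NODES = sorted([5, 10, 20, 32, 60, 80, 99, 110])

def in_interval(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def successor(ident):
    for n in NODES:
        if n >= ident:
            return n
    return NODES[0]                      # wrap around the ring

def fingers(n):
    # finger i points at successor(n + 2^i), for i = 0..M-1
    return [successor((n + 2 ** i) % 2 ** M) for i in range(M)]

def lookup(start, key):
    """Route via fingers; return the node path ending at the key's successor."""
    path, current = [start], start
    while not in_interval(key, current, successor((current + 1) % 2 ** M)):
        for f in reversed(fingers(current)):
            if in_interval(f, current, key):
                current = f
                path.append(f)
                break
    if path[-1] != successor(key):
        path.append(successor(key))
    return path

print(lookup(80, 19))   # key K19 is stored at its successor N20 → [80, 5, 10, 20]
```

The key observation is that each hop roughly halves the remaining distance on the ring, which is where the log N bound comes from.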

30

Doc Routing – Tapestry/Pastry

• Global mesh of meshes
• Suffix-based routing
• Uses underlying network distance in constructing the mesh

(Figure: mesh of nodes with hexadecimal ids 13FE, ABFE, 1290, 239E, 73FE, 9990, F990, 993E, 04FE, 43FE)

31

Naming in Tapestry

• Every node has a 4-digit name, similar to an IP address
• Each digit in the name can take one of 16 values (hexadecimal)
• The keys present at a node are in accordance with the node name

(Figure: mesh of nodes with hexadecimal ids 13FE, ABFE, 1290, 239E, 73FE, 9990, F990, 993E, 04FE, 43FE)

32

Tapestry Routing

(Figure: a message addressed to 4598 is routed through nodes B437, 6789, B4F8, 9098, and 7598, matching one more suffix digit of the destination at each hop)
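Suffix-based routing can be sketched as follows; the routing step below scans a flat node list for simplicity, where a real Tapestry/Pastry node would consult a per-digit routing table:

```python
# At each hop the message moves to a node sharing one more trailing digit
# with the destination, so a 4-digit id needs at most 4 hops.
def shared_suffix_len(a, b):
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def route(nodes, source, dest):
    """Route from source to dest, growing the matched suffix each hop."""
    path, current = [source], source
    while current != dest:
        need = shared_suffix_len(current, dest) + 1
        # pick any known node whose suffix match with dest is at least one
        # digit longer than the current node's
        current = next(n for n in nodes if shared_suffix_len(n, dest) >= need)
        path.append(current)
    return path

nodes = ["B437", "6789", "B4F8", "9098", "7598", "4598"]
print(route(nodes, "B437", "4598"))
# → ['B437', 'B4F8', '9098', '7598', '4598']
```

Each hop fixes one more digit (…8, …98, …598, 4598), which is why the hop count is bounded by the number of digits in the id.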

33

Remaining Problems?

• Hard to handle highly dynamic environments

• Methods don’t consider peer characteristics

34

Measurement Studies

Gnutella vs. Napster

35

(Figure, left — Napster: peers (P) connect to central napster.com index servers (S); a query (Q) goes to a server, the response (R) comes back, and the file download (D) takes place directly between peers.
Figure, right — Gnutella: peers (P) flood queries (Q) to their neighbors; responses (R) travel back and the download (D) is again peer-to-peer.
Legend: S = server, P = peer, Q = query, R = response, D = file download)

36

Methodology

2 stages:
1. Periodically crawl Gnutella/Napster
   • discover peers and their metadata
2. Feed the output from the crawl into measurement tools:
   • bottleneck bandwidth – SProbe
   • latency – SProbe
   • peer availability – LF
   • degree of content sharing – Napster crawler

37

Crawling

• May 2001

• Napster crawl
  – query the index server and keep track of the results
  – query about the returned peers
  – doesn't capture users sharing unpopular content
• Gnutella crawl
  – send out ping messages with a large TTL

38

39

Measurement Study

• How many peers are server-like…client-like?

• Bandwidth, latency

• Connectivity

• Who is sharing what?

40

41

Graph results

• CDF: cumulative distribution function
• From this graph we see that 78% of the participating peers have downstream bottleneck bandwidths of at least 1,000 Kbps
• Only 8% of the peers have upstream bottleneck bandwidths of at least 10 Mbps
• 22% of the participating peers have upstream bottleneck bandwidths of 100 Kbps or less

42

43

Reported Bandwidth

44

Graph results

• The percentage of Napster users connected by modems (64 Kbps or less) is about 25%, while the percentage of Gnutella users with similar connectivity is as low as 8%
• 50% of Napster users and 60% of Gnutella users have broadband connections
• Only about 20% of Napster users and 30% of Gnutella users have very high bandwidth connections
• Overall, Gnutella users on average tend to have higher downstream bottleneck bandwidths than Napster users

45

46

Graph results

• Approximately 20% of the peers have latencies of at least 280 ms
• Another 20% have latencies of at most 70 ms

47

48

Graph results

• This graph shows two clusters: a smaller one at (20–60 Kbps, 100–1,000 ms) and a larger one above (1,000 Kbps, 60–300 ms)
• The horizontal bands in the graph indicate that measured latency also depends on the peer's location relative to the measurement system

49

Measured Uptime

50

51

Number of Shared Files

52

53

Correlation of Free-Riding with Bandwidth

(Figure: CDF of number of downloads per reported bandwidth (Napster) — percentage of hosts vs. number of downloads (0, 1, 10, 100) for four classes: Unknown, Modem + ISDN, Dual ISDN + Cable + DSL, T1 + T3)

(Figure: CDF of number of uploads per reported bandwidth (Napster) — percentage of hosts vs. number of uploads for the same four bandwidth classes)

54

55

56

(Figure: CDFs of downstream bottleneck bandwidths for all Napster users and for users who reported unknown bandwidths — percentage of hosts vs. measured (SProbe) downstream bottleneck bandwidth, 1 to 100,000 Kbps)

57

Power law

• A connected cluster of peers that spans the entire network survives even in the presence of a large percentage p of random peer breakdowns; p can be as large as p_c = 1 − 1/(κ − 1), where κ = ⟨k²⟩/⟨k⟩ is taken over the power-law degree distribution with minimum node degree m, maximum node degree K, and exponent α < 3
• For α < 3, κ grows with K, so p_c approaches 1 in large networks
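The resilience claim can be checked numerically. The code below assumes the Cohen et al. random-breakdown threshold p_c = 1 − 1/(κ − 1) with κ = ⟨k²⟩/⟨k⟩, which appears to be the formula the slide's lost image showed:

```python
# For a power-law degree distribution P(k) ~ k^(-alpha) truncated to [m, K],
# kappa = <k^2>/<k> grows with K when alpha < 3, so the critical breakdown
# fraction p_c approaches 1: the spanning cluster survives almost any
# random failure rate.
def critical_fraction(alpha, m, K):
    ks = range(m, K + 1)
    weights = [k ** -alpha for k in ks]          # unnormalized P(k)
    total = sum(weights)
    mean = sum(k * w for k, w in zip(ks, weights)) / total
    mean_sq = sum(k * k * w for k, w in zip(ks, weights)) / total
    kappa = mean_sq / mean
    return 1 - 1 / (kappa - 1)

# p_c rises toward 1 as the maximum degree K grows (alpha = 2.3 < 3)
for K in (100, 1000, 10000):
    print(K, round(critical_fraction(2.3, 1, K), 4))
```

The alpha, m, and K values here are illustrative; the qualitative point is the monotone rise of p_c with K when α < 3.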

58

59

Gnutella

Fri Feb 16 05:21:52–05:23:22 PST
1771 hosts

Popular sites:

• 212.239.171.174

• adams-00-305a.Stanford.EDU

• 0.0.0.0

60

30% random failures

1771 – 471 – 294 hosts
Fri Feb 16 05:21:52–05:23:22 PST

61

4% orchestrated failures

Fri Feb 16 05:21:52–05:23:22 PST
1771 – 63 hosts

62

Results Overview

• Lots of heterogeneity between peers
  – Systems should consider peer capabilities
• Peers lie
  – Systems must be able to verify reported peer capabilities or measure true capabilities

63

Points of Discussion

• Is it all hype?

• Should P2P be a research area?

• Do P2P applications/systems have common research questions?

• What are the “killer apps” for P2P systems?

64

Conclusion

• P2P is an interesting and useful model

• There are lots of technical challenges to be solved
