Recent Problems in Peer-to-peer Content Retrieval

NeXtworking’03, June 23-25, 2003, Chania, Crete, Greece
The First COST-IST(EU)-NSF(USA) Workshop on EXCHANGES & TRENDS IN NETWORKING

Brian Neil Levine
Dept. of Computer Science, UMass Amherst

The work by BNL and his students presented here was supported in part by National Science Foundation awards ANI-033055 and EIA-0080199.

Page 1: Recent Problems in  Peer-to-peer Content Retrieval

Recent Problems in Peer-to-peer Content Retrieval

Brian Neil Levine

Dept. of Computer Science

UMass Amherst

The work by BNL and his students presented here was supported in part by National Science Foundation awards ANI-033055 and EIA-0080199.

Page 2: Recent Problems in  Peer-to-peer Content Retrieval

Motivation

• Peer-to-peer content sharing is one of the largest portions of traffic on the network.

• Illegal (gnutella, kazaa) or not (Apple iTunes), understanding the characteristics of such traffic is important to a well-performing Internet.

• This talk:
– What’s being done in p2p content & retrieval.
– Overview of research in p2p traffic measurement.
– How such measurements can affect p2p design.

Page 3: Recent Problems in  Peer-to-peer Content Retrieval

What is a p2p architecture?

[Figure: the design space of architectures. Axis labels: “Resources out of your pocket to make it work (= money)”, “Peers required to make it work”, and “Chance you’ll be held accountable”. Other labels: Little, Lots, Many; Centralized, Distributed, P2P, Robust P2P; successful, unsuccessful, over-budgeted, “robust, fault-tolerant”.]

Page 4: Recent Problems in  Peer-to-peer Content Retrieval

Overview of P2P research problems

• Content search
– P2P designs are not one-size-fits-all.
– Different applications require different solutions.
• Peer selection
– Finding the best peer of many serving a file…
• Incentives for peers to participate
• Security and privacy
• Evaluation against measurement traces
– What does real p2p traffic look like?
– What’s the real performance of these protocols?

Page 5: Recent Problems in  Peer-to-peer Content Retrieval

Circular Pegs, square holes…

• DHTs work great when:
– Each node is associated with a unique keyword (e.g., SOS).
– The keywords stored are well known.
    • e.g., DNS lookup using a DHT
– Hashes of keywords ensure work is evenly distributed.
• Libraries of content?
• Real measurements show:
– Nodes store more than one file, and each file brings at least one keyword.
    • h(“The Red Hot Chili Peppers”, “Breaking the girl”)
– Content search is difficult: index each term? Or index the whole title? Or part?
    • h(“red”), h(“hot”), h(“chili”), …
    • h(“let”), h(“there”), h(“be”), h(“light”), …
– Some stored keywords are more popular than others.
– Some queried keywords are more popular than others.
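
The choice between indexing the whole title and indexing each term can be made concrete with a minimal sketch. It assumes SHA-1 as the hash h and whitespace tokenization; the slides specify neither, and the function names are hypothetical.

```python
import hashlib

def h(*parts: str) -> str:
    """Hash one or more strings into a single hex key (SHA-1 is an assumption)."""
    joined = "\x00".join(p.lower() for p in parts)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()

def title_key(artist: str, title: str) -> set:
    """Index the whole (artist, title) pair under exactly one key."""
    return {h(artist, title)}

def term_keys(artist: str, title: str) -> set:
    """Index every distinct whitespace-separated term under its own key."""
    terms = (artist + " " + title).lower().split()
    return {h(t) for t in terms}

whole = title_key("The Red Hot Chili Peppers", "Breaking the girl")
per_term = term_keys("The Red Hot Chili Peppers", "Breaking the girl")
print(len(whole), len(per_term))  # one key for the title, seven for its terms
```

Whole-title indexing costs one key per file but only answers exact-match queries. Per-term indexing supports keyword search at the price of several keys per file, and common terms such as "the" collect entries from many files.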

Page 6: Recent Problems in  Peer-to-peer Content Retrieval

How many keys per new user in your app?

[Figure: number of unique keys (y-axis) versus number of files in user library (x-axis). Fitted-curve label as extracted: “806.0 xy”.]

• DNS: 1-2 keys per authoritative domain.

• [Left]: Unique terms in real collections of shared files (based on file names only, not ID3 tags).
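
As a rough illustration of how unique keys accumulate with library size, the sketch below counts distinct terms in a list of file names. The tokenization rule (lowercase, split on non-alphanumerics, drop the extension) and the three file names are assumptions made for the example, not the method or data behind the measurement.

```python
import re

def unique_terms(file_names):
    """Return the set of distinct lowercase terms found in the file names."""
    terms = set()
    for name in file_names:
        stem = name.rsplit(".", 1)[0]  # drop the extension, if any
        terms.update(t for t in re.split(r"[^a-z0-9]+", stem.lower()) if t)
    return terms

# Hypothetical three-file library.
library = [
    "Red Hot Chili Peppers - Breaking the Girl.mp3",
    "Red Hot Chili Peppers - Under the Bridge.mp3",
    "Let There Be Light.mp3",
]
keys = unique_terms(library)
print(len(library), "files ->", len(keys), "unique keys")
```

Repeated terms (here the artist name) are counted once, which is why the number of unique keys grows more slowly than the total number of words across file names.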

Page 7: Recent Problems in  Peer-to-peer Content Retrieval

Cost of indexing files in DHTs

[Figure: percentage of peers contacted to index files (y-axis, 0% to 100%) versus cumulative percentage of peers, ranked (x-axis).]

e.g., in a 100-node network, 40% of the nodes must contact 100% of the peers to index filenames for each join and leave.
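
To see why a peer with a large library ends up contacting most of the network, consider the following sketch. It assumes keys are placed on nodes by SHA-1 modulo the node count, which is a simplification standing in for a real DHT's placement and routing; the function names and the synthetic term lists are hypothetical.

```python
import hashlib

def responsible_node(key: str, num_nodes: int) -> int:
    """Map a key to a node index (modulo placement is a stand-in for a real DHT)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes

def fraction_contacted(terms, num_nodes: int) -> float:
    """Fraction of nodes holding at least one of this peer's keys."""
    nodes = {responsible_node(t, num_nodes) for t in terms}
    return len(nodes) / num_nodes

small = ["red", "hot", "chili"]              # a peer sharing very little
large = [f"term{i}" for i in range(2000)]    # a peer with 2000 unique terms
print(fraction_contacted(small, 100))
print(fraction_contacted(large, 100))
```

Because hashing spreads keys evenly, a peer whose unique terms far outnumber the nodes will touch nearly every node each time it joins or leaves.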

Page 8: Recent Problems in  Peer-to-peer Content Retrieval

Methods of p2p search

• Distributed Hash Tables
– CAN, Chord, Pastry, etc…
• Distribute the index
• Cost: updating pointers to content

• Flooded search over
– Random graphs
– Small-world networks
– Power-law degree networks
• Return results only on the content you have stored
• Make it easy for searches to traverse the graph
• Cost: updating the graph; group similar nodes together
• Links represent
– Nothing
– Relational autocorrelation
• “Heat-seeking search” over an organized network.

[Side labels on the slide: “Much focus” and “Not enough focus”.]

Page 9: Recent Problems in  Peer-to-peer Content Retrieval

Searching for Topics not files…

• Information Retrieval searches:
– Show me all documents that are related to “salsa dancing” (as Google does)

• You can’t index every word of every document
– It’s hard enough to handle file names.

• One approach: place nodes with similar content together.

Page 10: Recent Problems in  Peer-to-peer Content Retrieval

Arranging topology to match content

[Figure: recall (y-axis, 0 to 1) versus nodes contacted by BFS of the graph (x-axis, 0 to 100). Curve labels as extracted: Optimal, Per-query Arrangement, Arrangement, Random (gnutella).]

• Arrange topology so that we increase the amount of relevant information returned to peers for limited BFS of the graph.

• Tough problem!
• Can you find answers without flooding? Can you route queries towards content?
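
The metric in question, recall achieved by a breadth-first search limited to a fixed number of contacted nodes, can be sketched as follows. The two six-peer topologies and the set of relevant peers are invented for illustration, and the "arranged" graph is wired by hand; how an arrangement would actually be computed is not described here.

```python
from collections import deque

def bfs_recall(graph, relevant, start, max_nodes):
    """Contact at most max_nodes peers breadth-first; return the recall achieved."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_nodes:
        node = queue.popleft()
        visited.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    found = sum(1 for n in visited if n in relevant)
    return found / len(relevant) if relevant else 0.0

# Made-up data: peers 3, 4 and 5 hold content relevant to the query.
relevant = {3, 4, 5}
random_like = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2], 5: [3]}
arranged = {0: [3, 4], 1: [5, 2], 2: [1], 3: [0, 5], 4: [0], 5: [3, 1]}

print(bfs_recall(random_like, relevant, start=0, max_nodes=4))
print(bfs_recall(arranged, relevant, start=0, max_nodes=4))
```

With the same budget of four contacted nodes, the hand-arranged graph reaches all three relevant peers while the other reaches only one, which is the effect a content-aware topology aims for.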

Page 11: Recent Problems in  Peer-to-peer Content Retrieval

Retrieval (briefly)

• Content is likely to be available from several peers.

• From which peer do you download?
– Random (current approach)
– Heuristics (ping, hop count, dl time)
    • (but most peers you’ve never seen before)
– Learned/adaptive methods (e.g., MDPs)
    • See [BZLS; IPTPS’03]
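
The first two options can be contrasted in a short sketch. It covers only random selection and a ping heuristic with a random fallback for unseen peers; it does not attempt the learned (MDP-based) methods of the cited work. The function names and the `ping_ms` table are hypothetical.

```python
import random

def select_random(peers, rng=random):
    """Pick any peer serving the file, uniformly at random."""
    return rng.choice(peers)

def select_by_ping(peers, ping_ms, rng=random):
    """Pick the peer with the lowest measured ping.

    Falls back to a random choice when no candidate has been measured,
    which is the common case if most peers have never been seen before.
    """
    known = [p for p in peers if p in ping_ms]
    if not known:
        return rng.choice(peers)
    return min(known, key=lambda p: ping_ms[p])

peers = ["a", "b", "c"]
ping_ms = {"a": 120.0, "c": 35.0}  # peer "b" has never been measured
print(select_by_ping(peers, ping_ms))
```

The fallback branch shows the weakness noted above: a heuristic helps only for peers with a measurement history.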

Page 12: Recent Problems in  Peer-to-peer Content Retrieval

Selecting for both accuracy and speed

• Of the set of 100, IR techniques will choose the servers they believe are most accurate (red).

• Selecting nodes for best transfer times picks a different set (green).

• Trivial composition doesn’t work.

[Figure: a client and the candidate servers, with the selected sets marked in red and green.]

Page 13: Recent Problems in  Peer-to-peer Content Retrieval

Some other lessons learned from measurement (openNap)

Ratio of audio:video

             Shared   Transferred
# of files   20:1     1:1
# of bytes   1:1      0.06:1

• What happened to content delivery on the Internet?
• What happened to serving video on the Internet?
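
The audio:video ratios above imply a rough comparison of average file sizes. As a back-of-the-envelope sketch (the symbols N for file counts, B for byte counts, and s-bar for mean file size are introduced here and do not appear on the slide):

```latex
\[
\frac{\bar{s}_{\text{video}}}{\bar{s}_{\text{audio}}}
  = \frac{B_{\text{video}} / N_{\text{video}}}{B_{\text{audio}} / N_{\text{audio}}}
  = \frac{N_{\text{audio}} / N_{\text{video}}}{B_{\text{audio}} / B_{\text{video}}}
\]
\[
\text{Shared: } \frac{20}{1} = 20
\qquad
\text{Transferred: } \frac{1}{0.06} \approx 16.7
\]
```

Under this reading, the average video file is roughly 17 to 20 times the size of the average audio file. Audio outnumbers video 20 to 1 in shared libraries, yet transfers are split evenly by file count and are dominated by video when counted in bytes.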

Page 14: Recent Problems in  Peer-to-peer Content Retrieval

Who’s transferring/serving files? (openNap)

[Figure: percentage of all down/uploads (y-axis) versus percentage of users down/uploading (x-axis).]

Page 15: Recent Problems in  Peer-to-peer Content Retrieval

Session Lengths (gnutella)

[Figure: percentage of all sessions > x (y-axis) versus length of node availability, in 10 min. increments (x-axis).]

Page 16: Recent Problems in  Peer-to-peer Content Retrieval

Balance of work in Chord (simulation based on real traces)

[Figure: “Cumulative percentage of work doing ‘x’ performed” (y-axis, 0% to 100%) versus percentage of all nodes, ranked (x-axis). Curves: Equal work, Keys indexed, Queries Resolved, Msgs rcvd, Msgs sent.]

Page 17: Recent Problems in  Peer-to-peer Content Retrieval

Does caching queries balance load? (simulation based on real traces)

• Normal: 20% of nodes answer 84% of the queries.

• Cached (infinite buffer): 20% of nodes answer 55% of the queries.

• Answer: yes, but still a problem.
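
A load-balance figure of this kind, the share of all queries answered by the busiest 20% of nodes, can be computed from per-node counts as in the sketch below. The function name and the two five-node load vectors are invented so that the arithmetic reproduces the 84% and 55% figures; they are not the trace data.

```python
def top_share(loads, fraction=0.2):
    """Share of the total load carried by the busiest `fraction` of nodes."""
    if not loads:
        return 0.0
    ranked = sorted(loads, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # number of nodes in the top group
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0

# Made-up per-node query counts for a five-node network (100 queries each).
normal = [84, 4, 4, 4, 4]
cached = [55, 12, 11, 11, 11]
print(top_share(normal), top_share(cached))
```

A perfectly balanced system would return 0.2 for the top 20% of nodes, so both values indicate skew, with caching narrowing the gap but not closing it.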

Page 18: Recent Problems in  Peer-to-peer Content Retrieval

Some Measurements of P2P

• Ripeanu et al. – Gnutella topology does not match underlying network topology. MMCN'02

• Markatos – A simple query caching scheme can reduce query traffic by a factor of two. CCGrid 2002

• Saroiu et al. – Gnutella bandwidth, latency, and node availability over a 60-hour period. Multimedia Systems Journal v8n6

• Adar and Huberman – A free-rider study, using Gnutella’s QueryHit messages to infer peer downloads.

• Chu, Labonte, Levine – Measurements of Napster and Gnutella file popularity and session lengths. Proc. ITCom 2002

• Bhagwan et al. – Effects of DHCP on availability of nodes in p2p, TOD, joins and leaves. IPTPS 2003

• Chu, Labonte, Levine – Measurements of all transfers and most libraries in a large p2p system (openNap); evaluation of Chord

Page 19: Recent Problems in  Peer-to-peer Content Retrieval

Summary Open Issues

• Applications of p2p are broad.
• Methods other than DHT are possible.
• Measurement studies have revealed the skewed distributions of p2p systems.
– Can these be modeled?
• DHTs are limited in their application to content sharing.
– Work well for single-key systems

• Stronger efforts are needed to match research designs to real characteristics of systems.

• Thanks to Jacky Chu and Kevin Labonte for doing the balance of the work.