
Search and Replication in Unstructured Peer-to-Peer Networks

Pei Cao, Christine Lv, Edith Cohen, Kai Li and Scott Shenker

ICS 2002

Outline

• Brief survey of P2P architectures

• Evaluation Methodology

• Search Methods

• Replication

• Conclusions

Peer-to-Peer Networks

• Peers are connected by an overlay network.

• Users cooperate to share files (e.g., music and videos)

• Dynamic: nodes join or leave frequently

P2P Network Architectures I

• Centralized:

– Use of a central directory server (CDS)

– Peers query the CDS to find other peers that hold the desired object

Pros: very efficient

Cons: scales poorly; single point of failure

P2P Network Architectures II

• Decentralized: no central directory server

– Structured:

  • P2P network topology is tightly controlled

  • Files are placed at specified locations

– Unstructured:

  • No control over network topology or file placement

P2P Network Architectures III

Decentralized but Structured

• "Loosely structured"

– Placement of files is based on hints

• "Tightly structured"

– Precisely specifies both the structure of the P2P network and file placement

– Use of a distributed hash table

Pros: efficient satisfaction of queries; good scaling

Cons: no proof that it works

P2P Network Architectures IV

Decentralized and Unstructured

• Placement of files is not based on topology knowledge

• Finding files

– A node queries its neighbors (usually by flooding)

Pros: extremely resilient to network changes

Cons: extremely unscalable; generates large loads

Evaluation Methodology I

Terminology

• Network topology: the graph formed by the nodes in the network at a given instant

• Query distribution: frequency of lookups for each file

• Replication distribution: percentage of nodes that hold a particular file

Evaluation Methodology II

• Network topologies

– Power-Law Random Graph (PLRG)

  • Max node degree: 1746, median: 1, average: 4.46

– Normal Random Graph (Random)

  • Average and median node degree: 4

– Gnutella graph (Gnutella)

  • Oct 2000 snapshot

  • Max degree: 136, median: 2, average: 5.5

– Two-dimensional grid (Grid)

  • 100x100, 10000 nodes

Evaluation Methodology III

• Object query distribution $q_i$

– Uniform

– Zipf-like

• Object replication density distribution $r_i$

– Uniform

– Proportional: $r_i \propto q_i$

– Square-root: $r_i \propto \sqrt{q_i}$
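As a rough illustration of the distributions above, the following Python sketch builds a Zipf-like query distribution and the three replication allocations. The exponent `alpha`, the function names, and the normalization of the replication values to fractions that sum to 1 are assumptions made for this example; the slides do not specify them.

```python
import math

def zipf_like(m, alpha=1.0):
    """Query probabilities q_i proportional to 1 / i**alpha, normalized to sum to 1.

    alpha is an assumed parameter; the slides only say "Zipf-like".
    """
    weights = [1.0 / (i ** alpha) for i in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def replication(q, strategy):
    """Share of the total replica budget given to each object (shares sum to 1)."""
    if strategy == "uniform":
        raw = [1.0] * len(q)                 # every object gets the same share
    elif strategy == "proportional":
        raw = list(q)                        # share follows query rate
    elif strategy == "square-root":
        raw = [math.sqrt(x) for x in q]      # share follows sqrt of query rate
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    total = sum(raw)
    return [x / total for x in raw]

if __name__ == "__main__":
    q = zipf_like(100)
    for name in ("uniform", "proportional", "square-root"):
        r = replication(q, name)
        print(f"{name:12s} most popular: {r[0]:.4f}  least popular: {r[-1]:.4f}")
```

For a skewed query distribution, the square-root shares of the most and least popular objects fall between their uniform and proportional shares.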

Evaluation Methodology IV

• Metrics

– User aspects

  • Pr(success)

  • #hops

– Load aspects

  • Average #messages per node

  • #nodes visited

  • Peak #messages

Limitation of Flooding I

• Gnutella uses a TTL (time-to-live) to limit the number of hops a query travels

• Problem:

– Hard to choose the TTL:

  • For objects that are widely present in the network, small TTLs suffice

  • For objects that are rare in the network, large TTLs are necessary

– The number of query messages grows exponentially as the TTL grows
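The growth in message count can be seen in a small simulation. This is a minimal sketch, assuming a simple random graph with a fixed number of random edges (not the PLRG or Gnutella topologies from the evaluation) and assuming each node forwards a given query only once, to every neighbor except the one it received the query from. The helper names `random_graph` and `flood` are hypothetical.

```python
import random

def random_graph(n, avg_degree, seed=0):
    """Undirected graph with n * avg_degree / 2 random edges (an assumed toy topology)."""
    rng = random.Random(seed)
    graph = {v: set() for v in range(n)}
    edges = 0
    while edges < n * avg_degree // 2:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b and b not in graph[a]:
            graph[a].add(b)
            graph[b].add(a)
            edges += 1
    return graph

def flood(graph, source, ttl):
    """Flood a query from source for ttl hops.

    Returns (nodes_reached, messages_sent). Messages that arrive at a node
    which has already seen the query are still counted: they are the
    duplicate traffic that flooding generates.
    """
    seen = {source}
    frontier = [(source, None)]          # (node, neighbor it received the query from)
    messages = 0
    for _ in range(ttl):
        next_frontier = []
        for node, parent in frontier:
            for nb in graph[node]:
                if nb == parent:
                    continue             # do not send the query straight back
                messages += 1
                if nb not in seen:       # first arrival: this node will forward it
                    seen.add(nb)
                    next_frontier.append((nb, node))
        frontier = next_frontier
    return len(seen), messages

if __name__ == "__main__":
    g = random_graph(10000, 4)
    for ttl in range(1, 9):
        reached, msgs = flood(g, 0, ttl)
        print(f"TTL={ttl}: reached {reached} nodes with {msgs} messages")
```

Comparing the two printed columns for increasing TTL gives a feel for how much of the traffic is duplicate delivery once most nodes have been reached.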

Limitation of Flooding II

• A node may receive the same message more than once

• Need for duplicate-detection mechanisms

• Even so, duplication increases as the TTL increases in flooding

Limitation of Flooding Conclusion

• Flooding increases per-node overhead

• Need for more scalable search methods:

– Expanding Ring

– Random Walks

Expanding Ring

• Adaptively adjust the TTL

– Multiple floods: start with TTL=1; increment the TTL by 2 each time until the search succeeds

• Still produces duplicate messages
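The expanding-ring policy can be sketched as repeated floods with a growing TTL. In the sketch below, the start value 1 and the increment 2 come from the slide; the `max_ttl` cap, the function names, and the message-counting convention (every forwarded copy counts, including copies sent back toward the sender) are assumptions for illustration.

```python
def flood_search(graph, source, holders, ttl):
    """One flood with the given TTL. Returns (found, messages_sent)."""
    seen = {source}
    frontier = [source]
    messages = 0
    found = source in holders
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            for nb in graph[node]:
                messages += 1                # duplicates are counted too
                if nb not in seen:
                    seen.add(nb)
                    next_frontier.append(nb)
                    if nb in holders:
                        found = True
        frontier = next_frontier
    return found, messages

def expanding_ring(graph, source, holders, start_ttl=1, step=2, max_ttl=9):
    """Re-flood with TTL = start_ttl, start_ttl + step, ... until the object is found.

    Returns (successful_ttl or None, total_messages over all floods).
    max_ttl is an assumed safety cap, not a value from the slides.
    """
    total = 0
    ttl = start_ttl
    while ttl <= max_ttl:
        found, msgs = flood_search(graph, source, holders, ttl)
        total += msgs
        if found:
            return ttl, total
        ttl += step
    return None, total

if __name__ == "__main__":
    # Toy topology: a ring of 20 nodes, object stored 4 hops from the source.
    ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
    print(expanding_ring(ring, source=0, holders={4}))
```

Each failed round repeats the work of the previous one, which is where the extra duplicate messages come from; the benefit is that nearby objects are found without ever issuing a large flood.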

Random Walk

• Simple random walk

– Takes too long to find anything

• Multiple-walker random walk

– K walkers, each walking T steps, visit about as many nodes as 1 walker walking K*T steps

– More messages, more overhead

– When to terminate the search:

  • TTL

  • Checking: check back with the query originator once every C steps
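A K-walker search with periodic checking might look like the sketch below. It is a simplification under stated assumptions: all walkers move in lockstep, each check-back costs one message per walker, walkers keep moving until the next check after a hit, and every node has at least one neighbor. The function name, the `max_steps` cap (playing the role of a TTL), and the seed are hypothetical.

```python
import random

def k_walker_search(graph, source, holders, walkers=32, check_every=4,
                    max_steps=1024, seed=0):
    """K independent random walkers with checking.

    Returns (hop_at_which_the_object_was_first_found or None, messages_sent).
    graph maps each node to a non-empty list of neighbors.
    """
    if source in holders:
        return 0, 0
    rng = random.Random(seed)
    positions = [source] * walkers
    messages = 0
    found_at = None
    for step in range(1, max_steps + 1):
        for w in range(walkers):
            positions[w] = rng.choice(graph[positions[w]])   # one hop = one message
            messages += 1
            if found_at is None and positions[w] in holders:
                found_at = step
        if step % check_every == 0:
            messages += walkers      # assumed cost: each walker contacts the originator
            if found_at is not None:
                return found_at, messages                    # originator tells walkers to stop
    return found_at, messages

if __name__ == "__main__":
    ring = {i: [(i - 1) % 50, (i + 1) % 50] for i in range(50)}
    print(k_walker_search(ring, source=0, holders={10, 40}, walkers=4))
```

A larger `check_every` lowers the checking overhead but lets walkers run further past the point where the query was already satisfied.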

Search Traffic Comparison

[Figure: bar chart of avg. # msgs per node per query for Flood, Ring, and Walk on the Random and Gnutella topologies. Data labels in extraction order: 1.863, 2.85, 0.053, 0.961, 0.027, 0.031.]

Search Delay Comparison

[Figure: bar chart of # hops till success for Flood, Ring, and Walk on the Random and Gnutella topologies. Data labels in extraction order: 2.51, 2.39, 4.03, 3.4, 9.12, 7.3.]

Lessons Learned about Search Methods

• Key: Cover the right number of nodes as quickly as possible and with as little overhead as possible

• Pay attention to

– Adaptive termination

– Minimal message duplication

– Small expansion in each step

Replication

• In unstructured P2P systems, search success is essentially about coverage: visiting enough nodes to find the object => replication density matters

• Goal: minimize average search size (number of probes till query is satisfied)

• Theoretical optimum: copy everything everywhere

– Not achievable in practice: node storage is limited

Replication Strategies

• Uniform Replication

– $p_i = 1/m$

– Simple, resources are divided equally

• Proportional Replication

– $p_i = q_i$

– "Fair", resources per item proportional to demand

– Reflects current P2P practices

Square-Root Replication

• $p_i$ is proportional to $\sqrt{q_i}$

• Lies "in between" Uniform and Proportional
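A short derivation suggests why the square-root allocation minimizes average search size. It is a sketch under simplifying assumptions that are not stated on the slides: there are $n$ nodes, each storing $\rho$ object copies; $p_i$ is the fraction of all $n\rho$ copies given to object $i$; probes hit nodes uniformly at random; and $\rho p_i$ is small enough to act as a per-probe success probability. The symbols $n$, $\rho$, $A$ and $\lambda$ are introduced here for illustration.

```latex
% One random probe finds object i with probability (n \rho p_i copies) / (n nodes) = \rho p_i,
% so the expected number of probes for object i is about 1 / (\rho p_i).
\[
  A \;=\; \sum_{i=1}^{m} q_i \cdot \frac{1}{\rho\, p_i}
    \;=\; \frac{1}{\rho} \sum_{i=1}^{m} \frac{q_i}{p_i},
  \qquad \text{subject to } \sum_{i=1}^{m} p_i = 1 .
\]
% Minimize with a Lagrange multiplier \lambda:
\[
  \frac{\partial}{\partial p_i}
  \Bigl( \sum_{j} \frac{q_j}{p_j} + \lambda \sum_{j} p_j \Bigr)
  = -\frac{q_i}{p_i^{2}} + \lambda = 0
  \;\Longrightarrow\;
  p_i = \frac{\sqrt{q_i}}{\sum_{j} \sqrt{q_j}} .
\]
% Resulting average search sizes for the three strategies:
\[
  A_{\mathrm{uniform}} = A_{\mathrm{proportional}} = \frac{m}{\rho},
  \qquad
  A_{\mathrm{square\text{-}root}}
    = \frac{1}{\rho} \Bigl( \sum_{i} \sqrt{q_i} \Bigr)^{2}
    \le \frac{m}{\rho} .
\]
```

The last inequality is Cauchy-Schwarz applied to $\sum_i \sqrt{q_i} \cdot 1$, using $\sum_i q_i = 1$. Under this model, Uniform and Proportional give the same average search size, and the square-root allocation is no worse than either.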

Achieving Square-Root Replication I

• Assume that each query keeps track of the number of probes it needed

• Store the object at a number of nodes proportional to that number of probes

• Two implementations:

– Path replication: store the object along the path of a successful "walk"

– Random replication: store the object randomly among the nodes visited by the agents
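Path replication can be sketched with a single random walker that remembers its path. This is a minimal illustration, assuming unlimited storage per node (no eviction), one walker instead of many, and a hypothetical `max_steps` cap; `walk_and_replicate` and `storage` are names made up for the example. Random replication is not shown.

```python
import random

def walk_and_replicate(graph, storage, source, obj, max_steps=10000, seed=0):
    """Random-walk from source until a node holding obj is found, then copy obj
    to every node on the walk's path (path replication).

    graph maps node -> list of neighbors; storage maps node -> set of objects.
    Returns the number of hops taken, or None if max_steps is exceeded.
    """
    rng = random.Random(seed)
    path = [source]
    node = source
    while obj not in storage[node]:
        if len(path) > max_steps:
            return None                      # give up; nothing is replicated
        node = rng.choice(graph[node])
        path.append(node)
    for visited in path[:-1]:                # last node already holds the object
        storage[visited].add(obj)            # longer searches leave more copies
    return len(path) - 1

if __name__ == "__main__":
    ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
    storage = {i: set() for i in range(10)}
    storage[5].add("song")
    print("first search hops:", walk_and_replicate(ring, storage, 0, "song"))
    print("second search hops:", walk_and_replicate(ring, storage, 0, "song"))
    print("copies now:", sum("song" in s for s in storage.values()))
```

Because the number of new copies grows with the length of the search, objects that are hard to find gain replicas faster per query than objects that are already easy to find.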

Achieving Square-Root Replication II

Evaluation of Replication Methods I

• Metrics

– Overall message traffic

– Search delay

• Dynamic simulation

– Assume Zipf-like object query probability

– Poisson query arrivals at 5 queries/sec

– Results are measured over the interval 5000 sec to 9000 sec

– Search method: 32-walker random walk with state keeping and checking every 4 steps

Evaluation of Replication Methods II

Square-Root Replication reduces search traffic

[Figure: bar chart of avg. # msgs per node (5000-9000 sec) for Owner, Path, and Random replication; y-axis from 0 to 60000.]

Evaluation of Replication Methods III

[Figure: Dynamic simulation, hop distribution (5000-9000 sec). x-axis: #hops, with ticks at powers of two from 1 to 256; y-axis: queries finished (%), from 0 to 120; curves for Owner Replication, Path Replication, and Random Replication.]

Conclusions

• Multi-walker random walk scales much better than flooding

– Can find data more quickly

– Reduces the traffic load

• Square-root replication distribution is desirable

– Minimizes search delay

– Minimizes the overall search traffic
