Study on Network Size Estimation Schemes for Peer-to-Peer Networks

2008/02/19

Hosik Cho

hscho@mmlab.snu.ac.kr

Some Questions

• How many people are in this room?
• Why do you think that?
• How many people are on this campus?
• Can you count them all?
• How many nodes are in a P2P network spanning the world?

Contents

• Peer-to-peer networks
• Network size estimation
• Estimation methods
– Unstructured P2P
– Structured P2P
• Conclusion

P2P networks

• A peer-to-peer overlay network connects peers in a logical manner on top of IP.
• Unstructured P2P: Gnutella, Freenet
• Structured P2P: Chord, CAN, Pastry, …
• P2P applications
– File sharing systems (KaZaA, Gnutella)
– Video over IP (CoolStreaming)
– Voice over IP (Skype)

P2P networks

• Characteristics
– Scalable
– Self-organizing
– Resilient to failure
– Fully decentralized
• As a consequence, monitoring the system and obtaining global statistics become much more complex.

Network size estimation

• Network size (N) is needed for:
– Load balancing
– Restricted broadcasting
– Determining network parameters
• For unstructured P2P networks, most approaches are based on broadcasting.
• For structured P2P networks, the size can be inferred directly from the density of identifiers.
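To illustrate how a size can be inferred from identifier density, here is a minimal sketch. It assumes IDs are uniform on a unit ring [0, 1) and that a node can learn the ID of its k-th successor. The function name and the use of a global sorted ID list (for simulation only) are hypothetical and not taken from any particular system.

```python
import bisect
import random


def estimate_size_from_density(sorted_ids, my_id, k):
    """Estimate N from the ID-space span covered by the next k successors.

    Assumes IDs are uniform on the unit ring [0, 1) and k < len(sorted_ids).
    """
    # Index of the first ID strictly greater than my_id (the first successor).
    first = bisect.bisect_right(sorted_ids, my_id)
    kth_successor = sorted_ids[(first + k - 1) % len(sorted_ids)]
    # Clockwise distance on the ring, wrapping around at 1.0.
    span = (kth_successor - my_id) % 1.0
    # k nodes occupy a fraction `span` of the ring, so N is roughly k / span.
    return k / span


rng = random.Random(1)
ids = sorted(rng.random() for _ in range(5000))
print(round(estimate_size_from_density(ids, ids[0], 200)))
```

A larger k averages over more gaps and reduces the variance of the estimate, at the cost of contacting more successors.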

Related Works

• Unstructured P2P
– Sample & Collide
– Hops Sampling
– Gossip-based aggregation
• Structured P2P
– Token passing
– Neighbor sampling
– Finger sampling

Sample&Collide (1)

• “Birthday Paradox”: in a group of 23 people, the probability that at least two of them share a birthday is at least 50%.
• The initiator samples nodes uniformly at random until a sample returns a node that has already been selected.
• The expected number of samples (X) is approximately $\sqrt{2n}$, where n is the network size.
• The system size is estimated as $X^2/2$.
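The collision-based estimate can be checked with a small simulation. This sketch assumes an idealized uniform sampler (`rng.randrange`) in place of a real network sampling primitive. The function names are illustrative. Because a single estimate from one collision is very noisy, the sketch averages over independent trials.

```python
import random


def sample_and_collide(n, rng):
    """Sample node IDs uniformly until one repeats; return the estimate X^2 / 2."""
    seen = set()
    samples = 0
    while True:
        node = rng.randrange(n)  # idealized uniform sample of a node ID
        samples += 1
        if node in seen:
            # First collision after X = samples draws.
            return samples * samples / 2
        seen.add(node)


def averaged_estimate(n, trials, rng):
    """Average the single-collision estimate over independent trials."""
    return sum(sample_and_collide(n, rng) for _ in range(trials)) / trials


print(averaged_estimate(10_000, 2_000, random.Random(42)))
```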

Sample&Collide (2)

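1. The initiator node sets a timer T > 0.
2. It sends the message to a neighbor.
3. The receiving node i picks a random number U uniformly in (0, 1) and decrements T by −log(U)/d_i, where d_i is its degree.
4. If T > 0, the node forwards the message to a neighbor.
5. If T ≤ 0, the node returns its ID to the initiator (this is the sample).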
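The sampling steps can be sketched as a timer-driven random walk. This is a sketch under stated assumptions: the overlay is given as an adjacency dictionary, the message is forwarded to one neighbor chosen uniformly at random, and the timer shrinks by −log(U)/d_i at each visited node i of degree d_i. The name `ctrw_sample` is hypothetical.

```python
import math
import random


def ctrw_sample(adj, initiator, timer, rng):
    """Return the node at which the timer expires, starting from `initiator`."""
    node, t = initiator, timer
    while True:
        # Forward the message to a uniformly chosen neighbor.
        node = rng.choice(adj[node])
        # U in (0, 1]; using 1 - random() avoids log(0).
        u = 1.0 - rng.random()
        # Decrement T by -log(U) / d_i, an exponential amount with mean 1 / d_i.
        t -= -math.log(u) / len(adj[node])
        if t <= 0:
            return node  # this node reports its ID as the sample


# Small irregular overlay: node 0 is a hub with higher degree.
overlay = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4], 4: [0, 3]}
print(ctrw_sample(overlay, 0, 10.0, random.Random(5)))
```

Scaling the decrement by the degree is what removes the bias toward high-degree nodes that a plain fixed-length random walk would show. With a large enough initial T, each node is returned with roughly equal probability.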

HopsSampling (1)

• Probabilistic polling approach
• An initiator spreads messages in the network and estimates the system size based on the replies it gets back.
• If hopCount < minHopsReporting, a response is sent with probability 1.
• Otherwise, the response is sent with probability $1/2^{(hopCount - minHopsReporting)}$.
• Example: if minHopsReporting = 2, only 25% of the nodes at distance 4 will report back.
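The reporting rule can be written as a small function. The Python names below simply mirror hopCount and minHopsReporting from the slide.

```python
def response_probability(hop_count, min_hops_reporting):
    """Probability that a node at distance hop_count replies to the initiator."""
    if hop_count < min_hops_reporting:
        return 1.0
    # Halve the probability for every hop beyond the reporting threshold.
    return 1.0 / 2 ** (hop_count - min_hops_reporting)


# Distance 4 with threshold 2: 1 / 2**(4 - 2)
print(response_probability(4, 2))
```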

HopsSampling (2)

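1. The initiator node sets hopCount = 0.
2. It sends the message to its neighbors; hopCount is incremented at each hop.
3. If hopCount < minHopsReporting, the receiving node sends a response.
4. Otherwise, it sends a response with a probability that depends on hopCount.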
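A rough end-to-end simulation of these steps is sketched below. It makes several assumptions that are not in the slides: the message is flooded, so hopCount equals the breadth-first distance. Each reply is weighted by the inverse of its reporting probability to form the estimate. The test overlay is a 10-dimensional hypercube. All function names are illustrative.

```python
import random
from collections import deque


def hop_distances(adj, initiator):
    """Breadth-first hop count from the initiator to every reachable node."""
    dist = {initiator: 0}
    queue = deque([initiator])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def hops_sampling_estimate(adj, initiator, min_hops_reporting, rng):
    """Poll the overlay once and extrapolate the size from the replies."""
    estimate = 1.0  # the initiator counts itself
    for node, hops in hop_distances(adj, initiator).items():
        if node == initiator:
            continue
        if hops < min_hops_reporting:
            p = 1.0
        else:
            p = 0.5 ** (hops - min_hops_reporting)
        if rng.random() < p:
            # Assumed weighting: one reply stands for 1 / p nodes at that distance.
            estimate += 1.0 / p
    return estimate


dim = 10  # hypercube overlay with 2**10 = 1024 nodes
cube = {v: [v ^ (1 << b) for b in range(dim)] for v in range(1 << dim)}
print(hops_sampling_estimate(cube, 0, 5, random.Random(11)))
```

Lowering minHopsReporting reduces the number of replies the initiator must handle but increases the variance of the estimate.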

Gossip-based (1)

• Epidemic-based approach
• If exactly one node in the system holds the value 1 and all other nodes hold 0, the average is 1/N.
• An initiator takes the value 1 and starts gossiping.
• Nodes reached by the gossip join the process with their value set to 0.
• At each cycle, each node chooses one of its neighbors and exchanges its current estimate with it.

Gossip-based (2)

• Estimation ← (Estimation + neighbor’s_Estimation) / 2
• To provide correct estimates, the algorithm must wait for a certain number of rounds to elapse before computing the size estimate.
• This period is the time required for the gossip to propagate through the whole network and for the values to converge.
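The averaging process can be simulated in a few lines. The sketch assumes a fully connected overlay in which any node may pick any other node as its gossip partner, and it assumes both partners adopt the average. Each node then reads its size estimate as the reciprocal of its value. The function name is hypothetical.

```python
import random


def gossip_size_estimates(n, cycles, rng):
    """Run gossip averaging and return each node's size estimate 1 / value."""
    values = [0.0] * n
    values[0] = 1.0  # the initiator holds 1, every other node holds 0
    for _ in range(cycles):
        for i in range(n):
            # Pick a partner other than i (assumes a fully connected overlay).
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            # Both partners adopt the average; the global sum stays equal to 1.
            avg = (values[i] + values[j]) / 2
            values[i] = values[j] = avg
    # A node the gossip has not reached yet still holds 0 and has no estimate.
    return [1.0 / v if v > 0 else float("inf") for v in values]


estimates = gossip_size_estimates(100, 50, random.Random(2))
print(min(estimates), max(estimates))
```

Because every exchange preserves the sum of all values, the values converge to 1/N. Reading the estimate after too few cycles gives widely differing answers across nodes.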

N Estimation in S-P2P

• Assumptions
– IDs are uniformly distributed.
– Each node knows the total number of nodes (N) in the system.
– Nodes do not leave and join frequently.

Basic approaches

[Figure: (a) Token passing; (b) Neighbor sampling]

N Estimation in S-P2P

• In an actually deployed system:
– Nodes join and leave frequently.
– A node must estimate how long a query takes to be delivered to its destination: O(log N).
– Proximity-based identifiers are adopted for efficient routing, e.g. based on:
  • AS number
  • geographic location

Uniformity of Identifiers

[Figure: two panels, “Myth” and “Real”]

Estimation result (1)

[Figure: estimation results for proximity-based IDs and for uniformly distributed IDs]

Extended approach

• Structured P2P networks maintain fingers, routing tables, contacts, etc.
• N can be estimated more precisely using this structural information.

Estimation result (2)

[Figure: estimation results for proximity-based IDs and for uniformly distributed IDs]

Conclusion

• For unstructured P2P
– There is a tradeoff between the quality of the estimate and the associated overhead.
– The algorithm should be chosen according to its objectives and applications.
• For structured P2P
– The distribution of identifiers may be skewed.
– Using structural information makes the estimation results more accurate.

References

• D. Psaltoulis, D. Kostoulas, I. Gupta, K. Birman, and A. Demers, “Practical algorithms for size estimation in large and dynamic groups,” PODC 2004.

• D. Kostoulas, D. Psaltoulis, I. Gupta, K. Birman, and A. Demers, “Decentralized schemes for size estimation in large and dynamic groups,” IEEE NCA 2005.

• L. Massoulié, A.-M. Kermarrec, E. Le Merrer, and A. J. Ganesh, “Peer counting and sampling in overlay networks: random walk methods,” Technical Report MSR-TR-2005-156, 2005.

• G. S. Manku, M. Bawa, and P. Raghavan, “Symphony: Distributed hashing in a small world,” USITS 2003.
