distributed algorithms for peer-to-peer systems ronaldo alves ferreira phd thesis advisors: ananth...

82
Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer Science – Purdue University November - 2006

Post on 15-Jan-2016

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Distributed Algorithms for Peer-to-Peer Systems

Ronaldo Alves FerreiraPhD Thesis

Advisors: Ananth Grama and Suresh Jagannathan

Department of Computer Science – Purdue UniversityNovember - 2006

Page 2: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Thesis Goals

• Develop algorithms that address fundamental problems in peer-to-peer systems.

• Investigate the feasibility of completely decentralized solutions.

Page 3: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Outline

• Background

• Distributed Algorithms for Structured P2P Networks Locality in DHT Systems [ICPADS 04 and JPDC 06] Semantic Queries in DHT Systems (Binary matrix

decomposition) – [To be submitted to ICDCS]

• Distributed Algorithms for Unstructured P2P Networks Uniform sampling Search with probabilistic guarantees [P2P 05] Duplicate elimination [P2P 05 and TPDS 07]

• Conclusions

Page 4: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Background• Peer-to-Peer (P2P) networks are self-organizing

distributed systems where participating nodes both provide and receive services from each other in a cooperative manner without distinguished roles as pure clients or pure servers.

• P2P Internet applications have recently been popularized by file sharing applications, such as Napster and Gnutella.

• P2P systems have many interesting technical aspects, such as decentralized control, self-organization, adaptation and scalability.

• One of the key problems in large-scale P2P applications is to provide efficient algorithms for object location and routing within the network.

Page 5: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Background• Central server (Napster)

Search is centralized, data transfer is peer-to-peer.

• Structured solution – “DHT” (Chord, Pastry, Tapestry, CAN) Regular topologies (emulations of hypercube or mesh). Guaranteed file lookup using hashes. Lacks keyword search.

• Unstructured solution (Gnutella) Peers have constant degrees and organize themselves

in an unstructured topology. Controlled flooding used by real-world softwares: high

message overhead. Recent proposals: random walks, replication. Trend towards distributed randomized algorithms.

Page 6: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Background - DHT• All known proposals take as input a key and, in

response, route a message to the node responsible for that key.

• The keys are random binary strings of fixed length (generally 128 bits). Nodes have identifiers taken from the same space as the keys.

• Each node maintains a routing table consisting of a small subset of nodes in the system.

• Nodes route queries to neighboring nodes that make the most “progress” towards resolving the query.

Page 7: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Background - DHT0XX

X1XX

X2XX

X3XX

X

2321

2032

2001

0112

START

0112 routes a message

to key 2000.

First hop fixes first digit (2)

Second hop fixes second

digit (20)END

2001 closest live node to

2000.

Page 8: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Background - DHT

• Scalable, efficient, and decentralized systems:O(log(N)) routing table entries per nodeRoute in O(log(N)) number of hops

Page 9: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Binary Matrix Decomposition Motivation

• Search is still an important problem in P2P systems.

• Most current proposals solve the problem for keyword searches or object keys.

• DHTs do not support search beyond exact matches.

• Unstructured P2P systems support generic meta-information.

• How can we search for synonyms as in current information retrieval systems?

Page 10: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Binary Matrix Decomposition Motivation

• Information retrieval systems generally rely on matrix decomposition techniques, such as SVD, to solve the problem of synonym and polysemy.

• SVD is expensive and generates dense matrices.

• We’ll look to the problem using binary data.

Page 11: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Binary Matrix Decomposition Motivation

• Several high-dimensional datasets of modern applications are binary or can be transformed into binary formatRetailer’s transactions

• A centralized solution is not viable:Volume of dataReal-time responsePrivacy

Page 12: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Binary Matrix DecompositionDefinition

• Given m binary vectors of size n, find a set of k binary vectors, with k << m, such that for any input vector v, there is an output vector o such that the Hamming distance between v and o is at most ε.

Page 13: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Binary Matrix Decomposition

• Example:

1110

1100

1100

1110

0110

1110

1110

1110

A

1111x 1110 1110y

TxyÃA

Page 14: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Binary Matrix Decomposition

• Proximus provides a serial solution to the problem.

• The matrix is recursively partitioned based on successive computations of rank-one approximations.

Page 15: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Binary Matrix Decomposition

• Rank-one Approx: Given a matrix , find that minimize the error:

• Minimizing the error in a rank-one approximation is equivalent to maximize:

1:2

ijT

ijF

T axyAaxyA

222,

FF

Td yxAyxyxC

Page 16: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Binary Matrix Decomposition

• The maximization problem can be solved in linear time in the number of non-zeros of the matrix A using an alternating iterative heuristic.

Find the optimal solution to x for a fixed y, and then use the computed value of x to find a new y. The new value of y is again used to find a new value of x. This process is repeated until there is no change to the values of x and y.

Page 17: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Binary Matrix Decomposition

Page 18: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Binary Matrix DecompositionDistributed Approach

• The matrix is distributed across multiple peers.

• I want to find an approximation to the serial solution.

• To minimize communication overhead, we rely on subsampling.

• Peers exchange rank-one approximations and consolidate the patterns to achieve common approximations.

• Similar approach has been successfully used in Conquest.

Page 19: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Binary Matrix DecompositionConsolidation Algorithm

• Straightforward solution using DHT:Use presence vector as a key.Nodes at Hamming distance one communicate

their patterns during consolidation.

• Problems with this approach:Patterns are high-dimensional (curse of

dimensionality).Patterns are not uniformly distributed (load

imbalance).

Page 20: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Binary Matrix DecompositionConsolidation Algorithm

• High-Dimensionality problem: use only a prefix of the pattern: If patterns are not similar, the keys will not be similar. If keys are similar, patterns still need to be

consolidated.

• Load imbalance problem: define a proximity preserving hash function.Aggregate multiple bits to force uniform distribution,

while preserving proximity.

Page 21: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Binary Matrix DecompositionConsolidation Algorithm

• Define computation groups based on Hamming distance of node identifiers.

Page 22: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Binary Matrix DecompositionConsolidation Algorithm

• Algorithm works in rounds:Peers exchange their rank-one approximations with

their computation neighbors.Patterns are locally consolidated and forwarded to

computation neighbors.This process is repeated in as many rounds as the

bound on the error of the approximation (Hamming radius).

Page 23: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Binary Matrix DecompositionConsolidation Algorithm

• Proximity preserving Hash function

otherwise 1

0 if 01f

fj

1i

i

ji

yb

1

1

1

1

12

1minarg where

;0 if 0xf

fjj

Nxiii

ii

i

pssf

i

f

Break pattern into strides such that the probability of having all zeros in the stride is approximately 0.5.

Page 24: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Experimental Evaluation• Prototype implementation.

• Cluster with 18 workstations. Each workstation runs 15 processes, emulating a total of 270 peers.

• Dataset 1: Walmart customer’s transactions Translated into a binary matrix with 34,239 columns (items) and ~1

million rows (transactions).

• Dataset 2: Synthetic dataset generated with IBM Quest generator. Multiple datasets generated by varying the number of underlying

patterns, correlation between patterns, and confidence of a pattern. [3 million transactions]

Page 25: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Experimental Evaluation• Evaluation of the quality of the results in terms of

precision and recall.

• Compression• Load balancing

2

2&

F

F

Ã

ÃAprecision 2

2&

F

F

A

ÃArecall

Page 26: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Experimental Results

• Precision and recall for the Walmart dataset.

Page 27: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Experimental Results

• Precision for the synthetic datasets.

Page 28: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Experimental Results

• Recall for the synthetic datasets.

Page 29: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Experimental Results

• Compression for the Walmart dataset.

Page 30: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Experimental Results

• Compression for the synthetic datasets (60% and 90%).

Page 31: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Experimental Results

• Load balancing (Walmart dataset).

Page 32: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Experimental Results

• Load balancing (Synthetic datasets).

Page 33: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Structured P2P SystemsLocality in DHT

Page 34: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Enhancing Locality in DHT• Computers (nodes) have unique ID

Typically 128 bits long Assignment should lead to uniform distribution in the node

ID space, for example SHA-1 of node’s IP

• Scalable, efficient O(log(N)) routing table entries per node Route in O(log(N)) number of hops

• Virtualization destroys locality.

• Messages may have to travel around the world to reach a node in the same LAN.

• Query responses do not contain locality information.

Page 35: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Enhancing Locality in DHT• Two-level overlay

One global overlay Several local overlays

• Global overlay is the main repository of data. Any prefix-based DHT protocol can be used.

• Global overlay helps nodes organize themselves into local overlays.

• Local overlays explore the organization of the Internet in ASs.

• Local overlays use a modified version of Pastry.

• Size of the local overlay is controlled by a local overlay leader. Uses efficient distributed algorithms for merging and splitting

local overlays. The algorithms are based on equivalent operations in hypercubes.

Page 36: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Simulation Setup• Underlying topology is simulated using data collected

from the Internet by the Skitter project at CAIDA.

• Data contains link delays and IP addresses of the routers. IP addresses are mapped to their ASs using BGP data collected by the Route Views project at University of Oregon.

• The resulting topology contains 218,416 routers, 692,271 links, and 7,704 ASs.

• We randomly select 10,000 routers and connect a LAN with 10 hosts in each one of them.

• 10,000 overlay nodes selected randomly from the hosts.

Page 37: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Simulation Setup• NLANR web proxy trace with 500,254 objects.

• Zipf distribution parameters: {0.70, 0.75, 0.80, 0.85, 0.90}

• Maximum overlay sizes: {200; 300; 400; 500; 1,000; 2,000}

• Local cache size: 5MB (LRU replacement policy).

Page 38: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Simulation Results• Performance gains in delay response

Page 39: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Simulation Results• Performance gains in number of messages

Page 40: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Unstructured P2P SystemsSearch with Probabilistic

Guarantees

Page 41: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Unstructured P2P Networks Randomized Algorithms

• Randomization and Distributed SystemsBreak symmetry (e.g. Ethernet backoff)Load balancing.Fault handling.Solutions to problems that are unsolvable

deterministically. (e.g., Consensus under failure)

• Uniform sampling is a key component in the development of randomized algorithms.

Page 42: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Uniform Sampling

• Substrate for the development of randomized algorithmsSearch with probabilistic guarantees.Duplicate elimination. Job distribution with load balancing.

• Definition: An algorithm samples uniformly at random from

a set of nodes in a connected network if and only if it selects a node i belonging to the network with probability 1/n, where n is the number of nodes in the network.

Page 43: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Uniform Sampling• In a complete network, the problem is trivial.

• Random walks of a prescribed minimum length gives random sampling – independent of origin of walk.

• A long walk reaches stationary distribution π,

πi = di / 2|E|. Not a uniform sample if network nodes have different

degrees.

• In [Awan et al 04], we study different algorithms that change the transition probabilities among neighbors in order to achieve uniform sampling using random walks. We show, using simulation, that the length of the random walk is O(log n).

Page 44: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Search with Probabilistic Guarantees

Algorithm• To share its content with other peers in the network, a

peer p installs references to each of its objects at a set Qp of peers. Any metainformation can be published.

• To provide guarantees that content published by a peer p can be found by any other peer in the network, three fundamental questions need to be answered:

1. Where should the nodes in Qp be located in the network?

2. What is the size of the set Qp?3. When a peer q attempts to locate an object, how many

peers must q contact?

Page 45: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Search with Probabilistic Guarantees

Algorithm

• We select nodes in Qp uniformly at random from the network. It provides fault tolerance. In the event of a node failure, the

node can be easily replaced. It facilitates search, since there is no consistent global routing

infrastructure.

• Questions 2 and 3 can be answered using the “birthday paradox” (or a “balls and bins” abstraction). “How many people must there be in a room before there is a

50% chance that two of them were born on the same day of the year?” (at least 23)

If we set Qp to and we search in an independent set of the same size, we have high probability of intersection.

nn ln

Page 46: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Search with Probabilistic Guarantees

Controlled Installation of References• If every replica inserts reference pointers, popular

objects may have their reference pointers on all peers.• We use a probabilistic algorithm to decide if a node

should install pointers to its objects.• When a peer p joins the network, it sends a query for

an object using a random walk of length γ√n: If the query is unsuccessful, then p installs the pointers with

probability one. If the query is successful and the responding peer q is at a

distance l from p, then p installs pointers with probability l/γ√n.

Page 47: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Search with Probabilistic Guarantees

Simulation Setup• Overlay topology composed of 30,607 nodes Partial view of the Gnutella network. Power-law graph. Random graph.

• Node dynamics: Static Dynamic: simulates changes of connections Failures without updates Failures with updates

• Measurements: Distribution of replication ratios. Average number of hops (unbounded TTL). Percentage of query failures (bounded TTL). Percentage of object owners that install pointers. Number of messages per peer.

Page 48: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Search with Probabilistic Guarantees

Simulation Results• Cohen and Shenker [SIGCOMM 2002] showed that a replication

proportional to the square root of the access frequency is optimal.• Distribution of replication ratios.

Page 49: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Search with Probabilistic Guarantees

Simulation Results• Percentage of failures of a query as a function of object

popularity (left γ=1, right γ=2).

Page 50: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Search with Probabilistic Guarantees

Simulation Results

• Fraction of replicas installing pointers.

Page 51: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Unstructured P2P SystemsDuplicate Elimination

Page 52: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Duplicate Elimination• Consider a peer-to-peer storage system, designed

without any assumptions on the structure of the overlay network, and which contains multiple peers, each peer holding numerous files.

• How can a peer determine which files need to be

kept using minimum communication overhead?

• How can the storage system as a whole make sure that each file is present in at least k peers, where k is a system parameter chosen to satisfy a client or application’s availability requirements?

Page 53: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Duplicate Elimination• Problem can be abstracted to a relaxed and probabilistic

version of the leader election problem.

• Divide the process of electing a leader into two phases

• First Phase: Reduce the number of potential leaders by half in every round. Number of messages exchanged in the system is O(n) At the end of the first phase, have at least C contenders.

(0 < C < sqrt(n/lnn))

• Second Phase: Use birthday paradox (Probabilistic Quorum) solution.

Page 54: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Duplicate Elimination

• Contender – Node that wants to be a leader

• Mediator – Node that arbitrates between any two contenders for a given round

• Winner – Node that proceeds as contender to the next round

Page 55: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Duplicate Elimination

• Each contender sends sqrt(n ln2/(E[Xi]-1)) in round i of the first phase. Where E[Xi] is the expected number of contenders in round i.

• The total number of messages in the first phase is O(n).

• Each contender sends sqrt(nln n) messages in the second phase.

• The total number of messages in second phase is O(n).

Page 56: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Illustration of First Phase

Round 1:

F

G

E

D

H

A B

C

DD

AAHH

G,FG,F

B,EB,E

CC

Numbers are for illustrative purposes

Page 57: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Illustration of First Phase

Round 1:

F

G

E

D

H

A B

C

DD

AAHH

G,FG,F

B,EB,E

CC

A,C,D,H proceed to round 2

Page 58: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Illustration of First Phase

Round 2:

F

G

E

D

H

A B

CAA

HHHH

DD C,DC,D

CC

AA

Page 59: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Illustration of First Phase

Round 2:

F

G

E

D

H

A B

CAA

HHHH

DD C,DC,D

CC

AA

A, H proceed to 2nd phase

Page 60: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Illustration of Second Phase

F

G

E

D

H

A B

CAA

HHH,AH,A

AA

HH

57

98

Page 61: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Illustration of Second Phase

H is the leader

F

G

E

D

H

A B

CAA

HHH,AH,A

AA

HH

57

98

Page 62: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Duplicate EliminationSimulation Setup

• Power-law random graph with 50,000 nodes.• One object is replicated at the nodes (1% to 50%)• Measurements

Message overheadLoad distributionAccuracy of the protocols

Page 63: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Duplicate Elimination• Total number of messages in the system vs percentage of

replicas.

Page 64: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Duplicate Elimination

• Number of messages received per node.

Page 65: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Duplicate EliminationPlanetLab Experiment

• 132 PlanetLab nodes.• File traces from Microsoft

10,568 file systems4,801 Windows machines10.5TB of data

• Measurements:Message overheadSpace reclaimedMemory overhead per node

Page 66: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Duplicate Elimination

• Number of messages received per node.

Page 67: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Conclusion

• Problems relating to search, semantic queries and resource management have been addressed for structured and unstructured P2P systems.

• Other results (not presented) include cycle sharing infrastructure in unstructured networks [Parallel Computing 06].

Page 68: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Conclusion

• A number of challenges still remain:

Page 69: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Acknowledgments

• Professors Ananth Grama and Suresh Jagannathan (Advisors).

• Professors Sonia Fahmy and Zhiyuan Li (Committee members).

• My labmates: Asad, Deepak, Jayesh, Mehmet, Metin, Murali.

Page 70: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Extra Slides

Page 71: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Motivation - P2P Systems

• Proliferation of cheap computing resources: cpu, storage, bandwidth.

• Global self-adaptive systemUtilize resources wherever possible.Localize effects of single failures.No single point of vulnerability.

Page 72: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Motivation - P2P Systems

• Reduce reliance on serversEliminate bottlenecks, improve scalability.Lower deployment costs and complexity.Better resilience – no single point of failure.

• Direct client connectionFaster data dissemination.

Page 73: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Related Work

• Duplicate EliminationSALAD

• P2P AlgorithmsCohen and ShenkerChord/Pastry/Tapestry/CAN

Page 74: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Uniform Sampling• In a complete network, the problem is

trivial.

• Random walks of a prescribed minimum length gives random sampling – independent of origin of walk.

• Random walks are conveniently analyzed using Markov model:Represent the network as a graph G(V,E), and

define probability matrix P, where pij = 1/di, (i,j) ϵ E.

A long walk reaches stationary distribution π, πi = di / 2|E|.

Not a uniform sample if network nodes have non-uniform degree distribution.

Page 75: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Uniform Sampling

• Random sampling using random walk on a power-law graph

Page 76: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Uniform Sampling

• To get uniform sampling we need to change the transition probabilities.

• Symmetric transition probability, or doubly stochastic, matrix yields random walks with πi = 1/ n.

• Length of walk = O(log n).

Page 77: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Uniform Sampling

• Aim: Design distributed algorithms to locally compute transitions between neighboring nodes, resulting in a global transition matrix with stationary uniform distribution.

• Known algorithms:Maximum-Degree Algorithm:

otherwise 0

if /1

if /1

max

max

jidd

jid

p iij

Page 78: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Uniform Sampling

Metropolis-Hastings Algorithm:

otherwise 0

if 1

if ),max(/1

jip

jidd

pj

ij

ji

ij

Page 79: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Uniform Sampling• Random Weight Distribution (RWD) [Awan et al

04]Initialization: Assign a small constant transition

probability,1/ρ (system parameter ρ ≥ dmax), to each edge. This leaves back a high self-transition probability (called weight)

Iteration: Each node i, randomly distributes its weight to neighbors by incrementing transition probability with them symmetrically – using ACKs and NACKs.

Termination: For node i, either pii = 0 or pij = 0 is a neighbor of i.

Each increment is done by a quantum value (system parameter).

Page 80: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Uniform Sampling

• Node sampling probability at 3log n for RWD and MH

Page 81: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Uniform Sampling

• Node sampling probability at 5log n for RWD and MH

Page 82: Distributed Algorithms for Peer-to-Peer Systems Ronaldo Alves Ferreira PhD Thesis Advisors: Ananth Grama and Suresh Jagannathan Department of Computer

Potential Scenarios

• Communication Instant messagingVoice, video

• CollaborationProject workspacesFile sharingGaming

• Content distributionSports scores, weather, news, stock tickers, RSSFile bulk transfer, streamed media, live content