Peer To Peer Systems
Architecture & Research Overview
by
Shay Horovitz
Lecture contents
What is Peer-to-Peer computing
What makes it distinctive
What are the potentials
Examples of existing applications
Possible applications
Research item – Distributed trie
Summary
Introduction
Peer-To-Peer Computing Today
Based on research by the Swedish Institute of Computer Science (SICS)
Rise and Fall of P2P in Media
“Peer-to-peer is the next great thing for Internet” Stanford Law Net Guru – Lawrence Lessig, 2000
“Peer-to-peer computing is leading us into the 3rd age of the internet” Bob Knighton – Intel Corp, Fall 2000
“Is P2P plunging off the deep end?” Wall Street Journal, April 2, 2001
“Does Peer-to-Peer Suck?” Jon Katz – Slashdot, April 4, 2001
What’s P2P computing ?
Webster definition of peer:
“one that is of equal standing with another”
P2P Computing – Computing between equals
What makes it distinctive ?
A class of applications that take advantage of resources – storage, CPU, content – available at the edges of the network
Edges? Users and their PCs and devices, often:
Without a permanent IP address
Turned off!
The Contrast: Server-Centric
The Client/Server paradigm:
The client is basically a glorified I/O device
Information, control and computation are kept at the server
Centralized systems are simpler to build
Client Server & ‘Problems.COM’
A typical Client-Server architecture:
Server is a “super computer”
Addresses/ports of servers are known
MANY clients to ONE server
Client is just a “monitor”
Server is down = network is down
Server is expensive
Scalability – more clients = more servers
Client Server Solutions
Replication (many servers): expensive, synchronization and much more…
Brute force (a faster server): expensive, poor scalability, single point of failure
P2P Topologies
Centralized
Hierarchical
Decentralized
Hash Circle
Decentralized with Super Nodes
Topologies - Centralized
Like Client-Server: many clients and one server entity (a single server or a group of servers)
Used in Napster
The server acts like a “144” directory service – it just helps to initiate the communication
Simple to design
Ways of action:
Client sends the server a query; the server asks everyone and responds to the client
Client gets a list of clients from the server
All clients send the IDs of the data they hold to the server; when a client asks for data, the server responds with specific addresses
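The third way of action above can be sketched as follows. This is a minimal illustration of a Napster-style central index, not any real protocol: the `IndexServer` class, method names, and addresses are all hypothetical.

```python
# Hypothetical sketch: clients register the IDs of the data they hold
# with a central index server; a lookup returns the addresses of peers
# holding that ID. The transfer itself would then happen peer-to-peer.

class IndexServer:
    def __init__(self):
        self.index = {}  # data ID -> set of peer addresses

    def register(self, peer_addr, data_ids):
        # A client announces the IDs of the data it holds.
        for data_id in data_ids:
            self.index.setdefault(data_id, set()).add(peer_addr)

    def lookup(self, data_id):
        # The server responds with the specific addresses only;
        # it never touches the data itself.
        return self.index.get(data_id, set())

server = IndexServer()
server.register("10.0.0.1:6699", ["songA", "songB"])
server.register("10.0.0.2:6699", ["songB"])
print(server.lookup("songB"))  # both peers hold songB
```

Note how the server stores only metadata – this is why the Napster-style server stays cheap while the heavy traffic flows between peers.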
Topologies - Hierarchical
Servers are organized in a tree Suits for communication between
“hierarchical objects” like companies, organizations – Inside P2P, Outside Client-Server
Suits for security architectures like Certificate Authority
Ways of action:
Much like the Centralized topology
Policy rules can be set at the level of servers
A server forwards queries to its ancestor when needed
Topologies - Decentralized
It’s the “pure” P2P topology
No servers (well, maybe just one!)
The topology changes as peers join/leave the network
Mainly, the topology is based on the “logical” behavior of the peers
Ways of action:
A peer sends requests to its “neighbors”
Neighbors route the requests to their own neighbors
Many messages may be dropped, since “weak” peers might not work as fast as needed
In the future, special algorithms will dictate the behavior of this topology
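The neighbor-flooding behavior above can be sketched as a breadth-first search with a hop limit (TTL), roughly as Gnutella does. The graph, the TTL value, and the function name are illustrative, not taken from any specific protocol.

```python
# Illustrative sketch of flooding in a decentralized topology: a query
# spreads from neighbor to neighbor until the TTL (hop limit) runs out.

def flood_lookup(peers, start, key, ttl):
    """peers: {name: {"neighbors": [...], "data": {...}}}"""
    hits, visited, frontier = [], {start}, [start]
    while frontier and ttl > 0:
        next_frontier = []
        for p in frontier:
            if key in peers[p]["data"]:
                hits.append(p)  # this peer holds the requested key
            for n in peers[p]["neighbors"]:
                if n not in visited:
                    visited.add(n)
                    next_frontier.append(n)
        frontier = next_frontier
        ttl -= 1  # each hop consumes one unit of TTL
    return hits

peers = {
    "A": {"neighbors": ["B", "C"], "data": {}},
    "B": {"neighbors": ["A", "D"], "data": {}},
    "C": {"neighbors": ["A"], "data": {"song": "bytes"}},
    "D": {"neighbors": ["B"], "data": {"song": "bytes"}},
}
print(flood_lookup(peers, "A", "song", ttl=2))  # reaches C, but D is too far
```

The TTL is what keeps flooding from swamping the whole network, at the cost of possibly missing data held by distant peers.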
Topologies – Hash Circle
Mainly for file sharing and storage distribution
All resources are represented by a hash value
Only “Exact” searches are allowed
Ways of action:
When a peer joins, it takes responsibility for part of the hash space
Each peer knows its neighbors in the hash space and a few other randomly chosen peers
Requests are forwarded to the node closest to the hashed query
Requires O(log N) forwards = low bandwidth
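The ownership rule behind the hash circle can be sketched as follows. This only shows how a key maps to the responsible peer; real systems like Chord add extra routing state (finger tables) to reach the O(log N) forwarding bound mentioned above. `RING`, `h`, and `owner` are illustrative names, and the ring size is deliberately tiny.

```python
import hashlib

# Illustrative sketch of the hash circle: peers and keys share one hash
# space, and a key is served by the first peer at or after the key's
# position, wrapping around the circle.

RING = 2 ** 16  # a small hash space for the example

def h(s):
    return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big") % RING

def owner(peers, key):
    """peers: sorted list of ring positions; returns the position owning key."""
    k = h(key)
    for p in peers:
        if p >= k:
            return p
    return peers[0]  # wrap around the circle

peers = sorted(h(f"peer{i}") for i in range(8))
print(owner(peers, "some-file.mp3"))
```

Because resources are located by hash value, only “exact” searches work – there is no position on the circle for a fuzzy or keyword query.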
Topologies – Decent’ + Super Nodes
A newer topology
Still no servers (at least no expensive ones)
Used in iMesh, Kazaa
Slow peers do not slow down the search
Ways of action:
A super node is a normal node that is elected to act as a local server
Super nodes are usually elected for their bandwidth
Requests are forwarded from slow peers to super nodes
P2P Application
An application is P2P if it:
• Allows for variable connectivity & temporary network addresses
• Gives the nodes at the edges of the network significant autonomy
Another Point Of View
In P2P, peers in relation to each other act as: Clients AND Servers AND Routers AND Caches AND… EVERYTHING
What makes (made) it so hyped ?
Industry was looking for something positive after the .com crash
Has large social consequences: the Internet has already changed society, and we can expect further changes
Some very interesting applications became widely known and used: Napster, iMesh, Gnutella, FreeNet, Kazaa, Morpheus, CuteMX, Scour…
Potential of P2P
Better resource utilization
Scalability
Fault tolerance
Denial-of-service tolerance
Research item
“Efficient Peer-to-Peer Lookup Based on a Distributed Trie”
Michael J. Freedman – MIT Lab for Computer Science
Radek Vingralek – InterTrust STAR Lab
Published 2002
Until Now…
Two main approaches for lookup:
Broadcast searches (Gnutella)
Location-deterministic algorithms (Chord)
The new approach: a distributed trie!
What’s a Trie ?
A trie is a tree that stores a string by representing each character in the string as an edge on the path from root to leaf
Trie example
Words :
•Then
•Them
•Those
•Toss
•Ball
The lookup scales
A tradeoff: lookup efficiency vs. maintenance cost
Efficient lookup methods
Replicating the lookup structure on every peer
BUT – slow maintenance
How to reduce maintenance costs ?
Reducing maintenance costs
1st approach: eliminate the lookup structure – thus there is NO maintenance
Lookups are “broadcast-like” – costing efficiency and scalability
Implemented in Gnutella
Reducing maintenance costs
2nd approach: partition the lookup structure
Distribute subsets of the partitions on each peer
Peers update only a small number of replicas
The system assigns partitions to peers by:
Static assignment
Dynamic assignment
Static partitions assignment
Each node replicates only partitions that are “close” to its address
Dynamic partitions assignment
Each node replicates only partitions that are frequently accessed by the node
Relaxing the consistency
Update local lookups “lazily” – only when a node actually gets a request for a key!
BUT – what do we get out of this?
Relaxing
The Good
Should reduce maintenance cost, since we actually send fewer updates
Relaxing
The Bad
Peers hold stale replicas because of the “lazy” updates of the local lookup structures
Relaxing
and… the UGLY
Limit addressing errors by piggybacking the updates on other traffic
Nature of the offered algorithms
Use dynamic partitioning based on peers’ access locality
Use Lazy updates to reduce maintenance cost
Piggyback trie state on lookup responses only
Use Timestamping to reconcile conflicting updates
Algorithm differences
Difference in the volume of the trie structure piggybacked
Difference in how aggressively the requester uses the partitions
Security in a trie…
Who knows who the caller is?
Who knows who the callee is?
The system model
Lookup(key) – the callee sends the caller the value associated with the key if successful, or a failure message
Insert(key, value) – the callee inserts a <key, value> pair into its lookup structure
Join() – the callee sends the caller the initial state needed to bootstrap lookup operations
Back to Maintenance…
Update a value? NO!
Delete a value? Oh NO!
So what’s left? Re-insert!
Can re-insert under the same key
Can re-insert under other keys
“Actual deletion” of old info is performed according to the timestamp value
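The re-insert idea above can be sketched as follows: there is no in-place update or delete; a newer <key, value, timestamp> entry simply wins, and stale info falls away when replicas are merged. The flat-dict representation is illustrative, standing in for the trie state.

```python
# Illustrative sketch of "re-insert" maintenance: timestamps reconcile
# conflicting updates, so merging two replicas needs no coordination.

def reinsert(store, key, value, ts):
    # Keep the entry only if it is newer than what we already hold.
    if key not in store or store[key][1] < ts:
        store[key] = (value, ts)

def merge(a, b):
    # Merging is insensitive to ordering: re-insert everything and let
    # timestamps decide which version of each key survives.
    merged = dict(a)
    for key, (value, ts) in b.items():
        reinsert(merged, key, value, ts)
    return merged

a = {"k1": ("old", 10)}
b = {"k1": ("new", 20), "k2": ("x", 5)}
print(merge(a, b))  # the newer timestamp wins for k1
```

This order-insensitivity is the same property that makes merging two versions of the trie easy, as noted below.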
A Closer Look at Dist’ Trie
What’s in it ?
Peer storage
Each peer holds a number of key/value pairs locally
The peer also stores partitions of the lookup structure, organized as a trie
Very important – a trie representation is insensitive to insertion order, so it’s easy to merge two versions of the lookup structure
Trie internals
A trie node consists of 2^m routing tables
Each routing table consists of L entries
Each entry in a table consists of a peer address ‘a’ and a timestamp ‘t’
Each level of the trie “consumes” m bits of the k-bit key
If the node is a leaf (at depth k/m), then peer ‘a’ was known at time ‘t’ to hold the replica of the i-th child of the node
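The bit-consumption rule above can be sketched as follows: splitting a k-bit key into m-bit chunks yields the child index chosen at each trie level. The value m = 2 mirrors the simulation parameters later in the lecture; the key is the one used in the lookup example below, and the function name is illustrative.

```python
# Illustrative sketch: each trie level "consumes" m bits of the k-bit
# key, and those m bits select one of the 2^m children (routing tables).

def key_to_path(key_bits, m):
    """Split a bit-string key into m-bit chunks = child index per level."""
    return [int(key_bits[i:i + m], 2) for i in range(0, len(key_bits), m)]

path = key_to_path("0101000000", 2)  # a 10-bit key, m = 2
print(path)  # → [1, 1, 0, 0, 0]; depth = k/m = 5 levels
```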
The Ancestor Invariant
All peers maintain the Ancestor Invariant: if a peer holds a trie node, it must hold all ancestors of that node
Conclusion – logically, nodes closer to the trie root are more widely replicated by peers, removing any single point of failure
Welcome to “PeerLand”
JOIN – in order to join the system, a peer must know the address of at least one participating peer, called its introducer.
The introducer sends its root routing table to the new client
Inserting Data
Performed locally by inserting a key/value pair in the local storage
Alternatively, a peer can send an insert request to other peers
General Lookup Algorithm
Lookup(keyName)
{
    value = LocalStorage.CheckForKey(keyName)
    If IsEmpty(value)
    {
        CreateProcess(DistributedLookup)
    }
}
Lookup example
Assume A calls lookup(0101000000):
keyName = 0101000000
A.LocalTrie.FindMatch(keyName)
Lookup example – cont’
B = A.currentNode.GetLatestUpdatedAddressInTable
If B.HasActualValue then B.returnValueTo(A)
Else B.returnNullTo(A)
“Else B.returnNullTo(A)” = failure – so A turns to the next “B” in the table, in decreasing TIMESTAMP order
If there are no more “B”s in the table, A calls the process on the parent’s routing table
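The fallback loop above can be sketched in a few lines: try the peers in the most specific routing table in decreasing-timestamp order, and backtrack to the parent’s table when all of them fail. The data structures and the `has_value` callback are illustrative stand-ins for the trie state and the network round-trip, not the paper’s exact representation.

```python
# Illustrative sketch of the lookup fallback: freshest entries first,
# then backtrack one trie level (one routing table) at a time.

def distributed_lookup(tables, has_value):
    """tables: routing tables ordered most specific first, so the next
    list element plays the role of the parent's table; each table is a
    list of (addr, timestamp). has_value: addr -> bool, standing in for
    asking the peer at addr over the network."""
    for table in tables:  # backtrack toward the root on failure
        for addr, _ts in sorted(table, key=lambda e: e[1], reverse=True):
            if has_value(addr):
                return addr  # this peer returned the value
    return None  # lookup failed even after contacting other peers

tables = [
    [("B1", 105), ("B2", 90)],   # most specific table for the key
    [("C1", 200), ("C2", 110)],  # parent's routing table
]
print(distributed_lookup(tables, lambda a: a == "C2"))  # falls back to the parent
```

Trying entries by decreasing timestamp favors the freshest (least likely stale) addresses first, which is exactly why lazy updates remain workable.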
Modes of action
Modes for exploring the tradeoff between the size of the piggybacked trie and the speed of convergence to an accurate map:
Bounded mode
Unbounded mode
Full Path mode
Bounded Mode
The callee sends its most specific routing table matching the key (or the value itself) only if its routing table is more specific (deeper) than the caller’s table.
Unbounded Mode
The callee sends its most specific routing table matching the key (or the value itself) REGARDLESS of the caller’s table.
WHY ???
Unbounded Mode
This might be useful for getting new peers when backtracking at higher levels of the trie!
Full Path Mode
The callee sends the caller all of its tables, from the root down to its most specific (deepest) table.
About Security
Most P2P lookup algorithms are susceptible to malicious behavior.
How can you fool the trie?
Security leaks
Possible answer:
Another possible answer:
Security Modes
Conservative mode – update the trie only with the tables of nodes that actually led to the information; ignore all other updates
Liberal mode – callers immediately update their local tries with any piggybacked state!
Conservative example
Experiments & Simulation
200 peers
L = 10 (size of the table in each node)
m = 2 (bits consumed per trie level)
For each step, 2000 random key/value pairs
During the simulation, peers were added/removed with probability 0.005
Failure Probability
A lookup fails when the requesting peer’s trie didn’t contain sufficient information to locate an existing key/value pair (even after contacting other peers during the lookup)
Probability of lookup failure
Message Overhead
A lookup can be:
• Local – satisfied locally
• Remote – requires contacting other peers
We measure the number of lookup operations that were sent to other peers in order to satisfy the request.
Bibliography
“Efficient Peer-to-Peer Lookup Based on a Distributed Trie”, IPTPS ’02, Cambridge – http://www.cs.rice.edu/Conferences/IPTPS02/167.pdf
“Peer-to-Peer Computing”, Swedish Institute of Computer Science (SICS), Stockholm – http://www.sics.se/~perbrand/open.pdf
“Distributed Hash Tables: Building Large-Scale, Robust Distributed Applications”, PODC ’02, Monterey – http://www.podc.org/podc2002/kaashoek.ppt
Gnutella website – http://gnutella.wego.com
“Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications”, ACM SIGCOMM ’01, San Diego
THANK YOU !!!
For not falling asleep : - )