Post on 22-Dec-2015
Improving Data Access in P2P Systems
Karl Aberer and Magdalena Punceva, Swiss Federal Institute of Technology
Manfred Hauswirth and Roman Schmidt, Technical University of Vienna
Outline
• Introduction
• Gnutella
• Gridella
  • P-Grid
  • Search Algorithm
  • Construction Algorithm
  • Trie Construction Algorithm
  • Mapping Filenames to Binary Keys
  • Core System Components
  • Communication Model
• Performance Comparison
• Future Work
Introduction
Client-server-based systems:
• Resources are concentrated
• Servers’ network bandwidth must be increased
• Caching, replication, load-balancing and fault-tolerance algorithms were introduced as remedies
P2P systems:
• Every node (peer) acts as both client and server
• The P2P approach circumvents many problems of client-server systems but results in considerably more complex searching, node organization, security, and so on
• Napster, Gnutella, Gridella, …
Gnutella (1)
Decentralized system
Uses Ping, Pong, Query, QueryHit and Push messages with a TTL field
Connection setup:
• A sends a Ping to B.
• B responds with a Pong to A, and forwards the Ping to C and D, who respond with another Pong.
• After some time, A knows other peers and vice versa.
Query:
• A initiates Query messages as described above.
• When A receives a QueryHit, it runs a simplified HTTP GET interaction to retrieve the file.
• If the requested peer is behind a firewall, A might send a Push message.
Gnutella (2)
From a user’s view:
• Simple, effective for high hit rates
• Fault tolerant toward peer failures
• Adapts well to dynamically changing peer populations
From a networking perspective:
• The price is very high bandwidth consumption
• Each node receiving the broadcast search request scans its local database for possible hits
• With TTL = 7 and C = 4 connections per peer, the total number of messages originating from one Gnutella message is:
$$2 \cdot \sum_{i=0}^{TTL} C \cdot (C-1)^i = 26240$$
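The message count above can be checked numerically. The following is a minimal sketch; the function name `gnutella_message_count` is hypothetical and only evaluates the sum for a given TTL and number of connections C per peer.

```python
def gnutella_message_count(ttl: int, connections: int) -> int:
    """Evaluate 2 * sum_{i=0}^{ttl} C * (C - 1)^i for C = connections."""
    return 2 * sum(connections * (connections - 1) ** i for i in range(ttl + 1))


# Values from the slide: TTL = 7, C = 4 connections per peer.
print(gnutella_message_count(7, 4))  # 26240
```

The total grows exponentially with the TTL, which is why flooding is so costly in bandwidth.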
Gnutella (3)
Free-riding:
• Users provide no files (or few interesting files) to share
• Nearly 70% of Gnutella users share no files, and nearly 50% of all responses are returned by the top 1% of the sharing hosts.
• This transforms Gnutella into a client-server-like system that might face technical and legal issues similar to Napster’s
Reputation:
• Peers frequently meet unknown peers and have no way to judge their reputations
Gridella
• Gridella is based on the Peer-Grid (P-Grid) approach, which is a virtual binary search tree that distributes replication over a community of peers.
• Search time and the number of generated messages grow as O(log₂ n) with the number of data items n
• Peers perform construction and search/update operations without any central control or global knowledge in an unreliable environment
P-Grid’s Structure
• It’s completely decentralized.
• All peers serve as entry points for search.
• Interactions are strictly local.
• It uses randomized algorithms for access and search.
• Probabilistic estimates of search request success can be given.
• Search is robust against node failures.
• It scales gracefully in the total number of nodes and data items.
P-Grid
Each peer is responsible for part of the overall tree. When a peer receives a query it cannot answer, it refers to its routing table to find the appropriate peer to forward the request to.
P-Grid Network
Peer routing tables provide at least one path from any peer receiving a request to one of the peers holding a replica so that any query can be satisfied regardless of the peer queried.
Search Algorithm
The algorithm compares the common prefix of the peer’s path to the query submitted to find the “closest” peer.
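The prefix comparison can be illustrated with a small sketch. It assumes each peer holds a binary path and a routing table `refs` that maps a bit position to peers whose path differs at that bit; the names `Peer`, `refs`, `online`, `common_prefix_len` and `search` are hypothetical, and the four-peer network is invented for illustration, not taken from the slides.

```python
class Peer:
    def __init__(self, path):
        self.path = path    # binary string this peer is responsible for
        self.refs = {}      # level -> peers whose path differs at that bit
        self.online = True  # assumed flag to model unreachable peers


def common_prefix_len(a, b):
    """Length of the longest common prefix of two bit strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def search(peer, key, hops=0):
    """Return (responsible peer, hop count), or None if no route is reachable."""
    level = common_prefix_len(peer.path, key)
    if level == len(peer.path) or level == len(key):
        return peer, hops  # this peer's path covers the key
    for candidate in peer.refs.get(level, []):
        if candidate.online:
            result = search(candidate, key, hops + 1)
            if result is not None:
                return result
    return None


# Hypothetical four-peer network covering the paths 00, 01, 10 and 11.
p00, p01, p10, p11 = Peer("00"), Peer("01"), Peer("10"), Peer("11")
p00.refs = {0: [p10], 1: [p01]}
p01.refs = {0: [p11], 1: [p00]}
p10.refs = {0: [p00], 1: [p11]}
p11.refs = {0: [p01], 1: [p10]}

found, hops = search(p00, "1101")
print(found.path, hops)  # 11 2
```

Each forwarding step lengthens the common prefix by at least one bit, so the number of hops is bounded by the key length.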
Construction Algorithm
When two peers meet, they divide the search space. Each takes responsibility for one half and stores the address of the other peer to cover the other half.
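A minimal sketch of this basic exchange step follows. It covers only the case described above, where two peers with identical paths meet and split the remaining search space; the names `Peer`, `exchange` and `max_len` are hypothetical, and the other cases that a full construction algorithm would have to handle (peers with different paths, recursive exchanges) are deliberately omitted.

```python
class Peer:
    def __init__(self, name):
        self.name = name
        self.path = ""   # binary path; empty means "responsible for everything"
        self.refs = {}   # level -> peers covering the other half at that level


def exchange(a, b, max_len=4):
    """Split the search space between two peers that currently share a path.

    Returns True if a split happened, False otherwise.
    """
    if a.path != b.path or len(a.path) >= max_len:
        return False  # cases outside this simplified sketch
    level = len(a.path)
    a.path += "0"     # a takes the lower half
    b.path += "1"     # b takes the upper half
    a.refs.setdefault(level, []).append(b)  # a remembers who covers the "1" side
    b.refs.setdefault(level, []).append(a)  # b remembers who covers the "0" side
    return True


a, b = Peer("A"), Peer("B")
exchange(a, b)
print(a.path, b.path)  # 0 1
```

After the exchange each peer can answer queries for its own half and forward the rest through the stored reference.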
Construction Simulation
• Each peer participates in a constant number of exchanges independent of the population size.
• It scales gracefully as maximum path length grows.
• To obtain fast convergence, the maximum allowed recursion depth should exceed a minimum value.
• The number of peers responsible for the same keys is distributed uniformly with a low deviation from the expected average number of peers responsible for a key.
Trie Construction Algorithm
The algorithm constructs a balanced trie structure, which the mapping algorithm uses to compute binary search keys.
Mapping Strings into Binary Keys
The mapping algorithm uses the trie structure to map strings to binary keys.
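The slides do not spell out the mapping itself, so the sketch below substitutes a simpler stand-in: a binary tree built by splitting a sorted sample of names at the median, which yields balanced, order-preserving binary keys. This is an assumption for illustration, not the authors' trie construction; the function names `build_splits` and `to_key` and the sample data are hypothetical.

```python
def build_splits(sample, depth):
    """Recursively split a sorted sample at its median.

    Returns a nested (pivot, left, right) tuple, or None at a leaf.
    """
    if depth == 0 or len(sample) < 2:
        return None
    mid = len(sample) // 2
    return (
        sample[mid],
        build_splits(sample[:mid], depth - 1),
        build_splits(sample[mid:], depth - 1),
    )


def to_key(name, node):
    """Walk the split tree: append '0' below the pivot, '1' otherwise."""
    bits = ""
    while node is not None:
        pivot, left, right = node
        if name < pivot:
            bits += "0"
            node = left
        else:
            bits += "1"
            node = right
    return bits


# Hypothetical, already sorted sample of names.
sample = ["abba", "beatles", "coldplay", "dylan",
          "eagles", "fugees", "genesis", "heart"]
tree = build_splits(sample, depth=3)
print(to_key("abba", tree), to_key("heart", tree))  # 000 111
```

Because the splits follow the sample's distribution rather than the raw characters, frequently occurring name ranges receive their own key prefixes, which is the balancing effect that the trie is meant to provide.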
Gridella Core System Components
The Gridella client provides user-related functionality, while the server handles data management and communication.
Gridella Communication Model
Queries are mapped into binary keys and sent to the local Gridella server, which either answers the query or forwards it to the appropriate peer.