peer-to-peer computing ding choon hoong grid computing and distributed systems (grids) lab. the...
TRANSCRIPT
Peer-to-Peer Computing
Ding Choon HoongGrid Computing and Distributed Systems
(GRIDS) Lab. The University of MelbourneMelbourne, Australiawww.gridbus.org
WW Grid
An Introduction
Outline
What is Peer-to-Peer Computing? P2P Topologies Example P2P Applications Some key issues Conclusion
What is peer-to-peer (P2P) computing?
Webster definition Peer: one that is of equal standing with another
Computing between equals
Resource Sharing
Exploit idle resources available in the edges
E.g. CPU idle cycles, unused storage space, spare network bandwidth,…
Exploit plentiful resources among network edges
E.g. network bandwidth
Federated cooperation among companies Sharing unavailable resources (e.g. databases)
……
Client-Server vs. P2P
Client-Server paradigm The client is a dumb device The server performs all computation, stores data, and
handle the control Simple architecture, but introduces:-
Performance bottlenecks, single point of failure, etc.
Each peer in P2P can be Client Server Intermediate: relaying requests/responses
History of P2P
Origin of P2P dates back to ARPANET Early P2P applications/servers is Usenet
and DNS 1990s: Shift in paradigm to client-server 1999: Napster => explosion of P2P usage 2000s: Gnutella, Kazaa, Audiogalaxy, etc.
Cluster, Grid, P2P: Characteristics
Characteristic Cluster Grid P2P
Population Commodity Computers
High-end computers Edge of network (desktop PC)
Ownership Single Multiple Multiple
Discovery Membership Services
Centralised Index & Decentralised Info
Decentralized
User Management Centralised Decentralised Decentralised
Resource management
Centralized Distributed Distributed
Allocation/Scheduling Centralised Decentralised Decentralised
Inter-Operability VIA based? No standards yet No standards
Single System Image Yes No No
Scalability 100s 1000? Millions? [@Home]
Capacity Guaranteed Varies, but high Varies
Throughput Medium High Very High
Speed(Lat. Bandwidth) Low, high High, Low High, Low
Types of P2P applications
Instant messaging Managing and sharing information Collaboration Distributed Services …more to come?
Generic P2P Topologies
Centralized Topology
Generic P2P Topologies (cont)
Ring Topology
Generic P2P Topologies (cont)
Hierarchical Topology
Generic P2P Topologies (cont)
Decentralized Topology
Generic P2P Topologies (cont)
Hybrid Topology Centralized and Ring
Topology
Generic P2P Topologies (cont)
Hybrid Topology Centralized and
Centralized Topology
Generic P2P Topologies (cont)
Hybrid Topology Centralized and
Decentralized Topology
Example P2P Applications
SETI@home Napster Gnutella FastTrack
SETI@home
SETI@home uses the National Astronomy and Ionospheric Center's 305 meter telescope at Arecibo, Puerto Rico.
A screenshot of the SETI@home client program. •2.4 mil volunteers as of Oct. 2000
Napster
Centralized MP3 file sharing Clients/Peers hold the files Servers holds catalog and broker
relationships Clients upload IP address, music file shared, and
requests Clients request locations where requests can be met
File transfer is P2P – proprietary protocol
Napster (cont)
NapsterClient
Napster Client
Napster Client
Napster Client
Napster Client
Napster Connection Host
Napster Index Server
Query
Direct File Transfer
Napster Server Cluster
Assigned Index Server
Connect
1
2
Reply
3
4
Gnutella
Completely decentralized – no servers with catalogs
Shares any files Gnutella node ---- SERVENT
Issue the query and view search result Accept the query from other SERVENTs and check the
match against its database and response with corresponding result
Gnutella (cont)
Joining the network:- The new node connects to a well-known SERVENT Then sends a PING message to discover other nodes PONG message are sent in reply from hosts offering
connections with the new node Direct connection are then made
Gnutella (cont)
Searching a file:- A node broadcasts its QUERY to all its peers who in turn
broadcasts to their peers Nodes route back QUERYHITS along the QUERY path
back to the sender containing the location detail To download the files a direct connection is made using
details of the host in the QUERYHIT message
Gnutella (cont)
Gnutella broadcasts its messages. To prevent flooding -TTL is introduced. To prevent forwarding same mesg. twice -
each servent maintains a list of recently seen mesg.
Gnutella (cont)
GnuCache
A
User A connects to the GnuCache to get the list of available servents already connected in the networkGnuCache sends back the list to the user AUser A sends the request message GNUTELLA CONNECT to the user BUser B replies with the GNUTELLA OK message granting user A to join the network
B
D
C
F
G
E
H
J
I
(1) (2)
(3)
(4)(1)
(1)
(1)
(2)
(2)
(2)
(3)
(3)
(3) (3)
(2)
Gnutella (cont)
Typical query scenario:- A sends a query message to its neighbor, B B first checks that the message is not an old one Then checks for a match with its local data If there is a match, it sends the queryHit message back
to user A B then decrements TTL by 1 and forwards the query
message to users C, D, and E C, D, and E performs the same steps as user B and
forwards the query message further to users F, G, H, and I
Gnutella (cont)
Problems Broadcast mesg. congests the network Lost of reply packets (dynamic environment)
FastTrack
Hybrid between centralized and decentralized
Has 2 tiers of control:- Ordinary nodes that connect to super nodes in a
centralized fashion Super nodes that connect to each other in a
decentralized manner
FastTrack (cont)
FastTrack (cont)
Joining the network? - Bootstrapping node Querying?
Problems (Like Gnutella) Broadcast mesg. between Super Nodes Lost of reply packets
Some key issues
Scalability Networks can grow to millions of nodes Challenge in achieving efficient peer and resource
discovery High amount of query/response traffic
Availability Potential for commercial content provision Such services require high availability and accessibility
Anonymity What is the right level of anonymity?
Some key issues (cont)
Security Due to open nature, have to assume environment is
hostile Concerns include:
Privacy and anonymity File authenticity Threats like worms and virus
Fault Resilience The system must still be able to function even though
several important nodes goes off-line.
Some key issues (cont)
Standards and Interoperability Lack of standards lead to poor interoperability between
applications Can be improved by using common protocols
Copyright / Access Control Classic case of Napster being shut down Other applications have learned to get around the law Possibility of paid access in future
Some key issues (cont)
Quality of Service (QoS) Metrics to be used is not clearly defined Tradeoff between achieving QoS and costs
Complexity of Queries Must be able to support query languages of varying
degree of expressiveness Simple keywords to SQL-like searches
Search Mechanism Different search algorithms are used to reduced search
time and maximize search space
Load Balancing existence of hot-spots (overloaded nodes) due to:
uneven node distribution throughout logical space uneven object distribution among nodes uneven demand distribution among objects
query and routing hot-spots
Self-organization Ability to adapt itself to the dynamic nature of the
Internet Depends on the architecture of the system
Conclusion
Different P2P network topologies Examples of different P2P applications Key issues related to P2P Further reading:-
http://www.gridbus.org/~raj/papers/P2PbasedContentSharing.pdf