Post on 15-Dec-2015

Peer-to-Peer Computing

Ding Choon Hoong
Grid Computing and Distributed Systems (GRIDS) Lab.
The University of Melbourne
Melbourne, Australia
www.gridbus.org

An Introduction

Outline

What is Peer-to-Peer Computing?
P2P Topologies
Example P2P Applications
Some key issues
Conclusion

What is peer-to-peer (P2P) computing?

Webster's definition of "peer": one that is of equal standing with another
Computing between equals
Resource sharing
  Exploit idle resources available at the network edges
    E.g. idle CPU cycles, unused storage space, spare network bandwidth, ...
  Exploit plentiful resources among network edges
    E.g. network bandwidth
  Federated cooperation among companies
    Sharing otherwise unavailable resources (e.g. databases)

……

Client-Server vs. P2P

Client-server paradigm
  The client is a dumb device
  The server performs all computation, stores data, and handles control
  Simple architecture, but it introduces performance bottlenecks, a single point of failure, etc.
Each peer in P2P can be
  a client
  a server
  an intermediary, relaying requests/responses
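The three roles a peer can play can be illustrated with a minimal in-memory sketch. The `Peer` class, its method names, and the hop limit are hypothetical and chosen only for illustration; no real networking is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """A hypothetical peer that can act as client, server, and relay."""
    name: str
    store: dict = field(default_factory=dict)      # resources this peer serves
    neighbors: list = field(default_factory=list)  # directly connected peers

    def handle(self, key, hops_left=2):
        """Server role: answer from the local store.
        Intermediary role: otherwise relay the request to neighbors."""
        if key in self.store:
            return self.name, self.store[key]
        if hops_left == 0:
            return None
        for peer in self.neighbors:
            answer = peer.handle(key, hops_left - 1)
            if answer is not None:
                return answer  # relay the response back toward the requester
        return None

    def request(self, key):
        """Client role: ask each neighbor for a resource."""
        for peer in self.neighbors:
            answer = peer.handle(key)
            if answer is not None:
                return answer
        return None


a, b, c = Peer("A"), Peer("B"), Peer("C", store={"song.mp3": b"..."})
a.neighbors = [b]   # A only knows B
b.neighbors = [c]   # B only knows C, so B must relay for A
result = a.request("song.mp3")
print(result)       # ('C', b'...')
```

Here A acts as a client, C as a server, and B as an intermediary that holds nothing itself but relays the request and the response.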

History of P2P

The origin of P2P dates back to the ARPANET
Early P2P applications/servers include Usenet and DNS
1990s: Shift in paradigm to client-server
1999: Napster => explosion of P2P usage
2000s: Gnutella, Kazaa, Audiogalaxy, etc.

Cluster, Grid, P2P: Characteristics

Characteristic             | Cluster             | Grid                                   | P2P
Population                 | Commodity computers | High-end computers                     | Edge of network (desktop PC)
Ownership                  | Single              | Multiple                               | Multiple
Discovery                  | Membership services | Centralized index & decentralized info | Decentralized
User management            | Centralized         | Decentralized                          | Decentralized
Resource management        | Centralized         | Distributed                            | Distributed
Allocation/scheduling      | Centralized         | Decentralized                          | Decentralized
Inter-operability          | VIA based?          | No standards yet                       | No standards
Single system image        | Yes                 | No                                     | No
Scalability                | 100s                | 1000?                                  | Millions? [@Home]
Capacity                   | Guaranteed          | Varies, but high                       | Varies
Throughput                 | Medium              | High                                   | Very high
Speed (latency, bandwidth) | Low, high           | High, low                              | High, low

Types of P2P applications

Instant messaging
Managing and sharing information
Collaboration
Distributed services
…more to come?

Generic P2P Topologies

Centralized topology
Ring topology
Hierarchical topology
Decentralized topology
Hybrid topology: centralized and ring
Hybrid topology: centralized and centralized
Hybrid topology: centralized and decentralized

Example P2P Applications

SETI@home
Napster
Gnutella
FastTrack

SETI@home

SETI@home uses the National Astronomy and Ionosphere Center's 305-meter telescope at Arecibo, Puerto Rico.

[Figure: a screenshot of the SETI@home client program]

• 2.4 million volunteers as of Oct. 2000

Napster

Centralized MP3 file sharing
Clients/peers hold the files
Servers hold the catalog and broker relationships
Clients upload their IP address, the music files they share, and their requests
Clients request locations where their requests can be met
File transfer is P2P, using a proprietary protocol
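The division of labor described above (a central catalog, files held by peers, direct transfer) can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: `IndexServer`, `Client`, and all method names and addresses are hypothetical, and nothing here reflects Napster's actual proprietary protocol.

```python
from collections import defaultdict


class IndexServer:
    """Hypothetical central catalog: maps file names to peer addresses."""

    def __init__(self):
        self.catalog = defaultdict(set)

    def register(self, address, shared_files):
        # A client uploads its address and the names of the files it shares.
        for name in shared_files:
            self.catalog[name].add(address)

    def locate(self, name):
        # The server only brokers: it returns locations, never file contents.
        return sorted(self.catalog.get(name, set()))


class Client:
    def __init__(self, address, files):
        self.address = address
        self.files = dict(files)  # file name -> content held by this peer

    def join(self, server):
        server.register(self.address, self.files)

    def download(self, name, server, peers):
        # Ask the index where the file lives, then fetch it peer to peer.
        for address in server.locate(name):
            if address != self.address:
                self.files[name] = peers[address].files[name]
                return address
        return None


server = IndexServer()
alice = Client("10.0.0.1", {"track.mp3": b"data"})
bob = Client("10.0.0.2", {})
peers = {c.address: c for c in (alice, bob)}
alice.join(server)
bob.join(server)
source = bob.download("track.mp3", server, peers)
print(source)  # 10.0.0.1
```

The server never touches file contents, so it remains a single point of failure for discovery only; the transfer itself goes from peer to peer.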

Napster (cont)

[Figure: Napster architecture. Components shown: Napster clients, a Napster connection host, an assigned Napster index server, and the Napster server cluster. Labeled steps (1 to 4) include Connect, Query, Reply, and Direct File Transfer.]

Gnutella

Completely decentralized: no servers with catalogs
Shares any type of file
A Gnutella node is called a SERVENT (server + client); it can:
  Issue queries and view search results
  Accept queries from other SERVENTs, check for matches against its own database, and respond with the corresponding results

Gnutella (cont)

Joining the network:
  The new node connects to a well-known SERVENT
  It then sends a PING message to discover other nodes
  PONG messages are sent in reply by hosts offering connections to the new node
  Direct connections are then made
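The join steps above can be sketched with a small in-memory model. The `Servent` class, the `accepts_connections` flag, and the default TTL are assumptions made for illustration; real servents exchange PING and PONG messages over the network, whereas this sketch simply returns the PONG senders from a function call.

```python
class Servent:
    """Hypothetical in-memory Gnutella-style node (no real networking)."""

    def __init__(self, name, accepts_connections=True):
        self.name = name
        self.accepts_connections = accepts_connections
        self.neighbors = set()

    def on_ping(self, ttl, seen):
        """Answer a PING with PONGs from this node and nodes further away."""
        if self.name in seen or ttl == 0:
            return []
        seen.add(self.name)
        pongs = [self] if self.accepts_connections else []
        for peer in self.neighbors:
            pongs.extend(peer.on_ping(ttl - 1, seen))
        return pongs

    def join(self, well_known, ttl=3):
        """Connect to a well-known servent, PING, then connect to PONG senders."""
        seen = {self.name}  # never answer our own PING
        for host in well_known.on_ping(ttl, seen):
            self.neighbors.add(host)
            host.neighbors.add(self)


b, c = Servent("B"), Servent("C")
d = Servent("D", accepts_connections=False)  # D offers no connection
b.neighbors = {c, d}
c.neighbors = {b}
d.neighbors = {b}

a = Servent("A")
a.join(b)
print(sorted(p.name for p in a.neighbors))  # ['B', 'C']
```

D still relays the PING but sends no PONG of its own, so the new node ends up connected only to B and C.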

Gnutella (cont)

Searching for a file:
  A node broadcasts its QUERY to all its peers, who in turn broadcast it to their peers
  Nodes route QUERYHITs back along the QUERY path to the sender; each contains the location details
  To download a file, a direct connection is made using the host details in the QUERYHIT message

Gnutella (cont)

Gnutella broadcasts its messages
To prevent flooding, a TTL (time to live) is introduced
To prevent forwarding the same message twice, each servent maintains a list of recently seen messages
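The effect of the TTL and of duplicate suppression can be seen in a small simulation. This is a sketch under stated assumptions: the `flood` function and the five-node graph are hypothetical, and the per-servent "recently seen" lists are merged into one set because only a single message is simulated.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Flood one message from `origin`; return (reached nodes, copies sent).

    `graph` maps a node name to the list of its neighbors.
    """
    seen = {origin}  # stands in for each servent's list of seen messages
    queue = deque((origin, peer, ttl) for peer in graph[origin])
    sent = 0
    while queue:
        sender, node, hops_left = queue.popleft()
        sent += 1
        if node in seen:
            continue           # duplicate: drop it instead of forwarding
        seen.add(node)
        if hops_left <= 1:
            continue           # TTL exhausted: the flood stops here
        for peer in graph[node]:
            if peer != sender:
                queue.append((node, peer, hops_left - 1))
    return seen, sent


# Assumed example topology: a triangle A-B-C with a tail C-D-E.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
reached, copies = flood(graph, "A", ttl=2)
print(sorted(reached), copies)  # ['A', 'B', 'C', 'D'] 5
```

With `ttl=2` the message never reaches E, and two of the five copies sent are duplicates that B and C drop.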

Gnutella (cont)

[Figure: a GnuCache and users A to J, with connection steps labeled (1) to (4)]

User A connects to the GnuCache to get the list of available servents already connected to the network
The GnuCache sends the list back to user A
User A sends the request message GNUTELLA CONNECT to user B
User B replies with the GNUTELLA OK message, granting user A permission to join the network

Gnutella (cont)

Typical query scenario:
  A sends a query message to its neighbor, B
  B first checks that the message is not an old one
  It then checks for a match with its local data
  If there is a match, it sends a queryHit message back to user A
  B then decrements the TTL by 1 and forwards the query message to users C, D, and E
  C, D, and E perform the same steps as user B and forward the query message further to users F, G, H, and I
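The scenario above can be traced with a minimal in-memory sketch. The `Servent` class, the `routes` table, and the exact links between users C to I are assumptions for illustration; the slides only state which users forward to which group.

```python
class Servent:
    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbors = []
        self.routes = {}       # query ID -> neighbor the query came from
        self.hits = []         # queryHits delivered to this servent

    def on_query(self, query_id, keyword, ttl, sender):
        if query_id in self.routes:
            return                         # old message: ignore it
        self.routes[query_id] = sender     # remember the reverse path
        if keyword in self.files:
            self.on_query_hit(query_id, (self.name, keyword))
        ttl -= 1                           # decrement the TTL by 1
        if ttl > 0:
            for peer in self.neighbors:
                if peer is not sender:
                    peer.on_query(query_id, keyword, ttl, self)

    def on_query_hit(self, query_id, hit):
        previous_hop = self.routes[query_id]
        if previous_hop is None:
            self.hits.append(hit)          # this servent issued the query
        else:
            previous_hop.on_query_hit(query_id, hit)

    def search(self, query_id, keyword, ttl):
        self.routes[query_id] = None       # None marks the originator
        for peer in self.neighbors:
            peer.on_query(query_id, keyword, ttl, self)
        return self.hits


nodes = {n: Servent(n) for n in "ABCDEFGHI"}
nodes["D"].files.add("song")
nodes["H"].files.add("song")
edges = ["AB", "BC", "BD", "BE", "CF", "DG", "EH", "EI"]  # assumed links
for x, y in edges:
    nodes[x].neighbors.append(nodes[y])
    nodes[y].neighbors.append(nodes[x])

hits = nodes["A"].search(query_id=1, keyword="song", ttl=3)
print(hits)  # [('D', 'song'), ('H', 'song')]
```

Each queryHit travels back hop by hop (for example H to E to B to A) using the sender recorded when the query arrived, which is why a node leaving mid-query can cause the reply to be lost.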

Gnutella (cont)

Problems
  Broadcast messages congest the network
  Loss of reply packets (dynamic environment)

FastTrack

Hybrid between centralized and decentralized
Has 2 tiers of control:
  Ordinary nodes that connect to super nodes in a centralized fashion
  Super nodes that connect to each other in a decentralized manner
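The two tiers can be sketched as follows. FastTrack is a proprietary protocol, so this is only an illustration of the hybrid idea: the `SuperNode` class, its methods, and the TTL handling are assumptions, not FastTrack's actual design.

```python
class SuperNode:
    """Hypothetical super node: indexes the files of its ordinary nodes."""

    def __init__(self, name):
        self.name = name
        self.index = {}        # file name -> set of ordinary-node names
        self.peers = []        # other super nodes (decentralized tier)

    def register(self, node_name, files):
        # Centralized tier: an ordinary node reports its files to one super node.
        for f in files:
            self.index.setdefault(f, set()).add(node_name)

    def lookup(self, file_name, ttl=2, seen=None):
        seen = set() if seen is None else seen
        if self.name in seen:
            return set()
        seen.add(self.name)
        found = set(self.index.get(file_name, ()))
        if ttl > 1:
            # Decentralized tier: the query spreads among super nodes only.
            for peer in self.peers:
                found |= peer.lookup(file_name, ttl - 1, seen)
        return found


s1, s2, s3 = SuperNode("S1"), SuperNode("S2"), SuperNode("S3")
s1.peers, s2.peers, s3.peers = [s2], [s1, s3], [s2]
s1.register("n1", ["a.mp3"])
s3.register("n7", ["a.mp3", "b.mp3"])

print(sorted(s1.lookup("a.mp3", ttl=3)))  # ['n1', 'n7']
print(sorted(s1.lookup("a.mp3", ttl=2)))  # ['n1']
```

Ordinary nodes never see the broadcast traffic; only the smaller set of super nodes floods queries among themselves.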

FastTrack (cont)

Joining the network? Via a bootstrapping node
Querying?
Problems (like Gnutella)
  Broadcast messages between super nodes
  Loss of reply packets

Some key issues

Scalability
  Networks can grow to millions of nodes
  Challenge in achieving efficient peer and resource discovery
  High amount of query/response traffic
Availability
  Potential for commercial content provision
  Such services require high availability and accessibility
Anonymity
  What is the right level of anonymity?

Some key issues (cont)

Security
  Due to the open nature of P2P, the environment has to be assumed hostile
  Concerns include:
    Privacy and anonymity
    File authenticity
    Threats like worms and viruses
Fault resilience
  The system must still be able to function even when several important nodes go offline

Some key issues (cont)

Standards and interoperability
  Lack of standards leads to poor interoperability between applications
  Can be improved by using common protocols
Copyright / access control
  Classic case of Napster being shut down
  Other applications have learned to get around the law
  Possibility of paid access in future

Some key issues (cont)

Quality of Service (QoS)
  The metrics to be used are not clearly defined
  Tradeoff between achieving QoS and costs
Complexity of queries
  Must be able to support query languages of varying degrees of expressiveness
  From simple keywords to SQL-like searches
Search mechanism
  Different search algorithms are used to reduce search time and maximize the search space
Load balancing
  Existence of hot-spots (overloaded nodes) due to:
    uneven node distribution throughout the logical space
    uneven object distribution among nodes
    uneven demand distribution among objects
  Query and routing hot-spots
Self-organization
  Ability to adapt to the dynamic nature of the Internet
  Depends on the architecture of the system

Conclusion

Different P2P network topologies
Examples of different P2P applications
Key issues related to P2P
Further reading:
  http://www.gridbus.org/~raj/papers/P2PbasedContentSharing.pdf
