
Complex Adaptive Systems
C.d.L. Informatica – Università di Bologna

Peer-to-peer systems and overlay networks

Fabio Picconi
Dipartimento di Scienze dell’Informazione

Outline

• Introduction to P2P systems

• Common topologies

• Data location

• Churn

• Newscast algorithm

• Security

Peer-to-peer vs. client-server

client-server

• Server well connected to the “center” of the Internet

• Server carries out critical tasks

• Clients only talk to the server

peer-to-peer

• Only nodes located on the “periphery” of the Internet

• Tasks distributed across all nodes

• Clients talk to other clients

Example – Video sharing

Client-server: YouTube

Advantages

• Client can disconnect after upload

• Uploader needs little bandwidth

• Other users can find the file easily (just use search on the server webpage)

Disadvantages

• Server may not accept the file, or may remove it later (according to content policy)

• Whole system depends on the server (what if it is shut down, like Napster?)

• Server storage and bandwidth are expensive!

[Figure: client-server diagram with one uploader and five downloaders]

Example – Video sharing

Peer-to-peer: BitTorrent

Advantages

• Does not depend on a central server

• Bandwidth shared across nodes (downloaders also act as uploaders)

• High scalability, low cost

Disadvantages

• Seeder must remain on-line to guarantee file availability

• Content is more difficult to find (downloaders must find .torrent file)

• Freeloaders cheat in order to download without uploading

[Figure: peer-to-peer diagram with one seeder and five downloaders]

Comparison: P2P vs. client-server

Client-server

• Asymmetric: clients and servers carry out different tasks

• Global knowledge: servers have a global view of the network

• Centralization: communications and management are centralized

• Single point of failure: a server failure brings down the system

• Limited scalability: servers easily overloaded

• Expensive: server storage and bandwidth capacity is not cheap

Peer-to-peer

• Symmetric: each node carries out the same tasks

• Local knowledge: nodes only know a small set of other nodes

• Decentralization: nodes must self-organize in a decentralized way

• Robustness: several nodes may fail with little or no impact

• High scalability: high aggregate capacity, load distribution

• Low-cost: storage and bandwidth are contributed by users

Characterizing peer-to-peer systems

The main characteristics of P2P systems are:

• decentralization (i.e., no central server)

• self-organization (e.g., adding new nodes and removing disconnected ones)

• symmetric communications (e.g., peers act as clients and servers)

• scalability (thanks to high aggregate capacity and load distribution)

• shared ownership (i.e., storage and bandwidth are contributed by peers)

• overlay construction and routing (i.e., nodes form a logical network on top of the underlying IP network)

a message from one peer to another is sent through the underlying IP network

P2P environment

P2P systems are deployed in a challenging environment:

• High latency and low bandwidth between nodes
  - a high hop count will result in a high end-to-end latency
  - transferring large files may take a long time

• Churn
  - nodes may disconnect temporarily
  - new nodes are constantly joining the system, while others leave the overlay permanently

• Security
  - P2P clients run on machines under full control of their users
  - data sent to other nodes may be erased, corrupted, disclosed, etc.
  - malicious users may try to bring down the system (e.g., routing attack)

• Selfishness
  - users may run hacked P2P clients in order to avoid contributing resources

Problems

Some of the problems that a P2P systems designer must face:

• Overlay construction and maintenance
  - maintain a given overlay topology (e.g., random, two-level, ring, etc.)

• Data location
  - locate a given data object among a large number of nodes

• Data dissemination
  - propagate data in an efficient and robust manner

• Per-node state
  - keep the amount of state per node small

• Tolerance to churn
  - maintain system invariants (e.g., topology, data location, data availability) despite node arrivals and departures

Topology

Some common topologies:

• Flat unstructured: a node can connect to any other node
  - only constraint: maximum degree dmax
  - fast join procedure
  - usually very tolerant to churn
  - good for data dissemination, bad for location

• Two-level unstructured: nodes connect to a supernode
  - supernodes form a small overlay
  - used for indexing and forwarding
  - large state and high load on supernodes

• Flat structured: constraints based on node ids
  - allows for efficient data location
  - constraints require long join and leave procedures
  - less robust in high-churn environments

Data location - Flooding

Problem: find the set of nodes S that store a copy of object O

Solutions:

(1) Flooding: send a search message to all nodes [first Gnutella protocol]

• A search message contains either keywords or an object id

Advantages:
  - simplicity
  - no topology constraints

Disadvantages:
  - high network overhead (huge traffic generated by each search request)
  - flooding stopped by TTL (which produces a search horizon)
  - only applicable to a small number of nodes

Data location - Flooding

(1) Flooding (cont.)

Flooding in a flat unstructured network:

[Figure: a search message and its search horizon for TTL = 2]

Objects that lie outside of the horizon are not found
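The search horizon can be illustrated with a small simulation. This is a minimal sketch, not the Gnutella protocol itself: the five-node overlay, the function name `flood_search` and the use of a visited set to drop duplicate messages are all assumptions made for the example.

```python
from collections import deque

def flood_search(graph, holders, source, ttl):
    """Flood a query from `source`; return (holders found, messages sent).

    graph   -- dict mapping each node to the list of its neighbours
    holders -- set of nodes that store a copy of the object
    ttl     -- maximum number of hops a query may travel
    """
    visited = {source}                 # nodes that already saw the query
    frontier = deque([(source, ttl)])
    found = set()
    messages = 0
    while frontier:
        node, remaining = frontier.popleft()
        if node in holders:
            found.add(node)
        if remaining == 0:
            continue                   # TTL expired: the query stops here
        for neighbour in graph[node]:
            messages += 1              # every forwarded copy costs a message
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append((neighbour, remaining - 1))
    return found, messages

# Hypothetical overlay: C - A - B - D - E, with copies stored on D and E.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"],
         "D": ["B", "E"], "E": ["D"]}
holders = {"D", "E"}

found, messages = flood_search(graph, holders, "A", ttl=2)
print(sorted(found), messages)   # ['D'] 5  (E lies outside the horizon)
found, messages = flood_search(graph, holders, "A", ttl=3)
print(sorted(found), messages)   # ['D', 'E'] 7
```

Raising the TTL widens the horizon but also increases the number of messages, which is the overhead listed among the disadvantages of flooding.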

Data location - Superpeers

(2) Two-level overlay: use superpeers to index the locations of an object [eMule, Gnutella 2, BitTorrent]

• Each node connects to a superpeer and advertises the list of objects it stores

• Search requests are sent to the superpeer, which forwards them to other superpeers

Advantages:
  - highly scalable

Disadvantages:
  - superpeers must be reliable, powerful and well connected to the Internet (expensive)
  - superpeers must maintain large state
  - the system relies on a small number of superpeers

Data location - Superpeers

(2) Two-level overlay (cont.)

• A two-level overlay is a partially centralized system

• In some systems superpeers do not connect to each other (e.g., BitTorrent)

[Figure: request and response messages in a two-level overlay]
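The advertise-and-search interaction with superpeers can be sketched as follows. The `Superpeer` class, its method names and the one-hop forwarding rule are hypothetical choices for the example; they are not taken from eMule, Gnutella 2 or BitTorrent.

```python
class Superpeer:
    """Toy superpeer that indexes which nodes store which objects."""

    def __init__(self, name):
        self.name = name
        self.index = {}    # object id -> set of node addresses
        self.peers = []    # other superpeers this one forwards to

    def advertise(self, node, object_ids):
        """Called by an ordinary node to publish the objects it stores."""
        for object_id in object_ids:
            self.index.setdefault(object_id, set()).add(node)

    def search(self, object_id, forward=True):
        """Answer from the local index, then ask the other superpeers once."""
        result = set(self.index.get(object_id, ()))
        if forward:
            for other in self.peers:
                # forward=False keeps the request from bouncing back
                result |= other.search(object_id, forward=False)
        return result

sp1, sp2 = Superpeer("SP1"), Superpeer("SP2")
sp1.peers, sp2.peers = [sp2], [sp1]

sp1.advertise("node1", ["song"])
sp2.advertise("node2", ["song", "video"])

print(sorted(sp1.search("song")))    # ['node1', 'node2']
print(sorted(sp1.search("video")))   # ['node2']
```

The index held by each superpeer grows with the number of attached nodes and advertised objects, which is the "large state" disadvantage mentioned earlier.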

Data location - KBR

(3) Structured networks: use a routing algorithm that implements Key-Based Routing [Overnet, Kad, BitTorrent trackerless]

Key-Based Routing (also known as Distributed Hash Tables, or DHTs) works as follows:

• each node is given a unique node identifier, or nodeid

• given a key k, the node whose nodeid is numerically closest to k among all nodes in the network is known as the root of key k

• given a routing key k, a KBR algorithm can route a message to the root of k in a small number of hops, usually O(log N)

• the location of an object with id objectid is tracked by the root of k = objectid

• thus, one can find the location of an object by routing a message to the root of k = objectid and querying the root for the location of the object
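The definition of the root, and its role as location tracker, can be sketched in a few lines. This sketch assumes a global list of nodes, which a real overlay does not have, so it only illustrates the definition and not the routing. The node ids, the plain (non-circular) distance and the tie-break towards the smaller id are assumptions, as are the names `root_of`, `publish` and `lookup`.

```python
def root_of(key, node_ids):
    """Return the nodeid numerically closest to `key`.

    Assumption: plain absolute distance, ties broken towards the smaller id.
    """
    return min(node_ids, key=lambda n: (abs(n - key), n))

# Hypothetical overlay; ids are 16-bit values written in hexadecimal.
nodes = [0x04F2, 0x620F, 0x85E0, 0x8909, 0x8954, 0x8957, 0xC52A]

# State kept by the roots: root nodeid -> {object id -> set of storing nodes}
locations = {}

def publish(object_id, storing_node):
    """Tell the root of `object_id` that `storing_node` holds a copy."""
    root = root_of(object_id, nodes)
    locations.setdefault(root, {}).setdefault(object_id, set()).add(storing_node)

def lookup(object_id):
    """Ask the root of `object_id` where the copies are."""
    root = root_of(object_id, nodes)
    return locations.get(root, {}).get(object_id, set())

publish(0x8955, 0x620F)
publish(0x8955, 0xC52A)

print(hex(root_of(0x8955, nodes)))               # 0x8954
print(sorted(hex(n) for n in lookup(0x8955)))    # ['0x620f', '0xc52a']
```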

Data location - KBR (cont.)

Key-Based Routing [Pastry]

Source node id: 04F2

k = object id: 8955

Hop #   Hop id   Shared prefix length
0       04F2     0
1       85E0     1
2       8909     2
3       8957     3
4       8954     3 (root of k)

[Figure: route(k=8955, msg) across the overlay address space [0000, FFFF]]

Object 8955 is tracked by node 8954, which knows of two copies stored at nodes 620F and C52A
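The "shared prefix length" column of the route from 04F2 to the root of 8955 can be recomputed with a short script. This is only a check of the example, not a Pastry implementation; `shared_prefix_len` and `is_progress` are illustrative names, and the progress rule (longer prefix, or same prefix length but numerically closer) is a simplified reading of prefix routing.

```python
def shared_prefix_len(a, b):
    """Number of leading hexadecimal digits that two ids have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count

def is_progress(prev_hop, next_hop, key):
    """True if next_hop is a better match for key than prev_hop."""
    before = shared_prefix_len(prev_hop, key)
    after = shared_prefix_len(next_hop, key)
    if after > before:
        return True                    # longer shared prefix
    closer = abs(int(next_hop, 16) - int(key, 16)) < abs(int(prev_hop, 16) - int(key, 16))
    return after == before and closer  # same prefix, numerically closer

key = "8955"
route = ["04F2", "85E0", "8909", "8957", "8954"]

prefix_lengths = [shared_prefix_len(hop, key) for hop in route]
print(prefix_lengths)                                           # [0, 1, 2, 3, 3]
print(all(is_progress(a, b, key) for a, b in zip(route, route[1:])))  # True
```

The last hop (8957 to 8954) does not lengthen the shared prefix; it is accepted because 8954 is numerically closer to 8955, which is what makes it the root.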

Routing table for node 4F28 [Pastry]

Node id: 4F28

• Routing table: used to find next hop with longer shared prefix

• Leaf set: used to find the nodeid closest to a key that is close to the local nodeid

Leaf set of node 4F28: 4F04 4F1B 4F21 - 4F30 4F55 4FF5

• In this example the routing table size is 4 x 15 = 60 entries, for a maximum network size of N = 65536 nodes.

• The average route length in this case is 4 hops.
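The figures quoted for this example follow from the id format. The arithmetic below is a sketch that assumes 4-digit hexadecimal ids, one routing-table row per digit position and one column per digit value other than the node's own; the variable names are illustrative.

```python
import math

DIGITS = 4    # length of a nodeid in hexadecimal digits (e.g., 4F28)
BASE = 16     # number of possible values per digit

rows = DIGITS                 # one row per shared-prefix length 0..3
columns = BASE - 1            # every digit value except the node's own
table_entries = rows * columns
max_nodes = BASE ** DIGITS    # size of the address space [0000, FFFF]

# Each hop fixes at least one more digit, so a full network needs
# about log_16(N) hops.
hops = math.log2(max_nodes) / math.log2(BASE)

print(table_entries)   # 60
print(max_nodes)       # 65536
print(hops)            # 4.0
```

The routing table therefore grows with log N rather than with N, which is why the per-node state stays small even for large networks.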

(3) Structured networks (cont.)

Advantages:
  - completely decentralized (no need for superpeers)
  - routing algorithm achieves low hop count for large network sizes

Disadvantages:
  - each object must be tracked by a different node
  - objects are tracked by unreliable nodes (i.e., which may disconnect)
  - keyword-based searches are more difficult to implement than with superpeers (because objects are located by their objectid)
  - the overlay must be structured according to a given topology in order to achieve a low hop count
  - routing tables must be updated every time a node joins or leaves the overlay

Data location - Loosely structured overlays

(4) Loosely structured networks: use hints on the location of objects [Freenet]

• Nodes locate objects by sending search requests containing the object id

• Requests are propagated using a technique similar to flooding

• Objects with similar identifiers are grouped on the same nodes

[Figure: a request for AE5J and its response travel through the overlay (labels in the figure: nodes D, E, F; object ids 5B20, AF02)]

(4) Loosely structured networks (cont.)

• A search response leaves routing hints on the path back to the source

• Hints are used when propagating future requests for similar object ids

[Figure: hint tables left on the nodes along the response path, followed by a request for AF02]

Hints: AE5J: D, 5B20: E

Hints: AE5J: F

Hints: AE5J: B
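How a node might use its hints to forward a request for a similar object id can be sketched as follows. The similarity measure (length of the shared id prefix) and the function names are assumptions made for the example; they are not Freenet's actual key-closeness rule.

```python
def shared_prefix_len(a, b):
    """Number of leading characters that two object ids have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count

def next_hop(hints, object_id):
    """Pick the neighbour hinted for the most similar known object id.

    hints -- dict mapping a known object id to the neighbour that answered
             a request for it
    Returns None when no hint is similar at all, in which case the node
    would have to fall back to a flooding-like propagation.
    """
    if not hints:
        return None
    best = max(hints, key=lambda known: shared_prefix_len(known, object_id))
    if shared_prefix_len(best, object_id) == 0:
        return None
    return hints[best]

# Hint table as in the example: AE5J was found via D, 5B20 via E.
hints = {"AE5J": "D", "5B20": "E"}

print(next_hop(hints, "AF02"))   # D  (AF02 is most similar to AE5J)
print(next_hop(hints, "5B99"))   # E
print(next_hop(hints, "C000"))   # None
```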

(4) Loosely structured networks (cont.)

Advantages:
  - no topology constraints, flat architecture
  - searches are more efficient than with plain flooding

Disadvantages:
  - does not support keyword-based searches
  - search requests have a TTL
  - contrary to structured overlays, loosely structured overlays do not guarantee a low number of hops, nor that the object will be found

Data location - Summary

The location schemes described previously can be classified according to:

• degree of structure

• decentralization

                        unstructured                    loosely structured   structured
partially centralized   eMule, Gnutella 2, BitTorrent
decentralized           Gnutella                        Freenet              Kad

Effects of churn

Churn (node arrivals and departures) can have several effects on a P2P system:

• data objects may become unavailable if all replicas disconnect

• routing tables may become inconsistent (e.g., entries may point to nodes which have disconnected)

• the overlay may become partitioned if several nodes suddenly disconnect

Churn tolerance

Node arrivals and departures must not disrupt the normal behavior of the system

• system invariants must be maintained
  - connected overlay (i.e., no partitions), low average path length
  - data objects accessible from anywhere in the network

• two types of churn tolerance:
  - dynamic repair: ability to react to changes in the overlay to maintain system invariants (e.g., heal partitions)
  - static resilience: ability to continue operating correctly before adaptation occurs (e.g., route messages through alternate paths)

Churn – Preventing partitions

A simple way to prevent partitions is to increase the node degree

Churn – Static resilience (Ring topology)

Ring partitions can be avoided by keeping a list of successor nodes

[Figure: percentage of failed paths for various numbers of successor nodes (1, 16 and 48 successors)]

More successors reduce the chances of “opening” the ring
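The benefit of a longer successor list can be explored with a small experiment. This is a sketch under simplifying assumptions: nodes are numbered 0..n-1 around the ring, failures are chosen uniformly at random, and the ring counts as "open" as soon as one live node has lost all of its successors. It measures how often the ring stays closed, not the percentage of failed paths. All names and parameters are illustrative.

```python
import random

def ring_survives(n, r, failed):
    """True if every live node still has a live node among its r successors."""
    for node in range(n):
        if node in failed:
            continue
        successors = [(node + i) % n for i in range(1, r + 1)]
        if all(s in failed for s in successors):
            return False    # this node cannot reach the rest of the ring
    return True

def survival_rate(n, r, fail_fraction, trials=200, seed=0):
    """Fraction of random failure patterns that leave the ring closed."""
    rng = random.Random(seed)
    failures = int(n * fail_fraction)
    survived = 0
    for _ in range(trials):
        failed = set(rng.sample(range(n), failures))
        if ring_survives(n, r, failed):
            survived += 1
    return survived / trials

for r in (1, 4, 16):
    print(r, survival_rate(n=100, r=r, fail_fraction=0.3))
```

With a single successor, any failure opens the ring, because the live node just before a failed one loses its only link. With r successors, the ring only opens when r consecutive nodes fail at the same time.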

Churn – Static resilience (various topologies)

[Figure: percentage of failed paths for various overlay topologies]

Some topologies provide more alternate paths between nodes than others

Churn – Dynamic repair (Ring topology)

• Dynamically update routing information to adapt to overlay changes

• Two types of repair algorithms:
  - reactive: start maintenance procedure immediately after detection
  - periodic: execute maintenance procedure periodically

A reactive algorithm can bring down the system instead of repairing it:

• Each node disconnection triggers a maintenance procedure on a set of nodes

• Above a given churn rate, the maintenance traffic congests the network

• The network congestion leads to nodes being considered as disconnected

• This triggers even more maintenance procedures (positive feedback), eventually bringing down the network

This effect may be avoided using a periodic maintenance algorithm

Churn – Summary

Churn is an important issue in P2P overlays:

• Data may become unavailable, and routing information outdated

• Static resilience
  - depends on the topology (i.e., the number of alternate paths)
  - increases with the average node degree

• Dynamic repair protocols must be carefully designed
  - reactive protocols are usually faster
  - periodic protocols can handle higher churn rates

Newscast

• Peer-to-peer protocol that creates and maintains an unstructured overlay

• Highly resilient to churn

• Can be used to propagate information

• Extremely simple design based on information gossip:

  - Each node only knows about a small set of other nodes (its view)

  - Each node periodically picks a random node from this set

  - Both nodes exchange their views and update them

• The random view exchange makes the algorithm very robust to failures and changes in the overlay

Newscast - View exchange

• Each node maintains a view v containing c entries, where

entry = {node address, timestamp}

• Each node executes the following code every T seconds:

1. select random entry r from local view
2. send local view plus an entry with the local address to node r
3. retrieve view of node r and merge it with local view
4. keep the c entries with the most recent timestamps

• The view of a node changes on each round

Newscast - View exchange (cont.)

Example view exchange initiated by node A (c = 6)

1. Select random node

view of node A: B,10 D,5 E,3 X,8 S,12 W,2

selected node: E

2. Exchange views (plus local entry) with selected node

view of node A (plus local entry): A,15 B,10 D,5 E,3 X,8 S,12 W,2

view of node E (plus local entry): E,15 J,14 C,9 D,8 H,10 L,14 Z,2

3. Merge views, and order by timestamp

merge result: E,15 J,14 L,14 S,12 H,10 B,10 C,9 X,8 D,8 D,5 E,3 W,2 Z,2

4. Keep c most recent entries

new view of node A (c = 6): E,15 J,14 L,14 S,12 H,10 B,10
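Steps 3 and 4 of the exchange can be reproduced with the values of this example. The function below is a minimal sketch rather than the reference Newscast implementation. The name `merge_views` is illustrative, and two details are assumptions: duplicate addresses are collapsed to their freshest timestamp, and ties between equal timestamps are broken alphabetically.

```python
def merge_views(local_view, remote_view, remote_addr, now, own_addr, c):
    """Merge two views and keep the c freshest entries.

    local_view, remote_view -- lists of (address, timestamp) entries
    remote_addr, now        -- fresh entry the remote node adds for itself
    own_addr                -- local address, never kept in the local view
    """
    candidates = list(local_view) + list(remote_view) + [(remote_addr, now)]
    freshest = {}
    for addr, timestamp in candidates:
        if addr == own_addr:
            continue                       # a node does not list itself
        if addr not in freshest or timestamp > freshest[addr]:
            freshest[addr] = timestamp     # keep the newest entry per node
    ranked = sorted(freshest.items(), key=lambda e: (-e[1], e[0]))
    return ranked[:c]

view_a = [("B", 10), ("D", 5), ("E", 3), ("X", 8), ("S", 12), ("W", 2)]
view_e = [("J", 14), ("C", 9), ("D", 8), ("H", 10), ("L", 14), ("Z", 2)]

new_view_a = merge_views(view_a, view_e, remote_addr="E", now=15,
                         own_addr="A", c=6)
print(new_view_a)
# [('E', 15), ('J', 14), ('L', 14), ('S', 12), ('B', 10), ('H', 10)]
```

The result contains the same six entries as the new view of node A in the example; only the order of entries with equal timestamps differs, because of the alphabetical tie-break.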

Newscast - Average path length

• Newscast overlays have a low average path length (average number of hops between any two nodes), i.e., O(log N)

Newscast - Static resilience

• The overlay shows high static resilience

Newscast - Summary

• Simple peer-to-peer algorithm

• Nodes use only local information

• Periodic peer-wise data exchanges

• Emergent properties:

- low average path length

- resistance to high churn

Security

Security in peer-to-peer systems is hard to enforce:

• Users have full control over their computers

• Modified clients may not follow the standard protocol

• Communications may be eavesdropped

• Data may be corrupted

• Private data stored on remote computers may be disclosed

Security - Weak identities

• The user may leave the system and rejoin it with a new identity (i.e., user id)

• If an attack is detected, the attacker can reenter the system with a new id

• An attacker may create a large number of false identities (Sybil attack)

Example of Sybil Attack:

• Nodes S1 to S6 are actually 6 instances of the P2P client running on the same machine

• The attacker can intercept all traffic coming from or going to node A

Security - Strong identities

• The user cannot change its identity

• Solution: use a centralized, trusted Certification Authority (CA)

- Each new user must obtain an identity certificate:

certificate = { user id, IP address, user’s public key, CA’s signature }

- The certificate is digitally signed by the CA, whose public key is known by all users

- A certificate cannot be forged (would require the CA’s private key)

- To prove his identity, a user signs a message with his private key, and attaches the corresponding certificate signed by the CA

• Strong identities prevent Sybil Attacks

• If an attacker is caught, it cannot easily rejoin the system

Security – Weak vs. strong identities

• Strong identities require a centralized CA

• new nodes must contact the CA before joining the network:

- the CA response may be slow (a few days)

- if the CA is unavailable, new nodes cannot join the system

• the security of the system depends on the CA:

- the CA must correctly verify the identity of the requester

- the CA’s private key must be kept secret

• Many P2P systems use weak identities

• IP address already gives some identity information

• Some systems ensure anonymity (e.g., Freenet)

The end

You can get these slides from

http://www.cs.unibo.it/~picconi/slides

For any questions, mail me to:

picconi@cs.unibo.it
