complex adaptive systems c.d.l. informatica – università di bologna peer-to-peer systems and
Post on 23-Feb-2016
42 Views
Preview:
DESCRIPTION
TRANSCRIPT
1
Complex Adaptive SystemsC.d.L. Informatica – Università di Bologna
Peer-to-peer systems andoverlay networks
Fabio PicconiDipartimento di Scienze dell’Informazione
2
• Introduction to P2P systems
• Common topologies
• Data location
• Churn
• Newscast algorithm
• Security
Outline
3
Peer-to-peer vs. client-server
client-server
• Server well connected to the “center” of the Internet
• Servers carries out critical tasks
• Clients only talk to server
• Only nodes located on the “periphery” of the Internet
• Tasks distributed across all nodes
• Clients talk to other clients
peer-to-peer
4
Example – Video sharing
Client-server: YouTube
client-server
Advantages• Client can disconnect after upload
• Uploader needs little bandwidth
• Other users can find the file easily(just use search on server webpage)
Disadvantages• Server may not accept file or
remove it later (according to content policy)
• Whole system depends on the server(what if shut down like Napster?)
• Server storage and bandwidthare expensive!
uploader
downloader downloader
downloader
downloaderdownloader
5
Example – Video sharing
Peer-to-peer: BitTorrent
peer-to-peer
Advantages• Does not depend on a central server
• Bandwidth shared across nodes(downloaders also act as uploaders)
• High scalability, low cost
Disadvantages• Seeder must remain on-line to
guarantee file availability
• Content is more difficult to find(downloaders must find .torrent file)
• Freeloaders cheat in order to download without uploading
seeder
downloader downloader
downloader
downloaderdownloader
6
Comparison: P2P vs. client-server
Client-server• Asymmetric: client and servers
carry out different tasks
• Global knowledge: servers have a global view of the network
• Centralization: communications and management are centralized
• Single point of failure: a server failure brings down the system
• Limited scalability: servers easily overloaded
• Expensive: server storage and bandwidth capacity is not cheap
Peer-to-peer• Symmetric: each node carries out
the same tasks
• Local knowledge: nodes only know a small set of other nodes
• Decentralization: nodes must self-organize in a decentralized way
• Robustness: several nodes may fail with little or no impact
• High scalability: high aggregate capacity, load distribution
• Low-cost: storage and bandwidth are contributed by users
7
Characterizing peer-to-peer systems
The main characteristics of P2P systems are:• decentralization (i.e., no central server)
• self-organization (e.g., adding new nodes and removing disconnected ones)
• symmetric communications (e.g., peers act as clients and servers)
• scalability (thanks to high aggregate capacity and load distribution)
• shared ownership (i.e., storage and bandwidth are contributed by peers)
• overlay construction and routing (i.e., nodes form a logical network on top of the underlying IP network)
a message from one peer to another is sent through the underlying IP network
8
P2P environment
P2P systems are deployed in a challenging environment:• High latency and low bandwidth between nodes
- a high hop count will result in a high end-to-end latency - transferring large files may take a long time
• Churn- nodes may disconnect temporarily- new nodes are constantly joining the system, while others leave the
overlay permanently
• Security- P2P clients run on machines under full control of their users- data sent to other nodes may be erased, corrupted, disclosed, etc.- malicious users may try to bring down the system (e.g., routing attack)
• Selfishness- users may run hacked P2P clients in order to avoid contributing resources
9
Problems
Some of the problems that a P2P systems designer must face:• Overlay construction and maintenance
- maintain a given overlay topology (e.g., random, two-level, ring, etc.)
• Data location- locate a given data object among a large number of nodes
• Data dissemination- propagate data in an efficient and robust manner
• Per-node state- keep the amount of state per node small
• Tolerance to churn - maintain system invariants (e.g., topology, data location, data availability)
despite node arrivals and departures
10
Topology
Some common topologies:• Flat unstructured: a node can connect to any other node
- only constraint: maximum degree dmax
- fast join procedure- usually very tolerant to churn- good for data dissemination, bad for location
• Two-level unstructured: nodes connect to a supernode- supernodes form a small overlay- used for indexing and forwarding- large state and high load on supernodes
• Flat structured: constraints based on node ids - allows for efficient data location- constraints require long join and leave procedures- less robust in high-churn environments
11
Data location - Flooding
Problem: find the set of nodes S that store a copy of object O
Solutions:(1) Flooding: send a search message to all nodes [first Gnutella protocol]
• A search message contains either keywords or an object id
Advantages:- simplicity- no topology constraints
Disadvantages: - high network overhead (huge traffic generated by each search request)- flooding stopped by TTL (which produces search horizon)- only applicable to small number of nodes
12
Data location - Flooding
(1) Flooding (cont.)
Flooding in a flat unstructured network:
search horizon forTTL = 2
obj
Objects that lie outside of the horizon are not found
search
13
Data location - Superpeers
(2) Two-level overlay: use superpeers to index the locations of an object[eMule, Gnutella 2, BitTorrent]
• Each node connects to a superpeer and advertises the list of objects it stores
• Search requests are sent to the superpeer, which forwards them to other superpeers
Advantages:- highly scalable
Disadvantages:- superpeers must be realiable, powerful and well connected to the Internet (expensive)- superpeers must maintain large state- the system relies on a small number of superpeers
14
Data location - Superpeers
(2) Two-level overlay (cont.)
• A two-level overlay is a partially centralized system• In some systems superpeers do not connect to each other (e.g., BitTorrent)
responserequest
obj
15
Data location - KBR
(3) Structured networks: use a routing algorithm that implements Key-BasedRouting [Overnet, Kad, BitTorrent trackerless]
Key-Based Routing (also known as Distributed Hash Tables, or DHTs)works as follows:
• each node is given a unique node identifier, or nodeid
• given a key k, the node whose nodeid is numerically closest to kamong all nodes in the network is known as the root of key k
• given a routing key k, a KBR algorithm can route a message to theroot of k in a small number of hops, usually O(log N)
• the location of an object with id objectid is tracked by the root ofk = objectid
• thus, one can find the location of an object by routing a message to theroot of k = objectid and querying the root for the location of the object
16
Data location - KBR (cont.)
Key-Based Routing [Pastry]
Source node id: 04F2
k = object id: 8955
Hop # Hop id Shared prefix length0 04F2 01 85E0 12 8909 23 8957 34 8954 3 (root of k)
04F2
3A79
5230
8909
8954
8957
AC78
C52A
E25A
route(k=8955,msg)
overlay address space[0000,FFFF]
620F
85E0obj8955stored onnodes 620F,C52A
8821
obj 8955
obj 8955
Object 8955 is tracked by node 8954, which knows of two copies storedat nodes 620F and C52A
17
Data location - KBR (cont.)
Routing table for node 4F28 [Pastry]
Routing table
02A3
409A
4F04
19BA
413C
4F1B
4F21
2F34
4288
N/A
E129
4E01
F0A4
N/A
4FF5
…
…
…
...
Leaf Set
4F04 4F1B 4F21 - 4F30 4F55 4FF5
Node id: 4F28
used to find next hop with longer shared prefix
used to find the nodeid closest to a key that is close to the local nodeid
• In this example the routing table size is 4 x 15 = 60 entries, for a maximum network size of N = 65536 nodes.
• The average route length in this case is 4 hops.
18
Data location - KBR (cont.)
(3) Structured networks (cont.)
Advantages:- completely decentralized (no need for superpeers)- routing algorithm achieves low hop count for large network sizes
Disadvantages:- each object must be tracked by a different node- objects are tracked by unreliable nodes (i.e., which may disconnect)- keyword-based searches are more difficult to implement than
with superpeers (because objects are located by their objectid)- the overlay must be structured according to a given topology
in order to achieve a low hop count- routing tables must be updated every time a node joins or leaves the overlay
19
Data location - Loosely structured overlays
(4) Loosely structured networks: use hints on the location of objects [Freenet]
• Nodes locate objects by sending search requests containing the object id
• Requests are propagated using a technique similar to flooding
• Objects with similar identifiers are grouped on the same nodes
AE5J
A
B
C
DE F5B20
requestfor AE5J
responseAF02
20
Data location - Loosely structured overlays
(4) Loosely structured networks (cont.)
• A search response leaves routing hints on the path back to the source
• Hints are used when propagating future requests for similar object ids
AE5J
A
B
C
DE F
HintsAE5J: D5B20: E
5B20
HintsAE5J: F
HintsAE5J: B
AF02
requestfor AF02
21
Data location - Loosely structured overlays
(4) Loosely structured networks (cont.)
Advantages:- no topology constraints, flat architecture- searches are more efficient than with plain flooding
Disadvantages:- does not support keyword-based searches- search requests have a TTL- contrary to structured overlays, loosely structured overlay do not
guarantee a low number of hops, nor that the object will be found
22
The location schemes described previously can be classified according to:
• degree of structure
• decentralization
unstructured loosely structured
structured
partially centralized
eMuleGnutella 2BitTorrent
- -
decentralized Gnutella Freenet Kad
Data location - Summary
23
Churn (node arrivals and departures) can have several effects on a P2P system:
• data objects may be become unavailable if all replicas disconnect
• routing tables may become inconsistent(e.g., entries may point to nodes which have disconnected)
• the overlay may become partitioned if several nodes suddenly disconnect:
Effects of churn
24
Node arrivals and departures must not disrupt the normal behavior of the system
• system invariants must be maintained- connected overlay (i.e., no partitions), low average path length- data objects accessible from anywhere in the network
• two types of churn tolerance:
- dynamic repair: ability to react to changes in the overlay to maintainsystem invariants (e.g., heal partitions)
- static resilience: ability to continue operating correctly before adaptation
occurs (e.g., route messages through alternate paths)
Churn tolerance
25
An simple way to prevent partitions is to increase the node degree
Churn – Preventing partitions
Ring partitions can be avoided by keeping a list of successor nodes
26
Percentage of failed paths for various numbers of successor nodes
Churn – Static resilience (Ring topology)
1 successor16 successors48 successors
More successors reduce the chances of “opening” the ring
27
Percentage of failed paths for various overlay topologies
Churn – Static resilience (various topologies)
Some topologies provide more alternate paths between nodes than others
28
• Dynamically update routing information to adapt to overlay changes
• Two types of repair algorithms:- reactive: start maintenance procedure immediately after detection- periodic: execute maintenance procedure periodically
A reactive algorithm can bring down the system instead of repairing it:
• Each node disconnection triggers a maintenance procedure on a set of nodes
• Over a given churn rate, the maintenance traffic congests the network
• The network congestion leads to nodes being considered as disconnected
• This triggers even more maintenance procedures (positive feedback), eventually bringing down the network
This effect may be avoided using a periodic maintenance algorithm
Churn – Dynamic repair (Ring topology)
29
Churn is an important issue in P2P overlays:
• Data may become unavailable, and routing information outdated
• Static resilience- depends on the topology (i.e., the number of alternate paths)- increases with the average node degree
• Dynamic repair protocols must be carefully designed- reactive protocols are usually faster- periodic protocols can handle higher churn rates
Churn – Summary
30
Newscast• Peer-to-peer protocol that creates and maintains an unstructured overlay
• Highly resilient to churn
• Can be used to propagate information
• Extremely simple design based on information gossip:
- Each node only knows about a small set of other knows (view)
- Each node periodically picks a random node from this set
- Both nodes exchange their views and update them
• The random view exchange makes the algorithm very robust to failures and changes in the overlay
31
Newscast - View exchange• Each node maintains a view v containing c entries, where
entry = {node address, timestamp}
• Each node executes the following code every T seconds:
1. select random entry r from local view 2. send local view plus an entry with the local address to node r3. retrieve view of node r and merge it with local view4. keep the c entries with the most recent timestamps
• The view of a node changes on each round
32
Newscast - View exchange (cont.)
Example view exchange initiated by node A
B,10D,5E,3X,8S,12W,2
view of node A(c = 6)
D
A E
X
B
W
S
33
Newscast - View exchange (cont.)
A,15B,10D,5E,3X,8S,12W,2
E,15J,14C,9D,8H,10L,14Z,2
view of node E
A E
1. Select random nodeB,10D,5E,3X,8S,12W,2
view of node A
selected node
2. Exchange views (plus local entry) with selected node
view of node A
34
Newscast - View exchange (cont.)
E,15 J,14 L,14S,12H,10B,10 C,9X,8D,8D,5E,3W,2Z,2
3. Merge views, and order by timestamp
merge result
4. Keep c most recent entries
E,15 J,14 L,14S,12H,10B,10
new view of node A(c = 6)
35
Newscast - Average path length• Newscast overlays have a low average path length, i.e., O(log N)
(average number of hops between any two nodes)
36
Newscast - Static resilience• The overlay shows high static resilience
37
Newscast - Summary• Simple peer-to-peer algorithm
• Nodes use only local information
• Periodic peer-wise data exchanges
• Emergent properties:
- low average path length
- resistance to high churn
38
Security
Security in peer-to-peer systems is hard to enforce:
• Users have full control on their computers
• Modified clients may not follow the standard protocol
• Communications may be eavesdropped
• Data may be corrupted
• Private data stored on remote computers may be disclosed
39
Security - Weak identities• The user may leave the system and rejoin it with a new identity (i.e, user id)
• If an attack is detected, the attacker can reenter the system with a new id
• An attacker may create a large number of false identities (Sybil attack)
S2
A S3
S4
S1
S6
S5
Example of Sybil Attack:
• Nodes S1 to S6 are actually 6 instances of the P2P client running on the same machine
• The attacker can intercept all traffic coming from or going to node A
40
Security - Strong identities• The user cannot change its identity
• Solution: use a centralized, trusted Certification Authority (CA)
- Each new user must obtain an identity certificate:
certificate = { user id, IP address, user’s public key, signatureCA }
- The certificate is digitally signed by the CA, whose public key is known by all users
- A certificate cannot be forged (would require the CA’s private key)
-To prove his identity, a user signs a message with his private key, and attaches the corresponding certificate signed by the CA
• Strong identities prevent Sybil Attacks
• If an attacker is caught, it cannot easily rejoin the system
41
Security – Weak vs. strong identities• Strong identities requires a centralized CA
• new nodes must contact the CA before joining the network:
- the CA response may be slow (a few days)
- if the CA is unavailable, new nodes cannot join the system
• the security of the system depends on the CA:
- the CA must correctly verify the identity of the requester
- the CA’s private key must be kept secret
• Many P2P systems use weak identities
• IP address already gives some identity information
• Some systems ensure anonymity (e.g., FreeNet)
42
The end
You can get these slides from
http://www.cs.unibo.it/~picconi/slides
For any questions, mail me to:
picconi@cs.unibo.it
top related