What is a P2P system?
• A distributed system architecture:
  • No centralized control
  • Nodes are symmetric in function
• Large number of unreliable nodes
• Enabled by technology improvements
[Figure: many symmetric nodes connected to each other over the Internet]
How to build critical services?
• Many critical services use the Internet
  • Hospitals, government agencies, etc.
• These services need to be robust
  • Node and communication failures
  • Load fluctuations (e.g., flash crowds)
  • Attacks (including DDoS)
The promise of P2P computing
• Reliability: no central point of failure
  • Many replicas
  • Geographic distribution
• High capacity through parallelism:
  • Many disks
  • Many network connections
  • Many CPUs
• Automatic configuration
• Useful in public and proprietary settings
Traditional distributed computing: client/server
• Successful architecture, and will continue to be so
• Tremendous engineering necessary to make server farms scalable and robust
[Figure: many clients connecting to a central server over the Internet]
The abstraction: Distributed hash table (DHT)
[Figure: layered design. A distributed application (e.g., file sharing) calls put(key, data) and get(key) → data on the distributed hash table (DHash), which calls lookup(key) → node IP address on the lookup service (Chord); all three layers run across many nodes]
• Application may be distributed over many nodes
• DHT distributes data storage over many nodes
A DHT has a good interface
• put(key, value) and get(key) → value
  • Simple interface!
• API supports a wide range of applications
  • DHT imposes no structure/meaning on keys
• Key/value pairs are persistent and global
  • Can store keys in other DHT values
  • And thus build complex data structures (see the sketch below)
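To make the interface concrete, here is a minimal sketch in Python; the in-memory dict stands in for storage that a real DHT would spread over many nodes, and the class and names are illustrative, not from any particular implementation.

```python
import hashlib

class DHT:
    """Local stand-in with the DHT interface; a real DHT distributes the store."""
    def __init__(self):
        self.store = {}

    def put(self, key, value):
        self.store[key] = value           # persistent and global in a real DHT

    def get(self, key):
        return self.store.get(key)

dht = DHT()
data = b"hello world"
key = hashlib.sha1(data).digest()         # DHT imposes no meaning on keys
dht.put(key, data)
assert dht.get(key) == data

# keys may be stored inside other values, building complex structures:
dht.put(hashlib.sha1(key).digest(), key)  # an entry pointing at the data's key
```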
A DHT makes a good shared infrastructure
• Many applications can share one DHT service
  • Much as applications share the Internet
• Eases deployment of new applications
• Pools resources from many participants
  • Efficient due to statistical multiplexing
  • Fault-tolerant due to geographic distribution
Recent DHT-based projects
• File sharing [CFS, OceanStore, PAST, Ivy, …]
• Web cache [Squirrel, …]
• Archival/backup store [HiveNet, Mojo, Pastiche]
• Censor-resistant stores [Eternity, FreeNet, …]
• DB query and indexing [PIER, …]
• Event notification [Scribe]
• Naming systems [ChordDNS, Twine, …]
• Communication primitives [I3, …]
Common thread: data is location-independent
Roadmap
• One application: CFS/DHash
• One structured overlay: Chord
• Alternatives:
  • Other solutions
  • Geometry and performance
• The interface
• Applications
CFS: Cooperative file sharing
• DHT used as a robust block store
• Client of DHT implements file system
  • Read-only: CFS, PAST
  • Read-write: OceanStore, Ivy
[Figure: a file system layered on the distributed hash table via put(key, block) and get(key) → block, running over many nodes]
CFS Design
File representation: self-authenticating data
• Signed blocks:
  – Root blocks; Chord ID = H(publisher's public key)
• Unsigned blocks:
  – Directory blocks, inode blocks, data blocks; Chord ID = H(block contents)
[Figure: file system rooted at signed root block 995 (contains keys 901 and 732, plus a signature) → directory block (maps "a.txt" to ID 144) → i-node block 144 (contains keys 431 and 795) → data blocks; each unsigned block's ID is the SHA-1 of its contents]
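A minimal sketch of this naming scheme in Python, assuming SHA-1 as in the figure; the signature on the root block is elided here, since key formats are not specified in the slides.

```python
import hashlib

def block_id(contents):
    # unsigned blocks (directory, i-node, data): Chord ID = H(block contents)
    return hashlib.sha1(contents).digest()

def root_block_id(publisher_public_key):
    # signed root block: Chord ID = H(publisher's public key)
    return hashlib.sha1(publisher_public_key).digest()

data = b"contents of a.txt"
data_id = block_id(data)        # an i-node refers to data by this ID
inode = data_id                 # (a real i-node block lists many data-block IDs)
inode_id = block_id(inode)      # a directory maps "a.txt" to this ID

# self-authenticating: a reader can verify any fetched block against its ID
assert block_id(data) == data_id
```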
DHT distributes blocks by hashing IDs
[Figure: signed root blocks (995 with keys 901, 732; 247 with keys 407, 992, 705) and the blocks they reference scattered across nodes A–D on the Internet]
• DHT replicates blocks for fault tolerance
• DHT caches popular blocks for load balance
DHT implementation challenges
1. Scalable lookup
2. Balance load (flash crowds)
3. Handling failures
4. Coping with systems in flux
5. Network-awareness for performance
6. Robustness with untrusted participants
7. Programming abstraction
8. Heterogeneity
9. Anonymity
10. Indexing
Goal: simple, provably-good algorithms
1. The lookup problem
[Figure: publisher calls Put(key=SHA-1(data), value=data…) and a client calls Get(key=SHA-1(data)) on a set of nodes N1–N6 on the Internet; which node holds the key?]
• Get() is a lookup followed by a check
• Put() is a lookup followed by a store (both sketched below)
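In sketch form (Python, with hypothetical lookup, fetch, and store helpers passed in as functions), both operations reduce to a lookup; get() additionally verifies the self-authenticating data against its key.

```python
import hashlib

def get(key, lookup, fetch):
    """lookup(key) -> node; fetch(node, key) -> data (hypothetical RPC stubs)."""
    node = lookup(key)                        # 1. lookup
    data = fetch(node, key)
    if hashlib.sha1(data).digest() != key:    # 2. check: data must hash to key
        raise ValueError("node returned bad data")
    return data

def put(data, lookup, store):
    key = hashlib.sha1(data).digest()
    node = lookup(key)                        # 1. lookup
    store(node, key, data)                    # 2. store
    return key
```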
Centralized lookup (Napster)
[Figure: central database; publisher N4 registers SetLoc("title", N4) for key="title", value=file data…; the client asks the DB Lookup("title") and is pointed at N4]
• Simple, but O(N) state and a single point of failure
Flooded queries (Gnutella)
[Figure: publisher N4 holds key="title", value=MP3 data…; the client floods Lookup("title") across neighbors N1–N9 until it reaches N4]
• Robust, but worst case O(N) messages per lookup
Algorithms based on routing
• Map keys to nodes in a load-balanced way
• Hash keys and nodes into a string of digits
• Assign key to "closest" node
• Examples: CAN, Chord, Kademlia, Pastry, Tapestry, Viceroy, …
[Figure: circular ID space with nodes N32, N60, N90, N105; keys K5 and K20 map to N32, K80 to N90]
• Forward a lookup for a key to a closer node (key-to-node rule sketched below)
• Join: insert node in ring
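A sketch of the key-to-node rule on the circular ID space, using the node and key IDs from the figure; a key is assigned to its successor, the first node clockwise at or after it.

```python
import bisect

def successor(node_ids, key):
    """First node ID >= key, wrapping around the ring; node_ids must be sorted."""
    i = bisect.bisect_left(node_ids, key)
    return node_ids[i % len(node_ids)]

nodes = [32, 60, 90, 105]            # N32, N60, N90, N105 from the figure
assert successor(nodes, 5) == 32     # K5  -> N32
assert successor(nodes, 20) == 32    # K20 -> N32
assert successor(nodes, 80) == 90    # K80 -> N90
assert successor(nodes, 110) == 32   # keys past the last node wrap to N32
```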
Chord’s routing table: fingers
[Figure: node N80's fingers point ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring]
Lookups take O(log(N)) hops
[Figure: ring of N5, N10, N20, N32, N60, N80, N99, N110; Lookup(K19) hops via fingers until it reaches K19's successor, N20]
• Lookup: route to closest predecessor (sketched below)
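A sketch of the finger rule and the greedy lookup walk in Python, assuming a 7-bit ID space (0..127) so the figure's node numbers fit; finger i of node n points at the successor of n + 2^i, and each hop moves to the finger that most closely precedes the key.

```python
import bisect

BITS = 7                        # assumed 7-bit ID space (0..127) for this sketch
RING = 2 ** BITS
NODES = sorted([5, 10, 20, 32, 60, 80, 99, 110])    # the ring from the figure

def successor(ident):
    """First node at or after ident, wrapping around the ring."""
    i = bisect.bisect_left(NODES, ident % RING)
    return NODES[i % len(NODES)]

def between(x, a, b):
    """True if x lies strictly inside the clockwise interval (a, b)."""
    return a < x < b if a < b else x > a or x < b

def fingers(n):
    # finger i covers half, quarter, eighth, ... of the ring ahead of n
    return [successor(n + 2 ** i) for i in range(BITS)]

def lookup(start, key):
    """Route greedily toward key's closest predecessor; return (owner, hops)."""
    n, hops = start, [start]
    while not between(key, n, successor(n + 1)) and key != successor(n + 1):
        nxt = next((f for f in reversed(fingers(n)) if between(f, n, key)), None)
        n = nxt if nxt is not None else successor(n + 1)
        hops.append(n)
    return successor(n + 1), hops

owner, hops = lookup(80, 19)    # Lookup(K19) starting at N80, as in the figure
assert owner == 20              # K19 is served by its successor, N20
```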
Can we do better?
• Caching
• Exploit flexibility at the geometry level
• Iterative vs. recursive lookups
2. Balance load
[Figure: Lookup(K19) on the ring; copies of K19 are cached at nodes along the lookup path]
• Hash function balances keys over nodes
• For popular keys, cache along the path (sketched below)
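A sketch of path caching, assuming the caller knows the node path a lookup would take (owner last) and a per-node cache dict; the helper names are illustrative.

```python
def fetch_cached(path, caches, key, fetch_from_owner):
    """path: node IDs a lookup for key visits, owner last.
    caches: dict mapping node ID -> {key: block}.
    Stop at the first cached copy; cache the block along the path afterwards."""
    for i, n in enumerate(path):
        if key in caches.setdefault(n, {}):
            block = caches[n][key]        # early hit: owner is never contacted
            break
    else:
        block = fetch_from_owner(key)     # miss everywhere: go to the owner
        i = len(path)
    for n in path[:i]:
        caches[n][key] = block            # popular blocks spread down the path
    return block

caches = {}
path = [80, 5, 10, 20]                    # e.g., the Lookup(K19) path above
b1 = fetch_cached(path, caches, 19, lambda k: b"block-19")
b2 = fetch_cached(path, caches, 19, lambda k: b"never called")  # cache hit
assert b1 == b2 == b"block-19"
```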
Why Caching Works Well
• Only O(log N) nodes have fingers pointing to N20
• This limits the single-block load on N20
3. Handling failures: redundancy
[Figure: ring with N5, N10, N20, N32, N40, N60, N80, N99, N110; key K19 replicated at N20, N32, and N40]
• Each node knows IP addresses of next r nodes
• Each key is replicated at next r nodes (see the sketch below)
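A sketch of the placement rule with an assumed r = 3, using the figure's node IDs; a key lives at its successor and the following r-1 nodes.

```python
import bisect

NODES = sorted([5, 10, 20, 32, 40, 60, 80, 99, 110])  # ring from the figure
R = 3                                                  # assumed replication factor

def replica_set(key):
    """The key's successor plus the next R-1 nodes clockwise."""
    i = bisect.bisect_left(NODES, key)
    return [NODES[(i + k) % len(NODES)] for k in range(R)]

assert replica_set(19) == [20, 32, 40]   # K19 replicated at N20, N32, N40
```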
Lookups find replicas
[Figure: Lookup(BlockID=17) routed around the ring; one replica fails to answer, and the block is fetched from the next replica in the successor list]
RPCs:
1. Lookup step
2. Get successor list
3. Failed block fetch
4. Block fetch
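The RPC sequence maps onto a simple failover loop; a sketch with hypothetical lookup, get_successors, and fetch_block RPC stubs passed in as functions.

```python
def get_block(block_id, lookup, get_successors, fetch_block):
    """Fetch a block, falling back through replicas as in RPCs 1-4 above."""
    node = lookup(block_id)                        # 1. lookup step
    for replica in get_successors(node):           # 2. get successor list
        try:
            return fetch_block(replica, block_id)  # 3./4. fetch, may fail
        except ConnectionError:
            continue                               # dead replica: try the next
    raise LookupError("no live replica holds the block")
```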
First Live Successor Manages Replicas
[Figure: block 17 held at its first live successor near N68, with a copy of 17 at the next node on the ring]
• Node can locally determine that it is the first live successor
4. Systems in flux
• Lookup takes O(log N) hops if the system is stable
• But the system is never stable!
• What we desire are theorems of the type:
1. In the almost-ideal state, …log(N)…
2. System maintains almost-ideal state as nodes join and fail
Half-life [Liben-Nowell 2002]
• Doubling time: time for N new nodes to join
• Halving time: time for N/2 old nodes to fail
• Half life: MIN(doubling time, halving time)
[Figure: from N nodes, N new nodes join (doubling) or N/2 old nodes leave (halving)]
Applying half life
• For any node u in any P2P network: if u wishes to stay connected with high probability, then, on average, u must be notified about Ω(log N) new nodes per half life
• And so on, …
5. Optimize routing to reduce latency
• Nodes close on ring, but far away in Internet
• Goal: put nodes in routing table that result in few hops and low latency
[Figure: ring neighbors N20, N40, N41, N80 correspond to hosts spread across the Internet: MIT, CMU, Cornell, NYU, Utah, Cisco, cable and DSL links, and sites as far away as vu.nl and lulea.se]
“close” metric impacts choice of nearby nodes
• Chord's numerical closeness and (original) routing table restrict choice
• Should new nodes be able to choose their own IDs?
• Other geometries allow more choice (e.g., prefix-based, XOR)
[Figure: nodes N06, N32, N60, N103, N105 spread over the USA, Europe, and the Far East; key K104]
6. Malicious participants
• Attacker denies service
  • Flood DHT with data
• Attacker returns incorrect data [detectable]
  • Self-authenticating data
• Attacker denies data exists [liveness]
  • Bad node is responsible, but says no
  • Bad node supplies incorrect routing info
  • Bad nodes make a bad ring, and good node joins it
Basic approach: use redundancy
Sybil attack [Douceur 02]
• Attacker creates multiple identities
• Attacker controls enough nodes to foil the redundancy
[Figure: ring of N5–N110 in which a single attacker presents many identities]
Need a way to control creation of node IDs
One solution: secure node IDs
• Every node has a public key
• Certificate authority signs public key of good nodes
• Every node signs and verifies messages
• Quotas per publisher
Another solution: exploit practical Byzantine protocols
• A core set of servers is pre-configured with keys and performs admission control [OceanStore]
• The servers achieve consensus with a practical Byzantine recovery protocol [Castro and Liskov '99 and '00]
• The servers serialize updates [OceanStore] or assign secure node IDs [Configuration service]
[Figure: a small core of pre-configured servers controls admission of nodes N06, N32, N60, N103, N105 to the ring]
A more decentralized solution: weak secure node IDs
• ID = SHA-1(IP address of node)
• Assumption: attacker controls a limited number of IP addresses
• Before using a node, challenge it to verify its ID (see the sketch below)
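A sketch of weak secure node IDs in Python; the nonce challenge is abstracted to a comment, and the IP addresses are made up for illustration.

```python
import hashlib

def weak_node_id(ip):
    # ID = SHA-1(IP address of node)
    return hashlib.sha1(ip.encode()).digest()

def id_matches(ip, claimed_id):
    """Check the claimed ID against the address we actually reach the node at.
    (A full challenge also sends a nonce to that IP to confirm a node answers
    there, not just that the hash matches.)"""
    return weak_node_id(ip) == claimed_id

nid = weak_node_id("192.0.2.7")             # illustrative address
assert id_matches("192.0.2.7", nid)
assert not id_matches("192.0.2.8", nid)     # can't claim another node's ID
```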
Using weak secure node IDs
• Detect malicious nodes
  • Define verifiable system properties
    • Each node has a successor
    • Data is stored at its successor
  • Allow querier to observe lookup progress
    • Each hop should bring the query closer
  • Cross check routing tables with random queries
• Recovery: assume limited number of bad nodes
  • Quota per node ID
Summary
http://project-iris.net