Pastry
Peter Druschel, Rice University
Antony Rowstron, Microsoft Research UK
Some slides are borrowed from the original presentation by the authors
Outline
• Background
• Pastry
• Pastry proximity routing
• PAST
Background
Peer-to-peer systems
• distribution
• decentralized control
• self-organization
• symmetry (communication, node roles)
Common issues
• Organize, maintain overlay network
• Resource allocation/load balancing
• Resource location
• Network proximity routing
Pastry provides a generic p2p substrate
Architecture
• P2p application layer: network storage, event notification, ?
• P2p substrate (self-organizing overlay network): Pastry
• Internet: TCP/IP
Structured p2p overlays
The primitive route(M, X) routes message M to the live node with node Id closest to key X
Node ids and keys are from a large, sparse id space
Distributed Hash Tables (DHT)
[Figure: (key, value) pairs (k1,v1) to (k6,v6) stored on the nodes of a P2P overlay network]
Operations: insert(k,v), lookup(k)
• p2p overlay maps keys to nodes
• completely decentralized and self-organizing
• robust, scalable
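As a rough illustration of the insert/lookup interface, the Python sketch below assigns each key to the node whose id is closest on a circular 128-bit id space. The names ToyDHT, to_id and ring_distance, and the use of truncated SHA-1 to derive ids, are assumptions made for this example and do not come from the slides; all nodes live in one process and no routing takes place.

```python
import hashlib

ID_SPACE = 1 << 128  # 128-bit circular id space


def to_id(name: str) -> int:
    # Assumption: ids are the first 128 bits of a SHA-1 digest.
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:16], "big")


def ring_distance(a: int, b: int) -> int:
    # Distance on the circle, so the two ends of the id space are neighbours.
    d = abs(a - b)
    return min(d, ID_SPACE - d)


class ToyDHT:
    """Hypothetical in-process DHT: one dict per node, no network."""

    def __init__(self, node_names):
        self.stores = {to_id(name): {} for name in node_names}

    def _owner(self, key: str) -> int:
        key_id = to_id(key)
        # Ties are broken by the smaller node id to keep the choice deterministic.
        return min(self.stores, key=lambda nid: (ring_distance(nid, key_id), nid))

    def insert(self, key: str, value) -> None:
        self.stores[self._owner(key)][key] = value

    def lookup(self, key: str):
        return self.stores[self._owner(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.insert("k1", "v1")
```

Because insert and lookup compute the owner the same way, a lookup for "k1" lands on the node that stored it without any central directory.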
Outline
• Background
• Pastry
• Pastry proximity routing
• PAST
• SCRIBE
• Conclusions
Pastry: Object distribution
objId
Consistent hashing [Karger et al. ‘97]
128 bit circular id space
nodeIds (uniform random)
objIds (uniform random)
Invariant: node with numerically closest nodeId maintains object
nodeIds
Id space: 0 to 2^128 - 1 (circular)
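The invariant above has a useful consequence that a short Python sketch can check: when a node joins, an object either stays with its previous node or moves to the newcomer. The helper names, the node and object labels, and the truncated SHA-1 id derivation are assumptions for illustration only.

```python
import hashlib

ID_SPACE = 1 << 128  # 128-bit circular id space


def to_id(name: str) -> int:
    # Assumption: ids are the first 128 bits of a SHA-1 digest.
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:16], "big")


def ring_distance(a: int, b: int) -> int:
    d = abs(a - b)
    return min(d, ID_SPACE - d)


def owner(obj_id: int, node_ids) -> int:
    # Node with the numerically closest nodeId maintains the object.
    return min(node_ids, key=lambda nid: (ring_distance(nid, obj_id), nid))


nodes = [to_id(f"node-{i}") for i in range(8)]
objects = [to_id(f"object-{i}") for i in range(200)]

before = {obj: owner(obj, nodes) for obj in objects}
newcomer = to_id("node-new")
after = {obj: owner(obj, nodes + [newcomer]) for obj in objects}

# Objects whose responsible node changed because of the join.
moved = [obj for obj in objects if before[obj] != after[obj]]
print(f"{len(moved)} of {len(objects)} objects moved to the new node")
```

Every entry in moved now belongs to the newcomer, since adding a candidate to a closest-node search can only change the winner to that candidate.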
Pastry: Object insertion/lookup
X
Route(X)
Msg with key X is routed to live node with nodeId closest to X
Problem: complete routing table not feasible
Id space: 0 to 2^128 - 1 (circular)
Pastry: Routing
Tradeoff
• O(log N) routing table size
• O(log N) message forwarding steps
Routing table of #65a1fcx (log16 N rows)
Row 0: 0x 1x 2x 3x 4x 5x 7x 8x 9x ax bx cx dx ex fx
Row 1: 60x 61x 62x 63x 64x 66x 67x 68x 69x 6ax 6bx 6cx 6dx 6ex 6fx
Row 2: 650x 651x 652x 653x 654x 655x 656x 657x 658x 659x 65bx 65cx 65dx 65ex 65fx
Row 3: 65a0x 65a2x 65a3x 65a4x 65a5x 65a6x 65a7x 65a8x 65a9x 65aax 65abx 65acx 65adx 65aex 65afx
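The row and column layout can be sketched in Python: a known node goes into the row given by the length of the prefix it shares with the local nodeId, and into the column given by its next digit. The function names and the sample node ids other than 65a1fc are hypothetical, and keeping the first candidate seen for each slot is a simplification.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_routing_table(local_id: str, known_ids):
    """Return one dict per row, mapping a hex digit (column) to a node id."""
    rows = [{} for _ in range(len(local_id))]
    for node in known_ids:
        if node == local_id:
            continue
        row = shared_prefix_len(local_id, node)
        column = node[row]  # first digit where the node differs from local_id
        rows[row].setdefault(column, node)  # keep the first candidate seen
    return rows


known = ["d13da3", "d4213f", "6032ab", "65b000", "65a0ff", "65a1f0"]
table = build_routing_table("65a1fc", known)
for i, row in enumerate(table):
    print(f"Row {i}: {row}")
```

Here d13da3 and d4213f compete for the same slot (row 0, column d), which is why only one of them appears in the table.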
Pastry: Routing
Properties
• log16 N steps
• O(log N) state
[Figure: Route(d46a1c) from node 65a1fc via d13da3, d4213f and d462ba to d467c4; d471f1 also shown]
Pastry: Leaf sets
Each node maintains IP addresses of the nodes with the L/2 numerically closest larger and smaller nodeIds, respectively.
• routing efficiency/robustness
• fault detection (keep-alive)
• application-specific local coordination
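A leaf set can be sketched in Python as the L/2 predecessors and L/2 successors of a node on the sorted ring of nodeIds. The function name and the small integer ids are assumptions for illustration; a real node would learn these neighbours through the overlay rather than from a global list.

```python
def leaf_set(local_id: int, all_ids, L: int):
    """Return (smaller, larger): the L/2 closest ids on each side, nearest first."""
    ring = sorted(set(all_ids) | {local_id})
    i = ring.index(local_id)
    # Never take more neighbours per side than there are other nodes.
    k = min(L // 2, len(ring) - 1)
    # Modular indexing wraps around the ends of the circular id space.
    larger = [ring[(i + j) % len(ring)] for j in range(1, k + 1)]
    smaller = [ring[(i - j) % len(ring)] for j in range(1, k + 1)]
    return smaller, larger


ids = [10, 20, 30, 40, 50, 60, 70, 80]
smaller, larger = leaf_set(10, ids, L=4)
print(smaller, larger)
```

For node 10 the smaller side wraps around to the top of the id space, so its leaf set contains 80 and 70 as well as 20 and 30.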
Pastry: Routing procedure
if (destination D is within range of our leaf set)
    forward to numerically closest member
else
    let l = length of shared prefix
    let d = value of l-th digit in D's address
    if (R_l^d exists)    (R_l^d = entry at column d, row l)
        forward to R_l^d
    else
        forward to a known node that
        (a) shares at least as long a prefix
        (b) is numerically closer than this node
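The three cases of the routing procedure can be sketched in Python as follows. This is a simplified reading, not the authors' implementation: ids are fixed-length hex strings, the routing table is assumed to be a list of dicts indexed by row and digit, wrap-around at the ends of the id space is ignored, and returning the local id stands for delivering the message locally.

```python
def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(local_id: str, key: str, leaf_set, routing_table) -> str:
    def value(node: str) -> int:
        return int(node, 16)

    def dist(node: str) -> int:
        return abs(value(node) - value(key))

    members = list(leaf_set) + [local_id]
    # Case 1: key falls within the leaf set range (wrap-around ignored).
    if min(map(value, members)) <= value(key) <= max(map(value, members)):
        return min(members, key=dist)
    # Case 2: use the routing table entry at row l, column d.
    l = shared_prefix_len(local_id, key)
    entry = routing_table[l].get(key[l])
    if entry is not None:
        return entry
    # Case 3: any known node with at least as long a prefix that is closer.
    known = members + [n for row in routing_table for n in row.values()]
    closer = [n for n in known
              if shared_prefix_len(n, key) >= l and dist(n) < dist(local_id)]
    return min(closer, key=dist) if closer else local_id


table = [{} for _ in range(6)]
table[0]["d"] = "d13da3"
hop = next_hop("65a1fc", "d46a1c", ["65a1f0", "65a200"], table)
print(hop)
```

With the sample table, node 65a1fc shares no prefix with the key d46a1c, so the message goes to the row 0 entry for digit d.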
Pastry: Performance
Integrity of overlay / message delivery:
• guaranteed unless L/2 simultaneous failures of nodes with adjacent nodeIds
Number of routing hops:
• No failures: < log16 N expected, 128/b + 1 max
• During failure recovery: O(N) worst case, average case much better
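To make the hop bounds concrete, the following Python snippet evaluates log16 N for a few network sizes and the 128/b + 1 maximum. It assumes b = 4, which corresponds to the base-16 digits used in these slides; the function and constant names are chosen for this example.

```python
import math

B = 4          # bits per digit; 2**B = 16 (assumption matching the hex digits)
ID_BITS = 128  # length of a nodeId in bits


def expected_hops_bound(n_nodes: int) -> float:
    """log base 2**B of N, the expected-case bound with no failures."""
    return math.log(n_nodes, 2 ** B)


MAX_HOPS = ID_BITS // B + 1

for n in (1_000, 10_000, 100_000):
    print(f"N = {n:>7}: log16 N = {expected_hops_bound(n):.2f}")
print("max hops:", MAX_HOPS)
```

Under these assumptions the expected bound grows from about 2.5 hops at 1,000 nodes to about 4.2 hops at 100,000 nodes, while the maximum is 33.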
Self-organization
How are the routing tables and leaf sets initialized and maintained?
• Node addition
• Node departure (failure)
Pastry: Node addition
New node: d46a1c
[Figure: Route(d46a1c) from node 65a1fc via d13da3, d4213f and d462ba to d467c4; d471f1 also shown]
Node departure (failure)
Leaf set members exchange heartbeat
• Leaf set repair (eager): request set from farthest live node in set
• Routing table repair (lazy): get table from peers in the same row, then higher rows
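The eager leaf set repair can be sketched in Python for one side of the leaf set. This is a simplified illustration under several assumptions: ids are plain integers, only the larger side is handled, wrap-around is ignored, and fetch_leaf_set is a hypothetical callback that stands in for asking a remote node for its leaf set.

```python
def repair_larger_side(local_id, larger_side, failed, fetch_leaf_set, half):
    """Rebuild the larger half of a leaf set after `failed` stops responding.

    larger_side: current ids larger than local_id, nearest first.
    fetch_leaf_set: hypothetical callback returning a node's leaf set.
    half: L/2, the number of entries to keep on this side.
    """
    live = [n for n in larger_side if n != failed]
    if not live:
        return []
    # Request the set from the farthest live node on this side.
    farthest = live[-1]
    candidates = set(live) | set(fetch_leaf_set(farthest))
    candidates -= {failed, local_id}
    # Keep only ids on the larger side, nearest first.
    return sorted(n for n in candidates if n > local_id)[:half]


# Node 30's leaf set as it would be reported to node 10 (made-up data).
remote_leaf_sets = {30: [10, 20, 40, 50]}
repaired = repair_larger_side(
    10, [20, 30], failed=20,
    fetch_leaf_set=remote_leaf_sets.__getitem__, half=2,
)
print(repaired)
```

Asking the farthest live node is what brings in a replacement: its leaf set reaches one step further along the ring than the local one did.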
Pastry: Experimental results
Prototype
• implemented in Java
• emulated network
• deployed currently at ~25 sites worldwide
Pastry: Average # of hops
L=16, 100k random queries
[Figure: average number of hops (0 to 4.5) vs. number of nodes (1000, 10000, 100000); curves: Pastry, Log(N)]
Outline
• Background
• Pastry
• Pastry proximity routing
Pastry: Proximity routing
Proximity metric = time delay estimated by ping
A node can probe distance to any other node
Each routing table entry uses a node close to the local node (in the proximity space), among all nodes with the appropriate node Id prefix.
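Choosing a routing table entry by proximity can be sketched in Python as follows. The function name, the node ids and the delay values are made up for illustration, and always picking the minimum measured delay is an idealization: the slide only requires a node that is close to the local node.

```python
def pick_entry(candidates, prefix: str, rtt_ms):
    """Among candidates whose id starts with `prefix`, return the one
    with the smallest measured delay, or None if there is none."""
    matching = [n for n in candidates if n.startswith(prefix) and n in rtt_ms]
    if not matching:
        return None
    return min(matching, key=lambda n: rtt_ms[n])


# Hypothetical ping results from the local node, in milliseconds.
rtt_ms = {"d13da3": 40.0, "d4213f": 120.0, "dfe001": 15.0, "65a1fc": 5.0}

best_d = pick_entry(rtt_ms, "d", rtt_ms)
best_d4 = pick_entry(rtt_ms, "d4", rtt_ms)
print(best_d, best_d4)
```

Short prefixes leave many candidates to choose from, so early hops can be very close; longer prefixes narrow the choice and later hops tend to be farther away.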
Pastry: Routes in proximity space
[Figure: Route(d46a1c) from node 65a1fc via d13da3, d4213f and d462ba to d467c4, drawn in both the nodeId space and the proximity space; d471f1 also shown in the nodeId space]
Pastry: Distance traveled
L=16, 100k random queries, Euclidean proximity space
[Figure: relative distance (0.8 to 1.4) vs. number of nodes (1000, 10000, 100000); curves: Pastry, complete routing table]
Pastry: Locality properties
Expected distance traveled by a message in the proximity space is within a small constant of the minimum
Among k nodes with node Ids closest to the key, message likely to reach the node closest to the source node first
[Figure: nodes d467c4, 65a1fc, d13da3, d4213f and d462ba in the proximity space]
Pastry: Node addition
New node: d46a1c
[Figure: Route(d46a1c) from node 65a1fc via d13da3, d4213f and d462ba to d467c4 in the nodeId space; d471f1 also shown]
Pastry delay vs IP delay
[Figure: distance traveled by Pastry message (0 to 2500) vs. distance between source and destination (0 to 1400)]
Mean = 1.59
GATech topology, .5M hosts, 60K nodes, 20K random messages
Pastry: API
• route(M, X): route message M to node with nodeId numerically closest to X
• deliver(M): deliver message M to application
• forwarding(M, X): message M is being forwarded towards key X
• newLeaf(L): report change in leaf set L to application
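One way to picture this API is as a set of callbacks that an application implements, plus a route call offered by the substrate. The Python sketch below follows the operation names on the slide, but the class names, the snake_case spelling new_leaf, and the single-node stub are assumptions for illustration and not the actual Pastry interface.

```python
from abc import ABC, abstractmethod


class PastryApplication(ABC):
    """Callbacks named after the slide: deliver, forwarding, newLeaf."""

    @abstractmethod
    def deliver(self, msg): ...

    @abstractmethod
    def forwarding(self, msg, key): ...

    @abstractmethod
    def new_leaf(self, leaf_set): ...


class RecordingApp(PastryApplication):
    """Hypothetical application that just records the upcalls it receives."""

    def __init__(self):
        self.events = []

    def deliver(self, msg):
        self.events.append(("deliver", msg))

    def forwarding(self, msg, key):
        self.events.append(("forwarding", msg, key))

    def new_leaf(self, leaf_set):
        self.events.append(("new_leaf", tuple(leaf_set)))


class SingleNodeStub:
    """Stand-in for the substrate with one node, which is always closest."""

    def __init__(self, app: PastryApplication):
        self.app = app

    def route(self, msg, key):
        # With a single node there is nothing to forward to: deliver locally.
        self.app.deliver(msg)


app = RecordingApp()
SingleNodeStub(app).route("hello", "d46a1c")
print(app.events)
```

In a multi-node overlay the intermediate nodes would receive forwarding upcalls and only the node closest to the key would receive deliver.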
Pastry: Security
• Secure nodeId assignment
• Secure node join protocols
• Randomized routing
• Byzantine fault-tolerant leaf set membership protocol
Pastry: Summary
• Generic p2p overlay network
• Scalable, fault resilient, self-organizing, secure
• O(log N) routing steps (expected)
• O(log N) routing table size
• Network proximity routing
PAST: File Retrieval
• file located in log16 N steps (expected)
• usually locates replica nearest client C
[Figure: Lookup of fileId issued by client C; k replicas]