Distributed k-ary System: Algorithms for Distributed Hash Tables
Ali Ghodsi (aligh@kth.se)
http://www.sics.se/~ali/thesis/
PhD Defense, 7th December 2006, KTH/Royal Institute of Technology
Presentation Overview
• Gentle introduction to DHTs
• Contributions
• The future
What’s a Distributed Hash Table (DHT)?
• An ordinary hash table, which is distributed
• Every node provides a lookup operation
  • Provides the value associated with a key
• Nodes keep routing pointers
  • If an item is not found locally, the lookup is routed to another node

Key        Value
Alexander  Berlin
Ali        Stockholm
Marina     Gothenburg
Peter      Louvain-la-Neuve
Seif       Stockholm
Stefan     Stockholm
So what?
• Characteristic properties
  • Scalability
    • Number of nodes can be huge
    • Number of items can be huge
  • Self-management in the presence of joins/leaves/failures
    • Routing information
    • Data items

Time to find data is logarithmic, and the size of the routing tables is logarithmic. Example: log2(1,000,000) ≈ 20. EFFICIENT!

Each node stores a number of items proportional to the number of nodes. Typically, with D items and n nodes: store D/n items per node, and move D/n items when nodes join/leave/fail. EFFICIENT!

Self-management of routing info: ensure routing information is up to date.
Self-management of items: ensure that data is always replicated and available.
Presentation Overview
• …
• What’s been the general motivation for DHTs?
• …
Traditional Motivation (1/2)
• Peer-to-peer file sharing is very popular
• Napster (central index)
  • Completely centralized
  • Central server knows who has what
  • Judicial problems
• Gnutella (decentralized index)
  • Completely decentralized
  • Ask everyone you know to find data
  • Very inefficient
Traditional Motivation (2/2)
• Grand vision of DHTs
  • Provide efficient file sharing
  • Quote from Chord: ”In particular, [Chord] can help avoid single points of failure or control that systems like Napster possess, and the lack of scalability that systems like Gnutella display because of their widespread use of broadcasts.” [Stoica et al. 2001]
• Hidden assumptions
  • Millions of unreliable nodes
  • Users can switch off their computers at any time (leave = failure)
  • Extreme dynamism (nodes joining/leaving/failing)
  • Heterogeneity of computers and latencies
  • Untrusted nodes
Our philosophy
• A DHT is a useful data structure
• The assumptions might not be true
  • Moderate amount of dynamism
  • Leave is not the same thing as failure
  • Dedicated servers
  • Nodes can be trusted
  • Less heterogeneity
• Our goal is to achieve more given these stronger assumptions
Presentation Overview
• …
• How to construct a DHT?
• …
How to construct a DHT (Chord)?
• Use a logical name space, called the identifier space, consisting of the identifiers {0, 1, 2, …, N−1}
• The identifier space is a logical ring modulo N
• Every node picks a random identifier
• Example:
  • Space N=16: {0, …, 15}
  • Five nodes a, b, c, d, e
    • a picks 6
    • b picks 5
    • c picks 0
    • d picks 11
    • e picks 2

(Figure: identifier ring 0–15 with nodes at 0, 2, 5, 6, and 11)
Definition of Successor
• The successor of an identifier is the first node met going in the clockwise direction, starting at the identifier
• Example
  • succ(12)=14
  • succ(15)=2
  • succ(6)=6

(Figure: identifier ring 0–15)
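The successor rule above can be sketched in a few lines of Python (a minimal illustration, not the thesis code; the node set {2, 6, 14} is chosen only to reproduce the slide's examples):

```python
# A minimal sketch of succ() on a ring of size N = 16.
N = 16

def succ(nodes, ident):
    """First node met going clockwise from ident (inclusive)."""
    ident %= N
    candidates = sorted(nodes)
    for n in candidates:
        if n >= ident:
            return n
    return candidates[0]  # wrap around past N-1

nodes = {2, 6, 14}
assert succ(nodes, 12) == 14
assert succ(nodes, 15) == 2
assert succ(nodes, 6) == 6
```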
Where to store data (Chord)?
• Use a globally known hash function, H
• Each item <key, value> gets the identifier H(key)
• Store each item at its successor
  • Node succ(H(k)) is responsible for item k
• Example
  • H(”Marina”)=12
  • H(”Peter”)=2
  • H(”Seif”)=9
  • H(”Stefan”)=14

(Figure: identifier ring 0–15 with each item stored at its successor)

Key        Value
Alexander  Berlin
Marina     Gothenburg
Peter      Louvain-la-Neuve
Seif       Stockholm
Stefan     Stockholm
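Item placement can be sketched directly from the rule "store at succ(H(key))". The H values below are the ones stated on the slide (not a real hash function), and the ring {0, 2, 5, 6, 11} is the deck's running example:

```python
# Sketch: each item is stored at succ(H(key)) on the example ring.
N = 16
nodes = {0, 2, 5, 6, 11}
H = {"Marina": 12, "Peter": 2, "Seif": 9, "Stefan": 14}  # values from the slide

def succ(ident):
    for n in sorted(nodes):
        if n >= ident % N:
            return n
    return min(nodes)  # wrap around

placement = {key: succ(h) for key, h in H.items()}
assert placement["Peter"] == 2
assert placement["Seif"] == 11   # consistent with the lookup example later
assert placement["Marina"] == 0  # 12 wraps around past 15 to node 0
```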
Where to point (Chord)?
• Each node points to its successor
  • The successor of a node n is succ(n+1)
  • Known as a node’s succ pointer
• Each node points to its predecessor
  • The first node met in the anti-clockwise direction, starting at n−1
  • Known as a node’s pred pointer
• Example
  • 0’s successor is succ(1)=2
  • 2’s successor is succ(3)=5
  • 5’s successor is succ(6)=6
  • 6’s successor is succ(7)=11
  • 11’s successor is succ(12)=0

(Figure: identifier ring 0–15 with succ pointers between nodes 0, 2, 5, 6, 11)
DHT Lookup
• To look up a key k
  • Calculate H(k)
  • Follow succ pointers until item k is found
• Example: look up ”Seif” at node 2
  • H(”Seif”)=9
  • Traverse nodes 2, 5, 6, 11 (BINGO)
  • Return “Stockholm” to the initiator

(Figure: identifier ring 0–15; the lookup for identifier 9 travels from node 2 via 5 and 6 to node 11)

Key        Value
Alexander  Berlin
Marina     Gothenburg
Peter      Louvain-la-Neuve
Seif       Stockholm
Stefan     Stockholm
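The succ-pointer walk can be sketched as follows (an illustrative model of the slide's example, with hypothetical helper names; not the thesis code):

```python
# Sketch of the lookup walk: follow succ pointers from the starting node
# until the node responsible for H(k) is reached (ring {0, 2, 5, 6, 11}).
N = 16
ring = [0, 2, 5, 6, 11]

def succ_node(n):
    i = ring.index(n)
    return ring[(i + 1) % len(ring)]

def responsible(n, ident):
    # node n is responsible for ident if ident lies in (pred(n), n]
    i = ring.index(n)
    pred = ring[i - 1]
    ident %= N
    if pred < n:
        return pred < ident <= n
    return ident > pred or ident <= n  # interval wraps around 0

def lookup(start, ident):
    path, n = [start], start
    while not responsible(n, ident):
        n = succ_node(n)
        path.append(n)
    return n, path

node, path = lookup(2, 9)            # lookup("Seif"), H("Seif") = 9
assert (node, path) == (11, [2, 5, 6, 11])
```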
Speeding up lookups
• If only the pointer to succ(n+1) is used
  • Worst-case lookup time is N hops, for N nodes
• Improving lookup time
  • Point to succ(n+1)
  • Point to succ(n+2)
  • Point to succ(n+4)
  • Point to succ(n+8)
  • …
  • Point to succ(n+2^M)
• The distance to the destination is always at least halved

(Figure: identifier ring 0–15 with the pointers of one node)
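The halving argument can be checked with a toy model (an idealized dense ring where every identifier hosts a node, which is an assumption made here only for illustration):

```python
# Sketch: with pointers to succ(n + 2^i), a greedy hop at least halves the
# remaining distance, so a lookup takes at most log2(N) hops.
N = 1 << 20

def fingers(n):
    return [(n + (1 << i)) % N for i in range(20)]  # every id is a node here

def greedy_lookup(n, target):
    hops = 0
    while n != target:
        dist = (target - n) % N
        # largest finger that does not overshoot the target
        n = max(f for f in fingers(n) if (f - n) % N <= dist)
        hops += 1
    return hops

assert greedy_lookup(0, N - 1) <= 20   # log2(N) = 20 hops in the worst case
```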
Dealing with failures
• Each node keeps a successor-list
  • Pointers to the f closest successors
    • succ(n+1)
    • succ(succ(n+1)+1)
    • succ(succ(succ(n+1)+1)+1)
    • ...
• If the successor fails
  • Replace it with the closest alive successor
• If the predecessor fails
  • Set pred to nil

(Figure: identifier ring 0–15 with a node’s successor-list)
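The failover rule is small enough to sketch directly (hypothetical helper, illustrating the slide's rule only):

```python
# Sketch: a successor-list of the f closest successors; when the direct
# successor fails, fall back to the closest alive entry in the list.
def closest_alive(succ_list, alive):
    for s in succ_list:
        if s in alive:
            return s
    return None  # more than f consecutive failures: the list is exhausted

succ_list = [14, 15, 0]  # f = 3 successors of some node
assert closest_alive(succ_list, alive={15, 0}) == 15  # 14 has failed
assert closest_alive(succ_list, alive=set()) is None
```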
Handling Dynamism
• Periodic stabilization used to make pointers eventually correct
• Try pointing succ to closest alive successor
• Try pointing pred to closest alive predecessor
Presentation Overview
• Gentle introduction to DHTs
• Contributions
• The future
Outline
• …
• Lookup consistency
• …
Problems with periodic stabilization
• Joins and leaves can result in inconsistent lookup results
  • At node 12, lookup(14)=14
  • At node 10, lookup(14)=15

(Figure: ring segment with nodes 10, 12, 14, 15)
Problems with periodic stabilization
• Leaves can result in routing failures

(Figure: ring segment with nodes 10, 13, 16)
Problems with periodic stabilization
• Too many leaves destroy the system
• Required: (#leaves + #failures) per round < |successor-list|

(Figure: ring segment with nodes 10, 11, 12, 14, 15)
Outline
• …
• Atomic Ring Maintenance
• …
Atomic Ring Maintenance
• Differentiate leaves from failures
• Leave is a synchronized departure
• Failure is a crash-stop
• Initially assume no failures
• Build a ring initially
Atomic Ring Maintenance
• Separate the parts of the problem
• Concurrency control
• Serialize neighboring joins/leaves
• Lookup consistency
Naïve Approach
• Each node i hosts a lock called L_i
• For p to join or leave:
  • First acquire L_{p.pred}
  • Second acquire L_p
  • Third acquire L_{p.succ}
  • Thereafter update the relevant pointers
• Can lead to deadlocks
Our Approach to Concurrency Control
• Each node i hosts a lock called L_i
• For p to join or leave:
  • First acquire L_p
  • Thereafter acquire L_{p.succ}
  • Thereafter update the relevant pointers
• Each lock has a lock queue
  • Nodes waiting to acquire the lock
Safety
• Non-interference theorem:
  • When node p acquires both locks:
    • Node p’s successor cannot leave
    • Node p’s ”predecessor” cannot leave
    • Other joins cannot affect the ”relevant” pointers
Dining Philosophers
• The problem is similar to the Dining Philosophers’ problem
  • Five philosophers around a table
  • One fork between each pair of philosophers (5 forks)
  • Philosophers eat and think
  • To eat:
    • grab the left fork
    • then grab the right fork
Deadlocks
• Can result in a deadlock
  • If all nodes acquire their first lock
  • Every node waits indefinitely for its second lock
• Solution from the Dining Philosophers’ problem
  • Introduce asymmetry
  • One node acquires its locks in reverse order
• The node with the highest identifier reverses
  • If n > n.succ, then n has the highest identifier (its succ pointer wraps around)
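The asymmetry rule can be sketched as a lock-ordering function (an illustrative sketch of the idea, not the thesis algorithm): by reversing the order at the wrap-around node, every pair of locks is acquired in a globally consistent order, so no circular wait can form.

```python
# Sketch: each node acquires (L_n, L_{n.succ}) in that order, except the
# highest node, which reverses the acquisition order.
def lock_order(n, succ_n):
    if n > succ_n:              # wrap-around: n is the highest node
        return (succ_n, n)      # reversed acquisition order
    return (n, succ_n)

# On the ring 0 -> 2 -> 5 -> 0, the orders are:
assert lock_order(0, 2) == (0, 2)
assert lock_order(2, 5) == (2, 5)
assert lock_order(5, 0) == (0, 5)  # node 5 reverses: no cycle of waits
```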
Pitfalls
• Join adds a node/“philosopher”
• Solution: some requests in the lock queue are forwarded to the new node

(Figure: node 12 joining between nodes 10 and 14, with lock-queue entries being forwarded)
Pitfalls
• Leave removes a node/“philosopher”
• Problem: if the leaving node hands its lock queue to its successor, nodes can get a worse position in the queue: starvation
• Use forwarding to avoid starvation
• The lock queue is empty after a local leave request
Correctness
• Liveness theorem:
  • The algorithm is starvation free
  • Also free from deadlocks and livelocks
  • Every joining/leaving node will eventually succeed in acquiring both locks
Performance drawbacks
• If many neighboring nodes are leaving
  • All grab their local lock
  • Progress becomes sequential
• Solution
  • Randomized locking
  • Release the locks and retry
  • Liveness with high probability

(Figure: ring segment with nodes 10, 12, 14, 15)
Lookup consistency: leaves
• So far we dealt with concurrent joins/leaves
  • Now look at concurrent joins/leaves/lookups
• Lookup consistency (informally):
  • At any time, only one node is responsible for any key
  • Joins/leaves should “not affect” the functionality of lookups
Lookup consistency
• The goal is to make joins and leaves appear as if they happened instantaneously
• Every leave has a leave point
  • A point in global time where the whole system behaves as if the node left instantaneously
• Implemented with a LeaveForward flag
  • The leaving node forwards messages to its successor while LeaveForward is true
Leave Algorithm

Message sequence between Node p, Node q (leaving), and Node r:
1. q sets LeaveForward := true
2. r receives <LeavePoint, pred=p> (the leave point) and sets pred := p
3. p receives <UpdateSucc, succ=r> and sets succ := r
4. q receives <StopForwarding> and sets LeaveForward := false
Lookup consistency: joins
• Every join has a join point
  • A point in global time where the whole system behaves as if the node joined instantaneously
• Implemented with a JoinForward flag
  • The successor of a joining node forwards messages to the new node while JoinForward is true
Join Algorithm

Message sequence between Node p, Node q (joining), and Node r:
1. q sets pred := p and succ := r
2. r receives <JoinPoint, pred=p> (the join point) and sets JoinForward := true, oldpred := pred, pred := q
3. p receives <UpdateSucc, succ=q> and sets succ := q; the remaining pred pointer is fixed with <UpdatePred, pred=q>
4. r receives <StopForwarding> and sets JoinForward := false; q receives <Finish>
Outline
• …
• What about failures?
• …
Dealing with Failures
• We prove it is impossible to provide lookup consistency on the Internet, assuming:
  • Availability (always eventually answer)
  • Lookup consistency
  • Partition tolerance
• Failure detectors can behave as if the network partitioned
Dealing with Failures
• We provide a fault-tolerant atomic ring
  • Locks are leased
  • Guarantees locks are always eventually released
• Periodic stabilization ensures
  • An eventually correct ring
  • Eventual lookup consistency
Contributions
• Lookup consistency in the presence of joins/leaves
  • The system is not affected by joins/leaves
  • Inserts do not “disappear”
• No routing failures when nodes leave
• The number of leaves is not bounded
Related Work
• Li, Misra, Plaxton (’04, ’06) have a similar solution
• Advantages
  • Assertional reasoning
  • Almost machine-verifiable proofs
• Disadvantages
  • Starvation possible
  • Not used for lookup consistency
  • A failure-free environment is assumed
Related Work
• Lynch, Malkhi, Ratajczak (’02): position paper with pseudocode in the appendix
• Advantages
  • First to propose atomic lookup consistency
• Disadvantages
  • No proofs
  • A message might be sent to a node that has left
  • Does not work for joins and leaves together
  • Failures are not dealt with
Outline
• …
• Additional Pointers on the Ring
• …
Routing
• Generalization of Chord to provide arbitrary arity
• Provides log_k(n) hops per lookup
  • k is a configurable parameter
  • n is the number of nodes
• Instead of only log2(n)
Achieving log_k(n) lookup
• Each node has log_k(N) levels, N = k^L
• Each level contains k intervals
• Example: k=4, N=64 (4^3), node 0

Node 0, Level 1: I0 = 0…15, I1 = 16…31, I2 = 32…47, I3 = 48…63

(Figure: identifier ring 0–63 with marks at 0, 4, 8, 12, 16, 32, 48)
Achieving log_k(n) lookup
• Each node has log_k(N) levels, N = k^L
• Each level contains k intervals
• Example: k=4, N=64 (4^3), node 0

Node 0, Level 1: I0 = 0…15, I1 = 16…31, I2 = 32…47, I3 = 48…63
Node 0, Level 2: I0 = 0…3, I1 = 4…7, I2 = 8…11, I3 = 12…15

(Figure: identifier ring 0–63 with marks at 0, 4, 8, 12, 16, 32, 48)
Achieving log_k(n) lookup
• Each node has log_k(N) levels, N = k^L
• Each level contains k intervals
• Example: k=4, N=64 (4^3), node 0

Node 0, Level 1: I0 = 0…15, I1 = 16…31, I2 = 32…47, I3 = 48…63
Node 0, Level 2: I0 = 0…3, I1 = 4…7, I2 = 8…11, I3 = 12…15
Node 0, Level 3: I0 = 0, I1 = 1, I2 = 2, I3 = 3

(Figure: identifier ring 0–63 with marks at 0, 4, 8, 12, 16, 32, 48)
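The level/interval decomposition above can be computed mechanically (a small sketch reproducing the k=4, N=64 example; helper names are mine):

```python
# Sketch: node 0's k intervals at each of the log_k(N) levels (k=4, N=64).
import math

k, N = 4, 64
L = round(math.log(N, k))           # number of levels: 3

def intervals(level):               # level = 1 .. L
    width = N // k**level           # interval width shrinks by k per level
    return [(i * width, (i + 1) * width - 1) for i in range(k)]

assert intervals(1) == [(0, 15), (16, 31), (32, 47), (48, 63)]
assert intervals(2) == [(0, 3), (4, 7), (8, 11), (12, 15)]
assert intervals(3) == [(0, 0), (1, 1), (2, 2), (3, 3)]
```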
Arity important
• The maximum number of hops can be configured
• Choosing k = N^(1/r) gives log_k(N) = log_{N^(1/r)}(N) = r hops
• Example: a 2-hop system uses k = N^(1/2) = √N
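A quick numeric check of this trade-off (illustrative only):

```python
# Check: choosing k = N^(1/r) yields log_k(N) = r hops.
import math

N = 10**6
for r in (1, 2, 3):
    k = N ** (1 / r)
    assert round(math.log(N, k)) == r   # log_k(N) = r

# e.g. a 2-hop system over a million identifiers uses k = sqrt(N) = 1000
assert round((10**6) ** 0.5) == 1000
```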
Placing pointers
• Each node has (k−1)·log_k(N) pointers
• Node p’s pointers point at

  f(i) = p + (1 + ((i−1) mod (k−1))) · k^⌊(i−1)/(k−1)⌋

• Node 0’s pointers (k=4, N=64): f(1)=1, f(2)=2, f(3)=3, f(4)=4, f(5)=8, f(6)=12, f(7)=16, f(8)=32, f(9)=48

(Figure: identifier ring 0–63 with node 0’s pointers)
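The placement formula, as reconstructed above, reproduces exactly the listed pointers for node 0 (a verification sketch, not the thesis code):

```python
# Sketch of the pointer-placement formula:
# f(i) = p + (1 + ((i-1) mod (k-1))) * k^((i-1) div (k-1)), mod N.
k, N, p = 4, 64, 0
M = (k - 1) * 3                 # (k-1) * log_k(N) = 9 pointers

def f(i):
    j = 1 + (i - 1) % (k - 1)   # position within the level: 1 .. k-1
    level = (i - 1) // (k - 1)  # level: 0 .. log_k(N)-1
    return (p + j * k**level) % N

assert [f(i) for i in range(1, M + 1)] == [1, 2, 3, 4, 8, 12, 16, 32, 48]
```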
Greedy Routing
• lookup(i) algorithm
  • Use the pointer closest to i, without “overshooting” i
  • If no such pointer exists, succ is responsible for i
Routing with Atomic Ring Maintenance
• Invariant of lookup
  • The last hop is always the predecessor of the responsible node
• Last step in a lookup
  • If JoinForward is true, forward to pred
  • If LeaveForward is true, forward to succ
Avoiding Routing Failures
• If nodes leave, routing failures can occur
• Accounting algorithm
  • Simple algorithm
    • No routing failures for ordinary messages
  • Fault-free algorithm
    • No routing failures at all
• Many cases and interleavings
  • Concurrent joins and leaves, pointers in both directions
General Routing
• Three lookup styles
• Recursive
• Iterative
• Transitive
Reliable Routing
• Reliable lookup for each style
  • If the initiator doesn’t crash, the responsible node is reached
  • No redundant delivery of messages
• General strategy
  • Repeat the operation until success
  • Filter duplicates using unique identifiers
• Iterative lookup
  • Reliability is easy to achieve
• Recursive lookup
  • Several algorithms possible
• Transitive lookup
  • Efficient reliability is hard to achieve
Outline
• …
• One-to-many Communication
• …
Group Communication on an Overlay
• Use the existing routing pointers for group communication
  • A DHT only provides key lookup
  • Complex queries by searching the overlay
  • Limited-horizon broadcast
  • Iterative deepening
• More efficient than Gnutella-like systems
  • No unintended graph partitioning
  • Cheaper topology maintenance [castro04]
Group Communication on an Overlay
• The DHT builds a graph
  • Why not use general graph algorithms?
• We can use the specific structure of DHTs
  • More efficient
  • Avoids redundant messages
Broadcast Algorithms
• Correctness conditions:
  • Termination
    • The algorithm should eventually terminate
  • Coverage
    • All nodes should receive the broadcast message
  • Non-redundancy
    • Each node receives the message at most once
• Initially assume no failures
Naïve Broadcast
• Naïve broadcast algorithm: send the message to succ until the initiator is reached or overshot

(Figure: identifier ring 0–15; the message travels around the ring from the initiator)
Naïve Broadcast
• Naïve broadcast algorithm: send the message to succ until the initiator is reached or overshot
• Improvement
  • The initiator delegates half of the space to a neighbor
  • The idea is applied recursively
  • log(n) time and n messages

(Figure: identifier ring 0–15; the initiator recursively delegates halves of the space)
Simple Broadcast in the Overlay
• The dissertation assumes a general DHT model

event n.SimpleBcast(m, limit)   % initially limit = n
  for i := M downto 1 do
    if u(i) ∈ (n, limit) then
      sendto u(i) : SimpleBcast(m, limit)
      limit := u(i)
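The pseudocode above can be exercised in a toy simulation (illustrative only; u(i) stands for a node's routing pointers, iterated farthest first, and the Chord-style fingers and ring {0, 2, 5, 6, 11} are assumptions for this sketch):

```python
# Toy simulation of SimpleBcast on the example ring.
N = 16
nodes = [0, 2, 5, 6, 11]

def succ(x):
    for c in sorted(nodes):
        if c >= x % N:
            return c
    return min(nodes)

def pointers(n):                    # Chord-style fingers succ(n + 2^i)
    ptrs = {succ(n + 2**i) for i in range(4)} - {n}
    return sorted(ptrs, key=lambda u: (u - n) % N, reverse=True)

def in_open_interval(x, a, b):      # open ring interval (a, b)
    a, b, x = a % N, b % N, x % N
    return (a < x < b) if a < b else (x > a or x < b)

received = []

def simple_bcast(n, limit):
    received.append(n)
    for u in pointers(n):           # i := M downto 1 (farthest first)
        if in_open_interval(u, n, limit):
            simple_bcast(u, limit)  # send with the current limit
            limit = u               # shrink own responsibility

simple_bcast(2, 2)                  # initiator n = 2; initially limit = n
assert sorted(received) == nodes    # coverage
assert len(received) == len(nodes)  # non-redundancy
```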
”Advanced” Broadcast
• Old algorithm on k-ary trees
Getting responses
• Getting a reply
  • Nodes send directly back to the initiator
  • Not scalable
• Simple broadcast with feedback
  • Collect responses back to the initiator
  • Broadcast induces a tree; feedback flows in the reverse direction
• Similar to the simple broadcast algorithm
  • Keeps track of parent (par)
  • Keeps track of children (Ack)
  • Accumulates feedback from children, sends it to the parent
• Atomic ring maintenance
  • Acquire a local lock to ensure nodes do not leave
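A minimal sketch of how feedback folds into the broadcast structure: the recursive return value plays the role of the accumulated Acks a child sends to its parent. The ring model (N, M, node set, u(i) = succ(n + 2^(i-1))) is an assumption made for illustration.

```python
# Simple broadcast with feedback: the broadcast induces a tree, and each node
# returns its accumulated responses to its parent.  Ring model is illustrative.
N, M = 16, 4
nodes = [0, 3, 5, 10, 14]

def succ(x):
    return min(nodes, key=lambda p: (p - x) % N)

def in_open(x, a, b):
    if a == b:
        return x != a
    return 0 < (x - a) % N < (b - a) % N

def bcast_feedback(n, limit):
    acc = {n: "reply-from-%d" % n}              # this node's own response
    for i in range(M, 0, -1):
        u = succ((n + 2**(i - 1)) % N)
        if in_open(u, n, limit):
            acc.update(bcast_feedback(u, limit))  # a child's accumulated feedback
            limit = u
    return acc                                  # accumulated feedback sent to parent (par)

replies = bcast_feedback(0, 0)                  # initiator ends with one reply per node
```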
68
Outline
• …• …
• Advanced One-to-many Communication
• …• …
69
Motivation for Bulk Operation
• Building MyriadStore in 2005
  • Distributed backup using the DKS DHT
• Restoring a 4 MB file
  • Each block (4 KB) is indexed in the DHT
  • Requires 1000 items in the DHT
• Expensive
  • One node making 1000 lookups
  • Marshaling/unmarshaling 1000 requests
70
Bulk Operation
• Define a bulk set I: a set of identifiers
• bulk_operation(m, I): send message m to every node i ∈ I
• Similar correctness properties to broadcast
  • Coverage: all nodes with an identifier in I
  • Termination
  • Non-redundancy
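One way to sketch the bulk operation is as a pruned broadcast: each routing pointer is forwarded only the subset of the bulk set delegated to it, and branches with an empty subset are skipped. The ring model and the exact pruning rule below are illustrative assumptions, not the thesis pseudocode.

```python
# Bulk operation sketch: like simple broadcast, but branch u only receives the
# subset J of the bulk set I falling in its delegated part of the ring, so
# branches with an empty J are pruned.  Ring model is an illustrative assumption.
N, M = 16, 4
nodes = [0, 3, 5, 10, 14]

def succ(x):
    return min(nodes, key=lambda p: (p - x) % N)

def in_open(x, a, b):
    if a == b:
        return x != a
    return 0 < (x - a) % N < (b - a) % N

delivered = []

def bulk(n, I, limit):
    if n in I:
        delivered.append(n)                 # deliver m: n's identifier is in I
    I = set(I)
    for i in range(M, 0, -1):
        u = succ((n + 2**(i - 1)) % N)
        if in_open(u, n, limit):
            J = {x for x in I if x == u or in_open(x, u, limit)}
            if J:                           # prune branches with no targets
                bulk(u, J, limit)
                I -= J
            limit = u

bulk(0, {3, 10, 14}, 0)                     # initiator 0, bulk set I = {3, 10, 14}
```

With I equal to the whole identifier space this degenerates to the broadcast above; with a singleton it traces a single lookup path, matching the two extreme cases discussed later in the deck.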
71
Bulk Owner Operation with Feedback
• Define a bulk set I: a set of identifiers
• bulk_own(m, I): send m to every node responsible for an identifier i ∈ I
• Example
  • Bulk set I = {4}
  • Node 4 might not exist
  • Some node is still responsible for identifier 4
72
Bulk Operation with Feedback
• Define a bulk set I: a set of identifiers
• bulk_feed(m, I)
  • Send message m to every node i ∈ I
  • Accumulate responses back to the initiator
• bulk_own_feed(m, I)
  • Send message m to every node responsible for an i ∈ I
  • Accumulate responses back to the initiator
73
Bulk Properties (1/2)
• No redundant messages
• Maximum log(n) messages per node
74
Bulk Properties (2/2)
• Two extreme cases
• Case 1
  • Bulk set is all identifiers
  • Identical to simple broadcast
  • Message complexity is n
  • Time complexity is log(n)
• Case 2
  • Bulk set is a singleton
  • Identical to an ordinary lookup
  • Message complexity is log(n)
  • Time complexity is log(n)
75
Pseudo-Reliable Broadcast
• Pseudo-reliable broadcast to deal with crash failures
• Coverage property: if the initiator is correct, every node gets the message
• Similar to broadcast with feedback
• Use failure detectors on children
  • If a child responsible for covering interval I fails
  • Use bulk to retry covering interval I
• Filter redundant messages using unique identifiers
• Eventually perfect failure detector needed for termination
  • Inaccuracy only results in redundant messages
76
Applications of bulk operation
• Bulk operation
  • Topology maintenance: update nodes in the bulk set
  • Pseudo-reliable broadcast: re-covering intervals
• Bulk owner
  • Multiple inserts into a DHT
• Bulk owner with feedback
  • Multiple lookups in a DHT
  • Range queries
77
Outline
• …• …
• Replication
• …• …
78
Successor-list replication
• Successor-list replication
  • Replicate a node’s items on its f successors
  • Used in DKS, Chord, Pastry, Koorde, etc.
• Was abandoned in favor of symmetric replication because…
79
Motivation: successor-lists
• If a node joins or leaves, f replicas need to be updated
[Figure: color represents data item; replication degree 3; every color replicated three times]
80
Motivation: successor-lists
• If a node joins or leaves, f replicas need to be updated
[Figure: a node leaves; the yellow, green, red, and blue items need to be re-distributed]
81
Multiple hashing
• Rehashing: store each item <k,v> at
  • succ( H(k) )
  • succ( H(H(k)) )
  • succ( H(H(H(k))) )
  • …
• Multiple hash functions: store each item <k,v> at
  • succ( H1(k) )
  • succ( H2(k) )
  • succ( H3(k) )
  • …
• Advocated by CAN and Tapestry
82
Motivation: multiple hashing
• Example
  • Item <”Seif”, ”Stockholm”>
  • H(”Seif”) = 7, succ(7) = 9
• Node 9 crashes
  • Node 12 should fetch the item from a replica
  • Needs the hash inverse H⁻¹(7) = ”Seif” (impossible)
  • Items dispersed all over the nodes (inefficient)
[Figure: ring with nodes 5, 9, and 12; item <Seif, Stockholm> stored at identifier 7]
83
Symmetric Replication
• Basic idea: replicate identifiers, not nodes
• Associate each identifier i with f other identifiers:
    r(i, k) = (i + k·N/f) mod N,  for 0 ≤ k < f
• The identifier space is partitioned into m equivalence classes
  • The cardinality of each class is f, and m = N/f
• Each node replicates the equivalence class of every identifier it is responsible for
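The formula can be made concrete in a few lines. N = 16 and f = 4 follow the example on the next slide; the function name is my own.

```python
# Symmetric replication: each identifier i is associated with the f identifiers
# r(i, k) = (i + k*N/f) mod N, so the space splits into N/f equivalence classes.
N, f = 16, 4          # identifier space size and replication degree (f divides N)

def replica_ids(i):
    """The equivalence class of identifier i under symmetric replication."""
    return [(i + k * (N // f)) % N for k in range(f)]

# The classes partition the identifier space: N/f classes of cardinality f each.
classes = {frozenset(replica_ids(i)) for i in range(N)}
```

To fetch an item with key i, a node can thus contact succ(r(i, k)) for any k, which is the basis for the parallel lookups mentioned later in the deck.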
84
Symmetric replication
Replication degree f = 4, Space = {0,…,15}
• Congruence classes modulo 4:
  • {0, 4, 8, 12}
  • {1, 5, 9, 13}
  • {2, 6, 10, 14}
  • {3, 7, 11, 15}
[Figure: ring with nodes 0, 3, 5, 10, 14; node 0 stores data 15, 0; node 3 stores 1, 2, 3; node 5 stores 4, 5; node 10 stores 6, 7, 8, 9, 10; node 14 stores 11, 12, 13, 14]
85
Ordinary Chord
Replication degree f = 4, Space = {0,…,15}
• Congruence classes modulo 4:
  • {0, 4, 8, 12}
  • {1, 5, 9, 13}
  • {2, 6, 10, 14}
  • {3, 7, 11, 15}
[Figure: the same ring, where each node additionally stores the three symmetric replicas of its items; e.g. the node responsible for 15, 0 also stores replica groups 3, 4 and 7, 8 and 11, 12]
86
Cheap join/leave
Replication degree f = 4, Space = {0,…,15}
• Congruence classes modulo 4:
  • {0, 4, 8, 12}
  • {1, 5, 9, 13}
  • {2, 6, 10, 14}
  • {3, 7, 11, 15}
[Figure: a node joins the ring; it fetches only the items 11, 12, 7, 8, 3, 4, 0, 15 (the equivalence classes of the identifiers it becomes responsible for) from a single node]
87
Contributions
• Message complexity for join/leave is O(1)
  • Bit complexity remains unchanged
• Handling failures is more complex
  • Bulk operation to fetch data
  • On average log(n) complexity
• Enables parallel lookups
  • Decreasing latencies
  • Increasing robustness
  • Distributed voting
  • Erasure codes
88
Presentation Overview
• …• …
• Summary
• …• …
89
Summary (1/3)
• Atomic ring maintenance
  • Lookup consistency for joins/leaves
  • No routing failures as nodes join/leave
  • No bound on the number of leaves
  • Eventual consistency with failures
• Additional routing pointers
  • k-ary lookup
  • Reliable lookup: no routing failures with additional pointers
90
Summary (2/3)
• Efficient broadcast
  • log(n) time and n message complexity
  • Used in overlay multicast
• Bulk operations
  • Efficient parallel lookups
  • Efficient range queries
91
Summary (3/3)
• Symmetric replication
  • Simple, O(1) message complexity for joins/leaves
  • O(log f) for failures
• Enables parallel lookups
  • Decreasing latencies
  • Increasing robustness
  • Distributed voting
92
Presentation Overview
• Gentle introduction to DHTs
• Contributions
• The future
93
Future Work (1/2)
• Periodic stabilization
  • Prove that it is self-stabilizing
94
Future Work (2/2)
• Replication consistency
  • Atomic consistency is impossible in asynchronous systems
  • Assume partial synchrony
  • Weaker consistency models?
  • Use virtual synchrony
95
Speculative long-term agenda
• The overlay today provides
  • Dynamic membership
  • Identities (max/min avail)
  • Knowledge of only a subset of nodes
  • Shared memory registers
• Revisit distributed computing, assuming an overlay as the basic primitive
  • Leader election
  • Consensus
  • Shared memory consistency (started)
  • Transactions
  • Wave algorithms (started)
• Implement middleware providing these…
96
Acknowledgments
• Seif Haridi
• Luc Onana Alima
• Cosmin Arad
• Per Brand
• Sameh El-Ansary
• Roland Yap
97
THANK YOU
99
Handling joins
• When n joins
  • Find n’s successor with lookup(n)
  • Set succ to n’s successor
  • Stabilization fixes the rest

Periodically at n:
  1. set v := succ.pred
  2. if v ≠ nil and v ∈ (n, succ]
  3.   then set succ := v
  4. send notify(n) to succ

When receiving notify(p) at n:
  1. if pred = nil or p ∈ (pred, n]
  2.   then set pred := p

[Figure: ring with nodes 11, 13, 15]
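The two event handlers can be exercised in a small simulation. This is a hypothetical sketch: the concrete ring, the joining node 7, and the round-based scheduling are assumptions made for illustration.

```python
# Periodic stabilization after a join, following the pseudocode above:
# node 7 joins a correct ring 0 -> 5 -> 10 with succ set by lookup; a few
# rounds of stabilize/notify then splice it in between 5 and 10.
N = 16

class Node:
    def __init__(self, ident):
        self.id, self.succ, self.pred = ident, None, None

def between(x, a, b):
    """True iff x lies in the half-open ring interval (a, b]."""
    return 0 < (x - a) % N <= (b - a) % N

def stabilize(n):
    v = n.succ.pred                                       # 1. set v := succ.pred
    if v is not None and between(v.id, n.id, n.succ.id):  # 2. if v != nil and v in (n, succ]
        n.succ = v                                        # 3.   set succ := v
    notify(n.succ, n)                                     # 4. send notify(n) to succ

def notify(n, p):
    if n.pred is None or between(p.id, n.pred.id, n.id):  # 1. if pred = nil or p in (pred, n]
        n.pred = p                                        # 2.   set pred := p

ring = {i: Node(i) for i in (0, 5, 10)}
for a, b in ((0, 5), (5, 10), (10, 0)):                   # initially correct ring
    ring[a].succ, ring[b].pred = ring[b], ring[a]

ring[7] = Node(7)
ring[7].succ = ring[10]                                   # join: succ := lookup(7)

for _ in range(3):                                        # a few periodic rounds
    for i in sorted(ring):
        stabilize(ring[i])
```

After a couple of rounds node 7 is fully spliced in: 5's succ and 10's pred both point to it, illustrating how stabilization "fixes the rest".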
100
Handling leaves
• When n leaves
  • Just disappear (like a failure)
• When pred is detected as failed
  • Set pred to nil
• When succ is detected as failed
  • Set succ to the closest alive node in the successor list

Periodically at n:
  1. set v := succ.pred
  2. if v ≠ nil and v ∈ (n, succ]
  3.   then set succ := v
  4. send notify(n) to succ

When receiving notify(p) at n:
  1. if pred = nil or p ∈ (pred, n]
  2.   then set pred := p

[Figure: ring with nodes 11, 13, 15]