![Page 1: Dynamo: Amazon’s Highly - University of Waterloo · Dynamo: Amazon’s Highly Available Key-value Store Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati,](https://reader033.vdocuments.us/reader033/viewer/2022041603/5e3223fa3a91494b187ba261/html5/thumbnails/1.jpg)
Dynamo: Amazon’s Highly Available Key-value Store
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin,
Swaminathan Sivasubramanian, Peter Vosshall, Werner Vogels
SOSP (2007)
Presenter: Shichao Jin
Outline
Background
Design Principles
Techniques
Conclusion
Background
Amazon Shopping Carts
low-latency key-value storage
simple interface: get() & put()
SLA: response within 300ms for 99.9% of requests
scales to hundreds of nodes
combines a collection of well-known distributed-systems techniques
spawned many imitators
Voldemort (LinkedIn)
Cassandra (Facebook)
Design Principles
Always-writable
Incrementally scalable
Symmetrical
Decentralized
Heterogeneous
Techniques
| Problem | Technique |
|---|---|
| Partitioning | Consistent hashing |
| High availability for writes | Eventual consistency; vector clocks with reconciliation during reads |
| Handling temporary failures | Sloppy quorum protocol and hinted handoff |
| Recovering from permanent failures | Anti-entropy using Merkle trees |
| Membership and failure detection | Gossip-based membership protocol and failure detection |
Partition: Consistent Hashing
m nodes
items identified by keys
How do we partition the items across the m nodes?
Partition: Consistent Hashing
Naive approach: node = hash(key) % 4, over node0, node1, node2, node3
e.g. 11 % 4 = 3 → node3
102 % 4 = 2 → node2
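The naive modulo scheme can be sketched in a few lines. This sketch assumes keys are already hashed to integers (the integers 0..999 stand in for real key hashes, and `modulo_owner` is a hypothetical helper name). It reproduces the slide's arithmetic and then counts how many keys change owner when the cluster grows from 4 to 5 nodes.

```python
def modulo_owner(key_hash: int, m: int) -> int:
    """Naive partitioning: the item goes to node number key_hash % m."""
    return key_hash % m

print(modulo_owner(11, 4), modulo_owner(102, 4))  # 3 2  (node3 and node2)

# Treat the integers 0..999 as already-hashed keys and grow the cluster from 4 to 5 nodes.
keys = range(1000)
moved = sum(1 for k in keys if modulo_owner(k, 4) != modulo_owner(k, 5))
print(f"{moved} of {len(keys)} keys move")  # 800 of 1000 keys move
```

Only keys whose hash is congruent to 0..3 modulo 20 keep their node, so 80% of the data would have to be moved for a single added node.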
Partition: Consistent Hashing
Disadvantage of modulo hashing:
the mapping is static: adding or deleting a node changes m, so most keys must be rehashed and moved
Solution:
Consistent Hashing
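Partition: Consistent Hashing
Consistent Hashing:
the hash space is treated as a ring
each node manages the region between itself and its predecessor on the ring
a full rehash is unnecessary: adding or removing a node only moves the keys in its neighbouring region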
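A minimal sketch of ring lookup, assuming node names and keys are placed on the ring by hashing. The Dynamo paper uses MD5 to produce a 128-bit identifier; here the digest is truncated to 32 bits purely for readability, and the `Ring` class and node names are hypothetical. A key is owned by the first node at or clockwise after its position.

```python
import bisect
import hashlib

def ring_pos(name: str) -> int:
    """Map a node name or key to a position on a 2**32 ring (first 4 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")

class Ring:
    def __init__(self, nodes):
        # Sorted (position, node) pairs: walking this list means walking the ring clockwise.
        self._points = sorted((ring_pos(n), n) for n in nodes)
        self._positions = [p for p, _ in self._points]

    def owner(self, key: str) -> str:
        """The first node at or clockwise after the key's position owns the key."""
        i = bisect.bisect_left(self._positions, ring_pos(key))
        return self._points[i % len(self._points)][1]  # wrap past the top of the ring

ring = Ring(["node0", "node1", "node2"])
print(ring.owner("K14"))
```

Binary search over the sorted positions keeps each lookup at O(log m) for m nodes.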
Partition: Consistent Hashing
add node3
delete node1
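The add/delete behaviour can be checked with a small experiment. This is a sketch under the same assumptions as a basic MD5-placed ring (32-bit positions, hypothetical node names and keys). After adding node3, every key that moved now belongs to node3 and came from a single neighbour; after deleting node1, only node1's keys move, and all of them go to its clockwise successor.

```python
import bisect
import hashlib

def ring_pos(name: str) -> int:
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")

def owner(nodes, key):
    """First node at or clockwise after the key's position (wrapping around)."""
    points = sorted((ring_pos(n), n) for n in nodes)
    positions = [p for p, _ in points]
    i = bisect.bisect_left(positions, ring_pos(key))
    return points[i % len(points)][1]

keys = [f"key{i}" for i in range(10_000)]
old = ["node0", "node1", "node2"]
before = {k: owner(old, k) for k in keys}

# Add node3: only the keys that node3 takes over change owner.
after_add = {k: owner(old + ["node3"], k) for k in keys}
moved_add = [k for k in keys if before[k] != after_add[k]]

# Delete node1: only node1's keys change owner.
after_del = {k: owner(["node0", "node2"], k) for k in keys}
moved_del = [k for k in keys if before[k] != after_del[k]]

print(len(moved_add), len(moved_del))
```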
Partition: Consistent Hashing
Problems of Consistent Hashing:
random node positions lead to non-uniform data and load distribution
the basic algorithm ignores heterogeneity in node capacity
Solution:
Virtual Nodes
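Partition: Consistent Hashing
Virtual Nodes:
each physical node owns several positions (virtual nodes) on the ring
when a node fails, its load is dispersed across the remaining nodes
a more powerful node can own more virtual nodes (heterogeneity)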
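A sketch of the basic virtual-node idea, assuming each physical node places a number of tokens on the ring proportional to a hypothetical capacity (the names `A`, `B`, `C`, `big` and the token counts are illustrative). It is not Dynamo's later equal-size-partition strategy. The node with twice the tokens should receive roughly twice the keys, and when `A` fails its keys are spread over all survivors instead of landing on one neighbour.

```python
import bisect
import hashlib
from collections import Counter

def ring_pos(name: str) -> int:
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")

def build_ring(capacity):
    """capacity: physical node -> number of virtual nodes (tokens) it places on the ring."""
    points = sorted((ring_pos(f"{node}#vn{i}"), node)
                    for node, tokens in capacity.items() for i in range(tokens))
    return [p for p, _ in points], [n for _, n in points]

def owner(ring, key):
    positions, nodes = ring
    i = bisect.bisect_left(positions, ring_pos(key))
    return nodes[i % len(nodes)]

keys = [f"key{i}" for i in range(20_000)]
capacity = {"A": 100, "B": 100, "C": 100, "big": 200}  # "big" assumed twice as powerful
ring = build_ring(capacity)
load = Counter(owner(ring, k) for k in keys)
print(load)  # "big" should hold roughly twice as many keys as A, B or C

# Fail node A: rebuild the ring without its tokens and see who inherits A's keys.
ring2 = build_ring({n: t for n, t in capacity.items() if n != "A"})
receivers = Counter(owner(ring2, k) for k in keys if owner(ring, k) == "A")
print(receivers)  # A's keys are spread over B, C and big
```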
Replication
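An example of replication:
N = 3: each key is stored on N nodes
key K is stored at B and replicated at the next two nodes clockwise, C and D
B, C, D form K’s preference list
replication provides fault tolerance and availability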
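With virtual nodes, walking N steps clockwise could hit the same physical node twice, so the Dynamo paper describes building the preference list by skipping positions until it holds N distinct physical nodes. A minimal sketch, using a hypothetical ring whose positions (0..99) are hand-picked so the walk is easy to follow; `preference_list` is an illustrative name.

```python
import bisect

def preference_list(ring, key_pos, n):
    """ring: sorted (position, physical_node) pairs.

    Return the first n distinct physical nodes found walking clockwise from key_pos.
    """
    positions = [p for p, _ in ring]
    start = bisect.bisect_left(positions, key_pos)
    result = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in result:  # skip further virtual nodes of an already-chosen node
            result.append(node)
            if len(result) == n:
                break
    return result

ring = [(10, "A"), (20, "B"), (30, "C"), (40, "B"), (50, "D"), (60, "E"), (70, "A")]
print(preference_list(ring, 15, 3))  # ['B', 'C', 'D']  (B's second token at 40 is skipped)
print(preference_list(ring, 65, 3))  # ['A', 'B', 'C']  (wraps around the ring)
```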
High Availability for Writes
Concurrent Writes:
Application: Shopping Cart
a distributed RDBMS would use Two-Phase Commit, which cannot commit writes while a participant is unreachable
High Availability for Writes
Concurrent Writes:
Problem: two (or more) versions of the same data item
Possible solution: timestamps (How?)
Dynamo: Vector Clocks
(Figure: replicas N1, N2, N3 each hold K14 = V14; concurrent writes then leave two divergent versions, V14’ and V14’’.)
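Why a plain timestamp is not enough: with last-write-wins, two concurrent updates are totally ordered by wall-clock time and one of them is silently discarded. The shopping-cart contents and timestamps below are hypothetical, and `last_write_wins` is an illustrative helper, not Dynamo's reconciliation.

```python
from dataclasses import dataclass

@dataclass
class Version:
    cart: frozenset
    timestamp: float  # wall-clock time of the write (hypothetical values)

def last_write_wins(a: Version, b: Version) -> Version:
    return a if a.timestamp >= b.timestamp else b

base = frozenset({"book"})
v1 = Version(base | {"pen"}, timestamp=100.0)  # client 1 adds a pen via N1
v2 = Version(base | {"mug"}, timestamp=100.5)  # client 2 concurrently adds a mug via N2
winner = last_write_wins(v1, v2)
print(sorted(winner.cart))  # ['book', 'mug']: the pen is silently lost
```

Clock skew between nodes makes this worse, since the "later" write may not even be the one that happened last.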
High Availability for Writes
Vector Clocks:
logical clock: a list of (node, counter) pairs
captures the (partial) causal order between versions
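A sketch of how vector clocks evolve, modelled on the version-evolution example in the Dynamo paper (nodes Sx, Sy, Sz). Clocks are represented here as plain dicts from node to counter, and `increment` / `merge` are illustrative names. The coordinating node bumps its own counter on every put(); when a client reconciles two concurrent versions, the new clock is the element-wise maximum, bumped by the coordinator of that write.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with `node`'s counter bumped, as a coordinator does on put()."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def merge(a: dict, b: dict) -> dict:
    """Element-wise maximum: a clock that descends from both a and b."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

d1 = increment({}, "Sx")   # Sx:1
d2 = increment(d1, "Sx")   # Sx:2
d3 = increment(d2, "Sy")   # Sx:2, Sy:1
d4 = increment(d2, "Sz")   # Sx:2, Sz:1  (concurrent with d3)
d5 = increment(merge(d3, d4), "Sx")  # client reconciles d3 and d4; Sx coordinates the write
print(sorted(d5.items()))  # [('Sx', 3), ('Sy', 1), ('Sz', 1)]
```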
High Availability for Writes
How do we determine the ordering of versions?
(A:1, B:1, C:1) < (A:3, B:1, C:1)
(A:1, B:1, C:1) ? (A:2, C:1)
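The two comparisons above can be checked mechanically. One clock descends from another if every counter is greater than or equal (a missing entry counts as 0); if neither descends from the other, the versions are concurrent and must be reconciled. A minimal sketch with illustrative function names:

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a >= clock b in every entry (missing entries count as 0)."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a: dict, b: dict) -> str:
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if b_ge_a:
        return "before"      # a happened before b: b can safely replace a
    if a_ge_b:
        return "after"
    return "concurrent"      # neither dominates: a conflict to reconcile

print(compare({"A": 1, "B": 1, "C": 1}, {"A": 3, "B": 1, "C": 1}))  # before
print(compare({"A": 1, "B": 1, "C": 1}, {"A": 2, "C": 1}))          # concurrent
```

In the second case A’s counter favours the right-hand clock but B’s favours the left-hand one, so the order is only partial.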
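Consistency: Strict Quorum
Eventual Consistency:
given enough time, all updates will propagate through the system
Read after write:
(Figure: replicas N1, N2, N3 hold K14 = V14; after a write, two of them hold V14’ while the third still holds V14, so a read served by that replica returns stale data.)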
Consistency: Strict Quorum
Strict Quorum:
goal: a read always sees the latest write
define a replica set of size N
put() waits for acks from at least W replicas
get() waits for responses from at least R replicas
W+R > N
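The W + R > N rule works because any W replicas and any R replicas out of N must share at least one node. A brute-force sketch (illustrative function name) that checks every possible write set against every possible read set:

```python
from itertools import combinations

def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every W-subset and every R-subset of N replicas share at least one replica."""
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))

print(quorums_always_overlap(3, 2, 2))  # True   (W + R = 4 > 3)
print(quorums_always_overlap(3, 1, 2))  # False  (W + R = 3, not > 3)
```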
Consistency: Strict Quorum
Strict Quorum Example:
N=3, W=2, R=2
replica set for K14: {N1, N2, N3}
assume put() on N3 fails
(Figure: put(K14, V14) is stored on N1 and N2; the write to N3 fails.)
Consistency: Strict Quorum
Strict Quorum Example:
Now, issuing get() to any two of the three nodes will return the latest value, because at least one of the two holds V14
(Figure: get(K14): N1 and N2 return V14, N3 returns nil.)
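A small simulation of the example, assuming each replica stores a (version, value) pair and that a reader keeps the response with the highest version. The integer version is a simplification standing in for a vector clock, and `quorum_get` is an illustrative name. Every possible read quorum of size R = 2 returns V14.

```python
from itertools import combinations

# State after put(K14, V14) with N = 3, W = 2: the write to N3 failed.
# None plays the role of the slide's nil (no copy of K14 on that replica).
replicas = {"N1": (1, "V14"), "N2": (1, "V14"), "N3": None}

def quorum_get(nodes):
    """Read from the given nodes and return the value with the highest version."""
    responses = [replicas[n] for n in nodes if replicas[n] is not None]
    return max(responses)[1] if responses else None

for pair in combinations(replicas, 2):  # every read quorum of size R = 2
    print(pair, quorum_get(pair))       # always V14
```

With R = 1, a read that happens to reach only N3 would return nil, which is exactly what W + R > N rules out.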
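Consistency: Strict Quorum
Why does a strict quorum work? Because W + R > N, every read quorum overlaps every write quorum, so at least one replica in any read holds the latest write.
Tune W, R, N:
optimized for writes: set W small (R must then grow to keep W + R > N)
optimized for reads: set R small (W must then grow)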
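For a fixed N the valid tunings are easy to enumerate. A sketch for N = 3: (1, 3) favours writes, (3, 1) favours reads, and (2, 2) balances the two; the Dynamo paper reports (N, R, W) = (3, 2, 2) as a common configuration.

```python
N = 3
# All (W, R) pairs with 1 <= W, R <= N that satisfy the strict-quorum rule W + R > N.
valid = [(w, r) for w in range(1, N + 1) for r in range(1, N + 1) if w + r > N]
print(valid)  # [(1, 3), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
```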
Temporary Failure: Hinted Handoff
Hinted Handoff (Sloppy Quorum)
reads and writes go to the first N healthy nodes on the ring
a healthy node accepts writes on behalf of a node that is down
and hands the data back when the down node recovers
set W = 3, N = 3
the write does not have to wait for B to recover
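A sketch of sloppy quorum with hinted handoff for the W = N = 3 case above. Everything here is hypothetical and simplified: a five-node ring listed clockwise, a key whose preference list is B, C, D, and in-memory lists as storage. With B down, the write goes to the next healthy node E along with a hint naming B, so all three acknowledgements arrive; when B recovers, E hands the data back.

```python
RING = ["A", "B", "C", "D", "E"]  # hypothetical ring, listed clockwise
N, W = 3, 3
stores = {node: [] for node in RING}  # node -> list of (key, value, hint)

def sloppy_targets(start, down):
    """(node, hint) for the first N healthy nodes clockwise from `start`.

    A stand-in outside the intended preference list gets a hint naming the down node it replaces.
    """
    i = RING.index(start)
    intended = [RING[(i + k) % len(RING)] for k in range(N)]
    missing = [node for node in intended if node in down]
    targets = []
    for k in range(len(RING)):
        node = RING[(i + k) % len(RING)]
        if node in down:
            continue
        hint = None if node in intended else missing.pop(0)
        targets.append((node, hint))
        if len(targets) == N:
            break
    return targets

def put(key, value, start, down):
    """Store on the sloppy quorum; succeed once W healthy nodes hold the value."""
    targets = sloppy_targets(start, down)
    for node, hint in targets:
        stores[node].append((key, value, hint))
    return len(targets) >= W

def handoff(recovered):
    """When `recovered` comes back, every node returns the hinted writes it holds for it."""
    for items in stores.values():
        for key, value, hint in [item for item in items if item[2] == recovered]:
            stores[recovered].append((key, value, None))
            items.remove((key, value, hint))

print(put("K", "V", start="B", down={"B"}))  # True: C, D and E acknowledge
print(stores["E"])                           # [('K', 'V', 'B')]
handoff("B")
print(stores["B"], stores["E"])              # [('K', 'V', None)] []
```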
Temporary Failure: Hinted Handoff
Sloppy Quorum
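Permanent Failure: Replica Synchronization
Replica Synchronization (Merkle tree)
hierarchical checksums: leaves are hashes of individual keys' values, parents are hashes of their children
replicas compare roots first and descend only into subtrees that differ
executed periodically or when membership changes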
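A minimal Merkle-tree comparison, assuming both replicas cover the same key range and that the number of keys is a power of two (to keep the tree construction short). SHA-256 is an arbitrary choice here, and the helper names and data are hypothetical. The replicas exchange hashes top-down and only descend into subtrees whose hashes differ, so one stale key is found without transferring the whole range.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build(leaves):
    """Return the tree as a list of levels, leaves first; assumes len(leaves) is a power of two."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def leaf_hashes(store, keys):
    return [h(f"{k}={store.get(k)}".encode()) for k in keys]

def diff(a, b, level=None, index=0):
    """Visit only subtrees whose hashes differ; return the indices of differing leaves."""
    if level is None:
        level = len(a) - 1  # start at the root
    if a[level][index] == b[level][index]:
        return []
    if level == 0:
        return [index]
    return diff(a, b, level - 1, 2 * index) + diff(a, b, level - 1, 2 * index + 1)

keys = [f"k{i}" for i in range(8)]
r1 = {k: "v" for k in keys}
r2 = dict(r1, k5="stale")  # replica 2 missed one update
t1, t2 = build(leaf_hashes(r1, keys)), build(leaf_hashes(r2, keys))
print([keys[i] for i in diff(t1, t2)])  # ['k5']
```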
Conclusion
Consistent Hashing
Vector Clocks
Eventual consistency
Strict & Sloppy Quorum
Merkle Trees
Q & A