A Novel Data Placement Model for Highly-Available Storage Systems
Rama, Microsoft Research
Joint work with John MacCormick, Nick Murphy, Kunal Talwar, Udi Wieder, Junfeng Yang, and Lidong Zhou
Introduction
Kinesis:
Framework for placing data and replicas in a data-center storage system
Three Design Principles:
Structured organization
Freedom of choice
Scattered placement
Kinesis Design => Structured Organization
Segmentation: Divide storage servers into k segments
Each segment is an independent hash-based system
Failure Isolation: Place servers that share components in the same segment
r replicas per item, each on a different segment
Reduces impact of correlated failures
Kinesis Design => Freedom of Choice
Balanced Resource Usage:
Multiple-choice paradigm
Write: r out of k choices
Read: 1 out of r choices
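The write and read choices above can be sketched in a few lines of Python. This is a minimal illustration, not the Kinesis implementation: the salted SHA-256 hash, the `(segment, index)` server naming, and the `stored` / `queued` dictionaries are all assumptions made for the example.

```python
import hashlib

def segment_hash(segment, key, servers_per_segment):
    """Hypothetical per-segment hash: salting SHA-256 with the segment id
    stands in for k independent hash functions."""
    digest = hashlib.sha256(f"{segment}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % servers_per_segment

def candidates(key, k, servers_per_segment):
    """One candidate server per segment, named as (segment, index)."""
    return [(s, segment_hash(s, key, servers_per_segment)) for s in range(k)]

def write(key, size, k, r, servers_per_segment, stored, replicas):
    """Write: place r replicas on the r least-filled of the k candidates."""
    cands = candidates(key, k, servers_per_segment)
    chosen = sorted(cands, key=lambda c: stored.get(c, 0))[:r]
    for c in chosen:
        stored[c] = stored.get(c, 0) + size
    replicas[key] = chosen
    return chosen

def read(key, replicas, queued):
    """Read: fetch from the least-loaded of the r replica holders."""
    return min(replicas[key], key=lambda c: queued.get(c, 0))
```

Because every candidate comes from a different segment, the r chosen replicas automatically land on r different segments.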
Kinesis Design => Scattered Data Placement
Independent hash functions
Each server stores a different set of items
Parallel Recovery
Spread recovery load among multiple servers uniformly
Recover faster
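A small experiment makes the parallel-recovery argument concrete. The sketch below is a simplification under stated assumptions: items always use segments 0 to r-1 (no load-based choice), each segment has its own salted SHA-256 hash, and servers are named `(segment, index)`. It counts which surviving servers hold replicas of the items lost on one failed server.

```python
import hashlib
from collections import Counter

def seg_hash(segment, key, n):
    """Hypothetical independent hash per segment (salted SHA-256)."""
    d = hashlib.sha256(f"{segment}:{key}".encode()).digest()
    return int.from_bytes(d[:8], "big") % n

def replica_servers(key, r, n):
    """Simplification: always use segments 0..r-1, one server in each."""
    return [(s, seg_hash(s, key, n)) for s in range(r)]

def recovery_sources(failed, keys, r, n):
    """Count, per surviving server, the items it can help re-replicate."""
    sources = Counter()
    for key in keys:
        servers = replica_servers(key, r, n)
        if failed in servers:
            sources.update(s for s in servers if s != failed)
    return sources

keys = [f"item-{i}" for i in range(2000)]
sources = recovery_sources((0, 0), keys, r=3, n=4)
print(len(sources), "servers share the recovery work")
```

With independent hash functions, the items of the failed server are spread over the servers of the other segments, so many servers can each re-replicate a small share in parallel.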
Motivation => Data Placement Models
Hash-based:
Data is stored on a server determined by a hash function
Server identified by a local computation during reads
Low overhead
Limited control in placing data items and replicas
Directory-based:
Any server can store any data
A directory provides the server to fetch data from
Expensive to maintain a globally-consistent directory in a large-scale system
Can place data items carefully to maintain load balance, avoid correlated failures, etc.
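The contrast between the two models can be shown with a toy lookup for each. Both `hash_lookup` and `Directory` are illustrative names invented here, not part of any system named in the talk.

```python
import hashlib

def hash_lookup(key, num_servers):
    """Hash-based: any client computes the server locally, with no shared
    state, but has no say in where the item goes."""
    d = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(d[:8], "big") % num_servers

class Directory:
    """Directory-based: placement is a free choice, but every lookup
    depends on a mapping that must be kept consistent."""
    def __init__(self):
        self._where = {}

    def place(self, key, server):
        self._where[key] = server

    def lookup(self, key):
        return self._where[key]
```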
Kinesis => Advantages
Enables hash-based storage systems to have advantages of directory-based systems
Near-optimal load balance
Tolerance to shared-component failures
Freedom from problems induced by correlated replica placement
No bottlenecks induced by directory maintenance
Avoids slow data/replica placement algorithms
Evaluation
Theory
Simulations
Experiments
Theory => The “Balls and Bins” Problem
Load balancing is often modeled by the task of throwing balls (items) into bins (servers)
Single-Choice Paradigm:
Throw m balls into n bins:
Pick a bin uniformly at random
Insert the ball into the bin
Max Load: (with high prob.)
Theory => Multiple-Choice Paradigm
Throw m balls into n bins:
Pick d bins uniformly at random (d ≥ 2)
Insert the ball into the least-loaded of the d bins
Max Load: (with high prob.)
Excess load is independent of m, the number of balls! [BCSV00]
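A quick simulation illustrates the difference between the two paradigms. The commonly cited bounds for the heavily loaded case are a maximum load of m/n + Θ(√(m ln n / n)) for a single choice and m/n + ln ln n / ln d + O(1) for d ≥ 2 choices; the sketch below only checks this qualitatively. It samples the d bins with replacement and uses a fixed seed, both of which are choices made for this example.

```python
import random

def max_excess_load(m, n, d, seed=0):
    """Throw m balls into n bins, each into the least-loaded of d random
    bins (sampled with replacement); return max load minus the average m/n."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(m):
        choice = min((rng.randrange(n) for _ in range(d)), key=bins.__getitem__)
        bins[choice] += 1
    return max(bins) - m / n

for m in (10_000, 100_000):
    print(m, max_excess_load(m, 100, d=1), max_excess_load(m, 100, d=2))
```

In runs of this kind the single-choice excess grows with m, while the two-choice excess stays within a few balls of the average.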
Theory vs. Practice
Storage load vs. network load: [M91]
Network load is not persistent, unlike storage load
Non-uniform sampling: [V99],[W07]
Servers chosen based on consistent (or linear) hashing
Replication: [MMRWYZ08]
Choose r out of k servers instead of 1 out of d bins
Heterogeneity:
Variable-sized servers [W07]
Variable-sized items [TW07]
Theory => Planned Expansions
Adding new, empty servers into the system
Creates sudden, large imbalances in load
Empty servers vs. filled servers
Eventual storage balance
Do subsequent inserts/writes fill up new servers?
Eventual network load balance
Are new items distributed between old and new servers?
New items are often more popular than old items
Theory => Planned Expansions
Expanding a linear-hashing system:
[Figure: hash space from 0 to 2^b, partitioned among N = 2, N = 4, and N = 6 servers]
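For reference, the textbook (Litwin-style) linear-hashing address computation looks as follows. Whether the system in the talk partitions by modulus, as here, or by hash-space ranges is not stated on the slide, so treat this as a sketch of the general technique; `linear_hash_server` is a name chosen for the example.

```python
def linear_hash_server(h, num_servers):
    """Map hash value h to a server index in [0, num_servers)."""
    level = num_servers.bit_length() - 1   # largest i with 2**i <= N
    split = num_servers - (1 << level)     # servers 0..split-1 already split
    addr = h % (1 << level)
    if addr < split:
        # This server has been split: use one more bit of the hash.
        addr = h % (1 << (level + 1))
    return addr
```

Going from N = 4 to N = 6 splits servers 0 and 1 into (0, 4) and (1, 5), so those four servers each cover half as much of the hash space as the unsplit servers 2 and 3.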
Theory => Planned Expansions
Before Expansion:
Assume there are k n servers on k segments (n per segment)
Storage is evenly balanced
Expansion:
Add α n new servers to each segment
After expansion (per segment):
R = (1 - α) n servers, each with twice the load of the
L = 2 α n servers (new servers + split servers)
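The counts can be checked with a short calculation. It assumes that each new server takes over half the contents of exactly one old server (so α ≤ 1), and it introduces ℓ for the per-server load before expansion; that symbol is not on the slide.

```latex
\begin{align*}
\text{split old servers} &= \alpha n, \qquad \text{new servers} = \alpha n
  &&\Rightarrow\quad L = 2\alpha n \text{ servers with load } \ell/2,\\
\text{unsplit old servers} &= n - \alpha n
  &&\Rightarrow\quad R = (1-\alpha)\,n \text{ servers with load } \ell,\\
R + L &= (1+\alpha)\,n, \qquad
R\,\ell + L\,\tfrac{\ell}{2} = n\,\ell .
\end{align*}
```

For example, n = 4 and α = 1/2 gives R = 2 and L = 4, for a total of 6 servers holding the original load of 4ℓ.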
Theory => Planned Expansions
Relative Load Distribution: RL
Ratio of expected number of replicas inserted on L servers to expected number of replicas inserted on all servers
RL > 1 => eventual storage balance!
RL < 1 => no eventual storage balance
Theorem: RL > 1 if k ≥ 2 r [MMRWYZ08]
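The threshold k ≥ 2r can be explored numerically under a simplified model. Everything in this sketch is an assumption layered on the slide: each of the k candidates lands on an L server independently with probability α (the share of hash space held by L servers), L servers are always preferred because they are less full, and RL is normalized per server, that is, the share of new replicas going to L servers divided by the share of servers that are L servers. It is one plausible reading of the definition, not the proof from [MMRWYZ08].

```python
from math import comb

def relative_load(k, r, alpha):
    """RL under the simplified model described in the lead-in."""
    # X = number of the k candidates that land on an L server, X ~ Bin(k, alpha).
    # Up to r replicas go to L servers, so L servers receive min(r, X) replicas.
    expected_on_L = sum(
        min(r, x) * comb(k, x) * alpha**x * (1 - alpha)**(k - x)
        for x in range(k + 1)
    )
    share_of_replicas = expected_on_L / r        # fraction of new replicas on L
    share_of_servers = 2 * alpha / (1 + alpha)   # L / (R + L)
    return share_of_replicas / share_of_servers
```

In this model, `relative_load(6, 3, 0.5)` is above 1 while `relative_load(3, 3, 0.5)`, which leaves no freedom of choice, is below 1, in line with the theorem's condition.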
Evaluation
Theory
Simulations
Experiments
Simulations => Overview
Real-World Traces:

| Trace | Num Files | Total Size | Type |
|---|---|---|---|
| MSNBC | 30,656 | 2.33 TB | Read only |
| Plan-9 | 1.9 Million | 173 GB | Read/write |

Compare with Chain:
one segment
single-choice paradigm
chained replica placement
E.g. PAST, CFS, Boxwood, Petal
[Figure: hash-space diagram starting at 0, illustrating Chain]
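The Chain baseline can be sketched as follows, assuming the usual meaning of chained placement: a single hash picks the primary, and the remaining replicas go to the next servers in index order. The function name and the SHA-256 hash are choices made for this example.

```python
import hashlib

def chain_replicas(key, num_servers, r):
    """Chained placement: the hash picks the primary; the next r-1 servers
    in index order (wrapping around) hold the other replicas."""
    d = hashlib.sha256(key.encode()).digest()
    primary = int.from_bytes(d[:8], "big") % num_servers
    return [(primary + i) % num_servers for i in range(r)]
```

There is no choice at write time, and the replicas of everything stored on one server sit on its immediate neighbors, so only those few neighbors can help after a failure.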
Simulations => Load Balance
[Figure, two panels. MSNBC Trace: Storage: Max - Avg (GB) vs. Number of Objects. Plan-9 Trace: Storage: Max - Avg (GB) vs. Number of Operations (10^6). Series: Chain(3), Kinesis(7,3)]
Simulations => System Provisioning
Percentage Overprovisioning:

| Configuration | Overprovisioning |
|---|---|
| K (5) | 45.0% |
| K (6) | 35.0% |
| K (7) | 29.0% |
| K (8) | 26.0% |
| K (9) | 23.0% |
| K (10) | 20.0% |
| Chain | 52.0% |
Simulations => User Experienced Delays
[Figure: 90th pc. Queuing Delay (sec) vs. Number of Servers. Series: Chain (3), Kinesis (7,3)]
Simulations => Read Load vs. Write Load
Read load: short term resource consumption
Network bandwidth, computation, disk bandwidth
Directly impacts user experience
Write load: short and long term resource consumption
Storage space
Solution (Kinesis S+L): Choose short term over long term
Storage balance is restored when transient resources are not bottlenecked
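One way to picture the S+L idea is a write-time rule that switches between the two balancing criteria. The slide only states the priority (short term over long term); the threshold test and the names `choose_servers` and `busy_threshold` below are hypothetical and are not taken from the talk.

```python
def choose_servers(cands, r, storage, load, busy_threshold):
    """Pick r of the candidate servers for a write.

    Assumed policy: if any candidate has more than busy_threshold queued
    requests, balance short-term load; otherwise balance storage."""
    if any(load[c] > busy_threshold for c in cands):
        metric = load
    else:
        metric = storage
    return sorted(cands, key=lambda c: metric[c])[:r]
```

When no server is busy, writes go to the emptiest candidates, which is how storage balance would be restored once transient resources stop being the bottleneck.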
Simulations => Read Load vs. Write Load
[Figure: Queuing Delay: Max - Avg (ms) vs. Update to Read Ratio (%). Series: Chain, Kinesis S, Kinesis S+L, Kinesis L]
Simulations => Read Load vs. Write Load
[Figure: Storage Imbalance (%) and Request Rate (Gb/s) vs. Time (hours). Series: Request Rate, Kinesis S+L, Kinesis S]
Evaluation
Theory
Simulations
Experiments
Kinesis => Prototype Implementation
Storage system for variable-sized objects
Read/Write interface
Storage servers
Linear hashing system
Read, Append, Exists, Size (storage), Load (queued requests)
Failure recovery for primary replicas
Primary for an item is the highest-numbered server with a replica
Kinesis => Prototype Implementation
Front end
Two-step read protocol
1) Query k candidate servers
2) Fetch from least loaded server with a replica
Versioned updates with copy-on-write semantics
RPC-based communication with servers
Not implemented!
Failure detector
Failure-consistent updates
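The two-step read protocol can be sketched with in-memory stand-ins for the storage servers. The prototype talks to servers over RPC; here the `exists`, `load`, and `read` methods are called directly, and the `StorageServer` class is an illustration built around the operation names listed for the storage servers, not the prototype's code.

```python
class StorageServer:
    """In-memory stand-in for a storage server (illustrative only)."""
    def __init__(self, name):
        self.name = name
        self.items = {}
        self.queued = 0          # pending requests

    def exists(self, key):
        return key in self.items

    def load(self):
        return self.queued

    def read(self, key):
        return self.items[key]

def two_step_read(key, candidate_servers):
    # Step 1: ask each of the k candidates whether it holds a replica.
    holders = [s for s in candidate_servers if s.exists(key)]
    if not holders:
        raise KeyError(key)
    # Step 2: fetch from the least-loaded server that has a replica.
    return min(holders, key=lambda s: s.load()).read(key)
```

The first step costs k small queries per read, which is the price of not keeping a directory of replica locations.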
Kinesis => Experiments
15-node LAN test-bed
14 storage servers and 1 front end
MSNBC trace of 6 hours duration
5000 files, 170 GB total size, and 200,000 reads
Failure induced at 3 hours
Experiments => Kinesis vs. Chain
[Figure: per-server bar chart for servers 1 to 14. Series: Kinesis(7,3), Chain(3). Bar labels as extracted, series assignment and units not recoverable: 10.7, 10.7, 10.2, 10.6, 10.3, 10.8, 14.0, 17.8, 17.4, 14.3, 10.9, 10.7, 10.4, 10.6; and 11.9, 11.4, 11.7, 11.9, 11.5, 11.9, 11.6, 12.4, 12.5, 12.9, 13.4, 12.7, 12.3, 11.4]
Average Latency
Kinesis: 1.73 sec
Chain: 3 sec
Experiments => Failure Recovery
[Figure: per-server chart for servers 1 to 14. Series: Kinesis(7,3), Chain(3)]
Total Recovery Time
Kinesis: 17 min (12 servers)
Chain: 44 min (5 servers)
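As a rough check on these numbers, the speedup can be set against the degree of parallelism. Reading the server counts as the number of servers that took part in recovery is an assumption; the slide does not spell it out.

```latex
\[
\frac{T_{\text{Chain}}}{T_{\text{Kinesis}}} = \frac{44\ \text{min}}{17\ \text{min}} \approx 2.6,
\qquad
\frac{12\ \text{servers}}{5\ \text{servers}} = 2.4 .
\]
```

Under that reading, the recovery speedup roughly tracks the increase in the number of participating servers.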
Related Work
Prior work:
Hashing: Archipelago, OceanStore, PAST, CFS…
Two-choice paradigm: common load balancing technique
Parallel recovery: Chain Replication [RS04]
Random distribution: Ceph [WBML06]
Our contributions:
Putting them together in the context of storage systems
Extending theoretical results to the new design
Demonstrating the power of these simple design principles
Summary and Conclusions
Kinesis: A data/replica placement framework for LAN storage systems
Structured organization, freedom-of-choice, and scattered distribution
Simple and easy to implement, yet quite powerful!
Reduces infrastructure cost, tolerates correlated failures, and quickly restores lost replicas
Soon to appear in ACM Transactions on Storage, 2008
Download: http://research.microsoft.com/users/rama
Experiments => Kinesis vs. Chain
[Figure: Average Latency (sec) vs. Time (hours). Series: Chain (3), Kinesis (7,3)]
Experiments => Kinesis vs. Chain
[Figure: y-axis labels "Server Load: Max - Avg" and "Request Rate" vs. Time (hours). Series: Total Load, Chain (3), Kinesis (7,3)]