Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach,
Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
Google, Inc.
Presented by: Emanuele Rocca
What is Bigtable?
BIG
TABLE
What is Bigtable?
Let's start by saying what Bigtable is NOT
● Not a database
● Not a sharded database
● Not a distributed hashtable
What is Bigtable?
A distributed, persistent, sorted, associative array
(row:string, column:string, time:int64) → string
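A minimal in-memory sketch of the map signature above, assuming a plain Python dict. `ToyBigtable` and its methods are hypothetical names for illustration, not Bigtable's client API. Keys are (row, column, timestamp) triples; iteration returns them sorted by row, then column, then newest timestamp first (the paper stores the versions of a cell in decreasing timestamp order). The row and column keys mimic the paper's Webtable example.

```python
from typing import Dict, List, Optional, Tuple

# (row key, column key, timestamp); all names here are hypothetical.
Key = Tuple[str, str, int]


class ToyBigtable:
    """In-memory sketch of (row:string, column:string, time:int64) -> string."""

    def __init__(self) -> None:
        self._cells: Dict[Key, str] = {}

    def put(self, row: str, column: str, ts: int, value: str) -> None:
        # Writing the same (row, column, timestamp) again overwrites the value.
        self._cells[(row, column, ts)] = value

    def get(self, row: str, column: str, ts: int) -> Optional[str]:
        # Exact lookup; returns None if that version was never written.
        return self._cells.get((row, column, ts))

    def sorted_keys(self) -> List[Key]:
        # Lexicographic by row, then column; newest timestamp first within a cell.
        return sorted(self._cells, key=lambda k: (k[0], k[1], -k[2]))


t = ToyBigtable()
t.put("com.cnn.www", "contents:", 3, "<html>v3")
t.put("com.cnn.www", "contents:", 5, "<html>v5")
t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
```

The map is sparse: only cells that were written take up space, which is why a dict keyed by the full triple is a reasonable toy stand-in.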
Why did they implement it?
Quoting Jeff Dean:
● Applications at Google place very different demands on the storage system
● Handle petabytes of data
● Scale to thousands of commodity servers
● Fun
Outline
● Data model
● Architecture
● Use cases
● Performance evaluation
● Great excitement
Data model
Data Model
Remember the Relational Model?
Data Model
Forget it!
Data Model
Simpler than the Relational Model: dynamic control over data layout
Data Model
Indexed by: row key, column key, timestamp
Data Model
Data is maintained in lexicographic order by row key
● Allows (forces) developers to reason about the locality properties of their data
● Reads of short row ranges are efficient and require communication with a small number of machines
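To make the locality point concrete, here is a small sketch assuming URL-keyed rows as in the paper's Webtable example, where hostname components are reversed so that pages from the same domain sit next to each other in row order. `row_key_for_url` and `scan_prefix` are hypothetical helpers; the scan uses binary search over a sorted list as a stand-in for Bigtable's sorted storage.

```python
import bisect
from typing import List


def row_key_for_url(hostname: str, path: str = "") -> str:
    # Reverse hostname components: "www.cnn.com" -> "com.cnn.www",
    # so all *.cnn.com pages share the prefix "com.cnn." and sort together.
    return ".".join(reversed(hostname.split("."))) + path


def scan_prefix(sorted_rows: List[str], prefix: str) -> List[str]:
    """Return every row key starting with `prefix` from a sorted list."""
    start = bisect.bisect_left(sorted_rows, prefix)  # first key >= prefix
    result = []
    for row in sorted_rows[start:]:
        if not row.startswith(prefix):
            break  # keys are sorted, so no later key can match either
        result.append(row)
    return result


rows = sorted(
    row_key_for_url(h, p)
    for h, p in [
        ("www.cnn.com", "/index.html"),
        ("money.cnn.com", "/"),
        ("maps.google.com", "/index.html"),
        ("www.bbc.co.uk", "/"),
    ]
)
cnn_rows = scan_prefix(rows, "com.cnn.")
```

Because the matching keys form one contiguous run, the scan touches a single slice of the key space; with unreversed hostnames, "money.cnn.com" and "www.cnn.com" would be far apart.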
Data Model
The row range of a table is dynamically partitioned
● Partitions are called TABLETS
● Tablet data is persisted in GFS (as SSTable files)
● Unit of distribution and load balancing
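A sketch of how a sorted row space maps onto tablets, assuming each tablet is a contiguous row range and the boundaries are kept as a sorted list of split keys. `TabletMap` and its methods are hypothetical; the point is that a short row-range scan only touches the few tablets (and therefore the few servers) covering that range.

```python
import bisect
from typing import List


class TabletMap:
    """Hypothetical sketch: a table's sorted row space cut into contiguous tablets.

    With split_points ["g", "p"] there are three tablets:
    tablet 0 holds rows < "g", tablet 1 holds "g" <= row < "p",
    and tablet 2 holds rows >= "p".
    """

    def __init__(self, split_points: List[str]) -> None:
        self.split_points = sorted(split_points)

    def tablet_for(self, row: str) -> int:
        # The number of split points <= row is the index of the tablet holding it.
        return bisect.bisect_right(self.split_points, row)

    def tablets_for_range(self, start: str, end: str) -> List[int]:
        """Indices of the tablets a scan over rows in [start, end) must visit."""
        if start >= end:
            return []
        first = self.tablet_for(start)
        # Last tablet that can hold a row strictly smaller than `end`.
        last = bisect.bisect_left(self.split_points, end)
        return list(range(first, last + 1))


tm = TabletMap(["g", "p"])
```

Splitting a tablet just inserts a new split point, which is one reason a sorted boundary list is a convenient (assumed) representation for this sketch.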
Data Model
● Reads/writes under a single row key are atomic
● Timestamps let a cell store multiple versions of the same item; old versions are garbage-collected (keep only the last n versions, or only versions newer than a given age)
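A sketch of one versioned cell and the two garbage-collection policies mentioned above, assuming versions are kept newest first. `VersionedCell`, `gc_keep_last`, and `gc_older_than` are hypothetical names; in Bigtable the policy is configured per column family rather than called on a cell.

```python
from typing import List, Optional, Tuple


class VersionedCell:
    """Hypothetical sketch of one cell holding multiple timestamped versions."""

    def __init__(self) -> None:
        self.versions: List[Tuple[int, str]] = []  # (timestamp, value), newest first

    def put(self, ts: int, value: str) -> None:
        # Replace any version with the same timestamp, then restore newest-first order.
        self.versions = [(t, v) for t, v in self.versions if t != ts]
        self.versions.append((ts, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)

    def latest(self) -> Optional[str]:
        return self.versions[0][1] if self.versions else None

    def gc_keep_last(self, n: int) -> None:
        # Policy 1: keep only the n most recent versions.
        self.versions = self.versions[:n]

    def gc_older_than(self, min_ts: int) -> None:
        # Policy 2: keep only versions written at or after min_ts.
        self.versions = [(t, v) for t, v in self.versions if t >= min_ts]


cell = VersionedCell()
cell.put(1, "a")
cell.put(3, "c")
cell.put(2, "b")
```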
Architecture
Architecture
Building blocks:
● Google File System
● Cluster scheduling system
● Chubby: highly available, persistent, distributed lock service
Architecture
1 master server, N tablet servers
Architecture
Architecture
The tablet server
● Can be dynamically added or removed from a cluster according to changes in the workload
● Manages a set of N tablets (10 < N < 1000)
● Handles reads / writes to rows located in its tablets
● Splits tablets that have grown too large
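The slides do not say how a split point is chosen, so the policy below is an assumption: if a tablet's total size exceeds a threshold, split at the first row where the cumulative size reaches half of the total. `choose_split_row` is a hypothetical helper; it only shows the idea that a split produces two contiguous row ranges of roughly equal size.

```python
from typing import List, Optional, Tuple


def choose_split_row(rows: List[Tuple[str, int]], max_bytes: int) -> Optional[str]:
    """Hypothetical split policy for an oversized tablet.

    rows: (row_key, size_in_bytes) pairs, sorted by row key.
    Returns the first row key of the new right-hand tablet, or None if no split.
    """
    total = sum(size for _, size in rows)
    if total <= max_bytes or len(rows) < 2:
        return None  # small enough, or a single row cannot be split
    running = 0
    for i, (_, size) in enumerate(rows):
        running += size
        if running * 2 >= total:
            # Split after row i; min() keeps the right-hand tablet non-empty.
            return rows[min(i + 1, len(rows) - 1)][0]
    return None  # not reached: running eventually equals total


split = choose_split_row([("a", 10), ("b", 10), ("c", 10), ("d", 10)], max_bytes=30)
```

Since rows are atomic units, a split can only fall between row keys, which is why the sketch returns a row key rather than a byte offset.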
Architecture
The master server
● Assigns tablets to tablet servers
● Detects when a tablet server joins / leaves
● Balances tablet↔server load
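A toy model of the master's bookkeeping for the three duties above: track live servers and the tablet assignment, reassign tablets when a server leaves, and even out load. `ToyMaster` is hypothetical, and the "fewest tablets wins" placement and rebalancing policy is an assumption made for illustration, not the paper's algorithm.

```python
from typing import Dict, Set


class ToyMaster:
    """Hypothetical master sketch: live servers, tablet assignment, naive balancing."""

    def __init__(self) -> None:
        self.assignment: Dict[str, Set[str]] = {}  # server -> tablets it serves
        self.unassigned: Set[str] = set()

    def _load(self, server: str):
        # Sort key: number of tablets, then server name for deterministic ties.
        return (len(self.assignment[server]), server)

    def add_tablet(self, tablet: str) -> None:
        self.unassigned.add(tablet)
        self.assign_unassigned()

    def server_joined(self, server: str) -> None:
        self.assignment.setdefault(server, set())
        self.assign_unassigned()

    def server_left(self, server: str) -> None:
        # Tablets of a departed server return to the unassigned pool.
        self.unassigned |= self.assignment.pop(server, set())
        self.assign_unassigned()

    def assign_unassigned(self) -> None:
        if not self.assignment:
            return  # no live servers yet; tablets stay unassigned
        for tablet in sorted(self.unassigned):
            target = min(self.assignment, key=self._load)
            self.assignment[target].add(tablet)
        self.unassigned.clear()

    def rebalance(self) -> None:
        # Move tablets from the busiest to the idlest server until counts differ by <= 1.
        while len(self.assignment) >= 2:
            most = max(self.assignment, key=self._load)
            least = min(self.assignment, key=self._load)
            if len(self.assignment[most]) - len(self.assignment[least]) <= 1:
                break
            tablet = min(self.assignment[most])
            self.assignment[most].remove(tablet)
            self.assignment[least].add(tablet)


m = ToyMaster()
for name in ["t1", "t2", "t3"]:
    m.add_tablet(name)
m.server_joined("s1")
m.server_joined("s2")
m.rebalance()
```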
Architecture
The poor master is usually... quite bored: clients read and write directly through the tablet servers, so the master is lightly loaded.
Use Cases
Use Cases
● Web indexing
● Gmail
● YouTube
● Google Maps, Earth, Reader, Code
● …
● Google App Engine
Performance Evaluation
Performance Evaluation
Experimental Setup
● N tablet servers
● Huge GFS cell: 1786 machines, 2x 400 GB disks each
Performance Evaluation
Benchmarks
● Sequential write
● Sequential read
● Random write
● Random read
● Scan
Performance Evaluation
Bigger values are better
Performance Evaluation
● Scans are superfast: one RPC returns many values, so RPC overhead is amortized
● Random reads from memory also scale very well
● Random reads from GFS show the worst scaling
Conclusions
Bigtable scales to petabytes of data across thousands of commodity Linux servers
Developers can have a hard time adapting to different models
Google's structured storage needs are satisfied
Use Cases