Big Table - Slides by Jatin

Upload: austen-norton

Post on 02-Jan-2016


TRANSCRIPT

Page 1: Big Table - Slides by Jatin. Goals wide applicability Scalability high performance and high availability

Big Table

- Slides by Jatin

Goals

• Wide applicability

• Scalability

• High performance

• High availability

• Bigtable resembles a database.

• Bigtable does not support a full relational data model.

• Data is indexed using row and column names that can be arbitrary strings.

What is Bigtable?

• A Bigtable is a sparse, distributed, persistent multidimensional sorted map.

• The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.

• (row:string, column:string, time:int64) -> string
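
To make the mapping above concrete, here is a minimal in-memory sketch in Python. It is a toy model only, not Bigtable's implementation: the class name `TinyTable`, its methods, and the `contents:` column used in the example are assumptions made for illustration.

```python
from collections import defaultdict


class TinyTable:
    """Toy model of (row, column, timestamp) -> bytes. Hypothetical, in-memory only."""

    def __init__(self):
        # row key -> column key -> {timestamp: value}
        # Only cells that were written exist, which is what makes the map sparse.
        self._cells = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        self._cells[row][column][timestamp] = value

    def get(self, row: str, column: str, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._cells.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def row_keys(self):
        """Row keys in sorted (lexicographic) order."""
        return sorted(self._cells)


table = TinyTable()
table.put("com.google.maps/index.html", "contents:", 1, b"<html>v1</html>")
table.put("com.google.maps/index.html", "contents:", 3, b"<html>v3</html>")
latest = table.get("com.google.maps/index.html", "contents:")
as_of_2 = table.get("com.google.maps/index.html", "contents:", timestamp=2)
```

Here `latest` is the version written at timestamp 3 and `as_of_2` is the version written at timestamp 1, since values are treated as opaque bytes and versions are selected by timestamp.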

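For example, Bigtable stores data for maps.google.com/index.html under the row key com.google.maps/index.html. Reversing the hostname components keeps pages from the same domain next to each other in the sorted row order.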
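
The key transformation in this example can be sketched in a few lines of Python. The helper name `row_key` is hypothetical, and the sketch assumes the URL is given without a scheme, as on the slide.

```python
def row_key(url: str) -> str:
    """Reverse the hostname components of a scheme-less URL (illustrative helper)."""
    host, sep, path = url.partition("/")
    return ".".join(reversed(host.split("."))) + sep + path


print(row_key("maps.google.com/index.html"))  # com.google.maps/index.html

# Sorting the reversed keys groups pages from the same domain together.
keys = sorted(row_key(u) for u in [
    "maps.google.com/index.html",
    "www.cnn.com",
    "mail.google.com/inbox",
])
```

After sorting, the two google.com keys end up adjacent, which is the property a sorted map can exploit for range scans over one domain.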

Columns

• A table may have an unbounded number of columns.

• Column keys are grouped into sets called column families

• A column key is named using the following syntax: family:qualifier.
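
As a small illustration of the family:qualifier syntax, the sketch below splits a column key on its first colon. The names `ColumnKey` and `parse_column_key` and the example key are hypothetical, and allowing an empty qualifier is an assumption of this sketch.

```python
from typing import NamedTuple


class ColumnKey(NamedTuple):
    family: str
    qualifier: str


def parse_column_key(key: str) -> ColumnKey:
    """Split 'family:qualifier' on the first colon (illustrative helper)."""
    family, sep, qualifier = key.partition(":")
    if not sep or not family:
        raise ValueError(f"expected 'family:qualifier', got {key!r}")
    # Assumption: the qualifier may be empty and may itself contain colons.
    return ColumnKey(family, qualifier)


col = parse_column_key("anchor:example.com")
```

Here `col.family` is `"anchor"` and `col.qualifier` is `"example.com"`; all keys sharing the `anchor` prefix would belong to the same column family.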

Storage

• Bigtable uses the distributed Google File System (GFS) to store log and data files.

• The Google SSTable file format is used internally to store Bigtable data.

• An SSTable provides a persistent, ordered, immutable map from keys to values, where both keys and values are arbitrary byte strings. Operations are provided to look up the value associated with a specified key, and to iterate over all key/value pairs in a specified key range.
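
The two SSTable operations named above (point lookup and key-range iteration) can be sketched with a sorted, read-only in-memory structure. This is not the real SSTable file format: `ToySSTable` is a hypothetical name, nothing is persisted to disk, and the half-open range `[start, end)` is an assumption of the sketch.

```python
import bisect


class ToySSTable:
    """Immutable, ordered map from byte-string keys to byte-string values (toy model)."""

    def __init__(self, items):
        pairs = sorted(dict(items).items())
        # Tuples are used so the contents cannot be modified after construction.
        self._keys = tuple(k for k, _ in pairs)
        self._values = tuple(v for _, v in pairs)

    def get(self, key: bytes):
        """Binary-search for `key`; return its value or None."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start: bytes, end: bytes):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


sst = ToySSTable({b"b": b"2", b"a": b"1", b"d": b"4"})
hit = sst.get(b"a")
miss = sst.get(b"c")
in_range = sst.scan(b"a", b"c")
```

Because the keys are kept sorted, both operations reduce to binary searches; `in_range` contains the entries for `b"a"` and `b"b"` only.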

Implementation

• The implementation has three parts:
  – Library code at each client
  – Master server
  – Tablet servers

• Each table starts with a single tablet. As the table grows, the tablet is automatically split into multiple tablets.

• Tablet location information is stored in a three-level hierarchy analogous to a B+ tree.

• Bigtable relies on a highly-available and persistent distributed lock service called Chubby.
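
The tablet-splitting behaviour described in the bullets above can be illustrated with a toy model. This is a rough sketch under loud assumptions: tablets are plain sorted lists of row keys, the threshold `MAX_ROWS` counts rows rather than bytes, and the names `insert_row` and `MAX_ROWS` are hypothetical.

```python
import bisect

MAX_ROWS = 4  # hypothetical split threshold; a real system would split on size


def insert_row(tablets, row_key):
    """Insert `row_key` into the tablet covering it, splitting the tablet if it grows too large.

    `tablets` is a list of sorted lists; together they cover the whole key space in order.
    """
    # Pick the first tablet whose last key is >= row_key; fall back to the last tablet.
    idx = next(
        (i for i, t in enumerate(tablets) if t and row_key <= t[-1]),
        len(tablets) - 1,
    )
    tablet = tablets[idx]
    bisect.insort(tablet, row_key)
    if len(tablet) > MAX_ROWS:
        mid = len(tablet) // 2
        # Replace the oversized tablet with two tablets covering adjacent key ranges.
        tablets[idx:idx + 1] = [tablet[:mid], tablet[mid:]]


tablets = [[]]  # start with a single, empty tablet
for key in ["e", "a", "c", "b", "d", "f"]:
    insert_row(tablets, key)
```

After the fifth insert the single tablet exceeds the threshold and is split in two; the sixth key then lands in the second tablet because it sorts after every existing key.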

Tablet location hierarchy

Finding Tablet Location

• The client library caches tablet locations.

• If the cache is empty, finding a tablet location takes three network round-trips; if the cache is stale, it can take up to six round-trips.

• Tablet locations are stored in memory, so no GFS accesses are required.
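
The empty-cache case above can be sketched as a client that counts round-trips while walking three lookup levels. The level layout (a Chubby entry pointing at a root tablet, which points at a metadata tablet, which points at the serving tablet server) follows the Bigtable paper as I understand it; every name in the code (`ToyLocator`, `meta-1`, `tabletserver-7`, and so on) is hypothetical, lookups are keyed by table name instead of row ranges, and the stale-cache path is not modeled.

```python
class ToyLocator:
    """Client-side location cache over a three-level lookup (toy model)."""

    def __init__(self, chubby, tablets):
        self.chubby = chubby      # {"root": name of the root tablet}
        self.tablets = tablets    # tablet name -> {table name: next hop}
        self.cache = {}           # table name -> tablet server
        self.round_trips = 0

    def locate(self, table: str) -> str:
        if table in self.cache:
            return self.cache[table]       # cache hit: no network traffic
        self.round_trips += 1              # 1. read the root tablet's name from Chubby
        root = self.chubby["root"]
        self.round_trips += 1              # 2. ask the root tablet for the metadata tablet
        meta = self.tablets[root][table]
        self.round_trips += 1              # 3. ask the metadata tablet for the tablet server
        server = self.tablets[meta][table]
        self.cache[table] = server
        return server


locator = ToyLocator(
    chubby={"root": "root-tablet"},
    tablets={
        "root-tablet": {"users": "meta-1"},
        "meta-1": {"users": "tabletserver-7"},
    },
)
first = locator.locate("users")
trips_after_first = locator.round_trips
second = locator.locate("users")
trips_after_second = locator.round_trips
```

The first lookup costs three round-trips and fills the cache; the second is served from the cache, so the round-trip counter does not change.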