![Page 1: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/1.jpg)
Bigtable: A Distributed Storage System
Presenter: Ku. Devyani B.Vaidya
Dr. Panjabrao Deshmukh, Amravati
(CO-6G)
![Page 2: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/2.jpg)
Dec 8th, 2011
Bigtable: A Distributed Storage System
1. Introduction
2. What is a Bigtable?
3. Why not a DBMS?
4. Data model: Row, Column, Timestamps
5. APIs
6. Building Blocks
7. Real Applications
8. Conclusion
![Page 3: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/3.jpg)
Introduction
• Bigtable is a distributed storage system for managing structured data.
• Designed to scale to a very large size: petabytes of data across thousands of servers
• Used for many Google projects: Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, …
• Provides a flexible, high-performance solution for all of these Google products
![Page 4: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/4.jpg)
What is a Bigtable?
• “A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, a column key, and a timestamp; each value in the map is an uninterpreted array of bytes.”
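To make the definition concrete, here is a minimal in-memory sketch of that map, assuming a plain Python dict keyed by `(row, column, timestamp)` tuples. The `SparseTable` class and its methods are hypothetical illustrations, not Bigtable's API. "Sparse" shows up as cells that were never written simply having no entry; "sorted" shows up as iteration in row, column, then newest-timestamp-first order.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical model of the logical map:
# (row: str, column: str, timestamp: int) -> uninterpreted bytes
Key = Tuple[str, str, int]


class SparseTable:
    def __init__(self) -> None:
        # Sparse: only cells that were actually written have an entry.
        self._cells: Dict[Key, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        # The value is stored as-is; the table never interprets it.
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str, timestamp: int) -> Optional[bytes]:
        return self._cells.get((row, column, timestamp))

    def items(self) -> Iterator[Tuple[Key, bytes]]:
        # Sorted: by row key, then column key, then newest timestamp first.
        for key in sorted(self._cells, key=lambda k: (k[0], k[1], -k[2])):
            yield key, self._cells[key]
```

A real Bigtable keeps this order physically in its files rather than sorting on every read; the `sorted` call here only illustrates the ordering contract.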
![Page 5: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/5.jpg)
Why not a DBMS?
• Few DBMSs support the requisite scale
– Required a DB with wide scalability, wide applicability, high performance and high availability
• Couldn’t afford it if there was one
– Most DBMSs require very expensive infrastructure
• DBMSs provide more than Google needs
– E.g., full transactions, SQL
• Google has highly optimized lower-level systems that could be exploited
– GFS, Chubby, MapReduce, job scheduling
![Page 6: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/6.jpg)
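Data model: Row
• Row keys are arbitrary strings
• The row is the unit of transactional consistency: every read or write of data under a single row key is atomic
• Data is maintained in lexicographic order by row key
• Rows with consecutive keys (a row range) are grouped together as “tablets”, the unit of distribution and load balancing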
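Because rows are kept in lexicographic order, a tablet is just a contiguous slice of the sorted row keys, and finding the tablet that holds a key is a binary search over the tablets' start keys. The sketch below illustrates that under simplifying assumptions: `split_into_tablets`, `find_tablet`, and the fixed `max_rows` threshold are hypothetical. Real Bigtable splits a tablet when it grows too large in size, not after a fixed number of rows.

```python
import bisect
from typing import List


def split_into_tablets(row_keys: List[str], max_rows: int) -> List[List[str]]:
    """Partition row keys into contiguous, sorted row ranges ("tablets").

    Hypothetical policy: start a new tablet every `max_rows` rows.
    """
    ordered = sorted(set(row_keys))  # lexicographic order by row key
    return [ordered[i:i + max_rows] for i in range(0, len(ordered), max_rows)]


def find_tablet(tablets: List[List[str]], row_key: str) -> int:
    """Return the index of the tablet whose row range covers row_key."""
    start_keys = [tablet[0] for tablet in tablets]
    # The covering tablet is the last one whose start key is <= row_key;
    # keys before the first start key fall into tablet 0.
    idx = bisect.bisect_right(start_keys, row_key) - 1
    return max(idx, 0)
```

With reversed-hostname row keys such as `com.cnn.www`, pages from the same domain end up in the same or neighbouring tablets, so a scan over one site touches few tablets.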
![Page 7: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/7.jpg)
Data model: Column
• Column keys are grouped into sets called “column families”, which form the unit of access control.
• A column key is named using the syntax family:qualifier
• Access control and disk/memory accounting are performed at the column-family level
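A short sketch of the `family:qualifier` naming, assuming family names never contain a colon so the key can be split at the first one (qualifiers may be arbitrary strings, including empty). The helpers `parse_column_key` and `group_by_family` are hypothetical; grouping by family mirrors the level at which access control and accounting are applied.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_column_key(column_key: str) -> Tuple[str, str]:
    """Split 'family:qualifier' at the first colon.

    The qualifier may be empty (e.g. 'contents:') or contain more colons.
    """
    family, sep, qualifier = column_key.partition(":")
    if not sep or not family:
        raise ValueError(f"not a family:qualifier column key: {column_key!r}")
    return family, qualifier


def group_by_family(column_keys: List[str]) -> Dict[str, List[str]]:
    """Group qualifiers under their column family."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in column_keys:
        family, qualifier = parse_column_key(key)
        groups[family].append(qualifier)
    return dict(groups)
```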
![Page 8: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/8.jpg)
Data model: Timestamps
• Each cell in Bigtable can contain multiple versions of the same data, each indexed by timestamp
• Timestamps are 64-bit integers, assigned by:
– Bigtable (real time in microseconds), or
– the client application
• Versions are stored in decreasing timestamp order, so that the most recent version is read first
– The application specifies how many versions (n) of a cell are kept (a per-column-family setting); Bigtable garbage-collects older cell versions automatically.
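The versioning rules above can be sketched with a single cell object, assuming client-assigned integer timestamps and a "keep the last n versions" policy. The `Cell` class and its methods are hypothetical; writing the same timestamp twice is assumed to overwrite that version.

```python
from typing import List, Optional, Tuple


class Cell:
    """Hypothetical cell holding timestamped versions, newest first."""

    def __init__(self, max_versions: int) -> None:
        self.max_versions = max_versions  # the "n" from the slide
        self._versions: List[Tuple[int, bytes]] = []  # decreasing timestamp order

    def write(self, timestamp: int, value: bytes) -> None:
        # Replace any version with the same timestamp, then re-sort newest first.
        kept = [(ts, v) for ts, v in self._versions if ts != timestamp]
        kept.append((timestamp, value))
        kept.sort(key=lambda tv: tv[0], reverse=True)
        # Garbage collection: drop everything beyond the n most recent versions.
        self._versions = kept[: self.max_versions]

    def read(self, as_of: Optional[int] = None) -> Optional[bytes]:
        """Newest value, or the newest value whose timestamp is <= as_of."""
        for ts, value in self._versions:
            if as_of is None or ts <= as_of:
                return value
        return None

    def timestamps(self) -> List[int]:
        return [ts for ts, _ in self._versions]
```

Keeping versions newest first means the common "read the latest value" case stops at the first element.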
![Page 9: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/9.jpg)
Data Model Example: Web Indexing
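A minimal Python rendering of the “Webtable” row described in the original Bigtable paper, which uses web indexing as its running example. It assumes nested dicts and small integers standing in for the paper's symbolic timestamps (t3 through t9); `reverse_host` and `latest` are hypothetical helpers. The row key is the hostname with its components reversed, so pages from the same domain sort next to each other.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory row: column key -> [(timestamp, value)], newest first.
Row = Dict[str, List[Tuple[int, bytes]]]


def reverse_host(host: str) -> str:
    """'www.cnn.com' -> 'com.cnn.www', so one domain's pages sort together."""
    return ".".join(reversed(host.split(".")))


webtable: Dict[str, Row] = {
    reverse_host("www.cnn.com"): {
        # Page contents crawled at three different times (6 is newest).
        "contents:": [(6, b"<html>..."), (5, b"<html>..."), (3, b"<html>...")],
        # One column per referring site; the value is the anchor text.
        "anchor:cnnsi.com": [(9, b"CNN")],
        "anchor:my.look.ca": [(8, b"CNN.com")],
    }
}


def latest(table: Dict[str, Row], row: str, column: str) -> bytes:
    # Versions are kept newest first, so the latest value is element 0.
    return table[row][column][0][1]
```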
![Page 10: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/10.jpg)
Data Model
![Page 11: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/11.jpg)
Data Model
Row
![Page 12: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/12.jpg)
Data Model
Columns
![Page 13: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/13.jpg)
Data Model
Cells
![Page 14: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/14.jpg)
Data Model
Timestamps
![Page 15: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/15.jpg)
Data Model
Column family
![Page 16: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/16.jpg)
Data Model
Column family
family:qualifier
![Page 17: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/17.jpg)
Data Model
Column family
family:qualifier
![Page 18: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/18.jpg)
APIs
• The Bigtable API provides functions for:
- Creating and deleting tables and column families
- Changing cluster, table and column-family metadata
- Single-row transactions (atomic read-modify-write sequences on data stored under one row key)
- Using cells as integer counters
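Single-row transactions and counters can be illustrated with a single-process sketch that keeps one lock per row: a batch of mutations for one row is validated and then applied under that row's lock, and an increment is an atomic read-modify-write on one cell. `RowStore`, `apply`, `read`, and `increment` are hypothetical names, not the Bigtable client API, and the 8-byte big-endian signed counter encoding is an assumption made for this example.

```python
import threading
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# A mutation is ("set", column, value) or ("delete", column, None).
Mutation = Tuple[str, str, Optional[bytes]]


class RowStore:
    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, bytes]] = defaultdict(dict)
        self._locks: Dict[str, threading.Lock] = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()  # protects creation of per-row locks

    def _lock_for(self, row: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks[row]

    def apply(self, row: str, mutations: List[Mutation]) -> None:
        """Apply all mutations to one row, or none if any is invalid."""
        for op, _column, value in mutations:  # validate first: all or nothing
            if op not in ("set", "delete") or (op == "set" and value is None):
                raise ValueError(f"invalid mutation {op!r}")
        with self._lock_for(row):
            for op, column, value in mutations:
                if op == "set":
                    self._rows[row][column] = value
                else:
                    self._rows[row].pop(column, None)

    def read(self, row: str) -> Dict[str, bytes]:
        with self._lock_for(row):
            return dict(self._rows.get(row, {}))

    def increment(self, row: str, column: str, delta: int = 1) -> int:
        """Atomic read-modify-write on a cell used as a 64-bit signed counter."""
        with self._lock_for(row):
            raw = self._rows[row].get(column, bytes(8))  # missing cell counts as 0
            new_value = int.from_bytes(raw, "big", signed=True) + delta
            # to_bytes raises OverflowError if the counter leaves the 64-bit range.
            self._rows[row][column] = new_value.to_bytes(8, "big", signed=True)
            return new_value
```

Because every operation on a row holds that row's lock, concurrent increments never lose updates, while operations on different rows do not block each other.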
![Page 19: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/19.jpg)
Building Blocks
• Bigtable uses the distributed Google File System (GFS) to store log and data files
• The Google SSTable file format is used internally to store Bigtable data
• An SSTable provides a persistent, ordered, immutable map from keys to values
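The "ordered, immutable map" idea can be sketched in memory, leaving persistence out: entries are sorted once at construction, grouped into fixed-size blocks, and a small index of each block's first key lets a lookup binary-search the index and then search a single block. The `SSTable` class, its `block_size` parameter (counted in entries rather than bytes), and its method names are hypothetical. In the original Bigtable paper, blocks are typically 64 KB and the block index, stored at the end of the file, is loaded into memory when the SSTable is opened.

```python
import bisect
from typing import Iterable, Iterator, Optional, Tuple

Entry = Tuple[bytes, bytes]


class SSTable:
    """Hypothetical immutable, sorted key -> value table split into blocks."""

    def __init__(self, entries: Iterable[Entry], block_size: int = 4) -> None:
        items = sorted(entries)
        keys = [k for k, _ in items]
        if len(set(keys)) != len(keys):
            raise ValueError("duplicate keys")
        # Tuples, so the table cannot be modified after it is built.
        self._blocks: Tuple[Tuple[Entry, ...], ...] = tuple(
            tuple(items[i:i + block_size]) for i in range(0, len(items), block_size)
        )
        # In-memory block index: the first key of every block.
        self._index: Tuple[bytes, ...] = tuple(block[0][0] for block in self._blocks)

    def get(self, key: bytes) -> Optional[bytes]:
        # Binary search the index to find the only block that could hold key.
        i = bisect.bisect_right(self._index, key) - 1
        if i < 0:
            return None
        block = self._blocks[i]
        # Then binary search within that block; (key,) sorts just before (key, v).
        j = bisect.bisect_left(block, (key,))
        if j < len(block) and block[j][0] == key:
            return block[j][1]
        return None

    def scan(self, start: bytes, end: bytes) -> Iterator[Entry]:
        """Yield entries with start <= key < end, in key order."""
        i = max(bisect.bisect_right(self._index, start) - 1, 0)
        for block in self._blocks[i:]:
            for k, v in block:
                if k >= end:
                    return
                if k >= start:
                    yield k, v
```

Since an SSTable never changes after it is written, new data has to go somewhere else (in Bigtable, a memtable that is later written out as a new SSTable); the sketch covers only the read side.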
![Page 20: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/20.jpg)
Real Applications
• Google Analytics http://analytics.google.com
• Google Earth & Google Maps http://earth.google.com
• Personalized Search www.google.com/psearch
• Web Indexing
• Google Finance
• Orkut
• Writely
![Page 21: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/21.jpg)
Conclusion
• Bigtable has achieved its goals of high performance, data availability and scalability. It has been successfully deployed in real applications (Personalized Search, Orkut, Google Maps, …)
• Building its own storage system gave Google significant advantages: flexibility in designing the data model, and control over both Bigtable's implementation and the other infrastructure on which Bigtable relies.
![Page 23: Bigtable a distributed storage system](https://reader033.vdocuments.us/reader033/viewer/2022051404/58ee293d1a28ab6d6f8b464f/html5/thumbnails/23.jpg)
©2007 The Board of Regents of the University of Nebraska. All rights reserved.
Thanks