topic 12: nosql in action

115
12: NoSQL in Action Zubair Nabi [email protected] April 20, 2013 Zubair Nabi 12: NoSQL in Action April 20, 2013 1 / 33

Upload: zubair-nabi

Post on 26-Jan-2015

111 views

Category:

Technology


0 download

DESCRIPTION

Cloud Computing Workshop 2013, ITU

TRANSCRIPT

Page 1: Topic 12: NoSQL in Action

12: NoSQL in Action

Zubair Nabi

[email protected]

April 20, 2013

Zubair Nabi 12: NoSQL in Action April 20, 2013 1 / 33

Page 2: Topic 12: NoSQL in Action

Outline

1 Amazon’s Dynamo

2 MongoDB

3 Google BigTable

4 Cassandra

Zubair Nabi 12: NoSQL in Action April 20, 2013 2 / 33

Page 3: Topic 12: NoSQL in Action

Outline

1 Amazon’s Dynamo

2 MongoDB

3 Google BigTable

4 Cassandra

Zubair Nabi 12: NoSQL in Action April 20, 2013 3 / 33

Page 4: Topic 12: NoSQL in Action

Introduction

At the forefront of the NoSQL movement and has influenced the designof many subsequent systems

Design considerations are two-fold: 1) Infrastructure and 2) Business

Zubair Nabi 12: NoSQL in Action April 20, 2013 4 / 33

Page 5: Topic 12: NoSQL in Action

Introduction

At the forefront of the NoSQL movement and has influenced the designof many subsequent systems

Design considerations are two-fold: 1) Infrastructure and 2) Business

Zubair Nabi 12: NoSQL in Action April 20, 2013 4 / 33

Page 6: Topic 12: NoSQL in Action

Infrastructure Considerations

Tens of thousands of servers and network elements distributed acrossthe globe

Commodity off-the-shelf hardwareI Failure is normal

Hundreds of services, all decentralized and loosely coupled

Zubair Nabi 12: NoSQL in Action April 20, 2013 5 / 33

Page 7: Topic 12: NoSQL in Action

Infrastructure Considerations

Tens of thousands of servers and network elements distributed acrossthe globeCommodity off-the-shelf hardware

I Failure is normal

Hundreds of services, all decentralized and loosely coupled

Zubair Nabi 12: NoSQL in Action April 20, 2013 5 / 33

Page 8: Topic 12: NoSQL in Action

Infrastructure Considerations

Tens of thousands of servers and network elements distributed acrossthe globeCommodity off-the-shelf hardware

I Failure is normal

Hundreds of services, all decentralized and loosely coupled

Zubair Nabi 12: NoSQL in Action April 20, 2013 5 / 33

Page 9: Topic 12: NoSQL in Action

Business Considerations

Strict, internal SLAs regarding performance, reliability, and efficiency

Reliability is of paramount importance because an outage means lossin revenue and customer trust

The platform needs to be highly scalable, to support continuous growthMost services only store and retrieve data by primary key, such as bestsellers lists, shopping carts, etc.

I No need for complex querying and management afforded by RDBMS

Zubair Nabi 12: NoSQL in Action April 20, 2013 6 / 33

Page 10: Topic 12: NoSQL in Action

Business Considerations

Strict, internal SLAs regarding performance, reliability, and efficiency

Reliability is of paramount importance because an outage means lossin revenue and customer trust

The platform needs to be highly scalable, to support continuous growthMost services only store and retrieve data by primary key, such as bestsellers lists, shopping carts, etc.

I No need for complex querying and management afforded by RDBMS

Zubair Nabi 12: NoSQL in Action April 20, 2013 6 / 33

Page 11: Topic 12: NoSQL in Action

Business Considerations

Strict, internal SLAs regarding performance, reliability, and efficiency

Reliability is of paramount importance because an outage means lossin revenue and customer trust

The platform needs to be highly scalable, to support continuous growth

Most services only store and retrieve data by primary key, such as bestsellers lists, shopping carts, etc.

I No need for complex querying and management afforded by RDBMS

Zubair Nabi 12: NoSQL in Action April 20, 2013 6 / 33

Page 12: Topic 12: NoSQL in Action

Business Considerations

Strict, internal SLAs regarding performance, reliability, and efficiency

Reliability is of paramount importance because an outage means lossin revenue and customer trust

The platform needs to be highly scalable, to support continuous growthMost services only store and retrieve data by primary key, such as bestsellers lists, shopping carts, etc.

I No need for complex querying and management afforded by RDBMS

Zubair Nabi 12: NoSQL in Action April 20, 2013 6 / 33

Page 13: Topic 12: NoSQL in Action

Business Considerations

Strict, internal SLAs regarding performance, reliability, and efficiency

Reliability is of paramount importance because an outage means lossin revenue and customer trust

The platform needs to be highly scalable, to support continuous growthMost services only store and retrieve data by primary key, such as bestsellers lists, shopping carts, etc.

I No need for complex querying and management afforded by RDBMS

Zubair Nabi 12: NoSQL in Action April 20, 2013 6 / 33

Page 14: Topic 12: NoSQL in Action

Design

1 Implemented as a partitioned system with replication and consistencywindows

2 Targets applications that require weaker consistency

3 Gives high availability

4 Possibility for write operations even in the presence of partitioningamongst replicas

5 Always writeable so conflict resolution needs to happen during reads

Zubair Nabi 12: NoSQL in Action April 20, 2013 7 / 33

Page 15: Topic 12: NoSQL in Action

Design

1 Implemented as a partitioned system with replication and consistencywindows

2 Targets applications that require weaker consistency

3 Gives high availability

4 Possibility for write operations even in the presence of partitioningamongst replicas

5 Always writeable so conflict resolution needs to happen during reads

Zubair Nabi 12: NoSQL in Action April 20, 2013 7 / 33

Page 16: Topic 12: NoSQL in Action

Design

1 Implemented as a partitioned system with replication and consistencywindows

2 Targets applications that require weaker consistency

3 Gives high availability

4 Possibility for write operations even in the presence of partitioningamongst replicas

5 Always writeable so conflict resolution needs to happen during reads

Zubair Nabi 12: NoSQL in Action April 20, 2013 7 / 33

Page 17: Topic 12: NoSQL in Action

Design

1 Implemented as a partitioned system with replication and consistencywindows

2 Targets applications that require weaker consistency

3 Gives high availability

4 Possibility for write operations even in the presence of partitioningamongst replicas

5 Always writeable so conflict resolution needs to happen during reads

Zubair Nabi 12: NoSQL in Action April 20, 2013 7 / 33

Page 18: Topic 12: NoSQL in Action

Design

1 Implemented as a partitioned system with replication and consistencywindows

2 Targets applications that require weaker consistency

3 Gives high availability

4 Possibility for write operations even in the presence of partitioningamongst replicas

5 Always writeable so conflict resolution needs to happen during reads

Zubair Nabi 12: NoSQL in Action April 20, 2013 7 / 33

Page 19: Topic 12: NoSQL in Action

Conflict Resolution

A datastore can only perform simple conflict resolution

Passes the buck to the application

The application is aware of the data schema and hence better suited tochoose a conflict resolution mechanism

If the application does not want to implement conflict resolution, simplemechanisms, such as “last write wins” provided by the framework

Zubair Nabi 12: NoSQL in Action April 20, 2013 8 / 33

Page 20: Topic 12: NoSQL in Action

Conflict Resolution

A datastore can only perform simple conflict resolution

Passes the buck to the application

The application is aware of the data schema and hence better suited tochoose a conflict resolution mechanism

If the application does not want to implement conflict resolution, simplemechanisms, such as “last write wins” provided by the framework

Zubair Nabi 12: NoSQL in Action April 20, 2013 8 / 33

Page 21: Topic 12: NoSQL in Action

Conflict Resolution

A datastore can only perform simple conflict resolution

Passes the buck to the application

The application is aware of the data schema and hence better suited tochoose a conflict resolution mechanism

If the application does not want to implement conflict resolution, simplemechanisms, such as “last write wins” provided by the framework

Zubair Nabi 12: NoSQL in Action April 20, 2013 8 / 33

Page 22: Topic 12: NoSQL in Action

Conflict Resolution

A datastore can only perform simple conflict resolution

Passes the buck to the application

The application is aware of the data schema and hence better suited tochoose a conflict resolution mechanism

If the application does not want to implement conflict resolution, simplemechanisms, such as “last write wins” provided by the framework

Zubair Nabi 12: NoSQL in Action April 20, 2013 8 / 33

Page 23: Topic 12: NoSQL in Action

Interface

1 Simple key/value interface storing values as BLOBs

2 Operations limited to one key/value pair at a time

3 No support for hierarchichal namespaces (like those in filesystems)

Zubair Nabi 12: NoSQL in Action April 20, 2013 9 / 33

Page 24: Topic 12: NoSQL in Action

Interface

1 Simple key/value interface storing values as BLOBs

2 Operations limited to one key/value pair at a time

3 No support for hierarchichal namespaces (like those in filesystems)

Zubair Nabi 12: NoSQL in Action April 20, 2013 9 / 33

Page 25: Topic 12: NoSQL in Action

Interface

1 Simple key/value interface storing values as BLOBs

2 Operations limited to one key/value pair at a time

3 No support for hierarchichal namespaces (like those in filesystems)

Zubair Nabi 12: NoSQL in Action April 20, 2013 9 / 33

Page 26: Topic 12: NoSQL in Action

Node Assignment

Completely decentralized so all nodes have equal responsibilities

As nodes can be heterogeneous, work is distributed proportional to thecapabilities of a node

Zubair Nabi 12: NoSQL in Action April 20, 2013 10 / 33

Page 27: Topic 12: NoSQL in Action

Node Assignment

Completely decentralized so all nodes have equal responsibilities

As nodes can be heterogeneous, work is distributed proportional to thecapabilities of a node

Zubair Nabi 12: NoSQL in Action April 20, 2013 10 / 33

Page 28: Topic 12: NoSQL in Action

Operations

Provides two operations:

1 get(key), returns a list of objects and a context2 put(key, context, object)

get can return more than one object if more than one conflictingversions

The context contains system metadata such as the object version

Keys and values are stored as an array of bytes, and only interpretedby the application

Zubair Nabi 12: NoSQL in Action April 20, 2013 11 / 33

Page 29: Topic 12: NoSQL in Action

Operations

Provides two operations:1 get(key), returns a list of objects and a context

2 put(key, context, object)

get can return more than one object if more than one conflictingversions

The context contains system metadata such as the object version

Keys and values are stored as an array of bytes, and only interpretedby the application

Zubair Nabi 12: NoSQL in Action April 20, 2013 11 / 33

Page 30: Topic 12: NoSQL in Action

Operations

Provides two operations:1 get(key), returns a list of objects and a context2 put(key, context, object)

get can return more than one object if more than one conflictingversions

The context contains system metadata such as the object version

Keys and values are stored as an array of bytes, and only interpretedby the application

Zubair Nabi 12: NoSQL in Action April 20, 2013 11 / 33

Page 31: Topic 12: NoSQL in Action

Operations

Provides two operations:1 get(key), returns a list of objects and a context2 put(key, context, object)

get can return more than one object if more than one conflictingversions

The context contains system metadata such as the object version

Keys and values are stored as an array of bytes, and only interpretedby the application

Zubair Nabi 12: NoSQL in Action April 20, 2013 11 / 33

Page 32: Topic 12: NoSQL in Action

Operations

Provides two operations:1 get(key), returns a list of objects and a context2 put(key, context, object)

get can return more than one object if more than one conflictingversions

The context contains system metadata such as the object version

Keys and values are stored as an array of bytes, and only interpretedby the application

Zubair Nabi 12: NoSQL in Action April 20, 2013 11 / 33

Page 33: Topic 12: NoSQL in Action

Operations

Provides two operations:1 get(key), returns a list of objects and a context2 put(key, context, object)

get can return more than one object if more than one conflictingversions

The context contains system metadata such as the object version

Keys and values are stored as an array of bytes, and only interpretedby the application

Zubair Nabi 12: NoSQL in Action April 20, 2013 11 / 33

Page 34: Topic 12: NoSQL in Action

Partitioning

MD5 hash of keys determines their storage nodes

Consistent hashing to provide incremental scalability

Partitioning done across virtual nodes instead of physical ones to takehardware heterogeneity into account

Zubair Nabi 12: NoSQL in Action April 20, 2013 12 / 33

Page 35: Topic 12: NoSQL in Action

Partitioning

MD5 hash of keys determines their storage nodes

Consistent hashing to provide incremental scalability

Partitioning done across virtual nodes instead of physical ones to takehardware heterogeneity into account

Zubair Nabi 12: NoSQL in Action April 20, 2013 12 / 33

Page 36: Topic 12: NoSQL in Action

Partitioning

MD5 hash of keys determines their storage nodes

Consistent hashing to provide incremental scalability

Partitioning done across virtual nodes instead of physical ones to takehardware heterogeneity into account

Zubair Nabi 12: NoSQL in Action April 20, 2013 12 / 33

Page 37: Topic 12: NoSQL in Action

Outline

1 Amazon’s Dynamo

2 MongoDB

3 Google BigTable

4 Cassandra

Zubair Nabi 12: NoSQL in Action April 20, 2013 13 / 33

Page 38: Topic 12: NoSQL in Action

Introduction

Schemaless document database in C++

Used by a large number of organizations including SourceForge.net.foursquare, the New York Times, bit.ly, Craigslist, SAP, MTV, EASports, github, etc.

Databases are distributed over multiple servers

Zubair Nabi 12: NoSQL in Action April 20, 2013 14 / 33

Page 39: Topic 12: NoSQL in Action

Introduction

Schemaless document database in C++

Used by a large number of organizations including SourceForge.net.foursquare, the New York Times, bit.ly, Craigslist, SAP, MTV, EASports, github, etc.

Databases are distributed over multiple servers

Zubair Nabi 12: NoSQL in Action April 20, 2013 14 / 33

Page 40: Topic 12: NoSQL in Action

Introduction

Schemaless document database in C++

Used by a large number of organizations including SourceForge.net.foursquare, the New York Times, bit.ly, Craigslist, SAP, MTV, EASports, github, etc.

Databases are distributed over multiple servers

Zubair Nabi 12: NoSQL in Action April 20, 2013 14 / 33

Page 41: Topic 12: NoSQL in Action

Databases and Collections

Databases contain collections (“named groupings”) of documents

Documents within a collection might be heterogeneous

But a good strategy is to create a database collection for each objecttype

A collection is created automatically whenever the first document isinserted into the database

Zubair Nabi 12: NoSQL in Action April 20, 2013 15 / 33

Page 42: Topic 12: NoSQL in Action

Databases and Collections

Databases contain collections (“named groupings”) of documents

Documents within a collection might be heterogeneous

But a good strategy is to create a database collection for each objecttype

A collection is created automatically whenever the first document isinserted into the database

Zubair Nabi 12: NoSQL in Action April 20, 2013 15 / 33

Page 43: Topic 12: NoSQL in Action

Databases and Collections

Databases contain collections (“named groupings”) of documents

Documents within a collection might be heterogeneous

But a good strategy is to create a database collection for each objecttype

A collection is created automatically whenever the first document isinserted into the database

Zubair Nabi 12: NoSQL in Action April 20, 2013 15 / 33

Page 44: Topic 12: NoSQL in Action

Databases and Collections

Databases contain collections (“named groupings”) of documents

Documents within a collection might be heterogeneous

But a good strategy is to create a database collection for each objecttype

A collection is created automatically whenever the first document isinserted into the database

Zubair Nabi 12: NoSQL in Action April 20, 2013 15 / 33

Page 45: Topic 12: NoSQL in Action

Hierarchical Namespaces

Documents can be organized into a hierarchical structure using adot-notation

I For instance, the collections wiki.articles, wiki.categoriesand wiki.authors exist within the namespace wiki

The collection namespace itself is flat, hierarchical structure only forthe user

Zubair Nabi 12: NoSQL in Action April 20, 2013 16 / 33

Page 46: Topic 12: NoSQL in Action

Hierarchical Namespaces

Documents can be organized into a hierarchical structure using adot-notation

I For instance, the collections wiki.articles, wiki.categoriesand wiki.authors exist within the namespace wiki

The collection namespace itself is flat, hierarchical structure only forthe user

Zubair Nabi 12: NoSQL in Action April 20, 2013 16 / 33

Page 47: Topic 12: NoSQL in Action

Hierarchical Namespaces

Documents can be organized into a hierarchical structure using adot-notation

I For instance, the collections wiki.articles, wiki.categoriesand wiki.authors exist within the namespace wiki

The collection namespace itself is flat, hierarchical structure only forthe user

Zubair Nabi 12: NoSQL in Action April 20, 2013 16 / 33

Page 48: Topic 12: NoSQL in Action

Documents

Unit of data storage

Conceptually similar to an XML document, JSON document, etc.

Documents are persisted in Binary JSON (BSON)

Easy to convert between BSON and JSON and between BSON andother programming language structures

Possible to insert (insert), search (find), and update a document(save)

Zubair Nabi 12: NoSQL in Action April 20, 2013 17 / 33

Page 49: Topic 12: NoSQL in Action

Documents

Unit of data storage

Conceptually similar to an XML document, JSON document, etc.

Documents are persisted in Binary JSON (BSON)

Easy to convert between BSON and JSON and between BSON andother programming language structures

Possible to insert (insert), search (find), and update a document(save)

Zubair Nabi 12: NoSQL in Action April 20, 2013 17 / 33

Page 50: Topic 12: NoSQL in Action

Documents

Unit of data storage

Conceptually similar to an XML document, JSON document, etc.

Documents are persisted in Binary JSON (BSON)

Easy to convert between BSON and JSON and between BSON andother programming language structures

Possible to insert (insert), search (find), and update a document(save)

Zubair Nabi 12: NoSQL in Action April 20, 2013 17 / 33

Page 51: Topic 12: NoSQL in Action

Documents

Unit of data storage

Conceptually similar to an XML document, JSON document, etc.

Documents are persisted in Binary JSON (BSON)

Easy to convert between BSON and JSON and between BSON andother programming language structures

Possible to insert (insert), search (find), and update a document(save)

Zubair Nabi 12: NoSQL in Action April 20, 2013 17 / 33

Page 52: Topic 12: NoSQL in Action

Documents

Unit of data storage

Conceptually similar to an XML document, JSON document, etc.

Documents are persisted in Binary JSON (BSON)

Easy to convert between BSON and JSON and between BSON andother programming language structures

Possible to insert (insert), search (find), and update a document(save)

Zubair Nabi 12: NoSQL in Action April 20, 2013 17 / 33

Page 53: Topic 12: NoSQL in Action

Datatypes

Scalar: boolean, integer, double

Character sequence: string, code, etc.

BSON-objects: object

Object ID: To identify documents within a collection

Misc: null, array, date

Zubair Nabi 12: NoSQL in Action April 20, 2013 18 / 33

Page 54: Topic 12: NoSQL in Action

Datatypes

Scalar: boolean, integer, double

Character sequence: string, code, etc.

BSON-objects: object

Object ID: To identify documents within a collection

Misc: null, array, date

Zubair Nabi 12: NoSQL in Action April 20, 2013 18 / 33

Page 55: Topic 12: NoSQL in Action

Datatypes

Scalar: boolean, integer, double

Character sequence: string, code, etc.

BSON-objects: object

Object ID: To identify documents within a collection

Misc: null, array, date

Zubair Nabi 12: NoSQL in Action April 20, 2013 18 / 33

Page 56: Topic 12: NoSQL in Action

Datatypes

Scalar: boolean, integer, double

Character sequence: string, code, etc.

BSON-objects: object

Object ID: To identify documents within a collection

Misc: null, array, date

Zubair Nabi 12: NoSQL in Action April 20, 2013 18 / 33

Page 57: Topic 12: NoSQL in Action

Datatypes

Scalar: boolean, integer, double

Character sequence: string, code, etc.

BSON-objects: object

Object ID: To identify documents within a collection

Misc: null, array, date

Zubair Nabi 12: NoSQL in Action April 20, 2013 18 / 33

Page 58: Topic 12: NoSQL in Action

References

No mechanism for foreign keys

References between documents need to be resolved by clientapplications

Zubair Nabi 12: NoSQL in Action April 20, 2013 19 / 33

Page 59: Topic 12: NoSQL in Action

References

No mechanism for foreign keys

References between documents need to be resolved by clientapplications

Zubair Nabi 12: NoSQL in Action April 20, 2013 19 / 33

Page 60: Topic 12: NoSQL in Action

Transaction Properties

Atomicity for only update and delete operations

Allows code to be executed locally on database nodes (server-sidecode execution)Three different strategies for server-side execution:

1 Execution of arbitrary code on a single node via eval operator2 Aggregation via count, group, and distinct3 MapReduce code execution on multiple nodes

Zubair Nabi 12: NoSQL in Action April 20, 2013 20 / 33

Page 61: Topic 12: NoSQL in Action

Transaction Properties

Atomicity for only update and delete operations

Allows code to be executed locally on database nodes (server-sidecode execution)

Three different strategies for server-side execution:1 Execution of arbitrary code on a single node via eval operator2 Aggregation via count, group, and distinct3 MapReduce code execution on multiple nodes

Zubair Nabi 12: NoSQL in Action April 20, 2013 20 / 33

Page 62: Topic 12: NoSQL in Action

Transaction Properties

Atomicity for only update and delete operations

Allows code to be executed locally on database nodes (server-sidecode execution)Three different strategies for server-side execution:

1 Execution of arbitrary code on a single node via eval operator2 Aggregation via count, group, and distinct3 MapReduce code execution on multiple nodes

Zubair Nabi 12: NoSQL in Action April 20, 2013 20 / 33

Page 63: Topic 12: NoSQL in Action

Transaction Properties

Atomicity for only update and delete operations

Allows code to be executed locally on database nodes (server-sidecode execution)Three different strategies for server-side execution:

1 Execution of arbitrary code on a single node via eval operator

2 Aggregation via count, group, and distinct3 MapReduce code execution on multiple nodes

Zubair Nabi 12: NoSQL in Action April 20, 2013 20 / 33

Page 64: Topic 12: NoSQL in Action

Transaction Properties

Atomicity for only update and delete operations

Allows code to be executed locally on database nodes (server-sidecode execution)Three different strategies for server-side execution:

1 Execution of arbitrary code on a single node via eval operator2 Aggregation via count, group, and distinct

3 MapReduce code execution on multiple nodes

Zubair Nabi 12: NoSQL in Action April 20, 2013 20 / 33

Page 65: Topic 12: NoSQL in Action

Transaction Properties

Atomicity for only update and delete operations

Allows code to be executed locally on database nodes (server-sidecode execution)Three different strategies for server-side execution:

1 Execution of arbitrary code on a single node via eval operator2 Aggregation via count, group, and distinct3 MapReduce code execution on multiple nodes

Zubair Nabi 12: NoSQL in Action April 20, 2013 20 / 33

Page 66: Topic 12: NoSQL in Action

Outline

1 Amazon’s Dynamo

2 MongoDB

3 Google BigTable

4 Cassandra

Zubair Nabi 12: NoSQL in Action April 20, 2013 21 / 33

Page 67: Topic 12: NoSQL in Action

Introduction

Supports a relaxed relational model that is dynamically controlled bythe clients

Clients can reason about the locality properties of the data

Data indexing can be row-wise as well as column-wise

Data can be delivered either out of memory or from disk

Used internally by Google for more than 60 projects including GoogleEarth, Google Analytics, Orkut, and Google Docs

Zubair Nabi 12: NoSQL in Action April 20, 2013 22 / 33

Page 68: Topic 12: NoSQL in Action

Introduction

Supports a relaxed relational model that is dynamically controlled bythe clients

Clients can reason about the locality properties of the data

Data indexing can be row-wise as well as column-wise

Data can be delivered either out of memory or from disk

Used internally by Google for more than 60 projects including GoogleEarth, Google Analytics, Orkut, and Google Docs

Zubair Nabi 12: NoSQL in Action April 20, 2013 22 / 33

Page 69: Topic 12: NoSQL in Action

Introduction

Supports a relaxed relational model that is dynamically controlled bythe clients

Clients can reason about the locality properties of the data

Data indexing can be row-wise as well as column-wise

Data can be delivered either out of memory or from disk

Used internally by Google for more than 60 projects including GoogleEarth, Google Analytics, Orkut, and Google Docs

Zubair Nabi 12: NoSQL in Action April 20, 2013 22 / 33

Page 70: Topic 12: NoSQL in Action

Introduction

Supports a relaxed relational model that is dynamically controlled bythe clients

Clients can reason about the locality properties of the data

Data indexing can be row-wise as well as column-wise

Data can be delivered either out of memory or from disk

Used internally by Google for more than 60 projects including GoogleEarth, Google Analytics, Orkut, and Google Docs

Zubair Nabi 12: NoSQL in Action April 20, 2013 22 / 33

Page 71: Topic 12: NoSQL in Action

Introduction

Supports a relaxed relational model that is dynamically controlled bythe clients

Clients can reason about the locality properties of the data

Data indexing can be row-wise as well as column-wise

Data can be delivered either out of memory or from disk

Used internally by Google for more than 60 projects including GoogleEarth, Google Analytics, Orkut, and Google Docs

Zubair Nabi 12: NoSQL in Action April 20, 2013 22 / 33

Page 72: Topic 12: NoSQL in Action

Data Model

Values stored as arrays of bytes which need to be interpreted by theclients

Values are addressed by a 3-tuple (row-key, column-key,timestamp)

Row keys are strings of up to 64KBRows are maintained in lexicographic order and are dynamicallypartitioned into tablets

I The unit of distribution and load balancing

Reads can be made efficient (only having to access a small number ofservers) by wisely choosing row keys

I Row ranges with small lexicographic distances are partitioned into fewertablets

I For instance storing URLs in reverse order: com.cnn.blogs,com.cnn.www, etc.

Zubair Nabi 12: NoSQL in Action April 20, 2013 23 / 33

Page 73: Topic 12: NoSQL in Action

Data Model

Values stored as arrays of bytes which need to be interpreted by theclients

Values are addressed by a 3-tuple (row-key, column-key,timestamp)

Row keys are strings of up to 64KBRows are maintained in lexicographic order and are dynamicallypartitioned into tablets

I The unit of distribution and load balancing

Reads can be made efficient (only having to access a small number ofservers) by wisely choosing row keys

I Row ranges with small lexicographic distances are partitioned into fewertablets

I For instance storing URLs in reverse order: com.cnn.blogs,com.cnn.www, etc.

Zubair Nabi 12: NoSQL in Action April 20, 2013 23 / 33

Page 74: Topic 12: NoSQL in Action

Data Model

Values stored as arrays of bytes which need to be interpreted by theclients

Values are addressed by a 3-tuple (row-key, column-key,timestamp)

Row keys are strings of up to 64KB

Rows are maintained in lexicographic order and are dynamicallypartitioned into tablets

I The unit of distribution and load balancing

Reads can be made efficient (only having to access a small number ofservers) by wisely choosing row keys

I Row ranges with small lexicographic distances are partitioned into fewertablets

I For instance storing URLs in reverse order: com.cnn.blogs,com.cnn.www, etc.

Zubair Nabi 12: NoSQL in Action April 20, 2013 23 / 33

Page 75: Topic 12: NoSQL in Action

Data Model

Values stored as arrays of bytes which need to be interpreted by theclients

Values are addressed by a 3-tuple (row-key, column-key,timestamp)

Row keys are strings of up to 64KBRows are maintained in lexicographic order and are dynamicallypartitioned into tablets

I The unit of distribution and load balancing

Reads can be made efficient (only having to access a small number ofservers) by wisely choosing row keys

I Row ranges with small lexicographic distances are partitioned into fewertablets

I For instance storing URLs in reverse order: com.cnn.blogs,com.cnn.www, etc.

Zubair Nabi 12: NoSQL in Action April 20, 2013 23 / 33

Page 76: Topic 12: NoSQL in Action

Data Model

Values stored as arrays of bytes which need to be interpreted by theclients

Values are addressed by a 3-tuple (row-key, column-key,timestamp)

Row keys are strings of up to 64KBRows are maintained in lexicographic order and are dynamicallypartitioned into tablets

I The unit of distribution and load balancing

Reads can be made efficient (only having to access a small number ofservers) by wisely choosing row keys

I Row ranges with small lexicographic distances are partitioned into fewertablets

I For instance storing URLs in reverse order: com.cnn.blogs,com.cnn.www, etc.

Zubair Nabi 12: NoSQL in Action April 20, 2013 23 / 33

Page 77: Topic 12: NoSQL in Action

Data Model

Values stored as arrays of bytes which need to be interpreted by theclients

Values are addressed by a 3-tuple (row-key, column-key,timestamp)

Row keys are strings of up to 64KBRows are maintained in lexicographic order and are dynamicallypartitioned into tablets

I The unit of distribution and load balancing

Reads can be made efficient (only having to access a small number ofservers) by wisely choosing row keys

I Row ranges with small lexicographic distances are partitioned into fewertablets

I For instance storing URLs in reverse order: com.cnn.blogs,com.cnn.www, etc.

Zubair Nabi 12: NoSQL in Action April 20, 2013 23 / 33

Page 78: Topic 12: NoSQL in Action

Data Model

Values stored as arrays of bytes which need to be interpreted by theclients

Values are addressed by a 3-tuple (row-key, column-key,timestamp)

Row keys are strings of up to 64KBRows are maintained in lexicographic order and are dynamicallypartitioned into tablets

I The unit of distribution and load balancing

Reads can be made efficient (only having to access a small number ofservers) by wisely choosing row keys

I Row ranges with small lexicographic distances are partitioned into fewertablets

I For instance storing URLs in reverse order: com.cnn.blogs,com.cnn.www, etc.

Zubair Nabi 12: NoSQL in Action April 20, 2013 23 / 33

Page 79: Topic 12: NoSQL in Action

Data Model

Values stored as arrays of bytes which need to be interpreted by theclients

Values are addressed by a 3-tuple (row-key, column-key,timestamp)

Row keys are strings of up to 64KBRows are maintained in lexicographic order and are dynamicallypartitioned into tablets

I The unit of distribution and load balancing

Reads can be made efficient (only having to access a small number ofservers) by wisely choosing row keys

I Row ranges with small lexicographic distances are partitioned into fewertablets

I For instance storing URLs in reverse order: com.cnn.blogs,com.cnn.www, etc.

Zubair Nabi 12: NoSQL in Action April 20, 2013 23 / 33

Page 80: Topic 12: NoSQL in Action

Columns

No limit on the number of columns per table

Columns grouped into sets called column families based on their keyprefix

I Basic unit of access controlI Expected to store the same or similar type of data so that it can be

compressedI Need to be created before data can be stored in a column

Zubair Nabi 12: NoSQL in Action April 20, 2013 24 / 33

Page 81: Topic 12: NoSQL in Action

Columns

No limit on the number of columns per tableColumns grouped into sets called column families based on their keyprefix

I Basic unit of access controlI Expected to store the same or similar type of data so that it can be

compressedI Need to be created before data can be stored in a column

Zubair Nabi 12: NoSQL in Action April 20, 2013 24 / 33

Page 82: Topic 12: NoSQL in Action

Columns

No limit on the number of columns per tableColumns grouped into sets called column families based on their keyprefix

I Basic unit of access control

I Expected to store the same or similar type of data so that it can becompressed

I Need to be created before data can be stored in a column

Zubair Nabi 12: NoSQL in Action April 20, 2013 24 / 33

Page 83: Topic 12: NoSQL in Action

Columns

No limit on the number of columns per tableColumns grouped into sets called column families based on their keyprefix

I Basic unit of access controlI Expected to store the same or similar type of data so that it can be

compressed

I Need to be created before data can be stored in a column

Zubair Nabi 12: NoSQL in Action April 20, 2013 24 / 33

Page 84: Topic 12: NoSQL in Action

Columns

No limit on the number of columns per tableColumns grouped into sets called column families based on their keyprefix

I Basic unit of access controlI Expected to store the same or similar type of data so that it can be

compressedI Need to be created before data can be stored in a column

Zubair Nabi 12: NoSQL in Action April 20, 2013 24 / 33

Page 85: Topic 12: NoSQL in Action

Timestamps

64-bit integers that represent different versions of a cell value

Value assigned by either the datastore or the client

Cells ordered in decreasing order of their timestamp

Automatic garbage collection can be used to remove revisions

Zubair Nabi 12: NoSQL in Action April 20, 2013 25 / 33

Page 86: Topic 12: NoSQL in Action

Timestamps

64-bit integers that represent different versions of a cell value

Value assigned by either the datastore or the client

Cells ordered in decreasing order of their timestamp

Automatic garbage collection can be used to remove revisions

Zubair Nabi 12: NoSQL in Action April 20, 2013 25 / 33

Page 87: Topic 12: NoSQL in Action

Timestamps

64-bit integers that represent different versions of a cell value

Value assigned by either the datastore or the client

Cells ordered in decreasing order of their timestamp

Automatic garbage collection can be used to remove revisions

Zubair Nabi 12: NoSQL in Action April 20, 2013 25 / 33

Page 88: Topic 12: NoSQL in Action

Timestamps

64-bit integers that represent different versions of a cell value

Value assigned by either the datastore or the client

Cells ordered in decreasing order of their timestamp

Automatic garbage collection can be used to remove revisions

Zubair Nabi 12: NoSQL in Action April 20, 2013 25 / 33

Page 89: Topic 12: NoSQL in Action

API

Read operations for lookup, selection, etc.

Write operations for creation, update, and deletion of values

Write operations for tables and column families for creation anddeletion

Administrative operations to modify store configuration and metadata

MapReduce hooks

Transactions are atomic at the single-row level

Zubair Nabi 12: NoSQL in Action April 20, 2013 26 / 33

Page 90: Topic 12: NoSQL in Action

API

Read operations for lookup, selection, etc.

Write operations for creation, update, and deletion of values

Write operations for tables and column families for creation anddeletion

Administrative operations to modify store configuration and metadata

MapReduce hooks

Transactions are atomic at the single-row level

Zubair Nabi 12: NoSQL in Action April 20, 2013 26 / 33

Page 91: Topic 12: NoSQL in Action

API

Read operations for lookup, selection, etc.

Write operations for creation, update, and deletion of values

Write operations for tables and column families for creation anddeletion

Administrative operations to modify store configuration and metadata

MapReduce hooks

Transactions are atomic at the single-row level

Zubair Nabi 12: NoSQL in Action April 20, 2013 26 / 33

Page 92: Topic 12: NoSQL in Action

API

Read operations for lookup, selection, etc.

Write operations for creation, update, and deletion of values

Write operations for tables and column families for creation anddeletion

Administrative operations to modify store configuration and metadata

MapReduce hooks

Transactions are atomic at the single-row level

Zubair Nabi 12: NoSQL in Action April 20, 2013 26 / 33

Page 93: Topic 12: NoSQL in Action

API

Read operations for lookup, selection, etc.

Write operations for creation, update, and deletion of values

Write operations for tables and column families for creation anddeletion

Administrative operations to modify store configuration and metadata

MapReduce hooks

Transactions are atomic at the single-row level

Zubair Nabi 12: NoSQL in Action April 20, 2013 26 / 33

Page 94: Topic 12: NoSQL in Action

API

Read operations for lookup, selection, etc.

Write operations for creation, update, and deletion of values

Write operations for tables and column families for creation anddeletion

Administrative operations to modify store configuration and metadata

MapReduce hooks

Transactions are atomic at the single-row level

Zubair Nabi 12: NoSQL in Action April 20, 2013 26 / 33

Page 95: Topic 12: NoSQL in Action

Architecture

Implemented atop GFS

Multiple tablet servers and a single master

Zubair Nabi 12: NoSQL in Action April 20, 2013 27 / 33

Page 96: Topic 12: NoSQL in Action

Architecture

Implemented atop GFS

Multiple tablet servers and a single master

Zubair Nabi 12: NoSQL in Action April 20, 2013 27 / 33

Page 97: Topic 12: NoSQL in Action

HBase

Open source clone of HBase in Java

Implemented atop HDFS

HBase can be the source and/or the sink of Hadoop jobs

Facebook Chat implemented using HBase

Zubair Nabi 12: NoSQL in Action April 20, 2013 28 / 33

Page 98: Topic 12: NoSQL in Action

HBase

Open source clone of HBase in Java

Implemented atop HDFS

HBase can be the source and/or the sink of Hadoop jobs

Facebook Chat implemented using HBase

Zubair Nabi 12: NoSQL in Action April 20, 2013 28 / 33

Page 99: Topic 12: NoSQL in Action

HBase

Open source clone of HBase in Java

Implemented atop HDFS

HBase can be the source and/or the sink of Hadoop jobs

Facebook Chat implemented using HBase

Zubair Nabi 12: NoSQL in Action April 20, 2013 28 / 33

Page 100: Topic 12: NoSQL in Action

HBase

Open source clone of HBase in Java

Implemented atop HDFS

HBase can be the source and/or the sink of Hadoop jobs

Facebook Chat implemented using HBase

Zubair Nabi 12: NoSQL in Action April 20, 2013 28 / 33

Page 101: Topic 12: NoSQL in Action

Outline

1 Amazon’s Dynamo

2 MongoDB

3 Google BigTable

4 Cassandra

Zubair Nabi 12: NoSQL in Action April 20, 2013 29 / 33

Page 102: Topic 12: NoSQL in Action

Introduction

Borrows concepts from both Dynamo and BigTable

Originally developed by Facebook but now an Apache open sourceproject

Designed for Facebook Chat for efficiently storing, indexing, andsearching messages

Zubair Nabi 12: NoSQL in Action April 20, 2013 30 / 33

Page 103: Topic 12: NoSQL in Action

Introduction

Borrows concepts from both Dynamo and BigTable

Originally developed by Facebook but now an Apache open sourceproject

Designed for Facebook Chat for efficiently storing, indexing, andsearching messages

Zubair Nabi 12: NoSQL in Action April 20, 2013 30 / 33

Page 104: Topic 12: NoSQL in Action

Introduction

Borrows concepts from both Dynamo and BigTable

Originally developed by Facebook but now an Apache open sourceproject

Designed for Facebook Chat for efficiently storing, indexing, andsearching messages

Zubair Nabi 12: NoSQL in Action April 20, 2013 30 / 33

Page 105: Topic 12: NoSQL in Action

Design Goals

Processing of a large amount of data

Highly scalable

Reliability at a massive scale

High throughput writes without sacrificing read efficiency

Zubair Nabi 12: NoSQL in Action April 20, 2013 31 / 33

Page 106: Topic 12: NoSQL in Action

Design Goals

Processing of a large amount of data

Highly scalable

Reliability at a massive scale

High throughput writes without sacrificing read efficiency

Zubair Nabi 12: NoSQL in Action April 20, 2013 31 / 33

Page 107: Topic 12: NoSQL in Action

Design Goals

Processing of a large amount of data

Highly scalable

Reliability at a massive scale

High throughput writes without sacrificing read efficiency

Zubair Nabi 12: NoSQL in Action April 20, 2013 31 / 33

Page 108: Topic 12: NoSQL in Action

Design Goals

Processing of a large amount of data

Highly scalable

Reliability at a massive scale

High throughput writes without sacrificing read efficiency

Zubair Nabi 12: NoSQL in Action April 20, 2013 31 / 33

Page 109: Topic 12: NoSQL in Action

Data Model

A table is a distributed multidimensional map indexed by a key

Rows are identified by a string-key and operations over them areatomic per replica regardless of the number of columns

Column families encapsule columns and super columns

Columns have a name and store a number of values per row, each witha timestamp

Super columns are columns with sub columns

Only three operations to get, insert, and delete

Zubair Nabi 12: NoSQL in Action April 20, 2013 32 / 33

Page 110: Topic 12: NoSQL in Action

Data Model

A table is a distributed multidimensional map indexed by a key

Rows are identified by a string-key and operations over them areatomic per replica regardless of the number of columns

Column families encapsule columns and super columns

Columns have a name and store a number of values per row, each witha timestamp

Super columns are columns with sub columns

Only three operations to get, insert, and delete

Zubair Nabi 12: NoSQL in Action April 20, 2013 32 / 33

Page 111: Topic 12: NoSQL in Action

Data Model

A table is a distributed multidimensional map indexed by a key

Rows are identified by a string-key and operations over them areatomic per replica regardless of the number of columns

Column families encapsule columns and super columns

Columns have a name and store a number of values per row, each witha timestamp

Super columns are columns with sub columns

Only three operations to get, insert, and delete

Zubair Nabi 12: NoSQL in Action April 20, 2013 32 / 33

Page 112: Topic 12: NoSQL in Action

Data Model

A table is a distributed multidimensional map indexed by a key

Rows are identified by a string-key and operations over them areatomic per replica regardless of the number of columns

Column families encapsule columns and super columns

Columns have a name and store a number of values per row, each witha timestamp

Super columns are columns with sub columns

Only three operations to get, insert, and delete

Zubair Nabi 12: NoSQL in Action April 20, 2013 32 / 33

Page 113: Topic 12: NoSQL in Action

Data Model

A table is a distributed multidimensional map indexed by a key

Rows are identified by a string-key and operations over them areatomic per replica regardless of the number of columns

Column families encapsule columns and super columns

Columns have a name and store a number of values per row, each witha timestamp

Super columns are columns with sub columns

Only three operations to get, insert, and delete

Zubair Nabi 12: NoSQL in Action April 20, 2013 32 / 33

Page 114: Topic 12: NoSQL in Action

Data Model

A table is a distributed multidimensional map indexed by a key

Rows are identified by a string-key and operations over them areatomic per replica regardless of the number of columns

Column families encapsule columns and super columns

Columns have a name and store a number of values per row, each witha timestamp

Super columns are columns with sub columns

Only three operations to get, insert, and delete

Zubair Nabi 12: NoSQL in Action April 20, 2013 32 / 33

Page 115: Topic 12: NoSQL in Action

References

1 NoSQL Databases: https://oak.cs.ucla.edu/cs144/handouts/nosqldbs.pdf

Zubair Nabi 12: NoSQL in Action April 20, 2013 33 / 33