NoSQL Databases

Amir H. Payberah

03/09/2019

The Course Web Page

https://id2221kth.github.io


Where Are We?


Database and Database Management System

I Database: an organized collection of data.

I Database Management System (DBMS): software used to capture and analyze data.


Three Database Revolutions

[Guy Harrison, Next Generation Databases: NoSQL and Big Data, 2015]


Early Database Systems

I There were databases but no Database Management Systems (DBMS).

[Guy Harrison, Next Generation Databases: NoSQL and Big Data, 2015]


The First Database Revolution

I Navigational data model: hierarchical model (IMS) and network model (CODASYL).

I Disk-aware

[Guy Harrison, Next Generation Databases: NoSQL and Big Data, 2015]


The Second Database Revolution

I Relational data model: Edgar F. Codd paper

• Logical data is disconnected from physical information storage

I ACID transactions

• Atomic, Consistent, Isolated, Durable

I SQL language

I Object databases

• Information is represented in the form of objects


ACID Properties

I Atomicity

• All statements in a transaction are either executed, or the whole transaction is aborted without affecting the database.

I Consistency

• A database is in a consistent state before and after a transaction.

I Isolation

• Transactions cannot see uncommitted changes made by other transactions.

I Durability

• Changes are written to disk before a database commits a transaction, so that committed data cannot be lost through a power failure.
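
Atomicity can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names, and the CHECK constraint are assumptions made up for this example; they are not part of the lecture. A transfer that would make a balance negative fails halfway, and the already executed credit is rolled back.

```python
import sqlite3

# In-memory database with a hypothetical two-account table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # debit violates CHECK, credit is undone
```

After the second call the balances are the same as after the first one, even though the credit to bob had already been executed inside the failed transaction.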


The Third Database Revolution

I NoSQL databases: BASE instead of ACID.

I NewSQL databases: scalable performance of NoSQL + ACID.

[http://ithare.com/nosql-vs-sql-for-mogs]


Three Waves of Database Technology

[Guy Harrison, Next Generation Databases: NoSQL and Big Data, 2015]


SQL vs. NoSQL Databases


Relational SQL Databases

I The dominant technology for storing structured data in web and business applications.

I SQL is good

• Rich language and toolset

• Easy to use and integrate

• Many vendors

I They promise: ACID


SQL Databases Challenges

I Web-based applications caused spikes.

• Internet-scale data size

• High read-write rates

• Frequent schema changes

I RDBMS were not designed to be distributed.


Scaling SQL Databases is Expensive and Inefficient

[http://www.couchbase.com/sites/default/files/uploads/all/whitepapers/NoSQLWhitepaper.pdf]


NoSQL

I Avoids:

• Overhead of ACID properties

• Complexity of SQL queries

I Provides:

• Scalability

• Easy and frequent changes to DB

• Large data volumes


NoSQL Cost and Performance

[http://www.couchbase.com/sites/default/files/uploads/all/whitepapers/NoSQLWhitepaper.pdf]


SQL vs. NoSQL

[http://www.couchbase.com/sites/default/files/uploads/all/whitepapers/NoSQLWhitepaper.pdf]


ACID vs. BASE


Availability

I Replicating data to improve the availability of data.

I Data replication

• Storing data in more than one site or node


Consistency

I Strong consistency

• After an update completes, any subsequent access will return the updated value.

I Eventual consistency

• Does not guarantee that subsequent accesses will return the updated value.

• Inconsistency window.

• If no new updates are made to the object, eventually all accesses will return the last updated value.
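
The inconsistency window can be made concrete with a toy simulation. This is a sketch under simplifying assumptions, not the protocol of any real system: a write is acknowledged by one replica, the other replicas receive it only when the hypothetical propagate() step runs, and conflicts are resolved by keeping the highest version number.

```python
class Replica:
    def __init__(self):
        self.value, self.version = None, 0

    def apply(self, value, version):
        # Keep only the newest version (last writer wins).
        if version > self.version:
            self.value, self.version = value, version

class EventuallyConsistentStore:
    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []  # undelivered updates: (replica index, value, version)
        self.clock = 0

    def write(self, value):
        self.clock += 1
        self.replicas[0].apply(value, self.clock)        # acknowledged by one replica
        for i in range(1, len(self.replicas)):
            self.pending.append((i, value, self.clock))  # delivered later

    def read(self, i):
        return self.replicas[i].value

    def propagate(self):
        # Deliver all outstanding updates; this closes the inconsistency window.
        for i, value, version in self.pending:
            self.replicas[i].apply(value, version)
        self.pending.clear()

store = EventuallyConsistentStore(3)
store.write("v1")
stale = store.read(2)   # read inside the inconsistency window
store.propagate()
fresh = [store.read(i) for i in range(3)]
```

Before propagate() runs, replica 2 still returns the old value; once no new updates arrive and propagation completes, every replica returns the last written value.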


CAP Theorem

I Consistency

• Consistent state of data after the execution of an operation.

I Availability

• Clients can always read and write data.

I Partition Tolerance

• Continue the operation in the presence of network partitions.

I You can choose only two!


Consistency vs. Availability

I Large-scale applications have to be reliable: availability, consistency, partition tolerance

I This is not possible to achieve with ACID properties.

I The BASE approach forfeits the ACID properties of consistency and isolation in favor of availability and performance.


BASE Properties

I Basic Availability

• Faults are possible, but not a failure of the whole system.

I Soft-state

• Copies of a data item may be inconsistent.

I Eventually consistent

• Copies become consistent at some later time if there are no more updates to that data item.


ACID vs. BASE

[https://www.guru99.com/sql-vs-nosql.html]


NoSQL Data Models


NoSQL Data Models

[http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques]


Key-Value Data Model

I Collection of key/value pairs.

I Ordered Key-Value: processing over key ranges.

I Dynamo, Scalaris, Voldemort, Riak, ...
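
What an ordered key-value store adds over a plain hash map is the ability to scan key ranges. A minimal in-memory sketch follows; the class name OrderedKV and the key naming scheme are illustrative assumptions, not the API of any of the systems listed above.

```python
import bisect

class OrderedKV:
    def __init__(self):
        self._keys = []  # keys kept in sorted order
        self._data = {}

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]

kv = OrderedKV()
for k, v in [("user:3", "carol"), ("user:1", "alice"),
             ("order:7", "book"), ("user:2", "bob")]:
    kv.put(k, v)

# ";" is the character right after ":" in ASCII, so this range covers every "user:" key.
users = kv.scan("user:", "user;")
```

Because keys are sorted, the scan touches only the contiguous slice of matching keys instead of inspecting every pair.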


Column-Oriented Data Model

I Similar to a key/value store, but the value can have multiple attributes (Columns).

I Column: a set of data values of a particular type.

I Store and process data by column instead of row.

I BigTable, HBase, Cassandra, ...


Document Data Model

I Similar to a column-oriented store, but values can be complex documents.

I Flexible schema (XML, YAML, JSON, and BSON).

I CouchDB, MongoDB, ...

{
  FirstName: "Bob",
  Address: "5 Oak St.",
  Hobby: "sailing"
}

{
  FirstName: "Jonathan",
  Address: "15 Wanamassa Point Road",
  Children: [
    {Name: "Michael", Age: 10},
    {Name: "Jennifer", Age: 8},
  ]
}
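
The two documents above have different fields, which is what a flexible schema means in practice. The sketch below holds them as plain Python dictionaries and filters them with a small helper; the find function is a hypothetical stand-in for a query interface, not the API of CouchDB or MongoDB.

```python
people = [
    {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"},
    {"FirstName": "Jonathan", "Address": "15 Wanamassa Point Road",
     "Children": [{"Name": "Michael", "Age": 10},
                  {"Name": "Jennifer", "Age": 8}]},
]

def find(collection, predicate):
    """Return every document for which the predicate holds."""
    return [doc for doc in collection if predicate(doc)]

# Queries must tolerate missing fields, because no schema enforces them.
sailors = find(people, lambda d: d.get("Hobby") == "sailing")
parents = find(people,
               lambda d: any(c["Age"] < 10 for c in d.get("Children", [])))
```

Each query uses get with a default so that documents lacking the field are skipped rather than raising an error.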


Graph Data Model

I Uses graph structures with nodes, edges, and properties to represent and store data.

I Neo4j, InfoGrid, ...

[http://en.wikipedia.org/wiki/Graph_database]
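
As a rough illustration of nodes, edges, and properties, the sketch below stores a tiny property graph in plain Python structures. The node names, relationship types, and helper functions are invented for the example and do not follow the query language of Neo4j or InfoGrid.

```python
# Nodes carry properties; edges carry a type and their own properties.
nodes = {
    "alice": {"label": "Person", "age": 30},
    "bob": {"label": "Person", "age": 25},
    "kth": {"label": "University"},
}
edges = [
    ("alice", "KNOWS", "bob", {"since": 2015}),
    ("alice", "STUDIES_AT", "kth", {}),
    ("bob", "STUDIES_AT", "kth", {}),
]

def outgoing(node, rel):
    """Targets reachable from `node` over edges of type `rel`."""
    return sorted(dst for src, r, dst, _ in edges if src == node and r == rel)

def incoming(node, rel):
    """Sources that point at `node` over edges of type `rel`."""
    return sorted(src for src, r, dst, _ in edges if dst == node and r == rel)

friends_of_alice = outgoing("alice", "KNOWS")
students_at_kth = incoming("kth", "STUDIES_AT")
```

Relationships are first-class data here, so a question such as "who studies at kth" is a traversal over edges rather than a join over tables.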


BigTable


BigTable

I Lots of (semi-)structured data at Google.

• URLs, per-user data, geographical locations, ...

I Distributed multi-level map

I CAP: strong consistency and partition tolerance


Data Model


Data Model (1/7)

I Column-Oriented data model

I Similar to a key/value store, but the value can have multiple attributes (Columns).

I Column: a set of data values of a particular type.

I Store and process data by column instead of row.


Data Model (2/7)

I In many analytical database queries, only a few attributes are needed.

I Column values are stored contiguously on disk: reduces I/O.

[Lars George, HBase: The Definitive Guide, O’Reilly, 2011]
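
The I/O argument can be sketched by counting how many values each layout has to touch to sum a single attribute. The table contents and function names below are made up for illustration, and counting values read is only a crude stand-in for disk I/O.

```python
rows = [
    {"id": 1, "name": "alice", "country": "SE", "amount": 30},
    {"id": 2, "name": "bob", "country": "DE", "amount": 45},
    {"id": 3, "name": "carol", "country": "SE", "amount": 25},
]

# Column layout: the values of each attribute are stored together.
columns = {attr: [row[attr] for row in rows] for attr in rows[0]}

def sum_amount_row_store(rows):
    """Sum one attribute when whole rows must be read."""
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)  # every attribute of the row is read
        total += row["amount"]
    return total, values_read

def sum_amount_column_store(columns):
    """Sum one attribute when only its column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)

row_result = sum_amount_row_store(rows)
column_result = sum_amount_column_store(columns)
```

Both layouts give the same sum, but the row layout reads all twelve values while the column layout reads only the three values of the amount column.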


Data Model (3/7)

I Table

I Distributed multi-dimensional sparse map


Data Model (4/7)

I Rows

I Every read or write in a row is atomic.

I Rows sorted in lexicographical order.


Data Model (5/7)

I Column

I The basic unit of data access.

I Column family: a group of column keys of the same type.

I Column key naming: family:qualifier


Data Model (6/7)

I Timestamp

I Each column value may contain multiple versions.
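
Putting rows, column keys, and timestamps together, a table can be viewed as a sparse map from (row, family:qualifier, timestamp) to a value. The class below is a single-machine sketch of that view only; SparseTable and its methods are hypothetical names, and the row and column keys are example data.

```python
from collections import defaultdict

class SparseTable:
    def __init__(self):
        # row key -> column key ("family:qualifier") -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Newest version at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def row_keys(self):
        return sorted(self._rows)  # rows in lexicographic order

table = SparseTable()
table.put("com.example.www", "contents:html", 3, "<html>v1</html>")
table.put("com.example.www", "contents:html", 5, "<html>v2</html>")
table.put("com.example.blog", "anchor:example.org", 4, "Blog")

latest = table.get("com.example.www", "contents:html")
older = table.get("com.example.www", "contents:html", timestamp=4)
```

Only cells that were written take up space, which is why the map is called sparse; a cell that was never written simply returns None.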


Data Model (7/7)

I Tablet: contiguous ranges of rows stored together.

I Tablets are split by the system when they become too large.

I Each tablet is served by exactly one tablet server.
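
A simplified picture of tablet splitting: the sorted row space is cut into contiguous ranges until each range is small enough, and every range is then given to exactly one server. The size threshold, the halving rule, and the round-robin assignment are assumptions for this sketch, not BigTable's actual policy.

```python
def split(tablet, max_rows):
    """Halve a sorted list of row keys until every piece has at most max_rows."""
    if len(tablet) <= max_rows:
        return [tablet]
    mid = len(tablet) // 2
    return split(tablet[:mid], max_rows) + split(tablet[mid:], max_rows)

rows = sorted(f"row{i:03d}" for i in range(10))
tablets = split(rows, max_rows=4)

# Hypothetical tablet servers; each tablet gets exactly one of them.
servers = ["ts-a", "ts-b"]
assignment = {(t[0], t[-1]): servers[i % len(servers)]
              for i, t in enumerate(tablets)}
```

Because the input is sorted and only ever cut in place, each resulting tablet covers a contiguous row range and the tablets together cover all rows exactly once.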


System Architecture


BigTable System Structure

[https://www.slideshare.net/GrishaWeintraub/cap-28353551]


Main Components

I Master

I Tablet server

I Client library


Master

I Assigns tablets to tablet servers.

I Balances tablet server load.

I Garbage collection of unneeded files in GFS.

I Handles schema changes, e.g., table and column family creations


Tablet Server

I Can be added or removed dynamically.

I Each manages a set of tablets (typically 10-1000 tablets/server).

I Handles read/write requests to tablets.

I Splits tablets when too large.


Client Library

I Library that is linked into every client.

I Client data does not move through the master.

I Clients communicate directly with tablet servers for reads/writes.


Building Blocks

I The building blocks of BigTable are:

• Google File System (GFS)

• Chubby

• SSTable


Google File System (GFS)

I Large-scale distributed file system.

I Stores log and data files.


Chubby Lock Service

I Ensure there is only one active master.

I Store bootstrap location of BigTable data.

I Discover tablet servers.

I Store BigTable schema information and access control lists.


SSTable

I SSTable file format used internally to store BigTable data.

I Chunks of data plus a block index.

I Immutable, sorted file of key-value pairs.

I Each SSTable is stored in a GFS file.
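
The combination of sorted data chunks and a block index can be sketched in memory as follows. The class is an illustration under assumptions: the block size, the choice to index each block by its first key, and the tuple-based storage are all made up here, and nothing is written to a file.

```python
import bisect

class SSTable:
    """Immutable, sorted key-value pairs grouped into blocks with an index."""

    def __init__(self, items, block_size=2):
        pairs = sorted(items.items())  # fixed at construction time
        self._blocks = tuple(
            tuple(pairs[i:i + block_size])
            for i in range(0, len(pairs), block_size)
        )
        # Block index: the first key of every block.
        self._index = tuple(block[0][0] for block in self._blocks)

    def get(self, key):
        # Binary search the index, then look inside one block only.
        i = bisect.bisect_right(self._index, key) - 1
        if i < 0:
            return None
        for k, v in self._blocks[i]:
            if k == key:
                return v
        return None

table = SSTable({"e": 5, "a": 1, "i": 9, "c": 3, "g": 7})
index = table._index
```

A lookup first picks the single block whose key range could contain the key and then searches only that block, so the rest of the data is never read.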


Tablet Serving


Master Startup

I The master executes the following steps at startup:

• Grabs a unique master lock in Chubby, which prevents concurrent master instantiations.

• Scans the servers directory in Chubby to find the live servers.

• Communicates with every live tablet server to discover what tablets are already assigned to each server.

• Scans the METADATA table to learn the set of tablets.
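
The four startup steps can be sketched as one function over plain data structures. Every argument is a hypothetical stand-in (a set of held Chubby locks, the names in the servers directory, each server's reported tablets, and the tablet names found in METADATA); real Chubby and tablet-server calls are not modelled.

```python
def master_startup(chubby_locks, servers_dir, tablets_reported, metadata_tablets):
    """Return (assigned, unassigned) after the four startup steps."""
    # Step 1: take the unique master lock; fail if another master holds it.
    if "master-lock" in chubby_locks:
        raise RuntimeError("another master is already active")
    chubby_locks.add("master-lock")

    # Step 2: the servers directory lists the live tablet servers.
    live_servers = sorted(servers_dir)

    # Step 3: ask every live server which tablets it already serves.
    assigned = {}
    for server in live_servers:
        for tablet in tablets_reported.get(server, []):
            assigned[tablet] = server

    # Step 4: tablets listed in METADATA that no server reported are unassigned.
    unassigned = sorted(set(metadata_tablets) - set(assigned))
    return assigned, unassigned

locks = set()
assigned, unassigned = master_startup(
    locks,
    servers_dir={"ts-a", "ts-b"},
    tablets_reported={"ts-a": ["t1"], "ts-b": ["t2", "t3"]},
    metadata_tablets=["t1", "t2", "t3", "t4", "t5"],
)
```

The unassigned list is what the master would hand out next; a second master starting against the same lock set would fail at step 1.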


Tablet Assignment

I 1 tablet → 1 tablet server.

I Master uses Chubby to keep track of live tablet servers and unassigned tablets.

• When a tablet server starts, it creates and acquires an exclusive lock in Chubby.

I Master detects the status of the lock of each tablet server by checking periodically.

I Master is responsible for detecting when a tablet server is no longer serving its tablets and reassigning those tablets as soon as possible.


Finding a Tablet

I Three-level hierarchy.

I The first level is a file stored in Chubby that contains the location of the root tablet.

I The root tablet contains the locations of all tablets in a special METADATA table.

I The METADATA table stores the location of each tablet under a row key.

I The client library caches tablet locations.
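
A minimal Python sketch of the three-level lookup with client-side caching. All data here is made up for illustration: the server names, the row ranges, and the `Client.locate` method are assumptions, and each tablet is identified by its end row. Caching per row key is a simplification; a real client would cache tablet ranges.

```python
import bisect

# Level 1: a file in Chubby holding the location of the root tablet (hypothetical).
CHUBBY_FILE = {"root_tablet": "ts-root"}

# Level 2: the root tablet maps the end row of each METADATA tablet to its server.
ROOT_TABLETS = {
    "ts-root": [("users:m", "ts-meta-1"), ("users:~", "ts-meta-2")],
}

# Level 3: each METADATA tablet maps the end row of a user tablet to its server.
METADATA_TABLETS = {
    "ts-meta-1": [("users:f", "ts-a"), ("users:m", "ts-b")],
    "ts-meta-2": [("users:t", "ts-c"), ("users:~", "ts-d")],
}


def find_by_end_row(entries, row_key):
    """Return the value of the first entry whose end row is >= row_key.

    Assumes row_key is not larger than the last end row in entries.
    """
    end_rows = [end for end, _ in entries]
    return entries[bisect.bisect_left(end_rows, row_key)][1]


class Client:
    def __init__(self):
        self.cache = {}    # row key -> tablet server
        self.lookups = 0   # number of full three-level lookups performed

    def locate(self, row_key):
        if row_key in self.cache:
            return self.cache[row_key]
        self.lookups += 1
        root_server = CHUBBY_FILE["root_tablet"]
        meta_server = find_by_end_row(ROOT_TABLETS[root_server], row_key)
        tablet_server = find_by_end_row(METADATA_TABLETS[meta_server], row_key)
        self.cache[row_key] = tablet_server
        return tablet_server
```

A second `locate` call for the same row key is answered from the cache and does not touch Chubby or the METADATA tablets.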

54 / 89

Tablet Serving (1/2)

I Updates are committed to a commit log.

I Recently committed updates are stored in memory, in a memtable.

I Older updates are stored in a sequence of SSTables.
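
The write and read paths described above can be sketched as follows. This is a toy model under stated assumptions: the `Tablet` class, the `memtable_limit` flush threshold, and the use of Python lists and tuples in place of files are all hypothetical.

```python
class Tablet:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every update
        self.memtable = {}     # recently committed updates, kept in memory
        self.sstables = []     # older updates; each SSTable is an immutable sorted tuple
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. commit to the log first
        self.memtable[key] = value              # 2. then apply to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable into a sorted, immutable SSTable and start a new one.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None
```

A read therefore sees a merged view of the memtable and the sequence of SSTables, with the most recent update for a key taking precedence.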

55 / 89

Tablet Serving (2/2)

I Strong consistency
• Only one tablet server is responsible for a given piece of data.
• Replication is handled on the GFS layer.

I Trade-off with availability
• If a tablet server fails, its portion of data is temporarily unavailable until a new server is assigned.

56 / 89

Loading Tablets

I To load a tablet, a tablet server does the following:

I Finds the location of the tablet through its METADATA.
• Metadata for a tablet includes the list of SSTables and a set of redo points.

I Reads the SSTable index blocks into memory.

I Reads the commit log from the redo point and reconstructs the memtable.
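
The loading steps can be sketched as a single function. This is a simplified illustration: the dictionary layout of `metadata` and `sstable_files`, the function name `load_tablet`, and the use of one redo point (an index into a single commit log) instead of a set of redo points are all assumptions.

```python
def load_tablet(metadata, sstable_files, commit_log):
    """Rebuild a tablet's in-memory state from its METADATA entry.

    metadata:      {"sstables": [names], "redo_point": index into commit_log}
    sstable_files: {name: {"index": [...], "data": [...]}}
    commit_log:    list of (key, value) updates in commit order
    """
    # Read only the SSTable index blocks into memory, not the data blocks.
    indexes = {name: sstable_files[name]["index"] for name in metadata["sstables"]}

    # Replay the updates committed since the redo point to rebuild the memtable.
    memtable = {}
    for key, value in commit_log[metadata["redo_point"]:]:
        memtable[key] = value
    return indexes, memtable
```

Updates before the redo point are skipped because they are already persisted in the listed SSTables.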

57 / 89

BigTable vs. HBase

BigTable         HBase
GFS              HDFS
Tablet Server    Region Server
SSTable          StoreFile
Memtable         MemStore
Chubby           ZooKeeper

58 / 89

HBase Example

# Create the table "test", with the column family "cf"
create 'test', 'cf'

# Use describe to get the description of the "test" table
describe 'test'

# Put data in the "test" table
put 'test', 'row1', 'cf:a', 'value1'
put 'test', 'row2', 'cf:b', 'value2'
put 'test', 'row3', 'cf:c', 'value3'

# Scan the table for all data at once
scan 'test'

# To get a single row of data at a time, use the get command
get 'test', 'row1'

59 / 89

Cassandra

60 / 89

Cassandra

I A column-oriented database

I It was created at Facebook and was later open sourced

I CAP: availability and partition tolerance

61 / 89

Borrowed From BigTable

I Data model: column oriented
• Keyspaces (similar to the schema in a relational database), tables, and columns.

I SSTable disk storage
• Append-only commit log
• Memtable (buffering and sorting)
• Immutable SSTable files

62 / 89

Data Partitioning (1/2)

I Key/value, where values are stored as objects.

I If the size of the data exceeds the capacity of a single machine, the data is partitioned across machines.

I Consistent hashing for partitioning.

63 / 89

Data Partitioning (2/2)

I Consistent hashing.

I Hash both data and node ids using the same hash function into the same id space.

I partition = hash(d) mod n, where d is the data and n is the size of the id space.

id space = [0, 15], n = 16
hash("Fatemeh") = 12
hash("Ahmad") = 2
hash("Seif") = 9
hash("Jim") = 14
hash("Sverker") = 4
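
A small Python sketch of consistent hashing on this 16-slot id space. The data ids are the hash values listed above. The node names and node ids are assumptions, since the slide's ring figure is not available. The placement rule (store each item on the first node at or after its id, wrapping around the ring) is the usual successor rule and is assumed here as well.

```python
import bisect

ID_SPACE = 16  # ids 0..15

# Hash values taken from the example above; a real system would compute hash(d) mod n.
DATA_IDS = {"Fatemeh": 12, "Ahmad": 2, "Seif": 9, "Jim": 14, "Sverker": 4}

# Hypothetical node ids in the same id space (an assumption for illustration).
NODE_IDS = {"node-A": 3, "node-B": 8, "node-C": 13}


def successor(item_id, node_ids):
    """Return the node whose id is the first one >= item_id, wrapping around."""
    ring = sorted((nid, name) for name, nid in node_ids.items())
    ids = [nid for nid, _ in ring]
    i = bisect.bisect_left(ids, item_id % ID_SPACE)
    return ring[i % len(ring)][1]


def placement(node_ids):
    """Map every data item to the node responsible for it."""
    return {name: successor(item_id, node_ids) for name, item_id in DATA_IDS.items()}


before = placement(NODE_IDS)
after = placement({**NODE_IDS, "node-D": 10})   # a new node joins at id 10
moved = [name for name in before if before[name] != after[name]]
```

With these assumed node positions, adding `node-D` moves only the item whose id falls between the new node and its predecessor; every other item stays where it was. That limited movement is the property that makes the scheme attractive compared with `hash(d) mod number_of_nodes`.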

64 / 89

Replication

I To achieve high availability and durability, data should be replicated on multiple nodes.

65 / 89

Adding and Removing Nodes

I Gossip-based mechanism: periodically, each node contacts another randomly selected node.
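
A toy simulation of gossip-based membership in Python. It is a sketch under assumptions, not Cassandra's actual protocol: membership views are plain sets of node names, every node initially knows only itself and a seed node `n0`, and the two nodes in a contact simply merge their views. The function names are hypothetical.

```python
import random


def gossip_round(views, order, rng):
    """One round: each node contacts one randomly selected known peer; both merge views."""
    for node in order:
        peers = sorted(views[node] - {node})
        if not peers:
            continue  # knows nobody else yet; waits until another node contacts it
        peer = rng.choice(peers)
        merged = views[node] | views[peer]
        views[node] = set(merged)
        views[peer] = set(merged)


def rounds_to_converge(num_nodes=8, seed=0, max_rounds=50):
    """Return (rounds, views) once every node knows every other node."""
    rng = random.Random(seed)
    order = [f"n{i}" for i in range(num_nodes)]
    # Assumption: each node starts out knowing itself and the seed node n0.
    views = {node: {node, "n0"} for node in order}
    everyone = set(order)
    for round_number in range(1, max_rounds + 1):
        gossip_round(views, order, rng)
        if all(view == everyone for view in views.values()):
            return round_number, views
    return None, views
```

The same exchange can carry news of a newly added node, which is how membership changes spread without a central coordinator.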

66 / 89

Cassandra Example

-- Create a keyspace called "test"
create keyspace test
  with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Print the list of keyspaces
describe keyspaces;

-- Navigate to the "test" keyspace
use test;

-- Create the "words" table in the "test" keyspace
create table words (word text, count int, primary key (word));

-- Insert a row
insert into words(word, count) values('hello', 5);

-- Look at the table
select * from words;

67 / 89

Neo4j

68 / 89

Neo4j

I A graph database

I The relationships between data are as important as the data itself

I Cypher: a declarative query language similar to SQL, but optimized for graphs

I CAP: strong consistency and availability

69 / 89

Data Model (1/4)

I Node (Vertex)
• The main data element from which graphs are constructed.
• A waypoint along a traversal route

70 / 89

Data Model (2/4)

I Relationship (Edge)

I May contain
• Direction
• Metadata, e.g., weight or relationship type

71 / 89

Data Model (3/4)

I Label
• Defines a node's category (optional)
• A node can have more than one label

72 / 89

Data Model (4/4)

I Properties
• Enrich a node or relationship
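
The four elements of the data model (nodes, relationships, labels, properties) can be mirrored in a few lines of Python. This is only a conceptual sketch: the `Node` and `Relationship` classes and the `outgoing` helper are hypothetical and unrelated to how Neo4j stores or queries data, and the movie title and role are placeholder values.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set = field(default_factory=set)         # zero or more labels
    properties: dict = field(default_factory=dict)   # key/value pairs


@dataclass
class Relationship:
    start: Node                                      # direction: start -> end
    end: Node
    type: str                                        # relationship type
    properties: dict = field(default_factory=dict)   # e.g., a weight


def outgoing(node, relationships, rel_type):
    """Follow relationships of the given type that start at this node."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


tom = Node({"Person"}, {"name": "Tom Hanks"})
movie = Node({"Movie"}, {"title": "Example Movie", "released": 1995})
acted_in = Relationship(tom, movie, "ACTED_IN", {"role": "lead"})
```

Here `outgoing(tom, [acted_in], "ACTED_IN")` walks from the person node to the movie node, which is the kind of traversal a graph query expresses declaratively.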

73 / 89

Example

[Ian Robinson et al., Graph Databases, 2015]

74 / 89

How a Graph is Physically Stored in Neo4j? (1/2)

I Neo4j stores graph data in a number of different store files.

I Each store file contains the data for a specific part of the graph.
• Separate stores for nodes, relationships, labels, and properties.

I The division of storage responsibilities facilitates performant graph traversals.

[Ian Robinson et al., Graph Databases, 2015]

75 / 89

How a Graph is Physically Stored in Neo4j? (2/2)

[Ian Robinson et al., Graph Databases, 2015]

76 / 89

What is Cypher?

I Declarative query language

I (): Nodes

I []: Relationships

I {}: Properties

78 / 89

Cypher Example (1/4)

// Match all nodes
MATCH (n)
RETURN n;

// Match all nodes with a Person label
MATCH (n:Person)
RETURN n;

// Match all nodes with a Person label whose name property is 'Tom Hanks'
MATCH (n:Person {name: 'Tom Hanks'})
RETURN n;

79 / 89

Cypher Example (2/4)

// Return nodes with label Person whose name property equals 'Tom Hanks'
MATCH (p:Person)
WHERE p.name = 'Tom Hanks'
RETURN p;

// Return nodes with label Movie whose released property is between 1991 and 1999
MATCH (m:Movie)
WHERE m.released > 1990 AND m.released < 2000
RETURN m;

// Find all the movies Tom Hanks acted in
MATCH (:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m:Movie)
RETURN m.title;

80 / 89

Cypher Example (3/4)

// Find all the movies Tom Hanks directed and order by latest movie
MATCH (:Person {name:'Tom Hanks'})-[:DIRECTED]->(m:Movie)
RETURN m.title, m.released ORDER BY m.released DESC;

// Find all of the co-actors Tom Hanks has ever worked with
MATCH (:Person {name:'Tom Hanks'})-->(:Movie)<-[:ACTED_IN]-(coActor:Person)
RETURN coActor.name;

81 / 89

Cypher Example (4/4)

// Find nodes with an ACTED_IN relationship
MATCH (p)-[:ACTED_IN]->()
RETURN p;

// Find Person nodes with an ACTED_IN or DIRECTED relationship
MATCH (p:Person)-[:ACTED_IN|DIRECTED]->()
RETURN p;

// Find Person nodes that do not have an ACTED_IN relationship
MATCH (p:Person)
WHERE NOT (p)-[:ACTED_IN]->()
RETURN p;

82 / 89

Summary

83 / 89

Summary

I NoSQL data models: key-value, column-oriented, document-oriented, graph-based

I Sharding and consistent hashing

I ACID vs. BASE

I CAP (Consistency vs. Availability)

84 / 89

Summary

I BigTable

I Column-oriented

I Main components: master, tablet server, client library

I Basic components: GFS, SSTable, Chubby

I CP

85 / 89

Summary

I Cassandra

I Column-oriented (similar to BigTable)

I Consistent hashing

I Gossip-based membership

I AP

86 / 89

Summary

I Neo4j

I Graph-based

I Cypher

I CA

87 / 89

References

I F. Chang et al., Bigtable: A distributed storage system for structured data, ACM Transactions on Computer Systems (TOCS) 26.2, 2008.

I A. Lakshman et al., Cassandra: a decentralized structured storage system, ACM SIGOPS Operating Systems Review 44.2, 2010.

I I. Robinson et al., Graph Databases (2nd ed.), O’Reilly Media, 2015.

88 / 89

Questions?

Acknowledgements
Some content of the Neo4j slides was derived from Ljubica Lazarevic’s slides.

89 / 89