
Intro to Database Systems

15-445/15-645

Fall 2020

Andy Pavlo
Computer Science, Carnegie Mellon University

22 Introduction to Distributed Databases


ADMINISTRIVIA

Homework #5: Sunday Dec 6th @ 11:59pm

Project #4: Sunday Dec 13th @ 11:59pm

Potpourri + Review: Wednesday Dec 9th

→ Vote for what system you want me to talk about: https://cmudb.io/f20-systems

Final Exam:
→ Session #1: Thursday Dec 17th @ 8:30am
→ Session #2: Thursday Dec 17th @ 1:00pm


UPCOMING DATABASE TALKS

Confluent ksqlDB (Kafka)
→ Monday Nov 23rd @ 5pm ET

Microsoft SQL Server Optimizer
→ Monday Nov 30th @ 5pm ET

Snowflake Lecture
→ Monday Dec 7th @ 3:20pm ET


PARALLEL VS. DISTRIBUTED

Parallel DBMSs:
→ Nodes are physically close to each other.
→ Nodes connected with high-speed LAN.
→ Communication cost is assumed to be small.

Distributed DBMSs:
→ Nodes can be far from each other.
→ Nodes connected using public network.
→ Communication cost and problems cannot be ignored.


DISTRIBUTED DBMSs

Use the building blocks that we covered in single-node DBMSs to now support transaction processing and query execution in distributed environments.
→ Optimization & Planning
→ Concurrency Control
→ Logging & Recovery


TODAY'S AGENDA

System Architectures

Design Issues

Partitioning Schemes

Distributed Concurrency Control


SYSTEM ARCHITECTURE

A DBMS's system architecture specifies what shared resources are directly accessible to CPUs.

This affects how CPUs coordinate with each other and where they retrieve/store objects in the database.


Diagram: the four approaches, ordered by how much is shared across the network: Shared Everything, Shared Memory, Shared Disk, Shared Nothing.

SHARED MEMORY

CPUs have access to common memory address space via a fast interconnect.
→ Each processor has a global view of all the in-memory data structures.
→ Each DBMS instance on a processor has to "know" about the other instances.

SHARED DISK

All CPUs can access a single logical disk directly via an interconnect, but each has its own private memory.
→ Can scale execution layer independently from the storage layer.
→ Must send messages between CPUs to learn about their current state.

SHARED DISK EXAMPLE

Diagram: an application server issues requests to compute nodes, which read and write pages in a shared storage layer.
→ Get Id=101: one node fetches Page ABC from storage.
→ Get Id=200: another node fetches Page XYZ from storage.
→ Update 101: a node writes an updated Page ABC back to storage (the other nodes now need to learn about the change).

SHARED NOTHING

Each DBMS instance has its own CPU, memory, and disk.

Nodes only communicate with each other via network.
→ Harder to scale capacity.
→ Harder to ensure consistency.
→ Better performance & efficiency.

SHARED NOTHING EXAMPLE

Diagram: each node stores a disjoint range of IDs on its own local storage, and requests are routed to the node that owns the range.
→ With two nodes (P1→ID:1-150, P2→ID:151-300), Get Id=200 is routed to the node holding P2. A request for Get Id=10 and Get Id=200 that arrives at the first node forces it to forward the Id=200 portion to the other node.
→ After a third node is added, the data is repartitioned: P1→ID:1-100, P3→ID:101-200, P2→ID:201-300.
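Routing in a shared-nothing system boils down to a lookup in a partition map. Below is a minimal sketch of how a client or router might find the node that owns a given ID; the partition map mirrors the repartitioned example above, but the node names and the helper itself are illustrative, not any real system's API.

import bisect

# Hypothetical partition map matching the example above:
# (inclusive upper bound of the ID range, node that owns the range).
PARTITION_MAP = [
    (100, "node-1"),   # P1 -> ID:1-100
    (200, "node-3"),   # P3 -> ID:101-200
    (300, "node-2"),   # P2 -> ID:201-300
]

def node_for_id(record_id: int) -> str:
    """Return the node whose range partition contains record_id."""
    bounds = [upper for upper, _ in PARTITION_MAP]
    idx = bisect.bisect_left(bounds, record_id)
    if idx == len(PARTITION_MAP):
        raise KeyError(f"no partition owns id {record_id}")
    return PARTITION_MAP[idx][1]

print(node_for_id(200))  # node-3 (P3 owns IDs 101-200)
print(node_for_id(10))   # node-1 (P1 owns IDs 1-100)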

EARLY DISTRIBUTED DATABASE SYSTEMS

MUFFIN – UC Berkeley (1979)

SDD-1 – CCA (1979)

System R* – IBM Research (1984)

Gamma – Univ. of Wisconsin (1986)

NonStop SQL – Tandem (1987)

(Pictured: Bernstein, Mohan, DeWitt, Gray, Stonebraker.)

DESIGN ISSUES

How does the application find data?

How to execute queries on distributed data?
→ Push query to data.
→ Pull data to query.

How does the DBMS ensure correctness?

HOMOGENEOUS VS. HETEROGENEOUS

Approach #1: Homogeneous Nodes
→ Every node in the cluster can perform the same set of tasks (albeit on potentially different partitions of data).
→ Makes provisioning and failover "easier".

Approach #2: Heterogeneous Nodes
→ Nodes are assigned specific tasks.
→ Can allow a single physical node to host multiple "virtual" node types for dedicated tasks.

MONGODB HETEROGENEOUS ARCHITECTURE

Diagram: the application server sends Get Id=101 to a Router (mongos). The Router consults the Config Server (mongod) for the partition map (P1→ID:1-100, P2→ID:101-200, P3→ID:201-300, P4→ID:301-400) and forwards the request to the shard (mongod) that holds P2.

DATA TRANSPARENCY

Users should not be required to know where data is physically located, or how tables are partitioned or replicated.

A query that works on a single-node DBMS should work the same on a distributed DBMS.


DATABASE PARTITIONING

Split database across multiple resources:
→ Disks, nodes, processors.
→ Often called "sharding" in NoSQL systems.

The DBMS executes query fragments on each partition and then combines the results to produce a single answer.
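This "run a fragment on every partition, then combine" pattern is a scatter-gather. The sketch below is a toy illustration of the idea, not any DBMS's actual executor: the partition handles, the fake per-partition data, and query_partition are all assumptions.

from concurrent.futures import ThreadPoolExecutor

# Hypothetical partition handles with fake local data, just for illustration.
PARTITION_DATA = {
    "p1": [(1, "a")], "p2": [(151, "b")],
    "p3": [(201, "c")], "p4": [(301, "d")],
}

def query_partition(partition: str, sql: str) -> list:
    """Stand-in for shipping a query fragment to one partition's node."""
    return PARTITION_DATA[partition]          # pretend the fragment ran remotely

def scatter_gather(sql: str) -> list:
    """Run the same fragment on every partition, then combine the results."""
    with ThreadPoolExecutor(max_workers=len(PARTITION_DATA)) as pool:
        partials = pool.map(lambda p: query_partition(p, sql), PARTITION_DATA)
    merged = []
    for rows in partials:
        merged.extend(rows)   # a real DBMS may also re-sort, de-duplicate, or aggregate
    return merged

print(scatter_gather("SELECT * FROM table"))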

NAÏVE TABLE PARTITIONING

Assign an entire table to a single node.

Assumes that each node has enough storage space for an entire table.

Ideal if queries never join data across tables stored on different nodes and access patterns are uniform.


Ideal Query:
SELECT * FROM table

Diagram: Table1 and Table2 are each assigned, in their entirety, to separate partitions.

HORIZONTAL PARTITIONING

Split a table's tuples into disjoint subsets.
→ Choose column(s) that divide the database equally in terms of size, load, or usage.
→ Hash Partitioning, Range Partitioning

The DBMS can partition a database physically (shared nothing) or logically (shared disk).

Ideal Query:
SELECT * FROM table WHERE partitionKey = ?

Diagram: Table1's tuples are assigned to four partitions (P1-P4) by hashing their partitioning key:
→ Tuple 101 (key a, XXX, 2019-11-29): hash(a)%4 = P2
→ Tuple 102 (key b, XXY, 2019-11-28): hash(b)%4 = P4
→ Tuple 103 (key c, XYZ, 2019-11-29): hash(c)%4 = P3
→ Tuple 104 (key d, XYX, 2019-11-27): hash(d)%4 = P2
→ Tuple 105 (key e, XYY, 2019-11-29): hash(e)%4 = P1

When a fifth partition is added, the assignment function becomes hash(key)%5 and nearly every tuple maps to a different partition (hash(a)%5 = P4, hash(b)%5 = P3, hash(c)%5 = P5, hash(d)%5 = P1, hash(e)%5 = P3), forcing a large reshuffle of the data.
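The %5 frame is the motivation for consistent hashing: with plain modulo hashing, changing the number of partitions changes almost every key's assignment. A small sketch that measures this, with hashlib.md5 standing in for whatever hash function the DBMS actually uses:

import hashlib

def h(key: str) -> int:
    """Deterministic stand-in for the DBMS's hash function."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

keys = [f"key{i}" for i in range(1000)]

old = {k: h(k) % 4 for k in keys}   # four partitions
new = {k: h(k) % 5 for k in keys}   # after adding a fifth partition

moved = sum(1 for k in keys if old[k] != new[k])
print(f"{moved}/{len(keys)} keys map to a different partition")   # roughly 80% move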

CONSISTENT HASHING

Diagram: partitions and keys are hashed onto the same ring, with positions in [0, 1).
→ A key is assigned to the first partition encountered moving around the ring from hash(key): hash(key1) and hash(key2) each land between two partitions and are stored on the next one.
→ Adding a new partition (P4, then P5, P6) only claims the keys whose hashes fall in the arc between the new partition and its predecessor; all other keys stay where they are.
→ With Replication Factor = 3, each key (e.g., hash(key1)) is stored on the next three partitions along the ring.
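A minimal consistent-hashing sketch under the same conventions as the slide (positions in [0, 1), no virtual nodes); the class, the md5-based position function, and the partition names are all illustrative. A key goes to the first partition at or after its ring position, and a replication factor of k also copies it to the next k-1 partitions:

import bisect, hashlib

def ring_pos(value: str) -> float:
    """Map a string to a position in [0, 1) on the ring."""
    digest = hashlib.md5(value.encode()).digest()
    return int.from_bytes(digest, "big") / 2**128

class ConsistentHashRing:
    def __init__(self, partitions, replication_factor=1):
        self.rf = replication_factor
        self.ring = sorted((ring_pos(p), p) for p in partitions)

    def add_partition(self, name: str):
        """A new partition only takes over the keys between it and its predecessor."""
        bisect.insort(self.ring, (ring_pos(name), name))

    def partitions_for(self, key: str):
        """First partition after hash(key), plus rf-1 successors for replicas."""
        positions = [pos for pos, _ in self.ring]
        idx = bisect.bisect_right(positions, ring_pos(key))
        return [self.ring[(idx + i) % len(self.ring)][1] for i in range(self.rf)]

ring = ConsistentHashRing(["P1", "P2", "P3"], replication_factor=3)
print(ring.partitions_for("key1"))   # e.g. ['P2', 'P3', 'P1']
ring.add_partition("P4")             # only the keys in P4's new arc move
print(ring.partitions_for("key1"))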

LOGICAL PARTITIONING

Diagram (shared disk): the tuples (Id=1 … Id=4) all live in shared storage, but each node is assigned responsibility for a logical subset of them. Get Id=1 is routed to the node responsible for that ID and Get Id=3 to the other; each node reads the tuples it needs from the shared storage.

PHYSICAL PARTITIONING

Diagram (shared nothing): each node physically stores its own subset of the tuples on local storage (Id=1 and Id=2 on one node, Id=3 and Id=4 on the other). Get Id=1 and Get Id=3 are each routed to the node that holds the tuple.

SINGLE-NODE VS. DISTRIBUTED

A single-node txn only accesses data that is contained on one partition.
→ The DBMS does not need to coordinate the behavior of concurrent txns running on other nodes.

A distributed txn accesses data at one or more partitions.
→ Requires expensive coordination.

TRANSACTION COORDINATION

If our DBMS supports multi-operation and distributed txns, we need a way to coordinate their execution in the system.

Two different approaches:
→ Centralized: Global "traffic cop".
→ Decentralized: Nodes organize themselves.

TP MONITORS

A TP Monitor is an example of a centralized coordinator for distributed DBMSs.

Originally developed in the 1970-80s to provide txns between terminals and mainframe databases.
→ Examples: ATMs, Airline Reservations.

Many DBMSs now support the same functionality internally.

CENTRALIZED COORDINATOR

Diagram: the application server talks to a coordinator that sits in front of the partitions (P1-P4).
→ The application sends a Lock Request for the partitions its txn will touch; the coordinator acquires them and replies with an Acknowledgement.
→ When the application sends a Commit Request, the coordinator asks the partitions "Safe to commit?" and, once they all agree, sends an Acknowledgement back to the application.

Diagram (middleware variant): the coordinator can be a middleware layer that knows the partition map (P1→ID:1-100, P2→ID:101-200, P3→ID:201-300, P4→ID:301-400). The application sends its Query Requests to the middleware, which routes each one to the right partition; on a Commit Request the middleware asks the partitions whether it is safe to commit.
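To make the message flow concrete, here is a minimal in-memory sketch of a centralized coordinator. The class and method names are my own, everything runs in one process, and a real system would layer a commit protocol such as two-phase commit (next lecture) on top of the "safe to commit?" exchange:

class Partition:
    """Stand-in for one partition node."""
    def __init__(self, name):
        self.name = name
        self.locked_by = None                 # txn currently holding this partition

    def try_lock(self, txn_id):
        if self.locked_by in (None, txn_id):
            self.locked_by = txn_id
            return True
        return False                          # some other txn holds the lock

    def safe_to_commit(self, txn_id):
        return self.locked_by == txn_id       # partition votes "yes" / "no"

    def release(self, txn_id):
        if self.locked_by == txn_id:
            self.locked_by = None

class Coordinator:
    """Centralized "traffic cop": every lock and commit request goes through here."""
    def __init__(self, partitions):
        self.partitions = {p.name: p for p in partitions}

    def lock_request(self, txn_id, names):
        # Acknowledge only if every requested partition could be locked.
        return all(self.partitions[n].try_lock(txn_id) for n in names)

    def commit_request(self, txn_id, names):
        if all(self.partitions[n].safe_to_commit(txn_id) for n in names):
            for n in names:                   # every partition said it is safe
                self.partitions[n].release(txn_id)
            return True
        return False

coord = Coordinator([Partition(f"P{i}") for i in range(1, 5)])
assert coord.lock_request("T1", ["P1", "P2"])    # Lock Request -> Acknowledgement
assert coord.commit_request("T1", ["P1", "P2"])  # Commit Request -> Acknowledgement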

DECENTRALIZED COORDINATOR

Diagram: there is no separate coordinator. The application sends a Begin Request to one of the partition nodes, which becomes the master node for the txn. Query Requests then go to the partitions that hold the data, and the final Commit Request goes to the master node, which checks with the other partitions whether it is safe to commit.

DISTRIBUTED CONCURRENCY CONTROL

Need to allow multiple txns to execute simultaneously across multiple nodes.
→ Many of the same protocols from single-node DBMSs can be adapted.

This is harder because of:
→ Replication.
→ Network Communication Overhead.
→ Node Failures.
→ Clock Skew.

DISTRIBUTED 2PL

Diagram: two application servers run txns T1 and T2 against Node 1 (holding A=1) and Node 2 (holding B=8).
→ T1 executes Set A=2 on Node 1 and T2 executes Set B=7 on Node 2, so each txn holds an exclusive lock on its local object.
→ T1 then issues Set B=9 against Node 2 and T2 issues Set A=0 against Node 1; each txn now waits for a lock the other holds.
→ The global waits-for graph has a cycle T1 → T2 → T1: a distributed deadlock that neither node can detect from its local lock table alone.
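Detecting that deadlock requires assembling a global waits-for graph from the nodes' local lock information and checking it for cycles. A minimal sketch of the cycle check (plain depth-first search; the edge set is the one from the diagram):

def has_cycle(waits_for: dict) -> bool:
    """Return True if the waits-for graph contains a cycle (a deadlock)."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {t: WHITE for t in waits_for}

    def visit(txn) -> bool:
        color[txn] = GREY
        for blocker in waits_for.get(txn, []):
            if color.get(blocker, WHITE) == GREY:        # back edge -> cycle
                return True
            if color.get(blocker, WHITE) == WHITE and visit(blocker):
                return True
        color[txn] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in waits_for)

# Edges from the diagram: T1 waits for T2 (lock on B at Node 2),
# and T2 waits for T1 (lock on A at Node 1).
global_graph = {"T1": ["T2"], "T2": ["T1"]}
print(has_cycle(global_graph))   # True -> one of the txns must be aborted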

CONCLUSION

I have barely scratched the surface on distributed database systems…

It is hard to get this right.


NEXT CLASS

Distributed OLTP Systems

Replication

CAP Theorem

Real-World Examples
