data-intensive distributed computing - roegiest › bigdata-2019w › slides ›...

49
Data-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details CS 431/631 451/651 (Winter 2019) Adam Roegiest Kira Systems March 19, 2019 These slides are available at http://roegiest.com/bigdata-2019w/

Upload: others

Post on 25-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Data-Intensive Distributed Computing

Part 7: Mutable State (2/2)

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United StatesSee http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details

CS 431/631 451/651 (Winter 2019)

Adam RoegiestKira Systems

March 19, 2019

These slides are available at http://roegiest.com/bigdata-2019w/

Page 2: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

The Fundamental Problem

We want to keep track of mutable state in a scalable manner

MapReduce won’t do!

Assumptions:State organized in terms of logical records

State unlikely to fit on single machine, must be distributed

Page 3: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Motivating Scenarios

Money shouldn’t be created or destroyed:Alice transfers $100 to Bob and $50 to Carol

The total amount of money after the transfer should be the same

Phantom shopping cart:Bob removes an item from his shopping cart…

Item still remains in the shopping cartBob refreshes the page a couple of times… item finally gone

Page 4: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Motivating Scenarios

People you don’t want seeing your pictures:Alice removes mom from list of people who can view photos

Alice posts embarrassing pictures from Spring BreakCan mom see Alice’s photo?

Why am I still getting messages?Bob unsubscribes from mailing list and receives confirmation

Message sent to mailing list right after unsubscribeDoes Bob receive the message?

Page 5: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Three Core Ideas

Partitioning (sharding)To increase scalability and to decrease latency

CachingTo reduce latency

ReplicationTo increase robustness (availability) and to increase throughput

Why do these scenarios happen?

Need replica coherence protocol!

Page 6: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Source: Wikipedia (Cake)

Page 7: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Morale of the story: there’s no free lunch!

Source: www.phdcomics.com/comics/archive.php?comicid=1475

(Everything is a tradeoff)

Page 8: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Three Core Ideas

Partitioning (sharding)To increase scalability and to decrease latency

CachingTo reduce latency

ReplicationTo increase robustness (availability) and to increase throughput

Why do these scenarios happen?

Need replica coherence protocol!

Page 9: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Relational Databases

… to the rescue!

Source: images.wikia.com/batman/images/b/b1/Bat_Signal.jpg

Page 10: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

How do RDBMSes do it?

Partition tables to keep transactions on a single machineExample: partition by user

What about transactions that require multiple machines?Example: transactions involving multiple users

Transactions on a single machine: (relatively) easy!

Solution: Two-Phase Commit

Page 11: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Coordinator

subordinates

Okay everyone, PREPARE! YES

YES

YES

Good.COMMIT!

ACK!

ACK!

ACK!

DONE!

2PC: Sketch

Page 12: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Coordinator

subordinates

Okay everyone, PREPARE! YES

YES

NO

ABORT!

2PC: Sketch

Page 13: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Coordinator

subordinates

Okay everyone, PREPARE! YES

YES

YES

Good.COMMIT!

ACK!

ACK!

2PC: Sketch

Page 14: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

2PC: Assumptions and Limitations

Assumptions:Persistent storage and write-ahead log at every node

WAL is never permanently lost

Limitations:It’s blocking and slow

What if the coordinator dies?

Page 15: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Three Core Ideas

Partitioning (sharding)To increase scalability and to decrease latency

CachingTo reduce latency

ReplicationTo increase robustness (availability) and to increase throughput

Why do these scenarios happen?

Need replica coherence protocol!

Page 16: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Replication possibilities

Update sent to a masterReplication is synchronous

Replication is asynchronousCombination of both

Update sent to an arbitrary replicaReplication is synchronous(?)Replication is asynchronous

Combination of both

Page 17: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Distributed ConsensusMore general problem: addresses replication and partitioning

Time

… Paxos

Hi everyone, let’s change

the value of x.Hi everyone,

let’s execute a transaction t.

Page 18: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Replication possibilities

Update sent to a masterReplication is synchronous

Replication is asynchronousCombination of both

Update sent to an arbitrary replicaReplication is synchronous(?)Replication is asynchronous

Combination of both

Guaranteed consistency with a consensus protocol

A buggy mess“Eventual Consistency”

Page 19: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Consistency

Availability

(Brewer, 2000)

Partition tolerance

… pick two

CAP “Theorem”

Page 20: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

CAP Tradeoffs

CA = consistency + availabilityE.g., parallel databases that use 2PC

AP = availability + tolerance to partitionsE.g., DNS, web caching

Page 21: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Wait a sec, that doesn’t sound right!

Source: Abadi (2012) Consistency Tradeoffs in Modern Distributed Database System Design. IEEE Computer, 45(2):37-42

Is this helpful?

CAP not really even a “theorem” because vague definitionsMore precise formulation came a few years later

Page 22: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Abadi Says…

CAP says, in the presence of P, choose A or CBut you’d want to make this tradeoff even when there is no P

Fundamental tradeoff is between consistency and latencyNot available = (very) long latency

CP makes no sense!

Page 23: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Move over, CAP

PACIf there’s a partition, do we choose A or C?

ELCOtherwise, do we choose Latency or Consistency?

PACELC (“pass-elk”)

Page 24: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

At the end of the day…

Guaranteed consistency with a consensus protocol

A buggy mess

“Eventual Consistency”

Page 25: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Morale of the story: there’s no free lunch!

Source: www.phdcomics.com/comics/archive.php?comicid=1475

(Everything is a tradeoff)

Page 26: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

h = 0h = 2n – 1

Machine fails: What happens?

Solution: ReplicationN = 3, replicate +1, –1

Covered!

Covered!

Page 27: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Image Source: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

HBase

Page 28: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Three Core Ideas

Partitioning (sharding)To increase scalability and to decrease latency

CachingTo reduce latency

ReplicationTo increase robustness (availability) and to increase throughput

Why do these scenarios happen?

Need replica coherence protocol!

Page 29: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Source: www.facebook.com/note.php?note_id=23844338919

MySQL

memcached

Read path:Look in memcachedLook in MySQLPopulate in memcached

Write path:Write in MySQLRemove in memcached

Subsequent read:Look in MySQLPopulate in memcached

Facebook Architecture

Page 30: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

1. User updates first name from “Jason” to “Monkey”.

2. Write “Monkey” in master DB in CA, delete memcached entry in CA and VA.

3. Someone goes to profile in Virginia, read VA replica DB, get “Jason”.

4. Update VA memcache with first name as “Jason”.

5. Replication catches up. “Jason” stuck in memcached until another write!

Source: www.facebook.com/note.php?note_id=23844338919

MySQL

memcached

California

MySQL

memcached

Virginia

Replication lag

Facebook Architecture: Multi-DC

Page 31: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Source: www.facebook.com/note.php?note_id=23844338919

= stream of SQL statements

Solution: Piggyback on replication stream, tweak SQL

REPLACE INTO profile (`first_name`) VALUES ('Monkey’)WHERE `user_id`='jsobel' MEMCACHE_DIRTY 'jsobel:first_name'

Facebook Architecture: Multi-DC

MySQL

memcached

California

MySQL

memcached

Virginia

Replication

Page 32: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Three Core Ideas

Partitioning (sharding)To increase scalability and to decrease latency

CachingTo reduce latency

ReplicationTo increase robustness (availability) and to increase throughput

Why do these scenarios happen?

Need replica coherence protocol!

Page 33: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Source: Google

Now imagine multiple datacenters…

What’s different?

Page 34: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

tl;dr -

Implement a global consensus protocol for every transactionGuarantee consistency, but slow

Eventual consistencyWho knows?

Single row transactionsEasy to implement, obvious limitations

Page 35: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

tl;dr -

Implement a global consensus protocol for every transactionGuarantee consistency, but slow

Eventual consistencyWho knows?

Single row transactionsEasy to implement, obvious limitations

Page 36: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

tl;dr -

Implement a global consensus protocol for every transactionGuarantee consistency, but slow✗

✗Entity groups

Groups of entities that share affinityExample: user + user’s photos + user’s posts etc.

Page 37: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Source: Baker et al., CIDR 2011

Google’s Megastore

Page 38: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

But what if that’s not enough?

Page 39: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Preserving commit order: example schema

Source: Llyod, 2012

Page 40: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Preserving commit order

Source: Llyod, 2012

Page 41: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Snapshot MapReduce and queries

Initial state

T1@ts1 INSERT INTO ads VALUES (2, “elkhound puppies”)

T2@ts2 INSERT INTO impressions VALUES (US, 2PM, 2)

Source: Llyod, 2012

Page 42: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Source: Llyod, 2012

Google’s Spanner

Features:Full ACID translations across multiple datacenters, across continents!

External consistency (= linearizability):system preserves happens-before relationship among transactions

How?Given write transactions A and B, if A happens-before B, then

timestamp(A) < timestamp(B)

Page 43: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

TrueTime → write timestamps

Source: Llyod, 2012

Page 44: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Why this works

Source: Llyod, 2012

Page 45: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

TrueTime

Source: Llyod, 2012

Page 46: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Source: The Matrix

What’s the catch?

Page 47: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Three Core Ideas

Partitioning (sharding)To increase scalability and to decrease latency

CachingTo reduce latency

ReplicationTo increase robustness (availability) and to increase throughput

Need replica coherence protocol!

Page 48: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Source: Wikipedia (Cake)

Page 49: Data-Intensive Distributed Computing - Roegiest › bigdata-2019w › slides › didp-part07b.pdfData-Intensive Distributed Computing Part 7: Mutable State (2/2) This work is licensed

Morale of the story: there’s no free lunch!

Source: www.phdcomics.com/comics/archive.php?comicid=1475

(Everything is a tradeoff)