clock skew and other annoying realities in distributed systems (donny nadolny, pagerduty) |...

20160908 Clock Skew, and other annoying realities in distributed systems Donny Nadolny [email protected] #CassandraSummit

Upload: datastax

Post on 16-Apr-2017


Page 1: Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny, PagerDuty) | Cassandra Summit 2016

2016−09−08

Clock Skew, and other annoying realities in distributed systems

Donny Nadolny

[email protected]

#CassandraSummit

Should I Care?

Probably not (individual data is low impact):
• user tracking / metrics
• hit counter / impressions
• log data

Yes (individual data is high impact):
• incident management (PagerDuty)
• financial info / banking / stocks
• online store

Introduction to Reads & Writes

Cassandra Write

• Cluster: 5 nodes
• Replication factor: 3
• Consistency: QUORUM

INSERT INTO table1 …

[Diagram, built up over several slides: the insert is sent as "write foo" to three replicas; two replicas then show "value: foo"; the client receives "Success".]
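As a rough illustration of why the client can see "Success" once two of the three replicas hold the value: at QUORUM, a write is acknowledged when a majority of the replicas respond. The sketch below assumes the usual majority formula, floor(RF / 2) + 1. The function names are hypothetical and are not part of any Cassandra driver.

```python
def quorum_size(replication_factor: int) -> int:
    """Number of replica acknowledgements a majority quorum needs."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def write_succeeds(acks: int, replication_factor: int) -> bool:
    """True once enough replicas have acknowledged the write."""
    return acks >= quorum_size(replication_factor)


if __name__ == "__main__":
    # Replication factor 3, as on the slide: two acknowledgements are enough.
    print(quorum_size(3))
    print(write_succeeds(acks=2, replication_factor=3))
```

With a replication factor of 3 the third replica may still be missing the value when the client is told the write succeeded.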

Cassandra Read

SELECT * FROM table1 WHERE …

[Diagram, built up over several slides: two replicas hold "value: foo"; a "read" is sent to each of them; the client receives "Success, value: foo".]

Cassandra Update

UPDATE table1 …

[Diagram, built up over several slides: the replicas start with "value: foo, t=5"; the update is sent as "write bar, t=7" to three replicas; the replicas then show both "value: foo, t=5" and "value: bar, t=7".]
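The t=5 and t=7 labels on these slides are write timestamps. A minimal sketch of timestamp-based reconciliation, where the version with the larger timestamp wins, might look like the following. The `Cell` type and `reconcile` function are hypothetical names for illustration, and the tie-break is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int


def reconcile(a: Cell, b: Cell) -> Cell:
    """Keep whichever version carries the larger timestamp."""
    # Ties are broken in favour of `a` here; this tie-break is a
    # simplification made for the sketch.
    return b if b.timestamp > a.timestamp else a


if __name__ == "__main__":
    old = Cell("foo", 5)
    new = Cell("bar", 7)
    print(reconcile(old, new).value)
```

The order in which the two versions arrive does not matter in this model, only their timestamps do.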

Successful Write?

Bank Example

INSERT INTO balances …

[Diagram: two nodes' clocks read t=5 and t=2. The insert is written to the replicas as "savings: 10000, t=5" and the client receives "Success".]

• Withdraw 8,000 from ATM:
  • Read current balance: 10,000
  • Update to 2,000
  • Dispense 8,000 cash

[Diagram, built up over several slides: the two clocks read t=6 and t=3 during the read, then t=7 and t=4 during the update. The update is written as "savings: 2000, t=4" and the client receives "Success". The replicas now show both "savings: 10000, t=5" and "savings: 2000, t=4".]

Clock Skew

• A successful write can really fail
• Your clocks are not perfectly synchronized
• “I’m running NTP, I’m good” - oh really?
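The bank example can be replayed as a toy simulation. It assumes each write is stamped with the clock of whichever node handles it, and that replicas keep the value with the highest timestamp. `Replica`, `Coordinator`, and `run_bank_example` are hypothetical names, and this is a sketch of the failure mode rather than of Cassandra's internals.

```python
class Replica:
    """Toy replica that keeps only the highest-timestamped value."""

    def __init__(self):
        self.value = None
        self.timestamp = -1

    def write(self, value, timestamp):
        # Last write wins: an older timestamp is silently ignored.
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp
        return "ack"


class Coordinator:
    """Toy coordinator that stamps writes with its own (skewed) clock."""

    def __init__(self, skew):
        self.skew = skew

    def write(self, replicas, value, true_time):
        timestamp = true_time + self.skew
        acks = [r.write(value, timestamp) for r in replicas]
        return "Success" if len(acks) >= 2 else "Failure"


def run_bank_example():
    replicas = [Replica() for _ in range(3)]
    accurate = Coordinator(skew=0)   # clock reads 5, 6, 7
    behind = Coordinator(skew=-3)    # clock reads 2, 3, 4

    first = accurate.write(replicas, 10000, true_time=5)  # stamped t=5
    second = behind.write(replicas, 2000, true_time=7)    # stamped t=4
    return first, second, replicas[0].value
```

Both writes report "Success", yet the balance a reader sees afterwards is still 10000, because the update carries the older timestamp t=4.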

Failed Write?

INSERT INTO stock_trades …

[Diagram, built up over several slides: the insert "trade 123: buy 100 BRKA" is sent to three replicas as "write trade 123 …", but the client receives "Connection error".]

• Connection Error
• Write Timeout

INSERT INTO stock_trades …

[Diagram, built up over several slides: a new insert, "trade 245: buy 100 BRKA", is stored on the replicas. Hints exist for the earlier write: "tell nodeA trade 123 …", "tell nodeB trade 123 …", "tell nodeC trade 123 …". Once "write trade 123 …" is delivered, the replicas show both trade 245 and trade 123.]

Eventual Consistency

• Full repair
• Read repair chance
• Hinted handoff
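The failed-write sequence can be sketched with a toy model in which undelivered writes are kept as hints and delivered later. This is only an illustration of the outcome shown on the slides. `ToyCluster` and its methods are hypothetical, and real hinted handoff has more moving parts than this.

```python
class ToyCluster:
    """Toy model: replicas plus a queue of hints for undelivered writes."""

    def __init__(self):
        self.replicas = {"nodeA": set(), "nodeB": set(), "nodeC": set()}
        self.hints = []  # (node, trade_id) pairs waiting to be delivered

    def write(self, trade_id, reachable):
        """Deliver to reachable replicas; store hints for the others."""
        for node in self.replicas:
            if node in reachable:
                self.replicas[node].add(trade_id)
            else:
                self.hints.append((node, trade_id))
        # Fewer than two acknowledgements is reported as a failure.
        return "Success" if len(reachable) >= 2 else "Write Timeout"

    def replay_hints(self):
        """Later on, deliver every stored hint to its replica."""
        for node, trade_id in self.hints:
            self.replicas[node].add(trade_id)
        self.hints.clear()


def run_failed_write_example():
    cluster = ToyCluster()
    first = cluster.write(123, reachable=set())  # reported as failed
    retry = cluster.write(245, reachable={"nodeA", "nodeB", "nodeC"})
    cluster.replay_hints()                       # trade 123 arrives late
    return first, retry, sorted(cluster.replicas["nodeA"])
```

The client was told that trade 123 failed and placed trade 245 instead, but after the hints are replayed both trades are stored.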

Multiple Writes aka “I wish I had transactions”

Another Bank Example

• Rule: minimum $10,000 end of day balance, monthly fee otherwise

Balance checker for each user:
  s = read savings
  c = read checking
  if s + c < 10000
    mark user for monthly fee

Transfer money:
  amount = …
  s = read savings
  c = read checking
  write_savings(s - amount)
  write_checking(c + amount)

Worked example, with the balance checker running between the two transfer writes:

Transfer money:
  amount = 5000
  s = read savings //7000
  c = read checking //6000
  write_savings(2000)
  write_checking(11000)

Balance checker for each user:
  s = read savings //2000
  c = read checking //6000
  if s + c < 10000 //true
    mark user for monthly fee
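The interleaving in this example can be reproduced with a small sketch. A Python generator stands in for the transfer so that the balance checker can run between its two writes. The dictionary-based account store and the function names are assumptions made for illustration.

```python
def transfer_steps(accounts, amount):
    """Generator version of the transfer; yields after each write."""
    s = accounts["savings"]
    c = accounts["checking"]
    accounts["savings"] = s - amount
    yield "savings written"
    accounts["checking"] = c + amount
    yield "checking written"


def should_charge_fee(accounts, minimum=10000):
    """Balance checker from the slide."""
    return accounts["savings"] + accounts["checking"] < minimum


def run_interleaving():
    accounts = {"savings": 7000, "checking": 6000}
    transfer = transfer_steps(accounts, 5000)

    next(transfer)                                  # savings is now 2000
    fee_mid_transfer = should_charge_fee(accounts)  # 2000 + 6000 < 10000
    next(transfer)                                  # checking is now 11000
    fee_after_transfer = should_charge_fee(accounts)
    return fee_mid_transfer, fee_after_transfer
```

The checker flags the user only when it runs between the two writes. The total across both accounts is 13,000 before and after the transfer.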

Solutions?

1. “Window of vulnerability is small, hope it doesn’t happen”
   • The client (your application) can crash
2. “Do the writes in reverse order”
   • Works for balance checker, but allows overdrawing your account
3. “Use a lock!”
   • The write can propagate out anyway
   • How long will you hold the lock for a failed write?

Isolation Guarantees in Cassandra

• Writes to multiple columns in the same row (when issued at the same time)
• Writes to multiple rows in one table that have the same partition key (when issued at the same time)

Partition key: the primary key of a table, or the first part of the primary key if it is a compound key

Atomic Batches

Atomicity

“An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs… the transaction cannot be observed to be in progress by another database client”

“An example of an atomic transaction is a monetary transfer from bank account A to account B.”

https://en.wikipedia.org/wiki/Atomicity_(database_systems)

Atomic Batch Write

BEGIN BATCH
  INSERT INTO table1 …
  INSERT INTO table2 …
  INSERT INTO table1 …
APPLY BATCH;

[Diagram, built up over several slides: "write batch" is sent to two nodes; the individual writes ("write table2", "write table1", "write table1") are then sent out and the client receives "Success"; finally "delete batch" is sent to the two nodes.]

[Diagram, second sequence: two "write table1" messages are shown and the client receives "Connection error"; a later slide shows "write table2", "write table1", and "write table1" all being sent.]
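One way to picture the batch sequence is a toy model in which the whole batch is logged first and the individual writes are then applied one at a time. This is a sketch of the behaviour the slides describe, not of Cassandra's batch log implementation. `ToyBatchCoordinator` and its methods are hypothetical.

```python
class ToyBatchCoordinator:
    """Toy model of a logged batch: log first, apply writes one by one."""

    def __init__(self, tables):
        self.tables = tables  # table name -> list of rows
        self.batchlog = []    # mutations still waiting to be applied

    def begin(self, mutations):
        # Step 1: record the whole batch before applying anything.
        self.batchlog = list(mutations)

    def apply_next(self):
        # Step 2: apply a single mutation; readers can run in between.
        table, row = self.batchlog.pop(0)
        self.tables[table].append(row)

    def replay(self):
        # Recovery: apply whatever is still in the batch log.
        while self.batchlog:
            self.apply_next()


def run_batch_example():
    tables = {"table1": [], "table2": []}
    coordinator = ToyBatchCoordinator(tables)
    coordinator.begin([("table1", "a"), ("table2", "b"), ("table1", "c")])

    coordinator.apply_next()
    # A reader at this point sees a partially applied batch.
    partial = (len(tables["table1"]), len(tables["table2"]))

    # Suppose the client now gets a connection error; the log is replayed anyway.
    coordinator.replay()
    final = (len(tables["table1"]), len(tables["table2"]))
    return partial, final
```

In this model every write in the batch is eventually applied, but a reader in the middle can still observe only part of it, which is the gap between "all or nothing" and isolation.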

Summary

• No isolation - you can read partial results
  • … even without any failures
• Atomic batches aren't really atomic
  • also, you give up sequential ordering
• A write can say it failed but really it succeeded
  • or it didn’t yet, but will hours later
• A write can say it succeeded but really it failed
  • :(

Questions? [email protected]

How do you deal with it?

• Idempotency - useful overall in distributed systems
• Avoid modifying data
  • Critical deletes get a new delete column written + row delete
  • Truly mutable data can be written to a new column (incrementing a version number in the column name)
• Monitor NTP
• Distributed locks with ZooKeeper and a sleep(100) before release
• Think hard about ordering & partial failure
• Test by adding “if (rng < …) exit or sleep” in between various writes
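The last testing idea, a random exit or sleep between writes, could be sketched as a small fault-injection helper. The probabilities are left as parameters because the slide elides the threshold. `maybe_fail`, `InjectedCrash`, and the `transfer` function are hypothetical, and an exception stands in for a real process exit so the effect can be observed in a test.

```python
import random
import time


class InjectedCrash(Exception):
    """Raised in place of a real process exit so tests can observe it."""


def maybe_fail(rng, crash_probability, sleep_probability, sleep_seconds=0.0):
    """Randomly crash or stall; call this between two writes under test."""
    roll = rng.random()
    if roll < crash_probability:
        raise InjectedCrash("simulated crash between writes")
    if roll < crash_probability + sleep_probability:
        time.sleep(sleep_seconds)
        return "slept"
    return "ok"


def transfer(accounts, amount, rng, crash_probability):
    """Two dependent writes with a fault-injection point between them."""
    accounts["savings"] -= amount
    maybe_fail(rng, crash_probability, sleep_probability=0.0)
    accounts["checking"] += amount
```

Running the code under test many times with a seeded `random.Random` makes any partial state it leaves behind reproducible.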