ordering of events in distributed systems & eventual consistency jinyang li

27
Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Post on 21-Dec-2015

217 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Ordering of events in Distributed Systems

&Eventual Consistency

Jinyang Li

Page 2: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

• Consistency model:– A constraint on the system state observable by

application operations

• Examples:– X86 memory:

– Database:

What is consistency?

read x (should be 5)write x=5

time

x:=x+1; y:=y-1 assert(x+y==const)

time

Page 3: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Consistency

• No right or wrong consistency models– Tradeoff between ease of programmability and

efficiency

• Consistency is hard in (distributed) systems:– Data replication (caching)– Concurrency– Failures

Page 4: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Consistency challenges: example

• Each node has a local copy of state• Read from local state• Send writes to the other node, but do not wait

QuickTime™ and a decompressor

are needed to see this picture.QuickTime™ and a decompressor

are needed to see this picture.

Page 5: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Consistency challenges: example

QuickTime™ and a decompressor

are needed to see this picture.QuickTime™ and a decompressor

are needed to see this picture.

W(x)1

W(y)1

x=1If y==0 critical section

y=1If x==0 critical section

Page 6: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Does this work?QuickTime™ and a decompressor

are needed to see this picture.QuickTime™ and a decompressor

are needed to see this picture.

R(y)0

W(x)1 W(y)1

R(x)0

x=1If y==0 critical section

y=1If x==0 critical section

Page 7: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

What went wrong?QuickTime™ and a decompressor

are needed to see this picture.QuickTime™ and a decompressor

are needed to see this picture.

W(x)1 W(y)1

CPU0 sees: W(x)1R(y)0W(y)1

CPU1 sees:W(y)1R(x)0W(x)1

Diff CPUs see different event orders!

R(y)0 R(x)0

Page 8: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Strict consistency

• Each operation is stamped with a global wall-clock time

• Rules:1. Each read gets the latest write value

2. All operations at one CPU have time-stamps in execution order

Page 9: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

• No two CPUs in the critical section• Proof: suppose mutual exclusion is violated

CPU0: W(x)1 R(y)0CPU1: W(y)1 R(x)0

• Rule 1: read gets latest writeCPU0: W(x)1 R(x)0CPU1: W(y)1 R(x)0

Strict consistency gives “intuitive” results

W must have timestamp later than R

Contradicts rule 1: R must see W(x)1

Page 10: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Sequential consistency

• Strict consistency is not practical– No global wall-clock available

• Sequential consistency is the closest

• Rules: There is a total order of ops s.t.– All CPUs see results according to total

order (i.e. reads see most recent writes)– Each CPUs’ ops appear in order

Page 11: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Lamport clock gives a total order

• Each CPU keeps a logical clock• Each CPU updates its logical clock between

successive events• A sender includes its clock value in the message.• A receiver advances its clock be greater than the

message’s clock value.• Lamport clocks define a total order.

– Ties are broken based on CPU ids.

Page 12: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Fix the exampleQuickTime™ and a decompressor

are needed to see this picture.QuickTime™ and a decompressor

are needed to see this picture.

W(x)1

W(y)1

CPU0 should see orderW(x)1W(y)1

CPU1 should see orderW(x)1W(y)1

ack

ack

R(y)0 R(x)1

Page 13: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Lamport clock: an example

W(x)1 W(y)11,0 S: W(x)1

2,1 R: W(x)1

4,0 R: ack

2,0 R: W(y)1

1,1 S: W(y)1

4,1 R: ack

3,1 S: ack3,0 S: ack

1,0 S W(x)1 1,1 S W(y)12,0 R W(y)12,1 R W(x)13,0 S ack3,1 S ack4,0 R ack4,1 S ack

Defines one possible total order: W(x)1 < W(y)1

Page 14: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Lamport clock: an example

W(x)1 W(y)11,0 S: W(x)1

2,1 R: W(x)1

4,0 R: ack

2,0 R: W(y)1

1,1 S: W(y)1

4,1 R: ack

3,1 S: ack3,0 S: ack

1,0 S W(x)1 ?????

1,0 S W(x)11,1 S W(y)12,0 R W(y)1 3,0 S ack

1,0 S W(x)11,1 S W(y)1

?????? 1,1 S: W(x)1

1,0 S W(x)11,1 S W(y)12,1 R: W(x)13,1 S: ack

1,0 S W(x)1

Page 15: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Beyond Lamport clock

• Typical system obtains a total order differently– Use a single node to order all reads/writes

• E.g. the lock_server in Lab1

– Partition state over multiple nodes, each node orders reads/writes for its partition

• Invariant: exactly one is in charge of ordering The ordering node must be online

Page 16: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Weakly consistent systems

• Sequential consistency– All read/writes are applied in total order– Reads must see most recent writes

• Eventual consistency (Bayou)– Writes are eventually applied in total order– Reads might not see most recent writes in

total order

Page 17: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Why (not) eventual consistency?

• Support disconnected operations– Better to read a stale value than nothing– Better to save writes somewhere than nothing

• Potentially anomalous application behavior– Stale reads and conflicting writes…

Page 18: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Bayou

QuickTime™ and a decompressor

are needed to see this picture.

QuickTime™ and a decompressor

are needed to see this picture.

QuickTime™ and a decompressor

are needed to see this picture.

VersionVector

Write log

0:01:02:0

0:01:02:0

0:01:02:0

N0

N1

N2

Page 19: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Bayou propagation

QuickTime™ and a decompressor

are needed to see this picture.

QuickTime™ and a decompressor

are needed to see this picture.

QuickTime™ and a decompressor

are needed to see this picture.

VersionVector

Write log

0:31:02:0

N0

N1

N2

1:0 W(x)2:0 W(y)3:0 W(z)

0:01:12:0

0:01:02:0

1:1 W(x)

1:0 W(x)2:0 W(y)3:0 W(z)

0:31:02:0

Page 20: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Bayou propagation

QuickTime™ and a decompressor

are needed to see this picture.

QuickTime™ and a decompressor

are needed to see this picture.

QuickTime™ and a decompressor

are needed to see this picture.

VersionVector

Write log

0:31:02:0

N0

N1

N2

1:0 W(x)2:0 W(y)3:0 W(z)

0:31:42:0

0:01:02:0

1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z)

1:1 W(x)0:31:42:0

Page 21: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Bayou propagation

QuickTime™ and a decompressor

are needed to see this picture.

QuickTime™ and a decompressor

are needed to see this picture.

QuickTime™ and a decompressor

are needed to see this picture.

VersionVector

Write log

N0

N1

N2

0:31:42:0

0:01:02:0

1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z)

0:41:42:0

1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z) Which portion of

The log is stable?

Page 22: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Bayou propagation

QuickTime™ and a decompressor

are needed to see this picture.

QuickTime™ and a decompressor

are needed to see this picture.

QuickTime™ and a decompressor

are needed to see this picture.

VersionVector

Write log

N0

N1

N2

0:31:42:0

1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z)

0:41:42:0

1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z)

0:31:42:5

1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z)

Page 23: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Bayou propagation

QuickTime™ and a decompressor

are needed to see this picture.

QuickTime™ and a decompressor

are needed to see this picture.

QuickTime™ and a decompressor

are needed to see this picture.

VersionVector

Write log

N0

N1

N2

0:31:62:5

1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z)

0:41:42:0

1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z)

0:41:42:5

1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z)

0:31:42:5

Page 24: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Bayou uses a primary to commit a total order

• Why is it important to make log stable?– Stable writes can be committed – Stable portion of the log can be truncated

• Problem: If any node is offline, the stable portion of all logs stops growing

• Bayou’s solution:– A designated primary defines a total commit order – Primary assigns CSNs (commit-seq-no)– Any write with a known CSN is stable– All stable writes are ordered before tentative writes

Page 25: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Bayou propagation

QuickTime™ and a decompressor

are needed to see this picture.

QuickTime™ and a decompressor

are needed to see this picture.

QuickTime™ and a decompressor

are needed to see this picture.

VersionVector

Write log

0:31:02:0

N0

N1

N2

1:1:0 W(x)2:2:0 W(y)3:3:0 W(z)

0:01:12:0

0:01:02:0

∞:1:1 W(x)

∞:1:1 W(x) 0:01:12:0

Page 26: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Bayou propagation

QuickTime™ and a decompressor

are needed to see this picture.

QuickTime™ and a decompressor

are needed to see this picture.

QuickTime™ and a decompressor

are needed to see this picture.

VersionVector

Write log

0:41:12:0

N0

N1

N2

1:1:0 W(x)2:2:0 W(y)3:3:0 W(z)

0:01:12:0

0:01:02:0

∞:1:1 W(x)

4:1:1 W(x)

1:1:0 W(x)2:2:0 W(y)3:3:0 W(z)4:1:1 W(x)

0:41:12:0

Page 27: Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li

Bayou’s limitations

• Primary cannot fail

• Server creation & retirement makes nodeID grow arbitrarily long

• Anomalous behaviors for apps?– Calendar app