ordering of events in distributed systems & eventual consistency jinyang li
Post on 21-Dec-2015
217 views
TRANSCRIPT
Ordering of events in Distributed Systems
&Eventual Consistency
Jinyang Li
• Consistency model:– A constraint on the system state observable by
application operations
• Examples:– X86 memory:
– Database:
What is consistency?
read x (should be 5)write x=5
time
x:=x+1; y:=y-1 assert(x+y==const)
time
Consistency
• No right or wrong consistency models– Tradeoff between ease of programmability and
efficiency
• Consistency is hard in (distributed) systems:– Data replication (caching)– Concurrency– Failures
Consistency challenges: example
• Each node has a local copy of state• Read from local state• Send writes to the other node, but do not wait
QuickTime™ and a decompressor
are needed to see this picture.QuickTime™ and a decompressor
are needed to see this picture.
Consistency challenges: example
QuickTime™ and a decompressor
are needed to see this picture.QuickTime™ and a decompressor
are needed to see this picture.
W(x)1
W(y)1
x=1If y==0 critical section
y=1If x==0 critical section
Does this work?QuickTime™ and a decompressor
are needed to see this picture.QuickTime™ and a decompressor
are needed to see this picture.
R(y)0
W(x)1 W(y)1
R(x)0
x=1If y==0 critical section
y=1If x==0 critical section
What went wrong?QuickTime™ and a decompressor
are needed to see this picture.QuickTime™ and a decompressor
are needed to see this picture.
W(x)1 W(y)1
CPU0 sees: W(x)1R(y)0W(y)1
CPU1 sees:W(y)1R(x)0W(x)1
Diff CPUs see different event orders!
R(y)0 R(x)0
Strict consistency
• Each operation is stamped with a global wall-clock time
• Rules:1. Each read gets the latest write value
2. All operations at one CPU have time-stamps in execution order
• No two CPUs in the critical section• Proof: suppose mutual exclusion is violated
CPU0: W(x)1 R(y)0CPU1: W(y)1 R(x)0
• Rule 1: read gets latest writeCPU0: W(x)1 R(x)0CPU1: W(y)1 R(x)0
Strict consistency gives “intuitive” results
W must have timestamp later than R
Contradicts rule 1: R must see W(x)1
Sequential consistency
• Strict consistency is not practical– No global wall-clock available
• Sequential consistency is the closest
• Rules: There is a total order of ops s.t.– All CPUs see results according to total
order (i.e. reads see most recent writes)– Each CPUs’ ops appear in order
Lamport clock gives a total order
• Each CPU keeps a logical clock• Each CPU updates its logical clock between
successive events• A sender includes its clock value in the message.• A receiver advances its clock be greater than the
message’s clock value.• Lamport clocks define a total order.
– Ties are broken based on CPU ids.
Fix the exampleQuickTime™ and a decompressor
are needed to see this picture.QuickTime™ and a decompressor
are needed to see this picture.
W(x)1
W(y)1
CPU0 should see orderW(x)1W(y)1
CPU1 should see orderW(x)1W(y)1
ack
ack
R(y)0 R(x)1
Lamport clock: an example
W(x)1 W(y)11,0 S: W(x)1
2,1 R: W(x)1
4,0 R: ack
2,0 R: W(y)1
1,1 S: W(y)1
4,1 R: ack
3,1 S: ack3,0 S: ack
1,0 S W(x)1 1,1 S W(y)12,0 R W(y)12,1 R W(x)13,0 S ack3,1 S ack4,0 R ack4,1 S ack
Defines one possible total order: W(x)1 < W(y)1
Lamport clock: an example
W(x)1 W(y)11,0 S: W(x)1
2,1 R: W(x)1
4,0 R: ack
2,0 R: W(y)1
1,1 S: W(y)1
4,1 R: ack
3,1 S: ack3,0 S: ack
1,0 S W(x)1 ?????
1,0 S W(x)11,1 S W(y)12,0 R W(y)1 3,0 S ack
1,0 S W(x)11,1 S W(y)1
?????? 1,1 S: W(x)1
1,0 S W(x)11,1 S W(y)12,1 R: W(x)13,1 S: ack
1,0 S W(x)1
Beyond Lamport clock
• Typical system obtains a total order differently– Use a single node to order all reads/writes
• E.g. the lock_server in Lab1
– Partition state over multiple nodes, each node orders reads/writes for its partition
• Invariant: exactly one is in charge of ordering The ordering node must be online
Weakly consistent systems
• Sequential consistency– All read/writes are applied in total order– Reads must see most recent writes
• Eventual consistency (Bayou)– Writes are eventually applied in total order– Reads might not see most recent writes in
total order
Why (not) eventual consistency?
• Support disconnected operations– Better to read a stale value than nothing– Better to save writes somewhere than nothing
• Potentially anomalous application behavior– Stale reads and conflicting writes…
Bayou
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
VersionVector
Write log
0:01:02:0
0:01:02:0
0:01:02:0
N0
N1
N2
Bayou propagation
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
VersionVector
Write log
0:31:02:0
N0
N1
N2
1:0 W(x)2:0 W(y)3:0 W(z)
0:01:12:0
0:01:02:0
1:1 W(x)
1:0 W(x)2:0 W(y)3:0 W(z)
0:31:02:0
Bayou propagation
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
VersionVector
Write log
0:31:02:0
N0
N1
N2
1:0 W(x)2:0 W(y)3:0 W(z)
0:31:42:0
0:01:02:0
1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z)
1:1 W(x)0:31:42:0
Bayou propagation
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
VersionVector
Write log
N0
N1
N2
0:31:42:0
0:01:02:0
1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z)
0:41:42:0
1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z) Which portion of
The log is stable?
Bayou propagation
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
VersionVector
Write log
N0
N1
N2
0:31:42:0
1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z)
0:41:42:0
1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z)
0:31:42:5
1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z)
Bayou propagation
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
VersionVector
Write log
N0
N1
N2
0:31:62:5
1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z)
0:41:42:0
1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z)
0:41:42:5
1:0 W(x)1:1 W(x)2:0 W(y)3:0 W(z)
0:31:42:5
Bayou uses a primary to commit a total order
• Why is it important to make log stable?– Stable writes can be committed – Stable portion of the log can be truncated
• Problem: If any node is offline, the stable portion of all logs stops growing
• Bayou’s solution:– A designated primary defines a total commit order – Primary assigns CSNs (commit-seq-no)– Any write with a known CSN is stable– All stable writes are ordered before tentative writes
Bayou propagation
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
VersionVector
Write log
0:31:02:0
N0
N1
N2
1:1:0 W(x)2:2:0 W(y)3:3:0 W(z)
0:01:12:0
0:01:02:0
∞:1:1 W(x)
∞:1:1 W(x) 0:01:12:0
Bayou propagation
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
QuickTime™ and a decompressor
are needed to see this picture.
VersionVector
Write log
0:41:12:0
N0
N1
N2
1:1:0 W(x)2:2:0 W(y)3:3:0 W(z)
0:01:12:0
0:01:02:0
∞:1:1 W(x)
4:1:1 W(x)
1:1:0 W(x)2:2:0 W(y)3:3:0 W(z)4:1:1 W(x)
0:41:12:0
Bayou’s limitations
• Primary cannot fail
• Server creation & retirement makes nodeID grow arbitrarily long
• Anomalous behaviors for apps?– Calendar app