D u k e S y s t e m s Asynchronous Replicated State Machines (Causal Multicast and All That) Jeff Chase Duke University


Post on 18-Jan-2016


Page 1:

Duke Systems

Asynchronous Replicated State Machines

(Causal Multicast and All That)

Jeff Chase, Duke University

Page 2:

A Classic Paper

• Time, Clocks, and the Ordering of Events in Distributed Systems

• Leslie Lamport, CACM, July 1978

• Introduced:

– logical clocks (“Lamport clocks”)

– state-machine replication model

– causal multicast

– physical clocks that respect causality

– (almost, but not quite) vector clocks

Page 3:

Concurrency and time

[Figure: timelines for nodes A, B, and C]

What do these words mean? after? last? subsequent? eventually?

Page 4:

Same world, different timelines

Which of these happened first?

[Figure: timelines for nodes A and B with message send and message receive arrows; events e1a, e1b, e2, e3a, e3b, e4; labels W(x)=v, R(x), R(x), and “Event e1a wrote W(x)=v”]

e1a is concurrent with e1b.

e3a is concurrent with e3b.

This is a partial order of events.

Page 5:

Concurrency and time

• Premise: nodes communicate only by messages.

• Nodes can observe a remote event only through some chain of messages.

– Message patterns define the observability of events.

• Event e1 can affect or cause event e2 iff the node initiating e2 could have already observed e1.

– Message patterns define a potential causality relation.

– Also known as happened-before (→).

• Event e1 precedes e2 iff e1 could have caused e2.

– We can view the potential causality relation as logical time.

• Events are concurrent if neither precedes the other.

Page 6:

Axiom 1: happened-before (→)

[Figure: timelines for nodes A, B, and C]

1. If e1 and e2 are in the same process/node, and e1 comes before e2, then e1 → e2.

– Also called program order

Time, Clocks, and the Ordering of Events in Distributed systems, by Leslie Lamport, CACM 21(7), July 1978

Page 7:

Axiom 2: happened-before (→)

[Figure: timelines for nodes A, B, and C]

2. If e1 is a message send, and e2 is the corresponding receive, then e1 → e2.


Page 8:

Axiom 3: happened-before (→)

[Figure: timelines for nodes A, B, and C]

3. → is transitive: happened-before is the transitive closure of the relation defined by #1 and #2.


Page 9:

Potential causality: example

[Figure: timelines for nodes A (events A1 to A4), B (events B1 to B4), and C (events C1 to C3), connected by messages]

A1 < B2 < C2

B3 < A3

C2 < A4
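The three happened-before axioms can be checked mechanically on this example. The sketch below is only an illustration: the message arrows in the original figure are not recoverable, so the `messages` list is an assumption chosen to match the orderings listed above, and the function names are hypothetical.

```python
from itertools import product

# Events per node, in program order (names follow the example above).
program_order = {
    "A": ["A1", "A2", "A3", "A4"],
    "B": ["B1", "B2", "B3", "B4"],
    "C": ["C1", "C2", "C3"],
}
# ASSUMED (send, receive) pairs, chosen to be consistent with the listed orderings.
messages = [("A1", "B2"), ("B2", "C2"), ("B3", "A3"), ("C2", "A4")]


def happened_before(program_order, messages):
    """Return the set of pairs (e1, e2) such that e1 happened before e2."""
    events = [e for seq in program_order.values() for e in seq]
    hb = set(messages)                      # axiom 2: a send precedes its receive
    for seq in program_order.values():      # axiom 1: program order
        hb.update(zip(seq, seq[1:]))
    # Axiom 3: transitive closure (Warshall-style; k is the outermost loop).
    for k, i, j in product(events, repeat=3):
        if (i, k) in hb and (k, j) in hb:
            hb.add((i, j))
    return hb


def concurrent(hb, e1, e2):
    """Two distinct events are concurrent if neither precedes the other."""
    return e1 != e2 and (e1, e2) not in hb and (e2, e1) not in hb


hb = happened_before(program_order, messages)
print(("A1", "C2") in hb)          # A1 precedes C2 through B2
print(concurrent(hb, "A2", "B2"))  # no chain of messages links A2 and B2
```

Under these assumed messages, A2 and B2 are concurrent even though both follow A1.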

Page 10:

Logical clocks

We need clocks that reflect logical time. Start with logical clocks:

1. Each node maintains a monotonically increasing logical clock value LC.

2. Events are timestamped with the current LC value at the generating node.

– Increment LC on each new event: LC = LC + 1

3. Piggyback the current clock value on all messages.

– On receive, advance the receiver’s LC past the sender’s LC (LCs).

– If LCs > LC then LC = LCs + 1
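The rules above can be sketched as a small class. This is a minimal illustration, not the deck's own code: the class and method names are hypothetical, and the receive rule is written in the common form `max(LC, LCs) + 1`, which treats the receive itself as an event and so combines rules 2 and 3.

```python
class LamportClock:
    """A logical clock for one node (hypothetical helper for illustration)."""

    def __init__(self):
        self.lc = 0

    def tick(self):
        """Local event or message send: increment and return the timestamp."""
        self.lc += 1
        return self.lc

    def receive(self, sender_lc):
        """Message receive: move past the sender's clock if we are behind it."""
        self.lc = max(self.lc, sender_lc) + 1
        return self.lc


slow, fast = LamportClock(), LamportClock()
for _ in range(5):
    fast.tick()                      # fast.lc == 5
ts = fast.tick()                     # send from fast, stamped 6
slow_recv = slow.receive(ts)         # slow was at 0, jumps to 7
for _ in range(5):
    fast.tick()                      # fast.lc == 11
reply = slow.tick()                  # send from slow, stamped 8
fast_recv = fast.receive(reply)      # fast was ahead, so it just increments to 12
print(slow_recv, fast_recv)
```

The slow clock is pulled forward by the message, while the fast clock is unaffected apart from the increment for the receive event.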

Page 11:

The Clock Condition

• e1 → e2 implies that LC(e1) < LC(e2)

• LC ordering respects potential causality.

• Is the converse also true?

• i.e., does LC(e1) < LC(e2) imply that e1 → e2?

• What if e1 and e2 are concurrent?
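A tiny counterexample suggests the answer to the converse question is no. The sketch below assumes two nodes that never exchange messages, so all of their events are concurrent; the helper name is hypothetical.

```python
def local_timestamps(num_events):
    """Lamport timestamps for a node that only performs local events (LC starts at 0)."""
    lc, stamps = 0, []
    for _ in range(num_events):
        lc += 1
        stamps.append(lc)
    return stamps


# Nodes A and B never communicate, so every event at A is concurrent
# with every event at B under happened-before.
a1, a2 = local_timestamps(2)
(b1,) = local_timestamps(1)

print(b1 < a2)    # LC(b1) < LC(a2), yet b1 did not happen before a2
print(a1 == b1)   # concurrent events can even share a timestamp
```

Logical clock values order concurrent events arbitrarily, so a smaller timestamp does not by itself show potential causality.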

Page 12:

Logical clocks: example

[Figure: timelines for nodes A, B, and C, each starting at LC = 0, with events labeled by their logical clock values]

C5: LC update advances receiver’s clock if it is “running slow” relative to sender.

A6-A10: receiver’s clock is unaffected because it is “running fast” relative to sender.

Page 13:

Potential causality

[Figure: timelines for A, S, and C; labels W(x)=v, OK, “Try it now”, R(x) v?, and “Event e1a wrote W(x)=v”]

If nodes observe events in an order that violates causality, they may perceive an inconsistency.

Page 14:

Causal consistency

[Figure: timelines for A, S, and C; labels W(x)=v, “Try it now”, R(x) 0????, and “Event e1a wrote W(x)=v”]

This ordering violates causal consistency.

Page 15:

Same world, unified timelines?

[Figure: timelines for nodes A and B and an external witness X; events e1a, e1b, e2, e3a, e3b, e4, e5; labels W(x)=v, R(x), R(x)]

This is a total order of events. Also called a sequential schedule. It is consistent with the partial order induced by happened-before (causal order).

Page 16:

Same world, unified timelines?

[Figure: timelines for nodes A and B and an external witness X; events e1a, e1b, e2, e3a, e3b, e4; labels W(x)=v, R(x), R(x)]

Here is another total order of the same events. Like the last one, it is consistent with the partial order: it does not change any existing orderings; it only assigns orderings to events that are concurrent in the partial order.

Page 17:

Replicated State Machines, revisited

Replicas are “consistent” if all replicas apply the same (deterministic) updates in the same total order.

Suppose a client can read from any replica while writes are propagating. Does the order matter?

[Figure: sequences of opA and opB as applied at the replicas: opA opA opB opB; opA opB opB opA]
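A short sketch of why the order matters when updates do not commute. The two operations here are invented for illustration; the slide does not say what opA and opB do.

```python
def op_a(x):
    """Hypothetical update: add 10."""
    return x + 10


def op_b(x):
    """Hypothetical update: double the value."""
    return x * 2


def apply_all(state, ops):
    """Apply deterministic updates to a replica's state, in the given order."""
    for op in ops:
        state = op(state)
    return state


replica1 = apply_all(1, [op_a, op_b])   # (1 + 10) * 2
replica2 = apply_all(1, [op_b, op_a])   # (1 * 2) + 10
print(replica1, replica2)
```

Both replicas applied the same deterministic updates, yet they end in different states because the orders differ. A client reading from either replica would see inconsistent values.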

Page 18:

Challenge: A Decentralized Mutex

Implement a distributed mutex, respecting these conditions:

1. The resource holder must release before the algorithm grants the resource to another process;

2. Different acquire requests must be granted “in the order in which they are made”;

3. If every holder eventually releases, then every request is eventually granted.

Note: ordering is arbitrary for concurrent requests.

Page 19:

Definition of a lock (mutex)

• Acquire + release ops on L are strictly paired.

– After acquire completes, the caller holds (owns) the lock L until the matching release.

• Acquire + release pairs on each L are ordered.

– Total order: each lock L has at most one holder.

– That property is mutual exclusion; L is a mutex.

Page 20:

Lamport’s Mutex Problem

• The trick here is to implement a mutex service in a decentralized way, without a central lock server.

• Nodes/processes exchange messages and execute requests locally in the same order.

– Replicated State Machine

• Lamport’s condition II says the order must be causal.

– (Why?)

Page 21:

Lamport’s Proposed Solution

• Process group of N processes/nodes/peers with unique IDs.

• Each process is a state machine:

– Two operations: request (acquire) and release

– Internal state per-process: request queue

• Basic idea:

– Requester sends request to all peers (multicast)

– Peers have some means to order the requests: receive code may defer or reorder messages before delivery.

– All peers agree on the same order of delivery.

– Therefore, all peers agree on the sequence of acquisitions.

Page 22:

Causal Multicast (“cbcast”)

1. Assume FIFO delivery at the transport: messages sent by P are received by others in their send order (program order).

2. Sender timestamps each message with its logical clock.

3. Receiver acknowledges a received request with a message back to the sender, timestamped with the logical clock of the receiver (the ack’s sender).

– Propagates knowledge of peer timestamp values.

4. The logical clocks induce a partial order on the messages. To make it a total order: if two messages have the same timestamp, use the unique process ID to break the tie.

– This total order is “arbitrary”, but it respects potential causality.

5. A node delivers a request R to the local application when R is stable: it has the earliest timestamp T of all undelivered requests, and there can be no preceding request still in the network (e.g., the node has received a later-timestamped message from every peer).
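Steps 4 and 5 can be sketched as a comparison key and a stability test. This is an illustration under assumptions: the `Request` type, the `is_stable` function, and the `latest_seen` map (latest timestamp received from each peer) are hypothetical names, and the test uses the strict "later-timestamped" reading of step 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Request:
    timestamp: int  # sender's logical clock when the request was sent
    pid: int        # sender's unique process ID, used only to break ties


def is_stable(request, pending, latest_seen):
    """True if `request` may be delivered at this node.

    pending:     all undelivered requests known to this node
    latest_seen: latest timestamp received from each peer (ASSUMED bookkeeping)
    """
    is_earliest = request == min(pending)
    # With FIFO channels, a later-timestamped message from a peer means no
    # earlier request from that peer can still be in the network.
    heard_later = all(ts > request.timestamp
                      for peer, ts in latest_seen.items()
                      if peer != request.pid)
    return is_earliest and heard_later


# View from node 1, whose peers are nodes 2 and 3.
pending = {Request(5, 3), Request(5, 2), Request(7, 1)}
first = min(pending)                       # tie at timestamp 5 goes to pid 2
print(first)
print(is_stable(first, pending, {2: 5, 3: 5}))   # nothing later from node 3 yet
print(is_stable(first, pending, {2: 5, 3: 8}))   # node 3 has been heard from at 8
```

Ordering by the pair (timestamp, pid) gives every node the same total order without any further communication, since the process IDs are unique.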

Page 23:

The Lamport Mutex Algorithm

1. Use causal multicast.

2. Processes cache the acquire requests they have received on their request queue, in timestamp order.

• Including their own acquire requests

3. If a release message is received, remove the corresponding request from the queue.

4. Take the lock when your request is at the front of the queue.

• This request is next, and it is stable.
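The four steps above can be simulated in a single process. The sketch below is a simplified illustration, not the deck's implementation: it assumes no failures, models the network as one shared FIFO queue, and uses hypothetical names (`Process`, `run`, `holds_lock`).

```python
import bisect
from collections import deque


class Process:
    def __init__(self, pid, n, network):
        self.pid, self.network = pid, network
        self.clock = 0
        self.queue = []        # known acquire requests, sorted by (timestamp, pid)
        self.latest = {p: 0 for p in range(n) if p != pid}  # last timestamp seen per peer
        self.my_request = None

    def _multicast(self, kind, request):
        self.clock += 1
        for peer in self.latest:
            self.network.append((peer, kind, self.pid, self.clock, request))

    def request(self):
        self.my_request = (self.clock + 1, self.pid)   # stamped with the send time
        bisect.insort(self.queue, self.my_request)     # step 2: queue own request too
        self._multicast("request", self.my_request)

    def release(self):
        self.queue.remove(self.my_request)
        self._multicast("release", self.my_request)
        self.my_request = None

    def on_message(self, kind, src, timestamp, request):
        self.clock = max(self.clock, timestamp) + 1
        self.latest[src] = timestamp
        if kind == "request":
            bisect.insort(self.queue, request)
            self.clock += 1                            # acknowledge with our own clock
            self.network.append((src, "ack", self.pid, self.clock, None))
        elif kind == "release":
            self.queue.remove(request)                 # step 3

    def holds_lock(self):
        # Step 4: own request is at the front of the queue and is stable.
        return (self.my_request is not None
                and self.queue[0] == self.my_request
                and all(ts > self.my_request[0] for ts in self.latest.values()))


def run(network, procs):
    """Deliver queued messages in FIFO order until the network is empty."""
    while network:
        dest, kind, src, timestamp, request = network.popleft()
        procs[dest].on_message(kind, src, timestamp, request)


network = deque()
procs = [Process(pid, 3, network) for pid in range(3)]
procs[0].request()
procs[1].request()          # concurrent with process 0's request
run(network, procs)
first = [p.holds_lock() for p in procs]
procs[0].release()
run(network, procs)
second = [p.holds_lock() for p in procs]
print(first, second)
```

Both requests carry timestamp 1, so the process ID breaks the tie: process 0 takes the lock first, and process 1 takes it only after the release arrives.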

Page 24:

Enhanced causal multicast: Isis

• Later refinements to causal multicast (“cbcast”) improved concurrency (e.g., in Birman’s Isis group system).

– Lamport’s approach delivers messages in a total order.

– Lamport’s use of logical clocks imposes an ordering in some instances where causality does not require it.

– Isis advanced cbcast (causal broadcast): deliver concurrent messages in any order; never delay your own messages to self; use various batching optimizations for message bursts.

– How to know if messages are concurrent?

• Lamport’s state-machine service doesn’t handle failures: addressed in later work (e.g., Isis again).

– Requires failure detection, uniform atomicity, and views.
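One well-known answer to the concurrency question is vector clocks, which the deck notes Lamport's paper almost introduced. The sketch below shows the standard vector clock comparison; it is a generic illustration with hypothetical function names, and it is not taken from the Isis implementation.

```python
def tick(vc, i):
    """Node i records a local event or send: increment its own entry."""
    return vc[:i] + (vc[i] + 1,) + vc[i + 1:]


def merge(vc, received, i):
    """Node i receives a message: take the element-wise max, then tick."""
    return tick(tuple(max(a, b) for a, b in zip(vc, received)), i)


def leq(a, b):
    return all(x <= y for x, y in zip(a, b))


def happened_before(a, b):
    return leq(a, b) and a != b


def concurrent(a, b):
    return not leq(a, b) and not leq(b, a)


A, B = 0, 1                      # indices of two nodes
m1 = tick((0, 0), A)             # A sends m1, stamped (1, 0)
m2 = tick((0, 0), B)             # B sends m2 before seeing m1, stamped (0, 1)
vc_b = merge(m2, m1, B)          # B receives m1: (1, 2)
m3 = tick(vc_b, B)               # B sends m3, stamped (1, 3)

print(concurrent(m1, m2))        # neither sender had seen the other's message
print(happened_before(m1, m3))   # m3 was sent after B received m1
```

Unlike a single logical clock value, comparing two vector timestamps tells the receiver whether the messages are causally ordered or concurrent, which is what allows concurrent messages to be delivered in any order.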
