distributed snapshots: determining global states of distributed systems - k. mani chandy and leslie...

23
Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Upload: collin-rutt

Post on 02-Apr-2015

231 views

Category:

Documents


6 download

TRANSCRIPT

Page 1: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Distributed Snapshots: Determining Global States of Distributed Systems

- K. Mani Chandy and Leslie Lamport

Page 2: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Outline• What this paper is about.

• Stable property of a system.

• Distributed System model

– Definitions.

– Token Conservation system example.

– Non-Deterministic Computation example.

• Global State Determination Algorithm

– Requirements of Consistent Global State.

– Termination of the algorithm

– Properties of the recorded global state. (theorem + example)

• Stability Detection Algorithm

Page 3: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Goal/Theme

• Algorithms by which a process in a distributed system can determine global state of the system during a computation.

• Processes need to cooperate to record state.• Individual processes do not share clocks/memory.• Global state = (Process state, channel state)• Algorithm must run concurrently with, but not

alter underlying computation.

Page 4: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Stable State• Why does this come in the paper ?

– Global State Detection Algorithm can help solve stability detection.

– Several distributed systems problems need to determine stable property.

• y(S) : predicate function defined on the global states of the distributed system D.

• y is a stable property if y(S) => y(S’) for all global states S’ in D reachable from S in D.

• E.g. “computation terminated”, “system deadlocked”.

Page 5: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Distributed System Model• Finite set of processes & channels[directed graph].• Channels: infinite buffers, error free, ordered

delivery of messages, messages have finite delay.• Channel state: sequence of messages sent but

exclude those received.• Process defined by: initial state, set of events, set

of states.• Events e = <p, s, s’, M, c>. They are atomic and

can change the state of p and at most one channel. M and c = null if e does not change state of any channel.

Page 6: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Contd.

Page 7: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Contd.

• Global state (initial state): processes (initial state), channels (empty sequence).

• Next (S, e)

• let seq = (ei : 0 <= i <= n) be a sequence of events in component processes of a distributed system.

• seq is legal iff:– system starts in S0 and Si+1 = next (Si, ei) 0 <= i <= n.

Page 8: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Example 2.1

Page 9: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Contd.

Page 10: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Example 2.2

Page 11: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Contd.

Page 12: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Contd.

• M : marker. M’ : message.• In example 2.1 only 1 event was possible… not so

the case here. Different permutations of sequence of events will lead to different global states… non-determinism.

• Non-Determinism : for example the events “p sends M” and q sends M’” may occur in the initial state and the next states after these events are different.

Page 13: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Algorithm Steps:

• Each process records its own state.• 2 processes that a channel is incident on cooperate

in recording channel state.• All process and channel states will not be recorded

at the exact same instant… no global clock.• Recorded global state must be “consistent”.• Should run concurrently with underlying

computation… sending messages…require processes to carry out computation; however algorithm cannot affect underlying computation.

Page 14: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Example 3.1

• In-p. record P's state. In-c. record q, c and c’ state.

• 2 tokens !

• n < n’. (n: #msg's sent c before recording P's state, n’: #msg’s sent c before recording C’s state).

• In-p. record c’s state. In-c. record q, p and c’ state.

• 0 tokens !

• n > n’.

• Hence need : n = n’ for consistent global state.

Page 15: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Contd.• Likewise: need m = m’ where m: #MSG’s

received along c before recording Q’s state. m’ : #MSG’s received along c before recording C's state.

• The state of a channel c that is recorded must be the sequence of messages sent along the channel before the sender’s state is recorded, excluding the sequence of messages received along the channel before the receiver’s state is recorded.

• Marker has no effect on underlying computation.

Page 16: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Global State Algorithm Outline• Marker sending rule :

– For each channel c, incident on/directed away from p:

– p sends one marker along c after p records its state and before p sends any further messages along c.

• Marker Receiving rule:– On receiving a marker along channel c:

if q has not recorded its state then

begin q records it state;

q records the state c as the empty sequence.

end

else q records the state of c as the sequence of messages received along c after q’s state was recorded and before q received the marker along c.

Page 17: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Algorithm Termination

• To ensure termination in finite time:– L1: no marker remains forever in an incident input

channel.

– L2: state recording takes finite time.

• Can prove that the algorithm must terminate in finite time given these conditions.

Page 18: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Properties of Global State• Recorded global state may not be the same as any

actual state.• But is equivalent (reachable from) and is

consistent.• See next theorem.

Page 19: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Contd.• let seq = (ei : 0 <= i <= n) be a distributed computation,

and let Si be the global state immediately before the event ei in seq.

• Let the algorithm be initiated in global state Sl and let it terminate in global state S.

• The recorded global state S* may be different from all global states Sk, l <= k <=

• We show that:– S* is reachable from Sl, and

– S is reachable from S*.

Page 20: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Contd.

• Specifically, we show that there exists a computation seq’ where– seq’ is a permutation of seq, such that Sl, S* and S

occur as global states in seq’.

– Sl = S* or Sl occurs earlier than S*.

– S = S* or S* occurs earlier than S in seq’.

Page 21: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Theorem 1.

• There exists a computation seq’=(ei: 0<=i) where

– For all i, where i < l or i > : ei’ = ei

– the subsequence (ei’: l<=i<) is a permutation of the subsequence (ei: l<=i<)

– for all i where i <= l or i > : Si’ = Si

– there exists some k, l <= k<= such that S* = Sk’

Page 22: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Example 4.1

• To show how seq’ can be derived from seq.

• Example 2.2 fig 7.– e0: p sends M, changes state to B (post-recording event).

– e1: q sends M’, changes state to D (pre-recording event).

– e2: p gets M’, changes state to A (post-recording event).

• We can interchange interchange e0 and e1 to form the subsequence seq’. (which corresponds to the global state we recorded in fig 8… see after e0’).

Page 23: Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

Stability Detection• Stability Detection Algorithm:

Input: A stable property y.

Output: A Boolean value definite with the property (y(Si) -> definite) and (definite -> y(So))

• Solution to stability detection problem:begin

record a global state S*;

definite := y(S*)

end.

• Algorithm correctness comes from properties discussed earlier.