global states in a distributed system
DESCRIPTION
Global States in a Distributed System. By John Kor and Yvonne Cheng. Initial Problem Example. Garbage Collector Free’s up memory which is no longer in use Check’s if a reference to memory still exists What about in a distributed system. Initial Problem Example (cont’d). - PowerPoint PPT PresentationTRANSCRIPT
Global States in a Distributed System
By John Kor and Yvonne Cheng
Initial Problem Example
Garbage CollectorFree’s up memory which is no longer in useCheck’s if a reference to memory still exists
What about in a distributed system
Initial Problem Example (cont’d)
A distributed system consists of multiple processes
Each process is located on a different computerNo sharing of processor or memory
Initial Problem Example (cont’d)
Each process can only determine its own “state”
Problem: How do we determine when to garbage collect in a distributed system?How do we check whether a reference to
memory still exists?
System Model
A distributed system consists of multiple processes Each process is located on a different computer Each process consists of “events” An event is either sending a message, receiving a
message, or changing the value of some variable Each process has a communication channel in and
out
Our Garbage Collection Problem
In order to test whether a certain property of our system is true, we cannot just look at each process individually
A “snapshot” of the entire system must be taken to test whether a certain property of the system is true
This “snapshot” is called a Global State
Definition
The global state of a distributed system is the set of local states of each individual processes involved in the system plus the state of the communication channels.
Determinism
Deterministic ComputationAt any point in computation there is at most
one event that can happen next.
Non-Deterministic ComputationAt any point in computation there can be
more than one event that can happen next.
Deterministic Computation
Non-Deterministic Computation
Determinism
Deterministic computationA local event would reveal everything about
the global state!The process will know other process’ state
Non-Deterministic computationBecause of branching, a local event cannot
reveal what the next step will be
Simple Algorithm
Create a new process that collects the states of every other process
Every process will save their state at an arbitrary time and send it to this new process
Advantages
Very simple
Easy to implement
Problems?
Based on the assumption that all processes work on a synchronized global clock
Wrong assumption!
Problems (cont’d)
State recorded by p
m
p q
Problems (cont’d)
p q
m
Problems (cont’d)
State recorded by q
p q
m
Problems (cont’d)
Global state recorded
m
p q
m
Another view
p
q
m
Another view
Process p has no record of sending m
Process q HAS record of receiving mProblem?Global state does not show p sending m,
therefore there is confusion as to where m came from
Breaks the Consistency concept
Consistency
A global state is consistent if it could have been observed by an external observer
If e e` , then both e and e` must reside within the same state
For a successful Global State, all states must be consistent
Solution
Need to develop an asynchronous algorithm
Cannot depend on a clock
Must ensure consistency in all global states
Assumptions
Distributed system: Finite set of processes and channels; described by graph
Processes Set of states, initial state, set of events
Channels FIFO, error-free, infinite buffers, arbitrary but finite
delay
PART 2
Presented By: Yvonne
Idea of a global state recording algorithm
- each process records its own state
- the two processes incident by one channel cooperate in recording the channel state
Challenge
- No global clock
- Need a meaningful result
- Superimposed on underlying computation
Meaningful: The notion of Consistency
- it could have been observed by an external observer
- All feasible states are consistent
An Example
p q
p
q
Sp0 Sp
1 Sp2 Sp
3
Sq0 Sq
1 Sq2 Sq
3
m1
m2
m3
A Consistent State?
p q
Sp1 Sq
1
p
q
Sp0 Sp
1 Sp2 Sp
3
Sq0 Sq
1 Sq2 Sq
3
m1
m2
m3
Yes
p
q
p q
Sp0 Sp
1 Sp2 Sp
3
Sq0 Sq
1 Sq2 Sq
3
m1
m2
m3
Sp1 Sq
1
A Consistent State?
p q
Sp2 Sq
3
m3
p
q
Sp0 Sp
1 Sp2 Sp
3
Sq0 Sq
1 Sq2 Sq
3
m1
m2 m3
Yes
p
q
p q
Sp0 Sp
1 Sp2 Sp
3
Sq0 Sq
1 Sq2 Sq
3
m1
m2 m3
Sp2 Sq
3
m3
An inconsistent State
p
q
p q
Sp0 Sp
1 Sp2 Sp
3
Sq0 Sq
1 Sq2 Sq
3
m1
m2
m3
Sp1 Sq
3
Conducting algorithm: Using An Example
- Processes: p and q- Channels: c and c’- Token: t
p q
c
c’
An Example
- p records its state
t
p q
c
c’
An Example
- q, c, and c’ record their states
t
p q
c
c’
An Example
- The composite global state!
t
p q
c
c’
t
An Example
- n: number of messages sent along c before p’s state is recorded
- n’: number of message sent along c before c’s state is recorded
p q
c
c’
An Example
- Reason of inconsistency: n<n’
t
p q
c
c’
t
p q
c
c’
n = 0
n’ = 1
Similar scenario
c is recorded when the token is at process p.
p sends the token through channel c, and the states of c’, p, and q are recorded.
The recorded global state : no tokens in the system.
The reason of inconsistency : n>n’
Conclusion from the example
A consistent global state
requires
n = n’
Similar Conclusion
m : number of messages received along c before q’s state is recorded
m’ : number of messages received along c before c’s state is recorded
To be consistency: m=m’
Some other equations
m’ : number of messages received along c before c’s state is recorded
n’ : number of messages sent along c before c’s state is recorded
m : number of messages received along c before p’s state is recorded
n : number of messages sent along c before p’s state is recorded
n = n’ m = m’
n’ >= m’
n >= m
Other Fact
The state of channel c that is recorded must be the sequence of messages sent along the channel before the sender’s state is recorded, excluding the sequence of messages received along the channel before the receiver’s state is recorded.
Two cases:n’=m’ : c is emptyn’>m’: c must be the (m’+1)st…n’th messages sent by p
along c
Put All Together:A brief sketch of the algorithm
p sends a marker message along all its outgoing channels after it records its state and before it sends any other messages.
On receipt of a marker message from channel celse
state ( c ) = messages received on c since it had recorded its state excluding the marker.
if p has not recorded its staterecord the statestate ( c ) = EMPTY
Chandy and Lamport Algorithm
Features: Does not promise us to give us exactly what is
there But gives us consistent state!!
Algorithm in Action
p
qSq
0 Sq1 Sq
2 Sq3
Sp0 Sp
1 Sp2 Sp
3
m1 m2 m3
Algorithm in Action
p
qSq
0 Sq1 Sq
2 Sq3
Sp0 Sp
1 Sp2 Sp
3
m1 m2 m3
q records state as Sq1 , sends marker to p
Algorithm in Action
p
qSq
0 Sq1 Sq
2 Sq3
Sp0 Sp
1 Sp2 Sp
3
m1 m2 m3
p records state as Sp2, channel state as empty
Algorithm in Action
p
qSq
0 Sq1 Sq
2 Sq3
Sp0 Sp
1 Sp2 Sp
3
m1 m2 m3
q records channel state as m3
Algorithm in Action
p
qSq
0 Sq1 Sq
2 Sq3
Sp0 Sp
1 Sp2 Sp
3
m1 m2 m3
Recorded Global State = ((Sp2, Sq
1), (0,m3) )
Algorithm in Action
p
qSq
0 Sq1 Sq
2 Sq3
Sp0 Sp
1 Sp2 Sp
3
m1 m2 m3
Recorded Global State = ((Sp2, Sq
1), (0,m3) )
Computation may not even have passed through the state recorded!
What have we recorded
The recorded consistent state can be anything!
Properties of the recorded global state
Si : global state when the algorithm starts
Sj : global state when the algorithm finishs
S*: state recorded by the algorithm
Then S* is reachable from Si
Sj is reachable from S*
S* Is reachable from Si
Si
Sj
Sj Is reachable from S*
Si
Sj
Still what good is it?
Stable PropertiesA property Y is called a stable property
iff for all states S` reachable from S Y(S) -> Y(S’)
Detection of Stable Properties
Outcome = false;
while ( outcome == false )
{
determine Global State S;
outcome = Y (S);
}
Checkpoint
S* serves as a checkpoint
On a failure, restart the computation from S*
Problem! Not able to restore to Sj
Si
Sj
S*
Solution: Publishing
A Broadcast medium
A central recorder process records all the messages received by each process
Processes record their states at their own time and send it to the recorder
Determining Global State
Recorder can construct global state from Checkpointed States of all processes
Plus Messages recorded since last checkpoint
Problems
Publishing keeps track of all messages received by each process
Expensive!Solution
recorder takes checkpoint of process p at time t
deletes all messages recd by p before t.
Comparison
SNAPSHOT PUBLISHING
NetworkStronglyconnected
Need not be
Mode Distributed Centralized
Scalability Yes No
Restorability No Yes
Conclusion
Global State detection difficult in Distributed Systems
Snapshot algorithm may not give an actual state but is very helpful in detecting Stable Properties
Publishing gives an asynchronous way of determining global states but is unscalable