AOS Lectures: Transcript
-
BITS Pilani
Pilani Campus
Advanced Operating Systems
(CS ZG623)
First Semester 2014-2015
PANKAJ VYAS
Department of CS/IS
Lecture 1
Sunday, Aug 03 - 2014
-
Introduction to Distributed Systems
Theoretical Foundations
Distributed Mutual Exclusion
Distributed Deadlock
Agreement Protocols
Distributed File Systems
Distributed Scheduling
Distributed Shared Memory
Recovery
Fault tolerance
Protection and Security
Course Overview
Mid-Semester
-
Text and References
Text Book:
Advanced Concepts in Operating Systems:
Mukesh Singhal & Niranjan Shivaratri, Tata McGraw-Hill
References:
1. Distributed Operating Systems: P. K. Sinha, PHI
2. Distributed Operating Systems: The Logical Design, A.
Goscinski, AW
3. Modern Operating Systems: A. S. Tanenbaum & VAN, PHI
4. Distributed Systems: Concepts and Design: G. Coulouris, AW
-
Evaluation Components

EC No. | Evaluation Component & Type of Examination | Duration | Weightage | Day, Date, Session, Time
EC-1   | Quiz                                       | **       | 15%       | ** Details to be announced on LMS Taxila
EC-2   | Mid-Semester Test (Closed Book)*           | 2 Hours  | 35%       |
EC-3   | Comprehensive Exam (Open Book)*            | 3 Hours  | 50%       |
-
Distributed Systems
A Distributed System is a collection of independent computers that appears to its users
as a single coherent system [Tanenbaum]
A Distributed System is
- a system having several computers that do
not share a memory or a clock
- Communication is via message passing
- Each computer has its own OS+Memory
[Shivaratri & Singhal]
-
Multiprocessor System Architecture
Types
Tightly Coupled Systems
Loosely Coupled Systems
-
Tightly Coupled Systems
Systems with a single system-wide memory; also called parallel processing systems or SMMPs (shared-memory multiprocessor systems)
[Figure: several CPUs connected to one shared memory through interconnection hardware]
-
Loosely Coupled System
Distributed memory systems (DMS); communication is via message passing
[Figure: several CPUs, each with its own local memory, connected by a communication network]
-
Motivation
Resource Sharing
Enhanced Performance
Improved Reliability & Availability
Modular expandability
-
Distributed System Architecture Types
Minicomputer Model
Workstation Model
Workstation Server Model
Processor Pool Model
Hybrid Model
-
MINICOMPUTER MODEL
[Figure: several minicomputers connected by a communication network, with terminals attached to each minicomputer]
-
WORKSTATION MODEL
[Figure: several workstations connected by a communication network]
-
WORKSTATION SERVER MODEL
[Figure: workstations on a communication network together with minicomputers used as file, database, and print servers]
-
Processor Pool Model
[Figure: terminals connected through a communication network to a pool of processors, plus file and run servers]
-
Hybrid Model
Based upon workstation-server model but with additional pool of processors
Processors in the pool can be allocated dynamically
Gives guaranteed response time to interactive jobs
More expensive to build
-
Distributed OS
A distributed OS is one that looks to its users like a centralized OS but runs on
multiple, independent CPUs. The key
concept is transparency. In other words,
the use of multiple processors should be
invisible to the user.
[Tanenbaum & Van Renesse]
-
Issues
Global knowledge
Naming
Scalability
Compatibility
Process Synchronization
Resource Management
Security
Structuring
-
Global Knowledge
No Global Memory
No Global Clock
Unpredictable Message Delays
-
Naming
Names refer to objects [files, computers, etc.]
A name service maps a logical name to a physical address
Techniques
- LookUp Tables [Directories]
- Algorithmic
- Combination of above two
-
Scalability
A distributed system should be able to grow with time
Scaling dimensions: size, geographic, and administrative
Techniques: hiding communication latencies, distribution, and caching
-
Scaling Techniques (1)
Hide communication latencies
[Figure 1.4: the difference between letting (a) a server or (b) a client check forms as they are being filled]
-
Scaling Techniques (2)
Distribution
[Figure 1.5: an example of dividing the DNS name space into zones]
-
Scaling Techniques (3)
Replication
Replicate components across the distributed system
Replication increases availability and balances load distribution
The consistency problem has to be handled
-
Compatibility
Interoperability among resources in system
Levels of compatibility: binary level, execution level, and protocol level
-
Process Synchronization
Difficult because of unavailability of shared memory
Mutual Exclusion Problem
-
Resource Management
Make both local and remote resources available
Specific Location of resource should be hidden from user
Techniques
- Data Migration [DFS, DSM]
- Computation Migration [RPC]
- Distributed Scheduling [ Load Balancing]
-
Security
Authentication: verifying that an entity is what it claims to be
Authorization: determining what privileges an entity has and making only those privileges available
-
Structuring
Techniques
- Monolithic Kernel
- Collective Kernel [microkernel based: Mach, V-Kernel, Chorus and Galaxy]
- Object-Oriented OS [services are implemented as objects: Eden, Choices, x-kernel, Medusa, Clouds, Amoeba & Muse]
- Client-Server Computing Model [processes are categorized as servers and clients]
-
Communication Primitives
-
Communication Primitives
High-level constructs [help the program use the underlying communication network]
Two Types of Communication Models
- Message passing
- Remote Procedure Calls
-
Message Passing
Two basic communication primitives
- SEND(a, b): a = message, b = destination
- RECEIVE(c, d): c = source, d = buffer for storing the message
Client-Server Computation Model
- Client sends Message to server and waits
- Server replies after computation
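A minimal sketch of these two primitives in Python, using threads and queues as stand-in channels (the function names send/receive and the doubling server are illustrative, not from the lecture):

import threading, queue

to_server = queue.Queue()            # channel: client -> server
to_client = queue.Queue()            # channel: server -> client

def send(dest, message):             # SEND(a, b): message a to destination b
    dest.put(message)

def receive(src):                    # RECEIVE(c, d): blocks until a message arrives
    return src.get()

def server():
    request = receive(to_server)     # server waits for the client's message
    send(to_client, request * 2)     # computes and replies

threading.Thread(target=server, daemon=True).start()
send(to_server, 21)                  # client sends its request ...
print(receive(to_client))            # ... and waits for the reply: prints 42

Here RECEIVE is blocking, which matches the client-server pattern above: the client is suspended until the server's reply arrives.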
-
Design Issues
Blocking vs non-blocking primitives
Nonblocking
- The SEND primitive returns control to the user process as soon as the message is copied from the user buffer to the kernel buffer
- Advantage: programs have maximum flexibility in performing computation and communication in any order
- Drawback: programming becomes tricky and difficult
Blocking
- The SEND primitive does not return control to the user process until the message has been sent or an acknowledgement has been received
- Advantage: program behavior is predictable
- Drawback: lack of flexibility in programming
-
Design Issues cont..
Synchronous vs Asynchronous Primitives
Synchronous
- SEND primitive is blocked until corresponding RECEIVE
primitive is executed at the target computer
Asynchronous
- Messages are buffered
- SEND primitive does not block even if there is no
corresponding execution of the RECEIVE primitive
- The corresponding RECEIVE primitive can be either
blocking or non-blocking
-
Details to be handled in Message Passing
Pairing of Response with Requests
Data Representation
Sender should know the address of Remote machine
Communication and System failures
-
Remote Procedure Call (RPC)
RPC is an interaction between a client and a server
The client invokes a procedure on the server
The server executes the procedure and passes the result back to the client
The calling process is suspended and proceeds only after getting the result from the server
-
RPC Design issues
Structure
Binding
Parameter and Result Passing
Error handling, semantics and Correctness
-
Structure: the RPC mechanism is based upon stub procedures.
[Figure: on the client machine, the user program makes a local procedure call to a stub procedure, which packs the parameters, transmits them, and waits; on the server machine, a stub procedure unpacks the parameters and makes a local procedure call to the remote procedure, which executes; the server stub packs the result and transmits it back; the client stub unpacks the result and returns from the local call]
-
Binding
Determines remote procedure and machine on which it will be executed
Check compatibility of the parameters passed
Use Binding Server
-
Binding
[Figure: the server registers its services with the binding server; on a call, the client stub queries the binding server, which returns the server's address; the call then proceeds through pack/transmit, unpack/execute, pack result, and unpack result as in the basic RPC structure (steps 1-8)]
-
Parameter and Result Passing
Stub procedures convert parameters & results to the appropriate form
Pack parameters into a buffer
The receiver stub unpacks the parameters
Conversion is expensive if done on every call
- Send parameters along with code that identifies the format, so that the receiver can do the conversion
- Alternatively, each data type may have a standard format; the sender converts data to the standard format, and the receiver converts from the standard format to its local representation
Passing parameters by reference
-
Error Handling, Semantics and Correctness
RPC may fail due to either computer or communication failure
If the remote server is slow, the program invoking the remote procedure may call it twice
If the client crashes after sending an RPC message
If the client recovers quickly after the crash and reissues the RPC
Orphan execution of remote procedures
RPC semantics:
- At least once
- Exactly once
- At most once
-
Correctness Condition
Given by Panzieri & Srivastava
Let Ci denote a call made by a machine and Wi the corresponding computation
If C2 happened after C1 (C1 -> C2) and the computations W1 & W2 share the same data, then to be correct in the presence of failures RPC should satisfy:
C1 -> C2 implies W1 -> W2
-
BITS Pilani
Pilani Campus
Advanced Operating Systems
(CS ZG623)
First Semester 2014-2015
PANKAJ VYAS
Department of CS/IS
Lecture 2
Sunday, Aug 09 - 2014
-
Communication Primitives
-
Communication Primitives
High-level constructs [help the program use the underlying communication network]
Two Types of Communication Models
- Message passing
- Remote Procedure Calls
-
Message Passing
Two basic communication primitives
- SEND(a, b): a = message, b = destination
- RECEIVE(c, d): c = source, d = buffer for storing the message
Client-Server Computation Model
- Client sends Message to server and waits
- Server replies after computation
-
Design Issues
Blocking vs non-blocking primitives
Nonblocking
- The SEND primitive returns control to the user process as soon as the message is copied from the user buffer to the kernel buffer
- Advantage: programs have maximum flexibility in performing computation and communication in any order
- Drawback: programming becomes tricky and difficult
Blocking
- The SEND primitive does not return control to the user process until the message has been sent or an acknowledgement has been received
- Advantage: program behavior is predictable
- Drawback: lack of flexibility in programming
-
Design Issues cont..
Synchronous vs Asynchronous Primitives
Synchronous
- SEND primitive is blocked until corresponding RECEIVE
primitive is executed at the target computer
Asynchronous
- Messages are buffered
- SEND primitive does not block even if there is no
corresponding execution of the RECEIVE primitive
- The corresponding RECEIVE primitive can be either
blocking or non-blocking
-
Details to be handled in Message Passing
Pairing of Response with Requests
Data Representation
Sender should know the address of Remote machine
Communication and System failures
-
Remote Procedure Call (RPC)
RPC is an interaction between a client and a server
The client invokes a procedure on the server
The server executes the procedure and passes the result back to the client
The calling process is suspended and proceeds only after getting the result from the server
-
RPC Design issues
Structure
Binding
Parameter and Result Passing
Error handling, semantics and Correctness
-
Structure: the RPC mechanism is based upon stub procedures.
[Figure: on the client machine, the user program makes a local procedure call to a stub procedure, which packs the parameters, transmits them, and waits; on the server machine, a stub procedure unpacks the parameters and makes a local procedure call to the remote procedure, which executes; the server stub packs the result and transmits it back; the client stub unpacks the result and returns from the local call (steps 1-8 mark this sequence)]
-
Binding
Determines remote procedure and machine on which it will be executed
Check compatibility of the parameters passed
Use Binding Server
-
Binding
[Figure: the server registers its services with the binding server; on a call, the client stub queries the binding server, which returns the server's address; the call then proceeds through pack/transmit, unpack/execute, pack result, and unpack result as in the basic RPC structure (steps 1-8)]
-
Parameter and Result Passing
Stub procedures convert parameters & results to the appropriate form
Pack parameters into a buffer
The receiver stub unpacks the parameters
Conversion is expensive if done on every call
- Send parameters along with code that identifies the format, so that the receiver can do the conversion
- Alternatively, each data type may have a standard format; the sender converts data to the standard format, and the receiver converts from the standard format to its local representation
Passing parameters by reference
-
Error Handling, Semantics and Correctness
RPC may fail due to either computer or communication failure
If the remote server is slow, the program invoking the remote procedure may call it twice
If the client crashes after sending an RPC message
If the client recovers quickly after the crash and reissues the RPC
Orphan execution of remote procedures
RPC semantics:
- At least once
- Exactly once
- At most once
-
Correctness Condition
Given by Panzieri & Srivastava
Let Ci denote a call made by a machine and Wi the corresponding computation
If C2 happened after C1 (C1 -> C2) and the computations W1 & W2 share the same data, then to be correct in the presence of failures RPC should satisfy:
C1 -> C2 implies W1 -> W2
-
Theoretical Foundations
-
Theoretical Aspects
Limitations of Distributed Systems
Logical Clocks
Vector Clocks
-
Limitations of Distributed Systems
Absence of Global clock
Absence of Shared Memory
-
Example
[Figure: accounts A at S1 and B at S2 connected by a communication channel, recorded in three snapshots: (a) $500 and $200; (b) $450 and $200; (c) $500 and $250]
-
Lamport's Logical Clock
Happened-before relation:
a -> b : event a occurred before event b (events in the same process).
a -> b : if a is the event of sending a message m in a process and b is the event of receipt of the same message m by another process.
a -> b, b -> c, then a -> c; -> is transitive.
Causally ordered events:
a -> b : event a causally affects event b.
Concurrent events:
a || b : if a !-> b and b !-> a.
-
Space-time Diagram
[Figure: processes P1 and P2 plotted in space-time, showing internal events e11..e14 and e21..e24 and the messages between them]
-
Logical Clocks
Conditions satisfied:
Ci is the clock in process Pi.
If a -> b in process Pi, then Ci(a) < Ci(b).
Let a: sending message m in Pi; b: receiving message m in Pj; then Ci(a) < Cj(b).
Implementation rules:
R1: Ci = Ci + d (d > 0); the clock is updated between two successive events.
R2: Cj = max(Cj, tm + d) (d > 0); when Pj receives a message m with time stamp tm (tm assigned by Pi, the sender; tm = Ci(a), a being the event of sending message m).
A reasonable value for d is 1.
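A minimal sketch of rules R1 and R2 with d = 1 (class and method names are illustrative, not from the slides):

class LamportClock:
    def __init__(self):
        self.c = 0
    def tick(self):                      # R1: advance before a local/send event
        self.c += 1
        return self.c
    def recv(self, tm):                  # R2: Cj = max(Cj, tm + d) on receipt
        self.c = max(self.c, tm + 1)
        return self.c

p1, p2 = LamportClock(), LamportClock()
tm = p1.tick()                           # P1 sends m with time stamp tm = 1
print(p2.recv(tm))                       # P2 receives m: max(0, 1 + 1) = 2

This guarantees Ci(a) < Cj(b) for a send event a and the matching receive event b.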
-
Space-time Diagram
[Figure: P1 and P2 both start with clock 0; P1's events e11..e17 carry Lamport times (1)..(7); P2's events e21..e25 carry times (1), (2), (3), (4), (7)]
-
Space-time Diagram
[Figure: a space-time diagram with events a..e on P1 and f..i on P2, for working out Lamport clock values]
-
Limitation of Lamport's Clock
[Figure: three processes P1, P2, P3 with events e11..e13, e21..e23, e31..e33 and their Lamport clock values]
C(e11) < C(e32), but the two events are not causally related.
This inter-dependency is not reflected in Lamport's clock.
-
Vector Clocks
Keep track of transitive dependencies among processes for recovery purposes.
Ci[1..n]: a vector clock at process Pi whose entries are the assumed/best-guess clock values of the different processes.
Ci[j] (j != i) is Pi's best guess of Pj's clock.
Vector clock rules:
Ci[i] = Ci[i] + d (d > 0), for successive events in Pi.
For all k, Cj[k] = max(Cj[k], tm[k]), when a message m with time stamp tm is received by Pj from Pi.
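A minimal sketch of the two rules with d = 1 for a 3-process system (helper names are illustrative, not from the slides):

N = 3

def local_event(C, i):                 # rule 1: Ci[i] = Ci[i] + d
    C[i] += 1

def on_receive(C, j, tm):              # rule 2: merge, then tick own entry
    for k in range(N):
        C[k] = max(C[k], tm[k])
    C[j] += 1

C1, C2 = [0] * N, [0] * N
local_event(C1, 0)                     # e11 at P1 -> C1 = [1, 0, 0]
tm = list(C1)                          # P1 sends m time stamped [1, 0, 0]
on_receive(C2, 1, tm)                  # P2 receives m -> C2 = [1, 1, 0]
print(C1, C2)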
-
Vector Clocks Comparison
1. ta = tb iff for all i, ta[i] = tb[i]
2. ta != tb iff there exists an i, ta[i] != tb[i]
3. ta <= tb iff for all i, ta[i] <= tb[i]; ta < tb iff ta <= tb and ta != tb
-
Vector Clock
[Figure: space-time diagram of P1, P2, P3 with vector times; P1: e11 = (1,0,0), e12 = (2,0,0), e13 = (3,4,1); P2: e21 = (0,1,0), e22 = (2,2,0), e23 = (2,3,1), e24 = (2,4,1); P3: e31 = (0,0,1), e32 = (0,0,2)]
-
Next Lecture
Causal Order of Messages
-
BITS Pilani
Pilani Campus
Advanced Operating Systems
(CS ZG623)
First Semester 2014-2015
PANKAJ VYAS
Department of CS/IS
Lecture 3
Saturday, Aug 16 - 2014
-
Causal Order of Messages
- BSS Algorithm
- SES Algorithm
Global State
Cut
Today's Lecture
-
Causal ordering of Messages
[Figure: space-time diagram of P1, P2, P3 with Send(M1), Send(M2), Send(M3), illustrating why messages must be delivered in causal order]
-
Birman-Schiper-Stephenson (BSS) Protocol
Before broadcasting m, process Pi increments the vector time VT_Pi[i] and timestamps m.
Process Pj (j != i) receives m with timestamp VT_m from Pi and delays delivery until both:
VT_Pj[i] = VT_m[i] - 1   // Pj has received all of Pi's previous broadcasts
VT_Pj[k] >= VT_m[k] for every k in {1, 2, ..., n} - {i}   // Pj has received all messages also received by Pi before it sent m
When Pj delivers m, VT_Pj is updated by rule IR2 of the vector clock.
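A minimal sketch of the BSS delivery test (the function name and example vectors are illustrative):

def bss_deliverable(vt_pj, vt_m, i):
    # deliver m from Pi at Pj only if Pj saw all of Pi's earlier broadcasts ...
    if vt_pj[i] != vt_m[i] - 1:
        return False
    # ... and everything Pi had already received before sending m
    return all(vt_pj[k] >= vt_m[k] for k in range(len(vt_m)) if k != i)

print(bss_deliverable([0, 0, 1], [0, 1, 1], 1))   # True : deliver
print(bss_deliverable([0, 0, 0], [0, 1, 1], 1))   # False: buffer until (0,0,1) arrives

The second call is exactly the buffering case in the example on the next slide.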
-
Example of BSS
[Figure: P3 broadcasts a message time stamped (0,0,1); P2 then broadcasts one time stamped (0,1,1); P1 receives (0,1,1) first and buffers it, delivering it from the buffer only after (0,0,1) arrives]
-
Schiper-Eggli-Sandoz (SES) protocol
No need for broadcast messages.
Each process maintains a vector V_P of size N - 1, N being the number of processes in the system.
V_P is a vector of tuples (P, t): P is the destination process id and t a vector timestamp.
Tm: logical time of sending message m
Tpi: present logical time at pi
Initially, V_P is empty.
-
SES Continued
Sending a message:
Send message M, time stamped tm, along with V_P1 to P2.
Insert (P2, tm) into V_P1. Overwrite the previous value of (P2, t), if any.
(P2, tm) is not sent. Any future message carrying (P2, tm) in V_P1 cannot be delivered to P2 until tm < Tp2.
Delivering a message:
If V_M (in the message) does not contain any pair (P2, t), it can be delivered.
/* (P2, t) exists in V_M */ If t !< Tp2, buffer the message (don't deliver);
else (t < Tp2) deliver it.
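A minimal sketch of the SES delivery test at a process P with current vector time t_p; the dict-based V_M and the component-wise comparison are illustrative choices, not from the slides:

def leq(a, b):                           # component-wise a <= b
    return all(x <= y for x, y in zip(a, b))

def ses_deliverable(v_m, p, t_p):
    if p not in v_m:                     # no pair (P, t) in V_M: deliver
        return True
    return leq(v_m[p], t_p)              # deliver if t <= t_p, else buffer

# M3 of Figure 1 reaches P1 (clock (0,0,0)) carrying (P1, (0,1,0)): buffered.
print(ses_deliverable({"P1": (0, 1, 0)}, "P1", (0, 0, 0)))   # False
# Once M1 raises P1's clock to (1,1,0), the buffered M3 becomes deliverable.
print(ses_deliverable({"P1": (0, 1, 0)}, "P1", (1, 1, 0)))   # True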
-
SES Algorithm Example (Figure 1)
[Figure 1: processes P1, P2, P3, each starting at vector time (0,0,0) with an empty vector; P2 sends M1 to P1 (delivered at point y) and M2 to P3 (delivered at point a); P3 sends M3 to P1 at point b, and M3 arrives at P1 at point x]
Initially the vector time is (0,0,0) at all sites.
The vector at each site is empty. Note that we have taken the size of the vector at each site as 2; it can easily be extended to 3.
-
Vectors Maintained at each site
VP1 ( __ , __ )
This entry gets updated when P1 sends any Message to P2
This entry gets updated when P1 sends any Message to P3
VP2 ( __ , __ )
This entry gets updated when P2 sends any Message to P1
This entry gets updated when P2 sends any Message to P3
VP3 ( __ , __ )
This entry gets updated when P3 sends any Message to P1
This entry gets updated when P3 sends any Message to P2
-
Consider sending of message M1 from P2 to P1 in Figure 1
Send step at P2:
1. P2 sends message M1 to P1 [the vector accompanying the message is represented as VM ( __, __ ), i.e. empty]. So P2 sends the message {( __, __ ), (0,1,0)} to P1.
2. After sending the message, P2 updates its own vector VP2. P1's corresponding entry in VP2 (i.e. the vector of process P2) is empty, so the pair {P1, (0,1,0)} is inserted into VP2. So before sending message M1, VP2 is ( __, __ ), but after sending message M1, P2's vector VP2 is updated to {(P1, (0,1,0)), __ }.
Important point:
Message M1 is sent with the earlier VP2 (as VM), i.e. the VP2 existing at P2 before sending message M1. VP2 gets updated with the new entry after sending message M1.
-
Receiving of M1 by P1 at point y of Figure 1
Receiving of M1 by P1 [note what P1 has received from P2: a message of the form {( __, __ ), (0,1,0)}, i.e. (VM, t)].
Check:
1. If the accompanying VM contains no entry corresponding to P1, receive the message. [This condition is satisfied here.]
else (only if the condition of step 1 is not satisfied):
2. Compare time t (the message time stamp) with TP1 (the current time running at P1). The time running at P1 before receiving the message is (0,0,0). If t < TP1, receive the message;
else (if this step is false):
3. Buffer the message [t >= TP1].
For Figure 1, step 1 alone is fulfilled, which is why M1 is received at point y at P1. [Assume M3 has not been received at x.]
-
What happens at P1 when the message is delivered at point y
STEP A: Merge VP1 & VM; update VP1 with the values of VM.
1. If for some process P (not equal to P1) there exists an entry in VM but no entry in VP1, then that entry has to be inserted into VP1.
2. If for some process P (not equal to P1) there exists an entry in both VM and VP1, then update the value for that process in VP1 by setting the time = max(a, b), where a is the time in VM and b is the time in VP1.
In our case, after receiving message M1, both VP1 & VM are ( __, __ ), so their merge is the same, i.e. ( __, __ ).
So VP1 after the merge is ( __, __ ), i.e. empty.
STEP B: Update P1's logical clock, so the time at P1 becomes (1,1,0).
STEP C: Check for buffered messages at P1. There are no buffered messages at P1.
-
Complete state of the system after P1 receives M1:
Process | Vector                                              | Time
P1      | VP1 ( __, __ ) [empty]                              | (1,1,0) at y
P2      | VP2 {(P1,(0,1,0)), __ } (updated after sending M1)  | (0,1,0)
P3      | VP3 ( __, __ ) [empty]                              | (0,0,0)
-
Now consider the case when M2 is sent from P2 to P3
Send step of M2 from P2 to P3:
1. P2 sends message M2 to P3, with VM as {(P1,(0,1,0)), __ } and time stamped (0,2,0). So the sent message looks like {((P1,(0,1,0)), __ ), (0,2,0)}.
2. After sending the message, P2 updates its own vector VP2. P2's vector entry corresponding to P3 is empty, so VP2 becomes {(P1,(0,1,0)), (P3,(0,2,0))}. Note that this is VP2 after sending message M2. The state of the system now looks as follows:
Process | Vector                                              | Time
P1      | VP1 ( __, __ ) [empty]                              | (1,1,0) at y
P2      | VP2 {(P1,(0,1,0)), (P3,(0,2,0))} (after sending M2) | (0,2,0)
P3      | VP3 ( __, __ ) [empty; P3 has not received M2 yet]  | (0,0,0)
-
Receiving of M2 by P3 at point a of Figure 1
Receiving of M2 by P3 [note what P3 has received from P2: a message of the form [{(P1,(0,1,0)), __ }, (0,2,0)], i.e. [VM, t]].
Check:
1. If the accompanying VM contains no entry corresponding to P3, receive the message. [This condition is satisfied here: VM's entry for P3 is empty.]
else (only if the condition of step 1 is not satisfied):
2. Compare the time t in the message with the time at P3; if t < TP3, receive the message, else buffer it.
-
What happens at P3 when message M2 is delivered at point a
STEP A: Merge VP3 & VM; update VP3 with the values of VM.
1. If for some process P (not equal to P3) there exists an entry in VM but no entry in VP3, then that entry has to be inserted into VP3.
2. If for some process P (not equal to P3) there exists an entry in both VM and VP3, then update the value for that process in VP3 by setting the time = max(a, b), where a is the time in VM and b is the time in VP3.
In this case:
VM is [{P1,(0,1,0)}, __ ] and VP3 is ( __, __ )
[Note VM contains an entry for P1: the step 1 condition.]
So VP3 after the merge is [{P1,(0,1,0)}, __ ].
STEP B: Update P3's logical clock, so the time at P3 becomes (0,2,1).
STEP C: Check for buffered messages at P3. There are no buffered messages at P3.
-
Complete state of the system after P3 receives M2 at point a:
Process | Vector                                                | Time
P1      | VP1 ( __, __ ) [empty]                                | (1,1,0) at y
P2      | VP2 [{P1,(0,1,0)}, {P3,(0,2,0)}] (updated after sending M2) | (0,2,0)
P3      | VP3 ({P1,(0,1,0)}, __ )                               | (0,2,1) at a
-
Now consider the case when M3 is sent from P3 to P1 at point b of Figure 1
Send step of M3 from P3 to P1:
1. P3 sends message M3 to P1, with VM as {(P1,(0,1,0)), __ } and time stamped (0,2,2). So the sent message looks like {((P1,(0,1,0)), __ ), (0,2,2)}.
2. After sending message M3, P3 updates its own vector VP3. VP3 already contains an entry for P1, so that entry simply gets updated with the latest time value. The complete state of the system after P3 sends M3 is shown below:
Process | Vector                                              | Time
P1      | VP1 ( __, __ ) [P1 has not received M3 so far]      | (1,1,0) at y
P2      | VP2 {(P1,(0,1,0)), (P3,(0,2,0))} (after sending M2) | (0,2,0)
P3      | VP3 ((P1,(0,2,2)), __ ) [P3 has sent M3]            | (0,2,2) at b
-
Receiving of M3 by P1 at point x of Figure 1
Receiving of M3 by P1 at point x [note what P1 has received from P3: a message of the form [{(P1,(0,1,0)), __ }, (0,2,2)], i.e. [VM, t]].
Check:
1. If the accompanying VM contains no entry corresponding to P1, receive the message. [This condition is not true for this case, because the accompanying VM contains an entry corresponding to P1.]
else (only if the condition of step 1 is not satisfied):
2. If P1 receives message M3 at point x, the current time at that point is (0,0,0), which is TP1. The message came time stamped with t = (0,2,2). Clearly t !< TP1, so the message is buffered.
-
Global State: The Model
Node properties:
No shared memory
No global clock
Channel properties:
FIFO
loss free
non-duplicating
-
The Need for Global State
Many problems in Distributed Computing can be
cast as executing some action on reaching a
particular state
e.g. -Distributed deadlock detection is finding a cycle in the
Wait For Graph.
-Termination detection
-Checkpointing
And many more..
-
Difficulties due to Non-Determinism
Deterministic Computation
- At any point in computation there is at
most one event that can happen next.
Non-Deterministic Computation
- At any point in computation there can be
more than one event that can happen next.
-
Global State: Example
[Figure: a space-time diagram with three cuts marked Global State 1, Global State 2, and Global State 3]
-
Recording Global State
Suppose the global state of A is recorded in (1) and not in (2); the states of B, C1, and C2 are recorded in (2).
An extra amount of $50 will appear in the global state.
Reason: A's state was recorded before sending the message, and C1's state after sending the message.
Inconsistent global state if n < n', where
n is the number of messages sent by A along the channel before A's state was recorded,
n' is the number of messages sent by A along the channel before the channel's state was recorded.
Consistent global state: n = n'
-
Continued
Similarly, for consistency, m = m', where
m: the number of messages received along the channel before B's state was recorded,
m': the number of messages received along the channel before the channel's state was recorded.
Also, n >= m, as the number of messages sent along the channel cannot be less than the number received.
Hence, n' >= m'.
A consistent global state should satisfy the above equation.
Consistent global state:
Channel state: the sequence of messages sent before recording the sender's state, excluding the messages received before the receiver's state was recorded.
Only in-transit messages are recorded in the channel state.
-
Notion of Consistency: Example
[Figure: processes p and q with local states Sp0..Sp3 and Sq0..Sq3, exchanging messages m1, m2, m3]
-
A Consistent State?
[Figure: the same computation, recording p's state at Sp1 and q's state at Sq1]
-
Yes
[Figure: recording at Sp1 and Sq1 yields a consistent global state]
-
What about this?
[Figure: recording at Sp2 and Sq3]
Yes
-
What about this?
[Figure: recording at Sp1 and Sq3]
No
-
Chandy-Lamport GSR Algorithm
Sender (process p)
Record the state of p.
For each outgoing channel c incident on p, send a marker before sending any other messages.
Receiver (q receives a marker on c1)
If q has not yet recorded its state:
Record the state of q.
Record the state of c1 as null.
For each outgoing channel c incident on q, send a marker before sending any other message.
If q has already recorded its state:
Record the state of c1 as all the messages received on c1 since the time the state of q was recorded.
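A minimal sketch of the receiver-side marker handling at one process (class and method names are illustrative; message transport is left abstract):

class Snapshot:
    def __init__(self, in_channels):
        self.state = None                    # recorded local state
        self.chan = {}                       # channel -> recorded messages
        self.open = set()                    # channels still being recorded
        self.in_channels = in_channels

    def on_marker(self, c, local_state, send_markers):
        if self.state is None:               # first marker seen
            self.state = local_state         # record own state
            self.chan = {ch: [] for ch in self.in_channels}
            self.open = set(self.in_channels) - {c}   # c's state = null
            send_markers()                   # marker on every outgoing channel
        else:
            self.open.discard(c)             # stop recording channel c

    def on_message(self, c, msg):
        if self.state is not None and c in self.open:
            self.chan[c].append(msg)         # an in-transit message on c

The initiator uses the same code path: it records its state and calls send_markers() before sending anything else.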
-
Uses of GSR
Recording a consistent state of the global computation
Checkpointing for fault tolerance (rollback, recovery)
Testing and debugging
Monitoring and auditing
Detecting stable properties in a distributed system via snapshots. A property is stable if, once it holds in a state, it holds in all subsequent states.
Examples: termination, deadlock, garbage collection
-
State Recording Example
Let p transfer 100 to q, q transfer 50 to p and 30 to r, and let p want to record the global state.
[Figure, developed over three slides: the snapshot of p, q, and r, with the amounts 400, 520, and 530 recorded as process states and 50 and 30 recorded as in-transit channel messages]
-
Cut
A cut is a set of cut events, one per node, each of which
captures the state of the node on which it occurs. It is
also a graphical representation of a global state.
-
Consistent Cut
A cut C = {c1, c2, c3, ...} is consistent if there are no events ei and ej such that:
(ei --> ej) and (ej --> cj) and (ei -/-> ci), where ci, cj are in C
[Figure: an inconsistent cut, in which an event causally preceding c2 has a cause that lies after its own cut event]
-
Ordering of Cut events
The cut events in a consistent cut are not causally related. Thus, the cut is a set of concurrent events and a set of concurrent events is a cut.
Note, in this inconsistent cut, c3 --> c2.
-
Termination detection
Question:
In a distributed computation, when have all of the processes become idle (i.e., when has the computation terminated)?
-
Huang's Algorithm
The computation starts when the controlling agent sends the first message and terminates when all processes are idle.
The role of weights:
the controlling agent initially has a weight of 1 and all others have a weight of zero,
when a process sends a message, a portion of the sender's weight is put in the message, reducing the sender's weight,
a receiver adds the weight of a received message to its own weight,
on becoming idle, a process sends its weight to the controlling agent,
the sum of all weights is always 1.
The computation terminates when the weight of the controlling agent returns to 1 after the first message.
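A minimal sketch of the weight bookkeeping, using exact fractions so the weights always sum to 1 (names are illustrative):

from fractions import Fraction

agent = Fraction(1)                  # controlling agent starts with weight 1

def split(weight):                   # put part of the sender's weight in the message
    half = weight / 2
    return weight - half, half       # (weight kept, weight sent)

agent, w_msg = split(agent)          # agent's first message activates a process
w_proc = Fraction(0) + w_msg         # receiver adds the message's weight
agent += w_proc                      # process goes idle: weight back to agent
print(agent == 1)                    # True -> the computation has terminated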
-
-
BITS Pilani
Pilani Campus
Advanced Operating Systems
(CS ZG623)
First Semester 2014-2015
PANKAJ VYAS
Department of CS/IS
Lecture 4
Saturday, Aug 23-2014
-
Distributed Mutual Exclusion
- Non-Token-Based Algorithms (will be covered today)
1. Lamport's Algorithm
2. Ricart-Agrawala Algorithm
3. Maekawa's Algorithm
- Token-Based Algorithms (next class)
1. Suzuki-Kasami Algorithm
2. Singhal's Heuristic Algorithm
3. Raymond's Tree-Based Algorithm
Today's Lecture
-
Mutual Exclusion
Access to a shared resource by several uncoordinated user requests must be serialized
Actions performed on a shared resource on a user's behalf must be atomic
Examples: directory management
-
Classification of Mutual Exclusion
Algorithms
Non-token based:
A site/process can enter a critical section when an
assertion (condition) becomes true.
Algorithm should ensure that the assertion will be true
in only one site/process.
Token based:
A unique token (a known, PRIVILEGE message) is shared
among cooperating sites/processes.
Possessor of the token has access to critical section.
Need to take care of conditions such as loss of token,
crash of token holder, possibility of multiple tokens, etc.
-
General System Model
At any instant, a site may have several requests for critical section (CS), queued up, and serviced one at
a time.
Site States: Requesting CS, executing CS, idle (neither requesting nor executing CS).
Requesting CS: blocked until granted access, cannot make additional requests for CS.
Executing CS: using the CS.
Idle: action is outside the site. In token-based approaches, idle site can have the token.
-
Mutual Exclusion: Requirements
Freedom from deadlocks: two or more sites should not endlessly wait on conditions/messages that never
become true/arrive.
Freedom from starvation: No indefinite waiting.
Fairness: Order of execution of CS follows the order of the requests for CS. (equal priority).
Fault tolerance: recognize faults, reorganize, continue. (e.g., loss of token).
-
Performance
Number of messages per CS invocation: should be minimized.
Synchronization delay, i.e., the time between one site leaving the CS and the next site entering it: should be minimized.
Response time: the time interval between a request's message transmissions and exit from the CS.
System throughput, i.e., the rate at which the system executes requests for the CS: should be maximized.
If sd is the synchronization delay and E the average CS execution time: system throughput = 1 / (sd + E).
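For example (illustrative numbers, not from the slides): with sd = 2 ms and E = 8 ms, the throughput is 1 / (0.002 s + 0.008 s) = 100 CS executions per second.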
-
Performance Metrics
[Figure: two timelines. Synchronization delay: the interval from the last site exiting the CS to the next site entering the CS. Response time: the interval from a CS request arriving and its messages being sent, through entering the CS, to exiting the CS after execution time E]
-
Performance ...
Low and High Load:
Low load: No more than one request at a given point in time.
High load: Always a pending mutual exclusion request at a site.
Best and Worst Case:
Best Case (low loads): Round-trip message delay + Execution time. 2T + E.
Worst case (high loads).
Message traffic: low at low loads, high at high loads.
Average performance: when load conditions fluctuate widely.
-
Simple Solution
Control site: grants permission for CS execution.
A site sends a REQUEST message to the control site.
The controller grants access one by one.
Synchronization delay: 2T -> a site releases the CS by sending a message to the controller, and the controller sends permission to another site.
System throughput: 1/(2T + E). If the synchronization delay were reduced to T, throughput would double.
The controller becomes a bottleneck; congestion can occur.
-
Non-token Based Algorithms
Notations:
Si: site i
Ri: request set, containing the ids of all sites from which Si must receive permission before accessing the CS.
Non-token-based approaches use time stamps to order requests for the CS.
Smaller time stamps get priority over larger ones.
Lamport's Algorithm
Ri = {S1, S2, ..., Sn}, i.e., all sites.
Request queue: maintained at each Si, ordered by time stamps.
Assumption: messages are delivered in FIFO order.
-
Lamport's Algorithm
Requesting CS: send REQUEST(tsi, i). (tsi, i): request time stamp. Place the REQUEST in request_queue_i.
On receiving the message, Sj sends a time-stamped REPLY message to Si; Si's request is placed in request_queue_j.
Executing CS: Si has received a message with a time stamp larger than (tsi, i) from all other sites, and
Si's request is the topmost one in request_queue_i.
Releasing CS: on exiting the CS, send a time-stamped RELEASE message to all sites in the request set.
On receiving a RELEASE message, Sj removes Si's request from its queue.
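A minimal sketch of the CS-entry test at a site (function and variable names are illustrative; requests are (time stamp, site id) pairs compared lexicographically):

def can_enter_cs(my_req, request_queue, last_ts, others):
    if min(request_queue) != my_req:            # own request must top the queue
        return False
    return all(last_ts[j] > my_req[0]           # a later-stamped message has
               for j in others)                 # arrived from every other site

queue_i = [(1, 2), (2, 1)]                      # requests at S2: (1,2) and (2,1)
print(can_enter_cs((1, 2), queue_i, {1: 2, 3: 2}, [1, 3]))   # True: S2 enters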
-
Lamport's Algorithm
Performance:
3(N-1) messages per CS invocation: (N-1) REQUEST, (N-1) REPLY, (N-1) RELEASE messages.
Synchronization delay: T
Optimization:
Suppress REPLY messages, e.g., if Sj receives a REQUEST message from Si after sending its own REQUEST message with a time stamp higher than Si's, Sj need not send a REPLY message.
Messages reduced to between 2(N-1) and 3(N-1).
-
Lamport's Algorithm: Example
Step 1: [Figure: S1 broadcasts REQUEST (2,1) and S2 broadcasts REQUEST (1,2) to the other sites]
Step 2: [Figure: each site's request queue holds (1,2), (2,1); S2 enters the CS, since its request (1,2) is at the top of every queue]
-
Lamport's Algorithm: Example (cont.)
Step 3: [Figure: S2 leaves the CS and sends RELEASE messages; the queues still hold (1,2), (2,1)]
Step 4: [Figure: the sites remove S2's request; the queues hold (2,1) and S1 enters the CS]
-
Ricart-Agrawala Algorithm
Requesting critical section: Si sends a time-stamped REQUEST message.
Sj sends a REPLY to Si if:
Sj is neither requesting nor executing the CS, or
Sj is requesting the CS and Si's time stamp is smaller than that of its own request.
The request is deferred otherwise.
Executing CS: after Si has received a REPLY from all sites in its request set.
Releasing CS: send a REPLY to all deferred requests, i.e., a site's REPLY messages are blocked only by sites with smaller time stamps.
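A minimal sketch of Sj's REPLY-or-defer decision (state strings and time-stamp pairs are illustrative):

def on_request(req_ts, my_state, my_ts):
    if my_state == "idle":                        # not requesting, not executing
        return "REPLY"
    if my_state == "requesting" and req_ts < my_ts:
        return "REPLY"                            # requester has the smaller stamp
    return "DEFER"                                # answered on releasing the CS

print(on_request((1, 2), "requesting", (2, 1)))   # REPLY: (1,2) beats (2,1)
print(on_request((2, 1), "requesting", (1, 2)))   # DEFER
print(on_request((2, 1), "executing", (1, 2)))    # DEFER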
-
Ricart-Agrawala: Performance
Performance: 2(N-1) messages per CS execution: (N-1) REQUEST + (N-1) REPLY.
Synchronization delay: T.
Optimization [proposed by Roucairol and Carvalho]: when Si receives a REPLY message from Sj, it is authorized to access the CS until Sj sends a REQUEST message and Si sends a REPLY message.
Si can access the CS repeatedly until then.
A site requests permission from a dynamically varying set of sites: 0 to 2(N-1) messages.
-
Ricart-Agrawala: Example
Step 1: [Figure: S1 broadcasts REQUEST (2,1) and S2 broadcasts REQUEST (1,2)]
Step 2: [Figure: S1 and S3 send REPLY messages to S2; S2 enters the CS and defers its REPLY to S1's request (2,1)]
-
Ricart-Agrawala: Example (cont.)
Step 3: [Figure: S2 leaves the CS and sends the deferred REPLY to S1; S1 enters the CS]
-
Maekawa's Algorithm
A site requests permission only from a subset of sites.
Request sets of sites Si & Sj: Ri, Rj, such that Ri and Rj have at least one common site (Sk). Sk mediates conflicts between Ri and Rj.
A site can send only one REPLY message at a time, i.e., a site can send a REPLY message only after receiving a RELEASE message for the previous REPLY message.
Request set rules:
Sets Ri and Rj have at least one common site.
Si is always in Ri.
The cardinality of Ri, i.e., the number of sites in Ri, is K.
Any site Si is contained in K of the Ri's. N = K(K - 1) + 1 -> K is approximately the square root of N.
-
Maekawa's Algorithm ...
Requesting CS: Si sends REQUEST(i) to the sites in Ri.
Sj sends a REPLY to Si if
Sj has NOT sent a REPLY message to any site since it received the last RELEASE message.
Otherwise, Sj queues up Si's request.
Executing CS: after getting a REPLY from all sites in Ri.
Releasing CS: send RELEASE(i) to all sites in Ri.
Any Sj, after receiving a RELEASE message, sends a REPLY message to the next request in its queue.
If the queue is empty, Sj updates its status to indicate receipt of the RELEASE.
-
Request Subsets
Example k = 2; (N = 3).
R1 = {1, 2}; R3 = {1, 3}; R2 = {2, 3}
Example k = 3; N = 7.
R1 = {1, 2, 3}; R4 = {1, 4, 5}; R6 = {1, 6, 7};
R2 = {2, 4, 6}; R5 = {2, 5, 7}; R7 = {3, 4, 7};
R3 = {3, 5, 6}
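A quick check, as a sketch, that the K = 3, N = 7 sets above satisfy the request-set rules (the dict R and the asserts are illustrative, not from the slides):

R = {1: {1, 2, 3}, 2: {2, 4, 6}, 3: {3, 5, 6}, 4: {1, 4, 5},
     5: {2, 5, 7}, 6: {1, 6, 7}, 7: {3, 4, 7}}

assert all(i in R[i] for i in R)                        # Si is always in Ri
assert all(len(R[i]) == 3 for i in R)                   # |Ri| = K = 3
assert all(R[i] & R[j] for i in R for j in R if i < j)  # pairwise overlap
assert all(sum(i in R[j] for j in R) == 3 for i in R)   # each site is in K sets
print("all Maekawa request-set rules hold")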
-
Maekawa's Algorithm ...
Performance: synchronization delay: 2T
Messages: 3 * sqrt(N) (one each of REQUEST, REPLY, RELEASE messages)
Deadlocks: message deliveries are not ordered.
Assume Si, Sj, Sk concurrently request the CS, with
Ri ∩ Rj = {Sij}, Rj ∩ Rk = {Sjk}, Rk ∩ Ri = {Ski}
It is possible that:
Sij is locked by Si (forcing Sj to wait at Sij)
Sjk by Sj (forcing Sk to wait at Sjk)
Ski by Sk (forcing Si to wait at Ski)
-> a deadlock among Si, Sj, and Sk.
-
Handling Deadlocks in Maekawa's Algorithm
A site yields to a request that has a smaller time stamp.
A site suspects a deadlock when it is locked by a request with a higher time stamp (lower priority).
Deadlock handling messages:
FAILED: from Si to Sj -> Si has granted permission to a higher priority request.
INQUIRE: from Si to Sj -> Si would like to know whether Sj has succeeded in locking all the sites in Sj's request set.
YIELD: from Si to Sj -> Si is returning its permission to Sj so that Sj can yield to a higher priority request.
-
Handling Deadlocks
REQUEST(tsi, i) arrives at Sj while Sj is locked by Sk -> Sj sends FAILED to Si if Si's request has a higher time stamp; otherwise, Sj sends INQUIRE(j) to Sk.
INQUIRE(j) arrives at Sk -> Sk sends YIELD(k) to Sj if Sk has received a FAILED message from a site in Sk's request set, or if Sk sent a YIELD and has not received a new REPLY.
YIELD(k) arrives at Sj -> Sj assumes it has been released by Sk, places Sk's request in its queue appropriately, and sends a REPLY(j) to the top request in its queue.
Sites may exchange these messages even if there is no real deadlock. Maximum number of messages per CS request: 5 * sqrt(N).
-
Next Class
Token-Based Algorithms For Distributed
Mutual Exclusion
-
BITS Pilani
Pilani Campus
Advanced Operating Systems
(CS ZG623)
First Semester 2014-2015
PANKAJ VYAS
Department of CS/IS
Lecture 5
Sunday, Aug 24-2014
-
Distributed Mutual Exclusion
- Non-Token-Based Algorithms
1. Lamport's Algorithm (covered in previous class)
2. Ricart-Agrawala Algorithm (covered in previous class)
3. Maekawa's Algorithm (will be covered today)
- Token-Based Algorithms (will be covered today)
1. Suzuki-Kasami Algorithm
2. Singhal's Heuristic Algorithm
3. Raymond's Tree-Based Algorithm
Today's Lecture
-
Maekawa's Algorithm
A site requests permission only from a subset of sites.
Request sets of sites Si & Sj: Ri, Rj, such that Ri and Rj have at least one common site (Sk). Sk mediates conflicts between Ri and Rj.
A site can send only one REPLY message at a time, i.e., a site can send a REPLY message only after receiving a RELEASE message for the previous REPLY message.
Request set rules:
Sets Ri and Rj have at least one common site.
Si is always in Ri.
The cardinality of Ri, i.e., the number of sites in Ri, is K.
Any site Si is contained in K of the Ri's. N = K(K - 1) + 1 -> K is approximately the square root of N.
-
Maekawa's Algorithm ...
Requesting CS: Si sends REQUEST(i) to the sites in Ri.
Sj sends a REPLY to Si if
Sj has NOT sent a REPLY message to any site since it received the last RELEASE message.
Otherwise, Sj queues up Si's request.
Executing CS: after getting a REPLY from all sites in Ri.
Releasing CS: send RELEASE(i) to all sites in Ri.
Any Sj, after receiving a RELEASE message, sends a REPLY message to the next request in its queue.
If the queue is empty, Sj updates its status to indicate receipt of the RELEASE.
-
Request Subsets
Example k = 2; (N = 3).
R1 = {1, 2}; R3 = {1, 3}; R2 = {2, 3}
Example k = 3; N = 7.
R1 = {1, 2, 3}; R4 = {1, 4, 5}; R6 = {1, 6, 7};
R2 = {2, 4, 6}; R5 = {2, 5, 7}; R7 = {3, 4, 7};
R3 = {3, 5, 6}
-
Maekawa's Algorithm ...
Performance: synchronization delay: 2T
Messages: 3 * sqrt(N) (one each of REQUEST, REPLY, RELEASE messages)
Deadlocks: message deliveries are not ordered.
Assume Si, Sj, Sk concurrently request the CS, with
Ri ∩ Rj = {Sij}, Rj ∩ Rk = {Sjk}, Rk ∩ Ri = {Ski}
It is possible that:
Sij is locked by Si (forcing Sj to wait at Sij)
Sjk by Sj (forcing Sk to wait at Sjk)
Ski by Sk (forcing Si to wait at Ski)
-> a deadlock among Si, Sj, and Sk.
-
Handling Deadlocks in Maekawa's Algorithm
A site yields to a request that has a smaller time stamp.
A site suspects a deadlock when it is locked by a request with a higher time stamp (lower priority).
Deadlock handling messages:
FAILED: from Si to Sj -> Si has granted permission to a higher priority request.
INQUIRE: from Si to Sj -> Si would like to know whether Sj has succeeded in locking all the sites in Sj's request set.
YIELD: from Si to Sj -> Si is returning its permission to Sj so that Sj can yield to a higher priority request.
-
Handling Deadlocks
REQUEST(tsi, i) arrives at Sj while Sj is locked by Sk -> Sj sends FAILED to Si if Si's request has a higher time stamp; otherwise, Sj sends INQUIRE(j) to Sk.
INQUIRE(j) arrives at Sk -> Sk sends YIELD(k) to Sj if Sk has received a FAILED message from a site in Sk's request set, or if Sk sent a YIELD and has not received a new REPLY.
YIELD(k) arrives at Sj -> Sj assumes it has been released by Sk, places Sk's request in its queue appropriately, and sends a REPLY(j) to the top request in its queue.
Sites may exchange these messages even if there is no real deadlock. Maximum number of messages per CS request: 5 * sqrt(N).
-
Token-based Algorithms
Unique token circulates among the participating sites.
A site can enter CS if it has the token.
Token-based approaches use sequence numbers instead of time stamps.
Request for a token contains a sequence number.
Sequence number of sites advance independently.
Correctness issue is trivial since only one token is present -> only one site can enter CS.
Deadlock and starvation issues to be addressed.
-
Suzuki-Kasami Algorithm
If a site without the token needs to enter a CS, it broadcasts a REQUEST-for-token message to all other sites.
Token: (a) a queue of requesting sites, (b) an array LN[1..N], where LN[j] is the sequence number of the most recent execution by site j.
The token holder sends the token to the requestor if it is not inside the CS; otherwise, it sends it after exiting the CS.
The token holder can make multiple CS accesses.
Design issues:
Distinguishing outdated REQUEST messages.
Format: REQUEST(j, n) -> the jth site making its nth request.
Each site has RNi[1..N] -> RNi[j] is the largest sequence number of a request from j.
Determining which site has an outstanding token request:
if LN[j] = RNi[j] - 1, then Sj has an outstanding request.
-
Suzuki-Kasami Algorithm ...
Passing the token
After finishing the CS (assuming Si has the token): LN[i] := RNi[i]
The token consists of Q and LN; Q is a queue of requesting sites.
The token holder checks, for each j, whether RNi[j] = LN[j] + 1; if so, it places j in Q.
Send the token to the site at the head of Q.
Performance
0 to N messages per CS invocation.
Synchronization delay is 0 (if the token holder repeats the CS) or T.
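A minimal sketch of the token hand-off when Si exits the CS (0-based site indices; the deque-based token queue is an illustrative choice):

from collections import deque

def release_cs(i, RN, LN, Q):
    LN[i] = RN[i]                          # Si's latest request is now executed
    for j in range(len(RN)):
        if j not in Q and RN[j] == LN[j] + 1:
            Q.append(j)                    # Sj has an outstanding request
    return Q.popleft() if Q else i         # next token holder (or keep the token)

RN = [1, 1, 0]                             # S1 executed once; S2 has requested
print(release_cs(0, RN, [0, 0, 0], deque()))   # 1 -> send the token to S2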
-
Suzuki-Kasami Example
-
Consider 5 Sites
[Figure: sites S1..S5 with request counts n = 1..5; each site's request array RN1..RN5 = (0, 0, 0, 0, 0); the token is at site S1 with LN = (0, 0, 0, 0, 0) and Q = { } (empty)]
-
Site S1 wants to enter the CS
S1 won't execute the request part of the algorithm.
S1 enters the CS, as it has the token.
When S1 releases the CS, it performs the following tasks:
(i) LN[1] = RN1[1] = 0, i.e., the same as before.
(ii) If, during S1's CS execution, a request from another site (say S2) has arrived, then that site's id is entered into the queue if RN1[2] = LN[2] + 1.
So if a site already holding the token (not newly received from another site) enters the CS, the overall state of the system doesn't change; it remains the same.
-
Site S2 wants to enter the CS
S2 does not have the token, so it sends a REQUEST(2, 1) message to every other site.
Sites S1, S3, S4 and S5 update their RN to (0, 1, 0, 0, 0).
It may be possible that S2's request arrives at S1 while S1 is executing the CS. At that point S1 will not pass the token to S2; it finishes its CS and then adds S2 into the queue, because at that point RN1[2] = 1, which is LN[2] + 1.
Now, as the queue is not empty, S1 passes the token to S2 by removing S2's entry from the token queue.
-
Site S2 Sends REQUEST(2,1)
[Figure: S2 sends REQUEST(2,1) to S1, S3, S4, S5; every RN becomes (0, 1, 0, 0, 0); the token is still at S1 with LN = (0, 0, 0, 0, 0) and Q = {S2} (S2 is inserted only when S1 leaves the CS)]
-
Site S1 Sends Token to S2
[Figure: every RN is (0, 1, 0, 0, 0); the token is now at site S2 with LN = (0, 0, 0, 0, 0) and Q = { }]
-
Site S2 Leaves CS
[Figure: every RN is (0, 1, 0, 0, 0); the token is at S2 with LN = (0, 1, 0, 0, 0) and Q = { }]
-
Now suppose S2 again wants to enter the CS
S2 can enter the CS, as the token queue is empty, i.e., no other site has requested the CS, and the token is currently held by S2.
When S2 releases the CS, it updates LN[2] to RN2[2], which is the same value, 1.
-
Salient Points
A site updates its RN table as soon as it receives a REQUEST message from any other site [at this point the site may hold an idle token or may be executing the CS].
When a site enters the CS without sending a request message (i.e., when the site wants to re-enter the CS without making a request), its RN entry and the token's LN entry remain unchanged.
A site can be added into the token queue only by the site that is releasing the CS.
A site holding the token can repeatedly enter the CS until it sends the token to some other site.
A site gives priority to other sites with outstanding requests for the CS over its own pending requests for the CS.
-
Singhal's Heuristic Algorithm
Instead of broadcasting: each site maintains information on other sites and guesses the sites likely to have the token.
Data structures: Si maintains SVi[1..M] and SNi[1..M] for storing information on other sites: state and highest sequence number.
The token contains two arrays: TSV[1..M] and TSN[1..M].
States of a site:
R : requesting the CS
E : executing the CS
H : holding the token, idle
N : none of the above
Initialization: SVi[j] := N, for j = M .. i; SVi[j] := R, for j = i-1 .. 1; SNi[j] := 0, for j = 1 .. N. S1 (site 1) is in state H.
Token: TSV[j] := N & TSN[j] := 0, for j = 1 .. N.
-
Singhal's Heuristic Algorithm
Requesting CS: if Si has no token and requests the CS:
SVi[i] := R. SNi[i] := SNi[i] + 1.
Send REQUEST(i, sn) to the sites Sj for which SVi[j] = R. (sn: sequence number, the updated value of SNi[i]).
Receiving REQUEST(i, sn) at Sj: if sn <= SNj[i], the message is outdated, so ignore it. Otherwise, set SNj[i] := sn and act on Sj's own state:
SVj[j] = N -> SVj[i] := R.
SVj[j] = R -> if SVj[i] != R, set it to R & send REQUEST(j, SNj[j]) to Si; else do nothing.
SVj[j] = E -> SVj[i] := R.
SVj[j] = H -> SVj[i] := R, TSV[i] := R, TSN[i] := sn, SVj[j] := N. Send the token to Si.
Executing CS: after getting the token. Set SVi[i] := E.
-
Singhal's Heuristic Algorithm
Releasing CS: SVi[i] := N, TSV[i] := N. Then, for every other Sj:
if SNi[j] > TSN[j], then {TSV[j] := SVi[j]; TSN[j] := SNi[j]}
else {SVi[j] := TSV[j]; SNi[j] := TSN[j]}
If SVi[j] = N for all j, then set SVi[i] := H. Else send the token to some site Sj with SVi[j] = R.
Fairness of the algorithm depends on the choice of Sj, since no queue is maintained in the token.
Arbitration rules are used to ensure fairness.
Performance: low to moderate loads: an average of N/2 messages.
High loads: N messages (all sites request the CS).
Synchronization delay: T.
-
Singhal: Example
(a) Initial pattern: [Figure: the SV matrix for sites S1..Sn; S1 is H, and row i has R entries for sites 1..i-1, so each row has an increasing number of R's]
(b) Pattern after S3 gets the token from S1: [Figure: the staircase shifts, with S3 now H]
The staircase pattern can be identified by noting that S1 has one R, S2 has two R's, and so on. The order of occurrence of R within a row does not matter.
-
Singhal: Example
Assume there are 3 sites in the system. Initially:
Site 1: SV1[1] = H, SV1[2] = N, SV1[3] = N. SN1[1], SN1[2], SN1[3] are 0.
Site 2: SV2[1] = R, SV2[2] = N, SV2[3] = N. SNs are 0.
Site 3: SV3[1] = R, SV3[2] = R, SV3[3] = N. SNs are 0.
Token: TSVs are N. TSNs are 0.
Assume site 2 requests the token.
S2 sets SV2[2] = R, SN2[2] = 1.
S2 sends REQUEST(2,1) to S1 (since only S1 is set to R in SV2).
S1 receives the REQUEST. It accepts the REQUEST, since SN1[2] is smaller than the message sequence number.
Since SV1[1] is H: SV1[2] = R, TSV[2] = R, TSN[2] = 1, SV1[1] = N.
Send the token to S2.
S2 receives the token. SV2[2] = E. After exiting the CS, SV2[2] = TSV[2] = N.
S2 updates SN, SV, TSN, TSV. Since nobody is REQUESTing, SV2[2] = H.
Assume S3 makes a REQUEST now. It is sent to both S1 and S2. Only S2 responds, since only SV2[2] is H (SV1[1] is N now).
-
Raymond's Algorithm
Sites are arranged in a logical directed tree. Root: token holder. Edges: directed towards the root.
Every site has a variable holder that points to an immediate neighbor node on the directed path towards the root. (The root's holder points to itself.)
Requesting CS: if Si does not hold the token and requests the CS, it sends a REQUEST upwards, provided its request_q is empty. It then adds its request to request_q.
Non-empty request_q -> send a REQUEST message for the top entry in the queue (if not done before).
A site on the path to the root receiving a REQUEST -> propagate it up if its request_q is empty. Add the request to request_q.
Root on receiving a REQUEST -> send the token to the site that forwarded the message. Set holder to that forwarding site.
Any Si receiving the token -> delete the top entry from request_q, send the token to that site, and set holder to point to it. If request_q is non-empty now, send a REQUEST message to the holder site.
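A minimal sketch of one node's request/token handling (the class shape and the send callback are illustrative; the root's immediate grant on receiving a REQUEST is folded into on_token):

class RaymondNode:
    def __init__(self, me, holder):
        self.me, self.holder = me, holder      # holder: neighbor toward the root
        self.request_q = []

    def on_request(self, frm, send):           # frm may be this node itself
        if not self.request_q and self.holder != self.me:
            send(self.holder, "REQUEST", self.me)   # forward up, once
        self.request_q.append(frm)

    def on_token(self, send):
        nxt = self.request_q.pop(0)
        if nxt == self.me:
            return "enter CS"                  # the token is for our own request
        self.holder = nxt                      # pass the token down the tree
        send(nxt, "TOKEN", self.me)
        if self.request_q:                     # more requests queued behind it
            send(self.holder, "REQUEST", self.me)
        return "forwarded"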
-
Raymond's Algorithm
Executing CS: on getting the token with the site itself at the top of request_q: delete the top of request_q, enter the CS.
Releasing CS: if request_q is non-empty, delete the top entry from the queue, send the token to that site, and set holder to that site.
If request_q is non-empty now, send a REQUEST message to the holder site.
Performance: average messages: O(log N), as the average distance between two nodes in the tree is O(log N).
Synchronization delay: (T log N) / 2, as the average distance between two sites that successively execute the CS is (log N) / 2.
Greedy approach: an intermediate site getting the token may enter the CS instead of forwarding it down. This affects fairness and may cause starvation.
-
Raymond's Algorithm: Example
Step 1: [Figure: a logical tree over sites S1..S7 with S1 as the token holder; a token request travels up the tree toward S1]
Step 2: [Figure: the token travels down the tree toward the requesting site]
-
Raymond's Algorithm: Example (cont.)
Step 3: [Figure: the requesting site now holds the token and is the new root]
-
Comparison
Non-Token       | Resp. Time (ll) | Sync. Delay | Messages (ll) | Messages (hl)
Lamport         | 2T+E            | T           | 3(N-1)        | 3(N-1)
Ricart-Agrawala | 2T+E            | T           | 2(N-1)        | 2(N-1)
Maekawa         | 2T+E            | 2T          | 3*sqrt(N)     | 5*sqrt(N)
Token           | Resp. Time (ll) | Sync. Delay | Messages (ll) | Messages (hl)
Suzuki-Kasami   | 2T+E            | T           | N             | N
Singhal         | 2T+E            | T           | N/2           | N
Raymond         | T(log N)+E      | T log(N)/2  | log(N)        | 4
-
BITS Pilani
Pilani Campus
Advanced Operating Systems
(CS ZG623)
First Semester 2014-2015
PANKAJ VYAS
Department of CS/IS
Lecture 6
Friday, Aug 29-2014
-
Distributed Mutual Exclusion
- Non-Token-Based Algorithms
1. Lamport's Algorithm (covered in previous class)
2. Ricart-Agrawala Algorithm (covered in previous class)
3. Maekawa's Algorithm (covered in previous class)
- Token-Based Algorithms
1. Suzuki-Kasami Algorithm (covered in previous class)
2. Singhal's Heuristic Algorithm (covered in previous class)
* Singhal's Algorithm Example (will be covered today)
3. Raymond's Algorithm (will be covered today)
- Comparison of DMS Algorithms (will be covered today)
- Introduction to Distributed Deadlock (will be covered today)
Today's Lecture
-
Singhal's DMS Algorithm Example
-
Initial state (SV rows S5 down to S1, columns = sites S1..S5):

SV:              SN:
S5: R R R R N    0 0 0 0 0
S4: R R R N N    0 0 0 0 0
S3: R R N N N    0 0 0 0 0
S2: R N N N N    0 0 0 0 0
S1: H N N N N    0 0 0 0 0
Token: TSV = N N N N N, TSN = 0 0 0 0 0

S5, S4, S3, S2 request the CS:
S5 -> S4, S3, S2, S1; S4 -> S3, S2, S1; S3 -> S2, S1; S2 -> S1

SV:              SN:
S5: R R R R R    0 0 0 0 1
S4: R R R R N    0 0 0 1 0
S3: R R R N N    0 0 1 0 0
S2: R R N N N    0 1 0 0 0
S1: H N N N N    0 0 0 0 0
Token: TSV = N N N N N, TSN = 0 0 0 0 0
-
S5's CS request received by S4, S3, S2:

SV:              SN:
S5: R R R R R    0 0 0 0 1
S4: R R R R R    0 0 0 1 1
S3: R R R N R    0 0 1 0 1
S2: R R N N R    0 1 0 0 1
S1: H N N N N    0 0 0 0 0
Token: TSV = N N N N N, TSN = 0 0 0 0 0

In response to S5's CS request, sites S4, S3, S2 also send their requests to S5:

SV:              SN:
S5: R R R R R    0 1 1 1 1
S4: R R R R R    0 0 0 1 1
S3: R R R N R    0 0 1 0 1
S2: R R N N R    0 1 0 0 1
S1: H N N N N    0 0 0 0 0
Token: TSV = N N N N N, TSN = 0 0 0 0 0
-
S4's CS request received by S3, S2:

SV:              SN:
S5: R R R R R    0 1 1 1 1
S4: R R R R R    0 0 0 1 1
S3: R R R R R    0 0 1 1 1
S2: R R N R R    0 1 0 1 1
S1: H N N N N    0 0 0 0 0
Token: TSV = N N N N N, TSN = 0 0 0 0 0

In response to S4's CS request, sites S3, S2 also send their requests back to S4:

SV:              SN:
S5: R R R R R    0 1 1 1 1
S4: R R R R R    0 1 1 1 1
S3: R R R R R    0 0 1 1 1
S2: R R N R R    0 1 0 1 1
S1: H N N N N    0 0 0 0 0
Token: TSV = N N N N N, TSN = 0 0 0 0 0
-
S3's CS request received by S2:

SV:              SN:
S5: R R R R R    0 1 1 1 1
S4: R R R R R    0 1 1 1 1
S3: R R R R R    0 0 1 1 1
S2: R R R R R    0 1 1 1 1
S1: H N N N N    0 0 0 0 0
Token: TSV = N N N N N, TSN = 0 0 0 0 0

In response to S3's CS request, site S2 also sends its request back to S3:

SV:              SN:
S5: R R R R R    0 1 1 1 1
S4: R R R R R    0 1 1 1 1
S3: R R R R R    0 1 1 1 1
S2: R R R R R    0 1 1 1 1
S1: H N N N N    0 0 0 0 0
Token: TSV = N N N N N, TSN = 0 0 0 0 0
-
S5's CS request received by S1 (S1 will pass the token to S5):

SV:              SN:
S5: R R R R R    0 1 1 1 1
S4: R R R R R    0 1 1 1 1
S3: R R R R R    0 1 1 1 1
S2: R R R R R    0 1 1 1 1
S1: N N N N R    0 0 0 0 1
Token: TSV = N N N N R, TSN = 0 0 0 0 1

S4's and S3's CS requests received by S1:

SV:              SN:
S5: R R R R R    0 1 1 1 1
S4: R R R R R    0 1 1 1 1
S3: R R R R R    0 1 1 1 1
S2: R R R R R    0 1 1 1 1
S1: N N R R R    0 0 1 1 1
Token: TSV = N N N N R, TSN = 0 0 0 0 1
-
S5 receives the token from S1 and enters the CS:

SV:              SN:
S5: R R R R E    0 1 1 1 1
S4: R R R R R    0 1 1 1 1
S3: R R R R R    0 1 1 1 1
S2: R R R R R    0 1 1 1 1
S1: N N N N R    0 0 0 0 1
Token: TSV = N N N N R, TSN = 0 0 0 0 1

S5 releases the CS:

SV:              SN:
S5: N R R R N    0 1 1 1 1
S4: R R R R R    0 1 1 1 1
S3: R R R R R    0 1 1 1 1
S2: R R R R R    0 1 1 1 1
S1: N N N N R    0 0 0 0 1
Token: TSV = N R R R N, TSN = 0 1 1 1 1
-
Raymond's Algorithm
Sites are arranged in a logical directed tree. Root: token holder. Edges: directed towards the root.
Every site has a variable holder that points to an immediate neighbor node on the directed path towards the root. (The root's holder points to itself.)
Requesting CS: if Si does not hold the token and requests the CS, it sends a REQUEST upwards, provided its request_q is empty. It then adds its request to request_q.
Non-empty request_q -> send a REQUEST message for the top entry in the queue (if not done before).
A site on the path to the root receiving a REQUEST -> propagate it up if its request_q is empty. Add the request to request_q.
Root on receiving a REQUEST -> send the token to the site that forwarded the message. Set holder to that forwarding site.
Any Si receiving the token -> delete the top entry from request_q, send the token to that site, and set holder to point to it. If request_q is non-empty now, send a REQUEST message to the holder site.
-
Raymond's Algorithm
Executing CS: a site enters the CS when it receives the token and its own entry is at the top of its request_q; it deletes that top entry and enters the CS.
Releasing CS: if request_q is non-empty, delete the top entry, send the token to that site, and set holder to that site.
If request_q is still non-empty, send a REQUEST message to the new holder.
Performance: O(log N) messages per CS entry on average, as the average distance between two nodes in the tree is O(log N).
Synchronization delay: (T log N) / 2, as the average distance between two sites that successively execute the CS is (log N) / 2.
Greedy variant: an intermediate site holding the token may enter the CS itself instead of forwarding the token down. This improves performance but affects fairness and may cause starvation.
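A minimal Python sketch of one site's behaviour follows. The send(dest, kind, src) callback is an assumption of this sketch (transport and tree construction are not shown), and the "asked" flag that a full implementation uses to suppress duplicate REQUESTs is omitted for brevity.

from collections import deque

class RaymondSite:
    def __init__(self, sid, holder, send):
        self.sid = sid
        self.holder = holder        # neighbour towards the token; self if root
        self.request_q = deque()
        self.send = send            # send(dest, kind, src): hypothetical transport
        self.in_cs = False

    def request_cs(self):
        # Local process wants the CS: queue our own id, then either serve
        # the queue (we are the idle root) or send one REQUEST up the tree.
        was_empty = not self.request_q
        self.request_q.append(self.sid)
        if self.holder == self.sid:
            if not self.in_cs:
                self._pass_token()
        elif was_empty:
            self.send(self.holder, 'REQUEST', self.sid)

    def on_request(self, frm):
        # REQUEST from a tree neighbour: queue it; forward upward only if
        # our queue was empty, or serve it at once if we hold an idle token.
        was_empty = not self.request_q
        self.request_q.append(frm)
        if self.holder == self.sid:
            if not self.in_cs:
                self._pass_token()
        elif was_empty:
            self.send(self.holder, 'REQUEST', self.sid)

    def on_token(self):
        self.holder = self.sid
        self._pass_token()

    def _pass_token(self):
        # Serve the head of the queue: ourselves (enter CS) or a neighbour
        # (send the token down and re-point holder towards it).
        head = self.request_q.popleft()
        if head == self.sid:
            self.in_cs = True       # critical section body goes here
            return
        self.holder = head
        self.send(head, 'TOKEN', self.sid)
        if self.request_q:          # more waiters: ask for the token back
            self.send(self.holder, 'REQUEST', self.sid)

    def release_cs(self):
        self.in_cs = False
        if self.request_q:
            self._pass_token()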
-
Raymond's Algorithm: Example
[Figure: a logical tree over sites S1..S7, with S1 as the current token holder (root).
Step 1: a site sends a token REQUEST to its holder, and the request propagates up the tree towards S1.
Step 2: S1 sends the token down towards the requester, and the holder pointers along the path are updated.]
-
Raymond's Algorithm: Example
[Figure, Step 3: the token reaches the requesting site, which becomes the new root / token holder.]
-
Comparison
(ll = light load, hl = heavy load; T = average message delay, E = average CS execution time)

Non-Token         Resp. Time (ll)   Sync. Delay    Messages (ll)   Messages (hl)
Lamport           2T + E            T              3(N-1)          3(N-1)
Ricart-Agrawala   2T + E            T              2(N-1)          2(N-1)
Maekawa           2T + E            2T             3*sqrt(N)       5*sqrt(N)

Token             Resp. Time (ll)   Sync. Delay    Messages (ll)   Messages (hl)
Suzuki-Kasami     2T + E            T              N               N
Singhal           2T + E            T              N/2             N
Raymond           T*log(N) + E      T*log(N)/2     log(N)          4
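For a rough sense of scale (illustrative arithmetic, not from the slides), take N = 64 sites. Per CS entry at light load: Lamport needs 3(N-1) = 189 messages, Ricart-Agrawala 2(N-1) = 126, Maekawa about 3*sqrt(64) = 24, Suzuki-Kasami at most N = 64, Singhal about N/2 = 32, and Raymond about log2(64) = 6.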
-
Deadlock
Deadlock is a situation where a process, or a set of processes, is blocked waiting on an event that will never occur.
Processes, while holding some resources, may request additional resources that are held by other processes.
The processes are then in a circular wait for the resources.
-
Deadlock vs Starvation
Starvation occurs when a process waits for a resource that repeatedly becomes available but is never allocated to that process.
Two main differences:
- In starvation it is not certain that the process will ever get the requested resource, whereas a deadlocked process is permanently blocked because the required resource never becomes available.
- In starvation the resource under contention is in continuous use, whereas this is not true in the case of deadlock.
-
Necessary Conditions for Deadlock
Exclusive access
Wait while hold
No Preemption
Circular wait
-
Models of Deadlock
Single-Unit Request Model
- A process may request only one resource at a time
- Outdegree in the WFG is at most one
- A cycle in the WFG means deadlock
[Figure: a three-process cycle, e.g. P1 -> P2 -> P3 -> P1.]
-
Models of Deadlock
AND Request Model
- A process can simultaneously request multiple resources
- The process remains blocked until all the requested resources are granted
- Outdegree of the WFG can be more than 1
- A cycle in the WFG means the system is deadlocked
- A process can be involved in more than one deadlock
[Figure: a WFG over P1..P4 containing a cycle.]
-
Models of Deadlock
OR Request Model
- A process can simultaneously request multiple resources
- The process remains blocked until it is granted any one of the requested resources
- Outdegree of the WFG can be more than 1
- A cycle in the WFG is not a sufficient condition for deadlock
- A knot in the WFG is a sufficient condition for deadlock
- A knot is a subset of the graph such that, starting from any node in the subset, it is impossible to leave the knot by following the edges of the graph
-
Cycle vs Knot
P1 P2
P3
P4
P5
Cycle but no
Knot
Deadlock in AND Model
But no Deadlock in OR Model
P1 P2
P3
P4
P5
Cycle & Knot
Deadlock in both AND & OR Model
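To make the two tests concrete, here is a small Python sketch that checks a WFG (given as an adjacency dict) for a cycle and for knot membership. The edge set used for the left figure is one plausible reading of the diagram, not taken verbatim from the slide.

def reachable(wfg, start):
    # Every node reachable from start by following edges (start itself is
    # included only if it lies on a cycle).
    seen, stack = set(), [start]
    while stack:
        for nxt in wfg.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def has_cycle(wfg):
    # AND-model test: some node can reach itself.
    return any(node in reachable(wfg, node) for node in wfg)

def in_knot(wfg, node):
    # OR-model test: node waits on something, and everything it can reach
    # can reach it back -- i.e. node's reachable set forms a knot.
    reach = reachable(wfg, node)
    return bool(reach) and all(node in reachable(wfg, m) for m in reach)

# Plausible reading of the left figure: cycle P1 -> P2 -> P3 -> P1, with an
# escape route P2 -> P4 -> P5 (P5 is not blocked), so there is no knot.
left = {'P1': {'P2'}, 'P2': {'P3', 'P4'}, 'P3': {'P1'},
        'P4': {'P5'}, 'P5': set()}
assert has_cycle(left) and not in_knot(left, 'P1')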
-
Resources
Reusable (CPU, main memory, I/O devices)
Consumable (messages, interrupt signals)
-
Distributed Deadlock Detection
Assumptions:
a. The system has only reusable resources
b. Only exclusive access to resources
c. Only one copy of each resource
d. States of a process: running or blocked
e. Running state: the process has all the resources it needs
f. Blocked state: the process is waiting on one or more resources
-
Resource vs Communication Deadlocks
Resource deadlocks
A process needs multiple resources for an activity. Deadlock occurs if each process in a set requests resources held by another process in the same set, and must receive all the requested resources to proceed.
Communication deadlocks
Processes wait to communicate with other processes in a set. Each process in the set waits on another process's message, and no process in the set initiates a message until it receives a message for which it is waiting.
-
Graph Models
Nodes of the graph are processes; edges represent pending requests or assignments of resources.
Wait-for Graph (WFG): P1 -> P2 implies P1 is waiting for a resource held by P2.
Transaction-Wait-For Graph (TWF): the WFG used in databases.
Deadlock: a directed cycle in the graph.
Cycle example: P1 -> P2 -> P1
-
Graph Models
Wait-for Graph (WFG): P1 -> P2 implies P1 is waiting for a resource held by P2.
[Figure: processes P1, P2 and resources R1, R2. A request edge points from a process to the resource it waits for; an assignment edge points from a resource to the process holding it.]
-
AND, OR Models
AND Model
A process/transaction can simultaneously request multiple resources.
It remains blocked until it is granted all of the requested resources.
OR Model
A process/transaction can simultaneously request multiple resources.
It remains blocked until any one of the requested resources is granted.
-
Sufficient Condition
[Exercise figure: a WFG over P1..P6 — deadlock? The answer depends on the model: under the AND model check for a cycle; under the OR model check for a knot.]
-
AND, OR Models
AND Model
Sufficient condition: presence of a cycle.
[Figure: a WFG over P1..P5 containing a cycle.]
-
AND, OR Models
OR Model
Sufficient condition: presence of a knot.
Knot: a subset of a graph such that, starting from any node in the subset, it is impossible to leave the knot by following the edges of the graph.
[Figure: a WFG over P1..P6 containing a knot.]
-
Deadlock Handling Strategies
Deadlock prevention: difficult in a distributed setting.
Deadlock avoidance: check for possible deadlocks before each allocation. Difficult, as it needs global state information at every site that handles resources.
Deadlock detection: find cycles (or knots). The focus of our discussion.
Deadlock detection algorithms must satisfy two conditions:
- No undetected deadlocks.
- No false deadlocks.
-
Distributed Deadlocks
Centralized control
A control site constructs wait-for graphs (WFGs) and checks them for directed cycles.
The WFG can be maintained continuously, or built on demand by requesting WFGs from the individual sites.
Distributed control
The WFG is spread over different sites. Any site can initiate the deadlock detection process.
Hierarchical control
Sites are arranged in a hierarchy. A site checks for cycles only among its descendants.
-
Centralized Algorithms
Ho-Ramamoorthy 2-phase Algorithm
Each site maintains a status table of all processes initiated at that site, listing all resources locked and all resources being waited on.
The controller periodically requests the status table from each site.
The controller then constructs a WFG from these tables and searches it for cycles.
If there are no cycles, there is no deadlock.
Otherwise (a cycle exists): request the status tables again.
Construct a WFG based only on the transactions common to the two sets of tables.
If the same cycle is detected again, the system is in deadlock.
It was later proved that a cycle appearing in two consecutive reports need not correspond to a deadlock; hence this algorithm can detect false deadlocks.
-
Centralized Algorithms...
Ho-Ramamoorthy 1-phase Algorithm
Each site maintains two status tables: a resource status table and a process status table.
Resource table: transactions that have locked, or are waiting for, resources.
Process table: resources locked by, or waited on by, transactions.
The controller periodically collects these tables from each site.
It constructs a WFG from the transactions common to both tables.
No cycle: no deadlock.
A cycle means a deadlock.
-
BITS Pilani
Pilani Campus
Advance Operating System
(CS ZG623)
First Semester 2014-2015
PANKAJ VYAS
Department of CS/IS
Lecture 7
Sunday, Aug 31-2014
-
What We Covered in the Last Class
- Introduction to Deadlocks
- Centralized Algorithms
- Ho-Ramamoorthy's 2-Phase Algorithm
- Ho-Ramamoorthy's 1-Phase Algorithm
Today's Lecture
- Distributed Deadlock Detection Algorithms
- Path Pushing [Not Covered in the Syllabus]
- Edge Chasing [Examples: CMH for the AND Model, Sinha-Natarajan, Mitchell-Merritt]
- Diffusion Computation [Examples: CMH for the OR Model, Chandy-Herman]
- Global State Detection [Not Covered in the Syllabus]
-
Distributed Algorithms
Path-pushing: resource dependency information is disseminated along designated paths in the graph. [Examples: Menasce-Muntz, Obermarck] [Not included in the syllabus]
Edge-chasing: special messages (probes) are circulated along the edges of the WFG; a deadlock exists if a probe is received back by its initiator. [Examples: CMH for the AND model, Sinha-Natarajan]
Diffusion computation: queries on status are sent to the processes in the WFG. [Examples: CMH for the OR model, Chandy-Herman]
Global state detection: take a snapshot of the distributed system. [Examples: Bracha-Toueg, Kshemkalyani-Singhal] [Not included in the syllabus]
-
Edge-Chasing Algorithm
Chandy-Misra-Haas's Algorithm (AND model):
A probe(i, j, k) is used by a deadlock detection computation initiated for process Pi. The probe is sent by the home site of Pj to the home site of Pk.
Probe messages circulate along the edges of the WFG; a probe returning to Pi implies deadlock.
Terms used:
Pj is dependent on Pk if a sequence of processes Pj, Pi1, ..., Pim, Pk exists, each waiting on the next.
Pj is locally dependent on Pk if, in addition, Pj and Pk are on the same site.
Each process Pi maintains a boolean array dependenti: dependenti(j) is true if Pi knows that Pj is dependent on it (initially false for all i and j).
-
Chandy-Misra-Haas's Algorithm
Sending the probe:
if Pi is locally dependent on itself then declare deadlock
else for all Pj and Pk such that
    (a) Pi is locally dependent upon Pj, and
    (b) Pj is waiting on Pk, and
    (c) Pj and Pk are on different sites,
    send probe(i, j, k) to the home site of Pk.
Receiving the probe(i, j, k):
if (d) Pk is blocked, and
   (e) dependentk(i) is false, and
   (f) Pk has not replied to all requests of Pj,
then begin
    dependentk(i) := true;
    if k = i then Pi is deadlocked
    else ...
-
Chandy-Misra-Haas's Algorithm
Receiving the probe (continued):
...
else for all Pm and Pn such that
    (a) Pk is locally dependent upon Pm, and
    (b) Pm is waiting on Pn, and
    (c) Pm and Pn are on different sites,
    send probe(i, m, n) to the home site of Pn.
end.
Performance:
For a deadlock that spans m processes over n sites, m(n-1)/2 messages are needed.
Message size: 3 words.
Delay in deadlock detection: O(n).
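A minimal Python sketch of the probe machinery, under stated simplifications: transport is a hypothetical send_probe(site, probe) callback, condition (f) about unanswered requests is assumed to hold, and the initial local-self-dependence check is omitted.

class Proc:
    def __init__(self, pid, site):
        self.pid, self.site = pid, site
        self.blocked = False
        self.waiting_for = set()   # pids this process waits on (AND model)
        self.dependent = set()     # initiators i known to depend on us

def local_closure(procs, pk):
    # pk plus every process pk transitively waits on within its own site.
    seen, stack = {pk.pid}, [pk.pid]
    while stack:
        for m in procs[stack.pop()].waiting_for:
            if procs[m].site == pk.site and m not in seen:
                seen.add(m)
                stack.append(m)
    return seen

def propagate(i, pk, procs, send_probe):
    # Send probe(i, m, n) along every remote edge Pm -> Pn reachable
    # from pk through same-site processes.
    for m in local_closure(procs, pk):
        for n in procs[m].waiting_for:
            if procs[n].site != procs[m].site:
                send_probe(procs[n].site, (i, m, n))

def on_probe(probe, procs, send_probe):
    # Home site of Pk handles probe(i, j, k). Condition (f) -- "Pk has not
    # replied to all requests of Pj" -- is assumed true in this sketch.
    i, j, k = probe
    pk = procs[k]
    if pk.blocked and i not in pk.dependent:
        pk.dependent.add(i)        # Pk now knows Pi is dependent on it
        if k == i:
            print(f"P{i} is deadlocked")
        else:
            propagate(i, pk, procs, send_probe)

def initiate(pi, procs, send_probe):
    # Blocked Pi starts detection along its remote wait-for edges.
    propagate(pi.pid, pi, procs, send_probe)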
-
C-M-H Algorithm: Example
[Figure: blocked processes P1..P7 distributed over Sites 1-3, with a wait-for cycle through P1, P2, P4, P5, P6, P7 and a side branch from P2 to P3.]
P1 initiates deadlock detection by sending probe (1,1,2) to P2. The probe then travels along the edges of the WFG:
(1,1,2) -> (1,2,3) and (1,2,4) -> (1,4,5) -> (1,5,6) -> (1,6,7) -> (1,7,1)
When probe (1,7,1) returns to P1 (k = i = 1), P1 declares deadlock.
-
Diffusion-Based Algorithm
CMH Algorithm for the OR model. Initiation by a blocked process Pi:
    send query(i, i, j) to all processes Pj in the dependent set DSi of Pi;
    numi(i) := |DSi|; waiti(i) := true;
A blocked process Pk receiving query(i, j, k):
    if this is an engaging query for Pk /* first query on behalf of Pi */
    then send query(i, k, m) to all Pm in DSk;
         numk(i) := |DSk|; waitk(i) := true;
    else if waitk(i) then send reply(i, k, j) to Pj.
Process Pk receiving reply(i, j, k):
    if waitk(i) then
        numk(i) := numk(i) - 1;
        if numk(i) = 0 then
            if i = k then declare a deadlock
            else send reply(i, k, m) to the Pm that sent the engaging query.
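The same logic as a minimal Python sketch. The send(dest, kind, payload) callback and the per-initiator dictionaries are assumptions of this sketch; an engaging query is recognised here by a simple "first query from this initiator" test rather than by explicit sequence numbers (the sequence-number mechanism is discussed below).

class ORProcess:
    def __init__(self, pid, ds, send):
        self.pid, self.ds, self.send = pid, set(ds), send
        self.blocked = True   # sketch models already-blocked processes
        self.num = {}         # num[i]: outstanding replies for initiator i
        self.wait = {}        # wait[i]: blocked since i's engaging query
        self.engager = {}     # who sent us the engaging query of initiator i

    def initiate(self):
        # Blocked process starts a diffusion for itself.
        i = self.pid
        self.num[i], self.wait[i] = len(self.ds), True
        for m in self.ds:
            self.send(m, 'QUERY', (i, self.pid, m))

    def on_query(self, i, j, k):
        if not self.blocked:
            return                                   # active: no reply
        if i not in self.wait:                       # engaging query
            self.engager[i] = j
            self.num[i], self.wait[i] = len(self.ds), True
            for m in self.ds:
                self.send(m, 'QUERY', (i, self.pid, m))
        elif self.wait[i]:                           # duplicate query
            self.send(j, 'REPLY', (i, self.pid, j))

    def on_reply(self, i, j, k):
        if not self.wait.get(i):
            return
        self.num[i] -= 1
        if self.num[i] == 0:
            if i == self.pid:
                print(f"P{self.pid}: deadlock declared")
            else:                                    # answer the engager
                m = self.engager[i]
                self.send(m, 'REPLY', (i, self.pid, m))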
-
Diffusion Algorithm: Example
-
Engaging Query
How do we distinguish an engaging query? A query(i, j, k) from the initiator carries a unique sequence number in addition to the tuple (i, j, k).
This sequence number is used to identify subsequent queries.
For example, when query(1,7,1) is received by P1 from P7, P1 checks the sequence number along with the tuple.
P1 recognises that the query was initiated by itself, so it is not an engaging query.
Hence P1 sends a reply back to P7 instead of forwarding the query on all its outgoing edges.
-
Mitchell-Merritt Algorithm
(Edge-Chasing Category)
Each node carries two labels: a public label and a private label.