
  • BITS Pilani

    Pilani Campus

    Advance Operating System

    (CS ZG623)

    First Semester 2014-2015

    PANKAJ VYAS

    Department of CS/IS

    ([email protected])

    Lecture 1

    Sunday, Aug 03 - 2014

  • Introduction to Distributed Systems

    Theoretical Foundations

    Distributed Mutual Exclusion

    Distributed Deadlock

    Agreement Protocols

    Distributed File Systems

    Distributed Scheduling

    Distributed Shared Memory

    Recovery

    Fault tolerance

    Protection and Security

    Course Overview

    Mid-Semester

  • Text and References

    Text Book:

    Advanced Concepts in Operating Systems:

    Mukesh Singhal & Shivaratri, Tata McGraw-Hill

    References:

    1. Distributed Operating Systems: P. K. Sinha, PHI

    2. Distributed Operating Systems: The Logical Design, A. Goscinski, AW

    3. Modern Operating Systems: A.S.Tanenbaum & VAN, PHI

    4. Distributed Systems: Concepts and Design: G. Coulouris, AW

  • Evaluation Components

    EC-1: Quiz, 15% weightage; duration and schedule details to be announced on LMS Taxila

    EC-2: Mid-Semester Test (Closed Book)*, 2 hours, 35% weightage

    EC-3: Comprehensive Exam (Open Book)*, 3 hours, 50% weightage

  • Distributed Systems

    A Distributed System is a collection of independent computers that appears to its users

    as a single coherent system [Tanenbaum]

    A Distributed System is

    - a system having several computers that do not share a memory or a clock

    - communication is via message passing

    - each computer has its own OS and memory

    [Shivaratri & Singhal]

  • Multiprocessor System Architecture

    Types

    Tightly Coupled Systems

    Loosely Coupled Systems

  • Tightly Coupled Systems

    Systems with a single system-wide memory: parallel processing systems, SMMP (shared-memory multiprocessor systems)

    [Figure: several CPUs connected to a single shared memory through interconnection hardware]

  • Loosely Coupled System

    Distributed Memory Systems (DMS); communication via message passing

    [Figure: several CPUs, each with its own local memory, connected by a communication network]

  • Motivation

    Resource Sharing

    Enhanced Performance

    Improved Reliability & Availability

    Modular expandability

  • Distributed System Architecture Types

    Minicomputer Model

    Workstation Model

    Workstation Server Model

    Processor Pool Model

    Hybrid Model

  • MINICOMPUTER MODEL

    [Figure: minicomputers connected by a communication network, with terminals attached to each minicomputer]

  • WORKSTATION MODEL

    [Figure: workstations connected by a communication network]

  • WORKSTATION SERVER MODEL

    [Figure: workstations connected by a communication network to minicomputers used as a file server, a database server, and a print server]

  • PROCESSOR POOL MODEL

    [Figure: terminals connected through a communication network to a file server, a run server, and a pool of processors]

  • Hybrid Model

    Based upon workstation-server model but with additional pool of processors

    Processors in the pool can be allocated dynamically

    Gives guaranteed response time to interactive jobs

    More expensive to build

  • Distributed OS

    A distributed OS is one that looks to its users like a centralized OS
    but runs on multiple, independent CPUs. The key concept is
    transparency: the use of multiple processors should be invisible to
    the user.

    [Tanenbaum & Van Renesse]

  • Issues

    Global knowledge

    Naming

    Scalability

    Compatibility

    Process Synchronization

    Resource Management

    Security

    Structuring

  • Global Knowledge

    No Global Memory

    No Global Clock

    Unpredictable Message Delays

  • Naming

    Names refer to objects [files, computers, etc.]

    Name Service Maps logical name to physical address

    Techniques

    - LookUp Tables [Directories]

    - Algorithmic

    - Combination of above two

  • Scalability

    Grow with time

    Scaling dimensions: size, geographic, and administrative

    Techniques: hiding communication latencies, distribution, and caching

  • Scaling Techniques (1)

    Hide Communication Latencies

    [Figure 1.4: the difference between letting (a) the server or (b) the client check forms as they are being filled]

  • Scaling Techniques (2)

    Distribution

    [Figure 1.5: an example of dividing the DNS name space into zones]

  • Scaling Techniques (3)

    Replication

    Replicate components across the distributed system

    Replication increases availability and balances the load

    Consistency problem has to be handled

  • Compatibility

    Interoperability among the resources in the system

    Levels of compatibility: binary level, execution level, and protocol level

  • Process Synchronization

    Difficult because of unavailability of shared memory

    Mutual Exclusion Problem

  • Resource Management

    Make both local and remote resources available

    Specific Location of resource should be hidden from user

    Techniques

    - Data Migration [DFS, DSM]

    - Computation Migration [RPC]

    - Distributed Scheduling [ Load Balancing]

  • Security

    Authentication: verifying that an entity is what it claims to be

    Authorization: determining what privileges an entity has and making
    only those privileges available

  • Structuring

    Techniques

    - Monolithic Kernel

    - Collective Kernel [Microkernel based , Mach, V-

    Kernel, Chorus and Galaxy]

    - Object Oriented OS [Services are implemented

    as objects, Eden, Choices, x-kernel, Medusa,

    Clouds, Amoeba & Muse]

    - Client-Server Computing Model [Processes are

    categorized as servers and clients]

  • Communication Primitives

  • Communication Primitives

    High-level constructs that help programs use the underlying
    communication network

    Two Types of Communication Models

    - Message passing

    - Remote Procedure Calls

  • Message Passing

    Two basic communication primitives

    - SEND(a, b): a is the message, b the destination

    - RECEIVE(c, d): c is the source, d a buffer for storing the message

    Client-Server Computation Model

    - Client sends Message to server and waits

    - Server replies after computation
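The request-reply exchange above can be sketched with thread-safe queues standing in for the kernel's message buffers; a minimal illustration where the SEND/RECEIVE names follow the slides and everything else is assumed:

```python
import threading
import queue

# Channels are modeled as thread-safe queues: SEND copies the message into
# the destination's buffer, RECEIVE blocks until a message arrives.
to_server = queue.Queue()
to_client = queue.Queue()

def send(message, destination):
    destination.put(message)        # returns once the message is buffered

def receive(source):
    return source.get()             # blocks until a message is available

def server():
    request = receive(to_server)    # wait for the client's request
    send(request * 2, to_client)    # compute and reply

t = threading.Thread(target=server)
t.start()
send(21, to_server)                 # client sends its message to the server
reply = receive(to_client)          # ... and waits for the reply
t.join()
```

The client thread really does block on `receive` until the server replies, mirroring the client-server computation model on the slide.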

  • Design Issues

    Blocking vs non-blocking primitives

    Non-blocking

    - The SEND primitive returns control to the user process as soon as
      the message is copied from the user buffer to the kernel buffer

    - Advantage: programs have maximum flexibility to interleave
      computation and communication in any order

    - Drawback: programming becomes tricky and difficult

    Blocking

    - The SEND primitive does not return control to the user process
      until the message has been sent or an acknowledgement has been
      received

    - Advantage: program behavior is predictable

    - Drawback: lack of flexibility in programming
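Python's socket API exposes the same choice between blocking and non-blocking calls; a small sketch (the variable names are illustrative):

```python
import socket

# A connected socket pair stands in for the kernel communication path.
a, b = socket.socketpair()

# Blocking mode (the default): recv does not return until data arrives.
a.sendall(b"ping")
data = b.recv(4)            # blocks until the 4 bytes are available

# Non-blocking mode: calls return immediately; if the kernel buffer has
# nothing to deliver, recv raises BlockingIOError instead of waiting.
b.setblocking(False)
pending = None
try:
    pending = b.recv(4)
except BlockingIOError:
    pass                    # nothing buffered yet; the process may do other work
```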

  • Design Issues cont..

    Synchronous vs Asynchronous Primitives

    Synchronous

    - SEND primitive is blocked until corresponding RECEIVE

    primitive is executed at the target computer

    Asynchronous

    - Messages are buffered

    - SEND primitive does not block even if there is no

    corresponding execution of the RECEIVE primitive

    - The corresponding RECEIVE primitive can be either

    blocking or non-blocking

  • Details to be Handled in Message Passing

    Pairing of Response with Requests

    Data Representation

    Sender should know the address of Remote machine

    Communication and System failures

  • Remote Procedure Call (RPC)

    RPC is an interaction between a client and a server

    Client invokes a procedure on the server

    Server executes the procedure and passes the result back to the client

    The calling process is suspended and proceeds only after getting the
    result from the server

  • RPC Design issues

    Structure

    Binding

    Parameter and Result Passing

    Error handling, semantics and Correctness

  • Structure

    The RPC mechanism is based upon stub procedures.

    [Figure: RPC call path between the client and server machines. On the
    client machine, the user program makes a local procedure call to the
    stub procedure, which packs the parameters, transmits them, and
    waits. On the server machine, the stub procedure unpacks the
    parameters and makes a local procedure call to execute the remote
    procedure; it then packs the result and returns. The client stub
    unpacks the result and returns from the local call.]
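The stub structure maps directly onto Python's standard `xmlrpc` module, where `ServerProxy` acts as the client stub and `SimpleXMLRPCServer` hosts the server-side stub; a minimal sketch (the host, the port choice, and the `add` procedure are assumptions for illustration):

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Server machine: register a procedure. The server-side stub unpacks each
# incoming request, executes the procedure, and packs the result back.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client machine: the proxy is the client-side stub. Calling proxy.add
# packs the parameters, transmits them, waits, and unpacks the result,
# so to the caller it looks like an ordinary local procedure call.
proxy = ServerProxy(f"http://127.0.0.1:{port}")
result = proxy.add(2, 3)
```

Note how the calling thread is suspended inside `proxy.add` until the server's reply arrives, exactly as the slide describes.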

  • Binding

    Determines remote procedure and machine on which it will be executed

    Check compatibility of the parameters passed

    Use Binding Server

  • Binding

    Binding Server

    [Figure: the RPC call path extended with a binding server. The server
    registers its services with the binding server (step 1); before the
    call, the client stub queries the binding server, which returns the
    server's address (steps 2-4); the call then proceeds through pack,
    transmit, unpack, execute, and return as in the basic RPC structure
    (steps 5-8).]

  • Parameter and Result Passing

    Stub procedures convert parameters and results to the appropriate form

    The sender stub packs the parameters into a buffer; the receiver stub
    unpacks them

    Conversion is expensive if done on every call

    - Send the parameters along with a code that identifies their format,
      so that the receiver can do the conversion

    - Alternatively, each data type may have a standard format: the
      sender converts data to the standard format and the receiver
      converts from the standard format to its local representation

    Passing parameters by reference is problematic (no shared address
    space)

  • Error Handling, Semantics and Correctness

    RPC may fail due to either computer or communication failure

    If the remote server is slow, the program invoking the remote
    procedure may call it twice

    The client may crash after sending the RPC message

    The client may recover quickly after a crash and reissue the RPC

    Orphan execution of remote procedures

    RPC Semantics

    - At least once semantics

    - Exactly Once

    - At most once

  • Correctness Condition

    Given by Panzieri & Shrivastava

    Let Ci denote a call made by a machine and Wi the corresponding
    computation

    If C2 happened after C1 (C1 -> C2) and the computations W1 and W2
    share the same data, then to be correct in the presence of failures
    the RPC should satisfy:

    C1 -> C2 implies W1 -> W2

  • BITS Pilani

    Pilani Campus

    Advance Operating System

    (CS ZG623)

    First Semester 2014-2015

    PANKAJ VYAS

    Department of CS/IS

    ([email protected])

    Lecture 2

    Sunday, Aug 09 - 2014


  • Theoretical Foundations

  • Theoretical Aspects

    Limitations of Distributed Systems

    Logical Clocks

    Vector Clocks

  • Limitations of Distributed Systems

    Absence of Global clock

    Absence of Shared Memory

  • Example

    [Figure: two sites S1 and S2 connected by a communication channel,
    holding balances (a) $500 and $200, (b) $450 and $200, and (c) $500
    and $250, illustrating how recording local states at different
    moments during a $50 transfer can miss or double-count money in
    transit.]

  • 20

    Happened before relation:

    a -> b : Event a occurred before event b. Events in the same process.

    a -> b : If a is the event of sending a message m in a process and b is the event of receipt of the

    same message m by another process.

    a -> b, b -> c, then a -> c. -> is transitive.

    Causally Ordered Events

    a -> b : Event a causally affects event b

    Concurrent Events

    a || b: if a !-> b and b !-> a

    Lamport's Logical Clock

  • 21

    Space-time Diagram

    Time

    Space P1

    P2

    e11 e12 e13 e14

    e21 e22 e23 e24

    Internal

    Events Messages

  • 22

    Logical Clocks Conditions satisfied:

    Ci is clock in Process Pi.

    If a -> b in process Pi, Ci(a) < Ci(b)

    Let a: sending message m in Pi; b : receiving message m in Pj; then, Ci(a) < Cj(b).

    Implementation Rules: R1: Ci = Ci + d (d > 0); clock is updated between two

    successive events.

    R2: Cj = max(Cj, tm+ d); (d > 0); When Pj receives a message m with a time stamp tm (tm assigned by Pi, the sender; tm = Ci(a), a being the event of sending message m).

    A reasonable value for d is 1
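Rules R1 and R2 can be sketched as a small class (with d = 1; a minimal illustration, not code from the text):

```python
class LamportClock:
    """Sketch of Lamport's logical clock rules R1 and R2 with d = 1."""

    def __init__(self):
        self.c = 0

    def tick(self):
        # R1: update the clock between two successive events
        self.c += 1
        return self.c

    def send(self):
        # Sending is an event; the returned value is the timestamp tm
        return self.tick()

    def receive(self, tm):
        # R2: merge with the incoming message timestamp
        self.c = max(self.c, tm + 1)
        return self.c

p1, p2 = LamportClock(), LamportClock()
tm = p1.send()           # P1 sends: its clock becomes 1
t_recv = p2.receive(tm)  # P2 receives: its clock jumps to max(0, 1 + 1) = 2
```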

  • 23

    Space-time Diagram

    /

    Time

    Space

    P1(0)

    P2 (0)

    e11 e12 e13 e14

    e21 e22 e23 e24

    e15 e16 e17

    e25

    (1) (2) (3) (4) (5) (6) (7)

    (1) (2) (3) (4) (7)

  • 24

    Space-time Diagram

    /

    Time

    Space

    P1(0)

    P2 (0)

    a b c d e

    f g h i

  • 25

    Limitation of Lamports Clock

    Time

    Space P1

    P2

    e11 e12 e13

    e21 e22 e23

    P3 e31 e32 e33

    (1) (2) (3)

    (1)

    (1) (2)

    (3) (4)

    C(e11) < C(e32) but not causally related.

    This inter-dependency not reflected in Lamports Clock.

  • 26

    Vector Clocks Keep track of transitive dependencies among

    processes for recovery purposes.

    Ci[1..n]: is a vector clock at process Pi whose entries are the assumed/best guess clock values of different processes.

    Ci[j] (j != i) is the best guess of Pi for Pjs clock.

    Vector clock rules: Ci[i] = Ci[i] + d, (d > 0); for successive events in Pi

    For all k, Cj[k] = max (Cj[k],tm[k]), when a message m with time stamp tm is received by Pj from Pi.

    : For all

  • 27

    Vector Clocks Comparison

    1. ta = tb iff for all i, ta[i] = tb[i];

    2. ta != tb iff there exists an i, ta[i] != tb[i];

    3. ta
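A sketch of the vector clock rules and comparison relations (0-indexed process ids; incrementing the local entry on receive as well as on send is a common convention and an assumption here):

```python
class VectorClock:
    """Sketch of vector clock rules for process i out of n (d = 1)."""

    def __init__(self, i, n):
        self.i = i
        self.v = [0] * n

    def tick(self):
        # Rule for successive events in Pi: increment own entry
        self.v[self.i] += 1
        return list(self.v)

    def receive(self, tm):
        # Merge rule: Cj[k] = max(Cj[k], tm[k]) for all k, then count
        # the receive itself as a local event (an assumed convention)
        self.v = [max(a, b) for a, b in zip(self.v, tm)]
        return self.tick()

def leq(ta, tb):
    # ta <= tb iff every component of ta is <= the matching one in tb
    return all(a <= b for a, b in zip(ta, tb))

def concurrent(ta, tb):
    # ta || tb: neither timestamp causally precedes the other
    return not leq(ta, tb) and not leq(tb, ta)
```

Unlike Lamport clocks, `concurrent` lets us detect the missing inter-dependency from the earlier example: two timestamps that are incomparable component-wise belong to causally unrelated events.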

  • 28

    Vector Clock

    Time

    Space P1

    P2

    e11 e12 e13

    e21 e22 e23 e24

    P3 e31 e32

    (1,0,0) (2,0,0) (3,4,1)

    (0,1,0) (2,2,0) (2,4,1) (2,3,1)

    (0,0,1) (0,0,2)

  • 29

    Next Lecture

    Causal Order of Messages

  • BITS Pilani

    Pilani Campus

    Advance Operating System

    (CS ZG623)

    First Semester 2014-2015

    PANKAJ VYAS

    Department of CS/IS

    ([email protected])

    Lecture 3

    Saturday, Aug 16 - 2014

  • Today's Lecture

    Causal Order of Messages

    - BSS Algorithm

    - SES Algorithm

    Global State

    Cut

  • Causal ordering of Messages

    8/16/2014 CS ZG623, Adv OS 3

    [Figure: space-time diagram with processes P1, P2, and P3 and sends
    of M1, M2, and M3, illustrating deliveries that would violate causal
    order.]

  • Birman-Schiper-Stephenson (BSS) Protocol

    Before broadcasting m, process Pi increments vector time VT_Pi[i]
    and timestamps m.

    A process Pj (j != i) that receives m with timestamp VT_m from Pi
    delays delivery until both:

    VT_Pj[i] = VT_m[i] - 1    // Pj has received all previous messages from Pi

    VT_Pj[k] >= VT_m[k] for every k in {1, 2, ..., n} - {i}
    // Pj has received all messages also received by Pi before it sent m

    When Pj delivers m, VT_Pj is updated by rule IR2 of the vector clock.
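The two delivery conditions can be written as a predicate (0-indexed process ids; the names are illustrative):

```python
def bss_can_deliver(vt_pj, vt_m, i):
    """BSS delivery test at Pj for a broadcast m sent by Pi (index i).

    Deliver only if m is the next broadcast from Pi that Pj expects,
    and Pj has already received every message Pi had received before
    sending m; otherwise m is buffered.
    """
    next_from_sender = vt_m[i] == vt_pj[i] + 1
    seen_rest = all(vt_pj[k] >= vt_m[k]
                    for k in range(len(vt_m)) if k != i)
    return next_from_sender and seen_rest

# With P1 still at (0,0,0): a broadcast from P3 stamped (0,0,1) is
# deliverable, but one from P2 stamped (0,1,1) must wait for P3's.
```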

  • Example of BSS

    [Figure: P3 broadcasts a message stamped (0,0,1); P2 receives it and
    then broadcasts one stamped (0,1,1). P1 receives the (0,1,1) message
    first and buffers it, delivering it from the buffer only after the
    (0,0,1) message arrives.]

  • Schiper-Eggli-Sandoz (SES) protocol

    No need for broadcast messages.

    Each process maintains a vector V_P of size N - 1, N being the
    number of processes in the system.

    V_P is a vector of tuples (P, t): P is a destination process id and
    t a vector timestamp.

    tm: logical time of sending message m

    Tpi: present logical time at Pi

    Initially, V_P is empty.

  • SES Continued

    Sending a message (from P1 to P2):

    Send message M, timestamped tm, along with V_P1 to P2.

    Insert (P2, tm) into V_P1, overwriting the previous value of
    (P2, t), if any.

    (P2, tm) itself is not sent with M. Any future message carrying
    (P2, tm) in its vector cannot be delivered to P2 until tm < Tp2.

    Delivering a message:

    If V_M (in the message) does not contain any pair (P2, t), the
    message can be delivered.

    Otherwise /* (P2, t) exists in V_M */: if t !< Tp2, buffer the
    message (don't deliver); else (t < Tp2) deliver it.
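A sketch of the delivery test, reading t < Tp as component-wise comparison of vector timestamps (that reading, and all names, are assumptions for illustration):

```python
def ses_can_deliver(v_m, t_p, pid):
    """SES delivery test at process `pid` whose current vector time is t_p.

    v_m maps destination process ids to vector timestamps (the V_M that
    accompanied the message). Deliver unless v_m records a constraint
    for `pid` whose timestamp the process has not yet reached.
    """
    if pid not in v_m:
        return True                                   # no constraint for pid
    t = v_m[pid]
    return all(a <= b for a, b in zip(t, t_p))        # t < Tp: caught up

# From the lecture's Figure 1: M3 reaches P1 carrying {P1: (0,1,0)} while
# P1's clock is still (0,0,0), so M3 is buffered; once P1 has delivered M1
# and its clock is (1,1,0), the buffered M3 becomes deliverable.
```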

  • SES Algorithm Example (Figure 1)

    [Figure 1: processes P1, P2, and P3, each starting at vector time
    (0,0,0) with an empty vector VP1(__, __), VP2(__, __), VP3(__, __).
    P2 sends M1 to P1 (delivered at point y) and M2 to P3 (delivered at
    point a); P3 sends M3 to P1 at point b, and M3 arrives at P1 at
    point x.]

    Initially the vector time is (0,0,0) at all sites and the vector at
    each site is empty. Note that here we take the size of the vector at
    each site as 2; if you want, you can easily extend it to 3.

  • Vectors Maintained at Each Site

    VP1 (__, __): the first entry is updated when P1 sends any message
    to P2; the second when P1 sends any message to P3.

    VP2 (__, __): the first entry is updated when P2 sends any message
    to P1; the second when P2 sends any message to P3.

    VP3 (__, __): the first entry is updated when P3 sends any message
    to P1; the second when P3 sends any message to P2.

  • Consider Sending of Message M1 from P2 to P1 of Figure 1

    Send step at P2:

    1. P2 sends message M1 to P1. The vector accompanying the message,
    VM (__, __), is empty, so P2 sends the message {(__, __), (0,1,0)}
    to P1.

    2. After sending the message, P2 updates its own vector VP2. P1's
    corresponding entry in VP2 (the vector of process P2) is empty, so
    the pair {P1, (0,1,0)} is inserted into VP2. Before sending M1, VP2
    is (__, __); after sending M1 it becomes {(P1, (0,1,0)), __}.

    Important point: M1 is sent with the earlier VP2 (as VM), i.e. the
    VP2 existing at P2 before M1 was sent. VP2 gets its new entry only
    after M1 is sent.

  • Receiving of M1 by P1 at Point y of Figure 1

    P1 receives from P2 a message of the form {(__, __), (0,1,0)}, i.e.
    (VM, t).

    Check:

    1. If the accompanying VM contains no entry corresponding to P1,
    deliver the message. [This condition is satisfied here.]

    Otherwise (only if the condition of step 1 is not satisfied):

    2. Compare the timestamp t with TP1, the current time at P1, which
    is (0,0,0) before the receipt. If t < TP1, deliver the message;

    3. else buffer the message [t !< TP1].

    For Figure 1 only step 1 applies, which is why M1 is delivered at
    point y at P1. [Assume M3 has not been received at x.]

  • What Happens at P1 when the Message Is Delivered at Point y

    STEP A: merge VP1 and VM, updating VP1 with the values of VM.

    1. If some process P (other than P1) has an entry in VM but none in
    VP1, insert that entry into VP1.

    2. If some process P (other than P1) has an entry in both VM and
    VP1, update its time in VP1 to max(a, b), where a is the time in VM
    and b the time in VP1.

    In our case, after receiving M1, both VP1 and VM are (__, __), so
    the merge leaves VP1 empty: (__, __).

    STEP B: update P1's logical clock; the time at P1 becomes (1,1,0).

    STEP C: check for buffered messages at P1; there are none.

  • Complete State of the System after P1 Receives M1

    Process  Vector                                               Time
    P1       VP1 (__, __) [empty]                                 (1,1,0) at y
    P2       VP2 {(P1,(0,1,0)), __} (updated after sending M1)    (0,1,0)
    P3       VP3 (__, __) [empty]                                 (0,0,0)

  • Now Consider the Case when M2 Is Sent from P2 to P3

    Send step of M2 from P2 to P3:

    1. P2 sends message M2 to P3 with VM = {(P1,(0,1,0)), __} and
    timestamp (0,2,0), so the sent message looks like
    {({P1,(0,1,0)}, __), (0,2,0)}.

    2. After sending, P2 updates its own vector VP2. P2's entry
    corresponding to P3 is empty, so VP2 becomes
    {(P1,(0,1,0)), (P3,(0,2,0))}. The state of the system is then:

    Process  Vector                                                Time
    P1       VP1 (__, __) [empty]                                  (1,1,0) at y
    P2       VP2 {(P1,(0,1,0)), (P3,(0,2,0))} after sending M2     (0,2,0)
    P3       VP3 (__, __) [empty]; P3 has not received M2 yet      (0,0,0)

  • Receiving of M2 by P3 at Point a of Figure 1

    P3 receives from P2 a message of the form
    [{(P1,(0,1,0)), __}, (0,2,0)], i.e. [VM, t].

    Check:

    1. If the accompanying VM contains no entry corresponding to P3,
    deliver the message. [This condition is satisfied here: VM's entry
    for P3 is empty.]

    Otherwise (only if the condition of step 1 is not satisfied):

    2. Compare the timestamp t with TP3, the current time at P3, and
    deliver if t < TP3, else buffer. (Not needed here, since step 1
    already allows delivery.)

  • What Happens at P3 when Message M2 Is Delivered at Point a

    STEP A: merge VP3 and VM, updating VP3 with the values of VM.

    1. If some process P (other than P3) has an entry in VM but none in
    VP3, insert that entry into VP3.

    2. If some process P (other than P3) has an entry in both VM and
    VP3, update its time in VP3 to max(a, b), where a is the time in VM
    and b the time in VP3.

    In this case VM is [{P1,(0,1,0)}, __] and VP3 is (__, __). [VM
    contains an entry for P1: the step 1 condition.] So VP3 after the
    merge is [{P1,(0,1,0)}, __].

    STEP B: update P3's logical clock; the time at P3 becomes (0,2,1).

    STEP C: check for buffered messages at P3; there are none.

  • Complete State of the System after P3 Receives M2 at Point a

    Process  Vector                                                Time
    P1       VP1 (__, __) [empty]                                  (1,1,0) at y
    P2       VP2 {(P1,(0,1,0)), (P3,(0,2,0))} (updated after
             sending M2)                                           (0,2,0)
    P3       VP3 {(P1,(0,1,0)), __}                                (0,2,1) at a

  • Now Consider the Case when M3 Is Sent from P3 to P1 at Point b of
    Figure 1

    Send step of M3 from P3 to P1:

    1. P3 sends message M3 to P1 with VM = {(P1,(0,1,0)), __} and
    timestamp (0,2,2), so the sent message looks like
    {({P1,(0,1,0)}, __), (0,2,2)}.

    2. After sending M3, P3 updates its own vector VP3. VP3 already
    contains an entry for P1, so that entry is simply updated with the
    latest time value. The complete state after P3 sends M3:

    Process  Vector                                                Time
    P1       VP1 (__, __); P1 has not received M3 so far           (1,1,0) at y
    P2       VP2 {(P1,(0,1,0)), (P3,(0,2,0))} after sending M2     (0,2,0)
    P3       VP3 {(P1,(0,2,2)), __}; P3 has sent M3                (0,2,2) at b

  • Receiving of M3 by P1 at Point x of Figure 1

    P1 receives from P3 a message of the form
    [{(P1,(0,1,0)), __}, (0,2,2)], i.e. [VM, t].

    Check:

    1. If the accompanying VM contains no entry corresponding to P1,
    deliver the message. [This condition does not hold here, because the
    accompanying VM contains an entry corresponding to P1.]

    Otherwise:

    2. When P1 receives M3 at point x, the current time there, TP1, is
    (0,0,0). The message arrived timestamped t = (0,2,2); clearly
    t !< TP1, so the message is buffered.


  • Global State: The Model


    Node properties:

    No shared memory

    No global clock

    Channel properties:

    FIFO

    loss free

    non-duplicating

  • The Need for Global State

    Many problems in Distributed Computing can be

    cast as executing some action on reaching a

    particular state

    e.g. - Distributed deadlock detection is finding a cycle in the
    wait-for graph

    - Termination detection

    - Checkpointing

    and many more.

  • Difficulties due to Non Determinism

    Deterministic Computation

    - At any point in computation there is at

    most one event that can happen next.

    Non-Deterministic Computation

    - At any point in computation there can be

    more than one event that can happen next.

  • Global State Example

    [Figure: a sequence of recorded states, Global State 1, Global
    State 2, and Global State 3.]

  • Recording Global State

    Suppose the global state of A is recorded in (1) but not in (2),
    while the states of B, C1, and C2 are recorded in (2). An extra $50
    then appears in the global state, because A's state was recorded
    before the message was sent while C1's state was recorded after it
    was sent.

    The global state is inconsistent if n < n', where

    n is the number of messages sent by A along the channel before A's
    state was recorded, and

    n' is the number of messages sent by A along the channel before the
    channel's state was recorded.

    Consistent global state: n = n'

  • Continued

    Similarly, for consistency, m = m', where

    m is the number of messages received along the channel before B's state was recorded, and

    m' is the number of messages received along the channel by B before the channel's state was recorded.

    Also, n >= m, since the number of messages received along a channel cannot exceed the number sent along it; hence n' >= m'.

    A consistent global state must satisfy these relations.

    Channel state in a consistent global state: the sequence of messages sent before the sender's state was recorded, excluding the messages received before the receiver's state was recorded.

    Only in-transit messages are recorded in the channel state.
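In code form, the channel-state rule above amounts to a set difference over the messages crossing the channel (a minimal sketch with made-up message names):

```python
# Messages sent on the channel before the sender's state was recorded.
sent_before_record = ["m1", "m2", "m3"]
# Messages received on the channel before the receiver's state was recorded.
received_before_record = ["m1"]

# Channel state = sent-but-not-yet-received messages (in transit).
channel_state = [m for m in sent_before_record
                 if m not in received_before_record]
# channel_state is ["m2", "m3"]
```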

  • Notion of Consistency: Example

    Figure: space-time diagram of two processes p and q, with local states Sp0, Sp1, Sp2, Sp3 and Sq0, Sq1, Sq2, Sq3 and messages m1, m2, m3 exchanged between them.

  • A Consistent State?

    Figure: the same space-time diagram with a cut through the local states Sp1 and Sq1.

  • Yes

    The cut {Sp1, Sq1} is consistent: every message received before the cut was also sent before it.

  • What about this?

    Figure: a cut through the local states Sp2 and Sq3.

    Yes — this cut is also consistent.

  • What about this?

    Figure: a cut through the local states Sp1 and Sq3.

    No — this cut is inconsistent: a message receive lies before the cut while its send lies after it.

  • Chandy-Lamport GSR algorithm


    Sender (Process p)

    Record the state of p

    For each outgoing channel (c) incident on p, send a marker before sending any other messages

    Receiver (q receives marker on c1)

    If q has not yet recorded its state

    Record the state of q

    Record the state of c1 as null

    For each outgoing channel (c) incident on q , send a marker before sending any other message.

    If q has already recorded its state

    Record the state of c1 as all the messages received on c1 since q's state was recorded.
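The marker rules above can be sketched as follows (a minimal, single-process view of the algorithm; the class and method names are my own, not from the slides):

```python
class Snapshot:
    """Per-process recording logic of the Chandy-Lamport algorithm
    (a sketch: markers and messages are delivered by direct calls)."""

    def __init__(self):
        self.local_state = None      # recorded process state
        self.channel_state = {}      # channel -> messages recorded in transit
        self.recording = set()       # incoming channels still being recorded

    def start(self, state, incoming):
        # Record own state and start recording every incoming channel.
        self.local_state = state
        for c in incoming:
            self.channel_state[c] = []
            self.recording.add(c)

    def on_marker(self, channel, state, incoming):
        if self.local_state is None:
            self.start(state, incoming)        # first marker: record now,
            self.channel_state[channel] = []   # this channel's state is empty
        self.recording.discard(channel)        # a marker closes the channel

    def on_message(self, channel, msg):
        # An application message arriving while the channel is being
        # recorded was in transit at snapshot time.
        if channel in self.recording:
            self.channel_state[channel].append(msg)

# q initiates the snapshot with local state 500, then receives an
# in-transit transfer of 50 on channel "pq" before p's marker arrives.
q = Snapshot()
q.start(state=500, incoming=["pq"])
q.on_message("pq", 50)
q.on_marker("pq", state=None, incoming=["pq"])
```

After the marker arrives, q has recorded its local state (500) and the channel state ([50] in transit on "pq"), matching the rule that only transit messages enter the channel state.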

  • Uses of GSR

    Recording a consistent state of the global computation:

    - checkpointing for fault tolerance (rollback, recovery)
    - testing and debugging
    - monitoring and auditing

    Detecting stable properties in a distributed system via snapshots. A property is stable if, once it holds in a state, it holds in all subsequent states.

    Examples: termination, deadlock, garbage collection.

  • State Recording Example

    Let p transfer $100 to q, let q transfer $50 to p and $30 to r, and let p want to record the global state.

    Figure (three successive snapshots): the recorded values shown are 400 at p, the amounts 50 and 30 in transit, and recorded balances 520 and 530 appearing in the later snapshots.

  • Cut


    A cut is a set of cut events, one per node, each of which

    captures the state of the node on which it occurs. It is

    also a graphical representation of a global state.

  • Consistent Cut

    A cut C = {c1, c2, c3, ...} is consistent if there are no events ei and ej such that:

    (ei --> ej) and (ej --> cj) and (ei -/-> ci), for cut events ci, cj in C

    Figure: an example of an inconsistent cut, in which a message's receive event lies inside the cut while its send event does not.

  • Ordering of Cut events


    The cut events in a consistent cut are not causally related. Thus, the cut is a set of concurrent events and a set of concurrent events is a cut.

    Note, in this inconsistent cut, c3 --> c2.

  • Termination detection

    Question: in a distributed computation, when have all of the processes become idle (i.e., when has the computation terminated)?

  • Huangs algorithm

    The computation starts when the controlling agent sends the first message and terminates when all processes are idle.

    The role of weights:

    - the controlling agent initially has a weight of 1 and all other processes have a weight of zero,
    - when a process sends a message, a portion of the sender's weight is put in the message, reducing the sender's weight,
    - a receiver adds the weight of a received message to its own weight,
    - on becoming idle, a process sends its weight to the controlling agent,
    - the sum of all weights is always 1.

    The computation has terminated when the weight of the controlling agent returns to 1.
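A toy run of this weight-throwing scheme, using exact fractions so the invariant "all weights sum to 1" is easy to check (the node and function names are illustrative, not from the slides):

```python
from fractions import Fraction

class Node:
    def __init__(self):
        self.weight = Fraction(0)
        self.active = False

def send(sender, receiver):
    # The sender puts half of its weight into the activation message.
    part = sender.weight / 2
    sender.weight -= part
    receiver.weight += part
    receiver.active = True

def go_idle(node, agent):
    # An idle process returns its whole weight to the controlling agent.
    agent.weight += node.weight
    node.weight = Fraction(0)
    node.active = False

agent = Node()
agent.weight = Fraction(1)          # controlling agent starts with weight 1
agent.active = True

a, b = Node(), Node()
send(agent, a)                      # computation starts
send(a, b)                          # a activates b
go_idle(b, agent)                   # b finishes, returns 1/4
go_idle(a, agent)                   # a finishes, returns 1/4
# agent.weight is back to 1 -> the computation has terminated
```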


  • BITS Pilani

    Pilani Campus

    Advance Operating System

    (CS ZG623)

    First Semester 2014-2015

    PANKAJ VYAS

    Department of CS/IS

    ([email protected])

    Lecture 4

    Saturday, Aug 23-2014

  • Distributed Mutual Exclusion

    - Non-Token Based Algorithms (Will be Covered Today)

    1. Lamport's Algorithm

    2. Ricart-Agrawala Algorithm

    3. Maekawa's Algorithm

    - Token Based Algorithms (Next Class)

    1. Suzuki-Kasami Algorithm

    2. Singhal's Heuristic Search

    3. Raymond's Tree-Based Algorithm

    Today's Lecture

  • Mutual Exclusion

    Access to a shared resource by several un-coordinated user requests must be serialized.

    Actions performed on the shared resource on a user's behalf must be atomic.

    Example: directory management

  • Classification of Mutual Exclusion Algorithms

    Non-token based:

    A site/process can enter a critical section when an

    assertion (condition) becomes true.

    Algorithm should ensure that the assertion will be true

    in only one site/process.

    Token based:

    A unique token (a known, PRIVILEGE message) is shared

    among cooperating sites/processes.

    Possessor of the token has access to critical section.

    Need to take care of conditions such as loss of token,

    crash of token holder, possibility of multiple tokens, etc.

  • General System Model

    At any instant, a site may have several requests for the critical section (CS), queued up and serviced one at a time.

    Site states: requesting CS, executing CS, idle (neither requesting nor executing the CS).

    Requesting CS: blocked until granted access; cannot make additional requests for the CS.

    Executing CS: using the CS.

    Idle: no CS activity at the site. In token-based approaches, an idle site can hold the token.

  • Mutual Exclusion: Requirements

    Freedom from deadlocks: two or more sites should not endlessly wait on conditions/messages that never

    become true/arrive.

    Freedom from starvation: No indefinite waiting.

    Fairness: Order of execution of CS follows the order of the requests for CS. (equal priority).

    Fault tolerance: recognize faults, reorganize, continue. (e.g., loss of token).

  • Performance

    Number of messages per CS invocation: should be minimized.

    Synchronization delay, i.e., the time between one site leaving the CS and the next site entering it: should be minimized.

    Response time: the interval between a request's messages being sent and the site's exit from the CS.

    System throughput, i.e., the rate at which the system executes requests for the CS: should be maximized.

    If sd is the synchronization delay and E the average CS execution time: system throughput = 1 / (sd + E).
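For concreteness, the throughput formula with illustrative values T = 10 ms and E = 40 ms, assuming a synchronization delay of T (the values are my own, not from the slides):

```python
# Sample values: T = message delay, E = mean CS execution time (seconds).
T, E = 0.01, 0.04
sd = T                      # synchronization delay, e.g. Lamport / Ricart-Agrawala
throughput = 1 / (sd + E)   # CS executions per second
# throughput == 20.0 (one CS execution every 50 ms)
```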

  • Performance metrics

    Figure: a timeline illustrating the synchronization delay (from the last site exiting the CS to the next site entering it) and the response time (from a CS request arriving and its messages being sent, through entering the CS, to exiting the CS after execution time E).

  • Performance ...

    Low and High Load:

    Low load: No more than one request at a given point in time.

    High load: Always a pending mutual exclusion request at a site.

    Best and Worst Case:

    Best Case (low loads): Round-trip message delay + Execution time. 2T + E.

    Worst case (high loads).

    Message traffic: low at low loads, high at high loads.

    Average performance: when load conditions fluctuate widely.

  • Simple Solution

    Control site: grants permission for CS execution.

    A site sends a REQUEST message to the control site.

    The controller grants access one by one.

    Synchronization delay: 2T -> a site releases the CS by sending a message to the controller, and the controller sends permission to another site.

    System throughput: 1/(2T + E). If the synchronization delay is reduced to T, throughput doubles.

    The controller becomes a bottleneck; congestion can occur.

  • Non-token Based Algorithms

    Notations:

    Si: site i

    Ri: request set, containing the ids of all sites from which Si must receive permission before accessing the CS.

    Non-token based approaches use timestamps to order requests for the CS.

    Smaller timestamps get priority over larger ones.

    Lamport's Algorithm

    Ri = {S1, S2, ..., Sn}, i.e., all sites.

    Request queue: maintained at each Si, ordered by timestamps.

    Assumption: messages are delivered in FIFO order.

  • Lamport's Algorithm

    Requesting CS: send REQUEST(tsi, i) to all sites, where (tsi, i) is the request timestamp, and place the REQUEST in request_queuei.

    On receiving the message, Sj sends a time-stamped REPLY message to Si and places Si's request in request_queuej.

    Executing CS: Si enters the CS when (i) it has received a message with timestamp larger than (tsi, i) from all other sites, and (ii) its own request is at the top of request_queuei.

    Releasing CS: on exiting the CS, send a time-stamped RELEASE message to all sites in the request set.

    On receiving the RELEASE message, Sj removes Si's request from its queue.
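The entry conditions above can be sketched as a small synchronous simulation (a simplification: REPLY timestamps are approximated as ts + 1 and message passing is replaced by direct updates; this is not the full algorithm):

```python
import heapq

class Site:
    def __init__(self, sid):
        self.sid = sid
        self.queue = []    # min-heap of (timestamp, site_id) requests
        self.replies = {}  # highest REPLY timestamp seen from each other site

def request(sites, i, ts):
    # Si broadcasts REQUEST(ts, i); every site queues it; the others reply.
    for s in sites:
        heapq.heappush(s.queue, (ts, i))
        if s.sid != i:
            sites[i].replies[s.sid] = ts + 1  # simplified REPLY clock value

def can_enter(sites, i, ts):
    # Si enters the CS iff its request heads its own queue and every other
    # site has been heard from with a timestamp larger than (ts, i).
    me = sites[i]
    return me.queue[0] == (ts, i) and all(t > ts for t in me.replies.values())

def release(sites, i):
    # RELEASE removes Si's request (now at the head) from every queue.
    for s in sites:
        heapq.heappop(s.queue)

sites = [Site(i) for i in range(3)]
request(sites, 1, 1)   # S1 requests at timestamp 1
request(sites, 0, 2)   # S0 requests at timestamp 2
```

With both requests pending, only site 1 (the smaller timestamp) can enter; after it releases, site 0's request reaches the head of every queue.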

  • Lamport's Algorithm

    Performance:

    3(N-1) messages per CS invocation: (N-1) REQUEST, (N-1) REPLY, (N-1) RELEASE messages.

    Synchronization delay: T.

    Optimization:

    Suppress REPLY messages: e.g., if Sj receives a REQUEST message from Si after sending its own REQUEST message with a timestamp higher than Si's, it need not send a REPLY.

    Messages reduced to between 2(N-1) and 3(N-1).

  • Lamport's Algorithm: Example

    Figure (Steps 1-2): S1 requests with timestamp (2,1) and S2 with (1,2); both requests are broadcast and queued at S1, S2 and S3 in the order (1,2), (2,1). S2 enters the CS, since its request has the smaller timestamp and is at the head of every queue.

    Figure (Steps 3-4): S2 leaves the CS and sends RELEASE messages; its request (1,2) is removed from all queues, leaving (2,1) at the head, and S1 then enters the CS.

  • Ricart-Agrawala Algorithm

    Requesting CS: Si sends a time-stamped REQUEST message.

    Sj sends a REPLY to Si if:

    Sj is neither requesting nor executing the CS, or

    Sj is requesting the CS and Si's timestamp is smaller than that of Sj's own request.

    The request is deferred otherwise.

    Executing CS: after Si has received a REPLY from all sites in its request set.

    Releasing CS: send a REPLY to all deferred requests; i.e., a site's REPLY messages are withheld only by sites with smaller timestamps.
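Sj's reply decision can be captured in a small predicate (a sketch; the state names and parameter names are my own):

```python
def should_reply(j_state, j_req_ts, i_req_ts):
    """Sj's decision on an incoming REQUEST from Si (Ricart-Agrawala).

    j_state: 'idle', 'requesting', or 'executing'.
    Timestamps are (clock, site_id) pairs compared lexicographically,
    so ties on the clock are broken by the site id.
    """
    if j_state == 'idle':
        return True                   # neither requesting nor executing
    if j_state == 'requesting':
        return i_req_ts < j_req_ts    # the smaller timestamp wins
    return False                      # executing: defer the REPLY
```

For example, a requesting Sj with timestamp (2,1) replies to a request stamped (1,2), but defers one stamped (3,2).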

  • Ricart-Agrawala: Performance

    Performance: 2(N-1) messages per CS execution: (N-1) REQUEST + (N-1) REPLY.

    Synchronization delay: T.

    Optimization [proposed by Carvalho and Roucairol]: when Si receives a REPLY message from Sj, it retains the authorization to access the CS until Sj sends a REQUEST message and Si sends a REPLY message.

    Si can access the CS repeatedly until then.

    A site requests permission from a dynamically varying set of sites: 0 to 2(N-1) messages.

  • Ricart-Agrawala: Example

    Figure (Steps 1-2): S1 broadcasts REQUEST (2,1) and S2 broadcasts REQUEST (1,2). S2 receives REPLY messages from S1 and S3, while S2 defers its REPLY to S1 because S1's timestamp is larger. S2 enters the CS.

    Figure (Step 3): on leaving the CS, S2 sends the deferred REPLY to S1, and S1 enters the CS.

  • Maekawa's Algorithm

    A site requests permission only from a subset of sites.

    Request sets of sites Si & Sj: Ri, Rj are such that Ri and Rj have at least one common site (Sk). Sk mediates conflicts between Ri and Rj.

    A site can send only one REPLY message at a time, i.e., a site can send a REPLY message only after receiving a RELEASE message for the previous REPLY.

    Request set rules:

    Sets Ri and Rj have at least one common site.

    Si is always in Ri.

    Cardinality of Ri, i.e., the number of sites in Ri, is K.

    Any site Si is contained in K request sets. N = K(K - 1) + 1 -> K is approximately the square root of N.

  • Maekawa's Algorithm ...

    Requesting CS: Si sends REQUEST(i) to the sites in Ri.

    Sj sends a REPLY to Si if Sj has NOT sent a REPLY message to any site since it received the last RELEASE message.

    Otherwise, Sj queues up Si's request.

    Executing CS: after getting a REPLY from all sites in Ri.

    Releasing CS: send RELEASE(i) to all sites in Ri.

    Any Sj, on receiving a RELEASE message, sends a REPLY message to the next request in its queue.

    If the queue is empty, Sj updates its status to indicate receipt of the RELEASE.

  • Request Subsets

    Example k = 2; (N = 3).

    R1 = {1, 2}; R3 = {1, 3}; R2 = {2, 3}

    Example k = 3; N = 7.

    R1 = {1, 2, 3}; R4 = {1, 4, 5}; R6 = {1, 6, 7};

    R2 = {2, 4, 6}; R5 = {2, 5, 7}; R7 = {3, 4, 7};

    R3 = {3, 5, 6}
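The K = 3, N = 7 request sets above can be checked mechanically against the three rules (membership, cardinality, pairwise intersection):

```python
# The N = 7 request sets from the slide (K = 3).
R = {
    1: {1, 2, 3}, 2: {2, 4, 6}, 3: {3, 5, 6},
    4: {1, 4, 5}, 5: {2, 5, 7}, 6: {1, 6, 7}, 7: {3, 4, 7},
}

for i in R:
    assert i in R[i]           # Si is always in Ri
    assert len(R[i]) == 3      # |Ri| = K = 3

for i in R:
    for j in R:
        assert R[i] & R[j]     # every pair of sets shares a mediator site

# Every site appears in exactly K = 3 request sets.
for i in R:
    assert sum(i in R[j] for j in R) == 3
```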

  • Maekawa's Algorithm ...

    Performance: synchronization delay 2T.

    Messages: 3*sqrt(N) (one REQUEST, REPLY, and RELEASE per site in the request set).

    Deadlocks: message deliveries are not ordered.

    Assume Si, Sj, Sk concurrently request the CS, with Ri ∩ Rj = {Sij}, Rj ∩ Rk = {Sjk}, Rk ∩ Ri = {Ski}.

    It is possible that:

    Sij is locked by Si (forcing Sj to wait at Sij),

    Sjk by Sj (forcing Sk to wait at Sjk),

    Ski by Sk (forcing Si to wait at Ski)

    -> deadlock among Si, Sj, and Sk.

  • Handling Deadlocks in Maekawa's Algorithm

    A site yields to a request with a smaller timestamp.

    A site suspects a deadlock when it is locked by a request with a higher timestamp (lower priority).

    Deadlock handling messages:

    FAILED: from Si to Sj -> Si has granted permission to a higher priority request.

    INQUIRE: from Si to Sj -> Si would like to know whether Sj has succeeded in locking all sites in Sj's request set.

    YIELD: from Si to Sj -> Si is returning permission to Sj so that Sj can yield to a higher priority request.

  • Handling Deadlocks

    REQUEST(tsi, i) arrives at Sj while Sj is locked by Sk -> Sj sends FAILED to Si if Si's request has the higher timestamp; otherwise, Sj sends INQUIRE(j) to Sk.

    INQUIRE(j) arrives at Sk -> Sk sends YIELD(k) to Sj if Sk has received a FAILED message from a site in its request set, or if Sk has sent a YIELD and has not received a new REPLY.

    YIELD(k) arrives at Sj -> Sj assumes it has been released by Sk, places Sk's request in its queue appropriately, and sends REPLY(j) to the top request in its queue.

    Sites may exchange these messages even if there is no real deadlock. Maximum number of messages per CS request: 5*sqrt(N).

  • Next Class

    Token-Based Algorithms For Distributed

    Mutual Exclusion


  • BITS Pilani

    Pilani Campus

    Advance Operating System

    (CS ZG623)

    First Semester 2014-2015

    PANKAJ VYAS

    Department of CS/IS

    ([email protected])

    Lecture 5

    Sunday, Aug 24-2014

  • Distributed Mutual Exclusion

    - Non-Token Based Algorithms

    1. Lamport's Algorithm (Covered in Previous Class)

    2. Ricart-Agrawala Algorithm (Covered in Previous Class)

    3. Maekawa's Algorithm (Will be Covered Today)

    - Token Based Algorithms (Will be Covered Today)

    1. Suzuki-Kasami Algorithm

    2. Singhal's Heuristic Search

    3. Raymond's Tree-Based Algorithm

    Today's Lecture


  • Token-based Algorithms

    A unique token circulates among the participating sites.

    A site can enter the CS if it holds the token.

    Token-based approaches use sequence numbers instead of timestamps.

    A request for the token contains a sequence number.

    Sequence numbers of sites advance independently.

    The correctness issue is trivial, since only one token is present -> only one site can enter the CS.

    Deadlock and starvation issues remain to be addressed.

  • Suzuki-Kasami Algorithm

    If a site without the token needs to enter the CS, it broadcasts a REQUEST-for-token message to all other sites.

    Token: (a) a queue Q of requesting sites; (b) an array LN[1..N], where LN[j] is the sequence number of the most recent request executed by site j.

    The token holder sends the token to a requestor if it is not inside the CS; otherwise, it sends it after exiting the CS.

    The token holder can make multiple CS accesses.

    Design issues:

    Distinguishing outdated REQUEST messages.

    Format: REQUEST(j, n) -> site j making its nth request.

    Each site keeps RNi[1..N], where RNi[j] is the largest sequence number of any request seen from j.

    Determining which sites have outstanding token requests:

    if LN[j] = RNi[j] - 1, then Sj has an outstanding request.

  • Suzuki-Kasami Algorithm ...

    Passing the token:

    After finishing the CS (assuming Si has the token), set LN[i] := RNi[i].

    The token consists of Q and LN; Q is a queue of requesting sites.

    The token holder checks, for each j, whether RNi[j] = LN[j] + 1; if so, it places j in Q.

    Send the token to the site at the head of Q.

    Performance:

    0 to N messages per CS invocation.

    Synchronization delay is 0 (if the token holder repeats the CS) or T.
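A minimal sketch of the token bookkeeping for three sites (a synchronous simulation in which broadcasts update the RN arrays directly; not a message-passing implementation):

```python
N = 3
# RN[i][j]: the largest request sequence number Si has seen from Sj.
RN = [[0] * N for _ in range(N)]
# The token: served-request counters LN, queue Q, and current holder.
token = {"LN": [0] * N, "Q": [], "holder": 0}

def request_cs(i):
    if token["holder"] == i:
        return                        # already holds the token
    RN[i][i] += 1
    for j in range(N):                # broadcast REQUEST(i, RN[i][i])
        RN[j][i] = max(RN[j][i], RN[i][i])

def release_cs(i):
    token["LN"][i] = RN[i][i]         # record Si's completed request
    for j in range(N):                # enqueue sites with outstanding requests
        if j not in token["Q"] and RN[i][j] == token["LN"][j] + 1:
            token["Q"].append(j)
    if token["Q"]:                    # pass the token to the head of Q
        token["holder"] = token["Q"].pop(0)

request_cs(1)   # site 1 asks for the token, which site 0 holds
release_cs(0)   # site 0 finishes its CS and passes the token to site 1
```

After this run the token holder is site 1 and the queue is empty, mirroring the slide's five-site walk-through on a smaller scale.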

  • Suzuki-Kasami Example

    Consider 5 sites S1..S5. Initially RN1 = RN2 = RN3 = RN4 = RN5 = [0,0,0,0,0], the token's LN = [0,0,0,0,0], the token queue Q is empty, and the token is at site S1.

  • Site S1 wants to enter CS

    S1 does not execute the request part of the algorithm.

    S1 enters the CS, as it holds the token.

    When S1 releases the CS, it performs the following tasks:

    (i) LN[1] := RN1[1] = 0, i.e., the same as before.

    (ii) If, during S1's CS execution, a request from another site (say S2) has arrived, then that site's id is entered into the token queue, provided RN1[2] = LN[2] + 1.

    So if the site holding the token (not having received it from another site) enters the CS, the overall state of the system does not change.

  • Site S2 wants to enter CS

    S2 does not have the token, so it sends REQUEST(2,1) to every other site.

    Sites S1, S3, S4 and S5 update their RN arrays to [0,1,0,0,0].

    S2's request may arrive at S1 while S1 is executing the CS; in that case S1 does not pass the token immediately. It finishes its CS and then adds S2 to the token queue, because at that point RN1[2] = 1 = LN[2] + 1.

    Now, since the queue is not empty, S1 passes the token to S2, removing S2's entry from the token queue.

  • Site S2 Sends REQUEST(2,1)

    Figure: every site's RN becomes [0,1,0,0,0]; the token (still at S1) has LN = [0,0,0,0,0] and Q = {S2} (S2 is inserted only when S1 leaves the CS).

  • Site S1 Sends Token to S2

    Figure: the RN arrays remain [0,1,0,0,0] at every site; the token is now at S2, with LN = [0,0,0,0,0] and Q = {}.

    LN

  • Site S2 Leaves CS

    Figure: the token stays at S2 with Q = {}; LN is updated to [0,1,0,0,0], recording that S2's first request has been served.

  • Now suppose S2 again wants to enter CS

    S2 can enter the CS directly: the token queue is empty (no other site has requested the CS) and the token is currently held by S2.

    When S2 releases the CS, it updates LN[2] := RN2[2], which is still 1.

  • Salient Points

    A site updates its RN table as soon as it receives a REQUEST message from any other site. (At this point the site may hold an idle token or may be executing the CS.)

    When a site enters the CS without sending a request message (i.e., it re-enters the CS while holding the token), its RN entry and the token's LN entry remain unchanged.

    A site can be added to the token queue only by the site that is releasing the CS.

    A site holding the token can repeatedly enter the CS until it sends the token to some other site.

    A site gives priority to other sites with outstanding CS requests over its own pending requests.

  • Singhal's Heuristic Algorithm

    Instead of broadcasting, each site maintains information on the other sites and guesses which sites are likely to have the token.

    Data structures: Si maintains SVi[1..N] and SNi[1..N] for storing information on other sites: their state and highest known sequence number.

    The token contains 2 arrays: TSV[1..N] and TSN[1..N].

    States of a site:

    R : requesting CS

    E : executing CS

    H : holding token, idle

    N : none of the above

    Initialization: SVi[j] := N for j = N .. i; SVi[j] := R for j = i-1 .. 1; SNi[j] := 0 for j = 1..N. S1 (site 1) is in state H.

    Token: TSV[j] := N & TSN[j] := 0, j = 1 .. N.

  • Singhal's Heuristic Algorithm

    Requesting CS: if Si has no token and requests the CS:

    SVi[i] := R. SNi[i] := SNi[i] + 1.

    Send REQUEST(i, sn) to all sites Sj for which SVi[j] = R (sn: sequence number, the updated value of SNi[i]).

    Receiving REQUEST(i, sn) at Sj: if sn <= SNj[i], the message is outdated and is ignored. Otherwise set SNj[i] := sn and act on Sj's own state:

    SVj[j] = N -> SVj[i] := R.

    SVj[j] = R -> if SVj[i] != R, set it to R and send REQUEST(j, SNj[j]) to Si; else do nothing.

    SVj[j] = E -> SVj[i] := R.

    SVj[j] = H -> SVj[i] := R, TSV[i] := R, TSN[i] := sn, SVj[j] := N. Send the token to Si.

    Executing CS: after getting the token, set SVi[i] := E.

  • Singhal's Heuristic Algorithm

    Releasing CS: SVi[i] := N, TSV[i] := N. Then, for every other Sj:

    if SNi[j] > TSN[j], then {TSV[j] := SVi[j]; TSN[j] := SNi[j]}

    else {SVi[j] := TSV[j]; SNi[j] := TSN[j]}

    If SVi[j] = N for all j, then set SVi[i] := H; else send the token to a site Sj with SVi[j] = R.

    Fairness depends on the choice of Sj, since no queue is maintained in the token; arbitration rules are used to ensure fairness.

    Performance: low to moderate loads: an average of N/2 messages.

    High loads: N messages (all sites request the CS).

    Synchronization delay: T.

  • Singhal: Example

    (a) Initial pattern: each row in the SV matrix has an increasing number of R entries. The staircase pattern can be identified by noting that S1 has 1 R, S2 has 2 Rs, and so on; the order of occurrence of the Rs in a row does not matter.

    (b) Pattern after S3 gets the token from S1.

  • Singhal: Example

    Assume there are 3 sites in the system. Initially:

    Site 1: SV1[1] = H, SV1[2] = N, SV1[3] = N. SN1[1], SN1[2], SN1[3] are 0.

    Site 2: SV2[1] = R, SV2[2] = N, SV2[3] = N. SNs are 0.

    Site 3: SV3[1] = R, SV3[2] = R, SV3[3] = N. SNs are 0.

    Token: TSVs are N; TSNs are 0.

    Assume site 2 requests the token:

    S2 sets SV2[2] = R, SN2[2] = 1.

    S2 sends REQUEST(2,1) to S1 (since only S1 is marked R in SV2).

    S1 receives the REQUEST and accepts it, since SN1[2] is smaller than the message sequence number.

    Since SV1[1] is H: SV1[2] = R, TSV[2] = R, TSN[2] = 1, SV1[1] = N. S1 sends the token to S2.

    S2 receives the token and sets SV2[2] = E. After exiting the CS, SV2[2] = TSV[2] = N.

    S2 updates SN, SV, TSN, TSV. Since nobody is requesting, SV2[2] = H.

    Assume S3 makes a REQUEST now. It is sent to both S1 and S2. Only S2 responds, since only SV2[2] is H (SV1[1] is N now).

  • Raymond's Algorithm

    Sites are arranged in a logical directed tree. Root: the token holder. Edges: directed towards the root.

    Every site has a variable holder that points to an immediate neighbor on the directed path towards the root (the root's holder points to itself).

    Requesting CS: if Si does not hold the token and requests the CS, it sends a REQUEST upwards provided its request_q is empty; it then adds its request to request_q.

    Non-empty request_q -> send a REQUEST message for the top entry in the queue (if not done before).

    A site on the path to the root receiving a REQUEST propagates it up if its request_q is empty, and adds the request to its request_q.

    The root, on receiving a REQUEST, sends the token to the site that forwarded the message and sets its holder to that forwarding site.

    Any Si receiving the token deletes the top entry from its request_q, sends the token to that site, and sets holder to point to it. If request_q is non-empty now, it sends a REQUEST message to the holder site.

  • Raymond's Algorithm

    Executing CS: on getting the token with its own entry at the top of request_q, a site deletes the top of request_q and enters the CS.

    Releasing CS: if request_q is non-empty, delete the top entry, send the token to that site, and set holder to that site.

    If request_q is still non-empty, send a REQUEST message to the holder site.

    Performance: an average of O(log N) messages, as the average distance between 2 nodes in the tree is O(log N).

    Synchronization delay: (T log N) / 2, as the average distance between 2 sites successively executing the CS is (log N) / 2.

    Greedy approach: an intermediate site getting the token may enter the CS instead of forwarding it down. This affects fairness and may cause starvation.
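A single-request sketch of the request/token-passing path on a three-node chain, with recursion standing in for message passing (the class and function names are illustrative; the full algorithm also handles concurrent requests via the queues):

```python
class RNode:
    def __init__(self, name):
        self.name, self.request_q = name, []
        self.holder = self   # holder == self means this node has the token

def request(node):
    # One outstanding request: queue it, then walk the REQUEST up the
    # holder pointers toward the current token holder (the root).
    node.request_q.append(node)
    cur = node
    while cur.holder is not cur:
        cur.holder.request_q.append(cur)  # parent remembers which child asked
        cur = cur.holder
    return pass_token(cur)

def pass_token(holder):
    # The holder forwards the token toward its top queued request,
    # reversing the tree edge at each hop.
    nxt = holder.request_q.pop(0)
    if nxt is holder:
        return holder                     # the holder itself enters the CS
    holder.holder = nxt
    nxt.holder = nxt
    return pass_token(nxt)

root, a, b = RNode("root"), RNode("a"), RNode("b")
a.holder, b.holder = root, a              # tree: b -> a -> root (token at root)
winner = request(b)                       # b requests and receives the token
```

After the run, b is the new root: b's holder points to itself, and the edges along the path (a, root) now point towards b, as in the example figure.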

  • Raymond's Algorithm: Example

    Figure (Steps 1-2): in a tree of sites S1..S7 with S1 holding the token, a token request travels up the tree from the requesting site to S1, and the token is then passed down along the reverse path.

    Figure (Step 3): the requesting site becomes the new token holder (root), and the edges on the path are reversed to point towards it.

  • Comparison

    Non-Token          Resp. Time (ll)   Sync. Delay    Messages (ll)   Messages (hl)
    Lamport            2T+E              T              3(N-1)          3(N-1)
    Ricart-Agrawala    2T+E              T              2(N-1)          2(N-1)
    Maekawa            2T+E              2T             3*sqrt(N)       5*sqrt(N)

    Token              Resp. Time (ll)   Sync. Delay    Messages (ll)   Messages (hl)
    Suzuki-Kasami      2T+E              T              N               N
    Singhal            2T+E              T              N/2             N
    Raymond            T(log N)+E        T(log N)/2     log(N)          4

    (ll = light load, hl = heavy load)

  • BITS Pilani

    Pilani Campus

    Advance Operating System

    (CS ZG623)

    First Semester 2014-2015

    PANKAJ VYAS

    Department of CS/IS

    ([email protected])

    Lecture 6

    Friday, Aug 29-2014

  • Distributed Mutual Exclusion

    - Non-Token Based Algorithms

    1. Lamport's Algorithm (Covered in Previous Class)

    2. Ricart-Agrawala Algorithm (Covered in Previous Class)

    3. Maekawa's Algorithm (Covered in Previous Class)

    - Token Based Algorithms

    1. Suzuki-Kasami Algorithm (Covered in Previous Class)

    2. Singhal's Heuristic Search (Covered in Previous Class)

    * Singhal's Algorithm Example (Will be Covered Today)

    3. Raymond's Algorithm (Will be Covered Today)

    - Comparison of DMS Algorithms (Will be Covered Today)

    - Introduction to Distributed Deadlock (Will be Covered Today)

    Today's Lecture

  • Singhal's DMS Algorithm Example

    Initial state (5 sites; rows S5 down to S1, columns 1..5):

    SV:                    SN:
    S5: R R R R N          0 0 0 0 0
    S4: R R R N N          0 0 0 0 0
    S3: R R N N N          0 0 0 0 0
    S2: R N N N N          0 0 0 0 0
    S1: H N N N N          0 0 0 0 0

    Token: TSV = N N N N N, TSN = 0 0 0 0 0.

    S5, S4, S3, S2 request the CS:
    S5 -> S4, S3, S2, S1;  S4 -> S3, S2, S1;  S3 -> S2, S1;  S2 -> S1

    SV:                    SN:
    S5: R R R R R          0 0 0 0 1
    S4: R R R R N          0 0 0 1 0
    S3: R R R N N          0 0 1 0 0
    S2: R R N N N          0 1 0 0 0
    S1: H N N N N          0 0 0 0 0

  • [Figure: updated SV and SN arrays at sites S1-S5 after this step]

    S5's CS request is received by S4, S3, and S2.

    In response to S5's CS request, sites S4, S3, and S2 will also send their own requests to S5.

  • [Figure: updated SV and SN arrays at sites S1-S5 after this step]

    S4's CS request is received by S3 and S2.

    In response to S4's CS request, sites S3 and S2 will also send their requests back to S4.

  • [Figure: updated SV and SN arrays at sites S1-S5 after this step]

    S3's CS request is received by S2.

    In response to S3's CS request, site S2 will also send its request back to S3.

  • [Figure: SV and SN arrays at the sites, and the token's TSV and TSN, as the requests reach S1]

    S5's CS request is received by S1; S1 will pass the token to S5.

    The CS requests of S4 and S3 are also received by S1.

  • [Figure: SV and SN arrays while S5 executes the CS (state E), and after the release, with the token's TSV and TSN updated]

    S5 receives the token from S1 and enters the CS.

    S5 then releases the CS.

  • 10

    Raymond's Algorithm

    Sites are arranged in a logical directed tree. Root: the token holder. Edges: directed towards the root.

    Every site has a variable holder that points to an immediate neighbour on the directed path towards the root. (The root's holder points to itself.)

    Requesting CS: if Si does not hold the token and wants to enter the CS, it sends a REQUEST upwards, provided its request_q is empty; it then adds its own request to request_q.

    A non-empty request_q -> send a REQUEST message for the top entry in the queue (if not already done).

    A site on the path to the root receiving a REQUEST -> propagate it up if its request_q is empty; add the request to its request_q.

    The root, on receiving a REQUEST -> send the token to the site that forwarded the message; set holder to that forwarding site.

    Any Si receiving the token -> delete the top entry from request_q, send the token to that site, and set holder to point to it. If request_q is now non-empty, send a REQUEST message to the new holder site.

  • 11

    Raymond's Algorithm

    Executing CS: a site enters the CS when it receives the token and its own entry is at the top of its request_q; it deletes the top of request_q and enters the CS.

    Releasing CS: if request_q is non-empty, delete the top entry, send the token to that site, and set holder to that site. If request_q is still non-empty, send a REQUEST message to the new holder site.

    Performance: average number of messages is O(log N), as the average distance between two nodes in the tree is O(log N).

    Synchronization delay: (T log N)/2, as the average distance between two sites that successively execute the CS is (log N)/2.

    Greedy approach: an intermediate site receiving the token may enter the CS instead of forwarding it down. This affects fairness and may cause starvation.
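The request/execute/release rules above can be condensed into a toy, single-threaded simulation. This is a sketch, not the lecture's code: network messages are replaced by direct method calls (so ordering is synchronous), and all names (Node, request_cs, receive_token) are illustrative.

```python
from collections import deque

class Node:
    """One site in Raymond's tree-based token algorithm (toy sketch)."""

    def __init__(self, name):
        self.name = name
        self.holder = self        # neighbour on the path to the token; root points to itself
        self.request_q = deque()  # queued requesters (neighbours, or this site itself)
        self.in_cs = False

    def has_token(self):
        return self.holder is self

    def request_cs(self):
        if self.has_token() and not self.in_cs and not self.request_q:
            self.in_cs = True                  # idle token is already here
            return
        was_empty = not self.request_q
        self.request_q.append(self)            # queue our own request
        if was_empty:
            self.holder.receive_request(self)  # REQUEST goes up only for the queue head

    def receive_request(self, sender):
        was_empty = not self.request_q
        self.request_q.append(sender)
        if self.has_token():
            if not self.in_cs:
                self._forward_token()          # token holder sends the token down
        elif was_empty:
            self.holder.receive_request(self)  # propagate the REQUEST upward

    def receive_token(self):
        self.holder = self
        self._forward_token()

    def _forward_token(self):
        top = self.request_q.popleft()
        if top is self:
            self.in_cs = True                  # our own request is at the head: enter CS
        else:
            self.holder = top                  # tree edge reverses toward the new holder
            top.receive_token()
            if self.request_q:                 # more waiters: ask for the token back
                self.holder.receive_request(self)

    def release_cs(self):
        self.in_cs = False
        if self.request_q:
            self._forward_token()

# Demo: a chain S1 (token holder) <- S2 <- S3; S3 asks for the CS.
s1, s2, s3 = Node("S1"), Node("S2"), Node("S3")
s2.holder, s3.holder = s1, s2
s3.request_cs()   # token travels S1 -> S2 -> S3
```

Note how each edge flips direction as the token passes over it, so holder pointers always lead to the current token holder.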

  • 12

    Raymond's Algorithm: Example

    [Figure: logical tree of sites S1-S7. Step 1: the token holder is the root and a token request travels up the tree. Step 2: the token travels toward the requester.]

  • 13

    Raymond's Algorithm: Example

    [Figure: the same tree after Step 3; the token has reached the new token holder.]

  • 14

    Comparison

    Non-Token         Resp. Time (ll)   Sync. Delay   Messages (ll)   Messages (hl)
    Lamport           2T+E              T             3(N-1)          3(N-1)
    Ricart-Agrawala   2T+E              T             2(N-1)          2(N-1)
    Maekawa           2T+E              2T            3*sqrt(N)       5*sqrt(N)

    Token             Resp. Time (ll)   Sync. Delay   Messages (ll)   Messages (hl)
    Suzuki-Kasami     2T+E              T             N               N
    Singhal           2T+E              T             N/2             N
    Raymond           T(log N)+E        T*log(N)/2    log(N)          4

    (ll = light load, hl = heavy load; T = message propagation delay, E = CS execution time)

  • 15

    Deadlock

    Deadlock is a situation where a process, or a set of processes, is blocked on an event that never occurs.

    Processes, while holding some resources, may request additional resources that are held by other processes.

    The processes are then in a circular wait for the resources.

  • 16

    Deadlock vs Starvation

    Starvation occurs when a process waits for a resource that continuously becomes available but is never allocated to that process.

    Two main differences:

    - In starvation it is not certain whether a process will ever get the requested resource, whereas a deadlocked process is permanently blocked because the required resources never become available.

    - In starvation the resource under contention is in continuous use, whereas this is not true in the case of deadlock.

  • 17

    Necessary Conditions for Deadlock

    Exclusive access

    Wait while hold

    No Preemption

    Circular wait

  • 18

    Models of Deadlock

    Single-Unit Request Model

    - A process is restricted to requesting only one resource at a time

    - The outdegree in the WFG is at most one

    - A cycle in the WFG means deadlock

    [Figure: WFG over processes P1, P2, P3]

  • 19

    Models of Deadlock

    AND Request Model

    - A process can simultaneously request multiple resources

    - The process remains blocked until all the requested resources are granted

    - The outdegree in the WFG can be more than one

    - A cycle in the WFG means the system is deadlocked

    - A process can be involved in more than one deadlock

    [Figure: WFG over processes P1, P2, P3, P4]

  • 20

    Models of Deadlock

    OR Request Model

    - A process can simultaneously request multiple resources

    - The process remains blocked until it is granted any one of the requested resources

    - The outdegree in the WFG can be more than one

    - A cycle in the WFG is not a sufficient condition for deadlock

    - A knot in the WFG is a sufficient condition for deadlock

    - A knot is a subset of the graph such that, starting from any node in the subset, it is impossible to leave the knot by following the edges of the graph

  • Cycle vs Knot

    [Figure: WFG over P1-P5 containing a cycle but no knot: deadlock in the AND model, but no deadlock in the OR model]

    [Figure: WFG over P1-P5 containing both a cycle and a knot: deadlock in both the AND and OR models]
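The knot test can be checked mechanically: a process lies in a knot exactly when every node reachable from it can reach it back. A minimal sketch; the example edge sets are hypothetical (the actual figure edges are not recoverable from the transcript), chosen to reproduce a cycle-without-knot and a cycle-with-knot:

```python
def reachable(wfg, start):
    """All nodes reachable from `start` by following WFG edges."""
    seen, stack = set(), [start]
    while stack:
        for v in wfg.get(stack.pop(), ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def in_knot(wfg, p):
    """p lies in a knot iff every node reachable from p can reach p back:
    once inside the subset, following edges can never lead out."""
    r = reachable(wfg, p)
    return bool(r) and all(p in reachable(wfg, u) for u in r)

# Hypothetical WFGs echoing the figures above (illustrative edges only):
cycle_no_knot  = {"P1": ["P2"], "P2": ["P3"], "P3": ["P1", "P4"], "P4": ["P5"], "P5": []}
cycle_and_knot = {"P1": ["P2"], "P2": ["P3"], "P3": ["P1", "P4"], "P4": ["P5"], "P5": ["P1"]}
```

In the first graph the edge P4 -> P5 leads out of the cycle, so P1 is deadlocked under the AND model (cycle) but not under the OR model (no knot).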

  • Resources

    Reusable (CPU, main memory, I/O devices)

    Consumable (messages, interrupt signals)

  • 23

    Distributed Deadlock Detection

    Assumptions:

    a. System has only reusable resources

    b. Only exclusive access to resources

    c. Only one copy of each resource

    d. States of a process: running or blocked

    e. Running state: process has all the resources

    f. Blocked state: waiting on one or more resources

  • 24

    Resource vs Communication Deadlocks

    Resource Deadlocks

    A process needs multiple resources for an activity. Deadlock occurs if each process in a set requests resources held by another process in the same set, and must receive all the requested resources before it can proceed.

    Communication Deadlocks

    Processes in a set wait to communicate with other processes in the set. Each process in the set is waiting on another process's message, and no process in the set initiates a message until it receives the message for which it is waiting.

  • 25

    Graph Models

    The nodes of the graph are processes; the edges represent pending requests for, or assignments of, resources.

    Wait-for Graph (WFG): P1 -> P2 implies P1 is waiting for a resource held by P2.

    Transaction-Wait-For Graph (TWF): a WFG in databases.

    Deadlock: a directed cycle in the graph.

    Cycle example: [Figure: two-process cycle between P1 and P2]
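The "directed cycle" criterion is what a detector actually runs on the WFG. A minimal sketch over an adjacency mapping (function and variable names are illustrative, not from the lecture):

```python
def find_cycle(wfg):
    """Return one directed cycle in a wait-for graph as a node list, or None.
    `wfg` maps each process to the processes it waits for."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on current DFS path / finished
    color = {}

    def dfs(p, path):
        color[p] = GREY
        path.append(p)
        for q in wfg.get(p, ()):
            if color.get(q, WHITE) == GREY:       # back edge: cycle found
                return path[path.index(q):] + [q]
            if color.get(q, WHITE) == WHITE:
                found = dfs(q, path)
                if found:
                    return found
        path.pop()
        color[p] = BLACK
        return None

    for p in list(wfg):
        if color.get(p, WHITE) == WHITE:
            cycle = dfs(p, [])
            if cycle:
                return cycle
    return None
```

For the two-node example above, `find_cycle({"P1": ["P2"], "P2": ["P1"]})` reports the cycle P1 -> P2 -> P1.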

  • 26

    Graph Models

    Wait-for Graph (WFG): P1 -> P2 implies P1 is waiting for a resource held by P2.

    [Figure: resource-allocation graph with processes P1, P2 and resources R1, R2, showing a request edge and an assignment edge]

  • 27

    AND, OR Models

    AND Model

    A process/transaction can simultaneously request multiple resources. It remains blocked until it is granted all of the requested resources.

    OR Model

    A process/transaction can simultaneously request multiple resources. It remains blocked until any one of the requested resources is granted.

  • 28

    Sufficient Condition

    [Figure: WFG over processes P1-P6. Is the system deadlocked?]

  • 29

    AND, OR Models

    AND Model: presence of a cycle.

    [Figure: WFG over P1-P5 containing a cycle]

  • 30

    AND, OR Models

    OR Model: presence of a knot.

    Knot: a subset of the graph such that, starting from any node in the subset, it is impossible to leave the knot by following the edges of the graph.

    [Figure: WFG over P1-P6 containing a knot]

  • 31

    Deadlock Handling Strategies

    Deadlock Prevention: difficult.

    Deadlock Avoidance: before allocation, check for possible deadlocks. Difficult, as it needs global state information at each site that handles resources.

    Deadlock Detection: find cycles. The focus of our discussion.

    Deadlock detection algorithms must satisfy two conditions:

    - No undetected deadlocks.

    - No false deadlocks.

  • 32

    Distributed Deadlocks

    Centralized Control

    A control site constructs wait-for graphs (WFGs) and checks for directed cycles. The WFG can be maintained continuously, or built on demand by requesting WFGs from the individual sites.

    Distributed Control

    The WFG is spread over different sites. Any site can initiate the deadlock detection process.

    Hierarchical Control

    Sites are arranged in a hierarchy. A site checks for cycles only in its descendants.

  • 33

    Centralized Algorithms

    Ho-Ramamoorthy 2-Phase Algorithm

    Each site maintains a status table of all processes initiated at that site; it records all resources locked and all resources being waited on.

    The controller periodically requests the status table from each site.

    The controller then constructs a WFG from these tables and searches for cycles.

    If there are no cycles, there are no deadlocks.

    Otherwise (a cycle exists): request the status tables again.

    Construct a WFG based only on the transactions common to the two consecutive tables.

    If the same cycle is detected again, the system is declared deadlocked.

    It was later proved that a cycle in two consecutive reports need not imply a deadlock; hence, this algorithm can detect false deadlocks.
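The two-phase rule can be sketched as follows. For brevity each report is simplified to a WFG the controller has already assembled from the site tables (a dict mapping each transaction to the transactions it waits on); this data layout is an assumption, since the slides do not fix one.

```python
def cycle_nodes(wfg):
    """Transactions lying on a directed cycle: t is on a cycle iff t can reach itself."""
    def reach(s):
        seen, stack = set(), [s]
        while stack:
            for v in wfg.get(stack.pop(), ()):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen
    return {t for t in wfg if t in reach(t)}

def two_phase_detect(report1, report2):
    """Ho-Ramamoorthy 2-phase rule: rebuild the WFG over the transactions
    common to two successive reports and look for a cycle there.
    (As noted above, this can still report false deadlocks.)"""
    common = set(report1) & set(report2)
    wfg = {t: {u for u in report2[t] if u in common} for t in common}
    return cycle_nodes(wfg)

# Two successive reports: T1 and T2 wait on each other in both; T3 has vanished.
r1 = {"T1": {"T2"}, "T2": {"T1"}, "T3": {"T1"}}
r2 = {"T1": {"T2"}, "T2": {"T1"}}
```

Here `two_phase_detect(r1, r2)` flags the persistent T1/T2 cycle, while T3, which appears in only one report, is ignored.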

  • 34

    Centralized Algorithms...

    Ho-Ramamoorthy 1-Phase Algorithm

    Each site maintains two status tables: a resource status table and a process status table.

    Resource table: the transactions that have locked, or are waiting for, each resource.

    Process table: the resources locked by, or waited on by, each transaction.

    The controller periodically collects these tables from each site and constructs a WFG from the transactions common to both tables.

    No cycle: no deadlocks. A cycle means a deadlock.

  • BITS Pilani

    Pilani Campus

    Advance Operating System

    (CS ZG623)

    First Semester 2014-2015

    PANKAJ VYAS

    Department of CS/IS

    ([email protected])

    Lecture 7

    Sunday, Aug 31-2014

  • What Was Covered in the Last Class

    - Introduction to Deadlocks

    - Centralized Algorithms

      - Ho-Ramamoorthy's 2-Phase

      - Ho-Ramamoorthy's 1-Phase

    Today's Lecture

    - Distributed Deadlock Detection Algorithms

      - Path Pushing [will not be covered in the syllabus]

      - Edge Chasing [examples: CMH for the AND model, Sinha-Natarajan, Mitchell-Merritt]

      - Diffusion Computation [examples: CMH for the OR model, Chandy-Herman]

      - Global State Detection [will not be covered in the syllabus]

  • 3

    Distributed Algorithms

    Path-pushing: resource dependency information is disseminated along designated paths in the graph. [Examples: Menasce-Muntz, Obermarck] [Not included in the syllabus]

    Edge-chasing: special messages, or probes, are circulated along the edges of the WFG. A deadlock exists if a probe is received back by its initiator. [Examples: CMH for the AND model, Sinha-Natarajan]

    Diffusion computation: queries on status are sent to the processes in the WFG. [Examples: CMH for the OR model, Chandy-Herman]

    Global state detection: obtain a snapshot of the distributed system. [Examples: Bracha-Toueg, Kshemkalyani-Singhal] [Not included in the syllabus]

  • 4

    Edge-Chasing Algorithm

    Chandy-Misra-Haas Algorithm (AND model):

    A probe(i, j, k) is used by a deadlock detection computation initiated by process Pi; the probe is sent by the home site of Pj to the home site of Pk.

    The probe message is circulated along the edges of the graph. The probe returning to Pi implies deadlock detection.

    Terms used:

    Pj is dependent on Pk if a sequence of waiting processes Pj, Pi1, ..., Pim, Pk exists.

    Pj is locally dependent on Pk if, in addition, Pj and Pk are on the same site.

    Each process Pi maintains an array dependenti: dependenti(j) is true if Pi knows that Pj is dependent on it (initially false for all i and j).

  • 5

    Chandy-Misra-Haas Algorithm

    Sending the probe:
      if Pi is locally dependent on itself then declare deadlock
      else for all Pj and Pk such that
        (a) Pi is locally dependent upon Pj, and
        (b) Pj is waiting on Pk, and
        (c) Pj and Pk are on different sites,
      send probe(i,j,k) to the home site of Pk.

    Receiving the probe:
      if (d) Pk is blocked, and
         (e) dependentk(i) is false, and
         (f) Pk has not replied to all requests of Pj,
      then begin
        dependentk(i) := true;
        if k = i then Pi is deadlocked
        else ...

  • 6

    Chandy-Misra-Haas Algorithm

    Receiving the probe (continued):
      ...
      else for all Pm and Pn such that
        (a) Pk is locally dependent upon Pm, and
        (b) Pm is waiting on Pn, and
        (c) Pm and Pn are on different sites,
      send probe(i,m,n) to the home site of Pn.
      end.

    Performance:
      For a deadlock that spans m processes over n sites, m(n-1)/2 messages are needed.
      Size of a message: 3 words.
      Delay in deadlock detection: O(n).
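The probe rules above can be exercised with a small synchronous simulation. This is a simplified sketch: it ignores the reply bookkeeping of condition (f), delivers probes from a queue instead of over a network, and takes `waits_for` and `site_of` as assumed inputs describing the distributed WFG.

```python
from collections import defaultdict, deque

def cmh_detect(initiator, waits_for, site_of):
    """Chandy-Misra-Haas AND-model probe computation (simplified sketch).

    waits_for[p]: processes that the blocked process p is waiting on.
    site_of[p]:   home site of p.
    Returns True iff a probe initiated by `initiator` returns to it."""
    dependent = defaultdict(set)  # dependent[k]: initiators Pk already knows about
    probes = deque()

    def emit_probes(i, start):
        # Follow local wait edges within a site; emit probe(i, j, k) for each
        # wait edge Pj -> Pk that crosses a site boundary.
        seen, stack = {start}, [start]
        while stack:
            j = stack.pop()
            for k in waits_for.get(j, ()):
                if site_of[j] != site_of[k]:
                    probes.append((i, j, k))
                elif k not in seen:
                    seen.add(k)
                    stack.append(k)

    emit_probes(initiator, initiator)
    while probes:
        i, j, k = probes.popleft()
        if k == i:
            return True            # probe came back to its initiator: deadlock
        if i not in dependent[k]:  # first probe on Pi's behalf reaching Pk
            dependent[k].add(i)
            emit_probes(i, k)
    return False
```

The `dependent` sets are what keep the message count bounded: a process propagates a given initiator's probe at most once.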

  • 7

    C-M-H Algorithm: Example

    [Figure: WFG over processes P1-P7 spanning Site 1, Site 2, and Site 3]

    P1 initiates deadlock detection by sending probe (1,1,2) to P2. The probe then propagates along the WFG edges as (1,2,3), (1,2,4), (1,4,5), (1,5,6), (1,6,7), and finally (1,7,1); the probe (1,7,1) returning to P1 detects the deadlock.

  • 8

    Diffusion-Based Algorithm

    CMH Algorithm for the OR Model

    Initiation by a blocked process Pi:
      send query(i,i,j) to all processes Pj in the dependent set DSi of Pi;
      numi(i) := |DSi|; waiti(i) := true;

    A blocked process Pk receiving query(i,j,k):
      if this is the engaging query for process Pk /* first query from Pi */
      then send query(i,k,m) to all Pm in DSk;
           numk(i) := |DSk|; waitk(i) := true;
      else if waitk(i) then send a reply(i,k,j) to Pj.

    Process Pk receiving reply(i,j,k):
      if waitk(i) then
        numk(i) := numk(i) - 1;
        if numk(i) = 0 then
          if i = k then declare a deadlock
          else send reply(i,k,m) to Pm, which sent the engaging query.
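A compact synchronous sketch of this query/reply wave follows. An `engaged` map stands in for the engaging-query bookkeeping, and the wait flags are implicit because every blocked process in this toy stays blocked; names (`diffusion_detect`, `dep_set`) are illustrative.

```python
def diffusion_detect(initiator, dep_set):
    """CMH OR-model diffusion computation (synchronous toy sketch).

    dep_set[p]: dependent set DS_p of a blocked process p (empty -> p is active).
    Returns True iff the initiator declares deadlock."""
    engaged = {}          # p -> [engager, outstanding reply count]
    deadlocked = False

    def query(i, j, k):   # query(i,j,k) sent by Pj, received by Pk
        if not dep_set.get(k):
            return                               # Pk is active: it never replies
        if k not in engaged:                     # engaging query: spread the wave
            engaged[k] = [j, len(dep_set[k])]
            for m in dep_set[k]:
                query(i, k, m)
        else:                                    # Pk already engaged, still waiting
            reply(i, k, j)

    def reply(i, j, k):   # reply(i,j,k) sent by Pj, received by Pk
        nonlocal deadlocked
        engaged[k][1] -= 1
        if engaged[k][1] == 0:
            if k == i:
                deadlocked = True                # every query answered: knot
            else:
                reply(i, k, engaged[k][0])       # answer our own engaging query

    engaged[initiator] = [None, len(dep_set[initiator])]
    for m in dep_set[initiator]:
        query(initiator, initiator, m)
    return deadlocked
```

An active process silently drops queries, so its engager's reply count never reaches zero; that is exactly why a cycle that can escape to an active process is not reported as an OR-model deadlock.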

  • 9

    Diffusion Algorithm: Example

    [Figure: example run of the diffusion computation; the figure is not recoverable from the transcript]

  • 11

    Engaging Query

    How is an engaging query distinguished? A query(i,j,k) from the initiator carries a unique sequence number in addition to the tuple (i,j,k). This sequence number is used to identify subsequent queries.

    For example, when query(1,7,1) is received by P1 from P7, P1 checks the sequence number along with the tuple and recognizes that the query was initiated by itself, so it is not an engaging query. Hence, P1 sends a reply back to P7 instead of forwarding the query on all its outgoing links.

  • Mitchell-Merritt Algorithm (Edge-Chasing Category)

    Each node has two labels: public and private.