replication management using the state-machine approach

Upload: huaichao-wang

Post on 08-Apr-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/7/2019 Replication Management using the State-Machine Approach

    1/36

    Replication Managementusing the State-Machine

    ApproachFred B. Schneider

    Summary and Discussion :

    Hee Jung Kim and Ying Zhang

    October 27, 2005

  • 8/7/2019 Replication Management using the State-Machine Approach

    2/36

    Introduction State Machines

    Fault Tolerance Fault-tolerant State Machines

    Tolerating Faulty Output Devices

    Tolerating Faulty Clients

    Using Time to Make Request

    Reconfiguration

  • 8/7/2019 Replication Management using the State-Machine Approach

    3/36

    Why Replication ? Two kinds of replication are ..

    State machine Approach is .. What can be discussed in each

    sections

    Introduction

  • 8/7/2019 Replication Management using the State-Machine Approach

    4/36

    Ageneral method for implementing a fault-tolerant service by replicating servers and

    coordinating client interactions with serverreplicas.

    State-Machine Approach

  • 8/7/2019 Replication Management using the State-Machine Approach

    5/36

    State machine consist of- State Variables- Commands.

    Command might be implemented by

    - Sharing data amongst procedures,

    - Queuing requests

    - Using interrupt handlers.

    State Machines

  • 8/7/2019 Replication Management using the State-Machine Approach

    6/36

    Requests from clients processed incausal order. O1: Requests issued by a single client

    processed by smin the order they areissued

    O2: r1could have caused r2 => r1processed by

    smbefore r2

    Assumption !

  • 8/7/2019 Replication Management using the State-Machine Approach

    7/36

    Outputs of a state machine arecompletely determined by the

    sequence of requests it processes,independent of time or any other

    activity of a system

    Semantic Characterization

  • 8/7/2019 Replication Management using the State-Machine Approach

    8/36

    monitor: processdo true -> val:= sensor;

    ;delay D

    odend monitor

    Is this a state machine ?pc: state-machine

    var q:real;

    adjust: command(sensor-val: real)q := F(q, sensor-val);send q to actuator

    end adjustend pc

    No !!

    Yes !!

  • 8/7/2019 Replication Management using the State-Machine Approach

    9/36

    Byzantine failures:arbitrary and malicious

    Failstop failures:other components [can] detect

    that a failure has occurred

    Fault Tolerance

  • 8/7/2019 Replication Management using the State-Machine Approach

    10/36

    A system consisting of a set of distinctcomponents is t fault-tolerant if it

    satisfies its specification provided thatno more than t of those componentsbecome faulty during some interval of

    interest.

    T Fault-Tolerance

  • 8/7/2019 Replication Management using the State-Machine Approach

    11/36

    Replicate State Machines and run onseparate processors.

    Each replica

    Starts in the same initial state Executes same requests in the same order

    Assuming independent failure Combine outputs of the replicas of thisensemble.

    Fault-tolerant SM

  • 8/7/2019 Replication Management using the State-Machine Approach

    12/36

    Replica CoordinationAll replicas receive and process the same

    sequence of requests. Agreement :

    Each Non-Fault replica receives every

    request. Order : Each Non-Fault replica processes

    the requests in the same relative order.

    Fault-tolerant SM

  • 8/7/2019 Replication Management using the State-Machine Approach

    13/36

    Any protocol that allows a designatedprocessor called the transmitterso that

    IC1: All non-faulty processors agree on the

    same value.

    IC2: If the transmitter is non-faulty, thenall non-faulty processors use its value as theone on which they agree.

    Agreement

  • 8/7/2019 Replication Management using the State-Machine Approach

    14/36

    Order requirement can be satisfied by

    Assigning unique ids to requests.

    Processing the requests according to atotal ordering on the unique ids.

    Order and Stability

  • 8/7/2019 Replication Management using the State-Machine Approach

    15/36

    Order Implementation

    A replica next processes the stablerequest with smallest unique ids.

    Using Logical Clocks. Synchronized Real-Time Clocks.

    Using Replica-Generated Identifiers.

  • 8/7/2019 Replication Management using the State-Machine Approach

    16/36

    Using Logical Clocks

    A logical clock is a mapping T fromevents to the integers.

    LCl: Tp is incremented after eachevent at P.

    LC2: Upon receipt of a message -with

    timestamp ts, process p resets Tp,:Tp := max(Tp, ts) + 1.

  • 8/7/2019 Replication Management using the State-Machine Approach

    17/36

    Using Logical Clocks

    Assumption to property ofcommunication channels. FIFO channels between processors

    Failure Detection Assumption (for fail-stop processors) : A processor pdetects

    that a fail-stop processor qhas failed onlyafter phas received the last message sentto p by q.

  • 8/7/2019 Replication Management using the State-Machine Approach

    18/36

    Logical Clocks Stability Test

    Every client periodically makes some-possibly null-request to the state machine.

    Request stable at smiif a request withlarger timestamp has been received fromevery client running on a non-faultyprocessor.

  • 8/7/2019 Replication Management using the State-Machine Approach

    19/36

    Synchronized Real-time Clocks Tp(e) : the real-time clock at processor p

    when event eoccurs.Unique id : Tp(e) appended by fixed bitstring that uniquely identifies p.

    - O1 satisfied if only one request in betweensuccessive clock ticks

    - O2 satisfied if degree on synchronizationis better than the minimum messagedelivery time.

  • 8/7/2019 Replication Management using the State-Machine Approach

    20/36

    Synchronized Real-time Clocks

    (contd) Real-time Clock Stability Test I

    ris stable at smiexecuted at pif the local clock

    at preads tsand uid(r) < ts td

    Real Clock Stability Test IIris stable at smiif a request with larger uid has

    been received from every client.

  • 8/7/2019 Replication Management using the State-Machine Approach

    21/36

    Using Replica-Generated Ids. Unique ids assigned by the replicas

    Two phase protocol

    Replicas propose candidate unique ids

    One candidate is selected Elaboration of the protocol

    Seen : smihas seen ronce it has received rand

    proposed a candidate unique id for it. Accepted: smihas accepted ronce it knows thefinal choice of uid(r).

  • 8/7/2019 Replication Management using the State-Machine Approach

    22/36

    Using Replica-Generated Ids.

    Constraints on the proposed

    ids(cuid(smi,r)) UID1: cuid(smi,r) < = uid(r)

    UID2: if rSEEN at smiafter rhas been

    accepted then uid(r) < cuid(smi,r) Replica-Generated Id Stability Test:

    rthat has been accepted by smiis stable providedthere is no request rthat has

    i) Been seen by smi

    ii) Not been accepted by smiiii) cuid(smi,r) < = uid(r)

  • 8/7/2019 Replication Management using the State-Machine Approach

    23/36

    Using Replica-Generated Ids.

    Replica-generated Unique Identifiers :smimaintains

    SEENi: largest cuid(smi,r) so far assigned by smi ACCEPT i: largest uid(r) so far assigned by smi onreceipt of r cuid(smi,r) = max( ) + 1+ i

    Disseminates cuid(smi,r) to other replicas, awaitsreceipt of a candidate uid from every non-faultyreplica.

    uid(r) = maxj(cuid(smi,r))

  • 8/7/2019 Replication Management using the State-Machine Approach

    24/36

    Outputs used outside system :

    Use replicated voters and output devices.

    Outputs used inside system :

    the client need not gather a majority of

    responses to its request to the state

    machine. It can use the single responseproduced locally.

    Tolerating Faulty Output Devices

  • 8/7/2019 Replication Management using the State-Machine Approach

    25/36

    Replicate the client

    - However, requires changes to state machinesthat handle requests from that client.

    Defensive programming- Sometimes, a client cannot be made

    fault-tolerant by using replication.- Careful design of state machine can limit the

    effects of requests from faulty clients.

    Tolerating Faulty Clients

  • 8/7/2019 Replication Management using the State-Machine Approach

    26/36

    Assume that- All clients and state machine replicas have

    clocks synchronized to within r, and

    - Election starts at time strtand known to all clien

    ts and state machine replicas. Transmitting a default vote

    - If client has not made a request by time strt+ r,

    then a request with that clients default vote hasbeen made.

    Using Time to Make Request

  • 8/7/2019 Replication Management using the State-Machine Approach

    27/36

    An ensemble of state machine replicas can tolerate more than t faults if it is possible to remove state machine replicas ru

    nning on faulty processors from the ensemble and add replicas running on repairedprocessors.

    Reconfiguration

  • 8/7/2019 Replication Management using the State-Machine Approach

    28/36

    Combining Condition:

    P(t) - F(t) > X for all 0

  • 8/7/2019 Replication Management using the State-Machine Approach

    29/36

    Unbounded total number offaultsis possible if ..

    Fl: Byzantine failures, removed faulty replica fromthe ensemble before the Combining Condition is

    violated by subsequent processor failures.

    F2: Replicas running on repaired processors areadded to the ensemble before the CombiningCondition is violated by subsequent processorfailures.

  • 8/7/2019 Replication Management using the State-Machine Approach

    30/36

    Configuration

    The configurationof the system is defined as:C: The clientsS: The state-machine replicas

    O: The output devices

    To change system configuration ..

    - the value of C,S,O must be available- whenever C,S,O added, state must be updated

  • 8/7/2019 Replication Management using the State-Machine Approach

    31/36

    Managing Configuration

    A non -faulty configurator satisfies ..

    C1: Only a faulty element is removedfrom the configuration.C2: Only a non-faulty element is added

    to the configuration.

  • 8/7/2019 Replication Management using the State-Machine Approach

    32/36

    Integration with Failstop Processorsand Logical Clocks

    If eis a client or output device, then smi sends the statvariables to before sending any output with ids > rjoin.

    If e is a state-machine replica, smnew, then smi:1. sends state variables and copies of any pendingrequests to smnew,2. sends sm

    newsubsequent request r received from c

    such that uid(r) < uid(rc), where rc is the first requestthat smnew received directly from cafter beingrestarted.

  • 8/7/2019 Replication Management using the State-Machine Approach

    33/36

    Integration with Failstop Processorsand Realtime Clocks

    If eis a client or output device, then smi sends the statevariables to before sending any output with ids > rjoin.

    If e is a state-machine replica, smnew, then smi:1. sends state variables and copies of any pendingrequests to smnew,2. sends to sm

    newevery request received

    during the next interval of duration.

    Simplified !!

  • 8/7/2019 Replication Management using the State-Machine Approach

    34/36

    Stability RevisedWhen requests made by a client can be

    received from two sources-the client and viaa relay.The stability test must be changed ..

    Stability Test During Restart :

    rreceived directly from cby a restarting

    smnew is stable only after the last requestfrom crelayed by another processor hasbeen received by smnew

  • 8/7/2019 Replication Management using the State-Machine Approach

    35/36

    State Machines approach is ..

    Coping with failures (Byzantine, Failstop) ..

    -. Fault-tolerant State Machines

    -. Tolerating Faulty Output Devices

    -. Tolerating Faulty Clients

    Optimization :

    - . Using time to requestDynamic reconfiguration

    -. Managing the configuration-. Integrating a repaired object

    Summary

  • 8/7/2019 Replication Management using the State-Machine Approach

    36/36

    Thank you !!!

    Any question ???