fault tolerance

Fault-Tolerant Services

Laxmi Yadav (46), Arti Yadav (47), M.Sc. I.T. Part-1

Introduction

• This section examines how to provide a service that remains correct despite a bounded number of process failures, by replicating data and functionality at replica managers.

• Intuitively, a service based on replication is correct if it keeps responding despite failures and if clients cannot tell the difference between the service they obtain from an implementation with replicated data and one provided by a single correct replica manager.

Fault Tolerance Concept Taxonomy

Linearizability

A replicated shared object service is said to be linearizable if, for any execution, there is some interleaving of the series of operations issued by all the clients that satisfies the following two criteria:

• The interleaved sequence of operations meets the specification of a (single) correct copy of the objects.

• The order of operations in the interleaving is consistent with the real times at which the operations occurred in the actual execution.
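The two linearizability criteria can be tested by brute force on a tiny history. The sketch below is only an illustration under stated assumptions: the shared object is a single integer register starting at 0, each operation carries the real times of its invocation and response, and the names Op, meets_register_spec, respects_real_time and is_linearizable are hypothetical helpers invented for this example. It tries every ordering, so it is only practical for a handful of operations.

```python
from itertools import permutations
from typing import NamedTuple


class Op(NamedTuple):
    client: str
    kind: str      # "write" or "read"
    value: int     # value written, or value the read returned
    start: float   # real time at which the client issued the operation
    end: float     # real time at which the client received the response


def meets_register_spec(sequence, initial=0):
    """Criterion 1: the sequence is legal for a single correct register."""
    current = initial
    for op in sequence:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def respects_real_time(sequence):
    """Criterion 2: an operation that finished before another one started
    is never placed after it in the sequence."""
    for i, earlier in enumerate(sequence):
        for later in sequence[i + 1:]:
            if later.end < earlier.start:
                return False
    return True


def is_linearizable(history):
    """Search all orderings for one that satisfies both criteria."""
    return any(meets_register_spec(p) and respects_real_time(p)
               for p in permutations(history))


# The write completes at time 2; a read that starts at time 3 must see it.
fresh = [Op("A", "write", 1, 0.0, 2.0), Op("B", "read", 1, 3.0, 4.0)]
stale = [Op("A", "write", 1, 0.0, 2.0), Op("B", "read", 0, 3.0, 4.0)]
# Here the read overlaps the write, so it may be ordered before it.
overlap = [Op("A", "write", 1, 0.0, 5.0), Op("B", "read", 0, 1.0, 2.0)]
```

With these histories, is_linearizable(fresh) and is_linearizable(overlap) hold, while is_linearizable(stale) does not: the only ordering that explains the stale read places it before a write that had already completed.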

Sequential consistency

A replicated shared object service is said to be sequentially consistent if, for any execution, there is some interleaving of the series of operations issued by all the clients that satisfies the following two criteria:

• The interleaved sequence of operations meets the specification of a (single) correct copy of the objects.

• The order of operations in the interleaving is consistent with the program order in which each individual client executed them.
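Sequential consistency drops the real-time requirement and keeps only each client's program order. A minimal sketch, assuming a single integer register starting at 0 and operations written as ("write", v) or ("read", v) tuples; the helper names interleavings, legal_for_register and is_sequentially_consistent are hypothetical and exist only for this illustration.

```python
def interleavings(programs):
    """Yield every merge of the per-client sequences that preserves the
    order in which each individual client issued its operations."""
    if all(len(p) == 0 for p in programs):
        yield []
        return
    for i, prog in enumerate(programs):
        if prog:
            rest = programs[:i] + [prog[1:]] + programs[i + 1:]
            for tail in interleavings(rest):
                yield [prog[0]] + tail


def legal_for_register(sequence, initial=0):
    """The sequence meets the specification of one correct register."""
    current = initial
    for kind, value in sequence:
        if kind == "write":
            current = value
        elif value != current:
            return False
    return True


def is_sequentially_consistent(programs):
    return any(legal_for_register(s) for s in interleavings(programs))


# Client A writes 1; client B reads 0. Whatever the real times were, the
# interleaving (read 0, write 1) explains the execution.
stale_read = [[("write", 1)], [("read", 0)]]

# Client B sees A's two writes in the opposite order to A's program order.
# No interleaving explains this.
reordered = [[("write", 1), ("write", 2)], [("read", 2), ("read", 1)]]
```

Here is_sequentially_consistent(stale_read) holds even if the write finished before the read began in real time, which is exactly the case that linearizability would reject. is_sequentially_consistent(reordered) does not hold.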

Replication Techniques

This section presents two fundamental classes of replication techniques for fault tolerance:

• The passive (primary-backup) replication technique

• The active replication technique

Passive (primary-backup) replication

• In the passive or primary-backup model of replication for fault tolerance there is at any one time a single primary replica manager and one or more secondary replica managers – ‘backups’ or ‘slaves’.

• This system obviously implements linearizability if the primary is correct, since the primary sequences all the operations upon the shared objects.

• When the primary crashes, the communication system eventually delivers a new view to the surviving backups, one that excludes the old primary.

The sequence of events when a client requests an operation to be performed is as follows:

• Request

• Coordination

• Execution

• Agreement

• Response
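The five phases can be sketched in a few lines. This is a single-process toy under loud assumptions, not a real protocol: the loop over backups stands in for view-synchronous group communication, crash_primary stands in for the delivery of a new view, and the class and method names (ReplicaManager, PrimaryBackupGroup, handle) are hypothetical.

```python
class ReplicaManager:
    def __init__(self, name):
        self.name = name
        self.state = {}      # the replicated objects
        self.responses = {}  # request id -> response already produced


class PrimaryBackupGroup:
    def __init__(self, names):
        # The first member of the current view acts as the primary.
        self.view = [ReplicaManager(n) for n in names]

    @property
    def primary(self):
        return self.view[0]

    def crash_primary(self):
        # Stand-in for a view change: the new view excludes the old
        # primary and the first surviving backup takes over.
        self.view = self.view[1:]

    def handle(self, request_id, key, value):
        # Request: the front end sends the request, tagged with a unique
        # id, to the primary only.
        primary = self.primary
        # Coordination: requests are taken one at a time; a request id
        # seen before just gets its stored response again.
        if request_id in primary.responses:
            return primary.responses[request_id]
        # Execution: the primary performs the operation and stores the
        # response.
        primary.state[key] = value
        response = f"ok {key}={value}"
        primary.responses[request_id] = response
        # Agreement: the primary passes the update, the response and the
        # id to every backup before replying.
        for backup in self.view[1:]:
            backup.state[key] = value
            backup.responses[request_id] = response
        # Response: the primary replies to the front end.
        return response


group = PrimaryBackupGroup(["rm1", "rm2", "rm3"])
first = group.handle("c1-1", "x", 5)
group.crash_primary()                  # rm2 becomes the primary
retry = group.handle("c1-1", "x", 5)   # client retries the same request
```

Because the backups received the update and the stored response during the agreement phase, the new primary already holds x = 5 and answers the retried request from its response table instead of executing it a second time.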

[Figure: The passive (primary-backup) model]

Active replication

• In the active model of replication for fault tolerance, the replica managers are state machines that play equivalent roles and are organized as a group.

• The active replication system does not achieve linearizability. This is because the total order in which the replica managers process requests is not necessarily the same as the real-time order in which the clients made their requests.

• The reliability of the multicast ensures that every correct replica manager processes the same set of requests and the total order ensures that they process them in the same order.

The sequence of events when a client requests an operation to be performed is as follows:

• Request

• Coordination

• Execution

• Agreement

• Response
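In the active model the same phases look different, because every replica manager executes every request. The sketch below is an assumption-laden toy: a single in-memory sequencer stands in for a reliable, totally ordered multicast, only crash failures are modelled, and the names StateMachineReplica, TotalOrderMulticast and front_end_request are hypothetical.

```python
class StateMachineReplica:
    """A deterministic state machine: same requests in the same order
    always produce the same state and the same replies."""

    def __init__(self, name):
        self.name = name
        self.balance = 0
        self.crashed = False

    def apply(self, request):
        op, amount = request
        if op == "deposit":
            self.balance += amount
        elif op == "withdraw" and amount <= self.balance:
            self.balance -= amount
        return self.balance


class TotalOrderMulticast:
    """Toy sequencer: every correct replica receives every request, and
    all of them receive the requests in the same order."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []  # the agreed global order of requests

    def multicast(self, request):
        # Coordination: fix the request's place in the total order.
        self.log.append(request)
        # Execution: each correct replica applies the request itself.
        # Agreement: no extra phase is needed, the ordering did the work.
        return [r.apply(request) for r in self.replicas if not r.crashed]


def front_end_request(group, request):
    # Request: the front end multicasts to the whole group.
    replies = group.multicast(request)
    # Response: with crash failures only, the first reply is enough.
    return replies[0]


replicas = [StateMachineReplica(n) for n in ("rm1", "rm2", "rm3")]
group = TotalOrderMulticast(replicas)
after_deposit = front_end_request(group, ("deposit", 10))
replicas[0].crashed = True   # one replica fails; no failover step is needed
after_withdraw = front_end_request(group, ("withdraw", 4))
```

The crash of rm1 is masked without any view change visible to the client: the surviving replicas have processed the same requests in the same order, so both hold a balance of 6 and either can answer.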

[Figure: Active replication]

Conclusion

• Linearizability has been introduced as the abstract correctness criterion, and active replication and primary-backup replication have been presented as the two main classes of replication techniques.

• The real issue in achieving fault-tolerance by replication is thus related to the implementation of the group multicast primitives.

Thank you!!!
