
Chapter 7 – Local Stabilization

Self-Stabilization, Shlomi Dolev, MIT Press, 2000

Draft of January 2004. Shlomi Dolev, All Rights Reserved ©


Chapter 7: roadmap

7.1 Superstabilization
7.2 Self-Stabilizing Fault-Containing Algorithms
7.3 Error-Detection Codes and Repair


Dynamic System

Algorithms for dynamic systems are designed to cope with failures of processors with no global re-initialization.

Such algorithms consider only global states reachable from a predefined initial state under a restrictive sequence of failures and attempt to cope with such failures with as few adjustments as possible.

Self Stabilization

Self-stabilizing algorithms are designed to guarantee that a particular behavior is eventually exhibited, regardless of the initial state.

Traditionally, changes in the communications graph were ignored.

Dynamic System & Self Stabilization

Superstabilizing algorithms combine the benefits of both self-stabilizing and dynamic algorithms.


Definitions

A Superstabilizing Algorithm:

a. Must be self-stabilizing
b. Must preserve a “passage predicate”
c. Should exhibit a fast convergence rate

Passage Predicate - Defined with respect to a class of topology changes. (A topology change falsifies legitimacy, and therefore the passage predicate must be weaker than legitimacy but strong enough to be useful.)


Passage Predicate - Example

In a token ring, a processor crash can lose the token but still not falsify the passage predicate.

Passage Predicate: At most one token exists in the system (e.g., the existence of 2 tokens isn’t legal).
Legitimate State: Exactly one token exists in the system.
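The gap between the two conditions can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the whole ring is reduced to a token count.

```python
def passage_predicate(tokens: int) -> bool:
    """Weaker condition: at most one token exists in the ring."""
    return tokens <= 1


def legitimate(tokens: int) -> bool:
    """Legitimate state: exactly one token exists in the ring."""
    return tokens == 1


# 0 tokens models a crash that lost the only token.
for tokens in (0, 1, 2):
    print(tokens, passage_predicate(tokens), legitimate(tokens))
```

With zero tokens the state is not legitimate, yet the passage predicate still holds; with two tokens both fail.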


Evaluation of a Super-Stabilizing Algorithm

a. Time complexity: The maximal number of rounds that pass in an execution that starts in a legitimate state, goes through a single topology change, and ends in a legitimate state.

b. Adjustment measure: The maximal number of processors that must change their local state upon a topology change in order to achieve legitimacy.


Motivation for Super-Stabilization

A self-stabilizing algorithm that does not ignore the occurrence of topology changes (“events”) will be initialized in a predefined way and will react better to dynamic changes during execution.

Question: Is it possible for an algorithm that detects a fault when it occurs to maintain a “nearly legitimate” state during convergence?


Motivation for Super-Stabilization

While transient faults are rare (but harmful), a dynamic change in the topology may be frequent.

Thus, a super-stabilizing algorithm has a lower worst-case time measure for reaching a legitimate state again, once a topology change occurs.

In the following slides we present a self-stabilizing and a super-stabilizing algorithm for the graph coloring task.


Graph Coloring

a. The coloring task is to assign a color value to each processor, such that no two neighboring processors are assigned the same color.

b. Minimizing the number of colors is not required. The algorithm uses Δ+1 colors, where Δ is an upper bound on a processor’s number of neighbors.

c. For example:


Graph Coloring - A Self-Stabilizing Algorithm

    Do forever
      GColors := ∅
      For m := 1 to δ do
        lrm := read(rm)
        If ID(m) > i then
          GColors := GColors ∪ {lrm.color}
      od
      If colori ∈ GColors then
        colori := choose(\\ GColors)
      Write ri.color := colori
    od

Notes:
GColors holds colors of Pi’s neighbors; the loop gathers only the colors of neighbors with a greater ID than Pi’s.
If Pi has the color of one of its neighbors with a higher ID, it chooses another color and writes it.
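The loop above can be tried out with a small simulation. This is a minimal sketch under stated assumptions, not the book's implementation: processors are scheduled round-robin in increasing ID order, registers are modeled as a shared `color` dictionary, and `choose` picks the first palette color outside GColors. The names `step`, `stabilize`, `graph`, and `palette` are hypothetical.

```python
def step(i, graph, color, palette):
    """One loop iteration of processor i. Returns True if i changed its color."""
    # GColors: colors of the neighbors whose ID is greater than i
    g_colors = {color[m] for m in graph[i] if m > i}
    if color[i] in g_colors:
        # choose: here, the first palette color not in GColors (assumption)
        color[i] = next(c for c in palette if c not in g_colors)
        return True
    return False


def stabilize(graph, color, palette):
    """Repeat round-robin rounds until a full round changes nothing."""
    rounds = 0
    while any([step(i, graph, color, palette) for i in sorted(graph)]):
        rounds += 1
    return rounds


# A line of five processors, all starting with the same color (an arbitrary state).
graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
color = {i: "blue" for i in graph}
palette = ["blue", "red", "green"]  # delta + 1 = 3 colors

rounds = stabilize(graph, color, palette)
print(rounds, color)
```

The processor with the highest ID never changes its color, so colors become fixed from the highest ID downward, which is why the number of rounds with a change is bounded by the number of processors in this schedule.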


Graph Coloring - Self-Stabilizing Algorithm - Simulation

[Figure: a graph of five processors with Id = 1 to 5, shown in three phases. The GColors sets displayed next to the processors are:
Phase I: { Blue }, { Blue }, { Blue }, { Blue }, {}
Phase II: { Green }, {}, {}, { Blue, Green, Red }, { Blue, Green }
Phase III: { Green }, {}, {}, { Blue, Green, Red }, { Blue, Green } (stabilized)]



Graph Coloring - Self-Stabilizing Algorithm (continued)

What happens when a change in the topology occurs?

If a new neighbor is added, it is possible that two neighboring processors have the same color. It is possible that during convergence every processor will change its color.

Example, for processors 1 2 3 4 5 (figure labels, in the order they appear on the slide):

i=4: GColors = {blue}
i=5: GColors = ø
i=1: GColors = {blue}
i=2: GColors = {red}
i=3: GColors = {red}
i=2: GColors = {blue}
i=1: GColors = {red}

Stabilized. But at what cost?


Graph Coloring – Super-Stabilizing Motivation

a. Every processor changed its color, but only one processor really needed to.

b. If we could identify the topology change, we could confine the changes to its vicinity.

c. We’ll add some elements to the algorithm:
   a. AColors – A variable that collects all of the processor’s neighbors’ colors.
   b. Interrupt section – Identifies the problematic area.
   c. ┴ – A symbol to flag a non-existing color.


Graph Coloring – A Super-Stabilizing Algorithm

    Do forever
      AColors := ∅
      GColors := ∅
      For m := 1 to δ do
        lrm := read(rm)
        AColors := AColors ∪ {lrm.color}
        If ID(m) > i then GColors := GColors ∪ {lrm.color}
      od
      If colori = ┴ or colori ∈ GColors then
        colori := choose(\\ AColors)
      Write ri.color := colori
    od
    Interrupt section
      If recoverij and j > i then
        colori := ┴
        Write ri.color := ┴

Notes:
AColors holds all of Pi’s neighbors’ colors.
The interrupt section is activated after a topology change to identify the critical processor.
recoverij is the interrupt that Pi gets upon a change in the communication between Pi and Pj.
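A hedged sketch of the interrupt and the modified loop follows. It assumes a line of four colored processors plus an isolated fifth one, models ┴ as `None`, and lets `choose` take the first palette color outside AColors; `step`, `recover`, `graph`, and `palette` are illustrative names, not part of the original algorithm text.

```python
BOTTOM = None  # stands for the "no color" symbol


def step(i, graph, color, palette):
    """One loop iteration of processor i. Returns True if i wrote a new color."""
    a_colors = {color[m] for m in graph[i]}            # all neighbors
    g_colors = {color[m] for m in graph[i] if m > i}   # higher-ID neighbors only
    if color[i] is BOTTOM or color[i] in g_colors:
        color[i] = next(c for c in palette if c not in a_colors)
        return True
    return False


def recover(i, j, color):
    """Interrupt raised at processor i when its link to processor j changes."""
    if j > i:
        color[i] = BOTTOM


graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3], 5: []}
color = {1: "red", 2: "blue", 3: "red", 4: "blue", 5: "blue"}
palette = ["blue", "red", "green"]

# Topology change: the link 4-5 appears, and both endpoints get the interrupt.
graph[4].append(5)
graph[5].append(4)
recover(4, 5, color)
recover(5, 4, color)

changed = [i for i in sorted(graph) if step(i, graph, color, palette)]
print(changed, color)
```

In this run only processor 4 recolors itself, and because it avoids every neighbor's color rather than only the higher-ID ones, no further processor is forced to change.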


Graph Coloring - Super-Stabilizing Algorithm - Example

Notice that the new algorithm stabilizes faster than the previous one. Let us consider the previous example, this time using the super-stabilizing algorithm:

Processors: 1 2 3 4 5

i=4: GColors = {blue}, AColors = {blue, red}
Color4 = r4.color = a color that is not in AColors

Stabilized in O(1).


Graph Coloring – Super-Stabilizing Proof

Lemma 1: This algorithm is self-stabilizing.

Proof by induction:
a. After the first iteration:
   a. The value ┴ doesn’t exist in the system.
   b. Pn has a fixed value.
b. Assume that Pk has a fixed value for every i < k ≤ n. If Pi has a neighbor Pk, then Pi does not change to Pk’s color but chooses a different color.

By this argument Pi’s color becomes fixed for every 1 ≤ i ≤ n, so the system stabilizes.


Graph Coloring – Super-Stabilizing

Passage Predicate – Neighboring processors always have different colors in every execution that starts in a safe configuration and in which only a single topology change occurs before the next safe configuration is reached.

Super-stabilizing Time – The number of cycles required to reach a safe configuration following a topology change.

Super-stabilizing: O(1)
Self-stabilizing: O(n)

Adjustment Measure – The number of processors that change color upon a topology change. The super-stabilizing algorithm changes the color of one processor, the one that experienced the single topology change.






Self-Stabilizing Fault-Containing Algorithms

Fault model: Several transient faults in the system. This fault model is less severe than dynamic-change faults, and considers the case where f transient faults occurred, changing the state of f processors.

The goal of self-stabilizing fault-containing algorithms:
a) From any arbitrary configuration, a safe configuration is reached.
b) Starting from a safe configuration followed by transient faults that corrupt the state of f processors, a safe configuration is reached within O(f) cycles.


A Self-Stabilizing Algorithm for Fixed Output Tasks

Our Goal: to design a self-stabilizing fault-containing algorithm for fixed-input fixed-output tasks.

Fixed Input: the algorithm has a fixed input (such as its fixed local topology). Ii contains the input of processor Pi.

Fixed Output: the variable Oi contains the output of processor Pi. The output should not change over time.

A version of the update algorithm is a self-stabilizing algorithm for any fixed-input fixed-output task.


Fixed-output algorithm for Processor Pi

    upon a pulse
      ReadSeti := Ø
      forall Pj ∈ N(i) do
        ReadSeti := ReadSeti ∪ read(Processorsj)
      ReadSeti := ReadSeti \\ <i,*,*>
      ReadSeti := ReadSeti ++ <*,1,*>
      ReadSeti := ReadSeti ∪ {<i,0,Ii>}
      forall Pj ∈ processors(ReadSeti) do
        ReadSeti := ReadSeti \\ NotMinDist(Pj, ReadSeti)
      write Processorsi := ConPrefix(ReadSeti)
      write Oi := ComputeOutput(Inputs(Processorsi))
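The pulse above can be sketched as follows, under several assumptions that are not stated on the slide: pulses are synchronous, tuples are `(id, dist, input)` triples, ties between equal-distance tuples are broken arbitrarily but deterministically, and `con_prefix` keeps only tuples whose distances form a gap-free prefix 0, 1, 2, and so on. The helper names and the three-processor example are hypothetical.

```python
def con_prefix(tuples):
    """Keep tuples up to the first gap in the sequence of distances (assumed meaning)."""
    dists = {d for _, d, _ in tuples}
    limit = 0
    while limit + 1 in dists:
        limit += 1
    return {t for t in tuples if t[1] <= limit}


def pulse(i, neighbors, processors, inputs):
    """Compute the new Processors_i table from the neighbors' current tables."""
    read_set = set()
    for j in neighbors[i]:
        read_set |= processors[j]
    # drop tuples about i itself, then increment every distance by one
    read_set = {(p, d + 1, x) for (p, d, x) in read_set if p != i}
    read_set.add((i, 0, inputs[i]))
    closest = {}  # per processor id, keep one tuple with minimal distance
    for p, d, x in sorted(read_set, key=lambda t: (t[1], t[0], t[2])):
        closest.setdefault(p, (p, d, x))
    return con_prefix(set(closest.values()))


def run(neighbors, processors, inputs, compute_output, pulses):
    for _ in range(pulses):
        processors = {i: pulse(i, neighbors, processors, inputs) for i in neighbors}
    outputs = {i: compute_output([x for _, _, x in processors[i]]) for i in neighbors}
    return processors, outputs


neighbors = {1: [2], 2: [1, 3], 3: [2]}
inputs = {1: 0, 2: 1, 3: 0}
start = {1: {(3, 1, 1)}, 2: set(), 3: {(1, 5, 1)}}  # arbitrary, corrupted tables
tables, outputs = run(neighbors, start, inputs, lambda xs: int(any(xs)), pulses=4)
print(tables[1], outputs)
```

Starting from corrupted tables, the bogus tuples are discarded and every processor ends up holding one tuple per processor, from which the OR of the inputs is computed.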


Explaining the Algorithm

The algorithm is a version of the self-stabilizing update algorithm. Each <id, dis, Ii> tuple has an extra Ii field, which contains the fixed input of the processor Pi.

Just like in the update algorithm, it is guaranteed that eventually each processor will have a tuple for every other processor.

Each processor will then have all the inputs and will compute the correct output.

But is this algorithm fault containing?


Does this Algorithm have the Fault-containment Property?

Error Scenario (assuming the output is the OR of the inputs):
a. In a safe configuration, P5 has a tuple <1,4,0>.
b. A fault occurs and changes it to <1,1,1>.
c. Error propagation. Processors P1 to P5 form a line, every input is I=0, and O1 to O5 are 0 in cycle 0. The tuples for P1 are <1,1,0> at P2, <1,2,0> at P3, <1,3,0> at P4 and <1,4,0> at P5.

    cycle   P4 tuple   O4   P5 tuple   O5
    0       <1,3,0>    0    <1,4,0>    0
    1       <1,3,0>    0    <1,1,1>    1
    2       <1,2,1>    1    <1,4,0>    0
    3       <1,3,0>    0    <1,3,1>    1
    4                       <1,4,0>    0

Conclusion: It doesn’t have this property. The system stabilizes only after O(d/2) cycles.
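The propagation in this scenario can be replayed with a deliberately simplified model. The sketch assumes a line P1 to P5 with synchronous pulses, tracks only each processor's tuple about P1 as a `(dist, input)` pair, and breaks equal-distance ties toward the smaller value; none of these details are specified on the slide. Since every other input is 0, the OR output of a processor equals the input value it holds for P1.

```python
def pulse(state, n):
    """Synchronous pulse: every P_k (k > 1) recomputes its tuple about P_1."""
    new = {1: (0, state[1][1])}  # P_1 always holds <1,0,I_1>
    for k in range(2, n + 1):
        nbrs = [j for j in (k - 1, k + 1) if 1 <= j <= n]
        # take the neighbor tuple with minimal distance, incremented by one
        new[k] = min((state[j][0] + 1, state[j][1]) for j in nbrs)
    return new


n = 5
state = {k: (k - 1, 0) for k in range(1, n + 1)}  # safe configuration, all inputs 0
state[5] = (1, 1)                                  # fault: <1,4,0> becomes <1,1,1>

trace = [state]
for _ in range(3):
    state = pulse(state, n)
    trace.append(state)

for cycle, s in enumerate(trace, start=1):
    print(cycle, "P4:", s[4], "P5:", s[5])
```

The corrupted value bounces between P4 and P5 with a growing distance field until the correct tuple arriving from P3 has the smaller distance, at which point the error disappears.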


Fault Containment – Naive Approach

A processor that is about to change its output waits for d cycles before it does so. During this interval, correct input values propagate towards the faulty processor.

This approach ensures:
a. Stabilization from every arbitrary state.
b. Starting in a safe configuration followed by several simultaneous transient faults, only the output of the processors that experienced a fault can change, and each such change is a change to the correct output value.

Fault Containment – Naive Approach (cont.)

The above approach ensures self-stabilization, but has a serious drawback: the time it takes for the correct input values to propagate to the faulty processors and correct their output is O(d), which contradicts the second requirement of self-stabilizing fault containment. This requirement states that a safe configuration is reached within O(f) cycles.

Let’s consider a more sophisticated approach that meets all fault-containment requirements.


Designing a Fault-containing Algorithm

Evidence: the values of all the Ii fields in Processorsi. Each tuple contains the evidence Ai in addition to its other 3 fields.

The additional Ai variable enables the processors that experienced a fault to learn quickly about the inputs of remote processors.

When a processor experiences a fault, the Ai variables of most of the processors within distance 2f+1 or less from it hold the set of correct inputs.

We should maintain this as an invariant throughout the algorithm for a time sufficient to let the faulty processors regain the correct input values Ii of the other processors.


Self-stabilizing Fault-containing Algorithm for Processor Pi

    upon a pulse
      ReadSeti := Ø
      forall Pj ∈ N(i) do
        ReadSeti := ReadSeti ∪ read(Processorsj)
        if (RepairCounteri ≠ d + 1) then   // in repair process
          RepairCounteri := min(RepairCounteri, read(RepairCounterj))
      od
      ReadSeti := ReadSeti \\ <i,*,*,*>
      ReadSeti := ReadSeti ++ <*,1,*,*>
      ReadSeti := ReadSeti ∪ {<i,0,Ii,Ai>}
      forall Pj ∈ processors(ReadSeti) do
        ReadSeti := ReadSeti \\ NotMinDist(Pj, ReadSeti)
      write Processorsi := ConPrefix(ReadSeti)
      if (RepairCounteri = d + 1) then   // not in repair process
        if (Oi ≠ ComputeOutputi(Inputs(Processorsi))) or
           (∃ <*,*,*,A> ∈ Processorsi | A ≠ Inputs(Processorsi))
        then RepairCounteri := 0   // repair started
      else   // in repair process
        RepairCounteri := min(RepairCounteri + 1, d + 1)
      write Oi := ComputeOutputi(RepairCounteri, MajorityInputs(Processorsi))
      if (RepairCounteri = d + 1) then   // repair over
        Ai := Inputs(Processorsi)
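The RepairCounter part of the pulse can be isolated in a short sketch. It covers only the counter updates as they are written above; the tuple handling, the evidence check, and MajorityInputs are abstracted into a single boolean `inconsistent`, which is an assumption made for illustration, as is the function name `next_repair_counter`.

```python
def next_repair_counter(rc, neighbor_rcs, inconsistent, d):
    """Return RepairCounter_i after one pulse."""
    idle = d + 1                       # RepairCounter = d + 1 means "not in repair"
    if rc != idle:                     # in repair: adopt the smallest counter read
        rc = min([rc, *neighbor_rcs])
    if rc == idle:
        if inconsistent:               # wrong output or mismatching evidence
            rc = 0                     # repair started
    else:
        rc = min(rc + 1, idle)         # repair in progress
    return rc


d = 3
rc, history = d + 1, []
for pulse in range(5):
    # the inconsistency is detected in the first pulse only; neighbors stay idle
    rc = next_repair_counter(rc, [d + 1, d + 1], inconsistent=(pulse == 0), d=d)
    history.append(rc)
print(history)
```

In this run the counter drops to 0 when the inconsistency is detected and then climbs back by one per pulse until it reaches d + 1 again.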


Explaining the Algorithm

Initial state: RepairCounteri = d + 1. A change in the RepairCounteri variable occurs when Pi detects an error in its state, or when an error from another processor propagates towards it during the repair process.

ComputeOutput is applied to the majority of the inputs of processors within distance ≤ RepairCounteri from Pi, thus ensuring that eventually the distance 2f+1 is reached and that Oi is set correctly.

The value of Ai doesn’t change throughout the repair process, thus maintaining the invariant needed for the faulty processors to regain the correct input values and eventually present a correct output.


Error scenario:
a. The input of P1 recorded at P2 and P4 erroneously changes to 1.
b. The input of P1 recorded in the evidence A2 and A4 erroneously changes to 1.
c. The erroneous input propagates to P3, and so does the evidence, causing P3 to calculate an erroneous output based on the majority of inputs at distance ≤ 1.
d. The output at P3 is corrected once the distance grows to ≤ 2.

[Figure: a line of six processors P1 to P6, all with I=0 and O=0, traced over cycles 0 to 4. At cycle 0 the tuples for P1 are <1,1,0,A1>, <1,2,0,A1>, <1,3,0,A1>, <1,4,0,A1> and <1,5,0,A1>, and every evidence entry for P1's input (A2:i1 to A5:i1) is 0. In the following cycles tuples and evidence entries carrying the erroneous value 1 appear, an output briefly shows O=1, and it returns to O=0.]


Error Scenario

Initially the network graph is in a safe configuration.

In the first cycle the red processors experience several faults:
a. The evidence concerning the blue processor is erroneous.
b. The distance to the blue processor and its input are also erroneous.

In the next cycles the error propagates throughout the graph. The output becomes erroneous at many processors, but convergence back to the safe state is quick.


Feel the Power (example)

[Figure: the network at cycles 0 to 4. Legend: regular; wrong evidence; source (blue); wrong output; wrong evidence and output; (repairCounter).]


Algorithm Analysis

Ai of processor Pi stays unchanged for d+1 cycles after the repair process starts, which ensures that the fault factor doesn’t grow.

The majority-based output calculation is applied during the repair process. After 2f+1 cycles the majority of inputs around the faulty processor is correct, yielding a correct output computation.

In that manner, after 2f+1 cycles the system’s output stabilizes with correct values, despite the continuing changes in the processors’ internal states (for d+1 cycles).


Conclusions

Compared to the naive implementation, this algorithm significantly shortens the system stabilization time.

The price we pay for this improvement:
a. Network load grows, because each tuple includes an additional variable Ai of size O(n) (n is the number of processors).
b. During the algorithm execution, faulty output is allowed for non-faulty processors during short periods of time.

