fast leader (full) recovery despite dynamic faults

Post on 21-Jan-2016

Fast Leader (Full) Recovery despite Dynamic Faults

Ajoy K. Datta

Stéphane Devismes

Lawrence L. Larmore

Sébastien Tixeuil

Joint Work

ICDCN, 04/01/2013, Mumbai


Self-Stabilization [Dijkstra,74]

A fault = a process state corruption

Recover after any number of transient faults

Price of the Versatility

1. Several impossibility results
– E.g., Leader Election and Token Circulation in anonymous networks

2. The stabilization time usually depends on global parameters (diameter, size of the network …)


When only a few faults hit the system

• Self-Stabilization: Ω(D) rounds


• Stronger forms:
– Fault Containment [Ghosh et al, Dist Comp 2007]
– k-adaptive Self-Stabilization [Burman et al, OPODIS’05]

• Weakened forms:
– k-stabilization [Beauquier et al, PODC’98]


Fault-Containment

• Pros
– Self-stabilizing
– If f ≤ k faults, stabilization time in O(f) rounds
– Containment radius
– Fault gap is small

• Cons (currently)
– k=1, or
– Surrounded by a majority of correct processes, or
– Synchronous setting, or
– Probabilistic recovery


Fault gap

• The minimum time between consecutive faulty transitions needed to have O(f) recovery time

[Figure: timeline alternating between Legitimate and Illegitimate configurations; labels: ≥ Fault gap, O(f)]

[Figure: timeline alternating between Legitimate and Illegitimate configurations; labels: < fault gap, >Ω(D)]

Time-Adaptive Self-stabilization

• Self-Stabilization

• If the Hamming distance to a legitimate configuration is f ≤ k, i.e., f ≤ k faults occur simultaneously (static faults):
– “output” stabilization in O(f) rounds


Output vs. State Stabilization

[Figure: timeline from a Legitimate configuration, through f ≤ k faults, to an Illegitimate one; labels: Correct Output after O(f), Legitimate again after >Ω(D)]

The fault gap depends on global parameters

k-Stabilization (first definition)


If the Hamming distance to a legitimate configuration is f ≤ k, i.e., f ≤ k faults occur simultaneously, the system eventually recovers. Otherwise, no guarantee.
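As a rough illustration of this static-fault condition, the sketch below computes the Hamming distance between two configurations and checks whether a configuration lies within k corruptions of some legitimate one. Representing a configuration as a list of per-process states, and the names `hamming_distance` and `within_k_faults`, are assumptions made for this example; they are not notation from the talk.

```python
from typing import Hashable, Iterable, Sequence


def hamming_distance(config: Sequence[Hashable], other: Sequence[Hashable]) -> int:
    """Number of processes whose local state differs between two configurations."""
    if len(config) != len(other):
        raise ValueError("configurations must cover the same set of processes")
    return sum(1 for a, b in zip(config, other) if a != b)


def within_k_faults(
    config: Sequence[Hashable],
    legitimate_configs: Iterable[Sequence[Hashable]],
    k: int,
) -> bool:
    """True if config is at Hamming distance f <= k from some legitimate configuration."""
    return any(hamming_distance(config, leg) <= k for leg in legitimate_configs)
```

Under the first definition, recovery is only promised for configurations where `within_k_faults` would return true; anything farther away carries no guarantee.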

k-Stabilization (first definition)

• Pros
– Can solve more problems than self-stabilization
– Usually, only-k-dependent stabilization time
– Usually, only-k-dependent fault gap

• Cons
– Not self-stabilizing
– Static faults: f ≤ k faults should occur in a single transition

Our definition of k-stabilization

• Faulty transition = one process state corruption

• Dynamic faults:
– If f ≤ k faulty transitions occur in an arbitrary manner
  • The system eventually recovers

[Figure: timeline between Legitimate and Illegitimate configurations with faults arriving one at a time (1 fault, 1 fault, 1 fault), f ≤ k faults in total]

Our contribution

• Leader recovery protocol
– On an anonymous (yet oriented) ring
– Asynchronous atomic read/write
– k-stabilizing if n ≥ 18k + 1
– Stabilization time O(k²) rounds
– Log(k) bits per process
– This problem is unsolvable in a self-stabilizing setting

The system starts in a legitimate configuration where one process is elected

Some faulty transitions occur in an arbitrary manner

Fault propagation

If n ≥ 18k + 1, the system recovers the same leader in O(k²) rounds

Fault gap

[Figure: timeline alternating between Legitimate and Illegitimate configurations; labels: f ≤ k faulty transitions (twice), 0, 0, O(k²) rounds]

Main ideas of the algorithm


Vote = Relative Address ∈ {-3k..3k} ∪ {⊥}

[Figure: ring of processes labeled with their votes: 0, 1, 2, 3, -1, -2, -3, ⊥, ⊥; a span of the ring is marked 3k]

Interval of relevance: 6k+1 votes

After k faults

[Figure: the same ring after k faults, with some votes changed; ring labels: 1, ⊥, ⊥, 3, 0, 1, 0, -2, -3]

At most 3k processes change their votes

Always a majority of votes for the previous leader
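The counting argument above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the protocol from the paper: processes are numbered 0..n-1 around the ring, a vote d held by process p is taken to designate process (p + d) mod n (the sign convention is a guess), ⊥ is modelled as `None`, and the interval of relevance of a candidate is taken to be the 6k+1 processes centered on it. The names `legitimate_votes`, `count_votes` and `has_majority` are hypothetical.

```python
from typing import List, Optional

Vote = Optional[int]  # None models ⊥ (assumed encoding)


def legitimate_votes(n: int, k: int, leader: int) -> List[Vote]:
    """Votes in a legitimate configuration: relative address of the leader, or ⊥."""
    votes: List[Vote] = [None] * n
    for d in range(-3 * k, 3 * k + 1):
        # The process located d steps before the leader stores the vote d.
        votes[(leader - d) % n] = d
    return votes


def count_votes(votes: List[Vote], k: int, candidate: int) -> int:
    """Count the votes designating `candidate` inside its interval of relevance."""
    n = len(votes)
    return sum(
        1
        for d in range(-3 * k, 3 * k + 1)
        if votes[(candidate - d) % n] == d
    )


def has_majority(votes: List[Vote], k: int, candidate: int) -> bool:
    """A candidate needs at least 3k+1 of the 6k+1 votes in its interval."""
    return count_votes(votes, k, candidate) >= 3 * k + 1
```

For example, with n = 19 and k = 1 the leader starts with all 7 votes of its interval; overwriting any 3k = 3 of them still leaves 4 = 3k + 1 votes, so `has_majority` keeps returning true for the previous leader.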

Rumors

[Figure: a process with its two variables, Vote and Rumor, holding the values 1 and 1]

In a legitimate state, Vote = Rumor for every process

Main idea:
• Vote: hard to change
• Rumor: easy to change

[Figure: the Vote and Rumor variables now hold different values, 1 and 2]

If Rumor ≠ Vote
• If Rumor ≠ ⊥
  – Candidate ← Rumor
• Else
  – Candidate ← Vote
• Initiate Query(Candidate)
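The candidate-selection rule can be written as a small function. This is only a sketch: ⊥ is modelled as `None`, and the name `choose_candidate`, as well as returning `None` when there is nothing to query, are assumptions made for the example.

```python
from typing import Optional


def choose_candidate(vote: Optional[int], rumor: Optional[int]) -> Optional[int]:
    """Return the candidate to query, or None when Vote = Rumor (no query needed).

    None also models ⊥ for the inputs; the two uses cannot collide, because
    whenever Rumor differs from Vote at least one of them is not ⊥.
    """
    if rumor == vote:
        return None  # the two variables agree: nothing to check
    if rumor is not None:
        return rumor  # a non-⊥ rumor is what gets checked
    return vote  # Rumor is ⊥, so fall back to the current vote
```

A non-`None` result corresponds to the process initiating Query(Candidate) for that value.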

Query(Candidate) traverses the interval of relevance of the candidate (6k+1 processes) and counts the votes for the candidate

Query Return

• If at least 3k+1 votes for the Candidate
  – If Rumor ≠ ⊥ and Rumor ≠ Candidate
    • Initiate a Denial of Rumor in its interval of relevance
  – Vote ← Candidate
  – Rumor ← Candidate

• Else
  – If Rumor = Candidate, then Rumor ← ⊥
  – Initiate a Denial of Candidate in its interval of relevance
  – If Vote = Candidate, then Vote ← ⊥
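The return rule can be sketched as a pure function on a process's local variables. Several things here are assumptions: ⊥ is modelled as `None`, the garbled first condition on the slide is read as "Rumor is neither ⊥ nor the Candidate", and a Denial is represented simply by appending the denied value to a list. The name `on_query_return` is hypothetical.

```python
from typing import List, Optional, Tuple


def on_query_return(
    vote: Optional[int],
    rumor: Optional[int],
    candidate: int,
    votes_for_candidate: int,
    k: int,
) -> Tuple[Optional[int], Optional[int], List[int]]:
    """Return the new (vote, rumor) and the list of values for which a Denial is initiated."""
    denials: List[int] = []
    if votes_for_candidate >= 3 * k + 1:
        # The candidate is confirmed by a majority of its interval of relevance.
        if rumor is not None and rumor != candidate:
            denials.append(rumor)  # kill the competing rumor (assumed reading)
        vote = candidate
        rumor = candidate
    else:
        # The candidate is rejected.
        if rumor == candidate:
            rumor = None
        denials.append(candidate)
        if vote == candidate:
            vote = None
    return vote, rumor, denials
```

Keeping the function free of side effects makes the two branches easy to check by hand against the slide.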


Query Tracks


Other tracks

• Denial (to kill a rumor)

• To manage lost queries
– Probe wave
– Report

(see the paper)


Deadlock Prevention

• Every two neighboring processes share a resource
– Think of a chopstick between 2 philosophers

• Only a process that holds both its left and right resources can initiate a query

• So, at any time, there are at most n/2 pending initiated queries

• Now, we can have up to 9k rogue queries, i.e., non-initiated queries

• So, n > n/2 + 9k, that is, n ≥ 18k + 1
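The last step is plain integer arithmetic: n > n/2 + 9k is equivalent to n/2 > 9k, i.e., n > 18k, which for integers means n ≥ 18k + 1. The brute-force check below confirms this for small k. It is only a sanity check of the arithmetic; `min_ring_size` is a hypothetical helper name, and the inequality is tested in the doubled form 2n > n + 18k to stay in integers.

```python
def min_ring_size(k: int) -> int:
    """Smallest ring size n satisfying n > n/2 + 9k."""
    n = 1
    # Multiply both sides of n > n/2 + 9k by 2 to avoid fractions.
    while not (2 * n > n + 18 * k):
        n += 1
    return n
```

For every k tried, the smallest admissible n equals 18k + 1, matching the bound on the slide.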


Conclusion

• Less restrictive definition of k-stabilization

• Using this definition, we solve a problem having no self-stabilizing solution:
– Leader recovery protocol
  • On an anonymous (yet oriented) ring
  • Only-k-dependent complexity:
    – Stabilization time O(k²) rounds
    – Log(k) bits per process


Thank You!
