UPV / EHU Distributed Algorithms for Failure Detection and Consensus in Crash, Crash-Recovery and Omission Environments Mikel Larrea Distributed Systems Group University of the Basque Country, UPV/EHU


Page 1: Mikel Larrea Distributed Systems Group University of the Basque Country, UPV/EHU

Distributed Algorithms for Failure Detection and Consensus in Crash, Crash-Recovery and Omission Environments

Mikel Larrea

Distributed Systems Group, University of the Basque Country, UPV/EHU


Mikel Larrea − Mannheim, May 2011

Context and Seminal Papers

• In the Consensus problem, all correct processes propose a value and must reach a unanimous and irrevocable decision on some proposed value

• [FLP85] M. Fischer, N. Lynch, M. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 1985

• [CT96] T. Chandra, S. Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 1996

• [CHT96] T. Chandra, V. Hadzilacos, S. Toueg. The weakest failure detector for solving consensus. Journal of the ACM, 1996
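The Consensus specification in the first bullet above can be made concrete with a small checker. This is a minimal sketch, not code from the talk: the function name `check_consensus`, the process names and the three property labels (termination, validity, agreement) are illustrative assumptions, and the checker only inspects the final state of one run, so it cannot observe irrevocability.

```python
from typing import Dict, Hashable, Optional, Set


def check_consensus(proposals: Dict[str, Hashable],
                    decisions: Dict[str, Optional[Hashable]],
                    correct: Set[str]) -> Dict[str, bool]:
    """Check the final state of one run against the usual Consensus properties."""
    decided = {p: v for p, v in decisions.items() if v is not None}
    # Termination: every correct process has decided.
    termination = all(p in decided for p in correct)
    # Validity: every decided value is one of the proposed values.
    validity = all(v in proposals.values() for v in decided.values())
    # Agreement (the "unanimous" part): correct processes decide the same value.
    agreement = len({decided[p] for p in correct if p in decided}) <= 1
    return {"termination": termination, "validity": validity, "agreement": agreement}


proposals = {"p1": "a", "p2": "b", "p3": "a"}
# p3 crashes before deciding; p1 and p2 are the correct processes.
good_run = check_consensus(proposals, {"p1": "a", "p2": "a", "p3": None}, correct={"p1", "p2"})
bad_run = check_consensus(proposals, {"p1": "a", "p2": "b", "p3": None}, correct={"p1", "p2"})
print(good_run)
print(bad_run)
```

In the second run the two correct processes decide different values, so only the agreement check fails.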


Motivation


Motivation++

(Zurich, July 2010)


Crash Failure Detectors [CT96]


Strengthening Completeness


Guest Stars: P and Omega

• P: strong completeness, eventual strong accuracy
  – Eventually every process that crashes is permanently suspected by every correct process
  – There is a time after which correct processes are not suspected by any correct process
• Omega satisfies the following property:
  – There is a time after which all the correct processes always trust the same correct process
• What is a correct process?
  – It depends on the failure model :-)
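The two definitions above are related: once the suspect lists of P have stabilized, each process can obtain an Omega output by trusting the non-suspected process with the smallest identifier. The sketch below illustrates that standard construction in the crash model. The function name `omega_output`, the integer identifiers and the example suspect sets are assumptions made for illustration.

```python
from typing import Dict, List, Set


def omega_output(processes: List[int], suspected: Set[int]) -> int:
    """Trust the smallest-id process that is not currently suspected."""
    candidates = [p for p in processes if p not in suspected]
    # Assumption: if every process is suspected, fall back to the smallest id.
    return min(candidates) if candidates else min(processes)


processes = [1, 2, 3, 4]          # process 1 has crashed; 2, 3 and 4 are correct
# Before stabilization the suspect lists of the correct processes may disagree.
before: Dict[int, Set[int]] = {2: {1}, 3: {1, 2}, 4: set()}
# Afterwards every correct process suspects exactly the crashed process.
after: Dict[int, Set[int]] = {2: {1}, 3: {1}, 4: {1}}

leaders_before = {p: omega_output(processes, s) for p, s in before.items()}
leaders_after = {p: omega_output(processes, s) for p, s in after.items()}
print(leaders_before)
print(leaders_after)
```

Before stabilization the three correct processes trust three different leaders; afterwards they all trust process 2, which is correct.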


FD-based Consensus


Fault-tolerant Architecture


Outline

• Part I: Crash Environments
  – (Near-) Communication-efficient algorithms for P
  – Communication-optimal algorithms for P
• Part II: Crash-Recovery Environments
  – Implementing Omega with/without stable storage
  – Communication-efficient algorithms for Omega
  – From Omega to P
  – Fault-tolerant aggregator election and data aggregation in wireless sensor networks
• Part III: Omission Environments
  – Secure failure detection and consensus in TrustedPals
  – Communication-efficient algorithm for P


Part I:

P in Crash Environments

Joint work with Roberto Cortiñas, Alberto Lafuente, Iratxe Soraluze, Joachim Wieland


The First P Algorithm [CT96]
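A minimal sketch of the heartbeat-and-adaptive-timeout scheme that [CT96] describes for this detector class under partial synchrony: every process periodically sends a heartbeat to all others, a process suspects a peer whose heartbeat is overdue, and a heartbeat from a suspected peer cancels the suspicion and lengthens that peer's timeout. The class `HeartbeatDetector`, the discrete clock, the initial timeout of 2 ticks and the peer names are assumptions made for illustration.

```python
from typing import Dict, Iterable, Set


class HeartbeatDetector:
    """Suspect list kept by one monitoring process (discrete time)."""

    def __init__(self, peers: Iterable[str], initial_timeout: int = 2) -> None:
        self.timeout: Dict[str, int] = {q: initial_timeout for q in peers}
        self.last_heard: Dict[str, int] = {q: 0 for q in self.timeout}
        self.suspected: Set[str] = set()

    def on_heartbeat(self, q: str, now: int) -> None:
        self.last_heard[q] = now
        if q in self.suspected:
            # The suspicion was a mistake: retract it and wait longer next time.
            self.suspected.discard(q)
            self.timeout[q] += 1

    def tick(self, now: int) -> None:
        for q, heard in self.last_heard.items():
            if q not in self.suspected and now - heard > self.timeout[q]:
                self.suspected.add(q)


monitor = HeartbeatDetector(["slow", "dead"])
for now in range(1, 41):
    if now % 4 == 0:               # "slow" is correct but its heartbeats are 4 ticks apart
        monitor.on_heartbeat("slow", now)
    if now <= 2:                   # "dead" sends two heartbeats and then crashes
        monitor.on_heartbeat("dead", now)
    monitor.tick(now)

print(sorted(monitor.suspected), monitor.timeout)
```

In this run the slow peer is wrongly suspected once, after which its timeout grows from 2 to 3 ticks and the mistake is not repeated, while the crashed peer stays suspected.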


Part I. Summary of Results

• Efficient implementations of P
  – Nearly communication-efficient algorithms (n+C links are used forever)
    • Q-based, transformations
  – Communication-efficient algorithms (n links)
    • Pure ring-based, optimizations
• Optimal implementations of P
  – Communication-optimal algorithms (C links)
    • RBcast-based, one-to-one, one-to-all
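To put the link counts above side by side, the following sketch tabulates them for a few system sizes. It assumes that n is the total number of processes and that C is the number of correct processes, which the slide does not state explicitly. The all-to-all row, n(n-1) directed links, is added as a baseline for comparison and is not taken from the slide; `links_used_forever` is a hypothetical helper name.

```python
from typing import Dict


def links_used_forever(n: int, crashed: int) -> Dict[str, int]:
    """Number of directed links that carry messages forever, per algorithm family."""
    c = n - crashed  # assumption: C is the number of correct processes
    return {
        "all-to-all baseline, n(n-1)": n * (n - 1),
        "nearly communication-efficient, n+C": n + c,
        "communication-efficient, n": n,
        "communication-optimal, C": c,
    }


for n in (4, 16, 64):
    # Assume a quarter of the processes eventually crash.
    print(n, links_used_forever(n, crashed=n // 4))
```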


Reliable Broadcast [CT96]

“All correct processes deliver the same set of messages”
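A minimal sketch of the relay-on-first-receipt scheme given in [CT96] for Reliable Broadcast: a process that receives a message for the first time forwards it to everyone and then delivers it. The simulation below assumes reliable channels between correct processes and a sender that crashes after reaching only some of them. The function name and its parameters are illustrative assumptions.

```python
from collections import deque
from typing import Deque, Dict, List, Sequence, Tuple


def reliable_broadcast(processes: Sequence[str], sender: str, msg: str,
                       reached_before_crash: Sequence[str]) -> Dict[str, List[str]]:
    """Simulate one broadcast whose sender crashes after a partial send."""
    delivered: Dict[str, List[str]] = {p: [] for p in processes}
    crashed = {sender}
    # Messages in flight: only the copies sent before the crash.
    in_flight: Deque[Tuple[str, str]] = deque((q, msg) for q in reached_before_crash)
    while in_flight:
        p, m = in_flight.popleft()
        if p in crashed or m in delivered[p]:
            continue  # crashed processes do nothing; duplicates are ignored
        # First receipt: relay to every other process, then deliver.
        in_flight.extend((q, m) for q in processes if q != p)
        delivered[p].append(m)
    return delivered


procs = ["p1", "p2", "p3", "p4"]
partial = reliable_broadcast(procs, sender="p1", msg="m", reached_before_crash=["p2"])
nobody = reliable_broadcast(procs, sender="p1", msg="m", reached_before_crash=[])
print(partial)
print(nobody)
```

Either every correct process delivers the message (one correct process received it and relayed it) or none does, so the correct processes always end up with the same set.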


P in Crash Environments

• [WLL07] J. Wieland, M. Larrea, A. Lafuente. An evaluation of ring-based algorithms for the Eventually Perfect failure detector class. 15th International Conference on Parallel, Distributed and Network-based Processing, 2007

• [LSCL08] M. Larrea, I. Soraluze, R. Cortiñas, A. Lafuente. An Evaluation of Communication-Optimal P Algorithms. 16th International Conference on Parallel, Distributed and Network-based Processing, 2008


Part II:

Omega in Crash-Recovery Environments

Joint work with José Javier Astrain, Ernesto Jiménez, Cristian Martín, Iratxe Soraluze


Part II. Summary of Results

• Redefinition of Omega
  – Take into account unstable processes
  – Take into account the availability of stable storage
• Implementation of Omega
  – With and without stable storage
  – Efficient algorithms
• From Omega to P
• Fault-tolerant aggregator election and data aggregation in wireless sensor networks
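One way stable storage can help against unstable processes is an incarnation counter: each process increments a persisted counter on every recovery, and the leader is the process with the smallest counter, so a process that keeps crashing and recovering eventually loses the leadership. The sketch below illustrates that idea only. It is an assumption made for illustration, not the algorithm presented in the talk, and the file layout and the names `recover` and `leader` are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict


def recover(path: str) -> int:
    """Run at every (re)start: increment the incarnation number kept on disk."""
    incarnation = 0
    if os.path.exists(path):
        with open(path) as f:
            incarnation = json.load(f)["incarnation"]
    incarnation += 1
    with open(path, "w") as f:
        json.dump({"incarnation": incarnation}, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the counter survives a crash
    return incarnation


def leader(incarnations: Dict[str, int]) -> str:
    """Trust the process with the fewest recoveries; break ties by identifier."""
    return min(incarnations, key=lambda p: (incarnations[p], p))


restarts = {"p1": 5, "p2": 1, "p3": 2}   # p1 is unstable: it keeps crashing and recovering
incarnations: Dict[str, int] = {}
with tempfile.TemporaryDirectory() as storage:
    for p, times in restarts.items():
        for _ in range(times):
            incarnations[p] = recover(os.path.join(storage, p + ".json"))

print(incarnations, leader(incarnations))
```

Without stable storage the counter would be lost at each crash, which is one reason the two settings call for different algorithms.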


From Omega to P


Part III:

P in Omission Environments

Joint work with Roberto Cortiñas, Felix Freiling, Marjan Ghajar-Azadanlou, Alberto Lafuente, Lucia Penso, Iratxe Soraluze


Part III. Summary of Results

• Reduction from Byzantine to omission
  – Processes are equipped with tamper-proof security modules (e.g., smartcards)
• Actually, omission + buffering/timing attacks
• Omission models
  – send | receive | general
  – permanent | transient
  – non-selective | selective


Part III. Summary of Results

• Impossibility result
  – P is impossible to implement in the (transient) general omission model
• Redefinition and implementation of P
  – In-connected and out-connected processes
  – All-to-all communication, sequence numbers, connectivity matrix
• P-based Consensus
  – Termination: every in-connected process eventually decides
  – Adaptation of Chandra-Toueg’s algorithm
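The notions of in-connected and out-connected processes can be illustrated with a connectivity matrix and a reachability computation. The sketch below rests on assumed definitions that the slide does not spell out: `matrix[i][j]` is true when messages from process i reach process j, a process is in-connected if some correct process can reach it along such links, and out-connected if it can reach some correct process. The function names and the four-process example are hypothetical.

```python
from typing import List, Set, Tuple

Matrix = List[List[bool]]


def reachable_from(matrix: Matrix, sources: Set[int]) -> Set[int]:
    """All processes reachable from the sources by following working links."""
    seen, stack = set(sources), list(sources)
    while stack:
        i = stack.pop()
        for j, ok in enumerate(matrix[i]):
            if ok and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def classify(matrix: Matrix, correct: Set[int]) -> Tuple[Set[int], Set[int]]:
    n = len(matrix)
    in_connected = reachable_from(matrix, correct)
    # Reversing every link turns "can reach a correct process" into reachability.
    transposed = [[matrix[j][i] for j in range(n)] for i in range(n)]
    out_connected = reachable_from(transposed, correct)
    return in_connected, out_connected


T, F = True, False
# Processes 0 and 1 are correct. Process 2 omits everything it sends;
# process 3 omits everything it should receive.
matrix = [
    [F, T, T, F],
    [T, F, T, F],
    [F, F, F, F],
    [T, T, T, F],
]
in_connected, out_connected = classify(matrix, correct={0, 1})
print(sorted(in_connected), sorted(out_connected))
```

Under these assumptions process 2 is in-connected only, so it can still learn a decision, while process 3 is out-connected only.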


Thank you! [email protected]