UPV / EHU
Distributed Algorithms for Failure Detection and Consensus in Crash, Crash-Recovery and Omission Environments
Mikel Larrea
Distributed Systems Group
University of the Basque Country, UPV/EHU
Mikel Larrea − Mannheim, May 2011
Context and Seminal Papers
• In the Consensus problem, all correct processes propose a value and must reach a unanimous and irrevocable decision on some proposed value
• [FLP85] M. Fischer, N. Lynch, M. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 1985
• [CT96] T. Chandra, S. Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 1996
• [CHT96] T. Chandra, V. Hadzilacos, S. Toueg. The weakest failure detector for solving consensus. Journal of the ACM, 1996
Motivation
Motivation++
(Zurich, July 2010)
Crash Failure Detectors [CT96]
Strengthening Completeness
Guest Stars: P and Omega
• P (the Eventually Perfect class, ◇P in [CT96]): strong completeness, eventual strong accuracy
  – Eventually every process that crashes is permanently suspected by every correct process
  – There is a time after which correct processes are not suspected by any correct process
• Omega satisfies the following property:
  – There is a time after which all the correct processes always trust the same correct process
• What is a correct process?
  – It depends on the failure model :-)
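The two properties of P and the Omega property can be illustrated with a heartbeat monitor. What follows is a minimal sketch, not an algorithm taken from these slides: the class name `EventuallyPerfectDetector`, its methods, the doubling of the timeout, and the explicit `now` argument (used instead of a real clock) are all assumptions made for illustration. It presumes a crash-only model in which message delays are eventually bounded, so that timeouts eventually stop producing false suspicions.

```python
class EventuallyPerfectDetector:
    """Hypothetical per-process monitor for a crash-only model."""

    def __init__(self, peers, initial_timeout=1.0):
        self.timeout = {p: initial_timeout for p in peers}
        self.last_heard = {p: 0.0 for p in peers}
        self.suspected = set()

    def on_heartbeat(self, peer, now):
        self.last_heard[peer] = now
        if peer in self.suspected:
            # False suspicion: trust the peer again and wait longer next
            # time (aims at eventual strong accuracy).
            self.suspected.discard(peer)
            self.timeout[peer] *= 2

    def check(self, now):
        # A crashed peer stops sending heartbeats, so it becomes overdue
        # and stays suspected (strong completeness).
        for peer, heard in self.last_heard.items():
            if now - heard > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)

    def leader(self, self_id):
        # Omega-style output: trust the smallest id not currently suspected.
        candidates = [p for p in self.last_heard if p not in self.suspected]
        candidates.append(self_id)
        return min(candidates)


fd = EventuallyPerfectDetector(peers=[2, 3])
fd.on_heartbeat(2, now=0.5)
fd.on_heartbeat(3, now=0.5)
print(fd.check(now=2.0))      # both peers are overdue at this point
fd.on_heartbeat(2, now=2.1)   # peer 2 was only slow: its timeout doubles
print(fd.check(now=3.5))      # only peer 3 is still suspected
print(fd.leader(self_id=5))   # smallest trusted id
```

Once every correct process suspects exactly the crashed processes, all of them compute the same `leader`, which is how this sketch derives an Omega-style output from P in the crash model. In other failure models the notion of "correct" changes and this shortcut may not apply.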
FD-based Consensus
Fault-tolerant Architecture
Outline
• Part I: Crash Environments
  – (Near-) Communication-efficient algorithms for P
  – Communication-optimal algorithms for P
• Part II: Crash-Recovery Environments
  – Implementing Omega with/without stable storage
  – Communication-efficient algorithms for Omega
  – From Omega to P
  – Fault-tolerant aggregator election and data aggregation in wireless sensor networks
• Part III: Omission Environments
  – Secure failure detection and consensus in TrustedPals
  – Communication-efficient algorithm for P
Part I: P in Crash Environments
Joint work with Roberto Cortiñas, Alberto Lafuente, Iratxe Soraluze, Joachim Wieland
The First P Algorithm [CT96]
Part I. Summary of Results
• Efficient implementations of P
  – Nearly communication-efficient algorithms (n+C links are used forever)
    • Q-based, transformations
  – Communication-efficient algorithms (n links)
    • Pure ring-based, optimizations
• Optimal implementations of P
  – Communication-optimal algorithms (C links)
    • RBcast-based, one-to-one, one-to-all
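The link counts above can be compared with a little arithmetic. This is a sketch under stated assumptions: C is read here as the number of correct processes, which the slide does not spell out, and the all-to-all baseline of C·(n−1) links (every correct process keeps sending to every other process) is added for comparison and is not a figure from the slides. The function name `links_used_forever` is hypothetical.

```python
def links_used_forever(n, c):
    """Links that carry messages forever, for n processes, c of them correct."""
    if not 0 < c <= n:
        raise ValueError("need 0 < c <= n")
    return {
        # Assumed baseline, not from the slides: all-to-all heartbeats.
        "all-to-all baseline": c * (n - 1),
        # The three classes named in the summary above.
        "nearly communication-efficient": n + c,
        "communication-efficient": n,
        "communication-optimal": c,
    }


for name, count in links_used_forever(n=10, c=7).items():
    print(f"{name}: {count}")
```

With n = 10 and C = 7, the assumed baseline keeps 63 links busy, against 17, 10 and 7 for the three classes.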
Reliable Broadcast [CT96]
“All correct processes deliver the same set of messages”
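One common way to obtain this property in a crash model is message diffusion: a process relays each message to everyone the first time it receives it, and only then delivers it. The sketch below simulates that idea in memory. The `Network` and `Process` classes, their methods, and the single FIFO queue standing in for reliable links are assumptions made for illustration, not code from the talk.

```python
from collections import deque


class Network:
    """In-memory stand-in for reliable point-to-point links (hypothetical)."""

    def __init__(self, n):
        self.procs = {pid: Process(pid, self) for pid in range(n)}
        self.queue = deque()

    def send(self, dest, msg):
        self.queue.append((dest, msg))

    def run(self):
        # Hand out queued messages until none are left.
        while self.queue:
            dest, msg = self.queue.popleft()
            self.procs[dest].on_receive(msg)


class Process:
    def __init__(self, pid, network):
        self.pid = pid
        self.network = network
        self.seen = set()
        self.delivered = []
        self.crashed = False

    def rbroadcast(self, msg):
        if self.crashed:
            return
        for dest in self.network.procs:
            self.network.send(dest, msg)

    def on_receive(self, msg):
        if self.crashed or msg in self.seen:
            return
        self.seen.add(msg)
        # Relay first, so the other correct processes still get msg even
        # if the original sender crashed partway through its broadcast.
        for dest in self.network.procs:
            if dest != self.pid:
                self.network.send(dest, msg)
        self.delivered.append(msg)


net = Network(3)
net.send(1, "m")               # process 0 reached only process 1 ...
net.procs[0].crashed = True    # ... and then crashed
net.run()
print([p.delivered for p in net.procs.values()])
```

In this run the sender crashes after reaching a single process, yet both surviving processes deliver "m" because process 1 relays it.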
P in Crash Environments
• [WLL07] J. Wieland, M. Larrea, A. Lafuente. An evaluation of ring-based algorithms for the Eventually Perfect failure detector class. 15th International Conference on Parallel, Distributed and Network-based Processing, 2007
• [LSCL08] M. Larrea, I. Soraluze, R. Cortiñas, A. Lafuente. An Evaluation of Communication-Optimal P Algorithms. 16th International Conference on Parallel, Distributed and Network-based Processing, 2008
Part II: Omega in Crash-Recovery Environments
Joint work with José Javier Astrain, Ernesto Jiménez, Cristian Martín, Iratxe Soraluze
Part II. Summary of Results
• Redefinition of Omega
  – Take into account unstable processes
  – Take into account the availability of stable storage
• Implementation of Omega
  – With and without stable storage
  – Efficient algorithms
• From Omega to P
• Fault-tolerant aggregator election and data aggregation in wireless sensor networks
From Omega to P
Part III: P in Omission Environments
Joint work with Roberto Cortiñas, Felix Freiling, Marjan Ghajar-Azadanlou, Alberto Lafuente, Lucia Penso, Iratxe Soraluze
Part III. Summary of Results
• Reduction from Byzantine to omission
  – Processes are equipped with tamper-proof security modules (e.g., smartcards)
    • Actually, omission + buffering/timing attacks
• Omission models
  – send | receive | general
  – permanent | transient
  – non-selective | selective
Part III. Summary of Results (cont.)
• Impossibility result
  – P is impossible to implement in the (transient) general omission model
• Redefinition and implementation of P
  – In-connected and out-connected processes
  – All-to-all communication, sequence numbers, connectivity matrix
• P-based Consensus
  – Termination: every in-connected process eventually decides
  – Adaptation of Chandra-Toueg’s algorithm
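The slides name in-connected and out-connected processes and a connectivity matrix without defining them. The sketch below shows one plausible, simplified reading and is not the authors' definition: `matrix[p][q]` is assumed to be true when messages from p to q get through, `core` is a hypothetical set of processes assumed to suffer no omissions, a process counts as in-connected if some core process can reach it (directly or through relays), and as out-connected if it can reach some core process. The function names are illustrative.

```python
from collections import deque


def reachable_from(matrix, src):
    """Processes that src can reach, directly or through relays."""
    seen = {src}
    todo = deque([src])
    while todo:
        p = todo.popleft()
        for q, ok in enumerate(matrix[p]):
            if ok and q not in seen:
                seen.add(q)
                todo.append(q)
    return seen


def classify(matrix, core):
    """Return (in_connected, out_connected) under the assumed reading."""
    n = len(matrix)
    reach = [reachable_from(matrix, p) for p in range(n)]
    in_connected = {p for p in range(n) if any(p in reach[c] for c in core)}
    out_connected = {p for p in range(n) if any(c in reach[p] for c in core)}
    return in_connected, out_connected


# Process 2 suffers receive omissions: nothing sent to it arrives,
# but its own messages to process 0 do get through.
matrix = [
    [True, True, False],
    [True, True, False],
    [True, False, True],
]
in_conn, out_conn = classify(matrix, core={0, 1})
print(sorted(in_conn), sorted(out_conn))
```

Under this reading process 2 is out-connected but not in-connected, so the termination property above would not require it to decide.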
Thank you
[email protected]