1
Byzantine Fault Tolerance
CS 425: Distributed Systems, Fall 2011
Material derived from slides by I. Gupta and N. Vaidya
2
Reading List
• L. Lamport, R. Shostak, M. Pease, “The Byzantine Generals Problem,” ACM TOPLAS 1982.
• M. Castro and B. Liskov, “Practical Byzantine Fault Tolerance,” OSDI 1999.
Byzantine Generals Problem
A sender wants to send a message to n−1 other peers, such that:
• All fault-free nodes agree on the same value
• If the sender is fault-free, all fault-free nodes agree on its message
• Up to f nodes may fail
5
Byzantine Generals Algorithm
[Diagram sequence, slides 5–10: fault-free source S sends value v to peers 1, 2, and 3; peer 3 is faulty. Each peer relays the value it received to the other two peers; faulty peer 3 relays arbitrary values (shown as ?). Good peers 1 and 2 each collect the vector [v, v, ?], and a majority vote yields the correct result v at both good peers.]
11
Byzantine Generals Algorithm: faulty source
[Diagram sequence, slides 11–15: faulty source S sends different values v, w, and x to peers 1, 2, and 3. Each peer relays its received value to the other two, so every peer collects the same vector [v, w, x]. The vote result is therefore identical at all good peers.]
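The exchange in these slides can be sketched as a small simulation. This is a minimal sketch of one relay round with n = 4 and f = 1; the function name, the `faulty` mapping, and the data layout are my own, not the lecture's:

```python
from collections import Counter

def byzantine_round(source_value, faulty, n=4):
    """One relay round: the source (node 0) sends to peers 1..n-1, each
    peer relays what it received to the other peers, and each good peer
    decides by majority vote. `faulty` maps a faulty node id to the
    (possibly different) values it sends to each peer."""
    peers = list(range(1, n))
    # Step 1: the source sends its value; a faulty source may send
    # a different value to every peer.
    received = {p: (faulty[0][p] if 0 in faulty else source_value) for p in peers}
    # Steps 2-3: peers relay, then each good peer votes over its vector.
    decisions = {}
    for p in peers:
        if p in faulty:
            continue                           # faulty peers need not decide
        vector = []
        for q in peers:
            if q == p:
                vector.append(received[p])     # value p got from the source
            elif q in faulty:
                vector.append(faulty[q][p])    # faulty peer relays garbage
            else:
                vector.append(received[q])     # honest relay
        decisions[p] = Counter(vector).most_common(1)[0][0]
    return decisions

# Slides 5-10: fault-free source sends v; peer 3 relays '?' instead of v.
print(byzantine_round('v', faulty={3: {1: '?', 2: '?'}}))   # {1: 'v', 2: 'v'}

# Slides 11-15: faulty source sends v, w, x; every good peer collects the
# identical vector [v, w, x], so the (deterministic) vote agrees everywhere.
print(byzantine_round(None, faulty={0: {1: 'v', 2: 'w', 3: 'x'}}))
```

In the first case the vote filters out the faulty peer's noise; in the second the source is faulty, so no single "correct" value exists, but all good peers still decide the same value because they vote over identical vectors.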
16
Known Results
• Need 3f + 1 nodes to tolerate f failures
• Need Ω(n²) messages in general
17
Ω(n²) Message Complexity
• Each message carries at least 1 bit
• So Ω(n²) bits of “communication complexity” are needed to agree on even a 1-bit value
18
Practical Byzantine Fault Tolerance
• Computer systems provide crucial services
• Computer systems fail
  – Crash-stop failure
  – Crash-recovery failure
  – Byzantine failure
• Examples: natural disaster, malicious attack, hardware failure, software bug, etc.
• Need a highly available service → replicate to increase availability
19
Challenges
[Diagram: two clients concurrently send Request A and Request B to the replicated service.]
20
Requirements
• All replicas must handle the same requests despite failures
• Replicas must handle requests in an identical order despite failures
21
Challenges
[Diagram: the two client requests reach different replicas in different orders (1: Request A then 2: Request B at some replicas, the reverse at others).]
22
State Machine Replication
[Diagram: every replica executes the requests in the same order: 1: Request A, then 2: Request B.]
How to assign sequence numbers to requests?
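The state-machine idea can be illustrated with a short sketch. The `Replica` class and the request format are my own, for illustration only:

```python
class Replica:
    """A deterministic state machine: an identical initial state plus an
    identical request order yields an identical final state everywhere."""
    def __init__(self):
        self.state = {}

    def execute(self, request):
        op, key, value = request
        if op == "put":
            self.state[key] = value
        return self.state.get(key)

# The same totally ordered log (sequence numbers fix the order) at every replica:
log = [("put", "x", 1), ("put", "y", 2), ("put", "x", 3)]
replicas = [Replica() for _ in range(4)]
for request in log:
    for r in replicas:
        r.execute(request)

# All four replicas end in the same state.
assert all(r.state == replicas[0].state for r in replicas)
print(replicas[0].state)   # {'x': 3, 'y': 2}
```

The hard part, which PBFT solves, is agreeing on that total order in the first place when up to f replicas are Byzantine.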
23
Primary Backup Mechanism (View 0)
[Diagram: clients send requests to the primary, which assigns sequence numbers (1: Request A, 2: Request B) and forwards them to the backup replicas.]
What if the primary is faulty?
• Agreeing on the sequence numbers
• Agreeing on changing the primary (view change)
24
Normal Case Operation
• Three-phase algorithm:
  – PRE-PREPARE picks the order of requests
  – PREPARE ensures order within views
  – COMMIT ensures order across views
• Replicas remember messages in a log
• Messages are authenticated
  – ⟨·⟩σk denotes a message signed by replica k
25
Pre-prepare Phase
[Diagram: a client sends request m; the primary (replica 0) multicasts ⟨PRE-PREPARE, v, n, m⟩σ0 to replicas 1–3; replica 3 is faulty.]
26
Prepare Phase
[Diagram, slides 26–28: each replica that accepts the PRE-PREPARE for request m multicasts ⟨PREPARE, v, n, D(m), i⟩σi (e.g., replica 1 sends ⟨PREPARE, v, n, D(m), 1⟩σ1); replica 3 is faulty. A replica is prepared once it has collected the PRE-PREPARE plus 2f matching PREPARE messages.]
29
Commit Phase
[Diagram, slides 29–30: prepared replicas multicast ⟨COMMIT, v, n, D(m)⟩σi (e.g., replica 2 sends ⟨COMMIT, v, n, D(m)⟩σ2); replica 3 is faulty. When a replica collects 2f + 1 matching COMMIT messages, it executes the request and replies to the client.]
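One replica's quorum checks for these phases can be sketched as follows. This is a simplified sketch, not the paper's code: signatures are omitted, the digest D(m) is a placeholder string, and the sender id is dropped from the message tuples:

```python
f = 1
n = 3 * f + 1          # 4 replicas tolerate f = 1 Byzantine fault

def prepared(log, v, seq, digest):
    """Prepared: the log holds the PRE-PREPARE plus 2f matching PREPAREs."""
    has_pre = ("PRE-PREPARE", v, seq, digest) in log
    prepares = sum(1 for m in log
                   if m[0] == "PREPARE" and m[1:] == (v, seq, digest))
    return has_pre and prepares >= 2 * f

def committed_local(log, v, seq, digest):
    """Committed-local: prepared, plus 2f + 1 matching COMMITs collected.
    Only then may the replica execute the request and reply to the client."""
    commits = sum(1 for m in log
                  if m[0] == "COMMIT" and m[1:] == (v, seq, digest))
    return prepared(log, v, seq, digest) and commits >= 2 * f + 1

# A replica's log for request m in view v = 0, sequence number 1
# (replica 3 is faulty and silent, so only 2f PREPAREs and 2f+1 COMMITs arrive):
log = [("PRE-PREPARE", 0, 1, "D(m)")] \
    + [("PREPARE", 0, 1, "D(m)")] * 2 \
    + [("COMMIT", 0, 1, "D(m)")] * 3
print(prepared(log, 0, 1, "D(m)"), committed_local(log, 0, 1, "D(m)"))  # True True
```

Note that the quorums still fill even with one faulty replica contributing nothing, which is exactly why n = 3f + 1 replicas are needed.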
31
View Change
• Provides liveness when the primary fails
  – Timeouts trigger view changes
  – New primary = view number mod (3f + 1)
• Brief protocol
  – Replicas send a VIEW-CHANGE message along with the requests they have prepared so far
  – The new primary collects 2f + 1 VIEW-CHANGE messages
  – It constructs information about requests committed in previous views
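The rotation and quorum rules above can be sketched as follows (the function names and message tuples are my own; the prepared-request certificates carried inside real VIEW-CHANGE messages are omitted):

```python
f = 1
n = 3 * f + 1

def primary(view):
    # The primary rotates deterministically with the view number.
    return view % n

def can_install_view(messages, new_view):
    """The new primary needs 2f + 1 VIEW-CHANGE messages for `new_view`
    (in the full protocol each carries the sender's prepared requests)."""
    matching = [m for m in messages
                if m[0] == "VIEW-CHANGE" and m[1] == new_view]
    return len(matching) >= 2 * f + 1

# Replicas 0, 1, 2 time out on the primary of view 0 and vote for view 1:
msgs = [("VIEW-CHANGE", 1, sender) for sender in (0, 1, 2)]
print(primary(1), can_install_view(msgs, 1))   # 1 True
```

Because the rotation is deterministic, every replica independently knows who the next primary is; no extra election round is needed.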
32
View Change Safety
• Goal: no two different requests committed with the same sequence number across views
[Diagram: any quorum for a Committed Certificate (m, v, n) intersects any View Change Quorum in at least one correct replica, which holds a Prepared Certificate (m, v, n).]
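The intersection argument on this slide is just counting; a quick check of the arithmetic (my own illustration, not from the slides):

```python
# With n = 3f + 1 replicas, both the commit quorum and the view-change
# quorum have size 2f + 1. Any two such quorums overlap in at least
# 2(2f + 1) - (3f + 1) = f + 1 replicas; at most f of those are faulty,
# so at least one correct replica is in both quorums and carries the
# Prepared Certificate (m, v, n) into the new view.
for f in range(1, 6):
    n = 3 * f + 1
    quorum = 2 * f + 1
    min_overlap = 2 * quorum - n
    assert min_overlap == f + 1           # guaranteed overlap size
    assert min_overlap - f >= 1           # at least one correct replica
print("quorum intersection holds for f = 1..5")
```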
33
Related Works: Fault Tolerance
• Fail-stop fault tolerance
  – Paxos, 1989 (TR)
  – VS Replication, PODC 1988
• Byzantine fault tolerance
  – Byzantine agreement: Rampart (TPDS 1995), SecureRing (HICSS 1998), PBFT (OSDI ’99), BASE (TOCS ’03)
  – Byzantine quorums: Malkhi-Reiter (JDC 1998), Phalanx (SRDS 1998), Fleet (ToKDI ’00), Q/U (SOSP ’05)
  – Hybrid quorum: HQ Replication (OSDI ’06)