Post on 23-Jan-2016
![Page 1: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/1.jpg)
SRG
PeerReview: Practical Accountability for Distributed Systems
Andreas Haeberlen, Petr Kouznetsov, and Peter Druschel
SOSP’07
![Page 2: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/2.jpg)
Problems
How to:
Detect Byzantine faults whose effects are observed by a correct node.
Link faults to faulty nodes.
Defend correct nodes against false accusations.
![Page 3: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/3.jpg)
Accountability
Use accountability to detect and expose node faults.
Maintain a tamper-evident record that captures all actions of each node.
Detect a faulty node when its behavior deviates from that of a correct node.
![Page 4: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/4.jpg)
Limitations of current systems
Designed for a specific type of fault or for a specific application.
Based on many strong assumptions.
Do not provide verifiable evidence of misbehavior.
Use a formal specification of the system to check for misbehavior.
Can only detect faulty nodes that misbehave repeatedly.
![Page 5: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/5.jpg)
Overview
Model a node as a deterministic state machine.
Each node keeps a secure log that records all sent and received messages, all inputs and outputs.
To check a node j, node i will:
Get j’s log.
Replay j’s log using a reference implementation.
Compare the results.
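The three steps above can be illustrated with a minimal sketch. It assumes a toy state machine (`RunningSum`) and a hypothetical log format of `("in", value)` and `("out", value)` pairs; neither comes from the slides, and a real log would also carry hashes and signatures.

```python
from typing import Callable, List, Tuple

# Hypothetical log format for this sketch: ("in", v) is an input, ("out", v) an output.
LogEntry = Tuple[str, int]


class RunningSum:
    """Toy deterministic state machine: each input yields the running sum as output."""

    def __init__(self) -> None:
        self.total = 0

    def step(self, value: int) -> int:
        self.total += value
        return self.total


def replay_matches(log: List[LogEntry], make_reference: Callable[[], RunningSum]) -> bool:
    """Replay the logged inputs on a fresh reference instance and compare outputs."""
    machine = make_reference()
    pending: List[int] = []  # outputs the reference produced, not yet seen in the log
    for kind, value in log:
        if kind == "in":
            pending.append(machine.step(value))
        elif kind == "out":
            # The logged output must equal the next output the reference produced.
            if not pending or pending.pop(0) != value:
                return False
        else:
            return False  # unknown entry type
    # In this toy model every input must have its output logged.
    return not pending


honest_log = [("in", 2), ("out", 2), ("in", 3), ("out", 5)]
faulty_log = [("in", 2), ("out", 2), ("in", 3), ("out", 6)]
```

Here `replay_matches(honest_log, RunningSum)` succeeds, while the faulty log fails because the logged output 6 differs from the reference output 5.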
![Page 6: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/6.jpg)
The problem of detection
Ideal completeness: a faulty node should be exposed by all correct nodes.
Ideal accuracy: no correct node is ever exposed by a correct node (no false positives).
![Page 7: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/7.jpg)
Types of faults that can be detected
Available data: messages sent and received among nodes.
Can only detect faults that manifest themselves through messages.
Can only detect faults that are observed by a correct node.
Need to consider:
Verifiability of outputs.
Missing and long-delayed messages.
![Page 8: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/8.jpg)
Problem statement
Terms:
Detectably faulty, detectably ignorant.
Accomplices (of i): nodes that send messages caused by an incorrect message sent by i.
Completeness:
Eventually, every detectably ignorant node is suspected forever by every correct node.
If node i is detectably faulty, then eventually, some faulty accomplice of i is exposed or suspected forever by every correct node.
![Page 9: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/9.jpg)
Problem statement (cont)
Accuracy:
No correct node is forever suspected by a correct node.
No correct node is ever exposed by a correct node.
![Page 10: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/10.jpg)
System model
Failure indications: exposed(j), suspected(j), trusted(j)
![Page 11: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/11.jpg)
Assumptions
The state machines Si are deterministic.
A message sent from a correct node to another is eventually received.
Use a hash function H() that is: pre-image resistant, second pre-image resistant, and collision resistant.
Each node has a unique identifier. Nodes can sign messages, and faulty nodes cannot forge the signature of a correct node.
![Page 12: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/12.jpg)
Assumptions (cont)
Each node has access to a reference implementation of all Si. The implementation can take a snapshot and can be initialized from a snapshot.
A function ω maps each node to its set of witnesses. The set {i} ∪ ω(i) contains at least one correct node.
![Page 13: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/13.jpg)
Tamper-evident logs
Log entry: $e_k = (s_k, t_k, c_k)$
Hash value: $h_k = H(h_{k-1} \,\|\, s_k \,\|\, t_k \,\|\, H(c_k))$
Authenticator: $\alpha_k^j = \sigma_j(s_k, h_k)$
If a prefix of a node’s log does not match the hash value, then that node is faulty.
![Page 14: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/14.jpg)
Tamper-evident logs
$\alpha_k^j$ can be used to check whether j’s log contains $e_k$.
To inspect x entries of j’s log: i challenges j to return $e_{k-(x-1)}, \ldots, e_k$ and $h_{k-x}$.
i calculates $h_k$ and compares it with the value in the authenticator.
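The inspection step can be sketched as a hash-chain recomputation. This is a minimal illustration, not the authors' code: SHA-256 stands in for H, the byte encoding of the sequence number and type is a choice made here, the all-zero base hash is an assumption, and signatures are left out so that only the chaining is shown.

```python
import hashlib
from typing import List, Tuple

Entry = Tuple[int, str, bytes]  # (s_k, t_k, c_k): sequence number, type, content

H0 = b"\x00" * 32  # assumed base hash for the first entry (not specified on the slides)


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def chain_hash(prev_hash: bytes, entry: Entry) -> bytes:
    """h_k = H(h_{k-1} || s_k || t_k || H(c_k)), with an encoding chosen for this sketch."""
    s, t, c = entry
    return H(prev_hash + s.to_bytes(8, "big") + t.encode() + H(c))


def build_hashes(entries: List[Entry], base: bytes = H0) -> List[bytes]:
    """Return [h_1, ..., h_n] for the given entries."""
    hashes, prev = [], base
    for entry in entries:
        prev = chain_hash(prev, entry)
        hashes.append(prev)
    return hashes


def verify_segment(h_before: bytes, segment: List[Entry], h_claimed: bytes) -> bool:
    """Recompute h_k from h_{k-x} and the x returned entries, then compare."""
    h = h_before
    for entry in segment:
        h = chain_hash(h, entry)
    return h == h_claimed


log = [(1, "SEND", b"a"), (2, "RECV", b"b"), (3, "SEND", b"c"), (4, "RECV", b"d")]
hashes = build_hashes(log)

# Inspect the last three entries against the hash committed for entry 4.
segment_ok = verify_segment(hashes[0], log[1:], hashes[3])

# A node that rewrites entry 3 after the fact no longer matches the committed hash.
tampered = [log[1], (3, "SEND", b"X"), log[3]]
segment_tampered = verify_segment(hashes[0], tampered, hashes[3])
```

Because each hash covers the previous one, changing any entry in the segment changes every later hash, which is what makes the log tamper-evident.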
![Page 15: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/15.jpg)
Commitment protocol
Ensures that a node cannot add an entry for a message it has never received, and that a node’s log is complete.
When i sends a message m to j:
i creates the log entry $(s_k, \mathrm{SEND}, \{j, m\})$, attaches $h_{k-1}$, $s_k$ and $\sigma_i(s_k \,\|\, h_k)$ to m, and sends m.
j calculates $h_k$ and checks the signature; if it is valid, j creates the log entry $(s_l, \mathrm{RECV}, \{i, m\})$ and returns an ACK to i with $h_{l-1}$, $s_l$ and $\sigma_j(s_l \,\|\, h_l)$.
i verifies the signature in the ACK and sends a challenge to j’s witnesses if the signature is not valid.
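The exchange above can be sketched end to end as follows. Several things here are assumptions made for illustration: HMAC with a per-node key stands in for the signature σ (so the verifier needs the signer's key, unlike a real public-key signature), the `Node` class and its method names are hypothetical, and the base hash is an all-zero constant.

```python
import hashlib
import hmac


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def entry_hash(prev: bytes, seq: int, kind: str, content: bytes) -> bytes:
    # h_k = H(h_{k-1} || s_k || t_k || H(c_k)), encoding chosen for this sketch
    return H(prev + seq.to_bytes(8, "big") + kind.encode() + H(content))


def sign(key: bytes, seq: int, h: bytes) -> bytes:
    # HMAC is only a stand-in for sigma(s || h); it is NOT a public-key signature.
    return hmac.new(key, seq.to_bytes(8, "big") + h, hashlib.sha256).digest()


class Node:
    def __init__(self, name: str, key: bytes) -> None:
        self.name, self.key = name, key
        self.log = []              # entries as (seq, kind, content)
        self.top = b"\x00" * 32    # hash of the latest entry (assumed base value)

    def _append(self, kind: str, content: bytes):
        prev, seq = self.top, len(self.log) + 1
        self.top = entry_hash(prev, seq, kind, content)
        self.log.append((seq, kind, content))
        return prev, seq, sign(self.key, seq, self.top)

    def send(self, dest: str, m: bytes):
        """Log the SEND entry; return (h_{k-1}, s_k, signature) to attach to m."""
        return self._append("SEND", dest.encode() + b"|" + m)

    def receive(self, src: str, m: bytes, attached, src_key: bytes):
        """Recompute h_k, check the sender's signature, then log RECV and build the ACK."""
        prev, seq, sig = attached
        h_k = entry_hash(prev, seq, "SEND", self.name.encode() + b"|" + m)
        if not hmac.compare_digest(sig, sign(src_key, seq, h_k)):
            return None  # invalid signature: no RECV entry, no ACK
        return self._append("RECV", src.encode() + b"|" + m)


def ack_is_valid(ack, src: str, m: bytes, dest_key: bytes) -> bool:
    """Sender-side check of the ACK fields (h_{l-1}, s_l, signature)."""
    prev, seq, sig = ack
    h_l = entry_hash(prev, seq, "RECV", src.encode() + b"|" + m)
    return hmac.compare_digest(sig, sign(dest_key, seq, h_l))


i, j = Node("i", b"key-i"), Node("j", b"key-j")
attached = i.send("j", b"hello")
ack = j.receive("i", b"hello", attached, i.key)
```

After this exchange each side holds a signed statement about the other's log entry, so neither can later deny the message without contradicting its own authenticator.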
![Page 16: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/16.jpg)
Consistency protocol
A faulty node can try to hide its misbehavior by keeping more than one log or a log with multiple branches.
![Page 17: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/17.jpg)
Consistency protocol
If i receives authenticators from j, it must eventually forward those authenticators to j’s witnesses.
Periodically, each of j’s witnesses ω challenges j to return its log entries from k to l, and then checks them for consistency.
Finally, ω extracts all authenticators that j has received from other nodes and sends them to the corresponding witness sets.
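A witness's consistency check can be sketched as comparing every collected authenticator against the hash chain of the log segment that j returns. This is a simplified illustration: entries are reduced to `(seq, content)` pairs, signatures are omitted, and the function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Entry = Tuple[int, bytes]          # simplified entry: (sequence number, content)
Authenticator = Tuple[int, bytes]  # simplified authenticator: (sequence number, hash)


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def hashes_by_seq(h_before: bytes, segment: List[Entry]) -> Dict[int, bytes]:
    """Recompute the hash chain over a returned log segment."""
    result, h = {}, h_before
    for seq, content in segment:
        h = H(h + seq.to_bytes(8, "big") + H(content))
        result[seq] = h
    return result


def find_inconsistent(auths: List[Authenticator], h_before: bytes,
                      segment: List[Entry]) -> List[int]:
    """Sequence numbers whose authenticator disagrees with the returned segment."""
    chain = hashes_by_seq(h_before, segment)
    return [seq for seq, h in auths if chain.get(seq) != h]


h0 = bytes(32)  # assumed base hash

# A faulty node keeps two branches that diverge at entry 2.
branch_a = [(1, b"a"), (2, b"b"), (3, b"c")]
branch_b = [(1, b"a"), (2, b"x"), (3, b"c")]
hashes_a = hashes_by_seq(h0, branch_a)
hashes_b = hashes_by_seq(h0, branch_b)

# It handed out an authenticator from each branch to different peers.
collected = [(2, hashes_a[2]), (3, hashes_b[3])]

# Whichever branch it returns to the witness, one authenticator will not match.
conflicts = find_inconsistent(collected, h0, branch_a)
```

In this example `conflicts` contains sequence number 3, since that authenticator was issued on the other branch. A single linear log can never satisfy authenticators taken from two diverging branches.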
![Page 18: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/18.jpg)
Audit protocol
Checks whether a node’s behavior is consistent with its reference implementation.
Each witness ω of i will:
Look up the most recent authenticator of i.
Challenge i to return all log entries since the last audit, and add them to $\lambda_{\omega i}$.
Create an instance of i’s reference implementation and initialize it from the most recent snapshot.
Replay all the inputs and compare the outputs.
Expose i if the outputs are not equal.
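The audit steps can be sketched with a witness that keeps a local copy of the log and resumes from its last snapshot on each audit. The `Counter` state machine, the `("RECV", v)` / `("SEND", v)` entry format, and the class names are assumptions for illustration. The sketch also assumes that every input's output is logged before the audit batch ends.

```python
from typing import List, Tuple

Entry = Tuple[str, int]  # hypothetical format: ("RECV", input) or ("SEND", output)


class Counter:
    """Toy deterministic reference implementation with snapshot support."""

    def __init__(self, snapshot: int = 0) -> None:
        self.value = snapshot  # initialize from a snapshot

    def snapshot(self) -> int:
        return self.value

    def handle(self, msg: int) -> int:
        self.value += msg
        return self.value * 2


class Witness:
    def __init__(self) -> None:
        self.local_log: List[Entry] = []  # witness's local copy of the audited log
        self.last_snapshot = 0            # state reached at the end of the last audit
        self.exposed = False

    def audit(self, new_entries: List[Entry]) -> bool:
        """Replay entries received since the last audit; expose on any mismatch."""
        self.local_log.extend(new_entries)
        machine = Counter(self.last_snapshot)
        expected: List[int] = []  # outputs the reference produced, awaiting a SEND entry
        for kind, value in new_entries:
            if kind == "RECV":
                expected.append(machine.handle(value))
            elif kind == "SEND":
                if not expected or expected.pop(0) != value:
                    self.exposed = True
                    return False
        self.last_snapshot = machine.snapshot()
        return True


w = Witness()
first_ok = w.audit([("RECV", 1), ("SEND", 2), ("RECV", 2), ("SEND", 6)])
second_ok = w.audit([("RECV", 4), ("SEND", 15)])  # reference would output 14
```

Starting from the snapshot means the second audit only replays the new entries instead of the whole log.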
![Page 19: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/19.jpg)
Challenge/response protocol
Audit challenge:
Consists of two authenticators $\alpha_k^i$ and $\alpha_l^i$ (k < l).
i’s log must contain the entries $e_k, \ldots, e_l$; otherwise i is faulty.
If i is correct, it returns the corresponding log segment.
![Page 20: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/20.jpg)
Challenge/response protocol
Send challenge:
Consists of the message m with all needed information attached.
i must acknowledge m; otherwise it is faulty.
If i has not yet received m, it accepts m and returns an ACK.
If i has already received m, it just resends the ACK.
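The two cases of the send challenge amount to idempotent message handling, sketched below. The `Receiver` class, the `msg_id` key, and the string form of the ACK are hypothetical; a real ACK would carry the signed fields from the commitment protocol.

```python
from typing import Dict, List, Tuple


class Receiver:
    def __init__(self) -> None:
        self.log: List[Tuple[str, str, bytes]] = []
        self.acks: Dict[str, str] = {}  # message id -> ACK issued earlier

    def on_send_challenge(self, msg_id: str, payload: bytes) -> str:
        """Answer a send challenge for message msg_id."""
        if msg_id in self.acks:
            # Already received: resend the same ACK, do not log the message twice.
            return self.acks[msg_id]
        # Not yet received: accept the message, log it, and acknowledge it.
        self.log.append(("RECV", msg_id, payload))
        ack = f"ACK:{msg_id}:{len(self.log)}"
        self.acks[msg_id] = ack
        return ack


r = Receiver()
first_ack = r.on_send_challenge("m1", b"hello")
second_ack = r.on_send_challenge("m1", b"hello")
```

Both calls return the same ACK, and the log holds a single RECV entry, so a repeated challenge cannot inflate a correct node's log.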
![Page 21: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/21.jpg)
Evidence transfer protocol
Ensures that all correct nodes eventually collect the same evidence against faulty nodes.
Every node i periodically fetches the challenges collected by the witnesses of every other node j.
If a correct node i obtains a challenge for j, i indicates suspected(j). When i receives a message from j, i challenges j.
If i receives valid answers to all pending challenges for j, i indicates trusted(j).
If i obtains a proof of misbehavior by j, i indicates exposed(j).
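The rules above describe how node i derives its indication for another node j from the evidence it holds. A minimal sketch, assuming hypothetical class and method names and opaque challenge ids:

```python
from enum import Enum
from typing import Set


class Indication(Enum):
    TRUSTED = "trusted"
    SUSPECTED = "suspected"
    EXPOSED = "exposed"


class PeerView:
    """Node i's evidence about one other node j."""

    def __init__(self) -> None:
        self.pending: Set[str] = set()  # challenges for j that are still unanswered
        self.has_proof = False          # a proof of misbehavior by j was obtained

    def add_challenge(self, challenge_id: str) -> None:
        self.pending.add(challenge_id)

    def add_valid_response(self, challenge_id: str) -> None:
        self.pending.discard(challenge_id)

    def add_proof_of_misbehavior(self) -> None:
        self.has_proof = True

    @property
    def indication(self) -> Indication:
        if self.has_proof:
            return Indication.EXPOSED    # in this sketch, exposure is never revoked
        if self.pending:
            return Indication.SUSPECTED  # at least one unanswered challenge
        return Indication.TRUSTED


view = PeerView()
states = [view.indication]
view.add_challenge("c1")
states.append(view.indication)
view.add_valid_response("c1")
states.append(view.indication)
view.add_proof_of_misbehavior()
states.append(view.indication)
```

The recorded sequence is trusted, suspected, trusted, exposed: suspicion is lifted once every pending challenge is answered, while a proof of misbehavior takes precedence over everything else.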
![Page 22: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/22.jpg)
Overhead
Signing messages.
Extra messages to implement the protocols.
Taking snapshots of nodes.
Replaying nodes’ executions.
![Page 23: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/23.jpg)
Extension
$P_f$: probability that an all-faulty witness set exists.
$P_m$: probability that a given instance of misbehavior remains undetected.
The message complexity grows as $O(\log N)$.
![Page 24: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/24.jpg)
Applications
Overlay multicast.
NFS.
P2P email (ePOST).
![Page 25: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/25.jpg)
Evaluation
Strategy of the freeloader in Overlay Multicast.
![Page 26: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/26.jpg)
Evaluation (cont)
Message latency in NFS
![Page 27: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/27.jpg)
Evaluation (cont)
Throughput of NFS
![Page 28: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/28.jpg)
Evaluation (cont)
Average traffic in ePOST
![Page 29: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/29.jpg)
Evaluation (cont)
Scalability
![Page 30: SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07](https://reader035.vdocuments.us/reader035/viewer/2022070416/56649d3f5503460f94a18380/html5/thumbnails/30.jpg)
Evaluation (cont)
Scalability