Practical accountability for distributed systems. Andreas Haeberlen (MPI-SWS / Rice University), Petr Kuznetsov (MPI-SWS), Peter Druschel (MPI-SWS). SOSP 2007. © 2007 Andreas Haeberlen, MPI-SWS


Page 1: Practical accountability for distributed systems

Andreas Haeberlen, MPI-SWS / Rice University

Petr Kuznetsov, MPI-SWS

Peter Druschel, MPI-SWS

Page 2: Motivation

- Distributed state, incomplete information
- General case: Multiple admins with different interests

[Figure label: Admin]

Page 3: General faults occur in practice

- Many faults are not 'fail-stop'
  - Node is still running, but its behavior changes
- Examples:
  - Hardware malfunctions
  - Misconfigurations
  - Software modifications by users
  - Hacker attacks
  - ...

Page 4: Dealing with general faults is difficult

- How to detect faults?
- How to identify the faulty nodes?
- How to convince others that a node is (not) faulty?

[Figure labels: Incorrect message; Responsible admin]

Page 5: Learning from the 'offline' world

- Relies on accountability
- Example: Banks

Requirement             Solution
Commitment              Signed receipts
Tamper-evident record   Double-entry bookkeeping
Inspections             Audits

- Can be used to detect, identify and convince
- But: Existing fault-tolerance work mostly focused on prevention
- Goal: A general + practical system for accountability

Page 6: Outline

- Introduction
- What is accountability?
- How can we implement it?
- How well does it work?

Page 7: Ideal accountability

- Fault := Node deviates from expected behavior
- Recall that our goal is to
  - detect faults
  - identify the faulty nodes
  - convince others that a node is (or is not) faulty
- Can we build a system that provides the following guarantee?

  Whenever a node is faulty in any way, the system generates a proof of misbehavior against that node

Page 8: Can we detect all faults?

- Problem: Faults that affect only a node's internal state
  - Requires online trusted probes at each node
- Focus on observable faults: Faults that causally affect a correct node
- This allows us to detect faults without introducing any trusted components

[Figure labels: nodes A and C; message X]

Page 9: Can we always get a proof?

- Problem: He-said-she-said situation
- Three possible causes:
  - A never sent X
  - B refuses to accept X
  - X was lost by the network
- Cannot get proof of misbehavior!
- Generalize to verifiable evidence:
  - a proof of misbehavior, or
  - a challenge that the node cannot answer
- What if, after a long time, no response has arrived?
  - Does not prove the fault, but we can suspect the node

[Figure labels: nodes A, B, and C; message X; speech bubbles "I sent X!", "I never received X!", "?!"]
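The distinction between a proof of misbehavior and an unanswered challenge can be sketched as a small bookkeeping structure. This is only an illustration: the names `Status`, `NodeRecord`, `challenge`, `answer`, and `expose` are hypothetical and do not come from the slides, and the checks that a real system would perform on each piece of evidence are left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    TRUSTED = "trusted"      # no evidence against the node
    SUSPECTED = "suspected"  # at least one challenge is still unanswered
    EXPOSED = "exposed"      # a proof of misbehavior exists


@dataclass
class NodeRecord:
    """What one observer knows about another node (hypothetical structure)."""
    has_proof: bool = False
    open_challenges: set = field(default_factory=set)

    def challenge(self, challenge_id: str) -> None:
        # Record a challenge the node has not answered yet.
        self.open_challenges.add(challenge_id)

    def answer(self, challenge_id: str) -> None:
        # A valid answer withdraws the suspicion caused by this challenge.
        self.open_challenges.discard(challenge_id)

    def expose(self) -> None:
        # A proof of misbehavior is permanent; later answers do not undo it.
        self.has_proof = True

    @property
    def status(self) -> Status:
        if self.has_proof:
            return Status.EXPOSED
        if self.open_challenges:
            return Status.SUSPECTED
        return Status.TRUSTED
```

In the he-said-she-said case, C would call `challenge(...)` on its record for B. B stays suspected only until it answers, so a correct but slow node is never permanently accused.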

Page 10: Practical accountability

- We propose the following definition of a distributed system with accountability:

  Whenever a fault is observed by a correct node, the system eventually generates verifiable evidence against a faulty node

- This is useful
  - Any (!) fault that affects a correct node is eventually detected and linked to a faulty node
- It can be implemented in practice

Page 11: Outline

- Introduction
- What is accountability?
- How can we implement it?
- How well does it work?

Page 12: Implementation: PeerReview

- Adds accountability to a given system
  - Implemented as a library
  - Provides secure record, commitment, auditing, etc.
- Assumptions:

1. System can be modeled as a collection of deterministic state machines
2. Nodes have reference implementations of the state machines
3. Correct nodes can eventually communicate
4. Nodes can sign messages

Page 13: PeerReview from 10,000 feet

- All nodes keep a log of their inputs & outputs
  - Including all messages
- Each node has a set of witnesses, who audit its log periodically
- If the witnesses detect misbehavior, they
  - generate evidence
  - make the evidence available to other nodes
- Other nodes check evidence, report fault

[Figure labels: nodes A, B, C, D, E; message M; A's log; B's log; A's witnesses]

Page 14: PeerReview detects tampering

- What if a node modifies its log entries?
- Log entries form a hash chain
  - Inspired by secure histories [Maniatis02]
- Signed hash is included with every message
  - Node commits to its current state
  - Changes are evident

[Figure labels: A, B, Message, ACK, Hash(log); B's log with entries Send(X), Recv(Y), Send(Z), Recv(M) and hash chain H0, H1, H2, H3, H4]
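A hash-chained log of this kind can be sketched in a few lines. The layout below is an assumption made for illustration (SHA-256, an all-zero starting value, and the field names `seq`, `kind`, `content`, `digest`), not the exact PeerReview format. The signature that accompanies each committed hash is omitted, so `commitments` simply stands for hashes the node handed out earlier.

```python
import hashlib
from dataclasses import dataclass

GENESIS = bytes(32)  # h_0: assumed all-zero starting value


@dataclass(frozen=True)
class Entry:
    seq: int        # position in the log, starting at 1
    kind: str       # e.g. "SEND" or "RECV"
    content: bytes  # message payload
    digest: bytes   # h_i, covering this entry and everything before it


def chain_hash(prev: bytes, seq: int, kind: str, content: bytes) -> bytes:
    """h_i = SHA-256(h_{i-1} || seq || kind || SHA-256(content))."""
    h = hashlib.sha256()
    h.update(prev)
    h.update(seq.to_bytes(8, "big"))
    h.update(kind.encode("utf-8"))
    h.update(hashlib.sha256(content).digest())
    return h.digest()


def append(log: list, kind: str, content: bytes) -> Entry:
    """Append an entry whose digest extends the current chain."""
    prev = log[-1].digest if log else GENESIS
    seq = len(log) + 1
    entry = Entry(seq, kind, content, chain_hash(prev, seq, kind, content))
    log.append(entry)
    return entry


def verify(log: list, commitments: dict) -> bool:
    """Recompute the chain and compare it with hashes the node handed out.

    `commitments` maps a sequence number to the digest the node previously
    attached to a message (signature checking is omitted in this sketch).
    """
    prev = GENESIS
    for position, entry in enumerate(log, start=1):
        expected = chain_hash(prev, position, entry.kind, entry.content)
        if entry.seq != position or entry.digest != expected:
            return False  # entry was altered without fixing the chain
        if commitments.get(position, expected) != expected:
            return False  # chain was rebuilt, but contradicts a commitment
        prev = expected
    return True
```

Because every digest covers its predecessor, changing an old entry forces the node to recompute all later digests, and those then disagree with any hash it already committed to in a message.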

Page 15: PeerReview detects inconsistencies

- What if a node
  - keeps multiple logs?
  - forks its log?
- Check whether the signed hashes form a single hash chain
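The single-chain check can be illustrated with two small helpers. This is a sketch under simplifying assumptions: each signed hash is modeled as a `(sequence number, hash)` pair with the signature omitted, hashes are plain strings, and the function names `find_conflict` and `off_chain` are made up for this example.

```python
from typing import Iterable, List, Optional, Tuple

# (sequence number, hash value); the signature is omitted in this sketch
Authenticator = Tuple[int, str]


def find_conflict(auths: Iterable[Authenticator]) -> Optional[Tuple[int, str, str]]:
    """Return (seq, hash_a, hash_b) if two commitments disagree at one position.

    Two different signed hashes for the same sequence number cannot both lie
    on one chain, so the pair shows that the node forked its log.
    """
    first_seen = {}
    for seq, digest in auths:
        earlier = first_seen.setdefault(seq, digest)
        if earlier != digest:
            return (seq, earlier, digest)
    return None


def off_chain(chain: List[str], auths: Iterable[Authenticator]) -> List[Authenticator]:
    """List commitments that contradict the chain the node presented for audit.

    `chain[i]` is the hash at position i. Commitments beyond the audited
    prefix are skipped because they cannot be checked yet.
    """
    return [(seq, digest) for seq, digest in auths
            if seq < len(chain) and chain[seq] != digest]
```

For instance, if one peer holds commitments to H3 and H4 while another holds H3' and H4', pooling them makes `find_conflict` return the disagreement at position 3.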

[Figure labels: hash chain H0, H1, H2, H3, H4 with fork H3', H4'; "View #1" and "View #2"; operations Create X, Read X, Read Z; results OK, Not found]

Page 16: PeerReview detects faults

- How to recognize faults in a log?
- Assumption: Node can be modeled as a deterministic state machine
- To audit a node:
  - Replay inputs to a trusted copy of the state machine
  - Check outputs against the log

[Figure labels: Network, Log, Input, Output, State machine (Module A, Module B), "=?", "if ≠"]
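The replay-based audit can be sketched as follows. The log format (a list of `("in", value)` and `("out", value)` pairs), the `step` method, and the toy `Counter` machine are all assumptions made for this example. The only property relied on is the one the slide states: the node behaves like a deterministic state machine for which the auditor has a reference copy.

```python
from collections import deque


class Counter:
    """Toy deterministic state machine: outputs the running total of its inputs."""

    def __init__(self):
        self.total = 0

    def step(self, value: int) -> list:
        self.total += value
        return [self.total]


def audit(make_reference, log):
    """Replay logged inputs into a fresh reference machine.

    Returns None if the logged outputs match the reference, otherwise the
    index of the first log entry where they diverge.
    """
    machine = make_reference()
    pending = deque()  # outputs the reference produced but the log has not shown yet
    for index, (kind, value) in enumerate(log):
        if kind == "in":
            if pending:
                return index  # an expected output was never logged
            pending.extend(machine.step(value))
        elif kind == "out":
            if not pending or pending.popleft() != value:
                return index  # output differs from the reference, or is unexpected
        else:
            raise ValueError(f"unknown log entry kind: {kind!r}")
    return len(log) if pending else None
```

A witness would run `audit(Counter, log)` on a log segment it has already checked for tampering. Any index returned points at the entry that, together with the preceding entries, shows the deviation.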

Page 17: PeerReview offers provable guarantees

- PeerReview guarantees that:

1) Faults will be detected
   If node commits a fault + has a correct witness, then witness obtains a proof of misbehavior (PoM), or a challenge that the faulty node cannot answer
2) Good nodes cannot be accused
   If node is correct, there can never be a PoM, and it can answer any challenge

- Formal definitions and proof in a TR

Page 18: Outline

- Introduction
- What is accountability?
- How can we implement it?
- How well does it work?
  - Is it widely applicable?
  - How much does it cost?
  - Does it scale?

Page 19: PeerReview is widely applicable

- App #1: NFS server in the Linux kernel
  - Many small, latency-sensitive requests
  - Faults: Tampering with files, Lost updates
- App #2: Overlay multicast
  - Transfers large volume of data
  - Faults: Freeloading, Tampering with content
- App #3: P2P email
  - Complex, large, decentralized
  - Faults: Denial of service, Attacks on DHT routing
- More information in the paper

[Additional fault labels on the slide: Metadata corruption, Incorrect access control, Censorship]

Page 20: How much does PeerReview cost?

- Dominant cost depends on number of witnesses W
- O(W^2) component

[Figure: average traffic (Kbps/node, axis 0 to 100) against number of witnesses (Baseline, 1, 2, 3, 4, 5); series: Baseline traffic, Signatures and ACKs, Checking logs; label: W dedicated witnesses]

Page 21: Mutual auditing

- Small probability of error is inevitable
  - Example: Replication
- Can use this to optimize PeerReview
  - Accept that an instance of a fault is found only with high probability
  - Asymptotic complexity: O(N^2) → O(log N)

[Figure labels: Node; small random sample of peers chosen as witnesses]

Page 22: PeerReview is scalable

- Assumption: Up to 10% of nodes can be faulty
- Probabilistic guarantees enable scalability
  - Example: Email system scales to over 10,000 nodes with P=0.999999

[Figure: average traffic (Kbps/node) against system size (nodes); curves: Email system w/o accountability, Email system + PeerReview (P=0.999999), Email system + PeerReview (P=1.0); reference line: DSL/cable upstream; growth labels: O((log N)^2), O(log N)]
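A back-of-the-envelope calculation suggests why a probabilistic guarantee needs only O(log N) witnesses per node. The model below is a simplification assumed for illustration and is not necessarily the analysis used in the paper: each witness is drawn independently at random, so it is faulty with probability f, and a union bound over all N nodes gives P(some node has only faulty witnesses) ≤ N · f^w. The function name `witnesses_needed` is hypothetical. Exact fractions are used to avoid rounding at the boundary.

```python
from fractions import Fraction


def witnesses_needed(n_nodes: int, faulty_fraction: Fraction, p_target: Fraction) -> int:
    """Smallest w with n_nodes * faulty_fraction**w <= 1 - p_target.

    Assumes independently chosen witnesses and a union bound over all nodes;
    both are simplifications made for this sketch.
    """
    if n_nodes < 1 or not (0 < faulty_fraction < 1) or not (0 < p_target < 1):
        raise ValueError("need n_nodes >= 1 and fractions strictly between 0 and 1")
    budget = 1 - p_target  # allowed probability that some node lacks a correct witness
    w = 1
    while n_nodes * faulty_fraction ** w > budget:
        w += 1
    return w


# Values from the slide: up to 10% faulty nodes, P = 0.999999
f = Fraction(1, 10)
p = Fraction(999_999, 1_000_000)
for n in (100, 10_000, 100_000):
    print(n, witnesses_needed(n, f, p))
```

Under these assumptions the result is 8 witnesses for 100 nodes, 10 for 10,000 nodes, and 11 for 100,000 nodes: each tenfold increase in system size adds one witness, which is the logarithmic growth the slide refers to.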

Page 23: Summary

- Accountability is a new approach to handling faults in distributed systems
  - detects faults
  - identifies the faulty nodes
  - produces evidence
- Our practical definition of accountability:

  Whenever a fault is observed by a correct node, the system eventually generates verifiable evidence against a faulty node

- PeerReview: A system that enforces accountability
  - Offers provable guarantees and is widely applicable

Thank you!

Andreas Haeberlen received a SOSP travel scholarship, which was supported by Infosys