paxos made simple jinghe zhang. introduction lock is the easiest way to manage concurrency mutex and...

24
Paxos Made Simple Jinghe Zhang

Upload: meryl-mccarthy

Post on 26-Dec-2015

219 views

Category:

Documents


0 download

TRANSCRIPT

Paxos Made Simple

Jinghe Zhang

Introduction

Lock is the easiest way to manage concurrency Mutex and semaphore. Read and write locks.

In distributed system: No master for issuing locks. Failures.

Problem

How to reach consensus/data consistency in distributed system that can tolerate non-malicious failures?

Requirements Safety

Only a value that has been proposed may be chosen.

Only a single value is chosen. A node never learns that a value has been

chosen unless it actually has been. Liveness

Some proposed value is eventually chosen. If a value has been chosen, a node can

eventually learn the value

Paxos Properties

Paxos is an asynchronous consensus algorithm Asynchronous networks

No common clocks or shared notion of time (local ideas of time are fine, but different processes may have very different “clocks”)

No way to know how long a message will take to get from A to B

Paxos Properties

Paxos is guaranteed safe. Consensus is a stable property: once

reached it is never violated; the agreed value is not changed.

Paxos Properties

Paxos is not guaranteed live. Consensus is reached if “a large

enough subnetwork...is non-faulty for a long enough time.”

Otherwise Paxos might never terminate.

Paxos Algorithm

Key Assumptions: Set of processes that run Paxos is

known a-priori Processes suffer crash failures All processes have Greek names (but

translate as “Fred”, “Cynthia”, “Nancy”…)

Paxos Algorithm 3 roles

proposer acceptor Learner

A node can act as more than one clients (usually 3).

2 phases Phase 1: Prepare request Response Phase 2: Accept request Response

Phase 1: (prepare request)

(1) A proposer chooses a new proposal version number n , and sends a prepare request (“prepare”,n) to a majority of acceptors:(a) Can I make a proposal with number n ?(b) if yes, do you suggest some value for

my proposal?

Phase 1: (prepare request)

(2) If an acceptor receives a prepare request (“prepare”, n) with n greater than that of any prepare request it has already responded, sends out (“ack”, n, n’, v’) or (“ack”, n, , )(a) responds with a promise not to accept any

more proposals numbered less than n.(b) suggest the value v of the highest-number

proposal that it has accepted if any, else

Phase 2: (accept request)

(3) If the proposer receives responses from a majority of the acceptors, then it can issue an accept request (“accept”, n , v) with number n and value v: (a) n is the number that appears in the

prepare request.(b) v is the value of the highest-numbered

proposal among the responses

Phase 2: (accept request)

(4) If the acceptor receives an accept request (“accept”, n , v) , it accepts the proposal unless it has already responded to a prepare request having a number greater than n.

Learning the decision Obvious algorithm: whenever acceptor

accepts a proposal, respond to all learners (“accept”, n, v).

No Byzantine-Failures: Acceptors informs a distinguished learner and let the distinguished learner broadcast the result.

Learner receives (“accept”, n, v) from a majority of acceptors, decides v, and sends (“decide”, v) to all other learners.

Learners receive (“decide”, v), decide v

In Well-Behaved Runs

1 1

2

n

.

.

.

(“accept”,1 ,v1)

1

2

n

.

.

.

1 1

2

n

.

.

.

(“prepare”,1)

(“ack”,1, , )

decide v1

(“accept”,1 ,v1)

1: proposer1-n: acceptors1-n: learners

Paxos is safe…

Intuition: If a proposal with value v is decided,

then every higher-numbered proposal issued by any proposer has value v.

A majority of acceptors accept (n, v), v is decided

next prepare request with Proposal Number n+1

(what if n+k?)

Safety (proof)

Suppose (n, v) is the earliest proposal that passed. If none, safety holds.

Let (n’, v’) be the earliest issued proposal after (n, v) with a different value v’!=v

As (n’, v’) passed, it requires a major of acceptors. Thus, some process approve both (n, v) and (n’, v’), though it will suggest value v with version number k>= n.

As (n’, v’) passed, it must receive a response (“ack”, n’, j, v’) to its prepare request, with n<j<n’. Consider (j, v’) we get the contradiction.

Liveness

Fischer-Lynch-Patterson (1985) No consensus can be guaranteed in

an asynchronous communication system in the presence of any failures.

Intuition: a “failed” process may just be slow, and can rise from the dead at exactly the wrong time.

Liveness FLP tells us that it is impossible for an

asynchronous system to agree on anything with accuracy and liveness! Liveness requires that agents are free

to accept different values in subsequent rounds.

But: safety requires that once some round succeeds, no subsequent round can change it.

Liveness(cont.) Paper gives us a scenario with 2

proposers, and during the scenario no decision can be made. As the paper points out, selecting a

distinguished proposer will solve the problem.

“Leader election” But Paxos doesn’t block in case of a lost

message Phase 1 can start with new rank even if

previous attempts never ended

Applications

Chubby lock service. Petal: Distributed virtual disks. Frangipani: A scalable distributed

file system.

Summary

Consensus is “impossible” But this doesn’t turn out to be a big

obstacle We can achieve consensus with

probability one in many situations Paxos is an example of a

consensus protocol, very simple

If you are interested...

Lamport, Leslie (May 1998). "The Part-Time Parliament". ACM Transactions on Computer Systems 16 (2): 133–169. (http://research.microsoft.com/users/lamport/pubs/lamport-paxos.pdf)

Questions?

Jenkins, if I want another yes-man, I’ll build one!