Protocol Verification with Merci

Mark R. Tuttle and Amit Goel
DTS SCL

Introduction

• I love proof
– Proof is the path to understanding why things work
– But theorem provers are too hard for the masses (even me)

• I advocate model checking at Intel
– It is the path to automated formal verification for the masses
– But model checkers verify without explaining, and don’t scale

• But the world has changed
– Decision procedures and SMT now automate some forms of proof
– Is theorem proving now viable for nonspecialists in product groups?

Slide 2

Our result

• Amit wrote Merci: SMT-based proof checker from SCL
– Systems modeled with guarded commands (like Murphi, TLA+)
– Clean mapping to decision procedures of an SMT solver

• Mark validated a classical distributed algorithm
– A novice: no prior exposure to Merci, little exposure to SMT
– Model done in 3 days, proof done in 3 days, just 9 pages long
– Model looks like ordinary code, invariants explain the algorithm

• Found little need to coach the prover about “obvious” things

Slide 3

Consensus

• Validity:
– Each output was an input

• Agreement:
– All outputs are equal

• Termination:
– All nodes choose an output

[Figure: three nodes communicating by message passing]

nodes     n1  n2  n3
inputs     0   1   0
outputs    1   1   1

[Pease, Shostak, Lamport]

Slide 4

A shocking result!

• Consensus is impossible in an asynchronous system if even one node can fail.
– Asynchronous: no bound on node step time, msg delivery time
– Failure: node just stops (crashes)

• A decade of papers
– Different system models, different failure models
– How fast? How few messages? How many failures?

• Consensus is the “hardest problem” in concurrency!
– but sometimes it can be solved…

[Fischer, Lynch, Paterson]

[Herlihy]

Slide 5

Synchronous model

Computation is a sequence of rounds of message passing.

[Figure: timeline for a node. In round r, nodes send messages, nodes receive messages, then nodes change state; round r+1 follows.]

Slide 6

Crash failures

At most t nodes can fail.

[Figure: three cases for a node n]
– n is correct: sends all messages
– n is silent: sends no messages
– n crashes!: sends some messages

Slide 7

Algorithm

procedure consensus (node n)
    state ← { input }
    for each round r = 1, 2, …, t+1 do
        broadcast state to all nodes
        receive state1, state2, …, statek from other nodes
        state ← state1 ∪ state2 ∪ … ∪ statek
    output ← min(state)

Validity: each output was an input
Termination: all nodes choose an output at end of round t+1
Agreement: ???

[Dolev, Strong]
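The procedure above can be exercised with a small simulation. The following is a minimal Python sketch, not the Merci model: the encoding of failures (`crash_round`, `delivered`) and all function names are assumptions made for illustration, and a node is assumed to keep its own state when it merges received states. It runs t+1 rounds and then checks validity and agreement by brute force over every crash pattern for a small system.

```python
from itertools import combinations, product

def run_consensus(inputs, t, crash_round, delivered):
    """Simulate t+1 rounds of the flooding algorithm under crash failures.

    crash_round: node -> round in which it crashes (hypothetical encoding)
    delivered:   node -> receivers that still get its crash-round message
    Returns the outputs of the nodes that never crash.
    """
    n = len(inputs)
    state = [{v} for v in inputs]
    for r in range(1, t + 2):
        inbox = [set() for _ in range(n)]
        for p in range(n):
            cr = crash_round.get(p)
            if cr is not None and cr < r:
                continue  # crashed in an earlier round: silent
            for q in range(n):
                if cr == r and q not in delivered.get(p, set()):
                    continue  # crashing in this round: message lost
                inbox[q] |= state[p]
        for q in range(n):
            state[q] |= inbox[q]
    return {p: min(state[p]) for p in range(n) if p not in crash_round}

def subsets(items):
    items = list(items)
    for k in range(len(items) + 1):
        for chosen in combinations(items, k):
            yield set(chosen)

def all_crash_patterns(n, t):
    """Yield (crash_round, delivered) for every pattern with at most t crashes."""
    for k in range(t + 1):
        for crashers in combinations(range(n), k):
            for rounds in product(range(1, t + 2), repeat=k):
                for sets in product(list(subsets(range(n))), repeat=k):
                    yield dict(zip(crashers, rounds)), dict(zip(crashers, sets))

def agreement_and_validity_hold(n, t):
    for inputs in product([0, 1], repeat=n):
        for crash_round, delivered in all_crash_patterns(n, t):
            outputs = run_consensus(list(inputs), t, crash_round, delivered)
            values = set(outputs.values())
            if len(values) > 1 or not values <= set(inputs):
                return False
    return True

# Node 0 crashes in round 1 and reaches only node 1 (t = 1).
outputs = run_consensus([0, 1, 1], t=1, crash_round={0: 1}, delivered={0: {1}})
print(outputs)  # {1: 0, 2: 0}
print(agreement_and_validity_hold(3, 1))  # True
```

In the example run, node 1 learns the value 0 in round 1 and relays it to node 2 in round 2, which is why t+1 rounds are enough when at most t nodes crash.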

Slide 8

Clean round: no nodes fail

• There is a clean round in t+1 rounds (at most t failures).
• Nodes have same state after a clean round.
• Nodes choose same output value min(state). Agreement!

[Dwork, Moses]

Slide 9

Merci

• A typed procedural language

• Guarded commands used to describe systems

type node
var array(node, bool) y = mk_array[node](false)
var array(node, bool) critical = mk_array[node](false)
var node turn

transition unit req_critical (node n)
require (!y[n])
{ y[n] := true; }

transition unit enter_critical (node n)
require (y[n] && !critical[n] && turn=n)
{ critical[n] := true; }

transition unit exit_critical (node n)
require (critical[n])
{ critical[n] := false; y[n] := false; nondet turn; }

[Amit Goel]
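To see how the three guarded commands behave, here is a minimal Python sketch that encodes them as a transition system and explores every reachable state. This is explicit-state exploration, not Merci's SMT-based checking, and the fixed node count `N` and the state encoding are assumptions made for illustration.

```python
N = 3  # number of nodes (an assumption; the Merci model does not fix it)

# A state is (y, critical, turn); y and critical are tuples of booleans.
def initial_states():
    y = critical = (False,) * N
    return [(y, critical, turn) for turn in range(N)]  # turn starts arbitrary

def set_at(tup, i, value):
    return tup[:i] + (value,) + tup[i + 1:]

def successors(state):
    """Apply every enabled guarded command for every node."""
    y, critical, turn = state
    for n in range(N):
        if not y[n]:                                  # req_critical
            yield (set_at(y, n, True), critical, turn)
        if y[n] and not critical[n] and turn == n:    # enter_critical
            yield (y, set_at(critical, n, True), turn)
        if critical[n]:                               # exit_critical
            for new_turn in range(N):                 # nondet turn
                yield (set_at(y, n, False), set_at(critical, n, False), new_turn)

def reachable():
    seen = set(initial_states())
    frontier = list(seen)
    while frontier:
        state = frontier.pop()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def mutex(state):
    _, critical, _ = state
    return sum(critical) <= 1  # at most one node in its critical section

states = reachable()
print(all(mutex(s) for s in states))  # True
```

Each `if` in `successors` mirrors one `require` clause, and the loop over `new_turn` stands in for `nondet turn`.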

Merci

• A typed procedural language


• A goal description language for compositional reasoning

def bool mutex = (node n1, node n2) (critical[n1] && critical[n2] => n1=n2)

def bool aux = (node n) (critical[n] => turn=n)

goal g0 = invariant mutex assuming aux
goal g1 = invariant aux

[Amit Goel]
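The two goals can be read as inductive checks: `aux` is preserved by every transition on its own, and `mutex` is preserved once `aux` is assumed. The Python sketch below tests that reading by brute force over all states of a three-node instance. The exact semantics of Merci's `invariant ... assuming ...` is an assumption here, as are the node count `N` and the helper names.

```python
from itertools import product

N = 3  # small fixed node count (an assumption made for brute force)

def all_states():
    bools = list(product([False, True], repeat=N))
    return [(y, c, turn) for y in bools for c in bools for turn in range(N)]

def set_at(tup, i, value):
    return tup[:i] + (value,) + tup[i + 1:]

def successors(state):
    y, critical, turn = state
    for n in range(N):
        if not y[n]:                                  # req_critical
            yield (set_at(y, n, True), critical, turn)
        if y[n] and not critical[n] and turn == n:    # enter_critical
            yield (y, set_at(critical, n, True), turn)
        if critical[n]:                               # exit_critical
            for new_turn in range(N):
                yield (set_at(y, n, False), set_at(critical, n, False), new_turn)

def is_initial(state):
    y, critical, _ = state
    return not any(y) and not any(critical)

def mutex(state):
    _, critical, _ = state
    return all(n1 == n2 for n1 in range(N) for n2 in range(N)
               if critical[n1] and critical[n2])

def aux(state):
    _, critical, turn = state
    return all(turn == n for n in range(N) if critical[n])

def inductive(prop, assuming=lambda s: True):
    """prop holds initially and is preserved from any state where
    prop and the assumption both hold (not only reachable states)."""
    base = all(prop(s) for s in all_states() if is_initial(s))
    step = all(prop(nxt)
               for s in all_states() if prop(s) and assuming(s)
               for nxt in successors(s))
    return base and step

print(inductive(aux))                   # True
print(inductive(mutex, assuming=aux))   # True
print(inductive(mutex))                 # False
```

The last line shows why the auxiliary invariant is needed: without it, a state where node 0 is critical but `turn` is 1 lets node 1 enter as well, so `mutex` alone is not inductive.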

Merci

• A typed procedural language


• A goal description language for compositional reasoning

• A template system for extending the language

template <type elem> Set {
    type t // set type
    const bool mem (elem x, t s)
    const t add (elem x, t s)
    const t remove (elem x, t s)

    axiom mem_add = (elem x, elem y, t s)
        (mem (x, add (y, s)) = (x = y || mem (x, s)))

    axiom mem_remove = (elem x, elem y, t s)
        (mem (x, remove (y, s)) = (x != y && mem (x, s)))
}

type node
module Node = Set<type node>

[Amit Goel]
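The two axioms describe what any set implementation must satisfy. As a sanity check, the Python sketch below interprets `mem`, `add`, and `remove` with `frozenset` and tests both axioms over a tiny universe; the universe `ELEMS` and the function names for the checks are assumptions made for illustration.

```python
from itertools import combinations, product

ELEMS = [0, 1, 2]  # tiny universe standing in for the type parameter `elem`

def mem(x, s):
    return x in s

def add(x, s):
    return s | {x}

def remove(x, s):
    return s - {x}

def all_sets():
    """Every subset of ELEMS, as a frozenset."""
    return [frozenset(c)
            for k in range(len(ELEMS) + 1)
            for c in combinations(ELEMS, k)]

def mem_add_holds():
    # mem(x, add(y, s)) = (x = y || mem(x, s))
    return all(mem(x, add(y, s)) == (x == y or mem(x, s))
               for x, y, s in product(ELEMS, ELEMS, all_sets()))

def mem_remove_holds():
    # mem(x, remove(y, s)) = (x != y && mem(x, s))
    return all(mem(x, remove(y, s)) == (x != y and mem(x, s))
               for x, y, s in product(ELEMS, ELEMS, all_sets()))

print(mem_add_holds(), mem_remove_holds())  # True True
```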

Crash failure model

def bool is_crash_behavior (Nodes crashed, Nodes crashing, message_pattern deliver) =
    (node p) (p ∈ crashed => is_silent(p,deliver)) &&
    (node p) (is_faulty(p,deliver) => p ∈ crashed || p ∈ crashing) &&
    Nodes.disjoint(crashed,crashing) &&
    Nodes.cardinality(crashed) + Nodes.cardinality(crashing) ≤ t

[Figure labels: faulty, silent]
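The crash predicate can be tried out on concrete delivery patterns. The Python sketch below is an illustration only: the slides do not define `is_silent` or `is_faulty`, so the definitions here (silent means no message delivered, faulty means at least one message not delivered) are assumptions, as is the encoding of `deliver` as a boolean matrix.

```python
def is_silent(p, deliver):
    # Assumed meaning: p delivers no message at all.
    return not any(deliver[p])

def is_faulty(p, deliver):
    # Assumed meaning: p fails to deliver at least one message.
    return not all(deliver[p])

def is_crash_behavior(crashed, crashing, deliver, t):
    nodes = range(len(deliver))
    return (all(is_silent(p, deliver) for p in crashed)
            and all(p in crashed or p in crashing
                    for p in nodes if is_faulty(p, deliver))
            and crashed.isdisjoint(crashing)
            and len(crashed) + len(crashing) <= t)

# deliver[p][q] is True when the message from p to q gets through.
deliver = [
    [True, True, True],     # node 0 is correct: sends all messages
    [False, True, False],   # node 1 is crashing: sends some messages
    [False, False, False],  # node 2 has crashed: sends no messages
]

print(is_crash_behavior({2}, {1}, deliver, t=2))    # True
print(is_crash_behavior({2}, {1}, deliver, t=1))    # False: too many failures
print(is_crash_behavior(set(), {1}, deliver, t=2))  # False: node 2 unaccounted for
```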

Slide 13

Synchronous model

for each node p
    initialize state of p
for each round r
    for each p and q
        send msg from p to q
    for each p and q
        receive msg from p to q
    for each p
        update state of p

phase    program counter    algorithm
init     init[p]            how?
send     send[p][q]         what?
recv     recv[p][q]         how?
comp     comp[p]            how? decide? decide!

Slide 14

Synchronous model

• Transitions
– initialize(p): init[p] ← true
– start_send: phase ← send; increment round; send[q][p] ← false; recv[p][q] ← false; comp[p] ← false
– send(p,q): send[p][q] ← true
– start_recv: phase ← recv
– recv(p,q): recv[p][q] ← true
– start_comp: phase ← comp
– comp(p): comp[p] ← true

is_init_phase = (phase = init)

init_phase_done = forall (node p) (init[p])

Slide 15

transition start_sending ()
require ( is_init_phase && init_phase_done ||
          is_comp_phase && comp_phase_done)
{
    "send[p][q], recv[p][q], comp[p] <= false"
    "message[p][q] <= null_message"

    round := round + 1;
    phase := send;

    crashed := Nodes.union(crashed,crashing);
    nondet crashing;
    nondet deliver;
    assume is_crash_behavior(crashed,crashing,deliver);
}

Slide 16

transition send (node n, node m)
require (is_send_phase)
require (!send[n][m])
{
    messages[n][m] := (deliver[n][m] ? global_state[n] : null_message);
    send[n][m] := true;
}

Transition size

initialize(p)   8 lines
start_send()   16 lines    send(p,q)   9 lines
start_recv()    5 lines    recv(p,q)   7 lines
start_comp()    5 lines    comp(p)    13 lines

Slide 17

Agreement proof

• Recall the agreement proof
– A1: There is a clean round
– A2: All states are equal at the end of a clean round
– A3: All states remain equal after a clean round
– A4: All nodes choose from their states the same output value

• Merci proof is short
– A1: 7 lines
– A2: 127 lines
– A3: 12 lines
– A4: 25 lines

• Merci proof is almost entirely at the algorithmic level

Slide 18

A1: There is a clean round

def bool clean_round_by_round_t_plus_1 =
    round >= t+1 => !before_clean

def bool faulty_grows_until_clean_round =
    before_clean => Nodes.cardinality(faulty) >= round

goal clean1 = invariant faulty_grows_until_clean_round
goal clean2 = invariant clean_round_by_round_t_plus_1
              assuming faulty_grows_until_clean_round
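The counting argument behind A1 can be checked by brute force. The Python sketch below enumerates how many nodes newly fail in each of the t+1 rounds, keeps only schedules with at most t failures in total, and tests both invariants at the end of every round. The reading of `before_clean` as "no clean round has happened yet" and the function name `check_a1` are assumptions; the slides do not spell them out.

```python
from itertools import product

def check_a1(t):
    """Check both A1 invariants for every failure schedule with total <= t."""
    for new_failures in product(range(t + 1), repeat=t + 1):
        if sum(new_failures) > t:
            continue  # crash model: at most t nodes fail
        before_clean, faulty = True, 0
        for round_no, k in enumerate(new_failures, start=1):
            faulty += k
            if k == 0:
                before_clean = False  # no node failed: this round is clean
            # faulty_grows_until_clean_round
            if before_clean and not faulty >= round_no:
                return False
            # clean_round_by_round_t_plus_1
            if round_no >= t + 1 and before_clean:
                return False
    return True

print(all(check_a1(t) for t in range(5)))  # True
```

The second invariant follows from the first: if no clean round had occurred by round t+1, at least t+1 nodes would be faulty, which contradicts the bound of t failures.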

Slide 19

A2: All states equal …

def bool state_equality =
    (node n, node m) (noncrashed(n) && noncrashed(m) => state[n] = state[m])

def bool state_equality_in_clean =
    in_clean && send_phase_done && recv_phase_done => state_equality

• Proof
– A2.1: If nonfaulty n has v, then n received v in a message
– A2.2: That message was sent to everyone since round is clean
– A2.3: If m received v in a message, then m has v
– A2.4: So nonfaulty n and m have the same values

• Proof algorithmic and short: 48, 34, 15, and 30 lines long

Slide 20

Conclusion

• Classical fault-tolerant distributed algorithm proved with Merci
– Model looks like ordinary code, invariants explain the algorithm
– Merci proof is about 170 lines, classical proof is 1+ page
– Model and proof done in 6 days with no prior experience

• Yices made quantification hard
– exists: usually have to produce the example by hand
– forall: template instantiation wouldn’t find the right instantiation

• Yices counterexamples mostly useless
– Get a context from first few lines, ignore the rest
– “Is property false or is Yices failing to instantiate a forall template?”
– BKM (best known method): Think about the algorithm itself, and ignore Yices output

Slide 21
