
Mechanically Verified LISP Interpreters

Magnus O. Myreen

University of Cambridge, UK

Nov 2008

Verified LISP interpreters

Why LISP?

The simplest real-world functional language for which I could verify an implementation.

The specification?

Given a string denoting an s-expression, the interpreter evaluates the expression and produces a string describing the result.

Interpreters?

Yes, multiple: ARM, PowerPC and x86 implementations.

This talk

Describes the ideas behind the proof techniques rather than detailed proofs:

1. machine-code specifications

2. decompilation into logic

3. proof-producing compilation

4. LISP proofs

5. summary, lessons learnt

Part 1. Machine code

— underlying models, Hoare triples

Machine code

Underlying processor models:

ARM     – developed by Anthony Fox, verified against a register-transfer
          level model of an ARM processor;

x86     – developed together with Susmit Sarkar, Peter Sewell, Scott Owens,
          etc., heavily tested against a real processor;

PowerPC – a HOL4 translation of Xavier Leroy’s PowerPC model, used in his
          proof of an optimising C compiler.

Large detailed models...

Machine code, x86

Example, specification of x86 decoding:

" 8B /r | MOV r32, r/m32 ";" B8+rd id | MOV r32, imm32 ";

Snippet from operational semantics:

x86_exec ii (Xbinop binop_name ds) len =
  parT_unit
    (seqT (read_eip ii) (λx. write_eip ii (x + len)))
    (seqT (parT (read_src_ea ii ds) (read_dest_ea ii ds))
          (λ((ea_src, val_src), (ea_dest, val_dest)).
             write_binop ii binop_name val_dest val_src ea_dest))

Machine code, x86

Even ‘simple’ instructions get complex definitions.

Sequential operational semantics evaluated for instruction “40” (i.e. inc eax):

x86_read_reg EAX state = eax ∧
x86_read_eip state = eip ∧
x86_read_mem eip state = some 0x40
  ⇒
x86_next state =
  some (x86_write_reg EAX (eax + 1)
       (x86_write_eip (eip + 1)
       (x86_write_eflag AF none
       (x86_write_eflag SF (some (sign_of (eax + 1)))
       (x86_write_eflag ZF (some (eax + 1 = 0))
       (x86_write_eflag PF (some (parity_of (eax + 1)))
       (x86_write_eflag OF none state)))))))

Machine code, specifications

Clearly some abbreviations are needed!

A machine-code Hoare triple:

{ R EAX a ∗ EIP p ∗ S }
  p : 40
{ R EAX (a+1) ∗ EIP (p+1) ∗ S }

Here S = ∃a s z p o. eflag AF a ∗ eflag SF s ∗ eflag ZF z ∗ ....

In HOL4 syntax:

SPEC X86_MODEL
  (xR EAX a * xEIP p * xS)
  {(p,[0x40w])}
  (xR EAX (a+1w) * xEIP (p+1w) * xS)

Machine code, Hoare triple

The Hoare triple uses a separating conjunction ∗, defined over sets:

(p ∗ q) s = ∃v u. p u ∧ q v ∧ (u ∪ v = s) ∧ (u ∩ v = {})

We define translations from processor states to sets; for instance x86_to_set can produce:

{ xReg EAX 5, xReg EDX 56, xReg ECX 89, ...,
  xMem 0 none, xMem 1 (some 67), xMem 2 (some 255), ...,
  xStatus AF (some true), xStatus ZF none, ... }

Let R a x = λs. (s = {xReg a x}).

(R a x ∗ R b y ∗ p) (x86_to_set s) ⇒ a ≠ b ∧ (x86_read_reg a s = x)

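To make the set-splitting reading of ∗ concrete, here is a small OCaml sketch (mine, not the talk’s): an assertion is a predicate on sets of state items, and p ∗ q holds of s exactly when s splits into two disjoint parts satisfying p and q; strings stand in for the xReg/xMem/xStatus items.

    (* Assertions as predicates on state-sets. star tries all disjoint
       splits of s, so it is only meant for tiny example sets. *)
    module S = Set.Make (String)

    type assertion = S.t -> bool

    let star (p : assertion) (q : assertion) : assertion = fun s ->
      (* exists u v. p u /\ q v /\ u U v = s /\ u n v = {} *)
      let rec split u v = function
        | [] -> p (S.of_list u) && q (S.of_list v)
        | x :: xs -> split (x :: u) v xs || split u (x :: v) xs
      in
      split [] [] (S.elements s)

    (* R a x from the slide: holds only of the singleton {xReg a x} *)
    let r (a : string) (x : int) : assertion = fun s ->
      S.equal s (S.singleton (Printf.sprintf "xReg %s %d" a x))

Under this reading, star (r "EAX" 5) (r "EDX" 56) holds of exactly the two-element set, which is why the implication above can both read off register values and conclude a ≠ b.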

Machine code, Hoare triple definition

The Hoare triple’s definition

{p} c {q} = ∀r s. (p ∗ code c ∗ r) (to_set(s)) ⇒
            ∃n. (q ∗ code c ∗ r) (to_set(next^n(s)))

Covers functional correctness, termination, resource usage.

{ R EAX a ∗ EIP p ∗ S }
  p : 40
{ R EAX (a+1) ∗ EIP (p+1) ∗ S }

Machine code, memory accesses

A memory load:

a ∈ domain f ∧ aligned(a) ⇒

  { R ESI a ∗ M f ∗ EIP p ∗ S }
    p : 31C0
  { R ESI (f(a)) ∗ M f ∗ EIP (p+2) ∗ S }

where M f = λs. (s = {xMem a (some f (a)) | a ∈ domain f }).

Machine code, Hoare triple rules

compose:  {p} c {m} ∧ {m} c′ {q} ⇒ {p} (c ∪ c′) {q}

frame:    {p} c {q} ⇒ ∀r. {p ∗ r} c {q ∗ r}

cond:     ({p ∗ cond g} c {q}) = (g ⇒ {p} c {q})

exists:   ({∃x. p(x)} c {q}) = ∀x. {p(x)} c {q}

id:       {p} c {p}

extend:   {p} c {q} ⇒ ∀c′. {p} (c ∪ c′) {q}

where cond g = λs. (s = {}) ∧ g.


Machine code, manual proofs

Tried to do the proofs manually in HOL4; very tiresome.

Proved a Schorr-Waite implementation and used it in the verification of an in-place mark-and-sweep garbage collector.

The proof was unsatisfactory: long, tedious and tied to the ARM model.

Part 2. Decompilation into logic

— automating machine code proofs

Decompilation, overview

Conventional approach:

1. user annotates program with assertions

2. tool generates verification conditions (VCs)

3. user proves VCs

Decompilation approach:

1. tool translates program into recursive function

2. user proves function correct

Decompilation, idea

Given a while-program:

a := 0;
while (n ≠ 0) do
  a := a + 1;
  n := n - 2
end

automatic transformation produces:

f(a,n) = let a = 0 in g(a,n)

g(a,n) = if n = 0 then (a,n) else
           let a = a + 1 in
           let n = n - 2 in
             g(a,n)
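The generated functions can also be run directly as a sanity check; a direct OCaml transcription (illustrative, not the HOL4 text):

    (* g adds 1 to a and subtracts 2 from n until n = 0; f zeroes a
       first. For odd n this OCaml version never hits n = 0 and loops,
       which is the kind of behaviour the pre-functions below capture. *)
    let rec g (a, n) =
      if n = 0 then (a, n)
      else
        let a = a + 1 in
        let n = n - 2 in
        g (a, n)

    let f (a, n) = let a = 0 in g (a, n)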


Decompilation, certificate

“While-programs as recursive functions” – an idea by McCarthy from the 1960s.

Novelty: automated in a theorem prover, producing a proof.

The automatically proved statement:

f_pre(a,n) ⇒
  HOARE_TRIPLE
    (VAR "a" a * VAR "n" n)
    (a := 0; while (n ≠ 0) do a := a + 1; n := n - 2 end)
    (let (a2,n2) = f(a,n) in (VAR "a" a2 * VAR "n" n2))

where f_pre(a,n) = let a = 0 in g_pre(a,n)

      g_pre(a,n) = if n = 0 then true else
                     let a = a + 1 in
                     let n = n - 2 in
                       g_pre(a,n)


Decompilation, verification

Suppose we want to verify the while-program.

We look at the generated function:

f(a,n) = let a = 0 in g(a,n)

g(a,n) = if n = 0 then (a,n) else
           let a = a + 1 in
           let n = n - 2 in
             g(a,n)

and prove that it computes the desired result:

∀n a. 0 ≤ n ⇒ (g(a, 2×n) = (n + a, 0)) ∧ g_pre(a, 2×n)

∀n a. 0 ≤ n ∧ EVEN n ⇒ (f(a,n) = (n DIV 2, 0)) ∧ f_pre(a,n)

Very simple proofs (a 7-line HOL4 proof).
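The same equations can also be spot-checked by running the functions; a throwaway OCaml test, no substitute for the HOL4 proof:

    let rec g (a, n) = if n = 0 then (a, n) else g (a + 1, n - 2)
    let f (a, n) = let a = 0 in g (a, n)

    (* check g(a, 2n) = (n + a, 0) and f(a, 2n) = (n, 0) on small inputs *)
    let () =
      for n = 0 to 50 do
        for a = 0 to 5 do
          assert (g (a, 2 * n) = (n + a, 0));
          assert (f (a, 2 * n) = (n, 0))
        done
      done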

Decompilation, using the certificate

We now have two theorems:

∀n a. 0 ≤ n ∧ EVEN n ⇒ (f(a,n) = (n DIV 2, 0)) ∧ f_pre(a,n)

f_pre(a,n) ⇒
  HOARE_TRIPLE
    (VAR "a" a * VAR "n" n)
    (a := 0; while (n ≠ 0) do a := a + 1; n := n - 2 end)
    (let (a2,n2) = f(a,n) in (VAR "a" a2 * VAR "n" n2))

It is now easy to prove the code:

0 ≤ n ∧ EVEN n ⇒
  HOARE_TRIPLE
    (VAR "a" a * VAR "n" n)
    (a := 0; while (n ≠ 0) do a := a + 1; n := n - 2 end)
    (VAR "a" (n DIV 2) * VAR "n" 0)


Comparison with the VC approach

Annotate the program:

{ pre n }
a := 0;
while (n ≠ 0) do  { inv n } [ variant ]
  a := a + 1;
  n := n - 2
end
{ post n }

Define:

pre n   = λs. (s("n") = n) ∧ EVEN n ∧ 0 ≤ n
post n  = λs. (s("a") = n DIV 2) ∧ (s("n") = 0)
inv n   = λs. (n = 2×s("a") + s("n")) ∧ EVEN s("n") ∧ 0 ≤ s("n")
variant = λs. s("n")


Comparison with the VC approach

If the user proves the verification conditions, then we have:

∀n. HOARE_TRIPLE (pre n) (a := 0; while ...) (post n)

Summary of comparison:

• the VC proof requires more user input and is longer;

• the VC proof requires the user to invent an invariant expression,

      n = 2 × s("a") + s("n")

  whereas the new proof only requires stating the desired result of the
  remaining part of the loop:

      g(a, 2 × n) = (n + a, 0)

• the VC proof uses a variant where the new proof uses induction;

• the VC proof deals directly with the state s, the other does not.


Decompilation, core ideas

How to implement the proof-producing translation?

Key ideas:

1. define functions as instances of

       tailrec_{G,F,D}(x) = if G(x) then tailrec_{G,F,D}(F(x)) else D(x)

2. specify termination for value x as

       pre_{G,F}(x) = ∃n. ¬G(F^n(x))

3. but give the user

       pre_{G,F}(x) = if G(x) then pre_{G,F}(F(x)) else true
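Items 1 and 3 fit in two lines of ML. A minimal OCaml sketch: the HOL pre is defined totally via the ∃n form, whereas this executable version simply diverges when no such n exists.

    (* Generic tail-recursive template and its termination side
       condition, in the recursive form handed to the user. *)
    let rec tailrec g f d x = if g x then tailrec g f d (f x) else d x
    let rec pre g f x = if g x then pre g f (f x) else true

    (* e.g. the g of Part 2 is
         tailrec (fun (_, n) -> n <> 0)
                 (fun (a, n) -> (a + 1, n - 2))
                 (fun x -> x)                     *)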


Decompilation, core ideas

4. use the loop rule:

   (∀x. HOARE_TRIPLE (p(x)) c (p(F(x)))) ⇒
   (∀x. pre_{G,F}(x) ⇒
          HOARE_TRIPLE (p(x)) (while G c) (p(tailrec_{G,F,id}(x))))

For machine code:

5. also work one loop at a time, with the loop rule:

   (∀x. H(x) ∧ G(x) ⇒ SPEC m (p(x)) c (p(F(x)))) ⇒
   (∀x. H(x) ∧ ¬G(x) ⇒ SPEC m (p(x)) c (q(D(x)))) ⇒
   (∀x. pre_{G,F,H}(x) ⇒ SPEC m (p(x)) c (q(tailrec_{G,F,D}(x))))

   with

   pre_{G,F,H}(x) = H(x) ∧ (if G(x) then pre_{G,F,H}(F(x)) else true)


Decompilation, example

Given some hard-to-read machine code,

 0: E3A00000     mov r0, #0
 4: E3510000  L: cmp r1, #0
 8: 12800001     addne r0, r0, #1
12: 15911000     ldrne r1, [r1]
16: 1AFFFFFB     bne L

This transforms to a readable HOL4 function:

f(r0,r1,m) = let r0 = 0 in g(r0,r1,m)

g(r0,r1,m) = if r1 = 0 then (r0,r1,m) else
               let r0 = r0 + 1 in
               let r1 = m(r1) in
                 g(r0,r1,m)
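Together with the list(l, a, m) assumption used on the proof-reuse slide, g is a linked-list length function. A runnable OCaml rendering, with memory modelled as a finite map (my modelling choice, not the HOL4 text):

    (* m maps each cell's address to the address of the next cell, and
       address 0 is nil. M.find raises if r1 is outside the domain of m,
       mirroring the side condition that g_pre tracks below. *)
    module M = Map.Make (Int)

    let rec g (r0, r1, m) =
      if r1 = 0 then (r0, r1, m)
      else
        let r0 = r0 + 1 in
        let r1 = M.find r1 m in
        g (r0, r1, m)

    let f (r0, r1, m) = let r0 = 0 in g (r0, r1, m)

    (* a two-cell list at addresses 12 and 8:  12 -> 8 -> 0 *)
    let () =
      let m = M.(empty |> add 12 8 |> add 8 0) in
      assert (let n, _, _ = f (99, 12, m) in n = 2)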

Decompilation, example

Precondition keeps track of side-conditions:

f_pre(r0,r1,m) = let r0 = 0 in g_pre(r0,r1,m)

g_pre(r0,r1,m) = if r1 = 0 then true else
                   let r0 = r0 + 1 in
                   let cond = r1 ∈ domain m ∧ aligned(r1) in
                   let r1 = m(r1) in
                     g_pre(r0,r1,m) ∧ cond

Decompilation, example

Certificate:

f_pre(r0,r1,m) ⇒

  { (R0,R1,M) is (r0,r1,m) ∗ PC p }
    p : E3A00000, p+4 : E3510000, ..., p+16 : 1AFFFFFB
  { (R0,R1,M) is f(r0,r1,m) ∗ PC (p+20) }

Here (R0,R1,M) is (r0,r1,m) = R0 r0 ∗ R1 r1 ∗ M m.

Decompilation, proof reuse

Manual verification proof:

∀x l a m. list(l,a,m) ⇒ (f(x,a,m) = (length(l), 0, m))

∀x l a m. list(l,a,m) ⇒ f_pre(x,a,m)

Note: Proof not tied to ARM model.

In fact, similar x86 and PowerPC code decompiles to f′ and f′′ such that f = f′ = f′′. The manual proof can be reused!

Proving f = f′ is easy in some cases since

G = G′ ∧ F = F′ ∧ D = D′ ⇒ tailrec_{G,F,D} = tailrec_{G′,F′,D′}


Part 3. Proof-producing compilation

— generating correct code

Compilation, idea

Decompilation: code → function × certificate.

Compilation: function → code × certificate.

Compilation of function f :

1. generate code for f ;

2. decompile code to produce proved-to-be-correct f ′;

3. automatically prove f = f ′.

Note: step 1 can introduce arbitrary optimisations (instruction reordering, conditional execution, ...) as long as step 3 succeeds.


Compilation, example

Compiling

f(r1) = if r1 < 10 then r1 else let r1 = r1 − 10 in f(r1)

produces ARM code:

E351000A  L: cmp r1, #10
2241100A     subcs r1, r1, #10
2AFFFFFC     bcs L

and proves:

{ R1 r1 ∗ PC p ∗ S }  p : E351000A, ...  { R1 f(r1) ∗ PC (p+12) ∗ S }

Extension: if we prove “f(x) = x mod 10”, then the compiler can be made to understand “let r1 = r1 mod 10 in”, for ARM.
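In OCaml, the compiled function and the equation that licenses the extension look like this (for non-negative r1):

    (* repeated subtraction computes mod 10 for r1 >= 0; proving
       f x = x mod 10 is what lets the compiler accept
       'let r1 = r1 mod 10 in' and emit the three-instruction loop *)
    let rec f r1 = if r1 < 10 then r1 else f (r1 - 10)

    let () = assert (f 137 = 137 mod 10)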


Compilation, input language

input   ::= f(v,v,...,v) = rhs

rhs     ::= let r = exp in rhs | let s = r in rhs
          | let m = m[address ↦ r] in rhs
          | let (v,v,...,v) = g(v,v,...,v) in rhs
          | if guard then rhs else rhs
          | f(v,v,...,v) | (v,v,...,v)

exp     ::= x | ¬x | s | i32 | x binop x | m address | x << i5 | x >> i5 | x >>. i5

binop   ::= + | − | × | & | ?? | !!

compare ::= < | ≤ | > | ≥ | <. | ≤. | >. | ≥. | =

guard   ::= ¬guard | x compare x | x & x = 0

address ::= r | r + i7 | r − i7

x       ::= r | i8

v       ::= r | s | m
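One way to picture this language is as an ML datatype. The following OCaml sketch is my own naming, only approximating the grammar above: registers and stack slots are numbered, and the bounded immediates iN are plain ints.

    type x = R of int | Imm8 of int              (* x ::= r | i8 *)
    type v = VReg of int | VStack of int | VMem  (* v ::= r | s | m *)

    type address = Addr of int                   (* r      *)
                 | AddrAdd of int * int          (* r + i7 *)
                 | AddrSub of int * int          (* r - i7 *)

    type exp =
      | Var of x | Not of x | StackRead of int | Imm32 of int32
      | Binop of string * x * x                  (* + - * & ?? !! *)
      | Load of address                          (* m address *)
      | Shift of string * x * int                (* shifts by an i5 *)

    type guard =
      | GNot of guard
      | Cmp of string * x * x                    (* compare, incl. '.'-variants *)
      | MaskZero of x * x                        (* x & x = 0 *)

    type rhs =
      | LetExp of int * exp * rhs                (* let r = exp in rhs *)
      | LetStack of int * int * rhs              (* let s = r in rhs *)
      | LetStore of address * int * rhs          (* let m = m[address |-> r] in rhs *)
      | LetCall of v list * string * v list * rhs
      | If of guard * rhs * rhs
      | TailCall of string * v list              (* f(v,...,v) *)
      | Return of v list                         (* (v,...,v)  *)

    type input = Def of string * v list * rhs    (* f(v,...,v) = rhs *)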

Part 4. LISP proofs

— decompiling primitives, compiling interpreters

LISP proofs, strategy

To produce verified LISP interpreters:

1. write ARM implementations of car, cdr, cons, etc.;

2. verify these operations using decompilation;

3. reuse the proofs for PowerPC and x86;

4. define lisp_eval as a tail-recursive function, using car, cdr, etc.;

5. compile lisp_eval using proof-producing compilation.


LISP proofs, primitives

First define s-expressions as a HOL datatype:

x ::= Dot x x | Num n | Str s

where n ranges over natural numbers and s over strings.

Define basic LISP operations:

car (Dot x y) = x

cdr (Dot x y) = y

cons x y = Dot x y

plus (Num m) (Num n) = Num (m + n)
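These definitions transcribe almost verbatim into any ML. An OCaml version (the HOL functions are simply underspecified outside their domains; here they fail loudly instead):

    type sexp = Dot of sexp * sexp | Num of int | Str of string

    let car = function Dot (x, _) -> x | _ -> failwith "car: not a pair"
    let cdr = function Dot (_, y) -> y | _ -> failwith "cdr: not a pair"
    let cons x y = Dot (x, y)

    let plus x y =
      match x, y with
      | Num m, Num n -> Num (m + n)
      | _ -> failwith "plus: not numbers"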


LISP proofs, primitives proved

The specification for the ARM instruction executing ‘car’:

(∃x y. v1 = Dot x y) ⇒

  { LISP limit (v1,v2,v3,v4,v5,v6) ∗ PC p }
    p : 35E30003
  { LISP limit ((car v1), v2,v3,v4,v5,v6) ∗ PC (p+1) }

where

LISP limit (v1,v2,v3,v4,v5,v6) =
  ∃r3 r4 r5 r6 r7 r8 r9 m t.
    R3 r3 ∗ R4 r4 ∗ R5 r5 ∗ R6 r6 ∗ R7 r7 ∗ R8 r8 ∗ R9 r9 ∗ M m ∗ M t ∗
    cond (lisp_inv limit (v1,v2,v3,v4,v5,v6) (r3,r4,r5,r6,r7,r8,r9,m,t))

with ‘lisp_inv’ relating abstract and concrete states.

LISP proofs, cons

The specification for ‘cons’:

size v1 + size v2 + size v3 + size v4 + size v5 + size v6 < limit ⇒

  { LISP limit (v1,v2,v3,v4,v5,v6) ∗ S ∗ PC p }
    p : ...code...
  { LISP limit ((cons v1 v2), v2,v3,v4,v5,v6) ∗ S ∗ PC (p+336) }

where size (Str s)   = 0
      size (Num n)   = 0
      size (Dot x y) = size x + size y + 1
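So size counts exactly the Dot cells, i.e. the heap records that cons allocates, and the side condition says the six live expressions fit below limit cons cells. In OCaml:

    type sexp = Dot of sexp * sexp | Num of int | Str of string

    (* only pairs occupy heap cells; Num and Str cost nothing here *)
    let rec size = function
      | Num _ | Str _ -> 0
      | Dot (x, y) -> size x + size y + 1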

LISP proofs, memory usage

Memory layout:

(heap half 1)            (heap half 2)            (table)
[xxxxxxxxxxx.........]   [....................]   [symbols]

Memory not wasted:

{ LISP limit (v1,v2,v3,v4,v5,v6) ∗ S ∗ PC p }
  p : ...code...
{ LISP limit ((equal v1 v2), v2,v3,v4,v5,v6) ∗ S ∗ PC (p+232) }

where

equal x y = if x = y then Str “T” else Str “nil”


LISP proofs, compile

The verified primitives are supplied to the compiler, making it understand:

let v1 = cons v1 v2 in
let v1 = equal v1 v2 in
let v1 = car v1 in
let v4 = cdr v2 in
...
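In other words, the compiler's LISP-level source language is just chains of such lets over the sexp type. In OCaml the shape is as follows, using the primitives sketched earlier and equal from the previous slide; the particular sequence mirrors the slide and is not a meaningful computation:

    type sexp = Dot of sexp * sexp | Num of int | Str of string

    let car = function Dot (x, _) -> x | _ -> failwith "car"
    let cdr = function Dot (_, y) -> y | _ -> failwith "cdr"
    let cons x y = Dot (x, y)
    let equal x y = if x = y then Str "T" else Str "nil"

    (* shape of compiler input: a chain of primitive lets over the
       state variables (two shown); taking car of equal's result would
       fail at runtime, the point is only the syntactic form *)
    let step (v1, v2) =
      let v1 = cons v1 v2 in
      let v1 = equal v1 v2 in
      let v1 = car v1 in
      let v4 = cdr v2 in
      (v1, v4)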

LISP proofs, compile

Example:

f(v1,v2,v3,v4,v5,v6) =
  if v2 = Str “nil” then (v1,v2,v3,v4,v5,v6) else
    let v2 = cdr v2 in
    let v1 = cons v1 v2 in
      f(v1,v2,v3,v4,v5,v6)

compiles to:

f_pre(v1,v2,v3,v4,v5,v6,limit) ⇒

  { LISP limit (v1,v2,v3,v4,v5,v6) ∗ S ∗ PC p }
    p : ...code...
  { LISP limit f(v1,v2,v3,v4,v5,v6) ∗ S ∗ PC (p+356) }

LISP proofs, eval

So lisp_eval is defined by tail recursion with

v1 – the s-expression to be evaluated
v2 – temp var 1
v3 – temp var 2
v4 – stack/continuation
v5 – store: list of (symbol, value) pairs
v6 – task variable

compiles to:

lisp_eval_pre(v1,v2,v3,v4,v5,v6,limit) ⇒

  { LISP limit (v1,v2,v3,v4,v5,v6) ∗ S ∗ PC p }
    p : ...code...
  { LISP limit lisp_eval(v1,v2,v3,v4,v5,v6) ∗ S ∗ PC (p+1452) }

Future work

(not-yet-proved) specification with parsing and printing:

lisp_eval_pre(exp, limit) ⇒

  { String a (sexp2str(exp)) ∗ ... ∗ S ∗ PC p }
    p : ...code...
  { String a (sexp2str(lisp_eval(exp))) ∗ ... ∗ S ∗ PC (p+...) }
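The talk does not define sexp2str; a plausible OCaml version, assuming conventional dotted-pair printing (list abbreviations omitted):

    type sexp = Dot of sexp * sexp | Num of int | Str of string

    let rec sexp2str = function
      | Num n -> string_of_int n
      | Str s -> s
      | Dot (x, y) -> "(" ^ sexp2str x ^ " . " ^ sexp2str y ^ ")"

    (* sexp2str (Dot (Str "car", Dot (Str "x", Str "nil")))
         = "(car . (x . nil))" *)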

Further work

1. finish string-to-string specification

2. add support for bignums, rationals, complex rationals, ...

3. verify an ACL2 evaluator

Part 5. Summary

— lessons learnt

Summary

Verified LISP interpreters were produced by:

1. verifying ARM implementations of car, cdr, cons, etc. using decompilation;

2. reusing the proofs for PowerPC and x86;

3. defining lisp_eval as a tail-recursive function;

4. compiling lisp_eval using proof-producing compilation.

Lesson learnt:

“Make the proof easy for the theorem prover” – Mike Gordon.

(It made sense to turn everything into recursive functions.)

