proof-carrying code

Proof-Carrying Code

Programmable mobile devices

By 2003, one in five people will own a mobile communications device.

Nokia expects to sell 500M Java-enabled phones in 2003.

Most of these devices will be power and memory limited.

Mobile/Wireless Devices

In ‘97, 101M mobile phones vs 82M PCs. (40% vs 14%.)

95% phones will be WAP enabled by ‘04.

64Mbits of RAM in 2002.

Battery life a primary factor.

Efficiency and bandwidth will still be precious.

Cheese and the Sum Total of Human Knowledge

The Code Safety Problem

Code Safety

CPU

Code

Trusted Host

Is this safe to execute?

Approach 1Trust the Code Producer

CPU

Code

Trusted Host

sig

Trusted 3rd Party

PK1

PK1

PK2

PK2

Trust is based on personal authority, not program properties

Scaling problems?

Approach 2Baby-sit the Program

CPU

Code

Trusted Host

Execution monitor

Expensive

Limited in expressive power(Why?)

E.g., Software Fault Isolation [Wahbe & Lucco], Inline Reference Monitors [Schneider]

Approach 3Java

CPU

Code

Trusted Host

Interp/ JIT

Expensive and/or big

Limited in expressive power

Verifier

TheoremProver

Approach 4Formal Verification

CPU

Code

Flexible andpowerful.

Trusted Host

But really reallyreally hard andmust be correct.

A Key Idea: Explicit Proofs

CertifyingProver

CPU

ProofChecker

Code

Proof

Trusted Host

A Key Idea: Explicit Proofs

CertifyingProver

CPU

Code

Proof

No longer need totrust this component.

ProofChecker

Proof-Carrying Code

CertifyingProver

CPU

Code

Proof

Simple,small (<52KB),and fast.

No longer need totrust this component.

ProofChecker

Reasonable in size (0-10%).

But...

...How to generate the proofs?

Proving theorems about real programs is hard.

Most useful safety properties of low-level programs are undecidable.

Theorem-proving systems are unfamiliar to programmers and hard to use even for experts.

The Role ofProgramming Languages

Civilized programming languages can provide “safety for free”.

Well-formed/well-typed safe.

Idea: Arrange for the compiler to “explain” why the target code it generates preserves the safety properties of the source program.

Certifying Compilers[Necula & Lee, PLDI’98]

Intuition:

Compiler “knows” why each translation step is semantics-preserving.

So, have it generate a proof that safety is preserved.

“Small theorems about big programs.”

Don’t try to verify the whole compiler, but only each output it generates.

Automation viaCertifying Compilation

CertifyingCompiler

CPULooks and smells like a compiler.

% spjc foo.java bar.class baz.c -ljdk1.2.2

Sourcecode

Proof

Objectcode

CertifyingProver

ProofChecker

Overview of the Necula/Lee Approach to PCC

High-Level Architecture

Explanation

CodeVerificationconditiongenerator

Checker

Safetypolicy

Agent

Host

Reference Interpreters

A reference interpreter (RI) is a standard interpreter extended with instrumentation to check the safety of each instruction before it is executed, and abort execution if anything unsafe is about to happen.

In other words, an RI is capable only of safe execution.

Reference Interpreterscont’d

The reference interpreter is never actually implemented.

The point will be to prove (by using the proof rules given in the safety policy) that execution of the code on the RI never aborts, and thus execution on the real hardware will be identical to execution on the RI.

Sample Reference Interpreter

High-Level Architecture

Explanation

CodeVerificationconditiongenerator

Checker

Safetypolicy

Agent

Host

The Safety Policy

The RI can be viewed as defining a safety policy

RI language is a restriction of x86 assembly language

Must prove that a given program always makes progress on the RI

We introduce verification conditions (VCs), whose truth implies that the corresponding instruction has a defined execution on the RI.

Verification Conditions

The point of the verification conditions, then, is to provide such progress theorems for each instruction in the program.

In other words, a VC’s validity says that the corresponding instruction has a defined execution in the s86 operational semantics.

The VCGen

The verification condition generator (VCGen) examines each instruction.

It essentially encodes the operational semantics of the language.

It checks some simple properties.

E.g., direct jumps go to legal addrs.

It invokes the Checker when dangerous instructions are encountered.

The VCGen, cont’d

Examples of dangerous instructions:

memory operations

procedure calls

procedure returns

For each such instruction, VCGen creates a verification condition (VC).

A VC is a logical predicate whose truth implies the instruction is safe.

Examples of Safety Properties

Memory safety. Which addresses are readable /

writable; when, and what values.

Type safety. What values can be stored and used

in operations.

System call safety. Which system routines can be called

and when.

Examples of Safety Policiescont’d

Action sequence safety.

E.g., no network send after reading a file.

Resource usage safety.

E.g., instruction counts, stack limits, etc.

What Can’t Be Enforced?

Informally:

Safety properties. Yes. “No bad thing will happen.”

Liveness properties. Not yet. “A good thing will eventually happen.”

Information-flow properties. ? “Confidentiality will be preserved.”

Example of type safety giving us VC validity?

Example: Source Code

public class Bcopy { public static void bcopy(int[] src,

int[] dst) { int l = src.length; int i = 0;

for(i=0; i<l; i++) { dst[i] = src[i]; } }}

Example: Target Code

ANN_LOCALS(_bcopy__6arrays5BcopyAIAI, 3).text.align 4.globl _bcopy__6arrays5BcopyAIAI_bcopy__6arrays5BcopyAIAI:

cmpl $0, 4(%esp)je L6movl 4(%esp), %ebxmovl 4(%ebx), %ecxtestl %ecx, %ecxjg L22ret

L22:xorl %edx, %edxcmpl $0, 8(%esp)je L6movl 8(%esp), %eaxmovl 4(%eax), %esi

L7:ANN_LOOP(INV = {

(csubneq ebx 0),(csubneq eax 0),(csubb edx ecx),(of rm mem)},

MODREG = (EDI,EDX,EFLAGS,FFLAGS,RM))cmpl %esi, %edxjae L13movl 8(%ebx, %edx, 4), %edimovl %edi, 8(%eax, %edx, 4)incl %edxcmpl %ecx, %edxjl L7ret

L13:call __Jv_ThrowBadArrayIndex

ANN_UNREACHABLEnop

L6:call __Jv_ThrowNullPointer

ANN_UNREACHABLEnop

Cut Points

Each loop entry must be annotated as a cut point.

VCGen requires this so that checking can be performed in a single scan of the code.

As a convenience, the modified registers are also declared in the cut annotations.

Example: Target Code

ANN_LOCALS(_bcopy__6arrays5BcopyAIAI, 3).text.align 4.globl _bcopy__6arrays5BcopyAIAI_bcopy__6arrays5BcopyAIAI:

cmpl $0, 4(%esp)je L6movl 4(%esp), %ebxmovl 4(%ebx), %ecxtestl %ecx, %ecxjg L22ret

L22:xorl %edx, %edxcmpl $0, 8(%esp)je L6movl 8(%esp), %eaxmovl 4(%eax), %esi

L7:ANN_LOOP(INV = {

(csubneq ebx 0),(csubneq eax 0),(csubb edx ecx),(of rm mem)},

MODREG = (EDI,EDX,EFLAGS,FFLAGS,RM))cmpl %esi, %edxjae L13movl 8(%ebx, %edx, 4), %edimovl %edi, 8(%eax, %edx, 4)incl %edxcmpl %ecx, %edxjl L7ret

L13:call __Jv_ThrowBadArrayIndex

ANN_UNREACHABLEnop

L6:call __Jv_ThrowNullPointer

ANN_UNREACHABLEnop

VCGen requires annotations in order to simplify the process.

Example: Source Code

public class Bcopy { public static void bcopy(int[] src,

int[] dst) { int l = src.length; int i = 0;

for(i=0; i<l; i++) { dst[i] = src[i]; } }}

The VCGen Process (1)_bcopy__6arrays5BcopyAIAI:

cmpl $0, src je L6 movl src, %ebx movl 4(%ebx), %ecx testl %ecx, %ecx jg L22 retL22:

xorl %edx, %edx cmpl $0, dst je L6 movl dst, %eax movl 4(%eax), %esiL7: ANN_LOOP(INV = …

A0 = (type src_1 (jarray jint))A1 = (type dst_1 (jarray jint))A2 = (type rm_1 mem)A3 = (csubneq src_1 0)ebx := src_1ecx := (sel4 rm_1 (add src_1 4))

A4 = (csubgt (sel4 rm_1 (add src_1 4)) 0)

edx := 0

A5 = (csubneq dst_1 0)eax := dst_1esi := (sel4 rm_1 (add dst_1 4))

The VCGen Process (2)

L7: ANN_LOOP(INV = { (csubneq ebx 0), (csubneq eax 0), (csubb edx ecx), (of rm mem)}, MODREG = (EDI, EDX, EFLAGS,FFLAGS,RM)) cmpl %esi, %edx jae L13

movl 8(%ebx,%edx,4), %edi

movl %edi, 8(%eax,%edx,4) …

A3A5A6 = (csubb 0 (sel4 rm_1 (add src_1 4)))

edi := edi_1edx := edx_1rm := rm_2

A7 = (csubb edx_1 (sel4 rm_2 (add dst_1 4))!!Verify!! (saferd4 (add src_1 (add (imul edx_1 4) 8)))

The Checker (1)

The checker is asked to verify that(saferd4 (add src_1 (add (imul edx_1 4) 8)))

under assumptionsA0 = (type src_1 (jarray jint))A1 = (type dst_1 (jarray jint))A2 = (type rm_1 mem)A3 = (csubneq src_1 0)A4 = (csubgt (sel4 rm_1 (add src_1 4)) 0)A5 = (csubneq dst_1 0)A6 = (csubb 0 (sel4 rm_1 (add src_1 4)))A7 = (csubb edx_1 (sel4 rm_2 (add dst_1 4))

The checker looks in the PCC for a proof of this VC.

The Checker (2)

In addition to the assumptions, the proof may use axioms and proof rules defined by the host, such as

szint : pf (size jint 4)

rdArray4: {M:exp} {A:exp} {T:exp} {OFF:exp} pf (type A (jarray T)) -> pf (type M mem) -> pf (nonnull A) -> pf (size T 4) -> pf (arridx OFF 4 (sel4 M (add A 4))) -> pf (saferd4 (add A OFF)).

Checker (3)

A proof for

(saferd4 (add src_1 (add (imul edx_1 4) 8)))

in the Java specification looks like this (excerpt):

(rdArray4 A0 A2 (sub0chk A3) szint (aidxi 4 (below1 A7)))

This proof can be easily validated via LF type checking.

VC Explosion

a == b

a == c

f(a,c)

a := x c := x

a := y c := y

a=b => (x=c => safef(y,c) x<>c => safef(x,y))

a<>b => (a=x => safef(y,x) a<>x => safef(a,y))

Exponential growth in size ofthe VC is possible.

VC Explosion

a == b

a == c

f(a,c)

a := x c := x

a := y c := y

INV: P(a,b,c,x)

(a=b => P(x,b,c,x)

a<>b => P(a,b,x,x))

(a’,c’. P(a’,b,c’,x) =>

a’=c’ => safef(y,c’) a’<>c’ => safef(a’,y))

Growth can usually becontrolled by careful placementof just the right “join-point” invariants.

Stack Slots

Each procedure will want to use the stack for local storage.

This raises a serious problem because a lot of information is lost by VCGen (such as the value) when data is stored into memory.

We avoid this problem by assuming that procedures use up to 256 words of stack as registers.

Other Approaches to PCC

Typed Assembly Language[Morrisett, et al., ‘98]

Use modern type theory to develop a static type system for machine code.

Prove decidability of typechecking.

Prove soundness of type system.

Developing such a type system is very hard, but done only once.

TAL

fact: ALL rho.{r1:int, sp:{r1:int, sp:rho}::rho} jgz r1, positive mov r1,1 retpositive: push r1 ; sp : int::{t1:int,sp:rho}::rho sub r1,r1,1 call fact[int::{r1:int,sp:rho}::rho] imul r1,r1,r2 pop r2 ; sp : {r1:int,sp:rho}:: ret

Eliminating VCGen

We can eliminate VCGen by using the logic to encode a global invariant on states, Inv(S).

Then, the proof must show:

Inv(S0)

S:State. Inv(S) ! Inv(Step(S))

S:State. Inv(S) ! SP(S)

Foundational PCC

Appel and Felty [’00] develop a semantic model of types, starting from the foundations of mathematical logic.

This model is used to construct the global invariant.

Hamid, Shao, et al. define the global invariant to be a syntactic well-formedness condition on machine states.

Temporal-logic PCC

Bernard and Lee [’02] define the global invariant via a temporal-logic specification.

A trusted generic program then interprets these specifications to extract verification conditions.

proof-carrying code

Documents

useful safety properties

safe execution

safety policyri language

defined execution

abort execution

mobile phones vs

target code

real programs