Proof-Carrying Code
Programmable mobile devices
By 2003, one in five people will own a mobile communications device.
Nokia expects to sell 500M Java-enabled phones in 2003.
Most of these devices will be power and memory limited.
Mobile/Wireless Devices
In ‘97, 101M mobile phones vs 82M PCs. (40% vs 14%.)
95% phones will be WAP enabled by ‘04.
64Mbits of RAM in 2002.
Battery life a primary factor.
Efficiency and bandwidth will still be precious.
Cheese and the Sum Total of Human Knowledge
The Code Safety Problem
[Diagram: code arriving at the CPU of a trusted host.]
Is this safe to execute?
Approach 1: Trust the Code Producer
[Diagram: components are the code with a signature (sig), the CPU on the trusted host, and a trusted third party, with public keys PK1 and PK2.]
Trust is based on personal authority, not program properties
Scaling problems?
Approach 2: Baby-sit the Program
[Diagram: components are the code, an execution monitor, and the CPU on the trusted host.]
Expensive
Limited in expressive power (Why?)
E.g., Software Fault Isolation [Wahbe & Lucco], Inline Reference Monitors [Schneider]
Approach 3: Java
[Diagram: components are the code, a verifier, an interpreter/JIT, and the CPU on the trusted host.]
Expensive and/or big
Limited in expressive power
Approach 4: Formal Verification
[Diagram: components are the code, a theorem prover, and the CPU on the trusted host.]
Flexible and powerful.
But really, really, really hard, and must be correct.
A Key Idea: Explicit Proofs
[Diagram: components are a certifying prover, the code and its proof, and a proof checker in front of the CPU on the trusted host.]
No longer need to trust this component (the certifying prover).
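Proof-Carrying Code
[Diagram: components are a certifying prover, the code and its proof, a proof checker, and the CPU.]
Proof checker: simple, small (<52KB), and fast.
Certifying prover: no longer need to trust this component.
Proof: reasonable in size (0-10%).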
But...
...How to generate the proofs?
Proving theorems about real programs is hard.
Most useful safety properties of low-level programs are undecidable.
Theorem-proving systems are unfamiliar to programmers and hard to use even for experts.
The Role of Programming Languages
Civilized programming languages can provide “safety for free”.
Well-formed/well-typed => safe.
Idea: Arrange for the compiler to “explain” why the target code it generates preserves the safety properties of the source program.
Certifying Compilers [Necula & Lee, PLDI’98]
Intuition:
Compiler “knows” why each translation step is semantics-preserving.
So, have it generate a proof that safety is preserved.
“Small theorems about big programs.”
Don’t try to verify the whole compiler, but only each output it generates.
Automation via Certifying Compilation
[Diagram: components are the source code, a certifying compiler, a certifying prover, the object code and its proof, a proof checker, and the CPU.]
Looks and smells like a compiler.
% spjc foo.java bar.class baz.c -ljdk1.2.2
Overview of the Necula/Lee Approach to PCC
High-Level Architecture
[Diagram: components are the agent's code and explanation, and the host's verification condition generator, checker, and safety policy.]
Reference Interpreters
A reference interpreter (RI) is a standard interpreter extended with instrumentation to check the safety of each instruction before it is executed, and abort execution if anything unsafe is about to happen.
In other words, an RI is capable only of safe execution.
Reference Interpreters, cont’d
The reference interpreter is never actually implemented.
The point will be to prove (by using the proof rules given in the safety policy) that execution of the code on the RI never aborts, and thus execution on the real hardware will be identical to execution on the RI.
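To make the idea concrete, here is a minimal sketch of a reference interpreter in Python. The instruction set (`addi`, `load`, `store`, `jmp`, `halt`), the `readable`/`writable` address sets, and the step budget are all assumptions made for illustration; they are not the interpreter from the slides. The point it shows is only that every instruction is checked before it runs, and execution aborts instead of doing anything unsafe.

```python
class UnsafeExecution(Exception):
    """Raised when the next instruction would violate the safety policy."""


def run_reference_interpreter(program, regs, memory, readable, writable,
                              max_steps=1000):
    """Execute `program`, checking each instruction before it runs.

    program  : list of tuples such as ("load", "r1", "r0")  (hypothetical ISA)
    regs     : dict mapping register name -> integer
    memory   : dict mapping address -> integer
    readable : set of addresses the policy allows the code to read
    writable : set of addresses the policy allows the code to write
    """
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            raise UnsafeExecution(f"pc {pc} is outside the program")
        op, *args = program[pc]
        if op == "halt":
            return regs
        if op == "addi":                      # dst := src + immediate
            dst, src, imm = args
            regs[dst] = regs[src] + imm
        elif op == "load":                    # dst := memory[regs[addr_reg]]
            dst, addr_reg = args
            addr = regs[addr_reg]
            if addr not in readable:
                raise UnsafeExecution(f"read of address {addr} not allowed")
            regs[dst] = memory.get(addr, 0)
        elif op == "store":                   # memory[regs[addr_reg]] := src
            addr_reg, src = args
            addr = regs[addr_reg]
            if addr not in writable:
                raise UnsafeExecution(f"write to address {addr} not allowed")
            memory[addr] = regs[src]
        elif op == "jmp":                     # direct jump to an instruction index
            (target,) = args
            if not 0 <= target < len(program):
                raise UnsafeExecution(f"jump to illegal target {target}")
            pc = target
            continue
        else:
            raise UnsafeExecution(f"unknown instruction {op!r}")
        pc += 1
    # The step budget is an artifact of this sketch, not part of the policy.
    raise UnsafeExecution("step budget exhausted")
```

In PCC this interpreter never runs on the host. The proof shipped with the code plays the role of a guarantee that none of the `raise UnsafeExecution` branches can be reached.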
Sample Reference Interpreter
The Safety Policy
The RI can be viewed as defining a safety policy
RI language is a restriction of x86 assembly language
Must prove that a given program always makes progress on the RI
We introduce verification conditions (VCs), whose truth implies that the corresponding instruction has a defined execution on the RI.
Verification Conditions
The point of the verification conditions, then, is to provide such progress theorems for each instruction in the program.
In other words, a VC’s validity says that the corresponding instruction has a defined execution in the s86 operational semantics.
The VCGen
The verification condition generator (VCGen) examines each instruction.
It essentially encodes the operational semantics of the language.
It checks some simple properties.
E.g., direct jumps go to legal addresses.
It invokes the Checker when dangerous instructions are encountered.
The VCGen, cont’d
Examples of dangerous instructions:
memory operations
procedure calls
procedure returns
For each such instruction, VCGen creates a verification condition (VC).
A VC is a logical predicate whose truth implies the instruction is safe.
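The scan described above can be sketched in Python as follows. This is a simplified illustration for straight-line code only, not the Necula/Lee VCGen: the tuple-based instruction format, the `upd4` memory-update term, and the `safewr4` predicate are assumptions of the sketch (`sel4` and `saferd4` do appear in the slides). Registers hold symbolic expressions, each memory operation emits one VC, and direct jumps are only checked against the set of legal labels.

```python
def vcgen(program, labels, initial_regs, mem="rm_1"):
    """Scan straight-line code once and collect verification conditions.

    program      : list of tuples in a hypothetical format, e.g.
                   ("load", dst, base, offset) or ("store", base, offset, src)
    labels       : set of legal jump targets
    initial_regs : dict mapping register -> symbolic expression
    Returns a list of (instruction index, VC) pairs.
    """
    regs = dict(initial_regs)
    vcs = []
    for index, (op, *args) in enumerate(program):
        if op == "addi":
            dst, src, imm = args
            regs[dst] = ("add", regs[src], imm)
        elif op == "load":
            dst, base, offset = args
            addr = ("add", regs[base], offset)
            vcs.append((index, ("saferd4", addr)))    # dangerous: memory read
            regs[dst] = ("sel4", mem, addr)
        elif op == "store":
            base, offset, src = args
            addr = ("add", regs[base], offset)
            vcs.append((index, ("safewr4", addr)))    # dangerous: memory write
            mem = ("upd4", mem, addr, regs[src])      # assumed update term
        elif op == "jmp":
            (label,) = args
            if label not in labels:                   # simple syntactic check
                raise ValueError(f"jump to illegal label {label!r}")
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return vcs
```

Each returned VC is a symbolic predicate over the initial register values; it is the checker's job, not VCGen's, to find a proof of it.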
Examples of Safety Properties
Memory safety. Which addresses are readable / writable; when, and what values.
Type safety. What values can be stored and used in operations.
System call safety. Which system routines can be called and when.
Examples of Safety Policies, cont’d
Action sequence safety.
E.g., no network send after reading a file.
Resource usage safety.
E.g., instruction counts, stack limits, etc.
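As an illustration of action sequence safety, the "no network send after reading a file" policy can be written as a two-state automaton over a trace of actions. The sketch below is in Python and the action names `read_file` and `net_send` are hypothetical. It checks a recorded trace, which is a way to state the policy precisely; PCC itself would prove statically that no execution produces a violating trace.

```python
def first_violation(actions):
    """Return the index of the first send that follows a file read, or None.

    actions: iterable of action names; "read_file" and "net_send" are
    hypothetical names chosen for this sketch.
    """
    has_read = False                      # automaton state: has a file been read?
    for position, action in enumerate(actions):
        if action == "read_file":
            has_read = True
        elif action == "net_send" and has_read:
            return position               # a finite prefix witnesses the violation
    return None
```

A violation is always witnessed by a finite prefix of the trace, which is what makes this a safety property.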
What Can’t Be Enforced?
Informally:
Safety properties. Yes. “No bad thing will happen.”
Liveness properties. Not yet. “A good thing will eventually happen.”
Information-flow properties. ? “Confidentiality will be preserved.”
Example of type safety giving us VC validity?
Example: Source Code
public class Bcopy {
    public static void bcopy(int[] src, int[] dst) {
        int l = src.length;
        int i = 0;
        for (i = 0; i < l; i++) {
            dst[i] = src[i];
        }
    }
}
Example: Target Code
ANN_LOCALS(_bcopy__6arrays5BcopyAIAI, 3)
.text
.align 4
.globl _bcopy__6arrays5BcopyAIAI
_bcopy__6arrays5BcopyAIAI:
    cmpl $0, 4(%esp)
    je L6
    movl 4(%esp), %ebx
    movl 4(%ebx), %ecx
    testl %ecx, %ecx
    jg L22
    ret
L22:
    xorl %edx, %edx
    cmpl $0, 8(%esp)
    je L6
    movl 8(%esp), %eax
    movl 4(%eax), %esi
L7:
    ANN_LOOP(INV = {
        (csubneq ebx 0),
        (csubneq eax 0),
        (csubb edx ecx),
        (of rm mem)},
      MODREG = (EDI,EDX,EFLAGS,FFLAGS,RM))
    cmpl %esi, %edx
    jae L13
    movl 8(%ebx, %edx, 4), %edi
    movl %edi, 8(%eax, %edx, 4)
    incl %edx
    cmpl %ecx, %edx
    jl L7
    ret
L13:
    call __Jv_ThrowBadArrayIndex
    ANN_UNREACHABLE
    nop
L6:
    call __Jv_ThrowNullPointer
    ANN_UNREACHABLE
    nop
Cut Points
Each loop entry must be annotated as a cut point.
VCGen requires this so that checking can be performed in a single scan of the code.
As a convenience, the modified registers are also declared in the cut annotations.
VCGen requires annotations in order to simplify the process.
The VCGen Process (1)
_bcopy__6arrays5BcopyAIAI:
    cmpl $0, src
    je L6
    movl src, %ebx
    movl 4(%ebx), %ecx
    testl %ecx, %ecx
    jg L22
    ret
L22:
    xorl %edx, %edx
    cmpl $0, dst
    je L6
    movl dst, %eax
    movl 4(%eax), %esi
L7: ANN_LOOP(INV = …

A0 = (type src_1 (jarray jint))
A1 = (type dst_1 (jarray jint))
A2 = (type rm_1 mem)
A3 = (csubneq src_1 0)
ebx := src_1
ecx := (sel4 rm_1 (add src_1 4))
A4 = (csubgt (sel4 rm_1 (add src_1 4)) 0)
edx := 0
A5 = (csubneq dst_1 0)
eax := dst_1
esi := (sel4 rm_1 (add dst_1 4))
The VCGen Process (2)
L7: ANN_LOOP(INV = { (csubneq ebx 0), (csubneq eax 0), (csubb edx ecx), (of rm mem)},
      MODREG = (EDI, EDX, EFLAGS, FFLAGS, RM))
    cmpl %esi, %edx
    jae L13
    movl 8(%ebx,%edx,4), %edi
    movl %edi, 8(%eax,%edx,4)
    …

A3
A5
A6 = (csubb 0 (sel4 rm_1 (add src_1 4)))
edi := edi_1
edx := edx_1
rm := rm_2
A7 = (csubb edx_1 (sel4 rm_2 (add dst_1 4)))
!!Verify!! (saferd4 (add src_1 (add (imul edx_1 4) 8)))
The Checker (1)
The checker is asked to verify that
(saferd4 (add src_1 (add (imul edx_1 4) 8)))
under assumptions
A0 = (type src_1 (jarray jint))
A1 = (type dst_1 (jarray jint))
A2 = (type rm_1 mem)
A3 = (csubneq src_1 0)
A4 = (csubgt (sel4 rm_1 (add src_1 4)) 0)
A5 = (csubneq dst_1 0)
A6 = (csubb 0 (sel4 rm_1 (add src_1 4)))
A7 = (csubb edx_1 (sel4 rm_2 (add dst_1 4)))
The checker looks in the PCC for a proof of this VC.
The Checker (2)
In addition to the assumptions, the proof may use axioms and proof rules defined by the host, such as
szint : pf (size jint 4)
rdArray4: {M:exp} {A:exp} {T:exp} {OFF:exp}
    pf (type A (jarray T)) ->
    pf (type M mem) ->
    pf (nonnull A) ->
    pf (size T 4) ->
    pf (arridx OFF 4 (sel4 M (add A 4))) ->
    pf (saferd4 (add A OFF)).
Checker (3)
A proof for
(saferd4 (add src_1 (add (imul edx_1 4) 8)))
in the Java specification looks like this (excerpt):
(rdArray4 A0 A2 (sub0chk A3) szint (aidxi 4 (below1 A7)))
This proof can be easily validated via LF type checking.
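The flavor of this validation step can be sketched in Python as a recursive check over a proof term. This is a toy, not LF type checking: `szint` and `rdArray4` follow the rules shown above, but the premise and conclusion shapes given to `sub0chk` and to the single index rule `aidx` (standing in for `aidxi` and `below1`) are assumptions of the sketch, and the assumption `A7` is simplified so that it talks about the source array.

```python
# Propositions and expressions are nested tuples,
# e.g. ("type", "src_1", ("jarray", "jint")).

def fail(message):
    raise ValueError(message)


def check(proof, assumptions):
    """Return the proposition that `proof` establishes, or raise ValueError."""
    if isinstance(proof, str):
        if proof == "szint":                       # axiom from the slide
            return ("size", "jint", 4)
        if proof in assumptions:
            return assumptions[proof]
        fail(f"unknown axiom or assumption: {proof}")

    rule, *subproofs = proof
    premises = [check(p, assumptions) for p in subproofs]

    if rule == "sub0chk":
        # ASSUMED rule: pf (csubneq A 0) -> pf (nonnull A)
        if len(premises) == 1 and len(premises[0]) == 3:
            kind, a, zero = premises[0]
            if kind == "csubneq" and zero == 0:
                return ("nonnull", a)
        fail("sub0chk expects a proof of (csubneq A 0)")

    if rule == "aidx":
        # ASSUMED rule: pf (csubb I N) -> pf (arridx (add (imul I 4) 8) 4 N)
        if len(premises) == 1 and len(premises[0]) == 3:
            kind, i, n = premises[0]
            if kind == "csubb":
                return ("arridx", ("add", ("imul", i, 4), 8), 4, n)
        fail("aidx expects a proof of (csubb I N)")

    if rule == "rdArray4":
        if len(premises) != 5:
            fail("rdArray4 expects five premises")
        ty, mem, nonnull, size, idx = premises
        if not (len(ty) == 3 and ty[0] == "type" and isinstance(ty[2], tuple)
                and len(ty[2]) == 2 and ty[2][0] == "jarray"):
            fail("first premise must be (type A (jarray T))")
        a, t = ty[1], ty[2][1]
        if not (len(mem) == 3 and mem[0] == "type" and mem[2] == "mem"):
            fail("second premise must be (type M mem)")
        m = mem[1]
        if nonnull != ("nonnull", a):
            fail("third premise must be (nonnull A)")
        if size != ("size", t, 4):
            fail("fourth premise must be (size T 4)")
        if not (len(idx) == 4 and idx[0] == "arridx" and idx[2] == 4
                and idx[3] == ("sel4", m, ("add", a, 4))):
            fail("fifth premise must be (arridx OFF 4 (sel4 M (add A 4)))")
        return ("saferd4", ("add", a, idx[1]))

    fail(f"unknown rule: {rule}")


assumptions = {
    "A0": ("type", "src_1", ("jarray", "jint")),
    "A2": ("type", "rm_1", "mem"),
    "A3": ("csubneq", "src_1", 0),
    "A7": ("csubb", "edx_1", ("sel4", "rm_1", ("add", "src_1", 4))),  # simplified
}
proof = ("rdArray4", "A0", "A2", ("sub0chk", "A3"), "szint", ("aidx", "A7"))
goal = ("saferd4", ("add", "src_1", ("add", ("imul", "edx_1", 4), 8)))
print(check(proof, assumptions) == goal)
```

The checker never searches for a proof; it only recomputes what the supplied term proves and compares that with the VC, which is why it can stay small.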
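VC Explosion
[Diagram: a flow graph with two successive branches, on a == b (arms a := x and c := x) and on a == c (arms a := y and c := y), followed by the call f(a,c).]
a=b => ((x=c => safef(y,c)) ∧ (x<>c => safef(x,y))) ∧
a<>b => ((a=x => safef(y,x)) ∧ (a<>x => safef(a,y)))
Exponential growth in size of the VC is possible.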
VC Explosion
[Diagram: the same flow graph, with the invariant INV: P(a,b,c,x) at the join point between the two branches.]
((a=b => P(x,b,c,x)) ∧ (a<>b => P(a,b,x,x))) ∧
(∀a’,c’. P(a’,b,c’,x) => ((a’=c’ => safef(y,c’)) ∧ (a’<>c’ => safef(a’,y))))
Growth can usually be controlled by careful placement of just the right “join-point” invariants.
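The difference in growth can be made visible by counting formula nodes for a chain of n two-way branches. The Python sketch below is an abstraction under stated assumptions: branch conditions are opaque names `c1..cn`, the invariant is the opaque symbol `INV`, and the substitutions that a real VCGen performs along each arm are ignored. It only counts how many times the rest of the program is duplicated.

```python
def vc_without_invariants(n):
    """Each branch duplicates the VC of everything that follows it."""
    if n == 0:
        return ("safe",)
    rest = vc_without_invariants(n - 1)
    return ("and",
            ("implies", f"c{n}", rest),
            ("implies", f"not c{n}", rest))


def vc_with_invariants(n):
    """With a join-point invariant after every branch, each branch only
    has to establish INV, and INV is used once to prove safety."""
    per_branch = [("and",
                   ("implies", f"c{k}", "INV"),
                   ("implies", f"not c{k}", "INV"))
                  for k in range(1, n + 1)]
    return ("and", *per_branch, ("implies", "INV", ("safe",)))


def size(formula):
    """Count the nodes of a formula written as nested tuples."""
    if isinstance(formula, str):
        return 1
    return 1 + sum(size(part) for part in formula[1:])


for n in (1, 5, 10):
    print(n, size(vc_without_invariants(n)), size(vc_with_invariants(n)))
```

In this model the size without invariants roughly doubles with every added branch, while the size with invariants grows by a constant per branch.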
Stack Slots
Each procedure will want to use the stack for local storage.
This raises a serious problem because a lot of information is lost by VCGen (such as the value) when data is stored into memory.
We avoid this problem by assuming that procedures use up to 256 words of stack as registers.
Other Approaches to PCC
Typed Assembly Language [Morrisett, et al., ‘98]
Use modern type theory to develop a static type system for machine code.
Prove decidability of typechecking.
Prove soundness of type system.
Developing such a type system is very hard, but done only once.
TAL
fact: ALL rho.{r1:int, sp:{r1:int, sp:rho}::rho}
    jgz r1, positive
    mov r1,1
    ret
positive:
    push r1 ; sp : int::{t1:int,sp:rho}::rho
    sub r1,r1,1
    call fact[int::{r1:int,sp:rho}::rho]
    imul r1,r1,r2
    pop r2 ; sp : {r1:int,sp:rho}::
    ret
Eliminating VCGen
We can eliminate VCGen by using the logic to encode a global invariant on states, Inv(S).
Then, the proof must show:
Inv(S0)
∀S:State. Inv(S) => Inv(Step(S))
∀S:State. Inv(S) => SP(S)
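The three obligations above can be illustrated on a tiny finite state space, where they can be checked by enumeration instead of proof. The Python sketch below is only an analogy: the machine (a counter modulo 8 that steps by 2), the invariant (the counter is even), and the policy (the counter never equals 7) are all made up for the example, and a real PCC proof would establish the same three facts in a logic over machine states.

```python
def check_global_invariant(states, s0, step, inv, sp):
    """Check the three proof obligations by exhaustive enumeration.

    states : iterable of all states (finite in this sketch)
    s0     : initial state
    step   : function State -> State
    inv    : candidate global invariant, State -> bool
    sp     : safety policy, State -> bool
    """
    states = list(states)
    return {
        "initial": inv(s0),                                           # Inv(S0)
        "preserved": all(inv(step(s)) for s in states if inv(s)),     # Inv(S) => Inv(Step(S))
        "safe": all(sp(s) for s in states if inv(s)),                 # Inv(S) => SP(S)
    }


result = check_global_invariant(
    states=range(8),
    s0=0,
    step=lambda s: (s + 2) % 8,     # hypothetical machine
    inv=lambda s: s % 2 == 0,       # hypothetical invariant
    sp=lambda s: s != 7,            # hypothetical safety policy
)
print(result)
```

An invariant that is too weak fails the third obligation, and one that is too strong fails the second, which is why choosing Inv is the hard part.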
Foundational PCC
Appel and Felty [’00] develop a semantic model of types, starting from the foundations of mathematical logic.
This model is used to construct the global invariant.
Hamid, Shao, et al. define the global invariant to be a syntactic well-formedness condition on machine states.
Temporal-logic PCC
Bernard and Lee [’02] define the global invariant via a temporal-logic specification.
A trusted generic program then interprets these specifications to extract verification conditions.