Formal Methods for Minimizing the DHOSA Trusted Computing Base
Greg Morrisett, Harvard University
with A. Chlipala, P. Govereau, G. Malecha, G. Tan, J. Tassarotti, & J.-B. Tristan
TRANSFORMATION
HARDWARE SYSTEM ARCHITECTURES
SVA
Binary translation and emulation
Formal methods
Hardware support for isolation
Dealing with malicious hardware
Cryptographic secure computation
Data-centric security
Secure browser appliance
Secure servers
WEB-BASED ARCHITECTURES
e.g., Enforce properties on a malicious OS
e.g., Prevent data exfiltration
e.g., Enable complex distributed systems, with resilience to hostile OSes

DHOSA Technologies
We are investigating a variety of techniques to defend hosts:
  Binary Translation & Instrumentation
  LLVM & Secure Virtual Architecture
  New Hardware architectures
How can we minimize the need to trust these components?

The role of formal methods
Ideally, we should have proofs that the tools are “correct”.
  The consumer should be able to independently validate the proofs against the working system.
This raises three hard problems:
  We need formal models of system components.
  We need formal statements of “correctness”.
  We need proofs that our enforcement/rewriting/analysis code (or hardware) is correct.

Some of our activities
Tools for formal modeling of machine architectures
  Domain-specific languages embedded into Coq.
  Give us declarative specs of machine-level syntax & semantics.
  Give us executable specifications for model validation.
  Give us the ability to formally reason about machine code.
Tools for proving correctness of binary validation
  Specifically, that a binary will respect an isolation policy.
  e.g., SFI, CFI, XFI, NaCl, TAL, etc.
Tools for proving correctness of compilers
  New techniques for scalable proofs of correctness.
  New techniques for legacy compilers.

Modeling Machine Architectures
Real machines (e.g., Intel’s IA64) are messy.
  Even decoding instructions is hard to get right.
  The semantics are not explained well (and not always understood).
  There are actually many different versions.
Yet to prove that a compiler, analysis, or rewriting tool is correct, we need to be able to reason about real machine architectures.
And of course, we don’t just want Intel IA64.
  We need IA32, AMD, ARM, …
  And of course the specialized hardware that DHOSA is considering!

Currently
Various groups are building models of machines.
  ACL2 group doing FP verification
  Cambridge group studying relaxed memory models
  NICTA group doing L4 verification
  Inria group doing compiler verification
However, none of them really supports everything we need:
  1. declarative formulation: crucial for formal reasoning
  2. efficiently executable: crucial for testing and validation
  3. completeness: crucial for systems-level work
  4. reuse in reasoning: crucial for modeling many architectures

Our Approach
Two domain-specific languages (DSLs)
  One for binary decoding (parsing): bits -> ASTs
  One for semantics: ASTs -> behavior
The DSLs are inspired by N. Ramsey’s work.
  SLED and λ-RTL.
  Ramsey’s work was intended for generating compiler back-ends.
  Our focus is on reasoning about compiler-like tools.
The DSLs are embedded into Coq.
  This lets us reason formally (in Coq) about parsing and semantics.
    e.g., is decoding deterministic?
    e.g., will this binary, when executed in this state, respect SFI?
  The encoding lets us extract efficient ML code (i.e., a simulator).
Decoding??

Yacc in Coq via Combinators
Definition CALL_p : parser instr :=
  "1110" $ "1000" $ word @ (fun w => CALL (Imm_op w) None)
||
  "1111" $ "1111" $ ext_op_modrm (str "010" || str "011") @
    (fun op => CALL op None)
||
  "1001" $ "1010" $ halfword $$ word @
    (fun p =>
      CALL (Imm_op (snd p)) (Some (fst p))).

X86 Integer Instruction Decoder
Definition instr_parser :=
AAA_p || AAD_p || AAM_p || AAS_p || ADC_p || ADD_p || AND_p || CMP_p || OR_p ||
SBB_p || SUB_p || XOR_p || ARPL_p || BOUND_p || BSF_p || BSR_p || BSWAP_p || BT_p ||
BTC_p || BTR_p || BTS_p || CALL_p || CBW_p || CDQ_p || CLC_p || CLD_p || CLI_p ||
CMC_p || CMPS_p || CMPXCHG_p || CPUID_p || CWD_p || CWDE_p || DAA_p || DAS_p ||
DEC_p || DIV_p || HLT_p || IDIV_p || IMUL_p || IN_p || INC_p || INS_p || INTn_p ||
INT_p || INTO_p || INVD_p || INVLPG_p || IRET_p || Jcc_p || JCXZ_p || JMP_p ||
LAHF_p || LAR_p || LDS_p || LEA_p || LEAVE_p || LES_p || LFS_p || LGDT_p || LGS_p ||
LIDT_p || LLDT_p || LMSW_p || LOCK_p || LODS_p || LOOP_p || LOOPZ_p || LOOPNZ_p ||
LSL_p || LSS_p || LTR_p || MOV_p || MOVCR_p || MOVDR_p || MOVSR_p || MOVBE_p ||
MOVS_p || MOVSX_p || MOVZX_p || MUL_p || NEG_p || NOP_p || NOT_p || OUT_p ||
OUTS_p || POP_p || POPSR_p || POPA_p || POPF_p || PUSH_p || PUSHSR_p || PUSHA_p ||
PUSHF_p || RCL_p || RCR_p || RDMSR_p || RDPMC_p || RDTSC_p || RDTSCP_p || REPINS_p ||
REPLODS_p || REPMOVS_p || REPOUTS_p || REPSTOS_p || REPECMPS_p || REPESCAS_p ||
REPNECMPS_p || REPNESCAS_p || RET_p || ROL_p || ROR_p || RSM_p || SAHF_p || SAR_p ||
SCAS_p || SETcc_p || SGDT_p || SHL_p || SHLD_p || SHR_p || SHRD_p || SIDT_p ||
SLDT_p || SMSW_p || STC_p || STD_p || STI_p || STOS_p || STR_p || TEST_p || UD2_p ||
VERR_p || VERW_p || WAIT_p || WBINVD_p || WRMSR_p || XADD_p || XCHG_p || XLAT_p.

Parsing Semantics
The declarative syntax helps get things right.
  We can literally scrape manuals to get decoders,
  though that is far from sufficient: manuals have bugs!
It’s possible to give a simple functional interpretation of the parsing combinators (à la Haskell).
  parser T := string -> FinSet(string * T)
  This allows us to extract executable code for testing.
This makes it very easy to reason about parsers and prove things like:
  || is associative and commutative.
  or, e.g., whether Intel’s manuals are deterministic (they are not).
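The functional reading of the combinators can be illustrated outside Coq. The following is a minimal Python sketch, assuming frozenset as a stand-in for FinSet; the names lit, alt, seq, fmap, and is_deterministic_on are hypothetical and are not the combinators of the Coq development. It shows why commutativity of || is immediate under set semantics, and how an ambiguous encoding shows up as more than one parse.

```python
from typing import Any, Callable, FrozenSet, Tuple

# parser T := string -> FinSet(string * T), with frozenset standing in for FinSet.
Parser = Callable[[str], FrozenSet[Tuple[str, Any]]]

def lit(bits: str) -> Parser:
    """Match a fixed bit pattern; the parsed value is the pattern itself."""
    def p(s: str):
        if s.startswith(bits):
            return frozenset({(s[len(bits):], bits)})
        return frozenset()
    return p

def alt(p: Parser, q: Parser) -> Parser:
    """Alternation (written || on the slides): union of both result sets."""
    return lambda s: p(s) | q(s)

def seq(p: Parser, q: Parser) -> Parser:
    """Sequencing: run q on every remainder left by p and pair the values."""
    return lambda s: frozenset(
        (rest2, (v1, v2)) for (rest1, v1) in p(s) for (rest2, v2) in q(rest1)
    )

def fmap(p: Parser, f: Callable[[Any], Any]) -> Parser:
    """Semantic action (written @ on the slides): apply f to each parsed value."""
    return lambda s: frozenset((rest, f(v)) for (rest, v) in p(s))

def is_deterministic_on(p: Parser, s: str) -> bool:
    """True when the input has at most one parse under p."""
    return len(p(s)) <= 1

# Two toy "instructions" (hypothetical encodings) that overlap on the prefix 1110.
short_op = fmap(lit("1110"), lambda _: "SHORT")
long_op = fmap(seq(lit("1110"), lit("1000")), lambda _: "LONG")
either = alt(short_op, long_op)
```

Because alt is set union, swapping its arguments cannot change the result, which is the kind of fact that becomes a short proof in Coq. The overlapping toy encodings give two parses for the same input, which is how a non-deterministic decoder specification would be detected.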

Semantics
The usual style for machines is a small-step, operational semantics.

\[
\frac{M(R_1(\mathit{pc})) = a \qquad \mathit{parse}(M, a) = i \qquad (M, R_1, i) \rightarrow (M', R_1')}
     {(M,\; R_1 \parallel R_2 \parallel \dots \parallel R_n) \rightarrow (M',\; R_1' \parallel R_2 \parallel \dots \parallel R_n)}
\]

This makes it easy to specify non-determinism and reason about the fine-grained behavior of the machine.
But it doesn’t really give us an efficient executable, nor reusable reasoning.

Our approach
Write a monadic denotational semantics for instructions:
Definition step_AND(op1 op2:operand) :=
w1 <- get_op32 op1 ;
w2 <- get_op32 op2 ;
let res := Word32.Int.and w1 w2 in
set_op32 op1 res ;;
set_flag OF false ;;
set_flag CF false ;;
set_flag ZF (is_zero32 res) ;;
set_flag SF (is_signed32 res) ;;
set_flag PF (parity res) ;;
b <- next_oracle_bit ;
set_flag AF b
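To make the monadic reading concrete, here is a minimal Python sketch of the same AND step as a pure function from machine state to a (value, new state) pair. It is an illustration under assumptions, not the Coq definition: operands are restricted to registers, the state layout (regs, flags, oracle) is hypothetical, and the parity rule (even number of 1 bits in the low byte) is the usual x86 convention assumed here for the slides' parity function.

```python
# A computation is a function: state -> (value, new_state). States are never
# mutated; each operation returns an updated copy (hypothetical layout).

def bind(m, f):
    """Run m, then feed its value to f (the slides' `x <- m ; ...`)."""
    def run(st):
        v, st2 = m(st)
        return f(v)(st2)
    return run

def seq_all(*ms):
    """Run computations in order, discarding values (the slides' `;;`)."""
    def run(st):
        v = None
        for m in ms:
            v, st = m(st)
        return v, st
    return run

def get_reg(r):
    return lambda st: (st["regs"][r], st)

def set_reg(r, w):
    return lambda st: (None, {**st, "regs": {**st["regs"], r: w}})

def set_flag(name, b):
    return lambda st: (None, {**st, "flags": {**st["flags"], name: b}})

def next_oracle_bit():
    """Consume one oracle bit; models a result the architecture leaves undefined."""
    return lambda st: (st["oracle"][0], {**st, "oracle": st["oracle"][1:]})

def step_and(r1, r2):
    """32-bit AND of two registers, result in r1, flags updated."""
    def body(w1, w2):
        res = (w1 & w2) & 0xFFFFFFFF
        return seq_all(
            set_reg(r1, res),
            set_flag("OF", False),
            set_flag("CF", False),
            set_flag("ZF", res == 0),
            set_flag("SF", bool(res >> 31)),
            # Assumed parity rule: even number of 1 bits in the low byte.
            set_flag("PF", bin(res & 0xFF).count("1") % 2 == 0),
            bind(next_oracle_bit(), lambda b: set_flag("AF", b)),
        )
    return bind(get_reg(r1), lambda w1: bind(get_reg(r2), lambda w2: body(w1, w2)))

start = {"regs": {"eax": 0xF0F0, "ebx": 0x0FF0}, "flags": {}, "oracle": (True,)}
_, after = step_and("eax", "ebx")(start)
```

The oracle makes the undefined AF flag explicit: every run is deterministic once the oracle is fixed, so a property proved for all oracles covers every behavior the hardware may exhibit.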

Reasoning versus Validation
The monadic operations can be interpreted as pure functions over oracles and machine states.
  The monadic operations are essentially RTLs over bit-vectors.
  The infrastructure can be re-used across a wide variety of machine architectures.
  i.e., defining and reasoning about machine architecture semantics becomes relatively easy.
But we can also extract efficient ML code for testing the model against other simulators & real machines.
  e.g., in-place updates for state changes instead of functional data structures.
  In particular, we can leverage the work that Stephen talked about to do better validation.

Example Application: Google’s NaCl
NaCl uses software-fault isolation (SFI) to enforce an isolation policy.
  A good baseline for us to study.
  Mask the high bits of every store/jump to ensure a piece of untrusted code stays in its sandbox.
  Tricky: must consider every parse of the x86 code.
  By enforcing an alignment convention, NaCl ensures there’s only one parse (McCamant).
  Security depends on the “checker” which verifies these properties.
Our goal: build and prove correctness of the checker.
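The masking idea can be sketched with plain integer arithmetic. The Python below is a generic SFI illustration, not NaCl's actual instruction sequences: the sandbox base, sandbox size, and 32-byte bundle size are assumed values chosen for the example, and all names are hypothetical.

```python
# Hypothetical sandbox layout: a 16 MiB region whose base is aligned to its size.
SANDBOX_BITS = 24
SANDBOX_BASE = 0x10000000
OFFSET_MASK = (1 << SANDBOX_BITS) - 1
BUNDLE_SIZE = 32  # assumed alignment unit for jump targets

def mask_store(addr: int) -> int:
    """Force a store address into the sandbox by replacing its high bits."""
    return SANDBOX_BASE | (addr & OFFSET_MASK)

def mask_jump(addr: int) -> int:
    """Force a jump target into the sandbox and onto a bundle boundary."""
    return SANDBOX_BASE | (addr & OFFSET_MASK & ~(BUNDLE_SIZE - 1))

def in_sandbox(addr: int) -> bool:
    return SANDBOX_BASE <= addr < SANDBOX_BASE + (1 << SANDBOX_BITS)
```

Whatever address untrusted code computes, the masked address lands inside the sandbox, and masked jump targets land on bundle boundaries. Under an alignment convention where no instruction straddles a boundary, control can then only reach instruction starts that the checker has already examined.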

Our Verified Checker
We generated a checker that is:
  declarative
    easy to update
  provably correct w.r.t. our x86 model
    except that it contains ~80 lines of trusted C code
  smaller and faster than Google’s checker
    Google’s checker: about 600 lines of trusted C code
    about 3x faster on a 200 KLOC C program
Basic idea:
  Generate a DFA that accepts only correctly rewritten programs.
  The DFA is encoded as a set of tables, which are proven correct.
  Only the DFA driver is trusted.
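The split between proven tables and a small trusted driver can be sketched as follows. This Python example uses a toy policy over three abstract symbol classes (every STORE must be immediately preceded by a MASK); the table, the symbol classes, and the function name run_dfa are hypothetical and far simpler than tables generated from an x86 model.

```python
# Abstract symbol classes for the toy policy (hypothetical).
OTHER, MASK, STORE = 0, 1, 2

# Transition table: TABLE[state][symbol] -> next state.
#   state 0: no pending mask, state 1: just saw MASK, state 2: policy violated.
TABLE = [
    [0, 1, 2],  # state 0: a bare STORE is a violation
    [0, 1, 0],  # state 1: a STORE right after MASK is fine
    [2, 2, 2],  # state 2: violations are permanent
]
ACCEPTING = frozenset({0, 1})

def run_dfa(table, accepting, symbols, start=0):
    """The driver: a table lookup per symbol, then an acceptance test."""
    state = start
    for sym in symbols:
        state = table[state][sym]
    return state in accepting
```

All policy knowledge lives in the table, which is data that can be verified against the machine model; the driver is a loop with one lookup, which is why the trusted code can stay small.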

Thus far…
Focus: Formal methods for modeling real machines.
  DSLs for instruction decoding, instruction semantics.
  Yield both formal reasoning & efficient execution.
  Allows us to prove correctness of binary-level tools like the SFI checker.
Another focus: compiler correctness
  Crucial for eliminating language-based techniques from the TCB.
  For example, the Illinois group’s secure virtual architecture depends upon the correctness of the LLVM compiler.

To Date
Gold standard was Leroy’s CompCert compiler
  (mildly) optimizing compiler for C to x86, ARM, PPC
  models of these languages & architectures
  proof of correctness
  See J. Regehr’s compiler bug paper at PLDI.
However:
  machine models are incomplete, unvalidated
  optimization at O1 levels but not O3
  proofs are roughly 17x the size of the code!

Earlier Work
Post-doc (now MIT faculty member) Adam Chlipala’s work on lambda-tamer:
  compiler from core-ML to MIPS-like machine
  transformations like CPS and closure-conversion
  breakthrough: |proofs| ≈ |code|
    clever language representations avoid tedious proofs about variables, scope, binding.
    clever language semantics make reasoning simpler, more uniform.
    clever tactic-based reasoning makes proofs mostly automatic, and far more extensible.

Current Work:
We have built a version of LLVM where the optimizer is provably correct (see PLDI’11 paper).
  To be fair, only intra-procedural optimizations,
  but it includes global value numbering, sparse conditional constant propagation, advanced dead code elimination, loop invariant code motion, loop deletion, loop unrolling, and dead-store elimination.
The “proof” is completely automated.
  In essence, we have a way to effectively prove that the input to the optimizer has the same behavior as the output.
  Or more properly, when we can’t, we don’t optimize the code.
The prover knows nothing about the internals of the LLVM optimizer.
  So it’s easy to change LLVM, or add new optimizations.
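The "validate or fall back" discipline can be sketched independently of LLVM. In the Python below, programs are toy functions on 8-bit inputs and equivalence is decided by exhaustive comparison; both choices are assumptions made so the example is self-contained, and the names validated, equivalent_8bit, good_opt, and buggy_opt are hypothetical. The wrapper treats the optimizer as a black box, mirroring the point that the prover knows nothing about the optimizer's internals.

```python
def validated(optimizer, equivalent):
    """Wrap an untrusted optimizer: keep its output only when the checker
    confirms equivalence, otherwise return the original program unchanged."""
    def run(program):
        candidate = optimizer(program)
        return candidate if equivalent(program, candidate) else program
    return run

def equivalent_8bit(f, g):
    """Toy checker: exhaustive comparison over all 8-bit inputs."""
    return all(f(x) == g(x) for x in range(256))

def original(x):
    return x * 2

def shifted(x):
    return x << 1      # a correct strength reduction of x * 2

def broken(x):
    return x + 2       # a miscompilation

good_opt = lambda program: shifted    # stand-in for a correct optimizer
buggy_opt = lambda program: broken    # stand-in for a buggy optimizer

kept = validated(good_opt, equivalent_8bit)(original)
rejected = validated(buggy_opt, equivalent_8bit)(original)
```

Only the checker needs to be trusted: a buggy optimization costs performance, because the unoptimized code is kept, but never correctness.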
LLVM Translation Validation
LLVM front-ends
LLVM Optimizer
code generator
equivalence checker

How do we do this?
Convert LLVM’s SSA-based intermediate language into a categorical value graph representation.
  Similar to circuit representations (think BDDs),
  but incorporates loops by lifting everything to the level of streams of values.
  Allows us to reason equationally about both data and control.
Take advantage of category theory to normalize the input and output graphs, and check for equivalence.
  This gives us many equivalences for free, such as common sub-expressions and loop-invariant computations.
  But we still need to normalize underlying scalar computations.
The key challenge is getting this to scale to big functions.
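A much-simplified illustration of the value-graph idea, for straight-line code only: the Python sketch below expands SSA definitions into expression trees and sorts the operands of commutative operators, so that common-subexpression elimination and operand reordering disappear under normalization. The SSA encoding, the set of commutative operators, and the function name value_of are assumptions; loops, streams of values, and the categorical machinery described above are not modeled.

```python
COMMUTATIVE = {"add", "mul"}  # assumed operator set for this toy

def value_of(ssa, result):
    """Expand the SSA name `result` into a normalized expression tree.

    `ssa` maps each defined name to (op, arg, ...); an argument is either an
    int constant or a name. Names without a definition are function parameters.
    """
    memo = {}

    def build(x):
        if isinstance(x, int):
            return ("const", x)
        if x not in ssa:
            return ("param", x)
        if x not in memo:
            op, *args = ssa[x]
            kids = [build(a) for a in args]
            if op in COMMUTATIVE:
                kids.sort(key=repr)  # canonical operand order
            memo[x] = (op, *kids)
        return memo[x]

    return build(result)

# Before optimization: a*b is computed twice.
before = {"t1": ("mul", "a", "b"), "t2": ("mul", "a", "b"), "r": ("add", "t1", "t2")}
# After CSE, with the multiplication operands swapped.
after = {"t": ("mul", "b", "a"), "r": ("add", "t", "t")}
# A wrong rewrite that changes the result.
wrong = {"t": ("mul", "b", "a"), "r": ("add", "t", "a")}
```

Equal normalized trees imply equal results for this toy language, so the check is sound; unequal trees only mean the validator cannot confirm the rewrite, which matches the conservative "when we can't, we don't optimize" policy.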

[Figure: % of Functions Validated on all Opts.
 Benchmarks: bzip2, gcc, h264ref, hmmer, lbm, libqua..., mcf, milc, perlbench, sjeng, sphinx, sqlite3, total
 Vertical axis: 0% to 100%
 Legend: Fail, Alarm, OK, Boring]
Fail: we fail to translate LLVM’s IR into our representation
Alarm: we fail to validate the translation
OK: we validate the translation and there are significant differences
Boring: we validate but the differences are minimal

Quick Recap
DHOSA relies upon compilers, rewriting, analysis, and other software tools to provide protection.
Our goal is to increase assurance in these tools.
  Provide detailed formal models of machines.
  Prove correctness of key components.
  Find techniques for automating proofs.
The hope is that these investments will pay off, not just for this project but for others.
  e.g., IARPA Stonesoup, DARPA CRASH