a formally-verified c static analyzer - tum - chair viischulzef/2015-02-03-david... ·...
TRANSCRIPT
A Formally-Verified C static analyzer
joint work with J.-H. Jourdan, V. Laporte, S.Blazy, X. Leroy, presented at POPL’15 !!
David Pichardie
How do you trust your software ?
bug finders sound verifiers verified verifiers
How do you trust your software ?The increasing complexity of safety critical systems requires efficient validation techniques
bug finders sound verifiers verified verifiers
How do you trust your software ?The increasing complexity of safety critical systems requires efficient validation techniques• Manual verifications
– do not scale
manual verification bug finders sound verifiers verified verifiers
yesterday
How do you trust your software ?The increasing complexity of safety critical systems requires efficient validation techniques• Manual verifications
– do not scale• Automatic bug finders
– may miss some bugs
manual verification bug finders sound verifiers verified verifiers
yesterday today
How do you trust your software ?The increasing complexity of safety critical systems requires efficient validation techniques• Manual verifications
– do not scale• Automatic bug finders
– may miss some bugs• Automatic, sound verifiers
– find all bugs, may raise false alarms ex: the Astrée static analyzer
manual verification bug finders sound verifiers verified verifiers
~1M loc of a critical control-command software analyzed
http://www.astree.ens.fr/
0 false alarms
yesterday today tomorrow
The increasing complexity of safety critical systems requires efficient validation techniques• Manual verifications
– do not scale• Automatic bug finders
– may miss some bugs• Automatic, sound verifiers
– find all bugs, may raise false alarms ex: the Astrée static analyzer
• Formally-verified verifiers– the verifier comes with a soundness proof– that is machine checked
manual verification bug finders sound verifiers verified verifiers
~1M loc of a critical control-command software analyzed
http://www.astree.ens.fr/
0 false alarms
yesterday today tomorrow after tomorrow
How do you trust the tool that verifies your software ?
How do you we verify a verifier?
How do you we verify a verifier?
A simple idea:
How do you we verify a verifier?
Program and prove your verifier in the same language!
A simple idea:
How do you we verify a verifier?
Program and prove your verifier in the same language!
Which language ?
A simple idea:
How do you we verify a verifier?
Program and prove your verifier in the same language!
Which language ?
Coq
A simple idea:
Coq: an animal with two faces
Coq: an animal with two faces
First face: • a proof assistant that allows to interactively build proof in constructive logic
Coq: an animal with two faces
First face: • a proof assistant that allows to interactively build proof in constructive logic
Second face: • a functional programming language with a very rich type system
Coq: an animal with two faces
sort: 8 l: list int , { l’: list int |Sorted l’ ^ PermutationOf l l’ }
First face: • a proof assistant that allows to interactively build proof in constructive logic
Second face: • a functional programming language with a very rich type system
example:
Coq: an animal with two faces
sort: 8 l: list int , { l’: list int |Sorted l’ ^ PermutationOf l l’ }
sort: int list ! int list
First face: • a proof assistant that allows to interactively build proof in constructive logic
Second face: • a functional programming language with a very rich type system
example:
• with an extraction mechanism to Ocaml
Our methodology
Our methodology
Definition analyzer (p:program) := ...
Logical Framework (here Coq)
We program the static analyzer inside Coq Static Analyzer
Our methodology
Definition analyzer (p:program) := ...
Logical Framework (here Coq)
We program the static analyzer inside Coq Static Analyzer
Theorem analyser_is_sound :
8 p, analyser p = Yes ! Sound(p)
We state its correctness wrt. a formal specification of the language semantics
Language Semantics
Our methodology
Definition analyzer (p:program) := ...
Logical Framework (here Coq)
We program the static analyzer inside Coq Static Analyzer
Theorem analyser_is_sound :
8 p, analyser p = Yes ! Sound(p)
We state its correctness wrt. a formal specification of the language semantics
Language Semantics
Proof. ... (* few days later *) ... Qed.
We interactively and mechanically prove this theorem
Soundness Proof
Our methodology
Extraction analyzer.
We extract an OCaml implementation of the analyzer
Definition analyzer (p:program) := ...
Logical Framework (here Coq)
We program the static analyzer inside Coq Static Analyzer
Theorem analyser_is_sound :
8 p, analyser p = Yes ! Sound(p)
We state its correctness wrt. a formal specification of the language semantics
Language Semantics
Proof. ... (* few days later *) ... Qed.
We interactively and mechanically prove this theorem
Soundness Proof
parser.ml pprinter.mlanalyzer.ml
This talk
Verified Static Analysismeets
the verified Compcert C compiler
Background: verifying a compiler
CompCert, a moderately optimizing C compiler usable for critical embedded software
= compiler + proof that the compiler does not introduce bugs
Using the Coq proof assistant, X. Leroy proves the following semantic preservation property:
• Compiler written from scratch, along with its proof; not trying to prove an existing compiler
For all source programs S and compiler-generated code C, if the compiler generates machine code C from source S, without reporting a compilation error, then «C behaves like S».
Compcert meets the industrial world
Fly-by-wire software, for recent Airbus planes • control-command code generated from block diagrams
(3600 files, 3.96 MB of assembly code)• minimalistic OS
Results• Estimated WCET for each file• Average improvement per file: 14% • Compiled with CompCert 2.3, May 2014
Conformance to the certification process (DO-178)• Trade-off between traceability guarantees and efficiency of the generated
code
The Verasco project INRIA Celtique, Gallium, Abstraction, Toccata + VERIMAG + Airbus
Goal: develop and verify in Coq a realistic static analyzer by abstract interpretation
• Language analyzed: the CompCert subset of C• Nontrivial abstract domains, including relational domains• Modular architecture inspired from Astrée’s• Decent alarm reporting
Slogan: • if « CompCert ≈ 1/10th of GCC but formally verified », • likewise « Verasco ≈1/10th of Astrée but formally verified »
http://verasco.imag.fr
Modularity
Astrée is highly modular and programmed in ML
Verasco is highly modular and programmed in Coq
Building a static analyzer in ML
Modular design
Example of interface
module IntervalAbVal : ABVAL = ...!!module NonRelAbEnv (AV:ABVAL) : ABENV = ...!!module SimpleAbMem (AE:ABENV) : ABMEMORY = ...!!module Iterator (AM:ABMEMORY) : ANALYZER = ...!!module myAnalyzer = Iterator(SimpleAbMem(NonRelAbEnv(IntervalAbVal)))
module type ABDOM = sig! type ab! val le : ab → ab → bool! val top : ab! val join : ab → ab → ab! val widen : ab → ab → ab!end
Building a static analyzer
in ML in CoqClass adom (ab:Type) (c:Type) := {! le : ab → ab → bool;! top : ab;! join : ab → ab → ab;! widen : ab → ab → ab;! ! gamma : ab → ℘(c);!! gamma_monotone : ∀ a1 a2,! le a1 a2 = true ⟹ ! gamma a1 ⊆ gamma a2;! gamma_top : ∀ x, ! x ∈ gamma top;! join_sound : ∀ x y, ! gamma x ∪ gamma y ! ⊆ gamma (join x y)!}
module type ABDOM = sig! type ab! val le : ab → ab → bool! val top : ab! val join : ab → ab → ab! val widen : ab → ab → ab!end
Lazy proofsProof by necessity • We don’t prove properties that
are not strictly necessary to establish a soundness theorem.
What we don’t prove
• (ab,le,join) enjoy a lattice structure
• gamma is a meet morphism between complete lattices (Galois connection)
• widen is a sound widening operator
Class adom (ab:Type) (c:Type) := {! le : ab → ab → bool;! top : ab;! join : ab → ab → ab;! widen : ab → ab → ab;! ! gamma : ab → ℘(c);!! gamma_monotone : ∀ a1 a2,! le a1 a2 = true ⟹ ! gamma a1 ⊆ gamma a2;! gamma_top : ∀ x, ! x ∈ gamma top;! join_sound : ∀ x y, ! gamma x ∪ gamma y ! ⊆ gamma (join x y)!}
numbers
CompCert compiler
General architecture
? ...
statesState abstraction
control flowAbstract interpreterAlarms
Numerical abstraction
...
Each layer is parameterized by the underlying one.
CompCert: 1 compiler, 11 languages
type eliminationloop simplifications
CFG constructionexpr. decomp.
spilling, reloadingcalling conventions
Compcert C Clight C#minor
CminorCminorSelRTL
LTL LTLin Linear
MachASM
side-effects outof expressions
stack allocationof «&»variables
Optimizations: constant prop., CSE, tail calls, (LCM), (software pipelining)
instructionselection
registerallocation (IRC)
linearizationof the CFG
layout ofstack frames
asm codegeneration
(instruction scheduling)
Where should we perform the analysis ?
Compcert behavior preservation theorems
. . .
Theorem [Behavior Preservation]
!
Corollaries• If target program goes wrong, source program goes wrong too• A program verifier on , gives useful information on , but not
necessarily on . Pi Pi+1
Pi�1
Pn 2 Ln
P2 2 L2
P1 2 L1
8Pi 8Pi+1, C(Pi) = Pi+1 =) B(Pi + 1) ✓ B(Pi)
Which CompCert representation ?
RTL ?
• the place where most CompCert optimizations take place
• but platform specific, flat expressions
C source ?
• a language for programmer, not for tools
• ultimately we want to gives alarms at this level
Clight ?
• C syntax without side-effect in expressions
C#minor
• almost like Clight but with tool-friendly syntax (e.g. store/load instruction)
Platform specific backend
Compcert C Clight C#minor Cminor RTL … ASM
C#minor Abstract interpreter (1/2)
C#minor• structured statements• exit n (encoding break/continue) : jumps to the end of the (n+1)-th
enclosing block• goto with labels • variables
• global (their address can be taken, statically allocated)• local (their can be taken, dynamically allocated/freed at each function
call/return)• temporary (not resident in memory)
CompCert compilerC#minor ...
control flowAbstract interpreterAlarms
...
C#minor Abstract interpreter (2/2)
CompCert compilerC#minor ...
control flowAbstract interpreterAlarms
...
Structural approach instead of CFG approach• obviates the need to define program points• uses less memory than the CFG-based interpreter • transfer functions are more involved (control can leave a stmt in many ways)• local fixpoint solving at each loop• function call trigger a recursive call of the abstract interpreter• goto requires a global fixpoint computation
Parameterized by a relational abstract domain for execution states(environment + memory state + call stack)
The state abstract domain (1/2)
Abstract memory cell: 1 unit of storage
c ::= temp(f,t) | local(f,x,offset,size) | global(x,offset,size)
Abstract value: (cell types, points-to graph, numerical abstraction)
The domain is parameterized by a relational numerical domain where cells act as variables.
statesMemory & value domain
control flowAbstract interpreterAlarms
CompCert compilerC#minorClightCompCert C ...
The state abstract domain (2/2)
Example:
t := &T; …; s := s + load(int32, t + 8×i + 4)
1. Points-to information says « t contains a pointer to global T »
2. Numerical domain gives a range {4,12} for the numerical value 8×i + 4
3. We approximate the load expression with two numerical cell assignments temp(f,s) := temp(f,s) + global(T,4,int32) temp(f,s) := temp(f,s) + global(T,12,int32)
4. And finally take the join of the two new numerical abstract states thus obtained.
statesMemory & value domain
numbers
Abstract numerical domains
statesMemory & value domain
control flowAbstract interpreterAlarms
Z → int
Integer congruences Integer & F.P. intervals
Nonrel→ Rel Nonrel→ RelSymbolic equalities
Convex polyhedra
CompCert compilerC#minorClightCompCert C ...
VERIMAG work
numbers
Abstract numerical domains
statesMemory & value domain
control flowAbstract interpreterAlarms
Z → int
Integer congruences Integer & F.P. intervals
Nonrel→ Rel Nonrel→ RelSymbolic equalities
Convex polyhedra
CompCert compilerC#minorClightCompCert C ...
VERIMAG work
transforms any rel. domain over Z into a rel. domain over machine integers with modulo
arithmetic
numbers
Abstract numerical domains
statesMemory & value domain
control flowAbstract interpreterAlarms
Z → int
Integer congruences Integer & F.P. intervals
Nonrel→ Rel Nonrel→ RelSymbolic equalities
Convex polyhedra
conjunctions of linear inequalities ∑ai xi ≤ c
[SAS’13]
CompCert compilerC#minorClightCompCert C ...
VERIMAG work
numbers
Abstract numerical domains
statesMemory & value domain
control flowAbstract interpreterAlarms
Z → int
Integer congruences Integer & F.P. intervals
Nonrel→ Rel Nonrel→ RelSymbolic equalities
Convex polyhedra
CompCert compilerC#minorClightCompCert C ...
VERIMAG work
symbolic conditional expressions
(improve precision of assume commands)
numbers
Abstract numerical domains
statesMemory & value domain
control flowAbstract interpreterAlarms
Z → int
Integer congruences Integer & F.P. intervals
Nonrel→ Rel Nonrel→ RelSymbolic equalities
Convex polyhedra
CompCert compilerC#minorClightCompCert C ...
VERIMAG work
transforms any non-rel. domain into a (reduced) rel. domain
numbers
Abstract numerical domains
statesMemory & value domain
control flowAbstract interpreterAlarms
Z → int
Integer congruences Integer & F.P. intervals
Nonrel→ Rel Nonrel→ RelSymbolic equalities
Convex polyhedra
CompCert compilerC#minorClightCompCert C ...
VERIMAG work
crucial to analyze the safety of memory accesses (memory alignement)
numbers
Abstract numerical domains
statesMemory & value domain
control flowAbstract interpreterAlarms
Z → int
Integer congruences Integer & F.P. intervals
Nonrel→ Rel Nonrel→ RelSymbolic equalities
Convex polyhedra
CompCert compilerC#minorClightCompCert C ...
VERIMAG work
requires reasoning on double-precision floating-point
numbers (IEEE754)
numbers
Abstract numerical domains
statesMemory & value domain
control flowAbstract interpreterAlarms
Z → int
Integer congruences Integer & F.P. intervals
Nonrel→ Rel Nonrel→ RelSymbolic equalities
Convex polyhedra
CompCert compilerC#minorClightCompCert C ...
VERIMAG work
Combining abstract domains
Implementations of reduced products tend to be specific to the 2 domains being combined.
System of inter-domains communication channels inspired by that of Astrée [ASIAN’O6]• Channels are used by domains when they need information from another
domain.• The information already present in a channel is enriched with information
of a query.
Implementation
34 000 lines of Coq, excluding blanks and comments • half proof, half code & specs• plus parts reused from CompCert
Bulk of the development: abstract domains for states and for numbers(involve large case analyses and difficult proofs over integer and floating points arithmetic)
Except for the operations over polyhedra, the algorithms are implemented directly in Coq’s specification language.
transfert function
checker
untrusted solver
= formally verified= not verified
transfert functionExternal solver with verified operatorFully verified operator
Experimental results
Preliminary experiments on small C programs (up to a few hundred lines)• CompCert benchmarks• Cryptographic routines (NaCl library)
Exercise many delicate aspects of the C language: arrays, pointer arithmetic, function pointers, floating-point arithmetic.
The analyzer can takes < 1 minute to analyze a few hundred lines of C.
We obtain automatically foundational « can’t go wrong » proofs for non-trivial C programs
Conclusion
Static analyzer based on abstract interpretation which establishes the absence of run-time errors in C programs (excluding recursion and dynamic allocation)
!
!
!
Modular architecture supporting the extensible combination of multiple abstract domains (relational and non-relational)
Integrates with CompCert, so that the soundness of the analysis is guaranteed on the compiled code as well
Theorem vanalysis_is_correct: forall prog tr, ! vanalysis prog = OK $ program_behaves (semantics prog) (Goes_wrong tr)$ False.
A holistic effect with compiler verificationCompilerTheorem transf_c_program_is_refinement: forall p tp, transf_c_program p = OK tp → (forall behv, exec_C_program p behv → not_wrong behv) → (forall behv, exec_Asm_program tp behv → exec_C_program p behv).
A holistic effect with compiler verificationCompilerTheorem transf_c_program_is_refinement: forall p tp, transf_c_program p = OK tp → (forall behv, exec_C_program p behv → not_wrong behv) → (forall behv, exec_Asm_program tp behv → exec_C_program p behv).
Static analyzerTheorem analyzer_is_correct: forall p, static_analyzer_result p = Success → (forall behv, exec_C_program p behv → not_wrong behv).
+
A holistic effect with compiler verificationCompilerTheorem transf_c_program_is_refinement: forall p tp, transf_c_program p = OK tp → (forall behv, exec_C_program p behv → not_wrong behv) → (forall behv, exec_Asm_program tp behv → exec_C_program p behv).
Static analyzerTheorem analyzer_is_correct: forall p, static_analyzer_result p = Success → (forall behv, exec_C_program p behv → not_wrong behv).
+
Stronger correctness resultTheorem transf_c_program_is_refinement: forall p tp, transf_c_program p = OK tp → static_analyzer_result p = Success → (forall behv, exec_Asm_program tp behv → exec_C_program p behv).
=
Future directions
Engineering • from C#minor to Clight• better support for precision debugging
Improving the algorithmic efficiency of the static analyzer• from Coq’s integer and FP arithmetic (list of bits) to more efficient libraries• improve physical sharing of purely functional data structures used for
maps and setsExtend the memory abstract domain• dynamic memory allocation • recursion (also requires modifying the iterator)
Improving the precision of the analysis• on-the-fly unrolling of certain loops (based on unverified heuristics)• new abstract domains, e.g. octagons
Questions ?
Related work
S. Blazy, V. Laporte, A. Maroneze, and D. Pichardie. Formal verification of a C value analysis based on abstract interpretation. SAS 2013.
A.Fouilhé,D.Monniaux,and M.Périn. Efficient generation of correctness certificates for the abstract domain of polyhedra. SAS 2013.
S. Cho, J. Kang, J. Choi, C.-K. Hur, and K. Yi. Sparrow- Berry: A verified validator for an industrial-strength static analyzer. http://ropas.snu.ac.kr/sparrowberry/, 2013
M. Hofmann, A. Karbyshev, and H. Seidl. Verifying a local generic solver in Coq. SAS 2010.
T. Nipkow. Abstract interpretation of annotated commands. ITP 2012.
D. Cachera and D. Pichardie. A certified denotational abstract interpreter. ITP 2010.
Ain
Cin
Aout
Cout
C0out
⇥Ain
Cin
⇥ Aout
Cout
C0out
Figure 4. Communication channels between abstract operators.Left: single operator. Right: composition of two operators fromdifferent domains.
the same concrete environments. (Channel Cin
in figure 4.) Sym-metrically, each transfer function that returns a new abstract valuealso returns a channel C0
out
that other domains can query to obtaininformation on the state after the execution of the transfer function.Finally, yet another channel C
out
is provided as extra argument toeach transfer function. Querying this channel will provide informa-tion on the state after the execution of transfer functions from otherdomains. Only abstract domains having already computed thesefunction can answer these queries. In other words, the domain op-eration produces C0
out
by enriching information already present inC
out
with information of its own. For example, the assign transferfunction, which corresponds, in the concrete, to an assignment ofan expression to a variable, has the following type:
assign: var ! iexpr var ! t * in_chan !
in_chan ! (t * in_chan)+?;
The first and second arguments are the assigned variable and ex-pression. The third argument is a pair representing the initial states:an abstract value and the channel C
in
. The fourth argument is thechannel C
out
representing the current information about the finalstate after the assignment. If no contradiction is found, assign re-turns the final abstract state and the enriched channel C0
out
. Thespecification of assign is as follows:
assign_correct: 8 x e ab chan ⇢ n,
n 2 eval_iexpr ⇢ e !
⇢ 2 � ab !
(upd ⇢ x n) 2 � chan !
(upd ⇢ x n) 2 � (assign x e ab chan);
Here, we extend � to pairs of abstract values and channels, tak-ing � (x, y) = � x \ � y and using Coq type classes. Thisspecification of assign is analogous to the one in figure 2. Thedifference is that we add an hypothesis stating that the two chan-nels given as parameters are correct with respect to initial and finalstates respectively. Moreover, we demand that the returned channelbe correct with respect to the final state.
An implementation of such a specification has to create a chan-nel. For each query, the implementation can choose to forward it tothe channel received as its fourth parameter, effectively forwardingit to another domain, or to answer it using its own information, or todo both and combine the information. For example, an interval do-main will answer get_itv queries but not get_eq_expr queries,forwarding the latter to other domains.
This interface for channel-aware transfer functions such asassign makes it easy to combine two domains and have themcommunicate. Verasco provides a generic combinator (pictured as⌦ in figure 1) that takes two abstract domains over ideal numericalenvironments and returns a product domain where informationcoming from both domains is stored, and where the two domainscan communicate via channels. The definition of assign for theproduct is the following:
assign v e (ab:(A*B)*in_chan) chan :=
let ’((a, b), abchan) := ab in
(* Computation on the first component *)
do_bot retachan <- assign v e (a, abchan) chan;
let ’(reta, chan) := retachan in
(* Computation on the second component,
using the new input channel *)
do_bot retbchan <- assign v e (b, abchan) chan;
let ’(retb, chan) := retbchan in
NotBot ((reta, retb), chan)
The “plumbing” implemented here is depicted in figure 4, rightpart. As shown there, the C
out
channel passed to the second abstractdomain is the C0
out
channel generated by the first abstract domain.This enables the second abstract domain to query the first one.
Using this product construction, we can build trees (nested prod-ucts) of cooperating domains. As depicted in figure 1, the input tothe “Z ! int” domain transformer described in section 6.5 issuch a combination of numerical domains. Abstract states of thiscombination are pairs of, on the one hand, nested pairs of abstractstates from the individual numerical domains, and, on the otherhand, a channel. The channel is not only used when calling abstracttransfer functions, but also directly in order to get numerical infor-mation such as variation intervals. When the “Z ! int” domaintransformer calls a transfer function, it simply passes as initial C
out
channel a “top” channel whose concretization contains all concreteenvironments: assign x e v in chan top.
One final technical difficulty is comparison between abstractstates, such as the subsumption test v used during post-fixpointcomputation. In the upper layers, abstract states comprise (nestedpairs of) abstract values plus a channel. Therefore, it seems nec-essary to compare two channels, or at least one channel and oneabstract state. However, channels being records of functions, com-parison is not decidable. Our solution is to maintain the invariantthat channels never contain more information than what is con-tained in the abstract values they are paired with. That is, at thetop of the combination of domains, when we manipulate a pair(ab, chan) of an abstract value and a channel, we will makesure that �(ab) ✓ �(chan) holds. In order to check whether onesuch pair (ab1, chan1) is smaller than another (ab2, chan2),we only need to check that �(ab1, chan1) ✓ �(ab2), whichis easily decidable. Thus, the type of the comparison function forabstract values is:
leb: t * in_chan ! t ! bool
Note that it is useful to provide leb with a channel for its firstargument: an abstract domain can, then, query other domains inorder to compare abstract values.
However, the constraint �(ab) ✓ �(chan) is too strong forreal abstract domains: it makes it impossible for a domain to for-ward a query to another domain. Instead, we use a weaker con-straint for every transfer function. In the case of assign, we prove:
8 x e in chan0 ab chan,
assign x e in chan0 = NotBot (ab, chan) !
� chan0 \ � ab ✓ �(chan)
That is, the returned channel contains no more informationthan what is contained in the returned abstract value and thegiven channel. When chan0 is in_chan_top, it follows that�(ab) ✓ �(chan), ensuring the soundness of the leb compari-son function.
The property that limits the amount of information contained inchannels is useful beyond the proof of soundness for comparisons:it is also a sanity check, ensuring that the returned channel onlydepends on the abstract value and on the channel given as thelast argument, but not for instance on channels or abstract values
Communications between domains