a formally-veriﬁed c static analyzer - tum - chair viischulzef/2015-02-03-david... ·...

A Formally-Verified C static analyzer

joint work with J.-H. Jourdan, V. Laporte, S.Blazy, X. Leroy, presented at POPL’15 !!

David Pichardie

How do you trust your software ?

bug finders sound verifiers verified verifiers

How do you trust your software ?The increasing complexity of safety critical systems requires efficient validation techniques

bug finders sound verifiers verified verifiers

How do you trust your software ?The increasing complexity of safety critical systems requires efficient validation techniques• Manual verifications

– do not scale

manual verification bug finders sound verifiers verified verifiers

yesterday


– do not scale• Automatic bug finders

– may miss some bugs


yesterday today



– may miss some bugs• Automatic, sound verifiers

– find all bugs, may raise false alarms ex: the Astrée static analyzer


~1M loc of a critical control-command software analyzed

http://www.astree.ens.fr/

0 false alarms

yesterday today tomorrow

The increasing complexity of safety critical systems requires efficient validation techniques• Manual verifications


– may miss some bugs• Automatic, sound verifiers

– find all bugs, may raise false alarms ex: the Astrée static analyzer

• Formally-verified verifiers– the verifier comes with a soundness proof– that is machine checked


~1M loc of a critical control-command software analyzed

http://www.astree.ens.fr/

0 false alarms

yesterday today tomorrow after tomorrow

How do you trust the tool that verifies your software ?

How do you we verify a verifier?


A simple idea:


Program and prove your verifier in the same language!

A simple idea:



Which language ?

A simple idea:



Which language ?

Coq

A simple idea:

Coq: an animal with two faces


First face: • a proof assistant that allows to interactively build proof in constructive logic



Second face: • a functional programming language with a very rich type system


sort: 8 l: list int , { l’: list int |Sorted l’ ^ PermutationOf l l’ }



example:


sort: 8 l: list int , { l’: list int |Sorted l’ ^ PermutationOf l l’ }

sort: int list ! int list



example:

• with an extraction mechanism to Ocaml

Our methodology

Our methodology

Definition analyzer (p:program) := ...

Logical Framework (here Coq)

We program the static analyzer inside Coq Static Analyzer

Our methodology




Theorem analyser_is_sound :

8 p, analyser p = Yes ! Sound(p)

We state its correctness wrt. a formal specification of the language semantics

Language Semantics

Our methodology







Language Semantics

Proof. ... (* few days later *) ... Qed.

We interactively and mechanically prove this theorem

Soundness Proof

Our methodology

Extraction analyzer.

We extract an OCaml implementation of the analyzer







Language Semantics

Proof. ... (* few days later *) ... Qed.

We interactively and mechanically prove this theorem

Soundness Proof

parser.ml pprinter.mlanalyzer.ml

This talk

Verified Static Analysismeets

the verified Compcert C compiler

Background: verifying a compiler

CompCert, a moderately optimizing C compiler usable for critical embedded software

= compiler + proof that the compiler does not introduce bugs

Using the Coq proof assistant, X. Leroy proves the following semantic preservation property:

• Compiler written from scratch, along with its proof; not trying to prove an existing compiler

For all source programs S and compiler-generated code C, if the compiler generates machine code C from source S, without reporting a compilation error, then «C behaves like S».

Compcert meets the industrial world

Fly-by-wire software, for recent Airbus planes • control-command code generated from block diagrams

(3600 files, 3.96 MB of assembly code)• minimalistic OS

Results• Estimated WCET for each file• Average improvement per file: 14% • Compiled with CompCert 2.3, May 2014

Conformance to the certification process (DO-178)• Trade-off between traceability guarantees and efficiency of the generated

code

The Verasco project INRIA Celtique, Gallium, Abstraction, Toccata + VERIMAG + Airbus

Goal: develop and verify in Coq a realistic static analyzer by abstract interpretation

• Language analyzed: the CompCert subset of C• Nontrivial abstract domains, including relational domains• Modular architecture inspired from Astrée’s• Decent alarm reporting

Slogan: • if « CompCert ≈ 1/10th of GCC but formally verified », • likewise « Verasco ≈1/10th of Astrée but formally verified »

http://verasco.imag.fr

Modularity

Astrée is highly modular and programmed in ML

Verasco is highly modular and programmed in Coq

Building a static analyzer in ML

Modular design

Example of interface

module IntervalAbVal : ABVAL = ...!!module NonRelAbEnv (AV:ABVAL) : ABENV = ...!!module SimpleAbMem (AE:ABENV) : ABMEMORY = ...!!module Iterator (AM:ABMEMORY) : ANALYZER = ...!!module myAnalyzer = Iterator(SimpleAbMem(NonRelAbEnv(IntervalAbVal)))

module type ABDOM = sig! type ab! val le : ab → ab → bool! val top : ab! val join : ab → ab → ab! val widen : ab → ab → ab!end

Building a static analyzer

in ML in CoqClass adom (ab:Type) (c:Type) := {! le : ab → ab → bool;! top : ab;! join : ab → ab → ab;! widen : ab → ab → ab;! ! gamma : ab → ℘(c);!! gamma_monotone : ∀ a1 a2,! le a1 a2 = true ⟹ ! gamma a1 ⊆ gamma a2;! gamma_top : ∀ x, ! x ∈ gamma top;! join_sound : ∀ x y, ! gamma x ∪ gamma y ! ⊆ gamma (join x y)!}

module type ABDOM = sig! type ab! val le : ab → ab → bool! val top : ab! val join : ab → ab → ab! val widen : ab → ab → ab!end

Lazy proofsProof by necessity • We don’t prove properties that

are not strictly necessary to establish a soundness theorem.

What we don’t prove

• (ab,le,join) enjoy a lattice structure

• gamma is a meet morphism between complete lattices (Galois connection)

• widen is a sound widening operator

Class adom (ab:Type) (c:Type) := {! le : ab → ab → bool;! top : ab;! join : ab → ab → ab;! widen : ab → ab → ab;! ! gamma : ab → ℘(c);!! gamma_monotone : ∀ a1 a2,! le a1 a2 = true ⟹ ! gamma a1 ⊆ gamma a2;! gamma_top : ∀ x, ! x ∈ gamma top;! join_sound : ∀ x y, ! gamma x ∪ gamma y ! ⊆ gamma (join x y)!}

numbers

CompCert compiler

General architecture

? ...

statesState abstraction

control flowAbstract interpreterAlarms

Numerical abstraction

...

Each layer is parameterized by the underlying one.

CompCert: 1 compiler, 11 languages

type eliminationloop simplifications

CFG constructionexpr. decomp.

spilling, reloadingcalling conventions

Compcert C Clight C#minor

CminorCminorSelRTL

LTL LTLin Linear

MachASM

side-effects outof expressions

stack allocationof «&»variables

Optimizations: constant prop., CSE, tail calls, (LCM), (software pipelining)

instructionselection

registerallocation (IRC)

linearizationof the CFG

layout ofstack frames

asm codegeneration

(instruction scheduling)

Where should we perform the analysis ?

Compcert behavior preservation theorems

. . .

Theorem [Behavior Preservation]

!

Corollaries• If target program goes wrong, source program goes wrong too• A program verifier on , gives useful information on , but not

necessarily on . Pi Pi+1

Pi�1

Pn 2 Ln

P2 2 L2

P1 2 L1

8Pi 8Pi+1, C(Pi) = Pi+1 =) B(Pi + 1) ✓ B(Pi)

Which CompCert representation ?

RTL ?

• the place where most CompCert optimizations take place

• but platform specific, flat expressions

C source ?

• a language for programmer, not for tools

• ultimately we want to gives alarms at this level

Clight ?

• C syntax without side-effect in expressions

C#minor

• almost like Clight but with tool-friendly syntax (e.g. store/load instruction)

Platform specific backend

Compcert C Clight C#minor Cminor RTL … ASM

C#minor Abstract interpreter (1/2)

C#minor• structured statements• exit n (encoding break/continue) : jumps to the end of the (n+1)-th

enclosing block• goto with labels • variables

• global (their address can be taken, statically allocated)• local (their can be taken, dynamically allocated/freed at each function

call/return)• temporary (not resident in memory)

CompCert compilerC#minor ...


...

C#minor Abstract interpreter (2/2)

CompCert compilerC#minor ...


...

Structural approach instead of CFG approach• obviates the need to define program points• uses less memory than the CFG-based interpreter • transfer functions are more involved (control can leave a stmt in many ways)• local fixpoint solving at each loop• function call trigger a recursive call of the abstract interpreter• goto requires a global fixpoint computation

Parameterized by a relational abstract domain for execution states(environment + memory state + call stack)

The state abstract domain (1/2)

Abstract memory cell: 1 unit of storage

c ::= temp(f,t) | local(f,x,offset,size) | global(x,offset,size)

Abstract value: (cell types, points-to graph, numerical abstraction)

The domain is parameterized by a relational numerical domain where cells act as variables.

statesMemory & value domain


CompCert compilerC#minorClightCompCert C ...

The state abstract domain (2/2)

Example:

t := &T; …; s := s + load(int32, t + 8×i + 4)

1. Points-to information says « t contains a pointer to global T »

2. Numerical domain gives a range {4,12} for the numerical value 8×i + 4

3. We approximate the load expression with two numerical cell assignments temp(f,s) := temp(f,s) + global(T,4,int32) temp(f,s) := temp(f,s) + global(T,12,int32)

4. And finally take the join of the two new numerical abstract states thus obtained.


numbers

Abstract numerical domains



Z → int

Integer congruences Integer & F.P. intervals

Nonrel→ Rel Nonrel→ RelSymbolic equalities

Convex polyhedra


VERIMAG work

numbers




Z → int



Convex polyhedra


VERIMAG work

transforms any rel. domain over Z into a rel. domain over machine integers with modulo

arithmetic

numbers




Z → int



Convex polyhedra

conjunctions of linear inequalities ∑ai xi ≤ c

[SAS’13]


VERIMAG work

numbers




Z → int



Convex polyhedra


VERIMAG work

symbolic conditional expressions

(improve precision of assume commands)

numbers




Z → int



Convex polyhedra


VERIMAG work

transforms any non-rel. domain into a (reduced) rel. domain

numbers




Z → int



Convex polyhedra


VERIMAG work

crucial to analyze the safety of memory accesses (memory alignement)

numbers




Z → int



Convex polyhedra


VERIMAG work

requires reasoning on double-precision floating-point

numbers (IEEE754)

numbers




Z → int



Convex polyhedra


VERIMAG work

Combining abstract domains

Implementations of reduced products tend to be specific to the 2 domains being combined.

System of inter-domains communication channels inspired by that of Astrée [ASIAN’O6]• Channels are used by domains when they need information from another

domain.• The information already present in a channel is enriched with information

of a query.

Implementation

34 000 lines of Coq, excluding blanks and comments • half proof, half code & specs• plus parts reused from CompCert

Bulk of the development: abstract domains for states and for numbers(involve large case analyses and difficult proofs over integer and floating points arithmetic)

Except for the operations over polyhedra, the algorithms are implemented directly in Coq’s specification language.

transfert function

checker

untrusted solver

= formally verified= not verified

transfert functionExternal solver with verified operatorFully verified operator

Experimental results

Preliminary experiments on small C programs (up to a few hundred lines)• CompCert benchmarks• Cryptographic routines (NaCl library)

Exercise many delicate aspects of the C language: arrays, pointer arithmetic, function pointers, floating-point arithmetic.

The analyzer can takes < 1 minute to analyze a few hundred lines of C.

We obtain automatically foundational « can’t go wrong » proofs for non-trivial C programs

Conclusion

Static analyzer based on abstract interpretation which establishes the absence of run-time errors in C programs (excluding recursion and dynamic allocation)

!

!

!

Modular architecture supporting the extensible combination of multiple abstract domains (relational and non-relational)

Integrates with CompCert, so that the soundness of the analysis is guaranteed on the compiled code as well

Theorem vanalysis_is_correct: forall prog tr, ! vanalysis prog = OK $ program_behaves (semantics prog) (Goes_wrong tr)$ False.

A holistic effect with compiler verificationCompilerTheorem transf_c_program_is_refinement: forall p tp, transf_c_program p = OK tp → (forall behv, exec_C_program p behv → not_wrong behv) → (forall behv, exec_Asm_program tp behv → exec_C_program p behv).


Static analyzerTheorem analyzer_is_correct: forall p, static_analyzer_result p = Success → (forall behv, exec_C_program p behv → not_wrong behv).

+


Static analyzerTheorem analyzer_is_correct: forall p, static_analyzer_result p = Success → (forall behv, exec_C_program p behv → not_wrong behv).

+

Stronger correctness resultTheorem transf_c_program_is_refinement: forall p tp, transf_c_program p = OK tp → static_analyzer_result p = Success → (forall behv, exec_Asm_program tp behv → exec_C_program p behv).

=

Future directions

Engineering • from C#minor to Clight• better support for precision debugging

Improving the algorithmic efficiency of the static analyzer• from Coq’s integer and FP arithmetic (list of bits) to more efficient libraries• improve physical sharing of purely functional data structures used for

maps and setsExtend the memory abstract domain• dynamic memory allocation • recursion (also requires modifying the iterator)

Improving the precision of the analysis• on-the-fly unrolling of certain loops (based on unverified heuristics)• new abstract domains, e.g. octagons

Questions ?

Related work

S. Blazy, V. Laporte, A. Maroneze, and D. Pichardie. Formal verification of a C value analysis based on abstract interpretation. SAS 2013.

A.Fouilhé,D.Monniaux,and M.Périn. Efficient generation of correctness certificates for the abstract domain of polyhedra. SAS 2013.

S. Cho, J. Kang, J. Choi, C.-K. Hur, and K. Yi. Sparrow- Berry: A verified validator for an industrial-strength static analyzer. http://ropas.snu.ac.kr/sparrowberry/, 2013

M. Hofmann, A. Karbyshev, and H. Seidl. Verifying a local generic solver in Coq. SAS 2010.

T. Nipkow. Abstract interpretation of annotated commands. ITP 2012.

D. Cachera and D. Pichardie. A certified denotational abstract interpreter. ITP 2010.

Ain

Cin

Aout

Cout

C0out

⇥Ain

Cin

⇥ Aout

Cout

C0out

Figure 4. Communication channels between abstract operators.Left: single operator. Right: composition of two operators fromdifferent domains.

the same concrete environments. (Channel Cin

in figure 4.) Sym-metrically, each transfer function that returns a new abstract valuealso returns a channel C0

out

that other domains can query to obtaininformation on the state after the execution of the transfer function.Finally, yet another channel C

out

is provided as extra argument toeach transfer function. Querying this channel will provide informa-tion on the state after the execution of transfer functions from otherdomains. Only abstract domains having already computed thesefunction can answer these queries. In other words, the domain op-eration produces C0

out

by enriching information already present inC

out

with information of its own. For example, the assign transferfunction, which corresponds, in the concrete, to an assignment ofan expression to a variable, has the following type:

assign: var ! iexpr var ! t * in_chan !

in_chan ! (t * in_chan)+?;

The first and second arguments are the assigned variable and ex-pression. The third argument is a pair representing the initial states:an abstract value and the channel C

in

. The fourth argument is thechannel C

out

representing the current information about the finalstate after the assignment. If no contradiction is found, assign re-turns the final abstract state and the enriched channel C0

out

. Thespecification of assign is as follows:

assign_correct: 8 x e ab chan ⇢ n,

n 2 eval_iexpr ⇢ e !

⇢ 2 � ab !

(upd ⇢ x n) 2 � chan !

(upd ⇢ x n) 2 � (assign x e ab chan);

Here, we extend � to pairs of abstract values and channels, tak-ing � (x, y) = � x \ � y and using Coq type classes. Thisspecification of assign is analogous to the one in figure 2. Thedifference is that we add an hypothesis stating that the two chan-nels given as parameters are correct with respect to initial and finalstates respectively. Moreover, we demand that the returned channelbe correct with respect to the final state.

An implementation of such a specification has to create a chan-nel. For each query, the implementation can choose to forward it tothe channel received as its fourth parameter, effectively forwardingit to another domain, or to answer it using its own information, or todo both and combine the information. For example, an interval do-main will answer get_itv queries but not get_eq_expr queries,forwarding the latter to other domains.

This interface for channel-aware transfer functions such asassign makes it easy to combine two domains and have themcommunicate. Verasco provides a generic combinator (pictured as⌦ in figure 1) that takes two abstract domains over ideal numericalenvironments and returns a product domain where informationcoming from both domains is stored, and where the two domainscan communicate via channels. The definition of assign for theproduct is the following:

assign v e (ab:(A*B)*in_chan) chan :=

let ’((a, b), abchan) := ab in

(* Computation on the first component *)

do_bot retachan <- assign v e (a, abchan) chan;

let ’(reta, chan) := retachan in

(* Computation on the second component,

using the new input channel *)

do_bot retbchan <- assign v e (b, abchan) chan;

let ’(retb, chan) := retbchan in

NotBot ((reta, retb), chan)

The “plumbing” implemented here is depicted in figure 4, rightpart. As shown there, the C

out

channel passed to the second abstractdomain is the C0

out

channel generated by the first abstract domain.This enables the second abstract domain to query the first one.

Using this product construction, we can build trees (nested prod-ucts) of cooperating domains. As depicted in figure 1, the input tothe “Z ! int” domain transformer described in section 6.5 issuch a combination of numerical domains. Abstract states of thiscombination are pairs of, on the one hand, nested pairs of abstractstates from the individual numerical domains, and, on the otherhand, a channel. The channel is not only used when calling abstracttransfer functions, but also directly in order to get numerical infor-mation such as variation intervals. When the “Z ! int” domaintransformer calls a transfer function, it simply passes as initial C

out

channel a “top” channel whose concretization contains all concreteenvironments: assign x e v in chan top.

One final technical difficulty is comparison between abstractstates, such as the subsumption test v used during post-fixpointcomputation. In the upper layers, abstract states comprise (nestedpairs of) abstract values plus a channel. Therefore, it seems nec-essary to compare two channels, or at least one channel and oneabstract state. However, channels being records of functions, com-parison is not decidable. Our solution is to maintain the invariantthat channels never contain more information than what is con-tained in the abstract values they are paired with. That is, at thetop of the combination of domains, when we manipulate a pair(ab, chan) of an abstract value and a channel, we will makesure that �(ab) ✓ �(chan) holds. In order to check whether onesuch pair (ab1, chan1) is smaller than another (ab2, chan2),we only need to check that �(ab1, chan1) ✓ �(ab2), whichis easily decidable. Thus, the type of the comparison function forabstract values is:

leb: t * in_chan ! t ! bool

Note that it is useful to provide leb with a channel for its firstargument: an abstract domain can, then, query other domains inorder to compare abstract values.

However, the constraint �(ab) ✓ �(chan) is too strong forreal abstract domains: it makes it impossible for a domain to for-ward a query to another domain. Instead, we use a weaker con-straint for every transfer function. In the case of assign, we prove:

8 x e in chan0 ab chan,

assign x e in chan0 = NotBot (ab, chan) !

� chan0 \ � ab ✓ �(chan)

That is, the returned channel contains no more informationthan what is contained in the returned abstract value and thegiven channel. When chan0 is in_chan_top, it follows that�(ab) ✓ �(chan), ensuring the soundness of the leb compari-son function.

The property that limits the amount of information contained inchannels is useful beyond the proof of soundness for comparisons:it is also a sanity check, ensuring that the returned channel onlydepends on the abstract value and on the channel given as thelast argument, but not for instance on channels or abstract values

Communications between domains

a formally-veriﬁed c static analyzer - tum - chair viischulzef/2015-02-03-david... ·...

Documents