abstract interpretation


Abstract interpretation

Giorgio Levi, Dipartimento di Informatica, Università di Pisa


http://www.di.unipi.it/~levi/levi.html

The general idea

a semantics
  • any definition style, from a denotational definition to a detailed interpreter
  • assigning meanings to programs on a suitable concrete domain (concrete computations domain)

an abstract domain modeling some properties of concrete computations and forgetting about the remaining information (abstract computations domain)

we derive an abstract semantics, which allows us to “execute” the program on the abstract domain to compute its abstract meaning, i.e., the modeled property

Concrete and Abstract Domains

two complete partial orders

the partial orders reflect precision
  • smaller is better

concrete domain $(\wp(C), \subseteq, \{\}, C, \cup, \cap)$
  • has the structure of a powerset
  • we will see later why

abstract domain $(A, \sqsubseteq, \mathit{bottom}, \mathit{top}, \mathit{lub}, \mathit{glb})$
  • each abstract value is a description of “a set of” concrete values

Concretization

concrete domain $(\wp(C), \subseteq, \{\}, C, \cup, \cap)$, abstract domain $(A, \sqsubseteq, \mathit{bottom}, \mathit{top}, \mathit{lub}, \mathit{glb})$

the meaning of abstract values is defined by a concretization function $\gamma : A \to \wp(C)$

$\forall a \in A$, $\gamma(a)$ is the set of concrete computations described by $a$
  • that’s why the concrete domain needs to be a powerset

the concretization function must be monotonic
  • $\forall a_1, a_2 \in A.\ a_1 \sqsubseteq a_2 \implies \gamma(a_1) \subseteq \gamma(a_2)$
  • concretization preserves relative precision
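As an illustration of the two conditions above, here is a minimal OCaml sketch that takes the Sign domain (introduced later in these notes) as $A$ and sets of integers as $C$. Since $\gamma(a)$ is usually infinite, the sketch represents it by a membership predicate. The names `sign`, `gamma_mem`, `leq`, `all` and `gamma_monotone_on` are our own, and monotonicity is only tested on a finite sample of integers, not proved.

```ocaml
(* Sign abstract values, named as in the Sign interpreter of these notes *)
type sign = Bottom | M | Zero | P | Zerom | Zerop | Top

(* gamma a is an infinite set of integers, so we represent it by its
   membership predicate: gamma_mem a n holds iff n belongs to gamma(a) *)
let gamma_mem a n =
  match a with
  | Bottom -> false
  | M -> n < 0
  | Zero -> n = 0
  | P -> n > 0
  | Zerom -> n <= 0
  | Zerop -> n >= 0
  | Top -> true

(* the partial order of the Sign lattice *)
let leq a b =
  a = b
  ||
  match (a, b) with
  | Bottom, _ | _, Top -> true
  | M, Zerom | Zero, Zerom | Zero, Zerop | P, Zerop -> true
  | _ -> false

let all = [ Bottom; M; Zero; P; Zerom; Zerop; Top ]

(* monotonicity of gamma, tested on a finite sample of integers:
   a1 <= a2 must imply gamma(a1) included in gamma(a2) *)
let gamma_monotone_on sample =
  List.for_all
    (fun a1 ->
      List.for_all
        (fun a2 ->
          (not (leq a1 a2))
          || List.for_all (fun n -> (not (gamma_mem a1 n)) || gamma_mem a2 n) sample)
        all)
    all
```

For example, `gamma_monotone_on [ -3; -2; -1; 0; 1; 2; 3 ]` returns `true`.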

Abstraction

concrete domain $(\wp(C), \subseteq, \{\}, C, \cup, \cap)$, abstract domain $(A, \sqsubseteq, \mathit{bottom}, \mathit{top}, \mathit{lub}, \mathit{glb})$

every element of $\wp(C)$ should have a unique “best” (most precise) description in $A$
  • this is possible if and only if $A$ (viewed through $\gamma$ as a subset of $\wp(C)$) is a Moore family
    – closed under glb

in such a case, we can define an abstraction function $\alpha : \wp(C) \to A$

$\forall c \in \wp(C)$, $\alpha(c)$ is the best abstract description of $c$

the abstraction function must be monotonic
  • $\forall c_1, c_2 \in \wp(C).\ c_1 \subseteq c_2 \implies \alpha(c_1) \sqsubseteq \alpha(c_2)$
  • abstraction preserves relative precision
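A minimal sketch of the “best description” idea, again assuming the Sign domain and restricting $\wp(C)$ to finite sets of integers given as lists: $\alpha(c)$ is computed as the least element among all abstract values whose concretization contains $c$, which exists because Sign is closed under glb. The helper names are hypothetical.

```ocaml
type sign = Bottom | M | Zero | P | Zerom | Zerop | Top

(* membership predicate for gamma(a) *)
let gamma_mem a n =
  match a with
  | Bottom -> false
  | M -> n < 0
  | Zero -> n = 0
  | P -> n > 0
  | Zerom -> n <= 0
  | Zerop -> n >= 0
  | Top -> true

(* the partial order of the Sign lattice *)
let leq a b =
  a = b
  ||
  match (a, b) with
  | Bottom, _ | _, Top -> true
  | M, Zerom | Zero, Zerom | Zero, Zerop | P, Zerop -> true
  | _ -> false

let all = [ Bottom; M; Zero; P; Zerom; Zerop; Top ]

(* alpha c: the least abstract value whose concretization contains every
   element of the finite set c *)
let alpha (c : int list) : sign =
  let covers a = List.for_all (gamma_mem a) c in
  let candidates = List.filter covers all in
  (* Top always covers c, and Sign is closed under glb, so the least
     candidate exists and List.find cannot fail *)
  List.find (fun a -> List.for_all (leq a) candidates) candidates
```

For instance `alpha [ 0; 3 ]` is `Zerop`, while `alpha [ -1; 1 ]` can only be described by `Top`.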

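Galois connection

Galois connection: $\forall x \in \wp(C).\ x \subseteq \gamma(\alpha(x))$ and $\forall y \in A.\ \alpha(\gamma(y)) \sqsubseteq y$

Galois insertion: $\forall y \in A.\ \alpha(\gamma(y)) = y$

$\alpha$ and $\gamma$ are monotonic and mutually determine each other

between $(\wp(C), \subseteq, \{\}, C, \cup, \cap)$ and $(A, \sqsubseteq, \mathit{bottom}, \mathit{top}, \mathit{lub}, \mathit{glb})$
  • $\gamma : A \to \wp(C)$ (concretization)
  • $\alpha : \wp(C) \to A$ (abstraction)

there may be loss of information (approximation) in describing an element of $\wp(C)$ by an element of $A$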
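The Galois connection conditions can be tested by brute force on a small example. The sketch below (hypothetical names, Sign domain assumed, integers restricted to the window $\{-2,\dots,2\}$) checks extensivity $x \subseteq \gamma(\alpha(x))$, the equivalent adjunction form $\alpha(x) \sqsubseteq a \iff x \subseteq \gamma(a)$, and the insertion property $\alpha(\gamma(a)) = a$ with $\gamma(a)$ cut down to the window. A passing check on a finite window is evidence, not a proof.

```ocaml
type sign = Bottom | M | Zero | P | Zerom | Zerop | Top

let gamma_mem a n =
  match a with
  | Bottom -> false
  | M -> n < 0
  | Zero -> n = 0
  | P -> n > 0
  | Zerom -> n <= 0
  | Zerop -> n >= 0
  | Top -> true

let leq a b =
  a = b
  ||
  match (a, b) with
  | Bottom, _ | _, Top -> true
  | M, Zerom | Zero, Zerom | Zero, Zerop | P, Zerop -> true
  | _ -> false

let all = [ Bottom; M; Zero; P; Zerom; Zerop; Top ]

(* best description: least abstract value covering c *)
let alpha c =
  let candidates = List.filter (fun a -> List.for_all (gamma_mem a) c) all in
  List.find (fun a -> List.for_all (leq a) candidates) candidates

(* all subsets of a finite list *)
let rec subsets = function
  | [] -> [ [] ]
  | x :: rest ->
      let s = subsets rest in
      s @ List.map (fun t -> x :: t) s

let window = [ -2; -1; 0; 1; 2 ]
let included x a = List.for_all (gamma_mem a) x

(* x included in gamma(alpha x), for every subset x of the window *)
let extensive = List.for_all (fun x -> included x (alpha x)) (subsets window)

(* adjunction: alpha x <= a  iff  x included in gamma(a) *)
let adjunction =
  List.for_all
    (fun x -> List.for_all (fun a -> leq (alpha x) a = included x a) all)
    (subsets window)

(* insertion: alpha (gamma a) = a, with gamma(a) restricted to the window *)
let insertion =
  List.for_all (fun a -> alpha (List.filter (gamma_mem a) window) = a) all
```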

Concrete semantics

the concrete semantics is defined as the least (or greatest) fixpoint of a concrete semantic evaluation function $F$ defined on the domain $C$
  • this does not necessarily mean that the semantic definition style is denotational!

$F$ is defined in terms of primitive semantic operations $f_i$ on $C$

the abstract semantic evaluation function is obtained by replacing in $F$ each concrete operation $f_i$ by a suitable abstract operation

however, since the actual concrete domain is $\wp(C)$, we first need to lift the concrete semantics $\mathrm{lfp}\,F$ to a collecting semantics defined on $\wp(C)$

Collecting semantics

lifting $\mathrm{lfp}\,F$ to the powerset (to get the collecting semantics) is simply a conceptual operation
  • collecting semantics $= \{\mathrm{lfp}\,F\}$

we don’t need to define a brand new collecting semantic evaluation function $F_c$ on $\wp(C)$
  • we just need to reason in terms of liftings of all the primitive operations (and of $F$) while designing the abstract operations and establishing their properties

in the following, by abuse of notation, we will use the same notation for the standard and the collecting (“conceptually” lifted) operations

Abstract operations: local correctness

an abstract operator $f_i^\alpha$ defined on $A$ is locally correct wrt a concrete operator $f_i$ if
  $\forall x_1,\dots,x_n \in \wp(C).\ f_i(x_1,\dots,x_n) \subseteq \gamma(f_i^\alpha(\alpha(x_1),\dots,\alpha(x_n)))$
  • the concrete computation step is more precise than the concretization of the “corresponding” abstract computation step

a very weak requirement, which is satisfied, for example, by an abstract operator which always computes the worst abstract value top

the real issue in the design of abstract operations is therefore precision

Abstract operations: optimality and completeness

correctness
  $\forall x_1,\dots,x_n \in \wp(C).\ f_i(x_1,\dots,x_n) \subseteq \gamma(f_i^\alpha(\alpha(x_1),\dots,\alpha(x_n)))$

optimality
  $\forall y_1,\dots,y_n \in A.\ f_i^\alpha(y_1,\dots,y_n) = \alpha(f_i(\gamma(y_1),\dots,\gamma(y_n)))$
  • the most precise abstract operator $f_i^\alpha$ correct wrt $f_i$
  • a theoretical bound and basis for the design, rather than an implementable definition

completeness (exactness or absolute precision)
  $\forall x_1,\dots,x_n \in \wp(C).\ \alpha(f_i(x_1,\dots,x_n)) = f_i^\alpha(\alpha(x_1),\dots,\alpha(x_n))$
  • no loss of information: the abstraction of the concrete computation step is exactly the same as the result of the corresponding abstract computation step
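For the Sign domain, the optimal multiplication $\mathit{times}^\alpha(y_1,y_2) = \alpha(\gamma(y_1) \cdot \gamma(y_2))$ is actually computable: every sign class is nonempty and the products of two sign classes form exactly one sign class, so it suffices to reason on the sign classes (“atoms”) an abstract value covers. The sketch below is our own construction, not taken from these notes; it derives the abstract operator from atoms and then checks completeness $\alpha(\{m \cdot n \mid m \in x, n \in y\}) = \mathit{times}^\alpha(\alpha(x), \alpha(y))$ by brute force on all subsets of $\{-2,\dots,2\}$.

```ocaml
type sign = Bottom | M | Zero | P | Zerom | Zerop | Top

(* sign classes ("atoms") of integers *)
type atom = Neg | Zer | Pos

let atom_of n = if n < 0 then Neg else if n = 0 then Zer else Pos

(* the atoms an abstract value covers: gamma, viewed up to sign *)
let atoms = function
  | Bottom -> []
  | M -> [ Neg ]
  | Zero -> [ Zer ]
  | P -> [ Pos ]
  | Zerom -> [ Neg; Zer ]
  | Zerop -> [ Zer; Pos ]
  | Top -> [ Neg; Zer; Pos ]

(* best Sign description of a collection of atoms *)
let of_atoms l =
  match (List.mem Neg l, List.mem Zer l, List.mem Pos l) with
  | false, false, false -> Bottom
  | true, false, false -> M
  | false, true, false -> Zero
  | false, false, true -> P
  | true, true, false -> Zerom
  | false, true, true -> Zerop
  | _ -> Top (* both Neg and Pos: only Top covers them *)

let alpha (c : int list) = of_atoms (List.map atom_of c)

(* sign of a product, from the signs of the factors *)
let atom_times a b =
  match (a, b) with
  | Zer, _ | _, Zer -> Zer
  | Neg, Neg | Pos, Pos -> Pos
  | _ -> Neg

(* optimal abstract times: alpha of all products of gamma y1 and gamma y2 *)
let times y1 y2 =
  of_atoms
    (List.concat (List.map (fun a -> List.map (atom_times a) (atoms y2)) (atoms y1)))

let rec subsets = function
  | [] -> [ [] ]
  | x :: rest ->
      let s = subsets rest in
      s @ List.map (fun t -> x :: t) s

let products xs ys = List.concat (List.map (fun x -> List.map (fun y -> x * y) ys) xs)

(* completeness on every pair of subsets of the window *)
let complete_on window =
  let ss = subsets window in
  List.for_all
    (fun x -> List.for_all (fun y -> alpha (products x y) = times (alpha x) (alpha y)) ss)
    ss
```

For example, `times Zerop Zerom` gives `Zerom` and `times Top Zero` gives `Zero`, the entries of the Times Sign table.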

From local to global correctness

the composition of locally correct abstract operations is locally correct wrt the composition of concrete operations

composition does not preserve optimality, i.e., the composition of optimal operators may be less precise than the optimal abstract version of the composition

if we obtain $F^\alpha$ (abstract semantic evaluation function) by replacing in $F$ every concrete semantic operation by a corresponding (locally correct) abstract operation, the local correctness property still holds
  $\forall x \in \wp(C).\ F(x) \subseteq \gamma(F^\alpha(\alpha(x)))$

local correctness implies global correctness, i.e., correctness of the abstract semantics wrt the concrete one
  $\mathrm{lfp}\,F \subseteq \gamma(\mathrm{lfp}\,F^\alpha), \quad \mathrm{gfp}\,F \subseteq \gamma(\mathrm{gfp}\,F^\alpha)$
  $\alpha(\mathrm{lfp}\,F) \sqsubseteq \mathrm{lfp}\,F^\alpha, \quad \alpha(\mathrm{gfp}\,F) \sqsubseteq \mathrm{gfp}\,F^\alpha$
  • the abstraction of the concrete semantics is more precise than the abstract semantics

$\alpha(\mathrm{lfp}\,F) \sqsubseteq \mathrm{lfp}\,F^\alpha$: why compute $\mathrm{lfp}\,F^\alpha$?

$\mathrm{lfp}\,F$ cannot, in general, be computed in finitely many steps
  • $\omega$ steps are in general required

$\mathrm{lfp}\,F^\alpha$ can be computed in finitely many steps if the abstract domain is finite or at least noetherian
  • a noetherian domain does not contain infinite increasing chains

interesting for static program analysis, where the fixpoint computation must terminate
  • most program properties considered in static analysis are undecidable
  • we accept a loss of precision (safe approximation) in order to make the analysis feasible
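When the domain has no infinite increasing chains and the function is monotonic, the least fixpoint is reached by Kleene iteration from bottom. The generic `lfp` below is a minimal sketch of that loop. The example instantiates it on a finite powerset, the node sets of a small hypothetical graph, with $F(S) = \{0\} \cup \mathit{successors}(S)$, whose least fixpoint is the set of nodes reachable from node 0.

```ocaml
(* Kleene iteration from bottom: bottom, f bottom, f (f bottom), ...
   It stops at the first x with f x = x, which is the least fixpoint when f
   is monotone; termination is guaranteed on domains without infinite
   increasing chains. *)
let lfp ~(equal : 'a -> 'a -> bool) (f : 'a -> 'a) (bottom : 'a) : 'a =
  let rec iterate x =
    let x' = f x in
    if equal x x' then x else iterate x'
  in
  iterate bottom

module IntSet = Set.Make (Int)

(* hypothetical example graph on nodes 0..5 *)
let successors = function
  | 0 -> [ 1 ]
  | 1 -> [ 2 ]
  | 2 -> [ 0; 3 ]
  | 4 -> [ 5 ]
  | _ -> []

(* F(S) = {0} union successors(S), monotone on the finite powerset of 0..5 *)
let step s =
  IntSet.union (IntSet.singleton 0)
    (IntSet.of_list (List.concat (List.map successors (IntSet.elements s))))

let reachable = lfp ~equal:IntSet.equal step IntSet.empty
```

Here the iteration goes through {}, {0}, {0,1}, {0,1,2}, {0,1,2,3} and stops, since applying `step` once more gives the same set.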

Applications

comparative semantics
  • a technique to reason about semantics at different levels of abstraction
  • non-noetherian abstract domain
  • abstraction without approximation (completeness): $\alpha(\mathrm{lfp}\,F) = \mathrm{lfp}\,F^\alpha$

static analysis = effective computation of the abstract semantics
  • if the abstract domain is noetherian and the abstract operations are computationally feasible
  • if the abstract domain is non-noetherian or if the fixpoint computation is too complex
    – use widening operators, which effectively compute an (upper) approximation of $\mathrm{lfp}\,F^\alpha$
      » one example later
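Purely as an illustration of the idea, here is a sketch of the standard widening on integer intervals, a non-noetherian domain not otherwise discussed here. The interval type, `join`, `widen` and the loop-head function `f` (for a hypothetical loop `i := 0; while ... do i := i + 1`) are assumptions of this sketch. Plain Kleene iteration would climb the infinite chain [0,0], [0,1], [0,2], …; with widening, the iteration stabilizes at [0,+∞) after a few steps.

```ocaml
(* Hypothetical interval domain: Itv (lo, hi), with None meaning -infinity
   for lo and +infinity for hi; Bot is the empty interval *)
type itv = Bot | Itv of int option * int option

let join a b =
  match (a, b) with
  | Bot, x | x, Bot -> x
  | Itv (l1, h1), Itv (l2, h2) ->
      let lo = match (l1, l2) with Some x, Some y -> Some (min x y) | _ -> None in
      let hi = match (h1, h2) with Some x, Some y -> Some (max x y) | _ -> None in
      Itv (lo, hi)

(* standard interval widening: a bound that is not stable jumps to infinity *)
let widen a b =
  match (a, b) with
  | Bot, x | x, Bot -> x
  | Itv (l1, h1), Itv (l2, h2) ->
      let lo = match (l1, l2) with Some x, Some y when y >= x -> Some x | _ -> None in
      let hi = match (h1, h2) with Some x, Some y when y <= x -> Some x | _ -> None in
      Itv (lo, hi)

let add_one = function
  | Bot -> Bot
  | Itv (l, h) -> Itv (Option.map succ l, Option.map succ h)

(* loop-head function: F(X) = [0,0] join (X + 1) *)
let f x = join (Itv (Some 0, Some 0)) (add_one x)

(* iterate X_{k+1} = X_k widen F(X_k) until it stabilizes *)
let rec lfp_widen f x =
  let x' = widen x (f x) in
  if x' = x then x else lfp_widen f x'

let result = lfp_widen f Bot
```

The sequence is Bot, [0,0], [0,+∞), and `result` is `Itv (Some 0, None)`: an upper approximation reached in finitely many steps.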

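The abstract interpretation framework

$(\wp(C), \subseteq, \{\}, C, \cup, \cap)$ (concrete domain)

$(A, \sqsubseteq, \mathit{bottom}, \mathit{top}, \mathit{lub}, \mathit{glb})$ (abstract domain)

$\gamma : A \to \wp(C)$ monotonic (concretization function)

$\alpha : \wp(C) \to A$ monotonic (abstraction function)

$\forall x \in \wp(C).\ x \subseteq \gamma(\alpha(x))$, $\forall y \in A.\ \alpha(\gamma(y)) \sqsubseteq y$ (Galois connection)

$\{ f_i^\alpha \mid \forall x_1,\dots,x_n \in \wp(C).\ f_i(x_1,\dots,x_n) \subseteq \gamma(f_i^\alpha(\alpha(x_1),\dots,\alpha(x_n))) \}$ (local correctness)

critical choices
  • the abstract domain to model the property
  • the (possibly optimal) correct abstract operations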

Other approaches and extensions

there exist weaker versions of abstract interpretation
  • without Galois connections (e.g., concretization function only)
  • based on approximation operators (widening, narrowing)
  • without explicit abstract domain (closure operators)

the theory also provides several results on abstract domain design
  • how to combine domains
  • how to improve the precision of a domain
  • how to transform an abstract domain into a complete one
  • … we will look at some of these results in the last lecture

A simple abstract interpreter computing Signs

concrete semantics
  • executable specification (in ML) of the denotational semantics of the untyped λ-calculus without recursion

abstract semantics
  • abstract interpreter computing on the domain Sign

A program

    Fun (Id "x", Ifthenelse (Var (Id "x"),
                             Times (Var (Id "x"), Var (Id "x")),
                             Times (Var (Id "x"), Eint (-1))))

the ML expression

    function x -> if x = 0 then x * x else x * (-1)

Concrete semantics

denotational interpreter

eager semantics

separation from the main semantic evaluation function of the primitive operations, which will then be replaced by their abstract versions

abstraction of concrete values: identity function in the concrete semantics

symbolic “non-deterministic” semantics of the conditional

Semantic domains

    type eval =
      | Funval of (eval -> eval)
      | Int of int
      | Wrong

    let alfa x = x
    type env = ide -> eval
    let emptyenv (x: ide) = alfa(Wrong)
    let applyenv ((x: env), (y: ide)) = x y
    let bind ((r:env), (l:ide), (e:eval)) (lu:ide) =
      if lu = l then e else r(lu)

Semantic evaluation function

    let rec sem (e:exp) (r:env) = match e with
      | Eint(n) -> alfa(Int(n))
      | Var(i) -> applyenv(r,i)
      | Times(a,b) -> times((sem a r), (sem b r))
      | Ifthenelse(a,b,c) ->
          let a1 = sem a r in
          (if valid(a1) then sem b r
           else (if unsatisfiable(a1) then sem c r
                 else merge(a1, sem b r, sem c r)))
      | Fun(ii,aa) -> makefun(ii,aa,r)
      | Appl(a,b) -> applyfun(sem a r, sem b r)

Primitive operations

    let times (x,y) = match (x,y) with
      | (Int nx, Int ny) -> Int (nx * ny)
      | _ -> alfa(Wrong)

    (* a guard is "true" iff it evaluates to Int 0 *)
    let valid x = match x with
      | Int n -> n = 0
      | _ -> false

    let unsatisfiable x = match x with
      | Int n -> n <> 0
      | _ -> false

    let merge (a,b,c) = match a with
      | Int _ -> if b = c then b else alfa(Wrong)
      | _ -> alfa(Wrong)

    let applyfun ((x:eval),(y:eval)) = match x with
      | Funval f -> f y
      | _ -> alfa(Wrong)

    (* mutually recursive with sem: in a single source file, write "and makefun ..." right after sem *)
    let rec makefun(ii,aa,r) =
      Funval(function d -> if d = alfa(Wrong) then alfa(Wrong) else sem aa (bind(r,ii,d)))
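The fragments above can be assembled into one compilable file. In this sketch the abstract syntax types `ide` and `exp` are not shown in these notes and are reconstructed from the constructors used (an assumption); `makefun` is joined to `sem` with `and` so that the two mutually recursive definitions compile; and the guard tests are made exhaustive.

```ocaml
(* Abstract syntax: assumed, not shown in these notes *)
type ide = Id of string

type exp =
  | Eint of int
  | Var of ide
  | Times of exp * exp
  | Ifthenelse of exp * exp * exp
  | Fun of ide * exp
  | Appl of exp * exp

(* Semantic domains *)
type eval = Funval of (eval -> eval) | Int of int | Wrong
type env = ide -> eval

let alfa x = x (* abstraction is the identity in the concrete semantics *)
let emptyenv (_ : ide) = alfa Wrong
let applyenv ((r : env), (i : ide)) = r i
let bind ((r : env), (l : ide), (e : eval)) (lu : ide) = if lu = l then e else r lu

(* Primitive operations *)
let times (x, y) = match (x, y) with Int nx, Int ny -> Int (nx * ny) | _ -> alfa Wrong
let valid x = match x with Int n -> n = 0 | _ -> false
let unsatisfiable x = match x with Int n -> n <> 0 | _ -> false
let merge (a, b, c) = match a with Int _ -> if b = c then b else alfa Wrong | _ -> alfa Wrong
let applyfun ((x : eval), (y : eval)) = match x with Funval f -> f y | _ -> alfa Wrong

(* sem and makefun are mutually recursive, hence "and" *)
let rec sem (e : exp) (r : env) =
  match e with
  | Eint n -> alfa (Int n)
  | Var i -> applyenv (r, i)
  | Times (a, b) -> times (sem a r, sem b r)
  | Ifthenelse (a, b, c) ->
      let a1 = sem a r in
      if valid a1 then sem b r
      else if unsatisfiable a1 then sem c r
      else merge (a1, sem b r, sem c r)
  | Fun (ii, aa) -> makefun (ii, aa, r)
  | Appl (a, b) -> applyfun (sem a r, sem b r)

and makefun (ii, aa, r) =
  Funval (function d -> if d = alfa Wrong then alfa Wrong else sem aa (bind (r, ii, d)))

let esempio =
  sem
    (Fun (Id "x", Ifthenelse (Var (Id "x"), Times (Var (Id "x"), Var (Id "x")),
                              Times (Var (Id "x"), Eint (-1)))))
    emptyenv
```

Evaluating `applyfun (esempio, Int 1)` in the toplevel should give `Int (-1)`, in agreement with the example evaluation of `esempio`.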

From the concrete to the collecting semantics

the concrete semantic evaluation function
  • sem: exp -> env -> eval

the collecting semantic evaluation function
  • semc: exp -> env -> ℘(eval)
  • semc e r = {sem e r}

all the concrete primitive operations have to be lifted to ℘(eval) in the design of the abstract operations

Example of concrete evaluation

    # let esempio = sem (Fun (Id "x",
                              Ifthenelse (Var (Id "x"),
                                          Times (Var (Id "x"), Var (Id "x")),
                                          Times (Var (Id "x"), Eint (-1))))) emptyenv;;
    val esempio : eval = Funval <fun>
    # applyfun(esempio,Int 0);;
    - : eval = Int 0
    # applyfun(esempio,Int 1);;
    - : eval = Int -1
    # applyfun(esempio,Int(-1));;
    - : eval = Int 1

in the “virtual” collecting version
  • applyfunc(esempio, {Int 0, Int 1}) = {Int 0, Int -1}
  • applyfunc(esempio, {Int 0, Int -1}) = {Int 0, Int 1}
  • applyfunc(esempio, {Int -1, Int 1}) = {Int 1, Int -1}
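For first-order integer values, the “virtual” collecting version can be made concrete. The sketch below lifts unary and binary integer operations to finite sets with OCaml's standard `Set.Make`; `lift`, `lift2` and `applyfunc` are hypothetical names, and `f` is simply the ML expression of the example program rather than the result of `sem`, because `eval` contains functional values, on which OCaml's polymorphic comparison raises an exception, so they cannot be stored in an ordered set.

```ocaml
module IntSet = Set.Make (Int)

(* the ML function denoted by the example program (guard "true" iff x = 0) *)
let f x = if x = 0 then x * x else x * (-1)

(* lifting a unary concrete operation to sets: the pointwise image *)
let lift (g : int -> int) (s : IntSet.t) : IntSet.t = IntSet.map g s

(* lifting a binary operation (e.g. times): all combinations of arguments *)
let lift2 (op : int -> int -> int) (s1 : IntSet.t) (s2 : IntSet.t) : IntSet.t =
  IntSet.fold
    (fun x acc -> IntSet.fold (fun y acc -> IntSet.add (op x y) acc) s2 acc)
    s1 IntSet.empty

let applyfunc s = lift f s
```

For instance, `IntSet.elements (applyfunc (IntSet.of_list [ 0; 1 ]))` is `[ -1; 0 ]`, matching the first collecting equation.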

From the collecting to the abstract semantics

concrete domain: $(\wp(\mathit{ceval}), \subseteq)$

concrete (non-collecting) environment: cenv = ide -> ceval

abstract domain: $(\mathit{eval}, \sqsubseteq)$

abstract environment: env = ide -> eval

the collecting semantic evaluation function
  • semc: exp -> cenv -> ℘(ceval)

the abstract semantic evaluation function
  • sem: exp -> env -> eval

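The Sign Abstract Domain

concrete domain $(\wp(\mathbb{Z}), \subseteq)$: sets of integers

abstract domain $(\mathit{Sign}, \sqsubseteq)$

[Figure: Hasse diagram of the Sign lattice: bot below -, 0 and +; - and 0 below 0-; 0 and + below 0+; 0- and 0+ below top]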

Redefining eval for Sign

    type ceval = Funval of (ceval -> ceval) | Int of int | Wrong
    type eval = Afunval of (eval -> eval) | Top | Bottom | Zero | Zerop | Zerom | P | M
    (* only applied to Int and Wrong values by the interpreter *)
    let alfa x = match x with
      | Wrong -> Top
      | Int n -> if n = 0 then Zero else if n > 0 then P else M

the partial order relation
  • the relation shown in the Sign lattice, extended with its lifting to functions
  • there exist no infinite increasing chains
  • we might add a recursive function construct and find a way to compute the abstract least fixpoint in a finite number of steps

lub and glb of eval are the obvious ones

concrete domain: $(\wp(\mathit{ceval}), \subseteq, \{\}, \mathit{ceval}, \cup, \cap)$

abstract domain: $(\mathit{eval}, \sqsubseteq, \mathit{Bottom}, \mathit{Top}, \mathit{lub}, \mathit{glb})$

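Concretization function

concrete domain: $(\wp(\mathit{ceval}), \subseteq, \{\}, \mathit{ceval}, \cup, \cap)$, abstract domain: $(\mathit{eval}, \sqsubseteq, \mathit{Bottom}, \mathit{Top}, \mathit{lub}, \mathit{glb})$

$$
\gamma_s(x) =
\begin{cases}
\{\}, & \text{if } x = \mathit{Bottom}\\
\{\mathit{Int}(y) \mid y > 0\}, & \text{if } x = \mathit{P}\\
\{\mathit{Int}(y) \mid y \geq 0\}, & \text{if } x = \mathit{Zerop}\\
\{\mathit{Int}(0)\}, & \text{if } x = \mathit{Zero}\\
\{\mathit{Int}(y) \mid y \leq 0\}, & \text{if } x = \mathit{Zerom}\\
\{\mathit{Int}(y) \mid y < 0\}, & \text{if } x = \mathit{M}\\
\mathit{ceval}, & \text{if } x = \mathit{Top}\\
\{\mathit{Funval}(g) \mid \forall y \in \mathit{eval}.\ \forall v \in \gamma_s(y).\ g(v) \in \gamma_s(f(y))\}, & \text{if } x = \mathit{Afunval}(f)
\end{cases}
$$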

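Abstraction function

concrete domain: $(\wp(\mathit{ceval}), \subseteq, \{\}, \mathit{ceval}, \cup, \cap)$, abstract domain: $(\mathit{eval}, \sqsubseteq, \mathit{Bottom}, \mathit{Top}, \mathit{lub}, \mathit{glb})$

$$
\alpha_s(y) = \mathit{glb}\left\{
\begin{array}{ll}
\mathit{Bottom}, & \text{if } y = \{\}\\
\mathit{M}, & \text{if } y \subseteq \{\mathit{Int}(z) \mid z < 0\}\\
\mathit{Zerom}, & \text{if } y \subseteq \{\mathit{Int}(z) \mid z \leq 0\}\\
\mathit{Zero}, & \text{if } y \subseteq \{\mathit{Int}(0)\}\\
\mathit{Zerop}, & \text{if } y \subseteq \{\mathit{Int}(z) \mid z \geq 0\}\\
\mathit{P}, & \text{if } y \subseteq \{\mathit{Int}(z) \mid z > 0\}\\
\mathit{Top}, & \text{if } y \subseteq \mathit{ceval}\\
\mathit{lub}\{\mathit{Afunval}(f) \mid \mathit{Funval}(g) \in \gamma_s(\mathit{Afunval}(f))\}, & \text{if } y \subseteq \{\mathit{Funval}(g)\} \text{ and } \mathit{Funval}(g) \in y
\end{array}
\right\}
$$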

Galois connections

$\alpha_s$ and $\gamma_s$ are monotonic and define a Galois connection

Times Sign

    *      bot   -     0-    0     0+    +     top
    bot    bot   bot   bot   bot   bot   bot   bot
    -      bot   +     0+    0     0-    -     top
    0-     bot   0+    0+    0     0-    0-    top
    0      bot   0     0     0     0     0     0
    0+     bot   0-    0-    0     0+    0+    top
    +      bot   -     0-    0     0+    +     top
    top    bot   top   top   0     top   top   top

optimal (hence correct) and complete (no approximation)

Abstract operations in addition to times and lub

    let valid x = match x with
      | Zero -> true
      | _ -> false

    let unsatisfiable x = match x with
      | M -> true
      | P -> true
      | _ -> false

    let merge (a,b,c) = match a with
      | Afunval(_) -> Top
      | _ -> lub(b,c)

    let applyfun ((x:eval),(y:eval)) = match x with
      | Afunval f -> f y
      | _ -> alfa(Wrong)

    let rec makefun(ii,aa,r) =
      Afunval(function d -> if d = alfa(Wrong) then d else sem aa (bind(r,ii,d)))

sem is left unchanged
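The abstract interpreter can likewise be assembled into one file. In this sketch the abstract syntax (`ide`, `exp`) is assumed as in the concrete version; the Times Sign table is computed from sign “atoms” rather than transcribed (a choice of ours that yields the same table); `lub` on two `Afunval` values is taken pointwise; and `alfa` maps `Funval` to `Top` as a safe default, which these notes do not specify.

```ocaml
type ide = Id of string

type exp =
  | Eint of int | Var of ide | Times of exp * exp
  | Ifthenelse of exp * exp * exp | Fun of ide * exp | Appl of exp * exp

type ceval = Funval of (ceval -> ceval) | Int of int | Wrong
type eval = Afunval of (eval -> eval) | Top | Bottom | Zero | Zerop | Zerom | P | M

let alfa x =
  match x with
  | Wrong -> Top
  | Int n -> if n = 0 then Zero else if n > 0 then P else M
  | Funval _ -> Top (* safe default of this sketch *)

type env = ide -> eval
let emptyenv (_ : ide) = alfa Wrong
let applyenv ((r : env), (i : ide)) = r i
let bind ((r : env), (l : ide), (e : eval)) (lu : ide) = if lu = l then e else r lu

(* sign classes ("atoms") used to compute times and lub on Sign values *)
type atom = Neg | Zer | Pos

let atoms = function
  | Bottom -> [] | M -> [ Neg ] | Zero -> [ Zer ] | P -> [ Pos ]
  | Zerom -> [ Neg; Zer ] | Zerop -> [ Zer; Pos ] | Top | Afunval _ -> [ Neg; Zer; Pos ]

let of_atoms l =
  match (List.mem Neg l, List.mem Zer l, List.mem Pos l) with
  | false, false, false -> Bottom
  | true, false, false -> M
  | false, true, false -> Zero
  | false, false, true -> P
  | true, true, false -> Zerom
  | false, true, true -> Zerop
  | _ -> Top

let atom_times a b =
  match (a, b) with Zer, _ | _, Zer -> Zer | Neg, Neg | Pos, Pos -> Pos | _ -> Neg

let rec lub (a, b) =
  match (a, b) with
  | Bottom, x | x, Bottom -> x
  | Afunval f, Afunval g -> Afunval (fun d -> lub (f d, g d))
  | Afunval _, _ | _, Afunval _ -> Top
  | _ -> of_atoms (atoms a @ atoms b)

let times (x, y) =
  match (x, y) with
  | Bottom, _ | _, Bottom -> Bottom
  | Afunval _, _ | _, Afunval _ -> Top
  | _ -> of_atoms (List.concat (List.map (fun a -> List.map (atom_times a) (atoms y)) (atoms x)))

let valid x = match x with Zero -> true | _ -> false
let unsatisfiable x = match x with M -> true | P -> true | _ -> false
let merge (a, b, c) = match a with Afunval _ -> Top | _ -> lub (b, c)
let applyfun ((x : eval), (y : eval)) = match x with Afunval f -> f y | _ -> alfa Wrong

(* sem: same text as in the concrete interpreter *)
let rec sem (e : exp) (r : env) =
  match e with
  | Eint n -> alfa (Int n)
  | Var i -> applyenv (r, i)
  | Times (a, b) -> times (sem a r, sem b r)
  | Ifthenelse (a, b, c) ->
      let a1 = sem a r in
      if valid a1 then sem b r
      else if unsatisfiable a1 then sem c r
      else merge (a1, sem b r, sem c r)
  | Fun (ii, aa) -> makefun (ii, aa, r)
  | Appl (a, b) -> applyfun (sem a r, sem b r)

and makefun (ii, aa, r) =
  Afunval (function d -> if d = alfa Wrong then d else sem aa (bind (r, ii, d)))

let esempio =
  sem
    (Fun (Id "x", Ifthenelse (Var (Id "x"), Times (Var (Id "x"), Var (Id "x")),
                              Times (Var (Id "x"), Eint (-1)))))
    emptyenv
```

With this file, `applyfun (esempio, Zerop)` evaluates to `Top`, since the conditional takes both branches and `lub (Zerop, Zerom)` is `Top`.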

An example of abstract evaluation

    # let esempio = sem (Fun (Id "x",
                              Ifthenelse (Var (Id "x"),
                                          Times (Var (Id "x"), Var (Id "x")),
                                          Times (Var (Id "x"), Eint (-1))))) emptyenv;;
    val esempio : eval = Afunval <fun>
    # applyfun(esempio,P);;
    - : eval = M
    # applyfun(esempio,Zero);;
    - : eval = Zero
    # applyfun(esempio,M);;
    - : eval = P
    # applyfun(esempio,Zerop);;
    - : eval = Top
    # applyfun(esempio,Zerom);;
    - : eval = Zerop
    # applyfun(esempio,Top);;
    - : eval = Top

in the collecting version
  • applyfunc(esempio, {Int 0, Int 1}) = {Int 0, Int -1}
  • applyfunc(esempio, {Int 0, Int -1}) = {Int 0, Int 1}
  • applyfunc(esempio, {Int -1, Int 1}) = {Int 1, Int -1}

wrt the abstraction of the concrete (collecting) semantics, there is an approximation for Zerop: on inputs described by Zerop the concrete results are all $\leq 0$, hence described by Zerom, while the abstract result is Top

there are no abstract operations which “invent” the values Zerop and Zerom, which are the only ones on which the conditional takes both ways and can introduce approximation

Recursion

the language has no recursion
  • fixpoint computations are not needed

if (sets of) functions on the concrete domain are abstracted to functions on the abstract domain, we must be careful in the case of recursive definitions
  • a naïve solution might cause the application of a recursive abstract function to diverge, even if the domain is finite
  • we might never get rid of recursion because the guard in the conditional is neither valid nor unsatisfiable
  • we cannot explicitly compute the fixpoint, because equivalence on functions cannot be expressed
  • termination can only be obtained by a loop checking mechanism (finitely many different recursive calls)

we will see a different solution in a case where (sets of) functions are abstracted to non-functional values
  • the explicit fixpoint computation will then be possible
