
Precise Interprocedural Analysis using Random Interpretation

Sumit Gulwani George Necula

UC-Berkeley


Random Interpretation

= Random Testing + Abstract Interpretation

• Almost as simple as random testing but better soundness guarantees.

• Almost as sound as abstract interpretation but more precise, efficient, and simple.

Example

    if (*) then  a := 0;    b := i;
    else         a := i-2;  b := 2;

    if (*) then  c := b - a;   d := i - 2b;
    else         c := 2a + b;  d := b - 2i;

    assert(c+d = 0); assert(c = a+i)

• Random testing needs to execute all 4 paths to verify the assertions.

• Abstract interpretation analyzes each statement once but uses complicated operations.

• Random interpretation simply executes the program once (and captures the effect of all paths).

Outline

• Framework for intraprocedural random interpretation
  – Advantages
    • Investigate all analyses using one framework
    • Design and proof of new analyses will be simpler

• A generic algorithm for interprocedural analysis

Outline

• Framework for intraprocedural random interpretation
  – Affine join function
  – Eval function
  – Example

• A generic algorithm for interprocedural analysis


Random Interpretation framework

Goal: Detect equivalences of expressions.

Generic Algorithm:

• Choose random values for input variables.

• Execute assignments.

– Using Eval function to evaluate expressions.

• Execute both branches of conditionals and combine the program states at join points.

– Using Affine Join function.

• Compare values of expressions to decide equality.

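Affine Join function

Used for combining program states at join points.

$\phi_w : \text{State} \times \text{State} \to \text{State}$

Let $\rho = \phi_w(\rho_1, \rho_2)$. Then, for every variable $y$,

$\rho(y) \stackrel{\text{def}}{=} w \times \rho_1(y) + (1-w) \times \rho_2(y)$

Example:

    a := 2; b := 3;    gives $\rho_1$: [a=2, b=3]
    a := 4; b := 1;    gives $\rho_2$: [a=4, b=1]

$\rho = \phi_7(\rho_1, \rho_2)$: [a = 7·2 + (1-7)·4, b = 7·3 + (1-7)·1], i.e. [a=-10, b=15]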
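The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming program states are plain dictionaries from variable names to integers; the name `affine_join` is chosen here for illustration and does not come from the slides.

```python
def affine_join(w, state1, state2):
    """Combine two program states variable by variable with weight w:
    result[y] = w * state1[y] + (1 - w) * state2[y]."""
    return {y: w * state1[y] + (1 - w) * state2[y] for y in state1}


# The two states from the slide and the weight w = 7.
rho1 = {"a": 2, "b": 3}
rho2 = {"a": 4, "b": 1}
rho = affine_join(7, rho1, rho2)
print(rho)  # {'a': -10, 'b': 15}
```

In an actual analysis the weight would be drawn at random at each join point; it is fixed to 7 here only to reproduce the numbers on the slide.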

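Properties of Affine Join

• Affine join preserves common linear relationships, e.g. a+b=5.

• It does not introduce false relationships with high probability.

Example:

    a := 2; b := 3;    gives $\rho_1$: [a=2, b=3]
    a := 4; b := 1;    gives $\rho_2$: [a=4, b=1]

$\rho = \phi_7(\rho_1, \rho_2)$: [a = 7·2 + (1-7)·4, b = 7·3 + (1-7)·1], i.e. [a=-10, b=15]

The relationship a+b=5 holds in $\rho_1$, in $\rho_2$, and in $\rho$.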
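Both properties can be checked on this example with a small experiment. The sketch below is only an illustration: the weight range and the "false" relationship b - a = 1 (true in the first state only) are assumptions made here, and the helper names are hypothetical.

```python
def affine_join(w, state1, state2):
    """result[y] = w * state1[y] + (1 - w) * state2[y] for every variable y."""
    return {y: w * state1[y] + (1 - w) * state2[y] for y in state1}


rho1 = {"a": 2, "b": 3}
rho2 = {"a": 4, "b": 1}
weights = range(-50, 51)  # hypothetical set of candidate weights


def holds_after_join(relation, w):
    """Evaluate a relationship on the joined state for weight w."""
    return relation(affine_join(w, rho1, rho2))


# a + b = 5 holds in both input states, so it holds for every weight.
preserved = all(
    holds_after_join(lambda s: s["a"] + s["b"] == 5, w) for w in weights
)

# b - a = 1 holds only in rho1; after the join b - a = 4w - 3,
# so it survives only for the single weight w = 1.
spurious = [
    w for w in weights if holds_after_join(lambda s: s["b"] - s["a"] == 1, w)
]

print(preserved, spurious)  # True [1]
```

Only one weight out of 101 lets the false relationship through, which is the sense in which a random weight avoids false relationships with high probability.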

Eval function

$\text{Eval} : \text{Expression} \times \text{State} \to \text{Value}$

• Used for executing expressions
• Defined in terms of $\text{Poly} : \text{Expression} \to \text{Polynomial}$
• Poly is abstraction specific

$\text{Eval}(e, \rho)$ = evaluation of Poly(e) using $\rho$ and random choices for non-program variables

Poly must satisfy:

• Correctness: Poly(e1) = Poly(e2) iff e1 = e2

• Linearity: Poly(e) is linear in program variables.

Example of Poly function

• Linear Arithmetic (POPL 2003)
  Expression $e := y \mid e_1 \pm e_2 \mid c \cdot e$
  Poly(e) = e

• Uninterpreted Functions (POPL 2004)
  Expression $e := y \mid F(e)$
  Poly(y) = y
  Poly(F(e)) = $a \times \text{Poly}(e) + b$
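To make the uninterpreted-function case concrete, here is a hedged sketch of Eval for unary functions. It assumes that a and b are random values chosen once per function symbol, and it works modulo a prime to keep values bounded; the modulus, the tuple encoding of expressions, and the names `make_eval` and `eval_expr` are all assumptions made for this illustration, not details from the slides.

```python
import random

P = 2_147_483_647  # hypothetical prime modulus (2**31 - 1)


def make_eval(seed=0):
    """Build an Eval function with its own random (a, b) per function symbol."""
    rng = random.Random(seed)
    coeffs = {}  # function symbol -> (a, b), the non-program variables

    def eval_expr(e, state):
        if isinstance(e, str):
            # Poly(y) = y: look the program variable up in the state.
            return state[e] % P
        f, arg = e
        # Poly(F(e)) = a * Poly(e) + b, with a != 0 chosen at random.
        if f not in coeffs:
            coeffs[f] = (rng.randrange(1, P), rng.randrange(P))
        a, b = coeffs[f]
        return (a * eval_expr(arg, state) + b) % P

    return eval_expr


eval_expr = make_eval()
state = {"x": 10, "y": 10, "z": 11}

# x and y have equal values, so F(x) and F(y) evaluate to the same value.
same = eval_expr(("F", "x"), state) == eval_expr(("F", "y"), state)
# x and z differ, and a != 0, so F(x) and F(z) evaluate differently.
differ = eval_expr(("F", "x"), state) != eval_expr(("F", "z"), state)
print(same, differ)  # True True
```

Nested applications such as `("F", ("G", "x"))` are evaluated the same way, each function symbol using its own random pair.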

Example: Random Interpretation for Linear Arithmetic

Input: i=3

    if (*) then  a := 0;    b := i;        state: i=3, a=0, b=3
    else         a := i-2;  b := 2;        state: i=3, a=1, b=2

Join with w1 = 5:  i=3, a=-4, b=7

    if (*) then  c := b - a;   d := i - 2b;    state: i=3, a=-4, b=7, c=11, d=-11
    else         c := 2a + b;  d := b - 2i;    state: i=3, a=-4, b=7, c=-1, d=1

Join with w2 = 2:  i=3, a=-4, b=7, c=23, d=-23

    assert (c+d = 0);    holds: 23 + (-23) = 0
    assert (c = a+i)     fails: 23 ≠ -4 + 3
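The run above can be reproduced with a short script. This is a sketch under the assumption that states are dictionaries and that the weights are passed in explicitly rather than drawn at random; `affine_join` and `run` are illustrative names.

```python
def affine_join(w, s1, s2):
    """result[y] = w * s1[y] + (1 - w) * s2[y] for every variable y."""
    return {y: w * s1[y] + (1 - w) * s2[y] for y in s1}


def run(i, w1, w2):
    """Execute both branches of each conditional and join the states."""
    # First conditional.
    true_branch = {"i": i, "a": 0, "b": i}
    false_branch = {"i": i, "a": i - 2, "b": 2}
    s = affine_join(w1, true_branch, false_branch)

    # Second conditional, starting from the joined state.
    true_branch = dict(s, c=s["b"] - s["a"], d=s["i"] - 2 * s["b"])
    false_branch = dict(s, c=2 * s["a"] + s["b"], d=s["b"] - 2 * s["i"])
    return affine_join(w2, true_branch, false_branch)


s = run(i=3, w1=5, w2=2)
first_assertion = s["c"] + s["d"] == 0
second_assertion = s["c"] == s["a"] + s["i"]

print(s)  # {'i': 3, 'a': -4, 'b': 7, 'c': 23, 'd': -23}
print(first_assertion, second_assertion)  # True False
```

A single execution validates the first assertion and refutes the second, without enumerating the four paths.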

Outline

• Framework for intraprocedural random interpretation
  – Affine join function
  – Eval function
  – Example

• A generic algorithm for interprocedural analysis
  – Random summary (Idea #1)
  – Issue of freshness (Idea #2)
  – Error probability and complexity
  – Experiments

Example

Random interpretation of the example program with i=3, w1 = 5, w2 = 2 gives i=3, a=-4, b=7 after the first join and i=3, a=-4, b=7, c=23, d=-23 after the second join.

• The second assertion is true in the context i=2.

• We need two new ideas to make the analysis interprocedural.

Idea #1: Keep input variables symbolic

• Do not choose random values for input variables (to later instantiate by any context).

• Resulting program state at the end is a random summary.

    if (*) then  a := 0;    b := i;        state: a=0, b=i
    else         a := i-2;  b := 2;        state: a=i-2, b=2

Join with w1 = 5:  a=8-4i, b=5i-8

    if (*) then  c := b - a;   d := i - 2b;    state: a=8-4i, b=5i-8, c=9i-16, d=16-9i
    else         c := 2a + b;  d := b - 2i;    state: a=8-4i, b=5i-8, c=8-3i, d=3i-8

Join with w2 = 2 (random summary):  a=8-4i, b=5i-8, c=21i-40, d=40-21i

Instantiating the summary in the context i=2:  a=0, b=2, c=2, d=-2
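A random summary can be computed by running the same interpreter on linear polynomials in i instead of numbers. The sketch below assumes a tiny hand-written class for polynomials of the form c0 + c1·i; the class `Lin` and the helpers `join`, `const`, and `summary` are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lin:
    """Linear polynomial c0 + c1*i in the symbolic input variable i."""
    c0: int = 0
    c1: int = 0

    def __add__(self, other):
        return Lin(self.c0 + other.c0, self.c1 + other.c1)

    def __sub__(self, other):
        return Lin(self.c0 - other.c0, self.c1 - other.c1)

    def scale(self, k):
        return Lin(k * self.c0, k * self.c1)

    def at(self, i):
        """Instantiate the polynomial in a concrete context."""
        return self.c0 + self.c1 * i


I = Lin(0, 1)  # the symbolic input i


def const(k):
    return Lin(k, 0)


def join(w, x, y):
    """Affine join on symbolic values: w*x + (1-w)*y."""
    return x.scale(w) + y.scale(1 - w)


def summary(w1, w2):
    a = join(w1, const(0), I - const(2))
    b = join(w1, I, const(2))
    c = join(w2, b - a, a.scale(2) + b)
    d = join(w2, I - b.scale(2), b - I.scale(2))
    return {"a": a, "b": b, "c": c, "d": d}


s = summary(w1=5, w2=2)
print(s["c"], s["d"])  # Lin(c0=-40, c1=21) Lin(c0=40, c1=-21)

# Instantiate the summary in the context i = 2.
at2 = {name: poly.at(2) for name, poly in s.items()}
print(at2)  # {'a': 0, 'b': 2, 'c': 2, 'd': -2}
```

With i = 2 the instantiated state satisfies c = a + i (2 = 0 + 2), so the second assertion holds in that context even though it fails for i = 3.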

Idea #2: Generate fresh summaries

Procedure P
Input: i

    if (*) then  x := i+1;    state: x = i+1
    else         x := 3;      state: x = 3

    Join with w = 5:  x = 5i-7
    return x;

Procedure Q

    u := P(2); v := P(1); w := P(1);
    Assert (u = 3); Assert (v = w);

Using the summary x = 5i-7 for all three calls:

    u = 5·2 - 7 = 3
    v = 5·1 - 7 = -2
    w = 5·1 - 7 = -2

• Plugging the same summary twice is unsound: Assert (v = w) appears to hold, although the two calls P(1) can return different values (2 or 3).

• Fresh summaries can be generated by random affine combination of a few independent summaries!
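The unsoundness is easy to see when the summary is written as a function and compared with what P can actually return. This is a minimal sketch; `summary_P` and `possible_returns_P` are illustrative names, and the weight is fixed to 5 to match the slide.

```python
def summary_P(i, w=5):
    """One random summary of P: affine join of x = i+1 and x = 3."""
    return w * (i + 1) + (1 - w) * 3  # equals 5i - 7 for w = 5


def possible_returns_P(i):
    """The values P can really return, one per branch."""
    return {i + 1, 3}


# Procedure Q, reusing the same summary for all three calls.
u, v, w_q = summary_P(2), summary_P(1), summary_P(1)
print(u, v, w_q)  # 3 -2 -2

# The summary claims v = w, but P(1) can return two different values.
claims_v_equals_w = v == w_q
really_always_equal = len(possible_returns_P(1)) == 1
print(claims_v_equals_w, really_always_equal)  # True False
```

The assertion u = 3 is genuinely valid, since both branches of P(2) return 3; only v = w is a false positive caused by reusing the summary.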

Generating 2 random summaries for P

Procedure P
Input: i

    if (*) then  x := i+1;    state: x = [i+1, i+1]
    else         x := 3;      state: x = [3, 3]

    Join with w = [5, -2]:  x = [5i-7, 7-2i]
    return x;

Fresh summaries of P, obtained as affine combinations of the 2 summaries:

    x = $\phi_7$(5i-7, 7-2i) = 47i-91
    x = $\phi_6$(5i-7, 7-2i) = 40i-77
    x = $\phi_3$(5i-7, 7-2i) = 19i-35
    x = $\phi_0$(5i-7, 7-2i) = 7-2i
    x = $\phi_5$(5i-7, 7-2i) = 33i-63
    x = $\phi_1$(5i-7, 7-2i) = 5i-7

Procedure Q calls P 3 times. Hence, generating 2 random summaries for Q requires 2×3 fresh summaries of P.
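The fresh summaries listed above follow from the affine-join formula applied to the two independent summaries of P. The sketch below assumes each summary is stored as a pair (constant, coefficient of i); the name `fresh` is illustrative. Note that the weight reproducing 19i-35 is 3.

```python
def fresh(w, s1, s2):
    """Affine combination w*s1 + (1-w)*s2 of two summaries,
    each given as (constant, coefficient of i)."""
    return tuple(w * x + (1 - w) * y for x, y in zip(s1, s2))


s1 = (-7, 5)   # 5i - 7
s2 = (7, -2)   # 7 - 2i

weights = [7, 6, 3, 0, 5, 1]
summaries = [fresh(w, s1, s2) for w in weights]

for w, (c0, c1) in zip(weights, summaries):
    print(f"w={w}: x = {c1}*i + ({c0})")
```

Weights 0 and 1 simply return the two original summaries; in practice the weights would be chosen at random for every call site.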

Generating 2 random summaries for Q

Procedure Q

    u := P(2); v := P(1); w := P(1);
    Assert (u = 3); Assert (v = w);

Fresh summaries of P:

    x = $\phi_7$(5i-7, 7-2i) = 47i-91
    x = $\phi_6$(5i-7, 7-2i) = 40i-77
    x = $\phi_3$(5i-7, 7-2i) = 19i-35
    x = $\phi_0$(5i-7, 7-2i) = 7-2i
    x = $\phi_5$(5i-7, 7-2i) = 33i-63
    x = $\phi_1$(5i-7, 7-2i) = 5i-7

Resulting summaries of Q:

    u = [47·2-91, 40·2-77] = [3, 3]
    v = [19·1-35, 7-2·1] = [-16, 5]
    w = [33·1-63, 5·1-7] = [-30, -2]
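Plugging a different fresh summary into each call site of Q gives the values on this slide. The sketch below takes the six fresh summaries of P as given data, stored as (coefficient of i, constant); `call` and `summarize_Q` are hypothetical helper names.

```python
# Fresh summaries of P from the slide, as (coefficient of i, constant).
fresh_P = [(47, -91), (40, -77), (19, -35), (-2, 7), (33, -63), (5, -7)]


def call(summary, arg):
    """Instantiate a summary of P at a concrete argument."""
    coef, const = summary
    return coef * arg + const


def summarize_Q(p_u, p_v, p_w):
    """Run Q once, using a different fresh summary for each call to P."""
    u, v, w = call(p_u, 2), call(p_v, 1), call(p_w, 1)
    return {"u": u, "v": v, "w": w, "u = 3": u == 3, "v = w": v == w}


# Two random summaries of Q use 2 x 3 fresh summaries of P.
run1 = summarize_Q(fresh_P[0], fresh_P[2], fresh_P[4])
run2 = summarize_Q(fresh_P[1], fresh_P[3], fresh_P[5])
print(run1)
print(run2)
```

Both runs validate u = 3 and refute v = w, so the false positive from reusing a single summary disappears.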

Loops and Fixed point computation

• In the presence of loops (in procedures and call-graphs), fixed point computation is required.

• The number of iterations required to reach the fixed point is $k_V(2k_I + 1) + 1$

  $k_V$: # of visible variables

  $k_I$: # of input variables

Error Probability and Complexity

Time Complexity = $n \, k_V \, k_I^2 \, t$

Error probability = $1/q^{t-m}$

n: size of program
$k_V$, $k_I$: # of visible and input variables
t: # of random summaries
q: size of set from which random values are chosen
m: $k_I k_V$ (generic bound)
   $k_I + k_V$ (for linear arithmetic)
   4 (for unary uninterpreted functions)
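To get a feel for the error bound, it can be evaluated exactly for sample parameters. The values below (2 input variables, 3 visible variables, random values drawn from a set of size 2^32, and t = m + 2 summaries) are hypothetical and chosen only for illustration; the function names are likewise assumptions.

```python
from fractions import Fraction


def error_probability(q, t, m):
    """Bound 1 / q**(t - m) from the slide.

    The bound is below 1 only when t > m, so other inputs are rejected.
    """
    if t <= m:
        raise ValueError("need t > m for a non-trivial bound")
    return Fraction(1, q ** (t - m))


def m_linear_arithmetic(k_i, k_v):
    """m = k_I + k_V for the linear arithmetic abstraction."""
    return k_i + k_v


# Hypothetical parameters, not taken from the experiments.
m = m_linear_arithmetic(k_i=2, k_v=3)
p = error_probability(q=2**32, t=m + 2, m=m)
print(m, p == Fraction(1, 2**64))  # 5 True
```

Each summary beyond m divides the bound by another factor of q, which is why a small number of extra summaries is enough for a very small error probability.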

Related Work

• Intraprocedural random interpretation
  – Linear arithmetic (POPL 03)
  – Uninterpreted functions (POPL 04)

• Interprocedural dataflow analysis (POPL 95, TCS 96)
  – Sagiv, Reps, Horwitz
  – Cons: simpler properties, e.g. liveness, linear constants
  – Pro: better computational complexity

• Interprocedural linear arithmetic (POPL 04)
  – Müller-Olm, Seidl
  – Cons: $O(k^2)$ times slower
  – Pro: works for non-linear relationships too


Experiments

                  Random Inter (this paper)    Random Intra (POPL 2003)    Det Inter (TCS 96)
Prog     Line     Inp     Var      Time        Δ(Var)    Speedup           Δ(Inp)    Speedup
go       29K      63      1700     47          170       107               17        1.9
ijpeg    28K      31      825      4           34        24                3         2.3
li       23K      53      392      34          160       756               20        1.3
gzip     8K       49      525      2           200       39                6         2.0

• Inp: # of input variables that were constants
• Var: # of local variables that were constants
• Δ(Var): # of fewer local variable constants discovered

Random Inter discovers 10-70% more facts; Random Intra is faster by 10-500 times; Det Inter is faster by 2 times.

Conclusion

• Randomization buys efficiency and simplicity at the cost of probabilistic soundness.

• Combining randomized techniques with symbolic techniques is powerful.
