Precise Interprocedural Analysis using Random Interpretation
Sumit Gulwani George Necula
UC-Berkeley
Random Interpretation
= Random Testing + Abstract Interpretation
• Almost as simple as random testing, but with better soundness guarantees.
• Almost as sound as abstract interpretation, but more precise, efficient, and simple.
Example
    if (*) then { a := 0; b := i; } else { a := i-2; b := 2; }
    if (*) then { c := b - a; d := i - 2b; } else { c := 2a + b; d := b - 2i; }
    assert(c+d = 0); assert(c = a+i);
Here * denotes a nondeterministic condition.
• Random testing needs to execute all 4 paths to verify the assertions.
• Abstract interpretation analyzes each statement once but uses complicated operations.
• Random interpretation simply executes the program once (and captures the effect of all paths).
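To make the contrast concrete, here is a small Python sketch of the random-testing side (the helpers run_path and check_all_paths are hypothetical names, not from the slides). It runs each of the four paths for one concrete input i and records whether each assertion holds.

```python
from itertools import product


def run_path(i: int, first: bool, second: bool) -> tuple[bool, bool]:
    """Execute one path of the example program for input i.

    first/second select the branch taken at each nondeterministic conditional.
    Returns the truth values of (c + d == 0, c == a + i).
    """
    if first:
        a, b = 0, i
    else:
        a, b = i - 2, 2
    if second:
        c, d = b - a, i - 2 * b
    else:
        c, d = 2 * a + b, b - 2 * i
    return c + d == 0, c == a + i


def check_all_paths(i: int) -> dict[tuple[bool, bool], tuple[bool, bool]]:
    """Random testing: to be sure about an assertion, every path must be run."""
    return {(p, q): run_path(i, p, q) for p, q in product([True, False], repeat=2)}


for path, (assert1, assert2) in check_all_paths(3).items():
    print(path, assert1, assert2)
```

For i = 3, only the path that takes the second branch of the first conditional and the first branch of the second conditional violates assert(c = a+i); testing that happens to miss this path would not notice the bug.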
Outline
• Framework for intraprocedural random interpretation
  – Advantages
    • Investigate all analyses using one framework
    • Design and proof of new analyses will be simpler
• A generic algorithm for interprocedural analysis
Outline
• Framework for intraprocedural random interpretation
  – Affine join function
  – Eval function
  – Example
• A generic algorithm for interprocedural analysis
Random Interpretation framework
Goal: Detect equivalences of expressions.
Generic Algorithm:
• Choose random values for input variables.
• Execute assignments.
– Use the Eval function to evaluate expressions.
• Execute both branches of conditionals and combine the program states at join points.
– Use the Affine Join function.
• Compare values of expressions to decide equality.
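The generic algorithm above can be sketched in Python for the example program. This is an illustrative sketch, not the authors' implementation: it assumes arithmetic modulo a large prime P, so that inputs and join weights are drawn uniformly from a finite set, and the names affine_join and random_interpret are hypothetical.

```python
import random

P = 2**31 - 1  # prime modulus; values and weights are drawn from {0, ..., P-1} (assumption)


def affine_join(w: int, s1: dict[str, int], s2: dict[str, int]) -> dict[str, int]:
    """Combine two states: y -> w*s1[y] + (1-w)*s2[y] (mod P)."""
    return {y: (w * s1[y] + (1 - w) * s2[y]) % P for y in s1}


def random_interpret(rng: random.Random) -> tuple[bool, bool]:
    """One run of the example program; returns the verdicts for both assertions."""
    i = rng.randrange(P)  # random value for the input variable
    # First conditional: execute both branches, then join the two states.
    s = affine_join(rng.randrange(P),
                    {"i": i, "a": 0, "b": i},
                    {"i": i, "a": (i - 2) % P, "b": 2})
    a, b = s["a"], s["b"]
    # Second conditional: again execute both branches and join.
    s = affine_join(rng.randrange(P),
                    {"i": i, "a": a, "b": b, "c": (b - a) % P, "d": (i - 2 * b) % P},
                    {"i": i, "a": a, "b": b, "c": (2 * a + b) % P, "d": (b - 2 * i) % P})
    # Compare values of expressions to decide equality.
    return (s["c"] + s["d"]) % P == 0, s["c"] == (s["a"] + i) % P


print(random_interpret(random.Random(0)))
```

One run yields both verdicts. Working the algebra by hand, after the second join c - (a + i) equals -3·w2·(1 - w1)·(i - 2), where w1 and w2 are the two random weights; so the second check succeeds only if a random choice hits w2 = 0, w1 = 1 or i = 2, while c + d = 0 holds for every choice.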
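Affine Join function
Used for combining program states at join points: $\phi_w : \mathit{State} \times \mathit{State} \to \mathit{State}$
Let $\rho = \phi_w(\rho_1, \rho_2)$. Then, $\rho(y) \stackrel{\text{def}}{=} w \times \rho_1(y) + (1-w) \times \rho_2(y)$
Example: the branches a := 2; b := 3; and a := 4; b := 1; produce
$\rho_1$: [a=2, b=3]   $\rho_2$: [a=4, b=1]
$\rho = \phi_7(\rho_1, \rho_2) = [a = 7\cdot 2 + (1-7)\cdot 4,\ b = 7\cdot 3 + (1-7)\cdot 1]$, i.e. [a=-10, b=15]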
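The definition translates directly into code. A minimal Python sketch with plain integers (affine_join, rho1 and rho2 are illustrative names) reproduces the example with w = 7:

```python
def affine_join(w: int, rho1: dict[str, int], rho2: dict[str, int]) -> dict[str, int]:
    """rho(y) = w * rho1(y) + (1 - w) * rho2(y) for every variable y."""
    return {y: w * rho1[y] + (1 - w) * rho2[y] for y in rho1}


rho1 = {"a": 2, "b": 3}  # state after a := 2; b := 3;
rho2 = {"a": 4, "b": 1}  # state after a := 4; b := 1;
print(affine_join(7, rho1, rho2))  # {'a': -10, 'b': 15}
```

The weights w = 1 and w = 0 simply return one of the two input states, so a fixed degenerate weight would make the relationships of a single branch look valid.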
Properties of Affine Join
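• Affine join preserves linear relationships common to both states, e.g. a+b=5 (it holds in [a=2, b=3], in [a=4, b=1], and in the joined state [a=-10, b=15]).
• It does not introduce false relationships, with high probability (w.h.p.) over the random choice of w.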
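Both properties can be checked numerically. In this sketch (assuming arithmetic modulo a prime Q, with the hypothetical helper check_relations), a+b = 5 holds in both input states, while b = a+1 holds only in [a=2, b=3]:

```python
import random

Q = 2**31 - 1  # size of the (prime-modulus) set the weight w is drawn from (assumption)


def affine_join(w: int, rho1: dict[str, int], rho2: dict[str, int]) -> dict[str, int]:
    """rho(y) = w*rho1(y) + (1-w)*rho2(y), computed modulo Q."""
    return {y: (w * rho1[y] + (1 - w) * rho2[y]) % Q for y in rho1}


def check_relations(w: int) -> tuple[bool, bool]:
    """Join the example states with weight w and test two candidate facts."""
    rho = affine_join(w, {"a": 2, "b": 3}, {"a": 4, "b": 1})
    common = (rho["a"] + rho["b"]) % Q == 5    # a+b = 5 holds in both states
    spurious = rho["b"] == (rho["a"] + 1) % Q  # b = a+1 holds only in the first state
    return common, spurious


rng = random.Random(1)
print([check_relations(rng.randrange(Q)) for _ in range(3)])
```

After the join, b - a - 1 = -4(1 - w), so the spurious fact b = a+1 survives only for the single weight w = 1, which a weight drawn uniformly from Q values hits with probability 1/Q.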
Eval function
$\mathit{Eval} : \mathit{Expression} \times \mathit{State} \to \mathit{Value}$
• Used for executing expressions
• Defined in terms of $\mathit{Poly} : \mathit{Expression} \to \mathit{Polynomial}$
• Poly is abstraction specific
$\mathit{Eval}(e, \rho)$ = evaluation of Poly(e) using $\rho$ and random choices for non-program variables
Poly must satisfy:
• Correctness: Poly(e1) = Poly(e2) iff e1 = e2
• Linearity: Poly(e) is linear in program variables.
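Example of Poly function
• Linear Arithmetic (POPL 2003)
  Expression $e ::= y \mid e_1 \pm e_2 \mid c \cdot e$
  $\mathit{Poly}(e) = e$
• Uninterpreted Functions (POPL 2004)
  Expression $e ::= y \mid F(e)$
  $\mathit{Poly}(y) = y$, $\mathit{Poly}(F(e)) = a \times \mathit{Poly}(e) + b$, where a and b are non-program variables (Eval chooses them randomly)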
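For linear arithmetic Poly(e) = e, so Eval simply evaluates e in the state. For uninterpreted functions, the following Python sketch builds an Eval from the Poly above; it assumes a hypothetical expression encoding (a variable name, or the pair ("F", e) for F(e)) and arithmetic modulo a prime Q, with a drawn nonzero.

```python
import random

Q = 2**31 - 1  # arithmetic modulo a prime (an assumption of this sketch)


def make_eval(rng: random.Random):
    """Build Eval for unary uninterpreted functions.

    Expressions are a variable name (str) or a pair ("F", e) standing for F(e).
    """
    # Random choices for the non-program variables in Poly(F(e)) = a*Poly(e) + b.
    a, b = rng.randrange(1, Q), rng.randrange(Q)

    def eval_expr(e, rho: dict[str, int]) -> int:
        if isinstance(e, str):  # Poly(y) = y: look the variable up in the state
            return rho[e] % Q
        _, sub = e              # Poly(F(e)) = a * Poly(e) + b
        return (a * eval_expr(sub, rho) + b) % Q

    return eval_expr


ev = make_eval(random.Random(2))
rho = {"x": 5, "y": 5, "z": 9}
print(ev(("F", "x"), rho) == ev(("F", "y"), rho))  # True: x = y in rho, so F(x) = F(y)
print(ev(("F", "x"), rho) == ev(("F", "z"), rho))  # False: a is nonzero and 5 != 9
```

Because Poly(F(e)) is linear in Poly(e), equal arguments always give equal values, while different arguments give different values unless the random choices are unlucky.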
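Example: Random Interpretation for Linear Arithmetic
Random input i=3; random join weights w1 = 5 (first conditional) and w2 = 2 (second conditional).
• First conditional, starting from i=3:
  – a := 0; b := i; gives i=3, a=0, b=3
  – a := i-2; b := 2; gives i=3, a=1, b=2
  – Affine join with w1 = 5: i=3, a=-4, b=7
• Second conditional, starting from i=3, a=-4, b=7:
  – c := b - a; d := i - 2b; gives c=11, d=-11
  – c := 2a + b; d := b - 2i; gives c=-1, d=1
  – Affine join with w2 = 2: i=3, a=-4, b=7, c=23, d=-23
• assert(c+d = 0) evaluates to True (23 + (-23) = 0).
• assert(c = a+i) evaluates to False (23 ≠ -4 + 3 = -1), correctly flagging that it can fail.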
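The values after each join follow from the affine join definition, with the weight applied to the state of the branch listed first; the arithmetic for i = 3, w1 = 5, w2 = 2 is:

```latex
\begin{aligned}
\text{Join 1 } (w_1 = 5):\quad a &= 5\cdot 0 + (1-5)\cdot 1 = -4, & b &= 5\cdot 3 + (1-5)\cdot 2 = 7,\\
\text{Join 2 } (w_2 = 2):\quad c &= 2\cdot 11 + (1-2)\cdot(-1) = 23, & d &= 2\cdot(-11) + (1-2)\cdot 1 = -23,\\
\text{Checks:}\quad c + d &= 23 - 23 = 0, & a + i &= -4 + 3 = -1 \neq 23 = c.
\end{aligned}
```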
Outline
• Framework for intraprocedural random interpretation
  – Affine join function
  – Eval function
  – Example
• A generic algorithm for interprocedural analysis
  – Random summary (Idea #1)
  – Issue of freshness (Idea #2)
  – Error probability and complexity
  – Experiments
Example
• The second assertion, assert(c = a+i), is true in the calling context i=2.
• We need two new ideas to make the analysis interprocedural.
Idea #1: Keep input variables symbolic
• Do not choose random values for input variables, so that the result can later be instantiated in any calling context.
• The resulting program state at the end is a random summary.
Symbolic run of the example (join weights w1 = 5, w2 = 2):
• First conditional:
  – a := 0; b := i; gives a=0, b=i
  – a := i-2; b := 2; gives a=i-2, b=2
  – Affine join with w1 = 5: a=8-4i, b=5i-8
• Second conditional:
  – c := b - a; d := i - 2b; gives c=9i-16, d=16-9i
  – c := 2a + b; d := b - 2i; gives c=8-3i, d=3i-8
  – Affine join with w2 = 2: a=8-4i, b=5i-8, c=21i-40, d=40-21i
• Instantiating the summary in the context i=2 gives a=0, b=2, c=2, d=-2, so both assertions hold: c+d = 0 and c = a+i = 2.
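A minimal Python sketch of Idea #1, assuming values are affine expressions in the single input i (the class Lin and the functions join and random_summary are hypothetical names) and reusing the slide's weights w1 = 5 and w2 = 2 instead of fresh random ones:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lin:
    """Affine expression coef*i + const over the symbolic input i."""
    coef: int
    const: int

    def __add__(self, other: "Lin") -> "Lin":
        return Lin(self.coef + other.coef, self.const + other.const)

    def __sub__(self, other: "Lin") -> "Lin":
        return Lin(self.coef - other.coef, self.const - other.const)

    def scale(self, k: int) -> "Lin":
        return Lin(k * self.coef, k * self.const)

    def at(self, i: int) -> int:
        """Instantiate the expression in a calling context."""
        return self.coef * i + self.const


I = Lin(1, 0)  # the input variable, kept symbolic


def num(k: int) -> Lin:
    return Lin(0, k)


def join(w: int, s1: dict[str, Lin], s2: dict[str, Lin]) -> dict[str, Lin]:
    """Affine join applied to symbolic states."""
    return {y: s1[y].scale(w) + s2[y].scale(1 - w) for y in s1}


def random_summary(w1: int = 5, w2: int = 2) -> dict[str, Lin]:
    s = join(w1, {"a": num(0), "b": I}, {"a": I - num(2), "b": num(2)})
    a, b = s["a"], s["b"]
    return join(w2,
                {"a": a, "b": b, "c": b - a, "d": I - b.scale(2)},
                {"a": a, "b": b, "c": a.scale(2) + b, "d": b - I.scale(2)})


summary = random_summary()
print(summary["c"], summary["d"])                # Lin(coef=21, const=-40) Lin(coef=-21, const=40)
print({y: e.at(2) for y, e in summary.items()})  # {'a': 0, 'b': 2, 'c': 2, 'd': -2}
```

The summary is computed once; instantiating it at i = 2 reproduces a=0, b=2, c=2, d=-2, and any other calling context costs only an evaluation.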
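Idea #2: Generate fresh summaries
Procedure P (input i):
    if (*) then x := i+1; else x := 3;
    return x;
Random summary of P with join weight 5: $x = 5(i+1) + (1-5)\cdot 3 = 5i-7$
Procedure Q:
    u := P(2); v := P(1); w := P(1);
    assert(u = 3); assert(v = w);
Plugging this summary in at every call: $u = 5\cdot 2-7 = 3$, $v = 5\cdot 1-7 = -2$, $w = 5\cdot 1-7 = -2$
• Plugging the same summary twice is unsound: assert(u = 3) is true, but assert(v = w) is false (P(1) may return 2 or 3), and yet both checks pass.
• Fresh summaries can be generated by random affine combination of a few independent summaries!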
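The unsoundness can be reproduced in a few lines of Python; summary_P and possible_results_P are hypothetical names, and summary_P is the single random summary 5i - 7 of P reused at every call site:

```python
def summary_P(i: int) -> int:
    """The single random summary of P (join weight 5): 5*(i+1) + (1-5)*3 = 5i - 7."""
    return 5 * (i + 1) + (1 - 5) * 3


def possible_results_P(i: int) -> set[int]:
    """What P can actually return: i+1 on one branch, 3 on the other."""
    return {i + 1, 3}


# Reusing the same summary at all three call sites in Q:
u, v, w = summary_P(2), summary_P(1), summary_P(1)
print(u, v, w)                # 3 -2 -2
print(u == 3, v == w)         # True True: assert(v = w) is wrongly "verified"
print(possible_results_P(1))  # {2, 3}: v and w can really differ
```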
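Generating 2 random summaries for P
Procedure P (input i):
    if (*) then x := i+1; else x := 3;
    return x;
Both runs see x=[i+1, i+1] and x=[3, 3] on the two branches; joining with weights w=[5, -2] gives x=[5i-7, 7-2i].
Fresh summaries are random affine combinations of these two, $\phi_w(5i-7,\ 7-2i) = w(5i-7) + (1-w)(7-2i)$:
$x = \phi_7(5i-7,\ 7-2i) = 47i-91$
$x = \phi_6(5i-7,\ 7-2i) = 40i-77$
$x = \phi_3(5i-7,\ 7-2i) = 19i-35$
$x = \phi_0(5i-7,\ 7-2i) = 7-2i$
$x = \phi_5(5i-7,\ 7-2i) = 33i-63$
$x = \phi_1(5i-7,\ 7-2i) = 5i-7$
Procedure Q calls P 3 times. Hence, generating 2 random summaries for Q requires 2×3 = 6 fresh summaries of P.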
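A Python sketch of the fresh-summary construction, assuming a summary is the affine expression coef·i + const (the class Lin and the function combine are hypothetical names). The weights 7, 6, 3, 0, 5, 1 are the ones that reproduce the six fresh summaries 47i-91, 40i-77, 19i-35, 7-2i, 33i-63 and 5i-7; in the analysis they would be chosen at random.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lin:
    """Summary value coef*i + const in the symbolic input i."""
    coef: int
    const: int


def combine(w: int, s1: Lin, s2: Lin) -> Lin:
    """Affine combination w*s1 + (1-w)*s2, applied coefficient-wise."""
    return Lin(w * s1.coef + (1 - w) * s2.coef, w * s1.const + (1 - w) * s2.const)


# Two independent random summaries of P, from joining x = i+1 and x = 3
# with weights 5 and -2: 5i-7 and 7-2i.
s1 = combine(5, Lin(1, 1), Lin(0, 3))
s2 = combine(-2, Lin(1, 1), Lin(0, 3))

# Six fresh summaries (2 summaries of Q x 3 calls to P), one per weight.
fresh = [combine(w, s1, s2) for w in (7, 6, 3, 0, 5, 1)]
for f in fresh:
    print(f"{f.coef}i{f.const:+d}")
```

Only two runs over P's body are needed; every further fresh summary costs one affine combination of the existing ones.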
Generating 2 random summaries for Q
Procedure Q:
    u := P(2); v := P(1); w := P(1);
    assert(u = 3); assert(v = w);
Each call gets its own pair of fresh summaries of P (47i-91 and 40i-77 for u; 19i-35 and 7-2i for v; 33i-63 and 5i-7 for w):
$u = [47\cdot 2-91,\ 40\cdot 2-77] = [3, 3]$
$v = [19\cdot 1-35,\ 7-2\cdot 1] = [-16, 5]$
$w = [33\cdot 1-63,\ 5\cdot 1-7] = [-30, -2]$
assert(u = 3) holds in both summaries, while assert(v = w) is correctly refuted.
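Evaluating Q with these fresh summaries is plain arithmetic; this Python sketch (the dictionary layout is an illustrative choice) reproduces the vectors for u, v and w and the resulting verdicts:

```python
# Fresh summaries of P as (coef, const) pairs for coef*i + const, one pair per call site;
# position k of each pair belongs to the k-th random summary of Q.
fresh = {
    "u": ((47, -91), (40, -77)),  # u := P(2)
    "v": ((19, -35), (-2, 7)),    # v := P(1)
    "w": ((33, -63), (5, -7)),    # w := P(1)
}
args = {"u": 2, "v": 1, "w": 1}

values = {x: [coef * args[x] + const for coef, const in fresh[x]] for x in fresh}
print(values)  # {'u': [3, 3], 'v': [-16, 5], 'w': [-30, -2]}

# An assertion is reported true only if it holds in every random summary of Q.
assert_u = all(val == 3 for val in values["u"])
assert_vw = all(v == w for v, w in zip(values["v"], values["w"]))
print(assert_u, assert_vw)  # True False
```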
Loops and Fixed point computation
• In the presence of loops (in procedures and in call graphs), a fixed point computation is required.
• The number of iterations required to reach the fixed point is $k_V(2k_I+1) + 1$, where
  – $k_V$: # of visible variables
  – $k_I$: # of input variables
Error Probability and Complexity
Time complexity = $n\,k_V\,k_I^2\,t$
Error probability = $1/q^{t-m}$
• n: size of program
• $k_V$, $k_I$: # of visible and input variables
• t: # of random summaries
• q: size of set from which random values are chosen
• m: $k_I k_V$ (generic bound); $k_I + k_V$ (for linear arithmetic); 4 (for unary uninterpreted functions)
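As a purely hypothetical illustration of the error formula (these numbers are not from the slides), take linear arithmetic with $k_I = 2$ input variables and $k_V = 4$ visible variables, $t = 10$ random summaries, and random values drawn from a set of size $q = 2^{32}$:

```latex
m = k_I + k_V = 2 + 4 = 6, \qquad
\frac{1}{q^{\,t-m}} = \frac{1}{(2^{32})^{10-6}} = \frac{1}{2^{128}}
```

The formula is only informative when t exceeds m.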
Related Work
• Intraprocedural random interpretation
  – Linear arithmetic (POPL 03)
  – Uninterpreted functions (POPL 04)
• Interprocedural dataflow analysis (POPL 95, TCS 96)
  – Sagiv, Reps, Horwitz
  – Cons: simpler properties, e.g. liveness, linear constants
  – Pro: better computational complexity
• Interprocedural linear arithmetic (POPL 04)
  – Müller-Olm, Seidl
  – Cons: $O(k^2)$ times slower
  – Pro: works for non-linear relationships too
Experiments
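Compared analyses: Random Inter (this paper), Random Intra (POPL 2003), Det Inter (TCS 96).

| Prog  | Lines | Inp (Random Inter) | Var (Random Inter) | Time (Random Inter) | Δ(Var) (Random Intra) | Speedup (Random Intra) | Δ(Inp) (Det Inter) | Speedup (Det Inter) |
|-------|-------|-----|------|----|-----|-----|----|-----|
| go    | 29K   | 63  | 1700 | 47 | 170 | 107 | 17 | 1.9 |
| ijpeg | 28K   | 31  | 825  | 4  | 34  | 24  | 3  | 2.3 |
| li    | 23K   | 53  | 392  | 34 | 160 | 756 | 20 | 1.3 |
| gzip  | 8K    | 49  | 525  | 2  | 200 | 39  | 6  | 2.0 |

• Inp: # of input variables that were constants
• Var: # of local variables that were constants
• Δ(Var): # of fewer local variable constants discovered
• Δ(Inp): # of fewer input variable constants discovered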
Random Inter discovers 10-70% more facts; Random Intra is faster by 10-500 times; Det Inter is faster by 2 times.
Conclusion
• Randomization buys efficiency and simplicity at the cost of probabilistic soundness.
• Combining randomized techniques with symbolic techniques is powerful.