Post on 03-Jan-2016
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries
Atanas (Nasko) Rountev, Mariana Sharp, Guoqing (Harry) Xu
Ohio State University
Supported by NSF Career grant CCF-0546040 and IBM Eclipse Innovation grant

Interprocedural Analysis with Large Libraries
All programs are built with reusable components
- Standard libraries in C++, Java, C#
- Domain-specific libraries
Whole-program analysis: complete client program C, together with all libraries it uses
- Solutions for all program points in C and in the libraries
Summary-based analysis: pre-analyze the library and record reusable library summary information
- Solutions for all program points in C
Goal: reduce the cost without losing any precision
- e.g., the solutions inside C should be the same
This may be low-hanging fruit

Talk Outline
Interprocedural distributive environment (IDE) dataflow analysis problems
- Definition; precise whole-program analysis
- Examples: dependence analysis and type analysis
Generation of library summaries for IDE problems
- Intra/interprocedural analysis in the library
- Handling the possible effects of unknown clients
- Filtering away details that are irrelevant for clients
Experimental evaluation
- Entire Java 1.4.2 libraries; 20 client programs

Interproc. Distributive Environment Problems
Defined by Sagiv, Reps, and Horwitz [TheorCompSci96]
- Subsumes the interprocedural finite distributive subset (IFDS) problems from their [POPL95] work
- Versions of constant propagation, slicing, alias analysis, side-effect analysis, reaching definitions, liveness, etc.
An environment is a map $e : D \to L$; $e \in \mathit{Env}(D, L)$
- $D$ is a set of symbols, $L$ is a meet semi-lattice
- Environment meet: $(e_1 \sqcap e_2)(d) = e_1(d) \sqcap e_2(d)$
Environment transformer $t : \mathit{Env}(D, L) \to \mathit{Env}(D, L)$
- Distributive: e.g. $t(e_1 \sqcap e_2) = t(e_1) \sqcap t(e_2)$
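The pointwise environment meet and the distributivity condition can be sketched in Java. This is only an illustration under assumptions the slide does not make: the lattice L is taken to be sets of strings with set union as the meet, and the class `EnvMeetSketch`, its helpers, and the sample transformer `t` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only: all names here are hypothetical, not from the talk.
class EnvMeetSketch {
    // Assumed lattice L: sets of strings, with set union as the meet.
    static Set<String> meet(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<String>(a);
        r.addAll(b);
        return r;
    }

    // Environment meet, pointwise: (e1 meet e2)(d) = e1(d) meet e2(d).
    // Both environments are assumed to be defined on the same symbol set D.
    static Map<String, Set<String>> meetEnv(Map<String, Set<String>> e1,
                                            Map<String, Set<String>> e2) {
        Map<String, Set<String>> r = new TreeMap<String, Set<String>>();
        for (String d : e1.keySet()) {
            r.put(d, meet(e1.get(d), e2.get(d)));
        }
        return r;
    }

    static Set<String> set(String... xs) {
        return new TreeSet<String>(Arrays.asList(xs));
    }

    // Builds an environment over the assumed symbol set D = {x, y, z}.
    static Map<String, Set<String>> env(Set<String> x, Set<String> y, Set<String> z) {
        Map<String, Set<String>> e = new TreeMap<String, Set<String>>();
        e.put("x", x);
        e.put("y", y);
        e.put("z", z);
        return e;
    }

    // Sample transformer: t(e) = e[x -> e(y) meet e(z)].
    static Map<String, Set<String>> t(Map<String, Set<String>> e) {
        Map<String, Set<String>> r = new TreeMap<String, Set<String>>(e);
        r.put("x", meet(e.get("y"), e.get("z")));
        return r;
    }

    // Spot check of t(e1 meet e2) = t(e1) meet t(e2) on one pair of environments.
    static boolean distributesOnSample() {
        Map<String, Set<String>> e1 = env(set("a"), set("b"), set());
        Map<String, Set<String>> e2 = env(set(), set("c"), set("d"));
        return t(meetEnv(e1, e2)).equals(meetEnv(t(e1), t(e2)));
    }
}
```

`EnvMeetSketch.distributesOnSample()` compares both sides of the distributivity equation for a single pair of environments, so it is a spot check rather than a proof.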

Dependence Analysis and Type Analysis for Java
Dependencies: for a local variable v at CFG node n, which formal parameters of n’s method influence v?
- Restricted form of dep. analysis; useful for SDG building
$D = \{ v_1, \ldots, v_k \}$: locals $v_i$
$L$ = powerset of $\{ f_1, \ldots, f_m \}$: formals $f_j$; meet is $\cup$
Transformer for v1:=f2: $t(e) = e[v_1 \mapsto \{f_2\}]$
Transformer for v1:=v2+v3: $t(e) = e[v_1 \mapsto e(v_2) \cup e(v_3)]$
Call v1:=meth(v2): composition of v2-to-formal, valid same-level paths in meth, return-to-v1
0-CFA type analysis: $D = \{ v_1, \ldots, v_k, fld_1, \ldots, fld_m \}$: locals and fields; $L$ = powerset of the set of types
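The two assignment transformers of the dependence analysis can be written directly as functions on environments. The following Java sketch assumes an environment is a map from local names to sets of formal names; `DepTransformerSketch`, `assignFormal`, and `assignBinary` are hypothetical names chosen for illustration, and the empty set is assumed as the initial value of every local.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

// Illustrative sketch only: names are hypothetical, not from the talk.
class DepTransformerSketch {
    // Transformer for v := f, where f is a formal: t(e) = e[v -> {f}].
    static UnaryOperator<Map<String, Set<String>>> assignFormal(final String v, final String f) {
        return e -> {
            Map<String, Set<String>> r = new TreeMap<String, Set<String>>(e);
            r.put(v, new TreeSet<String>(Collections.singleton(f)));
            return r;
        };
    }

    // Transformer for v := a + b: t(e) = e[v -> e(a) union e(b)].
    static UnaryOperator<Map<String, Set<String>>> assignBinary(final String v,
                                                                final String a,
                                                                final String b) {
        return e -> {
            Map<String, Set<String>> r = new TreeMap<String, Set<String>>(e);
            Set<String> deps = new TreeSet<String>(e.get(a));
            deps.addAll(e.get(b));
            r.put(v, deps);
            return r;
        };
    }

    // Applies v2 := f1; v3 := f2; v1 := v2 + v3 to an environment in which
    // no local depends on any formal yet (an assumption of this sketch).
    static String demo() {
        Map<String, Set<String>> e = new TreeMap<String, Set<String>>();
        for (String v : new String[] {"v1", "v2", "v3"}) {
            e.put(v, new TreeSet<String>());
        }
        e = assignFormal("v2", "f1").apply(e);
        e = assignFormal("v3", "f2").apply(e);
        e = assignBinary("v1", "v2", "v3").apply(e);
        return e.toString();
    }
}
```

After the three statements, `demo()` reports that v1 depends on both f1 and f2, while v2 and v3 each depend on one formal.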

Representation of Environment Transformers
Key issue for any summary-based analysis: how do we represent and manipulate dataflow functions?
- For IDE: composition/meet of environment transformers
Sagiv et al.: a transformer can be represented by a bipartite directed graph with 2(|D|+1) nodes
- Edges labeled with functions $L \to L$
[Figure: graph representations of three transformers over $d_1, \ldots, d_n$, with edges labeled $\lambda l.l$ or $\lambda l.\{f\}$: $t(env) = env$; $t(env) = env[d_1 \mapsto \{f\}]$; $t(env) = env[d_2 \mapsto env(d_1) \cup env(d_3)]$]

Composition of Transformers
Graph reachability + composition of edge labels
[Figure: the graphs for $t(env) = env[d_1 \mapsto \{f\}]$ and $t(env) = env[d_2 \mapsto env(d_1) \cup env(d_3)]$ over $d_1, d_2, d_3$, and the graph of their composition]
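Composition by graph reachability can be sketched in Java for a restricted case. The sketch assumes the powerset lattice with union as meet and only two kinds of edge labels, the identity function and constant functions; each target symbol then has a set of identity-edge sources plus one constant set. `TransformerGraphSketch` and its methods are hypothetical names, and this is not the general representation of Sagiv et al.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only: a restricted transformer graph with identity
// and constant edge labels. Names are hypothetical, not from the talk.
class TransformerGraphSketch {
    // identityFrom.get(d): sources s with an edge s -> d labeled "lambda l. l".
    final Map<String, Set<String>> identityFrom = new TreeMap<String, Set<String>>();
    // constant.get(d): the set C of the constant edge into d, "lambda l. C".
    final Map<String, Set<String>> constant = new TreeMap<String, Set<String>>();

    // Graph of t(env) = env over the given symbols.
    static TransformerGraphSketch identity(String... symbols) {
        TransformerGraphSketch t = new TransformerGraphSketch();
        for (String d : symbols) {
            t.identityFrom.put(d, new TreeSet<String>(Collections.singleton(d)));
            t.constant.put(d, new TreeSet<String>());
        }
        return t;
    }

    // Replaces the edges into d: t(env) = env[d -> c union env(s1) union ...].
    TransformerGraphSketch assign(String d, Set<String> c, String... sources) {
        identityFrom.put(d, new TreeSet<String>(Arrays.asList(sources)));
        constant.put(d, new TreeSet<String>(c));
        return this;
    }

    // Graph of "apply this, then next": follow each edge of next back
    // through the edges of this, composing the labels along the way.
    TransformerGraphSketch then(TransformerGraphSketch next) {
        TransformerGraphSketch r = new TransformerGraphSketch();
        for (String d : next.identityFrom.keySet()) {
            Set<String> sources = new TreeSet<String>();
            Set<String> c = new TreeSet<String>(next.constant.get(d));
            for (String mid : next.identityFrom.get(d)) {
                sources.addAll(identityFrom.get(mid)); // identity after identity
                c.addAll(constant.get(mid));           // identity after constant
            }
            r.identityFrom.put(d, sources);
            r.constant.put(d, c);
        }
        return r;
    }

    String describe(String d) {
        return "const=" + constant.get(d) + " from=" + identityFrom.get(d);
    }

    // Composes env[d1 -> {f}] followed by env[d2 -> env(d1) union env(d3)].
    static String demo() {
        TransformerGraphSketch t1 = identity("d1", "d2", "d3")
                .assign("d1", new TreeSet<String>(Arrays.asList("f")));
        TransformerGraphSketch t2 = identity("d1", "d2", "d3")
                .assign("d2", Collections.<String>emptySet(), "d1", "d3");
        TransformerGraphSketch r = t1.then(t2);
        return r.describe("d1") + "; " + r.describe("d2") + "; " + r.describe("d3");
    }
}
```

In the composed graph, d1 and d2 both receive the constant {f}, d2 additionally keeps an identity edge from d3, and d3 is unchanged.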

Precise Whole-Program Analysis
Graph reachability along valid interprocedural paths
Phase 1: a summary function for each CFG node n
- Represents the solution at n as a function of the solution at the entry of the procedure containing n
- Computed through composition and meet of transformers
- Summary function at proc exit used at call sites to proc
- Partial functions: defined only for the subset of the domain that is relevant to callers of n’s procedure
Phase 2: Top-down propagation of actual environments (e.g., dependence sets, type sets)
Adapt to library summary generation?

Phase 1: Intraprocedural Summary Generation
Produce a set of summary functions, one for each pair of nodes (n, m) such that
- n is the entry or a call site
- m is the exit or a call site
- there exists a call-free path from n to m
Similar to the per-node summary functions from the whole-program analysis, but
- complete functions instead of partial functions
- all possible compositions and meets of transformers (as graph operations), until a fixed point is reached
After this, some elements of D are filtered away
- e.g., for dependence analysis: locals that are neither actuals of calls nor assigned the return values of calls

Example
class DateFormat
String format(Date f1) {
    DateFormat r0; Date r1; StringBuffer r2, r3;
    r0 = this; r1 = f1; r2 = new StringBuffer();
    cs1: r3 = r0.format(r1,r2);
    cs2: String r4 = r3.toString();
    return r4;
}
[Figure: graph representations of the summary functions for format, one from entry to cs1 and one from rs2 to exit, over this, f1, r0, r1, r2, r3, r4, ret; after filtering, the first keeps only this, f1, r0, r1, r2 and the second keeps only r4, ret]

Phase 2: Interprocedural Summary Generation
[Figure: for the format method of class DateFormat from the example, the summary for toString at cs2 (over r3 and r4) is combined with the rs2-to-exit function (over r4 and ret), giving a function from rs1 to exit over r3 and ret]

Phase 2: Interprocedural Summary Generation
Fixed call site: has exactly one possible target
- Cannot be a site that calls back client methods
  - Check type hierarchy for possible overriding in clients
- Cannot have multiple target methods
  - Static calls; constructor calls; final classes/methods
  - Intraprocedural 0-CFA type analysis: in the summary function, the only edge reaching x should be $\Lambda \to x$
Fixed method: has only fixed calls (or no calls), and this also holds for all methods reachable from it
Bottom-up traversal of the SCC-DAG of fixed methods; composition and filtering
In non-fixed methods: instantiate fixed calls to fixed methods; composition and filtering
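The definition of fixed methods can be sketched as a fixed-point computation over a call graph. This Java sketch makes several assumptions that are not in the talk: each call site is given as a list of possible target names, a site that may call back into client code lists the hypothetical pseudo-target `<client>`, and the method names in `demo()` are invented. It covers only the classification, not the SCC-DAG traversal or the summary composition.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only: names and the call-graph encoding are hypothetical.
class FixedMethodSketch {
    // Pseudo-target marking a site that may call back into client code.
    static final String CLIENT = "<client>";

    // A call site is fixed if it has exactly one target and cannot reach clients.
    static boolean isFixedSite(List<String> targets) {
        return targets.size() == 1 && !targets.contains(CLIENT);
    }

    // callSites.get(m): one list of possible targets per call site in m.
    static Set<String> fixedMethods(Map<String, List<List<String>>> callSites) {
        // Start with every method whose own call sites are all fixed.
        Set<String> fixed = new TreeSet<String>();
        for (Map.Entry<String, List<List<String>>> m : callSites.entrySet()) {
            boolean allFixed = true;
            for (List<String> site : m.getValue()) {
                allFixed &= isFixedSite(site);
            }
            if (allFixed) {
                fixed.add(m.getKey());
            }
        }
        // Remove methods that call a non-fixed method, until nothing changes.
        // Mutually recursive methods survive as long as the whole cycle is fixed.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Iterator<String> it = fixed.iterator(); it.hasNext();) {
                String m = it.next();
                for (List<String> site : callSites.get(m)) {
                    if (!fixed.contains(site.get(0))) {
                        it.remove();
                        changed = true;
                        break;
                    }
                }
            }
        }
        return fixed;
    }

    static List<String> site(String... targets) {
        return Arrays.asList(targets);
    }

    static String demo() {
        Map<String, List<List<String>>> g = new TreeMap<String, List<List<String>>>();
        g.put("leaf", Collections.<List<String>>emptyList());
        g.put("wrapper", Arrays.asList(site("leaf")));
        g.put("rec1", Arrays.asList(site("rec2")));
        g.put("rec2", Arrays.asList(site("rec1"), site("leaf")));
        g.put("poly", Arrays.asList(site("leaf", "wrapper")));
        g.put("usesPoly", Arrays.asList(site("poly")));
        g.put("callback", Arrays.asList(site(CLIENT)));
        return fixedMethods(g).toString();
    }
}
```

In `demo()`, `usesPoly` has only a fixed call site but is still excluded, because the method it reaches contains a call with two possible targets.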

Example: Final Summary for format
[Figure: for the format method of class DateFormat from the example, the final summary consists of a function from entry to cs1 over this, f1, r0, r1, r2 and a function from rs1 to exit over r3 and ret]

Summary Generation
Libraries: 10238 classes, 77190 methods
0-CFA type analysis + dependence analysis [w/ Soot]
- Both data and control dependencies
- Simple optimizations: def-use chains, sparse graphs
Cost: 90 minutes time, 1.2GB memory
- Includes all Soot-related costs and all I/O
Final summary on disk: 18MB
Measurements: number of edges in the graph representation of transformers
- [1]: before any composition or meet
- [2]: after intraprocedural composition and meet
- [3]: after [2] and intraprocedural filtering: remove elements that are irrelevant for callers and callees

Intraprocedural Propagation
[Figure: two bar charts of the number of edges at stages [1], [2], and [3]; y-axes from 0 to 3,000,000 and from 0 to 700,000]
Dependence analysis: reduction in # edges from [2] to [3]: 53%
Type analysis: reduction in # edges from [2] to [3]: 55%

Interprocedural Propagation for Dep. Analysis
Fixed methods: 25490 (33% of all library methods); 7195 of these (9% of all library methods) are eliminated because their only callers are in the library
Summary functions for fixed methods
- Instantiate at fixed calls within non-fixed methods: eliminates 21% of all library call sites
- Additional intraprocedural propagation and filtering
[Figure: bar chart of the number of edges at stages [1], [2], [3], and [4]; y-axis from 0 to 3,000,000]
Reduction in # edges from [3] to [4]: 32%

Summary-Based Analysis of Clients
Reduction in start-to-end time: IR building, type analysis + call graph, dependence analysis
[Figure: bar chart of the reduction (y-axis from 0% to 70%) for each client program: compress, db, fractal, jack, javac, javacup-0.10j, jb-6.1, jess, jflex-1.4.1, jlex-1.2.6, jtar-1.21, mindterm-1.1.5, mpegaudio, muffin-0.9.3a, rabbit2, raytrace, sablecc-2.18.2, socksecho, socksproxy, violet]

Only Dependence Analysis
Reduction in analysis time: actual analysis and a hypothetical best case with no library dependencies
[Figure: bar chart of the reduction (y-axis from 0% to 90%) for each of the 20 client programs]

Overview of Results
Start-to-end cost: IR, type analysis, dep. analysis
- Average time reduction 51%
- Average memory reduction 33%
Only dependence analysis
- Average time reduction 69%
- Average memory reduction 90%
- Very close to a conservative upper bound
Conclusions
- Summary generation has reasonable cost
- Summary size is small (# edges and total disk size)
- Significant savings for analysis running time and memory usage, compared to whole-program analysis

Future Work
This is a very preliminary study
- Promising initial results, but just the tip of the iceberg
More IDE analyses, with different characteristics
- e.g. points-to analysis, side-effect analysis, constant propagation, typestate properties, etc.
Beyond IDE analyses
- e.g. recent [POPL08] paper by Yorsh et al.
Better handling of callbacks and polymorphic calls
- e.g. take advantage of behavioral subtyping
Reusable API for storing and retrieving summary information; generality for many different analyses
- Open-source API implementation based on Soot
Questions?