an efficient inclusion-based points-to analysis for strictly-typed languages
Post on 05-Feb-2016
An Efficient Inclusion-Based Points-To Analysis for Strictly-Typed Languages
John Whaley, Monica S. Lam
Computer Systems Laboratory, Stanford University
SAS 2002, September 18, 2002

Background
• Andersen’s points-to analysis for C (1994)
  • Flow-insensitive, context-insensitive
  • Inclusion-based, more accurate than unification-based Steensgaard
  • O(n³), considered too slow to be practical
• CLA optimization to Andersen’s analysis (Heintze & Tardieu, PLDI’01)
  • Online caching/cycle elimination
  • Field-independent: 1.3M lines of code in 137s

Doing it for Java
• We want Andersen-level pointers for Java
• Naïve port of CLA algorithm:
  • Spec “compress” benchmark: 2+ hours!
  • Call graph accuracy: same as RTA (terrible)
• Our paper: how to do CLA for Java
  • Spec “compress” benchmark: 5 seconds!
  • JEdit (1371 classes): ~10 minutes!
  • Call graph accuracy: very good

Java vs. C: Virtual calls
• Java has many virtual calls
• Accuracy of analysis strongly affects number of call targets
  • More call targets lead to more code being analyzed and longer analysis times

Java vs. C: Treatment of Fields
• Field-independent: in o.f, use only o
  • Most C pointer analyses
  • Sound even for non-type-safe languages
• Field-based: in o.f, use only f
  • Very inaccurate, requires type safety
• Field-sensitive: in o.f, use both o, f
  • Strictly more accurate than field-independent or field-based
  • Essential for Java
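
The three field treatments differ only in which part of an access o.f names the abstract location. The following is a minimal sketch of that idea, not code from the paper: the class `FieldTreatment`, the `Mode` enum, and the string encoding of locations are all hypothetical names chosen for illustration.

```java
/** Hypothetical illustration: how each field treatment names the location o.f. */
final class FieldTreatment {
    enum Mode { FIELD_INDEPENDENT, FIELD_BASED, FIELD_SENSITIVE }

    /** Abstract location touched by an access obj.field under the given mode. */
    static String location(Mode mode, String obj, String field) {
        switch (mode) {
            case FIELD_INDEPENDENT: return obj;               // use only o
            case FIELD_BASED:       return "*." + field;      // use only f
            default:                return obj + "." + field; // use both o and f
        }
    }

    /** Two accesses are merged by the analysis iff they share a location. */
    static boolean merged(Mode mode, String o1, String f1, String o2, String f2) {
        return location(mode, o1, f1).equals(location(mode, o2, f2));
    }

    public static void main(String[] args) {
        for (Mode m : Mode.values()) {
            System.out.println(m
                    + ": a.f/a.g merged=" + merged(m, "a", "f", "a", "g")
                    + ", a.f/b.f merged=" + merged(m, "a", "f", "b", "f"));
        }
    }
}
```

Under this encoding, field-independent merges all fields of one object, field-based merges one field across all objects, and field-sensitive keeps both apart, which is why it is at least as precise as either alternative.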

Java vs. C: Local variables
• Local variables/stack locations are reused
• Flow insensitivity causes many false aliases
• Local flow sensitivity is necessary

Our Contribution
• Andersen-style inclusion-based points-to analysis for Java, based on ideas from CLA
• Field sensitivity
  • Tracks separate fields of separate objects
• Uses “method summary graphs”
  • Sparse representation, uses local flow sensitivity
• Optimizations
  • Caching across iterations, reducing redundant ops
• Supports all features of Java

Algorithm Overview
• Intraprocedural: Generate a sparse, flow-insensitive summary graph for each method
  • Based on access paths, uses local flow sensitivity
• Interprocedural: Using summary graphs, build inclusion graph to obtain whole-program result

Method Summaries
• Sparse, flow-insensitive summary of the semantics of each method
  • Stores (writes) in method
  • Calls made by method and their parameters
  • Return values, thrown and caught exceptions
• Use a flow-sensitive technique to generate method summaries
  • Precisely model updates to stack and locals

Method Summary: Example
Code for method foo:
    static void foo(C x, C y) {
        C t = x.f;
        t.g = y;
        x.g = x;
        t.bar(y);
    }
Summary for method foo:
[Figure: summary graph with nodes x, y, and x.f, edges labeled f and g, and the call bar(t, y). Legend: read edge, write edge, parameter map edge.]

Node types
• A node represents an object at run time.
• Concrete type nodes
  • Objects that have a known concrete type
  • new statements and constant objects
• Abstract nodes
  • Parameters, return values, dereferences
  • Interprocedural phase maps an abstract node to the set of concrete nodes it can represent

Edge types
• Read edge:
  • Created by load statements
  • Represent dereferences (access paths) of known locations
• Write edge:
  • Created by store statements
  • Represent references created by the method
[Figure: a read edge and a write edge, each labeled with field f.]

Outgoing parameter map
• Records which nodes are passed as which parameters
• This is used in the interprocedural phase to match call sites to call targets
[Figure: summary graph for foo (nodes x, y, x.f; edges labeled f and g) with the call t.bar(y).]

Generating method summary
• Worklist data flow solver (flow-sensitive)
• Strong updates on locals, weak on others
• Detect and close cycles in access paths
• More detail in the paper
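
The difference between strong and weak updates can be sketched as follows. This is an illustrative toy state, not the paper's data flow solver: the class `UpdateKinds`, its two maps, and the node names `X`, `A`, `B` are assumptions made for the example. A local is overwritten on assignment (strong update), so reusing a local does not create a false alias, while a field store only ever adds targets (weak update).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical abstract state: locals get strong updates, heap fields weak ones. */
final class UpdateKinds {
    final Map<String, Set<String>> locals = new HashMap<>();
    final Map<String, Set<String>> heap = new HashMap<>();

    /** Strong update: the local now refers only to the newly assigned nodes. */
    void assignLocal(String var, String... nodes) {
        locals.put(var, new HashSet<>(Arrays.asList(nodes)));
    }

    /** Weak update for base.field = value: targets are added, never removed. */
    void storeField(String base, String field, String value) {
        Set<String> values = locals.getOrDefault(value, Collections.<String>emptySet());
        for (String b : locals.getOrDefault(base, Collections.<String>emptySet())) {
            heap.computeIfAbsent(b + "." + field, k -> new HashSet<>()).addAll(values);
        }
    }

    public static void main(String[] args) {
        UpdateKinds s = new UpdateKinds();
        s.assignLocal("x", "X");
        s.assignLocal("t", "A");
        s.storeField("x", "f", "t");   // X.f may point to A
        s.assignLocal("t", "B");       // t is reused: A is dropped from t
        s.storeField("x", "f", "t");   // X.f may point to A or B
        System.out.println("t -> " + s.locals.get("t"));
        System.out.println("X.f -> " + s.heap.get("X.f"));
    }
}
```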

Review: Andersen’s Points-to
• Points-to is encoded as inclusion relations
• x = y implies x ⊇ y
• x ⊇ y is also written as: x → y

Review: Andersen’s Points-to
Inference rules (x → y denotes x ⊇ y):

Rule name:            If code contains:    Apply rule:
Store                 x.f = e;             x → new_y  ⟹  new_y.f → e
Load                  e = x.f;             x → new_y  ⟹  e → new_y.f
Copy                  e1 = e2;             e1 → e2
Transitive closure                         e1 → e2, e2 → e3  ⟹  e1 → e3
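
The four rules can be exercised with a naive fixpoint that materializes full points-to sets. This is a sketch for illustration only: it is not the CLA algorithm or the authors' implementation, and the class `AndersenSketch`, the statement encoding, and the allocation-site names `C`, `D`, `E` are hypothetical. Field locations are keyed as `site.field`, so the sketch is field-sensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Naive fixpoint over the four rules; names and encoding are illustrative only. */
final class AndersenSketch {
    enum Kind { NEW, COPY, LOAD, STORE }

    static final class Stmt {
        final Kind kind;
        final String lhs, rhs, field;
        Stmt(Kind kind, String lhs, String rhs, String field) {
            this.kind = kind; this.lhs = lhs; this.rhs = rhs; this.field = field;
        }
    }

    static Stmt alloc(String x, String site)        { return new Stmt(Kind.NEW, x, site, null); }
    static Stmt copy(String e1, String e2)          { return new Stmt(Kind.COPY, e1, e2, null); }
    static Stmt load(String e, String x, String f)  { return new Stmt(Kind.LOAD, e, x, f); }  // e = x.f
    static Stmt store(String x, String f, String e) { return new Stmt(Kind.STORE, x, e, f); } // x.f = e

    /** Repeats all rules until no points-to set grows. */
    static Map<String, Set<String>> solve(List<Stmt> program) {
        Map<String, Set<String>> pts = new HashMap<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Stmt s : program) {
                switch (s.kind) {
                    case NEW:   // x = new site
                        changed |= get(pts, s.lhs).add(s.rhs);
                        break;
                    case COPY:  // e1 = e2, so e1 includes e2
                        changed |= get(pts, s.lhs).addAll(snapshot(pts, s.rhs));
                        break;
                    case LOAD:  // e = x.f: for each o in pts(x), e includes o.f
                        for (String o : snapshot(pts, s.rhs))
                            changed |= get(pts, s.lhs).addAll(snapshot(pts, o + "." + s.field));
                        break;
                    case STORE: // x.f = e: for each o in pts(x), o.f includes e
                        for (String o : snapshot(pts, s.lhs))
                            changed |= get(pts, o + "." + s.field).addAll(snapshot(pts, s.rhs));
                        break;
                }
            }
        }
        return pts;
    }

    private static Set<String> get(Map<String, Set<String>> pts, String key) {
        return pts.computeIfAbsent(key, k -> new TreeSet<>());
    }

    /** Copy of a points-to set, so it can be read while other sets grow. */
    private static List<String> snapshot(Map<String, Set<String>> pts, String key) {
        return new ArrayList<>(get(pts, key));
    }

    public static void main(String[] args) {
        List<Stmt> foo = Arrays.asList(
                alloc("x", "C"), alloc("d", "D"), alloc("y", "E"),
                store("x", "f", "d"),   // setup: C.f holds D
                load("t", "x", "f"),    // t = x.f
                store("t", "g", "y"),   // t.g = y
                store("x", "g", "x"));  // x.g = x
        solve(foo).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Because every set is propagated eagerly, this version computes the full transitive closure, which is the cost that a pre-transitive graph is meant to avoid.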

Andersen example
    t = x.f;
    t.g = y;
    x.g = x;
[Figure, animated over eight slides: the summary graph (nodes x, y, x.f; edges labeled f and g) is shown together with concrete nodes C, D, and E and a field edge f. The Load rule (e = x.f; x → new_y ⟹ e → new_y.f) is applied first, then the Store rule (x.f = e; x → new_y ⟹ new_y.f → e), which adds g edges to the graph.]

Mapping method calls
    t = x.f;
    t.g = y;
    x.g = x;
    t.bar(y);
[Figure, animated over three slides: the graph from the Andersen example with the call t.bar(y), which is mapped to the nodes Bar:this and Bar:p1 of the target method.]

Overall Picture
[Figure: nodes C, D, E, and F in the “Concrete” world, shown alongside the “Abstract” world.]

Graph-based Andersen
• Computing full transitive closure is prohibitively expensive
• Store the graph in pre-transitive form, and calculate reachable nodes on demand

Algorithm
    foreach write edge e1 → e2 do
        foreach n in getConcreteNodes(e1)
            add write edge n.f → e2
    foreach read edge e1 → e2 do
        foreach n in getConcreteNodes(e1)
            add inclusion edge e2 ⊇ n.f
    foreach method call e1.f()
        foreach n in getConcreteNodes(e1)
            add parameter mappings for target method

Caching reachability queries
• getConcreteNodes(e): transitive closure query on the inclusion graph
• The same queries are repeated many times
  • Store the result in a hash table
• Cached result may be stale due to edges added since the last query
  • Iterate until convergence
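
The write and read loops of the algorithm, together with a cached getConcreteNodes, can be sketched as below. This is a simplified illustration under several assumptions: the class `PreTransitiveGraph`, the `FieldEdge` type, and the string node names are hypothetical, method calls are omitted, and the cache is simply cleared at the start of each iteration rather than reused as the deck describes later. An edge a → b is read as "a includes b".

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of a pre-transitive inclusion graph with memoized reachability. */
final class PreTransitiveGraph {
    /** A read or write edge of a method summary: from --field--> to. */
    static final class FieldEdge {
        final String from, field, to;
        FieldEdge(String from, String field, String to) {
            this.from = from; this.field = field; this.to = to;
        }
    }

    final Map<String, Set<String>> succ = new HashMap<>(); // a -> b means a includes b
    final Set<String> concrete = new HashSet<>();
    private final Map<String, Set<String>> cache = new HashMap<>();

    boolean addEdge(String a, String b) {
        return succ.computeIfAbsent(a, k -> new HashSet<>()).add(b);
    }

    /** Concrete nodes reachable from e. Results are cached and may be stale. */
    Set<String> getConcreteNodes(String e) {
        Set<String> hit = cache.get(e);
        if (hit != null) return hit;
        Set<String> result = new HashSet<>();
        Set<String> seen = new HashSet<>(Collections.singleton(e));
        Deque<String> work = new ArrayDeque<>(seen);
        while (!work.isEmpty()) {
            String n = work.pop();
            if (concrete.contains(n)) result.add(n);
            for (String m : succ.getOrDefault(n, Collections.<String>emptySet()))
                if (seen.add(m)) work.push(m);
        }
        cache.put(e, result);
        return result;
    }

    /** Applies the write and read rules until an iteration adds no edge. */
    void solve(List<FieldEdge> writes, List<FieldEdge> reads) {
        boolean changed = true;
        while (changed) {
            changed = false;
            cache.clear(); // simplest policy: drop possibly stale results each iteration
            for (FieldEdge w : writes)
                for (String n : getConcreteNodes(w.from))
                    changed |= addEdge(n + "." + w.field, w.to);
            for (FieldEdge r : reads)
                for (String n : getConcreteNodes(r.from))
                    changed |= addEdge(r.to, n + "." + r.field);
        }
    }

    public static void main(String[] args) {
        PreTransitiveGraph g = new PreTransitiveGraph();
        g.concrete.addAll(Arrays.asList("C", "D", "E"));
        g.addEdge("x", "C");
        g.addEdge("y", "E");
        g.addEdge("C.f", "D");
        g.solve(
            Arrays.asList(new FieldEdge("x.f", "g", "y"), new FieldEdge("x", "g", "x")),
            Arrays.asList(new FieldEdge("x", "f", "x.f")));
        System.out.println("x.f -> " + g.getConcreteNodes("x.f"));
        System.out.println("D.g -> " + g.getConcreteNodes("D.g"));
    }
}
```

Within one iteration a cached answer can miss edges added later in that same iteration, which is why the outer loop repeats until an iteration adds nothing.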

Online cycle detection
• Inclusion graph includes cycles
• The algorithm collapses cycles as they are traversed
  • During traversal, keeps track of current path
  • If a node on current path is revisited, collapse all nodes in cycle
  • Each node has a “skip” pointer, which is set when collapsed and followed on all accesses
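
One way to realize the path tracking and skip pointers is sketched below. It is an assumption-laden illustration rather than the authors' code: the class `CycleCollapse`, the `Node` type, and the choice to merge every cycle member into the earliest node on the path are all hypothetical. Since nodes on a cycle of inclusion edges have equal points-to sets, merging them does not change the result.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch of collapsing cycles found on the current DFS path via skip pointers. */
final class CycleCollapse {
    static final class Node {
        final String name;
        final Set<Node> succ = new LinkedHashSet<>();
        Node skip; // non-null once this node has been collapsed into another
        Node(String name) { this.name = name; }

        /** Follows skip pointers to the node that now stands for this one. */
        Node rep() {
            Node n = this;
            while (n.skip != null) n = n.skip;
            return n;
        }
    }

    /** Returns the representatives reachable from start, collapsing cycles en route. */
    static Set<Node> reach(Node start) {
        Set<Node> visited = new LinkedHashSet<>();
        visit(start.rep(), new ArrayList<>(), visited);
        Set<Node> reps = new LinkedHashSet<>();
        for (Node n : visited) reps.add(n.rep());
        return reps;
    }

    private static void visit(Node n, List<Node> path, Set<Node> visited) {
        visited.add(n);
        path.add(n);
        for (Node raw : new ArrayList<>(n.succ)) {   // copy: collapse() edits succ sets
            Node m = raw.rep();
            if (m == n.rep()) continue;              // edge inside n's own group
            int i = indexOnPath(path, m);
            if (i >= 0) collapse(path, i);           // revisited the current path: a cycle
            else if (!visited.contains(m)) visit(m, path, visited);
        }
        path.remove(path.size() - 1);
    }

    private static int indexOnPath(List<Node> path, Node rep) {
        for (int i = 0; i < path.size(); i++)
            if (path.get(i).rep() == rep) return i;
        return -1;
    }

    /** Merges every path node after index from into the node at index from. */
    private static void collapse(List<Node> path, int from) {
        Node keep = path.get(from).rep();
        for (int j = from + 1; j < path.size(); j++) {
            Node c = path.get(j).rep();
            if (c == keep) continue;    // already merged
            keep.succ.addAll(c.succ);   // the survivor inherits outgoing edges
            c.skip = keep;              // later accesses follow this pointer
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.succ.add(b); b.succ.add(c); c.succ.add(a); c.succ.add(d);
        for (Node n : reach(a)) System.out.println(n.name);
    }
}
```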

Reusing caches
• Concrete node cache values don’t change much between algorithm iterations
• Reallocation and rebuilding them is expensive
• Reuse caches from old iterations
  • Keep track of an iteration ‘version’ number for each cache entry

Minimizing set union operations
• Many caches don’t change across iterations
• Avoid set union operations for caches that haven’t changed since the last iteration
  • Keep a ‘changed’ flag for each cache entry, records if last computation changed the entry
  • If input set hasn’t changed, set union operation is redundant
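
The bookkeeping behind skipping redundant unions is not spelled out on the slides. The sketch below is one possible reading, and it swaps in a per-entry iteration stamp where the slides describe a version number plus a ‘changed’ flag: each entry records when it last grew and when it last merged its inputs. The class `UnionSkipping`, the `Entry` fields, and the `unions` counter are hypothetical, and the input lists are assumed fixed after setup (a newly added input would need one unconditional merge).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: skip a set union when the input has not grown since it was last merged. */
final class UnionSkipping {
    static final class Entry {
        final Set<String> value = new HashSet<>();
        final List<Entry> inputs = new ArrayList<>(); // assumed fixed after setup
        int lastChanged = 0;  // iteration in which value last grew (0 = initial contents)
        int lastMerged = -1;  // iteration in which the inputs were last merged
    }

    static int unions = 0;    // number of set unions actually performed

    /** Merges into e only those inputs that may have grown since e last merged. */
    static void refresh(Entry e, int iteration) {
        for (Entry in : e.inputs) {
            if (in.lastChanged < e.lastMerged) continue; // input unchanged: union is redundant
            unions++;
            if (e.value.addAll(in.value)) e.lastChanged = iteration;
        }
        e.lastMerged = iteration;
    }

    public static void main(String[] args) {
        Entry a = new Entry(), b = new Entry(), c = new Entry();
        a.value.add("C");
        b.inputs.add(a);
        c.inputs.add(b);
        for (int it = 1; it <= 3; it++) {
            refresh(c, it); // deliberately visited before b, so c lags by one iteration
            refresh(b, it);
        }
        System.out.println("c = " + c.value + ", unions performed = " + unions);
    }
}
```

The comparison is deliberately conservative: when an input changed in the same iteration in which the entry last merged, the union is repeated, because the merge may have happened before the change.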

Experimental Results
• Concrete type inference
  • Static call graph
• Implemented in ~800 lines of Java
  • Freely available at: http://joeq.sourceforge.net

Programs
• SpecJVM
  • Standard benchmark suite
• J2EE: Java 2 Enterprise Edition v1.3
  • Massive (1+ million lines) business framework
• joeq
  • Compiler infrastructure, 75K lines
• Cloudscape
  • Database shipped with J2EE, no source code
• JEdit
  • Full-featured editor, 100K lines

Experimental Results
• We analyzed the reachable code for each application
  • Results include code in class library
  • Analysis was very effective in reducing total program size
• Pentium 4 2GHz, 2GB RAM, Redhat 7.2
  • Sun JDK 1.3.1_01 with 512MB heap

Analysis Precision vs. RTA
[Bar chart. Y-axis: average targets per call site (0 to 3). X-axis: check, compress, db, jack, javac, jess, mpegaudio, mtrt, raytrace, admintool, appclient, deploytool, j2eeserver, packager, verifier, joeq, cloudscape, jedit. Series: RTA, Points-to.]

Analysis time: Small benchmarks
[Bar chart. Y-axis: seconds (0 to 80). X-axis: check, compress, db, jack, javac, jess, mpegaudio, mtrt, raytrace. Series: No opt, Opt.]

Analysis time: Large benchmarks
[Bar chart. Y-axis: seconds (0 to 2000). X-axis: admintool, appclient, deploytool, j2eeserver, packager, verifier, joeq, cloudscape, jedit. Series: No opt, Opt.]

Analysis time (speedup)
[Bar chart. Y-axis: times speedup (0 to 20). X-axis: check, compress, db, jack, javac, jess, mpegaudio, mtrt, raytrace, admintool, appclient, deploytool, j2eeserver, packager, verifier, joeq, cloudscape, jedit. Series: Opt.]

Analysis time (bytecodes/second)
[Bar chart. Y-axis: bytecodes per second (0 to 20000). X-axis: check, compress, db, jack, javac, jess, mpegaudio, mtrt, raytrace, admintool, appclient, deploytool, j2eeserver, packager, verifier, joeq, cloudscape, jedit.]

Related Work
• Original CLA paper
  • Heintze and Tardieu (PLDI 2001)
• Andersen’s analysis for Java
  • Rountev, Milanova, Ryder (OOPSLA 2001)
  • Liang, Pennings, Harrold (PASTE 2001)
  • Many others…
• Concrete type inference
  • CHA, RTA
  • Flow and context sensitivity, 0-CFA

Conclusion
• Improved precision
  • Field sensitivity
  • Local flow sensitivity
• Improved efficiency
  • Reuse reachability cache across iterations
  • Minimize set-union operations
• Scales to the largest Java programs
• A new baseline for Java pointers
  • No reason to use a less precise analysis