![Page 1: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/1.jpg)
Partial Method Compilation using Dynamic Profile Information
John Whaley, Stanford University
October 17, 2001
![Page 2: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/2.jpg)
Outline
• Background and Overview
• Dynamic Compilation System
• Partial Method Compilation Technique
• Optimizations
• Experimental Results
• Related Work
• Conclusion
![Page 3: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/3.jpg)
Dynamic Compilation
• We want code performance comparable to static compilation techniques
• However, we want to avoid long startup delays and slow responsiveness
• Dynamic compiler should be fast AND good
![Page 4: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/4.jpg)
Traditional approach
• Interpreter plus optimizing compiler
• Switch from interpreter to optimizing compiler via some heuristic
• Problems:
  • Interpreter is too slow! (10x to 100x)
![Page 5: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/5.jpg)
Another approach
• Simple compiler plus optimizing compiler (Jalapeno, JUDO, Microsoft)
• Switch from simple to optimizing compiler via some heuristic
• Problems:
  • Code from simple compiler is still too slow! (30% to 100% slower than optimizing)
  • Memory footprint problems (Suganuma et al., OOPSLA’01)
![Page 6: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/6.jpg)
Yet another approach
• Multi-level compilation (Jalapeno, HotSpot)
• Use multiple compiled versions to slowly “accelerate” into optimized execution
• Problems:
  • This simply increases the delay before the program runs at full speed!
![Page 7: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/7.jpg)
Problem with compilation
• Compilation takes time proportional to the amount of code being compiled
• Many optimizations are superlinear in the size of the code
• Compilation of large amounts of code is the cause of undesirably long compilation times
![Page 8: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/8.jpg)
Methods can be large
• All of these techniques operate at method boundaries
• Methods can be large, especially after inlining
• Cutting inlining too much hurts performance considerably (Arnold et al., Dynamo’00)
• Even when being frugal about inlining, methods can still become very large
![Page 9: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/9.jpg)
Methods are poor boundaries
• Method boundaries do not correspond very well to the code that would most benefit from optimization
• Even “hot” methods typically contain some code that is rarely or never executed
![Page 10: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/10.jpg)
Example: SpecJVM db

    void read_db(String fn) {
        int n = 0, act = 0, b;
        byte buffer[] = null;
        try {
            FileInputStream sif = new FileInputStream(fn);
            buffer = new byte[n];
            // Hot loop
            while ((b = sif.read(buffer, act, n - act)) > 0) {
                act = act + b;
            }
            sif.close();
            if (act != n) {
                /* lots of error handling code, rare */
            }
        } catch (IOException ioe) {
            /* lots of error handling code, rare */
        }
    }
![Page 11: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/11.jpg)
Example: SpecJVM db
Lots of rare code!

    void read_db(String fn) {
        int n = 0, act = 0, b;
        byte buffer[] = null;
        try {
            FileInputStream sif = new FileInputStream(fn);
            buffer = new byte[n];
            while ((b = sif.read(buffer, act, n - act)) > 0) {
                act = act + b;
            }
            sif.close();
            if (act != n) {
                /* lots of error handling code, rare */
            }
        } catch (IOException ioe) {
            /* lots of error handling code, rare */
        }
    }
![Page 12: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/12.jpg)
Hot “regions”, not methods
• The regions that are important to compile have nothing to do with the method boundaries
• Using a method granularity causes the compiler to waste time optimizing large pieces of code that do not matter
![Page 13: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/13.jpg)
Overview of our technique
Increase the precision of selective compilation to operate at a sub-method granularity
1. Collect basic block level profile data for hot methods
2. Recompile using the profile data, replacing rare code entry points with branches into the interpreter
![Page 14: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/14.jpg)
Overview of our technique
• Takes advantage of the well-known fact that a large amount of code is rarely or never executed
• Simple to understand and implement, yet highly effective
• Beneficial secondary effect of improving optimization opportunities on the common paths
![Page 15: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/15.jpg)
Overview of Dynamic Compilation System
![Page 16: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/16.jpg)
Stage 1: interpreted code
  ↓ when execution count = t1
Stage 2: compiled code
  ↓ when execution count = t2
Stage 3: fully optimized code
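The stage transitions can be sketched as a per-method counter that promotes the method when it crosses each threshold. This is only an illustration: the class `StagedMethod`, its `Stage` enum, and the choice to count one execution per invocation are assumptions, not details taken from the system described here.

```java
// Hypothetical sketch of the three-stage pipeline; names are illustrative only.
final class StagedMethod {
    enum Stage { INTERPRETED, COMPILED, FULLY_OPTIMIZED }

    private final long t1; // execution count that triggers Stage 1 -> Stage 2
    private final long t2; // execution count that triggers Stage 2 -> Stage 3
    private long executionCount = 0;
    private Stage stage = Stage.INTERPRETED;

    StagedMethod(long t1, long t2) {
        if (t1 <= 0 || t2 <= t1) {
            throw new IllegalArgumentException("need 0 < t1 < t2");
        }
        this.t1 = t1;
        this.t2 = t2;
    }

    /** Records one execution and returns the stage the method is in afterwards. */
    Stage invoke() {
        executionCount++;
        if (stage == Stage.INTERPRETED && executionCount >= t1) {
            stage = Stage.COMPILED;          // first compile, with profiling
        } else if (stage == Stage.COMPILED && executionCount >= t2) {
            stage = Stage.FULLY_OPTIMIZED;   // recompile using the profile
        }
        return stage;
    }
}
```

For instance, `new StagedMethod(2000, 25000)` would model a method that is compiled on its 2000th execution and fully optimized on its 25000th.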
![Page 17: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/17.jpg)
Identifying rare code
• Simple technique: any basic block executed during Stage 2 is said to be hot
  • Effectively ignores initialization
• Add instrumentation to the targets of conditional forward branches
• Better techniques exist, but using this we saw no performance degradation
• Enable/disable profiling is implicitly handled by stage transitions
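A minimal sketch of this coverage-style profile, assuming each basic block has a small integer id and that the Stage 2 code calls a hypothetical `hit` stub at each instrumented point. The class `BlockProfile` is illustrative and is not joeq's API.

```java
import java.util.BitSet;

// Hypothetical coverage profile: a block is hot if it ran at least once in Stage 2.
final class BlockProfile {
    private final int numBlocks;
    private final BitSet executed;

    BlockProfile(int numBlocks) {
        this.numBlocks = numBlocks;
        this.executed = new BitSet(numBlocks);
    }

    /** Called by the instrumentation placed in the Stage 2 code. */
    void hit(int blockId) {
        executed.set(blockId);
    }

    /** Blocks never executed during Stage 2 are treated as rare. */
    BitSet rareBlocks() {
        BitSet rare = new BitSet(numBlocks);
        rare.set(0, numBlocks);   // start with every block
        rare.andNot(executed);    // remove the ones that were executed
        return rare;
    }
}
```

Only one bit per block is needed because the criterion is "executed at all", not an execution frequency.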
![Page 18: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/18.jpg)
[Figure: Method-at-a-time strategy. Y-axis: % of basic blocks (0% to 100%); x-axis: execution threshold (1, 10, 100, 500, 1000, 2000, 5000). Series: Linpack, JavaCUP, JavaLEX, SwingSet, check, compress, jess, db, javac, mpegaud, mtrt, jack.]
![Page 19: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/19.jpg)
[Figure: Actual basic blocks executed. Y-axis: % of basic blocks (0% to 100%); x-axis: execution threshold (1, 10, 100, 500, 1000, 2000, 5000). Series: Linpack, JavaCUP, JavaLEX, SwingSet, check, compress, jess, db, javac, mpegaud, mtrt, jack.]
![Page 20: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/20.jpg)
Partial method compilation technique
![Page 21: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/21.jpg)
Technique
1. Based on profile data, determine the set of rare blocks.
  • Use code coverage information from the first compiled version
![Page 22: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/22.jpg)
Technique
2. Perform live variable analysis.
  • Determine the set of live variables at rare block entry points
live: x,y,z
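Live variable analysis is a standard backward dataflow problem. The sketch below iterates to a fixed point over a small control flow graph. It assumes each block is summarized by its upward-exposed uses and its definitions, and the class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical liveness sketch: liveIn(b) = use(b) ∪ (liveOut(b) − def(b)).
final class Liveness {
    /** A basic block summarized by its use/def sets and successor indices. */
    static final class Block {
        final Set<String> use;   // variables read before any write in the block
        final Set<String> def;   // variables written in the block
        final int[] succs;
        Block(Set<String> use, Set<String> def, int... succs) {
            this.use = use;
            this.def = def;
            this.succs = succs;
        }
    }

    /** Returns the live-in set of every block, indexed like the input list. */
    static List<Set<String>> liveIn(List<Block> cfg) {
        List<Set<String>> in = new ArrayList<>();
        for (int i = 0; i < cfg.size(); i++) {
            in.add(new HashSet<>());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = cfg.size() - 1; i >= 0; i--) {
                Block b = cfg.get(i);
                Set<String> live = new HashSet<>();
                for (int s : b.succs) {
                    live.addAll(in.get(s));   // liveOut = union of successors' liveIn
                }
                live.removeAll(b.def);
                live.addAll(b.use);
                if (!live.equals(in.get(i))) {
                    in.set(i, live);
                    changed = true;
                }
            }
        }
        return in;
    }
}
```

The live-in set of a rare block is exactly the set of variables whose locations must be recorded for the transfer into the interpreter.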
![Page 23: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/23.jpg)
Technique
3. Redirect the control flow edges that targeted rare blocks, and remove the rare blocks.
to interpreter…
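A rough sketch of this edge redirection, assuming the control flow graph is given as successor lists indexed by block id. The `TO_INTERPRETER` marker and the class `RareBlockPruner` are hypothetical stand-ins for an interpreter transfer point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical CFG rewrite: drop rare blocks and retarget edges that led to them.
final class RareBlockPruner {
    static final int TO_INTERPRETER = -1; // illustrative marker, not a real block id

    /**
     * succs.get(i) lists the successors of block i. The result keeps only the
     * non-rare blocks; any edge that targeted a rare block now targets
     * TO_INTERPRETER instead.
     */
    static Map<Integer, List<Integer>> prune(List<List<Integer>> succs, Set<Integer> rare) {
        Map<Integer, List<Integer>> out = new TreeMap<>();
        for (int b = 0; b < succs.size(); b++) {
            if (rare.contains(b)) {
                continue; // rare blocks are removed from the compiled method
            }
            List<Integer> redirected = new ArrayList<>();
            for (int s : succs.get(b)) {
                redirected.add(rare.contains(s) ? TO_INTERPRETER : s);
            }
            out.put(b, redirected);
        }
        return out;
    }
}
```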
![Page 24: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/24.jpg)
Technique
4. Perform compilation normally.
  • Analyses treat the interpreter transfer point as an unanalyzable method call.
![Page 25: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/25.jpg)
Technique
5. Record a map for each interpreter transfer point.
  • In code generation, generate a map that specifies the location, in registers or memory, of each of the live variables.
  • Maps are typically < 100 bytes
live: x,y,z
x: sp - 4
y: R1
z: sp - 8
![Page 26: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/26.jpg)
Optimizations
![Page 27: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/27.jpg)
Partial dead code elimination
• Modified dead code elimination to treat rare blocks specially
• Move computation that is only live on a rare path into the rare block, saving computation in the common case
![Page 28: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/28.jpg)
Partial dead code elimination
• Optimistic approach on SSA form
• Mark all instructions that compute essential values, recursively
• Eliminate all non-essential instructions
![Page 29: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/29.jpg)
Partial dead code elimination
• Calculate necessary code, ignoring all rare blocks
• For each rare block, calculate the instructions that are necessary for that rare block, but not necessary in non-rare blocks
• If these instructions are recomputable at the point of the rare block, they can be safely copied there
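The marking steps above can be sketched on a toy SSA-like representation. This is a simplified illustration under several assumptions: each instruction defines one value, "essential" instructions are given as a flag, and the recomputability check is omitted. All names (`PartialDeadCode`, `Instr`, `sinkable`) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: find definitions that are needed only by one rare block.
final class PartialDeadCode {
    /** One SSA-style instruction defining `def` from `operands` in block `block`. */
    static final class Instr {
        final String def;
        final int block;
        final boolean essential; // e.g. has a side effect or produces a result
        final List<String> operands;
        Instr(String def, int block, boolean essential, String... operands) {
            this.def = def;
            this.block = block;
            this.essential = essential;
            this.operands = List.of(operands);
        }
    }

    /** Transitively marks every definition the root instructions depend on. */
    static Set<String> needed(Map<String, Instr> defs, Collection<Instr> roots) {
        Set<String> marked = new LinkedHashSet<>();
        Deque<Instr> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Instr i = work.pop();
            if (!marked.add(i.def)) {
                continue; // already visited
            }
            for (String op : i.operands) {
                Instr d = defs.get(op);
                if (d != null) {
                    work.push(d); // operands without a def are treated as inputs
                }
            }
        }
        return marked;
    }

    /** Definitions in common blocks that only the rare block `rareId` needs. */
    static Set<String> sinkable(List<Instr> code, Set<Integer> rare, int rareId) {
        Map<String, Instr> defs = new HashMap<>();
        for (Instr i : code) {
            defs.put(i.def, i);
        }
        List<Instr> commonRoots = new ArrayList<>();
        List<Instr> rareRoots = new ArrayList<>();
        for (Instr i : code) {
            if (!i.essential) continue;
            if (!rare.contains(i.block)) commonRoots.add(i);
            else if (i.block == rareId) rareRoots.add(i);
        }
        Set<String> common = needed(defs, commonRoots); // ignores all rare blocks
        Set<String> result = new LinkedHashSet<>();
        for (String d : needed(defs, rareRoots)) {
            if (!common.contains(d) && !rare.contains(defs.get(d).block)) {
                result.add(d);
            }
        }
        return result;
    }
}
```

A real implementation would also need to check that each candidate can be recomputed at the rare block's entry before copying it there.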
![Page 30: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/30.jpg)
Partial dead code example
x = 0;
if (rare branch 1) {
...
z = x + y;
...
}
if (rare branch 2) {
...
a = x + z;
...
}
![Page 31: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/31.jpg)
Partial dead code example
if (rare branch 1) {
x = 0;
...
z = x + y;
...
}
if (rare branch 2) {
x = 0;
...
a = x + z;
...
}
![Page 32: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/32.jpg)
Pointer and escape analysis
• Treating an entrance to the rare path as a method call is a conservative assumption
• Typically does not matter because there are no merges back into the common path
• However, this conservativeness hurts pointer and escape analysis because a single unanalyzed call kills all information
![Page 33: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/33.jpg)
Pointer and escape analysis
• Stack allocate objects that don’t escape in the common blocks
• Eliminate synchronization on objects that don’t escape the common blocks
• If a branch to a rare block is taken:
  • Copy stack-allocated objects to the heap and update pointers
  • Reapply eliminated synchronizations
![Page 34: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/34.jpg)
Copying from stack to heap
[Figure: a stack object is copied to the heap (copy), and references to it are rewritten (rewrite).]
![Page 35: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/35.jpg)
Reconstructing interpreter state
• We use a runtime “glue” routine
• Construct a set of interpreter stack frames, initialized with their corresponding method and bytecode pointers
• Iterate through each location pair in the map, and copy the value at the location to its corresponding position in the interpreter stack frame
• Branch into the interpreter, and continue execution
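The glue routine can be sketched for a single frame as follows. The machine model here (an `int` register file and sp-relative slots keyed by offset) and every class name are assumptions made for illustration; a real routine would build one frame per inlined method and then jump into the interpreter.

```java
import java.util.Map;

// Hypothetical model of the runtime "glue" routine; all names are illustrative.
final class InterpreterGlue {
    /** Where a live variable sits in compiled code: a register or an sp-relative slot. */
    static final class Location {
        final boolean inRegister;
        final int index; // register number, or offset from sp
        private Location(boolean inRegister, int index) {
            this.inRegister = inRegister;
            this.index = index;
        }
        static Location reg(int number) { return new Location(true, number); }
        static Location stack(int offset) { return new Location(false, offset); }
    }

    /** Simplified snapshot of the compiled frame at the transfer point. */
    static final class MachineState {
        final int[] registers;
        final Map<Integer, Integer> stackSlots; // offset from sp -> value
        MachineState(int[] registers, Map<Integer, Integer> stackSlots) {
            this.registers = registers;
            this.stackSlots = stackSlots;
        }
    }

    /** An interpreter frame: method, bytecode index to resume at, and locals. */
    static final class InterpreterFrame {
        final String method;
        final int bytecodeIndex;
        final int[] locals;
        InterpreterFrame(String method, int bytecodeIndex, int numLocals) {
            this.method = method;
            this.bytecodeIndex = bytecodeIndex;
            this.locals = new int[numLocals];
        }
    }

    /** Copies each live variable from its compiled-code location into the frame. */
    static InterpreterFrame reconstruct(String method, int bytecodeIndex, int numLocals,
                                        Map<Integer, Location> map, MachineState state) {
        InterpreterFrame frame = new InterpreterFrame(method, bytecodeIndex, numLocals);
        for (Map.Entry<Integer, Location> e : map.entrySet()) {
            Location loc = e.getValue();
            frame.locals[e.getKey()] = loc.inRegister
                    ? state.registers[loc.index]
                    : state.stackSlots.get(loc.index);
        }
        return frame;
    }
}
```

With a map such as `x: sp - 4`, `y: R1`, `z: sp - 8`, the routine would read two stack slots and one register and store the three values into the frame's local slots.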
![Page 36: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/36.jpg)
Experimental Results
![Page 37: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/37.jpg)
Experimental Methodology
• Fully implemented in a proprietary system
  • Unfortunately, cannot publish those numbers!
• Proof-of-concept implementation in the joeq virtual machine http://joeq.sourceforge.net
  • Unfortunately, joeq does not perform significant optimizations!
![Page 38: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/38.jpg)
Experimental Methodology
• Also implemented as an offline step, using refactored class files
  • Use offline profile information to split methods into “hot” and “cold” parts
  • We then rely on the virtual machine’s default method-at-a-time strategy
  • Provides a reasonable approximation of the effectiveness of this technique
  • Can also be used as a standalone optimizer
  • Available under LGPL as part of joeq release
![Page 39: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/39.jpg)
Experimental Methodology
• IBM JDK 1.3 cx130-20010626 on RedHat Linux 7.1
• Pentium 3 600 MHz, 512 MB RAM
• Thresholds: t1 = 2000, t2 = 25000
• Benchmarks: SpecJVM, SwingSet, Linpack, JavaLex, JavaCup
![Page 40: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/40.jpg)
[Figure: Run time improvement. Y-axis: 0% to 100%. Benchmarks: check, compress, jess, db, javac, mpegaud, mtrt, jack, SwingSet, linpack, JLex, JCup. For each benchmark, first bar: original; second bar: PMC; third bar: PMC + my opts. Blue: optimized execution.]
![Page 41: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/41.jpg)
Related Work
Dynamic techniques
• Dynamo (Bala et al., PLDI’00)
• Self (Chambers et al., OOPSLA’91)
• HotSpot (JVM’01)
• IBM JDK (Ishizaki et al., OOPSLA’00)
![Page 42: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/42.jpg)
Related Work
Static techniques
• Trace scheduling (Fisher, 1981)
• Superblock scheduling (IMPACT compiler)
• Partial redundancy elimination with cost-benefit analysis (Horspool, 1997)
• Optimal compilation unit shapes (Bruening, FDDO’00)
• Profile-guided code placement strategies
![Page 43: Partial Method Compilation using Dynamic Profile Information John Whaley Stanford University October 17, 2001](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d4c5503460f94a2a141/html5/thumbnails/43.jpg)
Conclusion
• Partial method compilation technique is simple to implement, yet very effective
• Compile times reduced drastically
• Overall run times improved by an average of 10%, and up to 32%
• System is available under LGPL at: http://joeq.sourceforge.net