Adaptive Optimization in the Jalapeño JVM
M. Arnold, S. Fink, D. Grove,
M. Hind, P. Sweeney
Presented by Andrew Cove
15-745 Spring 2006
Jalapeño JVM
• Research JVM developed at IBM T.J. Watson Research Center
• Extensible system architecture based on a federation of threads that communicate asynchronously
• Supports adaptive multi-level optimization with low overhead
– Statistical sampling
Contributions
• Extensible adaptive optimization architecture that enables online feedback-directed optimization
• Adaptive optimization system that uses multiple optimization levels to improve performance
• Implementation and evaluation of feedback-directed inlining based on low-overhead sample data
• Doesn’t require programmer directives
Jalapeño JVM - Details
• Written in Java
– Optimizations applied not only to the application and libraries, but to the JVM itself
– Bootstrapped
• Boot image contains core Jalapeño services precompiled to machine code
• Doesn’t need to run on top of another JVM
• Subsystems
– Dynamic Class Loader
– Dynamic Linker
– Object Allocator
– Garbage Collector
– Thread Scheduler
– Profiler
• Online measurement system
– 2 Compilers
Jalapeño JVM - Details
• 2 Compilers
– Baseline
• Translates bytecodes directly into native code by simulating Java’s operand stack
• No register allocation
– Optimizing Compiler
• Linear scan register allocation
• Converts bytecodes into an IR, which it uses for optimizations
• Compile-only
– Compiles all methods to native code before execution
– 3 levels of optimization
– …
Jalapeño JVM - Details
• Optimizing Compiler (without online feedback)
– Level 0: Optimizations performed during conversion
• Copy, constant, type, and non-null propagation
• Constant folding, arithmetic simplification
• Dead code elimination
• Inlining
• Unreachable code elimination
• Elimination of redundant null checks
• …
– Level 1:
• Common subexpression elimination
• Array bounds check elimination
• Redundant load elimination
• Inlining (size heuristics)
• Global flow-insensitive copy and constant propagation, dead assignment elimination
• Scalar replacement of aggregates and short arrays
Jalapeño JVM - Details
• Optimizing Compiler (without online feedback)
– Level 2
• SSA-based flow-sensitive optimizations
• Array SSA optimizations
Jalapeño Adaptive Optimization System (AOS)
• Sample-based profiling drives optimized recompilation
• Exploits runtime information beyond the scope of a static model
• Multi-level and adaptive optimizations
– Balance optimization effectiveness with compilation overhead to maximize performance
• 3 Component Subsystems (asynchronous threads)
– Runtime Measurement
– Controller
– Recompilation
– Database (3 + 1 = 3?)
Jalapeño Adaptive Optimization System (AOS)
Subsystems – Runtime Measurement
• Sample-driven program profile
– Instrumentation
– Hardware monitors
– VM instrumentation
– Sampling
• Timer interrupts trigger yields between threads
• Method-associative counters updated at yields
– Triggers controller at threshold levels
• Data processed by organizers
– Hot method organizer
• Tells controller the time-dominant methods that aren’t fully optimized
– Decay organizer
• Decreases sample weights to emphasize recent data
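The sampling-plus-decay mechanism above can be sketched as follows. This is a minimal illustration with assumed names (it is not Jalapeño's actual API): each timer-triggered yield attributes one sample to the currently executing method, and the decay organizer periodically scales all weights down so that recent samples dominate.

```java
import java.util.HashMap;
import java.util.Map;

class MethodSampler {
    private final Map<String, Double> samples = new HashMap<>();

    // Called at a thread yield triggered by the timer interrupt:
    // attribute one sample to the method that was executing.
    void recordSample(String method) {
        samples.merge(method, 1.0, Double::sum);
    }

    // Periodically invoked by the decay organizer: scaling every
    // weight by a factor < 1 emphasizes recent samples over old ones.
    void decay(double factor) {
        samples.replaceAll((m, w) -> w * factor);
    }

    double weight(String method) {
        return samples.getOrDefault(method, 0.0);
    }
}
```

The hot method organizer would then scan these weights against a threshold to find candidates for recompilation.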
Hotness
• A hot method is one where the program spends a lot of its time
• Hot edges are used later to determine good function calls to inline
• In both cases, hotness is a function of the number of samples taken
– In a method
– In a given callee from a given caller
• The system can adaptively adjust hotness thresholds
– To reduce optimization during startup
– To encourage optimization of more methods
– To reduce analysis time when too many methods are hot
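The adaptive threshold adjustment could look something like this sketch. The doubling/halving factors and the target count are illustrative assumptions, not values from the paper; the point is only that the threshold rises when too many methods qualify as hot (bounding analysis time) and falls when too few do (encouraging more optimization).

```java
import java.util.List;

class HotnessThreshold {
    double threshold;

    HotnessThreshold(double initial) {
        this.threshold = initial;
    }

    // Count how many methods currently exceed the hotness threshold.
    long countHot(List<Double> weights) {
        return weights.stream().filter(w -> w >= threshold).count();
    }

    // Raise the threshold when too many methods are hot, lower it when
    // too few are. (Assumed 2x / 0.5x adjustment factors.)
    void adjust(List<Double> weights, long targetHot) {
        long hot = countHot(weights);
        if (hot > targetHot) {
            threshold *= 2.0;
        } else if (hot < targetHot) {
            threshold *= 0.5;
        }
    }
}
```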
Subsystems – Controller
• Orchestrates the other components of AOS
– Directs data monitoring
– Creates organizer threads
– Chooses to recompile based on data and cost/benefit model
• To recompile or not to recompile?
• Find the level j that minimizes Cj + Tj: the cost of recompiling m at level j plus the expected future running time of the recompiled m
• If Cj + Tj < Ti, recompile m at level j (Ti is the expected future time in m at its current level i)
• Assume, arbitrarily, that the program will run for twice its current duration
• Ti = Tf · Pm, where Tf is the estimated future running time of the program and Pm is the estimated percentage of future time spent in m
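The controller's cost/benefit decision above can be sketched as follows. The relative-speedup constants and the linear compile-cost coefficients here are illustrative assumptions (the paper derives its constants from offline measurements); the structure of the decision is what the code shows.

```java
class Controller {
    // Offline-measured relative speedup of each level (assumed values):
    // index 0 = baseline, then optimization levels 0, 1, 2.
    static final double[] SPEEDUP = {1.0, 2.0, 2.5, 2.8};

    // Linear model of compile cost: seconds = rate[level] * methodSize
    // (assumed per-unit rates; higher levels compile more slowly).
    static final double[] COST_PER_UNIT = {0.0, 1e-4, 3e-4, 1e-3};

    // Return the best level j, or currentLevel if no recompilation pays off.
    // timeRunSoFar stands in for Tf (program assumed to run as long again);
    // pm is the estimated fraction of future time spent in m.
    static int choose(int currentLevel, double timeRunSoFar,
                      double pm, int methodSize) {
        double ti = timeRunSoFar * pm;   // future time in m if left alone
        int best = currentLevel;
        double bestTotal = ti;
        for (int j = currentLevel + 1; j < SPEEDUP.length; j++) {
            double tj = ti * SPEEDUP[currentLevel] / SPEEDUP[j];
            double cj = COST_PER_UNIT[j] * methodSize;
            if (cj + tj < bestTotal) {   // recompile only if Cj + Tj < Ti
                bestTotal = cj + tj;
                best = j;
            }
        }
        return best;
    }
}
```

Note how the model naturally optimizes less at startup: early on, timeRunSoFar is small, so Ti is small and no compile cost can be recovered.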
Subsystems – Controller
• System estimates the effectiveness of each optimization level as a constant, based on offline measurements
• Uses a linear model of compilation speed for each optimization level as a function of method size
– Linearity of higher-level optimizations?
Subsystems – Recompilation
• In theory– Multiple compilation threads that invoke compilers
– Can run in parallel with the application
• In practice– Single compilation thread
• Some JVM services require the master lock– Multiple compilation threads are not effective
– Lock contention between compilation and application threads
– Left as a footnote!
• Recompilation times are stored to improve time estimates in cost/benefit analysis
Feedback-Directed Inlining
• Statistical samples of method calls used to build a dynamic call graph
– Traverse call stack at yields
• Identify hot edges
– Recompile caller methods with the inlined callee (even if the caller was already optimized)
• Decay old edges
• Adaptive Inlining Organizer
– Determines hot edges and hot methods worth recompiling with the inlined method call
– Weights inline rules with a boost factor
• Based on the number of calls on the call edge and a previous study on the effects of removing call overhead
• Future work: more sophisticated heuristics
• Seems obvious: new inline optimizations don’t eliminate old inlines
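The dynamic call graph driving feedback-directed inlining can be sketched like this. Names and the hotness criterion (edge weight as a fraction of all sampled calls) are assumptions for illustration; the paper's organizer additionally applies a boost factor to the compiler's inlining rules.

```java
import java.util.HashMap;
import java.util.Map;

class DynamicCallGraph {
    // Edge weights keyed by "caller->callee", built from call-stack
    // samples taken at thread yields.
    private final Map<String, Double> edgeWeight = new HashMap<>();

    void sampleEdge(String caller, String callee) {
        edgeWeight.merge(caller + "->" + callee, 1.0, Double::sum);
    }

    // Periodic decay emphasizes recent calling behavior.
    void decay(double factor) {
        edgeWeight.replaceAll((e, w) -> w * factor);
    }

    // An edge is hot when it carries at least `threshold` of all
    // sampled calls; hot edges are candidates for inlining.
    boolean isHot(String caller, String callee, double threshold) {
        double total = edgeWeight.values().stream()
                .mapToDouble(Double::doubleValue).sum();
        double w = edgeWeight.getOrDefault(caller + "->" + callee, 0.0);
        return total > 0 && w / total >= threshold;
    }
}
```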
Experimental Methodology
• System
– Dual 333 MHz PowerPC processors, 1 GB memory
• Timer interrupts at 10 ms intervals
• Recompilation organizer runs from 2 times per second to once every 4 s
• DCG and adaptive inlining organizers run every 2.5 seconds
• Method sample half-life: 1.7 seconds
• Edge weight half-life: 7.3 seconds
• SPECjvm98
• Jalapeño Optimizing Compiler
• Volano chat room simulator
• Startup and steady-state measurements
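The half-lives above translate into a per-application decay factor. Assuming the decay organizer multiplies weights by a constant factor each time it runs, choosing 0.5^(interval / halfLife) halves a stale sample's weight every halfLife seconds, whatever the organizer's interval:

```java
class Decay {
    // Decay factor applied each organizer run, so that a weight
    // halves every halfLifeSec seconds of elapsed time.
    static double factor(double intervalSec, double halfLifeSec) {
        return Math.pow(0.5, intervalSec / halfLifeSec);
    }
}
```

For example, with the 1.7 s method-sample half-life, an organizer running every 1.7 s would multiply weights by 0.5, while one running twice as often would use about 0.707 per run.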
Results
• Compile time overhead plays large role in startup
Results
• Multilevel Adaptive does well (and JITs don’t have overhead)
Results
• Startup doesn’t reach high enough optimization level to benefit
Questions
• Assuming execution time will be twice the current duration is completely arbitrary, but has a nice outcome (less optimization at startup, more at steady state)
• Measurements of optimizations vs. phase shifts are meaningless
– Due to the execution time estimation
Questions
• Does it scale?
– More online-feedback optimizations
• More threads needing cycles
– Organizer threads
– Recompilation threads
• More data to measure
• Especially slow if there can only be one recompilation thread
• More complicated cost/benefit analysis
– Potential speedups and estimated compilation times