Adaptive Optimization in the Jalapeño JVM
M. Arnold, S. Fink, D. Grove,
M. Hind, P. Sweeney
Presented by Andrew Cove
15-745 Spring 2006
Jalapeño JVM
• Research JVM developed at IBM T.J. Watson Research Center
• Extensible system architecture based on a federation of threads that communicate asynchronously
• Supports adaptive multi-level optimization with low overhead
– Statistical sampling
Contributions
• Extensible adaptive optimization architecture that enables online feedback-directed optimization
• Adaptive optimization system that uses multiple optimization levels to improve performance
• Implementation and evaluation of feedback-directed inlining based on low-overhead sample data
• Doesn’t require programmer directives
Jalapeño JVM - Details
• Written in Java
– Optimizations applied not only to the application and libraries, but to the JVM itself
– Bootstrapped
• Boot image contains core Jalapeño services precompiled to machine code
• Doesn’t need to run on top of another JVM
• Subsystems
– Dynamic Class Loader
– Dynamic Linker
– Object Allocator
– Garbage Collector
– Thread Scheduler
– Profiler
• Online measurement system
– 2 Compilers
Jalapeño JVM - Details
• 2 Compilers
– Baseline
• Translates bytecodes directly into native code by simulating Java’s operand stack
• No register allocation
– Optimizing Compiler
• Linear scan register allocation
• Converts bytecodes into an IR, which it uses for optimizations
• Compile-only
– Compiles all methods to native code before execution
– 3 levels of optimization
– …
Jalapeño JVM - Details
• Optimizing Compiler (without online feedback)
– Level 0: Optimizations performed during conversion
• Copy, constant, type, and non-null propagation
• Constant folding, arithmetic simplification
• Dead code elimination
• Inlining
• Unreachable code elimination
• Elimination of redundant null checks
• …
– Level 1:
• Common subexpression elimination
• Array bounds check elimination
• Redundant load elimination
• Inlining (size heuristics)
• Global flow-insensitive copy and constant propagation, dead assignment elimination
• Scalar replacement of aggregates and short arrays
Jalapeño JVM - Details
• Optimizing Compiler (without online feedback)
– Level 2
• SSA-based flow-sensitive optimizations
• Array SSA optimizations
Jalapeño Adaptive Optimization System (AOS)
• Sample-based profiling drives optimized recompilation
• Exploits runtime information beyond the scope of a static model
• Multi-level and adaptive optimizations
– Balance optimization effectiveness with compilation overhead to maximize performance
• 3 Component Subsystems (asynchronous threads)
– Runtime Measurement
– Controller
– Recompilation
– Database (3 + 1 = 3?)
Jalapeño Adaptive Optimization System (AOS)
Subsystems – Runtime Measurement
• Sample-driven program profile
– Instrumentation
– Hardware monitors
– VM instrumentation
– Sampling
• Timer interrupts trigger yields between threads
• Method-associative counters updated at yields
– Triggers controller at threshold levels
• Data processed by organizers
– Hot method organizer
• Tells controller the time-dominant methods that aren’t fully optimized
– Decay organizer
• Decreases sample weights to emphasize recent data
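The sampling-plus-decay mechanism above can be sketched as follows. This is a minimal illustration with assumed names (it is not Jalapeño's actual API): each timer-triggered yield attributes one sample to the currently executing method, and the decay organizer periodically scales all weights down so that recent samples dominate.

```java
import java.util.HashMap;
import java.util.Map;

class MethodSampler {
    private final Map<String, Double> samples = new HashMap<>();

    // Called at a thread yield triggered by the timer interrupt:
    // attribute one sample to the method that was executing.
    void recordSample(String method) {
        samples.merge(method, 1.0, Double::sum);
    }

    // Periodically invoked by the decay organizer: scaling every
    // weight by a factor < 1 emphasizes recent samples over old ones.
    void decay(double factor) {
        samples.replaceAll((m, w) -> w * factor);
    }

    double weight(String method) {
        return samples.getOrDefault(method, 0.0);
    }
}
```

The hot method organizer would then scan these weights against a threshold to find candidates for recompilation.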
Hotness
• A hot method is one where the program spends a lot of its time
• Hot edges are used later to determine good function calls to inline
• In both cases, hotness is a function of the number of samples taken
– In a method
– In a given callee from a given caller
• The system can adaptively adjust hotness thresholds
– To reduce optimization during startup
– To encourage optimization of more methods
– To reduce analysis time when too many methods are hot
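The adaptive threshold adjustment could look something like this sketch. The doubling/halving factors and the target count are illustrative assumptions, not values from the paper; the point is only that the threshold rises when too many methods qualify as hot (bounding analysis time) and falls when too few do (encouraging more optimization).

```java
import java.util.List;

class HotnessThreshold {
    double threshold;

    HotnessThreshold(double initial) {
        this.threshold = initial;
    }

    // Count how many methods currently exceed the hotness threshold.
    long countHot(List<Double> weights) {
        return weights.stream().filter(w -> w >= threshold).count();
    }

    // Raise the threshold when too many methods are hot, lower it when
    // too few are. (Assumed 2x / 0.5x adjustment factors.)
    void adjust(List<Double> weights, long targetHot) {
        long hot = countHot(weights);
        if (hot > targetHot) {
            threshold *= 2.0;
        } else if (hot < targetHot) {
            threshold *= 0.5;
        }
    }
}
```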
Subsystems – Controller
• Orchestrates the other components of AOS
– Directs data monitoring
– Creates organizer threads
– Chooses to recompile based on data and cost/benefit model
• To recompile or not to recompile?
• Find the level j that minimizes Cj + Tj: the cost of recompiling m at level j plus the expected future running time of the recompiled m
• If Cj + Tj < Ti, recompile m at level j (Ti is the expected future time in m at its current level i)
• Assume, arbitrarily, that the program will run for twice its current duration
• Ti = Tf · Pm, where Tf is the estimated future running time of the program and Pm is the estimated percentage of future time spent in m
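The controller's cost/benefit decision above can be sketched as follows. The relative-speedup constants and the linear compile-cost coefficients here are illustrative assumptions (the paper derives its constants from offline measurements); the structure of the decision is what the code shows.

```java
class Controller {
    // Offline-measured relative speedup of each level (assumed values):
    // index 0 = baseline, then optimization levels 0, 1, 2.
    static final double[] SPEEDUP = {1.0, 2.0, 2.5, 2.8};

    // Linear model of compile cost: seconds = rate[level] * methodSize
    // (assumed per-unit rates; higher levels compile more slowly).
    static final double[] COST_PER_UNIT = {0.0, 1e-4, 3e-4, 1e-3};

    // Return the best level j, or currentLevel if no recompilation pays off.
    // timeRunSoFar stands in for Tf (program assumed to run as long again);
    // pm is the estimated fraction of future time spent in m.
    static int choose(int currentLevel, double timeRunSoFar,
                      double pm, int methodSize) {
        double ti = timeRunSoFar * pm;   // future time in m if left alone
        int best = currentLevel;
        double bestTotal = ti;
        for (int j = currentLevel + 1; j < SPEEDUP.length; j++) {
            double tj = ti * SPEEDUP[currentLevel] / SPEEDUP[j];
            double cj = COST_PER_UNIT[j] * methodSize;
            if (cj + tj < bestTotal) {   // recompile only if Cj + Tj < Ti
                bestTotal = cj + tj;
                best = j;
            }
        }
        return best;
    }
}
```

Note how the model naturally optimizes less at startup: early on, timeRunSoFar is small, so Ti is small and no compile cost can be recovered.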
Subsystems – Controller
• System estimates the effectiveness of each optimization level as a constant, based on offline measurements
• Uses a linear model of compilation speed for each optimization level as a function of method size
– Linearity of higher-level optimizations?
Subsystems – Recompilation
• In theory– Multiple compilation threads that invoke compilers
– Can run in parallel with the application
• In practice– Single compilation thread
• Some JVM services require the master lock– Multiple compilation threads are not effective
– Lock contention between compilation and application threads
– Left as a footnote!
• Recompilation times are stored to improve time estimates in cost/benefit analysis
Feedback-Directed Inlining
• Statistical samples of method calls used to build a dynamic call graph
– Traverse call stack at yields
• Identify hot edges
– Recompile caller methods with the inlined callee (even if the caller was already optimized)
• Decay old edges
• Adaptive Inlining Organizer
– Determines hot edges and hot methods worth recompiling with the inlined method call
– Weights inline rules with a boost factor
• Based on the number of calls on the call edge and a previous study on the effects of removing call overhead
• Future work: more sophisticated heuristics
• Seems obvious: new inline optimizations don’t eliminate old inlines
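The dynamic call graph driving feedback-directed inlining can be sketched like this. Names and the hotness criterion (edge weight as a fraction of all sampled calls) are assumptions for illustration; the paper's organizer additionally applies a boost factor to the compiler's inlining rules.

```java
import java.util.HashMap;
import java.util.Map;

class DynamicCallGraph {
    // Edge weights keyed by "caller->callee", built from call-stack
    // samples taken at thread yields.
    private final Map<String, Double> edgeWeight = new HashMap<>();

    void sampleEdge(String caller, String callee) {
        edgeWeight.merge(caller + "->" + callee, 1.0, Double::sum);
    }

    // Periodic decay emphasizes recent calling behavior.
    void decay(double factor) {
        edgeWeight.replaceAll((e, w) -> w * factor);
    }

    // An edge is hot when it carries at least `threshold` of all
    // sampled calls; hot edges are candidates for inlining.
    boolean isHot(String caller, String callee, double threshold) {
        double total = edgeWeight.values().stream()
                .mapToDouble(Double::doubleValue).sum();
        double w = edgeWeight.getOrDefault(caller + "->" + callee, 0.0);
        return total > 0 && w / total >= threshold;
    }
}
```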
Experimental Methodology
• System
– Dual 333 MHz PowerPC processors, 1 GB memory
• Timer interrupts at 10 ms intervals
• Recompilation organizer runs from 2 times per second to once every 4 s
• DCG and adaptive inlining organizers run every 2.5 seconds
• Method sample half-life: 1.7 seconds
• Edge weight half-life: 7.3 seconds
• SPECjvm98
• Jalapeño Optimizing Compiler
• Volano chat room simulator
• Startup and steady-state measurements
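The half-lives above translate into a per-application decay factor. Assuming the decay organizer multiplies weights by a constant factor each time it runs, choosing 0.5^(interval / halfLife) halves a stale sample's weight every halfLife seconds, whatever the organizer's interval:

```java
class Decay {
    // Decay factor applied each organizer run, so that a weight
    // halves every halfLifeSec seconds of elapsed time.
    static double factor(double intervalSec, double halfLifeSec) {
        return Math.pow(0.5, intervalSec / halfLifeSec);
    }
}
```

For example, with the 1.7 s method-sample half-life, an organizer running every 1.7 s would multiply weights by 0.5, while one running twice as often would use about 0.707 per run.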
Results
• Compile time overhead plays large role in startup
Results
• Multilevel Adaptive does well (and JITs don’t have overhead)
Results
• Startup doesn’t reach high enough optimization level to benefit
Questions
• Assuming execution time will be twice the current duration is completely arbitrary, but has a nice outcome (less optimization at startup, more at steady state)
• Measurements of optimizations vs. phase shifts are meaningless
– Due to the execution time estimation
Questions
• Does it scale?
– More online-feedback optimizations
• More threads needing cycles
– Organizer threads
– Recompilation threads
• More data to measure
• Especially slow if there can only be one recompilation thread
• More complicated cost/benefit analysis
– Potential speedups and estimated compilation times