TRANSCRIPT
Fully Dynamic Specialization
AJ Shankar
OSQ Lunch
9 December 2003
“That’s Why They Play the Game”
Programs are executed because we can’t determine their behavior statically!
Idea: optimize programs dynamically to take advantage of runtime information we can’t get statically. Look at portions of the program for predictable inputs that we can optimize for.
Specialization
Recompile portions of the program, using known runtime values as constants
Possibly many variants of the same code
Allow for fallback to original code when assumptions are not met
Predictable == recurrent
[Diagram: a generic version G of the code alongside specialized variants P2, P3, P4, partitioned into predictable and unpredictable regions; a predictable instruction (LOAD pc, X = …) marks the entry to the specialized code]
How It Works
1. Choose a good region of code to specialize: after a good predictable instruction
2. Insert a dispatch that checks the result of the chosen instruction
3. Recompile the code for different results of the instruction
4. During execution, jump to the appropriate specialized code
[Diagram: Dispatch(X) selects among Spec1, Spec2, and Default; all paths rejoin at the rest of the code]
When Is This a Good Idea?
Any app whose execution is heavily dependent on its input
For instance: interpreters, raytracers, dynamic content producers (CGI scripts, etc.)
Specialization Is Hard!
Specializing code at runtime is costly; it can even slow the program down
Existing specializers rely on static annotations to clue them in about profitable areas
Difficult to get right
Limits specialization potential
Existing: DyC, Cyclone, etc.
Explicitly annotate static data
No support for automatic specialization of frequently-executed code; could compile lots of useless stuff
No concrete store information: doesn’t take advantage of the fact that memory location X is constant for the lifetime of the program
Existing: Calpa
Mock et al., 2000. Extension to DyC.
Profile execution on sample input to derive annotations
But converting a concrete profile to an abstract annotation loses information:
Still unable to detect concrete memory constants
Is the frequently executed code still hot for arbitrary input?
Still needs source, is offline!
Motivating Example: Interpreter
while (1) {
  i = instrs[pc];
  switch (i.opcode) {
  case ADD:
    env[i.res] = env[i.op1] + env[i.op2];
    pc++;
    break;
  case BNEQ:
    if (env[i.op1] != 0)
      pc = i.op2;
    else
      pc++;
    break;
  ...
  }
}
Sample interpreted program:
X = 10;
…
WHILE (Z != 0) {
  Y = X + Z;
  …
}
• X is constant after initialization, in a concrete memory location
• Y = X + Z executed frequently
Motivating Example: Interpreter
Specialized version (unrolled for pc == 15):
while (1) {
  while (pc == 15) {
    // Y = X + Z
    env[3] = 10 + env[2];
    …
    // Z != 0 ?
    if (env[2] == 0)
      pc = 19;
  }
  // else: normal interpreter loop
}
A More Concrete Approach
Do everything at runtime!
Specialize on execution-time hot values
Know which concrete memory locations are constant
Other benefits of this approach:
Specialize temporally, as execution progresses
Specialize dynamically loaded libraries as well
No annotations or source code necessary
A Quick Recap
1. Choose a good region of code to specialize
2. Insert a dispatch that checks the result of the chosen instruction (the “trigger”)
3. Recompile the code for different values of a hot instruction
4. During execution, jump to the appropriate specialized code
[Diagram: Dispatch(pc) selects specialized code for pc=15 or pc=27, or falls back to the generic while(1) loop, then continues with the rest of the code]
The Details
Need to identify the best predictable instruction: specializing on its result should provide the greatest benefit
To find it, gather profile information about all instructions
Then need to actually do the specializing
Instrumentation: Hot Values
What’s a hot value? One that occurs frequently as the result of an instruction; x % 2 has two very hot values, 0 and 1
Good candidate instructions are predictable: they result in (only) a few hot values
For instance, small_constant_table[x], but not rand(x)
Case study: Interpreter
Predictable instructions: LOAD pc and instr.opcode
  instr = instrs[pc];
  switch (instr.opcode) { … }
Instrumentation: Store Profile
Keep track of memory locations that have been written to
Idea: if a location hasn’t been written to yet, it probably won’t be later, either
Case study: Interpreter
Store profile says env[Y] is written to a lot, but env[X] and instrs[] are never written to:
  regs[instr.res] = regs[instr.op1] + regs[instr.op2];
Invalidating Specialized Code
Memory locations may not really be constant
When ‘constant’ memory is overwritten, must invalidate or modify specializations that depended on it
How does Calpa handle invalidation?
Computes points-to set
Inserts invalidation calls at all appropriate points (offline)
Too costly an approach, without modification
Invalidation Options
Write barrier: still feasible if the field is private
On-entry checks: feasible if specialization depends on a small number of memory locations, e.g. Factor(BigInt x)
Hardware support: e.g. Mondrian; the ideal solution; possible to simulate?
class Interpreter {
  private Instruction[] instrs;
  void SetInstrs(Instruction[] is) {
    instrs = is;
  }
}
[Diagram: Dispatch on the hot instruction selects Spec1 or Default; a CheckMem step guards the specialized path, and writes to depended-on memory trigger Invalidate]
Specialization Algorithm
1. Find good candidate instructions: predictable, frequently executed
2. For each candidate instruction, simultaneously evaluate the method using constant propagation for some of its hot values, and compute the overall cost/benefit
3. Choose the best instruction
Specializing the Interpreter
Candidates:
instr.opcode: executed very frequently; a small handful of values
pc: executed very frequently; more values, but still reasonable
Specializing on instr.opcode

Original loop:
  LOOP: i = instrs[pc]
        switch (i.opcode)
        …
  case ADD:
        env[i.res] = env[i.op1] + env[i.op2]
        pc = pc + 1
        goto LOOP

Specialized for i.opcode = ADD:
  Dispatch(opcode)
  case ADD:
        env[i.res] = env[i.op1] + env[i.op2]
        pc = pc + 1
        goto LOOP

Knowing i.opcode = ADD eliminates only the switch, so the benefit is small (benefit = 3). Other values of opcode have similar results…
Specializing on pc

Original loop:
  LOOP: i = instrs[pc]
        switch (i.opcode)
        …
  case ADD:
        env[i.res] = env[i.op1] + env[i.op2]
        pc = pc + 1
        goto LOOP

Specialized for pc = 15 (instrs[] is never written, so i = instrs[15] folds to ADD Y, X, Z; env[X] folds to 10):
  Dispatch(pc)
  LOOP: i = instrs[15]            // i = ADD Y, X, Z
        switch (ADD)
  case ADD:
        env[Y] = 10 + env[Z]      // Y = X + Z
        pc = 15 + 1
        i = instrs[16]            // i = BNEQ Z, 15
        switch (BNEQ)
        if (env[Z] != 0)
          pc = 15                 // back into the specialized code
        else
          pc++; …

Every folded load, switch, and constant operand adds to the benefit (1, 2, 3, … 6, 7, 8, 9, 10, …), and constant propagation keeps cascading through the unrolled loop.
Final Result
Choose to specialize on pc because benefit is far greater than for instr.opcode
Generate different versions for each of the hottest values of pc
Terminate loop unrolling either naturally (when we don’t know what pc is anymore) or with a simple heuristic
Implementation Ideas
Use Dynamo: hot trace as the basis for specialization
Intuitively, follow the lifetime of an object as it travels through the program, across function boundaries
Unfortunately, Dynamo is closed-source, and its API isn’t expressive enough
Implementation Ideas
JikesRVM: a Java VM written in Java
Has a primitive framework for sampling
Has a fairly sophisticated framework for dynamic recompilation
Does aggressive inlining
Only instrument hot traces (but the compiler is slow…)