TRANSCRIPT
Fully Dynamic Specialization
AJ Shankar
OSQ Lunch
9 December 2003
“That’s Why They Play the Game”
Programs are executed because we can’t determine their behavior statically!
Idea: optimize programs dynamically to take advantage of runtime information we can’t get statically. Look at portions of the program for predictable inputs that we can optimize for.
Specialization
Recompile portions of the program, using known runtime values as constants
Possibly many variants of the same code
Allow for fallback to original code when assumptions are not met
Predictable == recurrent
[Diagram: a generic version G of the code alongside specialized variants P2, P3, P4, partitioned into predictable and unpredictable regions; a predictable instruction (LOAD pc, X = …) marks the entry to the specialized code]
How It Works
1. Choose a good region of code to specialize: after a good predictable instruction
2. Insert a dispatch that checks the result of the chosen instruction
3. Recompile the code for different results of the instruction
4. During execution, jump to the appropriate specialized code
[Diagram: Dispatch(X) selects among Spec1, Spec2, and Default; all paths rejoin at the rest of the code]
When Is This a Good Idea?
Any app whose execution is heavily dependent on its input
For instance: interpreters, raytracers, dynamic content producers (CGI scripts, etc.)
Specialization Is Hard!
Specializing code at runtime is costly; it can even slow the program down
Existing specializers rely on static annotations to clue them in about profitable areas
Difficult to get right
Limits specialization potential
Existing: DyC, Cyclone, etc.
Explicitly annotate static data
No support for automatic specialization of frequently-executed code; could compile lots of useless stuff
No concrete store information: doesn’t take advantage of the fact that memory location X is constant for the lifetime of the program
Existing: Calpa
Mock et al., 2000. Extension to DyC.
Profile execution on sample input to derive annotations
But converting a concrete profile to an abstract annotation loses information:
Still unable to detect concrete memory constants
Is the frequently executed code still hot for arbitrary input?
Still needs source, is offline!
Motivating Example: Interpreter
while (1) {
  i = instrs[pc];
  switch (i.opcode) {
  case ADD:
    env[i.res] = env[i.op1] + env[i.op2];
    pc++;
    break;
  case BNEQ:
    if (env[i.op1] != 0)
      pc = i.op2;
    else
      pc++;
    break;
  ...
  }
}
Sample interpreted program:
X = 10;
…
WHILE (Z != 0) {
  Y = X + Z;
  …
}
• X is constant after initialization, in a concrete memory location
• Y = X + Z executed frequently
Motivating Example: Interpreter
Specialized version (unrolled for pc == 15):
while (1) {
  while (pc == 15) {
    // Y = X + Z
    env[3] = 10 + env[2];
    …
    // Z != 0 ?
    if (env[2] == 0)
      pc = 19;
  }
  // else: normal interpreter loop
}
A More Concrete Approach
Do everything at runtime!
Specialize on execution-time hot values
Know which concrete memory locations are constant
Other benefits of this approach:
Specialize temporally, as execution progresses
Specialize dynamically loaded libraries as well
No annotations or source code necessary
A Quick Recap
1. Choose a good region of code to specialize
2. Insert a dispatch that checks the result of the chosen instruction (the “trigger”)
3. Recompile the code for different values of a hot instruction
4. During execution, jump to the appropriate specialized code
[Diagram: Dispatch(pc) selects specialized code for pc=15 or pc=27, or falls back to the generic while(1) loop, then continues with the rest of the code]
The Details
Need to identify the best predictable instruction: specializing on its result should provide the greatest benefit
To find it, gather profile information about all instructions
Then need to actually do the specializing
Instrumentation: Hot Values
What’s a hot value? One that occurs frequently as the result of an instruction; x % 2 has two very hot values, 0 and 1
Good candidate instructions are predictable: they result in (only) a few hot values
For instance, small_constant_table[x], but not rand(x)
Case study: Interpreter
Predictable instructions: LOAD pc and instr.opcode
  instr = instrs[pc];
  switch (instr.opcode) { … }
Instrumentation: Store Profile
Keep track of memory locations that have been written to
Idea: if a location hasn’t been written to yet, it probably won’t be later, either
Case study: Interpreter
Store profile says env[Y] is written to a lot, but env[X] and instrs[] are never written to:
  regs[instr.res] = regs[instr.op1] + regs[instr.op2];
Invalidating Specialized Code
Memory locations may not really be constant
When ‘constant’ memory is overwritten, must invalidate or modify specializations that depended on it
How does Calpa handle invalidation?
Computes points-to set
Inserts invalidation calls at all appropriate points (offline)
Too costly an approach, without modification
Invalidation Options
Write barrier: still feasible if the field is private
On-entry checks: feasible if specialization depends on a small number of memory locations, e.g. Factor(BigInt x)
Hardware support: e.g. Mondrian; the ideal solution; possible to simulate?
class Interpreter {
  private Instruction[] instrs;
  void SetInstrs(Instruction[] is) {
    instrs = is;
  }
}
[Diagram: Dispatch on the hot instruction selects Spec1 or Default; a CheckMem step guards the specialized path, and writes to depended-on memory trigger Invalidate]
Specialization Algorithm
1. Find good candidate instructions: predictable, frequently executed
2. For each candidate instruction, simultaneously evaluate the method using constant propagation for some of its hot values, and compute the overall cost/benefit
3. Choose the best instruction
Specializing the Interpreter
Candidates:
instr.opcode: executed very frequently; a small handful of values
pc: executed very frequently; more values, but still reasonable
Specializing on instr.opcode

Original loop:
  LOOP: i = instrs[pc]
        switch (i.opcode)
        …
  case ADD:
        env[i.res] = env[i.op1] + env[i.op2]
        pc = pc + 1
        goto LOOP

Specialized for i.opcode = ADD:
  Dispatch(opcode)
  case ADD:
        env[i.res] = env[i.op1] + env[i.op2]
        pc = pc + 1
        goto LOOP

Knowing i.opcode = ADD eliminates only the switch, so the benefit is small (benefit = 3). Other values of opcode have similar results…
Specializing on pc

Original loop:
  LOOP: i = instrs[pc]
        switch (i.opcode)
        …
  case ADD:
        env[i.res] = env[i.op1] + env[i.op2]
        pc = pc + 1
        goto LOOP

Specialized for pc = 15 (instrs[] is never written, so i = instrs[15] folds to ADD Y, X, Z; env[X] folds to 10):
  Dispatch(pc)
  LOOP: i = instrs[15]            // i = ADD Y, X, Z
        switch (ADD)
  case ADD:
        env[Y] = 10 + env[Z]      // Y = X + Z
        pc = 15 + 1
        i = instrs[16]            // i = BNEQ Z, 15
        switch (BNEQ)
        if (env[Z] != 0)
          pc = 15                 // back into the specialized code
        else
          pc++; …

Every folded load, switch, and constant operand adds to the benefit (1, 2, 3, … 6, 7, 8, 9, 10, …), and constant propagation keeps cascading through the unrolled loop.
Final Result
Choose to specialize on pc because benefit is far greater than for instr.opcode
Generate different versions for each of the hottest values of pc
Terminate loop unrolling either naturally (when we don’t know what pc is anymore) or with a simple heuristic
Implementation Ideas
Use Dynamo: hot trace as the basis for specialization
Intuitively, follow the lifetime of an object as it travels through the program, across function boundaries
Unfortunately, Dynamo is closed-source, and its API isn’t expressive enough
Implementation Ideas
JikesRVM: a Java VM written in Java
Has a primitive framework for sampling
Has a fairly sophisticated framework for dynamic recompilation
Does aggressive inlining
Only instrument hot traces (but the compiler is slow…)