
Page 1:

Fully Dynamic Specialization

AJ Shankar

OSQ Lunch

9 December 2003

Page 2:

“That’s Why They Play the Game”

Programs are executed because we can’t determine their behavior statically!

Idea: Optimize programs dynamically to take advantage of runtime information we can’t get statically.
Look at portions of the program for predictable inputs that we can optimize for.

Page 3:

Specialization

Recompile portions of the program, using known runtime values as constants
Possibly many variants of the same code
Allow for fallback to original code when assumptions are not met

Predictable == recurrent

[Diagram: a generic version G alongside specialized variants P2, P3, P4; predictable inputs dispatch to a specialized variant, while unpredictable inputs fall back to the generic code.]
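
To make this concrete, here is a minimal sketch in C (a hypothetical function, not from the talk) of a specialized variant guarded by a fallback check:

    /* Generic version: correct for any alpha. */
    int blend(int src, int dst, int alpha) {
        return (src * alpha + dst * (255 - alpha)) / 255;
    }

    /* Variant generated at runtime after observing that alpha is almost always 255.
     * The guard falls back to the generic code when the assumption is not met. */
    int blend_alpha255(int src, int dst, int alpha) {
        if (alpha != 255)                 /* assumption not met: fall back */
            return blend(src, dst, alpha);
        return src;                       /* (src*255 + dst*0) / 255 == src */
    }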

Page 4:


How It Works

Choose a good region of code to specialize: after a good predictable instruction

Insert dispatch that checks the result of the chosen instruction

Recompile code for different results of the instruction

During execution, jump to appropriate specialized code

[Diagram: after the trigger instruction (e.g. LOAD pc), a Dispatch(X) node branches to Spec1, Spec2, or Default, and all paths rejoin the rest of the code.]
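
A minimal sketch (using hypothetical helper names, and anticipating the interpreter example below) of the inserted dispatch, which checks the trigger’s result and routes execution to a specialized version or to the default:

    /* Hypothetical specialized bodies and the generic fallback. */
    extern void specialized_pc15(int *env);
    extern void specialized_pc27(int *env);
    extern void interpret_generic(void);

    /* Dispatch inserted right after the trigger instruction (the load of pc). */
    void dispatch_on_pc(int pc, int *env) {
        switch (pc) {
        case 15: specialized_pc15(env); break;   /* variant compiled for pc == 15 */
        case 27: specialized_pc27(env); break;   /* variant compiled for pc == 27 */
        default: interpret_generic();   break;   /* fall back to the original code */
        }
    }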

Page 5:

When Is This a Good Idea?

Any app whose execution is heavily dependent on input

For instance: interpreters, raytracers, dynamic content producers (CGI scripts, etc.)

Page 6:

Specialization Is Hard!

Specializing code at runtime is costly
  It can even slow the program down

Existing specializers rely on static annotations to clue them in about profitable areas
  Difficult to get right
  Limits specialization potential

Page 7:

Existing: DyC, Cyclone, etc.

Explicitly annotate static data
  No support for automatic specialization of frequently-executed code
  Could compile lots of useless stuff

No concrete store information
  Doesn’t take advantage of the fact that memory location X is constant for the lifetime of the program

Page 8:

Existing: Calpa

Mock et al., 2000. Extension to DyC.
Profile execution on sample input to derive annotations
But converting a concrete profile to an abstract annotation means:
  Still unable to detect concrete memory constants
  Frequently executed code for arbitrary input?
Still needs source, is offline!

Page 9:

Motivating Example: Interpreter

while (1) {
  i = instrs[pc];
  switch (i.opcode) {
  case ADD:                  // ADD res, op1, op2
    env[i.res] = env[i.op1] + env[i.op2];
    pc++;
    break;
  case BNEQ:                 // BNEQ op1, op2: branch to op2 if env[op1] != 0
    if (env[i.op1] != 0)
      pc = i.op2;
    else
      pc++;
    break;
  ...
  }
}

Sample interpreted program:

X = 10;
…
WHILE (Z != 0) {
  Y = X + Z;
  …
}

• X is constant after initialization, in a concrete memory location
• Y = X+Z executed frequently

Page 10:

Motivating Example: Interpreter

while (1) {
  i = instrs[pc];
  switch (i.opcode) {
  case ADD:                  // ADD res, op1, op2
    env[i.res] = env[i.op1] + env[i.op2];
    pc++;
    break;
  case BNEQ:                 // BNEQ op1, op2: branch to op2 if env[op1] != 0
    if (env[i.op1] != 0)
      pc = i.op2;
    else
      pc++;
    break;
  ...
  }
}

Sample interpreted program:

X = 10;
…
WHILE (Z != 0) {
  Y = X + Z;
  …
}

while (1) {
  while (pc == 15) {
    // Y = X + Z
    env[3] = 10 + env[2];
    …
    // Z != 0 ?
    if (env[2] == 0)
      pc = 19;
  }
  // else: the normal interpreter loop handles other values of pc
}

Page 11:

A More Concrete Approach

Do everything at runtime!
  Specialize on execution-time hot values
  Know which concrete memory locations are constant

Other benefits of this approach:
  Specialize temporally, as execution progresses
  Specialize dynamically loaded libraries as well
  No annotations or source code necessary

Page 12:


A Quick Recap

Choose a good region of code to specialize

Insert dispatch that checks the result of the chosen instruction (the “trigger”)

Recompile code for different values of a hot instruction

During execution, jump to appropriate specialized code

[Diagram: after the trigger (LOAD pc), a Dispatch(pc) node branches to specialized versions for pc=15 and pc=27, or to the original while(1) loop as the default, and all paths rejoin the rest of the code.]

Page 13:

The Details

Need to identify the best predictable instruction
  Specializing on its result should provide the greatest benefit
  To find it, gather profile information about all instructions

Need to actually do the specializing

Page 14:

Instrumentation: Hot Values

What’s a hot value? One that occurs frequently as the result of an instruction
  x % 2 has two very hot values, 0 and 1

Good candidate instructions are predictable: result in (only) a few hot values
  For instance, small_constant_table[x], but not rand(x)

Case study: Interpreter
  Predictable instructions: LOAD pc, instr.opcode
    instr = instrs[pc];
    switch (instr.opcode) { … }
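
As an illustration (an assumption about the instrumentation, not the talk’s implementation), a small C sketch of a per-instruction value profile that records how often each result value occurs:

    #include <stdint.h>

    #define SLOTS 4   /* track at most a few candidate hot values per site */

    /* Per-instruction value profile: a tiny table of (value, count) pairs. */
    struct value_profile {
        int64_t  value[SLOTS];
        uint64_t count[SLOTS];
        uint64_t other;        /* results that did not fit in the table */
        uint64_t total;
    };

    /* Called by inserted instrumentation with the result of the profiled
     * instruction. The instruction is predictable if a few slots dominate. */
    static void profile_value(struct value_profile *p, int64_t result) {
        p->total++;
        for (int i = 0; i < SLOTS; i++) {
            if (p->count[i] != 0 && p->value[i] == result) { p->count[i]++; return; }
            if (p->count[i] == 0) { p->value[i] = result; p->count[i] = 1; return; }
        }
        p->other++;   /* too many distinct values: likely unpredictable */
    }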

Page 15:

Instrumentation: Store Profile

Keep track of memory locations that have been written to

Idea: if a location hasn’t been written to yet, it probably won’t be later, either

Case study: Interpreter
  Store profile says env[Y] written to a lot, but env[X], instrs[] never written to
    regs[instr.res] = regs[instr.op1] + regs[instr.op2];
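
A minimal sketch (assumed, not from the talk) of such a store profile: a coarse hash table of write counts that instrumented stores bump, so locations whose bucket stays at zero can be treated as likely constants:

    #include <stdint.h>

    #define STORE_BUCKETS 4096

    /* Approximate write counts, indexed by a hash of the address. */
    static uint32_t store_profile[STORE_BUCKETS];

    static inline void record_store(const void *addr) {    /* called at each store */
        uintptr_t a = (uintptr_t)addr;
        store_profile[(a >> 3) % STORE_BUCKETS]++;
    }

    static inline int looks_constant(const void *addr) {   /* queried during analysis */
        uintptr_t a = (uintptr_t)addr;
        return store_profile[(a >> 3) % STORE_BUCKETS] == 0;
    }

Collisions only err on the safe side: a never-written location sharing a bucket with a written one may be missed as a constant, but a written location is never reported as constant.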

Page 16:

Invalidating Specialized Code

Memory locations may not really be constant

When ‘constant’ memory is overwritten, must invalidate or modify specializations that depended on it

How does Calpa handle invalidation?
  Computes points-to set
  Inserts invalidation calls at all appropriate points (offline)
Too costly an approach, without modification

Page 17:

Invalidation Options

Write barrier
  Still feasible if field is private

On-entry checks
  Feasible if specialization depends on a small number of memory locations
  e.g. Factor(BigInt x)

Hardware support
  e.g. Mondrian
  Ideal solution
  Possible to simulate?

class Interpreter {
  private Instruction[] instrs;
  void SetInstrs(Instruction[] is) {
    instrs = is;
  }
}

[Diagram: around the hot instruction, Dispatch selects Spec1 or Default; a CheckMem node performs the on-entry check and can Invalidate the specialized code.]
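
A minimal sketch (assumed, not the talk’s code) of an on-entry check: before running a specialized body, verify the few memory locations it treats as constant, and fall back or invalidate if any has changed:

    /* Snapshot of the 'constant' locations taken when the code was specialized.
     * All names are hypothetical. */
    struct spec_guard {
        const int *addr;       /* watched location, e.g. &env[X]      */
        int        expected;   /* value assumed during specialization */
    };

    static int check_on_entry(const struct spec_guard *g, int n) {
        for (int i = 0; i < n; i++)
            if (*g[i].addr != g[i].expected)
                return 0;      /* assumption broken: invalidate / fall back */
        return 1;              /* safe to run the specialized version */
    }

This stays cheap only when the specialization depends on a handful of locations; a write barrier instead shifts the cost onto the (hopefully rare) stores.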

Page 18:

Specialization Algorithm

1. Find good candidate instructions
   Predictable
   Frequently executed
2. For each candidate instruction
   Simultaneously evaluate the method using constant propagation for some of its hot values
   Compute overall cost/benefit (see the sketch below)
3. Choose the best instruction
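
A minimal sketch (an assumed scoring function, not the algorithm from the talk) of the cost/benefit comparison in step 2:

    /* Hypothetical per-candidate summary gathered from the profiles. */
    struct candidate {
        double exec_count;       /* how often the instruction executes              */
        double hot_value_cover;  /* fraction of executions hitting its hot values   */
        double saved_per_exec;   /* work removed by constant propagation            */
        double compile_cost;     /* one-time cost of generating the specializations */
    };

    /* Net benefit: expected runtime savings minus recompilation cost.
     * The candidate with the largest positive score is chosen in step 3. */
    static double score(const struct candidate *c) {
        return c->exec_count * c->hot_value_cover * c->saved_per_exec
               - c->compile_cost;
    }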

Page 19:

Specializing the Interpreter

while (1) {
  i = instrs[pc];
  switch (i.opcode) {
  case ADD:                  // ADD res, op1, op2
    env[i.res] = env[i.op1] + env[i.op2];
    pc++;
    break;
  case BNEQ:                 // BNEQ op1, op2: branch to op2 if env[op1] != 0
    if (env[i.op1] != 0)
      pc = i.op2;
    else
      pc++;
    break;
  ...
  }
}

Candidates:

instr.opcode:
  Executed very frequently
  A small handful of values

pc:
  Executed very frequently
  More values, but still reasonable

Page 20:

Specializing on instr.opcode

[Diagram: the interpreter loop (LOOP: i = instrs[pc]; switch (i.opcode); case ADD: env[i.res] = env[i.op1] + env[i.op2]; pc = pc + 1; goto LOOP) specialized with the fact i.opcode = ADD. The switch collapses to switch(ADD), going straight to the ADD case, but the loads, the add, and the pc increment cannot be simplified further, so the benefit counters only reach 1 to 3. A Dispatch(opcode) node selects among the per-opcode versions.]

Other values of opcode have similar results…

Page 21:

Specializing on pc

[Diagram: the interpreter loop specialized with the fact pc = 15. Constant propagation resolves i = instrs[15] to ADD Y, X, Z, so switch (i.opcode) becomes switch(ADD), the ADD case becomes env[Y] = 10 + env[Z] (using the known memory constant env[X] = 10), and pc = 15 + 1 folds to 16. Unrolling continues: i = instrs[16] resolves to BNEQ Z, 15, giving if (env[Z] != 0) pc = 15. The benefit counters keep climbing (1, 2, 3, … 6, 7, 8, 9, 10, …) as more of the interpreted statement Y = X + Z is compiled away. A Dispatch(pc) node selects the specialized trace.]

Page 22:

Final Result

Choose to specialize on pc because benefit is far greater than for instr.opcode

Generate different versions for each of the hottest values of pc

Terminate loop unrolling either naturally (when we don’t know what pc is anymore) or with a simple heuristic
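
A minimal sketch (an assumed heuristic, not something specified in the talk) of terminating the unrolling: stop when the next pc is no longer a known constant, or when a simple size budget runs out:

    #define MAX_UNROLLED 64    /* heuristic cap on the length of the specialized trace */
    #define UNKNOWN_PC   (-1)

    /* Hypothetical code-generator hooks. */
    extern void emit_specialized_instruction(int pc);   /* emit instrs[pc] with pc folded in     */
    extern int  next_pc_if_constant(int pc);            /* UNKNOWN_PC at a data-dependent branch */
    extern void emit_dispatch_fallback(void);           /* jump back to the generic interpreter  */

    void unroll_from(int pc) {
        int emitted = 0;
        while (pc != UNKNOWN_PC && emitted < MAX_UNROLLED) {
            emit_specialized_instruction(pc);
            pc = next_pc_if_constant(pc);
            emitted++;
        }
        emit_dispatch_fallback();                       /* handle the unknown continuation */
    }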

Page 23:

Implementation Ideas

Use Dynamo
  Hot trace as basis for specialization
  Intuitively, follow the lifetime of an object as it travels through the program, across function boundaries

Unfortunately, closed-source, and the API isn’t expressive enough

Page 24:

Implementation Ideas

JikesRVM
  Java VM written in Java
  Has a primitive framework for sampling
  Has a fairly sophisticated framework for dynamic recompilation
  Does aggressive inlining
  Only instrument hot traces (but the compiler is slow…)