trace-based compilationckrintz/classes/s20/cs... · code cache •translated code blocks (traces)...

37
Trace-based Compilation

Upload: others

Post on 19-Jul-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Trace-based Compilation

Page 2: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Compilation• Dynamic compilation

n Compilation of methods at runtime at any time & repeatedly4Table-base compilation precludes the need for code patching

u Adds a level of indirection (memory access) for table lookup4Can be performed in the background

• Enables multi-mode (mixed mode) compilationn Interpret first, compile when hot (= executed repeatedly)n Fast-compile (no opts) first (Just-In-Time compilation)

4Optimize (perhaps repeatedly) over time• Enables adaptive compilation

n (Re-)compile hot methods and optimize based on their execution profile, behavior

• Profiling/sampling required to identify when/how to do so

Page 3: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Compilation• Trace-based

n Same thing, only for a series of instructions (e.g. a path through a method) -- not an entire method4Store native code traces in a code cache

n Link traces together when possible to avoid going back and forth between executing code and the runtime

n Used in 4Binary translators/instrumentors: Pin, virtualization systems4Binary optimizers: Dynamo, Sun Studio, SOLAR, ADORE4VMs for high level languages

u Javascript: Mozilla’s SpiderMonkey, SPURu Java: Yeti, HotpathVM, Hotspot TraceJIT, Testerossa, Dalvik u Python: PyPyu Lua: LuaJIT

Page 4: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Translated Code Blocks• Emulator translates one block at a time into a CFG

n What are static BBs? --> require >1 pass over code to construct control flow graph (CFG)

add . . .load . . .store . . .

loop: load . . .add . . .store . . .brcond skipload . . .sub . . .

skip: add . . .store . . .brcond loopadd . . .store . . .jump indirect. . .

Find the leaders:1) first instruction in method2) Instruction after branch/jump3) Target of a branch/jump... Then draw the blocks and edges

Page 5: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Translated Code Blocks• Emulator translates one block at a time into a CFG

n What are static BBs? --> require >1 pass over code to construct control flow graph (CFG)4Note that each instruction is in ONLY ONE basic block

add . . .load . . .store . . .

loop: load . . .add . . .store . . .brcond skipload . . .sub . . .

skip: add . . .store . . .brcond loopadd . . .store . . .jump indirect. . .

BB1

BB2

BB3

BB4

BB5

Find the leaders:1) first instruction in method2) Instruction after branch/jump3) Target of a branch/jump... Then draw the blocks and edges

Page 6: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Tracing JIT: Translated Code Blocks

• Emulator translates one block at a time: n Dynamic basic blocks not static basic blocks

• Dynamic BBs - determined by actual flow of executing programn Execute the programn Form basic blocks and connect them as you discover the

edgesn You CANNOT go back (one pass only!)

4The cost/trade-off is that some instructions are in multiple basic blocks...

Page 7: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Static Basic Blocks versus Dynamic Basic Blocks

add . . .load . . .store . . .

loop: load . . .add . . .store . . .brcond skipload . . .sub . . .

skip: add . . .store . . .brcond loopadd . . .store . . .jump indirect. . .

BB1

BB2

BB3

BB4

BB5

Page 8: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Static Basic Blocks versus Dynamic Basic Blocks

add . . .load . . .store . . .

loop: load . . .add . . .store . . .brcond skipload . . .sub . . .

skip: add . . .store . . .brcond loopadd . . .store . . .jump indirect. . .

BB1

BB2

BB3

BB4

BB5

not taken, then taken

taken

jumps are always taken

Page 9: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Static Basic Blocks versus Dynamic Basic Blocks

add . . .load . . .store . . .

loop: load . . .add . . .store . . .brcond skipload . . .sub . . .

skip: add . . .store . . .brcond loopadd . . .store . . .jump indirect. . .

BB1

BB2

BB3

BB4

BB5

add . . .load . . .store . . .

loop: load . . .add . . .store . . .brcond skipload . . .sub . . .

[skip:]add . . .store . . .brcond loop

loop: load . . .add . . .store . . .brcond skip

skip: add . . .store . . .brcond loop. . .

BB1

BB2

BB3

BB4

not taken, then taken

taken

jumps are always taken

Page 10: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Translated Code Blocks

• Emulator translates a block at a time: dynamic basic blocks• Dynamic BBs - determined by actual flow of executing

programn Traces are series of dynamic basic blocks

4Single entry, multiple exitn Note that traces can go across function call boundaries

4Essentially inlining on-demandA

CE

BOops!

Oops!

A

C

E

B

D

Page 11: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Code Cache

• Translated code blocks (traces) are stored in a code cache (in native code) for reusen Old, unused blocks are replaced when the cache fills

A

CE

B

Page 12: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Code Cache

• Translated code blocks (traces) are stored in a code cache (in native code) for reusen Old, unused blocks are replaced when the cache fills

• As the translator finds code blocks that are related in terms of control flown A block is the fall through of one in the cachen A block is the jump target of one in the cachen A block jumps or falls through to a block already in cache

n Links them together in some cases, as discovered4When it can avoid jumping back into the interpreter

u For frequently taken branches, or sequential bb’s4Called “chaining”, or “stitching” in our paper

Page 13: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Incremental Predecoding and Tracing

• Translate binary while the program is executing (dynamically)n Predecode or translate new instructions when reached by program

4Incrementally4SPC: source (virtual) program counter4TPC: target program counter

SPC to TPCmap table

EmulationManager

Interpreter

Translatormiss

hit

Control flow

Data access

Byte or source code

Binary traces

SPC

TPC

Page 14: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Code Cache Management

• Differences from memory/hw cachen Traces do not have a fixed sizen Presence and locations of the blocks are dependent on one

another b/c of chainingn There is no copy of the cached blocks in a `backing store‘

Page 15: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Code Cache Management

• Differences from memory/hw cachen Traces do not have a fixed sizen Presence and locations of the blocks are dependent on one

another b/c of chainingn There is no copy of the cached blocks in a `backing store’

• Two key code cache operationsn SPC to TPC mapping (find block in code cache given an SPC)n Given a TPC in code cache, find the corresponding SPC

4This is needed to find the source when an exception occurs

Page 16: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Code Cache Management

• Differences from memory/hw cachen Traces do not have a fixed sizen Presence and locations of the blocks are dependent on one

another b/c of chainingn There is no copy of the cached blocks in a `backing store’

• Two key code cache operationsn SPC to TPC mapping (find block in code cache given an SPC)n Given a TPC in code cache, find the corresponding SPC

4This is needed to find the source when an exception occurs

• Practical implementations limit the size of the code cachen We need a replacement policy for when/if cache fills

Page 17: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Code Cache Replacement Policies• Least Recently Used (LRU)

n Good idea, but hard to implement4Requires extra memory access per block to keep track4Unlinking gets complicated

u All pointers into removed block must be updated to point to interpreteru Can use back pointers

4Can lead to fragmentation (holes where nothing fits)

Page 18: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Code Cache Replacement Policies• Least Recently Used (LRU)

n Good idea, but hard to implement4Requires extra memory access per block to keep track4Unlinking gets complicated

u All pointers into removed block must be updated to point to interpreteru Can use back pointers

4Can lead to fragmentation (holes where nothing fits)• Flush when full• Preemptive flush

n Flush when program behavior changes

Page 19: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Code Cache Replacement Policies• Least Recently Used (LRU)

n Good idea, but hard to implement4Requires extra memory access per block to keep track4Unlinking gets complicated

u All pointers into removed block must be updated to point to interpreteru Can use back pointers

4Can lead to fragmentation (holes where nothing fits)• Flush when full• Preemptive flush

n Flush when program behavior changes• Fine-grained FIFO

n Manage code cache as circular buffern Must still maintain backpointers to keep track of chaining

Page 20: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Code Cache Replacement Policies

• Least Recently Used (LRU)• Flush when full• Preemptive flush• Fine-grained FIFO• Course-grained FIFO

n Partition cache into large blocks, each managed via FIFOn Simplifies backpointer problem b/c only cross-block pointers

must be maintained4Backpointers point to blocks, no backpointers within blocks

Page 21: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Javascript Implementation

• Interpreter/runtime in browser (Node.js outside of browser)• Lookups done by name at runtime b/c of dynamic typing --

variables have no types!n Variable access -- Lookup at runtime -- high overheadn Method calls -- Dynamic dispatch -- high overhead

Page 22: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Javascript Implementation

• Interpreter/runtime in browser (Node.js outside of browser)• Lookups done by name at runtime b/c of dynamic typing --

variables have no types!n Variable access -- Lookup at runtime -- high overheadn Method calls -- Dynamic dispatch -- high overhead

• Complicationsn Inheritance – lookup name in table, then walk hierarchy

doing the same4Local hash tables and global hashtables

n Fields/methods can be added dynamically via use or call4Must always do the lookup to make sure you don’t miss something

Page 23: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Javascript Implementation

• Prototypes vs classes; prototype inheritance vs classicaln See related readings for more on this & automatic type inference

for JS

• Unboxing: decoding an object to determine its type / layoutn High overhead

• Fat/high-level bytecodes (much work done in runtime for each)n Like superinstructions – only at the bytecode leveln Unlike Java/GForth, similar to Python/Ruby – low dispatch raten So traditional interpreter opts that target the dispatch loop have

little impact

Page 24: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Today’s Paper• Trace-based compilation (why? How?)

n Mixed mode (interp + compile traces)n Two assumptions

4Most time spent in hot loops4Hot loops are mostly type-stable

n Traces are extended bb’s – they have multiple exits4Single entry (at the top)

u No joins within/into (single entry)- Entry only through the top- Makes optimizing much simpler

4Functions inlined (a bit complex, b/c we have to still stack frames)u Trace across function boundaries

A

CE

BOops!

Oops!

Page 25: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Today’s Paper• Trace-based compilation (why? How?)

n Mixed mode• Contributions

n A real implementationn Focus on tracing loops only

4Nested trace trees4Why? How does an outer get to an inner trace?

Page 26: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Today’s Paper• Trace-based compilation (why? How?)

n Mixed mode• Contributions

n A real implementationn Focus on tracing loops only

4Nested trace trees4Why? How does an outer get to an inner trace?4Option1: trace inner (hot first), continue tracing through outer back

to inner (code duplication for outer for every side exit and type)u Will overflow the code cache (fixed-size memory area for traces)

4Option2: give up and only trace inner’s

4New option: keep track and separate inner and outer; have outer “call” inner

Page 27: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

TraceMonkey State Machine

Page 28: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Sieve example in Javascript, TraceMonkey LIR for Line 5 – tests/side exits are guards

Type ID in object headerInternal rep/type

Assumptions: primes is anArray type, store to primes[k]completes without error

Page 29: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

X86 for Line 5 in the Sieve example -- Guards are the *’d instructions

Page 30: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Compiler Optimizations Performed on Traces• Optimizations

n Floating point to integer emulation on non-fp architecturesn CSEn Simplification, constant folding, strength reductionn Source-language optimizations (Double -> int)n Dead stack stores, call stack stores, code elimination

• Blacklisting (Why? What? How?)

Page 31: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Compiler Optimizations and Blacklisting• Optimizations

n Floating point to integer emulation on non-fp architecturesn CSEn Simplification, constant folding, strength reductionn Source-language optimizations (Double -> int)n Dead stack stores, call stack stores, code elimination

• Blacklisting n Some traces won’t finish - won’t reach header (exceptions)

4So recording them is a waste4Set a counter so that you don’t record them for awhile4Try again, then if untraceable, mark in bytecode (bad loop header)4Trace lookup in blacklist is time consuming

Page 32: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

38

Static Single Assignment Form

• LIR of a function (intermed. repr.)n Simplifies program analysis and optimizationn Static Single Assignment (SSA) form of a program makes

information about variable definitions and uses explicit• A program is in SSA form if it satisfies:

n each definition (def) has a distinct name; andn each use refers to a single definition.

• To make this work, the compiler inserts special operations, called f-functions, at points where control flow paths join (block E above)

C

ED

Page 33: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

39

SSA Form: f - Functions and joins/join-points

• A f-function behaves as follows:x1 = … x2 = …

x3 = f (x1, x2)

This assigns to x3 the value of x1, if control comes from the left, and that of x2 if control comes from the right.n Its simply a marker for the compiler (not a real function)

4Lets the compiler optimize the block without having to worry about different versions – the Phi function will select it

• Significantly complicates compilation• On entry to a basic block, all the f-functions in the block

execute (conceptually) in parallel.

Page 34: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

40

Static Single Assignment Form

• LIR of a function (intermed. repr.)n Simplifies program analysis and optimizationn Static Single Assignment (SSA) form of a program makes

information about variable definitions and uses explicit• A program is in SSA form if it satisfies:

n each definition (def) has a distinct name; andn each use refers to a single definition.

• To make this work, the compiler inserts special operations, called f-functions, at points where control flow paths joinn Although this is a key part of SSA, traces contain NO joins

4In TraceMonkey

A

CE

BOops!

Oops!

Page 35: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Register Allocation

• Linear scan over the instructions in a tracen From last to first, greedily

v3 = add v1, v2• Spilling

n Heuristic: consider all values in registers at this point (for the instructions after this one)Find oldest register-carried value across them:4Consider values in registers for instruction after this one4Look at instructions before this instruction

u Find last instruction that refers to (uses) each value u Spill the value that has the minimum (smallest index) instruction

4Copy a value to/from memory (runtime stack location) when neededu Insert restore code just after this instructionu Insert spill code just after last use (a previous instruction)

Page 36: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Other Issues and Results

• Calling external functionsn C-function: Foreign Function Interface (FFI)

• Access to global variables, exceptions, to the call stackn Aborts trace, but rarely occurs (note this when you code!)

• Findingsn 9 of 26 benchmarks better than G8 and SFX (all are better

than SpiderMonkey interpreter-only baseline)4Both G8 (hot method compilation) and SFX (opt’d interpretation)4Footprint not measured

Page 37: Trace-based Compilationckrintz/classes/s20/cs... · Code Cache •Translated code blocks (traces) are stored in a code cache (in native code) for reuse nOld, unused blocks are replaced

Results