beamjit: an llvm based just-in-time compiler for …erlang functional language developed by...
TRANSCRIPT
Who am I?
Senior researcher at the Swedish Institute of Computer Science(SICS) working on programming languages, tools and distributedsystems.
What this talk is About
Automatic synthesis of a JIT-compiling VM for Erlang.
Our experiences with LLVM and MCJIT.
Outline
BackgroundJust-In-Time CompilationErlangBEAM: Specification & ImplementationProject Goals
JIT:ing as it applies to BEAMProfilingTracingGenerating and Calling Native CodeFuture Work
LLVM’s Strengths and WeaknessesQuestions
Just-In-Time (JIT) Compilation
Decide at runtime to compile “hot” parts to native code.
Fairly common implementation technique.
Python (Psyco, PyPy)Smalltalk (Cog)Java (HotSpot)JavaScript (SquirrelFish Extreme, SpiderMonkey)
Erlang
Functional language developed by Ericsson.
Soft real-time.
Multi-threaded with share-nothing semantics.
Message passing.
Powerful supervision primitives.
Hot code loading and replacement.
OTP: Framework for writing fault-tolerant applications.
Compiled to virtual machine (VM), BEAM.
BEAM: Specification & Implementation
BEAM is the name of the Erlang VM.
A register machine.
Approximately 150 instructions which are specialized toaround 450 macro-instructions using a peephole optimizerduring code loading.
Instructions are CISC-like.
Hand-written C (mostly) directly threaded interpreter.
No authoritative description of the semantics of the VMexcept the implementation source code!
HiPE – a ahead-of-time native compiler
Traditional back-end for x86, PowerPC, SPARC, ARMErLLVM back-end based on LLVM
Motivation
A JIT compiler increases flexibility.
Compiled BEAM modules are platform independent.
Cross-module optimization.
Integrates naturally with code upgrade.
Project Goals
Goals:
Do as little manual work as possible.
Preserve the semantics of plain BEAM.
Automatically stay in sync with the plain BEAM, i.e. if bugsare fixed in the interpreter the JIT should not have to bemodified manually.
Have a native code generator which is state-of-the-art.
Plan:
Parse and extract semantics from the C implementation.
Transform the parsed C source to C fragments which are thenreassembled into a replacement VM which includes aJIT-compiler.
Outline
BackgroundJust-In-Time CompilationErlangBEAM: Specification & ImplementationProject Goals
JIT:ing as it applies to BEAMProfilingTracingGenerating and Calling Native CodeFuture Work
LLVM’s Strengths and WeaknessesQuestions
Just-In-Time (JIT) Compilation as it Applies toBEAM
Use light-weight profiling to detect when we are at a placewhich is frequently executed.
Trace the flow of execution until we get back to the sameplace.
Compile trace to native code.
NOTE: We are tracing the execution flow in the interpreter,the granularity is finer than BEAM opcodes.
Profile Trace Generate Native Code Run Native
BEAMJIT: What is Needed?
Three basic execution modes
ProfilingTracingNative
Interpreter loop has to be modified to support modeswitching:
Turn on/off tracing.Passing state to/from native code.
Native code generation: Need the semantics for eachinstruction.
Profiling
First step in figuring out what to JIT-compile
Let Erlang compiler insert profile instructions at locationswhich can be the head of a loop
Maintain a time stamp and counter for each location
Measure execution intensity by incrementing a counter if thelocation was visited recently, reset otherwise
Trigger tracing when count is high enough
Blacklist locations which:
Never produce a successful trace.Where we, when executing native code, leave the tracewithout executing the loop at least once.
Outline
BackgroundJust-In-Time CompilationErlangBEAM: Specification & ImplementationProject Goals
JIT:ing as it applies to BEAMProfilingTracingGenerating and Calling Native CodeFuture Work
LLVM’s Strengths and WeaknessesQuestions
Extracting the Semantics of the BEAM Opcodes
Use libclang to parse and simplify the interpreter source:
Use Erlang binding for libclang.
Flatten variable scopes.
Remove loops, replace by if + goto.
Make fall-troughs explicit.
Turn structured C into a spaghetti of Basic Blocks (BB), CFG– Control Flow Graph.
Do liveness-analysis of variables.
bb_4090
bb_4097
bb_2488bb_4109
bb_4115 bb_4113 bb_4111
bb_4365
bb_4372
bb_3878
bb_3880
bb_3872
bb_4014
bb_4580
bb_4561
bb_4485
bb_4488bb_4413
bb_4420
i_bs_skip_bits2_fxrI
bb_4486
do_is_ne_exact_literal
bb_3900
bb_4500
bb_4481
bb_3764
bb_4401
i_bs_match_string_xfII
bb_4008
i_bs_get_float2_fxIsId
bb_4356 bb_4360 bb_4358
bb_4161
bb_4174
bb_4096
bb_4099
i_bs_skip_bits2_fryI
bb_4446 bb_4445
bb_3925
i_bs_get_binary2_frIsId
bb_4063 bb_4061
bb_4073
bb_4065
i_bs_get_utf8_rfd
bb_3911
bb_4067
bb_3898
bb_3894
bb_3896
bb_4130
bb_4138
bb_4320
bb_4327
bb_4521
bb_4517
bb_4126
bb_4557
bb_4140
bb_4143
i_bs_get_binary2_fxIsId
bb_4128
bb_4343
bb_4317
bb_3746
bb_3753bb_3752
bb_4075
bb_4077
bb_4454
bb_4390
bb_4362
bb_3851
bb_3854
bb_3852
bb_3860
bb_4477
bb_3745
bb_4540
bb_4448
bb_4460
bb_4345bb_4178
bb_4132
bb_3866
bb_3584
bb_3760
bb_3936 bb_4371
bb_4384
bb_4313
bb_4176
bb_4162
bb_4164
i_bs_get_integer_fIId
i_bs_skip_bits2_fxxI
bb_4525 bb_4526 bb_4406bb_4566
bb_4574
bb_4374
bb_3944 bb_4397
bb_4386
bb_4009
bb_4016bb_4015
bb_3884
bb_3883
is_function_fr
bb_4315
bb_4388
i_is_ne_exact_literal_yfc
bb_3788
bb_3794 bb_3792 bb_3790
bb_3950 bb_3784
bb_4329
bb_4326bb_3775
bb_3847
bb_3776
bb_3783
bb_3538
bb_4527
bb_3758
i_bs_start_match2_rfIId
bb_3919
bb_4076
bb_4155
bb_4084
bb_3762
bb_4311
bb_3921 bb_3923
bb_1636
i_bs_match_string_rfII
bb_4141
bb_4149
bb_4441
bb_4437
i_bs_get_float2_frIsId
bb_4568
bb_4339
bb_4341
bb_4493bb_4533
bb_4180
i_bs_get_utf8_xfd
i_bs_skip_bits2_frxI
bb_4405
i_bs_start_match2_yfIId i_bs_start_match2_xfIId
bb_4408
i_bs_skip_bits2_fxyI
bb_4565
Naıve Tracing
Use a new version of the interpreter, generated from the CFG.
Generate a tracing and non-tracing version of each opcode.
For each basic block we pass through, record basic blockidentity and PC.
Abort trace if too long.
If we reach the profile instruction we started the trace from –We have found a loop!
Naıve Profiling to Tracing Mode Switch
Directthreading
0xCDAADEF0
0xCDAADEFF
0xCDAACAF8
0xCDAAD432
0xDEADBEEF
PC
x = x + 1;...
0XCDAACAF8
Indirect threading17
42
97
23
12
PC
x = x + 1;...
0XCDAACAF8
95: 0xCBAADEF0
96: 0xCDAEDEFF
97: 0xCDAACAF8
98: 0xCDAAD432
99: 0xDEADBEEF
Profiling Opcodes
95: 0xCBBADEF0
96: 0xCBAEDEFF
97: 0xCBAACAF8
98: 0xCBAAD432
99: 0xDBADBEEF
Tracing Opcodes
record_trace_bb(4711, PC)x = x + 1;...
0XCBAACAF8
Have two implementations of each opcode.
Switch the table of opcodes.
Compiler has to assume that a mode switch can take place atany block → performance suffers
Refined Tracing
Modify the interpreter loop as little as possible.
Have separate trace interpreter.
Limit entry to the interpreter at instruction boundaries.
Have separate cleanup-interpreter to continue execution tothe next instruction boundary.
Reuse cleanup-interpreter when returning from native mode.
Profiling Tracing Native Cleanup
Further Tracing Refinements
Ensure that we have a representative trace:
Follow along a previously created trace.
Allow multi-path traces.
Generate native code when the trace has not grown for Nsuccessive iterations.
Enforce limit on total size of trace.
Trace Compression
Large CFGs slow down LLVM optimization and native codegeneration significantly.
Solution: Compress traces to remove shared segments.
BB=0ip=0x4567
BB=1ip=0x4567
BB=2ip=0x4567
BB=3ip=0x4568
BB=3ip=0x4568
BB=4ip=0x4569
BB=4ip=0x4569
BB=0ip=0x4567
BB=1ip=0x4567
BB=2ip=0x4567
BB=3ip=0x4568
BB=4ip=0x4569
Outline
BackgroundJust-In-Time CompilationErlangBEAM: Specification & ImplementationProject Goals
JIT:ing as it applies to BEAMProfilingTracingGenerating and Calling Native CodeFuture Work
LLVM’s Strengths and WeaknessesQuestions
Native-code Generation
Glue together LLVM-IR-fragments for the trace.
Guards are inserted to make sure we stay on the traced path.
Fragments are extracted from the CFG as C-source, compiledto IR using clang (at build-time) and loaded during systeminitialization.
Hand the resulting IR off to LLVM for the rest.
beam_emu.c CFG
fragments.c
jit_emu.c Trace
LLVM optimizer Native codeBitcode IR generator
Calling Native Code
Switching from interpreter to native code:
Use liveness information from the CFG.
Package native-code as a function where the arguments arethe live variables.
Switching from native code to interpreter:
The cleanup-interpreter is a set of functions, one for each BB,which tail-recursively calls the next BB. Arguments are thelive variables.
Cleanup-interpreter packs up live variables in a structurewhich the interpreter unpacks on return.
Performance Improvements
Run native-code generation in separate thread.
Erlang-aware constant propagation:
Eliminate loads from code (constant at compile time).Will eliminate loading of immediates.Will eliminate many of the guards.
Performance
Steady state:
Eliminates most of the instruction decode overhead.
Up to 50% reduction in runtime.
Well-behaved programs: around 25%.
Programs using cute tricks such as unrolling: up to 200%increase in runtime.
Future Work
Full SMP support.
Box/unboxing-aware constant propagation.
Extend JIT support to fold in primitives.
Outline
BackgroundJust-In-Time CompilationErlangBEAM: Specification & ImplementationProject Goals
JIT:ing as it applies to BEAMProfilingTracingGenerating and Calling Native CodeFuture Work
LLVM’s Strengths and WeaknessesQuestions
LLVM’s Strengths
Access to the C AST via libClang.
Quality of generated code.
The optimization framework.
LLVM’s Weaknesses
No thread-safety – Extra housekeeping to do things in thecorrect context.
Compilation could be much faster.
Native code has to be packaged as C-function – Would reallylike to have enough control to patch generated native code.
Clang/LLVM does bad job on the main interpreter:
Allocates the VM’s stack- and instruction-pointers on thestack.Inserts extra instruction sequence before indirect jumps (whendispatching to next VM instruction), GCC appears to insert itat the branch target and only if needed.
Costs us 15-20%
Outline
BackgroundJust-In-Time CompilationErlangBEAM: Specification & ImplementationProject Goals
JIT:ing as it applies to BEAMProfilingTracingGenerating and Calling Native CodeFuture Work
LLVM’s Strengths and WeaknessesQuestions
Questions?