Building An Efficient JIT the best kind of jit Nate Begeman Apple Inc August 1st, 2008

Building An Efficient JIT
the best kind of jit

Nate Begeman
Apple Inc

August 1st, 2008

Overview

• Resources

• JIT Basics

• Clang-based JIT

• An Efficient Clang-based JIT

Resources

Memory, Cycles, Etc.
How much are we using?

CPU

Process  CPU    TOTAL   RPRVT
myjit    50ms   100ms   41928K

TOTAL

Process  CPU    TOTAL   RPRVT
myjit    50ms   100ms   41928K

RPRVT

Process  CPU    TOTAL   RPRVT
myjit    50ms   100ms   41928K

JIT Basics

JIT 101
Common JIT Tasks

What does a JIT do?

• Create IR

• Loading Libraries

• Link Modules

• Optimization & Transforms

• Codegen

Loading Libraries

#include "llvm/Bitcode/ReaderWriter.h"

// ParseBitcodeFile - Read the specified bitcode file,
// returning the Module.
Module *ParseBitcodeFile(MemoryBuffer *buffer, ...);

#include "llvm/System/DynamicLibrary.h"

// LoadLibraryPermanently - Load the dynamic library at
// path. It will be unloaded when the program terminates.
bool LoadLibraryPermanently(const char *path, ...);

Generating IR

#include "llvm/Module.h"

// No default constructor, must provide name.
Module(const std::string &ModuleID);

You probably also want to...

#include "llvm/Module.h"

/// Set the data layout.
void setDataLayout(const std::string &DL);
/// Set the target triple.
void setTargetTriple(const std::string &T);

In Your Module...

• Functions

- Declarations

- Definitions

• Globals

• Annotations

Linking

• What?

• Why?

Linking

#include "llvm/Linker.h"

/// LinkModules - The Src module is linked into the Dst module
/// such that types, global variables, functions, etc. are
/// matched and resolved.
static bool LinkModules(Module *Dst, Module *Src, ...);

Destroys Src...but not its memory!

Opts & Transforms

• No one correct level of optimization

• Create your own PassManager(s)

Opts & Transforms

#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/IPO.h"

std::vector<const char *> exportList;

PassManager Passes;
Passes.add(new TargetData(M));
Passes.add(createInternalizePass(exportList));
Passes.add(createScalarReplAggregatesPass());
Passes.add(createInstructionCombiningPass());
Passes.add(createGlobalOptimizerPass());
Passes.add(createFunctionInliningPass());

All these passes and more are yours for one low price!
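
The idea of registering passes and running them in order can be illustrated without LLVM. The sketch below is a toy model only: ToyIR, ToyPass, ToyPassManager, and removeNops are hypothetical names invented for illustration and are not part of the LLVM API. It shows why building your own pass list lets you pick exactly the amount of work you want.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy "module": just a list of instruction names (hypothetical, not LLVM IR).
typedef std::vector<std::string> ToyIR;

// A pass rewrites the module in place and reports whether it changed anything.
typedef std::function<bool(ToyIR &)> ToyPass;

class ToyPassManager {
public:
    // Passes run in the order they are added.
    void add(ToyPass pass) { passes_.push_back(pass); }

    // Run every registered pass once; returns true if any pass made a change.
    bool run(ToyIR &ir) const {
        bool changed = false;
        for (std::size_t i = 0; i < passes_.size(); ++i)
            changed |= passes_[i](ir);
        return changed;
    }

private:
    std::vector<ToyPass> passes_;
};

// Example pass: delete every instruction named "nop".
inline bool removeNops(ToyIR &ir) {
    std::size_t before = ir.size();
    ir.erase(std::remove(ir.begin(), ir.end(), std::string("nop")), ir.end());
    return ir.size() != before;
}

// Usage: a one-pass pipeline applied to a three-instruction module.
inline void examplePipeline() {
    ToyIR ir = {"add", "nop", "ret"};
    ToyPassManager pm;
    pm.add(removeNops);
    assert(pm.run(ir));      // the nop was removed
    assert(ir.size() == 2);
}
```

A cheap pipeline and a thorough one can then be two separate manager objects, chosen per compile.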

Inlining

Constant *Fn = M->getFunction("inlineMe");

// Advance the iterator before inlining: inlining removes the call,
// which would otherwise invalidate the use we are standing on.
for (Value::use_iterator ui = Fn->use_begin(), ue = Fn->use_end();
     ui != ue; ) {
  if (CallInst *CI = dyn_cast<CallInst>(*ui++))
    InlineFunction(CI);
}

#include "llvm/Transforms/Utils/Cloning.h"

/// InlineFunction - Performs one level of inlining.
bool InlineFunction(CallInst *C);
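
The loop above increments the iterator before acting on the element it pointed to. The same pattern can be shown with a standard container. This is a sketch under stated assumptions: CallSite and inlineAllDirectCalls are hypothetical stand-ins, and erasing a list element plays the role of a call being removed from a use list.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical stand-in for one use of a function.
struct CallSite {
    std::string caller;
    bool isDirectCall;  // only direct calls can be "inlined" in this toy
};

// Remove every direct call from the list and return how many were handled.
// The iterator is stepped forward *before* the erase, so it never refers to
// the element being removed.
inline int inlineAllDirectCalls(std::list<CallSite> &uses) {
    int inlined = 0;
    for (std::list<CallSite>::iterator it = uses.begin(), end = uses.end();
         it != end; ) {
        std::list<CallSite>::iterator current = it++;  // advance first
        if (current->isDirectCall) {
            uses.erase(current);  // invalidates only 'current'
            ++inlined;
        }
    }
    return inlined;
}

// Usage: two of the three uses are direct calls.
inline void exampleInlineLoop() {
    std::list<CallSite> uses = {{"a", true}, {"b", false}, {"c", true}};
    assert(inlineAllDirectCalls(uses) == 2);
    assert(uses.size() == 1);
}
```

Writing `it++` in the loop header instead would step from an iterator that has already been erased.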

Code Generation

• Get a function pointer from JIT

• Release when finished.

#include "llvm/ExecutionEngine/ExecutionEngine.h"

/// getPointerToFunction - This returns the address of the
/// specified function, compiling it if necessary.
void *getPointerToFunction(Function *F);

/// freeMachineCodeForFunction - deallocate memory used to
/// code-generate this Function.
void freeMachineCodeForFunction(Function *F);
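
Because getPointerToFunction returns an untyped void *, the caller has to cast it to the right function pointer type before calling. A minimal sketch of that step, with no JIT involved: timesTwo and lookupCompiledFunction are hypothetical stand-ins for compiled code and for the lookup call, and the jitfn typedef is an assumption (the Clang-based example in this talk uses a type named jitfn without showing its definition).

```cpp
#include <cassert>

// Assumed signature of the compiled function: takes an int, returns an int.
typedef int (*jitfn)(int);

// Stand-in for machine code the JIT would have produced.
inline int timesTwo(int x) { return x * 2; }

// Hypothetical stand-in for the lookup: hands back an untyped address.
inline void *lookupCompiledFunction() {
    return reinterpret_cast<void *>(&timesTwo);
}

// Cast the raw address to the expected signature, then call through it.
// Nothing checks that the signature is right; a mismatch is undefined
// behavior, so the typedef must agree with the generated function.
inline int callCompiled(int arg) {
    void *raw = lookupCompiledFunction();
    jitfn fn = reinterpret_cast<jitfn>(raw);
    return fn(arg);
}

// Usage.
inline void exampleCall() {
    assert(callCompiled(5) == 10);
}
```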

Practical Example

JIT 102
A Clang-based JIT

Assembling the JIT

[Pipeline diagram: libclang (C99)]

/// c_to_module - entry point into libclang, our shared library
/// that uses clang to turn a C string into an LLVM IR Module.
extern "C" Module *c_to_module(const char *source, char **log);

int main(int argc, char **argv) {
  ...
  const char *source = getFile(path);
  ...

  for (;;) {
    // Create a new module from the source string
    Module *M = c_to_module(source, 0);

Assembling the JIT

[Pipeline diagram: libclang (C99), parse bitcode]

#include "llvm/Support/MemoryBuffer.h"

    // Load my library of fancy extensions to Libc
    MemoryBuffer *buffer = MemoryBuffer::getFile("mylibc.bc");
    Module *Library = ParseBitcodeFile(buffer);

Assembling the JIT

[Pipeline diagram: libclang (C99), parse bitcode, Linker]

    // Link the modules together so that we can do inlining.
    // After this step, 'M' will contain all the bitcode.
    Linker::LinkModules(M, Library, 0 /* error string */);

    // Don't forget to delete Library, otherwise we'll leak
    // memory.
    delete Library;

Assembling the JIT

[Pipeline diagram: libclang (C99), parse bitcode, Linker, Optimizer]

    // Register some passes with a PassManager, and run them.
    PassManager Passes;
    Passes.add(new TargetData(M));
    Passes.add(createInternalizePass(exportList));
    Passes.add(createGlobalDCEPass());
    Passes.add(createGlobalOptimizerPass());
    Passes.add(createScalarReplAggregatesPass());
    Passes.add(createInstructionCombiningPass());
    Passes.run(*M);

Assembling the JIT

[Pipeline diagram: libclang (C99), parse bitcode, Linker, Optimizer, Code Generator]

    // Create the JIT
    ExistingModuleProvider *EMP;
    EMP = new ExistingModuleProvider(M);
    ExecutionEngine *JIT = ExecutionEngine::create(EMP);
    Function *F = M->getFunction("myCosine");

    // Cast function pointer to correct type and call.
    jitfn fn = (jitfn)JIT->getPointerToFunction(F);
    printf("foo! %d\n", fn(5));

Success!

...but at what price?

[Pipeline diagram: libclang (C99), parse bitcode, Linker, Optimizer, Code Generator]

Demo

JIT 102 Results

Slow compile times!

Process  CPU     TOTAL   RPRVT
jit102   600ms   800ms   31500K

JIT 102 Results

Process  CPU     TOTAL   RPRVT
jit102   600ms   800ms   31500K

Too much memory!

Efficient JITing

JIT 201
Don’t do what we just taught you in 101!

What does a JIT do?

• Loading Libraries

• Create IR

• Link Modules

• Optimization & Transforms

• Codegen

Areas To Optimize

• Loading Libraries

• Create IR

• Link Modules

• Optimization & Transforms

• Codegen

Loading Libraries

#include "llvm/Bitcode/ReaderWriter.h"

/// getBitcodeModuleProvider - Read the header of the specified
/// bitcode buffer and prepare for lazy deserialization of
/// function bodies.
ModuleProvider *getBitcodeModuleProvider(MemoryBuffer *Buffer);

#include "llvm/Bitcode/ReaderWriter.h"

/// materializeFunction - make sure the given function is fully
/// read.
bool materializeFunction(Function *F);
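
Lazy deserialization means that only the functions you actually touch are decoded. The toy model below illustrates that idea and nothing more: LazyLibrary and its members are hypothetical names, strings stand in for bitcode, and none of this is the LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy library whose function bodies stay serialized until requested.
class LazyLibrary {
public:
    // Register a function by name with its still-encoded body.
    void addSerialized(const std::string &name, const std::string &bytes) {
        serialized_[name] = bytes;
    }

    // Decode one body on demand and cache it. Returns a pointer to the
    // decoded body, or a null pointer if the name is unknown.
    const std::string *materialize(const std::string &name) {
        std::map<std::string, std::string>::const_iterator done =
            bodies_.find(name);
        if (done != bodies_.end())
            return &done->second;  // already decoded, no extra work
        std::map<std::string, std::string>::const_iterator raw =
            serialized_.find(name);
        if (raw == serialized_.end())
            return 0;
        bodies_[name] = "decoded:" + raw->second;  // pretend to deserialize
        return &bodies_[name];
    }

    std::size_t declaredCount() const { return serialized_.size(); }
    std::size_t materializedCount() const { return bodies_.size(); }

private:
    std::map<std::string, std::string> serialized_;  // cheap: headers only
    std::map<std::string, std::string> bodies_;      // expensive: full bodies
};

// Usage: two functions declared, only one ever decoded.
inline void exampleLazyLoad() {
    LazyLibrary lib;
    lib.addSerialized("cos", "c0");
    lib.addSerialized("sin", "s0");
    assert(lib.materializedCount() == 0);
    assert(lib.materialize("cos") != 0);
    assert(lib.materializedCount() == 1);
}
```

The cost of loading then scales with what the program calls rather than with the size of the library.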

Loading Libraries

• What if I don’t need to inline?

• Compile your runtime library to a dynamic library!

Loading Libraries

Before:

    // Load my library of fancy extensions to Libc
    MemoryBuffer *buffer = MemoryBuffer::getFile("mylibc.bc");
    Module *Library = ParseBitcodeFile(buffer);

After:

    // Create the runtime library module provider, which will
    // lazily stream functions out of the module.
    MemoryBuffer *buffer = MemoryBuffer::getFile("mylibc.bc");
    ModuleProvider *LMP = getBitcodeModuleProvider(buffer);
    Module *LM = LMP->getModule();

Linking

Module *Library = Provider->getModule();        // 300KB + 20MB
Module *MyFunc = compileWithClang(someSource);  // 50KB

// A common mistake, links Library into MyFunc.
// Materializes & copies 20MB of IR into 50KB module => 40MB!!
LinkModules(MyFunc, Library);

// Much better, link MyFunc into Library, and delete contents
// later. 50KB into 300KB => 350KB. 100x improvement.
LinkModules(Library, MyFunc);
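
One way to read the sizes quoted in the comments above is with a small cost model. This is a sketch under stated assumptions, not a description of the linker's internals: LinkCostModule and linkInto are hypothetical, sizes are in KB, and the model assumes that linking forces every lazy body in the source module to be read in and then copied into the destination.

```cpp
#include <cassert>
#include <cstddef>

// Toy module: how much IR is in memory and how much is still unread.
struct LinkCostModule {
    std::size_t residentKB;  // IR currently in memory
    std::size_t lazyKB;      // function bodies not yet deserialized
};

// Link src into dst. Under this model all of src is materialized and then
// copied; dst's own lazy bodies stay lazy. Returns the number of KB copied.
inline std::size_t linkInto(LinkCostModule &dst, LinkCostModule &src) {
    src.residentKB += src.lazyKB;  // materialize everything in src
    src.lazyKB = 0;
    std::size_t copied = src.residentKB;
    dst.residentKB += copied;
    return copied;
}

// Usage: compare the two link directions with the slide's sizes.
inline void exampleLinkDirection() {
    // Wrong way: the 20MB library is pulled into the 50KB module.
    LinkCostModule lib = {300, 20 * 1024};
    LinkCostModule fn = {50, 0};
    linkInto(fn, lib);
    assert(lib.residentKB + fn.residentKB == 41610);  // about 40MB in total

    // Right way: the 50KB module is pulled into the lazy library.
    LinkCostModule lib2 = {300, 20 * 1024};
    LinkCostModule fn2 = {50, 0};
    linkInto(lib2, fn2);
    assert(lib2.residentKB == 350);  // library bodies remain unread
}
```

With these inputs the two directions differ by a factor of a little over 100 in resident IR, which is consistent with the figure in the comment.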

Linking

• What is “GhostLinkage” ?

• Upcoming Improvements.

Opts & Transforms

• Run Internalize, DCE, and instcombine on clang-generated code.

• Just run IPO opts after link.

Demo & Bakeoff!

Final Results

20x faster compile times!

Process  CPU    TOTAL   RPRVT
jit201   30ms   40ms    3900K

Final Results

Process  CPU    TOTAL   RPRVT
jit201   30ms   40ms    3900K

7x less memory!
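
As a quick check of the headline figures, the ratios can be computed directly from the jit102 and jit201 rows shown in this talk. The sketch below only does that arithmetic; the Measurement struct and the ratio helper are hypothetical names used for illustration.

```cpp
#include <cassert>

// One row of the results table: times in milliseconds, memory in KB.
struct Measurement {
    double cpuMs;
    double totalMs;
    double rprvtKB;
};

// Values copied from the jit102 and jit201 result slides.
inline Measurement jit102Results() { Measurement m = {600, 800, 31500}; return m; }
inline Measurement jit201Results() { Measurement m = {30, 40, 3900}; return m; }

// How many times smaller 'after' is than 'before'.
inline double ratio(double before, double after) { return before / after; }

// Usage: CPU and TOTAL both improve by exactly 20x on these numbers.
inline void exampleRatios() {
    Measurement slow = jit102Results();
    Measurement fast = jit201Results();
    assert(ratio(slow.cpuMs, fast.cpuMs) == 20.0);
    assert(ratio(slow.totalMs, fast.totalMs) == 20.0);
}
```

On the same two rows the RPRVT ratio works out to roughly 8, so the 7x figure quoted on the slide is, if anything, slightly conservative for these measurements.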

Documentation

• llvm / [email protected]

• clang / [email protected]

• web archives of lists available

Questions & Answers