Applying Compiler Technology to Ruby
Sept 8, 2009
Evan Phoenix
Wednesday, September 16, 2009
What makes Ruby great can make Ruby slow.
‣ Highly Dynamic
• Very high level operations
• New code can be introduced at anytime
• Dynamic typing
• Exclusively late bound method calls
• Easier to implement as an interpreter
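These properties are easy to see in plain Ruby. A minimal illustration (not tied to any particular implementation): method calls are resolved at call time, so reopening a class changes the behavior of objects that already exist.

```ruby
# Methods are looked up when the call runs, not when the code is loaded.
class Greeter
  def hello
    "hi"
  end
end

g = Greeter.new
first = g.hello          # resolves to the original definition

# New code can be introduced at any time, even for existing objects:
class Greeter
  def hello
    "hello there"
  end
end

second = g.hello         # the same call site now binds to the new method
```

This late binding is exactly what makes naive static compilation of Ruby so hard.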
Haven’t other languages had these same features/weaknesses?
‣Prior Work
• Smalltalk
• 1980-1994: Extensive work to make it fast
• Self
• 1992-1996: A primary research vehicle for making dynamic languages fast
• Java / Hotspot
• 1996-present: A battle hardened engine for (limited) dynamic dispatch
‣What Can We Learn From Them?
• Compiled code is faster than interpreted code
• It’s very hard (almost impossible) to figure things out statically
• The type profile of a program is stable over time
• Therefore:
• Learn what a program does and optimize based on that
• This is called Type Feedback
‣Code Generation (JIT)
• Eliminating interpreter overhead instantly increases performance by a fixed percentage
• Naive code generation results in small improvement over interpreter
• Method calling continues to dominate time
• Need a way to generate better code
• Combine with program type information!
‣Type Profile
• As the program executes, it’s possible to see how one method calls other methods
• The relationship of one method and all the methods it calls is the type profile of the method
• Just because you CAN use dynamic dispatch, doesn’t mean you always do.
• It’s common that a call site always calls the same method every time it’s run
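A toy sketch of this observation (hypothetical code, not the Rubinius profiler): record the receiver’s class each time a call site runs, and a site like the one below turns out to be dominated by a single class.

```ruby
# Count which receiver classes a single "call site" actually sees.
site_profile = Hash.new(0)

call_site = lambda do |receiver|
  site_profile[receiver.class] += 1   # record the receiver class
  receiver.to_s                       # the actual dynamic dispatch
end

1000.times { call_site.call(42) }     # the same receiver class every time
call_site.call(:sym)                  # one rare outlier

# This site is effectively monomorphic: one class dominates.
dominant = site_profile.values.max.to_f / site_profile.values.sum
```

A profile like this is what lets the JIT treat a dynamic call as if it were static.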
[Chart: receiver classes seen per call site while running the Array specs — 1 class: 25,245 call sites (98%); 2: 275; 3: 86; 4: 50; 5: 35; 6: 6; 7: 10; 8: 5; 9: 5; 10: 2; 10+: 34]
‣Type Profiling (Cont.)
• 98% of all method calls are to the same method every time
• In other words, 98% of all method calls are statically bound
‣Type Feedback
• Optimize a semi-static relationship to generate faster code
• Semi-static relationships are found by profiling all call sites
• Allows the JIT to make vastly better decisions
• Most common optimization: Method Inlining
‣Method Inlining
• Rather than emit a call to a target method, copy its body at the call site
• Eliminates code to lookup and begin execution of target method
• Simplifies (or eliminates) setup for target method
• Allows for type propagation, as well as providing a wider horizon for optimization.
• A wider horizon means better generated code, which means less work to do per method == faster execution.
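A hand-worked Ruby sketch of the transformation (illustrative only; a real JIT performs this on compiled code, not source):

```ruby
# Before inlining: every element pays for a method lookup and call.
def area(w, h)
  w * h
end

def before(sizes)
  sizes.map { |w, h| area(w, h) }
end

# After inlining: the callee's body is copied to the call site,
# eliminating lookup and call-frame setup entirely.
def after(sizes)
  sizes.map { |w, h| w * h }
end

sizes = [[2, 3], [4, 5]]
```

Both versions compute the same result, but the inlined body also exposes `w * h` directly to further optimization.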
Implementation
‣Code Generation (JIT)
• Early experimentation with custom JIT
• Realized we weren’t experts
• Would have taken years to get good code generated
• Switched to LLVM
‣LLVM
• Provides an internal AST (LLVM IR) for describing work to be done
• Text representation of AST allows for easy debugging
• Provides ability to compile AST to machine code in memory
• Contains thousands of optimizations
• Competitive with GCC
‣Type Profiling
• All call sites use a class called InlineCache, one per call site
• InlineCache accelerates method dispatch by caching previous method used
• In addition, tracks a fixed number of receiver classes seen when there is a cache miss
• When compiling a method using LLVM, all InlineCaches for a method can be read
• InlineCaches with good information can be used to accurately find a method to inline
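A much-simplified sketch of the idea behind an inline cache (a hypothetical class, not the Rubinius implementation): remember the last receiver class and its resolved method, and record the classes seen on misses so the JIT can read them later.

```ruby
# Toy inline cache: fast path for the cached receiver class,
# full lookup (plus profiling) on a miss.
class ToyInlineCache
  attr_reader :seen_classes

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
    @seen_classes = Hash.new(0)   # receiver classes recorded on misses
  end

  def call(receiver, *args)
    if receiver.class == @cached_class
      return @cached_method.bind(receiver).call(*args)   # hit: no lookup
    end
    # miss: record the class, do a full lookup, refill the cache
    @seen_classes[receiver.class] += 1
    @cached_class = receiver.class
    @cached_method = receiver.class.instance_method(@name)
    @cached_method.bind(receiver).call(*args)
  end
end

cache = ToyInlineCache.new(:to_s)
results = [cache.call(1), cache.call(2), cache.call(:a)]
```

After the first call the Integer path is cached, so the second call hits the fast path; only the Symbol call forces another lookup.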
‣When To Compile
• It takes time for a method’s type information to settle down
• Compiling too early means not having enough type info
• Compiling too late means lost performance
• Use simple call counters to allow a method to “heat up”
• Each invocation of a method increments counter
• When counter reaches a certain value, method is queued for compilation.
• Threshold value is tunable: -Xjit.call_til_compile
• Still experimenting with good default values
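A toy model of the counter scheme (the threshold here is made up; the real value is what `-Xjit.call_til_compile` tunes):

```ruby
# Each invocation bumps a per-method counter; crossing the threshold
# queues the method for compilation exactly once.
THRESHOLD = 5
counters = Hash.new(0)
compile_queue = []

invoke = lambda do |method_name|
  counters[method_name] += 1
  compile_queue << method_name if counters[method_name] == THRESHOLD
end

10.times { invoke.call(:hot) }   # crosses the threshold, queued once
2.times  { invoke.call(:cold) }  # never gets hot enough
```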
‣How to Compile
• To impact runtime as little as possible, all JIT compilation happens in a background OS thread
• Methods are queued, and background thread reads queue to find methods to compile
• After compiling, function pointers to JIT generated code are installed in methods
• All future invocations of method use JIT code
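The queue-and-worker shape described above can be sketched with Ruby’s built-in thread-safe `Queue` (names are illustrative, not Rubinius internals):

```ruby
# Methods are queued; a background thread drains the queue and
# "installs" the compiled result.
queue    = Queue.new
compiled = Queue.new   # stands in for installing the function pointer

worker = Thread.new do
  while (method_name = queue.pop)   # blocks until work arrives; nil ends the loop
    compiled << method_name         # "compile" the method
  end
end

queue << :foo
queue << :bar
queue << nil        # sentinel: no more work
worker.join

installed = []
installed << compiled.pop until compiled.empty?
```

Because compilation happens off the main thread, the running program only pays the cost of enqueueing.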
‣Benchmarks
[Bar chart, time in seconds (lower is better). Implementations: 1.8, 1.9, rbx, rbx jit, rbx jit +blocks. Bar values as extracted: 2.59, 3.60, 5.90, 5.30, 8.02 (per-bar mapping not recoverable).]

def foo
  ary = []
  100.times { |i| ary << i }
end

Run 300,000 times.
‣Benchmarks
[Bar chart, time in seconds (lower is better). Implementations: 1.8, 1.9, rbx, rbx jit, rbx jit +blocks. Bar values as extracted: 12.01, 12.54, 25.36, 5.26, 4.85 (per-bar mapping not recoverable).]

def foo
  hsh = {}
  100.times { |i| hsh[i] = 0 }
end

Run 100,000 times.
‣Benchmarks
[Bar chart, time in seconds (lower is better). Implementations: 1.8, 1.9, rbx, rbx jit, rbx jit +blocks. Bar values as extracted: 2.66, 2.68, 6.26, 2.09, 3.64 (per-bar mapping not recoverable).]

def foo
  hsh = { 47 => true }
  100.times { |i| hsh[i] }
end

Run 100,000 times.
‣Benchmarks
[Bar chart, time in seconds (lower is better). Implementations: 1.8, 1.9, jruby, rbx, rbx jit, rbx jit +blocks. Bar values as extracted: 1.53, 1.53, 7.27, 1.89, 1.58, 7.36 (per-bar mapping not recoverable).]

Benchmark: tak(18, 9, 0)
‣Conclusion
• Ruby is a wonderful language because it is organized for humans
• By gathering and using information about a running program, it’s possible to make that program much faster without impacting flexibility
• Thank You!