Applying Compiler Technology to Ruby
Sept 8, 2009
Evan Phoenix
Wednesday, September 16, 2009
What makes Ruby great can make Ruby slow.
‣ Highly Dynamic
• Very high level operations
• New code can be introduced at anytime
• Dynamic typing
• Exclusively late bound method calls
• Easier to implement as an interpreter
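These properties are easy to see in plain Ruby. A minimal illustration (not tied to any particular implementation): method calls are resolved at call time, so reopening a class changes the behavior of objects that already exist.

```ruby
# Methods are looked up when the call runs, not when the code is loaded.
class Greeter
  def hello
    "hi"
  end
end

g = Greeter.new
first = g.hello          # resolves to the original definition

# New code can be introduced at any time, even for existing objects:
class Greeter
  def hello
    "hello there"
  end
end

second = g.hello         # the same call site now binds to the new method
```

This late binding is exactly what makes naive static compilation of Ruby so hard.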
Haven’t other languages had these same features/weaknesses?
‣Prior Work
• Smalltalk
• 1980-1994: Extensive work to make it fast
• Self
• 1992-1996: A primary research vehicle for making dynamic languages fast
• Java / Hotspot
• 1996-present: A battle hardened engine for (limited) dynamic dispatch
‣What Can We Learn From Them?
• Compiled code is faster than interpreted code
• It’s very hard (almost impossible) to figure things out statically
• The type profile of a program is stable over time
• Therefore:
• Learn what a program does and optimize based on that
• This is called Type Feedback
‣Code Generation (JIT)
• Eliminating interpreter overhead instantly increases performance by a fixed percentage
• Naive code generation results in small improvement over interpreter
• Method calling continues to dominate time
• Need a way to generate better code
• Combine with program type information!
‣Type Profile
• As the program executes, it’s possible to see how one method calls other methods
• The relationship of one method and all the methods it calls is the type profile of the method
• Just because you CAN use dynamic dispatch, doesn’t mean you always do.
• It’s common that a call site always calls the same method every time it’s run
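A toy sketch of this observation (hypothetical code, not the Rubinius profiler): record the receiver’s class each time a call site runs, and a site like the one below turns out to be dominated by a single class.

```ruby
# Count which receiver classes a single "call site" actually sees.
site_profile = Hash.new(0)

call_site = lambda do |receiver|
  site_profile[receiver.class] += 1   # record the receiver class
  receiver.to_s                       # the actual dynamic dispatch
end

1000.times { call_site.call(42) }     # the same receiver class every time
call_site.call(:sym)                  # one rare outlier

# This site is effectively monomorphic: one class dominates.
dominant = site_profile.values.max.to_f / site_profile.values.sum
```

A profile like this is what lets the JIT treat a dynamic call as if it were static.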
[Chart: receiver classes seen per call site while running the Array specs — 1 class: 25,245 call sites (98%); 2: 275; 3: 86; 4: 50; 5: 35; 6: 6; 7: 10; 8: 5; 9: 5; 10: 2; 10+: 34]
‣Type Profiling (Cont.)
• 98% of all method calls are to the same method every time
• In other words, 98% of all method calls are statically bound
‣Type Feedback
• Optimize a semi-static relationship to generate faster code
• Semi-static relationships are found by profiling all call sites
• Allows the JIT to make vastly better decisions
• Most common optimization: Method Inlining
‣Method Inlining
• Rather than emit a call to a target method, copy its body at the call site
• Eliminates code to lookup and begin execution of target method
• Simplifies (or eliminates) setup for target method
• Allows for type propagation, as well as providing a wider horizon for optimization.
• A wider horizon means better generated code, which means less work to do per method == faster execution.
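A hand-worked Ruby sketch of the transformation (illustrative only; a real JIT performs this on compiled code, not source):

```ruby
# Before inlining: every element pays for a method lookup and call.
def area(w, h)
  w * h
end

def before(sizes)
  sizes.map { |w, h| area(w, h) }
end

# After inlining: the callee's body is copied to the call site,
# eliminating lookup and call-frame setup entirely.
def after(sizes)
  sizes.map { |w, h| w * h }
end

sizes = [[2, 3], [4, 5]]
```

Both versions compute the same result, but the inlined body also exposes `w * h` directly to further optimization.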
Implementation
‣Code Generation (JIT)
• Early experimentation with custom JIT
• Realized we weren’t experts
• Would have taken years to get good code generated
• Switched to LLVM
‣LLVM
• Provides an internal AST (LLVM IR) for describing work to be done
• Text representation of AST allows for easy debugging
• Provides ability to compile AST to machine code in memory
• Contains thousands of optimizations
• Competitive with GCC
‣Type Profiling
• All call sites use a class called InlineCache, one per call site
• InlineCache accelerates method dispatch by caching previous method used
• In addition, tracks a fixed number of receiver classes seen when there is a cache miss
• When compiling a method using LLVM, all InlineCaches for a method can be read
• InlineCaches with good information can be used to accurately find a method to inline
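A much-simplified sketch of the idea behind an inline cache (a hypothetical class, not the Rubinius implementation): remember the last receiver class and its resolved method, and record the classes seen on misses so the JIT can read them later.

```ruby
# Toy inline cache: fast path for the cached receiver class,
# full lookup (plus profiling) on a miss.
class ToyInlineCache
  attr_reader :seen_classes

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
    @seen_classes = Hash.new(0)   # receiver classes recorded on misses
  end

  def call(receiver, *args)
    if receiver.class == @cached_class
      return @cached_method.bind(receiver).call(*args)   # hit: no lookup
    end
    # miss: record the class, do a full lookup, refill the cache
    @seen_classes[receiver.class] += 1
    @cached_class = receiver.class
    @cached_method = receiver.class.instance_method(@name)
    @cached_method.bind(receiver).call(*args)
  end
end

cache = ToyInlineCache.new(:to_s)
results = [cache.call(1), cache.call(2), cache.call(:a)]
```

After the first call the Integer path is cached, so the second call hits the fast path; only the Symbol call forces another lookup.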
‣When To Compile
• It takes time for a method’s type information to settle down
• Compiling too early means not having enough type info
• Compiling too late means lost performance
• Use simple call counters to allow a method to “heat up”
• Each invocation of a method increments counter
• When counter reaches a certain value, method is queued for compilation.
• Threshold value is tunable: -Xjit.call_til_compile
• Still experimenting with good default values
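A toy model of the counter scheme (the threshold here is made up; the real value is what `-Xjit.call_til_compile` tunes):

```ruby
# Each invocation bumps a per-method counter; crossing the threshold
# queues the method for compilation exactly once.
THRESHOLD = 5
counters = Hash.new(0)
compile_queue = []

invoke = lambda do |method_name|
  counters[method_name] += 1
  compile_queue << method_name if counters[method_name] == THRESHOLD
end

10.times { invoke.call(:hot) }   # crosses the threshold, queued once
2.times  { invoke.call(:cold) }  # never gets hot enough
```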
‣How to Compile
• To impact runtime as little as possible, all JIT compilation happens in a background OS thread
• Methods are queued, and background thread reads queue to find methods to compile
• After compiling, function pointers to JIT generated code are installed in methods
• All future invocations of method use JIT code
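The queue-and-worker shape described above can be sketched with Ruby’s built-in thread-safe `Queue` (names are illustrative, not Rubinius internals):

```ruby
# Methods are queued; a background thread drains the queue and
# "installs" the compiled result.
queue    = Queue.new
compiled = Queue.new   # stands in for installing the function pointer

worker = Thread.new do
  while (method_name = queue.pop)   # blocks until work arrives; nil ends the loop
    compiled << method_name         # "compile" the method
  end
end

queue << :foo
queue << :bar
queue << nil        # sentinel: no more work
worker.join

installed = []
installed << compiled.pop until compiled.empty?
```

Because compilation happens off the main thread, the running program only pays the cost of enqueueing.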
‣Benchmarks
[Bar chart, time in seconds (lower is better). Implementations: 1.8, 1.9, rbx, rbx jit, rbx jit +blocks. Bar values as extracted: 2.59, 3.60, 5.90, 5.30, 8.02 (per-bar mapping not recoverable).]

def foo
  ary = []
  100.times { |i| ary << i }
end

Run 300,000 times.
‣Benchmarks
[Bar chart, time in seconds (lower is better). Implementations: 1.8, 1.9, rbx, rbx jit, rbx jit +blocks. Bar values as extracted: 12.01, 12.54, 25.36, 5.26, 4.85 (per-bar mapping not recoverable).]

def foo
  hsh = {}
  100.times { |i| hsh[i] = 0 }
end

Run 100,000 times.
‣Benchmarks
[Bar chart, time in seconds (lower is better). Implementations: 1.8, 1.9, rbx, rbx jit, rbx jit +blocks. Bar values as extracted: 2.66, 2.68, 6.26, 2.09, 3.64 (per-bar mapping not recoverable).]

def foo
  hsh = { 47 => true }
  100.times { |i| hsh[i] }
end

Run 100,000 times.
‣Benchmarks
[Bar chart, time in seconds (lower is better). Implementations: 1.8, 1.9, jruby, rbx, rbx jit, rbx jit +blocks. Bar values as extracted: 1.53, 1.53, 7.27, 1.89, 1.58, 7.36 (per-bar mapping not recoverable).]

Benchmark: tak(18, 9, 0)
‣Conclusion
• Ruby is a wonderful language because it is organized for humans
• By gathering and using information about a running program, it’s possible to make that program much faster without impacting flexibility
• Thank You!