building high-performance language implementations with low effort

67
Building High-Performance Language Implementations With Low Effort Stefan Marr FOSDEM 2015, Brussels, Belgium January 31st, 2015 @smarr http://stefan-marr.de

Upload: stefan-marr

Post on 15-Jul-2015

1.913 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Building High-Performance Language Implementations With Low Effort

Building High-Performance

Language Implementations

With Low Effort

Stefan MarrFOSDEM 2015, Brussels, Belgium

January 31st, 2015

@smarrhttp://stefan-marr.de

Page 2: Building High-Performance Language Implementations With Low Effort

Why should you care about how Programming Languages work?

2SMBC: http://www.smbc-comics.com/?id=2088

Page 3: Building High-Performance Language Implementations With Low Effort

3SMBC: http://www.smbc-comics.com/?id=2088

Why should you care about how Programming Languages work?

• Performance isn’t magic

• Domain-specific languages• More concise• More productive

• It’s easier than it looks• Often open source• Contributions welcome

Page 4: Building High-Performance Language Implementations With Low Effort

What’s “High-Performance”?

4

Based on latest data from http://benchmarksgame.alioth.debian.org/Geometric mean over available benchmarks.

Disclaimer: Not indicate for application performance!

Competitively Fast!

0

3

5

8

10

13

15

18

Java V8 C# Dart Python Lua PHP Ruby

Page 5: Building High-Performance Language Implementations With Low Effort

Small and Manageable

16

260

525

562

1 10 100 1000

What’s “Low Effort”?

5

KLOC: 1000 Lines of Code, without blank lines and comments

V8 JavaScript

HotSpotJava Virtual Machine

Dart VM

Lua 5.3 interp.

Page 6: Building High-Performance Language Implementations With Low Effort

Language Implementation Approaches

6

Source Program

Interpreter

Run TimeDevelopmentTime

Input

Output

Source Program

Compiler Binary

Input

Output

Run TimeDevelopmentTime

Simple, but often slow More complex, but often fasterNot ideal for all languages.

Page 7: Building High-Performance Language Implementations With Low Effort

Modern Virtual Machines

7

Source Program

Interpreter

Run TimeDevelopment Time

Input

Output

Binary

Runtime Info

CompilerVirtual Machine

with

Just-In-TimeCompilation

Page 8: Building High-Performance Language Implementations With Low Effort

VMs are Highly Complex

8

Interpreter

Input

Output

Compiler Optimizer

Garbage Collector

CodeGen

Foreign Function Interface

Threadsand

MemoryModel

How to reuse most partsfor a new language?

Debugging Profiling

Easily500 KLOC

Page 9: Building High-Performance Language Implementations With Low Effort

How to reuse most partsfor a new language?

9

Input

Output

Make Interpreters Replaceable Components!

Interpreter

Compiler Optimizer

Garbage Collector

CodeGen

Foreign Function Interface

Threadsand

MemoryModel

Garbage Collector

Interpreter

Interpreter

Page 10: Building High-Performance Language Implementations With Low Effort

Interpreter-based Approaches

Truffle + Graal

with Partial Evaluation

Oracle Labs

RPythonwith Meta-Tracing

[3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.

[2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.

Page 11: Building High-Performance Language Implementations With Low Effort

SELF-OPTIMIZING TREES

A Simple Technique for Language Implementation and Optimization

[1] Würthinger, T.; Wöß, A.; Stadler, L.; Duboscq, G.; Simon, D. & Wimmer, C. (2012), Self-Optimizing AST Interpreters, in 'Proc. of the 8th Dynamic Languages Symposium' , pp. 73-82.

Page 12: Building High-Performance Language Implementations With Low Effort

Code Convention

12

Python-ish

Interpreter Code

Java-ish

Application Code

Page 13: Building High-Performance Language Implementations With Low Effort

A Simple Abstract Syntax Tree Interpreter

13

root_node = parse(file)

root_node.execute(Frame())

if (condition) {cnt := cnt + 1;

} else {cnt := 0;

}

cnt

1

+cnt:=

ifcnt:=

0

cond

root_node

Page 14: Building High-Performance Language Implementations With Low Effort

Implementing AST Nodes

14

if (condition) {cnt := cnt + 1;

} else {cnt := 0;

}

class Literal(ASTNode):final valuedef execute(frame):return value

class VarWrite(ASTNode):child sub_exprfinal idxdef execute(frame):val := sub_expr.execute(frame)frame.local_obj[idx]:= valreturn val

class VarRead(ASTNode):final idxdef execute(frame):return frame.local_obj[idx]

cnt

1

+cnt:=

ifcnt:=

0

cond

Page 15: Building High-Performance Language Implementations With Low Effort

Self-Optimization by Node Specialization

15

cnt := cnt + 1

def UninitVarWrite.execute(frame):val := sub_expr.execute(frame)return specialize(val).execute_evaluated(frame, val)

uninitializedvariable write

cnt

1

+cnt:=

cnt:=

def UninitVarWrite.specialize(val):if val instanceof int:return replace(IntVarWrite(sub_expr))

elif …:…

else:return replace(GenericVarWrite(sub_expr))

specialized

Page 16: Building High-Performance Language Implementations With Low Effort

Self-Optimization by Node Specialization

16

cnt := cnt + 1

def IntVarWrite.execute(frame):try:val := sub_expr.execute_int(frame)return execute_eval_int(frame, val)

except ResultExp, e:return respecialize(e.result).

execute_evaluated(frame, e.result)

def IntVarWrite.execute_eval_int(frame, anInt):frame.local_int[idx] := anIntreturn anInt

intvariable write

cnt

1

+cnt:=

Page 17: Building High-Performance Language Implementations With Low Effort

Some Possible Self-Optimizations

• Type profiling and specialization

• Value caching

• Inline caching

• Operation inlining

• Library Lowering

17

Page 18: Building High-Performance Language Implementations With Low Effort

Library Lowering for Array class

createSomeArray() { return Array.new(1000, ‘fast fast fast’); }

18

class Array {static new(size, lambda) {

return new(size).setAll(lambda);}

setAll(lambda) {forEach((i, v) -> { this[i] = lambda.eval(); });

}}

class Object {eval() { return this; }

}

Page 19: Building High-Performance Language Implementations With Low Effort

Optimizing for Object Values

19

createSomeArray() { return Array.new(1000, ‘fast fast fast’); }

.new

Array

global lookup

method invocation

1000

int literal

‘fast’

string literal

Object, but not a lambda

Optimizationpotential

Page 20: Building High-Performance Language Implementations With Low Effort

Specialized new(size, lambda)

def UninitArrNew.execute(frame):

size := size_expr.execute(frame)

val := val_expr.execute(frame)

return specialize(size, val).

execute_evaluated(frame, size, val)

20

createSomeArray() { return Array.new(1000, ‘fast fast fast’); }

def UninitArrNew.specialize(size, val):

if val instanceof Lambda:

return replace(StdMethodInvocation())

else:

return replace(ArrNewWithValue())

Page 21: Building High-Performance Language Implementations With Low Effort

Specialized new(size, lambda)

def ArrNewWithValue.execute_evaluated(frame, size, val):

return Array([val] * 1000)

21

createSomeArray() { return Array.new(1000, ‘fast fast fast’); }

1 specialized node vs. 1000x `this[i] = lambda.eval()`1000x `eval() { return this; }`

.new

Array

global lookup1000

int literal

‘fast’

string literal

specialized

Page 22: Building High-Performance Language Implementations With Low Effort

JUST-IN-TIME COMPILATION FOR

INTERPRETERS

Generating Efficient Native Code

22

Page 23: Building High-Performance Language Implementations With Low Effort

How to Get Fast Program Execution?

23

VarWrite.execute(frame)

IntVarWrite.execute(frame)

VarRead.execute(frame)

Literal.execute(frame)

ArrayNewWithValue.execute(frame)

..VW_execute() # bin

..IVW_execute() # bin

..VR_execute() # bin

..L_execute() # bin

..ANWV_execute() # bin

Standard Compilation: 1 node at a time

Minimal Optimization Potential

Page 24: Building High-Performance Language Implementations With Low Effort

Problems with Node-by-Node Compilation

24

cnt

1

+cnt:=

Slow Polymorphic Dispatches

def IntVarWrite.execute(frame):try:val := sub_expr.execute_int(frame)return execute_eval_int(frame, val)

except ResultExp, e:return respecialize(e.result).

execute_evaluated(frame, e.result)

cnt:=

Runtime checks in general

Page 25: Building High-Performance Language Implementations With Low Effort

Compilation Unit based on User Program

Meta-Tracing Partial Evaluation

Guided By AST

25

cnt

1

+cnt:=

ifcnt:=

0

cnt

1+

cnt:=if cnt:=

0

[3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.

[2] Bolz et al., Tracing the Meta-level: PyPy'sTracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.

Page 26: Building High-Performance Language Implementations With Low Effort

RPython

Just-in-Time Compilation withMeta Tracing

Page 27: Building High-Performance Language Implementations With Low Effort

RPython

• Subset of Python

– Type-inferenced

• Generates VMs

27

Interpretersource

RPythonToolchain

Meta-TracingJIT Compiler

Interpreter

http://rpython.readthedocs.org/

Garbage Collector

Page 28: Building High-Performance Language Implementations With Low Effort

Meta-Tracing of an Interpreter

28

cnt

1+cnt:=

if

cnt:= 0

[2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.

Page 29: Building High-Performance Language Implementations With Low Effort

Meta Tracers need to know the Loops

class WhileNode(ASTNode):

child cond_expr

child body_expr

def execute(frame):

while True:

jit_merge_point(node=self)

cond = cond_expr.execute_bool(frame)

if not cond:

break

body_expr.execute(frame)

29

while (cnt < 100) {cnt := cnt + 1;

}

Traceguard(cond_expr == Const(IntLessThan))

Page 30: Building High-Performance Language Implementations With Low Effort

Tracing Records one Concrete Execution

class IntLessThan(ASTNode):

child left_expr

child right_expr

def execute_bool(frame):

try:

left = left_expr.execute_int()

except UnexpectedResult r:

...

try:

right = right_expr.execute_int()

expect UnexpectedResult r:

...

return left < right

30

while (cnt < 100) {cnt := cnt + 1;

}

Traceguard(cond_expr == Const(IntLessThan))guard(left_expr == Const(IntVarRead))

Page 31: Building High-Performance Language Implementations With Low Effort

Tracing Records one Concrete Execution

class IntVarRead(ASTNode):

final idx

def execute_int(frame):

if frame.is_int(idx):

return frame.local_int[idx]

else:

new_node = respecialize()

raise UnexpectedResult(new_node.execute())

31

while (cnt < 100) {cnt := cnt + 1;

}

Traceguard(cond_expr == Const(IntLessThan))guard(left_expr == Const(IntVarRead))i1 := left_expr.idx # Const(1)

Page 32: Building High-Performance Language Implementations With Low Effort

Tracing Records one Concrete Execution

class IntVarRead(ASTNode):

final idx

def execute_int(frame):

if frame.is_int(idx):

return frame.local_int[idx]

else:

new_node = respecialize()

raise UnexpectedResult(new_node.execute())

32

while (cnt < 100) {cnt := cnt + 1;

}

Traceguard(cond_expr == Const(IntLessThan))guard(left_expr == Const(IntVarRead))i1 := left_expr.idx # Const(1)a1 := frame.layouti2 := a1[i1] guard(i2 == Const(F_INT))

Page 33: Building High-Performance Language Implementations With Low Effort

Tracing Records one Concrete Execution

class IntVarRead(ASTNode):

final idx

def execute_int(frame):

if frame.is_int(idx):

return frame.local_int[idx]

else:

new_node = respecialize()

raise UnexpectedResult(new_node.execute())

33

while (cnt < 100) {cnt := cnt + 1;

}

Traceguard(cond_expr == Const(IntLessThan))guard(left_expr == Const(IntVarRead))i1 := left_expr.idx # Const(1)a1 := frame.layouti2 := a1[i1] guard(i2 == Const(F_INT))i3 := left_expr.idx # Const(1)a2 := frame.local_inti4 := a2[i3]

Page 34: Building High-Performance Language Implementations With Low Effort

Tracing Records one Concrete Execution

class IntLessThan(ASTNode):

child left_expr

child right_expr

def execute_bool(frame):

try:

left = left_expr.execute_int()

except UnexpectedResult r:

...

try:

right = right_expr.execute_int()

expect UnexpectedResult r:

...

return left < right

34

while (cnt < 100) {cnt := cnt + 1;

}

Traceguard(cond_expr == Const(IntLessThan))guard(left_expr == Const(IntVarRead))i1 := left_expr.idx # Const(1)a1 := frame.layouti2 := a1[i1] guard(i2 == Const(F_INT))i3 := left_expr.idx # Const(1)a2 := frame.local_inti4 := a2[i3]guard_no_exception(Const(UnexpectedResult))

Page 35: Building High-Performance Language Implementations With Low Effort

Tracing Records one Concrete Execution

class IntLessThan(ASTNode):

child left_expr

child right_expr

def execute_bool(frame):

try:

left = left_expr.execute_int()

except UnexpectedResult r:

...

try:

right = right_expr.execute_int()

expect UnexpectedResult r:

...

return left < right

35

while (cnt < 100) {cnt := cnt + 1;

}

Traceguard(cond_expr == Const(IntLessThan))guard(left_expr == Const(IntVarRead))i1 := left_expr.idx # Const(1)a1 := frame.layouti2 := a1[i1] guard(i2 == Const(F_INT))i3 := left_expr.idx # Const(1)a2 := frame.local_inti4 := a2[i3]guard_no_exception(Const(UnexpectedResult))guard(right_expr == Const(IntLiteral))

Page 36: Building High-Performance Language Implementations With Low Effort

Tracing Records one Concrete Execution

class IntLessThan(ASTNode):

child left_expr

child right_expr

def execute_bool(frame):

try:

left = left_expr.execute_int()

except UnexpectedResult r:

...

try:

right = right_expr.execute_int()

expect UnexpectedResult r:

...

return left < right

36

while (cnt < 100) {cnt := cnt + 1;

}

Traceguard(cond_expr == Const(IntLessThan))guard(left_expr == Const(IntVarRead))i1 := left_expr.idx # Const(1)a1 := frame.layouti2 := a1[i1] guard(i2 == Const(F_INT))i3 := left_expr.idx # Const(1)a2 := frame.local_inti4 := a2[i3]guard_no_exception(Const(UnexpectedResult))guard(right_expr == Const(IntLiteral))i5 := right_expr.value # Const(100)

Page 37: Building High-Performance Language Implementations With Low Effort

Tracing Records one Concrete Execution

class IntLessThan(ASTNode):

child left_expr

child right_expr

def execute_bool(frame):

try:

left = left_expr.execute_int()

except UnexpectedResult r:

...

try:

right = right_expr.execute_int()

expect UnexpectedResult r:

...

return left < right

37

while (cnt < 100) {cnt := cnt + 1;

}

Traceguard(cond_expr == Const(IntLessThan))guard(left_expr == Const(IntVarRead))i1 := left_expr.idx # Const(1)a1 := frame.layouti2 := a1[i1] guard(i2 == Const(F_INT))i3 := left_expr.idx # Const(1)a2 := frame.local_inti4 := a2[i3]guard_no_exception(Const(UnexpectedResult))guard(right_expr == Const(IntLiteral))i5 := right_expr.value # Const(100)guard_no_exception(Const(UnexpectedResult))

Page 38: Building High-Performance Language Implementations With Low Effort

Tracing Records one Concrete Execution

class IntLessThan(ASTNode):

child left_expr

child right_expr

def execute_bool(frame):

try:

left = left_expr.execute_int()

except UnexpectedResult r:

...

try:

right = right_expr.execute_int()

expect UnexpectedResult r:

...

return left < right

38

while (cnt < 100) {cnt := cnt + 1;

}

Traceguard(cond_expr == Const(IntLessThan))guard(left_expr == Const(IntVarRead))i1 := left_expr.idx # Const(1)a1 := frame.layouti2 := a1[i1] guard(i2 == Const(F_INT))i3 := left_expr.idx # Const(1)a2 := frame.local_inti4 := a2[i3]guard_no_exception(Const(UnexpectedResult))guard(right_expr == Const(IntLiteral))i5 := right_expr.value # Const(100)guard_no_exception(Const(UnexpectedResult))b1 := i4 < i5

Page 39: Building High-Performance Language Implementations With Low Effort

Tracing Records one Concrete Execution

class WhileNode(ASTNode):

child cond_expr

child body_expr

def execute(frame):

while True:

jit_merge_point(node=self)

cond = cond_expr.execute_bool(frame)

if not cond:

break

body_expr.execute(frame)

39

while (cnt < 100) {cnt := cnt + 1;

}

Traceguard(cond_expr == Const(IntLessThan))guard(left_expr == Const(IntVarRead))i1 := left_expr.idx # Const(1)a1 := frame.layouti2 := a1[i1] guard(i2 == Const(F_INT))i3 := left_expr.idx # Const(1)a2 := frame.local_inti4 := a2[i3]guard_no_exception(Const(UnexpectedResult))guard(right_expr == Const(IntLiteral))i5 := right_expr.value # Const(100)guard_no_exception(Const(UnexpectedResult))b1 := i4 < i5guard_true(b1)

Page 40: Building High-Performance Language Implementations With Low Effort

Tracing Records one Concrete Execution

class WhileNode(ASTNode):

child cond_expr

child body_expr

def execute(frame):

while True:

jit_merge_point(node=self)

cond = cond_expr.execute_bool(frame)

if not cond:

break

body_expr.execute(frame)

40

while (cnt < 100) {cnt := cnt + 1;

}

Traceguard(cond_expr == Const(IntLessThan))guard(left_expr == Const(IntVarRead))i1 := left_expr.idx # Const(1)a1 := frame.layouti2 := a1[i1] guard(i2 == Const(F_INT))i3 := left_expr.idx # Const(1)a2 := frame.local_inti4 := a2[i3]guard_no_exception(Const(UnexpectedResult))guard(right_expr == Const(IntLiteral))i5 := right_expr.value # Const(100)guard_no_exception(Const(UnexpectedResult))b1 := i4 < i5guard_true(b1)...

Page 41: Building High-Performance Language Implementations With Low Effort

Traces are Ideal for Optimizationguard(cond_expr ==

Const(IntLessThan))guard(left_expr ==

Const(IntVarRead))

i1 := left_expr.idx # Const(1)a1 := frame.layouti2 := a1[i1] guard(i2 == Const(F_INT))

i3 := left_expr.idx # Const(1)a2 := frame.local_inti4 := a2[i3]guard_no_exception(Const(UnexpectedResult))

guard(right_expr ==Const(IntLiteral))

i5 := right_expr.value # Const(100)guard_no_exception(

Const(UnexpectedResult))

b1 := i4 < i5guard_true(b1)

...

i1 := left_expr.idx # Const(1)a1 := frame.layouti1 := a1[Const(1)] guard(i1 == Const(F_INT))

i3 := left_expr.idx # Const(1)a2 := frame.local_inti4 := a2[i3]

i5 := right_expr.value # Const(100)

b1 := i2 < i5guard_true(b1)

...

a1 := frame.layouti1 := a1[1] guard(i1 == F_INT)

a2 := frame.local_inti2 := a2[1]

b1 := i2 < 100guard_true(b1)

...

Page 42: Building High-Performance Language Implementations With Low Effort

Truffle + Graal

Just-in-Time Compilation withPartial Evaluation

Oracle Labs

Page 43: Building High-Performance Language Implementations With Low Effort

Truffle+Graal

• Java framework

– AST interpreters

• Based on HotSpot

JVM

43

InterpreterGraal Compiler +

Truffle Partial Evaluator

http://www.ssw.uni-linz.ac.at/Research/Projects/JVM/Truffle.htmlhttp://www.oracle.com/technetwork/oracle-labs/program-languages/overview/index-2301583.html

Garbage Collector

+ Truffle Framework

HotSpot JVM

Page 44: Building High-Performance Language Implementations With Low Effort

Partial Evaluation Guided By AST

44

cnt

1+cnt:=

ifcnt:= 0

[3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.

Page 45: Building High-Performance Language Implementations With Low Effort

Partial Evaluation inlinesbased on Runtime Constants

class WhileNode(ASTNode):

child cond_expr

child body_expr

def execute(frame):

while True:

cond = cond_expr.execute_bool(frame)

if not cond:

break

body_expr.execute(frame)

45

while (cnt < 100) {cnt := cnt + 1;

}

Page 46: Building High-Performance Language Implementations With Low Effort

Partial Evaluation inlinesbased on Runtime Constants

class WhileNode(ASTNode):

child cond_expr

child body_expr

def execute(frame):

while True:

cond = cond_expr.execute_bool(frame)

if not cond:

break

body_expr.execute(frame)

46

while (cnt < 100) {cnt := cnt + 1;

}

class IntLessThan(ASTNode):child left_exprchild right_expr

def execute_bool(frame):try:

left = left_expr.execute_int()except UnexpectedResult r:

...try:

right = right_expr.execute_int()expect UnexpectedResult r:

...return left < right

Page 47: Building High-Performance Language Implementations With Low Effort

Partial Evaluation inlinesbased on Runtime Constants

class WhileNode(ASTNode):

child cond_expr

child body_expr

def execute(frame):

while True:

try:

left = cond_expr.left_expr.execute_int()

except UnexpectedResult r:

...

try:

right = cond_expr.right_expr.execute_int()

expect UnexpectedResult r:

...

cond = left < right

if not cond:

break

body_expr.execute(frame)47

while (cnt < 100) {cnt := cnt + 1;

}

Page 48: Building High-Performance Language Implementations With Low Effort

Partial Evaluation inlinesbased on Runtime Constants

class WhileNode(ASTNode):

child cond_expr

child body_expr

def execute(frame):

while True:

try:

left = cond_expr.left_expr.execute_int()

except UnexpectedResult r:

...

try:

right = cond_expr.right_expr.execute_int()

expect UnexpectedResult r:

...

cond = left < right

if not cond:

break

body_expr.execute(frame)

while (cnt < 100) {cnt := cnt + 1;

}

class IntVarRead(ASTNode):final idx

def execute_int(frame):if frame.is_int(idx):

return frame.local_int[idx]else:

new_node = respecialize()raise UnexpectedResult(new_node.execute

Page 49: Building High-Performance Language Implementations With Low Effort

Partial Evaluation inlinesbased on Runtime Constants

class WhileNode(ASTNode):

child cond_expr

child body_expr

def execute(frame):

while True:

try:if frame.is_int(1):left = frame.local_int[1]

else:new_node = respecialize()raise UnexpectedResult(new_node.execute())

except UnexpectedResult r:

...

try:

right = cond_expr.right_expr.execute_int()

expect UnexpectedResult r:

...

cond = left < right

if not cond:

break

body_expr.execute(frame)

while (cnt < 100) {cnt := cnt + 1;

}

Page 50: Building High-Performance Language Implementations With Low Effort

Optimize Optimistically

class WhileNode(ASTNode):

child cond_expr

child body_expr

def execute(frame):

while True:

try:if frame.is_int(1):left = frame.local_int[1]

else:new_node = respecialize()raise UnexpectedResult(new_node.execute())

except UnexpectedResult r:

...

try:

right = cond_expr.right_expr.execute_int()

expect UnexpectedResult r:

...

cond = left < right

if not cond:

break

body_expr.execute(frame)

while (cnt < 100) {cnt := cnt + 1;

}

Page 51: Building High-Performance Language Implementations With Low Effort

Optimize Optimistically

class WhileNode(ASTNode):

child cond_expr

child body_expr

def execute(frame):

while True:if frame.is_int(1):

left = frame.local_int[1]else:

__deopt_return_to_interp()try:

right = cond_expr.right_expr.execute_int()

expect UnexpectedResult r:

...

cond = left < right

if not cond:

break

body_expr.execute(frame)

while (cnt < 100) {cnt := cnt + 1;

}

Page 52: Building High-Performance Language Implementations With Low Effort

Partial Evaluation inlinesbased on Runtime Constants

class WhileNode(ASTNode):

child cond_expr

child body_expr

def execute(frame):

while True:if frame.is_int(1):

left = frame.local_int[1]else:

__deopt_return_to_interp()try:

right = cond_expr.right_expr.execute_int()

expect UnexpectedResult r:

...

cond = left < right

if not cond:

break

body_expr.execute(frame)

while (cnt < 100) {cnt := cnt + 1;

}

class IntLiteral(ASTNode):final valuedef execute_int(frame):return value

Page 53: Building High-Performance Language Implementations With Low Effort

Partial Evaluation inlinesbased on Runtime Constants

class WhileNode(ASTNode):

child cond_expr

child body_expr

def execute(frame):

while True:if frame.is_int(1):

left = frame.local_int[1]else:

__deopt_return_to_interp()try:

right = 100

expect UnexpectedResult r:

...

cond = left < right

if not cond:

break

body_expr.execute(frame)

while (cnt < 100) {cnt := cnt + 1;

}

class IntLiteral(ASTNode):final valuedef execute_int(frame):return value

Page 54: Building High-Performance Language Implementations With Low Effort

Classic Optimizations:

Dead Code Eliminationclass WhileNode(ASTNode):

child cond_expr

child body_expr

def execute(frame):

while True:if frame.is_int(1):

left = frame.local_int[1]else:

__deopt_return_to_interp()try:

right = 100

expect UnexpectedResult r:

...

cond = left < right

if not cond:

break

body_expr.execute(frame)

while (cnt < 100) {cnt := cnt + 1;

}

class IntLiteral(ASTNode):final valuedef execute_int(frame):return value

Page 55: Building High-Performance Language Implementations With Low Effort

Classic Optimizations:

Constant Propagation

class WhileNode(ASTNode):

child cond_expr

child body_expr

def execute(frame):

while True:if frame.is_int(1):

left = frame.local_int[1]else:

__deopt_return_to_interp()right = 100

cond = left < right

if not cond:

break

body_expr.execute(frame)

while (cnt < 100) {cnt := cnt + 1;

}

class IntLiteral(ASTNode):final valuedef execute_int(frame):return value

Page 56: Building High-Performance Language Implementations With Low Effort

Classic Optimizations:

Loop Invariant Code Motionclass WhileNode(ASTNode):

child cond_expr

child body_expr

def execute(frame):

while True:if frame.is_int(1):

left = frame.local_int[1]else:

__deopt_return_to_interp()

if not (left < 100):

break

body_expr.execute(frame)

while (cnt < 100) {cnt := cnt + 1;

}

Page 57: Building High-Performance Language Implementations With Low Effort

class WhileNode(ASTNode):

child cond_expr

child body_expr

def execute(frame):

if not frame.is_int(1):__deopt_return_to_interp()

while True:if not (frame.local_int[1] < 100):

break

body_expr.execute(frame)

while (cnt < 100) {cnt := cnt + 1;

}

Classic Optimizations:

Loop Invariant Code Motion

Page 58: Building High-Performance Language Implementations With Low Effort

Compilation Unit based on User Program

Meta-Tracing Partial Evaluation

Guided by AST

58

cnt

1

+cnt:=

ifcnt:=

0

cnt

1+

cnt:=if cnt:=

0

[3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.

[2] Bolz et al., Tracing the Meta-level: PyPy'sTracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.

Page 59: Building High-Performance Language Implementations With Low Effort

WHAT’S POSSIBLE FOR A SIMPLE

INTERPRETER?

Results

59

Page 60: Building High-Performance Language Implementations With Low Effort

Designed for Teaching:

• Simple

• Conceptual Clarity

• An Interpreter family

– in C, C++, Java, JavaScript, RPython, Smalltalk

Used in the past by:

http://som-st.github.io

60

Page 61: Building High-Performance Language Implementations With Low Effort

Self-Optimizing SOMs

61

SOMMERTruffleSOM

Meta-TracingRPython

SOMPETruffleSOM

Partial Evaluation +Graal Compiler

on the HotSpot JVM

JIT Compiled JIT Compiled

github.com/SOM-st/TruffleSOMgithub.com/SOM-st/RTruffleSOM

Page 62: Building High-Performance Language Implementations With Low Effort

Java 8 -server vs. SOM+JITJIT-compiled Peak Performance

62

3.5x slower(min. 1.6x, max. 6.3x)

RPython

2.8x slower(min. 3%, max. 5x)

Truffle+Graal

Compiled

SOMMT

Compiled

SOMPE

●●●

●●●

●●●●●●●●●●●●●●●●●

●●●●

●●

●●

●●●●●●●●●●●●●●●●●

●●●

●●●●●●●

●●

●●●●

●●●●

●●●●●●●●●

●●●●

●●●

●●●●●●●●●●●●●●●●●

●●●●●●

●●●●●●●

●●●●●●●

●●

●●

●●●●●

●●●●●●●●

●●●●●●●●●●

●●

1

4

8

Bo

un

ce

Bu

bble

So

rt

De

lta

Blu

e

Fa

nn

ku

ch

Ma

nd

elb

rot

NB

od

y

Pe

rmu

te

Qu

ee

ns

Qu

ickS

ort

Ric

ha

rds

Sie

ve

Sto

rag

e

Tow

ers

Bo

un

ce

Bu

bble

So

rt

De

lta

Blu

e

Fa

nn

ku

ch

Ma

nd

elb

rot

NB

od

y

Pe

rmu

te

Qu

ee

ns

Qu

ickS

ort

Ric

ha

rds

Sie

ve

Sto

rag

e

Tow

ers

Ru

ntim

e n

orm

aliz

ed

to

Java

(co

mp

iled

or

inte

rpre

ted

)

Page 63: Building High-Performance Language Implementations With Low Effort

Implementation: Smaller Than Lua

63

Meta-TracingSOMMT (RTruffleSOM)

Partial EvaluationSOMPE (TruffleSOM)

KLOC: 1000 Lines of Code, without blank lines and comments

4.2

9.8

16

260

525

562

1 10 100 1000

V8 JavaScript

HotSpotJava Virtual Machine

Dart VM

Lua 5.3 interp.

Page 64: Building High-Performance Language Implementations With Low Effort

CONCLUSION

64

Page 65: Building High-Performance Language Implementations With Low Effort

Simple and Fast Interpreters are Possible!

• Self-optimizing AST interpreters

• RPython or Truffle for JIT Compilation

65

[1] Würthinger et al., Self-Optimizing AST Interpreters, Proc. of the 8th Dynamic Languages Symposium, 2012, pp. 73-82.

[3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.

[4] Marr et al., Are We There Yet? Simple Language Implementation Techniques for the 21st Century. IEEE Software 31(5):60—67, 2014

[2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.

Literature on the ideas:

Page 66: Building High-Performance Language Implementations With Low Effort

RPython• #pypy on irc.freenode.net

• rpython.readthedocs.org

• Kermit Example interpreterhttps://bitbucket.org/pypy/example-interpreter

• A Tutorialhttp://morepypy.blogspot.be/2011/04/tutorial-writing-interpreter-with-pypy.html

• Language implementationshttps://www.evernote.com/shard/s130/sh/4d42a591-c540-4516-9911-c5684334bd45/d391564875442656a514f7ece5602210

Truffle• http://mail.openjdk.java.net/

mailman/listinfo/graal-dev• SimpleLanguage interpreter

https://github.com/OracleLabs/GraalVM/tree/master/graal/com.oracle.truffle.sl/src/com/oracle/truffle/sl

• A Tutorialhttp://cesquivias.github.io/blog/2014/10/13/writing-a-language-in-truffle-part-1-a-simple-slow-interpreter/

• Project– http://www.ssw.uni-

linz.ac.at/Research/Projects/JVM/Truffle.html– http://www.oracle.com/technetwork/oracle-

labs/program-languages/overview/index-2301583.html 66

Big Thank You!to both communities,

for help, answering questions, debugging support, etc…!!!

Page 67: Building High-Performance Language Implementations With Low Effort

Languages: Small, Elegant, and Fast!

67

cnt

1

+cnt:=

ifcnt:=

0

cnt

1+cnt:=

ifcnt:= 0

Compiled

SOMMT

Compiled

SOMPE

●●●

●●●

●●●●●●●●●●●●●●●●●

●●●●

●●

●●

●●●●●●●●●●●●●●●●●

●●●

●●●●●●●

●●

●●●●

●●●●

●●●●●●●●●

●●●●

●●●

●●●●●●●●●●●●●●●●●

●●●●●●

●●●●●●●

●●●●●●●

●●

●●

●●●●●

●●●●●●●●

●●●●●●●●●●

●●

1

4

8B

ou

nce

Bu

bble

So

rt

De

lta

Blu

e

Fa

nn

ku

ch

Ma

nd

elb

rot

NB

od

y

Pe

rmu

te

Qu

ee

ns

Qu

ickS

ort

Ric

ha

rds

Sie

ve

Sto

rag

e

Tow

ers

Bo

un

ce

Bu

bble

So

rt

De

lta

Blu

e

Fa

nn

ku

ch

Ma

nd

elb

rot

NB

od

y

Pe

rmu

te

Qu

ee

ns

Qu

ickS

ort

Ric

ha

rds

Sie

ve

Sto

rag

e

Tow

ers

Ru

ntim

e n

orm

aliz

ed

to

Java

(co

mp

iled

or

inte

rpre

ted

)

3.5x slower(min. 1.6x, max. 6.3x)

4.2 KLOCRPython

2.8x slower(min. 3%, max. 5x)

9.8 KLOCTruffle+Graal

@smarr | http://stefan-marr.de