A Beginner's Journey to Mozilla SpiderMonkey JS Engine (COSCUP 2015)
Why Join SpiderMonkey
• Explore a real language engine implementation
• Good first features that encourage beginners
SpiderMonkey Overview
• Bytecode Generation: JS Source → Compiler → Bytecode
• Bytecode Interpretation: Bytecode → Interpreter → CPU
• Hot Code Optimization: Bytecode → JIT Compiler → Native Code
• Native Code Execution: Native Code → CPU
Bytecode Compiler
• Lexical Analysis
  • Split the source script into a token stream
• Syntactic Analysis
  • Parse the token stream and build an abstract syntax tree (AST)
• Code Generation
  • Traverse the AST to emit bytecode
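The lexical-analysis step can be illustrated with a small hand-written tokenizer. This is a toy and not SpiderMonkey's tokenizer. The token kinds reuse the terminal names from the grammar (`Id`, `LitInt`, `LitFloat`, `LitString`) and add two hypothetical kinds, `Keyword` and `Punct`. It handles only a tiny subset of JavaScript.

```javascript
// Toy tokenizer for a tiny JavaScript subset: identifiers, a few keywords,
// integer/float literals, double-quoted strings (no escapes), and
// single-character punctuators. Token kind names are hypothetical.
const KEYWORDS = new Set(["var", "function", "return", "true", "false", "null", "this"]);

function tokenize(src) {
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (/\s/.test(ch)) {                         // skip whitespace
      i++;
    } else if (/[A-Za-z_$]/.test(ch)) {          // identifier or keyword
      let j = i + 1;
      while (j < src.length && /[\w$]/.test(src[j])) j++;
      const word = src.slice(i, j);
      tokens.push({ kind: KEYWORDS.has(word) ? "Keyword" : "Id", text: word });
      i = j;
    } else if (/[0-9]/.test(ch)) {               // LitInt, or LitFloat if a fraction follows
      let j = i;
      while (j < src.length && /[0-9]/.test(src[j])) j++;
      let kind = "LitInt";
      if (src[j] === "." && /[0-9]/.test(src[j + 1] ?? "")) {
        kind = "LitFloat";
        j++;
        while (j < src.length && /[0-9]/.test(src[j])) j++;
      }
      tokens.push({ kind, text: src.slice(i, j) });
      i = j;
    } else if (ch === '"') {                     // LitString
      const end = src.indexOf('"', i + 1);
      if (end < 0) throw new SyntaxError("unterminated string literal");
      tokens.push({ kind: "LitString", text: src.slice(i + 1, end) });
      i = end + 1;
    } else if ("=+*(),;{}".includes(ch)) {       // single-character punctuators
      tokens.push({ kind: "Punct", text: ch });
      i++;
    } else {
      throw new SyntaxError(`unexpected character '${ch}' at offset ${i}`);
    }
  }
  return tokens;
}

console.log(tokenize('add("coscup", 2015);'));
```

Each branch recognises one token class by its first character and then scans greedily. A production lexer does the same dispatch, but also has to handle escapes, Unicode identifiers, regular-expression literals and much more.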
Syntactic Analysis
VarOrExprs → var Vars | Expr
Vars → Var | Var , Vars
Var → Id | Id = AssignExpr
Expr → AssignExpr | AssignExpr , Expr
AssignExpr → CondExpr | CondExpr AssignOp AssignExpr
. . .
AddExpr → MulExpr | MulExpr + AddExpr
MulExpr → UnaryExpr | UnaryExpr * MulExpr
. . .
PrimaryExpr → ( Expr ) | Id | LitInt | LitFloat | LitString | false | true | null | this
Recursive Descent Parsing: the parser expands rules from top to bottom and consumes tokens from left to right
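Recursive descent maps each grammar rule to one function, and that function calls the functions of the rules it refers to. Below is a minimal sketch covering only a slice of the grammar above: assignment, `+`, `*`, parentheses, identifiers and integers. The AST object shapes are hypothetical; they only borrow the labels used in the Result AST.

```javascript
// Recursive-descent parser for a slice of the grammar above:
//   AssignExpr  → AddExpr | Id = AssignExpr
//   AddExpr     → MulExpr { + MulExpr }
//   MulExpr     → PrimaryExpr { * PrimaryExpr }
//   PrimaryExpr → ( AssignExpr ) | Id | LitInt
// One function per rule. AST node shapes are hypothetical.
function parse(src) {
  const tokens = src.match(/[A-Za-z_$][\w$]*|\d+|\S/g) ?? [];
  let pos = 0;
  const peek = () => tokens[pos];

  function assignExpr() {
    const left = addExpr();
    if (peek() !== "=") return left;
    if (left.type !== "Use") throw new SyntaxError("invalid assignment target");
    pos++;                                              // consume '='
    return { type: "Assignment", def: left.name, value: assignExpr() }; // right-recursive
  }
  function addExpr() {
    let node = mulExpr();
    while (peek() === "+") {                            // loop gives left associativity
      pos++;
      node = { type: "BinaryAdd", left: node, right: mulExpr() };
    }
    return node;
  }
  function mulExpr() {
    let node = primaryExpr();
    while (peek() === "*") {
      pos++;
      node = { type: "BinaryMultiply", left: node, right: primaryExpr() };
    }
    return node;
  }
  function primaryExpr() {
    const tok = tokens[pos++];
    if (tok === "(") {
      const inner = assignExpr();
      if (tokens[pos++] !== ")") throw new SyntaxError("expected ')'");
      return inner;
    }
    if (tok !== undefined && /^\d+$/.test(tok)) return { type: "LitInt", value: Number(tok) };
    if (tok !== undefined && /^[A-Za-z_$]/.test(tok)) return { type: "Use", name: tok };
    throw new SyntaxError(`unexpected token '${tok}'`);
  }

  const ast = assignExpr();
  if (pos !== tokens.length) throw new SyntaxError(`unexpected token '${peek()}'`);
  return ast;
}

console.log(JSON.stringify(parse("x = y + z")));
```

One design choice departs from the slide. The slide's `AddExpr → MulExpr + AddExpr` is right-recursive, but this sketch uses a loop so that `a + b + c` groups as `(a + b) + c`. That grouping matters in JavaScript because `+` also concatenates strings: `"a" + 1 + 2` is `"a12"`.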
Syntactic Analysis: Result AST
Statement List
  Assignment
    Def: x
    BinaryAdd
      Use: y
      Use: z
  Assignment
    Def: a
    BinaryMultiply
      Use: b
      Use: c
Code Generation
(figure: the Result AST drawn as a tree, S = Statement List, traversed to emit the bytecode below)
DefVar x
DefVar a
BindName x
GetName y
GetName z
Add
SetName x
BindName a
GetName b
GetName c
Mul
SetName a
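The traversal that produces this listing can be sketched as a post-order walk: emit both operands, then the operator. This is a minimal sketch under stated assumptions. The AST object shapes and the string encoding of instructions are hypothetical, and every assignment target is treated as a `var`-declared name. It is not SpiderMonkey's bytecode emitter.

```javascript
// Post-order walk of the Result AST that emits stack-machine bytecode,
// reproducing the listing above. AST shapes and instruction strings are
// assumptions of this sketch; every assignment target is treated as a
// `var`-declared name.
function emitProgram(statements) {
  const code = [];
  for (const stmt of statements) code.push(`DefVar ${stmt.def}`); // declarations are hoisted
  for (const stmt of statements) {
    code.push(`BindName ${stmt.def}`);  // push the scope that will receive the value
    emitExpr(stmt.value, code);         // push the right-hand side
    code.push(`SetName ${stmt.def}`);   // store it into the bound name
  }
  return code;
}

function emitExpr(node, code) {
  switch (node.type) {
    case "Use":
      code.push(`GetName ${node.name}`);
      break;
    case "BinaryAdd":
    case "BinaryMultiply":
      emitExpr(node.left, code);        // operands first (post-order) ...
      emitExpr(node.right, code);
      code.push(node.type === "BinaryAdd" ? "Add" : "Mul"); // ... then the operator
      break;
    default:
      throw new Error(`unsupported node type ${node.type}`);
  }
}

const use = (name) => ({ type: "Use", name });
const program = [
  { type: "Assignment", def: "x", value: { type: "BinaryAdd", left: use("y"), right: use("z") } },
  { type: "Assignment", def: "a", value: { type: "BinaryMultiply", left: use("b"), right: use("c") } },
];
console.log(emitProgram(program).join("\n"));
```

Post-order works because the bytecode targets a stack machine. Once an operator instruction runs, both of its operands are already on top of the stack.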
Bytecode Interpreter
• Prepare the stack frame to interpret bytecode
• Dispatch bytecode in a large switch statement
INTERPRETER_LOOP()
  CASE(JSOP_GETNAME) {
    GetNameOperation()
  }
  CASE(JSOP_ADD) {
    AddOperation()
  }
  CASE(JSOP_SETNAME) {
    SetNameOperation()
  }
  ... More Handlers ...
END_LOOP()
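To make the dispatch idea concrete, the sketch below runs the code-generation listing through a single `switch` over opcodes. Opcode names follow the slides. The `[opcode, operand]` encoding and the plain object used as the global scope are simplifying assumptions; SpiderMonkey's real values, scopes and frames are much richer.

```javascript
// Toy switch-dispatch loop for the bytecode from the code-generation
// example. Opcode names follow the slides; the [opcode, operand] encoding
// and the plain object used as the global scope are simplifications.
function interpret(code, globals) {
  const stack = [];                       // operand stack of this frame
  let pc = 0;                             // index of the next instruction
  while (pc < code.length) {
    const [op, operand] = code[pc++];
    switch (op) {                         // one case per opcode, like INTERPRETER_LOOP / CASE
      case "DefVar":
        if (!(operand in globals)) globals[operand] = undefined;
        break;
      case "BindName":
        stack.push(globals);              // scope object that SetName will write to
        break;
      case "GetName":
        if (!(operand in globals)) throw new ReferenceError(`${operand} is not defined`);
        stack.push(globals[operand]);
        break;
      case "Add":
      case "Mul": {
        const rhs = stack.pop();
        const lhs = stack.pop();
        stack.push(op === "Add" ? lhs + rhs : lhs * rhs); // JS + also concatenates strings
        break;
      }
      case "SetName": {
        const value = stack.pop();
        const scope = stack.pop();
        scope[operand] = value;
        stack.push(value);                // an assignment expression yields its value
        break;
      }
      default:
        throw new Error(`unknown opcode ${op}`);
    }
  }
  return stack;
}
```

For example, with `globals = { y: 1, z: 2, b: 3, c: 4 }`, running the twelve instructions above leaves `globals.x === 3` and `globals.a === 12`.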
Interpretation Example
function add (src, dst) {
    return src + dst;
}
add("coscup", 2015);
Bytecode            Stack frame after the instruction
Caller
GetName "add"       Func_add
Undefined           Func_add, Undef (the this value)
String "coscup"     Func_add, Undef, "coscup"
Int16 2015          Func_add, Undef, "coscup", 2015
Call 2              a callee stack frame is created with arguments "coscup", 2015
Callee (add)
GetArg 0            "coscup"
GetArg 1            "coscup", 2015
Add                 "coscup2015"
Return              the callee frame is popped and "coscup2015" is returned to the caller
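The walkthrough above can be replayed with a toy model of call frames. The caller pushes the function, `this` and the arguments. `Call argc` pops them and opens a new frame, `GetArg` reads that frame's arguments, and `Return` hands the result back. The frame layout (`args`, `base`) and the instruction encoding are assumptions of this sketch, not SpiderMonkey's actual frame structure.

```javascript
// Toy model of the call walkthrough above. Frame fields (code, pc, args,
// base) and the [opcode, operand] encoding are illustrative assumptions.
function run(script, globals) {
  const stack = [];
  const frames = [{ code: script, pc: 0, args: [], base: 0 }];
  for (;;) {
    const frame = frames[frames.length - 1];
    if (frame.pc === frame.code.length) {
      if (frames.length === 1) return stack.pop();   // top-level script finished
      throw new Error("function body must end with Return");
    }
    const [op, operand] = frame.code[frame.pc++];
    switch (op) {
      case "GetName":   stack.push(globals[operand]); break;
      case "Undefined": stack.push(undefined); break;
      case "String":
      case "Int16":     stack.push(operand); break;
      case "GetArg":    stack.push(frame.args[operand]); break;
      case "Add": {
        const rhs = stack.pop();
        const lhs = stack.pop();
        stack.push(lhs + rhs);
        break;
      }
      case "Call": {
        const args = stack.splice(stack.length - operand, operand); // the argc arguments
        stack.pop();                                                 // `this`
        const callee = stack.pop();                                  // the function value
        frames.push({ code: callee.code, pc: 0, args, base: stack.length });
        break;
      }
      case "Return": {
        const result = stack.pop();
        stack.length = frame.base;     // drop anything left in the callee's region
        frames.pop();
        stack.push(result);            // the caller sees the call's result
        break;
      }
      default:
        throw new Error(`unknown opcode ${op}`);
    }
  }
}

// add(src, dst) compiled to the callee bytecode from the slides.
const Func_add = { code: [["GetArg", 0], ["GetArg", 1], ["Add"], ["Return"]] };
const caller = [["GetName", "add"], ["Undefined"], ["String", "coscup"], ["Int16", 2015], ["Call", 2]];
console.log(run(caller, { add: Func_add }));
```

In this model, `Return` replaces the function, `this` and the arguments on the caller's stack with the single result value.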
Performance Disadvantage
• Bytecode is executed immediately, without proper redundancy elimination or task-specialized optimization
Example: Object Property Access
Obj.Prop
JS Object
var People = {
    Name : "Me",
    Age : 1,
    Gender : "M"
};
Property Access → Value
People.Name → "Me"
People.Age → 1
People.Gender → "M"
Object Internal
• A list of shapes, each of which represents a named property
• A vector of slots, each of which stores the value of the mapped property
• A shape that describes the object's overall attributes
(figure: the People object)
Shape list:   Name   Age   Gender
Slot vector:  "Me"   1     "M"
Shape:        Attr
Object Property Access
• Object layout traversal
  1. Search the shape list to locate the target property shape
  2. Access the slot vector with the index found in that shape
• To speed up traversal
  • Attach hash tables to some shapes for table indexing
(figure: an object's shape list P1 … Pi … Pj … Pn, with hash tables attached at Pi and Pj)
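The layout and the two-step traversal can be modelled in a few lines. This is a minimal sketch under stated assumptions. The class `ToyObject` and `HASH_THRESHOLD` are hypothetical. The per-shape hash tables are simplified to a single lazily built `Map` per object, created once the shape list gets long.

```javascript
// Toy object layout from the slides: a shape list (one entry per named
// property, recording its slot index) plus a slot vector holding values.
// HASH_THRESHOLD and the single Map index are hypothetical stand-ins for
// the hash tables attached to some shapes.
const HASH_THRESHOLD = 8;   // hypothetical: build an index once the list is this long

class ToyObject {
  constructor() {
    this.shapes = [];   // [{ name, slot }], in insertion order
    this.slots = [];    // property values, indexed by shape.slot
    this.table = null;  // optional Map name -> shape, built lazily
  }
  addProperty(name, value) {            // assumes `name` is a new property
    const shape = { name, slot: this.slots.length };
    this.shapes.push(shape);
    this.slots.push(value);
    if (this.table) this.table.set(name, shape);
    else if (this.shapes.length >= HASH_THRESHOLD) {
      this.table = new Map(this.shapes.map((s) => [s.name, s]));
    }
  }
  lookupShape(name) {
    if (this.table) return this.table.get(name);          // hashed lookup
    for (let i = this.shapes.length - 1; i >= 0; i--) {    // 1. linear search, newest first
      if (this.shapes[i].name === name) return this.shapes[i];
    }
    return undefined;
  }
  get(name) {
    const shape = this.lookupShape(name);
    return shape === undefined ? undefined : this.slots[shape.slot]; // 2. index the slot vector
  }
}

const People = new ToyObject();
People.addProperty("Name", "Me");
People.addProperty("Age", 1);
People.addProperty("Gender", "M");
console.log(People.get("Age"));
```

Every `get` repeats the search, which is the per-access cost the interpreter pays. The hash table only shortens that search; it does not remove it.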
Performance Gap
AoT Compilation: direct access
struct Object {
    int Prop1;
    int Prop2;
};
int prop = obj->Prop2;
mov eax, obj
mov ebx, [eax + 4]
VS
Interpretation: slow object layout traversal
var obj = {
    Prop1 : 1,
    Prop2 : 2,
};
var prop = obj.Prop2;
GetName obj
GetProp Prop2
Can we improve the performance?
In addition to object property access, there are still many other issues…
Interpretation → JIT Compilation
JIT Compilation
• Generates fast native code
• Baseline compiler for hot methods
  • Inline caches speed up dynamic property lookup
• IonMonkey for very hot methods
  • Comprehensive optimization removes redundancy
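The two tiers (Baseline for hot methods, IonMonkey for very hot methods) can be pictured as warm-up counters that swap in faster code. This is an illustrative toy only. `makeTieredFunction` and both thresholds are hypothetical, and the "compiled" tiers are ordinary closures supplied by the caller. SpiderMonkey's real heuristics and mechanisms, such as on-stack replacement, are more involved.

```javascript
// Toy tier-up policy: count calls and swap in a faster implementation once a
// function becomes hot (Baseline) and again when it becomes very hot (Ion).
// Both thresholds and the compile callbacks are hypothetical placeholders.
const BASELINE_THRESHOLD = 10;   // hypothetical "hot" call count
const ION_THRESHOLD = 1000;      // hypothetical "very hot" call count

function makeTieredFunction(interpreted, compileBaseline, compileIon) {
  let calls = 0;
  let tier = "interpreter";
  let impl = interpreted;
  const tiered = (...args) => {
    calls++;
    if (tier === "interpreter" && calls >= BASELINE_THRESHOLD) {
      impl = compileBaseline();    // hot: switch to the Baseline version
      tier = "baseline";
    } else if (tier === "baseline" && calls >= ION_THRESHOLD) {
      impl = compileIon();         // very hot: switch to the optimized version
      tier = "ion";
    }
    return impl(...args);
  };
  tiered.currentTier = () => tier; // for inspection only
  return tiered;
}

const add = (src, dst) => src + dst;
const f = makeTieredFunction(add, () => add, () => add);
for (let i = 0; i < 1000; i++) f("coscup", 2015);
console.log(f.currentTier());  // "ion"
```

The point of counting first is that compilation costs time, so it is spent only on code that has proven to run often.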
Inline Cache
• Objective
  • Mitigate the overhead of object layout traversal on every property access
• Idea
  • Cache the result of the dynamic lookup (where the property's value is stored)
  • Emit a piece of code that accesses that location directly
Inline Cache
• Efficient code for direct access
• But if obj is modified, the code becomes unsafe
var res = obj.prop;
GetName "obj"
GetProp "prop"
→
mov eax, obj
mov eax, [eax + OfstSlot]
Direct Access Guard
• If a property is inserted into or deleted from an object, the object's layout changes
• Executing the cached code may then cause an invalid access
• A guard is needed to check for object modification
  • If the object is unchanged, enter the cached code
  • Otherwise, fall back to dynamic lookup and reoptimize
Direct Access Guard
• Take advantage of the object shape
  • An object has a shape that describes its overall attributes
  • The object shape is kept in sync with its layout
• Use the object shape to guard the cached code
mov eax, obj
cmp [eax + ShapeOfst], CachedShape
Inline Cache Instance
Prologue (call site)
mov eax, obj
call VM_CallBack        → patched to: call Cached_Code
Interpreter Callback
1. Resolve the designated property
2. Generate direct access code
3. Patch the original call site
4. Jump to the cached code
Cached code
cmp [eax + ShapeOfst], CachedShape
jne MISS
mov eax, [eax + CachedSlotOfst]
jmp EXIT
MISS: call VM_CallBack
EXIT:
After the call site is linked, the access is a direct access as long as the shape does not change.
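The guard-then-direct-access logic can be simulated without machine code. This is a minimal sketch under stated assumptions. Objects with the same property order share one `Shape` object, created through a hypothetical transition table. Each shape keeps a flattened name-to-slot map, which simplifies the shape list described earlier. The IC caches exactly one (shape, slot) pair, like the single `CachedShape` above. All names are hypothetical.

```javascript
// Toy monomorphic inline cache. Objects built with the same property order
// share one Shape (via a transition table), so comparing shapes is a cheap
// guard. Class and function names are hypothetical.
class Shape {
  constructor(parent, name) {
    this.slotOf = new Map(parent ? parent.slotOf : []);   // property name -> slot index
    if (name !== undefined) this.slotOf.set(name, this.slotOf.size);
    this.transitions = new Map();                          // added name -> child Shape
  }
  withProperty(name) {                                     // shared child shape per added name
    if (!this.transitions.has(name)) this.transitions.set(name, new Shape(this, name));
    return this.transitions.get(name);
  }
}
const EMPTY_SHAPE = new Shape(null);

function makeObject(props) {
  let shape = EMPTY_SHAPE;
  const slots = [];
  for (const [name, value] of Object.entries(props)) {
    shape = shape.withProperty(name);
    slots.push(value);
  }
  return { shape, slots };
}

// One IC per property-access site, e.g. `obj.Name` inside a function.
function makeGetPropIC(name) {
  let cachedShape = null;
  let cachedSlot = -1;
  const stats = { hits: 0, misses: 0 };
  function get(obj) {
    if (obj.shape === cachedShape) {           // guard: cmp [eax+ShapeOfst], CachedShape
      stats.hits++;
      return obj.slots[cachedSlot];            // direct access through the cached slot
    }
    stats.misses++;                            // MISS: slow path (VM callback)
    const slot = obj.shape.slotOf.get(name);   // dynamic lookup
    if (slot === undefined) return undefined;
    cachedShape = obj.shape;                   // re-link the cache to this shape
    cachedSlot = slot;
    return obj.slots[slot];
  }
  return { get, stats };
}
```

Two objects built as `{ Name, Bow }` share a shape, so the second access through the same IC is a hit. Objects whose shapes alternate at one site miss on nearly every access, because each miss replaces the single cached entry.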
What If ...
var dog = {
    Name : "dog",
    Bow : function () { },
};
var cat = {
    Name : "cat",
    Meow : function () { },
};
for (var i = 0; i < 100; i++) {
    WhoAmI(dog);
    WhoAmI(cat);
}
function WhoAmI (obj) { return obj.Name; }
obj.Name sees: dog, cat, dog, cat, . . .
• dog and cat have different shapes, so the single cached shape misses on almost every call: expensive cache and flush
Polymorphic IC
• Cache multiple object shapes together with their resolved slot offsets
cmp [eax + ShapeOfst], CachedShape1
jne SHAPE2
mov eax, [eax + CachedSlotOfst1]
jmp EXIT
SHAPE2: cmp [eax + ShapeOfst], CachedShape2
jne SHAPE3
mov eax, [eax + CachedSlotOfst2]
jmp EXIT
… … …
MISS: call VM_CallBack
EXIT:
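Extending the cache to several (shape, slot) entries, checked in order like the SHAPE1/SHAPE2 chain above, removes the thrashing in the dog/cat loop. This is a minimal sketch under stated assumptions. `MAX_ENTRIES` is a hypothetical limit. Shapes are shared through a simple registry keyed by property order, which simplifies transition-based shapes.

```javascript
// Toy polymorphic inline cache: up to MAX_ENTRIES (shape, slot) pairs per
// access site, checked in order. MAX_ENTRIES and the registry keyed by
// property order are illustrative assumptions.
const MAX_ENTRIES = 4;
const shapeRegistry = new Map();   // JSON of property names -> shared shape

function makeObject(props) {
  const names = Object.keys(props);
  const key = JSON.stringify(names);
  if (!shapeRegistry.has(key)) {
    shapeRegistry.set(key, { slotOf: new Map(names.map((n, i) => [n, i])) });
  }
  return { shape: shapeRegistry.get(key), slots: Object.values(props) };
}

function makePolymorphicIC(name) {
  const entries = [];                         // [{ shape, slot }]
  const stats = { hits: 0, misses: 0 };
  function get(obj) {
    for (const e of entries) {                // cmp shape; jne to the next entry
      if (e.shape === obj.shape) {
        stats.hits++;
        return obj.slots[e.slot];
      }
    }
    stats.misses++;                           // MISS: fall back to dynamic lookup
    const slot = obj.shape.slotOf.get(name);
    if (slot === undefined) return undefined;
    if (entries.length < MAX_ENTRIES) entries.push({ shape: obj.shape, slot }); // add a stub
    return obj.slots[slot];
  }
  return { get, stats };
}

const ic = makePolymorphicIC("Name");
const dog = makeObject({ Name: "dog", Bow() {} });
const cat = makeObject({ Name: "cat", Meow() {} });
for (let i = 0; i < 100; i++) { ic.get(dog); ic.get(cat); }
console.log(ic.stats);   // { hits: 198, misses: 2 }
```

Only the first access with each shape misses. Once the entry list is full, this sketch stops adding stubs and keeps taking the slow path for new shapes; how a real engine handles such megamorphic sites is beyond this toy.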
IonMonkey
• Translate bytecode into static single assignment (SSA) form and build a control flow graph
• Apply combined data-flow and control-flow optimizations
• Translate the optimized SSA form into native code
Static Single Assignment
• Each expression has at most 3 operands
• Each target operand is assigned exactly once
Original Code    SSA Form
X = 1            X1 = 1
X = 2            X2 = 2
Y = X + 1        Y1 = X2 + 1
Z = 3            Z1 = 3
Y = X + 2        Y2 = X2 + 2
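The renaming in this table can be reproduced mechanically for straight-line code. Each assignment gets a fresh version number, and each use refers to the latest version. This is a sketch that assumes a single basic block and lines of the form `V = expr` with letter-only variable names. Code with branches additionally needs φ-functions at join points, which this sketch deliberately omits.

```javascript
// SSA renaming for straight-line code (one basic block, so no φ-functions
// are needed). Assumes lines "V = expr" with letter-only variable names;
// a variable used before any assignment is treated as an input (version 1).
function toSSA(lines) {
  const version = new Map();                       // variable -> latest version
  const use = (name) => {
    if (!version.has(name)) version.set(name, 1);  // block input, e.g. A -> A1
    return name + version.get(name);
  };
  return lines.map((line) => {
    const [lhs, rhs] = line.split("=").map((s) => s.trim());
    const renamed = rhs.replace(/[A-Za-z]+/g, use); // rename uses before the new definition
    const next = (version.get(lhs) ?? 0) + 1;
    version.set(lhs, next);
    return `${lhs}${next} = ${renamed}`;
  });
}

console.log(toSSA(["X = 1", "X = 2", "Y = X + 1", "Z = 3", "Y = X + 2"]).join("\n"));
```

The right-hand side is renamed before the left-hand side gets its new version, so `Y = Y + 1` correctly reads the old `Y`.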
Control Flow Graph
• The control flow relation among basic blocks
• Basic block
  • Consecutive instructions, the last of which is a control transfer (Goto, Cond)
(figure: example CFG with blocks B1 to B5 joined by two T/F branches; instructions shown: X1 = 3, Y1 = A1 + B1, Z1 = X1 + 3, Cond, V1 = A1 + B1, W1 = B1 - 3, U1 = B1 - 3)
Value Numbering
• Eliminate redundant expressions
Before           After
X1 = A1 + B1     X1 = A1 + B1
Y1 = 1           Y1 = 1
Z1 = A1 + B1     Z1 = X1
• Often combined with other optimizations
  • Constant folding and propagation
  • Expression simplification
  • Unreachable code elimination
Value Numbering
• Assign a hash value to each expression
• An expression that computes the same value as an earlier expression can be reduced, if it has
  • The same set of source values
  • The same operator, taking algebraic commutativity into account
X1 = A1 + B1
Z1 = B1 + A1   →   Z1 = X1
(A1 → V1, B1 → V2; both expressions hash to the key (+, V1, V2), whose value is V3)
Local Scope
Code                          After value numbering
X1 = A1 - B1                  X1 = A1 - B1
X2 = 3                        X2 = 3
Y1 = A1 + B1                  Y1 = A1 + B1
Z1 = 3 + 3                    Z1 = 6            (Constant Folding)
T1 = Z1 + 3                   T1 = 9            (Const Propagation, then folding)
U1 = B1 + A1                  U1 = Y1           (same hash key as Y1)
V1 = B1 * 8                   V1 = B1 << 3      (Expr Simplification)
Final value table
Operand   Value   Hash Key
A1        V1      (A1)
B1        V2      (B1)
3         V3      (3)
8         V4      (8)
X1        V5      (-, V1, V2)
X2        V3      (V3)
Y1        V6      (+, V1, V2)
6         V7      (6)
Z1        V7      (V7)
9         V8      (9)
T1        V8      (V8)
U1        V6      (+, V1, V2)
V1        V9      (<<, V2, V3)
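The walkthrough can be reproduced with a small local value numbering pass. This is a sketch under stated assumptions. Instructions are strings of the form `D = a` or `D = a op b` with `op` one of `+ - *`, and block inputs are numbered first, as in the table. Commutative operands are ordered before hashing. Constant operands are folded, which also propagates constants such as Z1 = 6. Multiplication by a power of two becomes a shift.

```javascript
// Local value numbering for one basic block, following the walkthrough.
// Assumes lines "D = a" or "D = a op b" with op in + - * (hypothetical syntax).
function localValueNumbering(block) {
  const vnOf = new Map();     // operand text (variable or literal) -> value number
  const constOf = new Map();  // value number -> known constant
  const table = new Map();    // hash key, e.g. "(+,V1,V2)" -> value number
  const holder = new Map();   // value number -> first operand text holding it
  let next = 1;

  const valueOf = (text) => {
    if (vnOf.has(text)) return vnOf.get(text);
    const v = next++;
    vnOf.set(text, v);
    holder.set(v, text);
    if (/^-?\d+$/.test(text)) constOf.set(v, Number(text));
    return v;
  };
  const fold = { "+": (a, b) => a + b, "-": (a, b) => a - b, "*": (a, b) => a * b };
  const isPow2 = (c) => Number.isInteger(c) && c >= 2 && (c & (c - 1)) === 0;
  const commutative = new Set(["+", "*"]);

  const parsed = block.map((line) => {
    const [dst, rhs] = line.split("=").map((s) => s.trim());
    const [a, op, b] = rhs.split(/\s+/);
    return op === undefined ? { dst, args: [a] } : { dst, op, args: [a, b] };
  });

  // Number the block inputs (literals and not-yet-defined variables) in order of appearance.
  const defined = new Set();
  for (const { dst, args } of parsed) {
    for (const arg of args) if (!defined.has(arg)) valueOf(arg);
    defined.add(dst);
  }

  const out = [];
  for (const { dst, op, args } of parsed) {
    let [x, y] = args.map(valueOf);
    let v;
    if (op === undefined) {                                  // plain copy: D = a
      v = x;
      out.push(`${dst} = ${args[0]}`);
    } else if (constOf.has(x) && constOf.has(y)) {           // constant folding / propagation
      const c = fold[op](constOf.get(x), constOf.get(y));
      v = valueOf(String(c));
      out.push(`${dst} = ${c}`);
    } else {
      let opr = op;
      if (opr === "*" && constOf.has(x)) [x, y] = [y, x];     // put a constant factor second
      if (opr === "*" && isPow2(constOf.get(y))) {           // expression simplification
        opr = "<<";
        y = valueOf(String(31 - Math.clz32(constOf.get(y)))); // log2 of the factor
      }
      if (commutative.has(opr) && x > y) [x, y] = [y, x];    // canonical operand order
      const key = `(${opr},V${x},V${y})`;
      if (table.has(key)) {                                  // redundant: reuse the earlier value
        v = table.get(key);
        out.push(`${dst} = ${holder.get(v)}`);
      } else {
        v = next++;
        table.set(key, v);
        holder.set(v, dst);
        out.push(`${dst} = ${holder.get(x)} ${opr} ${holder.get(y)}`);
      }
    }
    vnOf.set(dst, v);
  }
  return { code: out, vnOf };
}

const block = ["X1 = A1 - B1", "X2 = 3", "Y1 = A1 + B1", "Z1 = 3 + 3",
               "T1 = Z1 + 3", "U1 = B1 + A1", "V1 = B1 * 8"];
console.log(localValueNumbering(block).code.join("\n"));
```

The value numbers it assigns match the final value table: U1 shares V6 with Y1, and V1 receives V9 under the key (<<, V2, V3).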
Extend to Global Scope
• Requires analysis of the dominance relation in the CFG
• For expressions e1 and e2, e2 can be reduced if
  • e2 has the same value as e1
  • e1 dominates e2 in the CFG, that is, every path from the entry point to e2 goes through e1
• Examine basic blocks in reverse post order
  • This guarantees that dominating expressions are handled first
Global Scope
(figure: the example CFG with blocks B1 to B5, now also showing T1 = A1 - B1 and the branch condition Z1 > 3, true to B2 and false to B3)
• Dominating relation
  • B1 dominates B2, B3, B4, B5
• Reverse post order
  • B1, B3, B2, B5, B4
• In B1
  • Z1 = 6, so Z1 > 3 is always true
  • B3 is removed via unreachable code elimination (UCE)
• In B4
  • V1 = Y1 (A1 + B1 is already computed in B1, which dominates B4)
  • W1 cannot be simplified (U1 = B1 - 3 computes the same value but does not dominate W1)
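Both facts the global pass relies on, reverse post order and dominance, can be computed with a short sketch. One assumption matters. The slide does not show where B3's outgoing edge leads, so this sketch assumes B3 jumps to B4. Successors are listed in T, F order. Dominators are computed with the standard iterative data-flow formulation.

```javascript
// Reverse post order and dominator sets for the Global Scope example.
// ASSUMPTION: the slide does not show where B3's outgoing edge goes; this
// sketch assumes B3 jumps to B4. Successors are listed as [T, F].
const cfg = {
  B1: ["B2", "B3"],
  B2: ["B4", "B5"],
  B3: ["B4"],
  B4: [],
  B5: [],
};

function reversePostOrder(graph, entry) {
  const visited = new Set();
  const post = [];
  const dfs = (b) => {
    visited.add(b);
    for (const s of graph[b]) if (!visited.has(s)) dfs(s);
    post.push(b);                        // finished after all successors
  };
  dfs(entry);
  return post.reverse();
}

// Iterative data-flow: Dom(entry) = {entry} and
// Dom(b) = {b} ∪ (intersection of Dom(p) over all predecessors p),
// recomputed in reverse post order until nothing changes.
function dominators(graph, entry) {
  const order = reversePostOrder(graph, entry);
  const preds = Object.fromEntries(order.map((b) => [b, []]));
  for (const b of order) for (const s of graph[b]) preds[s].push(b);
  const dom = Object.fromEntries(order.map((b) => [b, new Set(order)])); // start from "all blocks"
  dom[entry] = new Set([entry]);
  let changed = true;
  while (changed) {
    changed = false;
    for (const b of order) {
      if (b === entry) continue;
      const [first, ...rest] = preds[b];
      const meet = new Set([...dom[first]].filter((d) => rest.every((p) => dom[p].has(d))));
      meet.add(b);
      if (meet.size !== dom[b].size) {   // sets only shrink, so a size change means a change
        dom[b] = meet;
        changed = true;
      }
    }
  }
  return dom;
}

const dom = dominators(cfg, "B1");
console.log(reversePostOrder(cfg, "B1").join(", "));   // B1, B3, B2, B5, B4
```

Under this assumed edge, the computed order matches the slide's B1, B3, B2, B5, B4. B1 dominates every block, which is why V1 = A1 + B1 in B4 can reuse Y1. No block other than B1 dominates B4, so an equal expression elsewhere (such as in B5) cannot replace W1.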
Loop Invariant Code Motion
• Hoist loop-invariant expressions out of the loop
• For a loop-invariant expression x = y + z
  • y and z must not depend on operands defined in the loop
Loop Invariant Code Motion
(figure: blocks B1, B2, B3; B3 is inside the loop, which tests V1 < 100)
Before
B1: X1 = A1 + B1
B3: Y1 = X1 + 3
    Z1 = Y1 + A1
    T1 = A1 - B1
    U1 = T1 + 3
    V1 = Y1 + U1
• Invariant expressions
  • e1: Y1 = X1 + 3
  • e2: T1 = A1 - B1
• Hoist e1 and e2 from B3 to B1
After
B1: X1 = A1 + B1
    Y1 = X1 + 3
    T1 = A1 - B1
B3: Z1 = Y1 + A1
    U1 = T1 + 3
    V1 = Y1 + U1
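The invariance rule on the slide can be applied mechanically. An instruction is hoisted when none of its variable operands is defined inside the loop. This sketch assumes the same `D = a op b` string syntax as the earlier sketches and makes a single pass, exactly as the rule is stated. `hoistInvariants` is a hypothetical name.

```javascript
// Single-pass loop-invariant detection matching the rule on the slide:
// x = y op z is invariant when neither y nor z is defined inside the loop.
// Instruction syntax "D = a op b" is an assumption of this sketch.
function hoistInvariants(preheader, loopBody) {
  const parse = (line) => {
    const [dst, rhs] = line.split("=").map((s) => s.trim());
    // keep only variable operands; literals such as 3 are always invariant
    return { dst, operands: rhs.split(/\s+/).filter((t) => /^[A-Za-z]\w*$/.test(t)) };
  };
  const definedInLoop = new Set(loopBody.map((line) => parse(line).dst));
  const hoisted = [];
  const remaining = [];
  for (const line of loopBody) {
    const invariant = parse(line).operands.every((op) => !definedInLoop.has(op));
    (invariant ? hoisted : remaining).push(line);
  }
  return { preheader: [...preheader, ...hoisted], loopBody: remaining };
}

const result = hoistInvariants(
  ["X1 = A1 + B1"],
  ["Y1 = X1 + 3", "Z1 = Y1 + A1", "T1 = A1 - B1", "U1 = T1 + 3", "V1 = Y1 + U1"],
);
console.log(result);
```

The result matches the After listing: Y1 and T1 move to B1 and the rest stay in B3. A production pass must also check that hoisting is safe (no side effects or exceptions). It typically repeats the check until nothing changes, because hoisting one instruction can make others invariant.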
More Optimizations
• SSA and control flow optimizations
  • Dead code elimination
  • Value range analysis
  • Loop unrolling
  • And more . . .
• Native code generation
  • Linear scan register allocation
  • And more . . .
Conclusion
• Under the hood of SpiderMonkey
• General but slow bytecode interpretation
• Two-level JIT optimization for hot code
About Me
Security Researcher from DSNS Lab @ NCTU
• Interests
  • Virtual Machines
  • Binary Translation
• Current Work
  • Android Code Obfuscation
  • App Protection