(COSCUP 2015) A Beginner's Journey to Mozilla SpiderMonkey JS Engine


Upload: zong-shen-shen

Post on 17-Aug-2015


COSCUP 2015

ZongShen Shen


A Beginner’s Journey to Mozilla SpiderMonkey JS Engine

Why Join SpiderMonkey

• Explore a real language engine implementation

• Good First Features encouraging beginners

About the Talk

• Under the hood of engine implementation

• Beginner’s view and experience sharing

Outline

• Bytecode & Interpreter Basics

• JIT Optimization

SpiderMonkey Overview

[Figure: engine pipeline, built up over four slides]

• Bytecode Generation: JS Source → Compiler → Bytecode

• Bytecode Interpretation: Bytecode → Interpreter → CPU

• Hot Code Optimization: Bytecode → JIT Compiler → NativeCode

• Native Code Execution: NativeCode → CPU

Bytecode Compiler

• Lexical Analysis

• Split the source script into token stream

• Syntactic Analysis

• Parse token stream and build Abstract Syntax Tree

• Code Generation

• Traverse the AST to emit bytecode

Lexical Analysis

var x = y + z ;

var a = b * c ;

Token kinds:

var → Variable

x, y, z → Name

= → Assignment

+ → Add

; → Semicolon
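
The splitting step can be sketched as a small tokenizer. This is only an illustration, not SpiderMonkey's scanner: the function name `tokenize`, the `Number` and `Mul` kinds, and the single regular expression are assumptions made for the example, while the other token kinds follow the labels on the slide.

```javascript
// Illustrative tokenizer; kind names follow the slide labels where available.
const KEYWORDS = new Set(["var"]);
const PUNCTUATORS = new Map([
  ["=", "Assignment"],
  ["+", "Add"],
  ["*", "Mul"],
  [";", "Semicolon"],
]);

function tokenize(source) {
  const tokens = [];
  // Sticky regex: skip whitespace, then match one identifier, integer, or punctuator.
  const pattern = /\s*([A-Za-z_$][\w$]*|\d+|[=+*;])/y;
  let pos = 0;
  while (pos < source.length) {
    pattern.lastIndex = pos;
    const match = pattern.exec(source);
    if (match === null) {
      if (source.slice(pos).trim() === "") break; // only trailing whitespace left
      throw new SyntaxError(`Unexpected character at offset ${pos}`);
    }
    const text = match[1];
    let kind;
    if (KEYWORDS.has(text)) kind = "Variable";
    else if (PUNCTUATORS.has(text)) kind = PUNCTUATORS.get(text);
    else if (/^\d/.test(text)) kind = "Number";
    else kind = "Name";
    tokens.push({ kind, text });
    pos = pattern.lastIndex;
  }
  return tokens;
}

const kinds = tokenize("var x = y + z ;").map((token) => token.kind);
console.log(kinds.join(" "));
```

A real scanner also handles strings, comments, and multi-character operators; the sketch only covers the two statements shown above.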

VarOrExprs → var Vars | Expr

Vars → Var | Var, Vars

Var → Id | Id = AssignExpr

Expr → AssignExpr | AssignExpr, Expr

AssignExpr → CondExpr | CondExpr AssignOp AssignExpr

AddExpr → MulExpr | MulExpr + AddExpr

MulExpr → UnaryExpr | UnaryExpr * MulExpr

PrimaryExpr → (Expr) | Id | LitInt | LitFloat | LitString

| false | true | null | this

Syntactic Analysis

. . .

Recursive Descent Parsing

. . .

Top to Bottom

Left to Right

Syntactic Analysis

Result AST

Statement List
    Assignment
        Def : x
        BinaryAdd
            Use : y
            Use : z
    Assignment
        Def : a
        BinaryMultiply
            Use : b
            Use : c
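
A recursive descent parser for a small subset of the grammar can be sketched as follows. It is a simplified illustration under stated assumptions: only `var Id = AddExpr ;` statements are accepted, the input is already split into token strings, and the function name `parseStatements` and the node field names are hypothetical. The node types (`Assignment`, `BinaryAdd`, `BinaryMultiply`, `Use`) follow the labels in the result AST.

```javascript
// One function per grammar rule; each consumes tokens from left to right.
function parseStatements(tokens) {
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];
  const expect = (text) => {
    if (next() !== text) throw new SyntaxError(`Expected "${text}" near token ${pos}`);
  };

  // PrimaryExpr → ( Expr ) | Id | LitInt
  function parsePrimary() {
    const tok = next();
    if (tok === undefined) throw new SyntaxError("Unexpected end of input");
    if (tok === "(") {
      const inner = parseAdd();
      expect(")");
      return inner;
    }
    if (/^\d+$/.test(tok)) return { type: "LitInt", value: Number(tok) };
    if (/^[A-Za-z_$][\w$]*$/.test(tok)) return { type: "Use", name: tok };
    throw new SyntaxError(`Unexpected token "${tok}"`);
  }

  // MulExpr → PrimaryExpr | PrimaryExpr * MulExpr
  function parseMul() {
    const left = parsePrimary();
    if (peek() !== "*") return left;
    next();
    return { type: "BinaryMultiply", left, right: parseMul() };
  }

  // AddExpr → MulExpr | MulExpr + AddExpr
  function parseAdd() {
    const left = parseMul();
    if (peek() !== "+") return left;
    next();
    return { type: "BinaryAdd", left, right: parseAdd() };
  }

  // Statement → var Id = AddExpr ;
  const body = [];
  while (pos < tokens.length) {
    expect("var");
    const def = next();
    expect("=");
    const value = parseAdd();
    expect(";");
    body.push({ type: "Assignment", def, value });
  }
  return { type: "StatementList", body };
}

const ast = parseStatements("var x = y + z ; var a = b * c ;".split(" "));
console.log(JSON.stringify(ast, null, 2));
```

Because `parseAdd` calls `parseMul` before looking for `+`, multiplication binds tighter than addition without any explicit precedence table.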

Code Generation

[Figure: the AST for x = y + z and a = b * c under statement list S, traversed node by node while bytecode is emitted]

Emitted bytecode:

DefVar x
DefVar a
BindName x
GetName y
GetName z
Add
SetName x
BindName a
GetName b
GetName c
Mul
SetName a
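
The traversal can be sketched as a post-order walk over the AST. This is a simplified model, not the engine's emitter: the node layout, the helper names `emitExpr` and `emitProgram`, and the textual opcode format are assumptions for illustration. The opcode names and their order follow the sequence listed above.

```javascript
// Hand-built AST for: var x = y + z; var a = b * c;
const program = {
  type: "StatementList",
  body: [
    { type: "Assignment", def: "x",
      value: { type: "BinaryAdd", left: { type: "Use", name: "y" }, right: { type: "Use", name: "z" } } },
    { type: "Assignment", def: "a",
      value: { type: "BinaryMultiply", left: { type: "Use", name: "b" }, right: { type: "Use", name: "c" } } },
  ],
};

// Operands are emitted before their operator, so a stack machine can run the result.
function emitExpr(node, out) {
  switch (node.type) {
    case "Use":
      out.push(`GetName ${node.name}`);
      break;
    case "BinaryAdd":
    case "BinaryMultiply":
      emitExpr(node.left, out);
      emitExpr(node.right, out);
      out.push(node.type === "BinaryAdd" ? "Add" : "Mul");
      break;
    default:
      throw new Error(`Unsupported node type ${node.type}`);
  }
}

function emitProgram(ast) {
  const out = [];
  // Variable definitions come first, as in the listing above.
  for (const stmt of ast.body) out.push(`DefVar ${stmt.def}`);
  for (const stmt of ast.body) {
    out.push(`BindName ${stmt.def}`);
    emitExpr(stmt.value, out);
    out.push(`SetName ${stmt.def}`);
  }
  return out;
}

const bytecode = emitProgram(program);
console.log(bytecode.join("\n"));
```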

Bytecode Interpreter

• Prepare the stack frame to interpret bytecode

• Dispatch bytecode in a large switch statement

INTERPRETER_LOOP ( )

CASE ( JSOP_GETNAME ) {
    GetNameOperation( )
}
CASE ( JSOP_ADD ) {
    AddOperation( )
}
CASE ( JSOP_SETNAME ) {
    SetNameOperation( )
}
... ... More Handlers ... ...
END_LOOP ( )

Interpretation Example

function add (src, dst) {
    return src + dst;
}

add("coscup", 2015);

Caller bytecode:

GetName "add"
Undefined
String "coscup"
Int16 2015
Call 2

Callee bytecode:

GetArg 0
GetArg 1
Add
Return

Stack frame after each bytecode (JSVal entries):

Bytecode           Caller                                  Callee
GetName "add"      Func_add
Undefined          Func_add, Undef
String "coscup"    Func_add, Undef, "coscup"
Int16 2015         Func_add, Undef, "coscup", 2015
Call 2             Func_add, Undef, "coscup", 2015         args: "coscup", 2015
GetArg 0           Func_add, Undef, "coscup", 2015         args: "coscup", 2015; stack: "coscup"
GetArg 1           Func_add, Undef, "coscup", 2015         args: "coscup", 2015; stack: "coscup", 2015
Add                Func_add, Undef, "coscup", 2015         args: "coscup", 2015; stack: "coscup2015"
Return             Func_add, Undef, "coscup", 2015         frame popped; result: "coscup2015"
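
The dispatch loop and the callee side of this example can be sketched as a tiny stack machine. It is a toy model under stated assumptions: bytecode is represented as `[opcode, operand]` pairs, arguments are passed as a plain array, and the function name `interpret` is hypothetical. Only the opcodes used by the body of `add` are handled.

```javascript
// Runs one function body: a switch statement dispatches on each opcode.
function interpret(code, args) {
  const stack = []; // operand stack of the callee frame
  let pc = 0;       // index of the next bytecode to execute
  for (;;) {
    if (pc >= code.length) throw new Error("Fell off the end of the bytecode");
    const [op, operand] = code[pc++];
    switch (op) {
      case "GetArg":
        stack.push(args[operand]);
        break;
      case "Add": {
        const rhs = stack.pop();
        const lhs = stack.pop();
        stack.push(lhs + rhs); // JS "+" concatenates when one side is a string
        break;
      }
      case "Return":
        return stack.pop();
      default:
        throw new Error(`Unknown opcode ${op}`);
    }
  }
}

// Body of: function add(src, dst) { return src + dst; }
const addBody = [["GetArg", 0], ["GetArg", 1], ["Add"], ["Return"]];
const result = interpret(addBody, ["coscup", 2015]);
console.log(result);
```

Every bytecode pays for the dispatch and for generic handling of any operand type, which is the cost the following slides discuss.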

Performance Disadvantage

• Immediate execution without proper redundancy elimination and task-specialized optimization

Example: Object Property Access

Obj.Prop

JS Object

var People = {
    Name : "Me",
    Age : 1,
    Gender : "M"
};

Property: Name, Age, Gender

Value: "Me", 1, "M"

Property Access: People.Name, People.Age, People.Gender

Object Internal

• A list of shapes, each of which represents a named property

• A vector of slots, each of which stores the value of the mapped property

• A shape to describe its overall attributes

[Figure: an Object with a Shape List (Name, Age, Gender), a Slot Vector ("Me", 1, "M"), and an Attr Shape]

Object Property Access

• Object layout traversal

1. Search the shape list to locate the target property shape

2. Access the slot vector with the index found in the shape

• To speed up traversal

• Attach hash tables to some shapes for table indexing

[Figure: an Object pointing to a shape list P1 ... Pi ... Pj ... Pn, with a hash table that indexes Pi and Pj]
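
The two lookup steps, and the optional hash table, can be sketched with plain JavaScript data structures. This is a simplified model and not SpiderMonkey's real object layout: `makeObject`, `attachTable`, `getProp`, and the field names are hypothetical, and the table is attached to the whole object rather than to individual shapes.

```javascript
// Each shape names one property and records the slot index holding its value.
function makeObject(entries) {
  const shapes = [];
  const slots = [];
  for (const [name, value] of entries) {
    shapes.push({ name, slot: slots.length });
    slots.push(value);
  }
  return { shapes, slots, table: null };
}

// Optional speed-up: index the shapes by property name.
function attachTable(obj) {
  obj.table = new Map(obj.shapes.map((shape) => [shape.name, shape]));
}

function getProp(obj, name) {
  // Step 1: locate the shape, by table when present, otherwise by linear search.
  const shape = obj.table
    ? obj.table.get(name)
    : obj.shapes.find((candidate) => candidate.name === name);
  if (shape === undefined) return undefined;
  // Step 2: read the slot vector at the index stored in the shape.
  return obj.slots[shape.slot];
}

const people = makeObject([["Name", "Me"], ["Age", 1], ["Gender", "M"]]);
console.log(getProp(people, "Age"));
attachTable(people);
console.log(getProp(people, "Gender"));
```

Either way, every access repeats a search before it can touch the value, which is the overhead the comparison on the next slide highlights.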

Performance Gap

AoT Compilation: direct access

struct Object {
    int Prop1;
    int Prop2;
};
int prop = obj->Prop2;

lea eax, obj
mov ebx, [eax + 4]

VS

Interpretation: slow object layout traversal

var obj = {
    Prop1 : 1,
    Prop2 : 2,
}
var prop = obj.Prop2;

GetName obj
GetProp Prop2

Can we improve the performance?

In addition to object property access, there are still many issues…

Interpretation

JIT Compilation

JIT Compilation

• Generate extremely fast native code

• Baseline for hot methods

• Inline cache to speed up dynamic property lookup

• IonMonkey for very hot methods

• Comprehensive optimization to remove redundancy

Inline Cache

• Objective

• Mitigate the overhead of object layout traversal for each single property access

• Idea

• Cache the resolved value after dynamic lookup

• Emit a piece of direct access code for that value

Inline Cache

var res = obj.prop;

GetName "obj"
GetProp "prop"    (dynamic lookup logic)

• Efficient code for direct access

mov eax, obj
mov eax, [eax + OfstSlot]

• But if obj is modified, the code will be unsafe

Direct Access Guard

• If an object is modified by property insertion or deletion, its layout also changes

• Executing the cached code may then cause an invalid access

• Need a guard to check for object modification

• If the object remains the same, enter the cached code

• Otherwise, fall back to dynamic lookup and reoptimize

Direct Access Guard

• Benefit from object shape

• Object has a shape to describe its overall attributes

• The object shape is synchronized with its layout

• Applying object shape to guard the cached code

mov eax, obj
cmp [eax + ShapeOfst], CachedShape

Inline Cache Instance

Prologue

mov eax, obj
call VM_CallBack    (patched to: call Cached_Code)

Interpreter Callback

1. Resolve designated property

2. Generate direct access code

3. Modify original call site

4. Jump to cached code

Cached code

cmp [eax+ShapeOfst], CachedShape
jne MISS
mov eax, [eax+CachedSlotOfst]
jmp EXIT
MISS:
call VM_CallBack
EXIT:

After code linking, it will be a direct access if the shape has not changed.
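
The guard-then-direct-access pattern can be imitated in JavaScript itself. This is a simulation under stated assumptions, not generated machine code: objects are modeled as `{ shape, slots }`, a shape is an object holding a name-to-slot map, and `makeGetPropSite` is a hypothetical stand-in for one patched call site. The slow path plays the role of `VM_CallBack`.

```javascript
// One cache per call site: remembers the last shape seen and the slot it resolved to.
function makeGetPropSite(name) {
  let cachedShape = null; // plays the role of CachedShape
  let cachedSlot = -1;    // plays the role of CachedSlotOfst
  const stats = { hits: 0, misses: 0 };

  function getProp(obj) {
    // Guard: same shape means same layout, so the cached slot is still valid.
    if (obj.shape === cachedShape) {
      stats.hits++;
      return obj.slots[cachedSlot];
    }
    // Miss: dynamic lookup, then re-cache for the next access.
    stats.misses++;
    const slot = obj.shape.slotOf.get(name);
    if (slot === undefined) return undefined;
    cachedShape = obj.shape;
    cachedSlot = slot;
    return obj.slots[slot];
  }
  return { getProp, stats };
}

const pointShape = { slotOf: new Map([["x", 0], ["y", 1]]) };
const p = { shape: pointShape, slots: [3, 4] };
const q = { shape: pointShape, slots: [5, 6] };

const site = makeGetPropSite("x");
const values = [site.getProp(p), site.getProp(q), site.getProp(p)];

// An object with a different layout fails the guard and takes the slow path once.
const otherShape = { slotOf: new Map([["z", 0], ["x", 1]]) };
const r = { shape: otherShape, slots: [9, 7] };
values.push(site.getProp(r));
console.log(values, site.stats);
```

Comparing shapes by identity keeps the guard to a single comparison, which mirrors the single `cmp` in the cached code.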

What If ...

var dog = {
    Name : "dog",
    Bow : function( ){ },
}

var cat = {
    Name : "cat",
    Meow : function( ){ },
}

for (var i = 0 ; i < 100 ; i++) {
    WhoAmI(dog);
    WhoAmI(cat);
}

function WhoAmI (obj) {
    return obj.Name;
}

dog cat dog cat . . .

Expensive cache and flush

Polymorphic IC

• Cache multiple sets of object shapes and the resolved values

cmp [eax+ShapeOfst], CachedShape1
jne SHAPE2
mov eax, [eax+CachedSlotOfst1]
jmp EXIT
SHAPE2:
cmp [eax+ShapeOfst], CachedShape2
jne SHAPE3
mov eax, [eax+CachedSlotOfst2]
jmp EXIT
… … …
MISS:
call VM_CallBack
EXIT:
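
A polymorphic cache can be simulated the same way, as a short chain of shape checks. This is a sketch under stated assumptions: objects are modeled as `{ shape, slots }`, the names `makePolymorphicSite` and `MAX_ENTRIES` are hypothetical, and the limit of four entries is an arbitrary choice for the example rather than the engine's value.

```javascript
const MAX_ENTRIES = 4; // assumed limit on cached shapes per call site

function makePolymorphicSite(name) {
  const entries = []; // each entry pairs a shape with the slot it resolves to
  let misses = 0;

  function getProp(obj) {
    // Chain of guards: try each cached shape in turn.
    for (const entry of entries) {
      if (entry.shape === obj.shape) return obj.slots[entry.slot];
    }
    // No guard matched: dynamic lookup, then extend the chain if there is room.
    misses++;
    const slot = obj.shape.slotOf.get(name);
    if (slot === undefined) return undefined;
    if (entries.length < MAX_ENTRIES) entries.push({ shape: obj.shape, slot });
    return obj.slots[slot];
  }
  getProp.missCount = () => misses;
  return getProp;
}

const dogShape = { slotOf: new Map([["Name", 0], ["Bow", 1]]) };
const catShape = { slotOf: new Map([["Name", 0], ["Meow", 1]]) };
const dog = { shape: dogShape, slots: ["dog", function () {}] };
const cat = { shape: catShape, slots: ["cat", function () {}] };

const whoAmI = makePolymorphicSite("Name");
const seen = [];
for (let i = 0; i < 100; i++) {
  seen.push(whoAmI(dog));
  seen.push(whoAmI(cat));
}
console.log(whoAmI.missCount());
```

With a single-entry cache the alternating dog and cat calls would miss on every access; with two entries cached, only the first access for each shape takes the slow path.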

IonMonkey

• Translate bytecode to static single assignment form (SSA) and build control flow graph

• Apply data and control flow hybrid optimization

• Translate optimized SSAs to native code

Warm up for basic terms…

Static Single Assignment

• Each expression has at most 3 operands

• Each target operand has a unique assignment

Original Code        SSA Form
X = 1                X1 = 1
X = 2                X2 = 2
Y = X + 1            Y1 = X2 + 1
Z = 3                Z1 = 3
Y = X + 2            Y2 = X2 + 2
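
For straight-line code, the renaming can be sketched with a per-variable version counter. This is a minimal illustration with a hypothetical statement format (`target` plus an `expr` array) and function name `toSSA`; it does not handle branches, which require phi nodes at merge points.

```javascript
// Renames every assignment target to a fresh version and every use to the latest version.
function toSSA(statements) {
  const latest = new Map(); // variable name -> most recent version number
  return statements.map(({ target, expr }) => {
    // Rename uses first, so "X = X + 1" would read the previous version of X.
    const renamed = expr.map((part) =>
      latest.has(part) ? `${part}${latest.get(part)}` : String(part)
    );
    const version = (latest.get(target) ?? 0) + 1;
    latest.set(target, version);
    return `${target}${version} = ${renamed.join(" ")}`;
  });
}

const original = [
  { target: "X", expr: [1] },
  { target: "X", expr: [2] },
  { target: "Y", expr: ["X", "+", 1] },
  { target: "Z", expr: [3] },
  { target: "Y", expr: ["X", "+", 2] },
];
const ssa = toSSA(original);
console.log(ssa.join("\n"));
```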

Control Flow Graph

• The control flow relation among basic blocks

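• Basic block

• Consecutive instructions with the last one as a control transfer

[Figure: control flow graph of basic blocks B1 to B5, with T/F branch edges from B1 to B2 and B3 and from B2 to B4 and B5; block terminators are labeled Cond or Goto. Statements shown: X1 = 3, Y1 = A1+B1, Z1 = X1+3, V1 = A1+B1, W1 = B1-3, U1 = B1-3]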

Let's start the optimizations…

Value Numbering

• Eliminate redundant expressions

Before               After
X1 = A1 + B1         X1 = A1 + B1
Y1 = 1               Y1 = 1
Z1 = A1 + B1         Z1 = X1

• Often combined with other optimizations

• Constant folding and propagation

• Expression simplification

• Unreachable code elimination

Value Numbering

• Assign a hash value to each expression

• An expression that has the same value as a former expression can be reduced

• Same set of source values

• Same operator, taking algebraic commutativity into account

X1 = A1 + B1

Z1 = B1 + A1    →    Z1 = X1

Hash Key: (+, V1, V2)    Value: V3

Local Scope

Original          Rewritten          Optimization
X1 = A1 - B1
X2 = 3
Y1 = A1 + B1
Z1 = 3 + 3        Z1 = 6             Constant Folding
T1 = Z1 + 3       T1 = 9             Const Propagation
U1 = B1 + A1      U1 = Y1
V1 = B1 * 8       V1 = B1 << 3       Expr Simplification

Operand    Value    Hash Key
A1         V1       (A1)
B1         V2       (B1)
3          V3       (3)
8          V4       (8)
6          V7       (6)
9          V8       (9)
X1         V5       (-, V1, V2)
X2         V3       (V3)
Y1         V6       (+, V1, V2)
Z1         V7       (V7)
T1         V8       (V8)
U1         V6       (+, V1, V2)
V1         V9       (<<, V2, V3)
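
Local value numbering for one basic block can be sketched with a few hash maps. This is a simplified illustration, not IonMonkey's pass: the instruction format, the function name `valueNumber`, and the key strings are assumptions, value numbers are handed out lazily so they differ from the table above, and the `B1 * 8` to `B1 << 3` simplification is left out.

```javascript
const COMMUTATIVE = new Set(["+", "*"]);
const FOLD = new Map([
  ["+", (a, b) => a + b],
  ["-", (a, b) => a - b],
  ["*", (a, b) => a * b],
]);

function valueNumber(block) {
  const numberOfKey = new Map();  // hash key -> value number
  const numberOfName = new Map(); // SSA name -> value number
  const constantOf = new Map();   // value number -> constant, when known
  const holderOf = new Map();     // value number -> first SSA name that computed it
  let nextNumber = 1;

  const numberFor = (key) => {
    if (!numberOfKey.has(key)) numberOfKey.set(key, nextNumber++);
    return numberOfKey.get(key);
  };
  const numberOfOperand = (operand) => {
    if (typeof operand === "number") {
      const vn = numberFor(`const:${operand}`);
      constantOf.set(vn, operand);
      return vn;
    }
    // Names not defined in this block are treated as block inputs.
    if (!numberOfName.has(operand)) numberOfName.set(operand, numberFor(`input:${operand}`));
    return numberOfName.get(operand);
  };

  return block.map(({ target, op, args }) => {
    let vns = args.map(numberOfOperand);
    if (op === undefined) { // plain copy such as X2 = 3
      numberOfName.set(target, vns[0]);
      return `${target} = ${args[0]}`;
    }
    // Constant folding and propagation: all operands have known constant values.
    if (FOLD.has(op) && vns.every((vn) => constantOf.has(vn))) {
      const folded = FOLD.get(op)(...vns.map((vn) => constantOf.get(vn)));
      numberOfName.set(target, numberOfOperand(folded));
      return `${target} = ${folded}`;
    }
    // Commutative operators get a canonical operand order, so A+B and B+A share a key.
    if (COMMUTATIVE.has(op)) vns = [...vns].sort((a, b) => a - b);
    const key = `${op},${vns.join(",")}`;
    if (numberOfKey.has(key)) { // redundant: reuse the earlier result
      const vn = numberOfKey.get(key);
      numberOfName.set(target, vn);
      return `${target} = ${holderOf.get(vn)}`;
    }
    const vn = numberFor(key);
    numberOfName.set(target, vn);
    holderOf.set(vn, target);
    return `${target} = ${args[0]} ${op} ${args[1]}`;
  });
}

const optimized = valueNumber([
  { target: "X1", op: "-", args: ["A1", "B1"] },
  { target: "X2", args: [3] },
  { target: "Y1", op: "+", args: ["A1", "B1"] },
  { target: "Z1", op: "+", args: [3, 3] },
  { target: "T1", op: "+", args: ["Z1", 3] },
  { target: "U1", op: "+", args: ["B1", "A1"] },
  { target: "V1", op: "*", args: ["B1", 8] },
]);
console.log(optimized.join("\n"));
```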

Extend to Global Scope

• Require analysis for dominating relation in CFG

• For exprs e1 and e2, e2 can be reduced if

• e2 has the same value as e1

• e1 dominates e2 in CFG, that is, all paths from the entry point to e2 must go through e1

• Examine basic blocks in reverse post order

• Guarantee dominating exprs are handled first

Global Scope

[Figure: control flow graph of basic blocks B1 to B5. B1 branches on Cond Z1 > 3 (T to B2, F to B3); B2 branches on Cond (T/F) to B4 and B5; one block ends with Goto. Statements shown: X1 = 3, Y1 = A1+B1, Z1 = X1+3, T1 = A1 - B1, V1 = A1+B1, W1 = B1-3, U1 = B1-3. After Z1 is folded, B3 and the F edge from B1 are removed.]

• Dominating relation

• B1 dominates B2, B3, B4, B5

• Reverse post order

• B1, B3, B2, B5, B4

• In B1

• Z1 = 6

• B3 is removed via UCE

• In B4

• V1 = Y1

• W1 cannot be simplified
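
The block ordering and the dominance test can be sketched as follows. The edge list is an assumption read off the figure (B1 to B2 and B3, B2 to B4 and B5, with the Goto target omitted), and `reversePostOrder` and `dominates` are hypothetical helper names. The dominance check uses the definition directly (remove a block and test reachability) rather than an efficient dominator-tree algorithm.

```javascript
// Assumed edges: B1 -> B2, B3 and B2 -> B4, B5.
const successors = new Map([
  ["B1", ["B2", "B3"]],
  ["B2", ["B4", "B5"]],
  ["B3", []],
  ["B4", []],
  ["B5", []],
]);

function reversePostOrder(entry, graph) {
  const visited = new Set();
  const postOrder = [];
  (function visit(block) {
    visited.add(block);
    for (const next of graph.get(block)) {
      if (!visited.has(next)) visit(next);
    }
    postOrder.push(block); // emitted only after everything reachable from it
  })(entry);
  return postOrder.reverse();
}

// a dominates b when b cannot be reached from the entry once a is removed.
function dominates(a, b, entry, graph) {
  if (a === b) return true;
  const visited = new Set([a]); // treat a as removed
  const work = entry === a ? [] : [entry];
  while (work.length > 0) {
    const block = work.pop();
    if (visited.has(block)) continue;
    visited.add(block);
    if (block === b) return false;
    work.push(...graph.get(block));
  }
  return true;
}

const order = reversePostOrder("B1", successors);
console.log(order.join(", "));
console.log(dominates("B1", "B4", "B1", successors));
console.log(dominates("B5", "B4", "B1", successors));
```

Under these assumed edges the computed order agrees with the one listed above, and a sibling block such as B5 does not dominate B4, so an expression computed only in B5 cannot be reused in B4.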

Loop Invariant Code Motion

• Hoist the loop invariant exprs outside the loop

• For a loop invariant expression x = y + z

• y and z should not depend on the operands defined in the loop

Loop Invariant Code Motion

[Figure: control flow graph with blocks B1, B2 and B3; the loop condition is V1 < 100]

Before               After hoisting
X1 = A1 + B1         X1 = A1 + B1
Y1 = X1 + 3          Y1 = X1 + 3
Z1 = Y1 + A1         T1 = A1 - B1
T1 = A1 - B1         Z1 = Y1 + A1
U1 = T1 + 3          U1 = T1 + 3
V1 = Y1 + U1         V1 = Y1 + U1

• Invariant expressions

• e1: Y1 = X1 + 3

• e2: T1 = A1 - B1

• Hoist e1 and e2 from B3 to B1
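
The hoisting rule can be sketched as a single pass over the loop body. This is an illustration under stated assumptions: X1 is taken to live in the block before the loop and the remaining statements in the loop body, the instruction format and the name `hoistInvariants` are hypothetical, and only the rule as stated on the slide is applied. A production pass would also iterate to a fixed point and check that moving an instruction is safe.

```javascript
// Moves instructions whose operands are all defined outside the loop into the preheader.
function hoistInvariants(preheader, loopBody) {
  const definedInLoop = new Set(loopBody.map((inst) => inst.target));
  const isInvariant = (inst) =>
    inst.operands.every((operand) => !definedInLoop.has(operand));
  return {
    preheader: [...preheader, ...loopBody.filter(isInvariant)],
    loopBody: loopBody.filter((inst) => !isInvariant(inst)),
  };
}

const preheader = [{ target: "X1", operands: ["A1", "B1"] }];
const loopBody = [
  { target: "Y1", operands: ["X1", 3] },
  { target: "Z1", operands: ["Y1", "A1"] },
  { target: "T1", operands: ["A1", "B1"] },
  { target: "U1", operands: ["T1", 3] },
  { target: "V1", operands: ["Y1", "U1"] },
];

const hoisted = hoistInvariants(preheader, loopBody);
console.log(hoisted.preheader.map((inst) => inst.target).join(", "));
console.log(hoisted.loopBody.map((inst) => inst.target).join(", "));
```

Z1 and U1 stay in the loop here because their operands Y1 and T1 were defined inside the loop when the pass started, which matches the result shown on the slide.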

More Optimizations

• SSA and control flow optimizations

• Dead code elimination

• Value range analysis

• Loop unrolling

• And more . . .

• Native code generation

• Linear scan register allocation

• And more . . .

Conclusion

• Under the hood of SpiderMonkey

• General but slow bytecode interpretation

• Two-level JIT optimizations for hot code

About Me

Security Researcher from DSNS Lab @ NCTU

• Interests

• Virtual Machine

• Binary Translation

• Current Work

• Android Code Obfuscation

• App Protection

Thanks for Listening