Context Threading: A flexible and efficient dispatch technique for virtual machine interpreters


Page 1: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Context Threading: A flexible and efficient dispatch technique for virtual machine interpreters

Marc Berndl, Benjamin Vitale, Mathew Zaleski, Angela Demke Brown

Research supported by IBM CAS, NSERC, CITO

Page 2: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Context Threading

Interpreter performance

• Why not just JIT?
• High performance JVMs still interpret
• People use interpreted languages that don't yet have JITs
• They still want performance!
• 30-40% of execution time is due to branch misprediction
• Our technique eliminates 95% of branch mispredictions

Page 3: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Overview

• Motivation
• Background: The Context Problem
• Existing Solutions
• Our Approach
• Inlining
• Results

Page 4: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

A Tale of Two Machines

[Diagram: two machines side by side. Virtual Machine Interpreter: a virtual program is loaded to form the loaded program, and an execution cycle runs the bytecode bodies. Real Machine CPU: an execution cycle runs the pipeline, assisted by predictors for return address, wayness (conditional), and target address (indirect).]

Page 5: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Interpreter

[Diagram: the loaded program (internal representation) and the bytecode bodies, driven by an execution cycle of fetch, dispatch, execute. Additional label: Load Parms.]

Page 6: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Running Java Example

Java source:

void foo(){ int i=1; do{ i+=i; } while(i<64); }

Java bytecode (produced by the javac compiler):

0: iconst_1
1: istore_1
2: iload_1
3: iload_1
4: iadd
5: istore_1
6: iload_1
7: bipush 64
9: if_icmplt 2
12: return

Page 7: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Switched Interpreter

while(1){
  opcode = *vPC++;
  switch(opcode){
    case iload_1: ..
      break;
    case iadd: ..
      break;
    //and many more..
  }
};

Slow: burdened by switch and loop overhead.
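The switched loop above can be made concrete for the running example. What follows is only a sketch under stated assumptions: the opcode numbering, the name run_switched, the fixed-size operand stack, and the branch operand encoding (an offset in slots relative to the slot holding the branch opcode) are all invented for illustration and are not taken from the slides or from any real JVM.

```c
/* Hypothetical opcode numbering for this sketch (not the real JVM encoding). */
enum { ICONST_1, ISTORE_1, ILOAD_1, IADD, BIPUSH, IF_ICMPLT, RETURN };

/* Interprets the bytecode of foo() and returns the final value of local 1. */
int run_switched(void)
{
    /* Operands sit inline after their opcode. The branch operand is an
       offset in slots, relative to the slot holding IF_ICMPLT (assumption). */
    static const int program[] = {
        ICONST_1, ISTORE_1,
        ILOAD_1, ILOAD_1, IADD, ISTORE_1,   /* i += i */
        ILOAD_1, BIPUSH, 64,
        IF_ICMPLT, -7,                      /* loop while i < 64 */
        RETURN
    };
    const int *vPC = program;
    int stack[8];
    int sp = 0;       /* index of the next free stack slot */
    int local1 = 0;   /* the variable i */

    while (1) {
        int opcode = *vPC++;                /* fetch */
        switch (opcode) {                   /* dispatch */
        case ICONST_1: stack[sp++] = 1; break;
        case ISTORE_1: local1 = stack[--sp]; break;
        case ILOAD_1:  stack[sp++] = local1; break;
        case IADD:     sp--; stack[sp - 1] += stack[sp]; break;
        case BIPUSH:   stack[sp++] = *vPC++; break;
        case IF_ICMPLT: {
            int b = stack[--sp];
            int a = stack[--sp];
            int offset = *vPC;              /* vPC points at the operand */
            if (a < b) vPC += offset - 1;   /* taken: relative to opcode slot */
            else       vPC++;               /* not taken: skip the operand */
            break;
        }
        case RETURN:   return local1;
        default:       return -1;           /* unknown opcode */
        }
    }
}
```

Every virtual instruction here pays for the loop back-edge and the switch before its body runs, which is the overhead the slide refers to.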

Page 8: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

"Threading" Dispatch

iload_1: ..
  goto *vPC++;
iadd: ..
  goto *vPC++;
istore: ..
  goto *vPC++;

‣ No switch overhead. Data driven indirect branch.

Execution of the virtual program "threads" through the bodies (as in needle & thread).

0: iconst_1 1: istore_1 2: iload_1 3: iload_1 4: iadd 5: istore_1 6: iload_1 7: bipush 64 9: if_icmplt 2 12: return

Page 9: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Context Problem

0: iconst_1 1: istore_1 2: iload_1 3: iload_1 4: iadd 5: istore_1 6: iload_1 7: bipush 64 9: if_icmplt 2 12: return

iload_1: ..
  goto *vPC++;
iadd: ..
  goto *vPC++;
istore: ..
  goto *vPC++;

‣ Data driven indirect branches are hard to predict

[Diagram: the indirect branch at the end of each body feeds the indirect branch predictor (micro-arch).]
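To see why these branches are hard to predict, a toy model helps. The sketch below is not a model of any real CPU: it assumes a predictor that, for each dispatch branch, simply guesses the target that branch jumped to last time. The function name count_mispredictions and the opcode numbering are hypothetical. It replays the loop body of the running example and counts wrong guesses, either with one dispatch branch per body (as in threaded dispatch) or with one shared dispatch branch (as in a switch).

```c
enum { ILOAD_1, IADD, ISTORE_1, BIPUSH, IF_ICMPLT, NUM_OPS };

/* One pass through the loop body of foo(), as the sequence of bodies run. */
static const int loop_body[] = {
    ILOAD_1, ILOAD_1, IADD, ISTORE_1, ILOAD_1, BIPUSH, IF_ICMPLT
};
enum { BODY_LEN = sizeof loop_body / sizeof loop_body[0] };

/*
 * Counts mispredicted dispatches over `iterations` passes of the loop.
 * per_body != 0: one dispatch branch per body (threaded dispatch).
 * per_body == 0: a single shared dispatch branch (switch dispatch).
 * Toy predictor: each branch predicts the target it jumped to last time.
 */
int count_mispredictions(int iterations, int per_body)
{
    int last_target[NUM_OPS];
    int misses = 0;
    int total = iterations * BODY_LEN;

    for (int k = 0; k < NUM_OPS; k++)
        last_target[k] = -1;                 /* cold predictor: no history */

    for (int t = 1; t < total; t++) {
        int source = loop_body[(t - 1) % BODY_LEN];  /* body that dispatches */
        int target = loop_body[t % BODY_LEN];        /* body dispatched to */
        int site = per_body ? source : 0;            /* which branch is used */
        if (last_target[site] != target)
            misses++;
        last_target[site] = target;
    }
    return misses;
}
```

In this model, once the predictor has warmed up, the branch at the end of iload_1 is wrong every time, because iload_1 is followed by iload_1, then iadd, then bipush in turn. That is 3 wrong guesses out of 7 dispatches per loop iteration with per-body branches, and 6 out of 7 with a single shared branch. The branch sees only its own history, not the virtual program's context.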

Page 10: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Direct Threaded Interpreter

Virtual program: iload_1, iload_1, iadd, istore_1, iload_1, bipush 64, if_icmplt 2, …

DTT - Direct Threading Table (vPC points into it):

&&iload_1
&&iload_1
&&iadd
&&istore_1
&&iload_1
&&bipush
64
&&if_icmplt
-7
…

C implementation of each body:

iload_1: ..
  goto *vPC++;
iadd: ..
  goto *vPC++;
istore: ..
  goto *vPC++;

Target of computed goto is data-driven.
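A direct-threaded version of the running example might look like the sketch below. It relies on the labels-as-values extension of GCC and Clang, so it is not standard C. The name run_direct_threaded, the stack size, and the treatment of the branch operand (-7 slots relative to the slot holding &&if_icmplt, matching the table above) are assumptions for illustration. Note that with vPC declared as void **, GNU C needs goto **vPC++ where the slides abbreviate to goto *vPC++.

```c
#include <stdint.h>

/* Requires GCC or Clang: uses the labels-as-values extension (&&label). */
int run_direct_threaded(void)
{
    /* DTT: one slot per opcode holding the address of its body,
       with operands stored inline. */
    void *dtt[] = {
        &&iconst_1, &&istore_1,
        &&iload_1, &&iload_1, &&iadd, &&istore_1,
        &&iload_1, &&bipush, (void *)(intptr_t)64,
        &&if_icmplt, (void *)(intptr_t)(-7),
        &&vreturn
    };
    void **vPC = dtt;
    intptr_t stack[8];
    int sp = 0;            /* index of the next free stack slot */
    intptr_t local1 = 0;   /* the variable i */

    goto **vPC++;          /* dispatch the first virtual instruction */

iconst_1:  stack[sp++] = 1;                  goto **vPC++;
istore_1:  local1 = stack[--sp];             goto **vPC++;
iload_1:   stack[sp++] = local1;             goto **vPC++;
iadd:      sp--; stack[sp - 1] += stack[sp]; goto **vPC++;
bipush:    stack[sp++] = (intptr_t)*vPC++;   goto **vPC++;
if_icmplt: {
        intptr_t b = stack[--sp];
        intptr_t a = stack[--sp];
        intptr_t offset = (intptr_t)*vPC;   /* vPC points at the operand */
        if (a < b) vPC += offset - 1;       /* taken: relative to opcode slot */
        else       vPC++;                   /* not taken: skip the operand */
        goto **vPC++;
    }
vreturn:
    return (int)local1;
}
```

Each body ends in its own indirect jump, and the destination of that jump is whatever address the DTT happens to hold next, which is why the slide calls the target data-driven.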

Page 11: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Existing Solutions

• Piumarta & Riccardi: bodies replicated (superinstructions)
• Ertl & Gregg: bodies and dispatch replicated
• Limited to relocatable virtual instructions

[Diagram: many bodies share one "GOTO *PC" whose target is unknown ("????"). With replication, each copy of iload_1 (copy 1, copy 2) ends in its own "goto *pc".]

Page 12: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Overview

• Motivation
• Background: The Context Problem
• Existing Solutions
• Our Approach
• Inlining
• Results

Page 13: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Key Observation

• Virtual and native control flow are similar
  • Linear or straight-line code
  • Conditional branches
  • Calls and returns
  • Indirect branches
• Hardware has predictors for each type
• Direct threading uses an indirect branch for everything!

‣ Solution: Leverage hardware predictors

Page 14: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Essence of our Solution

Package bodies as subroutines and call them.

Virtual program: …, iload_1, iload_1, iadd, istore_1, iload_1, bipush 64, if_icmplt 2, …

CTT - Context Threading Table (generated code):

call iload_1
call iload_1
call iadd
call istore_1
call iload_1
..

Bytecode bodies (ret terminated):

iload_1: ..
  ret;
iadd: ..
  ret;

[Diagram label: Return Branch Predictor Stack]

Page 15: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Subroutine Threading

Virtual program: …, iload_1, iload_1, iadd, istore_1, iload_1, bipush 64, if_icmplt 2, …

CTT (code generated at load time):

call iload_1
call iload_1
call iadd
call istore_1
call iload_1
call bipush
call if_icmplt

Bytecode bodies (ret terminated):

iload_1: …
  ret;
iadd: …
  ret;

Virtual branch instructions as before:

if_icmplt: …
  goto *vPC++;

The DTT contains addresses in the CTT, along with the operands (64, -7). vPC points into the DTT.

Page 16: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

The Context Threading Table

• A sequence of generated call instructions
• Good alignment of virtual and hardware control flow for straight-line code

‣ Can virtual branches go into the CTT?
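As a rough illustration of what "a sequence of generated call instructions" could mean on 32-bit x86 (the Pentium 4 is one of the evaluated platforms), the sketch below writes near-call instructions into a byte buffer. The slides do not show the code generator, so emit_call, build_ctt, and the use of plain 32-bit integers for addresses are assumptions. The buffer is only filled, never executed.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Writes one x86 near call (opcode 0xE8 followed by a 32-bit little-endian
 * displacement) into buf. `site` is the address where the instruction will
 * live and `target` is the body to call. Returns the instruction length.
 */
size_t emit_call(uint8_t *buf, uint32_t site, uint32_t target)
{
    /* The displacement is relative to the end of the 5-byte instruction. */
    uint32_t disp = target - (site + 5);
    buf[0] = 0xE8;
    buf[1] = (uint8_t)(disp);
    buf[2] = (uint8_t)(disp >> 8);
    buf[3] = (uint8_t)(disp >> 16);
    buf[4] = (uint8_t)(disp >> 24);
    return 5;
}

/*
 * Fills `ctt` with one call per virtual instruction. bodies[i] is the
 * (hypothetical) address of the body for the i-th instruction and
 * ctt_base is the address the table will be loaded at.
 */
size_t build_ctt(uint8_t *ctt, uint32_t ctt_base,
                 const uint32_t *bodies, size_t count)
{
    size_t used = 0;
    for (size_t i = 0; i < count; i++)
        used += emit_call(ctt + used, ctt_base + (uint32_t)used, bodies[i]);
    return used;
}
```

Because each call is a direct call to a fixed body, and each body returns to the instruction after its call, the hardware's return predictor sees a different return address for every position in the virtual program.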

Page 17: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Specialized Branch Inlining

Branch inlined into the CTT:

target:
  …
  call …
  call iload_1
  if(icmplt) goto target;

Conditional branch predictor now mobilized.

[Diagram: a DTT entry (labelled 5) and vPC shown alongside the CTT, with target: marking the branch destination inside the CTT.]

Inlining conditional branches provides context.
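The shape of the resulting code can be imitated in portable C, with the caveat that this is a hand-written stand-in and not the load-time code generation the slides describe. Bodies become functions that end in a return, the table becomes a straight-line sequence of direct calls, and the virtual conditional branch becomes a native conditional branch. All names here are illustrative, and passing 64 as an argument replaces reading the operand through vPC.

```c
/* Interpreter state shared by the bodies (names are illustrative). */
static int stack[8];
static int sp;       /* index of the next free stack slot */
static int local1;   /* the variable i */

/* Bodies packaged as subroutines: each ends with a return. */
static void iconst_1(void) { stack[sp++] = 1; }
static void istore_1(void) { local1 = stack[--sp]; }
static void iload_1(void)  { stack[sp++] = local1; }
static void iadd(void)     { sp--; stack[sp - 1] += stack[sp]; }
static void bipush(int v)  { stack[sp++] = v; }

/* Branch body: pops two values and reports whether the branch is taken. */
static int icmplt(void)
{
    int b = stack[--sp];
    int a = stack[--sp];
    return a < b;
}

/* Hand-written stand-in for the CTT that would be generated for foo(). */
int run_context_threaded(void)
{
    sp = 0;
    local1 = 0;
    iconst_1();
    istore_1();
target:
    iload_1();
    iload_1();
    iadd();
    istore_1();
    iload_1();
    bipush(64);                  /* operand passed directly (simplification) */
    if (icmplt()) goto target;   /* conditional branch inlined into the table */
    return local1;
}
```

A C compiler is free to inline these small functions, so this sketch shows the control-flow structure only. In the technique itself the calls and the conditional branch are native instructions emitted when the program is loaded.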

Page 18: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Tiny Inlining

• Context Threading is a dispatch technique
  • But, we inline branches
• Some non-branching bodies are very small
  • Why not inline those?

► Inline all tiny linear bodies into the CTT

Page 19: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Overview

• Motivation
• Background: The Context Problem
• Existing Solutions
• Our Approach
• Inlining
• Results

Page 20: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Experimental Setup

• Two virtual machines on two hardware architectures
  • VM: Java/SableVM, OCaml interpreter
  • Arch: P4, PPC
• Compare against direct threaded SableVM
  • SableVM distro uses selective inlining
• Branch misprediction
• Execution time

► Is our technique effective and general?

Page 21: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Mispredicted Taken Branches

[Chart: mispredicted taken branches, normalized to direct threading. SableVM/Java, Pentium 4.]

95% mispredictions eliminated on average

Page 22: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Execution time

[Chart: execution time, normalized to direct threading. Pentium 4.]

27% average reduction in execution time

Page 23: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Execution Time (geomean)

[Chart: geometric mean of execution time, normalized to direct threading.]

Our technique is effective and general

Page 24: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

Conclusions

• Context Problem: branch mispredictions due to mismatch between native and virtual control flow
• Solution: Generate control flow code into the Context Threading Table
• Results
  • Eliminate 95% of branch mispredictions
  • Reduce execution time by 30-40%

‣ Recent, post CGO 2005, work follows

Page 25: Context Threading:  A flexible and efficient dispatch technique for virtual machine interpreters

What about Scripting Languages?

• Recently ported context threading to TCL.
• 10x cycles executed per bytecode dispatched.
• Much lower dispatch overhead.
• Speedup due to subroutine threading, approx. 5%.
• TCL conference 2005

[Chart: cycles per virtual instruction.]