dynamic binary optimization

Dynamic Binary Optimization Presenter Kim Jin Chul


Post on 16-Jan-2016


Page 1: Dynamic Binary Optimization

Dynamic Binary Optimization

Presenter Kim Jin Chul

Page 2: Dynamic Binary Optimization

Contents

1. Overview of Applying Optimization on VMs

2. Dynamic Program Behavior

3. Profiling

4. Optimizing Translation Blocks

Page 3: Dynamic Binary Optimization

Classical Optimizations

IA-32 source code:

    addl %edx, 4(%eax)
    movl 4(%eax), %edx

Translation from IA-32 to PowerPC code:

    addi r16, r4, 4     ; add 4 to %eax
    lwzx r17, r2, r16   ; load operand from memory
    add  r7, r17, r7    ; perform add of %edx
    addi r16, r4, 4     ; add 4 to %eax
    stwx r7, r2, r16    ; store %edx value into memory

After applying common subexpression elimination (the second addi recomputes a value that r16 already holds, so it is removed):

    addi r16, r4, 4     ; add 4 to %eax
    lwzx r17, r2, r16   ; load operand from memory
    add  r7, r17, r7    ; perform add of %edx
    stwx r7, r2, r16    ; store %edx value into memory
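The removal of the repeated addi can be sketched in a few lines of Python. This is a simplified, block-local pass written for illustration, not the presenter's implementation: the (op, dest, srcs) tuple encoding, the PURE_OPS set, and the function name are assumptions, and the sketch only handles the case on this slide, where the same expression is recomputed into the same register.

```python
PURE_OPS = {"addi", "add"}  # assumed set of register-only ops that are safe to reuse


def eliminate_redundant(block):
    """Drop instructions that recompute a value their destination already holds."""
    available = {}  # (op, srcs) -> register that currently holds that value
    kept = []
    for op, dest, srcs in block:
        key = (op, srcs)
        if op in PURE_OPS and available.get(key) == dest:
            continue  # same expression, same destination, inputs unchanged
        if dest is not None:
            # Writing dest invalidates values held in dest or computed from dest.
            available = {k: reg for k, reg in available.items()
                         if reg != dest and dest not in k[1]}
        if op in PURE_OPS and dest not in srcs:
            available[key] = dest
        kept.append((op, dest, srcs))
    return kept


translated = [
    ("addi", "r16", ("r4", 4)),
    ("lwzx", "r17", ("r2", "r16")),
    ("add", "r7", ("r17", "r7")),
    ("addi", "r16", ("r4", 4)),
    ("stwx", None, ("r7", "r2", "r16")),  # a store defines no register
]
optimized = eliminate_redundant(translated)
```

A fuller pass would also replace a recomputation into a different register with a register copy; that case is left out here to keep the sketch short.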

Page 4: Dynamic Binary Optimization

Optimization Based on Profiling

Step 1: original code

    Basic Block A:  ...
                    R3 ← ...
                    R7 ← ...
                    R1 ← R2 + R3
                    Br L1 if R3 == 0
    Basic Block B:  ...
                    R6 ← R1 + R6
                    ...
    Basic Block C:  L1: R1 ← 0
                    ...

Step 2: R1 ← R2 + R3 is removed from block A

    Basic Block A:  ...
                    R3 ← ...
                    R7 ← ...
                    Br L1 if R3 == 0
    Basic Block B:  ...
                    R6 ← R1 + R6
                    ...
    Basic Block C:  L1: R1 ← 0
                    ...

Step 3: compensation code is added on the path from A to B

    Basic Block A:      ...
                        R3 ← ...
                        R7 ← ...
                        Br L1 if R3 == 0
    Compensation code:  R1 ← R2 + R3
    Basic Block B:      ...
                        R6 ← R1 + R6
                        ...
    Basic Block C:      L1: R1 ← 0
                        ...

Figure labels: "def" and "use" (of R1)
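The transformation on this slide (moving an assignment off the frequently taken path and repairing the other path with compensation code) can be sketched as follows. This is a minimal illustration under stated assumptions, not the presenter's algorithm: instructions are (dest, srcs) tuples, branches have dest None, and the function name and safety checks are hypothetical simplifications.

```python
def sink_off_hot_path(block_a, hot_block, cold_block, index):
    """Move block_a[index] into compensation code ahead of cold_block when the
    hot path overwrites its result before reading it. Returns (new_a, new_cold);
    the inputs are returned unchanged when the move is not safe."""
    dest, srcs = block_a[index]
    # Within A, nothing after the instruction may read dest or rewrite dest or a source.
    for later_dest, later_srcs in block_a[index + 1:]:
        if dest in later_srcs or later_dest == dest or later_dest in srcs:
            return block_a, cold_block
    # On the hot path, dest must be redefined before it is used.
    for hot_dest, hot_srcs in hot_block:
        if dest in hot_srcs:
            return block_a, cold_block
        if hot_dest == dest:
            break
    else:
        return block_a, cold_block  # never redefined, so the value may still be needed
    new_a = block_a[:index] + block_a[index + 1:]
    return new_a, [block_a[index]] + cold_block


block_a = [("R3", ()), ("R7", ()), ("R1", ("R2", "R3")), (None, ("R3",))]  # ends with Br L1 if R3 == 0
block_b = [("R6", ("R1", "R6"))]   # uses R1
block_c = [("R1", ())]             # L1: R1 <- 0
new_a, new_b = sink_off_hot_path(block_a, hot_block=block_c, cold_block=block_b, index=2)
```

Here block C plays the role of the frequently taken successor, which is what a profile would have to show for the move to pay off.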

Page 5: Dynamic Binary Optimization

Optimization Based on Profiling

Original code:

    Basic Block A:  ...
                    R3 ← ...
                    R7 ← ...
                    R1 ← R2 + R3
                    Br L1 if R3 == 0
    Basic Block B:  ...
                    R6 ← R1 + R6
                    ...
    Basic Block C:  L1: R1 ← 0
                    ...

After blocks A and C are merged into a superblock (the branch condition is reversed so that the code of C follows A directly):

    Superblock:         ...
                        R3 ← ...
                        R7 ← ...
                        Br L2 if R3 != 0
                        R1 ← 0
                        ...
    Compensation code:  L2: R1 ← R2 + R3
    Basic Block B:      ...
                        R6 ← R1 + R6
                        ...

Page 6: Dynamic Binary Optimization

A staged optimization system

[Figure: components of the system: binary memory image, interpreter, translator, optimizer, emulation manager, basic block cache, code cache, profile data.]

Stages: Interpret → Basic translation → Optimized blocks → Highly optimized blocks

  Startup:       fast → very slow
  Steady state:  slow → fast
  Profiling:     simple → extensive
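One simple way to drive the stages is to promote a block as its execution count grows. The sketch below assumes count-based promotion; the threshold values are hypothetical and are not taken from the slides.

```python
# Hypothetical promotion thresholds (execution counts); real systems tune these.
STAGES = [
    (0, "interpret"),
    (50, "basic translation"),
    (1000, "optimized block"),
    (20000, "highly optimized block"),
]


def select_stage(execution_count):
    """Return the emulation stage for a block that has run execution_count times."""
    chosen = STAGES[0][1]
    for threshold, stage in STAGES:
        if execution_count >= threshold:
            chosen = stage
    return chosen
```

Low thresholds shift work toward the later stages (slower startup, faster steady state); high thresholds do the opposite.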

Page 7: Dynamic Binary Optimization

Dynamic Program Behavior

Dynamic control flow is highly predictable

            ...
            R3 ← 100
    loop:   R1 ← mem(R2)
            Br found if R1 == -1
            R2 ← R2 + 4
            R3 ← R3 - 1
            Br loop if R3 != 0
            ...
    found:  ...
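To see why the two branches in this loop are predictable, the loop can be simulated and the outcome of each branch recorded. The Python below is an illustrative model, with a list standing in for memory and an index standing in for R2; the function and key names are made up for the example.

```python
def run_search_loop(memory, limit=100):
    """Simulate the loop and record the outcome of each conditional branch."""
    outcomes = {"found_exit": [], "loop_back": []}
    index = 0            # stands in for R2, advancing one word per iteration
    remaining = limit    # R3
    while True:
        value = memory[index]                   # R1 <- mem(R2)
        hit = value == -1
        outcomes["found_exit"].append(hit)      # Br found if R1 == -1
        if hit:
            break
        index += 1                              # R2 <- R2 + 4
        remaining -= 1                          # R3 <- R3 - 1
        back = remaining != 0
        outcomes["loop_back"].append(back)      # Br loop if R3 != 0
        if not back:
            break
    return outcomes


outcomes = run_search_loop([0] * 100)  # no -1 among the 100 words searched
```

When the value is not found, the exit branch is never taken and the backward branch is taken on every iteration except the last.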

Page 8: Dynamic Binary Optimization

Dynamic Program Behavior

[Figure: Distribution of taken conditional branches. X-axis: percent taken, in bins from 0-10% to >90%. Y-axis: fraction of static conditional branches, 0% to 50%.]

Predominantly not taken: 28%

Predominantly taken: 42%

Page 9: Dynamic Binary Optimization

Dynamic Program Behavior

Consistency of conditional branches
  Backward branches account for the high percentage

[Figure: bar chart. X-axis: SPEC benchmark (176.gcc, 181.mcf, 197.parser, 252.eon, 256.bzip2, 171.swim, 173.applu, 177.mesa, 187.facerec, 189.lucas). Y-axis: dynamic branches decided the same as the previous time, 0% to 100%.]
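The metric on the y-axis (dynamic branches decided the same way as the previous time) can be computed from a branch trace as sketched below. The trace format of (branch PC, taken) pairs is an assumption, as is the choice to leave the first execution of each static branch out of the count; the slides do not say how the original measurement treated it.

```python
def same_as_previous_fraction(trace):
    """trace: iterable of (branch_pc, taken) pairs in execution order.
    Returns the fraction of dynamic branches whose outcome matches the
    previous outcome of the same static branch."""
    last_outcome = {}
    same = 0
    compared = 0
    for pc, taken in trace:
        if pc in last_outcome:
            compared += 1
            if last_outcome[pc] == taken:
                same += 1
        last_outcome[pc] = taken
    return same / compared if compared else 0.0


# A backward loop branch taken 99 times, then falling through once.
loop_trace = [(0x10, True)] * 99 + [(0x10, False)]
fraction = same_as_previous_fraction(loop_trace)
```

A loop-closing backward branch scores very high on this metric, which fits the slide's remark about backward branches.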

Page 10: Dynamic Binary Optimization

Dynamic Program Behavior

The predictability of indirect jumps
  Some jump destination addresses seldom change

[Figure: bar chart. X-axis: number of different destinations (1 to 9, and >9). Y-axis: percent of indirect jumps, 0% to 25%.]
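A histogram like this one can be built from a trace of indirect jumps. The sketch below assumes a trace of (jump PC, target address) pairs and counts static jump sites per bucket; whether the original chart counts static or dynamic jumps is not stated on the slide, so that choice is an assumption.

```python
from collections import Counter, defaultdict


def destination_histogram(jumps):
    """jumps: iterable of (jump_pc, target) pairs.
    Returns {bucket: number of static jumps}, with buckets 1..9 and '>9'."""
    targets = defaultdict(set)
    for pc, target in jumps:
        targets[pc].add(target)
    histogram = Counter()
    for seen in targets.values():
        bucket = len(seen) if len(seen) <= 9 else ">9"
        histogram[bucket] += 1
    return dict(histogram)


sample_jumps = [(1, 100), (1, 100), (2, 200), (2, 201)] + [(3, t) for t in range(12)]
histogram = destination_histogram(sample_jumps)
```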

Page 11: Dynamic Binary Optimization

Dynamic Program Behavior

The predictability of data values

[Figure: bar chart. X-axis: instruction type (All, Add/Sub, Load, Logic, Shift, Set). Y-axis: fraction with constant value, 0 to 0.7. Two bars per type: static and dynamic.]

Static: the fraction of static instructions that always compute the same value

Dynamic: the fraction of dynamic instructions that are executions of those static instructions

Page 12: Dynamic Binary Optimization

Profiling

The process of collecting instruction and data statistics for an executing program

Optimization is based on the profiling work

[Figure: the staged emulation system: binary memory image, interpreter, translator, optimizer, emulation manager, basic block cache, code cache, profile data.]

Page 13: Dynamic Binary Optimization

The Role of Profiling

Traditional profiling

[Figure: HLL program → compiler frontend → control flow graph (blocks A to F) → compiler backend → instrumented code. Instrumented code + test data → program execution → program statistics. Program statistics → optimizing compiler → optimized binary.]

Page 14: Dynamic Binary Optimization

The Role of Profiling

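On-the-fly profiling in a dynamic optimizing VM

[Figure: the program binary and program data feed the interpreter and the translator/optimizer, which work from partial program statistics. Only part of the control flow graph (blocks A, B, D, E) is shown.]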

Page 15: Dynamic Binary Optimization

Types of Profiles

Several types of profile data
  How frequently are different code regions being executed?
    This can be used to decide the level of optimization
  Is control flow predictable?
    This may be used as the basis for gathering and rearranging basic blocks
    Rearranged basic blocks get a chance to be merged into a superblock

Page 16: Dynamic Binary Optimization

Types of Profiles

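[Figure: two copies of a control flow graph with blocks A to F. Left: a basic block profile, with an execution count on each block. Right: an edge profile, with a traversal count on each edge.]

A basic block profile
An edge profile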
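The two kinds of profile can be collected from the same execution path, and the block profile can be recovered from the edge profile (the reverse is not possible in general). The sketch below uses a made-up path through blocks A to F; the counts are not the ones in the slide's figure.

```python
from collections import Counter


def collect_profiles(path):
    """path: list of basic block names in execution order."""
    block_profile = Counter(path)
    edge_profile = Counter(zip(path, path[1:]))
    return block_profile, edge_profile


def block_counts_from_edges(edge_profile, entry):
    """Sum incoming edge counts per block, plus one for the initial entry."""
    counts = Counter()
    for (_, dst), n in edge_profile.items():
        counts[dst] += n
    counts[entry] += 1
    return counts


path = ["A", "B", "D", "E", "A", "C", "D", "F"]  # hypothetical execution path
block_profile, edge_profile = collect_profiles(path)
recovered = block_counts_from_edges(edge_profile, entry="A")
```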

Page 17: Dynamic Binary Optimization

Collecting Profiles

Instrumentation-based profiling
  Targets specific program-related events and counts all instances of the events being profiled
  Software-based vs. hardware-based
    Speed? Support? Flexibility?

Sampling-based profiling
  The program runs in its unmodified form; at intervals the program is interrupted and an event is captured

Instrumentation vs. sampling
  Overhead per collected data point: instrumentation < sampling
  Sampling causes traps!

Page 18: Dynamic Binary Optimization

Profiling During Interpretation

Instruction function list

    ...
    branch_conditional(inst) {
        BO = extract(inst, 25, 5);
        BI = extract(inst, 20, 5);
        displacement = extract(inst, 15, 14) * 4;
        ...
        // code to compute whether branch should be taken
        ...
        profile_addr = lookup(PC);
        if (branch_taken) {
            profile_cnt(profile_addr, taken);
            PC = PC + displacement;
        } else {
            profile_cnt(profile_addr, nottaken);
            PC = PC + 4;
        }
    }

PowerPC Branch Conditional Interpreter Routine

[Figure: the branch PC is hashed to select an entry in the profile table; each entry holds the PC, a taken count, and a not-taken count.]

Profile Table for Collecting an Edge Profile During Interpretation
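The profile table and the profiling part of the interpreter routine can be modelled as follows. This is a sketch under assumptions: the hash function, the table size, and linear probing on collisions are hypothetical choices, and the branch decision itself is passed in rather than decoded from BO and BI.

```python
class ProfileTable:
    """Fixed-size table indexed by a hash of the branch PC."""

    def __init__(self, size=1024):
        self.size = size
        self.entries = [None] * size  # each entry: [pc, taken_count, not_taken_count]

    def lookup(self, pc):
        slot = (pc >> 2) % self.size  # hypothetical hash: drop the byte offset bits
        # Linear probing on collision; assumes the table never fills up.
        while self.entries[slot] is not None and self.entries[slot][0] != pc:
            slot = (slot + 1) % self.size
        if self.entries[slot] is None:
            self.entries[slot] = [pc, 0, 0]
        return self.entries[slot]


def branch_conditional(pc, displacement, branch_taken, table):
    """Record the branch outcome in the profile table and return the next PC."""
    entry = table.lookup(pc)
    if branch_taken:
        entry[1] += 1
        return pc + displacement
    entry[2] += 1
    return pc + 4


table = ProfileTable(size=8)
next_taken = branch_conditional(0x100, -16, True, table)
next_not_taken = branch_conditional(0x100, -16, False, table)
branch_conditional(0x120, 8, True, table)  # hashes to the same slot as 0x100
```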

Page 19: Dynamic Binary Optimization

Profiling Translated Code

Translated basic block, followed by two stubs:

Fall-through stub:

    increment edge counter (i)
    if (counter (i) > trigger) then invoke optimizer
    else branch to fall-through basic block

Branch target stub:

    increment edge counter (j)
    if (counter (j) > trigger) then invoke optimizer
    else branch to target basic block

Edge Profiling Code Inserted into Stubs of a Binary Translated Basic Block
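The behaviour of one stub can be modelled as a small counter object. The class and callback names below are illustrative assumptions; in a real translator the stub is a few machine instructions appended to the translated block.

```python
class EdgeStub:
    """Counts traversals of one exit edge of a translated basic block."""

    def __init__(self, target, trigger, optimizer):
        self.target = target
        self.trigger = trigger
        self.optimizer = optimizer  # callable invoked once the edge is hot
        self.counter = 0

    def take(self):
        self.counter += 1                       # increment edge counter
        if self.counter > self.trigger:
            return self.optimizer(self.target)  # invoke optimizer
        return self.target                      # branch to the target block


invoked = []


def invoke_optimizer(target):
    invoked.append(target)
    return "optimized:" + target


stub = EdgeStub(target="B", trigger=3, optimizer=invoke_optimizer)
results = [stub.take() for _ in range(5)]
```

In this model the optimizer is called on every traversal past the trigger; a real system would normally replace the stub once the optimized code exists.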


Page 20: Dynamic Binary Optimization

Profiling Overhead

Profiling during interpretation incurs 10-20% overhead

Profiling overhead can be reduced
  Reduce the number of instrumentation points by selecting a smaller set of key points

Page 21: Dynamic Binary Optimization

Optimizing Translation Blocks

Two-part strategy for optimizing
  Use dominant control flow to enhance memory locality
  Make translation blocks larger
    Traces, superblocks, tree groups

The two parts of the strategy are actually relatively independent

Page 22: Dynamic Binary Optimization

Improving Locality

Two kinds of memory locality
  Spatial locality
    An access to a memory location is soon followed by an access to an adjacent memory location
  Temporal locality
    A memory location that has been accessed is accessed again in the near future

Page 23: Dynamic Binary Optimization

Improving Locality

Example code sequence (original layout in memory):

    A    Br cond1 == true
    B    Br cond2 == false
    C    Br uncond
    D    Br cond3 == true
    E    Br uncond
    F
    G    Br cond4 == true

[Figure: control flow graph of blocks A to G, annotated with edge execution counts.]

Page 24: Dynamic Binary Optimization

Improving Locality

Rearrange the blocks in memory

    A    Br cond1 == false
    D    Br cond3 == true
    F
    G    Br cond4 == true
         Br uncond
    E    Br uncond
    B    Br cond2 == false
    C    Br uncond

[Figure: control flow graph of blocks A to G, annotated with edge execution counts.]
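A layout that places the most frequently followed blocks next to each other can be produced greedily from an edge profile. The sketch below is one simple strategy, not necessarily the one behind the slide; the edge counts are illustrative values in the spirit of the figure, and their assignment to edges is an assumption.

```python
def hottest_incoming(block, edges):
    """Largest count on any edge entering block (0 if none)."""
    return max((n for (_, dst), n in edges.items() if dst == block), default=0)


def greedy_layout(edges, entry):
    """edges: {(src, dst): count}. Returns the blocks in placement order."""
    blocks = {entry}
    for src, dst in edges:
        blocks.update((src, dst))
    placed, order = set(), []
    current = entry
    while len(order) < len(blocks):
        if current is None:
            # Start a new chain at the unplaced block with the hottest incoming edge.
            current = max(sorted(blocks - placed),
                          key=lambda b: hottest_incoming(b, edges))
        order.append(current)
        placed.add(current)
        successors = [(n, dst) for (src, dst), n in edges.items()
                      if src == current and dst not in placed]
        current = max(successors)[1] if successors else None
    return order


# Hypothetical edge profile for blocks A to G.
edges = {("A", "B"): 30, ("A", "D"): 70, ("B", "C"): 29, ("B", "E"): 1,
         ("D", "E"): 2, ("D", "F"): 68, ("C", "G"): 29, ("E", "G"): 3,
         ("F", "G"): 68, ("G", "A"): 97}
layout = greedy_layout(edges, "A")
```

With these counts the hot path A, D, F, G comes out first, as on the slide; the order of the remaining, rarely executed blocks depends on the rule used to start a new chain and may differ from the slide.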

Page 25: Dynamic Binary Optimization

Improving Locality

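[Figure: Procedure inlining. Original code: blocks A and B followed by "Call proc xyz", and later blocks K and L followed by another "Call proc xyz"; proc xyz consists of blocks X, Y, Z and a return. Inlined code: A, B, X, Y, Z ... K, L, X, Y, Z. A second version inlines only part of the procedure at each call site: A, B, X, Z ... K, L, X, Y.]

Procedure inlining: positive and negative effects?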

Page 26: Dynamic Binary Optimization

Traces

[Figure: control flow graph of blocks A to G, annotated with edge execution counts and divided into Trace 1, Trace 2, and Trace 3.]

Trace
  A contiguous sequence of basic blocks
  May have both side entrances and side exits

[Figure: Relations between Superblocks and Traces.]

Page 27: Dynamic Binary Optimization

Superblocks

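[Figure: the control flow graph of blocks A to G with edge execution counts, and the same graph after superblock formation, in which block G appears three times.]

Superblocks
  Regions of code with only one entry and one or more exit points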
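A trace becomes a superblock once its side entrances are removed, which is commonly done by tail duplication. The sketch below is a minimal version under stated assumptions: the control flow graph is a set of (src, dst) edges, copies are named by appending an apostrophe, and all side entrances share one copy of the tail, whereas the slide's figure shows a separate copy of G for each entering path.

```python
def tail_duplicate(trace, edges):
    """Remove side entrances from a trace by duplicating its tail.

    trace: block names along the hot path, head first.
    edges: set of (src, dst) control flow edges.
    Returns (new_edges, copies); copies maps each duplicated block to its copy.
    """
    on_trace = set(zip(trace, trace[1:]))
    first = None
    for position, block in enumerate(trace[1:], start=1):
        if any(dst == block and (src, dst) not in on_trace for src, dst in edges):
            first = position  # earliest block entered from off the trace
            break
    if first is None:
        return set(edges), {}
    copies = {block: block + "'" for block in trace[first:]}
    new_edges = set()
    for src, dst in edges:
        if dst in copies and (src, dst) not in on_trace:
            new_edges.add((src, copies[dst]))  # side entrance now enters the copy
        else:
            new_edges.add((src, dst))
        if src in copies:
            new_edges.add((copies[src], copies.get(dst, dst)))  # copy keeps the exits
    return new_edges, copies


cfg = {("A", "B"), ("A", "D"), ("B", "C"), ("B", "E"), ("D", "E"),
       ("D", "F"), ("C", "G"), ("E", "G"), ("F", "G"), ("G", "A")}
new_edges, copies = tail_duplicate(["A", "D", "F", "G"], cfg)
```

After the transformation the only way into G is from F, so the sequence A, D, F, G has a single entry at A and exits only along its side branches.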

Page 28: Dynamic Binary Optimization

Superblocks

Code layout before superblock formation:

    A    Br cond1 == false
    D    Br cond3 == true
    F
    G    Br cond4 == true
         Br uncond
    E    Br uncond
    B    Br cond2 == false
    C    Br uncond

Code layout after superblock formation (block G is duplicated after E and after C):

    A    Br cond1 == false
    D    Br cond3 == true
    F
    G    Br cond4 == true
         Br uncond
    E
    G    Br cond4 == true
         Br uncond
    B    Br cond2 == false
    C
    G    Br cond4 == true
         Br uncond

Page 29: Dynamic Binary Optimization

Tree Groups

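[Figure: control flow graph of blocks A to G in which block G appears three times.]

Tree groups
  Regions of code with only one entry and one or more exit points (Figure 4.7)
  Unlike a superblock, a tree group may contain several control flow paths that fan out from the single entry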

Page 30: Dynamic Binary Optimization
Page 31: Dynamic Binary Optimization

SPEC benchmarks

Integer SPEC benchmarks
  176.gcc – GNU Compiler
  181.mcf – Combinatorial Optimization
  197.parser – Word Processor
  252.eon – Computer Visualization
  256.bzip2 – Compression

Floating-point SPEC benchmarks
  171.swim – Shallow Water Modeling
  173.applu – Parabolic
  187.facerec – Image Processing
  189.lucas – Number Theory