Paradyn/Condor Week Madison, WI March 12-14, 2001 Efficient x86 Instrumentation: Dynamic Rewriting and Function Relocation Itai Gurari [email protected] Computer Science Department University of Wisconsin 1210 W. Dayton St. Madison, WI 53706-1685


Page 1: Efficient x86 Instrumentation :

Paradyn/Condor Week Madison, WI March 12-14, 2001

Efficient x86 Instrumentation: Dynamic Rewriting and Function Relocation

Itai Gurari
[email protected]

Computer Science Department
University of Wisconsin

1210 W. Dayton St.
Madison, WI 53706-1685

Page 2

Introduction

Dynamic Instrumentation:
• Inserts instrumentation into an application during its execution
• Used by Paradyn to gather performance data
• Paradyn inserts instrumentation at three types of points:
  – function entry, exit, and call sites

Page 3

Paradyn

Executable Code

Instrumentation Points

foo () {

call <bar>

}

Page 4

Instrumentation Points

Entry

Call

Exit

Paradyn

Executable Code

foo () {

call <bar>

}

Page 5

Entry

Call

Exit

startTimer()

stopTimer()

counter++

Executable Code

Instrumentation

Paradyn Instrumentation Points

foo () {

call <bar>

}
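To make the three snippets on this slide concrete, here is a hypothetical C rendering of what they compute for foo(): the entry snippet starts a CPU timer, the exit snippet stops it, and the call-site snippet counts calls to bar(). The struct and function names are invented for illustration, and standard C clock() stands in for whatever timer Paradyn really uses; Paradyn inserts generated machine code at these points, not calls to C functions like these.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical per-function metric state for foo(). */
typedef struct {
    clock_t start;          /* written by the entry snippet */
    double cpu_seconds;     /* accumulated by the exit snippet */
    uint64_t calls_to_bar;  /* incremented by the call-site snippet */
} foo_metrics;

/* Entry point: startTimer() */
void snippet_entry(foo_metrics *m) {
    m->start = clock();
}

/* Exit point: stopTimer() adds the processor time spent since entry. */
void snippet_exit(foo_metrics *m) {
    m->cpu_seconds += (double)(clock() - m->start) / CLOCKS_PER_SEC;
}

/* Call point (just before "call <bar>"): counter++ */
void snippet_call(foo_metrics *m) {
    m->calls_to_bar++;
}
```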

Page 6

Transfer control from a function to its instrumentation code as quickly as possible

Goal

Page 7

To switch execution from a function to its instrumentation code:
– Overwrite instructions in the function with a control transfer instruction.
– Copy equivalents of the overwritten instructions to the code patch area.
– On the x86, Paradyn by default uses a 5-byte jump to transfer control to the instrumentation code.
  • The 5-byte jump's 32-bit displacement can reach the whole address space.
– If a 5-byte instruction won't fit, we use a 1-byte trap (the int3 instruction).

Control Transfer
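A minimal sketch of encoding the two control-transfer instructions, assuming a 32-bit x86 target. The 5-byte jump is opcode 0xE9 followed by a little-endian 32-bit displacement measured from the end of the jump; because the displacement arithmetic wraps modulo 2^32, any address in a 32-bit address space is reachable. The trap is the single byte 0xCC (int3). The function names are hypothetical, and writing these bytes into the running application (which Paradyn's daemon has to do) is not shown.

```c
#include <stdint.h>

/* Fill buf with "jmp rel32" as if buf were located at address `from`,
   so that execution continues at address `to`. */
void encode_jmp_rel32(uint8_t buf[5], uint32_t from, uint32_t to) {
    /* Displacement is relative to the next instruction (from + 5);
       unsigned arithmetic wraps, so backward jumps encode correctly. */
    uint32_t disp = to - (from + 5u);
    buf[0] = 0xE9;
    for (int i = 0; i < 4; i++)
        buf[1 + i] = (uint8_t)(disp >> (8 * i)); /* little-endian */
}

/* Fill buf with the 1-byte breakpoint trap, int3. */
void encode_int3(uint8_t buf[1]) {
    buf[0] = 0xCC;
}
```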

Page 8

• Dynamically rewrite the function in place
• Different techniques for different types of instrumentation points

Inserting Control Transfer Instructions

Page 9

Jumps and Traps

Instrument Entry Point

Case 1

push mov sub

Page 10

Jumps and Traps

Instrument Entry Point

Case 1

jmp <instrumentation>

push mov sub

Enough room to replace instructions with a jump

Page 11

Jumps and Traps

Instrument Entry Point

Case 2

push mov jmp

Page 12

Jumps and Traps

Instrument Entry Point

Case 2

push mov

jmp <instrumentation>

jmp

jmp

Inserting a jump instruction interferes with the target of the backwards jump (the backwards jump would land inside the 5 bytes the new jump overwrites)

Page 13

Jumps and Traps

Instrument Entry Point

Case 2

push mov jmp

int3 mov jmp

Must use a trap instruction to get to instrumentation
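The choice between Case 1 and Case 2 can be sketched as a check over the instructions at the point: cover whole instructions until at least 5 bytes are available, then refuse the jump if any branch target falls strictly inside that span. This assumes instruction lengths and branch targets (as offsets from the function start) were already obtained by disassembly; the function and parameter names are hypothetical, and the lengths in the usage below are typical encodings of push %ebp (1 byte), mov %esp,%ebp (2 bytes), sub $imm8,%esp (3 bytes), and a short jmp (2 bytes).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_REL32_LEN 5u

/* point:    offset of the instrumentation point within the function
   insn_len: lengths of the instructions starting at `point`, in order
   targets:  offsets (within the function) that some branch jumps to
   Returns true if a 5-byte jump can safely overwrite whole instructions. */
bool can_use_jump(uint32_t point, const uint8_t *insn_len, size_t n_insns,
                  const uint32_t *targets, size_t n_targets)
{
    uint32_t end = point;
    size_t i = 0;
    /* Cover whole instructions until at least 5 bytes are available. */
    while (end - point < JMP_REL32_LEN) {
        if (i == n_insns)
            return false; /* not enough bytes before the function ends */
        end += insn_len[i++];
    }
    /* A target equal to `point` lands on the jump itself, which is fine;
       a target strictly inside (point, end) would land mid-jump. */
    for (size_t t = 0; t < n_targets; t++)
        if (targets[t] > point && targets[t] < end)
            return false;
    return true;
}
```

For Case 1 (push, mov, sub with no incoming branches) this returns true; for Case 2 (push, mov, and a short jmp back to the mov at offset 1) it returns false, so a trap is needed.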

Page 14

Jumps and Traps

call <Foo>

Instrument Call Point

Page 15

Jumps and Traps

jmp <instrumentation>

Instrument Call Point

Enough room to replace instruction with a jump

call <Foo>

Page 16

Instrument Exit Point

Case 1

Jumps and Traps

mov leave ret

Page 17

Jumps and Traps

jmp <instrumentation>

Instrument Exit Point

Case 1

Back up far enough to replace instructions with a jump

mov leave ret

Page 18

Jumps and Traps

Instrument Exit Point

Case 2

call <Foo> leave ret

Page 19

Jumps and Traps

call jmp <instrumentation>

Instrument Exit Point

Case 2

Jump interferes with the preceding call (the call returns to the leave, which would now lie inside the jump)

call <Foo> leave ret

Page 20

Jumps and Traps

Instrument Exit Point

Case 2a

call <Foo> leave ret

Beginning of next function (4-byte boundary)

Page 21

Jumps and Traps

Instrument Exit Point

Case 2a

call <Foo> leave ret

Compiler pads with “bonus bytes”

Beginning of next function (4-byte boundary)

? ? ?

Page 22

Jumps and Traps

jmp <instrumentation>

Instrument Exit Point

Case 2a

call <Foo>

Replace instructions with a jump

call <Foo> leave ret

Compiler pads with “bonus bytes”

Beginning of next function (4-byte boundary)

? ? ?

Page 23

Jumps and Traps

Instrument Exit Point

Case 2b

call <Foo> leave ret ?

Not enough “bonus bytes” (if any) to overwrite with a jump

Page 24

Jumps and Traps

Instrument Exit Point

Case 2b

Overwrite return with a trap

call <Foo> leave ret

call <Foo> leave int3 ?

Not enough “bonus bytes” (if any) to overwrite with a jump

?
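Cases 2a and 2b reduce to byte counting. Assuming, as the slides do, that the next function starts at the next 4-byte boundary and that the padding in between is never executed, the room at the exit is the leave and ret plus the bonus bytes, and a jump fits only if that total is at least 5. A sketch with hypothetical names; a real tool would be safer checking the next symbol's actual address than assuming alignment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Padding between the end of a function and the next 4-byte boundary,
   assuming the next function starts exactly at that boundary. */
uint32_t bonus_bytes(uint32_t func_end) {
    return (4u - func_end % 4u) % 4u;
}

/* exit_start: address of the first instruction after the preceding call
   (the leave); func_end: address just past the ret.
   The jump must start exactly at exit_start, because the call returns there. */
bool exit_fits_jump(uint32_t exit_start, uint32_t func_end) {
    return (func_end - exit_start) + bonus_bytes(func_end) >= 5u;
}
```

With leave and ret occupying 2 bytes, three bonus bytes (Case 2a) make exactly 5 and allow a jump; a single bonus byte (Case 2b) leaves only 3, so the ret is overwritten with a trap instead.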

Page 25

Jumps and Traps

Extra slot

push mov sub mov

No jumps to first ten bytes of function

Page 26

Jumps and Traps

Extra slot

push mov sub

jmp <instrumentation> mov

No jumps to first ten bytes of function

Enough space to overwrite entry with a jump

mov

Page 27

Jumps and Traps

Extra slot

push mov sub

jmp <instrumentation> jmp <instrumentation>

Enough space to overwrite entry with a jump

No jumps to first ten bytes of function

Make 2-byte jump to “extra slot”, overwrite “extra slot” with jump to instrumentation

mov
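The 2-byte jump used to reach the extra slot is the x86 short jump: opcode 0xEB followed by a signed 8-bit displacement measured from the end of the 2-byte instruction, so the slot must lie within -128 to +127 bytes of that point. A minimal encoder under those assumptions (the name is hypothetical); the extra slot itself would then be overwritten with an ordinary 5-byte jump to the instrumentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fill buf with "jmp rel8" located at `from` and targeting `to`.
   Returns false, leaving buf untouched, if `to` is out of short-jump range. */
bool encode_jmp_rel8(uint8_t buf[2], uint32_t from, uint32_t to) {
    int64_t disp = (int64_t)to - ((int64_t)from + 2);
    if (disp < -128 || disp > 127)
        return false;
    buf[0] = 0xEB;
    buf[1] = (uint8_t)(int8_t)disp; /* two's-complement byte */
    return true;
}
```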

Page 28

Traps on x86
• Generate an exception that is caught by either the application (Solaris, Linux) or the Paradyn daemon (Windows NT).
• The address of the trap instruction is used to determine which instrumentation code to execute.

Control Transfer
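The address-to-instrumentation step can be sketched as a lookup in a table recorded when the traps were planted. One x86 fact it relies on: int3 is a trap-class exception, so the saved program counter already points just past the 1-byte instruction and must be decremented. The table layout and names are hypothetical, and obtaining the saved PC from a signal context (Solaris, Linux) or a debug event (Windows NT) is platform-specific and not shown.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t trap_addr;  /* where the int3 was written */
    uint32_t instr_addr; /* instrumentation code to run for that trap */
} trap_entry;

/* tab must be sorted by trap_addr.
   Returns 0 if saved_pc does not follow one of our traps. */
uint32_t lookup_instrumentation(const trap_entry *tab, size_t n, uint32_t saved_pc) {
    uint32_t key = saved_pc - 1u; /* int3 is 1 byte; saved PC points past it */
    size_t lo = 0, hi = n;
    while (lo < hi) { /* lower-bound binary search */
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].trap_addr < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && tab[lo].trap_addr == key) ? tab[lo].instr_addr : 0u;
}
```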

Page 29

Trap handling is slow:
• On Solaris 2.6, jumps are over 1000 times faster than traps.
• On Linux 2.2, jumps are over 200 times faster than traps.

Traps limit instrumentation:
• We can't insert as much instrumentation, or insert it at as fine a granularity

Trap handling logic is difficult:
• Susceptible to bugs
• Difficult to understand and maintain

Problem

Page 30

Solution

Rewrite functions that do not have enough room for jumps into functions that do.
– Rewrite the function on the fly, combining dynamic instrumentation and binary rewriting.

Page 31

Dynamic Rewriting

Dynamic Rewriting

Page 32

Dynamic Rewriting

overwrite existing instructions

Dynamic Rewriting

Page 33

Dynamic Rewriting

overwrite existing instructions

expand instrumentation points

Dynamic Rewriting

Page 34

Dynamic Rewriting

overwrite existing instructions

Relocate Function

Dynamic Rewriting

expand instrumentation points

Page 35

In Paradyn we rewrite a function:
– only if the function contains an instrumentation point that would require using a trap to instrument
– the first time a request to instrument the function is made
– even if the instrumentation to be inserted is not for a point that requires using a trap
  • e.g. the exit needs a trap and the entry can use a jump, but the request is to instrument the entry

Function Rewriting and Relocation

Page 36

– all instrumentation points that cannot use a jump are expanded.

Function Rewriting and Relocation (continued)
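The rewriting policy on these two slides amounts to a small decision function. A sketch with hypothetical types: each point records whether it would need a trap, and the function records whether it has already been relocated. The point named in the request is deliberately not consulted, matching the third bullet.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { POINT_ENTRY, POINT_EXIT, POINT_CALL } point_kind;

typedef struct {
    point_kind kind;
    bool needs_trap; /* no room for a 5-byte jump at this point */
} inst_point;

typedef struct {
    const inst_point *points;
    size_t n_points;
    bool relocated;  /* already rewritten by an earlier request */
} function_info;

/* Called on any request to instrument f, whichever point it names. */
bool should_relocate(const function_info *f) {
    if (f->relocated)
        return false; /* only the first request triggers a rewrite */
    for (size_t i = 0; i < f->n_points; i++)
        if (f->points[i].needs_trap)
            return true;
    return false;
}
```

When this returns true, every point with needs_trap set would be expanded during the rewrite, not just the requested one.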

Page 37

Rewriting A Function

Entry

push mov

call <Bar>

Call

Exit

ret

call <Foo>

Call

Page 38

push mov

Entry

nop

Insert nop at entry

call <Bar>

Call

Exit

ret

call <Foo>

Rewriting A Function

Call

Page 39

Entry

Insert nop at entry

call <Bar>

Call

Exit

ret

call <Foo> jmp <instrumentation>

Rewriting A Function

Call

Page 40

call <Bar>

Call

nop nop nop nop

Insert nops at exit

Exit

ret

call <Foo>

Entry

Insert nop at entry

jmp <instrumentation>

Rewriting A Function

Call

Page 41

call <Bar>

Call

Insert nops at exit

Exit

call <Foo> jmp <instrumentation>

jmp <instrumentation>

Rewriting A Function

Entry

Insert nop at entry

Call
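The expansion step on these slides, copying the function while inserting nops so that each point has at least 5 bytes, can be sketched as follows. Assumptions: instruction boundaries are known, the caller has already chosen how many nops each instruction needs, and branch displacements are fixed up afterwards. The offset map records where each original instruction's padding begins, so a branch that targeted an expanded instruction lands on the jump that will later overwrite the nops and that instruction. Placing the nops in front of the instruction is one possible choice; names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X86_NOP 0x90

/* Copy n_insns instructions from code into out, inserting pad[i] one-byte
   nops in front of instruction i. old_to_new[i] receives the new offset of
   instruction i (including its padding). Returns the new function size. */
size_t copy_with_padding(const uint8_t *code, const uint8_t *insn_len, size_t n_insns,
                         const uint8_t *pad, uint8_t *out, uint32_t *old_to_new)
{
    size_t in = 0, o = 0;
    for (size_t i = 0; i < n_insns; i++) {
        old_to_new[i] = (uint32_t)o;      /* branches to insn i land at its padding */
        memset(out + o, X86_NOP, pad[i]); /* room for a later 5-byte jump */
        o += pad[i];
        memcpy(out + o, code + in, insn_len[i]);
        in += insn_len[i];
        o += insn_len[i];
    }
    return o;
}
```

For example, padding a 1-byte ret with four nops gives the "nop nop nop nop ret" exit shown above, exactly 5 bytes for a jump.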

Page 42

Rewriting A Function

Entry

push mov

call <Bar>

Call

Exit

ret

call <Foo>

Original Function

Call

Page 43

Overwrite entry of original function with jump to rewritten function

call <Foo>

Call

Exit

ret

Rewriting A Function

Entry

call <Foo> jmp <rewritten function>

Original Function

Page 44

Update Jumps and Calls
• PC-relative jump and call instructions:
  – those with destinations outside the function will have incorrect displacements once the function is moved
  – some jumps to locations inside the function will have incorrect displacements, because inserted nops change the distance to their targets
• 2-byte jumps:
  – have a range of 127 bytes forward and 128 bytes backward (a signed 8-bit displacement, measured from the end of the jump)
  – if the target address is no longer in range, replace the 2-byte instruction with a 5-byte instruction that has further reach
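The fix-ups on this slide can be sketched for unconditional jumps: recompute the displacement from the branch's new location to its target, keep the 2-byte form (0xEB rel8) if it still fits, otherwise promote it to the 5-byte form (0xE9 rel32). The same displacement formula applies to a 5-byte call (0xE8 rel32) whose target lies outside the function. Not shown: conditional short jumps (opcodes 0x70 to 0x7F) promote to a 6-byte form (0x0F 0x80 to 0x8F with rel32), and because promotion shifts later code, a real relocator has to recompute the layout until no more jumps grow. Names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Signed distance from the end of a relocated branch to its target. */
int64_t new_displacement(uint32_t target, uint32_t new_end) {
    return (int64_t)target - (int64_t)new_end;
}

/* Does a displacement fit the signed 8-bit field of a 2-byte jump? */
bool fits_rel8(int64_t disp) {
    return disp >= -128 && disp <= 127;
}

/* Re-emit an unconditional jump at its new address new_addr.
   out must have room for 5 bytes; returns the number of bytes written. */
size_t emit_relocated_jmp(uint8_t *out, uint32_t new_addr, uint32_t target) {
    int64_t d8 = new_displacement(target, new_addr + 2u);
    if (fits_rel8(d8)) {
        out[0] = 0xEB;                /* still in range: keep the short form */
        out[1] = (uint8_t)(int8_t)d8;
        return 2;
    }
    uint32_t d32 = target - (new_addr + 5u); /* wraps mod 2^32 */
    out[0] = 0xE9;                    /* promote to the near form */
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(d32 >> (8 * i));
    return 5;
}
```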

Page 45

Status

Dynamic rewriting and function relocation are operational in Paradyn release 3.2 for x86 (Solaris, Linux, Windows NT).

Page 46

Current Limitations

We do not relocate a function if:
– the application is executing within the function we want to instrument
– it has a jump table
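The two limitations can be read as a pre-check before relocating. One conservative reading, sketched here: treat the function as busy if the current program counter or any return address from a stack walk lies inside it, since that frame would otherwise resume in the old copy. The caller is assumed to supply the function's address range, those addresses, and a flag from analysis saying whether a jump table was found; how they are gathered is not shown, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* start/end: [start, end) address range of the candidate function.
   pcs: current PC and return addresses of all active frames. */
bool safe_to_relocate(uint32_t start, uint32_t end,
                      const uint32_t *pcs, size_t n_pcs, bool has_jump_table) {
    if (has_jump_table)
        return false; /* current limitation: jump tables are not handled */
    for (size_t i = 0; i < n_pcs; i++)
        if (pcs[i] >= start && pcs[i] < end)
            return false; /* execution is, or will resume, inside the function */
    return true;
}
```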

Page 47

Jumps vs. Traps

Trap handling: average time to get to instrumentation and back (time in microseconds)

Solaris: trap 37.6, jump 0.03
Linux: trap 8.3, jump 0.04
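Dividing the trap time by the jump time gives the ratios behind the "over 1000 times" and "over 200 times" figures on the Problem slide:

```latex
\frac{37.6\ \mu\mathrm{s}}{0.03\ \mu\mathrm{s}} \approx 1253 \quad (\text{Solaris}),
\qquad
\frac{8.3\ \mu\mathrm{s}}{0.04\ \mu\mathrm{s}} = 207.5 \quad (\text{Linux})
```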

Page 48

• Relocating functions that are performance bottlenecks leads to the greatest speedup.
• More instrumentation can be inserted, since perturbation of the system is minimized.
• In Paradyn, the speedup depends on the type of metric (e.g. CPU time, number of procedure calls).

Jumps vs. Traps

Page 49

Some Results

bubba (circuit layout)
• instrumented 9 functions for CPU time
  – all required a trap for the exit point
  – 5 relocated functions
    • called 400 thousand times
    • consumed 20% of CPU
• 23 seconds to execute using relocation
• 42 seconds to execute without relocation

Page 50

Some Results

fspx (2-D heat transfer simulation)
• 4 of 46 functions required traps
  – all for exit points
• instrumented __atan for CPU time
  – required a trap for the exit
  – called 107 million times
  – consumed 25% of CPU
• 7.5 minutes to execute using relocation
• 115 minutes to execute without relocation
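Expressed as ratios of run time without relocation to run time with relocation, the bubba and fspx results work out as follows; the much larger gain for fspx is plausibly tied to __atan's far higher call count (107 million versus 400 thousand), since each call pays the trap cost at its exit:

```latex
\text{bubba: } \frac{42\ \mathrm{s}}{23\ \mathrm{s}} \approx 1.8\times
\qquad
\text{fspx: } \frac{115\ \mathrm{min}}{7.5\ \mathrm{min}} \approx 15.3\times
```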

Page 51

Conclusions

Dynamic rewriting and function relocation:

• Used by Paradyn to allow jumps, instead of traps, when profiling applications, improving performance.

• Crucial for large scale and fine-grained instrumentation.