instant profiling: instrumentation sampling for profiling datacenter applications

Post on 23-Feb-2016


DESCRIPTION

Instant Profiling: Instrumentation Sampling for Profiling Datacenter Applications. Hyoun Kyu Cho (1), Tipp Moseley (2), Richard Hank (2), Derek Bruening (2), Scott Mahlke (1). (1) University of Michigan, (2) Google. PowerPoint PPT Presentation

TRANSCRIPT

1

Instant Profiling: Instrumentation Sampling for Profiling Datacenter Applications

Hyoun Kyu Cho (1), Tipp Moseley (2), Richard Hank (2), Derek Bruening (2), Scott Mahlke (1)

(1) University of Michigan, (2) Google

2

Datacenter Applications

• In 2010, US datacenters consumed 70 to 90 billion kWh*

• Datacenter application performance is critical

• Profiling can help

http://googleblog.blogspot.com

*[Koomey`11]

3

Challenges for Datacenters

• Need to run on live traffic
  • Difficult to isolate
• Overheads
  • Value profiling: 3.8x slowdown [Calder`99]
  • Path profiling: 31%; edge profiling: 16% [Ball`96]
• Binary management
  • Many programs, multiple versions

Traditional Profiling

[Diagram: Source Code → Instrumentation Build → Instrumented Binary; Instrumented Binary + Input Data → Training Run → Profile Data]

4

Google-Wide Profiling [Ren et al.`10]

Continuous profiling infrastructure for datacenters

Negligible overhead
• Sampling based
• Aggregated profiling overhead less than 0.01%

Limitations
• Relies heavily on Performance Monitoring Units
• Limited flexibility and portability

5

Goals

Unified profiling infrastructure for datacenters
• Flexible types of profile data
• Portable across heterogeneous datacenters

While maintaining
• Low overhead
• No added burden on binary management

Sampling Dynamic Binary Instrumentation

6

Instrumentation Sampling

[Diagram, built up over three animation steps. Layers: application, operating system, hardware, with a system call gateway. DynamoRIO [Bruening`04] components: dispatch, instrumentation engine, client, and code cache, with a context switch between dispatch and the code cache. A shepherding thread is shown with "start profiling" and "stop profiling".]
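The "start profiling" and "stop profiling" labels suggest that instrumented execution is switched on only for short bursts. As a rough illustration only, the toy simulation below models such a duty cycle. The `SamplingSchedule` class, the burst and period values, and the millisecond clock are all hypothetical, and nothing here uses the DynamoRIO API.

```python
from dataclasses import dataclass


@dataclass
class SamplingSchedule:
    """Hypothetical model: profile for burst_ms at the start of every period_ms."""
    burst_ms: int
    period_ms: int

    def is_profiling(self, t_ms: int) -> bool:
        # Instrumented code runs only during the burst that opens each period.
        return (t_ms % self.period_ms) < self.burst_ms


def run(schedule: SamplingSchedule, total_ms: int) -> dict:
    """Walk a simulated clock and count the milliseconds spent in each mode."""
    spent = {"instrumented": 0, "native": 0}
    for t in range(total_ms):
        mode = "instrumented" if schedule.is_profiling(t) else "native"
        spent[mode] += 1
    return spent


# Placeholder values chosen for illustration, not taken from the slides.
schedule = SamplingSchedule(burst_ms=2, period_ms=1000)
spent = run(schedule, total_ms=4000)
print(spent)
```

In this model the application runs natively for all but a small fraction of wall-clock time, which is the intuition behind sampling the instrumentation.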

7

Problems with Basic Implementation

• Unbounded profiling periods due to fragment linking
• Latency degradation due to initial instrumentation
• Multi-threaded programs

8

Temporal Unlinking/Relinking of Fragments

[Diagram: code cache holding fragments BB1 and BB2 with the link BB2->BB1, plus dispatch and a context switch.]
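One way to read this slide: while fragments are linked, a loop such as BB1, BB2, BB1 never leaves the code cache, so a request to stop profiling is never noticed. Unlinking forces control back through dispatch. The sketch below is a toy model of that idea under those assumptions. `Fragment` and `execute` are hypothetical names, not DynamoRIO data structures.

```python
class Fragment:
    """Hypothetical stand-in for a code-cache fragment (one basic block)."""

    def __init__(self, name, successor=None):
        self.name = name
        self.successor = successor  # next fragment in program order
        self.linked = True          # True: jump straight to the successor


def execute(start, stop_requested, max_steps=10):
    """Run from start; return (trace, True if dispatch stopped profiling)."""
    trace, frag = [], start
    for _ in range(max_steps):
        trace.append(frag.name)
        if not frag.linked and stop_requested:
            # Unlinked exit: control returns to dispatch, which sees the request.
            return trace, True
        frag = frag.successor
    return trace, False  # step budget exhausted while still inside the cache


# Build the two-block loop BB1 -> BB2 -> BB1.
bb1 = Fragment("BB1")
bb2 = Fragment("BB2", successor=bb1)
bb1.successor = bb2

linked_result = execute(bb1, stop_requested=True)    # stop request is never seen
bb2.linked = False                                   # unlink BB2->BB1
unlinked_result = execute(bb1, stop_requested=True)  # dispatch regains control
bb2.linked = True                                    # relink for the next phase
print(linked_result, unlinked_result)
```

The step budget exists only to keep the linked case from looping forever in the simulation; it has no counterpart in the slides.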

9

S/W Code Cache Pre-population

[Diagram labels: application, operating system, hardware; shepherding thread; dispatch, instrumentation engine, client, code cache.]

• Still has latency degradation for initial instrumentation phases

10

Multithreaded Program Support

• Sampling makes it possible to miss thread operations
• Forces Instant Profiling’s signal handler for every thread
• Enumerates all threads and sends a profiling start signal to each thread
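The enumerate-and-signal step can be pictured with a small simulation. This is only a sketch: a per-thread `threading.Event` stands in for the real signal, and the `Worker` class and `signal_all` helper are hypothetical names that do not come from the slides or from DynamoRIO.

```python
import threading


class Worker(threading.Thread):
    """Application thread that waits for a simulated 'start profiling' signal."""

    def __init__(self, name):
        super().__init__(name=name)
        self.start_profiling = threading.Event()  # stand-in for a per-thread signal
        self.profiled = False

    def run(self):
        # Block until the signal arrives (or give up after a generous timeout).
        self.start_profiling.wait(timeout=5)
        self.profiled = self.start_profiling.is_set()


def signal_all(workers):
    """Enumerate every known thread and deliver the start signal to each one."""
    for worker in workers:
        worker.start_profiling.set()
    return len(workers)


workers = [Worker(f"worker-{i}") for i in range(4)]
for worker in workers:
    worker.start()

notified = signal_all(workers)

for worker in workers:
    worker.join()

profiled = [worker.profiled for worker in workers]
print(notified, profiled)
```

A real implementation would deliver an operating-system signal to each thread and rely on a handler installed in every thread; the events here only mimic that control flow.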

11

Experimental Setup

• 6-core Intel Xeon 2.67GHz w/ 12MB L3
• 12GB main memory
• Linux kernel 2.6.32
• gcc 4.4.3 w/ -O3
• SPEC INT2006, BigTable, Web search
• Edge profiling client

12

Naïve Edge Profiling

[Bar chart: slowdown (y-axis, 0 to 50) for 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 462.libquantum, 464.h264ref, 473.astar, web search, bigtable, and a.mean.]
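For readers unfamiliar with the client being measured, an edge profile counts how often each control-flow edge between basic blocks is taken. The minimal sketch below derives those counts from a recorded block trace. That is a simplification, since an instrumentation client would update counters inline rather than store a trace, and the `edge_profile` function and block names are hypothetical.

```python
from collections import Counter


def edge_profile(block_trace):
    """Count each (source, destination) pair of consecutive basic blocks."""
    return Counter(zip(block_trace, block_trace[1:]))


# Hypothetical execution order of basic blocks.
trace = ["BB1", "BB2", "BB1", "BB2", "BB3"]
profile = edge_profile(trace)

for (src, dst), count in sorted(profile.items()):
    print(f"{src}->{dst}: {count}")
```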

13

Profiling Overhead

[Bar chart: normalized execution time (y-axis, 0.90 to 1.30) for 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 462.libquantum, 464.h264ref, 473.astar, web search, bigtable, and a.mean. Series: 2ms/4s, 1ms/1s, 2ms/1s, 4ms/1s, 2ms/250ms.]
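The series labels are not defined on the slide. Assuming each label reads as instrumented burst length over sampling period (an interpretation, not something the slides state), the fraction of wall-clock time spent instrumented can be worked out as below. The `to_ms` and `duty_cycle` helpers are hypothetical.

```python
UNITS_MS = {"ms": 1.0, "s": 1000.0}


def to_ms(text):
    """Convert a string such as '2ms' or '4s' to milliseconds."""
    for suffix in ("ms", "s"):  # test 'ms' first, since it also ends in 's'
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNITS_MS[suffix]
    raise ValueError(f"unrecognized duration: {text!r}")


def duty_cycle(label):
    """Assumed reading of 'burst/period': fraction of time spent instrumented."""
    burst, period = label.split("/")
    return to_ms(burst) / to_ms(period)


labels = ["2ms/4s", "1ms/1s", "2ms/1s", "4ms/1s", "2ms/250ms"]
cycles = {label: duty_cycle(label) for label in labels}

for label, fraction in cycles.items():
    print(f"{label}: {fraction:.3%} of wall-clock time instrumented")
```

The duty cycle is not the same thing as the measured overhead, which also depends on how much slower instrumented code runs and on the cost of switching modes.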

14

S/W Code Cache Prepopulation

[Line chart: cumulative number of samples (y-axis, 0 to 3,500,000) over sampling phases (x-axis, 0 to 9). Series: w/ pre-population, w/o pre-population.]

15

Profiling Accuracy

[Bar chart: profiling accuracy (y-axis, 0 to 100) for 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 462.libquantum, 464.h264ref, 473.astar, web search, bigtable, and a.mean. Series: 2ms/4s, 1ms/1s, 2ms/1s, 4ms/1s, 2ms/250ms.]

16

Asymptotic Accuracy

[Line chart: cumulative accuracy (y-axis, 0 to 100) over sampling phases (x-axis, 0 to 140). Series: bigtable, web search.]
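The slides do not define how accuracy is computed. As an illustration of how cumulative accuracy can rise as sampling phases are aggregated, the sketch below uses an overlap percentage between normalized edge counts. That metric is an assumption chosen for this example, and the edge names and counts are made up.

```python
from collections import Counter


def overlap_accuracy(sampled, reference):
    """Assumed metric: percentage overlap of two normalized edge profiles."""
    s_total = sum(sampled.values())
    r_total = sum(reference.values())
    if s_total == 0 or r_total == 0:
        return 0.0
    return 100.0 * sum(
        min(sampled[edge] / s_total, reference[edge] / r_total)
        for edge in reference
    )


# Made-up full-run profile and three made-up sampling phases.
reference = Counter({("A", "B"): 60, ("A", "C"): 30, ("C", "A"): 10})
phases = [
    Counter({("A", "B"): 5}),
    Counter({("A", "B"): 1, ("A", "C"): 3}),
    Counter({("C", "A"): 1}),
]

cumulative = Counter()
accuracies = []
for phase in phases:
    cumulative.update(phase)  # aggregate samples across phases
    accuracies.append(overlap_accuracy(cumulative, reference))

print([round(a, 1) for a in accuracies])
```

With these inputs the aggregated profile moves closer to the reference after every phase, which is the behavior the plot's title describes.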

17

Conclusion

• Low-overhead, portable, flexible profiling needed
• Instant Profiling
  • Combines sampling and DBI
  • Pre-populates S/W code cache
  • Tunable tradeoff between overhead and information
  • Provides eventual profiling accuracy
• Less than 5% overhead, more than 80% accuracy for naïve edge profiling client

18

Thank you!
