Instant Profiling: Instrumentation Sampling for Profiling Datacenter Applications



Page 1: Instant Profiling: Instrumentation Sampling for Profiling Datacenter Applications

Hyoun Kyu Cho¹, Tipp Moseley², Richard Hank², Derek Bruening², Scott Mahlke¹

¹University of Michigan ²Google

Page 2: Datacenter Applications

• In 2010, US datacenters consumed 70 to 90 billion kWh*
• Datacenter application performance is critical
• Profiling can help

http://googleblog.blogspot.com

*[Koomey`11]

Page 3: Challenges for Datacenters

• Need to run on live traffic
  • Difficult to isolate
• Overheads
  • Value profiling: 3.8x slowdown¹
  • Path profiling: 31%, edge profiling: 16%²
• Binary management
  • Many programs, multiple versions

[Diagram, Traditional Profiling: Source Code -> Instrumentation Build -> Instrumented Binary; Instrumented Binary + Input Data -> Training Run -> Profile Data]

¹[Calder`99] ²[Ball`96]

Page 4: Google-Wide Profiling [Ren et al.`10]

Continuous profiling infrastructure for datacenters

Negligible overhead
• Sampling based
• Aggregated profiling overhead less than 0.01%

Limitations
• Relies heavily on Performance Monitoring Units
• Limited flexibility and portability

Page 5: Goals

Unified profiling infrastructure for datacenters
• Flexible types of profile data
• Portable across heterogeneous datacenters

While maintaining
• Low overhead
• No added burden on binary management

Sampling + Dynamic Binary Instrumentation

Page 6: Instrumentation Sampling

[Diagram: application, system call gateway, operating system, hardware]

Page 7: Instrumentation Sampling

[Diagram: application; DynamoRIO [Bruening`04] with dispatch, instrumentation engine, client, code cache, and context switch; operating system; hardware]

Page 8: Instrumentation Sampling

[Diagram: application; shepherding thread issuing "start profiling" and "stop profiling"; dispatch, instrumentation engine, client, code cache; operating system; hardware]
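The start/stop switching driven by the shepherding thread can be pictured with a small simulation. This is only a sketch under stated assumptions, not DynamoRIO code: time is counted in abstract steps, and `run_with_sampling`, `burst_steps`, and `period_steps` are hypothetical names introduced for illustration.

```python
from collections import Counter


def run_with_sampling(block_trace, burst_steps, period_steps):
    """Count executions only for the steps that fall inside a profiling burst.

    block_trace  -- basic-block ids, one per time step (hypothetical time unit)
    burst_steps  -- steps between "start profiling" and "stop profiling"
    period_steps -- steps between consecutive "start profiling" signals
    """
    counts = Counter()
    for step, block in enumerate(block_trace):
        # The shepherding thread starts profiling at the top of each period
        # and stops it once the burst has elapsed.
        profiling_on = (step % period_steps) < burst_steps
        if profiling_on:
            counts[block] += 1  # instrumented copy runs from the code cache
        # Otherwise the original code runs natively and nothing is recorded.
    return counts


trace = ["A", "B"] * 5  # ten steps alternating between two blocks
sampled = run_with_sampling(trace, burst_steps=2, period_steps=5)
```

Only four of the ten steps are observed here, which is the point of the technique: instrumentation cost is paid during the bursts and the rest of the run proceeds natively.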

Page 9: Problems with Basic Implementation

Unbounded profiling periods due to fragment linking

Latency degradation due to initial instrumentation

Multi-threaded programs

Page 10: Temporal Unlinking/Relinking of Fragments

[Diagram: code cache holding fragments BB1 and BB2 with the link BB2->BB1; dispatch; context switch]
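Why linked fragments make a profiling period unbounded, and how unlinking bounds it, can be sketched with a toy model. It assumes that linked fragments jump directly to each other, while an unlinked fragment exit returns to dispatch, which is where a stop request can take effect. `CodeCache` and `executions_until_dispatch` are hypothetical names, not DynamoRIO APIs.

```python
class CodeCache:
    """Toy code cache: `successor` maps each fragment to its exit target."""

    def __init__(self, successor):
        self.successor = successor
        self.linked = True  # fragments jump straight to each other

    def unlink_all(self):
        self.linked = False  # every fragment exit now returns to dispatch

    def relink_all(self):
        self.linked = True  # restore direct jumps for the next profiling phase


def executions_until_dispatch(cache, start, stop_at, unlink_on_stop, max_steps=1000):
    """Return how many fragments run before control gets back to dispatch."""
    block = start
    for step in range(max_steps):
        if step == stop_at and unlink_on_stop:
            cache.unlink_all()  # stop request arrives: cut the links
        block = cache.successor[block]  # run the fragment, take its exit
        if not cache.linked:
            cache.relink_all()
            return step + 1  # exit reached dispatch, profiling can stop
    return max_steps  # still looping inside the code cache


loop = {"BB1": "BB2", "BB2": "BB1"}
bounded = executions_until_dispatch(CodeCache(loop), "BB1", stop_at=10, unlink_on_stop=True)
unbounded = executions_until_dispatch(CodeCache(loop), "BB1", stop_at=10, unlink_on_stop=False)
```

With unlinking, the loop leaves the code cache one fragment after the stop request; without it, the BB1/BB2 loop runs until the simulation's step limit.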

Page 11: S/W Code Cache Pre-population

[Diagram: application; shepherding thread; dispatch, instrumentation engine, client, code cache; operating system; hardware]

Still have latency degradation for initial instrumentation phases

Page 12: Multithreaded Program Support

Sampling makes it possible to miss thread operations that occur while profiling is off

Forces every thread to use Instant Profiling's signal handler

Enumerates all threads and sends the profiling start signal to each thread
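The enumerate-and-notify step can be illustrated with an analogy in plain Python. This sketch substitutes per-thread `threading.Event` flags for POSIX signals and a registry for thread enumeration, so it shows only the broadcast pattern, not the actual signal-based mechanism. `ThreadRegistry` and `demo` are hypothetical names.

```python
import threading


class ThreadRegistry:
    """Tracks one "start profiling" flag per registered thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._start_flags = {}

    def register_current_thread(self):
        # Stands in for installing the profiler's handler in this thread.
        flag = threading.Event()
        with self._lock:
            self._start_flags[threading.get_ident()] = flag
        return flag

    def broadcast_start(self):
        # Stands in for enumerating all threads and signalling each one.
        with self._lock:
            flags = list(self._start_flags.values())
        for flag in flags:
            flag.set()
        return len(flags)


def demo(num_threads=3):
    registry = ThreadRegistry()
    all_registered = threading.Barrier(num_threads + 1)
    started, started_lock = [], threading.Lock()

    def worker(name):
        flag = registry.register_current_thread()
        all_registered.wait()
        if flag.wait(timeout=5):  # block until the start notification
            with started_lock:
                started.append(name)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    all_registered.wait()  # every worker has registered
    notified = registry.broadcast_start()
    for t in threads:
        t.join()
    return notified, sorted(started)
```

Calling `demo(3)` starts three workers and notifies each of them once; a thread that never registered would not be notified, which mirrors the reason every thread must be covered.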

Page 13: Experimental Setup

• 6-core Intel Xeon 2.67GHz w/ 12MB L3
• 12GB main memory
• Linux kernel 2.6.32
• gcc 4.4.3 w/ -O3
• SPEC INT2006, BigTable, Web search
• Edge profiling client

Page 14: Naïve Edge Profiling

[Bar chart. x-axis: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 462.libquantum, 464.h264ref, 473.astar, web search, bigtable, a.mean. y-axis: Slowdown, 0 to 50]
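For readers unfamiliar with edge profiling, the data such a client collects can be sketched in a few lines: a count for every control-flow edge, that is, every (source block, destination block) pair that executes. The function name `edge_profile` and the trace representation are assumptions made for illustration; a real client would update counters from instrumented code rather than from a recorded trace.

```python
from collections import Counter


def edge_profile(block_trace):
    """Count how often each (source, destination) block transition occurs."""
    # Pair every block with the block executed right after it.
    return Counter(zip(block_trace, block_trace[1:]))


profile = edge_profile(["A", "B", "A", "B", "C"])
```

Here the edge A->B is counted twice, while B->A and B->C are counted once each.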

Page 15: Profiling Overhead

[Bar chart. x-axis: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 462.libquantum, 464.h264ref, 473.astar, web search, bigtable, a.mean. y-axis: Normalized Execution Time, 0.90 to 1.30. Series: 2ms/4s, 1ms/1s, 2ms/1s, 4ms/1s, 2ms/250ms]
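The five configurations can be compared by their duty cycle, assuming each label reads as "profiling burst length / sampling period" (the slides do not spell this out). The estimate below is a deliberately simple model that ignores the cost of switching and of initial instrumentation, and the 10x instrumented slowdown is a made-up input, not a measured value.

```python
from fractions import Fraction

# (burst_ms, period_ms) for each series label, under the assumed reading.
CONFIGS_MS = {
    "2ms/4s": (2, 4000),
    "1ms/1s": (1, 1000),
    "2ms/1s": (2, 1000),
    "4ms/1s": (4, 1000),
    "2ms/250ms": (2, 250),
}


def duty_cycle(burst_ms, period_ms):
    """Fraction of wall-clock time spent running instrumented code."""
    return Fraction(burst_ms, period_ms)


def estimated_normalized_time(duty, instrumented_slowdown):
    """Simple model: native time plus the extra cost paid during bursts."""
    return 1 + duty * (instrumented_slowdown - 1)


duties = {name: duty_cycle(*cfg) for name, cfg in CONFIGS_MS.items()}
estimate = estimated_normalized_time(duties["2ms/250ms"], 10)
```

Under this reading the duty cycles range from 0.05% (2ms/4s) to 0.8% (2ms/250ms), so even a large slowdown while instrumented contributes only a small share of total run time in this model.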

Page 16: S/W Code Cache Pre-population

[Line chart. x-axis: Sampling Phases, 0 to 9. y-axis: Cumulative Number of Samples, 0 to 3,500,000. Series: w/ pre-population, w/o pre-population]

Page 17: Profiling Accuracy

[Bar chart. x-axis: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 462.libquantum, 464.h264ref, 473.astar, web search, bigtable, a.mean. y-axis: Profiling Accuracy, 0 to 100. Series: 2ms/4s, 1ms/1s, 2ms/1s, 4ms/1s, 2ms/250ms]

Page 18: Asymptotic Accuracy

[Line chart. x-axis: Sampling Phases, 0 to 140. y-axis: Cumulative Accuracy, 0 to 100. Series: bigtable, web search]
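The slides do not define how accuracy is computed. One common way to compare a sampled edge profile with a full one is the overlap percentage, and the sketch below uses it purely as an assumed stand-in to show how accuracy can rise as samples from more phases are merged. `overlap_accuracy` and `cumulative_accuracy` are hypothetical names.

```python
from collections import Counter


def overlap_accuracy(full, sampled):
    """Overlap percentage between two edge profiles (0 to 100)."""
    full_total = sum(full.values())
    sampled_total = sum(sampled.values())
    if full_total == 0 or sampled_total == 0:
        return 0.0
    # For each edge, take the smaller of its two normalized frequencies.
    shared = sum(
        min(full[edge] / full_total, sampled[edge] / sampled_total)
        for edge in full
    )
    return 100.0 * shared


def cumulative_accuracy(full, phases):
    """Accuracy after merging the samples of each successive phase."""
    merged = Counter()
    history = []
    for phase in phases:
        merged.update(phase)  # add this phase's counts to the running total
        history.append(overlap_accuracy(full, merged))
    return history


full_profile = Counter({"e1": 50, "e2": 50})
phases = [Counter({"e1": 4}), Counter({"e2": 4})]
history = cumulative_accuracy(full_profile, phases)
```

In this toy input the first phase sees only one of the two equally hot edges, and the second phase supplies the other, so the merged profile converges on the full one.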

Page 19: Conclusion

Low-overhead, portable, flexible profiling needed

Instant Profiling
• Combines sampling and DBI
• Pre-populates S/W code cache
• Tunable tradeoff between overhead and information
• Provides eventual profiling accuracy

Less than 5% overhead, more than 80% accuracy for naïve edge profiling client

Page 20: Thank you!