1
Instant Profiling: Instrumentation Sampling for Profiling Datacenter Applications
Hyoun Kyu Cho¹, Tipp Moseley², Richard Hank², Derek Bruening², Scott Mahlke¹
¹University of Michigan ²Google
2
Datacenter Applications
• In 2010, US datacenters consumed 70 to 90 billion kWh*
• Datacenter application performance is critical
• Profiling can help
http://googleblog.blogspot.com
*[Koomey'11]
3
Challenges for Datacenters
• Need to run on live traffic
• Difficult to isolate
• Overheads
• Value profiling: 3.8x slowdown [Calder'99]
• Path profiling 31%, edge profiling 16% [Ball'96]
• Binary management
• Many programs, multiple versions
[Figure: Traditional Profiling. Source Code → Instrumentation Build → Instrumented Binary; Instrumented Binary + Input Data → Training Run → Profile Data]
4
Google-Wide Profiling [Ren et al.'10]
Continuous profiling infrastructure for datacenters
Negligible overhead
• Sampling based
• Aggregated profiling overhead less than 0.01%
Limitations
• Relies heavily on Performance Monitoring Units
• Limited flexibility and portability
5
Goals
Unified profiling infrastructure for datacenters
• Flexible types of profile data
• Portable across heterogeneous datacenters
While maintaining
• Low overhead
• No added burden on binary management
Sampling + Dynamic Binary Instrumentation
6
Instrumentation Sampling
[Figure, three builds. Build 1: application, operating system, and hardware layers, with the system call gateway. Build 2: DynamoRIO [Bruening'04] added, showing dispatch, instrumentation engine, client, code cache, and context switch. Build 3: a shepherding thread added, with "start profiling" and "stop profiling" arrows.]
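The start/stop cycle in the figure can be illustrated with a small simulation. This is only a sketch under assumptions, not DynamoRIO code: `SampledApp`, `run_with_sampling`, the tick-based clock, and the block names are hypothetical, and the per-block counter stands in for whatever the profiling client would record.

```python
import enum
from collections import Counter


class Mode(enum.Enum):
    NATIVE = "native"              # application runs uninstrumented
    INSTRUMENTED = "instrumented"  # application runs from the code cache


class SampledApp:
    """Hypothetical stand-in for an application under sampled instrumentation."""

    def __init__(self):
        self.mode = Mode.NATIVE
        self.profile = Counter()

    def start_profiling(self):
        self.mode = Mode.INSTRUMENTED

    def stop_profiling(self):
        self.mode = Mode.NATIVE

    def execute(self, block):
        # Only instrumented execution contributes to the profile.
        if self.mode is Mode.INSTRUMENTED:
            self.profile[block] += 1


def run_with_sampling(trace, burst_ticks, period_ticks):
    """Replay `trace`, profiling the first `burst_ticks` of every period."""
    if not 0 < burst_ticks <= period_ticks:
        raise ValueError("need 0 < burst_ticks <= period_ticks")
    app = SampledApp()
    for tick, block in enumerate(trace):
        phase = tick % period_ticks
        if phase == 0:
            app.start_profiling()   # plays the role of the shepherding thread's start
        elif phase == burst_ticks:
            app.stop_profiling()    # plays the role of the shepherding thread's stop
        app.execute(block)
    return app.profile


profile = run_with_sampling(["A", "B"] * 10, burst_ticks=2, period_ticks=10)
```

With a 2-tick burst in every 10-tick period, only 4 of the 20 executed blocks are observed, which is the intended trade: most execution stays native.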
7
Problems with Basic Implementation
Unbounded profiling periods due to fragment linking
Latency degradation due to initial instrumentation
Multi-threaded programs
8
Temporal Unlinking/Relinking of Fragments
[Figure: code cache holding fragments BB1 and BB2, the BB2->BB1 link, dispatch, and context switch]
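One way to read this slide: linked fragments jump directly to each other inside the code cache, so execution never returns to dispatch and a profiling period cannot be ended on time. Removing the links forces the next fragment exit back to dispatch. The sketch below models only that idea. `Fragment`, `CodeCache`, and the step limit are hypothetical names for illustration and are not DynamoRIO's API.

```python
class Fragment:
    """A cached basic block with a single successor (a simplification)."""

    def __init__(self, name, successor):
        self.name = name
        self.successor = successor  # name of the fragment this one jumps to
        self.linked = True          # True: jump directly; False: exit to dispatch


class CodeCache:
    def __init__(self, fragments):
        self.fragments = {f.name: f for f in fragments}

    def unlink_all(self):
        for frag in self.fragments.values():
            frag.linked = False

    def relink_all(self):
        for frag in self.fragments.values():
            frag.linked = True

    def run(self, entry, max_steps):
        """Execute from `entry` until control returns to dispatch.

        Returns the names of the fragments executed. `max_steps` only
        keeps the simulation finite when a linked loop never exits.
        """
        executed = []
        current = entry
        while len(executed) < max_steps:
            frag = self.fragments[current]
            executed.append(frag.name)
            if not frag.linked:
                break               # unlinked exit: back to dispatch
            current = frag.successor
        return executed


cache = CodeCache([Fragment("BB1", "BB2"), Fragment("BB2", "BB1")])
linked_run = cache.run("BB1", max_steps=6)    # stays in the cache
cache.unlink_all()
unlinked_run = cache.run("BB1", max_steps=6)  # returns after one fragment
cache.relink_all()
relinked_run = cache.run("BB1", max_steps=6)
```

In this model, unlinking bounds how long execution can stay in the cache after a stop request, and relinking restores direct jumps for the next profiling period.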
9
S/W Code Cache Pre-population
[Figure: hardware, operating system, and application layers; shepherding thread; dispatch, instrumentation engine, client, and code cache]
Still have latency degradation for initial instrumentation phases
10
Multithreaded Program Support
Sampling makes it possible to miss thread operations
Forces Instant Profiling's signal handler for every thread
Enumerates all threads and sends profiling start signal to each thread
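The enumerate-and-notify step can be sketched as follows. This is an analogy rather than the actual mechanism: the slide describes OS signals and a per-thread signal handler, whereas this sketch uses a `threading.Event` per thread, because Python runs signal handlers only in the main thread. `Worker` and `send_profiling_start` are hypothetical names.

```python
import threading


class Worker(threading.Thread):
    """A thread that begins 'profiled' work only after it is notified."""

    def __init__(self, name, log, log_lock):
        super().__init__(name=name)
        self.start_signal = threading.Event()  # stands in for the start signal
        self.log = log
        self.log_lock = log_lock

    def run(self):
        self.start_signal.wait()               # block until notified
        with self.log_lock:
            self.log.append(self.name)         # record that profiling began


def send_profiling_start():
    """Enumerate all live threads and notify every Worker among them."""
    notified = []
    for thread in threading.enumerate():
        if isinstance(thread, Worker):
            thread.start_signal.set()
            notified.append(thread.name)
    return notified


log, log_lock = [], threading.Lock()
workers = [Worker(f"worker-{i}", log, log_lock) for i in range(3)]
for w in workers:
    w.start()
notified = send_profiling_start()
for w in workers:
    w.join(timeout=5)
```

Enumerating at notification time, rather than tracking thread creation, matches the concern on the slide: a thread created while instrumentation was off is still found.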
11
Experimental Setup
• 6-core Intel Xeon 2.67GHz w/ 12MB L3
• 12GB main memory
• Linux kernel 2.6.32
• gcc 4.4.3 w/ -O3
• SPEC INT2006, BigTable, Web search
• Edge profiling client
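For readers unfamiliar with edge profiling, the idea is to count how often control flows from one basic block to another. The sketch below does this over a symbolic trace of block names. It is an illustration only: the evaluated client instruments binary code, and `edge_profile` and the block names here are hypothetical.

```python
from collections import Counter


def edge_profile(trace):
    """Count (source, destination) transitions in a basic-block trace."""
    # zip pairs each block with its successor in execution order.
    return Counter(zip(trace, trace[1:]))


edges = edge_profile(["BB1", "BB2", "BB1", "BB2", "BB3"])
```

Here `edges[("BB1", "BB2")]` is 2, while the edges BB2->BB1 and BB2->BB3 are each counted once.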
12
Naïve Edge Profiling
[Figure: bar chart. Y-axis: slowdown (0 to 50). X-axis: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 462.libquantum, 464.h264ref, 473.astar, web search, bigtable, arithmetic mean]
13
Profiling Overhead
[Figure: bar chart. Y-axis: normalized execution time (0.90 to 1.30). X-axis: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 462.libquantum, 464.h264ref, 473.astar, web search, bigtable, arithmetic mean. Series: 2ms/4s, 1ms/1s, 2ms/1s, 4ms/1s, 2ms/250ms]
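A back-of-the-envelope model helps relate the sampling configurations to overhead. It assumes, without confirmation from the slides, that a label such as 2ms/1s means a 2 ms instrumented burst in every 1 s period. The first-order model and the 10x instrumented slowdown are hypothetical illustrations, and the model ignores one-time costs such as building the code cache.

```python
# Sampling configurations from the chart legend, as (burst_ms, period_ms).
# The burst/period reading of each label is an assumption.
CONFIGS_MS = {
    "2ms/4s": (2, 4000),
    "1ms/1s": (1, 1000),
    "2ms/1s": (2, 1000),
    "4ms/1s": (4, 1000),
    "2ms/250ms": (2, 250),
}


def duty_cycle(burst_ms, period_ms):
    """Fraction of wall-clock time spent running instrumented code."""
    return burst_ms / period_ms


def modeled_normalized_time(duty, instrumented_slowdown):
    """First-order estimate: native speed except during instrumented bursts."""
    return 1.0 + duty * (instrumented_slowdown - 1.0)


duties = {label: duty_cycle(*cfg) for label, cfg in CONFIGS_MS.items()}
# Hypothetical 10x slowdown while instrumented, most aggressive setting.
estimate = modeled_normalized_time(duties["2ms/250ms"], instrumented_slowdown=10.0)
```

Under these assumptions the duty cycle ranges from 0.05% (2ms/4s) to 0.8% (2ms/250ms), and the most aggressive setting is modeled at about 1.07x. Measured values also include costs this model leaves out.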
14
S/W Code Cache Prepopulation
[Figure: X-axis: sampling phases (0 to 9). Y-axis: cumulative number of samples (0 to 3,500,000). Series: w/ pre-population, w/o pre-population]
15
Profiling Accuracy
[Figure: bar chart. Y-axis: profiling accuracy (0 to 100). X-axis: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 462.libquantum, 464.h264ref, 473.astar, web search, bigtable, arithmetic mean. Series: 2ms/4s, 1ms/1s, 2ms/1s, 4ms/1s, 2ms/250ms]
16
Asymptotic Accuracy
[Figure: X-axis: sampling phases (0 to 140). Y-axis: cumulative accuracy (0 to 100). Series: bigtable, web search]
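The slides do not define how accuracy is computed. As an illustration of why accuracy can rise as sampling phases accumulate, the sketch below assumes an overlap-percentage metric between the merged sampled edge profile and a full profile. The metric choice, the function names, and the edge counts are all assumptions made for this example.

```python
from collections import Counter


def overlap_accuracy(sampled, full):
    """Overlap percentage between two edge-count profiles (assumed metric)."""
    s_total = sum(sampled.values())
    f_total = sum(full.values())
    if s_total == 0 or f_total == 0:
        return 0.0
    edges = set(sampled) | set(full)
    # Compare each edge's share of its own profile and keep the smaller share.
    shared = sum(
        min(sampled.get(e, 0) / s_total, full.get(e, 0) / f_total)
        for e in edges
    )
    return 100.0 * shared


def cumulative_accuracy(phases, full):
    """Accuracy after merging the samples of each successive phase."""
    merged = Counter()
    history = []
    for phase in phases:
        merged.update(phase)  # adds this phase's counts to the running total
        history.append(overlap_accuracy(merged, full))
    return history


# Hypothetical full profile and three sampling phases.
full_profile = {"a": 50, "b": 30, "c": 20}
phases = [{"a": 1}, {"b": 1}, {"a": 4, "b": 2, "c": 2}]
history = cumulative_accuracy(phases, full_profile)
```

In this toy case the cumulative accuracy climbs from 50 to 80 to 100 as phases are merged, mirroring the rising shape of the curves.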
17
Conclusion
Low-overhead, portable, flexible profiling needed
Instant Profiling
• Combines sampling and DBI
• Pre-populates S/W code cache
• Tunable tradeoff between overhead and information
• Provides eventual profiling accuracy
Less than 5% overhead, more than 80% accuracy for naïve edge profiling client
18
Thank you!