Page 1: Microarchitectural Characterization of  Production JVMs  and Java Workload work in progress

Microarchitectural Characterization of Production JVMs and Java Workload
(work in progress)

Jungwoo Ha (UT Austin)
Magnus Gustafsson (Uppsala Univ.)
Stephen M. Blackburn (Australian Nat’l Univ.)
Kathryn S. McKinley (UT Austin)

2/22/08 2

Challenges of JVM Performance Analysis

Controlling nondeterminism
  Just-In-Time compilation driven by nondeterministic sampling
  Garbage collectors
  Other helper threads

Production JVMs are not created equal
  Thread model (kernel, user threads)
  Type of helper threads

Need a solid measurement methodology!
  Isolate each JVM part

Forest and Trees

What performance metrics explain performance differences and bottlenecks?
  Cache misses? L1 or L2?
  TLB misses?
  Number of instructions?

Inspecting one or two metrics is not always enough

Performance counters expose only a small number of counters at a time
  Multiple invocations are inevitable for the measurement
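Because only a few counters can be read at once, the full metric list has to be split across several measured runs. A minimal sketch of that split, assuming a plain list of metric names and a counter limit p; the function name and the metric names are illustrative and do not come from the slides:

```python
def batch_metrics(metrics, p):
    """Split metric names into groups of at most p, one group per measured run."""
    if p < 1:
        raise ValueError("need at least one hardware counter")
    return [metrics[i:i + p] for i in range(0, len(metrics), p)]


# Hypothetical metric names, used only for illustration.
metrics = ["cycles", "instructions", "l1i_miss", "l1d_miss", "l2_miss"]
for run, group in enumerate(batch_metrics(metrics, p=2), start=1):
    print(run, group)
```

Each group would be programmed into the counters for one run, so the workload has to behave the same way in every run for the groups to be comparable.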

Case Study: jython

  Figure: Application performance (cycles)
  Figure: L1 instruction cache misses per cycle
  Figure: L1 data cache misses per cycle
  Figure: Total instructions executed (retired)
  Figure: L2 data cache misses per cycle

Project Status

Established a methodology to characterize application code performance
  Large number of metrics (40+) measured from hardware performance counters
  Apples-to-apples comparison of JVMs using standard interfaces (JVMTI, JNI)

Simulator data for detailed analysis
  Limit studies
    What if the L1 cache had no misses?
  More performance metrics, e.g. uop mix

Performance Counter Methodology

Phases: Warmup JVM, Stop JIT, Full Heap GC, Measured Run (change metric)
Invoke JVM y times
Iterations: 1st – xth, (x+1)th, (x+2)th – (x+2+(n/p)k)th

Collecting n metrics
  x warmup iterations (x = 10)
  p performance counters (can measure at most p metrics per iteration)
  n/p iterations needed for measurement
  k redundant measurements for statistical validation (k = 1)

Need to hold workload constant for multiple measurements
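The iteration arithmetic on this slide can be sketched as follows. This is an illustration under stated assumptions: rounding n/p up when n is not a multiple of p is my choice (the slide writes plain n/p), the function name is hypothetical, and the sketch counts only warmup and measured iterations.

```python
import math


def iteration_plan(n, p, x=10, k=1):
    """Return warmup and measured iteration counts for n metrics and p counters.

    x and k default to the values given on the slide (x = 10, k = 1).
    Rounding n / p up is an assumption for the case where p does not divide n.
    """
    if n < 0 or p < 1 or k < 1:
        raise ValueError("expected n >= 0, p >= 1, k >= 1")
    measured = math.ceil(n / p) * k
    return {"warmup": x, "measured": measured}


# 40 metrics with 2 counters at a time, as in the setup described in this deck.
print(iteration_plan(n=40, p=2))  # {'warmup': 10, 'measured': 20}
```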

Performance Counter Methodology

Stop-the-world garbage collector
  No concurrent marking

One perfctr instance per pthread
  JVM internal threads are different pthreads from the application

JVMTI callbacks
  Thread start - start counter
  Thread finish - stop counter
  GC start - pause counter, only for user-level threads
  GC stop - resume counter, only for user-level threads
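The callback protocol above amounts to a small per-thread state machine. The sketch below models it in plain Python; it is not JVMTI or perfctr code, and `read_raw` is a hypothetical stand-in for reading a monotonically increasing hardware counter.

```python
class ThreadCounter:
    """Accumulate a raw counter only while the application thread is running."""

    def __init__(self, read_raw):
        self._read_raw = read_raw  # hypothetical: returns the current raw count
        self._running = False
        self._last = 0
        self.total = 0

    def start(self):
        """Thread start: begin counting from zero."""
        self.total = 0
        self._last = self._read_raw()
        self._running = True

    def pause(self):
        """GC start: bank the events seen so far and stop counting."""
        if self._running:
            self.total += self._read_raw() - self._last
            self._running = False

    def resume(self):
        """GC stop: continue counting from the current raw value."""
        if not self._running:
            self._last = self._read_raw()
            self._running = True

    def stop(self):
        """Thread finish: bank any remaining events and return the total."""
        self.pause()
        return self.total


# Simulated raw readings: 150 events before the GC, 50 after it.
readings = iter([100, 250, 400, 450])
counter = ThreadCounter(lambda: next(readings))
counter.start()
counter.pause()
counter.resume()
print(counter.stop())  # 200
```

The events between the second and third readings fall inside the simulated collection and are excluded from the total, which is the effect the pause and resume callbacks are meant to have.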

Methodology Limitations

Cannot factor out memory barrier overhead
  Use the garbage collector with the least application overhead

If a helper thread runs in the same pthread as the application (user-level thread), it will cause perturbation
  No evidence of this in J9, HotSpot, JRockit

Instrumented code overhead
  Must be included in the measurement

Experiment

Performance Counter Experiment
  Pentium-M uniprocessor
    32KB 8-way L1 cache (data & instruction)
    2MB 4-way L2 cache
    2 hardware counters (18 if multiplexed)
  1GB memory
  32-bit Linux 2.6.20 with perfctr patch
  PAPI 3.5.0 library

Simulator Experiment
  PTLsim (http://www.ptlsim.org) x86 simulator
  64-bit AMD Athlon

Experiment

3 production JVMs * 2 versions
  IBM J9, Sun HotSpot JVM, JRockit (perfctr only)
  1.5 and 1.6
  Heap size = max(16MB, 4 * minimum heap size)

18 benchmarks
  9 DaCapo benchmarks
  8 SPEC JVM 98
  1 PseudoJBB
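The heap-size rule above is simple enough to write down directly. A minimal sketch, assuming the minimum heap size is known per benchmark in MB; the function names are hypothetical, and pinning the heap with `-Xms`/`-Xmx` is my assumption, since the slides do not say how the heap size was set.

```python
import math


def heap_size_mb(min_heap_mb):
    """Heap size = max(16MB, 4 * minimum heap size), rounded up to whole MB."""
    return max(16, math.ceil(4 * min_heap_mb))


def heap_flags(min_heap_mb):
    """Assumed way to fix the heap: set initial and maximum size to the same value."""
    size = heap_size_mb(min_heap_mb)
    return [f"-Xms{size}m", f"-Xmx{size}m"]


print(heap_size_mb(3))   # 16
print(heap_flags(10))    # ['-Xms40m', '-Xmx40m']
```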

Experiment

40+ metrics
  40 distinct metrics from performance counters
    L1 or L2 cache misses (instruction, data, read, write)
    TLB-I misses
    Branch predictions
    Resource stalls
  Richer metrics from the simulator
    Micro-operation mix
    Load to store

Performance Counter Results (Cycle Counts)

  Figure panels: PseudoJBB, pmd, jython, jess
  Figure panels: jack, hsqldb, compress, db

Performance Counter Results

IBM J9 1.6 performed better than Sun HotSpot 1.6 on average

JRockit has the most variation in performance

Full results
  ~800 graphs
  Full jython results in the paper
  http://z.cs.utexas.edu/users/habals/jvmcmp or Google my name (Jungwoo Ha)

Future Work

JVM activity characterization
  Garbage collector
  JIT

Statistical analysis of performance metrics
  Metrics correlation
  Methodology to identify performance bottlenecks

Multicore performance analysis

Conclusions

Methodology for production JVM comparison

Performance evaluation data

Simulator results for deeper analysis

Thank you!

Simulation Result

  Figure: Perfect Cache - compress
  Figure: Perfect Cache - db