microarchitectural characterization of production jvms and java workload work in progress

Post on 09-Jan-2016

Microarchitectural Characterization of Production JVMs and Java Workload

(work in progress)

Jungwoo Ha (UT Austin)

Magnus Gustafsson (Uppsala Univ.)

Stephen M. Blackburn (Australian Nat’l Univ.)

Kathryn S. McKinley (UT Austin)

2/22/08 2

Challenges of JVM Performance Analysis

- Controlling nondeterminism
  - Just-In-Time compilation driven by nondeterministic sampling
  - Garbage collectors
  - Other helper threads
- Production JVMs are not created equal
  - Thread model (kernel, user threads)
  - Type of helper threads
- Need a solid measurement methodology!
  - Isolate each JVM part


Forest and Trees

- What performance metrics explain performance differences and bottlenecks?
  - Cache miss? L1 or L2?
  - TLB miss?
  - # of instructions?
- Inspecting one or two metrics is not always enough
- Performance counters give us only a small number of counters at a time
  - Multiple invocations are inevitable for the measurement


Case Study: jython

[Figures, one slide each]
- Application performance (cycles)
- L1 instruction cache misses per cycle
- L1 data cache misses per cycle
- Total instructions executed (retired)
- L2 data cache misses per cycle

Project Status

- Established methodology to characterize application code performance
  - Large number of metrics (40+) measured from hardware performance counters
  - Apples-to-apples comparison of JVMs using standard interfaces (JVMTI, JNI)
- Simulator data for detailed analysis
  - Limit studies
    - What if the L1 cache had no misses?
  - More performance metrics, e.g. uop mix


Performance Counter Methodology

- Phases of one JVM invocation
  - Warmup JVM: 1st – xth iteration
  - Stop JIT, Full Heap GC: (x+1)th iteration
  - Measured Run (change metric): (x+2)th – (x+2+(n/p)k)th iteration
- Invoke JVM y times
- Collecting n metrics
  - x warmup iterations (x = 10)
  - p performance counters (can measure at most p metrics per iteration)
  - n/p iterations needed for measurement
  - k redundant measurements for statistical validation (k = 1)
- Need to hold workload constant for multiple measurements
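The iteration arithmetic on this slide can be sketched in a few lines. This is an illustration only, not the authors' harness: the function name, the metric names, and the choice to run the k repeats of a metric group back to back are all assumptions. The sketch counts exactly ceil(n/p) * k measured iterations starting at iteration x+2, whereas the slide gives the upper bound as (x+2+(n/p)k).

```python
def measurement_plan(metrics, p, x=10, k=1):
    """Assign metrics to the measured iterations of one JVM invocation.

    Iterations 1..x are warmup and iteration x+1 stops the JIT and forces a
    full-heap GC, so the first measured iteration is x+2.
    """
    if p < 1 or k < 1:
        raise ValueError("p and k must be at least 1")
    # Split the n metrics into ceil(n/p) groups of at most p metrics each,
    # because only p counters can be read in a single iteration.
    groups = [tuple(metrics[i:i + p]) for i in range(0, len(metrics), p)]
    plan = []
    iteration = x + 2
    for group in groups:
        for _ in range(k):  # repeat each group k times for validation
            plan.append((iteration, group))
            iteration += 1
    return plan


if __name__ == "__main__":
    # Hypothetical metric names; the slides only say that 40+ metrics exist.
    names = [f"metric_{i}" for i in range(40)]
    for iteration, group in measurement_plan(names, p=2)[:3]:
        print(iteration, group)
```

With n = 40, p = 2, x = 10 and k = 1 the plan has 20 measured iterations, which is why the workload has to stay constant from one iteration to the next.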


Performance Counter Methodology

- Stop-the-world garbage collector
  - No concurrent marking
- One perfctr instance per pthread
  - JVM internal threads are different pthreads from the application
- JVMTI callbacks
  - Thread start: start counter
  - Thread finish: stop counter
  - GC start: pause counter, only for user-level threads
  - GC stop: resume counter, only for user-level threads
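The callback rules above amount to a small state machine per pthread. The toy model below replays a sequence of events against one counter to show which cycles end up attributed to the application. It is not JVMTI code: a real agent would be native code registering JVMTI event callbacks (JVMTI defines ThreadStart, ThreadEnd, GarbageCollectionStart and GarbageCollectionFinish events, but the slides do not say which ones the authors' agent used). The class name, event strings, and cycle values are hypothetical.

```python
class PthreadCounter:
    """Toy model of one per-pthread counter driven by the four callbacks."""

    def __init__(self, gc_shares_pthread):
        # True models user-level threads, where GC work runs on the
        # application's pthread and would otherwise be counted.
        self.gc_shares_pthread = gc_shares_pthread
        self.total = 0       # cycles attributed to the application
        self._since = None   # counter reading when counting last started

    def _start(self, now):
        self._since = now

    def _stop(self, now):
        if self._since is not None:
            self.total += now - self._since
            self._since = None

    def on_event(self, event, now):
        if event == "thread_start":
            self._start(now)
        elif event == "thread_end":
            self._stop(now)
        elif event == "gc_start":
            if self.gc_shares_pthread:
                self._stop(now)    # pause while the collector runs
        elif event == "gc_finish":
            if self.gc_shares_pthread:
                self._start(now)   # resume application counting
        else:
            raise ValueError(f"unknown event: {event}")


def replay(events, gc_shares_pthread):
    """Feed (event, counter_reading) pairs to a fresh counter."""
    counter = PthreadCounter(gc_shares_pthread)
    for event, now in events:
        counter.on_event(event, now)
    return counter.total


if __name__ == "__main__":
    # Hypothetical readings of this pthread's own cycle counter.
    events = [("thread_start", 0), ("gc_start", 100),
              ("gc_finish", 140), ("thread_end", 300)]
    print(replay(events, gc_shares_pthread=True))
    print(replay(events, gc_shares_pthread=False))
```

When the collector runs on its own pthread, the GC events are ignored because the per-pthread counter never sees the collector's work in the first place.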


Methodology Limitations

- Cannot factor out memory barrier overhead
  - Use the garbage collector with the least application overhead
- If a helper thread runs in the same pthread as the application (user-level thread), it will cause perturbation
  - No evidence of this in J9, HotSpot, or JRockit
- Instrumented code overhead
  - Must be included in the measurement


Experiment

- Performance counter experiment
  - Pentium-M uniprocessor
    - 32KB 8-way L1 cache (data & instruction)
    - 2MB 4-way L2 cache
    - 2 hardware counters (18 if multiplexed)
  - 1GB memory
  - 32-bit Linux 2.6.20 with perfctr patch
  - PAPI 3.5.0 library
- Simulator experiment
  - PTLsim (http://www.ptlsim.org) x86 simulator
  - 64-bit AMD Athlon


Experiment

- 3 production JVMs * 2 versions
  - IBM J9, Sun HotSpot JVM, JRockit (perfctr only)
  - 1.5 and 1.6
  - Heap size = max(16MB, 4 * minimum heap size)
- 18 benchmarks
  - 9 DaCapo benchmarks
  - 8 SPEC JVM 98
  - 1 PseudoJBB
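The heap-size rule on this slide is a one-line formula. The sketch below computes it and, as an assumption, turns it into `-Xms`/`-Xmx` options so that the heap is fixed at that size; the slides do not say which flags were actually passed to each JVM, and the function names and example values are hypothetical.

```python
def heap_size_mb(min_heap_mb):
    """Heap size = max(16MB, 4 * minimum heap size), in megabytes."""
    return max(16, 4 * min_heap_mb)


def heap_flags(min_heap_mb):
    """Build JVM options that pin the heap at the computed size.

    Setting both -Xms and -Xmx is an assumption made for this sketch.
    """
    size = heap_size_mb(min_heap_mb)
    return [f"-Xms{size}m", f"-Xmx{size}m"]


if __name__ == "__main__":
    # Hypothetical minimum heap sizes, in MB, for two benchmarks.
    for min_heap in (3, 10):
        print(min_heap, heap_flags(min_heap))
```

A benchmark whose minimum heap is under 4MB gets the 16MB floor; anything larger gets four times its minimum.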


Experiment

- 40+ metrics
  - 40 distinct metrics from performance counters
    - L1 or L2 cache misses (instruction, data, read, write)
    - TLB-I miss
    - Branch predictions
    - Resource stalls
  - Richer metrics from the simulator
    - Micro-operation mix
    - Load to store


Performance Counter Results (Cycle Counts)

[Figure panels: PseudoJBB, pmd, jython, jess, jack, hsqldb, compress, db]

Performance Counter Results

- IBM J9 1.6 performed better than Sun HotSpot 1.6 on average
- JRockit has the most variation in performance
- Full results
  - ~800 graphs
  - Full jython results in the paper
  - http://z.cs.utexas.edu/users/habals/jvmcmp or Google my name (Jungwoo Ha)


Future Work

- JVM activity characterization
  - Garbage collector
  - JIT
- Statistical analysis of performance metrics
  - Metrics correlation
  - Methodology to identify performance bottlenecks
- Multicore performance analysis


Conclusions

- Methodology for production JVM comparison
- Performance evaluation data
- Simulator results for deeper analysis

Thank you!


Simulation Result


Perfect Cache - compress


Perfect Cache - db
