*Other names and brands may be claimed as the property of others.
SPEC* Java™ Platform Benchmarks and Their Role in the Java Technology Ecosystem
Anil Kumar, Intel Corporation, Performance Analyst
David Dagastine, Sun Microsystems, Java Performance Lead

Agenda
> What are SPEC Java Benchmarks?
  > Timeline
  > Introduction
  > Basic characteristics
> Java Ecosystem Roles
  > H/W evaluation and design
  > S/W performance and development
  > JVM evaluation and optimization
> Summary
> Resources
Note: This is NOT an official SPEC presentation. All information is based on public data and references.

SPEC Java™ Benchmark Timeline
> Standard Performance Evaluation Corporation
  > Industry collaboration of member companies http://www.spec.org/spec/membership.html
> SPEC Java Benchmarks with timeline

Old / Retired | Currently Active | Future
SPECjvm98* | SPECjvm2008* |
SPECjbb2000* | SPECjbb2005* | ???
SPECjAppServer2001* / 2002 | SPECjAppServer2004* | ???
| SPECjms2007* |

* SPEC, SPECjvm, SPECjbb, SPECjAppServer and SPECjms are registered trademarks of Standard Performance Evaluation Corporation

Need for Multiple Benchmarks
> Can a single benchmark represent all Java application areas? No
  > Too many application areas muddle the result
  > A good benchmark should fairly represent its area(s)
> SPEC Java Benchmarks covered here
  > SPECjbb2005
  > SPECjvm2008
  > SPECjms2007
  > SPECjAppServer2004
> Primary goal: characterize benchmarks to
  > Help correlate them with real-world applications
  > Identify pros and cons

Methodology
> Hardware setup
  > Intel® Xeon® X5570 (Core™ i7) based server
> Characterize performance using
  > OS Perfmon counters
    > CPU utilization, context switches, interrupts, I/O, etc.
  > H/W counters
    > Cache misses, branches, locks, etc.
    > Use Intel® VTune™ Performance Analyzer
  > JVM profiling
    > GC, object and heap characteristics, Java locks, code and data footprint, etc.
    > Use Oracle® JRockit Mission Control
  > Configuration details, number of instances, etc.

SPECjbb2005
> Emulates a 3-tier warehouse system
  > Clients: simulated by driver threads
  > Middle-tier: business logic and object manipulation
  > Database: stored in binary trees
> Exercises the JVM, compiler, GC, threads, locks, etc.
> Measures performance of CPUs, cache/memory hierarchy and both horizontal and vertical scalability

SPECjbb2005 Benchmark metric
> Final metrics “SPECjbb2005 bops” and “SPECjbb2005 bops/JVM”
  > Arithmetic mean of the throughput measured at each warehouse count from the “Expected Peak Warehouse” count N up to 2N
> Multiple instances
  > “SPECjbb2005 bops” totaled across instances
> Ramp-up warehouses
  > Rough indicator of thread scaling
> Steady state warehouses
  > Rough indicator of degradation under load
[Figure: throughput versus warehouse count, total from 2 JVM instances **, with the ramp-up and steady-state regions marked]
** http://www.spec.org/osg/jbb2005/results/res2009q2/jbb2005-20090330-00702.html
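
The metric definition on this slide can be sketched in a few lines. This is a minimal illustration, not SPEC's reporting code: the function names and the sample throughput numbers are hypothetical, and the sketch assumes every warehouse count from N to 2N was measured. It also assumes that bops/JVM is the total divided by the number of JVM instances.

```python
def instance_bops(throughput_by_warehouses, expected_peak):
    """Arithmetic mean of throughput over warehouse counts N..2N (inclusive)."""
    points = range(expected_peak, 2 * expected_peak + 1)
    missing = [w for w in points if w not in throughput_by_warehouses]
    if missing:
        # Assumption of this sketch: reject incomplete runs instead of guessing.
        raise ValueError(f"no measurement for warehouse counts {missing}")
    return sum(throughput_by_warehouses[w] for w in points) / len(points)


def multi_instance_metrics(per_instance_bops):
    """Return (total bops, bops/JVM) for a run with several JVM instances."""
    total = sum(per_instance_bops)
    return total, total / len(per_instance_bops)


# Hypothetical single-instance run with expected peak N = 4.
# Warehouses 1..3 are ramp-up points and do not enter the metric.
run = {1: 30_000, 2: 58_000, 3: 84_000, 4: 100_000,
       5: 110_000, 6: 120_000, 7: 110_000, 8: 100_000}

score = instance_bops(run, expected_peak=4)
total, per_jvm = multi_instance_metrics([score, score])
print(score, total, per_jvm)
```

Because only the points from N to 2N are averaged, a configuration that peaks early and degrades under load scores lower than one that holds its throughput through 2N.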

SPECjbb2005 JVM profiling
> Six different transaction types
> Allocation intensive
  > Large dynamic footprint: ~25 MB / thread
  > High object allocation: ~5-15 KB / bop
  > Increases as warehouses are added
  > Minimal sharing among threads
  > Small code footprint: ~2 MB / JVM instance
> Many locks, not highly contended: 50-500 locks / bop
> GC intensive with ~1 sec interval between GCs
  > ~2-5% CPU cycles spent in GC
> CPU and cache/memory intensive
  > Often saturates CPU

SPECjbb2005 Hardware event profiling
> Events normalized per SPECjbb2005 bop
  > ~65,000 IA x86 machine instructions
  > ~20% of instructions are branches
    > ~1.5% of all branches are mispredicted
  > Barely any floating point operations (Don’t care)
  > ~0 ITLB misses with large pages for code (Don’t care)
  > ~125 DTLB misses with large pages for data
  > ~17k loads and ~7k stores: 70/30 read/write split
  > ~0.007 MPI (misses/inst.) for 8MB L3 Nehalem-EP
> High cache/memory bandwidth utilization
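
The per-bop figures on this slide imply a few derived counts. The sketch below works them out from the approximate numbers quoted above. The function and key names are hypothetical, and the results inherit the "~" uncertainty of the inputs.

```python
def per_bop_events(instructions, branch_fraction, mispredict_rate,
                   loads, stores, l3_mpi):
    """Derive per-operation event counts from the slide's approximate figures."""
    branches = instructions * branch_fraction
    return {
        "branches": branches,
        "mispredicted_branches": branches * mispredict_rate,
        "read_share": loads / (loads + stores),
        "l3_misses": instructions * l3_mpi,
    }


# Inputs copied from the slide (all approximate).
jbb = per_bop_events(instructions=65_000, branch_fraction=0.20,
                     mispredict_rate=0.015, loads=17_000, stores=7_000,
                     l3_mpi=0.007)

for name, value in jbb.items():
    print(f"{name}: {value:,.3f}")
```

With these inputs the read share is 17/24, or about 71%, which matches the slide's 70/30 split. The L3 miss rate works out to roughly 455 misses per bop.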

SPECjbb2005 CPU, Cache/Memory and scaling characteristics
> CPU utilization
  > ~1-5% in OS kernel
  > ~5-10% in GC + memory management libs
  > ~85-94% in Java Transactions
  > Top ~20 methods consume ~80% of all CPU cycles
> Cache/Memory capacity, bandwidth and latency
  > 8-16 GB sufficient for 2 socket systems
  > Very sensitive to cache size and latencies
  > ~3 MB/thread is optimal cache size
> Scaling
  > Depends on whether a single instance (vertical) or multiple instances (horizontal) are used

SPECjbb2005 Summary
> Pros
  > Easy to set up and run
  > Intensively exercises CPU and memory
  > Lots of published data to compare h/w and s/w
> Cons
  > Long single instance runtime for large h/w thread count
  > Minimal disk I/O, no network I/O and no think times
  > High h/w thread count required to scale memory capacity beyond 16 GB
> Flexibility
  > Many different use scenarios can be created with simple changes to SPECjbb.props file
    > Runtime, cache/memory pressure, # of threads, etc.

SPECjvm2008
> Replaces decade old, client only SPECjvm98
  > Multi-threaded
  > Broad collection of real-world Java applications
  > Covers both client and server apps on single platform
> Stresses many aspects of JVM and h/w functionality
  > JIT, GC, threads, locks, etc.
  > CPU, cache/memory, integer and floating-point units
> Min. memory for SUT: 512 MB / hardware thread
> Only single JVM instance allowed
> Base run (simulates out-of-box) required for submission, Peak run optional
> Freely downloadable

SPECjvm2008 Groups and sub-groups
> 11 groups with sub-groups
  > Individual scores for each group and sub-groups
  > Overall score computed using nested geo-mean (ops/minute)
> Each workload has unique profile
  > Excellent for specific JVM and h/w area debug and evaluation

\[ \text{Score} = \sqrt[k]{\sqrt[n_1]{X_{11} \cdots X_{1 n_1}} \;\cdots\; \sqrt[n_k]{X_{k1} \cdots X_{k n_k}}} \]

where k is the number of groups, n_i is the number of sub-benchmarks in group i, and X_{ij} is the score of sub-benchmark j in group i.

SPECjvm2008 groups
1) Startup: 17 sub-tests
2) Compiler: compiler.compiler, compiler.sunflow
3) Compress
4) Crypto: crypto.aes, crypto.rsa, crypto.signverify
5) Derby
6) Mpegaudio
7) Scimark.X.large: 5 sub-tests
8) Scimark.X.small: 5 sub-tests
9) Serial
10) Sunflow
11) XML: xml.transform, xml.validation

> Characteristics too diverse to cover in a short time http://www.informatik.uni-trier.de/~ley/db/conf/sipew/sbw2009.html

SPECjvm2008 Benchmark metric
> Sun® Blade X6270
> Intel® Xeon® X5570 (Nehalem-EP)
> Sun® HotSpot(TM) 64-Bit Server VM

Group | ops/m
compiler | 821
compress | 437
crypto | 666
derby | 645
mpegaudio | 259
scimark.large | 74
scimark.small | 423
serial | 377
startup | 45
sunflow | 165
xml | 933
Base ops/m | 317

Sub-group | ops/m
crypto.signverify | 1106
crypto.rsa | 906
crypto.aes | 295
compiler.sunflow | 514
compiler.compiler | 1313
xml.validation | 1166
xml.transform | 748

Figure callouts: “Cache size impact”, “17 sub groups”

Scimark | fft | lu | sor | sparse | monte_carlo
large | 72 | 18 | 98 | 51 |
small | 388 | 546 | 747 | 256 | 333
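
The nested geo-mean can be checked against the numbers on this slide. The sketch below is illustrative only, not the SPECjvm2008 harness. The helper names are hypothetical, and the scores are copied from the slide, where they appear already rounded.

```python
import math


def geomean(values):
    """Geometric mean, computed via logarithms for numerical stability."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one positive value")
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


def nested_geomean(groups):
    """Geo-mean of per-group geo-means; groups maps name -> sub-benchmark scores."""
    return geomean(geomean(scores) for scores in groups.values())


# Inner level: sub-group scores quoted on the slide (ops/m).
crypto = geomean([1106, 906, 295])
compiler = geomean([514, 1313])
xml = geomean([1166, 748])

# Outer level: the eleven group scores quoted on the slide (ops/m).
GROUP_SCORES = {
    "compiler": 821, "compress": 437, "crypto": 666, "derby": 645,
    "mpegaudio": 259, "scimark.large": 74, "scimark.small": 423,
    "serial": 377, "startup": 45, "sunflow": 165, "xml": 933,
}
base = geomean(GROUP_SCORES.values())

print(f"crypto={crypto:.1f} compiler={compiler:.1f} xml={xml:.1f} base={base:.1f}")
```

The recomputed group scores agree with the slide to within one ops/m, and the outer geo-mean rounds to the published Base score of 317. Because each group enters the product once, a group with 17 sub-tests (startup) carries the same weight as a group with a single workload (compress).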

SPECjvm2008: Summary
> Pros
  > Easy to run, quick feedback, configurable
  > Generic optimizations more effective for Geo-mean
  > Exercises wide range of JVM and H/W functionality
    > Sub-groups can exercise specific sub-components
  > Real-world relevance
  > Single instance challenges scaling
> Cons
  > Run length a bit long
  > Minimal disk I/O, no network I/O, no think times
  > Limited scaling for memory capacity (>24GB RAM)
  > Sub-group focus can be too narrow
> Flexibility
  > Plug-in analysis framework for heap, power, etc.

SPECjms2007
> First industry-standard multi-tier benchmark
> Focus on message-oriented middleware (MOM) servers based on JMS (Java Message Service)
> Exercises JMS server software, JVM, database for message persistence, network and h/w
> Models the supply chain of a supermarket company
> Scale along 2 dimensions
  ● Horizontal
  ● Vertical

SPECjms2007 Benchmark Metric
> Two metrics cover horizontal and vertical topology
> SPECjms2007@Horizontal
  > Destinations (queues and topics) are increased
  > While keeping the traffic per Destination constant
> SPECjms2007@Vertical
  > Traffic (message count) to a Destination is increased
  > While keeping the number of Destinations fixed
For characterization data, please refer to http://www.dvs.tu-darmstadt.de/publications/pdf/WorkloadCharSPECjms2007.pdf

SPECjms2007 Summary
> Pros
  > Real world 3-tier setup
  > Excellent evaluation tool for JVM and application server
> Cons
  > Very complex setup
  > Minimal characterization data
> Flexibility
  > Freeform topology gives user complete control over benchmark configuration

SPECjAppServer2004
> Multi-tier benchmark exercising many Java 2 Enterprise Edition (J2EE) technologies
  > Web and EJB containers, JMS, transaction mgmt, Message Driven Beans, database connectivity, etc.
> Emulates an automobile company and dealerships
  > Dealers interact using web browsers
  > Manufacturing process is accomplished via RMI
> Exercises application s/w, JVM, DB and network
> Metric "SPECjAppServer2004 jops" denotes jAppServer Operations Per Second
[Figure: Driver(s) + Application server(s) + Database server(s)]

SPECjAppServer2004 JVM profiling
> Code footprint 12 MB / JVM instance
> ~5000 methods compiled during a benchmark run
> Method level CPU utilization profile extremely flat
> ~70% User and ~20% Kernel CPU utilization
> Moderate object allocation rate
  – GC interval ~10 sec
> Reasonable data sharing among threads
  – Many Java locks including contended ones
> Requires a finely tuned DB server
> Multiple-instance performance exceeds single-instance performance

SPECjAppServer2004 H/W profiling
> Events per SPECjAppServer2004 jop
  > ~6 million IA x86 asm instructions
  > ~20% of instructions are branches
    > ~7% of all branches are mispredicted
  > Minimal floating point operations (Don’t care)
  > ~30% of instructions are loads and ~17% are stores
    > ~65/35 read/write split
  > ~5k ITLB misses with large pages for code
  > ~35k DTLB misses with large pages for data
  > ~0.005 MPI (misses/inst.) for 8MB L3 Nehalem-EP
> Moderate cache/memory bandwidth utilization
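
Because a jop is several million instructions long, the TLB figures are easier to read as rates per thousand instructions. The sketch below is a rough calculation from the slide's approximate numbers. The helper name `per_kilo_instruction` and the variable names are hypothetical.

```python
def per_kilo_instruction(event_count, instructions):
    """Events per 1000 instructions."""
    return 1000.0 * event_count / instructions


INSTRUCTIONS_PER_JOP = 6_000_000  # ~6 million, from the slide

# Loads and stores as fractions of all instructions (from the slide).
loads = 0.30 * INSTRUCTIONS_PER_JOP
stores = 0.17 * INSTRUCTIONS_PER_JOP
read_share = loads / (loads + stores)

itlb_pki = per_kilo_instruction(5_000, INSTRUCTIONS_PER_JOP)
dtlb_pki = per_kilo_instruction(35_000, INSTRUCTIONS_PER_JOP)
l3_misses_per_jop = 0.005 * INSTRUCTIONS_PER_JOP
mispredicted_per_jop = INSTRUCTIONS_PER_JOP * 0.20 * 0.07

print(f"read share {read_share:.3f}, ITLB/1k {itlb_pki:.2f}, "
      f"DTLB/1k {dtlb_pki:.2f}, L3 misses/jop {l3_misses_per_jop:,.0f}, "
      f"mispredicted branches/jop {mispredicted_per_jop:,.0f}")
```

The load and store percentages give a read share of 30/47, about 64%, in line with the slide's ~65/35 split.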

SPECjAppServer2004 Summary
> Pros
  > Real-world system characteristics
  > Well known to fairly exercise J2EE technologies
  > Uses a response time methodology
> Cons
  > Very complex to set up, run and optimize
  > Wide variation in response time at high utilization
> Flexibility
  > EAStress benchmark enabled by a special research run mode, for use in research and development

High level characteristics: Bird’s Eye View
> Reference platform is 2 chip Intel Xeon X5570

Basic Characteristic | SPECjbb2005 | SPECjvm2008 | SPECjms2007 | SPECjAppServer2004
System Tiers | Stand-alone | Stand-alone | 3-Tier | 3-Tier
Disk I/O | Minimal | Minimal | Reasonable | Reasonable
Network I/O | None | None | Reasonable | Reasonable
Memory Capacity | Medium (~16GB) | Low-Medium (~8-24GB) | High (~24GB) | High (>24GB)
Memory Bandwidth | High | Low-High | Medium | Medium
# of Instances | Multiple | Single | Multiple | Multiple
Ease of Use | Easy | Medium | Very Complex | Very Complex

SPEC Java Benchmarks: Java Ecosystem Roles
> H/W evaluation and design
> S/W performance analysis and development
> JVM performance optimization
> Will my app run faster?
> JVM optimization with focus on multi-core
  > Session TS7499 Wednesday 4:10PM

Java Ecosystem Role: Hardware evaluation and design
> SPEC Java benchmarks are heavily used to
  > Assess competitive performance across
    > Machine architecture generations
    > SKU differentiation
    > Different machine architectures
  > Measure H/W feature value add: HT, Turbo, etc.
  > Measure H/W scaling: CPU, memory system, etc.
  > Optimize platform configuration: BIOS, memory, etc.
  > Evaluate future processor and platform design using benchmark instruction traces

Java Ecosystem Role: Hardware evaluation
> Hardware and software change assessment
[Figure: bar chart of SPECjbb2005 k bops (y-axis 0 to 600) for six platforms: 3.80GHz Irwindale (2MB L2), 3.00GHz Woodcrest (4MB L2), 3.00GHz Clovertown (2x4MB L2), 3.16GHz Harpertown (2x6MB L2), 3.33GHz Harpertown (2x6MB L2), 2.93GHz Nehalem-EP (8MB L3). Bar labels as extracted: 138, 302, 51, 252, 368, 556. Step annotations: Core™2 changes 2.7x; core scaling DC to QC 1.8x; cache scaling 4 to 6MB 1.2x; JVM optimizations 1.2x; Nehalem-EP changes 1.5x. JVM version tags on the bars: P27 on four bars, P28 on two.]
Data using Oracle® JRockit 64-bit JVM P27 or P28 versions
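
The step annotations in this chart can be checked against the bar labels. The labels did not survive extraction in platform order, so the sketch below assumes the bars rise monotonically from Irwindale to Nehalem-EP and sorts the values. That ordering is an inference, not something the transcript states.

```python
# Bar labels (k bops) and step annotations as they appear in the transcript.
bar_labels = [138, 302, 51, 252, 368, 556]
annotated_steps = [2.7, 1.8, 1.2, 1.2, 1.5]

# Assumption: each newer platform scores higher than the previous one.
ordered = sorted(bar_labels)

# Generation-over-generation speedup, rounded like the chart annotations.
steps = [round(new / old, 1) for old, new in zip(ordered, ordered[1:])]
overall = ordered[-1] / ordered[0]

print("steps:", steps)
print(f"overall: {overall:.1f}x")
```

Under that assumption the successive ratios reproduce all five annotated factors, and the first-to-last gain is roughly 10.9x.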

Java Ecosystem Role: S/W performance and development
> Application Server performance
  > SPECjAppServer2004 very helpful in identifying improvement opportunities
  > Application S/W stack comparison
> Power and performance
  > SPECpower_ssj2008, first industry standard benchmark to measure performance and power
    > Based on SPECjbb2005
    > Plays significant role in measuring power efficiency
> Virtualization environment evaluation
  > SPECjbb2005 and SPECjAppServer2004 play a very constructive role

Java Ecosystem Role: Hardware evaluation
> Hardware and software change assessment
[Figure: bar chart of SPECjAppServer2004 JOPS (y-axis 0 to 4500). Intel® Xeon® X5460 (3.16GHz, 2x6MB L3) with Oracle® Application Server 10G 10.1.3.3: 2056. Intel® Xeon® X5570 (2.93GHz, 8MB L3, with HT and Turbo) with Oracle® WebLogic Server Standard Edition Release 10.3: 3975. Annotation: Nehalem-EP, App Server, JVM changes: 1.93x.]
*SPECjAppServer2004 based publications at SPEC:
http://www.spec.org/jAppServer2004/results/res2007q4/jAppServer2004-20071023-00088.html
http://www.spec.org/osg/jAppServer2004/results/res2009q1/jAppServer2004-20090310-00128.html

Java Ecosystem Role: Software evaluation
[Figure: bar chart of SPECjAppServer2004 JOPS (y-axis 0 to 4500). Sun GlassFish Enterprise Server v2.1 (OpenSolaris, Sun® HotSpot, MySQL 5.1.30): 2925. Oracle WebLogic Server Standard Edition Release 10.3 (Oracle® Linux, Oracle® JRockit, Oracle® DB): 3975.]
Single node two processor server using Intel® Xeon® X5570 2.93GHz with Intel® Turbo Boost Technology up to 3.20 GHz
*SPECjAppServer2004 based publications at SPEC:
http://www.spec.org/jAppServer2004/results/res2009q2/jAppServer2004-20090324-00129.html
http://www.spec.org/osg/jAppServer2004/results/res2009q1/jAppServer2004-20090310-00128.html

Java Ecosystem Role: JVM performance optimization
> SPEC benchmarks provide a platform to highlight JVM competitive performance
> SPEC publication makes open source software credible
> SPEC benchmarks are a record of many years of fiery competition among JVM vendors
> Active competition has led to many innovative performance optimizations
> SPEC run rules ensure optimizations are widely applicable and acceptable

SPECjbb2005 Performance: Sun CMT Systems, 1-4 chips
[Figure: SPECjbb2005 results over time (x-axis 12/01/05 to 10/01/08, y-axis 0 to 700000). Data point labels: US-T1 1.2GHz, JDK 5_06; US-T1 1.2GHz, JDK 5_08; US-T1 1.4GHz, JDK 6_02; US-T2 1.4GHz, JDK 6_04-P; 2 x US-T2 Plus 1.4GHz, JDK 6_06-P; 4 x US-T2 Plus 1.4GHz, JDK 6_06-P.]

SPECjbb2005 Performance: Sun Intel Systems, 1-4 chips
[Figure: SPECjbb2005 results over time (x-axis 02/01/08 to 04/01/09, y-axis 200000 to 600000). Data point labels: 2 x Intel X5460, JDK 6_05-P; 4 x Intel X7350, JDK 6_06-P; 4 x Intel X7460, JDK 6_06-P; 2 x Intel X5570, JDK 6_14-P.]

Java Ecosystem Role: JVM Optimization
> JDK 5 Update 6
  – Biased Locking
    • Improves uncontended synchronization
    • An object is "biased" toward the thread which first acquires its monitor
  – Initial BigDecimal optimization
    • Internally represent BigDecimal as a long where possible
  – Platform optimized arraycopy
    • Hand optimized arraycopy
> JDK 5 Update 8
  – Biased Locking improvements
    • Much faster bias revocation
  – Parallel Old Generation GC
    • Parallel collector for full GCs

Java Ecosystem Role: JVM Optimization
> JDK 6 Update 2
  – Biased Locking on by default in Java 6
  – Vectorization (superword)
    • Load, operate on and store multiple array elements at once with single machine instructions
  – Depth first object promotion order
    • Promote objects from young generation to old generation
    • Depth-first closer to object allocation order than breadth-first
> JDK 6 Update 4 Performance Release
  – HashMap hashing algorithm
  – Autobox elision
    • Optimize autoboxing to reduce cache misses
  – BigDecimal optimization
    • Additional changes to reduce cache misses
  – Object zeroing elision
    • Elide object zeroing when fields guaranteed to be initialized

Java Ecosystem Role: JVM Optimization
> JDK 6 Update 5 Performance Release
  – TreeMap: point optimizations
  – HashMap: Integer front-cache
  – Escape analysis: Scalar replacement and lock elimination
> JDK 6 Update 6 Performance Release
  – Variation on Harmony “Wide Node” TreeMap
  – Compressed object references
    • 32-bit reference in a 64-bit JVM
  – HashMap: point optimization
  – StringCache: cache commonly used strings

Java Ecosystem Role: JVM Optimization
> JDK 6 Update 14 Performance
  – Zero-based compressed object references
    • 64-bit now equals or exceeds 32-bit performance
    • Will be default for servers
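
The arithmetic behind compressed object references can be modeled in a few lines. This is an illustrative model, not HotSpot code. It assumes 8-byte object alignment, so a 32-bit reference is stored as the heap offset shifted right by 3. In the zero-based variant the heap base is 0, so decoding needs only the shift. The function names and example addresses are hypothetical.

```python
OBJECT_ALIGNMENT = 8   # assumed object alignment in bytes
SHIFT = 3              # log2(OBJECT_ALIGNMENT)
MAX_HEAP_BYTES = 2**32 * OBJECT_ALIGNMENT  # reach of a shifted 32-bit reference


def encode(address, heap_base=0):
    """Compress a 64-bit address into a 32-bit reference (model only)."""
    offset = address - heap_base
    if offset < 0 or offset % OBJECT_ALIGNMENT:
        raise ValueError("address must be an aligned location inside the heap")
    narrow = offset >> SHIFT
    if narrow >= 2**32:
        raise ValueError("address is out of reach for a 32-bit reference")
    return narrow


def decode(narrow, heap_base=0):
    """Expand a 32-bit reference; with heap_base == 0 this is just a shift."""
    return heap_base + (narrow << SHIFT)


# Zero-based: an address well above 4 GiB still fits in 32 bits.
address = 0x7_0000_0008
narrow = encode(address)
print(hex(narrow), hex(decode(narrow)), MAX_HEAP_BYTES // 2**30, "GiB")

# Non-zero base: decoding costs an extra add.
print(encode(0x1000_0010, heap_base=0x1000_0000))
```

Under this model a 32-bit reference reaches 32 GiB of heap, and the zero-based form saves an addition on every decode.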

Java Ecosystem Role: Will my application run faster?
> JDK 6
  – Includes all performance optimizations in JDK 5 updates
> JDK 6 Update 2
  – Biased Locking plus all prior performance optimizations
> JDK 6 Update 14
  – All prior optimizations default on, except HashMap and TreeMap
> Java For Business
  – High Performance JDK 5 and J2SE 1.4.2 including the latest HotSpot JVM

SPECjvm2008 Base Performance: Sun Intel Systems, 1-4 chips
[Figure: SPECjvm2008 Base results over time (x-axis 05/01/08 to 04/01/09, y-axis 0 to 350). Data point labels: 4 x Intel X7350, JDK 6_06-P; 4 x Intel X7460, JDK 6_06-P; 2 x Intel X5570, JDK 6_14-P.]
SPECjvm2008 Base == Out of box performance. No JVM performance tuning.

SPECjbb2005 Performance: IBM Power6 Systems, 1-4 chips
[Figure: SPECjbb2005 results over time (x-axis 08/01/06 to 06/01/08, y-axis 0 to 250000).]

Java Ecosystem Role: Will my application run faster?
• Java optimizations identified through SPEC Java benchmark analysis yield real improvements running your applications
• Experimental optimizations eventually become default behavior
• All production JVMs have shown ongoing performance improvement
• Active competition on SPEC benchmarks results in innovative performance optimization for the Java Platform

Summary
> Are SPEC Java benchmarks perfect?
> No, but highly relevant because
  > Created by consensus among members playing lead roles in Java ecosystem
  > Code available as well as characterization data
  > Many publications and strict fair comparison run rules
  > Very credible as competitive reference
  > Updated benchmarks to reflect latest trends
  > Lead to many optimizations benefiting wide application segments
Excellent evaluation tool and more, with a proven record of benefiting the whole Java ecosystem

Resources
> www.spec.org
> SPECjbb2005:
  http://www.spec.org/jbb2005/
  http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1526002
> SPECjvm2008:
  http://www.spec.org/jvm2008/
  http://www.informatik.uni-trier.de/~ley/db/conf/sipew/sbw2009.html
> SPECjms2007:
  http://www.spec.org/jms2007/
  http://www.dvs.tu-darmstadt.de/publications/pdf/WorkloadCharSPECjms2007.pdf
> SPECjAppServer2004:
  http://www.spec.org/jAppServer2004/
  http://www.ece.wisc.edu/~pharm/papers/caecw8.pdf
  http://portal.acm.org/citation.cfm?id=1140329

Anil Kumar
David Dagastine