JVM Performance Tuning
A JVM performance tuning guide. It describes the JVM memory structure, explains the GC mechanism, and gives guidance for tuning the JVM.
JavaStudy Network, Daehyub Cho
JVM [Java Virtual Machine]
Performance Tuning
AGENDA
1. Basic concept of JVM tuning
2. Hotspot compiler
3. Threading model
4. Memory model
Basic Concept of JVM Tuning
Basics of performance tuning
1. Decide what performance level is “good enough”
2. Test and measure
  • Scenario based
  • Stress tool (LoadRunner)
  • Profiling tool (JProbe, etc.)
3. Profile the application to find bottlenecks
4. Tune
  • Application *
  • Middleware [WAS]
  • OS
  • JVM
5. Return to step 2 [feedback]
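Steps 1 and 2 can be sketched in code. The following is a minimal illustration, not a replacement for a real stress or profiling tool: the class `LatencyCheck`, its method names, the warm-up count, and the 1 ms target are all assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical helper for steps 1 and 2: measure, then compare with a target.
class LatencyCheck {
    // Runs the task `warmup` times unmeasured, then returns `runs` sorted timings (ns).
    static long[] measure(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples;
    }

    // Nearest-rank percentile of sorted samples; p is a whole percentage from 1 to 100.
    static long percentile(long[] sorted, int p) {
        int rank = (p * sorted.length + 99) / 100; // integer form of ceil(p * n / 100)
        return sorted[Math.max(rank, 1) - 1];
    }

    // "Good enough" here means the chosen percentile is at or below the target.
    static boolean goodEnough(long[] sorted, int p, long targetNanos) {
        return percentile(sorted, p) <= targetNanos;
    }

    public static void main(String[] args) {
        long[] samples = measure(() -> Math.sqrt(12345.678), 10_000, 1_000);
        System.out.println("p95 (ns): " + percentile(samples, 95));
        System.out.println("meets 1 ms target: " + goodEnough(samples, 95, 1_000_000L));
    }
}
```

Rerunning the same check after each tuning change gives the feedback loop described in step 5.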
JVM Tuning
• Improves performance by about 10~20%
• Find the appropriate parameters for your application
  – Hotspot compile options
  – Thread model options *
  – GC and memory related options **
• Changing parameters is a very dangerous action
  – Needs more testing and feedback
  – Ref. spec.org
Hotspot Compiler
JVM Layout
• Hotspot from JDK 1.3
[Diagram: the JVM consists of the Hotspot compiler (Client Compiler, Server Compiler) and the VM (Runtime, GC, Interpreter, Threading & Locking, ...)]
Hotspot compiler
• JIT (Just-In-Time Compiler)
  – Compiles byte code to native code
  – Compiles according to fixed optimization rules (not “thinking”)
  – At execution/installation
• Hotspot
  – Compiles byte code to native code
  – “Thinks” to find where optimization can take place
  – Adaptive optimization at runtime
Hotspot Detection
• Hotspot detection
• Method inlining
• Dynamic deoptimization
Hotspot Detection and Method Inlining
• Literal constants are folded
    int foo = 9 * 10;   →   int foo = 90;
• String concatenation is sometimes folded
    String foo = "Hello " + (9 * 10);   →   String foo = "Hello 90";
• Constant fields are inlined
    public class A { public static final int VALUE = 99; }
    public class B { static int VALUE2 = A.VALUE; }
  After class B is compiled:
    public class B { static int VALUE2 = 99; }
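The folding of constant expressions can be observed from plain Java. The sketch below relies on the language rule that compile-time constant strings are interned, so a folded expression is the same object as the equivalent literal; this folding is already visible in the compiled bytecode. The class and method names are made up for this illustration.

```java
// Illustrative only: class and method names are made up for this sketch.
class ConstantFoldingDemo {
    static final int VALUE = 99; // compile-time constant, like A.VALUE on the slide

    // True when the compiler folded the expression into a single literal.
    static boolean constantConcatenationIsFolded() {
        String folded = "Hello " + (9 * 10); // constant expression
        // Constant strings are interned, so a folded expression is the very
        // same object as the literal "Hello 90".
        return folded == "Hello 90";
    }

    // With a non-constant operand the string is built at run time instead,
    // which yields a new, distinct object.
    static boolean runtimeConcatenationIsFolded(int n) {
        String built = "Hello " + n;
        return built == "Hello 90";
    }

    public static void main(String[] args) {
        System.out.println(constantConcatenationIsFolded());
        System.out.println(runtimeConcatenationIsFolded(90));
    }
}
```

Comparing strings with `==` is done here only to expose object identity; ordinary code should use `equals`.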
Hotspot Detection / Method Inlining
• Dead code branches are eliminated
    public class A {
        static final boolean DEBUG = false;
        public void methodA() {
            if (DEBUG) System.out.println("DEBUG MODE");
            System.out.println("Say Hello");
        } // methodA
    } // class A
    ↓
    public class A {
        static final boolean DEBUG = false;
        public void methodA() {
            System.out.println("Say Hello");
        } // methodA
    } // class A
Hotspot Client compiler
• Java option: -client
• Focused on simple and fast start-up
• 3-phase compiler
  – HIR (High-Level Intermediate Representation)
  – LIR (Low-Level Intermediate Representation)
  – Machine code
• It focuses on local code quality and does very few global optimizations, since those are often the most expensive in terms of compile time
• It can inline any function that has no exception handlers or synchronization, and it also supports deoptimization for debugging and inlining
Hotspot Server compiler
• Java option: -server
• Focused on optimization
• SSA (Static Single Assignment)-based IR
Hotspot compiler Option
• Hotspot compile options
  – -XX:MaxInlineSize=<size>
    • Integer specifying the maximum number of bytecode instructions in a method that gets inlined.
  – -XX:FreqInlineSize=<size>
    • Integer specifying the maximum number of bytecode instructions in a frequently executed method that gets inlined.
  – -Xint
    • Interpreter only (no JIT compilation)
  – -XX:+PrintCompilation
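To see these options in action, a small program with one hot method can be run with different flags. The sketch below is an assumption-laden illustration: `HotMethodDemo` and its methods are invented names, the repetition count is arbitrary, and `java.lang.management` requires Java 5 or later, which is newer than the JDK versions these slides cover.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative only: run with -Xint or -XX:+PrintCompilation and compare.
class HotMethodDemo {
    // A small method: a typical inlining candidate once it becomes hot.
    static int square(int x) {
        return x * x;
    }

    // Calls square() in a loop; repeated calls make both methods hot.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Flags the VM was actually started with.
        List<String> flags = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM flags: " + flags);

        long start = System.nanoTime();
        long result = 0;
        for (int i = 0; i < 20_000; i++) {
            result = sumOfSquares(1_000);
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("result=" + result + ", elapsed ms=" + elapsedMillis);
    }
}
```

Running it once with `-Xint` and once without should show the cost of interpreter-only execution; with `-XX:+PrintCompilation` the VM logs methods as it compiles them.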
Threading Model
Threading Model
• Thread model
  – Java is a multi-threaded programming language
  – Native thread model from JDK 1.2
• Thread mapping (M:N and 1:1)
• Thread synchronization
[Diagram: Java application, Java threads, JVM, and operating system; labels: thread handling, thread scheduling, lock management (synchronization)]
Solaris M:N Thread Model
[Diagram, M:N mapping: Java application, Java threads, JVM, Solaris threads, LWPs, and kernel threads, layered from the JVM through the Solaris OS down to the OS kernel]
Solaris M:N Thread Model
• Solaris M:N thread model
  – Thread based synchronization
  – LWP based synchronization

          Thread based sync            LWP based sync
JDK 1.2   N/A                          Default
JDK 1.3   Default                      -XX:+UseLWPSynchronization
JDK 1.4   -XX:-UseLWPSynchronization   Default
Solaris 1:1 Thread Model
[Diagram, 1:1 mapping: Java application, Java threads, JVM, Solaris threads, LWPs, and kernel threads, layered from the JVM through the Solaris OS down to the OS kernel]
Solaris 1:1 Thread Model
• Solaris 1:1 thread model
  – Bound thread
  – Alternate libthread

          Bound thread            Alternate libthread *
JDK 1.2   N/A                     export LD_LIBRARY_PATH=/usr/lib/lwp
JDK 1.3   -XX:+UseBoundThreads    export LD_LIBRARY_PATH=/usr/lib/lwp
JDK 1.4   -XX:+UseBoundThreads    export LD_LIBRARY_PATH=/usr/lib/lwp
※ In Solaris 9 the alternate libthread is the default; do not add /usr/lib/lwp to LD_LIBRARY_PATH.
JVM Performance Test on Solaris
Architecture  CPUs  Threads   Model                 % diff in throughput (against Standard model)
Sparc         30    400/2000  Standard              ---
Sparc         30    400/2000  LWP Synchronization   215%/800%
Sparc         30    400/2000  Bound Threads         -10%/-80%
Sparc         30    400/2000  Alternate One-to-one  275%/900%
Sparc         4     400/2000  Standard              ---
Sparc         4     400/2000  LWP Synchronization   30%/60%
Sparc         4     400/2000  Bound Threads         -5%/-45%
Sparc         4     400/2000  Alternate One-to-one  30%/50%
Sparc         2     400/2000  Standard              ---
Sparc         2     400/2000  LWP Synchronization   0%/25%
Sparc         2     400/2000  Bound Threads         -30%/-40%
Sparc         2     400/2000  Alternate One-to-one  -10%/0%
Intel         4     400/2000  Standard              ---
Intel         4     400/2000  LWP Synchronization   25%/60%
Intel         4     400/2000  Bound Threads         0%/-10%
Intel         4     400/2000  Alternate One-to-one  20%/60%
Intel         2     400/2000  Standard              ---
Intel         2     400/2000  LWP Synchronization   15%/45%
Intel         2     400/2000  Bound Threads         -10%/-15%
Intel         2     400/2000  Alternate One-to-one  15%/35%
Test environment: Solaris 8 with JVM 1.3. The results are graphed on the next slide.
JVM Performance Test on Solaris
• Performance test result graph
[Graph: throughput results of the test above]
Memory Model
Memory Tuning
• Garbage Collection
• JVM Memory Layout
• Garbage Collection Model
• Server VM and Client VM
• Garbage Collection Measurement & Analysis
• Tuning Garbage Collection
Generational Garbage Collection
JVM Memory Layout
• New/Young – recently created objects
• Old – long-lived objects
• Perm – JVM classes and methods
[Diagram: memory layout with Eden, SS1, and SS2 (New/Young), Old, and Perm; labels: used in application, JVM, total heap size]
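The layout can be inspected on a running JVM. The sketch below lists the memory pools that the VM reports; it is an illustration under assumptions: `MemoryPoolReport` is an invented name, the API requires Java 5 or later, and the pool names depend on the JVM version and collector (for example, the Perm area was replaced by Metaspace in Java 8).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: prints one row per memory pool (eden, survivor, old, ...).
class MemoryPoolReport {
    static List<String> report() {
        List<String> rows = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null) {
                continue;
            }
            long max = usage.getMax(); // -1 when no maximum is defined
            rows.add(pool.getName() + " [" + pool.getType() + "] used="
                    + usage.getUsed() / 1024 + "K max="
                    + (max < 0 ? "undefined" : (max / 1024) + "K"));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : report()) {
            System.out.println(row);
        }
    }
}
```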
Garbage Collection
• Garbage collection
  – Collects unused Java objects
  – Cleans memory
  – Minor GC
    • Collects memory in the New/Young generation
  – Major GC (Full GC)
    • Collects memory in the Old generation
Minor GC
• Minor collection
  – New/Young generation
  – Copy and scavenge
  – Very fast
[Diagrams: three successive minor GCs across Eden, SS1, SS2, and Old. 1st minor GC: live objects are copied to the survivor area. 3rd minor GC: objects are moved to the old space when they become tenured. Legend: new object, garbage, live object.]
Major GC
• Major collection
  – Old generation
  – Mark and compact
  – Slow
    • 1st pass – goes through the entire heap, marking the reachable (live) objects
    • 2nd pass – the live objects are compacted, which reclaims the space held by unreachable objects
Major GC
[Diagram: Eden, SS1, SS2, and Old shown in three stages: before collection, after the mark pass, and after the compact pass]
Server option versus Client option
• -XX:NewRatio=2 (1.3), -Xmn128m (1.4), -XX:NewSize=<size> -XX:MaxNewSize=<size>
GC Tuning Parameter
• Memory tuning parameters
  – Perm size: -XX:MaxPermSize=64m
  – Total heap size: -ms512m -mx512m
  – New size
    • -XX:NewRatio=2 (old/new size)
    • -XX:NewSize=128m
    • -Xmn128m (JDK 1.4)
  – Survivor size: -XX:SurvivorRatio=64 (eden/survivor)
  – Heap ratio
    • -XX:MaxHeapFreeRatio=70
    • -XX:MinHeapFreeRatio=40
  – Survivor ratio
    • -XX:TargetSurvivorRatio=50
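The ratio options translate into sizes by simple arithmetic. The sketch below works this out for the values on the slide; it assumes the usual reading that NewRatio is old/new and SurvivorRatio is eden divided by one of the two survivor spaces. `GenerationSizing` is an invented name, and a real JVM rounds and aligns these sizes, so the results are approximations.

```java
// Illustrative arithmetic only; a real JVM aligns and adjusts these sizes.
class GenerationSizing {
    // -XX:NewRatio=N means old : young = N : 1, so young = heap / (N + 1).
    static long youngSize(long heap, int newRatio) {
        return heap / (newRatio + 1);
    }

    static long oldSize(long heap, int newRatio) {
        return heap - youngSize(heap, newRatio);
    }

    // -XX:SurvivorRatio=R means eden : one survivor space = R : 1.
    // With two survivor spaces, each one is young / (R + 2).
    static long survivorSize(long young, int survivorRatio) {
        return young / (survivorRatio + 2);
    }

    static long edenSize(long young, int survivorRatio) {
        return young - 2 * survivorSize(young, survivorRatio);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long heap = 512 * mb;                 // -ms512m -mx512m
        long young = youngSize(heap, 2);      // -XX:NewRatio=2
        System.out.println("young    ~ " + young / mb + " MB");
        System.out.println("old      ~ " + oldSize(heap, 2) / mb + " MB");
        System.out.println("survivor ~ " + survivorSize(young, 64) / 1024 + " KB each"); // -XX:SurvivorRatio=64
        System.out.println("eden     ~ " + edenSize(young, 64) / mb + " MB");
    }
}
```

With a 512 MB heap and NewRatio=2, roughly one third of the heap goes to the young generation, and a SurvivorRatio of 64 leaves only a small fraction of that for each survivor space.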
Support for -XX Option
• Options that begin with -X are nonstandard (not guaranteed to be supported on all VM implementations), and are subject to change without notice in subsequent releases of the Java 2 SDK.
• Because the -XX options have specific system requirements for correct operation and may require privileged access to system configuration parameters, they are not recommended for casual use. These options are also subject to change without notice.
Garbage Collection Model
• New types of GC
  – Default collector
  – Parallel GC for the young generation (JDK 1.4)
  – Concurrent GC for the old generation (JDK 1.4)
  – Incremental low pause collector (Train GC)
Parallel GC
• Parallel GC
  – Improves GC performance
  – For the young generation (minor GC)
  – Requires more than 4 CPUs and 256 MB of physical memory
[Diagram: application threads and GC over time for the young generation, default GC vs. parallel GC]
Parallel GC
• Two parallel collectors
  – Low-pause: -XX:+UseParNewGC
    • For near real-time or pause-dependent applications
    • Works with
      – Mark and compact collector
      – Concurrent old area collector
  – Throughput: -XX:+UseParallelGC
    • For enterprise or throughput-oriented applications
    • Works only with the mark and compact collector
Parallel GC
• Throughput collector
  – -XX:+UseParallelGC
  – -XX:ParallelGCThreads=<desired number>
  – -XX:+UseAdaptiveSizePolicy
    • Adaptive resizing of the young generation
Parallel GC
• Throughput collector
  – AggressiveHeap
    • Enabled by -XX:+AggressiveHeap
    • Inspects machine resources and attempts to set various parameters to be optimal for long-running, memory-intensive jobs
      – Useful on machines with more than 4 CPUs and more than 256 MB of memory
      – Useful for server applications
      – Do not use with -ms and -mx
    • Example (HP Itanium, JDK 1.4.2): java -XX:+ServerApp -XX:+AggressiveHeap -Xmn3400m spec.jbb.JBBmain -propfile Test1
Concurrent GC
• Concurrent GC
  – Reduces the pause time needed to collect the old generation
  – For the old generation (full GC)
  – Enabled by -XX:+UseConcMarkSweepGC
[Diagram: application threads and GC over time for the old generation, default GC vs. concurrent GC]
Incremental GC
• Incremental GC
  – Enabled by -Xincgc (from JDK 1.3)
  – Collects the old generation whenever the young generation is collected
  – Reduces the pause time for collecting the old generation
  – Disadvantages
    • Young generation GC occurs more frequently
    • More resources are needed
    • Do not use with -XX:+UseParallelGC or -XX:+UseParNewGC
Incremental GC
• Incremental GC
[Diagram: default GC vs. incremental GC for the young and old generations. Default GC: a full GC occurs after many minor GCs. Incremental GC: the old generation is collected during minor GCs.]
Incremental GC
• Incremental GC
  -client -XX:+PrintGCDetails -Xincgc -ms32m -mx32m

[GC [DefNew: 540K->35K(576K), 0.0053557 secs][Train: 3495K->3493K(32128K), 0.0043531 secs] 4036K->3529K(32704K), 0.0099856 secs]
[GC [DefNew: 547K->64K(576K), 0.0048216 secs][Train: 3529K->3540K(32128K), 0.0058683 secs] 4041K->3604K(32704K), 0.0109779 secs]
[GC [DefNew: 575K->64K(576K), 0.0164904 secs] 4116K->3670K(32704K), 0.0169019 secs]
[GC [DefNew: 576K->64K(576K), 0.0057541 secs][Train: 3671K->3651K(32128K), 0.0051286 secs] 4182K->3715K(32704K), 0.0113042 secs]
[GC [DefNew: 575K->56K(576K), 0.0114559 secs] 4227K->3745K(32704K), 0.0191390 secs]
[Full GC [Train MSC: 3689K->3280K(32128K), 0.0909523 secs] 4038K->3378K(32704K), 0.0910213 secs]
[GC [DefNew: 502K->64K(576K), 0.0173220 secs][Train: 3329K->3329K(32128K), 0.0066279 secs] 3782K->3393K(32704K), 0.0325125 secs

Labels: DefNew is the young generation GC; Train is the old generation GC performed within a minor GC; the last value on each [GC line is the minor GC time; the [Full GC line is a full GC.
Sun JVM 1.4.1 on Windows
Mark-compact             Better throughput
Incremental GC (Train)   Better pause
Parallel GC              Best throughput
Concurrent GC            Best pause
Garbage Collection Measurement
• -verbosegc (all platforms)
• -XX:+PrintGCDetails (JDK 1.4)
• -Xverbosegc (HP)
Garbage Collection Measurement
• -verbosegc

[GC 40549K->20909K(64768K), 0.0484179 secs]
[GC 41197K->21405K(64768K), 0.0411095 secs]
[GC 41693K->22995K(64768K), 0.0846190 secs]
[GC 43283K->23672K(64768K), 0.0492838 secs]
[Full GC 43960K->1749K(64768K), 0.1452965 secs]
[GC 22037K->2810K(64768K), 0.0310949 secs]
[GC 23098K->3657K(64768K), 0.0469624 secs]
[GC 23945K->4847K(64768K), 0.0580108 secs]

Line format: [GC <heap size before GC>-><heap size after GC>(<total heap size>), <GC time> secs]. A line that starts with [Full GC is a full GC.
GC Log analysis using AWK script
• Awk script (gc.awk)

    BEGIN {
        printf("Minor\tMajor\tAlive\tFreed\n");
    }
    {
        # Minor GC line: "[GC <before>K-><after>K(<total>K), <time> secs]"
        if (substr($0, 1, 4) == "[GC ") {
            split($0, array, " ");
            printf("%s\t0.0\t", array[3])
            split(array[2], barray, "K")
            before = barray[1]
            after = substr(barray[2], 3)
            reclaim = before - after
            printf("%s\t%s\n", after, reclaim)
        }
        # Full GC line: one extra leading field ("[Full")
        if (substr($0, 1, 9) == "[Full GC ") {
            split($0, array, " ");
            printf("0.0\t%s\t", array[4])
            split(array[3], barray, "K")
            before = barray[1]
            after = substr(barray[2], 3)
            reclaim = before - after
            printf("%s\t%s\n", after, reclaim)
        }
        next;
    }

※ Usage
    % awk -f gc.awk gc.log

Output for the -verbosegc log shown earlier (gc.log):

    Minor      Major      Alive  Freed
    0.0484179  0.0        20909  19640
    0.0411095  0.0        21405  19792
    0.0846190  0.0        22995  18698
    0.0492838  0.0        23672  19611
    0.0        0.1452965  1749   42211
    0.0310949  0.0        2810   19227
    0.0469624  0.0        3657   19441
    0.0580108  0.0        4847   19098
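For readers who prefer to stay in Java, the same extraction can be written with a regular expression. This is a sketch that mirrors the columns of the awk script (minor time, major time, alive, freed); the class name `GcLogLine` and its methods are invented for this example, and it assumes the simple -verbosegc line format shown above, not the -XX:+PrintGCDetails format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: parses lines such as
// "[GC 40549K->20909K(64768K), 0.0484179 secs]" and the "[Full GC ..." variant.
class GcLogLine {
    private static final Pattern LINE = Pattern.compile(
            "\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    final boolean full;    // true for a full (major) GC
    final long beforeK;    // heap size before GC, in KB
    final long afterK;     // heap size after GC, in KB
    final long totalK;     // total heap size, in KB
    final String seconds;  // GC time exactly as logged

    private GcLogLine(boolean full, long beforeK, long afterK, long totalK, String seconds) {
        this.full = full;
        this.beforeK = beforeK;
        this.afterK = afterK;
        this.totalK = totalK;
        this.seconds = seconds;
    }

    // Returns null when the line is not a simple -verbosegc entry.
    static GcLogLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new GcLogLine(m.group(1).equals("Full GC"),
                Long.parseLong(m.group(2)), Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)), m.group(5));
    }

    long freedK() {
        return beforeK - afterK;
    }

    // Same tab-separated columns as the awk script: Minor, Major, Alive, Freed.
    String toRow() {
        String times = full ? "0.0\t" + seconds : seconds + "\t0.0";
        return times + "\t" + afterK + "\t" + freedK();
    }

    public static void main(String[] args) {
        System.out.println("Minor\tMajor\tAlive\tFreed");
        System.out.println(parse("[GC 40549K->20909K(64768K), 0.0484179 secs]").toRow());
        System.out.println(parse("[Full GC 43960K->1749K(64768K), 0.1452965 secs]").toRow());
    }
}
```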
GC Log analysis using AWK script
< GC Time >
GC Log analysis using HPJtune
※ http://www.hp.com/products1/unix/java/java2/hpjtune/index.html
GC Log analysis using AWK script
< GC Amount >
Garbage Collection Tuning
• GC tuning
  – Find the most important factor
    • Low pause or high performance?
    • Select the appropriate GC model (a new model carries risk!)
  – Select “server” or “client”
  – Find the appropriate heap size by reviewing the GC log
  – Find the ratio of young to old generation
Garbage Collection Tuning
• GC tuning
  – Full GC is the most important factor in GC tuning
    • How frequent? How long?
    • Short and frequent → decrease old space
    • Long and occasional → increase old space
    • Short and occasional → decrease throughput by load balancing
  – Fix the heap size
    • Set “ms” and “mx” to the same value
    • Removes shrinking and growing overhead
  – Don’t
    • Don’t make the heap bigger than physical memory (swap)
    • Don’t make the new generation bigger than half the heap
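The "how frequent, how long" questions can also be answered from inside a running application. The sketch below reads the cumulative collection count and time per collector; it is an illustration only: `GcStats` is an invented name, the API requires Java 5 or later, and the collector names depend on the JVM and the selected GC model.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: one row per collector with cumulative count and time.
class GcStats {
    static List<String> snapshot() {
        List<String> rows = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();  // -1 if the collector does not report it
            long millis = gc.getCollectionTime();  // -1 if the collector does not report it
            rows.add(gc.getName() + ": collections=" + count + ", totalMillis=" + millis);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : snapshot()) {
            System.out.println(row);
        }
    }
}
```

Taking two snapshots some minutes apart and subtracting them gives the GC frequency and average duration over that interval.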
Jmeter / Threads Histogram
Jmeter /Threads Group Histogram
Example
Example
2004-01-08 7:14 PM
2004-01-09 around 8 AM
2004-01-09 around 7 PM
Friday business hours
2004-01-10 around 10 AM
2004-01-10 around 6 PM
Peak time: 52000~56000 sec (about 1 hour from 9 o'clock)
Before tuning: Old area
Example
At peak time, old generation GC takes 4~8 sec, which can cause the application to hang.
Before tuning: GC time
Example
[Graph: GC time after AP tuning. Time axis from the 12th 03:38 AM to the 17th 10:58 PM; marked periods: weekend, Mon office hours, Tue office hours, Thu office hours, Fri office hours.]
Summary
JVM Tuning Summary
• Determine the JVM performance goal
• Gather statistics on your application
• Select the hotspot compiler
• Tune the heap
• Check the threading model
• Feedback
More Tips
Thread dump
• Thread dump
  – Enabled by
    • Unix: “kill -3 [JAVA PID]”
    • Windows: “Ctrl+Break”
  – Snapshot of a Java application
  – Can be used to profile “hang-up” and “slow-down” conditions
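A similar snapshot can be produced from inside the application, which is useful where sending a signal is not practical. This is a rough sketch using `Thread.getAllStackTraces()` (Java 5 or later); `ThreadDumpSketch` is an invented name, and the output only approximates the format of a real JVM thread dump (no tid, nid, or lock details).

```java
import java.util.Map;

// Illustrative only: prints name, priority, state, and stack of every live thread.
class ThreadDumpSketch {
    static String dump() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append('"')
               .append(thread.isDaemon() ? " daemon" : "")
               .append(" prio=").append(thread.getPriority())
               .append(" state=").append(thread.getState())
               .append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```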
Thread dump example
• Thread dump during a slowdown in WAS

"ExecuteThread: '232' for queue: 'default'" daemon prio=5 tid=0x573ca630 nid=0xd2c waiting for monitor entry [0x5cebf000..0x5cebfdb8]
    at java.util.Hashtable.get(Hashtable.java:314)
    at java.util.ListResourceBundle.handleGetObject(ListResourceBundle.java:122)
    at java.util.ResourceBundle.getObject(ResourceBundle.java:371)
    at java.util.ResourceBundle.getObject(ResourceBundle.java:374)
    at java.text.DateFormatSymbols.initializeData(DateFormatSymbols.java:483)
    at java.text.DateFormatSymbols.<init>(DateFormatSymbols.java:99)
    at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:275)
    at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:264)
    at XXX.uv.com.cm.CmDateTimeUtil.getCurrentTime(CmDateTimeUtil.java:88)
    at XXX.uv.com.util.CmLog.setFileLog(CmLog.java:171)
    at XXX.uv.com.jsp.EjbJspBase.service(EjbJspBase.java:371)
    at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:265)
    at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:200)
    at weblogic.servlet.internal.WebAppServletContext.invokeServlet(WebAppServletContext.java:2546)
    at weblogic.servlet.internal.ServletRequestImpl.execute(ServletRequestImpl.java:2260)
    at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:139)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:120)

"ExecuteThread: '231' for queue: 'default'" daemon prio=5 tid=0x573f9a60 nid=0x13a8 waiting for monitor entry [0x5ce7f000..0x5ce7fdb8]
    at java.util.Hashtable.get(Hashtable.java:314)
    at java.text.DecimalFormatSymbols.initialize(DecimalFormatSymbols.java:333)
    at java.text.DecimalFormatSymbols.<init>(DecimalFormatSymbols.java:55)
    at java.text.NumberFormat.getInstance(NumberFormat.java:565)
    at java.text.NumberFormat.getInstance(NumberFormat.java:324)
    at java.text.SimpleDateFormat.initialize(SimpleDateFormat.java:327)
    at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:276)
    at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:264)
    at XXX.uv.com.cm.CmDateTimeUtil.getCurrentTime(CmDateTimeUtil.java:88)
    at XXX.uv.com.cm.CmDateTimeUtil.getCurrentTime(CmDateTimeUtil.java:67)
    at XXX.uv.com.datastu.DateTime.setCurrentTime(DateTime.java:190)
    at XXX.uv.com.jsp.EjbJspBase.service(EjbJspBase.java:239)
    at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:265)
    at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:200)
    at weblogic.servlet.internal.WebAppServletContext.invokeServlet(WebAppServletContext.java:2546)
    at weblogic.servlet.internal.ServletRequestImpl.execute(ServletRequestImpl.java:2260)
    at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:139)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:120)
• Profiling CPU usage on HP-UX
  – HP-UX: Glance + thread dump
[Screenshot: HP Glance; press “G” for thread monitoring]
• Profiling CPU usage on HP-UX

"Application Manager Thread" prio=8 tid=0x002a6c00 nid=62 lwp_id=15999 waiting on monitor [0x64bce000..0x64bce4b8]
    at java.lang.Thread.sleep(Native Method)
    at weblogic.management.mbeans.custom.ApplicationManager$ApplicationPoller.run(ApplicationManager.java:1137)

CPU load of thread 15999 is 17.7%
Thread 15999 is working on weblogic.management.mbeans.custom.ApplicationManager (ApplicationManager.java:1137)
[Screenshots: Glance thread monitoring and the Java thread dump]
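Matching a thread reported by an OS tool with a thread dump entry comes down to extracting the id from the dump's header line. The sketch below does this for the two header styles shown in these slides: it prefers `lwp_id` when present (as in the HP-UX dump) and otherwise falls back to `nid`, converting hexadecimal values to decimal. `ThreadIdLookup` is an invented name, and whether `nid` corresponds to the id your OS tool displays depends on the platform and JVM, so treat that fallback as an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: extracts an OS-level thread id from a thread dump header line.
class ThreadIdLookup {
    private static final Pattern LWP = Pattern.compile("lwp_id=(\\d+)");
    private static final Pattern NID = Pattern.compile("nid=(0x[0-9a-fA-F]+|\\d+)");

    // Returns the id as a decimal number, or -1 if the line carries neither field.
    static long osThreadId(String headerLine) {
        Matcher lwp = LWP.matcher(headerLine);
        if (lwp.find()) {
            return Long.parseLong(lwp.group(1));
        }
        Matcher nid = NID.matcher(headerLine);
        if (nid.find()) {
            String value = nid.group(1);
            return value.startsWith("0x")
                    ? Long.parseLong(value.substring(2), 16)  // hexadecimal nid
                    : Long.parseLong(value);                  // decimal nid
        }
        return -1;
    }

    public static void main(String[] args) {
        String header = "\"Application Manager Thread\" prio=8 tid=0x002a6c00 nid=62 lwp_id=15999 waiting on monitor";
        System.out.println(osThreadId(header));
    }
}
```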
• Other tools
  – Profile with Java options
  – Analyze using HP Jmeter
  – JProbe
  – Stress test
    • LoadRunner
    • MS Stress (free)
• Related URLs
  – Java Thread http://java.sun.com/docs/hotspot/threads/threads.htm
  – Java Performance http://java.sun.com/docs/hotspot/PerformanceFAQ.html
  – Java Thread http://www.javaworld.com/javaworld/jw-09-1998/jw-09-threads.html
  – Pick up performance with generational GC http://www.javaworld.com/javaworld/jw-01-2002/jw-0111-hotspotgc.html
  – JVM 1.4 GC Tuning http://java.sun.com/docs/hotspot/gc1.4.2/index.html
  – HP Jmeter, Jtune, Jconfig http://www.hp.com/products1/unix/java/developers/index.html
  – SPECjvm98
  – SPECjAppServer2001/2002
Thank you