SEM-DTJ-380-LA Java™ SE Performance Tuning Revision A

Post on 28-Dec-2021

Page 1: Java™ SE Performance Tuning

SEM-DTJ-380-LA

Java™ SE Performance Tuning Revision A

Page 2: Java™ SE Performance Tuning

Copyright 2008 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved. This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Sun, Sun Microsystems, the Sun logo, the Duke logo, Java, JavaServer Pages, JSP, Java HotSpot, Java VisualVM, Sun Fire T1000, Sun Fire T2000, UltraSparc, NetBeans, NetBeans Profiler, Sun Studio, and JavaScript are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. The OPEN LOOK and Sun Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs and otherwise comply with Sun's written license agreements. Federal Acquisitions: Commercial Software Government Users Subject to Standard License Terms and Conditions Export Laws. Products, Services, and technical data delivered by Sun may be subject to U.S. export controls or the trade laws of other countries. You will comply with all such laws and obtain all licenses to export, re-export, or import as may be required after delivery to You. You will not export or re-export to entities on the most current U.S. export exclusions lists or to any country subject to U.S. embargo or terrorist controls as specified in the U.S. export laws. 
You will not use or provide Products, Services, or technical data for nuclear, missile, or chemical biological weaponry end uses. DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS, AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. Export Control Classification Number (ECCN) assigned:EAR99

Page 4: Java™ SE Performance Tuning

The Java™ SE Performance Tuning Copyright 2008 Sun Microsystems, Inc. All Rights Reserved. Revision A 4

Course Objectives

• Incorporate monitoring, profiling and tuning into the application development life cycle

• Monitor the Operating System (OS) layer: Central Processing Unit (CPU), network, disk I/O, virtual memory, processes and locks

• Monitor the Java Virtual Machine (JVM) and application layers

• Profile the OS, JVM and application layers

• Tune garbage collection (GC)

Page 5: Java™ SE Performance Tuning

Course Objectives

• Examine and manage the Just-in-Time (JIT) compiler

• Examine JVM ergonomics

• Examine 64-bit JVMs

• Tune the JVM for multi-core platforms

Page 6: Java™ SE Performance Tuning

Course Overview: What to Expect

• Master Java™ SE performance monitoring by learning:

> What to monitor and where to monitor it

> What to profile and what tools work the best for different use cases

> Commonly observed patterns indicating performance issues

> How Java™ HotSpot garbage collectors work and how to tune them

> What you need to know about the JIT compiler

> What JVM ergonomics is and how it works

> What you need to know about 64-bit JVMs

> How to tune the JVM for specific hardware platforms

Page 7: Java™ SE Performance Tuning

Course Overview: What to Expect

• In short, learn the basics of the JVM internals and learn enough about Java™ SE performance to know where to start and what to look for, to enable you to identify and resolve most of the JVM and Java™ performance issues you observe.

Page 8: Java™ SE Performance Tuning

Course Overview: The Nature of Performance Tuning

• Performance tuning is largely an art.

• There is no one approach that is always necessarily the right approach.

• Some performance issues require very specialized expertise to identify the root cause or to recommend a solution.

Page 9: Java™ SE Performance Tuning

Course Module Outlines

• Module 1: Examining Performance Tuning

> Distinguishing between monitoring, profiling and tuning

> Incorporating monitoring, profiling and tuning into the application development life cycle

• Module 2: Monitoring the OS Layer

> Monitoring CPU utilization

> Monitoring network performance

> Monitoring disk input/output (I/O)

> Monitoring memory utilization

> Monitoring process behavior

Page 10: Java™ SE Performance Tuning

Course Module Outlines

• Module 3: Monitoring the JVM and Application Layers

> Examining generational collector architectures

> Monitoring GC

> Monitoring the JVM

> Monitoring the application

• Module 4: Profiling the OS, JVM and Application Layers

> Examining profiling tools

> Profiling CPU usage

> Profiling the heap and memory usage

> Detecting lock contention

Page 11: Java™ SE Performance Tuning

Course Module Outlines

• Module 5: Tuning GC

> Tuning collector generation sizes

> Selecting the collector that best fits application characteristics and requirements

> Examining practices that negatively impact GC performance

• Module 6: Examining and Managing the JIT compiler

> Examining choices of JIT compilers

> Tuning the JIT compiler

> Creating micro benchmarks

Page 12: Java™ SE Performance Tuning

Course Module Outlines

• Module 7: Examining Ergonomics

> Examining JVM ergonomics behavior

• Module 8: Using 64-bit JVMs

> Examining the issues associated with using 64-bit JVMs

> Identifying application characteristics that suit 64-bit JVMs

> Tuning 64-bit JVMs for different application requirements

• Module 9: Optimize the JVM for Multi-core platforms

> Examining JVM features that can leverage multi-core architectures

> Optimize the JVM for various multi-core architectures

> Tuning the JVM for the Sun Fire™ T1000/T2000 platform

Page 13: Java™ SE Performance Tuning

Module 1: Examining Performance Tuning

Page 14: Java™ SE Performance Tuning

Objectives

• Distinguish between monitoring, profiling and tuning

• Incorporate monitoring, profiling and tuning into the application development life cycle

Page 15: Java™ SE Performance Tuning

Definitions

• Performance monitoring

• Performance profiling

• Performance tuning

Page 16: Java™ SE Performance Tuning

Definitions: Performance Monitoring

• An act of non-intrusively collecting or observing performance data from an operating or running application.

• In most cases, a “preventative” or “proactive” type of action. However, it can be an initial step in a reactive action.

• Can be performed in production, or qualification, or development environments.

• Helps identify or isolate potential issues without having a severe impact on runtime responsiveness or throughput.

Page 17: Java™ SE Performance Tuning

Definitions: Performance Monitoring

• Monitoring often crosses over into troubleshooting or serviceability.

• The training focuses on performance monitoring rather than on the aspects that lend themselves to troubleshooting or debugging.

• The training concentrates on techniques and tools related to throughput and responsiveness issues.

Page 18: Java™ SE Performance Tuning

Definitions: Performance Profiling

• An act of collecting or observing performance data from an operating or running application.

• Usually more intrusive than monitoring.

• Usually a narrower focus than monitoring.

• In general, a reactive type of activity. It can be a proactive activity where performance is a well-defined systemic quality or requirement for the target application.

• Seldom performed in production environments.

• Commonly done in qualification, testing or development environments.

Page 19: Java™ SE Performance Tuning

Definitions: Performance Tuning

• An act of changing tunables, source code, and/or configuration attributes in order to improve application responsiveness and/or throughput.

• Usually results from monitoring and/or profiling activities.

Page 20: Java™ SE Performance Tuning

Typical Development Process

[Flowchart nodes: Start, Analysis, Design, Code, Test, Quality OK? (Yes / No)]

Page 21: Java™ SE Performance Tuning

Application Performance Process

[Flowchart nodes: Start, Analysis, Design, Code, Benchmark, Performance OK? (Yes / No), Profile, Deploy]

Page 22: Java™ SE Performance Tuning

Application Performance Process

[Flowchart nodes: Start, Analysis, Design, Code, Benchmark, Performance OK? (Yes / No), Profile, Deploy, Monitor]

Page 23: Java™ SE Performance Tuning

Module 2: Monitoring the OS Layer

Page 24: Java™ SE Performance Tuning

Objectives

• Monitor CPU usage

• Monitor network I/O

• Monitor disk I/O

• Monitor virtual memory usage

• Monitor processes including lock contention

Page 25: Java™ SE Performance Tuning

What to Expect

• What level of the software stack to monitor

> Operating system level

> JVM level (covered in module 3)

> Application level (covered in module 3)

• What information to monitor

> What to monitor is covered per component at a given level

> Example: At OS level, monitor CPU usage

• What tools to use

Page 26: Java™ SE Performance Tuning

What to Monitor at the OS Level

• CPU utilization

• Network traffic

• Disk I/O

• Virtual memory usage

• Processes and kernel locks

Monitoring definition recap: An act of non-intrusively collecting or observing performance data from an operating or running application.

Page 27: Java™ SE Performance Tuning

Monitoring CPU Usage: Overview

• Rationale for monitoring CPU usage

> Get big picture view of CPU demand

> Get per process measurement of CPU utilization

• Measurements of CPU usage

> User (usr) time

> System (sys) time

> Idle time

> Voluntary context switching (VCX)

> Involuntary context switching (ICX)

• Tools for monitoring CPU usage

Page 28: Java™ SE Performance Tuning

CPU monitoring: What to look for

• High sys / kernel CPU time

> High sys / kernel CPU time indicates that many CPU cycles are spent in the kernel.

> Reducing kernel CPU time gives more CPU time to the application.

> High sys CPU time can also indicate shared resource contention (in other words, locking). More on lock contention later.

Page 29: Java™ SE Performance Tuning

CPU monitoring: What to look for

• Idle CPU

> In multi-threaded applications on multi-core systems, idle CPU can indicate an application's inability to scale.

> A combination of high sys or kernel CPU utilization and idle CPU could indicate shared resource contention as the scalability blocker.

> Applicable to all operating systems, for example Windows, Linux and Solaris

Page 30: Java™ SE Performance Tuning

Voluntary Context Switching (VCX)

[Diagram labels: Thread (three times), block (twice)]

Page 31: Java™ SE Performance Tuning

Involuntary Context Switching (ICX)

[Diagram labels: CPU Context Switch, Thread Priority Interrupt, Thread, Low Priority Thread]

Page 32: Java™ SE Performance Tuning

CPU monitoring: What to look for

• High VCX

> High voluntary context switching can indicate that an application is experiencing lock contention as a result of how the JVM implements Java™ locking, such as synchronized methods or blocks, or read/write locks in the java.util.concurrent.locks package.

• High ICX

> Some applications (most often OLTP / database systems) can benefit from switching to the Solaris FX (fixed) scheduler as a means to reduce context switching.

> For Solaris, use priocntl to set FX scheduling:

> $ priocntl -c FX -s <PID> [<PID> ... ]
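As a rough illustration of the High VCX point above, the following sketch has several threads compete for a single monitor. Running something like it while watching prstat -Lm would be expected to show threads waiting on the lock, although the exact numbers depend on the JVM and platform. The class name, thread count and iteration count are illustrative assumptions and are not part of the course material.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: many threads contend for one monitor.
final class ContendedCounter {
    private final Object lock = new Object();
    private long count;

    // Every increment takes the same lock, so workers block one another.
    void increment() {
        synchronized (lock) {
            count++;
        }
    }

    long get() {
        synchronized (lock) {
            return count;
        }
    }

    // Starts `threads` workers that each increment `perThread` times,
    // waits for all of them, and returns the final count.
    static long run(int threads, int perThread) {
        ContendedCounter counter = new ContendedCounter();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        // 8 threads and 1,000,000 iterations are arbitrary illustration values.
        System.out.println("final count = " + run(8, 1_000_000));
    }
}
```

Replacing the single lock with per-thread counters that are summed at the end is one way to compare contended and uncontended behavior under the same monitoring tools.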

Page 33: Java™ SE Performance Tuning

Tools For Monitoring: CPU Usage

• Tools to monitor CPU utilization

> vmstat (Solaris & Linux)

> mpstat (Solaris)

> prstat (Solaris)

> top (Linux, prefer prstat on Solaris)

> Task Manager (Windows)

> Performance Monitor (Windows)

> Windows Resource Manager (Windows Server)

> xosview (Linux)

> cpubar (Solaris – Performance Tools CD)

> iobar (Solaris – Performance Tools CD)

> dtrace (Solaris)

Page 34: Java™ SE Performance Tuning

Tools for Monitoring CPU Usage: vmstat

• Use vmstat to obtain summaries of CPU usage

• Data of interest: us – user time; sy – system time; id – idle time

Page 35: Java™ SE Performance Tuning

CPU Usage Monitoring: vmstat

• us – user time; sy – system time; id – idle time

Page 36: Java™ SE Performance Tuning

Tools For Monitoring: mpstat

• usr – user time; sys – system time; idl – idle time; csw – context switches; icsw – involuntary context switches

Page 37: Java™ SE Performance Tuning

Tools For Monitoring: prstat

• Example of overall CPU time measured per process

Page 38: Java™ SE Performance Tuning

CPU monitoring: prstat -m

• Example of CPU utilization microstate information measured per process

• Data of interest: USR, SYS, VCX, ICX

Page 39: Java™ SE Performance Tuning

Tools For Monitoring: prstat -Lm

• Example of CPU utilization, including microstate information, measured per lightweight process (LWP). Data of interest: USR, SYS, VCX, ICX

Page 40: Java™ SE Performance Tuning

CPU Monitoring: Solaris - cpubar

• Data of interest: 0 – CPU 0; 1 – CPU 1; avg – Average; --- – Moving average; Green – User; Red – System; Blue – Idle

• Available on Solaris Performance Tools 3.0 CD, or download from: http://mediacast.sun.com/share/stefanschneider/PerformanceCD3.0.tar.gz

Page 41: Java™ SE Performance Tuning

CPU monitoring: How to map Java™ Threads to Light Weight Processes (LWPs)

• Use HotSpot's jps command to find the process ids of all running Java™ processes on your machine.

• Use Solaris prstat -Lm, or prstat -Lmp <pid>, to locate the LWP id(s) consuming the most CPU time (USR or SYS).

• Use HotSpot's jstack to dump the thread stacks and find the executing threads that are consuming the CPU (USR and SYS) time.

• Map the LWPID to jstack's thread id.

> LWPID is in the far right column of prstat.

> Look for jstack's corresponding 'nid', reported in hex.
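The last mapping step above amounts to a decimal-to-hexadecimal conversion, because prstat reports the LWPID in decimal while jstack prints 'nid' in hex. A minimal sketch follows; the helper name and the sample LWPID are made up for illustration.

```java
// Hypothetical helper: converts a decimal LWPID taken from prstat into the
// hexadecimal form that jstack prints after "nid=".
final class LwpToNid {
    static String toNid(int lwpId) {
        return "0x" + Integer.toHexString(lwpId);
    }

    public static void main(String[] args) {
        // 26 is an invented LWPID used only for illustration.
        System.out.println("nid=" + toNid(26)); // prints nid=0x1a
    }
}
```

Searching the jstack output for the resulting string (here, nid=0x1a) locates the Java thread that corresponds to the busy LWP.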

Page 42: Java™ SE Performance Tuning

Module 2: Demo 1

Page 43: Java™ SE Performance Tuning

CPU monitoring: Linux - vmstat

• Use vmstat to obtain summaries of CPU usage

• Data of interest: us – user time; sy – system time; id – idle time

Page 44: Java™ SE Performance Tuning

CPU Monitoring: Linux mpstat

Page 45: Java™ SE Performance Tuning

CPU Monitoring Using: pidstat

• %user – percentage of CPU used by the task at user (application) level

• %system – percentage of CPU used by the task at system (kernel) level

• %CPU – total percentage of CPU time

• cswch/s – total voluntary context switches per second

• nvcswch/s – total involuntary context switches per second

Page 46: Java™ SE Performance Tuning

CPU monitoring : Linux - xosview

• Data of interest: CPU 0 and CPU 1

Page 47: Java™ SE Performance Tuning

Monitoring Network I/O: Overview

• Data of interest

> Network utilization in terms of Transmission Control Protocol (TCP) statistics and established connections

• Tools to monitor network I/O

> netstat (Solaris & Linux)

> Performance Monitor (Windows)

> dtrace (Solaris)

> nicstat (Solaris – Performance Tools CD)

> tcptop (Dtrace Toolkit)

Page 48: Java™ SE Performance Tuning

Monitoring Network I/O: Using tcptop

• tcptop can show per-process TCP statistics

• The screen capture shows 'rcp' generating 115 kb of traffic

Page 49: Java™ SE Performance Tuning

Monitoring Network I/O: Using nicstat

• nicstat displays network statistics

• Notice that wAvs, the average write size, is about 1420 bytes over four intervals, which is the Maximum Transmission Unit (MTU) size.

Page 50: Java™ SE Performance Tuning

Monitoring Disk I/O: Overview

• Data of interest

> Number of disk accesses

> Latency and average latencies

• Tools to monitor disk I/O

> iostat (Solaris & Linux)

> iotop (Solaris & Linux)

> pidstat (Linux)

> Performance Monitor (Windows)

> dtrace (Solaris)

> iobar (Solaris – Performance Tools CD)

• Disk caches

Page 51: Java™ SE Performance Tuning

Disk I/O Monitoring Tools: iostat, iobar, iotop, pidstat

• iostat reports per disk, text output

• iobar reports per disk, gui output

• iotop reports per process statistics, text output

• pidstat reports per process statistics, text output

• Data of interest

> number of disk accesses, latency, average latencies

Page 52: Java™ SE Performance Tuning

Monitoring Disk I/O: iotop example

• iotop reporting at a 5 second interval

• DISKTIME reported in microseconds

• The find command (CMD column) keeps disk cmdk0 busy almost 60% of the time during the 5-second interval
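The busy percentage in the iotop example above follows from dividing DISKTIME (microseconds) by the length of the reporting interval. The sketch below shows that arithmetic; the DISKTIME value is a hypothetical figure chosen to land near 60%, since the original screen capture is not reproduced here.

```java
// Hypothetical helper: turns an iotop DISKTIME reading into a busy percentage.
final class DiskBusy {
    // diskTimeMicros: DISKTIME as reported, in microseconds.
    // intervalSeconds: the reporting interval, in seconds.
    static double busyPercent(long diskTimeMicros, int intervalSeconds) {
        double intervalMicros = intervalSeconds * 1_000_000.0;
        return 100.0 * diskTimeMicros / intervalMicros;
    }

    public static void main(String[] args) {
        // 2,950,000 microseconds is an invented sample value, not taken from the slide.
        System.out.println(busyPercent(2_950_000L, 5) + "% busy");
    }
}
```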

Page 53: Java™ SE Performance Tuning

Monitoring Disk I/O: pidstat example

Page 54: Java™ SE Performance Tuning

Monitoring Disk I/O: Disk Cache

• Why not enable disk cache?

• What's the risk?

• On some Sun-branded systems, disk cache may be disabled by default. Linux and Windows systems usually have it enabled.

• Whether disk cache is disabled depends on the Sun-branded model and how recent the model is.

• Ask before disabling.

Page 55: Java™ SE Performance Tuning

Monitoring Virtual Memory: Overview

• Observe paging to identify swapping

> Pages in (pi)

> Pages out (po)

> Scan rate (sr)

• Fixing the swapping problem

Page 56: Java™ SE Performance Tuning

Monitoring Virtual Memory: Swapping

• Why is swapping bad for a Java application?

• Any volunteer want to explain?

Page 57: Java™ SE Performance Tuning

Monitoring Virtual Memory: Tools

• Tools to monitor memory paging & usage

> vmstat (Solaris & Linux)

> prstat (Solaris)

> top (Linux – prefer prstat on Solaris)

> Performance Monitor (Windows)

> dtrace (Solaris)

> cpubar (Solaris – Performance Tools CD)

> meminfo (Solaris – Performance Tools CD)

Page 58: Java™ SE Performance Tuning

Monitoring Virtual Memory: vmstat

• Data of interest: pi – pages in; po – pages out; sr – page scan rate

Page 59: Java™ SE Performance Tuning

Virtual Memory: Swapping Example

• Data of interest

> pi – pages in; po – pages out; sr – page scan rate

> Watch for a high scan rate (see rows 3 to 6) or an increasing trend. Low scan rates are OK if they occur infrequently.
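One way to apply the scan-rate guidance above to a series of vmstat samples is to flag only runs of consecutive high readings, so that an isolated spike is ignored. The sketch below is a simplified heuristic rather than anything prescribed by the course; the threshold, the run length and the sample values are all assumptions.

```java
// Hypothetical heuristic: reports sustained scanning, ignoring isolated spikes.
final class ScanRateCheck {
    // Returns true when `sr` contains at least `minRun` consecutive samples
    // strictly above `threshold`.
    static boolean sustainedHigh(int[] sr, int threshold, int minRun) {
        int run = 0;
        for (int value : sr) {
            run = (value > threshold) ? run + 1 : 0;
            if (run >= minRun) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Invented sr column values, one per vmstat interval.
        int[] samples = {0, 0, 1200, 1500, 1800, 2100, 0};
        System.out.println(sustainedHigh(samples, 1000, 4));
    }
}
```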

Page 60: Java™ SE Performance Tuning

Monitoring Virtual Memory : cpubar

• Data of interest

> p/s - pages per second, sr - scan rate

> Watch for a high scan rate or an increasing trend. Low scan rates are OK if they occur infrequently.

Page 61: Java™ SE Performance Tuning

Virtual Memory: Fixing the Swapping Problem

• Smaller Java heap sizes

• Add physical memory

• Reduce number of applications running on the machine

• Any one, or any combination of the above will help
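The first three remedies above share one idea: the combined memory demand of the JVMs on a machine should stay below physical memory. The sketch below checks that condition for a set of maximum heap sizes. The class name and the sizes are hypothetical, and a real JVM footprint is larger than its heap alone (thread stacks, permanent generation, native allocations), so this check is optimistic.

```java
// Hypothetical check: do the configured maximum heaps fit in physical memory?
final class HeapFitCheck {
    // Returns true when the sum of the given maximum heap sizes is below the
    // amount of physical memory. All values are in bytes.
    static boolean heapsFit(long physicalBytes, long... maxHeapBytes) {
        long total = 0;
        for (long heap : maxHeapBytes) {
            total += heap;
        }
        return total < physicalBytes;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // Invented scenario: 4 GB of RAM, two JVMs with 3 GB and 2 GB heaps.
        System.out.println("fits: " + heapsFit(4 * gb, 3 * gb, 2 * gb));
        // The maximum heap of the JVM running this program, for comparison.
        System.out.println("this JVM max heap bytes: " + Runtime.getRuntime().maxMemory());
    }
}
```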

Page 62: Java™ SE Performance Tuning

Monitoring Processes: Overview

• Data of interest

> Footprint size

> Number of threads and thread state

> CPU usage

> Runtime stack

> Context switches

> Lock contention

Page 63: Java™ SE Performance Tuning

Monitoring Processes: Questions of Interest

• Why are footprint size, number of threads, thread state, lock contention and context switching important to monitor?

• What do lock contention and/or context switching look like on Solaris?

• How can you find the lock or locks causing problems?

• How can you address the thread context switching problem?

Page 64: Java™ SE Performance Tuning

Monitoring Processes: Tools

• ps (Solaris & Linux)

• vmstat (Solaris & Linux)

• mpstat (Solaris)

• prstat (Solaris)

• pidstat (Linux)

• Performance Monitor (Windows)

• top (Linux – prefer prstat on Solaris)

• dtrace (Solaris)

Page 65: Java™ SE Performance Tuning

Monitoring Processes: prstat -Lm

• Data of interest: Number of threads per process (count of LWPIDs per PID); CPU usage (USR, SYS); Locks (LCK); Context switches (VCX and ICX)
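Counting threads per process from prstat -Lm output, as described above, comes down to counting rows per PID, because each row represents one LWP. The sketch below does this for captured text, assuming every LWP of interest appears in the captured rows. The sample rows are invented and only imitate the column layout (PID first, PROCESS/LWPID last).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser: counts LWP rows per PID in prstat -Lm style text.
final class LwpCounter {
    static Map<Integer, Integer> countByPid(String[] rows) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (String row : rows) {
            String[] cols = row.trim().split("\\s+");
            // Skip the header line, blank lines and anything without a numeric PID.
            if (!cols[0].matches("\\d+")) {
                continue;
            }
            counts.merge(Integer.parseInt(cols[0]), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented rows for illustration only.
        String[] rows = {
            "   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID",
            "  1234 demo      45 3.0 0.0 0.0 0.0  40  11 0.5  1K  20  5K   0 java/17",
            "  1234 demo      12 1.0 0.0 0.0 0.0  80 6.0 0.2 900  10  2K   0 java/18",
            "   987 demo     0.1 0.0 0.0 0.0 0.0 0.0 100 0.0   3   0  15   0 sshd/1"
        };
        System.out.println(countByPid(rows));
    }
}
```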

Page 66: Java™ SE Performance Tuning

Monitoring Processes: mpstat

• Data of interest: csw – context switches; icsw – involuntary context switches; smtx – lock contention

Page 67: Java™ SE Performance Tuning

Monitoring Processes: Kernel

• Data of interest

> kernel cpu utilization, locks, system calls, interrupts, migrations, run queue depth

• Tools to monitor the kernel

> vmstat (Linux & Solaris)

> mpstat (Solaris)

> lockstat & plockstat (Solaris)

> Performance Monitor (Windows)

> dtrace (Solaris)

> intrstat (Solaris)

Page 68: Java™ SE Performance Tuning

Kernel Monitoring: What to look for

• Why are high sys / kernel CPU, run queue depth, lock contention, migrations and context switching important to monitor?

> Discussion

• What do they indicate when each is observed?

> Discussion

• How do you address each of these problems?

> Discussion

Page 69: Java™ SE Performance Tuning

Kernel Monitoring Using: vmstat

• Data of interest: Kernel cpu utilization, run queue depth (r column – represents number of runnable kernel threads)

Page 70: Java™ SE Performance Tuning

Kernel Monitoring Using: mpstat

• Data of interest: Kernel CPU utilization (sys); locks (smtx); system calls (syscl); interrupts (intr); migrations (migr); context switches (csw); involuntary context switches (icsw)

Page 71: Java™ SE Performance Tuning

Kernel Monitoring Using : prstat -Lm

• Data of interest: Kernel CPU utilization (SYS); locks (LCK); system calls (SCL); voluntary context switches (VCX); involuntary context switches (ICX)

Page 72: Java™ SE Performance Tuning

Kernel Monitoring Using: pidstat

• %user – percentage of CPU used by the task at user (application) level

• %system – percentage of CPU used by the task at system (kernel) level

• %CPU – total percentage of CPU time

• cswch/s – total voluntary context switches per second

• nvcswch/s – total involuntary context switches per second

Page 73: Java™ SE Performance Tuning

Module 3: Monitoring the JVM and Application Layers

Page 74: Java™ SE Performance Tuning

Objectives

• Examine HotSpot generational garbage collectors

• Monitor the JVM

> GC

> JIT compiler

• Monitor the application

> Application throughput

> Application responsiveness

Page 75: Java™ SE Performance Tuning

What to Expect

• Overview of generational GC

• How to monitor the JVM

> GC: Which tools to use and what to monitor

> JIT compiler: Which tools to use and what to monitor

• How to monitor the application

> Application throughput

> Application responsiveness

Page 76: Java™ SE Performance Tuning

What to Monitor: JVM

• There are two major areas to monitor at the JVM level of the software stack.

> Garbage collector

>The portion of the JVM responsible for freeing memory no longer used by application logic. It is the “magic” that frees programmers from having to manage memory themselves.

>Garbage collection involves traversing the Java™ heap spaces where application objects are allocated and managed by the JVM's garbage collector.

> JIT compilation

>The portion of the JVM responsible for turning byte code into executable instructions for the target hardware platform.

Page 77: Java™ SE Performance Tuning

Monitoring GC

• Examining the GC Basics

> Young generation

> Tenured generation

> Permanent generation

• Data of interest

> Frequency and duration of collections

> Java™ heap usage

> Number of application threads
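The data of interest listed above can also be read from inside a running JVM through the standard java.lang.management API. The following is a minimal sketch, not part of the course material; the class name GcDataOfInterest and its method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch: class and method names are hypothetical.
class GcDataOfInterest {

    /** Total number of collections reported so far by all collectors. */
    static long totalCollections() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 when the collector does not report it
            if (c > 0) {
                count += c;
            }
        }
        return count;
    }

    /** Approximate accumulated collection time in milliseconds. */
    static long totalCollectionMillis() {
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                millis += t;
            }
        }
        return millis;
    }

    /** Bytes of Java heap currently in use. */
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Number of live threads, including JVM-internal Java threads. */
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("collections=" + totalCollections()
                + " gcMillis=" + totalCollectionMillis()
                + " heapUsedBytes=" + heapUsedBytes()
                + " liveThreads=" + liveThreads());
    }
}
```

Sampling these values periodically gives collection frequency (change in count over time) and average duration (change in time divided by change in count).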

Page 78: Java™ SE Performance Tuning

GC Basics

• HotSpot uses what are termed “generational collectors”.

• The HotSpot Java™ heap is divided into generational spaces.

Page 79: Java™ SE Performance Tuning

GC Basics: The Major Spaces

• Young generation

> Further divided into:

>Eden

>A “from” survivor space

>A “to” survivor space

> Java™ objects are allocated in eden

• Tenured (old) generation

• Permanent generation
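A running JVM exposes its spaces as memory pools through the java.lang.management API. The sketch below lists them; the class name HeapSpaces is hypothetical, and the pool names printed depend on the collector and JVM version in use, so they will not necessarily match the names on this slide.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: the class name is hypothetical.
class HeapSpaces {

    /** Names of the pools the running JVM reports as Java heap memory. */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            long used = (usage == null) ? -1 : usage.getUsed();
            System.out.println(pool.getType() + "  " + pool.getName() + "  usedBytes=" + used);
        }
    }
}
```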

Page 80: Java™ SE Performance Tuning

GC Basics: Young Generation

• When the eden space is full, a minor garbage collection event occurs. Live objects in the eden space are copied to the “to” survivor space.

• Additionally, live objects in the “from” survivor space are copied to the “to” survivor space.

• Each object which survives a garbage collection has its age incremented.

• Objects exceeding a JVM-defined age threshold are promoted to the tenured (old) generation space.

Page 81: Java™ SE Performance Tuning

GC Basics: Young Generation

• If the “to” survivor space is too small to hold the surviving eden and “from” survivor space objects, the objects that do not fit are promoted to the tenured space.

> This situation can lead to performance issues.

> Short-lived objects promoted to the old generation require a Full GC (a stop-the-world event) or a tenured space concurrent collector to remove them from the Java™ heap. Both choices have their own consequences.

Page 82: Java™ SE Performance Tuning

GC Basics: Young Generation

• Minor garbage collection events can be:

> Stop-the-world single-threaded events

>These block all Java™ application threads while a single thread performs the garbage collection.

> Stop-the-world multi-threaded events

>These also block all Java™ application threads, but the garbage collection itself is performed by multiple threads.

Page 83: Java™ SE Performance Tuning

GC Basics: Young Generation

• A single-threaded garbage collector is used with the “default” collector.

• A multi-threaded garbage collector, also called a parallel collector, can be used with either the “throughput” collector or the “concurrent” collector.

> -XX:+UseParallelGC, throughput collector

> -XX:+UseParNewGC, concurrent collector

>The throughput and concurrent collectors are discussed in more detail later.

• Minor garbage collections are very quick compared to full garbage collection events.

Page 84: Java™ SE Performance Tuning

GC Basics: Tenured (old) Generation

• The terms old generation and tenured generation are commonly used interchangeably.

• Contains objects which have survived minor collections and been copied to the tenured space.

> In some edge cases, objects are allocated directly in the tenured generation.

Page 85: Java™ SE Performance Tuning

GC Basics: Tenured (old) Generation

• Objects are garbage collected by one of the following:

> Stop-the-world single-threaded full garbage collection

> Stop-the-world multi-threaded full garbage collection

>Stop-the-world means all Java™ application threads are blocked for the duration of the garbage collection.

> Single-threaded and (mostly) concurrent garbage collection

>Most of the garbage collection occurs concurrently while Java™ application threads run.

Page 86: Java™ SE Performance Tuning

GC Basics: Object Life Cycle

• Allocation through to garbage collection

> The object is allocated in the eden space.

> If the object is in use (reachable) at minor collection time, it is copied to the “to” survivor space. If it is no longer in use (not reachable), it is garbage collected.

> As subsequent minor garbage collections occur, the object is copied “from” one survivor space “to” the other survivor space.

Page 87: Java™ SE Performance Tuning

GC Basics: Object Life Cycle

• Allocation through to garbage collection (continued)

> If the object continues to be in use at subsequent minor garbage collections, then at some threshold “age” it is copied (tenured) to the tenured space. If the object is no longer in use, it is garbage collected.

> Once the object is in the tenured space and no longer in use, either a concurrent collection or a full collection is required to collect it.

Page 88: Java™ SE Performance Tuning

Module 3: Demo 1

Page 89: Java™ SE Performance Tuning

GC Basics: Permanent Generation

• Contains the meta-data (objects) required by the JVM to describe the objects used in the application, such as class objects and interned Strings.

Page 90: Java™ SE Performance Tuning

GC Basics: Permanent Generation

• Holds objects containing information which describes the methods of an application class or a Java™ SE library class.

• Populated by the JVM at runtime based on the classes in use by the application.

• Classes may be collected (unloaded) if the JVM finds they are no longer needed and the space is needed for other classes.

Page 91: Java™ SE Performance Tuning

Module 3: Demo 2

Page 92: Java™ SE Performance Tuning

Monitoring GC: The Tools

• Tools for monitoring GC

> -verbose:gc

> -XX:+PrintGCTimeStamps

> -XX:+PrintGCDetails

> -XX:+PrintGCApplicationStoppedTime

> -XX:+PrintGCApplicationConcurrentTime

> jstat, jps

> JConsole

> Java™ VisualVM

> VisualGC

> dtrace (Java HotSpot™ JDK 6 contains samples)

Page 93: Java™ SE Performance Tuning

Monitoring GC: Using -verbose:gc

• -verbose:gc

> [GC 1884K->1299K(5056K), 0.0031820 secs]

• Use -XX:+PrintGCTimeStamps with -verbose:gc

> 3.791: [GC 1884K->1299K(5056K), 0.0031820 secs]
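Each line reports heap occupancy before and after the collection, the total heap capacity in parentheses, and the pause duration. A small parser makes the fields explicit. This is a sketch that assumes the exact line shape shown on this slide; the class name VerboseGcLine and its field names are hypothetical, and other JVM versions or flags print other shapes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: handles only the "[GC before->after(capacity), secs]" shape
// shown above, with an optional PrintGCTimeStamps prefix such as "3.791: ".
class VerboseGcLine {
    private static final Pattern LINE = Pattern.compile(
            "(?:(\\d+\\.\\d+): )?\\[GC (\\d+)K->(\\d+)K\\((\\d+)K\\), (\\d+\\.\\d+) secs\\]");

    final double uptimeSecs;  // seconds since JVM start, NaN when no timestamp is present
    final long beforeKb;      // heap occupancy before the collection
    final long afterKb;       // heap occupancy after the collection
    final long capacityKb;    // heap capacity
    final double pauseSecs;   // duration of the collection

    private VerboseGcLine(double uptimeSecs, long beforeKb, long afterKb,
                          long capacityKb, double pauseSecs) {
        this.uptimeSecs = uptimeSecs;
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.capacityKb = capacityKb;
        this.pauseSecs = pauseSecs;
    }

    /** Returns the parsed line, or null when the text is not in the expected shape. */
    static VerboseGcLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        double uptime = (m.group(1) == null) ? Double.NaN : Double.parseDouble(m.group(1));
        return new VerboseGcLine(uptime,
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)),
                Double.parseDouble(m.group(5)));
    }

    /** Kilobytes freed by this collection. */
    long reclaimedKb() {
        return beforeKb - afterKb;
    }
}
```

For the sample line above, the collection reclaimed 1884K - 1299K = 585K of a 5056K heap in about 3.2 milliseconds.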

Page 94: Java™ SE Performance Tuning

Monitoring GC: Using -verbose:gc

• Use -XX:+PrintGCDateStamps (Java™ 6u4 and later)

> 2008-06-10T06:12:47.513-0500: [GC 10308K->2725K(101376K), 0.0320270 secs]

> Where: the date and time are printed with a time zone offset from GMT. Format: YYYY-MM-DDTHH:MM:SS.mmm-tttt where YYYY=year, MM=month, DD=day of month, HH=hour, MM=minute, SS=seconds, mmm=milliseconds, tttt=time zone offset.

• -XX:+PrintGCDateStamps and -XX:+PrintGCTimeStamps can be used together

> 2008-06-10T06:21:29.711-0500: 5.551:[GC 10377K->2786K(101376K), 0.0271350 secs]

> The PrintGCTimeStamps output appears after the date and time.
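The date stamp can be turned into a point in time with the java.time API (Java 8 and later, which postdates this course). The sketch below assumes the stamp layout shown in the sample lines above; the class name GcDateStamp is hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative sketch: parses the leading date stamp of a PrintGCDateStamps line.
class GcDateStamp {
    // Year, month, day, literal 'T', time with milliseconds, then an offset such as -0500.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSZ");

    static OffsetDateTime parse(String logLine) {
        // The stamp ends at the first ": " (colon followed by a space).
        String stamp = logLine.substring(0, logLine.indexOf(": "));
        return OffsetDateTime.parse(stamp, FORMAT);
    }

    public static void main(String[] args) {
        OffsetDateTime t = parse(
                "2008-06-10T06:12:47.513-0500: [GC 10308K->2725K(101376K), 0.0320270 secs]");
        System.out.println("local=" + t + " utc=" + t.toInstant());
    }
}
```

Converting to an Instant makes it straightforward to correlate GC events with application logs recorded in a different time zone.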

Page 95: Java™ SE Performance Tuning

Monitoring GC: Using -verbose:gc

• Data of interest

> Frequency and duration, heap usage

• Explain what pattern(s) indicate potential problems.

Page 96: Java™ SE Performance Tuning

Module 3: Demo 3

Page 97: Java™ SE Performance Tuning

Monitoring GC: Printing GC Details

• -XX:+PrintGCDetails

> [GC [DefNew: 490K->64K(960K), 0.0032800 secs] 5470K->5151K(7884K), 0.0033270 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

> [Full GC (System) [Tenured: 5087K->5151K(6924K), 0.0971070 secs] 6047K->5151K(7884K), [Perm : 11178K->11178K(16384K)], 0.0972120 secs] [Times: user=0.10 sys=0.01, real=0.10 secs]

• Data of interest

> Frequency and duration, heap usage

• Explain what patterns indicate potential problems.

Page 98: Java™ SE Performance Tuning

Module 3: Demo 4

Page 99: Java™ SE Performance Tuning

Monitoring GC: Printing Pause Time

• -XX:+PrintGCApplicationStoppedTime

• -XX:+PrintGCApplicationConcurrentTime

• Helpful when tuning pause time sensitive applications

• Useful for identifying odd pause time issues when combined with GC timestamps and GC duration.

Page 100: Java™ SE Performance Tuning

Module 3: Demo 5

Page 101: Java™ SE Performance Tuning

Monitoring GC: Using jps

• jps

> Included in the HotSpot JDK

> Capable of local and remote monitoring

> Command line utility to find running Java processes.

> jps [-q] [-mlvV] [<hostid>] where <hostid> = <hostname>[:<port>]

> Quick demo (local and remote monitoring)

> See jps man page on java.sun.com for details on -q, -mlvV options

Page 102: Java™ SE Performance Tuning

Monitoring GC: Using jstat

• jstat

> Included in the HotSpot JDK.

> Command line utility.

> jstat -<option> [-t] [-h<lines>] <vmid> [<interval> [<count>]]

> Garbage collection option(s):

>-gc, -gccapacity, -gccause, -gcnew, -gcnewcapacity, -gcold, -gcoldcapacity, -gcpermcapacity, -gcutil

> See jstat man page on java.sun.com for details on garbage collection options

> Explain what patterns indicate potential problems.

Page 103: Java™ SE Performance Tuning

Module 3: Demo 6

Page 104: Java™ SE Performance Tuning

Monitoring GC: Using jstat

• Beware: When using the Concurrent Mark Sweep (CMS) collector (also known as the concurrent collector), jstat reports two full GC events per CMS cycle, which is misleading. Young generation statistics, however, are accurate with CMS.

• Quick demo

> Running jstat against the Java2D demo.

> Connect locally and remotely

> Remember, remote monitoring requires jstatd and a policy file.

Page 105: Java™ SE Performance Tuning

Monitoring GC: Using jconsole

• jconsole

> Is a monitoring and management GUI console.

> Is included in the HotSpot JDK.

> Can attach locally or remotely.

> Can monitor internals of a target JVM.

> Can monitor multiple JVMs.

> Explain what patterns indicate potential problems.

Page 106: Java™ SE Performance Tuning

Monitoring GC: Using jconsole

• Provides extensive observability

> MBean support for

>JVM memory usage by memory pool / spaces

>Class loading, JIT compilation, garbage collector, runtime, threading and logging

>Thread monitor contention

> Graphical view of heap memory, threads, cpu usage and class loading
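The thread information JConsole displays comes from the platform ThreadMXBean, which can also be queried directly. The sketch below reads how often a thread has blocked waiting to enter a monitor; the class name MonitorContention is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Illustrative sketch: the class name is hypothetical.
class MonitorContention {

    /** Times the given thread has blocked to enter a monitor, or -1 if the thread is gone. */
    static long blockedCount(long threadId) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        ThreadInfo info = threads.getThreadInfo(threadId);
        return (info == null) ? -1 : info.getBlockedCount();
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info != null) { // a thread may have exited between the two calls
                System.out.println(info.getThreadName()
                        + " state=" + info.getThreadState()
                        + " blockedCount=" + info.getBlockedCount());
            }
        }
    }
}
```

A blocked count that climbs quickly on many threads is one hint that a shared lock is contended.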

Page 107: Java™ SE Performance Tuning

Module 3: Demo 7

Page 108: Java™ SE Performance Tuning

Monitoring GC: Using VisualVM

• VisualVM: Background and capabilities

> Packaged with JDK 6 update 7

> Open source project at https://visualvm.dev.java.net

> Integrates several existing JDK software tools and adds lightweight memory and CPU profiling capabilities.

>JConsole

>Subset of NetBeans Profiler

> Includes performance analysis and troubleshooting abilities.

>Thread deadlock detection

>Thread monitor contention

Page 109: Java™ SE Performance Tuning

Monitoring GC: Using VisualVM

• VisualVM: Extensibility

> Can be further extended with functionality specific to a target application, either through an additional plug-in or by extending an existing plug-in.

>Possibilities include:

–GlassFish performance monitoring plug-in

–JavaDB performance monitoring plug-in

–Plug-ins from external vendors, such as a WebSphere performance monitoring plug-in.

> Plug-ins, enhancements and updates are delivered through the VisualVM plug-in center.

Page 110: Java™ SE Performance Tuning

Monitoring GC: Using VisualVM

• VisualVM

> Explain what patterns indicate potential performance issues.

Page 111: Java™ SE Performance Tuning

Module 3: Demo 8

Page 112: Java™ SE Performance Tuning

Monitoring GC: Using VisualGC

• VisualGC

> Standalone GUI or VisualVM plug-in.

> Not included in HotSpot JDK. Separate download.

> Visually observe garbage collection behavior. (A picture is worth a thousand words).

> Also includes classloading and JIT compilation information.

Page 113: Java™ SE Performance Tuning

Module 3: Demo 9

Page 114: Java™ SE Performance Tuning

Garbage Collectors: GCHisto

• Currently a standalone GUI; a VisualVM plug-in is under development.

• Open source project, http://gchisto.dev.java.net

• Not included in the HotSpot JDK. Separate download.

• Graphical tool which summarizes GC activity obtained from GC logs

• Allows comparison of JVM tuning choices, such as heap sizes or collector types, by comparing GC logs.

• Demo of GCHisto in Module 5.

Page 115: Java™ SE Performance Tuning

Monitoring JIT Compilation

• Tools for monitoring JIT Compilation

> jstat

> JConsole

> VisualVM

> VisualGC

> -XX:+PrintCompilation (can be intrusive)

> -XX:+LogCompilation (can be intrusive)

> DTrace (HotSpot JDK 6 contains samples)

• Data of interest

> frequency, duration, possible opt / de-opt cycles, failed compilations
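Accumulated JIT compilation time is also available in-process through the platform CompilationMXBean. A minimal sketch follows; the class name JitCompileTime is hypothetical, and the bean only reports total time, not individual compilations or de-optimizations.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch: the class name is hypothetical.
class JitCompileTime {

    /** Approximate accumulated JIT compilation time in ms, or -1 when not reported. */
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        // The bean is null when the JVM has no compilation system.
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        String name = (jit == null) ? "none" : jit.getName();
        System.out.println("compiler=" + name + " totalMillis=" + totalCompilationMillis());
    }
}
```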

Page 116: Java™ SE Performance Tuning

Module 3: Demo 10

Page 117: Java™ SE Performance Tuning

Monitoring JIT: Using PrintCompilation

• -XX:+PrintCompilation

1 java.util.Properties$LineReader::readLine (452 bytes)

2 java.lang.String::hashCode (60 bytes)

3 java.lang.String::equals (88 bytes)

3 made not entrant (2) java.lang.String::equals (88 bytes)

4 java.lang.Object::<init> (1 bytes)

5 java.lang.String::indexOf (151 bytes)

6 java.lang.String::equals (88 bytes) <------------------- redoing 3

6 made not entrant (2) java.lang.String::equals (88 bytes)

7 java.lang.String::indexOf (151 bytes)

8 java.lang.String::equals (88 bytes) <------------------- redoing 6

8 made not entrant (2) java.lang.String::equals (88 bytes)

• Data of interest

> frequency, duration, possible opt / de-opt cycles, patterns of interest
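In the sample output above, String::equals is compiled and then made not entrant three times, which is the opt / de-opt pattern to look for. Counting “made not entrant” lines per method makes such cycles easy to spot in a long log. The sketch below assumes the line layout shown on this slide, which differs between JVM versions; the class name DeoptCounter is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: counts "made not entrant" events per method in
// PrintCompilation output laid out as on the slide above.
class DeoptCounter {
    private static final Pattern NOT_ENTRANT =
            Pattern.compile("made not entrant\\s+(?:\\(\\d+\\)\\s+)?(\\S+::\\S+)");

    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String line : lines) {
            Matcher m = NOT_ENTRANT.matcher(line);
            if (m.find()) {
                String method = m.group(1);
                Integer previous = counts.get(method);
                counts.put(method, previous == null ? 1 : previous + 1);
            }
        }
        return counts;
    }
}
```

Methods with a count above one or two are candidates for closer inspection.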

Page 118: Java™ SE Performance Tuning

Monitoring JIT: Using LogCompilation

• -XX:+LogCompilation

> Beware, it can be intrusive

• Will probably need someone from JIT compiler team to analyze it.

• Data of interest

> frequency, duration, possible opt / de-opt cycles

Page 119: Java™ SE Performance Tuning

Monitoring JIT: Using .hotspot_compiler File

• What is the .hotspot_compiler file

• The .hotspot_compiler file format

> exclude A/B/C/D methodName

>where A.B.C.D is the fully qualified package and class name and methodName is the method name.

> Example: To exclude java.util.HashMap.clear(), specify:

>exclude java/util/HashMap clear

• The .hotspot_compiler file must be placed in the directory where the java command is launched.

> Can also use -XX:CompileCommand=exclude,A/B/C/D,methodName

> Can also use -XX:CompileCommandFile=/path/to/file
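The only transformation involved is replacing the dots in the fully qualified class name with slashes. The helper below builds both forms described above; it is a sketch, and the class name CompilerExclude and its method names are hypothetical.

```java
// Illustrative sketch: builds the exclude directive in the two forms shown above.
class CompilerExclude {

    /** Line for the .hotspot_compiler file, e.g. "exclude java/util/HashMap clear". */
    static String fileLine(String className, String methodName) {
        return "exclude " + className.replace('.', '/') + " " + methodName;
    }

    /** Equivalent command line flag using -XX:CompileCommand. */
    static String commandLineFlag(String className, String methodName) {
        return "-XX:CompileCommand=exclude," + className.replace('.', '/') + "," + methodName;
    }

    public static void main(String[] args) {
        System.out.println(fileLine("java.util.HashMap", "clear"));
        System.out.println(commandLineFlag("java.util.HashMap", "clear"));
    }
}
```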

Page 120: Java™ SE Performance Tuning

Tools For Monitoring: JIT

• When to use the .hotspot_compiler file

> JIT compiler in an endless loop attempting a “heroic” optimization which will not converge

> JIT compiler in a de-optimization – re-optimization cycle

> JIT compiler producing 'bad' code resulting in a core dump or other severe problem

Page 121: Java™ SE Performance Tuning

Monitoring the Application: Terms

• Throughput sensitive applications

> Have as highest priority the raw throughput of the information or data being processed

> Maximize application throughput even at the expense of responsiveness

> Will tolerate high pause times in order to maximize throughput

> Example: Batch processing applications

Page 122: Java™ SE Performance Tuning

Monitoring the Application: Terms

• Responsiveness sensitive applications

> Have as highest priority the servicing of all requests within a predefined maximum time

> Raw throughput of data or speed of processing requests is secondary to the max response time goal

> Are sensitive to GC pause time

> Examples:

>User input applications such as Web browser or GUI based applications

>Financial trading applications

>Telecommunication applications

Page 123: Java™ SE Performance Tuning

Monitoring the Application: Tools and Data

• Tools

> JConsole (using application MBeans)

> Extend VisualVM with a plug-in to gather Java™ application data of interest and monitor the application with VisualVM

> Application log

> GCHisto and btrace plugins for VisualVM

> Specialized DTrace scripts

• Data of interest

> Critical application information and instrumentation
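An application MBean is one way to expose critical application data, such as request count (throughput) and worst-case response time (responsiveness), to JConsole or VisualVM. The following is a minimal sketch using the standard JMX API; the names AppStats, RequestStats and the object name used in the usage note are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Illustrative sketch: all class, method and attribute names are hypothetical.
class AppStats {

    /** Management interface. A standard MBean interface is public and named <Impl>MBean. */
    public interface RequestStatsMBean {
        long getRequestCount();
        long getMaxResponseMillis();
    }

    public static class RequestStats implements RequestStatsMBean {
        private final AtomicLong count = new AtomicLong();
        private final AtomicLong maxMillis = new AtomicLong();

        /** Called by the application after each request completes. */
        public void record(long responseMillis) {
            count.incrementAndGet();
            long prev = maxMillis.get();
            // Retry until this value is stored or a larger one is already present.
            while (responseMillis > prev && !maxMillis.compareAndSet(prev, responseMillis)) {
                prev = maxMillis.get();
            }
        }

        public long getRequestCount() { return count.get(); }
        public long getMaxResponseMillis() { return maxMillis.get(); }
    }

    /** Registers a new RequestStats with the platform MBean server under the given name. */
    static RequestStats register(String objectName) {
        try {
            RequestStats stats = new RequestStats();
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(stats, new ObjectName(objectName));
            return stats;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads an attribute back through the MBean server, as a monitoring tool would. */
    static Object read(String objectName, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer()
                    .getAttribute(new ObjectName(objectName), attribute);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After a call such as AppStats.register("example.app:type=RequestStats"), the attributes RequestCount and MaxResponseMillis should be visible in the MBeans view of a connected console.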

Page 124: Java™ SE Performance Tuning

SEM-DTJ-380-LA

Module 4: Performance Profiling

Page 125: Java™ SE Performance Tuning

Objectives

• Examining and selecting profiling tools

• Examining profiling tips for:

> CPU profiling

> Heap profiling

> Memory leak profiling

> Lock contention detection

• Identifying anti-patterns in:

> Heap profiles

> Method profiles

Page 126: Java™ SE Performance Tuning

What to Expect

• What tools to use to profile

> Focus is on Sun Microsystems free and open source tools along with bundled JDK tools

> No commercial or external vendor tools are covered

• An overview of how to use each profiling tool

• An examination of good fit use cases for each tool

• Tips for finding source of lock contention and memory leaks

• Commonly observed patterns in profiles and suggested ways to fix them

Page 127: Java™ SE Performance Tuning

Tools For Profiling Java™ Applications

• Free tools

> NetBeans™ Profiler, subset in VisualVM

>http://www.netbeans.org

> Sun™ Studio Collector / Analyzer

> http://developers.sun.com/sunstudio/downloads

> jmap / jhat

>included in HotSpot JDK

• Commercial Profilers

> Intel® VTune

> OptimizeIt

> YourKit

> Not covered in any detail here

Page 128: Java™ SE Performance Tuning

Free Profilers: NetBeans™ Profiler

• Characteristics:

> CPU performance profiling using byte code instrumentation

> Low overhead profiling

• Capabilities:

> Method profiling

>Select all methods for profiling or specific method(s)

> Note: You can limit everything but JDK classes

> Memory profiling / heap profiling

> Memory leak detection

Page 129: Java™ SE Performance Tuning

Free Profilers: NetBeans™ Profiler

• Supported platforms:

> Solaris (SPARC & x86)

> Linux

> Windows

> Mac OS X

• Requirements:

> Requires HotSpot JDK 5 or later

• Download:

> Included out-of-the-box in NetBeans™ IDE 6.0 and later

> http://www.netbeans.org

Page 130: Java™ SE Performance Tuning

Module 4: Demo 1

Page 131: Java™ SE Performance Tuning

Free Profilers: Sun Studio

• Sun Studio Collector / Analyzer capabilities:

> Statistical CPU profiling using JVMTI

>Can specify sampling interval, default 1 sec

>User and sys cpu time

>Inclusive or exclusive method times

> Time spent in locks

> View Java™ byte code in User Mode and Machine Mode

> View JIT compiler generated assembly code in Expert Mode

> Supports specific CPU counter collection

Page 132: Java™ SE Performance Tuning

Free Profilers: Sun Studio

• Sun Studio Collector / Analyzer characteristics:

> Easily invoked with 'collect -j on' prefixed to Java™ command line.

• Supported platforms:

> Solaris (SPARC & x86) and Linux

• Requirements:

> Requires HotSpot JDK 5 or later

• Additional Information:

> http://developers.sun.com/solaris/articles/perftools.html

> http://developers.sun.com/solaris/articles/javapps.html

> http://developers.sun.com/solaris/articles/profiling_websphere.html

Page 133: Java™ SE Performance Tuning

Free Profilers: Sun Studio

• View options for 'collected' data:

> GUI option: Analyzer GUI

> Command line option: 'er_print'

Page 134: Java™ SE Performance Tuning

Module 4: Demo 2

Page 135: Java™ SE Performance Tuning

Free Profilers: jmap and jhat

• Used in combination:

> jmap – produces heap profile

> jhat – reads and presents the data

• Additional information:

> Shipped with JDK 5 and later

> Command line tools

> Heap memory profiling

> Perm gen statistics

> Finalizer statistics

> Supported on all platforms

Page 136: Java™ SE Performance Tuning

Module 4: Demo 3

Page 137: Java™ SE Performance Tuning

Profiling Tips

• CPU profiling tips

• Heap profiling tips

• Memory leak profiling tips

• Lock contention profiling tips

• Profiling tools selection tips

• Inlining effect

Page 138: Java™ SE Performance Tuning

CPU Profiling Tips: Why and When

• Why perform CPU profiling?

> CPU profiling provides information about where an application is spending most of its time.

• When is CPU profiling needed or beneficial?

> Poor application throughput measured against a predetermined target

> Saturated cpu utilization

> High sys or kernel cpu utilization

> High lock contention

> To a lesser extent, idle cpu or poor application scalability

Page 139: Java™ SE Performance Tuning

CPU Profiling Tips: Strategies

• Approaches which work best for CPU profiling

> Start with a holistic approach to isolate major cpu consumers or hot methods.

>Look at methods with high usr and/or sys cpu usage.

>Look at both inclusive and exclusive method times.

>Looking at inclusive times may help identify where a change in implementation or design could be a good corrective approach.

>Looking at exclusive times focuses attention on specific implementation details within a method.
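The relationship between the two measures can be stated precisely: a method's inclusive time is its exclusive time plus the inclusive time of everything it calls. The sketch below models this on a small call tree; the class CallNode and the method names in the usage note are hypothetical and the numbers are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: a call tree node carrying profile times.
class CallNode {
    final String method;
    final long exclusiveMillis; // time spent in this method's own code only
    final List<CallNode> callees = new ArrayList<CallNode>();

    CallNode(String method, long exclusiveMillis) {
        this.method = method;
        this.exclusiveMillis = exclusiveMillis;
    }

    /** Adds a callee and returns this node so calls can be chained. */
    CallNode calls(CallNode callee) {
        callees.add(callee);
        return this;
    }

    /** Inclusive time = exclusive time plus the inclusive time of every callee. */
    long inclusiveMillis() {
        long total = exclusiveMillis;
        for (CallNode callee : callees) {
            total += callee.inclusiveMillis();
        }
        return total;
    }
}
```

For example, a handleRequest method with 5 ms exclusive time that calls parse (30 ms) and format (50 ms) has 85 ms inclusive time. Its high inclusive but low exclusive time points at the callees or the design, not at its own code.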

Page 140: Java™ SE Performance Tuning

CPU Profiling Tips: Strategies

• Some profilers, such as NetBeans Profiler, allow you to profile a subset of an application.

> This approach can be useful when profiling the entire application is very intrusive or severely disturbs the application's performance.

> If a holistic approach is not possible or is painful, then profiling suspected subsets of an application is a good approach.

• DTrace scripts can also be effective

Page 141: Java™ SE Performance Tuning

CPU Profiling Tips: Which Product

• For profiling entire applications

> Sun Studio Collector works well

>1 second default sampling rate

>Easy to set up, just prepend 'collect -j on' to the java command line.

>Can fine tune sampling rate.

>Can direct output to specified file name.

> DTrace scripting

>Can customize to target specific areas.

>May require DTrace scripting expertise to author the script.

Page 142: Java™ SE Performance Tuning

CPU Profiling Tips: Which Product

• For profiling portions of applications

> NetBeans Profiler works very well

>Can easily configure which classes or packages to profile (include or !include).

>Easy to set up if the application is set up as a NetBeans project.

>Remote or local profiling

>Can view profiling as application is running.

>Can compare profile against another profile.

> DTrace scripting

>Customize to target specific portions.

>May require DTrace scripting expertise to author the script.

Page 143: Java™ SE Performance Tuning

Module 4: Demo 4

Page 144: Java™ SE Performance Tuning

Heap Profiling Tips: Why and When

• Heap profiling provides information about the memory allocation footprint of an application.

• When is heap profiling needed or beneficial?

> Observing frequent garbage collections

> Application requires a large Java heap

> Can be useful for obtaining better cpu utilization or application throughput and responsiveness

>Less time allocating objects and/or collecting them means more cpu time spent running the application.

Page 145: Java™ SE Performance Tuning

Heap Profiling Tips: Strategies

• What approaches work best for heap profiling

> Start with a holistic approach to isolate major memory allocators.

>Look at objects with large amounts of bytes being allocated.

>Look at objects with a high number / count of object allocations.

>Look at stack traces for locations where large amounts of bytes are being allocated.

>Look at stack traces for locations where large numbers of objects are being allocated.

Page 146: Java™ SE Performance Tuning

Heap Profiling Tips: Strategies

• If a holistic approach is too intrusive, NetBeans Profiler can profile subsets of the application.

> Hypothesize on packages or classes which might have a large memory allocation footprint.

> Look at objects with large amounts of bytes being allocated.

> Look at objects with high number / count of object allocations.

> Look at stack traces for locations where large amounts of bytes are being allocated.

> Look at stack traces for locations where large number of objects are being allocated.

Page 147: Java™ SE Performance Tuning

Heap Profiling Tips: Strategies

• Cross reference cpu profiling with heap profiling

> Look for objects which may have lengthy initialization times and allocate large amounts of memory. They are good candidates for caching.

• Look for alternative classes, objects and possibly caching approaches where a high number of objects or bytes is being allocated.

• Consider profiling while application is running to observe memory allocation patterns.

Page 148: Java™ SE Performance Tuning

Heap Profiling Tips: Complementary Tools (jmap / jhat)

• jmap and jhat can also capture heap profiles

> Not as sophisticated as NetBeans Profiler

> Limited to a snapshot at the time of jmap capture. (jmap captures the snapshot, jhat displays the data)

> User interface not as polished as NetBeans Profiler

> Easily view top memory consumer at time when snapshot was taken.

> Look at stack traces for allocation location.

Page 149: Java™ SE Performance Tuning

Heap Profiling Tips: jmap / jhat Strategies

• Focus on large memory allocators

> Consider alternative classes, objects and possibly caching approaches for large allocators.

• Capture several snapshots.

• Compare top memory allocators.

Page 150: Java™ SE Performance Tuning


Heap Profiling Tips: jmap / jhat Strategies

• Quick and easy to use

> run jmap on the command line

> run jhat on the command line

> connect with a web browser

• Can be intrusive on the application to generate the snapshot.

Page 151: Java™ SE Performance Tuning


Memory Leak Profiling Tips: Why

• Memory leaks are situations where allocated objects remain unintentionally reachable and as a result cannot be garbage collected.

• Lead to poor application performance.

• Can lead to application failure.

• Can be hard to diagnose.
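A minimal sketch of how such a leak can arise, assuming a hypothetical request handler that records every buffer in a static list (the class and method names below are illustrative and not from the course material). The buffers are no longer needed after processing, but they stay reachable through the static field, so the collector cannot reclaim them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration only.
class LeakSketch {
    // A static collection stays reachable for as long as the class is loaded,
    // so every element added here survives every garbage collection.
    private static final List<byte[]> PROCESSED = new ArrayList<byte[]>();

    // Leaky version: keeps a reference to each buffer after the work is done.
    static void handleLeaky(byte[] request) {
        PROCESSED.add(request); // unintentional retention
    }

    // Fixed version: uses the buffer and lets the reference go out of scope.
    static int handleFixed(byte[] request) {
        return request.length;
    }

    // Number of buffers that are still reachable through the static list.
    static int retainedCount() {
        return PROCESSED.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handleLeaky(new byte[1024]);
            handleFixed(new byte[1024]);
        }
        System.out.println("buffers still reachable: " + retainedCount());
    }
}
```

In a heap profile, the leaky path would show up as a steadily growing number of live byte[] instances.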

Page 152: Java™ SE Performance Tuning


Memory Leak Profiling Tips: Tools

• Tools which help find memory leaks

> NetBeans Profiler

> VisualVM

> jmap / jhat

> Commercial offerings (not covered)

>JProbe Memory

>YourKit

>SAP Memory Analyzer

Page 153: Java™ SE Performance Tuning


Memory Leak Profiling Tips: Strategies

• NetBeans Profiler / VisualVM Strategies

> View live heap profiling results while application is running.

> Pay close attention to “Surviving Generations”.

>Surviving Generations is the number of different object ages for a given class.

>A Surviving Generations value that keeps increasing over a period of time can be a strong indicator of a source of a memory leak.

> Use Heap Walker to traverse object references
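To make the Surviving Generations definition concrete, the sketch below counts the number of different object ages for one class, assuming an object's age is the number of garbage collections it has survived. The age arrays are made-up illustrative data, not profiler output, and the class name is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; the profiler computes this internally.
class SurvivingGenerationsSketch {
    // ages[i] is the number of garbage collections instance i has survived.
    // The result is the number of distinct ages among the live instances.
    static int survivingGenerations(int[] ages) {
        Set<Integer> distinctAges = new HashSet<Integer>();
        for (int age : ages) {
            distinctAges.add(age);
        }
        return distinctAges.size();
    }

    public static void main(String[] args) {
        // Long-lived objects allocated at about the same time: few distinct ages.
        int[] stableCache = {40, 40, 40, 41};
        // Instances that keep being allocated and retained: many distinct ages.
        int[] suspectedLeak = {1, 5, 9, 14, 22, 31};
        System.out.println("stable cache:   " + survivingGenerations(stableCache));
        System.out.println("suspected leak: " + survivingGenerations(suspectedLeak));
    }
}
```

A class whose value stays flat over time is usually healthy; one whose value keeps climbing is the candidate to inspect with Heap Walker.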

Page 154: Java™ SE Performance Tuning


Module 4: Demo 5

Page 155: Java™ SE Performance Tuning


Memory Leak Profiling Tips: Strategies

• jmap / jhat Strategies

> Capture multiple heap profiles and compare footprints, (i.e. look for obvious memory usage increases).

> -XX:+HeapDumpOnOutOfMemoryError

>Use this JVM command line switch when launching application. Can be used with -XX:HeapDumpPath=<path>/<file>

> Use jhat's Object Query Language (OQL) to query for interesting state information

For example, you can query live HTTP requests using:

select s from com.sun.grizzly.ReadTask s where s.byteBuffer.position > 0

Page 156: Java™ SE Performance Tuning


Module 4: Demo 6

Page 157: Java™ SE Performance Tuning


Lock Contention Profiling Tips: Overview

• Use of Java™ synchronization can lead to highly contended locks.

• Observing high values of voluntary context switches can be an indication of lock contention.

• Collector / Analyzer is very good at identifying Java objects experiencing lock contention.

Page 158: Java™ SE Performance Tuning


Module 4: Demo 7

Page 159: Java™ SE Performance Tuning


Profiling Tips: Good Use Cases

• Collector / Analyzer

> CPU profiling entire application

> Sys cpu profiling or distinct usr vs sys profiling

> Lock contention profiling

> Integration with scripts, command files or batch files

> Also view performance of JVM internals including methods

> Want to see machine level assembly instructions

> Narrow to specific window of sampling

Page 160: Java™ SE Performance Tuning


Profiling Tips: Good Use Cases

• NetBeans Profiler

> Profiling subset of application, for CPU profiling or heap profiling

> Heap profiling

> Finding memory leaks

> Profiling an application using NetBeans IDE and/or NetBeans project

> Remote profiling

> Attach to running application

> View profiling as application is running

Page 161: Java™ SE Performance Tuning


Profiling Tips: Good Use Cases

• DTrace and DTrace scripts

> Non-intrusive snapshots of running application

> Command line utility

> Can leverage existing public scripts

>Heap profiling

>Finding memory leaks

>Monitor contention

>JIT Compilation

>Garbage collection activity

>Method entry / exit

>Java™ Native Interface entry / exit

Page 162: Java™ SE Performance Tuning


Profiling Tips: Good Use Cases

• jmap / jhat

> Heap profiling

> Finding memory leaks

> Simple command line utilities

> Quick & easy snapshots of running application

Page 163: Java™ SE Performance Tuning


Profiling Tips: Inlining Effect

• If you observe misleading or confusing results in cpu profiles, disable inlining.

• Methods of particular interest may be getting inlined, which leads to misleading observations.

• To disable inlining, add the following switch to the JVM command line: -XX:-Inline

> Note: disabling inlining may distort the “actual” performance profile

Page 164: Java™ SE Performance Tuning


Identifying Anti-patterns and Memory Leak Patterns

• Identifying anti-patterns in heap profiles

• Identifying memory leak patterns in heap profiles

• Identifying anti-patterns in method profiles

Page 165: Java™ SE Performance Tuning


Anti-patterns in Heap Profiles

• Large number of String or char[] allocations in heap profile

> Possible over allocation of String

> Possibly benefit from use of StringBuilder

> Possible StringBuilder or StringBuffer resizing.

> Possibly utilize ThreadLocal to cache char[] or StringBuilder or StringBuffer

• Reducing char[] and String allocations will likely reduce garbage collection frequency.
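The suggestions above can be sketched as follows. The class and method names are hypothetical, and the initial capacity of 256 is an arbitrary assumption. The first method shows the allocation-heavy pattern, the second presizes a StringBuilder to avoid resizing, and the third reuses one StringBuilder per thread through a ThreadLocal.

```java
// Hypothetical class for illustration only.
class StringBuildingSketch {
    // Anti-pattern: every pass through the loop creates a new String,
    // which shows up as many String and char[] allocations in a heap profile.
    static String joinWithConcat(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part + ",";
        }
        return result;
    }

    // Presized StringBuilder: the capacity is computed up front, so the
    // internal buffer never has to be expanded and copied.
    static String joinWithBuilder(String[] parts) {
        int capacity = 0;
        for (String part : parts) {
            capacity += part.length() + 1;
        }
        StringBuilder sb = new StringBuilder(capacity);
        for (String part : parts) {
            sb.append(part).append(',');
        }
        return sb.toString();
    }

    // One cached StringBuilder per thread; 256 is an assumed starting capacity.
    private static final ThreadLocal<StringBuilder> BUFFER =
        new ThreadLocal<StringBuilder>() {
            @Override
            protected StringBuilder initialValue() {
                return new StringBuilder(256);
            }
        };

    // Reuses the thread's buffer; only the final String is newly allocated
    // unless the text outgrows the cached capacity.
    static String joinWithCachedBuilder(String[] parts) {
        StringBuilder sb = BUFFER.get();
        sb.setLength(0); // clear the content, keep the backing storage
        for (String part : parts) {
            sb.append(part).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        System.out.println(joinWithConcat(parts));
        System.out.println(joinWithBuilder(parts));
        System.out.println(joinWithCachedBuilder(parts));
    }
}
```

All three methods return the same text; they differ only in how many temporary objects they allocate, which is what the heap profile would reveal.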

Page 166: Java™ SE Performance Tuning


Anti-patterns in Heap Profiles

• Observing StringBuffer in heap profile

> Possible candidate for StringBuilder if synchronized access is not required.

> Look to reduce the char[] allocations that occur when a StringBuilder/StringBuffer expands its size.

• Reducing char[] allocations will likely reduce garbage collection frequency.

Page 167: Java™ SE Performance Tuning


Anti-patterns in Heap Profiles

• Observing Hashtable in heap profile

> Possible candidate for HashMap if synchronized access is not required.

> Possible candidate for ConcurrentHashMap if synchronized access is required.

> Further partitioning of data stored in Hashtable may lead to finer grained synchronized access and less contention.
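As a sketch of the ConcurrentHashMap alternative, the hypothetical counter below is updated from several threads without a single map-wide lock. The class name, the "hit" key and the thread counts are assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class for illustration only.
class WordCountSketch {
    private final ConcurrentMap<String, AtomicInteger> counts =
        new ConcurrentHashMap<String, AtomicInteger>();

    // Thread-safe increment without locking the whole map.
    void record(String word) {
        AtomicInteger counter = counts.get(word);
        if (counter == null) {
            AtomicInteger fresh = new AtomicInteger();
            // putIfAbsent returns the existing counter if another thread won the race.
            counter = counts.putIfAbsent(word, fresh);
            if (counter == null) {
                counter = fresh;
            }
        }
        counter.incrementAndGet();
    }

    int count(String word) {
        AtomicInteger counter = counts.get(word);
        return counter == null ? 0 : counter.get();
    }

    // Starts the given number of threads, each recording the same word perThread times.
    static int runDemo(int threads, final int perThread) {
        final WordCountSketch sketch = new WordCountSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < perThread; j++) {
                        sketch.record("hit");
                    }
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sketch.count("hit");
    }

    public static void main(String[] args) {
        System.out.println("total = " + runDemo(4, 10000));
    }
}
```

Whether this outperforms a Hashtable depends on the workload, so it is worth measuring both with a realistic load.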

Page 168: Java™ SE Performance Tuning


Anti-patterns in Heap Profiles

• Observing Vector in heap profile

> Possible candidate for ArrayList if synchronized access is not required.

> If synchronized access required and depending on its use, consider using: LinkedBlockingDeque, ArrayBlockingQueue, ConcurrentLinkedQueue, LinkedBlockingQueue or PriorityBlockingQueue.
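When a Vector is used to hand work from one thread to another, a blocking queue is one possible replacement. The sketch below assumes a single producer and a single consumer; the class name and the queue capacity of 16 are illustrative choices.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical class for illustration only.
class HandoffSketch {
    // A producer thread puts the numbers 1..items on the queue;
    // the calling thread takes them off and returns their sum.
    static int transfer(final int items) {
        final BlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(16);
        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 1; i <= items; i++) {
                        queue.put(i); // blocks while the queue is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < items; i++) {
                sum += queue.take(); // blocks while the queue is empty
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("sum = " + transfer(100));
    }
}
```

The queue handles both the synchronization and the waiting, so no explicit synchronized blocks or wait / notify calls are needed.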

Page 169: Java™ SE Performance Tuning


How to Reduce Lock Contention

• Approaches to reduce lock contention

> Identify ways to partition the “guarded” data so that multiple locks can be applied at a finer grained level as a result of partitioning.

> Use a concurrent data structure first introduced in Java™ SE 5:

>java.util.concurrent package.

> If writes are much less frequent than reads:

>Separate read lock from write lock by using a Java SE 5 ReentrantReadWriteLock.
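A minimal sketch of the read lock / write lock separation, assuming a hypothetical read-mostly cache backed by a plain HashMap. Many readers can hold the read lock at the same time, while a writer gets exclusive access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical class for illustration only.
class ReadMostlyCacheSketch {
    private final Map<String, String> data = new HashMap<String, String>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Readers share the read lock, so concurrent lookups do not block each other.
    String get(String key) {
        lock.readLock().lock();
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers take the exclusive write lock, which waits for active readers to finish.
    void put(String key, String value) {
        lock.writeLock().lock();
        try {
            data.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReadMostlyCacheSketch cache = new ReadMostlyCacheSketch();
        cache.put("collector", "CMS");
        System.out.println(cache.get("collector"));
    }
}
```

This only pays off when reads clearly dominate; with frequent writes the extra bookkeeping of a read-write lock can cost more than a simple lock.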

Page 170: Java™ SE Performance Tuning


Concurrent Data Structures Versus Synchronized Collections: Tips

• Concurrent data structures might introduce additional cpu utilization overhead and might in some cases not perform as well as a synchronized Collection.

• Compare the approaches with meaningful workloads.

Page 171: Java™ SE Performance Tuning


Concurrent Data Structures Versus Synchronized Collections: Tips

• HotSpot JVM biased locking may also improve synchronized collection performance.

> -XX:+UseBiasedLocking introduced in JDK 5.0_06

> Improved in JDK 5.0_08

• Must be explicitly enabled in JDK 5 versions.

• Enabled by default in JDK 6 versions.

Page 172: Java™ SE Performance Tuning


Anti-patterns in Heap Profiles

• Exception object allocations

> Do not use exceptions for flow control

> Use alternative flow control such as if / then / else, or switch flow control

• Generating stack traces is an expensive operation
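A sketch of the difference, assuming a hypothetical helper that converts user input to a number with a fallback. The first version uses an exception as the normal "not a number" path; the second checks the input with if / else first. To keep the check simple, it only accepts non-negative decimal values of up to nine digits, which is an assumption of this example.

```java
// Hypothetical class for illustration only.
class FlowControlSketch {
    // Anti-pattern: every invalid input allocates an exception object
    // and fills in a stack trace just to select the fallback branch.
    static int parseWithException(String text, int fallback) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // Alternative: validate with ordinary flow control before parsing.
    // Accepts only 1 to 9 ASCII digits, so parseInt cannot fail or overflow.
    static int parseWithCheck(String text, int fallback) {
        if (text == null || text.length() == 0 || text.length() > 9) {
            return fallback;
        }
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < '0' || c > '9') {
                return fallback;
            }
        }
        return Integer.parseInt(text);
    }

    public static void main(String[] args) {
        System.out.println(parseWithException("42", -1));
        System.out.println(parseWithCheck("42", -1));
        System.out.println(parseWithCheck("not a number", -1));
    }
}
```

If invalid input is rare, the exception version is harmless; it becomes visible in a heap profile when the failing path is taken frequently.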

Page 173: Java™ SE Performance Tuning


Memory Leak Patterns in Heap Profiles

• Monitoring for trends illustrating increasing “surviving generations” while heap profiling when application is running indicates strong memory leak candidate.

> See section on Tools for Profiling.

• -XX:+HeapDumpOnOutOfMemoryError can be used to capture heap dumps when out of memory errors occur.

> Heap dumps can be analyzed by jhat, NetBeans Profiler or VisualVM.

Page 174: Java™ SE Performance Tuning


Anti-patterns in Method Profiles

• Observing Map.containsKey(key) in profile.

> The call is often redundant if null values are allowed in the Map but are never stored as valid values.

> Look at stack traces for unnecessary call flows which look like

if (map.containsKey(key))

value = map.get(key);

> value will be null if a key is not found via map.get(key), so the containsKey(key) call only adds a second lookup.

> containsKey(key) calls that guard other Map methods such as put(key, value) or remove(key) may potentially be eliminated too.
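The redundant call flow and its replacement can be sketched as below, assuming the application never stores null values in the map. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration only.
class MapLookupSketch {
    // Redundant flow: two hash lookups for every key that is present.
    static String lookupTwice(Map<String, String> map, String key) {
        String value = null;
        if (map.containsKey(key)) { // first lookup
            value = map.get(key);   // second lookup
        }
        return value;
    }

    // Single lookup: null means "absent", provided null values are never stored.
    static String lookupOnce(Map<String, String> map, String key) {
        return map.get(key);
    }

    // put returns the previous value (or null), so no containsKey guard is needed
    // to find out whether the key was already mapped.
    static boolean putAndReportReplaced(Map<String, String> map, String key, String value) {
        return map.put(key, value) != null;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();
        System.out.println(putAndReportReplaced(map, "gc", "serial"));
        System.out.println(putAndReportReplaced(map, "gc", "parallel"));
        System.out.println(lookupOnce(map, "gc"));
        System.out.println(lookupTwice(map, "missing"));
    }
}
```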

Page 175: Java™ SE Performance Tuning


Anti-patterns in Method Profiles

• Observing high sys cpu times.

> Look for monitor contention

>Monitor contention and high sys cpu time have a strong correlation.

>Consider alternatives to minimize monitor contention.

> Look for opportunities to minimize number of system calls.

>Example: read as much data as is ready to be read using non-blocking SocketChannels.

> Reduction in sys cpu time will likely lead to better application throughput and response time.

Page 176: Java™ SE Performance Tuning


Module 5: Tuning Garbage Collection

Page 177: Java™ SE Performance Tuning


Objectives

• Tune GC

• Select garbage collector that best fits application characteristics

• Interpret GC output

Page 178: Java™ SE Performance Tuning


What to Expect

• How to tune GC by setting GC generation sizes

• Compare different HotSpot garbage collectors

• Select HotSpot garbage collector based on application performance requirements

• Use tools to monitor GC and interpret the output

Page 179: Java™ SE Performance Tuning


Garbage Collectors: An Overview

• Garbage collection spares developers the memory management issues they must deal with in C / C++.

• Historically, garbage collection had often been the most common source attributable to poor Java application performance.

• It can still be a source of poor application performance. It just doesn't seem to get as much attention as it used to.

• To tune the JVM's garbage collectors, you need to understand the basics of how they work.

Page 180: Java™ SE Performance Tuning


Garbage Collectors: Sizing Java™ Heap Spaces

• -Xmx<n>m, max size of Java™ heap (young generation + tenured generation)

• -Xms<n>m, initial size of Java™ heap (young generation + tenured generation)

> Applications with emphasis on performance usually set -Xms and -Xmx to the same value.

> When -Xmx != -Xms, Java™ heap growth or shrinkage requires a full garbage collection

• -Xmn<n>m, size of young generation heap space

Page 181: Java™ SE Performance Tuning


Garbage Collectors: Sizing Java™ Heap Spaces

• -XX:NewSize=<n>m, initial size of young generation space

• -XX:MaxNewSize=<n>m, max size of young generation space

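• -XX:NewRatio=<n>, ratio of tenured generation space to young generation space (for example, -XX:NewRatio=2 makes the tenured generation twice the size of the young generation)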

• Applications with emphasis on performance tend to use -Xmn to size young generation since it combines use of -XX:MaxNewSize and -XX:NewSize
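The arithmetic behind these switches can be sketched as follows. HotSpot treats -XX:NewRatio=n as tenured size divided by young size, so the young generation is heap / (n + 1). The sketch ignores the alignment and rounding the JVM applies to the real sizes, and the class and method names are hypothetical.

```java
// Hypothetical helper for illustration; sizes are in megabytes.
class GenerationSizingSketch {
    // -XX:NewRatio=n means tenured = n * young, so young = heap / (n + 1).
    static long youngFromNewRatio(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    // With -Xmn the young generation is fixed; tenured is what remains of the heap.
    static long tenuredFromXmn(long heapMb, long xmnMb) {
        return heapMb - xmnMb;
    }

    public static void main(String[] args) {
        // Example: -Xms1536m -Xmx1536m -XX:NewRatio=2
        long young = youngFromNewRatio(1536, 2);
        System.out.println("young = " + young + "m, tenured = " + (1536 - young) + "m");
        // Example: -Xms1024m -Xmx1024m -Xmn256m
        System.out.println("tenured = " + tenuredFromXmn(1024, 256) + "m");
    }
}
```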

Page 182: Java™ SE Performance Tuning


Garbage Collectors: Sizing Java™ Heap Spaces

• -XX:PermSize=<n>m, initial size of permanent generation space

• -XX:MaxPermSize=<n>m, max size of permanent generation space

• Applications with emphasis on performance almost always explicitly set -XX:PermSize and -XX:MaxPermSize to the same value.

> Growing or shrinking permanent generation space requires a full garbage collection

Page 183: Java™ SE Performance Tuning


Garbage Collectors: Sizing Spaces

• Tools to help identify heap sizes:

> VisualVM's Monitor tab

> JConsole's Memory tab

> VisualGC's heap space sizes

> GCHisto

> jstat's gc options

> -verbose:gc's heap space sizes

> -XX:+PrintGCDetails heap space sizes
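The same heap space sizes can also be read from inside the application. The sketch below is an addition to the tool list above, not part of it: it uses the standard java.lang.management API to print each memory pool with its used, committed and max sizes. Pool names differ between collectors and JVM releases, so none are hard-coded; the class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical class for illustration only.
class HeapPoolsSketch {
    // Builds one line per memory pool reported by the running JVM.
    static String describePools() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            sb.append(pool.getName())
              .append(" [").append(pool.getType()).append("]")
              .append(" used=").append(usage.getUsed() / 1024).append("K")
              .append(" committed=").append(usage.getCommitted() / 1024).append("K");
            // getMax() returns -1 when the maximum is undefined.
            if (usage.getMax() >= 0) {
                sb.append(" max=").append(usage.getMax() / 1024).append("K");
            } else {
                sb.append(" max=undefined");
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describePools());
    }
}
```

Running it with different -Xms, -Xmx and -Xmn values is a quick way to confirm that the switches took effect.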

Page 184: Java™ SE Performance Tuning


Module 5: Demo 1

Page 185: Java™ SE Performance Tuning


Garbage Collectors: The Choices

• Application throughput versus responsiveness

> Throughput

Emphasis on how much work can be done in a given interval of time with no concern of application pauses due to garbage collection

> Responsiveness

Emphasis on maintaining an elapsed time limit in which the application must respond to interactions and / or stimuli, or on limiting the number of times that time limit can be exceeded per some interval of time

Page 186: Java™ SE Performance Tuning


Garbage Collectors: The Choices

• Choice of garbage collectors

> Serial collector

> Throughput collector

> Concurrent collector

> Low pause concurrent collector

• Note: For pause sensitive applications, if the serial collector (for small Java™ heaps), concurrent collector, or low pause concurrent collector are not able to meet application pause time requirements, consider using Java™ Real Time System (Java™ RTS) as an alternative.

Page 187: Java™ SE Performance Tuning


Garbage Collectors: Serial Collector

• An introduction

> Enabled with -XX:+UseSerialGC

> Single threaded young generation collector (stops all application threads)

> Single threaded tenured generation collector (stops all application threads)

• Suitability

> Well suited for single processor core machines

> Well suited for configurations of one-to-one JVM to processor core configuration

> Tends to work well for applications with small Java™ heaps, i.e. less than 128mb, may work well up to 256mb

Page 188: Java™ SE Performance Tuning


Garbage Collectors: Serial Collector

• Comparing serial and throughput collectors

> Serial collector can work better on applications with small young generation heaps versus parallel throughput collector.

>Throughput collector's parallel GC threads may compete for work in small young generation heaps resulting in thrashing.

> For small Java™ heaps, or small young generation Java™ heaps, try both, serial collector and parallel throughput collector.

Page 189: Java™ SE Performance Tuning


Garbage Collectors: Serial Collector

• Events which initiate a serial collector garbage collection

> Eden space is unable to satisfy an object allocation request. Results in a minor garbage collection event.

> Tenured generation space is unable to satisfy an object promotion coming from young generation.

> An explicit invocation or call to System.gc().

Page 190: Java™ SE Performance Tuning


Garbage Collectors: Serial Collector

• Good throughput performance can be realized with the serial collector if:

> Well tuned Java™ heap spaces on Java™ applications with small Java™ heaps, (i.e. smaller than 100mb heaps)

> Target platform has small number of virtual processors

> The number of JVMs deployed on a multi-core processor platform equals the number of multi-core processors

>Bind JVMs to processors or processor sets for best results

Page 191: Java™ SE Performance Tuning


Garbage Collectors: Throughput Collector

• An introduction

> Multi-threaded young generation space collectors enabled with -XX:+UseParallelGC

> JDK 5.0_06 introduced multi-threaded tenured generation space collector enabled with -XX:+UseParallelOldGC (also enables -XX:+UseParallelGC)

• Suitability

> Can significantly reduce garbage collection overhead on multi-core processor systems

> Well suited for applications running on multi-processor or multi-core systems

Page 192: Java™ SE Performance Tuning


Garbage Collectors: Throughput Collector

• Managing collector threads

> Number of parallel throughput collector threads is controlled by -XX:ParallelGCThreads=<N>

>Defaults to Runtime.availableProcessors(). As of a JDK 6 update release, the default is reduced to roughly 5/8ths of the available processors when there are more than 8.

>In multiple JVM per machine configurations, setting -XX:ParallelGCThreads=<N> lower will likely yield better results

>Reducing number of parallel gc threads can also reduce fragmentation effect in promotion buffers in tenured space.
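A rough sketch of the default thread count follows. It assumes the heuristic used by later HotSpot releases, where all processors are used up to 8 and 5/8 of the processors beyond 8 are added; the exact default varies by release and platform, so treat the formula as an approximation. The class and method names are hypothetical.

```java
// Hypothetical helper for illustration; not a JVM API.
class GcThreadsSketch {
    // Assumed heuristic: n threads for n <= 8 processors,
    // otherwise 8 plus 5/8 of the processors beyond the first 8.
    static int defaultParallelGcThreads(int processors) {
        if (processors <= 8) {
            return processors;
        }
        return 8 + ((processors - 8) * 5) / 8;
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println("processors = " + processors
            + ", estimated default GC threads = " + defaultParallelGcThreads(processors));
    }
}
```

When several JVMs share one machine, this estimate is a starting point for choosing a lower explicit -XX:ParallelGCThreads=<N> per JVM.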

Page 193: Java™ SE Performance Tuning


Garbage Collectors: Throughput Collector

• Multiple JVM strategies

> Bind JVMs to processor sets

>Works well for JVMs which tend to have equal load, since idle processors inside one processor set are not available to JVMs outside that processor set

>Note: Runtime.availableProcessors() reports number of processors in processor set

> Create zones and run a JVM (or possibly more than one JVM) per zone

Page 194: Java™ SE Performance Tuning


Garbage Collectors: Throughput Collector

• Events which initiate a minor garbage collection

> Eden space is unable to satisfy an object allocation request. Results in a minor garbage collection event.

• Events which might initiate a full garbage collection

> Tenured generation space unable to satisfy an object promotion coming from young generation.

> An explicit invocation or call to System.gc().

Page 195: Java™ SE Performance Tuning


Garbage Collectors: Throughput Collector

• Good throughput performance can be realized with the throughput collector if:

> Pause time requirements are less important than throughput

> Java™ heap spaces are well tuned

> Application runs on multi-core system

>Bind JVMs to processor sets in multiple JVM configurations for best results

>Consider -XX:+BindGCTaskThreadsToCPUs or -XX:+AggressiveOpts when scaling across multiple cores

> Young and tenured generations both use multi-threaded collectors, that is using -XX:+UseParallelOldGC and -XX:+UseParallelGC

Page 196: Java™ SE Performance Tuning


Garbage Collectors: Throughput Collector

• Good throughput performance can be realized with the throughput collector if:

> -XX:+UseParallelGC, the multi-threaded young generation collector, is used and the young generation heap is sized so that only long lived objects are tenured to old generation and Full GC events can be avoided.

> If Full GC events cannot be avoided, a multi-threaded old generation collector can reduce length of Full GC events, -XX:+UseParallelOldGC.

Note: -XX:+UseParallelOldGC also enables -XX:+UseParallelGC

Page 197: Java™ SE Performance Tuning


Garbage Collectors: Concurrent Collector

• An Introduction

> Single threaded tenured space collector which runs mostly concurrently with Java™ application threads and is enabled with -XX:+UseConcMarkSweepGC

> Parallel, multi-threaded young generation collector enabled by default.

• Suitability

> Well suited when application responsiveness is more important than application throughput

> Consider using Concurrent incremental choice for applications with very sensitive pause time requirements

Page 198: Java™ SE Performance Tuning


Garbage Collectors: Concurrent Collector

• Issues to be aware of

> Cost of the concurrent collections is the additional overhead of more memory and CPU cycles

> Concurrent mode failure can occur when objects are copied to the tenured space faster than the concurrent collector can collect them. (“loses the race”)

> Concurrent mode failure can also occur from tenured space fragmentation.

> Corrective action by the JVM is to perform a full garbage collection which will block all Java application threads.

Page 199: Java™ SE Performance Tuning


Garbage Collectors: Concurrent Collector

• Concurrent collector cycle contains the following phases

> Initial mark

> Concurrent mark

> Remark

> Concurrent sweep

> Concurrent reset

• During a concurrent collector cycle, a Java™ application is paused during the 'initial mark' and 'remark' phases.

Page 200: Java™ SE Performance Tuning


Garbage Collectors: Concurrent Collector

• Initial mark phase

> Objects in the tenured generation are “marked” as reachable including those objects which may be reachable from young generation.

> Pause time is typically short in duration relative to minor collection pause times.

• Concurrent mark phase

> Traverses the tenured generation object graph for reachable objects concurrently while Java™ application threads are executing.

Page 201: Java™ SE Performance Tuning


Garbage Collectors: Concurrent Collector

• Remark

> Finds objects that were missed by the concurrent mark phase due to updates by Java™ application threads to objects after the concurrent collector had finished tracing that object.

• Concurrent sweep

> Collects the objects identified as unreachable during marking phases.

• Concurrent reset

> Prepares for next concurrent collection.

Page 202: Java™ SE Performance Tuning


Garbage Collectors: Concurrent Collector

• Additional heap space sizing considerations

> Size young generation as large as possible with the goal to only tenure long lived objects to old generation space and stay within pause time requirements.

> If it is difficult to meet pause time requirements and CPU is available, try sizing young generation spaces smaller (eden and survivor spaces).

> Realize, sizing young gen smaller will put more pressure on the old generation concurrent collector since objects will be tenured to old generation at a faster rate. And, old generation space fragmentation may increase too.

Page 203: Java™ SE Performance Tuning


Garbage Collectors: Concurrent Collector

• Additional heap space sizing considerations

> As much as 20% additional tenured generation space may be required for floating garbage

> Floating garbage consists of objects which are found to be reachable by the concurrent garbage collector which may become unreachable by the time a concurrent garbage collection cycle finishes.

> If unable to meet pause time requirements and cannot avoid Full GC events, consider heap profiling to reduce object allocations.

Page 204: Java™ SE Performance Tuning


Garbage Collectors: Concurrent Collector

• Additional heap space sizing considerations

> Favor tuning survivor spaces rather than -XX:CMSInitiatingOccupancyFraction=# to reduce floating garbage and remark phase times.

> To date, classes will not by default be unloaded from permanent generation when using the concurrent collector unless explicitly instructed to do so using -XX:+CMSClassUnloadingEnabled and -XX:+PermGenSweepingEnabled, (the 2nd switch is not needed in post HotSpot 6.0u4 JVMs).

Page 205: Java™ SE Performance Tuning


Garbage Collectors: Concurrent Collector

• Additional heap space sizing considerations

> If relying on explicit GC and want them to be concurrent, use:

-XX:+ExplicitGCInvokesConcurrent (requires JDK 1.6.0 or later)

-XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses for permanent generation concurrent collections and for class unloading, (requires JDK 1.6.0u4 or later).

Page 206: Java™ SE Performance Tuning


Garbage Collectors: Concurrent Collector

• Additional heap space sizing considerations

> For small Java™ heaps, pause time requirements may be achieved using the Serial collector.

>Consider the Serial collector for Java heaps up to 128mb and possibly up to 256mb.

>Serial collector is easier to tune than CMS.

>What is learned from tuning the Serial collector is useful for initial CMS tuning.

Page 207: Java™ SE Performance Tuning


Garbage Collectors: Concurrent Collector

• Events initiating concurrent collection cycle

> Ideally, the cycle needs to start early enough so that the collection finishes before tenured space becomes full.

Because full garbage collections are expensive, the concurrent collector estimates the time remaining until tenured space will be used up and the amount of time needed to complete a concurrent collection cycle based on recent history.

Page 208: Java™ SE Performance Tuning


Garbage Collectors: Concurrent Collector

• Events initiating concurrent collection cycle (continued)

> Concurrent collector will also start if the occupancy of the tenured space exceeds an initiating occupancy percentage threshold. Default value is 92% and may change from release to release. Tune -XX:CMSInitiatingOccupancyFraction=n where n is the % of the tenured space size.

• Note: Minor collection events occur as they do with throughput and serial collectors.

Page 209: Java™ SE Performance Tuning


Garbage Collectors: Concurrent Collector

• Good responsiveness performance can be realized with the concurrent collector if:

> Pause time requirements are more important than throughput

> Java™ heap spaces are well tuned

>Will likely require larger tenured heap space sizing than other collectors due to heap fragmentation and floating garbage

> A multi-threaded young generation collector, -XX:+UseParNewGC, is enabled by default with the concurrent collector

Page 210: Java™ SE Performance Tuning


Garbage Collectors: Concurrent Collector

• CMSIncrementalMode is an additional concurrent alternative for applications not able to meet pause time requirements with CMS and not wanting to use Java™ RTS.

> CMSIncrementalMode enables the concurrent phases to be done incrementally.

> Periodically gives additional processor back to the application resulting in better application responsiveness by doing the concurrent work in small chunks.

Page 211: Java™ SE Performance Tuning


Garbage Collectors: Concurrent Collector

• How to tune CMSIncremental mode

> CMSIncrementalMode has a duty cycle that controls the amount of work the concurrent collector is allowed to do before giving up the processor.

> Duty cycle is the % of time between minor collections the concurrent collector is allowed to run.

> Duty cycle by default is automatically computed using what's called automatic pacing.

> Both duty cycle and pacing can be fine tuned.

Page 212: Java™ SE Performance Tuning


Garbage Collectors: Concurrent Collector

• How to enable CMSIncremental mode

> On JDK 6, recommend using the following two switches together:

>-XX:+UseConcMarkSweepGC and

>-XX:+CMSIncrementalMode

> Or use:

>-Xincgc

Page 213: Java™ SE Performance Tuning


Garbage Collectors: Concurrent Collector

• How to enable CMSIncremental mode

> On JDK 5 use all of the following switches together:

>-XX:+UseConcMarkSweepGC

>-XX:+CMSIncrementalMode

>-XX:+CMSIncrementalPacing

>-XX:CMSIncrementalDutyCycleMin=0

>-XX:CMSIncrementalDutyCycle=10

> JDK 5 settings mirror the default settings decided upon for JDK 6.

> On JDK 5, -Xincgc is not equivalent to CMSIncrementalMode; it only enables CMS.

Garbage Collectors: Concurrent Collector

• Fine tuning CMSIncremental mode

> If full collections are still occurring, then:

>Increase the safety factor using -XX:CMSIncrementalSafetyFactor=n. The default value is 10. Increasing the safety factor adds conservatism when computing the duty cycle.

>Increase the minimum duty cycle using -XX:CMSIncrementalDutyCycleMin=n. The default is 0 in JDK 6, 10 in JDK 5.

>Disable automatic pacing and use a fixed duty cycle using -XX:-CMSIncrementalPacing and -XX:CMSIncrementalDutyCycle=n. The default duty cycle is 10 in JDK 6, 50 in JDK 5.

Garbage Collectors: Permanent Generation

• Why tune the permanent generation?

> Some Java™ applications require fine tuning of the permanent generation space.

> Applications that dynamically generate and load many classes, as commonly seen in web container or application server implementations (especially those using JSPs), need a larger permanent generation space than the default maximum size of 64 MB.

Garbage Collectors: Permanent Generation

• Tuning the permanent generation

> Use JConsole, VisualVM or jstat to observe the sizing behavior of permanent generation.

> Use -XX:MaxPermSize=<n> to increase the maximum size. Also consider setting -XX:PermSize=<n> to the same value to avoid performance overhead of permanent generation space expansion.

> The concurrent collector can be told to collect the permanent generation by using -XX:+CMSClassUnloadingEnabled and -XX:+CMSPermGenSweepingEnabled (not required for HotSpot Java™ 6 JVMs).
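
The same sizing information that JConsole or jstat display can also be read from inside the application. The sketch below lists the non-heap memory pools through the standard java.lang.management API. The class name is hypothetical, and the pool names are chosen by the JVM: on the JVMs this course covers the permanent generation appears as one of these pools, while other JVM versions may report a different set.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper that prints the non-heap pools of the running JVM.
class NonHeapPools {
    // One line per non-heap pool: name, used size and maximum size.
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when no maximum is defined for the pool.
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / 1024) + "K";
            sb.append(pool.getName())
              .append(": used=").append(usage.getUsed() / 1024).append("K")
              .append(" max=").append(max)
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```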

Garbage Collectors: Explicit GC

• Do not use System.gc() unless there is a specific use case or need to.

• Explicit invocations can be disabled using -XX:+DisableExplicitGC.

• A common use of explicit GC is RMI distributed garbage collection (dgc).

• -XX:+DisableExplicitGC will disable RMI dgc. Consider tuning RMI dgc if needed rather than disabling explicit GC.

Garbage Collectors: Explicit GC

• Default RMI distributed GC interval is once per minute (60000 ms).

> Use -Dsun.rmi.dgc.client.gcInterval and -Dsun.rmi.dgc.server.gcInterval to change. Max is Long.MAX_VALUE.

> When using JDK 6 and the Concurrent collector also use -XX:+ExplicitGCInvokesConcurrent

Garbage Collectors: Reference Objects

• Got WeakReference, SoftReference or PhantomReference?

• Once a reference object is discovered by the garbage collector, it gets queued for reference processing which can extend the lifetime of a reference object until the reference processing is completed for that reference object.

• If there are lots of reference objects, the number of reference processing threads can have an impact on the latency of retiring the reference objects.

Garbage Collectors: Reference Objects

• In addition, lots of reference objects also give the garbage collector more work to do since unreachable reference objects need to be discovered and queued during garbage collection.

• Reference object processing can extend the time it takes to perform garbage collections, especially if there are consistently many unreachable reference objects to process.
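
The queueing step described above can be observed with a ReferenceQueue. In the sketch below the reference is cleared and enqueued by hand so that the outcome does not depend on when a garbage collection runs; in a real application the collector performs both steps after the referent becomes unreachable. The class name is hypothetical.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical demonstration of a reference object moving onto its queue.
class ReferenceQueueSketch {
    // Returns true if the reference is found on its queue after being enqueued.
    static boolean enqueueAndPoll() {
        ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<Object>(referent, queue);

        // While a strong reference exists, the referent stays reachable.
        if (ref.get() != referent) {
            return false;
        }

        // Stand-in for the collector's work: clear the reference, then queue it
        // for processing. Done manually here to keep the sketch deterministic.
        ref.clear();
        ref.enqueue();

        Reference<? extends Object> polled = queue.poll();
        return polled == ref && ref.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("reference queued: " + enqueueAndPoll());
    }
}
```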

Garbage Collectors: Soft References

• Soft references are kept alive longer in the HotSpot Server JVM.

> Use -XX:SoftRefLRUPolicyMSPerMB=<n> to control the clearing rate; the default is 1000 ms.

> This specifies the number of ms a soft reference will be kept alive for each megabyte of free heap space after it is no longer strongly reachable.

> Keep in mind soft references are cleared only during garbage collection, which may not occur as frequently as the value set for SoftRefLRUPolicyMSPerMB.
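
A small worked example of the rule above: the keep-alive time is the per-megabyte value multiplied by the free heap in megabytes. The class and method names are hypothetical, and the sketch only restates the slide's arithmetic; the JVM applies the policy at collection time, not on a timer.

```java
// Hypothetical helper: restates the SoftRefLRUPolicyMSPerMB arithmetic.
class SoftRefLifetime {
    // msPerMb:    value of -XX:SoftRefLRUPolicyMSPerMB
    // freeHeapMb: free heap space in megabytes
    static long keepAliveMillis(long msPerMb, long freeHeapMb) {
        return msPerMb * freeHeapMb;
    }

    public static void main(String[] args) {
        // Default of 1000 ms per MB with 100 MB free: 100000 ms, or 100 seconds.
        System.out.println(keepAliveMillis(1000, 100));
    }
}
```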

Garbage Collectors: Finalizers

• Use of finalizers can create performance issues.

> Relying on garbage collection to manage resources other than memory is not a good idea.

> For example, do not rely on a finalizer to close file descriptors.

> Try to limit the use of a finalizer to a safety net; use other mechanisms for releasing resources.

> If using a finalizer cannot be avoided, try to keep the work being done as small as possible.
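
One of the "other mechanisms" is an explicit close method called from a finally block, so the resource is released at a known point instead of whenever a finalizer happens to run. The Handle class below is a hypothetical stand-in for something that wraps a scarce resource such as a file descriptor.

```java
import java.io.Closeable;

// Hypothetical sketch of deterministic resource release without a finalizer.
class ExplicitRelease {
    // Stand-in for a class that owns a scarce resource.
    static class Handle implements Closeable {
        private boolean open = true;

        boolean isOpen() {
            return open;
        }

        int read() {
            if (!open) {
                throw new IllegalStateException("handle is closed");
            }
            return 42; // placeholder for real I/O
        }

        // Cheap and safe to call more than once.
        public void close() {
            open = false;
        }
    }

    // Uses the handle and returns whether it is still open afterwards.
    static boolean useAndRelease() {
        Handle handle = new Handle();
        try {
            handle.read();
        } finally {
            handle.close(); // released here, not at some later garbage collection
        }
        return handle.isOpen();
    }

    public static void main(String[] args) {
        System.out.println("still open: " + useAndRelease());
    }
}
```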

Garbage Collectors: GC Output

• Understanding -XX:+PrintGCDetails using serial collector

Minor collection:

> [GC [DefNew: 960K->64K(960K), 0.0047410 secs] 3950K->3478K(5056K), 0.0047900 secs]

> DefNew is young generation space.

> Old generation space not collected.

> Permanent generation space not collected.
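
To make the fields of the minor collection line above concrete, the following sketch pulls them apart with a regular expression. The class name and field names are hypothetical, and the pattern is written only for the exact line format shown on this slide; other collectors and JVM versions print different layouts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the serial collector's minor collection line.
class SerialMinorGcLine {
    private static final Pattern LINE = Pattern.compile(
        "\\[GC \\[(\\w+): (\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\] "
        + "(\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");

    final String youngName;     // for example DefNew
    final long youngBeforeK;    // young generation occupancy before collection
    final long youngAfterK;     // young generation occupancy after collection
    final long youngCapacityK;  // young generation size
    final long heapBeforeK;     // whole heap occupancy before collection
    final long heapAfterK;      // whole heap occupancy after collection
    final long heapCapacityK;   // whole heap size
    final double pauseSecs;     // total time reported at the end of the line

    private SerialMinorGcLine(Matcher m) {
        youngName = m.group(1);
        youngBeforeK = Long.parseLong(m.group(2));
        youngAfterK = Long.parseLong(m.group(3));
        youngCapacityK = Long.parseLong(m.group(4));
        heapBeforeK = Long.parseLong(m.group(6));
        heapAfterK = Long.parseLong(m.group(7));
        heapCapacityK = Long.parseLong(m.group(8));
        pauseSecs = Double.parseDouble(m.group(9));
    }

    static SerialMinorGcLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a serial minor GC line: " + line);
        }
        return new SerialMinorGcLine(m);
    }

    public static void main(String[] args) {
        SerialMinorGcLine gc = parse(
            "[GC [DefNew: 960K->64K(960K), 0.0047410 secs] 3950K->3478K(5056K), 0.0047900 secs]");
        System.out.println(gc.youngName + " freed " + (gc.youngBeforeK - gc.youngAfterK)
            + "K in " + gc.pauseSecs + " secs");
    }
}
```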

Garbage Collectors: GC Output

• Understanding -XX:+PrintGCDetails using serial collector (continued)

Full collection:

> [Full GC [Tenured: 4652K->4660K(8384K), 0.0844300 secs] 4962K->4660K(9344K), [Perm : 1416K->1416K(12288K)], 0.0845060 secs]

> Tenured is old generation space.

> Perm is permanent generation space.

> Young generation stats reported.

Garbage Collectors: GC Output

• Understanding -XX:+PrintGCDetails using throughput collector

Minor collection:

> [GC [PSYoungGen: 13737K->1978K(14080K)] 17407K->6303K(43840K), 0.2144150 secs]

> Collected 13737K minus 1978K (11759K) of space in a 14080K young generation space.

> 43840K overall heap size.

> Took .2144150 seconds to perform collection.
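
The numbers in the minor collection line above also allow an estimate of how much data was promoted to the old generation. The sketch assumes the old generation was not collected during the minor collection, so any growth in old generation occupancy is promoted data. The class and method names are hypothetical.

```java
// Hypothetical helper: estimates promotion from one minor collection line.
class PromotionEstimate {
    // Old generation occupancy is whole-heap occupancy minus young occupancy.
    // Its growth across a minor collection approximates the promoted volume.
    static long promotedK(long youngBeforeK, long youngAfterK,
                          long heapBeforeK, long heapAfterK) {
        long oldBeforeK = heapBeforeK - youngBeforeK;
        long oldAfterK = heapAfterK - youngAfterK;
        return oldAfterK - oldBeforeK;
    }

    public static void main(String[] args) {
        // [PSYoungGen: 13737K->1978K(14080K)] 17407K->6303K(43840K)
        // Old generation grew from 3670K to 4325K, so about 655K was promoted.
        System.out.println(promotedK(13737, 1978, 17407, 6303) + "K promoted");
    }
}
```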

Garbage Collectors: GC Output

• Understanding -XX:+PrintGCDetails using throughput collector

Full collection:

> [Full GC [PSYoungGen: 32K->0K(12800K)] [PSOldGen: 5180K->5181K(29760K)] 5212K->5181K(42560K) [PSPermGen: 10974K->10974K(24832K)], 0.0923850 secs]

> PSYoungGen is young generation space

> PSOldGen is old generation space

> PSPermGen is permanent generation space

Garbage Collectors: GC Output

• Understanding -XX:+PrintGCDetails using concurrent collector

> Minor collections follow serial collector format.

> [Full GC [CMS: 5994K->5992K(49152K), 0.2584730 secs] 6834K->5992K(63936K), [CMS Perm : 10971K->10971K(18404K)], 0.2586030 secs]

> Note, CMS Perm indicates concurrent mark sweep collection activity in permanent generation space.

Garbage Collectors: GC Output

• Understanding -XX:+PrintGCDetails using concurrent collector

> [GC [1 CMS-initial-mark: 13991K(20288K)] 14103K(22400K), 0.0023781 secs]

> CMS-initial-mark indicates the start of a concurrent collection cycle.

> [CMS-concurrent-mark: 0.267/0.374 secs]

> CMS-concurrent-mark indicates the end of the concurrent marking phase.

Garbage Collectors: GC Output

• Understanding -XX:+PrintGCDetails using concurrent collector

> [CMS-concurrent-preclean: 0.044/0.064 secs]

> CMS-concurrent-preclean indicates work done concurrently in preparation for the remark phase.

> [GC [1 CMS-remark: 16090K(20288K)] 17242K(22400K), 0.0210460 secs]

> CMS-remark indicates remarking work

> Note, a minor collection occurred concurrently with CMS-remark.

Garbage Collectors: GC Output

• Understanding -XX:+PrintGCDetails using concurrent collector

> [CMS-concurrent-sweep: 0.291/0.662 secs]

> CMS-concurrent-sweep indicates the end of the concurrent sweeping phase.

> [CMS-concurrent-reset: 0.016/0.016 secs]

> CMS-concurrent-reset indicates work done to prepare for the next collection cycle.
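
The concurrent phase lines above share one layout, so a single pattern can extract the phase name and the two timings. The slides do not say what the two numbers mean; they are commonly read as CPU time followed by elapsed time, which is an assumption here, so the sketch keeps neutral field names. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for lines such as [CMS-concurrent-mark: 0.267/0.374 secs].
class CmsPhaseLine {
    private static final Pattern LINE =
        Pattern.compile("\\[CMS-concurrent-([a-z-]+): ([\\d.]+)/([\\d.]+) secs\\]");

    final String phase;       // mark, preclean, sweep or reset
    final double firstSecs;   // number before the slash
    final double secondSecs;  // number after the slash

    private CmsPhaseLine(String phase, double firstSecs, double secondSecs) {
        this.phase = phase;
        this.firstSecs = firstSecs;
        this.secondSecs = secondSecs;
    }

    static CmsPhaseLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a CMS concurrent phase line: " + line);
        }
        return new CmsPhaseLine(m.group(1),
            Double.parseDouble(m.group(2)), Double.parseDouble(m.group(3)));
    }

    public static void main(String[] args) {
        CmsPhaseLine p = parse("[CMS-concurrent-mark: 0.267/0.374 secs]");
        System.out.println(p.phase + " " + p.firstSecs + " " + p.secondSecs);
    }
}
```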

Garbage Collectors: PrintGCStats

• PrintGCStats tool summarizes statistics from GC activity

> Note: applicable only to 32-bit JVMs

• Script download location: http://java.sun.com/developer/technicalArticles/Programming/turbo/PrintGCStats.zip

• Usage: PrintGCStats -v ncpu=<n> [-v interval=<seconds>] [-v verbose=1] <gc log file>

> ncpu is number of cpus on the target machine

> interval requires use of command line switch -XX:+PrintGCTimeStamps and reports statistics at each 'interval'

> verbose provides more detailed output

Garbage Collectors: PrintGCStats

Module 5: Demo 2

Garbage Collectors: GCHisto

• Currently standalone GUI, VisualVM plug-in under development.

• Open Source project, http://gchisto.dev.java.net

• Not included in HotSpot JDK, Separate

• Graphical tool which summarizes GC activity obtained from GC logs

• Allows comparison of JVM tuning, such as heap sizes or collector types by comparing GC logs.

Module 5: Demo 3

Garbage Collectors: More Information

• More detailed information:

> Jon Masamitsu's blog: http://blogs.sun.com/jonthecollector

> Java SE Performance on java.sun.com: http://java.sun.com/javase/technologies/performance.jsp

> Java SE Memory Management http://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf

Module 6: Examining and Managing the JIT Compiler

Objectives

• Examine the client and the server HotSpot compilers

• Examine optimization strategies used by the HotSpot compiler

• Interpret HotSpot compiler behavior

• Create effective micro-benchmarks

What to Expect

• Overview of HotSpot JIT compiler choices

• Examples of the optimizations performed by the JIT compiler

• How to understand what the JIT compiler is doing

• Pitfalls of micro-benchmarks and how to identify problems with micro-benchmarks

HotSpot JIT Compilers: An Introduction

• The “magic” of HotSpot: it generates native code and makes optimization decisions based on what it observes while an application is executing.

• As of JDK 6, two flavors:

> HotSpot Client, -client

> HotSpot Server, -server

> Ergonomics will choose automatically if one is not explicitly specified.

• Tiered compilation in development

HotSpot JIT Compilers: An Introduction

• Default operation

> Application code is initially run in the interpreter. After reaching a compile threshold, “hot” methods or loops are compiled into native code and executed that way (no longer executed in the interpreter).

• You can override with extremes:

> -Xint, compile nothing, interpret only

> -Xcomp, compile everything from the beginning

> -Xint / -Xcomp (do not do either !!!)
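
The default behavior can be observed from inside a program by running a small method often enough to cross a compile threshold and then asking the JVM how much time it has spent compiling. The sketch uses the standard CompilationMXBean; the class and method names are hypothetical, and the reported time depends entirely on the JVM, so no particular value should be expected. Running the same program with -Xint would be expected to show no compilation activity.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch: make a method "hot" and read the JIT's compile time.
class JitActivity {
    static long square(long x) {
        return x * x;
    }

    // Calls square() many times so it becomes a candidate for compilation.
    static long hotLoop(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += square(i % 1000);
        }
        return sum;
    }

    // Total compilation time in ms, or -1 if the JVM does not report it.
    static long compileMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        long before = compileMillis();
        long result = hotLoop(5000000);
        long after = compileMillis();
        System.out.println("result=" + result
            + " compile ms before=" + before + " after=" + after);
    }
}
```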

JIT Compilers: HotSpot Client

• Always the default on Windows

• Only choice in Windows HotSpot JRE

• Focused on startup and footprint performance

• Simple, quick optimizations

• Optimization decisions made early, 1500 iterations

• Ergonomics default on Linux and Solaris platforms with less than 2G of RAM on JDK / JRE 5 and later

• Default on Windows platforms regardless of number of processors, processor cores or amount of RAM

JIT Compilers: HotSpot Server

• Available in all HotSpot JDKs

• Available in all HotSpot JREs except on Windows (there is no HotSpot Server JRE for Windows)

• Focused on long running applications with little sensitivity towards startup

• Sophisticated optimization decisions

• Optimization decisions delayed, 10,000 iterations (15,000 in some cases)

JIT Compilers: HotSpot Server

• Improvement over -client can be as much as 2x once application has reached steady state (JIT / dynamic compiled)

• Ergonomics default on Linux and Solaris platforms with 2G or more of RAM on JDK / JRE 5 and later

JIT Compilers: HotSpot Tiered Compiler

• Tiered compilation

> Experimental in JDK 6 and beyond

> Integrates the best of -client and -server

> Early testing has not shown what was hoped for

> It is evolving and changing. Acceptable to experiment with on non-production, non-critical applications.

JIT Compilers: Optimizations

• Escape Analysis (JDK 6, but limited)

> -XX:+DoEscapeAnalysis

> Some optimizations available in JDK 6u5p

> Further enhancements coming in a JDK 6 update release

• Autobox Elision (JDK 6u4p)

> -XX:AutoBoxCacheMax=n

> Only available with -server JVMs

• Synchronization (JDK 6)

> -XX:+UseBiasedLocking

> -XX:+UseSpinning

JIT Compilers: Escape Analysis Explained

• An object is said to escape the thread that allocated it if some other thread can ever see it

• If an object does not escape, it is possible to perform:

> Object explosion: allocate an object's fields in different places

> Scalar replacement: store scalar fields in registers

> Thread stack allocation: store fields in stack frame

> Eliminate synchronization

> Eliminate GC read / write barriers

• Memory system pressure reduced, possibly eliminated
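
The difference between an escaping and a non-escaping object is easiest to see in code. In the sketch below the Point created in sumOfSquares is never stored in a field, returned or handed to other code, which makes it a candidate for the optimizations listed above; the Point created in publish is stored in a static field where other threads can see it, so it escapes. Whether a given JVM actually applies scalar replacement here is not guaranteed. All names are hypothetical.

```java
// Hypothetical sketch contrasting a non-escaping and an escaping object.
class EscapeSketch {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // The Point never leaves this method, so it does not escape.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            total += (long) p.x * p.x + (long) p.y * p.y;
        }
        return total;
    }

    // Visible to every thread once assigned.
    static Point shared;

    // The Point is published through a static field, so it escapes.
    static void publish(int x, int y) {
        shared = new Point(x, y);
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(3));
        publish(1, 2);
        System.out.println(shared.x + "," + shared.y);
    }
}
```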

JIT Compilers: Autobox Elision Explained

• Generic collection classes introduce hidden promotion of scalars to objects, also known as “auto-boxing”

> HashMap.get(5) is transformed by javac to HashMap.get(Integer.valueOf(5))

• Smart JIT compilers (that is, -server) can eliminate the object allocation entirely and/or use the scalar value instead of accessing the object once it has been allocated

JIT Compilers: Autobox Elision Example

• Consider Java Collections of Java primitive types

> Essentially transforms HashMap<Integer, String>() into HashMap<int, String>()

> Essentially transforms TreeMap<Integer, String>() into TreeMap<int, String>()

JIT Compilers: Autobox Elision Advantages

• Less Java heap required

• Improved memory locality, better CPU cache utilization

• Map.get(int key) noticeably faster

> Integer.equals(Entry[hash].key) operation is not needed to find entry's value

> Only need key == Entry[hash].key to find value

> Saves instructions executed
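
The hidden boxing discussed on these slides can be made visible in a few lines. The sketch shows that a lookup with an int literal and a lookup with an explicit Integer.valueOf call behave the same, and that small values come from a cache, which Integer.valueOf documents for the range -128 to 127. The class and method names are hypothetical; the sketch demonstrates the boxing that javac inserts, not the JIT optimization itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the boxing javac inserts around collection calls.
class AutoboxSketch {
    // Both lookups use an Integer key; the first is boxed by the compiler.
    static boolean sameLookup() {
        Map<Integer, String> map = new HashMap<Integer, String>();
        map.put(5, "five");                           // compiled as put(Integer.valueOf(5), ...)
        String implicit = map.get(5);                 // boxed by javac
        String explicit = map.get(Integer.valueOf(5)); // written out by hand
        return "five".equals(implicit) && "five".equals(explicit);
    }

    // Values in -128..127 are served from a cache, so the same object is returned.
    static boolean smallValuesCached() {
        return Integer.valueOf(127) == Integer.valueOf(127);
    }

    public static void main(String[] args) {
        System.out.println("same lookup: " + sameLookup());
        System.out.println("small values cached: " + smallValuesCached());
    }
}
```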

JIT Compilers: What is it doing?

• Use -XX:+PrintCompilation

• Example output:

4% ! sun.nio.cs.UTF_8$Decoder::decodeArrayLoop @ 129 (1814 bytes)

18 java.lang.CharacterDataLatin1::getProperties (11 bytes)

19 java.io.BufferedReader::ensureOpen (18 bytes)

20 java.io.StreamTokenizer::read (38 bytes)

21 ! java.io.BufferedReader::read (104 bytes)

22 java.io.StreamTokenizer::nextToken (1268 bytes)

23 java.lang.String::toLowerCase (436 bytes)

24 java.lang.Character::toLowerCase (162 bytes)

26 ! sun.nio.cs.UTF_8$Encoder::encodeArrayLoop (698 bytes)

27 java.lang.Character::toLowerCase (6 bytes)

29 java.util.HashMap::indexFor (6 bytes)

JIT Compilers: What is it doing?

• -XX:+PrintCompilation explained:

4% ! sun.nio.cs.UTF_8$Decoder::decodeArrayLoop @ 129 (1814 bytes)

[id][flags] [class name::method name] [@ bci] ([size]) where:

id is ordinal number of compilation

flags is none or more of:

% indicates an “On Stack Replacement”

! indicates method has an exception handler

* method is a native method

b compilation not done in parallel with execution

s method is synchronized

made not entrant

class name::method name is name of the method

@ bci byte code index for an on stack replacement operation

size is the size of the code generated
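
The field layout described above can be turned into a small parser, which is handy when scanning a long compilation log for on stack replacement entries. The sketch is written only for the line format shown on these slides; other JVM versions print additional columns. The class name and field names are hypothetical.

```java
// Hypothetical parser for one -XX:+PrintCompilation line in the format above.
class CompilationLine {
    final int id;         // ordinal number of the compilation
    final String flags;   // for example "%!", or "" when no flags are present
    final String method;  // class name::method name
    final int bci;        // byte code index for OSR, or -1 when absent
    final int size;       // the number printed inside "(N bytes)"

    private CompilationLine(int id, String flags, String method, int bci, int size) {
        this.id = id;
        this.flags = flags;
        this.method = method;
        this.bci = bci;
        this.size = size;
    }

    static CompilationLine parse(String line) {
        String[] t = line.trim().split("\\s+");

        // First token: the id, optionally followed directly by flags ("4%").
        int digits = 0;
        while (digits < t[0].length() && Character.isDigit(t[0].charAt(digits))) {
            digits++;
        }
        if (digits == 0) {
            throw new IllegalArgumentException("missing compilation id: " + line);
        }
        int id = Integer.parseInt(t[0].substring(0, digits));
        StringBuilder flags = new StringBuilder(t[0].substring(digits));

        // Any further tokens before the "class::method" token are flags.
        int i = 1;
        while (i < t.length && !t[i].contains("::")) {
            flags.append(t[i++]);
        }
        if (i == t.length) {
            throw new IllegalArgumentException("missing method name: " + line);
        }
        String method = t[i++];

        // Optional "@ bci" pair for on stack replacement.
        int bci = -1;
        if (i + 1 < t.length && t[i].equals("@")) {
            bci = Integer.parseInt(t[i + 1]);
            i += 2;
        }

        // Remaining tokens look like "(1814" and "bytes)".
        if (i >= t.length || !t[i].startsWith("(")) {
            throw new IllegalArgumentException("missing size: " + line);
        }
        int size = Integer.parseInt(t[i].substring(1));
        return new CompilationLine(id, flags.toString(), method, bci, size);
    }

    public static void main(String[] args) {
        CompilationLine c =
            parse("4% ! sun.nio.cs.UTF_8$Decoder::decodeArrayLoop @ 129 (1814 bytes)");
        System.out.println(c.id + " flags=" + c.flags + " " + c.method
            + " bci=" + c.bci + " size=" + c.size);
    }
}
```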

JIT Compilers: Programming Advice

• When writing Java code, don't try to out-smart the JIT compiler.

• Focus on the right architecture, design, implementation, choosing the best data structures and algorithms.

• Write the code as you normally would, then evaluate the application's performance. Only then consider changing source code to be more JIT compiler friendly.

• Never change code to be JIT compiler friendly.

• If you have to change code to be JIT compiler friendly, it is a JIT compiler bug. Please file a bug!

Micro-benchmark: Beware the JIT Compiler

• JIT compiler can completely change what a developer thinks he or she is measuring in a micro-benchmark.

• Obvious pointers to “bad” micro-benchmarks

> No warm-up cycle

> Garbage collection events observed during measurement interval

> Relying on millisecond accuracy with System.currentTimeMillis()

> Relying on nanosecond accuracy with System.nanoTime()

Micro-benchmark: Beware the JIT Compiler

• Obvious pointers to “bad” micro-benchmarks (continued)

> Measurement interval less than 10 seconds

> -XX:+PrintCompilation shows methods being optimized during measurement interval

> Elapsed time is the only reported metric.

> Varying length of measurement interval yields different throughput rates.

> Unexpected or surprising results or results which do not make sense.

Benchmark Creation Tips

• Define exactly what you want to know and do not get distracted with other artifacts.

• Make sure work done in the measurement interval is always the same work.

• Compute and report multiple metrics such as elapsed time and iterations per unit of time.

• Be aware of both accuracy and granularity of Java time APIs.

Module 6: Demo 1

Benchmark Creation Tips

• Avoid chances of dead code. Perform non-trivial computations, pass args into methods, return a result from a method and print out computation results immediately after the measurement interval.

• Small data sets or small data structures can become cache sensitive and may not accurately reflect larger scope performance.
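
Several of the creation tips can be combined into one small harness: a warm-up pass, a measured pass that repeats the same work, results that are accumulated and printed so the work is not dead code, and more than one reported metric. The sketch below is a minimal illustration with hypothetical names; its measurement interval is far shorter than the 10 seconds recommended earlier, so the numbers it prints are only meant to show the structure.

```java
// Hypothetical micro-benchmark skeleton following the tips above.
class MicroBenchSketch {
    // Work under test: takes an argument and returns a result that depends on it.
    static long work(int n) {
        long h = 17;
        for (int i = 0; i < n; i++) {
            h = h * 31 + (i ^ (h >>> 7));
        }
        return h;
    }

    // Runs the work `rounds` times and returns elapsed nanoseconds.
    // Results are added to sink[0] so the calls cannot be discarded as dead code.
    static long timeRounds(int rounds, int n, long[] sink) {
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            sink[0] += work(n + (r & 7)); // same total work in every interval
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int rounds = 20000;
        long[] sink = new long[1];

        timeRounds(rounds, 1000, sink);              // warm-up, timing ignored
        long nanos = timeRounds(rounds, 1000, sink); // measured interval

        double secs = nanos / 1e9;
        // Report elapsed time and a rate, then the checksum.
        System.out.printf("elapsed=%.3f s, rounds/s=%.0f, checksum=%d%n",
            secs, rounds / secs, sink[0]);
    }
}
```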

Benchmark Execution Tips

• Verify that -XX:+PrintCompilation does not show methods being optimized during the measurement interval.

• Make sure other applications, such as a weather or stock applet, are not intruding.

• Run multiple iterations to increase confidence in reported results.

• Question whether results make sense. If suspicious or unexpected, probably not accurate. You should investigate results.

• For multi-threaded micro-benchmarks, thread scheduling may not be deterministic especially under heavy load.

Module 7: Examining JVM Ergonomics

Objectives

• Examine JVM ergonomics

• Override JVM ergonomics choices

What to Expect

• What is ergonomics

• How to override ergonomics defaults

Ergonomics: What it Does

• Evaluates the system and auto-magically chooses defaults for the HotSpot JVM. No tuning required!

• Relies on definition of “server class machine”

> 2 or more processor cores and 2 or more GB of physical RAM

• Server class machines will use -server JIT compiler

> Special case: 32-bit Windows JVMs are never considered “server class”

> Special case: 64-bit JVMs are always -server

Ergonomics: What it Does

• As of JDK 5, on server class machines ergonomics selects:

> Server JIT compiler

> Throughput collector

> Initial heap size (-Xms) 1/64th of physical memory up to max of 1G

> Max heap size (-Xmx) 1/4th of physical memory up to max of 1G

• If not identified as server class

> Client JIT compiler

> Default serial collector, same as before

> -Xms4m and -Xmx64m
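
The sizing rules above reduce to a few lines of arithmetic. The sketch below restates them for a server class machine; it deliberately ignores the 32-bit Windows and 64-bit special cases, and the class and method names are hypothetical. It is an illustration of the rules as stated on these slides, not the JVM's own code.

```java
// Hypothetical restatement of the JDK 5 ergonomics rules from the slides.
class ErgonomicsSketch {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    // 2 or more processor cores and 2 or more GB of physical RAM.
    static boolean isServerClass(int cores, long physicalBytes) {
        return cores >= 2 && physicalBytes >= 2 * GB;
    }

    // Initial heap: 1/64th of physical memory, up to 1G.
    static long initialHeapBytes(long physicalBytes) {
        return Math.min(physicalBytes / 64, GB);
    }

    // Maximum heap: 1/4th of physical memory, up to 1G.
    static long maxHeapBytes(long physicalBytes) {
        return Math.min(physicalBytes / 4, GB);
    }

    public static void main(String[] args) {
        long physical = 4 * GB;
        System.out.println("server class: " + isServerClass(2, physical));
        System.out.println("initial heap MB: " + initialHeapBytes(physical) / MB);
        System.out.println("max heap MB: " + maxHeapBytes(physical) / MB);
    }
}
```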

Ergonomics

• Use -XX:+PrintCommandLineFlags to tell you what Ergonomics is choosing

• Example:

> On 2 socket, 3.0 GHz Hyperthreaded Intel, running Solaris with 4GB RAM:

> java -XX:+PrintCommandLineFlags -version

> -XX:MaxHeapSize=1073626112 (1 GB)

> -XX:+PrintCommandLineFlags

> -XX:+UseParallelGC

> Java HotSpot(TM) Server VM (build 1.6.0_02-b04, mixed mode)

Module 7: Demo 1

Module 8: Examining 64 bit JVMs

Objectives

• Examine the characteristics of 64 bit JVMs

• Identify application characteristics that suit the use of 64 bit JVMs

• Tune 64 bit JVMs

What to Expect

• Advantages and challenges with 64-bit JVMs

• What class of applications are good candidates for 64-bit JVMs

• How to tune 64-bit JVMs

• The future of 64-bit JVMs

64-bit JVMs: Advantages

• Really large Java™ heaps, that is, heaps greater than the 4G limit associated with 32-bit JVMs.

• Utilize more memory with a single JVM on large multi-core systems.

• Memory intensive applications can be migrated to using Java™, for example large data caches.

• Potentially fewer garbage collections.

• Potential use of additional CPU registers available on 64-bit CPUs.

64-bit JVMs: Challenges

• Scaling applications in JVMs with really large Java™ heaps.

• Performance penalty as a result of larger pointer references (memory addresses). Typical penalty of ~ 10% - 15%.

• Native code (JNI) must be 64-bit compiled.

• Product support (some Java™ application products do not provide official support on 64-bit JVMs)

Application Characteristics that Suit 64-bit JVMs

• Large in-memory databases or caches

• Large batch processing engines

• Applications desiring small pause times and/or no full GC pauses

• Applications which have high allocation rates

• Applications wanting to leverage memory mapping or memory mapped buffers (java.nio.MappedByteBuffer).

Tuning Suggestions

• For all cases:

> Follow same tuning guidelines as 32-bit JVMs.

• For throughput emphasized case:

> Size heap spaces as large as possible and avoid premature tenuring of objects to old generation.

> Use throughput collector.

>For JDK 5 update 6 and later, use parallel old generation collector: -XX:+UseParallelOldGC

>Otherwise, use the parallel young generation collector: -XX:+UseParallelGC

Tuning Suggestions

• For pause time emphasized case:

> Young generation:

>Size young generation to meet pause time constraint.

>On multi-core systems use: -XX:+UseParNewGC

>Size survivor spaces to maximize collecting of objects prior to tenuring.

> Old generation:

>Size old generation big enough to allow concurrent collector to clean up unreachable objects.

>Use concurrent old gen collector: -XX:+UseConcMarkSweepGC

>Use -XX:CMSInitiatingOccupancyFraction to tune when CMS starts concurrent collection cycle.

Tuning Suggestions

• For very sensitive pause time emphasized case:

> Use: -XX:+CMSIncrementalMode

> But, follow tuning CMS suggestions, including CMS incremental mode, as you would for 32-bit JVMs.

Future work

• Compressed oops (ordinary object pointers)

> Limited 64-bit Java™ heap space, that is, imposing a 32G Java™ heap limit.

> 32-bit JVM performance with larger than 32-bit Java™ heap space, that is, larger than 4G Java™ heap.

> HotSpot engineering working on it.

> Early testing shows possibilities of 8% or more improvement on some workloads.

> Introduced in JDK performance release JDK 6u6p

>Slated to be in general JDK 6 update release, 6u12

>Enabled with: -XX:+UseCompressedOops
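
The 32G limit mentioned above follows from simple arithmetic if a 32-bit compressed pointer counts aligned objects rather than bytes. The sketch assumes 8-byte object alignment, which the slides do not state; with that assumption 2^32 slots of 8 bytes give 32G, whereas a plain 32-bit byte address reaches only 4G. The class and method names are hypothetical.

```java
// Hypothetical arithmetic behind the compressed oops heap limit.
class CompressedOopsSketch {
    // Bytes addressable by a reference of `bits` bits when every object
    // starts on an `alignmentBytes` boundary.
    static long addressableBytes(int bits, int alignmentBytes) {
        return (1L << bits) * alignmentBytes;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        System.out.println("32-bit byte address: " + addressableBytes(32, 1) / gb + "G");
        System.out.println("32-bit, 8-byte aligned: " + addressableBytes(32, 8) / gb + "G");
    }
}
```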

Module 9: Tune the JVM for Multi-core platforms

Module Objectives

• Examine the issues associated with multi-core platforms

• Examine JVM features that take advantage of multi-core platform architectures

• Tune the JVM for specific multi-core platforms

What to Expect

• How to tune the JVM for specific hardware platforms

> Challenges introduced by multi-core platforms

> AMD / Intel multi-core platforms

> NUMA based platforms

> CMT (Sun Fire™ T1000 / Sun Fire™ T2000) platforms

Multi-core Platforms: The Opportunities

• Multiple hardware threads

> But you need to leverage those plentiful hardware threads

• Increased throughput

> Java and JVMs are inherently multi-threaded

> Threading in Java™ is easy

> Available libraries such as: java.util.concurrent library

> Throughput garbage collector (parallel collector)

Multi-core Platforms: The Opportunities

• Improved determinism

> Concurrent collector

> RTSJ (Real-Time Specification for Java™)

• Both parallel and concurrent

> Garbage collection

> Dynamic compilation

> Class loading

Multi-core Platforms: The Challenges

• Need to optimize memory use

> Why: All those hardware threads pound memory

• Need to overcome:

> Memory latency

Time to fetch data from memory

> Memory bandwidth limitations

The limitations on the amount of data transferred between memory and processor

Multi-core Platforms: Overcoming Challenges

• Leverage processor / memory affinity to overcome challenges.

> Run a software thread on the same hardware thread.

> Keep data close to processor (that is keep processor caches warm)

> Use operating system help to bind processes and/or threads to processors and/or processor sets.

Multi-core Platforms: Overcoming Challenges

• Additional general guidelines

> Number of simultaneous active software threads should be >= number of hardware threads

>May be less due to memory limitations

>Try to use all hardware threads

>Include non-Java threads, such as native threads and concurrent GC threads, in the count of software threads.

> Minimize shared data structure writes and leverage JVM synchronization improvements

>A write requires the processor to acquire ownership of the cache line

>Synchronization requires write to lock word

>But, reads of shared data are ok.
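The two guidelines above can be sketched in plain Java. This is an illustrative sketch only: the class name SliceSum and the method parallelSum are hypothetical. It sizes a pool to the number of hardware threads reported by the JVM and lets each task accumulate into a local variable, so shared data is only read and each task publishes a single result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SliceSum {
    // Sums the array with one task per worker. Each task accumulates into a
    // local variable, so the only shared write per task is its final result.
    static long parallelSum(final long[] data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<Future<Long>>();
            int chunk = (data.length + workers - 1) / workers;
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(data.length, w * chunk);
                final int to = Math.min(data.length, from + chunk);
                parts.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long local = 0; // thread-private accumulator
                        for (int i = from; i < to; i++) local += data[i];
                        return local;
                    }
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get();
            return total;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // One software thread per hardware thread, as a starting point.
        int hardwareThreads = Runtime.getRuntime().availableProcessors();
        long[] data = new long[1000000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(parallelSum(data, hardwareThreads));
    }
}
```

The input array is shared but only read, which is the cheap case described above.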

JVM Synchronization Improvements

• Most locking is uncontended

• JVM avoids associating an OS mutex (heavy-weight lock) with each object

• While uncontended, JVM uses light-weight mechanism(s) to enter / exit monitor

• If contended, falls back to heavy-weight lock

• Detecting contention requires an atomic write to a shared lock word, usually via compare-exchange instruction

JVM Synchronization Improvements

• Must complete all memory operations to first level of memory shared by all processors.

• Must acquire lock word ownership

• Takes tens to many hundreds of CPU cycles

JVM Synchronization Improvements

• Typically two light-weight mechanisms

• Start with biased locking

• Avoids lock word contention when a lock is owned by only a single thread over long periods of time

• Single compare-exchange biases lock toward a thread

• Thereafter, compare-and-branch for monitor entry / exit

JVM Synchronization Improvements

• Typically very expensive to transfer bias to another thread, so

• If lock ownership starts changing frequently, but still without contention, then

> switch to compare-exchange for monitor entry / exit

> More expensive, but still far cheaper than OS lock

JVM Synchronization Improvements

• If real contention occurs (one thread wants to acquire a lock held by another), try desperately to avoid heavy-weight lock

• System calls for monitor entry / exit take thousands of CPU cycles

JVM Synchronization Improvements

• Adaptive spinning

> Spin awhile, then retry lock acquisition

> Locked region usually short, lock likely released during spin

> Platform-dependent and execution history-dependent spin time, otherwise cost exceeds benefit
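The spin-then-fall-back idea can be illustrated at the Java level. This is a simplified sketch, not HotSpot's implementation: the class SpinThenParkLock and the fixed SPIN_LIMIT are assumptions made for illustration, whereas the JVM adapts its spin time to the platform and to execution history and blocks on a real OS primitive.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class SpinThenParkLock {
    private static final int SPIN_LIMIT = 1000; // assumed value, not a HotSpot constant
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        int spins = 0;
        // compareAndSet is the compare-exchange: it succeeds only if the lock is free.
        while (!held.compareAndSet(false, true)) {
            if (++spins < SPIN_LIMIT) {
                continue; // spin: the locked region is usually short
            }
            // Spinning did not pay off; back off briefly instead of burning CPU.
            LockSupport.parkNanos(1000L);
        }
    }

    void unlock() {
        held.set(false);
    }

    // Runs `threads` workers that each increment a counter `perThread` times under the lock.
    static int contendedCount(int threads, final int perThread) {
        final SpinThenParkLock lock = new SpinThenParkLock();
        final int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        lock.lock();
                        try { counter[0]++; } finally { lock.unlock(); }
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(contendedCount(4, 10000));
    }
}
```

In application code, prefer synchronized or java.util.concurrent.locks; the sketch only shows why a short spin can avoid the cost of blocking.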

JVM Synchronization Improvements

• If in spite of all this a lock becomes heavy-weight, bias selection of next thread to acquire lock

• On monitor exit, prefer running a blocked thread that has recently run on the same processor

• Caches and TLB will be warm

• Locally unfair, but globally efficient

• Relies on OS to guarantee that every thread will eventually run

JVM Synchronization Improvements

• Biased Locking

> -XX:+UseBiasedLocking, on by default in JDK 6. Must be set explicitly in JDK 5.

> Bias synchronized object to the first thread that locks it.

> If the synchronized block is never accessed by another thread, uses cmp+branch, not atomics, to lock/unlock. Note: on US-T1 and US-T2, CAS (compare-and-swap) is cheap.

JVM Optimizations for Affinity

• Thread-Local Allocation Buffers (TLAB)

> Java threads allocate objects in thread private memory instead of allocating on shared heap.

• Parallel TLABs

> GC threads copy live objects to thread private memory
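The thread-local allocation idea can be modeled in a few lines of Java. This is a toy model under stated assumptions: TlabSketch, CHUNK, and the use of plain long values as "addresses" are all hypothetical, and the real JVM does this in native code on the actual heap. It shows the key property: a thread touches the shared pointer once per buffer refill and bumps a private pointer for every allocation in between.

```java
import java.util.concurrent.atomic.AtomicLong;

class TlabSketch {
    static final long CHUNK = 4096; // hypothetical buffer size in bytes

    // Next free "address" in the shared heap; touched once per buffer refill.
    private final AtomicLong sharedTop = new AtomicLong(0);

    // Per-thread buffer state: [0] = next free address, [1] = end of buffer.
    private final ThreadLocal<long[]> buffer = new ThreadLocal<long[]>() {
        @Override protected long[] initialValue() { return new long[] {0, 0}; }
    };

    // Returns the start address of a block of `size` bytes (0 < size <= CHUNK).
    long allocate(long size) {
        if (size <= 0 || size > CHUNK) throw new IllegalArgumentException("size out of range");
        long[] b = buffer.get();
        if (b[0] + size > b[1]) {
            // Buffer exhausted: one atomic update on the shared heap pointer.
            b[0] = sharedTop.getAndAdd(CHUNK);
            b[1] = b[0] + CHUNK;
        }
        long address = b[0];
        b[0] += size; // plain, thread-private bump; no atomics
        return address;
    }

    public static void main(String[] args) {
        TlabSketch heap = new TlabSketch();
        System.out.println(heap.allocate(16));
        System.out.println(heap.allocate(24));
    }
}
```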

JVM Optimizations for Affinity

• NUMA aware allocators

> Chip and/or board-local allocation regions: TLABs and PLABs writ large.

> Associate Java and GC threads with a region.

> Collect all regions when one becomes full.

> Depends on OS affinity mechanisms.

> Objects allocated together are usually accessed together.

JVM NUMA Optimizations

• On Solaris

> Single JVM

>New in JDK 6 Update 2: use the NUMA allocator, -XX:+UseNUMA

> Prior to JDK 6 Update 2:

>/etc/system: lgrp_mem_default_policy=3

> Multiple JVM

>Use processor sets using psrset

>Significant improvement (5-10%) on x64 and US-IV+ System

>lgrp_mem_pset_aware=1: default random policy applies only to lgroups within a process' processor set

JVM NUMA Optimizations

• On Linux

> Single JVM

>numactl --interleave=all

> Multiple JVM

>numactl --cpubind=$node_num --membind=$node_num

> Significant improvement on x64 systems
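For the multiple-JVM case, a launcher can build one numactl command line per node. The sketch below only assembles and prints the commands; NumaLauncher, the node count of 2, and app.jar are placeholders, and actually starting the processes requires a Linux host with numactl installed.

```java
import java.util.ArrayList;
import java.util.List;

class NumaLauncher {
    // Builds the command line for one JVM bound to a single NUMA node.
    static List<String> commandFor(int node, String jarPath) {
        List<String> cmd = new ArrayList<String>();
        cmd.add("numactl");
        cmd.add("--cpubind=" + node);
        cmd.add("--membind=" + node);
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jarPath);
        return cmd;
    }

    public static void main(String[] args) {
        int nodes = 2; // assumed node count; check `numactl --hardware` on the target machine
        for (int node = 0; node < nodes; node++) {
            List<String> cmd = commandFor(node, "app.jar"); // app.jar is a placeholder
            System.out.println(cmd);
            // To start the process for real, pass cmd to java.lang.ProcessBuilder.
        }
    }
}
```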

JVM NUMA Optimizations

• On Windows

> Single JVM

>AMD Opteron: enable node-interleaving in the BIOS

> Multiple JVM

>Use Processor Affinity (similar to numactl on Linux): bring up Task Manager, select the Processes tab, select a process and right-click. In the popup menu, select Processor Affinity

> Significant improvement on x64 systems

JVM Latency / Bandwidth Optimizations

• Allocation prefetch

> Prefetch instructions can acquire cache line ownership for a processor in time for later writes

> Allocate space in cache for the acquired line

> When allocating objects linearly in TLABs, prefetch a platform-dependent distance ahead of address of the object being allocated

> Subsequent allocations should find line already cached

> Sometimes it's a good idea to prefetch multiple cache lines ahead

JVM Latency / Bandwidth Optimizations

• Object field reordering

• Group frequently accessed fields together so they end up in minimum number of cache lines

• Often with object header

• Experience shows that scalar fields should be grouped together separately from object reference fields

JVM Latency / Bandwidth Optimizations

• Vectorization

> Load, operate on and store multiple array elements at once with single machine instructions

> E.g., use 8- or 16-byte loads and stores to access 4 or 8 char array elements at a time

> Compiler-generated or tailored assembly code: e.g., System.arraycopy()

• Java™ Programming Advice

> Use System.arraycopy() to best take advantage of vectorization optimizations.

> Avoid writing your own “special loop copiers”.
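A minimal sketch of the advice above, contrasting a hand-written copy loop with System.arraycopy(). The class CopyDemo and its method names are illustrative; both methods produce the same result, but the second hands the copy to the JVM's own routine.

```java
import java.util.Arrays;

class CopyDemo {
    // Hand-written element loop: the style the advice above discourages.
    static char[] loopCopy(char[] src) {
        char[] dst = new char[src.length];
        for (int i = 0; i < src.length; i++) dst[i] = src[i];
        return dst;
    }

    // Preferred: delegates to the JVM's tuned copy routine.
    static char[] bulkCopy(char[] src) {
        char[] dst = new char[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        char[] text = "multi-core".toCharArray();
        System.out.println(Arrays.equals(loopCopy(text), bulkCopy(text))); // prints true
    }
}
```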

JVM Latency / Bandwidth Optimizations

• Processors cache virtual-to-physical address translations in Translation Lookaside Buffers (TLBs)

• TLB size is limited, typically 8 to 64 entries

• TLB miss is expensive, requires walking page table in memory

• Modern processors support large pages

> 2 to 4 MB rather than 4 to 8 KB

> As much as 256 MB on UltraSPARC® T1 processor / UltraSPARC® T2 processor

JVM Latency / Bandwidth Optimizations

• Can map memory with many fewer TLB entries

• JVM can map Java™ heap and generated code cache with large pages

• Far fewer TLB misses
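A rough way to see the benefit is to compute how much memory a full TLB can map (entries times page size). The sketch below uses 64 entries, the upper end of the range quoted earlier, with page sizes mentioned in this module; the class name TlbReach is illustrative, and real TLBs often have separate entry counts per page size.

```java
class TlbReach {
    // Memory covered by a fully populated TLB: entries x page size.
    static long reachBytes(int entries, long pageBytes) {
        return entries * pageBytes;
    }

    public static void main(String[] args) {
        long kb = 1024L, mb = 1024L * kb;
        int entries = 64; // upper end of the typical range quoted above
        System.out.println(reachBytes(entries, 8 * kb) / kb + " KB with 8 KB pages");
        System.out.println(reachBytes(entries, 4 * mb) / mb + " MB with 4 MB pages");
        System.out.println(reachBytes(entries, 256 * mb) / mb + " MB with 256 MB pages");
    }
}
```

With 8 KB pages the 64 entries cover 512 KB; with 4 MB pages they cover 256 MB, which is why a large Java heap sees far fewer misses.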

JVM Support for Large Pages

• Solaris

> Enabled by default: it just works

> Default size, 8 KB on SPARC, 4 KB on x64

> Large page size on US-III and US-IV systems is 4 MB

> US-IV+ supports 32 MB pages

>-XX:LargePageSizeInBytes=32m

> US-T1 and US-T2 support 256 MB pages

>-XX:LargePageSizeInBytes=256m

> X86 (AMD and Intel) support 4 MB pages

> X64 (AMD and Intel) support 2 MB pages

JVM Support for Large pages

• Large page support optimizes processor TLB (Translation Lookaside Buffer) usage.

• TLB is a page translation cache that holds the most recently used virtual-to-physical address translations.

• TLB miss can be costly: it requires memory access(es) to read the page table.

• A larger page size lets each TLB entry cover a larger memory range.

JVM Support for Large pages

• Windows

> Use the local security settings console to "lock pages in memory" for the user running the application

>-XX:+UseLargePages

> Remember to reboot

> For more detailed information: http://java.sun.com/docs/hotspot/VMOptions.html#largepages

JVM Support for Large pages

• Linux

> Complex to set up.

> Create huge page directory, e.g. mkdir /mnt/hugepages

> Mount huge page file system, e.g. mount -t hugetlbfs nodev /mnt/hugepages

> Set permissions for read and write on the directory for the user/users who will use large pages. For example, chmod 755 /mnt/hugepages or chmod 777 /mnt/hugepages

> By default, only root will have access after mounting.

JVM Support for Large pages

• Linux (continued)

> Specify how many pages you want to allocate as large pages, e.g. echo 1500 > /proc/sys/vm/nr_hugepages

> Verify things are ready:

>cat /proc/meminfo | grep -E "(HugePage|Hugepage|Mem)"

> Ready to use -XX:+UseLargePages

> Values in /proc will reset after reboot. Consider setting up an init script.

> More info at: http://java.sun.com/javase/technologies/hotspot/largememory.jsp
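The verification step can also be done from Java, for example in a startup check. This is a hedged sketch: HugePageCheck and hugePageFields are hypothetical names, and main only works on a Linux host where /proc/meminfo exists. It picks out the huge page entries from the file.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HugePageCheck {
    // Extracts the HugePages_* and Hugepagesize entries from meminfo-style lines.
    static Map<String, String> hugePageFields(List<String> lines) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) continue; // not a "key: value" line
            String key = line.substring(0, colon).trim();
            if (key.startsWith("HugePages_") || key.equals("Hugepagesize")) {
                fields.put(key, line.substring(colon + 1).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader("/proc/meminfo"));
        try {
            for (String line; (line = in.readLine()) != null; ) lines.add(line);
        } finally {
            in.close();
        }
        System.out.println(hugePageFields(lines));
    }
}
```

If HugePages_Total is 0, the nr_hugepages setting did not take effect and -XX:+UseLargePages is unlikely to help.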

Tuning GC for Multi-core Platforms

• For throughput

> Use -XX:+UseParallelOldGC with JDK 5.0 update 6 and later, -XX:+UseParallelGC otherwise.

> Tune -XX:ParallelGCThreads for number of logical processors application is using. Sun US-T1 and US-T2 can be very sensitive to this.

• For pause sensitive

> Use -XX:+UseConcMarkSweepGC along with -XX:+UseParNewGC

> Use -XX:+CMSIncrementalMode if needed

Tuning GC for Multi-core Platforms

• For pause sensitive (continued)

> -XX:ParallelGCThreads=<n>

>Default: number of hardware threads (ncpus)

>if ncpus <= 8, then ncpus, else ncpus * 5 / 8

> -XX:CMSInitiatingOccupancyFraction is the old gen occupancy at which CMS starts collecting

>Larger values improve throughput but increase Full GC risk

>Lower values reduce throughput and Full GC risk
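The default thread count rule stated above works out as follows. The sketch encodes the rule exactly as written on the slide; the class and method names are illustrative, and the precise formula used by a given JDK release may differ, so treat the output as a starting point for tuning rather than a guarantee.

```java
class GcThreadDefault {
    // Default per the rule above: ncpus up to 8, otherwise 5/8 of ncpus (integer division).
    static int defaultParallelGcThreads(int ncpus) {
        return ncpus <= 8 ? ncpus : ncpus * 5 / 8;
    }

    public static void main(String[] args) {
        int[] samples = {4, 8, 32, 64}; // 32 matches the 32 strands of a T2000
        for (int ncpus : samples) {
            System.out.println(ncpus + " -> " + defaultParallelGcThreads(ncpus));
        }
    }
}
```

For a 32-strand system the rule gives 20 GC threads, which is the value to pass as -XX:ParallelGCThreads=20 when setting it explicitly.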

JVM Code Generation Optimization

• AggressiveOpts is a general “all purpose” performance flag, very applicable for multi-core systems.

> -XX:+AggressiveOpts

> Targets code optimizations not garbage collection.

UltraSparc T1 / T2 Tuning: Overview

• Common mistakes:

> Single threaded micro-benchmarks

> Default # of parallel GC threads can overwhelm system, e.g. when just doing a simple javac compile of a small set of Java™ classes.

> Evaluating without replicating expected application load. Small data subsets will not reflect accurate performance.

> Requires a different performance testing methodology than usually done for evaluating performance.

UltraSparc T1 / T2 Tuning: Overview

• Workloads which operate “mostly” within CPU cache:

These perform better on higher clock rate CPUs than on the T1/T2 CPUs due to higher clock rates and deeper pipelines. A T1/T2 with no cache misses will operate like 8 slow traditional architecture cores.

• Workloads with a high cache miss rate:

These perform as well or even better on the T1/T2 than 32 fast traditional architecture cores. (T1/T2 loads from memory are faster)

UltraSparc T1 / T2 Tuning: Architecture

• Drastically different CPU architecture compared to traditional x86

> Multiple hardware threads per core

> T1 has 4 hardware threads per core

> T2 has 8 hardware threads per core

> T1, 1 hardware thread per core can execute per clock cycle

> T2, 2 hardware threads per core can execute per clock cycle

> Automatically switches to a different hardware thread when a thread becomes stalled (for example, on a CPU cache miss).

UltraSparc T1 / T2 Tuning: Architecture

• Switching between runnable hardware threads is much faster than waiting for memory fetching.

• A single threaded application will consume only 1 core and 1 hardware thread on a T1 or T2 processor.

> Example: since each core of a T1 can execute at most 1 hardware thread per cycle, a 1.2 GHz T1 processor will execute a single threaded application as if it were on a 300 MHz CPU.

UltraSparc T1 / T2 Tuning: Architecture

• Solaris mpstat & vmstat report each hardware thread as a CPU.

> More importantly, a CPU is reported as being utilized even when a processor is stalled.

> Need a tool to assess CPU cache misses and instruction count which will provide UltraSparc T1 / T2 processor core utilization.

Monitoring UltraSparc T1 / T2 CPUs: corestat and mpstat

• Utilize both corestat and mpstat to understand CPU utilization for UltraSparc T1 / T2 systems.

• corestat is a Perl script that aggregates cpustat L2 cache misses & instruction count across all virtual processors in an UltraSparc T1 / T2 core.

> http://blogs.sun.com/roller/resources/travi/corestat_v1.0.tar.gz

• mpstat reports how busy a virtual processor is where “busy” also includes a “stalled” state

Monitoring UltraSparc T1 / T2 CPUs: corestat and mpstat

• An mpstat report of a 100% utilized virtual processor and core does not differentiate between:

> A processor and core that is really busy

> A processor and core that is stalled

• A combination of corestat and mpstat provides information about whether increasing or decreasing Java™ threads or adding JVMs makes sense

UltraSparc T1 / T2 Case Study

• 8 core (32 strands) T2000 versus 8 core Opteron

• 1/4th production workload showed Opteron performing 25% better

• Full production workload showed T2000 30% better than Opteron

• Conclusion

> You have to load the UltraSparc T1 / T2 to see it shine.

> System architecture is vastly different from traditional high clock rate systems.

UltraSparc T1 / T2 JVM Tuning: Throughput Focus

• JDK 5.0 update 6 and later has specific optimizations for T1 / T2.

• For applications emphasizing throughput begin with:

> -XX:+UseParallelOldGC (requires 5.0 update 6, improved in update 8), otherwise use -XX:+UseParallelGC

> -XX:ParallelGCThreads=<# cpus * 5/8>

> -XX:LargePageSizeInBytes=256m

> -XX:+AggressiveOpts

> -XX:+UseBiasedLocking

UltraSparc T1 / T2 JVM Tuning: Pause Time Focus

• For applications emphasizing pause time constraints over throughput begin with:

> -XX:+UseConcMarkSweepGC

> -XX:+UseParNewGC

> -XX:ParallelGCThreads=<# cpus * 5/8>

> -XX:LargePageSizeInBytes=256m

> -XX:+AggressiveOpts

> -XX:+UseBiasedLocking

> -XX:MaxTenuringThreshold=31

> -XX:TargetSurvivorRatio=90

> -XX:SurvivorRatio=8
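When a starting set of flags like the one above is in use, it can help to confirm at startup which of them were actually passed to the JVM. The sketch below is one possible check, assuming a hypothetical helper class FlagCheck; it compares a wanted list against the arguments reported by the standard RuntimeMXBean. It only sees flags given explicitly on the command line, not flags that are on by default.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FlagCheck {
    // Returns the wanted flags that do not appear among the actual JVM arguments.
    static List<String> missing(List<String> actual, List<String> wanted) {
        List<String> result = new ArrayList<String>();
        for (String flag : wanted) {
            if (!actual.contains(flag)) result.add(flag);
        }
        return result;
    }

    public static void main(String[] args) {
        // A subset of the flags listed above, chosen for illustration.
        List<String> wanted = Arrays.asList(
            "-XX:+UseConcMarkSweepGC", "-XX:+UseParNewGC", "-XX:+AggressiveOpts");
        List<String> actual = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Not set on this JVM: " + missing(actual, wanted));
    }
}
```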

UltraSparc T1 / T2 JVM Tuning: Incorporating Feedback

• Monitor CPU utilization with mpstat and corestat. Pay particular attention to core utilization to get an understanding of whether there's “load” room.

• Monitor JVM behavior and tune accordingly:

> Fine tune heap sizes (-Xms, -Xmx and -Xmn).

> Fine tune survivor spaces.

> Try enabling / disabling biased locking.

> Try enabling / disabling aggressive opts.

• Use the monitoring data to drive the tuning effort.

• Minimize configuration changes between runs. Ideally, one change at a time.