IBM Tivoli

JVM Monitoring – Best Practices

Steve Klopfer

Technical Specialist, IBM

scklopf@us.ibm.com

Definitions

Monitoring – Observing performance data in real time to find and correct resource, throughput, or response time problems.

Trending – The analysis of data with the intention of identifying discernible patterns.

Forecasting – The projection of those identified patterns on business growth patterns to understand the impact on business processes.

Capacity planning – The response to forecasts that ensures the integrity of business processes.

IBM Software Group | Tivoli software

Capacity/Load model

Typical WAS/J2EE Application Components

[Diagram labels: Customer Transactions; HTTP Server and plugin; Production JVM (AIX, AS400, HP-UX, Linux, Solaris, Unix, Windows, OS/390, z/OS); Application Server; J2EE Application; J2EE components; Servlet; Thread Pool; EJB Pools; JDBC Pools; Back-end connectors; CICS Transaction Gateway; MQSeries Connector; JDBC Driver; Back-end systems; Mainframe; Database; J2EE Services; Memory Management; File and Network I/O; CPU (AIX, Solaris, Windows); Component interactions]

What Kinds of Problems Does JVM Monitoring Help Solve?

Request / Transaction problems
– Slow or hung requests
– Intermittent performance problems
– Correlation to remote EJB containers, CICS, IMS, MQ

Real-time diagnostics
– In-flight request search and diagnose capability with Java stack trace and thread dumps in real time

Memory leaks
– Monitor JVM heap size, memory usage, and garbage collection patterns
– Heap snapshots

Resource monitoring
– Connection pools, JDBC, thread pool, etc.

Non-intrusive diagnostic data collection for key application components
– JMS, SCA, Portlets (ITCAM for WS only), Web Services, etc.

Problem situation automation
– Alerts and traps for hard-to-re-create problems and problem context for later diagnosis

Problem recreation
– Provides production data for hard-to-re-create problems via integration with Rational Performance Tester (RPT) and IBM Performance Optimization Toolkit (IPOT)

How is it doing today and how will it do tomorrow?
– Historical and trending reports

Questions to Ask When Troubleshooting

Is the problem re-creatable?

Did it ever work? If it did, what changed (configuration, additional installation, product upgrade, etc.)?

Does the environment matter, e.g. does it work in test/development but not in production?

What is the topology of the environment?
– What external systems are involved?
– Any connectivity (firewall) or security (authentication, expired passwords) issues?

Are there any workload considerations?
– Is the problem happening under heavy workloads?
– Network or bandwidth issues?

Is there a pattern to the problem, e.g. every Monday morning at 10 AM?

What must a good monitoring product do?

A clever person solves a problem. A wise person avoids it. -- Einstein

• It must monitor the environment 24 X 7.

• Real time visualization tools are not adequate unless you plan on having highly paid analysts monitoring these tools 24 X 7.

• It must support intelligent alerting

• Alerting tools must acquire and correlate metrics from multiple sources.

• It must exhibit a depth of monitoring across the breadth of technologies that spans, at minimum, end-user experience (both real and synthetic), application servers, and database servers.
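
The point about correlating metrics from multiple sources can be illustrated with a small sketch. Everything named here (the sources, the metric names, the thresholds, and the `should_alert` helper) is hypothetical and is not part of any Tivoli product API; the idea is only that an alert fires when several independent sources agree, rather than on a single noisy metric.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Upper limit for one metric from one monitoring source (illustrative)."""
    source: str
    metric: str
    limit: float


# Hypothetical thresholds; real values would come from SLAs and baselines.
THRESHOLDS = [
    Threshold("end_user", "response_time_ms", 2000.0),
    Threshold("app_server", "thread_pool_used_pct", 90.0),
    Threshold("database", "avg_query_time_ms", 500.0),
]


def breached_sources(sample):
    """Return the sources whose metric exceeds its threshold in this sample.

    sample: {source_name: {metric_name: value}}; missing metrics are ignored.
    """
    breached = []
    for t in THRESHOLDS:
        value = sample.get(t.source, {}).get(t.metric)
        if value is not None and value > t.limit:
            breached.append(t.source)
    return breached


def should_alert(sample, min_sources=2):
    """Alert only when at least `min_sources` independent sources agree."""
    return len(breached_sources(sample)) >= min_sources
```

With `min_sources=2`, a slow end-user response on its own does not raise an alert, but the same slowdown combined with a nearly exhausted thread pool does.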

Monitoring Levels

Vertical levels, not horizontal levels

Monitoring On Demand
– Change monitoring level as needed without restarting either the applications or the application servers
– No need to pinpoint specific classes or methods in advance (i.e., no need to designate what needs to be monitored)

“Level 1” – Request Level – Production
– 100% of system resource information
– 100% of incoming requests/transactions

“Level 2” – Component Level – Problem Determination
– View major application events (EJBs, servlets, JDBC, JNDI, etc.)

“Level 3” – Method Level – Tracing
– Adds method trace information for problem determination and performance analysis.

Using the Tool Efficiently

Everyone assumes they need method-level data for every transaction in Production
– What would you do with that much data?

Gain application/transaction understanding in Test/QA, workload understanding in Production

Use traps and alerts to find anomalies and collect detailed data

Test/QA: use L2/L3 for transaction/application analysis
– Top Methods Used (L3)
– Most CPU Intensive Methods (L3)
– Top Slowest Methods (L3)
– Transaction Component (L2) Trace
– Transaction Method (L3) Trace
– SQL Profile (L2)

Application Performance Analysis

Work with Defined Objectives

Throughput / response time goals from SLAs

Identify and Fix any Performance Problems Early

Slow Transactions, Memory Leaks, WebSphere Performance Tuning

Best Practices for Performance Tuning and Analysis

Collect the information about the applications and the environment.

Identify Key Transactions

Conduct Transaction Profiling

Conduct Workload Profiling

Measure the baseline matrix for various performance parameters before tuning

Leverage your tools in conjunction with load testing tools to analyze and tune application performance

Focus on Best Practices

Identify all key transactions in the workload mix
– Most frequently used
– Most important to application
– Set workable limit, e.g. 10-20

Conduct Transaction Profiling to obtain basic understanding of what these key transactions do
– Code flow (component and method level)
– Component profile
– Method profile
– Event timings for each component and method
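
Identifying the key transactions can be sketched as a simple frequency count over observed request names, topped up with the transactions that matter most to the application. The `key_transactions` helper and its input format are assumptions made for illustration; the real request data would come from the monitoring tool's reports.

```python
from collections import Counter


def key_transactions(request_names, must_include=(), limit=20):
    """Pick the most frequently used transactions, plus business-critical ones.

    request_names: iterable of transaction names, one entry per observed request.
    must_include:  names that matter to the application regardless of volume.
    limit:         workable upper bound on the list (the slide suggests 10-20).
    """
    counts = Counter(request_names)
    selected = list(dict.fromkeys(must_include))  # keep order, drop duplicates
    for name, _ in counts.most_common():
        if len(selected) >= limit:
            break
        if name not in selected:
            selected.append(name)
    return selected[:limit]
```

Important transactions are placed first so that a low-volume but critical request is never pushed out of the list by high-volume ones.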

Transaction Profiling

Transaction Profiling refers to tracing the entire execution of a selected request (HTTP or EJB invocation)

Normally the best practice is to prepare a single-user automated test script that fires off such transactions with a think time in between invocations

At L2 monitoring level, the data is shown at the J2EE component level with contextual data
– JSP, EJB, JMS, MQI, JDBC, JNDI

At L3, full application class/method trace will be collected by default

Workload Analysis

Workload Analysis refers to running the applications via a Traffic Simulator with a number of clients

The monitoring tool is normally running at L1 for this type of analysis, with a sampling rate under 10%

Normally the best practice is to prepare a multi-user automated test script that fires off transactions in the right mix that represents the ‘production’ workload

Workload Analysis

Each run should be at least 30-60 minutes long to observe the system at Steady State

During steady state, analysis can be conducted on a large number of metrics:
– Heap, CPU, paging, throughput, response time, WebSphere resource pools, GC activities, etc.

At the end of the run, a graph of CPU% vs. Throughput Rate should be plotted. Any non-linearity in the behavior of the workload should be explained and the bottlenecks eliminated, and the run repeated until a relatively linear line is obtained

More reports can be drawn from Performance Analysis & Reporting (PAR)
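
One rough way to check the CPU% vs. throughput plot for non-linearity without drawing it is to compare the CPU cost per request at each load step. The sketch below assumes the line passes close to the origin (idle CPU is negligible) and uses a hypothetical 25% tolerance; both function names are illustrative.

```python
def cpu_cost_per_request(throughput_rps, cpu_pct):
    """CPU% consumed per request/second at each load step."""
    return [cpu / rps for rps, cpu in zip(throughput_rps, cpu_pct)]


def nonlinear_steps(throughput_rps, cpu_pct, tolerance=0.25):
    """Return indexes of load steps whose CPU cost per request deviates from
    the first (lightest) step by more than `tolerance` (0.25 = 25%).

    A linear CPU% vs. throughput line through the origin has a constant
    cost per request, so large deviations mark the points to explain.
    """
    costs = cpu_cost_per_request(throughput_rps, cpu_pct)
    baseline = costs[0]
    return [i for i, cost in enumerate(costs)
            if abs(cost - baseline) / baseline > tolerance]
```

For example, load steps of 10, 20, 40, and 80 requests per second with CPU readings of 5%, 10%, 21%, and 60% flag only the last step, where the cost per request has jumped by half.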

Additional Performance Tuning Tips - 1

Here are a few other things that we can try to help improve performance. Please note that these suggestions are given without detailed knowledge of the environment / architecture / open issues.

Increase web container max keep-alives.

Increase web container thread pool.

Increase database connection pool.

Adjust maximum and minimum heap sizes.

Disable explicit garbage collection.

Enable concurrent I/O at o/s level.

Pre-compile JSPs.

Increase the priority of the app server process at o/s level.

Additional Performance Tuning Tips - 2

If there are many short-lived objects, tuning the NewSize and MaxNewSize JVM parameters would help.

Changing ulimit for operating system (AIX, Solaris) may help improve performance.

Enable dynamic caching, if possible.

Creating new indexes or re-organizing indexes will help improve performance of database intensive transactions.

Adjusting prepared statement cache size may also help.

Adjust O/S parameters: tcp_time_wait_interval and tcp_fin_wait_2_flush_interval.

Example: Workload Analysis

Check Environmental Consistency

Ensure Platform Can Support Application

– Verify System, Java and App Server Runtime Environment

Check Server Statistics

Compare key performance metrics side-by-side

– Shows paging and load balancing in clustered deployments
– Ensures overall throughput matches expected results from load generator
– Quick overview of application impact on monitored servers

Validate Throughput vs Response Time

Quantify Application Scalability

– Correlated plot of response time during stress test relative to request rate
– Graphical report showing number of requests over

Calculate Throughput vs. JVM CPU%

Verify target transaction per second rate achievable

– Request rate during stress run (same as prior slide)
– Correlated plot reveals low JVM CPU consumption even as throughput increases

Throughput vs. Garbage Collection (GC)

Tune JVM to minimize GC frequency

– Request rate during stress
– GC frequency not in steady state as throughput rises
– Increased heap size impacting GC rate, although <= 6 per minute appears to be affordable as response time remains < 34 ms
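
The GC rate discussed above can be estimated from the timestamps of GC cycles, for example ones parsed out of a verbose GC log. The sketch below assumes the timestamps are already available as seconds since the start of the run; the parsing step, the function names, and the default budget of 6 cycles per minute (taken from this example run, not a general rule) are assumptions.

```python
from collections import Counter


def gc_cycles_per_minute(gc_timestamps_s):
    """Count GC cycles in each one-minute bucket.

    gc_timestamps_s: seconds since the start of the run, one per GC cycle.
    Returns {minute_index: cycle_count}; minutes without GC are omitted.
    """
    return dict(Counter(int(t // 60) for t in gc_timestamps_s))


def minutes_over_budget(gc_timestamps_s, max_per_minute=6):
    """Minutes in which the GC rate exceeded the budget."""
    per_minute = gc_cycles_per_minute(gc_timestamps_s)
    return sorted(m for m, n in per_minute.items() if n > max_per_minute)
```

Minutes returned by `minutes_over_budget` are the ones to line up against the response-time plot to see whether the extra GC activity was actually felt by users.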

Throughput vs. Total GC time

Avoid paging (has large effect on end user response time)

– Request rate ramps and tops
– Excessive and persistently high total GC time
– Total time for GC to complete per cycle correlated with request rate

Throughput vs. Heap size after GC

Good indicator of potential memory leaks

– Request rate during stress run (same as prior slide)
– Shows well-tuned heap size as little if any growth during high throughput
– No growth in heap under increased load proves no detectable

WebSphere Resources Utilization Analysis

Verify application does not over-tax app server resources

– Saturated thread pool – good candidate for tuning!
– Overall we see low J2EE resource consumption

Check Average CPU time per Transaction

Based on threads running application classes in workload mix

– Spikes showing high consumption at random intervals
– Otherwise normal consumption rates
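
Spikes like the ones called out here can also be picked out of the exported data programmatically. The sketch below is one simple approach, not the tool's own method: it treats the median of the run as the normal level and flags any interval above a hypothetical multiple of it.

```python
from statistics import median


def cpu_spikes(cpu_ms_per_txn, factor=3.0):
    """Return (index, value) pairs for intervals whose average CPU time per
    transaction exceeds `factor` times the median of the whole run.

    The median is used as the "normal" level because a few large spikes
    barely move it, unlike the mean.
    """
    normal = median(cpu_ms_per_txn)
    return [(i, v) for i, v in enumerate(cpu_ms_per_txn) if v > factor * normal]
```

The flagged interval indexes are the places to look for the individual transaction with very high CPU.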

Check Average CPU time per Transaction

Based on threads running application classes in workload mix

– Transaction with very high CPU in spike interval

Example: Transaction Analysis Methodology

Analyze Transaction Instances of Interest

Show “Level 2” J2EE component-level events

– Sequential view of event execution / flow
– High-precision timing measurements for each event
– Highlighted JCA calls exhibit high delta CPU timing difference

Further Analyze Transactions

Show discrete “Level 3” method-level and nested method events

– Each row shows method flow and depth
– Good candidate for tuning due to high delta CPU consumption

Analyze SQL Profile

• Check the response time for various queries.
• Use the data in conjunction with Top used queries report. Tune queries.

Check for Top Methods Used

Identify hot methods by count

– Names of hot methods
– Total Invocation

Check for Most CPU-Intensive Methods

Correlate hot methods by CPU cost with highest count methods

– Names of hot methods
– CPU consumption for each method

Check for Slowest Methods

Correlate with hot methods to evaluate total contribution to response time

– Names of slow methods
– High average response time per method
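
Correlating slow methods with hot methods comes down to multiplying invocation count by average response time. A minimal sketch, assuming the two reports have been exported and merged into one dictionary (the input format and the `rank_by_total_time` name are illustrative):

```python
def rank_by_total_time(method_stats, top=10):
    """Rank methods by total time = invocation count x average response time.

    method_stats: {method_name: (invocation_count, avg_response_ms)}
    Returns [(method_name, total_ms), ...], largest contribution first.
    """
    totals = {name: count * avg_ms
              for name, (count, avg_ms) in method_stats.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]
```

A method that averages 900 ms but runs five times contributes 4,500 ms in total, less than a 2 ms method invoked 10,000 times (20,000 ms), which is why the slowest method is not automatically the best tuning target.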

Example: Memory Leak Analysis

Memory Analysis Reporting

Quick check to detect presence of a leak

– Upward slope indicates possibility of a “slow” memory leak
– Constant request rate correlated with JVM Heap Size
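
The upward slope can be quantified with an ordinary least-squares fit of heap size against elapsed time. This is a minimal sketch under the assumption that heap samples have been exported as (hours, MB) pairs; the function name and units are illustrative, and what counts as a worrying slope depends on the application.

```python
def heap_growth_mb_per_hour(samples):
    """Least-squares slope of heap size over time.

    samples: [(elapsed_hours, heap_mb), ...], at least two points
             with distinct times.
    Returns the fitted growth rate in MB per hour.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_h = sum(h for _, h in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    sxy = sum((t - mean_t) * (h - mean_h) for t, h in samples)
    return sxy / sxx
```

A slope that stays clearly positive while the request rate is constant matches the “slow” leak pattern described on this slide.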

Memory Leak: Avg. Heap Size after GC vs. Requests

• Average Heap Size after GC vs. Number of Requests:
• Verify that a leak exists with the Avg. Heap Size After GC Graph.
• Check to see if it is due to an increasing number of requests.

To access this feature: Select PROBLEM DETERMINATION -> Memory Diagnosis -> Memory Analysis -> Change Metrics.

Memory Leak: Average Heap Size after GC vs. Live Sessions

• Average Heap Size after Garbage Collection (GC) vs. Live Sessions:
• Verify that a leak exists with the Avg. Heap Size After GC Graph.
• Check to see if it is due to an increasing number of users.

To access this feature: Select PROBLEM DETERMINATION -> Memory Diagnosis -> Memory Analysis -> Select Metrics.
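
Checking whether heap growth is due to an increasing number of requests or users amounts to asking how closely the two series move together. The sketch below uses a Pearson correlation for that; the 0.8 cut-off and the function names are assumptions for illustration, and the series are assumed to be sampled at the same points in time.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat series cannot explain changes in the other one
    return sxy / sqrt(sxx * syy)


def growth_explained_by_load(heap_after_gc_mb, load, min_corr=0.8):
    """True when heap growth tracks the load metric (requests or live sessions).

    If the heap keeps growing while this returns False, the growth is not
    explained by load and a leak becomes the more likely cause.
    """
    return pearson(heap_after_gc_mb, load) >= min_corr
```

Heap that rises in step with live sessions is expected behavior; heap that rises while sessions stay flat is the case worth taking to the heap snapshot comparison.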

Find Leaking Candidates

Production-friendly heap-based analysis

– Comparison of heap snapshots shows suspected leak candidates
– Classname filters
– Application class that appears to have some growth
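
The snapshot comparison can be pictured as a per-class difference of live instance counts, restricted by a class-name filter. The snapshot format below (a plain dictionary of class name to instance count) and the `leak_candidates` helper are hypothetical; the product's own snapshot format is not described in this deck.

```python
def leak_candidates(before, after, class_prefix="", min_growth=1):
    """Compare two heap snapshots and list classes whose instance count grew.

    before, after: {class_name: live_instance_count} taken at two points in time.
    class_prefix:  optional class-name filter, e.g. an application package.
    Returns [(class_name, growth), ...] sorted by growth, largest first.
    """
    growth = []
    for cls, count in after.items():
        if not cls.startswith(class_prefix):
            continue
        delta = count - before.get(cls, 0)
        if delta >= min_growth:
            growth.append((cls, delta))
    return sorted(growth, key=lambda item: item[1], reverse=True)
```

Filtering on the application's own package keeps JDK classes such as `java.lang.String`, which grow in almost every comparison, from hiding the application class that is actually accumulating.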

Zero in on leaking code

View suspected classes and allocating methods

– Each ‘allocation pattern’ uniquely identifies a set of heap objects of the same class, allocated by the same request type, and from the same point in the application code
– Indicates the specific point in the application code where this object set was allocated from

Zero in on leaking code (scroll from previous page)

View suspected classes and allocating methods

– Each ‘allocation pattern’ uniquely identifies a set of heap objects of the same class, allocated by the same request type, and from the same point in the application code
– Additional code and GC performance details help developers isolate leak and optimize JVM
– Large number of surviving objects since last GC

View References to Live Objects

Confirm Allocating Class

– Helps pinpoint why objects in question are not getting garbage collected
– Also shows other objects on the heap which contain references to the set of objects being analyzed.
– Allocating method and line number in the code

Questions

Thank You
