Post on 31-Mar-2015
IBM Tivoli
JVM Monitoring – Best Practices
Steve Klopfer
Technical Specialist, IBM
scklopf@us.ibm.com
Definitions
Monitoring – Observing performance data in real time to find and correct resource, throughput, or response time problems.
Trending – The analysis of data with the intention of identifying discernible patterns.
Forecasting – The projection of those identified patterns on business growth patterns to understand the impact on business processes.
Capacity planning – The response to forecasts that ensures the integrity of business processes.
IBM Software Group | Tivoli software
Capacity/Load model
Typical WAS/J2EE Application Components
[Diagram: component interactions. Labels recovered from the figure:]
– Customer Transactions
– HTTP Server plugin
– CPU (AIX, Solaris, Windows)
– Production JVM (AIX, AS400, HP-UX, Linux, Solaris, Unix, Windows, OS/390, z/OS)
– Application Server: Thread Pool, EJB Pools, JDBC Pools, Memory Management, J2EE Services, File and Network I/O
– J2EE Application (J2EE components): Servlet, EJB
– Back-end connectors: CICS Transaction Gateway, MQSeries Connector, JDBC Driver
– Back-end systems: Mainframe, Database
© 2007 IBM Corporation. ITCAM for WebSphere and ITCAM for J2EE Version 6.1
What Kinds of Problems Does JVM Monitoring Help Solve?
Request / Transaction problems
– Slow or hung requests
– Intermittent performance problems
– Correlation to remote EJB containers, CICS, IMS, MQ
Real-time diagnostics
– In-flight request search and diagnosis capability with Java stack traces and thread dumps in real time
Memory leaks
– Monitor JVM heap size, memory usage, and garbage collection patterns
– Heap snapshots
Resource monitoring
– Connection pools, JDBC, thread pool, etc.
Non-intrusive diagnostic data collection for key application components
– JMS, SCA, Portlets (ITCAM for WS only), Web Services, etc.
Problem situation automation
– Alerts and traps for hard-to-re-create problems, with problem context for later diagnosis
Problem re-creation
– Provides production data for hard-to-re-create problems via integration with Rational Performance Tester (RPT) and IBM Performance Optimization Toolkit (IPOT)
How is it doing today and how will it do tomorrow?
– Historical and trending reports
Questions to Ask When Troubleshooting
Is the problem re-creatable?
Did it ever work? If it did, what changed (configuration, additional installation, product upgrade, etc.)?
Does the environment matter, e.g. works in test/development but not in production?
What is the topology of the environment?
– What external systems are involved?
– Any connectivity (firewall) or security (authentication, expired passwords) issues?
Are there any workload considerations?
– Is the problem happening under heavy workloads?
– Network or bandwidth issues?
Is there a pattern to the problem, e.g. every Monday morning at 10 AM?
What must a good monitoring product do?
A clever person solves a problem. A wise person avoids it. -- Einstein
• It must monitor the environment 24 X 7.
• Real-time visualization tools are not adequate unless you plan on having highly paid analysts monitoring these tools 24 X 7.
• It must support intelligent alerting.
• Alerting tools must acquire and correlate metrics from multiple sources.
• It must exhibit a depth of monitoring across the breadth of technologies that spans, at minimum, end-user experience (both real and synthetic), application servers, and database servers.
Monitoring Levels
Vertical levels, not horizontal levels
Monitoring On Demand
– Change monitoring level as needed without restarting either the applications or the application servers
– No need to pinpoint specific classes or methods in advance (i.e., no need to designate what needs to be monitored)
“Level 1” – Request Level – Production
– 100% of system resource information
– 100% of incoming requests/transactions
“Level 2” – Component Level – Problem Determination
– View major application events (EJBs, servlets, JDBC, JNDI, etc.)
“Level 3” – Method Level – Tracing
– Adds method trace information for problem determination and performance analysis
Using the Tool Efficiently
Everyone assumes they need method-level data for every transaction in Production
– What would you do with that much data?
Gain application/transaction understanding in Test/QA, workload understanding in Production
Use Traps and Alerts to find anomalies and collect detailed data
Test/QA: use L2/L3 for transaction/application analysis
– Top Methods Used (L3)
– Most CPU-Intensive Methods (L3)
– Top Slowest Methods (L3)
– Transaction Component (L2) Trace
– Transaction Method (L3) Trace
– SQL Profile (L2)
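The idea of running at a low level in Production and letting a trap collect detail only when an anomaly appears can be sketched as below. This is a minimal illustration and not the product's API: `MonitoringLevel`, `ResponseTimeTrap`, and the 2000 ms threshold are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class MonitoringLevel(IntEnum):
    """Hypothetical model of the three levels described in this deck."""
    L1_REQUEST = 1    # production: system resources and all requests
    L2_COMPONENT = 2  # problem determination: J2EE component events
    L3_METHOD = 3     # tracing: method-level detail


@dataclass
class ResponseTimeTrap:
    """Fires when a request exceeds a response-time threshold (assumed rule)."""
    threshold_ms: float
    escalate_to: MonitoringLevel = MonitoringLevel.L2_COMPONENT

    def evaluate(self, response_ms: float, current: MonitoringLevel) -> MonitoringLevel:
        # Escalate only on an anomaly, and never lower the current level.
        if response_ms > self.threshold_ms:
            return max(current, self.escalate_to)
        return current


trap = ResponseTimeTrap(threshold_ms=2000.0)
level = MonitoringLevel.L1_REQUEST
for observed_ms in (120.0, 340.0, 2750.0):  # made-up response times
    level = trap.evaluate(observed_ms, level)
```

Only the third request crosses the threshold, so detailed collection starts there rather than for every transaction.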
Application Performance Analysis
Work with Defined Objectives
Throughput / Response time goals from SLAs
Identify and Fix any Performance Problems Early
Slow Transactions, Memory Leaks, WebSphere Performance Tuning
Best Practices for Performance Tuning and Analysis
Collect the information about the applications and the environment.
Identify Key Transactions
Conduct Transaction Profiling
Conduct Workload Profiling
Measure baseline metrics for the key performance parameters before tuning
Leverage your tools in conjunction with load testing tools to analyze and tune application performance
Focus on Best Practices
Identify all key transactions in the workload mix
– Most frequently used
– Most important to application
– Set a workable limit, e.g. 10-20
Conduct Transaction Profiling to obtain basic understanding of what these key transactions do
– Code Flow (component and method level)
– Component Profile
– Method Profile
– Event timings for each component and method
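Picking the most frequently used transactions, capped at a workable limit, might look like the sketch below. The request names are hypothetical, and frequency covers only the first criterion; which transactions are most important to the application remains a business judgment.

```python
from collections import Counter


def key_transactions(request_log, limit=20):
    """Return the `limit` most frequent transaction names, most frequent first."""
    counts = Counter(request_log)
    return [name for name, _ in counts.most_common(limit)]


# Hypothetical request names, one entry per observed request.
sample_log = ["/login", "/search", "/search", "/checkout", "/search", "/login"]
top = key_transactions(sample_log, limit=2)
```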
Transaction Profiling
Transaction Profiling refers to tracing the entire execution of a selected request (HTTP or EJB invocation)
Normally the best practice is to prepare a single-user automated test script that fires off such transactions with a think time in between invocations
At L2 monitoring level, the data is shown at J2EE component level with contextual data
– JSP, EJB, JMS, MQI, JDBC, JNDI
At L3, full application class/method trace will be collected by default
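A single-user script with think time can be as simple as the loop below. This is a sketch under stated assumptions: `run_single_user` is a hypothetical helper, and the transaction is any callable standing in for an HTTP or EJB invocation.

```python
import time


def run_single_user(transaction, iterations, think_time_s,
                    sleep=time.sleep, clock=time.perf_counter):
    """Invoke `transaction` repeatedly, pausing between calls; return seconds per call."""
    timings = []
    for i in range(iterations):
        start = clock()
        transaction()
        timings.append(clock() - start)
        if i < iterations - 1:
            sleep(think_time_s)  # think time between invocations only
    return timings


# Stand-in workload; a real script would issue the selected request here.
timings = run_single_user(lambda: sum(range(1000)), iterations=3, think_time_s=0.01)
```

The `sleep` and `clock` parameters are injectable so the script can be exercised without real delays.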
Workload Analysis
Workload Analysis refers to running the applications via a Traffic Simulator with a number of clients
Monitoring Tool is normally running at L1 for this type of analysis, with a sampling rate under 10%
Normally the best practice is to prepare a multi-user automated test script that fires off transactions in the right mix that represents the ‘production’ workload
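One way to generate requests "in the right mix" is weighted random selection, sketched below. The transaction names and their shares are hypothetical; a real mix would come from production request counts.

```python
import random


def build_workload(mix, total_requests, seed=0):
    """Draw `total_requests` transaction names according to the weights in `mix`."""
    rng = random.Random(seed)  # seeded so a run can be repeated exactly
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=total_requests)


# Hypothetical production mix: share of each transaction type.
production_mix = {"browse": 0.70, "search": 0.25, "checkout": 0.05}
schedule = build_workload(production_mix, total_requests=1000)
```

Seeding the generator keeps repeated runs comparable, which matters when a run is repeated after tuning.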
Workload Analysis
Each run should be at least 30-60 minutes long to observe the system at Steady State
During steady state, analysis can be conducted on a large number of metrics:
– Heap, CPU, paging, throughput, response time, WebSphere resource pools, GC activities, etc.
At the end of the run, a graph of CPU% vs. Throughput Rate should be plotted. Any non-linearity in the behavior of the workload should be explained, bottlenecks eliminated, and the run repeated until a relatively linear line is obtained
More reports can be drawn from Performance Analysis & Reporting (PAR)
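The CPU% vs. throughput linearity check described above can be approximated numerically with a least-squares fit and its R² value. This is a sketch only: the sample data are made up, and what counts as "relatively linear" is a judgment call rather than a fixed threshold.

```python
def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus R^2 as a linearity score."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical steady-state samples: throughput in requests/s and CPU%.
run_a = ([10, 20, 30, 40, 50], [12, 22, 32, 42, 52])  # CPU rises in step with load
run_b = ([10, 20, 30, 35, 36], [12, 22, 32, 60, 95])  # throughput tops out, CPU keeps climbing

_, _, r2_a = linear_fit(*run_a)
_, _, r2_b = linear_fit(*run_b)
```

A run like `run_b` scores noticeably lower than `run_a`, which is the cue to look for a bottleneck before repeating the run.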
Additional Performance Tuning Tips - 1
Here are a few other things that we can try to help improve performance. Please note that these suggestions are given without detailed knowledge of the environment / architecture / open issues.
Increase web container max keep-alives.
Increase web container thread pool.
Increase database connection pool.
Adjust maximum and minimum heap sizes.
Disable explicit garbage collection.
Enable concurrent I/O at the OS level.
Pre-compile JSPs.
Increase the priority of the app server process at the OS level.
Additional Performance Tuning Tips - 2
If there are many short-lived objects, tuning the NewSize and MaxNewSize JVM parameters may help.
Changing ulimit for the operating system (AIX, Solaris) may help improve performance.
Enable dynamic caching, if possible.
Creating new indexes or reorganizing existing indexes can improve performance of database-intensive transactions.
Adjusting prepared statement cache size may also help.
Adjust OS parameters: tcp_time_wait_interval and tcp_fin_wait_2_flush_interval.
Example: Workload Analysis
Check Environmental Consistency
Ensure Platform Can Support Application
– Verify System, Java and App Server Runtime Environment
Check Server Statistics
Compare key performance metrics side-by-side
– Shows paging and load balancing in clustered deployments
– Ensures overall throughput matches expected results from load generator
– Quick overview of application impact on monitored servers
Validate Throughput vs Response Time
Quantify Application Scalability
– Correlated plot of response time during stress test relative to request rate
– Graphical report showing number of requests over time
Calculate Throughput vs. JVM CPU%
Verify target transaction per second rate achievable
– Request rate during stress run (same as prior slide)
– Correlated plot reveals low JVM CPU consumption even as throughput increases
Throughput vs. Garbage Collection (GC)
Tune JVM to minimize GC frequency
– Request rate during stress run
– GC frequency not in steady state as throughput rises
– Increased heap size impacting GC rate, although <= 6 per minute appears to be affordable as response time remains < 34 ms
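The GC-frequency observation above can be checked mechanically by bucketing GC events per minute and flagging minutes over a budget. This is a sketch: the timestamps are made up, and the default budget of 6 per minute is taken from this slide's example rather than being a general rule.

```python
from collections import Counter


def gc_cycles_per_minute(gc_timestamps_s):
    """Count GC cycles in each one-minute bucket, keyed by minute index."""
    return Counter(int(t // 60) for t in gc_timestamps_s)


def minutes_over_budget(gc_timestamps_s, max_per_minute=6):
    """Return the minute indexes whose GC count exceeds the budget."""
    per_minute = gc_cycles_per_minute(gc_timestamps_s)
    return sorted(m for m, n in per_minute.items() if n > max_per_minute)


# Hypothetical GC start times, in seconds since the start of the run.
timestamps = [5, 20, 41, 58, 63, 70, 75, 82, 90, 97, 104, 111, 130, 170]
hot_minutes = minutes_over_budget(timestamps)
```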
Throughput vs. Total GC time
Avoid paging (has large effect on end user response time)
– Request rate ramps and tops out
– Excessive and persistently high total GC time
– Total time for GC to complete per cycle correlated with request rate
Throughput vs. Heap size after GC
Good indicator of potential memory leaks
– Request rate during stress run (same as prior slide)
– Shows a well-tuned heap size: little if any growth during high throughput
– No growth in heap under increased load indicates no detectable leaks
WebSphere Resources Utilization Analysis
Verify application does not over-tax app server resources
– Saturated thread pool: good candidate for tuning
– Overall we see low J2EE resource consumption
Check Average CPU time per Transaction
Based on threads running application classes in workload mix
– Spikes showing high consumption at random intervals
– Otherwise normal consumption rates
Check Average CPU time per Transaction
Based on threads running application classes in workload mix
– Transaction with very high CPU in spike interval
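Spotting spike intervals in average CPU time per transaction can be sketched as a comparison against the median interval. The figures are hypothetical, and the factor of 3 is an assumed sensitivity setting, not a recommended value.

```python
from statistics import median


def cpu_spikes(cpu_ms_totals, transaction_counts, factor=3.0):
    """Flag intervals whose average CPU per transaction exceeds factor x the median."""
    averages = [
        cpu / count if count else 0.0
        for cpu, count in zip(cpu_ms_totals, transaction_counts)
    ]
    baseline = median(averages)
    spikes = [i for i, avg in enumerate(averages) if avg > factor * baseline]
    return averages, spikes


# Hypothetical per-interval totals: CPU milliseconds and completed transactions.
cpu_ms = [500, 520, 4800, 510, 495]
counts = [100, 104, 96, 102, 99]
averages, spikes = cpu_spikes(cpu_ms, counts)
```

The flagged interval is where to look for the individual transaction with very high CPU.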
Example: Transaction Analysis Methodology
Analyze Transaction Instances of Interest
Show “Level 2” J2EE component-level events
– Sequential view of event execution / flow
– High-precision timing measurements for each event call
– Highlighted JCA calls exhibit high delta CPU timing difference
Further Analyze Transactions
Show discrete “Level 3” method-level and nested method events
– Each row shows method flow and depth
– Good candidate for tuning due to high delta CPU consumption
Analyze SQL Profile
• Check the response time for various queries.
• Use the data in conjunction with the Top used queries report. Tune queries.
Check for Top Methods Used
Identify hot methods by count
– Names of hot methods
– Total Invocation Count
Check for Most CPU-Intensive Methods
Correlate hot methods by CPU cost with highest count methods
– Names of hot methods
– CPU consumption for each method
Check for Slowest Methods
Correlate with hot methods to evaluate total contribution to response time
– Names of slow methods
– High average response time per method
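Correlating slow methods with hot methods amounts to estimating each method's total contribution as invocation count times average time. The sketch below illustrates that arithmetic; the method names and figures are hypothetical.

```python
def rank_by_contribution(method_stats):
    """Sort methods by estimated total time (count x average ms), largest first."""
    totals = {
        name: count * avg_ms
        for name, (count, avg_ms) in method_stats.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical methods with (invocation count, average response time in ms).
stats = {
    "OrderDao.findById": (12000, 2.0),
    "ReportBuilder.render": (15, 900.0),
    "PriceCache.lookup": (90000, 0.1),
}
ranking = rank_by_contribution(stats)
```

In this made-up data the slowest method per call is not the largest contributor overall, which is why the two reports are read together.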
Example: Memory Leak Analysis
Memory Analysis Reporting
Quick check to detect presence of a leak
– Upward slope indicates possibility of a “slow” memory leak
– Constant request rate correlated with JVM Heap Size
Memory Leak: Avg. Heap Size after GC vs. Requests
• Average Heap Size after GC vs. Number of Requests:
– Verify that a leak exists with the Avg. Heap Size After GC graph.
– Check to see if it is due to an increasing number of requests.
To access this feature: Select PROBLEM DETERMINATION -> Memory Diagnosis -> Memory Analysis -> Change Metrics.
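The two checks above (is heap after GC growing, and is the growth explained by rising request volume) can be sketched as a pair of slope tests. The sample data and both thresholds are assumptions for illustration; real values depend on the application.

```python
def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def leak_suspected(minutes, heap_after_gc_mb, requests, min_growth_mb_per_min=0.5):
    """Suspect a leak when heap after GC keeps growing while request volume stays flat."""
    heap_growth = slope(minutes, heap_after_gc_mb)
    request_growth = slope(minutes, requests)
    avg_requests = sum(requests) / len(requests)
    # "Flat" load: request trend under 1% of the average per minute (assumed cutoff).
    load_is_flat = abs(request_growth) < 0.01 * avg_requests
    return heap_growth > min_growth_mb_per_min and load_is_flat


# Hypothetical samples taken every 10 minutes.
minutes = [0, 10, 20, 30, 40, 50]
heap_mb = [210, 222, 231, 245, 256, 268]
requests = [1000, 1010, 995, 1005, 990, 1000]
suspect = leak_suspected(minutes, heap_mb, requests)
```

The same comparison applies with live sessions in place of request counts.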
Memory Leak: Average Heap Size after GC vs. Live Sessions
• Average Heap Size after Garbage Collection (GC) vs. Live Sessions:
– Verify that a leak exists with the Avg. Heap Size After GC graph.
– Check to see if it is due to an increasing number of users.
To access this feature: Select PROBLEM DETERMINATION -> Memory Diagnosis -> Memory Analysis -> Select Metrics.
Find Leaking Candidates
Production-friendly heap-based analysis
– Comparison of heap snapshots shows suspected leak candidates
– Classname filters
– Application class that appears to have some growth
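Comparing two heap snapshots to find growth candidates can be sketched as a per-class difference with a class name filter. The snapshot format (class name mapped to instance count), the class names, and the growth threshold are all assumptions made for this example, not the tool's data model.

```python
def leak_candidates(before, after, min_growth=1000, class_filter=""):
    """Compare two {class name: instance count} snapshots and list classes that grew."""
    grown = []
    for cls, count_after in after.items():
        if class_filter and not cls.startswith(class_filter):
            continue  # skip classes outside the filter, e.g. JDK classes
        growth = count_after - before.get(cls, 0)
        if growth >= min_growth:
            grown.append((cls, growth))
    return sorted(grown, key=lambda item: item[1], reverse=True)


# Hypothetical snapshots taken some time apart under steady load.
snapshot_1 = {
    "com.example.shop.CartItem": 4000,
    "java.lang.String": 90000,
    "com.example.shop.Session": 300,
}
snapshot_2 = {
    "com.example.shop.CartItem": 19000,
    "java.lang.String": 93000,
    "com.example.shop.Session": 310,
}
candidates = leak_candidates(snapshot_1, snapshot_2, class_filter="com.example.")
```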
Zero in on leaking code
View suspected classes and allocating methods
– Each ‘allocation pattern’ uniquely identifies a set of heap objects of the same class, allocated by the same request type, and from the same point in the application code
– Indicates the specific point in the application code where this object set was allocated
Zero in on leaking code (scroll from previous page)
View suspected classes and allocating methods
– Each ‘allocation pattern’ uniquely identifies a set of heap objects of the same class, allocated by the same request type, and from the same point in the application code
– Additional code and GC performance details help developers isolate leak and optimize JVM
– Large number of surviving objects since last GC
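The ‘allocation pattern’ definition above is effectively a grouping key made of three parts: class, request type, and allocation point. The sketch below shows that grouping on hypothetical records; the field names and values are invented for illustration and do not reflect the tool's data format.

```python
from collections import defaultdict


def group_by_allocation_pattern(objects):
    """Group heap object ids by (class, request type, allocation site)."""
    patterns = defaultdict(list)
    for obj in objects:
        key = (obj["class"], obj["request"], obj["site"])
        patterns[key].append(obj["id"])
    return dict(patterns)


# Hypothetical heap object records.
heap_objects = [
    {"id": 1, "class": "CartItem", "request": "/cart/add", "site": "CartService.add:42"},
    {"id": 2, "class": "CartItem", "request": "/cart/add", "site": "CartService.add:42"},
    {"id": 3, "class": "CartItem", "request": "/checkout", "site": "Checkout.copy:17"},
]
patterns = group_by_allocation_pattern(heap_objects)
```

Objects of the same class fall into different patterns when they come from different request types or code locations, which is what narrows a leak to one allocation site.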
View References to Live Objects
Confirm Allocating Class
– Helps pinpoint why objects in question are not getting garbage collected
– Also shows other objects on the heap which contain references to the set of objects being analyzed
– Allocating method and line number in the code
Questions
Thank You