performance diagnostics using statspack data 18-may 2006 tim gorman sagelogix, inc. n. ca oracle...
TRANSCRIPT
Performance Diagnostics using STATSPACK data
18-May 2006
Tim GormanSageLogix, Inc.
N. CA Oracle Users GroupN. CA Oracle Users Group
AgendaAgenda
• Configuring STATSPACK optimally• About the STATSPACK repository• Analysis using the STATSPACK repository
Configuring STATSPACKConfiguring STATSPACK
What’s missing from the standard installation script “spcreate.sql” in “$ORACLE_HOME/rdbms/admin”?
Purging
Configuration of data sampling levels and thresholds
Purging STATSPACK dataCan’t retain data forever, after all…
Recommendation: retain at least one major business-cycle of performance data
Standard script “sppurge.sql” must be run manually…Recommendation: Script “sppurpkg.sql” available at http://www.EvDBT.com/tools.htm
EXEC SPPURPKG.PURGE(14)
Oracle10g finally includes a procedure providing similar functionalityEXEC STATSPACK.PURGE(TRUNC(SYSDATE - 14), TRUE)
Configuring STATSPACKConfiguring STATSPACK
Configuration of data sampling levels and thresholdsSnap Level
Thresholds
Snap level determines which information is gatheredBasic database performance statistics
SQL statement activity
SQL execution plans
Segment-level I/O statistics
Latch details
Thresholds keep too much information from being gatheredSome gathered information can scheduled below specified thresholds
SQL statement activity
Segment-level I/O statistics
Snap LevelSnap Level
Level 0 (i.e. any value < 5) Database performance statistics only
Level 5 Level 0 plus SQL execution info Default level of data gathering for all versions of STATSPACK
Recommended level for Oracle8i and below..
Level 6 (introduced in Oracle9i) Level 5 plus SQL Plan info
Level 7 (introduced in Oracle9i) Level 6 plus segment-level usage info
Recommended level for Oracle9i and above…
Level 10 Level 5, 6, and 7 plus detailed parent/child latch statistics
MetaLink note #149121.1 Gathering a STATSPACK Snapshot
Data gathering thresholdsData gathering thresholds
SQL statements are gathered if any of these thresholds are exceeded
Threshold Default value For busier systems…
Number of executions 100 >= 1,000
Number of parse calls 1,000 same
Number of disk reads 1,000 >= 10,000
Number of buffer gets 10,000 >= 100,000
Size of sharable memory 1 Mbyte same
Version count 20 same
Data gathering thresholdsData gathering thresholds
Segment-level statistics are gathered if any of these thresholds are exceeded
Threshold Default value For busy systems, recommended value
Number of physical reads 1,000 Same
Number of logical reads 10,000 Same
Number of “buffer busy waits”
100 Same
Number of row-lock waits 100 Same
Number of ITL waits 100 1
Number of CR/CU blocks shipped (RAC)
1,000 Same
Data gathering thresholdsData gathering thresholds
Metalink note #153507.1 - Oracle Applications and STATSPACK
Contains good suggested threshold values for a busy and complex database
But the note does not suggest a good method for setting the parameters
SQL*Plus script “spparms.sql” (found online at http://www.EvDBT.com/tools.htm) provides code for updating SQL thresholds in the STATS$STATSPACK_PARAMETER table
Uses settings suggested in the MetaLink note…
Also sets SNAP_LEVEL…
Using STATSPACKUsing STATSPACK
But using STATSPACK isn’t all about the reports provided by Oracle
The single main report requires some advance knowledge of a time period in which problems are occurring…
Keep in mind that there is an amazing repository of information that can used for broad general analysis as well as specific targeted investigations
Think of some interesting questions that arise during a troubleshooting session
What changed between then and now?
Is the current behavior an anomaly or normal for this environment?
How much of resource XXX are we using? How much have we used over time?
Using STATSPACKUsing STATSPACK
In general, STATSPACK data will not provide a specific answer to a specific solution
It is too high-level, too aggregated
However, it can provide a general ideaEnough to zero in on what should be examined more closely
So…..Use STATSPACK data to:
Generalize the problem(s)
Eliminate irrelevancies (!!!!)
View trends from a high level
Use extended SQL Tracing (event 10046, level > 1) to:Examine individual processes minutely
Determine exactly what is happening in a specific process
STATSPACK repositorySTATSPACK repository
Number of tables has expanded with each versionAbout 30 tables in Oracle8i
About 40 tables in Oracle9i
About 55 tables in Oracle10gNot counting “control tables” used by STATSPACK itself
Each of these tables can be considered a FACT table in a subject area of a dimensional data model
Lone dimension is STATS$SNAPSHOT“time” dimension
Each of the “fact” tables in the repository are keyed by SNAP_ID, which can be translated to SNAP_TIME by joining to STATS$SNAPSHOT
SNAP_ID, DBID, INSTANCE_NUMBER
“snapshots” and cumulative data
“snapshots” and cumulative data
Each time the packaged procedure STATSPACK.SNAP is run, it captures the current values in the V$ views
Stores the current values in the corresponding STATS$ table
SNAP_ID SNAP_TIME VALUE
4355 10-Feb 2005 13:00 875,543,322
4356 10-Feb 2005 14:00 875,543,421
4357 10-Feb 2005 15:00 875,648,888
4358 10-Feb 2005 16:00 733
4359 10-Feb 2005 17:00 3,321,333
“snapshots” and cumulative data
“snapshots” and cumulative data
The standard STATSPACK reportCalculates the “difference” or “deltas” between any two “snapshots” using PL/SQL logic
But this type of data prevents reporting and analysis across many snapshots
Cannot simply summarizeThe cumulative data is not additive
Cannot analyze across instance restartsAll statistics are reset to zero after restart
“snapshots” and cumulative data
“snapshots” and cumulative data
Cumulative data needs to be converted into “deltas” somehow…
SNAP_ID SNAP_TIME VALUE VALUE_INC
4355 10-FEB 2005 13:00 875,543,322
4356 10-FEB 2005 14:00 875,543,421 99
4357 10-FEB 2005 15:00 875,648,888 105,467
4358 10-FEB 2005 16:00 733 733
4359 10-FEB 2005 17:00 3,321,333 3,320,600
Analytic windowing functionsAnalytic windowing functions
“LAG()” function to the rescue!
LAG (<expr>, <offset>, <default>)OVER (
PARTITION BY clauseORDER BY clause[ ROWS | RANGE ] windowing clause
)
Creates the concept of a “current row” in relationship to preceding rowsA set of related rows is created with the PARTITION BY, ORDER BY, and windowing clauses
Analytic windowing functionsAnalytic windowing functions
select snap_id,
snap_time,
value,
lag(value, 1, 0) over
(partition by dbid,
instance_number
name
order by snap_id) prev_value
from <table-name>
where …
order by …
Analytic windowing functions
Analytic windowing functions
If current is greater than (or equal to) previousthen use deltaelse use current
If=> decode(greatest(value, lag(value, 1, 0) over (partition by dbid, instance_number, name order by snap_id)Equals => value,Then => value - lag(value, 1, 0) over (partition by dbid, instance_number, name order by snap_id),Else => value)
sp_systime_9i.sqlsp_systime_9i.sql
Written to mimic the top-level logic of the YAPP reporthttp://www.oraperf.com/Response-time = Service-time + Wait-time
Script “sp_systime_9i.sql” uses analytic windowing functions to produce this report
LAG() function to calculate “deltas” between snapshotsRANK() function to find the “top N” calculated “delta” valuesRATIO_TO_REPORT() function to calculate percentages on the returned “delta” values on the whole
The intent of the report is to show, day-by-day or hour-by-hour, where the database instance is spending the most time
sp_systime_9i.sqlsp_systime_9i.sql
Daily breakdown (top 10 time consumers)
Service, Non-Idle Idle, Seconds % of
Day or Wait Name Spent Total
------ -------- ----------------------------------- ------------------- -------
22-AUG Service SQL execution 28,842.92 67.39
Service Recursive SQL execution 4,480.52 10.47
Wait db file sequential read 3,598.26 8.41
Wait db file parallel write 2,441.00 5.70
Wait direct path read 1,096.21 2.56
Wait db file scattered read 1,019.07 2.38
Service Parsing SQL 382.46 0.89
Wait log file parallel write 343.28 0.80
Wait log file sync 179.09 0.42
Wait control file parallel write 111.12 0.26
sp_systime_9i.sqlsp_systime_9i.sql
Hourly breakdown (top 3 time-consumers)
Service, Non-Idle Idle, Seconds % of
Day Hour or Wait Name Spent Total
------ ------ -------- ----------------------------------- ------------------- -------
28-AUG 17:00 Service SQL execution 270.65 84.62
Wait log file sync 10.87 3.40
Service Parsing SQL 7.16 2.24
18:00 Service SQL execution 189.92 82.56
Wait log file sync 9.79 4.26
Wait db file sequential read 5.70 2.48
19:00 Service SQL execution 106.07 81.84
Wait log file sync 4.85 3.74
Wait db file sequential read 4.46 3.44
See listing…
top_stmt4_9i.sqltop_stmt4_9i.sql
Latest in a line of stored procedures to produce a “top N SQL statements” report
Can be sorted by:logical I/Os + (100 * physical I/Os)
elapsed time
logical I/Os
physical I/Os
Each of these can be cumulative for the time period sampled or per execution
top_stmt4_9i.sqltop_stmt4_9i.sql
Beginning Snap Time: 11/29/04 11:00:03 Page 1
Ending Snap Time : 12/01/04 10:00:02 Nbr of Snapshots: 48
Date of Report : 12/01/04 10:55:30 Total Logical Reads: 580,110,532
Total Physical Reads: 2,816,050
.
Module: " ? @ihe3 (TNS V1-V3)"
.
SQL Statement Text (Hash Value=1397201706)
------------------------------------------
0 SELECT level, series_id, parent_id, series_name, type,
1 constraint_flag, facilitator_page, series_text_3, display_order
2 _num, master_series_id, series_keyword, instructor_id FROM c
3 m_series_instance WHERE reg_code = :reg_code AND type =
4 :block_type AND status = 'active' START WITH pa
5 rent_id = 0 CONNECT BY parent_id = PRIOR series_id
.
: Disk Buffer Cache Hit DR Per BG Per CPU Per Ela Per
: Reads Gets Ratio Runs Run Run Run Run
: ----- ------ --------- ---- ------ ------ ------ ------
: 2,040,353 253,437,801 99.19% 3,071 664 82,526 1.47 8.06
: (72.454%) (43.688%)
top_stmt4_9i.sqltop_stmt4_9i.sql
. SQL execution plan from "11/29/04 11:00:03" (snap #481)
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------------------
|* 0 | SELECT STATEMENT | | | | |
|* 1 | FILTER | | | | |
|* 2 | CONNECT BY WITH FILTERING | | | | |
|* 3 | NESTED LOOPS | | | | |
|* 4 | INDEX RANGE SCAN | CM_SERIES_INSTANCE_IDX4 | 69 | 207 | 1 (0)|
|* 5 | TABLE ACCESS BY USER ROWID | CM_SERIES_INSTANCE | | | |
|* 6 | NESTED LOOPS | | | | |
|* 7 | BUFFER SORT | | 69 | 8418 | |
|* 8 | CONNECT BY PUMP | | | | |
|* 9 | TABLE ACCESS BY INDEX ROWID| CM_SERIES_INSTANCE | 69 | 8418 | 2 (50)|
|* 10 | INDEX RANGE SCAN | CM_SERIES_INSTANCE_IDX4 | 69 | | 1 (0)|
------------------------------------------------------------------------------------------------
See listing…
Additional scriptsAdditional scripts
Generic reporting enablementsp_delta_views.sql
Based on STATS$SQL_SUMMARY and STATS$SQL_PLANtop_stmt4_10g.sql, top_stmt4_9i.sql, top_stmt4_8i.sql, run_top_stmt4.sqlsphistory.sql
Based on STATS$SEG_STAT & STATS$SEG_STAT_OBJsp_buffer_busy_waits.sql, sp_itl_waits.sql, sp_row_lock_waits.sql, sp_gc_waits.sql
Based on STATS$PARAMETERsp_parm_changes.sql
Based on STATS$SYSSTATsptrends.sql
Based on STATS$SYSTEM_EVENTsp_evtrends.sql
Based on STATS$LATCH_MISSESsp_latch_misses.sql
General methodsGeneral methods
Start with “sp_systime” report for initial high-level analysis on a day-by-day basis, with hour-by-hour detail
Whatever consumes the most time, follow up with other analysis scripts
If “SQL execution”, “Recursive SQL execution”, “db file … read” events are significant, find “top N SQL” using TOP_STMT4 report
If “latch free”, drill down with “sp_latch_misses.sql”
If “buffer busy waits”, drill down with “sp_buffer_busy_waits.sql”
If “row lock waits”, drill down with “sp_row_lock_waits.sql”
If RAC (GC) waits, drill down with “sp_gc_waits.sql”
If ITL waits, drill down with “sp_itl_waits.sql”
If “redo *” wait-events, check out “sp_avg_redo_write_time.sql”
For trending on other wait events, use “sp_evtrends.sql” script
For trends on statistics, use “sptrends.sql” script
Northern CA Oracle Users Group
Northern CA Oracle Users Group
Questions? Discussion?
Please fill out the evaluation forms?
[email protected] Website: http://www.SageLogix.com/
Scripts and presentation can be downloaded fromhttp://www.EvDBT.com/papers.htmhttp://www.EvDBT.com/tools.htm
Else, email me… :-)