Troubleshooting Complex Performance Issues – Oracle SEG$ Contention
TRANSCRIPT
gluent.com 1
Troubleshooting Another Complex Performance Issue (Part 4)
Tanel Põder, Gluent Inc.
http://gluent.com | http://blog.tanelpoder.com
gluent.com 2
Intro: About me
• Tanel Põder
• Oracle Database Performance geek (18+ years)
• Exadata Performance geek
• Linux Performance geek
• Hadoop Performance geek
• CEO & co-founder: Gluent, Inc.
Expert Oracle Exadata book
(2nd edition is out now!)
Instant promotion
gluent.com 3
Gluent: All Enterprise Data Available in Hadoop!
[Diagram: Gluent connects enterprise data sources (Oracle, MSSQL, Teradata, IBM DB2) and Big Data Sources to Hadoop, serving apps X, Y and Z]
gluent.com 4
A refresher
gluent.com 5
Your task is slow because
It is doing too many operations
or
The operations complete too slowly
or both
gluent.com 6
Measure & Profile!
1. Top SQL & wait events
2. Performance counters
3. Oracle kernel functions
ASH, sqlmon, sql_trace, snapper, pstack, perf
gluent.com 7
Let's get started!
gluent.com 8
A Data Warehouse data loading/preparation
• A large Exadata / RAC / Oracle 11.2.0.3 reporting environment
• Lots of parallel CTAS and direct path loads
• High concurrency, high parallelism
• Throughput bad, all kinds of waits showing up:
  • Other, Cluster, Configuration, Application, Concurrency, User I/O, CPU
gluent.com 9
Parallel Degree 48, 96+, inter-instance PX
• Mostly CREATE TABLE AS SELECT statements!
• Creating pre-aggregated tables, etc.
gluent.com 10
Drilldown with ASH data – instance-‐wide
• Wait event names are more informative than wait classes:
SQL> @ashtop session_state,wait_class,event session_type='FOREGROUND' sysdate-1/24 sysdate

%This  SESSION  WAIT_CLASS      EVENT                             AAS  Seconds
------ -------- --------------- ------------------------------ ------ --------
   32% ON CPU                                                    40.6   146197
   13% WAITING  Cluster         gc buffer busy acquire           16.8    60559
   11% WAITING  Configuration   enq: TX - allocate ITL entry     13.9    50115
    9% WAITING  Cluster         gc buffer busy release           11.3    40739
    8% WAITING  Scheduler       resmgr:cpu quantum                9.9    35711
    4% WAITING  Application     enq: TX - row lock contention     5.2    18798
    3% WAITING  User I/O        direct path read temp             4.3    15400
    3% WAITING  User I/O        cell smart table scan             3.7    13259
    2% WAITING  Concurrency     buffer busy waits                 3      10735
    2% WAITING  Other           enq: CF - contention              2.1     7443
    1% WAITING  Application     enq: TM - contention              1.7     6109
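The AAS column above falls straight out of how ASH works: active sessions are sampled roughly once per second, so the sample count for an event approximates seconds of database time, and Average Active Sessions is samples divided by wall-clock seconds. A quick sketch verifying the numbers above (the query window is one hour, sysdate-1/24 to sysdate):

```python
def aas(sampled_seconds: int, wall_seconds: int) -> float:
    """Average Active Sessions: ASH sample-seconds over the wall-clock window."""
    return round(sampled_seconds / wall_seconds, 1)

window = 3600  # the ashtop window above: sysdate-1/24 .. sysdate

print(aas(146197, window))  # ON CPU
print(aas(60559, window))   # gc buffer busy acquire
print(aas(50115, window))   # enq: TX - allocate ITL entry
```

The printed values (40.6, 16.8, 13.9) match the AAS column in the ashtop output.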
gluent.com 11
Drilldown with ASH data – one SQL:
• Let's pick one of the slow CTAS statements:
SQL> @ash/dashtop session_state,event,p1text,p1 top_level_sql_id='89apzfvjba3gu' sysdate-2 sysdate

%This  SESSION  EVENT                           P1TEXT                      P1
------ -------- ------------------------------ ------------------- ----------
   36% WAITING  gc buffer busy acquire          file#                        1
   20% WAITING  gc buffer busy release          file#                        1
   10% WAITING  enq: TX - allocate ITL entry    name|mode           1415053316
    8% WAITING  gc buffer busy acquire          file#                        1
    4% WAITING  buffer busy waits               file#                        1
    4% WAITING  enq: TX - allocate ITL entry    name|mode           1415053316
    2% WAITING  buffer busy waits               file#                        1
    2% WAITING  gc buffer busy release          file#                        1
    2% WAITING  enq: TX - row lock contention   name|mode           1415053316
    1% WAITING  latch: ges resource hash list   address             3.3055E+10
    1% ON CPU                                   sleeptime/senderid   268501512
...
Looks like the CTAS statements do directly experience these waits
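The name|mode P1 value (1415053316) shown for the enqueue waits above is just two packed fields: the top two bytes hold the ASCII lock name and the low 16 bits the requested lock mode. A small decoding sketch:

```python
def decode_enq_p1(p1: int):
    """Decode the name|mode P1 value of an 'enq:' wait event.

    The two high bytes hold the ASCII enqueue name; the low 16 bits hold
    the requested lock mode (e.g. 4 = share, 6 = exclusive).
    """
    name = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)
    mode = p1 & 0xFFFF
    return name, mode

print(decode_enq_p1(1415053316))  # ('TX', 4)
```

So the sessions above were requesting the TX enqueue in mode 4 (share) — exactly the signature of ITL-entry waits.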
gluent.com 12
Wait event explanation
• buffer busy waits
  • The buffer is physically available in the local instance, but is pinned by some other local session (in an incompatible mode)
• gc buffer busy acquire
  • Someone has already requested the remote block into the local instance
• gc buffer busy release
  • The block is local, but has to be shipped out to a remote instance first (as someone in the remote instance had requested it first)
• enq: TX - allocate ITL entry
  • Can't change the block as all the block's ITL entries are held by others
  • Can't dynamically add more ITLs (no space in the block)
gluent.com 13
Looks like write-write contention across RAC nodes!
gluent.com 14
Contention on what objects, blocks, what SQL?
• Let's dig into the top wait event:
SQL> @ashtop session_state,wait_class,p1,p2 "event='gc buffer busy acquire'" sysdate-1/24 sysdate
%This  SESSION  WAIT_CLASS             P1         P2        AAS TotalSeconds
------ -------- -------------- ---------- ---------- ---------- ------------
    8% WAITING  Cluster                 1     279634        1.3         4760
    6% WAITING  Cluster                 1     279635        1.1         3916
    6% WAITING  Cluster                 1     279629        1.1         3838
    5% WAITING  Cluster                 1     279632         .9         3146
    4% WAITING  Cluster                 1     279638         .7         2503
...
SQL> @sed "gc buffer busy"
EVENT_NAME              WAIT_CLASS  PARAMETER1  PARAMETER2  PARAMETER3
----------------------- ----------- ----------- ----------- -----------
gc buffer busy acquire  Cluster     file#       block#      class#
gc buffer busy release  Cluster     file#       block#      class#
gluent.com 15
Which object?
• Translate file#, block# into segment names/numbers:
• Assumes the blocks are in the buffer cache

SQL> SELECT objd data_object_id, COUNT(*) FROM v$bh
     WHERE file#=1 AND block# IN (
       279634,279635,279629,279632,279638,279636,279613,279662,279628,279608,
       279653,279627,279642,279637,279643,279631,279646,279622,279582,279649
     ) GROUP BY objd ORDER BY 2 DESC;

DATA_OBJECT_ID   COUNT(*)
-------------- ----------
             8        113

All blocks belong to a segment with data_object_id = 8
gluent.com 16
What is segment #8?
• Look up the object names:
• Using DBA_OBJECTS.DATA_OBJECT_ID -> OBJECT_ID

SQL> @doid 8

object_id owner     object_name       O_PARTITION   object_type
--------- --------- ----------------- ------------- -----------
        8 SYS       C_FILE#_BLOCK#                  CLUSTER
       14 SYS       SEG$                            TABLE
       13 SYS       UET$                            TABLE

This segment #8 is a cluster which contains SEG$ (DBA_SEGMENTS) and UET$ (DBA_EXTENTS), but UET$ isn't used anymore thanks to LMT tablespaces. That leaves SEG$.
gluent.com 17
Write contention on SEG$?
• SEG$ is modified when:
  1. A new segment is created (table, index, partition, etc.)
  2. An existing segment extends
  3. A segment is dropped / moved
  4. Parallel direct path loads (CTAS) can also create many temporary segments (one per PX slave)
     • …and merge these into the final segment at the end of loading

More about this later…
gluent.com 18
More evidence!
gluent.com 19
AWR 1
• No CPU starvation evident (checked all RAC nodes)
• No CPU starvation evident (checked all RAC nodes)

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                              Waits      Time(s)     (ms)   time Wait Class
---------------------------- ----------- ------------ -------- ------ ----------
DB CPU                                        165,199            37.0
gc buffer busy acquire         1,260,128       68,399       54   15.3 Cluster
enq: TX - allocate ITL entry     354,496       40,583      114    9.1 Configurat
direct path write temp         4,632,946       37,455        8    8.4 User I/O
gc buffer busy release           213,750       22,683      106    5.1 Cluster

Host CPU (CPUs: 160 Cores: 80 Sockets: 8)
~~~~~~~~         Load Average
           Begin       End     %User   %System      %WIO     %Idle
       --------- --------- --------- --------- --------- ---------
            5.20     42.76      26.0       3.3       0.0      69.8
gluent.com 20
AWR 2
• AWR also listed a SEG$ insert as a top SQL:

    Elapsed                Elapsed Time
   Time (s)    Executions  per Exec (s) %Total  %CPU   %IO SQL Id
----------- ------------- ------------- ------ ----- ----- -------------
  100,669.1         4,444         22.65   22.6    .5    .0 g7mt7ptq286u7

insert into seg$ (file#,block#,type#,ts#,blocks,extents,minexts,maxexts,
extsize,extpct,user#,iniexts,lists,groups,cachehint,hwmincr,spare1,
scanhint,bitmapranges) values (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,
:13,:14,:15,:16,DECODE(:17,0,NULL,:17),:18,:19)

• AWR on another node (same insert into seg$):

SNAP_ID NODE SQL_ID          EXECS  AVG_ETIME  AVG_LIO  AVG_PIO
------- ---- ------------- ------ ---------- -------- --------
  24263    3 g7mt7ptq286u7    552     70.227    880.6       .0

Super-slow single row inserts into SEG$ ???
gluent.com 21
SQL Trace confirms SEG$ insert problem:
insert into seg$ (file#,block#,type#,ts#,blocks,extents,minexts,maxexts,
extsize,extpct,user#,iniexts,lists,groups,cachehint,hwmincr,spare1,
scanhint,bitmapranges)
values (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16,
DECODE(:17,0,NULL,:17),:18,:19)

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ----------  ---------
Parse        0      0.00       0.00          0          0          0          0
Execute      1      0.82      75.14          3        773       1508          1
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ----------  ---------
total        1      0.82      75.14          3        773       1508          1

Row Source Operation
---------------------------------------------------
LOAD TABLE CONVENTIONAL (cr=773 pr=3 pw=0 time=75139696 us)
gluent.com 22
SEG$ insert wait events:
Elapsed times include waiting on following events:
  Event waited on                      Times   Max. Wait  Total Waited
  ------------------------------------ Waited  ---------- ------------
  gc buffer busy release                  199        1.01         10.76
  buffer busy waits                       560        1.03          7.02
  resmgr:cpu quantum                       54        0.04          0.08
  enq: TX - row lock contention            93        0.35          4.25
  latch: ges resource hash list           400        0.01          0.98
  gc buffer busy acquire                  464        8.96         36.64
  buffer deadlock                         158        0.00          0.00
  Disk file operations I/O                  5        0.00          0.00
  gc cr block 2-way                       101        0.00          0.01
  gc current block 2-way                    2        0.00          0.00
  cell single block physical read           3        0.00          0.00
  gc current block busy                    10        0.05          0.14
  gc current grant 2-way                    2        0.00          0.00
  enq: TX - allocate ITL entry            316        0.77         13.18
  latch: cache buffers chains             334        0.01          1.45
  ...
Why does a single row insert cause so many waits?
gluent.com 23
Checkpoint: What do we know?
gluent.com 24
Checkpoint
1. Lots of parallel CTAS statements, seem to wait for various RAC gc buffer busy events and enq: TX – ITL contention
2. CTAS statements create new segments, new segments cause inserts into SEG$
3. AWR and SQL Trace report super-‐long elapsed times for single-‐row SEG$ inserts
4. It's actually the recursive SEG$ inserts that wait for the gc buffer busy and enq: TX – ITL contention events
gluent.com 25
"Hypothesis"
Could we be in a cache + transaction layer nested locking ping-‐pong loop?
gluent.com 26
Multiple layers of locking & coherency
[Diagram: an Oracle data block with its KCBH block common header, KTBBH transaction common header holding ITL entries 1-4, KDBH data header, and rows]

Level 0: VOS/service layer
  - Cache buffers chains latch
Level 1: Cache layer
  - Buffer pin (local pin, global cache pin)
Level 2: Transaction layer
  - ITL entry
  - Row lock byte

You must pin (and retrieve) a buffer before you can change or examine it.

What if you pin a buffer and find all its ITL slots busy (with no space to add more)?

The session will release the pin and start waiting on the enq: TX - ITL contention event. And it re-gets/pins the buffer again when the TX - ITL wait is over!
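The pin/release ping-pong described above can be sketched as a toy model. The names here (Block, itl_slots_busy, try_insert) are hypothetical illustration only, not Oracle internals — the point is how a single insert can rack up repeated cache-layer and transaction-layer waits:

```python
# Toy model of the cache-layer / transaction-layer retry loop (hypothetical
# names; this only illustrates the loop, it is not Oracle code).

class Block:
    def __init__(self, itl_slots_busy: int, itl_capacity: int):
        self.busy = itl_slots_busy
        self.capacity = itl_capacity

def try_insert(block: Block, waits: list) -> bool:
    # Level 1: pin the buffer first (this step itself may wait on
    # buffer busy waits / gc buffer busy acquire)
    waits.append("gc buffer busy acquire")
    # Level 2: while holding the pin, look for a free ITL slot
    if block.busy >= block.capacity:
        # No free ITL and no room to grow one: release the pin,
        # wait on the TX enqueue, then re-pin and retry
        waits.append("enq: TX - allocate ITL entry")
        return False
    return True

waits = []
blk = Block(itl_slots_busy=2, itl_capacity=2)  # all ITL slots taken
while not try_insert(blk, waits):
    blk.busy -= 1  # eventually some holding transaction commits
print(waits)
```

Each failed attempt adds one buffer-pin wait and one TX enqueue wait, so long loops of this kind explain why a single-row insert can show dozens of waits.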
gluent.com 27
Raw trace file (edited):
*** 2013-11-18 21:39:09.760
WAIT #1: nam='buffer busy waits' ela= 1023321 file#=1 block#=279627 class#=1 obj#=14
WAIT #1: nam='gc buffer busy release' ela= 25304 file#=1 block#=279627 class#=1 obj#=14
WAIT #1: nam='buffer busy waits' ela= 4315 file#=1 block#=279627 class#=1 obj#=14
WAIT #1: nam='Disk file operations I/O' ela= 10 FileOperation=2 fileno=141 filetype=2
WAIT #1: nam='gc cr block 2-way' ela= 134 p1=141 p2=5456 p3=3447 obj#=0
WAIT #1: nam='enq: TX - allocate ITL entry' ela= 29295 name|mode=1415053316
WAIT #1: nam='latch: ges resource hash list' ela= 958 address=33046235872 number=91
WAIT #1: nam='gc buffer busy acquire' ela= 3384 file#=1 block#=279627 class#=1 obj#=14
WAIT #1: nam='buffer busy waits' ela= 817 file#=1 block#=279627 class#=1 obj#=14
WAIT #1: nam='gc buffer busy acquire' ela= 344 file#=1 block#=192 class#=4 obj#=14
WAIT #1: nam='gc buffer busy acquire' ela= 2096 file#=1 block#=279656 class#=1 obj#=14
WAIT #1: nam='buffer busy waits' ela= 3922 file#=1 block#=279656 class#=1 obj#=14
WAIT #1: nam='gc cr block 2-way' ela= 124 p1=141 p2=28888 p3=3895 obj#=0
WAIT #1: nam='enq: TX - allocate ITL entry' ela= 9260 name|mode=1415053316
WAIT #1: nam='latch: ges resource hash list' ela= 683 address=33043220640
WAIT #1: nam='gc buffer busy acquire' ela= 13696 file#=1 block#=279656 class#=1 obj#=14
WAIT #1: nam='buffer busy waits' ela= 1312 file#=1 block#=279656 class#=1 obj#=14
...
*** 2013-11-18 21:39:10.933
WAIT #1: nam='gc buffer busy acquire' ela= 1023271 file#=1 block#=277294 class#=1 obj#=9
WAIT #1: nam='buffer deadlock' ela= 15 dba=4471598 class*10+mode=12 flag=8192 obj#=9
What's obj# 9?
What's class# 4?
obj# 14 is SEG$
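Raw trace WAIT lines like the ones above can be aggregated into a quick per-event profile with a few lines of scripting — a sketch (the regex assumes the standard `nam='…' ela= N` format of 10g+ trace files, where ela is in microseconds):

```python
import re
from collections import defaultdict

WAIT_RE = re.compile(r"nam='([^']+)' ela=\s*(\d+)")

def profile(lines):
    """Sum raw-trace WAIT elapsed times (microseconds) per event name."""
    totals = defaultdict(int)
    for line in lines:
        m = WAIT_RE.search(line)
        if m:
            totals[m.group(1)] += int(m.group(2))
    return dict(totals)

sample = [
    "WAIT #1: nam='buffer busy waits' ela= 1023321 file#=1 block#=279627 class#=1 obj#=14",
    "WAIT #1: nam='enq: TX - allocate ITL entry' ela= 29295 name|mode=1415053316",
    "WAIT #1: nam='buffer busy waits' ela= 4315 file#=1 block#=279627 class#=1 obj#=14",
]
print(profile(sample))
```

In practice you would feed it the whole trace file (`profile(open(path))`) instead of the three sample lines.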
gluent.com 28
OBJ #9 and #14
• Look up the object names:
• Using DBA_OBJECTS.DATA_OBJECT_ID -> OBJECT_ID

SQL> @oid 9,14

owner           object_name           object_type
--------------- --------------------- ------------------
SYS             I_FILE#_BLOCK#        INDEX
SYS             SEG$                  TABLE

Object #9 is the cluster index for cluster C_FILE#_BLOCK# (oid #8 on a previous slide)

SQL> @bclass 4

CLASS              UNDO_SEGMENT_ID
------------------ ---------------
segment header

Block class #4 is a segment header block that also stores freelist information
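The low class# values seen in v$bh and the buffer busy wait parameters follow a fixed, widely documented mapping (higher numbers then alternate between the undo header and undo blocks of each rollback segment). A small lookup table for the first few, as an aid when reading traces — treat the exact wording as illustrative:

```python
# Commonly documented meanings of the low block class# values
# (class# appears in v$bh and as P3 of the gc buffer busy events).
BLOCK_CLASS = {
    1: "data block",
    2: "sort block",
    3: "save undo block",
    4: "segment header",
    5: "save undo header",
    6: "free list",
}

print(BLOCK_CLASS[4])  # the class seen in the trace above
```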
gluent.com 29
So what happened?
gluent.com 30
Conclusion from gathered evidence
1. Lots of concurrent + highly parallel CTAS statements
2. Each PX slave created its own temporary segments (in spikes when PX query started up)
3. Spikes of concurrent SEG$ inserts simultaneously on 4 RAC nodes
4. SEG$ cluster blocks out of ITL entries
5. This caused global cache/TX locking thrashing – long loops of unsuccessful attempts to pin and insert into SEG$ blocks
gluent.com 31
Also worth noting
1. Resource manager was a possible factor
   • Plenty of resmgr:cpu quantum waits
   • Lock holders and buffer pin holders might have gone to sleep – while holding the pins/locks under contention
2. Freelist-based block space allocation
   • SEG$ is a SYS-owned freelist-managed index cluster
   • Many cluster table blocks walked through when searching for space
3. ASH doesn't always show recursive SQL IDs
   • It attributes waits to the parent (top level) statement's SQL ID instead
   • The ASH p1, p2, p3, current_obj# columns are useful for drilldown
gluent.com 32
How to fix it?
• Do it less!
  • What?
• Do not insert/update SEG$ entries so frequently!
  • How?
• Do not allow parallel direct load inserts to create a temporary loading segment for each slave (and inserted partition)!
  • How?
• Make sure that High Water Mark Brokering is used!
  • One temporary segment – thus one SEG$ insert/update/delete – per query (and inserted partition) = much less SEG$ contention
  • More in the following slides…
gluent.com 33
Why is High-Water-Mark Brokering needed?
• A historical problem with large uniform extent sizes:
  1. Each data loading PX slave allocated its own extents (to its private temporary segment) when loading data
  2. When the load was completed, the private temporary segments were merged into one final table segment
  3. The last extent of each private segment possibly ended up not fully used
     • Some were almost full, some almost empty – half-empty on average
  4. Wastage = 0.5 x extent_size x PX_slaves
     • The more PX slaves, the more wastage
     • The bigger the extent, the bigger the wastage
• References:
  • https://blogs.oracle.com/datawarehousing/entry/parallel_load_uniform_or_autoallocate
  • http://www.oracle.com/technetwork/database/bi-datawarehousing/twpdwbestpractices-for-loading-11g-404400.pdf
Problem solved by using High-Water-Mark Brokering! But…
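The wastage formula above is simple enough to plug numbers into directly — with the demo configuration from the next slide (64MB uniform extents, PX DOP 16):

```python
def expected_wastage_mb(extent_mb: float, px_slaves: int) -> float:
    """Wastage = 0.5 x extent_size x PX_slaves: each slave's last
    uniform extent is, on average, half empty."""
    return 0.5 * extent_mb * px_slaves

print(expected_wastage_mb(64, 16))
```

This predicts 512MB of wastage, in the same ballpark as the ~570MB of empty blocks measured in the CTAS demo that follows.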
gluent.com 34
Parallel Data Loading and Large Extents – space wastage?

SQL> @seg sales_parallel_ctas

 SEG_MB OWNER   SEGMENT_NAME              SEG_TABLESPACE_NAME
------- ------- ------------------------- --------------------
   9472 TANEL   SALES_PARALLEL_CTAS       TANEL_DEMO_LARGE
   8896 TANEL   SALES_PARALLEL_CTAS_FIX   TANEL_DEMO_LARGE

An identical table PX CTAS in the same tablespace with 64MB uniform extents (the upper table is 6.4% bigger)

SQL> SELECT * FROM TABLE(space_tools.get_space_usage('TANEL', 'SALES_PARALLEL_CTAS', 'TABLE'));
---------------------------------------------------------------------------------------
Full blocks        /MB     1134669       8865
Unformatted blocks /MB           0          0
Free Space 0-25%                 0          0
Free Space 25-50%                0          0
Free Space 50-75%                0          0
Free Space 75-100%           72963        570

570 MB worth of blocks unused in the segment? PX DOP 16 x 64MB x ~0.5 = 512MB

SQL> SELECT * FROM TABLE(space_tools.get_space_usage('TANEL', 'SALES_PARALLEL_CTAS_FIX', 'TABLE'));
----------------------------------------------------------------------------------------
Full blocks        /MB     1134669       8865
Unformatted blocks /MB           0          0
Free Space 0-25%                 0          0
Free Space 25-50%                0          0
Free Space 50-75%                0          0
Free Space 75-100%               0          0

Same table loaded with INSERT /*+ APPEND */ PX DOP 16. No empty blocks.
gluent.com 35
Parallel Data Loading and Large Extents – HWM brokering
• Instead of each PX slave working on its own separate segment…
  1. Allocate and extend only one segment
  2. Slaves allocate space within the single segment as needed
  3. No "holes" in tables, no space wastage problem anymore!
  4. Serialized via the HV - Direct Loader High Water Mark enqueue
• Parameter:
  • _insert_enable_hwm_brokered = true (default)
  • "during parallel inserts high water marks are brokered"
• Problem:
  • In Oracle 11.2 (including 11.2.0.4) this only works with INSERT APPENDs
  • …and not for CTAS nor ALTER TABLE MOVE
  • (in one case it worked for CTAS, but only if the target table was partitioned)
gluent.com 36
Parallel Data Loading and Large Extents – HWM brokering
SQL> @fix 6941515
  BUGNO  VALUE DESCRIPTION                                                  IS_DEFAULT
------- ------ ------------------------------------------------------------ ----------
6941515      0 use high watermark brokering for insert into single segment           1
SQL> alter session set "_fix_control"='6941515:ON';
Session altered.
• Actually this is a bug, and Oracle fixed it a long time ago
  • But bugfix 6941515 is not enabled by default!
• After enabling the bug-fix, parallel CTAS and parallel ALTER TABLE MOVE also use HWM brokering…
  • Unfortunately it didn't work with ALTER TABLE MOVE when moving the whole table, if the table was partitioned

DBMS_SPACE / space_tools helps to measure this!

Thanks to Alex Fatkulin for spotting this!
gluent.com 37
Additional slides (we are probably out of time)
gluent.com 38
Case Study: Data Loading Performance – Example 1
• Parallel Create Table As Select from Hadoop to Exadata
• Buffer busy waits dominating the response time profile (SQL Monitoring)

http://blog.tanelpoder.com/2013/11/06/diagnosing-buffer-busy-waits-with-the-ash_wait_chains-sql-script-v0-2/
gluent.com 39
Data Loading Performance: ashtop.sql
SQL> @ash/ashtop session_state,event,p2text,p2,p3text,p3 sql_id='3rtbs9vqukc71' "timestamp'2013-10-05 01:00:00'" "timestamp'2013-10-05 03:00:00'"
%This  SESSION  EVENT                             P2TEXT          P2 P3TEXT        P3
------ -------- --------------------------------- --------- -------- -------- -------
   57% WAITING  buffer busy waits                 block#           2 class#        13
   31% ON CPU                                     file#            0 size      524288
    1% WAITING  external table read               file#            0 size      524288
    1% ON CPU                                     block#           2 class#        13
    0% ON CPU                                     consumer     12573              0
    0% WAITING  cell smart file creation                           0              0
    0% WAITING  DFS lock handle                   id1              3 id2            2
    0% ON CPU                                     file#           41 size          41
    0% WAITING  cell single block physical read   diskhash#  4695794 bytes      8192
    0% WAITING  control file parallel write       block#           1 requests       2
    0% WAITING  control file parallel write       block#          41 requests       2
    0% WAITING  change tracking file synchronous  blocks           1              0
    0% WAITING  control file parallel write       block#          42 requests       2
    0% WAITING  db file single write              block#           1 blocks         1
    0% ON CPU                                                      0              0
• Break the (buffer busy) wait events down by block#/class#
block #2 ?
gluent.com 40
Case Study: Data Loading Performance – Example 2
• Lots of serial sessions doing single row inserts – on multiple RAC nodes
• "buffer busy waits" and "gc buffer busy acquire" waits
  • file# = 6
  • block# = 2
  • class# = 13

SQL> @bclass 13

CLASS              UNDO_SEGMENT_ID
------------------ ---------------
file header block

SQL> select file#, block#, status from v$bh where class# = 13;

     FILE#     BLOCK# STATUS
---------- ---------- ----------
         5          2 xcur
         4          2 xcur
...

Block dump from disk:
buffer tsn: 7 rdba: 0x00000002 (1024/2)
scn: 0x0000.010b6f9b seq: 0x02 flg: 0x04 tail: 0x6f9b1d02
frmt: 0x02 chkval: 0xd587 type: 0x1d=KTFB Bitmapped File Space Header
Hex dump of block: st=0, typ_found=1

A single space allocation contention point per LMT file. Bigfile tablespaces have only one file!
gluent.com 41
Parallel Data Loading Performance: ash_wait_chains.sql
SQL> @ash/ash_wait_chains event2 sql_id='3rtbs9vqukc71' "timestamp'2013-10-05 01:00:00'" "timestamp'2013-10-05 03:00:00'"

%This     SECONDS WAIT_CHAIN
------ ---------- ----------------------------------------------------------------------------
   35%      43735 -> ON CPU
   15%      18531 -> buffer busy waits [file header block] -> ON CPU
    7%       9266 -> buffer busy waits [file header block] -> control file parallel write
    7%       8261 -> buffer busy waits [file header block] -> cell smart file creation
    6%       6959 -> direct path write
    5%       6707 -> buffer busy waits [file header block] -> DFS lock handle
    4%       4658 -> buffer busy waits [file header block] -> local write wait
    4%       4610 -> buffer busy waits [file header block] -> cell single block physical read
    3%       4282 -> buffer busy waits [file header block] -> local write wait -> db file par
    2%       2801 -> buffer busy waits [file header block]
    2%       2676 -> buffer busy waits [file header block] -> ASM file metadata operation
    2%       2092 -> buffer busy waits [file header block] -> change tracking file synchronous
    2%       2050 -> buffer busy waits [file header block] -> control file sequential read

• Display top wait chains from ASH (who's waiting for whom)
• Each chain starts with the waiting session(s) and ends with the blocking session(s); the [brackets] show the block type for buffer busy waits
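Conceptually, ash_wait_chains builds each chain by following a session's blocking_session pointer within one ASH sample and concatenating the wait events along the way (the real script does this in SQL; the dictionary structure below is a hypothetical simplification for illustration):

```python
def wait_chain(sid, sample):
    """Follow blocking_session pointers within one ASH sample.

    sample maps session_id -> (event_or_ON_CPU, blocking_session_id).
    """
    parts, seen = [], set()
    while sid in sample and sid not in seen:
        seen.add(sid)  # guard against blocking cycles (deadlock-like data)
        event, blocker = sample[sid]
        parts.append(event)
        sid = blocker
    return " -> ".join(parts)

sample = {
    101: ("buffer busy waits [file header block]", 202),
    202: ("control file parallel write", None),
}
print(wait_chain(101, sample))
```

Aggregating these chain strings over all samples and sorting by count yields exactly the kind of top-chains report shown above.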
gluent.com 42
Case Study: Parallel Data Loading Performance
• Reduce demand on the LMT bitmap blocks
  • By allocating bigger extents at a time
• Use large uniform extents for fast-growing tablespaces
  • The customer went with a 64MB uniform extent size
• The autoallocate extent management is suboptimal for very large segments
  • As there'd be 1 LMT space management bit per 64kB regardless of your INITIAL extent size at the segment level
  • The _partition_large_extents = TRUE setting doesn't change this either
• Large uniform extents are better for data loading and scanning!