Troubleshooting Complex Performance Issues – Oracle SEG$ Contention
TRANSCRIPT
gluent.com 1
Troubleshooting Another Complex Performance Issue (Part 4)
Tanel Põder, Gluent Inc.
http://gluent.com | http://blog.tanelpoder.com
gluent.com 2
Intro: About me
• Tanel Põder
• Oracle Database Performance geek (18+ years)
• Exadata Performance geek
• Linux Performance geek
• Hadoop Performance geek
• CEO & co-founder: Gluent, Inc.
Expert Oracle Exadata book
(2nd edition is out now!)
Instant promotion
gluent.com 3
Gluent: All Enterprise Data Available in Hadoop!
[Diagram: Gluent connects enterprise data sources (Oracle, MSSQL, Teradata, IBM DB2) and Big Data Sources to Hadoop, serving apps X, Y and Z]
gluent.com 4
A refresher
gluent.com 5
Your task is slow because
It is doing too many operations
or
The operations complete too slowly
or both
gluent.com 6
Measure & Profile!
1. Top SQL & wait events
2. Performance counters
3. Oracle kernel functions
ASH, sqlmon, sql_trace, snapper, pstack, perf
gluent.com 7
Let's get started!
gluent.com 8
A Data Warehouse data loading/preparation
• A large Exadata / RAC / Oracle 11.2.0.3 reporting environment
• Lots of parallel CTAS and direct path loads
• High concurrency, high parallelism
• Throughput bad, all kinds of waits showing up:
  • Other, Cluster, Configuration, Application, Concurrency, User I/O, CPU
gluent.com 9
Parallel Degree 48, 96+, inter-instance PX
• Mostly CREATE TABLE AS SELECT statements!
• Creating pre-aggregated tables, etc.
gluent.com 10
Drilldown with ASH data – instance-‐wide
• Wait event names are more informative than wait classes:
SQL> @ashtop session_state,wait_class,event session_type='FOREGROUND' sysdate-1/24 sysdate

%This  SESSION  WAIT_CLASS      EVENT                             AAS  Seconds
------ -------- --------------- ------------------------------ ------ --------
   32% ON CPU                                                    40.6   146197
   13% WAITING  Cluster         gc buffer busy acquire           16.8    60559
   11% WAITING  Configuration   enq: TX - allocate ITL entry     13.9    50115
    9% WAITING  Cluster         gc buffer busy release           11.3    40739
    8% WAITING  Scheduler       resmgr:cpu quantum                9.9    35711
    4% WAITING  Application     enq: TX - row lock contention     5.2    18798
    3% WAITING  User I/O        direct path read temp             4.3    15400
    3% WAITING  User I/O        cell smart table scan             3.7    13259
    2% WAITING  Concurrency     buffer busy waits                 3      10735
    2% WAITING  Other           enq: CF - contention              2.1     7443
    1% WAITING  Application     enq: TM - contention              1.7     6109
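The AAS column above falls straight out of how ASH works: active sessions are sampled roughly once per second, so the sample count for an event approximates seconds of database time, and Average Active Sessions is samples divided by wall-clock seconds. A quick sketch verifying the numbers above (the query window is one hour, sysdate-1/24 to sysdate):

```python
def aas(sampled_seconds: int, wall_seconds: int) -> float:
    """Average Active Sessions: ASH sample-seconds over the wall-clock window."""
    return round(sampled_seconds / wall_seconds, 1)

window = 3600  # the ashtop window above: sysdate-1/24 .. sysdate

print(aas(146197, window))  # ON CPU
print(aas(60559, window))   # gc buffer busy acquire
print(aas(50115, window))   # enq: TX - allocate ITL entry
```

The printed values (40.6, 16.8, 13.9) match the AAS column in the ashtop output.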
gluent.com 11
Drilldown with ASH data – one SQL:
• Let's pick one of the slow CTAS statements:
SQL> @ash/dashtop session_state,event,p1text,p1 top_level_sql_id='89apzfvjba3gu' sysdate-2 sysdate

%This  SESSION  EVENT                           P1TEXT                      P1
------ -------- ------------------------------ ------------------- ----------
   36% WAITING  gc buffer busy acquire          file#                        1
   20% WAITING  gc buffer busy release          file#                        1
   10% WAITING  enq: TX - allocate ITL entry    name|mode           1415053316
    8% WAITING  gc buffer busy acquire          file#                        1
    4% WAITING  buffer busy waits               file#                        1
    4% WAITING  enq: TX - allocate ITL entry    name|mode           1415053316
    2% WAITING  buffer busy waits               file#                        1
    2% WAITING  gc buffer busy release          file#                        1
    2% WAITING  enq: TX - row lock contention   name|mode           1415053316
    1% WAITING  latch: ges resource hash list   address             3.3055E+10
    1% ON CPU                                   sleeptime/senderid   268501512
...
Looks like the CTAS statements do directly experience these waits
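The name|mode P1 value (1415053316) shown for the enqueue waits above is just two packed fields: the top two bytes hold the ASCII lock name and the low 16 bits the requested lock mode. A small decoding sketch:

```python
def decode_enq_p1(p1: int):
    """Decode the name|mode P1 value of an 'enq:' wait event.

    The two high bytes hold the ASCII enqueue name; the low 16 bits hold
    the requested lock mode (e.g. 4 = share, 6 = exclusive).
    """
    name = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)
    mode = p1 & 0xFFFF
    return name, mode

print(decode_enq_p1(1415053316))  # ('TX', 4)
```

So the sessions above were requesting the TX enqueue in mode 4 (share) — exactly the signature of ITL-entry waits.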
gluent.com 12
Wait event explanation
• buffer busy waits
  • The buffer is physically available in the local instance, but is pinned by some other local session (in an incompatible mode)
• gc buffer busy acquire
  • Someone has already requested the remote block into the local instance
• gc buffer busy release
  • The block is local, but has to be shipped out to a remote instance first (as someone in the remote instance had requested it first)
• enq: TX - allocate ITL entry
  • Can't change the block as all the block's ITL entries are held by others
  • Can't dynamically add more ITLs (no space in the block)
gluent.com 13
Looks like write-write contention across RAC nodes!
gluent.com 14
Contention on what objects, blocks, what SQL?
• Let's dig into the top wait event:
SQL> @ashtop session_state,wait_class,p1,p2 "event='gc buffer busy acquire'" sysdate-1/24 sysdate
%This  SESSION  WAIT_CLASS             P1         P2        AAS TotalSeconds
------ -------- -------------- ---------- ---------- ---------- ------------
    8% WAITING  Cluster                 1     279634        1.3         4760
    6% WAITING  Cluster                 1     279635        1.1         3916
    6% WAITING  Cluster                 1     279629        1.1         3838
    5% WAITING  Cluster                 1     279632         .9         3146
    4% WAITING  Cluster                 1     279638         .7         2503
...
SQL> @sed "gc buffer busy"
EVENT_NAME              WAIT_CLASS  PARAMETER1  PARAMETER2  PARAMETER3
----------------------- ----------- ----------- ----------- -----------
gc buffer busy acquire  Cluster     file#       block#      class#
gc buffer busy release  Cluster     file#       block#      class#
gluent.com 15
Which object?
• Translate file#, block# into segment names/numbers:
• Assumes the blocks are in the buffer cache

SQL> SELECT objd data_object_id, COUNT(*) FROM v$bh
     WHERE file#=1 AND block# IN (
       279634,279635,279629,279632,279638,279636,279613,279662,279628,279608,
       279653,279627,279642,279637,279643,279631,279646,279622,279582,279649
     ) GROUP BY objd ORDER BY 2 DESC;

DATA_OBJECT_ID   COUNT(*)
-------------- ----------
             8        113

All blocks belong to a segment with data_object_id = 8
gluent.com 16
What is segment #8?
• Look up the object names:
• Using DBA_OBJECTS.DATA_OBJECT_ID -> OBJECT_ID

SQL> @doid 8

object_id owner     object_name       O_PARTITION   object_type
--------- --------- ----------------- ------------- -----------
        8 SYS       C_FILE#_BLOCK#                  CLUSTER
       14 SYS       SEG$                            TABLE
       13 SYS       UET$                            TABLE

This segment #8 is a cluster which contains SEG$ (DBA_SEGMENTS) and UET$ (DBA_EXTENTS), but UET$ isn't used anymore thanks to LMT tablespaces. That leaves SEG$.
gluent.com 17
Write contention on SEG$?
• SEG$ is modified when:
  1. A new segment is created (table, index, partition, etc.)
  2. An existing segment extends
  3. A segment is dropped / moved
  4. Parallel direct path loads (CTAS) can also create many temporary segments (one per PX slave)
     • …and merge these into the final segment at the end of loading

More about this later…
gluent.com 18
More evidence!
gluent.com 19
AWR 1
• No CPU starvation evident (checked all RAC nodes)
• No CPU starvation evident (checked all RAC nodes)

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                              Waits      Time(s)     (ms)   time Wait Class
---------------------------- ----------- ------------ -------- ------ ----------
DB CPU                                        165,199            37.0
gc buffer busy acquire         1,260,128       68,399       54   15.3 Cluster
enq: TX - allocate ITL entry     354,496       40,583      114    9.1 Configurat
direct path write temp         4,632,946       37,455        8    8.4 User I/O
gc buffer busy release           213,750       22,683      106    5.1 Cluster

Host CPU (CPUs: 160 Cores: 80 Sockets: 8)
~~~~~~~~         Load Average
           Begin       End     %User   %System      %WIO     %Idle
       --------- --------- --------- --------- --------- ---------
            5.20     42.76      26.0       3.3       0.0      69.8
gluent.com 20
AWR 2
• AWR also listed a SEG$ insert as a top SQL:

    Elapsed                Elapsed Time
   Time (s)    Executions  per Exec (s) %Total  %CPU   %IO SQL Id
----------- ------------- ------------- ------ ----- ----- -------------
  100,669.1         4,444         22.65   22.6    .5    .0 g7mt7ptq286u7

insert into seg$ (file#,block#,type#,ts#,blocks,extents,minexts,maxexts,
extsize,extpct,user#,iniexts,lists,groups,cachehint,hwmincr,spare1,
scanhint,bitmapranges) values (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,
:13,:14,:15,:16,DECODE(:17,0,NULL,:17),:18,:19)

• AWR on another node (same insert into seg$):

SNAP_ID NODE SQL_ID          EXECS  AVG_ETIME  AVG_LIO  AVG_PIO
------- ---- ------------- ------ ---------- -------- --------
  24263    3 g7mt7ptq286u7    552     70.227    880.6       .0

Super-slow single row inserts into SEG$ ???
gluent.com 21
SQL Trace confirms SEG$ insert problem:
insert into seg$ (file#,block#,type#,ts#,blocks,extents,minexts,maxexts,
extsize,extpct,user#,iniexts,lists,groups,cachehint,hwmincr,spare1,
scanhint,bitmapranges)
values (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16,
DECODE(:17,0,NULL,:17),:18,:19)

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ----------  ---------
Parse        0      0.00       0.00          0          0          0          0
Execute      1      0.82      75.14          3        773       1508          1
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ----------  ---------
total        1      0.82      75.14          3        773       1508          1

Row Source Operation
---------------------------------------------------
LOAD TABLE CONVENTIONAL (cr=773 pr=3 pw=0 time=75139696 us)
gluent.com 22
SEG$ insert wait events:
Elapsed times include waiting on following events:
  Event waited on                      Times   Max. Wait  Total Waited
  ------------------------------------ Waited  ---------- ------------
  gc buffer busy release                  199        1.01         10.76
  buffer busy waits                       560        1.03          7.02
  resmgr:cpu quantum                       54        0.04          0.08
  enq: TX - row lock contention            93        0.35          4.25
  latch: ges resource hash list           400        0.01          0.98
  gc buffer busy acquire                  464        8.96         36.64
  buffer deadlock                         158        0.00          0.00
  Disk file operations I/O                  5        0.00          0.00
  gc cr block 2-way                       101        0.00          0.01
  gc current block 2-way                    2        0.00          0.00
  cell single block physical read           3        0.00          0.00
  gc current block busy                    10        0.05          0.14
  gc current grant 2-way                    2        0.00          0.00
  enq: TX - allocate ITL entry            316        0.77         13.18
  latch: cache buffers chains             334        0.01          1.45
  ...
Why does a single row insert cause so many waits?
gluent.com 23
Checkpoint: What do we know?
gluent.com 24
Checkpoint
1. Lots of parallel CTAS statements, seem to wait for various RAC gc buffer busy events and enq: TX – ITL contention
2. CTAS statements create new segments, new segments cause inserts into SEG$
3. AWR and SQL Trace report super-‐long elapsed times for single-‐row SEG$ inserts
4. It's actually the recursive SEG$ inserts that wait for the gc buffer busy and enq: TX – ITL contention events
gluent.com 25
"Hypothesis"
Could we be in a cache + transaction layer nested locking ping-‐pong loop?
gluent.com 26
Multiple layers of locking & coherency
[Diagram: an Oracle data block with its KCBH block common header, KTBBH transaction common header holding ITL entries 1-4, KDBH data header, and rows]

Level 0: VOS/service layer
  - Cache buffers chains latch
Level 1: Cache layer
  - Buffer pin (local pin, global cache pin)
Level 2: Transaction layer
  - ITL entry
  - Row lock byte

You must pin (and retrieve) a buffer before you can change or examine it.

What if you pin a buffer and find all its ITL slots busy (with no space to add more)?

The session will release the pin and start waiting on the enq: TX - ITL contention event. And it re-gets/pins the buffer again when the TX - ITL wait is over!
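The pin/release ping-pong described above can be sketched as a toy model. The names here (Block, itl_slots_busy, try_insert) are hypothetical illustration only, not Oracle internals — the point is how a single insert can rack up repeated cache-layer and transaction-layer waits:

```python
# Toy model of the cache-layer / transaction-layer retry loop (hypothetical
# names; this only illustrates the loop, it is not Oracle code).

class Block:
    def __init__(self, itl_slots_busy: int, itl_capacity: int):
        self.busy = itl_slots_busy
        self.capacity = itl_capacity

def try_insert(block: Block, waits: list) -> bool:
    # Level 1: pin the buffer first (this step itself may wait on
    # buffer busy waits / gc buffer busy acquire)
    waits.append("gc buffer busy acquire")
    # Level 2: while holding the pin, look for a free ITL slot
    if block.busy >= block.capacity:
        # No free ITL and no room to grow one: release the pin,
        # wait on the TX enqueue, then re-pin and retry
        waits.append("enq: TX - allocate ITL entry")
        return False
    return True

waits = []
blk = Block(itl_slots_busy=2, itl_capacity=2)  # all ITL slots taken
while not try_insert(blk, waits):
    blk.busy -= 1  # eventually some holding transaction commits
print(waits)
```

Each failed attempt adds one buffer-pin wait and one TX enqueue wait, so long loops of this kind explain why a single-row insert can show dozens of waits.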
gluent.com 27
Raw trace file (edited):
*** 2013-11-18 21:39:09.760
WAIT #1: nam='buffer busy waits' ela= 1023321 file#=1 block#=279627 class#=1 obj#=14
WAIT #1: nam='gc buffer busy release' ela= 25304 file#=1 block#=279627 class#=1 obj#=14
WAIT #1: nam='buffer busy waits' ela= 4315 file#=1 block#=279627 class#=1 obj#=14
WAIT #1: nam='Disk file operations I/O' ela= 10 FileOperation=2 fileno=141 filetype=2
WAIT #1: nam='gc cr block 2-way' ela= 134 p1=141 p2=5456 p3=3447 obj#=0
WAIT #1: nam='enq: TX - allocate ITL entry' ela= 29295 name|mode=1415053316
WAIT #1: nam='latch: ges resource hash list' ela= 958 address=33046235872 number=91
WAIT #1: nam='gc buffer busy acquire' ela= 3384 file#=1 block#=279627 class#=1 obj#=14
WAIT #1: nam='buffer busy waits' ela= 817 file#=1 block#=279627 class#=1 obj#=14
WAIT #1: nam='gc buffer busy acquire' ela= 344 file#=1 block#=192 class#=4 obj#=14
WAIT #1: nam='gc buffer busy acquire' ela= 2096 file#=1 block#=279656 class#=1 obj#=14
WAIT #1: nam='buffer busy waits' ela= 3922 file#=1 block#=279656 class#=1 obj#=14
WAIT #1: nam='gc cr block 2-way' ela= 124 p1=141 p2=28888 p3=3895 obj#=0
WAIT #1: nam='enq: TX - allocate ITL entry' ela= 9260 name|mode=1415053316
WAIT #1: nam='latch: ges resource hash list' ela= 683 address=33043220640
WAIT #1: nam='gc buffer busy acquire' ela= 13696 file#=1 block#=279656 class#=1 obj#=14
WAIT #1: nam='buffer busy waits' ela= 1312 file#=1 block#=279656 class#=1 obj#=14
...
*** 2013-11-18 21:39:10.933
WAIT #1: nam='gc buffer busy acquire' ela= 1023271 file#=1 block#=277294 class#=1 obj#=9
WAIT #1: nam='buffer deadlock' ela= 15 dba=4471598 class*10+mode=12 flag=8192 obj#=9
What's obj# 9?
What's class# 4?
obj# 14 is SEG$
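Raw trace WAIT lines like the ones above can be aggregated into a quick per-event profile with a few lines of scripting — a sketch (the regex assumes the standard `nam='…' ela= N` format of 10g+ trace files, where ela is in microseconds):

```python
import re
from collections import defaultdict

WAIT_RE = re.compile(r"nam='([^']+)' ela=\s*(\d+)")

def profile(lines):
    """Sum raw-trace WAIT elapsed times (microseconds) per event name."""
    totals = defaultdict(int)
    for line in lines:
        m = WAIT_RE.search(line)
        if m:
            totals[m.group(1)] += int(m.group(2))
    return dict(totals)

sample = [
    "WAIT #1: nam='buffer busy waits' ela= 1023321 file#=1 block#=279627 class#=1 obj#=14",
    "WAIT #1: nam='enq: TX - allocate ITL entry' ela= 29295 name|mode=1415053316",
    "WAIT #1: nam='buffer busy waits' ela= 4315 file#=1 block#=279627 class#=1 obj#=14",
]
print(profile(sample))
```

In practice you would feed it the whole trace file (`profile(open(path))`) instead of the three sample lines.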
gluent.com 28
OBJ #9 and #14
• Look up the object names:
• Using DBA_OBJECTS.DATA_OBJECT_ID -> OBJECT_ID

SQL> @oid 9,14

owner           object_name           object_type
--------------- --------------------- ------------------
SYS             I_FILE#_BLOCK#        INDEX
SYS             SEG$                  TABLE

Object #9 is the cluster index for cluster C_FILE#_BLOCK# (oid #8 on a previous slide)

SQL> @bclass 4

CLASS              UNDO_SEGMENT_ID
------------------ ---------------
segment header

Block class #4 is a segment header block that also stores freelist information
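The low class# values seen in v$bh and the buffer busy wait parameters follow a fixed, widely documented mapping (higher numbers then alternate between the undo header and undo blocks of each rollback segment). A small lookup table for the first few, as an aid when reading traces — treat the exact wording as illustrative:

```python
# Commonly documented meanings of the low block class# values
# (class# appears in v$bh and as P3 of the gc buffer busy events).
BLOCK_CLASS = {
    1: "data block",
    2: "sort block",
    3: "save undo block",
    4: "segment header",
    5: "save undo header",
    6: "free list",
}

print(BLOCK_CLASS[4])  # the class seen in the trace above
```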
gluent.com 29
So what happened?
gluent.com 30
Conclusion from gathered evidence
1. Lots of concurrent + highly parallel CTAS statements
2. Each PX slave created its own temporary segments (in spikes when PX query started up)
3. Spikes of concurrent SEG$ inserts simultaneously on 4 RAC nodes
4. SEG$ cluster blocks out of ITL entries
5. This caused global cache/TX locking thrashing – long loops of unsuccessful attempts to pin and insert into SEG$ blocks
gluent.com 31
Also worth noting
1. Resource manager was a possible factor
   • Plenty of resmgr:cpu quantum waits
   • Lock holders and buffer pin holders might have gone to sleep – while holding the pins/locks under contention
2. Freelist-based block space allocation
   • SEG$ is a SYS-owned freelist-managed index cluster
   • Many cluster table blocks walked through when searching for space
3. ASH doesn't always show recursive SQL IDs
   • It attributes waits to the parent (top level) statement's SQL ID instead
   • The ASH p1, p2, p3, current_obj# columns are useful for drilldown
gluent.com 32
How to fix it?
• Do it less!
  • What?
• Do not insert/update SEG$ entries so frequently!
  • How?
• Do not allow parallel direct load inserts to create a temporary loading segment for each slave (and inserted partition)!
  • How?
• Make sure that High Water Mark Brokering is used!
  • One temporary segment – thus one SEG$ insert/update/delete – per query (and inserted partition) = much less SEG$ contention
  • More in the following slides…
gluent.com 33
Why is High-Water-Mark Brokering needed?
• A historical problem with large uniform extent sizes:
  1. Each data loading PX slave allocated its own extents (to its private temporary segment) when loading data
  2. When the load was completed, the private temporary segments were merged into one final table segment
  3. The last extent of each private segment possibly ended up not fully used
     • Some were almost full, some almost empty – half-empty on average
  4. Wastage = 0.5 x extent_size x PX_slaves
     • The more PX slaves, the more wastage
     • The bigger the extent, the bigger the wastage
• References:
  • https://blogs.oracle.com/datawarehousing/entry/parallel_load_uniform_or_autoallocate
  • http://www.oracle.com/technetwork/database/bi-datawarehousing/twpdwbestpractices-for-loading-11g-404400.pdf
Problem solved by using High-Water-Mark Brokering! But…
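The wastage formula above is simple enough to plug numbers into directly — with the demo configuration from the next slide (64MB uniform extents, PX DOP 16):

```python
def expected_wastage_mb(extent_mb: float, px_slaves: int) -> float:
    """Wastage = 0.5 x extent_size x PX_slaves: each slave's last
    uniform extent is, on average, half empty."""
    return 0.5 * extent_mb * px_slaves

print(expected_wastage_mb(64, 16))
```

This predicts 512MB of wastage, in the same ballpark as the ~570MB of empty blocks measured in the CTAS demo that follows.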
gluent.com 34
Parallel Data Loading and Large Extents – space wastage?

SQL> @seg sales_parallel_ctas

 SEG_MB OWNER   SEGMENT_NAME              SEG_TABLESPACE_NAME
------- ------- ------------------------- --------------------
   9472 TANEL   SALES_PARALLEL_CTAS       TANEL_DEMO_LARGE
   8896 TANEL   SALES_PARALLEL_CTAS_FIX   TANEL_DEMO_LARGE

An identical table PX CTAS in the same tablespace with 64MB uniform extents (the upper table is 6.4% bigger)

SQL> SELECT * FROM TABLE(space_tools.get_space_usage('TANEL', 'SALES_PARALLEL_CTAS', 'TABLE'));
---------------------------------------------------------------------------------------
Full blocks        /MB     1134669       8865
Unformatted blocks /MB           0          0
Free Space 0-25%                 0          0
Free Space 25-50%                0          0
Free Space 50-75%                0          0
Free Space 75-100%           72963        570

570 MB worth of blocks unused in the segment? PX DOP 16 x 64MB x ~0.5 = 512MB

SQL> SELECT * FROM TABLE(space_tools.get_space_usage('TANEL', 'SALES_PARALLEL_CTAS_FIX', 'TABLE'));
----------------------------------------------------------------------------------------
Full blocks        /MB     1134669       8865
Unformatted blocks /MB           0          0
Free Space 0-25%                 0          0
Free Space 25-50%                0          0
Free Space 50-75%                0          0
Free Space 75-100%               0          0

Same table loaded with INSERT /*+ APPEND */ PX DOP 16. No empty blocks.
gluent.com 35
Parallel Data Loading and Large Extents – HWM brokering
• Instead of each PX slave working on its own separate segment…
  1. Allocate and extend only one segment
  2. Slaves allocate space within the single segment as needed
  3. No "holes" in tables, no space wastage problem anymore!
  4. Serialized via the HV - Direct Loader High Water Mark enqueue
• Parameter:
  • _insert_enable_hwm_brokered = true (default)
  • "during parallel inserts high water marks are brokered"
• Problem:
  • In Oracle 11.2 (including 11.2.0.4) this only works with INSERT APPENDs
  • …and not for CTAS nor ALTER TABLE MOVE
  • (in one case it worked for CTAS, but only if the target table was partitioned)
gluent.com 36
Parallel Data Loading and Large Extents – HWM brokering
SQL> @fix 6941515
  BUGNO  VALUE DESCRIPTION                                                  IS_DEFAULT
------- ------ ------------------------------------------------------------ ----------
6941515      0 use high watermark brokering for insert into single segment           1
SQL> alter session set "_fix_control"='6941515:ON';
Session altered.
• Actually this is a bug, and Oracle fixed it a long time ago
  • But bugfix 6941515 is not enabled by default!
• After enabling the bug-fix, parallel CTAS and parallel ALTER TABLE MOVE also use HWM brokering…
  • Unfortunately it didn't work with ALTER TABLE MOVE when moving the whole table, if the table was partitioned

DBMS_SPACE / space_tools helps to measure this!

Thanks to Alex Fatkulin for spotting this!
gluent.com 37
Additional slides (we are probably out of time)
gluent.com 38
Case Study: Data Loading Performance – Example 1
• Parallel Create Table As Select from Hadoop to Exadata
• Buffer busy waits dominating the response time profile (SQL Monitoring)

http://blog.tanelpoder.com/2013/11/06/diagnosing-buffer-busy-waits-with-the-ash_wait_chains-sql-script-v0-2/
gluent.com 39
Data Loading Performance: ashtop.sql
SQL> @ash/ashtop session_state,event,p2text,p2,p3text,p3 sql_id='3rtbs9vqukc71' "timestamp'2013-10-05 01:00:00'" "timestamp'2013-10-05 03:00:00'"
%This  SESSION  EVENT                             P2TEXT          P2 P3TEXT        P3
------ -------- --------------------------------- --------- -------- -------- -------
   57% WAITING  buffer busy waits                 block#           2 class#        13
   31% ON CPU                                     file#            0 size      524288
    1% WAITING  external table read               file#            0 size      524288
    1% ON CPU                                     block#           2 class#        13
    0% ON CPU                                     consumer     12573              0
    0% WAITING  cell smart file creation                           0              0
    0% WAITING  DFS lock handle                   id1              3 id2            2
    0% ON CPU                                     file#           41 size          41
    0% WAITING  cell single block physical read   diskhash#  4695794 bytes      8192
    0% WAITING  control file parallel write       block#           1 requests       2
    0% WAITING  control file parallel write       block#          41 requests       2
    0% WAITING  change tracking file synchronous  blocks           1              0
    0% WAITING  control file parallel write       block#          42 requests       2
    0% WAITING  db file single write              block#           1 blocks         1
    0% ON CPU                                                      0              0
• Break the (buffer busy) wait events down by block#/class#
block #2 ?
gluent.com 40
Case Study: Data Loading Performance – Example 2
• Lots of serial sessions doing single row inserts – on multiple RAC nodes
• "buffer busy waits" and "gc buffer busy acquire" waits
  • file# = 6
  • block# = 2
  • class# = 13

SQL> @bclass 13

CLASS              UNDO_SEGMENT_ID
------------------ ---------------
file header block

SQL> select file#, block#, status from v$bh where class# = 13;

     FILE#     BLOCK# STATUS
---------- ---------- ----------
         5          2 xcur
         4          2 xcur
...

Block dump from disk:
buffer tsn: 7 rdba: 0x00000002 (1024/2)
scn: 0x0000.010b6f9b seq: 0x02 flg: 0x04 tail: 0x6f9b1d02
frmt: 0x02 chkval: 0xd587 type: 0x1d=KTFB Bitmapped File Space Header
Hex dump of block: st=0, typ_found=1

A single space allocation contention point per LMT file. Bigfile tablespaces have only one file!
gluent.com 41
Parallel Data Loading Performance: ash_wait_chains.sql
SQL> @ash/ash_wait_chains event2 sql_id='3rtbs9vqukc71' "timestamp'2013-10-05 01:00:00'" "timestamp'2013-10-05 03:00:00'"

%This     SECONDS WAIT_CHAIN
------ ---------- ----------------------------------------------------------------------------
   35%      43735 -> ON CPU
   15%      18531 -> buffer busy waits [file header block] -> ON CPU
    7%       9266 -> buffer busy waits [file header block] -> control file parallel write
    7%       8261 -> buffer busy waits [file header block] -> cell smart file creation
    6%       6959 -> direct path write
    5%       6707 -> buffer busy waits [file header block] -> DFS lock handle
    4%       4658 -> buffer busy waits [file header block] -> local write wait
    4%       4610 -> buffer busy waits [file header block] -> cell single block physical read
    3%       4282 -> buffer busy waits [file header block] -> local write wait -> db file par
    2%       2801 -> buffer busy waits [file header block]
    2%       2676 -> buffer busy waits [file header block] -> ASM file metadata operation
    2%       2092 -> buffer busy waits [file header block] -> change tracking file synchronous
    2%       2050 -> buffer busy waits [file header block] -> control file sequential read

• Display top wait chains from ASH (who's waiting for whom)
• Each chain starts with the waiting session(s) and ends with the blocking session(s); the [brackets] show the block type for buffer busy waits
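Conceptually, ash_wait_chains builds each chain by following a session's blocking_session pointer within one ASH sample and concatenating the wait events along the way (the real script does this in SQL; the dictionary structure below is a hypothetical simplification for illustration):

```python
def wait_chain(sid, sample):
    """Follow blocking_session pointers within one ASH sample.

    sample maps session_id -> (event_or_ON_CPU, blocking_session_id).
    """
    parts, seen = [], set()
    while sid in sample and sid not in seen:
        seen.add(sid)  # guard against blocking cycles (deadlock-like data)
        event, blocker = sample[sid]
        parts.append(event)
        sid = blocker
    return " -> ".join(parts)

sample = {
    101: ("buffer busy waits [file header block]", 202),
    202: ("control file parallel write", None),
}
print(wait_chain(101, sample))
```

Aggregating these chain strings over all samples and sorting by count yields exactly the kind of top-chains report shown above.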
gluent.com 42
Case Study: Parallel Data Loading Performance
• Reduce demand on the LMT bitmap blocks
  • By allocating bigger extents at a time
• Use large uniform extents for fast-growing tablespaces
  • The customer went with a 64MB uniform extent size
• The autoallocate extent management is suboptimal for very large segments
  • As there'd be 1 LMT space management bit per 64kB regardless of your INITIAL extent size at the segment level
  • The _partition_large_extents = TRUE setting doesn't change this either
• Large uniform extents are better for data loading and scanning!