correlated subquery
TRANSCRIPT
-
7/27/2019 Correlated Subquery
1/15
Correlated Subquery
-
7/27/2019 Correlated Subquery
2/15
What is Correlated Subquery?
If the a subquery depends on the values of
its parent query then the sub query is
called correlated subquery.
select
Coursename , Courseadminid,
(select
Firstname+' '+Lastname
from
studentwhere
studentid=Course.courseadminid
) as CourseAdminName
from
course
-
7/27/2019 Correlated Subquery
3/15
How Does It Work?
For each output row from parent query, Oraclewill execute the subquery once.
Oracle may use subquery resultset cache, that
is, Oracle will cache the subquery resultsetsbased on parent query values.
But the cache size is very limited. For example,for scalar subquery (scalar means each
subquery execution only returns one row),Oracle uses a HASH table for cache, and thecache slot number is 256 for 8i and 9i, and 1024for 10g (from Jonathan Lewis).
-
7/27/2019 Correlated Subquery
4/15
Performance Concerns
Since the subquery is executed once per recordfrom parent query, we should eliminate as manyrows as possible before the execution of thesubquery step.
The efficiency of each subquery execution: dowe have right index or partition pruning? Forexample, when a subquery has to use FTS on atable with 10K blocks, if the parent query returns10K rows, that will be a huge number of physicalIO.
If the purpose is to get a value from anothertable, consider using JOIN.
-
7/27/2019 Correlated Subquery
5/15
Subquery In Action: UAD
This query had massive temp space usages
on the hash join and it had not yet to
generate a single output row to execute
subquery.
-
7/27/2019 Correlated Subquery
6/15
Subquery In Action: TAOHere is the source of occasional TAO DB CPU alerts, the part of the view
TAO.TAO_FACT_AGGREGATE_D_SEGMENTS used by TAO cube build process. Note there is a
subquery around table RM.RM_PIXELS, which used to be the source of high LIO and CBC
contentions. In one run on 09/27 from hour 06 to 11, one query recorded 699M LIO and AWR ASH
had major waits as CPU and latch: cache buffers chain. Table RM.RM_PIXELS has 101142 rows,
so it passed the limit of cache slots. The other table involved TAO.TAO_DEFAULT_IDS has only 3
rows.
-
7/27/2019 Correlated Subquery
7/15
Todays Subquery: Direct
MarketingRealtime Tracking:
Query has executed 31,002 seconds
Last call from client (SID 2008) is 29,602 seconds.
All the PX slave has LAST_CALL_ET as 2 seconds, meaning a
new round of subquery execution just started.
This is the view used by the qurery in question. Direct Marketing
wraps most of its business logic inside views. Plenty of correlated
subqueries here, this is another direct marketing feature.
-
7/27/2019 Correlated Subquery
8/15
The Issue
After 11.2.0.3 upgrade (2012/09/28), this query has continuouslytopped high physical IO usages among all databases scanned byperformance framework.
For example, the single run from 09/30 22 hour to 10/01 08 hour, itread 339M blocks. The run from 09/26 22 hour only read 3.168Mblocks, also completed within 2 hours. So my interest is toinvestigate where the 100 times of physical IO is from.
AWR segment statistics shows the source of physical IO is the tableSEM.AI_KEYWORD, one of the table used twice by the correlatedsubquery inside the view related to the query. The physical reads onthis table for the same period from 09/30 22 hour to 10/01 08 hour is100M blocks. The mismatch from 300M blocks from AWR SQL stats
is another interesting thing worth some research.
-
7/27/2019 Correlated Subquery
9/15
FTS and partition pruning
for AI_KEYWORD
1. 6 sets of PX queues, or DFO (dataflow object), one for parent query, one
for each subquery.
2. 6 PX COORDINATOR operations. So
each subquery will start its own PX
operations.
3. Plans for subqueries in SELECT listare usually listed at the top on the
main query. Here from line ID 1 to 28.
The main query is from 29 to 32.
4. The final COST is usually messed up
and the cost from subqueries not
counted in.
5. The values passed from main queryto subquery usually lised in
PREDICATE part as bind variables.
6. No plan change after upgrade.
-
7/27/2019 Correlated Subquery
10/15
Table Info
The base table of parent query, AI_ACCOUNT, has 206 rows, or206 unique IDs.
The subquery condition on AI_KEYWORD is ACCOUNT_ID =AI_ACCOUNT.ID for each row returned from parent query.
AI_KEYWORD is list partitioned by ACCOUNT_ID. The total size is1,730,120 blocks (outdated, but pretty close. dba_segments has1,752,876 blocks). The largest partition has 85,120 blocks.
In theory, for each row from AI_ACCOUNT, the two subqueriesinside the view related to AI_KEYWORD will each read a singlepartition. So this query should use at most 2* 1,752,876 blocks(3.505M) of physical reads, plus some block numbers from othertables. That is close to the result before 11.2.0.3 upgrade (3.168M).
-
7/27/2019 Correlated Subquery
11/15
Some Research
Using session stats to see if any interesting statisticsduring realtime tracking.
From previous case study, we have seen the case whena target table has full table refresh, consistent readsrelated to UNDO could cause larger than expected IOusages. Another case is chained or migrated rows whichcould also cause larger than expected IO usages. Alsothere is no UNDO and chained/migrated rows statisticsinside session statistics during realtime tracking.
Using ASH and AWR ASH: The dominant events are
direct path read, which is usually FTS. AWR ASH alsoshows the major waits for the query before upgrade isdb file scatter read, which usually can take advantagesof cache.
-
7/27/2019 Correlated Subquery
12/15
Where is The Clue?
The huge PX process count is from the sum of PX
slave processes used by all runs of the subqueries.
1. Plenty of sessions use direct path read at the same time. Direct marketing uses DEFAULparallel attribute at table level for thousands of tables. The execution plan shows PX for all
subqueries and main query.
2. The interesting thing here is, each direct path read has a different object_id. We are
expecting partition pruning to a single partition (table has no subpartition).
3. There is an issue with partition pruning with correlated subquery.
We can also use AWR ASH for the
same SQL_ID, and order the object
id by sample time. For this query,
we can see the same object_id
appeared multiple times at totally
different time ranges, meaning the
same partition was scanned
multiple times.
-
7/27/2019 Correlated Subquery
13/15
Some Test to Narrow Down The
Scope The purpose is to see if PX is the cause for inefficient partition pruning inside subquery.
I rewrote the query by replacing the view with an inline view, and using no_parallel hints todisable all PX operations.
The query plan shows partition pruning with PARTITION LIST SINGLE operation, the one withPX is PX Block Iterator. So I assume partition pruning will work in this case.
The query completed within 310 seconds. Total physical IO is 1,564,075 blocks. Next run took 3minutes with 400K blocks. The major waits were db file scatter read, so cache played a role.
-
7/27/2019 Correlated Subquery
14/15
Is It Reproducible?
Yes and No. When this database was busy and my test failed to acquire enough PX processes,it actually completed with the time close to the one run in serial.
But when the database was relatively idle and the query could acquire PX processes for mostof the time, the pattern could be reproduced. Here is the part of SQL report from a test query.
After 17 minutes, it had read more than 10M blocks, with only one account id completed.
-
7/27/2019 Correlated Subquery
15/15
Summary
Be careful about the performance of correlated
subquery.
Use DEFAULT (or any) parallel attribute on
tables with caution. This should be an Oracle bug or unwanted
feature, just not sure if it exists in earlier version.
Remove outdated or unwanted scheduled queryjobs. The offending query is from an outdated
scheduled job no application actually uses.