missing partition key column stats

Upload: saeed-meethal

Post on 04-Jun-2018

228 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/13/2019 Missing Partition Key Column Stats

    1/41

    The Impact of Missing Partition

    Level Column Stats on PartitionKey

  • 8/13/2019 Missing Partition Key Column Stats

    2/41

    EDWLITE does not have long running reportquery for a while. Let's check what this one is

    doing, which has run more than 10 hours.

  • 8/13/2019 Missing Partition Key Column Stats

    3/41

    1. Realtime tracking screen, Stats Tab of SQL panel.2. Review SQL elapsed time (ELAPSED_MS) against other time.3. The majority of elapsed time has been on CPU (46246215/47063224 = 98%).

    4. High CPU time is frequently caused by high cache access (memory access), although notalways. LIO buffer_gets is 2B (2,115,581,456).

    High DIRECT_WRITES is usually caused by large HASHJOIN or SORTING

  • 8/13/2019 Missing Partition Key Column Stats

    4/41

    1. Some systems allow parallel operation to cross multiple instances. So it is better to check SQL stats oall instances.

    2. Some PX SQL execution could have more than 1 child cursors, 1 for QC and 1 or more for slave

    processes. For that case, it is better to blank out Cursor# text box.3. Multiple slave cursors could be generated for parallel DDL operation cross multiple instances.4. In this case, the majority of the activities are on node 4.5. Also two PX slave processes have done their tasks.

  • 8/13/2019 Missing Partition Key Column Stats

    5/41

    1. Understanding the business meaning and the schema characteristicssignificantly help troubleshooting and tuning. Here the only thing I can

    guessing.2. This SQL is to query data for a list of customer IDs. Usually this shou

    of the query plan.3. This SQL is to query a fact table for a time range. And the time range

    inside a DATE/TIME dimension table. A typical pattern of some Y! reand constantly troublemakers, especially in 10G when dealing with papruning.

    4. We can use Related Table/Index Info tab to research the table and istructures and stats, to gain some sense about how this query could

  • 8/13/2019 Missing Partition Key Column Stats

    6/41

    1. For 11G, v$sql_monitor and v$sql_plan_monitor are always our best friends for troubleshooting.2. This information is usually kept for very short period after SQL completes. In any case, when researching a p

    SQL (not necessarily long running like this), try to gather information from both views as earliest as possibleuse Plan Monitor tab for this purpose. SQL Monitor View is for gv$sql_monitor. Summary View is for aggregv$sql_plan_monitor and Detail View is for every sessions inside gv$sql_plan_monitor for the concerned SQ

    3. In this case, we only get v$sql_monitor information from one session. Other sessions might be long done or too long and the in-memory info has been purged. We have known two PX sessions have been done. It is ainteresting to research why a long running SQL with PX only has activities on a single session, either it is caudata skew or some steps are not in PX.

  • 8/13/2019 Missing Partition Key Column Stats

    7/41

    1. Here is v$sql_plan_monitor info. PID (parent plan line ID) column is blank. For PX operation, it is recorded isession. Because the query has run for too long, we have lost the information recorded inside QC session, wincludes PLAN_OBJECT_NAME, CARD (plan cardinality estimate). But we can always refer back to the exeplan for those information.

    2. Always pay attention to any very large values inside OUTPUT_ROWS column, and TIME_SEC column. For

    intensive query, READ_REQ or WRT_REQ are the most important ones.3. Here we can see the line 8 HASH JOIN has both as large numbers, and the first row source of this HASH JOused 822 seconds, with a huge output of 44,809,229 rows.

    4. While the output of HASH JOIN is very large, it is the first row source (outer table) of a NESTED LOOPS JO7). It could cause huge LIOs or cache accesses for the inner table of the NL JOIN.

  • 8/13/2019 Missing Partition Key Column Stats

    8/41

    1. The "start" count of step 32 is consistent with the output of HASH JOIN at step 8. This step is an INLIST ITERBecause the INLIST ITERATOR will be operated on each record from HASH JOIN output, the overall "start" coamplified by the INLIST size. That is why we see step 33/34 with a huge number 4,294,957,578, 121 times of

    2. Step 34 INDEX UNIQUE SCAN returned 211,816,728 index records. Oracle retrieved each related rows from then applied the JOIN condition plus other filter condition. The final output is only 1996. So there is significant here.

    3. Remember earlier we saw the v$sql has buffer_gets around 2B, so pretty much here is the source of LIO.4. Note most of the runtime information from step 11 to 31 have been lost because the query has run for too long

  • 8/13/2019 Missing Partition Key Column Stats

    9/41

    Using Active Sessions tab of the top panel to check current systemsession status. There is only one session for the concerned query is

    still active on this node. Right click to invoke context menu and select"Track PX Operation"

  • 8/13/2019 Missing Partition Key Column Stats

    10/41

    1. One important tool to track query status is v$session_longops.2. Inside "SQL Sessions", we can see all sessions (active and idle"

    related to this query. Right click any row for the concerned SQL toinvoke context menu and select "Long Ops by SQL"

  • 8/13/2019 Missing Partition Key Column Stats

    11/41

    1. There is only one active operation, at the top and it is a HASH JOIN.2. There was plenty of time used by SID 413 to scan MYDIM.DIM_ACCOUNT_C table on plan step 31. This mus

    other row source related to the HASH JOIN. Since V$sql_plan_monitor has no information about step 31, we cthis step has done long time ago.

    3. Usually high CPU usage from HASH JOIN is caused by data skew, when some hash buckets have very long cbelieve there is some data skew because there is only one session left we can see meaningful activity. But thecould be from other steps HASH JOIN is feeding the output to them. In any case, we can use information from

    to check which assumption is true.

  • 8/13/2019 Missing Partition Key Column Stats

    12/41

    V$sesstat is one method to check what the SQL is actually doing.

  • 8/13/2019 Missing Partition Key Column Stats

    13/41

    1. After initial display, click Refresh button to

    and average.2. In this case, the majority session stats are L

  • 8/13/2019 Missing Partition Key Column Stats

    14/41

    1. PIO stats is not very big, average 111KB/s.2. Average 46K/s LIO is a high number. For case

    high LIO is normally not expected because the CPU would be wasted on traverse linked list chseveral hash buckets.

    3. Since the delta shows the major stat changes arelated, so we can conclude the issue is related

  • 8/13/2019 Missing Partition Key Column Stats

    15/41

    Session Events will give us what the session has been waitingso far

  • 8/13/2019 Missing Partition Key Column Stats

    16/41

    The other useful info is ASH. Using "Activity" tab will give us ASHinformation for related cursor. The dominant one if CPU.

    1 HASH JOIN i i id PX S th b d t k i i h b d

  • 8/13/2019 Missing Partition Key Column Stats

    17/41

    1. HASH JOIN is inside PX. So there may be some data skew since one session has been done.2. At step 8, the estimated HASH JOIN cardinality is 2587, but the actual output is beyond 35M and still counting.

    So the bad estimate could be the source of suboptimal plan.3. There is a HASH JOIN between line 15 to 22, a fact table with a date dimension table. By common sense, we

    should expect the JOIN output is inline with the output from the fact table. The fact table has 51M rows asestimated output, but the final HASH JOIN output is 2874. The bad news is that we lost runtime data about thesesteps inside v$sql_plan_monitor, so we cannot simply verify by using V$ view data.

  • 8/13/2019 Missing Partition Key Column Stats

    18/41

    The plan predicates show us the fact table date range andthe INLIST we mentioned in SQL PLAN MONITOR screen.

  • 8/13/2019 Missing Partition Key Column Stats

    19/41

    1. Usually suboptimal plan is the result of bad or missing stats.2. Use Related Table/Index Info to check table/index stats

    3. Here is the list for all the tables/indexes used by the SQL cursor.

  • 8/13/2019 Missing Partition Key Column Stats

    20/41

    Table stats for DIM_DATE

  • 8/13/2019 Missing Partition Key Column Stats

    21/41

    1. For JOIN cardinality estimate, table/partition stats andjoin column stats are very important.

    2. Column stats for DATE_SID is fine for DIM_DATE.

  • 8/13/2019 Missing Partition Key Column Stats

    22/41

  • 8/13/2019 Missing Partition Key Column Stats

    23/41

    The fact table has no global stats.

  • 8/13/2019 Missing Partition Key Column Stats

    24/41

    The fact table is partitioned by DATE_SID.

  • 8/13/2019 Missing Partition Key Column Stats

    25/41

    1. Let's try partition stats.2. We don't know partition name yet. So use %

    wildcard to start so that we can get the partitionname convention.

  • 8/13/2019 Missing Partition Key Column Stats

    26/41

  • 8/13/2019 Missing Partition Key Column Stats

    27/41

    Lets check column stats. For now, we can onlycheck one partition at a time. So we have to pickseveral partitions to check.

  • 8/13/2019 Missing Partition Key Column Stats

    28/41

  • 8/13/2019 Missing Partition Key Column Stats

    29/41

    Column stats for DATE_SID does exist for earlier days. Wecan find out DATE_SID has column stats up to 20120408.

  • 8/13/2019 Missing Partition Key Column Stats

    30/41

    1. We need further investigation about the root cause of high LIO, that is, whyjoin output between DIM_DATE and the fact table is so low.

    2. One method is to construct some small queries as subsets from original quand manipulate some predicates, for example, changing the date range, anuse "explain plan" to see the difference, especially when we know the boundates 04/08 and 04/09.

    3. This Framework does not have Explain Plan capability yet. So we have to

    SQLPLUS for this purpose.4. If DATE_SID is up to 08-04-2012, the JOIN cardinality is good.

  • 8/13/2019 Missing Partition Key Column Stats

    31/41

    1. The estimate is not good for 09-04-2012. One more day is added.We expected more, but got less.

  • 8/13/2019 Missing Partition Key Column Stats

    32/41

  • 8/13/2019 Missing Partition Key Column Stats

    33/41

  • 8/13/2019 Missing Partition Key Column Stats

    34/41

    1. Here is the table stats part for the first good query.

    2. Oracle CBO does know the partitions involved andconsider their stats.

  • 8/13/2019 Missing Partition Key Column Stats

    35/41

    1. CBO needs column stats at partition or global level for JOINcardinality estimate.

    2. Here is the summary for good case with date as 08-04-2012Note NDV value 51, and density is 0.019608.

  • 8/13/2019 Missing Partition Key Column Stats

    36/41

    Here is the JOIN cardinality calculation, for good case,use sel(ectivity) of 1/51.

  • 8/13/2019 Missing Partition Key Column Stats

    37/41

    Here is the plan with good JOINcardinality estimate.

  • 8/13/2019 Missing Partition Key Column Stats

    38/41

    Here is the query with bad plan, with additional date 09-04-2012 which has no partition column stats forDATE_SID.

  • 8/13/2019 Missing Partition Key Column Stats

    39/41

  • 8/13/2019 Missing Partition Key Column Stats

    40/41

  • 8/13/2019 Missing Partition Key Column Stats

    41/41

    Of course with bad plan