understanding histogram ppt
Post on 27-Apr-2015
115 Views
Preview:
TRANSCRIPT
Understanding Histograms in the Oracle Database
Robert GaydosRobert GaydosR Wave SolutionsRobert.Gaydos@rwavesolutions.com
Copyright © R Wave Solutions 2008 All rights reserved
Addenda
� What is a Histogram?� Types of Histograms� How to create Histograms� How to identify existing Histograms� Rules and tips when creating Histograms� Q & A
Copyright © R Wave Solutions 2008 All rights reserved
What is a Histogram?
� A histogram holds the data distribution of values within a column of a table.– Number of occurrences for a specific value/range – Number of occurrences for a specific value/range – Used by CBO to optimize a query.
� It is collected by using DBMS_STATS .– By default DBMS_STATS does not collect
histogram stats.� Min/Max Value and # of distinct values� NOTE: DBMS_STAT is used to delete histogram data.
Copyright © R Wave Solutions 2008 All rights reserved
Types of Histograms
� Two types of Histograms– frequency histograms
height-balanced histograms – height-balanced histograms – Type of histogram is stored in the HISTOGRAM
column of the *tab_col_statistics views (USER/DBA)
– Value = (‘HEIGHT BALANCED’, ‘FREQUENCY’, or ‘NONE’)
Copyright © R Wave Solutions 2008 All rights reserved
Histogram and Buckets
� When Histograms are created the number of buckets can be specified.
� It is this number that controls the type of � It is this number that controls the type of histogram created.
� # Buckets = # of Rows of information.� When Distinct Values <= Buckets
– Then Frequency Histogram is created– Else Height-Balanced Histogram is created
Copyright © R Wave Solutions 2008 All rights reserved
Frequency Histograms
� Each value of the column corresponds to a single bucket of the histogram. Each bucket contains the number of � Each bucket contains the number of occurrences of that single value.
Copyright © R Wave Solutions 2008 All rights reserved
Frequency Histograms Example
DataABC
SortedABB
ResultsBucket 1 A = 1Bucket 2 B = 3
CBCCCCCBDE
BBCCCCCCDE
Bucket 2 B = 3Bucket 3 C = 6Bucket 4 D = 1Bucket 5 E = 1
Copyright © R Wave Solutions 2008 All rights reserved
Height-Balanced Histograms
� In a height-balanced histogram, the column values are divided into bands so that each band contains approximately the same band contains approximately the same number of rows.
� The useful information that the histogram provides is where in the range of values the endpoints fall .
Copyright © R Wave Solutions 2008 All rights reserved
Height-Balanced Example 1
DataABC
SortedABB
ResultsEPN0
EPVB
CBCCCCCBDE
BBCCCCCCDE
Buckets = 4
0123
BCCE
Copyright © R Wave Solutions 2008 All rights reserved
Height-Balanced Example 1
DataABC
SortedABB
ResultsEPN0
EPVB
Notice that C Crosses Buckets.
The DB only stores0 - B2 - C3 - E
CBCCCCCBDE
BBCCCCCCDE
0123
Buckets = 4
BCCE
Crosses Buckets.
This is how database knows which values are popular
Copyright © R Wave Solutions 2008 All rights reserved
How to Create a Histogram
� Created by using DBMS_STATS.GATHER_TABLE_STATS
� METHOD_OPT => 'FOR COLUMNS SIZE � METHOD_OPT => 'FOR COLUMNS SIZE <# of buckets> <Column Name>‘
execute dbms_stats.gather_table_stats (ownname => 'oe', tabname => 'inventories', METHOD_OPT => 'FOR COLUMNS SIZE 10 quantity_on_hand');
Copyright © R Wave Solutions 2008 All rights reserved
Identify Existing Histograms
� In 10g– HISTOGRAM column of the *tab_col_statistics views
(USER/DBA)
In 9� In 9select owner, table_name, column_name, count(*)
bucketsfrom dba_histogramswhere endpoint_number not in (0,1)group by owner, table_name, column_name
Copyright © R Wave Solutions 2008 All rights reserved
Rules On When To Create Histograms
� First there are no rules of thumb that are always correct.When creating using dbms_stats, the use of � When creating using dbms_stats, the use of the DEGREE (for parallel) can skew the value of density
� Running dbms_stats can drop histograms– Use method_opt==>'REPEAT‘
Copyright © R Wave Solutions 2008 All rights reserved
Rules On When To Create Histograms
� Index maintenance overhead is performed during update, insert and delete operations.
� Histogram maintenance overhead is � Histogram maintenance overhead is performed during the analyze/dbms_stats process.
� Histograms are NOT just for indexed columns.
– Adding a histogram to an un-indexed column that is used in a where clause can improve performance.
Copyright © R Wave Solutions 2008 All rights reserved
Histogram Example
select sum(t1.d2*t2.d3*t3.d3)from t1, t2, t3where t1.fk1 = t2.pk1and t3.pk1 = t2.fk1and t3.d2 = 35and t1.d3 = 0; << This column has 2 values in it
In this example column T1.D3 has 2 values and is our most selective criterion, but without a histogram CBO assumes an even distribution of 2 distinct values (density = 0.5).
Copyright © R Wave Solutions 2008 All rights reserved
SQL*Plan Without Histogram
Rows Row Source Operation------- ---------------------------------
1 SORT AGGREGATE 1 SORT AGGREGATE 2088 HASH JOIN 601 TABLE ACCESS BY INDEX ROWID 601 INDEX RANGE SCAN OBJ#(57612)
1000 HASH JOIN 4 TABLE ACCESS FULL OBJ#(57604)
filter("T1"."D3"=0)62500 TABLE ACCESS FULL OBJ#(57605)
Copyright © R Wave Solutions 2008 All rights reserved
Timing Without Histogram
call count cpu elapsed disk query current rows------- ------ -------- ---------- ---------- ---------- ---------- ----------Parse 1 0.00 0.00 0 0 0 0Execute 1 0.00 0.00 0 0 0 0Fetch 2 1.04 4.91 19013 20231 0 1------- ------ -------- ---------- ---------- ---------- ---------- ----------------- ------ -------- ---------- ---------- ---------- ---------- ----------total 4 1.04 4.91 19013 20231 0 1
Query = 20231
Copyright © R Wave Solutions 2008 All rights reserved
Use DBMS_STATS to Add Histogram
execute dbms_stats.gather_table_stats(NULL, 'T1', estimate_percent=>null, method_opt=>'FOR COLUMNS SIZE AUTO method_opt=>'FOR COLUMNS SIZE AUTO d3');
Copyright © R Wave Solutions 2008 All rights reserved
Timing With Histogram
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.01 0.00 0 0 0 0
Fetch 2 0.56 2.84 9841 10666 0 1Fetch 2 0.56 2.84 9841 10666 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.57 2.84 9841 10666 0 1
Now Query = 10666It was Query = 20231
½ the IOCopyright © R Wave Solutions 2008 All rights reserved
Plan With Histogram
Rows Row Source Operation------- ---------------------------------------
1 SORT AGGREGATE 2088 HASH JOIN 2088 HASH JOIN 601 TABLE ACCESS BY INDEX ROWID OBJ#(57606) 601 INDEX RANGE SCAN OBJ#(57612) 1000 TABLE ACCESS BY INDEX ROWID OBJ#(57605) 1005 NESTED LOOPS
4 TABLE ACCESS FULL OBJ#(57604)filter("T1"."D3"=0)
1000 INDEX RANGE SCAN OBJ#(57609)access("T1"."FK1"="T2"."PK1")
Copyright © R Wave Solutions 2008 All rights reserved
Histograms Opportunities
� Any column used in a where clause with skewed dataColumns that are not queried all the time� Columns that are not queried all the time
� Reduced overhead for insert, update, delete
Copyright © R Wave Solutions 2008 All rights reserved
When Not To Create A Histogram
� No rule of thumb is ever perfect.� Do not create Histograms
– Evenly distributed data in the column– Columns that are not queried
– Do not create them on every column of every table.
– PK of a table
Copyright © R Wave Solutions 2008 All rights reserved
Creating / Maintaining Histograms
� Table maintenance procedures are important.– Accidentally deleting histogram data is possible.– Accidentally deleting histogram data is possible.– Histogram data is ONLY Collected/Maintained
when stats are collected (DBMS_STATS).– Changing data in the table does not change data
stored in the histogram.
Copyright © R Wave Solutions 2008 All rights reserved
Maintaining Histograms
� Always check if histogram exists on table before DBMS_STATS is run.– Do not forget to note the # of buckets.– Do not forget to note the # of buckets.– Create tables to hold information.
� Use METHOD_OPT FOR ALL COLUMNS REPEAT to prevent deletion of histograms data.
� Distribution ratio change -> Recollect
Copyright © R Wave Solutions 2008 All rights reserved
Creating Histograms
� What is the best way to start out creating Histograms? Three popular ways to create.– Salt and Pepper - Create them as you need them.– Salt and Pepper - Create them as you need them.– Little Hammer DBMS_STATS using
METHOD_OPT AUTO� Should NOT be used after fresh restarted database
– Big Hammer DBMS_STATS using SKEWONLY– Over Kill Hammer - FOR ALL COLUMNS
Copyright © R Wave Solutions 2008 All rights reserved
Other Comments
� It is the Density of a column that the optimizer considers when accessing a table.Changing the number of buckets changes � Changing the number of buckets changes the density.
� More Buckets DOES NOT mean better density.
� Using degree != 1 can affect density.
Copyright © R Wave Solutions 2008 All rights reserved
Review
� What is a Histogram?� Two types of Histograms in the database� How to create Histograms� Data is collected by running DBMS_STATS.� Different methods to create histograms� Bucket size affects density of column.� Density of the column affects CBO.
Copyright © R Wave Solutions 2008 All rights reserved
Thank You for Your Time
Copyright © R Wave Solutions 2008 All rights reserved
top related