Download - Query Optimization
![Page 1: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/1.jpg)
Query Optimizer
(Chapter 9.0 - 9.6)
![Page 2: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/2.jpg)
Optimization
• Minimizes uses of resources by choosing best set of alternative query access plans
• considers I/O cost, CPU cost
• gathers statistics - may become out of date (DB2 – RUNSTATS command) – E.g. Selectivity of values - 1/domain - used
to determine number of tuples of each value
![Page 3: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/3.jpg)
Filter Factor FF (selectivity)
• Fraction of rows with specified values(s) for specific attribute that result from the predicate restriction
• FF(c)= # records satisfying condition ctotal# of records in relation
• Estimate attribute with i distinct values as:– Assume |R| is #rows in table R
( |R|/i) / |R| = 1/col_cardinality e.g. (10,000/2)/10,000 = 1/2
![Page 4: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/4.jpg)
Filter Factor FF
• FF tells how many tuples satisfy predicate - hopefully only need to access those tuples + index
• Statistical assumptions - uniform distribution of column values, independent join distribution of values from any 2 columns
![Page 5: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/5.jpg)
Assumptions
• Attribute values independent
• Conjunctive select (independent) C1 and C2 FF(C1) * FF(C2)
• e.g. 1/2 (gender) * 1/4 (class) = 1/8 freshman in CS are female
![Page 6: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/6.jpg)
Information for Optimization
1. SYSCOLUMNS col_name, table_name, #of values, High, Low
2. Statistics on columns that deviate strongly from the uniform assumption
3. Cluster Ratio how well clustering property holds for rows with respect to a given index if 100% clustered
with updates, becomes less clustered if clustering ratio 80% or more, use sequential
prefetch
![Page 7: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/7.jpg)
Examples of FF
• if SQL statement specified: – col = const,
• DB2 assumes FF is 1/col_cardinality
– col between const1 and const2• DB2 assumes FF=(const2 - const1)/(High - Low)
• For some predicates, FF not predictable by simple formula
![Page 8: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/8.jpg)
Explain Plan
• You can have access to query plan with
EXPLAIN PLAN statement for SQL_query in ORACLE
gives access type (index) col
![Page 9: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/9.jpg)
Using Indexes
• System must decide if to use index
• What if more than one index, which one?
• What if composite index?
![Page 10: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/10.jpg)
Plans using Indexes
Can use an index if index matches select condition in where clause:
1. A matching index scan - only have to access a limited number of contiguous leaf entries to access data
2. Predicate screening – index entries to eliminate RIDs3. Non-matching index scan – use index to identify RIDs4. Index-only retrieval – don’t have to access data, RIDs5. Multiple index retrieval – use >1 index to identify RIDs
![Page 11: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/11.jpg)
Indexes – Matching index scan
A matching index scan is a single-step query planOnly have to access contiguous leaf nodes
Example: Assume a table T1 with multiple indexes on columns C1, C2 and C3
1. Single where clause and (one) index matches Select * from T1
where C1=10
search B+-tree to leaf level for leftmost entry having specified values
useful for =, between
![Page 12: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/12.jpg)
Index Scan 2. If multiple where clauses and all '='
Select * from T1 where C1=10 and C2=5 and C3=1
a) if there is a composite index and a selectcondition matches all index columns
only have to read contiguous leaf pages FF = FF(P1) * FF(P2) * ...
b) if there is a separate index for each clause
must choose one of the indexes
![Page 13: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/13.jpg)
Index Scan– Predicate screening
3. If all select conditions match composite index columns and some selects are a range Select * from T1 where C1=10 and C2
between 5 and 50 and C3 like ‘A%’
- Access contiguous leaf pages, but not all results on contiguous leaf pages - Must examine index entries to determine if in the result
called predicate screening
![Page 14: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/14.jpg)
Predicate screening
• discard RIDs based on values (for index)
• will access fewer tuples because RIDs used instead to eliminate potential tuples
![Page 15: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/15.jpg)
Index Scan
4. If select conditions match some index columns of composite index
Select * from T1 where C1=10 and C2=30 and C6=20
- a matching scan can be used if at least one of the columns in select is first column of index – must eliminate tuples with what indexes you can, then
examine the tuples
![Page 16: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/16.jpg)
Rules for predicate matching
Decide how many attributes to match in a composite index after the first column, so can read in a small contiguous range of leaf entries in B+-tree to get RIDs
• Match first column of composite index then: – look at index columns from left to right – Match ends when no predicate found – If range (<=, like, between) for a column, match
terminates thereafter
If a range, easier to scan all entries for range - treat rest of entries as screening predicates
![Page 17: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/17.jpg)
Non-matching index scan
• Not always used by DBMSs• attributes in where clause don't include initial
attribute of index
Select * from T1
where C2=30 and C3=15
search leaf entries of index and compare values for entries
must read in all leaf pages to find C2, C3 values e.g. 50 index pages vs 500,000 data pages
![Page 18: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/18.jpg)
Index only retrieval
• elements retrieved in select clause are attributes of compose index
• don't need to access rows (actual data) Select C1, C3 from T1
where C1=5 and C3 between 2 and 5
Select count(*) from T1
![Page 19: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/19.jpg)
Multiple Index Access
• If conjunctive conditions in where clause (and), can use >1 index – Extract RIDs from each index satisfying
matching predicate – Intersect lists of RIDs (and them) from each
index – Final list - satisfies all predicates indexed – If disjunctive conditions (or)
Union the two lists of RIDs
![Page 20: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/20.jpg)
Some Query optimizer rules for using RID-lists (then use list prefetch)
1. predicted active resulting RIDs must not be > 50% of RID pool
2. Limit to any single RID list the size of the RID memory pool (16M RIDs)
3. RID list cannot be generated by screening predicates
![Page 21: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/21.jpg)
Rules for multiple indexes
Optimizer determines diminishing returns using multiple index access
1. List indexes with matching predicates in where clause
2. Place indexes in order by increasing filter factor3. For successive indexes, extract RID list only if
reduced cost for final row returned e.g. no sense reading 100's of pages of a new index to get number of rows to only 1 tuple
![Page 22: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/22.jpg)
List Prefetch – for accessing rows with RID list
• Assume once a list of RIDs is created, the system can order pages to minimize disk I/O– E.g. elevator algorithm for disk request
scheduling
![Page 23: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/23.jpg)
Problem: Using RID lists with Multiple IndexesProspects Table : 50M rows - 10 row per
pageIndexes:
• zipcode – 100,000 values (100 entries per page)
• hobby – 100 values (1000 entries per page• page – 50 values (1000 entries per page• incomeclass – 10 values (1000 entries per
page)
![Page 24: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/24.jpg)
Problem cont’d
Select name, stradr from prospectswhere zipcode between 02159 and 02658and age = 40 and hobby = ‘chess’ and incomeclass = 10;
FF in ascending order:1. FF(zipcode) = 500/100,000 = 1/2002. FF(hobby) = 1/1003. FF(age) = 1/504. FF(incomeclass) = 1/10
![Page 25: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/25.jpg)
Problem cont’d
Rows in table is: 5,000,000Data rows read if use indexes: (1) 50,000,000/200 = 250,000 (1,2) 250,000/100 = 2500 (1,2,3) 2500/50 = 50 (1,2,3,4) 50/10 = 5 How much time will this take? Is it cost effective
to use all of these indexes? see textbook Pg. 579
![Page 26: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/26.jpg)
Problem cont’d I/O costs
• Cost:– RIO is 1/80– Sequential Prefetch 1/800– List Prefetch 1/200
• Note: textbook assumes if read <= 3 pages use RIO
![Page 27: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/27.jpg)
Problem cont’d
Table scan5,000,000/800 = 6250
Using index 1:index: (500,000/200 + 3)/800 = 4data: 250,000/200 = 1250
Using indexes 1&2:index (50,000/100)/800 = 0.625data: 2500/200 = 12.5
![Page 28: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/28.jpg)
Problem cont’d
Using indexes 1,2,3:
index (50,000/50)/800 = 1.25
data: 50/200 = 0.25
Using indexes 1,2,3,4:
index (50,000/10)/800 = 6.25
data: 5/200 = 0.025
![Page 29: Query Optimization](https://reader033.vdocuments.us/reader033/viewer/2022061217/54b4ee0e4a7959244d8b46b9/html5/thumbnails/29.jpg)
Problem cont’dIndex used
Data rows
I/O cost
Index I/O cost
Cost Increase if use index
None 50M
6250 s
1 250,000
1250 s
4 s Decrease 6250 to 1250 s
With 4 additional s
1,2 2500
12.5 s
4 + 0.625 s Decrease 1250 to 12.5 s
With 0.625 additional s
1,2,3 50
0.25 s
4 + 0.625 + 1.25
Decrease 12.5 to .25 s
With 1.25 additional s
1,2,3,4 5
0.025 s
4 + 0.625 + 1.25 + 6.25 s
Decrease 0.25 to 0.025 s
With 6.25 additional s