query optimizer (chapter 9.0 - 9.6)

24
Query Optimizer (Chapter 9.0 - 9.6)

Upload: vivien-william

Post on 01-Jan-2016

40 views

Category:

Documents


1 download

DESCRIPTION

Query Optimizer (Chapter 9.0 - 9.6). Optimization. Minimizes uses of resources by choosing best set of alternative query access plans considers  I/O cost, CPU cost gathers statistics - may become out of date (DB2 - RUNSTATS) - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Query Optimizer  (Chapter 9.0 - 9.6)

Query Optimizer

(Chapter 9.0 - 9.6)

Page 2: Query Optimizer  (Chapter 9.0 - 9.6)

Optimization

• Minimizes uses of resources by choosing best set of alternative query access plans

• considers  I/O cost, CPU cost

• gathers statistics - may become out of date (DB2 - RUNSTATS)

• selectivity of values - 1/domain - used to determine number of tuples of each values

Page 3: Query Optimizer  (Chapter 9.0 - 9.6)

Filter Factor - selectivity

• Fraction of rows with specified values(s) for specified attribute that result from the predicate restriction

• FF(c)= # records satisfying conditiontotal# of records in relation

• Estimate attribute with i distinct values as: ( |R|/i) / |R|  = 1/col_cardinality

e.g. (10,000/2)/10,000 = 1/2               

Page 4: Query Optimizer  (Chapter 9.0 - 9.6)

Filter Factor FF

• FF tells how many tuples satisfy predicate - hopefully only need to access those tuples + index

• Statistical assumptions - uniform distribution of column values, independent join distribution of values from any 2 columns

Page 5: Query Optimizer  (Chapter 9.0 - 9.6)

Assumptions

• Attribute values independent

• Conjunctive select (independent)                 C1 and C2                 FF(C1) * FF(C2)   

• e.g.  1/2 (gender) * 1/4 (class) = 1/8 freshman female in CS

Page 6: Query Optimizer  (Chapter 9.0 - 9.6)

Information for Optimization

1. SYSCOLUMNS     col_name, table_name, #of values, High, Low

2. Cluster Ratio how well clustering property holds for rows with respect to a given index           if 100% clustered - clustered             with updates, becomes less clustered              if clustering ratio 80% or more, use sequential

prefetch 3. Statistics on columns that deviate strongly from the

uniform assumption

Page 7: Query Optimizer  (Chapter 9.0 - 9.6)

Examples of FF

• if SQL statement specified: – col = const,

• DB2 assumes FF is 1/col_cardinality

– col between const1 and const2• DB2 assumes FF=(const2 - const1)/(High - Low)

• Predicates involving non-correlated subselects can be used for index retrieval but FF not predictable by simple formula

Page 8: Query Optimizer  (Chapter 9.0 - 9.6)

Explain Plan

• You can have access to query plan with

  EXPLAIN PLAN statement for SQL_query in ORACLE

gives access type (index) col

Page 9: Query Optimizer  (Chapter 9.0 - 9.6)

Plans using Indexes

Can use an index if index matches select condition in where clause:

• A matching index scan - only have to access a limited number of contiguous leaf entries to access data

• Predicate screening – index entries to eliminate RIDs• Non-matching index scan – use index to identify RIDs• Index-only retrieval – don’t have to access data, RIDs• Multiple index retrieval – use >1 index to identify RIDs

Page 10: Query Optimizer  (Chapter 9.0 - 9.6)

Matching index scan

When is a matching index scan used? Assume a table T1 with multiple indexes on

columns C1, C2 and C3

1. Single where clause and (one) index matches Select * from T1

where C1=10 search B+-tree to leaf level for leftmost entry having specified values

useful for =, between

Page 11: Query Optimizer  (Chapter 9.0 - 9.6)

Index Scan used

2. If multiple where clauses and all '=' Select * from T1 where C1=10 and C2=5 and C3=1

  a)   if there is a separate index for each clause       must choose one of the indexes b)  if there is a composite index and a select

condition matches all index columns onlyhave to read contiguous leaf pages

             FF = FF(P1) * FF(P2) * ...

Page 12: Query Optimizer  (Chapter 9.0 - 9.6)

Index Scan used

3. If all select conditions match composite index columns and some selects are a range Select * from T1 where C1=10 and C2

between 5 and 50

- not all entries on contiguous leaf pages

If must examine index entries to determine if in the result called predicate screening

Page 13: Query Optimizer  (Chapter 9.0 - 9.6)

Predicate screening

• discard RIDs based on values (for index)

• will access fewer tuples because RIDs used instead to eliminate potential tuples

Page 14: Query Optimizer  (Chapter 9.0 - 9.6)

Index Scan used

4. If select conditions match some index columns of composite index

          Select * from T1 where C1=10 and C2=30 and C6=20

- a matching scan can be used if at least one of the columns in select is first column of index – must eliminate tuples with what indexes you can, then

examine the tuples

Page 15: Query Optimizer  (Chapter 9.0 - 9.6)

Rules for predicate matching

Decide how many attributes to match in a composite index after the first column, so can read in a small contiguous range of leaf entries in B+-tree to get RIDs

• Match first column of composite index then: – look at index columns from left to right – Match ends when no predicate found – If range (<=, like, between) for a column, match

terminates thereafter

  If a range, easier to scan all entries for range - treat rest of entries as screening predicates

Page 16: Query Optimizer  (Chapter 9.0 - 9.6)

Non-matching index scan

• attributes in where clause don't include initial attribute of index

          Select * from T1

where C2=30 and C3=15

search leaf entries of index and compare values for entries

must read in all leaf pages to find C2, C3 values       e.g. 50 index pages vs 500,000 data pages

Page 17: Query Optimizer  (Chapter 9.0 - 9.6)

Index only retrieval

• elements retrieved in select clause are attributes of compose index

• don't need to access rows (actual data)     Select C1, C3 from T1

where C1=5 and C3 between 2 and 5

       Select count(*) from T1

Page 18: Query Optimizer  (Chapter 9.0 - 9.6)

Multiple Index Access

• If conjunctive conditions in where clause (and), can use >1 index – Extract RIDs from each index satisfying

matching predicate – Intersect lists of RIDs (and them) from each

index – Final list - satisfies all predicates indexed

Page 19: Query Optimizer  (Chapter 9.0 - 9.6)

Multiple Index Access

– If disjunctive conditions (or)          Union the two lists of RIDs

Page 20: Query Optimizer  (Chapter 9.0 - 9.6)

Query optimizer rules for RIDs (DB2)

   1.  predicted active resulting RIDs must not be  > 50% of RID pool

2.  Limit to any single RID list the size of the RID memory pool (16M RIDs)

3.  RID list cannot be generated by screening predicates

Page 21: Query Optimizer  (Chapter 9.0 - 9.6)

Rules cont’d

Optimizer determines diminishing returns using multiple index access

1.  List indexes with matching predicates in where clause

2.  Place indexes in order by increasing filter factor3.  For successive indexes, extract RID list only if

reduced cost for final row returned     e.g. no sense reading 100's of pages of a new index to get number of rows to only 1 tuple

Page 22: Query Optimizer  (Chapter 9.0 - 9.6)

Example with Multiple Indexes

Table prospects: 50M rows

Indexes: zipcode – 100,000 values

hobby – 100 values

age – 50 values

incomeclass – 10 values

Page 23: Query Optimizer  (Chapter 9.0 - 9.6)

Example with Multiple Indexes

Select name, stradr from prospectswhere zipcode between 02159 and 02658and age = 40 and hobby = ‘chess’ and incomeclass = 10;

FF in ascending order:1. FF(zipcode) = 500/100,000 = 1/2002. FF(hobby) = 1/1003. FF(age) = 1/504. FF(incomeclass) = 1/10

Page 24: Query Optimizer  (Chapter 9.0 - 9.6)

Example

(1) 50,000,000/200 = 250,000

(2) 250,000/100 = 2500

(3) 2500/50 = 50

(4) 50/10 = 5

How much time will this take? Is it cost effective to use all of these indexes?

see textbook Pg. 579