based on slides from shasha et al, kifer et al. query...

61
© Dennis Shasha, Philippe Bonnet - 2001 QUERY PROCESSING Based on slides from Shasha et al, Kifer et al.

Upload: others

Post on 23-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 2001

QUERY PROCESSINGBased on slides from Shasha et al, Kifer et al.

Page 2: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 2001

Query TuningSELECT s.RESTAURANT_NAME, t.TABLE_SEATING, to_char(t.DATE_TIME,'Dy, Mon FMDD') AS THEDATE, to_char(t.DATE_TIME,'HH:MI PM') AS THETIME,to_char(t.DISCOUNT,'99') || '%' AS AMOUNTVALUE,t.TABLE_ID, s.SUPPLIER_ID, t.DATE_TIME, to_number(to_char(t.DATE_TIME,'SSSSS')) AS SORTTIME

FROM TABLES_AVAILABLE t, SUPPLIER_INFO s,

(SELECT s.SUPPLIER_ID, t.TABLE_SEATING, t.DATE_TIME, max(t.DISCOUNT) AMOUNT, t.OFFER_TYPEFROM TABLES_AVAILABLE t, SUPPLIER_INFO WHERE t.SUPPLIER_ID = s.SUPPLIER_ID

and (TO_CHAR(t.DATE_TIME, 'MM/DD/YYYY') != TO_CHAR(sysdate, 'MM/DD/YYYY') OR TO_NUMBER(TO_CHAR(sysdate, 'SSSSS')) < s.NOTIFICATION_TIME - s.TZ_OFFSET)

and t.NUM_OFFERS > 0and t.DATE_TIME > SYSDATE and s.CITY = 'SF' and t.TABLE_SEATING = '2' and t.DATE_TIME between sysdate and (sysdate + 7) and to_number(to_char(t.DATE_TIME, 'SSSSS')) between 39600 and 82800 and t.O F F E R _T Y P E = 'D iscou nt„

GROUP BYs.SUPPLIER_ID, t.TABLE_SEATING, t.DATE_TIME, t.OFFER_TYP

) u

WHEREt.SUPPLIER_ID = s.SUPPLIER_ID

and u.SUPPLIER_ID = s.SUPPLIER_IDand t.SUPPLIER_ID = u.SUPPLIER_IDand t.TABLE_SEATING = u.TABLE_SEATINGand t.DATE_TIME = u.DATE_TIMEand t.DISCOUNT = u.AMOUNTand t.OFFER_TYPE = u.OFFER_TYPEand (TO_CHAR(t.DATE_TIME, 'MM/DD/YYYY') !=

TO_CHAR(sysdate, 'MM/DD/YYYY') OR TO_NUMBER(TO_CHAR(sysdate, 'SSSSS')) < s.NOTIFICATION_TIME - s.TZ_OFFSET)

and t.NUM_OFFERS > and t.DATE_TIME > SYSDATE and s.CITY = 'SF' and t.TABLE_SEATING = '2' and t.DATE_TIME between sysdate and (sysdate + 7) and

to_number(to_char(t.DATE_TIME, 'SSSSS')) between 39600 and 82800 and t.OFFER_TYPE = 'Discount'

ORDER BY AMOUNTVALUE DESC, t.TABLE_SEATING ASC, upper(s.RESTAURANT_NAME) ASC,SORTTIME ASC, t.DATE_TIME ASC

E xecution is too slow …1) How is this query executed?2) How to make it run faster?

Page 3: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 2001

Query Execution PlanOutput of the Oracle EXPLAIN tool

Physical Operators

Access Method Cost Model

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=165 Card=1 Bytes=106)

1 0 SORT (ORDER BY) (Cost=165 Card=1 Bytes=106)

2 1 NESTED LOOPS (Cost=164 Card=1 Bytes=106)

3 2 NESTED LOOPS (Cost=155 Card=1 Bytes=83)

4 3 TABLE ACCESS (FULL) OF 'TABLES_AVAILABLE' (Cost=72 Card=1 Bytes=28)

5 3 VIEW

6 5 SORT (GROUP BY) (Cost=83 Card=1 Bytes=34)

7 6 NESTED LOOPS (Cost=81 Card=1 Bytes=34)

8 7 TABLE ACCESS (FULL) OF 'TABLES_AVAILABLE' (Cost=72 Card=1 Bytes=24)

9 7 TABLE ACCESS (FULL) OF 'SUPPLIER_INFO' (Cost=9 Card=20 Bytes=200)

10 2 TABLE ACCESS (FULL) OF 'SUPPLIER_INFO' (Cost=9 Card=20 Bytes=460)

Page 4: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 20014

Query Processing Architecture

Page 5: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 2001

Agenda

• Query Representation• Access Path• Physical Operators• Query Optimization• Back to Query Representation

Page 6: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 2001

Query Representation

• A query is decomposed into blocks– Aggregation– Order by– SPJ– Relations

• Each block is represented and optimized independently– Inter block optimization is possible/desirable.

Page 7: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 2001

Query Representation

• Query Tree • Query graph

Reserves Sailors

sid=sid

bid=100

sname(On-the-fly)

rating > 5(Scan;write to temp T1)

(Scan;write totemp T2)

(Sort-Merge Join)

Reserves

Sailors

rating > 5

bid=100

sid=sid

Page 8: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 2001

Query Execution PlanOutput of the Oracle EXPLAIN tool

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=165 Card=1 Bytes=106)

1 0 SORT (ORDER BY) (Cost=165 Card=1 Bytes=106)

2 1 NESTED LOOPS (Cost=164 Card=1 Bytes=106)

3 2 NESTED LOOPS (Cost=155 Card=1 Bytes=83)

4 3 TABLE ACCESS (FULL) OF 'TABLES_AVAILABLE' (Cost=72 Card=1 Bytes=28)

5 3 VIEW

6 5 SORT (GROUP BY) (Cost=83 Card=1 Bytes=34)

7 6 NESTED LOOPS (Cost=81 Card=1 Bytes=34)

8 7 TABLE ACCESS (FULL) OF 'TABLES_AVAILABLE' (Cost=72 Card=1 Bytes=24)

9 7 TABLE ACCESS (FULL) OF 'SUPPLIER_INFO' (Cost=9 Card=20 Bytes=200)

10 2 TABLE ACCESS (FULL) OF 'SUPPLIER_INFO' (Cost=9 Card=20 Bytes=460)

Page 9: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 2001

Agenda

• Query Representation• Access Path• Physical Operators• Query Optimization• Back to Query Representation

Page 10: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200110

Access Path•• Access pathAccess path is the notion that denotes algorithm +

data structure used to locate rows satisfying some condition

• Examples:– File scan: can be used for any condition– Hash: equality search; all search key attributes of hash

index are specified in condition– B+ tree: equality or range search; a prefix of the search

key attributes are specified in condition• B+ tree supports a variety of access paths

– Binary search: Relation sorted on a sequence of attributes and some prefix of that sequence is specified in condition

Page 11: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200111

Access Paths Supported by B+ tree• Example: Given a B+ tree whose search key is the

sequence of attributes a2, a1, a3, a4– Access path for search a1>5 AND a2=3 AND a3= ‘x’ (R): find

first entry having a2=3 AND a1>5 AND a3= ‘x’ and scan leaves from there until entry having a2>3 or a3 ‘x’. Select satisfying entries

– Access path for search a2=3 AND a3 >‘x’ (R): locate first entry having a2=3 and scan leaves until entry having a2>3. Select satisfying entries

– Access path for search a1>5 AND a3 = ‘x’ (R): Scan of R

Page 12: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200112

Choosing an Access Path•• SelectivitySelectivity of an access path = number of pages

retrieved using that path• Size of domain of attribute is an indicator of the

selectivity of search conditions that involve that attribute

• Example: CrsCode= „C S 305‟ AND Grade= „B ‟ (TranscriptTranscript)– a B+ tree with search key CrsCode has lower selectivity

than a B+ tree with search key Grade

Page 13: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 2001

Agenda

• Query Representation• Access Path• Physical Operators• Query Optimization• Back to Query Representation

Page 14: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 2001

Physical Operators

• Algorithms– Sort– Select – Project– Join

• Nested Loop• Sort-Merge• Hash-Join

Page 15: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200115

External Sorting• Sorting is used in implementing many relational

operations• Problem:

– Relations are typically large, do not fit in main memory– So cannot use traditional in-memory sorting algorithms

• Approach used:– Combine in-memory sorting with techniques aimed at

minimizing I/O– I/O costs dominate => cost of sorting algorithm is measured

in the number of page transfers

Page 16: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200116

E xternal S orting (cont‟d)

• External sorting has two main components:– Computation involved in sorting records in

buffers in main memory– I/O necessary to move records between mass

store and main memory

Page 17: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200117

Simple Sort Algorithm• M = number of main memory page buffers• F = number of pages in file to be sorted• Typical algorithm has two phases:

– Partial sort phase: sort M pages at a time; create F/M sorted runsruns on mass store, cost = 2F

Example: M = 2, F = 7run

Original file

Partially sorted file

5 3 2 6 1 10 15 7 20 11 8 4 7 5

2 3 5 6 1 7 10 15 4 8 11 20 5 7

Page 18: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200118

Simple Sort Algorithm– Merge Phase: merge all runs into a single run

using M-1 buffers for input and 1 output buffer• Merge step: divide runs into groups of size M-1 and merge each group into a run; cost = 2F

each step reduces number of runs by a factor of M-1

M pagesBuffer

Page 19: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200119

Duplicate Elimination

• A major step in computing projection, union, and difference relational operators

• Algorithm:– Sort– At the last stage of the merge step eliminate

duplicates on the fly– No additional cost (with respect to sorting) in

terms of I/O

Page 20: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200120

Sort-Based Projection

• Algorithm:– Sort rows of relation– Eliminate unwanted columns in partial sort

phase (no additional cost)– Eliminate duplicates on completion of last

merge step (no additional cost)• Cost: the cost of sorting

Page 21: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200121

Computing Selection (attr op value)

• No index on attr:– If rows are not sorted on attr:

• Scan all data pages to find rows satisfying selection condition

• Cost = F– If rows are sorted on attr and op is =, >, < then:

• Use binary search (at log2 F ) to locate first data page containing row in which (attr = value)

• Scan further to get all rows satisfying (attr op value)• Cost = log2 F + (cost of scan)

Page 22: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200122

Computing Selection (attr op value)• Clustered B+ tree index on attr (for “= ” or range search):

– Locate first index entry corresponding to a row in which (attr = value). CostCost = depth of tree

– Rows satisfying condition packed in sequence in successive data pages; scan those pages.CostCost: number of pages occupied by qualifying rows

B+ treeindex entries(containing rows)that satisfycondition

Page 23: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200123

Computing Selection (attr op value)

• Unclustered B+ tree index on attr (for “= ” or range search):– Locate first index entry corresponding to a row in which (attr

= value). CostCost = depth of tree

– Index entries with pointers to rows satisfying condition are packed in sequence in successive index pages• Scan entries and sort record Ids to identify table data pages

with qualifying rowsAny page that has at least one such row must be fetched once.

•• CostCost: number of rows that satisfy selection condition

Page 24: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200124

Unclustered B+ Tree Indexindex entries (containingrow Ids) that satisfycondition

data page

Data file

B+ Tree

Page 25: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200125

Computing Selection (attr = value)

• Hash index on attr (for “= ” search only):– Hash on value. CostCost 1.2

• 1.2 – typical average cost of hashing (> 1 due to possible overflow chains)

• Finds the (unique) bucket containing all index entries satisfying selection condition

• Clustered index – all qualifying rows packed in the bucket (a few pages)CostCost: number of pages occupies by the bucket

• Unclustered index – sort row Ids in the index entries to identify data pages with qualifying rowsEach page containing at least one such row must be fetched onceCostCost: min(number of qualifying rows in bucket, number of pages in file)

Page 26: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200126

Computing Joins• The cost of joining two relations makes the

choice of a join algorithm crucial•• SimpleSimple blockblock--nested loopsnested loops join algorithm

for computing r A=B s

foreach page pr in r doforeach page ps in s do

output pr A=B ps

Page 27: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200127

Block-Nested Loop Illustrated

Output buffer

s

r

Input buffer for s

Input buffer for r

… and so on

r s

Page 28: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200128

Index-Nested Loop Join r A=B s• Use an index on s with search key B (instead of

scanning s) to find rows of s that match tr–– Cost Cost = r + r + cost of outputting final result

– Effective if number of rows of s that match tuples in r is small (i.e., is small) and index is clustered

foreach tuple tr in r do {use index to find all tuples ts in s satisfying tr.A=ts.B;output (tr, ts)

}

Number of rows in r

avg cost of retrieving all rows in s that match tr

Page 29: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200129

Sort-Merge Join r A=B ssort r on A;sort s on B;while !eof(r) and !eof(s) do {

Scan r and s concurrently until tr.A=ts.B=c;Output A=c(r)B=c (s)

}

r

s

B=c (s)

A=c(r)

Page 30: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200130

Hash-Join r A=B s

• Step 1: Hash r on A and s on B into the same set of buckets

• Step 2: Since matching tuples must be in same bucket, read each bucket in turn and output the result of the join

•• CostCost:: 3 (r + s ) + cost of output of final result– assuming each bucket fits in memory

Page 31: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 2001

Agenda

• Query Representation• Access Path• Physical Operators• Query Optimization• Back to Query Representation

Page 32: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200132

Query Optimizer

• Uses heuristic algorithms to evaluate relational algebra expressions. This involves:– estimating the cost of a relational algebra expression– transforming one relational algebra expression to an

equivalent one– choosing access paths for evaluating the subexpressions

• Q uery optim izers do not “optim ize” – just try to find “reasonably good” evaluation strategies

Page 33: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200133

Equivalence Preserving Transformations

• To transform a relational expression into another equivalent expression we need transformation rules that preserve equivalence

• Each transformation rule– Is provably correct (ie, does preserve equivalence)– Has a heuristic associated with it

Page 34: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200134

Selection and Projection Rules

• Break complex selection into simpler ones:– Cond1Cond2 (R) Cond1 (Cond2 (R) )

• Break projection into stages:– attr (R) attr ( attr(R)), if attr attr

• Commute projection and selection:– attr (Cond(R)) Cond ( attr (R)),

if attr all attributes in Cond

Page 35: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200135

Commutativity and Associativity of Join(and Cartesian Product as Special Case)

• Join commutativity: R S S R– used to reduce cost of nested loop evaluation strategies (smaller relation

should be in outer loop)

• Join associativity: R (S T) (R S) T– used to reduce the size of intermediate relations in computation of multi-

relational join – first compute the join that yields smaller intermediate result

• N-way join has T(N) N! different evaluation plans– T(N) is the number of parenthesized expressions– N! is the number of permutations

• Query optimizer cannot look at all plans (might take longer to find an optimal plan than to compute query brute-force). Hence it does not necessarily produce optimal plan

Page 36: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200136

Pushing Selections and Projections

• Cond (R S) R Cond S – Cond relates attributes of both R and S– Reduces size of intermediate relation since rows can be

discarded sooner

• Cond (R S) Cond (R) S – Cond involves only the attributes of R– Reduces size of intermediate relation since rows of R are

discarded sooner

• attr(R S) attr(attr(R) S),if attributes(R) attr attr ∩ attributes(R)– reduces the size of an operand of product

Page 37: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200137

Equivalence Example

• C1 C2 C3 (RR SS) C1 (C2 (C3 (RR SS) ) ) C1 (C2 (RR) C3 (SS) ) C2 (RR) C1C3 (SS)

assuming C2 involves only attributes of RR, C3 involves only attributes of SS, and C1 relates attributes of R and SS

Page 38: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200138

SELECT P.NameFROM ProfessorProfessor P, TeachingTeaching TWHERE P.Id = T.ProfId -- join condition

AND P. DeptId = „C S ‟ AND T.Semester = „F 1994‟

Name(D eptId= ‘C S’ Sem ester= ‘F 1994’(ProfessorProfessor Id=ProfId TeachingTeaching))

Cost - Example 1

Name

D eptId= ‘C S’ Sem ester= ‘F 1994’

Id=ProfId

ProfessorProfessor TeachingTeaching

Master query execution plan (nothing pushed)

Page 39: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200139

Metadata on Tables (in system catalogue)

–– ProfessorProfessor (Id, Name, DeptId)• size: 200 pages, 1000 rows, 50 departments• indexes: clustered, 2-level B+tree on DeptId, hash on Id

–– TeachingTeaching (ProfId, CrsCode, Semester)• size: 1000 pages, 10,000 rows, 4 semesters• indexes: clustered, 2-level B+tree on Semester;

hash on ProfId– Definition: WeightWeight of an attribute – average number

of rows that have a particular value• weight of Id = 1 (it is a key)• weight of ProfId = 10 (10,000 classes/1000 professors)

Page 40: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200140

Estimating Cost - Example 1• Join - block-nested loops with 52 page buffer (50

pages – input for Professor, Professor, 1 page – input for TeachingTeaching, 1 – output page– Scanning ProfessorProfessor (outer loop): 200 page transfers,

(4 iterations, 50 transfers each)– Finding matching rows in TeachingTeaching (inner loop):

1000 page transfers for each iteration of outer loop– Total cost = 200+4*1000 = 4200 page transfers

Page 41: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200141

Estimating Cost - E xam ple 1 (cont‟d)

• Selection and projection – scan rows of interm ediate file, discard those that don‟t satisfy selection, project on those that do, write result when output buffer is full.

• Complete algorithm:– do join, write result to intermediate file on disk– read intermediate file, do select/project, write final

result– Problem: unnecessary I/O

Page 42: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200142

Pipelining• Solution: use pipeliningpipelining:

– join and select/project act as coroutines, operate as producer/consumer sharing a buffer in main memory.• When join fills buffer; select/project filters it and outputs

result• Process is repeated until select/project has processed last

output from join – Performing select/project adds no additional cost

joinjoin selectselect/projectprojectIntermediateresult

outputfinal result

buffer

Page 43: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200143

Estimating Cost - E xam ple 1 (cont‟d)

• Total cost:4200 + (cost of outputting final result)

– We will disregard the cost of outputting final result in comparing with other query evaluation strategies, since this will be same for all

Page 44: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200144

Name(Sem ester= ‘F 1994’ (D eptId= ‘C S’ (ProfessorProfessor) Id=ProfId TeachingTeaching))

Cost Example 2SELECT P.NameFROM ProfessorProfessor P, TeachingTeaching TWHERE P.Id = T.ProfId AND

P. DeptId = „C S ‟ AND T.Semester = „F 1994‟

Name

Sem ester= ‘F 1994’

DeptId= „C S ‟

ProfessorProfessor TeachingTeaching

Id=ProfId

Partially pushed plan: selection pushed to ProfessorProfessor

Page 45: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200145

Cost Example 2 -- selection• Compute DeptId= „C S ‟ (ProfessorProfessor) (to reduce size of one

join table) using clustered, 2-level B+ tree on DeptId.– 50 departments and 1000 professors; hence weight

of DeptId is 20 (roughly 20 CS professors). These rows are in ~4 consecutive pages in ProfessorProfessor.• Cost = 4 (to get rows) + 2 (to search index) = 6• keep resulting 4 pages in memory and pipe to next step

clustered indexon DeptId rows of

ProfessorProfessor

Page 46: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200146

Cost Example 2 -- join

• Index-nested loops join using hash index on ProfId of TeachingTeaching and looping on the selected professors (computed on previous slide)– Since selection on Semester was not pushed, hash

index on ProfId of TeachingTeaching can be used – Note: if selection on Semester were pushed, the

index on ProfId would have been lost – an advantage of not using a fully pushed query execution plan

Page 47: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200147

Cost Example 2 – join (cont‟d)– Each professor matches ~10 TeachingTeaching rows. Since 20 CS

professors, hence 200 teaching records.– All index entries for a particular ProfId are in same bucket.

Assume ~1.2 I/Os to get a bucket.• Cost = 1.2 20 (to fetch index entries for 20 CS

professors) + 200 (to fetch TeachingTeaching rows, since hash index is unclustered) = 224

TeachingTeachinghash

1.2 20 10

ProfId

Page 48: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200148

Cost Example 2 – select/project

• Pipe result of join to select (on Semester) and project (on Name) at no I/O cost

• Cost of output same as for Example 1• Total cost:

6 (select on ProfessorProfessor) + 224 (join) = 230• Comparison:

4200 (example 1) vs. 230 (example 2) !!!

Page 49: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200149

Choosing Query Execution Plan

• Step 1: Choose a logical plan• Step 2: Reduce search space• Step 3: Use a heuristic search to further

reduce complexity

Page 50: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200150

Step 1: Choosing a Logical Plan

• Involves choosing a query tree, which indicates the order in which algebraic operations are applied

• Heuristic: P ushed trees are good, but som etim es “nearly fully pushed” trees are better due to indexing (as w e saw in the example)

• So: T ake the initial “m aster plan” tree and produce a fully pushed tree plus several nearly fully pushed trees.

Page 51: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200151

Step 2: Reduce Search Space

• Deal with associativity of binary operators (join, union, … )

A B C DA B C D

A B C DA B C D

DD

CC

A BA BLogical queryexecution plan

Equivalentquery tree Equivalent left left

deep query treedeep query tree

Page 52: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200152

S tep 2 (cont‟d)

• Two issues:– Choose a particular shape of a tree (like in the

previous slide)• Equals the number of ways to parenthesize N-way

join – grows very rapidly– Choose a particular permutation of the leaves

• E.g., 4! permutations of the leaves A, B, C, DA, B, C, D

Page 53: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200153

Step 2: Dealing With Associativity• Too many trees to evaluate: settle on a particular

shape: leftleft--deep tree.deep tree.– Used because it allows pipeliningpipelining:

– Property: once a row of XX has been output by P1, it need not be output again (but C may have to be processed several times in P2 for successive portions of XX)

– Advantage: none of the intermediate relations (X, YX, Y) have to be completely materialized and saved on disk.• Important if one such relation is very large, but the final result is

small

A BA B X CX C Y DY DXX YY

P1 P2 P3

Page 54: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200154

Step 2: Dealing with Associativity

• consider the alternative: if we use the association ((A BA B) (C DC D))

A BA B

C DC D

X YX Y

XX

YY

Each row of XX mustbe processed againstall of YY. Hence all ofYY (can be very large)must be stored in P3, or P2 has torecompute it several times.

P1

P2P3

Page 55: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200155

Step 3: Heuristic Search

• The choice of left-deep trees still leaves open too many options (N! permutations):– (((A BA B) CC) DD), – (((C AC A) DD) BB), … ..

• A heuristic (often dynamic programming based) algorithm is used to get a „good‟ plan

Page 56: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 200156

Step 3: Dynamic Programming Algorithm

• To compute a join of E1, E2, … , E N in a left-deep manner:– Start with 1-relation expressions (can involve , )– C hoose the best and “nearly best” plans for each (a plan is

considered nearly best if its out put has som e “interesting” form, e.g., is sorted)

– Combine these 1-relation plans into 2-relation expressions. Retain only the best and nearly best 2-relation plans

– Do same for 3-relation expressions, etc.

Page 57: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 2001

Performance Impact of the Query Rewritings

>10000

-10

0

10

20

30

40

50

60

70

80

Thro

ughp

ut ra

tio

SQLServer 2000Oracle 8iDB2 V7.1

Running Example100000 Employees100000 Students10 tech

Page 58: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 2001

Agenda

• Query Representation• Access Path• Physical Operators• Query Optimization• Back to Query Representation

Page 59: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 2001

SubqueriesTraditional optimization:• Block / Block

optimization• Nested loop across

blocks

select c custkey

from customer,

(select o custkey from orders

group by c custkey

having 1000000 < sum(o totalprice))

as AggResult

where o custkey = c custkey

Page 60: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 2001

Subqueries

• Using an algebraic notation to represent subqueries

• Apply is a higher order operator that takes as input a relation and a parametrized expression and outputs a relation.

Page 61: Based on slides from Shasha et al, Kifer et al. QUERY PROCESSINGhjemmesider.diku.dk/~bonnet/coursematerial/AdvancedDB06/... · 2006-10-05 · © Dennis Shasha, Philippe Bonnet - 2001

© Dennis Shasha, Philippe Bonnet - 2001

Subqueries

• The query tree can be rewritten to eliminate the apply operator

1000000<X(customer A. G1 X=sum(o price) o custkey=c custkeyorders)

= 1000000<X Gc custkey;X=sum(o price)(customer ALOJ o custkey=c custkeyorders);

= 1000000<X Gc custkey;X=sum(o price)(customer LOJ o custkey=c custkeyorders);

= 1000000<X Gc custkey;X=sum(o price)(customer JOIN o custkey=c custkey orders);