based on slides from shasha et al, kifer et al. query...

© Dennis Shasha, Philippe Bonnet - 2001

QUERY PROCESSINGBased on slides from Shasha et al, Kifer et al.


Query TuningSELECT s.RESTAURANT_NAME, t.TABLE_SEATING, to_char(t.DATE_TIME,'Dy, Mon FMDD') AS THEDATE, to_char(t.DATE_TIME,'HH:MI PM') AS THETIME,to_char(t.DISCOUNT,'99') || '%' AS AMOUNTVALUE,t.TABLE_ID, s.SUPPLIER_ID, t.DATE_TIME, to_number(to_char(t.DATE_TIME,'SSSSS')) AS SORTTIME

FROM TABLES_AVAILABLE t, SUPPLIER_INFO s,

(SELECT s.SUPPLIER_ID, t.TABLE_SEATING, t.DATE_TIME, max(t.DISCOUNT) AMOUNT, t.OFFER_TYPEFROM TABLES_AVAILABLE t, SUPPLIER_INFO WHERE t.SUPPLIER_ID = s.SUPPLIER_ID

and (TO_CHAR(t.DATE_TIME, 'MM/DD/YYYY') != TO_CHAR(sysdate, 'MM/DD/YYYY') OR TO_NUMBER(TO_CHAR(sysdate, 'SSSSS')) < s.NOTIFICATION_TIME - s.TZ_OFFSET)

and t.NUM_OFFERS > 0and t.DATE_TIME > SYSDATE and s.CITY = 'SF' and t.TABLE_SEATING = '2' and t.DATE_TIME between sysdate and (sysdate + 7) and to_number(to_char(t.DATE_TIME, 'SSSSS')) between 39600 and 82800 and t.O F F E R _T Y P E = 'D iscou nt„

GROUP BYs.SUPPLIER_ID, t.TABLE_SEATING, t.DATE_TIME, t.OFFER_TYP

) u

WHEREt.SUPPLIER_ID = s.SUPPLIER_ID

and u.SUPPLIER_ID = s.SUPPLIER_IDand t.SUPPLIER_ID = u.SUPPLIER_IDand t.TABLE_SEATING = u.TABLE_SEATINGand t.DATE_TIME = u.DATE_TIMEand t.DISCOUNT = u.AMOUNTand t.OFFER_TYPE = u.OFFER_TYPEand (TO_CHAR(t.DATE_TIME, 'MM/DD/YYYY') !=

TO_CHAR(sysdate, 'MM/DD/YYYY') OR TO_NUMBER(TO_CHAR(sysdate, 'SSSSS')) < s.NOTIFICATION_TIME - s.TZ_OFFSET)

and t.NUM_OFFERS > and t.DATE_TIME > SYSDATE and s.CITY = 'SF' and t.TABLE_SEATING = '2' and t.DATE_TIME between sysdate and (sysdate + 7) and

to_number(to_char(t.DATE_TIME, 'SSSSS')) between 39600 and 82800 and t.OFFER_TYPE = 'Discount'

ORDER BY AMOUNTVALUE DESC, t.TABLE_SEATING ASC, upper(s.RESTAURANT_NAME) ASC,SORTTIME ASC, t.DATE_TIME ASC

E xecution is too slow …1) How is this query executed?2) How to make it run faster?


Query Execution PlanOutput of the Oracle EXPLAIN tool

Physical Operators

Access Method Cost Model

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=165 Card=1 Bytes=106)

1 0 SORT (ORDER BY) (Cost=165 Card=1 Bytes=106)

2 1 NESTED LOOPS (Cost=164 Card=1 Bytes=106)


4 3 TABLE ACCESS (FULL) OF 'TABLES_AVAILABLE' (Cost=72 Card=1 Bytes=28)

5 3 VIEW

6 5 SORT (GROUP BY) (Cost=83 Card=1 Bytes=34)



9 7 TABLE ACCESS (FULL) OF 'SUPPLIER_INFO' (Cost=9 Card=20 Bytes=200)



Query Processing Architecture


Agenda

• Query Representation• Access Path• Physical Operators• Query Optimization• Back to Query Representation


Query Representation

• A query is decomposed into blocks– Aggregation– Order by– SPJ– Relations

• Each block is represented and optimized independently– Inter block optimization is possible/desirable.


Query Representation

• Query Tree • Query graph

Reserves Sailors

sid=sid

bid=100

sname(On-the-fly)

rating > 5(Scan;write to temp T1)

(Scan;write totemp T2)

(Sort-Merge Join)

Reserves

Sailors

rating > 5

bid=100

sid=sid


Query Execution PlanOutput of the Oracle EXPLAIN tool

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=165 Card=1 Bytes=106)

1 0 SORT (ORDER BY) (Cost=165 Card=1 Bytes=106)




5 3 VIEW

6 5 SORT (GROUP BY) (Cost=83 Card=1 Bytes=34)






Agenda



Access Path•• Access pathAccess path is the notion that denotes algorithm +

data structure used to locate rows satisfying some condition

• Examples:– File scan: can be used for any condition– Hash: equality search; all search key attributes of hash

index are specified in condition– B+ tree: equality or range search; a prefix of the search

key attributes are specified in condition• B+ tree supports a variety of access paths

– Binary search: Relation sorted on a sequence of attributes and some prefix of that sequence is specified in condition


Access Paths Supported by B+ tree• Example: Given a B+ tree whose search key is the

sequence of attributes a2, a1, a3, a4– Access path for search a1>5 AND a2=3 AND a3= ‘x’ (R): find

first entry having a2=3 AND a1>5 AND a3= ‘x’ and scan leaves from there until entry having a2>3 or a3 ‘x’. Select satisfying entries

– Access path for search a2=3 AND a3 >‘x’ (R): locate first entry having a2=3 and scan leaves until entry having a2>3. Select satisfying entries

– Access path for search a1>5 AND a3 = ‘x’ (R): Scan of R


Choosing an Access Path•• SelectivitySelectivity of an access path = number of pages

retrieved using that path• Size of domain of attribute is an indicator of the

selectivity of search conditions that involve that attribute

• Example: CrsCode= „C S 305‟ AND Grade= „B ‟ (TranscriptTranscript)– a B+ tree with search key CrsCode has lower selectivity

than a B+ tree with search key Grade


Agenda



Physical Operators

• Algorithms– Sort– Select – Project– Join

• Nested Loop• Sort-Merge• Hash-Join


External Sorting• Sorting is used in implementing many relational

operations• Problem:

– Relations are typically large, do not fit in main memory– So cannot use traditional in-memory sorting algorithms

• Approach used:– Combine in-memory sorting with techniques aimed at

minimizing I/O– I/O costs dominate => cost of sorting algorithm is measured

in the number of page transfers


E xternal S orting (cont‟d)

• External sorting has two main components:– Computation involved in sorting records in

buffers in main memory– I/O necessary to move records between mass

store and main memory


Simple Sort Algorithm• M = number of main memory page buffers• F = number of pages in file to be sorted• Typical algorithm has two phases:

– Partial sort phase: sort M pages at a time; create F/M sorted runsruns on mass store, cost = 2F

Example: M = 2, F = 7run

Original file

Partially sorted file

5 3 2 6 1 10 15 7 20 11 8 4 7 5

2 3 5 6 1 7 10 15 4 8 11 20 5 7


Simple Sort Algorithm– Merge Phase: merge all runs into a single run

using M-1 buffers for input and 1 output buffer• Merge step: divide runs into groups of size M-1 and merge each group into a run; cost = 2F

each step reduces number of runs by a factor of M-1

M pagesBuffer


Duplicate Elimination

• A major step in computing projection, union, and difference relational operators

• Algorithm:– Sort– At the last stage of the merge step eliminate

duplicates on the fly– No additional cost (with respect to sorting) in

terms of I/O


Sort-Based Projection

• Algorithm:– Sort rows of relation– Eliminate unwanted columns in partial sort

phase (no additional cost)– Eliminate duplicates on completion of last

merge step (no additional cost)• Cost: the cost of sorting


Computing Selection (attr op value)

• No index on attr:– If rows are not sorted on attr:

• Scan all data pages to find rows satisfying selection condition

• Cost = F– If rows are sorted on attr and op is =, >, < then:

• Use binary search (at log2 F ) to locate first data page containing row in which (attr = value)

• Scan further to get all rows satisfying (attr op value)• Cost = log2 F + (cost of scan)


Computing Selection (attr op value)• Clustered B+ tree index on attr (for “= ” or range search):

– Locate first index entry corresponding to a row in which (attr = value). CostCost = depth of tree

– Rows satisfying condition packed in sequence in successive data pages; scan those pages.CostCost: number of pages occupied by qualifying rows

B+ treeindex entries(containing rows)that satisfycondition


Computing Selection (attr op value)

• Unclustered B+ tree index on attr (for “= ” or range search):– Locate first index entry corresponding to a row in which (attr

= value). CostCost = depth of tree

– Index entries with pointers to rows satisfying condition are packed in sequence in successive index pages• Scan entries and sort record Ids to identify table data pages

with qualifying rowsAny page that has at least one such row must be fetched once.

•• CostCost: number of rows that satisfy selection condition


Unclustered B+ Tree Indexindex entries (containingrow Ids) that satisfycondition

data page

Data file

B+ Tree


Computing Selection (attr = value)

• Hash index on attr (for “= ” search only):– Hash on value. CostCost 1.2

• 1.2 – typical average cost of hashing (> 1 due to possible overflow chains)

• Finds the (unique) bucket containing all index entries satisfying selection condition

• Clustered index – all qualifying rows packed in the bucket (a few pages)CostCost: number of pages occupies by the bucket

• Unclustered index – sort row Ids in the index entries to identify data pages with qualifying rowsEach page containing at least one such row must be fetched onceCostCost: min(number of qualifying rows in bucket, number of pages in file)


Computing Joins• The cost of joining two relations makes the

choice of a join algorithm crucial•• SimpleSimple blockblock--nested loopsnested loops join algorithm

for computing r A=B s

foreach page pr in r doforeach page ps in s do

output pr A=B ps


Block-Nested Loop Illustrated

Output buffer

s

r

Input buffer for s

Input buffer for r

… and so on

r s


Index-Nested Loop Join r A=B s• Use an index on s with search key B (instead of

scanning s) to find rows of s that match tr–– Cost Cost = r + r + cost of outputting final result

– Effective if number of rows of s that match tuples in r is small (i.e., is small) and index is clustered

foreach tuple tr in r do {use index to find all tuples ts in s satisfying tr.A=ts.B;output (tr, ts)

}

Number of rows in r

avg cost of retrieving all rows in s that match tr


Sort-Merge Join r A=B ssort r on A;sort s on B;while !eof(r) and !eof(s) do {

Scan r and s concurrently until tr.A=ts.B=c;Output A=c(r)B=c (s)

}

r

s

B=c (s)

A=c(r)


Hash-Join r A=B s

• Step 1: Hash r on A and s on B into the same set of buckets

• Step 2: Since matching tuples must be in same bucket, read each bucket in turn and output the result of the join

•• CostCost:: 3 (r + s ) + cost of output of final result– assuming each bucket fits in memory


Agenda



Query Optimizer

• Uses heuristic algorithms to evaluate relational algebra expressions. This involves:– estimating the cost of a relational algebra expression– transforming one relational algebra expression to an

equivalent one– choosing access paths for evaluating the subexpressions

• Q uery optim izers do not “optim ize” – just try to find “reasonably good” evaluation strategies


Equivalence Preserving Transformations

• To transform a relational expression into another equivalent expression we need transformation rules that preserve equivalence

• Each transformation rule– Is provably correct (ie, does preserve equivalence)– Has a heuristic associated with it


Selection and Projection Rules

• Break complex selection into simpler ones:– Cond1Cond2 (R) Cond1 (Cond2 (R) )

• Break projection into stages:– attr (R) attr ( attr(R)), if attr attr

• Commute projection and selection:– attr (Cond(R)) Cond ( attr (R)),

if attr all attributes in Cond


Commutativity and Associativity of Join(and Cartesian Product as Special Case)

• Join commutativity: R S S R– used to reduce cost of nested loop evaluation strategies (smaller relation

should be in outer loop)

• Join associativity: R (S T) (R S) T– used to reduce the size of intermediate relations in computation of multi-

relational join – first compute the join that yields smaller intermediate result

• N-way join has T(N) N! different evaluation plans– T(N) is the number of parenthesized expressions– N! is the number of permutations

• Query optimizer cannot look at all plans (might take longer to find an optimal plan than to compute query brute-force). Hence it does not necessarily produce optimal plan


Pushing Selections and Projections

• Cond (R S) R Cond S – Cond relates attributes of both R and S– Reduces size of intermediate relation since rows can be

discarded sooner

• Cond (R S) Cond (R) S – Cond involves only the attributes of R– Reduces size of intermediate relation since rows of R are

discarded sooner

• attr(R S) attr(attr(R) S),if attributes(R) attr attr ∩ attributes(R)– reduces the size of an operand of product


Equivalence Example

• C1 C2 C3 (RR SS) C1 (C2 (C3 (RR SS) ) ) C1 (C2 (RR) C3 (SS) ) C2 (RR) C1C3 (SS)

assuming C2 involves only attributes of RR, C3 involves only attributes of SS, and C1 relates attributes of R and SS


SELECT P.NameFROM ProfessorProfessor P, TeachingTeaching TWHERE P.Id = T.ProfId -- join condition

AND P. DeptId = „C S ‟ AND T.Semester = „F 1994‟

Name(D eptId= ‘C S’ Sem ester= ‘F 1994’(ProfessorProfessor Id=ProfId TeachingTeaching))

Cost - Example 1

Name

D eptId= ‘C S’ Sem ester= ‘F 1994’

Id=ProfId

ProfessorProfessor TeachingTeaching

Master query execution plan (nothing pushed)


Metadata on Tables (in system catalogue)

–– ProfessorProfessor (Id, Name, DeptId)• size: 200 pages, 1000 rows, 50 departments• indexes: clustered, 2-level B+tree on DeptId, hash on Id

–– TeachingTeaching (ProfId, CrsCode, Semester)• size: 1000 pages, 10,000 rows, 4 semesters• indexes: clustered, 2-level B+tree on Semester;

hash on ProfId– Definition: WeightWeight of an attribute – average number

of rows that have a particular value• weight of Id = 1 (it is a key)• weight of ProfId = 10 (10,000 classes/1000 professors)


Estimating Cost - Example 1• Join - block-nested loops with 52 page buffer (50

pages – input for Professor, Professor, 1 page – input for TeachingTeaching, 1 – output page– Scanning ProfessorProfessor (outer loop): 200 page transfers,

(4 iterations, 50 transfers each)– Finding matching rows in TeachingTeaching (inner loop):

1000 page transfers for each iteration of outer loop– Total cost = 200+4*1000 = 4200 page transfers


Estimating Cost - E xam ple 1 (cont‟d)

• Selection and projection – scan rows of interm ediate file, discard those that don‟t satisfy selection, project on those that do, write result when output buffer is full.

• Complete algorithm:– do join, write result to intermediate file on disk– read intermediate file, do select/project, write final

result– Problem: unnecessary I/O


Pipelining• Solution: use pipeliningpipelining:

– join and select/project act as coroutines, operate as producer/consumer sharing a buffer in main memory.• When join fills buffer; select/project filters it and outputs

result• Process is repeated until select/project has processed last

output from join – Performing select/project adds no additional cost

joinjoin selectselect/projectprojectIntermediateresult

outputfinal result

buffer


Estimating Cost - E xam ple 1 (cont‟d)

• Total cost:4200 + (cost of outputting final result)

– We will disregard the cost of outputting final result in comparing with other query evaluation strategies, since this will be same for all


Name(Sem ester= ‘F 1994’ (D eptId= ‘C S’ (ProfessorProfessor) Id=ProfId TeachingTeaching))

Cost Example 2SELECT P.NameFROM ProfessorProfessor P, TeachingTeaching TWHERE P.Id = T.ProfId AND

P. DeptId = „C S ‟ AND T.Semester = „F 1994‟

Name

Sem ester= ‘F 1994’

DeptId= „C S ‟

ProfessorProfessor TeachingTeaching

Id=ProfId

Partially pushed plan: selection pushed to ProfessorProfessor


Cost Example 2 -- selection• Compute DeptId= „C S ‟ (ProfessorProfessor) (to reduce size of one

join table) using clustered, 2-level B+ tree on DeptId.– 50 departments and 1000 professors; hence weight

of DeptId is 20 (roughly 20 CS professors). These rows are in ~4 consecutive pages in ProfessorProfessor.• Cost = 4 (to get rows) + 2 (to search index) = 6• keep resulting 4 pages in memory and pipe to next step

clustered indexon DeptId rows of

ProfessorProfessor


Cost Example 2 -- join

• Index-nested loops join using hash index on ProfId of TeachingTeaching and looping on the selected professors (computed on previous slide)– Since selection on Semester was not pushed, hash

index on ProfId of TeachingTeaching can be used – Note: if selection on Semester were pushed, the

index on ProfId would have been lost – an advantage of not using a fully pushed query execution plan


Cost Example 2 – join (cont‟d)– Each professor matches ~10 TeachingTeaching rows. Since 20 CS

professors, hence 200 teaching records.– All index entries for a particular ProfId are in same bucket.

Assume ~1.2 I/Os to get a bucket.• Cost = 1.2 20 (to fetch index entries for 20 CS

professors) + 200 (to fetch TeachingTeaching rows, since hash index is unclustered) = 224

TeachingTeachinghash

1.2 20 10

ProfId


Cost Example 2 – select/project

• Pipe result of join to select (on Semester) and project (on Name) at no I/O cost

• Cost of output same as for Example 1• Total cost:

6 (select on ProfessorProfessor) + 224 (join) = 230• Comparison:

4200 (example 1) vs. 230 (example 2) !!!


Choosing Query Execution Plan

• Step 1: Choose a logical plan• Step 2: Reduce search space• Step 3: Use a heuristic search to further

reduce complexity


Step 1: Choosing a Logical Plan

• Involves choosing a query tree, which indicates the order in which algebraic operations are applied

• Heuristic: P ushed trees are good, but som etim es “nearly fully pushed” trees are better due to indexing (as w e saw in the example)

• So: T ake the initial “m aster plan” tree and produce a fully pushed tree plus several nearly fully pushed trees.


Step 2: Reduce Search Space

• Deal with associativity of binary operators (join, union, … )

A B C DA B C D

A B C DA B C D

DD

CC

A BA BLogical queryexecution plan

Equivalentquery tree Equivalent left left

deep query treedeep query tree


S tep 2 (cont‟d)

• Two issues:– Choose a particular shape of a tree (like in the

previous slide)• Equals the number of ways to parenthesize N-way

join – grows very rapidly– Choose a particular permutation of the leaves

• E.g., 4! permutations of the leaves A, B, C, DA, B, C, D


Step 2: Dealing With Associativity• Too many trees to evaluate: settle on a particular

shape: leftleft--deep tree.deep tree.– Used because it allows pipeliningpipelining:

– Property: once a row of XX has been output by P1, it need not be output again (but C may have to be processed several times in P2 for successive portions of XX)

– Advantage: none of the intermediate relations (X, YX, Y) have to be completely materialized and saved on disk.• Important if one such relation is very large, but the final result is

small

A BA B X CX C Y DY DXX YY

P1 P2 P3


Step 2: Dealing with Associativity

• consider the alternative: if we use the association ((A BA B) (C DC D))

A BA B

C DC D

X YX Y

XX

YY

Each row of XX mustbe processed againstall of YY. Hence all ofYY (can be very large)must be stored in P3, or P2 has torecompute it several times.

P1

P2P3


Step 3: Heuristic Search

• The choice of left-deep trees still leaves open too many options (N! permutations):– (((A BA B) CC) DD), – (((C AC A) DD) BB), … ..

• A heuristic (often dynamic programming based) algorithm is used to get a „good‟ plan


Step 3: Dynamic Programming Algorithm

• To compute a join of E1, E2, … , E N in a left-deep manner:– Start with 1-relation expressions (can involve , )– C hoose the best and “nearly best” plans for each (a plan is

considered nearly best if its out put has som e “interesting” form, e.g., is sorted)

– Combine these 1-relation plans into 2-relation expressions. Retain only the best and nearly best 2-relation plans

– Do same for 3-relation expressions, etc.


Performance Impact of the Query Rewritings

>10000

-10

0

10

20

30

40

50

60

70

80

Thro

ughp

ut ra

tio

SQLServer 2000Oracle 8iDB2 V7.1

Running Example100000 Employees100000 Students10 tech


Agenda



SubqueriesTraditional optimization:• Block / Block

optimization• Nested loop across

blocks

select c custkey

from customer,

(select o custkey from orders

group by c custkey

having 1000000 < sum(o totalprice))

as AggResult

where o custkey = c custkey


Subqueries

• Using an algebraic notation to represent subqueries

• Apply is a higher order operator that takes as input a relation and a parametrized expression and outputs a relation.


Subqueries

• The query tree can be rewritten to eliminate the apply operator

1000000<X(customer A. G1 X=sum(o price) o custkey=c custkeyorders)

= 1000000<X Gc custkey;X=sum(o price)(customer ALOJ o custkey=c custkeyorders);

= 1000000<X Gc custkey;X=sum(o price)(customer LOJ o custkey=c custkeyorders);

= 1000000<X Gc custkey;X=sum(o price)(customer JOIN o custkey=c custkey orders);

based on slides from shasha et al, kifer et al. query...

Documents