1 chapter 21 query processing transparencies ownernoclient © pearson education limited 1995, 2005
TRANSCRIPT
![Page 1: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/1.jpg)
1
Chapter 21
Query Processing
Transparencies
© Pearson Education Limited 1995, 2005
![Page 2: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/2.jpg)
2
Chapter 21 - Objectives
Objectives of query processing and optimization. Static versus dynamic query optimization. How a query is decomposed and semantically
analyzed. How to create a R.A.T. to represent a query. Rules of equivalence for RA operations. How to apply heuristic transformation rules to
improve efficiency of a query.
© Pearson Education Limited 1995, 2005
![Page 3: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/3.jpg)
3
Chapter 21 - Objectives
Types of database statistics required to estimate cost of operations.
Different strategies for implementing selection. How to evaluate cost and size of selection. Different strategies for implementing join. How to evaluate cost and size of join. Different strategies for implementing projection. How to evaluate cost and size of projection.
© Pearson Education Limited 1995, 2005
![Page 4: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/4.jpg)
4
Chapter 21 - Objectives
How to evaluate the cost and size of other RA operations.
How pipelining can be used to improve efficiency of queries.
Difference between materialization and pipelining.
Advantages of left-deep trees. Approaches to finding optimal execution
strategy. How Oracle handles QO.
© Pearson Education Limited 1995, 2005
![Page 5: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/5.jpg)
5
Introduction
In network and hierarchical DBMSs, low-level procedural query language is generally embedded in high-level programming language.
Programmer’s responsibility to select most appropriate execution strategy.
With declarative languages such as SQL, user specifies what data is required rather than how it is to be retrieved.
Relieves user of knowing what constitutes good execution strategy.
© Pearson Education Limited 1995, 2005
![Page 6: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/6.jpg)
6
Introduction
Also gives DBMS more control over system performance.
Two main techniques for query optimization:– heuristic rules that order operations in a query; – comparing different strategies based on relative
costs, and selecting one that minimizes resource usage.
Disk access tends to be dominant cost in query processing for centralized DBMS.
© Pearson Education Limited 1995, 2005
![Page 7: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/7.jpg)
7
Query Processing
Activities involved in retrieving data from the database.
Aims of QP:– transform query written in high-level language
(e.g. SQL), into correct and efficient execution strategy expressed in low-level language (implementing RA);
– execute strategy to retrieve required data.
© Pearson Education Limited 1995, 2005
![Page 8: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/8.jpg)
8
Query Optimization
Activity of choosing an efficient execution strategy for processing query.
As there are many equivalent transformations of same high-level query, aim of QO is to choose one that minimizes resource usage.
Generally, reduce total execution time of query. May also reduce response time of query. Problem computationally intractable with large
number of relations, so strategy adopted is reduced to finding near optimum solution.
© Pearson Education Limited 1995, 2005
![Page 9: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/9.jpg)
9
Example 21.1 - Different Strategies
Find all Managers who work at a London branch.
SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
(s.position = ‘Manager’ AND b.city = ‘London’);
© Pearson Education Limited 1995, 2005
![Page 10: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/10.jpg)
10
Example 21.1 - Different Strategies
Three equivalent RA queries are:
(1) (position='Manager') (city='London')
(Staff.branchNo=Branch.branchNo) (Staff X Branch)
(2) (position='Manager') (city='London')(
Staff Staff.branchNo=Branch.branchNo Branch)
(3) (position='Manager'(Staff)) Staff.branchNo=Branch.branchNo
(city='London' (Branch))
© Pearson Education Limited 1995, 2005
![Page 11: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/11.jpg)
11
Example 21.1 - Different Strategies
Assume:– 1000 tuples in Staff; 50 tuples in Branch;– 50 Managers; 5 London branches;– no indexes or sort keys;– results of any intermediate operations stored
on disk;– cost of the final write is ignored;– tuples are accessed one at a time.
© Pearson Education Limited 1995, 2005
![Page 12: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/12.jpg)
12
Example 21.1 - Cost Comparison
Cost (in disk accesses) are:
(1) (1000 + 50) + 2*(1000 * 50) = 101 050 (2) 2*1000 + (1000 + 50) = 3 050 (3) 1000 + 2*50 + 5 + (50 + 5) = 1 160
Cartesian product and join operations much more expensive than selection, and third option significantly reduces size of relations being joined together.
© Pearson Education Limited 1995, 2005
![Page 13: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/13.jpg)
13
Phases of Query Processing
QP has four main phases:
– decomposition (consisting of parsing and validation);
– optimization;– code generation;– execution.
© Pearson Education Limited 1995, 2005
![Page 14: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/14.jpg)
14
Phases of Query Processing
© Pearson Education Limited 1995, 2005
![Page 15: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/15.jpg)
15
Dynamic versus Static Optimization
Two times when first three phases of QP can be carried out:– dynamically every time query is run;– statically when query is first submitted.
Advantages of dynamic QO arise from fact that information is up to date.
Disadvantages are that performance of query is affected, time may limit finding optimum strategy.
© Pearson Education Limited 1995, 2005
![Page 16: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/16.jpg)
16
Dynamic versus Static Optimization
Advantages of static QO are removal of runtime overhead, and more time to find optimum strategy.
Disadvantages arise from fact that chosen execution strategy may no longer be optimal when query is run.
Could use a hybrid approach to overcome this.
© Pearson Education Limited 1995, 2005
![Page 17: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/17.jpg)
17
Query Decomposition
Aims are to transform high-level query into RA query and check that query is syntactically and semantically correct.
Typical stages are:– analysis, – normalization, – semantic analysis, – simplification, – query restructuring.
© Pearson Education Limited 1995, 2005
![Page 18: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/18.jpg)
18
Analysis
Analyze query lexically and syntactically using compiler techniques.
Verify relations and attributes exist. Verify operations are appropriate for object type.
© Pearson Education Limited 1995, 2005
![Page 19: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/19.jpg)
19
Analysis - Example
SELECT staff_no
FROM Staff
WHERE position > 10;
This query would be rejected on two grounds:– staff_no is not defined for Staff relation
(should be staffNo).– Comparison ‘>10’ is incompatible with type
position, which is variable character string.
© Pearson Education Limited 1995, 2005
![Page 20: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/20.jpg)
20
Analysis
Finally, query transformed into some internal representation more suitable for processing.
Some kind of query tree is typically chosen, constructed as follows:– Leaf node created for each base relation.– Non-leaf node created for each intermediate
relation produced by RA operation.– Root of tree represents query result.– Sequence is directed from leaves to root.
© Pearson Education Limited 1995, 2005
![Page 21: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/21.jpg)
21
Example 21.1 - R.A.T.
© Pearson Education Limited 1995, 2005
![Page 22: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/22.jpg)
22
Normalization
Converts query into a normalized form for easier manipulation.
Predicate can be converted into one of two forms:
Conjunctive normal form:(position = 'Manager' salary > 20000) (branchNo = 'B003')
Disjunctive normal form:(position = 'Manager' branchNo = 'B003' )
(salary > 20000 branchNo = 'B003')
© Pearson Education Limited 1995, 2005
![Page 23: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/23.jpg)
23
Semantic Analysis
Rejects normalized queries that are incorrectly formulated or contradictory.
Query is incorrectly formulated if components do not contribute to generation of result.
Query is contradictory if its predicate cannot be satisfied by any tuple.
Algorithms to determine correctness exist only for queries that do not contain disjunction and negation.
© Pearson Education Limited 1995, 2005
![Page 24: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/24.jpg)
24
Semantic Analysis
For these queries, could construct:– A relation connection graph. – Normalized attribute connection graph.
Relation connection graph
Create node for each relation and node for result. Create edges between two nodes that represent a join, and edges between nodes that represent projection.
If not connected, query is incorrectly formulated. © Pearson Education Limited 1995, 2005
![Page 25: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/25.jpg)
25
Semantic Analysis - Normalized Attribute Connection Graph
Create node for each reference to an attribute, or constant 0.
Create directed edge between nodes that represent a join, and directed edge between attribute node and 0 node that represents selection.
Weight edges a b with value c, if it represents inequality condition (a b + c); weight edges 0 a with -c, if it represents inequality condition (a c).
If graph has cycle for which valuation sum is negative, query is contradictory.
© Pearson Education Limited 1995, 2005
![Page 26: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/26.jpg)
26
Example 21.2 - Checking Semantic Correctness
SELECT p.propertyNo, p.streetFROM Client c, Viewing v, PropertyForRent pWHERE c.clientNo = v.clientNo AND
c.maxRent >= 500 AND
c.prefType = ‘Flat’ AND p.ownerNo = ‘CO93’;
Relation connection graph not fully connected, so query is not correctly formulated.
Have omitted the join condition (v.propertyNo = p.propertyNo) .
© Pearson Education Limited 1995, 2005
![Page 27: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/27.jpg)
27
Example 21.2 - Checking Semantic Correctness
Relation Connection graph
Normalized attribute
connection graph
© Pearson Education Limited 1995, 2005
![Page 28: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/28.jpg)
28
Example 21.2 - Checking Semantic Correctness
SELECT p.propertyNo, p.streetFROM Client c, Viewing v, PropertyForRent pWHERE c.maxRent > 500 AND
c.clientNo = v.clientNo AND v.propertyNo = p.propertyNo AND c.prefType = ‘Flat’ AND c.maxRent <
200;
Normalized attribute connection graph has cycle between nodes c.maxRent and 0 with negative valuation sum, so query is contradictory.
© Pearson Education Limited 1995, 2005
![Page 29: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/29.jpg)
29
Simplification
– Detects redundant qualifications, – eliminates common sub-expressions, – transforms query to semantically equivalent
but more easily and efficiently computed form. Typically, access restrictions, view definitions,
and integrity constraints are considered. Assuming user has appropriate access privileges,
first apply well-known idempotency rules of boolean algebra.
© Pearson Education Limited 1995, 2005
![Page 30: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/30.jpg)
30
Transformation Rules for RA Operations
Conjunctive Selection operations can cascade into individual Selection operations (and vice versa).
pqr(R) = p(q(r(R)))
Sometimes referred to as cascade of Selection.
branchNo='B003' salary>15000(Staff) = branchNo='B003'(salary>15000(Staff))
© Pearson Education Limited 1995, 2005
![Page 31: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/31.jpg)
31
Transformation Rules for RA Operations
Commutativity of Selection.
p(q(R)) = q(p(R))
For example:
branchNo='B003'(salary>15000(Staff)) = salary>15000(branchNo='B003'(Staff))
© Pearson Education Limited 1995, 2005
![Page 32: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/32.jpg)
32
Transformation Rules for RA Operations
In a sequence of Projection operations, only the last in the sequence is required.
LM … N(R) = L (R)
For example:
lNamebranchNo, lName(Staff) = lName (Staff)
© Pearson Education Limited 1995, 2005
![Page 33: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/33.jpg)
33
Transformation Rules for RA Operations
Commutativity of Selection and Projection.
If predicate p involves only attributes in projection list, Selection and Projection operations commute:
Ai, …, Am(p(R)) = p(Ai, …, Am(R)) where p {A1, A2, …, Am}
For example:
fName, lName(lName='Beech'(Staff)) = lName='Beech'(fName,lName(Staff))
© Pearson Education Limited 1995, 2005
![Page 34: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/34.jpg)
34
Transformation Rules for RA Operations
Commutativity of Theta join (and Cartesian product).
R p S = S p R
R X S = S X R
Rule also applies to Equijoin and Natural join. For example:
Staff staff.branchNo=branch.branchNo Branch =
Branch staff.branchNo=branch.branchNo Staff
© Pearson Education Limited 1995, 2005
![Page 35: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/35.jpg)
35
Transformation Rules for RA Operations
Commutativity of Selection and Theta join (or Cartesian product).
If selection predicate involves only attributes of one of join relations, Selection and Join (or Cartesian product) operations commute:
p(R r S) = (p(R)) r S
p(R X S) = (p(R)) X S
where p {A1, A2, …, An}
© Pearson Education Limited 1995, 2005
![Page 36: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/36.jpg)
36
Transformation Rules for RA Operations
If selection predicate is conjunctive predicate having form (p q), where p only involves attributes of R, and q only attributes of S, Selection and Theta join operations commute as:
p q(R r S) = (p(R)) r (q(S))
p q(R X S) = (p(R)) X (q(S))
© Pearson Education Limited 1995, 2005
![Page 37: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/37.jpg)
37
Transformation Rules for RA Operations
For example:
position='Manager' city='London'(Staff
Staff.branchNo=Branch.branchNo Branch) =
(position='Manager'(Staff)) Staff.branchNo=Branch.branchNo (city='London' (Branch))
© Pearson Education Limited 1995, 2005
![Page 38: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/38.jpg)
38
Transformation Rules for RA Operations
Commutativity of Projection and Theta join (or Cartesian product).
If projection list is of form L = L1 L2, where L1 only has attributes of R, and L2 only has attributes of S, provided join condition only contains attributes of L, Projection and Theta join commute:
L1L2(R r S) = (L1(R)) r (L2(S))
© Pearson Education Limited 1995, 2005
![Page 39: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/39.jpg)
39
Transformation Rules for RA Operations
If join condition contains additional attributes not in L (M = M1 M2 where M1 only has attributes of R, and M2 only has attributes of S), a final projection operation is required:
L1L2(R r S) = L1L2( (L1M1(R)) r (L2M2(S)))
© Pearson Education Limited 1995, 2005
![Page 40: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/40.jpg)
40
Transformation Rules for RA Operations
For example:
position,city,branchNo(Staff Staff.branchNo=Branch.branchNo Branch) =
(position, branchNo(Staff)) Staff.branchNo=Branch.branchNo (
city, branchNo (Branch))
and using the latter rule:
position, city(Staff Staff.branchNo=Branch.branchNo Branch) =
position, city ((position, branchNo(Staff)) Staff.branchNo=Branch.branchNo ( city, branchNo (Branch)))
© Pearson Education Limited 1995, 2005
![Page 41: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/41.jpg)
41
Transformation Rules for RA Operations
Commutativity of Union and Intersection (but not set difference).
R S = S R
R S = S R
© Pearson Education Limited 1995, 2005
![Page 42: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/42.jpg)
42
Transformation Rules for RA Operations
Commutativity of Selection and set operations (Union, Intersection, and Set difference).
p(R S) = p(S) p(R)
p(R S) = p(S) p(R)
p(R - S) = p(S) - p(R)
© Pearson Education Limited 1995, 2005
![Page 43: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/43.jpg)
43
Transformation Rules for RA Operations
Commutativity of Projection and Union.
L(R S) = L(S) L(R)
Associativity of Union and Intersection (but not Set difference).
(R S) T = S (R T)
(R S) T = S (R T)
© Pearson Education Limited 1995, 2005
![Page 44: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/44.jpg)
44
Transformation Rules for RA Operations
Associativity of Theta join (and Cartesian product). Cartesian product and Natural join are always
associative:
(R S) T = R (S T)
(R X S) X T = R X (S X T)
If join condition q involves attributes only from S and T, then Theta join is associative:
(R p S) q r T = R p r (S q T)
© Pearson Education Limited 1995, 2005
![Page 45: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/45.jpg)
45
Transformation Rules for RA Operations
For example:
(Staff Staff.staffNo=PropertyForRent.staffNo PropertyForRent)
ownerNo=Owner.ownerNo staff.lName=Owner.lName Owner =
Staff staff.staffNo=PropertyForRent.staffNo staff.lName=lName
(PropertyForRent ownerNo Owner)
© Pearson Education Limited 1995, 2005
![Page 46: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/46.jpg)
46
Example 21.3 Use of Transformation Rules
For prospective renters of flats, find properties that match requirements and owned by CO93.
SELECT p.propertyNo, p.streetFROM Client c, Viewing v, PropertyForRent pWHERE c.prefType = ‘Flat’ AND
c.clientNo = v.clientNo AND v.propertyNo = p.propertyNo ANDc.maxRent >= p.rent AND c.prefType = p.type AND p.ownerNo = ‘CO93’;
© Pearson Education Limited 1995, 2005
![Page 47: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/47.jpg)
47
Example 21.3 Use of Transformation Rules
© Pearson Education Limited 1995, 2005
![Page 48: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/48.jpg)
48
Example 21.3 Use of Transformation Rules
© Pearson Education Limited 1995, 2005
![Page 49: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/49.jpg)
49
Example 21.3 Use of Transformation Rules
© Pearson Education Limited 1995, 2005
![Page 50: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/50.jpg)
50
Heuristical Processing Strategies
Perform Selection operations as early as possible.– Keep predicates on same relation together.
Combine Cartesian product with subsequent Selection whose predicate represents join condition into a Join operation.
Use associativity of binary operations to rearrange leaf nodes so leaf nodes with most restrictive Selection operations executed first.
© Pearson Education Limited 1995, 2005
![Page 51: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/51.jpg)
51
Heuristical Processing Strategies
Perform Projection as early as possible.
– Keep projection attributes on same relation together.
Compute common expressions once.
– If common expression appears more than once, and result not too large, store result and reuse it when required.
– Useful when querying views, as same expression is used to construct view each time.
© Pearson Education Limited 1995, 2005
![Page 52: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/52.jpg)
52
Cost Estimation for RA Operations
Many different ways of implementing RA operations.
Aim of QO is to choose most efficient one. Use formulae that estimate costs for a number of
options, and select one with lowest cost. Consider only cost of disk access, which is usually
dominant cost in QP. Many estimates are based on cardinality of the
relation, so need to be able to estimate this.
© Pearson Education Limited 1995, 2005
![Page 53: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/53.jpg)
53
Database Statistics
Success of estimation depends on amount and currency of statistical information DBMS holds.
Keeping statistics current can be problematic. If statistics updated every time tuple is changed,
this would impact performance. DBMS could update statistics on a periodic basis,
for example nightly, or whenever the system is idle.
© Pearson Education Limited 1995, 2005
![Page 54: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/54.jpg)
54
Typical Statistics for Relation R
nTuples(R) - number of tuples in R.
bFactor(R) - blocking factor of R.
nBlocks(R) - number of blocks required to store R:
nBlocks(R) = [nTuples(R)/bFactor(R)]
© Pearson Education Limited 1995, 2005
![Page 55: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/55.jpg)
55
Typical Statistics for Attribute A of Relation R
nDistinctA(R) - number of distinct values that
appear for attribute A in R.
minA(R),maxA(R)
– minimum and maximum possible values for attribute A in R.
SCA(R) - selection cardinality of attribute A in R.
Average number of tuples that satisfy an equality condition on attribute A.
© Pearson Education Limited 1995, 2005
![Page 56: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/56.jpg)
56
Statistics for Multilevel Index I on Attribute A
nLevelsA(I) - number of levels in I.
nLfBlocksA(I) - number of leaf blocks in I.
© Pearson Education Limited 1995, 2005
![Page 57: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/57.jpg)
57
Selection Operation
Predicate may be simple or composite. Number of different implementations, depending
on file structure, and whether attribute(s) involved are indexed/hashed.
Main strategies are:– Linear Search (Unordered file, no index).– Binary Search (Ordered file, no index). – Equality on hash key.– Equality condition on primary key.
© Pearson Education Limited 1995, 2005
![Page 58: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/58.jpg)
58
Selection Operation
– Inequality condition on primary key.– Equality condition on clustering (secondary)
index. – Equality condition on a non-clustering
(secondary) index. – Inequality condition on a secondary B+-tree
index.
© Pearson Education Limited 1995, 2005
![Page 59: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/59.jpg)
59
Estimating Cardinality of Selection
Assume attribute values are uniformly distributed within their domain and attributes are independent.
nTuples(S) = SCA(R)
For any attribute B A of S, nDistinctB(S) =
nTuples(S) if nTuples(S) < nDistinctB(R)/2
nDistinctB(R) if nTuples(S) > 2*nDistinctB(R)
[(nTuples(S) + nDistinctB(R))/3] otherwise
© Pearson Education Limited 1995, 2005
![Page 60: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/60.jpg)
60
Linear Search (Ordered File, No Index)
May need to scan each tuple in each block to check whether it satisfies predicate.
For equality condition on key attribute, cost estimate is:
[nBlocks(R)/2] For any other condition, entire file may need to be
searched, so more general cost estimate is:
nBlocks(R)
© Pearson Education Limited 1995, 2005
![Page 61: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/61.jpg)
61
Binary Search (Ordered File, No Index)
If predicate is of form A = x, and file is ordered on key attribute A, cost estimate:
[log2(nBlocks(R))] Generally, cost estimate is:
[log2(nBlocks(R))] + [SCA(R)/bFactor(R)] - 1 First term represents cost of finding first tuple
using binary search. Expect there to be SCA(R) tuples satisfying
predicate.
© Pearson Education Limited 1995, 2005
![Page 62: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/62.jpg)
62
Equality of Hash Key
If attribute A is hash key, apply hashing algorithm to calculate target address for tuple.
If there is no overflow, expected cost is 1. If there is overflow, additional accesses may be
necessary.
© Pearson Education Limited 1995, 2005
![Page 63: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/63.jpg)
63
Equality Condition on Primary Key
Can use primary index to retrieve single record satisfying condition.
Need to read one more block than number of index accesses, equivalent to number of levels in index, so estimated cost is:
nLevelsA(I) + 1
© Pearson Education Limited 1995, 2005
![Page 64: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/64.jpg)
64
Inequality Condition on Primary Key
Can first use index to locate record satisfying predicate (A = x).
Provided index is sorted, records can be found by accessing all records before/after this one.
Assuming uniform distribution, would expect half the records to satisfy inequality, so estimated cost is:
nLevelsA(I) + [nBlocks(R)/2]
© Pearson Education Limited 1995, 2005
![Page 65: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/65.jpg)
65
Equality Condition on Clustering Index
Can use index to retrieve required records. Estimated cost is:
nLevelsA(I) + [SCA(R)/bFactor(R)]
Second term is estimate of number of blocks that will be required to store number of tuples that satisfy equality condition, represented as SCA(R).
© Pearson Education Limited 1995, 2005
![Page 66: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/66.jpg)
66
Equality Condition on Non-Clustering Index
Can use index to retrieve required records. Have to assume that tuples are on different
blocks (index is not clustered this time), so estimated cost becomes:
nLevelsA(I) + [SCA(R)]
© Pearson Education Limited 1995, 2005
![Page 67: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/67.jpg)
67
Inequality Condition on a Secondary B+-Tree Index
From leaf nodes of tree, can scan keys from smallest value up to x (< or <= ) or from x up to maximum value (> or >=).
Assuming uniform distribution, would expect half the leaf node blocks to be accessed and, via index, half the file records to be accessed.
Estimated cost is:
nLevelsA(I) + [nLfBlocksA(I)/2 + nTuples(R)/2]
© Pearson Education Limited 1995, 2005
![Page 68: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/68.jpg)
68
Composite Predicates - Conjunction without Disjunction
May consider following approaches:- If one attribute has index or is ordered, can use one of above selection strategies. Can then check each retrieved record.
- For equality on two or more attributes, with composite index (or hash key) on combined attributes, can search index directly.
- With secondary indexes on one or more attributes (involved only in equality conditions in predicate), could use record pointers if exist.
© Pearson Education Limited 1995, 2005
![Page 69: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/69.jpg)
69
Composite Predicates - Selections with Disjunction
If one term contains an (OR), and term requires linear search, entire selection requires linear search.
Only if index or sort order exists on every term can selection be optimized by retrieving records that satisfy each condition and applying union operator.
Again, record pointers can be used if they exist.
© Pearson Education Limited 1995, 2005
![Page 70: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/70.jpg)
70
Join Operation
Main strategies for implementing join:
– Block Nested Loop Join.– Indexed Nested Loop Join.– Sort-Merge Join.– Hash Join.
© Pearson Education Limited 1995, 2005
![Page 71: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/71.jpg)
71
Estimating Cardinality of Join
Cardinality of Cartesian product is:
nTuples(R) * nTuples(S)
More difficult to estimate cardinality of any join as depends on distribution of values.
Worst case, cannot be any greater than this value.
© Pearson Education Limited 1995, 2005
![Page 72: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/72.jpg)
72
Estimating Cardinality of Join
If assume uniform distribution, can estimate for Equijoins with a predicate (R.A = S.B) as follows:– If A is key of R: nTuples(T) nTuples(S)– If B is key of S: nTuples(T) nTuples(R)
Otherwise, could estimate cardinality of join as:
nTuples(T) = SCA(R)*nTuples(S) or
nTuples(T) = SCB(S)*nTuples(R)
© Pearson Education Limited 1995, 2005
![Page 73: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/73.jpg)
73
Block Nested Loop Join
Simplest join algorithm is nested loop that joins two relations together a tuple at a time.
Outer loop iterates over each tuple in R, and inner loop iterates over each tuple in S.
As basic unit of reading/writing is a disk block, better to have two extra loops that process blocks.
Estimated cost of this approach is:
nBlocks(R) + (nBlocks(R) * nBlocks(S))
© Pearson Education Limited 1995, 2005
![Page 74: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/74.jpg)
74
Block Nested Loop Join
Could read as many blocks as possible of smaller relation, R say, into database buffer, saving one block for inner relation and one for result.
New cost estimate becomes:
nBlocks(R) + [nBlocks(S)*(nBlocks(R)/(nBuffer-2))]
If can read all blocks of R into the buffer, this reduces to:
nBlocks(R) + nBlocks(S)
© Pearson Education Limited 1995, 2005
![Page 75: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/75.jpg)
75
Indexed Nested Loop Join
If have index (or hash function) on join attributes of inner relation, can use index lookup.
For each tuple in R, use index to retrieve matching tuples of S.
Cost of scanning R is nBlocks(R), as before. Cost of retrieving matching tuples in S depends
on type of index and number of matching tuples. If join attribute A in S is PK, cost estimate is:
nBlocks(R) + nTuples(R)*(nlevelsA(I) + 1)
© Pearson Education Limited 1995, 2005
![Page 76: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/76.jpg)
76
Sort-Merge Join
For Equijoins, most efficient join is when both relations are sorted on join attributes.
Can look for qualifying tuples merging relations. May need to sort relations first. Now tuples with same join value are in order. If assume join is *:* and each set of tuples with
same join value can be held in database buffer at same time, then each block of each relation need only be read once.
© Pearson Education Limited 1995, 2005
![Page 77: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/77.jpg)
77
Sort-Merge Join
Cost estimate for the sort-merge join is:
nBlocks(R) + nBlocks(S)
If a relation has to be sorted, R say, add:
nBlocks(R)*[log2(nBlocks(R)]
© Pearson Education Limited 1995, 2005
![Page 78: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/78.jpg)
78
Hash Join
For Natural or Equijoin, hash join may be used. Idea is to partition relations according to some
hash function that provides uniformity and randomness.
Each equivalent partition should hold same value for join attributes, although it may hold more than one value.
Cost estimate of hash join as:
3(nBlocks(R) + nBlocks(S))
© Pearson Education Limited 1995, 2005
![Page 79: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/79.jpg)
79
Projection Operation
To implement projection need to:– remove attributes that are not required;– eliminate any duplicate tuples produced from
previous step. Only required if projection attributes do not include a key.
Two main approaches to eliminating duplicates:
– sorting;
– hashing.
© Pearson Education Limited 1995, 2005
![Page 80: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/80.jpg)
80
Estimating Cardinality of Projection
When projection contains key, cardinality is:
nTuples(S) = nTuples(R)
If projection consists of a single non-key attribute, estimate is:
nTuples(S) = SCA(R)
Otherwise, could estimate cardinality as:
nTuples(S) min(nTuples(R), im
=1(nDistinctai(R)))
© Pearson Education Limited 1995, 2005
![Page 81: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/81.jpg)
81
Duplicate Elimination using Sorting
Sort tuples of reduced relation using all remaining attributes as sort key.
Duplicates will now be adjacent and can be removed easily.
Estimated cost of sorting is:
nBlocks(R)*[log2(nBlocks(R))].
Combined cost is:
nBlocks(R) + nBlocks(R)*[log2(nBlocks(R))]
© Pearson Education Limited 1995, 2005
![Page 82: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/82.jpg)
82
Duplicate Elimination using Hashing
Two phases: partitioning and duplicate elimination.
In partitioning phase, for each tuple in R, remove unwanted attributes and apply hash function to combination of remaining attributes, and write reduced tuple to hashed value.
Two tuples that belong to different partitions are guaranteed not to be duplicates.
Estimated cost is: nBlocks(R) + nB
© Pearson Education Limited 1995, 2005
![Page 83: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/83.jpg)
83
Set Operations
Can be implemented by sorting both relations on same attributes, and scanning through each of sorted relations once to obtain desired result.
Could use sort-merge join as basis. Estimated cost in all cases is:
nBlocks(R) + nBlocks(S) + nBlocks(R)*[log2(nBlocks(R))] + nBlocks(S)*[log2(nBlocks(S))]
Could also use hashing algorithm.
© Pearson Education Limited 1995, 2005
![Page 84: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/84.jpg)
84
Estimating Cardinality of Set Operations
As duplicates are eliminated when performing Union, difficult to estimate cardinality, but can give an upper and lower bound as:
max(nTuples(R), nTuples(S)) nTuples(T) nTuples(R) + nTuples(S)
For Set Difference, can also give upper and lower bound:
0 nTuples(T) nTuples(R)
© Pearson Education Limited 1995, 2005
![Page 85: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/85.jpg)
85
Aggregate Operations
SELECT AVG(salary)
FROM Staff;
To implement query, could scan entire Staff relation and maintain running count of number of tuples read and sum of all salaries.
Easy to compute average from these two running counts.
© Pearson Education Limited 1995, 2005
![Page 86: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/86.jpg)
86
Aggregate Operations
SELECT AVG(salary)
FROM Staff
GROUP BY branchNo;
For grouping queries, can use sorting or hashing algorithms similar to duplicate elimination.
Can estimate cardinality of result using estimates derived earlier for selection.
© Pearson Education Limited 1995, 2005
![Page 87: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/87.jpg)
87
Enumeration of Alternative Strategies
Fundamental to efficiency of QO is the search space of possible execution strategies and the enumeration algorithm used to search this space.
Query with 2 joins gives 12 join orderings:R (S T) R (T S) (S T) R (T S) R
S (R T) S (T R) (R T) S (T R) S
T (R S) T (S R) (R S) T (S R) T
With n relations, (2(n – 1))!/(n – 1)! orderings. If n = 4 this is 120; if n = 10 this is > 176 billion. Compounded by different selection/join methods.
© Pearson Education Limited 1995, 2005
![Page 88: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/88.jpg)
88
Pipelining
Materialization - output of one operation is stored in temporary relation for processing by next.
Could also pipeline results of one operation to another without creating temporary relation.
Known as pipelining or on-the-fly processing. Pipelining can save on cost of creating temporary
relations and reading results back in again. Generally, pipeline is implemented as separate
process or thread.
© Pearson Education Limited 1995, 2005
![Page 89: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/89.jpg)
89
Types of Trees
© Pearson Education Limited 1995, 2005
![Page 90: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/90.jpg)
90
Pipelining
With linear trees, relation on one side of each operator is always a base relation.
However, as need to examine entire inner relation for each tuple of outer relation, inner relations must always be materialized.
This makes left-deep trees appealing as inner relations are always base relations.
Reduces search space for optimum strategy, and allows QO to use dynamic processing.
Not all execution strategies are considered.
© Pearson Education Limited 1995, 2005
![Page 91: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/91.jpg)
91
Physical Operators & Strategies
Term physical operator refers to specific algorithm that implements a logical operation, such as selection or join.
For example, can use sort-merge join to implement the join operation.
Replacing logical operations in a R.A.T. with physical operators produces an execution strategy (or query evaluation plan or access plan).
© Pearson Education Limited 1995, 2005
![Page 92: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/92.jpg)
92
Physical Operators & Strategies
© Pearson Education Limited 1995, 2005
![Page 93: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/93.jpg)
93
Reducing the Search Space Restriction 1: Unary operations processed on-
the-fly: selections processed as relations are accessed for first time; projections processed as results of other operations are generated.
Restriction 2: Cartesian products are never formed unless query itself specifies one.
Restriction 3: Inner operand of each join is a base relation, never an intermediate result. This uses fact that with left-deep trees inner operand is a base relation and so already materialized.
Restriction 3 excludes many alternative strategies but significantly reduces number to be considered.
© Pearson Education Limited 1995, 2005
![Page 94: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/94.jpg)
94
Dynamic Programming
Enumeration of left-deep trees using dynamic programming first proposed for System R QO.
Algorithm based on assumption that the cost model satisfies principle of optimality.
Thus, to obtain optimal strategy for query with n joins, only need to consider optimal strategies for subexpressions with (n – 1) joins and extend those strategies with an additional join. Remaining suboptimal strategies can be discarded.
© Pearson Education Limited 1995, 2005
![Page 95: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/95.jpg)
95
Dynamic Programming
To ensure some potentially useful strategies are not discarded algorithm retains strategies with interesting orders: an intermediate result has an interesting order if it is sorted by a final ORDER BY attribute, GROUP BY attribute, or any attributes that participate in subsequent joins.
© Pearson Education Limited 1995, 2005
![Page 96: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/96.jpg)
96
Dynamic Programming
SELECT p.propertyNo, p.streetFROM Client c, Viewing v, PropertyForRent pWHERE c.maxRent < 500 AND
c.clientNo = v.clientNo AND v.propertyNo = p.propertyNo;
Attributes c.clientNo, v.clientNo, v.propertyNo, and p.propertyNo are interesting.
If any intermediate result is sorted on any of these attributes, then corresponding partial strategy must be included in search.
© Pearson Education Limited 1995, 2005
![Page 97: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/97.jpg)
97
Dynamic Programming
Algorithm proceeds from the bottom up and constructs all alternative join trees that satisfy the restrictions above, as follows:
Pass 1: Enumerate the strategies for each base relation using a linear search and all available indexes on the relation. These partial strategies are partitioned into equivalence classes based on any interesting orders. An additional equivalence class is created for the partial strategies with no interesting order.
© Pearson Education Limited 1995, 2005
![Page 98: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/98.jpg)
98
Dynamic Programming
For each equivalence class, strategy with lowest cost is retained for consideration in next pass.
Do not retain equivalence class with no interesting order if its lowest cost strategy is not lower than all other strategies.
For a given relation R, any selections involving only attributes of R are processed on-the-fly. Similarly, any attributes of R that are not part of the SELECT clause and do not contribute to any subsequent join can be projected out at this stage (restriction 1 above).
© Pearson Education Limited 1995, 2005
![Page 99: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/99.jpg)
99
Dynamic Programming
Pass 2: Generate all 2-relation strategies by considering each strategy retained after Pass 1 as outer relation, discarding any Cartesian products generated (restriction 2 above). Again, any on-the-fly processing is performed and lowest cost strategy in each equivalence class is retained.
Pass n: Generate all n-relation strategies by considering each strategy retained after Pass (n – 1) as outer relation, discarding any Cartesian products generated. After pruning, now have lowest overall strategy for processing the query.
© Pearson Education Limited 1995, 2005
![Page 100: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/100.jpg)
100
Dynamic Programming
Although algorithm is still exponential, there are query forms for which it only generates O(n3) strategies, so for n = 10 the number is 1,000, which is significantly better than the 176 billion different join orders noted earlier.
© Pearson Education Limited 1995, 2005
![Page 101: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/101.jpg)
101
Semantic Query Optimization
Based on constraints specified on the database schema to reduce the search space.
For example, a constraint states that staff cannot supervise more than 100 properties, so any query searching for staff who supervise more than 100 properties will produce zero rows. Now consider:CREATE ASSERTION ManagerSalary
CHECK (salary > 20000 AND position = ‘Manager’)SELECT s.staffNo, fName, lName, propertyNoFROM Staff s, PropertyForRent pWHERE s.staffNo = p.staffNo AND
position = ‘Manager’;© Pearson Education Limited 1995, 2005
![Page 102: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/102.jpg)
102
Semantic Query Optimization
Can rewrite this query as:
SELECT s.staffNo, fName, lName, propertyNoFROM Staff s, PropertyForRent pWHERE s.staffNo = p.staffNo AND
salary > 20000 AND position = ‘Manager’;
Additional predicate may be very useful if only
index for Staff is a B+-tree on the salary attribute. However, additional predicate would complicate
query if no such index existed.
© Pearson Education Limited 1995, 2005
![Page 103: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/103.jpg)
103
Query Optimization in Oracle
Oracle supports two approaches to query optimization: rule-based and cost-based.
Rule-based 15 rules, ranked in order of efficiency. Particular
access path for a table only chosen if statement contains a predicate or other construct that makes that access path available.
Score assigned to each execution strategy using these rankings and strategy with best (lowest) score selected.
© Pearson Education Limited 1995, 2005
![Page 104: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/104.jpg)
104
QO in Oracle – Rule-Based When 2 strategies have same score, tie-break
resolved by making decision based on order in which tables occur in the SQL statement.
© Pearson Education Limited 1995, 2005
![Page 105: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/105.jpg)
105
QO in Oracle – Rule-based: Example
SELECT propertyNoFROM PropertyForRentWHERE rooms > 7 AND city = ‘London’
Single-column access path using index on city from WHERE condition (city = ‘London’). Rank 9.
Unbounded range scan using index on rooms from WHERE condition (rooms > 7). Rank 11.
Full table scan - rank 15. Although there is index on propertyNo, column does not appear
in WHERE clause and so is not considered by optimizer. Based on these paths, rule-based optimizer will choose to use
index based on city column.
© Pearson Education Limited 1995, 2005
![Page 106: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/106.jpg)
106
QO in Oracle – Cost-Based
To improve QO, Oracle introduced cost-based optimizer in Oracle 7, which selects strategy that requires minimal resource use necessary to process all rows accessed by query (avoiding above tie-break anomaly).
User can select whether minimal resource usage is based on throughput or based on response time, by setting the OPTIMIZER_MODE initialization parameter.
Cost-based optimizer also takes into consideration hints that the user may provide.
© Pearson Education Limited 1995, 2005
![Page 107: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/107.jpg)
107
QO in Oracle – Statistics
Cost-based optimizer depends on statistics for all tables, clusters, and indexes accessed by query.
Users’ responsibility to generate these statistics and keep them current.
Package DBMS_STATS can be used to generate and manage statistics.
Whenever possible, Oracle uses a parallel method to gather statistics, although index statistics are collected serially. EXECUTE
DBMS_STATS.GATHER_SCHEMA_STATS(‘Manager’);
© Pearson Education Limited 1995, 2005
![Page 108: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/108.jpg)
108
QO in Oracle – Histograms
Previously made assumption that data values within columns of a table are uniformly distributed.
Histogram of values and their relative frequencies gives optimizer improved selectivity estimates in presence of non-uniform distribution.
© Pearson Education Limited 1995, 2005
![Page 109: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/109.jpg)
109
QO in Oracle – Histograms
(a) uniform distribution of rooms; (b) actual non-uniform distribution.
(a) can be stored compactly as low value (1) and high value (10), and as total count of all frequencies (in this case, 100).
© Pearson Education Limited 1995, 2005
![Page 110: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/110.jpg)
110
QO in Oracle – Histograms
Histogram is data structure that can improve estimates of number of tuples in result.
Two types of histogram:– width-balanced histogram, which divides data into a
fixed number of equal-width ranges (called buckets) each containing count of number of values falling within that bucket;
– height-balanced histogram, which places approximately same number of values in each bucket so that end points of each bucket are determined by how many values are in that bucket.
© Pearson Education Limited 1995, 2005
![Page 111: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/111.jpg)
111
QO in Oracle – Histograms
(a) width-balanced for rooms with 5 buckets. Each bucket of equal width with 2 values (1-2, 3-4, etc.)
(b) height-balanced – height of each column is 20 (100/5).
© Pearson Education Limited 1995, 2005
![Page 112: 1 Chapter 21 Query Processing Transparencies ownerNoclient © Pearson Education Limited 1995, 2005](https://reader030.vdocuments.us/reader030/viewer/2022032523/56649d845503460f94a6a3ea/html5/thumbnails/112.jpg)
112
QO in Oracle – Viewing Execution Plan
© Pearson Education Limited 1995, 2005