scuholliday - coen 17814–1 schedule today: u query processing overview
TRANSCRIPT
![Page 1: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/1.jpg)
SCU Holliday - COEN 178 14–1
Schedule
• Today: Query Processing overview
![Page 2: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/2.jpg)
SCU Holliday - COEN 178 14–2
Steps in Query Processing 1. Parsing and translation
2. Optimization 3. Evaluation
![Page 3: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/3.jpg)
SCU Holliday - COEN 178 14–3
Steps in Query Processing• Parsing and translation
translate the query into its internal form. This is then translated into relational algebra.
Parser checks syntax, verifies relations
• Optimization• Evaluation
The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.
![Page 4: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/4.jpg)
SCU Holliday - COEN 178 14–4
Optimization• A relational algebra expression may have many
equivalent expressions E.g., balance2500(balance(account)) is equivalent to
balance(balance2500(account))
• Each relational algebra operation can be evaluated using one of several different algorithms
• Annotated expression specifying detailed evaluation strategy is called an evaluation-plan. E.g., can use an index on balance to find accounts with
balance < 2500, or can perform complete relation scan and discard
accounts with balance 2500
![Page 5: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/5.jpg)
SCU Holliday - COEN 178 14–5
Query Optimization
• Amongst all equivalent evaluation plans choose the one with lowest cost. Cost is estimated using statistical information from the
database catalog• e.g. number of tuples in each relation, size of tuples, etc.
• We want to know How to measure query costs Algorithms for evaluating relational algebra operations How to combine algorithms for individual operations in
order to evaluate a complete expression
![Page 6: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/6.jpg)
SCU Holliday - COEN 178 14–6
Measures of Query Cost• Cost is generally measured as total elapsed time
for answering query Many factors contribute to time cost
• disk accesses, CPU, or even network communication
• Typically disk access is the predominant cost, and is also relatively easy to estimate. Measured by taking into account Number of seeks * average-seek-cost Number of blocks read * average-block-read-cost Number of blocks written * average-block-write-cost
• Cost to write a block is greater than cost to read a block – data is read back after being written to ensure that the write was
successful
![Page 7: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/7.jpg)
SCU Holliday - COEN 178 14–7
Cost• For simplicity we just use number of block transfers from
disk as the cost measure We also ignore CPU costs for simplicity
• Costs depends on the size of the buffer in main memory Having more memory reduces need for disk access Amount of real memory available to buffer depends on other
concurrent OS processes, and hard to determine ahead of actual execution
We often use worst case estimates, assuming only the minimum amount of memory needed for the operation is available
• Real systems take CPU cost into account, differentiate between sequential and random I/O, and take buffer size into account
![Page 8: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/8.jpg)
SCU Holliday - COEN 178 14–8
Example
R A B C S C D E
a 1 10 10 x 2
b 1 20 20 y 2
c 2 10 30 z 2
d 2 35 40 x 1
e 3 45 50 y 3
![Page 9: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/9.jpg)
SCU Holliday - COEN 178 14–9
Example
Select B,D
From R,S
Where R.A = “c” and S.E = 2 and R.C=S.C
B,D(R.A=“c” S.E=2 R.C=S.C)(R X S)
![Page 10: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/10.jpg)
SCU Holliday - COEN 178 14–10
R A B C S C D E
a 1 10 10 x 2
b 1 20 20 y 2
c 2 10 30 z 2
d 2 35 40 x 1
e 3 45 50 y 3
Answer B D2 x
![Page 11: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/11.jpg)
SCU Holliday - COEN 178 14–11
• How do we execute query?
- Do Cartesian product- Select tuples- Do projection
One idea
![Page 12: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/12.jpg)
SCU Holliday - COEN 178 14–12
RXS R.A R.B R.C S.C S.D S.E
a 1 10 10 x 2
a 1 10 20 y 2
. .
C 2 10 10 x 2 . .
Bingo!
Got one...
![Page 13: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/13.jpg)
SCU Holliday - COEN 178 14–13
Relational Algebra - can be used to describe plans...
Ex: Plan I
B,D
R.A=“c” S.E=2 R.C=S.C
X
R S
OR: B,D [ R.A=“c” S.E=2 R.C = S.C (RXS)]
![Page 14: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/14.jpg)
SCU Holliday - COEN 178 14–14
Another idea:
B,D
R.A = “c” S.E = 2
R S
Plan II
natural join
![Page 15: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/15.jpg)
SCU Holliday - COEN 178 14–15
R S
A B C (R) (S) C D E
a 1 10 A B C C D E 10 x 2
b 1 20 c 2 10 10 x 2 20 y 2
c 2 10 20 y 2 30 z 2
d 2 35 30 z 2 40 x 1
e 3 45 50 y 3
![Page 16: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/16.jpg)
SCU Holliday - COEN 178 14–16
Plan III
Use R.A and S.C Indexes
(1) Use R.A index to select R tuples with R.A = “c”
(2) For each R.C value found, use S.C index to find matching tuples
(3) Eliminate S tuples S.E 2
(4) Join matching R,S tuples, project
B,D attributes and place in result
![Page 17: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/17.jpg)
SCU Holliday - COEN 178 14–17
R S
A B C C D E
a 1 10 10 x 2
b 1 20 20 y 2
c 2 10 30 z 2
d 2 35 40 x 1
e 3 45 50 y 3
A CI1 I2
=“c”
<c,2,10> <10,x,2>
check=2?
output: <2,x>
next tuple:<c,7,15>
![Page 18: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/18.jpg)
SCU Holliday - COEN 178 14–18
Example: SQL query
SELECT title
FROM StarsIn
WHERE starName IN (
SELECT name
FROM MovieStar
WHERE birthdate LIKE ‘%1960’
);
(Find the movies with stars born in 1960)
![Page 19: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/19.jpg)
SCU Holliday - COEN 178 14–19
Example: Parse Tree<Query>
<SFW>
SELECT <SelList> FROM <FromList> WHERE <Condition>
<Attribute> <RelName> <Tuple> IN <Query>
title StarsIn <Attribute> ( <Query> )
starName <SFW>
SELECT <SelList> FROM <FromList> WHERE <Condition>
<Attribute> <RelName> <Attribute> LIKE <Pattern>
name MovieStar birthDate ‘%1960’
![Page 20: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/20.jpg)
SCU Holliday - COEN 178 14–20
Example: Generating Relational Algebra
title
StarsIn <condition>
<tuple> IN name
<attribute> birthdate LIKE ‘%1960’
starName MovieStar
Fig. 7.15: An expression using a two-argument , midway between a parse tree and relational algebra
![Page 21: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/21.jpg)
SCU Holliday - COEN 178 14–21
Example: Logical Query Plan
title
starName=name
StarsIn name
birthdate LIKE ‘%1960’
MovieStar
Fig. 7.18: Applying the rule for IN conditions
![Page 22: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/22.jpg)
SCU Holliday - COEN 178 14–22
Example: Improved Logical Query Plan
title
starName=name
StarsIn name
birthdate LIKE ‘%1960’
MovieStar
Fig. 7.20: An improvement on fig. 7.18.
Question:Push project to
StarsIn?
![Page 23: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/23.jpg)
SCU Holliday - COEN 178 14–23
Example: Estimate Result Sizes
Need expected size
StarsIn
MovieStar
![Page 24: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/24.jpg)
SCU Holliday - COEN 178 14–24
Selection Operation• File scan – search algorithms that locate and retrieve
records that fulfill a selection condition.
• Algorithm A1 (linear search). Scan each file block and test all records to see whether they satisfy the selection condition.
Cost estimate (number of disk blocks scanned) = br
If selection is on a key attribute, cost = (br /2)
• stop on finding record
Linear search can be applied regardless of • selection condition or
• ordering of records in the file, or
• availability of indices
![Page 25: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/25.jpg)
SCU Holliday - COEN 178 14–25
Selection continued
• A2 (binary search). Applicable if selection is an equality comparison on the attribute on which file is ordered. Assume that the blocks of a relation are stored
contiguously Cost estimate (number of disk blocks to be
scanned):log2(br) — cost of locating the first tuple by a
binary search on the blocks• Plus number of blocks containing records that
satisfy selection condition
![Page 26: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/26.jpg)
SCU Holliday - COEN 178 14–26
Selection with Index Scan
• A3 (primary index on candidate key, equality). Retrieve a single record that satisfies the corresponding equality condition
• A4 (primary index on nonkey, equality) Retrieve multiple records. Records will be on consecutive blocks
• A5 (equality on search-key of secondary index). Retrieve a single record if the search-key is a candidate key Retrieve multiple records if search-key is not a candidate key
• Can be very expensive!
• each record may be on a different block – one block access for each retrieved record
![Page 27: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/27.jpg)
SCU Holliday - COEN 178 14–27
Cross Product and Join
• We want a way to estimate the size of the results of joins and cross products.
• The cross product r s contains nr * ns tuples and each tuple occupies br + bs bytes
• If R S =, then r s is the same as r s
![Page 28: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/28.jpg)
SCU Holliday - COEN 178 14–28
Join Size Estimation• If R S is a key for R, then we know that a
tuple of s will join with at most one tuple from r, so the number of tuples in r s is no greater than the number of tuples in s.
• If R S is a foreign key for S referencing R, then the number of tuples in r s is exactly the number of tuples in s.
R SA X
35 …
36 …
37 …
K A
k1 35
k2 35
k3 37
![Page 29: SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview](https://reader036.vdocuments.us/reader036/viewer/2022070404/56649f335503460f94c50730/html5/thumbnails/29.jpg)
SCU Holliday - COEN 178 14–29
parse
convert
apply laws
estimate result sizes
consider physical plans estimate costs
pick best
execute
{P1,P2,…..}
{(P1,C1),(P2,C2)...}
Pi
answer
SQL query
parse tree
logical query plan
“improved” l.q.p
l.q.p. +sizes
statistics