cps216: data-intensive computing systems•chapter 16, introduction to algorithms, cormen,...
TRANSCRIPT
![Page 1: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/1.jpg)
CPS216: Data-intensive
Computing Systems
Query Optimization (Cost-
based optimization)
Shivnath Babu
![Page 2: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/2.jpg)
Query Optimization Problem
Pick the best plan from the space of
physical plans
![Page 3: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/3.jpg)
Cost-Based Optimization
• Prune the space of plans using heuristics
• Estimate cost for remaining plans
– Be smart about how you iterate through plans
• Pick the plan with least cost
Focus on queries with joins
![Page 4: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/4.jpg)
Heuristics for pruning plan space
• Predicates as early as possible
• Avoid plans with cross products
• Only left-deep join trees
![Page 5: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/5.jpg)
Physical Plan Selection
Logical Query Plan
P1 P2 …. Pn
C1 C2 …. Cn
Pick minimum cost one
Physical
plans
Costs
![Page 6: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/6.jpg)
Review of Notation
• T (R) : Number of tuples in R
• B (R) : Number of blocks in R
![Page 7: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/7.jpg)
Simple Cost Model
Cost (R S) = T(R) + T(S)
All other operators have 0 cost
Note: The simple cost model used for illustration only
![Page 8: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/8.jpg)
Cost Model Example
R S
T
X
T(R) + T(S)
T(X) + T(T)
Total Cost: T(R) + T(S) + T(T) + T(X)
![Page 9: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/9.jpg)
Selinger Algorithm
• Dynamic Programming based
• Dynamic Programming:
– General algorithmic paradigm
– Exploits “principle of optimality”
– Useful reading:
• Chapter 16, Introduction to Algorithms,
Cormen, Leiserson, Rivest
![Page 10: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/10.jpg)
Principle of Optimality
Optimal for “whole” made up from
optimal for “parts”
![Page 11: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/11.jpg)
Principle of Optimality
Query: R1 R2 R3 R4 R5
R3 R2
R4
R1
R5 Optimal Plan:
![Page 12: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/12.jpg)
Principle of Optimality
Query: R1 R2 R3 R4 R5
R3 R2
R4
R1
R5 Optimal Plan:
Optimal plan for joining R3, R2, R4, R1
![Page 13: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/13.jpg)
Principle of Optimality
Query: R1 R2 R3 R4 R5
R3 R2
R4
R1
R5 Optimal Plan:
Optimal plan for joining R3, R2, R4
![Page 14: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/14.jpg)
Exploiting Principle of Optimality
Query: R1 R2 … Rn
R3 R1
R2
R2 R3
R1
Optimal
for joining R1, R2, R3
Sub-Optimal
for joining R1, R2, R3
![Page 15: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/15.jpg)
Exploiting Principle of Optimality
R3 R1
R2
Ri
Rj
Sub-Optimal
for joining R1,…,Rn
A sub-optimal sub-plan cannot lead to an
optimal plan
![Page 16: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/16.jpg)
Query: R1 R2 R3 R4
{ R1 } { R2 } { R3 } { R4 }
{ R1, R2 } { R1, R3 } { R1, R4 } { R2, R3 } { R2, R4 } { R3, R4 }
{ R1, R2, R3 } { R1, R2, R4 } { R1, R3, R4 } { R2, R3, R4 }
{ R1, R2, R3, R4 }
Progress
of
algorithm
Selinger Algorithm:
![Page 17: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/17.jpg)
Notation
OPT ( { R1, R2, R3 } ):
Cost of optimal plan to join R1,R2,R3
T ( { R1, R2, R3 } ):
Number of tuples in R1 R2 R3
![Page 18: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/18.jpg)
OPT ( { R1, R2, R3 } ):
OPT ( { R2, R3 } ) + T ( { R2, R3 } ) + T(R1)
OPT ( { R1, R2 } ) + T ( { R1, R2 } ) + T(R3)
OPT ( { R1, R3 } ) + T ( { R1, R3 } ) + T(R2)
Min
Selinger Algorithm:
Note: Valid only for the simple cost model
![Page 19: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/19.jpg)
Query: R1 R2 R3 R4
{ R1 } { R2 } { R3 } { R4 }
{ R1, R2 } { R1, R3 } { R1, R4 } { R2, R3 } { R2, R4 } { R3, R4 }
{ R1, R2, R3 } { R1, R2, R4 } { R1, R3, R4 } { R2, R3, R4 }
{ R1, R2, R3, R4 }
Progress
of
algorithm
Selinger Algorithm:
![Page 20: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/20.jpg)
Query: R1 R2 R3 R4
{ R1 } { R2 } { R3 } { R4 }
{ R1, R2 } { R1, R3 } { R1, R4 } { R2, R3 } { R2, R4 } { R3, R4 }
{ R1, R2, R3 } { R1, R2, R4 } { R1, R3, R4 } { R2, R3, R4 }
{ R1, R2, R3, R4 }
Progress
of
algorithm
Selinger Algorithm:
![Page 21: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/21.jpg)
R2
R3
R4
R1
Selinger Algorithm:
Optimal plan:
Query: R1 R2 R3 R4
![Page 22: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/22.jpg)
More Complex Cost Model
• DB System:
– Two join algorithms:
• Tuple-based nested loop join
• Sort-Merge join
– Two access methods
• Table Scan
• Index Scan (all indexes are in memory)
– Plans pipelined as much as possible
• Cost: Number of disk I/O s
![Page 23: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/23.jpg)
Cost of Table Scan
Table Scan
R
Cost: B (R)
![Page 24: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/24.jpg)
Cost of Clustered Index Scan
Index Scan
R
Cost: B (R)
![Page 25: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/25.jpg)
Cost of Clustered Index Scan
Index Scan
R
Cost: B (X) R.A > 50
X
![Page 26: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/26.jpg)
Cost of Non-Clustered Index Scan
Index Scan
R
Cost: T (R)
![Page 27: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/27.jpg)
Cost of Non-Clustered Index Scan
Index Scan
R
Cost: T (X) R.A > 50
X
![Page 28: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/28.jpg)
Cost of Tuple-Based NLJ
NLJ
Cost for entire plan:
Cost (Outer) + T(X) x Cost (Inner)
Inner
X
Outer
![Page 29: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/29.jpg)
Cost of Sort-Merge Join
Right
X
Left
Y
Sort Sort
Merge Cost for entire plan:
Cost (Right) + Cost (Left) +
2 (B (X) + B (Y) ) R1.A = R2.A
R1 R2
![Page 30: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/30.jpg)
Cost of Sort-Merge Join
Right
X
Left
Y
Sort
Merge Cost for entire plan:
Cost (Right) + Cost (Left) +
2 B (Y) R1.A = R2.A
R1 R2
Sorted on R1.A
![Page 31: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/31.jpg)
Cost of Sort-Merge Join
Right
X
Left
Y
Merge Cost for entire plan:
Cost (Right) + Cost (Left)
R1.A = R2.A
R1 R2
Sorted on R1.A
Sorted on R2.A
![Page 32: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/32.jpg)
Cost of Sort-Merge Join
Bottom Line: Cost depends on
sorted-ness of inputs
![Page 33: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/33.jpg)
Principle of Optimality?
Query: R1 R2 R3 R4 R5
SMJ
R1
Is Plan X the optimal plan for joining R2,R3,R4,R5?
Optimal plan: (R1.A = R2.A)
Plan X
Scan
![Page 34: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/34.jpg)
Violation of Principle of Optimality
Plan Y Plan X
(sorted on R2.A) (unsorted on R2.A)
Optimal plan for joining
R2,R3,R4,R4
Suboptimal plan for joining
R2,R3,R4,R5
![Page 35: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/35.jpg)
Principle of Optimality?
Query: R1 R2 R3 R4 R5
SMJ Optimal plan:
(R1.A = R2.A)
Plan X
Can we assert anything about plan X?
R1
Scan
![Page 36: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/36.jpg)
Weaker Principle of Optimality
If plan X produces output sorted on R2.A then
plan X is the optimal plan for joining R2,R3,R4,R5
that produces output sorted on R2.A
If plan X produces output unsorted on R2.A then
plan X is the optimal plan for joining R2, R3, R4, R5
![Page 37: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/37.jpg)
Interesting Order
• An attribute is an interesting order if:
– participates in a join predicate
– Occurs in the Group By clause
– Occurs in the Order By clause
![Page 38: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/38.jpg)
Interesting Order: Example
Select *
From R1(A,B), R2(A,B), R3(B,C)
Where R1.A = R2.A and R2.B = R3.B
Interesting Orders: R1.A, R2.A, R2.B, R3.B
![Page 39: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/39.jpg)
Modified Selinger Algorithm
{R1} {R1}(A) {R3}(B) {R2} {R2}(A) {R2}(B) {R3}
{R1,R2} {R1,R2}(A) {R1,R2}(B) {R2,R3} {R2,R3}(A) {R2,R3}(B)
{R1,R2,R3}
![Page 40: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/40.jpg)
Notation
{R1,R2} (C)
Optimal way of joining R1, R2 so that output is sorted
on attribute R2.C
![Page 41: CPS216: Data-intensive Computing Systems•Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest . Principle of Optimality Optimal for “whole” made up from optimal](https://reader034.vdocuments.us/reader034/viewer/2022052016/602f41dffff76251875ad4a9/html5/thumbnails/41.jpg)
Modified Selinger Algorithm
{R1} {R1}(A) {R3}(B) {R2} {R2}(A) {R2}(B) {R3}
{R1,R2} {R1,R2}(A) {R1,R2}(B) {R2,R3} {R2,R3}(A) {R2,R3}(B)
{R1,R2,R3}