14 query planning &optimization - cmu 15-445/645 · first implementation of a query optimizer...
TRANSCRIPT
![Page 1: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/1.jpg)
Intro to Database Systems
15-445/15-645
Fall 2019
Andy PavloComputer Science Carnegie Mellon UniversityAP
14Query Planning &Optimization
Part I
![Page 2: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/2.jpg)
CMU 15-445/645 (Fall 2019)
ADMINISTRIVIA
Mid-Term Exam is Wed Oct 16th @ 12:00pm→ See mid-term exam guide for more info.
Project #2 is due Sun Oct 20th @ 11:59pm
2
![Page 3: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/3.jpg)
CMU 15-445/645 (Fall 2019)
QUERY OPTIMIZATION
Remember that SQL is declarative.→ User tells the DBMS what answer they want, not how to
get the answer.
There can be a big difference in performance based on plan is used:→ See last week: 1.3 hours vs. 0.45 seconds
3
![Page 4: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/4.jpg)
CMU 15-445/645 (Fall 2019)
IBM SYSTEM R
First implementation of a query optimizer from the 1970s.→ People argued that the DBMS could never choose a query
plan better than what a human could write.
Many concepts and design decisions from the System R optimizer are still used today.
4
![Page 5: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/5.jpg)
CMU 15-445/645 (Fall 2019)
QUERY OPTIMIZATION
Heuristics / Rules→ Rewrite the query to remove stupid / inefficient things.→ These techniques may need to examine catalog, but they
do not need to examine data.
Cost-based Search→ Use a model to estimate the cost of executing a plan.→ Evaluate multiple equivalent plans for a query and pick
the one with the lowest cost.
5
![Page 6: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/6.jpg)
CMU 15-445/645 (Fall 2019)
ARCHITECTURE OVERVIEW
6
Parser
SystemCatalog
Tree Rewriter(Optional)
CostModel
SQL Rewriter(Optional)
Binder
OptimizerSQL Query1
SQL Query2
AbstractSyntaxTree
3
Logical Plan
4
Logical Plan
5
Physical Plan
6
Application
Name→Internal ID
Schema Info
Schema Info
Estimates
![Page 7: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/7.jpg)
CMU 15-445/645 (Fall 2019)
LOGICAL VS. PHYSICAL PL ANS
The optimizer generates a mapping of a logical algebra expression to the optimal equivalent physical algebra expression.
Physical operators define a specific execution strategy using an access path.→ They can depend on the physical format of the data that
they process (i.e., sorting, compression).→ Not always a 1:1 mapping from logical to physical.
7
![Page 8: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/8.jpg)
CMU 15-445/645 (Fall 2019)
QUERY OPTIMIZATION IS NP -HARD
This is the hardest part of building a DBMS.
If you are good at this, you will get paid $$$.
People are starting to look at employing ML to improve the accuracy and efficacy of optimizers.
I am expanding the Advanced DB Systems class to cover this topic in greater detail.
8
![Page 9: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/9.jpg)
CMU 15-445/645 (Fall 2019)
TODAY'S AGENDA
Relational Algebra Equivalences
Static Rules
9
![Page 10: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/10.jpg)
CMU 15-445/645 (Fall 2019)
REL ATIONAL ALGEBRA EQUIVALENCES
Two relational algebra expressions are equivalentif they generate the same set of tuples.
The DBMS can identify better query plans without a cost model.
This is often called query rewriting.
10
![Page 11: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/11.jpg)
CMU 15-445/645 (Fall 2019)
PREDICATE PUSHDOWN
11
SELECT s.name, e.cidFROM student AS s, enrolled AS eWHERE s.sid = e.sidAND e.grade = 'A'
πname, cid(σgrade='A'(student⋈enrolled))
![Page 12: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/12.jpg)
CMU 15-445/645 (Fall 2019)
PREDICATE PUSHDOWN
11
student enrolled
s.sid=e.sid
grade='A'
s.name,e.cid
⨝s
p
student enrolled
s.sid=e.sid
grade='A'
s.name,e.cid
sp
⨝
SELECT s.name, e.cidFROM student AS s, enrolled AS eWHERE s.sid = e.sidAND e.grade = 'A'
![Page 13: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/13.jpg)
CMU 15-445/645 (Fall 2019)
REL ATIONAL ALGEBRA EQUIVALENCES
12
πname, cid(σgrade='A'(student⋈enrolled))
πname, cid(student⋈(σgrade='A'(enrolled)))
=
SELECT s.name, e.cidFROM student AS s, enrolled AS eWHERE s.sid = e.sidAND e.grade = 'A'
![Page 14: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/14.jpg)
CMU 15-445/645 (Fall 2019)
REL ATIONAL ALGEBRA EQUIVALENCES
Selections:→ Perform filters as early as possible.→ Reorder predicates so that the DBMS applies the most
selective one first.→ Break a complex predicate, and push down
σp1∧p2∧…pn(R) = σp1(σp2(…σpn(R)))
Simplify a complex predicate → (X=Y AND Y=3) → X=3 AND Y=3
13
![Page 15: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/15.jpg)
CMU 15-445/645 (Fall 2019)
REL ATIONAL ALGEBRA EQUIVALENCES
Projections:→ Perform them early to create smaller tuples and reduce
intermediate results (if duplicates are eliminated)→ Project out all attributes except the ones requested or
required (e.g., joining keys)
This is not important for a column store…
14
![Page 16: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/16.jpg)
CMU 15-445/645 (Fall 2019)
PROJECTION PUSHDOWN
15
student enrolled
s.sid=e.sid
grade='A'
s.name,e.cid
⨝s
p
student enrolled
s.sid=e.sid
grade='A'
s.name,e.cid
⨝
s
p
sid,cidpsid,namep
SELECT s.name, e.cidFROM student AS s, enrolled AS eWHERE s.sid = e.sidAND e.grade = 'A'
![Page 17: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/17.jpg)
CMU 15-445/645 (Fall 2019)
MORE EXAMPLES
Impossible / Unnecessary Predicates
16
Source: Lukas Eder
SELECT * FROM A WHERE 1 = 0;X
CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );
![Page 18: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/18.jpg)
CMU 15-445/645 (Fall 2019)
MORE EXAMPLES
Impossible / Unnecessary Predicates
16
Source: Lukas Eder
SELECT * FROM A WHERE 1 = 0;
SELECT * FROM A WHERE 1 = 1;
X
CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );
![Page 19: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/19.jpg)
CMU 15-445/645 (Fall 2019)
MORE EXAMPLES
Impossible / Unnecessary Predicates
16
Source: Lukas Eder
SELECT * FROM A WHERE 1 = 0;
SELECT * FROM A WHERE 1 = 1;SELECT * FROM A;
X
CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );
![Page 20: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/20.jpg)
CMU 15-445/645 (Fall 2019)
MORE EXAMPLES
Impossible / Unnecessary Predicates
Join Elimination
16
Source: Lukas Eder
SELECT * FROM A WHERE 1 = 0;
SELECT A1.*FROM A AS A1 JOIN A AS A2ON A1.id = A2.id;
SELECT * FROM A WHERE 1 = 1;SELECT * FROM A;
X
CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );
![Page 21: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/21.jpg)
CMU 15-445/645 (Fall 2019)
MORE EXAMPLES
Impossible / Unnecessary Predicates
Join Elimination
16
Source: Lukas Eder
SELECT * FROM A WHERE 1 = 0;
SELECT * FROM A WHERE 1 = 1;
SELECT * FROM A;
SELECT * FROM A;
X
CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );
![Page 22: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/22.jpg)
CMU 15-445/645 (Fall 2019)
MORE EXAMPLES
Ignoring Projections
17
SELECT * FROM A AS A1WHERE EXISTS(SELECT val FROM A AS A2
WHERE A1.id = A2.id);
CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );
Source: Lukas Eder
![Page 23: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/23.jpg)
CMU 15-445/645 (Fall 2019)
MORE EXAMPLES
Ignoring Projections
17
SELECT * FROM A;
CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );
Source: Lukas Eder
![Page 24: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/24.jpg)
CMU 15-445/645 (Fall 2019)
MORE EXAMPLES
Ignoring Projections
Merging Predicates
17
SELECT * FROM AWHERE val BETWEEN 1 AND 100
OR val BETWEEN 50 AND 150;
SELECT * FROM A;
CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );
Source: Lukas Eder
![Page 25: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/25.jpg)
CMU 15-445/645 (Fall 2019)
MORE EXAMPLES
Ignoring Projections
Merging Predicates
17
SELECT * FROM AWHERE val BETWEEN 1 AND 100
OR val BETWEEN 50 AND 150;
SELECT * FROM A;
CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );
Source: Lukas Eder
![Page 26: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/26.jpg)
CMU 15-445/645 (Fall 2019)
MORE EXAMPLES
Ignoring Projections
Merging Predicates
17
SELECT * FROM AWHERE val BETWEEN 1 AND 150;
SELECT * FROM A;
CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );
Source: Lukas Eder
![Page 27: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/27.jpg)
CMU 15-445/645 (Fall 2019)
REL ATIONAL ALGEBRA EQUIVALENCES
Joins:→ Commutative, associative
R⋈S = S⋈R(R⋈S)⋈T = R⋈(S⋈T)
How many different orderings are there for an n-way join?
18
![Page 28: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/28.jpg)
CMU 15-445/645 (Fall 2019)
REL ATIONAL ALGEBRA EQUIVALENCES
How many different orderings are there for an n-way join?
Catalan number ≈4n→ Exhaustive enumeration will be too slow.
We’ll see in a second how an optimizer limits the search space...
19
![Page 29: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/29.jpg)
CMU 15-445/645 (Fall 2019)
CONCLUSION
We can use static rules and heuristics to optimize a query plan without needing to understand the contents of the database.
20
![Page 30: 14 Query Planning &Optimization - CMU 15-445/645 · First implementation of a query optimizer from the 1970s. →People argued that the DBMS could never choose a query plan better](https://reader033.vdocuments.us/reader033/viewer/2022043022/5f3dac124d6734725443e7d1/html5/thumbnails/30.jpg)
CMU 15-445/645 (Fall 2019)
NEXT CL ASS
MID-TERM EXAM!→ Seriously, this is not a joke.
21