![Page 1: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/1.jpg)
postgrespro.ru
Adaptive query optimizationin PostgreSQL
Oleg IvanovPostgres Professional
![Page 2: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/2.jpg)
2
● Problem statement– Query optimization– Correlated clauses issue
● Machine learning– Gradient K Nearest Neighbours method
● Adaptive query optimization– Theory– Implementation
● Experimental evaluation
Outline
![Page 3: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/3.jpg)
3
Query optimization
![Page 4: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/4.jpg)
4
SeqScan
users u1
IndexScan
messages m
MergeJoin
IndexScan
users u2
NestedLoopJoin
SELECT *FROM users AS u1, messages AS m, users AS u2WHERE u1.id = m.sender_id AND m.receiver_id = u2.id;
Query plan
![Page 5: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/5.jpg)
5
IndexScan
users u1
SeqScan
messages m
NestedLoopJoin
IndexScan
users u2
NestedLoopJoin
SELECT *FROM users AS u1, messages AS m, users AS u2WHERE u1.id = m.sender_id AND m.receiver_id = u2.id;
Query plan
![Page 6: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/6.jpg)
6
SeqScan
users u1
SeqScan
messages m
HashJoin
SeqScan
users u2
HashJoin
SELECT *FROM users AS u1, messages AS m, users AS u2WHERE u1.id = m.sender_id AND m.receiver_id = u2.id;
Query plan
![Page 7: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/7.jpg)
7
EXPLAIN SELECT *FROM users AS u1, messages AS m, users AS u2WHERE u1.id = m.sender_id AND m.receiver_id = u2.id; QUERY PLAN ----------------------------------------------------------------------------------- Hash Join (cost=540.00..439429.44 rows=10003825 width=27) Hash Cond: (m.receiver_id = u2.id) -> Hash Join (cost=270.00..301606.84 rows=10003825 width=23) Hash Cond: (m.sender_id = u1.id) -> Seq Scan on messages m (cost=0.00..163784.25 rows=10003825 width=19) -> Hash (cost=145.00..145.00 rows=10000 width=4) -> Seq Scan on users u1 (cost=0.00..145.00 rows=10000 width=4) -> Hash (cost=145.00..145.00 rows=10000 width=4) -> Seq Scan on users u2 (cost=0.00..145.00 rows=10000 width=4)(9 rows)
Query plan
![Page 8: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/8.jpg)
8
EXPLAIN SELECT *FROM users AS u1, messages AS m, users AS u2WHERE u1.id = m.sender_id AND m.receiver_id = u2.id; QUERY PLAN ----------------------------------------------------------------------------------- Hash Join (cost=540.00..439429.44 rows=10003825 width=27) Hash Cond: (m.receiver_id = u2.id) -> Hash Join (cost=270.00..301606.84 rows=10003825 width=23) Hash Cond: (m.sender_id = u1.id) -> Seq Scan on messages m (cost=0.00..163784.25 rows=10003825 width=19) -> Hash (cost=145.00..145.00 rows=10000 width=4) -> Seq Scan on users u1 (cost=0.00..145.00 rows=10000 width=4) -> Hash (cost=145.00..145.00 rows=10000 width=4) -> Seq Scan on users u2 (cost=0.00..145.00 rows=10000 width=4)(9 rows)
Plan node execution cost Plan node cardinality
Plan execution cost
Query plan
![Page 9: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/9.jpg)
9
Cost-based query optimization
System R
1. A function which determines the plan's cost2. Minimizing the function value over all possible plans for the query
How does PostgreSQL optimize queries?
![Page 10: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/10.jpg)
10
cs seq_page_cost 1.0
cr random_page_cost 4.0
ct cpu_Tuple_cost 0.01
ci cpu_Index_tuple_cost 0.005
co cpu_Operator_cost 0.0025
Cost=ns cs+nr cr+nt c t+nic i+no co
PostgreSQL cost model
![Page 11: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/11.jpg)
11
0 – 45 – 9
10 - 14 15 - 19
20 - 24 25 - 29
30 - 34 35 - 39
40 - 44 45 - 49
50 - 54 55 - 59
60 - 64 65 - 69
70 - 74 75 - 79
80 - 84 85 - 89
90 - 94 95 – 99
100 or more
0
2000
4000
6000
8000
10000
12000
14000
Age
SELECT * FROM usersWHERE age < 25;
Cardinality
Cardinality estimation
![Page 12: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/12.jpg)
12
Dynamic programming over subsets
● System R● Time complexity: 3n
● Memory consumption: 2n
● Always finds the cheapest plan
![Page 13: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/13.jpg)
13
Genetic algorithm
● PostgreSQL● Common and flexible method● Can be stopped on every iteration● No guarantees
![Page 14: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/14.jpg)
14
Dynamic programming
or
Genetic algorithm
Optimization methodSeqScan
users
HashJoin
HashJoin
SeqScan
messages
SeqScan
pictures
Cost = 439429
MergeJoin
SeqScan
messages
SeqScan
pictures
Cost = 304528
PostgreSQL query optimization
Plan's costestimation
Cardinalityestimation
Cost model
![Page 15: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/15.jpg)
15
Correlated clauses issue
![Page 16: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/16.jpg)
16
Query clauses
Information aboutstored data
PostgreSQL state
Cardinalityestimation Cost model
![Page 17: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/17.jpg)
17Dataset:
The TPC Benchmark™DS (TPC-DS)http://www.tpc.org/tpcds/
![Page 18: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/18.jpg)
18Dataset:
The TPC Benchmark™DS (TPC-DS)http://www.tpc.org/tpcds/
Error: 300 times Error: 4 times
![Page 19: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/19.jpg)
19
Query clauses
Information aboutstored data
PostgreSQL state
Cardinalityestimation Cost model
How good are query optimizers, really?V. Leis, A. Gubichev, A. Mirchev, P. Boncz,
A. Kemper, and T. Neumann,Proc. VLDB, Nov. 2015
![Page 20: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/20.jpg)
20
0 – 45 – 9
10 - 14 15 - 19
20 - 24 25 - 29
30 - 34 35 - 39
40 - 44 45 - 49
50 - 54 55 - 59
60 - 64 65 - 69
70 - 74 75 - 79
80 - 84 85 - 89
90 - 94 95 – 99
100 or more
0
2000
4000
6000
8000
10000
12000
14000
Age
Selectivity≃0.3Cardinality=N tuples⋅Selectivity
SELECT * FROM usersWHERE age < 25;
![Page 21: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/21.jpg)
21
SELECT * FROM usersWHERE age < 25 AND city = 'Ottawa';
Selectivityage=13
Selectivitycity=17
Selectivityage ,city=?
Only selectivities of individual clauses(i.e. marginal selectivities)
are known
![Page 22: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/22.jpg)
22
age < 25
city = 'Ottawa'
1/3
1/7 1/21
![Page 23: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/23.jpg)
23
SELECT * FROM usersWHERE age < 25 AND city = 'Ottawa';
Selectivityage ,city=Selectivityage⋅Selectivitycity
With the only following exception Selectivity25<ageANDage<57=Selectivity25<age<57
Only selectivities of individual clausesare known
The clauses are considered to be independent:
![Page 24: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/24.jpg)
24
SELECT * FROM usersWHERE age < 12 AND married = true;
married = trueage < 12
![Page 25: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/25.jpg)
25
SELECT * FROM usersWHERE age < 12 AND married = false;
age < 12
married = false
![Page 26: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/26.jpg)
26
SELECT * FROM usersWHERE age > 25 AND married = trueAND position = 'CTO';
age > 25position = 'CTO'
married = true
![Page 27: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/27.jpg)
27
Multidimensional histograms
Pros:● Solve the problem● Have theoretical guarantees
Contras:● Dimensionality curse● Require memory● Require time for building or updating● Not clear which of all possible
column subsets are needed● Correlation tests are slow
![Page 28: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/28.jpg)
28Histograms Cost models
ResultSQL query
Query executionQuery optimization
Adaptive query optimization: idea
A priori cost estimation
![Page 29: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/29.jpg)
29Histograms Cost models
Feedback ResultSQL query
Query executionQuery optimization
Execution statistics
A priori cost estimation
Adaptive query optimization: idea
![Page 30: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/30.jpg)
30Histograms Cost models
Feedback ResultSQL query
Query executionQuery optimization
Execution statistics
A priori cost estimation
Adaptive query optimization: idea
![Page 31: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/31.jpg)
31Histograms Cost models
Feedback ResultSQL query
Query executionQuery optimization
Execution statistics
A priori cost estimation
Adaptive query optimization: idea
![Page 32: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/32.jpg)
32
Machine learning
![Page 33: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/33.jpg)
33
Feature 1
Feature 2
Feature 3
Hidden variables
Features
Train set Test set
ObjectsMachine learning
![Page 34: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/34.jpg)
34
Learning procedure
For our problem the learning workflow is iterative:
● On the planning stage the model has to predict hidden variables for a number of objects
● After the execution stage some of these objects are appended to the train set and the model can learn on them
![Page 35: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/35.jpg)
35
Learning procedure
Query 1
After-execution stage:Object 1 – Variable 1Object 3 – Variable 3
Planning stage:Object 1 - ?Object 2 - ?Object 3 - ?
Train set:Empty
![Page 36: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/36.jpg)
36
Learning procedure
Query 2
After-execution stage:Object 4 – Variable 4
Planning stage:Object 4 - ?Object 1 - ?Object 5 - ?
Train set:Object 1 – Variable 1Object 3 – Variable 3
![Page 37: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/37.jpg)
37
Learning procedure
Query 3
After-execution stage:...
Planning stage:...
Train set:Object 1 – Variable 1Object 3 – Variable 3Object 4 – Variable 4
![Page 38: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/38.jpg)
38
K Nearest Neighbours method
25 47 55 32 22 45 28
50 120 100 80 30 90 ?
Age
Salary
![Page 39: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/39.jpg)
39
25 47 55 32 22 45 28
50 120 100 80 30 90 ?
Age
Salary
K Nearest Neighbours method
![Page 40: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/40.jpg)
40
Gradient approach to kNN
Goal: not to store the whole train set
Idea: to use a fixed number of virtual objectsthat provide the best possible prediction quality
![Page 41: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/41.jpg)
41
27 47 28
53 103 ?
2/3
1/3
1/3 5/3
Gradient approach to kNN
Age
Salary
![Page 42: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/42.jpg)
42
Gradient approach to kNN
Math for learning:
Loss function+
Stochastic gradient descent to optimize it+
K Nearest Neighbours method
![Page 43: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/43.jpg)
43
Gradient approach to kNN
13 1220
17
10
15
![Page 44: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/44.jpg)
44
Gradient approach to kNN
13 12
10
20
17
K = 3
15
![Page 45: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/45.jpg)
45
Gradient approach to kNN
13 12
10
15
20
17
K = 3
![Page 46: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/46.jpg)
46
Gradient approach to kNN
13 12
9
13
18
17
K = 3
![Page 47: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/47.jpg)
47
Adaptive query optimizationTheory
![Page 48: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/48.jpg)
48
SeqScan
users u1
IndexScan
messages m
MergeJoin
users u2
NestedLoopJoin
The object is a node with its subtree
u2.id = messages.receiver_id
u1.id = messages.sender_id
IndexScan
u2.married = trueu2.age < 25
![Page 49: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/49.jpg)
49
Node cardinality is105 tuples!Clauses list
users.id = messages.receiver_idANDusers.married = trueANDusers.age < 25
PostgreSQL estimator
Histograms
![Page 50: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/50.jpg)
50
Node cardinality is1017 tuples!Clauses list
users.id = messages.receiver_idANDusers.married = constANDusers.age < const
Machine learning
Clause selectivities● 0.0001● 0.73● 0.23
Relations to join● users● messages
*equal clauses must be handled specially: transform into clause with an equalence class as an argument
![Page 51: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/51.jpg)
51
users.id = messages.receiver_id
users.married = const
users.age < const
0.0001
0.73
0.23
Node cardinality
Features(clause types)
Hidden variable ?
Object is a plan node
Machine learning problem statement
Base relations● users● messages
Each set of base relations and clause types induce its own machine learning problem (feature subspace hash)!
![Page 52: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/52.jpg)
52
NestedLoopJoin
HashJoin IndexScan
SeqScan SeqScan
users messages
friends
SELECT * FROM users, messages, friendsWHERE users.age > 25 AND users.id > 1000AND users.id = messages.sender_idAND users.id = friends.first_idAND messages.receiver_id = friends.second_id;
eclass_2users.age > CONSTeclass_1 > CONST
Marginal selectivities0.0001
0.780.97
Clauses
Hash Sort clauses
Feature subspace hash993059063
Feature vector(0.0001, 0.78, 0.97)
users.id = friends.first_idmessages.receiver_id = friends.second_id
users.id = messages.sender_id
users.age > 25users.id > 1000
eclass_1users.idmessages.sender_idfriends.first_id
eclass_2messages.receiver_idfriends.second_id
Base relationsusersmessagesfriends
![Page 53: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/53.jpg)
53
Query optimization Query executionQuery parsing
Cardinality estimation
Machine learning data
Query execution statistics
LearningMachine learning
Workflow
![Page 54: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/54.jpg)
54
● Will it converge?
● How fast will it converge?
● What guarantees on obtained plans orregressor do we have?
Yes, in the finite number of steps
Predictions are correct for all executed paths
Don't know (in practice in a few steps)
With perfect cost model obtained plans are not worse
Theoretical properties
![Page 55: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/55.jpg)
55
Adaptive query optimizationImplementation
![Page 56: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/56.jpg)
56
Source code
Current code for vanilla PostgreSQL (extension + patch):https://github.com/tigvarts/aqo
Open Source, PostgreSQL license
Without extension equivalent to standard PostgreSQL optimizer
Needs to be in the shared preload libraries
![Page 57: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/57.jpg)
57
Planning stage (prediction):set_baserel_size_estimatesget_parameterized_baserel_sizeset_joinrel_size_estimatesget_parameterized_joinrel_size
After-execution stage:ExecutorEnd – learningExplainOnePlan – visualization
Other:planner_hook – prepare to the planning stageExecutorStart – setting the flags for statistics collectioncopy_generic_path_info – transmit Path information to Plan node
Hooks
![Page 58: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/58.jpg)
58
Control
For some queries we don't need AQO.
So we need a mechanism to disable AQO learning or usage for some queries.
![Page 59: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/59.jpg)
59
Query types
Query type is the set of queries, which differ only in their constants.
Query type:SELECT * FROM users WHERE age > const AND city = const;
Queries:SELECT * FROM users WHERE age > 18 AND city = 'Ottawa';SELECT * FROM users WHERE age > 65 AND city = 'New York';…
![Page 60: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/60.jpg)
60
AQO tables
aqo_queries● Query_hash● Learn AQO● Use AQO● Feature_space
Settings
aqo_query_texts● Query_hash● Query_text
For user
aqo_data● Feature_space● Feature_subspace● Features● Target
Machine learning
![Page 61: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/61.jpg)
61
Control
The users don't want to configure AQO query settings manually.
So we need a mechanism to determine automatically whether the query needs AQO.
It is called auto tuning.
![Page 62: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/62.jpg)
62
AQO tables
aqo_queries● Query_hash● Learn AQO● Use AQO● Feature_space● Auto_tuning
Settings
aqo_query_texts● Query_hash● Query_text
For user
aqo_query_stat● Query_hash● Planning time● Execution time● Cardinality error● Number of executions
For auto tuning
aqo_data● Feature_space● Feature_subspace● Features● Target
Machine learning
![Page 63: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/63.jpg)
63
Control
GUC: aqo.mode
● Disabled: disabled for all query types● Forced: enabled for all query types● Controlled: use manual settings for known query types, ignore others● Intelligent: use manual settings for known query types,
tries to tune others automatically
![Page 64: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/64.jpg)
64
Query optimization Query executionQuery parsing
Cardinality estimation
Machine learning data
Query execution statistics
LearningMachine learning
Workflow
![Page 65: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/65.jpg)
65
Query optimization Query executionQuery parsing
Cardinality estimation
Query execution statistics
Learning
Fair workflow
Query settings definition Auto tuning
![Page 66: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/66.jpg)
66
Experimental evaluation
![Page 67: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/67.jpg)
67
Strongly Correlated Columns (StrongCor)
Cardinality estimation using neural networksH. Liu, M. Xu, Z. Yu, V. Corvinelli, and C. Zuzarte,
CASCON’15, 2015, IBM Corp.
![Page 68: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/68.jpg)
68
Cardinality estimation error
![Page 69: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/69.jpg)
69
Cardinality estimation error
![Page 70: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/70.jpg)
70
Cardinality estimation error
![Page 71: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/71.jpg)
71
TPC-H slow
TPC-H fast
0 10 20 30 40 50 60
+1.3%
Performance improvement
AdaptiveOriginal
![Page 72: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/72.jpg)
72
TPC-H slow
TPС-H fast
0 5000 10000 15000 20000
-4.4%
AdaptiveOriginal
Performance improvement
![Page 73: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/73.jpg)
73
TPC-DS very slow
TPC-DS slow
TPC-DS normal
TPC-DS fast
TPC-DS very fast
0 2 4 6 8 10 12 14 16 18
+12%
AdaptiveOriginal
Performance improvement
![Page 74: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/74.jpg)
74
TPC-DS very slow
TPC-DS slow
TPC-DS normal
TPC-DS fast
TPC-DS very fast
0 20 40 60 80 100 120 140
+24%
AdaptiveOriginal
Performance improvement
![Page 75: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/75.jpg)
75
TPC-DS very slow
TPC-DS slow
TPC-DS normal
TPC-DS fast
TPC-DS very fast
0 50 100 150 200 250 300
+41%
AdaptiveOriginal
Performance improvement
![Page 76: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/76.jpg)
76
TPC-DS very slow
TPC-DS slow
TPC-DS normal
TPC-DS fast
TPC-DS very fast
0200
400600
8001000
12001400
16001800
+285%
AdaptiveOriginal
Performance improvement
![Page 77: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/77.jpg)
77
TPC-DS very slow
TPC-DS slow
TPC-DS normal
TPC-DS fast
TPC-DS very fast
0 5000 10000 15000 20000
Adaptive
+115%
Original
Performance improvement
![Page 78: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/78.jpg)
78
Maximum acceleration
![Page 79: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/79.jpg)
790 1 2 3 4 5 6 7 8 9
# iter
0.6
0.7
0.8
0.9
1.0
1.1
1.2
1.3
1.4E
xecu
tion
tim
e,
s
TPC-DS 18
Learning progress
![Page 80: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/80.jpg)
800 1 2 3 4 5 6 7 8 9
# iter
100
101
102
103
104
Exe
cuti
on t
ime, s
TPC-DS 69
Learning progress
![Page 81: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/81.jpg)
810 1 2 3 4 5 6 7 8 9
# iter
0.7
0.8
0.9
1.0
1.1
1.2
1.3Exe
cuti
on
tim
e, s
TPC-DS 26
Learning progress
![Page 82: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/82.jpg)
82
SELECT MIN(k.keyword) AS movie_keyword, MIN(n.name) AS actor_name, MIN(t.title) AS hero_movieFROM cast_info AS ci, keyword AS k, movie_keyword AS mk, name AS n, title AS tWHERE k.keyword in ('superhero', 'sequel', 'second-part', 'marvel-comics', 'based-on-comic', 'tv-special', 'fight', 'violence') AND n.name LIKE '%Downey%Robert%' AND t.production_year > 2000 AND k.id = mk.keyword_id AND t.id = mk.movie_id AND t.id = ci.movie_id AND ci.movie_id = mk.movie_id AND n.id = ci.person_id;
Example: complicated queryHow good are query optimizers, really?
V. Leis, A. Gubichev, A. Mirchev, P. Boncz,A. Kemper, and T. Neumann,
Proc. VLDB, Nov. 2015Join Order Benchmark
![Page 83: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/83.jpg)
83
QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=15028.15..15028.16 rows=1 width=96) (actual time=9059.369..9059.369 rows=1 loops=1) -> Nested Loop (cost=8.21..15028.14 rows=1 width=48) (actual time=218.507..9058.827 rows=88 loops=1) -> Nested Loop (cost=7.78..12650.75 rows=5082 width=37) (actual time=0.700..3343.348 rows=785477 loops=1) Join Filter: (t.id = ci.movie_id) -> Nested Loop (cost=7.21..12370.11 rows=148 width=41) (actual time=0.682..423.503 rows=14165 loops=1) -> Nested Loop (cost=6.78..12235.17 rows=270 width=20) (actual time=0.661..159.090 rows=35548 loops=1) -> Seq Scan on keyword k (cost=0.00..3632.40 rows=8 width=20) (actual time=0.125..28.679 rows=8 loops=1) Filter: (keyword = ANY ('{superhero,sequel,second-part,marvel-comics,based-on-comic,tv-special,fight,violence}'::text[])) Rows Removed by Filter: 134162 -> Bitmap Heap Scan on movie_keyword mk (cost=6.78..1072.32 rows=303 width=8) (actual time=0.949..15.307 rows=4444 loops=8) Recheck Cond: (keyword_id = k.id) Heap Blocks: exact=23488 -> Bitmap Index Scan on keyword_id_movie_keyword (cost=0.00..6.71 rows=303 width=0) (actual time=0.563..0.563 rows=4444 loops=8) Index Cond: (keyword_id = k.id) -> Index Scan using title_pkey on title t (cost=0.43..0.49 rows=1 width=21) (actual time=0.007..0.007 rows=0 loops=35548) Index Cond: (id = mk.movie_id) Filter: (production_year > 2000) Rows Removed by Filter: 1 -> Index Scan using movie_id_cast_info on cast_info ci (cost=0.56..1.47 rows=34 width=8) (actual time=0.010..0.190 rows=55 loops=14165) Index Cond: (movie_id = mk.movie_id) -> Index Scan using name_pkey on name n (cost=0.43..0.46 rows=1 width=19) (actual time=0.007..0.007 rows=0 loops=785477) Index Cond: (id = ci.person_id) Filter: (name ~~ '%Downey%Robert%'::text) Rows Removed by Filter: 1 Planning time: 36.697 ms Execution time: 9059.593 ms(26 rows)
Bad cardinality estimates
![Page 84: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/84.jpg)
84
QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=88572.37..88572.38 rows=1 width=96) (actual time=16070.666..16070.666 rows=1 loops=1) -> Nested Loop (cost=8.21..88571.71 rows=88 width=48) (actual time=367.618..16070.206 rows=88 loops=1) Join Filter: (mk.movie_id = t.id) -> Nested Loop (cost=7.78..88571.24 rows=1 width=39) (actual time=367.595..16068.698 rows=112 loops=1) -> Nested Loop (cost=7.35..84247.78 rows=9242 width=28) (actual time=0.659..5438.176 rows=1564305 loops=1) -> Nested Loop (cost=6.78..12235.17 rows=35548 width=20) (actual time=0.642..154.833 rows=35548 loops=1) -> Seq Scan on keyword k (cost=0.00..3632.40 rows=8 width=20) (actual time=0.121..28.424 rows=8 loops=1) Filter: (keyword = ANY ('{superhero,sequel,second-part,marvel-comics,based-on-comic,tv-special,fight,violence}'::text[])) Rows Removed by Filter: 134162 -> Bitmap Heap Scan on movie_keyword mk (cost=6.78..1072.32 rows=303 width=8) (actual time=0.925..14.836 rows=4444 loops=8) Recheck Cond: (keyword_id = k.id) Heap Blocks: exact=23488 -> Bitmap Index Scan on keyword_id_movie_keyword (cost=0.00..6.71 rows=303 width=0) (actual time=0.543..0.543 rows=4444 loops=8) Index Cond: (keyword_id = k.id) -> Index Scan using movie_id_cast_info on cast_info ci (cost=0.56..1.47 rows=55 width=8) (actual time=0.008..0.139 rows=44 loops=35548) Index Cond: (movie_id = mk.movie_id) -> Index Scan using name_pkey on name n (cost=0.43..0.46 rows=1 width=19) (actual time=0.007..0.007 rows=0 loops=1564305) Index Cond: (id = ci.person_id) Filter: (name ~~ '%Downey%Robert%'::text) Rows Removed by Filter: 1 -> Index Scan using title_pkey on title t (cost=0.43..0.45 rows=1 width=21) (actual time=0.012..0.012 rows=1 loops=112) Index Cond: (id = ci.movie_id) Filter: (production_year > 2000) Rows Removed by Filter: 0 Planning time: 11.166 ms Execution time: 16070.850 ms(26 rows)
Using previous statistics to refine estimates
![Page 85: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/85.jpg)
85
QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=169291.60..169291.61 rows=1 width=96) (actual time=3851.890..3851.890 rows=1 loops=1) -> Hash Join (cost=107718.88..169290.94 rows=88 width=48) (actual time=845.627..3851.419 rows=88 loops=1) Hash Cond: (ci.person_id = n.id) -> Nested Loop (cost=7.78..58633.52 rows=785477 width=37) (actual time=0.700..3011.292 rows=785477 loops=1) Join Filter: (t.id = ci.movie_id) -> Nested Loop (cost=7.21..30001.33 rows=14165 width=41) (actual time=0.682..415.470 rows=14165 loops=1) -> Nested Loop (cost=6.78..12235.17 rows=35548 width=20) (actual time=0.663..154.262 rows=35548 loops=1) -> Seq Scan on keyword k (cost=0.00..3632.40 rows=8 width=20) (actual time=0.126..28.971 rows=8 loops=1) Filter: (keyword = ANY ('{superhero,sequel,second-part,marvel-comics,based-on-comic,tv-special,fight,violence}'::text[])) Rows Removed by Filter: 134162 -> Bitmap Heap Scan on movie_keyword mk (cost=6.78..1072.32 rows=303 width=8) (actual time=0.980..14.743 rows=4444 loops=8) Recheck Cond: (keyword_id = k.id) Heap Blocks: exact=23488 -> Bitmap Index Scan on keyword_id_movie_keyword (cost=0.00..6.71 rows=303 width=0) (actual time=0.579..0.579 rows=4444 loops=8) Index Cond: (keyword_id = k.id) -> Index Scan using title_pkey on title t (cost=0.43..0.49 rows=1 width=21) (actual time=0.007..0.007 rows=0 loops=35548) Index Cond: (id = mk.movie_id) Filter: (production_year > 2000) Rows Removed by Filter: 1 -> Index Scan using movie_id_cast_info on cast_info ci (cost=0.56..1.47 rows=44 width=8) (actual time=0.009..0.170 rows=55 loops=14165) Index Cond: (movie_id = mk.movie_id) -> Hash (cost=107705.93..107705.93 rows=414 width=19) (actual time=756.785..756.785 rows=2 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on name n (cost=0.00..107705.93 rows=414 width=19) (actual time=77.074..756.771 rows=2 loops=1) Filter: (name ~~ '%Downey%Robert%'::text) Rows Removed by Filter: 4167489 Planning time: 13.462 ms Execution time: 3852.110 ms(28 rows)
![Page 86: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/86.jpg)
86
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=109869.39..109869.40 rows=1 width=96) (actual time=783.543..783.543 rows=1 loops=1) -> Nested Loop (cost=1.85..109868.73 rows=88 width=48) (actual time=93.291..783.412 rows=88 loops=1) -> Nested Loop (cost=1.43..109849.92 rows=41 width=36) (actual time=92.005..767.675 rows=5202 loops=1) -> Nested Loop (cost=0.99..109833.44 rows=9 width=40) (actual time=91.982..763.536 rows=306 loops=1) -> Nested Loop (cost=0.56..109825.54 rows=17 width=19) (actual time=91.907..758.112 rows=486 loops=1) -> Seq Scan on name n (cost=0.00..107705.93 rows=2 width=19) (actual time=73.336..732.998 rows=2 loops=1) Filter: (name ~~ '%Downey%Robert%'::text) Rows Removed by Filter: 4167489 -> Index Scan using person_id_cast_info on cast_info ci (cost=0.56..1054.82 rows=499 width=8) (actual time=12.418..12.503 rows=243 loops=2) Index Cond: (person_id = n.id) -> Index Scan using title_pkey on title t (cost=0.43..0.45 rows=1 width=21) (actual time=0.011..0.011 rows=1 loops=486) Index Cond: (id = ci.movie_id) Filter: (production_year > 2000) Rows Removed by Filter: 0 -> Index Scan using movie_id_movie_keyword on movie_keyword mk (cost=0.43..1.36 rows=47 width=8) (actual time=0.007..0.010 rows=17 loops=306) Index Cond: (movie_id = t.id) -> Index Scan using keyword_pkey on keyword k (cost=0.42..0.45 rows=1 width=20) (actual time=0.003..0.003 rows=0 loops=5202) Index Cond: (id = mk.keyword_id) Filter: (keyword = ANY ('{superhero,sequel,second-part,marvel-comics,based-on-comic,tv-special,fight,violence}'::text[])) Rows Removed by Filter: 1 Planning time: 13.981 ms Execution time: 783.723 ms(22 rows)
![Page 87: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/87.jpg)
87
QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=110747.71..110747.72 rows=1 width=96) (actual time=770.231..770.232 rows=1 loops=1) -> Nested Loop (cost=1.85..110747.05 rows=88 width=48) (actual time=78.828..770.093 rows=88 loops=1) Join Filter: (mk.movie_id = t.id) -> Nested Loop (cost=1.42..110694.70 rows=112 width=39) (actual time=75.208..769.518 rows=112 loops=1) -> Nested Loop (cost=1.00..110660.75 rows=74 width=27) (actual time=74.639..743.329 rows=10066 loops=1) -> Nested Loop (cost=0.56..109820.42 rows=486 width=19) (actual time=74.589..736.659 rows=486 loops=1) -> Seq Scan on name n (cost=0.00..107705.93 rows=2 width=19) (actual time=74.543..736.376 rows=2 loops=1) Filter: (name ~~ '%Downey%Robert%'::text) Rows Removed by Filter: 4167489 -> Index Scan using person_id_cast_info on cast_info ci (cost=0.56..1054.82 rows=243 width=8) (actual time=0.027..0.094 rows=243 loops=2) Index Cond: (person_id = n.id) -> Index Scan using movie_id_movie_keyword on movie_keyword mk (cost=0.43..1.26 rows=47 width=8) (actual time=0.006..0.010 rows=21 loops=486) Index Cond: (movie_id = ci.movie_id) -> Index Scan using keyword_pkey on keyword k (cost=0.42..0.45 rows=1 width=20) (actual time=0.002..0.002 rows=0 loops=10066) Index Cond: (id = mk.keyword_id) Filter: (keyword = ANY ('{superhero,sequel,second-part,marvel-comics,based-on-comic,tv-special,fight,violence}'::text[])) Rows Removed by Filter: 1 -> Index Scan using title_pkey on title t (cost=0.43..0.45 rows=1 width=21) (actual time=0.005..0.005 rows=1 loops=112) Index Cond: (id = ci.movie_id) Filter: (production_year > 2000) Rows Removed by Filter: 0 Planning time: 14.306 ms Execution time: 770.452 ms(23 rows)
![Page 88: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/88.jpg)
88
QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=112898.09..112898.10 rows=1 width=96) (actual time=754.243..754.243 rows=1 loops=1) -> Nested Loop (cost=1.85..112897.43 rows=88 width=48) (actual time=75.655..754.117 rows=88 loops=1) -> Nested Loop (cost=1.43..110510.89 rows=5202 width=36) (actual time=74.710..739.330 rows=5202 loops=1) Join Filter: (t.id = mk.movie_id) -> Nested Loop (cost=0.99..110046.39 rows=306 width=40) (actual time=74.694..735.745 rows=306 loops=1) -> Nested Loop (cost=0.56..109820.42 rows=486 width=19) (actual time=74.648..732.892 rows=486 loops=1) -> Seq Scan on name n (cost=0.00..107705.93 rows=2 width=19) (actual time=74.602..732.599 rows=2 loops=1) Filter: (name ~~ '%Downey%Robert%'::text) Rows Removed by Filter: 4167489 -> Index Scan using person_id_cast_info on cast_info ci (cost=0.56..1054.82 rows=243 width=8) (actual time=0.027..0.097 rows=243 loops=2) Index Cond: (person_id = n.id) -> Index Scan using title_pkey on title t (cost=0.43..0.45 rows=1 width=21) (actual time=0.005..0.006 rows=1 loops=486) Index Cond: (id = ci.movie_id) Filter: (production_year > 2000) Rows Removed by Filter: 0 -> Index Scan using movie_id_movie_keyword on movie_keyword mk (cost=0.43..1.26 rows=21 width=8) (actual time=0.004..0.008 rows=17 loops=306) Index Cond: (movie_id = ci.movie_id) -> Index Scan using keyword_pkey on keyword k (cost=0.42..0.45 rows=1 width=20) (actual time=0.003..0.003 rows=0 loops=5202) Index Cond: (id = mk.keyword_id) Filter: (keyword = ANY ('{superhero,sequel,second-part,marvel-comics,based-on-comic,tv-special,fight,violence}'::text[])) Rows Removed by Filter: 1 Planning time: 15.498 ms Execution time: 754.449 ms(23 rows)
Convergence to the plan with good estimates
![Page 89: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/89.jpg)
89
Conclusion
![Page 90: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/90.jpg)
90
Uses stored statistics to refine cardinality estimates
Works for clauses of the same structure (i. e. age < some_const)
Works when data distribution doesn't change rapidly
Is suitable for the static workload (i. e. the finite number of query templates or clause structures)
Is useful for complicated queries of the same structure with slow plan caused by bad cardinality estimates (OLAP)
Adaptive Query Optimization
![Page 91: Adaptive query optimization in PostgreSQL...Cost-based query optimization System R 1. A function which determines the plan's cost 2. Minimizing the function value over all possible](https://reader030.vdocuments.us/reader030/viewer/2022041017/5ec99cb47bca3b712d021e5d/html5/thumbnails/91.jpg)
91
Further work
Histograms Cost models
A priori cost estimation
Feedback ResultSQL query
Query executionQuery optimization
Execution statistics