deepquery - university of washington...deepquery deep reinforcement learning for query optimization...
TRANSCRIPT
DeepQueryDeep Reinforcement Learning for Query Optimization
1
Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and S. Sathiya Keerthi (Criteo Reserach)
Query Optimization is an Old Problem
2
1979
Query Optimization Is Still Hard Today
3
2015
Two Key Query Optimization Observations
Leis et al observe (among other) that:
• Accuracy of cardinality estimation is critical to good plans• Especially hard in the presence of correlations between tables
• Risk associated with certain plan choices should be considered• E.g., nested-loops are seldom beneficial without an index
4
Active Research Areas
5
Cardinality Estim ation Histogram Sam pling
627
39
433
6
Wavelets Synopses
Recent Success of Deep Learning
7Examples & Images (except indexing) from: https://machinelearningmastery.com/inspirational-applications-deep-learning/
Black-and-white im age colorization Text and im age translation
Object detection and classification
Autom atic gam e playing Indexing! [SIGM OD’18]
Our Vision
8
Can we rethink query optimization in the context of deep learning?
Data
SQL Query
Model
TS
⋈⋈
U
Query Plan
Our Vision
9
Data
SQL QueryTS
⋈⋈
U
Query Plan
…
…
Deep Learning Model
How to encode data and a
query?
Can we rethink query optimization in the context of deep learning?
W hat type of m odel do we
need?
Can Deep Learning Help Query Opt?
• Can we automatically learn a query (subquery) representation?• That suffices to predict the query cardinality?• That can serve to derive the representation for more complex queries?
• Can we use the learned representation to find good plans?• Combine it with reinforcement learning to incrementally build plans
10
A Deep Learning Model for Query Optimization
11
Deep Model
Data
SQLQuery
Input Vector
Basic Statistics:1D-Histogram s
Encode Join PredicatesEncode Selections
Need Ability to Encode all
Possible Queries!
Learning Subquery Representations• Use deep learning to capture properties of intermediate relations
12
ℎ"
#"ℎ"$%
State t+1State t
RepresentationofDatabaseProperties
Queryoperationattimet
SubqueryRepresentation
⋈State t+2
⋈⋈
Queryoperationattimet+1
StateTransitionFunction ""#$
SubqueryRepresentation
Action t Action t+1
%&'%&(&)*+*,-)+**,/&*
%&'%&(&)*+*,-)+**,/&*+1
+0*,-)+**,/&*
! ⋈
A Recursive Function
ℎ"
#"ℎ"$%
ObservedVariables
…
2234
2256789:8;
13
Use cardinality as an observed variable
Query Operation
Prior subquery representation
Learning the Recursive Function• One approach is to train a recurrent neural network (RNN)
14
!"
ℎ" ℎ$
!$
%&'()*+,.,/+0*,1
ℎ2
!2
ℎ3
!3
ℎ4
!4
Preliminary Experiments• Synthetic database generated using the PDGF Framework
• 8 relations (1 fact table and 7 dimension tables)• Data size ~1.6 GB
• 8k random queries• 6k for training/2k for testing
• Select-join queries• All queries have 5 selection and/or joins
• Implementation in Tensorflow
15
• Predictions compared to Postgres estimates• First action is either a selection or a join
16
Results
First Action Second Action
• Need to improve predictions for later actions
17
Results
Fifth Action
Can Deep Learning Help Query Opt?
• Can we automatically learn a query (subquery) representation?• That suffices to predict the query cardinality?• That can serve to derive the representation for more complex queries?
• Can we use the learned representation to find good plans?• Combine it with reinforcement learning to incrementally build plans
18
How to Find Good Query Plans?• Use Subquery Representations with Reinforcement Learning
• Given: A subquery representation and a set of possible actions, which is best?
19
State t+1
SubqueryRepresentation
⋈ AAction 1
? ⋈ #Action 2
⋈ $Action 3
Reinforcement Learning
20
AgentAction
Observe State
Reward
Environment
21
AgentAction
Observe State
Reward
Environm ent
State 0
State 2
State 3
action 2
action 2
reward
reward
reward
Q(s3,a2)
Q(s2,a2)
action 1
Q(s0,a1)State 1
Reinforcement Learning: Q-learning
Reinforcement Learning with Subquery Representations
22
state 0
Database
state 2state 1
action 2: join operation (S,T) action 1: join operation (U,T)
reward: -cardinality
TS
⋈TU
⋈
Q(s0,a2)Q(s0,a1)
"
Q-Learning
23
state 0
Database
state 2state 1
TS
⋈TU
⋈
Q(s0,a2)Q(s0,a1)
Value-based iteration
state 6
Q(s2,a2)
state 4
Q(s1,a1)TS
⋈⋈
Ustate 5
Q(s2,a1)
TS
⋈⋈
R
TU
⋈⋈
S
"
Q-Learning
24
state 0
Database
state 1
TS
⋈TU
⋈
Q(s0,a2)Q(s0,a1)
state 6
Q(s2,a2)
state 4
Q(s1,a1)TS
⋈⋈
Ustate 5
Q(s2,a1)
TS
⋈⋈
R
TU
⋈⋈
S
"
state 2
Value-based iteration
Project Status
• Short paper outlines the vision: Learning State Representations for Query Optimization with Deep Reinforcement Learning. Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, and S. Sathiya KeerthiDEEM Workshop @ SIGMOD 2018
• Experimenting with different approaches for both components of the problem
• Expect more results soon
25
Related Work• The Case for Learned Index Structures. Kraska, et. al.
• Use models to replace/enhance indexes on single attributes
• Deep Reinforcement Learning for Join Order Enumeration Marcus, et. al.
• Using RL with Postgres estimates as the reward.
26
Conclusion
• Query optimization remains a hard problem• Can we use deep learning and reinforcement learning to improve
• Cardinality estimation?• Query plan selection?
• DeepQuery: Deep reinforcement learning for query optimization• Learn subquery representations through a recursive function• Use reinforcement learning for join enumeration• Preliminary results are promising
27