deepquery - university of washington...deepquery deep reinforcement learning for query optimization...

27
DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and S. Sathiya Keerthi (Criteo Reserach)

Upload: others

Post on 24-Sep-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

DeepQueryDeep Reinforcement Learning for Query Optimization

1

Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and S. Sathiya Keerthi (Criteo Reserach)

Page 2: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Query Optimization is an Old Problem

2

1979

Page 3: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Query Optimization Is Still Hard Today

3

2015

Page 4: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Two Key Query Optimization Observations

Leis et al observe (among other) that:

• Accuracy of cardinality estimation is critical to good plans• Especially hard in the presence of correlations between tables

• Risk associated with certain plan choices should be considered• E.g., nested-loops are seldom beneficial without an index

4

Page 5: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Active Research Areas

5

Cardinality Estim ation Histogram Sam pling

627

39

433

Page 6: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

6

Wavelets Synopses

Page 7: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Recent Success of Deep Learning

7Examples & Images (except indexing) from: https://machinelearningmastery.com/inspirational-applications-deep-learning/

Black-and-white im age colorization Text and im age translation

Object detection and classification

Autom atic gam e playing Indexing! [SIGM OD’18]

Page 8: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Our Vision

8

Can we rethink query optimization in the context of deep learning?

Data

SQL Query

Model

TS

⋈⋈

U

Query Plan

Page 9: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Our Vision

9

Data

SQL QueryTS

⋈⋈

U

Query Plan

Deep Learning Model

How to encode data and a

query?

Can we rethink query optimization in the context of deep learning?

W hat type of m odel do we

need?

Page 10: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Can Deep Learning Help Query Opt?

• Can we automatically learn a query (subquery) representation?• That suffices to predict the query cardinality?• That can serve to derive the representation for more complex queries?

• Can we use the learned representation to find good plans?• Combine it with reinforcement learning to incrementally build plans

10

Page 11: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

A Deep Learning Model for Query Optimization

11

Deep Model

Data

SQLQuery

Input Vector

Basic Statistics:1D-Histogram s

Encode Join PredicatesEncode Selections

Need Ability to Encode all

Possible Queries!

Page 12: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Learning Subquery Representations• Use deep learning to capture properties of intermediate relations

12

ℎ"

#"ℎ"$%

State t+1State t

RepresentationofDatabaseProperties

Queryoperationattimet

SubqueryRepresentation

⋈State t+2

⋈⋈

Queryoperationattimet+1

StateTransitionFunction ""#$

SubqueryRepresentation

Action t Action t+1

%&'%&(&)*+*,-)+**,/&*

%&'%&(&)*+*,-)+**,/&*+1

+0*,-)+**,/&*

! ⋈

Page 13: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

A Recursive Function

ℎ"

#"ℎ"$%

ObservedVariables

2234

2256789:8;

13

Use cardinality as an observed variable

Query Operation

Prior subquery representation

Page 14: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Learning the Recursive Function• One approach is to train a recurrent neural network (RNN)

14

!"

ℎ" ℎ$

!$

%&'()*+,.,/+0*,1

ℎ2

!2

ℎ3

!3

ℎ4

!4

Page 15: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Preliminary Experiments• Synthetic database generated using the PDGF Framework

• 8 relations (1 fact table and 7 dimension tables)• Data size ~1.6 GB

• 8k random queries• 6k for training/2k for testing

• Select-join queries• All queries have 5 selection and/or joins

• Implementation in Tensorflow

15

Page 16: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

• Predictions compared to Postgres estimates• First action is either a selection or a join

16

Results

First Action Second Action

Page 17: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

• Need to improve predictions for later actions

17

Results

Fifth Action

Page 18: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Can Deep Learning Help Query Opt?

• Can we automatically learn a query (subquery) representation?• That suffices to predict the query cardinality?• That can serve to derive the representation for more complex queries?

• Can we use the learned representation to find good plans?• Combine it with reinforcement learning to incrementally build plans

18

Page 19: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

How to Find Good Query Plans?• Use Subquery Representations with Reinforcement Learning

• Given: A subquery representation and a set of possible actions, which is best?

19

State t+1

SubqueryRepresentation

⋈ AAction 1

? ⋈ #Action 2

⋈ $Action 3

Page 20: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Reinforcement Learning

20

AgentAction

Observe State

Reward

Environment

Page 21: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

21

AgentAction

Observe State

Reward

Environm ent

State 0

State 2

State 3

action 2

action 2

reward

reward

reward

Q(s3,a2)

Q(s2,a2)

action 1

Q(s0,a1)State 1

Reinforcement Learning: Q-learning

Page 22: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Reinforcement Learning with Subquery Representations

22

state 0

Database

state 2state 1

action 2: join operation (S,T) action 1: join operation (U,T)

reward: -cardinality

TS

⋈TU

Q(s0,a2)Q(s0,a1)

"

Page 23: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Q-Learning

23

state 0

Database

state 2state 1

TS

⋈TU

Q(s0,a2)Q(s0,a1)

Value-based iteration

state 6

Q(s2,a2)

state 4

Q(s1,a1)TS

⋈⋈

Ustate 5

Q(s2,a1)

TS

⋈⋈

R

TU

⋈⋈

S

"

Page 24: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Q-Learning

24

state 0

Database

state 1

TS

⋈TU

Q(s0,a2)Q(s0,a1)

state 6

Q(s2,a2)

state 4

Q(s1,a1)TS

⋈⋈

Ustate 5

Q(s2,a1)

TS

⋈⋈

R

TU

⋈⋈

S

"

state 2

Value-based iteration

Page 25: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Project Status

• Short paper outlines the vision: Learning State Representations for Query Optimization with Deep Reinforcement Learning. Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, and S. Sathiya KeerthiDEEM Workshop @ SIGMOD 2018

• Experimenting with different approaches for both components of the problem

• Expect more results soon

25

Page 26: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Related Work• The Case for Learned Index Structures. Kraska, et. al.

• Use models to replace/enhance indexes on single attributes

• Deep Reinforcement Learning for Join Order Enumeration Marcus, et. al.

• Using RL with Postgres estimates as the reward.

26

Page 27: DeepQuery - University of Washington...DeepQuery Deep Reinforcement Learning for Query Optimization 1 Jennifer Ortiz (UW), Magdalena Balazinska (UW), Johannes Gehrke (Microsoft) and

Conclusion

• Query optimization remains a hard problem• Can we use deep learning and reinforcement learning to improve

• Cardinality estimation?• Query plan selection?

• DeepQuery: Deep reinforcement learning for query optimization• Learn subquery representations through a recursive function• Use reinforcement learning for join enumeration• Preliminary results are promising

27