university of pennsylvania, database research group 1 chase & backchase: a versatile tool for...
Post on 21-Dec-2015
214 views
TRANSCRIPT
University of Pennsylvania, Database Research Group
1
Chase & Backchase:A Versatile Tool for Query
Optimization
Lucian Popa, Alin Deutsch, Arnaud Sahuguet, Val Tannen
University of Pennsylvania
University of Pennsylvania, Database Research Group
2
Logical vs. Physical
• A separation is necessary for physical data independence:
Logical schem
a
Physical
schema
Logical optimization
Physical optimization
Network
Unfortunately, limited cooperation !
University of Pennsylvania, Database Research Group
3
Optimization Techniques
1 Some rewriting (unnesting)
2 Acces path (index) use (ad-hoc)
3 Join ordering (dynamic programming)
4 Semantic optimization (more rewriting)
5 Use of materialized views (rewriting, too)
6 Object-oriented techniques (everything above, but OO! )
Tradition
al
More
“exotic”
Logical
Physical
Physical
Logical
Both ?!
Mixed
• Limited cooperation, so far, because of :– different foundations
– different affiliation: logical or physical
University of Pennsylvania, Database Research Group
4
Bridges
• What is needed: interaction between all these
techniques. – can produce better plans, by enabling each other:
• This work connects together some of these techniques,
by finding a common foundation
Semantic optimizati
on
Use of materialized
views
Use of indexes
University of Pennsylvania, Database Research Group
5
In a (Coco)Nutshell
• The Unifier: Constraints (Dependencies):– logical constraints:
• semantic relationships among elements of the logical schema– physical constraints:
• semantic relationships between elements of the physical schema and elements of the logical schema
• Chase with constraints produces the universal plan UP:– UP incorporates all relevant access paths– subqueries of UP provide the search space (candidate plans)
• Backchase: – search among candidate plans for scan-minimal queries, checking for
equivalence using (reverse) chase
– search space can be pruned using cost-based optimization
University of Pennsylvania, Database Research Group
6
Talk Outline
• Motivation and Overview
• Logical and Physical Constraints, Chase and Backchase (VLDB’99)
• Theoretical Results
(ICDT’99, VLDB’99, recent improvements)
• Using Cost Information
• Experimental Results (SIGMOD’00, recent improvements)
• Conclusion
University of Pennsylvania, Database Research Group
7
Proj : Set < Struct { string PName; string PDept; string CustName; string Budget;} >
class Dept (extent depts){ attribute string DName; attribute Set<string> DProjs;}
Logical Schema (ODMG syntax)
Dept : Dict < Doid, Struct { string DName; Set <string> DProjs; }>
fresh base type for oids
• To formally describe classes and their operations:
use dictionaries (finite functions).
University of Pennsylvania, Database Research Group
8
Translating OO Queries into Dictionary Form
select struct (PN: s, DN: Dept [d].DName)
from dom Dept d , Dept [d].DProjs s
“domain” as extent “lookup” for oid dereferencing
select struct (PN: s, DN: d.DName)from depts d , d.DProjs s OO
• Dictionary operations: dom M, M[k]
• Constraints are translated in a similar way
Dict
University of Pennsylvania, Database Research Group
9
Logical Schema Constraints
(RIC1) (d dom Dept) (s Dept [d].DProjs)
(p Proj) s = p.PName
(INV2) (p Proj) (d dom Dept)
p.PDept = Dept [d].DName (s Dept [d].DProjs) p.Pname = s
+ two others …
• Describe semantic relationships among elements of the logical schema
• Example: an inverse relationship between Proj and
Dept:
University of Pennsylvania, Database Research Group
10Physical Schema
Primary Index on PName I : Dict <string, T>
Secondary Index on CustName SI : Dict <string, Set < T >>
(where T is the type of tuples in Proj)
• Two indexes for relation Proj :
• A materialized view (a la “join index”) :
JI : Set< Struct{Doid JDoid; string JPN} >
JI = select struct (JDoid: d, JPN: p.PName)
from dom Dept d, Dept [d].DProjs s, Proj p
where s = p.PName
University of Pennsylvania, Database Research Group
11
Physical Schema Constraints
Describe semantic relationship between elements of the logical schema and elements of the physical schema.
(JI1) (d dom Dept) (s Dept [d].DProjs) (p Proj)
[ s = p. PName ( j JI) j .JDoid = d and j.JPN = p.Pname ]
(SI1) (p Proj) (k dom SI)
(t SI[k]) [ k = p.CustName and p
= t ]
One of the constraints for SI :
One of the constraints for JI :
University of Pennsylvania, Database Research Group
12
An Example of Interaction in Optimization
select struct (PN: s, PB: p.Budg, DN: Dept [d].DName)from dom Dept d, Dept [d].DProjs s , Proj pwhere s = p.PName and p.CustName = “CitiBank”
A user (logical) query
• The query is chased with all the applicable logical and physical constraints (eg., RIC1, SI1, JI1)
• The result of the chase is the universal plan.
University of Pennsylvania, Database Research Group
13
The Generalized Chase
select O(r1,…,rm)from R1 r1,…,Rm rm
where B1(r1,…,rm)
(r1R1) … (rmRm) [ B1(r1,…,rm) (s1S1) … (snSn) B2(r1,…,rm,s1,
…,sn) ]
Let d be the constraint:
select O(r1,…,rm) from R1 r1,…,Rm rm,
S1 s1,…,Sn sn
where B1(r1,…,rm) and B2(r1,…,rm,s1,…,sn)
d
distinct !
distinct !
University of Pennsylvania, Database Research Group
14
A Universal Plan
select struct (PN: s, DN: Dept [d].DName)
from dom Dept d , Dept [d].DProjs s ,
Proj p, JI j, dom SI k, SI [k] t, dom I i
where s = p.PName and p.CustName = “CitiBank”
and Dept[d].DName = p.PDept and j.DOID = d and j.PN =
p.PName
and p.CustName = k and p = t and i = p.PName and p =
I[i]
U:
• U gathers those elements from both logical and physical schema, that are relevant for alternative implementations
• U is redundant
Added by chase
University of Pennsylvania, Database Research Group
15
(Top-Down) Backchase Minimization
• Eliminates the redundancies from the universal plan
• Enumerates subqueries of UP:
eliminates scans top-down, as long as equivalence is
preserved;
equivalence is verified by a (reverse) chase with
applicable constraints (eg., INV2)
• Outputs several scan-minimal subqueries
University of Pennsylvania, Database Research Group
16Chase & (Top-Down) Backchase
Q0Original Query
U
Chase
dn
d1
Universal Plan
...
d1
dn
More minimization possible ...
. . .
...
. . .Minimal subqueries of U
QnQ1
BackChased’m
d’1
...
Cost-based optimization still needed
University of Pennsylvania, Database Research Group
17
Some of the Generated Candidate Plans
select struct (PN: p.PName, PB: p.Budg, DN: p.PDept)
from SI[“Citibank”] p
select struct (PN: j.JPN, PB: I[j.JPN].Budg, DN: Dept[j.JDoid].DName)from JI j
where I[j.JPN].CustName = “CitiBank”
Plan 3: Using the Join Index and the Primary Index
Plan 2: Secondary Index Lookup
select struct (PN: p.PName, PB: p.Budg, DN: p.PDept)from Proj pwhere p.CustName = “CitiBank”
Plan 1: Relation Scan
Different access paths, but equivalent
University of Pennsylvania, Database Research Group
18
Talk Outline
• Motivation and Overview
• Constraints, Chase and Backchase
• Theoretical Results
• Using Cost
• Experimental Results
• Conclusion
University of Pennsylvania, Database Research Group
19
Path-Conjunctive Language
Paths: P ::= x | c | R | P.A | dom P | P[x]
Path-Conjunctions: B ::= P1 = P1’ and ... and Pk = Pk’
Path-Conjunctive (PC) Queries: select struct (A1: P1’, ..., An: Pn’) from P1 x1, ..., Pm xm where B
Embedded PC Dependencies (EPCDs): (r1P1) … (rmPm) [ B1(r1,…,rm) (s1P’1) … (snP’n) B2(r1,…,rm,s1 ,…, sn) ]
No set/dictionary type here
Still very expressive!
University of Pennsylvania, Database Research Group
20
Overview of PC Query Containment Results
Chase Complexity
PC query containment/ EPCD t riviality
NP-complete
PC query containmentunder f ull EPCDs
Decisionprocedure EXP-complete
PC query containmentunder EPCDs
Complete proofprocedure
r .e.
• The chase is also complete for EPCD implication
• Strengthen results of Chandra & Merlin, Beeri & Vardi (70’s & 80’s)
University of Pennsylvania, Database Research Group
21
Completeness of C&B
Assume the following:
Logical constraints: D a set of EPCDs
Physical schema:
• primary indexes
• materialized views that are PC queries (includes join indexes and access suport relations)
• access structures representable as dictionary expressions with PC query domain and entry (includes secondary indexes and gmaps [Tsatalos et al] )
Physical constraints: C (eg., SI1, JI1)
C is a set of EPCDs. (In general the constraints in C are not full.)
University of Pennsylvania, Database Research Group
22
Completeness of C&B (continued)
Theorem (Completeness)
Let Q be a PC query such that some chasing sequence of Q with D terminates (with chase D
(Q) ). Then:
(a) chase C (chase D (Q) ) terminates
(b) any scan-minimal candidate plan equivalent to Q under D is a subquery of chase C (chase D (Q) )
Strengthens results of Levy at al, new search space
University of Pennsylvania, Database Research Group
23
Talk Outline
• Motivation and Overview
• Constraints, Chase and Backchase
• Theoretical Results
• Interaction with Cost-Based Optimization
• Experimental Results
• Conclusion
University of Pennsylvania, Database Research Group
24
Using Cost
Want to– reduce the size of the backchase search space– output a scan-minimal subquery that is also cost-
minimal
Cost monotonicity assumption (made implicitly earlier): subqueries are cheaper
Hence the top-down backchase cannot exploit cost!
Instead, a bottom-up backchase enumerates queriesbuilt from the scans of the universal plan, checking
for equivalence with the universal plan and cost minimality
and pruning all superqueries
University of Pennsylvania, Database Research Group
25Bottom-Up Backchase with Cost-Based Pruning
Q0
Original Query
U
Chase
dn
d1
Universal Plan
...
S1, S2, S3, S4
Scans of U:
S1
S2
S4
S3
subqueries of size 1
subqueries of size 2
S3, S4
...S1, S2
S1, S3
S1, S4
Equivalent to U ?
If YES then minimal rewriting.
Becomes best plan so far.
Prune all superqueries
Cost higher than the min cost so
far ?
If YES then prune it together with all
superqueries
S1, S2, S3
S1, S3, S4
S1, S2, S4
...
University of Pennsylvania, Database Research Group
26
Interaction with a Cost-Based Component
Cost-basedoptimization
Logical schema
=
relations + OO classes
Physical schema
= views + indexes
Rewriting with logical and
physical constraints (C&B)
Cost-based pruning
Query
Best physical plan
Candidate plan q
Physical plan and cost for q
Cost information
University of Pennsylvania, Database Research Group
27
Cost-Based Optimization of a Candidate Plan(the usual suspects)
• Scan order (“join reordering”)
generalized dynamic programming technique– beyond relational– global”, i.e., interacting with backchase pruning
• Placement of selections and projections– because the chase works with queries in “normalized”
form
• Join methods: nested loops, index join , others
University of Pennsylvania, Database Research Group
28
Talk Outline
• Motivation and Overview
• Chase and Backchase (as in proposal): Example of candidate plans enumeration
• Chase and Backchase (as in proposal): Completeness of path-conjunctive (PC) case
• Cost-based C&B
• Experimental results
• Future Research and Conclusion
University of Pennsylvania, Database Research Group
29
Is the C&B Technique Practical ?
• Is the chase feasible ? – tries exponentially many mappings for each step
• Is the backchase feasible ? – explores exponentially many subqueries of UP– what is the effect of cost-based pruning ?
• Is it worthwhile ? – does the quality of the generated plans outweigh
the cost of optimization ?
University of Pennsylvania, Database Research Group
30 Experiments
• We have developed a prototype in Java
• Several experimental configurations that: • cover important practical cases• are scalable
• Not shown here:
– Exper. Config. 1 : relational queries and indexes.
– Exper. Config. 3 : OO techniques
interaction between semantic optimization and path
indexes
University of Pennsylvania, Database Research Group
31Experimental Configuration 2 : Relational Views
• Input query: chain of stars
• Schemas: views and key constraints
– In addition: indexes on the corner relations
• Scale-up parameters:– # stars, size of stars, # views/star, #indexes/star
R1 R2
S11
S12
S13
S21
S22
S23
V11
V12
V21
V22
key constraints
Interaction between semantic optimization and views • No rewriting with views in the absence of key constraints
University of Pennsylvania, Database Research Group
32
Time to Chase
Time to chase [EC2]
0
0.5
1
1.5
2
10 15 20 25
Size of the query
Tim
e in
sec
on
ds
9 view s + 3 key = 21 constraints
6 view s + 3 key = 15 constraints
• Chasing alone is fast !
• For queries with more than 15 joins and more than 15 constraints it takes seconds
University of Pennsylvania, Database Research Group
33
Stratification
• For UP of size 12-15 joins, FB (top-down or bottom-up) may become impractical
• OQF (On-line Query Fragmentation): UP can be decomposed, prior to backchase, into smaller univ. plans.
Querychase
UP
backchase, no cost
Plans (minimal
subqueries)
R1 R2
S11
S12
S13
S21
S22
S23
V11
V12
V21
V22
Full backchase (FB):
Complexity: 2k1 + … + 2km << 2k1+…+km
University of Pennsylvania, Database Research Group
34 Backchase Strategies
• We compare several C&B optimizers obtained by
combining in various ways: – top-down/bottom-up full backchase
– cost-based pruning
– stratification
– TopDownFB (top-down full, no cost pruning)
– BottomUpFB (bottom-up full, no cost pruning)
– BottomUpFB+Prune (bottom-up full, cost pruning)
– OQF (OQF fragmentation, BottomUpFB within each fragment)
– OQF+Prune (OQF, BottomUpFB+Prune within each fragment)
– DP (dynamic programming, i.e. no C&B) • also used for cost-evaluation in all the other strategies
University of Pennsylvania, Database Research Group
35One Star Query (No Stratification Applicable)
One star query (s ize = 6)
0
10
20
30
40
50
60
70
80
90
100
110
1 2 3 4
Num ber of View s
To
tal O
pti
miz
atio
n T
ime
[s]
TopDow n FB
BottomUp FB
BottomUp FB + Prune
Dynamic-Programming
One star query (s ize = 7)
0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
1 2 3 4 5
Num ber of View s
To
tal O
pti
miz
atio
n T
ime
[s]
TopDow n FB
BottomUp FB
BottomUp FB + Prune
Dynamic-Programming
• BottomUpFB+Prune outperforms its full counterparts that do not use cost pruning:– total optimization time, for query size 6, and 4 views:
• BottomUpFB+Prune : 6.4s • TopDownFB : 108.1 s• BottomUpFB : 82.2s
University of Pennsylvania, Database Research Group
36
Adding IndexesQuery size = 6, #Views = 2
0
10
20
30
40
50
60
70
80
90
100
[6,2,0] [6,2,1] [6,2,2] [6,2,3] [6,2,4] [6,2,5]
(query size, #views, #idx)
TopDow n FB
BottomUpFB
BottomUp FB +Prune
Query size = 6, #Views = 4
0
10
20
30
40
50
60
70
80
90
100
[6,4,0] [6,4,1] [6,4,2] [6,4,3] [6,4,4]
(query size, #views, #idx)
BottomUp FB + Prune
• Total optimization time (query size=6, views=2, indexes=3):• BottomUpFB+Prune : 10.7s • TopDownFB : 68.3 s• BottomUpFB : 63.6s
• BottomUpFB+Prune can go to larger configurations: – (query size=6, views=4, indexes=4): 53.2s
University of Pennsylvania, Database Research Group
37
Chain of Two Stars (OQF Stratification is Applicable)
1
10
100
1000
[6,2] [8,2] [8,4] [10,2] [10,4]
[query size, number of views]
To
tal
Op
tim
iza
tio
n T
ime
[s
]
Bottom Up + Prune
OQF
pruneOQF
Dynam ic-Program m ing
TopDownFB
Bottom UpFB
• BottomUpFB+Prune is similar to OQF: 72s for query size 10 and 4 views
• OQF+Prune scales best: only 8.6s (faster than DP !) – with dictionaries (indexes), it may miss good plans
• DP becomes expensive at large queries: 14.6s for query size 10– DP’s performance directly affects BottomUpFB+Prune
University of Pennsylvania, Database Research Group
38
Is C&B Worthwhile ?
• Configuration: chain of two stars
• Execution time measured with DB2 on a medium database (15,000 tuples) – unoptimized query vs. C&B optimized query (produced with BottomUpFB+Prune)
Querysize
Exec Time Views OptimizedExec Time
OptimizationTime
Total ProcesingTime (C&B)
6 20min 40s 2 2min 16s 1.8s 2min 18s
2 5min 44s 7.3s 5min 51s 8 24min 30s
4 19s 20.2s 39.2s
2 7min 52s 30s 8min 22s 10 28min 10s
4 1min 34s 1min 12s 2min 46s
University of Pennsylvania, Database Research Group
39
Summary
• C&B integrates, flexibly, many aspects of logical and physical optimization.
• Good theoretical foundations based on constraints and chase
• The technique is practical (feasible + worthwhile)– cost-based pruning is good !– stratification is good !– even better when we can combine them
University of Pennsylvania, Database Research Group
40
Future Work
• Better stratification techniques:– complete for arbitrary constraints – complete, when combined with cost-based
pruning
• Other query languages:– union / disjunction – grouping / aggregates– bag and list semantics
University of Pennsylvania, Database Research Group
41
University of Pennsylvania, Database Research Group
42
Tableaux Chase
A B C D
RR
w x y z’w’ x y’ z
w x y z
d
A B C D
RRR
w x y z’w’ x y’ zw x y’’ z’’
w x y z
d
A B C D
RR
a b c da’ b c’ d’
a b c’’ d’’
(r1R)(r2R) [ r1.B = r2.B
(r3R) r3.A = r1.A and r3.B =
r1.B ]
Think conjunctive query syntax ...
• Used to check equivalence of tableaux (conjunctive queries) under dependencies.
• A chase step:
University of Pennsylvania, Database Research Group
43
Interaction of Indexes with Views
• (Levy et al ’95) For conjunctive queries and views:
– finitely many minimal equivalent rewritings
• However, the optimal plan may not be among them!
– Scenario:
• relations R(A,B), S(B, C) and view V = A (R S)
• input query Q = R S
• P = V R S is equivalent, but not minimal thrown away
• but, if V is small and R has an index on A, then P can be better than Q !
No interaction captured: indexes not in the language
University of Pennsylvania, Database Research Group
44C&B Captures Interaction of Indexes with
Views• The index IR on A is explicit in our framework.
• Chase Q with constraints describing V and IR to obtain U:
U = select struct (A=r.A, B=r.B, C=s.C) from R r, S s, V v, dom IR k
where r.B = s.B and v.A = r.A and k = r.A and IR
[k] = r
select struct (A=r.A, B=r.B, C=s.C)from R r, S s, V vwhere r.B = s.B and v.A = r.A
Elim k
select struct (A=r.A, B=r.B, C=s.C)from R r, S swhere r.B = s.B
Elim v
P1
select struct (A=v.A, B=s.B,
C=s.C)from V v, S s
where IR [v.A].B = s.BP2
Interaction! More minimal rewritings, incorporating indexes
University of Pennsylvania, Database Research Group
45
Path-Conjunctive Language (cont’d)
The following query plan:
select struct (PN: p.PName, PB: p.Budg, DN: p.PDept)
from SI[“Citibank”] p
is not PC. However, the following is PC:
select struct (PN: p.PName, PB: p.Budg, DN: p.PDept)
from dom SI k, SI[k] pwhere p = “CitiBank”
In general, PC cannot express navigation OO queries, index-based joins or index-based selections.
We will rediscover them when we translate PC queries into physical plans.
University of Pennsylvania, Database Research Group
46
PC Containment
Theorem (Containment). The PC containment problem, Q1
Q2 (under all instances), is in NP (and equivalent to the
existence of a PC-containment mapping).
Theorem (Containment under EPCDs). Let:
• D a set of EPCDs
• Q1, Q2 PC queries s.t. some chasing sequence of Q1 with D terminates (with chase D (Q1) ).
The following are equivalent : (a) Q1 D Q2 (b) chase D (Q1) Q2
University of Pennsylvania, Database Research Group
47
OQF (On-line Query Fragmentation)
INPUT: Query Q, Constraints C
F1
OQF
Fm
C&B C&B C&B
...
All plans
OUTPUT: Q = …
C1 C2 Cm … C =
partial plans P1 partial plans Pm
...
F2
University of Pennsylvania, Database Research Group
48
OQF + Prune
INPUT: Query Q, Constraints C
F1
OQF
Fm
BottomUpFB+ Prune
...
Plan P
OUTPUT: Q = …
C1 C2 Cm … C =
partial plan P1 partial plan Pm
...
F2
BottomUpFB+ Prune
BottomUpFB+ Prune