sigmod 2020 tutorial optimal join algorithms meet top-π
TRANSCRIPT
1
Optimal Join Algorithms meet Top-πSIGMOD 2020 tutorial
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
Northeastern University, Boston
Slides: https://northeastern-datalab.github.io/topk-join-tutorial/DOI: https://doi.org/10.1145/3318464.3383132Data Lab: https://db.khoury.northeastern.edu
Ranked results
Time
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License.See https://creativecommons.org/licenses/by-nc-sa/4.0/ for details
Part 2 : Optimal Join Algorithms
2
Outline tutorial
β’ Part 1: Top-π (Wolfgang): ~20minβ’ Part 2: Optimal Join Algorithms (Mirek): ~30min
β Lower Bound and the Yannakakis Algorithm
β Problems Caused by Cycles
β Tree Decompositions
β Summary and Further Reading
β’ Part 3: Ranked enumeration over joins (Nikolaos): ~40min
3
Basic Terminology and Assumptions
β’ Terminology
- Full conjunctive query (CQ)
β’ Natural join of π relations with O(π) tuples each
β’ E.g.: π π΄1, π΄2, π΄3, π΄4 = π 1 π΄1, π΄2 β π 2 π΄1, π΄2, π΄3 β π 3 π΄2 β π 4 π΄1, π΄2, π΄4β’ Any selections comparing attributes to constants, e.g., π΄4 < 1
- Query size: O(π)
- Output cardinality: π
β’ Assumptions
- No pre-computed data structures such as indexes, sorted representation, materialized views
4
Complexity Notation
β’ Standard O and Ξ© notation for time and memory complexity in the RAM model of computation
β’ Common practice: focus on data complexity
- We care about scalability in data size
β’ Treat query size π as a constant
- E.g., O π π β ππ(π) + logπ π(π) β π simplifies to O ππ(π) + logπ π(π) β π
5
Complexity Notation
β’ Standard O and Ξ© notation for time and memory complexity in the RAM model of computation
β’ Common practice: focus on data complexity
- We care about scalability in data size
β’ Treat query size π as a constant
- E.g., O π π β ππ(π) + logπ π(π) β π simplifies to O ππ(π) + logπ π(π) β π
β’ We mostly use ΰ·©O-notation (soft-O) data complexity
- Abstracts away polylog factors in input size that clutter formulas
- E.g., O ππ(π) + logπ π(π) β π further simplifies to ΰ·©O ππ(π) + π
6
Outline tutorial
β’ Part 1: Top-π (Wolfgang): ~20minβ’ Part 2: Optimal Join Algorithms (Mirek): ~30min
β Lower Bound and the Yannakakis Algorithm
β Problems Caused by Cycles
β Tree Decompositions
β Summary and Further Reading
β’ Part 3: Ranked enumeration over joins (Nikolaos): ~40min
7
Lower Bound for Any Query
β’ Need to read entire input at least once: Ξ©(ππ)
- Ξ©(π) data complexity
β’ Need to output every result, each of size π: Ξ©(ππ)
- Ξ©(π) data complexity
β’ Together: Ξ©(π + π) time complexity to compute any CQ
β’ Amazingly, the Yannakakis algorithm essentially matches the lower bound for acyclic CQs
- Time complexity ΰ·©O π + π
8
Yannakakis Algorithm
β’ Given: acyclic conjunctive query Q as a rooted join tree
β’ Step 1: semi-join reduction (two sweeps)
- Semi-join reduction sweep from the leaves to root
- Semi-join reduction sweep from root to the leaves
β’ Step 2: use the join tree as the query plan
- Compute the joins bottom up, with early projections
[Mihalis Yannakakis. Algorithms for acyclic database schemes. VLDBβ81] https://dl.acm.org/doi/10.5555/1286831.1286840
9
Database TheoryYannakakis Algorithm β Example π¨π π¨π
1 201 104 60
π 1
π¨π π¨π π¨π
1 10 1001 20 1003 10 3001 40 3002 30 200
π 2
π¨π
102030
π 3 π¨π π¨π π¨π
1 10 10001 20 10001 20 20002 20 2000
π 4
π΄2 π΄1, π΄2
π΄1, π΄2
π = π 1 π΄1, π΄2 β π 2 π΄1, π΄2, π΄3 β π 3 π΄2 β π 4 π΄1, π΄2, π΄4
10
Database TheoryYannakakis Algorithm β Example
1. Bottom-up traversal (semi-joins)
π¨π π¨π
1 201 104 60
π 1
π¨π π¨π π¨π
1 10 1001 20 1003 10 3001 40 3002 30 200
π 2
π¨π
102030
π 3
π΄2
π΄1, π΄2
π = π 1 π΄1, π΄2 β π 2 π΄1, π΄2, π΄3 β π 3 π΄2 β π 4 π΄1, π΄2, π΄4
πΉπ β πΉπ
π¨π π¨π π¨π
1 10 10001 20 10001 20 20002 20 2000
π 4
π΄1, π΄2
11
Database TheoryYannakakis Algorithm β Example
1. Bottom-up traversal (semi-joins)
π¨π π¨π
1 201 104 60
π 1
π¨π π¨π π¨π
1 10 1001 20 1003 10 3001 40 3002 30 200
π 2
π¨π
102030
π 3 π¨π π¨π π¨π
1 10 10001 20 10001 20 20002 20 2000
π 4
π΄2
π΄1, π΄2
π = π 1 π΄1, π΄2 β π 2 π΄1, π΄2, π΄3 β π 3 π΄2 β π 4 π΄1, π΄2, π΄4
πΉπ β πΉπ
π¨π π¨π
1 101 202 20
12
Database TheoryYannakakis Algorithm β Example
1. Bottom-up traversal (semi-joins)
π¨π π¨π
1 201 104 60
π 1
π¨π π¨π π¨π
1 10 1001 20 1003 10 3001 40 3002 30 200
π 2
π¨π
102030
π 3 π¨π π¨π π¨π
1 10 10001 20 10001 20 20002 20 2000
π 4
π΄2
π΄1, π΄2
π = π 1 π΄1, π΄2 β π 2 π΄1, π΄2, π΄3 β π 3 π΄2 β π 4 π΄1, π΄2, π΄4
πΉπ β πΉπ
π¨π π¨π
1 101 202 20
13
Database TheoryYannakakis Algorithm β Example
1. Bottom-up traversal (semi-joins)
π¨π π¨π
1 201 104 60
π 1
π¨π π¨π π¨π
1 10 1001 20 1003 10 3001 40 3002 30 200
π 2
π¨π
102030
π 3 π¨π π¨π π¨π
1 10 10001 20 10001 20 20002 20 2000
π 4
π΄1, π΄2
π = π 1 π΄1, π΄2 β π 2 π΄1, π΄2, π΄3 β π 3 π΄2 β π 4 π΄1, π΄2, π΄4
πΉπ β πΉπ
π¨π π¨π
1 101 202 20
πΉπ β πΉπ
π¨π
102030
14
Database TheoryYannakakis Algorithm β Example
1. Bottom-up traversal (semi-joins)
π¨π π¨π
1 201 104 60
π 1
π¨π π¨π π¨π
1 10 1001 20 1003 10 3001 40 3002 30 200
π 2
π¨π
102030
π 3 π¨π π¨π π¨π
1 10 10001 20 10001 20 20002 20 2000
π 4
π = π 1 π΄1, π΄2 β π 2 π΄1, π΄2, π΄3 β π 3 π΄2 β π 4 π΄1, π΄2, π΄4
πΉπ β πΉπ
π¨π π¨π
1 101 202 20
πΉπ β πΉπ
π¨π
102030
πΉπ β πΉπ
π¨π π¨π
1 101 20
15
Database TheoryYannakakis Algorithm β Example
1. Bottom-up traversal (semi-joins)
2. Top-down traversal (semi-joins)
π¨π π¨π
1 201 104 60
π 1
π¨π π¨π π¨π
1 10 1001 20 1003 10 3001 40 3002 30 200
π 2
π¨π
102030
π 3 π¨π π¨π π¨π
1 10 10001 20 10001 20 20002 20 2000
π 4
π = π 1 π΄1, π΄2 β π 2 π΄1, π΄2, π΄3 β π 3 π΄2 β π 4 π΄1, π΄2, π΄4
πΉπ β πΉππΉπ β πΉπ
πΉπ β πΉπ
16
Database TheoryYannakakis Algorithm β Example π¨π π¨π
1 201 10
π 1
π¨π π¨π π¨π
1 10 1001 20 100
π 2
π¨π
1020
π 3 π¨π π¨π π¨π
1 10 10001 20 10001 20 2000
π 4
π = π 1 π΄1, π΄2 β π 2 π΄1, π΄2, π΄3 β π 3 π΄2 β π 4 π΄1, π΄2, π΄4
1. Bottom-up traversal (semi-joins)
2. Top-down traversal (semi-joins)
3. Join bottom-up
17
Database TheoryYannakakis Algorithm β Example π¨π π¨π
1 201 10
π 1
π¨π π¨π π¨π
1 10 1001 20 100
π 2
π¨π
1020
π 3 π¨π π¨π π¨π
1 10 10001 20 10001 20 2000
π 4
π = π 1 π΄1, π΄2 β π 2 π΄1, π΄2, π΄3 β π 3 π΄2 β π 4 π΄1, π΄2, π΄4
1. Bottom-up traversal (semi-joins)
2. Top-down traversal (semi-joins)
3. Join bottom-up
In each join step, each left input tuple joins with at least 1 right input tuple, and vice versa!
18
Yannakakis Algorithm β Properties
β’ Semi-join sweeps take ΰ·©O π
β’ A join step can never shrink intermediate result size
- This does not hold for all trees
- Tree must be attribute-connected (more on this soon)
β’ Hence all intermediate results are of size O π
β’ Each join step therefore has O π + π input and O π output
β’ Easy to compute a binary join with O π + π input and O π output in time ΰ·©O π + π , e.g., using sort-merge join
19
Outline tutorial
β’ Part 1: Top-π (Wolfgang): ~20minβ’ Part 2: Optimal Join Algorithms (Mirek): ~30min
β Lower Bound and the Yannakakis Algorithm
β Problems Caused by Cycles
β Tree Decompositions
β Summary and Further Reading
β’ Part 3: Ranked enumeration over joins (Nikolaos): ~40min
20
CQs with Cycles
β’ 3-path: π3π = π 1(π΄1, π΄2) β π 2(π΄2, π΄3) β π 3(π΄3, π΄4)
β’ 3-cycle: π3π = π 1(π΄1, π΄2) β π 2(π΄2, π΄3) β π 3(π΄3, π΄1)
π3π π3π
π΄1π΄2
π΄2 π΄3
π΄3π΄4
π΄1 π΄2
π΄3
π΄2
π΄3
π΄1
21
CQs with Cycles
β’ 3-path: π3π = π 1(π΄1, π΄2) β π 2(π΄2, π΄3) β π 3(π΄3, π΄4)
β’ 3-cycle: π3π = π 1(π΄1, π΄2) β π 2(π΄2, π΄3) β π 3(π΄3, π΄1)
β’ Already semi-join-reduced input
π¨π π¨π
1 1
2 1
β¦ β¦
n 1
π¨π π¨π
1 1
1 2
β¦ β¦
1 n
π¨π *
1 1
2 2
β¦ β¦
n n
π 1
π 1
π 2
Join tree
π 3
π 2 π 3
22
CQs with Cycles
β’ 3-path: π3π = π 1(π΄1, π΄2) β π 2(π΄2, π΄3) β π 3(π΄3, π΄4)
β’ 3-cycle: π3π = π 1(π΄1, π΄2) β π 2(π΄2, π΄3) β π 3(π΄3, π΄1)
β’ Already semi-join reduced in the example
β’ π 1 β π 2 produces π2 intermediate results
- Final output size: π2 for π3π, but only π for π3π
π¨π π¨π
1 1
2 1
β¦ β¦
n 1
π¨π π¨π
1 1
1 2
β¦ β¦
1 n
π¨π *
1 1
2 2
β¦ β¦
n n
π 1
π 1
π 2
Join tree
π 3
π 2 π 3
23
What Went Wrong?
β’ The tree for the 3-cycle is not attribute-connected!
β’ Attribute-connectedness:
- For each attribute, the nodes containing it form a connected sub-graph
β’ In the right tree, π΄1 violates this property
π 1(π΄1, π΄2)
π 2(π΄2, π΄3)
π 3(π΄3, π΄4)π3π π3π
π 1(π΄1, π΄2)
π 2(π΄2, π΄3)
π 3(π΄3, π΄1)
24
Solutions for Cycles? Some Bad News
β’ Maybe we just need an algorithm that is better suited for cyclic CQs?
β’ Yes, butβ¦
β’ β¦ [Ngo+ 18]:
- ΰ·©O π + π unattainable based on well-accepted complexity-theoretic assumptions
[Ngo+ 18] Ngo, Porat, RΓ©, Rudra. Worst-case optimal join algorithms. J. ACMβ18 https://doi.org/10.1145/3180143
25
What Can Be Done?
β’ Worst-case-optimal (WCO) join algorithms[Veldhuizen 14, Ngo+ 14, Ngo+ 18]
β’ Instead of ΰ·©O π + π , getΰ·©O π + πWC = ΰ·©O πWC
β’ πWC = largest output of Q over any possible DB instance
- Determined by the AGM bound[4]
- Based on fractional edge cover of the join hypergraph
β’ 3-cycle: π1.5 vs naive upper bound π3
[Ngo+ 18] Ngo, Porat, RΓ©, Rudra. Worst-case optimal join algorithms. J. ACMβ18 https://doi.org/10.1145/3180143
[Veldhuizen 14] Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. ICDTβ14 https://doi.org/10.5441/002/icdt.2014.13[Ngo+ 14] Ngo, Re, Rudra. Skew strikes back: New developments in the theory of join algorithms. SIGMOD Rec.β14 https://doi.org/10.1145/2590989.2590991
[Atserias+ 13] Atserias, Grohe, Marx. Size bounds and query plans for relational joins. . SIAM J. Comput.β13 https://doi.org/10.1137/110859440
26
What Can Be Done?
β’ Worst-case-optimal (WCO) join algorithms[Veldhuizen 14, Ngo+ 14, Ngo+ 18]
β’ Instead of ΰ·©O π + π , getΰ·©O π + πWC = ΰ·©O πWC
β’ πWC = largest output of Q over any possible DB instance
- Determined by the AGM bound[4]
- Based on fractional edge cover of the join hypergraph
β’ 3-cycle: π1.5 vs naive upper bound π3
β’ Tree decompositions
β’ Put more effort into pre-processing to avoid large intermediate results
- Use WCO joins as sub-routine
β’ Goal: ΰ·©O ππ + π for smallest πpossible
- ΰ·©O ππ captures pre-processing cost
- π = 1 for acyclic CQ
[Ngo+ 18] Ngo, Porat, RΓ©, Rudra. Worst-case optimal join algorithms. J. ACMβ18 https://doi.org/10.1145/3180143
[Veldhuizen 14] Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. ICDTβ14 https://doi.org/10.5441/002/icdt.2014.13[Ngo+ 14] Ngo, Re, Rudra. Skew strikes back: New developments in the theory of join algorithms. SIGMOD Rec.β14 https://doi.org/10.1145/2590989.2590991
[Atserias+ 13] Atserias, Grohe, Marx. Size bounds and query plans for relational joins. . SIAM J. Comput.β13 https://doi.org/10.1137/110859440
27
Outline tutorial
β’ Part 1: Top-π (Wolfgang): ~20minβ’ Part 2: Optimal Join Algorithms (Mirek): ~30min
β Lower Bound and the Yannakakis Algorithm
β Problems Caused by Cycles
β Tree Decompositions
β Summary and Further Reading
β’ Part 3: Ranked enumeration over joins (Nikolaos): ~40min
28
Main Idea of Tree Decompositions
β’ Convert cyclic CQ to a rooted tree-shaped CQ
A1
A4
A3
A2A6
A5
R6 R1
R2
R3R4
R5
S = π 1 π΄1, π΄2 β π 2 π΄2, π΄3 β π 3 π΄3, π΄4
T = π 4 π΄4, π΄5 β π 5 π΄5, π΄6 β π 6(π΄6, π΄1)
decomposition
29
Main Idea of Tree Decompositions
β’ Convert cyclic CQ to a rooted tree-shaped CQ
β’ Materialize all tree nodes (βbagsβ) using a WCO join algorithm
A1
A4
A3
A2A6
A5
R6 R1
R2
R3R4
R5
S = π 1 π΄1, π΄2 β π 2 π΄2, π΄3 β π 3 π΄3, π΄4
T = π 4 π΄4, π΄5 β π 5 π΄5, π΄6 β π 6(π΄6, π΄1)
decomposition
S
T
30
Main Idea of Tree Decompositions
β’ Convert cyclic CQ to a rooted tree-shaped CQ
β’ Materialize all tree nodes (βbagsβ) using a WCO join algorithm
β’ Apply Yannakakis algorithm on the tree
- Acyclic CQ whose input relations are the bags
- Achieves ΰ·©O π₯ + π where π₯ is the size of the largest bag
A1
A4
A3
A2A6
A5
R6 R1
R2
R3R4
R5
S = π 1 π΄1, π΄2 β π 2 π΄2, π΄3 β π 3 π΄3, π΄4
T = π 4 π΄4, π΄5 β π 5 π΄5, π΄6 β π 6(π΄6, π΄1)
decomposition
S
T
Yannakakis
31
Tree Decomposition Intuition
π6π π΄1, β¦ , π΄6 = π 1 π΄1, π΄2 β π 2 π΄2, π΄3β π 3 π΄3, π΄4 β π 4 π΄4, π΄5β π 5(π΄5, π΄6) β π 6(π΄6, π΄1)
32
Tree Decomposition Intuition
π6π π΄1, β¦ , π΄6 = π 1 π΄1, π΄2 β π 2 π΄2, π΄3β π 3 π΄3, π΄4 β π 4 π΄4, π΄5β π 5(π΄5, π΄6) β π 6(π΄6, π΄1)
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
33
Tree Decomposition Intuition
π6π π΄1, β¦ , π΄6 = π 1 π΄1, π΄2 β π 2 π΄2, π΄3β π 3 π΄3, π΄4 β π 4 π΄4, π΄5β π 5(π΄5, π΄6) β π 6(π΄6, π΄1)
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
34
Tree Decomposition Intuition
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
π6π π΄1, β¦ , π΄6 = π 1 π΄1, π΄2 β π 2 π΄2, π΄3β π 3 π΄3, π΄4 β π 4 π΄4, π΄5β π 5(π΄5, π΄6) β π 6(π΄6, π΄1)
π 1 π΄1, π΄2 , π 2 π΄2, π΄3 , π 3 π΄3, π΄4π 4 π΄4, π΄5 , π 5 π΄5, π΄6 , π 6(π΄6, π΄1)
π―1
Bag materialization costs O(π3) (AGM bound)
35
Tree Decomposition Intuition
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
π6π π΄1, β¦ , π΄6 = π 1 π΄1, π΄2 β π 2 π΄2, π΄3β π 3 π΄3, π΄4 β π 4 π΄4, π΄5β π 5(π΄5, π΄6) β π 6(π΄6, π΄1)
π 1 π΄1, π΄2 , π 2 π΄2, π΄3 , π 3 π΄3, π΄4π 4 π΄4, π΄5 , π 5 π΄5, π΄6 , π 6(π΄6, π΄1)
π―1
Bag materialization costs O(π3) (AGM bound)
36
Tree Decomposition Intuition
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
π 1 π΄1, π΄2 , π 2 π΄2, π΄3 , π 3 π΄3, π΄4π―2
π 4 π΄4, π΄5 , π 5 π΄5, π΄6 , π 6(π΄6, π΄1)
Bag materialization costs O(π2) (AGM bound)
π6π π΄1, β¦ , π΄6 = π 1 π΄1, π΄2 β π 2 π΄2, π΄3β π 3 π΄3, π΄4 β π 4 π΄4, π΄5β π 5(π΄5, π΄6) β π 6(π΄6, π΄1)
37
Tree Decomposition Intuition
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
π 1 π΄1, π΄2 , π 2 π΄2, π΄3 , π 3 π΄3, π΄4π―2
π 4 π΄4, π΄5 , π 5 π΄5, π΄6 , π 6(π΄6, π΄1)
Bag materialization costs O(π2) (AGM bound)
π6π π΄1, β¦ , π΄6 = π 1 π΄1, π΄2 β π 2 π΄2, π΄3β π 3 π΄3, π΄4 β π 4 π΄4, π΄5β π 5(π΄5, π΄6) β π 6(π΄6, π΄1)
38
Tree Decomposition Intuition
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
π―3 π 1 π΄1, π΄2
π 2 π΄2, π΄3
π 3 π΄3, π΄4
π 4 π΄4, π΄5
π 5 π΄5, π΄6
π 6(π΄6, π΄1)
O(π) bag materializationβ¦?
π6π π΄1, β¦ , π΄6 = π 1 π΄1, π΄2 β π 2 π΄2, π΄3β π 3 π΄3, π΄4 β π 4 π΄4, π΄5β π 5(π΄5, π΄6) β π 6(π΄6, π΄1)
39
Tree Decomposition Intuition
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
π―3 π 1 π¨π, π΄2
π 2 π΄2, π΄3
π 3 π΄3, π΄4
π 4 π΄4, π΄5
π 5 π΄5, π΄6
π 6(π΄6, π¨π)
π6π π΄1, β¦ , π΄6 = π 1 π΄1, π΄2 β π 2 π΄2, π΄3β π 3 π΄3, π΄4 β π 4 π΄4, π΄5β π 5(π΄5, π΄6) β π 6(π΄6, π΄1)
40
Tree Decomposition Intuition
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
π―3 π 1 π΄1, π΄2
π 2 π΄2, π΄3 , π 1 π΄1, _
π 3 π΄3, π΄4 , π 1 π΄1, _
π 4 π΄4, π΄5 , π 1 π΄1, _
π 5 π΄5, π΄6 , π 1 π΄1, _
π 6(π΄6, π΄1)
O π β ππ΄1 π 1 bag materialization: still O(π2)
π6π π΄1, β¦ , π΄6 = π 1 π΄1, π΄2 β π 2 π΄2, π΄3β π 3 π΄3, π΄4 β π 4 π΄4, π΄5β π 5(π΄5, π΄6) β π 6(π΄6, π΄1)
41
Tree Decomposition: Formal Definition
β’ Given: hypergraph β = (π±, β°)
- π±: attributes
β’ E.g., π΄1, π΄2, π΄3, π΄4, π΄5, π΄6
- β°: relations
β’ E.g., π 3 is hyperedge (π΄3, π΄4)
β’ A tree decomposition of β is a pair π―, π where
- π― = π π― ,πΈ(π―) is a tree
- π:π π― β 2π± assigns a bag π(π£) to each tree node π£ such that
β’ Each hyperedge πΉ β β° is covered, i.e., βπΉ β β°: βπ£ β π π― : πΉ β π(π£)
β’ For each π’ β π±, the bags containing π’ are connected
A1
A4
A3
A2A6
A5
R6 R1
R2
R3R4
R5
π―3 π 1 π΄1, π΄2
π 2 π΄2, π΄3 , π 1 π΄1, _
π 3 π΄3, π΄4 , π 1 π΄1, _
π 4 π΄4, π΄5 , π 1 π΄1, _
π 5 π΄5, π΄6 , π 1 π΄1, _
π 6(π΄6, π΄1)
[Khamis, Ngo, Suciu. What do shannon-type inequalities, submodular width, and disjunctive datalog have to do with one another? PODSβ17] https://doi.org/10.1145/3034786.3056105
42
Tree-Decomposition Properties
β’ Query has multiple decompositionsβwhich is best?
43
Tree-Decomposition Properties
β’ Query has multiple decompositionsβwhich is best?
β’ Consider a tree with O(π) nodes, each materialized using WCO join
- Worst-case output of bag π is of size O(πππ) for some ππ β₯ 1 (AGM bound)
- Fractional hypertree width (fhw) π = maxπ
ππ [Grohe+ 14]
- Total bag-materialization cost: O(ππ)
- Size of a materialized bag: O(ππ)
- Resulting cost for Yannakakis algorithm on materialized tree: ΰ·©O(ππ + π)
[Grohe+ 14] Grohe and Marx. Constraint solving via fractional edge covers. ACM TALGβ14. https://doi.org/10.1145/2636918
44
Who Wins?π 1 π΄1, π΄2 , π 2 π΄2, π΄3 , π 3 π΄3, π΄4π 4 π΄4, π΄5 , π 5 π΄5, π΄6 , π 6(π΄6, π΄1)
π―1
π 1 π΄1, π΄2 , π 2 π΄2, π΄3 , π 3 π΄3, π΄4π―2
π 4 π΄4, π΄5 , π 5 π΄5, π΄6 , π 6(π΄6, π΄1)
π―3 π 1 π΄1, π΄2
π 2 π΄2, π΄3 , π 1 π΄1, _
π 3 π΄3, π΄4 , π 1 π΄1, _
π 4 π΄4, π΄5 , π 1 π΄1, _
π 5 π΄5, π΄6 , π 1 π΄1, _
π 6(π΄6, π΄1)
π£π > π£π = π£π
O(π2)
O(π2)
O(π3)
45
A Closer Look
β’ π―1 loses, because it does not decompose the query
46
A Closer Look
β’ π―1 loses, because it does not decompose the query
β’ Are π―2 and π―3 really equally good?
- In π―2, bag computation requires joining 3 relations
- In π―3, at most 2 relations are joined
β’ One of them is just the set of distinct π΄1-values in π 1
47
A Closer Look
β’ π―1 loses, because it does not decompose the query
β’ Are π―2 and π―3 really equally good?
- In π―2, bag computation requires joining 3 relations
- In π―3, at most 2 relations are joined
β’ One of them is just the set of distinct π΄1-values in π 1
β’ π―3 is better when the number of distinct π΄1-values in π 1 is low, e.g., π(π2/3) instead of π π
48
Who Wins?
π―3
π 1 π΄1, π΄2
π 2 π΄2, π΄3 , π 1 π΄1, _
π 3 π΄3, π΄4 , π 1 π΄1, _
π 4 π΄4, π΄5 , π 1 π΄1, _
π 5 π΄5, π΄6 , π 1 π΄1, _
π 6(π΄6, π΄1)
Degree constraint: ππ΄1 π 1 β€ π2/3
O(π)
O(π)
O(π5/3)
O(π5/3)
O(π5/3)
O(π5/3)
βThe number of distinct π΄1values in π 1 is at most π2/3β
49
Who Wins?
π 1 π΄1, π΄2 , π 2 π΄2, π΄3 , π 3 π΄3, π΄4
π―2
π 4 π΄4, π΄5 , π 5 π΄5, π΄6 , π 6(π΄6, π΄1)
π―3
π 1 π΄1, π΄2
π 2 π΄2, π΄3 , π 1 π΄1, _
π 3 π΄3, π΄4 , π 1 π΄1, _
π 4 π΄4, π΄5 , π 1 π΄1, _
π 5 π΄5, π΄6 , π 1 π΄1, _
π 6(π΄6, π΄1)
Degree constraint: ππ΄1 π 1 β€ π2/3
O(π)
O(π)
O(π5/3)
O(π5/3)
O(π5/3)
O(π5/3)
O(π2)
O(π2)
50
Could π―2 Win?
β’ Consider bag π 1 π΄1, π΄2 , π 2 π΄2, π΄3 , π 3 π΄3, π΄4 in π―2
β’ What if each π 1-tuple joins with only βa fewβ π 2-tuples?
β’ What if each π 2-tuple joins with only βa fewβ π 3-tuples?
β’ What if βa fewβ was π1/3?
51
Who Wins Now?
π 1 π΄1, π΄2 , π 2 π΄2, π΄3 , π 3 π΄3, π΄4
π―2
π 4 π΄4, π΄5 , π 5 π΄5, π΄6 , π 6(π΄6, π΄1)
Degree constraint: βπ β {2,3,5,6}:
βπ: ππ΄π+1ππ΄π=π π π β€ π1/3
O(π5/3)
O(π5/3)
52
Who Wins Now?
π 1 π΄1, π΄2 , π 2 π΄2, π΄3 , π 3 π΄3, π΄4
π―2
π 4 π΄4, π΄5 , π 5 π΄5, π΄6 , π 6(π΄6, π΄1)
π―3
π 1 π΄1, π΄2
π 2 π΄2, π΄3 , π 1 π΄1, _
π 3 π΄3, π΄4 , π 1 π΄1, _
π 4 π΄4, π΄5 , π 1 π΄1, _
π 5 π΄5, π΄6 , π 1 π΄1, _
π 6(π΄6, π΄1)
Degree constraint: βπ β {2,3,5,6}:
βπ: ππ΄π+1ππ΄π=π π π β€ π1/3
O(π)
O(π)
O(π2)
O(π2)
O(π2)
O(π2)
O(π5/3)
O(π5/3)
53
Best of Both Worlds
β’ Depending on the degree constraints that hold for a DB instance, we may sometimes prefer π―2 and sometimes π―3
β’ What if we used both? [Alon+ 97, Marx 13]
- Intuition: each decomposition is a different query βplanβ
β’ Query output = union of individual plansβ results
- Decide for each input tuple to which plan(s) to send it
- Main idea: split each input relation into heavy and light
β’ Goal: enforce desirable degree constraints for each tree
[Marx 13] Marx. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. J.ACMβ13 https://doi.org/10.1145/2535926
[Alon+ 97] Alon, Yuster, Zwick. Finding and counting given length cycles. Algorithmicaβ97 https://doi.org/10.1007/BF02523189
54
Multiple Plans: Plan 1
π―3
π 1π» π΄1, π΄2
π 2 π΄2, π΄3 , π 1π» π΄1, _
π 3 π΄3, π΄4 , π 1π» π΄1, _
π 4 π΄4, π΄5 , π 1π» π΄1, _
π 5 π΄5, π΄6 , π 1π» π΄1, _
π 6(π΄6, π΄1)
π 1π»: contains all tuples whose
π΄1-values occur more than
π1/3 times (fewer than π2/3
such π΄1-values exist)
55
Multiple Plans: Plan 1
π―3: computes π 1π» β π 2 β β― β π 6
π 1π» π΄1, π΄2
π 2 π΄2, π΄3 , π 1π» π΄1, _
π 3 π΄3, π΄4 , π 1π» π΄1, _
π 4 π΄4, π΄5 , π 1π» π΄1, _
π 5 π΄5, π΄6 , π 1π» π΄1, _
π 6(π΄6, π΄1)
Degree constraint: ππ΄1 π 1π» β€ π2/3
O(π)
O(π)
O(π5/3)
O(π5/3)
O(π5/3)
O(π5/3)
π 1π»: contains all tuples whose
π΄1-values occur more than
π1/3 times (fewer than π2/3
such π΄1-values exist)
56
More Plans
β’ Note that
- π6π = π 1 β π 2 β π 3 β π 4 β π 5 β π 6 together with
- π 1πΏ = π 1 β π 1
π»
β’ implies that π6π is the union of
- π 1π» β π 2 β π 3 β π 4 β π 5 β π 6 and
- π 1πΏ β π 2 β π 3 β π 4 β π 5 β π 6
β’ To compute the latter, apply the same trick to π 2
57
Multiple Plans: Plan 2
π―3: computes π 1πΏ β π 2
π» β π 3 β β― β π 6
π 2π» π΄2, π΄3
π 3 π΄3, π΄4 , π 2π» π΄2, _
π 4 π΄4, π΄5 , π 2π» π΄2, _
π 5 π΄5, π΄6 , π 2π» π΄2, _
π 6 π΄6, π΄1 , π 2π» π΄2, _
π 1πΏ(π΄1, π΄2)
Degree constraint: ππ΄2 π 2π» β€ π2/3
π 2π»: contains all tuples whose
π΄2-values occur more than π1/3 times (fewer than π2/3
such π΄2-values exist)
π 2πΏ = π 2 β π 2
π»
58
Plans 3 to 6
β’ Plans discussed so far
- π 1πΏ β π 2
πΏ β π 3π» β π 4 β π 5 β π 6
- π 1πΏ β π 2
πΏ β π 3πΏ β π 4
π» β π 5 β π 6
β’ Continue analogously to compute
- π 1πΏ β π 2
πΏ β π 3π» β π 4 β π 5 β π 6
- π 1πΏ β π 2
πΏ β π 3πΏ β π 4
π» β π 5 β π 6
- π 1πΏ β π 2
πΏ β π 3πΏ β π 4
πΏ β π 5π» β π 6
- π 1πΏ β π 2
πΏ β π 3πΏ β π 4
πΏ β π 5πΏ β π 6
π»
β’ What is missing?
59
The 7-th Plan
β’ Join all light-only partitions with each other:
- π 1πΏ β π 2
πΏ β π 3πΏ β π 4
πΏ β π 5πΏ β π 6
πΏ
β’ Input now satisfies the other degree constraint:
- βπ β {2,3,5,6}: βπ: ππ΄π+1ππ΄π=π π π β€ π1/3
β’ Use decomposition π―2 for it!
60
Analysis and Discussion
β’ Rewrite 6-cycle into 7 sub-queries
- Six of them use π―3, copying the heavy attribute to intermediate bags
- One uses π―2 on the all-light case
β’ Analysis
- Assigning input tuples to subqueries: O(π)
- Bag materialization: O(π5/3)
- Bag size: O(π5/3)
β’ Running Yannakakis on each of the 7 trees takes ΰ·©O π5/3 + π
- Beats single-tree complexity ΰ·©O π2 + π and WCO-join complexity ΰ·©O π3
61
Tree Decompositions: The Big Picture
β’ WCO join algorithms applied to a query Q: complexity determined by the AGM bound for Q
β’ Decomposing Q into a tree creates bags that each may have a lower AGM bound value (fractional hypertree width)
β’ This reduces time complexity
- Materialize each bag using a WCO join algorithm
- Run Yannakakis algorithm on the tree with materialized bags as input
β’ Design goal: minimize width of tree, but ensure that each input relation is covered by a bag and the tree is attribute-connected!
62
Tree Decompositions: Degree Constraints
β’ Degree constraints generalize cardinality constraints and functional dependencies
β’ Enable improved complexity guarantees
β’ Application to general input (with only cardinality constraints):
- Partition input so that each partition satisfies stronger degree constraints
- Use appropriate tree for each partition
- Current frontier: submodular width
β’ Application to input DB with degree constraints:
- Degree-aware submodular width
[Khamis, Ngo, Suciu. What do shannon-type inequalities, submodular width, and disjunctive datalog have to do with one another? PODSβ17] https://doi.org/10.1145/3034786.3056105
[Joglekar, RΓ©. It's all a matter of degree. Theory of Computing Systemsβ18] https://doi.org/10.1007/s00224-017-9811-8
[Marx. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. J.ACMβ13] https://doi.org/10.1145/2535926
63
Outline tutorial
β’ Part 1: Top-π (Wolfgang): ~20minβ’ Part 2: Optimal Join Algorithms (Mirek): ~30min
β Lower Bound and the Yannakakis Algorithm
β Problems Caused by Cycles
β Tree Decompositions
β Summary and Further Reading
β’ Part 3: Ranked enumeration over joins (Nikolaos): ~40min
64
Optimal-Joins Summary
β’ On acyclic CQs, the Yannakakis algorithmβs complexity ΰ·©O π + πmatches the lower bound
- Simple join-at-a-time query plan after semi-join reduction sweeps
β’ Lower bound cannot be matched for general cyclic CQs
β’ WCO joins achieve ΰ·©O π + πWC
- βHolisticβ multi-way join approach
β’ Tree decomposition approaches achieve ΰ·©O ππ + π
- Best π is submodular width subw(Q) using multi-tree decomposition
- πsubw(π) = O(πWC), but usually better
β’ 6-cycle: πWC = π3, but multi-tree approach achieves ΰ·©O π5/3 + π
65
Further Reading
β’ Extensions of the query model
- Projections and aggregation
β’ Factorized DB
β’ Stronger notions of optimality
β’ β¦and many more (see our tutorial paper)
[Khamis, Ngo, Rudra. FAQ: questions asked frequently. PODSβ16] https://doi.org/10.1145/2902251.2902280
[Bakibayev, KoΔiskΓ½, Olteanu, ZΓ‘vodnΓ½. Aggregation and Ordering in Factorised Databases. PVLDBβ13] https://doi.org/10.14778/2556549.2556579
[Bakibayev, Olteanu, ZΓ‘vodnΓ½. FDB: A Query Engine for Factorised Relational Databases. PVLDBβ12] https://doi.org/10.14778/2350229.2350242
[Olteanu, Schleich. Factorized databases. SIGMOD Recordβ16] https://doi.org/10.1145/3003665.3003667
[Olteanu, ZΓ‘vodny. Size bounds for factorised representations of query results. TODSβ15] https://doi.org/10.1145/2656335
[Khamis, RΓ©, Rudra. Joins via Geometric Resolutions: Worst Case and Beyond. TODSβ16] https://doi.org/10.1145/2967101
[Ngo, Nguyen, Re, Rudra. Beyond worst-case analysis for joins with minesweeper. PODSβ14] https://doi.org/10.1145/2594538.2594547