sigmod 2020 tutorial optimal join algorithms meet top-π‘˜

65
1 Optimal Join Algorithms meet Top- SIGMOD 2020 tutorial Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald Northeastern University, Boston Slides: https://northeastern-datalab.github.io/topk-join-tutorial/ DOI: https://doi.org/10.1145/3318464.3383132 Data Lab: https://db.khoury.northeastern.edu Ranked results Time This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License. See https://creativecommons.org/licenses/by-nc-sa/4.0/ for details Part 2 : Optimal Join Algorithms

Upload: others

Post on 23-Dec-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

1

Optimal Join Algorithms meet Top-π‘˜SIGMOD 2020 tutorial

Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

Northeastern University, Boston

Slides: https://northeastern-datalab.github.io/topk-join-tutorial/DOI: https://doi.org/10.1145/3318464.3383132Data Lab: https://db.khoury.northeastern.edu

Ranked results

Time

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License.See https://creativecommons.org/licenses/by-nc-sa/4.0/ for details

Part 2 : Optimal Join Algorithms

Page 2: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

2

Outline tutorial

β€’ Part 1: Top-π‘˜ (Wolfgang): ~20minβ€’ Part 2: Optimal Join Algorithms (Mirek): ~30min

– Lower Bound and the Yannakakis Algorithm

– Problems Caused by Cycles

– Tree Decompositions

– Summary and Further Reading

β€’ Part 3: Ranked enumeration over joins (Nikolaos): ~40min

Page 3: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

3

Basic Terminology and Assumptions

β€’ Terminology

- Full conjunctive query (CQ)

β€’ Natural join of 𝑙 relations with O(𝑛) tuples each

β€’ E.g.: 𝑄 𝐴1, 𝐴2, 𝐴3, 𝐴4 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴1, 𝐴2, 𝐴3 β‹ˆ 𝑅3 𝐴2 β‹ˆ 𝑅4 𝐴1, 𝐴2, 𝐴4β€’ Any selections comparing attributes to constants, e.g., 𝐴4 < 1

- Query size: O(𝑙)

- Output cardinality: π‘Ÿ

β€’ Assumptions

- No pre-computed data structures such as indexes, sorted representation, materialized views

Page 4: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

4

Complexity Notation

β€’ Standard O and Ξ© notation for time and memory complexity in the RAM model of computation

β€’ Common practice: focus on data complexity

- We care about scalability in data size

β€’ Treat query size 𝑙 as a constant

- E.g., O 𝑓 𝑙 β‹… 𝑛𝑓(𝑙) + log𝑛 𝑓(𝑙) β‹… π‘Ÿ simplifies to O 𝑛𝑓(𝑙) + log𝑛 𝑓(𝑙) β‹… π‘Ÿ

Page 5: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

5

Complexity Notation

β€’ Standard O and Ξ© notation for time and memory complexity in the RAM model of computation

β€’ Common practice: focus on data complexity

- We care about scalability in data size

β€’ Treat query size 𝑙 as a constant

- E.g., O 𝑓 𝑙 β‹… 𝑛𝑓(𝑙) + log𝑛 𝑓(𝑙) β‹… π‘Ÿ simplifies to O 𝑛𝑓(𝑙) + log𝑛 𝑓(𝑙) β‹… π‘Ÿ

β€’ We mostly use ΰ·©O-notation (soft-O) data complexity

- Abstracts away polylog factors in input size that clutter formulas

- E.g., O 𝑛𝑓(𝑙) + log𝑛 𝑓(𝑙) β‹… π‘Ÿ further simplifies to ΰ·©O 𝑛𝑓(𝑙) + π‘Ÿ

Page 6: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

6

Outline tutorial

β€’ Part 1: Top-π‘˜ (Wolfgang): ~20minβ€’ Part 2: Optimal Join Algorithms (Mirek): ~30min

– Lower Bound and the Yannakakis Algorithm

– Problems Caused by Cycles

– Tree Decompositions

– Summary and Further Reading

β€’ Part 3: Ranked enumeration over joins (Nikolaos): ~40min

Page 7: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

7

Lower Bound for Any Query

β€’ Need to read entire input at least once: Ξ©(𝑙𝑛)

- Ξ©(𝑛) data complexity

β€’ Need to output every result, each of size 𝑙: Ξ©(π‘™π‘Ÿ)

- Ξ©(π‘Ÿ) data complexity

β€’ Together: Ξ©(𝑛 + π‘Ÿ) time complexity to compute any CQ

β€’ Amazingly, the Yannakakis algorithm essentially matches the lower bound for acyclic CQs

- Time complexity ΰ·©O 𝑛 + π‘Ÿ

Page 8: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

8

Yannakakis Algorithm

β€’ Given: acyclic conjunctive query Q as a rooted join tree

β€’ Step 1: semi-join reduction (two sweeps)

- Semi-join reduction sweep from the leaves to root

- Semi-join reduction sweep from root to the leaves

β€’ Step 2: use the join tree as the query plan

- Compute the joins bottom up, with early projections

[Mihalis Yannakakis. Algorithms for acyclic database schemes. VLDB’81] https://dl.acm.org/doi/10.5555/1286831.1286840

Page 9: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

9

Database TheoryYannakakis Algorithm – Example π‘¨πŸ π‘¨πŸ

1 201 104 60

𝑅1

π‘¨πŸ π‘¨πŸ π‘¨πŸ‘

1 10 1001 20 1003 10 3001 40 3002 30 200

𝑅2

π‘¨πŸ

102030

𝑅3 π‘¨πŸ π‘¨πŸ π‘¨πŸ’

1 10 10001 20 10001 20 20002 20 2000

𝑅4

𝐴2 𝐴1, 𝐴2

𝐴1, 𝐴2

𝑄 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴1, 𝐴2, 𝐴3 β‹ˆ 𝑅3 𝐴2 β‹ˆ 𝑅4 𝐴1, 𝐴2, 𝐴4

Page 10: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

10

Database TheoryYannakakis Algorithm – Example

1. Bottom-up traversal (semi-joins)

π‘¨πŸ π‘¨πŸ

1 201 104 60

𝑅1

π‘¨πŸ π‘¨πŸ π‘¨πŸ‘

1 10 1001 20 1003 10 3001 40 3002 30 200

𝑅2

π‘¨πŸ

102030

𝑅3

𝐴2

𝐴1, 𝐴2

𝑄 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴1, 𝐴2, 𝐴3 β‹ˆ 𝑅3 𝐴2 β‹ˆ 𝑅4 𝐴1, 𝐴2, 𝐴4

π‘ΉπŸ ⋉ π‘ΉπŸ’

π‘¨πŸ π‘¨πŸ π‘¨πŸ’

1 10 10001 20 10001 20 20002 20 2000

𝑅4

𝐴1, 𝐴2

Page 11: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

11

Database TheoryYannakakis Algorithm – Example

1. Bottom-up traversal (semi-joins)

π‘¨πŸ π‘¨πŸ

1 201 104 60

𝑅1

π‘¨πŸ π‘¨πŸ π‘¨πŸ‘

1 10 1001 20 1003 10 3001 40 3002 30 200

𝑅2

π‘¨πŸ

102030

𝑅3 π‘¨πŸ π‘¨πŸ π‘¨πŸ’

1 10 10001 20 10001 20 20002 20 2000

𝑅4

𝐴2

𝐴1, 𝐴2

𝑄 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴1, 𝐴2, 𝐴3 β‹ˆ 𝑅3 𝐴2 β‹ˆ 𝑅4 𝐴1, 𝐴2, 𝐴4

π‘ΉπŸ ⋉ π‘ΉπŸ’

π‘¨πŸ π‘¨πŸ

1 101 202 20

Page 12: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

12

Database TheoryYannakakis Algorithm – Example

1. Bottom-up traversal (semi-joins)

π‘¨πŸ π‘¨πŸ

1 201 104 60

𝑅1

π‘¨πŸ π‘¨πŸ π‘¨πŸ‘

1 10 1001 20 1003 10 3001 40 3002 30 200

𝑅2

π‘¨πŸ

102030

𝑅3 π‘¨πŸ π‘¨πŸ π‘¨πŸ’

1 10 10001 20 10001 20 20002 20 2000

𝑅4

𝐴2

𝐴1, 𝐴2

𝑄 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴1, 𝐴2, 𝐴3 β‹ˆ 𝑅3 𝐴2 β‹ˆ 𝑅4 𝐴1, 𝐴2, 𝐴4

π‘ΉπŸ ⋉ π‘ΉπŸ’

π‘¨πŸ π‘¨πŸ

1 101 202 20

Page 13: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

13

Database TheoryYannakakis Algorithm – Example

1. Bottom-up traversal (semi-joins)

π‘¨πŸ π‘¨πŸ

1 201 104 60

𝑅1

π‘¨πŸ π‘¨πŸ π‘¨πŸ‘

1 10 1001 20 1003 10 3001 40 3002 30 200

𝑅2

π‘¨πŸ

102030

𝑅3 π‘¨πŸ π‘¨πŸ π‘¨πŸ’

1 10 10001 20 10001 20 20002 20 2000

𝑅4

𝐴1, 𝐴2

𝑄 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴1, 𝐴2, 𝐴3 β‹ˆ 𝑅3 𝐴2 β‹ˆ 𝑅4 𝐴1, 𝐴2, 𝐴4

π‘ΉπŸ ⋉ π‘ΉπŸ’

π‘¨πŸ π‘¨πŸ

1 101 202 20

π‘ΉπŸ ⋉ π‘ΉπŸ‘

π‘¨πŸ

102030

Page 14: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

14

Database TheoryYannakakis Algorithm – Example

1. Bottom-up traversal (semi-joins)

π‘¨πŸ π‘¨πŸ

1 201 104 60

𝑅1

π‘¨πŸ π‘¨πŸ π‘¨πŸ‘

1 10 1001 20 1003 10 3001 40 3002 30 200

𝑅2

π‘¨πŸ

102030

𝑅3 π‘¨πŸ π‘¨πŸ π‘¨πŸ’

1 10 10001 20 10001 20 20002 20 2000

𝑅4

𝑄 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴1, 𝐴2, 𝐴3 β‹ˆ 𝑅3 𝐴2 β‹ˆ 𝑅4 𝐴1, 𝐴2, 𝐴4

π‘ΉπŸ ⋉ π‘ΉπŸ’

π‘¨πŸ π‘¨πŸ

1 101 202 20

π‘ΉπŸ ⋉ π‘ΉπŸ‘

π‘¨πŸ

102030

π‘ΉπŸ ⋉ π‘ΉπŸ

π‘¨πŸ π‘¨πŸ

1 101 20

Page 15: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

15

Database TheoryYannakakis Algorithm – Example

1. Bottom-up traversal (semi-joins)

2. Top-down traversal (semi-joins)

π‘¨πŸ π‘¨πŸ

1 201 104 60

𝑅1

π‘¨πŸ π‘¨πŸ π‘¨πŸ‘

1 10 1001 20 1003 10 3001 40 3002 30 200

𝑅2

π‘¨πŸ

102030

𝑅3 π‘¨πŸ π‘¨πŸ π‘¨πŸ’

1 10 10001 20 10001 20 20002 20 2000

𝑅4

𝑄 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴1, 𝐴2, 𝐴3 β‹ˆ 𝑅3 𝐴2 β‹ˆ 𝑅4 𝐴1, 𝐴2, 𝐴4

π‘ΉπŸ’ ⋉ π‘ΉπŸπ‘ΉπŸ‘ ⋉ π‘ΉπŸ

π‘ΉπŸ ⋉ π‘ΉπŸ

Page 16: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

16

Database TheoryYannakakis Algorithm – Example π‘¨πŸ π‘¨πŸ

1 201 10

𝑅1

π‘¨πŸ π‘¨πŸ π‘¨πŸ‘

1 10 1001 20 100

𝑅2

π‘¨πŸ

1020

𝑅3 π‘¨πŸ π‘¨πŸ π‘¨πŸ’

1 10 10001 20 10001 20 2000

𝑅4

𝑄 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴1, 𝐴2, 𝐴3 β‹ˆ 𝑅3 𝐴2 β‹ˆ 𝑅4 𝐴1, 𝐴2, 𝐴4

1. Bottom-up traversal (semi-joins)

2. Top-down traversal (semi-joins)

3. Join bottom-up

Page 17: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

17

Database TheoryYannakakis Algorithm – Example π‘¨πŸ π‘¨πŸ

1 201 10

𝑅1

π‘¨πŸ π‘¨πŸ π‘¨πŸ‘

1 10 1001 20 100

𝑅2

π‘¨πŸ

1020

𝑅3 π‘¨πŸ π‘¨πŸ π‘¨πŸ’

1 10 10001 20 10001 20 2000

𝑅4

𝑄 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴1, 𝐴2, 𝐴3 β‹ˆ 𝑅3 𝐴2 β‹ˆ 𝑅4 𝐴1, 𝐴2, 𝐴4

1. Bottom-up traversal (semi-joins)

2. Top-down traversal (semi-joins)

3. Join bottom-up

In each join step, each left input tuple joins with at least 1 right input tuple, and vice versa!

Page 18: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

18

Yannakakis Algorithm – Properties

β€’ Semi-join sweeps take ΰ·©O 𝑛

β€’ A join step can never shrink intermediate result size

- This does not hold for all trees

- Tree must be attribute-connected (more on this soon)

β€’ Hence all intermediate results are of size O π‘Ÿ

β€’ Each join step therefore has O 𝑛 + π‘Ÿ input and O π‘Ÿ output

β€’ Easy to compute a binary join with O 𝑛 + π‘Ÿ input and O π‘Ÿ output in time ΰ·©O 𝑛 + π‘Ÿ , e.g., using sort-merge join

Page 19: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

19

Outline tutorial

β€’ Part 1: Top-π‘˜ (Wolfgang): ~20minβ€’ Part 2: Optimal Join Algorithms (Mirek): ~30min

– Lower Bound and the Yannakakis Algorithm

– Problems Caused by Cycles

– Tree Decompositions

– Summary and Further Reading

β€’ Part 3: Ranked enumeration over joins (Nikolaos): ~40min

Page 20: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

20

CQs with Cycles

β€’ 3-path: 𝑄3𝑝 = 𝑅1(𝐴1, 𝐴2) β‹ˆ 𝑅2(𝐴2, 𝐴3) β‹ˆ 𝑅3(𝐴3, 𝐴4)

β€’ 3-cycle: 𝑄3𝑐 = 𝑅1(𝐴1, 𝐴2) β‹ˆ 𝑅2(𝐴2, 𝐴3) β‹ˆ 𝑅3(𝐴3, 𝐴1)

𝑄3𝑝 𝑄3𝑐

𝐴1𝐴2

𝐴2 𝐴3

𝐴3𝐴4

𝐴1 𝐴2

𝐴3

𝐴2

𝐴3

𝐴1

Page 21: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

21

CQs with Cycles

β€’ 3-path: 𝑄3𝑝 = 𝑅1(𝐴1, 𝐴2) β‹ˆ 𝑅2(𝐴2, 𝐴3) β‹ˆ 𝑅3(𝐴3, 𝐴4)

β€’ 3-cycle: 𝑄3𝑐 = 𝑅1(𝐴1, 𝐴2) β‹ˆ 𝑅2(𝐴2, 𝐴3) β‹ˆ 𝑅3(𝐴3, 𝐴1)

β€’ Already semi-join-reduced input

π‘¨πŸ π‘¨πŸ

1 1

2 1

… …

n 1

π‘¨πŸ π‘¨πŸ‘

1 1

1 2

… …

1 n

π‘¨πŸ‘ *

1 1

2 2

… …

n n

𝑅1

𝑅1

𝑅2

Join tree

𝑅3

𝑅2 𝑅3

Page 22: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

22

CQs with Cycles

β€’ 3-path: 𝑄3𝑝 = 𝑅1(𝐴1, 𝐴2) β‹ˆ 𝑅2(𝐴2, 𝐴3) β‹ˆ 𝑅3(𝐴3, 𝐴4)

β€’ 3-cycle: 𝑄3𝑐 = 𝑅1(𝐴1, 𝐴2) β‹ˆ 𝑅2(𝐴2, 𝐴3) β‹ˆ 𝑅3(𝐴3, 𝐴1)

β€’ Already semi-join reduced in the example

β€’ 𝑅1 β‹ˆ 𝑅2 produces 𝑛2 intermediate results

- Final output size: 𝑛2 for 𝑄3𝑝, but only 𝑛 for 𝑄3𝑐

π‘¨πŸ π‘¨πŸ

1 1

2 1

… …

n 1

π‘¨πŸ π‘¨πŸ‘

1 1

1 2

… …

1 n

π‘¨πŸ‘ *

1 1

2 2

… …

n n

𝑅1

𝑅1

𝑅2

Join tree

𝑅3

𝑅2 𝑅3

Page 23: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

23

What Went Wrong?

β€’ The tree for the 3-cycle is not attribute-connected!

β€’ Attribute-connectedness:

- For each attribute, the nodes containing it form a connected sub-graph

β€’ In the right tree, 𝐴1 violates this property

𝑅1(𝐴1, 𝐴2)

𝑅2(𝐴2, 𝐴3)

𝑅3(𝐴3, 𝐴4)𝑄3𝑝 𝑄3𝑐

𝑅1(𝐴1, 𝐴2)

𝑅2(𝐴2, 𝐴3)

𝑅3(𝐴3, 𝐴1)

Page 24: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

24

Solutions for Cycles? Some Bad News

β€’ Maybe we just need an algorithm that is better suited for cyclic CQs?

β€’ Yes, but…

β€’ … [Ngo+ 18]:

- ΰ·©O 𝑛 + π‘Ÿ unattainable based on well-accepted complexity-theoretic assumptions

[Ngo+ 18] Ngo, Porat, RΓ©, Rudra. Worst-case optimal join algorithms. J. ACM’18 https://doi.org/10.1145/3180143

Page 25: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

25

What Can Be Done?

β€’ Worst-case-optimal (WCO) join algorithms[Veldhuizen 14, Ngo+ 14, Ngo+ 18]

β€’ Instead of ΰ·©O 𝑛 + π‘Ÿ , getΰ·©O 𝑛 + π‘ŸWC = ΰ·©O π‘ŸWC

β€’ π‘ŸWC = largest output of Q over any possible DB instance

- Determined by the AGM bound[4]

- Based on fractional edge cover of the join hypergraph

β€’ 3-cycle: 𝑛1.5 vs naive upper bound 𝑛3

[Ngo+ 18] Ngo, Porat, RΓ©, Rudra. Worst-case optimal join algorithms. J. ACM’18 https://doi.org/10.1145/3180143

[Veldhuizen 14] Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. ICDT’14 https://doi.org/10.5441/002/icdt.2014.13[Ngo+ 14] Ngo, Re, Rudra. Skew strikes back: New developments in the theory of join algorithms. SIGMOD Rec.’14 https://doi.org/10.1145/2590989.2590991

[Atserias+ 13] Atserias, Grohe, Marx. Size bounds and query plans for relational joins. . SIAM J. Comput.’13 https://doi.org/10.1137/110859440

Page 26: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

26

What Can Be Done?

β€’ Worst-case-optimal (WCO) join algorithms[Veldhuizen 14, Ngo+ 14, Ngo+ 18]

β€’ Instead of ΰ·©O 𝑛 + π‘Ÿ , getΰ·©O 𝑛 + π‘ŸWC = ΰ·©O π‘ŸWC

β€’ π‘ŸWC = largest output of Q over any possible DB instance

- Determined by the AGM bound[4]

- Based on fractional edge cover of the join hypergraph

β€’ 3-cycle: 𝑛1.5 vs naive upper bound 𝑛3

β€’ Tree decompositions

β€’ Put more effort into pre-processing to avoid large intermediate results

- Use WCO joins as sub-routine

β€’ Goal: ΰ·©O 𝑛𝑑 + π‘Ÿ for smallest 𝑑possible

- ΰ·©O 𝑛𝑑 captures pre-processing cost

- 𝑑 = 1 for acyclic CQ

[Ngo+ 18] Ngo, Porat, RΓ©, Rudra. Worst-case optimal join algorithms. J. ACM’18 https://doi.org/10.1145/3180143

[Veldhuizen 14] Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. ICDT’14 https://doi.org/10.5441/002/icdt.2014.13[Ngo+ 14] Ngo, Re, Rudra. Skew strikes back: New developments in the theory of join algorithms. SIGMOD Rec.’14 https://doi.org/10.1145/2590989.2590991

[Atserias+ 13] Atserias, Grohe, Marx. Size bounds and query plans for relational joins. . SIAM J. Comput.’13 https://doi.org/10.1137/110859440

Page 27: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

27

Outline tutorial

β€’ Part 1: Top-π‘˜ (Wolfgang): ~20minβ€’ Part 2: Optimal Join Algorithms (Mirek): ~30min

– Lower Bound and the Yannakakis Algorithm

– Problems Caused by Cycles

– Tree Decompositions

– Summary and Further Reading

β€’ Part 3: Ranked enumeration over joins (Nikolaos): ~40min

Page 28: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

28

Main Idea of Tree Decompositions

β€’ Convert cyclic CQ to a rooted tree-shaped CQ

A1

A4

A3

A2A6

A5

R6 R1

R2

R3R4

R5

S = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴2, 𝐴3 β‹ˆ 𝑅3 𝐴3, 𝐴4

T = 𝑅4 𝐴4, 𝐴5 β‹ˆ 𝑅5 𝐴5, 𝐴6 β‹ˆ 𝑅6(𝐴6, 𝐴1)

decomposition

Page 29: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

29

Main Idea of Tree Decompositions

β€’ Convert cyclic CQ to a rooted tree-shaped CQ

β€’ Materialize all tree nodes (β€œbags”) using a WCO join algorithm

A1

A4

A3

A2A6

A5

R6 R1

R2

R3R4

R5

S = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴2, 𝐴3 β‹ˆ 𝑅3 𝐴3, 𝐴4

T = 𝑅4 𝐴4, 𝐴5 β‹ˆ 𝑅5 𝐴5, 𝐴6 β‹ˆ 𝑅6(𝐴6, 𝐴1)

decomposition

S

T

Page 30: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

30

Main Idea of Tree Decompositions

β€’ Convert cyclic CQ to a rooted tree-shaped CQ

β€’ Materialize all tree nodes (β€œbags”) using a WCO join algorithm

β€’ Apply Yannakakis algorithm on the tree

- Acyclic CQ whose input relations are the bags

- Achieves ΰ·©O π‘₯ + π‘Ÿ where π‘₯ is the size of the largest bag

A1

A4

A3

A2A6

A5

R6 R1

R2

R3R4

R5

S = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴2, 𝐴3 β‹ˆ 𝑅3 𝐴3, 𝐴4

T = 𝑅4 𝐴4, 𝐴5 β‹ˆ 𝑅5 𝐴5, 𝐴6 β‹ˆ 𝑅6(𝐴6, 𝐴1)

decomposition

S

T

Yannakakis

Page 31: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

31

Tree Decomposition Intuition

𝑄6𝑐 𝐴1, … , 𝐴6 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴2, 𝐴3β‹ˆ 𝑅3 𝐴3, 𝐴4 β‹ˆ 𝑅4 𝐴4, 𝐴5β‹ˆ 𝑅5(𝐴5, 𝐴6) β‹ˆ 𝑅6(𝐴6, 𝐴1)

Page 32: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

32

Tree Decomposition Intuition

𝑄6𝑐 𝐴1, … , 𝐴6 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴2, 𝐴3β‹ˆ 𝑅3 𝐴3, 𝐴4 β‹ˆ 𝑅4 𝐴4, 𝐴5β‹ˆ 𝑅5(𝐴5, 𝐴6) β‹ˆ 𝑅6(𝐴6, 𝐴1)

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

Page 33: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

33

Tree Decomposition Intuition

𝑄6𝑐 𝐴1, … , 𝐴6 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴2, 𝐴3β‹ˆ 𝑅3 𝐴3, 𝐴4 β‹ˆ 𝑅4 𝐴4, 𝐴5β‹ˆ 𝑅5(𝐴5, 𝐴6) β‹ˆ 𝑅6(𝐴6, 𝐴1)

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

Page 34: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

34

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

𝑄6𝑐 𝐴1, … , 𝐴6 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴2, 𝐴3β‹ˆ 𝑅3 𝐴3, 𝐴4 β‹ˆ 𝑅4 𝐴4, 𝐴5β‹ˆ 𝑅5(𝐴5, 𝐴6) β‹ˆ 𝑅6(𝐴6, 𝐴1)

𝑅1 𝐴1, 𝐴2 , 𝑅2 𝐴2, 𝐴3 , 𝑅3 𝐴3, 𝐴4𝑅4 𝐴4, 𝐴5 , 𝑅5 𝐴5, 𝐴6 , 𝑅6(𝐴6, 𝐴1)

𝒯1

Bag materialization costs O(𝑛3) (AGM bound)

Page 35: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

35

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

𝑄6𝑐 𝐴1, … , 𝐴6 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴2, 𝐴3β‹ˆ 𝑅3 𝐴3, 𝐴4 β‹ˆ 𝑅4 𝐴4, 𝐴5β‹ˆ 𝑅5(𝐴5, 𝐴6) β‹ˆ 𝑅6(𝐴6, 𝐴1)

𝑅1 𝐴1, 𝐴2 , 𝑅2 𝐴2, 𝐴3 , 𝑅3 𝐴3, 𝐴4𝑅4 𝐴4, 𝐴5 , 𝑅5 𝐴5, 𝐴6 , 𝑅6(𝐴6, 𝐴1)

𝒯1

Bag materialization costs O(𝑛3) (AGM bound)

Page 36: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

36

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

𝑅1 𝐴1, 𝐴2 , 𝑅2 𝐴2, 𝐴3 , 𝑅3 𝐴3, 𝐴4𝒯2

𝑅4 𝐴4, 𝐴5 , 𝑅5 𝐴5, 𝐴6 , 𝑅6(𝐴6, 𝐴1)

Bag materialization costs O(𝑛2) (AGM bound)

𝑄6𝑐 𝐴1, … , 𝐴6 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴2, 𝐴3β‹ˆ 𝑅3 𝐴3, 𝐴4 β‹ˆ 𝑅4 𝐴4, 𝐴5β‹ˆ 𝑅5(𝐴5, 𝐴6) β‹ˆ 𝑅6(𝐴6, 𝐴1)

Page 37: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

37

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

𝑅1 𝐴1, 𝐴2 , 𝑅2 𝐴2, 𝐴3 , 𝑅3 𝐴3, 𝐴4𝒯2

𝑅4 𝐴4, 𝐴5 , 𝑅5 𝐴5, 𝐴6 , 𝑅6(𝐴6, 𝐴1)

Bag materialization costs O(𝑛2) (AGM bound)

𝑄6𝑐 𝐴1, … , 𝐴6 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴2, 𝐴3β‹ˆ 𝑅3 𝐴3, 𝐴4 β‹ˆ 𝑅4 𝐴4, 𝐴5β‹ˆ 𝑅5(𝐴5, 𝐴6) β‹ˆ 𝑅6(𝐴6, 𝐴1)

Page 38: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

38

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

𝒯3 𝑅1 𝐴1, 𝐴2

𝑅2 𝐴2, 𝐴3

𝑅3 𝐴3, 𝐴4

𝑅4 𝐴4, 𝐴5

𝑅5 𝐴5, 𝐴6

𝑅6(𝐴6, 𝐴1)

O(𝑛) bag materialization…?

𝑄6𝑐 𝐴1, … , 𝐴6 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴2, 𝐴3β‹ˆ 𝑅3 𝐴3, 𝐴4 β‹ˆ 𝑅4 𝐴4, 𝐴5β‹ˆ 𝑅5(𝐴5, 𝐴6) β‹ˆ 𝑅6(𝐴6, 𝐴1)

Page 39: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

39

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

𝒯3 𝑅1 π‘¨πŸ, 𝐴2

𝑅2 𝐴2, 𝐴3

𝑅3 𝐴3, 𝐴4

𝑅4 𝐴4, 𝐴5

𝑅5 𝐴5, 𝐴6

𝑅6(𝐴6, π‘¨πŸ)

𝑄6𝑐 𝐴1, … , 𝐴6 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴2, 𝐴3β‹ˆ 𝑅3 𝐴3, 𝐴4 β‹ˆ 𝑅4 𝐴4, 𝐴5β‹ˆ 𝑅5(𝐴5, 𝐴6) β‹ˆ 𝑅6(𝐴6, 𝐴1)

Page 40: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

40

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node)

For each attribute, the bags containing it are connected

𝒯3 𝑅1 𝐴1, 𝐴2

𝑅2 𝐴2, 𝐴3 , 𝑅1 𝐴1, _

𝑅3 𝐴3, 𝐴4 , 𝑅1 𝐴1, _

𝑅4 𝐴4, 𝐴5 , 𝑅1 𝐴1, _

𝑅5 𝐴5, 𝐴6 , 𝑅1 𝐴1, _

𝑅6(𝐴6, 𝐴1)

O 𝑛 βˆ™ πœ‹π΄1 𝑅1 bag materialization: still O(𝑛2)

𝑄6𝑐 𝐴1, … , 𝐴6 = 𝑅1 𝐴1, 𝐴2 β‹ˆ 𝑅2 𝐴2, 𝐴3β‹ˆ 𝑅3 𝐴3, 𝐴4 β‹ˆ 𝑅4 𝐴4, 𝐴5β‹ˆ 𝑅5(𝐴5, 𝐴6) β‹ˆ 𝑅6(𝐴6, 𝐴1)

Page 41: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

41

Tree Decomposition: Formal Definition

β€’ Given: hypergraph β„‹ = (𝒱, β„°)

- 𝒱: attributes

β€’ E.g., 𝐴1, 𝐴2, 𝐴3, 𝐴4, 𝐴5, 𝐴6

- β„°: relations

β€’ E.g., 𝑅3 is hyperedge (𝐴3, 𝐴4)

β€’ A tree decomposition of β„‹ is a pair 𝒯, πœ’ where

- 𝒯 = 𝑉 𝒯 ,𝐸(𝒯) is a tree

- πœ’:𝑉 𝒯 β†’ 2𝒱 assigns a bag πœ’(𝑣) to each tree node 𝑣 such that

β€’ Each hyperedge 𝐹 ∈ β„° is covered, i.e., βˆ€πΉ ∈ β„°: βˆƒπ‘£ ∈ 𝑉 𝒯 : 𝐹 βŠ† πœ’(𝑣)

β€’ For each 𝑒 ∈ 𝒱, the bags containing 𝑒 are connected

A1

A4

A3

A2A6

A5

R6 R1

R2

R3R4

R5

𝒯3 𝑅1 𝐴1, 𝐴2

𝑅2 𝐴2, 𝐴3 , 𝑅1 𝐴1, _

𝑅3 𝐴3, 𝐴4 , 𝑅1 𝐴1, _

𝑅4 𝐴4, 𝐴5 , 𝑅1 𝐴1, _

𝑅5 𝐴5, 𝐴6 , 𝑅1 𝐴1, _

𝑅6(𝐴6, 𝐴1)

[Khamis, Ngo, Suciu. What do shannon-type inequalities, submodular width, and disjunctive datalog have to do with one another? PODS’17] https://doi.org/10.1145/3034786.3056105

Page 42: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

42

Tree-Decomposition Properties

β€’ Query has multiple decompositionsβ€”which is best?

Page 43: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

43

Tree-Decomposition Properties

β€’ Query has multiple decompositionsβ€”which is best?

β€’ Consider a tree with O(𝑙) nodes, each materialized using WCO join

- Worst-case output of bag 𝑖 is of size O(𝑛𝑑𝑖) for some 𝑑𝑖 β‰₯ 1 (AGM bound)

- Fractional hypertree width (fhw) 𝑑 = max𝑖

𝑑𝑖 [Grohe+ 14]

- Total bag-materialization cost: O(𝑛𝑑)

- Size of a materialized bag: O(𝑛𝑑)

- Resulting cost for Yannakakis algorithm on materialized tree: ΰ·©O(𝑛𝑑 + π‘Ÿ)

[Grohe+ 14] Grohe and Marx. Constraint solving via fractional edge covers. ACM TALG’14. https://doi.org/10.1145/2636918

Page 44: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

44

Who Wins?𝑅1 𝐴1, 𝐴2 , 𝑅2 𝐴2, 𝐴3 , 𝑅3 𝐴3, 𝐴4𝑅4 𝐴4, 𝐴5 , 𝑅5 𝐴5, 𝐴6 , 𝑅6(𝐴6, 𝐴1)

𝒯1

𝑅1 𝐴1, 𝐴2 , 𝑅2 𝐴2, 𝐴3 , 𝑅3 𝐴3, 𝐴4𝒯2

𝑅4 𝐴4, 𝐴5 , 𝑅5 𝐴5, 𝐴6 , 𝑅6(𝐴6, 𝐴1)

𝒯3 𝑅1 𝐴1, 𝐴2

𝑅2 𝐴2, 𝐴3 , 𝑅1 𝐴1, _

𝑅3 𝐴3, 𝐴4 , 𝑅1 𝐴1, _

𝑅4 𝐴4, 𝐴5 , 𝑅1 𝐴1, _

𝑅5 𝐴5, 𝐴6 , 𝑅1 𝐴1, _

𝑅6(𝐴6, 𝐴1)

π“£πŸ > π“£πŸ = π“£πŸ‘

O(𝑛2)

O(𝑛2)

O(𝑛3)

Page 45: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

45

A Closer Look

β€’ 𝒯1 loses, because it does not decompose the query

Page 46: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

46

A Closer Look

β€’ 𝒯1 loses, because it does not decompose the query

β€’ Are 𝒯2 and 𝒯3 really equally good?

- In 𝒯2, bag computation requires joining 3 relations

- In 𝒯3, at most 2 relations are joined

β€’ One of them is just the set of distinct 𝐴1-values in 𝑅1

Page 47: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

47

A Closer Look

β€’ 𝒯1 loses, because it does not decompose the query

β€’ Are 𝒯2 and 𝒯3 really equally good?

- In 𝒯2, bag computation requires joining 3 relations

- In 𝒯3, at most 2 relations are joined

β€’ One of them is just the set of distinct 𝐴1-values in 𝑅1

β€’ 𝒯3 is better when the number of distinct 𝐴1-values in 𝑅1 is low, e.g., 𝑂(𝑛2/3) instead of 𝑂 𝑛

Page 48: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

48

Who Wins?

𝒯3

𝑅1 𝐴1, 𝐴2

𝑅2 𝐴2, 𝐴3 , 𝑅1 𝐴1, _

𝑅3 𝐴3, 𝐴4 , 𝑅1 𝐴1, _

𝑅4 𝐴4, 𝐴5 , 𝑅1 𝐴1, _

𝑅5 𝐴5, 𝐴6 , 𝑅1 𝐴1, _

𝑅6(𝐴6, 𝐴1)

Degree constraint: πœ‹π΄1 𝑅1 ≀ 𝑛2/3

O(𝑛)

O(𝑛)

O(𝑛5/3)

O(𝑛5/3)

O(𝑛5/3)

O(𝑛5/3)

β€œThe number of distinct 𝐴1values in 𝑅1 is at most 𝑛2/3”

Page 49: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

49

Who Wins?

𝑅1 𝐴1, 𝐴2 , 𝑅2 𝐴2, 𝐴3 , 𝑅3 𝐴3, 𝐴4

𝒯2

𝑅4 𝐴4, 𝐴5 , 𝑅5 𝐴5, 𝐴6 , 𝑅6(𝐴6, 𝐴1)

𝒯3

𝑅1 𝐴1, 𝐴2

𝑅2 𝐴2, 𝐴3 , 𝑅1 𝐴1, _

𝑅3 𝐴3, 𝐴4 , 𝑅1 𝐴1, _

𝑅4 𝐴4, 𝐴5 , 𝑅1 𝐴1, _

𝑅5 𝐴5, 𝐴6 , 𝑅1 𝐴1, _

𝑅6(𝐴6, 𝐴1)

Degree constraint: πœ‹π΄1 𝑅1 ≀ 𝑛2/3

O(𝑛)

O(𝑛)

O(𝑛5/3)

O(𝑛5/3)

O(𝑛5/3)

O(𝑛5/3)

O(𝑛2)

O(𝑛2)

Page 50: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

50

Could 𝒯2 Win?

β€’ Consider bag 𝑅1 𝐴1, 𝐴2 , 𝑅2 𝐴2, 𝐴3 , 𝑅3 𝐴3, 𝐴4 in 𝒯2

β€’ What if each 𝑅1-tuple joins with only β€œa few” 𝑅2-tuples?

β€’ What if each 𝑅2-tuple joins with only β€œa few” 𝑅3-tuples?

β€’ What if β€œa few” was 𝑛1/3?

Page 51: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

51

Who Wins Now?

𝑅1 𝐴1, 𝐴2 , 𝑅2 𝐴2, 𝐴3 , 𝑅3 𝐴3, 𝐴4

𝒯2

𝑅4 𝐴4, 𝐴5 , 𝑅5 𝐴5, 𝐴6 , 𝑅6(𝐴6, 𝐴1)

Degree constraint: βˆ€π‘– ∈ {2,3,5,6}:

βˆ€π‘—: πœ‹π΄π‘–+1πœŽπ΄π‘–=𝑗 𝑅𝑖 ≀ 𝑛1/3

O(𝑛5/3)

O(𝑛5/3)

Page 52: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

52

Who Wins Now?

𝑅1 𝐴1, 𝐴2 , 𝑅2 𝐴2, 𝐴3 , 𝑅3 𝐴3, 𝐴4

𝒯2

𝑅4 𝐴4, 𝐴5 , 𝑅5 𝐴5, 𝐴6 , 𝑅6(𝐴6, 𝐴1)

𝒯3

𝑅1 𝐴1, 𝐴2

𝑅2 𝐴2, 𝐴3 , 𝑅1 𝐴1, _

𝑅3 𝐴3, 𝐴4 , 𝑅1 𝐴1, _

𝑅4 𝐴4, 𝐴5 , 𝑅1 𝐴1, _

𝑅5 𝐴5, 𝐴6 , 𝑅1 𝐴1, _

𝑅6(𝐴6, 𝐴1)

Degree constraint: βˆ€π‘– ∈ {2,3,5,6}:

βˆ€π‘—: πœ‹π΄π‘–+1πœŽπ΄π‘–=𝑗 𝑅𝑖 ≀ 𝑛1/3

O(𝑛)

O(𝑛)

O(𝑛2)

O(𝑛2)

O(𝑛2)

O(𝑛2)

O(𝑛5/3)

O(𝑛5/3)

Page 53: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

53

Best of Both Worlds

β€’ Depending on the degree constraints that hold for a DB instance, we may sometimes prefer 𝒯2 and sometimes 𝒯3

β€’ What if we used both? [Alon+ 97, Marx 13]

- Intuition: each decomposition is a different query β€œplan”

β€’ Query output = union of individual plans’ results

- Decide for each input tuple to which plan(s) to send it

- Main idea: split each input relation into heavy and light

β€’ Goal: enforce desirable degree constraints for each tree

[Marx 13] Marx. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. J.ACM’13 https://doi.org/10.1145/2535926

[Alon+ 97] Alon, Yuster, Zwick. Finding and counting given length cycles. Algorithmica’97 https://doi.org/10.1007/BF02523189

Page 54: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

54

Multiple Plans: Plan 1

𝒯3

𝑅1𝐻 𝐴1, 𝐴2

𝑅2 𝐴2, 𝐴3 , 𝑅1𝐻 𝐴1, _

𝑅3 𝐴3, 𝐴4 , 𝑅1𝐻 𝐴1, _

𝑅4 𝐴4, 𝐴5 , 𝑅1𝐻 𝐴1, _

𝑅5 𝐴5, 𝐴6 , 𝑅1𝐻 𝐴1, _

𝑅6(𝐴6, 𝐴1)

𝑅1𝐻: contains all tuples whose

𝐴1-values occur more than

𝑛1/3 times (fewer than 𝑛2/3

such 𝐴1-values exist)

Page 55: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

55

Multiple Plans: Plan 1

𝒯3: computes 𝑅1𝐻 β‹ˆ 𝑅2 β‹ˆ β‹― β‹ˆ 𝑅6

𝑅1𝐻 𝐴1, 𝐴2

𝑅2 𝐴2, 𝐴3 , 𝑅1𝐻 𝐴1, _

𝑅3 𝐴3, 𝐴4 , 𝑅1𝐻 𝐴1, _

𝑅4 𝐴4, 𝐴5 , 𝑅1𝐻 𝐴1, _

𝑅5 𝐴5, 𝐴6 , 𝑅1𝐻 𝐴1, _

𝑅6(𝐴6, 𝐴1)

Degree constraint: πœ‹π΄1 𝑅1𝐻 ≀ 𝑛2/3

O(𝑛)

O(𝑛)

O(𝑛5/3)

O(𝑛5/3)

O(𝑛5/3)

O(𝑛5/3)

𝑅1𝐻: contains all tuples whose

𝐴1-values occur more than

𝑛1/3 times (fewer than 𝑛2/3

such 𝐴1-values exist)

Page 56: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

56

More Plans

β€’ Note that

- 𝑄6𝑐 = 𝑅1 β‹ˆ 𝑅2 β‹ˆ 𝑅3 β‹ˆ 𝑅4 β‹ˆ 𝑅5 β‹ˆ 𝑅6 together with

- 𝑅1𝐿 = 𝑅1 βˆ– 𝑅1

𝐻

β€’ implies that 𝑄6𝑐 is the union of

- 𝑅1𝐻 β‹ˆ 𝑅2 β‹ˆ 𝑅3 β‹ˆ 𝑅4 β‹ˆ 𝑅5 β‹ˆ 𝑅6 and

- 𝑅1𝐿 β‹ˆ 𝑅2 β‹ˆ 𝑅3 β‹ˆ 𝑅4 β‹ˆ 𝑅5 β‹ˆ 𝑅6

β€’ To compute the latter, apply the same trick to 𝑅2

Page 57: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

57

Multiple Plans: Plan 2

𝒯3: computes 𝑅1𝐿 β‹ˆ 𝑅2

𝐻 β‹ˆ 𝑅3 β‹ˆ β‹― β‹ˆ 𝑅6

𝑅2𝐻 𝐴2, 𝐴3

𝑅3 𝐴3, 𝐴4 , 𝑅2𝐻 𝐴2, _

𝑅4 𝐴4, 𝐴5 , 𝑅2𝐻 𝐴2, _

𝑅5 𝐴5, 𝐴6 , 𝑅2𝐻 𝐴2, _

𝑅6 𝐴6, 𝐴1 , 𝑅2𝐻 𝐴2, _

𝑅1𝐿(𝐴1, 𝐴2)

Degree constraint: πœ‹π΄2 𝑅2𝐻 ≀ 𝑛2/3

𝑅2𝐻: contains all tuples whose

𝐴2-values occur more than 𝑛1/3 times (fewer than 𝑛2/3

such 𝐴2-values exist)

𝑅2𝐿 = 𝑅2 βˆ– 𝑅2

𝐻

Page 58: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

58

Plans 3 to 6

β€’ Plans discussed so far

- 𝑅1𝐿 β‹ˆ 𝑅2

𝐿 β‹ˆ 𝑅3𝐻 β‹ˆ 𝑅4 β‹ˆ 𝑅5 β‹ˆ 𝑅6

- 𝑅1𝐿 β‹ˆ 𝑅2

𝐿 β‹ˆ 𝑅3𝐿 β‹ˆ 𝑅4

𝐻 β‹ˆ 𝑅5 β‹ˆ 𝑅6

β€’ Continue analogously to compute

- 𝑅1𝐿 β‹ˆ 𝑅2

𝐿 β‹ˆ 𝑅3𝐻 β‹ˆ 𝑅4 β‹ˆ 𝑅5 β‹ˆ 𝑅6

- 𝑅1𝐿 β‹ˆ 𝑅2

𝐿 β‹ˆ 𝑅3𝐿 β‹ˆ 𝑅4

𝐻 β‹ˆ 𝑅5 β‹ˆ 𝑅6

- 𝑅1𝐿 β‹ˆ 𝑅2

𝐿 β‹ˆ 𝑅3𝐿 β‹ˆ 𝑅4

𝐿 β‹ˆ 𝑅5𝐻 β‹ˆ 𝑅6

- 𝑅1𝐿 β‹ˆ 𝑅2

𝐿 β‹ˆ 𝑅3𝐿 β‹ˆ 𝑅4

𝐿 β‹ˆ 𝑅5𝐿 β‹ˆ 𝑅6

𝐻

β€’ What is missing?

Page 59: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

59

The 7-th Plan

β€’ Join all light-only partitions with each other:

- 𝑅1𝐿 β‹ˆ 𝑅2

𝐿 β‹ˆ 𝑅3𝐿 β‹ˆ 𝑅4

𝐿 β‹ˆ 𝑅5𝐿 β‹ˆ 𝑅6

𝐿

β€’ Input now satisfies the other degree constraint:

- βˆ€π‘– ∈ {2,3,5,6}: βˆ€π‘—: πœ‹π΄π‘–+1πœŽπ΄π‘–=𝑗 𝑅𝑖 ≀ 𝑛1/3

β€’ Use decomposition 𝒯2 for it!

Page 60: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

60

Analysis and Discussion

β€’ Rewrite 6-cycle into 7 sub-queries

- Six of them use 𝒯3, copying the heavy attribute to intermediate bags

- One uses 𝒯2 on the all-light case

β€’ Analysis

- Assigning input tuples to subqueries: O(𝑛)

- Bag materialization: O(𝑛5/3)

- Bag size: O(𝑛5/3)

β€’ Running Yannakakis on each of the 7 trees takes ΰ·©O 𝑛5/3 + π‘Ÿ

- Beats single-tree complexity ΰ·©O 𝑛2 + π‘Ÿ and WCO-join complexity ΰ·©O 𝑛3

Page 61: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

61

Tree Decompositions: The Big Picture

β€’ WCO join algorithms applied to a query Q: complexity determined by the AGM bound for Q

β€’ Decomposing Q into a tree creates bags that each may have a lower AGM bound value (fractional hypertree width)

β€’ This reduces time complexity

- Materialize each bag using a WCO join algorithm

- Run Yannakakis algorithm on the tree with materialized bags as input

β€’ Design goal: minimize width of tree, but ensure that each input relation is covered by a bag and the tree is attribute-connected!

Page 62: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

62

Tree Decompositions: Degree Constraints

β€’ Degree constraints generalize cardinality constraints and functional dependencies

β€’ Enable improved complexity guarantees

β€’ Application to general input (with only cardinality constraints):

- Partition input so that each partition satisfies stronger degree constraints

- Use appropriate tree for each partition

- Current frontier: submodular width

β€’ Application to input DB with degree constraints:

- Degree-aware submodular width

[Khamis, Ngo, Suciu. What do shannon-type inequalities, submodular width, and disjunctive datalog have to do with one another? PODS’17] https://doi.org/10.1145/3034786.3056105

[Joglekar, RΓ©. It's all a matter of degree. Theory of Computing Systems’18] https://doi.org/10.1007/s00224-017-9811-8

[Marx. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. J.ACM’13] https://doi.org/10.1145/2535926

Page 63: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

63

Outline tutorial

β€’ Part 1: Top-π‘˜ (Wolfgang): ~20minβ€’ Part 2: Optimal Join Algorithms (Mirek): ~30min

– Lower Bound and the Yannakakis Algorithm

– Problems Caused by Cycles

– Tree Decompositions

– Summary and Further Reading

β€’ Part 3: Ranked enumeration over joins (Nikolaos): ~40min

Page 64: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

64

Optimal-Joins Summary

β€’ On acyclic CQs, the Yannakakis algorithm’s complexity ΰ·©O 𝑛 + π‘Ÿmatches the lower bound

- Simple join-at-a-time query plan after semi-join reduction sweeps

β€’ Lower bound cannot be matched for general cyclic CQs

β€’ WCO joins achieve ΰ·©O 𝑛 + π‘ŸWC

- β€œHolistic” multi-way join approach

β€’ Tree decomposition approaches achieve ΰ·©O 𝑛𝑑 + π‘Ÿ

- Best 𝑑 is submodular width subw(Q) using multi-tree decomposition

- 𝑛subw(𝑄) = O(π‘ŸWC), but usually better

β€’ 6-cycle: π‘ŸWC = 𝑛3, but multi-tree approach achieves ΰ·©O 𝑛5/3 + π‘Ÿ

Page 65: SIGMOD 2020 tutorial Optimal Join Algorithms meet Top-π‘˜

65

Further Reading

β€’ Extensions of the query model

- Projections and aggregation

β€’ Factorized DB

β€’ Stronger notions of optimality

β€’ …and many more (see our tutorial paper)

[Khamis, Ngo, Rudra. FAQ: questions asked frequently. PODS’16] https://doi.org/10.1145/2902251.2902280

[Bakibayev, KočiskΓ½, Olteanu, ZΓ‘vodnΓ½. Aggregation and Ordering in Factorised Databases. PVLDB’13] https://doi.org/10.14778/2556549.2556579

[Bakibayev, Olteanu, ZΓ‘vodnΓ½. FDB: A Query Engine for Factorised Relational Databases. PVLDB’12] https://doi.org/10.14778/2350229.2350242

[Olteanu, Schleich. Factorized databases. SIGMOD Record’16] https://doi.org/10.1145/3003665.3003667

[Olteanu, ZΓ‘vodny. Size bounds for factorised representations of query results. TODS’15] https://doi.org/10.1145/2656335

[Khamis, RΓ©, Rudra. Joins via Geometric Resolutions: Worst Case and Beyond. TODS’16] https://doi.org/10.1145/2967101

[Ngo, Nguyen, Re, Rudra. Beyond worst-case analysis for joins with minesweeper. PODS’14] https://doi.org/10.1145/2594538.2594547