18 advanced database systems - cmu 15-721

66
Parallel Join Algorithms (Sorting) @Andy_Pavlo // 15-721 // Spring 2019 ADVANCED DATABASE SYSTEMS Lecture #18

Upload: others

Post on 29-May-2022

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

Parallel Join Algorithms (Sorting)

@Andy_Pavlo // 15-721 // Spring 2019

ADVANCEDDATABASE SYSTEMS

Le

ctu

re #

18

Page 2: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

PROJECT #2

This Week→ Status Meetings

Monday April 8th

→ Code Review Submission→ Update Presentation→ Design Document

2

Page 3: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

PARALLEL JOIN ALGORITHMS

Perform a join between two relations on multiple threads simultaneously to speed up operation.

Two main approaches:→ Hash Join→ Sort-Merge Join

3

Page 4: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

SIMD Background

Parallel Sort-Merge Join

Evaluation

4

Page 5: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

S INGLE INSTRUCTION, MULTIPLE DATA

A class of CPU instructions that allow the processor to perform the same operation on multiple data points simultaneously.

All major ISAs have microarchitecture support SIMD operations.→ x86: MMX, SSE, SSE2, SSE3, SSE4, AVX→ PowerPC: Altivec→ ARM: NEON

5

Page 6: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

Z

SIMD EXAMPLE

6

X + Y = Z 87654321

X

SISD

+for (i=0; i<n; i++) {Z[i] = X[i] + Y[i];

}

11111111

Y

x1

x2

⋮xn

y1

y2

⋮yn

x1+y1

x2+y2

⋮xn+yn

+ =

Page 7: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

Z

SIMD EXAMPLE

6

X + Y = Z 87654321

X

SISD

+for (i=0; i<n; i++) {Z[i] = X[i] + Y[i];

}

911111111

Y

x1

x2

⋮xn

y1

y2

⋮yn

x1+y1

x2+y2

⋮xn+yn

+ =

Page 8: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

Z

SIMD EXAMPLE

6

X + Y = Z 87654321

X

SISD

+for (i=0; i<n; i++) {Z[i] = X[i] + Y[i];

}

9 8 7 6 5 4 3 211111111

Y

x1

x2

⋮xn

y1

y2

⋮yn

x1+y1

x2+y2

⋮xn+yn

+ =

Page 9: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

Z

SIMD EXAMPLE

6

X + Y = Z 87654321

X

for (i=0; i<n; i++) {Z[i] = X[i] + Y[i];

}

9 8 7 611111111

Y

SIMD

+

8 7 6 5

1 1 1 1

128-bit SIMD Register

128-bit SIMD Register128-bit SIMD Register

x1

x2

⋮xn

y1

y2

⋮yn

x1+y1

x2+y2

⋮xn+yn

+ =

Page 10: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

Z

SIMD EXAMPLE

6

X + Y = Z 87654321

X

for (i=0; i<n; i++) {Z[i] = X[i] + Y[i];

}

9 8 7 6 5 4 3 211111111

Y

SIMD

+4 3 2 1

1 1 1 1

x1

x2

⋮xn

y1

y2

⋮yn

x1+y1

x2+y2

⋮xn+yn

+ =

Page 11: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

SIMD TRADE-OFFS

Advantages:→ Significant performance gains and resource utilization if

an algorithm can be vectorized.

Disadvantages:→ Implementing an algorithm using SIMD is still mostly a

manual process.→ SIMD may have restrictions on data alignment.→ Gathering data into SIMD registers and scattering it to

the correct locations is tricky and/or inefficient.

7

Page 12: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

SORT-MERGE JOIN (R⨝S)

Phase #1: Sort→ Sort the tuples of R and S based on the join key.

Phase #2: Merge→ Scan the sorted relations and compare tuples.→ The outer relation R only needs to be scanned once.

8

Page 13: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

SORT-MERGE JOIN (R⨝S)

9

Relation R Relation S

SORT! SORT!

Page 14: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

SORT-MERGE JOIN (R⨝S)

9

Relation R Relation S

⨝SORT! SORT!

MERGE!

Page 15: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

SORT-MERGE JOIN (R⨝S)

9

Relation R Relation S

⨝SORT! SORT!

MERGE!

Page 16: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

PARALLEL SORT-MERGE JOINS

Sorting is always the most expensive part.

Use hardware correctly to speed up the join algorithm as much as possible.→ Utilize as many CPU cores as possible.→ Be mindful of NUMA boundaries.→ Use SIMD instructions where applicable.

10

MULTI-CORE, MAIN-MEMORY JOINS: SORT VS. HASH REVISITEDVLDB 2013

Page 17: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

PARALLEL SORT-MERGE JOIN (R⨝S)

Phase #1: Partitioning (optional)→ Partition R and assign them to workers / cores.

Phase #2: Sort→ Sort the tuples of R and S based on the join key.

Phase #3: Merge→ Scan the sorted relations and compare tuples.→ The outer relation R only needs to be scanned once.

11

Page 18: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

PARTITIONING PHASE

Approach #1: Implicit Partitioning→ The data was partitioned on the join key when it was

loaded into the database.→ No extra pass over the data is needed.

Approach #2: Explicit Partitioning→ Divide only the outer relation and redistribute among the

different CPU cores.→ Can use the same radix partitioning approach we talked

about last time.

12

Page 19: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

SORT PHASE

Create runs of sorted chunks of tuples for both input relations.

It used to be that Quicksort was good enough.

But NUMA and parallel architectures require us to be more careful…

13

Page 20: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

CACHE-CONSCIOUS SORTING

Level #1: In-Register Sorting→ Sort runs that fit into CPU registers.

Level #2: In-Cache Sorting→ Merge Level #1 output into runs that fit into CPU caches.→ Repeat until sorted runs are ½ cache size.

Level #3: Out-of-Cache Sorting→ Used when the runs of Level #2 exceed the size of caches.

14

SORT VS. HASH REVISITED: FAST JOIN IMPLEMENTATION ON MODERN MULTI-CORE CPUSVLDB 2009

Page 21: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

CACHE-CONSCIOUS SORTING

15

Level #1

Level #2

Level #3

SORTED

UNSORTED

Page 22: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

LEVEL #1 SORTING NETWORKS

Abstract model for sorting keys.→ Fixed wiring “paths” for lists with the same # of elements.→ Efficient to execute on modern CPUs because of limited

data dependencies and no branches.

16

9

5

3

6

Input Output

Page 23: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

LEVEL #1 SORTING NETWORKS

Abstract model for sorting keys.→ Fixed wiring “paths” for lists with the same # of elements.→ Efficient to execute on modern CPUs because of limited

data dependencies and no branches.

16

9

5

3

6

3

6

5

9

Input Output

Page 24: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

LEVEL #1 SORTING NETWORKS

Abstract model for sorting keys.→ Fixed wiring “paths” for lists with the same # of elements.→ Efficient to execute on modern CPUs because of limited

data dependencies and no branches.

16

9

5

3

6

3

6

5

9

5

3Input Output

Page 25: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

LEVEL #1 SORTING NETWORKS

Abstract model for sorting keys.→ Fixed wiring “paths” for lists with the same # of elements.→ Efficient to execute on modern CPUs because of limited

data dependencies and no branches.

16

9

5

3

6

3

6

5

9

5

3Input Output

3

Page 26: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

LEVEL #1 SORTING NETWORKS

Abstract model for sorting keys.→ Fixed wiring “paths” for lists with the same # of elements.→ Efficient to execute on modern CPUs because of limited

data dependencies and no branches.

16

9

5

3

6

3

6

5

9

5

3Input Output

3

Page 27: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

LEVEL #1 SORTING NETWORKS

Abstract model for sorting keys.→ Fixed wiring “paths” for lists with the same # of elements.→ Efficient to execute on modern CPUs because of limited

data dependencies and no branches.

16

9

5

3

6

3

6

5

9

9

6

5

3

5

6

Input Output3

5

6

9

Page 28: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

LEVEL #1 SORTING NETWORKS

Abstract model for sorting keys.→ Fixed wiring “paths” for lists with the same # of elements.→ Efficient to execute on modern CPUs because of limited

data dependencies and no branches.

16

9

5

3

6

3

6

5

9

9

6

5

3

5

6

Input Output3

5

6

9

wires = [9,5,3,6]

wires[0] = min(wires[0], wires[1])wires[1] = max(wires[0], wires[1])wires[2] = min(wires[2], wires[3])wires[3] = max(wires[2], wires[3])

wires[0] = min(wires[0], wires[2])wires[2] = max(wires[0], wires[2])wires[1] = min(wires[1], wires[3])wires[3] = max(wires[1], wires[3])

wires[1] = min(wires[1], wires[2])wires[2] = max(wires[1], wires[2])

Page 29: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

LEVEL #1 SORTING NETWORKS

Abstract model for sorting keys.→ Fixed wiring “paths” for lists with the same # of elements.→ Efficient to execute on modern CPUs because of limited

data dependencies and no branches.

16

9

5

3

6

3

6

5

9

9

6

5

3

5

6

Input Output3

5

6

9

wires = [9,5,3,6]

wires[0] = min(wires[0], wires[1])wires[1] = max(wires[0], wires[1])wires[2] = min(wires[2], wires[3])wires[3] = max(wires[2], wires[3])

wires[0] = min(wires[0], wires[2])wires[2] = max(wires[0], wires[2])wires[1] = min(wires[1], wires[3])wires[3] = max(wires[1], wires[3])

wires[1] = min(wires[1], wires[2])wires[2] = max(wires[1], wires[2])

11

22

3

3

Page 30: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

LEVEL #1 SORTING NETWORKS

17

12 21 4 13

9 8 6 7

1 14 3 0

5 11 15 10

Instructions:→ 4 LOAD

Page 31: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

LEVEL #1 SORTING NETWORKS

17

12 21 4 13

9 8 6 7

1 14 3 0

5 11 15 10

Sort Across Registers

Instructions:→ 4 LOAD

Page 32: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

LEVEL #1 SORTING NETWORKS

17

12 21 4 13

9 8 6 7

1 14 3 0

5 11 15 10

1 8 3 0

5 11 4 7

9 14 6 10

12 21 15 13

Sort Across Registers

Instructions:→ 4 LOAD

Instructions:→ 10 MIN/MAX

Page 33: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

LEVEL #1 SORTING NETWORKS

17

12 21 4 13

9 8 6 7

1 14 3 0

5 11 15 10

1 8 3 0

5 11 4 7

9 14 6 10

12 21 15 13

1 5 9 12

8 11 14 21

3 4 6 15

0 7 10 13

Sort Across Registers

Transpose Registers

Instructions:→ 4 LOAD

Instructions:→ 10 MIN/MAX

Instructions:→ 8 SHUFFLE→ 4 STORE

Page 34: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

LEVEL #2 BITONIC MERGE NETWORK

Like a Sorting Network but it can merge two locally-sorted lists into a globally-sorted list.

Can expand network to merge progressively larger lists (½ cache size).

Intel’s Measurements→ 2.25–3.5x speed-up over SISD implementation.

18

EFFICIENT IMPLEMENTATION OF SORTING ON MULTI-COREVLDB 2008

Page 35: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

LEVEL #2 BITONIC MERGE NETWORK

19

Input Output

b4

b3

b2

b1

Sorted Run

Reverse Sorted Run

a1

a2

a3

a4

S

H

U

F

F

L

E

S

H

U

F

F

L

E

Sorted Run

min/max min/max min/max

Page 36: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

LEVEL #3 MULTI-WAY MERGING

Use the Bitonic Merge Networks but split the process up into tasks.→ Still one worker thread per core.→ Link together tasks with a cache-sized FIFO queue.

A task blocks when either its input queue is empty or its output queue is full.

Requires more CPU instructions, but brings bandwidth and compute into balance.

20

Page 37: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

Sorted Runs

LEVEL #3 MULTI-WAY MERGING

21

MERGE

MERGE

MERGE

MERGE

MERGE

MERGE

MERGE

Cache-Sized Queue

Page 38: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

Sorted Runs

LEVEL #3 MULTI-WAY MERGING

21

MERGE

MERGE

MERGE

MERGE

MERGE

MERGE

MERGE

Cache-Sized Queue

Page 39: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MERGE PHASE

Iterate through the outer table and inner table in lockstep and compare join keys.

May need to backtrack if there are duplicates.

Can be done in parallel at the different cores without synchronization if there are separate output buffers.

22

Page 40: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

SORT-MERGE JOIN VARIANTS

Multi-Way Sort-Merge (M-WAY)

Multi-Pass Sort-Merge (M-PASS)

Massively Parallel Sort-Merge (MPSM)

23

Page 41: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MULTI-WAY SORT-MERGE

Outer Table→ Each core sorts in parallel on local data (levels #1/#2).→ Redistribute sorted runs across cores using the multi-

way merge (level #3).

Inner Table→ Same as outer table.

Merge phase is between matching pairs of chunks of outer/inner tables at each core.

24

MULTI-CORE, MAIN-MEMORY JOINS: SORT VS. HASH REVISITEDVLDB 2013

Page 42: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MULTI-WAY SORT-MERGE

25

Local-NUMA Partitioning

Page 43: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MULTI-WAY SORT-MERGE

25

Local-NUMA Partitioning Sort

Page 44: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MULTI-WAY SORT-MERGE

25

Local-NUMA Partitioning Sort

Multi-Way Merge

Page 45: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MULTI-WAY SORT-MERGE

25

Local-NUMA Partitioning Sort

Multi-Way Merge

Page 46: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MULTI-WAY SORT-MERGE

25

SORT!

SORT!

SORT!

SORT!

Local-NUMA Partitioning Sort

Multi-Way Merge

Same steps asOuter Table

Page 47: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MULTI-WAY SORT-MERGE

25

SORT!

SORT!

SORT!

SORT!

Local-NUMA Partitioning Sort

Multi-Way Merge

Local Merge Join

Same steps asOuter Table

Page 48: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MULTI-PASS SORT-MERGE

Outer Table→ Same level #1/#2 sorting as Multi-Way.→ But instead of redistributing, it uses a multi-pass naïve

merge on sorted runs.

Inner Table→ Same as outer table.

Merge phase is between matching pairs of chunks of outer table and inner table.

26

MULTI-CORE, MAIN-MEMORY JOINS: SORT VS. HASH REVISITEDVLDB 2013

Page 49: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MULTI-PASS SORT-MERGE

27

Local-NUMA Partitioning

Local-NUMA Partitioning

Page 50: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MULTI-PASS SORT-MERGE

27

Local-NUMA Partitioning Sort

Local-NUMA PartitioningSort

Page 51: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MULTI-PASS SORT-MERGE

27

Local-NUMA Partitioning Sort

Global Merge Join

Local-NUMA PartitioningSort

Page 52: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MASSIVELY PARALLEL SORT-MERGE

Outer Table→ Range-partition outer table and redistribute to cores.→ Each core sorts in parallel on their partitions.

Inner Table→ Not redistributed like outer table.→ Each core sorts its local data.

Merge phase is between entire sorted run of outer table and a segment of inner table.

28

MASSIVELY PARALLEL SORT-MERGE JOINS IN MAIN MEMORY MULTI-CORE DATABASE SYSTEMSVLDB 2012

Page 53: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MASSIVELY PARALLEL SORT-MERGE

29

Cross-NUMA Partitioning

Page 54: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MASSIVELY PARALLEL SORT-MERGE

29

Cross-NUMA Partitioning Sort

Globally Sorted

Page 55: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MASSIVELY PARALLEL SORT-MERGE

29

SORT!

SORT!

SORT!

SORT!

Cross-NUMA Partitioning Sort

Page 56: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MASSIVELY PARALLEL SORT-MERGE

29

SORT!

SORT!

SORT!

SORT!

Cross-NUMA Partitioning Sort

Cross-Partition Merge Join

Page 57: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

MASSIVELY PARALLEL SORT-MERGE

29

SORT!

SORT!

SORT!

SORT!

Cross-NUMA Partitioning Sort

Cross-Partition Merge Join

Page 58: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

Rule #1: No random writes to non-local memory→ Chunk the data, redistribute, and then each core

sorts/works on local data.

Rule #2: Only perform sequential reads on non-local memory→ This allows the hardware prefetcher to hide remote

access latency.

Rule #3: No core should ever wait for another→ Avoid fine-grained latching or sync barriers.

30

Source: Martina-Cezara Albutiu

Page 59: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

EVALUATION

Compare the different join algorithms using a synthetic data set.→ Sort-Merge: M-WAY, M-PASS, MPSM→ Hash: Radix Partitioning

Hardware:→ 4 Socket Intel Xeon E4640 @ 2.4GHz→ 8 Cores with 2 Threads Per Core→ 512 GB of DRAM

31

MULTI-CORE, MAIN-MEMORY JOINS: SORT VS. HASH REVISITEDVLDB 2013

Page 60: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

RAW SORTING PERFORMANCE

32

0

9

18

27

36

1 2 4 8 16 32 64 128 256Thr

ough

put (

M T

upl

es/s

ec)

Number of Tuples (in 220)

C++ STL Sort SIMD Sort

Source: Cagri Balkesen

Single-threaded sorting performance

2.5–3x Faster

Page 61: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

COMPARISON OF SORT-MERGE JOINS

33

0

100

200

300

400

0

5

10

15

20

25

M-WAY M-PASS MPSM

Thr

ough

put (

M T

upl

es/s

ec)

Cyc

les

/ O

utp

ut T

upl

e

Partition Sort S-Merge M-Join Throughput

13.6

Source: Cagri Balkesen

Workload: 1.6B⋈ 128M (8-byte tuples)

7.6

22.9

Page 62: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

Hyper-Threading

M-WAY JOIN VS. MPSM JOIN

34

0

100

200

300

400

1 2 4 8 16 32 64Thr

ough

put (

M T

upl

es/s

ec)

Number of Threads

Multi-Way Massively Parallel

108 M/sec

315 M/sec

Source: Cagri Balkesen

Workload: 1.6B⋈ 128M (8-byte tuples)

130 M/sec

54 M/sec

259 M/sec

90 M/sec

Page 63: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

SORT-MERGE JOIN VS. HASH JOIN

35

0

2

4

6

8

SORT HASH SORT HASH SORT HASH SORT HASH

128M⨝128M 1.6B⨝1.6B 128M⨝512M 1.6B⨝6.4B

Cyc

les

/ O

utp

ut T

upl

e

Partition Sort S-Merge M-Join Build+Probe

Source: Cagri Balkesen

Workload: Different Table Sizes (8-byte tuples)

Page 64: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

SORT-MERGE JOIN VS. HASH JOIN

36

0

150

300

450

600

750

128 256 384 512 768 1024 1280 1536 1792 1920Thr

ough

put (

M T

upl

es/s

ec)

Millions of Tuples

Multi-Way Sort-Merge Join Radix Hash Join

Source: Cagri Balkesen

Varying the size of the input relations

Page 65: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

PARTING THOUGHTS

Both join approaches are equally important.

Every serious OLAP DBMS supports both.

We did not consider the impact of queries where the output needs to be sorted.

37

Page 66: 18 ADVANCED DATABASE SYSTEMS - CMU 15-721

CMU 15-721 (Spring 2019)

NEXT CL ASS

Query Compilation→ Or "Why is DSL Project is Awesome"

38