Work-Efficient Parallel Skyline Computation for the GPU. Kenneth S. Bøgh, Sean Chester, Ira Assent. Data-Intensive Systems Group, Aarhus University, Denmark. Harvard University, 11 February 2016

Post on 29-Aug-2020


Page 1: Work-Efficient Parallel Skyline Computation for the …daslab.seas.harvard.edu/classes/cs265/files/visiting...Work-Efficient Parallel Skyline Computation for the GPU Kenneth S. Bøgh,

Work-Efficient Parallel Skyline Computation for the GPU

Kenneth S. Bøgh, Sean Chester, Ira Assent

Data-Intensive Systems Group
Aarhus University, Denmark

Harvard University
11 February 2016

What this talk will cover

1. An introduction to General Purpose computing on Graphics Processing Units (GPGPU)
2. An introduction to the skyline operator
3. A review of state-of-the-art algorithms for computing skylines
4. An introduction of parallel search trees for:
   - multicore CPUs
   - GPUs
5. Current research at DASlab

Kenneth S. Bøgh, Sean Chester, Ira Assent (Aarhus University) Parallel Skyline Computation HU, 11 Feb 2016 2 / 22

What is a GPU?

1. Graphics Processing Unit - Specialized hardware for graphics
2. Massively parallel (2688 cores in our card)
3. More power efficient than CPUs (21 vs 5 GFLOPS/watt)
4. More processing power per $
5. Using accelerator card - The extreme in terms of scale-up

Key differences between CPU and GPU

- Separate memory - data must be transferred back and forth
- Higher memory bandwidth (x4) and latency (x2)
- No prefetcher, and a small cache (1.5MB for 2688 cores)
- 2048 threads per 192 cores (2 threads per core on the CPU)
- Groups of 32 threads execute step locked

[Figure: memory hierarchies. CPU: CPU RAM; shared cache, 2MB per core; 256KB and 2x32KB caches per core. GPU: GPU RAM; shared cache, 1.5MB; 64KB R/W and 48KB RO per group of 192 cores.]
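The numbers on this slide imply how many threads the card keeps in flight. A minimal sketch of that arithmetic follows, assuming the 2688 cores are organised as equal groups of 192 cores and that every group hosts the full 2048 threads. The variable names are illustrative and not taken from the talk.

```python
# Figures quoted on the slide.
TOTAL_CORES = 2688
CORES_PER_GROUP = 192
THREADS_PER_GROUP = 2048
WARP_SIZE = 32

# Assumption: the card consists of equal-sized groups of 192 cores.
groups = TOTAL_CORES // CORES_PER_GROUP

# Threads that can be resident on the whole card at once.
resident_threads = groups * THREADS_PER_GROUP

# Step-locked groups of 32 threads hosted by each group of cores.
warps_per_group = THREADS_PER_GROUP // WARP_SIZE

# Oversubscription: threads per core (the slide gives 2 for the CPU).
threads_per_core = THREADS_PER_GROUP / CORES_PER_GROUP

print(groups, resident_threads, warps_per_group, round(threads_per_core, 1))
# prints: 14 28672 64 10.7
```

Under these assumptions each core is oversubscribed by roughly 10.7 threads, against 2 per core on the CPU. Keeping many more threads than cores resident is the usual way a GPU tolerates its higher memory latency.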

The CPU and GPU threading models

- CPU threads execute independently
- GPU threads execute in step-locked groups of 32 called warps
- Threads of a warp must agree on what instruction to execute next
- Otherwise some threads will halt while the others execute

[Figure: a tree with C at the top, B and E below it, and A, D and F at the bottom, annotated in successive builds with "CPU1" and "CPU2", then "Warp thread 1−32", then "Warp thread 1" and "Warp thread 2−32 (halted)".]
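The cost of disagreement inside a warp can be illustrated with a toy model. The sketch below is not a hardware simulator: it assumes that a warp runs each distinct branch in turn while the threads on other branches halt, and that independent threads each have a core of their own. The branch names and step counts are made up.

```python
WARP_SIZE = 32  # threads per warp, as stated on the slide


def warp_steps(branch_of_thread, cost_of_branch):
    """Steps a step-locked warp needs under the toy model.

    Each distinct branch taken by at least one thread is executed once,
    one branch after another, while the remaining threads are halted.
    """
    return sum(cost_of_branch[b] for b in set(branch_of_thread))


def independent_steps(branch_of_thread, cost_of_branch):
    """Steps needed when every thread runs independently on its own core."""
    return max(cost_of_branch[b] for b in branch_of_thread)


cost = {"left": 10, "right": 10}                    # hypothetical step counts
uniform = ["left"] * WARP_SIZE                      # all threads agree
divergent = ["left"] + ["right"] * (WARP_SIZE - 1)  # thread 1 disagrees

print(warp_steps(uniform, cost))           # prints: 10
print(warp_steps(divergent, cost))         # prints: 20
print(independent_steps(divergent, cost))  # prints: 10
```

In this model a single disagreeing thread doubles the running time of the whole warp, which is why tree traversals with data-dependent branches are awkward on a GPU.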

Example - Finding a conference hotel

- Close to the conference location - to make you happy
- Cheap - to make your department happy
- Skyline query: Minimize price and distance, returning all best trade-offs

A point p dominates* another point q if:
   - p is preferable or equivalent to q in all dimensions
   - p is strictly preferable to q in at least one dimension

The skyline [1] consists of points that are not dominated

[Figure: hotels plotted on Price and Distance axes, with example points p and q marked.]

*This is the same concept as Pareto dominance from Economics, but applied to databases.

[1] S. Börzsönyi et al. "The skyline operator." In Proc. ICDE (2001).
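The dominance definition translates directly into code. The following is a minimal sketch, assuming that smaller values are preferred in every dimension (as with price and distance) and that points are plain tuples. The hotel values are invented for illustration, and the quadratic `skyline` function is only a reference implementation, not one of the algorithms discussed in the talk.

```python
def dominates(p, q):
    """True if p dominates q, with smaller values preferred in every dimension."""
    no_worse_everywhere = all(a <= b for a, b in zip(p, q))
    strictly_better_somewhere = any(a < b for a, b in zip(p, q))
    return no_worse_everywhere and strictly_better_somewhere


def skyline(points):
    """Quadratic reference implementation: keep the points that nobody dominates."""
    return [q for q in points if not any(dominates(p, q) for p in points)]


# Hypothetical hotels as (price, distance) pairs.
hotels = [(50, 8), (80, 3), (120, 1), (90, 5), (130, 2), (50, 9)]

print(skyline(hotels))  # prints: [(50, 8), (80, 3), (120, 1)]
```

Here (90, 5) is dropped because (80, 3) is both cheaper and closer, while (50, 8) and (120, 1) both stay because each is the best trade-off at one end of the price range.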

The state of parallel skylines

- GGS [3] is the state-of-the-art GPU skyline algorithm
   - Run on Nvidia GTX Titan with 2688 cores at 0.8 GHz
- BSkyTree [7] is the sequential state-of-the-art
   - Run on a 3.4 GHz Intel i7-3770

[3] K.S. Bøgh et al., "Efficient GPU-based skyline computation", Proc. DaMoN, 2013.
[7] J. Lee and S.-w. Hwang, "Scalable skyline computation using a balanced pivot selection technique", Inf. Syst., 2014.

[Figure: two plots for BSkyTree and GGS against Cardinality, ×10^6 (1 to 8): Time (s) and Dom tests/n.]

Monotonic sorting

1. Compute monotonic score for each data point
2. Sort the data by the score
3. for i = 0, . . . , n − 1 do
4.    Append point i to candidate buffer if no point in candidate buffer dominates i

[Figure: the sorted data, split into the candidate buffer followed by the unprocessed points.]
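The four steps can be sketched as follows. The slide does not name a particular score, so the coordinate sum used here is an assumption: it is monotonic in the sense that a point always scores strictly lower than any point it dominates, so a dominating point is always processed first. Smaller values are assumed to be preferred, and the hotel data is made up.

```python
def dominates(p, q):
    """True if p dominates q, with smaller values preferred in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def sorted_skyline(points):
    """Skyline via monotonic sorting followed by one pass over the data."""
    # Steps 1 and 2: score each point (assumed score: coordinate sum) and sort.
    ordered = sorted(points, key=sum)

    # Steps 3 and 4: a point joins the candidate buffer unless a candidate
    # dominates it. Thanks to the sort order, candidates are never evicted.
    candidates = []
    for point in ordered:
        if not any(dominates(c, point) for c in candidates):
            candidates.append(point)
    return candidates


# Hypothetical hotels as (price, distance) pairs.
hotels = [(50, 8), (80, 3), (120, 1), (90, 5), (130, 2), (50, 9)]

print(sorted_skyline(hotels))  # prints: [(50, 8), (80, 3), (120, 1)]
```

Because no later point can dominate an earlier one, the candidate buffer only grows, and every point in it is already a confirmed skyline point.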

Object-based partitioning

- Partitions the data recursively
- Builds a search tree on the fly to minimize data point comparisons
- Stores bit masks in nodes to minimize dominance tests

[Figure: points A to F in the data space, and the search tree built from them, with C at the top, B and E below it, and A, D and F at the bottom.]
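One common way for bit masks to save dominance tests is to record, per dimension, on which side of a pivot point each point lies. The sketch below illustrates that idea only; the slide does not say how the masks are encoded, so the bit convention (bit i set when the point is no better than the pivot in dimension i, smaller being better) and the function names are assumptions.

```python
def dominates(p, q):
    """True if p dominates q, with smaller values preferred in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def mask(point, pivot):
    """Bit i is set when point is no better than pivot in dimension i."""
    bits = 0
    for i, (x, v) in enumerate(zip(point, pivot)):
        if x >= v:
            bits |= 1 << i
    return bits


def may_dominate(mask_p, mask_q):
    """False: p certainly does not dominate q. True: a full test is still needed.

    If p dominates q, every bit set in mask_p must also be set in mask_q,
    so a bit set only in mask_p rules dominance out without touching the data.
    """
    return (mask_p & ~mask_q) == 0


pivot = (5, 5)                           # hypothetical pivot
p, q, r = (2, 7), (6, 3), (7, 8)         # hypothetical points

print(bin(mask(p, pivot)), bin(mask(q, pivot)), bin(mask(r, pivot)))
# prints: 0b10 0b1 0b11
print(may_dominate(mask(p, pivot), mask(q, pivot)))  # prints: False
print(may_dominate(mask(p, pivot), mask(r, pivot)))  # prints: True
```

The first comparison is settled by two integer operations. Only the second one needs the full `dominates(p, r)` test, which in this example returns True.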

Control flow of Hybrid

[Figure: three stages, Phase I, Phase II and Update, each drawn with the solution tree and blocks labelled α.]

Phase I is ideal; Phase II is cache-resident; Update phase is sequential
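The slide names three stages but does not spell out what each one does. The sketch below is one plausible reading, offered under several assumptions: the data is pre-sorted by a monotonic score and processed in blocks of α points, Phase I compares a block against the solution found so far, Phase II compares the survivors of a block with each other, and the update appends them. A flat list stands in for the solution tree and the loops run sequentially, so this shows the control flow only and not the parallel implementation.

```python
def dominates(p, q):
    """True if p dominates q, with smaller values preferred in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def blocked_skyline(points, alpha):
    """Skyline computed block by block; alpha is the assumed block size."""
    ordered = sorted(points, key=sum)  # assumption: monotonic pre-sort
    solution = []                      # flat list standing in for the solution tree

    for start in range(0, len(ordered), alpha):
        block = ordered[start:start + alpha]

        # Phase I: test every block point against the solution so far.
        # The tests are independent of each other, so they could run in parallel.
        survivors = [q for q in block
                     if not any(dominates(s, q) for s in solution)]

        # Phase II: test the survivors against each other. At most alpha
        # points are involved, so the working set stays small.
        survivors = [q for q in survivors
                     if not any(dominates(p, q) for p in survivors)]

        # Update: add the survivors to the solution, one after another.
        solution.extend(survivors)

    return solution


# Hypothetical hotels as (price, distance) pairs.
hotels = [(50, 8), (80, 3), (120, 1), (90, 5), (130, 2), (50, 9)]

print(blocked_skyline(hotels, alpha=2))  # prints: [(50, 8), (80, 3), (120, 1)]
```

In this reading the block size trades parallelism against wasted work: a larger α gives more independent tests per phase, but points in the same block cannot prune each other until Phase II.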

Page 40: Work-Efficient Parallel Skyline Computation for the …daslab.seas.harvard.edu/classes/cs265/files/visiting...Work-Efficient Parallel Skyline Computation for the GPU Kenneth S. Bøgh,

Static median/quartile based partitioning

Fixed two-level tree, based on median and quartile valuesCan be built in parallelEnables predictable branching

p0

p0

p1

p2

p3

p5 p6

p4

Kenneth S. Bøgh, Sean Chester, Ira Assent (Aarhus University)Parallel Skyline Computation HU, 11 Feb 2016 11 / 22

Page 41: Work-Efficient Parallel Skyline Computation for the …daslab.seas.harvard.edu/classes/cs265/files/visiting...Work-Efficient Parallel Skyline Computation for the GPU Kenneth S. Bøgh,

Static median/quartile based partitioning

Fixed two-level tree, based on median and quartile valuesCan be built in parallelEnables predictable branching

p0

p0

p1

p2

p3

p5 p6

p4

Kenneth S. Bøgh, Sean Chester, Ira Assent (Aarhus University)Parallel Skyline Computation HU, 11 Feb 2016 11 / 22

Page 42: Work-Efficient Parallel Skyline Computation for the …daslab.seas.harvard.edu/classes/cs265/files/visiting...Work-Efficient Parallel Skyline Computation for the GPU Kenneth S. Bøgh,

Static median/quartile based partitioning

Fixed two-level tree, based on median and quartile valuesCan be built in parallelEnables predictable branching

p0

p0

p1

p2

p3

p5 p6

p4

Kenneth S. Bøgh, Sean Chester, Ira Assent (Aarhus University)Parallel Skyline Computation HU, 11 Feb 2016 11 / 22

Page 43: Work-Efficient Parallel Skyline Computation for the …daslab.seas.harvard.edu/classes/cs265/files/visiting...Work-Efficient Parallel Skyline Computation for the GPU Kenneth S. Bøgh,

Static median/quartile based partitioning

Fixed two-level tree, based on median and quartile valuesCan be built in parallelEnables predictable branching

p0

p0

p1

p2

p3

p5 p6

p4

Kenneth S. Bøgh, Sean Chester, Ira Assent (Aarhus University)Parallel Skyline Computation HU, 11 Feb 2016 11 / 22

Page 44: Work-Efficient Parallel Skyline Computation for the …daslab.seas.harvard.edu/classes/cs265/files/visiting...Work-Efficient Parallel Skyline Computation for the GPU Kenneth S. Bøgh,

Static median/quartile based partitioning

Fixed two-level tree, based on median and quartile valuesCan be built in parallelEnables predictable branching

p0

p0

p1

p2

p3

p5 p6

p4

Kenneth S. Bøgh, Sean Chester, Ira Assent (Aarhus University)Parallel Skyline Computation HU, 11 Feb 2016 11 / 22

Page 45: Work-Efficient Parallel Skyline Computation for the …daslab.seas.harvard.edu/classes/cs265/files/visiting...Work-Efficient Parallel Skyline Computation for the GPU Kenneth S. Bøgh,

Static median/quartile based partitioning

Fixed two-level tree, based on median and quartile valuesCan be built in parallelEnables predictable branching

p0

p0

p1

p2

p3

p5 p6

p4

Point Median Quartilep0 M0 = −− Q0 = −−

Kenneth S. Bøgh, Sean Chester, Ira Assent (Aarhus University)Parallel Skyline Computation HU, 11 Feb 2016 11 / 22

Page 46: Work-Efficient Parallel Skyline Computation for the …daslab.seas.harvard.edu/classes/cs265/files/visiting...Work-Efficient Parallel Skyline Computation for the GPU Kenneth S. Bøgh,

Static median/quartile based partitioning

Fixed two-level tree, based on median and quartile valuesCan be built in parallelEnables predictable branching

p0

p0

p1

p2

p3

p5 p6

p4

Point Median Quartilep0 M0 = −− Q0 = −−

Kenneth S. Bøgh, Sean Chester, Ira Assent (Aarhus University)Parallel Skyline Computation HU, 11 Feb 2016 11 / 22

Page 47: Work-Efficient Parallel Skyline Computation for the …daslab.seas.harvard.edu/classes/cs265/files/visiting...Work-Efficient Parallel Skyline Computation for the GPU Kenneth S. Bøgh,

Static median/quartile based partitioning

Fixed two-level tree, based on median and quartile valuesCan be built in parallelEnables predictable branching

p0

p0

p1

p2

p3

p5 p6

p4

Point Median Quartilep0 M0 = 0− Q0 = −−

Kenneth S. Bøgh, Sean Chester, Ira Assent (Aarhus University)Parallel Skyline Computation HU, 11 Feb 2016 11 / 22

Page 48: Work-Efficient Parallel Skyline Computation for the …daslab.seas.harvard.edu/classes/cs265/files/visiting...Work-Efficient Parallel Skyline Computation for the GPU Kenneth S. Bøgh,

Static median/quartile based partitioning

Fixed two-level tree, based on median and quartile valuesCan be built in parallelEnables predictable branching

p0

p0

p1

p2

p3

p5 p6

p4

Point Median Quartilep0 M0 = 0− Q0 = −−

Kenneth S. Bøgh, Sean Chester, Ira Assent (Aarhus University)Parallel Skyline Computation HU, 11 Feb 2016 11 / 22

Page 49: Work-Efficient Parallel Skyline Computation for the …daslab.seas.harvard.edu/classes/cs265/files/visiting...Work-Efficient Parallel Skyline Computation for the GPU Kenneth S. Bøgh,

Static median/quartile based partitioning

Fixed two-level tree, based on median and quartile valuesCan be built in parallelEnables predictable branching

p0p0

p1

p2

p3

p5 p6

p4

Point Median Quartilep0 M0 = 01 Q0 = −−

Kenneth S. Bøgh, Sean Chester, Ira Assent (Aarhus University)Parallel Skyline Computation HU, 11 Feb 2016 11 / 22

Page 50: Work-Efficient Parallel Skyline Computation for the …daslab.seas.harvard.edu/classes/cs265/files/visiting...Work-Efficient Parallel Skyline Computation for the GPU Kenneth S. Bøgh,

Static median/quartile based partitioning

Fixed two-level tree, based on median and quartile valuesCan be built in parallelEnables predictable branching

[Figure: the example points p0 to p6]

Point  Median    Quartile
p0     M0 = 01   Q0 = 10
p6     M6 = 01   Q6 = 11
p5     M5 = 01   Q5 = 01
p4     M4 = 10   Q4 = 01
p3     M3 = 10   Q3 = 10
p2     M2 = 10   Q2 = 11
p1     M1 = 11   Q1 = 00

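Resulting tree (level M: median mask, level Q: quartile mask):

M = 01
    Q = 10: P0
    Q = 11: P6
    Q = 01: P5
M = 10
    Q = 01: P4
    Q = 10: P3
    Q = 11: P2
M = 11
    Q = 00: P1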
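The mask assignment behind the table and tree can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation. The slides do not spell out the bit convention, so the sketch assumes that bit i is set when a point's value in dimension i is at or above the pivot, and that the quartile pivot is the median of the half the point falls into. The names partition_masks and build_two_level_order are hypothetical.

```python
from statistics import median


def partition_masks(points):
    """Return one (median_mask, quartile_mask) pair per point.

    Assumed convention: bit i is set when the value in dimension i is
    at or above the pivot for that dimension.
    """
    dims = len(points[0])
    med = [median(p[i] for p in points) for i in range(dims)]
    # Quartile pivots: the median of the lower half and of the upper half.
    lo_q = [median([p[i] for p in points if p[i] < med[i]] or [med[i]])
            for i in range(dims)]
    hi_q = [median([p[i] for p in points if p[i] >= med[i]] or [med[i]])
            for i in range(dims)]
    masks = []
    for p in points:  # each point is independent, so this loop parallelizes
        m = q = 0
        for i in range(dims):
            upper = p[i] >= med[i]
            pivot = hi_q[i] if upper else lo_q[i]
            m |= int(upper) << i
            q |= int(p[i] >= pivot) << i
        masks.append((m, q))
    return masks


def build_two_level_order(points):
    """Sort point indices by (median mask, quartile mask).

    Points that share a median mask end up adjacent, and within that
    group points that share a quartile mask end up adjacent, which is
    the grouping a fixed two-level tree needs.
    """
    masks = partition_masks(points)
    return sorted(range(len(points)), key=lambda j: masks[j])


if __name__ == "__main__":
    pts = [(1, 1), (2, 8), (8, 2), (9, 9)]
    print(partition_masks(pts))
    print(build_two_level_order(pts))
```

Because every mask depends only on the point itself and the shared pivots, the per-point loop has no dependencies between iterations, which is consistent with the claim that the tree can be built in parallel.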


The SkyAlign workflow

Warps: w1 w2 w3 w4
Highlighted: w3, P3, P2

Steps, highlighted one at a time:
    Compare
    Compare
    Descent
    Compare
    Dominance test
    Compare
    Dominance test
    Compare

[Figure: the two-level M/Q tree from the previous slide and the example points p0 to p6]
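The Compare, Descent, and Dominance test steps can be illustrated with a sequential Python sketch. It is not the SkyAlign kernel. It assumes that smaller values are preferred, that bit i of a mask is set when the value in dimension i is at or above the pivot, and that the tree is stored as a nested dictionary. The function names and the hand-picked pivots (median 5, quartile pivots 2.5 and 7.5) are assumptions made for the example.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def is_dominated(p, p_m, p_q, tree):
    """Check point p (median mask p_m, quartile mask p_q) against the tree.

    tree maps median mask -> {quartile mask -> list of points}.
    """
    for m, children in tree.items():
        # Compare: a partition with a bit set where p has none lies in a
        # worse half in that dimension, so nothing in it can dominate p.
        if m & ~p_m:
            continue
        # Dimensions in which both sides share a half, and hence a pivot.
        same_half = ~(m ^ p_m)
        for q, bucket in children.items():  # Descent to the quartile level
            # Compare: same reasoning, restricted to the shared dimensions.
            if q & ~p_q & same_half:
                continue
            # Dominance test: only now are the actual values inspected.
            if any(dominates(o, p) for o in bucket):
                return True
    return False


# Masks below were assigned by hand using the assumed pivots.
a, b, c, d, e = (1, 9), (9, 1), (6, 6), (8, 8), (3, 3)
masks = {a: (2, 2), b: (1, 1), c: (3, 0), d: (3, 3), e: (0, 3)}
tree = {2: {2: [a]}, 1: {1: [b]}, 3: {0: [c], 3: [d]}, 0: {3: [e]}}
skyline = [p for p, (m, q) in masks.items() if not is_dominated(p, m, q, tree)]
```

The two mask comparisons are cheap bitwise operations that prune whole subtrees, so full dominance tests run only against buckets that could still contain a dominating point.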


Experimental setup

Intel i7-3770 with 4 cores at 3.4 GHz and hyperthreading enabled
Nvidia GTX Titan with 2688 cores at 0.8 GHz
Transfer of data to and from the GPU is included in the running time
Tree building is included in the running time
Compared algorithms:
    BSkyTree [7]: State-of-the-art sequential algorithm
    Hybrid [4]: The proposed multicore algorithm (run with 8 threads)
    GGS [3]: Previous state-of-the-art, tree-less GPU algorithm
    SkyAlign [2]: The proposed GPU algorithm

Download all code: http://cs.au.dk/research-at-cs/data-intensive-systems/repository/

[2] K.S. Bøgh et al., “Work-efficient parallel skyline computation for the GPU”, PVLDB, 8:9, 962–973, 2015.
[3] K.S. Bøgh et al., “Efficient GPU-based skyline computation”, Proc. DaMoN, 2013.
[4] S. Chester et al., “Scalable parallelization of skyline computation for multi-core processors”, ICDE, 2015.
[7] J. Lee and S.-w. Hwang, “Scalable skyline computation using a balanced pivot selection technique”, Inf. Syst., 2014.


Evaluating running time

[Figure: Time (ms), log scale, against Cardinality (×10^6) and against Dimensionality, for ANTICORRELATED and INDEPENDENT data. Series: BSkyTree, Hybrid, GGS, SkyAlign.]


Evaluating dominance tests

[Figure: Dom tests/n, log scale, against Cardinality (×10^6) and against Dimensionality, for ANTICORRELATED and INDEPENDENT data. Series: BSkyTree, Hybrid, GGS, SkyAlign.]


Evaluating work

[Figure: Work, log scale, against Cardinality (×10^6) and against Dimensionality, for ANTICORRELATED and INDEPENDENT data. Series: BSkyTree, Hybrid, GGS, SkyAlign.]


Evaluating running time on the CPU

[Figure: Time (ms), log scale, for ANTICORRELATED and INDEPENDENT data. Series: Hybrid, GGS, SkyAlign.]


Scalability

[Figure: Time (ms) against Cores (2x14 and 4x8 configurations), for ANTICORRELATED and INDEPENDENT data. Series: Hybrid, GGS, SkyAlign.]


Evaluating clock cycles per instruction (CPI)

[Figure: CPI for ANTICORRELATED and INDEPENDENT data. Series: Hybrid, GGS, SkyAlign.]


Current research: The RUM conjecture

Trade-offs are present in all parts of computer science
Each field has its own major components between which trade-offs are made
The Data Systems Laboratory has recently formalized this for data systems
The result is the RUM conjecture

Read-overhead - The overhead of reading data
Update-overhead - The overhead of updating data
Memory-overhead - The additional storage used
Optimize for at most two - at the cost of the third

[Figure: diagram relating Read, Update, and Memory]

Array states shown in the slide builds:

8 4 9 1 5 0 2
8 4 9 1 5 0 2 3
8 4 7 1 5 0 2 3

0 1 2 4 5 8 9
0 1 2 3 4 5 8 9

1 2 0 5 8 4 9          (partition bounds: <4, <9)
1 2 0 8 4 5 9
1 2 0 3 8 4 5 9

1 2 0 G 5 8 4 G 9 G    (partition bounds: <4, <9)
1 2 0 3 5 8 4 G 9 G
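The array states above can be read as the same insert of the value 3 into differently organized arrays. The Python sketch below reproduces three of them and reports how many existing elements each insert has to move. It is a rough illustration, not a model from the RUM work. It assumes that the G entries on the slides are the ghost values mentioned under Open questions, and it stores each partition as its own list with None as the ghost slot. All function names are hypothetical.

```python
import bisect


def insert_unsorted(a, v):
    """Append to an unsorted array: no element moves, but reads must scan."""
    a.append(v)
    return 0


def insert_sorted(a, v):
    """Insert into a sorted array: every larger element shifts right."""
    pos = bisect.bisect_left(a, v)
    a.insert(pos, v)
    return len(a) - 1 - pos


def insert_ghost(parts, bounds, v):
    """Insert into a partitioned array that keeps ghost (empty) slots.

    parts is a list of partitions and bounds holds the exclusive upper
    bound of every partition except the last. None marks a ghost slot.
    Raises ValueError if the target partition has no ghost slot left.
    """
    idx = bisect.bisect_right(bounds, v)  # partition whose range holds v
    slot = parts[idx].index(None)
    parts[idx][slot] = v
    return 0  # the spare slot absorbs the insert, at the cost of memory


unsorted = [8, 4, 9, 1, 5, 0, 2]
moved_unsorted = insert_unsorted(unsorted, 3)

ordered = [0, 1, 2, 4, 5, 8, 9]
moved_sorted = insert_sorted(ordered, 3)

ghosted = [[1, 2, 0, None], [5, 8, 4, None], [9, None]]
moved_ghost = insert_ghost(ghosted, [4, 9], 3)
```

The sorted array pays for cheap reads with four element moves on this insert, while the ghost-slot layout avoids the moves by spending extra memory, which is the kind of three-way trade-off the conjecture describes.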


Open questions

Which of the approaches is better?
How many partitions should we choose?
How should the partitions be distributed?
How should ghost values be distributed?
Can we extend this idea to indexes?
