
Computation of Skyline points using parallel scan algorithms on GPU devices

This thesis is presented to the
School of Computer Science & Software Engineering
for the degree of
Master of Science (By Research)
of
The University of Western Australia

by
Maria Luisa Bravo-Rojas

February 2014

© Copyright 2014 by Maria Luisa Bravo-Rojas


Abstract

The computation of skyline points has become a particularly interesting topic in recent years because of its applications in multi-criteria decision-making systems. Although many efficient algorithms have been designed, several important issues related to this problem still persist. In particular, most algorithms are inefficient when applied to high-dimensional datasets, because the size of the skyline grows with the number of dimensions. Given that most operational databases are very large, existing skyline algorithms perform poorly; to avoid this inefficiency, some algorithms sacrifice the accuracy of the results by removing from the dataset dimensions deemed less significant.

Our research proposes a method to compute skylines for high-dimensional databases with large cardinality that delivers accurate results and fast processing of the data. To this end, we have studied the application of parallel programming techniques and the benefits provided by GPGPU development frameworks.

We have found that simplicity pays off when implementing parallelism: our proposed algorithm scans the data without creating auxiliary data structures and makes extensive use of the functionality provided by the GPGPU framework to reduce computing time. Furthermore, our implementation benefits from hardware features of GPU devices such as the texture memory space, while the GPGPU framework guarantees the portability of our implementation.

Besides cardinality and dimensionality, the correlation coefficient is a decisive factor in the characterization of the data. We have therefore designed a group of tests using datasets with different levels of correlation in order to evaluate our algorithm's performance.

To Maruja and Esteban


Acknowledgements

Peruvian people preserve ancient memories of the "Amautas", wise people dedicated to teaching in the time of the Inka Empire. In Peru, our government awards the title of "Amauta" to outstanding teachers as a sign of respect and acknowledgement. Throughout the course of my studies at UWA I have enjoyed the opportunity to work in an environment enriched by the presence of people who brought the Amauta tradition to my mind.

In that spirit, I would like to thank my main supervisor, Professor Amitava Datta, and my co-supervisor, Associate Professor Chris McDonald, for their supervision and guidance. I am particularly grateful to Professor Datta for the opportunity to study for a research degree.

Finally, and always first, my immeasurable gratitude for my family's limitless, boundless support. We are the Bravos.

Contents

Abstract
Acknowledgements
Contents
List of Tables
List of Figures

1 Introduction
  1.1 Motivation and objectives
    1.1.1 Challenges
    1.1.2 Solution
  1.2 Contribution
  1.3 Organization of the Thesis

2 Literature review
  2.1 The Parallel Programming model
  2.2 General Purpose Computing on GPU (GPGPU)
  2.3 Skyline Points
  2.4 Summary

3 The algorithm
  3.1 Notations and Definitions
  3.2 The Algorithm's method for computation of Skyline points in high-dimensional, high-cardinality data
    3.2.1 The not-dominated relationship and the partitioning of the dataset
    3.2.2 The weighted-sum approach implemented as a sorting criterion
  3.3 Implementation of the parallel scan algorithm
    3.3.1 Main algorithm (Figure 5)
    3.3.2 Parallelizing the brute-force algorithm in the GPU
    3.3.3 Parallelizing the brute-force algorithm in the CPU
  3.4 Summary

4 The experiments
  4.1 The experimental environment
  4.2 Dataset generation
    4.2.1 Random datasets
    4.2.2 Dependent datasets
    4.2.3 Real-life datasets
  4.3 Results and analysis
    4.3.1 Results on synthetic data
    4.3.2 Results on real-life data
  4.4 Summary

5 Conclusions

Bibliography

List of Tables

1  NASDAQ values at 20-Nov-2012
2  Tasks executed
3  % Skyline points in not-correlated synthetic data
4  % Skyline points in anti-correlated synthetic data
5  % Skyline points in correlated synthetic data
6  Algorithm's computing time for not-correlated synthetic datasets
7  Algorithm's computing time for anti-correlated synthetic datasets
8  Algorithm's computing time for correlated synthetic datasets
9  % Skyline points in real-life data
10 Algorithm's computing time for real-life datasets

List of Figures

1  q belongs to the data space under SA
2  q belongs to SA
3  x ⊀ SA
4  x ≺ SA
5  Block diagram of the parallel scan algorithm
6  Texture memory in the GPU
7  Tuple t mapped into texture memory
8  Flow diagram of the parallel brute-force algorithm
9  Flow diagram of the Kernel
10 Histogram for correlations observed between each pair of dimensions in a 10K not-correlated synthetic dataset
11 Histogram for correlations observed between each pair of dimensions in a 10K anti-correlated synthetic dataset
12 Histogram for correlations observed between each pair of dimensions in a 10K correlated synthetic dataset
13 Histogram for correlations observed between each pair of dimensions in the correlated NBA dataset
14 Histogram for correlations observed between each pair of dimensions in the not-correlated Microarray dataset
15 Histogram for correlations observed between each pair of dimensions in the correlated EEG1 dataset
16 Histogram for correlations observed between each pair of dimensions in the correlated EEG2 dataset
17 Algorithm's computing time for processing one cell on not-correlated synthetic datasets
18 Algorithm's computing time for processing one cell on anti-correlated synthetic datasets
19 Algorithm's computing time for processing one cell on correlated synthetic datasets
20 GPU effect on unsorted not-correlated synthetic datasets
21 GPU effect on sorted not-correlated synthetic datasets
22 GPU effect on unsorted anti-correlated synthetic datasets
23 GPU effect on sorted anti-correlated synthetic datasets
24 GPU effect on unsorted correlated synthetic datasets
25 GPU effect on sorted correlated synthetic datasets
26 Sorting effect on not-correlated synthetic datasets (GPU algorithm)
27 Sorting effect on not-correlated synthetic datasets (CPU algorithm)
28 Sorting effect on anti-correlated synthetic datasets (GPU algorithm)
29 Sorting effect on anti-correlated synthetic datasets (CPU algorithm)
30 Sorting effect on correlated synthetic datasets (GPU algorithm)
31 Sorting effect on correlated synthetic datasets (CPU algorithm)
32 Algorithm's computing time for processing one cell on real-life datasets
33 GPU effect on unsorted real-life datasets
34 GPU effect on sorted real-life datasets
35 Sorting effect on real-life datasets processed using the GPU algorithm
36 Sorting effect on real-life datasets processed using the CPU algorithm
37 Comparing effect of the algorithms in not-correlated synthetic datasets
38 Comparing effect of the algorithms in anti-correlated synthetic datasets
39 Comparing effect of the algorithms in correlated synthetic datasets
40 Comparing effect of the algorithms in real-life datasets
41 Top points cluster

Chapter 1

Introduction

The aim of this dissertation is to present fast parallel algorithms for computing the skyline of a multi-dimensional dataset. We first motivate this problem through the following example. A novice investor wants to buy shares and must decide where to invest. After some market research, the investor analyzes the following information¹ in order to make the decision:

Company     Price (US$)   EPS (US$)       P/E
Microsoft         26.73        1.86     14.36
Apple            562.75       44.16     12.70
Google           672.40       31.94     20.98
Yahoo             18.24        3.28      5.56
Amazon           234.09        0.08   2922.25
Facebook          23.22        0.14    165.00

Table 1: NASDAQ values at 20-Nov-2012

Based on price alone, the most viable candidate would be Yahoo, but other variables, such as EPS (earnings-per-share ratio) and P/E (price-to-earnings ratio), should be considered as well. The EPS ratio is an indicator of a company's profitability and the P/E ratio forecasts a company's earnings growth. Therefore, the investor will make the best choice by looking for a low price, a high EPS and a high P/E.

¹ http://www.marketwatch.com/investing/stock/


Humans face decision-making problems of varying complexity on a daily basis. For the individual investor evaluating three simple variables, this could be thought of as a simple problem. However, to make good decisions, investment companies apply sophisticated techniques to maximize profit and, quite often, they need to analyze many variables [3].

Optimization methods approach the analysis of such complex systems by designing mathematical models. The performance criteria are composed of objective functions that are usually mutually conflicting. The procedure of obtaining one or more optimal solutions for a set of objective functions is known as a Multi-Objective Optimization Problem (MOOP) [40].

An objective function can reach an optimal value that is either a maximum or a minimum. Assuming that the optimization looks for maximum values, the MOOP is formally represented by [41]:

    Maximize    F(x) = [f1(x), f2(x), ..., fM(x)]
    subject to  G(x) = [g1(x), g2(x), ..., gJ(x)] ≥ 0
                H(x) = [h1(x), h2(x), ..., hK(x)] = 0
                xi^L ≤ xi ≤ xi^U,   i = 1, ..., N

where x = (x1, x2, ..., xN)^T is the vector of the N decision variables, M is the number of objectives fi, there are J inequality constraints gj and K equality constraints hk, and xi^L and xi^U are respectively the lower and upper bounds for each decision variable xi.

The set of optimal solutions constitutes the Pareto-optimal front and is obtained by applying the concept of dominance to compare each pair of possible solutions.

A solution x1 is said to dominate another solution x2, written x1 ≺ x2, if both of the following conditions hold (using the minimization convention):

1. The solution x1 is not worse than x2 in all objectives.
2. The solution x1 is strictly better than x2 in at least one objective.

Therefore x1 ≺ x2 if fi(x1) ≤ fi(x2) for all i ∈ {1, ..., M} and there exists j ∈ {1, ..., M} such that fj(x1) < fj(x2).

The skyline of a dataset groups the not-dominated points, i.e., the points that are not dominated by any other point. These not-dominated points constitute a subset of the entire dataset. Consequently, when a unique point dominates the entire dataset, the skyline consists of that point alone. We can conclude that the search for a skyline amounts to the computation of all not-dominated points in the dataset.
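As a concrete illustration, the dominance test and a brute-force skyline computation can be sketched in a few lines. The sketch below applies the preferences stated for Table 1 (minimize price, maximize EPS and P/E); it is only an illustration of the definitions above, not the thesis's GPU implementation:

```python
# Brute-force skyline over the Table 1 data.
# Preferences: minimize price, maximize EPS, maximize P/E.
# Maximized criteria are negated so that every objective is minimized.

stocks = {
    "Microsoft": (26.73, 1.86, 14.36),
    "Apple":     (562.75, 44.16, 12.70),
    "Google":    (672.40, 31.94, 20.98),
    "Yahoo":     (18.24, 3.28, 5.56),
    "Amazon":    (234.09, 0.08, 2922.25),
    "Facebook":  (23.22, 0.14, 165.00),
}

def to_objectives(t):
    price, eps, pe = t
    return (price, -eps, -pe)            # all objectives minimized

def dominates(a, b):
    """a ≺ b: a is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    objs = {k: to_objectives(v) for k, v in points.items()}
    return [k for k in objs
            if not any(dominates(objs[j], objs[k]) for j in objs if j != k)]

print(skyline(stocks))
# With these three conflicting criteria no stock dominates another, so all six
# companies belong to the skyline, illustrating how quickly skylines can grow.
```

Note that even this tiny three-dimensional example produces a skyline containing the whole dataset, which previews the thesis's concern with skyline growth in high dimensions.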

1.1 Motivation and objectives

Chaudhuri [9] addressed the impact of the Big Data phenomenon on the data management industry. Query optimization and Operational Business Intelligence are shortlisted as key issues that challenge researchers and developers.

The skyline operator was proposed by Borzsony in 2001; nevertheless, the ISO SQL:2011 standard still does not incorporate it [64]. The ISO committee for information technology approaches the development of the standard by looking for cost-effective, short development times and market-oriented results. Given that market evolution stresses the need for efficient data analysis frameworks, industry researchers should be working to provide feasible algorithms that improve query efficiency and are simple enough to be used in complex Business Intelligence platforms.


Therefore, this work aims to achieve the following goals:

• Propose a new algorithm for computing Skylines in high-dimensional databases.

• Test the performance of the proposed algorithm.

1.1.1 Challenges

The design of a new Skyline algorithm confronts two main issues:

• Overwhelming complexity is to be avoided in order to reduce computational time.

• The algorithm should optimize memory usage.

Other works [59] have analyzed the impact of run-time complexity on search algorithms when processing high-dimensional datasets. Given that state-of-the-art skyline algorithms [29] [44] are based on complex searches, this work aims to provide an alternative solution.

Memory availability remains an ongoing issue in the design of skyline algorithms, as more businesses need to process ever larger datasets because of the increasing rate of data collection.

1.1.2 Solution

This work proposes a new, simple parallel scan algorithm that computes skyline points in high-dimensional, large datasets using a heterogeneous computing approach. Our algorithm is simple and compares well with existing algorithms in terms of performance.

1.2 Contribution

The achievement of the goals listed above constitutes the contribution of this thesis. We implement a new scan algorithm for computing skylines on GPU devices, run a battery of tests using different types of datasets characterized by high dimensionality and high cardinality, and analyse the results.

Additionally, our work presents an implementation that takes advantage of heterogeneous computing techniques, following the current trend of designing solutions able to utilize hybrid architectures.

1.3 Organization of the Thesis

This thesis is structured as follows. In Chapter 2 we examine parallel programming models and review some previous algorithms for computing skyline points. Chapter 3 presents our proposed parallel algorithm on the GPGPU framework. The evaluation of our algorithm is discussed in Chapter 4, elaborating on the impact of data correlation on the algorithm's performance. Finally, Chapter 5 summarizes the contributions and limitations of our work.


Chapter 2

Literature review

Throughout this chapter, we examine the parallel programming model and review previous work on the design of algorithms to compute skyline points. Given that our work aims to take advantage of parallel processing features, we begin this chapter by presenting a summarized view of parallel programming models and their compatible machine architectures. Then we examine the GPU model and the hardware support provided by this technology. Finally, we discuss work related to our research.

2.1 The Parallel Programming model¹

Parallel programming has been defined as "a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently" [1].

The computer resources used in this computation model are provided differently in different models, e.g., by a single multicore processor computer (multicore computing), an arbitrary number of computers connected by a network (distributed computing), specialized parallel processors as in general-purpose computing on graphics processing units (GPGPU), or a combination of any of the parallel architectures mentioned above.

¹ This section is based on the online resources provided for the "Applications of Parallel Computers" course at the University of California, Berkeley (http://www.cs.berkeley.edu/~knight/cs267/resources.html).


Parallel computer architectures provide parallelism, communication and synchronization functionalities. According to the style of parallelism implemented, computers and software are classified as single-instruction-multiple-data (SIMD), multiple-instruction-multiple-data (MIMD) or multiple-instruction-single-data (MISD). Communication patterns follow the architecture's memory model, e.g., shared address space machines (shared memory), distributed address space machines (distributed memory) and the hybrid distributed-shared memory model.

A programming model is made up of the languages and libraries that create an interface presenting an abstract view of the machine; this interface is employed by the programmer to write programs. Any parallel programming model must provide the user with the capability to express parallelism, communication and synchronization in an algorithm.

The first parallel architectures brought programming models tailored to the architecture's specifications and unable to support portability and upgrades. At present, several architecture-independent parallel programming models coexist, supported by compatible machine models.

The Shared Memory programming model delivers programs structured as a collection of parallel tasks assigned to threads of control, where each thread has a set of private variables (local stack variables) and a set of shared variables (global heap). These threads communicate implicitly by writing and reading shared variables, and coordinate by synchronizing on shared variables, implementing protection mechanisms such as locks, semaphores and monitors to control concurrent access. A number of libraries implement the Shared Memory model; widely used are the POSIX threading interface PThreads and the OpenMP specification for parallel programming. Shared Memory programming is supported by computers based on the Shared Memory, Multithreaded Processors or Distributed Shared Memory machine models.
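The shared-variables-plus-lock pattern described above can be sketched in a few lines. The sketch below uses Python's threading module rather than PThreads or OpenMP, purely as an illustration of the model:

```python
import threading

counter = 0                      # shared variable (global heap)
lock = threading.Lock()          # protection mechanism for concurrent access

def worker(iterations):
    global counter
    local = 0                    # private variable (thread-local)
    for _ in range(iterations):
        local += 1
    with lock:                   # synchronize before touching shared state
        counter += local         # implicit communication via shared memory

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)                   # 40000: no updates are lost thanks to the lock
```

Without the lock, the read-modify-write of the shared variable could interleave between threads and silently lose updates, which is exactly the hazard the protection mechanisms above exist to prevent.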

The first category provides a global memory shared by all processors; tasks running on different processors communicate with each other by writing to and reading from the global memory. Multithreaded Processors attempt to reduce or hide latency by supporting multiple concurrent streams of threads that are independent of each other; these threads are mapped onto hardware contexts that include general-purpose registers, program counters and status registers. The Cray MTA supercomputer implements the Multithreaded Processors model. In the Distributed Shared Memory architecture, physically separated memories can be addressed as one logically shared address space; the NASA Columbia supercomputer built by SGI constitutes an example of this model.

In the Message Passing programming model, every processor executes an independent process and processes communicate by calling subroutines to send data from one processor to another. The address space is local; there is no shared data. Software support is provided by MPI (Message Passing Interface), the de facto standard for message passing applications. Message Passing is supported by the Distributed Memory and Internet/Grid Computing machine models.

The Distributed Memory machine model provides each processor with local memory and cache, without direct access to another processor's memory. The Grid parallel model has been defined as "a type of parallel and distributed system that enables the sharing, selection and aggregation of geographically distributed autonomous resources dynamically at runtime depending on their availability, capability, performance, cost and user's quality-of-service requirements" [8]. Two important implementations of this model are NASA's Information Power Grid² and the SETI@home project³.

A particular case is the Partitioned Global Address Space (PGAS) programming model, which attempts to combine the data locality features of MPI with the data referencing simplicity facilitated by the Shared Memory model. This objective is achieved by providing local and shared data with a local-view programming style that differentiates between local and remote data partitions. Languages like Unified Parallel C (UPC), Co-Array Fortran (CAF) and Titanium implement PGAS, while the Cray XK7 supercomputer provides support for this programming model⁴.

² http://ntrs.nasa.gov/search.jsp?R=20010111389
³ http://setiathome.berkeley.edu/

Applications based on the Data Parallel programming model consist of parallel operations applied to all, or a defined subset, of a data structure. The High Performance Fortran (HPF), CM Fortran and Fortran 90 languages support the Data Parallel model, while hardware support for this model has been provided by vector and SIMD architectures. Vector architectures operating on one-dimensional arrays of data were introduced by the Cray platforms; later, the SIMD concept, allowing simultaneous processing of all the vector elements, was implemented in the Connection Machine series and the MasPar supercomputers. Modern graphics processing units (GPUs) are based on a wide-vector-width SIMD architecture.
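The data-parallel style, one operation applied uniformly to every element of a structure, can be sketched as follows. This is a plain Python illustration of the concept; on a SIMD or GPU architecture each element would be handled by a separate hardware lane:

```python
from concurrent.futures import ThreadPoolExecutor

def saxpy(a, x, y):
    """Apply the same scalar operation a*x + y to every element pair."""
    with ThreadPoolExecutor() as pool:
        # map() expresses the data-parallel pattern: a single logical
        # operation distributed over all elements of the input arrays.
        return list(pool.map(lambda xy: a * xy[0] + xy[1], zip(x, y)))

result = saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(result)                    # [12.0, 24.0, 36.0]
```

The defining property is that the per-element operations are independent of each other, which is what lets vector and SIMD hardware execute them simultaneously.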

Hybrid programming models refer to combinations of the parallel programming models mentioned above. An example of a hybrid system can be found in the Hopper petaflop system⁵, which supports MPI-OpenMP programming.

2.2 General Purpose Computing on GPU (GPGPU)

The graphics processing unit (GPU) has evolved from a configurable graphics processor into a programmable parallel processor [42]. Heterogeneous or GPGPU computing aims to take advantage of the improvement in GPU capabilities and the growing availability of development tools to gain performance while executing tasks. Specifically, the GPU device possesses vector processing capabilities used to perform parallel operations, while the CPU core is optimized for low latency on a single thread and is used for executing the serial portions of code.

At present NVIDIA and ATI, the most important GPU designers, have improved their support for heterogeneous environments. A remarkable difference between their proposals can be found in the satisfaction of portability requirements.

⁴ http://www.cray.com/Products/Computing/XK7/Software.aspx
⁵ http://www.nersc.gov/users/computational-systems/hopper/

NVIDIA provides CUDA (Compute Unified Device Architecture), a C language environment for parallel application development on the GPU. CUDA exposes several hardware features in order to obtain better performance, but restricts applications developed with this framework to NVIDIA's graphics cards⁶.

Both NVIDIA and AMD hardware are supported by OpenCL (Open Computing Language), a heterogeneous programming framework managed by the nonprofit technology consortium Khronos Group. Applications created using OpenCL can be executed across a range of device types made by different vendors, supporting different levels of parallelism and mapping efficiently to homogeneous or heterogeneous systems. This cross-platform, industry-wide support guarantees the portability of applications developed using the OpenCL framework [18].

2.3 Skyline Points

Pareto approached the Multi-Objective Optimization (MOO) problem almost two centuries ago, proposing the concept of optimality to determine the points considered as possible solutions [45]. Further mathematical treatment of this theory [55] introduced the concept of dominance. Kung et al. [31] devised a basic divide-and-conquer algorithm to find the optimal points in a multi-dimensional space, referring to the MOO problem as the Maximum Vector Problem. Borzsony et al. [7] extended this work to the database field, including these calculations in the SQL language and designating the set of resulting points as skyline points.

Borzsony et al. [7] presented the Divide-and-Conquer (D&C) and Block Nested Loop (BNL) algorithms. D&C divides the data into partitions, obtains a partial skyline for each partition and finally merges the partial skylines to find the skyline of the whole dataset. BNL constitutes an improvement over the brute-force approach that compares every point with every other point: the algorithm maintains a window of candidate not-dominated points in memory, compares each dataset point against this temporary skyline and updates the window after the comparison.

⁶ https://developer.nvidia.com/cuda-faq
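The BNL idea described above can be sketched as follows. This is an in-memory simplification (the full algorithm spills candidates to a temporary file when the window exceeds memory), and minimization in every dimension is assumed:

```python
def dominates(a, b):
    """a dominates b under minimization: no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def bnl_skyline(points):
    window = []                          # temporary skyline kept in memory
    for p in points:
        if any(dominates(w, p) for w in window):
            continue                     # p is dominated: discard it
        # p enters the window; evict any candidates that p dominates
        window = [w for w in window if not dominates(p, w)] + [p]
    return window

print(bnl_skyline([(1, 9), (3, 3), (2, 8), (4, 4), (9, 1), (5, 5)]))
# (4, 4) and (5, 5) are dominated by (3, 3); the other four points survive
```

Compared with the naive all-pairs comparison, each point is only compared against the current window, which is what makes BNL attractive when the skyline is small relative to the dataset.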

Tan et al. [54] proposed two algorithms: Bitmap and Index. The Bitmap algorithm preprocesses an N-dimensional dataset to build a bitmap structure in which each dimension Di with Mi distinct values is represented as an N x Mi vector. The algorithm then projects the values of each dataset point onto this vector to obtain a data structure in which the skyline can be retrieved quickly using bitwise operations. The Index algorithm finds the dimension Di in which a dataset point Pj has its largest value among all the dimensions and stores Pj in a partition corresponding to Di; a B+-tree structure is used to index the data partitions and search for the skyline points.

The Nearest Neighbor (NN) algorithm was proposed by Kossmann et al. [29]. NN divides the data space into regions and obtains partial skylines by searching for the nearest neighbors of the origin in each region. The Branch-and-Bound Skyline algorithm (BBS) [44] is also based on nearest-neighbor search, using R-trees as the data-partitioning method and adopting the branch-and-bound paradigm to prune the data space.

The large number of applications of skylines in multi-criteria decision making, data mining, visualization and user-preference queries has made the computation of skylines an active area of research in recent years. The number of points in the skyline can be very large, depending on characteristics such as high dimensionality and anti-correlation of the data [28].

The rate of growth in the size of datasets has motivated the creation of a number of algorithms that optimize the computation of skylines [39] [32] [43] [61]. The analysis of skyline subspaces has found a practical application in the data-warehouse cube paradigm, giving rise to the Skycube [47] [33] [62] [53] [63], which creates a cube to answer multiple related skyline queries.


One approach to improving the efficiency of skyline algorithms is to relax the requirement of absolute dominance in order to reduce the number of objects retrieved by the algorithm [58] [51] [38] [49] [52], hence reducing the computational load. This approach delivers k-dominant skyline queries and has been improved with representative skylines, which introduce the p-core constraint to return only the most representative points of the skyline [17].

The progressive delivery of results is another desirable characteristic of skyline query processing algorithms, and is indispensable for users of large datasets. Algorithms based on bitmaps and indexes have been proposed to provide this functionality [25] [34] [36] [52].

Another main issue is the application of skylines to data streams and uncertain data. Researchers working on this topic have proposed probabilistic skyline queries for uncertain data [2] [65] [35] [14] [15]. The data-stream skyline problem has been solved in [37], but the weakness of any proposal is the incremental maintenance of the skyline [20].

Furthermore, the increasing need for skyline applications running on mobile devices and the proliferation of distributed data sources have encouraged new developments concerning communication and computation efficiency [57] [60]. In this line of work some researchers have proposed processing the data load on a centralized server, while others favor the distributed alternative; related proposals aim at adequate planning of the execution of skyline queries [48]. Data exploration and data mining on skylines have been investigated, and a number of measures have been found to compare skylines [6] [22]. Other work has developed techniques to discover prominent streaks in sequence data using skylines, extending conventional data mining tasks [23].

From a database point of view, indexing and the use of very powerful hardware leverage the efficiency of solving the skyline problem. The smartest indexing algorithms provide the tools, but we still need to design the methods that use them. Distributed computing is an efficient way of combining algorithms and hardware resources, and has been proposed in some research on skylines with distributed data sources [26]. However, these approaches deliver approximate or representative results only, sacrificing accuracy in order to gain efficiency.

The computational capabilities of graphics processing units (GPUs) have been used to perform non-graphical routines, a practice called general-purpose GPU (GPGPU) computing [50]. Addressing database and data mining applications, Govindaraju et al [19] present a bitonic-based sort algorithm and analyse the improvement obtained from the texture mapping and blending functionalities of the GPU. Looking for a fair application of parallel computing, Hyeonseung et al [21] studied the computation of Skyline points on multicore CPU architectures.

Later, Choi et al [10] presented a GPU-based Nested Loop algorithm implemented using CUDA and a 1024 MB graphics card. While their proposal attempted to deal with the memory-exchange overhead issue, the experiments were conducted on datasets with cardinality at most 100K and dimensionality no higher than 30. At that scale, the data transfers executed by the algorithm inside the GPU memory (shared and local memory) do not visibly affect performance.

The MapReduce paradigm has been used to process skylines and their variants [11], proposing data partitions based on quadtrees in order to prune the data and to define histograms used in the map and reduce functions. Despite the parallel implementation, the experimental evaluation was run with a small number of dimensions.

Returning to the main issue, the weighted-sum method for MOO has been studied by Marler and Arora [39]. Chomicki et al [12] discussed the reduction in running time obtained by presorting the dataset before computing the skyline.


2.4 Summary

A review of previous work on this topic provides a broad classification of the techniques used to solve the Skyline problem. The D&C algorithm proves impractical for processing high-dimensional, high-cardinality datasets when implemented sequentially. The BNL algorithm works with a partial Skyline obtained over a data space whose size increases at each iteration; this approach is therefore unable to deal efficiently with subsets of data too big to fit in memory.

The use of bitmaps on a large dataset requires the creation of a bitmap structure for each dimension, in which one bit is stored for each distinct value of that dimension. The size of this bitmap and the cost of computing each value across all dimensions make this alternative impractical.

NN and BBS use the R-Tree as their data structure and nearest-neighbor search to find the not-dominated points. Weber et al [59] analyse nearest-neighbor search and R-Tree data partitions; the study concludes that for these methods, among others, there is a dimensionality D beyond which a simple sequential scan performs better.

In this review we have approached the Skyline computation problem from the perspective of efficient use of the resources provided by the heterogeneous architectures currently available. Nowadays, big data challenges require solutions for high dimensionality and very high cardinality. We have reviewed the main issues and looked for new theoretical solutions. Finally, we have found that, given the hardware resources available and the advances in GPGPU computing, the best solution is the simplest one: simplicity is better. This work proposes a simple and efficient parallel scan algorithm designed to exploit GPU and CPU functionalities. Our experiments measure performance over different dataset types.


Chapter 3

The algorithm

In this chapter we propose a parallel scan algorithm for computing Skyline points in high-dimensional datasets with high cardinality. We implement the algorithm on a GPGPU framework, together with a benchmark version running in a multicore CPU environment. The chapter begins by summarizing some concepts related to finding Skyline points that are used throughout this thesis.

3.1 Notations and Definitions

We use the symbol ≺ to denote the dominance relationship between two points, and a maximizing optimization to find the optimal solutions. The symbol ⊀ denotes not-dominance.

Definition

Let U be an N-dimensional space containing points p and q.

• p is not dominated by q, written q ⊀ p, if ∃ i ∈ {1, …, N} such that pi > qi.

• p is not dominated in U if q ⊀ p ∀ q ∈ U.
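As an illustration of the definition above, the not-dominance test under maximization can be sketched in C99, the implementation language used later in this thesis; the function name and array layout are ours, not from the thesis code:

```c
#include <assert.h>

/* Returns 1 if q dominates p under maximization: q is at least as good
 * as p in every dimension and strictly better in at least one.
 * Otherwise q does not dominate p (q ⊀ p). */
static int dominates(const float *q, const float *p, int n)
{
    int strictly_better = 0;
    for (int i = 0; i < n; i++) {
        if (q[i] < p[i])
            return 0;          /* some pi > qi exists, so q ⊀ p */
        if (q[i] > p[i])
            strictly_better = 1;
    }
    return strictly_better;
}
```

With this test, p is not dominated in U exactly when dominates(q, p, n) returns 0 for every q in U.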

Our work studies multidimensional sets of points and a computational problem relevant in a data-mining context. Throughout this document, we use the word tuple as a synonym for an N-dimensional point.



3.2 The Algorithm’s method for computation of Skyline points in high-dimensional, high-cardinality data

Finding skyline points in big data spaces aggravates the issues related to memory resources and computation time. We approach this problem with the following strategy:

1. The time complexity is minimized using a parallelized brute-force algorithm with complexity O(d), where d is a measure of density defined as d = M × N, with M the number of tuples and N the number of dimensions.

2. The storage complexity issue is approached by designing data partitions that take

advantage of GPU and CPU resources.

3. We improve the efficiency of the pruning of the data space by pre-sorting the dataset using a weighted-sum criterion.

The first action, the algorithm’s parallelization, is described in Section 3.3. The actions dealing with the design of the data partitions and the pruning optimization are elaborated in the next subsections.

3.2.1 The not-dominated relationship and the partitioning of the dataset

By definition, the Pareto frontier clusters not-dominated points qualified after the evalu-

ation of weak dominance in the dataset. Applying the divide and conquer technique, our

solution splits the dataset into segments tailored to the working memory and discovers

the group of not-dominated points in each segment. These partial skylines constitute a

new smaller dataset that is processed to find a final skyline.

Given that the algorithm is looking for not-dominated points, the implemented process

attempts to find the first point dominating the point under evaluation. This characteristic

provides an effective pruning of the data space.


The not-dominance relationship is not transitive; nevertheless, our solution proves that, in an N-dimensional space where each data partition constitutes an N-dimensional sub-space, a transitive relationship exists between the partial skylines and the points belonging to the data partitions.

According to the concept of dominance, given an N-dimensional space U where p ∈ U and q ∈ U, we use the following relationships:

1. p ≺ q: p dominates q; equivalently q ⊀ p, p is not dominated by q.

2. q ≺ p: q dominates p; equivalently p ⊀ q, q is not dominated by p.

3. p ⊀ q and q ⊀ p: p and q are incomparable.

Given the N-dimensional sets U, A, SA where A ⊂ U and SA ⊂ A: if q ⊀ p ∀ q ∈ A, ∀ p ∈ SA, then SA defines the skyline of A.

Therefore:

• If p ≺ q ⇒ q ∈ (A − SA) (Figure 1)

• If p ⊀ q and q ⊀ p ⇒ q ∈ SA (Figure 2)

Figure 1: q belongs to the data space under SA


Figure 2: q belongs to SA

A relationship of dominance can be defined between any point x and a partial skyline SA.

Definition

Given x ∈ U:

• If ∀ p ∈ SA, p ⊀ x ⇒ SA ⊀ x (Figure 3)

• If ∀ p ∈ SA, x ≺ p ⇒ x ≺ SA (Figure 4)

• If ∃ p ∈ SA, p ⊀ x ⇒ SA ⊀ x

• If ∀ p ∈ SA, p ≺ x ⇒ SA ≺ x


Figure 3: x 6≺ SA

Figure 4: x ≺ SA

Definition

• ∀ x ∈ U, ∀ q ∈ (A − SA): if SA ⊀ x and q ⊀ SA, then q ⊀ x.

• ∀ x ∈ U, ∀ q ∈ (A − SA), ∀ p ∈ SA: if p ⊀ x and q ⊀ p, then q ⊀ x.


Our approach discovers partial skyline points by selecting not-dominated points and ignoring the sub-space dominated by these points. The algorithm then processes the partial skylines, discovering not-dominated points that, by the previous definition, are not dominated by the sub-space under each partial skyline. The number of data partitions decreases in each iteration until the final skyline is computed over a non-partitioned data space.
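The per-segment pass just described can be sketched as follows; this is a simplified CPU-side illustration under our own naming and row-major data layout, not the thesis implementation:

```c
#include <assert.h>
#include <string.h>

/* 1 if q dominates p under maximization (see Section 3.1). */
static int dominates(const float *q, const float *p, int n)
{
    int strict = 0;
    for (int i = 0; i < n; i++) {
        if (q[i] < p[i]) return 0;
        if (q[i] > p[i]) strict = 1;
    }
    return strict;
}

/* Copies the not-dominated tuples of one segment (a partial skyline)
 * into out (n values per tuple); returns how many tuples are kept. */
static int partial_skyline(const float *seg, int m, int n, float *out)
{
    int kept = 0;
    for (int i = 0; i < m; i++) {
        int dominated = 0;
        for (int j = 0; j < m && !dominated; j++)
            if (j != i && dominates(&seg[j * n], &seg[i * n], n))
                dominated = 1;   /* stop at the first dominator */
        if (!dominated) {
            memcpy(&out[kept * n], &seg[i * n], n * sizeof(float));
            kept++;
        }
    }
    return kept;
}
```

Running this over each segment yields the partial skylines; their concatenation forms the smaller dataset that the final, non-partitioned pass processes.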

Partitioning the N-dimensional data space into several N-dimensional sub-spaces permits the comparison of points positioned in different partial skylines. Fragmenting the N-dimensional data space into slices representing combinations of M dimensions, where M < N, allows the computation of partial skylines for each slice; however, the intersection of these partial results will always yield either an empty set or a unique point dominating every point in each dimension of the data space.

3.2.2 The weighted-sum approach implemented as a sorting criterion

The formal representation of the Multi-Objective Optimization Problem (MOOP) was presented earlier, referencing [41]:

Maximize F(x) = [f1(x), f2(x), …, fM(x)]

subject to G(x) = [g1(x), g2(x), …, gJ(x)] ≥ 0

H(x) = [h1(x), h2(x), …, hK(x)] = 0

xi^L ≤ xi ≤ xi^U, i = 1, …, N

The weighted-sum method provides a solution to the MOOP by selecting scalar weights wi and maximizing the following function:

U = ∑ wi fi(x), with the sum taken over i = 0, …, N − 1


In spite of the lack of accuracy inherent to this approach when it is used to search the solution space [13], applying this technique to order the data space moves the strongest skyline candidate points to the top positions. Our algorithm takes full advantage of this arrangement when partitioning the data space, facilitating a more efficient pruning action.

3.3 Implementation of the parallel scan algorithm

3.3.1 Main algorithm (Figure 5)

Firstly, the algorithm orders the data space placing the strong candidate skyline points

to the top. For each tuple p, the sort algorithm calculates W =∑N−1

i=0 pi. Given that we

are using a maximizing optimization, the tuples with higher W obtain the upper places

in the ordered dataset. Even when the existence of outliers prevents this initial process

from delivering a skyline for the dataset, the sorting provides a data space optimized for

the subsequent partitions.
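This pre-sorting step, with all weights set to 1 so that W reduces to the plain sum of a tuple’s values, might look like the following sketch; the names and the row-major layout are our assumptions, not the thesis code:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { float w; int idx; } scored_t;

/* Comparator for descending W: tuples with higher sums come first. */
static int by_score_desc(const void *a, const void *b)
{
    float wa = ((const scored_t *)a)->w;
    float wb = ((const scored_t *)b)->w;
    return (wa < wb) - (wa > wb);
}

/* Fills order[] with tuple indices sorted by decreasing W = sum of
 * the tuple's values over its n dimensions. */
static void presort_by_weight(const float *data, int m, int n, int *order)
{
    scored_t *s = malloc(m * sizeof *s);
    for (int i = 0; i < m; i++) {
        float w = 0.0f;
        for (int j = 0; j < n; j++)
            w += data[i * n + j];   /* W = sum over the N dimensions */
        s[i].w = w;
        s[i].idx = i;
    }
    qsort(s, m, sizeof *s, by_score_desc);
    for (int i = 0; i < m; i++)
        order[i] = s[i].idx;
    free(s);
}
```

The strongest candidates then occupy the first S positions and become the contents of array B described next.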

After pre-sorting the dataset, the size S of the data partition is established based on the texture memory available in the GPU device. Half of the GPU memory is assigned to an array B that stores an image containing the values of the upper S points of the ordered data space. The remaining memory resources are allocated to an array A and used to load one data partition at a time; at each iteration a partial skyline is computed by comparing the points in array A against the points kept in array B.

During the comparison, the algorithm searches for not-dominated points. Given that the tuples stored in array B represent the points with the highest sum of values over the N dimensions, the points belonging to the partition loaded in array A are efficiently pruned from the data space. At this stage, the algorithm has found T points belonging to partial skylines. A new dataset containing only these T points is defined and loaded into CPU memory. Finally, the algorithm processes this reduced data space and delivers the final skyline.


Figure 5: Block diagram of the parallel scan algorithm


3.3.2 Parallelizing the brute-force algorithm in the GPU

The algorithm stores the data in images to be processed in the GPU, taking advantage of the faster access to texture memory provided by the GPU device. The following paragraphs explain the translation of tuple values into pixels (mapping) and the logical sequence of the kernel implemented in the GPU to process the images containing the dataset.

Mapping the dataset into the texture memory

A GPU device processes images represented as sets of pixels, each pixel linked to coordinates that define its position in the image. Figure 6 shows that, for each pixel, the color space is defined following the RGBA color model; the four component intensities are stored as four numeric values per pixel.

Figure 6: Texture memory in the GPU

Vectors composed of four values are called vector4 types in OpenCL nomenclature. Figure 7 shows the translation of the N values of tuple t into N/4 vector4 values. Each vector4 is stored in the texture memory as a pixel, where each component of the color space represents the tuple’s value in one of the N dimensions.
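This mapping can be sketched as follows, assuming N is divisible by 4; the struct stands in for an OpenCL float4/RGBA pixel, and the names are ours:

```c
#include <assert.h>

typedef struct { float x, y, z, w; } vec4;  /* stands in for an RGBA pixel */

/* Packs the N values of one tuple into N/4 pixels, one dimension per
 * colour channel, mirroring the texture mapping of Figure 7. */
static void tuple_to_pixels(const float *t, int n, vec4 *px)
{
    for (int i = 0; i < n / 4; i++) {
        px[i].x = t[4 * i + 0];   /* R channel */
        px[i].y = t[4 * i + 1];   /* G channel */
        px[i].z = t[4 * i + 2];   /* B channel */
        px[i].w = t[4 * i + 3];   /* A channel */
    }
}
```

In the real implementation the pixels are uploaded to an OpenCL image object, and the kernel reads them back as float4 values.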


Figure 7: Tuple t mapped into texture memory

Implementing the Kernel

Figure 8 shows the implementation of a brute-force algorithm in a GPU kernel. The texture memory size limits S, the maximum number of tuples allowed in one partition:

S = texture memory size × 4 / N

The OpenCL kernel instantiates a set of work items running in parallel, each executing the same code on different elements of the dataset. Given that the algorithm processes S tuples per partition, it adjusts the kernel to create S work items accordingly.

Inside the kernel, the code favors simplicity in order to obtain maximum efficiency from the parallel implementation. Reading the image involves calculations of coordinates and displacements; these instructions are coded using vector4 functions to compensate for the cost of each comparison instruction in the GPU processor.


Figure 8: Flow diagram of the parallel brute-force algorithm

The point p evaluated in a work item is compared against the points mapped in the second image stored in B. The algorithm’s intent is to prune the data space of dominated points; therefore, the comparison task tries to find at least one point q dominating p and then stops the process. If no such q is found, p belongs to the partial skyline and will be evaluated against the other partial skylines in the CPU.
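The per-work-item logic, simulated here on the CPU, can be sketched as follows (our naming; the real kernel reads B from texture memory as float4 values):

```c
#include <assert.h>

/* 1 if q dominates p under maximization. */
static int dominates(const float *q, const float *p, int n)
{
    int strict = 0;
    for (int i = 0; i < n; i++) {
        if (q[i] < p[i]) return 0;
        if (q[i] > p[i]) strict = 1;
    }
    return strict;
}

/* Logic of one work item: point p survives the partition only if no
 * point among the s strong candidates stored in B dominates it. */
static int survives(const float *p, const float *B, int s, int n)
{
    for (int k = 0; k < s; k++)
        if (dominates(&B[k * n], p, n))
            return 0;   /* first dominator found: prune p, stop early */
    return 1;           /* p joins the partial skyline */
}
```

The early exit on the first dominator is what makes the pruning effective: for most dominated points only a few comparisons against the strong candidates in B are needed.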


Figure 9: Flow diagram of the Kernel


3.3.3 Parallelizing the brute-force algorithm in the CPU

The kernel running on the CPU device is the same kernel implemented for the GPU. A new dataset is defined to contain the points belonging to the partial skylines found by the GPU kernel. The CPU device offers a memory space limited only by the RAM size; therefore, the algorithm processes this dataset without partitions. Since this arrangement enables the evaluation of each tuple against the whole data space, the skyline delivered at the end of the process constitutes the skyline of the original dataset.

3.4 Summary

This chapter introduced an algorithm to compute skyline points in high-dimensional datasets with high cardinality. The algorithm parallelizes the simple brute-force approach, providing a hybrid implementation that adds the divide-and-conquer technique. It aims to minimize running time by re-ordering the data space with criteria designed to place the best skyline candidates first. Afterwards, the dataset is partitioned according to the GPU texture memory resources, creating a special partition containing the strong points. Inside the kernel, the algorithm minimizes delay by avoiding warp divergence. Our use of the not-dominated relationship provides the algorithm with a faster pruning strategy.

The partial skylines found in the GPU define a new dataset. The kernel implemented in

the CPU works in this smaller data space and finally delivers the skyline of the original

dataset.


Chapter 4

The experiments

In this chapter, we present the evaluation of the GPGPU-based parallel algorithm introduced in the previous chapter. The multicore CPU version of the algorithm is used to benchmark our algorithm’s performance. First, we detail the characteristics of the datasets used to test the algorithms and the design of the experiments.

4.1 The experimental environment

The experiments were conducted using the following configuration:

• CPU: Intel Core i5-2400S processor, 4 cores, 2.5 GHz clock speed, 6 MB cache.

• GPU: AMD Radeon HD 6750M, 512 MB, 720 stream processors, 36 texture units.

• OS: Mac OS X Lion 10.7.5.

The algorithms were coded in C99 using the OpenCL framework provided by XCode.

4.2 Dataset generation

We tested the efficiency of the proposed algorithm on three types of data: synthetic random, synthetic dependent, and real-life datasets. To compare the correlations in the datasets, we analysed tuples with values in 55 dimensions, obtaining the correlation for each pair of dimensions DiDj, where i, j ∈ [1 … 55]; this yields 1485 observations for each type of synthetic dataset and for each real-life dataset. The only exception was the NBA dataset, which was analysed over 19 dimensions.

4.2.1 Random datasets

A first group of synthetic random data was generated by varying dimensionality and cardinality, with values obtained using the random generator discussed in [24]. The correlation analysis for 10000 tuples (Figure 10) shows dimensions with correlation measurements near zero. Hereafter we use the term not-correlated synthetic datasets to refer to this type of data.

Figure 10: Histogram of the correlations observed between each pair of dimensions in a 10K not-correlated synthetic dataset

A second group of random datasets was generated by adding restrictions to the values produced by the random generator, in order to obtain pairs of variables with a negative correlation. The correlation analysis for the 10000 tuples is displayed in Figure 11. Hereafter we use the term anti-correlated synthetic datasets to refer to this type of data.


Figure 11: Histogram of the correlations observed between each pair of dimensions in a 10K anti-correlated synthetic dataset

4.2.2 Dependent datasets

Another type of synthetic data was generated by varying dimensionality and cardinality with values restricted to the area under a skyline (Pareto front) composed of 1000 points. The skyline was obtained from a random dataset, and our generator inserted the skyline points in random order, as discussed in [24]. We analysed the correlation for 10000 tuples; Figure 12 shows a higher correlation than that obtained for the random datasets. Hereafter we use the term correlated synthetic datasets to refer to this type of data.

4.2.3 Real-life datasets

NBA dataset. This dataset was obtained from http://www.databasebasketball.com/ and contains 21000 tuples with values in 19 dimensions. We analysed 171 correlation observations for the entire dataset; Figure 13 shows a high correlation between the dimensions.


Figure 12: Histogram of the correlations observed between each pair of dimensions in a 10K correlated synthetic dataset

Microarray dataset. From the Stanford Microarray database1 we obtained a dataset containing 47000 tuples with values in 55 dimensions. We analysed the correlation for the entire dataset; Figure 14 shows that nearly 80% of the correlations between dimensions have values around zero.

EEG datasets. Two datasets with values in 55 dimensions were obtained from [4]: EEG1, containing 210259 tuples, and EEG2, containing 631200 tuples. We analysed the correlation observations for both complete datasets. Figure 15 shows the analysis for EEG1, with almost 70% of the observations having a correlation ratio greater than 0.7; Figure 16 shows the analysis for EEG2, with almost 75% of the observations having a correlation ratio greater than 0.7.

1 www.smd.standford.edu


Figure 13: Histogram of the correlations observed between each pair of dimensions in the correlated NBA dataset

Figure 14: Histogram of the correlations observed between each pair of dimensions in the not-correlated Microarray dataset


Figure 15: Histogram of the correlations observed between each pair of dimensions in the correlated EEG1 dataset

Figure 16: Histogram of the correlations observed between each pair of dimensions in the correlated EEG2 dataset


4.3 Results and analysis

We designed a set of tasks to verify the accuracy of our algorithm and to measure the computing time for each dataset. The analysis demanded separate tests of the GPU impact and of the pre-sorting process. We coded an implementation of the algorithm running only on the CPU, which was used to benchmark the proposed algorithm. Hereafter, the parallel scan algorithm presented in this work is referred to as the GPU-CPU algorithm, and the benchmark implementation as the CPU algorithm.

The test of the GPU-CPU algorithm involved the execution of the following tasks:

Task Description

1 Generation of synthetic not-correlated and anti-correlated data: 55 dimensions; cardinalities 10K, 50K, 100K and 500K.

2 Generation of a Skyline for the 10K dataset using 20 dimensions.

3 Generation of datasets with points localized under the Pareto front established by the Skyline obtained in task 2.

4 Formatting real-life datasets.

5 Computation of Skylines for each Unsorted dataset and dimensionality using a naïve brute-force algorithm.

6 Computation of Skylines for each Unsorted dataset and dimensionality using theCPU algorithm.

7 Computation of Skylines for each Unsorted dataset and dimensionality using the GPU-CPU algorithm.

8 Comparing the Skylines obtained in task 5 against the Skylines obtained in task 6.

9 Comparing the Skylines obtained in task 5 against the Skylines obtained in task 7.

10 Computation of Skylines for each Sorted dataset and dimensionality using a naïve brute-force algorithm.

11 Computation of Skylines for each Sorted dataset and dimensionality using the CPUalgorithm.

12 Computation of Skylines for each Sorted dataset and dimensionality using the GPU-CPU algorithm.

13 Comparing the Skylines obtained in task 10 against the Skylines obtained in task 11.

14 Comparing the Skylines obtained in task 10 against the Skylines obtained in task 12.

Table 2: Tasks executed


The next sections are structured to display the experimental results using the following tables and graphs:

Object of analysis          Synthetic datasets                Real-life datasets

1. Skyline points density   Tables 3, 4, 5                    Table 9
2. Computing time           Table 6, Figure 17                Table 10, Figure 32
                            Table 7, Figure 18
                            Table 8, Figure 19
3. GPU effect               Figures 20, 21, 22, 23, 24, 25    Figures 33, 34
4. Sorting effect           Figures 26, 27, 28, 29, 30, 31    Figures 35, 36

4.3.1 Results on synthetic data

The Skyline points density was measured as the percentage of skyline points found in the synthetic datasets (Tables 3, 4, 5).

The anti-correlated datasets were generated by coupling dimensions into pairs with a correlation ratio equal to −1; this characteristic explains why every tuple belongs to the Skyline in this dataset type. Similar behavior is observed in the not-correlated data starting at the 23rd dimension.

The correlated synthetic dataset was generated using a Skyline composed of 10000 tuples over 19 dimensions. Therefore, once the Skyline reaches a cardinality of 10000, it remains unchanged through the remaining dimensions.

Dimensions

Cardin. 11 15 19 23 27 31 35 39 43 47 51 55

10000 62.85 91.50 98.92 99.92 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0

50000 41.94 81.48 96.82 99.53 99.95 99.99 100.0 100.0 100.0 100.0 100.0 100.0

100000 35.75 76.19 95.33 99.37 99.95 99.99 100.0 100.0 100.0 100.0 100.0 100.0

500000 21.68 60.48 88.49 98.09 99.78 99.98 99.99 99.99 100.0 100.0 100.0 100.0

Table 3: % Skyline points in not-correlated synthetic data


Dimensions

Cardin. 11 15 19 23 27 31 35 39 43 47 51 55

10000 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0

50000 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0

100000 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0

500000 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0

Table 4: % Skyline points in anti-correlated synthetic data

Dimensions

Cardin. 11 15 19 23 27 31 35 39 43 47 51 55

10000 8.77 9.89 10.00 10.00 10.00 10.00 10.00 10.00 10.00 10.00 10.00 10.00

50000 1.75 1.98 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00

100000 0.88 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

500000 0.18 0.20 0.20 0.20 0.20 0.20 0.20 0.20 0.20 0.20 0.20 0.20

Table 5: % Skyline points in correlated synthetic data

Computing time

For the GPU-CPU sorted algorithm processing not-correlated synthetic datasets, cardinality affects the computing time significantly more than dimensionality does (Table 6). Because the GPU-CPU algorithm’s main task involves only vector comparisons, an increment in the number of cells processed would be expected to produce a proportional increment in the processing time. On the contrary, our results show that while the increment in computing time for cardinalities under 50K is small and constant, the increment for higher cardinalities displays a logarithmic trend (Figure 17).

Cardinality

Dimensions 10000 50000 100000 500000

11 1.017652 2.475153 8.439203 205.574463

15 0.292031 6.142216 27.723028 1328.326538

19 0.391086 9.497192 56.642090 2557.642578

23 0.396959 10.761326 429.605225 3058.534424

27 0.395591 12.631336 87.725037 3445.944092

31 0.345020 14.184874 514.369629 3765.470459

35 0.401556 18.972763 617.557678 15470.986328

39 0.403797 20.974472 680.608521 17053.708984

43 0.409707 23.135134 744.215027 18638.875000

47 0.418898 24.800280 798.059509 19992.802734

51 0.419861 26.148502 862.092285 21582.740234

55 0.428797 26.679304 926.026123 23195.812500

Table 6: Algorithm’s computing time for not-correlated synthetic datasets


Figure 17: Algorithm’s computing time for processing one cell on not-correlated synthetic datasets

The algorithm’s performance on the anti-correlated synthetic datasets is summarized in Figure 18 and Table 7. The average computing time for one cell is mostly constant on these datasets; this behavior can be explained by the density of the Skyline points reaching 100% in this data space.

Cardinality

Dimensions 10000 50000 100000 500000

11 0.266033 6.160472 247.338318 6206.256348

15 0.254809 6.112443 285.676178 7174.325195

19 0.293400 7.653198 364.562408 9138.733398

23 0.296162 9.572461 427.748474 10696.526367

27 0.298267 12.191456 480.847626 12019.625977

31 0.262136 14.040410 504.473419 12618.903320

35 0.303876 18.668655 606.348694 15170.001953

39 0.308801 21.145960 672.321716 16812.277344

43 0.313407 23.297487 731.681946 18289.800781

47 0.313600 24.434307 792.916382 19823.744141

51 0.317469 25.232813 856.372070 21424.695312

55 0.329156 25.835175 922.723022 23079.070312

Table 7: Algorithm’s computing time for anti-correlated synthetic datasets

When working on correlated synthetic datasets, the algorithm obtains its best performance processing 100K tuples (Figure 19 and Table 8).


Figure 18: Algorithm’s computing time for processing one cell on anti-correlated synthetic datasets

Effect of using GPU

We analysed the effect of processing the synthetic datasets on the GPU device, using the texture memory to obtain faster access to the data. The results displayed in Figure 20 and Figure 21 establish that the processing time remains unchanged when the algorithm makes use of the GPU processors on not-correlated synthetic data.

Figure 22 and Figure 23 display the effect of using the GPU processors on anti-correlated synthetic data; the effect is null.

The analysis of the running time for the correlated synthetic dataset finds a significant decrease attributable to the utilisation of the GPU processing capabilities. The results are shown in Figure 24 and Figure 25.


Figure 19: Algorithm’s computing time for processing one cell on correlated synthetic datasets

Cardinality

Dimensions 10000 50000 100000 500000

11 0.024828 0.035293 0.057874 0.561151

15 0.024696 0.038187 0.066345 0.735816

19 0.060545 0.079764 0.114862 1.015052

23 0.067920 0.089358 0.131216 1.223082

27 0.073202 0.098617 0.146550 1.450916

31 0.076620 0.105001 0.158715 1.563144

35 0.087926 0.121737 0.182078 1.828309

39 0.094304 0.133889 0.198746 2.046243

43 0.102605 0.143463 0.218502 2.290935

47 0.107733 0.151773 0.232963 2.340979

51 0.114884 0.162657 0.255126 2.727166

55 0.122318 0.171794 0.264067 2.648500

Table 8: Algorithm’s computing time for correlated synthetic datasets


Figure 20: GPU effect on unsorted not-correlated synthetic datasets

Figure 21: GPU effect on sorted not-correlated synthetic datasets


Figure 22: GPU effect on unsorted anti-correlated synthetic datasets

Figure 23: GPU effect on sorted anti-correlated synthetic datasets


Figure 24: GPU effect on unsorted correlated synthetic datasets

Figure 25: GPU effect on sorted correlated synthetic datasets


Effect of sorting

We analysed the effect of pre-sorting the synthetic datasets to optimize the pruning efficiency of the algorithms. Figure 26 and Figure 27 show that, for the not-correlated synthetic datasets, the computing time is not affected by the pre-sorting, whether using the GPU texture memory or taking advantage of the CPU multiprocessing capabilities.

Figure 26: Sorting effect on not-correlated synthetic datasets (GPU algorithm)

Figure 28 and Figure 29 show that the effect of pre-sorting on the processing of the anti-correlated synthetic datasets is null. Figure 30 shows that the computing time is not affected by pre-sorting the correlated synthetic datasets when the process is executed in the GPU texture memory. Processing the same datasets using the CPU multiprocessing capabilities yields an improvement in computing time directly proportional to the growth in dataset density (Figure 31).


Figure 27: Sorting effect on not-correlated synthetic datasets (CPU algorithm)

Figure 28: Sorting effect on anti-correlated synthetic datasets (GPU algorithm)


Figure 29: Sorting effect on anti-correlated synthetic datasets (CPU algorithm)

Figure 30: Sorting effect on correlated synthetic datasets (GPU algorithm)


Figure 31: Sorting effect on correlated synthetic datasets (CPU algorithm)


4.3.2 Results on real-life data

The Skyline points density was measured as the percentage of skyline points found in the real-life datasets (Table 9). The earlier correlation analysis qualified the Microarray dataset as a not-correlated data space; this characteristic explains the high percentage of Skyline points starting at dimension 43. The NBA, EEG1 and EEG2 datasets were categorized as highly correlated and show a slight increment in Skyline density, proportional to the rise in dimensionality.

                                          Dimensions
Dataset (cardinality)    11    15    19    23    27    31    35    39    43    47    51    55
NBA (20000)            3.31  5.85  9.40
Microarray (47000)     0.79  2.37  2.73  4.94  6.40  8.16 16.70 23.01 99.40 99.91   100   100
EEG1 (210259)          0.01  0.01  0.01  0.03  0.03  0.06  0.15  0.17  0.23  0.23  0.26  0.27
EEG2 (631200)          0.08  0.08  0.08  0.09  0.09  0.09  0.09  0.09  0.09  0.09  0.09  0.09

Table 9: % Skyline points in real-life data.
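The density measurement above can be sketched with a brute-force dominance check. This is a minimal Python sketch, assuming a minimization skyline (smaller is better in every dimension); the toy points are illustrative and not taken from the real-life datasets.

```python
import numpy as np

def is_dominated(p, q):
    """True if q dominates p: q is <= p in every dimension and < in at least one
    (assuming smaller values are preferred)."""
    return np.all(q <= p) and np.any(q < p)

def skyline_density(data):
    """Percentage of points in `data` (an n x d array) that belong to the skyline."""
    n = len(data)
    skyline = [i for i in range(n)
               if not any(is_dominated(data[i], data[j]) for j in range(n) if j != i)]
    return 100.0 * len(skyline) / n

# Toy example: (3, 3) is dominated by (2, 2), so 2 of 3 points are skyline points.
pts = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0]])
print(skyline_density(pts))  # roughly 66.7
```

The quadratic comparison count is exactly what makes the measurement expensive at the cardinalities in Table 9, which is why it is worth parallelizing.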

Computing time

When processing correlated real-life datasets, the GPU-CPU sorted algorithm reduces the computing time more effectively than the same algorithm working on not-correlated real-life datasets (Table 10). Processing each cell requires almost the same fraction of clock cycles on EEG1 across the increasing number of dimensions, while this variation is slightly wider on EEG2. In spite of the NBA dataset's low cardinality, the average computing time for one cell contained in this data space is more than twice the value obtained for the EEG1 dataset. The Microarray dataset presents the same behavior as the not-correlated synthetic datasets when measuring computing times for one cell (Figure 32).


Dimensions   NBA (20000)   Microarray (47000)   EEG1 (210259)   EEG2 (631200)
    11         0.027071          0.036561          0.135716        0.921909
    15         0.031448          0.045965          0.166571        1.153457
    19         0.047514          0.141917          0.208821        1.485879
    23                           0.098986          0.240849        1.763071
    27                           0.133895          0.278463        2.036673
    31                           0.210513          0.317111        2.351421
    35                           0.902592          0.361544        2.650551
    39                          30.426100          0.384954        2.961506
    43                          23.837299          0.444117        3.342136
    47                          25.255312          0.485952        3.491606
    51                          27.135036          0.525998        4.034825
    55                          27.916765          0.601897        4.002118

Table 10: Algorithm's computing time for real-life datasets

Figure 32: Algorithm’s computing time for processing one cell on real-life datasets.
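The per-cell comparison above can be reproduced from Table 10 by a simple normalization. This sketch assumes a "cell" is a single coordinate value, so the per-cell cost is the total time divided by cardinality times dimensions; both that reading of "cell" and the helper name are assumptions for illustration.

```python
# Hypothetical normalization behind Figure 32: treating a "cell" as one
# coordinate value, the per-cell cost is total_time / (cardinality * dimensions).
def time_per_cell(total_time, cardinality, dimensions):
    return total_time / (cardinality * dimensions)

# Constants taken from Table 10, at 11 dimensions:
nba = time_per_cell(0.027071, 20000, 11)    # NBA dataset
eeg1 = time_per_cell(0.135716, 210259, 11)  # EEG1 dataset
print(nba / eeg1)  # NBA's per-cell cost is roughly twice EEG1's
```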


Effect of using GPU

Figure 33 shows that the use of the GPU improves the processing time when the data is highly correlated. The efficiency of processing not-correlated datasets is not affected by the use of the GPU. Figure 34 shows that the same effect occurs on both sorted and unsorted real-life datasets.

Figure 33: GPU effect on unsorted real-life datasets

Effect of sorting

Figure 35 shows that the pre-sorting process does not affect the efficiency of the algorithm

running in the GPU texture memory.


Figure 34: GPU effect on sorted real-life datasets

Figure 35: Sorting effect on real-life datasets processed using the GPU algorithm


Pre-sorting the datasets improves the efficiency of the algorithm running on the CPU processors only when the data is correlated. Figure 36 shows that the algorithm maintains the same performance when the not-correlated Microarray dataset is processed.

Figure 36: Sorting effect on real-life datasets processed using the CPU algorithm

4.4 Summary

Working with not-correlated and anti-correlated datasets turns out to be unfruitful for every improvement tested in this experiment: Figure 37 and Figure 38 show that the processing time stays unchanged under all of the improvements.

Correlated synthetic datasets and real-life datasets are affected by the use of the GPU texture memory; however, the computation of the skylines proves to be more efficient when the data is unsorted (Figure 39 and Figure 40).

Figure 37: Comparing effect of the algorithms in not-correlated synthetic datasets

The reason the unsorted implementation of our algorithm outperforms the sorted version can be found by analysing how the arrangement of the data space changes through the algorithm's tasks. The strongest candidates to become skyline points are obtained in the pre-sorting process. How are these top points positioned in the data space? Using 55 dimensions, we found that the EEG1 dataset delivers 575 points at the top of its sorted space, but only 74 of those top points belong to the final skyline. Figure 41 graphs the values for the first 11 top points, of which all but the 11th are skyline points. Given that the pre-sorting process provides a subset of points representing a cluster in the data space, the partial skyline for this subset eliminates most of the top points and delivers too few non-dominated points to the process running on the GPU. The GPU algorithm therefore wastes precious texture memory resources, and the unsorted implementation becomes more efficient.


Figure 38: Comparing effect of the algorithms in anti-correlated synthetic datasets

Figure 39: Comparing effect of the algorithms in correlated synthetic datasets


Figure 40: Comparing effect of the algorithms in real-life datasets

Figure 41: Top points cluster


Given a high-dimensional data space with high cardinality, the unsorted implementation of the parallel algorithm demonstrates that the resources provided by GPU devices improve the efficiency of a simple parallelized scan algorithm, which compares well with other more complex approaches [21]. Nevertheless, our work aimed to obtain a consistent and scalable reduction in computing time. This goal was not achieved because of the time consumed transferring data between the RAM (host) memory and the GPU (device) memory. The texture memory size prohibits uploading the entire dataset; consequently, the algorithm partitions the data, and the computing time increases significantly at each point where a target subspace is transferred to the GPU. This effect is visible in the results obtained by processing the correlated real-life datasets.
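The partition-and-upload scheme described above can be sketched on the host side as a partition-and-merge loop. This pure-Python sketch stands in for the GPU pipeline: `chunk_size` plays the role of the texture memory capacity, and the function and variable names are illustrative, not the thesis implementation's.

```python
import numpy as np

def dominates(q, p):
    # q dominates p under minimization
    return np.all(q <= p) and np.any(q < p)

def skyline(points):
    return [p for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]

def partitioned_skyline(data, chunk_size):
    """Compute per-partition (partial) skylines, then merge.
    `chunk_size` stands in for the number of points that fit in texture memory."""
    candidates = []
    for start in range(0, len(data), chunk_size):
        # In the GPU version, each host-to-device transfer happens here.
        candidates.extend(skyline(data[start:start + chunk_size]))
    # A point surviving its partition may still be dominated by another
    # partition's survivor, so a final pass over the merged candidates is needed.
    return skyline(candidates)
```

The loop makes the cost structure visible: partitions that produce large partial skylines feed many candidates into the merge, which in the GPU version multiplies the number of transfers.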

Chapter 5

Conclusions

This work presents a parallel scan algorithm that computes Skyline points in high-dimensional, large datasets using a heterogeneous computing approach. Our approach delivers the following contributions:

• A parallel scan algorithm running on GPU devices that takes advantage of characteristics such as the texture memory provided by graphics card hardware, and the vector data types included in the OpenCL framework.

• A comprehensive evaluation of the performance and accuracy of the proposed algorithm.

We confronted and solved two main issues:

• Overwhelming complexity of recursive searching: our algorithm implements a brute-force parallel scan that better suits the GPU architecture.

• Memory availability, both host memory (RAM) and device memory: our approach

optimizes memory usage by working with texture memory and processing vector

data types inside the kernels.
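The brute-force scan named in the first bullet can be illustrated with a vectorized sketch: every point runs the same independent dominance test against all others, with no recursion or shared state, which is the property that lets each test map onto one GPU work-item. A minimization skyline is assumed; this NumPy version is a CPU stand-in, not the OpenCL kernel itself.

```python
import numpy as np

def skyline_mask(data):
    """Data-parallel flavour of the brute-force scan.

    For an n x d array, every point is tested against every other point in one
    broadcast expression; returns a boolean mask marking the skyline points."""
    # leq[i, j] is True when point j is <= point i in every dimension
    leq = np.all(data[None, :, :] <= data[:, None, :], axis=2)
    # lt[i, j] is True when point j is strictly < point i in some dimension
    lt = np.any(data[None, :, :] < data[:, None, :], axis=2)
    # point i is dominated if some j is <= everywhere and < somewhere
    dominated = np.any(leq & lt, axis=1)
    return ~dominated
```

Each row of the `leq & lt` matrix is exactly the work one GPU work-item performs; there is no data dependence between rows, so the scan parallelizes trivially.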

Through our investigation, we analysed the effect of sorting on the algorithm's performance. Sorting allows the pruning to be more effective, because the stronger points are processed first; however, a non-dominated point could be ignored during sorting if it is a maximum or minimum in some dimension yet has a low weight-sum.
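The sort-then-prune idea can be sketched in the style of skyline-with-presorting [12]. Assuming minimization, a dominator always has a strictly smaller coordinate sum, so after sorting ascending by sum each point only needs to be compared against the already-accepted skyline window; the helper names here are illustrative.

```python
import numpy as np

def dominates(q, p):
    # q dominates p under minimization
    return np.all(q <= p) and np.any(q < p)

def skyline_presorted(data):
    """Presorting sketch: sort ascending by coordinate sum, then scan.

    A dominator has a strictly smaller sum than the point it dominates, so a
    point can only be dominated by an earlier point in this order; comparing
    against the accepted window therefore suffices."""
    order = np.argsort(data.sum(axis=1))
    window = []
    for p in data[order]:
        if not any(dominates(q, p) for q in window):
            window.append(p)
    return window
```

Note that the window still retains every non-dominated point, including extreme points with a high sum; the sorting only changes how early the strong pruners are available.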

To achieve efficiency when processing high-dimensional data spaces with high cardinality, we used the GPU texture memory, which is characterized by faster access but whose size is bounded by the device specifications. While the use of this resource improves the algorithm's efficiency when dealing with correlated datasets, it causes the opposite effect when the data space is not-correlated or anti-correlated. Datasets with zero or negative correlation present partial Skylines with high cardinality. The size of each data partition is calculated to fit into the GPU texture memory; therefore, larger partial results cause a significant increase in the number of uploads to the device memory. In spite of this limitation, the algorithm's adaptability to the available hardware implies scalability.

Parallel scan algorithms have been studied and shown to outperform complex tree-structured indexes when the datasets belong to a high-dimensional space [27]. Our proposal aimed to provide a simple approach to the Skyline computation problem through a parallel scan dominance algorithm. The experimental part of our work demonstrates the accuracy of our implementation and its ability to deal with high dimensionality. To the best of our knowledge, other parallel implementations have been tested only on data spaces with dimensionality lower than that of the datasets generated and collected in this work. Nevertheless, our algorithm's computing time compares well with the results presented in those works [21].

Bibliography

[1] Almasi, G. and Gottlieb, A. Highly Parallel Computing. Pearson Education. 1993.

[2] Atallah, M. et al. Asymptotically Efficient Algorithms for Skyline Probabilities of Uncertain Data. ACM Transactions on Database Systems. 2011.

[3] Atsalakis, G. and Valavanis, K. Surveying stock market forecasting techniques - Part II: Soft computing methods. Expert Systems with Applications 36, 3 (April 2009), 5932-5941. 2009.

[4] BCI-Lab. Data sets IIIa. Provided by the Laboratory of Brain-Computer Interfaces (BCI-Lab), Graz University of Technology (Gert Pfurtscheller, Alois Schlögl). http://bbci.de/competition/iii/index.html

[5] Bhattacharya, B. et al. Computation of Non-dominated Points Using Compact Voronoi Diagrams. WALCOM: Algorithms and Computation. Lecture Notes in Computer Science. Springer Berlin/Heidelberg. 2010.

[6] Böhm, C. et al. SkyDist: Data Mining on Skyline Objects. Advances in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science. Springer Berlin/Heidelberg. 2010.

[7] Börzsönyi, S. et al. The Skyline operator. Proceedings of the 17th International Conference on Data Engineering (ICDE), pp. 421-430. 2001.

[8] Buyya, R. and Venugopal, S. A gentle introduction to grid computing and technologies. CSI Communications. July 2005.

[9] Chaudhuri, S. What Next? A Half-Dozen Data Management Research Goals for Big Data and the Cloud. PODS '12, May 21-23, 2012, Scottsdale, Arizona, USA.


[10] Choi, W. et al. Multi-criteria decision making with skyline computation. 2012 IEEE 13th International Conference on Information Reuse and Integration (IRI), pp. 316-323, 8-10 Aug. 2012.

[11] Choi, W. et al. Parallel Computation of Skyline and Reverse Skyline Queries Using MapReduce. Proceedings of the VLDB Endowment 6.14. 2013.

[12] Chomicki, J. et al. Skyline with Presorting. Proceedings of the 19th International Conference on Data Engineering (ICDE). IEEE Computer Society, pp. 717-719. 2003.

[13] De Weck, O. Multiobjective Optimization: History and Promise. The Third China-Japan-Korea Joint Symposium on Optimization of Structural and Mechanical Systems. 2004.

[14] Ding, X. and Jin, H. Efficient and Progressive Algorithms for Distributed Skyline Queries over Uncertain Data. IEEE 30th International Conference on Distributed Computing Systems (ICDCS). 2010.

[15] Ding, X. et al. Continuous monitoring of skylines over uncertain data streams. Information Sciences. 2011.

[16] Donoho, D. High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality. Stanford University. 2000. www-stat.stanford.edu/~donoho/Lectures/AMS2000/Curses.pdf

[17] Fung, G. et al. Extract Interesting Skyline Points in High Dimension. Database Systems for Advanced Applications. Lecture Notes in Computer Science. Springer Berlin/Heidelberg. 2010.

[18] Gaster, B. et al. Heterogeneous Computing with OpenCL. Elsevier. 2012.

[19] Govindaraju, N. et al. A cache-efficient sorting algorithm for database and data mining computations using graphics processors. University of North Carolina, Chapel Hill. 2005. https://gamma.cs.unc.edu/papers/documents/technicalreports/tr05016.pdf


[20] Hsueh, Y. et al. Efficient Updates for Continuous Skyline Computations. Database and Expert Systems Applications: 19th International Conference (DEXA), Italy. Springer. 2008.

[21] Im, H. et al. Parallel skyline computation on multicore architectures. Information Systems, Volume 36, Issue 4, June 2011, pp. 808-823. http://www.sciencedirect.com/science/article/pii/S0306437910001389

[22] Jang, S. et al. Skyline Minimum Vector. 12th International Asia-Pacific Web Conference. 2010.

[23] Jiang, X. et al. Prominent streak discovery in sequence data. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, USA. 2011.

[24] Jones, D. Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications. May 2010. http://www0.cs.ucl.ac.uk/staff/D.Jones/GoodPracticeRNG.pdf

[25] Jung, H. et al. A fast and progressive algorithm for skyline queries with totally and partially ordered domains. Journal of Systems and Software. 2010.

[26] Köhler, H. et al. Efficient parallel skyline processing using hyperplane projections. Proceedings of the International Conference on Management of Data (SIGMOD '11). ACM, New York, NY, USA. 2011.

[27] Kim, J. et al. Parallel multi-dimensional range query processing with R-trees on GPU. Journal of Parallel and Distributed Computing, Volume 73, Issue 8. August 2013.

[28] Köhler, H. and Yang, J. Computing Large Skylines over Few Dimensions: The Curse of Anti-correlation. 12th International Asia-Pacific Web Conference, Korea. 2010.

[29] Kossmann, D. et al. Shooting stars in the sky: An online algorithm for skyline queries. Proceedings of the Very Large Data Bases Conference (VLDB), Hong Kong, China, Aug. 20-23, 2002, pp. 275-286.


[30] Kriegel, H. et al. Route skyline queries: A multi-preference path planning approach. IEEE 26th International Conference on Data Engineering (ICDE). 2010.

[31] Kung, H. et al. On finding the maxima of a set of vectors. Journal of the ACM, 22(4):469-476. 1975.

[32] Lee, J. and Hwang, S. BSkyTree: scalable skyline computation using a balanced pivot selection. Proceedings of the 13th International Conference on Extending Database Technology (EDBT '10). ACM, New York, NY, USA. 2010.

[33] Lee, J. and Hwang, S. QSkycube: Efficient Skycube Computation Using Point-Based Space Partitioning. PVLDB. 2010.

[34] Li, C. et al. Multi-Source Skyline Queries Processing in Multi-Dimensional Space. Advances in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science. Springer Berlin/Heidelberg. 2010.

[35] Lian, X. and Chen, L. Reverse skyline search in uncertain databases. ACM Transactions on Database Systems (TODS). 2008.

[36] Loper, S. and Makki, S. Data Filtering Utilizing Window Indexing. IEEE 24th International Conference on Advanced Information Networking and Applications Workshops. 2010.

[37] Lu, H. et al. Continuous Skyline Monitoring over Distributed Data Streams. Scientific and Statistical Database Management. Lecture Notes in Computer Science. Springer Berlin/Heidelberg. 2010.

[38] Lu, H. et al. Flexible and Efficient Resolution of Skyline Query Size Constraints. IEEE Transactions on Knowledge and Data Engineering, Vol. 23(7), pp. 991-1005. 2011.

[39] Marler, T. and Arora, J. The weighted sum method for multi-objective optimization: new insights. Structural and Multidisciplinary Optimization 41:853-862. Springer Verlag. 2009.


[40] Narzisi, G. et al. Multi-Objective Evolutionary Optimization of Agent-based Models: An Application to Emergency Response Planning. The IASTED International Conference on Computational Intelligence (CI 2006), pp. 224-230, November 20-22, San Francisco, CA. 2006.

[41] Narzisi, G. Multi-Objective Optimization. Courant Institute of Mathematical Sciences, New York University. 2008.

[42] Nickolls, J. and Dally, W. The GPU computing era. IEEE Computer Society. March-April 2010.

[43] Ozyer, T. et al. Integrating multi-objective genetic algorithm based clustering and data partitioning for skyline computation. Applied Intelligence. Springer Netherlands. 2011.

[44] Papadias, D. et al. Progressive skyline computation in database systems. ACM Transactions on Database Systems 30(1), pp. 41-82. 2005.

[45] Pareto, V. Manual of Political Economy. A. M. Kelley Publishers, New York. 1971.

[46] Preparata, F. and Shamos, M. Computational Geometry: An Introduction. Springer-Verlag, New York, Berlin. 1985.

[47] Raïssi, C. et al. Computing closed skycubes. Proceedings of the VLDB Endowment. 2010.

[48] Rocha-Junior, J. et al. Efficient execution plans for distributed skyline query processing. Proceedings of the 14th International Conference on Extending Database Technology, Uppsala, Sweden. 2011.

[49] Sarma, A. et al. Representative skylines using threshold-based preference distributions. IEEE 27th International Conference on Data Engineering (ICDE). 2011.

[50] Scarpino, M. OpenCL in Action. Manning Publications Co. 2012.

[51] Siddique, A. and Morimoto, Y. Efficient Maintenance of k-Dominant Skyline for Frequently Updated Database. Second International Conference on Advances in Databases, Knowledge, and Data Applications. 2010.


[52] Tao, Y. et al. Distance-based Representative Skyline. IEEE International Conference on Data Engineering (ICDE). 2009.

[53] Tambaram Kailasam, G. et al. Efficient skycube computation using point and domain-based filtering. Information Sciences. 2010.

[54] Tan, K. et al. Efficient progressive skyline computation. Proceedings of the Very Large Data Bases Conference (VLDB), Rome, Italy, Sep. 11-14, 2001, pp. 301-310.

[55] The Luc, D. Pareto optimality. In Pareto Optimality, Game Theory and Equilibria. Springer, Optimization and Its Applications, Vol. 17. 2008.

[56] Valkanas, G. et al. Efficient and Adaptive Distributed Skyline Computation. Scientific and Statistical Database Management. Lecture Notes in Computer Science. Springer Berlin/Heidelberg. 2010.

[57] Venkatesh, R. et al. Skyline and mapping aware join query evaluation. Information Systems, Volume 36, Issue 6. September 2011.

[58] Vlachou, A. and Vazirgiannis, M. Ranking the sky: Discovering the importance of skyline points through subspace dominance relationships. Data & Knowledge Engineering, Vol. 69(9), pp. 943-964. 2010.

[59] Weber, R. et al. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. Proceedings of the 24th VLDB Conference, New York, USA. 1998.

[60] Xiao, Y. and Chen, Y. Efficient Distributed Skyline Queries for Mobile Applications. Journal of Computer Science and Technology. Springer Boston. 2010.

[61] Yang, Z. et al. Efficient Analyzing General Dominant Relationship Based on Partial Order Models. IEICE Transactions on Information and Systems. 2010.

[62] Yiu, M. et al. Measuring the Sky: On Computing Data Cubes via Skylining the Measures. IEEE Transactions on Knowledge and Data Engineering. 2010.


[63] Yuan, Y. et al. Efficient computation of the skyline cube. Proceedings of the 31st International Conference on Very Large Data Bases (VLDB '05). 2005.

[64] Zemke, F. What's new in SQL:2011. SIGMOD Record, Vol. 41, No. 1. March 2012.

[65] Zhang, Y. et al. Ranking uncertain sky: The probabilistic top-k skyline operator. Information Systems. July 2011.