Computation of Skyline points using parallel scan algorithms on GPU devices
This thesis is
presented to the
School of Computer Science & Software Engineering
for the degree of
Master of Science (by Research)
of
The University of Western Australia
By
Maria Luisa Bravo-Rojas
February 2014
Abstract
The computation of skyline points has become a particularly interesting topic in
recent years because of its application in multi-criteria decision-making systems.
Although many efficient algorithms have been designed, several important issues related
to this problem persist. In particular, most algorithms are inefficient when applied
to high-dimensional datasets, because the size of the skyline grows with the number of
dimensions. Given that most operational databases are very large, existing skyline
algorithms perform poorly; to avoid this inefficiency, some algorithms sacrifice the
accuracy of their results by discarding dimensions deemed less significant.
Our research proposes a method to compute skylines for high-dimensional
databases with large cardinality that delivers accurate results and fast processing of the
data. In pursuit of this objective, we have studied the application of parallel programming
techniques and the benefits provided by GPGPU development frameworks.
We have found that simplicity pays off when implementing parallelism: our proposed
algorithm scans the data without building auxiliary data structures, making extensive
use of the functionality provided by the GPGPU framework to reduce computing time.
Furthermore, our implementation benefits from hardware features of GPU devices such
as the texture memory space, while the GPGPU framework guarantees the portability
of our implementation.
Besides cardinality and dimensionality, the correlation coefficient is a decisive factor
in the characterization of the data. Therefore, we have designed a group of tests using
datasets with different levels of correlation in order to evaluate our algorithm's
performance.
Acknowledgements
Peruvian people preserve ancient memories of the "Amautas", wise people
dedicated to teaching in the time of the Inka Empire. In Peru, our government
awards the title of "Amauta" to outstanding teachers as a sign of respect and
acknowledgement. Throughout the course of my studies at UWA I have enjoyed the
opportunity to work in an environment enriched by the presence of people who brought
to my memory the tradition of the Amautas.
In that spirit, I would like to thank my main supervisor, Professor Amitava Datta, and
my co-supervisor, Associate Professor Chris McDonald, for their supervision and guidance.
In particular, I am very grateful to Professor Datta for the opportunity to pursue a
research degree.
Finally, and always first, my immeasurable gratitude goes to my family for their
limitless, boundless support. We are the Bravos.
Contents
Abstract
Acknowledgements
Contents
List of Tables
List of Figures
1 Introduction
1.1 Motivation and objectives
1.1.1 Challenges
1.1.2 Solution
1.2 Contribution
1.3 Organization of the Thesis
2 Literature review
2.1 The Parallel Programming model
2.2 General Purpose Computing on GPU (GPGPU)
2.3 Skyline Points
2.4 Summary
3 The algorithm
3.1 Notations and Definitions
3.2 The Algorithm's method for computation of Skyline points in high-dimensional, high-cardinality data
3.2.1 The not-dominated relationship and the partitioning of the dataset
3.2.2 The weighted-sum approach implemented as a sorting criterion
3.3 Implementation of the parallel scan algorithm
3.3.1 Main algorithm (Figure 5)
3.3.2 Parallelizing the brute-force algorithm in the GPU
3.3.3 Parallelizing the brute-force algorithm in the CPU
3.4 Summary
4 The experiments
4.1 The experimental environment
4.2 Dataset generation
4.2.1 Random datasets
4.2.2 Dependent datasets
4.2.3 Real-life datasets
4.3 Results and analysis
4.3.1 Results on synthetic data
4.3.2 Results on real-life data
4.4 Summary
5 Conclusions
Bibliography
List of Tables
1 NASDAQ values at 20-Nov-2012
2 Tasks executed
3 % Skyline points in not-correlated synthetic data
4 % Skyline points in anti-correlated synthetic data
5 % Skyline points in correlated synthetic data
6 Algorithm's computing time for not-correlated synthetic datasets
7 Algorithm's computing time for anti-correlated synthetic datasets
8 Algorithm's computing time for correlated synthetic datasets
9 % Skyline points in real-life data
10 Algorithm's computing time for real-life datasets
List of Figures
1 q belongs to the data space under SA
2 q belongs to SA
3 x ⊀ SA
4 x ≺ SA
5 Block diagram of the parallel scan algorithm
6 Texture memory in the GPU
7 Tuple t mapped into texture memory
8 Flow diagram of the parallel brute-force algorithm
9 Flow diagram of the Kernel
10 Histogram for correlations observed between each pair of dimensions in a 10K not-correlated synthetic dataset
11 Histogram for correlations observed between each pair of dimensions in a 10K anti-correlated synthetic dataset
12 Histogram for correlations observed between each pair of dimensions in a 10K correlated synthetic dataset
13 Histogram for correlations observed between each pair of dimensions in the correlated NBA dataset
14 Histogram for correlations observed between each pair of dimensions in the not-correlated Microarray dataset
15 Histogram for correlations observed between each pair of dimensions in the correlated EEG1 dataset
16 Histogram for correlations observed between each pair of dimensions in the correlated EEG2 dataset
17 Algorithm's computing time for processing one cell on not-correlated synthetic datasets
18 Algorithm's computing time for processing one cell on anti-correlated synthetic datasets
19 Algorithm's computing time for processing one cell on correlated synthetic datasets
20 GPU effect on unsorted not-correlated synthetic datasets
21 GPU effect on sorted not-correlated synthetic datasets
22 GPU effect on unsorted anti-correlated synthetic datasets
23 GPU effect on sorted anti-correlated synthetic datasets
24 GPU effect on unsorted correlated synthetic datasets
25 GPU effect on sorted correlated synthetic datasets
26 Sorting effect on not-correlated synthetic datasets (GPU algorithm)
27 Sorting effect on not-correlated synthetic datasets (CPU algorithm)
28 Sorting effect on anti-correlated synthetic datasets (GPU algorithm)
29 Sorting effect on anti-correlated synthetic datasets (CPU algorithm)
30 Sorting effect on correlated synthetic datasets (GPU algorithm)
31 Sorting effect on correlated synthetic datasets (CPU algorithm)
32 Algorithm's computing time for processing one cell on real-life datasets
33 GPU effect on unsorted real-life datasets
34 GPU effect on sorted real-life datasets
35 Sorting effect on real-life datasets processed using the GPU algorithm
36 Sorting effect on real-life datasets processed using the CPU algorithm
37 Comparing effect of the algorithms in not-correlated synthetic datasets
38 Comparing effect of the algorithms in anti-correlated synthetic datasets
39 Comparing effect of the algorithms in correlated synthetic datasets
40 Comparing effect of the algorithms in real-life datasets
41 Top points cluster
Chapter 1
Introduction
The aim of this dissertation is to present fast parallel algorithms for computing the
skyline of a multi-dimensional dataset. We will first motivate this problem through
the following example. A novice investor wants to buy shares and (s)he must decide where
to invest. After some market research, the investor analyzes the following information1
in order to make the decision:
Company     Price (US$)   EPS (US$)      P/E
Microsoft        26.73         1.86    14.36
Apple           562.75        44.16    12.70
Google          672.40        31.94    20.98
Yahoo            18.24         3.28     5.56
Amazon          234.09         0.08  2922.25
Facebook         23.22         0.14   165.00

Table 1: NASDAQ values at 20-Nov-2012
Based on price alone, the most viable candidate would be Yahoo, but other variables,
such as EPS (the earnings-per-share ratio) and P/E (the price-to-earnings ratio), should
be considered as well. The EPS ratio is an indicator of a company's profitability and the
P/E ratio forecasts a company's earnings growth. Therefore, the investor will make the
best choice by looking for a low price, a high EPS and a high P/E.
1 http://www.marketwatch.com/investing/stock/
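To make the dominance criterion concrete, the following small Python script (an illustration added here, not part of the original analysis) checks which companies of Table 1 are not dominated when we look for a low price, a high EPS and a high P/E:

```python
# Skyline of Table 1: minimize Price, maximize EPS and P/E.
stocks = {
    "Microsoft": (26.73, 1.86, 14.36),
    "Apple":     (562.75, 44.16, 12.70),
    "Google":    (672.40, 31.94, 20.98),
    "Yahoo":     (18.24, 3.28, 5.56),
    "Amazon":    (234.09, 0.08, 2922.25),
    "Facebook":  (23.22, 0.14, 165.00),
}

def dominates(a, b):
    """True if a dominates b: no worse in every criterion
    (lower price, higher EPS, higher P/E) and strictly better in one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1] and a[2] >= b[2]
    strictly_better = a[0] < b[0] or a[1] > b[1] or a[2] > b[2]
    return no_worse and strictly_better

skyline = [name for name, p in stocks.items()
           if not any(dominates(q, p) for q in stocks.values())]
print(skyline)  # all six names are printed
```

On this tiny dataset no company is dominated, so all six are skyline points: each one is best under some trade-off. This already hints at how quickly skylines grow when the criteria conflict.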
Humans face decision-making problems of varying complexity on a daily basis. For
an individual investor evaluating three simple variables, this could be thought of as a
simple problem. However, to make good decisions, investment companies apply
sophisticated techniques to maximize profit and, quite often, they need to analyze many
variables [3].
Optimization methods approach the analysis of such complex systems by designing
mathematical models. The performance criteria are composed of objective functions that
are usually mutually conflicting. The procedure of obtaining one or more optimal
solutions for a set of objective functions is known as a Multi-Objective Optimization
Problem (MOOP) [40]. Each objective function can reach an optimal value that is either
a maximum or a minimum. Assuming that the optimization looks for maximum values,
the MOOP is formally represented by [41]:
Maximize    F(x) = [f_1(x), f_2(x), ..., f_M(x)]
subject to  G(x) = [g_1(x), g_2(x), ..., g_J(x)] ≥ 0
            H(x) = [h_1(x), h_2(x), ..., h_K(x)] = 0
            x_i^L ≤ x_i ≤ x_i^U,  i = 1, ..., N

where x = (x_1, x_2, ..., x_N)^T is the vector of the N decision variables, M is the
number of objectives f_i, there are J inequality and K equality constraints, and x_i^L
and x_i^U are respectively the lower and upper bounds for each decision variable x_i.
The set of optimal solutions constitutes the Pareto-optimal front and is obtained by
applying the concept of dominance to compare each pair of possible solutions.
A solution x1 is said to dominate another solution x2, written x1 ≺ x2, if both of the
following conditions are true:
1. The solution x1 is not worse than x2 in any objective.

2. The solution x1 is strictly better than x2 in at least one objective (here using the
minimization convention).

Therefore x1 ≺ x2 if f_i(x1) ≤ f_i(x2) ∀ i ∈ {1, ..., M} and ∃ j ∈ {1, ..., M} such that
f_j(x1) < f_j(x2).
The skyline of a dataset groups the not-dominated points, i.e., the points that are not
dominated by any other point in the dataset. These not-dominated points constitute a
subset of the entire dataset; in the extreme case where a single point dominates every
other point, the skyline consists of that point alone. The search for a skyline therefore
amounts to the computation of all not-dominated points in the dataset.
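The two conditions above translate directly into a brute-force procedure. As an illustrative sketch (not the thesis algorithm, which is developed in Chapter 3), the following Python fragment computes the skyline of a set of d-dimensional points under the minimization convention:

```python
def dominates(x1, x2):
    """x1 ≺ x2: x1 is no worse in every objective and strictly
    better in at least one (minimization convention)."""
    return (all(a <= b for a, b in zip(x1, x2))
            and any(a < b for a, b in zip(x1, x2)))

def skyline(points):
    """Return the not-dominated points: a brute-force O(n^2 d) scan."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

pts = [(1, 4), (2, 2), (3, 1), (3, 3), (4, 4)]
print(skyline(pts))  # → [(1, 4), (2, 2), (3, 1)]
```

Here (3, 3) and (4, 4) are dominated by (2, 2), while the three remaining points form the skyline because none of them is better than another in every objective.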
1.1 Motivation and objectives
Chaudhuri [9] addressed the impact of the Big Data phenomenon on the data management
industry. Query optimization and Operational Business Intelligence are shortlisted
as key issues that challenge researchers and developers.
The skyline operator was proposed by Borzsony et al in 2001; nevertheless, the ISO
SQL:2011 standard still does not incorporate it [64]. The ISO Committee for Information
Technology approaches the development of the standard by looking for cost-effective,
short development cycles and market-oriented results. Given that market evolution
stresses the need for efficient data-analysis frameworks, industry researchers should
be working to provide feasible algorithms that improve query efficiency and are simple
enough to be used in complex Business Intelligence platforms.
Therefore, this work aims to achieve the following goals:
• Propose a new algorithm for computing Skylines in high-dimensional databases.
• Test the performance of the proposed algorithm.
1.1.1 Challenges
The design of a new Skyline algorithm confronts two main issues:
• Overwhelming complexity must be avoided in order to reduce computational time.
• The algorithm should optimize memory usage.
Other works [59] have analyzed the impact of run-time complexity on search algorithms
processing high-dimensional datasets. Given that state-of-the-art Skyline algorithms
[29] [44] are based on complex searches, this work aims to provide an alternative
solution.
Memory availability constitutes an ongoing issue in the design of Skyline algorithms,
as more businesses need to process ever-larger datasets because of the increasing rate
of data collection.
1.1.2 Solution
This work proposes a simple new parallel scan algorithm that computes Skyline points
in high-dimensional, large datasets using a heterogeneous computing approach. The
algorithm compares well with existing algorithms in terms of performance.
1.2 Contribution
The achievement of the goals listed above constitutes the contribution of this thesis. We
will implement a new scan algorithm for computing Skylines on GPU devices, run a
battery of tests using different types of datasets characterized by high dimensionality and
high cardinality, and analyse the results.
Additionally, our work presents an implementation that takes advantage of heterogeneous
computing techniques, following the current trend of designing solutions able to utilize
hybrid architectures.
1.3 Organization of the Thesis
This thesis is structured as follows. In Chapter 2 we examine the parallel programming
models and review some previous algorithms for computing Skyline points. Chapter 3
presents our proposal of a parallel algorithm on the GPGPU framework. The evaluation of
our algorithm is discussed in Chapter 4, elaborating on the impact of data correlation on
the algorithm's performance. Finally, Chapter 5 summarizes the contributions and
limitations of our work.
Chapter 2
Literature review
Throughout this chapter, we examine the parallel programming model and review
previous work in the design of algorithms to compute Skyline points. Given that our
work aims to take advantage of parallel processing features, we begin this chapter by
presenting a summarized view of parallel programming models and their compatible
machine architectures. Then, we examine the GPU model and the hardware support
provided by this technology. Finally, we discuss some work related to our research.
2.1 The Parallel Programming model1
Parallel programming has been defined as “a form of computation in which many calcu-
lations are carried out simultaneously, operating on the principle that large problems can
often be divided into smaller ones, which are then solved concurrently” [1].
The computer resources used in this computation model are provided differently in
different models, e.g., by a single multicore computer (Multicore computing), an
arbitrary number of computers connected by a network (Distributed computing),
specialized parallel processors such as those used for General-Purpose computing on
Graphics Processing Units (GPGPU), or a combination of any of the parallel
architectures mentioned above.
1 This section is based on the online resources provided for the "Applications of Parallel Computers" course at the University of California, Berkeley (http://www.cs.berkeley.edu/~knight/cs267/resources.html)
Parallel computer architectures provide parallelism, communication and synchronization
functionalities. According to the style of parallelism implemented, computers and software
are classified as Single-instruction-multiple-data (SIMD), Multiple-instruction-multiple-
data (MIMD) or Multiple-instruction-single-data (MISD). Communication patterns follow
the architecture's memory model, e.g., shared address space machines (Shared Memory),
distributed address space machines (Distributed Memory) and the hybrid Distributed
Shared Memory model.
A programming model is made up of the languages and libraries that create an interface
presenting an abstract view of the machine; the programmer uses this interface to write
programs. Any parallel programming model must provide the user with the capability to
express parallelism, communication and synchronization in an algorithm.
The first parallel architectures brought programming models tailored to each
architecture's specifications, unable to support portability or upgrades. At present,
several architecture-independent parallel programming models coexist, supported by
compatible machine models.
The Shared Memory programming model delivers programs structured as a collection of
parallel tasks assigned to threads of control, where each thread has a set of private
variables (local stack variables) and a set of shared variables (global heap). These
threads communicate implicitly by writing and reading shared variables and coordinate
by synchronizing on shared variables, using protection mechanisms such as locks,
semaphores and monitors to control concurrent access. A number of libraries implement
the Shared Memory model; widely used are the POSIX threading interface (Pthreads)
and the OpenMP specification for parallel programming. Shared Memory programming
is supported by computers based on the Shared Memory, Multithreaded Processors or
Distributed Shared Memory machine models.
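As an illustrative sketch of this model (added here, not drawn from the thesis, and using Python's threading module as a stand-in for Pthreads or OpenMP), the fragment below shows the ingredients just described: threads of control with private local variables, a shared variable on the global heap, and a lock protecting concurrent access:

```python
import threading

counter = 0                  # shared variable (global heap)
lock = threading.Lock()      # protection mechanism for concurrent access

def worker(n):
    global counter
    local = 0                # private variable (thread-local stack)
    for _ in range(n):
        with lock:           # synchronize on the shared variable
            counter += 1
        local += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # → 40000
```

Without the lock, the read-modify-write on the shared counter could interleave between threads, illustrating why the model pairs implicit communication through shared variables with explicit synchronization.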
The first category provides a global memory shared by all processors. The tasks running
on different processors communicate with each other by writing to and reading from
the global memory. Multithreaded processors attempt to reduce or hide latency by sup-
porting multiple concurrent streams of threads, which are independent of each other.
These threads are mapped onto hardware contexts that include general-purpose registers,
program counters and status registers. The Cray MTA supercomputer implements the
Multithreaded Processors model. In the Distributed Shared Memory architecture, phys-
ically separated memories can be addressed as one logically shared address space. The
NASA Columbia supercomputer built by SGI constitutes an example of this model.
In the Message Passing programming model, every processor executes an independent
process, communicating by calling subroutines that send data from one processor to
another. The address space is local; there is no shared data. Software support is provided
by MPI (the Message Passing Interface), the de facto standard for message-passing
applications. Message Passing is supported by the Distributed Memory and the
Internet/Grid Computing machine models.
The Distributed Memory machine model provides each processor with local memory and
cache without direct access to another processor’s memory. The Grid parallel model has
been defined as “a type of parallel and distributed system that enables the sharing, selec-
tion and aggregation of geographically distributed autonomous resources dynamically at
runtime depending on their availability, capability, performance, cost and user’s quality-
of-service requirements” [8]. Two important implementations of this model can be found
in NASA's Information Power Grid2 and the SETI@home project3.
A particular case is the Partitioned Global Address Space (PGAS) programming model,
which attempts to combine the data locality features of MPI with the data-referencing
simplicity of the Shared Memory model. This objective is achieved by providing
local and shared data with a local-view programming style that differentiates between
local and remote data partitions. Languages like Unified Parallel C (UPC), Co-Array
2 http://ntrs.nasa.gov/search.jsp?R=20010111389
3 http://setiathome.berkeley.edu/
Fortran (CAF) and Titanium implement PGAS, while the Cray XK7 supercomputer
provides support for this programming model4.
Applications based on the Data Parallel programming model consist of parallel opera-
tions applied to all or a defined subset of a data structure. High Performance Fortran
(HPF), CM Fortran and Fortran 90 languages support the Data Parallel model while
hardware support for this model has been provided by Vector and SIMD architectures.
Vector architectures operating on one-dimensional arrays of data were introduced by the
Cray platforms; later, the SIMD concept, allowing simultaneous processing of all the
vector elements, was implemented in the Connection Machine series and the MasPar
supercomputers. Modern Graphics Processing Units (GPUs) are based on a wide-vector
SIMD architecture.
Hybrid programming models refer to combinations of the parallel programming models
mentioned above. An example of a hybrid system can be found in the Hopper petaflop
system5, which supports combined MPI-OpenMP programming.
2.2 General Purpose Computing on GPU (GPGPU)
The Graphics Processing Unit (GPU) has evolved from a configurable graphics processor
to a programmable parallel processor [42]. Heterogeneous or GPGPU computing aims
to take advantage of the improvement in GPU capabilities and the growing availability
of development tools to gain performance when executing tasks. Specifically, the GPU
device possesses vector processing capabilities used to perform parallel operations, while
the CPU core is optimized for low latency on a single thread and is used for executing
the serial portions of code.
At present, NVIDIA and ATI, the most important GPU designers, have improved their
support for heterogeneous environments. A remarkable difference between the two proposals can
4 http://www.cray.com/Products/Computing/XK7/Software.aspx
5 http://www.nersc.gov/users/computational-systems/hopper/
be found in how each satisfies portability requirements.
NVIDIA provides CUDA (Compute Unified Device Architecture), a C-language
environment for parallel application development on the GPU that exposes several
hardware features in order to obtain better performance, but restricts this framework
to applications running on NVIDIA graphics cards6.
Both NVIDIA and AMD hardware are supported by OpenCL (Open Computing Lan-
guage), a heterogeneous programming framework that is managed by the nonprofit tech-
nology consortium Khronos Group. Applications created using OpenCL can be executed
across a range of device types made by different vendors, supporting different levels of
parallelism and efficiently mapping to homogeneous or heterogeneous systems. This cross-
platform, industry-wide support guarantees portability to applications developed using
the OpenCL framework [18].
2.3 Skyline Points
Pareto approached the Multi-Objective Optimization (MOO) problem more than a
century ago, proposing the concept of optimality to determine the points considered as
possible solutions [45]. Further mathematical treatment of this theory [55] introduced
the concept of dominance. Kung et al [31] devised a basic divide-and-conquer algorithm
to find the optimal points in a multi-dimensional space, referring to the MOO problem as
the Maximum Vector Problem. Borzsony et al [7] extended this work to the database
field, incorporating these computations into the SQL language and designating the
resulting set of points as Skyline points.
Borzsony et al [7] presented the Divide-and-Conquer (D&C) and the Block Nested Loop
(BNL) algorithms. D&C divides the data into partitions, obtains a partial skyline for
each partition and finally merges the partial skylines to find the skyline of the whole
dataset. BNL improves on the brute-force approach that compares every point with
every other point: it maintains a window of provisionally not-dominated points in
memory, compares each dataset point against this temporary skyline, and updates the
window after each comparison.

6 https://developer.nvidia.com/cuda-faq
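The window-based scan of BNL can be sketched as follows. This is an illustrative in-memory reconstruction under the minimization convention; the full algorithm additionally bounds the window size and spills overflow points to temporary files:

```python
def dominates(a, b):
    """a dominates b under the minimization convention."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def bnl_skyline(points):
    """Block Nested Loop style scan: keep a window of provisional
    skyline points, updated as each new point is examined."""
    window = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue                      # p is dominated: discard it
        # p enters the window; evict any window points it dominates
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window

print(bnl_skyline([(5, 5), (1, 4), (2, 2), (3, 1), (3, 3)]))
# → [(1, 4), (2, 2), (3, 1)]
```

Note that (5, 5) enters the window first but is evicted as soon as a dominating point arrives, which is exactly the "not-permanent" character of the window described above.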
Tan et al [54] proposed two algorithms: Bitmap and Index. The Bitmap algorithm
pre-processes an N-dimensional dataset to build a bitmap structure in which each
dimension Di with Mi distinct values is represented as an N x Mi vector. The algorithm
then projects the values of each dataset point onto this vector to obtain a data structure
from which, using bitwise operations, the skyline can be retrieved quickly. The Index
algorithm finds the dimension Di in which a dataset point Pj has the largest value among
all the dimensions and stores Pj in a partition corresponding to Di. A B+-tree structure
is used to index the data partitions and search for the skyline points.
The Nearest Neighbor (NN) algorithm was proposed by Kossmann et al [29]. NN divides
the data space into regions and obtains partial skylines by searching for nearest neighbors
of the origin in each region. The Branch-and-Bound Skyline algorithm (BBS) [44] is also
based on nearest-neighbor search, using R-trees as the data-partitioning method and
adopting the branch-and-bound paradigm to prune the data space.
The large number of applications of skylines in multi-criteria decision-making, data
mining, visualization and user-preference queries has made the computation of skylines
an active area of research in recent years. The number of points in the skyline can
be very large, depending on characteristics such as high dimensionality and
anti-correlation of the data [28].
The rate of growth in the size of datasets has motivated the creation of a number of
algorithms to optimize the computation of skylines [39] [32] [43] [61]. The analysis
of skyline subspaces has found a practical application in the data-warehouse cube
paradigm, yielding the Skycube [47] [33] [62] [53] [63], a structure built to answer
multiple related skyline queries.
One approach to improving the efficiency of skyline algorithms is to relax the
requirement of absolute dominance so as to reduce the number of objects retrieved by the
algorithm [58] [51] [38] [49] [52], hence reducing the computational load. This approach
delivers k-dominant skyline queries and has been refined by representative skylines,
which introduce the p-core constraint to return only the most representative points of
the skyline [17].
The progressive delivery of results is another desirable characteristic in skyline
query-processing algorithms, and one indispensable for users of large datasets.
Algorithms based on bitmaps and indexes have been proposed to provide this
functionality [25] [34] [36] [52].
Another main issue is the application of skylines to data streams and uncertain data.
Researchers working on this topic have proposed probabilistic skyline queries for
uncertain data [2] [65] [35] [14] [15]. The data-stream skyline problem has been
addressed in [37], but the weakness of any such proposal is the incremental maintenance
of the skyline [20].
Furthermore, the increasing demand for skyline applications running on mobile devices
and the proliferation of distributed data sources have encouraged new developments
concerning communication and computation efficiency [57] [60]. In this line of work,
some researchers have proposed processing the data load on a centralized server while
others favor the distributed alternative; related proposals aim at adequate planning
for the execution of skyline queries [48]. Data exploration and data mining on skylines
have been investigated and a number of measures have been devised to compare skylines
[6] [22]. Other work has developed techniques to discover prominent streaks in sequence
data using skylines, extending the conventional data-mining tasks [23].
From a database point of view, indexing and the use of very powerful hardware leverage
the efficiency of solving the skyline problem. The smartest indexing algorithms provide
us with the tools, but we need to devise the methods that use these tools. Distributed
computing is an efficient way to take advantage of algorithms and hardware resources
and is proposed in some research works on skylines with distributed sources of data [26].
However, these approaches deliver only approximate or representative results,
sacrificing accuracy in order to gain efficiency.
The computational capabilities of graphics processing units (GPUs) have been used to
perform non-graphical routines, a practice called general-purpose GPU computing or
GPGPU computing [50]. Addressing database and data-mining applications, Govindaraju
et al [19] present a bitonic-based sort algorithm and analyse the improvement obtained
by the use of the texture-mapping and blending functionalities of the GPU. Looking for a
fair application of parallel computing, Hyeonseung et al [21] studied the computation of
Skyline points on multicore CPU architectures.
Later, Choi et al. [10] presented a GPU-based Nested Loop algorithm implemented in
CUDA on a 1024 MB graphics card. While their proposal attempted to deal with the
memory-exchange overhead, the experiments were conducted on datasets with cardinality
at most 100K and dimensionality no higher than 30. At that scale, the data transfers
executed by the algorithm inside the GPU memory (shared and local memory) do not
visibly affect performance.
The MapReduce paradigm has been used to process skylines and their variants [11],
proposing data partitions based on quadtrees in order to prune the data and define
histograms used in the map and reduce functions. Despite the parallel implementation,
the experiments were run with a small number of dimensions.
Returning to the main issue, the weighted-sum method for MOO was approached by
Marler and Arora [39]. Chomicki et al. [12] discussed the time-reduction effect of
presorting the dataset before computing the skyline.
2.4 Summary
A review of previous work on this topic provides a broad classification of the techniques
used to solve the Skyline problem. The D&C algorithm proves impractical for processing
high-dimensional, high-cardinality datasets when implemented sequentially. The BNL
algorithm works with a partial Skyline obtained over a data space whose size increases at
each iteration; this approach is therefore unable to deal efficiently with subsets of data
too big to fit in memory.
The use of Bitmaps in a large dataset requires the creation of a bitmap structure for each
dimension, where one bit is stored for each distinct value in the dimension represented
by the bitmap. The size of this bitmap and the cost of computing each value for all the
dimensions make this alternative impractical.
NN and BBS use the R-Tree as data structure and nearest-neighbor search to find the
not-dominated points. Weber et al. [59] analyse nearest-neighbor search and R-Tree data
partitions, concluding that for these methods, among others, there is a dimensionality D
beyond which a simple sequential scan performs better.
In this review we have approached the Skyline computation problem from the perspective
of efficient use of the resources provided by currently available heterogeneous
architectures. Nowadays, big-data challenges require solutions for high dimensionality
and very high cardinality. We have reviewed the main issues and looked for new
theoretical solutions. Finally, we have found that, given the hardware resources available
and the advances in GPGPU computing, the best solution is the simplest one: simplicity
is better. This work proposes a simple and efficient parallel scan algorithm designed to
exploit GPU and CPU functionalities. Our experiments measure performance over
different dataset types.
Chapter 3
The algorithm
In this chapter we propose a parallel scan algorithm for computing Skyline points in
high-dimensional datasets with high cardinality. We implement the algorithm on a
GPGPU framework, together with a benchmark version running in a multicore CPU
environment. The chapter begins by summarizing some concepts related to the finding of
Skyline points that are used throughout this thesis.
3.1 Notations and Definitions
We use the symbol ≺ to denote the dominance relationship between two points, and a
maximizing optimization to find the optimal solutions. The symbol ⊀ denotes
not-dominance.
Definition
Let U be an N-dimensional space containing points p and q.
• p is not dominated by q, written q ⊀ p, if ∃ i ∈ {1, . . . , N} such that p_i > q_i.
• p is not dominated in U if q ⊀ p ∀ q ∈ U.
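Under this maximizing convention, the two tests above can be sketched in C99 (the implementation language used later in this thesis); this is an illustrative sketch with function names of our own, not the thesis code:

```c
#include <stddef.h>

/* q ⊀ p: per the definition above, p is not dominated by q when p
 * is strictly better than q in at least one of the n dimensions. */
static int not_dominated_by(const double *p, const double *q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] > q[i])
            return 1;
    return 0;
}

/* p is not dominated in a set u of m points (row-major, n values per
 * point) when q ⊀ p holds for every q in u other than p itself. */
static int not_dominated_in(const double *p, const double *u,
                            size_t m, size_t n)
{
    for (size_t j = 0; j < m; j++) {
        const double *q = u + j * n;
        if (q != p && !not_dominated_by(p, q, n))
            return 0; /* found a q dominating p */
    }
    return 1;
}
```

Note that `not_dominated_by` implements exactly the existential test of the definition, which corresponds to weak dominance: a value-for-value duplicate of p would dominate p under this test, hence the pointer check excluding p itself.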
Our work studies multidimensional sets of points and a computational problem relevant
in a data-mining context. Throughout this document, we use the word tuple as a
synonym for an N-dimensional point.
3.2 The Algorithm’s method for computation of Skyline points in high-dimensional, high-cardinality data
Finding skyline points in big data spaces aggravates the issues related to memory
resources and computation time. We approach this problem following the strategy
described next:
1. The time complexity will be minimized using a parallelized brute-force algorithm
with complexity O(d), where d is a measure of density defined as
d = M × N, with M the number of tuples and N the number of dimensions.
2. The storage complexity issue is approached by designing data partitions that take
advantage of GPU and CPU resources.
3. We improve the efficiency of the pruning of the data space by pre-sorting the dataset
using a weighted-sum criterion.
The first action, related to the algorithm’s parallelization, is described in Section 3.3.
The actions dealing with the design of the data partitions and with the pruning
optimization are elaborated in the next subsections.
3.2.1 The not-dominated relationship and the partitioning of the dataset
By definition, the Pareto frontier gathers the not-dominated points identified through the
evaluation of weak dominance over the dataset. Applying the divide-and-conquer
technique, our solution splits the dataset into segments tailored to the working memory
and discovers the group of not-dominated points in each segment. These partial skylines
constitute a new, smaller dataset that is processed to find the final skyline.
Given that the algorithm is looking for not-dominated points, the implemented process
attempts to find the first point dominating the point under evaluation. This characteristic
provides an effective pruning of the data space.
The not-dominance relationship is not transitive. Nevertheless, our solution proves that,
in an N-dimensional space where each data partition constitutes an N-dimensional
sub-space, a transitive relationship exists between the partial skylines and the points
belonging to the data partitions.
According to the concept of dominance, given an N-dimensional space U where p ∈ U
and q ∈ U, we use the following relationships:
1. p ≺ q, p dominates q ≡ q ⊀ p, p is not dominated by q
2. q ≺ p, q dominates p ≡ p ⊀ q, q is not dominated by p
3. p ⊀ q and q ⊀ p, p and q are incomparable
Given the N-dimensional sets U, A, S_A where A ⊂ U and S_A ⊂ A: if q ⊀ p ∀ q ∈ A,
∀ p ∈ S_A, then S_A defines the skyline for A.
Therefore:
• If p ≺ q ⇒ q ∈ (A − S_A) (Figure 1)
• If p ⊀ q and q ⊀ p ⇒ q ∈ S_A (Figure 2)
Figure 1: q belongs to the data space under S_A
Figure 2: q belongs to S_A
A dominance relationship can be defined between any point x and a partial skyline S_A.
Definition
Given x ∈ U:
• If ∀ p ∈ S_A, p ⊀ x ⇒ S_A ⊀ x (Figure 3)
• If ∀ p ∈ S_A, x ≺ p ⇒ x ≺ S_A (Figure 4)
• If ∃ p ∈ S_A, p ⊀ x ⇒ S_A ⊀ x
• If ∀ p ∈ S_A, p ≺ x ⇒ S_A ≺ x
Figure 3: x ⊀ S_A
Figure 4: x ≺ S_A
Definition
• ∀ x ∈ U, ∀ q ∈ (A − S_A): S_A ⊀ x, q ⊀ S_A ⇒ q ⊀ x
• ∀ x ∈ U, ∀ q ∈ (A − S_A), ∀ p ∈ S_A: p ⊀ x, q ⊀ p ⇒ q ⊀ x
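This last definition is the key to merging partial skylines, and it can be justified from the transitivity of dominance itself (our paraphrase of the argument, not the thesis’s wording):

```latex
\begin{aligned}
& q \in A - S_A \;\Rightarrow\; \exists\, p \in S_A:\ p \prec q
   && \text{(every discarded point is dominated by some point of } S_A\text{)}\\
& \text{Assume } S_A \not\prec x \text{ and suppose, for contradiction, that } q \prec x.\\
& \text{Dominance is transitive, so}\quad p \prec q \ \wedge\ q \prec x \;\Rightarrow\; p \prec x,\\
& \text{contradicting } p \not\prec x \ \ \forall p \in S_A. \quad \text{Hence } q \not\prec x.
\end{aligned}
```

In words: a point pruned by a partial skyline can never dominate a point that survives that partial skyline, so only the partial skylines need to be compared in later iterations.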
Our approach discovers partial skyline points by selecting not-dominated points and
ignoring the sub-space dominated by them. The algorithm then processes these partial
skylines, discovering not-dominated points that, by the previous definition, are also not
dominated by the sub-space under each partial skyline. The number of data partitions
decreases at each iteration until the final skyline is computed in a non-partitioned data
space.
Partitioning the N-dimensional data space into several N-dimensional sub-spaces permits
the comparison between points positioned in different partial skylines. By contrast,
fragmenting the N-dimensional data space into slices representing combinations of M
dimensions, with M < N, allows the computation of partial skylines for each slice, but
the intersection of these partial results will always yield either an empty set or a unique
point dominating every point in each dimension of the data space.
3.2.2 The weighted-sum approach implemented as a sorting criterion
The formal representation of the Multi-Objective Optimization Problem (MOOP) was
presented before, referencing [41]:

Maximize F(x) = [f_1(x), f_2(x), . . . , f_M(x)]
subject to G(x) = [g_1(x), g_2(x), . . . , g_J(x)] ≥ 0
H(x) = [h_1(x), h_2(x), . . . , h_K(x)] = 0
x_i^L ≤ x_i ≤ x_i^U , i = 1, . . . , N
The weighted-sum method provides a solution to the MOOP by selecting scalar weights
w_i and maximizing the following function:

U = Σ_{i=0}^{N-1} w_i f_i(x)
Despite the lack of accuracy inherent in this approach when used to search the solution
space [13], applying this technique to order the data space moves the strongest skyline
candidate points to the top positions. Our algorithm takes full advantage of this
arrangement in the partitioning of the data space, facilitating a more efficient pruning
action.
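As a concrete illustration, the pre-sorting step with equal weights w_i = 1 can be sketched in C99 using the standard `qsort` (the structure and names here are ours; the thesis does not publish this routine):

```c
#include <stdlib.h>

typedef struct { double w; size_t idx; } weighted;

/* Comparator for descending weight: tuples with larger W come first. */
static int by_weight_desc(const void *a, const void *b)
{
    double wa = ((const weighted *)a)->w;
    double wb = ((const weighted *)b)->w;
    return (wa < wb) - (wa > wb);
}

/* Fill order[0..m-1] with tuple indices sorted by decreasing
 * W = sum of the tuple's n coordinates (equal weights w_i = 1). */
static void weighted_sum_order(const double *data, size_t m, size_t n,
                               size_t *order)
{
    weighted *keys = malloc(m * sizeof *keys);
    if (keys == NULL)
        return; /* allocation failed: leave order unfilled */
    for (size_t j = 0; j < m; j++) {
        double w = 0.0;
        for (size_t i = 0; i < n; i++)
            w += data[j * n + i];
        keys[j].w = w;
        keys[j].idx = j;
    }
    qsort(keys, m, sizeof *keys, by_weight_desc);
    for (size_t j = 0; j < m; j++)
        order[j] = keys[j].idx;
    free(keys);
}
```

Processing the tuples in the resulting order places the strongest skyline candidates first, which is what makes the top block of the sorted data space an effective pruner in the partitioning described in Section 3.3.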
3.3 Implementation of the parallel scan algorithm
3.3.1 Main algorithm (Figure 5)
Firstly, the algorithm orders the data space, placing the strong candidate skyline points
at the top. For each tuple p, the sort algorithm calculates W = Σ_{i=0}^{N-1} p_i. Given
that we use a maximizing optimization, the tuples with higher W occupy the upper
places in the ordered dataset. Even though the existence of outliers prevents this initial
process from delivering a skyline for the dataset, the sorting provides a data space
optimized for the subsequent partitions.
After pre-sorting the dataset, the size of the data partition, S, is established based on the
texture memory available on the GPU device. Half of the GPU memory is assigned to an
array B that stores an image containing the values of the upper S points of the ordered
data space. The remaining memory resources are allocated to an array A and used to
load one data partition at a time; at each iteration a partial skyline is computed by
comparing the points in array A against the points kept in array B.
During the comparison task, the algorithm searches for not-dominated points. Given that
the tuples stored in array B represent the points with the highest sum of values over the
N dimensions, the points belonging to the partition loaded in array A are efficiently
pruned from the data space. At this stage, the algorithm has found T points belonging to
partial skylines. A new dataset containing only these T points is defined and loaded into
CPU memory. The algorithm then processes this reduced data space and delivers the
final skyline.
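The flow just described can be condensed into the following sequential C99 sketch (control flow only; the names B, S and T follow the text, everything else is ours, and the actual implementation dispatches the inner comparison to GPU work items):

```c
#include <stdlib.h>
#include <string.h>

/* q ⊀ p per the thesis definition: p escapes q if it is strictly
 * better in at least one of the n dimensions. */
static int escapes(const double *p, const double *q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] > q[i])
            return 1;
    return 0;
}

/* Compare each point of the (pre-sorted) dataset against the top
 * block B of the s strongest candidates, collecting the T survivors
 * into out. Returns T. The GPU version performs the outer loop in
 * parallel, one work item per evaluated point. */
static size_t partial_skyline(const double *data, size_t m, size_t n,
                              size_t s, double *out)
{
    const double *b = data; /* B = upper s points of the sorted data */
    size_t t = 0;
    for (size_t j = 0; j < m; j++) {
        const double *p = data + j * n;
        int survives = 1;
        for (size_t k = 0; k < s; k++) {
            const double *q = b + k * n;
            if (q != p && !escapes(p, q, n)) {
                survives = 0; /* q dominates p: prune and stop */
                break;
            }
        }
        if (survives) {
            memcpy(out + t * n, p, n * sizeof *p);
            t++;
        }
    }
    return t;
}
```

A final pass over the T survivors (for instance, calling the same routine on the reduced dataset with s = T) then yields the skyline of the original dataset, as described for the CPU stage.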
3.3.2 Parallelizing the brute-force algorithm in the GPU
The algorithm stores data in images intended to be processed on the GPU, taking
advantage of the faster access to texture memory provided by the GPU device. The
following paragraphs explain the translation of tuple values into pixels (mapping) and the
logical sequence of the kernel implemented on the GPU to process the images containing
the dataset.
Mapping the dataset into the texture memory
A GPU device processes images represented as a set of pixels, each pixel linked to
coordinates that define its position in the image. Figure 6 shows that, for each pixel, the
color space is defined following the RGBA color model; the component intensities are
stored as four numeric values per pixel.
Figure 6: Texture memory in the GPU
Vectors composed of 4 values are named vector4 types in OpenCL nomenclature.
Figure 7 shows the translation of the N values of tuple t into N/4 vector4 values. Each
vector4 is stored in the texture memory as a pixel, where in turn each component of the
color space represents a tuple value in one of the N dimensions.
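The mapping of Figure 7 can be sketched as follows (a plain-C99 stand-in for OpenCL’s float4; the struct name is ours, and N is assumed to be a multiple of 4, with padding otherwise):

```c
#include <stddef.h>

/* Mirrors OpenCL's float4: one RGBA pixel holds four tuple values. */
typedef struct { float r, g, b, a; } pixel4;

/* Pack an n-dimensional tuple into n/4 pixels (n assumed to be a
 * multiple of 4; shorter tuples would be padded). */
static void tuple_to_pixels(const float *tuple, size_t n, pixel4 *out)
{
    for (size_t k = 0; k < n / 4; k++) {
        out[k].r = tuple[4 * k + 0];
        out[k].g = tuple[4 * k + 1];
        out[k].b = tuple[4 * k + 2];
        out[k].a = tuple[4 * k + 3];
    }
}
```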
Figure 7: Tuple t mapped into texture memory
Implementing the Kernel
Figure 8 shows the implementation of a brute-force algorithm in a GPU kernel. The
texture memory size limits S, the maximum number of tuples allowed in one partition:

S = (texture memory size × 4) / N
The OpenCL kernel instantiates a set of work items running in parallel, each processing
the same code on different elements of the dataset. Given that the algorithm processes S
tuples per partition, it adjusts the kernel to create S work items in turn.
Inside the kernel, the code favors simplicity in order to obtain maximum efficiency from
the parallel implementation. Reading the image involves calculations of coordinates and
displacements; these instructions are coded using vector4 functions to compensate for the
cost of each comparison instruction on the GPU processor.
Figure 8: Flow diagram of the parallel brute-force algorithm
The point p evaluated in the work item is compared against the points mapped in the
second image stored in B. The algorithm’s intent is to prune the data space of dominated
points; therefore the comparison task tries to find at least one point q dominating p and
then stops. If no such q is found, p belongs to the partial skyline and will be evaluated
against the other partial skylines on the CPU.
3.3.3 Parallelizing the brute-force algorithm in the CPU
The kernel running on the CPU device is the same kernel implemented on the GPU. A
new dataset is defined to contain the points belonging to the partial skylines found by
the GPU kernel. The CPU device offers a memory space limited only by the RAM size;
therefore the algorithm processes this dataset without partitions. Since this arrangement
enables the evaluation of each tuple against the whole data space, the skyline delivered at
the end of the process constitutes the skyline of the original dataset.
3.4 Summary
This chapter introduced an algorithm to compute skyline points in high-dimensional
datasets with high cardinality. The algorithm parallelizes the simple brute-force
approach, providing a hybrid implementation by adding the divide-and-conquer
technique. It aims to maximize time efficiency by re-ordering the data space with criteria
designed to localize the best skyline candidates. Afterwards, the dataset is partitioned
according to the GPU texture memory resources, creating a special partition containing
strong points. Inside the kernel, the algorithm minimizes delay by avoiding warp
divergence. Our use of the not-dominated relationship provides the algorithm with a
faster pruning strategy.
The partial skylines found on the GPU define a new dataset. The kernel implemented on
the CPU works on this smaller data space and finally delivers the skyline of the original
dataset.
Chapter 4
The experiments
In this chapter we present the evaluation of the GPGPU-based parallel algorithm
introduced in the previous chapter. The multi-core CPU version of the algorithm is used
to benchmark its performance. We first detail the characteristics of the datasets used to
test the algorithms and the design of the experiment.
4.1 The experimental environment
The experiments were conducted using the following configuration:
• CPU: Intel Core i5-2400S processor, 4 cores, 2.5 GHz clock speed, 6 MB cache.
• GPU: AMD Radeon HD 6750M, 512 MB, 720 stream processors, 36 texture units.
• OS: Mac OS X Lion 10.7.5.
The algorithms were coded in C99 using the OpenCL framework provided by Xcode.
4.2 Dataset generation
We tested the efficiency of the proposed algorithm on three types of data: synthetic
random, synthetic dependent and real-life datasets. To compare the correlations in the
datasets, we analysed tuples with values in 55 dimensions, obtaining the correlation for
each pair of dimensions D_iD_j, i, j ∈ [1 . . . 55], giving 1485 observations for each type
of synthetic dataset and for all the real-life datasets. The only exception was the NBA
dataset, which was analysed over 19 dimensions.
4.2.1 Random datasets
A first group of synthetic random data was generated by varying dimensionality and
cardinality, with values obtained using the random generator discussed in [24]. The
correlation analysis for 10000 tuples (Figure 10) shows correlation measurements near
zero between the dataset’s dimensions. Hereafter we use the term not-correlated synthetic
datasets to refer to this type of data.
Figure 10: Histogram for correlations observed between each pair of dimensions in a 10K not-correlated synthetic dataset
A second group of random datasets was generated by adding restrictions to the values
produced by the random generator in order to obtain pairs of variables with a negative
correlation rate. The correlation analysis for 10000 tuples is displayed in Figure 11.
Hereafter we use the term anti-correlated synthetic datasets to refer to this type of data.
Figure 11: Histogram for correlations observed between each pair of dimensions in a 10K anti-correlated synthetic dataset
4.2.2 Dependent datasets
Another type of synthetic data was generated by varying dimensionality and cardinality,
with values restricted to the area under a skyline, or Pareto Front, composed of 1000
points. The skyline was obtained from a Random dataset, and our algorithm inserted the
skyline points in random order, as discussed in [24]. We analysed correlation for 10000
tuples; Figure 12 shows a higher correlation than that obtained for the Random datasets.
Hereafter we use the term correlated synthetic datasets to refer to this type of data.
4.2.3 Real-life datasets
NBA dataset. This dataset was obtained from http://www.databasebasketball.com/ and
contains 21000 tuples with values in 19 dimensions. We analysed 171 correlation
observations over the entire dataset; Figure 13 shows a high correlation between the
dimensions.
Figure 12: Histogram for correlations observed between each pair of dimensions in a 10K correlated synthetic dataset
Microarray dataset. From the Stanford Microarray database1 we obtained a dataset
containing 47000 tuples with values in 55 dimensions. We analysed correlation for the
entire dataset; Figure 14 shows that nearly 80% of the correlations between dimensions
have values around zero.
EEG datasets. Two datasets with values in 55 dimensions were obtained from [4]: EEG1,
containing 210259 tuples, and EEG2, containing 631200 tuples. We analysed correlation
observations for both whole datasets. Figure 15 shows the analysis for EEG1, with
almost 70% of the observations yielding a correlation ratio greater than 0.7; Figure 16
shows the analysis for EEG2, with almost 75% of the observations yielding a correlation
ratio greater than 0.7.
1www.smd.standford.edu
Figure 13: Histogram for correlations observed between each pair of dimensions in the correlated NBA dataset
Figure 14: Histogram for correlations observed between each pair of dimensions in the not-correlated Microarray dataset
Figure 15: Histogram for correlations observed between each pair of dimensions in the correlated EEG1 dataset
Figure 16: Histogram for correlations observed between each pair of dimensions in the correlated EEG2 dataset
4.3 Results and analysis
We designed a set of tasks to verify the accuracy of our algorithm and measure the com-
puting time for each dataset. The analysis demanded separated tests of the GPU impact
and the pre-sorting process. We coded an implementation of the algorithm running only
in the CPU and it was used to benchmark the proposed algorithm. Hereafter the parallel
scan algorithm presented in this work is referred to as the GPU-CPU algorithm and the
benchmark implementation is identified as the CPU algorithm.
The test of the GPU-CPU algorithm involved the execution of the following tasks:
Task  Description
1     Generation of synthetic not-correlated and anti-correlated data:
      55 dimensions; cardinalities 10K, 50K, 100K and 500K.
2     Generation of a Skyline for the 10K dataset using 20 dimensions.
3     Generation of datasets with points localized under the Pareto Front established by
      the Skyline obtained in task 2.
4     Formatting of the real-life datasets.
5     Computation of Skylines for each unsorted dataset and dimensionality using a
      naïve brute-force algorithm.
6     Computation of Skylines for each unsorted dataset and dimensionality using the
      CPU algorithm.
7     Computation of Skylines for each unsorted dataset and dimensionality using the
      GPU-CPU algorithm.
8     Comparison of the Skylines obtained in task 5 against the Skylines obtained in task 6.
9     Comparison of the Skylines obtained in task 5 against the Skylines obtained in task 7.
10    Computation of Skylines for each sorted dataset and dimensionality using a naïve
      brute-force algorithm.
11    Computation of Skylines for each sorted dataset and dimensionality using the CPU
      algorithm.
12    Computation of Skylines for each sorted dataset and dimensionality using the
      GPU-CPU algorithm.
13    Comparison of the Skylines obtained in task 10 against the Skylines obtained in task 11.
14    Comparison of the Skylines obtained in task 10 against the Skylines obtained in task 12.

Table 2: Tasks executed
The next sections are structured to display the experimental results using the following
tables and graphs:

Object of analysis         Synthetic datasets                        Real-life datasets
1. Skyline points density  Tables 3, 4, 5                            Table 9
2. Computing time          Table 6, Figure 17; Table 7, Figure 18;   Table 10, Figure 32
                           Table 8, Figure 19
3. GPU effect              Figures 20, 21, 22, 23, 24, 25            Figures 33, 34
4. Sorting effect          Figures 26, 27, 28, 29, 30, 31            Figures 35, 36
4.3.1 Results on synthetic data
The Skyline points density was measured as the percentage of skyline points found in the
synthetic datasets (Tables 3, 4, 5).
Anti-correlated datasets were generated by coupling dimensions in pairs with a
correlation ratio equal to −1; this characteristic explains why every tuple belongs to the
Skyline in this dataset type. Similar behavior is observed in not-correlated data starting
at the 23rd dimension.
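One simple way to couple two dimensions with a correlation ratio of exactly −1 (a sketch of ours; the thesis relies on the generator of [24] with added restrictions) is to draw one value uniformly and assign its complement to the partner dimension:

```c
#include <stdlib.h>

/* Fill one tuple of n dimensions (n even) with anti-correlated
 * pairs: dimension 2k+1 is the complement of dimension 2k, so each
 * coupled pair has Pearson correlation exactly -1 over the dataset. */
static void anticorrelated_tuple(double *t, size_t n)
{
    for (size_t k = 0; k < n / 2; k++) {
        double u = (double)rand() / RAND_MAX; /* uniform in [0, 1] */
        t[2 * k]     = u;
        t[2 * k + 1] = 1.0 - u;
    }
}
```

Because each coupled pair sums to 1, any two tuples with distinct values on a pair are mutually incomparable on it, which is consistent with Table 4: every tuple of an anti-correlated dataset belongs to the Skyline.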
The correlated synthetic dataset was generated using a Skyline composed of 10000 tuples
on 19 dimensions. Therefore, once the Skyline reaches a cardinality of 10000, it remains
unchanged through the following dimensions.
Dimensions
Cardin. 11 15 19 23 27 31 35 39 43 47 51 55
10000 62.85 91.50 98.92 99.92 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
50000 41.94 81.48 96.82 99.53 99.95 99.99 100.0 100.0 100.0 100.0 100.0 100.0
100000 35.75 76.19 95.33 99.37 99.95 99.99 100.0 100.0 100.0 100.0 100.0 100.0
500000 21.68 60.48 88.49 98.09 99.78 99.98 99.99 99.99 100.0 100.0 100.0 100.0
Table 3: % Skyline points in not-correlated synthetic data
Dimensions
Cardin. 11 15 19 23 27 31 35 39 43 47 51 55
10000 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
50000 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
100000 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
500000 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
Table 4: % Skyline points in anti-correlated synthetic data
Dimensions
Cardin. 11 15 19 23 27 31 35 39 43 47 51 55
10000 8.77 9.89 10.00 10.00 10.00 10.00 10.00 10.00 10.00 10.00 10.00 10.00
50000 1.75 1.98 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00
100000 0.88 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
500000 0.18 0.20 0.20 0.20 0.20 0.20 0.20 0.20 0.20 0.20 0.20 0.20
Table 5: % Skyline points in correlated synthetic data
Computing time
For the GPU-CPU sorted algorithm processing not-correlated synthetic datasets,
cardinality affects the computing time significantly more than dimensionality does
(Table 6). Because the GPU-CPU algorithm’s main task involves only vector
comparisons, an increment in the number of cells processed would be expected to produce
a proportional increment in the processing time. On the contrary, our results show that
while the increment in computing time for cardinalities under 50K is small and constant,
the increment for higher cardinalities displays a logarithmic trend (Figure 17).
Cardinality
Dimensions 10000 50000 100000 500000
11 1.017652 2.475153 8.439203 205.574463
15 0.292031 6.142216 27.723028 1328.326538
19 0.391086 9.497192 56.642090 2557.642578
23 0.396959 10.761326 429.605225 3058.534424
27 0.395591 12.631336 87.725037 3445.944092
31 0.345020 14.184874 514.369629 3765.470459
35 0.401556 18.972763 617.557678 15470.986328
39 0.403797 20.974472 680.608521 17053.708984
43 0.409707 23.135134 744.215027 18638.875000
47 0.418898 24.800280 798.059509 19992.802734
51 0.419861 26.148502 862.092285 21582.740234
55 0.428797 26.679304 926.026123 23195.812500
Table 6: Algorithm’s computing time for not-correlated synthetic datasets
Figure 17: Algorithm’s computing time for processing one cell on not-correlated synthetic datasets.
The algorithm’s performance on the anti-correlated synthetic datasets is summarized in
Figure 18 and Table 7. The average computing time per cell is mostly constant on these
datasets; this behavior can be explained by the density of Skyline points reaching 100%
in this data space.
Cardinality
Dimensions 10000 50000 100000 500000
11 0.266033 6.160472 247.338318 6206.256348
15 0.254809 6.112443 285.676178 7174.325195
19 0.293400 7.653198 364.562408 9138.733398
23 0.296162 9.572461 427.748474 10696.526367
27 0.298267 12.191456 480.847626 12019.625977
31 0.262136 14.040410 504.473419 12618.903320
35 0.303876 18.668655 606.348694 15170.001953
39 0.308801 21.145960 672.321716 16812.277344
43 0.313407 23.297487 731.681946 18289.800781
47 0.313600 24.434307 792.916382 19823.744141
51 0.317469 25.232813 856.372070 21424.695312
55 0.329156 25.835175 922.723022 23079.070312
Table 7: Algorithm’s computing time for anti-correlated synthetic datasets
While working on correlated synthetic datasets, the algorithm obtains its best perfor-
mance processing 100K tuples (Figure 19 and Table 8).
Figure 18: Algorithm’s computing time for processing one cell on anti-correlated synthetic datasets
Effect of using GPU
We analysed the effect of processing the synthetic datasets in the GPU device using the
texture memory to obtain a faster access to the data. The results displayed in Figure 20
and Figure 21 establish that the processing time remains unchanged when the algorithm
makes use of the GPU processors on not-correlated synthetic data.
Figure 22 and Figure 23 display the effect of the use of GPU processors on anti-correlated
synthetic data processing. The effect is null.
The analysis of the running time for the correlated synthetic dataset finds a significant
decrease assignable to the utilisation of the GPU processing capabilities. The results are
shown in Figure 24 and Figure 25.
Figure 19: Algorithm’s computing time for processing one cell on correlated synthetic datasets.
Cardinality
Dimensions 10000 50000 100000 500000
11 0.024828 0.035293 0.057874 0.561151
15 0.024696 0.038187 0.066345 0.735816
19 0.060545 0.079764 0.114862 1.015052
23 0.067920 0.089358 0.131216 1.223082
27 0.073202 0.098617 0.146550 1.450916
31 0.076620 0.105001 0.158715 1.563144
35 0.087926 0.121737 0.182078 1.828309
39 0.094304 0.133889 0.198746 2.046243
43 0.102605 0.143463 0.218502 2.290935
47 0.107733 0.151773 0.232963 2.340979
51 0.114884 0.162657 0.255126 2.727166
55 0.122318 0.171794 0.264067 2.648500
Table 8: Algorithm’s computing time for correlated synthetic datasets
Figure 20: GPU effect on unsorted not-correlated synthetic datasets
Figure 21: GPU effect on sorted not-correlated synthetic datasets
Figure 22: GPU effect on unsorted anti-correlated synthetic datasets
Figure 23: GPU effect on sorted anti-correlated synthetic datasets
Figure 24: GPU effect on unsorted correlated synthetic datasets
Figure 25: GPU effect on sorted correlated synthetic datasets
Effect of sorting
We analysed the effect of pre-sorting the synthetic datasets to optimize pruning efficiency
in the algorithms. Figure 26 and Figure 27 show that the computing time is not affected
by the pre-sorting in the not-correlated synthetic datasets, neither using the GPU texture
memory nor taking advantage of the CPU multiprocessing capabilities.
Figure 26: Sorting effect on not-correlated synthetic datasets (GPU algorithm)
Figure 28 and Figure 29 show that the effect of pre-sorting is null for the anti-correlated
synthetic datasets. Figure 30 shows that the computing time for the correlated synthetic
datasets is not affected by pre-sorting when the process is executed in the GPU texture
memory. Processing the same datasets using the CPU multiprocessing capabilities
provides an improvement in computing time directly proportional to the growth of the
dataset density (Figure 31).
Figure 27: Sorting effect on not-correlated synthetic datasets (CPU algorithm)
Figure 28: Sorting effect on anti-correlated synthetic datasets (GPU algorithm)
Figure 29: Sorting effect on anti-correlated synthetic datasets (CPU algorithm)
Figure 30: Sorting effect on correlated synthetic datasets (GPU algorithm)
Figure 31: Sorting effect on correlated synthetic datasets (CPU algorithm)
4.3.2 Results on real-life data
The Skyline points density was measured as the percentage of skyline points found in the
real-life datasets (Table 9). The previous correlation analysis for the Microarray dataset
qualified it as a not-correlated data space; this characteristic explains the high percentage
of Skyline points starting at dimension 43. The NBA, EEG1 and EEG2 datasets have
been categorized as highly correlated and show a slight increment in the Skyline density,
proportional to the raise in the dimensionality.
Dimensions
Cardinality          11    15    19    23    27    31    35     39     43     47     51    55
20000 (NBA)          3.31  5.85  9.40
47000 (Microarray)   0.79  2.37  2.73  4.94  6.40  8.16  16.70  23.01  99.40  99.91  100   100
210259 (EEG1)        0.01  0.01  0.01  0.03  0.03  0.06  0.15   0.17   0.23   0.23   0.26  0.27
631200 (EEG2)        0.08  0.08  0.08  0.09  0.09  0.09  0.09   0.09   0.09   0.09   0.09  0.09
Table 9: % Skyline points in real-life data.
Computing time
The GPU-CPU sorted algorithm processing correlated real-life datasets reduces the
computing time more efficiently than the same algorithm working on not-correlated
real-life datasets (Table 10). Processing each cell requires almost the same fraction of
clocks on EEG1 across the increasing number of dimensions, while this variation is
slightly wider on EEG2. In spite of the NBA dataset’s low cardinality, the average
computing time for one cell of this data space is more than twice the value obtained for
the EEG1 dataset. The Microarray dataset presents the same behavior as the
not-correlated synthetic datasets when measuring computing times per cell (Figure 32).
Cardinality
Dimensions  20000 (NBA)  47000 (Microarray)  210259 (EEG1)  631200 (EEG2)
11          0.027071      0.036561           0.135716       0.921909
15          0.031448      0.045965           0.166571       1.153457
19          0.047514      0.141917           0.208821       1.485879
23                        0.098986           0.240849       1.763071
27                        0.133895           0.278463       2.036673
31                        0.210513           0.317111       2.351421
35                        0.902592           0.361544       2.650551
39                       30.426100           0.384954       2.961506
43                       23.837299           0.444117       3.342136
47                       25.255312           0.485952       3.491606
51                       27.135036           0.525998       4.034825
55                       27.916765           0.601897       4.002118
Table 10: Algorithm’s computing time for real-life datasets
Figure 32: Algorithm’s computing time for processing one cell on real-life datasets.
Effect of using GPU
Figure 33 shows that the use of the GPU improves processing time when the data is highly correlated. Processing efficiency for not-correlated datasets is not affected by the use of the GPU. Figure 34 shows that the same effect occurs on both sorted and unsorted real-life datasets.
Figure 33: GPU effect on unsorted real-life datasets
Effect of sorting
Figure 35 shows that the pre-sorting process does not affect the efficiency of the algorithm
running in the GPU texture memory.
Figure 34: GPU effect on sorted real-life datasets
Figure 35: Sorting effect on real-life datasets processed using the GPU algorithm
Pre-sorting the datasets improves the efficiency of the algorithm running on the CPU processors only when the data is correlated. Figure 36 shows that the algorithm maintains the same performance when the not-correlated Microarray dataset is processed.
Figure 36: Sorting effect on real-life datasets processed using the CPU algorithm
4.4 Summary
Not-correlated and anti-correlated datasets benefit from none of the improvements tested in this experiment; Figure 37 and Figure 38 show that the processing time stays unchanged under each of the improvements.
Correlated synthetic datasets and real-life datasets are affected by the use of the GPU texture memory; however, the computation of the skylines proves to be more efficient when
the data is unsorted (Figure 39 and Figure 40). The reason the unsorted implementation of our algorithm outperforms the sorted version can be found by analyzing how the data space arrangement changes through the algorithm's tasks.
Figure 37: Comparing effect of the algorithms in not-correlated synthetic datasets
The stronger candidates to
become skyline points are obtained in the pre-sorting process. How are these top-points positioned in the data space? Using 55 dimensions, we found that the EEG1 dataset delivers 575 points at the top of its sorted space, but only 74 of these top-points belong to the final skyline. Figure 41 graphs the values of the first 11 top-points, where all but the 11th are skyline points. Given that the pre-sorting process provides a subset of points representing a cluster in the data space, the partial skyline for this subset eliminates most of the top-points and delivers too few not-dominated points into the process running on the GPU. Therefore the GPU algorithm wastes precious texture memory resources and the unsorted implementation becomes more efficient.
Figure 38: Comparing effect of the algorithms in anti-correlated synthetic datasets
Figure 39: Comparing effect of the algorithms in correlated synthetic datasets
Figure 40: Comparing effect of the algorithms in real-life datasets
Figure 41: Top points cluster
Given a high-dimensional data space with high cardinality, the unsorted implementation of the parallel algorithm demonstrates that the resources provided by GPU devices improve the efficiency of a simple parallelized scan algorithm, and it compares well with other more complex approaches [21]. Nevertheless, our work aimed to obtain a consistent and scalable reduction in computing time. This goal was not achieved because of the time consumed by data transfers between RAM (host) memory and GPU (device) memory. The texture memory size prohibits uploading the entire dataset; consequently, the algorithm partitions the data, and the computing time increases significantly at each point where a target subspace is transferred to the GPU. This effect is visible in the results obtained by processing the real-life correlated datasets.
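The transfer overhead grows with the number of partitions the data must be split into. A back-of-the-envelope sketch of that relationship (the function name and the sizes in the comment below are illustrative assumptions, not values queried from the device):

```c
#include <stddef.h>

/* Number of host-to-device uploads needed when a dataset must be split
 * into partitions that each fit in the texture memory budget. More
 * dimensions (or larger partial results) mean fewer points fit per
 * partition, and therefore more transfers. */
size_t partitions_needed(size_t n_points, size_t dims,
                         size_t bytes_per_value, size_t texture_bytes) {
    size_t bytes_per_point = dims * bytes_per_value;
    size_t points_per_partition = texture_bytes / bytes_per_point;
    if (points_per_partition == 0) return 0;   /* a single point does not fit */
    /* Ceiling division: the last, partially filled partition still
     * costs one full upload. */
    return (n_points + points_per_partition - 1) / points_per_partition;
}
```

For example, at the EEG2 cardinality (631,200 points) with 55 single-precision dimensions, a hypothetical 64 MiB texture budget already forces three separate uploads, and each upload interrupts the kernel pipeline with a host-to-device transfer.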
Chapter 5
Conclusions
This work presents a parallel scan algorithm that computes Skyline points in high-dimensional large datasets using a heterogeneous computing approach. Our approach
delivers the following contributions:
• A parallel scan algorithm running on GPU devices, taking advantage of characteristics such as the texture memory provided by graphics-card hardware and the vector data types included in the OpenCL framework.
• A comprehensive test for performance and accuracy of the proposed algorithm.
We confronted and solved two main issues:
• Overwhelming complexity of recursive searching: our algorithm implements a brute-force parallel scan that better suits the GPU architecture.
• Memory availability, both host memory (RAM) and device memory: our approach
optimizes memory usage by working with texture memory and processing vector
data types inside the kernels.
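The per-point work behind the first issue above can be pictured as follows (plain C standing in for an OpenCL work-item; `scan_work_item` and the flag array are illustrative names, not the thesis implementation). Each work-item owns one point, compares it against all others, and writes only its own flag, so the work-items need no synchronization:

```c
#include <stddef.h>

/* CPU sketch of the work done by one GPU work-item in a brute-force
 * parallel scan: work-item `gid` tests its point against every other
 * point and raises its own dominated flag. Points are stored row-major,
 * dims values per point; larger is better in every dimension. */
void scan_work_item(size_t gid, const float *pts, size_t n, size_t dims,
                    int *dominated_flags) {
    const float *mine = &pts[gid * dims];
    dominated_flags[gid] = 0;
    for (size_t j = 0; j < n; j++) {
        if (j == gid) continue;
        const float *other = &pts[j * dims];
        int strictly = 0, worse = 0;
        for (size_t d = 0; d < dims; d++) {
            if (other[d] < mine[d]) { worse = 1; break; }  /* cannot dominate */
            if (other[d] > mine[d]) strictly = 1;
        }
        if (!worse && strictly) {               /* other dominates mine */
            dominated_flags[gid] = 1;
            return;                             /* early exit for this item */
        }
    }
}
```

On the GPU, one such body runs per work-item in parallel; the flags left at zero mark the Skyline.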
Through our investigation, we analysed the effect of sorting on the algorithm's performance. Sorting allows the pruning to be more effective, because the stronger points are processed first, but a not-dominated point could be ignored during sorting if it has the characteristics of a maximum or minimum in some dimension together with a low weight-sum.
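A small numeric sketch of this caveat (illustrative values and function names, not taken from the thesis): a point that is maximal in one dimension can still rank below other points under a weight-sum ordering even though nothing dominates it, so a pruning pass that keeps only the top of the sorted order could discard it.

```c
#include <stddef.h>

/* Sum of coordinates: the sort key used by a weight-sum pre-sorting step. */
float weight_sum(const float *p, size_t dims) {
    float s = 0.0f;
    for (size_t d = 0; d < dims; d++) s += p[d];
    return s;
}

/* Returns 1 when a dominates b (larger is better in every dimension,
 * strictly better in at least one). */
int dominates_pt(const float *a, const float *b, size_t dims) {
    int strictly = 0;
    for (size_t d = 0; d < dims; d++) {
        if (a[d] < b[d]) return 0;
        if (a[d] > b[d]) strictly = 1;
    }
    return strictly;
}
```

Here the point (9, 0) is maximal in the first dimension and thus not-dominated, yet its weight-sum of 9 ranks it below (5, 5), whose sum is 10.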
To achieve efficiency when processing high-dimensional data spaces with high cardinality, we accessed the GPU texture memory, which is characterized by faster access but whose size is bounded by the device specifications. While the use of this resource improves the algorithm's efficiency when dealing with correlated datasets, it causes the opposite effect when the data space has been qualified as not-correlated or anti-correlated. Datasets with zero or negative correlation present partial Skylines with high cardinality. The size of each data partition is calculated to fit into the GPU texture memory; therefore larger partial results cause a significant increase in the number of uploads to the device memory. In spite of this limitation, the algorithm's dependence on hardware availability still implies scalability because of the algorithm's adaptability.
Parallel scan algorithms have been studied and shown to outperform complex tree-structured indexes when the datasets belong to a high-dimensional space [27]. Our proposal aimed to provide a simple approach to the Skyline computation problem through a parallel scan dominance algorithm. The experimental part of our work demonstrates the accuracy of our implementation and its ability to deal with high dimensionality. To the best of our knowledge, other parallel implementations have been tested only on data spaces with dimensionality lower than that of the datasets generated and collected in this work. Nevertheless, our algorithm's computing time compares well with the results presented in these works [21].
Bibliography
[1] Almasi, G. and Gottlieb, A. Highly Parallel Computing. Pearson Education. 1993.
[2] Atallah, M. et al. Asymptotically Efficient Algorithms for Skyline Probabilities of Uncertain Data. ACM Transactions on Database Systems. 2011.
[3] Atsalakis, G. and Valavanis, K. Surveying stock market forecasting techniques - Part
II: Soft computing methods. Expert Syst. Appl. 36, 3 (April 2009), 5932-5941. 2009.
[4] BCI-Lab. Data sets IIIa. Provided by the Laboratory of Brain-Computer Interfaces (BCI-Lab), Graz University of Technology (Gert Pfurtscheller, Alois Schlögl). http://bbci.de/competition/iii/index.html
[5] Bhattacharya, B. et al. Computation of Non-dominated Points Using Compact Voronoi Diagrams. WALCOM: Algorithms and Computation, Lecture Notes in Computer Science. Springer Berlin/Heidelberg. 2010.
[6] Böhm, C. et al. SkyDist: Data Mining on Skyline Objects. Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science. Springer Berlin/Heidelberg. 2010.
[7] Börzsönyi, S. et al. The Skyline operator. Proceedings of the 17th International Conference on Data Engineering (ICDE), pp. 421-430. 2001.
[8] Buyya, R. and Venugopal, Srikumar. A gentle introduction to grid computing and
technologies. CSI Communications. July 2005.
[9] Chaudhuri, S. What Next? A Half-Dozen Data Management Research Goals for Big Data and the Cloud. PODS '12, May 21-23, 2012, Scottsdale, Arizona, USA.
[10] Choi, W. et al. Multi-criteria decision making with skyline computation. IEEE 13th International Conference on Information Reuse and Integration (IRI), pp. 316-323, 8-10 Aug. 2012.
[11] Choi W. et al. Parallel Computation of Skyline and Reverse Skyline Queries Using
MapReduce. Proceedings of the VLDB Endowment 6.14, 2013.
[12] Chomicki, J. et al. Skyline with Presorting. In: Proceedings of the 19th International
Conference on Data Engineering (ICDE). IEEE Computer Society. pp 717-719. 2003.
[13] De Weck, O. Multiobjective Optimization: History and Promise, In The Third China-
Japan-Korea Joint Symposium on Optimization of Structural and Mechanical Sys-
tems, 2004.
[14] Ding, X. and Jin, H. Efficient and Progressive Algorithms for Distributed Skyline Queries over Uncertain Data. IEEE 30th International Conference on Distributed Computing Systems (ICDCS), 2010.
[15] Ding, X. et al. Continuous monitoring of skylines over uncertain data streams. Information Sciences. 2011.
[16] Donoho, D. High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality. Stanford University. 2000. www-stat.stanford.edu/~donoho/Lectures/AMS2000/Curses.pdf
[17] Fung, G. et al. Extract Interesting Skyline Points in High Dimension. Database Systems for Advanced Applications, Lecture Notes in Computer Science. Springer Berlin/Heidelberg. 2010.
[18] Gaster, B. et al. Heterogeneous Computing with OpenCL. Elsevier, 2012.
[19] Govindaraju, N. et al. A cache-efficient sorting algorithm for
database and data mining computations using graphics pro-
cessors. University of North Carolina, Chapel Hill, 2005.
https://gamma.cs.unc.edu/papers/documents/technicalreports/tr05016.pdf
[20] Hsueh, Y. et al. Efficient Updates for Continuous Skyline Computations. Database and Expert Systems Applications: 19th International Conference (DEXA), Italy. Springer. 2008.
[21] Im, H. et al. Parallel skyline computation on multicore architectures. Information Systems, Volume 36, Issue 4, June 2011, Pages 808-823. http://www.sciencedirect.com/science/article/pii/S0306437910001389
[22] Jang, S. et al. Skyline Minimum Vector. 12th International Asia-Pacific Web Conference. 2010.
[23] Jiang, X. et al. Prominent streak discovery in sequence data. Proceedings of the 17th
ACM SIGKDD international conference on Knowledge discovery and data mining.
ACM, New York, USA. 2011.
[24] Jones, D. Good Practice in (Pseudo) Random Num-
ber Generation for Bioinformatics Applications. May 2010.
http://www0.cs.ucl.ac.uk/staff/D.Jones/GoodPracticeRNG.pdf
[25] Jung, H. et al. A fast and progressive algorithm for skyline queries with totally and partially ordered domains. Journal of Systems and Software. 2010.
[26] Köhler, H. et al. Efficient parallel skyline processing using hyperplane projections. In Proceedings of the International Conference on Management of Data (SIGMOD '11). ACM, New York, NY, USA. 2011.
[27] Kim, J. et al. Parallel multi-dimensional range query processing with R-trees on GPU. Journal of Parallel and Distributed Computing, Volume 73, Issue 8, August 2013.
[28] Köhler, H. and Yang, J. Computing Large Skylines over Few Dimensions: The Curse of Anti-correlation. 12th International Asia-Pacific Web Conference. Korea. 2010.
[29] Kossmann, D. et al. Shooting stars in the sky: An online algorithm for skyline queries.
In Proceedings of the Very Large Data Bases Conference (VLDB; Hong Kong, China,
Aug. 20-23). 2002. 275-286.
[30] Kriegel, H. et al. Route skyline queries: A multi-preference path planning approach. IEEE 26th International Conference on Data Engineering (ICDE), 2010.
[31] Kung, H. et al. On finding the maxima of a set of vectors. Journal of the ACM, 22(4):469-476, 1975.
[32] Lee, J. and Hwang, S. BSkyTree: scalable skyline computation using a balanced pivot
selection. Proceedings of the 13th International Conference on Extending Database
Technology (EDBT 10). ACM, New York, NY, USA. 2010.
[33] Lee, J. and Hwang, S. QSkycube: Efficient Skycube Computation Using Point Based
Space Partitioning PVLDB. 2010.
[34] Li, C. et al. Multi-Source Skyline Queries Processing in Multi-Dimensional Space. Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science. Springer Berlin/Heidelberg. 2010.
[35] Lian, X. and Chen, L. Reverse skyline search in uncertain databases. ACM Transac-
tions on Database Systems (TODS). 2008.
[36] Loper, S. and Makki, S. Data Filtering Utilizing Window Indexing. IEEE 24th International Conference on Advanced Information Networking and Applications Workshops, 2010.
[37] Lu, H. et al. Continuous Skyline Monitoring over Distributed Data Streams. Scientific and Statistical Database Management, Lecture Notes in Computer Science. Springer Berlin/Heidelberg. 2010.
[38] Lu, H. et al. Flexible and Efficient Resolution of Skyline Query Size Constraints.
IEEE transactions on knowledge and data engineering, Vol.23 (7), p.991-1005 [Peer
Reviewed Journal]. 2011.
[39] Marler, T. and Arora, J. The weighted sum method for multi-objective optimization: new insights. Struct Multidisc Optim (2010) 41:853-862. Springer Verlag. 2009.
[40] Narzisi, G. et al. Multi-Objective Evolutionary Optimization of Agent-based models:
an application to Emergency Response Planning. The IASTED International Con-
ference on Computational Intelligence, CI 2006, pp. 224-230, November 20-22, San
Francisco, CA, 2006.
[41] Narzisi, G. Multi-Objective Optimization. Courant Institute of Mathematical Sci-
ences. New York University, 2008.
[42] Nickolls, J. and Dally, W. The GPU computing era. IEEE Computer Society. March-
April 2010.
[43] Ozyer, T. et al. Integrating multi-objective genetic algorithm based clustering and data partitioning for skyline computation. Applied Intelligence. Springer Netherlands. 2011.
[44] Papadias, D. et al. Progressive skyline computation in database systems. ACM Trans-
actions on Database Systems 30 (1). 2005. 41-82.
[45] Pareto, V. Manual of political economy. A. M. Kelley Publishers, New York, 1971.
[46] Preparata, F. and Shamos, M. Computational Geometry: An Introduction. Springer-
Verlag, New York, Berlin, etc., 1985.
[47] Raïssi, C. et al. Computing closed skycubes. Proceedings of the VLDB Endowment. 2010.
[48] Rocha-Junior, J. et al. Efficient execution plans for distributed skyline query pro-
cessing. Proceedings of the 14th International Conference on Extending Database
Technology. Uppsala, Sweden. 2011.
[49] Sarma, A. et al. Representative skylines using threshold-based preference distributions. IEEE 27th International Conference on Data Engineering (ICDE). 2011.
[50] Scarpino, M. OpenCL in action. Manning Publications Co. 2012.
[51] Siddique, A. and Morimoto, Y. Efficient Maintenance of k-Dominant Skyline for Frequently Updated Database. Second International Conference on Advances in Databases, Knowledge, and Data Applications. 2010.
[52] Tao, Y. et al. Distance-based Representative Skyline. IEEE International Conference on Data Engineering (ICDE). 2009.
[53] Tambaram Kailasam, G. et al. Efficient skycube computation using point and domain-based filtering. Information Sciences. 2010.
[54] Tan, K. et al. Efficient progressive skyline computation. In Proceedings of the Very Large Data Bases Conference (VLDB; Rome, Italy, Sep. 11-14). 2001. 301-310.
[55] The Luc, Dinh. Pareto optimality in Pareto Optimality, Game Theory And Equilib-
ria. Springer, Optimization and its Applications, Vol. 17, 2008.
[56] Valkanas, G. et al. Efficient and Adaptive Distributed Skyline Computation. Scientific and Statistical Database Management, Lecture Notes in Computer Science. Springer Berlin/Heidelberg. 2010.
[57] Venkatesh, R. et al. Skyline and mapping aware join query evaluation. Information
Systems, Volume 36, Issue 6, September 2011.
[58] Vlachou, A. and Vazirgiannis, M. Ranking the sky: Discovering the importance of
skyline points through subspace dominance relationships. Data & knowledge engi-
neering, Vol.69 (9), p.943-964 [Peer Reviewed Journal]. 2010.
[59] Weber, R. et al. A Quantitative analysis and performance study for Similarity-search
methods in High-dimensional spaces. Proceedings of the 24th VLDB Conference, New
York, USA. 1998.
[60] Xiao, Y. and Chen, Y. Efficient Distributed Skyline Queries for Mobile Applications.
Journal of Computer Science and Technology. Springer Boston. 2010.
[61] Yang, Z. et al. Efficient Analyzing General Dominant Relationship based on Partial
Order Models. IEICE TRANSACTIONS on Information and Systems. 2010.
[62] Yiu, M. et al. Measuring the Sky: On Computing Data Cubes via Skylining the
Measures. IEEE Transactions on Knowledge and Data Engineering, 2010.
[63] Yuan, Y. et al. Efficient computation of the skyline cube. Proceedings of the 31st
International Conference on Very Large Data Bases (VLDB 05). 2005.
[64] Zemke, F. What is new in SQL:2011. SIGMOD Record, March 2012 (Vol 41, No 1).
[65] Zhang, Y. et al. Ranking uncertain sky: The probabilistic top-k skyline operator. Information Systems, July 2011.