[VLDB 2013] Skyline Operator on Anti-correlated Distributions
Skyline Operator on Anti-correlated Distribution
Proceedings of the VLDB Endowment (2013), Vol. 6, No. 9
Haichuan Shang, Masaru Kitsuregawa
Presenter:
WooSung Choi ([email protected])
DataKnow. Lab, Korea Univ.
Background
Related work
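Preliminaries
• Formal definition of Dominates ($\prec$)
Given a set of d-dimensional points, we say that a point $p$ DOMINATES another point $q$ if and only if $p$ is no worse than $q$ in every dimension and $p$ is strictly better than $q$ in at least one dimension.
Denoted by $p \prec q$ (simply put, $p$ is clearly preferred over $q$)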
Definition from http://www.comp.nus.edu.sg/~atung/publication/k_dominant.pdf
Note that the meaning of ‘dominates’ may differ according to the type of application
www.caranddriver.com
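The dominance test described above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the function name `dominates` and the `smaller_is_better` flag are hypothetical, and the flag exists only because the preferred direction depends on the application.

```python
from typing import Sequence


def dominates(p: Sequence[float], q: Sequence[float],
              smaller_is_better: bool = True) -> bool:
    """Return True if point p dominates point q.

    Assumption: every dimension shares the same preference direction,
    selected by smaller_is_better.
    """
    if len(p) != len(q):
        raise ValueError("points must have the same dimensionality")
    if not smaller_is_better:
        # Flip the sign so that the comparison below still means "better".
        p = [-v for v in p]
        q = [-v for v in q]
    no_worse_everywhere = all(a <= b for a, b in zip(p, q))
    strictly_better_somewhere = any(a < b for a, b in zip(p, q))
    return no_worse_everywhere and strictly_better_somewhere


print(dominates((1, 2), (2, 2)))  # p is better in x and ties in y
print(dominates((1, 3), (2, 2)))  # p is better in x but worse in y
```

Two points that each win in a different dimension do not dominate one another, which is why a skyline can hold many points.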
Formal definition (skyline)
• The Skyline operator
Input - Given a set of objects {
[Figure: points A, B, C, D, E, F, and G plotted on the x axis and y axis, with Dominating Area(B) marked]
Common misconceptions: “ ” is wrong; “ , ” is correct
Algorithm: Naïve 1
Suppose there are n objects in the given set.
Naïve approach: nested loop structure
Computational Cost -
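The nested-loop idea can be sketched as follows. This is an illustrative version under two assumptions that are not stated on the slide: smaller values are better in every dimension, and the helper names `dominates` and `naive_skyline` are hypothetical. Each object is compared against every other object, so the number of dominance checks grows quadratically with n in the worst case.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere
    (assumption: smaller is better)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def naive_skyline(points: Sequence[Point]) -> List[Point]:
    """Return the points that no other point dominates."""
    skyline = []
    for p in points:                # outer loop: candidate object
        dominated = False
        for q in points:            # inner loop: possible dominator
            if dominates(q, p):
                dominated = True
                break               # one dominator is enough to discard p
        if not dominated:
            skyline.append(p)
    return skyline


points = [(1, 9), (2, 4), (3, 6), (5, 2), (6, 5), (8, 1), (9, 8)]
print(naive_skyline(points))  # [(1, 9), (2, 4), (5, 2), (8, 1)]
```

The early `break` helps when dominators are found quickly, but a skyline object is never discarded, so it always costs a full inner pass.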
Motivation
Data Distribution
Data Distribution?
Related Work: Summary
• Worst-case Analysis (2.1)
Worst-case complexity on arbitrary data distributions [16], [12]
• Elimination Category (2.2)
Average complexity with dimensional independence
Idea: eliminate non-skyline objects quickly!
BNL [7], SFS [9], LESS [12], …
[20], with a bound expressed in terms of the skyline cardinality
Why is anti-correlation important?
Anti-Correlated (2)
• A relationship in which the value in one dimension increases as the values in the other dimensions decrease
• Skyline queries are used to find a set of non-dominated data points for multi-criteria decision making
• Data in the real world is more likely to be anti-correlated
Anti-Correlated (3)
• The anti-correlation significantly limits the practical usage of the existing algorithms
• and yields the demand for effective mathematical models and efficient algorithms on anti-correlated data
[20]: the bound depends on the skyline cardinality, which tends to increase on anti-correlated distributions
These existing algorithms fall back to
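The effect of anti-correlation on skyline cardinality can be illustrated with a small experiment. This is a sketch under loud assumptions: the generator below (points scattered around the line x + y = 1 with uniform noise) is a hypothetical stand-in, not the paper's model, and smaller values are assumed to be better in both dimensions.

```python
import random


def skyline_size_2d(points):
    """Count 2D skyline points (assumption: smaller is better in both dimensions).

    After sorting by x (then y), a point is on the skyline exactly when its y
    is strictly smaller than every y seen so far. Exact duplicate points are
    counted once, which is acceptable for random float data.
    """
    best_y = float("inf")
    count = 0
    for _, y in sorted(points):
        if y < best_y:
            count += 1
            best_y = y
    return count


def uniform_points(n, rng):
    """Independent, uniformly distributed coordinates."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def anti_correlated_points(n, rng, noise=0.05):
    """Hypothetical generator: y falls as x rises, plus a little noise."""
    points = []
    for _ in range(n):
        x = rng.random()
        points.append((x, 1.0 - x + rng.uniform(-noise, noise)))
    return points


rng = random.Random(2013)  # fixed seed so the run is repeatable
uniform_size = skyline_size_2d(uniform_points(1000, rng))
anti_size = skyline_size_2d(anti_correlated_points(1000, rng))
print("uniform:", uniform_size, "anti-correlated:", anti_size)
```

With many more skyline points, an elimination-based algorithm has fewer objects it can discard early, which is the practical concern raised above.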
What does this research set out to do?
Contributions
Contribution
• 1) General model for the anti-correlated distribution
• 2) Polynomial estimation of the lower bound of the expected value of skyline cardinality
• 3) A “Determination and Elimination Framework” for efficient computation of skyline on anti-correlated distribution
3. PRELIMINARIES
Definition & Expectation of Skyline Cardinality
Model: Anti-Correlated Distribution
[Figure: scatter plots of three data distributions; panels: Uniform, Anti c=1, Anti c=0.1]
1) General model for the anti-correlated distribution
1K Tuples
[Figure: Uniform, Anti c=1, and Anti c=0.1 panels; the values shown for the three panels, in order, are 12, 57, and 116]
[Figure: Anti c=1 panel]
57
$\hat{s}_{2,1000,1} \approx \sqrt{1000\pi} - 1 = 55.0499122$
2) Polynomial Estimation of the lower bound of the expected value of skyline cardinality
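The arithmetic behind the estimate for the Anti c=1 panel can be checked directly. This is only a numeric check of the expression shown on the slide; reading the three subscripts as 2 dimensions, 1000 tuples, and c = 1 is an assumption, and the variable names are hypothetical.

```python
import math

n = 1000                                # number of tuples (the 1K data set)
estimate = math.sqrt(n * math.pi) - 1   # expression from the slide
observed = 57                           # value shown next to the Anti c=1 panel

print(f"estimate = {estimate:.7f}")
print(f"observed - estimate = {observed - estimate:.4f}")
```

The estimate sits just below the value shown for the panel, which is consistent with it being presented as a lower bound.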
Generalization
• Theorem 3
The expected value of the skyline cardinality
when
• where
[20]: the bound depends on the skyline cardinality, which tends to increase on anti-correlated distributions
These existing algorithms: ) ~
Pearson Correlation Coefficient or covariance-based model
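Covariance
• In probability theory and statistics, covariance is a value that indicates the degree of correlation between two random variables
• If one of the two variables tends to rise when the other tends to rise, the covariance is positive
• Conversely, if one of the two variables tends to fall when the other tends to rise, the covariance is negative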
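The sign behaviour described above can be seen on a tiny example. This is a minimal sketch using the population covariance and the Pearson correlation coefficient computed by hand; the sample values and function names are made up for illustration.

```python
import math


def covariance(xs, ys):
    """Population covariance of two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def pearson(xs, ys):
    """Pearson correlation coefficient: covariance scaled into [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]    # tends to rise with xs
falling = [9, 7, 6, 4, 1]   # tends to fall as xs rises (anti-correlated)

print(covariance(xs, rising), pearson(xs, rising))
print(covariance(xs, falling), pearson(xs, falling))
```

Covariance depends on the scale of the data, whereas the Pearson coefficient is normalized, so the latter is easier to compare across data sets.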