Probabilistic Skyline Operator over Sliding Windows
![Page 1: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/1.jpg)
Probabilistic Skyline Operator over Sliding Windows
Wenjie Zhang
University of New South Wales & NICTA, Australia
Joint work:
Xuemin Lin, Ying Zhang, Wei Wang (UNSW & NICTA)
Jeffrey Xu Yu (CUHK)
![Page 2: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/2.jpg)
Outline
Background, Framework, Algorithms, Experiment, Conclusion
![Page 3: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/3.jpg)
Background
Elements arrive continuously, each with an occurrence probability.
Problem: how can we continuously compute the skyline over a sliding window of the N most recent elements?
[Figure: elements 1 to 6 arrive in order, each labeled with its occurrence probability (element 6: 0.5); sliding window N = 5]
![Page 4: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/4.jpg)
Background
Multi-criteria decision making over uncertain data, e.g., online auctions, financial markets, …
![Page 5: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/5.jpg)
Related work
Probabilistic skyline computation: probabilistic skyline (VLDB07); probabilistic reverse skyline (SIGMOD08)
Uncertain stream processing: probabilistic aggregates and sketches over uncertain streams (SIGMOD07, SODA07, PODS07); frequent items on uncertain streams (SIGMOD08); top-k queries over uncertain sliding windows (VLDB08); …
![Page 6: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/6.jpg)
Models and Problem Definition
Model: DS is a stream of elements; each element a is a point in a d-dimensional space with an occurrence probability $P(a) \in (0, 1]$.
The skyline probability of an element a is:
$P_{sky}(a) = P(a) \prod_{a' \in DS,\ a' \prec a} \bigl(1 - P(a')\bigr)$, where $a' \prec a$ means that $a'$ dominates $a$.
Problem Definition: retrieve the elements among the most recent N elements whose skyline probability is no less than a given threshold q.
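The definition above can be evaluated by brute force. The following is a minimal Python sketch, not the authors' implementation. It assumes smaller values are preferred in every dimension and that the window is given as a list of (point, probability) pairs. The helper names `dominates`, `skyline_probability` and `probabilistic_skyline` are hypothetical.

```python
from math import prod

def dominates(a, b):
    """True if point a dominates point b (assumption: smaller is better in every dimension)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline_probability(i, window):
    """P_sky(a) = P(a) * product of (1 - P(a')) over every a' in the window that dominates a."""
    point, p = window[i]
    return p * prod(1 - p2 for point2, p2 in window if dominates(point2, point))

def probabilistic_skyline(window, q):
    """Indices of the elements whose skyline probability is at least q."""
    return [i for i in range(len(window)) if skyline_probability(i, window) >= q]

# Element 2 is dominated by elements 0 and 1: P_sky = 0.9 * (1 - 0.3) * (1 - 0.4) = 0.378.
window = [((1, 1), 0.3), ((2, 1), 0.4), ((3, 3), 0.9)]
print(round(skyline_probability(2, window), 3))  # 0.378
print(probabilistic_skyline(window, 0.3))        # [0, 2]
```

This recomputes every product from scratch, which costs quadratic time per window; avoiding that cost is what the framework and algorithms below address.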
![Page 7: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/7.jpg)
Challenges and Contributions
Space efficiency. Contribution: space reduction from O(N) to $O(\ln^{d-1} N)$.
Time efficiency. Contribution: efficient R-tree-based incremental algorithms.
![Page 8: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/8.jpg)
Outline
Background and Preliminaries, Framework, Algorithms, Experiment, Conclusion
![Page 9: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/9.jpg)
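Framework: what to keep?
[Figure: elements 1 to 5 in the window, each labeled with its occurrence probability]
Window size N = 5; probability threshold q = 0.5.
$P_{sky}(a) = P(a) \cdot P_{old}(a) \cdot P_{new}(a)$, where $P_{old}(a)$ is the product of $(1 - P(a'))$ over the dominators $a'$ of $a$ that arrived before $a$, and $P_{new}(a)$ is the same product over the dominators that arrived after $a$.
$P_{old}(2) = 1 - P(1)$
$P_{new}(2) = (1 - P(3)) \cdot (1 - P(4))$
Since $P_{new}(2) < q$, element 2 will never become a skyline point in the window. Its newer dominators 3 and 4 remain in the window at least as long as element 2 does, so $P_{sky}(2) \le P_{new}(2) < q$ until element 2 expires.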
![Page 10: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/10.jpg)
Framework: what to keep?
Candidate set $S_{N,q}$: the elements $a$ among the most recent N elements with $P_{new}(a) \ge q$.
Correctness:
(1) no missing skyline points;
(2) no false hits when determining $S_{N,q}$;
(3) no false positives when determining skyline results;
(4) no false negatives when determining skyline results.
Skyline probabilities computed from $S_{N,q}$ alone may not be exact, but they are accurate enough to decide whether the threshold q is met.
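The split into $P_{old}$ and $P_{new}$, and the candidate test $P_{new}(a) \ge q$, can be sketched in Python as follows. This is an illustrative brute-force version under the same assumptions as before (smaller is better; the window is a list of (point, probability) pairs in arrival order, index 0 oldest). The functions `old_new` and `candidate_set` and the example data are made up for illustration.

```python
from math import prod

def dominates(a, b):
    """Assumption: smaller is better in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def old_new(i, window):
    """Return (P_old, P_new) for element i; window is in arrival order (index 0 = oldest)."""
    point = window[i][0]
    p_old = prod(1 - p for pt, p in window[:i] if dominates(pt, point))      # older dominators
    p_new = prod(1 - p for pt, p in window[i + 1:] if dominates(pt, point))  # newer dominators
    return p_old, p_new

def candidate_set(window, q):
    """Indices with P_new >= q; the others cannot reach q before they expire."""
    return [i for i in range(len(window)) if old_new(i, window)[1] >= q]

# Made-up example: element 1 has one older dominator (0) and two newer ones (2 and 3).
window = [((1, 1), 0.1), ((2, 2), 0.1), ((1, 2), 0.4), ((2, 1), 0.8)]
print(old_new(1, window))          # approximately (0.9, 0.12)
print(candidate_set(window, 0.5))  # [0, 2, 3]
```

Multiplying $P(a)$, $P_{old}(a)$ and $P_{new}(a)$ gives back the skyline probability from the model, because every dominator of $a$ in the window is either older or newer than $a$.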
![Page 11: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/11.jpg)
Framework
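Space required for $S_{N,q}$: $S_{N,q}$ is the minimum information that must be maintained to guarantee a correct answer.
[Figure: elements 1 to 4 with their occurrence probabilities; P(3) = 0.9]
Window size N = 4; probability threshold q = 0.5.
While elements 1 and 2 are in the window: $P_{sky}(3) = 0.9 \cdot (1 - 0.4) \cdot (1 - 0.3) = 0.378 < q$.
After elements 1 and 2 expire: $P_{sky}(3) = 0.9 > q$.
Element 3 is not a skyline point now but becomes one later, so it must be kept.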
![Page 12: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/12.jpg)
Space of Candidate Set
Theorem: for uniformly distributed data, the candidate set requires poly-logarithmic space in the average case, $O(f(q) \ln^{d-1} N)$.
![Page 13: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/13.jpg)
Outline
Background and Preliminaries, Framework, Algorithms, Experiment, Conclusion
![Page 14: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/14.jpg)
Algorithms
We maintain two R-trees:
R1: $SKY_{N,q}$, the skyline points;
R2: $S_{N,q} - SKY_{N,q}$, the candidates that are not skyline points.
![Page 15: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/15.jpg)
Algorithms
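[Figure: R-trees R1 ($SKY_{N,q}$) and R2 ($S_{N,q} - SKY_{N,q}$) over the window; the legend marks elements not in $S_{N,q}$]
Elements (occurrence probability): 1(.1), 2(.1), 3(.4), 4(.1), 5(.8), 6(.8), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1)
Window size N = 13; probability threshold q = 0.2.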
![Page 16: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/16.jpg)
Algorithms
New element arrives:
check Psky and Pnew on R1;
check Pnew on R2;
handle elements with Pnew < q.
Old element expires:
update Pold;
check Psky on R2.
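The two maintenance paths above can be sketched with a flat data structure. This is a simplified stand-in, not the authors' R-tree algorithm: Python sets replace R1 and R2, so every update scans all candidates, and the per-entry bounds and lazy global multipliers are omitted. When a candidate leaves $S_{N,q}$ (it expires, or its Pnew drops below q), its factor $(1 - P)$ is divided out of the Pold of newer candidates it dominates, mirroring the "Update Pold" step. Assumptions: smaller is better, $0 < P(a) < 1$ (so the division is safe), and the class and method names are hypothetical.

```python
from dataclasses import dataclass

def dominates(a, b):
    """Assumption: smaller is better in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

@dataclass
class Candidate:
    t: int             # arrival index
    point: tuple
    p: float           # occurrence probability P(a), assumed 0 < p < 1
    pold: float = 1.0  # product of (1 - P) over older candidates dominating a
    pnew: float = 1.0  # product of (1 - P) over newer elements dominating a

    def psky(self):
        return self.p * self.pold * self.pnew

class SlidingSkyline:
    """Hypothetical flat stand-in: r1 plays SKY_{N,q}, r2 plays S_{N,q} - SKY_{N,q}."""

    def __init__(self, n, q):
        self.n, self.q, self.t = n, q, 0
        self.cands = {}                  # S_{N,q}, keyed by arrival index
        self.r1, self.r2 = set(), set()

    def _classify(self, c):
        # Put c in r1 if P_sky >= q, otherwise in r2.
        self.r1.discard(c.t)
        self.r2.discard(c.t)
        (self.r1 if c.psky() >= self.q else self.r2).add(c.t)

    def _remove(self, c):
        # Drop c from S_{N,q}; newer candidates it dominates lose the factor (1 - P(c)) from P_old.
        del self.cands[c.t]
        self.r1.discard(c.t)
        self.r2.discard(c.t)
        for other in self.cands.values():
            if other.t > c.t and dominates(c.point, other.point):
                other.pold /= 1 - c.p
                self._classify(other)

    def push(self, point, p):
        assert 0 < p < 1, "this sketch divides by (1 - P), so P = 1 is not supported"
        self.t += 1
        # 1. The element leaving the window expires (if it is still a candidate).
        expired = self.cands.get(self.t - self.n)
        if expired is not None:
            self._remove(expired)
        # 2. Update P_new of candidates the new element dominates; compute its own P_old.
        new = Candidate(self.t, tuple(point), p)
        hopeless = []
        for c in self.cands.values():
            if dominates(new.point, c.point):
                c.pnew *= 1 - p
                if c.pnew < self.q:
                    hopeless.append(c)   # can never reach q before it expires
                else:
                    self._classify(c)
            elif dominates(c.point, new.point):
                new.pold *= 1 - c.p
        # 3. Handle elements with P_new < q, then insert the new element with P_new = 1.
        for c in hopeless:
            self._remove(c)
        self.cands[new.t] = new
        self._classify(new)

    def skyline(self):
        return sorted(self.r1)
```

For example, with `s = SlidingSkyline(n=3, q=0.5)` and four pushes of `((1, 1), 0.5)`, `((2, 2), 0.9)`, `((3, 3), 0.9)`, `((5, 5), 0.2)`, element 1 expires on the fourth push, element 2 loses its only dominator and moves into `r1`, and `s.skyline()` returns `[2]`.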
![Page 17: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/17.jpg)
Algorithms: a new element arrives
[Figure: R1 ($SKY_{N,q}$) and R2 ($S_{N,q} - SKY_{N,q}$) over window elements 2 to 13; new element 14 (P = 0.8) arrives]
Window size N = 13; probability threshold q = 0.2.
Delete an entry:
Before update: Pnew bounds (1, 1); Psky bounds (0.8, 0.8); global Pnew = 1 - 0.2.
After update: global Pnew *= (1 - 0.8), so the entry's max Pnew is 1 × 0.8 × 0.2 = 0.16 < q.
Delete the entry from R1.
![Page 18: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/18.jpg)
Algorithms: a new element arrives
[Figure: R1 ($SKY_{N,q}$) and R2 ($S_{N,q} - SKY_{N,q}$) over window elements 2 to 4 and 7 to 13; new element 14 (P = 0.8)]
Window size N = 13; probability threshold q = 0.2.
Move an entry from R1 to R2:
Before update: Pnew bounds (1, 1); Psky bounds (0.24, 0.6); global Pnew = 1.
After update: global Pnew *= (1 - 0.8), giving 0.2.
min Pnew = 0.2 ≥ q, so the entry stays in $S_{N,q}$;
max Psky = 0.6 × 0.2 = 0.12 < q, so no element in the entry is a skyline point.
Move the entry from R1 to R2.
![Page 19: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/19.jpg)
Algorithms: a new element arrives
[Figure: R1 ($SKY_{N,q}$) and R2 ($S_{N,q} - SKY_{N,q}$) over window elements 2 to 4 and 7 to 13; new element 14 (P = 0.8)]
Window size N = 13; probability threshold q = 0.2.
Drill down:
Before update: Pnew bounds (0.9, 1); global Pnew = 1.
After update: global Pnew *= (1 - 0.8), giving 0.2.
min Pnew = 0.9 × 0.2 = 0.18 < q, while max Pnew = 0.2 ≥ q.
Drill down into the entry and delete element 2.
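The entry-level decisions in the three steps above (delete, move to R2, drill down) can be sketched as a small function. Assumptions, none confirmed by the slides: each intermediate entry keeps (min, max) bounds of its children's Pnew and Psky plus one pending "global Pnew" multiplier; the stored bounds exclude that pending multiplier; and only entries fully dominated by the new element are passed in. `Entry`, `on_dominating_arrival` and `EPS` are hypothetical names. The last lines replay the numbers from the three steps.

```python
from dataclasses import dataclass

EPS = 1e-9  # tolerance so that 1 - 0.8 (0.19999999999999996 in floating point) is not "below" 0.2

@dataclass
class Entry:
    """Hypothetical aggregate for an intermediate R-tree entry."""
    pnew_bounds: tuple        # (min, max) of the children's stored Pnew
    psky_bounds: tuple        # (min, max) of the children's stored Psky
    global_pnew: float = 1.0  # pending multiplier not yet pushed down to the children

def on_dominating_arrival(entry, p, q, in_r1):
    """Apply a new element with probability p that dominates the whole entry; return the action."""
    entry.global_pnew *= 1 - p
    g = entry.global_pnew

    def below(x):
        return x < q - EPS

    if below(g * entry.pnew_bounds[1]):
        return "delete entry"        # no child can reach q any more
    if below(g * entry.pnew_bounds[0]):
        return "drill down"          # some children drop out, others stay
    if in_r1:
        if below(g * entry.psky_bounds[1]):
            return "move entry to R2"
        if below(g * entry.psky_bounds[0]):
            return "drill down"      # only some children stop being skyline points
    return "keep"

print(on_dominating_arrival(Entry((1, 1), (0.8, 0.8), 1 - 0.2), 0.8, 0.2, in_r1=True))  # delete entry
print(on_dominating_arrival(Entry((1, 1), (0.24, 0.6)), 0.8, 0.2, in_r1=True))  # move entry to R2
print(on_dominating_arrival(Entry((0.9, 1), (0.0, 0.0)), 0.8, 0.2, in_r1=False))  # drill down
```

The third call passes placeholder Psky bounds because that step does not show them; the Pnew checks decide the outcome before Psky is consulted.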
![Page 20: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/20.jpg)
Algorithms: a new element arrives
[Figure: R1 ($SKY_{N,q}$) and R2 ($S_{N,q} - SKY_{N,q}$) over window elements 2 to 4 and 7 to 13; new element 14 (P = 0.8)]
Window size N = 13; probability threshold q = 0.2.
Update Pold:
Update Pold of elements 12 and 13: global Pold /= (1 - 0.1).
![Page 21: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/21.jpg)
Algorithms: a new element arrives
[Figure: R1 ($SKY_{N,q}$) and R2 ($S_{N,q} - SKY_{N,q}$) over window elements 3, 4 and 7 to 13; new element 14 (P = 0.8)]
Window size N = 13; probability threshold q = 0.2.
Insert the new element with Pnew = 1 and compute its Psky.
![Page 22: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/22.jpg)
Algorithms: an old element expires
Delete it from R1 or R2.
Update Pold of the remaining elements it dominates: record a global Pold on intermediate entries that it fully dominates.
Check Psky after the update.
![Page 23: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/23.jpg)
Algorithms: an old element expires
[Figure: R1 ($SKY_{N,q}$) and R2 ($S_{N,q} - SKY_{N,q}$) over window elements 3, 4, 7 to 13 and 14 (P = 0.8)]
Window size N = 13; probability threshold q = 0.2.
Pold(7) /= (1 - P(3))
global Pold /= (1 - P(4))
![Page 24: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/24.jpg)
Algorithms: handling multiple thresholds
Continuous queries: users specify k probability thresholds $q_1, \dots, q_k$ with $q_i < q_{i-1}$.
Solution: instead of maintaining a single R1, we maintain R1, …, Rk, each corresponding to one threshold.
Ad-hoc queries: a user issues a query to retrieve skylines with probability at least $q'$ ($q' \ge q_k$).
Solution: find the $R_i$ with $q_i \le q' < q_{i-1}$. Then all elements in $\{R_j : j < i - 1\}$ are results, and we search $R_{i-1}$ to output the remaining qualified skylines.
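The ad-hoc query logic can be sketched as follows, assuming each element's Psky is already known. The band layout is an illustration chosen here: 0-based band b holds the elements with $q_{b+1} \le P_{sky} < q_b$ in the slide's 1-based threshold numbering, and band 0 holds $P_{sky} \ge q_1$. It may not match the slide's index convention exactly. `build_bands` and `adhoc_query` are hypothetical names, and plain lists stand in for the R-trees.

```python
def build_bands(thresholds, psky):
    """thresholds: [q1, ..., qk], strictly decreasing.
    Band b holds elements with thresholds[b] <= P_sky < thresholds[b - 1] (band 0: P_sky >= q1).
    Elements below q_k are not placed in any band."""
    bands = [[] for _ in thresholds]
    for elem, ps in psky.items():
        for b, t in enumerate(thresholds):
            if ps >= t:
                bands[b].append((elem, ps))
                break
    return bands

def adhoc_query(thresholds, bands, q_prime):
    """Elements with P_sky >= q_prime, for q_prime >= q_k."""
    if q_prime < thresholds[-1]:
        raise ValueError("q' must be at least the smallest maintained threshold")
    # First band whose lower threshold is <= q', i.e. thresholds[b] <= q' < thresholds[b - 1].
    b = next(i for i, t in enumerate(thresholds) if t <= q_prime)
    result = [elem for band in bands[:b] for elem, _ in band]   # higher bands: all qualify
    result += [elem for elem, ps in bands[b] if ps >= q_prime]  # only this band is searched
    return result

thresholds = [0.8, 0.5, 0.2]
bands = build_bands(thresholds, {"a": 0.9, "b": 0.6, "c": 0.55, "d": 0.3, "e": 0.1})
print(adhoc_query(thresholds, bands, 0.58))  # ['a', 'b']
```

Only one band is inspected element by element, which is why more thresholds mean more maintenance work but faster queries.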
![Page 25: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/25.jpg)
Experiment
Data sets:
Real: stock transactions; 2-d; probabilities assigned randomly; size: 2 million.
Synthetic: spatial locations (independent or anti-correlated); probabilities (uniform or normal); 2-d to 5-d; size: 2 million.
Default values: p = 0.3; d = 3; N = 1M; spatial distribution: anti-correlated; probability distribution: uniform.
![Page 26: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/26.jpg)
Experiment: space
The candidate set is about 0.1% of the sliding window size for 2-d data and saves around 89% of the space even for 5-d data.
![Page 27: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/27.jpg)
Experiment: space
The size of $S_{N,q}$ decreases as Pu increases, while the size of $SKY_{N,q}$ increases.
![Page 28: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/28.jpg)
Experiment: space
![Page 29: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/29.jpg)
Experiment: time
![Page 30: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/30.jpg)
Experiment: time
Maintenance time increases with the number of probability thresholds, while query time decreases.
![Page 31: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/31.jpg)
Conclusion
We characterize a candidate set of minimum size and propose time-efficient techniques.
We extend the framework to handle multiple thresholds.
![Page 32: Probabilistic Skyline Operator over Sliding Windows](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813fe8550346895daadb37/html5/thumbnails/32.jpg)
Thanks!