probabilistic skyline operator over sliding windows

Post on 09-Jan-2016


Probabilistic Skyline Operator over Sliding Windows

Wenjie Zhang

University of New South Wales & NICTA, Australia

Joint work:

Xuemin Lin, Ying Zhang, Wei Wang (UNSW & NICTA)

Jeffrey Xu Yu (CUHK)

Outline

Background, Framework, Algorithms, Experiment, Conclusion

Background

Elements continuously arrive with occurrence probabilities

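Problem: how can we continuously compute the skyline over a sliding window of the N most recent elements?

[Figure: elements 1 to 5 with occurrence probabilities 0.1, 0.1, 0.4, 0.1 and 0.8, and a newly arriving element 6 with probability 0.5. Sliding window: N = 5.]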

Background

Multi-criteria decision making on uncertain data: online auctions, financial markets, …

Related work

Probabilistic skyline computation:
- Probabilistic skyline (VLDB07)
- Probabilistic reverse skyline (SIGMOD08)

Uncertain stream processing:
- Probabilistic aggregates and sketches over uncertain streams (SIGMOD07, SODA07, PODS07)
- Frequent items on uncertain streams (SIGMOD08)
- Top-k queries over uncertain sliding windows (VLDB08)
- …

Models and Problem Definition

Model: DS is a stream of elements; each element a is a point in a d-dimensional space with an occurrence probability P(a) ∈ (0, 1].

The skyline probability of an element a is

$$P_{sky}(a) = P(a) \prod_{a' \in DS,\ a' \prec a} \bigl(1 - P(a')\bigr)$$

where a' ≺ a means that a' dominates a.

Problem definition: retrieve, from the most recent N elements, those whose skyline probability is no less than a given threshold q.
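A brute-force sketch of this definition, useful as a reference point. It assumes smaller coordinates are preferred in every dimension (the slides do not fix a direction), and the helper names `dominates`, `skyline_probability` and `probabilistic_skyline` are illustrative, not taken from the paper. It stores the whole window and recomputes every skyline probability, which is the O(N)-space, quadratic-time baseline the framework is designed to avoid.

```python
from typing import List, Sequence, Tuple

# An element is (point, occurrence probability); points are d-dimensional tuples.
Element = Tuple[Tuple[float, ...], float]


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u dominates v, assuming smaller values are preferred:
    u is no worse than v in every dimension and strictly better in one."""
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))


def skyline_probability(a: Element, window: List[Element]) -> float:
    """P_sky(a) = P(a) * product of (1 - P(a')) over every a' in the window
    that dominates a."""
    point, prob = a
    for other_point, other_prob in window:
        if dominates(other_point, point):
            prob *= 1.0 - other_prob
    return prob


def probabilistic_skyline(stream: List[Element], n: int, q: float) -> List[int]:
    """Stream indices of the elements among the N most recent ones whose
    skyline probability is at least q (brute force, O(N^2) per query)."""
    start = max(0, len(stream) - n)
    window = stream[start:]
    return [start + i for i, a in enumerate(window)
            if skyline_probability(a, window) >= q]


stream = [((1.0, 1.0), 0.5), ((2.0, 2.0), 0.9), ((3.0, 0.0), 0.4)]
print(probabilistic_skyline(stream, 3, 0.5))  # [0]
print(probabilistic_skyline(stream, 2, 0.5))  # [1]: (1, 1) has left the window
```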

Challenges and Contributions

Space efficiency. Contribution: space reduction from $O(N)$ to $O(\ln^{d-1} N)$.

Time efficiency. Contribution: efficient R-tree-based incremental algorithms.

Outline

Background and Preliminaries, Framework, Algorithms, Experiment, Conclusion

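Framework: what to keep?

[Figure: window of elements 1 to 5 with occurrence probabilities 0.1, 0.1, 0.4, 0.1 and 0.8. Window size N = 5, probability threshold q = 0.5.]

$$P_{sky}(a) = P(a) \cdot P_{old}(a) \cdot P_{new}(a)$$

where Pold(a) is the product of (1 − P(a')) over the elements a' that dominate a and arrived before a, and Pnew(a) is the same product over the dominating elements that arrived after a.

Pold(2) = 1 − P(1)

Pnew(2) = (1 − P(3)) × (1 − P(4))

Pnew(2) < q, so element 2 will never become a skyline point in the window: every element that arrived after 2 stays in the window at least as long as 2 does, so Pnew(2) can only decrease.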
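The split into Pold and Pnew can be checked with a small sketch (smaller-is-better dominance assumed). The points, and the probabilities given to elements 3, 4 and 5, are hypothetical, chosen so that elements 1, 3 and 4 dominate element 2 while element 5 does not; `old_new_factors` is an illustrative name.

```python
from typing import List, Sequence, Tuple

Element = Tuple[Tuple[float, ...], float]  # (point, occurrence probability)


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    # Assumes smaller values are preferred in every dimension.
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))


def old_new_factors(window: List[Element], i: int) -> Tuple[float, float]:
    """Split the dominance product of window[i] (window in arrival order)
    into Pold (dominators that arrived earlier) and Pnew (later)."""
    point = window[i][0]
    p_old = p_new = 1.0
    for j, (other, prob) in enumerate(window):
        if j == i or not dominates(other, point):
            continue
        if j < i:
            p_old *= 1.0 - prob   # older dominator
        else:
            p_new *= 1.0 - prob   # newer dominator
    return p_old, p_new


# Hypothetical points; element k of the slide is window[k - 1].
window = [((1.0, 1.0), 0.1), ((2.0, 2.0), 0.1), ((1.5, 1.5), 0.4),
          ((0.5, 1.8), 0.8), ((5.0, 0.0), 0.1)]
p_old, p_new = old_new_factors(window, 1)
```

With these values Pold(2) = 0.9 and Pnew(2) = 0.6 × 0.2 ≈ 0.12, which is below q = 0.5.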

Framework: what to keep?

Candidate set: $S_{N,q} = \{a : P_{new}(a) \ge q\}$, taken over the most recent N elements.

Correctness:
(1) no missing skyline points;
(2) no false hits when determining S_{N,q};
(3) no false positives when determining skyline results;
(4) no false negatives when determining skyline results.

Probabilities computed from S_{N,q} alone may not be exact, but they satisfy the threshold requirement.
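A direct, non-incremental sketch of the candidate set, again assuming smaller-is-better dominance; `candidate_set` and `skyline_results` are illustrative names. Because Psky(a) = P(a) · Pold(a) · Pnew(a) ≤ Pnew(a), every element with Psky(a) ≥ q is automatically in S_{N,q}, which is why filtering on Pnew loses no skyline points.

```python
from typing import List, Sequence, Tuple

Element = Tuple[Tuple[float, ...], float]  # (point, occurrence probability)


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    # Assumes smaller values are preferred in every dimension.
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))


def candidate_set(window: List[Element], q: float) -> List[int]:
    """Positions i in the window (arrival order) with Pnew(window[i]) >= q."""
    keep = []
    for i, (point, _) in enumerate(window):
        p_new = 1.0
        for other, prob in window[i + 1:]:   # only later arrivals count
            if dominates(other, point):
                p_new *= 1.0 - prob
        if p_new >= q:
            keep.append(i)
    return keep


def skyline_results(window: List[Element], q: float) -> List[int]:
    """Positions whose full skyline probability is at least q (brute force)."""
    out = []
    for i, (point, prob) in enumerate(window):
        for other, p in window:
            if dominates(other, point):
                prob *= 1.0 - p
        if prob >= q:
            out.append(i)
    return out


window = [((1.0, 1.0), 0.5), ((2.0, 2.0), 0.3), ((0.0, 0.0), 0.9)]
print(candidate_set(window, 0.3))  # [2]: the newest point dominates the others
```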

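Framework

Space required for S_{N,q}: S_{N,q} is the minimum information that must be maintained to obtain a correct answer.

[Figure: window of four elements 1 to 4; element 3 has probability 0.9 and is dominated by two elements with probabilities 0.4 and 0.3; the remaining element has probability 0.8. Window size N = 4, probability threshold q = 0.5.]

(1) Psky(3) = 0.9 × (1 − 0.4) × (1 − 0.3) = 0.378 < q

(2) Once the two elements dominating 3 have expired: Psky(3) = 0.9 > q

Element 3 is not a skyline result now, yet it must be kept because it becomes one later.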
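The following sketch replays this argument. Only the probabilities 0.3, 0.4, 0.9 and 0.8, N = 4 and q = 0.5 come from the slide; the 2-d coordinates and the two later arrivals are hypothetical, and smaller values are assumed to be preferred.

```python
from typing import List, Sequence, Tuple

Element = Tuple[Tuple[float, ...], float]  # (point, occurrence probability)


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    # Assumes smaller values are preferred in every dimension.
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))


def skyline_probability(a: Element, window: List[Element]) -> float:
    point, prob = a
    for other, p in window:
        if dominates(other, point):
            prob *= 1.0 - p
    return prob


N, Q = 4, 0.5
# Hypothetical points: elements 1 and 2 dominate element 3; element 4 does not.
e1, e2 = ((1.0, 1.0), 0.3), ((1.5, 0.5), 0.4)
e3, e4 = ((2.0, 2.0), 0.9), ((3.0, 0.1), 0.8)
stream = [e1, e2, e3, e4]
before = skyline_probability(e3, stream[-N:])  # 0.9 * 0.7 * 0.6, below Q

# Two later arrivals that do not dominate element 3 push elements 1 and 2 out.
stream += [((4.0, 4.0), 0.5), ((5.0, 5.0), 0.5)]
after = skyline_probability(e3, stream[-N:])   # 0.9, now above Q
```

Element 3 has no later dominators, so its Pnew stays 1 throughout; a structure that discarded it while its Psky was 0.378 could not report it once the window slides.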

Space of Candidate Set

Theorem: On average, for uniformly distributed data, the candidate set requires poly-logarithmic space, $O(f(q) \ln^{d-1} N)$.

Outline

Background and Preliminaries, Framework, Algorithms, Experiment, Conclusion

Algorithms

We maintain two R-trees:

R1: SKY_{N,q} (skyline points)

R2: S_{N,q} − SKY_{N,q} (candidates that are not skyline points)

Algorithms

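[Figure: elements 1(.1), 2(.1), 3(.4), 4(.1), 5(.8), 6(.8), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1), with probabilities in parentheses; each is marked as not in S_{N,q}, in R1: SKY_{N,q}, or in R2: S_{N,q} − SKY_{N,q}. Window size N = 13, probability threshold q = 0.2.]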

Algorithms

New element arrives:
1. Check Psky and Pnew on R1.
2. Check Pnew on R2.
3. Handle elements with Pnew < q.

Old element expires:
1. Update Pold.
2. Check Psky on R2.
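A compact sketch of these steps, under stated simplifications: a single dict stands in for both R-trees (R1 and R2 are just the members with Psky ≥ q and < q), so every step is a linear scan rather than an R-tree search; entry bounds and lazy global factors are left out; smaller values are assumed to be preferred; and every P is assumed to lie strictly between 0 and 1 so that (1 − P) can be divided out. `SlidingSkyline`, `Candidate` and their methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    # Assumes smaller values are preferred in every dimension.
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))


@dataclass
class Candidate:
    point: Tuple[float, ...]
    p: float       # occurrence probability, assumed 0 < p < 1
    p_old: float   # product of (1 - P) over older dominators still in S_{N,q}
    p_new: float   # product of (1 - P) over all newer dominators in the window

    @property
    def p_sky(self) -> float:
        return self.p * self.p_old * self.p_new


class SlidingSkyline:
    """Keeps S_{N,q} in one dict (a flat stand-in for R1 and R2)."""

    def __init__(self, n: int, q: float) -> None:
        self.n, self.q = n, q
        self.seq = 0                           # arrival number of the next element
        self.cands: Dict[int, Candidate] = {}  # arrival number -> candidate

    def _remove(self, key: int) -> None:
        # Drop `key` from S_{N,q} and divide its factor out of the Pold of
        # newer candidates it dominates.
        gone = self.cands.pop(key)
        for k, c in self.cands.items():
            if k > key and dominates(gone.point, c.point):
                c.p_old /= 1.0 - gone.p

    def insert(self, point: Sequence[float], p: float) -> None:
        # 1. The element that falls out of the window expires.
        if self.seq - self.n in self.cands:
            self._remove(self.seq - self.n)
        # 2. The new element lowers Pnew of every candidate it dominates;
        #    candidates whose Pnew drops below q can never be results again.
        doomed = []
        for k, c in self.cands.items():
            if dominates(point, c.point):
                c.p_new *= 1.0 - p
                if c.p_new < self.q:
                    doomed.append(k)
        for k in doomed:
            self._remove(k)
        # 3. Insert the new element with Pnew = 1 and Pold taken from S_{N,q}.
        p_old = 1.0
        for c in self.cands.values():
            if dominates(c.point, point):
                p_old *= 1.0 - c.p
        self.cands[self.seq] = Candidate(tuple(point), p, p_old, 1.0)
        self.seq += 1

    def skyline(self) -> List[int]:
        """Arrival numbers of the current results (Psky >= q)."""
        return sorted(k for k, c in self.cands.items() if c.p_sky >= self.q)
```

Dividing out a discarded element can inflate a stored Psky, in line with the slides' remark that probabilities based on S_{N,q} may not be exact; such inflated values still stay below q, so the reported result set is unaffected.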

Algorithms: new element arrives (delete an entry)

[Figure: elements 2(.1), 3(.4), 4(.1), 5(.8), 6(.8), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1) stored in R1: SKY_{N,q} and R2: S_{N,q} − SKY_{N,q}; new element 14(0.8) arrives. Window size N = 13, probability threshold q = 0.2.]

Before update: Pnew (min, max) = (1, 1); Psky (min, max) = (0.8, 0.8); global Pnew = 1 − 0.2.

After update: global Pnew *= 1 − 0.8, so the entry's Pnew becomes 0.8 × 0.2 = 0.16 < q.

Delete the entry from R1.
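One way to read "global Pnew" is as a lazily applied multiplier: when a new element dominates a whole entry, the factor (1 − P) is recorded once on the entry and pushed down only when the algorithm drills into it. The sketch below assumes that reading; `Node`, `scale` and `push_down` are hypothetical names, and `fractions.Fraction` is used so that values such as 1 − 0.8 compare exactly against q.

```python
from fractions import Fraction
from typing import List, Optional


class Node:
    """Hypothetical R-tree-like node: leaves hold per-element Pnew values,
    and every node carries a pending multiplier (the slides' "global Pnew")."""

    def __init__(self, pnew: Optional[List[Fraction]] = None,
                 children: Optional[List["Node"]] = None) -> None:
        self.pnew = list(pnew or [])
        self.children = list(children or [])
        self.lazy = Fraction(1)

    def scale(self, factor: Fraction) -> None:
        # O(1): record the factor on this node instead of touching each element.
        self.lazy *= factor

    def push_down(self) -> None:
        # Apply the pending factor one level down, e.g. before drilling in.
        if self.lazy != 1:
            self.pnew = [v * self.lazy for v in self.pnew]
            for child in self.children:
                child.scale(self.lazy)
            self.lazy = Fraction(1)

    def values(self) -> List[Fraction]:
        # Effective Pnew of every element below this node.
        self.push_down()
        out = list(self.pnew)
        for child in self.children:
            out.extend(child.values())
        return out


# Slide values: an entry with Pnew (1, 1) and global Pnew = 1 - 0.2;
# the new element 14 with P = 0.8 dominates the whole entry.
entry = Node(pnew=[Fraction(1), Fraction(1)])
entry.scale(1 - Fraction("0.2"))
entry.scale(1 - Fraction("0.8"))
q = Fraction("0.2")
print(all(v < q for v in entry.values()))  # True: delete the entry
```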

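Algorithms: new element arrives (move an entry from R1 to R2)

[Figure: elements 2(.1), 3(.4), 4(.1), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1) stored in R1: SKY_{N,q} and R2: S_{N,q} − SKY_{N,q}; new element 14(0.8) arrives. Window size N = 13, probability threshold q = 0.2.]

Before update: Pnew (min, max) = (1, 1); Psky (min, max) = (0.24, 0.6); global Pnew = 1.

After update: global Pnew *= 1 − 0.8.

min Pnew = 0.2 ≥ q, so every element in the entry stays a candidate; max Psky = 0.12 < q, so none of them is a skyline result.

Move the entry from R1 to R2.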

Algorithms: new element arrives (drill down)

[Figure: elements 2(.1), 3(.4), 4(.1), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1) stored in R1: SKY_{N,q} and R2: S_{N,q} − SKY_{N,q}; new element 14(0.8) arrives. Window size N = 13, probability threshold q = 0.2.]

Before update: Pnew (min, max) = (0.9, 1); global Pnew = 1.

After update: global Pnew *= 1 − 0.8.

min Pnew < q and max Pnew ≥ q, so the entry cannot be handled as a whole: drill down and delete element 2.
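The three outcomes on these slides can be written as one decision on an entry's (min, max) bounds. This is a sketch under assumptions: the entry's global Pnew is taken to scale both the stored Pnew and Psky bounds, and the "keep in R1" branch and the final drill-down for mixed Psky are assumed extensions, not cases shown on the slides. `classify_entry` is a hypothetical name; Fractions avoid rounding errors at the q = 0.2 boundary.

```python
from fractions import Fraction
from typing import Tuple

Bounds = Tuple[Fraction, Fraction]  # (min, max) over the elements of an entry


def classify_entry(pnew: Bounds, psky: Bounds, factor: Fraction, q: Fraction) -> str:
    """Decide how to handle an R1 entry fully dominated by a new element.
    `factor` is the entry's global Pnew after the update (assumed to scale
    both the stored Pnew and Psky bounds)."""
    pnew_min, pnew_max = pnew[0] * factor, pnew[1] * factor
    psky_min, psky_max = psky[0] * factor, psky[1] * factor
    if pnew_max < q:
        return "delete"       # no element in the entry can become a result again
    if pnew_min < q:
        return "drill down"   # mixed: some elements must go, some must stay
    if psky_max < q:
        return "move to R2"   # all remain candidates, none is a result now
    if psky_min >= q:
        return "keep in R1"   # all remain results (assumed case)
    return "drill down"       # results and non-results mixed (assumed case)


q = Fraction("0.2")
after_14 = 1 - Fraction("0.8")  # factor contributed by the new element 14
print(classify_entry((Fraction(1), Fraction(1)), (Fraction("0.8"), Fraction("0.8")),
                     (1 - Fraction("0.2")) * after_14, q))  # delete
print(classify_entry((Fraction(1), Fraction(1)), (Fraction("0.24"), Fraction("0.6")),
                     after_14, q))                           # move to R2
```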

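Algorithms: new element arrives (update Pold)

[Figure: elements 2(.1), 3(.4), 4(.1), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1) stored in R1: SKY_{N,q} and R2: S_{N,q} − SKY_{N,q}; new element 14(0.8) arrives. Window size N = 13, probability threshold q = 0.2.]

Update Pold of elements 12 and 13: global Pold /= (1 − 0.1), which removes the factor contributed by the deleted element 2 (P(2) = 0.1).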

Algorithms: new element arrives (insert the new element)

[Figure: elements 3(.4), 4(.1), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1) stored in R1: SKY_{N,q} and R2: S_{N,q} − SKY_{N,q}; new element 14(0.8) arrives. Window size N = 13, probability threshold q = 0.2.]

Insert the new element 14: Pnew = 1; compute Psky.

Algorithm: old element expires

1. Delete it from R1 or R2.
2. Update Pold of the remaining elements: record a global Pold on intermediate entries fully dominated by it.
3. Check Psky after the update.
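A sketch of the expiry step on a plain dict and two Python sets rather than R-trees (so no global Pold is recorded on intermediate entries), with exact Fractions and smaller-is-better dominance assumed. The points are hypothetical; the probabilities 0.3, 0.4, 0.9 and 0.8 and q = 0.5 echo the earlier four-element example. It assumes the expiring element is the oldest one, so its factor (1 − P) is present in the Pold of every remaining candidate it dominates. `Cand` and `expire` are illustrative names.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, Sequence, Set, Tuple


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    # Assumes smaller values are preferred in every dimension.
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))


@dataclass
class Cand:
    point: Tuple[float, ...]
    p: Fraction
    p_old: Fraction
    p_new: Fraction

    @property
    def p_sky(self) -> Fraction:
        return self.p * self.p_old * self.p_new


def expire(key: str, cands: Dict[str, Cand], r1: Set[str], r2: Set[str],
           q: Fraction) -> Set[str]:
    """Expire the oldest element `key`; return the names promoted from R2 to R1."""
    old = cands.pop(key)
    r1.discard(key)                    # 1. delete it from R1 or R2
    r2.discard(key)
    for c in cands.values():           # 2. divide its factor out of Pold
        if dominates(old.point, c.point):
            c.p_old /= 1 - old.p
    promoted = {k for k in r2 if cands[k].p_sky >= q}  # 3. check Psky on R2
    r2 -= promoted                     # in-place set updates
    r1 |= promoted
    return promoted


F = Fraction
cands = {
    "e1": Cand((1.0, 1.0), F("0.3"), F(1), F(1)),
    "e2": Cand((1.5, 0.5), F("0.4"), F(1), F(1)),
    "e3": Cand((2.0, 2.0), F("0.9"), F("0.7") * F("0.6"), F(1)),
    "e4": Cand((3.0, 0.1), F("0.8"), F(1), F(1)),
}
q = F("0.5")
r1 = {k for k, c in cands.items() if c.p_sky >= q}  # {"e4"}
r2 = set(cands) - r1                                 # {"e1", "e2", "e3"}
print(expire("e1", cands, r1, r2, q))                # {'e3'}
```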

Algorithms: old element expires

[Figure: elements 3(.4), 4(.1), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1), 14(0.8) stored in R1: SKY_{N,q} and R2: S_{N,q} − SKY_{N,q}. Window size N = 13, probability threshold q = 0.2.]

Pold(7) /= 1 − P(3)

global Pold /= 1 − P(4)

Algorithms: handling multiple thresholds

Continuous queries: users specify k probability thresholds q1, …, qk (qi < qi−1).
Solution: instead of maintaining a single R1, we maintain R1, …, Rk, each corresponding to one confidence value.

Ad-hoc queries: users issue a query to retrieve skyline points with probability at least q′ (q′ ≥ qk).
Solution: find an Ri with qi ≤ q′ < qi−1. All elements in {Rj : j < i − 1} are then results, and we search Ri−1 to output the remaining qualified skyline points.
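A sketch of the ad-hoc query path, with plain lists standing in for R1, …, Rk and Psky values assumed to be already maintained. Indices here are 0-based and bucket i covers [q_i, q_{i−1}); this bookkeeping is an assumption and may not match the slides' numbering of Ri exactly, but it shows the idea: every bucket above the one containing q′ is returned wholesale, and only one bucket is scanned. All names and values are hypothetical.

```python
from typing import Dict, List, Sequence


def build_buckets(psky: Dict[str, float], thresholds: Sequence[float]) -> List[List[str]]:
    """thresholds must be strictly decreasing.  Bucket i holds the elements with
    thresholds[i] <= Psky < thresholds[i - 1] (bucket 0 has no upper bound);
    elements below the smallest threshold are not stored."""
    buckets: List[List[str]] = [[] for _ in thresholds]
    for name, p in psky.items():
        for i, t in enumerate(thresholds):
            if p >= t:
                buckets[i].append(name)
                break
    return buckets


def adhoc_query(q_prime: float, psky: Dict[str, float], buckets: List[List[str]],
                thresholds: Sequence[float]) -> List[str]:
    """Elements with Psky >= q_prime, scanning only one bucket."""
    if q_prime < thresholds[-1]:
        raise ValueError("q' must not be below the smallest maintained threshold")
    i = next(i for i, t in enumerate(thresholds) if t <= q_prime)
    result = [name for bucket in buckets[:i] for name in bucket]  # all qualify
    result += [name for name in buckets[i] if psky[name] >= q_prime]
    return sorted(result)


thresholds = [0.8, 0.5, 0.2]
psky = {"a": 0.9, "b": 0.7, "c": 0.55, "d": 0.3, "e": 0.1}
buckets = build_buckets(psky, thresholds)           # [['a'], ['b', 'c'], ['d']]
print(adhoc_query(0.6, psky, buckets, thresholds))  # ['a', 'b']
```

More thresholds mean more buckets to keep up to date on every arrival and expiry, but a narrower bucket to scan per query, which is the trade-off measured in the experiments.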

Experiment

Data sets:
Real: stock transactions, 2-d, with probabilities assigned randomly; size 2 million.
Synthetic: spatial locations (independent or anti-correlated); probabilities (uniform or normal); 2-d to 5-d; size 2 million.

Default values: p: 0.3; d: 3; N: 1M; spatial distribution: anti-correlated; probability distribution: uniform.

Experiment: space

The candidate set is about 0.1% of the sliding window size for 2-d data, and it saves around 89% of the space even for 5-d data.

Experiment: space

The size of S_{N,q} decreases as Pu increases, while the size of SKY_{N,q} increases with it.

Experiment: space

Experiment: time

Experiment: time

Maintenance time increases with the number of probability thresholds, while query time decreases.

Conclusion

We characterize a candidate set of minimum size and propose time-efficient techniques.

We extend the framework to handle multiple thresholds.

Thanks!
