Information Technology Influence Computation in Spatial Databases Muhammad Aamir Cheema Faculty of Information Technology Monash University, Australia aamir.cheema@monash.edu www.aamircheema.com


Information Technology

Influence Computation in Spatial Databases

Muhammad Aamir Cheema
Faculty of Information Technology
Monash University, Australia

aamir.cheema@monash.edu

Outline

• Introduction
• Reverse k Nearest Neighbors Queries
• Reverse Top-k Queries
• Reverse Skyline Queries
• Other work

Introduction: Influence Set

Influence
• In a data set consisting of facilities and users, a facility f influences a user u if u considers f one of its most “important” facilities

Influence Set
• The set of users influenced by f is called the influence set of f

[Figure: users u1, u2 and facilities f1, f2, illustrating the influence set of Coles]

Introduction: Influence Set

Important facility?
• A facility f is important for a user u if it is one of the top-k facilities for u considering her preferences, e.g., distance, rating, price

Who are my potential customers?

Introduction: Influence Set

Significance
• Important to identify potential users/customers
• Used in various applications such as marketing, cluster and outlier analysis, and decision support systems

Types
• Reverse k Nearest Neighbors
• Reverse Top-k
• Reverse Skyline


Outline: Reverse k Nearest Neighbors

• Introduction
• Pre-computation based approach
• On-the-fly algorithms
  – Six-regions [2000]
  – TPL [2004]
  – FINCH [2008]
  – Influence Zone [2011]
  – SLICE [2014]
  – TPL++ [2015]
• Comparison of RkNN algorithms

Reverse k Nearest Neighbors (RkNN)

• Definition of importance
  – A facility f is important to a user u if f is one of its k closest facilities
• Reverse k Nearest Neighbors
  – Find every user u for which the query facility q is one of its k closest facilities

Example (k = 1, users u1, u2, u3 and facilities f1, f2):
• Influence set of f1 is {u1, u2}
• Influence set of f2 is {u3}
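The RkNN definition above can be checked directly with a brute-force scan. The sketch below is only an illustration of the definition, not one of the algorithms surveyed in this talk; the function name `rknn_brute_force` and the coordinates are hypothetical, chosen so that the result matches the k = 1 example.

```python
from math import dist

def rknn_brute_force(q, facilities, users, k):
    """Return the names of users for which q is one of the k closest facilities."""
    result = []
    for name, u in users.items():
        d_q = dist(u, q)
        # Count the facilities that are strictly closer to u than q is.
        closer = sum(1 for f in facilities if f != q and dist(u, f) < d_q)
        if closer < k:
            result.append(name)
    return result

# Hypothetical coordinates (not taken from the slides).
f1, f2 = (0.0, 0.0), (10.0, 0.0)
users = {"u1": (1.0, 1.0), "u2": (2.0, -1.0), "u3": (9.0, 0.5)}

influence_f1 = rknn_brute_force(f1, [f1, f2], users, k=1)
influence_f2 = rknn_brute_force(f2, [f1, f2], users, k=1)
print(influence_f1, influence_f2)
```

Every user is compared against every facility, which is the cost the pruning-based algorithms try to avoid.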


Pre-computation based approach
[F. Korn et al., SIGMOD 2000]

• Pre-computation
  – For each user u
    • Draw a circle centered at u containing its k closest facilities
  – Index these circles using an R-tree
• Query processing
  – Find the circles that contain q
• Problems
  – arbitrary k?
  – data updates?

[Figure: users u1 to u4, facilities f1 to f3 and two query positions q, k = 1]
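A minimal sketch of the pre-computation idea, assuming a plain linear scan in place of the R-tree over the circles. The names `precompute_circles` and `rknn_from_circles` and all coordinates are hypothetical. Note that the circles are built for one fixed k, which is the first problem listed above.

```python
from math import dist

def precompute_circles(users, facilities, k):
    """For each user, store (center, radius); radius is the distance to its k-th closest facility."""
    circles = {}
    for name, u in users.items():
        distances = sorted(dist(u, f) for f in facilities)
        circles[name] = (u, distances[k - 1])
    return circles

def rknn_from_circles(q, circles):
    """Return the users whose circle contains q (a linear scan stands in for the R-tree)."""
    return [name for name, (center, radius) in circles.items()
            if dist(center, q) <= radius]

# Hypothetical data on a line, k = 1.
facilities = [(0.0, 0.0), (4.0, 0.0), (8.0, 0.0)]
users = {"u1": (1.0, 0.0), "u2": (3.0, 0.0), "u3": (7.0, 0.0), "u4": (12.0, 0.0)}
circles = precompute_circles(users, facilities, k=1)

near_left = rknn_from_circles((1.5, 0.0), circles)
near_right = rknn_from_circles((11.0, 0.0), circles)
print(near_left, near_right)
```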


On-the-fly RkNN Algorithms

Pruning
• Prune the search space using nearby facilities of q

Verification
• Find the users that lie in the unpruned space
• For each such user, check whether it is an RkNN of q or not

Data indexed by R-trees

On-the-fly RkNN Algorithms: Pruning and Verification

Pruning approaches
• Half-space: TPL (VLDB 2004), TPL++ (PVLDB 2015), FINCH (PVLDB 2008), InfZone (ICDE 2011)
• Region-based: Six-regions (SIGMOD Workshop 2000), SLICE (ICDE 2014)


Six-regions: Pruning
[I. Stanoi et al., SIGMOD Workshop 2000]

1. Divide the whole space centred at the query q into six equal regions, each of 60°
2. Let f be a facility in a partition P
3. Let u be a user in P for which dist(u,q) > dist(q,f)
4. Then q cannot be the closest facility of u

Proof sketch:
• ∠fqu ≤ 60° and ∠ufq > 60°
• ∠ufq > ∠fqu ⇒ |uq| > |uf|

[Figure: query q, facility f and user u in the same partition]

Six-regions: Pruning
[I. Stanoi et al., SIGMOD Workshop 2000]

1. Divide the whole space centred at the query q into six equal regions
2. Find the k-th nearest facility of q in each partition
3. The k-th nearest facility of q in each region defines the area that can be pruned

[Figure: facilities a, b, c, d, query q and users u1, u2, k = 2]
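The three steps above can be sketched as follows. This is an illustrative sketch over in-memory lists rather than R-trees; the helper names `region_of`, `pruning_radii` and `is_pruned` and the coordinates are hypothetical.

```python
from math import atan2, dist, pi

def region_of(q, p):
    """Index (0 to 5) of the 60-degree region around q that contains p."""
    angle = atan2(p[1] - q[1], p[0] - q[0]) % (2 * pi)
    return min(int(angle // (pi / 3)), 5)

def pruning_radii(q, facilities, k):
    """Distance from q to its k-th nearest facility in each region (inf if fewer than k)."""
    per_region = [[] for _ in range(6)]
    for f in facilities:
        per_region[region_of(q, f)].append(dist(q, f))
    return [sorted(d)[k - 1] if len(d) >= k else float("inf") for d in per_region]

def is_pruned(q, u, radii):
    """A user is pruned if it is farther from q than the k-th nearest facility of its region."""
    return dist(q, u) > radii[region_of(q, u)]

# Hypothetical data, k = 2.
q = (0.0, 0.0)
facilities = [(1.0, 0.2), (2.0, 0.5), (0.5, 3.0)]
radii = pruning_radii(q, facilities, k=2)

print(is_pruned(q, (5.0, 1.0), radii))   # far user in the region holding two facilities
print(is_pruned(q, (1.5, 0.3), radii))   # user closer to q than the 2nd facility of its region
print(is_pruned(q, (-3.0, 1.0), radii))  # region without facilities: nothing can be pruned
```

Users that survive this test are only candidates and still need verification.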

Six-regions: Verification
[I. Stanoi et al., SIGMOD Workshop 2000]

• Access the users R-tree and prune the entries that lie in the pruned area
• For each unpruned user u
  – Issue a boolean range query to check whether u is an RkNN or not

Disadvantage: Requires a boolean range query for each candidate user

[Figure: facilities a, b, c, d, query q and user u1, k = 2]


TPL: Pruning
[Y. Tao et al., VLDB 2004]

• Half-space pruning:
  – q cannot be the closest facility of u if u lies in the half-space of a facility f, i.e., on f's side of the perpendicular bisector between q and f
  – q cannot be among the k closest facilities of u if u lies in at least k half-spaces
• Pruning algorithm
  1. Find the nearest unseen facility f in the unpruned area
  2. Draw a bisector between q and f to prune the search space
  3. Go to step 1 unless all facilities in the unpruned area have been accessed

[Figure: facilities a, b, c, d, query q and user u, k = 2]
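The half-space test reduces to a distance comparison: u lies on f's side of the bisector between q and f exactly when u is closer to f than to q. The sketch below illustrates only this predicate, not the R-tree traversal of TPL; the function names and coordinates are hypothetical.

```python
from math import dist

def in_half_space(u, q, f):
    """True if u lies on f's side of the perpendicular bisector between q and f."""
    return dist(u, f) < dist(u, q)

def pruned_by_half_spaces(u, q, facilities, k):
    """u can be pruned if it lies in at least k half-spaces."""
    return sum(in_half_space(u, q, f) for f in facilities) >= k

# Hypothetical data.
q = (0.0, 0.0)
a, b = (4.0, 0.0), (0.0, 4.0)

print(pruned_by_half_spaces((3.0, 3.0), q, [a, b], k=2))  # in both half-spaces
print(pruned_by_half_spaces((3.0, 0.0), q, [a, b], k=2))  # only in the half-space of a
```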


TPL: Pruning
[Y. Tao et al., VLDB 2004]

Advantage:
• Prunes more space than six-regions

Disadvantage:
• Pruning is more expensive, especially when k is not small
  – Find the k half-spaces that contain the user
  – Requires using m! / (k! (m-k)!) subsets

[Figure: facilities a, b, c, d, query q and user u, k = 2, with the subsets {a,b}, {b,c}, {c,d}, {a,c}]

TPL: Pruning
[Y. Tao et al., VLDB 2004]

Solution: TPL does not use all possible subsets
1. Sort facilities by Hilbert values
2. Consider only the subsets consisting of k consecutive facilities

• Considers m subsets
• Disadvantage: some pruning power is lost

[Figure: facilities a, b, c, d, query q and user u, k = 2; for the sorted order {a,b,c,d} the subsets are {a,b}, {b,c}, {c,d}, {d,a}]

TPL: Verification
[Y. Tao et al., VLDB 2004]

• Prune the user R-tree entries using the k half-spaces approach
• Determine the candidate users
• Issue a bulk boolean range query to verify all candidate users

[Figure: facilities a, b, c, d, query q and user u, k = 2, with the subsets {a,b}, {b,c}, {c,d}, {d,a}]


FINCH: Pruning
[W. Wu et al., PVLDB 2008]

Key idea
• Approximate the unpruned area by a convex polygon

Advantage:
• Pruning is more efficient (e.g., point containment in logarithmic time)

[Figure: facilities a, b, c, query q and user u, k = 2]

FINCH: Pruning
[W. Wu et al., PVLDB 2008]

Computing the polygon
• Get the intersection points of the half-spaces and the data space boundary
• For each intersection point
  – Compute a counter that denotes the number of half-spaces that contain it
  – Remove the intersections with counter ≥ k
• Compute the convex hull of the remaining intersection points

[Figure: facilities a, b, c, query q and user u, k = 2, intersection points labelled with their counters]

FINCH: Pruning
[W. Wu et al., PVLDB 2008]

Pruning algorithm
1. Initialize the whole space as the convex polygon
2. Find the nearest facility that lies inside the convex polygon
3. Draw its half-space, compute the new intersections and their counters, and update the convex polygon
4. Go to step 2 while there is an un-accessed facility inside the polygon

[Figure: facilities a, b, c, query q and user u, k = 2]

FINCH: Verification
[W. Wu et al., PVLDB 2008]

• Prune the user R-tree entries that lie outside the convex polygon
• For each user that lies inside the polygon
  – Issue a boolean range query to check whether it is an RkNN or not

[Figure: facilities a, b, c, query q and user u, k = 2]


Influence Zone (InfZone): Motivation
[M. Cheema et al., ICDE 2011]

Influence zone is an area such that a user u is an RkNN if and only if u is inside this area

Pruning
• Earlier algorithms: prune the search space using nearby facilities of q
• InfZone: compute the influence zone using nearby facilities

Verification
• Earlier algorithms: find the users that lie in the unpruned space and, for each such user, issue a boolean range query to verify it
• InfZone: find the users that lie in the influence zone

Influence Zone (InfZone): Challenges
[M. Cheema et al., ICDE 2011]

The influence zone corresponds to the unpruned polygon when the bisectors of all the facilities have been considered for pruning.

Challenges:
• How to compute the unpruned polygon?
• Using all facilities for pruning will be very expensive

[Figure: facilities a, b, c, d and query q, k = 2]

Influence Zone (InfZone): Construction
[M. Cheema et al., ICDE 2011]

Challenge 1: Constructing the polygon
• Like FINCH, compute the counters of all intersections
• Remove the intersections with counter ≥ k
• Keep only the intersections that either lie on the boundary of the data space OR have counter equal to k-1 or k-2
• Keep only the extreme intersections on each boundary
• Sort the intersections according to their angles with q
• Connect the intersections in the sorted order

[Figure: facilities a, b, c and query q, k = 2, intersection points labelled with their counters]

Influence Zone (InfZone): Construction
[M. Cheema et al., ICDE 2011]

Challenge 2: Avoid accessing all facilities
• Let Cv denote the circle centered at a vertex v with radius dist(v,q)
• A facility f can be ignored if it lies outside Cv for every vertex v of the current influence zone
• An entry e of the facility R-tree can be ignored if it lies outside Cv for every vertex v of the current influence zone

[Figure: facilities a, b, c and query q, k = 2, polygon vertices labelled with their counters]
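The two filtering rules above can be sketched as simple distance tests. This is a minimal sketch, assuming the current influence zone is given as a list of vertices and an R-tree entry as an axis-aligned rectangle ((x1, y1), (x2, y2)); the function names and the sample polygon are hypothetical.

```python
from math import dist, hypot

def can_ignore_facility(f, q, vertices):
    """True if f lies outside C_v (center v, radius dist(v, q)) for every vertex v."""
    return all(dist(f, v) > dist(v, q) for v in vertices)

def min_dist_to_rect(v, rect):
    """Minimum distance from point v to an axis-aligned rectangle."""
    (x1, y1), (x2, y2) = rect
    dx = max(x1 - v[0], 0.0, v[0] - x2)
    dy = max(y1 - v[1], 0.0, v[1] - y2)
    return hypot(dx, dy)

def can_ignore_entry(rect, q, vertices):
    """True if the whole rectangle lies outside C_v for every vertex v."""
    return all(min_dist_to_rect(v, rect) > dist(v, q) for v in vertices)

# Hypothetical current influence zone: a square around q.
q = (0.0, 0.0)
zone = [(-2.0, -2.0), (2.0, -2.0), (2.0, 2.0), (-2.0, 2.0)]

print(can_ignore_facility((10.0, 0.0), q, zone))            # far facility
print(can_ignore_facility((3.0, 0.0), q, zone))             # close facility
print(can_ignore_entry(((8.0, -1.0), (12.0, 1.0)), q, zone))
print(can_ignore_entry(((2.5, -1.0), (4.0, 1.0)), q, zone))
```

A facility outside every Cv is farther from each vertex than q is, so its bisector cannot cut the current polygon.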

Influence Zone (InfZone): Construction
[M. Cheema et al., ICDE 2011]

Influence Zone Construction Algorithm
• Initialize InfZone as the whole data space
• Insert the root of the facility R-tree into a heap
• While the heap is not empty
  – De-heap an entry e
  – If e lies outside every Cv
    • Ignore e
  – Else
    • If e is an intermediate node
      – Insert the children of e into the heap
    • Else
      – Draw the bisector between q and e and update the current influence zone

[Figure: facilities a, b, c and query q, k = 2, polygon vertices labelled with their counters]

Influence Zone (InfZone): Verification
[M. Cheema et al., ICDE 2011]

• Prune the user R-tree entries that lie outside the influence zone
• Return the users that lie inside the influence zone

Point containment can be done in logarithmic time, O(log m)
Rectangle containment takes linear time, O(m)

[Figure: facilities a, b, c and query q, k = 2, polygon vertices labelled with their counters]


SLICE: Motivation
[S. Yang et al., ICDE 2014]

| | Regions-based (Six-regions) | Half-space (InfZone) | SLICE |
|---|---|---|---|
| Pruning cost | O(m log k) | O(km²) | O(m log m) |
| Pruning power | Low | High | High |
| Verification cost | Range query | O(log m) | O(k) |

m is the number of facilities considered for pruning

SLICE: Key Idea
[S. Yang et al., ICDE 2014]

1. Divide the whole space centred at the query q into t equal regions
2. Draw arcs for each facility
3. The k-th arc in each partition defines the pruning region

Pruning requires checking only one distance

[Figure: query q and facilities f1, f2, k = 2]

SLICE: Comparison with six-regions
[S. Yang et al., ICDE 2014]

| | Six-regions | SLICE |
|---|---|---|
| Partitions pruned | One | θmax < 90° |
| No. of partitions | 6 | any |
| Area pruned | dist(f,q) | dist(f,q) / (2 cos(θmax)) |

[Figure: query q, facility f and the angle θmax]

SLICE: Verification
[S. Yang et al., ICDE 2014]

• Significant facility:
  – The k-th arc in each partition is called the bounding arc
  – A facility f is significant for a partition P if it prunes at least one point p ∈ P lying inside the bounding arc of P
  – An insignificant facility cannot prune any candidate user

Verification for a candidate
• Regions-based: issue a range query for each candidate (high I/O and CPU cost)
• SLICE: access the significant facilities during pruning and use them to verify, O(k)

[Figure: query q, partition P with points M, N and bounding arc radius rB]


TPL++: Optimization 1
[S. Yang et al., PVLDB 2015]

TPL:
1. Sort facilities by Hilbert values
2. Consider only the subsets consisting of k consecutive facilities
• Considers m subsets, O(km)
• Some pruning power is lost

TPL++:
1. Initialize a counter to 0
2. Access facilities one by one
3. Increment the counter whenever a facility prunes the user u
4. Prune u when counter ≥ k
• Cost is O(m)

[Figure: facilities a, b, c, d, query q and user u, k = 2, with the subsets {a,b}, {b,c}, {c,d}, {d,a}]
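The difference between the two checks can be sketched on a single user. This is an illustrative sketch only: the facilities are assumed to be already sorted (the Hilbert ordering is not computed), m ≥ k is assumed, the windows are taken cyclically as the {d,a} subset in the figure suggests, and the function names and coordinates are hypothetical.

```python
from math import dist

def prunes(f, u, q):
    """True if u lies in the half-space of f, i.e. u is closer to f than to q."""
    return dist(u, f) < dist(u, q)

def tpl_prunes(u, q, ordered, k):
    """TPL-style check: try the m windows of k consecutive facilities, O(km)."""
    m = len(ordered)
    for start in range(m):
        window = [ordered[(start + i) % m] for i in range(k)]
        if all(prunes(f, u, q) for f in window):
            return True
    return False

def tplpp_prunes(u, q, facilities, k):
    """TPL++-style check: one pass with a counter, O(m)."""
    counter = 0
    for f in facilities:
        if prunes(f, u, q):
            counter += 1
            if counter >= k:
                return True
    return False

# Hypothetical layout: u lies in the half-spaces of a and c only.
q, u = (0.0, 0.0), (3.0, 3.0)
a, b, c, d = (4.0, 0.0), (0.0, -6.0), (0.0, 4.0), (-6.0, 0.0)
ordered = [a, b, c, d]

print(tpl_prunes(u, q, ordered, k=2))    # a and c are never consecutive
print(tplpp_prunes(u, q, ordered, k=2))  # the counter reaches 2
```

In this layout the window-based check misses u because the two pruning facilities are not adjacent in the order, while the counter finds them.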

Pruning power: TPL vs TPL++
[S. Yang et al., PVLDB 2015]

TPL++: Optimization 2
[S. Yang et al., PVLDB 2015]

TPL:
• A facility entry e or a facility point that lies in the pruned space is ignored

TPL++:
• A facility entry e that lies in the pruned space is ignored
• A facility point is used for pruning even if it lies in the pruned space

[Figure: facilities a, b, c, d, query q and user u]

TPL vs TPL++

[Figure: two plots comparing TPL and TPL++ for k = 2, 5, 10, 15, 20, 25: I/O cost and CPU cost (ms), annotated "2 times better" and "20 times better" respectively]


Comparison of RkNN Algorithms

| Phase | Operation | Six-regions | TPL | TPL++ | FINCH | InfZone | SLICE |
|---|---|---|---|---|---|---|---|
| Pruning | node | O(1) | O(km) | O(m) | O(m) | O(m) | O(1) |
| Pruning | point | O(1) | O(km) | O(m) | O(log m) | O(m) | O(1) |
| Pruning | Adding f | O(log k) | O(log m) | O(log m) | O(m²) | O(m²) | O(log m) |
| Verification | node | O(1) | O(km) | O(m) | O(m) | O(m) | O(1) |
| Verification | point | O(1) | O(km) | O(m) | O(log m) | O(log m) | O(1) |
| Verification | #candidates | Large | Large | Small | Medium | Minimal | Small |
| Verification | Verifying u | Range query | Bulk range query | Bulk range query | Range query | O(log m) | O(k) |

Experimental Comparison [Yang et al., PVLDB 2015]

• Setup
  – Intel Xeon 2.66 GHz CPU, 4GB memory and hard disk
  – Index: R*-tree
  – 100 buffers
  – I/O cost and CPU cost
  – Average cost per query
• Data sets
  – Three real data sets (up to 25M points)
  – CA, LA and NA
  – Synthetic data sets following different distributions (up to 20M points)

Source code and data sets are available online


Experimental Comparison [Yang et al., PVLDB 2015]

Ranking (best first; algorithms separated by commas share a rank)
• I/O (no buffer): TPL++, InfZone > SLICE > TPL > FINCH > SIX
• I/O (small buffer): TPL++, InfZone > FINCH > SLICE > TPL, SIX
• CPU (k<10): SLICE > InfZone > TPL++ > FINCH > SIX, TPL
• CPU (10<k<25): SLICE > InfZone, TPL++ > FINCH > SIX > TPL
• CPU (25<k<200): SLICE > TPL++ > SIX > FINCH > InfZone > TPL
• Implementation: SIX, SLICE > TPL, TPL++ > FINCH, InfZone

Outline

• Introduction
• Reverse k Nearest Neighbors Queries
• Reverse Top-k Queries
  – Introduction
  – Monochromatic algorithms (2d)
  – Bichromatic algorithms (≥2d)
• Reverse Skyline Queries
• Other work

Reverse Top-k (RTk) Queries
Introduced by [Vlachou et al., ICDE 2010]

• Definition of importance (Top-k queries)
  – Each user u has a preference function
  – Score of a facility is score(f) = w[1]*f[1] + … + w[d]*f[d]
  – A facility f is important to a user u if f is one of the top-k facilities for u
• Bichromatic Reverse Top-k Query (RTk)
  – Find every user u for which the query facility q is one of her top-k facilities

Example (from [Vlachou et al., ICDE 2010]):
• Score(p2) = 0.2×3 + 0.8×2 = 2.2
• Tom and Max are the reverse top-1 users of p2
• Bob is not a reverse top-1 user of p2
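The scoring function can be sketched in a few lines. The sketch assumes that smaller scores are better, as in the later slide that returns the k objects with the smallest scores. Reading the arithmetic above as weights (0.2, 0.8) applied to p2 = (3, 2) is an assumption, and the other two facilities are hypothetical values added only for illustration.

```python
def score(weights, facility):
    """Weighted sum w[1]*f[1] + ... + w[d]*f[d]."""
    return sum(w * x for w, x in zip(weights, facility))

def top_k(weights, facilities, k):
    """Names of the k facilities with the smallest scores (ties broken by name)."""
    ranked = sorted(facilities, key=lambda name: (score(weights, facilities[name]), name))
    return ranked[:k]

# p2 follows the arithmetic on the slide; p1 and p3 are hypothetical.
facilities = {"p1": (1, 6), "p2": (3, 2), "p3": (8, 1)}
weights = (0.2, 0.8)

p2_score = score(weights, facilities["p2"])
best = top_k(weights, facilities, k=1)
print(round(p2_score, 2), best)
```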

Reverse Top-k (RTk) Queries: Types
Introduced by [Vlachou et al., ICDE 2010]

Example (from [Vlachou et al., ICDE 2010]): q = p2, k = 1

• Bichromatic RTk queries
  – Find every user u for which the query facility q is one of her top-k facilities (e.g., the result is {Tom, Max})
• Monochromatic RTk queries
  – Find every weighting vector for which q is one of the top-k facilities
  – Result: line segment where w[price] ∈ [1/7, 5/6]


Monochromatic Reverse Top-k Algorithms
[Vlachou et al., ICDE 2010]

• Score(q) is the projection of q on the vector w
• Rank(q) w.r.t. w is the number of facilities below the red line
• Rank(q) < Rank(f) for every w if q dominates f
• Ignore facilities that are dominated by q
• Result is empty if k facilities dominate q

[Figure: query q, facilities f, the weighting vector w = [0.5, 0.5] and the red line through q]

Monochromatic Reverse Top-k Algorithms
[Vlachou et al., ICDE 2010]

• The relative rank of q and f depends on the rotation of the red line

[Figure: query q, facility f and weighting vectors w, w′, w″]

Monochromatic Reverse Top-k Algorithms
[Vlachou et al., ICDE 2010]

Algorithm
• Start with a vertical line
• Rank(q): count the number of facilities on the left
• Rotate the line counter-clockwise
• Update Rank(q) when the line intersects a facility
• Report the weighting vectors for which Rank(q) ≤ k

[Figure: query q and facilities a, b; Rank(q) = 2, then 1 as the line rotates]

RTk using k-lower envelope (2d)
[Cheema et al., EDBT 2014]

• Given a point a=(u,v) and a weighting vector W=(w1, w2), a.score = u*w1 + v*w2
• A point a=(u,v) is mapped to a line a*: y = ux + v in dual space
• The weighting vector W=(w1, w2) is mapped to a vertical line W*: x = w1/w2
• The intersection of a* and W* is the point where y = u(w1/w2) + v = (u*w1 + v*w2)/w2 = a.score/w2

[Figure: primal space with points a, b; dual space with lines a*, b* crossing the vertical line W*: x = w1/w2 at ya = a.score/w2 and yb = b.score/w2]
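The mapping above can be verified numerically: the height at which a* crosses W* equals a.score / w2, so the order of the lines along W* is the order of the scores. The helper names and the sample values below are hypothetical.

```python
def score(point, weights):
    u, v = point
    w1, w2 = weights
    return u * w1 + v * w2

def dual_y(point, weights):
    """y-coordinate where the dual line y = u*x + v meets the vertical line x = w1/w2."""
    u, v = point
    w1, w2 = weights
    return u * (w1 / w2) + v

# Hypothetical points and weighting vector.
a, b = (3.0, 2.0), (8.0, 1.0)
W = (0.2, 0.8)

ya, yb = dual_y(a, W), dual_y(b, W)
print(ya, score(a, W) / W[1])
print(yb, score(b, W) / W[1])
```

Because w2 is positive, dividing by it does not change the order, which is why the k lowest lines at W* are the top-k objects.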

RTk using k-lower envelope (2d)
[Cheema et al., EDBT 2014]

• Query: Given a weighting vector W=(w1,w2), return the k objects with the smallest scores
• Solution:
  – Map W and all the objects to dual space
  – Return the k lowest lines intersecting W*

[Figure: primal and dual space with objects a, b, c, d and two vertical lines, W*: x = w1/w2 and W*: x = w3/w4, with the rankings (a, b, c, d) and (d, b, a, c)]

RTk using k-lower envelope (2d)
[Cheema et al., EDBT 2014]

• Given a set of lines L, the mass of a point p is the number of lines that lie strictly below p
• The k-lower envelope consists of every point p that lies on one of the lines in L and has mass equal to k-1

[Figure: 2-lower envelope with example points p and p′]
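The two definitions above can be sketched pointwise. This is a minimal sketch, assuming lines are given as (slope, intercept) pairs and that no two lines cross at the sampled x-value; it evaluates the envelope at one x rather than tracing it, and the function names and lines are hypothetical.

```python
def mass(lines, p):
    """Number of lines (slope, intercept) that lie strictly below the point p = (x, y)."""
    x, y = p
    return sum(1 for slope, intercept in lines if slope * x + intercept < y)

def k_lower_envelope_y(lines, x, k):
    """Height of the k-lower envelope at x, i.e. the k-th lowest line at x."""
    return sorted(slope * x + intercept for slope, intercept in lines)[k - 1]

# Hypothetical set of lines.
lines = [(1.0, 0.0), (-1.0, 4.0), (0.0, 1.0)]

y_at_0 = k_lower_envelope_y(lines, 0.0, k=2)
y_at_4 = k_lower_envelope_y(lines, 4.0, k=2)
print(y_at_0, mass(lines, (0.0, y_at_0)))
print(y_at_4, mass(lines, (4.0, y_at_4)))
```

At both sampled positions the point on the 2-lower envelope has exactly one line strictly below it, matching the mass k-1 condition.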

RTk using k-lower envelope (2d)
[Cheema et al., EDBT 2014]

• Map all facilities to dual space and compute the k-lower envelope
• Map the query point to dual space
• Return the weighting vectors where the query line is below the k-lower envelope

[Figure: primal space with points a, b, c, d and query q; dual space with the corresponding lines and W*: x = w1/w2]



Computing k-lower envelope (2d) [Cheema et al., EDBT 2014]

• The left-most point of the k-lower envelope lies on the line with the k-th largest slope, i.e., the dual of the primal point with the k-th largest x-value
• A point (u, v) in primal space is mapped to the line y = ux + v

[Figure: dual lines of facilities a, b, c, d.]
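The walk described on these slides can be sketched as follows. This is a simplified illustration, assuming lines in general position (distinct slopes, no three lines meeting at one point) and representing each line as a (slope, intercept) pair. The function name and the output format are choices made for the example.

```python
import math


def k_lower_envelope(lines, k):
    """Trace the k-lower envelope from left to right.

    Returns a list of (x_from, x_to, line_index) pieces.
    """
    # Far to the left, the line with the k-th largest slope is the k-th lowest.
    by_slope = sorted(range(len(lines)), key=lambda i: lines[i][0], reverse=True)
    cur = by_slope[k - 1]
    x = -math.inf
    path = []
    while True:
        m, c = lines[cur]
        # Find the nearest intersection strictly to the right of x.
        next_x, next_line = math.inf, None
        for j, (mj, cj) in enumerate(lines):
            if j == cur or mj == m:
                continue
            xi = (cj - c) / (m - mj)  # solve m*x + c = mj*x + cj
            if x < xi < next_x:
                next_x, next_line = xi, j
        path.append((x, next_x, cur))
        if next_line is None:
            return path
        # Make a turn: leave the current line and follow the crossing one.
        x, cur = next_x, next_line


lines = [(2, 0), (0, 1), (-1, 4)]  # y = 2x, y = 1, y = -x + 4
for piece in k_lower_envelope(lines, 2):
    print(piece)
```

Turning at every intersection works under the general-position assumption because only a line adjacent in rank can cross the current line, and the two swap ranks at the crossing.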


Outline

• Introduction
• Reverse k Nearest Neighbors Queries
• Reverse Top-k Queries
– Introduction
– Monochromatic algorithms (2d)
– Bichromatic algorithms (≥2d)
• Reverse Skyline Queries
• Other work


Bichromatic Reverse Top-k (≥2d) [Vlachou et al., ICDE 2010]

• Given a set of facilities F, a set of weighting vectors W, and a query facility q, return every weighting vector for which q is one of the top-k facilities

• Brute Force Algorithm: For each vector w in W
– Compute the top-k facilities
– Return w if q is among the top-k facilities
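A minimal sketch of the brute-force algorithm, assuming a linear scoring function (the weighted sum of a facility's attributes) where smaller scores are better, and counting a tie with the k-th facility as a hit. The helper names are hypothetical.

```python
import heapq


def score(w, p):
    """Weighted sum of the attributes of p under weighting vector w."""
    return sum(wi * pi for wi, pi in zip(w, p))


def brute_force_reverse_top_k(facilities, weights, q, k):
    """Return every w in weights for which q is one of the top-k facilities."""
    candidates = list(facilities)
    if q not in candidates:
        candidates.append(q)  # q competes with all other facilities
    result = []
    for w in weights:
        # One full top-k query per weighting vector.
        top_k = heapq.nsmallest(k, candidates, key=lambda p: score(w, p))
        if score(w, q) <= score(w, top_k[-1]):
            result.append(w)
    return result


F = [(1, 5), (2, 2), (5, 1)]
W = [(0.5, 0.5), (0.9, 0.1), (0.1, 0.9)]
print(brute_force_reverse_top_k(F, W, (2, 1.5), k=1))
```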


Bichromatic Reverse Top-k (≥2d) [Vlachou et al., ICDE 2010]

Threshold based algorithm (RTA)
• Sort the weighting vectors by their pair-wise similarity (similar vectors have similar top-k results)
• Evaluate the first top-k query and calculate a threshold
• For each weighting vector
– Try to prune using the threshold
– Refine the threshold
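The pruning loop above might look roughly like the sketch below. It assumes a linear scoring function with smaller scores being better, takes the weighting vectors in the order given (the similarity-based sorting step is omitted), and treats `facilities` as the competitors of q. It is an illustration of the idea, not the authors' implementation.

```python
import heapq


def score(w, p):
    """Weighted sum of the attributes of p under weighting vector w."""
    return sum(wi * pi for wi, pi in zip(w, p))


def rta(facilities, weights, q, k):
    """Threshold-based reverse top-k; weights are assumed pre-ordered."""
    result = []
    buffer = []  # top-k facilities of the most recently evaluated query
    for w in weights:
        if len(buffer) == k:
            # Threshold: worst score among the buffered facilities under w.
            threshold = max(score(w, p) for p in buffer)
            if score(w, q) > threshold:
                continue  # k facilities already beat q, so w is pruned
        # Not pruned: evaluate the top-k query and refresh the buffer.
        buffer = heapq.nsmallest(k, facilities, key=lambda p: score(w, p))
        if len(buffer) < k or score(w, q) <= score(w, buffer[-1]):
            result.append(w)
    return result


F = [(1, 5), (2, 2), (5, 1)]
W = [(0.5, 0.5), (0.9, 0.1), (0.1, 0.9)]
print(rta(F, W, (2, 1.5), k=1))
print(rta(F, W, (3, 3), k=1))
```

With q = (3, 3), only the first weighting vector triggers a top-k query in this sketch; the other two are discarded by the threshold test alone.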


Bichromatic Reverse Top-k (≥2d) [Vlachou et al., ICDE 2010]

• Evaluate top-2 query for w1
• Set threshold based on w2
• If score(q) for w2 > threshold, discard w2
• Compute top-k for w3 and update the buffer

W = [w1, w2, w3]
Buffer: p1, p2

[Figure: facilities p1 to p10, query q, and weighting vectors w1, w2, w3.]

Example is from [Vlachou et al, ICDE 2010]


Bichromatic Reverse Top-k (≥2d) [Vlachou et al., SIGMOD 2013]

Branch-and-bound algorithm: Key idea
• Weighting vectors and facilities are indexed (e.g., by R-tree)
• Compute upper and lower bounds
• Prune using the bounds
• Process unpruned entries


Outline

• Introduction
• Reverse k Nearest Neighbors Queries
• Reverse Top-k Queries
• Reverse Skyline Queries
– Introduction
– Pre-computation based approach
– On-the-fly algorithm
• Other work


Reverse Skyline [Dellis et al., VLDB 2007]

Dominance
• A point x dominates y if x is at least as good as y on all the dimensions and x is better than y on at least one dimension

Skyline
• Return every point that is not dominated by any other point

[Figure: points x, y, z, a, c, d plotted by Distance and Price.]
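The two definitions above can be written down directly. The sketch assumes that smaller values are better on every dimension (as with distance and price) and uses a simple quadratic scan. The function names are illustrative.

```python
def dominates(x, y):
    """x is at least as good as y everywhere and strictly better somewhere."""
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def skyline(points):
    """Every point that is not dominated by any other point."""
    return [p for p in points if not any(dominates(o, p) for o in points)]


hotels = [(1, 4), (2, 2), (3, 3), (4, 1)]  # (distance, price)
print(skyline(hotels))
```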


Reverse Skyline [Dellis et al., VLDB 2007]

Dynamic Dominance
• A user u gives her ideal point
• A point x dominates y if its difference from u is not larger than y’s difference on each dimension and is smaller on at least one dimension

Dynamic Skyline
• Return every point that is not dynamically dominated by any other point
• Transform each x[i] to |u[i] - x[i]|

[Figure: points x, y, z, a, b and user u plotted by Distance from airport and Room size, with transformed points y`, a`, z`, b`.]
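A small sketch of the transformation-based view: each point is replaced by its per-dimension absolute difference from u, and an ordinary skyline is computed on the transformed points. The function names and the sample data are illustrative.

```python
def dominates(x, y):
    """x is no larger than y on every dimension and smaller on at least one."""
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def dynamic_skyline(points, u):
    """Points not dynamically dominated by any other point w.r.t. u."""
    # Transform each x[i] to |u[i] - x[i]|.
    diff = {p: tuple(abs(ui - pi) for ui, pi in zip(u, p)) for p in points}
    return [p for p in points
            if not any(dominates(diff[o], diff[p]) for o in points)]


rooms = [(4, 4), (7, 7), (5, 9), (1, 5)]
print(dynamic_skyline(rooms, u=(5, 5)))
```

Here (7, 7) drops out because (4, 4) is closer to the ideal point (5, 5) on both dimensions.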


Reverse Skyline [Dellis et al., VLDB 2007]

Definition of Importance
• A user u considers a facility f to be important if f is in the dynamic skyline of u

Reverse Skyline
• Return every user u for which the query facility is in its dynamic skyline

[Figure: user u, point x, and transformed points y`, a`, z`, b` plotted by Distance from airport and Room size.]


Precomputation based approach [Dellis et al., VLDB 2007]

Pre-computation
• For each user u
– Compute and store its dynamic skyline

Query processing
• u is not an answer if q is dominated by its pre-computed skyline
• u is an answer if q is not dominated by its pre-computed skyline

[Figure: user u, query q, and transformed points y`, a`, z`, b` plotted by Distance from airport and Room size.]
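The two phases can be sketched as follows, assuming users and facilities are tuples of equal dimension and that the stored skylines fit in a dictionary keyed by user. The names `precompute_skylines` and `reverse_skyline` are hypothetical.

```python
def dyn_dominates(x, y, u):
    """True if x dynamically dominates y with respect to user u."""
    dx = [abs(ui - xi) for ui, xi in zip(u, x)]
    dy = [abs(ui - yi) for ui, yi in zip(u, y)]
    return (all(a <= b for a, b in zip(dx, dy))
            and any(a < b for a, b in zip(dx, dy)))


def precompute_skylines(users, facilities):
    """Pre-computation: store the dynamic skyline of every user."""
    return {
        u: [f for f in facilities
            if not any(dyn_dominates(g, f, u) for g in facilities)]
        for u in users
    }


def reverse_skyline(skylines, q):
    """Query processing: keep users whose stored skyline does not dominate q."""
    return [u for u, sky in skylines.items()
            if not any(dyn_dominates(s, q, u) for s in sky)]


users = [(5, 5), (0, 0)]
facilities = [(4, 4), (9, 9)]
stored = precompute_skylines(users, facilities)
print(reverse_skyline(stored, q=(6, 5)))
```

Checking q only against the stored skyline is sufficient because any facility that dominates q is either in the skyline or is itself dominated by a skyline facility that also dominates q.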


Precomputation based approach [Dellis et al., VLDB 2007]

Reducing storage requirement
• For each user u
– Store only k of its dynamic skyline points

Query processing
– u is not an answer if q is dominated by any of the k stored points
– u is guaranteed to be an answer if q dominates any of the k stored points
– otherwise, call verification to check if u is an answer

[Figure: user u, query q in three example positions, and stored points z` and b` plotted by Distance from airport and Room size.]
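The three-way test above can be sketched as below. The choice of which k skyline points to store is not specified on the slide, so `stored` is simply passed in. The verification step here is a plain scan over all facilities, which is an assumption made for the example.

```python
def dyn_dominates(x, y, u):
    """True if x dynamically dominates y with respect to user u."""
    dx = [abs(ui - xi) for ui, xi in zip(u, x)]
    dy = [abs(ui - yi) for ui, yi in zip(u, y)]
    return (all(a <= b for a, b in zip(dx, dy))
            and any(a < b for a, b in zip(dx, dy)))


def classify(stored, q, u):
    """Decide from the k stored skyline points alone, if possible."""
    if any(dyn_dominates(s, q, u) for s in stored):
        return "not answer"
    if any(dyn_dominates(q, s, u) for s in stored):
        # Anything dominating q would also dominate the skyline point s.
        return "answer"
    return "verify"


def is_answer(u, q, stored, facilities):
    """Fall back to a full check only when the stored points are inconclusive."""
    verdict = classify(stored, q, u)
    if verdict == "verify":
        return not any(dyn_dominates(f, q, u) for f in facilities)
    return verdict == "answer"


u = (5, 5)
facilities = [(4, 4), (5, 8), (9, 5)]
stored = [(4, 4)]  # k = 1 stored skyline point
print(classify(stored, (5.5, 5.5), u), classify(stored, (5, 7), u))
```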


On-the-fly Algorithm [Dellis et al., VLDB 2007]

• Window of a user u is a rectangle centered at u with q on one of its corners

• A user u is an answer iff its window is empty

Key idea
• Divide the space around q into 2^d partitions
• Compute skyline for each partition
• Any user dominated by these skylines cannot be the answer

[Figure: points a to g, query q, and users u and u` plotted by Distance from airport and Room size.]
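The window test alone (without the partition-based pruning) can be sketched as follows. The sketch treats a facility as being in the window when it is no farther from u than q on every dimension and strictly closer on at least one, which matches dynamic dominance. The function names are hypothetical.

```python
def window_is_empty(u, q, facilities):
    """True if no facility falls in the window centered at u with q on a corner."""
    half = [abs(ui - qi) for ui, qi in zip(u, q)]  # half-extent per dimension
    for f in facilities:
        diff = [abs(ui - fi) for ui, fi in zip(u, f)]
        inside = all(d <= h for d, h in zip(diff, half))
        strictly_closer = any(d < h for d, h in zip(diff, half))
        if inside and strictly_closer:
            return False  # f dynamically dominates q for this user
    return True


def reverse_skyline_on_the_fly(users, facilities, q):
    """A user is an answer iff its window is empty."""
    return [u for u in users if window_is_empty(u, q, facilities)]


users = [(5, 5), (0, 0)]
facilities = [(4, 4), (9, 9)]
print(reverse_skyline_on_the_fly(users, facilities, q=(6, 5)))
```

This version scans every facility for every user; the key idea on the slide exists to avoid exactly that by discarding users early.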


Outline

• Introduction
• Reverse k Nearest Neighbors Queries
• Reverse Top-k Queries
• Reverse Skyline Queries
• Other work


Other work on reverse spatial queries

• Uncertain data
• Continuous Monitoring (e.g., moving objects, data stream)
• Influence Maximization
• Other spaces (e.g., road network, general metric space, non-metric space, obstructed space)
• Spatial Keyword Queries
• …


Open problems on reverse spatial queries

• Location-based reverse top-k queries
• Location-based reverse skyline queries


Location-based Reverse Top-k

• Definition of importance
– Each user u has a preference function
– A facility f is important to a user u if f is one of the top-k facilities for u
• Reverse Top-k Query (RTk)
– Find every user u for which the query facility q is one of her top-k facilities.

Example (k = 1)
• Influence set of f1 is {u2}
• Influence set of f2 is {u1, u3}

[Figure: facilities f1 and f2 with price labels (Price=1, Price=2), distance labels, and users u1, u2, u3 with preference functions 0.9*price + 0.1*distance, 0.5*price + 0.5*distance, and 1*distance.]


Location-based Reverse Skyline

• Dominance
– A facility x dominates another facility y w.r.t. a user u, if for every attribute, u prefers x over y
• Definition of importance
– A facility f is important to a user u if f is not dominated by any other facility
• Reverse Skyline
– Find every user u for which the query facility q is not dominated by any other facility.

Example
• Influence set of f1 is {u1, u2}
• Influence set of f2 is {u1, u2, u3}

[Figure: facilities f1 and f2 with price labels (Price=1, Price=2) and users u1, u2, u3.]


References

1. Flip Korn, S. Muthukrishnan: Influence Sets Based on Reverse Nearest Neighbor Queries. SIGMOD 2000:201-212

2. Ioana Stanoi, Divyakant Agrawal, Amr El Abbadi: Reverse Nearest Neighbor Queries for Dynamic Databases. ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery 2000:44-53

3. Yufei Tao, Dimitris Papadias, Xiang Lian: Reverse kNN Search in Arbitrary Dimensionality. VLDB 2004:744-755

4. Evangelos Dellis, Bernhard Seeger: Efficient Computation of Reverse Skyline Queries. VLDB 2007:291-302

5. Wei Wu, Fei Yang, Chee Yong Chan, Kian-Lee Tan: FINCH: evaluating reverse k-Nearest-Neighbor queries on location data. PVLDB 1(1):1056-1067 (2008)

6. Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, Kjetil Nørvåg: Reverse top-k queries. ICDE 2010:365-376

7. Muhammad Aamir Cheema, Xuemin Lin, Wenjie Zhang, Ying Zhang: Influence zone: Efficiently processing reverse k nearest neighbors queries. ICDE 2011:577-588

8. Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, Kjetil Nørvåg: Monochromatic and Bichromatic Reverse Top-k Queries. IEEE Trans. Knowl. Data Eng. (TKDE) 23(8):1215-1229 (2011)

9. Muhammad Aamir Cheema, Wenjie Zhang, Xuemin Lin, Ying Zhang: Efficiently processing snapshot and continuous reverse k nearest neighbors queries. VLDB J. (VLDB) 21(5):703-728 (2012)

10. Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørvåg, Yannis Kotidis: Branch-and-bound algorithm for reverse top-k queries. SIGMOD 2013:481-492

11. Shiyu Yang, Muhammad Aamir Cheema, Xuemin Lin, Ying Zhang: SLICE: Reviving regions-based pruning for reverse k nearest neighbors queries. ICDE 2014:760-771

12. Muhammad Aamir Cheema, Zhitao Shen, Xuemin Lin, Wenjie Zhang: A Unified Framework for Efficiently Processing Ranking Related Queries. EDBT 2014:427-438

13. Shiyu Yang, Muhammad Aamir Cheema, Xuemin Lin, Wei Wang: Reverse k Nearest Neighbors Query Processing: Experiments and Analysis. PVLDB 8(5):605-616 (2015)


Thanks