Influence Computation in Spatial Databases
Muhammad Aamir Cheema
Faculty of Information Technology
Monash University, Australia
Outline
• Introduction
• Reverse k Nearest Neighbors Queries
• Reverse Top-k Queries
• Reverse Skyline Queries
• Other work
Introduction: Influence Set
• In a data set consisting of facilities and users, a facility f influences a user u if u considers f as one of its most “important” facilities
• The set of users influenced by f is called the influence set of f
[Figure: the influence set of a facility (Coles, f1), with users u1, u2 and another facility f2]
Introduction: Influence Set
• A facility f is important to a user u if it is one of u's top-k facilities according to her preferences, e.g., distance, rating and price
[Figure: a user asks "Important facility?"; a facility asks "Who are my potential customers?"]
Introduction: Influence Set
Significance
• Important for identifying potential users/customers
• Used in various applications such as marketing, cluster and outlier analysis, and decision support systems
Types
• Reverse k Nearest Neighbors
• Reverse Top-k
• Reverse Skyline
Outline: Reverse k Nearest Neighbors
• Introduction
• Pre-computation based approach
• On-the-fly algorithms
  – Six-regions [2000]
  – TPL [2004]
  – FINCH [2008]
  – Influence Zone [2011]
  – SLICE [2014]
  – TPL++ [2015]
• Comparison of RkNN algorithms
Reverse k Nearest Neighbors (RkNN)
• Definition of importance
  – A facility f is important to a user u if f is one of u's k closest facilities
• Reverse k Nearest Neighbors
  – Find every user u for which the query facility q is one of its k closest facilities
Example (k = 1): the influence set of f1 is {u1, u2} and the influence set of f2 is {u3}
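The definition can be made concrete with a brute-force Python sketch. It is not one of the algorithms below: it simply computes every user's k closest facilities. It assumes 2-d points stored as tuples, and the function names and coordinates are hypothetical.

```python
import math

def knn_facilities(user, facilities, k):
    """The k facilities closest to `user` (Euclidean distance; ties keep input order)."""
    return sorted(facilities, key=lambda f: math.dist(user, f))[:k]

def rknn_brute_force(q, facilities, users, k):
    """Every user for which q is one of its k closest facilities.

    `facilities` is assumed to contain q itself.
    """
    return [u for u in users if q in knn_facilities(u, facilities, k)]

# Hypothetical coordinates in the spirit of the k = 1 example.
f1, f2 = (0, 0), (10, 0)
users = [(1, 1), (2, -1), (9, 0)]
print(rknn_brute_force(f1, [f1, f2], users, 1))  # [(1, 1), (2, -1)]
print(rknn_brute_force(f2, [f1, f2], users, 1))  # [(9, 0)]
```

Every user is checked against every facility. The pruning-based algorithms that follow are designed to avoid this cost.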
Pre-computation based approach [F. Korn et al., SIGMOD 2000]
• Pre-computation
  – For each user u, draw a circle centered at u that contains its k closest facilities
  – Index these circles using an R-tree
• Query processing
  – Find the circles that contain q
• Problems
  – How to support an arbitrary k?
  – How to handle data updates?
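The pre-computation approach can be sketched as below. Three assumptions apply: q is a new facility that is not in the data set, a linear scan stands in for the R-tree over the circles, and a q lying exactly on a circle is treated as a non-answer. The function names are hypothetical.

```python
import math

def precompute_radii(users, facilities, k):
    """Radius of the circle centred at each user that encloses its k closest facilities."""
    return {u: sorted(math.dist(u, f) for f in facilities)[k - 1] for u in users}

def rknn_from_circles(q, radii):
    """Users whose circle strictly contains the new facility q (linear scan instead of an R-tree)."""
    return [u for u, r in radii.items() if math.dist(u, q) < r]

facilities = [(0, 0), (10, 0)]
radii = precompute_radii([(1, 1), (9, 0)], facilities, k=1)
print(rknn_from_circles((1, 0), radii))  # [(1, 1)]
```

The radii depend on both k and the current facilities. A different k or an inserted facility therefore invalidates them, which are the two problems listed for this approach.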
On-the-fly RkNN Algorithms
Data indexed by R-trees
• Pruning
  – Prune the search space using nearby facilities of q
• Verification
  – Find the users that lie in the unpruned space
  – For each such user, check whether it is an RkNN of q
On-the-fly RkNN Algorithms
Classification by pruning strategy:
• Half-space based pruning
  – TPL (VLDB 2004), TPL++ (PVLDB 2015)
  – FINCH (PVLDB 2008), InfZone (ICDE 2011)
• Region-based pruning
  – Six-regions (SIGMOD Workshop 2000)
  – SLICE (ICDE 2014)
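Six-regions: Pruning [I. Stanoi et al., SIGMOD Workshop 2000]
1. Divide the whole space centred at the query q into six equal regions, each of 60°
2. Let f be a facility in a partition P
3. Let u be a user in P for which dist(u,q) > dist(q,f)
4. Then q cannot be the closest facility of u
Proof sketch (triangle fqu):
• ∠fqu ≤ 60° because f and u lie in the same 60° region. Since dist(u,q) > dist(q,f), the angle opposite uq is larger than the angle opposite qf, i.e., ∠ufq > ∠quf. Because ∠ufq + ∠quf ≥ 120°, it follows that ∠ufq > 60°
• Hence ∠ufq > ∠fqu, so the side opposite ∠ufq is longer than the side opposite ∠fqu: dist(u,q) > dist(u,f)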
Six-regions: Pruning [I. Stanoi et al., SIGMOD Workshop 2000] (k = 2)
1. Divide the whole space centred at the query q into six equal regions
2. Find the k-th nearest facility of q in each region
3. The k-th nearest facility of q in each region defines the area that can be pruned. A user u in that region that is farther from q than this facility has at least k facilities closer to it than q
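The six-regions pruning rule can be sketched as follows. Regions are indexed by the angle of a point around q, and a user is pruned when it is farther from q than the k-th nearest facility in its region. The function names and the example coordinates are hypothetical, and t = 6 is used by default.

```python
import math

def region_of(p, q, t=6):
    """Index of the region (one of t equal angular regions centred at q) that contains p."""
    angle = math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)
    return int(angle // (2 * math.pi / t)) % t  # final % guards against rounding at 2*pi

def kth_nn_radii(q, facilities, k, t=6):
    """Distance from q to its k-th nearest facility in each region (inf if fewer than k)."""
    per_region = [[] for _ in range(t)]
    for f in facilities:
        if f != q:
            per_region[region_of(f, q, t)].append(math.dist(q, f))
    return [sorted(d)[k - 1] if len(d) >= k else math.inf for d in per_region]

def six_regions_pruned(u, q, radii):
    """u cannot be an RkNN of q if it is farther from q than the k-th nearest facility in its region."""
    return math.dist(u, q) > radii[region_of(u, q, len(radii))]

q = (0, 0)
radii = kth_nn_radii(q, [(1, 0.1), (2, 0.1)], k=1)
print(six_regions_pruned((3, 0.2), q, radii))  # True
print(six_regions_pruned((0.5, 0), q, radii))  # False
```

A region that contains fewer than k facilities prunes nothing, so its radius is infinite.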
Six-regions: Verification [I. Stanoi et al., SIGMOD Workshop 2000] (k = 2)
• Access the users' R-tree and prune the entries that lie in the pruned area
• For each unpruned user u
  – Issue a boolean range query to check whether u is an RkNN
Disadvantage: requires a boolean range query for each candidate user
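The boolean range query only has to decide whether at least k facilities are closer to u than q is, so it can stop at the k-th hit. The sketch below uses a linear scan where a real implementation would traverse the facility R-tree. Both function names are hypothetical.

```python
import math

def boolean_range_query(center, radius, facilities, k):
    """True if at least k facilities lie strictly inside the circle; stops at the k-th hit."""
    count = 0
    for f in facilities:
        if math.dist(center, f) < radius:
            count += 1
            if count >= k:
                return True
    return False

def is_rknn(u, q, facilities, k):
    """u is an RkNN of q iff fewer than k other facilities are closer to u than q is."""
    others = [f for f in facilities if f != q]
    return not boolean_range_query(u, math.dist(u, q), others, k)

print(is_rknn((3, 0), (0, 0), [(0, 0), (4, 0)], 1))  # False: (4, 0) is closer to u
print(is_rknn((3, 0), (0, 0), [(0, 0), (4, 0)], 2))  # True
```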
TPL: Pruning [Y. Tao et al., VLDB 2004] (k = 2)
• Half-space pruning
  – q cannot be the closest facility of u if u lies in the half-space of a facility f, i.e., on f's side of the bisector between q and f
  – q cannot be among the k closest facilities of u if u lies in k such half-spaces
• Pruning algorithm
  1. Find the nearest unseen facility f in the unpruned area
  2. Draw the bisector between q and f to prune the search space
  3. Go to step 1 unless all facilities in the unpruned area have been accessed
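The half-space test reduces to a linear inequality. Expanding dist(u,f)² < dist(u,q)² gives 2 (f − q)·u > |f|² − |q|², which is the side of the bisector on which u lies. The sketch below uses this form for 2-d tuples and applies the k-half-space rule. The function names are hypothetical.

```python
def in_half_space(u, q, f):
    """True if u lies strictly on f's side of the perpendicular bisector of q and f.

    dist(u, f) < dist(u, q) expands to the linear test 2 (f - q) . u > |f|^2 - |q|^2.
    """
    lhs = 2 * ((f[0] - q[0]) * u[0] + (f[1] - q[1]) * u[1])
    rhs = (f[0] ** 2 + f[1] ** 2) - (q[0] ** 2 + q[1] ** 2)
    return lhs > rhs

def pruned_by_half_spaces(u, q, facilities, k):
    """q cannot be among u's k closest facilities if u lies in at least k half-spaces."""
    return sum(in_half_space(u, q, f) for f in facilities if f != q) >= k

print(in_half_space((1.5, 0), (0, 0), (2, 0)))                     # True: the bisector is x = 1
print(pruned_by_half_spaces((3, 0), (0, 0), [(2, 0), (2, 1)], 2))  # True
```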
TPL: Pruning [Y. Tao et al., VLDB 2004]
Advantage: prunes more space than six-regions
Disadvantage: pruning is more expensive, especially when k is not small
TPL: Pruning [Y. Tao et al., VLDB 2004] (k = 2)
• Finding the k half-spaces that contain a user requires examining subsets of k facilities, e.g., {a,b}, {b,c}, {c,d}, {a,c}, …
• With m facilities there are m!/(k!(m−k)!) such subsets
TPL: Pruning [Y. Tao et al., VLDB 2004] (k = 2)
Solution: TPL does not use all possible subsets
1. Sort the facilities by their Hilbert values (e.g., a, b, c, d)
2. Consider only the subsets consisting of k consecutive facilities: {a,b}, {b,c}, {c,d}, {d,a}
• Considers only m subsets
• Drawback: some pruning power is lost
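The difference between all k-subsets and TPL's consecutive subsets can be shown directly. The sketch assumes the facilities have already been sorted by Hilbert value and are represented by labels. Windows wrap around cyclically, as in {d,a}. The function names are hypothetical.

```python
from itertools import combinations

def all_k_subsets(order, k):
    """Every subset of k facilities: C(m, k) of them."""
    return list(combinations(order, k))

def consecutive_k_subsets(order, k):
    """TPL-style subsets: windows of k cyclically consecutive facilities, m of them."""
    m = len(order)
    return [tuple(order[(i + j) % m] for j in range(k)) for i in range(m)]

order = ["a", "b", "c", "d"]  # assumed already sorted by Hilbert value
print(len(all_k_subsets(order, 2)))     # 6
print(consecutive_k_subsets(order, 2))  # [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'a')]
```

The subset ('a', 'c') appears only in the full enumeration. A user that lies only in the half-spaces of a and c is therefore pruned by the full enumeration but not by TPL, which is the pruning power that TPL gives up.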
TPL: Verification [Y. Tao et al., VLDB 2004] (k = 2)
• Prune the user R-tree entries using the k half-spaces approach
• Determine the candidate users
• Issue a bulk boolean range query to verify all candidate users
FINCH: Pruning [W. Wu et al., PVLDB 2008] (k = 2)
Key idea: approximate the unpruned area by a convex polygon
Advantage: pruning is more efficient (e.g., point containment takes logarithmic time)
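One standard way to test whether a point lies in a convex polygon in O(log m) time is a binary search over the triangle fan from one vertex. This is not necessarily the technique used in the FINCH paper. The sketch assumes counter-clockwise vertices with no three collinear, and counts boundary points as inside. The function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive if o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_convex_polygon(p, poly):
    """O(log m) point containment; `poly` is convex and counter-clockwise."""
    n = len(poly)
    # p must lie inside the wedge spanned by poly[0]->poly[1] and poly[0]->poly[n-1]
    if cross(poly[0], poly[1], p) < 0 or cross(poly[0], poly[n - 1], p) > 0:
        return False
    # Binary search for the fan triangle (poly[0], poly[lo], poly[lo + 1]) containing the ray to p
    lo, hi = 1, n - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cross(poly[0], poly[mid], p) >= 0:
            lo = mid
        else:
            hi = mid
    # p is inside iff it is not to the right of the edge poly[lo] -> poly[lo + 1]
    return cross(poly[lo], poly[lo + 1], p) >= 0

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(in_convex_polygon((1, 1), square))  # True
print(in_convex_polygon((3, 1), square))  # False
```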
FINCH: Pruning [W. Wu et al., PVLDB 2008] (k = 2)
Computing the polygon
• Get the intersection points of the half-space boundaries with each other and with the boundary of the data space
• For each intersection point
  – Compute a counter that denotes the number of half-spaces that contain it
  – Remove the intersections with counter ≥ k
• Compute the convex hull of the remaining intersection points
[Figure: intersection points labelled with their counters]
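The polygon computation can be sketched as follows. The sketch assumes the intersection points, including the corners of the data space, have already been generated. A point's counter uses the half-space test dist(p,f) < dist(p,q). Andrew's monotone-chain algorithm is a standard stand-in for whichever hull routine FINCH actually uses. The function names are hypothetical.

```python
def in_half_space(p, q, f):
    """True if p is strictly closer to facility f than to q (f's side of their bisector)."""
    return (p[0] - f[0]) ** 2 + (p[1] - f[1]) ** 2 < (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def finch_polygon(intersections, q, facilities, k):
    """Drop intersections contained in k or more half-spaces, then take the hull of the rest."""
    kept = [p for p in intersections
            if sum(in_half_space(p, q, f) for f in facilities) < k]
    return convex_hull(kept)

# Data space [-4, 4] x [-4, 4], one facility (2, 0): its bisector x = 1 meets the boundary at y = -4 and y = 4.
points = [(-4, -4), (4, -4), (4, 4), (-4, 4), (1, -4), (1, 4)]
print(finch_polygon(points, (0, 0), [(2, 0)], 1))  # [(-4, -4), (1, -4), (1, 4), (-4, 4)]
```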
FINCH: Pruning [W. Wu et al., PVLDB 2008] (k = 2)
Pruning algorithm
1. Initialize the whole space as the convex polygon
2. Find the nearest facility that lies inside the convex polygon
3. Draw its half-space, compute the new intersections and their counters, and update the convex polygon
4. Go to step 2 as long as there is an unaccessed facility inside the polygon
FINCH: Verification [W. Wu et al., PVLDB 2008] (k = 2)
• Prune the user R-tree entries that lie outside the convex polygon
• For each user that lies inside the polygon
  – Issue a boolean range query to check whether it is an RkNN
Influence Zone (InfZone): Motivation [M. Cheema et al., ICDE 2011]
Existing algorithms
• Pruning: prune the search space using nearby facilities of q
• Verification: find the users that lie in the unpruned space, and issue a boolean range query to verify each of them
Influence zone: an area such that a user u is an RkNN of q if and only if u lies inside this area
• Compute the influence zone using nearby facilities
• Find the users that lie in the influence zone
Influence Zone (InfZone): Challenges [M. Cheema et al., ICDE 2011] (k = 2)
The influence zone corresponds to the unpruned polygon obtained when the bisectors of all the facilities have been considered for pruning.
Challenges:
• How to compute the unpruned polygon?
• Using all facilities for pruning would be very expensive
Influence Zone (InfZone): Construction [M. Cheema et al., ICDE 2011] (k = 2)
Challenge 1: constructing the polygon
• Like FINCH, compute the counters of all intersections
• Remove the intersections with counter ≥ k
• Keep only the intersections that either lie on the boundary of the data space or have counter equal to k-1 or k-2
• Keep only the extreme intersections on each boundary
• Sort the intersections according to their angles with q
• Connect the intersections in the sorted order
Influence Zone (InfZone): Construction [M. Cheema et al., ICDE 2011] (k = 2)
Challenge 2: avoid accessing all facilities
• Let Cv denote the circle centered at a vertex v with radius dist(v,q)
• A facility f can be ignored if it lies outside Cv for every vertex v of the current influence zone
• An entry e of the facility R-tree can be ignored if it lies outside Cv for every vertex v of the current influence zone
Influence Zone (InfZone): Construction [M. Cheema et al., ICDE 2011] (k = 2)
Influence zone construction algorithm
• Initialize InfZone as the whole data space
• Insert the root of the facility R-tree into a heap
• While the heap is not empty
  – Remove the top entry e from the heap
  – If e lies outside every Cv
    • Ignore e
  – Else
    • If e is an intermediate node: insert the children of e into the heap
    • Else: draw the bisector between q and e and update the current influence zone
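The construction loop is sketched below only for the special case k = 1. There the influence zone is simply the Voronoi cell of q, and updating it means clipping a convex polygon by one bisector. General k needs the counter-based polygon described above, which this sketch does not implement. The facility R-tree is modelled as nested dicts with hypothetical keys "rect", "children" and "point". Entries are visited in order of their minimum distance to q, which is an assumed ordering. The Cv test follows the slides: an entry is skipped if it misses the circle around every current vertex.

```python
import heapq
import math

def mindist(p, rect):
    """Minimum distance from point p to rectangle ((xmin, ymin), (xmax, ymax))."""
    (xmin, ymin), (xmax, ymax) = rect
    return math.hypot(max(xmin - p[0], 0, p[0] - xmax), max(ymin - p[1], 0, p[1] - ymax))

def rect_of(entry):
    """Bounding rectangle of a tree entry; a facility is a degenerate rectangle."""
    return entry["rect"] if "children" in entry else (entry["point"], entry["point"])

def clip(zone, q, f):
    """Keep the part of the convex polygon `zone` that is not closer to f than to q."""
    def side(p):  # dist(p, f)^2 - dist(p, q)^2: linear in p, >= 0 on q's side
        return ((p[0] - f[0]) ** 2 + (p[1] - f[1]) ** 2) - ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)
    out = []
    for i, a in enumerate(zone):
        b = zone[(i + 1) % len(zone)]
        sa, sb = side(a), side(b)
        if sa >= 0:
            out.append(a)
        if sa * sb < 0:  # the edge a-b crosses the bisector of q and f
            t = sa / (sa - sb)
            out.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return out

def influence_zone_k1(q, root, space):
    """Best-first traversal; each unfiltered facility's bisector clips the zone (k = 1 only)."""
    zone = list(space)
    heap, tie = [(0.0, 0, root)], 1
    while heap:
        _, _, e = heapq.heappop(heap)
        # Ignore e if it lies outside C_v (centre v, radius dist(v, q)) for every vertex v.
        if all(mindist(v, rect_of(e)) > math.dist(v, q) for v in zone):
            continue
        if "children" in e:
            for c in e["children"]:
                heapq.heappush(heap, (mindist(q, rect_of(c)), tie, c))
                tie += 1
        else:
            zone = clip(zone, q, e["point"])
    return zone

space = [(-4, -4), (4, -4), (4, 4), (-4, 4)]  # counter-clockwise data space
leaves = [{"point": (2, 0)}, {"point": (-2, 0)}, {"point": (20, 20)}]
root = {"rect": ((-2, 0), (20, 20)), "children": leaves}
print(influence_zone_k1((0, 0), root, space))
# [(-1.0, -4.0), (1.0, -4.0), (1.0, 4.0), (-1.0, 4.0)]
```

In this example the far facility (20, 20) lies outside every Cv of the final strip, so it is skipped without being used for clipping.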
Influence Zone (InfZone): Verification [M. Cheema et al., ICDE 2011] (k = 2)
• Prune the user R-tree entries that lie outside the influence zone
• Return the users that lie inside the influence zone
Point containment can be done in logarithmic time, O(log m)
Rectangle containment takes linear time, O(m)
SLICE: Motivation [S. Yang et al., ICDE 2014]
Regions-based (Six-regions) vs half-space (InfZone) vs SLICE, where m is the number of facilities considered for pruning:
• Pruning cost: Six-regions O(m log k); InfZone O(km²); SLICE O(m log m)
• Pruning power: Six-regions low; InfZone high; SLICE high
• Verification cost: Six-regions one range query per candidate; InfZone O(log m); SLICE O(k)
SLICE: Key Idea [S. Yang et al., ICDE 2014] (k = 2)
1. Divide the whole space centred at the query q into t equal regions
2. Draw arcs for each facility (e.g., f1 and f2)
3. The k-th arc in each partition defines the pruning region
Pruning a user requires checking only one distance
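SLICE: Comparison with Six-regions [S. Yang et al., ICDE 2014]
• Number of partitions: 6 for Six-regions; any number for SLICE
• Partitions pruned by a facility f: only the one containing f for Six-regions; every partition with θmax < 90° for SLICE, where θmax is the largest angle between qf and any direction in the partition
• Radius of the pruned area: dist(f,q) for Six-regions; dist(f,q)/(2 cos θmax) for SLICE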
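SLICE's arc radius follows from simple geometry. In a direction at angle θ from qf, the bisector of q and f lies at distance dist(f,q)/(2 cos θ) from q, and this distance grows with θ. Every point of the partition beyond dist(f,q)/(2 cos θmax) is therefore closer to f than to q.

The sketch below computes these radii and the resulting one-distance pruning test. Partitions are described by their angular range [lo, hi] in radians. The function names and the default t = 12 are hypothetical.

```python
import math

def arc_radius(f, q, lo, hi):
    """Arc radius of facility f for the partition of directions [lo, hi] around q.

    Returns dist(f, q) / (2 cos(theta_max)), or inf when theta_max >= 90 degrees.
    """
    phi = math.atan2(f[1] - q[1], f[0] - q[0])

    def gap(a):  # angle between direction a and the direction q -> f, in [0, pi]
        d = (a - phi) % (2 * math.pi)
        return min(d, 2 * math.pi - d)

    if (phi + math.pi - lo) % (2 * math.pi) <= hi - lo:
        return math.inf  # the partition contains the direction opposite to f
    theta_max = max(gap(lo), gap(hi))  # otherwise the largest gap is at an endpoint
    if theta_max >= math.pi / 2:
        return math.inf
    return math.dist(f, q) / (2 * math.cos(theta_max))

def slice_bounds(q, facilities, k, t=12):
    """k-th smallest arc radius (the bounding arc) in each of t equal partitions."""
    w = 2 * math.pi / t
    bounds = []
    for i in range(t):
        radii = sorted(arc_radius(f, q, i * w, (i + 1) * w) for f in facilities)
        bounds.append(radii[k - 1] if len(radii) >= k else math.inf)
    return bounds

def slice_pruned(u, q, bounds):
    """A user beyond the bounding arc of its partition cannot be an RkNN: one distance check."""
    t = len(bounds)
    angle = math.atan2(u[1] - q[1], u[0] - q[0]) % (2 * math.pi)
    return math.dist(u, q) > bounds[int(angle // (2 * math.pi / t)) % t]

bounds = slice_bounds((0, 0), [(2, 0)], k=1)
print(slice_pruned((1.5, 0.1), (0, 0), bounds))  # True
print(slice_pruned((1.1, 0.1), (0, 0), bounds))  # False
```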
SLICE: Verification [S. Yang et al., ICDE 2014]
• Significant facilities
  – The k-th arc in each partition P is called the bounding arc of P (radius r_B)
  – A facility f is significant if it prunes at least one point p ∈ P lying inside the bounding arc of P
  – An insignificant facility cannot prune any candidate user
• Verifying a candidate
  – Regions-based (Six-regions): issue a range query for each candidate, which has high I/O and CPU cost
  – SLICE: access the significant facilities during pruning and use them to verify a candidate in O(k)
TPL++: Optimization 1 [S. Yang et al., PVLDB 2015] (k = 2)
TPL:
1. Sort the facilities by their Hilbert values
2. Consider only the subsets consisting of k consecutive facilities, e.g., {a,b}, {b,c}, {c,d}, {d,a}
• Considers m subsets, i.e., O(km) per user
• Some pruning power is lost
TPL++:
1. Initialize a counter to 0
2. Access the facilities one by one
3. Increment the counter whenever a facility prunes the user u
4. Prune u when counter ≥ k
• O(m) per user, and no pruning power is lost
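The two pruning tests can be compared on a single user. The sketch assumes the facilities are already in Hilbert order. TPL checks only the windows of k cyclically consecutive facilities. TPL++ counts the half-spaces containing u and stops as soon as the count reaches k. The function names and the coordinates are hypothetical.

```python
def in_half_space(u, q, f):
    """True if u is strictly closer to f than to q."""
    return (u[0] - f[0]) ** 2 + (u[1] - f[1]) ** 2 < (u[0] - q[0]) ** 2 + (u[1] - q[1]) ** 2

def tpl_pruned(u, q, ordered, k):
    """TPL: prune u only if some window of k cyclically consecutive facilities all contain u."""
    m = len(ordered)
    return any(all(in_half_space(u, q, ordered[(i + j) % m]) for j in range(k))
               for i in range(m))

def tplpp_pruned(u, q, facilities, k):
    """TPL++: count the half-spaces containing u, stopping once the count reaches k."""
    count = 0
    for f in facilities:
        if in_half_space(u, q, f):
            count += 1
            if count >= k:
                return True
    return False

q = (0, 0)
ordered = [(4, 0), (-4, 0), (5, 1), (0, -4)]  # a, b, c, d in an assumed Hilbert order
u = (5, 0)
print(tpl_pruned(u, q, ordered, 2))    # False: u lies only in the half-spaces of a and c
print(tplpp_pruned(u, q, ordered, 2))  # True
```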
Pruning power: TPL vs TPL++ [S. Yang et al., PVLDB 2015]
TPL++: Optimization 2 [S. Yang et al., PVLDB 2015]
TPL:
• A facility entry e or a facility point that lies in the pruned space is ignored
TPL++:
• A facility entry e that lies in the pruned space is ignored
• A facility point is used for pruning even if it lies in the pruned space
TPL vs TPL++
[Figure: I/O cost and CPU cost (ms) of TPL and TPL++ for k from 2 to 25; TPL++ is about 2 times better in I/O cost and about 20 times better in CPU cost]
Comparison of RkNN Algorithms (m: number of facilities considered for pruning)
                Six-regions       TPL               TPL++             FINCH             InfZone           SLICE
Pruning
  node          O(1)              O(km)             O(m)              O(m)              O(m)              O(1)
  point         O(1)              O(km)             O(m)              O(log m)          O(m)              O(1)
  adding f      O(log k)          O(log m)          O(log m)          O(m²)             O(m²)             O(log m)
Verification
  node          O(1)              O(km)             O(m)              O(m)              O(m)              O(1)
  point         O(1)              O(km)             O(m)              O(log m)          O(log m)          O(1)
  #candidates   Large             Large             Small             Medium            Minimal           Small
  verifying u   Range query       Bulk range query  Bulk range query  Range query       O(log m)          O(k)
Experimental Comparison [Yang et al., PVLDB 2015]
• Setup
  – Intel Xeon 2.66 GHz CPU, 4 GB memory and hard disk
  – Index: R*-tree
  – 100 buffers
  – Metrics: I/O cost and CPU cost (average cost per query)
• Data sets
  – Three real data sets (CA, LA and NA; up to 25M points)
  – Synthetic data sets following different distributions (up to 20M points)
Source code and data sets are available online.
Experimental Comparison [Yang et al., PVLDB 2015]
Ranking (algorithms listed together share a rank)
Criterion          1st             2nd             3rd      4th      5th      6th
I/O (no buffer)    TPL++, InfZone  SLICE           TPL      FINCH    SIX
I/O (small buffer) TPL++, InfZone  FINCH           SLICE    TPL, SIX
CPU (k<10)         SLICE           InfZone         TPL++    FINCH    SIX, TPL
CPU (10<k<25)      SLICE           InfZone, TPL++  FINCH    SIX      TPL
CPU (25<k<200)     SLICE           TPL++           SIX      FINCH    InfZone  TPL
Implementation     SIX, SLICE      TPL, TPL++      FINCH, InfZone
Outline
• Introduction
• Reverse k Nearest Neighbors Queries
• Reverse Top-k Queries
  – Introduction
  – Monochromatic algorithms (2d)
  – Bichromatic algorithms (≥2d)
• Reverse Skyline Queries
• Other work
Reverse Top-k (RTk) Queries, introduced by [Vlachou et al., ICDE 2010]
• Definition of importance (top-k queries)
  – Each user u has a preference function
  – The score of a facility is score(f) = w[1]*f[1] + … + w[d]*f[d]
  – A facility f is important to a user u if f is one of the top-k facilities for u
• Bichromatic Reverse Top-k Query (RTk)
  – Find every user u for which the query facility q is one of her top-k facilities
Example (from [Vlachou et al., ICDE 2010]): Score(p2) = 0.2×3 + 0.8×2 = 2.2. Tom and Max are the reverse top-1 users of p2, whereas Bob is not a reverse top-1 user of p2
Reverse Top-k (RTk) Queries: Types, introduced by [Vlachou et al., ICDE 2010]
Example from [Vlachou et al., ICDE 2010] with q = p2 and k = 1:
• Bichromatic RTk queries
  – Find every user u for which the query facility q is one of her top-k facilities (e.g., the result is {Tom, Max})
• Monochromatic RTk queries
  – Find every weighting vector for which q is one of the top-k facilities
  – Result: the line segment where w[price] ∈ [1/7, 5/6]
Monochromatic Reverse Top-k Algorithms [Vlachou et al., ICDE 2010]
• Score(q) is the projection of q on the vector w
• Rank(q) w.r.t. w is given by the number of facilities below the red line (the line through q perpendicular to w)
• Rank(q) < Rank(f) for every w if q dominates f
• Ignore the facilities that are dominated by q
• The result is empty if k facilities dominate q
[Figure: facilities f and query q for w = [0.5, 0.5]]
Monochromatic Reverse Top-k Algorithms [Vlachou et al., ICDE 2010]
• The relative rank of q and f depends on the rotation of the red line, i.e., on the weighting vector (w, w′ and w″ in the figure)
Monochromatic Reverse Top-k Algorithms [Vlachou et al., ICDE 2010]
Algorithm
• Start with a vertical line through q
• Rank(q): count the number of facilities on the left of the line
• Rotate the line counter-clockwise
• Update Rank(q) whenever the line intersects a facility
• Report the weighting vectors for which Rank(q) ≤ k
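In 2d the sweep can be sketched by parameterizing the weighting vector as w = (w1, 1 − w1) and sweeping w1 from 0 to 1 instead of rotating a line. This visits the same events: each event is a weight where a facility's score equals q's. The sketch makes three assumptions: lower scores are better, rank(q) = 1 + the number of facilities that score strictly lower, and exact Fraction arithmetic is used for the event positions. The function name is hypothetical.

```python
from fractions import Fraction

def mono_rtk_2d(q, facilities, k):
    """Intervals of w1 in [0, 1], with w = (w1, 1 - w1), for which q is among the top-k."""
    def diff(f, w1):  # score(f) - score(q), linear in w1
        return w1 * (f[0] - q[0]) + (1 - w1) * (f[1] - q[1])

    crossings = {}  # w1 -> facilities whose score equals q's there (the line hits them)
    for i, f in enumerate(facilities):
        a, b = f[0] - q[0], f[1] - q[1]
        if a != b:  # otherwise the sign never changes (e.g., q dominates f or vice versa)
            w1 = Fraction(-b) / Fraction(a - b)
            if 0 < w1 < 1:
                crossings.setdefault(w1, []).append(i)

    cuts = [Fraction(0)] + sorted(crossings) + [Fraction(1)]
    first_mid = (cuts[0] + cuts[1]) / 2
    better = [diff(f, first_mid) < 0 for f in facilities]
    rank = 1 + sum(better)
    result = []
    for j, (lo, hi) in enumerate(zip(cuts, cuts[1:])):
        if j > 0:  # passing an event flips exactly the facilities listed for it
            for i in crossings[lo]:
                better[i] = not better[i]
                rank += 1 if better[i] else -1
        if rank <= k:
            if result and result[-1][1] == lo:
                result[-1] = (result[-1][0], hi)  # merge with the previous interval
            else:
                result.append((lo, hi))
    return result

print(mono_rtk_2d((2, 2), [(1, 3), (3, 1), (4, 4)], 2))
# [(Fraction(0, 1), Fraction(1, 1))]
```

The facility (4, 4) is dominated by q, so it never changes q's rank, consistent with the pruning rule above.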
RTk using k-lower envelope (2d) [Cheema et al., EDBT 2014]
• Given a point a = (u,v) and a weighting vector W = (w1, w2), a.score = u*w1 + v*w2
• A point a = (u,v) is mapped to the line a*: y = ux + v in the dual space
• The weighting vector W = (w1, w2) is mapped to the vertical line W*: x = w1/w2
• a* and W* intersect where y = u(w1/w2) + v = (u*w1 + v*w2)/w2 = a.score/w2
• Hence, for w2 > 0, the lines are ordered along W* exactly as the scores are (y_a = a.score/w2, y_b = b.score/w2)
RTk using k-lower envelope (2d) [Cheema et al., EDBT 2014]
• Query: given a weighting vector W = (w1, w2), return the k objects with the smallest scores
• Solution
  – Map W and all the objects to the dual space
  – Return the k lowest lines intersecting W*
Example: for W*: x = w1/w2 the ranking is a, b, c, d; for another vector with W*: x = w3/w4 it is d, b, a, c
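The dual formulation can be checked numerically. Evaluating each dual line y = u·x + v at x = w1/w2 gives score/w2, so the k lowest lines are the k best objects when w2 > 0. The function names and points are hypothetical.

```python
def dual_height(p, w):
    """Height at which the dual line of p = (u, v), y = u*x + v, meets W*: x = w1/w2."""
    return p[0] * (w[0] / w[1]) + p[1]

def topk_dual(points, w, k):
    """k objects with the smallest scores: the k lowest dual lines along W* (assumes w2 > 0)."""
    return sorted(points, key=lambda name: dual_height(points[name], w))[:k]

def topk_primal(points, w, k):
    """Reference answer computed directly from score = u*w1 + v*w2."""
    return sorted(points, key=lambda name: points[name][0] * w[0] + points[name][1] * w[1])[:k]

points = {"a": (1, 5), "b": (2, 2), "c": (5, 1)}
print(topk_dual(points, (1, 2), 2))  # ['b', 'c']
print(topk_dual(points, (2, 1), 2))  # ['b', 'a']
```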
RTk using k-lower envelope (2d) [Cheema et al., EDBT 2014]
• Given a set of lines L, the mass of a point p is the number of lines that lie strictly below p
• The k-lower envelope consists of every point p that lies on one of the lines in L and has mass equal to k-1
[Figure: the 2-lower envelope, with example points p and p′]
RTk using k-lower envelope (2d) [Cheema et al., EDBT 2014]
• Map all facilities to the dual space and compute their k-lower envelope
• Map the query point to the dual space
• Return the weighting vectors where the query line is below the k-lower envelope
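The mass and k-lower envelope definitions lead directly to a per-vector test. At a fixed x = w1/w2, the envelope is the k-th lowest facility line, whose point there has mass k − 1 when there are no ties. q is in the top-k when its dual line lies strictly below that height, so ties are treated as not in the top-k. The function names are hypothetical.

```python
def mass(p, lines):
    """Number of lines y = u*x + v that lie strictly below the point p = (x, y)."""
    x, y = p
    return sum(u * x + v < y for u, v in lines)

def k_lower_envelope_at(lines, x, k):
    """Height of the k-lower envelope at x: the k-th lowest line (mass k - 1, assuming no ties)."""
    return sorted(u * x + v for u, v in lines)[k - 1]

def q_in_topk(q, facilities, w, k):
    """q is among the top-k for w = (w1, w2), w2 > 0, if its dual line is below the envelope."""
    x = w[0] / w[1]
    return q[0] * x + q[1] < k_lower_envelope_at(facilities, x, k)

facilities = [(1, 5), (2, 2), (5, 1)]
print(q_in_topk((3, 3), facilities, (1, 2), 2))  # False
print(q_in_topk((3, 3), facilities, (1, 2), 3))  # True
```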
Computing k-lower envelope (2d) [Cheema et al., EDBT 2014]
• Start from the leftmost point on the k-lower envelope and always move towards the right
• Upon reaching an intersection, make a turn (i.e., leave the current road)
• The path travelled is the k-lower envelope
Computing k-lower envelope (2d) [Cheema et al., EDBT 2014]
• The walk starts on the line with the k-th largest slope, because lines with larger slopes are lower as x → −∞. Equivalently, it starts on the line of the primal point with the k-th largest x-value, since a point (u,v) in the primal is mapped to the line y = ux + v
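The walk can be sketched as follows. Start on the line with the k-th largest slope. Repeatedly move right to the nearest intersection with another line and continue on that line. This assumes general position: distinct slopes and no three lines through one point. Each turn scans all lines, which is simple but not the most efficient choice. The function name is hypothetical.

```python
def k_lower_envelope(lines, k):
    """Trace the k-lower envelope of the lines y = u*x + v from left to right.

    Returns [(x_start, index_of_line), ...]; assumes general position.
    """
    # As x -> -inf, lines with larger slopes are lower: start on the k-th largest slope.
    current = sorted(range(len(lines)), key=lambda i: -lines[i][0])[k - 1]
    x0, previous = float("-inf"), None
    path = [(x0, current)]
    while True:
        u, v = lines[current]
        nearest = None
        for j, (uj, vj) in enumerate(lines):
            if j in (current, previous) or uj == u:
                continue
            x = (vj - v) / (u - uj)  # where line j crosses the current line
            if x > x0 and (nearest is None or x < nearest[0]):
                nearest = (x, j)
        if nearest is None:
            return path  # no crossing to the right: the envelope stays on this line
        # Turn at the nearest intersection and continue along the crossing line.
        x0, previous, current = nearest[0], current, nearest[1]
        path.append(nearest)

lines = [(1, 0), (-1, 0), (0, 1)]  # y = x, y = -x, y = 1
print(k_lower_envelope(lines, 1))  # [(-inf, 0), (0.0, 1)]
```

For k = 2 on the same lines, the path follows y = 1 up to x = −1, then y = −x up to x = 0, then y = x up to x = 1, and y = 1 again after that.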
Bichromatic Reverse Top-k (≥2d) [Vlachou et al., ICDE 2010]
Given a set of facilities F and a set of weighting vectors W, return every weighting vector for which q is one of the top-k facilities.
Brute-force algorithm: for each vector w in W
• Compute the top-k facilities
• Return w if q is among the top-k facilities
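The brute-force algorithm runs one top-k query per weighting vector. The sketch assumes that lower scores are better and that the facility list contains q. The function names and data are hypothetical.

```python
import heapq

def score(w, f):
    """Linear score w[1]*f[1] + ... + w[d]*f[d]; lower scores are assumed to be better."""
    return sum(wi * fi for wi, fi in zip(w, f))

def bichromatic_rtk_brute_force(q, facilities, vectors, k):
    """Every weighting vector for which q is among the top-k; `facilities` must contain q."""
    result = []
    for w in vectors:
        topk = heapq.nsmallest(k, facilities, key=lambda f: score(w, f))
        if q in topk:
            result.append(w)
    return result

q = (2, 2)
facilities = [q, (1, 4), (4, 1)]
vectors = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]
print(bichromatic_rtk_brute_force(q, facilities, vectors, 1))  # [(0.5, 0.5)]
```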
Bichromatic Reverse Top-k (≥2d) [Vlachou et al., ICDE 2010]
Threshold-based algorithm (RTA)
• Sort the weighting vectors by their pairwise similarity (similar vectors have similar top-k results)
• Evaluate the first top-k query and calculate a threshold
• For each weighting vector
  – Try to prune it using the threshold
  – Refine the threshold
Bichromatic Reverse Top-k (≥2d) [Vlachou et al., ICDE 2010]
Example (from [Vlachou et al., ICDE 2010]) with W = [w1, w2, w3]:
• Evaluate the top-2 query for w1; its result, p1 and p2, is kept in a buffer
• Set the threshold for w2 based on the buffered facilities
• score(q) for w2 > threshold, so w2 is discarded
• Compute the top-k for w3 and update the buffer
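The threshold idea can be sketched as follows. The buffered result of the previous top-k query gives k facilities whose worst score under the next vector w bounds the k-th best score from above. If q scores worse than that bound, at least k facilities beat q, so w is discarded without running its top-k query. Several details are assumptions: lower scores are better, the facilities exclude q and number at least k, the vectors are already sorted by similarity, and ties count in q's favour. The function names are hypothetical.

```python
import heapq

def score(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

def rta(q, facilities, vectors, k):
    """Threshold-based reverse top-k sketch; returns (result, number of top-k queries run)."""
    result, buffer, evaluated = [], [], 0
    for w in vectors:
        if len(buffer) == k:
            # k buffered facilities score at most `threshold` under w,
            # so the k-th best score under w is at most `threshold`.
            threshold = max(score(w, f) for f in buffer)
            if score(w, q) > threshold:
                continue  # at least k facilities beat q: discard w without a top-k query
        buffer = heapq.nsmallest(k, facilities, key=lambda f: score(w, f))
        evaluated += 1
        if score(w, q) <= max(score(w, f) for f in buffer):
            result.append(w)
    return result, evaluated

vectors = [(0.9, 0.1), (0.8, 0.2), (0.5, 0.5), (0.1, 0.9)]
print(rta((2, 2), [(1, 4), (4, 1), (5, 5)], vectors, 1))  # ([(0.5, 0.5)], 3)
```

In this example the second vector is discarded by the threshold alone, so only three of the four top-k queries are evaluated.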
Bichromatic Reverse Top-k (≥2d) [Vlachou et al., SIGMOD 2013]
Branch-and-bound algorithm: key idea
• Weighting vectors and facilities are indexed (e.g., by R-trees)
• Compute upper and lower bounds
• Prune using the bounds
• Process the unpruned entries
Outline
• Introduction
• Reverse k Nearest Neighbors Queries
• Reverse Top-k Queries
• Reverse Skyline Queries
  – Introduction
  – Pre-computation based approach
  – On-the-fly algorithm
• Other work
Reverse Skyline [Dellis et al., VLDB 2007]
• Dominance
  – A point x dominates y if x is at least as good as y on all the dimensions and x is better than y on at least one dimension
• Skyline
  – Return every point that is not dominated by any other point
[Figure: hotels a, c, d, x, y, z plotted by distance and price]
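Dominance and the skyline can be sketched with a simple nested loop, which is quadratic and meant only to illustrate the definitions. Smaller values are assumed to be better on every dimension (e.g., distance and price). The function names and hotel data are hypothetical.

```python
def dominates(x, y):
    """x dominates y: at least as good on every dimension and better on one (smaller is better)."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def skyline(points):
    """Points not dominated by any other point (nested-loop sketch, O(n^2))."""
    return [p for p in points if not any(dominates(o, p) for o in points)]

hotels = [(1, 5), (2, 2), (4, 1), (3, 3)]  # hypothetical (distance, price) pairs
print(skyline(hotels))  # [(1, 5), (2, 2), (4, 1)]
```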
Reverse Skyline [Dellis et al., VLDB 2007]
• Dynamic dominance
  – A user u gives her ideal point
  – A point x dominates y if its difference from u is not larger than y's difference on each dimension and is smaller on at least one dimension
• Dynamic skyline
  – Return every point that is not dynamically dominated by any other point
  – Equivalently, transform each x[i] to |u[i] – x[i]| and compute the ordinary skyline
[Figure: hotels by distance from airport and room size, with user u; a′, b′, y′ and z′ are the transformed points]
Reverse Skyline [Dellis et al., VLDB 2007]
• Definition of importance
  – A user u considers a facility f to be important if f is in the dynamic skyline of u
• Reverse skyline
  – Return every user u for which the query facility is in its dynamic skyline
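A brute-force reverse skyline follows from the transformation |u[i] − x[i]|. The sketch checks, for each user, whether any facility dynamically dominates q. The facility list is assumed to exclude q, and the function names and coordinates are hypothetical.

```python
def dominates(x, y):
    """x dominates y: no worse on every dimension and strictly better on one (smaller is better)."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def transform(p, u):
    """Replace each attribute by its distance from the user's ideal point: |u[i] - p[i]|."""
    return tuple(abs(ui - pi) for pi, ui in zip(p, u))

def in_dynamic_skyline(f, others, u):
    """True if no other facility dynamically dominates f with respect to u."""
    tf = transform(f, u)
    return not any(dominates(transform(o, u), tf) for o in others)

def reverse_skyline(q, facilities, users):
    """Users whose dynamic skyline contains q (brute force; `facilities` excludes q)."""
    return [u for u in users if in_dynamic_skyline(q, facilities, u)]

print(reverse_skyline((5, 5), [(1, 1)], [(5, 6), (1, 2)]))  # [(5, 6)]
```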
Precomputation based approach [Dellis et al., VLDB 2007]
• Pre-computation
  – For each user u, compute and store its dynamic skyline
• Query processing
  – u is not an answer if q is dynamically dominated by a point of its pre-computed skyline
  – u is an answer if q is not dominated by any point of its pre-computed skyline
Precomputation based approach [Dellis et al., VLDB 2007]
Reducing the storage requirement
• For each user u
  – Store only k of its dynamic skyline points
• Query processing
  – u is not an answer if q is dominated by any of the k stored points
  – u is guaranteed to be an answer if q dominates any of the k stored points (any facility dominating q would then also dominate that skyline point, which is impossible)
  – Otherwise, call verification to check whether u is an answer
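This three-way decision uses only the k stored dynamic-skyline points of a user. Dominance is evaluated after transforming every point to its per-dimension distance from the user. The function names, return labels and coordinates are hypothetical.

```python
def dominates(x, y):
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def transform(p, u):
    """Per-dimension distance from the user's ideal point u."""
    return tuple(abs(ui - pi) for pi, ui in zip(p, u))

def classify(u, q, stored):
    """Decide from k stored dynamic-skyline points of u whether u is an answer for q."""
    tq = transform(q, u)
    ts = [transform(s, u) for s in stored]
    if any(dominates(s, tq) for s in ts):
        return "not an answer"
    if any(dominates(tq, s) for s in ts):
        return "answer"  # a facility dominating q would also dominate skyline point s
    return "verify"

stored = [(2, 2), (1, 5)]  # hypothetical stored skyline points of u = (0, 0)
print(classify((0, 0), (3, 3), stored))  # not an answer
print(classify((0, 0), (1, 1), stored))  # answer
print(classify((0, 0), (0, 6), stored))  # verify
```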
On-the-fly Algorithm [Dellis et al., VLDB 2007]
• The window of a user u is the rectangle centered at u with q at one of its corners
• A user u is an answer iff its window is empty, i.e., contains no facility that dynamically dominates q
Key idea
• Divide the space around q into 2^d partitions (orthants)
• Compute the skyline for each partition
• Any user dominated by these skylines cannot be an answer
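The window test can be written with per-dimension distances from u. The window extends |q[i] − u[i]| around u on each dimension. A facility inside it dynamically dominates q unless it ties q's distance on every dimension. The sketch assumes the facility list excludes q, and the function name and coordinates are hypothetical.

```python
def window_is_empty(u, q, facilities):
    """u is a reverse skyline user of q iff no facility lies in its window.

    A facility on the window boundary counts unless it ties q on every dimension,
    in which case it does not dominate q.
    """
    half = [abs(qi - ui) for qi, ui in zip(q, u)]
    for f in facilities:
        d = [abs(fi - ui) for fi, ui in zip(f, u)]
        inside = all(di <= hi for di, hi in zip(d, half))
        if inside and d != half:  # strictly better on at least one dimension
            return False
    return True

print(window_is_empty((5, 6), (5, 5), [(1, 1)]))  # True
print(window_is_empty((1, 2), (5, 5), [(1, 1)]))  # False
```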
Other work on reverse spatial queries
• Uncertain data
• Continuous monitoring (e.g., moving objects, data streams)
• Influence maximization
• Other spaces (e.g., road networks, general metric spaces, non-metric spaces, obstructed spaces)
• Spatial keyword queries
• …
Open problems on reverse spatial queries
• Location-based reverse top-k queries
• Location-based reverse skyline queries
Location-based Reverse Top-k
• Definition of importance
  – Each user u has a preference function
  – A facility f is important to a user u if f is one of the top-k facilities for u
• Reverse Top-k Query (RTk)
  – Find every user u for which the query facility q is one of her top-k facilities
Example (k = 1): the facilities have prices 1 and 2, and the users' preference functions combine price with their distance to the facility: 0.9*price + 0.1*distance, 0.5*price + 0.5*distance and 1*distance. The influence set of f1 is {u2} and the influence set of f2 is {u1, u3}.
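A brute-force version makes the location-dependent score explicit. Each user weights a facility's price against the user's own distance to it, so the same facility can rank differently for users at different locations. The data layout is an assumption: facilities are (location, price) pairs, users are (location, (price weight, distance weight)) pairs, and lower scores are better. The function names and coordinates are hypothetical.

```python
import math

def score(user, facility):
    """Weighted sum of price and the user's distance to the facility (lower is better)."""
    location, (wp, wd) = user
    f_location, price = facility
    return wp * price + wd * math.dist(location, f_location)

def location_based_rtk(q, facilities, users, k):
    """Users for which q is among their k best facilities; `facilities` contains q."""
    return [u for u in users
            if q in sorted(facilities, key=lambda f: score(u, f))[:k]]

f1, f2 = ((0, 0), 1), ((10, 0), 2)      # (location, price)
price_sensitive = ((9, 0), (0.9, 0.1))  # close to f2 but cares mostly about price
distance_only = ((9, 0), (0.0, 1.0))
users = [price_sensitive, distance_only]
print(location_based_rtk(f1, [f1, f2], users, 1) == [price_sensitive])  # True
print(location_based_rtk(f2, [f1, f2], users, 1) == [distance_only])    # True
```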
Location-based Reverse Skyline
• Dominance
  – A facility x dominates another facility y w.r.t. a user u if u prefers x over y on every attribute
• Definition of importance
  – A facility f is important to a user u if f is not dominated by any other facility w.r.t. u
• Reverse skyline
  – Find every user u for which the query facility q is not dominated by any other facility
Example (facilities with prices 1 and 2): the influence set of f1 is {u1, u2} and the influence set of f2 is {u1, u2, u3}
References
1. Flip Korn, S. Muthukrishnan: Influence Sets Based on Reverse Nearest Neighbor Queries. SIGMOD 2000:201-212
2. Ioana Stanoi, Divyakant Agrawal, Amr El Abbadi: Reverse Nearest Neighbor Queries for Dynamic Databases. ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery 2000:44-53
3. Yufei Tao, Dimitris Papadias, Xiang Lian: Reverse kNN Search in Arbitrary Dimensionality. VLDB 2004:744-755
4. Evangelos Dellis, Bernhard Seeger: Efficient Computation of Reverse Skyline Queries. VLDB 2007:291-302
5. Wei Wu, Fei Yang, Chee Yong Chan, Kian-Lee Tan: FINCH: evaluating reverse k-Nearest-Neighbor queries on location data. PVLDB 1(1):1056-1067 (2008)
6. Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, Kjetil Nørvåg: Reverse top-k queries. ICDE 2010:365-376
7. Muhammad Aamir Cheema, Xuemin Lin, Wenjie Zhang, Ying Zhang: Influence zone: Efficiently processing reverse k nearest neighbors queries. ICDE 2011:577-588
8. Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, Kjetil Nørvåg: Monochromatic and Bichromatic Reverse Top-k Queries. IEEE Trans. Knowl. Data Eng. (TKDE) 23(8):1215-1229 (2011)
9. Muhammad Aamir Cheema, Wenjie Zhang, Xuemin Lin, Ying Zhang: Efficiently processing snapshot and continuous reverse k nearest neighbors queries. VLDB J. (VLDB) 21(5):703-728 (2012)
10. Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørvåg, Yannis Kotidis: Branch-and-bound algorithm for reverse top-k queries. SIGMOD 2013:481-492
11. Shiyu Yang, Muhammad Aamir Cheema, Xuemin Lin, Ying Zhang: SLICE: Reviving regions-based pruning for reverse k nearest neighbors queries. ICDE 2014:760-771
12. Muhammad Aamir Cheema, Zhitao Shen, Xuemin Lin, Wenjie Zhang: A Unified Framework for Efficiently Processing Ranking Related Queries. EDBT 2014:427-438
13. Shiyu Yang, Muhammad Aamir Cheema, Xuemin Lin, Wei Wang: Reverse k Nearest Neighbors Query Processing: Experiments and Analysis. PVLDB 8(5):605-616 (2015)
Thanks