CSIS 7101: Spatial Data (Part 3)
Distance Browsing in Spatial Database
GÍSLI R. HJALTASON and HANAN SAMET
Rollo Chan, Chu Chung Man, Mak Wai Yip, Vivian Lee, Eric Lo, Sindy Shou, Hugh Wang
What is Distance Browsing?
Browsing through the database on the basis of distances from an arbitrary spatial query object
Ranking data objects in their order of distance from a given query object
  E.g. Find the nearest person to me who is sleeping.
Two different techniques:
  k-nearest neighbor algorithm (k-NN)
  Incremental nearest neighbor algorithm (INN)
Before All of Them
Setting: a collection of spatial objects stored in an R-tree spatial data structure
Requirement: consistency
Definition:
  Let d be the combination of functions d_o and d_n, and let e ⊏ N denote the fact that item e is contained in exactly the set of nodes N. The functions d_o and d_n are consistent iff for any query object q and any object or node e in the hierarchical data structure there exists n in N, where e ⊏ N, such that d(q, n) <= d(q, e).
[Figure: query object q and an object o. The circle around q depicts the search region after o has been reported as the next nearest object.]
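The consistency requirement can be checked mechanically on a small tree. The sketch below is only an illustration: it uses the tree shape and the distance values read off the example slides later in this deck, and the names PARENT, DIST and is_consistent are hypothetical. In an R-tree every item has exactly one parent, so the set N holds a single node.

```python
# Parent of every item in the example R-tree (assumption: transcribed from the
# example slides; each item is contained in exactly one node).
PARENT = {
    "R1": "R0", "R2": "R0", "R3": "R1", "R4": "R1", "R5": "R2", "R6": "R2",
    "a": "R3", "b": "R3", "d": "R4", "g": "R4", "h": "R4",
    "c": "R5", "i": "R5", "e": "R6", "f": "R6",
}

# d(q, item): "BR Dist." for nodes, "Seg. Dist." for objects.
DIST = {
    "R0": 0, "R1": 0, "R2": 0, "R3": 13, "R4": 11, "R5": 0, "R6": 44,
    "a": 17, "b": 48, "c": 57, "d": 59, "e": 48, "f": 86, "g": 81, "h": 17, "i": 21,
}


def is_consistent(parent, dist):
    """True if every item is at least as far from q as the node containing it."""
    return all(dist[node] <= dist[item] for item, node in parent.items())


print(is_consistent(PARENT, DIST))  # True
```

If a node were ever farther from q than something stored inside it, a best-first search could report objects out of order, which is why the requirement comes before both algorithms.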
Example
[Figure: R-tree over line segments a to i with query point q.]
R0: R1, R2
R1: R3, R4
R2: R5, R6
R3: a, b
R4: d, g, h
R5: c, i
R6: e, f
Find the THREE nearest neighbors to query point q in the R-tree given.
k-Nearest Neighbor Search
Incremental Nearest Neighbor Search
k-Nearest Neighbor Search
Applicable only when k is fixed in advance
Maintains a global list of the candidate k nearest neighbors while traversing the tree in depth-first manner
Makes only local decisions
  The next node to visit must be a child of the current node
Makes use of the nearest list
  Candidates are compared with the maximum distance in the list
Pruning Strategies
Strategy 1: prune an entry whose bounding rectangle r1 is such that
  MINDIST(q, r1) > MINMAXDIST(q, r2), where r2 is some other bounding rectangle.
Strategy 2: prune an object o when
  DIST(q, o) > MINMAXDIST(q, r), where r is some bounding rectangle.
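[Figure: two panels, each showing query point q, bounding rectangle r, and objects a, b and o. MINDIST is the optimistic bound; MINMAXDIST is the pessimistic bound.]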
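The two bounds can be made concrete for a query point and an axis-aligned bounding rectangle. The following is a minimal sketch, assuming Euclidean distance and rectangles given by their low and high corners; the function names mindist and minmaxdist and the sample coordinates are illustrative, not taken from the slides.

```python
import math


def mindist(q, lo, hi):
    """Optimistic bound: distance from point q to the nearest point of the rectangle."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))


def minmaxdist(q, lo, hi):
    """Pessimistic bound: some object inside the rectangle is at most this far from q.

    For each axis k, take the rectangle face nearest to q along k and the
    farthest corner coordinate along every other axis, then keep the smallest
    of these distances.
    """
    near = [l if x <= (l + h) / 2 else h for x, l, h in zip(q, lo, hi)]
    far = [l if x >= (l + h) / 2 else h for x, l, h in zip(q, lo, hi)]
    far_sq = [(x - f) ** 2 for x, f in zip(q, far)]
    total_far = sum(far_sq)
    return math.sqrt(min(total_far - far_sq[k] + (q[k] - near[k]) ** 2
                         for k in range(len(q))))


q, lo, hi = (0.0, 0.0), (1.0, 1.0), (3.0, 2.0)
print(mindist(q, lo, hi))     # sqrt(2), about 1.414
print(minmaxdist(q, lo, hi))  # sqrt(5), about 2.236
```

MINDIST never exceeds MINMAXDIST, which is what lets Strategy 1 discard r1 once its best case is worse than the guaranteed case of r2.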
Pruning Strategies (cont'd)
Strategies 1 and 2 are useful only when k = 1
Strategy 3: prune any node whose bounding rectangle r is such that
  MINDIST(q, r) > NearestList.MaxDist
MINDIST() alone is sufficient for pruning
Example – k-NN (k = 3)
[Figure: the example R-tree with query point q.]
Distances from q:
Segment     a   b   c   d   e   f   g   h   i
Seg. Dist.  17  48  57  59  48  86  81  17  21
BR Dist.    13  27  53  30  45  74  74  17  0

Node        R0  R1  R2  R3  R4  R5  R6
BR Dist.    0   0   0   13  11  0   44

Initial state: Nearest List is empty, Max Dist. = ∞
Leaf R4 (d, g, h) is visited before leaf R3 (a, b)
Example – k-NN (k = 3, continued)
Nearest List updates, in order:
  d (59), g (81), h (17) fill the list; Max Dist. = 81
  a (17) enters the list; Max Dist. = 59
  b (48) enters the list; Max Dist. = 48
  i (21) enters the list; Max Dist. = 21
Final Nearest List: a (17), h (17), i (21)
Problems with k-NN
Nodes and objects are not visited in order of distance.
May access non-optimal objects, which then need to be pruned.
k must be known in advance, which makes it difficult to combine with other predicates.
Incremental Nearest Neighbor Search
Top-down manner tree traversal
  Depth-first traversal
  Breadth-first traversal
Incremental Nearest Neighbor Search
INN uses best-first traversal
  Pick the node with the least distance in the set of all nodes that have yet to be visited
Uses a priority queue
  Distance from the query object is the key
Makes global decisions (k-NN makes local decisions)
  Based on the priority queue
  Chooses among the child nodes of all visited nodes
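Best-first traversal with a priority queue can be sketched as a generator that yields neighbors one at a time. This is a minimal sketch under stated assumptions: the tree and distances are transcribed from the example slides, an object's bounding rectangle is queued first and replaced by the object's actual distance when it is dequeued, and ties are broken in favor of objects. All names (inn, OBJ_BR_DIST and so on) are hypothetical.

```python
import heapq
from itertools import islice

CHILDREN = {
    "R0": ["R1", "R2"], "R1": ["R3", "R4"], "R2": ["R5", "R6"],
    "R3": ["a", "b"], "R4": ["d", "g", "h"], "R5": ["c", "i"], "R6": ["e", "f"],
}
NODE_MINDIST = {"R0": 0, "R1": 0, "R2": 0, "R3": 13, "R4": 11, "R5": 0, "R6": 44}
OBJ_BR_DIST = {"a": 13, "b": 27, "c": 53, "d": 30, "e": 45,
               "f": 74, "g": 74, "h": 17, "i": 0}
OBJ_DIST = {"a": 17, "b": 48, "c": 57, "d": 59, "e": 48,
            "f": 86, "g": 81, "h": 17, "i": 21}

# Entry kinds; the numeric order is the tie-break (assumption: objects first).
OBJECT, OBJECT_BR, NODE = 0, 1, 2


def inn(root="R0"):
    """Yield (name, distance) pairs in non-decreasing order of distance from q."""
    queue = [(NODE_MINDIST[root], NODE, root)]
    while queue:
        dist, kind, name = heapq.heappop(queue)
        if kind == OBJECT:
            yield name, dist                 # nothing closer remains in the queue
        elif kind == OBJECT_BR:
            # Replace the bounding rectangle by the object's actual distance.
            heapq.heappush(queue, (OBJ_DIST[name], OBJECT, name))
        else:
            for child in CHILDREN[name]:
                if child in OBJ_DIST:
                    heapq.heappush(queue, (OBJ_BR_DIST[child], OBJECT_BR, child))
                else:
                    heapq.heappush(queue, (NODE_MINDIST[child], NODE, child))


print(list(islice(inn(), 3)))  # [('a', 17), ('h', 17), ('i', 21)]
```

Because the generator keeps its queue between calls, asking for a fourth neighbor continues from where the third left off instead of restarting the search.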
Example – INN (step 1)
[Figure: the example R-tree with query point q and the distance tables, as in the k-NN example.]
Priority Queue: R0 (0)
Example – INN (step 2)
R0 (0) is dequeued; its children R1 (0) and R2 (0) are enqueued.
Priority Queue: R1 (0), R2 (0)
Example – INN (step 3)
R1 (0) is dequeued; its children R4 (11) and R3 (13) are enqueued.
Priority Queue: R2 (0), R4 (11), R3 (13)
Example – INN (step 4)
R2 (0) is dequeued; its children R5 (0) and R6 (44) are enqueued.
Priority Queue: R5 (0), R4 (11), R3 (13), R6 (44)
Example – INN (step 5)
R5 (0) is dequeued; the bounding rectangles of its objects, [i](0) and [c](53), are enqueued. Here [x] denotes the bounding rectangle of object x.
Priority Queue: [i](0), R4 (11), R3 (13), R6 (44), [c](53)
Example – INN (step 6)
[i](0) is dequeued; object i is enqueued with its actual distance, i (21).
Priority Queue: R4 (11), R3 (13), i (21), R6 (44), [c](53)
Example – INN (step 7)
R4 (11) is dequeued; [h](17), [d](30) and [g](74) are enqueued.
Priority Queue: R3 (13), [h](17), i (21), [d](30), R6 (44), [c](53), [g](74)
Example – INN (step 8)
R3 (13) is dequeued; [a](13) and [b](27) are enqueued.
Priority Queue: [a](13), [h](17), i (21), [b](27), [d](30), R6 (44), [c](53), [g](74)
Example – INN (step 9)
[a](13) is dequeued and replaced by a (17), which is then reported as the first nearest neighbor.
Priority Queue: [h](17), i (21), [b](27), [d](30), R6 (44), [c](53), [g](74)
Reported: a (17)
Example – INN (step 10)
[h](17) is dequeued and replaced by h (17), which is then reported as the second nearest neighbor.
Priority Queue: i (21), [b](27), [d](30), R6 (44), [c](53), [g](74)
Reported: a (17), h (17)
Example – INN (step 11)
i (21) is dequeued and reported as the third nearest neighbor.
Priority Queue: [b](27), [d](30), R6 (44), [c](53), [g](74)
Reported: a (17), h (17), i (21)
Variants
Find the farthest object:
  Queue sorted in descending order of distance
  Replace <= by >=
Minimum and maximum distance:
  E.g. Find all cities between 100 and 200 miles from Hong Kong
  Prune unqualified nodes
Solve the traditional k-NN problem
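Two of these variants fall out of any incremental search almost for free. The sketch below assumes a stream of (name, distance) pairs already produced in ascending order of distance; here a plain list built from the example distances stands in for it. The names k_nearest and within_range are hypothetical. It only stops early at the maximum distance and does not show the node pruning by minimum distance that the slide mentions.

```python
from itertools import dropwhile, islice, takewhile

# Stand-in for an incremental nearest neighbor stream (ascending distance),
# using the segment distances from the example slides.
RANKED = [("a", 17), ("h", 17), ("i", 21), ("b", 48), ("e", 48),
          ("c", 57), ("d", 59), ("g", 81), ("f", 86)]


def k_nearest(stream, k):
    """Traditional k-NN: take the first k results and stop."""
    return list(islice(stream, k))


def within_range(stream, dmin, dmax):
    """All objects whose distance lies in [dmin, dmax]; stops once dmax is passed."""
    reachable = takewhile(lambda entry: entry[1] <= dmax, stream)
    return list(dropwhile(lambda entry: entry[1] < dmin, reachable))


print(k_nearest(iter(RANKED), 3))          # [('a', 17), ('h', 17), ('i', 21)]
print(within_range(iter(RANKED), 20, 50))  # [('i', 21), ('b', 48), ('e', 48)]
```

Since the stream is lazy, neither function forces the search to examine objects beyond the last one it needs.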
Priority Queue
Plays a key role in performance
In 2 dimensions:
  Worst case is unlikely to arise in practice
  Expected number of points in the queue = O(sqrt(k))
  Usually fits in memory
In higher dimensions:
  The higher the dimension, the larger the queue size
Priority Queue (cont'd)
Idea:
  The priority queue is split into three tiers
  First tier in memory, second and third tiers in a disk file
  Each tier covers a range of distances: the first tier stores the nearest range, the third tier the farthest
  When the first tier is exhausted, move elements from the second tier
  When the second tier is exhausted, scan the elements and rebuild the first and second tiers with new ranges
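The three-tier idea can be simulated in memory to see how the ranges move. In this sketch plain lists stand in for the disk file, and the fixed range width is an assumption made for illustration; the slides do not say how the ranges are chosen. The class name TieredQueue and its attributes are hypothetical.

```python
import heapq


class TieredQueue:
    """Priority queue split into three distance ranges (in-memory simulation)."""

    def __init__(self, width):
        if width <= 0:
            raise ValueError("width must be positive")
        self.width = width
        self.t1 = width        # tier 1 holds distances below t1 (kept as a heap)
        self.t2 = 2 * width    # tier 2 holds [t1, t2); tier 3 holds the rest
        self.tier1, self.tier2, self.tier3 = [], [], []

    def push(self, dist, item):
        if dist < self.t1:
            heapq.heappush(self.tier1, (dist, item))
        elif dist < self.t2:
            self.tier2.append((dist, item))    # unsorted, would live on disk
        else:
            self.tier3.append((dist, item))    # unsorted, would live on disk

    def pop(self):
        if not self.tier1:
            if self.tier2:
                # Tier 1 exhausted: move the second tier into memory.
                self.tier1, self.tier2 = self.tier2, []
                heapq.heapify(self.tier1)
                self.t1 = self.t2
            elif self.tier3:
                # Tier 2 exhausted as well: scan tier 3 and rebuild with new ranges.
                rest, self.tier3 = self.tier3, []
                low = min(d for d, _ in rest)
                self.t1, self.t2 = low + self.width, low + 2 * self.width
                for d, it in rest:
                    self.push(d, it)
            else:
                raise IndexError("pop from an empty queue")
        return heapq.heappop(self.tier1)


pq = TieredQueue(width=20)
for dist, name in [(53, "[c]"), (0, "[i]"), (11, "R4"), (13, "R3"), (44, "R6"), (21, "i")]:
    pq.push(dist, name)
print([pq.pop() for _ in range(6)])
# [(0, '[i]'), (11, 'R4'), (13, 'R3'), (21, 'i'), (44, 'R6'), (53, '[c]')]
```

Only the first tier needs to be ordered, so the elements that are far from the query object cost an append rather than a heap insertion.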
Comparison of k-NN and INN
k-NN
  Depth-first recursion
  Makes local decisions
  k is fixed
  If used with k unknown:
    Pick a fixed K' and run k-NN
    If k grows beyond K', pick an m >= k and re-apply k-NN
    Drawback: wastes computation if the chosen m is too large
INN
  Priority queue
  Makes global decisions
  Number of neighbors need not be known in advance
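The cost of emulating distance browsing with a fixed-k algorithm can be seen by counting how often the search is restarted. The sketch below is illustrative only: knn is a stand-in for any fixed-k search (it simply sorts the example distances), and doubling m on each restart is an assumed policy, since the slide only requires m >= k.

```python
from itertools import islice

OBJ_DIST = {"a": 17, "b": 48, "c": 57, "d": 59, "e": 48,
            "f": 86, "g": 81, "h": 17, "i": 21}
calls = []  # records the m requested by every k-NN invocation


def knn(m):
    """Stand-in for a fixed-k search: the m nearest (name, distance) pairs."""
    calls.append(m)
    ranked = sorted(OBJ_DIST.items(), key=lambda kv: (kv[1], kv[0]))
    return ranked[:m]


def browse_with_knn(initial_k):
    """Yield neighbors in order by re-applying k-NN with a growing m."""
    if initial_k < 1:
        raise ValueError("initial_k must be at least 1")
    m, reported = initial_k, 0
    while True:
        result = knn(m)
        yield from result[reported:]   # skip neighbors already reported
        reported = len(result)
        if reported < m:               # fewer than m objects exist: done
            return
        m *= 2                         # assumed growth policy


first_five = list(islice(browse_with_knn(2), 5))
print(first_five)  # [('a', 17), ('h', 17), ('i', 21), ('b', 48), ('e', 48)]
print(calls)       # [2, 4, 8]
```

Five neighbors cost three complete searches here, and the last one computed eight results to deliver one new neighbor, which is the waste the slide refers to.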
Experiment Dataset
Real-world data: TIGER/Line File
  Howard: 17,421 line segments
  Water: 37,495 line segments
  PG: 59,551 line segments
  Roads: 200,482 line segments
Synthetic data
Hierarchical data structure: R*-tree
Utilizing buffered I/O
Three measures: execution time, R-tree node I/O, object distance calculations
Cumulative Cost of Distance Browsing
Incremental Cost of Distance Browsing
k-Nearest Neighbor Queries
Experimental Result
INN outperforms k-NN in distance browsing
In k-NN queries, the INN algorithm is better than the k-NN algorithm
For a large number of neighbors, the priority queue for INN is smaller than the NearestList maintained by k-NN
k-Nearest Neighbor Search
Incremental Nearest Neighbor Search
References
Gísli R. Hjaltason, Hanan Samet, “Distance Browsing in Spatial Databases”, ACM TODS, Volume 24, Number 2, pp. 265-318, June 1999
~ THE END ~