Post on 14-Dec-2015

CSIS 7101: Spatial Data (Part 3)

Distance Browsing in Spatial Databases

GÍSLI R. HJALTASON and HANAN SAMET

Rollo Chan, Chu Chung Man, Mak Wai Yip, Vivian Lee, Eric Lo, Sindy Shou, Hugh Wang

What is Distance Browsing?

Browsing through the database on the basis of distances from an arbitrary spatial query object
Ranking data objects in order of their distance from a given query object
  E.g. find the nearest person to me who is sleeping.
Two different techniques:
  k-nearest neighbor algorithm (k-NN)
  Incremental nearest neighbor algorithm (INN)
Setting: a collection of spatial objects stored in an R-tree spatial data structure

Before All of Them

Requirement - Consistency

Definition: Let $d$ be the combination of the functions $d_o$ and $d_n$, and let $e \sqsubset N$ denote the fact that item $e$ is contained in exactly the set of nodes $N$. The functions $d_o$ and $d_n$ are consistent iff, for any query object $q$ and any object or node $e$ in the hierarchical data structure, there exists $n$ in $N$, where $e \sqsubset N$, such that $d(q, n) \le d(q, e)$.

[Figure: the circle around query object q depicts the search region after reporting o as the next nearest object.]

Example

[Figure: query point q and objects a to i, grouped into an R-tree with root R0.]

R0: R1, R2
R1: R3, R4
R2: R5, R6
R3: a, b
R4: d, g, h
R5: c, i
R6: e, f

Find the THREE nearest neighbors to query point q in the R-tree given.

k-Nearest Neighbor Search

Incremental Nearest Neighbor Search

k-Nearest Neighbor Search

Applicable only when k is fixed in advance
Maintains a global list of the k candidate nearest neighbors while traversing the tree depth-first
Makes only local decisions
  The next node to visit must be a child of the current node
Makes use of the nearest list
  Candidates are compared with the maximum distance in the list

Pruning Strategies

Strategy 1: prune an entry whose bounding rectangle r1 is such that MINDIST(q, r1) > MINMAXDIST(q, r2), where r2 is some other bounding rectangle.

Strategy 2: prune an object o when DIST(q, o) > MINMAXDIST(q, r), where r is some bounding rectangle.

[Figure: two panels, each showing query point q, bounding rectangle r and objects a, b, o, contrasting MINDIST (optimistic) with MINMAXDIST (pessimistic).]
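The two bounds used by the pruning strategies can be sketched as follows. This is a minimal sketch, assuming a point query, axis-aligned minimum bounding rectangles given by their low and high corners, and Euclidean distance; the function names `mindist` and `minmaxdist` and the corner representation are assumptions made for illustration and do not come from the slides.

```python
import math

def mindist(p, lo, hi):
    """Optimistic bound: smallest possible distance from point p to the
    rectangle with low corner lo and high corner hi (0 if p is inside)."""
    total = 0.0
    for x, s, t in zip(p, lo, hi):
        if x < s:
            total += (s - x) ** 2
        elif x > t:
            total += (x - t) ** 2
    return math.sqrt(total)

def minmaxdist(p, lo, hi):
    """Pessimistic bound: a distance within which some object inside a
    minimum bounding rectangle is guaranteed to lie, because every face
    of such a rectangle touches at least one object."""
    # Per dimension: coordinate of the nearer face and of the farther face.
    near = [s if x <= (s + t) / 2 else t for x, s, t in zip(p, lo, hi)]
    far = [s if x >= (s + t) / 2 else t for x, s, t in zip(p, lo, hi)]
    far_sq = [(x - f) ** 2 for x, f in zip(p, far)]
    total_far = sum(far_sq)
    # For each dimension k, take the nearer face in k and the farther
    # face in every other dimension, then keep the smallest such value.
    best = min(
        total_far - far_sq[k] + (p[k] - near[k]) ** 2
        for k in range(len(p))
    )
    return math.sqrt(best)

q = (0.0, 0.0)
r_lo, r_hi = (1.0, 1.0), (3.0, 2.0)
print(mindist(q, r_lo, r_hi), minmaxdist(q, r_lo, r_hi))
```

For the rectangle above, the nearest corner (1, 1) gives the MINDIST value, while the MINMAXDIST value comes from the far end (1, 2) of the nearest edge, so the optimistic bound never exceeds the pessimistic one.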

Pruning Strategies (cont'd)

Strategies 1 and 2 are useful only when k = 1

Strategy 3: prune any node whose bounding rectangle r is such that MINDIST(q, r) > NearestList.MaxDist

MINDIST() alone is sufficient for pruning

Example – k-NN (k = 3)

[Figure: the example R-tree and query point q; the search starts at R0 (0).]

Distances from q:

Object      a   b   c   d   e   f   g   h   i
Seg. Dist.  17  48  57  59  48  86  81  17  21
BR Dist.    13  27  53  30  45  74  74  17  0

Node      R0  R1  R2  R3  R4  R5  R6
BR Dist.  0   0   0   13  11  0   44

Nearest List: empty, Max Dist. = ∞

Example – k-NN (k = 3), continued

Nearest List updates, with Max Dist. after each one:
d (59), g (81), h (17): Max Dist. = 81
a (17): Max Dist. = 59
b (48): Max Dist. = 48
i (21): Max Dist. = 21

Final Nearest List: a (17), h (17), i (21)

Problems with k-NN

Nodes and objects are not visited in order of distance.

May access non-optimal objects, which then have to be pruned.

k must be known in advance, which makes the search difficult to combine with other predicates.

Incremental Nearest Neighbor Search

Top-down tree traversal
  Depth-first traversal
  Breadth-first traversal

Incremental Nearest Neighbor Search

INN uses best-first traversal
  Picks the node with the least distance in the set of all nodes that have yet to be visited
Uses a priority queue
  The distance from the query object is the key
Makes global decisions (k-NN makes local decisions)
  Based on the priority queue
  Chooses among the child nodes of all visited nodes

Example – INN

[Figure: the example R-tree and query point q, with the priority queue shown at each step. In the queue, [x] stands for the bounding rectangle of object x (keyed on BR Dist.) and x for the object itself (keyed on Seg. Dist.).]

Priority queue at each step, front of the queue first:
1. Start: R0 (0)
2. Dequeue R0, enqueue R1 (0) and R2 (0): R1 (0), R2 (0)
3. Dequeue R1, enqueue R3 (13) and R4 (11): R2 (0), R4 (11), R3 (13)
4. Dequeue R2, enqueue R5 (0) and R6 (44): R5 (0), R4 (11), R3 (13), R6 (44)
5. Dequeue R5, enqueue [c] (53) and [i] (0): [i] (0), R4 (11), R3 (13), R6 (44), [c] (53)
6. Dequeue [i], enqueue i (21): R4 (11), R3 (13), i (21), R6 (44), [c] (53)
7. Dequeue R4, enqueue [h] (17), [g] (74) and [d] (30): R3 (13), [h] (17), i (21), [d] (30), R6 (44), [c] (53), [g] (74)
8. Dequeue R3, enqueue [a] (13) and [b] (27): [a] (13), [h] (17), i (21), [b] (27), [d] (30), R6 (44), [c] (53), [g] (74)
9. Dequeue [a], report a (17): [h] (17), i (21), [b] (27), [d] (30), R6 (44), [c] (53), [g] (74)
10. Dequeue [h], report h (17): i (21), [b] (27), [d] (30), R6 (44), [c] (53), [g] (74)
11. Dequeue i, report i (21): [b] (27), [d] (30), R6 (44), [c] (53), [g] (74)

Reported in order: a (17), h (17), i (21)
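The INN walkthrough can be replayed with a small generator built on Python's `heapq`. This is a sketch rather than the authors' implementation: the tree layout and distances are copied from the example instead of being computed, and the names `inn`, `OBJECT`, `OBJECT_BR` and `NODE` are hypothetical. One simplification is assumed as well: an object whose bounding rectangle is dequeued is always re-enqueued with its real distance, and ties are broken so that a real object is dequeued before a bounding rectangle or node at the same distance.

```python
import heapq
from itertools import islice

# Example R-tree: internal nodes list child nodes, leaf nodes list objects.
CHILDREN = {"R0": ["R1", "R2"], "R1": ["R3", "R4"], "R2": ["R5", "R6"]}
LEAVES = {"R3": ["a", "b"], "R4": ["d", "g", "h"],
          "R5": ["c", "i"], "R6": ["e", "f"]}
# Distances from q, copied from the example table.
NODE_DIST = {"R0": 0, "R1": 0, "R2": 0, "R3": 13, "R4": 11, "R5": 0, "R6": 44}
OBJ_BR_DIST = {"a": 13, "b": 27, "c": 53, "d": 30, "e": 45,
               "f": 74, "g": 74, "h": 17, "i": 0}
OBJ_DIST = {"a": 17, "b": 48, "c": 57, "d": 59, "e": 48,
            "f": 86, "g": 81, "h": 17, "i": 21}

# Tie-break ranks: at equal distance, objects leave the queue first.
OBJECT, OBJECT_BR, NODE = 0, 1, 2

def inn(root="R0"):
    """Yield (object, distance) pairs in non-decreasing order of distance."""
    queue = [(NODE_DIST[root], NODE, root)]
    while queue:
        dist, kind, name = heapq.heappop(queue)   # global best-first choice
        if kind == OBJECT:
            yield name, dist                      # next nearest neighbor
        elif kind == OBJECT_BR:
            # Bounding rectangle reached: now pay for the real distance.
            heapq.heappush(queue, (OBJ_DIST[name], OBJECT, name))
        elif name in LEAVES:
            for obj in LEAVES[name]:
                heapq.heappush(queue, (OBJ_BR_DIST[obj], OBJECT_BR, obj))
        else:
            for child in CHILDREN[name]:
                heapq.heappush(queue, (NODE_DIST[child], NODE, child))

# The caller decides how many neighbors to pull; k is not fixed up front.
print(list(islice(inn(), 3)))
```

Because the generator is lazy, R6 and the objects under it are only examined if the caller asks for more than three neighbors.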

Variants

Find the farthest object:
  Queue sorted in descending order of distance
  Replace <= by >=
Min and max distance:
  E.g. find all cities between 100 and 200 miles from Hong Kong
  Prune unqualified nodes
Solve the traditional k-NN problem
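From the caller's side, the variants above amount to different ways of consuming a ranked stream. The sketch below is an illustration only: `ranked` is a hypothetical stand-in that ranks the nine example objects with a plain heap instead of traversing an R-tree, so it does not show the node pruning that a real min/max-distance variant would perform inside the traversal. The distances are the Seg. Dist. values from the example.

```python
import heapq
from itertools import dropwhile, islice, takewhile

# Seg. Dist. values from the example table.
OBJ_DIST = {"a": 17, "b": 48, "c": 57, "d": 59, "e": 48,
            "f": 86, "g": 81, "h": 17, "i": 21}

def ranked(dist, farthest_first=False):
    """Stand-in for INN: yield (object, distance) one at a time.
    Negating the key turns nearest-first into farthest-first."""
    sign = -1 if farthest_first else 1
    heap = [(sign * d, name) for name, d in dist.items()]
    heapq.heapify(heap)
    while heap:
        key, name = heapq.heappop(heap)
        yield name, sign * key

def within(stream, dmin, dmax):
    """Keep objects with dmin <= distance <= dmax from a nearest-first
    stream, stopping as soon as dmax is exceeded."""
    in_range = dropwhile(lambda item: item[1] < dmin, stream)
    return takewhile(lambda item: item[1] <= dmax, in_range)

first_three = list(islice(ranked(OBJ_DIST), 3))           # traditional k-NN
mid_range = list(within(ranked(OBJ_DIST), 20, 50))        # min and max distance
farthest = next(ranked(OBJ_DIST, farthest_first=True))    # farthest object
print(first_three, mid_range, farthest)
```

The range query stops pulling from the stream at the first object beyond the upper limit, which is what makes an incremental ranking convenient for this kind of predicate.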

Priority Queue

Plays a key role in performance
In two dimensions:
  The worst case is unlikely to arise in practice
  Expected number of points in the queue = $O(\sqrt{k})$
  The queue usually fits in memory
In higher dimensions:
  The higher the dimension, the larger the queue

Priority Queue (cont'd)

Idea:
  The priority queue is split into three tiers
  The first tier is kept in memory, the second and third in a disk file
  The tiers cover a set of distance ranges: the first tier stores the nearest range, the third tier the farthest
  When the first tier is exhausted, move elements from the second tier
  When the second tier is exhausted, scan the remaining elements and rebuild the first and second tiers with new ranges

Comparison of k-NN and INN

k-NN
  Depth-first recursion
  Makes local decisions
  k is fixed
  If used with k unknown:
    Pick a fixed K' and do k-NN
    If k gradually grows beyond K', pick an m >= k and re-apply k-NN
    Drawback: wastes computational power if the chosen m is too large

INN
  Priority queue
  Makes global decisions
  Number of neighbors not known in advance

Experiment Dataset

Real-world data: TIGER/Line File
  Howard: 17,421 line segments
  Water: 37,495 line segments
  PG: 59,551 line segments
  Roads: 200,482 line segments
Synthetic data
Hierarchical data structure: R*-tree
Utilizing buffered I/O
Three measures: execution time, R-tree node I/O, object distance calculations

Cumulative Cost of Distance Browsing

Incremental Cost of Distance Browsing

k-Nearest Neighbor Queries

Experimental Result

INN outperforms k-NN in distance browsing
In k-NN queries, the INN algorithm is better than the k-NN algorithm
For a large number of neighbors, the priority queue for INN is smaller than the NearestList maintained by k-NN

k-Nearest Neighbor Search

Incremental Nearest Neighbor Search

References

Gísli R. Hjaltason and Hanan Samet, "Distance Browsing in Spatial Databases", ACM TODS, Volume 24, Number 2, pp. 265-318, June 1999

~ THE END ~
