CSIS 7101: Spatial Data (Part 3). Distance Browsing in Spatial Databases, GÍSLI R. HJALTASON and HANAN SAMET. Rollo Chan, Chu Chung Man, Mak Wai Yip, Vivian Lee, Eric Lo, Sindy Shou, Hugh Wang

Upload: tavion-rootes

Post on 14-Dec-2015


Page 1: CSIS 7101: Spatial Data (Part 3)

Distance Browsing in Spatial Databases

GÍSLI R. HJALTASON and HANAN SAMET

Rollo Chan, Chu Chung Man, Mak Wai Yip, Vivian Lee, Eric Lo, Sindy Shou, Hugh Wang

Page 2: What is Distance Browsing?

Browsing through the database on the basis of distance from an arbitrary spatial query object.
Ranking data objects in order of their distance from a given query object.
  E.g., find the nearest person to me who is sleeping.

Two different techniques:
  k-nearest neighbor algorithm (k-NN)
  Incremental nearest neighbor algorithm (INN)

Setting: a collection of spatial objects stored in an R-tree spatial data structure.

Page 3: Before All of Them

Requirement: Consistency

Definition:
Let d be the combination of the functions d_o and d_n (d_o applies to objects, d_n to nodes), and let e ⊂ N denote the fact that item e is contained in exactly the set of nodes N. The functions d_o and d_n are consistent iff, for any query object q and any object or node e in the hierarchical data structure, there exists n in N, where e ⊂ N, such that d(q, n) ≤ d(q, e).

[Figure: query object q and object o. The circle around q depicts the search region after reporting o as the next nearest object.]
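
The consistency requirement can be checked mechanically on a small tree. The sketch below is an illustration only, not code from the paper. It copies the distances and the R-tree structure from the worked example later in these slides, and it assumes that "BR Dist." for a node is d(q, n) and that "Seg. Dist." for an object is d(q, e). The names `DIST`, `PARENT`, `violations`, and `is_consistent` are hypothetical.

```python
# Distances from q, copied from the example slides.
# Assumption: node values are "BR Dist.", object values are "Seg. Dist.".
DIST = {
    "R0": 0, "R1": 0, "R2": 0, "R3": 13, "R4": 11, "R5": 0, "R6": 44,
    "a": 17, "b": 48, "c": 57, "d": 59, "e": 48,
    "f": 86, "g": 81, "h": 17, "i": 21,
}

# Parent of each item, read off the example R-tree.
PARENT = {
    "R1": "R0", "R2": "R0",
    "R3": "R1", "R4": "R1", "R5": "R2", "R6": "R2",
    "a": "R3", "b": "R3", "d": "R4", "g": "R4", "h": "R4",
    "c": "R5", "i": "R5", "e": "R6", "f": "R6",
}


def violations(dist, parent):
    """Return the (item, parent) pairs where d(q, parent) > d(q, item)."""
    return [(e, n) for e, n in parent.items() if dist[n] > dist[e]]


def is_consistent(dist, parent):
    """True when every item is at least as far from q as its parent node."""
    return not violations(dist, parent)


print(is_consistent(DIST, PARENT))
```

If a node were assigned a distance larger than that of one of its children, for example `dict(DIST, R6=50)`, `violations` would list the offending pair, and a best-first search could then report objects out of order.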

Page 4: Example

[Figure: query point q, objects a to i, and bounding rectangles R0 to R6.]

R-tree structure:
  R0: R1 R2
  R1: R3 R4
  R2: R5 R6
  R3: a b
  R4: d g h
  R5: c i
  R6: e f

Find the THREE nearest neighbors to query point q in the given R-tree.

k-Nearest Neighbor Search
Incremental Nearest Neighbor Search

Page 5: k-Nearest Neighbor Search

Applicable only when k is fixed in advance.
Maintains a global list of the k candidate nearest neighbors (the nearest list) while traversing the tree depth-first.
Makes only local decisions:
  The next node to visit must be a child of the current node.
Makes use of the nearest list:
  Candidates are compared with the maximum distance in the list.

Page 6: Pruning Strategies

Strategy 1: prunes an entry whose bounding rectangle r1 is such that
  MINDIST(q, r1) > MINMAXDIST(q, r2),
where r2 is some other bounding rectangle.

Strategy 2: prunes an object o when
  DIST(q, o) > MINMAXDIST(q, r),
where r is some bounding rectangle.

[Figure: two panels showing query point q, bounding rectangle r, and objects a, b, and o, illustrating MINDIST (optimistic) and MINMAXDIST (pessimistic).]
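
The two metrics can be sketched for a query point and an axis-aligned rectangle. The slides give only the figure, so the code below follows the usual definitions of MINDIST and MINMAXDIST for minimum bounding rectangles, stated here as an assumption. The function names and the `(p, lo, hi)` parameter layout (point, lower corner, upper corner) are hypothetical.

```python
import math


def mindist(p, lo, hi):
    """Smallest possible distance from point p to the rectangle [lo, hi]."""
    total = 0.0
    for x, s, t in zip(p, lo, hi):
        if x < s:
            total += (s - x) ** 2
        elif x > t:
            total += (x - t) ** 2
    return math.sqrt(total)


def minmaxdist(p, lo, hi):
    """Upper bound on the distance from p to the nearest object in [lo, hi].

    Assumes the rectangle is a *minimum* bounding rectangle, so every face
    touches at least one object. For each axis k, take the nearer face
    along k and the farther coordinate along every other axis, then keep
    the smallest resulting distance.
    """
    near = [(x - (s if x <= (s + t) / 2 else t)) ** 2
            for x, s, t in zip(p, lo, hi)]
    far = [(x - (s if x >= (s + t) / 2 else t)) ** 2
           for x, s, t in zip(p, lo, hi)]
    far_sum = sum(far)
    return math.sqrt(min(far_sum - f + n for n, f in zip(near, far)))


q = (0.0, 0.0)
print(mindist(q, (1.0, 1.0), (3.0, 2.0)), minmaxdist(q, (1.0, 1.0), (3.0, 2.0)))
```

MINDIST never exceeds MINMAXDIST for the same rectangle, which is why the first is the optimistic bound and the second the pessimistic one.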

Page 7: Pruning Strategies (cont'd)

Strategies 1 and 2 are useful only when k = 1.

Strategy 3: prunes any node whose bounding rectangle r is such that
  MINDIST(q, r) > NearestList.MaxDist

MINDIST() alone is sufficient for pruning.

Page 8: Example – k-NN (k = 3)

[Figure: query point q, objects a to i, and bounding rectangles R0 to R6; the search starts at the root, R0 (0).]

Distances from q to the objects:

  Object      a   b   c   d   e   f   g   h   i
  Seg. Dist.  17  48  57  59  48  86  81  17  21
  BR Dist.    13  27  53  30  45  74  74  17  0

Distances from q to the R-tree nodes:

  Node      R0  R1  R2  R3  R4  R5  R6
  BR Dist.  0   0   0   13  11  0   44

R-tree structure:
  R0: R1 R2
  R1: R3 R4
  R2: R5 R6
  R3: a b
  R4: d g h
  R5: c i
  R6: e f

Nearest List: empty
Max Dist.: ∞

Page 9: Example – k-NN (k = 3), continued

(Figure, distance tables, and R-tree structure as on Page 8.)

Leaf R4 (d, g, h) is visited before leaf R3 (a, b).

  Nearest List update        Max Dist.
  d(59), g(81), h(17)        81
  a(17) replaces g(81)       59
  b(48) replaces d(59)       48
  i(21) replaces b(48)       21
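
The k-NN walk-through can be replayed in a few lines. This is a minimal sketch rather than the algorithm as published: it looks distances up in the tables from the example slides instead of computing geometry, and it applies only Strategy 3. It assumes that "Seg. Dist." is the actual object distance and that "BR Dist." is the MINDIST of a node. The names `CHILDREN`, `NODE_DIST`, `SEG_DIST`, and `knn` are hypothetical.

```python
import math

# Example R-tree and distances from q, copied from the slides.
CHILDREN = {
    "R0": ["R1", "R2"], "R1": ["R3", "R4"], "R2": ["R5", "R6"],
    "R3": ["a", "b"], "R4": ["d", "g", "h"], "R5": ["c", "i"], "R6": ["e", "f"],
}
NODE_DIST = {"R0": 0, "R1": 0, "R2": 0, "R3": 13, "R4": 11, "R5": 0, "R6": 44}
SEG_DIST = {"a": 17, "b": 48, "c": 57, "d": 59, "e": 48,
            "f": 86, "g": 81, "h": 17, "i": 21}


def knn(root, k):
    """Depth-first k-NN; returns (nearest list, trace of Max Dist. values)."""
    nearest = []      # sorted list of (distance, object), at most k entries
    max_trace = []    # Max Dist. after each change to a full list

    def max_dist():
        return nearest[-1][0] if len(nearest) == k else math.inf

    def visit(node):
        kids = CHILDREN[node]
        if kids[0] in SEG_DIST:                      # leaf: children are objects
            for obj in kids:
                if SEG_DIST[obj] < max_dist():
                    nearest.append((SEG_DIST[obj], obj))
                    nearest.sort()
                    del nearest[k:]                  # keep the k best
                    if len(nearest) == k:
                        max_trace.append(max_dist())
        else:                                        # internal: nearest child first
            for child in sorted(kids, key=NODE_DIST.get):
                if NODE_DIST[child] <= max_dist():   # Strategy 3: skip if MINDIST > MaxDist
                    visit(child)

    visit(root)
    return nearest, max_trace


nearest, max_trace = knn("R0", 3)
print(nearest)
print(max_trace)
```

On the example data the sketch visits R4 before R3, never descends into R6 (its distance of 44 exceeds the final Max Dist. of 21), and ends with a, h, and i in the nearest list.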

Page 10: Problems with k-NN

Nodes and objects are not visited in order of distance.
Non-optimal objects may be accessed and must then be pruned.
k must be known in advance, which makes k-NN difficult to combine with other predicates.

Page 11: Incremental Nearest Neighbor Search

Top-down tree traversal:
  Depth-first traversal
  Breadth-first traversal

Page 12: Incremental Nearest Neighbor Search

INN uses best-first traversal:
  Picks the node with the least distance among all nodes that have yet to be visited.
Uses a priority queue:
  The distance from the query object is the key.
Makes global decisions (k-NN makes local decisions):
  Decisions are based on the priority queue.
  Chooses among the child nodes of all visited nodes.

Page 13: Example – INN

(Figure, distance tables, and R-tree structure as on Page 8.)

The queue initially holds only the root.

Priority Queue: R0 (0)

Page 14: Example – INN

Dequeue R0 (0); enqueue its children R1 (0) and R2 (0).

Priority Queue: R1 (0), R2 (0)

Page 15: Example – INN

Dequeue R1 (0); enqueue its children R3 (13) and R4 (11).

Priority Queue: R2 (0), R4 (11), R3 (13)

Page 16: Example – INN

Dequeue R2 (0); enqueue its children R5 (0) and R6 (44).

Priority Queue: R5 (0), R4 (11), R3 (13), R6 (44)

Page 17: Example – INN

Dequeue R5 (0), a leaf; enqueue the bounding rectangles of its objects, [i](0) and [c](53). Here [x] denotes the bounding rectangle of object x, keyed by its BR Dist.

Priority Queue: [i](0), R4 (11), R3 (13), R6 (44), [c](53)

Page 18: Example – INN

Dequeue [i](0). The actual distance of i is 21, which is greater than the key of the next element, R4 (11), so i is re-enqueued as an object with key 21.

Priority Queue: R4 (11), R3 (13), i (21), R6 (44), [c](53)

Page 19: Example – INN

Dequeue R4 (11), a leaf; enqueue [h](17), [d](30), and [g](74).

Priority Queue: R3 (13), [h](17), i (21), [d](30), R6 (44), [c](53), [g](74)

Page 20: Example – INN

Dequeue R3 (13), a leaf; enqueue [a](13) and [b](27).

Priority Queue: [a](13), [h](17), i (21), [b](27), [d](30), R6 (44), [c](53), [g](74)

Page 21: Example – INN

Dequeue [a](13). The actual distance of a is 17, which is not greater than the key of the next element, [h](17), so a is reported.

Priority Queue: [h](17), i (21), [b](27), [d](30), R6 (44), [c](53), [g](74)

Reported: a (17)

Page 22: Example – INN

Dequeue [h](17). The actual distance of h is 17, which is not greater than the key of the next element, i (21), so h is reported.

Priority Queue: i (21), [b](27), [d](30), R6 (44), [c](53), [g](74)

Reported: a (17), h (17)

Page 23: Example – INN

Dequeue i (21), which is an object, and report it. The three nearest neighbors have now been found.

Priority Queue: [b](27), [d](30), R6 (44), [c](53), [g](74)

Reported: a (17), h (17), i (21)
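
The INN walk-through can be replayed with a best-first generator. This is a minimal sketch that looks distances up in the tables from the example slides instead of computing geometry. It assumes that "BR Dist." is the key of a node or of an object's bounding rectangle [x], and that "Seg. Dist." is the actual object distance. The names `CHILDREN`, `NODE_DIST`, `BR_DIST`, `SEG_DIST`, and `inn` are hypothetical, and ties are broken by insertion order, which is a choice made for this sketch.

```python
import heapq
from itertools import count, islice

# Example R-tree and distances from q, copied from the slides.
CHILDREN = {
    "R0": ["R1", "R2"], "R1": ["R3", "R4"], "R2": ["R5", "R6"],
    "R3": ["a", "b"], "R4": ["d", "g", "h"], "R5": ["c", "i"], "R6": ["e", "f"],
}
NODE_DIST = {"R0": 0, "R1": 0, "R2": 0, "R3": 13, "R4": 11, "R5": 0, "R6": 44}
BR_DIST = {"a": 13, "b": 27, "c": 53, "d": 30, "e": 45,
           "f": 74, "g": 74, "h": 17, "i": 0}
SEG_DIST = {"a": 17, "b": 48, "c": 57, "d": 59, "e": 48,
            "f": 86, "g": 81, "h": 17, "i": 21}


def inn(root):
    """Yield (object, distance) pairs in non-decreasing order of distance."""
    tie = count()   # insertion counter: breaks ties and keeps entries comparable
    queue = [(NODE_DIST[root], next(tie), "node", root)]
    while queue:
        dist, _, kind, name = heapq.heappop(queue)
        if kind == "node":
            for child in CHILDREN[name]:
                if child in CHILDREN:   # child node: key is its BR Dist.
                    entry = (NODE_DIST[child], next(tie), "node", child)
                else:                   # object: enqueue its bounding rectangle [x]
                    entry = (BR_DIST[child], next(tie), "bbox", child)
                heapq.heappush(queue, entry)
        elif kind == "bbox" and queue and SEG_DIST[name] > queue[0][0]:
            # Something in the queue may still be closer: re-enqueue as an object.
            heapq.heappush(queue, (SEG_DIST[name], next(tie), "object", name))
        else:
            yield name, SEG_DIST[name]


print(list(islice(inn("R0"), 3)))
```

Because `inn` is a generator, the caller decides how many neighbors to draw: `islice(inn("R0"), 3)` stops after the third neighbor, and drawing a fourth later resumes from the existing queue rather than restarting the search.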

Page 24: Variants

Find the farthest object:
  Sort the queue in descending order of distance.
  Replace <= by >=.
Minimum and maximum distance:
  E.g., find all cities between 100 and 200 miles from Hong Kong.
  Prune nodes that cannot qualify.
Solve the traditional k-NN problem.
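
Two of these variants can be expressed as filters over any stream that yields objects in order of distance. The sketch below is only an illustration: `ranked` is a hypothetical stand-in for an incremental search (it simply sorts the example's Seg. Dist. values), and `within` filters the stream after the fact, without the node pruning that the slide mentions.

```python
from itertools import dropwhile, islice, takewhile

# Object distances from the example slides (assumed to be actual distances).
SEG_DIST = {"a": 17, "b": 48, "c": 57, "d": 59, "e": 48,
            "f": 86, "g": 81, "h": 17, "i": 21}


def ranked():
    """Stand-in for an incremental search: (object, distance) by distance."""
    yield from sorted(SEG_DIST.items(), key=lambda item: item[1])


def k_nearest(stream, k):
    """Traditional k-NN: take the first k results of the ordered stream."""
    return list(islice(stream, k))


def within(stream, dmin, dmax):
    """Objects with dmin <= distance <= dmax, stopping once dmax is passed."""
    past_min = dropwhile(lambda item: item[1] < dmin, stream)
    return list(takewhile(lambda item: item[1] <= dmax, past_min))


print(k_nearest(ranked(), 3))
print(within(ranked(), 20, 50))
```

Since the stream is ordered, `within` can stop at the first object beyond `dmax`; pruning nodes inside the search would additionally avoid enqueuing entries that cannot qualify.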

Page 25: Priority Queue

Plays a key role in performance.
In two dimensions:
  The worst case is unlikely to arise in practice.
  Expected number of points in the queue = O(√k).
  The queue usually fits in memory.
In higher dimensions:
  The higher the dimension, the larger the queue.

Page 26: Priority Queue (cont'd)

Idea:
  The priority queue is split into three tiers.
  The first tier is kept in memory; the second and third are kept in a disk file.
  Each tier covers a range of distances: the first tier stores the nearest range, the third tier the farthest.
  When the first tier is exhausted, move in the elements of the second tier.
  When the second tier is exhausted, scan the elements of the third tier and rebuild the first and second tiers with new ranges.
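
The tiering idea can be sketched as follows. This is a simplified illustration, not the paper's implementation: plain Python lists stand in for the disk file, and the rule for choosing new ranges during a rebuild (a fixed width starting at the smallest remaining distance) is an assumption made for this sketch. The class name `ThreeTierQueue` and its parameters `d1` and `d2` are hypothetical.

```python
import heapq


class ThreeTierQueue:
    """Tier 1: distances < d1 (heap). Tier 2: [d1, d2). Tier 3: >= d2."""

    def __init__(self, d1, d2):
        if not d1 < d2:
            raise ValueError("d1 must be smaller than d2")
        self.d1, self.d2 = d1, d2
        self.width = d2 - d1   # assumed width of each rebuilt range
        self.tier1 = []        # in-memory heap
        self.tier2 = []        # unsorted list; stands in for a disk file
        self.tier3 = []        # unsorted list; stands in for a disk file

    def push(self, dist, item):
        if dist < self.d1:
            heapq.heappush(self.tier1, (dist, item))
        elif dist < self.d2:
            self.tier2.append((dist, item))
        else:
            self.tier3.append((dist, item))

    def pop(self):
        if not self.tier1:
            self._refill()
        return heapq.heappop(self.tier1)   # IndexError if the queue is empty

    def _refill(self):
        if self.tier2:
            # Tier 1 exhausted: move tier 2 into memory.
            self.tier1, self.tier2 = self.tier2, []
            heapq.heapify(self.tier1)
            self.d1 = self.d2              # tier 1 now covers everything < d2
        elif self.tier3:
            # Tier 2 exhausted too: scan tier 3 and rebuild with new ranges.
            pending, self.tier3 = self.tier3, []
            lowest = min(dist for dist, _ in pending)
            self.d1 = lowest + self.width
            self.d2 = self.d1 + self.width
            for dist, item in pending:
                self.push(dist, item)

    def __len__(self):
        return len(self.tier1) + len(self.tier2) + len(self.tier3)


q = ThreeTierQueue(25, 60)
for d in [50, 5, 120, 30, 75, 200, 10, 95]:
    q.push(d, str(d))
print([q.pop()[0] for _ in range(len(q))])
```

Only tier 1 pays the cost of heap maintenance; elements in the farther tiers are appended without ordering and are sorted only if the search actually reaches their range.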

Page 27: Comparison of k-NN and INN

k-NN:
  Depth-first recursion
  Makes local decisions
  k is fixed
  If used when k is unknown:
    Pick a fixed K' and run k-NN.
    If k gradually exceeds K', pick an m >= k and re-apply k-NN.
    Drawback: wastes computation if the chosen m is too large.

INN:
  Priority queue
  Makes global decisions
  Number of neighbors is not known in advance

Page 28: Experiment Dataset

Real-world data: TIGER/Line File
  Howard: 17,421 line segments
  Water: 37,495 line segments
  PG: 59,551 line segments
  Roads: 200,482 line segments
Synthetic data
Hierarchical data structure: R*-tree
Utilizing buffered I/O
Three measures: execution time, R-tree node I/O, object distance calculations

Page 29: Cumulative Cost of Distance Browsing

Page 30: Incremental Cost of Distance Browsing

Page 31: k-Nearest Neighbor Queries

Page 32: Experimental Results

INN outperforms k-NN in distance browsing.
In k-NN queries, the INN algorithm performs better than the k-NN algorithm.
For a large number of neighbors, the priority queue of INN is smaller than the NearestList maintained by k-NN.

Page 33: References

Gísli R. Hjaltason and Hanan Samet, "Distance Browsing in Spatial Databases", ACM TODS, Volume 24, Number 2, pp. 265-318, June 1999.

~ THE END ~