Cost Modeling of Spatial Query Operators Using Nonparametric Regression

Songtao Jiang

Department of Computer Science, University of Vermont

October 10, 2003

Three Commonly Used Spatial Operators

Range query: Range(reference object, range)

K nearest neighbor: KNN(reference object, number of neighbors)

Window query: Window(a rectangle)
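To make the three operator signatures concrete, here is a minimal brute-force sketch over point data. It assumes objects are 2-D points and that distance is Euclidean; the slides do not say how objects or spatial indexes are represented, and the function names `range_query`, `knn_query`, and `window_query` are hypothetical.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]


def range_query(points: List[Point], ref: Point, distance: float) -> List[Point]:
    """Range(reference object, range): points within `distance` of `ref`."""
    rx, ry = ref
    return [p for p in points if math.hypot(p[0] - rx, p[1] - ry) <= distance]


def knn_query(points: List[Point], ref: Point, k: int) -> List[Point]:
    """KNN(reference object, number of neighbors): the k points nearest to `ref`."""
    rx, ry = ref
    return heapq.nsmallest(k, points, key=lambda p: math.hypot(p[0] - rx, p[1] - ry))


def window_query(points: List[Point], x_left: float, y_bottom: float,
                 x_right: float, y_top: float) -> List[Point]:
    """Window(a rectangle): points inside the axis-aligned rectangle."""
    return [p for p in points
            if x_left <= p[0] <= x_right and y_bottom <= p[1] <= y_top]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0), (10.0, 10.0)]
    print(range_query(pts, (0.0, 0.0), 5.0))      # [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0)]
    print(knn_query(pts, (0.0, 0.0), 2))          # [(0.0, 0.0), (1.0, 1.0)]
    print(window_query(pts, 0.0, 0.0, 2.0, 2.0))  # [(0.0, 0.0), (1.0, 1.0)]
```

A real system would answer these through an index such as an R-tree; the linear scans here only illustrate what each operator returns.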

Our Approach

Training process

Building model
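The slides name the two phases without detail. A minimal sketch of the training phase, assuming it consists of running sample queries and recording (cost variables, observed cost) pairs for the model to learn from. Here CPU cost is approximated with `time.process_time()`; IO cost, which the slides also model, is omitted because it depends on the storage engine. `collect_training_samples` and its parameters are hypothetical.

```python
import random
import time
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], float]


def collect_training_samples(
    run_query: Callable[[Tuple[float, ...]], object],
    draw_cost_variables: Callable[[random.Random], Sequence[float]],
    n: int,
    seed: int = 0,
) -> List[Sample]:
    """Run n sample queries and record (cost variables, measured CPU seconds)."""
    rng = random.Random(seed)
    samples: List[Sample] = []
    for _ in range(n):
        variables = tuple(draw_cost_variables(rng))
        start = time.process_time()
        run_query(variables)  # the operator under study; its result is discarded
        samples.append((variables, time.process_time() - start))
    return samples


if __name__ == "__main__":
    data = [(random.uniform(0, 10_000), random.uniform(0, 10_000)) for _ in range(1_000)]

    def range_count(v):
        # Hypothetical brute-force range query used only to generate timings.
        x, y, d = v
        return sum(1 for px, py in data if (px - x) ** 2 + (py - y) ** 2 <= d * d)

    training = collect_training_samples(
        range_count,
        lambda rng: (rng.uniform(0, 10_000), rng.uniform(0, 10_000), 1_000.0),
        n=50,
    )
    print(len(training), training[0])
```

The resulting list is what a regression model would be fitted to in the building phase.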

Cost variables

Range query: <x, y, distance>

Window query: <x_left, y_bottom, x_right, y_top>, where (x_left, y_bottom) is the lower-left corner and (x_right, y_top) is the upper-right corner

KNN: <x, y, number>
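One way to pin down these vectors is a small value type per operator that also enforces the corner convention for windows. The class and method names below are hypothetical; the slides only give the tuples.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RangeQuery:
    x: float
    y: float
    distance: float

    def cost_variables(self) -> Tuple[float, float, float]:
        return (self.x, self.y, self.distance)


@dataclass(frozen=True)
class WindowQuery:
    x_left: float
    y_bottom: float
    x_right: float
    y_top: float

    def __post_init__(self) -> None:
        # (x_left, y_bottom) must be the lower-left corner, (x_right, y_top) the upper-right.
        if self.x_left > self.x_right or self.y_bottom > self.y_top:
            raise ValueError("corners must be given as lower-left then upper-right")

    def cost_variables(self) -> Tuple[float, float, float, float]:
        return (self.x_left, self.y_bottom, self.x_right, self.y_top)


@dataclass(frozen=True)
class KNNQuery:
    x: float
    y: float
    number: int

    def __post_init__(self) -> None:
        if self.number < 1:
            raise ValueError("number of neighbors must be positive")

    def cost_variables(self) -> Tuple[float, float, int]:
        return (self.x, self.y, self.number)
```

Validating at construction time keeps malformed windows out of the training data, where they would otherwise distort the fitted cost surface.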

Data sets

Real data set: a 500,000 m by 300,000 m two-dimensional space containing 15,000 spatial objects whose distribution is unknown (urban areas of counties in Pennsylvania; URL: http://www.psu.edu/access/urban.shtml)

Synthetic data sets: a 10,000 m by 10,000 m two-dimensional space containing 1,000 or 10,000 objects with a uniform or Gaussian distribution
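A sketch of how such synthetic sets could be generated. The slides do not give the Gaussian parameters, so this assumes a single cluster centered in the space with a standard deviation of 1,500 m, and it redraws any point that falls outside the 10,000 m square. `synthetic_points` and `sigma` are hypothetical.

```python
import random
from typing import List, Tuple

SIDE_M = 10_000.0  # side of the square space in meters (from the slides)


def synthetic_points(n: int, distribution: str = "uniform",
                     sigma: float = 1_500.0, seed: int = 0) -> List[Tuple[float, float]]:
    """Draw n points in [0, SIDE_M]^2, uniform or Gaussian around the center."""
    if distribution not in ("uniform", "gaussian"):
        raise ValueError("distribution must be 'uniform' or 'gaussian'")
    rng = random.Random(seed)
    points: List[Tuple[float, float]] = []
    while len(points) < n:
        if distribution == "uniform":
            x, y = rng.uniform(0.0, SIDE_M), rng.uniform(0.0, SIDE_M)
        else:
            # Assumed parameters: one cluster at the center with std dev `sigma`.
            x, y = rng.gauss(SIDE_M / 2, sigma), rng.gauss(SIDE_M / 2, sigma)
        if 0.0 <= x <= SIDE_M and 0.0 <= y <= SIDE_M:  # redraw points outside the space
            points.append((x, y))
    return points


if __name__ == "__main__":
    for dist in ("uniform", "gaussian"):
        for n in (1_000, 10_000):
            print(dist, n, len(synthetic_points(n, dist)))
```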

Urban area of Adams County, Pennsylvania

Statistical Model (an example)

Range query, Distance = 1000 meters
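The slides do not say which nonparametric regression is used. As a stand-in, here is Nadaraya-Watson kernel regression with a Gaussian kernel: with the distance fixed at 1000 m, the range-query cost at (x, y) is predicted as an average of training costs weighted by how close each training query is. The bandwidth is an assumed tuning parameter, and `nadaraya_watson` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def nadaraya_watson(
    train: List[Tuple[Sequence[float], float]],
    query: Sequence[float],
    bandwidth: float,
) -> float:
    """Gaussian-kernel-weighted average of training costs around `query`."""
    if not train or bandwidth <= 0:
        raise ValueError("need training samples and a positive bandwidth")
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x, _ in train]
    nearest = min(sq_dists)
    # Shifting by the nearest squared distance keeps the largest weight at 1.0,
    # so queries far from all samples do not underflow to a zero denominator.
    weights = [math.exp(-(d - nearest) / (2.0 * bandwidth ** 2)) for d in sq_dists]
    return sum(w * cost for w, (_, cost) in zip(weights, train)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical (x, y) -> CPU cost samples for Range(ref, 1000 m).
    samples = [((0.0, 0.0), 1.0), ((1000.0, 0.0), 3.0)]
    print(nadaraya_watson(samples, (500.0, 0.0), bandwidth=100.0))  # 2.0
```

A smaller bandwidth makes predictions follow nearby samples closely; a larger one smooths the cost surface more, so in practice it would be tuned on held-out queries.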

Results (1)

Varying spatial operator (Gaussian data set)

[Chart: percentage of testing points in each relative CPU error range (<10%, 10%-20%, 20%-30%, 30%-40%, >40%) for the Range, KNN, and Window operators]

[Chart: percentage of testing points in each relative IO error range (same ranges) for the Range, KNN, and Window operators]
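The charts bin testing points by relative error. A sketch of that tabulation, assuming relative error means |predicted - actual| / actual and treating each range as half-open (so an error of exactly 10% lands in 10%-20%); `error_histogram` is hypothetical.

```python
from typing import Dict, Sequence

# Range labels as in the charts; each range is treated as [low, high).
BINS = [
    ("<10%", 0.0, 0.1),
    ("10%-20%", 0.1, 0.2),
    ("20%-30%", 0.2, 0.3),
    ("30%-40%", 0.3, 0.4),
    (">40%", 0.4, float("inf")),
]


def error_histogram(predicted: Sequence[float], actual: Sequence[float]) -> Dict[str, float]:
    """Percentage of testing points falling in each relative-error range."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("need equally long, non-empty predicted and actual sequences")
    counts = {label: 0 for label, _, _ in BINS}
    for p, a in zip(predicted, actual):
        if a <= 0:
            raise ValueError("relative error needs a positive actual cost")
        err = abs(p - a) / a
        for label, low, high in BINS:
            if low <= err < high:
                counts[label] += 1
                break
    return {label: 100.0 * c / len(actual) for label, c in counts.items()}


if __name__ == "__main__":
    # Errors of 5%, 15%, 25%, 35%, 100%: one point per range.
    print(error_histogram([105, 115, 75, 135, 200], [100, 100, 100, 100, 100]))
```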

Results (2)

Varying spatial data set density (range query operator)

[Chart: percentage of testing points in each relative CPU error range (<10%, 10%-20%, 20%-30%, 30%-40%, >40%) for a denser and a sparser data set]

[Chart: percentage of testing points in each relative IO error range (same ranges) for a denser and a sparser data set]

Results (3)

Varying training data set size (range query operator)

[Chart: percentage of testing points in each relative CPU error range (<10%, 10%-20%, 20%-30%, 30%-40%, >40%) for a large and a small training set]

[Chart: percentage of testing points in each relative IO error range (same ranges) for a large and a small training set]

Conclusion

Accuracy

Easy to use

Time tolerance: training overhead is small
