Cost Modeling of Spatial Query Operators Using Nonparametric Regression

Songtao Jiang

Department of Computer Science, University of Vermont

October 10, 2003


Three Commonly Used Spatial Operators

Range query: Range(reference object, range)

K-nearest neighbor (KNN) query: KNN(reference object, number of neighbors)

Window query: Window(rectangle)
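To make the semantics of the three operators concrete, here is a minimal brute-force sketch over a list of 2-D points. It assumes point objects and Euclidean distance, and it scans every object; a real spatial database would typically answer these queries through an index such as an R-tree, which is what makes their cost vary and worth modeling. The function names are hypothetical.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]

def range_query(points: List[Point], ref: Point, radius: float) -> List[Point]:
    """Range(reference object, range): every point within `radius` of `ref`."""
    return [p for p in points if math.dist(p, ref) <= radius]

def knn_query(points: List[Point], ref: Point, k: int) -> List[Point]:
    """KNN(reference object, number of neighbors): the k nearest points, nearest first."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, ref))

def window_query(points: List[Point], x_left: float, y_bottom: float,
                 x_right: float, y_top: float) -> List[Point]:
    """Window(rectangle): every point inside the axis-aligned rectangle (edges included)."""
    return [(x, y) for (x, y) in points
            if x_left <= x <= x_right and y_bottom <= y <= y_top]

if __name__ == "__main__":
    pts = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0), (1.0, 1.0)]
    print(range_query(pts, (0.0, 0.0), 5.0))
    print(knn_query(pts, (0.0, 0.0), 2))
    print(window_query(pts, 0.0, 0.0, 2.0, 2.0))
```

For the sample points, the range query with radius 5 keeps three points (the point at distance exactly 5 is included), while the 2-NN and the 2 by 2 window both return the origin and (1, 1).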


Our Approach

Training process

Building model
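The slides name two phases, a training process and model building, but do not say which nonparametric estimator was used. As a stand-in, the sketch below uses the Nadaraya-Watson kernel estimator with a Gaussian kernel: training issues random queries and records their cost variables and observed cost, and the model predicts the cost of a new query as a kernel-weighted average of observed costs. `collect_training_data`, `KernelRegressionModel`, `make_query`, `measure_cost`, and `bandwidth` are all hypothetical names, and `measure_cost` stands in for whatever CPU or IO measurement the system provides.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
Sample = Tuple[Features, float]

def collect_training_data(make_query: Callable[[random.Random], Features],
                          measure_cost: Callable[[Features], float],
                          n: int, seed: int = 0) -> List[Sample]:
    """Training process: issue n random queries and record (cost variables, observed cost)."""
    rng = random.Random(seed)
    samples: List[Sample] = []
    for _ in range(n):
        q = make_query(rng)
        samples.append((q, measure_cost(q)))
    return samples

class KernelRegressionModel:
    """Building model: Nadaraya-Watson regression with a Gaussian kernel.

    Features are assumed to be pre-scaled to comparable ranges.
    """

    def __init__(self, samples: List[Sample], bandwidth: float) -> None:
        if not samples or bandwidth <= 0:
            raise ValueError("need training samples and a positive bandwidth")
        self.samples = samples
        self.h = bandwidth

    def predict(self, q: Features) -> float:
        num = den = 0.0
        for x, y in self.samples:
            # Squared Euclidean distance between the new query and a training query.
            d2 = sum((a - b) ** 2 for a, b in zip(q, x))
            w = math.exp(-d2 / (2.0 * self.h ** 2))  # Gaussian kernel weight
            num += w * y
            den += w
        if den == 0.0:
            # Every weight underflowed: q is far from all training queries.
            return sum(y for _, y in self.samples) / len(self.samples)
        return num / den
```

Because the prediction is a locally weighted average, no parametric form of the cost function has to be assumed. The bandwidth controls how smooth the fitted cost surface is and would usually be tuned, for example by cross-validation.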


Cost Variables

Range query: <x, y, distance>

Window query: <x_left, y_bottom, x_right, y_top>, where (x_left, y_bottom) is the lower-left corner and (x_right, y_top) is the upper-right corner

KNN query: <x, y, number>, where number is the number of neighbors requested
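Since the cost variables are just the query parameters, a model input can be built by flattening each query into a tuple in the order listed above. The sketch below uses hypothetical `RangeQuery`, `WindowQuery`, and `KNNQuery` classes and adds a min-max `scale` helper. The scaling step is an assumption (the slides do not mention preprocessing), motivated by coordinates in meters and a neighbor count living on very different scales.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, Union

@dataclass(frozen=True)
class RangeQuery:
    x: float
    y: float
    distance: float

@dataclass(frozen=True)
class WindowQuery:
    x_left: float
    y_bottom: float
    x_right: float
    y_top: float

@dataclass(frozen=True)
class KNNQuery:
    x: float
    y: float
    number: int  # number of neighbors requested

Query = Union[RangeQuery, WindowQuery, KNNQuery]

def cost_variables(q: Query) -> Tuple[float, ...]:
    """Flatten a query into its cost-variable vector, in the slide's order."""
    if isinstance(q, RangeQuery):
        return (q.x, q.y, q.distance)
    if isinstance(q, WindowQuery):
        if q.x_left > q.x_right or q.y_bottom > q.y_top:
            raise ValueError("expected lower-left corner first, then upper-right")
        return (q.x_left, q.y_bottom, q.x_right, q.y_top)
    if isinstance(q, KNNQuery):
        return (q.x, q.y, float(q.number))
    raise TypeError(f"unsupported query type: {type(q).__name__}")

def scale(features: Sequence[float], lows: Sequence[float],
          highs: Sequence[float]) -> Tuple[float, ...]:
    """Min-max scale each cost variable to [0, 1] (assumed preprocessing step)."""
    return tuple((f - lo) / (hi - lo) for f, lo, hi in zip(features, lows, highs))
```

For example, a range query centered at (5000, 2500) with distance 1000, scaled against a 10,000 m space and a maximum distance of 2000 m, becomes (0.5, 0.25, 0.5).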


Data sets

Real data set: a 500,000 meters by 300,000 meters two-dimensional space containing 15,000 spatial objects with an unknown distribution (Urban Areas of Counties in Pennsylvania. URL: http://www.psu.edu/access/urban.shtml)

Synthetic data sets: a 10,000 meters by 10,000 meters two-dimensional space containing 1,000 or 10,000 objects, with either a uniform or a Gaussian distribution.
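A minimal sketch of how synthetic data sets like these could be generated. The space size and object counts come from the slide; everything about the Gaussian case is an assumption, since the slide gives no parameters: the mean is placed at the center of the space, the standard deviation is a hypothetical 1,500 meters, and draws falling outside the space are re-sampled. Objects are treated as points.

```python
import random
from typing import List, Tuple

SIDE = 10_000.0  # side of the square space in meters, as on the slide

def uniform_points(n: int, seed: int = 0) -> List[Tuple[float, float]]:
    """n points drawn uniformly over the SIDE x SIDE space."""
    rng = random.Random(seed)
    return [(rng.uniform(0.0, SIDE), rng.uniform(0.0, SIDE)) for _ in range(n)]

def gaussian_points(n: int, sigma: float = 1_500.0,
                    seed: int = 0) -> List[Tuple[float, float]]:
    """n points from a 2-D Gaussian centered in the space.

    sigma and the re-sampling of out-of-bounds draws are assumptions.
    """
    rng = random.Random(seed)
    center = SIDE / 2.0
    pts: List[Tuple[float, float]] = []
    while len(pts) < n:
        x = rng.gauss(center, sigma)
        y = rng.gauss(center, sigma)
        if 0.0 <= x <= SIDE and 0.0 <= y <= SIDE:
            pts.append((x, y))
    return pts
```

Calling `uniform_points(1000)` or `gaussian_points(10_000)` yields the two sizes mentioned on the slide; fixing the seed keeps training and testing runs reproducible.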


[Figure: Urban area of Adams County, Pennsylvania]


Statistical Model (an example)

Range query, Distance = 1000 meters


Results (1)

Varying spatial operator (Gaussian data set)

[Figure: Relative CPU error on the Gaussian data set. Y-axis: percentage of testing points; x-axis: relative-error bucket (<10%, 10%-20%, 20%-30%, 30%-40%, >40%); series: Range, KNN, Window.]

[Figure: Relative IO error on the Gaussian data set. Same axes, buckets, and series.]
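The result charts report, for each operator, what percentage of testing points falls into each relative-error bucket. The slides do not define relative error, so the sketch below assumes the common definition |predicted - actual| / actual and assumes that a value exactly on a boundary (such as 10%) goes to the higher bucket. `relative_error`, `bucket_of`, and `error_histogram` are hypothetical names.

```python
from typing import Dict, Sequence

# Bucket labels, in the order used on the result charts.
BUCKETS = ("<10%", "10%-20%", "20%-30%", "30%-40%", ">40%")

def relative_error(predicted: float, actual: float) -> float:
    """|predicted - actual| / actual (assumed definition; actual must be > 0)."""
    return abs(predicted - actual) / actual

def bucket_of(err: float) -> str:
    # Boundary values go to the higher bucket, e.g. exactly 10% -> "10%-20%" (assumed).
    for upper, label in zip((0.10, 0.20, 0.30, 0.40), BUCKETS):
        if err < upper:
            return label
    return BUCKETS[-1]

def error_histogram(predicted: Sequence[float],
                    actual: Sequence[float]) -> Dict[str, float]:
    """Percentage of testing points falling in each relative-error bucket."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and the same length")
    counts = dict.fromkeys(BUCKETS, 0)
    for p, a in zip(predicted, actual):
        if a <= 0:
            raise ValueError("measured cost must be positive")
        counts[bucket_of(relative_error(p, a))] += 1
    return {label: 100.0 * c / len(actual) for label, c in counts.items()}
```

For instance, five predictions of 105, 85, 125, 135, and 200 against a measured cost of 100 have errors of 5%, 15%, 25%, 35%, and 100%, so each bucket receives 20.0 percent of the testing points. The same histogram would be computed separately for CPU and IO cost.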


Results (2)

Varying spatial data set density (range query operator)

[Figure: Relative CPU error for the range query operator. Y-axis: percentage of testing points; x-axis: relative-error bucket (<10%, 10%-20%, 20%-30%, 30%-40%, >40%); series: Denser, Sparser.]

[Figure: Relative IO error for the range query operator. Same axes, buckets, and series.]
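The slides do not say which data sets are labeled Denser and Sparser. Assuming they are the 10,000-object and 1,000-object synthetic sets (both in the same 10,000 meters by 10,000 meters space), the average density N/A differs by a factor of ten; the real data set is included for comparison:

```latex
\begin{aligned}
\rho &= \frac{N}{A} \\
A_{\text{synthetic}} &= 10\,\text{km} \times 10\,\text{km} = 100\,\text{km}^2 \\
\rho_{\text{sparser}} &= \frac{1\,000}{100\,\text{km}^2} = 10\ \text{objects/km}^2 \\
\rho_{\text{denser}} &= \frac{10\,000}{100\,\text{km}^2} = 100\ \text{objects/km}^2 \\
\rho_{\text{real}} &= \frac{15\,000}{500\,\text{km} \times 300\,\text{km}} = \frac{15\,000}{150\,000\,\text{km}^2} = 0.1\ \text{objects/km}^2
\end{aligned}
```

These are averages only; for the Gaussian synthetic sets and the real data set, whose distribution is unknown, local density can differ widely from the mean.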


Results (3)

Varying training data set size (range query operator)

[Figure: Relative CPU error for the range query operator. Y-axis: percentage of testing points; x-axis: relative-error bucket (<10%, 10%-20%, 20%-30%, 30%-40%, >40%); series: Large, Small training sets.]

[Figure: Relative IO error for the range query operator. Same axes, buckets, and series.]
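Comparing a large and a small training set is most informative when both models are scored on the same held-out testing points. The slides do not describe the split, so the sketch below shows one possible protocol under that assumption: shuffle the pool of queries once, hold out a fixed testing set, and take each training set as a prefix of the remaining queries so that the small set is contained in the large one. `nested_splits` and its parameters are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def nested_splits(pool: Sequence[T], test_size: int, train_sizes: Sequence[int],
                  seed: int = 0) -> Tuple[List[T], Dict[int, List[T]]]:
    """Hold out one testing set, then draw nested training sets from the rest."""
    rng = random.Random(seed)
    items = list(pool)
    rng.shuffle(items)
    test, rest = items[:test_size], items[test_size:]
    if max(train_sizes) > len(rest):
        raise ValueError("not enough queries left for the largest training set")
    # Each smaller training set is a prefix of every larger one, so the
    # Small vs. Large comparison varies only the amount of training data.
    return test, {n: rest[:n] for n in sorted(train_sizes)}
```

A model would then be fit on each `trains[n]` and scored on the shared `test` list, so any difference in the error histograms comes from training size rather than from a different choice of testing points.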


Conclusion

Accuracy

Easy to use

Time tolerance: training overhead is small