Nearest-Neighbor Searching Under Uncertainty. Wuzhou Zhang, Department of Computer Science, Duke University. PODS, May 23, 2012. Joint work with Pankaj K. Agarwal, Alon Efrat, and Swaminathan Sankararaman.

Page 1: PODS, May 23, 2012

Nearest-Neighbor Searching Under Uncertainty

Wuzhou Zhang

Department of Computer Science, Duke University

PODS, May 23, 2012

Joint work with Pankaj K. Agarwal, Alon Efrat, and Swaminathan Sankararaman

Page 2: PODS, May 23, 2012

Nearest-Neighbor Searching

Applications
• Databases, Information Retrieval
• Statistical Classification, Clustering
• Pattern Recognition, Data Compression
• Computer Vision, etc.

$S$: a set of points in $\mathbb{R}^2$

$q$: any query point in $\mathbb{R}^2$

Find the closest point $p^*$ to $q$

Page 3: PODS, May 23, 2012

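Voronoi Diagram

Voronoi cell: $\mathrm{Vor}(p_i) = \{\, q : \|q - p_i\| \le \|q - p_j\| \ \text{for all } j \,\}$

Voronoi diagram $\mathrm{VD}(S)$: decomposition induced by the cells $\mathrm{Vor}(p_i)$

For $n$ points in the plane:

Preprocessing time: $O(n \log n)$

Space: $O(n)$

Query time: $O(\log n)$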

Page 4: PODS, May 23, 2012

Data Uncertainty

Location of data is imprecise: sensor databases, face recognition, mobile data, etc.

What is the "nearest neighbor" of $q$ now?

Page 5: PODS, May 23, 2012

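Our Model and Problem Statement

Uncertain point $P$: represented as a probability density function (pdf) $f_P$

Expected distance: $\mathrm{Ed}(P, q) = \int f_P(p)\, d(p, q)\, dp$

Find the expected nearest neighbor (ENN) of $q$: $\arg\min_{P \in \mathcal{P}} \mathrm{Ed}(P, q)$

Or an $\varepsilon$-ENN: a point $P \in \mathcal{P}$ with $\mathrm{Ed}(P, q) \le (1 + \varepsilon) \min_{P' \in \mathcal{P}} \mathrm{Ed}(P', q)$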
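As a concrete reading of the expected-distance and ENN definitions, here is a minimal sketch. It assumes each uncertain point is a discrete distribution (a list of locations with probabilities), so the integral becomes a weighted sum. The function names and the linear scan are illustrative assumptions, not the indexes presented in the talk.

```python
import math

# Assumption for illustration: an uncertain point is a discrete distribution,
# given as a list of ((x, y), probability) pairs whose probabilities sum to 1.

def expected_distance(P, q, dist=math.dist):
    """Ed(P, q): probability-weighted average distance from P's locations to q."""
    return sum(w * dist(p, q) for p, w in P)

def expected_nn(points, q, dist=math.dist):
    """Brute-force ENN: index of the uncertain point that minimizes Ed(P, q)."""
    return min(range(len(points)),
               key=lambda i: expected_distance(points[i], q, dist))

def is_eps_enn(points, i, q, eps, dist=math.dist):
    """True if points[i] is within a (1 + eps) factor of the best expected distance."""
    best = min(expected_distance(P, q, dist) for P in points)
    return expected_distance(points[i], q, dist) <= (1 + eps) * best

P1 = [((0.0, 0.0), 0.5), ((4.0, 0.0), 0.5)]  # two equally likely locations
P2 = [((3.0, 0.0), 1.0)]                     # a point with no uncertainty
q = (2.0, 0.0)

print(expected_nn([P1, P2], q))  # prints 1: Ed(P1, q) = 2.0, Ed(P2, q) = 1.0
```

Note that the centroid of `P1` coincides with `q`, yet `P2` is the ENN: under the Euclidean metric the expected distance depends on the whole distribution, not just its mean.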

Page 6: PODS, May 23, 2012

Previous work

Uncertain data

ENN
• The ENN under $L_1$ metric: $\varepsilon$-approximation [Ljosa2007]
• No bounds on the running time

Most likely NN
• Heuristics [Cheng2008, Kriegel2007, Cheng2004, etc.]

Uncertain query

ENN
• Discrete uniform distribution: both exact and $O(1)$-factor approximation [Li2011, Sharifzadeh2010, etc.]
• No bounds on the running time

Page 7: PODS, May 23, 2012

Our contribution

Bounds on preprocessing time, space, and query time for each distance function and setting:
• Squared Euclidean distance: uncertain data; uncertain query
• $L_1$ metric: uncertain data; uncertain query
• Euclidean metric ($\varepsilon$-ENN): uncertain data; uncertain query

Results are in $\mathbb{R}^2$; they extend to higher dimensions

First nontrivial methods for ENN queries with provable performance guarantees!

Page 8: PODS, May 23, 2012

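Expected Voronoi Diagram

Expected Voronoi cell: $\mathrm{EVor}(P_i) = \{\, q : \mathrm{Ed}(P_i, q) \le \mathrm{Ed}(P_j, q) \ \text{for all } j \,\}$

Expected Voronoi diagram $\mathrm{EVD}(\mathcal{P})$: induced by the cells $\mathrm{EVor}(P_i)$

An example in $L_1$ metric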

Page 9: PODS, May 23, 2012

Squared Euclidean distance

Uncertain data

$\hat{p}$: the centroid of $P$; $\sigma_P^2$: the variance of $P$ about $\hat{p}$

Lemma: $\mathrm{Ed}(P, q) = \|q - \hat{p}\|^2 + \sigma_P^2$

• $\mathrm{EVD}(\mathcal{P})$ is the same as the weighted Voronoi diagram WVD

Preprocessing time

Space

Query time

Remarks: Works for any distribution
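The identity $\mathrm{Ed}(P, q) = \|q - \hat{p}\|^2 + \sigma_P^2$ for the squared Euclidean distance follows from expanding the square around the centroid. A short derivation, assuming $f_P$ denotes the pdf of $P$ and $\sigma_P^2$ its variance about $\hat{p}$ (notation chosen here for illustration):

```latex
\begin{align*}
\mathrm{Ed}(P, q)
  &= \int f_P(p)\, \|q - p\|^2 \, dp \\
  &= \int f_P(p)\, \|(q - \hat{p}) - (p - \hat{p})\|^2 \, dp \\
  &= \|q - \hat{p}\|^2
     - 2\,(q - \hat{p}) \cdot \int f_P(p)\,(p - \hat{p})\, dp
     + \int f_P(p)\, \|p - \hat{p}\|^2 \, dp \\
  &= \|q - \hat{p}\|^2 + \sigma_P^2 .
\end{align*}
```

The middle term vanishes because $\hat{p} = \int f_P(p)\, p \, dp$. Each uncertain point therefore acts as its centroid with additive weight $\sigma_P^2$, which is why no assumption on the shape of the distribution is needed.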

Page 10: PODS, May 23, 2012

$L_1$ metric

Uncertain data

Size of $\mathrm{EVD}(\mathcal{P})$:

Lower bound construction

$\alpha(\cdot)$: the inverse Ackermann function

Remarks: Extends to $L_\infty$ metric

Page 11: PODS, May 23, 2012

$L_1$ metric

Uncertain data (cont.)

A near-linear size index exists despite the size of $\mathrm{EVD}(\mathcal{P})$

Preprocessing time

Space

Query time

Remarks: Extends to higher dimensions

Page 12: PODS, May 23, 2012

Euclidean metric ($\varepsilon$-ENN)

Uncertain data

Approximate $\mathrm{Ed}(P, q)$ by a function $g_P(q)$

Outside the grid:

Inside the grid:

Total # of cells:

Remarks: Extends to any metric

Figure labels: $\hat{p}$; $8\,\mathrm{Ed}(P, \hat{p})/\varepsilon$; cell size: $\varepsilon$

Page 13: PODS, May 23, 2012

Euclidean metric ($\varepsilon$-ENN)

Uncertain data (cont.)

A linear size approximate $\mathrm{EVD}(\mathcal{P})$!

Preprocessing time

Space

Query time

Figure labels: $g_{P_1}$, $g_{P_2}$, $q$

Page 14: PODS, May 23, 2012

Conclusion and future work

Conclusion
• First nontrivial methods for answering exact or approximate ENN queries with provable performance guarantees
• ENN is not a good indicator when the variance is large

Future work
• Linear-size index for most likely NN queries in sublinear time
• Index for returning the probability distribution of NNs

THANKS

Page 15: PODS, May 23, 2012

Squared Euclidean distance

Uncertain query

$\hat{q}$: the centroid of $Q$

Preprocessing
• Compute the Voronoi diagram VD

Query
• Given $Q$, compute $\hat{q}$, then query VD with $\hat{q}$

Preprocessing time

Space

Query time

Remarks: Extends to higher dimensions and works for any distribution
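The centroid reduction for an uncertain query can be checked numerically. The sketch below assumes a discrete query distribution (locations with probabilities) and replaces the Voronoi diagram with a linear scan, so it illustrates only the reduction, not the query-time bound. All function names are hypothetical.

```python
# Assumption for illustration: the uncertain query Q is a discrete distribution,
# a list of ((x, y), probability) pairs; the data points in S are exact.

def centroid(Q):
    """Probability-weighted mean location of Q."""
    cx = sum(w * x for (x, _), w in Q)
    cy = sum(w * y for (_, y), w in Q)
    return (cx, cy)

def sq_dist(a, b):
    """Squared Euclidean distance between two points in the plane."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def expected_sq_dist(p, Q):
    """Expected squared distance from the exact point p to the uncertain query Q."""
    return sum(w * sq_dist(p, loc) for loc, w in Q)

def enn_via_centroid(S, Q):
    """Nearest point of S to the centroid of Q (a linear scan stands in for VD)."""
    c = centroid(Q)
    return min(range(len(S)), key=lambda i: sq_dist(S[i], c))

def enn_brute_force(S, Q):
    """Reference answer: minimize the expected squared distance directly."""
    return min(range(len(S)), key=lambda i: expected_sq_dist(S[i], Q))

S = [(0.0, 0.0), (5.0, 1.0), (2.0, 4.0), (6.0, 6.0)]
Q = [((1.0, 1.0), 0.2), ((7.0, 2.0), 0.5), ((4.0, 6.0), 0.3)]

print(enn_via_centroid(S, Q), enn_brute_force(S, Q))  # both indices agree
```

The two answers agree because the expected squared distance from any $p$ to $Q$ equals the squared distance from $p$ to the centroid plus the variance of $Q$, a term that does not depend on $p$.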

Page 16: PODS, May 23, 2012

Rectilinear metric

Uncertain query

Similarly, linear pieces

Preprocessing time

Space

Query time

Page 17: PODS, May 23, 2012

Euclidean metric ($\varepsilon$-ENN)

Uncertain query

Preprocessing time

Space

Query time

Remarks: Extends to higher dimensions

Page 18: PODS, May 23, 2012

$L_1$ metric

Uncertain data (cont.)

A near-linear size index exists despite the size of $\mathrm{EVD}(\mathcal{P})$

linear pieces!

For a location $p_{ij}$ of $P_i$, the $L_1$ distance to $q$ takes one of four forms, depending on the quadrant around $p_{ij}$ that contains $q$:
• $(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$
• $(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$
• $-(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$
• $-(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$

Linear!
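To make the "linear pieces" observation concrete, here is a minimal sketch assuming a discrete uncertain point (locations with probabilities). It computes the coefficients of the linear function that the expected $L_1$ distance equals on the grid cell containing a given query, where the grid is formed by the horizontal and vertical lines through the locations. The helper names are hypothetical.

```python
# Assumption for illustration: an uncertain point P is a discrete distribution,
# a list of ((x, y), probability) pairs.

def l1(p, q):
    """Rectilinear (L1) distance between two points in the plane."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def expected_l1(P, q):
    """Expected L1 distance from the uncertain point P to the query q."""
    return sum(w * l1(p, q) for p, w in P)

def linear_piece(P, q):
    """Coefficients (a, b, c) such that Ed(P, q') = a*x + b*y + c for every
    q' = (x, y) in the same grid cell as q."""
    a = b = c = 0.0
    for (px, py), w in P:
        sx = 1.0 if q[0] >= px else -1.0  # sign of (x_q - x_p) within this cell
        sy = 1.0 if q[1] >= py else -1.0  # sign of (y_q - y_p) within this cell
        a += w * sx
        b += w * sy
        c -= w * (sx * px + sy * py)
    return a, b, c

P = [((0.0, 0.0), 0.5), ((2.0, 1.0), 0.25), ((1.0, 3.0), 0.25)]
q = (1.5, 2.0)
a, b, c = linear_piece(P, q)

# The linear piece reproduces the expected distance at q itself.
print(expected_l1(P, q), a * q[0] + b * q[1] + c)
```

Within a cell the sign of every absolute value is fixed, so the weighted sum of the four-case expressions collapses to a single linear function of the query coordinates.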