Probabilistic Verifiers: Evaluating Constrained Nearest-Neighbor Queries over Uncertain Data


Page 1: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Probabilistic Verifiers: Evaluating Constrained Nearest-Neighbor Queries over Uncertain Data

Reynold Cheng, Hong Kong Polytechnic University, http://www.comp.polyu.edu.hk/~csckcheng
Jinchuan Chen, Hong Kong Polytechnic University
Mohamed Mokbel, Chi-Yin Chow ({mokbel,cchow}@cs.umn.edu), The University of Minnesota-Twin Cities

IEEE ICDE 2008

Page 2: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Probabilistic Verifiers Cheng, Chen, Mokbel, Chow 2

Location and Sensor Applications

[Figure: a Service Provider connected to GPS, RF-ID, and a sensor network.]

Example queries:
• "What is the region that gives max temperature?"
• "Find a cab closest to my current location."

Page 3: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Data Uncertainty

• Measurement error [TDRP98, ISSD99]
• Sampling error [TDRP98, ISSD99]
• Network latency [TKDE04]
• Manually injected by users to protect location privacy [PET06, VLDB06]

Page 4: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Attribute Uncertainty Model [TDRP98, ISSD99, VLDB04b]

[Figure: an uncertainty region and the pdf defined over it.]

We represent an uncertainty pdf as a histogram.
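A minimal sketch of the histogram representation, assuming a one-dimensional attribute and uniform mass inside each bar. The class name `HistogramPdf`, its fields, and the numbers are hypothetical and only illustrate the idea.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class HistogramPdf:
    """Uncertainty pdf over a 1-D attribute, stored as histogram bars (hypothetical helper)."""
    edges: list   # bar boundaries, strictly increasing; len(edges) == len(probs) + 1
    probs: list   # probability mass of each bar; expected to sum to 1

    def cdf(self, x):
        """Pr(value <= x), assuming the mass is spread uniformly inside each bar."""
        if x <= self.edges[0]:
            return 0.0
        if x >= self.edges[-1]:
            return 1.0
        i = bisect_right(self.edges, x) - 1          # index of the bar containing x
        frac = (x - self.edges[i]) / (self.edges[i + 1] - self.edges[i])
        return sum(self.probs[:i]) + frac * self.probs[i]


# Illustrative values only: three bars over the uncertainty region [0, 4].
pdf = HistogramPdf(edges=[0.0, 1.0, 2.0, 4.0], probs=[0.2, 0.5, 0.3])
print(pdf.cdf(3.0))   # mass of the first two bars plus half of the third
```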

Page 5: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Probabilistic Nearest Neighbor Query (PNN) [TKDE04]

INPUT
1. A query point q
2. A set of n objects X1, X2, …, Xn with uncertainty regions and pdfs

OUTPUT
A set of (Xi, pi) tuples, where pi is the non-zero probability (qualification probability) that Xi is the nearest neighbor of q

Page 6: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Basic Solution [TKDE04]

[Figure: query point q and uncertain objects X1 to X6, with the distances n1 and f marked.]

$$p_1 = \int_{n_1}^{f} d_1(r) \prod_{k=2,3,4} \bigl(1 - D_k(r)\bigr)\, dr$$

• di(r): distance pdf of Xi from q
• Di(r): distance cdf of Xi from q
• ni: smallest distance of Xi from q
• f: shortest max distance of all objects from q
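The integral can be evaluated numerically. The sketch below uses the midpoint rule and assumes, purely for illustration, that every distance pdf is uniform on [ni, fi]; the function names and the sample intervals are hypothetical, and real distance pdfs derived from two-dimensional regions are generally not uniform.

```python
def uniform_pdf(r, near, far):
    """Distance pdf d(r) of an object whose distance is uniform on [near, far]."""
    return 1.0 / (far - near) if near <= r <= far else 0.0


def uniform_cdf(r, near, far):
    """Distance cdf D(r) for the same uniform assumption."""
    if r <= near:
        return 0.0
    if r >= far:
        return 1.0
    return (r - near) / (far - near)


def qualification_probability(i, intervals, steps=20000):
    """Midpoint-rule estimate of p_i = integral over [n_i, f] of d_i(r) * prod_{k != i} (1 - D_k(r))."""
    f = min(far for _, far in intervals)      # shortest max distance among all objects
    near_i, far_i = intervals[i]
    if near_i >= f:
        return 0.0                            # X_i can never be the nearest neighbor
    h = (f - near_i) / steps
    total = 0.0
    for step in range(steps):
        r = near_i + (step + 0.5) * h
        others_farther = 1.0
        for k, (near, far) in enumerate(intervals):
            if k != i:
                others_farther *= 1.0 - uniform_cdf(r, near, far)
        total += uniform_pdf(r, near_i, far_i) * others_farther * h
    return total


# Illustrative (near, far) distance intervals for two objects.
intervals = [(0.0, 2.0), (1.0, 3.0)]
probs = [qualification_probability(i, intervals) for i in range(len(intervals))]
print(probs)
```

The cost grows with the number of integration steps and candidates, which is the expense the verifiers later in the deck try to avoid.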

Page 7: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

2 Assumptions

• A user only needs answers with confidence higher than some threshold
• Approximation of qualification probabilities is allowed

Page 8: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Constrained Probabilistic Nearest-Neighbor Query (C-PNN)

Denote
• pi.l: lower bound of pi
• pi.u: upper bound of pi
• P: probability threshold
• ∆: tolerance

Given (P, ∆), return a set {Xi} such that pi.u ≥ P, and either pi.l ≥ P or pi.u - pi.l ≤ ∆
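The C-PNN condition can be read as a three-way test on a probability bound. This is a minimal sketch; the function name `classify`, the label strings, and the sample bounds are hypothetical.

```python
def classify(lower, upper, P, delta):
    """Classify an object from its probability bound [lower, upper] under C-PNN (P, delta)."""
    if upper < P:
        return "fail"                         # p_i cannot reach the threshold
    if lower >= P or upper - lower <= delta:
        return "satisfy"                      # certainly above P, or bound is tight enough
    return "unknown"                          # bound must be tightened further


# Illustrative bounds, checked against P = 0.8 and delta = 0.15.
examples = [(0.85, 0.96), (0.75, 0.85), (0.10, 0.78), (0.20, 0.85)]
labels = [classify(lo, hi, P=0.8, delta=0.15) for lo, hi in examples]
print(labels)
```

Only objects labeled "unknown" need any further probability computation.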

Page 9: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Illustrating C-PNN (with P=0.8, ∆=0.15)

[Figure: four example probability bounds [pi.l, pi.u], panels (a) to (d), drawn against the threshold P=0.8; one case is marked "To be refined".]

Page 10: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Intuition

If [pi.l, pi.u] is known, whether Xi satisfies C-PNN can be computed without knowing pi.

[Figure: query point q and the distance histograms of R1, R2, and R3, labeled with their bar probabilities.]

• p1.l ≥ 0.3
• p3.u ≤ 1 - 0.3

Compute [pi.l, pi.u] for any distance pdf

Page 11: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Solution Framework

1. Filtering
2. Verification
3. Refinement

[Figure: the objects around q at each of the three stages.]

Page 12: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Probabilistic Verifiers

[Figure: pipeline with the blocks Candidate set (from filtering), Initialization, Sorted candidate set, the verifiers RS, L-SR, and U-SR, Classifier, Incremental Refinement, and User.]

• Verifiers are applied in ascending order of computational complexity
• Classifier: test if Xi satisfies, or fails, the query

Page 13: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Example: P=0.5, Δ=0.15

[Figure: probability bounds of the candidates A, B, and C (after filtering) as they pass through the Verifier, Classifier, and Incremental Refinement stages.]

Classifier

Page 14: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Partitioning uncertainty pdfs into subregions

[Figure: distance histograms of R1, R2, and R3, cut at end-points e1 to e6 into subregions S1 to S5.]

Page 15: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

[Figure: query point q with the distance histograms of R1, R2, and R3; the end-points e1 to e6 delimit subregions S1 to S5, and f is marked.]

Page 16: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Subregion Data Structure

Each cell holds the pair sij, Di(ej). For example, the S5 cell of R3 holds s35, D3(e5).

      S1        S2        S3        S4        S5
R1    0.3,0     0.2,0.3   0.1,0.5   0.2,0.6   0.2,0.8
R2              0.3,0     0.3,0.3   0.4,0.6
R3                        0.3,0     0.4,0.3   0.3,0.7
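One way to build such a table is sketched below. It assumes each distance pdf is a histogram with uniform mass inside each bar, and that the end-points are all bar boundaries below f, then f itself, then the largest far point. The function names, the unit-width distances, and this end-point rule are assumptions made for illustration; the bar probabilities are chosen to match the pairs shown on this slide.

```python
def cdf(edges, probs, r):
    """Distance cdf D(r) of a histogram with uniform mass inside each bar."""
    if r <= edges[0]:
        return 0.0
    if r >= edges[-1]:
        return 1.0
    total = 0.0
    for lo, hi, p in zip(edges, edges[1:], probs):
        if r >= hi:
            total += p                          # bar lies entirely below r
        else:
            total += p * (r - lo) / (hi - lo)   # r falls inside this bar
            break
    return total


def build_subregions(objects):
    """Return (end_points, table); table[name][j] is the pair (s_ij, D_i(e_j))."""
    f = min(edges[-1] for edges, _ in objects.values())      # shortest far point
    far_max = max(edges[-1] for edges, _ in objects.values())
    points = {p for edges, _ in objects.values() for p in edges if p < f}
    ends = sorted(points) + [f] + ([far_max] if far_max > f else [])
    table = {}
    for name, (edges, probs) in objects.items():
        row = []
        for e_lo, e_hi in zip(ends, ends[1:]):
            d_lo = cdf(edges, probs, e_lo)
            row.append((cdf(edges, probs, e_hi) - d_lo, d_lo))
        table[name] = row
    return ends, table


# Hypothetical distances (unit-width bars); probabilities follow the slide's example.
objects = {
    "R1": ([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [0.3, 0.2, 0.1, 0.2, 0.2]),
    "R2": ([1.0, 2.0, 3.0, 4.0], [0.3, 0.3, 0.4]),
    "R3": ([2.0, 3.0, 4.0, 5.0], [0.3, 0.4, 0.3]),
}
ends, table = build_subregions(objects)
for name, row in table.items():
    print(name, [(round(s, 2), round(d, 2)) for s, d in row])
```

Storing Di(ej) next to sij means the verifiers never have to touch the original pdfs again.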

Page 17: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Rightmost-Subregion (RS) Verifier

[Figure: query point q and the distance histograms of R1, R2, and R3, labeled with their bar probabilities.]

X3 has no chance to be the nearest neighbor when R3 > f2.

• p3 ≤ 1 - 0.3 = 0.7
• p1 ≤ 1 - 0.2 = 0.8
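A minimal sketch of the RS bound: the mass an object has in the rightmost subregion (beyond f) can never make it the nearest neighbor, so it is subtracted from 1. The table values are read from the subregion data structure in this deck, with empty cells filled in as zero mass; the names `TABLE` and `rs_upper_bounds` are hypothetical.

```python
# Each row lists (s_ij, D_i(e_j)) for subregions S1..S5; S5 is the rightmost subregion.
TABLE = {
    "X1": [(0.3, 0.0), (0.2, 0.3), (0.1, 0.5), (0.2, 0.6), (0.2, 0.8)],
    "X2": [(0.0, 0.0), (0.3, 0.0), (0.3, 0.3), (0.4, 0.6), (0.0, 1.0)],
    "X3": [(0.0, 0.0), (0.0, 0.0), (0.3, 0.0), (0.4, 0.3), (0.3, 0.7)],
}


def rs_upper_bounds(table):
    """Upper bound p_i.u = 1 - s_iM, where S_M is the rightmost subregion."""
    return {name: 1.0 - row[-1][0] for name, row in table.items()}


bounds = rs_upper_bounds(TABLE)
print(bounds)
```

Each bound needs a single table lookup per object, which matches the O(|C|) cost listed for RS.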

Page 18: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

RS Verifier

      S1        S2        S3        S4        S5
R1    0.3,0     0.2,0.3   0.1,0.5   0.2,0.6   0.2,0.8
R2              0.3,0     0.3,0.3   0.4,0.6
R3                        0.3,0     0.4,0.3   0.3,0.7

• p3 ≤ 0.7
• p1 ≤ 0.8

Page 19: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

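L-SR and U-SR Verifiers

$$q_{ij}.l = \begin{cases} \dfrac{1}{c_j} \displaystyle\prod_{U_k \in S_j \wedge k \neq i} \bigl(1 - D_k(e_j)\bigr) & \text{if } c_j > 1 \\ 1 & \text{otherwise} \end{cases}$$

$$q_{ij}.u = \frac{1}{2} \Bigl( \prod_{U_k \in S_j \wedge k \neq i} \bigl(1 - D_k(e_j)\bigr) + \prod_{U_k \in S_j \wedge k \neq i} \bigl(1 - D_k(e_{j+1})\bigr) \Bigr)$$

• cj: no. of objects in subregion Sj
• qij: qualification prob. of Xi in subregion Sj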
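A sketch of how the two bounds might be evaluated and summed into [pi.l, pi.u], under this reading of the formulas: the products run over the other objects with non-zero mass in Sj, Dk(ej+1) is taken as Dk(ej) + skj, and the rightmost subregion contributes nothing. The table values are read from the subregion data structure in this deck; the names `TABLE` and `sr_bounds` are hypothetical.

```python
from math import prod

# Each row lists (s_ij, D_i(e_j)) for subregions S1..S5; S5 is the rightmost subregion.
TABLE = {
    "X1": [(0.3, 0.0), (0.2, 0.3), (0.1, 0.5), (0.2, 0.6), (0.2, 0.8)],
    "X2": [(0.0, 0.0), (0.3, 0.0), (0.3, 0.3), (0.4, 0.6), (0.0, 1.0)],
    "X3": [(0.0, 0.0), (0.0, 0.0), (0.3, 0.0), (0.4, 0.3), (0.3, 0.7)],
}


def sr_bounds(table, name):
    """Return (p_i.l, p_i.u) as the s_ij-weighted sums of q_ij.l (L-SR) and q_ij.u (U-SR)."""
    row = table[name]
    lower = upper = 0.0
    for j in range(len(row) - 1):               # skip the rightmost subregion
        s_ij = row[j][0]
        if s_ij == 0.0:
            continue                            # X_i has no mass in S_j
        others = [o for o in table if o != name and table[o][j][0] > 0.0]
        c_j = len(others) + 1                   # objects in S_j, including X_i
        at_start = prod(1.0 - table[o][j][1] for o in others)                    # uses D_k(e_j)
        at_end = prod(1.0 - (table[o][j][1] + table[o][j][0]) for o in others)   # uses D_k(e_{j+1})
        lower += s_ij * at_start / c_j
        upper += s_ij * 0.5 * (at_start + at_end)
    return lower, upper


for name in TABLE:
    print(name, sr_bounds(TABLE, name))
```

When Xi is alone in Sj the product is empty and both bounds equal 1, which agrees with the "otherwise" case of the lower-bound formula.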

Page 20: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

L-SR and U-SR Verifiers

[Figure: distance histograms of R1, R2, and R3 with near points n1, n2, n3, far points f1, f2, f3, the subregion probabilities s22, s23, s24, and subregion S3 between e3 and e4.]

• q13 = 1 if both R2 and R3 are larger than e4
• q13 = 0 if either R2 or R3 is smaller than e3
• q13 = 1/3 if both R2 and R3 are inside S3

Page 21: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Complexity of Verifiers

Algorithm   Qualification Prob. Bound   Cost
RS          Upper                       O(|C|)
L-SR        Lower                       O(|C|M)
U-SR        Upper                       O(|C|M)

|C| = no. of candidates with non-zero prob.
M = no. of subregions

Page 22: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Incremental Refinement

[Figure: query point q and the distance histograms of R1, R2, and R3, labeled with their bar probabilities.]

Each step replaces the bounds of one subregion with its exact qualification probability:

1. [p2.l, p2.u] = [q21.l, q21.u] * 0.3 + [q22.l, q22.u] * 0.3 + [q23.l, q23.u] * 0.4
2. [p2.l, p2.u] = q21 * 0.3 + [q22.l, q22.u] * 0.3 + [q23.l, q23.u] * 0.4
3. [p2.l, p2.u] = q21 * 0.3 + q22 * 0.3 + [q23.l, q23.u] * 0.4
4. p2 = q21 * 0.3 + q22 * 0.3 + q23 * 0.4
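The refinement loop can be sketched as follows: recompute the weighted bound, classify, and stop as soon as the object satisfies or fails the query. The weights 0.3, 0.3, 0.4 come from the slide; the per-subregion bounds, the exact values returned by `exact_q`, and all function names are hypothetical stand-ins for the real integration.

```python
def classify(lower, upper, P, delta):
    """C-PNN test on a probability bound."""
    if upper < P:
        return "fail"
    if lower >= P or upper - lower <= delta:
        return "satisfy"
    return "unknown"


def refine(s, q_bounds, exact_q, P, delta):
    """Replace subregion bounds with exact values one at a time until the object is classified."""
    bounds = list(q_bounds)
    used = 0                                        # number of exact evaluations so far
    while True:
        lower = sum(s_j * lo for s_j, (lo, _) in zip(s, bounds))
        upper = sum(s_j * hi for s_j, (_, hi) in zip(s, bounds))
        label = classify(lower, upper, P, delta)
        if label != "unknown" or used == len(bounds):
            return label, used, (lower, upper)
        q = exact_q(used)                           # expensive exact computation for one subregion
        bounds[used] = (q, q)
        used += 1


s = [0.3, 0.3, 0.4]                                  # subregion probabilities from the slide
q_bounds = [(0.35, 0.60), (0.15, 0.39), (0.09, 0.17)]   # illustrative verifier output
exact_values = [0.45, 0.25, 0.12]                    # illustrative exact probabilities
result = refine(s, q_bounds, lambda j: exact_values[j], P=0.3, delta=0.01)
print(result)
```

In this illustrative run the object is rejected after two exact evaluations, so the third subregion is never integrated.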

Page 23: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Experiment Setup

Uncertain Object DB        Long Beach (53k) (http://www.census.gov/geo/www/tiger/)
Uncertainty pdf            Uniform (default); Gaussian (μ: center, σ: 1/6 of range)
Size of R-Tree/PTI Node    4 Kbytes
Threshold (P)              0.3
Delta (∆)                  0.01

Page 24: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

1. Effect of Filtering

[Figure: fraction of time cost (y-axis, 0.1 to 0.9) versus total set size (x-axis, 1000 to 10000); series: Filtering, Basic.]

Page 25: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

2. Effect of Verification

[Figure: time in ms (y-axis, 0 to 120) versus threshold (x-axis, 0.1 to 1); series: Basic, Refine, VR; annotations: "5 times" and "40 times".]

Page 26: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

2. Analysis of VR

[Figure: time in ms (y-axis, 0 to 90) versus threshold (x-axis, 0 to 1); series: Filtering, Verification, Refinement.]

Page 27: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

3. Effect of Threshold

[Figure: fraction of 'Unknown' tuples (y-axis, 0 to 0.8) versus threshold (x-axis, 0.1 to 0.35); series: RS, L-SR, U-SR.]

Page 28: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

4. Effect of Tolerance

[Figure: fraction of completed queries (y-axis, 0.5 to 0.9) versus tolerance (x-axis, 0 to 0.2).]

Page 29: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

5. Gaussian pdf

[Figure: time in ms (y-axis, log scale from 1.0e-1 to 1.0e5) versus threshold (x-axis, 0.1 to 1); series: Basic, Refine, VR.]

Page 30: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Related Works

• PNNQ
  - R-tree based [TKDE04]
  - Monte-Carlo based [DASFAA07]
  - Line-approximation of uncertainty pdf [ICDE07b]
• Range Queries [DPD99, ISSD99, VLDB04a, VLDB05, ICDE07a]
• Top-k Queries [ICDE07c, ICDE08b, ICDE08c]
• Skylines [VLDB07] and reverse skylines [SIGMOD08]
• Identification in uncertain biometric database [ICDE06]

Page 31: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Other Uncertainty Models

• Probabilistic Database: each tuple is augmented with a probability value (tuple uncertainty)
  - Dalvi & Suciu [VLDB04b, ICDE07d] studied efficient query operator evaluation with ranked results.
  - [VLDB06, ICDE08b] combined the attribute and tuple uncertainty models.
• A large branch of work deals with fuzzy modeling [IGP06].

Page 32: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

References

[TKDE04] R. Cheng, D. V. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE, 16(9), Sept. 2004.
[SIGMOD03] R. Cheng, D. Kalashnikov, and S. Prabhakar. Evaluating probabilistic queries over imprecise data. In Proc. ACM SIGMOD, 2003.
[DASFAA07] H. Kriegel, P. Kunath, and M. Renz. Probabilistic nearest-neighbor query on uncertain objects. In DASFAA, 2007.
[ICDE06] C. Bohm, A. Pryakhin, and M. Schubert. The gauss-tree: Efficient object identification in databases of probabilistic feature vectors. In Proc. ICDE, 2006.
[ICDE07a] J. Chen and R. Cheng. Efficient evaluation of imprecise location-dependent queries. In Proc. ICDE, 2007.
[IGP06] J. Galindo, A. Urrutia, and M. Piattini. Fuzzy Databases: Modeling, Design, and Implementation. Idea Group Publishing, 2006.
[ICDE08b] M. Hua, J. Pei, X. Lin, and W. Zhang. Efficiently answering probabilistic threshold top-k queries on uncertain data. In Proc. ICDE, 2008.
[SIGMOD08] X. Lian and L. Chen. Monochromatic and bichromatic reverse skyline search over uncertain databases. In Proc. SIGMOD, 2008.
[ICDE08c] K. Yi, F. Li, D. Srivastava, and G. Kollios. Efficient processing of top-k queries in uncertain databases. In Proc. ICDE, 2008.

Page 33: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

References

[VLDB05] Y. Tao, R. Cheng, X. Xiao, W. K. Ngai, B. Kao, and S. Prabhakar. Indexing multi-dimensional uncertain data with arbitrary probability density functions. In Proc. VLDB, 2005.
[VLDB04b] N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB, 2004.
[ICDE07d] C. Re, N. Dalvi, and D. Suciu. Efficient top-k query evaluation on probabilistic data. In ICDE, 2007.
[VLDB04c] A. Deshpande, C. Guestrin, S. Madden, J. Hellerstein, and W. Hong. Model-driven data acquisition in sensor networks. In VLDB, 2004.
[VLDB06] O. Benjelloun, A. Das Sarma, A. Halevy, and J. Widom. ULDBs: databases with uncertainty and lineage. In VLDB, 2006.
[ICDE07b] V. Ljosa and A. K. Singh. APLA: Indexing arbitrary probability distributions. In Proc. ICDE, 2007.
[ADI00] Y. Manolopoulos, Y. Theodoridis, and V. J. Tsotras. Chapter 4: Access methods for intervals. In Advanced Database Indexing, Kluwer, 2000.
[VLDB07] J. Pei, B. Jiang, X. Lin, and Y. Yuan. Probabilistic skylines on uncertain data. In Proc. VLDB, 2007.
[DPD99] O. Wolfson, P. Sistla, S. Chamberlain, and Y. Yesha. Updating and querying databases that track mobile units. Distributed and Parallel Databases, 7(3), 1999.
[ISSD99] D. Pfoser and C. S. Jensen. Capturing the uncertainty of moving-object representations. In Proc. of the Sixth International Symposium on Spatial Databases, Hong Kong, July 20-23, 1999, pp. 111-132.
[ICDE08a] Singh et al. Database support for pdf attributes. In Proc. ICDE, 2008.
[ICDE07c] M. Soliman, I. Ilyas, and K. Chang. Top-k query processing in uncertain databases. In ICDE, 2007.

Page 34: Probabilistic Verifiers:  Evaluating Constrained Nearest-Neighbor Queries  over Uncertain Data

Conclusions

• To avoid expensive evaluation of PNNQ, we propose the notion of constrained PNNQ (P, ∆).
• We present a framework which gradually refines the bounds of qualification probabilities.
  - RS, L-SR, and U-SR verifiers
  - Incremental Refinement
• The method deals with arbitrary uncertainty pdfs.