Approximate String Search in Spatial Databases

Bin Yao, Florida State University
Kun Hou, Florida State University
Marios Hadjieleftheriou, AT&T Labs Research
Feifei Li, Florida State University

Introduction

Approximate string search is important: data cleaning, data integration, online search engines.

Spatial databases are important as well: map services, strategic planning of resources.

Our work: approximate string search in spatial databases, i.e., Spatial Approximate String (SAS) queries.

Example of a SAS range query

String similarity metric: edit distance with threshold \(\tau\).

A straightforward solution

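R-tree solution: run the range query on an R-tree, then verify the string of each retrieved point against the query string.

Problem: only the spatial dimension is used for pruning.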
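As a rough illustration of this baseline, the sketch below filters points by a rectangular range first and only then checks the strings. A linear scan stands in for the R-tree, and `edit_distance`, `naive_sas_range`, and the sample points are hypothetical names and data, not taken from the slides. A point is assumed to qualify when its edit distance to the query string is at most the threshold.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def naive_sas_range(points, rect, query, tau):
    """Spatial filter first, string verification second.

    points: iterable of (x, y, string); rect: (xmin, ymin, xmax, ymax).
    A linear scan stands in for the R-tree range query (assumption).
    """
    xmin, ymin, xmax, ymax = rect
    in_range = [p for p in points
                if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]
    # Every point in the range is verified, however dissimilar its string.
    return [p for p in in_range if edit_distance(p[2], query) <= tau]


# Hypothetical sample data.
points = [(1, 1, "theatre"), (2, 2, "theater"), (3, 3, "museum"), (9, 9, "theatre")]
result = naive_sas_range(points, (0, 0, 5, 5), "theatre", 2)
```

The string of every point inside the range is compared with the query, which is the cost the slide's "Problem" line points at.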

Background: q-grams based edit distance pruning

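Lemma: For strings \(\sigma_1\) and \(\sigma_2\) of lengths \(|\sigma_1|\) and \(|\sigma_2|\), if \(\varepsilon(\sigma_1, \sigma_2) = \tau\), then

\[ |G_{\sigma_1} \cap G_{\sigma_2}| \ge \max(|\sigma_1|, |\sigma_2|) - 1 - (\tau - 1) \cdot q. \]

Example: \(q = 2\), \(\tau = 2\), \(\sigma_1\) = theatre, \(\sigma_2\) = theater.

\(G_{\sigma_1}\) = {#t, th, he, ea, at, tr, re, e$}
\(G_{\sigma_2}\) = {#t, th, he, ea, at, te, er, r$}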
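The example can be checked mechanically. The sketch below is a minimal illustration, assuming strings are padded with q-1 `#` characters in front and q-1 `$` characters at the end (which matches the q-gram sets listed on the slide). The function names are hypothetical.

```python
def qgrams(s: str, q: int = 2) -> set:
    """q-grams of s after padding with q-1 '#' in front and q-1 '$' at the end."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def min_common_qgrams(s1: str, s2: str, tau: int, q: int = 2) -> int:
    """Lower bound from the lemma on the number of shared q-grams."""
    return max(len(s1), len(s2)) - 1 - (tau - 1) * q


g1 = qgrams("theatre")
g2 = qgrams("theater")
shared = g1 & g2                      # q-grams the two strings have in common
bound = min_common_qgrams("theatre", "theater", tau=2)
passes_filter = len(shared) >= bound  # a pair failing this can be pruned
```

Here the two strings share five q-grams and the bound is four, so the pair survives the filter and still has to be verified with the edit distance itself.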

Background: estimating set resemblance

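Set resemblance of two sets \(A\) and \(B\):

\[ \rho(A, B) = \frac{|A \cap B|}{|A \cup B|}. \]

The min-wise signature: for any set \(X\), any \(x \in X\), and \(\pi \in F\) (a family of min-wise independent permutations),

\[ \Pr(\min\{\pi(X)\} = \pi(x)) = \frac{1}{|X|}. \]

With \(\ell\) permutations \(\pi_1, \ldots, \pi_\ell\):

\[ s(A) = \{\min\{\pi_1(A)\}, \ldots, \min\{\pi_\ell(A)\}\}, \]
\[ s(A \cup B) = \{\min\{\pi_1(A), \pi_1(B)\}, \ldots, \min\{\pi_\ell(A), \pi_\ell(B)\}\}. \]

Unbiased estimator for \(\rho(A, B)\):

\[ \rho(A, B) = \Pr(\min\{\pi(A)\} = \min\{\pi(B)\}), \]

which is estimated from the signatures as

\[ \hat{\rho}(A, B) = \frac{|\{i \mid \min\{\pi_i(A)\} = \min\{\pi_i(B)\}\}|}{\ell}. \]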

Background: estimating set resemblance

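Set A = {1, 2, 4}

hashes   1  2  4    signature of A
h1       1  3  4    1
h2       3  1  2    1
h3       2  4  3    2
h4       4  2  1    1

Set B = {2, 3}

hashes   2  3    signature of B
h1       3  2    2
h2       1  4    1
h3       4  1    1
h4       2  3    2

Estimated resemblance: \(\hat{\rho}(A, B) = 1/4\) (the two signatures agree in one of the four positions).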
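The worked example can be reproduced with a few lines of code. This is a minimal sketch that hard-codes the four hash functions from the tables above instead of drawing random permutations. `HASHES`, `signature`, and `estimate_resemblance` are hypothetical names.

```python
# Hash values copied from the slide's tables: HASHES[k][x] is h_{k+1}(x).
HASHES = [
    {1: 1, 2: 3, 3: 2, 4: 4},  # h1
    {1: 3, 2: 1, 3: 4, 4: 2},  # h2
    {1: 2, 2: 4, 3: 1, 4: 3},  # h3
    {1: 4, 2: 2, 3: 3, 4: 1},  # h4
]


def signature(items):
    """One minimum hash value per hash function."""
    return [min(h[x] for x in items) for h in HASHES]


def estimate_resemblance(sig_a, sig_b):
    """Fraction of signature positions on which the two sets agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


A, B = {1, 2, 4}, {2, 3}
sig_a, sig_b = signature(A), signature(B)
estimate = estimate_resemblance(sig_a, sig_b)
exact = len(A & B) / len(A | B)  # true resemblance, for comparison
```

In this small example the estimate happens to coincide with the exact resemblance; with only four hash functions that is not guaranteed in general.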

The MHR-tree: basic idea

Lemma 2: Let \(G_u\) be the union of the q-grams of the strings in the subtree of node \(u\). For a SAS query \((r, \sigma, \tau)\), if \(|G_u \cap G_\sigma| < |\sigma| - 1 - (\tau - 1) \cdot q\), then the subtree of node \(u\) does not contain any element of \(A_s\).

The MHR-tree: union of q-grams (q = 2)

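[Figure: a three-level tree; each node stores an MBR and the union of the q-grams below it.]

Leaf level (q-grams of the stored strings): ab cd ab ef | gh ij kl nr
Parent nodes (union of the children's q-grams): {ab, cd, ef} | {gh, ij, kl, nr}
Root (union of the parents' q-grams): {ab, cd, ef, gh, ij, kl, nr}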

The MHR-tree

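Min-wise signature with linear hashing R-tree (MHR-tree)

[Figure: a three-level tree; each node stores an MBR and a min-wise signature.]

Leaf level signatures: (2, 1, 3) (4, 1, 2) (5, 2, 1) (7, 2, 5) (6, 2, 1) ... (1, 6, 3) (2, 7, 4) (8, 3, 2)
Parent signatures: (2, 1, 1) (1, 2, 2) ...
Root signature: (1, 1, 1)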
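A node's signature can be derived from its children without revisiting the strings, because the signature of a union is the position-wise minimum of the individual signatures. The sketch below illustrates this on numbers from the figure. That the first three leaf entries sit under the first parent is an assumption read off the slide's ordering, and `merge_signatures` is a hypothetical name.

```python
def merge_signatures(children):
    """Position-wise minimum of the child signatures."""
    return tuple(min(values) for values in zip(*children))


# Assumed grouping: the first three leaf entries share one parent.
leaf_entries = [(2, 1, 3), (4, 1, 2), (5, 2, 1)]
parent = merge_signatures(leaf_entries)

# The two parent signatures shown on the slide combine into the root.
root = merge_signatures([(2, 1, 1), (1, 2, 2)])
```

This keeps every node's signature at a fixed length regardless of how many strings its subtree holds.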

Query algorithms for the MHR-tree

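Range-MHR(MHR-tree R, Range \(r\), String \(\sigma\), int \(\tau\))

Follow the range query algorithm on the R-tree. For each visited node \(u\):

  If \(u\) is a leaf node
    For every point \(p \in u_p\)
      If \(p\) is contained in \(r\) and \(|G_p \cap G_\sigma| \ge \max(|\sigma_p|, |\sigma|) - 1 - (\tau - 1) \cdot q\) and \(\varepsilon(\sigma_p, \sigma) < \tau\), then insert \(p\) in \(A\);
  Else
    For every child node \(w_i\) of \(u\)
      If \(r\) and MBR(\(w_i\)) intersect, and \(|G_{w_i} \cap G_\sigma| \ge |\sigma| - 1 - (\tau - 1) \cdot q\), insert \(w_i\) into the queue;

Return \(A\)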
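The traversal can be sketched in Python as follows. This is an illustration under several assumptions, not the authors' implementation: nodes store the exact union of q-grams instead of a min-wise signature, sets drop repeated q-grams (so the strings are assumed to contain none), the tree is built by hand, and the final check accepts an edit distance of at most the threshold even though the pseudocode writes a strict inequality. All class and function names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


def qgrams(s, q=2):
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


@dataclass
class Node:
    mbr: tuple                                    # (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)
    points: list = field(default_factory=list)    # (x, y, string), leaves only
    grams: set = field(default_factory=set)       # exact union of q-grams below


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_mhr(root, r, sigma, tau, q=2):
    g_sigma = qgrams(sigma, q)
    answers, queue = [], deque([root])
    while queue:
        u = queue.popleft()
        if not u.children:  # leaf node
            for x, y, s in u.points:
                inside = r[0] <= x <= r[2] and r[1] <= y <= r[3]
                bound = max(len(s), len(sigma)) - 1 - (tau - 1) * q
                if (inside and len(qgrams(s, q) & g_sigma) >= bound
                        and edit_distance(s, sigma) <= tau):
                    answers.append((x, y, s))
        else:
            bound = len(sigma) - 1 - (tau - 1) * q
            for w in u.children:
                # Prune on space and on shared q-grams before descending.
                if intersects(r, w.mbr) and len(w.grams & g_sigma) >= bound:
                    queue.append(w)
    return answers


def leaf(points, q=2):
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    grams = set().union(*(qgrams(p[2], q) for p in points))
    return Node(mbr=(min(xs), min(ys), max(xs), max(ys)), points=points, grams=grams)


# Hypothetical two-leaf tree.
left = leaf([(1, 1, "theatre"), (2, 2, "theater")])
right = leaf([(3, 3, "museum"), (4, 4, "gallery")])
root = Node(mbr=(1, 1, 4, 4), children=[left, right], grams=left.grams | right.grams)
found = range_mhr(root, (0, 0, 5, 5), "theatre", tau=2)
```

In this toy tree the right leaf overlaps the query range but shares no q-gram with the query, so it is skipped without its strings being examined.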

Query algorithms for the MHR-tree

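Key issue: estimating \(|G_u \cap G_\sigma|\) using \(s(G_u)\) and \(\sigma\).

Example: \(\sigma\) = crab, \(G_\sigma\) = {#c, cr, ra, ab, b$}, \(s(G_\sigma)\) = (3, 1, 4, 5), \(s(G_u)\) = (3, 1, 3, 5).

Let \(G = G_u \cup G_\sigma\); then \(s(G) = s(G_u \cup G_\sigma)\) = (3, 1, 3, 5), and

\[ \hat{\rho}(G, G_\sigma) = \frac{|\{i \mid \min\{h_i(G)\} = \min\{h_i(G_\sigma)\}\}|}{\ell} = \frac{3}{4}. \]

Since \(G_\sigma \subseteq G\),

\[ \rho(G, G_\sigma) = \frac{|G \cap G_\sigma|}{|G \cup G_\sigma|} = \frac{|(G_u \cup G_\sigma) \cap G_\sigma|}{|(G_u \cup G_\sigma) \cup G_\sigma|} = \frac{|G_\sigma|}{|G_u \cup G_\sigma|}, \]

so \(|G_u \cup G_\sigma|\) is estimated as

\[ |G_u \cup G_\sigma| = \frac{|G_\sigma|}{\hat{\rho}(G, G_\sigma)} = \frac{5}{3/4} = \frac{20}{3}. \]

Similarly,

\[ \hat{\rho}(G_u, G_\sigma) = \frac{|\{i \mid \min\{h_i(G_u)\} = \min\{h_i(G_\sigma)\}\}|}{\ell} = \frac{3}{4}, \]

and therefore

\[ |G_u \cap G_\sigma| = \hat{\rho}(G_u, G_\sigma) \cdot |G_u \cup G_\sigma| = \frac{3}{4} \cdot \frac{20}{3} = 5. \]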
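The same chain of steps can be written as a short function. This is a minimal sketch that uses exact fractions so the arithmetic matches the slide. `agreement` and `estimate_intersection` are hypothetical names, and the zero guard is an addition of this sketch.

```python
from fractions import Fraction


def agreement(sig_x, sig_y):
    """Fraction of positions where two signatures hold the same value."""
    same = sum(1 for x, y in zip(sig_x, sig_y) if x == y)
    return Fraction(same, len(sig_x))


def estimate_intersection(sig_u, sig_sigma, size_sigma):
    """Estimate |G_u ∩ G_sigma| from s(G_u), s(G_sigma) and |G_sigma|."""
    # s(G_u ∪ G_sigma) is the position-wise minimum of the two signatures.
    sig_union = [min(a, b) for a, b in zip(sig_u, sig_sigma)]
    # Estimates |G_sigma| / |G_u ∪ G_sigma|.
    rho_union = agreement(sig_union, sig_sigma)
    if rho_union == 0:
        return Fraction(0)  # guard chosen for this sketch, not from the slides
    union_size = size_sigma / rho_union
    return agreement(sig_u, sig_sigma) * union_size


estimate = estimate_intersection([3, 1, 3, 5], [3, 1, 4, 5], size_sigma=5)
```

Only the node's stored signature and the query string are needed; the q-gram set of the subtree is never materialized.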

Duplicate q-grams in strings

Issue 1: duplicate q-grams in one string (for both query and data).

Example:
s: aabbaabb, {1#a, 1aa, 1ab, 1bb, 1ba, 2aa, 2ab, 2bb, 1b$}
q: aabcaa, {1#a, 1aa, 1ab, 1bc, 1ca, 2aa, 1a$}

Issue 2: duplicate q-grams between strings.

Do not distinguish q-grams from different nodes.
node 1: pizz (#p, pi, iz, zz, z$)
node 2: zza (#z, zz, za, a$)
parent node 3: union of signatures corresponding to (#p, pi, iz, zz, z$, #z, za, a$)
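One way to read the example for Issue 1 is that each q-gram is tagged with its occurrence number within the string, so repeated q-grams remain distinct set elements. The sketch below reproduces the two lists under that reading. `numbered_qgrams` is a hypothetical name.

```python
from collections import Counter


def numbered_qgrams(s, q=2):
    """q-grams of the padded string, each prefixed by its occurrence number."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    seen = Counter()
    result = []
    for i in range(len(padded) - q + 1):
        gram = padded[i:i + q]
        seen[gram] += 1                      # 1 for the first copy, 2 for the second
        result.append(f"{seen[gram]}{gram}")
    return result


data_grams = numbered_qgrams("aabbaabb")
query_grams = numbered_qgrams("aabcaa")
common = set(data_grams) & set(query_grams)
```

With the numbering in place, set intersection counts a repeated q-gram once per matching occurrence: here both strings contribute `1aa` and `2aa`.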

Selectivity estimation for SAS range queries

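Combine the range query selectivity estimator with the string selectivity estimator (VSol [Mazeika et al. 2007], based on min-wise signatures of inverted lists of q-grams).

[Figure: buckets \(b_1\), \(b_2\), \(b_3\) with their estimators VSol\(_1\), VSol\(_2\), VSol\(_3\); the query range \(r\); the labels \(\Theta(b_2)\) and \(\Theta(b_2, r)\).]

\[ |A_{b_i}| = n_i \cdot \frac{\Theta(b_i, r)}{\Theta(b_i)} \cdot \frac{\rho^i_{LM}}{n_i} = \frac{\Theta(b_i, r)}{\Theta(b_i)} \, \rho^i_{LM}, \]

where \(\rho^i_{LM}\) estimates the number of strings in \(b_i\) that are similar to the query string.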
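A minimal sketch of how the per-bucket formula could be evaluated follows. It assumes that Θ(b) denotes the area of bucket b and Θ(b, r) the area of its intersection with the range r, which the slides do not spell out, and that the overall estimate is the sum over buckets. The per-bucket string estimates are passed in as plain numbers; in the slides they would come from each bucket's VSol estimator. All names and values are hypothetical.

```python
def area(rect):
    """Area of an axis-aligned rectangle (xmin, ymin, xmax, ymax)."""
    return max(0.0, rect[2] - rect[0]) * max(0.0, rect[3] - rect[1])


def overlap(a, b):
    """Intersection rectangle of a and b (may be empty)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def estimate_selectivity(buckets, r):
    """Sum over buckets of (overlap fraction) * (similar strings in the bucket).

    buckets: list of (rect, similar), where `similar` plays the role of the
    per-bucket string estimate.
    """
    total = 0.0
    for rect, similar in buckets:
        if area(rect) > 0:
            total += area(overlap(rect, r)) / area(rect) * similar
    return total


# Hypothetical buckets: (rectangle, estimated number of similar strings).
buckets = [((0, 0, 4, 4), 8.0), ((4, 0, 8, 4), 2.0), ((0, 4, 8, 8), 5.0)]
estimate = estimate_selectivity(buckets, r=(2, 0, 6, 4))
```

Scaling by the overlap fraction treats the similar strings as spread evenly over the bucket, so the estimate is only as good as that uniformity holds.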

Improvements: minimum number of neighborhoods principle, spatial uniformity principle.

Two improvements for selectivity estimation

Minimum number of neighborhoods principle: a smaller η gives a more accurate estimator.

[Figure: example strings toy, boy, coy, too and min, men, mine, with τ = 1 and η = 2.]

Spatial uniformity principle:
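The example strings on this slide can be used to illustrate how a neighborhood count might be obtained. The sketch below is only an illustration under stated assumptions: it treats a neighborhood as a set of strings within edit distance τ of a chosen center string and picks centers greedily in input order. The function names `edit_distance` and `count_neighborhoods` are hypothetical, and the talk does not say that neighborhoods are built this way.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic program (two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def count_neighborhoods(strings, tau):
    """Assumed greedy cover: a string that is not within tau of any
    existing center becomes a new center; the result is the number of centers."""
    centers = []
    for s in strings:
        if not any(edit_distance(s, c) <= tau for c in centers):
            centers.append(s)
    return len(centers)


strings = ["toy", "boy", "coy", "too", "min", "men", "mine"]
eta = count_neighborhoods(strings, tau=1)
print(eta)
```

With τ = 1 this grouping yields two centers (toy and min) for the slide's strings, which agrees with η = 2. The greedy order can change the count on other inputs, so this should be read as one possible heuristic rather than a definition.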

16-3

The partitioning metric

Neighborhood and uniformity quality of b: $\Delta(b) = \eta_b \cdot n_b \cdot \sum_{i=1}^{d} X_i$,

where $\{X_1, \ldots, X_d\}$ are the side lengths of b in each dimension.

Our goal: minimize $\sum_{i=1}^{k} \Delta(b_i)$, where k is the number of buckets specified by the user.

Two approaches: the greedy algorithm and the adaptive R-tree algorithm.
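As a small worked example of the metric above, the following sketch evaluates Δ(b) for a few buckets and sums the results. The tuple layout `(eta_b, n_b, side_lengths)` and the function names are assumptions made for illustration; the numbers are made up and are not taken from the talk.

```python
def bucket_quality(eta_b, n_b, side_lengths):
    """Delta(b) = eta_b * n_b * (sum of the side lengths of b)."""
    return eta_b * n_b * sum(side_lengths)


def total_quality(buckets):
    """Objective to minimize: the sum of Delta(b_i) over all buckets."""
    return sum(bucket_quality(eta_b, n_b, sides) for eta_b, n_b, sides in buckets)


# Two hypothetical 2-dimensional buckets: (eta_b, n_b, side lengths).
buckets = [
    (2, 10, (3.0, 4.0)),  # 2 * 10 * 7 = 140
    (1, 5, (2.0, 2.0)),   # 1 * 5 * 4 = 20
]
total = total_quality(buckets)
print(total)
```

A bucket scores well (low Δ) when it holds few string neighborhoods, few points, and has short sides, which is how the metric combines the two principles from the previous slide.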

17-8

The greedy algorithm

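[Figure: buckets b1, b2 and b3, and the remaining points P, with Θ(P).]

For b3: $\eta_3 \cdot n_3 \cdot \Pi(b_3)$

For the remaining k − 3 buckets: $(k-3) \cdot \left( \frac{\eta}{k-3} \cdot \frac{|P|}{k-3} \cdot \left( \frac{\Theta(P)}{k-3} \right)^{1/d} \cdot d \right)$

In general: $U(b_i) = \eta_i \cdot n_i \cdot \Pi(b_i) + \frac{\eta}{k-i} \cdot |P| \cdot \left( \frac{\Theta(P)}{k-i} \right)^{1/d} \cdot d$

ηi: the number of neighborhoods of strings in bi.
ni: the number of points in bi.
Π(bi): the perimeter of bi.
η: the number of neighborhoods of strings for the remaining points.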
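The cost U(bi) on this slide can be evaluated numerically as a sanity check. The sketch below is a hedged illustration: the function name `greedy_cost` and its parameter names are hypothetical, and it assumes that Θ(P) is the d-dimensional volume (area for d = 2) of the region holding the remaining points P, which the slide does not state explicitly.

```python
def greedy_cost(eta_i, n_i, perimeter_i, eta_rest, num_rest, theta_rest, k, i, d):
    """U(b_i) = eta_i * n_i * Pi(b_i)
              + eta/(k-i) * |P| * (Theta(P)/(k-i))**(1/d) * d.

    eta_rest   -- eta, neighborhoods among the remaining points
    num_rest   -- |P|, number of remaining points
    theta_rest -- Theta(P), assumed here to be the volume covered by P
    """
    remaining = k - i
    if remaining <= 0:
        raise ValueError("need at least one remaining bucket (k > i)")
    # Side length of each remaining bucket if the volume is split evenly.
    side = (theta_rest / remaining) ** (1.0 / d)
    rest = (eta_rest / remaining) * num_rest * side * d
    return eta_i * n_i * perimeter_i + rest


# Hypothetical numbers: third bucket of k = 5 in d = 2 dimensions.
cost = greedy_cost(eta_i=2, n_i=10, perimeter_i=6.0,
                   eta_rest=4, num_rest=100, theta_rest=8.0,
                   k=5, i=3, d=2)
print(cost)
```

The second term is the per-bucket estimate η/(k−i) · |P|/(k−i) · (Θ(P)/(k−i))^{1/d} · d multiplied by the k−i remaining buckets, which is why one factor of k−i cancels in the general formula.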

17-11

The greedy algorithm

[Figure: buckets b1, b2, b3, b4 and b5.]

18-8

The adaptive R-tree algorithm

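[Figure: an R-tree with its root and level 3 marked; k = 6; label r.]

For b3: $\eta_3 \cdot n_3 \cdot \Pi(b_3)$

For the remainder: $\frac{\eta}{k-3} \cdot n \cdot \left( \frac{r}{k-3} \right)^{1/d} \cdot d$

In general: $U'(b_i) = \eta_i \cdot n_i \cdot \Pi(b_i) + \frac{\eta}{k-i} \cdot n \cdot \left( \frac{r}{k-i} \right)^{1/d} \cdot d$

ηi: the number of neighborhoods of strings in bi.
ni: the number of points in bi.
Π(bi): the perimeter of bi.
η: the number of neighborhoods of strings for the remaining points.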

19-3

Experiment setup

All experiments were executed on a Linux machine with an Intel Xeon CPU at 2GHz and 2GB of memory.

Real data sets: the road networks for states (Texas, California) in the USA; each point is associated with the names of the state, county and town.

Synthetic data sets: uniform points and random clustered points; strings from the real data sets are assigned randomly to the generated points.

The default experimental parameters are summarized below.

Symbol   Definition                              Default value
θ        query area percentage of data space     3%
N        size of points set                      2,000,000
l        signature length                        50
τ        edit distance threshold                 2
d        dimensionality                          2

20-4

SAS range queries: impact of the signature size

21-4

SAS range queries: query performance

22-1

Selectivity estimation: relative errors

CA data set

22-2

Selectivity estimation: relative errors

TX data set

23-2

Selectivity estimation: cost of the adaptive estimator

24-3

Conclusions

We designed the MHR-tree for spatial approximate string queries.

We designed a novel selectivity estimator for SAS range queries, which takes into account both the spatial and string distributions.

Future work includes examining spatial approximate sub-string queries, and using the KMV synopsis to improve the performance.

25-1

The End

THANK YOU

Q and A