Proximity Searching in High Dimensional Spaces with a Proximity Preserving Order

Edgar Chávez, Karina Figueroa, Gonzalo Navarro

UNIVERSIDAD MICHOACANA, MEXICO; UNIVERSIDAD DE CHILE, CHILE


Post on 11-Jan-2016



TRANSCRIPT

Page 1: Proximity Searching in High Dimensional Spaces with a Proximity Preserving Order

Proximity Searching in High Dimensional Spaces with a Proximity Preserving Order

Edgar Chávez

Karina Figueroa

Gonzalo Navarro

UNIVERSIDAD MICHOACANA, MEXICO

UNIVERSIDAD DE CHILE, CHILE

Page 2

Content

1. About the problem

2. Basic concepts

3. Previous work

4. Our technique

5. Experiments

6. Conclusion and future work

Page 3

Proximity Searching

Huge Database

• Exact searching is not possible

Expensive distance

Page 4

Applications

• Information retrieval

• Classification

• People finder through the web

• Clustering

• Currently used on
– Classification of spider webs
– Face recognition on the Chilean Web

Page 5

Problems (metric spaces)

Index

Extraction of characteristics

Complex objects

High dimension

Memory limited

Huge databases

Page 6

Terminology

• Queries
– Range query
– K nearest neighbor

Properties
• Symmetry
• Strict positiveness
• Triangle inequality
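
The two query types can be sketched by brute force. This is a minimal illustration only, assuming a small in-memory database of points and the Euclidean distance as the metric; the names `range_query` and `knn_query` and the sample data are hypothetical and do not come from the slides.

```python
import math

def range_query(db, q, r, dist=math.dist):
    """Range query: every element within distance r of the query q."""
    return [u for u in db if dist(q, u) <= r]

def knn_query(db, q, k, dist=math.dist):
    """K nearest neighbor query: the k elements closest to q."""
    return sorted(db, key=lambda u: dist(q, u))[:k]

# Hypothetical sample data (not from the slides).
db = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
q = (0.0, 0.0)

in_range = range_query(db, q, 1.5)
nearest = knn_query(db, q, 2)

# The Euclidean distance satisfies the three listed properties,
# checked here on one sample triple.
a, b, c = db[1], db[2], db[3]
symmetric = math.dist(a, b) == math.dist(b, a)
strictly_positive = math.dist(a, b) > 0 and math.dist(a, a) == 0
triangle = math.dist(a, c) <= math.dist(a, b) + math.dist(b, c)
```

Both functions evaluate the distance against every element, which is exactly the cost an index tries to avoid when the distance is expensive.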

Page 7

Previous work

• Pivot based
• Partition based

[Figure labels: Pivot, distance, q]
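
The slide does not spell out how pivots are used. A common pivot-based scheme, sketched here as an assumption, stores the distances from every element to a few pivots and discards an element u whenever |d(q, p) - d(u, p)| > r for some pivot p, which the triangle inequality justifies. The function name, the table layout, and the data are hypothetical.

```python
import math

def pivot_range_query(db, pivots, table, q, r, dist=math.dist):
    """Range query that filters with precomputed element-to-pivot distances.

    table[j][i] holds dist(db[j], pivots[i]), computed at preprocessing time.
    Returns the answer and the number of elements compared directly with q.
    """
    dq = [dist(q, p) for p in pivots]
    result, evaluated = [], 0
    for u, du in zip(db, table):
        # Triangle inequality: |d(q,p) - d(u,p)| is a lower bound on d(q,u).
        lower_bound = max(abs(a - b) for a, b in zip(dq, du))
        if lower_bound > r:
            continue  # u cannot be within distance r of q
        evaluated += 1
        if dist(q, u) <= r:
            result.append(u)
    return result, evaluated

# Hypothetical sample data (not from the slides).
db = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0), (6.0, 8.0)]
pivots = [(0.0, 0.0), (6.0, 8.0)]
table = [[math.dist(u, p) for p in pivots] for u in db]

found, evaluated = pivot_range_query(db, pivots, table, (0.5, 0.0), 1.0)
```

In this toy run only two of the five elements are compared directly with the query; the other three are discarded by the lower bound.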

Page 8

Previous work

• Pivot based
• Partition based

[Figure labels: center, q]

Page 9

Our technique

Permutation

[Figure labels: permutants p1, p2, p3, p4, p5, p6; element u]
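
One way to read the figure: the permutation of an element u is the list of permutants sorted by increasing distance to u. The sketch below assumes that reading, uses the Euclidean distance, and breaks ties by permutant index; the function name and the three sample permutants are hypothetical.

```python
import math

def permutation(u, permutants, dist=math.dist):
    """Indices of the permutants, ordered from closest to farthest from u."""
    return sorted(range(len(permutants)),
                  key=lambda i: (dist(u, permutants[i]), i))

# Hypothetical permutants and element (not from the slides).
permutants = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
u = (3.0, 1.0)

perm_u = permutation(u, permutants)  # permutant 1 is closest, then 0, then 2
```

Computing one permutation costs one distance evaluation per permutant.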

Page 10

Our technique

• Exact matching elements have the same permutation

• Similar elements must have a similar permutation (we guess)

• Spearman footrule metric
– Measures the similarity of the permutations
– Promising elements first

Page 11

Spearman Footrule metric

Example

3-1, 6-2, 3-2, 4-1, 5-5, 6-4

Difference of positions
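
The permutations behind the slide's example did not survive the transcript, so the following is a sketch of the standard definition of the Spearman footrule on a smaller case. The notation is an assumption: \Pi^{-1}(p_i) denotes the position of permutant p_i inside permutation \Pi, and k is the number of permutants.

```latex
F(\Pi_u, \Pi_q) = \sum_{i=1}^{k} \left| \Pi_u^{-1}(p_i) - \Pi_q^{-1}(p_i) \right|

% Worked example with k = 3, \Pi_q = (p_2, p_1, p_3) and \Pi_u = (p_2, p_3, p_1):
%   p_1 is at position 3 in \Pi_u and position 2 in \Pi_q
%   p_2 is at position 1 in both
%   p_3 is at position 2 in \Pi_u and position 3 in \Pi_q
F(\Pi_u, \Pi_q) = |3 - 2| + |1 - 1| + |2 - 3| = 2
```

Identical permutations give F = 0, and the value grows as permutants move farther from their positions in the other permutation.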

Page 12

Searching process (part 1)

Preprocessing time

Permutants: p1, p2, p3

Permutations stored for the database elements:
p3,p1,p2
p3,p2,p1
p2,p1,p3
p2,p3,p1
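
The preprocessing step can be sketched as computing one permutation per database element and storing it. The coordinates below are hypothetical, chosen only so that the resulting permutations match the four shown on the slide; the function names and the Euclidean distance are assumptions too.

```python
import math

def permutation(u, permutants, dist=math.dist):
    """Permutant ids sorted by increasing distance to u (ties broken by id)."""
    return tuple(sorted(permutants,
                        key=lambda pid: (dist(u, permutants[pid]), pid)))

def build_index(db, permutants):
    """Preprocessing: store the permutation of every database element."""
    return [permutation(u, permutants) for u in db]

# Hypothetical coordinates (not from the slides), keyed by permutant id.
permutants = {1: (0.0, 4.0), 2: (-3.0, 0.0), 3: (3.0, 0.0)}
db = [(3.0, 3.0), (1.0, -3.0), (-3.0, 3.0), (-1.0, -3.0)]

index = build_index(db, permutants)
```

With n elements and k permutants this costs n·k distance evaluations, paid once before any query arrives.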

Page 13

Searching process (part 2)

Query time

Permutants: p1, p2, p3

Permutations of the database elements:
p3,p1,p2
p3,p2,p1
p2,p1,p3
p2,p3,p1

Permutation of the query q: p2,p1,p3

Sorting elements by Spearman Footrule metric:
p2,p1,p3
p2,p3,p1
…
p3,p1,p2
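
The query step can be sketched as follows: sort the stored permutations by their Spearman footrule distance to the query's permutation, then compare only a leading fraction of the database with the real distance. The permutations are the ones on the slide; the coordinates, the function names, and the `fraction` parameter are hypothetical.

```python
import math

def footrule(perm_a, perm_b):
    """Spearman footrule: sum of position differences of each permutant."""
    pos_b = {p: i for i, p in enumerate(perm_b)}
    return sum(abs(i - pos_b[p]) for i, p in enumerate(perm_a))

def search(db, index, q, perm_q, k, fraction, dist=math.dist):
    """Approximate k-NN: compare only the most promising part of db with q."""
    # Most similar permutations first; Python's sort keeps ties in db order.
    order = sorted(range(len(db)), key=lambda j: footrule(perm_q, index[j]))
    budget = max(k, math.ceil(fraction * len(db)))
    compared = [db[j] for j in order[:budget]]
    return order, sorted(compared, key=lambda u: dist(q, u))[:k]

# Permutations as on the slide; coordinates are hypothetical but consistent
# with them for permutants p1=(0,4), p2=(-3,0), p3=(3,0).
db = [(3.0, 3.0), (1.0, -3.0), (-3.0, 3.0), (-1.0, -3.0)]
index = [(3, 1, 2), (3, 2, 1), (2, 1, 3), (2, 3, 1)]
q, perm_q = (-2.0, 2.0), (2, 1, 3)

order, nearest = search(db, index, q, perm_q, k=1, fraction=0.5)
```

Here the element with permutation p2,p1,p3 is ranked first and is also the true nearest neighbor of q, so it is found after comparing half of this toy database. The answer is approximate in general: a true neighbor ranked beyond the budget is missed.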

Page 14

Experiments

93% retrieved, comparing 10% of database

90% retrieved, comparing 60% of database

Pivot based algorithm

Retrieved 48%

[Plot axis: % retrieved]

Page 15

Experiments

100% retrieved, comparing 15% of database

100% retrieved, comparing 90% of database

[Plot axis: % retrieved]

Page 16

How good is our prediction?

retrieved

Dimension 256, using 256 pivots

Percentage of the database compared

Metric algorithms are using one of them

Page 17

Similarities between permutations

Almost the same value

Page 18

Conclusion

• A new probabilistic algorithm for proximity searching in metric spaces.

• Our technique is based on permutations.

• Close elements will have similar permutations.

• This technique is the fastest known algorithm for high dimension.

• Permutations are good predictors.

Page 19

Future Work

• Can non-metric spaces be tackled with this technique?

• Approximate all-K-nearest-neighbor algorithm.

• Improving other metric indexes.

Page 20

Thank you

UNIVERSIDAD MICHOACANA, MEXICO

UNIVERSIDAD DE CHILE, CHILE
