Proximity Searching in High Dimensional Spaces with a Proximity Preserving Order

Edgar Chávez

Karina Figueroa

Gonzalo Navarro

UNIVERSIDAD MICHOACANA, MEXICO

UNIVERSIDAD DE CHILE

Content

1. About the problem

2. Basic concepts

3. Previous work

4. Our technique

5. Experiments

6. Conclusion and future work

Proximity Searching

Huge Database

• Exact searching is not possible

Expensive distance

Applications

• Information retrieval

• Classification

• People finder through the web

• Clustering

• Currently used on
– Classification of spider webs
– Face recognition on the Chilean Web

Problems (metric spaces)

Extraction of characteristics

Complex objects

High dimension

Limited memory

Huge databases

Terminology

• Queries
– Range query
– K nearest neighbor

Properties
• Symmetry
• Strict positiveness
• Triangle inequality
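The two query types above can be illustrated with a brute-force baseline. This is only a sketch: the function names, the Euclidean distance, and the sample points are assumptions made for illustration. A linear scan like this evaluates the distance against every element, which is the cost an index tries to avoid when the distance is expensive.

```python
import heapq
import math


def euclidean(a, b):
    # Example metric (an assumption): symmetric, strictly positive,
    # and it satisfies the triangle inequality.
    return math.dist(a, b)


def range_query(db, q, r, dist=euclidean):
    # Range query: every element within distance r of the query q.
    return [u for u in db if dist(q, u) <= r]


def knn_query(db, q, k, dist=euclidean):
    # K nearest neighbor query: the k elements closest to q.
    return heapq.nsmallest(k, db, key=lambda u: dist(q, u))


db = [(0, 0), (1, 0), (3, 4), (10, 10)]
print(range_query(db, (0, 0), 1.0))
print(knn_query(db, (0, 0), 3))
```

Both functions compute one distance per database element, so they serve as the reference answer that an approximate method is measured against.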

Previous work

• Pivot based
• Partition based
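As a rough illustration of the pivot-based family, the sketch below precomputes distances from every element to a few pivots and uses the triangle inequality to discard elements without computing their distance to the query. The function names and the sample data are hypothetical, and this is a generic pivot filter rather than any specific published index.

```python
import math


def build_pivot_table(db, pivots, dist=math.dist):
    # Preprocessing: store d(u, p) for every element u and pivot p.
    return [[dist(u, p) for p in pivots] for u in db]


def pivot_range_query(db, pivots, table, q, r, dist=math.dist):
    # Distances from the query to the pivots, computed once per query.
    dq = [dist(q, p) for p in pivots]
    result, compared = [], 0
    for u, row in zip(db, table):
        # Triangle inequality: |d(q, p) - d(u, p)| <= d(q, u).
        # If the lower bound exceeds r for any pivot, u cannot be an answer.
        if any(abs(a - b) > r for a, b in zip(dq, row)):
            continue
        compared += 1  # Only surviving elements cost a real distance.
        if dist(q, u) <= r:
            result.append(u)
    return result, compared


db = [(0, 0), (1, 0), (3, 4), (10, 10)]
pivots = [(0, 0)]
table = build_pivot_table(db, pivots)
print(pivot_range_query(db, pivots, table, (0.5, 0), 1.0))
```

The filter never discards a true answer, so the result is exact; the saving is in the number of distance computations.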


Our technique: Permutation

[Figure label: permutant p3]

Our technique

• Exactly matching elements have the same permutation

• Similar elements should have similar permutations (our conjecture)

• Spearman footrule metric
– Measures the similarity of the permutations
– Promising elements first
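A minimal sketch of how an element's permutation could be computed, assuming it is the list of permutants ordered from nearest to farthest. The function name, the tie-breaking by index, and the sample points are assumptions made for illustration.

```python
import math


def permutation(u, permutants, dist=math.dist):
    # Indices of the permutants, ordered from nearest to farthest from u.
    # Ties are broken by permutant index because sorted() is stable.
    return sorted(range(len(permutants)), key=lambda i: dist(u, permutants[i]))


permutants = [(0, 0), (10, 0), (0, 10)]
print(permutation((9, 1), permutants))  # an element near the second permutant
print(permutation((9, 2), permutants))  # a close neighbor of that element
print(permutation((1, 9), permutants))  # an element far from both
```

In this toy data the two nearby elements get the same permutation, while the distant one gets a different ordering.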

Spearman Footrule metric: Example

3-1, 6-2, 3-2, 4-1, 5-5, 6-4

Difference of positions
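The metric sums, over all permutants, the absolute difference between a permutant's position in one permutation and its position in the other. The sketch below uses two hypothetical permutations, chosen only so that the per-item position differences match the ones listed above; the permutations on the original slide are not recoverable from the text.

```python
def spearman_footrule(perm_a, perm_b):
    # Position of each item in perm_b (0-based; only differences matter).
    pos_b = {item: i for i, item in enumerate(perm_b)}
    # Sum of absolute position differences over all items.
    return sum(abs(i - pos_b[item]) for i, item in enumerate(perm_a))


# Hypothetical permutations: item k sits at position k in perm_a and at
# positions 3, 6, 2, 1, 5, 4 (for k = 1..6) in perm_b.
perm_a = [1, 2, 3, 4, 5, 6]
perm_b = [4, 3, 1, 6, 5, 2]

# Differences: 3-1, 6-2, 3-2, 4-1, 5-5, 6-4 = 2 + 4 + 1 + 3 + 0 + 2
print(spearman_footrule(perm_a, perm_b))  # 12
```

Identical permutations give 0, and the value grows as the two orderings diverge.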

Searching process (part 1): Preprocessing time

Permutant p1

p3,p1,p2

p3,p2,p1

p2,p1,p3

p2,p3,p1

Searching process (part 2): Query time

Permutant p1

p3,p1,p2

p3,p2,p1

p2,p1,p3

p2,p3,p1

p2,p1,p3

Sorting elements by Spearman Footrule metric

p2,p1,p3

p2,p3,p1

…

p3,p1,p2
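The two parts of the searching process can be sketched together as follows. All names are hypothetical, and the `fraction` parameter (how much of the sorted database is actually compared against the query) is an assumption about how the search is cut short, not a detail stated on the slides.

```python
import math


def permutation(u, permutants, dist):
    # Permutant indices ordered from nearest to farthest from u.
    return tuple(sorted(range(len(permutants)), key=lambda i: dist(u, permutants[i])))


def footrule(perm_a, perm_b):
    # Spearman footrule: sum of absolute position differences.
    pos_b = {item: i for i, item in enumerate(perm_b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(perm_a))


def build_index(db, permutants, dist=math.dist):
    # Part 1 (preprocessing time): one permutation per database element.
    return [permutation(u, permutants, dist) for u in db]


def search(db, index, permutants, q, r, fraction, dist=math.dist):
    # Part 2 (query time): compute the query's permutation, sort the
    # database by footrule distance to it, and compare only the most
    # promising elements against the query.
    q_perm = permutation(q, permutants, dist)
    order = sorted(range(len(db)), key=lambda i: footrule(index[i], q_perm))
    budget = max(1, int(fraction * len(db)))
    return [db[i] for i in order[:budget] if dist(q, db[i]) <= r]


# The four element permutations and the query permutation from the slides.
stored = [("p3", "p1", "p2"), ("p3", "p2", "p1"), ("p2", "p1", "p3"), ("p2", "p3", "p1")]
query_perm = ("p2", "p1", "p3")
print(sorted(stored, key=lambda p: footrule(p, query_perm)))

# A small end-to-end run on made-up points.
permutants = [(0, 0), (10, 0), (0, 10)]
db = [(9, 1), (1, 9), (9, 2), (1, 2)]
index = build_index(db, permutants)
print(search(db, index, permutants, q=(8, 1), r=2, fraction=0.5))
```

Because only a fraction of the database is compared, the answer is probabilistic: an element whose permutation differs from the query's may be missed even if it lies within the radius.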

Experiments

93% retrieved, comparing 10% of database

90% retrieved, comparing 60% of database

Pivot based algorithm

Retrieved 48%

Experiments

100% retrieved, comparing 15% of database

100% retrieved, comparing 90% of database

How good is our prediction?

[Figure: percentage retrieved vs. percentage of the database compared; dimension 256, using 256 pivots]

Metric algorithms are using one of them

Similarities between permutations

Almost the same value

Conclusion

• A new probabilistic algorithm for proximity searching in metric spaces.

• Our technique is based on permutations.

• Close elements will have similar permutations.

• This technique is the fastest known algorithm for high dimension.

• Permutations are good predictors.

Future Work

• Can non-metric spaces be tackled with this technique?

• An approximate all-K-nearest-neighbors algorithm.

• Improving other metric indexes.

Thank you


Kfiguero@dcc.uchile.cl
