Nearest Neighbor Search in High Dimensions
Seminar in Algorithms and Geometry
Mica Arie-Nachimson and Daniel Glasner
April 2009
Talk Outline
• Nearest neighbor problem
  – Motivation
• Classical nearest neighbor methods
  – KD-trees
• Efficient search in high dimensions
  – Bucketing method
  – Locality Sensitive Hashing
• Conclusion

Main results: Indyk and Motwani, 1998; Gionis, Indyk and Motwani, 1999
Nearest Neighbor Problem
• Input: A set P of points in R^d (or any metric space).
• Output: Given a query point q, find the point p* in P that is closest to q.
What is it good for? Many things! Examples:
• Optical Character Recognition
• Spell Checking
• Computer Vision
• DNA sequencing
• Data compression
[Figure: OCR example — handwritten digit images embedded in a feature space; the query digit is labeled by its nearest neighbor]
[Figure: spell-checking example — dictionary words (about, boat, bat, abate, able, scout, shout) in a feature space; the misspelled query "abaut" is matched to its nearest neighbor "about"]
And many more…
Approximate Nearest Neighbor (ε-NN)
• Input: A set P of points in R^d (or any metric space).
• Given a query point q, let:
  – p* be the point in P closest to q
  – r* be the distance ||p* − q||
• Output: Some point p′ with distance at most r*(1+ε)
[Figure: query q, its nearest neighbor p*, and the distance r*]
Approximate vs. Exact Nearest Neighbor
• Many applications give similar results with approximate NN
• Example from Computer Vision
Retiling
Slide from Lihi Zelnik-Manor
Exact NNS ~27 sec
Approximate NNS ~0.6 sec
Slide from Lihi Zelnik-Manor
Solution Method
• Input: A set P of n points in R^d.
• Method: Construct a data structure to answer nearest neighbor queries.
• Complexity:
  – Preprocessing: space and time to construct the data structure
  – Query: time to return an answer
Solution Method
• Naïve approach:
  – Preprocessing O(nd)
  – Query time O(nd)
• Reasonable requirements:
  – Preprocessing time and space poly(nd)
  – Query time sublinear in n
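The naïve approach above is just a linear scan over all points; a minimal sketch (illustrative, not from the slides):

```python
import math

def nearest_neighbor(P, q):
    """Naive exact NN: scan all n points; O(nd) query time, no preprocessing."""
    best, best_dist = None, math.inf
    for p in P:
        dist = math.dist(p, q)  # Euclidean distance, costs O(d)
        if dist < best_dist:
            best, best_dist = p, dist
    return best

P = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(nearest_neighbor(P, (0.9, 1.2)))  # -> (1.0, 1.0)
```

Everything that follows in the talk is about beating this O(nd) query bound.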
Talk Outline
• Nearest neighbor problem
  – Motivation
• Classical nearest neighbor methods
  – KD-trees
• Efficient search in high dimensions
  – Bucketing method
  – Locality Sensitive Hashing
• Conclusion
Classical nearest neighbor methods
• Tree structures
  – kd-trees
• Voronoi diagrams
  – Preprocessing: poly(n), exp(d)
  – Query: log(n), exp(d)
• Difficult problem in high dimensions
  – The solutions still work, but are exp(d)…
KD-tree
• d=1 (binary search tree)
[Figure: a binary search tree over the points 7, 8, 10, 12, 13, 15, 18 — the root separates {7, 8, 10, 12} from {13, 15, 18}, which split again into {7, 8} / {10, 12} and {13, 15} / {18}]
KD-tree
• d=1 (binary search tree)
[Figure: the same tree queried with 17 — the search descends to the leaf containing 18; min dist = 1]
KD-tree
• d=1 (binary search tree)
[Figure: the same tree queried with 16 — the first leaf reached gives min dist = 2, and backtracking finds 15 with min dist = 1]
KD-tree
• d > 1: alternate between dimensions
• Example: d = 2
[Figure: a kd-tree over the points (12,5), (6,8), (17,4), (23,2), (20,10), (9,9), (1,6) — the root splits on x into {(12,5), (6,8), (1,6), (9,9)} and {(17,4), (23,2), (20,10)}, and the subtrees split on y]
KD-tree
• d > 1: alternate between dimensions
• Example: d = 2
[Figure: the corresponding partition of the plane — split on x, then y, then x]
KD-tree: complexity
• Preprocessing: O(nd)
• Query:
  – O(log n) if points are randomly distributed
  – Worst case O(k·n^(1−1/k)) in dimension k — almost linear in n when k is large
• May need to search the whole tree
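A compact kd-tree sketch using the 2-D example points from the earlier slide (the median-split and backtracking details are my own filling-in; the slides only show the pictures):

```python
import math

def build_kdtree(points, depth=0):
    """Build a kd-tree by splitting at the median, cycling through dimensions."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def kdtree_nn(node, q, best=None):
    """Descend toward q, then backtrack into the far subtree only when the
    splitting plane is closer than the best distance found so far."""
    if node is None:
        return best
    p, axis = node["point"], node["axis"]
    if best is None or math.dist(p, q) < math.dist(best, q):
        best = p
    near, far = (node["left"], node["right"]) if q[axis] < p[axis] else (node["right"], node["left"])
    best = kdtree_nn(near, q, best)
    if abs(q[axis] - p[axis]) < math.dist(best, q):  # plane may hide a closer point
        best = kdtree_nn(far, q, best)
    return best

pts = [(12, 5), (6, 8), (17, 4), (23, 2), (20, 10), (9, 9), (1, 6)]
tree = build_kdtree(pts)
print(kdtree_nn(tree, (16, 5)))  # -> (17, 4)
```

The backtracking step is exactly why the worst case degrades toward linear time: in high dimension, the splitting plane is almost always closer than the current best, so both subtrees get searched.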
Talk Outline
• Nearest neighbor problem
  – Motivation
• Classical nearest neighbor methods
  – KD-trees
• Efficient search in high dimensions
  – Bucketing method
  – Locality Sensitive Hashing
• Conclusion
Sublinear solutions
(All bounds are linear in d, not counting log n factors; both methods solve ε-NN by reduction to r-PLEB.)

| Method    | Query time                          | Preprocessing                          |
|-----------|-------------------------------------|----------------------------------------|
| Bucketing | O(log n)                            | n^O(1/ε²)                              |
| LSH       | O(n^(1/(1+ε))) [sqrt(n) when ε = 1] | O(n^(1+1/(1+ε))) [n^(3/2) when ε = 1]  |
r-PLEB: Point Location in Equal Balls
• Given n balls of radius r, for every query q, find a ball that q resides in, if one exists.
• If q doesn't reside in any ball, return NO.
[Figure: q falls inside the ball around p1 — return p1]
[Figure: q falls outside every ball — return NO]
Reduction from ε-NN to r-PLEB
• The two problems are connected
  – r-PLEB is like a decision problem for ε-NN
Reduction from ε-NN to r-PLEB: Naïve Approach
• Set R = the ratio between the largest and smallest distance between two points
• Define r = {(1+ε)^0, (1+ε)^1, …, R}
• For each r_i construct an r_i-PLEB
• Given q, find the smallest r_i which gives a YES
  – Use binary search to find it
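Binary search is valid here because a YES answer is monotone in the radius. A sketch, assuming a hypothetical r-PLEB oracle `pleb(r, q)` that returns a covering ball's center or None:

```python
import math

def eps_nn_via_pleb(pleb, q, R, eps):
    """Naive eps-NN reduction: binary-search the radius ladder
    r_i = (1+eps)^i, i = 0..ceil(log_{1+eps} R), for the smallest YES."""
    m = math.ceil(math.log(R, 1 + eps))
    lo, hi, answer = 0, m, None
    while lo <= hi:                     # O(log log_{1+eps} R) oracle calls
        mid = (lo + hi) // 2
        p = pleb((1 + eps) ** mid, q)
        if p is not None:
            answer, hi = p, mid - 1     # YES: try a smaller radius
        else:
            lo = mid + 1                # NO: need a larger radius
    return answer

# toy oracle over a fixed point set (a linear scan stands in for a real r-PLEB)
P = [(0, 0), (5, 5)]
def pleb(r, q):
    return next((p for p in P if math.dist(p, q) <= r), None)

print(eps_nn_via_pleb(pleb, (1, 0), R=16, eps=1))  # -> (0, 0)
```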
[Figure: nested r1-, r2-, r3-PLEB constructions around the data points]
Reduction from ε-NN to r-PLEB: Naïve Approach
• Correctness:
  – Stopped at r_i = (1+ε)^k, with r_{i+1} = (1+ε)^(k+1)
  – Then (1+ε)^k ≤ r* ≤ (1+ε)^(k+1), so the point returned is within a (1+ε) factor of the true r*
Reduction from ε-NN to r-PLEB: Naïve Approach
Reduction overhead:
• Space: O(log_{1+ε} R) r-PLEB constructions
  – The size of {(1+ε)^0, (1+ε)^1, …, R} is log_{1+ε} R
• Query: O(log log_{1+ε} R) calls to r-PLEB (binary search over the radii)
• Drawback: dependency on R
Reduction from ε-NN to r-PLEB: Better Approach [Har-Peled 2001]
• Set r_med as the radius which gives n/2 connected components (C.C.)
Reduction from ε-NN to r-PLEB: Better Approach
• Set r_med as the radius which gives n/2 connected components (C.C.)
• Set r_top = 4·n·r_med·log n / ε
[Figure: balls of radius r_med around the points, and the much larger radius r_top]
Reduction from ε-NN to r-PLEB: Better Approach
• If q ∉ B(p_i, r_med) for all i but q ∈ B(p_i, r_top) for some i: set R = r_top/r_med and perform binary search on r = {(1+ε)^0, (1+ε)^1, …, R}
  – R is independent of the input points
• If q ∉ B(p_i, r_med) and q ∉ B(p_i, r_top) ∀ i, then q is "far away"
  – Enough to choose one point from each C.C. and continue recursively with these points (accumulating error ≤ 1+ε/3)
• If q ∈ B(p_i, r_med) for some i, then continue recursively on that C.C.
Reduction from ε-NN to r-PLEB: Better Approach
Complexity overhead — how many r-PLEB queries?
• Each binary search: O(log log R) = O(log(n/ε)) queries
• Each recursive step: 2 queries + recursion on at most half of the points
• Total: O(log n)
(r,ε)-PLEB: Point Location in Equal Balls
• Given n balls of radius r, for query q:
  – If q resides in a ball of radius r, return the ball.
  – If q doesn't reside in any ball, return NO.
  – If q resides only in the "border" of a ball (between radius r and (1+ε)r), return either the ball or NO.
[Figure: q inside the inner ball around p1 — return p1]
[Figure: q outside all the balls — return NO]
[Figure: q in the border region of a ball — return either the ball or NO]
Talk Outline
• Nearest neighbor problem
  – Motivation
• Classical nearest neighbor methods
  – KD-trees
• Efficient search in high dimensions
  – Bucketing method
  – Locality Sensitive Hashing
• Conclusion
Bucketing Method
[Indyk and Motwani, 1998] — solves r-PLEB
• Apply a grid of cell size εr/sqrt(d)
• Every ball is covered by at most k cubes
  – Can show that k ≤ C^d/ε^d for some constant C < 5
• kn cubes cover all the balls
• Finite number of cubes: can use a hash table
  – Key: a cube; Value: a ball it covers
• Space required: O(nk)
Bucketing Method
• Given query q:
  – Compute the cube it resides in [O(d)]
  – Find the ball this cube intersects via the hash table [O(1)]
  – That ball's center is an (r,ε)-PLEB answer for q
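A sketch of the bucketing structure. The cell side εr/√d (so a cube's diameter is εr, and any point of a stored cube is within (1+ε)r of its ball's center) follows the slides; the cube enumeration below is my own filling-in and is exponential in d, matching the k ≤ C^d/ε^d bound:

```python
import math
from itertools import product

def cube_dist(cube, s, p):
    """Distance from point p to the axis-aligned cube [c*s, (c+1)*s] per axis."""
    return math.dist(p, [min(max(x, c * s), (c + 1) * s) for x, c in zip(p, cube)])

def build_buckets(P, r, eps):
    """Hash every grid cube intersecting some ball B(p, r) to that p."""
    d = len(P[0])
    s = eps * r / math.sqrt(d)            # cube diameter = eps * r
    table = {}
    for p in P:
        lo = [math.floor((x - r) / s) for x in p]
        hi = [math.floor((x + r) / s) for x in p]
        for cube in product(*[range(a, b + 1) for a, b in zip(lo, hi)]):
            if cube_dist(cube, s, p) <= r:    # keep only cubes touching the ball
                table.setdefault(cube, p)
    return table, s

def query(table, s, q):
    """O(d) to locate q's cube, O(1) expected for the hash lookup."""
    return table.get(tuple(math.floor(x / s) for x in q))  # ball center or None

table, s = build_buckets([(0.0, 0.0)], r=1.0, eps=1.0)
print(query(table, s, (0.5, 0.5)))   # -> (0.0, 0.0)
print(query(table, s, (3.0, 3.0)))   # -> None
```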
Bucketing Method: Complexity
• Space required: O(nk) = O(n·(1/ε)^d)
• Query time: O(d)
• If d = O(log n) [or n = O(2^d)]:
  – Space required: n^O(log(1/ε))
• Else use dimensionality reduction in l2 from d to O(ε^(−2)·log n) [Johnson–Lindenstrauss lemma]
  – Space: n^O(1/ε²)
Break
Talk Outline
• Nearest neighbor problem
  – Motivation
• Classical nearest neighbor methods
  – KD-trees
• Efficient search in high dimensions
  – Bucketing method
  – Locality Sensitive Hashing
• Conclusion
Locality Sensitive Hashing
• Indyk & Motwani 98; Gionis, Indyk & Motwani 99
• A solution for (r,ε)-PLEB.
• Probabilistic construction; a query succeeds with high probability.
• Uses random hash functions g: X → U (some finite range).
• Preserves "separation" of "near" and "far" points with high probability.
Locality Sensitive Hashing
• If ||p-q|| ≤ r, then Pr[g(p)=g(q)] is "high"
• If ||p-q|| > (1+ε)r, then Pr[g(p)=g(q)] is "low"
[Figure: hash tables g1, g2, g3 — nearby points tend to share a bucket in some table]
A locality sensitive family
• A family H of functions h: X → U is called (P1, P2, r, (1+ε)r)-sensitive for a metric d_X if, for any p, q:
  – if ||p-q|| < r then Pr[h(p)=h(q)] > P1
  – if ||p-q|| > (1+ε)r then Pr[h(p)=h(q)] < P2
• For this notion to be useful we require P1 > P2
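A standard example of such a family (from Indyk–Motwani, for the Hamming cube; not shown on this slide): h samples one random coordinate, so Pr[h(p)=h(q)] = 1 − dist(p,q)/d, giving P1 = 1 − r/d and P2 = 1 − (1+ε)r/d:

```python
import random

def sample_bit_hash(d, rng):
    """Bit-sampling LSH family on {0,1}^d: h(p) = p[i] for a random i."""
    i = rng.randrange(d)
    return lambda p: p[i]

# empirical check: Hamming distance 5 in d = 20 gives Pr = 1 - 5/20 = 0.75
rng = random.Random(0)
d, trials = 20, 10000
p = [0] * d
q = [1] * 5 + [0] * 15
hits = 0
for _ in range(trials):
    h = sample_bit_hash(d, rng)   # the SAME h must be applied to both points
    hits += h(p) == h(q)
print(hits / trials)              # close to 0.75
```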
Intuition
• if ||p-q|| < r then Pr[h(p)=h(q)] > P1
• if ||p-q|| > (1+ε)r then Pr[h(p)=h(q)] < P2
[Illustration: two hash functions h1, h2 partitioning the plane; near points mostly fall into the same cell]
Illustration from Lihi Zelnik-Manor
Claim
• If there is a (P1, P2, r, (1+ε)r)-sensitive family for d_X, then there exists an algorithm for (r,ε)-PLEB in d_X with
  – Space: O(dn + n^(1+ρ))
  – Query: O(d·n^ρ)
  where ρ = ln(1/P1)/ln(1/P2)
• When ε = 1: space Õ(dn + n^(3/2)), query Õ(d·sqrt(n))
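A quick numeric check of the exponent ρ = ln(1/P1)/ln(1/P2), using the bit-sampling family's probabilities as an example (the parameter values d = 100, r = 10 are chosen for illustration):

```python
import math

def rho(P1, P2):
    """Exponent in the claim: space O(dn + n^(1+rho)), query O(d * n^rho)."""
    return math.log(1 / P1) / math.log(1 / P2)

# bit-sampling on the Hamming cube with d = 100, r = 10, eps = 1:
# P1 = 1 - r/d = 0.9, P2 = 1 - (1+eps)*r/d = 0.8
print(rho(0.9, 0.8))   # about 0.47, consistent with rho <= 1/(1+eps) = 1/2
```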
Algorithm – preprocessing
• For i = 1, …, L:
  – Uniformly select k functions h_1, …, h_k from H
  – Set g_i(p) = (h_1(p), h_2(p), …, h_k(p))
[Figure: example with h_i : R^d → {0,1} — each g_i maps a point to a k-bit bucket label such as (1,0,…,0) or (0,0,…,1)]
Algorithm – preprocessing
• For i = 1, …, L:
  – Uniformly select k functions from H
  – Set g_i(p) = (h_1(p), h_2(p), …, h_k(p))
  – Compute g_i(p) for all p ∈ P
  – Store the resulting values in a hash table
Algorithm - query
• S ← ∅, i ← 1
• While i ≤ L and |S| ≤ 2L
– S ← S ∪ {points in bucket gi(q) of table i}
– If ∃ p ∈ S s.t. ||p-q|| ≤ (1+ε)r, return p and exit
– i++
• Return NO.
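The preprocessing and query steps above can be sketched in code. A minimal illustration, assuming a bit-sampling family over binary vectors; the function names and the tiny dataset are illustrative, not from the paper:

```python
import random

def build_tables(points, d, k, L, seed=0):
    """Preprocessing sketch: for i = 1..L sample k coordinates and hash
    every point by the projection g_i(p) = (h_1(p), ..., h_k(p))."""
    rng = random.Random(seed)
    projections = [rng.sample(range(d), k) for _ in range(L)]
    tables = []
    for coords in projections:
        table = {}
        for p in points:
            table.setdefault(tuple(p[j] for j in coords), []).append(p)
        tables.append(table)
    return projections, tables

def lsh_query(q, projections, tables, r, eps):
    """Query sketch: scan the buckets g_i(q); give up after 2L candidates.
    Returns some p with d_Ham(p, q) <= (1+eps)*r, or None ("NO")."""
    L = len(tables)
    candidates = 0
    for coords, table in zip(projections, tables):
        for p in table.get(tuple(q[j] for j in coords), []):
            candidates += 1
            if sum(a != b for a, b in zip(p, q)) <= (1 + eps) * r:
                return p
            if candidates > 2 * L:
                return None
    return None
```

Note the early exit after 2L candidates: it is what keeps the query time bounded even when many far points fall into the scanned buckets.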
![Page 70: Nearest Neighbor Search in High Dimensions Seminar in Algorithms and Geometry Mica Arie-Nachimson and Daniel Glasner April 2009](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649c775503460f9492ccca/html5/thumbnails/70.jpg)
Correctness
• Property I: if ||q−p*|| ≤ r then gi(p*) = gi(q) for some i ∈ {1,...,L}
• Property II: the number of points p ∈ P s.t. ||q−p|| ≥ (1+ε)r and gi(p) = gi(q) for some i is less than 2L
• We show that Pr[I & II hold] ≥ ½ − 1/e
![Page 71: Nearest Neighbor Search in High Dimensions Seminar in Algorithms and Geometry Mica Arie-Nachimson and Daniel Glasner April 2009](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649c775503460f9492ccca/html5/thumbnails/71.jpg)
Correctness
• Property I: if ||q−p*|| ≤ r then gi(p*) = gi(q) for some i ∈ {1,...,L}
• Property II: the number of points p ∈ P s.t. ||q−p|| ≥ (1+ε)r and gi(p) = gi(q) for some i is less than 2L
• Choose:
– k = log_{1/P2} n
– L = n^ρ, where ρ = ln(1/P1)/ln(1/P2)
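The ½ − 1/e bound follows directly from this choice of parameters; a sketch of the calculation:

```latex
% With k = \log_{1/P_2} n we get P_2^k = 1/n and P_1^k = n^{-\rho} = 1/L.
\Pr[\text{I fails}] \le \bigl(1 - P_1^k\bigr)^L = (1 - 1/L)^L \le 1/e
% Each far point collides with q in a fixed table with probability \le P_2^k = 1/n,
% so the expected number of far collisions over all L tables is \le L; by Markov:
\Pr[\text{II fails}] = \Pr[\#\text{far collisions} \ge 2L] \le 1/2
\Pr[\text{I and II hold}] \ge 1 - 1/e - 1/2 = 1/2 - 1/e \approx 0.13
```

A constant success probability suffices: repeating the whole construction O(log n) times drives the failure probability down polynomially.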
![Page 72: Nearest Neighbor Search in High Dimensions Seminar in Algorithms and Geometry Mica Arie-Nachimson and Daniel Glasner April 2009](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649c775503460f9492ccca/html5/thumbnails/72.jpg)
Complexity
• k = log_{1/P2} n
• L = n^ρ, where ρ = ln(1/P1)/ln(1/P2)
• Space: L·n (hash tables) + d·n (data points) = O(n^{1+ρ} + dn)
• Query: L hash-function evaluations + O(L) distance calculations = Õ(d·n^ρ)
![Page 73: Nearest Neighbor Search in High Dimensions Seminar in Algorithms and Geometry Mica Arie-Nachimson and Daniel Glasner April 2009](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649c775503460f9492ccca/html5/thumbnails/73.jpg)
Significance of k and L
[Plot: Pr[g(p) = g(q)] as a function of ||p−q||, for a single table]
![Page 74: Nearest Neighbor Search in High Dimensions Seminar in Algorithms and Geometry Mica Arie-Nachimson and Daniel Glasner April 2009](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649c775503460f9492ccca/html5/thumbnails/74.jpg)
Significance of k and L
[Plot: Pr[gi(p) = gi(q) for some i ∈ 1,...,L] as a function of ||p−q||]
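The shapes of these two curves can be computed exactly. A quick numerical sketch, assuming the bit-sampling family (per-hash collision probability 1 − u/d'): raising to the k-th power sharpens the drop-off with distance, and OR-ing over L tables pushes the probability for near points back up toward 1.

```python
def collision_prob(u, d, k, L):
    """Pr[g_i(p) = g_i(q) for some i in 1..L] at Hamming distance u,
    assuming bit sampling: a single h collides with probability 1 - u/d."""
    p = 1 - u / d          # per-hash collision probability
    return 1 - (1 - p ** k) ** L   # at least one of L k-bit keys matches
```

With d = 100, k = 10, L = 20, a point at distance 5 collides with probability close to 1, while a point at distance 50 collides with probability around 2%, which is the separation the algorithm exploits.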
![Page 75: Nearest Neighbor Search in High Dimensions Seminar in Algorithms and Geometry Mica Arie-Nachimson and Daniel Glasner April 2009](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649c775503460f9492ccca/html5/thumbnails/75.jpg)
Application
• Perform NNS in Rd with l1 distance.
• Reduce the problem to NNS in Hd', the Hamming cube of dimension d'.
• Hd' = binary strings of length d'.
• dHam(s1,s2) = number of coordinates where s1 and s2 disagree.
![Page 76: Nearest Neighbor Search in High Dimensions Seminar in Algorithms and Geometry Mica Arie-Nachimson and Daniel Glasner April 2009](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649c775503460f9492ccca/html5/thumbnails/76.jpg)
Embedding l1^d in Hd'
• W.l.o.g. all coordinates of all points in P are positive integers ≤ C.
• Map integer i ∈ {1,...,C} to (1,1,....,1,0,0,...0): i ones followed by C−i zeros.
• Map a vector by mapping each coordinate (so d' = C·d).
• Example: {(5,3,2),(2,4,1)} → {(11111,11100,11000),(11000,11110,10000)}
![Page 77: Nearest Neighbor Search in High Dimensions Seminar in Algorithms and Geometry Mica Arie-Nachimson and Daniel Glasner April 2009](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649c775503460f9492ccca/html5/thumbnails/77.jpg)
Embedding l1^d in Hd' (cont.)
• Distances are preserved.
• Actual computations are performed in the original space, with only O(log C) overhead.
![Page 78: Nearest Neighbor Search in High Dimensions Seminar in Algorithms and Geometry Mica Arie-Nachimson and Daniel Glasner April 2009](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649c775503460f9492ccca/html5/thumbnails/78.jpg)
A sensitive family for the hamming cube
• Hd’ = {hi : hi(b1,…,bd’) = bi for i = 1,…,d’}– If dHam(s1,s2) < r what is Pr[h(p)=h(q)] ?
at most 1-r/d’– If dHam(s,s2) > (1+)r what is Pr[h(p)=h(q)] ?
at least 1-(1+)r/d’
• Hd’ is (r,(1+)r,1-r/d’,1-(1+)r/d’) sensitive.
• Question: what are these projections in the original space?
![Page 79: Nearest Neighbor Search in High Dimensions Seminar in Algorithms and Geometry Mica Arie-Nachimson and Daniel Glasner April 2009](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649c775503460f9492ccca/html5/thumbnails/79.jpg)
Corollary
• We can bound ρ ≤ 1/(1+ε)
• Space – O(dn + n^{1+1/(1+ε)})
• Query – Õ(d·n^{1/(1+ε)})
• When ε = 1: Space O(dn + n^{3/2}), Query Õ(d·√n)
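The bound ρ ≤ 1/(1+ε) comes from Bernoulli's inequality applied to the collision probabilities of the bit-sampling family; a sketch, writing x = r/d':

```latex
(1-x)^{1+\epsilon} \ge 1-(1+\epsilon)x
\;\Rightarrow\; (1+\epsilon)\ln(1-x) \ge \ln\bigl(1-(1+\epsilon)x\bigr)
% Dividing by \ln(1-(1+\epsilon)x) < 0 reverses the inequality:
\rho = \frac{\ln(1/P_1)}{\ln(1/P_2)}
     = \frac{\ln(1-x)}{\ln\bigl(1-(1+\epsilon)x\bigr)}
     \le \frac{1}{1+\epsilon}
```

Plugging ρ ≤ 1/(1+ε) into the space and query bounds of the general claim gives the corollary above.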
![Page 80: Nearest Neighbor Search in High Dimensions Seminar in Algorithms and Geometry Mica Arie-Nachimson and Daniel Glasner April 2009](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649c775503460f9492ccca/html5/thumbnails/80.jpg)
Recent results
• In Euclidean space:
– ρ ≤ 1/(1+ε)² + O(log log n / log^{1/3} n) [Andoni & Indyk 2008]
– ρ ≥ 0.462/(1+ε)² [Motwani, Naor & Panigrahy 2006]
• LSH family for ls, s ∈ [0,2) [Datar, Immorlica, Indyk & Mirrokni 2004]
• And many more.
![Page 81: Nearest Neighbor Search in High Dimensions Seminar in Algorithms and Geometry Mica Arie-Nachimson and Daniel Glasner April 2009](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649c775503460f9492ccca/html5/thumbnails/81.jpg)
Conclusion
• NNS is an important problem with many applications.
• The problem can be efficiently solved in low dimensions.
• We saw some efficient approximate solutions in high dimensions, which are applicable to many metrics.