Near Neighbor Problem Made Fair
Sariel Har-Peled (UIUC)
Sepideh Mahabadi (TTIC)
Nearest Neighbor Problems
• Nearest Neighbor: Given a set of objects, find the closest one to the query object.
• Near Neighbor: Given a set of objects, find one that is close enough to the query object.
There are many applications of NN
Searching for the closest object
Near Neighbor
Dataset of $n$ points $P$ in a metric space, e.g. $\mathbb{R}^d$, and a parameter $r$
A query point $q$ comes online
Goal:
• Find a point $p^*$ in the $r$-neighborhood
• Do it in sub-linear time and small space
All existing algorithms for this problem
• Either space or query time depending exponentially on $d$
• Or assume certain properties about the data, e.g., bounded intrinsic dimension
[Figure: query point $q$, radius $r$, and a point $p^*$]
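As a point of reference for the goal above, the following is a minimal sketch of the trivial linear-time baseline that the sub-linear requirement rules out. It assumes Euclidean distance and points stored as tuples; the function name `near_neighbor` is hypothetical and not from the slides.

```python
import math

def near_neighbor(points, q, r):
    """Linear scan: return some point within distance r of q, or None."""
    for p in points:
        if math.dist(p, q) <= r:  # Euclidean distance (an assumption)
            return p
    return None

points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(near_neighbor(points, (0.9, 0.9), 0.5))    # (1.0, 1.0)
print(near_neighbor(points, (10.0, 10.0), 0.5))  # None
```

The scan touches every point, so its query time is linear in $n$, which is exactly what the data structures discussed in the talk try to avoid.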
Approximate Near Neighbor
Dataset of $n$ points $P$ in a metric space, e.g. $\mathbb{R}^d$, and a parameter $r$
A query point $q$ comes online
Goal:
• Find a point $p^*$ in the $r$-neighborhood
• Do it in sub-linear time and small space
• Approximate Near Neighbor
  - Report a point in distance $cr$ for $c > 1$
  - For Hamming (and Manhattan) query time is $n^{O(1/c)}$ [IM98]
  - and for Euclidean it is $n^{O(1/c^2)}$ [AI08]
[Figure: query point $q$, radii $r$ and $cr$, and a reported point $p$]
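To get a feel for these exponents, here is a rough worked example. It assumes, purely for illustration, that the hidden constant in each exponent is 1 and that $n = 10^6$ and $c = 2$; none of these values come from the slides.

```latex
n^{1/c}   = (10^{6})^{1/2} = 10^{3}
\qquad
n^{1/c^{2}} = (10^{6})^{1/4} = 10^{1.5} \approx 31.6
\qquad
\text{versus a linear scan over } n = 10^{6} \text{ points.}
```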
Fair Near Neighbor
Sample a neighbor of the query uniformly at random
Individual fairness: every neighbor has the same chance of being reported. Remove the bias inherent in the NN data structure
Applications:
• Removing noise, k-NN classification
• Anonymizing the data
• Counting the neighborhood size
Fair Near Neighbor
Dataset of $n$ points $P$ in a metric space, e.g. $\mathbb{R}^d$, and a parameter $r$
A query point $q$ comes online
Goal:
• Return each point $p$ in the neighborhood of $q$ with uniform probability
• Do it in sub-linear time and small space
[Figure: query point $q$ with radius $r$; two neighbors, each labeled with probability $1/2$]
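The fairness goal is easy to meet if linear time is acceptable. The sketch below is such a brute-force baseline, not the sub-linear method of the talk: it collects the whole $r$-neighborhood and picks one member uniformly. Euclidean distance and the name `fair_near_neighbor_bruteforce` are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def fair_near_neighbor_bruteforce(points, q, r, rng):
    """Return a uniformly random point of N(q, r), or None if it is empty."""
    neighborhood = [p for p in points if math.dist(p, q) <= r]
    return rng.choice(neighborhood) if neighborhood else None

points = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]
q, r = (0.2, 0.0), 1.0
rng = random.Random(0)
counts = Counter(fair_near_neighbor_bruteforce(points, q, r, rng) for _ in range(2000))
print(counts)  # the two points within distance r each show up about half the time
```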
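Approximate Fair Near Neighbor
Dataset of $n$ points $P$ in a metric space, e.g. $\mathbb{R}^d$, and a parameter $r$
A query point $q$ comes online
Goal of Approximate Fair NN
  - Any point $p$ in $N(q, r)$ is reported with "almost uniform" probability, i.e., $\lambda_q(p)$ where
$$\frac{1}{(1 + \epsilon)\,\lvert N(q, r)\rvert} \le \lambda_q(p) \le \frac{1 + \epsilon}{\lvert N(q, r)\rvert}$$
[Figure: query point $q$ with radius $r$; two neighbors, each labeled with probability $1/2$]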
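The "almost uniform" condition can be checked mechanically once the reporting probabilities are known. The following sketch assumes the probabilities are given as a dictionary keyed by the points of $N(q, r)$; the function name `is_almost_uniform` is hypothetical.

```python
def is_almost_uniform(probabilities, eps):
    """Check 1/((1+eps)|N|) <= lambda_q(p) <= (1+eps)/|N| for every p in N(q, r).

    probabilities: dict mapping each point of N(q, r) to its reporting probability.
    """
    size = len(probabilities)
    lower = 1.0 / ((1.0 + eps) * size)
    upper = (1.0 + eps) / size
    return all(lower <= lam <= upper for lam in probabilities.values())

print(is_almost_uniform({"a": 0.52, "b": 0.48}, eps=0.1))  # True
print(is_almost_uniform({"a": 0.70, "b": 0.30}, eps=0.1))  # False
```

With two neighbors and $\epsilon = 0.1$, the allowed range is roughly $[0.4545, 0.55]$, so the first distribution passes and the second does not.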
Results on $(1 + \epsilon)$-Approximate Fair NN
$S_{ANN}$ and $T_{ANN}$ are the space and query time of standard ANN
Approximate neighborhood: a set $S$ such that $N(q, r) \subseteq S \subseteq N(q, cr)$
Dependence on $\epsilon$ is $O(\log(1/\epsilon))$
Experiments
Recent paper [Aumüller, Pagh, Silvestri '19] defining the same notion

| Domain | Guarantee | Space | Query |
|---|---|---|---|
| Exact Neighborhood $N(q, r)$ | w.h.p. | $O(S_{ANN})$ | $\tilde{O}\left(T_{ANN} \cdot \frac{\lvert N(q, cr)\rvert}{\lvert N(q, r)\rvert}\right)$ |
| Approximate Neighborhood $N(q, r) \subseteq S \subseteq N(q, cr)$ | In expectation | $O(S_{ANN})$ | $\tilde{O}(T_{ANN})$ |
Locality Sensitive Hashing (LSH)
One of the main approaches to solving Nearest Neighbor problems
Hashing scheme such that close points have a higher probability of collision than far points
Hash functions: $g_1, \dots, g_L$
• $g_i$ is an independently chosen hash function
If $\lVert p - p' \rVert \le r$, they collide w.p. $\ge P_{high}$
If $\lVert p - p' \rVert \ge cr$, they collide w.p. $\le P_{low}$
For $P_{high} \ge P_{low}$
[Figure: hash tables $g_1$, $g_2$, $g_3$, ..., $g_L$]
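The collision property can be seen empirically with a concrete LSH family. The sketch below uses bit sampling for Hamming distance, one standard family; the slides do not say which family they have in mind, so this choice, the parameter values, and all function names are assumptions for illustration.

```python
import random

def make_bit_sampling_hash(dim, k, rng):
    """Return g(x) = x restricted to k coordinates chosen at random."""
    coords = [rng.randrange(dim) for _ in range(k)]
    return lambda x: tuple(x[i] for i in coords)

def collision_rate(p, p2, dim, k, trials, rng):
    """Fraction of freshly drawn hash functions under which p and p2 collide."""
    hits = 0
    for _ in range(trials):
        g = make_bit_sampling_hash(dim, k, rng)
        hits += g(p) == g(p2)
    return hits / trials

rng = random.Random(0)
dim, k = 20, 4
base = (0,) * dim
near = (1, 1) + (0,) * (dim - 2)     # Hamming distance 2 from base
far = (1,) * 10 + (0,) * (dim - 10)  # Hamming distance 10 from base
print(collision_rate(base, near, dim, k, 2000, rng))  # close pair: collides often
print(collision_rate(base, far, dim, k, 2000, rng))   # far pair: collides rarely
```

For this family the exact collision probabilities are $(1 - 2/20)^4 \approx 0.66$ for the close pair and $(1 - 10/20)^4 \approx 0.06$ for the far pair, so the empirical rates should land near those values.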
Retrieval: [Indyk, Motwani '98]
• The union of the query buckets is roughly the neighborhood of $q$
• $\bigcup_i B_i(g_i(q))$ is roughly the neighborhood
• How to report a uniformly random neighbor from the union of these buckets?
• Collecting all points might take $O(n)$ time
[Figure: query point $q$ hashed into tables $g_1$, $g_2$, $g_3$, ..., $g_L$]
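The retrieval step can be sketched as follows: build $L$ hash tables, look up the query's bucket in each, and take the union. This is a minimal illustration that assumes bit-sampling hash functions over binary tuples; the helper names `build_tables`, `query_buckets`, and `candidate_union` are hypothetical.

```python
import random
from collections import defaultdict

def build_tables(points, dim, k, L, rng):
    """Build L tables; each hashes a point by k randomly chosen coordinates."""
    tables = []
    for _ in range(L):
        coords = [rng.randrange(dim) for _ in range(k)]
        buckets = defaultdict(list)
        for idx, p in enumerate(points):
            buckets[tuple(p[i] for i in coords)].append(idx)
        tables.append((coords, buckets))
    return tables

def query_buckets(tables, q):
    """Return the bucket B_i(g_i(q)) of each table as a list of point indices."""
    return [buckets.get(tuple(q[i] for i in coords), []) for coords, buckets in tables]

def candidate_union(tables, q):
    """Union of the query's buckets; materializing it can cost O(n) time."""
    return set().union(*query_buckets(tables, q))

points = [(0,) * 8, (1,) * 8, (0,) * 7 + (1,)]
tables = build_tables(points, dim=8, k=2, L=5, rng=random.Random(0))
q = (0,) * 8
print(sorted(candidate_union(tables, q)))
```

Point 0 equals the query and therefore lands in every query bucket, while point 1 differs in every coordinate and can never collide with it.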
Approaches
Approach 1: Uniform/Uniform
How to output a random neighbor from $\bigcup_i B_i(g_i(q))$:
1. Choose a uniformly random bucket
2. Choose a uniformly random point in the bucket
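A small simulation shows why this first approach is not fair: a point sitting alone in a small bucket is reported far more often than points sharing a large bucket. The sketch assumes buckets are plain Python lists and that empty buckets are skipped; both are choices made here, not stated in the slides.

```python
import random
from collections import Counter

def sample_uniform_uniform(buckets, rng):
    """Pick a non-empty bucket uniformly, then a point in it uniformly."""
    nonempty = [b for b in buckets if b]  # skipping empty buckets is an assumption
    bucket = rng.choice(nonempty)
    return rng.choice(bucket)

buckets = [["a"], ["b", "c", "d"]]
rng = random.Random(0)
counts = Counter(sample_uniform_uniform(buckets, rng) for _ in range(6000))
print(counts)  # "a" appears about three times as often as "b", "c", or "d"
```

Here "a" is reported with probability $1/2$ and each of the other points with probability $1/6$.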
Approach 2: Weighted/Uniform
How to output a random neighbor from $\bigcup_i B_i(g_i(q))$:
1. Choose a random bucket proportional to its size
2. Choose a random point in the bucket
Each point $p$ in the neighborhood is picked w.p. proportional to its degree $d_p$ (the number of buckets that $p$ appears in)
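The degree bias of this approach can be reproduced in a few lines. In the sketch below, buckets are plain lists in which each point appears at most once per bucket (an assumption), and the function name `sample_weighted_uniform` is hypothetical.

```python
import random
from collections import Counter

def sample_weighted_uniform(buckets, rng):
    """Pick a bucket with probability proportional to its size, then a uniform point."""
    sizes = [len(b) for b in buckets]
    bucket = rng.choices(buckets, weights=sizes, k=1)[0]
    return rng.choice(bucket)

# "a" has degree 3; "b" and "c" have degree 1.
buckets = [["a", "b"], ["a", "c"], ["a"]]
rng = random.Random(0)
counts = Counter(sample_weighted_uniform(buckets, rng) for _ in range(5000))
print(counts)  # "a" is reported about three times as often as "b" or "c"
```

The two steps together pick a uniformly random (bucket, slot) pair out of the 5 slots, so "a" is reported with probability $3/5$ and the others with probability $1/5$ each.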
Approach 3: Optimal
1. Choose a random bucket proportional to its size
2. Choose a random point in the bucket
   Each point $p$ in the neighborhood is picked w.p. proportional to its degree $d_p$
3. Keep $p$ with probability $1/d_p$, otherwise repeat
• Uniform probability
• Need to spend $O(L)$ to find the degree
• Might need $O(d_{max}) = O(L)$ samples
• Total time is $O(L^2)$
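The three steps above can be sketched as a rejection sampler. This is an illustration under simplifying assumptions: buckets are plain lists with each point appearing at most once per bucket, the degree is computed exactly by scanning all buckets, and the distance check against $r$ is omitted, so it samples uniformly from the union of the buckets. The function names are hypothetical.

```python
import random
from collections import Counter

def degree(buckets, p):
    """Exact degree d_p: number of buckets containing p (costs one pass over L buckets)."""
    return sum(p in b for b in buckets)

def sample_fair(buckets, rng):
    """Weighted/uniform proposal, kept with probability 1/d_p, else retried."""
    sizes = [len(b) for b in buckets]
    while True:
        bucket = rng.choices(buckets, weights=sizes, k=1)[0]
        p = rng.choice(bucket)
        if rng.random() < 1.0 / degree(buckets, p):
            return p

buckets = [["a", "b"], ["a", "c"], ["a"]]
rng = random.Random(0)
counts = Counter(sample_fair(buckets, rng) for _ in range(6000))
print(counts)  # "a", "b", and "c" each appear about a third of the time
```

In each round a point $p$ is proposed with probability $d_p$ divided by the total bucket size and kept with probability $1/d_p$, so every point is accepted with the same probability per round.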
Approximate the degree $d_p$
Sample $O\left(\frac{L}{d_p \cdot \epsilon^2}\right)$ buckets out of $L$ buckets to $(1 + \epsilon)$-approximate the degree.
Still, if the degree is low this takes $O(L)$ samples.
Case 1: Small degree $d_p$:
• More samples are required to estimate
• Reject with lower probability -> Fewer queries of this type
Case 2: Large degree $d_p$:
• Fewer samples are required to estimate
• Reject with higher probability -> More queries of this type
This decreases the $O(L^2)$ runtime to $\tilde{O}(L)$
Large dependency on $\epsilon$ of the form $O(1/\epsilon^2)$
Via a different sampling approach we show how to reduce the dependency to logarithmic, $O(\log(1/\epsilon))$.
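The degree estimate can be sketched as plain Monte Carlo sampling over buckets. The version below takes the number of samples as a parameter rather than deriving it from $L$, $d_p$, and $\epsilon$, since the slides give only the asymptotic form; the set-based bucket representation and the name `estimate_degree` are assumptions for illustration.

```python
import random

def estimate_degree(buckets, p, num_samples, rng):
    """Estimate d_p by checking membership of p in randomly sampled buckets."""
    L = len(buckets)
    hits = sum(p in rng.choice(buckets) for _ in range(num_samples))
    return L * hits / num_samples

# 100 buckets (as sets); the point "p" appears in the first 40 of them.
buckets = [{"p", i} if i < 40 else {i} for i in range(100)]
rng = random.Random(0)
print(estimate_degree(buckets, "p", 4000, rng))  # close to the true degree 40
```

A point of low degree is hit by few sampled buckets, which is why it needs more samples for the same relative accuracy, as Case 1 notes.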
Experiments
Setup
• Take MNIST as the data set
• Ask a query several times and compute the empirical distribution of the neighbors.
• Compute the statistical distance of the empirical distribution to the uniform distribution
Comparison
• Our algorithm performs 2.5 times worse than the optimal algorithm, but the other two perform 7 and 10 times worse than the optimal.
• Four times faster than the optimal but 15 times slower than the other two
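The evaluation metric in the setup can be sketched as follows. The code assumes that "statistical distance" means total variation distance and that every sample lies in the given support; the function name is hypothetical and this is not the authors' experimental code.

```python
from collections import Counter

def total_variation_from_uniform(samples, support):
    """Total variation distance between the empirical distribution and uniform."""
    counts = Counter(samples)
    n = len(samples)
    uniform = 1.0 / len(support)
    return 0.5 * sum(abs(counts[x] / n - uniform) for x in support)

print(total_variation_from_uniform(["a", "b"], ["a", "b"]))  # 0.0
print(total_variation_from_uniform(["a", "a"], ["a", "b"]))  # 0.5
```

A perfectly fair sampler drives this value toward 0 as the number of repeated queries grows, while a sampler that always returns the same neighbor out of two gives 0.5.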
Conclusion

| Domain | Guarantee | Space | Query |
|---|---|---|---|
| Exact Neighborhood $N(q, r)$ | w.h.p. | $O(S_{ANN})$ | $\tilde{O}\left(T_{ANN} \cdot \frac{\lvert N(q, cr)\rvert}{\lvert N(q, r)\rvert}\right)$ |
| Approximate Neighborhood $N(q, r) \subseteq S \subseteq N(q, cr)$ | In expectation | $O(S_{ANN})$ | $\tilde{O}(T_{ANN})$ |

• The dependence on the parameter $n$ matches the standard Nearest Neighbor.
• We get an independent near neighbor each time we draw a sample.
• More generally the approach works for sampling from a sub-collection of sets.
Open Problem:
o Finding the optimal dependency on the density parameter: $\frac{\lvert N(q, cr)\rvert}{\lvert N(q, r)\rvert}$
Thanks
Questions?