Transcript
Page 1: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Near Neighbor Problem Made Fair

Sariel Har-PeledUIUC

Sepideh MahabadiTTIC

Page 2: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Nearest Neighbor Problems

β€’ Nearest Neighbor: Given a set of objects, find the closest one to the query object.

Page 3: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Nearest Neighbor Problems

β€’ Nearest Neighbor: Given a set of objects, find the closest one to the query object.

Page 4: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Nearest Neighbor Problems

β€’ Nearest Neighbor: Given a set of objects, find the closest one to the query object.

β€’ Near Neighbor: given a set of objects, find one that is close enough to the query object.

Page 5: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

There are many applications of NN

Searching for the closest object

Page 6: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Near Neighbor

Dataset of 𝑛𝑛 points 𝑃𝑃 in a metric space, e.g. ℝ𝑑𝑑, and a parameter π‘Ÿπ‘Ÿ

Page 7: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Near Neighbor

Dataset of 𝑛𝑛 points 𝑃𝑃 in a metric space, e.g. ℝ𝑑𝑑, and a parameter π‘Ÿπ‘ŸA query point π‘žπ‘ž comes online

π‘žπ‘ž

π‘Ÿπ‘Ÿ

Page 8: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Near Neighbor

Dataset of 𝑛𝑛 points 𝑃𝑃 in a metric space, e.g. ℝ𝑑𝑑, and a parameter π‘Ÿπ‘ŸA query point π‘žπ‘ž comes online

Goal: β€’ Find a point π‘π‘βˆ— in the π‘Ÿπ‘Ÿ-neighborhood

π‘žπ‘žπ‘π‘βˆ—

π‘Ÿπ‘Ÿ

Page 9: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Near Neighbor

Dataset of 𝑛𝑛 points 𝑃𝑃 in a metric space, e.g. ℝ𝑑𝑑, and a parameter π‘Ÿπ‘ŸA query point π‘žπ‘ž comes online

Goal: β€’ Find a point π‘π‘βˆ— in the π‘Ÿπ‘Ÿ-neighborhoodβ€’ Do it in sub-linear time and small space

π‘žπ‘žπ‘π‘βˆ—

π‘Ÿπ‘Ÿ

Page 10: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Near Neighbor

Dataset of 𝑛𝑛 points 𝑃𝑃 in a metric space, e.g. ℝ𝑑𝑑, and a parameter π‘Ÿπ‘ŸA query point π‘žπ‘ž comes online

Goal: β€’ Find a point π‘π‘βˆ— in the π‘Ÿπ‘Ÿ-neighborhoodβ€’ Do it in sub-linear time and small space

All existing algorithms for this problemβ€’ Either space or query time depending exponentially on 𝑑𝑑‒ Or assume certain properties about the data, e.g., bounded

intrinsic dimension

π‘žπ‘žπ‘π‘βˆ—

π‘Ÿπ‘Ÿ

Page 11: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Approximate Near Neighbor

Dataset of 𝑛𝑛 points 𝑃𝑃 in a metric space, e.g. ℝ𝑑𝑑, and a parameter π‘Ÿπ‘ŸA query point π‘žπ‘ž comes online

Goal: β€’ Find a point π‘π‘βˆ— in the π‘Ÿπ‘Ÿ-neighborhoodβ€’ Do it in sub-linear time and small spaceβ€’ Approximate Near Neighbor

─ Report a point in distance cπ‘Ÿπ‘Ÿ for c > 1

π‘žπ‘žπ‘π‘

π‘Ÿπ‘Ÿπ‘π‘π‘Ÿπ‘Ÿ

Page 12: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Approximate Near Neighbor

Dataset of 𝑛𝑛 points 𝑃𝑃 in a metric space, e.g. ℝ𝑑𝑑, and a parameter π‘Ÿπ‘ŸA query point π‘žπ‘ž comes online

Goal: β€’ Find a point π‘π‘βˆ— in the π‘Ÿπ‘Ÿ-neighborhoodβ€’ Do it in sub-linear time and small spaceβ€’ Approximate Near Neighbor

─ Report a point in distance cπ‘Ÿπ‘Ÿ for c > 1─ For Hamming (and Manhattan) query time is 𝑛𝑛𝑂𝑂(1/𝑐𝑐) [IM98] ─ and for Euclidean it is 𝑛𝑛𝑂𝑂( 1

𝑐𝑐2) [AI08]

π‘žπ‘ž

π‘Ÿπ‘Ÿπ‘π‘π‘Ÿπ‘Ÿπ‘π‘

Page 13: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Fair Near Neighbor

Sample a neighbor of the query uniformly at random

Individual fairness: every neighbor has the same chance of being reported. Remove the bias inherent in the NN data structure

Page 14: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Fair Near Neighbor

Sample a neighbor of the query uniformly at random

Individual fairness: every neighbor has the same chance of being reported. Remove the bias inherent in the NN data structure

Applications:Removing noise, k-NN classificationAnonymizing the dataCounting the neighborhood size

Page 15: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Fair Near Neighbor

Dataset of 𝑛𝑛 points 𝑃𝑃 in a metric space, e.g. ℝ𝑑𝑑, and a parameter π‘Ÿπ‘ŸA query point π‘žπ‘ž comes online

Goal: β€’ Return each point 𝑝𝑝 in the neighborhood of π‘žπ‘ž with uniform

probabilityβ€’ Do it in sub-linear time and small space

π‘žπ‘ž12

π‘Ÿπ‘Ÿ 12

Page 16: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Approximate Fair Near Neighbor

Dataset of 𝑛𝑛 points 𝑃𝑃 in a metric space, e.g. ℝ𝑑𝑑, and a parameter π‘Ÿπ‘ŸA query point π‘žπ‘ž comes online

Goal of Approximate Fair NN─ Any point 𝑝𝑝 in 𝑁𝑁(π‘žπ‘ž, π‘Ÿπ‘Ÿ) is reported with β€œalmost uniform”

probability, i.e., πœ†πœ†π‘žπ‘ž(𝑝𝑝) where

11 + πœ–πœ– 𝑁𝑁 π‘žπ‘ž, π‘Ÿπ‘Ÿ

≀ πœ†πœ†π‘žπ‘ž(𝑝𝑝) ≀1 + πœ–πœ–π‘π‘ π‘žπ‘ž, π‘Ÿπ‘Ÿ

π‘žπ‘ž12

π‘Ÿπ‘Ÿ 12

Page 17: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Results on (1 + πœ–πœ–)-Approximate Fair NN

𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴 and 𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴 are the space and query time of standard ANN

Domain Guarantee Space Query

Exact Neighborhood𝑁𝑁(π‘žπ‘ž, π‘Ÿπ‘Ÿ)

w.h.p 𝑂𝑂(𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴) �𝑂𝑂(𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴 ⋅𝑁𝑁 π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿπ‘π‘ π‘žπ‘ž, π‘Ÿπ‘Ÿ

)

π‘žπ‘ž

π‘Ÿπ‘Ÿ

Page 18: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Results on (1 + πœ–πœ–)-Approximate Fair NN

𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴 and 𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴 are the space and query time of standard ANN

Approximate neighborhood: a set 𝑆𝑆 such that 𝑁𝑁 π‘žπ‘ž, π‘Ÿπ‘Ÿ βŠ† 𝑆𝑆 βŠ† 𝑁𝑁 π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿ

Domain Guarantee Space Query

Exact Neighborhood𝑁𝑁(π‘žπ‘ž, π‘Ÿπ‘Ÿ)

w.h.p 𝑂𝑂(𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴) �𝑂𝑂(𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴 ⋅𝑁𝑁 π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿπ‘π‘ π‘žπ‘ž, π‘Ÿπ‘Ÿ

)

Approximate Neighborhood𝑁𝑁 π‘žπ‘ž, π‘Ÿπ‘Ÿ βŠ† 𝑆𝑆 βŠ† 𝑁𝑁(π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿ)

In expectation 𝑂𝑂(𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴) �𝑂𝑂(𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴)

π‘žπ‘ž

π‘Ÿπ‘Ÿπ‘žπ‘ž

π‘Ÿπ‘Ÿπ‘π‘π‘Ÿπ‘Ÿ

Page 19: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Results on (1 + πœ–πœ–)-Approximate Fair NN

𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴 and 𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴 are the space and query time of standard ANN

Approximate neighborhood: a set 𝑆𝑆 such that 𝑁𝑁 π‘žπ‘ž, π‘Ÿπ‘Ÿ βŠ† 𝑆𝑆 βŠ† 𝑁𝑁 π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿ

Dependence on πœ–πœ– is O(log(1πœ–πœ–))

Domain Guarantee Space Query

Exact Neighborhood𝑁𝑁(π‘žπ‘ž, π‘Ÿπ‘Ÿ)

w.h.p 𝑂𝑂(𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴) �𝑂𝑂(𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴 ⋅𝑁𝑁 π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿπ‘π‘ π‘žπ‘ž, π‘Ÿπ‘Ÿ

)

Approximate Neighborhood𝑁𝑁 π‘žπ‘ž, π‘Ÿπ‘Ÿ βŠ† 𝑆𝑆 βŠ† 𝑁𝑁(π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿ)

In expectation 𝑂𝑂(𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴) �𝑂𝑂(𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴)

Page 20: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Results on (1 + πœ–πœ–)-Approximate Fair NN

𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴 and 𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴 are the space and query time of standard ANN

Approximate neighborhood: a set 𝑆𝑆 such that 𝑁𝑁 π‘žπ‘ž, π‘Ÿπ‘Ÿ βŠ† 𝑆𝑆 βŠ† 𝑁𝑁 π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿ

Dependence on πœ–πœ– is O(log(1πœ–πœ–))

Experiments

Domain Guarantee Space Query

Exact Neighborhood𝑁𝑁(π‘žπ‘ž, π‘Ÿπ‘Ÿ)

w.h.p 𝑂𝑂(𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴) �𝑂𝑂(𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴 ⋅𝑁𝑁 π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿπ‘π‘ π‘žπ‘ž, π‘Ÿπ‘Ÿ

)

Approximate Neighborhood𝑁𝑁 π‘žπ‘ž, π‘Ÿπ‘Ÿ βŠ† 𝑆𝑆 βŠ† 𝑁𝑁(π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿ)

In expectation 𝑂𝑂(𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴) �𝑂𝑂(𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴)

Page 21: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Results on (1 + πœ–πœ–)-Approximate Fair NN

𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴 and 𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴 are the space and query time of standard ANN

Approximate neighborhood: a set 𝑆𝑆 such that 𝑁𝑁 π‘žπ‘ž, π‘Ÿπ‘Ÿ βŠ† 𝑆𝑆 βŠ† 𝑁𝑁 π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿ

Dependence on πœ–πœ– is O(log(1πœ–πœ–))

Experiments

Recent paper [Aumuller, Pagh, Silvestry’19] defining the same notion

Domain Guarantee Space Query

Exact Neighborhood𝑁𝑁(π‘žπ‘ž, π‘Ÿπ‘Ÿ)

w.h.p 𝑂𝑂(𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴) �𝑂𝑂(𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴 ⋅𝑁𝑁 π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿπ‘π‘ π‘žπ‘ž, π‘Ÿπ‘Ÿ

)

Approximate Neighborhood𝑁𝑁 π‘žπ‘ž, π‘Ÿπ‘Ÿ βŠ† 𝑆𝑆 βŠ† 𝑁𝑁(π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿ)

In expectation 𝑂𝑂(𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴) �𝑂𝑂(𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴)

Page 22: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Locality Sensitive Hashing (LSH)One of the main approaches to solve the Nearest Neighbor problems

Page 23: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Hashing scheme s.t. close points have higher probability of collision than far points

Locality Sensitive Hashing (LSH)

Page 24: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Hashing scheme s.t. close points have higher probability of collision than far pointsHash functions: 𝑔𝑔1 , … ,𝑔𝑔𝐿𝐿

β€’ 𝑔𝑔𝑖𝑖 is an independently chosen hash function

Locality Sensitive Hashing (LSH)

𝑔𝑔1

𝑔𝑔2

𝑔𝑔3

𝑔𝑔𝐿𝐿

Page 25: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Hashing scheme s.t. close points have higher probability of collision than far pointsHash functions: 𝑔𝑔1 , … ,𝑔𝑔𝐿𝐿

β€’ 𝑔𝑔𝑖𝑖 is an independently chosen hash function

If 𝑝𝑝 βˆ’ 𝑝𝑝′ ≀ π‘Ÿπ‘Ÿ , they collide w.p. β‰₯ π‘ƒπ‘ƒβ„Žπ‘–π‘–π‘–π‘–β„ŽIf 𝑝𝑝 βˆ’ 𝑝𝑝′ β‰₯ π‘π‘π‘Ÿπ‘Ÿ , they collide w.p. ≀ 𝑃𝑃𝑙𝑙𝑙𝑙𝑙𝑙

For π‘ƒπ‘ƒβ„Žπ‘–π‘–π‘–π‘–β„Ž β‰₯ 𝑃𝑃𝑙𝑙𝑙𝑙𝑙𝑙

Locality Sensitive Hashing (LSH)

𝑔𝑔1

𝑔𝑔2

𝑔𝑔3

𝑔𝑔𝐿𝐿

Page 26: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Retrieval: [Indyk, Motwani’98]β€’ The union of the query buckets is roughly

the neighborhood of π‘žπ‘ž

β€’ ⋃𝑖𝑖 𝐡𝐡𝑖𝑖 𝑔𝑔𝑖𝑖 π‘žπ‘ž is roughly the neighborhood

Locality Sensitive Hashing (LSH)

π‘žπ‘ž

𝑔𝑔1

𝑔𝑔2

𝑔𝑔3

𝑔𝑔𝐿𝐿

Page 27: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Retrieval: [Indyk, Motwani’98]β€’ The union of the query buckets is roughly

the neighborhood of π‘žπ‘ž

β€’ ⋃𝑖𝑖 𝐡𝐡𝑖𝑖 𝑔𝑔𝑖𝑖 π‘žπ‘ž is roughly the neighborhood

β€’ How to report a uniformly random neighbor from union of these buckets?

Locality Sensitive Hashing (LSH)

π‘žπ‘ž

𝑔𝑔1

𝑔𝑔2

𝑔𝑔3

𝑔𝑔𝐿𝐿

Page 28: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Retrieval: [Indyk, Motwani’98]β€’ The union of the query buckets is roughly

the neighborhood of π‘žπ‘ž

β€’ ⋃𝑖𝑖 𝐡𝐡𝑖𝑖 𝑔𝑔𝑖𝑖 π‘žπ‘ž is roughly the neighborhood

β€’ How to report a uniformly random neighbor from union of these buckets?

β€’ Collecting all points might take 𝑂𝑂(𝑛𝑛) time

Locality Sensitive Hashing (LSH)

π‘žπ‘ž

𝑔𝑔1

𝑔𝑔2

𝑔𝑔3

𝑔𝑔𝐿𝐿

Page 29: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Approaches

Page 30: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

How to output a random neighbor from ⋃𝑖𝑖 𝐡𝐡𝑖𝑖 𝑔𝑔𝑖𝑖 π‘žπ‘ž :

1. Choose a uniformly random bucket2. Choose a uniformly random point in the

bucket

Approach 1: Uniform/Uniform

π‘žπ‘ž

𝑔𝑔1

𝑔𝑔2

𝑔𝑔3

𝑔𝑔𝐿𝐿

Page 31: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

How to output a random neighbor from ⋃𝑖𝑖 𝐡𝐡𝑖𝑖 𝑔𝑔𝑖𝑖 π‘žπ‘ž :

1. Choose a random bucket proportional to its size

2. Choose a random point in the bucket

Approach 2: Weighted/Uniform

π‘žπ‘ž

𝑔𝑔1

𝑔𝑔2

𝑔𝑔3

𝑔𝑔𝐿𝐿

Page 32: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

How to output a random neighbor from ⋃𝑖𝑖 𝐡𝐡𝑖𝑖 𝑔𝑔𝑖𝑖 π‘žπ‘ž :

1. Choose a random bucket proportional to its size

2. Choose a random point in the bucket Each point 𝑝𝑝 in the neighborhood is picked

w.p. proportional to its degree 𝑑𝑑𝑝𝑝

Approach 2: Weighted/Uniform

π‘žπ‘ž

Number of buckets that 𝒑𝒑appears in

𝑔𝑔1

𝑔𝑔2

𝑔𝑔3

𝑔𝑔𝐿𝐿

Page 33: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

How to output a random neighbor from ⋃𝑖𝑖 𝐡𝐡𝑖𝑖 𝑔𝑔𝑖𝑖 π‘žπ‘ž :

1. Choose a random bucket proportional to its size

2. Choose a random point in the bucket Each point 𝑝𝑝 in the neighborhood is picked

w.p. proportional to its degree 𝑑𝑑𝑝𝑝

3. Keep 𝑝𝑝 with probability 1𝑑𝑑𝑝𝑝

, o.w. repeat

Approach 3: Optimal

π‘žπ‘ž

𝑔𝑔1

𝑔𝑔2

𝑔𝑔3

𝑔𝑔𝐿𝐿

Page 34: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

1. Choose a random bucket proportional to its size

2. Choose a random point in the bucket Each point 𝑝𝑝 in the neighborhood is picked

w.p. proportional to its degree 𝑑𝑑𝑝𝑝

3. Keep 𝑝𝑝 with probability 1𝑑𝑑𝑝𝑝

, o.w. repeat

Uniform probability

π‘žπ‘ž

Approach 3: Optimal

𝑔𝑔1

𝑔𝑔2

𝑔𝑔3

𝑔𝑔𝐿𝐿

Page 35: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

1. Choose a random bucket proportional to its size

2. Choose a random point in the bucket Each point 𝑝𝑝 in the neighborhood is picked

w.p. proportional to its degree 𝑑𝑑𝑝𝑝

3. Keep 𝑝𝑝 with probability 1𝑑𝑑𝑝𝑝

, o.w. repeat

Uniform probability Need to spend 𝑂𝑂(𝐿𝐿) to find the degree

π‘žπ‘ž

Approach 3: Optimal

𝑔𝑔1

𝑔𝑔2

𝑔𝑔3

𝑔𝑔𝐿𝐿

Page 36: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

1. Choose a random bucket proportional to its size

2. Choose a random point in the bucket Each point 𝑝𝑝 in the neighborhood is picked

w.p. proportional to its degree 𝑑𝑑𝑝𝑝

3. Keep 𝑝𝑝 with probability 1𝑑𝑑𝑝𝑝

, o.w. repeat

Uniform probability Need to spend 𝑂𝑂(𝐿𝐿) to find the degree Might need 𝑂𝑂 π‘‘π‘‘π‘šπ‘šπ‘šπ‘šπ‘šπ‘š = 𝑂𝑂(𝐿𝐿) samples Total time is 𝑂𝑂(𝐿𝐿2)

π‘žπ‘ž

Approach 3: Optimal

𝑔𝑔1

𝑔𝑔2

𝑔𝑔3

𝑔𝑔𝐿𝐿

Page 37: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Approximate the degree 𝑑𝑑𝑝𝑝Sample 𝑂𝑂( 𝐿𝐿

π‘‘π‘‘π‘π‘β‹…πœ–πœ–2) buckets out of 𝐿𝐿 buckets to (1 + πœ–πœ–)-approximate the degree.

Still if the degree is low this takes 𝑂𝑂(𝐿𝐿) samples.

Page 38: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Approximate the degree 𝑑𝑑𝑝𝑝Sample 𝑂𝑂( 𝐿𝐿

π‘‘π‘‘π‘π‘β‹…πœ–πœ–2) buckets out of 𝐿𝐿 buckets to (1 + πœ–πœ–)-approximate the degree.

Still if the degree is low this takes 𝑂𝑂(𝐿𝐿) samples.

Case 1: Small degree 𝒅𝒅𝒑𝒑:β€’ More samples are required to estimateβ€’ Reject with lower probability -> Fewer queries of this type

Case 2: Large degree 𝒅𝒅𝒑𝒑:β€’ Fewer samples are required to estimateβ€’ Reject with higher probability -> More queries of this type

Page 39: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Approximate the degree 𝑑𝑑𝑝𝑝Sample 𝑂𝑂( 𝐿𝐿

π‘‘π‘‘π‘π‘β‹…πœ–πœ–2) buckets out of 𝐿𝐿 buckets to (1 + πœ–πœ–)-approximate the degree.

Still if the degree is low this takes 𝑂𝑂(𝐿𝐿) samples.

Case 1: Small degree 𝒅𝒅𝒑𝒑:β€’ More samples are required to estimateβ€’ Reject with lower probability -> Fewer queries of this type

Case 2: Large degree 𝒅𝒅𝒑𝒑:β€’ Fewer samples are required to estimateβ€’ Reject with higher probability -> More queries of this type

This decreases 𝑂𝑂(𝐿𝐿2) runtime to �𝑂𝑂(𝐿𝐿) Large dependency on πœ–πœ– of the form 𝑂𝑂( 1

πœ–πœ–2)

Via a different sampling approach we show how to reduce the dependency to logarithmic 𝑂𝑂(log 1

πœ–πœ–).

Page 40: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Experiments

Setupβ€’ Take MNIST as the data setβ€’ Ask a query several times and compute the empirical distribution of the

neighbors.β€’ Compute the statistical distance of the empirical distribution to the uniform

distribution

Page 41: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Experiments

Setupβ€’ Take MNIST as the data setβ€’ Ask a query several times and compute the empirical distribution of the

neighbors.β€’ Compute the statistical distance of the empirical distribution to the uniform

distributionComparisonβ€’ Our algorithm performs 2.5 times worse than the optimal algorithm, but the

other two perform 7 and 10 times worse than the optimal.β€’ Four times faster than the optimal but 15 times slower than the other two

Page 42: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Conclusion

Domain Guarantee Space Query

Exact Neighborhood𝑁𝑁(π‘žπ‘ž, π‘Ÿπ‘Ÿ)

w.h.p 𝑂𝑂(𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴) �𝑂𝑂(𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴 ⋅𝑁𝑁 π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿπ‘π‘ π‘žπ‘ž, π‘Ÿπ‘Ÿ

)

Approximate Neighborhood𝑁𝑁 π‘žπ‘ž, π‘Ÿπ‘Ÿ βŠ† 𝑆𝑆 βŠ† 𝑁𝑁(π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿ)

In expectation 𝑂𝑂(𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴) �𝑂𝑂(𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴)

The dependence on the parameter 𝑛𝑛 matches the standard Nearest Neighbor.We get an independent near neighbor each time we draw a sample. More generally the approach works for sampling form a sub-

collection of sets.

Page 43: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Conclusion

Domain Guarantee Space Query

Exact Neighborhood𝑁𝑁(π‘žπ‘ž, π‘Ÿπ‘Ÿ)

w.h.p 𝑂𝑂(𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴) �𝑂𝑂(𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴 ⋅𝑁𝑁 π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿπ‘π‘ π‘žπ‘ž, π‘Ÿπ‘Ÿ

)

Approximate Neighborhood𝑁𝑁 π‘žπ‘ž, π‘Ÿπ‘Ÿ βŠ† 𝑆𝑆 βŠ† 𝑁𝑁(π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿ)

In expectation 𝑂𝑂(𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴) �𝑂𝑂(𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴)

The dependence on the parameter 𝑛𝑛 matches the standard Nearest Neighbor.We get an independent near neighbor each time we draw a sample. More generally the approach works for sampling form a sub-

collection of sets.Open Problem:

o Finding the optimal dependency on the density parameter: 𝐴𝐴 π‘žπ‘ž,𝑐𝑐𝑐𝑐𝐴𝐴 π‘žπ‘ž,𝑐𝑐

Page 44: Near Neighbor Problem Made Fairmahabadi/slides/FairNN.pdfAll existing algorithms for this problem ... Fair Near Neighbor. Dataset of 𝑛𝑛points 𝑃𝑃in a metric space, e.g

Conclusion

Domain Guarantee Space Query

Exact Neighborhood𝑁𝑁(π‘žπ‘ž, π‘Ÿπ‘Ÿ)

w.h.p 𝑂𝑂(𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴) �𝑂𝑂(𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴 ⋅𝑁𝑁 π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿπ‘π‘ π‘žπ‘ž, π‘Ÿπ‘Ÿ

)

Approximate Neighborhood𝑁𝑁 π‘žπ‘ž, π‘Ÿπ‘Ÿ βŠ† 𝑆𝑆 βŠ† 𝑁𝑁(π‘žπ‘ž, π‘π‘π‘Ÿπ‘Ÿ)

In expectation 𝑂𝑂(𝑆𝑆𝐴𝐴𝐴𝐴𝐴𝐴) �𝑂𝑂(𝑇𝑇𝐴𝐴𝐴𝐴𝐴𝐴)

ThanksQuestions?

The dependence on the parameter 𝑛𝑛 matches the standard Nearest Neighbor.We get an independent near neighbor each time we draw a sample. More generally the approach works for sampling form a sub-

collection of sets.Open Problem:

o Finding the optimal dependency on the density parameter: 𝐴𝐴 π‘žπ‘ž,𝑐𝑐𝑐𝑐𝐴𝐴 π‘žπ‘ž,𝑐𝑐


Top Related