TRANSCRIPT
Ryan O’Donnell (CMU, IAS)
joint work with
Yi Wu (CMU, IBM), Yuan Zhou (CMU)
Locality Sensitive Hashing [Indyk–Motwani ’98]
h : objects → sketches
H : family of hash functions h s.t.
“similar” objects collide w/ high prob.
“dissimilar” objects collide w/ low prob.
Abbreviated history
Broder ’97, Altavista
A = 0 1 1 1 0 0 1 0 0
B = 1 1 1 0 0 0 1 0 1
(coordinate i: is word i present? for words 1 … d)
Jaccard similarity: J(A, B) = |A ∩ B| / |A ∪ B|
Invented simple H s.t. Pr_h[h(A) = h(B)] = J(A, B).
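A minimal sketch of the minhash construction, assuming documents are represented as nonempty sets of word-indices (names and parameters here are illustrative, not from the talk):

```python
import random

def random_minhash(d, seed=None):
    """Sample one h from the minhash family: pick a random permutation
    pi of the d word-indices; h(A) is the earliest index of A under pi.
    (A is assumed to be a nonempty subset of range(d).)"""
    rng = random.Random(seed)
    pi = list(range(d))
    rng.shuffle(pi)
    return lambda A: next(i for i in pi if i in A)

# The pi-first element of A ∪ B is uniform on A ∪ B, and h(A) = h(B)
# exactly when that element lies in A ∩ B, so
#     Pr_h[h(A) = h(B)] = |A ∩ B| / |A ∪ B| = J(A, B).
A, B = {1, 2, 3, 6}, {0, 1, 2, 6, 8}
hits = 0
for _ in range(100_000):
    h = random_minhash(9)
    hits += h(A) == h(B)
print(hits / 100_000, len(A & B) / len(A | B))   # both ≈ 0.5
```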
Indyk–Motwani ’98 (cf. Gionis–I–M ’98)
Defined LSH.
Invented very simple H good for
{0,1}^d under Hamming distance.
Showed good LSH implies good
nearest-neighbor-search data structs.
Charikar ’02, STOC
Proposed alternate H (“simhash”, via
random hyperplanes) for angular (cosine) similarity.
Many papers about LSH
Practice:
• Free code base [AI’04]
• Sequence comparison in bioinformatics
• Association-rule finding in data mining
• Collaborative filtering
• Clustering nouns by meaning in NLP
• Pose estimation in vision
• …

Theory:
[Broder ’97], [Indyk–Motwani ’98], [Gionis–Indyk–Motwani ’98], [Charikar ’02], [Datar–Immorlica–Indyk–Mirrokni ’04], [Motwani–Naor–Panigrahy ’06], [Andoni–Indyk ’06], [Terasawa–Tanaka ’07], [Andoni–Indyk ’08, CACM], [Neylon ’10]
Given: (X, dist), r > 0, c > 1
(distance space; “radius”; “approx factor”)
Goal: Family H of functions X → S (S can be any finite set)
s.t. ∀ x, y ∈ X:
  dist(x, y) ≤ r  ⇒ Pr_{h∈H}[h(x) = h(y)] ≥ p
  dist(x, y) ≥ cr ⇒ Pr_{h∈H}[h(x) = h(y)] ≤ q
Want p large relative to q: p ≥ q^{.5}? ≥ q^{.25}? ≥ q^{.1}?
In general p ≥ q^ρ, i.e. ρ = ln(1/p)/ln(1/q); the smaller ρ, the better.
Theorem [IM’98, GIM’98]:
Given LSH family for (X, dist),
can solve “(r, cr)-near-neighbor search”
for n points with data structure of
  size: O(n^{1+ρ})
  query time: Õ(n^ρ) hash fcn evals.
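Roughly where these bounds come from, as a hedged sketch of the standard reduction (not the theorem's exact parameters): concatenate k = log_{1/q}(n) hash functions so far points rarely share a bucket, and keep L ≈ n^ρ independent tables so a near point shares a bucket in at least one table with constant probability. The class below is illustrative; `family` is an assumed sampler for the LSH family.

```python
import math

class NearNeighborSketch:
    """Sketch of the [IM'98, GIM'98] reduction from an LSH family with
    collision probabilities p (near) and q (far).  Illustrative only."""

    def __init__(self, points, family, p, q):
        n = len(points)
        k = max(1, round(math.log(n) / math.log(1 / q)))  # q^k ≈ 1/n
        rho = math.log(1 / p) / math.log(1 / q)
        L = math.ceil(n ** rho)      # one table succeeds w.p. ≈ p^k = n^{-rho}
        self.tables = []
        for _ in range(L):           # total space: L·n ≈ n^{1+rho} entries
            g = [family() for _ in range(k)]          # concatenated hash
            table = {}
            for pt in points:
                table.setdefault(tuple(h(pt) for h in g), []).append(pt)
            self.tables.append((g, table))

    def candidates(self, x):
        # L·k ≈ Õ(n^rho) hash evaluations; caller verifies dist(x, ·) ≤ cr.
        for g, table in self.tables:
            yield from table.get(tuple(h(x) for h in g), [])
```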
Example
X = {0,1}^d, dist = Hamming; r = ϵd, c = 5.
Decide: dist ≤ ϵd, or ≥ 5ϵd?
x = 0 1 1 1 0 0 1 0 0
y = 1 1 1 0 0 0 1 0 1
[IM’98]: H = { h_1, h_2, …, h_d }, h_i(x) = x_i
“output a random coord.”
Analysis:
  dist(x, y) ≤ ϵd  ⇒ Pr[h(x) = h(y)] ≥ 1 − ϵ  = p
  dist(x, y) ≥ 5ϵd ⇒ Pr[h(x) = h(y)] ≤ 1 − 5ϵ = q
(1 − 5ϵ)^{1/5} ≤ 1 − ϵ. ∴ ρ = ln(1/p)/ln(1/q) ≤ 1/5.
In general, bit-sampling achieves ρ ≤ 1/c, ∀ c (∀ r).
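Numerically, with (say) ϵ = 0.05, a quick check of the claimed exponent (the value of ϵ is illustrative):

```python
import math

eps, c = 0.05, 5
p = 1 - eps        # near pairs:  Pr[h(x) = h(y)] >= 1 - eps
q = 1 - c * eps    # far pairs:   Pr[h(x) = h(y)] <= 1 - 5*eps
rho = math.log(1 / p) / math.log(1 / q)
print(rho, 1 / c)  # 0.178... <= 0.2, i.e. rho <= 1/c
```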
“Optimal” upper bound
({0, 1}^d, Ham), r > 0, c > 1.
S ≝ {0, 1}^d ∪ {✔},  H ≝ { h_ab : dist(a, b) ≤ r }
h_ab(x) = ✔ if x = a or x = b
          x otherwise
Near pairs collide with positive probability (p = 1/|H|), far pairs never (q = 0),
so ρ = ln(1/p)/ln(1/q) = 0: “better” than ρ = 0.5 > 0.1 > 0.01 > 0.0001 > …
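A brute-force check of this degenerate family for small d (a sketch; ✔ is modeled as the string "CHECK"):

```python
import itertools

d, r = 8, 2
ham = lambda u, v: sum(a != b for a, b in zip(u, v))
cube = list(itertools.product([0, 1], repeat=d))

# H = { h_ab : dist(a, b) <= r },  h_ab(x) = CHECK if x in {a, b} else x
H = [(a, b) for a in cube for b in cube if a < b and ham(a, b) <= r]
h = lambda ab, x: "CHECK" if x in ab else x

x      = (0,) * d
y_near = (1, 1) + (0,) * (d - 2)   # dist 2 <= r:  collide only under h_{x,y}
y_far  = (1,) * d                  # dist 8 >= cr: never collide

p = sum(h(ab, x) == h(ab, y_near) for ab in H) / len(H)
q = sum(h(ab, x) == h(ab, y_far)  for ab in H) / len(H)
print(p, q)   # p = 1/|H| > 0 (but 2^{-Theta(d)}-tiny),  q = 0  =>  rho = 0
```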
Wait, what?
[IM’98, GIM’98] Theorem:
Given LSH family for (X, dist), can solve “(r, cr)-near-neighbor search”
for n points with data structure of
  size: Õ(n^{1+ρ})
  query time: Õ(n^ρ) hash fcn evals.
The catch: for the family above, p and q are 0 or 2^{−Θ(d)}, and the reduction is only meaningful when q is not tiny.
More results
For ℝ^d with ℓ_p-distance: ρ ≤ 1/c^p, when p = 1 [IM’98], 0 < p < 1 [DIIM’04], p = 2 [AI’06].
For Jaccard similarity: ρ ≤ 1/c [Bro’97].
For {0,1}^d with Hamming distance: ρ ≥ 0.462/c − o_d(1) (assuming q ≥ 2^{−o(d)}) [MNP’06]
(extends immediately to ℓ_p-distance).
Our Theorem
For {0,1}^d with Hamming distance (∃ r s.t.):
  ρ ≥ 1/c − o_d(1)  (assuming q ≥ 2^{−o(d)})
(extends immediately to ℓ_p-distance).
Proof also yields ρ ≥ 1/c for Jaccard.
Proof: “Noise stability is log-convex.”
More precisely: a definition, and two lemmas.
Fix any function h : {0,1}^d → S.
Pick x ∈ {0,1}^d at random:   x = 0 1 1 1 0 0 1 0 0,  h(x) = s
Run a continuous-time (lazy) random walk from x for time τ, obtaining y:
                              y = 0 0 1 1 0 0 1 1 0,  h(y) = s′
def:  K_h(τ) ≝ Pr[h(x) = h(y)]  (the noise stability of h at time τ).
Lemma 1: For x →_τ y,  dist(x, y) ≈ (τ/2)·d  w.v.h.p., when τ ≪ 1.
Lemma 2: K_h(τ) is a log-convex function of τ  (for any h).
From which the proof of ρ ≥ 1/c follows easily.
[Figure: K_h(τ) decreasing from K_h(0) = 1 as τ grows.]
Continuous-Time Random Walk
An “Exponential(1) alarm clock” repeatedly:
— waits Exponential(1) seconds,
— dings.
(Reminder: T ~ Expon(1) means Pr[T > u] = e^{−u}.)
In the C.T.R.W. on {0,1}^d, each coordinate gets
its own independent alarm clock.
When the ith clock dings, coordinate i is rerandomized.
x = 0 1 1 1 0 0 1 0 0 1
y = 0 1 0 1 0 0 1 0 1 1   (after running for time τ)
Pr[coord. i never updated] = Pr[Exp(1) > τ] = e^{−τ}
∴ Pr[x_i ≠ y_i] = ½(1 − e^{−τ}),  independently across i
⇒ Lemma 1: dist(x, y) ≈ ½(1 − e^{−τ})·d ≈ (τ/2)·d  w.v.h.p., for τ ≪ 1.
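A small simulation of the walk, checking both the per-coordinate formula and Lemma 1 (the test function h below is an arbitrary illustrative choice):

```python
import math, random

def walk(x, tau):
    """Time-tau C.T.R.W.: coordinate i is rerandomized iff its
    Exponential(1) clock dinged in [0, tau], i.e. with probability
    1 - e^{-tau}, independently across coordinates."""
    p_upd = 1 - math.exp(-tau)
    return [random.randint(0, 1) if random.random() < p_upd else xi
            for xi in x]

d, tau, trials = 100, 0.1, 200_000
h = lambda z: z[0]                     # arbitrary test function
dist_sum = coll = 0
for _ in range(trials):
    x = [random.randint(0, 1) for _ in range(d)]
    y = walk(x, tau)
    dist_sum += sum(a != b for a, b in zip(x, y))
    coll += h(x) == h(y)

print(dist_sum / trials, (1 - math.exp(-tau)) / 2 * d)  # Lemma 1: ~ (tau/2)d
print(coll / trials, (1 + math.exp(-tau)) / 2)          # K_h(tau) for this h
```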
Lemma 2: K_h(τ) is a log-convex function of τ.
Remark: True for any reversible C.T.M.C.
Recall: For f : {0,1}^d → ℝ,
  E[f(x) f(y)] = Σ_{T ⊆ [d]} e^{−τ|T|} f̂(T)²   (x uniform, y the time-τ walk from x).
Given hash function h : {0,1}^d → S,
for each s ∈ S, introduce
  h_s : {0,1}^d → {0,1},  h_s(x) = 1{h(x)=s}.
Proof of Lemma 2:
  K_h(τ) = Σ_{s∈S} E[h_s(x) h_s(y)] = Σ_{s∈S} Σ_{T ⊆ [d]} e^{−τ|T|} ĥ_s(T)²,
a non-neg. lin. comb. of the log-convex functions τ ↦ e^{−τ|T|}, hence log-convex. ∎
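For small d, the Fourier identity behind this proof can be verified by brute force (a sketch; the hash function below is an arbitrary illustrative choice):

```python
import itertools, math

d, tau = 4, 0.3
cube = list(itertools.product([0, 1], repeat=d))
subsets = [T for k in range(d + 1) for T in itertools.combinations(range(d), k)]
chi = lambda T, x: (-1) ** sum(x[i] for i in T)      # Fourier character

def fhat(f):
    """Fourier coefficients f-hat(T) = E_x[f(x) chi_T(x)]."""
    return {T: sum(f(x) * chi(T, x) for x in cube) / len(cube) for T in subsets}

h = lambda x: int(x[0] + x[1] >= 1)                  # arbitrary hash to S = {0, 1}

# Fourier side:  K_h(tau) = sum_s sum_T e^{-tau|T|} (h_s-hat(T))^2
K_fourier = sum(math.exp(-tau * len(T)) * c * c
                for s in {h(x) for x in cube}
                for T, c in fhat(lambda x, s=s: float(h(x) == s)).items())

# Direct side:  Pr[h(x) = h(y)], where Pr[x_i = y_i] = (1 + e^{-tau})/2 indep.
p_eq = (1 + math.exp(-tau)) / 2
K_direct = sum((1 / len(cube))
               * math.prod(p_eq if a == b else 1 - p_eq for a, b in zip(x, y))
               for x in cube for y in cube if h(x) == h(y))

print(K_fourier, K_direct)   # agree up to floating-point error
```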
Recap:
Lemma 1: For x →_τ y, dist(x, y) ≈ (τ/2)·d w.v.h.p. (τ ≪ 1).
Lemma 2: K_h(τ) is a log-convex function of τ.
Theorem: LSH for {0,1}^d requires ρ ≥ 1/c − o_d(1).
Proof: Say H is an LSH family for {0,1}^d
with params r, (c − o(1))·r, p = q^ρ, q.

def:  K_H(τ) ≝ avg_{h∈H} K_h(τ) = Pr_{h∈H, x →_τ y}[h(x) = h(y)].
(Non-neg. lin. comb. of log-convex fcns,
∴ K_H(τ) is also log-convex.)

Choose ϵ so that, w.v.h.p., x →_ϵ y has dist(x, y) ≈ r (Lemma 1);
then x →_cϵ y has dist(x, y) ≈ cr. Hence:
  K_H(ϵ) ≳ q^ρ
  K_H(cϵ) ≲ q   (in truth, q + 2^{−Θ(d)}; we assume q not tiny)
  K_H(0) = 1,  i.e. ln K_H(0) = 0.

Now apply log-convexity to ln K_H(τ), writing ϵ = (1 − 1/c)·0 + (1/c)·cϵ:
  ρ ln q ≤ ln K_H(ϵ) ≤ (1 − 1/c)·ln K_H(0) + (1/c)·ln K_H(cϵ) ≤ (1/c)·ln q.
[Figure: convex ln K_H(τ) passing through (0, 0), with value ≥ ρ ln q at ϵ and ≤ ln q at cϵ.]
Dividing by ln q < 0 flips the inequality:
∴ ρ ≥ 1/c.
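As a sanity check of the chord argument, the bit-sampling family from earlier has the closed form K_H(τ) = (1 + e^{−τ})/2, which is log-convex, and the resulting ρ sits just above 1/c (a numerical sketch; ϵ and c are illustrative):

```python
import math

K = lambda t: (1 + math.exp(-t)) / 2   # K_H for the [IM'98] bit-sampling family

# Log-convexity on a grid: ln K convex  <=>  K(t-s)·K(t+s) >= K(t)^2
assert all(K(t - s) * K(t + s) >= K(t) ** 2
           for t in (0.1, 0.5, 1.0, 2.0) for s in (0.05, 0.1))

# Chord bound with p = K(eps), q = K(c·eps):
eps, c = 0.1, 5
rho = math.log(K(eps)) / math.log(K(c * eps))   # = ln(1/p) / ln(1/q)
print(rho, 1 / c)   # 0.2225... >= 0.2, tending to 1/c as eps -> 0
```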
Super-tedious, super-straightforward details:
• Make Lemma 1 precise. (Chernoff)
• Make the “≈”s and “≳”s precise. (Taylor)
• Choose ϵ = ϵ(c, q, d) very carefully.

Theorem: ρ ≥ 1/c − o_d(1).
Meaningful iff q ≥ 2^{−o(d)}; i.e., q not tiny.