

Introduction to Machine Learning
Lecture 4

Mehryar Mohri
Courant Institute and Google Research

[email protected]


Nearest-Neighbor Algorithms


Nearest Neighbor Algorithms

Definition: fix $k \geq 1$. Given a labeled sample
$$S = ((x_1, y_1), \ldots, (x_m, y_m)) \in (X \times \{0, 1\})^m,$$
the $k$-NN algorithm returns the hypothesis $h_S$ defined by
$$\forall x \in X, \quad h_S(x) = 1_{\sum_{i \colon y_i = 1} w_i \,>\, \sum_{i \colon y_i = 0} w_i},$$
where the weights $w_1, \ldots, w_m$ are chosen such that $w_i = \frac{1}{k}$ if $x_i$ is among the $k$ nearest neighbors of $x$.
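As an illustration of this definition, here is a minimal sketch of the $k$-NN rule in Python with NumPy; the function name, the Euclidean metric, and the toy data are illustrative choices, since the slides do not fix a metric or an implementation.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    """Minimal k-NN rule for binary labels in {0, 1}.

    Each of the k nearest training points gets weight 1/k; the prediction
    is 1 iff the total weight of label-1 neighbors exceeds that of
    label-0 neighbors (i.e., a majority vote among the k neighbors).
    """
    # Euclidean distances from x to every training point (assumed metric).
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    # Uniform weights w_i = 1/k on the k nearest neighbors.
    weight_1 = np.sum(y_train[nearest] == 1) / k
    weight_0 = np.sum(y_train[nearest] == 0) / k
    return 1 if weight_1 > weight_0 else 0

# Toy sample S = ((x_1, y_1), ..., (x_m, y_m)).
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([5.5, 4.8]), k=3))  # -> 1
```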


Voronoi Diagram



Questions

Performance: does it work?

Choice of the weights: are there better choices than uniform? In particular, the weights can take into account the distance to each nearest neighbor.

Choice of the distance metric: can a useful metric be defined (or even learned) for a particular problem?

Computation in high dimension: data structures and algorithms to improve upon the naive algorithm.



Bayes Classifier

Definition: the Bayes error $R^*$ is defined by
$$R^* = \inf_{h \text{ measurable}} \; \Pr_{(x, y) \sim D}[h(x) \neq y].$$

• the Bayes classifier is a measurable hypothesis achieving that error.
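In the binary case $Y = \{0, 1\}$ of the earlier definitions, the infimum admits a standard explicit form; the conditional error $R^*(x)$ written below is made explicit here because it reappears in the proof of the limit guarantee:
$$h^*(x) =
\begin{cases}
1 & \text{if } \Pr[y = 1 \mid x] > \tfrac{1}{2},\\
0 & \text{otherwise,}
\end{cases}
\qquad
R^*(x) = \min\{\Pr[y = 0 \mid x], \Pr[y = 1 \mid x]\},
\qquad
R^* = \operatorname*{E}_{x \sim D}[R^*(x)].$$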


Set-up

Sample $S = ((x_1, y_1), \ldots, (x_m, y_m)) \in (X \times \{0, 1\})^m$ drawn i.i.d. according to some distribution $D$.

Nearest neighbor of $x \in X$:
$$\mathrm{NN}(S, x) = \operatorname*{argmin}_{x' \in S} d(x', x).$$

Error of the hypothesis returned, on point $x \in X$:
$$R(h_S, x) = 1_{y(h_S(x)) \neq y(x)},$$
where $y(u)$ is the label of point $u$ (a random variable).


Convergence of NN Algorithm

Lemma: for any $x$ in the support of $D$, $\mathrm{NN}(S, x) \to x$ with probability one when $|S| \to +\infty$.

Proof: Let $x$ be in the support of the distribution; then, for any $\epsilon > 0$, $\Pr[B(x, \epsilon)] > 0$, where $B(x, \epsilon)$ is the ball of radius $\epsilon$ centered at $x$. Thus, since the points of $S$ are drawn i.i.d.,
$$\Pr\big[d(\mathrm{NN}(S, x), x) > \epsilon\big] = \big(1 - \Pr[B(x, \epsilon)]\big)^{|S|} \to 0.$$
Since $d(\mathrm{NN}(S, x), x)$ is decreasing with $|S|$, this also implies convergence with probability one.


NN Algorithm - Limit Guarantee

Theorem: let $h_S$ be the hypothesis returned by the nearest-neighbor algorithm. Then,
$$\lim_{|S| \to \infty} \operatorname*{E}_{S \sim D^m}[R(h_S)] \;\leq\; 2R^* \Big(1 - \frac{|Y|}{2(|Y| - 1)}\, R^*\Big).$$

Proof:
$$\begin{aligned}
\operatorname*{E}_{S \sim D^m}[R(h_S, x)]
&= \Pr_{S \sim D^m}\big[y(\mathrm{NN}(S, x)) \neq y(x)\big]\\
&= \sum_{x'} \Pr\big[y(x') \neq y(x) \mid \mathrm{NN}(S, x) = x'\big] \, \Pr_{S \sim D^m}[\mathrm{NN}(S, x) = x']\\
&= \sum_{x'} \Big(1 - \Pr\big[y(x') = y(x) \mid \mathrm{NN}(S, x) = x'\big]\Big) \, \Pr_{S \sim D^m}[\mathrm{NN}(S, x) = x']\\
&= \sum_{x'} \Big(1 - \sum_{y \in Y} \Pr[y \mid x'] \Pr[y \mid x]\Big) \, \Pr_{S \sim D^m}[\mathrm{NN}(S, x) = x'],
\end{aligned}$$
where the last step uses the conditional independence of the labels $y(x')$ and $y(x)$ given the points $x'$ and $x$.


NN Algorithm - Limit Guarantee

In view of the lemma, $\mathrm{NN}(S, x) \to x$ with probability one when $|S| \to +\infty$. Thus, assuming the continuity of $x' \mapsto \Pr[y \mid x']$,
$$\lim_{|S| \to +\infty} \operatorname*{E}_{S \sim D^m}[R(h_S, x)] = 1 - \sum_{y \in Y} \Pr[y \mid x]^2.$$

From this it can be concluded that
$$\lim_{|S| \to +\infty} \operatorname*{E}_{S \sim D^m}[R(h_S)] = \operatorname*{E}_{x \sim D}\Big[1 - \sum_{y \in Y} \Pr[y \mid x]^2\Big].$$

Let $y^* = \operatorname*{argmax}_{y} \Pr[y \mid x]$; then
$$1 - \sum_{y \in Y} \Pr[y \mid x]^2 = 1 - \Pr[y^* \mid x]^2 - \sum_{y \neq y^*} \Pr[y \mid x]^2.$$


NN Algorithm - Limit Guarantee

Now, since the variance is non-negative,
$$\frac{1}{|Y| - 1} \sum_{y \neq y^*} \Pr[y \mid x]^2 - \Big(\frac{1}{|Y| - 1} \sum_{y \neq y^*} \Pr[y \mid x]\Big)^2 \geq 0.$$

Thus, in view of $\sum_{y \neq y^*} \Pr[y \mid x] = 1 - \Pr[y^* \mid x]$,
$$\begin{aligned}
\operatorname*{E}_{x \sim D}\Big[1 - \sum_{y \in Y} \Pr[y \mid x]^2\Big]
&\leq \operatorname*{E}_{x \sim D}\Big[1 - \Pr[y^* \mid x]^2 - \frac{(1 - \Pr[y^* \mid x])^2}{|Y| - 1}\Big]\\
&= \operatorname*{E}_{x \sim D}\Big[1 - (1 - R^*(x))^2 - \frac{R^*(x)^2}{|Y| - 1}\Big]\\
&= \operatorname*{E}_{x \sim D}\Big[2 R^*(x) - \frac{|Y| \, R^*(x)^2}{|Y| - 1}\Big]\\
&\leq 2R^* - \frac{|Y| \, {R^*}^2}{|Y| - 1}. \qquad \text{(using } \operatorname*{E}[R^*(x)]^2 \leq \operatorname*{E}[R^*(x)^2]\text{)}
\end{aligned}$$
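For instance, in the binary case $|Y| = 2$, the bound just obtained specializes to the familiar form
$$\lim_{|S| \to +\infty} \operatorname*{E}_{S \sim D^m}[R(h_S)] \;\leq\; 2R^* - \frac{2\,{R^*}^2}{2 - 1} \;=\; 2R^*(1 - R^*) \;\leq\; 2R^*,$$
i.e., in the limit the nearest-neighbor error is at most twice the Bayes error.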


Notes

Similar results hold for the $k$-NN algorithm:

• in particular for $(k \to \infty) \wedge (\frac{k}{m} \to 0)$, with $m = |S| \to \infty$.

Guarantees only for an infinite amount of data:

• machine learning deals with finite samples.

• arbitrarily slow convergence rate.


NN Problem

Problem: given a sample $S = ((x_1, y_1), \ldots, (x_m, y_m))$, find the nearest neighbor of a test point $x$.

• general problem extensively studied in computer science.

• exact vs. approximate algorithms.

• dimensionality $N$ crucial.

• better algorithms for small intrinsic dimension (e.g., limited doubling dimension).


NN Problem - Case N = 2

Algorithm:

• compute the Voronoi diagram in $O(m \log m)$.

• use a point-location data structure to determine the NN of $x$.

• complexity: $O(m)$ space, $O(\log m)$ time.


NN Problem - Case N > 2

Voronoi diagram: size in $O(m^{\lceil N/2 \rceil})$.

Linear algorithm (no pre-processing):

• compute $\|x - x_i\|$ for all $i \in [1, m]$.

• complexity of the distance computations: $\Omega(Nm)$.

• no additional space needed.

Tree-based data structures: pre-processing.

• often used in applications: $k$-d trees ($k$-dimensional trees).
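A minimal sketch of the linear scan in Python with NumPy (Euclidean distance assumed; the function name is illustrative): every training point is visited, so each query costs $\Omega(Nm)$ and no extra data structure is built.

```python
import numpy as np

def linear_scan_nn(X_train, x):
    """Brute-force nearest-neighbor search.

    One O(N) distance computation per training point, hence O(N m)
    work per query; no pre-processing, no additional space.
    """
    best_i, best_d = 0, float("inf")
    for i, xi in enumerate(X_train):
        d = np.linalg.norm(x - xi)  # O(N) distance computation
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d
```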


k-d Trees

Binary space partitioning trees (Bentley, 1975).

Prominent tree-based data structure.

Works for low or medium dimensionality.

NN search:

• $O(\log m)$ for randomly distributed points.

• $O(N \, m^{1 - \frac{1}{N}})$ in the worst case (Lee and Wong, 1977).

Can be extended to $k$-NN search.

High dimension: typically inefficient → approximate NN methods.


k-d Trees - Illustration

[Figure: example k-d tree on the 2D points (1, 1), (2, 9.5), (3, 5), (4, 2), (5, 9), (7, 5.5), (8, 4). The root (3, 5) splits on Y; its children (4, 2) and (5, 9) split on X; the leaves are (1, 1), (8, 4), (2, 9.5), and (7, 5.5).]


k-d Trees - Construction

Algorithm: for each non-leaf node,

• choose a dimension (e.g., the longest dimension of the node's hyperrectangle).

• choose a pivot (the median point along that dimension).

• split the node according to (pivot, dimension).

→ balanced tree, binary space partitioning.
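A minimal construction sketch in Python with NumPy; as a simplification relative to the rule above, the splitting dimension here simply cycles with the depth instead of being the longest dimension of the hyperrectangle, and the pivot is the median point along that dimension. The class and function names are illustrative.

```python
import numpy as np

class KDNode:
    def __init__(self, point, dim, left=None, right=None):
        self.point = point  # pivot point stored at this node
        self.dim = dim      # splitting dimension at this node
        self.left = left    # sub-tree with coordinate <= pivot along dim
        self.right = right  # sub-tree with coordinate  > pivot along dim

def build_kdtree(points, depth=0):
    """Build a (roughly) balanced k-d tree by splitting at the median
    along a dimension that cycles with the depth."""
    if len(points) == 0:
        return None
    dim = depth % points.shape[1]                 # choose the splitting dimension
    points = points[np.argsort(points[:, dim])]   # sort along that dimension
    mid = len(points) // 2                        # median point = pivot
    return KDNode(
        point=points[mid],
        dim=dim,
        left=build_kdtree(points[:mid], depth + 1),
        right=build_kdtree(points[mid + 1:], depth + 1),
    )

# Example: the seven 2D points of the illustration above.
pts = np.array([[3, 5], [4, 2], [5, 9], [1, 1], [8, 4], [2, 9.5], [7, 5.5]])
root = build_kdtree(pts)
```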



k-d Trees - NN Search

Algorithm:

• find the region containing $x$ (starting from the root node, move to a child node based on the node test).

• save the region's point $x_0$ as the current best.

• move up the tree and recursively search regions intersecting the hypersphere $S(x, \|x - x_0\|)$:

• update the current best if the current point is closer.

• restart the search with each intersecting sub-tree.

• move up the tree when there is no more intersecting sub-tree.
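A minimal sketch of this search in Python, operating on the KDNode tree from the construction sketch above (Euclidean distance assumed, names illustrative): it descends to the region containing the query, then backtracks, exploring a far sub-tree only when the splitting hyperplane is closer to the query than the current best, i.e., when that region can intersect the hypersphere $S(x, \|x - x_0\|)$.

```python
import numpy as np

def kdtree_nn(node, x, best=None):
    """Return (best_point, best_distance) for query point x."""
    if node is None:
        return best
    d = np.linalg.norm(x - node.point)
    if best is None or d < best[1]:
        best = (node.point, d)  # update the current best
    # Descend first into the child on the same side of the splitting hyperplane.
    diff = x[node.dim] - node.point[node.dim]
    near, far = (node.left, node.right) if diff <= 0 else (node.right, node.left)
    best = kdtree_nn(near, x, best)
    # Search the far side only if the hypersphere of radius best[1]
    # around x crosses the splitting hyperplane.
    if abs(diff) < best[1]:
        best = kdtree_nn(far, x, best)
    return best

# Example usage with the tree `root` built in the construction sketch:
# point, dist = kdtree_nn(root, np.array([6.0, 5.0]))
```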


References

• Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, Vol. 18, No. 9, 1975.

• D. T. Lee and C. K. Wong. Worst-case analysis for region and partial region searches in multidimensional binary search trees and balanced quad trees. Acta Informatica, Vol. 9, Issue 1. Springer, NY, 1977.
