
Stat 437 Lecture Notes 4

Xiongzhi Chen

Washington State Univ.


Prediction under a loss

Let (X,Y ) ∈ Rp+1 have joint distribution Pr(X,Y )

Find a function f(X) to predict Y given values of X

Loss: L(Y, f(X)), e.g.,

Squared loss L(Y, f(X)) = (Y − f(X))2

Absolute deviation loss L(Y, f(X)) = |Y − f(X)|

More generally, Lα(Y, f(X)) = [d(Y, f(X))]^α for some α > 0, where d is a distance

Integrated squared loss

ρ2(f) = ∫ [y − f(x)]^2 Pr(dx, dy)


Optimizers for two losses

Fact 1: ρ2(f) = ∫ [y − f(x)]^2 Pr(dx, dy) is minimized by

f(x) = E(Y | X = x)

This f is called the “regression function”

Fact 2: ρ1(f) = ∫ |y − f(x)| Pr(dx, dy) is minimized by

f(x) = Med(Y | X = x)

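As a quick Monte Carlo sanity check of Facts 1 and 2, here is a minimal Python sketch (assuming numpy is available; the model X ~ Uniform(0, 1) with Y | X = x exponential with mean 1 + x is an arbitrary illustrative choice, not from the notes). It compares the empirical squared and absolute losses of the conditional mean, the conditional median, and an arbitrary constant predictor.

import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# Illustrative model (an assumption for this demo): X ~ Uniform(0, 1) and
# Y | X = x ~ Exponential with mean 1 + x, so that
# E(Y | X = x) = 1 + x and Med(Y | X = x) = (1 + x) * log(2).
X = rng.uniform(0.0, 1.0, N)
Y = rng.exponential(scale=1.0 + X)

candidates = {
    "conditional mean   f(x) = 1 + x": 1.0 + X,
    "conditional median f(x) = (1 + x) log 2": (1.0 + X) * np.log(2.0),
    "constant predictor f(x) = 2": np.full(N, 2.0),
}

for name, fX in candidates.items():
    sq = np.mean((Y - fX) ** 2)   # Monte Carlo estimate of rho_2(f)
    ab = np.mean(np.abs(Y - fX))  # Monte Carlo estimate of rho_1(f)
    print(f"{name:42s}  squared loss {sq:.4f}   absolute loss {ab:.4f}")

# Expected outcome: the conditional mean attains the smallest squared loss,
# while the conditional median attains the smallest absolute loss.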

Conditional expectation (supplementary)

Let Y be defined on the probability space (Ω, A, P), and let C be a sub-σ-algebra of A.

Assume Y ∈ L1(Ω, P). The conditional expectation of Y given C is defined as any random variable g that is measurable with respect to C and satisfies

∫_B g dP = ∫_B Y dP, ∀B ∈ C

g is often denoted by E(Y | C) (and is the Radon-Nikodym derivative of the measure B ↦ ∫_B Y dP, B ∈ C, with respect to P restricted to C)


Conditional expectation (supplementary)

If h is measurable with respect to C, then, with ω ∈ Ω,

E(h(ω)Y(ω) | C) = h(ω) E(Y(ω) | C)

In particular, E(h(X)Y |X) = h(X)E(Y |X)

If X is independent of W , then E(W |X) = E(W )

Iterated expectation:

E(W ) = E[E(W |X)]


Minimizer under squared loss

Let X and Y be defined on the same probability space (Ω,A, P )

The relation: Pr (X,Y ) = Pr (Y |X) Pr (X)

Let C = σ(X). Then

ρ2(f) = ∫ [y − f(x)]^2 Pr(dx, dy)

= E_{(X,Y)}([Y − f(X)]^2)

= E_X(E_{Y|X}[(Y − f(X))^2 | X])


Minimizer under squared loss

Fact: E_{Y|X}[(Y − f(X))^2 | X] ≥ E_{Y|X}[(Y − E(Y|X))^2 | X]

Proof:

E_{Y|X}([Y − f(X)]^2 | X)

= E_{Y|X}[(Y − E(Y|X) + E(Y|X) − f(X))^2 | X]

= E_{Y|X}([Y − E(Y|X)]^2 | X) + E_{Y|X}([E(Y|X) − f(X)]^2 | X)

since the cross term vanishes:

E_{Y|X}([Y − E(Y|X)][E(Y|X) − f(X)] | X) = 0


Median as a minimizer

Definition of median m: P (W ≤ m) ≥ 1/2 and P (W ≥ m) ≥ 1/2

Assume W has median m = 0. Then for c > 0,

E[(|W − c| − |W|) I(W ≤ 0)] = c P(W ≤ 0)

and

E[(|W − c| − |W|) I(W > 0)] ≥ −c P(W > 0)

So

E(|W − c| − |W|) ≥ c[P(W ≤ 0) − P(W > 0)] ≥ 0

since P(W ≤ 0) ≥ 1/2. Hence E|W − c| ≥ E|W| for every c > 0; the case c < 0 is handled by applying the same argument to −W. So a median minimizes c ↦ E|W − c|


Minimizer under absolute deviation loss

Recall ρ1(f) = ∫ |y − f(x)| Pr(dx, dy)

ρ1(f) = E_X[E_{Y|X}(|Y − f(X)| | X)]

E_{Y|X}(|Y − f(X)| | X) is minimized by

f(x) = Med(Y | X = x)

So ρ1(f) is minimized by f(x) = Med(Y |X = x)


Some classification methods

Nearest neighbors

Logistic regression

Mixture models


Settings

N observations (x1, y1), . . . , (xN , yN ) ∈ Rp+1

xi records the values of the same p features for the ith observation

yi is the class label for xi

Assume yi ∈ {0, 1} for all i, i.e., 0 and 1 are the class labels

For a new observation (x, y), estimate its label y, i.e., classify x into one of the 2 classes

Note: the training set

T = {(x1, y1), . . . , (xN , yN )}

and (x, y) ∉ T


Nearest neighbor method

Compute:

Ŷ(x) = (1/k) Σ_{xi ∈ Nk(x)} yi

where Nk(x) is the neighborhood of x defined by the k closest points xi in the training sample.

Classification via majority vote: classify x into class 1 if Ŷ(x) > 0.5, and into class 0 otherwise

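A minimal sketch of this rule in Python (numpy only; the Euclidean metric, the helper name knn_predict, and the simulated toy data are illustrative assumptions, not part of the notes):

import numpy as np

def knn_predict(x, X_train, y_train, k=15):
    """Majority-vote KNN: return 1 if the average label over N_k(x) exceeds 0.5, else 0."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances to all x_i
    neighbors = np.argsort(dists)[:k]            # indices of the k closest training points
    y_hat = y_train[neighbors].mean()            # Y(x) = (1/k) * sum of neighbor labels
    return 1 if y_hat > 0.5 else 0

# Toy usage on simulated two-class data (an assumption for this demo)
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),    # class 0
                     rng.normal(1.5, 1.0, size=(50, 2))])   # class 1
y_train = np.array([0] * 50 + [1] * 50)

print(knn_predict(np.array([1.4, 1.6]), X_train, y_train))    # likely 1
print(knn_predict(np.array([-0.5, -0.5]), X_train, y_train))  # likely 0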

Example: setting

Observations are from two classes ORANGE or BLUE

Classifier: KNN, for which the class label is ORANGE if Ŷ(x) > 0.5


Example: 15-NN

[Figure: 15-Nearest Neighbor Classifier (ESL Figure 2.2); scatterplot of the two-class training data with the 15-NN decision boundary]

FIGURE 2.2. The same classification example in two dimensions as in Figure 2.1. The classes are coded as a binary variable (BLUE = 0, ORANGE = 1) and then fit by 15-nearest-neighbor averaging as in (2.8). The predicted class is hence chosen by majority vote amongst the 15-nearest neighbors.

In Figure 2.2 we see that far fewer training observations are misclassified than in Figure 2.1. This should not give us too much comfort, though, since in Figure 2.3 none of the training data are misclassified. A little thought suggests that for k-nearest-neighbor fits, the error on the training data should be approximately an increasing function of k, and will always be 0 for k = 1. An independent test set would give us a more satisfactory means for comparing the different methods.

It appears that k-nearest-neighbor fits have a single parameter, the number of neighbors k, compared to the p parameters in least-squares fits. Although this is the case, we will see that the effective number of parameters of k-nearest neighbors is N/k and is generally bigger than p, and decreases with increasing k. To get an idea of why, note that if the neighborhoods were nonoverlapping, there would be N/k neighborhoods and we would fit one parameter (a mean) in each neighborhood.

It is also clear that we cannot use sum-of-squared errors on the training set as a criterion for picking k, since we would always pick k = 1! It would seem that k-nearest-neighbor methods would be more appropriate for the mixture Scenario 2 described above, while for Gaussian data the decision boundaries of k-nearest neighbors would be unnecessarily noisy.


Example: 1-NN

[Figure: 1-Nearest Neighbor Classifier (ESL Figure 2.3); scatterplot of the two-class training data with the 1-NN decision boundary]

FIGURE 2.3. The same classification example in two dimensions as in Figure 2.1. The classes are coded as a binary variable (BLUE = 0, ORANGE = 1), and then predicted by 1-nearest-neighbor classification.

2.3.3 From Least Squares to Nearest Neighbors

The linear decision boundary from least squares is very smooth, and apparently stable to fit. It does appear to rely heavily on the assumption that a linear decision boundary is appropriate. In language we will develop shortly, it has low variance and potentially high bias.

On the other hand, the k-nearest-neighbor procedures do not appear to rely on any stringent assumptions about the underlying data, and can adapt to any situation. However, any particular subregion of the decision boundary depends on a handful of input points and their particular positions, and is thus wiggly and unstable—high variance and low bias.

Each method has its own situations for which it works best; in particular linear regression is more appropriate for Scenario 1 above, while nearest neighbors are more suitable for Scenario 2. The time has come to expose the oracle! The data in fact were simulated from a model somewhere between the two, but closer to Scenario 2. First we generated 10 means mk from a bivariate Gaussian distribution N((1, 0)T , I) and labeled this class BLUE. Similarly, 10 more were drawn from N((0, 1)T , I) and labeled class ORANGE. Then for each class we generated 100 observations as follows: for each observation, we picked an mk at random with probability 1/10, and ...


KNN and Regression

KNN: Ŷ(x) = (1/k) Σ_{xi ∈ Nk(x)} yi, where Nk(x) is the neighborhood of x defined by the k closest points xi in the training sample.

KNN: f(x) = Ave(yi | xi ∈ Nk(x))

KNN: f(x) ≈ E(Y | X = x) when N, k → ∞ and k/N = o(1)

KNN is a nonparametric, local method

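To see f(x) = Ave(yi | xi ∈ Nk(x)) acting as a local estimate of E(Y | X = x), here is a small Python sketch (the regression function sin(2πx), the noise level, and the choices of N and k are arbitrary assumptions for this demo):

import numpy as np

def knn_regress(x0, x_train, y_train, k=30):
    # KNN regression: average the responses of the k training points nearest to x0
    idx = np.argsort(np.abs(x_train - x0))[:k]
    return y_train[idx].mean()

rng = np.random.default_rng(2)
N = 2000
x_train = rng.uniform(0.0, 1.0, N)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.3, N)  # E(Y|X=x) = sin(2*pi*x)

for x0 in (0.1, 0.25, 0.5, 0.75):
    # the KNN average should be close to the true conditional mean sin(2*pi*x0)
    print(x0, knn_regress(x0, x_train, y_train), np.sin(2 * np.pi * x0))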

Issues with KNN

Curse of dimensionality on complexity:

Effective number of parameters for k-NN is N/k

Dimensionality of feature: p

p ≪ N implies p ≪ N/k for moderate k, i.e., KNN uses far more effective parameters than a p-parameter linear fit


Issues with KNN

Curse of dimensionality on Nk(x)

N data points zi uniformly distributed in a p-dim ball Br = {x ∈ Rp : ‖x‖ ≤ r}

When r = 1, the median distance from 0 to the closest data point is

d(p, N) = (1 − 2^{−1/N})^{1/p}

Only Ba with a ≥ 0.5 (roughly) is likely to contain data points, i.e., Nk(0) ⊇ B0.5 is typically

needed in order for Nk(0) to be a sensible neighborhood (e.g., d(10, 500) ≈ 0.52)


Uniform distribution on a ball (supplementary)

Volume of the p-ball with radius r is vp(r) = π^{p/2} r^p / Γ(1 + p/2)

If W has the uniform density 1/vp(1) on B1, then

Pr(d(W, 0) ≤ x) = Pr(‖W‖ ≤ x) = x^p for 0 ≤ x ≤ 1

For iid X1, . . . , XN with density 1/vp(1) on B1,

G(x) = P(min_{1≤i≤N} ‖Xi‖ ≤ x) = 1 − (1 − x^p)^N

Set G(x) = 1/2 and solve for x to get d(p, N) = (1 − 2^{−1/N})^{1/p}

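A quick numerical check of this formula (a Python sketch; the particular values of p and N are chosen only for illustration):

def median_nn_distance(p, N):
    # Median distance from the origin to the closest of N uniform points in the unit p-ball
    return (1.0 - 2.0 ** (-1.0 / N)) ** (1.0 / p)

for p in (1, 2, 5, 10, 20, 50):
    print(p, round(median_nn_distance(p, N=500), 3))

# For p = 10 and N = 500 this gives roughly 0.52: the closest of 500 points is
# typically more than halfway to the boundary of the unit ball, so "local"
# neighborhoods are not local at all in high dimensions.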

Settings

N observations (x1, G1), . . . , (xN , GN ) ∈ Rp+1

xi records the values of the same p features for the ith observation

Gi is the class label for xi

Assume Gi ∈ G = {1, . . . , K} for all i, i.e., G contains the class labels

For a new observation x ∈ Rp, estimate its label G, i.e., classify x into one of the K classes


Loss for classification

Let Ĝ(X) be the estimated label for X

Loss: L(k, l) with L(k, l) = 0 iff k = l, i.e., L(k, l) is the loss of classifying an observation belonging to class Gk as Gl

Expected loss:

ρ = E_{(G,X)}[L(G, Ĝ(X))]

= E_X( Σ_{k=1}^{K} L[Gk, Ĝ(X)] Pr(Gk | X) )


Bayes classifier

0-1 loss: L(k, l) = 1_{k ≠ l}, i.e., L(k, l) = 1 if k ≠ l and 0 if k = l

To minimize

ρ = E_X( Σ_{k=1}^{K} L[Gk, Ĝ(X)] Pr(Gk | X) ),

it suffices to minimize pointwise, i.e., to take

Ĝ(x) = argmin_{g ∈ G} Σ_{k=1}^{K} L(Gk, g) Pr(Gk | X = x)

Solution: g = Gj where

j = argmax_{k ∈ G} Pr(Gk | X = x)


Bayes classifier

0-1 loss: L(k, l) = 1_{k ≠ l} and

ρ = E_X( Σ_{k=1}^{K} L[Gk, Ĝ(X)] Pr(Gk | X) )

Optimal classifier:

Ĝ(x) = Gk if Pr(Gk | X = x) = max_{g ∈ G} Pr(g | X = x)

The above is called the “Bayes classifier”, where Pr(g | X = x) is a posterior probability

Remark: posterior probability minimizes expected squared loss in a Bayesian setting

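As a concrete illustration of the Bayes rule (a sketch only; the two classes, the priors, and the Gaussian class-conditional densities below are assumptions, not from the notes), one evaluates the posteriors Pr(g | X = x) and takes the argmax:

import numpy as np

# Assumed toy model: K = 2 classes with priors pi_k and class-conditional
# densities X | G = k ~ N(mu_k, 1), so Pr(G = k | X = x) is proportional
# to pi_k * phi(x; mu_k, 1).
priors = np.array([0.7, 0.3])
means = np.array([0.0, 2.0])

def normal_pdf(x, mu, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def bayes_classify(x):
    posts = priors * normal_pdf(x, means)  # unnormalized posteriors
    posts = posts / posts.sum()            # Pr(G = k | X = x)
    return posts.argmax(), posts           # Bayes rule: argmax_k Pr(G_k | X = x)

for x in (0.0, 1.0, 1.5, 3.0):
    label, posts = bayes_classify(x)
    print(f"x = {x:3.1f}   posterior = {np.round(posts, 3)}   Bayes label = {label}")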

KNN and Bayes classifier

Use Y in place of G to code the labels

Regression: with Yk = 1 if G = Gk and Yk = 0 otherwise, f(x) = E(Yk | X = x) = Pr(G = Gk | X = x)

Classification:

Ĝ(x) = Gk if Pr(Gk | X = x) = max_{g ∈ G} Pr(g | X = x)

kNN: f(x) = Ave(yi | xi ∈ Ns(x)) ≈ E(Yk | X = x)


Settings

N observations (x1, G1), . . . , (xN , GN ) ∈ Rp+1

xi records the values of the same p features for the ith observation

Gi is the class label for xi

Assume Gi ∈ G = {1, . . . , K} for all i, i.e., G contains the class labels

For a new observation x ∈ Rp, estimate its label G, i.e., classify x into one of the K classes


Model log-odds

For k = 1, . . . ,K − 1

r_{k,K} = log[ Pr(G = k | X = x) / Pr(G = K | X = x) ] = β_{k0} + β_k^T x

where βk0 and βk are parameters that need to be estimated

The posterior probability Pr(G = K|X = x) is the reference

Each rk,K is a logarithmic odds


Model log-odds

Equivalent formulation: for k = 1, . . . , K − 1,

Pr(G = k | X = x) = exp(β_{k0} + β_k^T x) / (1 + Σ_{l=1}^{K−1} exp(β_{l0} + β_l^T x))

and

Pr(G = K | X = x) = 1 / (1 + Σ_{l=1}^{K−1} exp(β_{l0} + β_l^T x))

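The following Python sketch evaluates these posterior probabilities from given βk0 and βk, with class K as the reference (the function name and the parameter values, here K = 3 and p = 2, are made up for illustration):

import numpy as np

def posterior_probs(x, beta0, beta):
    # Posteriors Pr(G = k | X = x), k = 1, ..., K, from the K-1 log-odds models
    # r_{k,K}(x) = beta0[k] + beta[k] @ x, with class K as the reference class.
    eta = beta0 + beta @ x                      # shape (K-1,): the K-1 log-odds
    denom = 1.0 + np.exp(eta).sum()
    return np.append(np.exp(eta), 1.0) / denom  # classes 1, ..., K-1, then class K

# Illustrative parameters for K = 3 classes and p = 2 features
beta0 = np.array([0.5, -1.0])            # intercepts beta_{k0}, k = 1, 2
beta = np.array([[1.0, -0.5],            # beta_1^T
                 [0.2,  0.8]])           # beta_2^T

x = np.array([1.0, 2.0])
probs = posterior_probs(x, beta0, beta)
print(probs, probs.sum())  # the K posterior probabilities sum to 1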

Fit a logistic regression

Parameter θ = (β_{10}, β_1^T, . . . , β_{(K−1)0}, β_{K−1}^T)

Pr(G = k|X = x) = pk(x; θ)

Maximize the log-likelihood

l(θ) = Σ_{i=1}^{N} log p_{gi}(xi; θ)

where p_{gi}(xi; θ) = Pr(G = gi | X = xi; θ)

Method: iteratively reweighted least squares (IWLS)

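For the two-class case (K = 2), one IWLS/Newton step has a simple closed form; below is a minimal Python sketch (the simulated data, the fixed iteration count, and the function name fit_logistic_irls are assumptions for illustration, and production code would add a convergence check):

import numpy as np

def fit_logistic_irls(X, y, n_iter=25):
    # Two-class logistic regression by iteratively reweighted least squares:
    # beta <- beta + (X^T W X)^{-1} X^T (y - p),  W = diag(p * (1 - p)).
    Xd = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))    # current fitted probabilities
        W = p * (1.0 - p)                       # IWLS weights
        H = Xd.T @ (W[:, None] * Xd)            # X^T W X
        beta = beta + np.linalg.solve(H, Xd.T @ (y - p))
    return beta

# Simulated sanity check (true intercept 0.5, true slopes 1 and -2)
rng = np.random.default_rng(3)
X = rng.normal(size=(5000, 2))
logits = 0.5 + X @ np.array([1.0, -2.0])
y = (rng.uniform(size=5000) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

print(fit_logistic_irls(X, y))  # should be close to [0.5, 1.0, -2.0]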