Source: work.caltech.edu/slides/slides14.pdf


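Review of Lecture 13

• Validation

[Diagram: $\mathcal{D}$ ($N$ points) is split into $\mathcal{D}_{\text{train}}$ ($N - K$ points), which gives $g^-$, and $\mathcal{D}_{\text{val}}$ ($K$ points), which gives $E_{\text{val}}(g^-)$; training on all of $\mathcal{D}$ gives $g$]

$E_{\text{val}}(g^-)$ estimates $E_{\text{out}}(g)$

• Data contamination

[Figure: expected error versus validation set size $K$, with curves $E_{\text{val}}(g^-_{m^*})$ and $E_{\text{out}}(g^-_{m^*})$]

$\mathcal{D}_{\text{val}}$ slightly contaminated

• Cross validation

[Diagram: $\mathcal{D}$ is split into $\mathcal{D}_1, \mathcal{D}_2, \dots, \mathcal{D}_{10}$; one part is used to validate and the others to train]

10-fold cross validation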


Learning From Data

Yaser S. Abu-Mostafa

California Institute of Technology

Lecture 14: Support Vector Machines

Sponsored by Caltech's Provost Office, E&AS Division, and IST • Thursday, May 17, 2012


Outline

• Maximizing the margin

• The solution

• Nonlinear transforms

© AML Creator: Yaser Abu-Mostafa - LFD Lecture 14


Better linear separation

Linearly separable data

[Figure: the same linearly separable data set shown with different separating lines]

Different separating lines

Which is best?

Two questions:

1. Why is bigger margin better?

2. Which $\mathbf{w}$ maximizes the margin?

Remember the growth function?

All dichotomies with any line:


Dichotomies with fat margin

Fat margins imply fewer dichotomies

[Figure: panels labeled 0.397, 0.5, 0.866, and infinity]


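Finding $\mathbf{w}$ with large margin

Let $\mathbf{x}_n$ be the nearest data point to the plane $\mathbf{w}^{\mathsf T}\mathbf{x} = 0$. How far is it?

2 preliminary technicalities:

1. Normalize $\mathbf{w}$: $|\mathbf{w}^{\mathsf T}\mathbf{x}_n| = 1$

2. Pull out $w_0$: $\mathbf{w} = (w_1, \cdots, w_d)$ apart from $b$

The plane is now $\mathbf{w}^{\mathsf T}\mathbf{x} + b = 0$ (no $x_0$)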
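The normalization step can be illustrated with a minimal sketch, assuming a small made-up data set and plain Python lists. The names `signal`, `normalize_plane`, and `points` are hypothetical and not part of the lecture. Rescaling $(\mathbf{w}, b)$ by a positive constant leaves the plane unchanged, so we are free to pick the scale at which the nearest point gives $|\mathbf{w}^{\mathsf T}\mathbf{x}_n + b| = 1$.

```python
def signal(w, b, x):
    """Return w^T x + b for plain Python sequences."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def normalize_plane(w, b, points):
    """Rescale (w, b) so that the nearest point satisfies |w^T x + b| = 1."""
    smallest = min(abs(signal(w, b, x)) for x in points)
    if smallest == 0:
        raise ValueError("a point lies on the plane; cannot normalize")
    return [wi / smallest for wi in w], b / smallest


# Hypothetical toy data, chosen only for illustration.
points = [(2.0, 2.0), (3.0, 0.0), (0.0, 0.0), (-1.0, 1.0)]
w, b = [3.0, 3.0], -6.0  # the plane x1 + x2 = 2, written at an arbitrary scale

w_norm, b_norm = normalize_plane(w, b, points)
nearest = min(abs(signal(w_norm, b_norm, x)) for x in points)

# The sign of every point is unchanged, so the classification is the same.
same_side = all(
    (signal(w, b, x) > 0) == (signal(w_norm, b_norm, x) > 0) for x in points
)
```

After the rescaling, `w_norm` and `b_norm` describe the same plane, and `nearest` is 1 up to rounding.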


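Computing the distance

The distance between $\mathbf{x}_n$ and the plane $\mathbf{w}^{\mathsf T}\mathbf{x} + b = 0$ where $|\mathbf{w}^{\mathsf T}\mathbf{x}_n + b| = 1$

The vector $\mathbf{w}$ is $\perp$ to the plane in the $\mathcal{X}$ space:

[Figure: the plane with two points $\mathbf{x}'$ and $\mathbf{x}''$ on it, the vector $\mathbf{w}$, and the data point $\mathbf{x}_n$]

Take $\mathbf{x}'$ and $\mathbf{x}''$ on the plane

$\mathbf{w}^{\mathsf T}\mathbf{x}' + b = 0$ and $\mathbf{w}^{\mathsf T}\mathbf{x}'' + b = 0$

$\Longrightarrow \mathbf{w}^{\mathsf T}(\mathbf{x}' - \mathbf{x}'') = 0$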


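and the distance is . . .

Distance between $\mathbf{x}_n$ and the plane:

[Figure: the data point $\mathbf{x}_n$, a point $\mathbf{x}$ on the plane, and the vector $\mathbf{w}$]

Take any point $\mathbf{x}$ on the plane

Projection of $\mathbf{x}_n - \mathbf{x}$ on $\mathbf{w}$

$$\hat{\mathbf{w}} = \frac{\mathbf{w}}{\|\mathbf{w}\|} \quad\Longrightarrow\quad \text{distance} = \left|\hat{\mathbf{w}}^{\mathsf T}(\mathbf{x}_n - \mathbf{x})\right|$$

$$\text{distance} = \frac{1}{\|\mathbf{w}\|}\left|\mathbf{w}^{\mathsf T}\mathbf{x}_n - \mathbf{w}^{\mathsf T}\mathbf{x}\right| = \frac{1}{\|\mathbf{w}\|}\left|\mathbf{w}^{\mathsf T}\mathbf{x}_n + b - \mathbf{w}^{\mathsf T}\mathbf{x} - b\right| = \frac{1}{\|\mathbf{w}\|}$$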
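The projection argument can be checked numerically with a short sketch. The plane, the data point, and the function name `projection_distance` are assumptions made for illustration only. The point $\mathbf{x}_n$ is chosen so that $|\mathbf{w}^{\mathsf T}\mathbf{x}_n + b| = 1$, and two different points on the plane are tried to show that the choice of $\mathbf{x}$ does not matter.

```python
import math


def projection_distance(w, x_n, x_on_plane):
    """Length of the projection of (x_n - x) on the unit vector w / ||w||."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    w_hat = [wi / norm_w for wi in w]
    return abs(sum(wh * (a - c) for wh, a, c in zip(w_hat, x_n, x_on_plane)))


# Hypothetical plane x1 + x2 - 2 = 0 and a point with |w^T x_n + b| = 1.
w, b = [1.0, 1.0], -2.0
x_n = (3.0, 0.0)

# Two different points that both satisfy w^T x + b = 0.
d_a = projection_distance(w, x_n, (2.0, 0.0))
d_b = projection_distance(w, x_n, (0.0, 2.0))

one_over_norm = 1.0 / math.sqrt(sum(wi * wi for wi in w))
```

Both `d_a` and `d_b` agree with `one_over_norm`, which is $1/\|\mathbf{w}\|$ for this normalized plane.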


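The optimization problem

Maximize $\dfrac{1}{\|\mathbf{w}\|}$ subject to $\min\limits_{n=1,2,\dots,N} |\mathbf{w}^{\mathsf T}\mathbf{x}_n + b| = 1$

Notice: $|\mathbf{w}^{\mathsf T}\mathbf{x}_n + b| = y_n(\mathbf{w}^{\mathsf T}\mathbf{x}_n + b)$

Minimize $\dfrac{1}{2}\mathbf{w}^{\mathsf T}\mathbf{w}$ subject to $y_n(\mathbf{w}^{\mathsf T}\mathbf{x}_n + b) \ge 1$ for $n = 1, 2, \dots, N$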
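One step is left implicit when the equality $\min_n |\cdot| = 1$ is replaced by the inequalities $\ge 1$. A short scaling argument, sketched here with a symbol $\rho$ that is not in the slides, suggests why the two problems share a solution: if every constraint were slack, the weights could be shrunk further.

```latex
\begin{align*}
&\text{Suppose } (\mathbf{w}, b) \text{ is feasible with }
  \rho = \min_{n} \, y_n(\mathbf{w}^{\mathsf T}\mathbf{x}_n + b) > 1. \\
&\text{Then } (\mathbf{w}/\rho,\; b/\rho) \text{ is also feasible, since }
  y_n\!\left(\tfrac{1}{\rho}\mathbf{w}^{\mathsf T}\mathbf{x}_n + \tfrac{1}{\rho}b\right)
  \ge \tfrac{\rho}{\rho} = 1 \text{ for all } n, \\
&\text{and } \tfrac{1}{2}\left(\tfrac{\mathbf{w}}{\rho}\right)^{\mathsf T}\!\left(\tfrac{\mathbf{w}}{\rho}\right)
  = \tfrac{1}{\rho^{2}} \cdot \tfrac{1}{2}\mathbf{w}^{\mathsf T}\mathbf{w}
  < \tfrac{1}{2}\mathbf{w}^{\mathsf T}\mathbf{w}
  \quad (\mathbf{w} \neq \mathbf{0}). \\
&\text{So at the minimum, } \min_{n} \, y_n(\mathbf{w}^{\mathsf T}\mathbf{x}_n + b) = 1 .
\end{align*}
```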


Constrained optimization

Minimize $\dfrac{1}{2}\mathbf{w}^{\mathsf T}\mathbf{w}$ subject to $y_n(\mathbf{w}^{\mathsf T}\mathbf{x}_n + b) \ge 1$ for $n = 1, 2, \dots, N$

$\mathbf{w} \in \mathbb{R}^d,\ b \in \mathbb{R}$

Lagrange? Inequality constraints $\Longrightarrow$ KKT

We saw this before

[Figure: contours $E_{\text{in}} = \text{const.}$ around $\mathbf{w}_{\text{lin}}$, the constraint $\mathbf{w}^{\mathsf T}\mathbf{w} = C$, and at the point $\mathbf{w}$ the gradient $\nabla E_{\text{in}}$ and the normal]

Remember regularization?

Minimize $E_{\text{in}}(\mathbf{w}) = \dfrac{1}{N}(\mathbf{Z}\mathbf{w} - \mathbf{y})^{\mathsf T}(\mathbf{Z}\mathbf{w} - \mathbf{y})$ subject to: $\mathbf{w}^{\mathsf T}\mathbf{w} \le C$

$\nabla E_{\text{in}}$ normal to constraint

                  optimize                           constrain
Regularization:   $E_{\text{in}}$                    $\mathbf{w}^{\mathsf T}\mathbf{w}$
SVM:              $\mathbf{w}^{\mathsf T}\mathbf{w}$   $E_{\text{in}}$

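Lagrange formulation

Minimize

$$\mathcal{L}(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\mathbf{w}^{\mathsf T}\mathbf{w} - \sum_{n=1}^{N} \alpha_n \left(y_n(\mathbf{w}^{\mathsf T}\mathbf{x}_n + b) - 1\right)$$

w.r.t. $\mathbf{w}$ and $b$ and maximize w.r.t. each $\alpha_n \ge 0$

$$\nabla_{\mathbf{w}}\mathcal{L} = \mathbf{w} - \sum_{n=1}^{N} \alpha_n y_n \mathbf{x}_n = \mathbf{0}$$

$$\frac{\partial \mathcal{L}}{\partial b} = -\sum_{n=1}^{N} \alpha_n y_n = 0$$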
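The two derivative formulas can be sanity-checked against finite differences. The following is a minimal sketch in plain Python, assuming arbitrary made-up values for the data, $\boldsymbol{\alpha}$, $\mathbf{w}$, and $b$; all function names are hypothetical. It evaluates the expressions at an arbitrary point and does not set them to zero.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lagrangian(w, b, alpha, X, y):
    """L(w, b, alpha) = 1/2 w^T w - sum_n alpha_n (y_n (w^T x_n + b) - 1)."""
    penalty = sum(a * (yn * (dot(w, xn) + b) - 1.0)
                  for a, xn, yn in zip(alpha, X, y))
    return 0.5 * dot(w, w) - penalty


def grad_w(w, alpha, X, y):
    """Analytic gradient w.r.t. w: w - sum_n alpha_n y_n x_n."""
    return [wi - sum(a * yn * xn[i] for a, xn, yn in zip(alpha, X, y))
            for i, wi in enumerate(w)]


def dL_db(alpha, y):
    """Analytic derivative w.r.t. b: -sum_n alpha_n y_n."""
    return -sum(a * yn for a, yn in zip(alpha, y))


def numeric_grad_w(w, b, alpha, X, y, h=1e-5):
    """Central finite differences of the Lagrangian in each coordinate of w."""
    grads = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += h
        down[i] -= h
        diff = lagrangian(up, b, alpha, X, y) - lagrangian(down, b, alpha, X, y)
        grads.append(diff / (2 * h))
    return grads


# Hypothetical values, chosen only to exercise the formulas.
X = [(0.0, 0.0), (2.0, 2.0), (2.0, 0.0)]
y = [-1.0, 1.0, 1.0]
alpha = [0.5, 0.3, 0.2]
w, b = [0.7, -0.4], 0.1

analytic = grad_w(w, alpha, X, y)
numeric = numeric_grad_w(w, b, alpha, X, y)
db = dL_db(alpha, y)
```

Here `analytic` and `numeric` agree to within rounding, and `db` is zero because these particular $\alpha_n$ happen to satisfy $\sum_n \alpha_n y_n = 0$.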


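Substituting . . .

$$\mathbf{w} = \sum_{n=1}^{N} \alpha_n y_n \mathbf{x}_n \quad\text{and}\quad \sum_{n=1}^{N} \alpha_n y_n = 0$$

in the Lagrangian

$$\mathcal{L}(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\mathbf{w}^{\mathsf T}\mathbf{w} - \sum_{n=1}^{N} \alpha_n \left(y_n(\mathbf{w}^{\mathsf T}\mathbf{x}_n + b) - 1\right)$$

we get

$$\mathcal{L}(\boldsymbol{\alpha}) = \sum_{n=1}^{N} \alpha_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} y_n y_m\, \alpha_n \alpha_m\, \mathbf{x}_n^{\mathsf T}\mathbf{x}_m$$

Maximize w.r.t. $\boldsymbol{\alpha}$ subject to $\alpha_n \ge 0$ for $n = 1, \cdots, N$ and $\sum_{n=1}^{N} \alpha_n y_n = 0$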
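The intermediate algebra is not shown on the slide. One way to fill it in, using only the two conditions above, is the following:

```latex
\begin{align*}
\mathcal{L}(\mathbf{w}, b, \boldsymbol{\alpha})
  &= \frac{1}{2}\mathbf{w}^{\mathsf T}\mathbf{w}
     - \sum_{n=1}^{N} \alpha_n y_n \mathbf{w}^{\mathsf T}\mathbf{x}_n
     - b \sum_{n=1}^{N} \alpha_n y_n
     + \sum_{n=1}^{N} \alpha_n \\
  &= \frac{1}{2}\mathbf{w}^{\mathsf T}\mathbf{w}
     - \mathbf{w}^{\mathsf T}\Big(\sum_{n=1}^{N} \alpha_n y_n \mathbf{x}_n\Big)
     - b \cdot 0
     + \sum_{n=1}^{N} \alpha_n \\
  &= \frac{1}{2}\mathbf{w}^{\mathsf T}\mathbf{w} - \mathbf{w}^{\mathsf T}\mathbf{w}
     + \sum_{n=1}^{N} \alpha_n
   = \sum_{n=1}^{N} \alpha_n - \frac{1}{2}\mathbf{w}^{\mathsf T}\mathbf{w} \\
  &= \sum_{n=1}^{N} \alpha_n
     - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N}
       y_n y_m\, \alpha_n \alpha_m\, \mathbf{x}_n^{\mathsf T}\mathbf{x}_m .
\end{align*}
```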


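The solution - quadratic programming

$$
\min_{\boldsymbol{\alpha}} \quad \frac{1}{2}\,\boldsymbol{\alpha}^{\mathsf T}
\underbrace{\begin{bmatrix}
y_1 y_1\, \mathbf{x}_1^{\mathsf T}\mathbf{x}_1 & y_1 y_2\, \mathbf{x}_1^{\mathsf T}\mathbf{x}_2 & \cdots & y_1 y_N\, \mathbf{x}_1^{\mathsf T}\mathbf{x}_N \\
y_2 y_1\, \mathbf{x}_2^{\mathsf T}\mathbf{x}_1 & y_2 y_2\, \mathbf{x}_2^{\mathsf T}\mathbf{x}_2 & \cdots & y_2 y_N\, \mathbf{x}_2^{\mathsf T}\mathbf{x}_N \\
\vdots & \vdots & \ddots & \vdots \\
y_N y_1\, \mathbf{x}_N^{\mathsf T}\mathbf{x}_1 & y_N y_2\, \mathbf{x}_N^{\mathsf T}\mathbf{x}_2 & \cdots & y_N y_N\, \mathbf{x}_N^{\mathsf T}\mathbf{x}_N
\end{bmatrix}}_{\text{quadratic coefficients}}
\boldsymbol{\alpha}
\;+\; \underbrace{(-\mathbf{1}^{\mathsf T})}_{\text{linear}}\, \boldsymbol{\alpha}
$$

subject to $\underbrace{\mathbf{y}^{\mathsf T}\boldsymbol{\alpha} = 0}_{\text{linear constraint}}$

$$\underbrace{\mathbf{0}}_{\text{lower bounds}} \;\le\; \boldsymbol{\alpha} \;\le\; \underbrace{\boldsymbol{\infty}}_{\text{upper bounds}}$$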
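To make the inputs to the quadratic program concrete, here is a minimal sketch in plain Python that builds the quadratic coefficients and evaluates the objective. The four-point data set, the candidate `alpha_star`, and all function names are assumptions for illustration. No solver is called; in practice a quadratic programming package would take these coefficients, the linear term, the equality constraint, and the bounds.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quadratic_coefficients(X, y):
    """Matrix with entries y_n y_m x_n^T x_m."""
    N = len(X)
    return [[y[n] * y[m] * dot(X[n], X[m]) for m in range(N)] for n in range(N)]


def qp_objective(Q, alpha):
    """1/2 alpha^T Q alpha + (-1^T) alpha, the quantity to be minimized."""
    N = len(alpha)
    quad = sum(alpha[n] * Q[n][m] * alpha[m] for n in range(N) for m in range(N))
    return 0.5 * quad - sum(alpha)


def is_feasible(alpha, y, tol=1e-9):
    """Check y^T alpha = 0 and alpha >= 0."""
    return abs(dot(y, alpha)) < tol and all(a >= -tol for a in alpha)


# Hypothetical toy data set with two points per class.
X = [(0.0, 0.0), (2.0, 2.0), (2.0, 0.0), (3.0, 0.0)]
y = [-1.0, -1.0, 1.0, 1.0]
Q = quadratic_coefficients(X, y)

# A candidate that satisfies the KKT conditions for this toy set,
# and another feasible point for comparison.
alpha_star = [0.5, 0.5, 1.0, 0.0]
alpha_other = [0.6, 0.5, 1.1, 0.0]

best = qp_objective(Q, alpha_star)
other = qp_objective(Q, alpha_other)
```

Both candidates are feasible, and `best` is lower than `other`, as expected when `alpha_star` is the minimizer for this toy set.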


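QP hands us $\boldsymbol{\alpha}$

Solution: $\boldsymbol{\alpha} = \alpha_1, \cdots, \alpha_N$

$$\Longrightarrow\quad \mathbf{w} = \sum_{n=1}^{N} \alpha_n y_n \mathbf{x}_n$$

KKT condition: For $n = 1, \cdots, N$

$$\alpha_n \left(y_n(\mathbf{w}^{\mathsf T}\mathbf{x}_n + b) - 1\right) = 0$$

We saw this before!

[Figure: contours $E_{\text{in}} = \text{const.}$ around $\mathbf{w}_{\text{lin}}$, the constraint $\mathbf{w}^{\mathsf T}\mathbf{w} = C$, and at the point $\mathbf{w}$ the gradient $\nabla E_{\text{in}}$ and the normal]

$\alpha_n > 0 \Longrightarrow \mathbf{x}_n$ is a support vector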


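Support vectors

[Figure: the separating plane with its margin; the closest points lie on the margin boundaries]

Closest $\mathbf{x}_n$'s to the plane: achieve the margin

$$\Longrightarrow\quad y_n(\mathbf{w}^{\mathsf T}\mathbf{x}_n + b) = 1$$

$$\mathbf{w} = \sum_{\mathbf{x}_n \text{ is SV}} \alpha_n y_n \mathbf{x}_n$$

Solve for $b$ using any SV:

$$y_n(\mathbf{w}^{\mathsf T}\mathbf{x}_n + b) = 1$$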
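The recovery of $\mathbf{w}$ and $b$ from $\boldsymbol{\alpha}$ can be sketched as follows. The toy data and the values in `alpha` are assumptions (they satisfy the KKT conditions for this particular set), and the helper names are hypothetical. Since $y_n = \pm 1$, the condition $y_n(\mathbf{w}^{\mathsf T}\mathbf{x}_n + b) = 1$ gives $b = y_n - \mathbf{w}^{\mathsf T}\mathbf{x}_n$ for any support vector.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def recover_w(alpha, X, y):
    """w = sum_n alpha_n y_n x_n (only support vectors contribute)."""
    dim = len(X[0])
    return [sum(a * yn * xn[i] for a, xn, yn in zip(alpha, X, y))
            for i in range(dim)]


def support_vector_indices(alpha, tol=1e-9):
    """Indices n with alpha_n > 0."""
    return [n for n, a in enumerate(alpha) if a > tol]


def solve_b(w, X, y, sv_index):
    """From y_n (w^T x_n + b) = 1 with y_n = +/-1: b = y_n - w^T x_n."""
    return y[sv_index] - dot(w, X[sv_index])


# Hypothetical toy data and a matching alpha, for illustration only.
X = [(0.0, 0.0), (2.0, 2.0), (2.0, 0.0), (3.0, 0.0)]
y = [-1.0, -1.0, 1.0, 1.0]
alpha = [0.5, 0.5, 1.0, 0.0]

w = recover_w(alpha, X, y)
svs = support_vector_indices(alpha)
b = solve_b(w, X, y, svs[0])

# y_n (w^T x_n + b) for every point; equals 1 exactly at the support vectors.
margins = [yn * (dot(w, xn) + b) for xn, yn in zip(X, y)]
# KKT complementary slackness: alpha_n (y_n (w^T x_n + b) - 1) = 0.
slackness = [a * (m - 1.0) for a, m in zip(alpha, margins)]
```

In this toy case every support vector yields the same `b`, and the one point with $\alpha_n = 0$ lies strictly outside the margin.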


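$\mathbf{z}$ instead of $\mathbf{x}$

$$\mathcal{L}(\boldsymbol{\alpha}) = \sum_{n=1}^{N} \alpha_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} y_n y_m\, \alpha_n \alpha_m\, \mathbf{z}_n^{\mathsf T}\mathbf{z}_m$$

[Figure: the data in $\mathcal{X}$ space (axes from $-1$ to $1$) and in $\mathcal{Z}$ space (axes from $0$ to $1$)]

$\mathcal{X} \longrightarrow \mathcal{Z}$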
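The only change to the dual problem is that inner products are taken in $\mathcal{Z}$ space. A minimal sketch, assuming a hypothetical transform $\Phi(x_1, x_2) = (x_1^2, x_2^2)$ that the slide does not specify, and made-up data:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Hypothetical nonlinear transform, assumed for illustration only."""
    return (x[0] ** 2, x[1] ** 2)


def quadratic_coefficients(points, y):
    """Matrix with entries y_n y_m <p_n, p_m> for the given points."""
    N = len(points)
    return [[y[n] * y[m] * dot(points[n], points[m]) for m in range(N)]
            for n in range(N)]


# Made-up data in X space: two points near the origin, two farther out.
X = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0), (0.0, -1.0)]
y = [1.0, 1.0, -1.0, -1.0]

Z = [phi(x) for x in X]
Q_x = quadratic_coefficients(X, y)   # inner products in X space
Q_z = quadratic_coefficients(Z, y)   # inner products in Z space
```

The matrix stays $N \times N$ whatever the dimension of $\mathcal{Z}$; only its entries change, so the same quadratic program applies with `Q_z` in place of `Q_x`.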


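"Support vectors" in $\mathcal{X}$ space

[Figure: the nonlinear decision boundary in $\mathcal{X}$ space with the pre-images of the support vectors marked]

Support vectors live in $\mathcal{Z}$ space

In $\mathcal{X}$ space, "pre-images" of support vectors

The margin is maintained in $\mathcal{Z}$ space

Generalization result

$$\mathbb{E}[E_{\text{out}}] \le \frac{\mathbb{E}[\text{\# of SV's}]}{N - 1}$$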
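As a purely arithmetic illustration of the bound, with hypothetical numbers that are not from the lecture, suppose $N = 1001$ and the expected number of support vectors is $10$:

```latex
\[
\mathbb{E}[E_{\text{out}}] \;\le\; \frac{\mathbb{E}[\text{\# of SV's}]}{N - 1}
  \;=\; \frac{10}{1001 - 1} \;=\; 0.01 .
\]
```

Note that the bound is a statement about expected values over data sets, not about the support vector count of a single run.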
