1-norm Support Vector Machines: Good for Feature Selection

Solve the mathematical program for some $\nu > 0$:

$$\min_{w,\gamma,y} \;\; \nu e^\top y + \|w\|_1 \quad \text{s.t.} \quad D(Aw - e\gamma) + y \geq e, \;\; y \geq 0,$$

where $D_{ii} = \pm 1$ denotes membership of $A^+$ or $A^-$.
Equivalent to solving a linear program as follows:

$$\min_{w,\gamma,y,s} \;\; \nu e^\top y + e^\top s \quad \text{s.t.} \quad D(Aw - e\gamma) + y \geq e, \;\; -s \leq w \leq s, \;\; y \geq 0.$$
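As a concrete illustration, here is a minimal sketch of this linear program solved with scipy.optimize.linprog; the data $(A, d)$, the parameter nu, and the function name onenorm_svm are our own assumptions, not from the slides.

```python
# Sketch: 1-norm SVM as a linear program (assumed helper, illustrative only).
import numpy as np
from scipy.optimize import linprog

def onenorm_svm(A, d, nu):
    """min nu*e'y + e's  s.t.  D(Aw - e*gamma) + y >= e,  -s <= w <= s,  y >= 0."""
    m, n = A.shape
    D = np.diag(d)  # d_i = +1/-1 encodes membership of A+ or A-
    # Decision vector z = [w (n), gamma (1), y (m), s (n)].
    c = np.concatenate([np.zeros(n + 1), nu * np.ones(m), np.ones(n)])
    # D(Aw - e*gamma) + y >= e   rewritten as   -D A w + d*gamma - y <= -e
    G1 = np.hstack([-D @ A, d.reshape(-1, 1), -np.eye(m), np.zeros((m, n))])
    # -s <= w <= s   rewritten as   w - s <= 0  and  -w - s <= 0
    G2 = np.hstack([np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
    G3 = np.hstack([-np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
    G = np.vstack([G1, G2, G3])
    h = np.concatenate([-np.ones(m), np.zeros(2 * n)])
    bounds = ([(None, None)] * (n + 1)   # w and gamma are free
              + [(0, None)] * (m + n))   # y >= 0, s >= 0
    res = linprog(c, A_ub=G, b_ub=h, bounds=bounds, method="highs")
    w, gamma = res.x[:n], res.x[n]
    return w, gamma
```

Components of $w$ driven exactly to zero by the 1-norm can be dropped, which is what makes this formulation good for feature selection.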
$\varepsilon$-Support Vector Regression (Linear Case: $f(x) = x^\top w + b$)

Given the training set:

$$S = \{(x_i, y_i) \mid x_i \in \mathbb{R}^n, \; y_i \in \mathbb{R}, \; i = 1, \ldots, m\},$$

represented by an $m \times n$ matrix $A$ and a vector $y \in \mathbb{R}^m$.

Try to find $(w, b)$ such that $y \approx Aw + eb$, that is, $y_i \approx w^\top x_i + b$ for $i = 1, \ldots, m$, where $e = [1, \ldots, 1]^\top \in \mathbb{R}^m$.

Motivated by SVM: $\|w\|_2$ should be as small as possible, and some tiny error should be discarded.
$\varepsilon$-Insensitive Loss Function

The $\varepsilon$-insensitive loss function:

$$|y_i - f(x_i)|_\varepsilon = \max\{0, \; |y_i - f(x_i)| - \varepsilon\}$$

The loss made by the estimation function $f$ at the data point $(x_i, y_i)$ is

$$|\xi|_\varepsilon = \max\{0, |\xi| - \varepsilon\} = \begin{cases} 0 & \text{if } |\xi| \leq \varepsilon, \\ |\xi| - \varepsilon & \text{otherwise.} \end{cases}$$

If $\xi \in \mathbb{R}^n$, then $|\xi|_\varepsilon \in \mathbb{R}^n$ is defined componentwise as:

$$(|\xi|_\varepsilon)_i = |\xi_i|_\varepsilon, \quad i = 1, \ldots, n$$
(Tiny Error Should Be Discarded)

[Figure: data points scattered about a fitted line inside an $\varepsilon$-tube; points whose residuals fall within $\pm\varepsilon$ of the line incur no loss.]
$\varepsilon$-Insensitive Linear Regression

$$f(x) = x^\top w + b$$

[Figure: the line $f(x)$ with its $\varepsilon$-tube; a point $(x_j, y_j)$ above the tube incurs error $y_j - f(x_j) - \varepsilon$, and a point $(x_k, y_k)$ below it incurs error $f(x_k) - y_k - \varepsilon$.]

Find $(w, b)$ with the smallest overall error.
$\varepsilon$-Insensitive Loss Regression

Linear $\varepsilon$-insensitive loss function:

$$L_\varepsilon(x, y, f) = |y - f(x)|_\varepsilon = \max(0, \; |y - f(x)| - \varepsilon),$$

where $x \in \mathbb{R}^n$, $y \in \mathbb{R}$ and $f$ is a real-valued function.

Quadratic $\varepsilon$-insensitive loss function:

$$L^2_\varepsilon(x, y, f) = |y - f(x)|_\varepsilon^2$$
$\varepsilon$-Insensitive Support Vector Regression Model

Motivated by SVM: $\|w\|_2$ should be as small as possible, and some tiny error should be discarded.

$$\min_{(w,b,\xi) \in \mathbb{R}^{n+1+m}} \;\; \frac{1}{2}\|w\|_2^2 + C e^\top |\xi|_\varepsilon,$$

where $|\xi|_\varepsilon \in \mathbb{R}^m$ and $(|\xi|_\varepsilon)_i = \max(0, \; |A_i w + b - y_i| - \varepsilon)$.
Why Minimize $\|w\|_2$? Probably Approximately Correct (pac)

Consider performing linear regression for any training data distribution $\mathcal{D}$ with $\max_{1 \leq i \leq m} \|(x_i, y_i)\| \leq R$, $0 < \delta < 1$ and $c > 0$. Then

$$\Pr_{\mathcal{D}}\left(err(f) > \frac{c}{m}\left(\frac{\|w\|_2^2 R^2 + SSE}{\varepsilon^2}\, \log^2 m + \log\frac{1}{\delta}\right)\right) < \delta,$$

or equivalently,

$$\Pr_{\mathcal{D}}\left(err(f) \leq \frac{c}{m}\left(\frac{\|w\|_2^2 R^2 + SSE}{\varepsilon^2}\, \log^2 m + \log\frac{1}{\delta}\right)\right) \geq 1 - \delta.$$

A smaller $\|w\|_2$ tightens the bound. Occam's razor: the simplest is the best.
Reformulated $\varepsilon$-SVR as a Constrained Minimization Problem

$$\min_{(w,b,\xi,\xi^*) \in \mathbb{R}^{n+1+2m}} \;\; \frac{1}{2} w^\top w + C e^\top(\xi + \xi^*)$$

subject to

$$y - Aw - eb \leq e\varepsilon + \xi, \quad Aw + eb - y \leq e\varepsilon + \xi^*, \quad \xi, \xi^* \geq 0.$$

This is a minimization problem in $n+1+2m$ variables with $2m$ constraints; splitting the error into $\xi$ and $\xi^*$ enlarges the problem size and the computational cost of solving it. A solver sketch follows below.
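A minimal cvxpy sketch of this constrained primal, assuming data $(A, y)$ and parameters C, eps; the function name is ours:

```python
# Sketch: constrained eps-SVR primal via cvxpy (illustrative, not the slides' code).
import numpy as np
import cvxpy as cp

def eps_svr_primal(A, y, C, eps):
    m, n = A.shape
    w, b = cp.Variable(n), cp.Variable()
    xi = cp.Variable(m, nonneg=True)    # xi  : errors above the tube
    xis = cp.Variable(m, nonneg=True)   # xi* : errors below the tube
    objective = 0.5 * cp.sum_squares(w) + C * cp.sum(xi + xis)
    constraints = [y - A @ w - b <= eps + xi,
                   A @ w + b - y <= eps + xis]
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return w.value, b.value
```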
SV Regression by Minimizing the Quadratic $\varepsilon$-Insensitive Loss

We have the following problem:

$$\min_{(w,b,\xi) \in \mathbb{R}^{n+1+l}} \;\; \frac{1}{2}\|w\|_2^2 + \frac{C}{2} \,\| |\xi|_\varepsilon \|_2^2,$$

where $(|\xi|_\varepsilon)_i = |y_i - (w^\top x_i + b)|_\varepsilon$.
Primal Formulation of SVR for the Quadratic $\varepsilon$-Insensitive Loss

$$\min_{(w,b,\xi^+,\xi^-) \in \mathbb{R}^{n+1+2l}} \;\; \frac{1}{2}\|w\|_2^2 + \frac{C}{2}\left(\|\xi^+\|_2^2 + \|\xi^-\|_2^2\right)$$

subject to

$$y - Aw - eb \leq e\varepsilon + \xi^+, \quad Aw + eb - y \leq e\varepsilon + \xi^-, \quad \xi^+, \xi^- \geq 0.$$

Extremely important: at the solution, $0 \leq \xi^- \perp \xi^+ \geq 0$; that is, $\xi^+$ and $\xi^-$ are complementary, so at most one of $\xi_i^+$, $\xi_i^-$ can be positive for each $i$.
Dual Formulation of SVR for the Quadratic $\varepsilon$-Insensitive Loss

$$\max_{\alpha^+, \alpha^-} \;\; y^\top(\alpha^+ - \alpha^-) - \varepsilon e^\top(\alpha^+ + \alpha^-) - \frac{1}{2}(\alpha^+ - \alpha^-)^\top \left(AA^\top + \frac{1}{C} I\right)(\alpha^+ - \alpha^-)$$

subject to

$$e^\top(\alpha^+ - \alpha^-) = 0, \quad \alpha^+, \alpha^- \geq 0.$$
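A cvxpy sketch of this dual (assumed names); the primal weight vector can be recovered as $w = A^\top(\alpha^+ - \alpha^-)$:

```python
# Sketch: quadratic-loss eps-SVR dual via cvxpy (illustrative only).
import numpy as np
import cvxpy as cp

def eps_svr_dual(A, y, C, eps):
    m = A.shape[0]
    ap = cp.Variable(m, nonneg=True)            # alpha^+
    am = cp.Variable(m, nonneg=True)            # alpha^-
    K = A @ A.T + np.eye(m) / C                 # AA' + I/C
    objective = (y @ (ap - am) - eps * cp.sum(ap + am)
                 - 0.5 * cp.quad_form(ap - am, K))
    cp.Problem(cp.Maximize(objective), [cp.sum(ap - am) == 0]).solve()
    w = A.T @ (ap.value - am.value)             # recover w from the dual
    return w, ap.value, am.value
```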
KKT Complementarity Conditions

The KKT conditions are:

$$0 \leq \alpha^+ \perp \; y - Aw - eb - e\varepsilon - \xi^+ \leq 0, \qquad 0 \leq \alpha^- \perp \; Aw + eb - y - e\varepsilon - \xi^- \leq 0,$$

$$0 \leq \alpha^- \perp \alpha^+ \geq 0, \qquad 0 \leq \xi^- \perp \xi^+ \geq 0.$$

Don't forget we have: $\alpha^+ = C\xi^+$ and $\alpha^- = C\xi^-$.
Simplified Dual Formulation of SVR

Setting $\alpha = \alpha^+ - \alpha^-$:

$$\max_{\alpha} \;\; y^\top \alpha - \varepsilon \|\alpha\|_1 - \frac{1}{2} \alpha^\top \left(AA^\top + \frac{1}{C} I\right) \alpha$$

subject to $e^\top \alpha = 0$.

In the case $\varepsilon = 0$, the problem reduces to least squares linear regression with a weight decay factor.