1-norm Support Vector Machines: Good for Feature Selection

Solve the mathematical program for some $\nu > 0$:

$$\min_{w,\gamma,y} \;\; \nu e^\top y + \|w\|_1 \quad \text{s.t.} \quad D(Aw - e\gamma) + y \geq e, \;\; y \geq 0,$$

where $D_{ii} = \pm 1$ denotes membership of $A^+$ or $A^-$.
Equivalent to solving a linear program as follows:

$$\min_{w,\gamma,y,s} \;\; \nu e^\top y + e^\top s \quad \text{s.t.} \quad D(Aw - e\gamma) + y \geq e, \;\; -s \leq w \leq s, \;\; y \geq 0.$$
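As a concrete illustration, here is a minimal sketch of this linear program solved with scipy.optimize.linprog; the data $(A, d)$, the parameter nu, and the function name onenorm_svm are our own assumptions, not from the slides.

```python
# Sketch: 1-norm SVM as a linear program (assumed helper, illustrative only).
import numpy as np
from scipy.optimize import linprog

def onenorm_svm(A, d, nu):
    """min nu*e'y + e's  s.t.  D(Aw - e*gamma) + y >= e,  -s <= w <= s,  y >= 0."""
    m, n = A.shape
    D = np.diag(d)  # d_i = +1/-1 encodes membership of A+ or A-
    # Decision vector z = [w (n), gamma (1), y (m), s (n)].
    c = np.concatenate([np.zeros(n + 1), nu * np.ones(m), np.ones(n)])
    # D(Aw - e*gamma) + y >= e   rewritten as   -D A w + d*gamma - y <= -e
    G1 = np.hstack([-D @ A, d.reshape(-1, 1), -np.eye(m), np.zeros((m, n))])
    # -s <= w <= s   rewritten as   w - s <= 0  and  -w - s <= 0
    G2 = np.hstack([np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
    G3 = np.hstack([-np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
    G = np.vstack([G1, G2, G3])
    h = np.concatenate([-np.ones(m), np.zeros(2 * n)])
    bounds = ([(None, None)] * (n + 1)   # w and gamma are free
              + [(0, None)] * (m + n))   # y >= 0, s >= 0
    res = linprog(c, A_ub=G, b_ub=h, bounds=bounds, method="highs")
    w, gamma = res.x[:n], res.x[n]
    return w, gamma
```

Components of $w$ driven exactly to zero by the 1-norm can be dropped, which is what makes this formulation good for feature selection.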
$\varepsilon$-Support Vector Regression (Linear Case: $f(x) = x^\top w + b$)

Given the training set:

$$S = \{(x_i, y_i) \mid x_i \in \mathbb{R}^n, \; y_i \in \mathbb{R}, \; i = 1, \ldots, m\},$$

represented by an $m \times n$ matrix $A$ and a vector $y \in \mathbb{R}^m$.

Try to find $(w, b)$ such that $y \approx Aw + eb$, that is, $y_i \approx w^\top x_i + b$ for $i = 1, \ldots, m$, where $e = [1, \ldots, 1]^\top \in \mathbb{R}^m$.

Motivated by SVM: $\|w\|_2$ should be as small as possible, and some tiny error should be discarded.
$\varepsilon$-Insensitive Loss Function

The $\varepsilon$-insensitive loss function:

$$|y_i - f(x_i)|_\varepsilon = \max\{0, \; |y_i - f(x_i)| - \varepsilon\}$$

The loss made by the estimation function $f$ at the data point $(x_i, y_i)$ is

$$|\xi|_\varepsilon = \max\{0, |\xi| - \varepsilon\} = \begin{cases} 0 & \text{if } |\xi| \leq \varepsilon, \\ |\xi| - \varepsilon & \text{otherwise.} \end{cases}$$

If $\xi \in \mathbb{R}^n$, then $|\xi|_\varepsilon \in \mathbb{R}^n$ is defined componentwise as:

$$(|\xi|_\varepsilon)_i = |\xi_i|_\varepsilon, \quad i = 1, \ldots, n$$
(Tiny Error Should Be Discarded)

[Figure: data points scattered about a fitted line inside an $\varepsilon$-tube; points whose residuals fall within $\pm\varepsilon$ of the line incur no loss.]
$\varepsilon$-Insensitive Linear Regression

$$f(x) = x^\top w + b$$

[Figure: the line $f(x)$ with its $\varepsilon$-tube; a point $(x_j, y_j)$ above the tube incurs error $y_j - f(x_j) - \varepsilon$, and a point $(x_k, y_k)$ below it incurs error $f(x_k) - y_k - \varepsilon$.]

Find $(w, b)$ with the smallest overall error.
$\varepsilon$-Insensitive Loss Regression

Linear $\varepsilon$-insensitive loss function:

$$L_\varepsilon(x, y, f) = |y - f(x)|_\varepsilon = \max(0, \; |y - f(x)| - \varepsilon),$$

where $x \in \mathbb{R}^n$, $y \in \mathbb{R}$ and $f$ is a real-valued function.

Quadratic $\varepsilon$-insensitive loss function:

$$L^2_\varepsilon(x, y, f) = |y - f(x)|_\varepsilon^2$$
$\varepsilon$-Insensitive Support Vector Regression Model

Motivated by SVM: $\|w\|_2$ should be as small as possible, and some tiny error should be discarded.

$$\min_{(w,b,\xi) \in \mathbb{R}^{n+1+m}} \;\; \frac{1}{2}\|w\|_2^2 + C e^\top |\xi|_\varepsilon,$$

where $|\xi|_\varepsilon \in \mathbb{R}^m$ and $(|\xi|_\varepsilon)_i = \max(0, \; |A_i w + b - y_i| - \varepsilon)$.
Why Minimize $\|w\|_2$? Probably Approximately Correct (pac)

Consider performing linear regression for any training data distribution $\mathcal{D}$ with $\max_{1 \leq i \leq m} \|(x_i, y_i)\| \leq R$, $0 < \delta < 1$ and $c > 0$. Then

$$\Pr_{\mathcal{D}}\left(err(f) > \frac{c}{m}\left(\frac{\|w\|_2^2 R^2 + SSE}{\varepsilon^2}\, \log^2 m + \log\frac{1}{\delta}\right)\right) < \delta,$$

or equivalently,

$$\Pr_{\mathcal{D}}\left(err(f) \leq \frac{c}{m}\left(\frac{\|w\|_2^2 R^2 + SSE}{\varepsilon^2}\, \log^2 m + \log\frac{1}{\delta}\right)\right) \geq 1 - \delta.$$

A smaller $\|w\|_2$ tightens the bound. Occam's razor: the simplest is the best.
Reformulated $\varepsilon$-SVR as a Constrained Minimization Problem

$$\min_{(w,b,\xi,\xi^*) \in \mathbb{R}^{n+1+2m}} \;\; \frac{1}{2} w^\top w + C e^\top(\xi + \xi^*)$$

subject to

$$y - Aw - eb \leq e\varepsilon + \xi, \quad Aw + eb - y \leq e\varepsilon + \xi^*, \quad \xi, \xi^* \geq 0.$$

This is a minimization problem in $n+1+2m$ variables with $2m$ constraints; splitting the error into $\xi$ and $\xi^*$ enlarges the problem size and the computational cost of solving it. A solver sketch follows below.
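A minimal cvxpy sketch of this constrained primal, assuming data $(A, y)$ and parameters C, eps; the function name is ours:

```python
# Sketch: constrained eps-SVR primal via cvxpy (illustrative, not the slides' code).
import numpy as np
import cvxpy as cp

def eps_svr_primal(A, y, C, eps):
    m, n = A.shape
    w, b = cp.Variable(n), cp.Variable()
    xi = cp.Variable(m, nonneg=True)    # xi  : errors above the tube
    xis = cp.Variable(m, nonneg=True)   # xi* : errors below the tube
    objective = 0.5 * cp.sum_squares(w) + C * cp.sum(xi + xis)
    constraints = [y - A @ w - b <= eps + xi,
                   A @ w + b - y <= eps + xis]
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return w.value, b.value
```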
SV Regression by Minimizing the Quadratic $\varepsilon$-Insensitive Loss

We have the following problem:

$$\min_{(w,b,\xi) \in \mathbb{R}^{n+1+l}} \;\; \frac{1}{2}\|w\|_2^2 + \frac{C}{2} \,\| |\xi|_\varepsilon \|_2^2,$$

where $(|\xi|_\varepsilon)_i = |y_i - (w^\top x_i + b)|_\varepsilon$.
Primal Formulation of SVR for the Quadratic $\varepsilon$-Insensitive Loss

$$\min_{(w,b,\xi^+,\xi^-) \in \mathbb{R}^{n+1+2l}} \;\; \frac{1}{2}\|w\|_2^2 + \frac{C}{2}\left(\|\xi^+\|_2^2 + \|\xi^-\|_2^2\right)$$

subject to

$$y - Aw - eb \leq e\varepsilon + \xi^+, \quad Aw + eb - y \leq e\varepsilon + \xi^-, \quad \xi^+, \xi^- \geq 0.$$

Extremely important: at the solution, $0 \leq \xi^- \perp \xi^+ \geq 0$; that is, $\xi^+$ and $\xi^-$ are complementary, so at most one of $\xi_i^+$, $\xi_i^-$ can be positive for each $i$.
Dual Formulation of SVR for the Quadratic $\varepsilon$-Insensitive Loss

$$\max_{\alpha^+, \alpha^-} \;\; y^\top(\alpha^+ - \alpha^-) - \varepsilon e^\top(\alpha^+ + \alpha^-) - \frac{1}{2}(\alpha^+ - \alpha^-)^\top \left(AA^\top + \frac{1}{C} I\right)(\alpha^+ - \alpha^-)$$

subject to

$$e^\top(\alpha^+ - \alpha^-) = 0, \quad \alpha^+, \alpha^- \geq 0.$$
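A cvxpy sketch of this dual (assumed names); the primal weight vector can be recovered as $w = A^\top(\alpha^+ - \alpha^-)$:

```python
# Sketch: quadratic-loss eps-SVR dual via cvxpy (illustrative only).
import numpy as np
import cvxpy as cp

def eps_svr_dual(A, y, C, eps):
    m = A.shape[0]
    ap = cp.Variable(m, nonneg=True)            # alpha^+
    am = cp.Variable(m, nonneg=True)            # alpha^-
    K = A @ A.T + np.eye(m) / C                 # AA' + I/C
    objective = (y @ (ap - am) - eps * cp.sum(ap + am)
                 - 0.5 * cp.quad_form(ap - am, K))
    cp.Problem(cp.Maximize(objective), [cp.sum(ap - am) == 0]).solve()
    w = A.T @ (ap.value - am.value)             # recover w from the dual
    return w, ap.value, am.value
```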
KKT Complementarity Conditions

The KKT conditions are:

$$0 \leq \alpha^+ \perp \; y - Aw - eb - e\varepsilon - \xi^+ \leq 0, \qquad 0 \leq \alpha^- \perp \; Aw + eb - y - e\varepsilon - \xi^- \leq 0,$$

$$0 \leq \alpha^- \perp \alpha^+ \geq 0, \qquad 0 \leq \xi^- \perp \xi^+ \geq 0.$$

Don't forget we have: $\alpha^+ = C\xi^+$ and $\alpha^- = C\xi^-$.
Simplified Dual Formulation of SVR

Setting $\alpha = \alpha^+ - \alpha^-$:

$$\max_{\alpha} \;\; y^\top \alpha - \varepsilon \|\alpha\|_1 - \frac{1}{2} \alpha^\top \left(AA^\top + \frac{1}{C} I\right) \alpha$$

subject to $e^\top \alpha = 0$.

In the case $\varepsilon = 0$, the problem reduces to least squares linear regression with a weight decay factor.