
Classification Problem: 2-Category Linearly Separable Case

[Figure: two linearly separable point sets, A+ (Benign) and A− (Malignant), separated by the plane x'w + b = 0, with the bounding planes x'w + b = +1 and x'w + b = −1 and the normal vector w.]
Support Vector Machines: Maximizing the Margin between Bounding Planes

[Figure: the bounding planes x'w + b = +1 and x'w + b = −1 supporting A+ and A−, with normal vector w; the distance between them is the margin, 2 / ||w||_2.]

Algebra of the Classification Problem: 2-Category Linearly Separable Case

Given m points in the n-dimensional real space R^n, represented by an m × n matrix A.
Membership of each point A_i in the classes A−, A+ is specified by an m × m diagonal matrix D:
  D_ii = −1 if A_i ∈ A−  and  D_ii = +1 if A_i ∈ A+.
Separate A− and A+ by two bounding planes such that:
  A_i w + b ≥ +1, for D_ii = +1;   A_i w + b ≤ −1, for D_ii = −1.
More succinctly:
  D(Aw + eb) ≥ e,  where e = [1, 1, ..., 1]' ∈ R^m.
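To make the notation concrete, here is a minimal numpy sketch (hypothetical toy data, not from the slides) that builds A, D and e and checks the bounding-plane condition for a candidate (w, b):

```python
import numpy as np

# Hypothetical 2-D toy data: rows of A are the points, labels in {-1, +1}.
A = np.array([[2.0, 2.0],
              [3.0, 3.0],
              [-1.0, -1.0],
              [-2.0, -1.0]])        # m x n matrix of points
y = np.array([1, 1, -1, -1])        # class memberships
m, n = A.shape

D = np.diag(y)                      # m x m diagonal label matrix
e = np.ones(m)                      # vector of ones in R^m

# Candidate separating plane x'w + b = 0.
w = np.array([1.0, 1.0])
b = -1.0

# Bounding-plane condition D(Aw + eb) >= e, checked componentwise.
print(D @ (A @ w + e * b) >= e)
```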

Support Vector Classification (Linearly Separable Case)

Let S = {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)} be a linearly separable training sample, represented by the matrices
  A = [(x_1)' ; (x_2)' ; ... ; (x_l)'] ∈ R^{l×n}   (the i-th row of A is (x_i)'),
  D = diag(y_1, ..., y_l) ∈ R^{l×l}.

Support Vector Classification (Linearly Separable Case, Primal)

The hyperplane (w, b) that solves the minimization problem
  min_{(w,b) ∈ R^{n+1}}  (1/2)||w||_2^2
  subject to  D(Aw + eb) ≥ e
realizes the maximal margin hyperplane with geometric margin γ = 1 / ||w||_2.
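A small sketch of this primal problem using a general-purpose solver (scipy's SLSQP is assumed here; a dedicated QP solver would normally be used), on the same hypothetical toy data as above:

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m, n = A.shape
D, e = np.diag(y), np.ones(m)

# Variables stacked as z = (w, b); objective (1/2)||w||_2^2.
obj = lambda z: 0.5 * z[:n] @ z[:n]
# Constraint D(Aw + eb) - e >= 0, componentwise.
cons = {"type": "ineq", "fun": lambda z: D @ (A @ z[:n] + e * z[n]) - e}

res = minimize(obj, x0=np.zeros(n + 1), constraints=[cons], method="SLSQP")
w, b = res.x[:n], res.x[n]
print("w =", w, "b =", b, "geometric margin =", 1 / np.linalg.norm(w))
```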

Support Vector Classification (Linearly Separable Case, Dual Form)

The dual problem of the previous MP:
  max_{α ∈ R^l}  e'α − (1/2)α'DAA'Dα
  subject to  e'Dα = 0,  α ≥ 0.
Applying the KKT optimality conditions, we have w = A'Dα. But where is b?
Don't forget:
  0 ≤ α ⊥ D(Aw + eb) − e ≥ 0
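The dual can be sketched the same way (same toy data assumed); w is recovered afterwards from the KKT condition w = A'Dα:

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
l = len(y)
D, e = np.diag(y), np.ones(l)
Q = D @ A @ A.T @ D                        # quadratic term D A A' D

# max e'a - (1/2)a'Qa   <=>   min (1/2)a'Qa - e'a
obj = lambda a: 0.5 * a @ Q @ a - e @ a
cons = [{"type": "eq", "fun": lambda a: e @ D @ a}]   # e'D alpha = 0
bnds = [(0, None)] * l                                # alpha >= 0

alpha = minimize(obj, np.zeros(l), bounds=bnds,
                 constraints=cons, method="SLSQP").x
w = A.T @ D @ alpha                        # KKT: w = A'D alpha
print("alpha =", alpha.round(4), "w =", w)
```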

Dual Representation of SVM (Key of Kernel Methods)

The hypothesis is determined by (α*, b*):
  h(x) = sgn( ⟨x, A'Dα*⟩ + b* )
       = sgn( Σ_{i=1}^{l} y_i α*_i ⟨x_i, x⟩ + b* )
       = sgn( Σ_{α*_i > 0} y_i α*_i ⟨x_i, x⟩ + b* )
where
  w = A'Dα* = Σ_{i=1}^{l} y_i α*_i A_i'.
Remember: A_i' = x_i.
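A sketch of this dual-represented decision rule, assuming alpha and b come from a dual solve such as the one above:

```python
import numpy as np

def h(x, A, y, alpha, b):
    """Dual-form decision rule: sgn( sum_{alpha_i > 0} y_i alpha_i <x_i, x> + b )."""
    sv = alpha > 1e-8                      # only support vectors contribute
    return np.sign((y[sv] * alpha[sv]) @ (A[sv] @ x) + b)
```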

Compute the Geometric Margin via Dual Solution

The geometric margin is γ = 1 / ||w*||_2 and ⟨w*, w*⟩ = (α*)'DAA'Dα*, hence we can compute γ by using α*. Use the KKT conditions again (in the dual)!
  0 ≤ α* ⊥ D(AA'Dα* + b*e) − e ≥ 0   (don't forget e'Dα* = 0)
  γ = (e'α*)^{−1/2} = ( Σ_{α*_i > 0} α*_i )^{−1/2}
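A sketch of both computations, assuming a hard-margin dual solution alpha: the margin comes straight from e'α*, and b from the active constraint at any support vector (for α*_i > 0, complementarity gives y_i(A_i w + b) = 1):

```python
import numpy as np

def margin_and_b(A, y, alpha):
    """gamma = (e'alpha)^(-1/2) and b from an active constraint at a support vector."""
    w = A.T @ (y * alpha)                  # w = A'D alpha
    gamma = 1.0 / np.sqrt(alpha.sum())     # equals 1/||w||_2 at the dual optimum
    i = int(np.argmax(alpha))              # any index with alpha_i > 0 works
    b = y[i] - A[i] @ w                    # from y_i (A_i w + b) = 1
    return gamma, b
```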

Soft Margin SVM (Nonseparable Case)

If data are not linearly separable:
  the primal problem is infeasible;
  the dual problem is unbounded above.
Introduce a slack variable ξ_i for each training point:
  y_i(w'x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0  ∀ i.
The inequality system is always feasible, e.g. w = 0, b = 0 and ξ = e.

[Figure: nonseparable data (two classes marked x and o) with geometric margin γ and slack variables ξ_i, ξ_j measuring how far the violating points fall on the wrong side of their bounding plane.]
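For illustration, the slack each point needs under a fixed (w, b) can be computed directly (hypothetical helper, not part of the slides):

```python
import numpy as np

def slacks(A, y, w, b):
    """xi_i = max(0, 1 - y_i(w'x_i + b)): violation of the bounding-plane condition."""
    return np.maximum(0.0, 1.0 - y * (A @ w + b))
```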

Two Different Measures of Training Error

2-Norm Soft Margin:
  min_{(w,b,ξ) ∈ R^{n+1+l}}  (1/2)||w||_2^2 + (C/2)||ξ||_2^2
  subject to  D(Aw + eb) + ξ ≥ e
1-Norm Soft Margin:
  min_{(w,b,ξ) ∈ R^{n+1+l}}  (1/2)||w||_2^2 + C e'ξ
  subject to  D(Aw + eb) + ξ ≥ e,  ξ ≥ 0
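The two penalty terms, written out as plain functions for comparison (sketch):

```python
import numpy as np

def two_norm_objective(w, xi, C):
    # (1/2)||w||_2^2 + (C/2)||xi||_2^2
    return 0.5 * w @ w + 0.5 * C * xi @ xi

def one_norm_objective(w, xi, C):
    # (1/2)||w||_2^2 + C e'xi   (with xi >= 0 enforced as a constraint)
    return 0.5 * w @ w + C * xi.sum()
```

For reference, the 1-norm soft margin is the formulation solved by common SVM libraries (e.g. sklearn.svm.SVC); the slides below develop the 2-norm version.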

2-Norm Soft Margin Dual Formulation

The Lagrangian for the 2-norm soft margin:
  L(w, b, ξ, α) = (1/2)w'w + (C/2)ξ'ξ + α'[e − D(Aw + eb) − ξ],  where α ≥ 0.
The partial derivatives with respect to the primal variables equal zero:
  ∂L/∂w = w − A'Dα = 0
  ∂L/∂b = 0  ⇒  e'Dα = 0
  ∂L/∂ξ = Cξ − α = 0

Dual Maximization Problem for 2-Norm Soft Margin

Dual:
  max_{α ∈ R^l}  e'α − (1/2)α'D(AA' + (1/C)I)Dα
  subject to  e'Dα = 0,  α ≥ 0.
The corresponding KKT complementarity:
  0 ≤ α ⊥ D(Aw + eb) + ξ − e ≥ 0
Use the above conditions to find b*.
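Relative to the hard-margin dual sketch earlier, the only change is the matrix in the quadratic term (sketch):

```python
import numpy as np

def two_norm_dual_matrix(A, y, C):
    """Q = D(AA' + I/C)D, the quadratic term of the 2-norm soft-margin dual."""
    D = np.diag(y)
    return D @ (A @ A.T + np.eye(len(y)) / C) @ D
```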

Linear Machine in Feature Space

Let φ : X → F be a nonlinear map from the input space to some feature space.
The classifier will be in the form (primal):
  f(x) = ( Σ_i w_i φ_i(x) ) + b
Make it in the dual form:
  f(x) = ( Σ_{i=1}^{l} α_i y_i ⟨φ(x_i) · φ(x)⟩ ) + b

Kernel: Represent Inner Product in Feature Space

Definition: A kernel is a function K : X × X → R such that for all x, z ∈ X
  K(x, z) = ⟨φ(x) · φ(z)⟩,
where φ : X → F.
The classifier will become:
  f(x) = ( Σ_{i=1}^{l} α_i y_i K(x_i, x) ) + b
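A worked check that a kernel computes an inner product in a feature space: for the homogeneous degree-2 polynomial kernel K(x, z) = (x'z)^2 on R^2 (chosen here purely as an illustration, not taken from the slides), the explicit map φ(x) = (x_1^2, √2·x_1·x_2, x_2^2) reproduces it exactly:

```python
import numpy as np

def K(x, z):                     # degree-2 homogeneous polynomial kernel
    return (x @ z) ** 2

def phi(x):                      # explicit feature map for this kernel
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(K(x, z), phi(x) @ phi(z))  # both equal (x'z)^2 = 1.0
```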

Introduce Kernel into Dual Formulation

Let S = {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)} be a linearly separable training sample in the feature space implicitly defined by the kernel K(x, z). The SV classifier is determined by the α* that solves
  max_{α ∈ R^l}  e'α − (1/2)α'D K(A, A')Dα
  subject to  e'Dα = 0,  α ≥ 0.
The value of the kernel function represents the inner product in the feature space.
Kernel functions merge two steps:
  1. map the input data from the input space to the feature space (which might be infinite-dimensional);
  2. compute the inner product in the feature space.
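A sketch of the kernelized dual matrix: it is the linear dual with D A A' D replaced by D K(A, A') D. The Gaussian (RBF) kernel below is only an example choice:

```python
import numpy as np

def gram_rbf(A, gamma=1.0):
    """Gram matrix K(A, A')_ij = exp(-gamma ||x_i - x_j||^2)."""
    sq = np.sum(A ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * A @ A.T))

def kernel_dual_matrix(A, y, gamma=1.0):
    D = np.diag(y)
    return D @ gram_rbf(A, gamma) @ D      # replaces D A A' D in the linear dual
```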

Kernel Technique: Based on Mercer's Condition (1909)

Mercer's condition guarantees the convexity of the QP.
Let X = {x_1, x_2, ..., x_n} be a finite input space and let k(x, z) be a symmetric function on X.
Then k(x, z) is a kernel function if and only if the matrix
  K ∈ R^{n×n},  K_ij = k(x_i, x_j),
is positive semi-definite.
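On a finite sample this condition can be checked numerically (sketch): build the Gram matrix and test symmetry and positive semi-definiteness via its eigenvalues:

```python
import numpy as np

def is_kernel_on_sample(k, X, tol=1e-10):
    """Check symmetry and positive semi-definiteness of K_ij = k(x_i, x_j)."""
    K = np.array([[k(xi, xj) for xj in X] for xi in X])
    symmetric = np.allclose(K, K.T)
    psd = np.all(np.linalg.eigvalsh(K) >= -tol)
    return symmetric and psd
```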

Introduce Kernel in Dual Formulation for 2-Norm Soft Margin

The feature space is implicitly defined by k(x, z). Suppose α* solves the QP problem:
  max_{α ∈ R^l}  e'α − (1/2)α'D(K(A, A') + (1/C)I)Dα
  subject to  e'Dα = 0,  α ≥ 0.
Then the decision rule is defined by
  h(x) = sgn( K(x, A')Dα* + b* ).
Use the above conditions to find b*.

Introduce Kernel in Dual Formulation for 2-Norm Soft Margin

b* is chosen so that
  y_i [ K(A_i, A')Dα* + b* ] = 1 − α*_i / C
for any i with α*_i ≠ 0.
Because:
  0 ≤ α* ⊥ D(K(A, A')Dα* + eb*) + ξ* − e ≥ 0  and  α* = Cξ*.
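A sketch of recovering b* from this condition, given the Gram matrix, labels, α* and C (helper and argument names are hypothetical):

```python
import numpy as np

def bias_two_norm(K_gram, y, alpha, C):
    """b* from y_i [K(A_i, A')D alpha* + b*] = 1 - alpha*_i / C, any i with alpha*_i > 0."""
    i = int(np.argmax(alpha))                        # pick a point with alpha_i > 0
    return (1 - alpha[i] / C) / y[i] - K_gram[i] @ (y * alpha)
```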

Geometric Margin in Feature Space for 2-Norm Soft Margin

The geometric margin in the feature space is defined by
  γ = 1 / ||w*||_2 = ( e'α* − (1/C)||α*||_2^2 )^{−1/2},
since
  ||w*||_2^2 = (α*)'D K(A, A')Dα* = ... = e'α* − (1/C)||α*||_2^2.
Why is e'ξ* ≥ ||ξ*||_2^2 ?
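The margin formula above translates directly into code (sketch; assumes α* is a genuine dual optimum, so the quantity under the root is nonnegative):

```python
import numpy as np

def margin_two_norm(alpha, C):
    """gamma = (e'alpha* - (1/C)||alpha*||_2^2)^(-1/2) in the feature space."""
    return (alpha.sum() - (alpha @ alpha) / C) ** -0.5
```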

Discussion about C for 2-Norm Soft Margin

The only difference between the "hard margin" and the 2-norm soft margin is the objective function in the optimization problem.
A larger C will give you a smaller margin in the feature space.
Compare K(A, A') and (K(A, A') + (1/C)I).
A smaller C will give you a better numerical condition.
