Topics in Algorithms 2007
Ramesh Hariharan
Support Vector Machines
Machine Learning
How do we learn a good separator for two classes of points?
The separator could be linear or non-linear.
Goal: maximize the margin of separation.
Support Vector Machines: Hyperplane
Take a unit normal w, so |w| = 1. For every point x on the hyperplane,
w·x = |w||x| cos(θ) = |x| cos(θ) = constant = -b
so the hyperplane is the set of points satisfying w·x + b = 0.
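The statement above can be checked numerically. This is an illustrative sketch with toy numbers (not from the slides): for a unit normal w, every point on the hyperplane has the same projection onto w, namely -b.

```python
import numpy as np

# Toy check: with a unit normal w, every point x on the hyperplane
# w.x + b = 0 has the same projection w.x = |x| cos(theta) = -b.
w = np.array([0.6, 0.8])            # unit normal: |w| = 1
b = -2.0                            # hyperplane: w.x = -b = 2
points_on_plane = [np.array([0.0, 2.5]),   # 0.8 * 2.5 = 2
                   np.array([2.0, 1.0])]   # 0.6 * 2 + 0.8 * 1 = 2
projections = [float(w @ x) for x in points_on_plane]
```

Both projections equal -b = 2, even though the points themselves differ.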
Support Vector Machines: Margin of Separation
With |w| = 1:
x Є Blue: w·x + b >= Δ
x Є Red: w·x + b <= -Δ
maximize 2Δ over w, b, Δ
The separating hyperplane is w·x + b = 0; the margin boundaries are w·x + b = Δ and w·x + b = -Δ.
Support Vector Machines: Eliminating Δ
Divide both constraints by Δ:
x Є Blue: (w/Δ)·x + (b/Δ) >= 1
x Є Red: (w/Δ)·x + (b/Δ) <= -1
Setting w' = w/Δ and b' = b/Δ gives |w'| = |w|/Δ = 1/Δ, so maximizing 2Δ is the same as minimizing |w'|.
Support Vector Machines: Perfect Separation Formulation
x Є Blue: w'·x + b' >= 1
x Є Red: w'·x + b' <= -1
minimize |w'| over w', b', or equivalently:
minimize (w'·w')/2 over w', b'
Support Vector Machines: Formulation Allowing Misclassification
Perfect separation requires:
x Є Blue: w·x + b >= 1
x Є Red: -(w·x + b) >= 1
minimize (w·w)/2 over w, b
Relax with slack variables ξi >= 0:
xi Є Blue: w·xi + b >= 1 - ξi
xi Є Red: -(w·xi + b) >= 1 - ξi
minimize (w·w)/2 + C Σξi over w, b, ξi
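At the optimum, each ξi equals the hinge loss max(0, 1 - yi(w·xi + b)), so the soft-margin primal can be minimized directly by subgradient descent. A minimal sketch on toy data (the data, learning rate, and iteration count are assumptions for illustration, not from the slides):

```python
import numpy as np

# Soft-margin primal as unconstrained hinge-loss minimization:
#   (w.w)/2 + C * sum(max(0, 1 - y_i (w.x_i + b)))
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2, 0.5, (20, 2)),   # Blue class
               rng.normal(-2, 0.5, (20, 2))])  # Red class
y = np.array([+1] * 20 + [-1] * 20)

w, b, C, lr = np.zeros(2), 0.0, 1.0, 0.01
for _ in range(2000):
    margins = y * (X @ w + b)
    viol = margins < 1                         # points with nonzero hinge loss
    grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
    grad_b = -C * y[viol].sum()
    w, b = w - lr * grad_w, b - lr * grad_b

acc = np.mean(np.sign(X @ w + b) == y)
```

On well-separated clusters like these, the learned hyperplane classifies the training points correctly.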
Support Vector Machines: Duality
Primal (yi = +/-1 is the class label):
minimize (w·w)/2 + C Σξi over w, b, ξi
subject to yi(w·xi + b) + ξi >= 1, ξi >= 0
Dual:
maximize Σλi - (ΣiΣj λiλjyiyj (xi·xj))/2 over λi
subject to Σλi yi = 0, λi >= 0, -λi >= -C (i.e. 0 <= λi <= C)
Support Vector Machines: Duality (Primal vs. Lagrangian Primal)
Primal:
min (w·w)/2 + C Σξi over w, b, ξi
subject to yi(w·xi + b) + ξi >= 1, ξi >= 0, yi = +/-1 the class label
Lagrangian Primal:
min over w, b, ξi of max over λi, αi >= 0 of
(w·w)/2 + C Σξi - Σi λi (yi(w·xi + b) + ξi - 1) - Σi αi (ξi - 0)
If the Primal is feasible then Primal = Lagrangian Primal: at any feasible point the inner max is attained with all λi, αi = 0, recovering the primal objective, while at an infeasible point the inner max is +∞.
Support Vector Machines: Lagrangian Primal vs. Lagrangian Dual
Lagrangian Primal >= Lagrangian Dual, where
Lagrangian Primal:
min over w, b, ξi of max over λi, αi >= 0 of
(w·w)/2 + C Σξi - Σi λi (yi(w·xi + b) + ξi - 1) - Σi αi (ξi - 0)
Lagrangian Dual:
max over λi, αi >= 0 of min over w, b, ξi of
(w·w)/2 + C Σξi - Σi λi (yi(w·xi + b) + ξi - 1) - Σi αi (ξi - 0)
Support Vector Machines: Lagrangian Primal >= Lagrangian Dual
Proof: consider a 2-d matrix whose rows are indexed by (w, b, ξi) and whose columns are indexed by (λi, αi).
LP: find the max in each row, then take the smallest of these values (min-max).
LD: find the min in each column, then take the largest of these values (max-min).
The smallest row-maximum is always at least the largest column-minimum, so LP >= LD.
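The matrix argument is easy to verify empirically. A quick numeric illustration (random matrices chosen for the demonstration, not from the slides):

```python
import numpy as np

# For any matrix: the smallest row-maximum (the min-max / LP side)
# is at least the largest column-minimum (the max-min / LD side).
rng = np.random.default_rng(1)
holds = True
for _ in range(100):
    M = rng.normal(size=(5, 7))
    lp = M.max(axis=1).min()    # max in each row, then smallest of these
    ld = M.min(axis=0).max()    # min in each column, then largest of these
    holds = holds and (lp >= ld)
```

The inequality holds for every matrix: the row containing the largest column-minimum has a row-maximum at least that large.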
Support Vector Machines: Can Lagrangian Primal = Lagrangian Dual?
Proof: consider w*, b*, ξi* optimal for the primal. Find λi, αi >= 0 such that minimizing over w, b, ξi gives w*, b*, ξi*, and
Σi λi (yi(w*·xi + b*) + ξi* - 1) = 0
Σi αi (ξi* - 0) = 0
where the problem in question is
max over λi, αi >= 0 of min over w, b, ξi of
(w·w)/2 + C Σξi - Σi λi (yi(w·xi + b) + ξi - 1) - Σi αi (ξi - 0)
Support Vector Machines: Can Lagrangian Primal = Lagrangian Dual?
Proof (continued): for the two sums above to vanish, it suffices to require
ξi* > 0 implies αi = 0
yi(w*·xi + b*) + ξi* - 1 != 0 implies λi = 0
(complementary slackness).
Support Vector Machines: Can Lagrangian Primal = Lagrangian Dual?
Proof (continued): for minimizing over w, b, ξi to give w*, b*, ξi*, we need, at w*, b*, ξi*,
∂/∂wj = 0, ∂/∂ξi = 0, ∂/∂b = 0
and the second derivatives must be non-negative everywhere.
Support Vector Machines: Can Lagrangian Primal = Lagrangian Dual?
Proof (continued): setting the derivatives of
(w·w)/2 + C Σξi - Σi λi (yi(w·xi + b) + ξi - 1) - Σi αi (ξi - 0)
to zero at w*, b*, ξi* gives
w* - Σi λi yi xi = 0 (from ∂/∂w)
-Σi λi yi = 0 (from ∂/∂b)
-λi - αi + C = 0 (from ∂/∂ξi)
and the second derivatives are always non-negative.
Support Vector Machines: Can Lagrangian Primal = Lagrangian Dual?
Proof (continued): consider w*, b*, ξi* optimal for the primal. We need λi, αi >= 0 such that
ξi* > 0 implies αi = 0
yi(w*·xi + b*) + ξi* - 1 != 0 implies λi = 0
w* - Σi λi yi xi = 0
-Σi λi yi = 0
-λi - αi + C = 0
Such λi, αi >= 0 always exist!
Support Vector Machines: Proof that appropriate Lagrange multipliers always exist
Roll all primal variables into w and all Lagrange multipliers into λ:
Primal: min f(w) over w, subject to Xw >= y
Lagrangian Dual: max over λ >= 0 of min over w of f(w) - λ(Xw - y)
Lagrangian Primal: min over w of max over λ >= 0 of f(w) - λ(Xw - y)
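A tiny worked instance of this generic form (the numbers are an assumption for illustration, not from the slides): take f(w) = w², X = [1], y = [1], i.e. minimize w² subject to w >= 1. The inner minimization of w² - λ(w - 1) is attained at w = λ/2, giving the dual function g(λ) = λ - λ²/4.

```python
import numpy as np

# Toy instance: min f(w) = w^2 subject to w >= 1.
# Dual function: g(lam) = min_w  w^2 - lam*(w - 1) = lam - lam^2 / 4
# (inner minimum at w = lam / 2).
lam = np.linspace(0.0, 4.0, 401)
g = lam - lam**2 / 4
dual_opt = g.max()          # maximized at lam = 2, giving value 1
primal_opt = 1.0            # w* = 1 is feasible and optimal, f(w*) = 1
```

Here the dual optimum equals the primal optimum, so the appropriate multiplier (λ* = 2) exists and closes the duality gap.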
Support Vector Machines: Proof that appropriate Lagrange multipliers always exist
[Figure: the constraint system Xw* >= y, with tight rows (Xi w* = yi) and slack rows (Xi w* > yi).]
At the primal optimum w*, we need λ >= 0 with λX = Grad(f) at w*, where λi = 0 for every slack constraint; only the tight constraints may receive λi > 0.
Claim: this is satisfiable.
Support Vector Machines: Proof that appropriate Lagrange multipliers always exist
[Figure: Grad(f) drawn inside the cone spanned by the row vectors of X.]
The condition λX = Grad(f) with λ >= 0 says exactly that Grad(f) lies in the cone generated by the row vectors of X (restricted to the tight constraints).
Claim: this is satisfiable.
Support Vector Machines: Proof that appropriate Lagrange multipliers always exist
Suppose not, i.e. Grad(f) at w* lies outside the cone of the tight row vectors of X. Then (by Farkas' lemma) there is a direction h with
Xh >= 0 and Grad(f)·h < 0
But then w* + h is feasible and f(w* + h) < f(w*) for small enough h, contradicting the optimality of w*. So the claimed λ >= 0 exists.
Support Vector Machines: Finally, the Lagrangian Dual
Start from
max over λi, αi >= 0 of min over w, b, ξi of
(w·w)/2 + C Σξi - Σi λi (yi(w·xi + b) + ξi - 1) - Σi αi (ξi - 0)
and substitute the stationarity conditions
w - Σi λi yi xi = 0
-Σi λi yi = 0
-λi - αi + C = 0
Rewriting in final dual form:
maximize Σλi - (ΣiΣj λiλjyiyj (xi·xj))/2 over λi
subject to Σλi yi = 0, λi >= 0, -λi >= -C
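A minimal worked example of this final dual (the toy data and C value are assumptions for illustration, not from the slides): two 1-D points, x1 = +1 with y1 = +1 and x2 = -1 with y2 = -1. The constraint Σλi yi = 0 forces λ1 = λ2 = λ, so the dual objective can be scanned in a single variable.

```python
import numpy as np

# Two 1-D points: x1 = +1 (y = +1), x2 = -1 (y = -1); C = 10.
x = np.array([+1.0, -1.0])
y = np.array([+1.0, -1.0])
Q = np.outer(y, y) * np.outer(x, x)       # Q_ij = y_i y_j (x_i . x_j)

lam_grid = np.linspace(0.0, 10.0, 10001)  # respects 0 <= lam <= C
obj = 2 * lam_grid - 0.5 * Q.sum() * lam_grid**2  # sum(lam) - lam^T Q lam / 2
lam = lam_grid[obj.argmax()]              # dual optimum: lam = 1/2
w = np.sum(lam * y * x)                   # stationarity: w = sum(lam_i y_i x_i)
```

The dual optimum (λ = 1/2) recovers w = 1, and the dual objective value matches the primal objective (w·w)/2, as the duality argument promises.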
Support Vector Machines: Karush-Kuhn-Tucker Conditions
The final dual form:
maximize Σλi - (ΣiΣj λiλjyiyj (xi·xj))/2 over λi
subject to Σλi yi = 0, λi >= 0, -λi >= -C
At the optimum:
Σi λi (yi(w*·xi + b*) + ξi* - 1) = 0
Σi αi (ξi* - 0) = 0
-λi - αi + C = 0
Consequences:
If ξi* > 0, then αi = 0 and λi = C.
If yi(w*·xi + b*) + ξi* - 1 > 0, then λi = 0 and ξi* = 0.
If 0 < λi < C, then yi(w*·xi + b*) = 1.
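The three consequences above let us classify every training point by its multiplier alone. A small illustrative helper (the function name and thresholds are hypothetical, not from the slides):

```python
# Read off a point's role from its optimal multiplier lam_i via the
# KKT consequences above (tol absorbs numerical noise).
def kkt_status(lam, C, tol=1e-8):
    if lam < tol:
        return "non-SV"      # lam_i = 0: y_i(w* x_i + b*) >= 1, xi_i* = 0
    if lam > C - tol:
        return "bound SV"    # lam_i = C: xi_i* may be > 0 (inside the margin)
    return "margin SV"       # 0 < lam_i < C: y_i(w* x_i + b*) = 1 exactly

C = 1.0
statuses = [kkt_status(l, C) for l in (0.0, 0.3, 1.0)]
```

Only the points with λi > 0 (the support vectors) contribute to w* = Σ λi yi xi, which is why the classifier depends on them alone.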