Advanced Methods for Sequence Analysis
G. Rätsch 1, C.S. Ong 1,2 and P. Philips 1
1 Friedrich Miescher Laboratory, Tübingen
2 Max Planck Institute for Biological Cybernetics, Tübingen
Lecture, Winter Semester 2006/2007, Eberhard Karls Universität Tübingen
10 January 2007
http://www.fml.mpg.de/raetsch/lectures/amsa
Convex Optimization
G. Rätsch, C.S. Ong and P. Philips: Advanced Methods for Sequence Analysis, Page 2
Machine Learning as Numerical Optimization
Unconstrained Optimization
Gradient Descent
Newton Method
Constrained Optimization
Some History
Convex Functions and Convex Sets
Common Problem Formulations
KKT conditions
http://www.stanford.edu/~boyd/cvxbook/
Recall the SVM
1. How are examples represented?
By the kernel matrix K.
2. How are labels represented?
By the label vector y.
3. What are the inputs to the SVM?
K, y
4. What does SVM training output?
α
Recall: Representer Theorem
Kα = y
Linear Equations
Motivation: Kα = y
Linear Algebra Golub and van Loan [1996]
Gaussian Elimination
LU Factorization
QR Factorization
Eigenvalue methods
Lanczos methods
Conjugate gradient methods
Variational Formulation
Motivation: Minimize a scalar objective instead of solving a matrix equation.
Objective
min_α (1/2) αᵀKα − yᵀα
The gradient at optimality
Kα − y = 0
gives the original problem.
Observations
The second derivative of the objective is K. K is positive semidefinite, hence the problem is convex.
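A minimal numpy/scipy sketch of this equivalence (the matrix and labels are made up for illustration): because the gradient of the quadratic is Kα − y, a conjugate gradient minimizer and a direct linear solve agree.

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
K = X @ X.T + 0.1 * np.eye(20)   # a PSD, kernel-like matrix (illustrative)
y = rng.standard_normal(20)

# The gradient of (1/2) a^T K a - y^T a is K a - y, so the minimizer of
# the quadratic and the solution of the linear system K a = y coincide.
alpha_cg, info = cg(K, y, maxiter=1000)
alpha_solve = np.linalg.solve(K, y)
print(np.allclose(alpha_cg, alpha_solve, atol=1e-3))  # True
```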
Gradient Descent
Gradient
For a multivariate function f : Rⁿ → R, define the gradient of f to be
∇f(x) = (∂f(x)/∂x1, ∂f(x)/∂x2, …, ∂f(x)/∂xn).
Algorithm (Schölkopf and Smola [2002])
For an initial value x0 and precision ε:
k = 0
while ‖∇f(xk)‖ > ε:
    compute g = ∇f(xk)
    perform a line search on f(xk − γg) for the optimal γ
    xk+1 = xk − γg
    k = k + 1
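The algorithm above can be sketched in a few lines of numpy; here a backtracking (Armijo) search stands in for the exact line search, and the test problem is the quadratic from the variational formulation (values made up for illustration).

```python
import numpy as np

def gradient_descent(f, grad, x0, eps=1e-6, max_iter=10_000):
    """Minimize f by steepest descent; the step size gamma is chosen by
    backtracking (a simple stand-in for an exact line search)."""
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        gamma = 1.0
        # Halve gamma until the step gives a sufficient decrease in f.
        while f(x - gamma * g) > f(x) - 0.5 * gamma * (g @ g):
            gamma *= 0.5
        x = x - gamma * g
    return x

# Quadratic test problem: the minimizer of (1/2) x^T K x - y^T x solves K x = y.
K = np.array([[3.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 1.0])
x_star = gradient_descent(lambda x: 0.5 * x @ K @ x - y @ x,
                          lambda x: K @ x - y,
                          np.zeros(2))
print(np.allclose(x_star, np.linalg.solve(K, y), atol=1e-4))  # True
```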
Newton Method (1)
Hessian
For a multivariate function f : Rⁿ → R,
∇²f(x)ij = ∂²f(x)/∂xi∂xj, i, j = 1, …, n.
Motivation
The second-order Taylor approximation f̂ of f at x is
f̂(x + v) = f(x) + ∇f(x)ᵀv + (1/2) vᵀ∇²f(x)v,
which is a convex quadratic function of v, with minimizer
v* = −∇²f(x)⁻¹∇f(x).
Newton step: The vector v* above is called the Newton step for f at x.
Newton Method (2)
Algorithm
For an initial value x0 and precision ε:
k = 0
while ‖∇f(xk)‖ > ε:
    xk+1 = xk − ∇²f(xk)⁻¹∇f(xk)
    k = k + 1
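A sketch of the Newton iteration on a smooth convex test function (chosen for illustration); in practice one solves the linear system ∇²f(x) v = −∇f(x) rather than forming the inverse.

```python
import numpy as np

def newton(f, grad, hess, x0, eps=1e-8, max_iter=100):
    """Newton's method: at each step solve hess(x) v = -grad(x)
    instead of computing the matrix inverse explicitly."""
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        x = x + np.linalg.solve(hess(x), -g)  # Newton step v*
    return x

# Smooth convex test function: f(x) = exp(x1 + x2) + x1^2 + x2^2.
f = lambda x: np.exp(x[0] + x[1]) + x[0]**2 + x[1]**2
grad = lambda x: np.exp(x[0] + x[1]) + 2 * x          # [e^{x1+x2} + 2x1, e^{x1+x2} + 2x2]
hess = lambda x: np.exp(x[0] + x[1]) * np.ones((2, 2)) + 2 * np.eye(2)
x_star = newton(f, grad, hess, np.zeros(2))
print(np.linalg.norm(grad(x_star)) < 1e-8)  # True
```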
Non-smooth functions
Motivation
Non-differentiable functions f have to be treated carefully. An example is the hinge loss
ℓ(f(xi), yi) := max{0, 1 − yi f(xi)}.
Piecewise differentiable
For piecewise differentiable functions, we can treat each piece individually and then apply the methods above.
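The piecewise treatment of the hinge loss can be made concrete: away from the kink at margin 1 each piece is differentiable, and at the kink any value in [−y, 0] is a valid subgradient (this small helper is illustrative, not part of the lecture).

```python
def hinge(fx, y):
    """Hinge loss max{0, 1 - y f(x)} and one valid subgradient w.r.t. f(x)."""
    margin = y * fx
    loss = max(0.0, 1.0 - margin)
    # Piecewise: derivative is -y where the margin is violated, 0 where the
    # loss is flat; at the kink (margin == 1) any value in [-y, 0] works.
    sub = -y if margin < 1.0 else 0.0
    return loss, sub

print(hinge(0.5, 1))   # (0.5, -1)
print(hinge(2.0, 1))   # (0.0, 0.0)
```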
Constrained Optimization
min_x f0(x)
subject to fi(x) ≤ 0 for all i
           gj(x) = 0 for all j

x ∈ Rⁿ is the optimization variable
f0 : Rⁿ → R is the objective or cost function
fi : Rⁿ → R are the inequality constraint functions
gj : Rⁿ → R are the equality constraint functions
Some History
Theory (Convex Analysis) ca. 1900–1970
Algorithms
1947: simplex algorithm for linear programming (Dantzig)
1960s: early interior-point methods (Fiacco & McCormick, Dikin, …)
1970s: ellipsoid method and other subgradient methods
1980s: polynomial-time interior-point methods for linear programming (Karmarkar 1984)
late 1980s to now: polynomial-time interior-point methods for nonlinear convex optimization (Nesterov & Nemirovski 1994)
Convex Set
line segment between x1 and x2: all points
x = θx1 + (1 − θ)x2
with 0 ≤ θ ≤ 1.
convex set: a set that contains the line segment between any two of its points:
x1, x2 ∈ C, 0 ≤ θ ≤ 1 ⟹ θx1 + (1 − θ)x2 ∈ C.
Examples (one convex, two non-convex sets)
How to check
Use the definition
Show directly that for any two points in C the connecting line segment stays in C.
Apply operations that preserve convexity
Show that C is obtained from simple convex sets (whose convexity we establish by the definition) via:
intersection
affine functions
perspective functions
linear-fractional functions
Convex Function
f : Rⁿ → R is convex if the domain of f is a convex set and
f(θx1 + (1 − θ)x2) ≤ θf(x1) + (1 − θ)f(x2)
for all x1, x2 in the domain of f and 0 ≤ θ ≤ 1.

affine: ax + b on R, for any a, b ∈ R.
affine: aᵀx + b on Rⁿ, for any a ∈ Rⁿ, b ∈ R.
exponential: e^{ax} for any a ∈ R.
powers: xᵃ on R++, for a ≥ 1 or a ≤ 0.
powers of absolute value: |x|ᵃ on R, for a ≥ 1.
negative entropy: x log x on R++.
norms: ‖x‖p = (Σ_{i=1}^n |xi|^p)^{1/p} for p ≥ 1.
How to check (1)
Restrict to a line
f : Rⁿ → R is convex if and only if the function g : R → R,
g(t) = f(x + tv), dom g = {t | x + tv ∈ dom f},
is convex (in t) for any x ∈ dom f, v ∈ Rⁿ.
First-order condition
The first-order approximation of f is a global underestimator: a differentiable f with convex domain is convex if and only if
f(x2) ≥ f(x1) + ∇f(x1)ᵀ(x2 − x1)
for all x1, x2 ∈ dom f.
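The first-order condition can be probed numerically: for a convex function the "tangent gap" f(x2) − f(x1) − ∇f(x1)ᵀ(x2 − x1) is never negative. A small sketch with a made-up convex test function:

```python
import numpy as np

def first_order_gap(f, grad, x1, x2):
    """f(x2) - [f(x1) + grad(x1)^T (x2 - x1)]; nonnegative when f is convex."""
    return f(x2) - (f(x1) + grad(x1) @ (x2 - x1))

f = lambda x: np.sum(x ** 4)          # convex on R^n
grad = lambda x: 4 * x ** 3
rng = np.random.default_rng(0)
gaps = [first_order_gap(f, grad, rng.standard_normal(3), rng.standard_normal(3))
        for _ in range(1000)]
# The tangent plane never overestimates (up to floating-point rounding).
print(min(gaps) >= -1e-9)  # True
```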
How to check (2)
Second-order condition
The Hessian
∇²f(x)ij = ∂²f(x)/∂xi∂xj, i, j = 1, …, n
is positive semidefinite: a twice differentiable f with convex domain is convex if and only if
∇²f(x) ⪰ 0 for all x ∈ dom f.
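A numerical spot-check of the second-order condition: sample points and verify the smallest Hessian eigenvalue is nonnegative. The test function here is log-sum-exp, a standard convex example whose Hessian has the closed form used below (illustrative choice, not from the lecture).

```python
import numpy as np

def hessian_psd(hess, samples, tol=1e-9):
    """Check numerically that the Hessian is PSD at each sample point."""
    return all(np.linalg.eigvalsh(hess(x)).min() >= -tol for x in samples)

# log-sum-exp is convex; its Hessian is diag(p) - p p^T with p = softmax(x).
def lse_hess(x):
    p = np.exp(x - x.max())
    p /= p.sum()
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(0)
print(hessian_psd(lse_hess, rng.standard_normal((100, 4))))  # True
```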
How to check (3)
Show that f is obtained from simple convex functions by operations that preserve convexity:
nonnegative weighted sum
composition with affine function
pointwise maximum and supremum
composition
minimization
perspective
Convex Optimization
Constrained Optimization (generally hard)
min_x f0(x)
subject to fi(x) ≤ 0 for all i
           gj(x) = 0 for all j

Convex Optimization (generally easy)
min_x f0(x)
subject to fi(x) ≤ 0 for all i
           ajᵀx = bj for all j

where f0, f1, …, fm are convex and the equality constraints are affine (Boyd and Vandenberghe [2004]).
Linear Program
min_x cᵀx + d
subject to Gx ≤ h
           Ax = b
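An LP in exactly this form can be handed to an off-the-shelf solver; a small sketch using scipy.optimize.linprog with made-up numbers (the constant d only shifts the optimal value, so it is omitted):

```python
import numpy as np
from scipy.optimize import linprog

# min c^T x  subject to  G x <= h,  A x = b
c = np.array([1.0, 2.0])
G = np.array([[-1.0, 0.0], [0.0, -1.0]])   # -x <= 0 encodes x >= 0
h = np.zeros(2)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# bounds=(None, None): all constraints come from G, h, A, b above.
res = linprog(c, A_ub=G, b_ub=h, A_eq=A, b_eq=b, bounds=(None, None))
print(res.x)  # ~ [1. 0.]: put all weight on the cheaper coordinate
```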
Quadratic Program
min_x (1/2) xᵀPx + qᵀx + r
subject to Gx ≤ h
           Ax = b
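For the special case with equality constraints only, the optimality conditions reduce to one linear system (stationarity Px + q + Aᵀν = 0 together with Ax = b), so no iterative solver is needed. A numpy sketch with made-up numbers:

```python
import numpy as np

# min (1/2) x^T P x + q^T x   subject to   A x = b
P = np.array([[2.0, 0.0], [0.0, 2.0]])
q = np.array([-2.0, -4.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# Stack stationarity and feasibility into one symmetric "KKT" system.
n, p = P.shape[0], A.shape[0]
KKT = np.block([[P, A.T], [A, np.zeros((p, p))]])
rhs = np.concatenate([-q, b])
sol = np.linalg.solve(KKT, rhs)
x, nu = sol[:n], sol[n:]
print(x)  # the minimizer on the line x1 + x2 = 1
```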
Some other programs
Quadratically constrained quadratic program
min_x (1/2) xᵀP0x + q0ᵀx + r0
subject to (1/2) xᵀPi x + qiᵀx + ri ≤ 0 for all i
           Ax = b

Second-order cone program
min_x fᵀx
subject to ‖Ai x + bi‖2 ≤ ciᵀx + di for all i
           Fx = g

Semidefinite program
min_x cᵀx
subject to x1F1 + x2F2 + … + xnFn + G ⪯ 0
           Ax = b
Problem reformulations
equivalent formulations of a problem can lead to very different duals
reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting
Common reformulations
introduce new variables and equality constraints
make explicit constraints implicit and vice versa
transform objective or constraint functions
Equivalent convex problems (1)
Two problems are (informally) equivalent if the solution ofone is readily obtained from the solution of the other, andvice-versa.
Eliminating equality constraints
min_x f0(x)
subject to fi(x) ≤ 0 for all i
           Ax = b
is equivalent to
min_z f0(Fz + x0)
subject to fi(Fz + x0) ≤ 0 for all i
where F and x0 are such that Ax = b ⟺ x = Fz + x0 for some z.
Equivalent convex problems (2)
Introducing equality constraints
min_x f0(A0x + b0)
subject to fi(Ai x + bi) ≤ 0 for all i
is equivalent to
min_{x, yi} f0(y0)
subject to fi(yi) ≤ 0 for all i
           yi = Ai x + bi
Equivalent convex problems (3)
Introducing slack variables
min_x f0(x)
subject to aiᵀx ≤ bi for all i
is equivalent to
min_{x, s} f0(x)
subject to aiᵀx + si = bi for all i
           si ≥ 0
Equivalent convex problems (4)
Epigraph form
min_x f0(x)
subject to fi(x) ≤ 0 for all i
           Ax = b
is equivalent to
min_{x, t} t
subject to f0(x) − t ≤ 0
           fi(x) ≤ 0 for all i
           Ax = b
Equivalent convex problems (5)
Minimizing over some variables
min_{x1, x2} f0(x1, x2)
subject to fi(x1) ≤ 0 for all i
is equivalent to
min_{x1} f̃0(x1)
subject to fi(x1) ≤ 0 for all i
where f̃0(x1) = inf_{x2} f0(x1, x2).
Lagrange Duality
Lagrangian
L : Rⁿ × Rᵐ × Rᵖ → R with
L(x, λ, ν) = f0(x) + Σ_{i=1}^m λi fi(x) + Σ_{j=1}^p νj gj(x).
The Lagrangian is a weighted sum of the objective and constraint functions:
λi is the Lagrange multiplier associated with fi(x) ≤ 0.
νj is the Lagrange multiplier associated with gj(x) = 0.
Dual function
Lagrange dual function
h : Rᵐ × Rᵖ → R,
h(λ, ν) = inf_x L(x, λ, ν).
h is concave.
Lagrange dual problem
max_{λ, ν} h(λ, ν)
subject to λ ≥ 0
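For an equality-constrained QP the infimum over x can be taken in closed form, which makes the dual function easy to evaluate and weak duality easy to observe numerically. A sketch with made-up numbers: L(x, ν) = (1/2)xᵀPx + qᵀx + νᵀ(Ax − b) is minimized at x = −P⁻¹(q + Aᵀν), giving the concave h below.

```python
import numpy as np

# Dual of  min (1/2) x^T P x + q^T x  subject to  A x = b.
P = np.diag([2.0, 2.0])
q = np.array([-2.0, -4.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
Pinv = np.linalg.inv(P)

def h(nu):
    # h(nu) = inf_x L(x, nu), attained at x = -P^{-1}(q + A^T nu).
    w = q + A.T @ nu
    return -0.5 * (w @ Pinv @ w) - nu @ b

# Primal optimum for this instance is x = (0, 1) with value -3.
x_opt = np.array([0.0, 1.0])
primal = 0.5 * x_opt @ P @ x_opt + q @ x_opt

# Weak duality: h(nu) <= primal value for every nu.
vals = [h(np.array([v])) for v in np.linspace(-10, 10, 201)]
print(max(vals) <= primal + 1e-9)  # True
```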
Checking for Optimality
If x, λ, ν satisfy the KKT conditions for a convex problem, then they are optimal. The following four conditions are called the Karush-Kuhn-Tucker (KKT) conditions:
primal constraints: fi(x) ≤ 0, gj(x) = 0
dual constraints: λ ≥ 0
complementary slackness: λi fi(x) = 0
the gradient of the Lagrangian with respect to x vanishes:
∇f0(x) + Σi λi ∇fi(x) + Σj νj ∇gj(x) = 0
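The four conditions can be checked mechanically on a small example (made up for illustration): min x1² + x2² subject to x1 + x2 ≥ 1, i.e. f1(x) = 1 − x1 − x2 ≤ 0, whose optimum is x = (1/2, 1/2) with λ = 1.

```python
import numpy as np

x = np.array([0.5, 0.5])
lam = 1.0

f1 = 1.0 - x[0] - x[1]              # inequality constraint value
grad_f0 = 2 * x                     # gradient of the objective
grad_f1 = np.array([-1.0, -1.0])    # gradient of the constraint

print(f1 <= 1e-12)                                # primal constraint
print(lam >= 0)                                   # dual constraint
print(abs(lam * f1) <= 1e-12)                     # complementary slackness
print(np.allclose(grad_f0 + lam * grad_f1, 0.0))  # stationarity
```

All four print True, certifying optimality for this convex problem.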
Summary
Machine Learning as Numerical Optimization
Unconstrained Optimization
Gradient Descent
Newton Method
Constrained Optimization
Convex Functions and Convex Sets
Common Problem Formulations
KKT conditions
http://www.stanford.edu/~boyd/cvxbook/
References
Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
Gene H. Golub and Charles F. van Loan. Matrix Computations. Johns Hopkins, 3rd edition, 1996.
B. Schölkopf and A.J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.