![Page 2: Convex Optimization & Machine Learning Introduction to …marcocuturi.net/Teaching/KU/2011/COML/Lec1-COML.pdf · 2016-10-04 · Solving optimization problems general optimization](https://reader033.vdocuments.us/reader033/viewer/2022050421/5f90415041d7676ea10663e2/html5/thumbnails/2.jpg)
Why do we need optimization in machine learning
• We want to find the best possible decision w.r.t. a problem
• In a supervised setting for instance, we want to learn a map X → Y
• We consider a set of candidates F for such a decision
[Figure: a map f : X → Y selected from the candidate set F]
CO&ML 2
• Quantify how well a candidate function in F fits the database
◦ define a data-dependent criterion $C_{\text{data}}$
◦ Typically, given a function f, $C_{\text{data}}(f)$ is large when f is not accurate on the data.
• Given both F and $C_{\text{data}}$, a method to find an optimal candidate is to solve
$\min_{f \in F} C_{\text{data}}(f)$.
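As a minimal sketch of this recipe, the snippet below picks the best map out of a tiny finite candidate set by exhaustive search. The data, the four candidate functions, and the choice of mean absolute error as the criterion are all assumptions made for illustration; none of them come from the slides.

```python
import numpy as np

# Hypothetical database: inputs x and observed outputs y.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.1, 2.9, 4.2])

# A small, finite candidate set F of maps X -> Y (illustrative choice).
candidates = {
    "zero": lambda t: np.zeros_like(t),
    "identity": lambda t: t,
    "double": lambda t: 2.0 * t,
    "square": lambda t: t ** 2,
}

def c_data(f):
    """Data-dependent criterion: mean absolute error of f on the database."""
    return float(np.mean(np.abs(f(x) - y)))

# min over f in F of C_data(f), solved here by trying every candidate.
scores = {name: c_data(f) for name, f in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores[best])
```

Exhaustive search only works because F is finite here; for richer candidate sets the minimization itself becomes the hard part.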
Quiz: Regression
[Figure: scatter plot of Rent (× 10,000 JPY) vs. Surface for apartments, with polynomial fits]
• linear: y = 0.13x + 2.4
• quadratic: y = −0.0033x² + 0.35x − 0.71
• cubic: y = 6.2e−05x³ − 0.01x² + 0.59x − 3.1
• 4th degree: y = 1.4e−06x⁴ − 0.00016x³ + 0.0016x² + 0.33x − 1.2
• 5th degree: y = −1.8e−07x⁵ + 3.7e−05x⁴ − 0.0028x³ + 0.092x² − 1.1x + 7.1
Does least-squares regression fall into this approach?
1. Yes 2. No
Quiz: k-nearest neighbors
Does k-nearest neighbors fall into this approach?
1. Yes 2. No
Quiz: SVM
Does the SVM fall into this approach?
1. Yes 2. No
What is optimization?
• A general formulation of an optimization problem consists of defining
◦ unknown variables $(x_1, x_2, \dots, x_n) \in \mathcal{X}_1 \times \mathcal{X}_2 \times \dots \times \mathcal{X}_n$, and solving
minimize (or maximize) $f(x_1, x_2, \dots, x_n)$,
subject to $f_i(x_1, x_2, \dots, x_n) \;\{<, >, =, \le, \ge\}\; b_i, \quad i = 1, 2, \dots, m$.
• where
◦ the $b_i \in \mathbb{R}$
◦ $f$ (objective) and $f_1, f_2, \dots, f_m$ (constraints) are functions $\mathcal{X}_1 \times \mathcal{X}_2 \times \dots \times \mathcal{X}_n \to \mathbb{R}$
• the sets $\mathcal{X}_i$ need not be the same, as $\mathcal{X}_i$ might be
◦ $\mathbb{R}$, scalar numbers,
◦ $\mathbb{Z}$, integers,
◦ $S_n^+$, positive definite matrices,
◦ strings of letters,
◦ etc.
• When the $\mathcal{X}_i$ differ, the problem is usually qualified as mixed (as in mixed-integer programming).
Optimization & Mathematical Programming
• Optimization is a field of applied mathematics in its own right.
• Also called Mathematical Programming.
Mathematical Programming is not about programming code for mathematics!
• George Dantzig, who proposed the “first” optimization algorithm, explains:
◦ The military refer to their various plans or proposed schedules of
training, logistical supply and deployment of combat units as a program.
When I first analyzed the Air Force planning problem and saw that it
could be formulated as a system of linear inequalities, I called my paper
Programming in a Linear Structure. Note that the term program was used
for linear programs long before it was used as the set of instructions
used by a computer. In the early days, these instructions were called
codes.
Mathematical Programming
◦ In the summer of 1948, Koopmans and I visited the Rand Corporation. One
day we took a stroll along the Santa Monica beach. Koopmans said: Why
not shorten Programming in a Linear Structure to Linear Programming? I
replied: That's it! From now on that will be its name. Later that day I
gave a talk at Rand, entitled Linear Programming; years later Tucker
shortened it to Linear Program.
◦ The term Mathematical Programming is due to Robert Dorfman of Harvard,
who felt as early as 1949 that the term Linear Programming was too
restrictive.
• Today mathematical programming = optimization. A relatively new discipline
◦ What seems to characterize the pre-1947 era was lack of any interest
in trying to optimize. T. Motzkin in his scholarly thesis written in
1936 cites only 42 papers on linear inequality systems, none of which
mentioned an objective function.
Before we move on to some reminders
• Keep in mind that optimization is hard, very hard in general.
• In 60 years, we have gone from nothing to quite a few successes.
• But always keep in mind that most problems are intractable.
• For some particular problems there is hope: CONVEX problems
Evaluation = Programming Assignments
Reminders: Convex set
line segment between x1 and x2: all points
{x = λx1 + (1− λ)x2, 0 ≤ λ ≤ 1}
convex set: contains line segment between any two points in the set
C is convex ⇔ ∀x1, x2 ∈ C, 0 ≤ λ ≤ 1; λx1 + (1− λ)x2 ∈ C
examples (one convex, two nonconvex sets)
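The definition can be probed numerically, as in the sketch below: sample points along the segment between two members of a set and test whether they all remain inside. The two sets (a unit disk and a ring) and the helper name `segment_stays_inside` are illustrative assumptions, not the examples drawn on the slide.

```python
import numpy as np

def segment_stays_inside(contains, x1, x2, num=101):
    """Check on a grid of lambdas whether lam*x1 + (1-lam)*x2 stays in the set."""
    lams = np.linspace(0.0, 1.0, num)
    return all(contains(lam * x1 + (1.0 - lam) * x2) for lam in lams)

# Two illustrative sets in R^2.
in_disk = lambda x: np.linalg.norm(x) <= 1.0             # unit disk: convex
in_annulus = lambda x: 0.5 <= np.linalg.norm(x) <= 1.0   # ring: not convex

x1 = np.array([-0.8, 0.0])
x2 = np.array([0.8, 0.0])

disk_ok = segment_stays_inside(in_disk, x1, x2)
annulus_ok = segment_stays_inside(in_annulus, x1, x2)  # segment crosses the hole
print(disk_ok, annulus_ok)
```

A sampled check like this can only refute convexity (one escaping point is enough); it cannot prove it, since the definition quantifies over all pairs of points and all λ.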
Convex combination and convex hull
convex combination of x1,. . . , xk: any point x of the form
x = λ1x1 + λ2x2 + · · ·+ λkxk
with λ1 + · · ·+ λk = 1, λi ≥ 0
convex hull 〈S〉: set of all convex combinations of points in S
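A small sketch of a convex combination in code, assuming points are stored as rows of an array. The function name `convex_combination` and the triangle used as input are hypothetical choices for illustration.

```python
import numpy as np

def convex_combination(points, weights):
    """Return sum_i weights[i] * points[i], requiring weights >= 0 that sum to 1."""
    points = np.asarray(points, dtype=float)    # shape (k, d)
    weights = np.asarray(weights, dtype=float)  # shape (k,)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to 1")
    return weights @ points

# Vertices of a triangle; its convex hull is the filled triangle.
triangle = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
centroid = convex_combination(triangle, [1 / 3, 1 / 3, 1 / 3])
print(centroid)
```

Sweeping the weights over all admissible values traces out exactly the convex hull of the three vertices.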
Convex cone
conic (nonnegative) combination of x1 and x2: any point of the form
x = λ1x1 + λ2x2
with λ1 ≥ 0, λ2 ≥ 0
[Figure: the set of conic combinations of x1 and x2, a cone with apex at 0]
convex cone: set that contains all conic combinations of points in the set
Hyperplanes and halfspaces
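hyperplane: set of the form $\{x \mid a^T x = b\}$ ($a \neq 0$)
[Figure: hyperplane $a^T x = b$ through a point $x_0$, with normal vector $a$]
halfspace: set of the form $\{x \mid a^T x \le b\}$ ($a \neq 0$)
[Figure: the hyperplane through $x_0$ separates the halfspaces $a^T x \ge b$ and $a^T x \le b$]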
• a is the normal vector
• hyperplanes are affine and convex; halfspaces are convex
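To make the role of the normal vector concrete, the sketch below takes two points on a hyperplane and checks that a is orthogonal to the direction joining them, then tests halfspace membership for a midpoint. The numerical values of a, b and the points are arbitrary assumptions chosen so the arithmetic is exact.

```python
import numpy as np

a = np.array([1.0, 2.0])   # normal vector (illustrative values)
b = 4.0

# Two points on the hyperplane {x | a^T x = b}: both give a^T x = 4.
x0 = np.array([4.0, 0.0])
x = np.array([0.0, 2.0])

# a is orthogonal to any direction lying inside the hyperplane.
direction_dot = float(a @ (x - x0))

def in_halfspace(point):
    """Membership test for the halfspace {x | a^T x <= b}."""
    return float(a @ point) <= b

# Convexity in action: the midpoint of two members is again a member.
p, q = np.array([0.0, 0.0]), np.array([2.0, 1.0])
midpoint_inside = in_halfspace(0.5 * (p + q))
print(direction_dot, midpoint_inside)
```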
Norm balls and norm cones
norm: a function ‖·‖ that satisfies
• ‖x‖ ≥ 0; ‖x‖ = 0 if and only if x = 0
• ‖tx‖ = |t| ‖x‖ for t ∈ R
• ‖x+ y‖ ≤ ‖x‖+ ‖y‖
notation: ‖·‖ is general (unspecified) norm; ‖·‖symb is particular norm
norm ball with center xc and radius r: {x | ‖x− xc‖ ≤ r}
norm cone: $\{(x, t) \mid \|x\| \le t\}$
Euclidean norm cone is called second-order cone
[Figure: second-order cone in $(x_1, x_2, t)$]
norm balls and norm cones are convex
Usual norms for vectors in $\mathbb{R}^d$
• l2 norm: $\|x\|_2 = \sqrt{x^T x} = \sqrt{\sum_{i=1}^{d} x_i^2}$
• l1 norm: $\|x\|_1 = \sum_{i=1}^{d} |x_i|$
• lp norm: $\|x\|_p = \left( \sum_{i=1}^{d} |x_i|^p \right)^{1/p}$
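The three formulas share one implementation, sketched below with NumPy. The helper name `lp_norm` and the test vector are assumptions for illustration; `np.linalg.norm` with its `ord` argument is used only as a cross-check.

```python
import numpy as np

def lp_norm(x, p):
    """Compute (sum_i |x_i|^p)^(1/p) directly from the definition."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

x = np.array([3.0, -4.0])
l1 = lp_norm(x, 1)   # |3| + |-4|
l2 = lp_norm(x, 2)   # sqrt(9 + 16)
l3 = lp_norm(x, 3)

# Cross-check against NumPy's built-in vector norms.
reference = [np.linalg.norm(x, ord=p) for p in (1, 2, 3)]
print(l1, l2, l3, reference)
```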
Quiz: lp norms
The unit ball of the lp norm is {x | ‖x‖p ≤ 1}
The unit ball of the lp norm is convex.
1. True 2. False
Polyhedra
solution set of finitely many linear inequalities and equalities
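$Ax \preceq b$, $Cx = d$
($A \in \mathbb{R}^{m \times n}$, $C \in \mathbb{R}^{p \times n}$, $\preceq$ is componentwise inequality)
[Figure: polyhedron $P$ cut out by halfspaces with normal vectors $a_1, \dots, a_5$]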
polyhedron is intersection of finite number of halfspaces and hyperplanes
Positive semidefinite cone
notation:
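• $S^n$ is the set of symmetric $n \times n$ matrices
• $S^n_+ = \{X \in S^n \mid X \succeq 0\}$: positive semidefinite $n \times n$ matrices
$X \in S^n_+ \iff z^T X z \ge 0$ for all $z$
$S^n_+$ is a convex cone
• $S^n_{++} = \{X \in S^n \mid X \succ 0\}$: positive definite $n \times n$ matrices
example: $\begin{bmatrix} x & y \\ y & z \end{bmatrix} \in S^2_+$
[Figure: boundary of $S^2_+$ plotted in $(x, y, z)$]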
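Membership in the positive semidefinite cone can be tested through eigenvalues, as in this sketch. The function names `is_psd` and `in_s2_plus`, the tolerance, and the sample matrices are assumptions for illustration. The 2×2 test uses the standard characterization x ≥ 0, z ≥ 0, xz ≥ y², which is not stated on the slide.

```python
import numpy as np

def is_psd(X, tol=1e-10):
    """True if X is symmetric with all eigenvalues >= -tol."""
    X = np.asarray(X, dtype=float)
    if not np.allclose(X, X.T):
        return False
    return bool(np.min(np.linalg.eigvalsh(X)) >= -tol)

def in_s2_plus(x, y, z):
    """2x2 case: [[x, y], [y, z]] is PSD iff x >= 0, z >= 0 and x*z >= y**2."""
    return x >= 0 and z >= 0 and x * z >= y ** 2

A = np.array([[1.0, 0.5], [0.5, 1.0]])   # eigenvalues 0.5 and 1.5
B = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues -1 and 3

# Cone property: nonnegative combinations of PSD matrices stay PSD.
C = 2.0 * A + 3.0 * np.array([[1.0, 0.0], [0.0, 0.0]])
print(is_psd(A), is_psd(B), is_psd(C))
```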
Euclidean balls and ellipsoids
(Euclidean) ball with center xc and radius r:
$B(x_c, r) = \{x \mid \|x - x_c\|_2 \le r\} = \{x_c + r u \mid \|u\|_2 \le 1\}$
ellipsoid: set of the form
$\{x \mid (x - x_c)^T P^{-1} (x - x_c) \le 1\}$
with $P \in S^n_{++}$ (i.e., $P$ symmetric positive definite)
[Figure: ellipsoid centered at $x_c$]
other representation: $\{x_c + A u \mid \|u\|_2 \le 1\}$ with $A$ square and nonsingular
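One way to connect the two representations is to take A as the symmetric square root of P; this choice is an assumption made here (the slide only requires A square and nonsingular). The sketch below builds points as x_c + Au with ‖u‖₂ ≤ 1 and checks that they satisfy the quadratic-form description. The matrix P, the center, and the random sample are illustrative.

```python
import numpy as np

P = np.array([[4.0, 1.0], [1.0, 2.0]])   # symmetric positive definite (illustrative)
xc = np.array([1.0, -1.0])

# Symmetric square root A = P^{1/2} from the eigendecomposition of P.
w, V = np.linalg.eigh(P)
A = V @ np.diag(np.sqrt(w)) @ V.T

# Sample points u in the unit Euclidean ball (vectors outside are rescaled to norm 1).
rng = np.random.default_rng(0)
u = rng.normal(size=(200, 2))
u /= np.maximum(1.0, np.linalg.norm(u, axis=1, keepdims=True))

# Map them through x = xc + A u and evaluate (x - xc)^T P^{-1} (x - xc).
x = xc + u @ A.T
d = x - xc
q = np.einsum("ij,jk,ik->i", d, np.linalg.inv(P), d)
print(q.max())
```

With this A, the quadratic form reduces to ‖u‖₂², so every mapped point lands inside the ellipsoid.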
Optimization problems
minimize $f_0(x)$
subject to $f_i(x) \le b_i$, $i = 1, \dots, m$
• $x = (x_1, \dots, x_n)$: optimization variables
• $f_0 : \mathbb{R}^n \to \mathbb{R}$: objective function
• $f_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \dots, m$: constraint functions
optimal solution $x^\star$ has smallest value of $f_0$ among all vectors that satisfy the constraints
Quiz
minimize $f_0(x)$
subject to $f_i(x) \le b_i$, $i = 1, \dots, m$
If an optimization problem has an optimal solution x⋆, this solution is unique.
1. True 2. False
Examples
portfolio optimization
• variables: amounts invested in different assets
• constraints: budget, max./min. investment per asset, minimum return
• objective: overall risk or return variance
device sizing in electronic circuits
• variables: device widths and lengths
• constraints: manufacturing limits, timing requirements, maximum area
• objective: power consumption
data fitting
• variables: model parameters
• constraints: prior information, parameter limits
• objective: measure of misfit or prediction error
Solving optimization problems
general optimization problem
• very difficult to solve
• methods involve some compromise, e.g., very long computation time, or not always finding the solution
exceptions: certain problem classes can be solved efficiently and reliably
• least-squares problems
• linear programming problems
• convex optimization problems
Least-squares
minimize $\|Ax - b\|_2^2$
solving least-squares problems
• analytical solution: $x^\star = (A^T A)^{-1} A^T b$
• reliable and efficient algorithms and software
• computation time proportional to $n^2 k$ ($A \in \mathbb{R}^{k \times n}$); less if structured
• a mature technology
using least-squares
• least-squares problems are easy to recognize
• a few standard techniques increase flexibility (e.g., including weights, adding regularization terms)
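A short sketch of both routes to the least-squares solution: the analytical formula via the normal equations, and NumPy's `np.linalg.lstsq`. The synthetic data, the dimensions, and the noise level are assumptions for illustration, and the formula presumes A has full column rank.

```python
import numpy as np

# Synthetic problem with A in R^{k x n} (illustrative sizes and data).
rng = np.random.default_rng(0)
k, n = 50, 3
A = rng.normal(size=(k, n))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.01 * rng.normal(size=k)

# Analytical solution x* = (A^T A)^{-1} A^T b, computed by solving the
# normal equations instead of forming the inverse explicitly.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Library routine, generally preferable numerically.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# At the minimizer the gradient 2 A^T (A x - b) vanishes.
gradient = 2.0 * A.T @ (A @ x_normal - b)
print(x_normal, x_lstsq)
```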
Linear programming
minimize $c^T x$
subject to $a_i^T x \le b_i$, $i = 1, \dots, m$
solving linear programs
• no analytical formula for solution
• reliable and efficient algorithms and software
• computation time proportional to $n^2 m$ if $m \ge n$; less with structure
• a mature technology
using linear programming
• not as easy to recognize as least-squares problems
• a few standard tricks used to convert problems into linear programs (e.g., problems involving $\ell_1$- or $\ell_\infty$-norms, piecewise-linear functions)
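As one instance of such a trick, an $\ell_\infty$-norm approximation problem can be rewritten as a linear program by introducing an auxiliary scalar bound. The symbols $M$ (with rows $m_i^T$), $y$, and $t$ below are notation introduced here for illustration and do not appear in the slides.

```latex
\begin{aligned}
&\underset{x}{\text{minimize}} \quad \|Mx - y\|_\infty = \max_{i=1,\dots,k} \, |m_i^T x - y_i| \\[4pt]
\Longleftrightarrow \quad
&\underset{x,\,t}{\text{minimize}} \quad t \\
&\text{subject to} \quad \phantom{-}m_i^T x - t \le \phantom{-}y_i, \quad i = 1,\dots,k, \\
&\phantom{\text{subject to}} \quad -m_i^T x - t \le -y_i, \quad i = 1,\dots,k.
\end{aligned}
```

The two families of constraints together say $|m_i^T x - y_i| \le t$ for every $i$, so minimizing $t$ minimizes the largest residual. The objective and all constraints are linear in the stacked variable $(x, t)$, which matches the linear-program form above.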
Convex optimization problem
minimize $f_0(x)$
subject to $f_i(x) \le b_i$, $i = 1, \dots, m$
• objective and constraint functions are convex:
$f_i(\alpha x + \beta y) \le \alpha f_i(x) + \beta f_i(y)$
if $\alpha + \beta = 1$, $\alpha \ge 0$, $\beta \ge 0$
• includes least-squares problems and linear programs as special cases
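The convexity inequality can be spot-checked numerically for a least-squares objective, as sketched below. The random data, the tolerance, and the number of trials are assumptions for illustration; a sampled check is a sanity test, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 4))
b = rng.normal(size=20)

def f(x):
    """Least-squares objective ||Ax - b||_2^2."""
    return float(np.sum((A @ x - b) ** 2))

# Test f(alpha x + beta y) <= alpha f(x) + beta f(y) on random draws.
violations = 0
for _ in range(1000):
    x, y = rng.normal(size=4), rng.normal(size=4)
    alpha = rng.uniform()        # alpha in [0, 1)
    beta = 1.0 - alpha           # so alpha + beta = 1, both nonnegative
    lhs = f(alpha * x + beta * y)
    rhs = alpha * f(x) + beta * f(y)
    if lhs > rhs + 1e-9:         # small slack for floating-point rounding
        violations += 1
print(violations)
```

For this objective the gap rhs − lhs equals αβ‖A(x − y)‖₂², which is never negative, so no violations are expected.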
solving convex optimization problems
• no analytical solution
• reliable and efficient algorithms
• computation time (roughly) proportional to $\max\{n^3, n^2 m, F\}$, where $F$ is the cost of evaluating the $f_i$'s and their first and second derivatives
• almost a technology
using convex optimization
• often difficult to recognize
• many tricks for transforming problems into convex form
• surprisingly many problems can be solved via convex optimization