mixed-integer disciplined convexprogramming
Miles Lubin, Emre Yamangil, Russell Bent, and Juan Pablo VielmaJanuary 7, 2016
MIT & Los Alamos National Laboratory
“Models with integer and binary variables must still obeyall of the same disciplined convex programming rules thatCVX enforces for continuous models. For this reason, we arecalling these models mixed-integer disciplined convexprograms or MIDCPs.“
1
First, mixed-integer linear programming (MILP),
minx
cTx
subject to Ax = b,x ≥ 0,xi ∈ Z, ∀i ∈ I
- Despite NP-Hardness, many problems of practical interest canbe solved to optimality or near optimality
- Algorithms are based on LP relaxations, branch & bound, cuts,preprocessing, heuristics, ...
- 50+ years of commercial investment in developing thesetechniques
2
who uses milp?
http://www.gurobi.com/company/example-customers3
example: unit commitment
Minimize unit-production costs + fixed operating costssubject to:
- Total generation = total demand, hourly over 24h- Generation and transmission limits
◦ If a generator is on, it produces within some interval [l,u],otherwise its production is zero.
- Linear approximation of nonlinear powerflow laws
1% reduction in generation gives billions of dollars per year costreduction. Worth spending time to improve optimality.
4
Mixed-integer convex programming (MICP),
minx
f(x)
subject to gj(x) ≤ 0, ∀jxi ∈ Z, ∀i ∈ I
Objective function f and constraints gj convex
5
Why mixed-integer convex over MILP?
- Risk/uncertainty- Statistical model fitting (Bertsimas and King, 2015)- Model predictive control- Add discrete aspects to anything you already model using CVX
6
what can be modeled using micp?
Disjunctions/unions of compact convex sets (Stubbs and Mehrotra,1999)
{x : f1(x) ≤ 0 or f2(x) ≤ 0}
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
3
7
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
3
x belongs to the convex hull iff ∃ u1,u2, λ1, λ2 such thatx = λ1u1 + λ2u2λ1 + λ2 = 1
λ1 ≥ 0, λ2 ≥ 0f1(u1) ≤ 0f2(u2) ≤ 0
8
x = λ1u1 + λ2u2λ1 + λ2 = 1
λ1 ≥ 0, λ2 ≥ 0f1(u1) ≤ 0f2(u2) ≤ 0
- If we impose λ1, λ2 ∈ {0, 1} then have conditions for the union,but not convex
- Instead, introduce z1 = λ1u1, z2 = λ2u2
9
x = λ1u1 + λ2u2λ1 + λ2 = 1
λ1 ≥ 0, λ2 ≥ 0f1(u1) ≤ 0f2(u2) ≤ 0
- If we impose λ1, λ2 ∈ {0, 1} then have conditions for the union,but not convex
- Instead, introduce z1 = λ1u1, z2 = λ2u2
9
x = z1 + z2λ1 + λ2 = 1
λ1 ≥ 0, λ2 ≥ 0f1(z1/λ1) ≤ 0f2(z2/λ2) ≤ 0
- Still not convex
10
x = z1 + z2λ1 + λ2 = 1
λ1 ≥ 0, λ2 ≥ 0λ1f1(z1/λ1) ≤ 0λ2f2(z2/λ2) ≤ 0λ1, λ2 ∈ {0, 1}
- Perspective function of any convex function is convex.
11
- Which nonconvex sets can be modeled with MICP? Finite unionsof closed, convex sets
Conjecture: the following set is not MICP representable
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
3
12
State of the art for solving MICP
minx
f(x)
subject to gj(x) ≤ 0, ∀jxi ∈ Z, ∀i ∈ I
- f,gj convex, smooth- Oracles for gradients and Hessians are available.- Exploit derivative-based nonlinear programming (NLP) solvers- Bonmin, KNITRO, α-ECP, DICOPT, FilMINT, MINLP_BB, SBB, ...
13
[...] solution algorithms for [MICP] have benefit from thetechnological progress made in solving MILP and NLP.However, in the realm of [MICP], the progress has been farmore modest, and the dimension of solvable [MICP] bycurrent solvers is small when compared to MILPs and NLPs.
– Bonami, Kilinç, and Linderoth (2012)
14
Mixed-integer disciplined convex programming (MIDCP),
minx
f(x)
subject to gj(x) ≤ 0, ∀jxi ∈ Z, ∀i ∈ I
Objective function f and constraints gj disciplined convexForget about derivatives
15
What is a disciplined convex function? (Grant, Boyd, Ye, 2006)
- Expressed as composition of basic atoms. Rules of compositionprove convexity.
- Example√x2 + y2 versus ||[x, y]||2,
√xy versus geomean(x, y).
Why disciplined convex?
- Functions expressed in this form can be automaticallyconverted to conic optimization problems. We have goodsolvers and theory for conic optimization.
16
DCP
minx
f(x)
s. t. gj(x) ≤ 0,∀j
Conic
minx
cTx
s. t. Ax = bx ∈ K
Optimal solution x∗
CVXCVXPYConvex.jl
ECOSSCS
Mosek
17
MIDCP
minx
f(x)
s. t. gj(x) ≤ 0,∀jxi ∈ Z, ∀i ∈ I
MIConic
minx
cTx
s. t. Ax = bx ∈ K
xi ∈ Z, ∀i ∈ I
Optimal solution x∗
CVXCVXPYConvex.jl
???
18
- Mixed-integer second order cone programming (MISOCP) hasrapidly improving commercial support by Gurobi/CPLEX◦ Gurobi claims 2.8x improvement from version 6.0 to 6.5 (1year)
- Otherwise (exponential cones, SDP), mainly research codes(ecos_bb by Han Wang, 2014; SCIP-SDP by Mars, 2012)
- For MISOCP, no established open-source codes
19
naive branch and bound
- Relax integer constraints. Solution provides valid lower boundon optimal value
- Pick a variable i to “branch on”. Divide search space into xi ≤ A,xi ≥ A+ 1. Solve relaxations on each of the two branches.
- If relaxation value on a branch is worse than the best knownsolution, then can prune that branch.
Implemented in ecos_bb
20
facility layout
fo9 facility layout problem
- 182 variables (72 binary), 343 constraints- ecos_bb: Not solved in 11 hours, no feasible solution- Gurobi (MISOCP): 19 minutes to optimal
Our new method (requires MILP solver and conic solver):
- CBC+ECOS: 5.5 hours to optimal- Gurobi (MILP) + ECOS: 8 minutes to optimal
21
facility layout
fo9 facility layout problem
- 182 variables (72 binary), 343 constraints- ecos_bb: Not solved in 11 hours, no feasible solution- Gurobi (MISOCP): 19 minutes to optimal
Our new method (requires MILP solver and conic solver):
- CBC+ECOS: 5.5 hours to optimal- Gurobi (MILP) + ECOS: 8 minutes to optimal
21
How general is MISOCP?MINLPLIB2 benchmark library designed for derivative-basedmixed-integer nonlinear solvers. We classified all of the convexinstances by conic representability.Of the 333 convex instances, 60% are MISOCP representable.
- tls6 instance previously unsolved. We translated it toConvex.jl and it was solved by Gurobi!
22
What about the remaining 40% of problems?Of remaining 129 instances, in 107 the only nonlinear expressionsinvolve exp and log.
Let’s consider the exponential cone:
EXP = cl{(x, y, z) ∈ R3 : y exp(x/y) ≤ z, y > 0}
- Closed, convex, nonsymmetric cone. Theoretically tractable andsupported by ECOS (Akle, 2015).
23
What about the remaining 40% of problems?Of remaining 129 instances, in 107 the only nonlinear expressionsinvolve exp and log.Let’s consider the exponential cone:
EXP = cl{(x, y, z) ∈ R3 : y exp(x/y) ≤ z, y > 0}
- Closed, convex, nonsymmetric cone. Theoretically tractable andsupported by ECOS (Akle, 2015).
23
What about the remaining 22 of 333 instances?
- Seven representable by combination of EXP and second-ordercones (SOC).
- Two representable by the power cone
POWα = {(x, y, z) ∈ R3 : |z| ≤ xay1−a, x ≥ 0, y ≥ 0}
- One family of 13 instances has a univariate convex function notknown to be exactly representable by SOC, EXP, or POW (butcould approximate).
24
Folklore: Almost all convex optimization problems of practicalinterest can be represented as conic programming problems usingsecond-order, positive semidefinite, exponential, and power cones.
25
Folklore: Almost all convex optimization problems of practicalinterest can be represented as conic programming problems usingsecond-order, positive semidefinite, exponential, and power cones.Rest of this talk:
- How should we solve mixed-integer conic problems?- What do we gain (if anything) from the translation to conicform?
26
mixed-integer conic programming
minx,z
cTz
s.t. Axx+ Azz = b (MICONE)L ≤ x ≤ U, x ∈ Zn, z ∈ K,
- K = K1 ×K2 × · · · × Kr, where each Ki is a standard cone:nonnegative orthant, second-order cone, exponential cone,power cone, or even positive semidefinite cone.
- WLOG, integer variables do not belong to cones, have zeroobjective coefficients
27
polyhedral outer approximation
MILP solvers are powerful, can we use them to solve MICONEproblems?
- To do so, need to approximate cone K with finite number oflinear constraints (i.e., polyhedron)
28
dual cones
Given a closed, convex cone K, the dual cone K∗ is the set such thatz ∈ K iff zTβ ≥ 0 ∀β ∈ K∗.
- Nonnegative orthant, second-order and positive semidefinitecone are self dual. Exponential and power cones are not.
- Any finite subset of K∗ yields a polyhedral outer approximationof K!
29
minx,z
cTz
s.t. Axx+ Azz = b (MIOA(T))L ≤ x ≤ U, x ∈ Zn,
βTz ≥ 0∀β ∈ T,
- If T = K∗, then equivalent to MICONE- If T ⊆ K∗ and |T| < ∞, then polyhedral outer approximation →MILP relaxation of MICONE
- Claim: ∃T ⊆ K∗, |T| < ∞ such that MIOA(T) is equivalent toMICONE
30
Consider the subproblem with integer values x = x̂ fixed:
vx̂ = minz
cTz
s.t. Azz = b− Axx̂, (CP(x̂))z ∈ K.
The dual of CP(x̂) is
maxβ,λ
λT(b− Axx̂)
s.t. β = c− ATzλβ ∈ K∗.
31
the conic outer approximation algorithm
- Start with T = ∅ (or small).- Until the optimal objective of MIOA(T) is ≥ the optimalobjective of the best feasible solution:◦ Let x̂ be the optimal integer solution of the MILP problemMIOA(T)
◦ Solve conic problem CP(x̂), add dual solution βx̂ to T◦ If CP(x̂) was feasible, record new feasible solution toMICONE
Finite termination assuming strong duality holds at subproblems
32
the conic outer approximation algorithm
- Start with T = ∅ (or small).- Until the optimal objective of MIOA(T) is ≥ the optimalobjective of the best feasible solution:◦ Let x̂ be the optimal integer solution of the MILP problemMIOA(T)
◦ Solve conic problem CP(x̂), add dual solution βx̂ to T◦ If CP(x̂) was feasible, record new feasible solution toMICONE
Finite termination assuming strong duality holds at subproblems
32
the conic outer approximation algorithm
- Easy to implement, requires black box MILP and conic solvers- Very often, number of iterations is < 10, sometimes 10-30.Worst observed is 171.
- Each MILP subproblem could be NP-Hard, but takes advantageof fast solvers
33
what can go wrong?
Hijazi et al. (2014):
Bn :=
x ∈ {0, 1}n :n∑j=1
(xj −
12
)2≤ n− 1
4
34
what can go wrong?
Hijazi et al. (2014):
Bn :=
x ∈ {0, 1}n :n∑j=1
(xj −
12
)2≤ n− 1
4
- Bn is empty- No outer approximating hyperplane can exclude more than oneinteger lattice point
- So any polyhedral outer approximation which contains nointeger vertices needs at least 2n hyperplanes
35
how to fix it?
Bn :=
x ∈ {0, 1}n :n∑j=1
(xj −
12
)2≤ n− 1
4
Consider the equivalent formulation in an extended set of variables:
B̂n :=
x ∈ {0, 1}n, z ∈ Rn :n∑j=1
zj ≤n− 14 ,
(xj −
12
)2≤ zj ∀j
- Bn = projx B̂n
- If you form a polyhedral relaxation of each(xj − 1
2)2 ≤ zj
individually, then 2n hyperplanes in R2n is sufficient to excludeall integer points
- Outer approximation algorithm now converges in 2 iterations- Why? Small polyhedron in higher dimension can haveexponentially many facets in lower dimension
36
how to fix it?
Bn :=
x ∈ {0, 1}n :n∑j=1
(xj −
12
)2≤ n− 1
4
Consider the equivalent formulation in an extended set of variables:
B̂n :=
x ∈ {0, 1}n, z ∈ Rn :n∑j=1
zj ≤n− 14 ,
(xj −
12
)2≤ zj ∀j
- Bn = projx B̂n
- If you form a polyhedral relaxation of each(xj − 1
2)2 ≤ zj
individually, then 2n hyperplanes in R2n is sufficient to excludeall integer points
- Outer approximation algorithm now converges in 2 iterations- Why? Small polyhedron in higher dimension can haveexponentially many facets in lower dimension
36
how to fix it?
Bn :=
x ∈ {0, 1}n :n∑j=1
(xj −
12
)2≤ n− 1
4
Consider the equivalent formulation in an extended set of variables:
B̂n :=
x ∈ {0, 1}n, z ∈ Rn :n∑j=1
zj ≤n− 14 ,
(xj −
12
)2≤ zj ∀j
- Bn = projx B̂n
- If you form a polyhedral relaxation of each(xj − 1
2)2 ≤ zj
individually, then 2n hyperplanes in R2n is sufficient to excludeall integer points
- Outer approximation algorithm now converges in 2 iterations
- Why? Small polyhedron in higher dimension can haveexponentially many facets in lower dimension
36
how to fix it?
Bn :=
x ∈ {0, 1}n :n∑j=1
(xj −
12
)2≤ n− 1
4
Consider the equivalent formulation in an extended set of variables:
B̂n :=
x ∈ {0, 1}n, z ∈ Rn :n∑j=1
zj ≤n− 14 ,
(xj −
12
)2≤ zj ∀j
- Bn = projx B̂n
- If you form a polyhedral relaxation of each(xj − 1
2)2 ≤ zj
individually, then 2n hyperplanes in R2n is sufficient to excludeall integer points
- Outer approximation algorithm now converges in 2 iterations- Why? Small polyhedron in higher dimension can haveexponentially many facets in lower dimension
36
“extended formulation for second-order cone”
Vielma, Dunning, Huchette, Lubin (2015):
- For MISOCP: Rewrite ∑ix2i ≤ t2
as ∑izi ≤ t, where zi ≥ x2i /t
- Implemented by CPLEX and Gurobi within months ofpublication.
37
Example: ∑iexp(cTi x+ bi) ≤ t
is expressed in conic form as∑izi ≤ t where (cTi x+ bi, 1, zi) ∈ EXP.
The translation to conic form already produces a formulation withadditional variables ideal for polyhedral outer approximation.
38
Example:exp(x2 + y2) ≤ t
is DCP compliant. CVX will generate equivalent formulation
exp(z) ≤ t where x2 + y2 ≤ z
which is then translated to EXP and second-order cones.
- Tawarmalani and Sahinidis (2005) show that polyhedral outerapproximations that exploit composition structure have extrastrength. This comes for “free” with conic representations.
39
vx̂ = minz
cTz
s.t. Azz = b− Axx̂,z ∈ K.
maxβ,λ
λT(b− Axx̂)
s.t. β = c− ATzλβ ∈ K∗.
Suppose CP(x̂) is feasible and strong duality holds at the optimalprimal-dual solution (zx̂, βx̂, λx̂). Then for any z with Azz = b− Axx̂and βT
x̂z ≥ 0, we have cTz ≥ vx̂.
Including βTx̂z ≥ 0 in polyhedral relaxation implies that if x = x̂ is the
optimal (integer part of the) solution of the relaxation, then theobjective value of the relaxation is at least vx̂.
Proof.
βTx̂z = (c− ATzλx̂)
Tz = cTz− λTx̂(b− Axx̂) = cTz− vx̂ ≥ 0.
40
vx̂ = minz
cTz
s.t. Azz = b− Axx̂,z ∈ K.
maxβ,λ
λT(b− Axx̂)
s.t. β = c− ATzλβ ∈ K∗.
Suppose CP(x̂) is feasible and strong duality holds at the optimalprimal-dual solution (zx̂, βx̂, λx̂). Then for any z with Azz = b− Axx̂and βT
x̂z ≥ 0, we have cTz ≥ vx̂.Including βT
x̂z ≥ 0 in polyhedral relaxation implies that if x = x̂ is theoptimal (integer part of the) solution of the relaxation, then theobjective value of the relaxation is at least vx̂.
Proof.
βTx̂z = (c− ATzλx̂)
Tz = cTz− λTx̂(b− Axx̂) = cTz− vx̂ ≥ 0.
40
vx̂ = minz
cTz
s.t. Azz = b− Axx̂,z ∈ K.
maxβ,λ
λT(b− Axx̂)
s.t. β = c− ATzλβ ∈ K∗.
Given x̂, assume CP(x̂) is infeasible and its dual is unbounded, suchthat we have a ray (βx̂, λx̂) satisfying βx̂ ∈ K∗, β = −ATzλx̂, andλTx̂(b− Axx̂) > 0. Then for any z satisfying Azz = b− Axx̂ we have
βTx̂z < 0.
Including βTx̂z ≥ 0 in polyhedral relaxation implies x = x̂ is
infeasible to the relaxation.
Proof.
βTx̂z = −λT
x̂Azz = −λTx̂(b− Axx̂) < 0.
41
vx̂ = minz
cTz
s.t. Azz = b− Axx̂,z ∈ K.
maxβ,λ
λT(b− Axx̂)
s.t. β = c− ATzλβ ∈ K∗.
Given x̂, assume CP(x̂) is infeasible and its dual is unbounded, suchthat we have a ray (βx̂, λx̂) satisfying βx̂ ∈ K∗, β = −ATzλx̂, andλTx̂(b− Axx̂) > 0. Then for any z satisfying Azz = b− Axx̂ we have
βTx̂z < 0.
Including βTx̂z ≥ 0 in polyhedral relaxation implies x = x̂ is
infeasible to the relaxation.
Proof.
βTx̂z = −λT
x̂Azz = −λTx̂(b− Axx̂) < 0.
41
Claim: ∃T ⊆ K∗, |T| < ∞ such that MIOA(T) is equivalent to MICONE
Proof.
Solve CP(x̂) for all possible x̂ (finite). Then include all correspondingdual vectors βx̂ in T. If x∗ is the integer solution of MIOA(T), thenCP(x∗) is feasible from the second lemma. From first lemma,objective of MIOA(T) is at least the optimal value of CP(x∗), but it’salso a relaxation so x∗ must be optimal.
42
Claim: ∃T ⊆ K∗, |T| < ∞ such that MIOA(T) is equivalent to MICONE
Proof.
Solve CP(x̂) for all possible x̂ (finite). Then include all correspondingdual vectors βx̂ in T. If x∗ is the integer solution of MIOA(T), thenCP(x∗) is feasible from the second lemma. From first lemma,objective of MIOA(T) is at least the optimal value of CP(x∗), but it’salso a relaxation so x∗ must be optimal.
42
recap
- Value of disciplined modeling for mixed-integer convex:◦ Translation to conic form makes the problem easier tosolve!
◦ DCP rules correspond to extended formulations- Almost all traditional benchmark problems are conicrepresentable
- Proposed first outer approximation algorithm, based on conicduality
43
Computational results
44
PajaritoNew solver!
- 1000 lines of Julia- Input is in conic form, accessible through Convex.jl DCPlanguage
- Works with any MILP and conic solvers supported in Julia- For tough conic problems, also supports traditional nonlinearsolvers
- Open source pending DOE approval
45
Translated 194 convex instances from MINLPLIB2 from AMPL intoConvex.jl (heroic parsing work by E. Yamangil).Compare with Bonmin’s outer approximation algorithm using CPLEXas the MILP solver. We use the same version of CPLEX plus KNITROas NLP (conic) solver.
46
(a) Solution time (b) Number of OA iterations
Figure: Comparison performance profiles for SOC representable instances(Bonmin >30 sec)
- Dominates on iteration count. Haven’t optimized Pajarito soBonmin is faster on the easier problems.
- CPLEX is usually the fastest on MISOCPs. Use that instead.
47
(a) Solution time (b) Number of OA iterations
Figure: Comparison performance profiles for SOC representable instances(Bonmin >30 sec)
- Dominates on iteration count. Haven’t optimized Pajarito soBonmin is faster on the easier problems.
- CPLEX is usually the fastest on MISOCPs. Use that instead.
47
(a) Solution time (b) Number of OA iterations
Figure: Comparison performance profiles for SOC representable instances(Bonmin >30 sec)
- Dominates on iteration count. Haven’t optimized Pajarito soBonmin is faster on the easier problems.
- CPLEX is usually the fastest on MISOCPs. Use that instead.
47
Almost none of the non-MISOCP instances in MINLPLIB2 are hard!
48
gams01
gams01 unsolved instance which is EXP+SOC representable
- 145 variables. 1268 constraints (1158 are linear)- Previous best bound: 1735.06. Best known solution: 21516.83- Pajarito solved in 6 iterations (< 10 hours). Optimal solution is21380.20.
49