object-oriented convex optimization with cvxpy · 2018. 5. 23. · i object-oriented convex...
TRANSCRIPT
Object-Oriented Convex Optimization with CVXPY
Stephen Boyd Steven Diamond Akshay AgrawalStanford University
BayOpt, Stanford, 5/19/18
1
Outline
Convex optimization
CVXPY 1.0
Parameters and warm-start
Distributed optimization
Summary
Convex optimization 2
Convex optimization problem
minimize f0(x)subject to fi (x) ≤ 0, i = 1, . . . ,m
Ax = b,
with variable x ∈ Rn
I objective and inequality constraints f0, . . . , fm are convex
for all x , y , θ ∈ [0, 1],
fi (θx + (1− θ)y) ≤ θfi (x) + (1− θ)fi (y)
i.e., graphs of fi curve upward
I equality constraints are linear
Convex optimization 3
Why convex optimization?
I beautiful, fairly complete, and useful theory
I solution algorithms that work well in theory and practiceI many applications in
I machine learning, statisticsI controlI signal, image processingI networkingI engineering designI finance
. . . and many more
Convex optimization 4
How do you solve a convex problem?
I use someone else’s (‘standard’) solver (LP, QP, SOCP, . . . )I easy, but your problem must be in a standard formI cost of solver development amortized across many users
I write your own (custom) solverI lots of work, but can take advantage of special structure
I use a convex modeling languageI transforms user-friendly format into solver-friendly standard formI extends reach of problems solvable by standard solvers
Convex optimization 5
Convex modeling languages
I long tradition of modeling languages for optimizationI AMPL, GAMS
I modeling languages for convex optimizationI CVX, YALMIP, CVXGEN, CVXPY, Convex.jl, CVXR
I function of a convex modeling language:I check/verify problem convexityI convert to standard form
Convex optimization 6
Disciplined convex programming (DCP)
I system for constructing expressions with known curvatureI constant, affine, convex, concave
I expressions formed fromI variablesI constants and parametersI library of functions with known curvature, monotonicity, sign
I basis of all convex modeling systems
I more at dcp.stanford.edu
Convex optimization 7
The one rule that DCP is based on
h(f1(x), . . . , fk(x)) is convex when h is convex and for each i
I h is increasing in argument i , and fi is convex, or
I h is decreasing in argument i , and fi is concave, or
I fi is affine
I there’s a similar rule for concave compositions(just swap convex and concave above)
Convex optimization 8
Outline
Convex optimization
CVXPY 1.0
Parameters and warm-start
Distributed optimization
Summary
CVXPY 1.0 9
CVXPY
a modeling language in Python for convex optimization
I developed by Diamond & Boyd, 2014–
I uses signed DCP to verify convexity
I open source all the way to the solvers
I mixes easily with general Python code, other libraries
I already used in many research projects, classes, companies
I over 100,000 downloads on PyPi
CVXPY 1.0 10
CVXPY 1.0
a complete redesign of CVXPY
I a modular framework for mapping problems into solver standard formI recognizes QPs and targets specialized solversI supports complex numbers via rewriting as equivalent real-valued problem
I a unified system for defining variables and parameters with special properties,e.g.,
I nonnegativeI symmetricI sparse
I full NumPy compatibility (matching syntax, etc.)
CVXPY 1.0 11
CVXPY 1.0 example
(constrained LASSO)
minimize ‖Ax − b‖22 + γ‖x‖1subject to 1T x = 0, ‖x‖∞ ≤ 1
with variable x ∈ Rn
from cvxpy import *
x = Variable(n)
cost = sum_squares(A*x-b) + gamma*norm(x,1)
obj = Minimize(cost)
constr = [sum_entries(x) == 0, norm(x,"inf") <= 1]
prob = Problem(obj, constr)
opt_val = prob.solve()
solution = x.value
CVXPY 1.0 12
Solvers
I ECOS (Domahidi)I interior-point methodI supports exponential coneI compact, library-free C code
I SCS (O’Donoghue)I first-order methodI parallelism with OpenMPI GPU support
I OSQP (Stellato, Banjac, Goulart)I first-order methodI targets QPs and LPsI code generation support
I others: CVXOPT, GLPK, MOSEK, GUROBI, Cbc, . . .
CVXPY 1.0 13
Reductions
I mapping from problem to standard form is a series of reductions
I a reduction maps a problem into an equivalent one
I equivalent means a solution of one can be readily constructed from a solutionof the other
I analogous to reduction in theoretical CS
CVXPY 1.0 14
Reductions
canonicalization
p1 pn
...
p0
s1 sn...
s0
retrieval
solver
CVXPY 1.0 15
Example reductions
I flipping the objective from minimize to maximize
I adding slack variables
I changing variables
I monotone transformations of objective/constraints
I eliminating complex numbers
I dualizing
I pre-solve
CVXPY 1.0 16
Canonicalization of DCP programs
DCP programs are canonicalized to equivalent cone programs via the reductions
I conversion to Smith form
I relaxing convex equality constraints
I expanding nonlinear functions to graph implementations
(this was hard-coded in CVXPY < 1.0)
CVXPY 1.0 17
Overall framework
CVXPY 1.0 parses the problem, analyzes it, and dispatches to the most specializedsolver reachable via known reductions
p0
DSL Front End Analyzer
LP
QP
SDP
CFP
Back Ends
Si
S2
S1
......
Sk
Rewriting System
pn
Solver
CVXPY 1.0 18
Outline
Convex optimization
CVXPY 1.0
Parameters and warm-start
Distributed optimization
Summary
Parameters and warm-start 19
Parameters in CVXPY
I symbolic representations of constants
I can specify sign (for use in DCP analysis)
I change value of constant without re-parsing problem
I for-loop style trade-off curve:
x_values = []
for val in numpy.logspace(-4, 2, 100):
gamma.value = val
prob.solve()
x_values.append(x.value)
Parameters and warm-start 20
Parallel style trade-off curve
# Use tools for parallelism in standard library.
from multiprocessing import Pool
# Function maps gamma value to optimal x.
def get_x(gamma_value):
gamma.value = gamma_value
result = prob.solve()
return x.value
# Parallel computation with N processes.
pool = Pool(processes = N)
x_values = pool.map(get_x, numpy.logspace(-4, 2, 100))
Parameters and warm-start 21
Warm-start
I problem object caches solution and factorization
I solution used as initial guess
I factorization re-used if possible
I stateful serial approach versus stateless parallel approach
Parameters and warm-start 22
Performance
(LASSO)minimize ‖Ax − b‖22 + γ‖x‖1
with variable x ∈ Rn
I A ∈ R1000×500, 100 values γ
I single thread time for one LASSO: 1.6 seconds (OSQP)
for-loop 4 proc. 32 proc. warm-start
4 core MacBook Pro 173 sec 70 sec 81 sec 43 sec32 cores, Intel Xeon 527 sec 171 sec 45 sec 149 sec
Parameters and warm-start 23
Outline
Convex optimization
CVXPY 1.0
Parameters and warm-start
Distributed optimization
Summary
Distributed optimization 24
Sum of problems
I overload + for problem objectsI objectives add (if both minimize or maximize)I constraints lists add (i.e., concatenate)
problem1 = Problem(Minimize(abs(x+y), [x>=2]))
problem2 = Problem(Minimize(square(y+z), [y>=1]))
problem = problem1 + problem2
print(problem)
# Problem(Minimize(abs(x+y)+square(y+z)), [x>=2,y>=1])
Distributed optimization 25
Separable problems
I fully separable problems (can be) solved in parallel
I groups objective terms, constraints with same variables
problem1 = Problem(Minimize(abs(x)), [x >= 2])
problem2 = Problem(Minimize(y**2), [y >= 1])
# Solve in parallel
(problem1 + problem2).solve(parallel=True)
# Solve serially
problem1.solve()
problem2.solve()
Distributed optimization 26
fix function
I replaces variables in a given list with parameters with the same value
I create expression
# An expression with variables x and z.
expr1 = sum_squares(A*x - b) + norm(z, 1)
expr1.variables() # [x, z]
I use fix function
# Fix expr1 with respect to z.
# z is replaced with Parameter(value=z.value).
expr2 = fix(expr1, [z])
expr2.variables() # [x]
Distributed optimization 27
Alternating direction method of multipliers
I problemminimize f (x) + g(z)subject to Ax + Bz = c
I augmented Lagrangian
Lρ(x , z , y) = f (x) + g(z) + yT (Ax + Bz − c) + (ρ/2) ‖Ax + Bz − c‖22
I ADMM:xk+1 := argminx Lρ(x , zk , yk)zk+1 := argminz Lρ(xk+1, z , yk)yk+1 := yk + ρ(Axk+1 + Bzk+1 − c)
Distributed optimization 28
Generic ADMM in CVXPY
I form and solve original problem
Problem(Minimize(f + g), [A*x + B*z == c]).solve()
I ADMM in CVXPY:
resid = A*x + B*z - c
y = Parameter(m, value=zeros(m))
aug_lagr = f+g+y.T*resid+(rho/2)*sum_squares(resid)
for k in range(MAX_ITERS):
Problem(Minimize(fix(aug_lagr, [z]))).solve()
Problem(Minimize(fix(aug_lagr, [x]))).solve()
y.value += rho*resid.value
Distributed optimization 29
Consensus optimization
I want to solve problem with N objective terms
minimize∑N
i=1 fi (x)
I e.g., fi is the loss function for ith block of training data
I consensus form:minimize
∑Ni=1 fi (xi )
subject to xi = z
I xi are local variablesI z is the global variableI xi = z are consistency or consensus constraints
Distributed optimization 30
ADMM consensus
I alternating direction method of multipliers (ADMM) consensus:
xk+1i := argminxi
(fi (xi ) + (ρ/2)
∥∥xi − xk + uki∥∥22
)uk+1i := uki + xk+1
i − xk+1
I xk = (1/N)∑N
i=1 xki
I parameter ρ ≥ 0
I split across N worker processesI in each iteration
I update xi locally (in each worker process, in parallel)I gather xi on master process, average to get xI scatter x to workersI update ui locally
Distributed optimization 31
ADMM consensus: the workers
I launch N worker processesI pipe to communicate with master
def run_worker(f, pipe):
xbar = Parameter(n, value=zeros(n))
u = Parameter(n, value=zeros(n))
f += (rho/2)*sum_squares(x - xbar + u)
prox = Problem(Minimize(f))
# ADMM loop.
while True:
prox.solve()
pipe.send(x.value)
xbar.value = pipe.recv()
u.value += x.value - xbar.value
Distributed optimization 32
ADMM consensus: the master
I master gathers xi and scatters x
# pipes = list with pipe to each worker
for i in range(MAX_ITER):
# Gather and average xi
xbar = sum([pipe.recv() for pipe in pipes])/N
# Scatter xbar
[pipe.send(xbar) for pipe in pipes]
Distributed optimization 33
Consensus SVM
I data (ai , bi ), i = 1, . . . ,N, ai ∈ Rn, bi ∈ {−1,+1}I linear classifier sign(aTw + v), with weight w , offset v
I choose w , v to minimize 1N
∑Ni=1(1− bi (a
Ti w + v))+ + λ‖w‖22
I split data and use ADMM consensus to solve
Distributed optimization 34
Example
I N = 106 samples
I n = 103 (dense) features
I 6 hours to solve serially using CVXPY and SCSI 20 seconds to solve with ADMM consensus
I split over 100 processesI 32 cores, Intel Xeon CPUsI 2 sec per ADMM iterationI 10 iterations to converge
Distributed optimization 35
Example
Distributed optimization 36
Outline
Convex optimization
CVXPY 1.0
Parameters and warm-start
Distributed optimization
Summary
Summary 37
Summary
I object-oriented convex optimizationI is close to the mathematicsI . . . and extremely practical
I CVXPY mixes well with high level PythonI parallelismI object oriented design
I CVXPY is building block forI distributed optimizationI nonconvex optimizationI domain-specific application packages
Summary 38