
Object-Oriented Convex Optimization with CVXPY

Stephen Boyd, Steven Diamond, Akshay Agrawal
Stanford University

BayOpt, Stanford, 5/19/18


Outline

Convex optimization

CVXPY 1.0

Parameters and warm-start

Distributed optimization

Summary


Convex optimization problem

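$$
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_i(x) \le 0, \quad i = 1, \ldots, m \\
& Ax = b
\end{array}
$$

with variable $x \in \mathbf{R}^n$

- objective and inequality constraint functions $f_0, \ldots, f_m$ are convex: for all $x$, $y$, and $\theta \in [0, 1]$,

$$
f_i(\theta x + (1 - \theta) y) \le \theta f_i(x) + (1 - \theta) f_i(y)
$$

  i.e., graphs of $f_i$ curve upward

- equality constraints are linear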
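The convexity inequality above can be spot-checked numerically. The sketch below is only an illustration: the helper name `violates_convexity`, the sampling interval, and the tolerance are assumptions, and random sampling can find a violation but can never prove that a function is convex.

```python
import random


def violates_convexity(f, lo=-5.0, hi=5.0, trials=10_000, tol=1e-9, seed=0):
    """Return True if some sampled (x, y, theta) breaks
    f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        theta = rng.random()
        lhs = f(theta * x + (1 - theta) * y)
        rhs = theta * f(x) + (1 - theta) * f(y)
        if lhs > rhs + tol:
            return True
    return False


abs_violates = violates_convexity(abs)                      # |x| is convex
square_violates = violates_convexity(lambda t: t * t)       # x^2 is convex
neg_square_violates = violates_convexity(lambda t: -t * t)  # -x^2 is concave
```

Only the last function should produce a violation, since its graph curves downward.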


Why convex optimization?

- beautiful, fairly complete, and useful theory
- solution algorithms that work well in theory and practice
- many applications in
  - machine learning, statistics
  - control
  - signal, image processing
  - networking
  - engineering design
  - finance
  - . . . and many more


How do you solve a convex problem?

- use someone else's ('standard') solver (LP, QP, SOCP, . . . )
  - easy, but your problem must be in a standard form
  - cost of solver development amortized across many users
- write your own (custom) solver
  - lots of work, but can take advantage of special structure
- use a convex modeling language
  - transforms user-friendly format into solver-friendly standard form
  - extends reach of problems solvable by standard solvers


Convex modeling languages

- long tradition of modeling languages for optimization
  - AMPL, GAMS
- modeling languages for convex optimization
  - CVX, YALMIP, CVXGEN, CVXPY, Convex.jl, CVXR
- function of a convex modeling language:
  - check/verify problem convexity
  - convert to standard form


Disciplined convex programming (DCP)

- system for constructing expressions with known curvature
  - constant, affine, convex, concave
- expressions formed from
  - variables
  - constants and parameters
  - library of functions with known curvature, monotonicity, sign
- basis of all convex modeling systems
- more at http://dcp.stanford.edu

The one rule that DCP is based on

$h(f_1(x), \ldots, f_k(x))$ is convex when $h$ is convex and, for each $i$,

- $h$ is increasing in argument $i$, and $f_i$ is convex, or
- $h$ is decreasing in argument $i$, and $f_i$ is concave, or
- $f_i$ is affine

There's a similar rule for concave compositions (just swap convex and concave above).
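The composition rule is mechanical enough to write down in a few lines. The sketch below is a toy checker, not CVXPY's implementation: the function name `dcp_verifies_convex` and the string labels are assumptions made for illustration. A result of `False` means only that the rule fails to certify convexity, not that the expression is nonconvex.

```python
def dcp_verifies_convex(h_curvature, h_monotonicity, arg_curvatures):
    """Apply the DCP composition rule to h(f_1(x), ..., f_k(x)).

    h_curvature:    'convex', 'concave', or 'affine'
    h_monotonicity: per argument, 'increasing', 'decreasing', or 'none'
    arg_curvatures: per argument, 'convex', 'concave', or 'affine'
    """
    # An affine h is also convex, so it is accepted as the outer function.
    if h_curvature not in ("convex", "affine"):
        return False
    for mono, curv in zip(h_monotonicity, arg_curvatures):
        ok = (
            curv == "affine"
            or (mono == "increasing" and curv == "convex")
            or (mono == "decreasing" and curv == "concave")
        )
        if not ok:
            return False
    return True


# exp is convex and increasing; x^2 is convex, so exp(x^2) is certified.
exp_of_square = dcp_verifies_convex("convex", ["increasing"], ["convex"])
# -log is convex and decreasing; sqrt is concave, so -log(sqrt(x)) is certified.
neglog_of_sqrt = dcp_verifies_convex("convex", ["decreasing"], ["concave"])
# square (not monotone over the reals) of the concave sqrt is not certified.
square_of_sqrt = dcp_verifies_convex("convex", ["none"], ["concave"])
```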



CVXPY

a modeling language in Python for convex optimization

- developed by Diamond & Boyd, 2014–
- uses signed DCP to verify convexity
- open source all the way to the solvers
- mixes easily with general Python code, other libraries
- already used in many research projects, classes, companies
- over 100,000 downloads on PyPI


CVXPY 1.0

a complete redesign of CVXPY

- a modular framework for mapping problems into solver standard form
  - recognizes QPs and targets specialized solvers
  - supports complex numbers via rewriting as equivalent real-valued problem
- a unified system for defining variables and parameters with special properties, e.g.,
  - nonnegative
  - symmetric
  - sparse
- full NumPy compatibility (matching syntax, etc.)


CVXPY 1.0 example

(constrained LASSO)

$$
\begin{array}{ll}
\text{minimize} & \|Ax - b\|_2^2 + \gamma \|x\|_1 \\
\text{subject to} & \mathbf{1}^T x = 0, \quad \|x\|_\infty \le 1
\end{array}
$$


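import cvxpy as cp

# A (m x n), b (length m), n, and gamma > 0 are problem data defined elsewhere.
x = cp.Variable(n)
cost = cp.sum_squares(A @ x - b) + gamma * cp.norm(x, 1)
obj = cp.Minimize(cost)
# cp.sum replaces sum_entries, which was renamed in CVXPY 1.0.
constr = [cp.sum(x) == 0, cp.norm(x, "inf") <= 1]
prob = cp.Problem(obj, constr)
opt_val = prob.solve()
solution = x.value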


Solvers

- ECOS (Domahidi)
  - interior-point method
  - supports exponential cone
  - compact, library-free C code
- SCS (O'Donoghue)
  - first-order method
  - parallelism with OpenMP
  - GPU support
- OSQP (Stellato, Banjac, Goulart)
  - first-order method
  - targets QPs and LPs
  - code generation support
- others: CVXOPT, GLPK, MOSEK, GUROBI, Cbc, . . .


Reductions

- mapping from problem to standard form is a series of reductions
- a reduction maps a problem into an equivalent one
- equivalent means a solution of one can be readily constructed from a solution of the other
- analogous to reduction in theoretical CS

[Figure: canonicalization maps problem p0 through p1, ..., pn; the solver is applied to pn; retrieval maps solutions sn, ..., s1 back to s0.]

Example reductions

- flipping the objective from minimize to maximize
- adding slack variables
- changing variables
- monotone transformations of objective/constraints
- eliminating complex numbers
- dualizing
- pre-solve
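A reduction pairs a forward map on problems with a backward map on solutions. The toy sketch below shows that pattern for the objective-flipping reduction; every name in it (`FlipObjective`, `grid_minimizer`, `solve_via_reductions`, and the tuple representation of a problem) is hypothetical and unrelated to CVXPY's internal classes.

```python
class FlipObjective:
    """Reduction: maximize f(x) becomes minimize -f(x)."""

    def apply(self, problem):
        sense, f, domain = problem
        assert sense == "maximize"
        return ("minimize", lambda x: -f(x), domain)

    def invert(self, solution):
        # Same optimal point; the optimal value changes sign.
        x_opt, value = solution
        return (x_opt, -value)


def grid_minimizer(problem):
    """Toy 'solver' that accepts only minimization problems (brute force)."""
    sense, f, domain = problem
    assert sense == "minimize"
    x_opt = min(domain, key=f)
    return (x_opt, f(x_opt))


def solve_via_reductions(problem, reductions, solver):
    for r in reductions:            # canonicalization: p0 -> p1 -> ... -> pn
        problem = r.apply(problem)
    solution = solver(problem)      # solver works on pn
    for r in reversed(reductions):  # retrieval: sn -> ... -> s0
        solution = r.invert(solution)
    return solution


domain = [i / 100 for i in range(-300, 301)]
p0 = ("maximize", lambda x: 1.0 - (x - 1.0) ** 2, domain)
x_opt, value = solve_via_reductions(p0, [FlipObjective()], grid_minimizer)
```

The solver never sees a maximization problem, yet the caller gets back the maximizer and maximum of the original one, which is the sense of "equivalent" used above.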


Canonicalization of DCP programs

DCP programs are canonicalized to equivalent cone programs via the reductions

- conversion to Smith form
- relaxing convex equality constraints
- expanding nonlinear functions to graph implementations

(this was hard-coded in CVXPY < 1.0)


Overall framework

CVXPY 1.0 parses the problem, analyzes it, and dispatches to the most specialized solver reachable via known reductions.

[Figure: overall framework. Problem p0 enters the DSL front end and the analyzer; the rewriting system maps it to pn; back ends (LP, QP, SDP, CFP) connect to solvers S1, S2, ..., Si, ..., Sk.]


Parameters in CVXPY

- symbolic representations of constants
- can specify sign (for use in DCP analysis)
- change value of constant without re-parsing problem
- for-loop style trade-off curve:

import numpy

# gamma is a Parameter that appears in the objective of prob.
x_values = []
for val in numpy.logspace(-4, 2, 100):
    gamma.value = val
    prob.solve()
    x_values.append(x.value)


Parallel style trade-off curve

# Use tools for parallelism in standard library.
from multiprocessing import Pool
import numpy

# Function maps gamma value to optimal x.
def get_x(gamma_value):
    gamma.value = gamma_value
    prob.solve()
    return x.value

# Parallel computation with N processes.
pool = Pool(processes=N)
x_values = pool.map(get_x, numpy.logspace(-4, 2, 100))


Warm-start

- problem object caches solution and factorization
- solution used as initial guess
- factorization re-used if possible
- stateful serial approach versus stateless parallel approach


Performance

(LASSO)

$$
\text{minimize} \quad \|Ax - b\|_2^2 + \gamma \|x\|_1
$$

- $A \in \mathbf{R}^{1000 \times 500}$, 100 values of $\gamma$
- single thread time for one LASSO: 1.6 seconds (OSQP)

                        for-loop   4 proc.   32 proc.   warm-start
4 core MacBook Pro      173 sec    70 sec    81 sec     43 sec
32 cores, Intel Xeon    527 sec    171 sec   45 sec     149 sec



Sum of problems

- overload + for problem objects
  - objectives add (if both minimize or maximize)
  - constraint lists add (i.e., concatenate)

problem1 = cp.Problem(cp.Minimize(cp.abs(x + y)), [x >= 2])
problem2 = cp.Problem(cp.Minimize(cp.square(y + z)), [y >= 1])
problem = problem1 + problem2
print(problem)
# Equivalent to Problem(Minimize(abs(x+y) + square(y+z)), [x >= 2, y >= 1])


Separable problems

- fully separable problems (can be) solved in parallel
- groups objective terms, constraints with same variables

problem1 = cp.Problem(cp.Minimize(cp.abs(x)), [x >= 2])
problem2 = cp.Problem(cp.Minimize(y**2), [y >= 1])

# Solve in parallel (solve option as presented for CVXPY 1.0).
(problem1 + problem2).solve(parallel=True)

# Solve serially.
problem1.solve()
problem2.solve()


fix function

- replaces the variables in a given list with parameters that hold the same values
- create expression

# An expression with variables x and z.
expr1 = cp.sum_squares(A @ x - b) + cp.norm(z, 1)
expr1.variables() # [x, z]

- use fix function

# Fix expr1 with respect to z.
# z is replaced with Parameter(value=z.value).
expr2 = fix(expr1, [z])
expr2.variables() # [x]


Alternating direction method of multipliers

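- problem

$$
\begin{array}{ll}
\text{minimize} & f(x) + g(z) \\
\text{subject to} & Ax + Bz = c
\end{array}
$$

- augmented Lagrangian

$$
L_\rho(x, z, y) = f(x) + g(z) + y^T (Ax + Bz - c) + (\rho/2) \|Ax + Bz - c\|_2^2
$$

- ADMM:

$$
\begin{aligned}
x^{k+1} &:= \operatorname*{argmin}_x L_\rho(x, z^k, y^k) \\
z^{k+1} &:= \operatorname*{argmin}_z L_\rho(x^{k+1}, z, y^k) \\
y^{k+1} &:= y^k + \rho (A x^{k+1} + B z^{k+1} - c)
\end{aligned}
$$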
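The three ADMM updates can be traced by hand on a scalar problem where each argmin has a closed form. The sketch below assumes the toy instance f(x) = (1/2)(x - a)^2, g(z) = (1/2)(z - b)^2 with constraint x - z = 0 (so A = 1, B = -1, c = 0); the instance and the function name `admm_scalar` are illustrative choices, not taken from the talk.

```python
def admm_scalar(a, b, rho=1.0, iters=50):
    """ADMM for: minimize 0.5*(x-a)^2 + 0.5*(z-b)^2  subject to  x - z = 0."""
    x = z = y = 0.0
    for _ in range(iters):
        # x-update: set d/dx [0.5*(x-a)^2 + y*(x-z) + (rho/2)*(x-z)^2] to zero.
        x = (a - y + rho * z) / (1.0 + rho)
        # z-update: set d/dz [0.5*(z-b)^2 + y*(x-z) + (rho/2)*(x-z)^2] to zero.
        z = (b + y + rho * x) / (1.0 + rho)
        # Dual update: y := y + rho * (Ax + Bz - c), here the residual is x - z.
        y = y + rho * (x - z)
    return x, z, y


x_opt, z_opt, y_opt = admm_scalar(a=1.0, b=3.0)
```

For this instance the constrained minimizer is x = z = (a + b)/2 with multiplier y = (a - b)/2, so the iterates should approach 2, 2, and -1.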


Generic ADMM in CVXPY

- form and solve original problem

cp.Problem(cp.Minimize(f + g), [A @ x + B @ z == c]).solve()

- ADMM in CVXPY:

resid = A @ x + B @ z - c
y = cp.Parameter(m, value=numpy.zeros(m))
aug_lagr = f + g + y.T @ resid + (rho/2)*cp.sum_squares(resid)
for k in range(MAX_ITERS):
    cp.Problem(cp.Minimize(fix(aug_lagr, [z]))).solve()  # x-update, z held fixed
    cp.Problem(cp.Minimize(fix(aug_lagr, [x]))).solve()  # z-update, x held fixed
    y.value += rho*resid.value  # dual update


Consensus optimization

- want to solve problem with $N$ objective terms

$$
\text{minimize} \quad \sum_{i=1}^N f_i(x)
$$

- e.g., $f_i$ is the loss function for the $i$th block of training data
- consensus form:

$$
\begin{array}{ll}
\text{minimize} & \sum_{i=1}^N f_i(x_i) \\
\text{subject to} & x_i = z, \quad i = 1, \ldots, N
\end{array}
$$

  - $x_i$ are local variables
  - $z$ is the global variable
  - $x_i = z$ are consistency or consensus constraints


ADMM consensus

- alternating direction method of multipliers (ADMM) consensus:

$$
\begin{aligned}
x_i^{k+1} &:= \operatorname*{argmin}_{x_i} \left( f_i(x_i) + (\rho/2) \left\| x_i - \bar{x}^k + u_i^k \right\|_2^2 \right) \\
u_i^{k+1} &:= u_i^k + x_i^{k+1} - \bar{x}^{k+1}
\end{aligned}
$$

- $\bar{x}^k = (1/N) \sum_{i=1}^N x_i^k$
- parameter $\rho > 0$
- split across $N$ worker processes
- in each iteration
  - update $x_i$ locally (in each worker process, in parallel)
  - gather $x_i$ on master process, average to get $\bar{x}$
  - scatter $\bar{x}$ to workers
  - update $u_i$ locally
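The gather, average, scatter cycle can be followed in a single process when each local update has a closed form. The sketch below assumes scalar quadratic terms f_i(x) = (1/2)(x - a_i)^2, whose sum is minimized at the mean of the a_i; the function name `consensus_admm` and the data are hypothetical, and the loop over workers runs serially rather than in separate processes.

```python
def consensus_admm(a, rho=1.0, iters=100):
    """Consensus ADMM for f_i(x) = 0.5*(x - a_i)^2, one term per 'worker'."""
    n_workers = len(a)
    u = [0.0] * n_workers   # scaled dual variable held by each worker
    xbar = 0.0              # current average, known to every worker
    x = [0.0] * n_workers
    for _ in range(iters):
        # Local x-updates: argmin of f_i(x_i) + (rho/2)*(x_i - xbar + u_i)^2.
        x = [(a_i + rho * (xbar - u_i)) / (1.0 + rho) for a_i, u_i in zip(a, u)]
        # Gather and average (the master's job), then scatter xbar.
        xbar = sum(x) / n_workers
        # Local u-updates with the new average.
        u = [u_i + x_i - xbar for u_i, x_i in zip(u, x)]
    return xbar, x


xbar, x_local = consensus_admm([1.0, 2.0, 6.0])
```

Here the local variables and their average should all approach 3, the mean of the data, which is the point where the workers reach consensus.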


ADMM consensus: the workers

- launch N worker processes
  - pipe to communicate with master

def run_worker(f, pipe):
    # f is a CVXPY expression in the variable x; n, rho, and x are defined elsewhere.
    xbar = cp.Parameter(n, value=numpy.zeros(n))
    u = cp.Parameter(n, value=numpy.zeros(n))
    f += (rho/2)*cp.sum_squares(x - xbar + u)
    prox = cp.Problem(cp.Minimize(f))
    # ADMM loop.
    while True:
        prox.solve()
        pipe.send(x.value)
        xbar.value = pipe.recv()
        u.value += x.value - xbar.value


ADMM consensus: the master

- master gathers $x_i$ and scatters $\bar{x}$

# pipes = list with pipe to each worker
for i in range(MAX_ITER):
    # Gather and average x_i.
    xbar = sum(pipe.recv() for pipe in pipes) / N
    # Scatter xbar.
    for pipe in pipes:
        pipe.send(xbar)


Consensus SVM

- data $(a_i, b_i)$, $i = 1, \ldots, N$, $a_i \in \mathbf{R}^n$, $b_i \in \{-1, +1\}$
- linear classifier $\mathrm{sign}(a^T w + v)$, with weight $w$, offset $v$
- choose $w$, $v$ to minimize

$$
\frac{1}{N} \sum_{i=1}^N \left( 1 - b_i (a_i^T w + v) \right)_+ + \lambda \|w\|_2^2
$$

- split data and use ADMM consensus to solve
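To make the SVM objective concrete, the sketch below evaluates it in plain Python for a given weight and offset. The function name `svm_objective` and the two-point data set are hypothetical; the code only evaluates the objective and does not solve the problem.

```python
def svm_objective(w, v, data, lam):
    """(1/N) * sum of hinge losses (1 - b_i*(a_i^T w + v))_+  plus  lam*||w||_2^2."""
    total_hinge = 0.0
    for a, b in data:
        score = sum(w_j * a_j for w_j, a_j in zip(w, a)) + v
        total_hinge += max(0.0, 1.0 - b * score)
    return total_hinge / len(data) + lam * sum(w_j * w_j for w_j in w)


# Two linearly separable points with labels +1 and -1 (assumed toy data).
data = [([1.0, 0.0], 1), ([-1.0, 0.0], -1)]
separating = svm_objective([2.0, 0.0], 0.0, data, lam=0.1)  # hinge terms vanish
trivial = svm_objective([0.0, 0.0], 0.0, data, lam=0.1)     # every hinge term is 1
```

With the separating weight, both margins exceed 1, so only the regularization term 0.1 * 4 remains; with zero weight, the average hinge loss is 1 and the regularization term is 0.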


Example

- $N = 10^6$ samples
- $n = 10^3$ (dense) features
- 6 hours to solve serially using CVXPY and SCS
- 20 seconds to solve with ADMM consensus
  - split over 100 processes
  - 32 cores, Intel Xeon CPUs
  - 2 sec per ADMM iteration
  - 10 iterations to converge





Summary

- object-oriented convex optimization
  - is close to the mathematics
  - . . . and extremely practical
- CVXPY mixes well with high level Python
  - parallelism
  - object oriented design
- CVXPY is a building block for
  - distributed optimization
  - nonconvex optimization
  - domain-specific application packages
