cvx applications

7/24/2019 Cvx Applications

1/51

Convex Optimization Applications

Stephen Boydand Steven Diamond

EE & CS Departments

Stanford University

MLSS, Kyoto, August 23-24 2015

1


2/51

Outline

Portfolio Optimization

Worst-Case Risk Analysis

Optimal Advertising

Regression Variations

Model Fitting

2


3/51

Outline



Optimal Advertising


Model Fitting

Portfolio Optimization 3


4/51

Portfolio allocation vector

invest fraction wi in asset i, i=1, . . . , n

w Rn is portfolio allocation vector

1Tw=1

wi


5/51

Asset returns

investments held for one period

initial prices pi>0; end of period prices p+

i >0

asset (fractional) returns ri= (p+

i pi)/pi portfolio (fractional) return R=rTw

common model: ris a random variable, with mean E r=,covarianceE(r )(r )T =

so R is a RV with E R=Tw, var(R) =wTw

E R is (mean) return of portfolio

var(R) is riskof portfolio(risk also sometimes given as std(R) =

var(R))

two objectives: high return, low risk



6/51

Classical (Markowitz) portfolio optimization

maximize Tw wTwsubject to 1Tw=1, w W

variable w Rn

W is set of allowed portfolios common case: W=Rn+ (long only portfolio)

>0 is the risk aversion parameter

Tw wTw is risk-adjusted return

varying gives optimal risk-return trade-off

can also fix return and minimize risk, etc.



7/51

Example

optimal risk-return trade-off for 10 assets, long only portfolio



8/51

Example

return distributions for two risk aversion values



9/51

Portfolio constraints

W=Rn (simple analytical solution)

leverage limit: w1 Lmax

market neutral: mTw=0 mi is capitalization of asset i M=mTr is market return mTw=cov(M, R)

i.e., market neutral portfolio return is uncorrelated withmarket return



10/51

Example

optimal risk-return trade-off curves for leverage limits 1, 2, 4



11/51

Example

three portfolios with wTw=2, leverage limits L= 1, 2, 4



12/51


13/51

Factor covariance model

=FFT +D

F Rnk, k n is factor loading matrix

k is number of factors (or sectors), typically 10s Fij is loading of asset ito factor j

D is diagonal matrix; Dii>0 is idiosyncratic risk

> 0 is the factor covariance matrix

FTw Rk gives portfolio factor exposures

portfolio is factor j neutralif(FTw)j=0



14/51

Portfolio optimization with factor covariance model

maximize Tw

fTf +wTDw

subject to 1Tw=1, f =FTww W, f F

variables w Rn (allocations), f Rk (factor exposures)

Fgives factor exposure constraints

computational advantage: O(nk2) vs. O(n3)



15/51

Example

50 factors, 3000 assets

leverage limit =2 solve with covariance given as

single matrix factor model

CVXPY/ECOS single thread time

covariance solve time

single matrix 687.26 secfactor model 0.58 sec



16/51

Outline



Optimal Advertising


Model Fitting

Worst-Case Risk Analysis 16


17/51

Covariance uncertainty

single period Markowitz portfolio allocation problem

we have fixed portfolio allocation w Rn

return covariance not known, but we believe S S is convex set of possible covariance matrices

risk is wTw, a linear function of



18/51

Worst-case risk analysis

what is the worst (maximum) risk, over all possiblecovariance matrices?

worst-case risk analysis problem:

maximize wTw

subject to S, 0

with variable

. . . a convex problem with variable

if the worst-case risk is not too bad, you can worry less

if not, youll confront your worst nightmare



19/51

Example

w= (0.6, 0.5, 0.25, 0.65, 0)

optimized for nom, return 0.1, leverage limit 2

S= {nom + : |ii| =0, |ij| 0.2},

nom =

0.86 0.34 0.14 0.15 0.550.34 0.66 0.12 0.51 0.24

0.14 0.12 0.45 0.06 0.110.15 0.51 0.06 0.55 0.14

0.55 0.24 0.11 0.14 0.9



20/51

Example

nominal risk =0.511

worst case risk =0.917

worst case =

0 0.2 0.2 0.2 0.080.2 0 0.2 0.2 0.020.2 0.2 0 0.2 0.050.2 0.2 0.2 0 0.02

0.08 0.02 0.05 0.02 0



21/51

Outline



Optimal Advertising


Model Fitting

Optimal Advertising 21


22/51

Ad display

m advertisers/ads, i=1, . . . , m

n time slots, t=1, . . . , n

Tt is total traffic in time slot t

Dit 0 is number of ad idisplayed in period t

iDit Tt

contracted minimum total displays:

tDit ci goal: choose Dit



23/51

Clicks and revenue

Citis number of clicks on ad i in period t

click model: Cit=PitDit, Pit [0, 1]

payment: Ri>0 per click for ad i, up to budget Bi

ad revenue

Si=min

Rit

Cit, Bi

. . . a concave function ofD



24/51

Ad optimization

choose displays to maximize revenue:

maximize iSisubject to D 0, DT1 T, D1 c

variable is D Rmn

data are T, c, R, B, P



25/51

Example

24 hourly periods, 5 ads (AE)

total traffic:



26/51

Example

ad data:

Ad A B C D E

ci 61000 80000 61000 23000 64000Ri 0.15 1.18 0.57 2.08 2.43

Bi 25000 12000 12000 11000 17000



27/51

Example

Pit



28/51

Example

optimal Dit



29/51

Example

ad revenue

Ad A B C D E

ci 61000 80000 61000 23000 64000

Ri 0.15 1.18 0.57 2.08 2.43

Bi 25000 12000 12000 11000 17000tDit 61000 80000 148116 23000 167323

Si 182 12000 12000 11000 7760



30/51

Outline



Optimal Advertising


Model Fitting

Regression Variations 30


31/51

Standard regression

given data (xi, yi) Rn R, i=1, . . . , m

fit linear (affine) model yi=Txi v, R

n, v R

residuals are ri= yi yi

least-squares: choose , v to minimize r22 =

ir

2

i

mean of optimal residuals is zero

can add (Tychonov) regularization: with >0,

minimize r22

+22



32/51

Robust (Huber) regression

replace square with Huber function

(u) =

u2 |u| M2Mu M2 |u| >M

M>0 is the Huber threshold

same as least-squares for small residuals, but allows (some)large residuals



33/51

Example

m=450 measurements, n=300 regressors

choose true; xi N(0, I)

set yi= (true)Txi+i, i N(0, 1)

with probability p, replace yi withyi data has fraction pof (non-obvious) wrong measurements

distribution of good and bad yiare the same

try to recover true Rn from measurements y Rm

prescient version: we know which measurements are wrong



34/51

Example

50 problem instances, pvarying from 0 to 0.15



35/51

Example



36/51

Quantile regression

tilted1 penalty: for (0, 1),

(u) =(u)++ (1 )(u)= (1/2)|u| + ( 1/2)u

quantile regression: choose , v to minimize

i(ri)

=0.5: equal penalty for over- and under-estimating =0.1: 9 more penalty for under-estimating

=0.9: 9 more penalty for over-estimating


Q il i


37/51

Quantile regression

forri=0,

i(ri)

v = |{i :ri>0}| (1 ) |{i :ri0}| = (1 ) |{i :ri


38/51

Example

time series xt, t=0, 1, 2, . . .

auto-regressive predictor:

xt+1 =T(xt, . . . , xtM) v

M=10 is memory of predictor

use quantile regression for =0.1, 0.5, 0.9

at each time t, gives three one-step-ahead predictions:

x0.1t+1, x0.5t+1, x0.9t+1


E l


39/51

Example

time series xt


E l


40/51

Example

xtand predictions x0.1t+1, x

0.5t+1, x

0.9t+1 (training set, t=0, . . . , 399)


Example


41/51

Example

xtand predictions x0.1t+1, x

0.5t+1, x

0.9t+1 (test set, t=400, . . . , 449)


Example


42/51

Example

residual distributions for =0.9, 0.5, and 0.1 (training set)


Example


43/51

Example

residual distributions for =0.9, 0.5, and 0.1 (test set)


Outline


44/51

Outline



Optimal Advertising


Model Fitting

Model Fitting 44

Data model


45/51

Data model

given data (xi, yi) X Y , i=1, . . . , m forX =Rn, x is feature vector

forY=R, y is (real) outcomeor label

forY= {1, 1}, yis (boolean) outcome

find modelorpredictor: X Yso that (x) yfor data(x, y) that you havent seen

forY=R, is a regression model

forY= {1, 1}, is a classifier we choose based on observed data, prior knowledge

Model Fitting 45

Loss minimization model


46/51

Loss minimization model

data model parametrized by Rn

loss function L: X Y Rn R

L(xi, yi, ) is loss (miss-fit) for data point (xi, yi),

using model parameter choose ; then model is

(x) =argminy

L(x, y, )

Model Fitting 46

Model fitting via regularized loss minimization


47/51

Model fitting via regularized loss minimization

regularization r :Rn R {}

r() measures model complexity, enforces constraints, orrepresents prior

choose by minimizing regularized loss

(1/m)i

L(xi, yi, ) +r()

for many useful cases, this is a convex problem

model is (x) = argminyL(x, y, )

Model Fitting 47

Examples


48/51

Examples

model L(x, y, ) (x) r()least-squares (Tx y)2 Tx 0ridge regression (Tx y)2 Tx 2

2

lasso (Tx y)2 Tx 1logistic classifier log(1+exp(yTx)) sign(Tx) 0

SVM (1 yTx)+ sign(Tx) 22

>0 scales regularization

all lead to convex fitting problems

Model Fitting 48

Example


49/51

p

original (boolean) features z {0, 1}10

(boolean) outcome y {1, 1}

new feature vector x {0, 1}55 contains all products zizj(co-occurence of pairs of original features)

use logistic loss, 1 regularizer

training data has m=200 examples; test on 100 examples

Model Fitting 49

Example


50/51

p

Model Fitting 50

Example


51/51

p

selected features zizj, =0.01

Model Fitting 51

cvx applications

Documents