ADMM: A Usage Study
Overview
1 A Brief Review
2 Three Application Examples
3 The ADMM Algorithm
4 ADMM: Extensions
Quanming Yao (HKUST) April 6, 2017 2 / 24
That One Famous Paper
Three years after publication it had over 1000 citations; lots of people are using it.
Paper website: http://stanford.edu/~boyd/admm.html
Proximal Gradient Descent: Review
Optimization problem
    min_x f(x) + g(x)
Two fundamental assumptions:
A1. f is Lipschitz smooth
A2. g has a cheap closed-form solution for the proximal step, i.e., for
    min_x (1/2)‖x − z‖₂² + g(x)
First attempt: Accelerated proximal gradient (APG) descent
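As a concrete instance of Assumption A2 (an illustration of mine, not from the slides), the proximal step for g(x) = λ‖x‖₁ has the well-known soft-thresholding closed form:

```python
def prox_l1(z, lam):
    """Soft-thresholding: the closed-form solution of
    min_x 0.5*(x - z)**2 + lam*|x|, applied elementwise to the list z."""
    out = []
    for v in z:
        if v > lam:
            out.append(v - lam)      # shrink positive entries toward 0
        elif v < -lam:
            out.append(v + lam)      # shrink negative entries toward 0
        else:
            out.append(0.0)          # small entries are set exactly to 0
    return out

print(prox_l1([3.0, -0.5, 0.2], 1.0))  # -> [2.0, 0.0, 0.0]
```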
ADMM: An Overview
When either assumption A1 or A2 fails, APG cannot be applied, or it becomes too slow.
Alternating Direction Method of Multipliers (ADMM)
convex case: the most general optimization approach with a convergence guarantee
nonconvex case: good empirical performance in many problems
most important: easy to use
ADMM serves as an alternative when APG fails
Example: Robust PCA
Robust PCA: the data is contaminated by sparse errors
    min_X ‖D − X‖₁ + λ‖X‖_*,   with f(X) = ‖D − X‖₁
f is not smooth
Example: Fused Lasso
Fused Lasso: the signal is piecewise smooth, with a few jumps
[Figure: (a) ℓ1-norm. (b) ℓ2-norm. (c) Fused lasso.]
    min_{x∈R^d} (1/2)‖y − Dx‖₂² + λ‖Gx‖₁,   with g(x) = λ‖Gx‖₁,
    where G ∈ R^{(d−1)×d} is the differencing matrix whose i-th row has +1 and −1 in positions i and i+1
g has no closed-form solution for the proximal step
Example: Matrix Completion with Box Constraint
MovieLens: ratings in [1, 5]
Image: pixels in [0, 255]
    min_X (1/2) ∑_{(i,j)∈Ω} (X_ij − O_ij)² + λ‖X‖_*   s.t. 1 ≤ X_ij ≤ 5
The box constraint is the extra difficulty.
ADMM: An Illustration
Consider the optimization problem, where f and g are convex:
    min_{x,y} f(x) + g(y)   s.t. Ax = By
ADMM works on the augmented Lagrangian
    L(x, y, p) = f(x) + g(y) + pᵀ(Ax − By) + (ρ/2)‖Ax − By‖₂²
The first three terms form the standard Lagrangian; (ρ/2)‖Ax − By‖₂² is the augmented term.
p: the dual variable
ρ: the penalty parameter, which must be positive
ADMM: An Illustration
min_{x,y} max_p L(x, y, p) ≡ f(x) + g(y) + pᵀ(Ax − By) + (ρ/2)‖Ax − By‖₂²
Optimization procedure
    x_{t+1} = arg min_x f(x) + p_tᵀ(Ax − By_t) + (ρ/2)‖Ax − By_t‖₂²,   (1)
    y_{t+1} = arg min_y g(y) + p_tᵀ(Ax_{t+1} − By) + (ρ/2)‖Ax_{t+1} − By‖₂²,   (2)
    p_{t+1} = p_t + ρ(Ax_{t+1} − By_{t+1}).   (3)
Alternating direction
(1) is a descent step minimizing L w.r.t. x (similarly for (2) w.r.t. y)
(3) is an ascent step maximizing L w.r.t. p
When f and g are convex, an O(1/T) rate is guaranteed [4]
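Steps (1)–(3) can be sketched on a toy scalar instance (my own example, not from the talk): f(x) = ½(x − 3)², g(y) = |y|, A = B = I, whose minimizer is x = y = 2:

```python
def admm_toy(a=3.0, lam=1.0, rho=1.0, iters=60):
    """ADMM for min 0.5*(x - a)**2 + lam*|y|  s.t. x = y."""
    x = y = p = 0.0
    for _ in range(iters):
        # (1) x-step: the quadratic subproblem has a closed form
        x = (a - p + rho * y) / (1.0 + rho)
        # (2) y-step: soft-thresholding of x + p/rho at level lam/rho
        v = x + p / rho
        y = max(abs(v) - lam / rho, 0.0) * (1.0 if v >= 0 else -1.0)
        # (3) dual ascent on the constraint x - y = 0
        p += rho * (x - y)
    return x, y

x, y = admm_toy()
print(round(x, 4), round(y, 4))  # both approach 2.0
```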
ADMM: Application to Robust PCA
Robust PCA: min_X ‖D − X‖₁ + λ‖X‖_*,   with f = ‖D − X‖₁ and g = λ‖X‖_*
Reformulation: introduce Y to decouple f and g
    min_{X,Y} ‖Y‖₁ + λ‖X‖_*   s.t. X + Y = D.
Augmented Lagrangian
    L(X, Y, P) ≡ ‖Y‖₁ + λ‖X‖_* + ⟨P, X + Y − D⟩ + (ρ/2)‖X + Y − D‖_F²
ADMM: Application to Robust PCA
min_{X,Y} max_P ‖Y‖₁ + λ‖X‖_* + ⟨P, X + Y − D⟩ + (ρ/2)‖X + Y − D‖_F²
ADMM procedure
    X_{t+1} = arg min_X λ‖X‖_* + ⟨P_t, X + Y_t − D⟩ + (ρ/2)‖X + Y_t − D‖_F²
            = arg min_X (1/2)‖X − (D − Y_t − (1/ρ)P_t)‖_F² + (λ/ρ)‖X‖_*
            = prox_{(λ/ρ)‖·‖_*}(Z_t^X),   a proximal step with the nuclear norm,
    where Z_t^X = D − Y_t − (1/ρ)P_t.
With the SVD Z_t^X = UΣVᵀ, the closed form [1] is X_{t+1} = U max(Σ − (λ/ρ)I, 0) Vᵀ
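This nuclear-norm proximal step takes a few lines of NumPy (a sketch of the closed form from [1]; the function name `svt` is mine):

```python
import numpy as np

def svt(Z, tau):
    """Singular value thresholding: the prox of tau*||.||_* at Z.
    Shrinks every singular value of Z by tau, dropping those below tau."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

Z = np.array([[3.0, 0.0], [0.0, 0.5]])
print(svt(Z, 1.0))  # singular values 3 and 0.5 become 2 and 0
```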
ADMM: Application to Robust PCA
min_{X,Y} max_P ‖Y‖₁ + λ‖X‖_* + ⟨P, X + Y − D⟩ + (ρ/2)‖X + Y − D‖_F²
ADMM procedure
    Y_{t+1} = arg min_Y ‖Y‖₁ + ⟨P_t, X_{t+1} + Y − D⟩ + (ρ/2)‖X_{t+1} + Y − D‖_F²
            = arg min_Y (1/2)‖Y − (D − X_{t+1} − (1/ρ)P_t)‖_F² + (1/ρ)‖Y‖₁
            = prox_{(1/ρ)‖·‖₁}(Z_t^Y),   a proximal step with the ℓ1-norm,
    where Z_t^Y = D − X_{t+1} − (1/ρ)P_t.
Closed form: [Y_{t+1}]_ij = sign([Z_t^Y]_ij) · max(|[Z_t^Y]_ij| − 1/ρ, 0)
ADMM: Application to Robust PCA
Reformulation
    min_{X,Y} ‖Y‖₁ + λ‖X‖_*   s.t. X + Y = D.
ADMM procedure
    X_{t+1} = prox_{(λ/ρ)‖·‖_*}(Z_t^X)   where Z_t^X = D − Y_t − (1/ρ)P_t
    Y_{t+1} = prox_{(1/ρ)‖·‖₁}(Z_t^Y)   where Z_t^Y = D − X_{t+1} − (1/ρ)P_t
    P_{t+1} = P_t + ρ(X_{t+1} + Y_{t+1} − D)
ADMM is the only practical choice here: smoothing techniques and interior-point methods can be used, but they are slow.
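The whole robust-PCA procedure fits in a short NumPy loop (a minimal sketch of mine; the defaults for `lam`, `rho`, `iters` and the synthetic data are assumptions, not from the talk):

```python
import numpy as np

def robust_pca_admm(D, lam=0.1, rho=1.0, iters=200):
    """ADMM for min ||Y||_1 + lam*||X||_*  s.t. X + Y = D."""
    X = np.zeros_like(D)
    Y = np.zeros_like(D)
    P = np.zeros_like(D)
    for _ in range(iters):
        # X-step: nuclear-norm prox (singular value thresholding) at Z_X
        U, s, Vt = np.linalg.svd(D - Y - P / rho, full_matrices=False)
        X = U @ np.diag(np.maximum(s - lam / rho, 0.0)) @ Vt
        # Y-step: l1 prox (soft-thresholding) at Z_Y
        Z = D - X - P / rho
        Y = np.sign(Z) * np.maximum(np.abs(Z) - 1.0 / rho, 0.0)
        # dual ascent on the constraint X + Y = D
        P += rho * (X + Y - D)
    return X, Y

rng = np.random.default_rng(0)
L0 = np.outer(rng.standard_normal(20), rng.standard_normal(20))  # rank-1 part
S0 = np.zeros((20, 20))
S0[rng.integers(0, 20, 10), rng.integers(0, 20, 10)] = 5.0       # sparse spikes
X, Y = robust_pca_admm(L0 + S0, lam=0.5)
print(np.linalg.norm(X + Y - (L0 + S0)))  # constraint residual, near 0
```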
ADMM: Application to Robust PCA
[Figure: convergence curves for the convex problem; the number on each curve shows the value of ρ.]
A ρ that is too large or too small hurts the convergence speed; once ρ > 0, convergence is guaranteed, so I usually set ρ = 1.
ADMM: Other Two Examples
Fused lasso: min_{x∈R^d} (1/2)‖y − Dx‖₂² + λ‖Gx‖₁
    min_{x,z} (1/2)‖y − Dx‖₂² + λ‖z‖₁   s.t. z = Gx
closed-form solution for x
proximal step with the ℓ1-norm for z
Matrix completion: min_X (1/2) ∑_{(i,j)∈Ω} (X_ij − O_ij)² + λ‖X‖_*   s.t. 1 ≤ X_ij ≤ 5
    min_{X,Z} (1/2) ∑_{(i,j)∈Ω} (X_ij − O_ij)² + λ‖X‖_*   s.t. 1 ≤ Z_ij ≤ 5, X = Z
simple projection for Z
no closed form for X (we will use linearization later)
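The two new subproblem pieces are easy to check numerically (my own sketch; `G` follows the differencing structure of the fused-lasso reformulation):

```python
import numpy as np

d = 5
# differencing matrix for the fused lasso: (G @ x)[i] = x[i] - x[i+1]
G = np.eye(d - 1, d) - np.eye(d - 1, d, k=1)
x = np.array([1.0, 2.0, 4.0, 4.0, 7.0])
print(G @ x)  # consecutive differences: [-1. -2.  0. -3.]

# the Z-step for box-constrained matrix completion is a projection onto [1, 5]
X = np.array([[0.2, 3.7], [6.1, 4.9]])
print(np.clip(X, 1.0, 5.0))  # entries pulled into the box
```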
ADMM: Improvements
Useful extensions
Multiple blocks of parameters
Increasing penalty parameter
Linearization and acceleration
Nonconvex optimization
Multi-Blocks
An example:
    min_x ∑_{i=1}^m f_i(x)   →   min_{x, y^i} ∑_{i=1}^m f_i(y^i)   s.t. x = y^i for all i.
Each f_i is a convex function or the indicator function of a convex set.
    L(x, y^1, …, y^m, p^1, …, p^m) = ∑_{i=1}^m [ f_i(y^i) + (p^i)ᵀ(x − y^i) + (ρ/2)‖x − y^i‖₂² ]
ADMM procedure
    for each i, get y^i_{t+1} = arg min_{y^i} f_i(y^i) + (p^i_t)ᵀ(x_t − y^i) + (ρ/2)‖x_t − y^i‖₂²
    get x_{t+1} = arg min_x ∑_{i=1}^m (p^i_t)ᵀ(x − y^i_{t+1}) + (ρ/2)‖x − y^i_{t+1}‖₂²
    for each i, update p^i_{t+1} = p^i_t + ρ(x_{t+1} − y^i_{t+1})
Convergence is not guaranteed in general multi-block cases [2], but it works well in practice.
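A runnable consensus sketch (my own toy instance with quadratic f_i(y) = ½(y − a_i)², whose consensus solution is the mean of the a_i):

```python
def consensus_admm(a, rho=1.0, iters=100):
    """Multi-block ADMM for min sum_i 0.5*(y_i - a_i)**2  s.t. x = y_i."""
    m = len(a)
    x = 0.0
    y = [0.0] * m
    p = [0.0] * m
    for _ in range(iters):
        # y-step (per block): closed form of a quadratic in y_i
        y = [(a[i] + p[i] + rho * x) / (1.0 + rho) for i in range(m)]
        # x-step: average of y_i - p_i/rho
        x = sum(y[i] - p[i] / rho for i in range(m)) / m
        # dual ascent (per block) on x - y_i = 0
        for i in range(m):
            p[i] += rho * (x - y[i])
    return x

print(consensus_admm([1.0, 2.0, 6.0]))  # approaches the mean, 3.0
```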
Increasing Penalty Parameter
Robust PCA
    min_{X,Y} ‖Y‖₁ + λ‖X‖_*   s.t. X + Y = D.
ADMM procedure, starting with ρ_0 = 0.001 (a small value)
    X_{t+1} = prox_{(λ/ρ_t)‖·‖_*}(Z_t^X)   where Z_t^X = D − Y_t − (1/ρ_t)P_t
    Y_{t+1} = prox_{(1/ρ_t)‖·‖₁}(Z_t^Y)   where Z_t^Y = D − X_{t+1} − (1/ρ_t)P_t
    P_{t+1} = P_t + ρ_t(X_{t+1} + Y_{t+1} − D)
    ρ_{t+1} = c·ρ_t with c > 1 (i.e., ρ increases exponentially)
This obtains a fast approximate solution: fast convergence to a limit point (not the optimal one) [5]
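The penalty schedule itself is one line per iteration (a sketch; the cap `rho_max` and the value of `c` are common practical choices of mine, not from the slide):

```python
rho, c, rho_max = 1e-3, 1.5, 1e3
schedule = []
for t in range(6):
    schedule.append(rho)
    rho = min(c * rho, rho_max)  # exponential increase, capped at rho_max
print(schedule)  # rho_0 * c**t until the cap kicks in
```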
Linearization and Acceleration
min_X (1/2) ∑_{(i,j)∈Ω} (X_ij − O_ij)² + λ‖X‖_*   s.t. 1 ≤ Z_ij ≤ 5, X = Z
On the minimization over X:
    X_{t+1} = arg min_X f(X) + ⟨P_t, X − Z_t⟩ + (ρ/2)‖X − Z_t‖_F² + λ‖X‖_*,
    where f(X) = (1/2) ∑_{(i,j)∈Ω} (X_ij − O_ij)².
There is no closed form; the subproblem must be solved with another algorithm such as APG.
Linearization and Acceleration
Second-order approximation of f(X) at X_t [8]
    arg min_X f(X_t) + ⟨X − X_t, ∇f(X_t)⟩ + (L/2)‖X − X_t‖_F² + ⟨P_t, X − Z_t⟩ + (ρ/2)‖X − Z_t‖_F² + λ‖X‖_*
    = arg min_X (1/2)‖X − (ρZ_t − P_t + LX_t − ∇f(X_t))/(ρ + L)‖_F² + (λ/(ρ + L))‖X‖_*
A proximal step with the nuclear norm: closed-form solution
Acceleration may also be applied (similar to APG) [3, 7]
If linearization is not used, there is no need for acceleration
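The linearized X-step above reduces to one singular value thresholding of a weighted combination (a sketch of mine; `grad_f` supplies ∇f(X_t) and is an assumed callable):

```python
import numpy as np

def linearized_x_update(X, Z, P, grad_f, lam, rho, L):
    """Linearized ADMM X-step: minimize the quadratic model of f at X plus
    the augmented terms; this reduces to one nuclear-norm proximal step."""
    W = (rho * Z - P + L * X - grad_f(X)) / (rho + L)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam / (rho + L), 0.0)) @ Vt

# sanity check with zero gradient and Z = X: reduces to plain singular
# value thresholding of X at level lam/(rho + L) = 1
X0 = np.array([[3.0, 0.0], [0.0, 0.0]])
out = linearized_x_update(X0, X0, np.zeros_like(X0), np.zeros_like,
                          lam=2.0, rho=1.0, L=1.0)
print(out)
```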
Nonconvex Optimization
Nonconvex robust PCA [6]
    min_{X,Y} ∑_{i=1}^m ∑_{j=1}^n κ(Y_ij) + λ ∑_{i=1}^m κ(σ_i(X))   s.t. X + Y = D,
    where κ(x) = log(1 + |x|/θ).
No convergence guarantee in general, but it works well in practice:
use the standard ADMM procedure (no linearization or acceleration)
make ρ increase exponentially
ADMM: Application to Robust PCA
[Figure: convergence of ADMM on the nonconvex problem.]
ADMM on the nonconvex problem: no convergence guarantee in general.
[1] Cai, J., Candès, E. J., and Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization 20 (2010), 1956–1982.
[2] Chen, C., He, B., Ye, Y., and Yuan, X. The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Mathematical Programming 155, 1-2 (2016), 57–79.
[3] Goldstein, T., O'Donoghue, B., and Setzer, S. Fast alternating direction optimization methods. SIAM Journal on Imaging Sciences 7, 3 (2014).
[4] He, B., and Yuan, X. On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method. SIAM Journal on Numerical Analysis 50, 2 (2012), 700–709.
[5] Lin, Z., Chen, M., and Ma, Y. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv preprint arXiv:1009.5055 (2010).
[6] Oh, T.-H., Tai, Y.-W., Bazin, J.-C., Kim, H., and Kweon, I. S. Partial sum minimization of singular values in robust PCA: Algorithm and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 4 (2016), 744–758.
[7] Ouyang, Y., Chen, Y., Lan, G., and Pasiliao Jr, E. An accelerated linearized alternating direction method of multipliers. SIAM Journal on Imaging Sciences 8, 1 (2015), 644–681.
[8] Yang, J., and Yuan, X. Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization. Mathematics of Computation 82, 281 (2013), 301–329.