Majorization-Minimization Algorithm
Ying Sun and Prof. Daniel P. Palomar
ELEC5470/IEDA6100A - Convex Optimization
The Hong Kong University of Science and Technology (HKUST)
Fall 2019-20
Acknowledgment
Slides of this lecture are mainly based on the following works:

[Hun-Lan'J04] D. R. Hunter and K. Lange, "A Tutorial on MM Algorithms," Amer. Statistician, pp. 30-37, 2004.

[Raz-Hon-Luo'J13] M. Razaviyayn, M. Hong, and Z. Luo, "A Unified Convergence Analysis of Block Successive Minimization Methods for Nonsmooth Optimization," SIAM J. Optim., pp. 1126-1153, 2013.

[Scu-Fac-Son-Pal-Pan'J14] G. Scutari, F. Facchinei, P. Song, D. P. Palomar, and J.-S. Pang, "Decomposition by Partial Linearization: Parallel Optimization of Multi-Agent Systems," IEEE Trans. Signal Process., vol. 62, no. 3, pp. 641-656, Feb. 2014.

[Sun-Bab-Pal'J17] Y. Sun, P. Babu, and D. P. Palomar, "Majorization-Minimization Algorithms in Signal Processing, Communications, and Machine Learning," IEEE Trans. Signal Process., vol. 65, no. 3, pp. 794-816, Feb. 2017.
Outline
1. The Majorization-Minimization Algorithm: Introduction; Construction Techniques; Example Algorithms; Applications
2. Block Successive Majorization-Minimization: Introduction; Block Coordinate Descent; Block Successive Majorization-Minimization; Example Algorithms; Applications
3. Distributed Algorithm for Nonlinear Programming: Exact Jacobi Successive Convex Approximation; Extensions
Problem Statement
Consider the following optimization problem

    minimize_x   f(x)
    subject to   x ∈ X,

with X being a closed convex set and f(x) being continuous. f(x) is too complicated to manipulate.

Idea: successively minimize an approximating function u(x, x^k):

    x^{k+1} = arg min_{x ∈ X} u(x, x^k),

hoping the sequence of minimizers {x^k} will converge to an optimal point x*.

Question: how to construct u(x, x^k)?
Terminology
Distance from a point to a set:

    d(x, S) = inf_{s ∈ S} ‖x − s‖.

Directional derivative:

    f′(x; d) ≜ liminf_{λ↓0} [ f(x + λd) − f(x) ] / λ.

Stationary point: x is a stationary point if

    f′(x; d) ≥ 0, ∀d such that x + d ∈ X.
Majorization-Minimization
Construction rule:

    u(y, y) = f(y), ∀y ∈ X                                   (A1)
    u(x, y) ≥ f(x), ∀x, y ∈ X                                (A2)
    u′(x, y; d)|_{x=y} = f′(y; d), ∀d with y + d ∈ X          (A3)
    u(x, y) is continuous in x and y                          (A4)

Pictorially: u(x, x^k) lies above f(x) everywhere and touches it at x = x^k.
Algorithm
Majorization-Minimization (Successive Upper-Bound Minimization):
1: Find a feasible point x^0 ∈ X and set k = 0
2: repeat
3:    x^{k+1} = arg min_{x ∈ X} u(x, x^k)   (global minimum)
4:    k ← k + 1
5: until some convergence criterion is met
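To make the iteration concrete, here is a minimal Python sketch of the generic MM loop (not taken from the slides): the caller supplies the surrogate minimizer, and as an illustration we plug in the classical quadratic majorizer |t| ≤ t²/(2|t^k|) + |t^k|/2 of the absolute value, so that an ℓ1-regularized least-squares problem is solved through a sequence of weighted ridge regressions. The function names, data, and the damping constant eps are our own choices.

```python
import numpy as np

def mm_minimize(surrogate_min, x0, tol=1e-8, max_iter=500):
    """Generic MM loop: surrogate_min(xk) must return the global
    minimizer over X of the surrogate u(., xk)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        x_new = surrogate_min(x)
        if np.linalg.norm(x_new - x) <= tol * max(1.0, np.linalg.norm(x)):
            return x_new, k + 1
        x = x_new
    return x, max_iter

# Illustration: minimize ||Ax - b||^2 + lam * ||x||_1 by majorizing
# |x_i| <= x_i^2 / (2|x_i^k|) + |x_i^k| / 2 (equality at x_i = x_i^k),
# so each MM step is a weighted ridge regression solved in closed form.
rng = np.random.default_rng(0)
A, b, lam, eps = rng.standard_normal((30, 10)), rng.standard_normal(30), 0.5, 1e-8

def ridge_step(xk):
    w = 1.0 / (np.abs(xk) + eps)              # weights from the current iterate
    H = 2.0 * A.T @ A + lam * np.diag(w)      # Hessian of the quadratic surrogate
    return np.linalg.solve(H, 2.0 * A.T @ b)  # its global minimizer

x_hat, iters = mm_minimize(ridge_step, x0=np.ones(10))
```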
Convergence
Under assumptions A1-A4, every limit point of the sequence {x^k} is a stationary point of the original problem.

If we further assume that the level set X^0 = {x | f(x) ≤ f(x^0)} is compact, then

    lim_{k→∞} d(x^k, X*) = 0,

where X* is the set of stationary points.
Construct Surrogate Function
The performance of the Majorization-Minimization algorithm depends crucially on the surrogate function u(x, x^k).

Guideline: the global minimizer of u(x, x^k) should be easy to find.

Suppose f(x) = f1(x) + κ(x), where f1(x) is some "nice" function and κ(x) is the one that needs to be approximated.
Construction by Convexity
Suppose κ(t) is convex; then

    κ( Σ_i α_i t_i ) ≤ Σ_i α_i κ(t_i)

with α_i ≥ 0 and Σ_i α_i = 1.
For example:

    κ(w^T x) = κ( w^T (x − x^k) + w^T x^k )
             = κ( Σ_i α_i [ w_i (x_i − x_i^k) / α_i + w^T x^k ] )
             ≤ Σ_i α_i κ( w_i (x_i − x_i^k) / α_i + w^T x^k )

If we further assume that w and x are positive (α_i = w_i x_i^k / w^T x^k):

    κ(w^T x) ≤ Σ_i ( w_i x_i^k / w^T x^k ) κ( (w^T x^k / x_i^k) x_i )

The surrogate function is separable (parallel algorithm).
Construction by Taylor Expansion
Suppose κ(x) is concave and differentiable; then

    κ(x) ≤ κ(x^k) + ∇κ(x^k)^T (x − x^k),

which is a linear upper bound.

Suppose κ(x) is convex and twice differentiable; then

    κ(x) ≤ κ(x^k) + ∇κ(x^k)^T (x − x^k) + (1/2) (x − x^k)^T M (x − x^k)

if M − ∇²κ(x) ⪰ 0, ∀x.
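As a quick worked example of the second bound (our own addition, for the unconstrained special case f(x) = κ(x) with M = L·I and L an upper bound on the eigenvalues of ∇²κ(x)), minimizing the quadratic surrogate in closed form shows that MM then reduces to gradient descent with step size 1/L:

```latex
x^{k+1} = \arg\min_{x}\Big\{\kappa(x^k) + \nabla\kappa(x^k)^{T}(x - x^k)
          + \tfrac{L}{2}\,\|x - x^k\|^2\Big\}
        = x^k - \tfrac{1}{L}\,\nabla\kappa(x^k).
```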
Construction by Inequalities
Arithmetic-Geometric Mean Inequality:

    ( ∏_{i=1}^n x_i )^{1/n} ≤ (1/n) Σ_{i=1}^n x_i

Cauchy-Schwarz Inequality:

    ‖x‖ ≥ x^T x^k / ‖x^k‖

Jensen's Inequality:

    κ( E[x] ) ≤ E[ κ(x) ]

with κ(·) being convex.
EM Algorithm
Assume the complete data set {x, z} consists of an observed variable x and a latent variable z.

Objective: estimate the parameter θ ∈ Θ from x.

Maximum likelihood estimator: θ̂ = arg min_{θ∈Θ} − log p(x|θ).

EM (Expectation-Maximization) algorithm:
- E-step: evaluate p(z|x, θ^k) ("guess" z from the current estimate of θ)
- M-step: update θ as θ^{k+1} = arg min_{θ∈Θ} u(θ, θ^k), where

    u(θ, θ^k) = − E_{z|x,θ^k} log p(x, z|θ)

  (update θ from the "guessed" complete data set)
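For concreteness, here is a minimal sketch of EM for a two-component univariate Gaussian mixture (a standard textbook instance of the E- and M-steps above, not taken from these slides; initialization and data are our own choices).

```python
import numpy as np

def em_gmm2(x, n_iter=100):
    """EM for a 2-component 1-D Gaussian mixture; returns (pi, mu, sigma2)."""
    rng = np.random.default_rng(0)
    pi, mu, s2 = 0.5, rng.choice(x, 2), np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: responsibilities p(z = 1 | x_i, theta^k)
        n0 = np.exp(-0.5 * (x - mu[0])**2 / s2[0]) / np.sqrt(s2[0])
        n1 = np.exp(-0.5 * (x - mu[1])**2 / s2[1]) / np.sqrt(s2[1])
        r = pi * n1 / ((1 - pi) * n0 + pi * n1)
        # M-step: minimize -E_{z|x,theta^k} log p(x, z | theta) in closed form
        pi = r.mean()
        mu = np.array([np.sum((1 - r) * x) / np.sum(1 - r),
                       np.sum(r * x) / np.sum(r)])
        s2 = np.array([np.sum((1 - r) * (x - mu[0])**2) / np.sum(1 - r),
                       np.sum(r * (x - mu[1])**2) / np.sum(r)])
    return pi, mu, s2

# toy data drawn from two Gaussians
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)])
print(em_gmm2(data))
```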
An MM Interpretation of EM
The objective function can be written as

    − log p(x|θ) = − log E_{z|θ} p(x|z, θ)
                 = − log E_{z|θ} [ p(z|x, θ^k) p(x|z, θ) / p(z|x, θ^k) ]
                 = − log E_{z|x,θ^k} [ p(x|z, θ) p(z|θ) / p(z|x, θ^k) ]
                 ≤ − E_{z|x,θ^k} log [ p(x|z, θ) p(z|θ) / p(z|x, θ^k) ]   (Jensen's inequality)
                 = − E_{z|x,θ^k} log p(x, z|θ) + E_{z|x,θ^k} log p(z|x, θ^k),

where the first term is the surrogate u(θ, θ^k) and the second term does not depend on θ.
Proximal Minimization
f(x) is convex. Solve min_x f(x) by solving the equivalent problem

    minimize_{x∈X, y∈X}   f(x) + (1/2c) ‖x − y‖².

The objective function is strongly convex in both x and y.

Algorithm:

    x^{k+1} = arg min_{x∈X} { f(x) + (1/2c) ‖x − y^k‖² }
    y^{k+1} = x^{k+1}.

An MM interpretation:

    x^{k+1} = arg min_{x∈X} { f(x) + (1/2c) ‖x − x^k‖² }
DC Programming
Consider the unconstrained problem

    minimize_{x∈R^n}   f(x),

where f(x) = g(x) + h(x) with g(x) convex and h(x) concave.

DC (Difference of Convex) Programming generates {x^k} by solving

    ∇g(x^{k+1}) = −∇h(x^k).

An MM interpretation:

    x^{k+1} = arg min_x { g(x) + ∇h(x^k)^T (x − x^k) }.
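A minimal sketch of this iteration on a toy problem of our own choosing: g(x) = ½‖x − a‖² (convex) and h(x) = −½λ‖x‖² (concave, with λ < 1 so that f stays bounded below). Each step linearizes h at x^k and minimizes the resulting convex surrogate in closed form.

```python
import numpy as np

# DC programming / convex-concave procedure on a toy objective:
#   f(x) = 0.5*||x - a||^2  -  0.5*lam*||x||^2   (g convex, h concave, lam < 1)
a, lam = np.array([1.0, -2.0, 3.0]), 0.4
x = np.zeros_like(a)
for k in range(100):
    grad_h = -lam * x                      # gradient of the concave part at x^k
    # minimize g(x) + grad_h^T (x - x^k)  ->  closed form: x = a - grad_h
    x_new = a - grad_h
    if np.linalg.norm(x_new - x) < 1e-10:
        break
    x = x_new
print(x, a / (1.0 - lam))                  # both equal the stationary point a/(1-lam)
```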
Power Control by GP
[Chi-Tan-Pal-O'Ne-Jul'J07] Problem: maximize system throughput. Essentially we need to solve the following problem:

    minimize_{P∈P}   ( Σ_{j≠i} G_ij P_j + n_i ) / ( Σ_j G_ij P_j + n_i )

The objective function is the ratio of two posynomials.

Minorize a posynomial, denoted by g(x) = Σ_i m_i(x), by a monomial:

    g(x) ≥ ∏_i ( m_i(x) / α_i )^{α_i}

where α_i = m_i(x^k) / g(x^k). (Arithmetic-Geometric Mean Inequality)

Solution: approximate the denominator posynomial Σ_j G_ij P_j + n_i by a monomial.
Reweighted ℓ1-norm

Sparse signal recovery problem:

    minimize_x   ‖x‖_0
    subject to   Ax = b

ℓ1-norm approximation:

    minimize_x   ‖x‖_1
    subject to   Ax = b

General form:

    minimize_x   Σ_{i=1}^n φ(|x_i|)
    subject to   Ax = b
[Can-Wak-Boy'J08] Assume φ(t) is concave and nondecreasing. At x_i^k, φ(|x_i|) is majorized by w_i^k |x_i| (plus a constant) with w_i^k = φ′(t)|_{t=|x_i^k|}.

At each iteration a weighted ℓ1-norm problem is solved:

    minimize_x   Σ_i w_i^k |x_i|
    subject to   Ax = b
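A minimal sketch of the resulting reweighting scheme (our own illustration): the weighted subproblem is cast as a linear program and solved with scipy, and the weights w_i^k = 1/(ε + |x_i^k|) correspond to the penalty φ(t) = log(ε + t) used in [Can-Wak-Boy'J08]. Function names, data, and the iteration count are our choices.

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1(A, b, w):
    """Solve  min_x sum_i w_i |x_i|  s.t.  Ax = b  as an LP in (x, t) with |x_i| <= t_i."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), w])                          # objective: sum_i w_i t_i
    A_eq = np.hstack([A, np.zeros((m, n))])                       # Ax = b
    I = np.eye(n)
    A_ub = np.vstack([np.hstack([I, -I]), np.hstack([-I, -I])])   # x - t <= 0, -x - t <= 0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * n), A_eq=A_eq, b_eq=b,
                  bounds=[(None, None)] * n + [(0, None)] * n, method="highs")
    return res.x[:n]

def reweighted_l1(A, b, eps=1e-3, n_iter=10):
    x = weighted_l1(A, b, np.ones(A.shape[1]))          # first pass = plain l1 minimization
    for _ in range(n_iter):
        x = weighted_l1(A, b, 1.0 / (eps + np.abs(x)))  # w_i^k = phi'(|x_i^k|)
    return x

# toy sparse recovery instance
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 60))
x_true = np.zeros(60); x_true[[3, 17, 41]] = [1.0, -2.0, 1.5]
x_rec = reweighted_l1(A, A @ x_true)
```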
Sparse Generalized Eigenvalue Problem
ℓ0-norm regularized generalized eigenvalue problem:

    maximize_x   x^T A x − ρ ‖x‖_0
    subject to   x^T B x = 1.

Replace each ‖x_i‖_0 by some nicely behaved function g_p(x_i):

    g_p(x_i) = |x_i|^p,                              0 < p ≤ 1
    g_p(x_i) = log(1 + |x_i|/p) / log(1 + 1/p),      p > 0
    g_p(x_i) = 1 − e^{−|x_i|/p},                     p > 0.

Take g_p(x_i) = |x_i|^p for example.
[Son-Bab-Pal'J15a] Majorize g_p(x_i) at x_i^k by the quadratic function w_i^k x_i² + c_i^k.

The surrogate function for g_p(x_i) = |x_i|^p is

    u(x_i, x_i^k) = (p/2) |x_i^k|^{p−2} x_i² + (1 − p/2) |x_i^k|^p.

Solve at each iteration the following GEVP:

    maximize_x   x^T A x − ρ x^T diag(w^k) x
    subject to   x^T B x = 1

However, as |x_i| → 0, w_i → +∞...
Smooth approximation of g_p(x):

    g_p^ε(x) = (p/2) ε^{p−2} x²,            |x| ≤ ε
    g_p^ε(x) = |x|^p − (1 − p/2) ε^p,       |x| > ε

When |x| ≤ ε, w remains a constant.
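A minimal sketch of the resulting iteratively reweighted GEVP step (our own illustration of the idea, not the full IRQM algorithm of [Son-Bab-Pal'J15a]): the weights come from the smoothed surrogate above, and each subproblem is solved by the leading generalized eigenvector. A is assumed symmetric and B positive definite; data and iteration count are our choices.

```python
import numpy as np
from scipy.linalg import eigh

def sparse_gevp(A, B, rho, p=0.5, eps=1e-3, n_iter=50):
    """Iteratively reweighted GEVP for  max x'Ax - rho*sum_i g_p(x_i)  s.t.  x'Bx = 1."""
    x = eigh(A, B)[1][:, -1]                 # start from the ordinary leading generalized EV
    for _ in range(n_iter):
        ax = np.maximum(np.abs(x), eps)      # smoothed surrogate: weight saturates below eps
        w = 0.5 * p * ax ** (p - 2.0)
        # surrogate subproblem: max x'(A - rho*diag(w))x  s.t.  x'Bx = 1
        evals, evecs = eigh(A - rho * np.diag(w), B)
        x = evecs[:, -1]                     # eigenvectors are B-orthonormal, so x'Bx = 1
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((20, 20))
x_sparse = sparse_gevp((M + M.T) / 2, np.eye(20), rho=1.0)
```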
(Figure: objective value versus iterations, comparing IRQM (proposed) with DC-SGEP.)
Sequence Design
Complex unimodular sequence {x_n ∈ C}_{n=1}^N.

Autocorrelation: r_k = Σ_{n=k+1}^N x_n x*_{n−k} = r*_{−k}, k = 0, . . . , N−1.

Integrated sidelobe level (ISL):

    ISL = Σ_{k=1}^{N−1} |r_k|².

Problem formulation:

    minimize_{ {x_n}_{n=1}^N }   ISL
    subject to   |x_n| = 1, n = 1, . . . , N.
By Fourier transform:

    ISL ∝ Σ_{p=1}^{2N} [ |a_p^H x|² − N ]²

with x = [x_1, . . . , x_N]^T, a_p = [1, e^{jω_p}, . . . , e^{jω_p(N−1)}]^T and ω_p = (2π/2N)(p − 1).

Equivalent problem:

    minimize_x   Σ_{p=1}^{2N} ( a_p^H x x^H a_p )²
    subject to   |x_n| = 1, ∀n.
[Son-Bab-Pal'J15b] Define A = [a_1, . . . , a_{2N}],

    p^k = [ |a_1^H x^k|², . . . , |a_{2N}^H x^k|² ]^T,    Ā = A ( diag(p^k) − p^k_max I ) A^H.

Quadratic surrogate function:

    p^k_max x^H A A^H x + 2 Re( x^H ( Ā − 2N² x^k (x^k)^H ) x^k ),

where the first term is constant on the constraint set (A A^H = 2N·I and ‖x‖² = N). The surrogate minimization is equivalent to

    minimize_x   ‖x − y‖²
    subject to   |x_n| = 1, ∀n

with

    y = −( Ā − 2N² x^k (x^k)^H ) x^k

Closed-form solution: x_n = e^{j arg(y_n)}.
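A literal transcription of this update in Python, offered only as a sketch that follows the formulas on this slide; the sequence length, initialization, and iteration count are our own choices.

```python
import numpy as np

def isl_mm_step(x, A):
    """One MM update for the ISL minimization problem, following the slide."""
    N = x.size
    p = np.abs(A.conj().T @ x) ** 2                         # p^k, length 2N
    A_bar = A @ np.diag(p - p.max()) @ A.conj().T            # A (diag(p^k) - p_max I) A^H
    y = -(A_bar - 2 * N**2 * np.outer(x, x.conj())) @ x      # y = -(A_bar - 2N^2 x x^H) x
    return np.exp(1j * np.angle(y))                          # closed form: x_n = e^{j arg(y_n)}

N = 32
omega = 2 * np.pi * np.arange(2 * N) / (2 * N)
A = np.exp(1j * np.outer(np.arange(N), omega))               # columns a_p
x = np.exp(1j * 2 * np.pi * np.random.default_rng(0).random(N))   # random unimodular start
for _ in range(200):
    x = isl_mm_step(x, A)
```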
(Figure: integrated sidelobe level (dB) versus iteration, comparing the proposed method with a benchmark.)
Covariance Estimation
x_i ∼ elliptical(0, Σ)

Fitting the normalized samples s_i = x_i / ‖x_i‖_2 to the Angular Central Gaussian distribution:

    f(s_i) ∝ det(Σ)^{−1/2} ( s_i^T Σ^{−1} s_i )^{−K/2}

[Sun-Bab-Pal'J14] Shrinkage penalty:

    h(Σ) = log det(Σ) + Tr( Σ^{−1} T )

Solve the following problem:

    minimize_Σ   log det(Σ) + (K/N) Σ_i log( x_i^T Σ^{−1} x_i ) + α h(Σ)
    subject to   Σ ≻ 0
At Σ^k, the objective function is majorized by

    (1 + α) log det(Σ) + (K/N) Σ_{i=1}^N [ x_i^T Σ^{−1} x_i / ( x_i^T (Σ^k)^{−1} x_i ) ] + α Tr( Σ^{−1} T )

The surrogate function is convex in Σ^{−1}. Setting the gradient to zero leads to the weighted sample average

    Σ^{k+1} = (1/(1+α)) (K/N) Σ_i [ x_i x_i^T / ( x_i^T (Σ^k)^{−1} x_i ) ] + (α/(1+α)) T
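A minimal sketch of this fixed-point iteration (a direct transcription of the update above; initialization, stopping rule, and data are our own choices).

```python
import numpy as np

def regularized_scatter(X, T, alpha, n_iter=100, tol=1e-8):
    """MM iteration for the shrinkage-penalized scatter estimate.
    X is N x K (one sample per row), T is the K x K shrinkage target."""
    N, K = X.shape
    Sigma = np.eye(K)
    for _ in range(n_iter):
        d = np.einsum('ij,jk,ik->i', X, np.linalg.inv(Sigma), X)   # x_i' Sigma^{-1} x_i
        S = (K / N) * (X.T / d) @ X                                # weighted sample average
        Sigma_new = (S + alpha * T) / (1.0 + alpha)
        if np.linalg.norm(Sigma_new - Sigma, 'fro') < tol * np.linalg.norm(Sigma, 'fro'):
            return Sigma_new
        Sigma = Sigma_new
    return Sigma

rng = np.random.default_rng(0)
Sigma_hat = regularized_scatter(rng.standard_normal((200, 5)), T=np.eye(5), alpha=0.1)
```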
(Figure: objective function value versus iteration.)
Problem Statement
Consider the following problem

    minimize_{x∈X}   f(x)

The set X possesses a Cartesian product structure X = ∏_{i=1}^m X_i.

Observation: the problem

    minimize_{x_i∈X_i}   f( x_1^0, . . . , x_{i−1}^0, x_i, x_{i+1}^0, . . . , x_m^0 ),

with x_{−i}^0 taking some feasible value, is easy to solve.
Block Coordinate Descent (BCD)
Denote x ≜ (x_1, . . . , x_m) and f( x_1^0, . . . , x_{i−1}^0, x_i, x_{i+1}^0, . . . , x_m^0 ) ≜ f( x_i, x_{−i}^0 ).

Block Coordinate Descent (nonlinear Gauss-Seidel):
1: Initialize x^0 ∈ X and set k = 0.
2: repeat
3:    k = k + 1, i = (k mod m) + 1
4:    x_i^k = arg min_{x_i∈X_i} f( x_i, x_{−i}^{k−1} )
5:    x_j^k ← x_j^{k−1}, ∀j ≠ i
6: until some convergence criterion is met
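A minimal two-block illustration (our own example, not from the slides): low-rank matrix factorization min_{U,V} ‖Y − U V^T‖_F², where each block update is a least-squares problem with a closed-form solution.

```python
import numpy as np

def bcd_matrix_factorization(Y, r, n_sweeps=50):
    """Two-block coordinate descent for  min_{U,V} ||Y - U V^T||_F^2."""
    rng = np.random.default_rng(0)
    m, n = Y.shape
    U, V = rng.standard_normal((m, r)), rng.standard_normal((n, r))
    for _ in range(n_sweeps):
        # block 1: U-update with V fixed (least squares, closed form)
        U = Y @ V @ np.linalg.inv(V.T @ V)
        # block 2: V-update with U fixed
        V = Y.T @ U @ np.linalg.inv(U.T @ U)
    return U, V

Y = np.random.default_rng(1).standard_normal((50, 40))
U, V = bcd_matrix_factorization(Y, r=5)
```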
Convergence
[Ber'B99] Assume that

- f(x) is continuously differentiable over the set X;
- x_i^k = arg min_{x_i∈X_i} f( x_i, x_{−i}^{k−1} ) has a unique solution.

Then every limit point of the sequence {x^k} is a stationary point.

[Gri-Sci'J00] Generalizations:

- globally convergent for m = 2;
- f is component-wise strictly quasi-convex w.r.t. m − 2 components;
- f is pseudo-convex.
BS-MM Algorithm
Combination of MM and BCD.

Block Successive Majorization-Minimization (BS-MM):
1: Initialize x^0 ∈ X and set k = 0.
2: repeat
3:    k = k + 1, i = (k mod m) + 1
4:    X^k = arg min_{x_i∈X_i} u_i( x_i, x^{k−1} )
5:    Set x_i^k to be an arbitrary element in X^k
6:    x_j^k ← x_j^{k−1}, ∀j ≠ i
7: until some convergence criterion is met

Generalization of BCD.
Convergence
The surrogate functions u_i(·, ·) satisfy the following assumptions:

    u_i(y_i, y) = f(y), ∀y ∈ X, ∀i                                             (B1)
    u_i(x_i, y) ≥ f( y_1, . . . , y_{i−1}, x_i, y_{i+1}, . . . , y_m ),
        ∀x_i ∈ X_i, ∀y ∈ X, ∀i                                                  (B2)
    u_i′(x_i, y; d_i)|_{x_i=y_i} = f′(y; d),
        ∀d = (0, . . . , d_i, . . . , 0) such that y_i + d_i ∈ X_i, ∀i            (B3)
    u_i(x_i, y) is continuous in (x_i, y), ∀i                                   (B4)

In short, u_i(x_i, x^k) majorizes f(x) on the ith block.
Under assumptions B1-B4 (for simplicity additionally assume that f is continuously differentiable), if

- u_i(x_i, y) is quasi-convex in x_i, and
- each subproblem min_{x_i∈X_i} u_i( x_i, x^{k−1} ) has a unique solution for any x^{k−1} ∈ X,

then every limit point of {x^k} is a stationary point.

If

- the level set X^0 = {x | f(x) ≤ f(x^0)} is compact, and
- each subproblem min_{x_i∈X_i} u_i( x_i, x^{k−1} ) has a unique solution for any x^{k−1} ∈ X for at least m − 1 blocks,

then lim_{k→∞} d( x^k, X* ) = 0.

The assumptions are more restrictive than for MM due to the cyclic update behavior.
Alternating Proximal Minimization
Consider the problem

    minimize_x   f( x_1, . . . , x_m )
    subject to   x_i ∈ X_i,

with f(·) being convex in each block.

The convergence of BCD is not easy to establish since each subproblem may have multiple solutions.

Alternating Proximal Minimization solves

    minimize_{x_i}   f( x_1^k, . . . , x_{i−1}^k, x_i, x_{i+1}^k, . . . , x_m^k ) + (1/2c) ‖x_i − x_i^k‖²
    subject to   x_i ∈ X_i

Strictly convex objective → unique minimizer.
Proximal Splitting Algorithm
Consider the following problem

    minimize_x   Σ_{i=1}^m f_i(x_i) + f_{m+1}( x_1, . . . , x_m )
    subject to   x_i ∈ X_i, i = 1, . . . , m

with the f_i convex and lower semicontinuous, f_{m+1} convex, and

    ‖∇f_{m+1}(x) − ∇f_{m+1}(y)‖ ≤ β_i ‖x_i − y_i‖

Cyclically update:

    x_i^{k+1} = prox_{γ f_i} ( x_i^k − γ ∇_{x_i} f_{m+1}(x^k) ),

with the proximity operator defined as

    prox_f(x) = arg min_{y∈X} f(y) + (1/2) ‖x − y‖².
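With a single block, f_1 = λ‖x‖_1 and f_2 = ½‖Ax − b‖², this cyclic update is the well-known ISTA iteration. A minimal sketch (step size, data, and iteration count are our own choices):

```python
import numpy as np

def soft_threshold(v, tau):
    """prox of tau*||.||_1: elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((40, 100)), rng.standard_normal(40), 0.1
gamma = 1.0 / np.linalg.norm(A, 2) ** 2        # gamma <= 1/beta, with beta = ||A||_2^2
x = np.zeros(100)
for _ in range(500):
    grad = A.T @ (A @ x - b)                   # gradient of the smooth part f_{m+1}
    x = soft_threshold(x - gamma * grad, gamma * lam)   # prox_{gamma * f_1}
```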
Proximal Splitting Algorithm
BS-MM interpretation:

    u_i( x_i, x^k ) = f_i(x_i) + (1/2γ) ‖x_i − x_i^k‖² + ∇_{x_i} f_{m+1}(x^k)^T ( x_i − x_i^k )
                      + Σ_{j≠i} f_j( x_j^k ) + f_{m+1}(x^k).

Check:

    f_{m+1}(x^k) + (1/2γ) ‖x_i − x_i^k‖² + ∇_{x_i} f_{m+1}(x^k)^T ( x_i − x_i^k )
    ≥ f_{m+1}(x^k) + (β_i/2) ‖x_i − x_i^k‖² + ∇_{x_i} f_{m+1}(x^k)^T ( x_i − x_i^k )
    ≥ f_{m+1}( x_{−i}^k, x_i )                                   (descent lemma)

with γ ∈ [ε_i, 2/β_i − ε_i] and ε_i ∈ (0, min{1, 1/β_i}).
Robust Estimation of Location and Scatter
x_i ∼ elliptical(μ, R)

[Sun-Bab-Pal'J15] Fitting x_i to a Cauchy distribution with pdf

    f(x) ∝ det(R)^{−1/2} ( 1 + (x − μ)^T R^{−1} (x − μ) )^{−(K+1)/2}

Shrinkage penalty:

    h(t, T) = K log( Tr( R^{−1} T ) ) + log det(R) + log( 1 + (t − μ)^T R^{−1} (t − μ) )

Solve the following problem:

    minimize_{μ, R≻0}   log det(R) + ((K+1)/N) Σ_{i=1}^N log( 1 + (x_i − μ)^T R^{−1} (x_i − μ) ) + α h(t, T)
BS-MM Algorithm update:

    μ_{t+1} = [ (K+1) Σ_{i=1}^N w_i(μ_t, R_t) x_i + Nα w_t(μ_t, R_t) t ]
              / [ (K+1) Σ_{i=1}^N w_i(μ_t, R_t) + Nα w_t(μ_t, R_t) ]

    R_{t+1} = (K+1)/(N+Nα) Σ_{i=1}^N w_i(μ_{t+1}, R_t) (x_i − μ_{t+1})(x_i − μ_{t+1})^T
              + Nα/(N+Nα) w_t(μ_{t+1}, R_t) (μ_{t+1} − t)(μ_{t+1} − t)^T
              + NαK/(N+Nα) · T / Tr( R_t^{−1} T )

where

    w_i(μ, R) = 1 / ( 1 + (x_i − μ)^T R^{−1} (x_i − μ) )
    w_t(μ, R) = 1 / ( 1 + (t − μ)^T R^{−1} (t − μ) ).
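A direct transcription of this block update in Python, offered only as a sketch that follows the formulas above; initialization, data, and the iteration count are our own choices.

```python
import numpy as np

def cauchy_shrinkage_estimate(X, t, T, alpha, n_iter=50):
    """BS-MM updates for the regularized Cauchy location/scatter fit.
    X is N x K (one sample per row); t and T are the shrinkage targets."""
    N, K = X.shape
    mu, R = X.mean(axis=0), np.cov(X, rowvar=False)

    def w(z, m, Rinv):
        return 1.0 / (1.0 + (z - m) @ Rinv @ (z - m))

    for _ in range(n_iter):
        Rinv = np.linalg.inv(R)                 # R_t^{-1}, reused throughout this sweep
        wi = np.array([w(x, mu, Rinv) for x in X])
        wt = w(t, mu, Rinv)
        # mu-update: weighted average of the samples and the target t
        mu = ((K + 1) * (wi[:, None] * X).sum(axis=0) + N * alpha * wt * t) \
             / ((K + 1) * wi.sum() + N * alpha * wt)
        # R-update: weights re-evaluated at the new mu (with the old R), as on the slide
        wi = np.array([w(x, mu, Rinv) for x in X])
        wt = w(t, mu, Rinv)
        D = X - mu
        R = ((K + 1) / (N + N * alpha)) * (D.T * wi) @ D \
            + (N * alpha / (N + N * alpha)) * wt * np.outer(mu - t, mu - t) \
            + (N * alpha * K / (N + N * alpha)) * T / np.trace(Rinv @ T)
    return mu, R

rng = np.random.default_rng(1)
X = rng.standard_t(df=1, size=(300, 4)) + np.array([1.0, 0.0, -1.0, 2.0])
mu_hat, R_hat = cauchy_shrinkage_estimate(X, t=np.zeros(4), T=np.eye(4), alpha=0.2)
```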
(Figure: objective function value versus iterations.)
Problem Statement
Consider the following problem:

    minimize_x   f(x)
    subject to   x_i ∈ X_i

where the X_i's are closed and convex sets, and f(x) = Σ_{l=1}^L f_l( x_1, . . . , x_m ).

Conditional gradient update (Frank-Wolfe):

    x^{k+1} = x^k + γ^k d^k

- direction d^k ≜ x̄^k − x^k, with x̄_i^k = arg min_{x_i∈X_i} ∇_{x_i} f(x^k)^T ( x_i − x_i^k )
- step size γ^k ∈ (0, 1], chosen to guarantee convergence.
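A minimal single-block sketch of the conditional gradient update (our own toy instance, not from the slides): a convex quadratic minimized over the probability simplex, where the linear subproblem is solved by picking the best vertex; the diminishing step size γ^k = 2/(k+2) is a standard choice.

```python
import numpy as np

# Toy problem: minimize f(x) = 0.5*||x - c||^2 over the probability simplex.
rng = np.random.default_rng(0)
n, c = 20, rng.standard_normal(20)
x = np.ones(n) / n                      # feasible starting point
for k in range(200):
    grad = x - c
    # linear subproblem over the simplex: minimized at the vertex e_i with i = argmin grad_i
    x_bar = np.zeros(n)
    x_bar[np.argmin(grad)] = 1.0
    gamma = 2.0 / (k + 2.0)             # diminishing step size
    x = x + gamma * (x_bar - x)         # conditional gradient (Frank-Wolfe) update
```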
Exact Jacobi SCA Algorithm
Idea:

- The conditional gradient update linearizes all the f_l's at x^k.
- Each function f_l might be convex w.r.t. some block x_i.
- We want to preserve the convexity of f_l( x_i, x_{−i}^k ).

Solution: keep the convex f_l( x_i, x_{−i}^k )'s and linearize the others.
Define C_i as the set of indices l such that f_l( x_i, x_{−i}^k ) is convex.

Approximate f(x) on the ith block at the point x^k:

    f̃_i( x_i, x^k ) = Σ_{l∈C_i} f_l( x_i, x_{−i}^k )                          (convex terms)
                      + π_i(x^k)^T ( x_i − x_i^k )                             (linear approx. of the non-convex terms)
                      + (τ_i/2) ( x_i − x_i^k )^T H_i(x^k) ( x_i − x_i^k )      (proximal term)

with

    π_i(x^k) = Σ_{l∉C_i} ∇_{x_i} f_l(x^k)    and    H_i(x^k) ⪰ c_{H_i} I.
Exact Jacobi SCA update:

    x̄_i( x^k, τ_i ) = arg min_{x_i∈X_i} f̃_i( x_i, x^k )
    x^{k+1} = x^k + γ^k ( x̄ − x^k )

Step-size rules:

- constant step size that depends on the Lipschitz constant of ∇f
- diminishing step size

Remark: the update of the blocks can also be done sequentially (Gauss-Seidel SCA Algorithm).
Extensions
FLEXA
- non-smooth objective function
- inexact update direction
- flexible block update choice
HyFLEXA
Comparison
                     BS-MM                       FLEXA
convergence          stationary point            stationary point
objective function   continuous,                 continuous,
                     may not be smooth           may not be smooth
constraint set       Cartesian                   Cartesian & convex
update rule          sequential                  sequential or parallel
approx. function     global upper bound          local approximation
                     unique minimizer            not required
                     can be non-convex           convex approximation
Summary
We have studied:
- the Majorization-Minimization algorithm
- the Block Coordinate Descent algorithm
- the Block Successive Majorization-Minimization algorithm

We have briefly introduced:
- the Distributed Successive Convex Approximation algorithm
References I

D. R. Hunter and K. Lange. A tutorial on MM algorithms. Amer. Statistician, vol. 58, pp. 30-37, 2004.

M. Razaviyayn, M. Hong, and Z. Luo. A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM J. Optim., vol. 23, no. 2, pp. 1126-1153, 2013.

G. Scutari, F. Facchinei, P. Song, D. Palomar, and J.-S. Pang. Decomposition by partial linearization: Parallel optimization of multi-agent systems. IEEE Trans. Signal Process., vol. 62, no. 3, pp. 641-656, 2014.

Y. Sun, P. Babu, and D. Palomar. Majorization-minimization algorithms in signal processing, communications, and machine learning. IEEE Trans. Signal Process., vol. 65, no. 3, pp. 794-816, 2017.

M. Chiang, C. W. Tan, D. Palomar, D. O'Neill, and D. Julian. Power control by geometric programming. IEEE Trans. Wireless Commun., vol. 6, no. 7, pp. 2640-2651, 2007.

References II

E. J. Candes, M. Wakin, and S. Boyd. Enhancing sparsity by reweighted l1 minimization. J. Fourier Anal. Appl., vol. 14, no. 5-6, pp. 877-905, 2008.

J. Song, P. Babu, and D. P. Palomar. Sparse generalized eigenvalue problem via smooth optimization. IEEE Trans. Signal Process., vol. 63, no. 7, pp. 1627-1642, 2015.

J. Song, P. Babu, and D. P. Palomar. Optimization methods for designing sequences with low autocorrelation sidelobes. IEEE Trans. Signal Process., vol. 63, no. 15, pp. 3998-4009, 2015.

Y. Sun, P. Babu, and D. P. Palomar. Regularized Tyler's scatter estimator: Existence, uniqueness, and algorithms. IEEE Trans. Signal Process., vol. 62, no. 19, pp. 5143-5156, 2014.

D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.

L. Grippo and M. Sciandrone. On the convergence of the block nonlinear Gauss-Seidel method under convex constraints. Oper. Res. Lett., vol. 26, no. 3, pp. 127-136, 2000.

References III

Y. Sun, P. Babu, and D. P. Palomar. Regularized robust estimation of mean and covariance matrix under heavy-tailed distributions. IEEE Trans. Signal Process., vol. 63, no. 12, pp. 3096-3109, 2015.
Thanks
For more information visit:
https://www.danielppalomar.com