Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints
Lijun Xu, Optimization Group Meeting
November 27, 2012
By I. Necoara, Y. Nesterov, and F. Glineur
Outline
• Introduction
• Randomized Block (i,j) Coordinate Descent Method
• RCD Method in the Strongly Convex Case
• Random Pairs Sampling
• Extensions
• Numerical Experiment
Introduction

• Coordinate Descent Method: consider minimizing a smooth convex function by updating one (block of) coordinate(s) per iteration. Q: How to choose the coordinate?
a) cyclic (difficult to prove convergence)
b) maximal descent (the convergence rate guarantee is trivial, worse than the simple Gradient Method in general)
c) random (faster, simpler, robust, well suited to distributed and parallel computation, etc.)
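To make option c) concrete, here is a minimal sketch of the random rule on an unconstrained separable quadratic; the toy function, data, and 1/L_i step size are illustrative assumptions, not the paper's constrained method:

```python
import random

def coord_descent_random(grad_i, L, x, iters=2000, seed=0):
    """Randomized coordinate descent: at each iteration pick one
    coordinate uniformly at random and take a 1/L_i gradient step."""
    rng = random.Random(seed)
    for _ in range(iters):
        i = rng.randrange(len(x))        # c) random choice of coordinate
        x[i] -= grad_i(x, i) / L[i]      # coordinate gradient step
    return x

# Toy data: f(x) = 0.5 * sum(a_i * (x_i - c_i)^2), minimized at x = c.
a = [1.0, 4.0, 2.0]
c = [3.0, -1.0, 0.5]
x = coord_descent_random(lambda x, i: a[i] * (x[i] - c[i]), L=a, x=[0.0, 0.0, 0.0])
```

For this separable quadratic each coordinate step is exact, so the iterate reaches the minimizer once every coordinate has been drawn at least once.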
Introduction
• Randomized (block) coordinate descent methods
a) The first analysis of this method, when applied to the problem of minimizing a smooth convex function, was performed by Nesterov (2010)[1].
b) The extension to composite functions was given by Richtárik and Takáč (2011) [2].
[1] Y. Nesterov, "Efficiency of coordinate descent methods on huge-scale optimization problems," CORE Discussion Paper, 2010.
[2] P. Richtárik and M. Takáč, "Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function," submitted to Mathematical Programming, 2011.
Problem formulation
• Minimize a separable convex objective function with linearly coupled constraints.
• Extension to problems with a non-separable objective function and general linear constraints.
Motivation of Formulation
Applications:
• resource allocation in economic systems,
• distributed computer systems,
• traffic equilibrium problems,
• network flow, etc.

The dual problem corresponding to minimizing a sum of convex functions.

Finding a point in the intersection of convex sets.
Notations
• With U = [I_n ⋯ I_n] and f(x) = Σ_{i=1}^N f_i(x_i), problem (2.1) becomes

$$\min_{x \in \mathbb{R}^{Nn}} f(x) \quad \text{s.t. } Ux = 0.$$

• KKT conditions:

$$Ux^* = 0, \qquad \nabla f(x^*) = U^T \lambda^* \;\Leftrightarrow\; \big(\nabla f_1(x_1^*)^T, \ldots, \nabla f_N(x_N^*)^T\big)^T = \big(\lambda^{*T}, \ldots, \lambda^{*T}\big)^T \;\Leftrightarrow\; \nabla f_i(x_i^*) = \nabla f_j(x_j^*) \quad \forall\, i \neq j \in \{1, \ldots, N\}.$$
Notations

• Consider the subspace and its orthogonal complement.
• Define the extended norm induced by G (for the gradients), and the corresponding Cauchy–Schwarz inequality.
Notations

• Partition of the identity matrix: $I_{Nn} = [U_1 \cdots U_N]$, where

$$U_i = \big(0_{n\times n}, \ldots, I_{n\times n}, \ldots, 0_{n\times n}\big)^T \in \mathbb{R}^{Nn \times n}$$

has the identity in its i-th n×n block. Then

$$x = \sum_{i=1}^N U_i x_i, \quad x_i \in \mathbb{R}^n, \qquad \nabla_i f(x) = U_i^T \nabla f(x) \in \mathbb{R}^n.$$

• An update of the blocks by directions d_i reads

$$x^+ = x + d = x + \sum_{i=1}^N U_i d_i, \qquad f(x^+) = f\Big(x + \sum_{i=1}^N U_i d_i\Big), \quad x_i, d_i \in \mathbb{R}^n.$$
Basic Assumption
• All f_i are convex.
• The gradients ∇f_i are Lipschitz continuous, with Lipschitz constants L_i > 0.
• The graph (V, E) is undirected and connected, with N nodes V = {1, …, N}; pairs (i, j) ∈ E are used as the chosen coordinates.
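The Lipschitz condition elided from the slide is presumably the standard block-wise form (my reconstruction):

```latex
\|\nabla f_i(x_i + d_i) - \nabla f_i(x_i)\| \le L_i \,\|d_i\|,
\qquad \forall\, x_i, d_i \in \mathbb{R}^n .
```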
Randomized Block (i,j) Coordinate Descent Method

• Recall

$$\min_{x \in \mathbb{R}^{Nn}} f_1(x_1) + \cdots + f_N(x_N) \quad \text{s.t. } x_1 + \cdots + x_N = 0.$$

• Choose randomly a pair (i, j) ∈ E with probability p_ij (= p_ji) > 0.
• Define the update.
Randomized Block (i,j) Coordinate Descent Method

• Consider feasibility of the update, i.e. we require d_i + d_j = 0.
• Minimize the right-hand side while adding the feasibility constraint.
• This gives the following decrease in f.
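The feasible pair update admits a closed-form step under the quadratic upper bound: minimizing f(x) + ⟨∇_i f − ∇_j f, d_i⟩ + ((L_i + L_j)/2)‖d_i‖² over d_i = −d_j gives d_i = −(∇_i f − ∇_j f)/(L_i + L_j). A sketch on a toy problem (the scalar blocks, data, and uniform pair choice are illustrative assumptions):

```python
import random

def rcd_pair_step(x, grad, L, i, j):
    """One RCD(i,j) step: d_i = -(grad_i - grad_j)/(L_i + L_j), d_j = -d_i,
    the minimizer of the quadratic upper bound subject to d_i + d_j = 0."""
    g = grad(x, i) - grad(x, j)
    d = -g / (L[i] + L[j])
    x[i] += d          # the two blocks move in opposite directions,
    x[j] -= d          # so feasibility sum(x) = 0 is preserved
    return x

# Toy separable problem with scalar blocks: f_i(x_i) = 0.5*a_i*(x_i - c_i)^2.
a = [1.0, 2.0, 4.0]
c = [3.0, -1.0, 2.0]
grad = lambda x, i: a[i] * (x[i] - c[i])
x = [0.0, 0.0, 0.0]                      # feasible starting point
rng = random.Random(1)
for _ in range(5000):
    i, j = rng.sample(range(3), 2)       # uniform random pair (complete graph)
    rcd_pair_step(x, grad, a, i, j)
```

On this instance the iterates stay feasible by construction and the block gradients equalize in the limit, matching the KKT condition ∇f_i = ∇f_j.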
Randomized Block (i,j) Coordinate Descent Method

• Each iteration computes only the two block gradients ∇_i f and ∇_j f; full gradient methods compute the whole ∇f.
• The iterate depends on the random pairs chosen so far; define the expected value of the objective accordingly.
Randomized Block (i,j) Coordinate Descent Method

• Key inequality:
• where

$$G_{ij} = (e_i - e_j)(e_i - e_j)^T \otimes I_n \in \mathbb{R}^{Nn \times Nn},$$

with e_i the i-th standard basis vector of R^N.
Randomized Block (i,j) Coordinate Descent Method

• Introduce the distance, which measures the size of the level set of f given by x^0.
• Convergence results:
Randomized Block (i,j) Coordinate Descent Method

• Proof: combine convexity and the key inequality to obtain the bound, then take the expectation over the random pair (i, j), denoting the expected objective value at iteration k.
Design of the probability

• Uniform probabilities.
• Probabilities dependent on the Lipschitz constants.
• Optimal design of the probability, since the convergence rate depends explicitly on it.
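As a sketch of the two non-optimized options, the graph, the constants, and the particular Lipschitz weighting p_ij ∝ L_i + L_j below are illustrative assumptions, not the paper's exact formulas:

```python
# Hypothetical graph on N = 4 blocks; L holds the block Lipschitz constants L_i.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
L = [1.0, 4.0, 2.0, 8.0]

# Uniform probabilities: p_ij = 1 / |E|.
p_uniform = {e: 1.0 / len(edges) for e in edges}

# Lipschitz-dependent probabilities (one natural choice, assumed here):
# p_ij proportional to L_i + L_j, so "harder" pairs are sampled more often.
w = {(i, j): L[i] + L[j] for (i, j) in edges}
p_lips = {e: w[e] / sum(w.values()) for e in edges}
```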
Design of the probability

• Recall the convergence rate.
• Idea: search for the probabilities that optimize the rate, i.e. a constant is assumed such that the bound holds for all pairs.
Design of the probability

• Using a relaxation from semidefinite programming:

where R = (R_1^2, …, R_N^2)^T and the remaining quantities are multipliers in the Lagrangian relaxation.
Design of the probability

• Note:
• Convergence rate under the designed probability:
Comparison with full gradient method

• Consider a particular case: a) a complete graph; b) uniform probability; with all Lipschitz constants equal to L,

$$\begin{pmatrix} (N-1)L^{-1} & -L^{-1} & \cdots & -L^{-1} \\ -L^{-1} & (N-1)L^{-1} & \cdots & -L^{-1} \\ \vdots & & \ddots & \vdots \\ -L^{-1} & \cdots & -L^{-1} & (N-1)L^{-1} \end{pmatrix} \in \mathbb{R}^{N \times N}.$$

• Upper bound (BCD method):
Comparison with full gradient method

• Full gradient method: similarly, one obtains the rate (full), to be compared against the rate (random).
Strongly Convex Case

• f is strongly convex w.r.t. the chosen norm, with a given convexity parameter.
• Combine this with the key inequality, minimizing the strong convexity lower bound over x.
Strongly Convex Case

• Similarly, choose the optimal probability by solving the following SDP:
Rate of convergence in probability
• The proof uses a similar reasoning to Theorem 1 in [14] and is derived from the Markov inequality.
[14] P. Richtárik and M. Takáč, "Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function," submitted to Mathematical Programming, 2011.
Rate of convergence in probability
Random pairs sampling

• The RCD(i,j) method needs to choose a pair of coordinates (i, j) at each iteration, so we need a fast procedure to generate random pairs.
• Given the probability distribution, redefine it into an index vector over the n_p = |E| pairs, then divide [0, 1] into n_p subintervals.

Remark:
Random pairs sampling

• Clearly, the width of interval l equals the probability p_{i_l j_l}.
• Sampling algorithm description:
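The subinterval construction above can be sketched as follows; the function names and the binary-search lookup are my assumptions, since the slide's exact algorithm is in an elided figure:

```python
import bisect
import random

def make_pair_sampler(pairs, probs, seed=0):
    """Split [0, 1] into len(pairs) subintervals whose widths equal the
    given probabilities, then sample by binary search on a uniform draw."""
    cum, s = [], 0.0
    for p in probs:
        s += p
        cum.append(s)                     # right endpoints of the intervals
    rng = random.Random(seed)
    def sample():
        u = rng.random()                  # uniform point of [0, 1)
        l = bisect.bisect_right(cum, u)   # index of the interval containing u
        return pairs[min(l, len(pairs) - 1)]
    return sample

pairs = [(1, 2), (1, 3), (2, 3)]
sample = make_pair_sampler(pairs, [0.5, 0.3, 0.2], seed=42)
draws = [sample() for _ in range(10000)]
```

Each pair's empirical frequency matches its interval width, and each draw costs O(log n_p).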
Generalizations

• Extension of RCD(i,j) to more than one pair, giving the method (M)RCD.
• The same rate of convergence as in the previous sections is obtained for (M)RCD.
Generalizations

• Extension of RCD(i,j) to non-separable objective functions with general equality constraints.
• f has a component-wise Lipschitz continuous gradient:
Generalizations

• Assuming the pair directions (s_i, s_j) solve the arg min of the local model subject to A_i s_i + A_j s_j = 0.
Generalizations

• Similar convergence rate:
• Similar choice of the probability:
Google Problem
Goal:
Thank you!