Online Submodular Optimization Problems
Menglong Li
April 26, 2018
Overview
1 The Reason
2 Online convex optimization (OCO)
3 Online submodular set function minimization
4 Online Continuous Submodular Maximization
5 Online submodular minimization in real space
  Quadratic submodular functions
  General continuous submodular functions
Why consider submodular functions
Question: why consider submodular functions?
Submodular functions often appear as objective functions of machine learning tasks such as sensor placement, document summarization, or active learning.
Submodular functions can model valuation functions of agents with diminishing returns.
A submodular function is in general neither convex nor concave, which leads to very hard optimization problems.
Why consider online problems
Question: why consider online problems?
Online problems model multi-period decision making in which the cost function of each period is revealed only after that period's decision has been made.
Online convex optimization
Model setting: Consider a multi-period decision-making problem in which a decision maker acts at each period to minimize the "regret".
For t = 1, ..., T:
At iteration t, the decision maker chooses $x_t \in K$.
After the decision maker has committed to this choice, a convex cost function $f_t \in \mathcal{F}$, $f_t : K \to \mathbb{R}$, is revealed.
Then go to the next period.
The decision maker wants to minimize the regret:
$$R_T((x_t)) = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in K} \sum_{t=1}^{T} f_t(x).$$
OCO
Projected gradient algorithm:
$$x_{t+1} = \Pi_K\big(x_t - \alpha_t \nabla f_t(x_t)\big).$$
Denote by D the diameter of K.
1 If $f_t$ is L-Lipschitz continuous and $\alpha_t = \frac{D}{L\sqrt{2T}}$, then
$$R_T((x_t)) \le DL\sqrt{2T}.$$
2 If $f_t$ is L-Lipschitz continuous and $\mu$-strongly convex, and $\alpha_t = \frac{1}{\mu t}$, then
$$R_T((x_t)) \le \frac{L^2(1 + \log T)}{2\mu}.$$
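To make the update concrete, here is a minimal numpy sketch of the projected gradient loop above. The ball-shaped K, the toy quadratic costs, and the constants D and L are illustrative assumptions, not part of the slides.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto K = {x : ||x||_2 <= radius}."""
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else x * (radius / nrm)

def online_projected_gradient(grad_oracles, x1, step_sizes, project):
    """Run x_{t+1} = Pi_K(x_t - alpha_t * grad f_t(x_t)); each f_t is
    revealed (here: its gradient oracle queried) only after x_t is played."""
    x, plays = np.asarray(x1, dtype=float), []
    for alpha, grad in zip(step_sizes, grad_oracles):
        plays.append(x)
        x = project(x - alpha * grad(x))
    return plays

# Toy run: f_t(x) = ||x - c_t||^2 with random targets c_t, K the unit ball.
rng = np.random.default_rng(0)
T, n = 100, 5
targets = [rng.normal(size=n) for _ in range(T)]
grad_oracles = [lambda x, c=c: 2.0 * (x - c) for c in targets]
D, L = 2.0, 6.0                              # illustrative diameter/Lipschitz bounds
steps = [D / (L * np.sqrt(2 * T))] * T       # the constant step from the slide
plays = online_projected_gradient(grad_oracles, np.zeros(n), steps, project_ball)
```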
Submodular set function
Let $[n] = \{1, \dots, n\}$. A set function $f : 2^{[n]} \to \mathbb{R}$ is called submodular if for all sets $S, T \subseteq [n]$ with $T \subseteq S$, and for all elements $i \in [n] \setminus S$, we have
$$f(T \cup \{i\}) - f(T) \ge f(S \cup \{i\}) - f(S).$$
Equivalently, f is submodular if and only if for all $S, T \subseteq [n]$,
$$f(S \cup T) + f(S \cap T) \le f(S) + f(T).$$
An important theorem: For any set function $f : 2^{[n]} \to \mathbb{R}$, there is an extension (called the Lovász extension) $f^L : [0,1]^n \to \mathbb{R}$ such that f is submodular if and only if $f^L$ is convex. In addition, there is a correspondence between the minima of $f^L$ and the minima of f.
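As a concrete illustration, the Lovász extension and a subgradient can be evaluated with the classical greedy ordering. This is a minimal numpy sketch of my own (the slides only state the theorem); the graph cut function at the end is one familiar submodular example.

```python
import numpy as np

def lovasz(f, x):
    """Lovasz extension value and a subgradient at x in [0,1]^n.

    f: set-function oracle taking a frozenset of indices.
    Greedy ordering: sort coordinates decreasingly, and for the chain
    S_i = {sigma_1, ..., sigma_i} set g[sigma_i] = f(S_i) - f(S_{i-1}).
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(-x)
    g = np.zeros_like(x)
    S, prev = set(), f(frozenset())
    value = prev
    for i in order:
        S.add(i)
        cur = f(frozenset(S))
        g[i] = cur - prev
        value += x[i] * g[i]
        prev = cur
    return value, g

# Example: the cut function of a small graph, a classic submodular function.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
def cut(S):
    return sum((u in S) != (v in S) for u, v in edges)

val, g = lovasz(cut, [0.9, 0.7, 0.2, 0.1])
```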
Results
The model setting is similar to OCO:
Assume $f_t : 2^{[n]} \to [-M, M]$ is submodular.
Suppose in each period t, the decision maker has unlimited access to the value oracles of the previously seen cost functions $f_1, f_2, \dots, f_{t-1}$.
Theorem (Hazan & Kale, 2012)
There is an online subgradient descent algorithm with step size $\alpha_t = \sqrt{\frac{n}{16M^2T}}$ such that
$$\mathbb{E}[\mathrm{Regret}_T] \le 4M\sqrt{nT}.$$
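A sketch of what such an online subgradient scheme can look like, reusing the `lovasz` helper and `cut` example from the previous sketch. The fixed rounding threshold and constant step size are simplifications of my own; Hazan & Kale round with a randomized threshold.

```python
import numpy as np

def online_submodular_min(fs, n, alpha):
    """Online subgradient descent in the spirit of Hazan & Kale (2012):
    maintain x_t in [0,1]^n, play a rounded set, then take a subgradient
    step on the Lovasz extension of the just-revealed cost f_t."""
    x, plays = np.full(n, 0.5), []
    for f in fs:
        plays.append(frozenset(i for i in range(n) if x[i] >= 0.5))
        _, g = lovasz(f, x)                    # subgradient of f_t^L at x_t
        x = np.clip(x - alpha * g, 0.0, 1.0)   # projection onto [0,1]^n
    return plays

# Example: T identical periods with the cut function from above.
plays = online_submodular_min([cut] * 20, n=4, alpha=0.1)
```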
Submodular function in real space
Let L be a lattice. A function $f : L \to \mathbb{R} \cup \{+\infty\}$ is submodular if for all $x, y \in L$,
$$f(x) + f(y) \ge f(x \wedge y) + f(x \vee y).$$
It is clear that when f is twice differentiable and L is a box, f is submodular if and only if
$$\frac{\partial^2 f(x)}{\partial x_i \partial x_j} \le 0, \quad \forall i \ne j,\ x \in L.$$
A twice differentiable function f is called DR-submodular if
$$\frac{\partial^2 f(x)}{\partial x_i \partial x_j} \le 0, \quad \forall i, j,\ x \in L.$$
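These second-order characterizations are easy to test numerically. Below is a small finite-difference check; the helper name, step size, and tolerance are my own choices.

```python
import numpy as np

def cross_partials_nonpositive(f, x, h=1e-5, include_diag=True, tol=1e-8):
    """Numerically check d^2 f / dx_i dx_j <= 0 at x via central differences.
    include_diag=False checks only i != j (submodularity on a box);
    include_diag=True also checks i == j (DR-submodularity)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    for i in range(n):
        for j in range(n):
            if i == j and not include_diag:
                continue
            ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
            d2 = (f(x + ei + ej) - f(x + ei - ej)
                  - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
            if d2 > tol:
                return False
    return True

# Example: x^T A x with nonpositive off-diagonal entries is submodular.
A = np.array([[1.0, -0.5], [-0.5, 2.0]])
print(cross_partials_nonpositive(lambda x: x @ A @ x, np.array([0.3, 0.7]),
                                 include_diag=False))   # True: submodular
```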
The performance of stationary points
Theorem (Hassani et al. 2017)
Let f be monotone and DR-submodular, and assume $K \subseteq L$ is a convex set. Then:
(i) If x is a stationary point of f in K, then $f(x) \ge \frac{1}{2}\,\mathrm{OPT}$.
(ii) Furthermore, if f is L-smooth, gradient ascent with a step size smaller than 1/L will converge to a stationary point.
The lower bound in (i) is tight.
Online gradient ascent
Online Gradient Ascent
Input: convex set K, horizon T, $x_1 \in K$, step sizes $\{\alpha_t\}$
Output: $\{x_t : 1 \le t \le T\}$
1: for t ← 1, 2, 3, ..., T do
2:   Play $x_t$ and receive reward $f_t(x_t)$.
3:   $x_{t+1} = \Pi_K(x_t + \alpha_t \nabla f_t(x_t))$
4: end for
Theorem (Chen et al. 2018)
Assume that the functions $f_t : L \to \mathbb{R}_+$ are monotone and DR-submodular for t = 1, 2, 3, ..., T. With step size $\alpha_t = \frac{D}{G\sqrt{t}}$,
$$\frac{1}{2}\max_{x \in K} \sum_{t=1}^{T} f_t(x) - \sum_{t=1}^{T} f_t(x_t) \le \frac{3}{4} DG\sqrt{T}.$$
Here $D = \operatorname{diam}(K)$ and $G = \sup_{1 \le t \le T,\, x \in K} \|\nabla f_t(x)\|$.
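A minimal numpy sketch of the listing above on a toy instance. The separable log objective, the box constraint, and the bounds D and G are assumptions of mine, chosen so that each $f_t$ is monotone DR-submodular.

```python
import numpy as np

def online_gradient_ascent(grad_oracles, x1, D, G, project):
    """Online Gradient Ascent as in the listing above, with
    alpha_t = D / (G * sqrt(t)); gradients are revealed after x_t is played."""
    x, plays = np.asarray(x1, dtype=float), []
    for t, grad in enumerate(grad_oracles, start=1):
        plays.append(x)
        x = project(x + (D / (G * np.sqrt(t))) * grad(x))
    return plays

# Toy instance: f_t(x) = <w_t, log(1 + x)> on K = [0,1]^n is monotone
# DR-submodular for w_t >= 0 (cross second partials vanish, diagonal ones
# are negative). Projection onto the box is coordinatewise clipping.
rng = np.random.default_rng(1)
T, n = 50, 4
ws = [rng.uniform(0, 1, size=n) for _ in range(T)]
grads = [lambda x, w=w: w / (1.0 + x) for w in ws]
D, G = np.sqrt(n), np.sqrt(n)            # diam([0,1]^n) and a gradient bound
plays = online_gradient_ascent(grads, np.zeros(n), D, G,
                               lambda z: np.clip(z, 0.0, 1.0))
```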
Online Quadratic submodular function minimization
Why consider quadratic functions?
Broad applications: nonconvex quadratic optimization arises in a broad range of fields, such as combinatorial optimization, numerical partial differential equations in engineering, control and finance, and general nonlinear programming.
NP-hard: nonconvex quadratic optimization problems are known to be NP-hard.
Online Quadratic submodular function minimization
Model setting: In each period t = 1, ..., T, $f_t$ is a quadratic function, i.e., $f_t(x) = x^T A_t x$. Let the box $K \subset \mathbb{R}^n$ be the decision space. The decision maker wants to minimize the regret
$$R_T((x_t)) = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in K} \sum_{t=1}^{T} f_t(x).$$
(Note that a quadratic function $x^T A x$ is submodular if and only if all the off-diagonal entries of A are nonpositive.)
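This criterion is a one-liner to check; here is a hedged numpy sketch (the helper name is mine). Since the Hessian of $x^T A x$ is $A + A^T$, only the symmetric part of A matters.

```python
import numpy as np

def is_submodular_quadratic(A, tol=1e-12):
    """x^T A x is submodular iff all off-diagonal entries of the symmetric
    part of A are nonpositive."""
    S = (A + A.T) / 2                      # only the symmetric part matters
    off = S - np.diag(np.diag(S))
    return bool(np.all(off <= tol))

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 1.0, -0.3],
              [0.0, -0.3, 0.5]])
assert is_submodular_quadratic(A)
```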
SDP relaxation of the quadratic submodular function minimization problem
Consider the quadratic optimization problem
$$\mathrm{OPT}_{QP} = \min\; x^T Q x \quad \text{s.t.}\; x^2 \in F \tag{QP}$$
and its SDP relaxation
$$\mathrm{OPT}_{SDP} = \min\; \langle Q, X \rangle \quad \text{s.t.}\; \operatorname{diag}(X) \in F,\; X \in S^n_+. \tag{SDP}$$
Here $F \subseteq \mathbb{R}^n$ is a closed convex set, $x^2 = (x_1^2, \dots, x_n^2)$ denotes the entrywise square, and $S^n$ and $S^n_+$ are the set of $n \times n$ symmetric matrices and the set of $n \times n$ positive semidefinite symmetric matrices, respectively.
When QP=SDP ?
Theorem (Zhang, S. (2000))
If $Q = [q_{ij}]_{n \times n}$ satisfies $q_{ij} \le 0$ for all $i \ne j$, then $\mathrm{OPT}_{QP} = \mathrm{OPT}_{SDP}$. Moreover, suppose that $X^*$ is an optimal solution for (SDP); then $\sqrt{\operatorname{diag}(X^*)}$ is an optimal solution for (QP).
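A sketch of this recovery using cvxpy, taking F to be a coordinate box for concreteness; the helper name and the box choice are assumptions, and an SDP-capable solver is assumed to be installed.

```python
import cvxpy as cp
import numpy as np

def solve_qp_via_sdp(Q, lo, hi):
    """Solve min x^T Q x with x^2 in the box [lo, hi] (entrywise) via the SDP
    relaxation, then recover x* = sqrt(diag(X*)) as in Zhang (2000). Assumes
    Q has nonpositive off-diagonal entries so the relaxation is exact."""
    n = Q.shape[0]
    X = cp.Variable((n, n), PSD=True)
    cons = [cp.diag(X) >= lo, cp.diag(X) <= hi]
    prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)), cons)
    prob.solve()
    return np.sqrt(np.maximum(np.diag(X.value), 0.0)), prob.value

Q = np.array([[1.0, -0.8], [-0.8, 1.5]])
x_star, val = solve_qp_via_sdp(Q, lo=np.array([0.25, 0.25]), hi=np.ones(2))
```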
Regret bound
Back to our online quadratic submodular optimization problem: For t = 1, ..., T, $f_t(x) = x^T A_t x$ is submodular. The decision maker successively chooses $x_t$ to minimize the regret
$$R_T((x_t)) = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in K} \sum_{t=1}^{T} f_t(x).$$
Regret bound
Motivated by the SDP relaxation theorem, we first solve the online SDP problem: For t = 1, ..., T, choose $X_t \in S^n_+(K)$ to minimize the regret
$$R_T((X_t)) = \sum_{t=1}^{T} \langle A_t, X_t \rangle - \min_{X \in S^n_+(K)} \sum_{t=1}^{T} \langle A_t, X \rangle.$$
Here $S^n_+(K) = \{X \in S^n_+ : \operatorname{diag}(X) \in K^2\}$, where $K^2 = \{(x_1^2, \dots, x_n^2) : x \in K\}$.
Algorithm
For the online SDP problem, we use the following algorithm: initialize $X_1$; for t = 1, ..., T − 1, let
$$X_{t+1} = \Pi_{S^n_+(K)}(X_t - \alpha_t A_t).$$
Theorem
If the $A_t$ are all submodular and $\alpha_t = \frac{D}{G\sqrt{T}}$, then the regret satisfies
$$R_T((X_t)) \le DG\sqrt{T}.$$
Here $D = \operatorname{diam}(S^n_+(K))$ and $G = \max_{1 \le t \le T} \|A_t\|$ (Frobenius norm).
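A numpy sketch of this matrix update, together with the rounding $x_t = \sqrt{\operatorname{diag}(X_t)}$ from the next slide. The alternating-projection routine is my own heuristic stand-in for the exact projection $\Pi_{S^n_+(K)}$ (an exact Euclidean projection onto the intersection would need, e.g., Dykstra's algorithm), and $K^2$ is assumed to be a coordinate box.

```python
import numpy as np

def psd_project(Y):
    """Frobenius projection onto the PSD cone: clip negative eigenvalues."""
    w, V = np.linalg.eigh((Y + Y.T) / 2)
    return (V * np.maximum(w, 0.0)) @ V.T

def project_Sn_plus_K(Y, lo, hi, iters=50):
    """Heuristic alternating projections onto {X PSD : diag(X) in [lo, hi]}:
    returns a point near the intersection, not the exact projection."""
    X = Y
    for _ in range(iters):
        X = psd_project(X)
        d = np.clip(np.diag(X), lo, hi)
        X = X + np.diag(d - np.diag(X))     # reset the diagonal into the box
    return X

def online_sdp(As, X1, lo, hi, D, G):
    """X_{t+1} = Pi_{S^n_+(K)}(X_t - alpha * A_t) with alpha = D/(G sqrt(T)),
    collecting the rounded plays x_t = sqrt(diag(X_t))."""
    T, X, plays = len(As), X1, []
    alpha = D / (G * np.sqrt(T))
    for A in As:
        plays.append(np.sqrt(np.maximum(np.diag(X), 0.0)))
        X = project_Sn_plus_K(X - alpha * A, lo, hi)
    return plays
```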
Proof
Proof.
Let $Y_{t+1} = X_t - \alpha A_t$ with $\alpha = \frac{D}{G\sqrt{T}}$. Then
$$\|X_{t+1} - X^*\|^2 \le \|Y_{t+1} - X^*\|^2 = \|X_t - X^*\|^2 + \alpha^2 \|A_t\|^2 - 2\alpha \langle A_t, X_t - X^* \rangle.$$
This implies
$$\langle A_t, X_t - X^* \rangle \le \frac{1}{2\alpha}\big(\|X_t - X^*\|^2 - \|X_{t+1} - X^*\|^2\big) + \frac{\alpha}{2}\|A_t\|^2.$$
Summing from t = 1 to T,
$$\sum_{t=1}^{T} \langle A_t, X_t - X^* \rangle \le \frac{1}{2\alpha}\|X_1 - X^*\|^2 + \frac{\alpha}{2}\sum_{t=1}^{T}\|A_t\|^2 \le \frac{D^2}{2\alpha} + \frac{\alpha T G^2}{2} = DG\sqrt{T}.$$
Regret bound for the original problem
Theorem
Suppose the $X_t$ are the matrices selected in the online SDP problem. Let $x_t = \sqrt{\operatorname{diag}(X_t)} \in K' = K \cup (-K)$. Then
$$R_T((x_t)) \le R_T((X_t)) \le DG\sqrt{T}.$$
Proof.
Denote by $x_{ij}$ and $a_{ij}$ the (i, j)-th entries of $X_t$ and $A_t$, respectively. Then $x_t = (\sqrt{x_{11}}, \dots, \sqrt{x_{nn}})$. Since $X_t$ is positive semidefinite, we have $x_{ij}^2 \le x_{ii} x_{jj}$. Hence, using $a_{ij} \le 0$ for $i \ne j$,
$$x_t^T A_t x_t = \sum_{i,j} a_{ij}\sqrt{x_{ii}}\sqrt{x_{jj}} = \sum_{i=1}^{n} a_{ii} x_{ii} + \sum_{i \ne j} a_{ij}\sqrt{x_{ii}}\sqrt{x_{jj}} \le \sum_{i=1}^{n} a_{ii} x_{ii} + \sum_{i \ne j} a_{ij}|x_{ij}| \le \sum_{i=1}^{n} a_{ii} x_{ii} + \sum_{i \ne j} a_{ij} x_{ij} = \langle A_t, X_t \rangle.$$
Therefore, noting that $(K')^2 = K^2$ so that $S^n_+(K') = S^n_+(K)$,
$$R_T((x_t)) = \sum_{t=1}^{T} x_t^T A_t x_t - \min_{x \in K'} \sum_{t=1}^{T} x^T A_t x \le \sum_{t=1}^{T} \langle A_t, X_t \rangle - \min_{X \in S^n_+(K')} \sum_{t=1}^{T} \langle A_t, X \rangle \le DG\sqrt{T}.$$
How about general continuous submodular functions?
Intuition: Generalize the Lovász extension of submodular set functions.
Let $L = \prod_{i=1}^{n} X_i$, where the $X_i \subset \mathbb{R}$ are compact. Define $P(X_i)$ as the convex hull of all one-point (Dirac) distributions on $X_i$, i.e.,
$$P(X_i) = \operatorname{conv}\{\delta_{x_i} : x_i \in X_i\}.$$
Let $P(L) = \prod_{i=1}^{n} P(X_i)$. There are two extensions of $H : L \to \mathbb{R}$:
For all $\mu \in P(L)$,
$$h_1(\mu_1, \dots, \mu_n) = \int_0^1 H\big(F_{\mu_1}^{-1}(t), \dots, F_{\mu_n}^{-1}(t)\big)\, dt,$$
where $F_{\mu_i}^{-1}$ is the quantile function (inverse CDF) of $\mu_i$.
Convex closure: the largest lower semi-continuous convex function such that $h_c(\delta_x) \le H(x)$; it is given by
$$h_c(\mu_1, \dots, \mu_n) = \inf\Big\{ \int_L H(x)\, d\gamma(x) \,:\, \gamma \text{ a probability measure on } L \text{ with marginals } \mu_1, \dots, \mu_n \Big\}.$$
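For product measures with finitely many atoms per coordinate, the quantile functions are piecewise constant, so $h_1$ can be evaluated numerically. A small numpy sketch (the function name and grid-based integration are mine):

```python
import numpy as np

def h1_value(H, atoms, probs, grid=2000):
    """Approximate h1(mu) = int_0^1 H(F_{mu_1}^{-1}(t), ..., F_{mu_n}^{-1}(t)) dt
    for a product measure whose i-th marginal has support atoms[i] (sorted
    increasingly) with weights probs[i]. A fine midpoint grid in t is accurate
    up to grid resolution since the quantile functions are step functions."""
    ts = (np.arange(grid) + 0.5) / grid
    cums = [np.cumsum(p) for p in probs]
    total = 0.0
    for t in ts:
        x = np.array([a[min(np.searchsorted(c, t), len(a) - 1)]
                      for a, c in zip(atoms, cums)])
        total += H(x)
    return total / grid

# Example: H(x) = -min(x_1, x_2) is submodular; both marginals uniform on {0,1,2}.
atoms = [np.array([0.0, 1.0, 2.0])] * 2
probs = [np.array([1/3, 1/3, 1/3])] * 2
print(h1_value(lambda x: -min(x[0], x[1]), atoms, probs))  # approx -1.0
```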
Theorem (Bach, F. (2015))
h1 is convex if and only if H is submodular.
If H is submodular, then the two extensions are equal, i.e., h1 = hc .
Minimizing $h_c$ on P(L) and minimizing H on L are equivalent: the two optimal values are equal, and minimizers of one problem can be recovered from minimizers of the other.
References
Chen, L., Hassani, H., & Karbasi, A. (2018). Online continuous submodular maximization. arXiv preprint arXiv:1802.06052.
Hassani, H., Soltanolkotabi, M., & Karbasi, A. (2017). Gradient methods for submodular maximization. In Advances in Neural Information Processing Systems (pp. 5843-5853).
Hazan, E., & Kale, S. (2012). Online submodular minimization. Journal of Machine Learning Research, 13(Oct), 2903-2922.
Bach, F. (2015). Submodular functions: from discrete to continuous domains. arXiv preprint arXiv:1511.00394.
Zhang, S. (2000). Quadratic maximization and semidefinite relaxation. Mathematical Programming, 87(3), 453-465.
The End