Online Submodular Optimization Problems
Menglong Li
April 26, 2018
Overview
1 The Reason
2 Online convex optimization (OCO)
3 Online submodular set function minimization
4 Online Continuous Submodular Maximization
5 Online submodular minimization in real space
  Quadratic submodular functions
  General continuous submodular functions
Why consider submodular functions
Question: why consider submodular functions?
Submodular functions often appear as objective functions of machine learning tasks such as sensor placement, document summarization, or active learning.
Submodular functions can model valuation functions of agents with diminishing returns.
A submodular function is in general neither convex nor concave, which leads to very hard optimization problems.
Why consider online problems
Question: why consider online problems?
Online problems model multi-period decision making in which the cost function of each period is revealed only after that period's decision has been made.
Online convex optimization
Model setting: Consider a multi-period decision-making problem in which a decision maker acts at each period to minimize the "regret".
For t = 1, ..., T:
At iteration t, the decision maker chooses $x_t \in K$.
After the decision maker has committed to this choice, a convex cost function $f_t \in \mathcal{F}$, $f_t : K \to \mathbb{R}$, is revealed.
Then go to the next period.
The decision maker wants to minimize the regret:
$$R_T((x_t)) = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in K} \sum_{t=1}^{T} f_t(x).$$
OCO
Projected gradient algorithm:
$$x_{t+1} = \Pi_K\big(x_t - \alpha_t \nabla f_t(x_t)\big).$$
Denote by D the diameter of K.
1 If $f_t$ is L-Lipschitz continuous and $\alpha_t = \frac{D}{L\sqrt{2T}}$, then
$$R_T((x_t)) \le DL\sqrt{2T}.$$
2 If $f_t$ is L-Lipschitz continuous and $\mu$-strongly convex, and $\alpha_t = \frac{1}{\mu t}$, then
$$R_T((x_t)) \le \frac{L^2(1 + \log T)}{2\mu}.$$
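To make the update concrete, here is a minimal numpy sketch of the projected gradient loop above. The ball-shaped K, the toy quadratic costs, and the constants D and L are illustrative assumptions, not part of the slides.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto K = {x : ||x||_2 <= radius}."""
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else x * (radius / nrm)

def online_projected_gradient(grad_oracles, x1, step_sizes, project):
    """Run x_{t+1} = Pi_K(x_t - alpha_t * grad f_t(x_t)); each f_t is
    revealed (here: its gradient oracle queried) only after x_t is played."""
    x, plays = np.asarray(x1, dtype=float), []
    for alpha, grad in zip(step_sizes, grad_oracles):
        plays.append(x)
        x = project(x - alpha * grad(x))
    return plays

# Toy run: f_t(x) = ||x - c_t||^2 with random targets c_t, K the unit ball.
rng = np.random.default_rng(0)
T, n = 100, 5
targets = [rng.normal(size=n) for _ in range(T)]
grad_oracles = [lambda x, c=c: 2.0 * (x - c) for c in targets]
D, L = 2.0, 6.0                              # illustrative diameter/Lipschitz bounds
steps = [D / (L * np.sqrt(2 * T))] * T       # the constant step from the slide
plays = online_projected_gradient(grad_oracles, np.zeros(n), steps, project_ball)
```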
Submodular set function
Let $[n] = \{1, \dots, n\}$. A set function $f : 2^{[n]} \to \mathbb{R}$ is called submodular if for all sets $S, T \subseteq [n]$ with $T \subseteq S$, and for all elements $i \in [n] \setminus S$, we have
$$f(T \cup \{i\}) - f(T) \ge f(S \cup \{i\}) - f(S).$$
Equivalently, f is submodular if and only if for all $S, T \subseteq [n]$,
$$f(S \cup T) + f(S \cap T) \le f(S) + f(T).$$
An important theorem: For any set function $f : 2^{[n]} \to \mathbb{R}$, there is an extension (called the Lovász extension) $f^L : [0,1]^n \to \mathbb{R}$ such that f is submodular if and only if $f^L$ is convex. In addition, there is a correspondence between the minima of $f^L$ and the minima of f.
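As a concrete illustration, the Lovász extension and a subgradient can be evaluated with the classical greedy ordering. This is a minimal numpy sketch of my own (the slides only state the theorem); the graph cut function at the end is one familiar submodular example.

```python
import numpy as np

def lovasz(f, x):
    """Lovasz extension value and a subgradient at x in [0,1]^n.

    f: set-function oracle taking a frozenset of indices.
    Greedy ordering: sort coordinates decreasingly, and for the chain
    S_i = {sigma_1, ..., sigma_i} set g[sigma_i] = f(S_i) - f(S_{i-1}).
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(-x)
    g = np.zeros_like(x)
    S, prev = set(), f(frozenset())
    value = prev
    for i in order:
        S.add(i)
        cur = f(frozenset(S))
        g[i] = cur - prev
        value += x[i] * g[i]
        prev = cur
    return value, g

# Example: the cut function of a small graph, a classic submodular function.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
def cut(S):
    return sum((u in S) != (v in S) for u, v in edges)

val, g = lovasz(cut, [0.9, 0.7, 0.2, 0.1])
```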
Results
The model setting is similar to OCO:
Assume $f_t : 2^{[n]} \to [-M, M]$ is submodular.
Suppose in each period t, the decision maker has unlimited access to the value oracles of the previously seen cost functions $f_1, f_2, \dots, f_{t-1}$.
Theorem (Hazan & Kale, 2012)
There is an online subgradient descent algorithm with step size $\alpha_t = \sqrt{\frac{n}{16M^2T}}$ such that
$$\mathbb{E}[\mathrm{Regret}_T] \le 4M\sqrt{nT}.$$
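A sketch of what such an online subgradient scheme can look like, reusing the `lovasz` helper and `cut` example from the previous sketch. The fixed rounding threshold and constant step size are simplifications of my own; Hazan & Kale round with a randomized threshold.

```python
import numpy as np

def online_submodular_min(fs, n, alpha):
    """Online subgradient descent in the spirit of Hazan & Kale (2012):
    maintain x_t in [0,1]^n, play a rounded set, then take a subgradient
    step on the Lovasz extension of the just-revealed cost f_t."""
    x, plays = np.full(n, 0.5), []
    for f in fs:
        plays.append(frozenset(i for i in range(n) if x[i] >= 0.5))
        _, g = lovasz(f, x)                    # subgradient of f_t^L at x_t
        x = np.clip(x - alpha * g, 0.0, 1.0)   # projection onto [0,1]^n
    return plays

# Example: T identical periods with the cut function from above.
plays = online_submodular_min([cut] * 20, n=4, alpha=0.1)
```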
Submodular function in real space
Let L be a lattice. A function $f : L \to \mathbb{R} \cup \{+\infty\}$ is submodular if for all $x, y \in L$,
$$f(x) + f(y) \ge f(x \wedge y) + f(x \vee y).$$
It is clear that when f is twice differentiable and L is a box, f is submodular if and only if
$$\frac{\partial^2 f(x)}{\partial x_i \partial x_j} \le 0, \quad \forall i \ne j,\ x \in L.$$
A twice differentiable function f is called DR-submodular if
$$\frac{\partial^2 f(x)}{\partial x_i \partial x_j} \le 0, \quad \forall i, j,\ x \in L.$$
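These second-order characterizations are easy to test numerically. Below is a small finite-difference check; the helper name, step size, and tolerance are my own choices.

```python
import numpy as np

def cross_partials_nonpositive(f, x, h=1e-5, include_diag=True, tol=1e-8):
    """Numerically check d^2 f / dx_i dx_j <= 0 at x via central differences.
    include_diag=False checks only i != j (submodularity on a box);
    include_diag=True also checks i == j (DR-submodularity)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    for i in range(n):
        for j in range(n):
            if i == j and not include_diag:
                continue
            ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
            d2 = (f(x + ei + ej) - f(x + ei - ej)
                  - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
            if d2 > tol:
                return False
    return True

# Example: x^T A x with nonpositive off-diagonal entries is submodular.
A = np.array([[1.0, -0.5], [-0.5, 2.0]])
print(cross_partials_nonpositive(lambda x: x @ A @ x, np.array([0.3, 0.7]),
                                 include_diag=False))   # True: submodular
```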
The performance of stationary points
Theorem (Hassani et al. 2017)
Let f be monotone and DR-submodular, and assume $K \subseteq L$ is a convex set. Then:
(i) If x is a stationary point of f in K, then $f(x) \ge \frac{1}{2}\,\mathrm{OPT}$.
(ii) Furthermore, if f is L-smooth, gradient ascent with a step size smaller than 1/L will converge to a stationary point.
The lower bound in (i) is tight.
Online gradient ascent
Online Gradient Ascent
Input: convex set K, horizon T, $x_1 \in K$, step sizes $\{\alpha_t\}$
Output: $\{x_t : 1 \le t \le T\}$
1: for t ← 1, 2, 3, ..., T do
2:   Play $x_t$ and receive reward $f_t(x_t)$.
3:   $x_{t+1} = \Pi_K(x_t + \alpha_t \nabla f_t(x_t))$
4: end for
Theorem (Chen et al. 2018)
Assume that the functions $f_t : L \to \mathbb{R}_+$ are monotone and DR-submodular for t = 1, 2, 3, ..., T. With step size $\alpha_t = \frac{D}{G\sqrt{t}}$,
$$\frac{1}{2}\max_{x \in K} \sum_{t=1}^{T} f_t(x) - \sum_{t=1}^{T} f_t(x_t) \le \frac{3}{4} DG\sqrt{T}.$$
Here $D = \operatorname{diam}(K)$ and $G = \sup_{1 \le t \le T,\, x \in K} \|\nabla f_t(x)\|$.
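A minimal numpy sketch of the listing above on a toy instance. The separable log objective, the box constraint, and the bounds D and G are assumptions of mine, chosen so that each $f_t$ is monotone DR-submodular.

```python
import numpy as np

def online_gradient_ascent(grad_oracles, x1, D, G, project):
    """Online Gradient Ascent as in the listing above, with
    alpha_t = D / (G * sqrt(t)); gradients are revealed after x_t is played."""
    x, plays = np.asarray(x1, dtype=float), []
    for t, grad in enumerate(grad_oracles, start=1):
        plays.append(x)
        x = project(x + (D / (G * np.sqrt(t))) * grad(x))
    return plays

# Toy instance: f_t(x) = <w_t, log(1 + x)> on K = [0,1]^n is monotone
# DR-submodular for w_t >= 0 (cross second partials vanish, diagonal ones
# are negative). Projection onto the box is coordinatewise clipping.
rng = np.random.default_rng(1)
T, n = 50, 4
ws = [rng.uniform(0, 1, size=n) for _ in range(T)]
grads = [lambda x, w=w: w / (1.0 + x) for w in ws]
D, G = np.sqrt(n), np.sqrt(n)            # diam([0,1]^n) and a gradient bound
plays = online_gradient_ascent(grads, np.zeros(n), D, G,
                               lambda z: np.clip(z, 0.0, 1.0))
```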
Online Quadratic submodular function minimization
Why consider quadratic functions?
Broad applications: nonconvex quadratic optimization arises in a broad range of fields, such as combinatorial optimization, numerical partial differential equations in engineering, control and finance, and general nonlinear programming.
NP-hard: nonconvex quadratic optimization problems are known to be NP-hard.
Online Quadratic submodular function minimization
Model setting: In each period t = 1, ..., T, $f_t$ is a quadratic function, i.e., $f_t(x) = x^T A_t x$. Let the box $K \subset \mathbb{R}^n$ be the decision space. The decision maker wants to minimize the regret
$$R_T((x_t)) = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in K} \sum_{t=1}^{T} f_t(x).$$
(Note that a quadratic function $x^T A x$ is submodular if and only if all the off-diagonal entries of A are nonpositive.)
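This criterion is a one-liner to check; here is a hedged numpy sketch (the helper name is mine). Since the Hessian of $x^T A x$ is $A + A^T$, only the symmetric part of A matters.

```python
import numpy as np

def is_submodular_quadratic(A, tol=1e-12):
    """x^T A x is submodular iff all off-diagonal entries of the symmetric
    part of A are nonpositive."""
    S = (A + A.T) / 2                      # only the symmetric part matters
    off = S - np.diag(np.diag(S))
    return bool(np.all(off <= tol))

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 1.0, -0.3],
              [0.0, -0.3, 0.5]])
assert is_submodular_quadratic(A)
```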
SDP relaxation of the quadratic submodular function minimization problem
Consider the quadratic optimization problem
$$\mathrm{OPT}_{QP} = \min\; x^T Q x \quad \text{s.t.}\; x^2 \in F \tag{QP}$$
and its SDP relaxation
$$\mathrm{OPT}_{SDP} = \min\; \langle Q, X \rangle \quad \text{s.t.}\; \operatorname{diag}(X) \in F,\; X \in S^n_+. \tag{SDP}$$
Here $F \subseteq \mathbb{R}^n$ is a closed convex set, $x^2 = (x_1^2, \dots, x_n^2)$ denotes the entrywise square, and $S^n$ and $S^n_+$ are the set of $n \times n$ symmetric matrices and the set of $n \times n$ positive semidefinite symmetric matrices, respectively.
When QP=SDP ?
Theorem (Zhang, S. (2000))
If $Q = [q_{ij}]_{n \times n}$ satisfies $q_{ij} \le 0$ for all $i \ne j$, then $\mathrm{OPT}_{QP} = \mathrm{OPT}_{SDP}$. Moreover, suppose that $X^*$ is an optimal solution for (SDP); then $\sqrt{\operatorname{diag}(X^*)}$ is an optimal solution for (QP).
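A sketch of this recovery using cvxpy, taking F to be a coordinate box for concreteness; the helper name and the box choice are assumptions, and an SDP-capable solver is assumed to be installed.

```python
import cvxpy as cp
import numpy as np

def solve_qp_via_sdp(Q, lo, hi):
    """Solve min x^T Q x with x^2 in the box [lo, hi] (entrywise) via the SDP
    relaxation, then recover x* = sqrt(diag(X*)) as in Zhang (2000). Assumes
    Q has nonpositive off-diagonal entries so the relaxation is exact."""
    n = Q.shape[0]
    X = cp.Variable((n, n), PSD=True)
    cons = [cp.diag(X) >= lo, cp.diag(X) <= hi]
    prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)), cons)
    prob.solve()
    return np.sqrt(np.maximum(np.diag(X.value), 0.0)), prob.value

Q = np.array([[1.0, -0.8], [-0.8, 1.5]])
x_star, val = solve_qp_via_sdp(Q, lo=np.array([0.25, 0.25]), hi=np.ones(2))
```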
Regret bound
Back to our online quadratic submodular optimization problem: For t = 1, ..., T, $f_t(x) = x^T A_t x$ is submodular. The decision maker successively chooses $x_t$ to minimize the regret
$$R_T((x_t)) = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in K} \sum_{t=1}^{T} f_t(x).$$
Regret bound
Motivated by the SDP relaxation theorem, we first solve the online SDP problem: For t = 1, ..., T, choose $X_t \in S^n_+(K)$ to minimize the regret
$$R_T((X_t)) = \sum_{t=1}^{T} \langle A_t, X_t \rangle - \min_{X \in S^n_+(K)} \sum_{t=1}^{T} \langle A_t, X \rangle.$$
Here $S^n_+(K) = \{X \in S^n_+ : \operatorname{diag}(X) \in K^2\}$, where $K^2 = \{(x_1^2, \dots, x_n^2) : x \in K\}$.
Algorithm
For the online SDP problem, we use the following algorithm: initialize $X_1$; for t = 1, ..., T − 1, let
$$X_{t+1} = \Pi_{S^n_+(K)}(X_t - \alpha_t A_t).$$
Theorem
If the $A_t$ are all submodular and $\alpha_t = \frac{D}{G\sqrt{T}}$, then the regret satisfies
$$R_T((X_t)) \le DG\sqrt{T}.$$
Here $D = \operatorname{diam}(S^n_+(K))$ and $G = \max_{1 \le t \le T} \|A_t\|$ (Frobenius norm).
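A numpy sketch of this matrix update, together with the rounding $x_t = \sqrt{\operatorname{diag}(X_t)}$ from the next slide. The alternating-projection routine is my own heuristic stand-in for the exact projection $\Pi_{S^n_+(K)}$ (an exact Euclidean projection onto the intersection would need, e.g., Dykstra's algorithm), and $K^2$ is assumed to be a coordinate box.

```python
import numpy as np

def psd_project(Y):
    """Frobenius projection onto the PSD cone: clip negative eigenvalues."""
    w, V = np.linalg.eigh((Y + Y.T) / 2)
    return (V * np.maximum(w, 0.0)) @ V.T

def project_Sn_plus_K(Y, lo, hi, iters=50):
    """Heuristic alternating projections onto {X PSD : diag(X) in [lo, hi]}:
    returns a point near the intersection, not the exact projection."""
    X = Y
    for _ in range(iters):
        X = psd_project(X)
        d = np.clip(np.diag(X), lo, hi)
        X = X + np.diag(d - np.diag(X))     # reset the diagonal into the box
    return X

def online_sdp(As, X1, lo, hi, D, G):
    """X_{t+1} = Pi_{S^n_+(K)}(X_t - alpha * A_t) with alpha = D/(G sqrt(T)),
    collecting the rounded plays x_t = sqrt(diag(X_t))."""
    T, X, plays = len(As), X1, []
    alpha = D / (G * np.sqrt(T))
    for A in As:
        plays.append(np.sqrt(np.maximum(np.diag(X), 0.0)))
        X = project_Sn_plus_K(X - alpha * A, lo, hi)
    return plays
```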
Proof
Proof.
Let $Y_{t+1} = X_t - \alpha A_t$ with $\alpha = \frac{D}{G\sqrt{T}}$. Then
$$\|X_{t+1} - X^*\|^2 \le \|Y_{t+1} - X^*\|^2 = \|X_t - X^*\|^2 + \alpha^2 \|A_t\|^2 - 2\alpha \langle A_t, X_t - X^* \rangle.$$
This implies
$$\langle A_t, X_t - X^* \rangle \le \frac{1}{2\alpha}\big(\|X_t - X^*\|^2 - \|X_{t+1} - X^*\|^2\big) + \frac{\alpha}{2}\|A_t\|^2.$$
Summing from t = 1 to T,
$$\sum_{t=1}^{T} \langle A_t, X_t - X^* \rangle \le \frac{1}{2\alpha}\|X_1 - X^*\|^2 + \frac{\alpha}{2}\sum_{t=1}^{T}\|A_t\|^2 \le \frac{D^2}{2\alpha} + \frac{\alpha T G^2}{2} = DG\sqrt{T}.$$
Regret bound for the original problem
Theorem
Suppose the $X_t$ are the matrices selected in the online SDP problem. Let $x_t = \sqrt{\operatorname{diag}(X_t)} \in K' = K \cup (-K)$. Then
$$R_T((x_t)) \le R_T((X_t)) \le DG\sqrt{T}.$$
Proof.
Denote by $x_{ij}$ and $a_{ij}$ the (i, j)-th entries of $X_t$ and $A_t$, respectively. Then $x_t = (\sqrt{x_{11}}, \dots, \sqrt{x_{nn}})$. Since $X_t$ is positive semidefinite, we have $x_{ij}^2 \le x_{ii} x_{jj}$. Hence, using $a_{ij} \le 0$ for $i \ne j$,
$$x_t^T A_t x_t = \sum_{i,j} a_{ij}\sqrt{x_{ii}}\sqrt{x_{jj}} = \sum_{i=1}^{n} a_{ii} x_{ii} + \sum_{i \ne j} a_{ij}\sqrt{x_{ii}}\sqrt{x_{jj}} \le \sum_{i=1}^{n} a_{ii} x_{ii} + \sum_{i \ne j} a_{ij}|x_{ij}| \le \sum_{i=1}^{n} a_{ii} x_{ii} + \sum_{i \ne j} a_{ij} x_{ij} = \langle A_t, X_t \rangle.$$
Therefore, noting that $(K')^2 = K^2$ so that $S^n_+(K') = S^n_+(K)$,
$$R_T((x_t)) = \sum_{t=1}^{T} x_t^T A_t x_t - \min_{x \in K'} \sum_{t=1}^{T} x^T A_t x \le \sum_{t=1}^{T} \langle A_t, X_t \rangle - \min_{X \in S^n_+(K')} \sum_{t=1}^{T} \langle A_t, X \rangle \le DG\sqrt{T}.$$
How about general continuous submodular functions?
Intuition: Generalize the Lovász extension of submodular set functions.
Let $L = \prod_{i=1}^{n} X_i$, where the $X_i \subset \mathbb{R}$ are compact. Define $P(X_i)$ as the convex hull of all one-point (Dirac) distributions on $X_i$, i.e.,
$$P(X_i) = \operatorname{conv}\{\delta_{x_i} : x_i \in X_i\}.$$
Let $P(L) = \prod_{i=1}^{n} P(X_i)$. There are two extensions of $H : L \to \mathbb{R}$:
For all $\mu \in P(L)$,
$$h_1(\mu_1, \dots, \mu_n) = \int_0^1 H\big(F_{\mu_1}^{-1}(t), \dots, F_{\mu_n}^{-1}(t)\big)\, dt,$$
where $F_{\mu_i}^{-1}$ is the quantile function (inverse CDF) of $\mu_i$.
Convex closure: the largest lower semi-continuous convex function such that $h_c(\delta_x) \le H(x)$; it is given by
$$h_c(\mu_1, \dots, \mu_n) = \inf\Big\{ \int_L H(x)\, d\gamma(x) \,:\, \gamma \text{ a probability measure on } L \text{ with marginals } \mu_1, \dots, \mu_n \Big\}.$$
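For product measures with finitely many atoms per coordinate, the quantile functions are piecewise constant, so $h_1$ can be evaluated numerically. A small numpy sketch (the function name and grid-based integration are mine):

```python
import numpy as np

def h1_value(H, atoms, probs, grid=2000):
    """Approximate h1(mu) = int_0^1 H(F_{mu_1}^{-1}(t), ..., F_{mu_n}^{-1}(t)) dt
    for a product measure whose i-th marginal has support atoms[i] (sorted
    increasingly) with weights probs[i]. A fine midpoint grid in t is accurate
    up to grid resolution since the quantile functions are step functions."""
    ts = (np.arange(grid) + 0.5) / grid
    cums = [np.cumsum(p) for p in probs]
    total = 0.0
    for t in ts:
        x = np.array([a[min(np.searchsorted(c, t), len(a) - 1)]
                      for a, c in zip(atoms, cums)])
        total += H(x)
    return total / grid

# Example: H(x) = -min(x_1, x_2) is submodular; both marginals uniform on {0,1,2}.
atoms = [np.array([0.0, 1.0, 2.0])] * 2
probs = [np.array([1/3, 1/3, 1/3])] * 2
print(h1_value(lambda x: -min(x[0], x[1]), atoms, probs))  # approx -1.0
```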
Theorem (Bach, F. (2015))
h1 is convex if and only if H is submodular.
If H is submodular, then the two extensions are equal, i.e., h1 = hc .
Minimizing $h_c$ on P(L) and minimizing H on L are equivalent: the two optimal values are equal, and minimizers of one problem can be recovered from minimizers of the other.
References
Chen, L., Hassani, H., & Karbasi, A. (2018). Online continuous submodular maximization. arXiv preprint arXiv:1802.06052.
Hassani, H., Soltanolkotabi, M., & Karbasi, A. (2017). Gradient methods for submodular maximization. In Advances in Neural Information Processing Systems (pp. 5843-5853).
Hazan, E., & Kale, S. (2012). Online submodular minimization. Journal of Machine Learning Research, 13(Oct), 2903-2922.
Bach, F. (2015). Submodular functions: from discrete to continuous domains. arXiv preprint arXiv:1511.00394.
Zhang, S. (2000). Quadratic maximization and semidefinite relaxation. Mathematical Programming, 87(3), 453-465.
The End