INTRODUCTION TO NONSMOOTH
ANALYSIS AND OPTIMIZATION
Christian Clason
christian.clason@uni-due.de
https://udue.de/clason
orcid: 0000-0002-9948-8426
Tuomo Valkonen
tuomov@iki.fi
https://tuomov.iki.fi
orcid: 0000-0001-6683-3572
2020-12-31
arxiv: 2001.00216v3
[Cover figure: a nonsmooth function $f(u)$ together with the supporting affine functions $f(\bar u) + \langle 0, u \rangle$ and $f(\bar u) + \langle f'(\bar u; -1), u \rangle$ at a point $\bar u$.]
CONTENTS
preface v
I BACKGROUND
1 functional analysis 2
1.1 Normed vector spaces 2
1.2 Dual spaces, separation, and weak convergence 6
1.3 Hilbert spaces 12
2 calculus of variations 15
2.1 The direct method 15
2.2 Differential calculus in Banach spaces 20
2.3 Superposition operators 24
2.4 Variational principles 26
II CONVEX ANALYSIS
3 convex functions 32
3.1 Basic properties 32
3.2 Existence of minimizers 37
3.3 Continuity properties 39
4 convex subdifferentials 42
4.1 Definition and basic properties 42
4.2 Fundamental examples 45
4.3 Calculus rules 50
5 fenchel duality 57
5.1 Fenchel conjugates 57
5.2 Duality of optimization problems 63
6 monotone operators and proximal points 68
6.1 Basic properties of set-valued mappings 68
6.2 Monotone operators 71
6.3 Resolvents and proximal points 78
7 smoothness and convexity 86
7.1 Smoothness 86
7.2 Strong convexity 89
7.3 Moreau–Yosida regularization 93
8 proximal point and splitting methods 101
8.1 Proximal point method 101
8.2 Explicit splitting 102
8.3 Implicit splitting 103
8.4 Primal-dual proximal splitting 105
8.5 Primal-dual explicit splitting 107
8.6 The Augmented Lagrangian and alternating direction minimization 109
8.7 Connections 111
9 weak convergence 117
9.1 Opial's lemma and Fejér monotonicity 117
9.2 The fundamental methods: proximal point and explicit splitting 119
9.3 Preconditioned proximal point methods: DRS and PDPS 122
9.4 Preconditioned explicit splitting methods: PDES and more 127
9.5 Fixed-point theorems 131
10 rates of convergence by testing 134
10.1 The fundamental methods 135
10.2 Structured algorithms and acceleration 138
11 gaps and ergodic results 145
11.1 Gap functionals 145
11.2 Convergence of function values 148
11.3 Ergodic gap estimates 152
11.4 The testing approach in its general form 156
11.5 Ergodic gaps for accelerated primal-dual methods 157
11.6 Convergence of the ADMM 164
12 meta-algorithms 167
12.1 Over-relaxation 167
12.2 Inertia 172
12.3 Line search 177
III NONCONVEX ANALYSIS
13 clarke subdifferentials 181
13.1 Definition and basic properties 181
13.2 Fundamental examples 184
13.3 Calculus rules 188
13.4 Characterization in finite dimensions 197
14 semismooth newton methods 200
14.1 Convergence of generalized Newton methods 200
14.2 Newton derivatives 202
15 nonlinear primal-dual proximal splitting 212
15.1 Nonconvex explicit splitting 212
15.2 NL-PDPS formulation and assumptions 215
15.3 Convergence proof 216
16 limiting subdifferentials 222
16.1 Bouligand subdifferentials 222
16.2 Fréchet subdifferentials 223
16.3 Mordukhovich subdifferentials 225
17 ε-subdifferentials and approximate fermat principles 229
17.1 ε-subdifferentials 229
17.2 Smooth spaces 231
17.3 Fuzzy Fermat principles 233
17.4 Approximate Fermat principles and projections 237
IV SET-VALUED ANALYSIS
18 tangent and normal cones 240
18.1 Definitions and examples 240
18.2 Basic relationships and properties 245
18.3 Polarity and limiting relationships 248
18.4 Regularity 257
19 tangent and normal cones of pointwise-defined sets 260
19.1 Derivability 260
19.2 Tangent and normal cones 262
20 derivatives and coderivatives of set-valued mappings 270
20.1 Definitions 270
20.2 Basic properties 273
20.3 Examples 277
20.4 Relation to subdifferentials 286
21 derivatives and coderivatives of pointwise-defined mappings 289
21.1 Proto-differentiability 289
21.2 Graphical derivatives and coderivatives 291
22 calculus for the graphical derivative 295
22.1 Semi-differentiability 295
22.2 Cone transformation formulas 297
22.3 Calculus rules 299
23 calculus for the fréchet coderivative 306
23.1 Semi-codifferentiability 306
23.2 Cone transformation formulas 307
23.3 Calculus rules 309
24 calculus for the clarke graphical derivative 314
24.1 Strict differentiability 314
24.2 Cone transformation formulas 316
24.3 Calculus rules 318
25 calculus for the limiting coderivative 321
25.1 Strict codifferentiability 321
25.2 Partial sequential normal compactness 322
25.3 Cone transformation formulas 325
25.4 Calculus rules 328
26 second-order optimality conditions 331
26.1 Second-order derivatives 331
26.2 Subconvexity 334
26.3 Sufficient and necessary conditions 336
27 lipschitz-like properties and stability 342
27.1 Lipschitz-like properties of set-valued mappings 342
27.2 Neighborhood-based coderivative criteria 348
27.3 Point-based coderivative criteria 353
27.4 Stability with respect to perturbations 359
28 faster convergence from regularity 364
28.1 Subregularity and submonotonicity of subdifferentials 364
28.2 Local linear convergence of explicit splitting 369
PREFACE
Optimization is concerned with finding solutions to problems of the form

$$\min_{x \in U} F(x)$$

for a function $F : X \to \mathbb{R}$ and a set $U \subset X$. Specifically, one considers the following questions:

1. Does this problem admit a solution, i.e., is there an $\bar x \in U$ such that

$$F(\bar x) \le F(x) \quad \text{for all } x \in U?$$

2. Is there an intrinsic characterization of $\bar x$, i.e., one not requiring comparison with all other $x \in U$?

3. How can this $\bar x$ be computed (efficiently)?

4. Is $\bar x$ stable, e.g., with respect to computational errors?
For $U \subset \mathbb{R}^N$, these questions can be answered in turn roughly as follows:

1. If $U$ is compact and $F$ is continuous, the Weierstraß Theorem yields that $F$ attains its minimum at a point $\bar x \in U$ (as well as its maximum).
2. If $F$ is differentiable and $U$ is open, the Fermat principle

$$0 = F'(\bar x)$$

holds.
3. If $F$ is continuously differentiable and $U$ is open, one can apply the steepest descent or gradient method to compute an $\bar x$ satisfying the Fermat principle: Choosing a starting point $x^0$ and setting

$$x^{k+1} = x^k - t_k F'(x^k), \qquad k = 0, \dots,$$

for suitable step sizes $t_k$, we have that $x^k \to \bar x$ for $k \to \infty$.
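The iteration above can be sketched in a few lines. This is an illustrative example only: the quadratic objective, the fixed step size, and the iteration count are assumptions chosen for the sketch, not from the book.

```python
import numpy as np

# Gradient method for F(x) = 1/2 x^T A x - b^T x, so that F'(x) = A x - b
# and the minimizer solves A x = b. A, b, t, and iters are example choices.
def gradient_descent(grad, x0, t=0.1, iters=500):
    x = x0
    for _ in range(iters):
        x = x - t * grad(x)  # x^{k+1} = x^k - t_k F'(x^k), constant t_k = t
    return x

A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 4.0])
x_bar = gradient_descent(lambda x: A @ x - b, np.zeros(2))
print(x_bar)  # converges to the minimizer [1, 1]
```

For this strongly convex quadratic, any constant step size below $2/\lambda_{\max}(A)$ suffices for convergence.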
If $F$ is even twice continuously differentiable, one can apply Newton's method to the Fermat principle: Choosing a suitable starting point $x^0$ and setting

$$x^{k+1} = x^k - F''(x^k)^{-1} F'(x^k), \qquad k = 0, \dots,$$

we have that $x^k \to \bar x$ for $k \to \infty$.
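A corresponding sketch of Newton's method applied to the Fermat principle; the one-dimensional objective $F(x) = \tfrac14 x^4 - x$ (so $F'(x) = x^3 - 1$ and $F''(x) = 3x^2$) is an illustrative assumption.

```python
# Newton's method for F'(x) = 0: x^{k+1} = x^k - F''(x^k)^{-1} F'(x^k).
def newton(dF, ddF, x0, iters=20):
    x = x0
    for _ in range(iters):
        x = x - dF(x) / ddF(x)
    return x

# F(x) = x^4/4 - x, so F'(x) = x^3 - 1 and F''(x) = 3 x^2.
x_bar = newton(lambda x: x**3 - 1.0, lambda x: 3.0 * x**2, x0=2.0)
print(x_bar)  # converges (quadratically) to 1.0, the root of F'
```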
4. If $F$ is twice continuously differentiable and $F''(\bar x)$ is invertible, then the inverse function theorem yields that $(F')^{-1}$ exists locally around $\bar x$ and is continuously differentiable and hence Lipschitz continuous. If we now have computed an approximate solution $\tilde x$ to the Fermat principle with $F'(\tilde x) = w \approx 0$, we obtain from this the estimate

$$\|\bar x - \tilde x\| = \|(F')^{-1}(0) - (F')^{-1}(w)\| \le \|((F')^{-1})'(0)\| \, \|w\|.$$

A small residual in the Fermat principle therefore implies a small error for the minimizer.
However, there are many practically relevant functions that are not differentiable, such as the absolute value or maximum function. The aim of nonsmooth analysis is therefore to find generalized derivative concepts that on the one hand allow the above sketched approach for such functions and on the other hand admit a sufficiently rich calculus to give explicit derivatives for a sufficiently large class of functions. Here we concentrate on the two classes of

i) convex functions,

ii) locally Lipschitz continuous functions,

which together cover a wide spectrum of applications. In particular, the first class will lead us to generalized gradient methods, while the second class forms the basis for generalized Newton methods. To fix ideas, we aim at treating problems of the form
$$\min_{x \in C} \frac{1}{p}\|K(x) - z\|_Y^p + \frac{\alpha}{q}\|x\|_X^q \tag{P}$$

for a convex set $C \subset X$, a (possibly nonlinear but differentiable) operator $K : X \to Y$, $\alpha \ge 0$, and $p, q \in [1, \infty)$ (in particular, $p = 1$ and/or $q = 1$). Such problems are ubiquitous in inverse problems, imaging, and optimal control of differential equations. Hence, we consider optimization in infinite-dimensional function spaces; i.e., we are looking for functions as minimizers. The main benefit (beyond the frequently cleaner notation) is that the developed algorithms become discretization independent: they can be applied to any (reasonable) finite-dimensional approximation, and the details of the approximation (in particular, its fineness) do not influence the convergence behavior of the algorithms. A special role will be played throughout the book by integral functionals and superposition operators that act pointwise on functions, since these allow transferring the often more explicit finite-dimensional calculus to the infinite-dimensional setting.
Nonsmooth analysis and optimization in finite dimensions has a long history; we refer here to the classical textbooks [Mäkelä & Neittaanmäki 1992; Hiriart-Urruty & Lemaréchal 1993a; Hiriart-Urruty & Lemaréchal 1993b; Rockafellar & Wets 1998] as well as the recent [Bagirov, Karmitsa & Mäkelä 2014; Beck 2017]. There also exists a large body of literature on specific nonsmooth optimization problems, in particular ones involving variational inequalities and equilibrium constraints; see, e.g., [Outrata, Kočvara & Zowe 1998; Facchinei & Pang 2003a; Facchinei & Pang 2003b]. In contrast, the infinite-dimensional setting is still being actively developed, with monographs and textbooks focusing on either theory [Clarke 1990; Mordukhovich 2006; Schirotzek 2007; Barbu & Precupanu 2012; Penot 2013; Clarke 2013; Ioffe 2017; Mordukhovich 2018] or algorithms [Ito & Kunisch 2008; Ulbrich 2011] or restricted settings [Bauschke & Combettes 2017]. The aim of this book is thus to draw together results scattered throughout the literature in order to give a unified presentation of theory and algorithms, both first- and second-order, in Banach spaces that is suitable for an advanced class on mathematical optimization. In order to do this, we focus on optimization of nonsmooth functionals rather than nonsmooth constraints; in particular, we do not treat optimization with complementarity or equilibrium constraints, which still sees significant active development in infinite dimensions. Regarding generalized derivatives of set-valued mappings required for the mentioned stability results, we similarly do not aim for a (possibly fuzzy) general theory and instead restrict ourselves to situations where one of a zoo of regularity conditions holds that allows deriving exact results that still apply to problems of the form (P). The general theory can be found in, e.g., [Aubin & Frankowska 1990; Rockafellar & Wets 1998; Mordukhovich 2018; Mordukhovich 2006] (to which this book is, among other things, intended as a gentle introduction).
The book is intended for students and researchers with a solid background in analysis and linear algebra and an interest in the mathematical foundations of nonsmooth optimization. Since we deal with infinite-dimensional spaces, some knowledge of functional analysis is assumed, but the necessary background will be summarized in Chapter 1. Similarly, Chapter 2 collects needed fundamental results from the calculus of variations, including the direct method for the existence of minimizers and the related notion of lower semicontinuity as well as differential calculus in Banach spaces, where the results on pointwise superposition operators on Lebesgue spaces require elementary (Lebesgue) measure and integration theory. Basic familiarity with classical nonlinear optimization is helpful but not necessary.
In Part II we then start our study of convex optimization problems. After introducing convex functionals and their basic properties in Chapter 3, we define our first generalized derivative in Chapter 4: the convex subdifferential, which is no longer a single unique derivative but instead consists of a set of equally admissible subderivatives. Nevertheless, we obtain a useful corresponding Fermat principle as well as calculus rules. A particularly useful calculus rule in convex optimization is Fenchel duality, which assigns to any optimization problem a dual problem that can help treat the original primal problem; this is the content of Chapter 5. In Chapter 6 we slightly change our viewpoint to study the subdifferential as a set-valued monotone operator, which leads us to the corresponding resolvent or proximal
point mapping, which will later become the basis of all algorithms. The following Chapter 7 discusses the relation between convexity and smoothness of the primal and dual problems and introduces the Moreau–Yosida regularization, which has better properties in both regards that can be used to accelerate the convergence of algorithms. We turn to these algorithms in Chapter 8, where we start by deriving a number of popular first-order methods including forward-backward splitting and primal-dual proximal splitting (also known as the Chambolle–Pock method). Their convergence under rather general assumptions is then shown in Chapter 9. If additional convexity properties hold, we can even show convergence rates for the iterates using a general testing approach; this is carried out in Chapter 10. Otherwise we either have to restrict ourselves to more abstract criticality measures as in Chapter 11 or modify the algorithms to include over-relaxation or inertia as in Chapter 12. One philosophy we wish to pass on to the reader here is that the development of optimization methods consists, firstly, in a suitable reformulation of the problem; secondly, in the preconditioning of the raw optimality conditions; and, thirdly, in testing with appropriate operators whether this yields fast convergence.
We leave the convex world in Part III. For locally Lipschitz continuous functions, we introduce the Clarke subdifferential in Chapter 13 and derive calculus rules. Not only is this useful for obtaining a Fermat principle for problems of the form (P), it is also the basis for defining a further generalized derivative that can be used in place of the Hessian in a generalized Newton method. This Newton derivative and the corresponding semismooth Newton method are studied in Chapter 14. We also derive and analyze a variant of the primal-dual proximal splitting method suitable for (P) in Chapter 15. We end this part with a short outlook in Chapters 16 and 17 on further subdifferential concepts that can lead to sharper optimality conditions but in general admit a weaker calculus; we will look at some of these in detail in the next part.
To derive stability properties of minimization problems, we need to study the sensitivity of subdifferentials to perturbations and hence generalized derivative concepts for set-valued mappings; this is the goal of Part IV. The construction of these generalized derivatives is geometric, based on the tangent and normal cones introduced in Chapter 18. From these, we obtain Fréchet and limiting (co)derivatives in Chapter 20 and derive calculus rules for them in Chapters 22 to 25. In particular, we show how to lift the (more extensive) finite-dimensional theory to the special case of pointwise-defined sets and mappings on Lebesgue spaces in Chapters 19 and 21. We then address second-order conditions for nonsmooth nonconvex optimization problems in Chapter 26. In Chapter 27, we use these derivatives to characterize Lipschitz-like properties of set-valued mappings, which are then used to obtain the desired stability properties. We also show in Chapter 28 that these regularity properties imply faster convergence of first-order methods.
This book can serve as a textbook for several different classes:
(i) an introductory course on convex optimization based on Chapters 3 to 10 (excluding Section 3.3 and results on superposition operators) and adding Chapters 11, 12 and 15
as time permits;
(ii) an intermediate course on nonsmooth optimization based on Chapters 3 to 9 (including Section 3.3 and results on superposition operators) together with Chapters 13, 14, 16 and 17;
(iii) an intermediate course on nonsmooth analysis based on Chapters 3 to 6 together with Chapter 13 and Chapters 16 to 20, adding Chapters 21 to 25 as time permits;
(iv) an advanced course on set-valued analysis based on Chapters 16 to 28.
This book is based in part on such graduate lectures given by the first author in 2014 (in slightly different form) and 2016–2017 at the University of Duisburg-Essen and by the second author at the University of Cambridge in 2015 and Escuela Politécnica Nacional in Quito in 2020. Shorter seminars were also delivered at the University of Jyväskylä and the Escuela Politécnica Nacional in 2017. Part IV of the book was also used in a course on variational analysis at the EPN in 2019. Parts of the book were also taught by both authors at the Winter School "Modern Methods in Nonsmooth Optimization" organized by Christian Kanzow and Daniel Wachsmuth at the University of Würzburg in February 2018, for which the notes were further adapted and extended. As such, much (but not all) of this material is classical. In particular, Chapters 3 to 7 as well as Chapter 13 are based on [Barbu & Precupanu 2012; Brokate 2014; Schirotzek 2007; Attouch, Buttazzo & Michaille 2014; Bauschke & Combettes 2017; Clarke 2013], Chapter 14 is based on [Ulbrich 2002; Ito & Kunisch 2008; Schiela 2008], Chapter 16 is extracted from [Mordukhovich 2006], and Chapters 18 to 25 are adapted from [Rockafellar & Wets 1998; Mordukhovich 2006]. Parts of Chapter 17 are adapted from [Ioffe 2017], and other parts are original work. On the other hand, Chapters 8 to 12 as well as Chapters 15, 21 and 28 are adapted from [Valkonen 2020c; Valkonen 2021; Clason, Mazurenko & Valkonen 2019], [Clason & Valkonen 2017b], and [Valkonen 2021], respectively.
Finally, we would like to thank Sebastian Angerhausen, Fernando Jimenez Torres, Ensio Suonperä, Diego Vargas Jaramillo, Daniel Wachsmuth, and in particular Gerd Wachsmuth for carefully reading parts of the manuscript, finding mistakes and bits that could be expressed more clearly, and making helpful suggestions. All remaining errors are of course our own.
Essen and Quito/Helsinki, December 2020
Part I
BACKGROUND
1 FUNCTIONAL ANALYSIS
Functional analysis is the study of infinite-dimensional vector spaces and of the operators acting between them, and has since its foundations in the beginning of the 20th century grown into the lingua franca of modern applied mathematics. In this chapter we collect the basic concepts and results (and, more importantly, fix notations) from linear functional analysis that will be used throughout the rest of the book. For details and proofs, the reader is referred to the standard literature, e.g., [Alt 2016; Brezis 2010; Rynne & Youngson 2008], or to [Clason 2020].
1.1 normed vector spaces
In the following, $X$ will denote a vector space over the field $\mathbb{K}$, where we restrict ourselves for the sake of simplicity to the case $\mathbb{K} = \mathbb{R}$. A mapping $\|\cdot\| : X \to \mathbb{R}^+ \coloneqq [0, \infty)$ is called a norm (on $X$), if for all $x \in X$ there holds

(i) $\|\lambda x\| = |\lambda|\,\|x\|$ for all $\lambda \in \mathbb{K}$;

(ii) $\|x + y\| \le \|x\| + \|y\|$ for all $y \in X$;

(iii) $\|x\| = 0$ if and only if $x = 0 \in X$.
Example 1.1. (i) The following mappings define norms on $X = \mathbb{R}^N$:

$$\|x\|_p = \Big(\sum_{i=1}^N |x_i|^p\Big)^{1/p}, \quad 1 \le p < \infty,$$

$$\|x\|_\infty = \max_{i=1,\dots,N} |x_i|.$$
(ii) The following mappings define norms on $X = \ell^p$ (the space of real-valued sequences for which these terms are finite):

$$\|x\|_p = \Big(\sum_{i=1}^\infty |x_i|^p\Big)^{1/p}, \quad 1 \le p < \infty,$$

$$\|x\|_\infty = \sup_{i \in \mathbb{N}} |x_i|.$$
(iii) The following mappings define norms on $X = L^p(\Omega)$ (the space of real-valued measurable functions on the domain $\Omega \subset \mathbb{R}^d$ for which these terms are finite):

$$\|u\|_p = \Big(\int_\Omega |u(x)|^p \, dx\Big)^{1/p}, \quad 1 \le p < \infty,$$

$$\|u\|_\infty = \operatorname*{ess\,sup}_{x \in \Omega} |u(x)|,$$

where ess sup stands for the essential supremum; for details on these definitions, see, e.g., [Alt 2016].
(iv) The following mapping defines a norm on $X = C(\overline\Omega)$ (the space of continuous functions on $\overline\Omega$):

$$\|u\|_C = \sup_{x \in \overline\Omega} |u(x)|.$$

An analogous norm is defined on $X = C_0(\Omega)$ (the space of continuous functions on $\Omega$ with compact support), if the supremum is taken only over $x \in \Omega$.
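The finite-dimensional norms from Example 1.1 (i) are easy to evaluate numerically; this sketch (using numpy, an assumption of the example) checks them on a concrete vector.

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

norm_1 = np.sum(np.abs(x))           # ||x||_1 = sum of |x_i|
norm_2 = np.sum(np.abs(x)**2)**0.5   # ||x||_2 = (sum of |x_i|^2)^(1/2)
norm_inf = np.max(np.abs(x))         # ||x||_inf = max of |x_i|

print(norm_1, norm_2, norm_inf)  # 7.0 5.0 4.0
```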
If $\|\cdot\|$ is a norm on $X$, the tuple $(X, \|\cdot\|)$ is called a normed vector space, and one frequently denotes this by writing $\|\cdot\|_X$. If the norm is canonical (as in Example 1.1 (ii)–(iv)), it is often omitted, and one speaks simply of "the normed vector space $X$".
Two norms $\|\cdot\|_1$, $\|\cdot\|_2$ are called equivalent on $X$, if there are constants $c_1, c_2 > 0$ such that

$$c_1\|x\|_2 \le \|x\|_1 \le c_2\|x\|_2 \quad \text{for all } x \in X.$$

If $X$ is finite-dimensional, all norms on $X$ are equivalent. However, the corresponding constants $c_1$ and $c_2$ may depend on the dimension $N$ of $X$; avoiding such dimension-dependent constants is one of the main reasons to consider optimization in infinite-dimensional spaces.
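The dimension dependence of the equivalence constants is easy to observe: on $\mathbb{R}^N$ one has $\|x\|_2 \le \|x\|_1 \le \sqrt{N}\,\|x\|_2$, and the upper constant is attained at the all-ones vector, so it necessarily grows with $N$. A quick check (numpy assumed):

```python
import numpy as np

# For x = (1, ..., 1) in R^N, ||x||_1 / ||x||_2 = N / sqrt(N) = sqrt(N):
# no dimension-independent constant c_2 works for all N.
for N in [10, 100, 10000]:
    x = np.ones(N)
    ratio = np.sum(np.abs(x)) / np.sqrt(np.sum(x**2))
    print(N, ratio)  # ratio equals sqrt(N)
```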
If $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ are normed vector spaces with $X \subset Y$, we call $X$ continuously embedded in $Y$, denoted by $X \hookrightarrow Y$, if there exists a $C > 0$ with

$$\|x\|_Y \le C\|x\|_X \quad \text{for all } x \in X.$$
A norm directly induces a notion of convergence, the so-called strong convergence. A sequence $\{x_n\}_{n \in \mathbb{N}} \subset X$ converges (strongly in $X$) to an $x \in X$, denoted by $x_n \to x$, if

$$\lim_{n \to \infty} \|x_n - x\|_X = 0.$$
A set $U \subset X$ is called

• closed, if for every convergent sequence $\{x_n\}_{n \in \mathbb{N}} \subset U$ the limit $x \in X$ is an element of $U$ as well;

• compact, if every sequence $\{x_n\}_{n \in \mathbb{N}} \subset U$ contains a convergent subsequence $\{x_{n_k}\}_{k \in \mathbb{N}}$ with limit $x \in U$.
A mapping $F : X \to Y$ is continuous if and only if $x_n \to x$ implies $F(x_n) \to F(x)$. If $x_n \to x$ and $F(x_n) \to y$ imply that $F(x) = y$ (i.e., $\operatorname{graph} F \subset X \times Y$ is a closed set), we say that $F$ has closed graph.
Further, we define for later use for $x \in X$ and $r > 0$

• the open ball $\mathbb{O}(x, r) \coloneqq \{z \in X \mid \|x - z\|_X < r\}$ and

• the closed ball $\mathbb{B}(x, r) \coloneqq \{z \in X \mid \|x - z\|_X \le r\}$.

The closed ball around $0 \in X$ with radius $1$ is also referred to as the unit ball $\mathbb{B}_X$. A set $U \subset X$ is called

• open, if for all $x \in U$ there exists an $r > 0$ with $\mathbb{O}(x, r) \subset U$ (i.e., all $x \in U$ are interior points of $U$);

• bounded, if it is contained in $\mathbb{B}(0, r)$ for an $r > 0$;

• convex, if for any $x, y \in U$ and $\lambda \in [0, 1]$ also $\lambda x + (1 - \lambda)y \in U$.
In normed vector spaces it always holds that the complement of an open set is closed and vice versa (i.e., the closed sets in the sense of topology are exactly the (sequentially) closed sets as defined above). The definition of a norm directly implies that both open and closed balls are convex.
For arbitrary $U$, we denote by $\operatorname{cl} U$ the closure of $U$, defined as the smallest closed set that contains $U$ (which coincides with the set of all limit points of convergent sequences in $U$); we write $\operatorname{int} U$ for the interior of $U$, which is the largest open set contained in $U$; and we write $\operatorname{bd} U \coloneqq \operatorname{cl} U \setminus \operatorname{int} U$ for the boundary of $U$. Finally, we write $\operatorname{co} U$ for the convex hull of $U$, defined as the smallest convex set that contains $U$.
A normed vector space $X$ is called complete if every Cauchy sequence in $X$ is convergent; in this case, $X$ is called a Banach space. All spaces in Example 1.1 are Banach spaces. Convex subsets of Banach spaces have the following useful property, which derives from the Baire Theorem.
Lemma 1.2. Let $X$ be a Banach space and $U \subset X$ be closed and convex. Then

$$\operatorname{int} U = \{x \in U \mid \text{for all } h \in X \text{ there is a } \delta > 0 \text{ with } x + th \in U \text{ for all } t \in [0, \delta]\}.$$

The set on the right-hand side is called the algebraic interior or core. For this reason, Lemma 1.2 is sometimes referred to as the "core-int Lemma". Note that the inclusion "$\subset$" always holds in normed vector spaces due to the definition of interior points via open balls.
We now consider mappings between normed vector spaces. In the following, let $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ be normed vector spaces, $U \subset X$, and $F : U \to Y$ be a mapping. We denote by

• $\ker F \coloneqq \{x \in U \mid F(x) = 0\}$ the kernel or null space of $F$;

• $\operatorname{ran} F \coloneqq \{F(x) \in Y \mid x \in U\}$ the range of $F$;

• $\operatorname{graph} F \coloneqq \{(x, y) \in X \times Y \mid y = F(x)\}$ the graph of $F$.
We call $F : U \to Y$

• continuous at $x \in U$, if for all $\varepsilon > 0$ there exists a $\delta > 0$ with

$$\|F(x) - F(z)\|_Y \le \varepsilon \quad \text{for all } z \in \mathbb{O}(x, \delta) \cap U;$$

• Lipschitz continuous, if there exists an $L > 0$ (called Lipschitz constant) with

$$\|F(x_1) - F(x_2)\|_Y \le L\|x_1 - x_2\|_X \quad \text{for all } x_1, x_2 \in U;$$

• locally Lipschitz continuous at $x \in U$, if there exist an $L > 0$ and a $\delta = \delta(x, L) > 0$ with

$$\|F(\tilde x) - F(x)\|_Y \le L\|\tilde x - x\|_X \quad \text{for all } \tilde x \in \mathbb{O}(x, \delta) \cap U;$$

• locally Lipschitz continuous near $x \in U$, if there exist an $L > 0$ and a $\delta = \delta(x, L) > 0$ with

$$\|F(x_1) - F(x_2)\|_Y \le L\|x_1 - x_2\|_X \quad \text{for all } x_1, x_2 \in \mathbb{O}(x, \delta) \cap U.$$

We will refer to $\mathbb{O}(x, \delta)$ as the Lipschitz neighborhood of $x$ (for $F$). If $F$ is locally Lipschitz continuous near every $x \in U$, we call $F$ locally Lipschitz continuous on $U$.
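The distinction between local and global Lipschitz continuity can be illustrated numerically with $F(x) = x^2$, which is locally Lipschitz near every point but not Lipschitz on all of $\mathbb{R}$: on $\mathbb{O}(c, 1)$ the best constant is essentially $2|c| + 2$, which grows without bound in $c$. The sampling estimate below is an illustrative sketch (the function, the base points, and the sample size are assumptions, not from the book).

```python
import numpy as np

def local_lipschitz_estimate(f, c, delta, samples=2000, seed=0):
    """Estimate the Lipschitz constant of f on the ball O(c, delta)
    from difference quotients of random sample pairs (a lower bound)."""
    rng = np.random.default_rng(seed)
    pts = c + delta * (2.0 * rng.random(samples) - 1.0)
    x1, x2 = pts[:-1], pts[1:]
    return float(np.max(np.abs(f(x1) - f(x2)) / np.abs(x1 - x2)))

f = lambda x: x**2  # |f(x1) - f(x2)| = |x1 + x2| * |x1 - x2|
L_near_0 = local_lipschitz_estimate(f, c=0.0, delta=1.0)
L_near_100 = local_lipschitz_estimate(f, c=100.0, delta=1.0)
print(L_near_0, L_near_100)  # roughly 2 near 0, roughly 202 near 100
```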
If $T : X \to Y$ is linear, continuity is equivalent to the existence of a constant $C > 0$ with

$$\|Tx\|_Y \le C\|x\|_X \quad \text{for all } x \in X.$$
For this reason, continuous linear mappings are called bounded; one speaks of a bounded linear operator. The space $\mathbb{L}(X; Y)$ of bounded linear operators is itself a normed vector space if endowed with the operator norm

$$\|T\|_{\mathbb{L}(X;Y)} = \sup_{x \in X \setminus \{0\}} \frac{\|Tx\|_Y}{\|x\|_X} = \sup_{\|x\|_X = 1} \|Tx\|_Y = \sup_{\|x\|_X \le 1} \|Tx\|_Y$$

(which is equal to the smallest possible constant $C$ in the definition of continuity). If $(Y, \|\cdot\|_Y)$ is a Banach space, then so is $(\mathbb{L}(X;Y), \|\cdot\|_{\mathbb{L}(X;Y)})$. Finally, if $T \in \mathbb{L}(X;Y)$ is bijective, the inverse $T^{-1} : Y \to X$ is continuous if and only if there exists a $c > 0$ with

$$c\|x\|_X \le \|Tx\|_Y \quad \text{for all } x \in X.$$

In this case, $\|T^{-1}\|_{\mathbb{L}(Y;X)} = c^{-1}$ for the largest possible choice of $c$.
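In finite dimensions with the Euclidean norms, the operator norm of a matrix is its largest singular value. The following sketch (the random matrix and the Monte Carlo sampling are illustrative assumptions) compares the singular value with the supremum over unit vectors from the definition above.

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((3, 3))

# ||T|| = sup_{||x|| = 1} ||Tx||; for Euclidean norms this equals the
# largest singular value of the matrix T.
op_norm = np.linalg.norm(T, 2)

# Approximate the supremum by sampling unit vectors on the sphere:
xs = rng.standard_normal((10000, 3))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
sup_est = np.max(np.linalg.norm(xs @ T.T, axis=1))

print(op_norm, sup_est)  # sup_est is a lower bound close to op_norm
```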
1.2 dual spaces, separation, and weak convergence
Of particular importance to us is the special case $\mathbb{L}(X; Y)$ for $Y = \mathbb{R}$, the space of bounded linear functionals on $X$. In this case, $X^* \coloneqq \mathbb{L}(X; \mathbb{R})$ is called the dual space (or just dual) of $X$. For $x^* \in X^*$ and $x \in X$, we set

$$\langle x^*, x \rangle_X \coloneqq x^*(x) \in \mathbb{R}.$$

This duality pairing notation indicates that we can also interpret it as $x$ acting on $x^*$, which will become important later. The definition of the operator norm immediately implies that

$$\langle x^*, x \rangle_X \le \|x^*\|_{X^*}\|x\|_X \quad \text{for all } x \in X,\ x^* \in X^*. \tag{1.1}$$
In many cases, the dual of a Banach space can be identified with another known Banach space.
Example 1.3. (i) $(\mathbb{R}^N, \|\cdot\|_p)^* \cong (\mathbb{R}^N, \|\cdot\|_q)$ with $p^{-1} + q^{-1} = 1$, where we set $0^{-1} = \infty$ and $\infty^{-1} = 0$. The duality pairing is given by

$$\langle x^*, x \rangle_p = \sum_{i=1}^N x_i^* x_i.$$

(ii) $(\ell^p)^* \cong \ell^q$ for $1 < p < \infty$. The duality pairing is given by

$$\langle x^*, x \rangle_p = \sum_{i=1}^\infty x_i^* x_i.$$

Furthermore, $(\ell^1)^* = \ell^\infty$, but $(\ell^\infty)^*$ is not a sequence space.
(iii) Analogously, $L^p(\Omega)^* \cong L^q(\Omega)$ with $p^{-1} + q^{-1} = 1$ for $1 < p < \infty$. The duality pairing is given by

$$\langle u^*, u \rangle_p = \int_\Omega u^*(x)\, u(x) \, dx.$$

Furthermore, $L^1(\Omega)^* \cong L^\infty(\Omega)$, but $L^\infty(\Omega)^*$ is not a function space.

(iv) $C_0(\Omega)^* \cong \mathcal{M}(\Omega)$, the space of Radon measures; it contains among others the Lebesgue measure as well as Dirac measures $\delta_x$ for $x \in \Omega$, defined via $\delta_x(u) = u(x)$ for $u \in C_0(\Omega)$. The duality pairing is given by

$$\langle u^*, u \rangle_C = \int_\Omega u(x) \, du^*.$$
A central result on dual spaces is the Hahn–Banach Theorem, which comes in both an algebraic and a geometric version.

Theorem 1.4 (Hahn–Banach, algebraic). Let $X$ be a normed vector space and $x \in X \setminus \{0\}$. Then there exists an $x^* \in X^*$ with

$$\|x^*\|_{X^*} = 1 \quad \text{and} \quad \langle x^*, x \rangle_X = \|x\|_X.$$
Theorem 1.5 (Hahn–Banach, geometric). Let $X$ be a normed vector space and $A, B \subset X$ be convex, nonempty, and disjoint.

(i) If $A$ is open, there exist an $x^* \in X^*$ and a $\lambda \in \mathbb{R}$ with

$$\langle x^*, x_1 \rangle_X < \lambda \le \langle x^*, x_2 \rangle_X \quad \text{for all } x_1 \in A,\ x_2 \in B.$$

(ii) If $A$ is closed and $B$ is compact, there exist an $x^* \in X^*$ and a $\lambda \in \mathbb{R}$ with

$$\langle x^*, x_1 \rangle_X \le \lambda < \langle x^*, x_2 \rangle_X \quad \text{for all } x_1 \in A,\ x_2 \in B.$$
The geometric version in particular, whose two statements are also referred to as separation theorems, is of crucial importance in convex analysis. We will also require the following variant, which is known as the Eidelheit Theorem.
Corollary 1.6. Let $X$ be a normed vector space and $A, B \subset X$ be convex and nonempty. If the interior $\operatorname{int} A$ of $A$ is nonempty and disjoint with $B$, there exist an $x^* \in X^* \setminus \{0\}$ and a $\lambda \in \mathbb{R}$ with

$$\langle x^*, x_1 \rangle_X \le \lambda \le \langle x^*, x_2 \rangle_X \quad \text{for all } x_1 \in A,\ x_2 \in B.$$
Proof. Theorem 1.5 (i) yields the existence of $x^*$ and $\lambda$ satisfying the claim for all $x_1 \in \operatorname{int} A$; this inequality is even strict, which also implies $x^* \neq 0$. It thus remains to show that the first inequality also holds for the remaining $x_1 \in A \setminus \operatorname{int} A$. Since $\operatorname{int} A$ is nonempty, there exists an $x_0 \in \operatorname{int} A$, i.e., there is an $r > 0$ with $\mathbb{O}(x_0, r) \subset A$. The convexity of $A$ then implies that $t\tilde x + (1 - t)x_1 \in A$ for all $\tilde x \in \mathbb{O}(x_0, r)$ and $t \in [0, 1]$. Hence,

$$t\,\mathbb{O}(x_0, r) + (1 - t)x_1 = \mathbb{O}(t x_0 + (1 - t)x_1,\, tr) \subset A,$$

and in particular $x(t) \coloneqq t x_0 + (1 - t)x_1 \in \operatorname{int} A$ for all $t \in (0, 1)$. We can thus find a sequence $\{x_n\}_{n \in \mathbb{N}} \subset \operatorname{int} A$ (e.g., $x_n = x(n^{-1})$) with $x_n \to x_1$. Due to the continuity of $x^* \in X^* = \mathbb{L}(X; \mathbb{R})$ we can thus pass to the limit $n \to \infty$ and obtain

$$\langle x^*, x_1 \rangle_X = \lim_{n \to \infty} \langle x^*, x_n \rangle_X \le \lambda. \qquad \Box$$
This can be used to characterize a normed vector space by its dual. For example, a direct consequence of Theorem 1.4 is that the norm on a Banach space can be expressed as an operator norm.
Corollary 1.7. Let $X$ be a Banach space. Then for all $x \in X$,

$$\|x\|_X = \sup_{\|x^*\|_{X^*} \le 1} |\langle x^*, x \rangle_X|,$$

and the supremum is attained.
A vector $x \in X$ can therefore be considered as a linear and, by (1.1), bounded functional on $X^*$, i.e., as an element of the bidual $X^{**} \coloneqq (X^*)^*$. The embedding $X \hookrightarrow X^{**}$ is realized by the canonical injection

$$J : X \to X^{**}, \qquad \langle Jx, x^* \rangle_{X^*} \coloneqq \langle x^*, x \rangle_X \quad \text{for all } x^* \in X^*. \tag{1.2}$$

Clearly, $J$ is linear; Theorem 1.4 furthermore implies that $\|Jx\|_{X^{**}} = \|x\|_X$. If the canonical injection is surjective and we can thus identify $X^{**}$ with $X$, the space $X$ is called reflexive. All finite-dimensional spaces are reflexive, as are the spaces in Example 1.1 (ii) and (iii) for $1 < p < \infty$; however, $\ell^1$ and $\ell^\infty$ as well as $L^1(\Omega)$, $L^\infty(\Omega)$, and $C(\Omega)$ are not reflexive. In general, a normed vector space is reflexive if and only if its dual space is reflexive.
The following consequence of the separation Theorem 1.5 will be of crucial importance in Part IV. For a set $A \subset X$, we define the polar cone

$$A^\circ \coloneqq \{x^* \in X^* \mid \langle x^*, x \rangle_X \le 0 \text{ for all } x \in A\},$$

cf. Figure 1.1. Similarly, we define for $B \subset X^*$ the prepolar cone

$$B_\circ \coloneqq \{x \in X \mid \langle x^*, x \rangle_X \le 0 \text{ for all } x^* \in B\}.$$
Figure 1.1: The polar cone $A^\circ$ is the normal cone at zero to the smallest cone containing $A$.
The bipolar cone of $A \subset X$ is then defined as

$$A^{\circ\circ} \coloneqq (A^\circ)_\circ \subset X.$$

(If $X$ is reflexive, $A^{\circ\circ} = (A^\circ)^\circ$.) For the following statement about polar cones, recall that a set $C \subset X$ is called a cone if $x \in C$ and $\lambda > 0$ imply that $\lambda x \in C$ (such that (pre-, bi-)polar cones are indeed cones).
Theorem 1.8 (bipolar theorem). Let $X$ be a normed vector space and $A \subset X$. Then

(i) $A^\circ$ is closed and convex;

(ii) $A \subset A^{\circ\circ}$;

(iii) if $A \subset B$, then $B^\circ \subset A^\circ$;

(iv) if $C$ is a nonempty, closed, and convex cone, then $C = C^{\circ\circ}$.
Proof. (i): This follows directly from the definition and the continuity of the duality pairing.

(ii): Let $x \in A$ be arbitrary. Then by definition of the polar cone, every $x^* \in A^\circ$ satisfies

$$\langle x^*, x \rangle_X \le 0,$$

i.e., $x \in (A^\circ)_\circ = A^{\circ\circ}$.

(iii): This is immediate from the definition.

(iv): By (ii), we only need to prove $C^{\circ\circ} \subset C$, which we do by contradiction. Assume therefore that there exists an $\bar x \in C^{\circ\circ} \setminus \{0\}$ with $\bar x \notin C$. Applying Theorem 1.5 (ii) to the nonempty (due to (ii)), closed, and convex set $C$ and the disjoint compact convex set $\{\bar x\}$, we obtain $x^* \in X^* \setminus \{0\}$ and $\lambda \in \mathbb{R}$ such that

$$\langle x^*, x \rangle_X \le \lambda < \langle x^*, \bar x \rangle_X \quad \text{for all } x \in C. \tag{1.3}$$
Since C is a cone, the first inequality in (1.3) must also hold for tx ∈ C for every t > 0. This implies that

    ⟨x*, x⟩_X ≤ t⁻¹λ → 0 for t → ∞,

i.e., ⟨x*, x⟩_X ≤ 0 for all x ∈ C must hold, i.e., x* ∈ C°. On the other hand, if λ < 0, we obtain by the same argument that

    ⟨x*, x⟩_X ≤ t⁻¹λ → −∞ for t → 0,

which cannot hold. Hence, we can take λ = 0 in (1.3). Together, we obtain from x̄ ∈ C°° and x* ∈ C° the contradiction

    0 < ⟨x*, x̄⟩_X ≤ 0. ∎
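As a finite-dimensional sketch (our own construction, assuming NumPy), Theorem 1.8 (iv) can be checked for a polyhedral cone in ℝ², where polar cones are cut out by finitely many linear inequalities:

```python
import numpy as np

# Hypothetical illustration in R^2: for the closed convex cone
# C = cone{(1,0), (1,1)} = {x : 0 <= x2 <= x1}, the polar C° is cut out
# by the generators of C, and C°° by the (hand-computed) extreme rays
# (0,-1) and (-1,1) of C°.
gen_C = np.array([[1.0, 0.0], [1.0, 1.0]])         # generators of C
gen_Cpolar = np.array([[0.0, -1.0], [-1.0, 1.0]])  # extreme rays of C°

def in_C(x):            # membership in C = {x : 0 <= x2 <= x1}
    return 0 <= x[1] <= x[0]

def in_polar(x, gens):  # {y : <g, y> <= 0 for all generators g}
    return bool(np.all(gens @ np.asarray(x) <= 1e-12))

in_bipolar = lambda x: in_polar(x, gen_Cpolar)     # C°° via generators of C°

for x in [(2.0, 1.0), (1.0, 0.0), (3.0, 3.0)]:     # points of C
    assert in_C(x) and in_bipolar(x)               # C is contained in C°°
for x in [(-1.0, 0.0), (1.0, 2.0), (0.0, -1.0)]:   # points outside C
    assert not in_C(x) and not in_bipolar(x)       # and outside C°°
```

For this closed convex cone, membership in C and in C°° agree on all test points, matching C = C°°.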
The duality pairing induces further notions of convergence.

(i) A sequence {x_n}_{n∈ℕ} ⊂ X converges weakly (in X) to x ∈ X, denoted by x_n ⇀ x, if

    ⟨x*, x_n⟩_X → ⟨x*, x⟩_X for all x* ∈ X*.

(ii) A sequence {x_n*}_{n∈ℕ} ⊂ X* converges weakly-* (in X*) to x* ∈ X*, denoted by x_n* ⇀* x*, if

    ⟨x_n*, x⟩_X → ⟨x*, x⟩_X for all x ∈ X.
Weak convergence generalizes the concept of componentwise convergence in ℝ^N, which (as can be seen from the proof of the Heine–Borel theorem) is the appropriate concept in the context of compactness. Strong convergence in X implies weak convergence by continuity of the duality pairing; in the same way, strong convergence in X* implies weak-* convergence. If X is reflexive, weak and weak-* convergence (both in X = X**) coincide. In finite-dimensional spaces, all convergence notions coincide.
Weakly convergent sequences are always bounded; if X is a Banach space, so are weakly-* convergent sequences. If x_n ⇀ x and x_n* → x*, or x_n → x and x_n* ⇀* x*, then ⟨x_n*, x_n⟩_X → ⟨x*, x⟩_X. However, the duality pairing of weakly(-*) convergent sequences does not converge in general.
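A standard example (not stated in the text above) of weak but not strong convergence is the sequence of unit vectors in ℓ²; the following sketch uses a finite truncation purely for numerical illustration:

```python
import numpy as np

# In l^2, the unit vectors e_n converge weakly to 0, since
# <e_n, y> = y_n -> 0 for every y in l^2, but ||e_n|| = 1 for all n,
# so there is no strong convergence to 0.
N = 10_000
y = 1.0 / np.arange(1, N + 1)            # a fixed element of l^2 (truncated)

pairings = []
for n in [10, 100, 1000]:
    e_n = np.zeros(N)
    e_n[n - 1] = 1.0
    assert abs(np.linalg.norm(e_n) - 1.0) < 1e-12  # norm stays 1
    pairings.append(float(e_n @ y))                # <e_n, y> = 1/n

assert pairings == [0.1, 0.01, 0.001]    # duality pairings tend to 0
```

The pairings decay like 1/n while the norms remain 1, so no subsequence can converge strongly to the weak limit 0.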
As for strong convergence, one defines weak(-*) continuity and closedness of mappings as well as weak(-*) closedness and compactness of sets. The last property is of fundamental importance in optimization; its characterization is therefore a central result of this chapter.

Theorem 1.9 (Eberlein–Šmulyan). If X is a normed vector space, then the closed unit ball 𝔹_X is weakly compact if and only if X is reflexive.
Hence in a reflexive space, all bounded sequences contain a weakly (but in general not strongly) convergent subsequence. Note that weak closedness is a stronger claim than closedness, since the property has to hold for more sequences. For convex sets, however, both concepts coincide.

Lemma 1.10. Let X be a normed vector space and U ⊂ X be convex. Then U is weakly closed if and only if U is closed.

Proof. Weakly closed sets are always closed since a convergent sequence is also weakly convergent. Let now U ⊂ X be convex, closed, and nonempty (otherwise nothing has to be shown) and consider a sequence {x_n}_{n∈ℕ} ⊂ U with x_n ⇀ x ∈ X. Assume that x ∈ X \ U. Then the sets U and {x} satisfy the premise of Theorem 1.5 (ii); we thus find an x* ∈ X* and a λ ∈ ℝ with

    ⟨x*, x_n⟩_X ≤ λ < ⟨x*, x⟩_X for all n ∈ ℕ.

Passing to the limit n → ∞ in the first inequality yields the contradiction

    ⟨x*, x⟩_X < ⟨x*, x⟩_X. ∎
If X is not reflexive (e.g., X = L^∞(Ω)), we have to turn to weak-* convergence.

Theorem 1.11 (Banach–Alaoglu). If X is a separable normed vector space (i.e., contains a countable dense subset), then 𝔹_{X*} is weakly-* compact.

By the Weierstraß approximation theorem, both C(Ω) and L^p(Ω) for 1 ≤ p < ∞ are separable; also, ℓ^p is separable for 1 ≤ p < ∞. Hence, bounded and weakly-* closed balls in ℓ^∞, L^∞(Ω), and 𝓜(Ω) are weakly-* compact. However, these spaces themselves are not separable.
We also have the following straightforward improvement of Theorem 1.8 (i).

Lemma 1.12. Let X be a separable normed vector space and A ⊂ X. Then A° is weakly-* closed and convex.

Note, however, that arbitrary closed convex sets in nonreflexive spaces do not have to be weakly-* closed. Finally, we will also need the following "weak-*" separation theorem, whose proof is analogous to the proof of Theorem 1.5 (using the fact that the linear weakly-* continuous functionals are exactly those of the form x* ↦ ⟨x*, x⟩_X for some x ∈ X); see also [Rudin 1991, Theorem 3.4(b)].
Theorem 1.13. Let X be a normed vector space, A ⊂ X* be a nonempty, convex, and weakly-* closed subset, and x* ∈ X* \ A. Then there exist an x ∈ X and a λ ∈ ℝ with

    ⟨z*, x⟩_X ≤ λ < ⟨x*, x⟩_X for all z* ∈ A.
Since a normed vector space is characterized by its dual, this is also the case for linear operators acting on this space. For any T ∈ 𝕃(X;Y), the adjoint operator T* ∈ 𝕃(Y*;X*) is defined via

    ⟨T*y*, x⟩_X = ⟨y*, Tx⟩_Y for all x ∈ X, y* ∈ Y*.

It always holds that ‖T*‖_{𝕃(Y*;X*)} = ‖T‖_{𝕃(X;Y)}. Furthermore, the continuity of T implies that T* is weakly-* continuous (and T weakly continuous).
1.3 hilbert spaces
Especially strong duality properties hold in Hilbert spaces. A mapping (· | ·) : X × X → ℝ on a vector space X over ℝ is called an inner product if

(i) (αx + βy | z) = α(x | z) + β(y | z) for all x, y, z ∈ X and α, β ∈ ℝ;

(ii) (x | y) = (y | x) for all x, y ∈ X;

(iii) (x | x) ≥ 0 for all x ∈ X, with equality if and only if x = 0.
An inner product induces a norm

    ‖x‖_X := √(x | x)_X,

which satisfies the Cauchy–Schwarz inequality

    (x | y)_X ≤ ‖x‖_X ‖y‖_X.
If X is complete with respect to the induced norm (i.e., if (X, ‖·‖_X) is a Banach space), then X is called a Hilbert space; if the inner product is canonical, it is frequently omitted, and the Hilbert space is simply denoted by X. The spaces in Example 1.3 (i)–(iii) for p = 2 (= q) are all Hilbert spaces, where the inner product coincides with the duality pairing and induces the canonical norm.
Directly from the definition of the induced norm we obtain the binomial expansion

(1.4)    ‖x + y‖_X² = ‖x‖_X² + 2(x | y)_X + ‖y‖_X²,

which in turn can be used to verify the three-point identity

(1.5)    (x − y | x − z)_X = ½‖x − y‖_X² − ½‖y − z‖_X² + ½‖x − z‖_X² for all x, y, z ∈ X.
(This can be seen as a generalization of the classical Pythagorean theorem in plane geometry.)
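Both identities can be checked numerically; the following sketch (our own, assuming NumPy) does so in the Hilbert space ℝ⁵ with the Euclidean inner product:

```python
import numpy as np

# Numerical check of the binomial expansion (1.4) and the three-point
# identity (1.5) for random vectors in R^5.
rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 5))

lhs14 = np.dot(x + y, x + y)
rhs14 = np.dot(x, x) + 2 * np.dot(x, y) + np.dot(y, y)
assert abs(lhs14 - rhs14) < 1e-12        # (1.4)

lhs15 = np.dot(x - y, x - z)
rhs15 = 0.5 * np.dot(x - y, x - y) - 0.5 * np.dot(y - z, y - z) \
        + 0.5 * np.dot(x - z, x - z)
assert abs(lhs15 - rhs15) < 1e-12        # (1.5)
```

Expanding the right-hand side of (1.5) with (1.4) and cancelling terms reproduces the left-hand side exactly, which is what the assertions confirm up to rounding error.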
The relevant point in our context is that the dual of a Hilbert space X can be identified with X itself.

Theorem 1.14 (Fréchet–Riesz). Let X be a Hilbert space. Then for each x* ∈ X* there exists a unique z_{x*} ∈ X with ‖x*‖_{X*} = ‖z_{x*}‖_X and

    ⟨x*, x⟩_X = (x | z_{x*})_X for all x ∈ X.

The element z_{x*} is called the Riesz representation of x*. The (linear) mapping J_X : X* → X, x* ↦ z_{x*}, is called the Riesz isomorphism, and can be used to show that every Hilbert space is reflexive.
Theorem 1.14 allows us to use the inner product instead of the duality pairing in Hilbert spaces. For example, a sequence {x_n}_{n∈ℕ} ⊂ X converges weakly to x ∈ X if and only if

    (x_n | z)_X → (x | z)_X for all z ∈ X.

This implies that if x_n ⇀ x and in addition ‖x_n‖_X → ‖x‖_X (in which case we say that x_n strictly converges to x), then

(1.6)    ‖x_n − x‖_X² = ‖x_n‖_X² − 2(x_n | x)_X + ‖x‖_X² → 0,

i.e., x_n → x. A normed vector space in which strict convergence implies strong convergence is said to have the Radon–Riesz property.
Similar statements hold for linear operators on Hilbert spaces. For a linear operator T ∈ 𝕃(X;Y) between Hilbert spaces X and Y, the Hilbert space adjoint operator T_⋆ ∈ 𝕃(Y;X) is defined via

    (T_⋆ y | x)_X = (Tx | y)_Y for all x ∈ X, y ∈ Y.

If T_⋆ = T, the operator T is called self-adjoint. A self-adjoint operator is called positive definite if there exists a c > 0 such that

    (Tx | x)_X ≥ c‖x‖_X² for all x ∈ X.

In this case, T has a bounded inverse T⁻¹ with ‖T⁻¹‖_{𝕃(X;X)} ≤ c⁻¹. We will also use the notation T ≥ S for two operators T, S : X → X if

    (Tx | x)_X ≥ (Sx | x)_X for all x ∈ X.

Hence T is positive definite if and only if T ≥ c·Id for some c > 0; if T ≥ 0, we say that T is merely positive semi-definite.
The Hilbert space adjoint is related to the (Banach space) adjoint via T_⋆ = J_X T* J_Y⁻¹. If the context is obvious, we will not distinguish the two in notation. Similarly, we will also (by a moderate abuse of notation) use angled brackets to denote inner products in Hilbert spaces except where we need to refer to both at the same time (which will rarely be the case, and the danger of confusing inner products with elements of a product space is much greater).
2 CALCULUS OF VARIATIONS
We first consider the question of the existence of minimizers of a (nonlinear) functional F : U → ℝ for a subset U of a Banach space X. Answering such questions is one of the goals of the calculus of variations.
2.1 the direct method
It is helpful to include the constraint x ∈ U in the functional by extending F to all of X with the value ∞. We thus consider

    F̄ : X → ℝ̄ := ℝ ∪ {∞},    F̄(x) = F(x) if x ∈ U, and F̄(x) = ∞ if x ∈ X \ U.

We use the usual arithmetic on ℝ̄, i.e., t < ∞ and t + ∞ = ∞ for all t ∈ ℝ; subtraction and multiplication of negative numbers with ∞ and in particular F̄(x) = −∞ are not allowed, however. Thus if there is any x ∈ U at all, a minimizer x̄ of F̄ necessarily must lie in U.

We thus consider from now on functionals F : X → ℝ̄. The set on which F is finite is called the effective domain

    dom F := {x ∈ X | F(x) < ∞}.

If dom F ≠ ∅, the functional F is called proper.
We now generalize the Weierstraß theorem (every real-valued continuous function on a compact set attains its minimum and maximum) to Banach spaces and in particular to functionals of the form F̄. Since we are only interested in minimizers, we only require a "one-sided" continuity: We call F lower semicontinuous at x ∈ X if

    F(x) ≤ lim inf_{n→∞} F(x_n) for every {x_n}_{n∈ℕ} ⊂ X with x_n → x,

see Figure 2.1. Analogously, we define weakly(-*) lower semicontinuous functionals via weakly(-*) convergent sequences. Finally, F is called coercive if for every sequence {x_n}_{n∈ℕ} ⊂ X with ‖x_n‖_X → ∞ we also have F(x_n) → ∞.
Figure 2.1: Illustration of lower semicontinuity: two functions F₁, F₂ : ℝ → ℝ and a sequence {x_n}_{n∈ℕ} realizing their (identical) limes inferior. (a) F₁ is lower semicontinuous at x̄; (b) F₂ is not lower semicontinuous at x̄.
We now have everything at hand to prove the central existence result in the calculus of variations. The strategy for its proof is known as the direct method.¹

Theorem 2.1. Let X be a reflexive Banach space and F : X → ℝ̄ be proper, coercive, and weakly lower semicontinuous. Then the minimization problem

    min_{x∈X} F(x)

has a solution x̄ ∈ dom F.
Proof. The proof can be separated into three steps.

(i) Pick a minimizing sequence.

Since F is proper, there exists an m := inf_{x∈X} F(x) < ∞ (although m = −∞ is not excluded so far). We can thus find a sequence {y_n}_{n∈ℕ} ⊂ ran F \ {∞} ⊂ ℝ with y_n → m, i.e., there exists a sequence {x_n}_{n∈ℕ} ⊂ X with

    F(x_n) → m = inf_{x∈X} F(x).

Such a sequence is called a minimizing sequence. Note that from the convergence of {F(x_n)}_{n∈ℕ} we cannot conclude the convergence of {x_n}_{n∈ℕ} (yet).

(ii) Show that the minimizing sequence contains a convergent subsequence.

Assume to the contrary that {x_n}_{n∈ℕ} is unbounded, i.e., that ‖x_n‖_X → ∞ for n → ∞. The coercivity of F then implies that F(x_n) → ∞ as well, in contradiction to F(x_n) → m < ∞ by definition of the minimizing sequence. Hence, the sequence is bounded, i.e.,

¹This strategy is applied so often in the literature that one usually just writes "Existence of a minimizer follows from the direct method." or even just "Existence follows from standard arguments." The basic idea goes back to Hilbert; the version based on lower semicontinuity which we use here is due to Leonida Tonelli (1885–1946), who through it had a lasting influence on the modern calculus of variations.
there is an M > 0 with ‖x_n‖_X ≤ M for all n ∈ ℕ. In particular, {x_n}_{n∈ℕ} ⊂ 𝔹̄(0, M). The Eberlein–Šmulyan Theorem 1.9 therefore implies the existence of a weakly convergent subsequence {x_{n_k}}_{k∈ℕ} with limit x̄ ∈ X. (This limit is a candidate for the minimizer.)
(iii) Show that its limit is a minimizer.
From the denition of the minimizing sequence, we also have ๐น (๐ฅ๐๐ ) โ ๐ for๐ โ โ. Together with the weak lower semicontinuity of ๐น and the denition of theinmum we thus obtain
inf๐ฅโ๐
๐น (๐ฅ) โค ๐น (๐ฅ) โค lim inf๐โโ
๐น (๐ฅ๐๐ ) = ๐ = inf๐ฅโ๐
๐น (๐ฅ) < โ.
This implies that ๐ฅ โ dom ๐น and that inf๐ฅโ๐ ๐น (๐ฅ) = ๐น (๐ฅ) > โโ. Hence, the inmumis attained in ๐ฅ which is therefore the desired minimizer. ๏ฟฝ
Remark 2.2. If X is not reflexive but is the dual of a separable Banach space, we can argue analogously for weakly-* lower semicontinuous functionals using the Banach–Alaoglu Theorem 1.11.
Note how the topology on X used in the proof is restricted in steps (ii) and (iii): Step (ii) profits from a coarse topology (in which more sequences are convergent), while step (iii) profits from a fine topology (the fewer sequences are convergent, the easier it is to satisfy the lim inf conditions). Since in the cases of interest to us no more than boundedness of a minimizing sequence can be expected, we cannot use a topology finer than the weak topology. We thus have to ask whether a sufficiently large class of (interesting) functionals are weakly lower semicontinuous.
A first example is the class of bounded linear functionals: For any x* ∈ X*, the functional

    F : X → ℝ, x ↦ ⟨x*, x⟩_X,

is weakly continuous by definition of weak convergence and hence a fortiori weakly lower semicontinuous. Another advantage of (weak) lower semicontinuity is that it is preserved under certain operations.
Lemma 2.3. Let X and Y be Banach spaces and F : X → ℝ̄ be weakly(-*) lower semicontinuous. Then the following functionals are weakly(-*) lower semicontinuous as well:

(i) αF for all α ≥ 0;

(ii) F + G for G : X → ℝ̄ weakly(-*) lower semicontinuous;

(iii) φ ∘ F for φ : ℝ̄ → ℝ̄ lower semicontinuous and monotonically increasing;

(iv) F ∘ Φ for Φ : Y → X weakly(-*) continuous, i.e., y_n ⇀(*) y implies Φ(y_n) ⇀(*) Φ(y);
(v) x ↦ sup_{i∈I} F_i(x) with F_i : X → ℝ̄ weakly(-*) lower semicontinuous for all i ∈ I and an arbitrary set I.

Note that (v) does not hold for continuous functions.
Proof. We only show the claim for the case of weak lower semicontinuity; the statements for weak-* lower semicontinuity follow by the same arguments.

Statements (i) and (ii) follow directly from the properties of the limes inferior.

For statement (iii), it first follows from the monotonicity and lower semicontinuity of φ that x_n ⇀ x implies

    φ(F(x)) ≤ φ(lim inf_{n→∞} F(x_n)).

It remains to show that the right-hand side can be bounded by lim inf_{n→∞} φ(F(x_n)). For that purpose, we consider the subsequence {φ(F(x_{n_k}))}_{k∈ℕ} which realizes the lim inf, i.e., for which lim inf_{n→∞} φ(F(x_n)) = lim_{k→∞} φ(F(x_{n_k})). By passing to a further subsequence, which we index by k′ for simplicity, we can also obtain that lim inf_{n→∞} F(x_n) = lim_{k′→∞} F(x_{n_{k′}}). Since the lim inf restricted to a subsequence can never be smaller than that of the full sequence, the monotonicity of φ together with its lower semicontinuity now implies that

    φ(lim inf_{n→∞} F(x_n)) ≤ φ(lim_{k′→∞} F(x_{n_{k′}})) ≤ lim inf_{k′→∞} φ(F(x_{n_{k′}})) = lim inf_{n→∞} φ(F(x_n)),

where we have used in the last step that a subsequence of a convergent sequence has the same limit (which coincides with the lim inf).
Statement (iv) follows directly from the weak continuity of Φ, as y_n ⇀ y implies that x_n := Φ(y_n) ⇀ Φ(y) =: x, and the lower semicontinuity of F yields

    F(Φ(y)) ≤ lim inf_{n→∞} F(Φ(y_n)).

Finally, let {x_n}_{n∈ℕ} be a weakly convergent sequence with limit x ∈ X. Then the definition of the supremum implies that

    F_j(x) ≤ lim inf_{n→∞} F_j(x_n) ≤ lim inf_{n→∞} sup_{i∈I} F_i(x_n) for all j ∈ I.

Taking the supremum over all j ∈ I on both sides yields statement (v). ∎
Corollary 2.4. If X is a Banach space, the norm ‖·‖_X is proper, coercive, and weakly lower semicontinuous. Similarly, the dual norm ‖·‖_{X*} is proper, coercive, and weakly-* lower semicontinuous.
Proof. Coercivity and dom ‖·‖_X = X follow directly from the definition. Weak lower semicontinuity follows from Lemma 2.3 (v) and Corollary 1.7 since

    ‖x‖_X = sup_{‖x*‖_{X*}≤1} |⟨x*, x⟩_X|.

The claim for ‖·‖_{X*} follows analogously using the definition of the operator norm in place of Corollary 1.7. ∎
Another frequently occurring functional is the indicator function² of a set U ⊂ X, defined as

    δ_U(x) = 0 if x ∈ U, and δ_U(x) = ∞ if x ∈ X \ U.

The purpose of this definition is of course to reduce the minimization of a functional F : X → ℝ̄ over U to the minimization of F̄ := F + δ_U over X. The following result is therefore important for showing the existence of a minimizer.
Lemma 2.5. Let X be a Banach space and U ⊂ X. Then δ_U : X → ℝ̄ is

(i) proper if U is nonempty;

(ii) weakly lower semicontinuous if U is convex and closed;

(iii) coercive if U is bounded.

Proof. Statement (i) is clear. For (ii), consider a weakly convergent sequence {x_n}_{n∈ℕ} ⊂ X with limit x ∈ X. If x ∈ U, then δ_U ≥ 0 immediately yields

    δ_U(x) = 0 ≤ lim inf_{n→∞} δ_U(x_n).

Let now x ∉ U. Since U is convex and closed and hence by Lemma 1.10 also weakly closed, there must be an N ∈ ℕ with x_n ∉ U for all n ≥ N (otherwise we could, by passing to a subsequence if necessary, construct a sequence with x_n ⇀ x ∈ U, in contradiction to the assumption). Thus, δ_U(x_n) = ∞ for all n ≥ N, and therefore

    δ_U(x) = ∞ = lim inf_{n→∞} δ_U(x_n).

For (iii), let U be bounded, i.e., there exists an M > 0 with U ⊂ 𝔹̄(0, M). If ‖x_n‖_X → ∞, then there exists an N ∈ ℕ with ‖x_n‖_X > M for all n ≥ N, and thus x_n ∉ 𝔹̄(0, M) ⊃ U for all n ≥ N. Hence, δ_U(x_n) → ∞ as well. ∎
²Not to be confused with the characteristic function 𝟙_U with 𝟙_U(x) = 1 for x ∈ U and 0 else.
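The reduction of constrained to unconstrained minimization via the indicator function can be sketched numerically (a grid search of our own construction, not from the text):

```python
import math
import numpy as np

# Minimize F(x) = (x - 2)^2 over U = [-1, 1] by minimizing F + delta_U
# over a grid in R (for illustration only).
F = lambda x: (x - 2.0) ** 2
delta_U = lambda x: 0.0 if -1.0 <= x <= 1.0 else math.inf

grid = np.linspace(-3.0, 3.0, 601)
values = [F(x) + delta_U(x) for x in grid]
x_min = grid[int(np.argmin(values))]

# The unconstrained minimizer of F is x = 2, but any minimizer of
# F + delta_U must lie in U; here it is the boundary point x = 1
# (up to grid resolution).
assert abs(x_min - 1.0) < 0.02
```

Points outside U receive the value ∞ and are never selected, so the minimizer of the extended functional automatically respects the constraint.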
2.2 differential calculus in banach spaces
To characterize minimizers of functionals on infinite-dimensional spaces using the Fermat principle, we transfer the classical derivative concepts to Banach spaces.

Let X and Y be Banach spaces, F : X → Y be a mapping, and x, h ∈ X be given.
• If the one-sided limit

    F′(x; h) := lim_{t↘0} (F(x + th) − F(x))/t ∈ Y

(where t ↘ 0 denotes the limit for arbitrary positive decreasing null sequences) exists, it is called the directional derivative of F at x in direction h.

• If F′(x; h) exists for all h ∈ X and

    DF(x) : X → Y, h ↦ F′(x; h)

defines a bounded linear operator, we call F Gâteaux differentiable (at x) and DF(x) ∈ 𝕃(X;Y) its Gâteaux derivative.
• If additionally

    lim_{‖h‖_X→0} ‖F(x + h) − F(x) − DF(x)h‖_Y / ‖h‖_X = 0,

then F is called Fréchet differentiable (at x) and F′(x) := DF(x) ∈ 𝕃(X;Y) its Fréchet derivative.

• If additionally the mapping F′ : X → 𝕃(X;Y) is (Lipschitz) continuous, we call F (Lipschitz) continuously differentiable.
The difference between Gâteaux and Fréchet differentiability lies in the approximation error of F near x by F(x) + DF(x)h: While it only has to be bounded in ‖h‖_X (i.e., linear in ‖h‖_X) for a Gâteaux differentiable function, it has to be superlinear in ‖h‖_X if F is Fréchet differentiable. (For a fixed direction h, this is of course also the case for Gâteaux differentiable functions; Fréchet differentiability thus additionally requires a uniformity in h.) We also point out that continuous differentiability always entails Fréchet differentiability.
Remark 2.6. Sometimes a weaker notion than continuous differentiability is used. A mapping F : X → Y is called strictly differentiable at x if

(2.1)    lim_{y→x, ‖h‖_X→0} ‖F(y + h) − F(y) − F′(x)h‖_Y / ‖h‖_X = 0.

The benefit of this definition over that of continuous differentiability is that the limit process is now in the function F rather than the derivative F′; strict differentiability can therefore hold even if every neighborhood of x contains points where F is not differentiable. However, if F is differentiable
everywhere in a neighborhood of x, then F is strictly differentiable if and only if F′ is continuous; see [Dontchev & Rockafellar 2014, Proposition 1D.7]. Although many results of Chapters 13 to 25 actually hold under the weaker assumption of strict differentiability, we will therefore work only with the more standard notion of continuous differentiability.
If F is Gâteaux differentiable, the Gâteaux derivative can be computed via

    DF(x)h = (d/dt F(x + th))|_{t=0}.
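This formula can be verified by a finite-difference sketch (our own construction, assuming NumPy) for the model functional F(x) = ½‖x‖², whose directional derivative is (x | h):

```python
import numpy as np

# Finite-difference check of DF(x)h = d/dt F(x+th)|_{t=0} for
# F(x) = (1/2)||x||^2 on R^3, where <DF(x), h> = (x | h).
F = lambda x: 0.5 * float(np.dot(x, x))
x = np.array([1.0, -2.0, 0.5])
h = np.array([0.3, 0.1, -1.0])

exact = float(np.dot(x, h))            # the directional derivative (x | h)
for t in [1e-2, 1e-4, 1e-6]:
    fd = (F(x + t * h) - F(x)) / t     # one-sided difference quotient
    # for this quadratic F the error is exactly (t/2)||h||^2 = 0.55 t
    assert abs(fd - exact) <= 0.6 * t
```

The difference quotients converge linearly in t to (x | h), reflecting that the quadratic remainder of F is exactly (t²/2)‖h‖².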
Bounded linear operators F ∈ 𝕃(X;Y) are obviously Fréchet differentiable with derivative F′(x) = F ∈ 𝕃(X;Y) for all x ∈ X. Further derivatives can be obtained through the usual calculus rules, whose proofs in Banach spaces are exactly as in ℝ^N. As an example, we prove a chain rule.
Theorem 2.7. Let X, Y, and Z be Banach spaces, and let F : X → Y be Fréchet differentiable at x ∈ X and G : Y → Z be Fréchet differentiable at y := F(x) ∈ Y. Then G ∘ F is Fréchet differentiable at x and

    (G ∘ F)′(x) = G′(F(x)) ∘ F′(x).
Proof. For h ∈ X we have

    (G ∘ F)(x + h) − (G ∘ F)(x) = G(F(x + h)) − G(F(x)) = G(y + g) − G(y)

with g := F(x + h) − F(x). The Fréchet differentiability of G thus implies that

    ‖(G ∘ F)(x + h) − (G ∘ F)(x) − G′(y)g‖_Z = r₁(‖g‖_Y)

with r₁(t)/t → 0 for t → 0. The Fréchet differentiability of F further implies

    ‖g − F′(x)h‖_Y = r₂(‖h‖_X)

with r₂(t)/t → 0 for t → 0. In particular,

(2.2)    ‖g‖_Y ≤ ‖F′(x)h‖_Y + r₂(‖h‖_X).

Hence, with c := ‖G′(F(x))‖_{𝕃(Y;Z)} we have

    ‖(G ∘ F)(x + h) − (G ∘ F)(x) − G′(F(x))F′(x)h‖_Z ≤ r₁(‖g‖_Y) + c r₂(‖h‖_X).

If ‖h‖_X → 0, we obtain from (2.2) and F′(x) ∈ 𝕃(X;Y) that ‖g‖_Y → 0 as well, and the claim follows. ∎
A similar rule for Gâteaux derivatives does not hold, however.

Of special importance in Part IV will be the following inverse function theorem, whose proof can be found, e.g., in [Renardy & Rogers 2004, Theorem 10.4].
Theorem 2.8 (inverse function theorem). Let F : X → Y be a continuously differentiable mapping between the Banach spaces X and Y, and let x ∈ X. If F′(x) : X → Y is bijective, then there exists an open set V ⊂ Y with F(x) ∈ V such that F⁻¹ : V → X exists and is continuously differentiable.
Of particular relevance in optimization is of course the special case F : X → ℝ, where DF(x) ∈ 𝕃(X;ℝ) = X* (if the Gâteaux derivative exists). Following the usual notation from Section 1.2, we will then write F′(x; h) = ⟨DF(x), h⟩_X for the directional derivative in direction h ∈ X. Our first result is the classical Fermat principle characterizing minimizers of a differentiable function.
Theorem 2.9 (Fermat principle). Let F : X → ℝ be Gâteaux differentiable and x̄ ∈ X be a local minimizer of F. Then DF(x̄) = 0, i.e.,

    ⟨DF(x̄), h⟩_X = 0 for all h ∈ X.

Proof. Let h ∈ X be arbitrary. Since x̄ is a local minimizer, the core–int Lemma 1.2 implies that there exists an ε > 0 such that F(x̄) ≤ F(x̄ + th) for all t ∈ (0, ε), i.e.,

(2.3)    0 ≤ (F(x̄ + th) − F(x̄))/t → F′(x̄; h) = ⟨DF(x̄), h⟩_X for t ↘ 0,

where we have used the Gâteaux differentiability and hence directional differentiability of F. Since the right-hand side is linear in h, the same argument for −h yields ⟨DF(x̄), h⟩_X ≤ 0 and therefore the claim. ∎
We will also need the following version of the mean value theorem.

Theorem 2.10. Let F : X → ℝ be Fréchet differentiable. Then for all x, h ∈ X,

    F(x + h) − F(x) = ∫₀¹ ⟨F′(x + th), h⟩_X dt.
Proof. Consider the scalar function

    f : [0, 1] → ℝ, t ↦ F(x + th).

From Theorem 2.7 we obtain that f (as a composition of mappings on Banach spaces) is differentiable with

    f′(t) = ⟨F′(x + th), h⟩_X,

and the fundamental theorem of calculus in ℝ yields that

    F(x + h) − F(x) = f(1) − f(0) = ∫₀¹ f′(t) dt = ∫₀¹ ⟨F′(x + th), h⟩_X dt. ∎
As in classical analysis, this result is useful for relating local and pointwise properties of smooth functions. A typical example is the following lemma.

Lemma 2.11. Let F : X → Y be continuously Fréchet differentiable in a neighborhood U of x ∈ X. Then F is locally Lipschitz continuous near x.
Proof. Since F′ : U → 𝕃(X;Y) is continuous at x, there exists a δ > 0 with ‖F′(z) − F′(x)‖_{𝕃(X;Y)} ≤ 1 and hence ‖F′(z)‖_{𝕃(X;Y)} ≤ 1 + ‖F′(x)‖_{𝕃(X;Y)} for all z ∈ 𝔹̄(x, δ) ⊂ U. For any x₁, x₂ ∈ 𝔹̄(x, δ) we also have x₂ + t(x₁ − x₂) ∈ 𝔹̄(x, δ) for all t ∈ [0, 1] (since balls in normed vector spaces are convex), and hence Theorem 2.10 implies that

    ‖F(x₁) − F(x₂)‖_Y ≤ ∫₀¹ ‖F′(x₂ + t(x₁ − x₂))‖_{𝕃(X;Y)} ‖x₁ − x₂‖_X dt
                     ≤ (1 + ‖F′(x)‖_{𝕃(X;Y)}) ‖x₁ − x₂‖_X,

and thus F is locally Lipschitz continuous near x with constant L = 1 + ‖F′(x)‖_{𝕃(X;Y)}. ∎
Note that since the Gâteaux derivative of F : X → ℝ is an element of X*, it cannot be added to elements in X (as required for, e.g., a steepest descent method). However, in Hilbert spaces (and in particular in ℝ^N), we can use the Fréchet–Riesz Theorem 1.14 to identify DF(x) ∈ X* with an element ∇F(x) ∈ X, called the gradient of F at x, in a canonical way via

    ⟨DF(x), h⟩_X = (∇F(x) | h)_X for all h ∈ X.

As an example, let us consider the functional F(x) = ½‖x‖_X², where the norm is induced by the inner product. Then we have for all x, h ∈ X that
    F′(x; h) = lim_{t↘0} [½(x + th | x + th)_X − ½(x | x)_X]/t = (x | h)_X = ⟨DF(x), h⟩_X,

since the inner product is linear in h for fixed x. Hence, the squared norm is Gâteaux differentiable at every x ∈ X with derivative DF(x) = h ↦ (x | h)_X ∈ X*; it is even Fréchet differentiable since

    lim_{‖h‖_X→0} |½‖x + h‖_X² − ½‖x‖_X² − (x | h)_X| / ‖h‖_X = lim_{‖h‖_X→0} ½‖h‖_X = 0.
The gradient ∇F(x) ∈ X by definition is given by

    (∇F(x) | h)_X = ⟨DF(x), h⟩_X = (x | h)_X for all h ∈ X,

i.e., ∇F(x) = x. To illustrate how the gradient (in contrast to the derivative) depends on the inner product, let M ∈ 𝕃(X;X) be self-adjoint and positive definite (and thus continuously invertible). Then (x | y)_M := (Mx | y)_X also defines an inner product on the vector space X
and induces an (equivalent) norm ‖x‖_M := (x | x)_M^{1/2} on X. Hence (X, (· | ·)_M) is a Hilbert space as well, which we will denote by X_M. Consider now the functional F : X_M → ℝ with F(x) := ½‖x‖_X² (which is well-defined since ‖·‖_X is also an equivalent norm on X_M). Then the derivative DF(x) ∈ X_M* is still given by ⟨DF(x), h⟩_{X_M} = (x | h)_X for all h ∈ X_M (or, equivalently, for all h ∈ X, since we defined X_M via the same vector space). However, ∇_M F(x) ∈ X_M is now characterized by

    (x | h)_X = ⟨DF(x), h⟩_{X_M} = (∇_M F(x) | h)_M = (M ∇_M F(x) | h)_X for all h ∈ X_M,

i.e., ∇_M F(x) = M⁻¹x ≠ ∇F(x). (The situation is even more delicate if M is only positive definite on a subspace, as in the case of X = L²(Ω) and X_M = H¹(Ω).)
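The dependence of the gradient on the inner product can be sketched in ℝ³ (our own example, assuming NumPy; the matrix M below is an arbitrary SPD choice):

```python
import numpy as np

# For F(x) = (1/2)||x||^2 on R^3, the Euclidean gradient is x itself,
# while w.r.t. (x|y)_M := (Mx|y) for an SPD matrix M it is M^{-1}x:
# both must represent the same derivative h -> (x|h).
M = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 3.0]])          # self-adjoint, positive definite
x = np.array([1.0, 2.0, -1.0])

grad = x                                 # Riesz representative in (X, (.|.))
grad_M = np.linalg.solve(M, x)           # Riesz representative in (X, (.|.)_M)

# Both reproduce DF(x)h = (x|h) for an arbitrary direction h:
h = np.array([0.2, -0.7, 1.5])
assert abs(np.dot(grad, h) - np.dot(x, h)) < 1e-12
assert abs(np.dot(M @ grad_M, h) - np.dot(x, h)) < 1e-12
```

The derivative (the linear form h ↦ (x | h)) is the same object in both cases; only its Riesz representative changes with the inner product.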
2.3 superposition operators
A special class of operators on function spaces arises from the pointwise application of a real-valued function, e.g., u(x) ↦ sin(u(x)). We thus consider for f : Ω × ℝ → ℝ with Ω ⊂ ℝⁿ open and bounded as well as p, q ∈ [1, ∞] the corresponding superposition or Nemytskii operator

(2.4)    F : L^p(Ω) → L^q(Ω), [F(u)](x) = f(x, u(x)) for almost every x ∈ Ω.

For this operator to be well-defined requires certain restrictions on f. We call f : Ω × ℝ → ℝ a Carathéodory function if

(i) for all z ∈ ℝ, the mapping x ↦ f(x, z) is measurable;

(ii) for almost every x ∈ Ω, the mapping z ↦ f(x, z) is continuous.

We additionally require the following growth condition: For given p, q ∈ [1, ∞) there exist a ∈ L^q(Ω) and b ∈ L^∞(Ω) with

(2.5)    |f(x, z)| ≤ a(x) + b(x)|z|^{p/q}.

Under these conditions, F is even continuous.
Theorem 2.12. If the Carathéodory function f : Ω × ℝ → ℝ satisfies the growth condition (2.5) for p, q ∈ [1, ∞), then the superposition operator F : L^p(Ω) → L^q(Ω) defined via (2.4) is continuous.

Proof. We sketch the essential steps; a complete proof can be found in, e.g., [Appell & Zabreiko 1990, Theorems 3.1, 3.7]. First, one shows for given u ∈ L^p(Ω) the measurability of F(u) using the Carathéodory properties. It then follows from (2.5) and the triangle inequality that

    ‖F(u)‖_{L^q} ≤ ‖a‖_{L^q} + ‖b‖_{L^∞} ‖|u|^{p/q}‖_{L^q} = ‖a‖_{L^q} + ‖b‖_{L^∞} ‖u‖_{L^p}^{p/q} < ∞,
i.e., F(u) ∈ L^q(Ω).

To show continuity, we consider a sequence {u_n}_{n∈ℕ} ⊂ L^p(Ω) with u_n → u ∈ L^p(Ω). Then there exists a subsequence, again denoted by {u_n}_{n∈ℕ}, that converges pointwise almost everywhere in Ω, as well as a v ∈ L^p(Ω) with |u_n(x)| ≤ |v(x)| + |u₁(x)| =: g(x) for all n ∈ ℕ and almost every x ∈ Ω (see, e.g., [Alt 2016, Lemma 3.22 as well as (3-14) in the proof of Theorem 3.17]). The continuity of z ↦ f(x, z) then implies F(u_n) → F(u) pointwise almost everywhere as well as

    |[F(u_n)](x)| ≤ a(x) + b(x)|u_n(x)|^{p/q} ≤ a(x) + b(x)|g(x)|^{p/q} for almost every x ∈ Ω.

Since g ∈ L^p(Ω), the right-hand side is in L^q(Ω), and we can apply Lebesgue's dominated convergence theorem to deduce that F(u_n) → F(u) in L^q(Ω). As this argument can be applied to any subsequence, the whole sequence must converge to F(u), which yields the claimed continuity. ∎
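For a concrete instance, the following discretized sketch (our own construction, assuming NumPy) illustrates the continuity statement of Theorem 2.12 for f(x, z) = sin(z):

```python
import numpy as np

# For f(x, z) = sin(z), the growth condition (2.5) holds with a = 1 and
# b = 0 for any p, q, and the Nemytskii operator [F(u)](x) = sin(u(x))
# is continuous from L^p into L^q. We discretize functions on a grid of
# (0, 1) and check that ||F(u_n) - F(u)|| -> 0 when u_n -> u.
xs = np.linspace(0.0, 1.0, 1001)
u = xs ** 2
F = lambda v: np.sin(v)

norms = []
for n in [1, 10, 100]:
    u_n = u + np.cos(5 * xs) / n                      # u_n -> u
    diff = F(u_n) - F(u)
    norms.append(float(np.sqrt(np.mean(diff ** 2))))  # discrete L^2 norm

assert norms[0] > norms[1] > norms[2]   # continuity: image errors shrink
assert norms[2] < 1e-2                  # |sin(u_n) - sin(u)| <= |u_n - u| <= 1/n
```

Since sin is 1-Lipschitz, the image error is pointwise dominated by the perturbation 1/n, which is what the final assertion reflects.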
In fact, the growth condition (2.5) is also necessary for continuity; see [Appell & Zabreiko 1990, Theorem 3.2]. In addition, it is straightforward to show that for p = q = ∞, the growth condition (2.5) (with p/q := 0 in this case) implies that F is even locally Lipschitz continuous.

Similarly, one would like to show that differentiability of f implies differentiability of the corresponding superposition operator F, ideally with "pointwise" derivative [F′(u)h](x) = f′(u(x))h(x) (compare Example 1.3 (iii)). However, this does not hold in general; for example, the superposition operator defined by f(x, z) = sin(z) is not differentiable at u = 0 for 1 ≤ p = q < ∞. The reason is that for a Fréchet differentiable superposition operator F : L^p(Ω) → L^q(Ω) and a direction h ∈ L^p(Ω), the pointwise(!) product has to satisfy F′(u)h ∈ L^q(Ω). This leads to additional conditions on the superposition operator F′ defined by f′_z, which is known as the two norm discrepancy.

Theorem 2.13. Let f : Ω × ℝ → ℝ be a Carathéodory function that satisfies the growth condition (2.5) for 1 ≤ q < p < ∞. If the partial derivative f′_z is a Carathéodory function as well and satisfies (2.5) with q replaced by r := pq/(p − q), then the superposition operator F : L^p(Ω) → L^q(Ω) is continuously Fréchet differentiable, and its derivative at u ∈ L^p(Ω) in direction h ∈ L^p(Ω) is given by

    [F′(u)h](x) = f′_z(x, u(x))h(x) for almost every x ∈ Ω.
Proof. Theorem 2.12 yields that for r := pq/(p − q) (i.e., 1/q = 1/r + 1/p), the superposition operator

    G : L^p(Ω) → L^r(Ω), [G(u)](x) = f′_z(x, u(x)) for almost every x ∈ Ω,

is well-defined and continuous. The Hölder inequality further implies that for any u ∈ L^p(Ω),

(2.6)    ‖G(u)h‖_{L^q} ≤ ‖G(u)‖_{L^r} ‖h‖_{L^p} for all h ∈ L^p(Ω),
i.e., the pointwise multiplication h ↦ G(u)h defines a bounded linear operator DF(u) : L^p(Ω) → L^q(Ω).

Let now h ∈ L^p(Ω) be arbitrary. Since z ↦ f(x, z) is continuously differentiable by assumption, the classical mean value theorem together with the properties of the integral (in particular, monotonicity, Jensen's inequality on [0, 1], and Fubini's theorem) and (2.6) implies that

    ‖F(u + h) − F(u) − DF(u)h‖_{L^q}
    = ( ∫_Ω |f(x, u(x) + h(x)) − f(x, u(x)) − f′_z(x, u(x))h(x)|^q dx )^{1/q}
    = ( ∫_Ω | ∫₀¹ (f′_z(x, u(x) + th(x)) − f′_z(x, u(x)))h(x) dt |^q dx )^{1/q}
    ≤ ∫₀¹ ( ∫_Ω |(f′_z(x, u(x) + th(x)) − f′_z(x, u(x)))h(x)|^q dx )^{1/q} dt
    = ∫₀¹ ‖(G(u + th) − G(u))h‖_{L^q} dt
    ≤ ∫₀¹ ‖G(u + th) − G(u)‖_{L^r} dt ‖h‖_{L^p}.

Due to the continuity of G : L^p(Ω) → L^r(Ω), the integrand tends to zero uniformly in [0, 1] for ‖h‖_{L^p} → 0, and hence F is by definition Fréchet differentiable with derivative F′(u) = DF(u) (whose continuity we have already shown). ∎
2.4 variational principles
As the example f(t) = 1/t on {t ∈ ℝ | t ≥ 1} shows, the coercivity requirement in Theorem 2.1 is necessary to obtain minimizers even if the functional is bounded from below. However, sometimes one does not need an exact minimizer and is satisfied with "almost minimizers". Variational principles state that such almost minimizers can be obtained as minimizers of a perturbed functional and even give a precise relation between the size of the perturbation needed and the desired distance from the infimum.
The most well-known variational principle is Ekelandโs variational principle, which holdsin general complete metric spaces but which we here state in Banach spaces for the sakeof notation. In the statement of the following theorem, note that we do not assume thefunctional to be weakly lower semicontinuous.
Theorem 2.14 (Ekeland's variational principle). Let $X$ be a Banach space and $F : X \to \overline{\mathbb{R}}$ be proper, lower semicontinuous, and bounded from below. Let $\varepsilon > 0$ and $z_\varepsilon \in X$ be such that
$$F(z_\varepsilon) < \inf_{x\in X} F(x) + \varepsilon.$$
Then for any $\lambda > 0$, there exists an $x_\lambda \in X$ with

(i) $\|x_\lambda - z_\varepsilon\|_X \le \lambda$,

(ii) $F(x_\lambda) + \tfrac{\varepsilon}{\lambda}\|x_\lambda - z_\varepsilon\|_X \le F(z_\varepsilon)$,

(iii) $F(x_\lambda) < F(x) + \tfrac{\varepsilon}{\lambda}\|x - x_\lambda\|_X$ for all $x \in X \setminus \{x_\lambda\}$.
Proof. The proof proceeds similarly to that of Theorem 2.1: we construct an "almost minimizing" sequence, show that it converges, and verify that the limit has the desired properties. Here we proceed inductively. First, set $x_0 := z_\varepsilon$. For given $x_n$, define now
$$S_n := \left\{x \in X \;\middle|\; F(x) + \tfrac{\varepsilon}{\lambda}\|x - x_n\|_X \le F(x_n)\right\}.$$
Since $x_n \in S_n$, this set is nonempty. We can thus choose $x_{n+1} \in S_n$ such that
$$F(x_{n+1}) \le \tfrac12 F(x_n) + \tfrac12 \inf_{x\in S_n} F(x), \tag{2.7}$$
which is possible because the right-hand side either equals $F(x_n)$ (in which case we choose $x_{n+1} = x_n$) or is strictly greater, in which case such an $x_{n+1}$ must exist by the properties of the infimum. By construction, the sequence $\{F(x_n)\}_{n\in\mathbb{N}}$ is thus decreasing as well as bounded from below and therefore convergent. Using the triangle inequality, the fact that $x_{n+1} \in S_n$, and the telescoping sum, we also obtain that for any $m \ge n \in \mathbb{N}$,
$$\tfrac{\varepsilon}{\lambda}\|x_n - x_m\|_X \le \sum_{j=n}^{m-1} \tfrac{\varepsilon}{\lambda}\|x_j - x_{j+1}\|_X \le F(x_n) - F(x_m). \tag{2.8}$$
Hence, $\{x_n\}_{n\in\mathbb{N}}$ is a Cauchy sequence since $\{F(x_n)\}_{n\in\mathbb{N}}$ is one, and therefore converges to some $x_\lambda \in X$ since $X$ is complete.
We now show that this limit has the claimed properties. We begin with (ii), for which we use the fact that both $F$ and the norm in $X$ are lower semicontinuous and hence obtain from (2.8) by taking $m \to \infty$ that
$$\tfrac{\varepsilon}{\lambda}\|x_n - x_\lambda\|_X + F(x_\lambda) \le \limsup_{m\to\infty}\left(\tfrac{\varepsilon}{\lambda}\|x_n - x_m\|_X + F(x_m)\right) \le F(x_n) \quad\text{for any } n \ge 0. \tag{2.9}$$
Choosing in particular $n = 0$ such that $x_0 = z_\varepsilon$ yields (ii).

Furthermore, by definition of $z_\varepsilon$, this implies that
$$\tfrac{\varepsilon}{\lambda}\|z_\varepsilon - x_\lambda\|_X \le F(z_\varepsilon) - F(x_\lambda) \le F(z_\varepsilon) - \inf_{x\in X} F(x) < \varepsilon$$
and hence (i).
Assume now that (iii) does not hold, i.e., that there exists an $x \in X \setminus \{x_\lambda\}$ such that
$$F(x) \le F(x_\lambda) - \tfrac{\varepsilon}{\lambda}\|x - x_\lambda\|_X < F(x_\lambda). \tag{2.10}$$
Estimating $F(x_\lambda)$ using (2.9) and then using the productive zero together with the triangle inequality, we obtain from the first inequality that for all $n \in \mathbb{N}$,
$$F(x) \le F(x_n) - \tfrac{\varepsilon}{\lambda}\|x_n - x_\lambda\|_X - \tfrac{\varepsilon}{\lambda}\|x - x_\lambda\|_X \le F(x_n) - \tfrac{\varepsilon}{\lambda}\|x_n - x\|_X.$$
Hence, $x \in S_n$ for all $n \in \mathbb{N}$. From (2.7), we then deduce that
$$2F(x_{n+1}) - F(x_n) \le F(x) \quad\text{for all } n \in \mathbb{N}.$$
The convergence of $\{F(x_n)\}_{n\in\mathbb{N}}$ together with (2.10) and the lower semicontinuity of $F$ thus yields the contradiction
$$\lim_{n\to\infty} F(x_n) \le F(x) < F(x_\lambda) \le \lim_{n\to\infty} F(x_n). \qquad\Box$$
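The conclusions of Theorem 2.14 can be probed numerically on the non-coercive example $f(t) = 1/t$ from the beginning of this section. The following sketch (grid, $\varepsilon$, and $\lambda$ are illustrative choices, not from the text) obtains a point $x_\lambda$ as a grid minimizer of the perturbed functional $F + \tfrac{\varepsilon}{\lambda}|\cdot - z_\varepsilon|$ and checks (i), (ii), and a non-strict form of (iii):

```python
# Numerical illustration of Ekeland's variational principle for F(t) = 1/t
# on a grid discretizing [1, 201); all parameter values are illustrative.

def F(t):
    return 1.0 / t

grid = [1.0 + 0.01 * k for k in range(20000)]
eps, lam = 0.05, 1.0
inf_F = min(F(t) for t in grid)
z_eps = next(t for t in grid if F(t) < inf_F + eps)  # an eps-minimizer

# a point satisfying the conclusions: minimize the perturbed functional
x_lam = min(grid, key=lambda t: F(t) + (eps / lam) * abs(t - z_eps))

assert abs(x_lam - z_eps) <= lam                                # (i)
assert F(x_lam) + (eps / lam) * abs(x_lam - z_eps) <= F(z_eps)  # (ii)
assert all(F(x_lam) <= F(t) + (eps / lam) * abs(t - x_lam) + 1e-12
           for t in grid)                                       # (iii), non-strict
```

The triangle inequality shows that any minimizer of the perturbed functional automatically satisfies the non-strict form of (iii), mirroring the role of the sets $S_n$ in the proof above.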
Ekeland's variational principle has the disadvantage that even for differentiable $F$, the perturbed function that is minimized by $x_\lambda$ is inherently nonsmooth. This is different for smooth variational principles such as the following one due to Borwein and Preiss [Borwein & Preiss 1987].
Theorem 2.15 (Borwein–Preiss variational principle). Let $X$ be a Banach space and $F : X \to \overline{\mathbb{R}}$ be proper, lower semicontinuous, and bounded from below. Let $\varepsilon > 0$ and $z_\varepsilon \in X$ be such that
$$F(z_\varepsilon) < \inf_{x\in X} F(x) + \varepsilon.$$
Then for any $\lambda > 0$ and $p \ge 1$, there exist

• a sequence $\{x_n\}_{n\in\mathbb{N}_0} \subset X$ with $x_0 = z_\varepsilon$ converging strongly to some $x_\lambda \in X$ and

• a sequence $\{\mu_n\}_{n\in\mathbb{N}_0} \subset (0,\infty)$ with $\sum_{n=0}^\infty \mu_n = 1$

such that

(i) $\|x_\lambda - x_n\|_X \le \lambda$ for all $n \in \mathbb{N} \cup \{0\}$;

(ii) $F(x_\lambda) + \dfrac{\varepsilon}{\lambda^p}\displaystyle\sum_{n=0}^\infty \mu_n\|x_\lambda - x_n\|_X^p \le F(z_\varepsilon)$;

(iii) $F(x_\lambda) + \dfrac{\varepsilon}{\lambda^p}\displaystyle\sum_{n=0}^\infty \mu_n\|x_\lambda - x_n\|_X^p \le F(x) + \dfrac{\varepsilon}{\lambda^p}\displaystyle\sum_{n=0}^\infty \mu_n\|x - x_n\|_X^p$ for all $x \in X$.
Proof. We proceed similarly to the proof of Theorem 2.14 by induction. First, we choose constants $\gamma, \eta, \mu, \theta > 0$ such that

• $F(z_\varepsilon) - \inf_{x\in X} F(x) < \eta < \gamma < \varepsilon$,

• $\mu < 1 - \tfrac{\gamma}{\varepsilon}$,

• $\theta < \mu\left(1 - \left(\tfrac{\eta}{\gamma}\right)^{1/p}\right)^p$.
Let now $x_0 := z_\varepsilon$ and $F_0 := F$, and set $\delta := (1-\mu)\varepsilon/\lambda^p > 0$. We then define
$$F_1(x) := F_0(x) + \delta\|x - x_0\|_X^p \quad\text{for all } x \in X.$$
By construction, we then have
$$\inf_{x\in X} F_1(x) \le F_1(x_0) = F_0(x_0),$$
and thus we can find, by the same argument as for (2.7), an $x_1 \in X$ with
$$F_1(x_1) \le \theta F_0(x_0) + (1-\theta)\inf_{x\in X} F_1(x).$$
Continuing in this manner, we obtain sequences $\{x_n\}_{n\in\mathbb{N}}$ and $\{F_n\}_{n\in\mathbb{N}}$ with
$$F_{n+1}(x) = F_n(x) + \delta\mu^n\|x - x_n\|_X^p \tag{2.11}$$
and
$$F_{n+1}(x_{n+1}) \le \theta F_n(x_n) + (1-\theta)\inf_{x\in X} F_{n+1}(x). \tag{2.12}$$
Set now $s_n := \inf_{x\in X} F_n(x)$ and $d_n := F_n(x_n)$. Then (2.11) implies that $\{s_n\}_{n\ge0}$ is monotonically increasing, while (2.12) implies that $\{d_n\}_{n\ge0}$ is monotonically decreasing. We thus have
$$s_n \le s_{n+1} \le d_{n+1} \le \theta d_n + (1-\theta)s_{n+1} \le d_n, \tag{2.13}$$
which can be rearranged to show for all $n \ge 0$ that
$$d_{n+1} - s_{n+1} \le \theta d_n + (1-\theta)s_{n+1} - s_{n+1} = \theta(d_n - s_{n+1}) \le \theta(d_n - s_n), \tag{2.14}$$
and hence by induction $d_n - s_n \le \theta^n(d_0 - s_0)$. This together with the monotonicity of the two sequences and the boundedness of $F$ from below shows that $\lim_{n\to\infty} d_n = \lim_{n\to\infty} s_n \in \mathbb{R}$. We now use (2.11) in (2.13) to obtain that
$$d_n \ge d_{n+1} = F_n(x_{n+1}) + \delta\mu^n\|x_{n+1} - x_n\|_X^p \ge s_n + \delta\mu^n\|x_{n+1} - x_n\|_X^p,$$
which together with (2.14) and the choice of $\eta$ yields
$$\delta\mu^n\|x_{n+1} - x_n\|_X^p \le d_n - s_n \le \theta^n(d_0 - s_0) < \eta\theta^n.$$
The choice of $\theta$ and $\mu$ now ensures that $0 < \tfrac\theta\mu < 1$, which implies that
$$\|x_m - x_n\|_X \le \sum_{j=n}^{m-1} \|x_{j+1} - x_j\|_X \le \left(\frac{\eta}{\delta}\right)^{1/p}\sum_{j=n}^{m-1}\left(\frac{\theta}{\mu}\right)^{j/p} \le \left(\frac{\eta}{\delta}\right)^{1/p}\left(\frac{\theta}{\mu}\right)^{n/p}\left(1 - \left(\frac{\theta}{\mu}\right)^{1/p}\right)^{-1} \quad\text{for all } m > n \ge 0 \tag{2.15}$$
using the partial geometric series
$$\sum_{j=n}^{m-1} \alpha^j = \sum_{j=0}^{m-1} \alpha^j - \sum_{j=0}^{n-1} \alpha^j = \frac{1-\alpha^m}{1-\alpha} - \frac{1-\alpha^n}{1-\alpha} < \frac{\alpha^n}{1-\alpha}$$
valid for any $\alpha \in (0,1)$.
valid for any ๐ผ โ (0, 1). Hence {๐ฅ๐}๐ โ โ is a Cauchy sequence which therefore convergesto some ๐ฅ_ โ ๐ . Setting `๐ โ `๐ (1 โ `) > 0, we also have โโ
๐=0 `๐ = 1 by the choice of` < 1. Furthermore, the denition of `๐ and ๐ฟ implies for all ๐ฅ โ ๐ that
(2.16) ๐น (๐ฅ) + Y
_๐
โโ๐=0
`๐ โ๐ฅ โ ๐ฅ๐ โ๐๐ = lim๐โโ ๐น (๐ฅ) +
๐โ๐=0
๐ฟ`๐ โ๐ฅ โ ๐ฅ๐ โ๐๐ = lim๐โโ ๐น๐ (๐ฅ).
It remains to verify the claims on $x_\lambda$. First, (2.15) together with the choice of $\theta$ and $\delta$ implies for all $m, n \ge 0$ that
$$\|x_m - x_n\|_X \le \left(\frac{\eta}{\delta}\right)^{1/p}\left(\frac{\eta}{\gamma}\right)^{-1/p} = \left(\frac{\gamma}{\delta}\right)^{1/p} < \left(\frac{\varepsilon}{\delta}\right)^{1/p}(1-\mu)^{1/p} = \lambda.$$
Letting $m \to \infty$ for fixed $n \in \mathbb{N} \cup \{0\}$ now shows (i).
Second, by (2.11) and the definition of $\delta$, we have
$$F(x_n) + \frac{\varepsilon}{\lambda^p}\sum_{j=0}^\infty \mu_j\|x_n - x_j\|_X^p = F_n(x_n) + \frac{\varepsilon}{\lambda^p}\sum_{j=n+1}^\infty \mu_j\|x_n - x_j\|_X^p \le d_n + \varepsilon\sum_{j=n+1}^\infty \mu_j,$$
where the inequality follows from (i). The lower semicontinuity of $F$ and of the norm thus yields
$$F(x_\lambda) + \frac{\varepsilon}{\lambda^p}\sum_{n=0}^\infty \mu_n\|x_\lambda - x_n\|_X^p \le \lim_{n\to\infty} d_n \le d_0 = F(z_\varepsilon) \tag{2.17}$$
since $\{d_n\}_{n\ge0}$ is monotonically decreasing. This shows (ii).
Finally, (2.16) and the definition of $s_n$ imply for all $x \in X$ that
$$F(x) + \frac{\varepsilon}{\lambda^p}\sum_{n=0}^\infty \mu_n\|x - x_n\|_X^p = \lim_{n\to\infty} F_n(x) \ge \lim_{n\to\infty} s_n = \lim_{n\to\infty} d_n,$$
which together with (2.17) yields (iii). □
The Borwein–Preiss variational principle therefore guarantees a smooth perturbation if, e.g., $X$ is a Hilbert space and $p = 2$. Further smooth variational principles that allow for more general smooth perturbations, such as the Deville–Godefroy–Zizler variational principle, can be found in, e.g., [Borwein & Zhu 2005; Schirotzek 2007].
Part II
CONVEX ANALYSIS
3 CONVEX FUNCTIONS
The classical derivative concepts from the previous chapter are not sufficient for our purposes, since many interesting functionals are not differentiable in this sense; also, they cannot handle functionals with values in $\overline{\mathbb{R}}$. We therefore need a derivative concept that is more general than Gâteaux and Fréchet derivatives and still allows a Fermat principle as well as a rich calculus. Throughout this and the following chapters, $X$ will be a normed vector space unless noted otherwise.
3.1 basic properties
We first consider a general class of functionals that admit such a generalized derivative. A functional $F : X \to \overline{\mathbb{R}}$ is called convex if for all $x, y \in X$ and $\lambda \in [0,1]$, it holds that
$$F(\lambda x + (1-\lambda)y) \le \lambda F(x) + (1-\lambda)F(y) \tag{3.1}$$
(where the function value $\infty$ is allowed on both sides). If for all $x, y \in \operatorname{dom} F$ with $x \ne y$ and all $\lambda \in (0,1)$ we even have
$$F(\lambda x + (1-\lambda)y) < \lambda F(x) + (1-\lambda)F(y),$$
we call $F$ strictly convex.
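Definition (3.1) is easy to probe numerically. The following minimal sketch (the example function $F(x) = x^4$ and the sampling are illustrative choices; a sampled check like this can only falsify convexity, never prove it) tests the inequality at random points:

```python
# Minimal numerical probe of definition (3.1) for F(x) = x^4 on R.
# Sampling cannot prove convexity; it can only fail to falsify it.
import random

def F(x):
    return x ** 4

random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    lam = random.random()
    lhs = F(lam * x + (1 - lam) * y)
    rhs = lam * F(x) + (1 - lam) * F(y)
    assert lhs <= rhs + 1e-9  # (3.1) holds at every sample
```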
An alternative characterization of the convexity of a functional $F : X \to \overline{\mathbb{R}}$ is based on its epigraph
$$\operatorname{epi} F := \{(x,t) \in X \times \mathbb{R} \mid F(x) \le t\}.$$
Lemma 3.1. Let $F : X \to \overline{\mathbb{R}}$. Then $\operatorname{epi} F$ is

(i) nonempty if and only if $F$ is proper;

(ii) convex if and only if $F$ is convex;

(iii) (weakly) closed if and only if $F$ is (weakly) lower semicontinuous.¹

¹For that reason, some authors use the term closed to refer to lower semicontinuous functionals. We will stick with the latter, much less ambiguous, term throughout the following.
Proof. Statement (i) follows directly from the definition: $F$ is proper if and only if there exist an $x \in X$ and a $t \in \mathbb{R}$ with $F(x) \le t < \infty$, i.e., $(x,t) \in \operatorname{epi} F$.

For (ii), let $F$ be convex and $(x,r), (y,s) \in \operatorname{epi} F$ be given. For any $\lambda \in [0,1]$, the definition (3.1) then implies that
$$F(\lambda x + (1-\lambda)y) \le \lambda F(x) + (1-\lambda)F(y) \le \lambda r + (1-\lambda)s,$$
i.e., that
$$\lambda(x,r) + (1-\lambda)(y,s) = (\lambda x + (1-\lambda)y,\ \lambda r + (1-\lambda)s) \in \operatorname{epi} F,$$
and hence $\operatorname{epi} F$ is convex. Let conversely $\operatorname{epi} F$ be convex and $x, y \in X$ be arbitrary, where we can assume that $F(x) < \infty$ and $F(y) < \infty$ (otherwise (3.1) is trivially satisfied). We clearly have $(x, F(x)), (y, F(y)) \in \operatorname{epi} F$. The convexity of $\operatorname{epi} F$ then implies for all $\lambda \in [0,1]$ that
$$(\lambda x + (1-\lambda)y,\ \lambda F(x) + (1-\lambda)F(y)) = \lambda(x, F(x)) + (1-\lambda)(y, F(y)) \in \operatorname{epi} F,$$
and hence by definition of $\operatorname{epi} F$ that (3.1) holds.

Finally, we show (iii): let first $F$ be lower semicontinuous, and let $\{(x_n,t_n)\}_{n\in\mathbb{N}} \subset \operatorname{epi} F$ be an arbitrary sequence with $(x_n,t_n) \to (x,t) \in X \times \mathbb{R}$. Then we have that
$$F(x) \le \liminf_{n\to\infty} F(x_n) \le \limsup_{n\to\infty} t_n = t,$$
i.e., $(x,t) \in \operatorname{epi} F$. Let conversely $\operatorname{epi} F$ be closed, and assume that $F$ is proper (otherwise the claim holds trivially) and not lower semicontinuous. Then there exists a sequence $\{x_n\}_{n\in\mathbb{N}} \subset X$ with $x_n \to x \in X$ and
$$F(x) > \liminf_{n\to\infty} F(x_n) =: M \in [-\infty,\infty).$$
We now distinguish two cases.

a) $x \in \operatorname{dom} F$: In this case, we can select a subsequence, again denoted by $\{x_n\}_{n\in\mathbb{N}}$, such that there exists an $\varepsilon > 0$ with $F(x_n) \le F(x) - \varepsilon$ and thus $(x_n, F(x) - \varepsilon) \in \operatorname{epi} F$ for all $n \in \mathbb{N}$. From $x_n \to x$ and the closedness of $\operatorname{epi} F$, we deduce that $(x, F(x) - \varepsilon) \in \operatorname{epi} F$ and hence $F(x) \le F(x) - \varepsilon$, contradicting $\varepsilon > 0$.

b) $x \notin \operatorname{dom} F$: In this case, we can argue similarly using $F(x_n) \le M + \varepsilon$ for $M > -\infty$ or $F(x_n) \le \varepsilon$ for $M = -\infty$ to obtain a contradiction with $F(x) = \infty$.

The equivalence of weak lower semicontinuity and weak closedness follows in exactly the same way. □
Note that $(x,t) \in \operatorname{epi} F$ implies that $x \in \operatorname{dom} F$; hence the effective domain of a proper, convex, and lower semicontinuous functional is always nonempty and convex as well (though not necessarily closed). Also, together with Lemma 1.10 we immediately obtain
Corollary 3.2. Let $F : X \to \overline{\mathbb{R}}$ be convex. Then $F$ is weakly lower semicontinuous if and only if $F$ is lower semicontinuous.
Also useful for the study of a functional $F : X \to \overline{\mathbb{R}}$ are the corresponding sublevel sets
$$\operatorname{sub}_t F := \{x \in X \mid F(x) \le t\}, \qquad t \in \mathbb{R},$$
for which one shows as in Lemma 3.1 the following properties.

Lemma 3.3. Let $F : X \to \overline{\mathbb{R}}$.

(i) If $F$ is convex, then $\operatorname{sub}_t F$ is convex for all $t \in \mathbb{R}$ (but the converse does not hold).

(ii) $F$ is (weakly) lower semicontinuous if and only if $\operatorname{sub}_t F$ is (weakly) closed for all $t \in \mathbb{R}$.
Directly from the definition we obtain the convexity of

(i) continuous affine functionals of the form $x \mapsto \langle x^*, x\rangle_X - \alpha$ for fixed $x^* \in X^*$ and $\alpha \in \mathbb{R}$;

(ii) the norm $\|\cdot\|_X$ in a normed vector space $X$;

(iii) the indicator function $\delta_C$ of a convex set $C$.
If $X$ is a Hilbert space, $F(x) = \|x\|_X^2$ is even strictly convex: for $x, y \in X$ with $x \ne y$ and any $\lambda \in (0,1)$,
$$\begin{aligned}
\|\lambda x + (1-\lambda)y\|_X^2 &= (\lambda x + (1-\lambda)y \mid \lambda x + (1-\lambda)y)_X \\
&= \lambda^2(x\mid x)_X + 2\lambda(1-\lambda)(x\mid y)_X + (1-\lambda)^2(y\mid y)_X \\
&= \lambda\big(\lambda(x\mid x)_X - (1-\lambda)(x-y\mid x)_X + (1-\lambda)(y\mid y)_X\big) \\
&\qquad + (1-\lambda)\big(\lambda(x\mid x)_X + \lambda(x-y\mid y)_X + (1-\lambda)(y\mid y)_X\big) \\
&= (\lambda + (1-\lambda))\big(\lambda(x\mid x)_X + (1-\lambda)(y\mid y)_X\big) - \lambda(1-\lambda)(x-y\mid x-y)_X \\
&= \lambda\|x\|_X^2 + (1-\lambda)\|y\|_X^2 - \lambda(1-\lambda)\|x-y\|_X^2 \\
&< \lambda\|x\|_X^2 + (1-\lambda)\|y\|_X^2.
\end{aligned}$$
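The computation above reduces to the identity $\|\lambda x + (1-\lambda)y\|^2 = \lambda\|x\|^2 + (1-\lambda)\|y\|^2 - \lambda(1-\lambda)\|x-y\|^2$, valid in any inner product space. A quick numerical confirmation in $\mathbb{R}^3$ (the sample points are illustrative):

```python
# Verify the Hilbert-space identity used above in R^3 at random sample points.
import math, random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

random.seed(1)
x = [random.uniform(-1, 1) for _ in range(3)]
y = [random.uniform(-1, 1) for _ in range(3)]
for lam in (0.1, 0.5, 0.9):
    z = [lam * a + (1 - lam) * b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    lhs = dot(z, z)
    rhs = lam * dot(x, x) + (1 - lam) * dot(y, y) - lam * (1 - lam) * dot(d, d)
    assert math.isclose(lhs, rhs, rel_tol=0, abs_tol=1e-12)
```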
Further examples can be constructed as in Lemma 2.3 through the following operations.

Lemma 3.4. Let $X$ and $Y$ be normed vector spaces and let $F : X \to \overline{\mathbb{R}}$ be convex. Then the following functionals are convex as well:

(i) $\alpha F$ for all $\alpha \ge 0$;

(ii) $F + G$ for $G : X \to \overline{\mathbb{R}}$ convex (if $F$ or $G$ is strictly convex, so is $F + G$);

(iii) $\varphi \circ F$ for $\varphi : \mathbb{R} \to \mathbb{R}$ convex and increasing;

(iv) $F \circ K$ for $K : Y \to X$ linear;

(v) $x \mapsto \sup_{i\in I} F_i(x)$ with $F_i : X \to \overline{\mathbb{R}}$ convex for an arbitrary set $I$.
Lemma 3.4 (v) in particular implies that the pointwise supremum of continuous affine functionals is always convex. In fact, any convex functional can be written in this way. To show this, we define for a proper functional $F : X \to \overline{\mathbb{R}}$ the convex envelope
$$F^\Gamma(x) := \sup\{a(x) \mid a : X \to \mathbb{R} \text{ continuous affine with } a(\tilde x) \le F(\tilde x) \text{ for all } \tilde x \in X\}.$$
Note that $F^\Gamma : X \to [-\infty,\infty]$ without further assumptions on $F$.
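On a bounded grid, $F^\Gamma$ can be approximated directly from this definition: for each candidate slope $a$, the largest affine minorant with slope $a$ has intercept $\inf_{\tilde x}(F(\tilde x) - a\tilde x)$, and $F^\Gamma$ is the pointwise supremum over slopes. A sketch for the nonconvex function $F(x) = (x^2-1)^2$, whose envelope vanishes on $[-1,1]$ (grid and slope range are illustrative choices):

```python
# Approximate the convex envelope of F(x) = (x^2 - 1)^2 on [-2, 2]
# as the pointwise supremum of affine minorants a*x + b with
# b = inf over the grid of (F - a*id).
def F(x):
    return (x * x - 1.0) ** 2

xs = [-2.0 + 0.01 * k for k in range(401)]
slopes = [-20.0 + 0.1 * k for k in range(401)]
b = {a: min(F(t) - a * t for t in xs) for a in slopes}  # best intercept per slope

def envelope(x):
    return max(a * x + b[a] for a in slopes)

assert all(envelope(x) <= F(x) + 1e-9 for x in xs)  # F^Gamma <= F everywhere
assert abs(envelope(0.0)) <= 1e-6                   # envelope vanishes at x = 0
assert abs(envelope(1.0) - F(1.0)) <= 1e-6          # envelope touches F at x = 1
```

This is exactly the supremum in the definition restricted to finitely many affine functionals, so it always underestimates $F^\Gamma$.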
Lemma 3.5. Let $F : X \to \overline{\mathbb{R}}$ be proper. Then $F$ is convex and lower semicontinuous if and only if $F = F^\Gamma$.

Proof. Since continuous affine functionals are convex and continuous, Lemma 3.4 (v) and Lemma 2.3 (v) imply that $F^\Gamma$ is always convex and lower semicontinuous; hence if $F = F^\Gamma$, so is $F$.

Let now $F : X \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous. It is clear from the definition of $F^\Gamma$ as a pointwise supremum that $F^\Gamma \le F$ always holds. Assume therefore that $F^\Gamma(x_0) < F(x_0)$ for some $x_0 \in X$. Then there exists a $\lambda \in \mathbb{R}$ with
$$F^\Gamma(x_0) < \lambda < F(x_0).$$
We now use the Hahn–Banach separation theorem to construct a continuous affine functional $a \le F$ with $a(x_0) > \lambda > F^\Gamma(x_0)$, which would contradict the definition of $F^\Gamma$. Since $F$ is proper, convex, and lower semicontinuous, $\operatorname{epi} F$ is nonempty, convex, and closed by Lemma 3.1. Furthermore, $\{(x_0,\lambda)\}$ is compact and, as $\lambda < F(x_0)$, disjoint from $\operatorname{epi} F$. Theorem 1.5 (ii) hence yields a $z^* \in (X\times\mathbb{R})^*$ and an $\alpha \in \mathbb{R}$ with
$$\langle z^*, (x,t)\rangle_{X\times\mathbb{R}} \le \alpha < \langle z^*, (x_0,\lambda)\rangle_{X\times\mathbb{R}} \quad\text{for all } (x,t) \in \operatorname{epi} F.$$
We now define an $x^* \in X^*$ via $\langle x^*, x\rangle_X := \langle z^*, (x,0)\rangle_{X\times\mathbb{R}}$ for all $x \in X$ and set $s := \langle z^*, (0,1)\rangle_{X\times\mathbb{R}} \in \mathbb{R}$. Then $\langle z^*, (x,t)\rangle_{X\times\mathbb{R}} = \langle x^*, x\rangle_X + st$ and hence
$$\langle x^*, x\rangle_X + st \le \alpha < \langle x^*, x_0\rangle_X + s\lambda \quad\text{for all } (x,t) \in \operatorname{epi} F. \tag{3.2}$$
Now for $(x,t) \in \operatorname{epi} F$ we also have $(x,t') \in \operatorname{epi} F$ for all $t' > t$, and the first inequality in (3.2) implies that for all sufficiently large $t' > 0$,
$$s \le \frac{\alpha - \langle x^*, x\rangle_X}{t'} \to 0 \quad\text{for } t' \to \infty.$$
Hence $s \le 0$. We continue with a case distinction.
(i) $s < 0$: We set
$$a : X \to \mathbb{R}, \qquad x \mapsto \frac{\alpha - \langle x^*, x\rangle_X}{s},$$
which is affine and continuous. Furthermore, $(x, F(x)) \in \operatorname{epi} F$ for any $x \in \operatorname{dom} F$, and using the productive zero in the first inequality in (3.2) implies (noting $s < 0$!) that
$$a(x) = \frac1s\big(\alpha - \langle x^*, x\rangle_X - sF(x)\big) + F(x) \le F(x).$$
(For $x \notin \operatorname{dom} F$ this holds trivially.) But the second inequality in (3.2) implies that
$$a(x_0) = \frac1s\big(\alpha - \langle x^*, x_0\rangle_X\big) > \lambda.$$

(ii) $s = 0$: Then $\langle x^*, x\rangle_X \le \alpha < \langle x^*, x_0\rangle_X$ for all $x \in \operatorname{dom} F$, which can only hold for $x_0 \notin \operatorname{dom} F$. But $F$ is proper, and hence we can find a $y_0 \in \operatorname{dom} F$, for which we can construct as in case (i), by separating $\operatorname{epi} F$ and $(y_0,\mu)$ for sufficiently small $\mu$, a continuous affine functional $a_0 : X \to \mathbb{R}$ with $a_0 \le F$ pointwise. For $\rho > 0$ we now set
$$a_\rho : X \to \mathbb{R}, \qquad x \mapsto a_0(x) + \rho\big(\langle x^*, x\rangle_X - \alpha\big),$$
which is continuous affine as well. Since $\langle x^*, x\rangle_X \le \alpha$, we also have that $a_\rho(x) \le a_0(x) \le F(x)$ for all $x \in \operatorname{dom} F$ and arbitrary $\rho > 0$. But due to $\langle x^*, x_0\rangle_X > \alpha$, we can choose $\rho > 0$ with $a_\rho(x_0) > \lambda$.

In both cases, the definition of $F^\Gamma$ as a supremum implies that $F^\Gamma(x_0) > \lambda$ as well, contradicting the assumption $F^\Gamma(x_0) < \lambda$. □
Remark 3.6. Using the weak-∗ Hahn–Banach Theorem 1.13 in place of Theorem 1.5, the same proof shows that a proper functional $F : X^* \to \overline{\mathbb{R}}$ is convex and weakly-∗ lower semicontinuous if and only if $F = F^\Gamma$ for
$$F^\Gamma(x^*) := \sup\{\langle x^*, x\rangle_X + \alpha \mid x \in X,\ \alpha \in \mathbb{R},\ \langle \tilde x^*, x\rangle_X + \alpha \le F(\tilde x^*) \text{ for all } \tilde x^* \in X^*\}.$$
(Note that a convex and lower semicontinuous functional need not be weakly-∗ lower semicontinuous, since convex and closed sets need not be weakly-∗ closed.)
A particularly useful class of convex functionals in the calculus of variations arises from integral functionals with convex integrands defined through superposition operators.

Lemma 3.7. Let $f : \mathbb{R} \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous. If $\Omega \subset \mathbb{R}^d$ is bounded and $1 \le p \le \infty$, this also holds for
$$F : L^p(\Omega) \to \overline{\mathbb{R}}, \qquad u \mapsto \begin{cases} \displaystyle\int_\Omega f(u(x))\,dx & \text{if } f \circ u \in L^1(\Omega),\\ \infty & \text{else.}\end{cases}$$
Proof. First, Lemma 3.5 implies that there exist $a, b \in \mathbb{R}$ such that
$$f(t) \ge at - b \quad\text{for all } t \in \mathbb{R}. \tag{3.3}$$
Since $\Omega$ is bounded, this proves that $F(u) > -\infty$ for all $u \in L^1(\Omega)$, and consequently for all $u \in L^p(\Omega)$ since $L^p(\Omega) \subset L^1(\Omega)$ due to the boundedness of $\Omega$. Since $f$ is proper, there is a $t_0 \in \operatorname{dom} f$. Hence, also using that $\Omega$ is bounded, the constant function $u_0 \equiv t_0$ satisfies $F(u_0) < \infty$. This shows that $F$ is proper.

To show convexity, we take $u, v \in \operatorname{dom} F$ (since otherwise (3.1) is trivially satisfied) and $\lambda \in [0,1]$ arbitrary. The convexity of $f$ now implies that
$$f(\lambda u(x) + (1-\lambda)v(x)) \le \lambda f(u(x)) + (1-\lambda)f(v(x)) \quad\text{for almost every } x \in \Omega.$$
Since $u, v \in \operatorname{dom} F$ and $L^1(\Omega)$ is a vector space, the right-hand side lies in $L^1(\Omega)$ as well. Similarly, the left-hand side is bounded from below by $a(\lambda u(x) + (1-\lambda)v(x)) - b \in L^1(\Omega)$ by (3.3). We can thus integrate the inequality over $\Omega$ to obtain the convexity of $F$.
To show lower semicontinuity, we use Lemma 3.1. Let $\{(u_n, t_n)\}_{n\in\mathbb{N}} \subset \operatorname{epi} F$ with $u_n \to u$ in $L^p(\Omega)$ and $t_n \to t$ in $\mathbb{R}$. Then there exists a subsequence $\{u_{n_k}\}_{k\in\mathbb{N}}$ with $u_{n_k}(x) \to u(x)$ almost everywhere. Hence, the lower semicontinuity of $f$ together with Fatou's lemma implies that
$$\begin{aligned}
\int_\Omega f(u(x)) - (au(x) - b)\,dx &\le \int_\Omega \liminf_{k\to\infty}\big(f(u_{n_k}(x)) - (au_{n_k}(x) - b)\big)\,dx \\
&\le \liminf_{k\to\infty}\int_\Omega f(u_{n_k}(x)) - (au_{n_k}(x) - b)\,dx \\
&= \liminf_{k\to\infty}\int_\Omega f(u_{n_k}(x))\,dx - \int_\Omega au(x) - b\,dx
\end{aligned}$$
as the integrands are nonnegative due to (3.3). Since $(u_{n_k}, t_{n_k}) \in \operatorname{epi} F$, this yields
$$F(u) = \int_\Omega f(u(x))\,dx \le \liminf_{k\to\infty}\int_\Omega f(u_{n_k}(x))\,dx = \liminf_{k\to\infty} F(u_{n_k}) \le \lim_{k\to\infty} t_{n_k} = t,$$
i.e., $(u,t) \in \operatorname{epi} F$. Hence $\operatorname{epi} F$ is closed, and the lower semicontinuity of $F$ follows from Lemma 3.1 (iii). □
3.2 existence of minimizers
After all this preparation, we can quickly prove the main result on the existence of solutions to convex minimization problems.
Theorem 3.8. Let $X$ be a reflexive Banach space and let

(i) $U \subset X$ be nonempty, convex, and closed;

(ii) $F : X \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous with $\operatorname{dom} F \cap U \ne \emptyset$;

(iii) $U$ be bounded or $F$ be coercive.

Then the problem
$$\min_{x\in U} F(x)$$
admits a solution $\bar x \in U \cap \operatorname{dom} F$. If $F$ is strictly convex, the solution is unique.
Proof. We consider the extended functional $\tilde F := F + \delta_U : X \to \overline{\mathbb{R}}$. Assumption (i) together with Lemma 2.5 implies that $\delta_U$ is proper, convex, and weakly lower semicontinuous. From (ii) we obtain an $x_0 \in U$ with $\tilde F(x_0) < \infty$, and hence $\tilde F$ is proper, convex, and (by Corollary 3.2) weakly lower semicontinuous. Finally, due to (iii), $\tilde F$ is coercive: for bounded $U$, we can use that $F > -\infty$, and for coercive $F$, that $\delta_U \ge 0$. Hence we can apply Theorem 2.1 to obtain the existence of a minimizer $\bar x \in \operatorname{dom}\tilde F = U \cap \operatorname{dom} F$ of $\tilde F$ with
$$F(\bar x) = \tilde F(\bar x) \le \tilde F(x) = F(x) \quad\text{for all } x \in U,$$
i.e., $\bar x$ is the claimed solution.

Let now $F$ be strictly convex, and let $\bar x$ and $\bar x' \in U$ be two different minimizers, i.e., $F(\bar x) = F(\bar x') = \min_{x\in U} F(x)$ and $\bar x \ne \bar x'$. Then by the convexity of $U$ we have for all $\lambda \in (0,1)$ that
$$x_\lambda := \lambda\bar x + (1-\lambda)\bar x' \in U,$$
while the strict convexity of $F$ implies that
$$F(x_\lambda) < \lambda F(\bar x) + (1-\lambda)F(\bar x') = F(\bar x).$$
But this is a contradiction to $F(\bar x) \le F(x)$ for all $x \in U$. □
Note that for a sum of two convex functionals to be coercive, it is in general not sufficient that only one of them is. Functionals for which this is the case, such as the indicator function of a bounded set, are called supercoercive; another example which will be helpful later is the squared norm.
Lemma 3.9. Let $F : X \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous, and let $x_0 \in X$ be given. Then the functional
$$J : X \to \overline{\mathbb{R}}, \qquad x \mapsto F(x) + \tfrac12\|x - x_0\|_X^2,$$
is coercive.
Proof. Since $F$ is proper, convex, and lower semicontinuous, it follows from Lemma 3.5 that $F$ is bounded from below by a continuous affine functional, i.e., there exist an $x^* \in X^*$ and an $\alpha \in \mathbb{R}$ with $F(x) \ge \langle x^*, x\rangle_X + \alpha$ for all $x \in X$. Together with the reverse triangle inequality and (1.1), we obtain that
$$\begin{aligned}
J(x) &\ge \langle x^*, x\rangle_X + \alpha + \tfrac12\big(\|x\|_X - \|x_0\|_X\big)^2 \\
&\ge -\|x^*\|_{X^*}\|x\|_X + \alpha + \tfrac12\|x\|_X^2 - \|x\|_X\|x_0\|_X \\
&= \|x\|_X\left(\tfrac12\|x\|_X - \|x^*\|_{X^*} - \|x_0\|_X\right) + \alpha.
\end{aligned}$$
Since $x^*$ and $x_0$ are fixed, the term in parentheses is positive for $\|x\|_X$ sufficiently large, and hence $J(x) \to \infty$ for $\|x\|_X \to \infty$ as claimed. □
3.3 continuity properties
To close this chapter, we show the following remarkable result: Any (locally) bounded convexfunctional is (locally) continuous. (An extended real-valued proper functional is necessarilydiscontinuous at some point.) Besides being of use in later chapters, this result illustratesthe beauty of convex analysis: an algebraic but global property (convexity) connects twotopological but local properties (neighborhood and continuity). Here we consider of coursethe strong topology in a normed vector space.
Lemma 3.10. Let $X$ be a normed vector space, $F : X \to \overline{\mathbb{R}}$ be convex, and $x \in X$. If there is a $\rho > 0$ such that $F$ is bounded from above on $\mathbb{O}(x,\rho)$, then $F$ is locally Lipschitz continuous near $x$.

Proof. By assumption, there exists an $M \in \mathbb{R}$ with $F(y) \le M$ for all $y \in \mathbb{O}(x,\rho)$. We first show that $F$ is locally bounded from below as well. Let $y \in \mathbb{O}(x,\rho)$ be arbitrary. Since $\|x - y\|_X < \rho$, we also have that $z := 2x - y = x - (y - x) \in \mathbb{O}(x,\rho)$, and the convexity of $F$ implies that $F(x) = F\big(\tfrac12 y + \tfrac12 z\big) \le \tfrac12 F(y) + \tfrac12 F(z)$ and hence that
$$-F(y) \le F(z) - 2F(x) \le M - 2F(x) =: m,$$
i.e., $-m \le F(y) \le M$ for all $y \in \mathbb{O}(x,\rho)$.

We now show that this implies Lipschitz continuity on $\mathbb{O}(x,\tfrac\rho2)$. Let $y_1, y_2 \in \mathbb{O}(x,\tfrac\rho2)$ with $y_1 \ne y_2$ and set
$$z := y_1 + \frac\rho2\,\frac{y_1 - y_2}{\|y_1 - y_2\|_X} \in \mathbb{O}(x,\rho),$$
which holds because $\|z - x\|_X \le \|y_1 - x\|_X + \tfrac\rho2 < \rho$. By construction, we thus have that
$$y_1 = \lambda z + (1-\lambda)y_2 \quad\text{for } \lambda := \frac{\|y_1 - y_2\|_X}{\|y_1 - y_2\|_X + \tfrac\rho2} \in (0,1),$$
and the convexity of $F$ now implies that $F(y_1) \le \lambda F(z) + (1-\lambda)F(y_2)$. Together with the definition of $\lambda$ as well as $F(z) \le M$ and $-F(y_2) \le m = M - 2F(x)$, this yields the estimate
$$F(y_1) - F(y_2) \le \lambda\big(F(z) - F(y_2)\big) \le \lambda\big(2M - 2F(x)\big) = \frac{2(M - F(x))}{\|y_1 - y_2\|_X + \tfrac\rho2}\|y_1 - y_2\|_X \le \frac{2(M - F(x))}{\rho/2}\|y_1 - y_2\|_X.$$
Exchanging the roles of $y_1$ and $y_2$, we obtain that
$$|F(y_1) - F(y_2)| \le \frac{2(M - F(x))}{\rho/2}\|y_1 - y_2\|_X \quad\text{for all } y_1, y_2 \in \mathbb{O}(x,\tfrac\rho2)$$
and hence the local Lipschitz continuity with constant $L(x,\tfrac\rho2) := 4(M - F(x))/\rho$. □
This result can be extended by showing that convex functions are bounded from above everywhere in the interior (again a topological concept!) of their effective domain. As an intermediate step, we first consider the scalar case.²

Lemma 3.11. Let $f : \mathbb{R} \to \overline{\mathbb{R}}$ be convex. Then $f$ is locally bounded from above on $\operatorname{int}(\operatorname{dom} f)$.

Proof. Let $x \in \operatorname{int}(\operatorname{dom} f)$, i.e., there exist $a, b \in \mathbb{R}$ with $x \in (a,b) \subset \operatorname{dom} f$. Let now $z \in (a,b)$. Since intervals are convex, there exists a $\lambda \in (0,1)$ with $z = \lambda a + (1-\lambda)b$. By convexity, we thus have
$$f(z) \le \lambda f(a) + (1-\lambda)f(b) \le \max\{f(a), f(b)\} < \infty.$$
Hence $f$ is locally bounded from above in $x$. □
The proof of the general case requires further assumptions on $X$ and $F$.
Theorem 3.12. Let $X$ be a Banach space and $F : X \to \overline{\mathbb{R}}$ be convex and lower semicontinuous. Then $F$ is locally bounded from above on $\operatorname{int}(\operatorname{dom} F)$.

Proof. We first show the claim for the case $x = 0 \in \operatorname{int}(\operatorname{dom} F)$, which implies in particular that $M := |F(0)|$ is finite. Consider now for arbitrary $h \in X$ the mapping
$$f : \mathbb{R} \to \overline{\mathbb{R}}, \qquad t \mapsto F(th).$$
It is straightforward to verify that $f$ is convex and lower semicontinuous as well and satisfies $0 \in \operatorname{int}(\operatorname{dom} f)$. By Lemmas 3.10 and 3.11, $f$ is thus locally Lipschitz continuous near $0$; hence in particular $|f(t) - f(0)| \le Lt \le 1$ for sufficiently small $t > 0$. The reverse triangle inequality therefore yields a $\delta > 0$ with
$$F(0 + th) \le |F(0 + th)| = |f(t)| \le |f(0)| + 1 = M + 1 \quad\text{for all } t \in [0,\delta].$$
Hence, $0$ lies in the algebraic interior of the sublevel set $\operatorname{sub}_{M+1} F$, which is convex and closed (since we assumed $F$ to be lower semicontinuous) by Lemma 3.3. The core-int Lemma 1.2 thus yields that $0 \in \operatorname{int}(\operatorname{sub}_{M+1} F)$, i.e., there exists a $\rho > 0$ with $F(z) \le M + 1$ for all $z \in \mathbb{O}(0,\rho)$.

For the general case $x \in \operatorname{int}(\operatorname{dom} F)$, consider
$$\tilde F : X \to \overline{\mathbb{R}}, \qquad y \mapsto F(y + x).$$
Again, it is straightforward to verify convexity and lower semicontinuity of $\tilde F$ and that $0 \in \operatorname{int}(\operatorname{dom}\tilde F)$. It follows from the above that $\tilde F$ is locally bounded from above on $\mathbb{O}(0,\rho)$, which immediately implies that $F$ is locally bounded from above on $\mathbb{O}(x,\rho)$. □

²With a bit more effort, one can show that the claim holds for $F : \mathbb{R}^N \to \overline{\mathbb{R}}$ with arbitrary $N \in \mathbb{N}$; see, e.g., [Schirotzek 2007, Corollary 1.4.2].
Together with Lemma 3.10, we thus obtain the desired result.

Theorem 3.13. Let $X$ be a Banach space and $F : X \to \overline{\mathbb{R}}$ be convex and lower semicontinuous. Then $F$ is locally Lipschitz continuous on $\operatorname{int}(\operatorname{dom} F)$.

We shall have several more occasions to observe the unreasonably nice behavior of convex lower semicontinuous functions on the interior of their effective domain.
4 CONVEX SUBDIFFERENTIALS
We now turn to the characterization of minimizers of convex functionals via a Fermat principle.
4.1 definition and basic properties
We first define our notion of generalized derivative. The motivation is geometric: the classical derivative $f'(t)$ of a scalar function $f : \mathbb{R} \to \mathbb{R}$ at $t$ can be interpreted as the slope of the tangent at $f(t)$. If the function is not differentiable, the tangent (if it exists at all) need no longer be unique. The idea is thus to define as the generalized derivative the set of all tangent slopes. Correspondingly, we define in a normed vector space $X$ the (convex) subdifferential of $F : X \to \overline{\mathbb{R}}$ at $\bar x \in \operatorname{dom} F$ as
$$\partial F(\bar x) := \{x^* \in X^* \mid \langle x^*, x - \bar x\rangle_X \le F(x) - F(\bar x) \text{ for all } x \in X\}. \tag{4.1}$$
(Note that $x \notin \operatorname{dom} F$ is allowed since in this case the inequality is trivially satisfied.) For $\bar x \notin \operatorname{dom} F$, we set $\partial F(\bar x) := \emptyset$. An element $x^* \in \partial F(\bar x)$ is called a subderivative. (Following the terminology for classical derivatives, we reserve the more common term subgradient for its Riesz representation $z_{x^*} \in X$ when $X$ is a Hilbert space.)
The following example shows that the subdifferential can also be empty for $\bar x \in \operatorname{dom} F$, even if $F$ is convex.

Example 4.1. We take $X = \mathbb{R}$ (and hence $X^* \cong X = \mathbb{R}$) and consider
$$F(x) = \begin{cases} -\sqrt{x} & \text{if } x \ge 0,\\ \infty & \text{if } x < 0.\end{cases}$$
Since (3.1) is trivially satisfied if $x$ or $y$ is negative, we can assume $x, y \ge 0$ so that we are allowed to take the square of both sides of (3.1). A straightforward algebraic manipulation then shows that (3.1) is equivalent to $\lambda(1-\lambda)(\sqrt{x} - \sqrt{y})^2 \ge 0$, which holds for any $x, y \ge 0$ and $\lambda \in [0,1]$. Hence, $F$ is convex.

However, for $\bar x = 0$, any $x^* \in \partial F(0)$ by definition must satisfy
$$x^* \cdot x \le -\sqrt{x} \quad\text{for all } x \ge 0.$$
Taking now $x > 0$ arbitrary, we can divide by it on both sides and let $x \to 0$ to obtain
$$x^* \le -(\sqrt{x})^{-1} \to -\infty.$$
This is impossible for $x^* \in \mathbb{R} \cong X^*$. Hence, $\partial F(0)$ is empty.
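The blow-up in Example 4.1 is already visible numerically: the difference quotients $(F(x) - F(0))/x = -1/\sqrt{x}$ decrease without bound as $x \to 0$, so no finite slope $x^*$ can lie below all of them (step sizes are illustrative):

```python
# Difference quotients of F(x) = -sqrt(x) at 0 are unbounded below,
# so the subdifferential at 0 is empty (cf. Example 4.1).
import math

def F(x):
    return -math.sqrt(x)

quotients = [(F(10.0 ** -k) - F(0.0)) / 10.0 ** -k for k in range(1, 9)]
# (F(x) - F(0))/x = -1/sqrt(x), which tends to -infinity as x -> 0
assert all(q2 < q1 for q1, q2 in zip(quotients, quotients[1:]))  # strictly decreasing
assert quotients[-1] < -1e3  # already below any fixed candidate slope
```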
In fact, it will become clear that the nonexistence of tangents is much more problematic than the nonuniqueness. However, we will later show that for proper, convex, and lower semicontinuous functionals, $\partial F(x)$ is nonempty (and bounded) for all $x \in \operatorname{int}(\operatorname{dom} F)$; see Theorem 13.17. Furthermore, it follows directly from the definition that for all $x \in X$, the set $\partial F(x)$ is convex and weakly-∗ closed. The definition immediately yields a Fermat principle.
Theorem 4.2 (Fermat principle). Let $F : X \to \overline{\mathbb{R}}$ and $\bar x \in \operatorname{dom} F$. Then the following statements are equivalent:

(i) $0 \in \partial F(\bar x)$;

(ii) $F(\bar x) = \min_{x\in X} F(x)$.

Proof. This is a direct consequence of the definitions: $0 \in \partial F(\bar x)$ if and only if
$$0 = \langle 0, x - \bar x\rangle_X \le F(x) - F(\bar x) \quad\text{for all } x \in X,$$
i.e., $F(\bar x) \le F(x)$ for all $x \in X$.¹ □
This matches the geometrical intuition: if $X = \mathbb{R} \cong X^*$, the affine function $x \mapsto F(\bar x) + x^*(x - \bar x)$ with $x^* \in \partial F(\bar x)$ describes a tangent at $(\bar x, F(\bar x))$ with slope $x^*$; the condition $x^* = 0 \in \partial F(\bar x)$ thus means that $F$ has a horizontal tangent in $\bar x$. (Conversely, the function from Example 4.1 only has a vertical tangent in $\bar x = 0$, which corresponds to an infinite slope that is not an element of any vector space.)

¹Note that convexity of $F$ is not required for Theorem 4.2! The condition $0 \in \partial F(\bar x)$ therefore characterizes the global(!) minimizers of any function $F$. However, nonconvex functionals can also have local minimizers, for which the subdifferential inclusion is not satisfied. In fact, (convex) subdifferentials of nonconvex functionals are usually empty. (And conversely, one can show that $\partial F(x) \ne \emptyset$ for all $x \in \operatorname{dom} F$ implies that $F$ is convex.) This leads to problems in particular for the proof of calculus rules, for which we will indeed have to assume convexity.
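As a concrete instance of the Fermat principle, take $X = \mathbb{R}$ and $F(x) = |x|$: the slope $x^* = 0$ satisfies the subdifferential inequality at $\bar x = 0$, so $\bar x = 0$ is the global minimizer, while slopes outside $[-1,1]$ fail the inequality (the sampling is an illustrative choice; the full subdifferential of the norm is computed in Theorem 4.6 below):

```python
# Fermat principle for F(x) = |x|: x* = 0 is a subderivative at xbar = 0,
# i.e. <0, x - 0> <= |x| - |0| for all x, so xbar = 0 is a global minimizer.
def F(x):
    return abs(x)

xbar, xstar = 0.0, 0.0
samples = [-2.0 + 0.01 * k for k in range(401)]
assert all(xstar * (x - xbar) <= F(x) - F(xbar) for x in samples)
assert all(F(xbar) <= F(x) for x in samples)  # hence xbar minimizes F

# a candidate slope outside [-1, 1] violates the subdifferential inequality:
assert any(1.5 * (x - xbar) > F(x) - F(xbar) for x in samples)
```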
Not surprisingly, the convex subdifferential behaves more nicely for convex functions. The key property is an alternative characterization using directional derivatives, which exist (at least in the extended real-valued sense) for any convex function.
Lemma 4.3. Let $F : X \to \overline{\mathbb{R}}$ be convex and let $\bar x \in \operatorname{dom} F$ and $h \in X$ be given. Then:

(i) the function
$$\varphi : (0,\infty) \to \overline{\mathbb{R}}, \qquad t \mapsto \frac{F(\bar x + th) - F(\bar x)}{t},$$
is increasing;

(ii) the limit $F'(\bar x; h) := \lim_{t\searrow 0}\varphi(t) \in [-\infty,\infty]$ exists and satisfies
$$F'(\bar x; h) \le F(\bar x + h) - F(\bar x);$$

(iii) if $\bar x \in \operatorname{int}(\operatorname{dom} F)$, the limit $F'(\bar x; h)$ is finite.

Proof. (i): Inserting the definition and sorting terms shows that for all $0 < s \le t$, the condition $\varphi(s) \le \varphi(t)$ is equivalent to
$$F(\bar x + sh) \le \frac st F(\bar x + th) + \left(1 - \frac st\right)F(\bar x),$$
which follows from the convexity of $F$ since $\bar x + sh = \frac st(\bar x + th) + (1 - \frac st)\bar x$.

(ii): The claim immediately follows from (i) since
$$F'(\bar x; h) = \lim_{t\searrow 0}\varphi(t) = \inf_{t>0}\varphi(t) \le \varphi(1) = F(\bar x + h) - F(\bar x) \in \overline{\mathbb{R}}.$$

(iii): Since $\operatorname{int}(\operatorname{dom} F)$ is contained in the algebraic interior of $\operatorname{dom} F$, there exists an $\varepsilon > 0$ such that $\bar x + th \in \operatorname{dom} F$ for all $t \in (-\varepsilon,\varepsilon)$. Proceeding as in (i), we obtain that $\varphi(s) \le \varphi(t)$ for all $s < t < 0$ as well. From $\bar x = \frac12(\bar x + th) + \frac12(\bar x - th)$ for $t > 0$, we also obtain that
$$\varphi(-t) = \frac{F(\bar x - th) - F(\bar x)}{-t} \le \frac{F(\bar x + th) - F(\bar x)}{t} = \varphi(t)$$
and hence that $\varphi$ is increasing on all of $\mathbb{R} \setminus \{0\}$. As in (ii), the choice of $\varepsilon$ now implies that
$$-\infty < \varphi(-\varepsilon) \le F'(\bar x; h) \le \varphi(\varepsilon) < \infty. \qquad\Box$$
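The monotonicity in Lemma 4.3 (i) can be observed directly: for the convex function $f(x) = e^x$, the difference quotients $\varphi(t) = (f(\bar x + th) - f(\bar x))/t$ increase in $t$ and converge to the directional derivative as $t \searrow 0$ (the function and points are illustrative):

```python
# Monotonicity of difference quotients of a convex function (Lemma 4.3 (i)),
# illustrated for f(x) = exp(x) at xbar = 0 in direction h = 1.
import math

def phi(t, xbar=0.0, h=1.0):
    return (math.exp(xbar + t * h) - math.exp(xbar)) / t

ts = [0.001 * 2 ** k for k in range(12)]  # increasing positive step sizes
values = [phi(t) for t in ts]
assert all(v1 <= v2 for v1, v2 in zip(values, values[1:]))  # increasing in t
# the limit as t -> 0 is the directional derivative f'(0; 1) = 1:
assert abs(phi(1e-8) - 1.0) < 1e-6
```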
Lemma 4.4. Let $F : X \to \overline{\mathbb{R}}$ be convex and $\bar x \in \operatorname{dom} F$. Then
$$\partial F(\bar x) = \{x^* \in X^* \mid \langle x^*, h\rangle_X \le F'(\bar x; h) \text{ for all } h \in X\}.$$

Proof. Since any $x \in X$ can be written as $x = \bar x + h$ for some $h \in X$ and vice versa, it suffices to show that for any $x^* \in X^*$, the following statements are equivalent:

(i) $\langle x^*, h\rangle_X \le F'(\bar x; h)$ for all $h \in X$;

(ii) $\langle x^*, h\rangle_X \le F(\bar x + h) - F(\bar x)$ for all $h \in X$.

If $x^* \in X^*$ satisfies (i), we immediately obtain from Lemma 4.3 (ii) that
$$\langle x^*, h\rangle_X \le F'(\bar x; h) \le F(\bar x + h) - F(\bar x) \quad\text{for all } h \in X.$$
Setting $x = \bar x + h \in X$ then yields $x^* \in \partial F(\bar x)$.

Conversely, if $\langle x^*, h\rangle_X \le F(\bar x + h) - F(\bar x)$ holds for all $h := x - \bar x \in X$, it also holds for $th$ for all $h \in X$ and $t > 0$. Dividing by $t$ and passing to the limit then yields that
$$\langle x^*, h\rangle_X \le \lim_{t\searrow 0}\frac{F(\bar x + th) - F(\bar x)}{t} = F'(\bar x; h). \qquad\Box$$
4.2 fundamental examples
We now look at some examples. First, the construction from the directional derivative indicates that the subdifferential is indeed a generalization of the Gâteaux derivative.

Theorem 4.5. Let $F : X \to \overline{\mathbb{R}}$ be convex. If $F$ is Gâteaux differentiable at $x$, then $\partial F(x) = \{DF(x)\}$.

Proof. By definition of the Gâteaux derivative, we have that

$$\langle DF(x), h\rangle_X = DF(x)h = F'(x;h) \quad\text{for all } h \in X.$$

Lemma 4.4 now immediately yields $DF(x) \in \partial F(x)$. Conversely, $x^* \in \partial F(x)$ again by Lemma 4.4 implies that

$$\langle x^*, h\rangle_X \le F'(x;h) = \langle DF(x), h\rangle_X \quad\text{for all } h \in X.$$

Taking the supremum over all $h$ with $\|h\|_X \le 1$ now yields that $\|x^* - DF(x)\|_{X^*} \le 0$, i.e., $x^* = DF(x)$. $\Box$

The converse holds as well: If $x \in \operatorname{int}(\operatorname{dom} F)$ and $\partial F(x)$ is a singleton, then $F$ is Gâteaux differentiable; see Theorem 13.18.

Of course, we also want to compute subdifferentials of functionals that are not differentiable. The canonical example is the norm $\|\cdot\|_X$ on a normed vector space, which even for $X = \mathbb{R}$ is not differentiable at $x = 0$.
Theorem 4.6. For any $x \in X$,

$$\partial(\|\cdot\|_X)(x) = \begin{cases} \{x^* \in X^* \mid \langle x^*, x\rangle_X = \|x\|_X \text{ and } \|x^*\|_{X^*} = 1\} & \text{if } x \neq 0,\\ \mathbb{B}_{X^*} & \text{if } x = 0. \end{cases}$$

Proof. For $x = 0$, we have $x^* \in \partial(\|\cdot\|_X)(x)$ by definition if and only if

$$\langle x^*, \tilde{x}\rangle_X \le \|\tilde{x}\|_X \quad\text{for all } \tilde{x} \in X \setminus \{0\}$$

(since the inequality is trivial for $\tilde{x} = 0$), which by the definition of the operator norm is equivalent to $\|x^*\|_{X^*} \le 1$.

Let now $x \neq 0$ and consider $x^* \in \partial(\|\cdot\|_X)(x)$. Inserting first $\tilde{x} = 0$ and then $\tilde{x} = 2x$ into the definition (4.1) yields the sequence of inequalities

$$\|x\|_X \le \langle x^*, x\rangle_X = \langle x^*, 2x - x\rangle_X \le \|2x\|_X - \|x\|_X = \|x\|_X,$$

which imply that $\langle x^*, x\rangle_X = \|x\|_X$. Similarly, we have for all $\tilde{x} \in X$ that

$$\langle x^*, \tilde{x}\rangle_X = \langle x^*, (\tilde{x} + x) - x\rangle_X \le \|\tilde{x} + x\|_X - \|x\|_X \le \|\tilde{x}\|_X.$$

As in the case $x = 0$, this implies that $\|x^*\|_{X^*} \le 1$. For $\tilde{x} = x/\|x\|_X$ we thus have that

$$\langle x^*, \tilde{x}\rangle_X = \|x\|_X^{-1} \langle x^*, x\rangle_X = \|x\|_X^{-1}\,\|x\|_X = 1.$$

Hence, $\|x^*\|_{X^*} = 1$ is in fact attained.

Conversely, let $x^* \in X^*$ with $\langle x^*, x\rangle_X = \|x\|_X$ and $\|x^*\|_{X^*} = 1$. Then we obtain for all $\tilde{x} \in X$ from (1.1) the relation

$$\langle x^*, \tilde{x} - x\rangle_X = \langle x^*, \tilde{x}\rangle_X - \langle x^*, x\rangle_X \le \|\tilde{x}\|_X - \|x\|_X,$$

and hence $x^* \in \partial(\|\cdot\|_X)(x)$ by definition. $\Box$

Example 4.7. In particular, we obtain for $X = \mathbb{R}$ the subdifferential of the absolute value function as

(4.2) $\quad \partial(|\cdot|)(t) = \operatorname{sign}(t) := \begin{cases} \{1\} & \text{if } t > 0,\\ \{-1\} & \text{if } t < 0,\\ [-1,1] & \text{if } t = 0, \end{cases}$

cf. Figure 4.1a.²
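The set-valued sign in (4.2) can be checked numerically against the defining inequality $x^*(\tilde{t} - t) \le |\tilde{t}| - |t|$. The following sketch is our own illustration (not part of the book; the function names are hypothetical), representing $\partial|\cdot|(t)$ as a closed interval:

```python
# Hypothetical helper names; a numerical companion to (4.2), not from the book.
def sign_interval(t: float) -> tuple[float, float]:
    """The set d|.|(t) = sign(t) from (4.2), encoded as an interval [lo, hi]."""
    if t > 0:
        return (1.0, 1.0)
    if t < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)  # the sign in the sense of convex analysis at t = 0

def is_subgradient(s: float, t: float) -> bool:
    """Check the defining inequality s*(u - t) <= |u| - |t| on sample points."""
    samples = [u / 10.0 for u in range(-50, 51)]
    return all(s * (u - t) <= abs(u) - abs(t) + 1e-12 for u in samples)
```

On these samples, every $s$ in the interval passes the test and every $s$ outside it fails, matching (4.2).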
Figure 4.1: Illustration of $\operatorname{graph} \partial F$ for some example functions $F : \mathbb{R} \to \overline{\mathbb{R}}$: (a) $F(x) = |x|$; (b) $F(x) = \delta_{[-1,1]}(x)$.
We can also give a more explicit characterization of the subdifferential of the indicator functional of a set $C \subset X$.

Lemma 4.8. For any $C \subset X$,

$$\partial\delta_C(x) = \{x^* \in X^* \mid \langle x^*, \tilde{x} - x\rangle_X \le 0 \text{ for all } \tilde{x} \in C\}.$$

Proof. For any $x \in C = \operatorname{dom}\delta_C$, we have that

$$x^* \in \partial\delta_C(x) \;\Leftrightarrow\; \langle x^*, \tilde{x} - x\rangle_X \le \delta_C(\tilde{x}) \text{ for all } \tilde{x} \in X \;\Leftrightarrow\; \langle x^*, \tilde{x} - x\rangle_X \le 0 \text{ for all } \tilde{x} \in C,$$

since the first inequality is trivially satisfied for all $\tilde{x} \notin C$. $\Box$

The set $N_C(x) := \partial\delta_C(x)$ is also called the (convex) normal cone to $C$ at $x$ (which may be empty if $C$ is not convex). Depending on the set $C$, this can be made even more explicit.
Example 4.9. Let $X = \mathbb{R}$ and $C = [-1,1]$, and let $t \in C$. Then we have $x^* \in \partial\delta_{[-1,1]}(t)$ if and only if $x^*(\tilde{t} - t) \le 0$ for all $\tilde{t} \in [-1,1]$. We proceed by distinguishing three cases.

Case 1: $t = 1$. Then $\tilde{t} - t \in [-2,0]$, and hence the product is nonpositive if and only if $x^* \ge 0$.

Case 2: $t = -1$. Then $\tilde{t} - t \in [0,2]$, and hence the product is nonpositive if and only if $x^* \le 0$.

Case 3: $t \in (-1,1)$. Then $\tilde{t} - t$ can be positive as well as negative, and hence only $x^* = 0$ is possible.

We thus obtain that

(4.3) $\quad \partial\delta_{[-1,1]}(t) = \begin{cases} [0,\infty) & \text{if } t = 1,\\ (-\infty,0] & \text{if } t = -1,\\ \{0\} & \text{if } t \in (-1,1),\\ \emptyset & \text{if } t \in \mathbb{R}\setminus[-1,1], \end{cases}$

cf. Figure 4.1b. Readers familiar with (non)linear optimization will recognize these as the complementarity conditions for Lagrange multipliers corresponding to the inequalities $-1 \le t \le 1$.

²Note that this set-valued definition of $\operatorname{sign}(t)$ differs from the usual (single-valued) one, in particular for $t = 0$; to make this distinction clear, one often refers to (4.2) as the sign in the sense of convex analysis. Throughout this book, we will always use the sign in this sense.
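As a companion to (4.3), the normal cone of $C = [-1,1]$ can be tabulated and checked against the definition from Lemma 4.8. This sketch is our own (hypothetical names), with unbounded rays encoded via `math.inf`:

```python
import math

def normal_cone_interval(t: float):
    """N_C(t) for C = [-1, 1] as in (4.3); None encodes the empty set."""
    if t > 1 or t < -1:
        return None                 # t outside C: empty
    if t == 1:
        return (0.0, math.inf)      # [0, inf)
    if t == -1:
        return (-math.inf, 0.0)     # (-inf, 0]
    return (0.0, 0.0)               # {0} in the interior

def in_normal_cone(s: float, t: float) -> bool:
    """Membership via Lemma 4.8: s*(u - t) <= 0 for all u in C (sampled)."""
    samples = [u / 100.0 for u in range(-100, 101)]
    return -1.0 <= t <= 1.0 and all(s * (u - t) <= 1e-12 for u in samples)
```

The two functions agree on the complementarity pattern described above: nonzero multipliers only appear at the boundary points $t = \pm 1$.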
Conversely, subdifferentials of convex functionals can be obtained from normal cones to corresponding epigraphs (which are convex sets by Lemma 3.1). This relation will be the basis for defining further subdifferentials for more general classes of mappings in Part IV.

Lemma 4.10. Let $F : X \to \overline{\mathbb{R}}$ be convex and $x \in \operatorname{dom} F$. Then $x^* \in \partial F(x)$ if and only if $(x^*, -1) \in N_{\operatorname{epi} F}(x, F(x))$.

Proof. By definition of the normal cone, $(x^*, -1) \in N_{\operatorname{epi} F}(x, F(x))$ is equivalent to

(4.4) $\quad \langle x^*, \tilde{x} - x\rangle_X - (t - F(x)) \le 0 \quad\text{for all } (\tilde{x}, t) \in \operatorname{epi} F,$

i.e., for all $\tilde{x} \in X$ and $t \ge F(\tilde{x})$. Taking $t = F(\tilde{x})$ and rearranging, this yields that $x^* \in \partial F(x)$. Conversely, from $x^* \in \partial F(x)$ we immediately obtain that

$$\langle x^*, \tilde{x} - x\rangle_X \le F(\tilde{x}) - F(x) \le t - F(x) \quad\text{for all } \tilde{x} \in X,\ t \ge F(\tilde{x}),$$

i.e., (4.4) and thus $(x^*, -1) \in N_{\operatorname{epi} F}(x, F(x))$. $\Box$
The following result furnishes a crucial link between finite- and infinite-dimensional convex optimization. We again assume (as we will from now on) that $\Omega \subset \mathbb{R}^n$ is open and bounded.

Theorem 4.11. Let $f : \mathbb{R} \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous, and let $F : L^p(\Omega) \to \overline{\mathbb{R}}$ with $1 \le p < \infty$ be as in Lemma 3.7. Then we have for all $u \in \operatorname{dom} F$ with $q := \frac{p}{p-1}$ that

$$\partial F(u) = \{u^* \in L^q(\Omega) \mid u^*(x) \in \partial f(u(x)) \text{ for almost every } x \in \Omega\}.$$

Proof. Let $u, \tilde{u} \in \operatorname{dom} F$, i.e., $f \circ u, f \circ \tilde{u} \in L^1(\Omega)$, and let $u^* \in L^q(\Omega)$ be arbitrary. If $u^*(x) \in \partial f(u(x))$ almost everywhere, we can integrate over all $x \in \Omega$ to obtain

$$F(\tilde{u}) - F(u) = \int_\Omega f(\tilde{u}(x)) - f(u(x))\,dx \ge \int_\Omega u^*(x)(\tilde{u}(x) - u(x))\,dx = \langle u^*, \tilde{u} - u\rangle_{L^p},$$

i.e., $u^* \in \partial F(u)$.

Conversely, let $u^* \in \partial F(u)$. Then by definition it holds that

$$\int_\Omega u^*(x)(\tilde{u}(x) - u(x))\,dx \le \int_\Omega f(\tilde{u}(x)) - f(u(x))\,dx \quad\text{for all } \tilde{u} \in L^p(\Omega).$$

Let now $t \in \mathbb{R}$ be arbitrary and let $A \subset \Omega$ be an arbitrary measurable set. Setting

$$\tilde{u}(x) := \begin{cases} t & \text{if } x \in A,\\ u(x) & \text{if } x \notin A, \end{cases}$$

the above inequality implies due to $\tilde{u} \in L^p(\Omega)$ that

$$\int_A u^*(x)(t - u(x))\,dx \le \int_A f(t) - f(u(x))\,dx.$$

Since $A$ was arbitrary, it must hold that

$$u^*(x)(t - u(x)) \le f(t) - f(u(x)) \quad\text{for almost every } x \in \Omega.$$

Furthermore, since $t \in \mathbb{R}$ was arbitrary, we obtain that $u^*(x) \in \partial f(u(x))$ for almost every $x \in \Omega$. $\Box$
Remark 4.12. A similar representation can be shown for vector-valued and spatially-dependent integrands $f : \Omega \times \mathbb{R}^m \to \overline{\mathbb{R}}$ under stronger assumptions; see, e.g., [Rockafellar 1976a, Corollary 3F].

A similar proof shows that for $F : \mathbb{R}^N \to \overline{\mathbb{R}}$ with $F(x) = \sum_{i=1}^N f_i(x_i)$ and $f_i : \mathbb{R} \to \overline{\mathbb{R}}$ convex, we have for any $x \in \operatorname{dom} F$ that

$$\partial F(x) = \{x^* \in \mathbb{R}^N \mid x^*_i \in \partial f_i(x_i),\ 1 \le i \le N\}.$$

Together with the above examples, this yields componentwise expressions for the subdifferential of the norm $\|\cdot\|_1$ as well as of the indicator functional of the unit ball with respect to the supremum norm in $\mathbb{R}^N$.
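For instance, the componentwise formula gives $\partial\|\cdot\|_1(x)$ as the product of the intervals $\operatorname{sign}(x_i)$ from (4.2). A hedged sketch (our own, hypothetical names):

```python
# Our own illustration of the componentwise formula; not from the book.
def l1_subdifferential(x):
    """Return intervals [lo_i, hi_i] whose product is d||.||_1(x), by (4.2)."""
    return [(1.0, 1.0) if xi > 0 else ((-1.0, -1.0) if xi < 0 else (-1.0, 1.0))
            for xi in x]

def is_l1_subgradient(xstar, x):
    """x* is a subgradient iff each component lies in the matching interval."""
    return all(lo <= s <= hi
               for s, (lo, hi) in zip(xstar, l1_subdifferential(x)))
```

At a point with some zero components, the subdifferential is thus a box rather than a singleton, which is exactly the source of sparsity in $\ell^1$-regularized problems.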
4.3 calculus rules
As for classical derivatives, one rarely obtains subdifferentials from the fundamental definition but rather by applying calculus rules. It stands to reason that these are more difficult to derive the weaker the derivative concept is (i.e., the more functionals are differentiable in that sense). For convex subdifferentials, the following two rules still follow directly from the definition.

Lemma 4.13. Let $F : X \to \overline{\mathbb{R}}$ be convex and $x \in \operatorname{dom} F$. Then,

(i) $\partial(\lambda F)(x) = \lambda(\partial F(x)) := \{\lambda x^* \mid x^* \in \partial F(x)\}$ for $\lambda \ge 0$;

(ii) $\partial F(\cdot + x_0)(x) = \partial F(x + x_0)$ for $x_0 \in X$ with $x + x_0 \in \operatorname{dom} F$.
Already the sum rule is considerably more delicate.
Theorem 4.14 (sum rule). Let $X$ be a Banach space, $F, G : X \to \overline{\mathbb{R}}$ be convex and lower semicontinuous, and $x \in \operatorname{dom} F \cap \operatorname{dom} G$. Then

$$\partial F(x) + \partial G(x) \subset \partial(F + G)(x),$$

with equality if there exists an $x_0 \in \operatorname{int}(\operatorname{dom} F) \cap \operatorname{dom} G$.

Proof. The inclusion follows directly from adding the definitions of the two subdifferentials. Let therefore $x \in \operatorname{dom} F \cap \operatorname{dom} G$ and $x^* \in \partial(F+G)(x)$, i.e., satisfying

(4.5) $\quad \langle x^*, \tilde{x} - x\rangle_X \le (F(\tilde{x}) + G(\tilde{x})) - (F(x) + G(x)) \quad\text{for all } \tilde{x} \in X.$

Our goal is now to use (as in the proof of Lemma 3.5) the characterization of convex functionals via their epigraph together with the Hahn–Banach separation theorem to construct a bounded linear functional $y^* \in \partial G(x) \subset X^*$ with $x^* - y^* \in \partial F(x)$, i.e.,

$$\begin{aligned} F(\tilde{x}) - F(x) - \langle x^*, \tilde{x} - x\rangle_X &\ge \langle y^*, x - \tilde{x}\rangle_X &&\text{for all } \tilde{x} \in \operatorname{dom} F,\\ G(x) - G(\tilde{x}) &\le \langle y^*, x - \tilde{x}\rangle_X &&\text{for all } \tilde{x} \in \operatorname{dom} G. \end{aligned}$$

For that purpose, we define the sets

$$\begin{aligned} C_1 &:= \{(\tilde{x},\, t - (F(x) - \langle x^*, x\rangle_X)) \mid \tilde{x} \in \operatorname{dom} F,\ t \ge F(\tilde{x}) - \langle x^*, \tilde{x}\rangle_X\},\\ C_2 &:= \{(\tilde{x},\, G(x) - t) \mid \tilde{x} \in \operatorname{dom} G,\ t \ge G(\tilde{x})\}, \end{aligned}$$

i.e.,

$$C_1 = \operatorname{epi}(F - x^*) - (0, F(x) - \langle x^*, x\rangle_X), \qquad C_2 = -(\operatorname{epi} G - (0, G(x))),$$

cf. Figure 4.2. To apply Corollary 1.6 to these sets, we have to verify its conditions.
(i) Since $x \in \operatorname{dom} F \cap \operatorname{dom} G$, both $C_1$ and $C_2$ are nonempty. Furthermore, since $F$ and $G$ are convex, it is straightforward (if tedious) to verify from the definition that $C_1$ and $C_2$ are convex.

(ii) The critical point is of course the nonemptiness of $\operatorname{int} C_1$, for which we argue as follows. Since $x_0 \in \operatorname{int}(\operatorname{dom} F)$, we know from Theorem 3.12 that $F$ is bounded in an open neighborhood $U \subset \operatorname{int}(\operatorname{dom} F)$ of $x_0$. We can thus find an open interval $I \subset \mathbb{R}$ such that $U \times I \subset C_1$. Since $U \times I$ is open by the definition of the product topology on $X \times \mathbb{R}$, any $(x_0, \alpha)$ with $\alpha \in I$ is an interior point of $C_1$.

(iii) It remains to show that $\operatorname{int} C_1 \cap C_2 = \emptyset$. Assume there exists a $(\tilde{x}, \alpha) \in \operatorname{int} C_1 \cap C_2$. But then the definitions of these sets and of the product topology imply that

$$F(\tilde{x}) - F(x) - \langle x^*, \tilde{x} - x\rangle_X < \alpha \le G(x) - G(\tilde{x}),$$

contradicting (4.5). Hence $\operatorname{int} C_1$ and $C_2$ are disjoint.

We can thus apply Corollary 1.6 to obtain a pair $(z^*, s) \in (X^* \times \mathbb{R}) \setminus \{(0,0)\} \cong (X \times \mathbb{R})^* \setminus \{(0,0)\}$ and a $\lambda \in \mathbb{R}$ with

(4.6a) $\quad \langle z^*, \tilde{x}\rangle_X + s\,(t - (F(x) - \langle x^*, x\rangle_X)) \le \lambda$, $\quad \tilde{x} \in \operatorname{dom} F$, $t \ge F(\tilde{x}) - \langle x^*, \tilde{x}\rangle_X$,

(4.6b) $\quad \langle z^*, \tilde{x}\rangle_X + s\,(G(x) - t) \ge \lambda$, $\quad \tilde{x} \in \operatorname{dom} G$, $t \ge G(\tilde{x})$.

We now show that $s < 0$. If $s = 0$, we can insert $\tilde{x} = x_0 \in \operatorname{dom} F \cap \operatorname{dom} G$ to obtain the contradiction

$$\langle z^*, x_0\rangle_X < \lambda \le \langle z^*, x_0\rangle_X,$$

which follows since $(x_0, \alpha)$ for $\alpha$ large enough is an interior point of $C_1$ and hence can be strictly separated from $C_2$ by Theorem 1.5 (i). If $s > 0$, choosing $t > F(\tilde{x}) - \langle x^*, \tilde{x}\rangle_X$ makes the term in parentheses in (4.6a) strictly positive, and taking $t \to \infty$ with fixed $\tilde{x}$ leads to a contradiction to the boundedness by $\lambda$.

Hence $s < 0$, and (4.6a) with $t = F(\tilde{x}) - \langle x^*, \tilde{x}\rangle_X$ and (4.6b) with $t = G(\tilde{x})$ imply that

$$\begin{aligned} F(\tilde{x}) - F(x) - \langle x^*, \tilde{x} - x\rangle_X &\ge s^{-1}(\lambda - \langle z^*, \tilde{x}\rangle_X) &&\text{for all } \tilde{x} \in \operatorname{dom} F,\\ G(x) - G(\tilde{x}) &\le s^{-1}(\lambda - \langle z^*, \tilde{x}\rangle_X) &&\text{for all } \tilde{x} \in \operatorname{dom} G. \end{aligned}$$

Taking $\tilde{x} = x \in \operatorname{dom} F \cap \operatorname{dom} G$ in both inequalities immediately yields that $\lambda = \langle z^*, x\rangle_X$. Hence, $y^* := s^{-1} z^* \in X^*$ is the desired functional with $(x^* - y^*) \in \partial F(x)$ and $y^* \in \partial G(x)$, i.e., $x^* \in \partial F(x) + \partial G(x)$. $\Box$

The following example demonstrates that the inclusion is strict in general (although naturally the situation in infinite-dimensional vector spaces is nowhere near as obvious).
Figure 4.2: Illustration of the proof of Theorem 4.14 for $F(x) = \frac12|x|^2$, $G(x) = |x|$, and $x^* = \frac12 \in \partial(F+G)(0)$. The dashed line is the separating hyperplane $\{(x,t) \mid z^* \cdot x + st = \lambda\}$, i.e., $\lambda = 0$, $z^* = -1$, $s = -2$ and hence $y^* = \frac12 \in \partial G(0)$.

Figure 4.3: Illustration of the situation in Example 4.15. Here the dashed separating hyperplane corresponds to the vertical line $\{(x,t) \mid x = 0\}$ (i.e., $z^* = 1$ and $s = 0$), and hence $y^* \notin \mathbb{R}$.
Example 4.15. We take again $X = \mathbb{R}$ and $F : X \to \overline{\mathbb{R}}$ from Example 4.1, i.e.,

$$F(x) = \begin{cases} -\sqrt{x} & \text{if } x \ge 0,\\ \infty & \text{if } x < 0, \end{cases}$$

as well as $G(x) = \delta_{(-\infty,0]}(x)$. Both $F$ and $G$ are convex, and $0 \in \operatorname{dom} F \cap \operatorname{dom} G$. In fact, $(F+G)(x) = \delta_{\{0\}}(x)$, and hence it is straightforward to verify that $\partial(F+G)(0) = \mathbb{R}$.

On the other hand, we know from Example 4.1 and the argument leading to (4.3) that

$$\partial F(0) = \emptyset, \qquad \partial G(0) = [0,\infty),$$

and hence that

$$\partial F(0) + \partial G(0) = \emptyset \subsetneq \mathbb{R} = \partial(F+G)(0).$$

(As $F$ only admits a vertical tangent at $x = 0$, this example corresponds to the situation where $s = 0$ in (4.6a), cf. Figure 4.3.)
Remark 4.16. There exist alternative conditions that guarantee that the sum rule holds with equality. For example, if $X$ is a Banach space and $F$ and $G$ are in addition lower semicontinuous, this holds under the Attouch–Brézis condition that

$$\bigcup_{\lambda \ge 0} \lambda\,(\operatorname{dom} F - \operatorname{dom} G) =: Z \quad\text{is a closed subspace of } X,$$

see [Attouch & Brezis 1986]. (Note that this condition is not satisfied in Example 4.15 either, since in this case $Z = -\operatorname{dom} G = [0,\infty)$, which is closed but not a subspace.)

It is not difficult to see that the condition $x_0 \in \operatorname{int}(\operatorname{dom} F) \cap \operatorname{dom} G$ in the statement of Theorem 4.14 implies the Attouch–Brézis condition. In fact, the latter allows us to generalize the condition to $x_0 \in \operatorname{ri}(\operatorname{dom} F) \cap \operatorname{dom} G$, where $\operatorname{ri} A$ for a set $A$ denotes the relative interior: the interior of $A$ with respect to the smallest closed affine set that contains $A$. As an example, $\operatorname{ri}\{x\} = \{x\}$ for a point $x \in X$.
By induction, we obtain from this a sum rule for an arbitrary (finite) number of functionals (where $x_0$ has to be an interior point of all but one effective domain). A chain rule for linear operators can be proved similarly.

Theorem 4.17 (chain rule). Let $X, Y$ be Banach spaces, $K \in \mathbb{L}(X;Y)$, $F : Y \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous, and $x \in \operatorname{dom}(F \circ K)$. Then,

$$\partial(F \circ K)(x) \supset K^* \partial F(Kx) := \{K^* y^* \mid y^* \in \partial F(Kx)\}$$

with equality if there exists an $x_0 \in X$ with $Kx_0 \in \operatorname{int}(\operatorname{dom} F)$.

Proof. The inclusion is again a direct consequence of the definition: If $y^* \in \partial F(Kx) \subset Y^*$, we in particular have for all $\tilde{y} = K\tilde{x} \in Y$ with $\tilde{x} \in X$ that

$$F(K\tilde{x}) - F(Kx) \ge \langle y^*, K\tilde{x} - Kx\rangle_Y = \langle K^* y^*, \tilde{x} - x\rangle_X,$$

i.e., $x^* := K^* y^* \in \partial(F \circ K)(x) \subset X^*$.

To show the claimed equality, let $x \in \operatorname{dom}(F \circ K)$ and $x^* \in \partial(F \circ K)(x)$, i.e.,

$$F(Kx) + \langle x^*, \tilde{x} - x\rangle_X \le F(K\tilde{x}) \quad\text{for all } \tilde{x} \in X.$$

We now construct a $y^* \in \partial F(Kx)$ with $x^* = K^* y^*$ by applying the sum rule to³

$$H : X \times Y \to \overline{\mathbb{R}}, \qquad (x,y) \mapsto F(y) + \delta_{\operatorname{graph} K}(x,y).$$

Since $K$ is linear, $\operatorname{graph} K$ and hence $\delta_{\operatorname{graph} K}$ are convex. Furthermore, $Kx \in \operatorname{dom} F$ by assumption and thus $(x, Kx) \in \operatorname{dom} H$.

³This technique of "lifting" a problem to a product space in order to separate operators is also useful in many other contexts.
We begin by showing that $x^* \in \partial(F \circ K)(x)$ if and only if $(x^*, 0) \in \partial H(x, Kx)$. First, let $(x^*, 0) \in \partial H(x, Kx)$. Then we have for all $\tilde{x} \in X$, $\tilde{y} \in Y$ that

$$\langle x^*, \tilde{x} - x\rangle_X + \langle 0, \tilde{y} - Kx\rangle_Y \le F(\tilde{y}) - F(Kx) + \delta_{\operatorname{graph} K}(\tilde{x}, \tilde{y}) - \delta_{\operatorname{graph} K}(x, Kx).$$

In particular, this holds for all $\tilde{y} \in \operatorname{ran}(K) = \{K\tilde{x} \mid \tilde{x} \in X\}$. By $\delta_{\operatorname{graph} K}(x, Kx) = 0$ we thus obtain that

$$\langle x^*, \tilde{x} - x\rangle_X \le F(K\tilde{x}) - F(Kx) \quad\text{for all } \tilde{x} \in X,$$

i.e., $x^* \in \partial(F \circ K)(x)$. Conversely, let $x^* \in \partial(F \circ K)(x)$. Since $\delta_{\operatorname{graph} K}(x, Kx) = 0$ and $\delta_{\operatorname{graph} K}(\tilde{x}, \tilde{y}) \ge 0$, it then follows for all $\tilde{x} \in X$ and $\tilde{y} \in Y$ that

$$\begin{aligned} \langle x^*, \tilde{x} - x\rangle_X + \langle 0, \tilde{y} - Kx\rangle_Y &= \langle x^*, \tilde{x} - x\rangle_X\\ &\le F(K\tilde{x}) - F(Kx) + \delta_{\operatorname{graph} K}(\tilde{x}, \tilde{y}) - \delta_{\operatorname{graph} K}(x, Kx)\\ &= F(\tilde{y}) - F(Kx) + \delta_{\operatorname{graph} K}(\tilde{x}, \tilde{y}) - \delta_{\operatorname{graph} K}(x, Kx), \end{aligned}$$

where we have used that the last equality holds trivially as $\infty = \infty$ for $\tilde{y} \neq K\tilde{x}$. Hence, $(x^*, 0) \in \partial H(x, Kx)$.

We now consider $\tilde{F} : X \times Y \to \overline{\mathbb{R}}$, $(x,y) \mapsto F(y)$, and $(x_0, Kx_0) \in \operatorname{graph} K = \operatorname{dom}\delta_{\operatorname{graph} K}$. Since $Kx_0 \in \operatorname{int}(\operatorname{dom} F) \subset Y$ by assumption, $(x_0, Kx_0) \in \operatorname{int}(\operatorname{dom}\tilde{F})$ as well. We can thus apply Theorem 4.14 to obtain

$$(x^*, 0) \in \partial H(x, Kx) = \partial\tilde{F}(x, Kx) + \partial\delta_{\operatorname{graph} K}(x, Kx),$$

i.e., $(x^*, 0) = (x^*_1, y^*_1) + (x^*_2, y^*_2)$ for some $(x^*_1, y^*_1) \in \partial\tilde{F}(x, Kx)$ and $(x^*_2, y^*_2) \in \partial\delta_{\operatorname{graph} K}(x, Kx)$. Now we have $(x^*_1, y^*_1) \in \partial\tilde{F}(x, Kx)$ if and only if

$$\langle x^*_1, \tilde{x} - x\rangle_X + \langle y^*_1, \tilde{y} - Kx\rangle_Y \le F(\tilde{y}) - F(Kx) \quad\text{for all } \tilde{x} \in X,\ \tilde{y} \in Y.$$

Fixing $\tilde{x} = x$ and $\tilde{y} = Kx$ implies that $y^*_1 \in \partial F(Kx)$ and $x^*_1 = 0$, respectively. Furthermore, $(x^*_2, y^*_2) \in \partial\delta_{\operatorname{graph} K}(x, Kx)$ if and only if

$$\langle x^*_2, \tilde{x} - x\rangle_X + \langle y^*_2, \tilde{y} - Kx\rangle_Y \le 0 \quad\text{for all } (\tilde{x}, \tilde{y}) \in \operatorname{graph} K,$$

i.e., for all $\tilde{x} \in X$ and $\tilde{y} = K\tilde{x}$. Therefore,

$$\langle x^*_2 + K^* y^*_2, \tilde{x} - x\rangle_X \le 0 \quad\text{for all } \tilde{x} \in X,$$

and hence $x^*_2 = -K^* y^*_2$. Together we obtain

$$(x^*, 0) = (0, y^*_1) + (-K^* y^*_2, y^*_2),$$

which implies $y^*_1 = -y^*_2$ and thus $x^* = -K^* y^*_2 = K^* y^*_1$ with $y^*_1 \in \partial F(Kx)$ as claimed. $\Box$

The condition for equality in particular holds if $K$ is surjective and $\operatorname{dom} F$ has nonempty interior. Again, the inclusion can be strict.
Example 4.18. Here we take $X = Y = \mathbb{R}$ and again $F : X \to \overline{\mathbb{R}}$ from Examples 4.1 and 4.15 as well as

$$K : \mathbb{R} \to \mathbb{R}, \qquad Kx = 0.$$

Clearly, $(F \circ K)(x) = 0$ for all $x \in \mathbb{R}$ and hence $\partial(F \circ K)(x) = \{0\}$ by Theorem 4.5. On the other hand, $\partial F(0) = \emptyset$ by Example 4.1 and hence

$$K^* \partial F(Kx) = K^* \partial F(0) = \emptyset \subsetneq \{0\}.$$

(Note the problem: $K^*$ is far from surjective, and $\operatorname{ran} K \cap \operatorname{int}(\operatorname{dom} F) = \emptyset$.)

We can also obtain a chain rule when the inner mapping is nondifferentiable.
Theorem 4.19. Let $F : X \to \mathbb{R}$ be convex and $\varphi : \mathbb{R} \to \mathbb{R}$ be convex, increasing, and differentiable. Then $\varphi \circ F$ is convex, and for all $x \in X$,

$$\partial[\varphi \circ F](x) = \varphi'(F(x))\,\partial F(x) = \{\varphi'(F(x))\,x^* \mid x^* \in \partial F(x)\}.$$

Proof. First, the convexity of $\varphi \circ F$ follows from Lemma 3.4 (iii). To calculate the subdifferential, we fix $x \in X$ and observe from Theorem 3.13 that $\varphi$ is Lipschitz continuous with some constant $L$ near $F(x) \in \operatorname{int}(\operatorname{dom}\varphi) = \mathbb{R}$. Thus, for any $h \in X$,

$$\begin{aligned} (\varphi \circ F)'(x;h) &= \lim_{t\searrow 0} \frac{[\varphi \circ F](x+th) - [\varphi \circ F](x)}{t}\\ &= \lim_{t\searrow 0} \frac{\varphi(F(x+th)) - \varphi(F(x) + tF'(x;h))}{t} + \lim_{t\searrow 0} \frac{\varphi(F(x) + tF'(x;h)) - \varphi(F(x))}{t}\\ &\le \lim_{t\searrow 0} L\,\left|\frac{F(x+th) - F(x)}{t} - F'(x;h)\right| + \varphi'(F(x); F'(x;h))\\ &= \varphi'(F(x); F'(x;h)), \end{aligned}$$

where we have used the directional differentiability of $F$ in $x \in \operatorname{int}(\operatorname{dom} F) = X$ in the last step. Similarly, we prove the opposite inequality using $\varphi(t_1) - \varphi(t_2) \ge -L|t_1 - t_2|$ for all $t_1, t_2$ sufficiently close to $F(x)$. Hence $[\varphi \circ F]'(x;h) = \varphi'(F(x); F'(x;h)) = \varphi'(F(x))\,F'(x;h)$ by the differentiability of $\varphi$.

Now Lemma 4.4 yields that

$$\begin{aligned} \partial(\varphi \circ F)(x) &= \{z^* \in X^* \mid \langle z^*, h\rangle_X \le \varphi'(F(x))\,F'(x;h) \text{ for all } h \in X\}\\ &= \{\varphi'(F(x))\,x^* \mid x^* \in X^*,\ \langle x^*, h\rangle_X \le F'(x;h) \text{ for all } h \in X\}\\ &= \{\varphi'(F(x))\,x^* \mid x^* \in \partial F(x)\}. \end{aligned}$$

For the second step, note that $\varphi'(F(x)) = 0$ implies that $z^* = 0$ as well; otherwise we can set $z^* := \varphi'(F(x))\,x^*$ and use $\varphi'(F(x)) > 0$ (since $\varphi$ is increasing) to simplify the inequality. $\Box$
Remark 4.20. The differentiability assumption on $\varphi$ in Theorem 4.19 is not necessary, but the proof is otherwise much more involved and demands the support functional machinery of Section 13.3. See also [Hiriart-Urruty & Lemaréchal 2001, Section D.4.3] for a version with set-valued $F$ in finite dimensions.

The Fermat principle together with the sum rule yields the following characterization of minimizers of convex functionals under convex constraints.

Corollary 4.21. Let $U \subset X$ be nonempty, convex, and closed, and let $F : X \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous. If there exists an $x_0 \in \operatorname{int} U \cap \operatorname{dom} F$, then $\bar{x} \in U$ solves

$$\min_{x \in U} F(x)$$

if and only if $0 \in \partial F(\bar{x}) + N_U(\bar{x})$ or, in other words, if there exists an $x^* \in X^*$ with

(4.7) $\quad \begin{cases} -x^* \in \partial F(\bar{x}),\\ \langle x^*, \tilde{x} - \bar{x}\rangle_X \le 0 \quad\text{for all } \tilde{x} \in U. \end{cases}$

Proof. Due to the assumptions on $F$ and $U$, we can apply Theorem 4.2 to $J := F + \delta_U$. Furthermore, since $x_0 \in \operatorname{int} U = \operatorname{int}(\operatorname{dom}\delta_U)$, we can also apply Theorem 4.14. Hence $F$ has a minimum in $\bar{x}$ if and only if

$$0 \in \partial J(\bar{x}) = \partial F(\bar{x}) + \partial\delta_U(\bar{x}).$$

Together with the characterization of subdifferentials of indicator functionals as normal cones, this yields (4.7). $\Box$

If $F : X \to \mathbb{R}$ is Gâteaux differentiable (and hence finite-valued), (4.7) coincides with the classical Karush–Kuhn–Tucker conditions; the existence of an interior point $x_0 \in \operatorname{int} U$ is related to a Slater condition in nonlinear optimization, which is needed to show existence of the Lagrange multiplier $x^*$ for inequality constraints.
5 FENCHEL DUALITY
One of the main tools in convex optimization is duality: Any convex optimization problem can be related to a dual problem, and the joint study of both problems yields additional information about the solution. Our main objective in this chapter, the Fenchel–Rockafellar duality theorem, will be our main tool for deriving explicit optimality conditions as well as numerical algorithms for convex minimization problems that can be expressed as the sum of (simple) functionals.
5.1 fenchel conjugates
Let $X$ be a normed vector space and $F : X \to \overline{\mathbb{R}}$ be proper but not necessarily convex. We then define the Fenchel conjugate (or convex conjugate) of $F$ as

$$F^* : X^* \to \overline{\mathbb{R}}, \qquad F^*(x^*) = \sup_{x \in X}\,\{\langle x^*, x\rangle_X - F(x)\}.$$

(Since $\operatorname{dom} F = \emptyset$ is excluded, we have that $F^*(x^*) > -\infty$ for all $x^* \in X^*$, and hence the definition is meaningful.) An alternative interpretation is that $F^*(x^*)$ is the (negative of the) affine part of the tangent to $F$ (in the point $x$ at which the supremum is attained) with slope $x^*$, see Figure 5.1. Lemma 3.4 (v) and Lemma 2.3 (v) immediately imply that $F^*$ is always convex and weakly-$*$ lower semicontinuous (as long as $F$ is indeed proper). If $F$ is bounded from below by an affine functional (which is always the case if $F$ is proper, convex, and lower semicontinuous by Lemma 3.5), then $F^*$ is proper as well. Finally, the definition directly yields the Fenchel–Young inequality

(5.1) $\quad \langle x^*, x\rangle_X \le F(x) + F^*(x^*) \quad\text{for all } x \in X,\ x^* \in X^*.$
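The inequality (5.1) can be spot-checked numerically by approximating the supremum in the definition of $F^*$ on a grid. A minimal sketch (our own, not from the book) for the scalar case $F(x) = \frac12 x^2$, whose conjugate is again $F^*(x^*) = \frac12 (x^*)^2$:

```python
# Our own numeric illustration of (5.1); all names are hypothetical.
def conjugate_on_grid(F, s, lo=-10.0, hi=10.0, n=2001):
    """Grid approximation of F*(s) = sup_x { s*x - F(x) } over [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return max(s * (lo + i * step) - F(lo + i * step) for i in range(n))

F = lambda x: 0.5 * x * x
# Fenchel-Young: s*x <= F(x) + F*(s), checked for sampled x and s.
for x in (-2.0, 0.0, 1.5):
    for s in (-1.0, 0.5, 3.0):
        assert s * x <= F(x) + conjugate_on_grid(F, s) + 1e-9
```

Equality holds exactly when the supremum is attained at the given $x$, which is the content of Lemma 5.8 below.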
If $X$ is not reflexive, we can similarly define for (weakly-$*$ lower semicontinuous) $F : X^* \to \overline{\mathbb{R}}$ the Fenchel preconjugate

$$F_* : X \to \overline{\mathbb{R}}, \qquad F_*(x) = \sup_{x^* \in X^*}\,\{\langle x^*, x\rangle_X - F(x^*)\}.$$

The point of this convention is that even in nonreflexive spaces, the biconjugate

$$F^{**} : X \to \overline{\mathbb{R}}, \qquad F^{**}(x) = (F^*)_*(x)$$
Figure 5.1: Geometrical illustration of the Fenchel conjugate. (a) $F^*(x^*)$ as maximizer of $x^* \cdot x - F(x)$. (b) Alternative interpretation: $-F^*(x^*)$ as offset for the tangent to $F$ with given slope $x^*$; note that in this case, $x^* \in \partial F(\bar{x})$ and $-F^*(x^*) + x^* \cdot \bar{x} = F(\bar{x})$.
is again defined on $X$ (rather than $X^{**} \supset X$). For reflexive spaces, of course, we have $F^{**} = (F^*)^*$. Intuitively, $F^{**}$ is the convex envelope of $F$, which by Lemma 3.5 coincides with $F$ itself if $F$ is convex.
Theorem 5.1 (Fenchel–Moreau–Rockafellar). Let $F : X \to \overline{\mathbb{R}}$ be proper. Then,

(i) $F^{**} \le F$;

(ii) $F^{**} = F^\Gamma$;

(iii) $F^{**} = F$ if and only if $F$ is convex and lower semicontinuous.

Proof. For (i), we take the supremum over all $x^* \in X^*$ in the Fenchel–Young inequality (5.1) and obtain that

$$F(x) \ge \sup_{x^* \in X^*}\,\{\langle x^*, x\rangle_X - F^*(x^*)\} = F^{**}(x).$$

For (ii), we first note that $F^{**}$ is convex and lower semicontinuous by definition as a Fenchel conjugate as well as proper by (i). Hence, Lemma 3.5 yields that

$$F^{**}(x) = (F^{**})^\Gamma(x) = \sup\,\{a(x) \mid a : X \to \mathbb{R} \text{ continuous affine with } a \le F^{**}\}.$$

We now show that we can replace $F^{**}$ with $F$ on the right-hand side. For this, let $a(x) = \langle x^*, x\rangle_X - \alpha$ with arbitrary $x^* \in X^*$ and $\alpha \in \mathbb{R}$. If $a \le F^{**}$, then (i) implies that $a \le F$. Conversely, if $a \le F$, we have that $\langle x^*, x\rangle_X - F(x) \le \alpha$ for all $x \in X$, and taking the supremum over all $x \in X$ yields that $\alpha \ge F^*(x^*)$. By definition of $F^{**}$, we thus obtain that

$$a(x) = \langle x^*, x\rangle_X - \alpha \le \langle x^*, x\rangle_X - F^*(x^*) \le F^{**}(x) \quad\text{for all } x \in X,$$

i.e., $a \le F^{**}$.

Statement (iii) now directly follows from (ii) and Lemma 3.5. $\Box$
Remark 5.2. Continuing from Remark 3.6, we can adapt the proof of Theorem 5.1 to proper functionals $F : X^* \to \overline{\mathbb{R}}$ to show that $F = (F_*)^*$ if and only if $F$ is convex and weakly-$*$ lower semicontinuous.
We again consider some relevant examples.
Example 5.3.

(i) Let $\mathbb{B}_X$ be the unit ball in the normed vector space $X$ and take $F = \delta_{\mathbb{B}_X}$. Then we have for any $x^* \in X^*$ that

$$(\delta_{\mathbb{B}_X})^*(x^*) = \sup_{x \in X}\,\{\langle x^*, x\rangle_X - \delta_{\mathbb{B}_X}(x)\} = \sup_{\|x\|_X \le 1}\,\langle x^*, x\rangle_X = \|x^*\|_{X^*}.$$

Similarly, one shows using the definition of the Fenchel preconjugate and Corollary 1.7 that $(\delta_{\mathbb{B}_{X^*}})_*(x) = \|x\|_X$.

(ii) Let $X$ be a normed vector space and take $F(x) = \|x\|_X$. We now distinguish two cases for a given $x^* \in X^*$.

Case 1: $\|x^*\|_{X^*} \le 1$. Then it follows from (1.1) that $\langle x^*, x\rangle_X - \|x\|_X \le 0$ for all $x \in X$. Furthermore, $\langle x^*, 0\rangle_X = 0 = \|0\|_X$, which implies that

$$F^*(x^*) = \sup_{x \in X}\,\{\langle x^*, x\rangle_X - \|x\|_X\} = 0.$$

Case 2: $\|x^*\|_{X^*} > 1$. Then by definition of the dual norm, there exists an $x_0 \in X$ with $\langle x^*, x_0\rangle_X > \|x_0\|_X$. Hence, taking $t \to \infty$ in

$$0 < t\,(\langle x^*, x_0\rangle_X - \|x_0\|_X) = \langle x^*, tx_0\rangle_X - \|tx_0\|_X \le F^*(x^*)$$

yields $F^*(x^*) = \infty$.

Together we obtain that $F^* = \delta_{\mathbb{B}_{X^*}}$. As above, a similar argument shows that $(\|\cdot\|_{X^*})_* = \delta_{\mathbb{B}_X}$.
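The dichotomy in Example 5.3 (ii) can be observed numerically for $X = \mathbb{R}$: a grid approximation of $F^*(s)$ for $F(x) = |x|$ stays at $0$ for $|s| \le 1$ and grows with the grid radius (reflecting $F^*(s) = \infty$) for $|s| > 1$. A sketch under our own naming:

```python
# Our own numeric illustration of Example 5.3 (ii); not from the book.
def norm_conjugate_on_grid(s: float, radius: float = 100.0, n: int = 20001):
    """Grid approximation of sup_x { s*x - |x| } over [-radius, radius]."""
    step = 2.0 * radius / (n - 1)
    return max(s * (-radius + i * step) - abs(-radius + i * step)
               for i in range(n))
```

For $|s| \le 1$ the maximand $sx - |x|$ is nonpositive with value $0$ at $x = 0$; for $|s| > 1$ the grid maximum is roughly $(|s| - 1)$ times the radius, diverging as the radius grows.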
We can generalize Example 5.3 (ii) to powers of norms.
Lemma 5.4. Let $X$ be a normed vector space and $F(x) := \frac1p \|x\|_X^p$ for $p \in (1,\infty)$. Then

$$F^*(x^*) = \frac1q \|x^*\|_{X^*}^q \quad\text{for } q := \frac{p}{p-1}.$$
Proof. We first consider the scalar function $\varphi(t) := \frac1p |t|^p$ and compute the Fenchel conjugate $\varphi^*(s)$ for $s \in \mathbb{R}$. By the choice of $q$ and $p$, we then can write $\frac1q = 1 - \frac1p$ as well as $|s|^q = s\,\big(\operatorname{sign}(s)|s|^{1/(p-1)}\big) = \big|\operatorname{sign}(s)|s|^{1/(p-1)}\big|^p$ for any $s \in \mathbb{R}$ and therefore obtain

$$\frac1q |s|^q = s\,\big(\operatorname{sign}(s)|s|^{1/(p-1)}\big) - \frac1p\,\big|\operatorname{sign}(s)|s|^{1/(p-1)}\big|^p \le \sup_{t \in \mathbb{R}}\,\Big\{ts - \frac1p |t|^p\Big\} \le \frac1q |s|^q,$$

where we have used the classical Young inequality $ts \le \frac1p |t|^p + \frac1q |s|^q$ in the last step. This shows that $\varphi^*(s) = \frac1q |s|^q$.¹

We now write using the definition of the norm in $X^*$ that

$$\begin{aligned} F^*(x^*) &= \sup_{x \in X}\,\Big\{\langle x^*, x\rangle_X - \frac1p \|x\|_X^p\Big\} = \sup_{t \ge 0}\,\Big\{\sup_{x \in \mathbb{B}_X}\,\Big\{\langle x^*, tx\rangle_X - \frac1p \|tx\|_X^p\Big\}\Big\}\\ &= \sup_{t \ge 0}\,\Big\{t\,\|x^*\|_{X^*} - \frac1p |t|^p\Big\} = \frac1q \|x^*\|_{X^*}^q \end{aligned}$$

since $\varphi$ is even and the supremum over all $t \in \mathbb{R}$ is thus attained for $t \ge 0$. $\Box$
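Lemma 5.4 can be spot-checked in the scalar case $X = \mathbb{R}$: the grid supremum of $ts - \frac1p|t|^p$ should match $\frac1q|s|^q$ with $q = p/(p-1)$. A numeric sketch (our own, hypothetical names):

```python
# Our own numeric check of the scalar conjugate in Lemma 5.4.
def power_conjugate_on_grid(s: float, p: float, lo=-50.0, hi=50.0, n=100001):
    """Grid approximation of phi*(s) = sup_t { t*s - |t|**p / p }."""
    step = (hi - lo) / (n - 1)
    return max(s * (lo + i * step) - abs(lo + i * step) ** p / p
               for i in range(n))
```

For $p = 3$ and $s = 2$ the maximizer is $t = \operatorname{sign}(s)|s|^{1/(p-1)} = \sqrt{2}$, exactly as used in the proof above, and the grid value agrees with $\frac1q|s|^q$ for $q = \frac32$ up to discretization error.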
As for convex subdifferentials, Fenchel conjugates of integral functionals can be computed pointwise.

Theorem 5.5. Let $f : \mathbb{R} \to \overline{\mathbb{R}}$ be measurable, proper, and lower semicontinuous, and let $F : L^p(\Omega) \to \overline{\mathbb{R}}$ with $1 \le p < \infty$ be defined as in Lemma 3.7. Then we have for $q = \frac{p}{p-1}$ that

$$F^* : L^q(\Omega) \to \overline{\mathbb{R}}, \qquad F^*(u^*) = \int_\Omega f^*(u^*(x))\,dx.$$

Proof. We argue similarly as in the proof of Theorem 4.11, with some changes that are needed since measurability of $f \circ u$ does not immediately imply that of $f^* \circ u^*$. Let $u^* \in L^q(\Omega)$ be arbitrary and consider for all $x \in \Omega$ the functions

$$\varphi(x) := \sup_{t \in \mathbb{R}}\,\{t u^*(x) - f(t)\} = f^*(u^*(x)),$$

as well as for $j \in \mathbb{N}$

$$\varphi_j(x) := \sup_{|t| \le j}\,\{t u^*(x) - f(t)\}.$$
¹Which is how the Fenchel–Young inequality got its name.
Since $u^*$ is measurable, so is $t u^* - f(t)$ for all $t \in \operatorname{dom} f \subset \mathbb{R}$, and hence so is $\varphi_j$ as the pointwise supremum of measurable functions. Furthermore, by assumption there exists a $t_0 \in \operatorname{dom} f$, and hence $u_0 := t_0 u^* - f(t_0)$ is measurable as well and satisfies $\varphi_j(x) \ge u_0(x)$ for all $j \ge |t_0|$. Finally, by construction, $\varphi_j(x)$ is monotonically increasing and converges to $\varphi(x)$ for all $x \in \Omega$. The sequence $\{\varphi_j - u_0\}_{j \in \mathbb{N}}$ of functions is thus measurable and nonnegative, and the monotone convergence theorem yields that

$$\int_\Omega \varphi(x) - u_0(x)\,dx = \int_\Omega \sup_{j \in \mathbb{N}} \varphi_j(x) - u_0(x)\,dx = \sup_{j \in \mathbb{N}} \int_\Omega \varphi_j(x) - u_0(x)\,dx.$$

Hence, the pointwise limit $\varphi = f^* \circ u^*$ is measurable. By a measurable selection theorem ([Ekeland & Témam 1999, Theorem VIII.1.2]), the pointwise supremum in the definition of $\varphi_j$ is thus attained at some $u_j(x)$ and defines a measurable mapping $x \mapsto u_j(x)$ with $\|u_j\|_{L^\infty(\Omega)} \le j$. We thus have

$$\begin{aligned} \int_\Omega f^*(u^*(x))\,dx &= \sup_{j \in \mathbb{N}} \int_\Omega \sup_{|t| \le j}\,\{t u^*(x) - f(t)\}\,dx\\ &= \sup_{j \in \mathbb{N}} \int_\Omega u^*(x) u_j(x) - f(u_j(x))\,dx\\ &\le \sup_{u \in L^p(\Omega)} \int_\Omega u^*(x) u(x) - f(u(x))\,dx = F^*(u^*), \end{aligned}$$

since $u_j \in L^\infty(\Omega) \subset L^p(\Omega)$ for all $j \in \mathbb{N}$.

For the converse inequality, we can now proceed as in the proof of Theorem 4.11. For any $u \in L^p(\Omega)$ and $u^* \in L^q(\Omega)$, we have by the Fenchel–Young inequality (5.1) applied to $f$ and $f^*$ that

$$f(u(x)) + f^*(u^*(x)) \ge u^*(x) u(x) \quad\text{for almost every } x \in \Omega.$$

Since both sides are measurable, this implies that

$$\int_\Omega f^*(u^*(x))\,dx \ge \int_\Omega u^*(x) u(x) - f(u(x))\,dx,$$

and taking the supremum over all $u \in L^p(\Omega)$ yields the claim. $\Box$
Remark 5.6. A similar representation can be shown for vector-valued and spatially-dependent integrands $f : \Omega \times \mathbb{R}^m \to \overline{\mathbb{R}}$ under stronger assumptions; see, e.g., [Rockafellar 1976a, Corollary 3C].

Fenchel conjugates satisfy a number of useful calculus rules, which follow directly from the properties of the supremum.

Lemma 5.7. Let $F : X \to \overline{\mathbb{R}}$ be proper. Then,
(i) $(\alpha F)^* = \alpha F^* \circ (\alpha^{-1}\operatorname{Id})$ for any $\alpha > 0$;

(ii) $(F(\cdot + x_0) + \langle x^*_0, \cdot\rangle_X)^* = F^*(\cdot - x^*_0) - \langle \cdot - x^*_0, x_0\rangle_X$ for all $x_0 \in X$, $x^*_0 \in X^*$;

(iii) $(F \circ K)^* = F^* \circ K^{-*}$ for continuously invertible $K \in \mathbb{L}(Y;X)$ and $K^{-*} := (K^{-1})^*$.

Proof. (i): For any $\alpha > 0$, we have that

$$(\alpha F)^*(x^*) = \sup_{x \in X}\,\{\alpha\langle \alpha^{-1}x^*, x\rangle_X - \alpha F(x)\} = \alpha \sup_{x \in X}\,\{\langle \alpha^{-1}x^*, x\rangle_X - F(x)\} = \alpha F^*(\alpha^{-1}x^*).$$

(ii): Since $\{x + x_0 \mid x \in X\} = X$, we have that

$$\begin{aligned} (F(\cdot + x_0) + \langle x^*_0, \cdot\rangle_X)^*(x^*) &= \sup_{x \in X}\,\{\langle x^*, x\rangle_X - F(x + x_0) - \langle x^*_0, x\rangle_X\}\\ &= \sup_{x \in X}\,\{\langle x^* - x^*_0, x + x_0\rangle_X - F(x + x_0)\} - \langle x^* - x^*_0, x_0\rangle_X\\ &= \sup_{\tilde{x} = x + x_0,\ x \in X}\,\{\langle x^* - x^*_0, \tilde{x}\rangle_X - F(\tilde{x})\} - \langle x^* - x^*_0, x_0\rangle_X\\ &= F^*(x^* - x^*_0) - \langle x^* - x^*_0, x_0\rangle_X. \end{aligned}$$

(iii): Since $X = \operatorname{ran} K$, we have that

$$(F \circ K)^*(y^*) = \sup_{y \in Y}\,\{\langle y^*, K^{-1}Ky\rangle_Y - F(Ky)\} = \sup_{x = Ky,\ y \in Y}\,\{\langle K^{-*}y^*, x\rangle_X - F(x)\} = F^*(K^{-*}y^*). \qquad \Box$$
There are some obvious similarities between the definitions of the Fenchel conjugate and of the subdifferential, which yield the following very useful property that plays the role of a "convex inverse function theorem". (See also Figure 5.1b and compare Figures 4.1a and 4.1b.)

Lemma 5.8 (Fenchel–Young). Let $F : X \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous. Then the following statements are equivalent for any $x \in X$ and $x^* \in X^*$:

(i) $\langle x^*, x\rangle_X = F(x) + F^*(x^*)$;

(ii) $x^* \in \partial F(x)$;

(iii) $x \in \partial F^*(x^*)$.

Proof. If (i) holds, the definition of $F^*$ as a supremum immediately implies that

(5.2) $\quad \langle x^*, x\rangle_X - F(x) = F^*(x^*) \ge \langle x^*, \tilde{x}\rangle_X - F(\tilde{x}) \quad\text{for all } \tilde{x} \in X,$

which again by definition is equivalent to (ii). Conversely, taking the supremum over all $\tilde{x} \in X$ in (5.2) yields

$$\langle x^*, x\rangle_X \ge F(x) + F^*(x^*),$$

which together with the Fenchel–Young inequality (5.1) leads to (i).

Similarly, (i) in combination with Theorem 5.1 yields that for all $\tilde{x}^* \in X^*$,

$$\langle x^*, x\rangle_X - F^*(x^*) = F(x) = F^{**}(x) \ge \langle \tilde{x}^*, x\rangle_X - F^*(\tilde{x}^*),$$

yielding as above the equivalence of (i) and (iii). $\Box$

Remark 5.9. Recall that $\partial F^*(x^*) \subset X^{**}$. Therefore, if $X$ is not reflexive, $x \in \partial F^*(x^*)$ in (iii) has to be understood via the canonical injection $J : X \hookrightarrow X^{**}$ as $Jx \in \partial F^*(x^*)$, i.e., as

$$\langle Jx, \tilde{x}^* - x^*\rangle_{X^*} = \langle \tilde{x}^* - x^*, x\rangle_X \le F^*(\tilde{x}^*) - F^*(x^*) \quad\text{for all } \tilde{x}^* \in X^*.$$

Using (iii) to conclude equality in (i) or, equivalently, the subdifferential inclusion (ii) therefore requires the additional condition that $x \in X \hookrightarrow X^{**}$. Conversely, if (i) or (ii) hold, (iii) also guarantees that the subderivative $x$ is an element of $\partial F^*(x^*) \cap X$, which is a stronger claim.

Similar statements apply to (weakly-$*$ lower semicontinuous) $F : X^* \to \overline{\mathbb{R}}$ and $F_* : X \to \overline{\mathbb{R}}$.
5.2 duality of optimization problems
Lemma 5.8 can be used to replace the subdifferential of a (complicated) norm with that of a (simpler) conjugate indicator functional (or vice versa). For example, given a problem of the form

(5.3) $\inf_{x\in X} F(x) + G(Kx)$

for $F : X \to \overline{\mathbb{R}}$ and $G : Y \to \overline{\mathbb{R}}$ proper, convex, and lower semicontinuous, and $K \in \mathbb{L}(X; Y)$, we can use Theorem 5.1 to replace $G$ with the definition of $G^{**}$ and obtain the saddle-point problem

(5.4) $\inf_{x\in X} \sup_{y^*\in Y^*} F(x) + \langle y^*, Kx\rangle_Y - G^*(y^*)$.
If(!) we were now able to exchange inf and sup, we could write (with $\inf F = -\sup(-F)$)

$\inf_{x\in X} \sup_{y^*\in Y^*} F(x) + \langle y^*, Kx\rangle_Y - G^*(y^*) = \sup_{y^*\in Y^*} \inf_{x\in X} F(x) + \langle y^*, Kx\rangle_Y - G^*(y^*)$

$\qquad = \sup_{y^*\in Y^*} -\left\{\sup_{x\in X} -F(x) + \langle -K^*y^*, x\rangle_X\right\} - G^*(y^*)$.
From the definition of $F^*$, we thus obtain the dual problem

(5.5) $\sup_{y^*\in Y^*} -G^*(y^*) - F^*(-K^*y^*)$.

As a side effect, we have shifted the operator $K$ from $G$ to $F^*$ without having to invert it.
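To see (5.3) and (5.5) side by side, here is a minimal numerical sketch (not from the book) for the quadratic choices $F(x) = \tfrac12\|x - b\|^2$ and $G(y) = \tfrac12\|y\|^2$ on $X = \mathbb{R}^4$, $Y = \mathbb{R}^3$, where both problems reduce to linear systems and the optimal values coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
K = rng.standard_normal((3, 4))   # K in L(X;Y), X = R^4, Y = R^3
b = rng.standard_normal(4)

# F(x) = 1/2||x - b||^2, G(y) = 1/2||y||^2 (both proper, convex, lsc).
# Conjugates: F*(x*) = 1/2||x*||^2 + <x*, b>,  G*(y*) = 1/2||y*||^2.
# Primal (5.3): min_x F(x) + G(Kx); optimality: (I + K^T K) x = b
x = np.linalg.solve(np.eye(4) + K.T @ K, b)
primal = 0.5 * np.sum((x - b)**2) + 0.5 * np.sum((K @ x)**2)

# Dual (5.5): max_y -G*(y) - F*(-K^T y); optimality: (I + K K^T) y = K b
y = np.linalg.solve(np.eye(3) + K @ K.T, K @ b)
dual = -0.5 * np.sum(y**2) - (0.5 * np.sum((K.T @ y)**2) - (K.T @ y) @ b)

print(abs(primal - dual) < 1e-8)  # True: the two optimal values agree
```

Note how the dual problem only requires $K^\top y$, never $K^{-1}$, matching the remark above.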
The following theorem uses in an elegant way the Fermat principle, the sum and chain rules, and the Fenchel–Young equality to derive sufficient conditions for the exchangeability.
Theorem 5.10 (Fenchel–Rockafellar). Let $X$ and $Y$ be Banach spaces, $F : X \to \overline{\mathbb{R}}$ and $G : Y \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous, and $K \in \mathbb{L}(X; Y)$. Assume furthermore that

(i) the primal problem (5.3) admits a solution $\bar{x} \in X$;

(ii) there exists an $x_0 \in \mathrm{dom}(G \circ K) \cap \mathrm{dom}\,F$ with $Kx_0 \in \mathrm{int}(\mathrm{dom}\,G)$.

Then the dual problem (5.5) admits a solution $\bar{y}^* \in Y^*$ and

(5.6) $\min_{x\in X} F(x) + G(Kx) = \max_{y^*\in Y^*} -G^*(y^*) - F^*(-K^*y^*)$.

Furthermore, $\bar{x}$ and $\bar{y}^*$ are solutions to (5.3) and (5.5), respectively, if and only if

(5.7) $\begin{cases} \bar{y}^* \in \partial G(K\bar{x}),\\ -K^*\bar{y}^* \in \partial F(\bar{x}). \end{cases}$
Proof. Let first $\bar{x} \in X$ be a solution to (5.3). By assumption (ii), Theorems 4.14 and 4.17 are applicable; Theorem 4.2 thus implies that

$0 \in \partial(F + G\circ K)(\bar{x}) = K^*\partial G(K\bar{x}) + \partial F(\bar{x})$

and thus the existence of a $\bar{y}^* \in \partial G(K\bar{x})$ with $-K^*\bar{y}^* \in \partial F(\bar{x})$, i.e., satisfying (5.7). Conversely, let (5.7) hold for $\bar{x} \in X$ and $\bar{y}^* \in Y^*$. Then again by Theorems 4.2, 4.14 and 4.17, $\bar{x}$ is a solution to (5.3). Furthermore, (5.7) together with Lemma 5.8 implies equality in the Fenchel–Young inequalities for $F$ and $G$, i.e.,

(5.8) $\begin{cases} \langle \bar{y}^*, K\bar{x}\rangle_Y = G(K\bar{x}) + G^*(\bar{y}^*),\\ \langle -K^*\bar{y}^*, \bar{x}\rangle_X = F(\bar{x}) + F^*(-K^*\bar{y}^*). \end{cases}$

Adding both equations and rearranging now yields

(5.9) $F(\bar{x}) + G(K\bar{x}) = -F^*(-K^*\bar{y}^*) - G^*(\bar{y}^*)$.

It remains to show that $\bar{y}^*$ is a solution to (5.5). For this purpose, we introduce

(5.10) $L : X \times Y^* \to \overline{\mathbb{R}},\quad L(x, y^*) = F(x) + \langle y^*, Kx\rangle_Y - G^*(y^*)$.
For all $x \in X$ and $\tilde{y}^* \in Y^*$, we always have that

(5.11) $\sup_{y^*\in Y^*} L(x, y^*) \ge L(x, \tilde{y}^*) \ge \inf_{\tilde{x}\in X} L(\tilde{x}, \tilde{y}^*)$,

and hence (taking the infimum over all $x$ in the first and the supremum over all $\tilde{y}^*$ in the second inequality) that

(5.12) $\inf_{x\in X} \sup_{y^*\in Y^*} L(x, y^*) \ge \sup_{y^*\in Y^*} \inf_{x\in X} L(x, y^*)$.
We thus obtain that

(5.13) $F(\bar{x}) + G(K\bar{x}) = \inf_{x\in X} \sup_{y^*\in Y^*} F(x) + \langle y^*, Kx\rangle_Y - G^*(y^*)$

$\qquad \ge \sup_{y^*\in Y^*} \inf_{x\in X} F(x) + \langle y^*, Kx\rangle_Y - G^*(y^*)$

$\qquad = \sup_{y^*\in Y^*} -G^*(y^*) - F^*(-K^*y^*)$.

Combining this with (5.9) yields that

$-G^*(\bar{y}^*) - F^*(-K^*\bar{y}^*) = F(\bar{x}) + G(K\bar{x}) \ge \sup_{y^*\in Y^*} -G^*(y^*) - F^*(-K^*y^*)$,
i.e., $\bar{y}^*$ is a solution to (5.5), which in particular shows the claimed existence of a solution.

Since all solutions to (5.5) have by definition the same (maximal) functional value, (5.9) also implies (5.6).

Finally, if $\bar{x} \in X$ and $\bar{y}^* \in Y^*$ are solutions to (5.3) and (5.5), respectively, the just derived strong duality (5.6) conversely implies that (5.9) holds. Together with the productive zero, we obtain from this that

$0 = \left[G(K\bar{x}) + G^*(\bar{y}^*) - \langle \bar{y}^*, K\bar{x}\rangle_Y\right] + \left[F(\bar{x}) + F^*(-K^*\bar{y}^*) - \langle -K^*\bar{y}^*, \bar{x}\rangle_X\right].$

Since both brackets have to be nonnegative due to the Fenchel–Young inequality, they each have to be zero. We therefore deduce that (5.8) holds, and hence Lemma 5.8 implies (5.7). ∎
Remark 5.11. If $X$ is the dual of a separable Banach space $X_*$, it is possible to derive a similar duality result with the (weakly-∗ lower semicontinuous) preconjugate $F_* : X_* \to \overline{\mathbb{R}}$ in place of $F^* : X^* \to \overline{\mathbb{R}}$ under the additional assumption that $\mathrm{ran}\,K^* \subset X_* \subsetneq X^*$ (using Remark 5.9 in (5.8)). If $X_*$ is a "nicer" space than $X^*$ (e.g., for $X = \mathbb{M}(\Omega)$, the space of bounded Radon measures on a domain $\Omega$ with $X_* = C_0(\Omega)$, the space of continuous functions with compact support), the predual problem

$\sup_{y^*\in Y^*} -G^*(y^*) - F_*(-K^*y^*)$

may be easier to treat than the dual problem (5.5). This is the basis of the "preduality trick" used in, e.g., [Hintermüller & Kunisch 2004; Clason & Kunisch 2011].
Remark 5.12. The condition (ii) was only used to guarantee equality in the sum and chain rules Theorems 4.14 and 4.17 applied to $F + G\circ K$. Since these rules hold under the weaker condition of Remark 4.16 (recall that the chain rule was proved by reduction to the sum rule), Theorem 5.10 and Corollary 5.13 hold under this weaker condition as well.
The relations (5.7) are referred to as Fenchel extremality conditions; we can use Lemma 5.8 to generate further, equivalent, optimality conditions by inverting one or the other subdifferential inclusion. We will later exploit this to derive implementable algorithms for solving optimization problems of the form (5.3). Furthermore, Theorem 5.10 characterizes the subderivative $\bar{y}^*$ produced by the sum and chain rules as a solution to a convex minimization problem, which may be useful. For example, if either $F^*$ or $G^*$ is strongly convex, this subderivative will be unique, which has beneficial consequences for the stability and the convergence of algorithms for the computation of solutions to (5.7).
For their analysis, it will sometimes be more convenient to apply the consequences of Theorem 5.10 in the form of the saddle-point problem (5.4). For a general mapping $L : X \times Y^* \to \overline{\mathbb{R}}$, we call $(\bar{x}, \bar{y}^*)$ a saddle point of $L$ if

(5.14) $\sup_{y^*\in Y^*} L(\bar{x}, y^*) \le L(\bar{x}, \bar{y}^*) \le \inf_{x\in X} L(x, \bar{y}^*)$.

(Note that the opposite inequality (5.11) always holds.)
Corollary 5.13. Assume that the conditions of Theorem 5.10 hold. Then there exists a saddle point $(\bar{x}, \bar{y}^*) \in X \times Y^*$ of

$L(x, y^*) \coloneqq F(x) + \langle y^*, Kx\rangle_Y - G^*(y^*)$.

Furthermore, for any $(x, y^*) \in X \times Y^*$,

(5.15) $F(\bar{x}) + \langle y^*, K\bar{x}\rangle_Y - G^*(y^*) \le F(\bar{x}) + \langle \bar{y}^*, K\bar{x}\rangle_Y - G^*(\bar{y}^*) \le F(x) + \langle \bar{y}^*, Kx\rangle_Y - G^*(\bar{y}^*)$.
Proof. Both statements follow from the fact that under the assumptions, the inequality in (5.13) and hence in (5.14) holds as an equality. ∎
With the notation $u = (x, y^*)$, let us define the duality gap

(5.16) $\mathcal{G}(u) \coloneqq F(x) + G(Kx) + G^*(y^*) + F^*(-K^*y^*)$.

By Theorem 5.10, we have $\mathcal{G} \ge 0$ and $\mathcal{G}(u) = 0$ if and only if $u$ is a saddle point.
On the other hand, for any saddle point $\hat{u} = (\hat{x}, \hat{y}^*)$ of a Lagrangian $L : X \times Y^* \to \overline{\mathbb{R}}$, we can also define the Lagrangian duality gap

$\mathcal{G}_L(u; \hat{u}) \coloneqq L(x, \hat{y}^*) - L(\hat{x}, y^*)$.
For $L$ defined in (5.10), we always have by the definition of the convex conjugate that

$0 \le \mathcal{G}_L(u; \hat{u}) \le \mathcal{G}(u)$.

However, $\mathcal{G}_L(u; \hat{u}) = 0$ does not necessarily imply that $u$ is a saddle point. (This is the case if $L$ is strictly convex in $x$ or strictly concave in $y^*$, i.e., if either $F$ or $G^*$ is strictly convex.) Nevertheless, as we will see in later chapters, the Lagrangian duality gap can generally be shown to converge for iterates produced by optimization algorithms, while this is more difficult for the conventional duality gap.
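The conventional duality gap (5.16) is easy to evaluate for the quadratic model problem used earlier in this section. The following sketch (not from the book; $F(x) = \tfrac12\|x-b\|^2$, $G(y) = \tfrac12\|y\|^2$ are illustrative choices) verifies that the gap vanishes at a saddle point and is positive away from it:

```python
import numpy as np

rng = np.random.default_rng(1)
K = rng.standard_normal((3, 4))
b = rng.standard_normal(4)

def duality_gap(x, y):
    """Conventional duality gap (5.16): primal value minus dual value,
    for F(x) = 1/2||x-b||^2 and G(y) = 1/2||y||^2."""
    primal = 0.5 * np.sum((x - b)**2) + 0.5 * np.sum((K @ x)**2)
    Kty = K.T @ y
    dual = -0.5 * np.sum(y**2) - (0.5 * np.sum(Kty**2) - Kty @ b)
    return primal - dual

x_bar = np.linalg.solve(np.eye(4) + K.T @ K, b)  # primal solution
y_bar = K @ x_bar                                # y* in dG(K x_bar)
print(abs(duality_gap(x_bar, y_bar)) < 1e-8)     # True: zero at a saddle point
print(duality_gap(x_bar + 0.1, y_bar) > 0)       # True: positive elsewhere
```

Here $(\bar x, \bar y^*)$ satisfies the extremality conditions (5.7), since $-K^\top \bar y^* = \bar x - b$ is exactly the primal optimality condition.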
6 MONOTONE OPERATORS AND PROXIMAL POINTS
Any minimizer $\bar{x} \in X$ of a convex functional $F : X \to \overline{\mathbb{R}}$ satisfies by Theorem 4.2 the Fermat principle $0 \in \partial F(\bar{x})$. To use this to characterize $\bar{x}$, and, later, to derive implementable algorithms for its iterative computation, we now study the mapping $x \mapsto \partial F(x)$ in more detail.
6.1 basic properties of set-valued mappings
We start with some basic concepts. For two normed vector spaces $X$ and $Y$ we consider a set-valued mapping $A : X \to \mathcal{P}(Y)$, also denoted by $A : X \rightrightarrows Y$, and define

• its domain of definition $\mathrm{dom}\,A = \{x \in X \mid A(x) \ne \emptyset\}$;

• its range $\mathrm{ran}\,A = \bigcup_{x\in X} A(x)$;

• its graph $\mathrm{graph}\,A = \{(x, y) \in X \times Y \mid y \in A(x)\}$;

• its inverse $A^{-1} : Y \rightrightarrows X$ via $A^{-1}(y) = \{x \in X \mid y \in A(x)\}$ for all $y \in Y$.

(Note that $A^{-1}(y) = \emptyset$ is allowed by the definition; hence for set-valued mappings, their inverse and preimage, which always exists, coincide.)

For $A, B : X \rightrightarrows Y$, $C : Y \rightrightarrows Z$, and $\lambda \in \mathbb{R}$ we further define

• $\lambda A : X \rightrightarrows Y$ via $(\lambda A)(x) = \{\lambda y \mid y \in A(x)\}$;

• $A + B : X \rightrightarrows Y$ via $(A + B)(x) = \{y + z \mid y \in A(x), z \in B(x)\}$;

• $C \circ A : X \rightrightarrows Z$ via $(C \circ A)(x) = \{z \mid \text{there is } y \in A(x) \text{ with } z \in C(y)\}$.
Of particular importance not only in the following but also in Part IV is the continuity of set-valued mappings. We first introduce notions of convergence of sets. So let $\{X_n\}_{n\in\mathbb{N}}$ be a sequence of subsets of $X$. We define

(i) the outer limit as the set

$\limsup_{n\to\infty} X_n \coloneqq \left\{x \in X \,\middle|\, \text{there exists a subsequence } \{n_k\}_{k\in\mathbb{N}} \text{ with } x_{n_k} \in X_{n_k} \text{ and } \lim_{k\to\infty} x_{n_k} = x\right\},$
Figure 6.1: Illustration of Example 6.1 with $\limsup_{n\to\infty} X_n = [0, 1]$ while $\liminf_{n\to\infty} X_n = \emptyset$.
(ii) the inner limit as the set

$\liminf_{n\to\infty} X_n \coloneqq \left\{x \in X \,\middle|\, \text{there exist } x_n \in X_n \text{ with } \lim_{n\to\infty} x_n = x\right\}.$

Correspondingly, we define the weak outer limit and the weak inner limit, denoted by $\text{w-}\limsup_{n\to\infty} X_n$ and $\text{w-}\liminf_{n\to\infty} X_n$, respectively, using weakly converging (sub)sequences. Similarly, for a dual space $X^*$, we define the weak-∗ outer limit $\text{w-∗-}\limsup_{n\to\infty} X_n^*$ and the weak-∗ inner limit $\text{w-∗-}\liminf_{n\to\infty} X_n^*$.
The outer limit consists of all points approximable through some subsequence of the sets $X_n$, while the inner limit has to be approximable through every subsequence. The vast difference between inner and outer limits is illustrated by the following extreme example.
Example 6.1. Let $X = \mathbb{R}$ and $\{X_n\}_{n\in\mathbb{N}}$, $X_n \subset [0, 1]$, be given as

$X_n \coloneqq \begin{cases} [0, \tfrac13) & \text{if } n = 3m - 2 \text{ for some } m \in \mathbb{N},\\ [\tfrac13, \tfrac23) & \text{if } n = 3m - 1 \text{ for some } m \in \mathbb{N},\\ [\tfrac23, 1] & \text{if } n = 3m \text{ for some } m \in \mathbb{N}, \end{cases}$

see Figure 6.1. Then

$\limsup_{n\to\infty} X_n = [0, 1],$

since for any $x \in [0, 1]$, we can find a subsequence of $\{X_n\}_{n\in\mathbb{N}}$ (by selecting, e.g., the terms with $n = 3m - 2$ for $m \in \mathbb{N}$ if $x < \tfrac13$) each of whose members contains points arbitrarily close to $x$. On the other hand,

$\liminf_{n\to\infty} X_n = \emptyset,$

since for any $x \in [0, 1]$, there will be a subsequence of $X_n$ (again, selecting only the terms with, e.g., $n = 3m$ for $m \in \mathbb{N}$ if $x < \tfrac13$) that does not contain points arbitrarily close to $x$.
Lemma 6.2. Let $\{X_n\}_{n\in\mathbb{N}}$, $X_n \subset X$. Then $\limsup_{n\to\infty} X_n$ and $\liminf_{n\to\infty} X_n$ are (possibly empty) closed sets.
Proof. Let $X_\infty \coloneqq \limsup_{n\to\infty} X_n$. If $X_\infty$ is empty, there is nothing to prove. So suppose $\{x_k\}_{k\in\mathbb{N}} \subset X_\infty$ converges to some $y \in X$. Then by the definition of $X_\infty$ as an outer limit, there exist infinite subsets $N_k \subset \mathbb{N}$ and subsequences $x_{k,n} \in X_n$ for $n \in N_k$ with $\lim_{N_k\ni n\to\infty} x_{k,n} = x_k$. We can find for each $k \in \mathbb{N}$ an index $n_k \in N_k$ such that $\|x_k - x_{k,n_k}\|_X \le 1/k$. Thus $\|y - x_{k,n_k}\|_X \le \|y - x_k\|_X + 1/k$. Letting $k \to \infty$ we see that $X_{n_k} \ni x_{k,n_k} \to y$. Thus $y \in X_\infty$, that is, $X_\infty$ is (strongly) closed.

Let then $X_\infty \coloneqq \liminf_{n\to\infty} X_n$. If $X_\infty$ is empty, there is nothing to prove. So suppose $\{x_k\}_{k\in\mathbb{N}} \subset X_\infty$ converges to some $y \in X$. Then for each $k \in \mathbb{N}$ there exist $x_{k,n} \in X_n$ with $\lim_{n\to\infty} x_{k,n} = x_k$. We can consequently find for each $k \in \mathbb{N}$ an index $n_k \in \mathbb{N}$ such that $\|x_k - x_{k,n}\|_X < 1/k$ for $n \ge n_k$. Thus for every $n \in \mathbb{N}$ we can find $k_n \in \mathbb{N}$ such that $\|x_{k_n} - x_{k_n,n}\|_X \le 1/k_n$ with $k_n \to \infty$ as $n \to \infty$. Since this implies $\|y - x_{k_n,n}\|_X \le \|y - x_{k_n}\|_X + 1/k_n$, letting $n \to \infty$ we see that $X_n \ni x_{k_n,n} \to y$. Thus $y \in X_\infty$, that is, $X_\infty$ is (strongly) closed. ∎
With these definitions, we can define limits and continuity of set-valued mappings. Specifically, for $A : X \rightrightarrows Y$ and a subset $C \subset X$, we define the inner and outer limits (relative to $C$, if $C \ne X$) as

$\limsup_{C\ni x\to\bar{x}} A(x) \coloneqq \bigcup_{C\ni x_n\to\bar{x}} \limsup_{n\to\infty} A(x_n)$

and

$\liminf_{C\ni x\to\bar{x}} A(x) \coloneqq \bigcap_{C\ni x_n\to\bar{x}} \liminf_{n\to\infty} A(x_n).$

If $C = X$, we drop $C$ from the notations. Analogously, we define weak-to-strong, strong-to-weak, and weak-to-weak limits by replacing $x_n \to \bar{x}$ by $x_n \rightharpoonup \bar{x}$ and/or the outer/inner limit by the weak outer/inner limit.
Corollary 6.3. Let $A : X \rightrightarrows Y$ and $\bar{x} \in X$. Then $\limsup_{x\to\bar{x}} A(x)$ and $\liminf_{x\to\bar{x}} A(x)$ are (possibly empty) closed sets.

Proof. The proof of the closedness of the outer limit is analogous to Lemma 6.2, while the proof of the closedness of the inner limit is a consequence of Lemma 6.2 and of the fact that intersections of closed sets are closed. ∎
Let then $A : X \rightrightarrows Y$ be a set-valued mapping. We say that

(i) $A$ is outer semicontinuous at $\bar{x}$ if $\limsup_{x\to\bar{x}} A(x) \subset A(\bar{x})$;

(ii) $A$ is inner semicontinuous at $\bar{x}$ if $A(\bar{x}) \subset \liminf_{x\to\bar{x}} A(x)$;

(iii) $A$ is outer/inner semicontinuous if it is outer/inner semicontinuous at all $\bar{x} \in X$;
Figure 6.2: Illustration of outer and inner semicontinuity. The black line indicates the parts of the boundary of $\mathrm{graph}\,A$ that belong to the graph. The set-valued mapping $A$ is not outer semicontinuous at $x_1$, because $A(x_1)$ does not include all limits from the right. It is outer semicontinuous at the "discontinuous" point $x_2$, as $A(x_2)$ includes all limits from both sides. The mapping $A$ is not inner semicontinuous at $x_2$, because at this point, $A(x)$ cannot be approximated from both sides. It is inner semicontinuous at every other point $x$, including $x_1$, as at these points $A(x)$ can be approximated from both sides.
(iv) $A$ is continuous (at $\bar{x}$) if it is both inner and outer semicontinuous (at $\bar{x}$);

(v) these properties hold "relative to $C$" when we restrict $x \in C$ for some $C \subset X$.

These concepts are illustrated in Figure 6.2.
Just like lower semicontinuity of functionals, the outer semicontinuity of set-valued mappings can be interpreted as a closedness property and will be crucial. The following lemma is stated for strong-to-strong outer semicontinuity, but corresponding statements hold (with identical proof) for weak-to-strong, strong-to-weak, and weak-to-weak outer semicontinuity as well.

Lemma 6.4. A set-valued mapping $A : X \rightrightarrows Y$ is outer semicontinuous if and only if $\mathrm{graph}\,A \subset X \times Y$ is closed, i.e., $x_n \to x$ and $A(x_n) \ni y_n \to y$ imply that $y \in A(x)$.

Proof. Let $x_n \to x$ and $y_n \in A(x_n)$, and suppose also $y_n \to y$. Then if $\mathrm{graph}\,A$ is closed, $(x, y) \in \mathrm{graph}\,A$ and hence $y \in A(x)$. Since this holds for arbitrary sequences $\{x_n\}_{n\in\mathbb{N}}$, $A$ is outer semicontinuous.

If, on the other hand, $A$ is outer semicontinuous, and $(x_n, y_n) \in \mathrm{graph}\,A$ converge to $(x, y) \in X \times Y$, then $y \in A(x)$ and hence $(x, y) \in \mathrm{graph}\,A$. Since this holds for arbitrary sequences $\{(x_n, y_n)\}_{n\in\mathbb{N}}$, $\mathrm{graph}\,A$ is closed. ∎
6.2 monotone operators
For the codomain $Y = X^*$ (as in the case of $x \mapsto \partial F(x)$), additional properties become important. A set-valued mapping $A : X \rightrightarrows X^*$ is called monotone if $\mathrm{graph}\,A \ne \emptyset$ (to exclude trivial cases) and

(6.1) $\langle x_1^* - x_2^*, x_1 - x_2\rangle_X \ge 0$ for all $(x_1, x_1^*), (x_2, x_2^*) \in \mathrm{graph}\,A.$
If $F : X \to \overline{\mathbb{R}}$ is convex, then $\partial F : X \rightrightarrows X^*$, $x \mapsto \partial F(x)$, is monotone: For any $x_1, x_2 \in X$ with $x_1^* \in \partial F(x_1)$ and $x_2^* \in \partial F(x_2)$, we have by definition that

$\langle x_1^*, x - x_1\rangle_X \le F(x) - F(x_1)$ for all $x \in X$,

$\langle x_2^*, x - x_2\rangle_X \le F(x) - F(x_2)$ for all $x \in X$.

Adding the first inequality for $x = x_2$ and the second for $x = x_1$ and rearranging the result yields (6.1). (This generalizes the well-known fact that if $f : \mathbb{R} \to \mathbb{R}$ is convex and differentiable, $f'$ is monotonically increasing.) Furthermore, if $A, B : X \rightrightarrows X^*$ are monotone and $\lambda \ge 0$, then $\lambda A$ and $A + B$ are monotone as well.
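The monotonicity inequality (6.1) is easy to test on a concrete subdifferential. The following sketch (not from the book) checks it for $F(t) = |t|$, whose subdifferential is the sign mapping from Example 4.7:

```python
import numpy as np

def subdiff_abs(t):
    """Subdifferential of F(t) = |t|: {sign(t)} for t != 0, [-1,1] at 0.
    At 0 we only sample a few elements of the full interval."""
    if t > 0:
        return [1.0]
    if t < 0:
        return [-1.0]
    return [-1.0, 0.0, 1.0]

rng = np.random.default_rng(2)
ok = True
for _ in range(1000):
    t1, t2 = rng.uniform(-1, 1, 2)
    for s1 in subdiff_abs(t1):
        for s2 in subdiff_abs(t2):
            # (6.1): <s1 - s2, t1 - t2> >= 0 for all graph pairs
            ok = ok and (s1 - s2) * (t1 - t2) >= 0
print(ok)  # True
```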
In fact, we will need the following stronger property, which guarantees that $A$ is outer semicontinuous: A monotone operator $A : X \rightrightarrows X^*$ is called maximally monotone if there does not exist another monotone operator $\tilde{A} : X \rightrightarrows X^*$ such that $\mathrm{graph}\,A \subsetneq \mathrm{graph}\,\tilde{A}$. In other words, $A$ is maximally monotone if for any $x \in X$ and $x^* \in X^*$ the condition

(6.2) $\langle x^* - \tilde{x}^*, x - \tilde{x}\rangle_X \ge 0$ for all $(\tilde{x}, \tilde{x}^*) \in \mathrm{graph}\,A$

implies that $x^* \in A(x)$. (In other words, (6.2) holds if and only if $(x, x^*) \in \mathrm{graph}\,A$.) For fixed $x \in X$ and $x^* \in X^*$, the condition claims that if $A$ is monotone, then so is the extension

$\tilde{A} : X \rightrightarrows X^*,\quad \tilde{x} \mapsto \begin{cases} A(\tilde{x}) \cup \{x^*\} & \text{if } \tilde{x} = x,\\ A(\tilde{x}) & \text{if } \tilde{x} \ne x. \end{cases}$

For $A$ to be maximally monotone means that this is not a true extension, i.e., $\tilde{A} = A$. For example, the operator

$A : \mathbb{R} \rightrightarrows \mathbb{R},\quad t \mapsto \begin{cases} \{1\} & \text{if } t > 0,\\ \{0\} & \text{if } t = 0,\\ \{-1\} & \text{if } t < 0, \end{cases}$

is monotone but not maximally monotone, since $\mathrm{graph}\,A$ is a proper subset of the graph of the monotone operator defined by $\tilde{A}(t) = \mathrm{sign}(t) = \partial(|\cdot|)(t)$ from Example 4.7.
Several useful properties follow directly from the definition.

Lemma 6.5. If $A : X \rightrightarrows X^*$ is maximally monotone, then so is $\lambda A$ for all $\lambda > 0$.

Proof. Let $x \in X$ and $x^* \in X^*$, and assume that

$0 \le \langle x^* - \tilde{x}^*, x - \tilde{x}\rangle_X = \lambda\langle \lambda^{-1}x^* - \lambda^{-1}\tilde{x}^*, x - \tilde{x}\rangle_X$ for all $(\tilde{x}, \tilde{x}^*) \in \mathrm{graph}\,\lambda A.$

Since $\tilde{x}^* \in \lambda A(\tilde{x})$ if and only if $\lambda^{-1}\tilde{x}^* \in A(\tilde{x})$ and $A$ is maximally monotone, this implies that $\lambda^{-1}x^* \in A(x)$, i.e., $x^* \in (\lambda A)(x)$. Hence, $\lambda A$ is maximally monotone. ∎
Lemma 6.6. If $A : X \rightrightarrows X^*$ is maximally monotone, then $A(x)$ is convex and closed for all $x \in X$.

Proof. Closedness follows from Lemma 6.8. Assume then that $A(x)$ is not convex, i.e., $x_\lambda^* \coloneqq \lambda x_1^* + (1 - \lambda)x_2^* \notin A(x)$ for some $x_1^*, x_2^* \in A(x)$ and $\lambda \in (0, 1)$. We then show that $A$ is not maximal. To see this, we define $\tilde{A}$ via

$\tilde{A}(y) \coloneqq \begin{cases} A(y) & y \ne x,\\ A(x) \cup \{x_\lambda^*\} & y = x, \end{cases}$

and show that $\tilde{A}$ is monotone. By the definition of $\tilde{A}$, it suffices to show for all $y \in X$ and $y^* \in A(y)$ that

$\langle x_\lambda^* - y^*, x - y\rangle_X \ge 0.$

But this follows directly from the definition of $x_\lambda^*$ and the monotonicity of $A$. ∎
Lemma 6.7. Let $X$ be a reflexive Banach space. If $A : X \rightrightarrows X^*$ is maximally monotone, then so is $A^{-1} : X^* \rightrightarrows X^{**} \simeq X$.

Proof. First, recall that the inverse $A^{-1} : X^* \rightrightarrows X$ always exists as a set-valued mapping and can be identified with a set-valued mapping from $X^*$ to $X^{**}$ with the aid of the canonical injection $J : X \to X^{**}$ from (1.2), i.e.,

$A^{-1}(x^*) \coloneqq \{Jx \in X^{**} \mid x^* \in A(x)\}$ for all $x^* \in X^*.$

From this and the definition (1.2), it is clear that $A^{-1}$ is monotone if and only if $A$ is.

Let now $\bar{x}^* \in X^*$ and $\bar{x}^{**} \in X^{**}$ be given, and assume that

(6.3) $\langle \bar{x}^{**} - x^{**}, \bar{x}^* - x^*\rangle_{X^*} \ge 0$ for all $(x^*, x^{**}) \in \mathrm{graph}\,A^{-1}.$

Since $X$ is reflexive, $J$ is surjective such that there exists an $\bar{x} \in X$ with $\bar{x}^{**} = J\bar{x}$. Similarly, we can write $x^{**} = Jx$ for some $x \in X$ with $x^* \in A(x)$. By definition of the duality pairing, (6.3) is thus equivalent to

$\langle \bar{x}^* - x^*, \bar{x} - x\rangle_X \ge 0$

for all $x \in X$ and $x^* \in A(x)$. But since $A$ is maximally monotone, this implies that $\bar{x}^* \in A(\bar{x})$ and hence $\bar{x}^{**} = J\bar{x} \in A^{-1}(\bar{x}^*)$. ∎
We now come to the outer semicontinuity.
Lemma 6.8. Let $A : X \rightrightarrows X^*$ be maximally monotone. Then $A$ is both weak-to-strong and strong-to-weak-∗ outer semicontinuous.
Proof. Let $x \in X$ and $x^* \in X^*$ and consider sequences $\{x_n\}_{n\in\mathbb{N}} \subset X$ with $x_n \rightharpoonup x$ and $\{x_n^*\}_{n\in\mathbb{N}} \subset X^*$ with $x_n^* \in A(x_n)$ and $x_n^* \to x^*$ (or $x_n \to x$ and $x_n^* \rightharpoonup^* x^*$). For arbitrary $\tilde{x} \in X$ and $\tilde{x}^* \in A(\tilde{x})$, the monotonicity of $A$ implies that

$0 \le \langle x_n^* - \tilde{x}^*, x_n - \tilde{x}\rangle_X \to \langle x^* - \tilde{x}^*, x - \tilde{x}\rangle_X,$

since the duality pairing of strongly and weakly (or weakly-∗ and strongly) converging sequences is convergent. Since $A$ is maximally monotone, we obtain that $x^* \in A(x)$ and hence $A$ is weak-to-strong (or strong-to-weak-∗) outer semicontinuous by Lemma 6.4. ∎
Since the pairing of weakly and weakly-∗ convergent sequences does not converge in general, weak-to-weak-∗ outer semicontinuity requires additional assumptions on the two sequences. Although we will not need to make use of it, the following notion can prove useful in other contexts. We call a set-valued mapping $A : X \rightrightarrows X^*$ BCP outer semicontinuous (for Brezis–Crandall–Pazy) if for any sequences $\{x_n\}_{n\in\mathbb{N}} \subset X$ and $\{x_n^*\}_{n\in\mathbb{N}} \subset X^*$ with

(i) $x_n \rightharpoonup x$ and $A(x_n) \ni x_n^* \rightharpoonup^* x^*$,

(ii) $\limsup_{n\to\infty} \langle x_n^* - x^*, x_n - x\rangle_X \le 0$,

we have $x^* \in A(x)$. The following result from [Brezis, Crandall & Pazy 1970, Lemma 1.2] (hence the name) shows that maximally monotone operators are BCP outer semicontinuous.
Lemma 6.9. Let $X$ be a Banach space and let $A : X \rightrightarrows X^*$ be maximally monotone. Then $A$ is BCP outer semicontinuous.

Proof. First, the monotonicity of $A$ and assumption (ii) imply that

(6.4) $0 \le \liminf_{n\to\infty} \langle x_n^* - x^*, x_n - x\rangle_X \le \limsup_{n\to\infty} \langle x_n^* - x^*, x_n - x\rangle_X \le 0.$

Furthermore, from assumption (i) and the fact that $X$ is a Banach space, it follows that $\{x_n\}_{n\in\mathbb{N}}$ and $\{x_n^*\}_{n\in\mathbb{N}}$ and hence also $\{\langle x_n^*, x_n\rangle_X\}_{n\in\mathbb{N}}$ are bounded. Thus there exists a subsequence such that $\langle x_{n_k}^*, x_{n_k}\rangle_X \to L$ for some $L \in \mathbb{R}$. Passing to the limit and using (6.4), we obtain that

$0 = \lim_{k\to\infty} \langle x_{n_k}^* - x^*, x_{n_k} - x\rangle_X = \lim_{k\to\infty} \langle x_{n_k}^*, x_{n_k}\rangle_X - \lim_{k\to\infty} \langle x_{n_k}^*, x\rangle_X - \lim_{k\to\infty} \langle x^*, x_{n_k}\rangle_X + \langle x^*, x\rangle_X = L - \langle x^*, x\rangle_X.$

Since the limit does not depend on the subsequence, we have that $\langle x_n^*, x_n\rangle_X \to \langle x^*, x\rangle_X$.
Let now $\tilde{x} \in X$ and $\tilde{x}^* \in A(\tilde{x})$ be arbitrary. Using again the monotonicity of $A$ and assumption (i) together with the first claim yields

$0 \le \liminf_{n\to\infty} \langle x_n^* - \tilde{x}^*, x_n - \tilde{x}\rangle_X \le \lim_{n\to\infty} \langle x_n^*, x_n\rangle_X - \lim_{n\to\infty} \langle x_n^*, \tilde{x}\rangle_X - \lim_{n\to\infty} \langle \tilde{x}^*, x_n\rangle_X + \langle \tilde{x}^*, \tilde{x}\rangle_X = \langle x^* - \tilde{x}^*, x - \tilde{x}\rangle_X$

and hence that $x^* \in A(x)$ by the maximal monotonicity of $A$. ∎
The usefulness of BCP outer semicontinuity arises from the fact that it also implies weak-to-strong outer semicontinuity under slightly weaker conditions on $A$.

Lemma 6.10. Suppose $A : X \rightrightarrows X^*$ is monotone (but not necessarily maximally monotone) and BCP outer semicontinuous. Then $A$ is also weak-to-strong outer semicontinuous.

Proof. Let $x_n \rightharpoonup x$ and $x_n^* \to x^*$ with $x_n^* \in A(x_n)$ for all $n \in \mathbb{N}$. This implies that $x_n^* \rightharpoonup^* x^*$ as well and that $\{x_n\}_{n\in\mathbb{N}}$ is bounded. We thus have for some $C > 0$ that

$\limsup_{n\to\infty} \langle x_n^* - x^*, x_n - x\rangle_X \le C \limsup_{n\to\infty} \|x_n^* - x^*\|_{X^*} = 0.$

Hence, condition (ii) is satisfied, and the BCP outer semicontinuity yields $x^* \in A(x)$. ∎
We now show that convex subdifferentials are maximally monotone. Although this result (known as Rockafellar's Theorem, see [Rockafellar 1970]) holds in arbitrary Banach spaces, the proof (adapted from [Simons 2009]) greatly simplifies in reflexive Banach spaces.

Theorem 6.11. Let $X$ be a reflexive Banach space and $F : X \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous. Then $\partial F : X \rightrightarrows X^*$ is maximally monotone.

Proof. First, we already know that $\partial F$ is monotone. Let now $\bar{x} \in X$ and $\bar{x}^* \in X^*$ be given such that

(6.5) $\langle \bar{x}^* - x^*, \bar{x} - x\rangle_X \ge 0$ for all $x \in X$, $x^* \in \partial F(x).$

We consider

$J : X \to \overline{\mathbb{R}},\quad z \mapsto F(z + \bar{x}) - \langle \bar{x}^*, z\rangle_X + \tfrac12\|z\|_X^2,$

which is proper, convex, and lower semicontinuous by the assumptions on $F$. Furthermore, $J$ is coercive by Lemma 3.9. Theorem 3.8 thus yields a $\bar{z} \in X$ with $J(\bar{z}) = \min_{z\in X} J(z)$. By Theorems 4.2, 4.5 and 4.14 and Lemma 4.13 (ii) then

(6.6) $0 \in \partial F(\bar{z} + \bar{x}) - \{\bar{x}^*\} + \partial\varphi(\bar{z}),$
where we have introduced $\varphi(z) \coloneqq \tfrac12\|z\|_X^2$. In other words, there exists a $\bar{z}^* \in \partial\varphi(\bar{z})$ such that $\bar{x}^* - \bar{z}^* \in \partial F(\bar{z} + \bar{x})$. Combining Lemma 5.4 for $p = q = 2$ and Lemma 5.8, we furthermore have that $\bar{z}^* \in \partial\varphi(\bar{z})$ if and only if

(6.7) $\langle \bar{z}^*, \bar{z}\rangle_X = \tfrac12\|\bar{z}\|_X^2 + \tfrac12\|\bar{z}^*\|_{X^*}^2.$

Applying now (6.5) for $x = \bar{z} + \bar{x}$ and $x^* = \bar{x}^* - \bar{z}^* \in \partial F(x)$, we obtain using (6.7) that

$0 \le \langle \bar{x}^* - \bar{x}^* + \bar{z}^*, \bar{x} - \bar{z} - \bar{x}\rangle_X = -\langle \bar{z}^*, \bar{z}\rangle_X = -\tfrac12\|\bar{z}^*\|_{X^*}^2 - \tfrac12\|\bar{z}\|_X^2,$

implying that both $\bar{z} = 0$ and $\bar{z}^* = 0$. Hence by (6.6) we conclude that $\bar{x}^* \in \partial F(\bar{x})$, which shows that $\partial F$ is maximally monotone. ∎
The argument in the preceding proof can be modified to give a characterization of maximal monotonicity for general monotone operators; this is known as Minty's Theorem and is a central result in the theory of monotone operators. We again make use of the duality mapping $\partial\varphi : X \rightrightarrows X^*$ for $\varphi(x) = \tfrac12\|x\|_X^2$, noting for later use that if $X$ is a Hilbert space (and we identify $X^*$ with $X$), then $\partial\varphi = \mathrm{Id}$.
Theorem 6.12 (Minty). Let $X$ be a reflexive Banach space and $A : X \rightrightarrows X^*$ be monotone. If $A$ is maximally monotone, then $\partial\varphi + A$ is surjective.

Proof. We proceed similarly as in the proof of Theorem 6.11 by constructing a functional $F_A$ which plays the same role for $A$ as $F$ does for $\partial F$. Specifically, we define for a maximally monotone operator $A : X \rightrightarrows X^*$ the Fitzpatrick functional

(6.8) $F_A : X \times X^* \to [-\infty, \infty],\quad (x, x^*) \mapsto \sup_{(z, z^*)\in\mathrm{graph}\,A} \left(\langle x^*, z\rangle_X + \langle z^*, x\rangle_X - \langle z^*, z\rangle_X\right),$

which can be written equivalently as

(6.9) $F_A(x, x^*) = \langle x^*, x\rangle_X - \inf_{(z, z^*)\in\mathrm{graph}\,A} \langle x^* - z^*, x - z\rangle_X.$

Each characterization implies useful properties.

(i) By maximal monotonicity of $A$, we have by definition that $\langle x^* - z^*, x - z\rangle_X \ge 0$ for all $(z, z^*) \in \mathrm{graph}\,A$ if and only if $(x, x^*) \in \mathrm{graph}\,A$; in particular, $\langle x^* - z^*, x - z\rangle_X < 0$ for some $(z, z^*) \in \mathrm{graph}\,A$ if $(x, x^*) \notin \mathrm{graph}\,A$. Hence, (6.9) implies that $F_A(x, x^*) \ge \langle x^*, x\rangle_X$, with equality if and only if $(x, x^*) \in \mathrm{graph}\,A$ (since in this case the infimum is attained in $(z, z^*) = (x, x^*)$). In particular, $F_A$ is proper.
(ii) On the other hand, the definition (6.8) yields that

$F_A = (G_A)^*$ for $G_A(z^*, z) = \langle z^*, z\rangle_X + \delta_{\mathrm{graph}\,A^{-1}}(z^*, z)$

(since $(z, z^*) \in \mathrm{graph}\,A$ if and only if $(z^*, z) \in \mathrm{graph}\,A^{-1}$). As part of the monotonicity of $A$, we have required that $\mathrm{graph}\,A \ne \emptyset$; hence $F_A$ is the Fenchel conjugate of a proper functional and therefore convex and lower semicontinuous.
As a first step, we now show the result for the special case $z^* = 0$, i.e., that $0 \in \mathrm{ran}(\partial\varphi + A)$. We now set $\Xi \coloneqq X \times X^*$ as well as $\xi \coloneqq (x, x^*) \in \Xi$ and consider the functional

$J_A : \Xi \to \overline{\mathbb{R}},\quad \xi \mapsto F_A(\xi) + \tfrac12\|\xi\|_\Xi^2.$

We first note that property (i) implies for all $\xi \in \Xi$ that

(6.10) $J_A(\xi) = F_A(\xi) + \tfrac12\|\xi\|_\Xi^2 = F_A(x, x^*) + \tfrac12\|x\|_X^2 + \tfrac12\|x^*\|_{X^*}^2 \ge \langle x^*, x\rangle_X + \tfrac12\|x\|_X^2 + \tfrac12\|x^*\|_{X^*}^2 \ge 0,$
where the last inequality follows from the Fenchel–Young inequality for $\varphi$ applied to $(x, -x^*)$. Furthermore, $J_A$ is proper, convex, lower semicontinuous, and (by Lemma 3.9) coercive. Theorem 3.8 thus yields a $\bar{\xi} \coloneqq (\bar{x}, \bar{x}^*) \in \Xi$ with $J_A(\bar{\xi}) = \min_{\xi\in\Xi} J_A(\xi)$, which by Theorems 4.2, 4.5 and 4.14 satisfies

$0 \in \partial J_A(\bar{\xi}) = \partial\left(\tfrac12\|\bar{\xi}\|_\Xi^2\right) + \partial F_A(\bar{\xi}),$

i.e., there exists a $\bar{\xi}^* = (\hat{x}^*, \hat{x}) \in \Xi^* \simeq X^* \times X$ (since $X$ is reflexive) such that $\bar{\xi}^* \in \partial F_A(\bar{\xi})$ and $-\bar{\xi}^* \in \partial(\tfrac12\|\bar{\xi}\|_\Xi^2)$. By definition of the subdifferential, we thus have for all $\xi \in \Xi$ that

$F_A(\xi) \ge F_A(\bar{\xi}) + \langle \bar{\xi}^*, \xi - \bar{\xi}\rangle_\Xi = J_A(\bar{\xi}) + \tfrac12\|\bar{\xi}^*\|_{\Xi^*}^2 + \langle \bar{\xi}^*, \xi\rangle_\Xi \ge \tfrac12\|\bar{\xi}^*\|_{\Xi^*}^2 + \langle \bar{\xi}^*, \xi\rangle_\Xi,$

where the second step uses again the Fenchel–Young inequality, which holds with equality for $(\bar{\xi}, -\bar{\xi}^*)$, and the last step follows from (6.10). Property (i) then implies for all $(x, x^*) \in \mathrm{graph}\,A$ that

$\langle x^*, x\rangle_X = F_A(x, x^*) \ge \tfrac12\|\hat{x}^*\|_{X^*}^2 + \tfrac12\|\hat{x}\|_X^2 + \langle \hat{x}^*, x\rangle_X + \langle x^*, \hat{x}\rangle_X.$

Adding $\langle \hat{x}^*, \hat{x}\rangle_X$ on both sides and rearranging yields

(6.11) $\langle x^* - \hat{x}^*, x - \hat{x}\rangle_X \ge \langle \hat{x}^*, \hat{x}\rangle_X + \tfrac12\|\hat{x}^*\|_{X^*}^2 + \tfrac12\|\hat{x}\|_X^2 \ge 0,$
again by the Fenchel–Young inequality. The maximal monotonicity of $A$ thus yields that $\hat{x}^* \in A(\hat{x})$, i.e., $(\hat{x}, \hat{x}^*) \in \mathrm{graph}\,A$. Inserting this for $(x, x^*)$ in (6.11) then shows that

$\langle \hat{x}^*, \hat{x}\rangle_X + \tfrac12\|\hat{x}^*\|_{X^*}^2 + \tfrac12\|\hat{x}\|_X^2 = 0.$

Hence the Fenchel–Young inequality for $\varphi$ holds with equality at $(\hat{x}, -\hat{x}^*)$, implying $-\hat{x}^* \in \partial\varphi(\hat{x})$. Together, we obtain that $0 = -\hat{x}^* + \hat{x}^* \in (\partial\varphi + A)(\hat{x})$.

Finally, let $z^* \in X^*$ be arbitrary and set $B : X \rightrightarrows X^*$, $x \mapsto \{-z^*\} + A(x)$. Using the definition, it is straightforward to verify that $B$ is maximally monotone as well. As we have just shown, there now exists a $\tilde{x} \in X$ with $0 \in (\partial\varphi + B)(\tilde{x}) = \partial\varphi(\tilde{x}) + \{-z^*\} + A(\tilde{x})$, i.e., $z^* \in (\partial\varphi + A)(\tilde{x})$. Hence $\partial\varphi + A$ is surjective. ∎
6.3 resolvents and proximal points
The proof of Theorem 6.11 is based on associating to any $\bar{x}^* \in \partial F(\bar{x})$ an element $\bar{z} \in X$ as the minimizer of a suitable functional. If $X$ is a Hilbert space, this functional is even strictly convex and hence the minimizer $\bar{z}$ is unique. This property can be exploited to define a new single-valued mapping that is more useful for algorithms than the set-valued subdifferential mapping. For this purpose, we restrict the discussion in the remainder of this chapter to Hilbert spaces (but see Remark 6.26 below). This allows identifying $X^*$ with $X$; in particular, we will from now on identify the set $\partial F(x) \subset X^*$ of subderivatives with the corresponding set in $X$ of subgradients (i.e., their Riesz representations). By the same token, we will also use the same notation for inner products as for duality pairings to avoid the danger of confusing pairs of elements $(x, x^*) \in \mathrm{graph}\,\partial F$ with their inner product.
We can then define for a maximally monotone operator $A : X \rightrightarrows X$ the resolvent

$\mathcal{R}_A : X \rightrightarrows X,\quad \mathcal{R}_A(x) = (\mathrm{Id} + A)^{-1}x,$

as well as for a proper, convex, and lower semicontinuous functional $F : X \to \overline{\mathbb{R}}$ the proximal point mapping

(6.12) $\mathrm{prox}_F : X \to X,\quad \mathrm{prox}_F(x) = \arg\min_{z\in X} \tfrac12\|z - x\|_X^2 + F(z).$

Since a similar argument as in the proof of Theorem 6.11 shows that $w \in \mathcal{R}_{\partial F}(x)$ is equivalent to the necessary and sufficient conditions for the proximal point $w$ to be a minimizer of the strictly convex functional in (6.12), we have that

(6.13) $\mathrm{prox}_F = (\mathrm{Id} + \partial F)^{-1} = \mathcal{R}_{\partial F}.$
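The identity (6.13) can be checked numerically in one dimension. The following sketch (not from the book) uses $F = \gamma|\cdot|$, whose proximal point mapping is the classical soft-thresholding formula, and compares it against the defining minimization problem (6.12) on a fine grid:

```python
import numpy as np

def prox_abs(x, gamma):
    """prox_{gamma|.|}(x): soft thresholding, the resolvent of gamma * sign."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

# Brute-force minimization of z -> 1/2(z - x)^2 + gamma|z| over a grid
z = np.linspace(-3.0, 3.0, 600001)
gamma, x = 0.7, 1.5
z_star = z[np.argmin(0.5 * (z - x)**2 + gamma * np.abs(z))]
print(abs(z_star - prox_abs(x, gamma)) < 1e-4)  # True: both are ~0.8
```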
Resolvents of monotone and, in particular, maximally monotone operators have useful properties.
Lemma 6.13. If $A : X \rightrightarrows X$ is monotone, $\mathcal{R}_A$ is firmly nonexpansive, i.e.,

(6.14) $\|z_1 - z_2\|_X^2 \le \langle x_1 - x_2, z_1 - z_2\rangle_X$ for all $(x_1, z_1), (x_2, z_2) \in \mathrm{graph}\,\mathcal{R}_A,$

or equivalently,

(6.15) $\|z_1 - z_2\|_X^2 + \|(x_1 - z_1) - (x_2 - z_2)\|_X^2 \le \|x_1 - x_2\|_X^2$ for all $(x_1, z_1), (x_2, z_2) \in \mathrm{graph}\,\mathcal{R}_A.$

Proof. Let $x_1, x_2 \in \mathrm{dom}\,\mathcal{R}_A$ as well as $z_1 \in \mathcal{R}_A(x_1)$ and $z_2 \in \mathcal{R}_A(x_2)$. By definition of the resolvent, this implies that $x_1 - z_1 \in A(z_1)$ and $x_2 - z_2 \in A(z_2)$. By the monotonicity of $A$, we thus have

$0 \le \langle (x_1 - z_1) - (x_2 - z_2), z_1 - z_2\rangle_X,$

which after rearranging yields (6.14). The equivalence of (6.14) and (6.15) is straightforward to verify using binomial expansion. ∎
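Both forms of firm nonexpansiveness can be verified on a concrete resolvent. A small sketch (not from the book), using the soft-thresholding operator as the resolvent of $\partial(\gamma|\cdot|)$:

```python
import numpy as np

def prox_abs(x, gamma=1.0):   # resolvent of the monotone operator d(gamma|.|)
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

rng = np.random.default_rng(3)
ok_614 = ok_615 = True
for _ in range(1000):
    x1, x2 = rng.uniform(-5, 5, 2)
    z1, z2 = prox_abs(x1), prox_abs(x2)
    # (6.14): ||z1 - z2||^2 <= <x1 - x2, z1 - z2>
    ok_614 &= (z1 - z2)**2 <= (x1 - x2) * (z1 - z2) + 1e-12
    # (6.15): ||z1 - z2||^2 + ||(x1-z1) - (x2-z2)||^2 <= ||x1 - x2||^2
    ok_615 &= (z1 - z2)**2 + ((x1 - z1) - (x2 - z2))**2 <= (x1 - x2)**2 + 1e-12
print(bool(ok_614), bool(ok_615))  # True True
```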
Corollary 6.14. Let $A : X \rightrightarrows X$ be maximally monotone. Then $\mathcal{R}_A : X \to X$ is single-valued and Lipschitz continuous with constant $L = 1$.

Proof. Since $A$ is maximally monotone, $\mathrm{Id} + A$ is surjective by Theorem 6.12, which implies that $\mathrm{dom}\,\mathcal{R}_A = X$. Let now $x \in X$ and $z_1, z_2 \in \mathcal{R}_A(x)$. Since $A$ is monotone, $\mathcal{R}_A$ is firmly nonexpansive by Lemma 6.13, which yields both the single-valuedness of $\mathcal{R}_A$ (taking $x_1 = x_2 = x$ in (6.14) implies $z_1 = z_2$) and its Lipschitz continuity (by applying the Cauchy–Schwarz inequality). ∎

In particular, by Theorem 6.11, this holds for the proximal point mapping $\mathrm{prox}_F : X \to X$ of a proper, convex, and lower semicontinuous functional $F : X \to \overline{\mathbb{R}}$.
Lipschitz continuous mappings with constant $L = 1$ are also called nonexpansive. A related concept that is sometimes used is the following. A mapping $T : X \to X$ is called $\alpha$-averaged for some $\alpha \in (0, 1)$ if $T = (1 - \alpha)\mathrm{Id} + \alpha J$ for some nonexpansive $J : X \to X$. We then have the following relation.
Lemma 6.15. Let $T : X \to X$. Then $T$ is firmly nonexpansive if and only if $T$ is $\tfrac12$-averaged.

Proof. Suppose $T$ is $\tfrac12$-averaged. Then $T = \tfrac12(\mathrm{Id} + J)$ for some nonexpansive $J$. We compute

$\|T(x) - T(y)\|_X^2 = \tfrac14\left(\|J(x) - J(y)\|_X^2 + 2\langle J(x) - J(y), x - y\rangle_X + \|x - y\|_X^2\right)$

$\qquad \le \tfrac12\left(\langle J(x) - J(y), x - y\rangle_X + \|x - y\|_X^2\right) = \langle T(x) - T(y), x - y\rangle_X.$

Thus $T$ is firmly nonexpansive.

Suppose then that $T$ is firmly nonexpansive. If we show that $J \coloneqq 2T - \mathrm{Id}$ is nonexpansive, it follows that $T$ is $\tfrac12$-averaged. This is established by the simple calculation

$\|J(x) - J(y)\|_X^2 = 4\|T(x) - T(y)\|_X^2 - 4\langle T(x) - T(y), x - y\rangle_X + \|x - y\|_X^2 \le \|x - y\|_X^2.$

This completes the proof. ∎
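Lemma 6.15 can also be observed numerically: if $T$ is firmly nonexpansive, then $J = 2T - \mathrm{Id}$ should be nonexpansive. A sketch (not from the book), again using soft thresholding as a firmly nonexpansive $T$:

```python
import numpy as np

def T(x, gamma=1.0):   # prox of gamma|.|: firmly nonexpansive by Lemma 6.13
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def J(x):              # J = 2T - Id should then be nonexpansive
    return 2.0 * T(x) - x

rng = np.random.default_rng(4)
x, y = rng.uniform(-5, 5, (2, 1000))
ratios = np.abs(J(x) - J(y)) / np.abs(x - y)
print(ratios.max() <= 1.0 + 1e-12)  # True: J is nonexpansive, so T is 1/2-averaged
```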
Like maximally monotone operators, $\alpha$-averaged operators always have outer semicontinuity properties. To show this, we will use that in Hilbert spaces, the converse of Minty's Theorem 6.12 holds (with the duality mapping $\partial\varphi = \mathrm{Id}$).

Lemma 6.16. Let $A : X \rightrightarrows X$ be monotone. If $\mathrm{Id} + A$ is surjective, then $A$ is maximally monotone.

Proof. Consider $\bar{x} \in X$ and $\bar{x}^* \in X$ with

(6.16) $\langle \bar{x}^* - x^*, \bar{x} - x\rangle_X \ge 0$ for all $(x, x^*) \in \mathrm{graph}\,A.$

If $\mathrm{Id} + A$ is surjective, then for $\bar{x} + \bar{x}^* \in X$ there exist a $z \in X$ and a $z^* \in A(z)$ with

(6.17) $\bar{x} + \bar{x}^* = z + z^* \in (\mathrm{Id} + A)z.$

Inserting $(x, x^*) = (z, z^*)$ into (6.16) then yields that

$0 \le \langle \bar{x}^* - z^*, \bar{x} - z\rangle_X = \langle z - \bar{x}, \bar{x} - z\rangle_X = -\|\bar{x} - z\|_X^2 \le 0,$

i.e., $\bar{x} = z$. From (6.17) we further obtain $\bar{x}^* = z^* \in A(z) = A(\bar{x})$, and hence $A$ is maximally monotone. ∎
Lemma 6.17. Let๐ : ๐ โ ๐ be ๐ผ-averaged. Then๐ is weak-to-strong and strong-to-weakly-โouter semicontinuous, and the set of xed points ๐ฅ = ๐ (๐ฅ) of ๐ is convex and closed.
Proof. Let ๐ = (1 โ ๐ผ)Id + ๐ผ ๐ฝ for some nonexpansive operator ๐ฝ : ๐ โ ๐ . Then clearly๐ฅ โ ๐ is a xed point of ๐ if and only if ๐ฅ is a xed point of ๐ฝ . It thus suces to show theclaim for the xed-point set {๐ฅ | ๐ฅ = ๐ฝ (๐ฅ)} = (Id โ ๐ฝ )โ1(0) of a nonexpansive operator ๐ฝ .By Lemmas 6.6 to 6.8, we thus only need to show that Id โ ๐ฝ is maximally monotone.
First, Idโ ๐ฝ is clearly monotone. Moreover, 2Idโ ๐ฝ = Id+(Idโ ๐ฝ ) is surjective since otherwise2๐ฅ โ ๐ฝ (๐ฅ) = 2๐ฆ โ ๐ฝ (๐ฆ) for ๐ฅ โ ๐ฆ , which together with the assumed nonexpansivity wouldlead to the contradiction 0 โ 2โ๐ฅ โ ๐ฆ โ โค โ๐ฅ โ ๐ฆ โ. Lemma 6.16 then shows that Id โ ๐ฝ ismaximally monotone, and the claim follows. ๏ฟฝ
The following useful result allows characterizing minimizers of convex functionals as proximal points.

Lemma 6.18. Let $F: X \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous, and let $x, x^* \in X$. Then for any $\gamma > 0$,
\[ x^* \in \partial F(x) \quad\Leftrightarrow\quad x = \mathrm{prox}_{\gamma F}(x + \gamma x^*). \]

Proof. Multiplying both sides of the subdifferential inclusion by $\gamma > 0$ and adding $x$ yields that
\[
x^* \in \partial F(x)
\Leftrightarrow x + \gamma x^* \in (\mathrm{Id} + \gamma\partial F)(x)
\Leftrightarrow x \in (\mathrm{Id} + \gamma\partial F)^{-1}(x + \gamma x^*)
\Leftrightarrow x = \mathrm{prox}_{\gamma F}(x + \gamma x^*),
\]
where in the last step we have used that $\gamma\partial F = \partial(\gamma F)$ by Lemma 4.13 (i) and hence that $\mathrm{prox}_{\gamma F} = R_{\partial(\gamma F)} = R_{\gamma\partial F}$. □
By applying Lemma 6.18 to the Fermat principle $0 \in \partial F(\bar x)$, we obtain the following fixed-point characterization of minimizers of $F$.

Corollary 6.19. Let $F: X \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous, and let $\gamma > 0$ be arbitrary. Then $\bar x \in \operatorname{dom} F$ is a minimizer of $F$ if and only if
\[ \bar x = \mathrm{prox}_{\gamma F}(\bar x). \]

This simple result should not be underestimated: it allows replacing (explicit) set inclusions in optimality conditions by equivalent (implicit) Lipschitz continuous equations, which (as we will show in the following chapters) can be solved by fixed-point iteration or Newton-type methods.
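As a minimal numerical illustration (not part of the text), one can iterate $x^{k+1} = \mathrm{prox}_{\gamma F}(x^k)$ and observe convergence to the fixed point, i.e., the minimizer. For the hypothetical choice $F(x) = \frac12(x - 3)^2$, the proximal mapping has the closed form $\mathrm{prox}_{\gamma F}(x) = (x + 3\gamma)/(1 + \gamma)$, obtained by setting the derivative of $\frac12(w - x)^2 + \frac{\gamma}{2}(w - 3)^2$ to zero:

```python
def prox(x, gamma):
    # Proximal map of F(w) = 0.5*(w - 3)**2: minimize 0.5*(w - x)**2 + gamma*0.5*(w - 3)**2
    return (x + 3.0 * gamma) / (1.0 + gamma)

# Fixed-point iteration x^{k+1} = prox_{gamma F}(x^k); the minimizer 3 is the fixed point.
x, gamma = 0.0, 1.0
for _ in range(60):
    x = prox(x, gamma)
print(abs(x - 3.0) < 1e-8)
```

With $\gamma = 1$ each step halves the distance to the minimizer, so the iteration converges linearly here; this is a toy instance of the proximal point method discussed in Chapter 8.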
We can also derive a generalization of the orthogonal decomposition of vector spaces.

Theorem 6.20 (Moreau decomposition). Let $F: X \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous. Then we have for all $x \in X$ that
\[ x = \mathrm{prox}_F(x) + \mathrm{prox}_{F^*}(x). \]

Proof. Setting $w = \mathrm{prox}_F(x)$, Lemmas 5.8 and 6.18 for $\gamma = 1$ imply that
\[
w = \mathrm{prox}_F(x) = \mathrm{prox}_F(w + (x - w))
\Leftrightarrow x - w \in \partial F(w)
\Leftrightarrow w \in \partial F^*(x - w)
\Leftrightarrow x - w = \mathrm{prox}_{F^*}((x - w) + w) = \mathrm{prox}_{F^*}(x). \qquad □
\]
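A quick numerical sanity check (an illustration, not part of the text): for $F = |\cdot|$ on $\mathbb{R}$ we have $F^* = \delta_{[-1,1]}$, so by Theorem 6.20 the soft-shrinkage of $x$ (the proximal mapping of the absolute value with $\gamma = 1$, see Example 6.22 (ii)) and the projection of $x$ onto $[-1, 1]$ must sum to $x$:

```python
def soft(x):
    # prox of F(t) = |t| with gamma = 1: soft-shrinkage
    return (abs(x) - 1.0) * (1.0 if x > 0 else -1.0) if abs(x) > 1.0 else 0.0

def proj(x):
    # prox of F*(t) = delta_[-1,1](t): projection onto [-1, 1]
    return min(1.0, max(-1.0, x))

# Moreau decomposition: x = prox_F(x) + prox_{F*}(x)
for x in [-3.0, -0.4, 0.0, 0.7, 2.5]:
    assert abs(soft(x) + proj(x) - x) < 1e-12
print("decomposition holds")
```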
The following calculus rules will prove useful.
Lemma 6.21. Let $F: X \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous. Then,

(i) for $\lambda \neq 0$ and $z \in X$ we have with $H(x) \coloneqq F(\lambda x + z)$ that
\[ \mathrm{prox}_H(x) = \lambda^{-1}\left(\mathrm{prox}_{\lambda^2 F}(\lambda x + z) - z\right); \]

(ii) for $\gamma > 0$ we have that
\[ \mathrm{prox}_{\gamma F^*}(x) = x - \gamma\,\mathrm{prox}_{\gamma^{-1}F}(\gamma^{-1}x); \]

(iii) for proper, convex, lower semicontinuous $G: Y \to \overline{\mathbb{R}}$ and $\gamma > 0$ we have with $H(x, y) \coloneqq F(x) + G(y)$ that
\[ \mathrm{prox}_{\gamma H}(x, y) = \begin{pmatrix} \mathrm{prox}_{\gamma F}(x) \\ \mathrm{prox}_{\gamma G}(y) \end{pmatrix}. \]
Proof. (i): By definition,
\[ \mathrm{prox}_H(x) = \arg\min_{w \in X} \tfrac12\|w - x\|_X^2 + F(\lambda w + z) \eqqcolon \bar w. \]
Now note that since $X$ is a vector space,
\[ \min_{w \in X} \tfrac12\|w - x\|_X^2 + F(\lambda w + z) = \min_{v \in X} \tfrac12\|\lambda^{-1}(v - z) - x\|_X^2 + F(v), \]
and the respective minimizers $\bar w$ and $\bar v$ are related by $\bar v = \lambda\bar w + z$. The claim then follows from
\[
\bar v = \arg\min_{v \in X} \tfrac12\|\lambda^{-1}(v - z) - x\|_X^2 + F(v)
= \arg\min_{v \in X} \tfrac{1}{2\lambda^2}\|v - (\lambda x + z)\|_X^2 + F(v)
= \arg\min_{v \in X} \tfrac12\|v - (\lambda x + z)\|_X^2 + \lambda^2 F(v)
= \mathrm{prox}_{\lambda^2 F}(\lambda x + z).
\]

(ii): Theorem 6.20, Lemma 5.7 (i), and (i) for $\lambda = \gamma^{-1}$ and $z = 0$ together imply that
\[
\mathrm{prox}_{\gamma F}(x) = x - \mathrm{prox}_{(\gamma F)^*}(x)
= x - \mathrm{prox}_{\gamma F^* \circ (\gamma^{-1}\mathrm{Id})}(x)
= x - \gamma\,\mathrm{prox}_{\gamma(\gamma^{-2}F^*)}(\gamma^{-1}x).
\]
Applying this to $F^*$ and using that $F^{**} = F$ by Theorem 5.1 (iii) now yields the claim.
(iii): By definition of the norm on the product space $X \times Y$, we have that
\[
\mathrm{prox}_{\gamma H}(x, y) = \arg\min_{(u, v) \in X \times Y} \tfrac12\|(u, v) - (x, y)\|_{X \times Y}^2 + \gamma H(u, v)
= \arg\min_{u \in X,\, v \in Y} \left(\tfrac12\|u - x\|_X^2 + \gamma F(u)\right) + \left(\tfrac12\|v - y\|_Y^2 + \gamma G(v)\right).
\]
Since there are no mixed terms in $u$ and $v$, the two terms in parentheses can be minimized separately. Hence, $\mathrm{prox}_{\gamma H}(x, y) = (\bar u, \bar v)$ for
\[
\bar u = \arg\min_{u \in X} \tfrac12\|u - x\|_X^2 + \gamma F(u) = \mathrm{prox}_{\gamma F}(x), \qquad
\bar v = \arg\min_{v \in Y} \tfrac12\|v - y\|_Y^2 + \gamma G(v) = \mathrm{prox}_{\gamma G}(y). \qquad □
\]
Computing proximal points is difficult in general, since evaluating $\mathrm{prox}_F$ by its definition entails minimizing $F$. In some cases, however, it is possible to give an explicit formula for $\mathrm{prox}_F$.

Example 6.22. We first consider scalar functions $f: \mathbb{R} \to \overline{\mathbb{R}}$.

(i) $f(t) = \frac12|t|^2$. Since $f$ is differentiable, we can set the derivative of $\frac12(s - t)^2 + \frac{\gamma}{2}s^2$ to zero and solve for $s$ to obtain $\mathrm{prox}_{\gamma f}(t) = (1 + \gamma)^{-1}t$.

(ii) $f(t) = |t|$. By Example 4.7, we have that $\partial f(t) = \operatorname{sign}(t)$; hence $s = \mathrm{prox}_{\gamma f}(t) = (\mathrm{Id} + \gamma\operatorname{sign})^{-1}(t)$ if and only if $t \in \{s\} + \gamma\operatorname{sign}(s)$. Let $t$ be given and assume this holds for some $s$. We now proceed by case distinction.

Case 1: $s > 0$. This implies that $t = s + \gamma$, i.e., $s = t - \gamma$, and hence that $t > \gamma$.

Case 2: $s < 0$. This implies that $t = s - \gamma$, i.e., $s = t + \gamma$, and hence that $t < -\gamma$.

Case 3: $s = 0$. This implies that $t \in \gamma[-1, 1] = [-\gamma, \gamma]$.

Since this yields a complete and disjoint case distinction for $t$, we can conclude that
\[ \mathrm{prox}_{\gamma f}(t) = \begin{cases} t - \gamma & \text{if } t > \gamma, \\ 0 & \text{if } t \in [-\gamma, \gamma], \\ t + \gamma & \text{if } t < -\gamma. \end{cases} \]
This mapping is also known as the soft-shrinkage or soft-thresholding operator.
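The case distinction above collapses into the single expression $\operatorname{sign}(t)\max\{|t| - \gamma, 0\}$, which is how soft-thresholding is usually implemented; a brief sketch (illustrative, not from the text):

```python
import numpy as np

def soft_shrink(t, gamma):
    # prox_{gamma |.|}: shift toward zero by gamma; exactly zero on [-gamma, gamma]
    return np.sign(t) * np.maximum(np.abs(t) - gamma, 0.0)

print(soft_shrink(np.array([-2.0, -0.3, 0.0, 1.5]), 1.0))
```

Because the operation is componentwise, the same function works unchanged for vectors, anticipating Example 6.23 (ii).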
(iii) $f(t) = \delta_{[-1,1]}(t)$. We can proceed here in the same way as in (ii), but for the sake of variety we instead use Lemma 6.21 (ii) to compute the proximal point mapping
from that of $f^*(t) = |t|$ (see Example 5.3 (ii)) via
\[
\mathrm{prox}_{\gamma f}(t) = t - \gamma\,\mathrm{prox}_{\gamma^{-1}f^*}(\gamma^{-1}t)
= \begin{cases} t - \gamma(\gamma^{-1}t - \gamma^{-1}) & \text{if } \gamma^{-1}t > \gamma^{-1}, \\ t - 0 & \text{if } \gamma^{-1}t \in [-\gamma^{-1}, \gamma^{-1}], \\ t - \gamma(\gamma^{-1}t + \gamma^{-1}) & \text{if } \gamma^{-1}t < -\gamma^{-1} \end{cases}
= \begin{cases} 1 & \text{if } t > 1, \\ t & \text{if } t \in [-1, 1], \\ -1 & \text{if } t < -1. \end{cases}
\]
For every $\gamma > 0$, the proximal point of $t$ is thus its projection onto $[-1, 1]$.
Example 6.23. We can generalize Example 6.22 to $X = \mathbb{R}^N$ (endowed with the Euclidean inner product) by applying Lemma 6.21 (iii) $N$ times. We thus obtain componentwise

(i) for $F(x) = \frac12\|x\|_2^2 = \sum_{i=1}^N \frac12 x_i^2$ that
\[ [\mathrm{prox}_{\gamma F}(x)]_i = \left(\tfrac{1}{1 + \gamma}\right)x_i, \qquad 1 \leq i \leq N; \]

(ii) for $F(x) = \|x\|_1 = \sum_{i=1}^N |x_i|$ that
\[ [\mathrm{prox}_{\gamma F}(x)]_i = (|x_i| - \gamma)^+\operatorname{sign}(x_i), \qquad 1 \leq i \leq N; \]

(iii) for $F(x) = \delta_{B_\infty}(x) = \sum_{i=1}^N \delta_{[-1,1]}(x_i)$ that
\[ [\mathrm{prox}_{\gamma F}(x)]_i = x_i - (x_i - 1)^+ - (x_i + 1)^- = \frac{x_i}{\max\{1, |x_i|\}}, \qquad 1 \leq i \leq N. \]

Here we have used the convenient notation $(t)^+ \coloneqq \max\{t, 0\}$ and $(t)^- \coloneqq \min\{t, 0\}$.
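For concreteness, the three componentwise formulas can be sketched in a few lines (an illustration under the stated Euclidean setting; the function names are ours):

```python
import numpy as np

def prox_sqnorm(x, gamma):
    # (i) F(x) = 0.5 * ||x||_2^2: componentwise scaling by 1/(1 + gamma)
    return x / (1.0 + gamma)

def prox_l1(x, gamma):
    # (ii) F(x) = ||x||_1: componentwise soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def proj_linf_ball(x):
    # (iii) F = indicator of [-1, 1]^N: projection, independent of gamma
    return x / np.maximum(1.0, np.abs(x))

x = np.array([-2.0, 0.3, 1.5])
print(prox_l1(x, 1.0), proj_linf_ball(x))
```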
Many more examples of projection operators and proximal mappings can be found in [Cegielski 2012], [Parikh & Boyd 2014, § 6.5], and [Beck 2017], as well as at https://www.proximity-operator.net.
Since the subdifferential of convex integral functionals can be evaluated pointwise by Theorem 4.11, the same holds for the definition (6.13) of the proximal point mapping.

Corollary 6.24. Let $f: \mathbb{R} \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous, and let $F: L^2(\Omega) \to \overline{\mathbb{R}}$ be defined by superposition as in Lemma 3.7. Then we have for all $\gamma > 0$ and $u \in L^2(\Omega)$
that
\[ [\mathrm{prox}_{\gamma F}(u)](x) = \mathrm{prox}_{\gamma f}(u(x)) \quad\text{for almost every } x \in \Omega. \]
Example 6.25. Let $X$ be a Hilbert space. Similarly to Example 6.22, one can show that

(i) for $F = \frac12\|\cdot\|_X^2 = \frac12\langle\cdot,\cdot\rangle_X$, that
\[ \mathrm{prox}_{\gamma F}(x) = \left(\tfrac{1}{1 + \gamma}\right)x; \]

(ii) for $F = \|\cdot\|_X$, using a case distinction as in Theorem 4.6, that
\[ \mathrm{prox}_{\gamma F}(x) = \left(1 - \tfrac{\gamma}{\|x\|_X}\right)^+ x; \]

(iii) for $F = \delta_C$ with $C \subset X$ nonempty, convex, and closed, that by definition
\[ \mathrm{prox}_{\gamma F}(x) = \mathrm{proj}_C(x) \coloneqq \arg\min_{z \in C} \|z - x\|_X, \]
the metric projection of $x$ onto $C$; the proximal point mapping thus generalizes the concept of projection onto convex sets. Explicit or at least constructive formulas for the projection onto different classes of sets can be found in [Cegielski 2012, Chapter 4.1].
Remark 6.26. The results of this section can be extended to (reflexive) Banach spaces if the identity is replaced by the duality mapping $x \mapsto \partial(\frac12\|\cdot\|_X^2)(x)$; see, e.g., [Cioranescu 1990, Theorem 3.11]. If the norm is differentiable (which is the case if the unit ball of $X^*$ is strictly convex, as for, e.g., $X = L^p(\Omega)$ with $p \in (1, \infty)$), the duality mapping is in fact single-valued [Cioranescu 1990, Theorem 2.16], and hence the corresponding resolvent $(J_X + A)^{-1}$ is well-defined. However, the proximal mapping need no longer be Lipschitz continuous, although the definition can be modified to obtain uniform continuity; see [Bačák & Kohlenbach 2018]. Similarly, the Moreau decomposition (Theorem 6.20) needs to be modified appropriately; see [Combettes & Reyes 2013].

The main difficulty from our point of view, however, lies in the evaluation of the proximal mapping, which then rarely admits a closed form even for simple functionals. This problem already arises in Hilbert spaces if $X^*$ is not identified with $X$ and hence the Riesz isomorphism (which coincides with $J_{X^*}^{-1}$ in this case) has to be inverted to obtain a proximal point.

Remark 6.27. By Corollary 6.14, the proximal mapping of any proper, convex, and lower semicontinuous functional is nonexpansive. Conversely, it can be shown that every nonexpansive mapping $M: X \to X$ that satisfies $M(x) \in \partial G(x)$ for all $x \in X$ for some proper, convex, and lower semicontinuous functional $G: X \to \overline{\mathbb{R}}$ is the proximal mapping of some proper, convex, and lower semicontinuous functional; see [Moreau 1965; Gribonval & Nikolova 2020].
7 SMOOTHNESS AND CONVEXITY
Before we turn to algorithms for the solution of nonsmooth optimization problems, we derive consequences of convexity for differentiable functionals that will be useful in proving convergence of splitting methods for functionals involving a smooth component. In particular, we will show that Lipschitz continuous differentiability is linked via Fenchel duality to strong convexity.
7.1 smoothness
We now derive useful consequences of Lipschitz differentiability and their relation to convexity. Recall from Theorem 4.5 that for $F: X \to \mathbb{R}$ convex and Gâteaux differentiable, $\partial F(x) = \{DF(x)\}$ (which can be identified with $\{\nabla F(x)\} \subset X$ in Hilbert spaces).
Lemma 7.1. Let $X$ be a Banach space and let $F: X \to \mathbb{R}$ be Gâteaux differentiable. Consider the properties:

(i) The property
\[ (7.1) \quad F(y) \leq F(x) + \langle DF(y), y - x\rangle_X - \tfrac{1}{2L}\|DF(x) - DF(y)\|_{X^*}^2 \quad\text{for all } x, y \in X. \]

(ii) The co-coercivity of $DF$ with factor $L^{-1}$:
\[ (7.2) \quad L^{-1}\|DF(x) - DF(y)\|_{X^*}^2 \leq \langle DF(x) - DF(y), x - y\rangle_X \quad\text{for all } x, y \in X. \]

(iii) The Lipschitz continuity of $DF$ with factor $L$:
\[ (7.3) \quad \|DF(x) - DF(y)\|_{X^*} \leq L\|x - y\|_X \quad\text{for all } x, y \in X. \]

(iv) The property
\[ (7.4) \quad \langle DF(x + h) - DF(x), h\rangle_X \leq L\|h\|_X^2 \quad\text{for all } x, h \in X. \]
(v) The smoothness (also known as the descent lemma) of $F$ with factor $L$:
\[ (7.5) \quad F(x + h) \leq F(x) + \langle DF(x), h\rangle_X + \tfrac{L}{2}\|h\|_X^2 \quad\text{for all } x, h \in X. \]

(vi) The uniform smoothness of $F$ with factor $L$:
\[ (7.6) \quad F(\lambda x + (1 - \lambda)y) + \lambda(1 - \lambda)\tfrac{L}{2}\|x - y\|_X^2 \geq \lambda F(x) + (1 - \lambda)F(y) \quad\text{for all } x, y \in X,\ \lambda \in [0, 1]. \]

Then (i) $\Rightarrow$ (ii) $\Rightarrow$ (iii) $\Rightarrow$ (iv) $\Rightarrow$ (v) $\Rightarrow$ (vi). If $F$ is convex and $X$ is reflexive, then all the properties are equivalent.
Proof. (i) $\Rightarrow$ (ii): Summing the estimate (7.1) with the same estimate with $x$ and $y$ exchanged, we obtain (7.2).

(ii) $\Rightarrow$ (iii): This follows immediately from (1.1).

(iii) $\Rightarrow$ (iv): Taking $y = x + h$ and multiplying (7.3) by $\|h\|_X$, the property follows again from (1.1).

(iv) $\Rightarrow$ (v): Using the mean value Theorem 2.10 and (7.4), we obtain
\[
F(x + h) - F(x) - \langle DF(x), h\rangle_X = \int_0^1 \langle DF(x + th), h\rangle_X \,dt - \langle DF(x), h\rangle_X
= \int_0^1 \langle DF(x + th) - DF(x), h\rangle_X \,dt
\leq \int_0^1 t\,dt \cdot L\|h\|_X^2 = \tfrac{L}{2}\|h\|_X^2.
\]

(v) $\Rightarrow$ (iv): This follows by adding together (7.5) and the same inequality with $x + h$ in place of $x$.

(v) $\Rightarrow$ (vi): Set $x_\lambda \coloneqq \lambda x + (1 - \lambda)y$. Multiplying (7.5) first for $x = x_\lambda$ and $h = x - x_\lambda = (1 - \lambda)(x - y)$ with $\lambda$ and then for $x = x_\lambda$ and $h = y - x_\lambda = \lambda(y - x)$ with $1 - \lambda$ and adding the results yields (7.6).

(vi) $\Rightarrow$ (v): This follows by dividing (7.6) by $\lambda > 0$ and taking the limit $\lambda \to 0$.

(v) $\Rightarrow$ (i) when $F$ is convex and $X$ is reflexive: Since $F$ is convex, we have from Theorem 4.5 that
\[ \langle DF(y), (x + h) - y\rangle_X \leq F(x + h) - F(y). \]
Combining this with (7.5) yields
\[
(7.7) \quad F(y) \leq F(x) + \langle DF(x), h\rangle_X - \langle DF(y), (x + h) - y\rangle_X + \tfrac{L}{2}\|h\|_X^2
= F(x) + \langle DF(y), y - x\rangle_X + \langle DF(x) - DF(y), h\rangle_X + \tfrac{L}{2}\|h\|_X^2.
\]
Let $z^* \coloneqq -L^{-1}(DF(x) - DF(y))$. Since $X$ is reflexive, the algebraic Hahn–Banach Theorem 1.4 yields (after multiplication by $\|z^*\|_{X^*}$) an $h \in X$ such that
\[ \|h\|_X = \|z^*\|_{X^*} \quad\text{and}\quad \langle z^*, h\rangle_X = \|z^*\|_{X^*}^2. \]
Consequently, continuing from (7.7),
\[
F(y) \leq F(x) + \langle DF(y), y - x\rangle_X - L\langle z^*, h\rangle_X + \tfrac{L}{2}\|h\|_X^2
= F(x) + \langle DF(y), y - x\rangle_X - \tfrac{L}{2}\|z^*\|_{X^*}^2
= F(x) + \langle DF(y), y - x\rangle_X - \tfrac{1}{2L}\|DF(x) - DF(y)\|_{X^*}^2.
\]
This proves (7.1). □
The next "smoothness three-point corollary" will be valuable for the study of splitting methods that involve a smooth component function.

Corollary 7.2. Let $X$ be a reflexive Banach space and let $F: X \to \mathbb{R}$ be convex and Gâteaux differentiable. Then the following are equivalent:

(i) $F$ has an $L^{-1}$-co-coercive derivative (or satisfies any of the equivalent properties of Lemma 7.1).

(ii) The three-point smoothness
\[ (7.8) \quad \langle DF(z), x - \bar x\rangle_X \geq F(x) - F(\bar x) - \tfrac{L}{2}\|x - z\|_X^2 \quad\text{for all } x, z, \bar x \in X. \]

(iii) The three-point monotonicity
\[ (7.9) \quad \langle DF(z) - DF(\bar x), x - \bar x\rangle_X \geq -\tfrac{L}{4}\|x - z\|_X^2 \quad\text{for all } x, z, \bar x \in X. \]

Proof. If $DF$ is $L^{-1}$-co-coercive, then using Lemma 7.1 we have the $L$-smoothness
\[ F(z) - F(x) \geq \langle DF(z), z - x\rangle_X - \tfrac{L}{2}\|x - z\|_X^2. \]
By convexity, $F(\bar x) - F(z) \geq \langle DF(z), \bar x - z\rangle_X$. Summing up, we obtain (7.8).
Regarding (7.9), by assumption we have the co-coercivity
\[ \langle DF(z) - DF(\bar x), z - \bar x\rangle_X \geq L^{-1}\|DF(z) - DF(\bar x)\|_{X^*}^2. \]
Thus, using (1.1) and Young's inequality in the form $ab \leq \frac{1}{2\alpha}a^2 + \frac{\alpha}{2}b^2$ for $a, b \in \mathbb{R}$ and $\alpha > 0$, we obtain
\[
\langle DF(z) - DF(\bar x), x - \bar x\rangle_X = \langle DF(z) - DF(\bar x), z - \bar x\rangle_X + \langle DF(z) - DF(\bar x), x - z\rangle_X
\geq L^{-1}\|DF(z) - DF(\bar x)\|_{X^*}^2 - \|DF(z) - DF(\bar x)\|_{X^*}\|x - z\|_X
\geq -\tfrac{L}{4}\|x - z\|_X^2.
\]
This is (7.9).

For the reverse implication, we assume that (7.9) holds and set $z^* \coloneqq -2L^{-1}(DF(z) - DF(\bar x))$. By the assumed reflexivity, we can again apply the algebraic Hahn–Banach Theorem 1.4 to obtain an $h \in X$ such that
\[ \|h\|_X = \|z^*\|_{X^*} \quad\text{and}\quad \langle z^*, h\rangle_X = \|z^*\|_{X^*}^2. \]
With $x = z + h$, (7.9) gives
\[
\langle DF(z) - DF(\bar x), z - \bar x\rangle_X \geq -\langle DF(z) - DF(\bar x), h\rangle_X - \tfrac{L}{4}\|h\|_X^2
= \tfrac{L}{2}\langle z^*, h\rangle_X - \tfrac{L}{4}\|z^*\|_{X^*}^2
= \tfrac{L}{4}\|z^*\|_{X^*}^2 = \tfrac{1}{L}\|DF(z) - DF(\bar x)\|_{X^*}^2.
\]
This is the $L^{-1}$-co-coercivity (7.2). The remaining equivalences follow from Lemma 7.1. □
7.2 strong convexity
The central notion in this chapter (and later for obtaining higher convergence rates for first-order algorithms) is the following "quantitative" version of convexity. We say that $F: X \to \overline{\mathbb{R}}$ is strongly convex with factor $\gamma > 0$ if for all $x, y \in X$ and $\lambda \in [0, 1]$,
\[ (7.10) \quad F(\lambda x + (1 - \lambda)y) + \lambda(1 - \lambda)\tfrac{\gamma}{2}\|x - y\|_X^2 \leq \lambda F(x) + (1 - \lambda)F(y). \]
Obviously, strong convexity implies strict convexity, so strongly convex functions have a unique minimizer. If $X$ is a Hilbert space, it is straightforward if tedious to verify by expanding the squared norm that (7.10) is equivalent to $F - \tfrac{\gamma}{2}\|\cdot\|_X^2$ being convex.

We have the following important duality result, which was first shown in [Azé & Penot 1995].
Theorem 7.3. Let $F: X \to \overline{\mathbb{R}}$ be proper and convex. Then the following are true:

(i) If $F$ is strongly convex with factor $\gamma$, then $F^*$ is uniformly smooth with factor $\gamma^{-1}$.

(ii) If $F$ is uniformly smooth with factor $L$, then $F^*$ is strongly convex with factor $L^{-1}$.

(iii) If $F$ is lower semicontinuous, then $F$ is uniformly smooth with factor $L$ if and only if $F^*$ is strongly convex with factor $L^{-1}$.
Proof. (i): Let $x^*, y^* \in X^*$ and $\alpha_x, \alpha_y \in \mathbb{R}$ with $\alpha_x < F^*(x^*)$ and $\alpha_y < F^*(y^*)$. From the definition of the Fenchel conjugate, there exist $x, y \in X$ such that
\[ \alpha_x < \langle x^*, x\rangle_X - F(x), \qquad \alpha_y < \langle y^*, y\rangle_X - F(y). \]
Multiplying the first inequality by $\lambda \in [0, 1]$, the second by $(1 - \lambda)$, and using the Fenchel–Young inequality (5.1) in the form
\[ 0 \leq F(x_\lambda) + F^*(x_\lambda^*) - \langle x_\lambda^*, x_\lambda\rangle_X \]
for $x_\lambda^* \coloneqq \lambda x^* + (1 - \lambda)y^*$ and $x_\lambda \coloneqq \lambda x + (1 - \lambda)y$ then yields that
\[
\lambda\alpha_x + (1 - \lambda)\alpha_y \leq F(x_\lambda) + F^*(x_\lambda^*) - \lambda F(x) - (1 - \lambda)F(y) + \lambda(1 - \lambda)\langle x^* - y^*, x - y\rangle_X
\leq F^*(x_\lambda^*) + \lambda(1 - \lambda)\left(\langle x^* - y^*, x - y\rangle_X - \tfrac{\gamma}{2}\|x - y\|_X^2\right)
\leq F^*(x_\lambda^*) + \lambda(1 - \lambda)\sup_{z \in X}\left\{\langle x^* - y^*, z\rangle_X - \tfrac{\gamma}{2}\|z\|_X^2\right\}
= F^*(x_\lambda^*) + \lambda(1 - \lambda)\tfrac{1}{2\gamma}\|x^* - y^*\|_{X^*}^2,
\]
where we have used the definition (7.10) of strong convexity in the second inequality and Lemma 5.4 together with Lemma 5.7 (i) in the final equality. Letting now $\alpha_x \to F^*(x^*)$ and $\alpha_y \to F^*(y^*)$, we obtain (7.6) for $F^*$ with $L \coloneqq \gamma^{-1}$.
(ii): Let $x^*, y^* \in X^*$ and $\lambda \in [0, 1]$. Set again $x_\lambda^* \coloneqq \lambda x^* + (1 - \lambda)y^*$. Then we obtain from the definition of the Fenchel conjugate and (7.6) that for any $x, y \in X$,
\[
\lambda F^*(x^*) + (1 - \lambda)F^*(y^*) \geq \lambda\left[\langle x^*, x + (1 - \lambda)y\rangle_X - F(x + (1 - \lambda)y)\right] + (1 - \lambda)\left[\langle y^*, x - \lambda y\rangle_X - F(x - \lambda y)\right]
\geq \lambda\langle x^*, x + (1 - \lambda)y\rangle_X + (1 - \lambda)\langle y^*, x - \lambda y\rangle_X - F(x) - \lambda(1 - \lambda)\tfrac{L}{2}\|y\|_X^2
= \langle x_\lambda^*, x\rangle_X - F(x) + \lambda(1 - \lambda)\left(\langle x^* - y^*, y\rangle_X - \tfrac{L}{2}\|y\|_X^2\right).
\]
Taking now the supremum over all $x, y \in X$ and using again Lemma 5.4 together with Lemma 5.7 (i), we obtain the strong convexity (7.10) of $F^*$ with $\gamma \coloneqq L^{-1}$.
(iii): One direction of the claim is clear from (ii). For the other direction, if $F^*$ is strongly convex with factor $L^{-1}$, then its preconjugate $(F^*)_*$ is uniformly smooth with factor $L$ by a proof completely analogous to (i). Then we use Theorem 5.1 to see that $F = F^{**} \equiv (F^*)_*$ under the lower semicontinuity assumption. □
Just as convexity of $F$ implies monotonicity of $\partial F$, strong convexity has the following consequences.

Lemma 7.4. Let $X$ be a Banach space and $F: X \to \overline{\mathbb{R}}$. Consider the properties:

(i) $F$ is strongly convex with factor $\gamma > 0$.

(ii) $F$ is strongly subdifferentiable with factor $\gamma$:
\[ (7.11) \quad F(y) - F(x) \geq \langle x^*, y - x\rangle_X + \tfrac{\gamma}{2}\|y - x\|_X^2 \quad\text{for all } x, y \in X,\ x^* \in \partial F(x). \]

(iii) $\partial F$ is strongly monotone with factor $\gamma$:
\[ (7.12) \quad \langle y^* - x^*, y - x\rangle_X \geq \gamma\|y - x\|_X^2 \quad\text{for all } x, y \in X,\ x^* \in \partial F(x),\ y^* \in \partial F(y). \]

Then (i) $\Rightarrow$ (ii) $\Rightarrow$ (iii). If $X$ is reflexive and $F$ is proper, convex, and lower semicontinuous, then also (iii) $\Rightarrow$ (i).
Proof. (i) $\Rightarrow$ (ii): Let $x, y \in X$ and $\lambda \in (0, 1)$ be arbitrary. Dividing (7.10) by $\lambda$ and rearranging yields
\[ \frac{F(y + \lambda(x - y)) - F(y)}{\lambda} \leq F(x) - F(y) - (1 - \lambda)\tfrac{\gamma}{2}\|x - y\|_X^2. \]
Since strongly convex functions are also convex, we can apply Lemma 4.3 (ii) to pass to the limit $\lambda \to 0$ on both sides to obtain
\[ F'(y; x - y) \leq F(x) - F(y) - \tfrac{\gamma}{2}\|x - y\|_X^2. \]
Using Lemma 4.4 for $h = x - y$, we thus obtain that for any $y^* \in \partial F(y)$,
\[ \langle y^*, x - y\rangle_X \leq F'(y; x - y) \leq F(x) - F(y) - \tfrac{\gamma}{2}\|x - y\|_X^2. \]
Exchanging the roles of $x$ and $y$ and rearranging yields (7.11).

(ii) $\Rightarrow$ (iii): Adding (7.11) to the same inequality with $x$ and $y$ exchanged immediately yields (7.12).

(iii) $\Rightarrow$ (i): Suppose first that $\partial F$ is surjective. Then $\operatorname{dom} \partial F^* = X^*$. Using the duality between $\partial F$ and $\partial F^*$ in Lemma 5.8, we rewrite (7.12) as
\[ (7.13) \quad \langle y^* - x^*, y - x\rangle_X \geq \gamma\|y - x\|_X^2 \quad\text{for all } x^*, y^* \in X^*,\ x \in \partial F^*(x^*),\ y \in \partial F^*(y^*). \]
Taking $y^* = x^*$, this implies that $x = y$, i.e., $\partial F^*(x^*)$ is a singleton for all $x^* \in X^*$. Here we use that $\operatorname{dom} \partial F^* = X^*$ to avoid the possibility that $\partial F^*(x^*) = \emptyset$. By Theorem 4.5 it follows that $F^*$ is Gâteaux differentiable. Thus (7.13) expresses the co-coercivity (7.2) of $DF^*$ with factor $\gamma$. By Lemma 7.1 it follows that $F^*$ is uniformly smooth with factor $\gamma^{-1}$. Consequently, by Theorem 7.3, $F$ is strongly convex with factor $\gamma$.

If $\partial F$ is not surjective, we replace $F$ by $F + \varepsilon j$ with $j(x) \coloneqq \frac12\|x\|_X^2$ (whose subdifferential is the duality mapping) and some $\varepsilon > 0$. By Theorem 6.11 and Minty's Theorem 6.12, $\partial(F + \varepsilon j)$ is now surjective. It also remains strongly monotone with factor $\gamma$, as $\partial j$ is monotone. Now, by the above reasoning, $F + \varepsilon j$ is strongly convex with factor $\gamma$. Since $\varepsilon > 0$ was arbitrary, we deduce from the defining inequality (7.10) that $F$ is strongly convex with factor $\gamma$. □
Note that the factor $\gamma$ enters into the strong monotonicity (7.12) directly, rather than as $\frac{\gamma}{2}$ as in the strong subdifferentiability (7.11) (and in strong convexity).

We can also derive a stronger, quantitative version of the fact that for convex functions, points satisfying the Fermat principle are minimizers.

Lemma 7.5. Let $X$ be a Banach space and let $F: X \to \overline{\mathbb{R}}$ be strongly convex with factor $\gamma > 0$. Assume that $F$ admits a minimum $M \coloneqq \min_{x \in X} F(x)$. Then the Polyak–Łojasiewicz inequality holds:
\[ (7.14) \quad F(x) - M \leq \tfrac{1}{2\gamma}\|x^*\|_{X^*}^2 \quad\text{for all } x \in X,\ x^* \in \partial F(x). \]
Proof. Let $x \in X$ and $x^* \in \partial F(x)$ be arbitrary. Then from Lemma 7.4 (ii) we have that
\[ -F(x) + \langle x^*, x - y\rangle_X - \tfrac{\gamma}{2}\|x - y\|_X^2 \geq -F(y). \]
Taking the supremum over all $y \in X$, noting that this is equivalent to taking the supremum over all $x - y \in X$, and inserting the Fenchel conjugate of the squared norm from Lemma 5.4 together with Lemma 5.7 (i), we obtain
\[ -F(x) + \tfrac{1}{2\gamma}\|x^*\|_{X^*}^2 \geq \sup_{y \in X} -F(y) = -\min_{y \in X} F(y), \]
and hence, after rearranging, (7.14). □
Comparing the consequences of strong convexity in Lemma 7.4 with those of uniform smoothness in Lemma 7.1, we can already see a certain duality between them: while the former give lower bounds, the latter give upper bounds, and vice versa. A simple example is the following.
Corollary 7.6. If $F: X \to \mathbb{R}$ is strongly convex with factor $\gamma$ and uniformly smooth with factor $L$, then
\[ (7.15) \quad \gamma\|x - y\|_X^2 \leq \langle DF(x) - DF(y), x - y\rangle_X \leq L\|x - y\|_X^2 \quad\text{for all } x, y \in X. \]

Proof. The first inequality follows from Lemma 7.4 (iii), while the second follows from (1.1) together with Lemma 7.1 (iii). □
The estimates of Corollary 7.2 can be improved if $F$ is in addition strongly convex.

Corollary 7.7. Let $X$ be a Banach space and let $F: X \to \mathbb{R}$ be strongly convex with factor $\gamma > 0$ as well as Lipschitz differentiable with constant $L > 0$. Then for any $\alpha > 0$,
\[ (7.16) \quad \langle DF(z), x - \bar x\rangle_X \geq F(x) - F(\bar x) + \tfrac{\gamma - \alpha L}{2}\|x - \bar x\|_X^2 - \tfrac{L}{2\alpha}\|x - z\|_X^2 \quad\text{for all } x, z, \bar x \in X, \]
as well as
\[ (7.17) \quad \langle DF(z) - DF(\bar x), x - \bar x\rangle_X \geq (\gamma - \alpha L)\|x - \bar x\|_X^2 - \tfrac{L}{4\alpha}\|x - z\|_X^2 \quad\text{for all } x, z, \bar x \in X. \]
Proof. Using the strong subdifferentiability from Lemma 7.4 (ii), the Lipschitz continuity of $DF$, (1.1), and Young's inequality, we obtain
\[
\langle DF(z), x - \bar x\rangle_X = \langle DF(x), x - \bar x\rangle_X + \langle DF(z) - DF(x), x - \bar x\rangle_X
\geq F(x) - F(\bar x) + \tfrac{\gamma}{2}\|x - \bar x\|_X^2 - \tfrac{\alpha L}{2}\|x - \bar x\|_X^2 - \tfrac{1}{2\alpha L}\|DF(z) - DF(x)\|_{X^*}^2
\geq F(x) - F(\bar x) + \tfrac{\gamma}{2}\|x - \bar x\|_X^2 - \tfrac{\alpha L}{2}\|x - \bar x\|_X^2 - \tfrac{L}{2\alpha}\|x - z\|_X^2.
\]
For (7.17), we can use the strong monotonicity of $\partial F$ from Lemma 7.4 (iii) to estimate analogously
\[
\langle DF(z) - DF(\bar x), x - \bar x\rangle_X = \langle DF(x) - DF(\bar x), x - \bar x\rangle_X + \langle DF(z) - DF(x), x - \bar x\rangle_X
\geq \gamma\|x - \bar x\|_X^2 - \alpha L\|x - \bar x\|_X^2 - \tfrac{L}{4\alpha}\|x - z\|_X^2. \qquad □
\]
7.3 moreau–yosida regularization
We now look at another way to reformulate optimality conditions using proximal point mappings. Although these are no longer equivalent reformulations, they will serve as a link to the Newton-type methods which will be introduced in Chapter 14.
We again assume that $X$ is a Hilbert space and identify $X^*$ with $X$ via the Riesz isomorphism. Let $A: X \rightrightarrows X$ be a maximally monotone operator and $\gamma > 0$. Then we define the Yosida approximation of $A$ as
\[ A_\gamma \coloneqq \tfrac{1}{\gamma}\left(\mathrm{Id} - R_{\gamma A}\right). \]
In particular, the Yosida approximation of the subdifferential of a proper, convex, and lower semicontinuous functional $F: X \to \overline{\mathbb{R}}$ is given by
\[ (\partial F)_\gamma = \tfrac{1}{\gamma}\left(\mathrm{Id} - \mathrm{prox}_{\gamma F}\right), \]
which by Corollary 6.14 and Theorem 6.11 is always Lipschitz continuous with constant $L = \gamma^{-1}$.
An alternative point of view is the following. For a proper, convex, and lower semicontinuous functional $F: X \to \overline{\mathbb{R}}$ and $\gamma > 0$, we define the Moreau envelope¹
\[ (7.18) \quad F_\gamma: X \to \mathbb{R}, \quad x \mapsto \inf_{z \in X} \tfrac{1}{2\gamma}\|z - x\|_X^2 + F(z); \]
see Figure 7.1. Comparing this with the definition (6.12) of the proximal point mapping of $F$, we see that
\[ (7.19) \quad F_\gamma(x) = \tfrac{1}{2\gamma}\|\mathrm{prox}_{\gamma F}(x) - x\|_X^2 + F(\mathrm{prox}_{\gamma F}(x)). \]
(Note that multiplying a functional by $\gamma > 0$ does not change its minimizers.) Hence $F_\gamma$ is indeed well-defined on $X$ and single-valued. Furthermore, we can deduce from (7.18) that $F_\gamma$ is convex as well.
Lemma 7.8. Let $F: X \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous, and let $\gamma > 0$. Then $F_\gamma$ is convex.

Proof. We first show that for any convex $G: X \to \mathbb{R}$, the mapping
\[ H: X \times X \to \overline{\mathbb{R}}, \quad (x, z) \mapsto F(z) + G(z - x), \]
is convex as well. Indeed, for any $(x_1, z_1), (x_2, z_2) \in X \times X$ and $\lambda \in [0, 1]$, the convexity of $F$ and $G$ implies that
\[
H(\lambda(x_1, z_1) + (1 - \lambda)(x_2, z_2)) = F(\lambda z_1 + (1 - \lambda)z_2) + G(\lambda(z_1 - x_1) + (1 - \lambda)(z_2 - x_2))
\leq \lambda\left(F(z_1) + G(z_1 - x_1)\right) + (1 - \lambda)\left(F(z_2) + G(z_2 - x_2)\right)
= \lambda H(x_1, z_1) + (1 - \lambda)H(x_2, z_2).
\]
¹Not to be confused with the convex envelope $F^\Gamma$!
Let now $x_1, x_2 \in X$ and $\lambda \in [0, 1]$. Since $F_\gamma(x) = \inf_{z \in X} H(x, z)$ for $G(y) \coloneqq \tfrac{1}{2\gamma}\|y\|_X^2$, there exist two minimizing sequences $\{z_n^1\}_{n \in \mathbb{N}}, \{z_n^2\}_{n \in \mathbb{N}} \subset X$ with
\[ H(x_1, z_n^1) \to F_\gamma(x_1), \qquad H(x_2, z_n^2) \to F_\gamma(x_2). \]
From the definition of the infimum together with the convexity of $H$, we thus obtain for all $n \in \mathbb{N}$ that
\[ F_\gamma(\lambda x_1 + (1 - \lambda)x_2) \leq H(\lambda(x_1, z_n^1) + (1 - \lambda)(x_2, z_n^2)) \leq \lambda H(x_1, z_n^1) + (1 - \lambda)H(x_2, z_n^2), \]
and passing to the limit $n \to \infty$ yields the desired convexity. □
We will also show later that Moreau–Yosida regularization preserves (global!) Lipschitz continuity.

The next theorem links the two concepts of Moreau envelope and Yosida approximation and hence justifies the term Moreau–Yosida regularization.

Theorem 7.9. Let $F: X \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous, and let $\gamma > 0$. Then $F_\gamma$ is Fréchet differentiable with
\[ \nabla(F_\gamma) = (\partial F)_\gamma. \]
Proof. Let $x, y \in X$ be arbitrary and set $x^* = \mathrm{prox}_{\gamma F}(x)$ and $y^* = \mathrm{prox}_{\gamma F}(y)$. We first show that
\[ (7.20) \quad \tfrac{1}{\gamma}\langle y^* - x^*, x - x^*\rangle_X \leq F(y^*) - F(x^*). \]
(Note that for proper $F$, the definition of proximal points as minimizers necessarily implies that $x^*, y^* \in \operatorname{dom} F$.) To this purpose, consider for $t \in (0, 1)$ the point $x_t^* \coloneqq ty^* + (1 - t)x^*$. Using the minimizing property of the proximal point $x^*$ together with the convexity of $F$ and completing the square, we obtain that
\[
F(x^*) \leq F(x_t^*) + \tfrac{1}{2\gamma}\|x_t^* - x\|_X^2 - \tfrac{1}{2\gamma}\|x^* - x\|_X^2
\leq tF(y^*) + (1 - t)F(x^*) - \tfrac{t}{\gamma}\langle x - x^*, y^* - x^*\rangle_X + \tfrac{t^2}{2\gamma}\|x^* - y^*\|_X^2.
\]
Rearranging the terms, dividing by $t > 0$, and passing to the limit $t \to 0$ then yields (7.20). Combining this with (7.19) implies that
\[
F_\gamma(y) - F_\gamma(x) = F(y^*) - F(x^*) + \tfrac{1}{2\gamma}\left(\|y - y^*\|_X^2 - \|x - x^*\|_X^2\right)
\geq \tfrac{1}{2\gamma}\left(2\langle y^* - x^*, x - x^*\rangle_X + \|y - y^*\|_X^2 - \|x - x^*\|_X^2\right)
= \tfrac{1}{2\gamma}\left(2\langle y - x, x - x^*\rangle_X + \|y - y^* - x + x^*\|_X^2\right)
\geq \tfrac{1}{\gamma}\langle y - x, x - x^*\rangle_X.
\]
By exchanging the roles of $x^*$ and $y^*$ in (7.20), we obtain that
\[ F_\gamma(y) - F_\gamma(x) \leq \tfrac{1}{\gamma}\langle y - x, y - y^*\rangle_X. \]
Together, these two inequalities yield that
\[
0 \leq F_\gamma(y) - F_\gamma(x) - \tfrac{1}{\gamma}\langle y - x, x - x^*\rangle_X
\leq \tfrac{1}{\gamma}\langle y - x, (y - y^*) - (x - x^*)\rangle_X
\leq \tfrac{1}{\gamma}\left(\|y - x\|_X^2 - \|y^* - x^*\|_X^2\right)
\leq \tfrac{1}{\gamma}\|y - x\|_X^2,
\]
where the next-to-last inequality follows from the firm nonexpansivity of proximal point mappings (Lemma 6.13).

If we now set $y = x + h$ for arbitrary $h \in X$, we obtain that
\[ 0 \leq \frac{F_\gamma(x + h) - F_\gamma(x) - \langle\gamma^{-1}(x - x^*), h\rangle_X}{\|h\|_X} \leq \tfrac{1}{\gamma}\|h\|_X \to 0 \quad\text{for } h \to 0, \]
i.e., $F_\gamma$ is Fréchet differentiable with gradient $\gamma^{-1}(x - x^*) = (\partial F)_\gamma(x)$. □
Since $F_\gamma$ is convex by Lemma 7.8, this result together with Theorem 4.5 yields the catchy relation $\partial(F_\gamma) = (\partial F)_\gamma$.

Example 7.10. We consider again $X = \mathbb{R}^N$.

(i) For $F(x) = \|x\|_1$, we have from Example 6.23 (ii) that the proximal point mapping is given by the componentwise soft-shrinkage operator. Inserting this into the definition yields
\[ [(\partial\|\cdot\|_1)_\gamma(x)]_i = \begin{cases} \tfrac{1}{\gamma}(x_i - (x_i - \gamma)) = 1 & \text{if } x_i > \gamma, \\ \tfrac{1}{\gamma}x_i & \text{if } x_i \in [-\gamma, \gamma], \\ \tfrac{1}{\gamma}(x_i - (x_i + \gamma)) = -1 & \text{if } x_i < -\gamma. \end{cases} \]
Comparing this to the corresponding subdifferential (4.2), we see that the set-valued case at the point $x_i = 0$ has been replaced by a linear function on a small interval.
Figure 7.1: Illustration of the Moreau–Yosida regularization (thick solid line) of $F$ (thin solid line) for (a) $f(t) = |t|$ and (b) $f(t) = \delta_{[-1,1]}(t)$. The dotted line indicates the quadratic function $z \mapsto \tfrac{1}{2\gamma}\|x - z\|_X^2$, while the dashed line is $z \mapsto F(z) + \tfrac{1}{2\gamma}\|x - z\|_X^2$. The dots and the horizontal and vertical lines (nontrivial only in the second point of (a)) emanating from the dots indicate the pair $(x, F_\gamma(x))$ and how it relates to the minimization of the shifted quadratic functional. (In (b) the two lines are overlaid within $[-1, 1]$, as only the domain of definition of the two functions is different.)
Similarly, inserting the definition of the proximal point into (7.19) shows that
\[ F_\gamma(x) = \sum_{i=1}^N f_\gamma(x_i) \quad\text{for}\quad f_\gamma(t) = \begin{cases} \tfrac{1}{2\gamma}|t - (t - \gamma)|^2 + |t - \gamma| = t - \tfrac{\gamma}{2} & \text{if } t > \gamma, \\ \tfrac{1}{2\gamma}|t|^2 & \text{if } t \in [-\gamma, \gamma], \\ \tfrac{1}{2\gamma}|t - (t + \gamma)|^2 + |t + \gamma| = -t - \tfrac{\gamma}{2} & \text{if } t < -\gamma. \end{cases} \]
For small values, the absolute value is thus replaced by a quadratic function (which removes the nondifferentiability at $0$). This modification is well known under the name Huber norm; see Figure 7.1a.
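To see that the closed form above really is the Moreau envelope, one can compare it against a direct numerical minimization of (7.18) in one dimension (a sanity-check sketch, not part of the text; the grid search introduces a small discretization error):

```python
import numpy as np

def huber(t, gamma):
    # Closed form of the Moreau envelope of f(t) = |t| derived above
    return np.where(np.abs(t) <= gamma, t**2 / (2.0 * gamma), np.abs(t) - gamma / 2.0)

def envelope(t, gamma):
    # Brute-force evaluation of inf_z (1/(2*gamma))*(z - t)**2 + |z| on a fine grid
    z = np.linspace(t - 3.0, t + 3.0, 60001)
    return np.min((z - t)**2 / (2.0 * gamma) + np.abs(z))

for t in [-2.0, -0.1, 0.0, 0.05, 1.7]:
    assert abs(float(huber(t, 0.5)) - envelope(t, 0.5)) < 1e-3
print("Huber function matches the Moreau envelope of |t| (up to grid error)")
```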
(ii) For $F(x) = \delta_{B_\infty}(x)$, we have from Example 6.23 (iii) that the proximal mapping is given by the componentwise projection onto $[-1, 1]$ and hence that
\[ [(\partial\delta_{B_\infty})_\gamma(x)]_i = \tfrac{1}{\gamma}\left(x_i - \left(x_i - (x_i - 1)^+ - (x_i + 1)^-\right)\right) = \tfrac{1}{\gamma}(x_i - 1)^+ + \tfrac{1}{\gamma}(x_i + 1)^-. \]
Similarly, inserting this and using that $\mathrm{prox}_{\gamma F}(x) \in B_\infty$ and $\langle(x + 1)^-, (x - 1)^+\rangle_X = 0$ yields that
\[ (\delta_{B_\infty})_\gamma(x) = \tfrac{1}{2\gamma}\|(x - 1)^+\|_2^2 + \tfrac{1}{2\gamma}\|(x + 1)^-\|_2^2, \]
which corresponds to the classical penalty functional for the inequality constraints $x - 1 \leq 0$ and $x + 1 \geq 0$ in nonlinear optimization; see Figure 7.1b.
By Theorem 7.9, $F_\gamma$ is Fréchet differentiable with Lipschitz continuous gradient with factor $\gamma^{-1}$. From Theorem 7.3, we thus know that $(F_\gamma)^*$ is strongly convex with factor $\gamma$, which in
Hilbert spaces is equivalent to $(F_\gamma)^* - \tfrac{\gamma}{2}\|\cdot\|_X^2$ being convex. In fact, this can be made even more explicit.
Theorem 7.11. Let $F: X \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous. Then we have for all $\gamma > 0$ that
\[ (F_\gamma)^* = F^* + \tfrac{\gamma}{2}\|\cdot\|_X^2. \]
Proof. We obtain directly from the definition of the Fenchel conjugate in Hilbert spaces and of the Moreau envelope that
\[
(F_\gamma)^*(x^*) = \sup_{x \in X}\left\{\langle x^*, x\rangle_X - \inf_{z \in X}\left[\tfrac{1}{2\gamma}\|x - z\|_X^2 + F(z)\right]\right\}
= \sup_{x \in X}\left\{\langle x^*, x\rangle_X + \sup_{z \in X}\left\{-\tfrac{1}{2\gamma}\|x - z\|_X^2 - F(z)\right\}\right\}
= \sup_{z \in X}\left\{\langle x^*, z\rangle_X - F(z) + \sup_{x \in X}\left\{\langle x^*, x - z\rangle_X - \tfrac{1}{2\gamma}\|x - z\|_X^2\right\}\right\}
= F^*(x^*) + \left(\tfrac{1}{2\gamma}\|\cdot\|_X^2\right)^*(x^*),
\]
since for any given $z \in X$, the inner supremum is always taken over the full space $X$. The claim now follows from Lemma 5.4 with $p = 2$ (using again the fact that we have identified $X^*$ with $X$) and Lemma 5.7 (i). □
With this, we can show the converse of Theorem 7.9: every smooth function can be obtained through Moreau–Yosida regularization.

Corollary 7.12. Let $F: X \to \mathbb{R}$ be convex and $L$-smooth. Then for all $x \in X$,
\[ F(x) = (G^*)_{L^{-1}}(x) \quad\text{and}\quad \nabla F(x) = \mathrm{prox}_{LG}(Lx) \]
for
\[ G: X \to \overline{\mathbb{R}}, \quad G(x) = F^*(x) - \tfrac{1}{2L}\|x\|_X^2. \]

Proof. Since $F$ is convex and $L$-smooth and $X$ is a Hilbert space, Lemma 7.1 and Theorem 7.3 yield that $F^*$ is strongly convex with factor $L^{-1}$ and thus that $G$ is convex. Furthermore, as the Fenchel conjugate of a proper convex functional, $F^*$ and thus $G$ is proper and lower semicontinuous. Theorems 5.1 and 7.11 now imply that for all $x \in X$,
\[ (G^*)_{L^{-1}}(x) = \left((G^*)_{L^{-1}}\right)^{**}(x) = \left(G + \tfrac{1}{2L}\|\cdot\|_X^2\right)^*(x) = F^{**}(x) = F(x). \]
Furthermore, by Lemma 4.13 and Theorems 4.5 and 4.14, we have that

∂G(z) = ∂F*(z) − {L⁻¹z}  for all z ∈ X.

By the definition of the proximal mapping, this is equivalent to z = prox_{LG}(Lx) for any x ∈ ∂F*(z). But by Lemma 5.8, x ∈ ∂F*(z) holds if and only if z ∈ ∂F(x) = {∇F(x)}, and combining these two yields the claimed expression for the gradient. □
Let us briefly consider the relevance of the previous results to optimization.
Approximation by smooth mappings  For a convex functional F : X → ℝ̄, every minimizer x̄ ∈ X satisfies the Fermat principle 0 ∈ ∂F(x̄), which we can write equivalently as x̄ ∈ ∂F*(0). If we now replace ∂F* by its Yosida approximation (∂F*)_γ, we obtain the regularized optimality condition

x_γ = (∂F*)_γ(0) = −(1/γ) prox_{γF*}(0).

This is now an explicit and even Lipschitz continuous relation. Although x_γ is no longer a minimizer of F, the convexity of F_γ implies that x_γ = (∂F*)_γ(0) = ∂(F*)_γ(0) is equivalent to

0 ∈ ∂((F*)_γ)*(x_γ) = ∂(F** + (γ/2)‖·‖²_X)(x_γ) = ∂(F + (γ/2)‖·‖²_X)(x_γ),

i.e., x_γ is the (unique due to the strict convexity of the squared norm) minimizer of the functional F + (γ/2)‖·‖²_X. Hence, the regularization of ∂F* has not made the original problem smooth but merely (more) strongly convex. The equivalence can also be used to show (similarly to the proof of Theorem 2.1) that x_γ → x̄ for γ → 0. In practice, this straightforward approach fails due to the difficulty of computing F* and prox_{F*} and is therefore usually combined with one of the splitting techniques that will be introduced in the next chapter.
Conversion between gradients and proximal mappings  According to Corollary 7.12, solving min_x F(x) for an L-smooth function F is equivalent to solving

min_{x, x̃ ∈ X} G*(x̃) + (L/2)‖x − x̃‖².

Observe that G* may be nonsmooth. Suppose we apply an algorithm for the latter that makes use of the proximal mapping of G* (such as the splitting methods that will be discussed in the following chapters). Then using the Moreau decomposition of Lemma 6.21 (ii) with Corollary 7.12, we see that

prox_{L⁻¹G*}(x) = x − L⁻¹∇F(x).

Therefore, this can still be done purely in terms of gradient evaluations of F.
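This identity can be checked on the Huber example above: the envelope F = |·|_γ is L-smooth with L = γ⁻¹, and by Theorem 7.11 one computes G = δ_{[−1,1]} and hence G* = |·|, so the identity reduces to soft-thresholding. A hedged one-dimensional sketch (our own function names):

```python
def grad_huber(x, gamma):
    # gradient of F = |.|_gamma (Moreau envelope of |.|); F is L-smooth with L = 1/gamma
    return x / gamma if abs(x) <= gamma else (1.0 if x > 0 else -1.0)

def soft(x, t):
    # soft-thresholding: prox of t*|.|; here |.| = G* for G = delta_[-1,1]
    return max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0)

# prox_{L^{-1} G*}(x) = x - L^{-1} grad F(x), with L^{-1} = gamma
gamma = 0.5
for x in [-2.0, -0.3, 0.0, 0.4, 1.7]:
    assert abs(soft(x, gamma) - (x - gamma * grad_huber(x, gamma))) < 1e-12
```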
Remark 7.13. Continuing from Remark 6.26, Moreau–Yosida regularization can also be defined in reflexive Banach spaces; we refer to [Brezis, Crandall & Pazy 1970] for details. Again, the main issue is the practical evaluation of F_γ and (∂F)_γ if the duality mapping (or the Riesz isomorphism in Hilbert spaces that are not identified with their dual) is no longer the identity.
8 PROXIMAL POINT AND SPLITTING METHODS
We now turn to the development of algorithms for computing minimizers of functionals J : X → ℝ̄ of the form

J(x) ≔ F(x) + G(x)

for F, G : X → ℝ̄ convex but not necessarily differentiable. One of the main difficulties compared to the differentiable setting is that the naive equivalent of steepest descent, the iteration

x^{k+1} ∈ x^k − τ_k ∂J(x^k),

does not work, since even in finite dimensions arbitrary subgradients need not be descent directions; this can only be guaranteed for the subgradient of minimal norm; see, e.g., [Ruszczynski 2006, Example 7.1, Lemma 2.77]. Furthermore, the minimal norm subgradient of J cannot be computed easily from those of F and G. We thus follow a different approach and look for a root x̄ of the set-valued mapping x ↦ ∂J(x) (which coincides with the minimizer x̄ of J if J is convex). In this chapter, we only derive methods, postponing proofs of convergence, in various different senses, to Chapters 9 to 11. For the reasons mentioned in the beginning of Section 6.3, we will assume in this and the following chapters that X (as well as all further occurring spaces) is a Hilbert space so that we can identify X* ≅ X.
8.1 proximal point method
We have seen in Corollary 6.19 that a root x̄ of ∂J : X ⇉ X can be characterized as a fixed point of prox_{τJ} for any τ > 0. This suggests a fixed-point iteration: choose x⁰ ∈ X and for an appropriate sequence {τ_k}_{k∈ℕ} of step sizes set

(8.1)  x^{k+1} = prox_{τ_k J}(x^k).

This iteration naturally generalizes to finding a root x̄ ∈ A⁻¹(0) of a set-valued (usually monotone) operator A : X ⇉ X as

(8.2)  x^{k+1} = R_{τ_k A}(x^k).

This is the proximal point method, which is the basic building block for all methods in this chapter. Using the definition of the resolvent, this can also be written in implicit form as

(8.3)  0 ∈ τ_k A(x^{k+1}) + (x^{k+1} − x^k),
which will be useful for the analysis of the method.
If A is maximal monotone (in particular if A = ∂J), Lemma 6.13 shows that the iteration map x ↦ R_{τ_k A}(x) is firmly nonexpansive. Mere (nonfirm) nonexpansivity already implies that

‖x^{k+1} − x̄‖_X = ‖R_{τ_k A}(x^k) − x̄‖_X ≤ ‖x^k − x̄‖_X.

In other words, the method does not escape from a fixed point. Either a more refined analysis based on firm nonexpansivity of the iteration map or a more direct analysis based on the maximal monotonicity of A can be used to further show that the iterates {x^k}_{k∈ℕ} indeed converge to a fixed point x̄ for any initial iterate x⁰. The latter will be the topic of Chapter 9.

A practical issue is that the steps (8.1) of the basic proximal point method are typically just as difficult as the original problem, so the method is not feasible for problems that demand an iterative method for their solution in the first place. However, the proximal step does form an important building block of several more practical splitting methods for problems of the form J = F + G, which we derive in the following by additional clever manipulations.
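As a toy illustration of (8.1) (our own minimal example, not from the text), the proximal point method applied to J(x) = |x| reaches the minimizer 0 in finitely many steps, since each iteration is a soft-thresholding:

```python
def prox(tau, x):
    # resolvent of tau * d|.|, i.e., prox of tau*|.|: soft-thresholding
    return max(abs(x) - tau, 0.0) * (1.0 if x > 0 else -1.0)

x = 5.0
for _ in range(20):
    x = prox(0.5, x)   # x^{k+1} = prox_{tau_k J}(x^k) with tau_k = 1/2
assert x == 0.0        # fixed point = minimizer of J(x) = |x|
```

Of course, for this J the minimizer is obvious; the point is only that the fixed-point iteration does converge to it.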
Remark 8.1. The proximal point algorithm can be traced back to Krasnosel'skiĭ [Krasnosel'skiĭ 1955] and Mann [Mann 1953] (as a special case of the Krasnosel'skiĭ–Mann iteration); it was also studied in [Martinet 1970]. The formulation considered here was proposed in [Rockafellar 1976b].
8.2 explicit splitting
As we have noted, the proximal point method is not feasible for most functionals of the form J(x) = F(x) + G(x), since the evaluation of prox_J is not significantly easier than solving the original minimization problem, even if prox_F and prox_G have a closed-form expression. (Such functionals are called prox-simple.) We thus proceed differently: instead of applying the proximal point reformulation directly to 0 ∈ ∂J(x̄), we first apply the subdifferential sum rule (Theorem 4.14) to deduce the existence of p̄ ∈ X with

(8.4)  p̄ ∈ ∂F(x̄),  −p̄ ∈ ∂G(x̄).

We can now replace one or both of these subdifferential inclusions by a proximal point reformulation that only involves F or G.

Explicit splitting methods, also known as forward-backward splitting, are based on applying Lemma 6.18 only to, e.g., the second inclusion in (8.4) to obtain

(8.5)  p̄ ∈ ∂F(x̄),  x̄ = prox_{τG}(x̄ − τp̄).
The corresponding fixed-point iteration then consists in

(i) choosing p^k ∈ ∂F(x^k) (with minimal norm);

(ii) setting x^{k+1} = prox_{τ_k G}(x^k − τ_k p^k).

Again, computing a subgradient with minimal norm can be complicated in general. It is, however, easy if F is additionally differentiable, since in this case ∂F(x) = {∇F(x)} by Theorem 4.5. This leads to the proximal gradient or forward-backward splitting method

(8.6)  x^{k+1} = prox_{τ_k G}(x^k − τ_k ∇F(x^k)).

(The special case G = δ_C, i.e., prox_{τG}(x) = proj_C(x), is also known as the projected gradient method.) Similarly to the proximal point method, this method can be written in implicit form as

(8.7)  0 ∈ τ_k [∂G(x^{k+1}) + ∇F(x^k)] + (x^{k+1} − x^k).

Based on this, we will see in Chapter 9 that the iterates {x^k} converge weakly if τ_k L < 2 for L the Lipschitz factor of ∇F. The need to know L is one drawback of the explicit splitting method. This can to some extent be circumvented by performing a line search, i.e., testing various choices of τ_k until a sufficient decrease in function values is achieved. We will discuss such strategies later on in Section 12.3. Another highly successful variant of explicit splitting applies inertia to the iterates for faster convergence; this we will discuss in Section 12.2 after developing tools for the study of convergence rates.
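A minimal one-dimensional instance of (8.6) (our own example): for F(x) = ½(x − b)², whose gradient is 1-Lipschitz, and G = λ|·|, the iteration with τL < 2 converges to the soft-thresholded solution:

```python
def soft(x, t):
    # prox of t*|.|: soft-thresholding
    return max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0)

b, lam, tau = 2.0, 0.5, 1.5          # grad F(x) = x - b is 1-Lipschitz, so tau*L < 2
x = 0.0
for _ in range(100):
    x = soft(x - tau * (x - b), tau * lam)   # iteration (8.6)
assert abs(x - soft(b, lam)) < 1e-8  # exact minimizer of (1/2)(x-b)^2 + lam*|x|
```

Note that τ = 1.5 > 1 is admissible here; the bound τL < 2 is what matters.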
Remark 8.2. Forward-backward splitting for finding the root of the sum of two monotone operators was already proposed in [Lions & Mercier 1979]. It has become especially popular under the name iterative soft-thresholding (ISTA) in the context of sparse reconstruction (i.e., regularization of linear inverse problems with ℓ¹ penalties); see, e.g., [Chambolle, DeVore, et al. 1998; Daubechies, Defrise & De Mol 2004; Wright, Nowak & Figueiredo 2009].
8.3 implicit splitting
Even with a line search, the restriction on the step sizes τ_k in explicit splitting remains unsatisfactory. Such restrictions are not needed in implicit splitting methods. (Compare the properties of explicit vs. implicit Euler methods for differential equations.) Here, the proximal point formulation is applied to both subdifferential inclusions in (8.4), which yields the optimality system

x̄ = prox_{τF}(x̄ + τp̄),  x̄ = prox_{τG}(x̄ − τp̄).

To eliminate p̄ from these equations, we set z̄ ≔ x̄ + τp̄ and w̄ ≔ x̄ − τp̄ = 2x̄ − z̄. It remains to derive a recursion for z̄, which we obtain from the productive zero z̄ = z̄ + (x̄ − x̄).
Further replacing some copies of x̄ by a new variable ȳ leads to the overall fixed-point system

x̄ = prox_{τF}(z̄),
ȳ = prox_{τG}(2x̄ − z̄),
z̄ = z̄ + ȳ − x̄.

The corresponding fixed-point iteration leads to the Douglas–Rachford splitting (DRS) method

(8.8)
x^{k+1} = prox_{τF}(z^k),
y^{k+1} = prox_{τG}(2x^{k+1} − z^k),
z^{k+1} = z^k + y^{k+1} − x^{k+1}.

Of course, the algorithm and its derivation generalize to arbitrary monotone operators A, B : X ⇉ X:

(8.9)
x^{k+1} = R_{τB}(z^k),
y^{k+1} = R_{τA}(2x^{k+1} − z^k),
z^{k+1} = z^k + y^{k+1} − x^{k+1}.
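To make the scheme concrete, here is a hedged one-dimensional sketch of (8.8) for F = |·| and G(x) = ½(x − b)² (our own example; both proximal maps are explicit):

```python
def soft(v, t):
    # prox of t*|.|: soft-thresholding
    return max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0)

b, tau = 3.0, 1.0
z = 0.0
for _ in range(80):
    x = soft(z, tau)                       # x^{k+1} = prox_{tau F}(z^k), F = |.|
    y = (2 * x - z + tau * b) / (1 + tau)  # y^{k+1} = prox_{tau G}(2 x^{k+1} - z^k)
    z = z + y - x                          # z^{k+1} = z^k + y^{k+1} - x^{k+1}
assert abs(x - 2.0) < 1e-8                 # minimizer of |x| + (1/2)(x - 3)^2
```

The auxiliary variable z converges to a fixed point z̄ (here z̄ = 3) from which the minimizer is recovered via x = prox_{τF}(z̄).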
We can also write the DRS method in more implicit form. Indeed, inverting the resolvents in (8.9) and using the last update to change variables in the first two yields

0 ∈ τB(x^{k+1}) + y^{k+1} − z^{k+1},
0 ∈ τA(y^{k+1}) + z^{k+1} − x^{k+1},
0 = x^{k+1} − y^{k+1} + (z^{k+1} − z^k).

Therefore, with u ≔ (x, y, z) ∈ X³ and the operators¹

(8.10)  H(x, y, z) ≔ ( τB(x) + y − z,  τA(y) + z − x,  x − y )  and  M ≔ ( 0 0 0 ; 0 0 0 ; 0 0 Id ),

we can write the DRS method as the preconditioned proximal point method

(8.11)  0 ∈ H(u^{k+1}) + M(u^{k+1} − u^k).

Indeed, the basic proximal point method in implicit form (8.3) is just (8.11) with the preconditioner M = τ⁻¹Id. It is furthermore straightforward to verify that 0 ∈ H(ū) is equivalent to 0 ∈ A(x̄) + B(x̄).

¹Here and in the following, we identify x ∈ X with the singleton set {x} ⊂ X whenever there is no danger of confusion.
The formulation (8.11) will in the following chapter form the basis for proving the convergence of the method. Recalling the discussion on convergence in Section 8.1, it seems beneficial for H to be maximally monotone, as then (although this is not immediate from Lemma 6.13) it is reasonable to expect the nonexpansivity of the iterates with respect to the semi-norm u ↦ ‖u‖_M ≔ √⟨Mu, u⟩ on X³ induced by the self-adjoint operator M, i.e., that

‖u^{k+1} − ū‖_M ≤ ‖u^k − ū‖_M.

While it is straightforward to verify that H is monotone if A and B are, the question of maximal monotonicity is more involved and will be addressed in Chapter 9. There, we will also show that the expected nonexpansivity holds in a slightly stronger sense and that this will yield the convergence of the method.
Remark 8.3. The Douglas–Rachford splitting was first introduced in [Douglas & Rachford 1956]; the relationship to the proximal point method was discovered in [Eckstein & Bertsekas 1992]. It is also possible to devise acceleration schemes under strong monotonicity [see, e.g., Bredies & Sun 2016].
8.4 primal-dual proximal splitting
We now consider problems of the form

(8.12)  min_{x∈X} F(x) + G(Kx)

for F : X → ℝ̄ and G : Y → ℝ̄ proper, convex, and lower semicontinuous, and K ∈ 𝕃(X; Y). Applying Theorem 5.10 and Lemma 5.8 to such a problem yields the Fenchel extremality conditions

(8.13)  −K*ȳ ∈ ∂F(x̄),  ȳ ∈ ∂G(Kx̄)  ⇔  −K*ȳ ∈ ∂F(x̄),  Kx̄ ∈ ∂G*(ȳ).

With the general notation u ≔ (x, y), this can be written as 0 ∈ H(ū) for

(8.14)  H(u) ≔ ( ∂F(x) + K*y,  ∂G*(y) − Kx ).

It is again not difficult to see that H is monotone. This suggests that we might be able to apply the proximal point method to find a root of H. In practice we however need to work a little bit more, as the resolvent of H can rarely be given an explicit, easily solvable form. If, however, the resolvents of G and F* can individually be computed explicitly, it makes sense to try to decouple the primal and dual variables. This is what we will do.
To do so, we reformulate for arbitrary τ, σ > 0 the extremality conditions (8.13) using Lemma 6.18 as

x̄ = prox_{τF}(x̄ − τK*ȳ),  ȳ = prox_{σG*}(ȳ + σKx̄).

This suggests the fixed-point iterations

(8.15)  x^{k+1} = prox_{τF}(x^k − τK*y^k),  y^{k+1} = prox_{σG*}(y^k + σKx^{k+1}).

In the first equation, we now use prox_{τF} = (Id + τ∂F)⁻¹ to obtain that

(8.16)  x^{k+1} = prox_{τF}(x^k − τK*y^k)  ⇔  x^k − τK*y^k ∈ x^{k+1} + τ∂F(x^{k+1})  ⇔  0 ∈ τ⁻¹(x^{k+1} − x^k) − K*(y^{k+1} − y^k) + [∂F(x^{k+1}) + K*y^{k+1}].

Similarly, the second equation of (8.15) gives

(8.17)  y^{k+1} = prox_{σG*}(y^k + σKx^{k+1})  ⇔  σ⁻¹y^k ∈ σ⁻¹y^{k+1} + ∂G*(y^{k+1}) − Kx^{k+1}  ⇔  0 ∈ σ⁻¹(y^{k+1} − y^k) + [∂G*(y^{k+1}) − Kx^{k+1}].

With the help of (8.16), (8.17), and the operator

M̃ ≔ ( τ⁻¹Id  −K* ; 0  σ⁻¹Id ),

we can then rearrange (8.15) as the preconditioned proximal point method (8.11). Furthermore, provided the step lengths are such that M = M̃ is invertible, this can be written as

(8.18)  0 ∈ H(u^{k+1}) + M(u^{k+1} − u^k)  ⇔  u^{k+1} = R_{M⁻¹H}(u^k).
However, considering our rough discussions on convergence in the previous sections, there is a problem: M is not self-adjoint and therefore does not induce a (semi-)norm on X × Y. We therefore change our algorithm and take

(8.19)  M ≔ ( τ⁻¹Id  −K* ; −K  σ⁻¹Id ).

Correspondingly, replacing (8.17) by

y^{k+1} = prox_{σG*}(y^k + σK(2x^{k+1} − x^k))  ⇔  σ⁻¹y^k − Kx^k ∈ σ⁻¹y^{k+1} + ∂G*(y^{k+1}) − 2Kx^{k+1}  ⇔  0 ∈ σ⁻¹(y^{k+1} − y^k) − K(x^{k+1} − x^k) + [∂G*(y^{k+1}) − Kx^{k+1}],
we then obtain from (8.18) the primal-dual proximal splitting (PDPS) method

(8.20)
x^{k+1} = prox_{τF}(x^k − τK*y^k),
x̄^{k+1} = 2x^{k+1} − x^k,
y^{k+1} = prox_{σG*}(y^k + σKx̄^{k+1}).

The middle over-relaxation step is a consequence of our choice of the bottom-left corner of M defined in (8.19). This itself was forced to have its current form through the self-adjointness requirement on M and the choice of the top-right corner of M. As mentioned above, the role of the latter is to decouple the primal update from the dual update by shifting K*y^{k+1} within H to K*y^k so that the primal iterate x^{k+1} can be computed without knowing y^{k+1}. (Alternatively, we could zero out the off-diagonal of M and still have a self-adjoint operator, but then we would generally not be able to compute x^{k+1} independently of y^{k+1}.)

We will in the following chapters demonstrate that the PDPS method converges if τσ‖K‖² < 1, and that in fact it has particularly good convergence properties. Note that although the iteration (8.20) is implicit in F and G, it is still explicit in K; it is therefore not surprising that step size restrictions based on K remain. Applying, for example, the PDPS method with G̃(x) ≔ G(Kx) (i.e., applying only the sum rule but not the chain rule) would lead to a fully implicit method. This would, however, require computing K⁻¹ in the proximal step involving prox_{σG̃*}. It is precisely the point of the primal-dual proximal splitting to avoid having to invert K, which is often prohibitively expensive if not impossible (e.g., if K does not have closed range, as in many inverse problems).
Remark 8.4. The primal-dual proximal splitting was first introduced in [Pock et al. 2009] for specific image segmentation problems, and later more generally in [Chambolle & Pock 2011]. For this reason, it is frequently referred to as the Chambolle–Pock method. The relation to proximal point methods was first pointed out in [He & Yuan 2012]. In [Esser, Zhang & Chan 2010] it was classified as the Primal-Dual Hybrid Gradient method, Modified, or PDHGM, after the method (8.15), which is called the PDHG. The latter is due to [Zhu & Chan 2008].

Banach space generalizations of the PDPS method, based on a so-called Bregman divergence in place of u ↦ ½‖u‖², were introduced in [Hohage & Homann 2014]. We will discuss Bregman divergences in further detail in Section 11.1.

The PDPS method has also been generalized to different types of nonconvex problems in [Valkonen 2014; Möllenhoff et al. 2015]. Stochastic generalizations are considered in [Valkonen 2019; Chambolle, Ehrhardt, et al. 2018].
8.5 primal-dual explicit splitting
The PDPS method is useful for dealing with the sum of functionals where one summand includes a linear operator. However, if this is the case for both summands, i.e.,

min_{x∈X} F(Ax) + G(Kx)

for F : Z → ℝ̄, G : Y → ℝ̄, A ∈ 𝕃(X; Z), and K ∈ 𝕃(X; Y), we again have the problem of dealing with a complicated proximal mapping. One workaround is the following "lifting trick": we introduce

(8.21)  F̃(x) ≔ 0,  G̃(y, z) ≔ G(y) + F(z),  and  K̃x ≔ (Kx, Ax),

and then apply the PDPS method to the reformulated problem min_x F̃(x) + G̃(K̃x). According to Lemma 6.21 (iii), the dual step of the PDPS method will then split into separate proximal steps with respect to G* and F*, while the proximal map in the primal step will be trivial. However, an additional dual variable will have been introduced through the introduction of z above, which can be costly.
An alternative approach is the following. Analogously to (8.15), but only using Lemma 6.18 on the second relation of (8.13) together with the chain rule (Theorem 4.17), we can reformulate the latter as

(8.22)  x̄ ∈ x̄ − τ[A*∂F(Ax̄) + K*ȳ],  ȳ = prox_{σG*}(ȳ + σKx̄).

(For K = Id, we can alternatively obtain (8.22) from the derivation of explicit splitting by using Moreau's identity, Theorem 6.20, in the second relation of (8.5).)

If F is Gâteaux differentiable (and taking A = Id for the sake of presentation), inserting the first relation into the second, (8.22) can be further rewritten as

x̄ = x̄ − τ[∇F(x̄) + K*ȳ],  ȳ = prox_{σG*}(ȳ + σKx̄ − στK[∇F(x̄) + K*ȳ]).

Reordering the lines and fixing σ = τ = 1, the corresponding fixed-point iteration leads to the primal-dual explicit splitting (PDES) method

(8.23)  y^{k+1} = prox_{G*}((Id − KK*)y^k + K(x^k − ∇F(x^k))),  x^{k+1} = x^k − ∇F(x^k) − K*y^{k+1}.

Again, we can write (8.23) in more implicit form as

0 ∈ ∂G*(y^{k+1}) − K(x^k − ∇F(x^k) − K*y^k) + (y^{k+1} − y^k),  0 = ∇F(x^k) + K*y^{k+1} + (x^{k+1} − x^k).

Inserting the second relation into the first, this is

0 ∈ ∂G*(y^{k+1}) − Kx^{k+1} + (Id − KK*)(y^{k+1} − y^k),  0 = ∇F(x^k) + K*y^{k+1} + (x^{k+1} − x^k).
If we now introduce the preconditioning operator

(8.24)  M ≔ ( Id  0 ; 0  Id − KK* ),

then in terms of the monotone operator H introduced in (8.14) for the PDPS method and u = (x, y), the PDES method (8.23) can be written in implicit form as

(8.25)  0 ∈ H(u^{k+1}) + ( ∇F(x^k) − ∇F(x^{k+1}),  0 ) + M(u^{k+1} − u^k).

The middle term switches the step with respect to F to be explicit. Note that (8.7) could also have been written with a similar middle term; we can therefore think of the PDES method as a preconditioned explicit splitting method.

The preconditioning operator M is self-adjoint as well as positive semi-definite if ‖K‖ ≤ 1. It does not have the off-diagonal decoupling terms that the preconditioner for the PDPS method has. Instead, through the special structure of the problem, the term Id − KK* decouples y^{k+1} from x^{k+1}, allowing y^{k+1} to be computed first.

We will see in Section 9.4 that the iterates of the PDES method converge weakly when ∇F is Lipschitz with factor strictly less than 2.
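A one-dimensional sketch of (8.23) (our own example, with σ = τ = 1 as in the derivation): for F(x) = ½(x − b)² the gradient is 1-Lipschitz, and with K = Id the term Id − KK* vanishes, so y^{k+1} depends only on x^k:

```python
b, K = 3.0, 1.0                        # grad F(x) = x - b is 1-Lipschitz; ||K|| = 1
x, y = 0.0, 0.0
for _ in range(10):
    grad = x - b
    y = max(-1.0, min(1.0, (1 - K * K) * y + K * (x - grad)))  # prox_{G*}, G = |.|
    x = x - grad - K * y
assert x == 2.0                        # minimizer of (1/2)(x - 3)^2 + |x|
```

In this particular instance the iteration reaches the solution after a single step; in general one only gets (weak) convergence under the stated Lipschitz bound.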
Remark 8.5. The primal-dual explicit splitting was introduced in [Loris & Verhoeven 2011] as Generalized Iterative Soft Thresholding (GIST) for F(x) = ½‖b − x‖². The general case has later been called the primal-dual fixed point method (PDFP) in [Chen, Huang & Zhang 2013] and the proximal alternating predictor corrector (PAPC) in [Drori, Sabach & Teboulle 2015].
8.6 the augmented lagrangian and alternating direction minimization
Let F : X → ℝ̄ and G : Z → ℝ̄ be convex, proper, and lower semicontinuous. Also let A ∈ 𝕃(X; Y) and B ∈ 𝕃(Z; Y), and consider for some c ∈ Y the problem

(8.26)  min_{x,z} F(x) + G(z)  s.t.  Ax + Bz = c.

A traditional way to handle constrained problems of this kind is by means of the augmented Lagrangian. We start by introducing the Lagrangian

L(x, z; λ) ≔ F(x) + G(z) + ⟨Ax + Bz − c, λ⟩_Y.

Then (8.26) has the same solutions as the saddle-point problem

(8.27)  min_{x∈X, z∈Z} max_{λ∈Y} L(x, z; λ).
We may then "augment" the Lagrangian by a squared penalty on the violation of the constraint, hence obtaining the equivalent problem

(8.28)  min_{x∈X, z∈Z} max_{λ∈Y} L_τ(x, z; λ) ≔ F(x) + G(z) + ⟨Ax + Bz − c, λ⟩_Y + (τ/2)‖Ax + Bz − c‖²_Y.

The saddle-point functional L_τ is called the augmented Lagrangian.

A classical approach for the solution of (8.28) is to alternatingly solve for one variable, keeping the others fixed. If we take a proximal step for the dual variable or Lagrange multiplier λ, this yields the alternating direction method of multipliers (ADMM)

(8.29)
x^{k+1} ∈ argmin_{x∈X} L_τ(x, z^k; λ^k),
z^{k+1} ∈ argmin_{z∈Z} L_τ(x^{k+1}, z; λ^k),
λ^{k+1} ∈ argmax_{λ∈Y} L_τ(x^{k+1}, z^{k+1}; λ) − (1/(2τ))‖λ − λ^k‖²_Y.

This can be rewritten as

(8.30)
x^{k+1} ∈ (A*A + τ⁻¹∂F)⁻¹(A*(c − Bz^k − τ⁻¹λ^k)),
z^{k+1} ∈ (B*B + τ⁻¹∂G)⁻¹(B*(c − Ax^{k+1} − τ⁻¹λ^k)),
λ^{k+1} ≔ λ^k + τ(Ax^{k+1} + Bz^{k+1} − c).

As can be observed, the ADMM requires inverting relatively complicated set-valued operators in place of simple proximal point operations. This is why the basic ADMM is seldom practically implementable without the application of a further optimization method to solve the x and z updates.
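For intuition, here is a hedged sketch of (8.30) in the simplest solvable case A = B = Id with τ = 1 (our own example), where both operator inversions reduce to explicit proximal maps:

```python
def soft(v, t):
    # prox of t*|.|: soft-thresholding
    return max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0)

# min F(x) + G(z) s.t. x + z = c, with F(x) = (1/2) x^2 and G = |.|
c, tau = 3.0, 1.0
z = lam = 0.0
for _ in range(80):
    x = (c - z - lam / tau) / 2.0        # x-update: prox_F(v) = v/2 for F = (1/2) x^2
    z = soft(c - x - lam / tau, 1.0)     # z-update: prox_G
    lam = lam + tau * (x + z - c)        # multiplier update
assert abs(x - 1.0) < 1e-8 and abs(z - 2.0) < 1e-8
```

For general A and B the inversions (A*A + τ⁻¹∂F)⁻¹ and (B*B + τ⁻¹∂G)⁻¹ have no such closed form, which is precisely the nonimplementability issue discussed next.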
In the literature, there have been various remedies for the nonimplementability of the ADMM. In particular, one can modify the ADMM iterations by adding to (8.29) additional proximal terms. Introducing for some Q_x ∈ 𝕃(X; X) and Q_z ∈ 𝕃(Z; Z) the weighted norms ‖x‖_{Q_x} ≔ √⟨Q_x x, x⟩_X and ‖z‖_{Q_z} ≔ √⟨Q_z z, z⟩_Z, this leads to the iteration

(8.31)
x^{k+1} ∈ argmin_{x∈X} L_τ(x, z^k; λ^k) + ½‖x − x^k‖²_{Q_x},
z^{k+1} ∈ argmin_{z∈Z} L_τ(x^{k+1}, z; λ^k) + ½‖z − z^k‖²_{Q_z},
λ^{k+1} ∈ argmax_{λ∈Y} L_τ(x^{k+1}, z^{k+1}; λ) − (1/(2τ))‖λ − λ^k‖²_Y.
If we specifically take Q_x ≔ σ⁻¹Id − τA*A and Q_z ≔ θ⁻¹Id − τB*B for some σ, θ > 0 with στ‖A‖² < 1 and θτ‖B‖² < 1, then we can expand

L_τ(x, z; λ) + ½‖x − x^k‖²_{Q_x} = F(x) + G(z) + ⟨Ax + Bz − c, λ⟩_Y + τ⟨x, A*(Bz − c)⟩_X + (τ/2)‖Bz − c‖²_Y + (1/(2σ))‖x − x^k‖²_X + τ⟨x, A*Ax^k⟩_X − (τ/2)‖Ax^k‖²_Y,

which has the "partial" subdifferential ∂_x with respect to x (keeping z, λ fixed)

∂_x [L_τ(x, z; λ) + ½‖x − x^k‖²_{Q_x}] = ∂F(x) + A*λ + τA*(Bz − c) + σ⁻¹(x − x^k) + τA*Ax^k.

Similarly computing the partial subdifferential ∂_z with respect to z, (8.31) can thus be written as the preconditioned ADMM

(8.32)
x^{k+1} ∈ prox_{σF}((Id − στA*A)x^k + σA*(τ(c − Bz^k) − λ^k)),
z^{k+1} ∈ prox_{θG}((Id − θτB*B)z^k + θB*(τ(c − Ax^{k+1}) − λ^k)),
λ^{k+1} ≔ λ^k + τ(Ax^{k+1} + Bz^{k+1} − c).
We will see in the next section that this method is just the PDPS method with the primaland dual variables exchanged.
Remark 8.6. The ADMM was introduced in [Gabay 1983; Arrow, Hurwicz & Uzawa 1958] as an alternating approach to the classical augmented Lagrangian method. The preconditioned ADMM is due to [Zhang, Burger & Osher 2011].
8.7 connections
In Section 8.5 we have seen the importance and interplay of problem formulation and algorithm choice for problems with a specific structure. We will now see that many of the algorithms we have presented are actually equivalent when applied to differing formulations of the problem. Hence, if one algorithm is efficient on one formulation of the problem, another algorithm may work equally well on a different formulation.

We start by considering the ADMM problem (8.26), which we can reformulate as

min_{x,z} F(x) + G(z) + δ_{{c}}(Ax + Bz).

Applying the PDPS method (8.20) to this formulation yields the algorithm

(8.33)
x^{k+1} ≔ prox_{τF}(x^k − τA*λ^k),
z^{k+1} ≔ prox_{τG}(z^k − τB*λ^k),
x̄^{k+1} ≔ 2x^{k+1} − x^k,
z̄^{k+1} ≔ 2z^{k+1} − z^k,
λ^{k+1} ≔ λ^k + σ(Ax̄^{k+1} + Bz̄^{k+1} − c).
Note that both the ADMM (8.30) and the preconditioned ADMM (8.32) have a very similar form to this iteration. We will now demonstrate that if A = Id and so Y = X, i.e., if we want to solve the (primal) problem

(8.34)  min_{z∈Z} F(c − Bz) + G(z),

then the ADMM is equivalent to the PDPS method (8.20) applied to the (dual) problem

(8.35)  min_{y∈Y} [G*(B*y) − ⟨c, y⟩_Y] + F*(y),

where the dual step will be performed with respect to F*.

To make the exact way the PDPS method is applied in each instance clearer, and to highlight the primal-dual nature of the PDPS method, it will be more convenient to write the problem to which the PDPS method is applied in saddle-point form. Specifically, minding (5.4) together with the discussion following Theorem 5.10, the problem min_x F(x) + G(Kx) can be written as the saddle-point problem

min_{x∈X} max_{y∈Y} F(x) + ⟨Kx, y⟩_Y − G*(y).
This formulation also shows the dual variable directly in the problem formulation. Applied to (8.35), we then obtain the problem

(8.36)  min_{y∈Y} max_{x∈X} [G*(B*y) − ⟨c, y⟩_Y] + ⟨x, y⟩_X − F(x).

Our claim is that the PDPS method applied to this saddle-point formulation is equivalent to the ADMM in the case A = Id. The iterates of the two algorithms will be different, as the variables solved for will be different aside from the shared x. However, all the variables will be related by affine transformations.

We will also demonstrate that the preconditioned ADMM is equivalent to the PDPS method when B = Id. In fact, we will demonstrate a chain of relationships from the ADMM or preconditioned ADMM (primal problem) via the PDPS method (saddle-point problem) to the DRS method (dual problem); the equivalence between the ADMM and the DRS method even holds generally.

To demonstrate the idea, we start with A = B = Id. Then (8.30) reads

(8.37)
x^{k+1} = prox_{τ⁻¹F}(c − z^k − τ⁻¹λ^k),
z^{k+1} = prox_{τ⁻¹G}(c − x^{k+1} − τ⁻¹λ^k),
λ^{k+1} = λ^k + τ(x^{k+1} + z^{k+1} − c).
Using the third step for the previous iteration to obtain an expression for z^k, we can rewrite the first step as

x^{k+1} = prox_{τ⁻¹F}(x^k − τ⁻¹(2λ^k − λ^{k−1})).
If we use Lemma 6.21 (ii), the second step reads

z^{k+1} = (c − x^{k+1} − τ⁻¹λ^k) − τ⁻¹prox_{τG*}(τ(c − x^{k+1}) − λ^k).

Minding the third step of (8.37), this yields λ^{k+1} = −prox_{τG*}(τ(c − x^{k+1}) − λ^k). Replacing λ^{k+1} by y^{k+1} ≔ −λ^{k+1}, moving c into the proximal part, and reordering the steps such that x^{k+1} becomes x^k, transforms (8.37) into

(8.38)  y^{k+1} ≔ prox_{τ(G*−⟨c, · ⟩)}(y^k − τx^k),  x^{k+1} ≔ prox_{τ⁻¹F}(x^k + τ⁻¹(2y^{k+1} − y^k)).

This is the PDPS method applied to (8.36) with B = Id. However, the step lengths τ and σ = τ⁻¹ do not satisfy στ‖K‖² < 1, which would be needed to deduce convergence of the ADMM from that of the PDPS method. But we will see in Chapter 11 that these step lengths at least lead to convergence of a certain "Lagrangian duality gap", and for the ADMM we can in general only prove such gap estimates.

To show the relation of the ADMM to implicit splitting, we further use Lemma 6.21 (ii) in the second step of (8.38) to obtain

x^{k+1} = τ⁻¹(2y^{k+1} − y^k) + x^k − τ⁻¹prox_{τF*}(2y^{k+1} − y^k + τx^k).

Introducing w^{k+1} ≔ y^{k+1} − τx^{k+1} and changing variables, we thus transform (8.38) into

(8.39)  y^{k+1} ≔ prox_{τ(G*−⟨c, · ⟩)}(w^k),  w^{k+1} ≔ w^k − y^{k+1} + prox_{τF*}(2y^{k+1} − w^k).

But this is the DRS method (8.8) applied to

min_{x∈Y} F*(x) + [G*(x) − ⟨c, x⟩_Y].

Recall now from Lemma 5.4 that [G(c − · )]* = G*(− · ) + ⟨c, · ⟩_Y. Theorem 5.10 thus shows that this is the dual problem of (8.34).
We can make the correspondence more general with the help of the following generalization of Moreau's identity (Theorem 6.20).

Lemma 8.7. Let F = G ∘ K for convex, proper, and lower semicontinuous G : Y → ℝ̄ and K ∈ 𝕃(X; Y). Then for all x ∈ X and γ > 0,

x = prox_{γF}(x) + γK*(KK* + γ⁻¹∂G*)⁻¹(γ⁻¹Kx).

In particular,

prox_{F*}(x) = K*(KK* + ∂G*)⁻¹(Kx).
Proof. By Theorem 5.10, w = prox_{γF}(x) if and only if for some y* ∈ Y* we have

−K*y* = w − x,  y* ∈ γ∂G(Kw).

In other words, by Lemma 5.8,

−K*y* = w − x,  Kw ∈ ∂G*(γ⁻¹y*).

Applying K to the first relation, inserting the second, and multiplying by γ⁻¹ yields

KK*(γ⁻¹y*) + γ⁻¹∂G*(γ⁻¹y*) ∋ γ⁻¹Kx,

i.e., γ⁻¹y* ∈ (KK* + γ⁻¹∂G*)⁻¹(γ⁻¹Kx). Combined with −K*y* = w − x, this yields the first claim. The second claim then follows from Moreau's identity (Theorem 6.20) together with the first claim for γ = 1. □
Theorem 8.8. Let F : X → ℝ̄ and G : Z → ℝ̄ be convex, proper, and lower semicontinuous. Also let A ∈ 𝕃(X; Y), B ∈ 𝕃(Z; Y), and c ∈ Y. Assume the existence of a point (x₀, z₀) ∈ dom F × dom G with Ax₀ + Bz₀ = c. Then, subject to affine transformations to obtain iterates not explicitly generated in each case, the following are equivalent:

(i) The ADMM applied to the (primal) problem

(8.40)  min_{x∈X, z∈Z} F(x) + G(z)  s.t.  Ax + Bz = c.

(ii) The DRS method applied to the (dual) problem

(8.41)  min_{y∈Y} F*(A*y) + [G*(B*y) − ⟨c, y⟩_Y].

(iii) If A = Id, Y = X, and σ = τ⁻¹, the PDPS method applied to the (saddle-point) problem

(8.42)  min_{y∈Y} max_{x∈X} [G*(B*y) − ⟨c, y⟩_Y] + ⟨x, y⟩_X − F(x).
Proof. The assumption on the existence of (x₀, z₀) ensures that the infimum in (8.40) is finite. Multiplying the first and second updates of (8.30) by A and B, and changing the variables x^{k+1} and z^{k+1} to x̃^{k+1} ≔ Ax^{k+1} and z̃^{k+1} ≔ Bz^{k+1}, we obtain

x̃^{k+1} ∈ A(A*A + τ⁻¹∂F)⁻¹(A*(c − z̃^k − τ⁻¹λ^k)),
z̃^{k+1} ∈ B(B*B + τ⁻¹∂G)⁻¹(B*(c − x̃^{k+1} − τ⁻¹λ^k)),
λ^{k+1} ≔ λ^k + τ(x̃^{k+1} + z̃^{k+1} − c).
Using Lemma 8.7 in place of Lemma 6.21 (ii), with y^k ≔ −λ^k and −y^{k+1} = −y^k + τ(x̃^{k+1} + z̃^{k+1} − c), we transform this as above to

(8.43)  y^{k+1} ≔ prox_{τ(G*∘B*)}(τ(c − x̃^k) + y^k),  x̃^{k+1} ∈ A(A*A + τ⁻¹∂F)⁻¹(A*(x̃^k + τ⁻¹(2y^{k+1} − y^k))).

If A = Id, this is the PDPS method with the iterate equivalence x̃^{k+1} = x^{k+1}. We continue with Lemma 8.7 and w^{k+1} ≔ y^{k+1} − τx̃^{k+1} to transform (8.43) further into

y^{k+1} ≔ prox_{τ(G*∘B* − ⟨c, · ⟩)}(w^k),  w^{k+1} ≔ w^k − y^{k+1} + prox_{τ(F*∘A*)}(2y^{k+1} − w^k).

This is the DRS method.
We now want to apply Theorem 5.10 to ๐น (๐ฅ, ๐ง) โ ๐น (๐ฅ) + ๐บ (๐ง), ๏ฟฝ๏ฟฝ (๐ฆ) โ ๐ฟ{๐ฆ=๐} (๐ฆ), and๏ฟฝ๏ฟฝ โ (๐ด๐ต) to establish the claimed duality relationship between (8.40) and (8.41). However,dom๐บ = {๐} has empty interior, so condition (ii) of the theorem does not hold. RecallingRemarks 4.16 and 5.12, we can however replace the interior with the relative interiorri dom ๏ฟฝ๏ฟฝ = {๐}. Thus the condition reduces to the existence of ๐ฆ0 โ dom ๐น with ๐พ๐ฆ0 = ๐ ,which is satised by ๐ฆ0 = (๐ฅ0, ๐ง0).Finally, the relationship to (8.42) when ๐ด = Id is immediate from (8.41) and the denitionof the conjugate function ๐น โ. The existence of a saddle point follows from the proof ofTheorem 5.10. ๏ฟฝ
The methods in the proof of Theorem 8.8 are rarely computationally feasible or efficient unless $A = B = \mathrm{Id}$, due to the difficult proximal mappings for compositions of functionals with operators or the set-valued operator inversions required. On the other hand, the PDPS method (8.33) only requires that we can compute the proximal mappings of $G$ and $F$. This demonstrates the importance of problem formulation.
Similar connections hold for the preconditioned ADMM (8.32). With the help of the third step of (8.32), the first step can be rewritten as

$x^{k+1} \coloneqq \operatorname{prox}_{\tau F}(x^k - \tau A^*(2\lambda^k - \lambda^{k-1})).$

If $\theta\sigma = 1$ and $B = \mathrm{Id}$, the second step reads

$z^{k+1} \coloneqq \operatorname{prox}_{\sigma^{-1}G}((c - Ax^{k+1}) - \sigma^{-1}\lambda^k).$

We transform this with Lemma 6.21 (ii) into

$z^{k+1} = (c - Ax^{k+1}) - \sigma^{-1}\lambda^k - \sigma^{-1}\operatorname{prox}_{\sigma G^*}(\sigma(c - Ax^{k+1}) - \lambda^k).$
Using the third step of (8.32), this is equivalent to

$-\lambda^{k+1} = \operatorname{prox}_{\sigma G^*}(\sigma(c - Ax^{k+1}) - \lambda^k).$

Introducing $y^{k+1} \coloneqq -\lambda^{k+1}$ and changing the order of the first and second steps, we therefore transform (8.32) into the PDPS method

(8.44) $\begin{cases} y^{k+1} \coloneqq \operatorname{prox}_{\sigma G^*}(y^k + \sigma(c - Ax^k)), \\ x^{k+1} \coloneqq \operatorname{prox}_{\tau F}(x^k + \tau A^*(2y^{k+1} - y^k)). \end{cases}$
We therefore have obtained the following result.
Theorem 8.9. Let $F: X \to \overline{\mathbb{R}}$ and $G: Z \to \overline{\mathbb{R}}$ be convex, proper, and lower semicontinuous. Also let $A \in \mathbb{L}(X; Y)$ and $c \in Y$. Assume the existence of a point $(x^0, z^0) \in \operatorname{dom} F \times \operatorname{dom} G$ with $Ax^0 + z^0 = c$. Take $\theta = \sigma^{-1}$. Then, subject to affine transformations to obtain iterates not explicitly generated in each case, the following are equivalent:

(i) The preconditioned ADMM (8.32) applied to the (primal) problem

$\min_{x\in X,\,z\in Z} F(x) + G(z) \quad\text{s.t.}\quad Ax + z = c.$

(ii) The PDPS method applied to the (saddle-point) problem

$\min_{y\in Y}\max_{x\in X}\ [G^*(y) - \langle c, y\rangle_Y] + \langle Ax, y\rangle_Y - F(x).$

(iii) If $A = \mathrm{Id}$, $Y = X$, and $\tau = \sigma^{-1}$, the Douglas–Rachford splitting applied to the (dual) problem

$\min_{y\in Y} F^*(y) + [G^*(y) - \langle c, y\rangle_Y].$
Proof. We have already proved the equivalence of the preconditioned ADMM and the PDPS method. For equivalence to the Douglas–Rachford splitting, we observe that under the additional assumptions of this theorem, (8.44) reduces to (8.38). □
9 WEAK CONVERGENCE
Now that we have in the previous chapter derived several iterative procedures through the manipulation of fixed-point equations, we have to show that they indeed converge to a fixed point (which by construction is then the solution of an optimization problem, making these procedures optimization algorithms). We start with weak convergence, as this is the most that can generally be expected.
The classical approach to proving weak convergence is to introduce suitable contractive (or at least firmly nonexpansive) operators related to the algorithm and then apply classical fixed-point theorems (see Remark 9.5 below). We will instead introduce a very direct approach that will extend in the following chapters to also prove convergence rates. The three main ingredients of all our convergence proofs will be
(i) The three-point identity (1.5), which we recall here as

(9.1) $\langle x - y, x - z\rangle_X = \tfrac12\|x - y\|_X^2 - \tfrac12\|y - z\|_X^2 + \tfrac12\|x - z\|_X^2 \quad\text{for all } x, y, z \in X.$
(ii) The monotonicity of the operator $H$ whose roots we seek to find (which in the simplest case equals $\partial F$ for the functional $F$ we want to minimize).

(iii) The nonnegativity of the preconditioning operators $M$ defining the implicit forms of the algorithms we presented in Chapter 8.

In the later chapters, stronger versions of the last two ingredients will be required to obtain convergence rates and the convergence of function value differences $F(x^{k+1}) - F(\hat x)$ or of more general gap functionals.
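Since the three-point identity (9.1) carries all of the convergence proofs below, it may be worth convincing oneself of it numerically. The following sketch is our own illustration, in plain Python, with a finite-dimensional Euclidean inner product standing in for the Hilbert space pairing; all names are ours.

```python
import random

def inner(a, b):
    # Euclidean inner product standing in for the Hilbert space pairing
    return sum(ai * bi for ai, bi in zip(a, b))

def normsq(a):
    return inner(a, a)

random.seed(0)
x, y, z = ([random.gauss(0, 1) for _ in range(5)] for _ in range(3))

xy = [a - b for a, b in zip(x, y)]
yz = [a - b for a, b in zip(y, z)]
xz = [a - b for a, b in zip(x, z)]

# <x - y, x - z> = 1/2 |x-y|^2 - 1/2 |y-z|^2 + 1/2 |x-z|^2
lhs = inner(xy, xz)
rhs = 0.5 * normsq(xy) - 0.5 * normsq(yz) + 0.5 * normsq(xz)
assert abs(lhs - rhs) < 1e-12
```

The identity is purely algebraic (expand the squared norms), which is why it holds up to rounding error for arbitrary vectors.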
9.1 opial's lemma and fejér monotonicity
The next lemma forms the basis of all our weak convergence proofs. It is a generalized subsequence argument, showing that if all weak limit points of a sequence lie in a set, and if the sequence does not diverge (in the strong sense) away from this set, then the full sequence converges weakly. We recall that $\hat x \in X$ is a weak(-∗) limit point of the sequence $\{x^k\}_{k\in\mathbb{N}}$ if there exists a subsequence such that $x^{k_\ell} \rightharpoonup \hat x$ weakly(-∗) in $X$.
Lemma 9.1 (Opial). On a Hilbert space $X$, let $\hat X \subset X$ be nonempty, and let $\{x^k\}_{k\in\mathbb{N}} \subset X$ satisfy the following conditions:

(i) $\|x^{k+1} - \hat x\|_X \le \|x^k - \hat x\|_X$ for all $\hat x \in \hat X$ and $k \in \mathbb{N}$;

(ii) all weak limit points of $\{x^k\}_{k\in\mathbb{N}}$ belong to $\hat X$.

Then $x^k \rightharpoonup \hat x$ weakly in $X$ for some $\hat x \in \hat X$.
Proof. Condition (i) implies that the sequence $\{x^k\}_{k\in\mathbb{N}}$ is bounded and hence by Theorem 1.9 contains a weakly convergent subsequence. Let now $\hat x$ and $\tilde x$ be weak limit points. Condition (i) then implies that both $\{\|x^k - \hat x\|_X\}_{k\in\mathbb{N}}$ and $\{\|x^k - \tilde x\|_X\}_{k\in\mathbb{N}}$ are decreasing and bounded and therefore convergent. This implies that

$\langle x^k, \tilde x - \hat x\rangle_X = \tfrac12\left(\|x^k - \hat x\|_X^2 - \|x^k - \tilde x\|_X^2 + \|\tilde x\|_X^2 - \|\hat x\|_X^2\right) \to c \in \mathbb{R}.$

Since $\hat x$ is a weak accumulation point, there exists a subsequence $\{x^{k_\ell}\}_{\ell\in\mathbb{N}}$ with $x^{k_\ell} \rightharpoonup \hat x$; similarly, there exists a subsequence $\{x^{k_m}\}_{m\in\mathbb{N}}$ with $x^{k_m} \rightharpoonup \tilde x$. Hence

$\langle \hat x, \tilde x - \hat x\rangle_X = \lim_{\ell\to\infty}\langle x^{k_\ell}, \tilde x - \hat x\rangle_X = c = \lim_{m\to\infty}\langle x^{k_m}, \tilde x - \hat x\rangle_X = \langle \tilde x, \tilde x - \hat x\rangle_X,$

and therefore

$0 = \langle \tilde x - \hat x, \tilde x - \hat x\rangle_X = \|\tilde x - \hat x\|_X^2,$

i.e., $\hat x = \tilde x$. Every weakly convergent subsequence thus has the same weak limit $\hat x$ (which lies in $\hat X$ by condition (ii)). Since every subsequence contains a weakly convergent subsequence, taking a subsequence of $\{x^k\}_{k\in\mathbb{N}}$ assumed not to converge (to $\hat x$), we obtain a contradiction, therefore deducing that $\hat x$ is the weak limit of the full sequence $\{x^k\}_{k\in\mathbb{N}}$. □
A sequence satisfying condition (i) is called Fejér monotone (with respect to $\hat X$); this is a crucial property of the iterates generated by any fixed-point algorithm.
Remark 9.2. Lemma 9.1 first appeared in the proof of [Opial 1967, Theorem 1]. (There $\hat X$ is assumed to be closed and convex, but we do not require this, since condition (ii) is already sufficient to show the claim.)
The concept of Fejér monotone sequences first appears in [Fejér 1922], where it was observed that for every point outside the convex hull of a subset of the Euclidean plane, it is always possible to construct a point that is closer to each point in the subset than the original point (and that this property in fact characterizes the convex hull). The term Fejér monotone itself appears in [Motzkin & Schoenberg 1954], where this construction is used to show the convergence of an iterative scheme for the projection onto a convex polytope.
9.2 the fundamental methods: proximal point and explicit splitting
Using Opial's Lemma 9.1, we can fairly directly show the weak convergence of the proximal point and forward-backward splitting methods.
proximal point method
We recall our most fundamental nonsmooth optimization algorithm, the proximal point method. For later use, we treat the general version of (8.1) for an arbitrary set-valued operator $H: X \rightrightarrows X$, i.e.,

(9.2) $x^{k+1} \coloneqq \mathcal{R}_{\tau_k H}(x^k).$

We will need the next lemma to allow a very general choice of the step lengths $\{\tau_k\}_{k\in\mathbb{N}}$. (If we assume $\tau_k \ge \varepsilon > 0$, in particular if we keep $\tau_k \equiv \tau$ constant, it will not be needed.) For the statement, note that by the definition of the resolvent, (9.2) is equivalent to $\tau_k^{-1}(x^k - x^{k+1}) \in H(x^{k+1})$.
Lemma 9.3. Let $\{\tau_k\}_{k\in\mathbb{N}} \subset (0, \infty)$ with $\sum_{k=0}^\infty \tau_k^2 = \infty$, and let $H: X \rightrightarrows X$ be monotone. Suppose $\{x^k\}_{k\in\mathbb{N}}$ and $w^{k+1} \coloneqq -\tau_k^{-1}(x^{k+1} - x^k)$ satisfy

(i) $w^{k+1} \in H(x^{k+1})$ and

(ii) $\sum_{k=0}^\infty \tau_k^2\|w^k\|_X^2 < \infty.$

Then $\|w^k\|_X \to 0$.
Proof. Since $w^k \in H(x^k)$ and $H$ is monotone, we have from the definition of $w^k$ that

$0 \le \langle w^{k+1} - w^k, x^{k+1} - x^k\rangle_X = \tau_k\langle w^k - w^{k+1}, w^{k+1}\rangle_X \le \tau_k\|w^{k+1}\|_X(\|w^k\|_X - \|w^{k+1}\|_X).$

Thus the nonnegative sequence $\{\|w^k\|_X\}_{k\in\mathbb{N}}$ is decreasing and hence converges to some $c \ge 0$. Since $\sum_{k=0}^\infty \tau_k^2 = \infty$, the second assumption implies that $\liminf_{k\to\infty}\|w^k\|_X = 0$. Since the full sequence $\{\|w^k\|_X\}_{k\in\mathbb{N}}$ converges, $c = 0$, i.e., $\|w^k\|_X \to 0$ as claimed. □
This shows that the "generalized residual" $w^k$ in the inclusion $w^k \in H(x^k)$ converges (strongly) to zero. As usual, this does not (yet) imply that $\{x^k\}_{k\in\mathbb{N}}$ itself converges; but if it does, we expect the limit to be a root of $H$. This is what we prove next, using the three fundamental ingredients introduced at the beginning of the chapter.
Theorem 9.4. Let $H: X \rightrightarrows X$ be monotone and weak-to-strong outer semicontinuous with $H^{-1}(0) \ne \emptyset$. Furthermore, let $\{\tau_k\}_{k\in\mathbb{N}} \subset (0, \infty)$ with $\sum_{k=0}^\infty \tau_k^2 = \infty$. If $\{x^k\}_{k\in\mathbb{N}} \subset X$ is given by the iteration (9.2) for any initial iterate $x^0 \in X$, then $x^k \rightharpoonup \hat x$ for some root $\hat x \in H^{-1}(0)$.
Proof. We recall that the proximal point iteration can be written in implicit form as

(9.3) $0 \in \tau_k H(x^{k+1}) + (x^{k+1} - x^k).$

We "test" (9.3) by the application of $\langle\,\cdot\,, x^{k+1} - \hat x\rangle_X$ for an arbitrary $\hat x \in H^{-1}(0)$. Thus we obtain

(9.4) $0 \in \langle \tau_k H(x^{k+1}) + (x^{k+1} - x^k), x^{k+1} - \hat x\rangle_X,$

where the right-hand side should be understood as the set of all possible inner products involving elements of $H(x^{k+1})$. By the monotonicity of $H$, since $0 \in H(\hat x)$, we have

$\langle H(x^{k+1}), x^{k+1} - \hat x\rangle_X \ge 0,$

which again should be understood to hold for any $w \in H(x^{k+1})$. (We will frequently make use of this notation and the one from (9.4) throughout this and the following chapters to keep the presentation concise.) Thus (9.4) yields

$\langle x^{k+1} - x^k, x^{k+1} - \hat x\rangle_X \le 0.$

Applying now the three-point identity (9.1) for $x = x^{k+1}$, $y = x^k$, and $z = \hat x$ yields

(9.5) $\tfrac12\|x^{k+1} - \hat x\|_X^2 + \tfrac12\|x^{k+1} - x^k\|_X^2 \le \tfrac12\|x^k - \hat x\|_X^2.$

This shows the Fejér monotonicity of $\{x^k\}_{k\in\mathbb{N}}$ with respect to $\hat X \coloneqq H^{-1}(0)$. Furthermore, summing (9.5) over $k = 0, \dots, N-1$ gives

(9.6) $\tfrac12\|x^N - \hat x\|_X^2 + \sum_{k=0}^{N-1}\tfrac12\|x^{k+1} - x^k\|_X^2 \le \tfrac12\|x^0 - \hat x\|_X^2 \eqqcolon C_0.$

Writing $w^{k+1} \coloneqq -\tau_k^{-1}(x^{k+1} - x^k)$, the implicit iteration (9.3) shows that $w^{k+1} \in H(x^{k+1})$. From (9.6) we also deduce that

$\sum_{k=0}^{N-1}\tau_k^2\|w^{k+1}\|_X^2 \le 2C_0.$

If $\tau_k \ge \varepsilon > 0$, letting $N \to \infty$ shows $\|w^{k+1}\|_X \to 0$. Otherwise, we can use Lemma 9.3 to establish the same.

Let finally $\hat x$ be any weak limit point of $\{x^k\}_{k\in\mathbb{N}}$, that is, $x^{k_\ell} \rightharpoonup \hat x$ for a subsequence $\{k_\ell\}_{\ell\in\mathbb{N}} \subset \mathbb{N}$. Recall that $w^{k_\ell} \in H(x^{k_\ell})$. The weak-to-strong outer semicontinuity of $H$ now immediately yields $0 \in H(\hat x)$. We then finish by applying Opial's Lemma 9.1 for the set $\hat X = H^{-1}(0)$. □
Note that the conditions of Theorem 9.4 are in particular satisfied if $H$ is either maximally monotone (Lemma 6.8) or monotone and BCP outer semicontinuous (Lemma 6.10). In particular, applying Theorem 9.4 to $H = \partial J$ yields the convergence of the proximal point method (8.1) for any proper, convex, and lower semicontinuous functional $J: X \to \overline{\mathbb{R}}$.
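For a concrete (and entirely hypothetical, one-dimensional) illustration of this convergence, consider the proximal point method (9.2) for $H = \partial J$ with $J(x) = |x|$: the resolvent is then the classical soft-thresholding operator, and the iterates reach the unique root $\hat x = 0$ after finitely many steps. All names and values in this sketch are ours.

```python
def resolvent_abs(v, tau):
    # R_{tau * d|.|}(v) = prox_{tau |.|}(v): soft thresholding
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0

x, tau = 5.0, 0.9
for _ in range(20):
    x = resolvent_abs(x, tau)   # x^{k+1} = R_{tau H}(x^k)

assert abs(x) < 1e-12           # the unique root of d|.| is 0
```

Each step shrinks $|x^k|$ by exactly $\tau$ until the iterate lands on the root, so here convergence is even finite rather than merely weak.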
Remark 9.5. A conventional way of proving the convergence of the proximal point method is via Browder's fixed-point theorem [Browder 1965], which shows the existence of fixed points of firmly nonexpansive or, more generally, $\alpha$-averaged mappings. (We have already shown in Lemma 6.13 the firm nonexpansivity of the proximal map.) On the other hand, to prove Browder's fixed-point theorem itself, we can use similar arguments as for Theorem 9.4; see Theorem 9.20 below.
explicit splitting
The convergence of the forward-backward splitting method

(9.7) $x^{k+1} = \operatorname{prox}_{\tau_k G}(x^k - \tau_k\nabla F(x^k))$

can be shown analogously. To do so, we need to assume the Lipschitz continuity of the gradient of $F$ (since we are not using a proximal point mapping for $F$, which is always firmly nonexpansive and hence Lipschitz continuous).
Theorem 9.6. Let $F: X \to \mathbb{R}$ and $G: X \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous. Suppose $(\partial(F + G))^{-1}(0) \ne \emptyset$, i.e., that $J \coloneqq F + G$ has a minimizer. Furthermore, let $F$ be Gâteaux differentiable with $L$-Lipschitz gradient. If $0 < \tau_{\min} \le \tau_k < 2L^{-1}$, then for any initial iterate $x^0 \in X$ the sequence generated by (9.7) converges weakly to a root $\hat x \in (\partial(F + G))^{-1}(0)$.
Proof. We again start by writing (9.7) in implicit form as

(9.8) $0 \in \tau_k[\partial G(x^{k+1}) + \nabla F(x^k)] + (x^{k+1} - x^k).$

By the monotonicity of $\partial G$ and the three-point monotonicity (7.9) of $F$ from Corollary 7.2, we first deduce for any $\hat x \in \hat X \coloneqq (\partial(F + G))^{-1}(0)$ that

$\langle \partial G(x^{k+1}) + \nabla F(x^k), x^{k+1} - \hat x\rangle_X \ge -\tfrac{L}{4}\|x^{k+1} - x^k\|_X^2.$

Thus, again testing (9.8) with $\langle\,\cdot\,, x^{k+1} - \hat x\rangle_X$ yields

$\langle x^{k+1} - x^k, x^{k+1} - \hat x\rangle_X \le \tfrac{L\tau_k}{4}\|x^{k+1} - x^k\|_X^2.$

The three-point identity (9.1) now implies that

(9.9) $\tfrac12\|x^{k+1} - \hat x\|_X^2 + \tfrac{1 - \tau_k L/2}{2}\|x^{k+1} - x^k\|_X^2 \le \tfrac12\|x^k - \hat x\|_X^2.$
The assumption $2 > \tau_k L$ then establishes the Fejér monotonicity of $\{x^k\}_{k\in\mathbb{N}}$ with respect to $\hat X$. Let now $\hat x$ be a weak limit point of $\{x^k\}_{k\in\mathbb{N}}$, i.e., $x^{k_\ell} \rightharpoonup \hat x$ for a subsequence $\{k_\ell\}_{\ell\in\mathbb{N}} \subset \mathbb{N}$. Since (9.9) implies $x^{k+1} - x^k \to 0$ strongly, we have $\nabla F(x^{k_\ell+1}) - \nabla F(x^{k_\ell}) \to 0$ by the Lipschitz continuity of $\nabla F$. Consequently, writing $w^{k+1} \coloneqq -\tau_k^{-1}(x^{k+1} - x^k) \in \partial G(x^{k+1}) + \nabla F(x^k)$ (which tends to zero since $\tau_k \ge \tau_{\min}$) and using again the subdifferential sum rule Theorem 4.14, $\partial(G + F)(x^{k_\ell+1}) \ni w^{k_\ell+1} + \nabla F(x^{k_\ell+1}) - \nabla F(x^{k_\ell}) \to 0$. By the weak-to-strong outer semicontinuity of $\partial(G + F)$ from Lemma 6.8 and Theorem 6.11, it follows that $0 \in \partial(G + F)(\hat x)$. We finish by applying Opial's Lemma 9.1 with $\hat X = (\partial(F + G))^{-1}(0)$. □
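As a hypothetical one-dimensional illustration of Theorem 9.6, take $F(x) = \tfrac12(x - b)^2$ (so $\nabla F$ is $1$-Lipschitz) and $G = \lambda|\cdot|$, whose proximal map is soft thresholding; the minimizer of $F + G$ is then known in closed form, which lets us check the iteration (9.7) directly. The names and parameter values below are ours, not from the text.

```python
def prox_l1(v, t):
    # prox of t*|.|: soft thresholding
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

b, lam = 3.0, 1.0       # F(x) = 0.5*(x - b)^2,  G(x) = lam*|x|
L, tau = 1.0, 1.0       # grad F is 1-Lipschitz; step satisfies tau < 2/L
x = 0.0
for _ in range(100):
    x = prox_l1(x - tau * (x - b), tau * lam)   # forward-backward step (9.7)

# the minimizer is the soft thresholding of b, here 2.0
assert abs(x - 2.0) < 1e-12
```

With this particular step size the fixed point is reached after one iteration; for general $\tau_k < 2/L$ one observes the Fejér-monotone decrease of $|x^k - \hat x|$ instead.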
9.3 preconditioned proximal point methods: drs and pdps
We now extend the analysis of the previous section to the preconditioned proximal point method (8.11), which we recall can be written in implicit form as

(9.10) $0 \in H(x^{k+1}) + M(x^{k+1} - x^k)$

for some preconditioning operator $M \in \mathbb{L}(X; X)$, and which includes the Douglas–Rachford splitting (DRS) and the primal-dual proximal splitting (PDPS) methods as special cases. To deal with $M$, we need to improve Theorem 9.6 slightly. First, we introduce the preconditioned norm $\|x\|_M \coloneqq \sqrt{\langle Mx, x\rangle_X}$, which satisfies the preconditioned three-point identity

(9.11) $\langle M(x - y), x - z\rangle_X = \tfrac12\|x - y\|_M^2 - \tfrac12\|y - z\|_M^2 + \tfrac12\|x - z\|_M^2 \quad\text{for all } x, y, z \in X.$

The boundedness assumption in the statement of the next theorem holds in particular for $M = \mathrm{Id}$ and $H$ maximally monotone by Corollary 6.14.
Theorem 9.7. Suppose $H: X \rightrightarrows X$ is monotone and weak-to-strong outer semicontinuous with $H^{-1}(0) \ne \emptyset$, that $M \in \mathbb{L}(X; X)$ is self-adjoint and positive semi-definite, and that either $M$ has a bounded inverse or $(H + M)^{-1}\circ M^{1/2}$ is bounded on bounded sets. Let the initial iterate $x^0 \in X$ be arbitrary, and assume that (9.10) has a unique solution $x^{k+1}$ for all $k \in \mathbb{N}$. Then the iterates $\{x^k\}_{k\in\mathbb{N}}$ of (9.10) are bounded and satisfy $0 \in \limsup_{k\to\infty} H(x^k)$ and $M^{1/2}(x^k - \hat x) \rightharpoonup 0$ for some $\hat x \in H^{-1}(0)$.
Proof. Let $\hat x \in H^{-1}(0)$ be arbitrary. By the monotonicity of $H$, we then have as before

$\langle H(x^{k+1}), x^{k+1} - \hat x\rangle_X \ge 0,$

which together with (9.10) yields

(9.12) $\langle M(x^{k+1} - x^k), x^{k+1} - \hat x\rangle_X \le 0.$
Applying the preconditioned three-point identity (9.11) for $x = x^{k+1}$, $y = x^k$, and $z = \hat x$ in (9.12) shows that

(9.13) $\tfrac12\|x^{k+1} - \hat x\|_M^2 + \tfrac12\|x^{k+1} - x^k\|_M^2 \le \tfrac12\|x^k - \hat x\|_M^2,$

and summing (9.13) over $k = 0, \dots, N-1$ yields

(9.14) $\tfrac12\|x^N - \hat x\|_M^2 + \sum_{k=0}^{N-1}\tfrac12\|x^{k+1} - x^k\|_M^2 \le \tfrac12\|x^0 - \hat x\|_M^2.$

Let now $z^k \coloneqq M^{1/2}x^k$. Our objective is then to show $z^k \rightharpoonup \hat z$ for some $\hat z \in \hat Z \coloneqq M^{1/2}H^{-1}(0)$, which we do by using Opial's Lemma 9.1. From (9.13), we obtain the necessary Fejér monotonicity of $\{z^k\}_{k\in\mathbb{N}}$ with respect to the set $\hat Z$. It remains to verify that $\hat Z$ contains all weak limit points of $\{z^k\}_{k\in\mathbb{N}}$.

Let therefore $\hat z$ be such a limit point, i.e., $z^{k_\ell} \rightharpoonup \hat z$ for a subsequence $\{k_\ell\}_{\ell\in\mathbb{N}}$. We want to show that $\hat z = M^{1/2}\hat x$ for a weak limit point $\hat x$ of $\{x^k\}_{k\in\mathbb{N}}$. We proceed by first showing in two cases the boundedness of $\{x^k\}_{k\in\mathbb{N}}$:

(i) If $M$ has a bounded inverse, then $M \ge \theta\,\mathrm{Id}$ for some $\theta > 0$, and thus the sequence $\{x^k\}_{k\in\mathbb{N}}$ is bounded by (9.14).

(ii) Otherwise, $(H + M)^{-1}\circ M^{1/2}$ is bounded on bounded sets. Now (9.14) only gives the boundedness of $\{z^k\}_{k\in\mathbb{N}}$. However, $x^{k+1} \in (H + M)^{-1}(Mx^k) = (H + M)^{-1}(M^{1/2}z^k)$, and $\{z^k\}_{k\in\mathbb{N}}$ is bounded by (9.14), so we obtain the boundedness of $\{x^k\}_{k\in\mathbb{N}}$.

Thus there exists a further subsequence of $\{x^{k_\ell}\}_{\ell\in\mathbb{N}}$ weakly converging to some $\hat x \in X$. Since $z^k = M^{1/2}x^k$, it follows that $\hat z = M^{1/2}\hat x$. To show that $\hat z \in \hat Z$, it therefore suffices to show that the weak limit points of $\{x^k\}_{k\in\mathbb{N}}$ belong to $H^{-1}(0)$.

Let thus $\hat x$ be any weak limit point of $\{x^k\}_{k\in\mathbb{N}}$, i.e., $x^{k_\ell} \rightharpoonup \hat x$ for some subsequence $\{k_\ell\}_{\ell\in\mathbb{N}} \subset \mathbb{N}$. From (9.14), we obtain first that $M^{1/2}(x^{k+1} - x^k) \to 0$ and hence that $w^{k+1} \coloneqq -M(x^{k+1} - x^k) \to 0$. From (8.18), we also know that $w^{k+1} \in H(x^{k+1})$. It follows that $0 = \lim_{k\to\infty} w^{k+1} \in \limsup_{k\to\infty} H(x^{k+1})$. The weak-to-strong outer semicontinuity now immediately yields $0 \in H(\hat x)$. Hence $\hat Z$ contains all weak limit points of $\{z^k\}_{k\in\mathbb{N}}$.

The claim now follows from Lemma 9.1. □
In the following, we verify that the DRS and PDPS methods satisfy the assumptions of thistheorem.
douglas–rachford splitting
Recall that the DRS method (8.8), i.e.,

(9.15) $\begin{cases} x^{k+1} = \operatorname{prox}_{\tau F}(z^k), \\ y^{k+1} = \operatorname{prox}_{\tau G}(2x^{k+1} - z^k), \\ z^{k+1} = z^k + y^{k+1} - x^{k+1}, \end{cases}$

can be written as the preconditioned proximal point method (8.11) in terms of $u = (x, y, z) \in U \coloneqq X^3$ and the operators

(9.16) $H(x, y, z) \coloneqq \begin{pmatrix} \tau B(x) + y - z \\ \tau A(y) + z - x \\ x - y \end{pmatrix} \quad\text{and}\quad M \coloneqq \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \mathrm{Id} \end{pmatrix}$

for $B = \partial F$ and $A = \partial G$. We are now interested in the properties of $H$ in terms of those of $A$ and $B$. For this, we can make use of the generic structure of $H$, which will reappear several times in the following.
Lemma 9.8. If $A: U \rightrightarrows U$ is maximally monotone and $\Xi \in \mathbb{L}(U; U)$ is skew-adjoint (i.e., $\Xi^* = -\Xi$), then $H \coloneqq A + \Xi$ is maximally monotone. In particular, any skew-adjoint operator $\Xi$ is maximally monotone.

Proof. Let $\bar u, \bar z^* \in U$ be given such that

$\langle \bar z^* - z^*, \bar u - u\rangle_U \ge 0 \quad\text{for all } u \in U,\ z^* \in H(u).$

Recalling (6.2), we need to show that $\bar z^* \in H(\bar u)$. By the definition of $H$, for any $z^* \in H(u)$ there exists an $x^* \in A(u)$ with $z^* = x^* + \Xi u$. On the other hand, setting $\bar x^* \coloneqq \bar z^* - \Xi\bar u$, we have $\bar z^* = \bar x^* + \Xi\bar u$. We are thus done if we can show that $\bar x^* \in A(\bar u)$. But using the skew-adjointness of $\Xi$ and the symmetry of the inner product, we can write

(9.17) $0 \le \langle \bar z^* - z^*, \bar u - u\rangle_U = \langle \bar x^* - x^*, \bar u - u\rangle_U + \langle \Xi(\bar u - u), \bar u - u\rangle_U = \langle \bar x^* - x^*, \bar u - u\rangle_U + \tfrac12\langle \Xi(\bar u - u), \bar u - u\rangle_U - \tfrac12\langle \bar u - u, \Xi(\bar u - u)\rangle_U = \langle \bar x^* - x^*, \bar u - u\rangle_U,$

and $\bar x^* \in A(\bar u)$ follows from the maximal monotonicity of $A$.

To prove the final claim about skew-adjoint operators being maximally monotone, we take $A = \{0\} = \partial S$ for the constant functional $S \equiv 0$, which is maximally monotone by Theorem 6.11. □
Corollary 9.9. Let $A$ and $B$ be maximally monotone. Then the operator $H$ defined in (9.16) is maximally monotone.

Proof. Let

$\tilde A(u) \coloneqq \begin{pmatrix} \tau B(x) \\ \tau A(y) \\ 0 \end{pmatrix} \quad\text{and}\quad \Xi \coloneqq \begin{pmatrix} 0 & \mathrm{Id} & -\mathrm{Id} \\ -\mathrm{Id} & 0 & \mathrm{Id} \\ \mathrm{Id} & -\mathrm{Id} & 0 \end{pmatrix}.$

From the definition of the inner product on the product space $X^3$ together with Lemma 6.5, we have that $\tilde A$ is maximally monotone, while $\Xi$ is clearly skew-adjoint. The claim now follows from Lemma 9.8. □
We can now show convergence of the DRS method.
Corollary 9.10. Let $A, B: X \rightrightarrows X$ be maximally monotone, and suppose $(A + B)^{-1}(0) \ne \emptyset$. Pick a step length $\tau > 0$ and an initial iterate $z^0 \in X$. Then the iterates $\{(x^k, y^k, z^k)\}_{k\in\mathbb{N}}$ of the DRS method (9.15) converge weakly to $(\hat x, \hat y, \hat z) \in H^{-1}(0)$ satisfying $\hat x = \hat y \in (A + B)^{-1}(0)$. Moreover, $x^k - y^k \to 0$.
Proof. Since $A$ and $B$ are maximally monotone, Corollary 6.14 shows that the DRS iteration is always solvable for $u^{k+1}$. Regarding convergence, we start by proving that the sequence $\{u^k = (x^k, y^k, z^k)\}_{k\in\mathbb{N}}$ is bounded, $z^k \rightharpoonup \hat z$, and $0 \in \limsup_{k\to\infty} H(u^k)$. Note that the latter implies as claimed that $x^k - y^k \to 0$ strongly. We do this using Theorem 9.7, whose conditions we have to verify. By Corollary 9.9, $H$ is maximally monotone and hence weak-to-strong outer semicontinuous by Lemma 6.8. Since $M$ is noninvertible, we also have to verify that $(H + M)^{-1}\circ M^{1/2}$ is bounded on bounded sets. But since $u^{k+1} \in (H + M)^{-1}(Mu^k) = (H + M)^{-1}(M^{1/2}u^k)$ is an equivalent formulation of the iteration (9.15), this follows from the Lipschitz continuity of the resolvent (Corollary 6.14). Hence we can apply Theorem 9.7 to deduce $z^k \rightharpoonup \hat z$ by the definition of $M$. Furthermore, there exist $\hat x, \hat y \in X$ such that $0 \in H(\hat x, \hat y, \hat z)$. The third relation of this inclusion gives $\hat x = \hat y$, and adding the remaining inclusions now yields $0 \in \tau A(\hat x) + \tau B(\hat x)$, i.e., $\hat x \in (A + B)^{-1}(0)$.

It remains to show the weak convergence of the other variables. Since $\{u^k\}_{k\in\mathbb{N}}$ is bounded, it contains a subsequence converging weakly to some $\tilde u = (\tilde x, \tilde y, \tilde z)$, which satisfies $0 \in H(\tilde x, \tilde y, \tilde z)$ and hence $\tilde x = \tilde y$. Since $z^k \rightharpoonup \hat z$, we have $\tilde z = \hat z$. The first relation of the inclusion can then be rearranged to $\tilde y = \tilde x = \mathcal{R}_{\tau B}(\tilde z) = \mathcal{R}_{\tau B}(\hat z)$ by the single-valuedness of the resolvent (Corollary 6.14). The limit is thus independent of the subsequence, and hence a subsequence–subsequence argument shows that the full sequence converges. □
In particular, this convergence result applies to the special case of $B = \partial F$ and $A = \partial G$ for proper, convex, lower semicontinuous $F, G: X \to \overline{\mathbb{R}}$. However, the fixed point provided by the DRS method is related to a solution of the problem $\min_{x\in X} F(x) + G(x)$ only if the subdifferential sum rule (Theorem 4.14) holds with equality.
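To make this concrete, here is a hypothetical one-dimensional sketch of the DRS iteration (9.15) with $B = \partial F$ for $F = |\cdot|$ and $A = \partial G$ for $G(x) = \tfrac12(x - b)^2$: both proximal maps are explicit, the sum rule holds, and the iterates should approach $\operatorname{prox}_{|\cdot|}(b)$. All names and values are our own illustration.

```python
def prox_f(v, t):
    # prox of t*|.|: soft thresholding
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

def prox_g(v, t, b=3.0):
    # prox of t*0.5*(x - b)^2
    return (v + t * b) / (1.0 + t)

tau, z = 1.0, 0.0
for _ in range(200):
    x = prox_f(z, tau)           # x^{k+1} = prox_{tau F}(z^k)
    y = prox_g(2 * x - z, tau)   # y^{k+1} = prox_{tau G}(2x^{k+1} - z^k)
    z = z + y - x                # z^{k+1} = z^k + y^{k+1} - x^{k+1}

# minimizer of |x| + 0.5*(x - 3)^2 is the soft thresholding of 3, i.e. 2
assert abs(x - 2.0) < 1e-6 and abs(x - y) < 1e-6
```

One can also observe here that $z^k \to 3$ while $x^k, y^k \to 2$: the $z$-component converges to the fixed point of the DRS operator, not to the minimizer itself.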
primal-dual proximal splitting
To study the PDPS method, we recall from (8.14) and (8.19) the operators

(9.18) $H(u) \coloneqq \begin{pmatrix} \partial F(x) + K^*y \\ \partial G^*(y) - Kx \end{pmatrix} \quad\text{and}\quad M \coloneqq \begin{pmatrix} \tau^{-1}\mathrm{Id} & -K^* \\ -K & \sigma^{-1}\mathrm{Id} \end{pmatrix}$

for $u = (x, y) \in X \times Y \eqqcolon U$. With these, we have already shown in Section 8.4 that the PDPS method

(9.19) $\begin{cases} x^{k+1} = \operatorname{prox}_{\tau F}(x^k - \tau K^*y^k), \\ \bar x^{k+1} = 2x^{k+1} - x^k, \\ y^{k+1} = \operatorname{prox}_{\sigma G^*}(y^k + \sigma K\bar x^{k+1}), \end{cases}$

has the form (9.10) of the preconditioned proximal point method. To show convergence, we first have to establish some basic properties of both $H$ and $M$.
Lemma 9.11. The operator $M: U \to U$ defined in (8.19) is bounded and self-adjoint. If $\sigma\tau\|K\|_{\mathbb{L}(X;Y)}^2 < 1$, then $M$ is positive definite.

Proof. The definition of $M$ directly implies boundedness (since $K \in \mathbb{L}(X; Y)$ is bounded) and self-adjointness. Let now $u = (x, y) \in U$ be given. Then

(9.20) $\langle Mu, u\rangle_U = \langle \tau^{-1}x - K^*y, x\rangle_X + \langle \sigma^{-1}y - Kx, y\rangle_Y = \tau^{-1}\|x\|_X^2 - 2\langle x, K^*y\rangle_X + \sigma^{-1}\|y\|_Y^2 \ge \tau^{-1}\|x\|_X^2 - 2\|K\|_{\mathbb{L}(X;Y)}\|x\|_X\|y\|_Y + \sigma^{-1}\|y\|_Y^2 \ge \tau^{-1}\|x\|_X^2 - \|K\|_{\mathbb{L}(X;Y)}\sqrt{\tau\sigma}\,(\tau^{-1}\|x\|_X^2 + \sigma^{-1}\|y\|_Y^2) + \sigma^{-1}\|y\|_Y^2 = (1 - \|K\|_{\mathbb{L}(X;Y)}\sqrt{\tau\sigma})(\tau^{-1}\|x\|_X^2 + \sigma^{-1}\|y\|_Y^2) \ge C(\|x\|_X^2 + \|y\|_Y^2)$

for $C \coloneqq (1 - \|K\|_{\mathbb{L}(X;Y)}\sqrt{\tau\sigma})\min\{\tau^{-1}, \sigma^{-1}\} > 0$. Hence $\langle Mu, u\rangle_U \ge C\|u\|_U^2$ for all $u \in U$, and therefore $M$ is positive definite. □
Lemma 9.12. The operator $H: U \rightrightarrows U$ defined in (9.18) is maximally monotone.

Proof. Let

$A(u) \coloneqq \begin{pmatrix} \partial F(x) \\ \partial G^*(y) \end{pmatrix} \quad\text{and}\quad \Xi \coloneqq \begin{pmatrix} 0 & K^* \\ -K & 0 \end{pmatrix}.$

Then $\Xi$ is skew-adjoint, and $A$ is maximally monotone by the definition of the inner product on $U = X \times Y$ and Theorem 6.11. The claim now follows from Lemma 9.8. □
With this, we can deduce the convergence of the PDPS method.
Corollary 9.13. Let the convex, proper, and lower semicontinuous functionals $F: X \to \overline{\mathbb{R}}$ and $G: Y \to \overline{\mathbb{R}}$ and the linear operator $K \in \mathbb{L}(X; Y)$ satisfy the assumptions of Theorem 5.10. If, moreover, $\sigma\tau\|K\|_{\mathbb{L}(X;Y)}^2 < 1$, then the sequence $\{u^k \coloneqq (x^k, y^k)\}_{k\in\mathbb{N}}$ generated by the PDPS method (9.19) for any initial iterate $u^0 \in X \times Y$ converges weakly in $U$ to a pair $\hat u \coloneqq (\hat x, \hat y) \in H^{-1}(0)$, i.e., satisfying (8.13).

Proof. By Lemma 9.11, $M$ is self-adjoint and positive definite and thus has a bounded inverse. Minding Lemma 9.12, we can therefore apply Theorem 9.7 to show that $(u^k - \hat u) \rightharpoonup 0$ for some $\hat u \in H^{-1}(0)$ with respect to the inner product $\langle M\,\cdot\,,\,\cdot\,\rangle_U$. Since $M$ has a bounded inverse, this implies that

$\langle u^k, Mw\rangle_U = \langle Mu^k, w\rangle_U \to \langle M\hat u, w\rangle_U = \langle \hat u, Mw\rangle_U \quad\text{for all } w \in U,$

and hence $u^k \rightharpoonup \hat u$ in $U$, since $\operatorname{ran} M = U$ due to the invertibility of $M$. □
9.4 preconditioned explicit splitting methods: pdes and more
Let $A, B: X \rightrightarrows X$ be monotone operators and consider the iterative scheme

(9.21) $0 \in A(x^{k+1}) + B(x^k) + M(x^{k+1} - x^k),$

which is implicit in $A$ but explicit in $B$. We obviously intend to use this method to find some $\hat x \in (A + B)^{-1}(0)$.

As we have seen, the proximal point, PDPS, and DRS methods are all of the form (9.21) with $B = 0$. The basic explicit splitting method is also of this form with $A = \partial G$, $B = \nabla F$, and $M = \tau^{-1}\mathrm{Id}$. It is moreover not difficult to see from (8.25) that the primal-dual explicit splitting (PDES) method is also of the form (9.21) with nonzero $B$. So to prove the convergence of this algorithm, we want to improve Theorem 9.6 to be able to deal with the preconditioning operator $M$ and the general monotone operators $A$ and $B$ in place of subdifferentials and gradients.
To proceed, we need a suitable notion of smoothness for $B$ to be able to deal with the explicit step. In Theorem 9.6 we only used the Lipschitz continuity of $\nabla F$ in two places: first, to establish the three-point monotonicity using Corollary 7.2, and second, at the end of the proof for a continuity argument. To simplify dealing with $B$ that may only act on a subspace, as in the case of the primal-dual explicit splitting in Section 8.5, we now make this three-point monotonicity with respect to an operator $\Lambda$ our main assumption.

Specifically, we say that $B: X \to X$ is three-point monotone at $\hat x \in X$ with respect to $\Lambda \in \mathbb{L}(X; X)$ if

(9.22) $\langle B(z) - B(\hat x), x - \hat x\rangle_X \ge -\tfrac14\|z - x\|_\Lambda^2 \quad\text{for all } x, z \in X.$
If this holds for every $\hat x$, we say that $B$ is three-point monotone with respect to $\Lambda$. From Corollary 7.2, it is clear that if $\nabla F$ is Lipschitz continuous with constant $L$, then $B = \nabla F$ is three-point monotone with respect to $\Lambda = L\,\mathrm{Id}$.
We again start with a lemma exploiting the structural properties of the saddle-point operator to show a "shifted outer semicontinuity".

Lemma 9.14. Let $H = A + B: X \rightrightarrows X$ be weak-to-strong outer semicontinuous with $B$ single-valued and Lipschitz continuous. If $w^{k+1} \in A(x^{k+1}) + B(z^k)$ for $k \in \mathbb{N}$ with $w^k \to \bar w$ and $x^{k+1} - z^k \to 0$ strongly in $X$ and $x^k \rightharpoonup \bar x$ weakly in $X$, then $\bar w \in H(\bar x)$.

Proof. We have $w^{k+1} \in A(x^{k+1}) + B(z^k)$, so that

$\bar w^{k+1} \coloneqq w^{k+1} - B(z^k) + B(x^{k+1}) \in H(x^{k+1}).$

Since $w^{k+1} \to \bar w$ and $x^{k+1} - z^k \to 0$ and $B$ is Lipschitz continuous, we have $\bar w^{k+1} \to \bar w$ as well. The weak-to-strong outer semicontinuity of $H$ then immediately yields $\bar w \in H(\bar x)$. □
Theorem 9.15. Let $H = A + B$ with $H^{-1}(0) \ne \emptyset$ for $A, B: X \rightrightarrows X$ with $A$ monotone and $B$ single-valued, Lipschitz continuous, and three-point monotone with respect to some $\Lambda \in \mathbb{L}(X; X)$. Furthermore, let $M \in \mathbb{L}(X; X)$ be self-adjoint, positive definite, with a bounded inverse, and satisfy $(2 - \varepsilon)M \ge \Lambda$ for some $\varepsilon > 0$. Suppose $H$ is weak-to-strong outer semicontinuous. Let the starting point $x^0 \in X$ be arbitrary and assume that (9.21) has a unique solution $x^{k+1}$ for every $k \in \mathbb{N}$. Then the iterates $\{x^k\}_{k\in\mathbb{N}}$ of (9.21) satisfy $M^{1/2}(x^k - \hat x) \rightharpoonup 0$ for some $\hat x \in H^{-1}(0)$.
Proof. The proof follows along the same lines as that of Theorem 9.7 with minor modifications. First, since $0 \in H(\hat x)$, the monotonicity of $A$ and the three-point monotonicity (9.22) of $B$ yield

$\langle A(x^{k+1}) + B(x^k), x^{k+1} - \hat x\rangle_X \ge -\tfrac14\|x^{k+1} - x^k\|_\Lambda^2,$

which together with (9.21) leads to

$\langle M(x^{k+1} - x^k), x^{k+1} - \hat x\rangle_X \le \tfrac14\|x^{k+1} - x^k\|_\Lambda^2.$

From the preconditioned three-point identity (9.11) we then obtain

(9.23) $\tfrac12\|x^{k+1} - \hat x\|_M^2 + \tfrac12\|x^{k+1} - x^k\|_{M - \Lambda/2}^2 \le \tfrac12\|x^k - \hat x\|_M^2.$

Our assumption that $(2 - \varepsilon)M \ge \Lambda$ implies that $M - \Lambda/2 \ge \varepsilon M/2$. By definition, we can therefore bound the second norm on the left-hand side from below to obtain (9.13) with an additional constant depending on $\varepsilon$. We may thus proceed as in the proof of Theorem 9.7 to establish $w^{k+1} \coloneqq -M(x^{k+1} - x^k) \to 0$. We now have $w^{k+1} \in A(x^{k+1}) + B(x^k)$ and therefore use Lemma 9.14 with $z^k = x^k$ and $\bar w = 0$ to establish $0 \in H(\hat x)$. The rest of the proof again proceeds as for Theorem 9.7 with the application of Opial's Lemma 9.1. □
We again apply this result to show the convergence of specific splitting methods containing an explicit step.
primal-dual explicit splitting
We now return to algorithms for problems of the form

$\min_{x\in X} F(x) + G(Kx)$

for Gâteaux differentiable $F$ and linear $K$. Recall from (8.23) the primal-dual explicit splitting (PDES) method

(9.24) $\begin{cases} y^{k+1} = \operatorname{prox}_{G^*}((\mathrm{Id} - KK^*)y^k + K(x^k - \nabla F(x^k))), \\ x^{k+1} = x^k - \nabla F(x^k) - K^*y^{k+1}, \end{cases}$

which can be written in implicit form as

(9.25) $0 \in H(u^{k+1}) + \begin{pmatrix} \nabla F(x^k) - \nabla F(x^{k+1}) \\ 0 \end{pmatrix} + M(u^{k+1} - u^k)$

with

(9.26) $H(u) \coloneqq \begin{pmatrix} \partial F(x) + K^*y \\ \partial G^*(y) - Kx \end{pmatrix} \quad\text{and}\quad M \coloneqq \begin{pmatrix} \mathrm{Id} & 0 \\ 0 & \mathrm{Id} - KK^* \end{pmatrix}$

for $u = (x, y) \in X \times Y \eqqcolon U$.
Corollary 9.16. Let $F: X \to \mathbb{R}$ and $G: Y \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous, and let $K \in \mathbb{L}(X; Y)$. Suppose $F$ is Gâteaux differentiable with $L$-Lipschitz gradient for $L < 2$, that $\|K\|_{\mathbb{L}(X;Y)} < 1$, and that the assumptions of Theorem 5.10 are satisfied. Then for any initial iterate $u^0 \in X \times Y$ the iterates $\{u^k = (x^k, y^k)\}_{k\in\mathbb{N}}$ of the PDES method (8.23) converge weakly to some $\hat u \in H^{-1}(0)$ with $H$ given by (8.14).
Proof. We recall that Theorem 5.10 guarantees that $H^{-1}(0) \ne \emptyset$. To apply Theorem 9.15, we write $H = A + B$ for

$A(u) \coloneqq \begin{pmatrix} 0 \\ \partial G^*(y) \end{pmatrix} + \Xi u, \quad B(u) \coloneqq \begin{pmatrix} \nabla F(x) \\ 0 \end{pmatrix}, \quad \Xi \coloneqq \begin{pmatrix} 0 & K^* \\ -K & 0 \end{pmatrix}.$

We first note that $M$ as given in (9.26) is self-adjoint and positive definite under our assumption $\|K\|_{\mathbb{L}(X;Y)} < 1$. By Corollary 7.2, the three-point monotonicity (9.22) holds for $\Lambda \coloneqq \begin{pmatrix} L & 0 \\ 0 & 0 \end{pmatrix}$. Since $L < 2$, there furthermore exists an $\varepsilon > 0$ sufficiently small such that $(2 - \varepsilon)M \ge \Lambda$. Finally, Lemma 9.12 shows that $H$ is maximally monotone and hence weak-to-strong outer semicontinuous by Lemma 6.8. The claim now follows from Theorem 9.15. □
Remark 9.17. It is possible to improve the result to $\|K\| \le 1$ if we increase the complexity of Theorem 9.15 slightly to allow for $M \ge 0$. However, in this case it is only possible to show the convergence of the partial iterates $\{x^k\}_{k\in\mathbb{N}}$.
primal-dual proximal splitting with an additional forward step
Using a similar switching term as in the implicit formulation (9.25) of the PDES method, it is possible to incorporate additional forward steps into the PDPS method. For $F = F_0 + E$ with $F_0, E$ convex and $E$ Gâteaux differentiable, we therefore consider

(9.27) $\min_{x\in X} F_0(x) + E(x) + G(Kx).$

With $u = (x, y)$ and following Section 8.4, any minimizer $\hat x \in X$ satisfies $0 \in H(\hat u)$ for

(9.28) $H(u) \coloneqq \begin{pmatrix} \partial F_0(x) + \nabla E(x) + K^*y \\ \partial G^*(y) - Kx \end{pmatrix}.$

Similarly, following the arguments in Section 8.4, we can show that the iteration

(9.29) $\begin{cases} x^{k+1} = \operatorname{prox}_{\tau F_0}(x^k - \tau\nabla E(x^k) - \tau K^*y^k), \\ \bar x^{k+1} = 2x^{k+1} - x^k, \\ y^{k+1} = \operatorname{prox}_{\sigma G^*}(y^k + \sigma K\bar x^{k+1}), \end{cases}$

is equivalent to the implicit formulation

$0 \in \begin{pmatrix} \partial F_0(x^{k+1}) + \nabla E(x^k) + K^*y^{k+1} \\ \partial G^*(y^{k+1}) - Kx^{k+1} \end{pmatrix} + M(u^{k+1} - u^k)$

with the preconditioner $M$ defined as in (9.18). The convergence can thus be shown as for the PDES method.
Corollary 9.18. Let $E: X \to \mathbb{R}$, $F_0: X \to \overline{\mathbb{R}}$, and $G: Y \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous, and let $K \in \mathbb{L}(X; Y)$. Suppose $E$ is Gâteaux differentiable with an $L$-Lipschitz gradient, and that the assumptions of Theorem 5.10 are satisfied with $F \coloneqq F_0 + E$. Assume, moreover, that $\sigma, \tau > 0$ satisfy

(9.30) $1 > \|K\|_{\mathbb{L}(X;Y)}^2\sigma\tau + \frac{\tau L}{2}.$

Then for any initial iterate $u^0 \in X \times Y$ the iterates $\{u^k\}_{k\in\mathbb{N}}$ of (9.29) converge weakly to some $\hat u \in H^{-1}(0)$ for $H$ given by (9.28).
Proof. As before, Theorem 5.10 guarantees that $H^{-1}(0) \ne \emptyset$. We apply Theorem 9.15 to

$A(u) \coloneqq \begin{pmatrix} \partial F_0(x) \\ \partial G^*(y) \end{pmatrix} + \Xi u, \quad B(u) \coloneqq \begin{pmatrix} \nabla E(x) \\ 0 \end{pmatrix}, \quad \Xi \coloneqq \begin{pmatrix} 0 & K^* \\ -K & 0 \end{pmatrix},$
and $M$ given by (9.18). By Corollary 7.2, the three-point monotonicity (9.22) holds with $\Lambda \coloneqq \begin{pmatrix} L & 0 \\ 0 & 0 \end{pmatrix}$. We have already shown in Lemma 9.11 that $M$ is self-adjoint and positive definite. Furthermore, from (9.20) in the proof of Lemma 9.11, we have

$\langle Mu, u\rangle_U \ge (1 - \|K\|_{\mathbb{L}(X;Y)}\sqrt{\sigma\tau})(\tau^{-1}\|x\|_X^2 + \sigma^{-1}\|y\|_Y^2).$

Thus (9.30) implies that $M$ is positive definite. Arguing similarly to (9.20), we also estimate

$\langle Mu, u\rangle_U \ge \tau^{-1}\|x\|_X^2 - 2\|K\|_{\mathbb{L}(X;Y)}\|x\|_X\|y\|_Y + \sigma^{-1}\|y\|_Y^2 \ge (1 - \|K\|_{\mathbb{L}(X;Y)}^2\sigma\tau)\,\tau^{-1}\|x\|_X^2.$

By the strict inequality in (9.30), we thus deduce $(2 - \varepsilon)M \ge \Lambda$ for some $\varepsilon > 0$.

Now by Lemma 9.12, $H$ is again maximally monotone and therefore weak-to-strong outer semicontinuous by Lemma 6.8, and the claim follows from Theorem 9.15. □
Remark 9.19. The forward step was introduced into the basic PDPS method in [Condat 2013; Vũ 2013]; see also [Chambolle & Pock 2015]. These papers also introduced an additional over-relaxation step that we will discuss in Chapter 12.
9.5 fixed-point theorems

Based on our generic approach, we now prove the classical Browder fixed-point theorem, which can itself be used to prove the convergence of optimization methods and other fixed-point iterations (see Remark 9.5). We recall from Lemma 6.15 that firmly nonexpansive maps are (1/2)-averaged, so the result applies by Lemma 6.13 in particular to the resolvents of maximally monotone maps, hence proving the convergence of the proximal point method.
Theorem 9.20 (Browder's fixed-point theorem). On a Hilbert space $X$, suppose $T : X \to X$ is $\alpha$-averaged for some $\alpha \in (0, 1)$ and has at least one fixed point. Let $x^{k+1} := T(x^k)$ for an arbitrary $x^0 \in X$. Then $x^k \rightharpoonup \hat x$ weakly in $X$ for some fixed point $\hat x$ of $T$.
Proof. Finding a fixed point of $T$ is equivalent to finding a root of $H(x) := x - T(x)$. Similarly, we can rewrite the fixed-point iteration as solving for $x^{k+1}$ the inclusion

(9.31) $\quad 0 = x^k - T(x^k) + (x^{k+1} - x^k).$

Proceeding as in the previous sections, we test this by the application of $\langle\,\cdot\,, x^{k+1} - \hat x\rangle_X$ for a fixed point $\hat x$ of $T$. After application of the three-point identity (9.1), we then obtain

(9.32) $\quad \tfrac12\|x^{k+1} - \hat x\|_X^2 + \tfrac12\|x^{k+1} - x^k\|_X^2 + \langle x^k - T(x^k), x^{k+1} - \hat x\rangle_X \leq \tfrac12\|x^k - \hat x\|_X^2.$

Since $x^{k+1} = T(x^k)$, $\hat x$ is a fixed point of $T$, and by assumption $T = (1-\alpha)\mathrm{Id} + \alpha J$ for some nonexpansive operator $J : X \to X$, we have

$$\begin{aligned}
\langle x^k - T(x^k), x^{k+1} - \hat x\rangle_X
&= \langle x^k - \hat x - (T(x^k) - T(\hat x)), T(x^k) - T(\hat x)\rangle_X\\
&= \alpha\langle x^k - \hat x - (J(x^k) - J(\hat x)), (1-\alpha)(x^k - \hat x) + \alpha(J(x^k) - J(\hat x))\rangle_X\\
&= (\alpha - \alpha^2)\|x^k - \hat x\|_X^2 - \alpha^2\|J(x^k) - J(\hat x)\|_X^2\\
&\qquad + (2\alpha^2 - \alpha)\langle x^k - \hat x, J(x^k) - J(\hat x)\rangle_X
\end{aligned}$$

as well as

$$\begin{aligned}
\tfrac12\|x^{k+1} - x^k\|_X^2 &= \tfrac12\|T(x^k) - x^k\|_X^2 = \tfrac{\alpha^2}2\|J(x^k) - x^k\|_X^2 = \tfrac{\alpha^2}2\|J(x^k) - J(\hat x) - (x^k - \hat x)\|_X^2\\
&= \tfrac{\alpha^2}2\|x^k - \hat x\|_X^2 + \tfrac{\alpha^2}2\|J(x^k) - J(\hat x)\|_X^2 - \alpha^2\langle x^k - \hat x, J(x^k) - J(\hat x)\rangle_X.
\end{aligned}$$

Thus, for any $\delta > 0$,

$$\begin{aligned}
\frac{1-\delta}2\|x^{k+1} - x^k\|_X^2 + \langle x^k - T(x^k), x^{k+1} - \hat x\rangle_X
&= \bigl((1+\delta)\alpha^2 - \alpha\bigr)\langle x^k - \hat x, J(x^k) - J(\hat x)\rangle_X\\
&\quad + \frac{2\alpha - (1+\delta)\alpha^2}2\|x^k - \hat x\|_X^2 - \frac{(1+\delta)\alpha^2}2\|J(x^k) - J(\hat x)\|_X^2.
\end{aligned}$$

Taking $\delta = \frac1\alpha - 1$, we have $\delta > 0$ and $\alpha = (1+\delta)\alpha^2$. Thus the factor in front of the inner product term vanishes, and hence we obtain by the nonexpansivity of $J$

$$\frac{1-\delta}2\|x^{k+1} - x^k\|_X^2 + \langle x^k - T(x^k), x^{k+1} - \hat x\rangle_X = \frac\alpha2\|x^k - \hat x\|_X^2 - \frac\alpha2\|J(x^k) - J(\hat x)\|_X^2 \geq 0.$$

From (9.32), it now follows that

$$\tfrac12\|x^{k+1} - \hat x\|_X^2 + \tfrac\delta2\|x^{k+1} - x^k\|_X^2 \leq \tfrac12\|x^k - \hat x\|_X^2.$$

As before, this implies Fejér monotonicity of $\{x^k\}_{k\in\mathbb{N}}$ and that $\|x^{k+1} - x^k\|_X \to 0$. The latter implies $\|T(x^k) - x^k\|_X \to 0$ via (9.31). Let $\tilde x$ be any weak limit point of $\{x^k\}_{k\in\mathbb{N}}$, and denote by $N \subset \mathbb{N}$ the indices of the corresponding subsequence. We show that $\tilde x$ is a fixed point of $T$. Since by Lemma 6.17 the set of fixed points is convex and closed, the claim then follows from Opial's Lemma 9.1.

To show that $\tilde x$ is a fixed point of $T$, first, we expand

$$\tfrac12\|x^k - T(\tilde x)\|_X^2 = \tfrac12\|x^k - \tilde x\|_X^2 + \tfrac12\|\tilde x - T(\tilde x)\|_X^2 + \langle x^k - \tilde x, \tilde x - T(\tilde x)\rangle_X.$$

Since $x^k \rightharpoonup \tilde x$ in the subsequence, this gives

$$\limsup_{N \ni k \to \infty} \tfrac12\|x^k - T(\tilde x)\|_X^2 \geq \limsup_{N \ni k \to \infty} \tfrac12\|x^k - \tilde x\|_X^2 + \tfrac12\|\tilde x - T(\tilde x)\|_X^2.$$

On the other hand, by the nonexpansivity of $T$ and $T(x^k) - x^k \to 0$, we have

$$\limsup_{N \ni k \to \infty} \|x^k - T(\tilde x)\|_X \leq \limsup_{N \ni k \to \infty} \bigl(\|T(x^k) - T(\tilde x)\|_X + \|x^k - T(x^k)\|_X\bigr) \leq \limsup_{N \ni k \to \infty} \|x^k - \tilde x\|_X.$$

Together these two inequalities show, as desired, that $\|T(\tilde x) - \tilde x\|_X = 0$. ∎
Remark 9.21. Theorem 9.20 in its modern form (stated for firmly nonexpansive or more generally $\alpha$-averaged maps) can first be found in [Browder 1967]. However, similar results for what are now called Krasnosel′skiĭ–Mann iterations, which are closely related to $\alpha$-averaged maps, were stated in more limited settings in [Mann 1953; Schaefer 1957; Petryshyn 1966; Krasnosel′skiĭ 1955; Opial 1967].
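A minimal numerical sketch of Theorem 9.20, using an illustrative toy operator: a planar rotation $J$ is nonexpansive with fixed point $0$, and its own iterates never converge, but the iterates of the averaged map $T = (1-\alpha)\mathrm{Id} + \alpha J$ do.

```python
import numpy as np

theta = np.pi / 2
J = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation: nonexpansive, unique fixed point 0

def iterate(T, x0, iters):
    """Run the fixed-point iteration x^{k+1} = T x^k."""
    x = x0.copy()
    for _ in range(iters):
        x = T @ x
    return x

alpha = 0.5
T = (1 - alpha) * np.eye(2) + alpha * J   # alpha-averaged map with the same fixed point

x0 = np.array([1.0, 0.0])
x_avg = iterate(T, x0, 200)   # converges to the fixed point 0 (Theorem 9.20)
x_raw = iterate(J, x0, 200)   # plain rotation: stays on the unit circle forever
```

The contrast shows why averaging matters: the nonexpansive $J$ alone merely rotates, while the $\alpha$-averaged $T$ converges.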
10 RATES OF CONVERGENCE BY TESTING
As we have seen, minimizers of convex problems in a Hilbert space $X$ can generally be characterized by the inclusion

$$0 \in H(\hat x)$$

for the unknown $\hat x \in X$ and a suitable monotone operator $H : X \rightrightarrows X$. This inclusion in turn can be solved using a (preconditioned) proximal point iteration that converges weakly under suitable assumptions. In the present chapter, we want to improve this analysis to obtain convergence rates, i.e., estimates of the distance $\|x^N - \hat x\|_X$ of iterates to $\hat x$ in terms of the iteration number $N$. Our general approach will be to consider this distance multiplied by an iteration-dependent testing parameter $\varphi_N$ (or, for structured algorithms, to consider the norm relative to a testing operator) and to show by roughly the same arguments as in Chapter 9 that this product stays bounded: $\varphi_N\|x^N - \hat x\|_X \leq C$. If we can then show that this testing parameter grows at a certain rate, the distance must decay at the reciprocal rate. Consequently, we can now avoid the complications of dealing with weak convergence; in fact, this chapter will consist of simple algebraic manipulations. However, for this to work we need to assume additional properties of $H$, namely strong monotonicity. Recall from Lemma 7.4 that $H$ is called strongly monotone with factor $\gamma > 0$ if

(10.1) $\quad \langle H(x) - H(\hat x), x - \hat x\rangle_X \geq \gamma\|x - \hat x\|_X^2 \qquad (x, \hat x \in X),$

where, in a slight abuse of notation, the left-hand side is understood to stand for any choice of elements from $H(x)$ and $H(\hat x)$.

Before we turn to the actual estimates, we first define various notions of convergence rates. Consider a function $r : \mathbb{N} \to [0, \infty)$ (e.g., $r(N) = \|x^N - \hat x\|_X$ or $r(N) = G(x^N) - G(\hat x)$ for $\hat x$ a minimizer of $G$).
(i) We say that $r(N)$ converges (to zero as $N \to \infty$) at the rate $O(\psi(N))$ if $r(N) \leq C\psi(N)$ for some constant $C > 0$ for all $N \in \mathbb{N}$ and a decreasing function $\psi : \mathbb{N} \to [0, \infty)$ with $\lim_{N\to\infty}\psi(N) = 0$ (e.g., $\psi(N) = 1/N$ or $\psi(N) = 1/N^2$).

(ii) Analogously, we say that a function $\mu : \mathbb{N} \to \mathbb{R}$ grows at the rate $\Omega(F(N))$ if $\mu(N) \geq cF(N)$ for all $N \in \mathbb{N}$ for some constant $c > 0$ and an increasing function $F : \mathbb{N} \to [0, \infty)$ with $\lim_{N\to\infty}F(N) = \infty$.
Clearly $r = 1/\mu$ converges to zero at the rate $O(1/F)$ if and only if $\mu$ grows at the rate $\Omega(F)$. The most common cases are $F(N) = N$ or $F(N) = N^2$.

We can alternatively characterize orders of convergence via

$$\mu := \lim_{N\to\infty}\frac{r(N+1)}{r(N)}.$$

(i) If $\mu = 1$, we say that $r(N)$ converges (to zero as $N \to \infty$) sublinearly.

(ii) If $\mu \in (0, 1)$, then this convergence is linear. This is equivalent to convergence at the rate $O(\hat\vartheta^N)$ for any $\hat\vartheta \in (\mu, 1)$.

(iii) If $\mu = 0$, then the convergence is superlinear.

Different rates of superlinear convergence can also be studied. We say that $r(N)$ converges (to zero as $N \to \infty$) superlinearly with order $q > 1$ if

$$\lim_{N\to\infty}\frac{r(N+1)}{r(N)^q} < \infty.$$

The most common case is $q = 2$, which is also known as quadratic convergence. (This is not to be confused with the much slower convergence at the rate $O(1/N^2)$; similarly, linear convergence is different from, and much faster than, convergence at the rate $O(1/N)$.)
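These notions can be checked numerically on model sequences; the sample sequences and cut-offs below are illustrative choices.

```python
import numpy as np

def ratio_limit(r):
    """Estimate mu = lim r(N+1)/r(N) from the tail of a positive sequence."""
    return r[-1] / r[-2]

k = np.arange(1.0, 60.0)
sublinear = 1.0 / k**2            # rate O(1/N^2), but ratio -> 1: sublinear
linear = 0.5 ** k                 # mu = 0.5 in (0, 1): linear convergence

superlin = [0.5]                  # r(N+1) = r(N)^2: superlinear with order q = 2
for _ in range(6):
    superlin.append(superlin[-1] ** 2)
superlin = np.array(superlin)

mu_sub = ratio_limit(sublinear)
mu_lin = ratio_limit(linear)
mu_super = ratio_limit(superlin)
q_ratio = superlin[-1] / superlin[-2] ** 2   # stays bounded for quadratic convergence
```

Note in particular that the $O(1/N^2)$ sequence is sublinear in the ratio sense, illustrating the warning in the parenthetical remark above.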
10.1 the fundamental methods

Before going into this abstract operator-based theory, we demonstrate the general concept of testing by studying the fundamental methods: the proximal point and explicit splitting methods. These are purely primal methods with a single step length parameter, which simplifies the testing approach since we only need a single testing parameter. (It should be pointed out that the proofs in this section can be carried out, and in fact shortened, without introducing testing parameters at all. Nevertheless, we follow this approach since it provides a blueprint for the proofs for the structured primal-dual methods where these are required.)

proximal point method

We start with the basic proximal point method for solving $0 \in H(\hat x)$ for a monotone operator $H : X \rightrightarrows X$, which we recall can be written in implicit form as

(10.2) $\quad 0 \in \tau_k H(x^{k+1}) + (x^{k+1} - x^k).$
Theorem 10.1 (proximal point method iterate rates). Suppose $H : X \rightrightarrows X$ is strongly monotone with $H^{-1}(0) \neq \emptyset$. Let $x^{k+1} := R_{\tau_k H}(x^k)$ for some $\{\tau_k\}_{k\in\mathbb{N}} \subset (0, \infty)$ and arbitrary $x^0 \in X$. Then the following hold for the iterates $\{x^k\}_{k\in\mathbb{N}}$ and the unique point $\hat x \in H^{-1}(0)$:

(i) If $\tau_k \equiv \tau$ is constant, then $\|x^N - \hat x\|_X \to 0$ linearly.

(ii) If $\tau_k \to \infty$, then $\|x^N - \hat x\|_X \to 0$ superlinearly.
Proof. Let $\hat x \in H^{-1}(0)$; by assumption, such a point exists and is unique due to the assumed strong monotonicity of $H$ (since inserting any two roots $\hat x, \tilde x \in X$ of $H$ into (10.1) yields $\|\hat x - \tilde x\|_X \leq 0$). For each iteration $k \in \mathbb{N}$, pick a testing parameter $\varphi_k > 0$ and apply the test $\varphi_k\langle\,\cdot\,, x^{k+1} - \hat x\rangle_X$ to (10.2) to obtain (using the same notation as in Theorem 9.4)

(10.3) $\quad 0 \in \varphi_k\tau_k\langle H(x^{k+1}), x^{k+1} - \hat x\rangle_X + \varphi_k\langle x^{k+1} - x^k, x^{k+1} - \hat x\rangle_X.$

By the strong monotonicity of $H$ and the fact that $0 \in H(\hat x)$, for some $\gamma > 0$,

$$\langle H(x^{k+1}), x^{k+1} - \hat x\rangle_X \geq \gamma\|x^{k+1} - \hat x\|_X^2.$$

Multiplying this inequality with $\varphi_k\tau_k$ and using (10.3), we obtain

$$\varphi_k\tau_k\gamma\|x^{k+1} - \hat x\|_X^2 + \varphi_k\langle x^{k+1} - x^k, x^{k+1} - \hat x\rangle_X \leq 0.$$

An application of the three-point identity (9.1) then yields

(10.4) $\quad \frac{\varphi_k(1 + 2\tau_k\gamma)}2\|x^{k+1} - \hat x\|_X^2 + \frac{\varphi_k}2\|x^{k+1} - x^k\|_X^2 \leq \frac{\varphi_k}2\|x^k - \hat x\|_X^2.$

Let us now force on the testing parameters the recursion

(10.5) $\quad \varphi_0 := 1, \qquad \varphi_{k+1} := \varphi_k(1 + 2\tau_k\gamma).$

Then (10.4) yields

(10.6) $\quad \frac{\varphi_{k+1}}2\|x^{k+1} - \hat x\|_X^2 + \frac{\varphi_k}2\|x^{k+1} - x^k\|_X^2 \leq \frac{\varphi_k}2\|x^k - \hat x\|_X^2.$

We now distinguish the two cases for the step sizes $\tau_k$.

(i) Summing (10.6) for $k = 0, \dots, N-1$ gives

$$\frac{\varphi_N}2\|x^N - \hat x\|_X^2 + \sum_{k=0}^{N-1}\frac{\varphi_k}2\|x^{k+1} - x^k\|_X^2 \leq \frac{\varphi_0}2\|x^0 - \hat x\|_X^2.$$

In particular, $\varphi_0 = 1$ implies that

$$\|x^N - \hat x\|_X^2 \leq \|x^0 - \hat x\|_X^2/\varphi_N.$$

Since $\tau_k \equiv \tau$, (10.5) implies that $\varphi_N = (1 + 2\tau\gamma)^N$. Setting $\hat\vartheta := (1 + 2\tau\gamma)^{-1/2} < 1$ now gives convergence at the rate $O(\hat\vartheta^N)$ and therefore the claimed linear rate.

(ii) From (10.6) combined with (10.5), it follows directly that

$$\frac{\|x^{k+1} - \hat x\|_X^2}{\|x^k - \hat x\|_X^2} \leq \frac{\varphi_k}{\varphi_{k+1}} = (1 + 2\tau_k\gamma)^{-1} \to 0$$

since $\tau_k \to \infty$, which implies the claimed superlinear convergence of $\|x^N - \hat x\|_X$. (A similar argument can be used to directly show linear convergence for constant step sizes.) ∎
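Both cases of Theorem 10.1 can be observed on the simplest strongly monotone operator $H(x) = \gamma x$, whose resolvent is $x \mapsto x/(1 + \tau\gamma)$; this one-dimensional example is an illustration only, not the general setting.

```python
import numpy as np

gamma = 1.0   # strong monotonicity factor of H(x) = gamma * x; root is x_hat = 0

def prox_point(taus, x0=1.0):
    """Proximal point iterates x^{k+1} = R_{tau_k H}(x^k) = x^k / (1 + tau_k * gamma)."""
    xs = [x0]
    for tau in taus:
        xs.append(xs[-1] / (1 + tau * gamma))
    return np.array(xs)

# (i) constant step lengths: linear convergence with ratio (1 + tau*gamma)^{-1}
xs_const = prox_point([0.5] * 30)
ratios_const = xs_const[1:] / xs_const[:-1]

# (ii) step lengths tau_k -> infinity: superlinear convergence, ratios -> 0
xs_grow = prox_point([2.0 ** k for k in range(30)])
ratios_grow = xs_grow[1:] / xs_grow[:-1]
```

The error ratios reproduce the two regimes of the theorem: constant ratios strictly below one in case (i), and ratios tending to zero in case (ii).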
explicit splitting

We now return to problems of the form

(10.7) $\quad \min_{x\in X} F(x) + G(x)$

for Gâteaux differentiable $F$, and study the convergence rates of the forward-backward splitting method

(10.8) $\quad x^{k+1} := \operatorname{prox}_{\tau G}(x^k - \tau\nabla F(x^k)),$

which we recall can be written in implicit form as

(10.9) $\quad 0 \in \tau[\partial G(x^{k+1}) + \nabla F(x^k)] + (x^{k+1} - x^k).$

Theorem 10.2 (forward-backward splitting iterate rates). Let $F : X \to \mathbb{R}$ and $G : X \to \overline{\mathbb{R}}$ be convex, proper, and lower semicontinuous. Suppose further that $F$ is Gâteaux differentiable, $\nabla F$ is Lipschitz continuous with constant $L > 0$, and $G$ is $\gamma$-strongly convex for some $\gamma > 0$. If $[\partial(F + G)]^{-1}(0) \neq \emptyset$ and the step length parameter $\tau > 0$ satisfies $\tau L \leq 2$, then for any initial iterate $x^0 \in X$ the iterates $\{x^k\}_{k\in\mathbb{N}}$ generated by the explicit splitting method (10.8) converge linearly to the unique minimizer of (10.7).
Proof. Let $\hat x \in [\partial(F+G)]^{-1}(0)$; by assumption, such a point exists and is unique due to the strong and therefore strict convexity of $G$. As in the proof of Theorem 10.1, for each iteration $k \in \mathbb{N}$, pick a testing parameter $\varphi_k > 0$ and apply the test $\varphi_k\langle\,\cdot\,, x^{k+1} - \hat x\rangle_X$ to (10.9) to obtain

(10.10) $\quad 0 \in \tau\varphi_k\langle\partial G(x^{k+1}) + \nabla F(x^k), x^{k+1} - \hat x\rangle_X + \varphi_k\langle x^{k+1} - x^k, x^{k+1} - \hat x\rangle_X.$

Since $G$ is strongly convex, it follows from (10.9) and Lemma 7.4 (iii) that

$$\langle\partial G(x^{k+1}) - \partial G(\hat x), x^{k+1} - \hat x\rangle_X \geq \gamma\|x^{k+1} - \hat x\|_X^2.$$

Similarly, since $\nabla F$ is Lipschitz continuous, it follows from Corollary 7.2 that

$$\langle\nabla F(x^k) - \nabla F(\hat x), x^{k+1} - \hat x\rangle_X \geq -\frac L4\|x^{k+1} - x^k\|_X^2.$$

Combining the last two inequalities with $0 \in \partial G(\hat x) + \nabla F(\hat x)$, we obtain

(10.11) $\quad \langle\partial G(x^{k+1}) + \nabla F(x^k), x^{k+1} - \hat x\rangle_X \geq \gamma\|x^{k+1} - \hat x\|_X^2 - \frac L4\|x^{k+1} - x^k\|_X^2.$

Inserting this into (10.10) and using the three-point identity, as in the proof of Theorem 10.1, we now obtain

(10.12) $\quad \frac{\varphi_k(1 + 2\tau\gamma)}2\|x^{k+1} - \hat x\|_X^2 + \frac{\varphi_k(1 - \tau L/2)}2\|x^{k+1} - x^k\|_X^2 \leq \frac{\varphi_k}2\|x^k - \hat x\|_X^2.$

Since $1 - \tau L/2 \geq 0$, taking $\varphi_{k+1} := \varphi_k(1 + 2\tau\gamma)$ as in (10.5) and summing over $k = 0, \dots, N-1$, we arrive at

$$\frac{\varphi_N}2\|x^N - \hat x\|_X^2 \leq \frac{\varphi_0}2\|x^0 - \hat x\|_X^2.$$

As in Theorem 10.1, the definition of $\varphi_N$ shows that $\|x^N - \hat x\|_X^2 \to 0$ linearly. ∎
Observe that it is not possible to obtain superlinear convergence in this case, since the assumption $\tau \leq 2L^{-1}$ forces the step lengths to remain bounded.
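The linear convergence of Theorem 10.2 can be sketched on a small least-squares problem with an elastic-net-type regularizer, whose proximal map is soft-thresholding followed by a scaling. The problem data and the per-step ratio bound used in the check are illustrative assumptions.

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 6))
b = rng.standard_normal(10)
lam, gamma = 0.1, 0.5

L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of grad F for F = 0.5*||Ax - b||^2
tau = 1.0 / L                       # satisfies tau * L <= 2

def step(x):
    """One forward-backward step (10.8); the prox of G = (gamma/2)||.||^2 + lam*||.||_1
    is soft-thresholding followed by division by (1 + tau*gamma)."""
    v = x - tau * (A.T @ (A @ x - b))
    return soft(v, tau * lam) / (1 + tau * gamma)

x = np.zeros(6)
for _ in range(3000):
    x = step(x)
x_star = x                          # well-converged reference point

errs = []
x = np.ones(6)
for _ in range(300):
    errs.append(np.linalg.norm(x - x_star))
    x = step(x)
errs = np.array(errs)
ratios = errs[1:] / errs[:-1]       # bounded away from 1: linear convergence
```

Since the prox is nonexpansive and the trailing division contracts by $1/(1+\tau\gamma)$, each step shrinks the error by at least that factor, which the ratio check reflects.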
10.2 structured algorithms and acceleration
We now extend the analysis above to the structured case where $H = A + B$, since we have already seen that most common first-order algorithms can be written as calculating in each step the next iterate $x^{k+1}$ from a specific instance of the general preconditioned implicit-explicit splitting method

(10.13) $\quad 0 \in A(x^{k+1}) + B(x^k) + M(x^{k+1} - x^k).$

In the proofs of convergence of the proximal point and explicit splitting methods (e.g., in Theorems 10.1 and 10.2 as well as in Chapter 9), we had the step length $\tau_k$ in front of $H$ or $\nabla F + \partial G$. On the other hand, in Section 9.3 on structured algorithms, we incorporated the step length parameters into the preconditioning operator $M$. To transfer the testing approach from these fundamental methods to the structured methods, we will now split them out from $M$ and move them in front of $H$ as well by introducing a step length operator $W_{k+1}$. We will also allow the preconditioner $M_{k+1}$ to vary by iteration; as we will see below, this is required for accelerated versions of the PDPS method. Correspondingly, we consider the scheme

(10.14) $\quad 0 \in W_{k+1}[A(x^{k+1}) + B(x^k)] + M_{k+1}(x^{k+1} - x^k).$

Since we now have a step length operator instead of a single scalar step length, we will also have to consider, instead of a scalar testing parameter, an iteration-dependent testing operator $Z_{k+1} \in \mathbb{L}(X; X)$. The rough idea is that $Z_{k+1}M$, or, as needed for accelerated algorithms, $Z_{k+1}M_{k+1}$, will form a "local norm" that measures the rate of convergence in a nonuniform way; and rather than testing the (scalar) three-point identity (10.4), we will build the testing already into the initial strong monotonicity inequality. We therefore require an operator-level version of strong monotonicity, which we introduce next.

Let $A : X \rightrightarrows X$ and let $Z, \Gamma \in \mathbb{L}(X; X)$ be such that $Z\Gamma$ is self-adjoint and positive semi-definite. Then we say that $A$ is $\Gamma$-strongly monotone at $\hat x \in X$ with respect to $Z$ if

(10.15) $\quad \langle A(x) - A(\hat x), x - \hat x\rangle_Z \geq \|x - \hat x\|_{Z\Gamma}^2 \qquad (x \in X).$

If this holds for all $\hat x \in X$, we say that $A$ is $\Gamma$-strongly monotone with respect to $Z$.
It is clear that strongly monotone operators with parameter $\gamma > 0$ are $\gamma\,\mathrm{Id}$-strongly monotone with respect to $Z = \mathrm{Id}$. More generally, operators with a separable block structure, $A(x) = (A_1(x_1), \dots, A_m(x_m))$ for $x = (x_1, \dots, x_m)$, satisfy the property, as illustrated in more detail in the next example for the two-block case.

Example 10.3. Let $A(x) = (A_1(x_1), A_2(x_2))$ for $x = (x_1, x_2) \in X_1 \times X_2$ and monotone operators $A_1 : X_1 \rightrightarrows X_1$ and $A_2 : X_2 \rightrightarrows X_2$. Suppose $A_1$ and $A_2$ are, respectively, $\gamma_1$- and $\gamma_2$-(strongly) monotone for $\gamma_1, \gamma_2 \geq 0$. Then (10.15) holds for any $t_1, t_2 > 0$ for

$$\Gamma := \begin{pmatrix}\gamma_1\mathrm{Id} & 0\\ 0 & \gamma_2\mathrm{Id}\end{pmatrix} \qquad\text{and}\qquad Z := \begin{pmatrix}t_1\mathrm{Id} & 0\\ 0 & t_2\mathrm{Id}\end{pmatrix}.$$

Let further $B : X \to X$ and let $Z, \Lambda \in \mathbb{L}(X; X)$ be such that $Z\Lambda$ is self-adjoint and positive semi-definite. Then we say that $B$ is three-point monotone at $\hat x \in X$ with respect to $Z$ and $\Lambda$ if

(10.16) $\quad \langle B(z) - B(\hat x), x - \hat x\rangle_Z \geq -\frac14\|x - z\|_{Z\Lambda}^2 \qquad (x, z \in X).$

If this holds for all $\hat x \in X$, we say that $B$ is three-point monotone with respect to $Z$ and $\Lambda$.

Example 10.4. Let $B(x) = (\nabla E_1(x_1), \nabla E_2(x_2))$ for $x = (x_1, x_2) \in X_1 \times X_2$ and the respectively $L_1$- and $L_2$-smooth convex functions $E_1 : X_1 \to \mathbb{R}$ and $E_2 : X_2 \to \mathbb{R}$. Then an appeal to Corollary 7.2 shows (10.16) to hold for any $t_1, t_2 > 0$ for

$$\Lambda := \begin{pmatrix}L_1\mathrm{Id} & 0\\ 0 & L_2\mathrm{Id}\end{pmatrix} \qquad\text{and}\qquad Z := \begin{pmatrix}t_1\mathrm{Id} & 0\\ 0 & t_2\mathrm{Id}\end{pmatrix}.$$

More generally, we can take $B(x) = (B_1(x_1), B_2(x_2))$ for $B_1 : X_1 \to X_1$ and $B_2 : X_2 \to X_2$ three-point monotone as defined in (7.9).
Clearly Example 10.4, like Example 10.3, generalizes to a larger number of blocks, and both generalize to operators acting separably on more general direct sums of orthogonal subspaces.

We are now ready to forge our hammer for producing convergence rates for structured algorithms.
Theorem 10.5. Let $A, B : X \rightrightarrows X$ and $H := A + B$. For each $k \in \mathbb{N}$, let further $W_{k+1}, Z_{k+1}, M_{k+1} \in \mathbb{L}(X; X)$ be such that $Z_{k+1}M_{k+1}$ is self-adjoint and positive semi-definite. Assume that there exists an $\hat x \in H^{-1}(0)$. For each $k \in \mathbb{N}$, suppose for some $\Gamma, \Lambda \in \mathbb{L}(X; X)$ that $A$ is $\Gamma$-strongly monotone at $\hat x$ with respect to $Z_{k+1}W_{k+1}$ and that $B$ is three-point monotone at $\hat x$ with respect to $Z_{k+1}W_{k+1}$ and $\Lambda$. Let the initial iterate $x^0 \in X$ be arbitrary, and suppose $\{x^{k+1}\}_{k\in\mathbb{N}}$ are generated by (10.14). If for every $k \in \mathbb{N}$ both

(10.17) $\quad Z_{k+1}(M_{k+1} + 2W_{k+1}\Gamma) \geq Z_{k+2}M_{k+2}$ and

(10.18) $\quad Z_{k+1}M_{k+1} \geq Z_{k+1}W_{k+1}\Lambda/2$

hold, then

(10.19) $\quad \frac12\|x^N - \hat x\|^2_{Z_{N+1}M_{N+1}} \leq \frac12\|x^0 - \hat x\|^2_{Z_1M_1}.$
Proof. For brevity, denote $\widetilde H_{k+1}(x^{k+1}) := W_{k+1}[A(x^{k+1}) + B(x^k)]$. First, from (10.15) and (10.16) we have that

(10.20) $\quad \langle\widetilde H_{k+1}(x^{k+1}), x^{k+1} - \hat x\rangle_{Z_{k+1}} \geq \|x^{k+1} - \hat x\|^2_{Z_{k+1}W_{k+1}\Gamma} - \frac14\|x^k - x^{k+1}\|^2_{Z_{k+1}W_{k+1}\Lambda}.$

Rearranging (10.14), we obtain $-M_{k+1}(x^{k+1} - x^k) \in \widetilde H_{k+1}(x^{k+1})$. Inserting this into (10.20) and applying the preconditioned three-point formula (9.11) for $M = Z_{k+1}M_{k+1}$ yields

$$\frac12\|x^{k+1} - \hat x\|^2_{Z_{k+1}(M_{k+1} + 2W_{k+1}\Gamma)} + \frac12\|x^{k+1} - x^k\|^2_{Z_{k+1}(M_{k+1} - W_{k+1}\Lambda/2)} \leq \frac12\|x^k - \hat x\|^2_{Z_{k+1}M_{k+1}}.$$

Using (10.17) and (10.18), this implies that

(10.21) $\quad \frac12\|x^{k+1} - \hat x\|^2_{Z_{k+2}M_{k+2}} \leq \frac12\|x^k - \hat x\|^2_{Z_{k+1}M_{k+1}}.$

Summing over $k = 0, \dots, N-1$ now yields the claim. ∎
The inequality (10.21) is a quantitative or variable-metric version of the Fejér monotonicity of Lemma 9.1 (i) with respect to $\hat X = \{\hat x\}$. If Theorem 10.5 is applicable, we immediately obtain the following convergence rate result.

Corollary 10.6 (convergence with a rate). If (10.19) holds and $Z_{N+1}M_{N+1} \geq \mu(N)\,\mathrm{Id}$ for some $\mu : \mathbb{N} \to \mathbb{R}$, then $\|x^N - \hat x\|_X^2 \to 0$ at the rate $O(1/\mu(N))$.
primal-dual proximal splitting methods

We now apply this operator-testing technique to primal-dual splitting methods for the solution of

(10.22) $\quad \min_{x\in X} F_0(x) + E(x) + G(Kx)$

with $F_0 : X \to \overline{\mathbb{R}}$, $E : X \to \mathbb{R}$, and $G : Y \to \overline{\mathbb{R}}$ convex, proper, and lower semicontinuous and $K \in \mathbb{L}(X; Y)$. We will also write $F := F_0 + E$. The methods include in particular the PDPS method with a forward step (9.29). Now allowing varying step lengths and an over-relaxation parameter $\omega_k$, this can be written as

(10.23)
$$\begin{cases}
x^{k+1} := (I + \tau_k\partial F_0)^{-1}(x^k - \tau_k K^* y^k - \tau_k\nabla E(x^k)),\\
\bar x^{k+1} := \omega_k(x^{k+1} - x^k) + x^{k+1},\\
y^{k+1} := (I + \sigma_{k+1}\partial G^*)^{-1}(y^k + \sigma_{k+1}K\bar x^{k+1}).
\end{cases}$$

For the basic version of the algorithm with $\omega_k = 1$, $\tau_k \equiv \tau_0 > 0$, and $\sigma_k \equiv \sigma_0 > 0$, we have seen in Corollary 9.18 that the iterates converge weakly if the step length parameters satisfy

(10.24) $\quad L\tau_0/2 + \tau_0\sigma_0\|K\|^2_{\mathbb{L}(X;Y)} < 1,$

where $L$ is the Lipschitz constant of $\nabla E$. We will now show that under strong convexity of $F_0$, we can choose these parameters to accelerate the algorithm to yield convergence at a rate $O(1/N^2)$. If both $F_0$ and $G^*$ are strongly convex, we can even obtain linear convergence. Throughout, $\hat u = (\hat x, \hat y)$ denotes a root of

$$H(u) := \begin{pmatrix}\partial F_0(x) + \nabla E(x) + K^*y\\ \partial G^*(y) - Kx\end{pmatrix},$$

which we assume exists. From Theorem 5.10, this is the case if an interior point condition is satisfied for $G \circ K$ and (10.22) admits a solution.
We will also require the following technical lemma in place of the simpler growth argument for the choice (10.5).

Lemma 10.7. Pick $\varphi_0 > 0$ arbitrarily, and define iteratively $\varphi_{k+1} := \varphi_k\bigl(1 + 2\gamma\varphi_k^{-1/2}\bigr)$ for some $\gamma > 0$. Then there exists a constant $c > 0$ such that $\varphi_N \geq (cN + \varphi_0^{1/2})^2$ for all $N \in \mathbb{N}$.

Proof. Replacing $\varphi_k$ by $\varphi_k' := \gamma^{-2}\varphi_k$, we may assume without loss of generality that $\gamma = 1$. We claim that $\varphi_N^{1/2} \geq cN + \varphi_0^{1/2}$ for some $c > 0$. We proceed by induction. The case $N = 0$ is clear. If the claim holds for $k = 0, \dots, N-1$, we can unroll the recursion to obtain the estimate

$$\varphi_N - \varphi_0 = \sum_{k=0}^{N-1} 2\varphi_k^{1/2} \geq 2\sum_{k=0}^{N-1}\bigl(ck + \varphi_0^{1/2}\bigr) = cN(N-1) + 2\varphi_0^{1/2}N = cN^2 + \bigl(2\varphi_0^{1/2} - c\bigr)N.$$

Expanding $(cN + \varphi_0^{1/2})^2 = c^2N^2 + 2c\varphi_0^{1/2}N + \varphi_0$, we see that the claim for $\varphi_N$ holds if $c \geq c^2$ and $2\varphi_0^{1/2} - c \geq 2c\varphi_0^{1/2}$. Taking the latter with equality and solving for $c$ yields $c = 2\varphi_0^{1/2}/(1 + 2\varphi_0^{1/2}) < 1$ and hence also the former. Since this choice of $c$ does not depend on $N$, the claim follows. ∎
Theorem 10.8 (accelerated and linearly convergent PDPS). Let $F_0 : X \to \overline{\mathbb{R}}$, $E : X \to \mathbb{R}$, and $G : Y \to \overline{\mathbb{R}}$ be convex, proper, and lower semicontinuous with $\nabla E$ Lipschitz continuous with constant $L > 0$. Also let $K \in \mathbb{L}(X; Y)$, and suppose the assumptions of Theorem 5.10 are satisfied with $F := F_0 + E$. Pick initial step lengths $\tau_0, \sigma_0 > 0$ subject to (10.24). For any initial iterate $u^0 \in X \times Y$, suppose $\{u^{k+1} = (x^{k+1}, y^{k+1})\}_{k\in\mathbb{N}}$ are generated by (10.23).

(i) If $F_0$ is strongly convex with factor $\gamma > 0$, and we take

(10.25) $\quad \omega_k := 1/\sqrt{1 + 2\gamma\tau_k}, \qquad \tau_{k+1} := \tau_k\omega_k, \qquad\text{and}\qquad \sigma_{k+1} := \sigma_k/\omega_k,$

then $\|x^N - \hat x\|_X^2 \to 0$ at the rate $O(1/N^2)$.

(ii) If both $F_0$ and $G^*$ are strongly convex with factors $\gamma > 0$ and $\rho > 0$, respectively, and we take

(10.26) $\quad \omega_k := 1/(1 + 2\theta), \qquad \theta := \min\{\rho\sigma_0, \gamma\tau_0\}, \qquad \tau_k \equiv \tau_0, \qquad\text{and}\qquad \sigma_k \equiv \sigma_0,$

then $\|x^N - \hat x\|_X^2 + \|y^N - \hat y\|_Y^2 \to 0$ linearly.
Proof. Recalling Corollary 9.18, we write (10.23) in the form (10.14) by taking

$$A(u) := \begin{pmatrix}\partial F_0(x)\\ \partial G^*(y)\end{pmatrix} + \Xi u, \qquad B(u) := \begin{pmatrix}\nabla E(x)\\ 0\end{pmatrix}, \qquad \Xi := \begin{pmatrix}0 & K^*\\ -K & 0\end{pmatrix},$$

$$W_{k+1} := \begin{pmatrix}\tau_k\mathrm{Id} & 0\\ 0 & \sigma_{k+1}\mathrm{Id}\end{pmatrix}, \qquad\text{and}\qquad M_{k+1} := \begin{pmatrix}\mathrm{Id} & -\tau_k K^*\\ -\omega_k\sigma_{k+1}K & \mathrm{Id}\end{pmatrix}.$$

As before, Theorem 5.10 guarantees that $H^{-1}(0) \neq \emptyset$. For some primal and dual testing parameters $\varphi_k, \psi_{k+1} > 0$, we also take as our testing operator

(10.27) $\quad Z_{k+1} := \begin{pmatrix}\varphi_k\mathrm{Id} & 0\\ 0 & \psi_{k+1}\mathrm{Id}\end{pmatrix}.$

By Examples 10.3 and 10.4, $A$ is then $\Gamma$-strongly monotone with respect to $Z_{k+1}W_{k+1}$ and $B$ is three-point monotone with respect to $Z_{k+1}W_{k+1}$ and $\Lambda$ for

$$\Gamma := \Xi + \begin{pmatrix}\gamma\mathrm{Id} & 0\\ 0 & \rho\mathrm{Id}\end{pmatrix} \qquad\text{and}\qquad \Lambda := \begin{pmatrix}L\,\mathrm{Id} & 0\\ 0 & 0\end{pmatrix},$$

where $\rho = 0$ if $G^*$ is not strongly convex.

We will apply Theorem 10.5. Taking $\omega_k := \psi_{k+1}^{-1}\sigma_{k+1}^{-1}\varphi_k\tau_k$, we expand

(10.28) $\quad Z_{k+1}M_{k+1} = \begin{pmatrix}\varphi_k\mathrm{Id} & -\varphi_k\tau_k K^*\\ -\varphi_k\tau_k K & \psi_{k+1}\mathrm{Id}\end{pmatrix}.$
Thus $Z_{k+1}M_{k+1}$ is self-adjoint as required. We still need to show that it is nonnegative and indeed grows at a rate that gives our claims. We also need to verify (10.17) and (10.18), which expand as

(10.29) $\quad \begin{pmatrix}\bigl(\varphi_k(1 + 2\gamma\tau_k) - \varphi_{k+1}\bigr)\mathrm{Id} & (\varphi_k\tau_k + \varphi_{k+1}\tau_{k+1})K^*\\ (\varphi_{k+1}\tau_{k+1} - 2\psi_{k+1}\sigma_{k+1} - \varphi_k\tau_k)K & \bigl(\psi_{k+1}(1 + 2\rho\sigma_{k+1}) - \psi_{k+2}\bigr)\mathrm{Id}\end{pmatrix} \geq 0, \qquad\text{and}$

(10.30) $\quad \begin{pmatrix}\varphi_k(1 - \tau_kL/2)\mathrm{Id} & -\varphi_k\tau_k K^*\\ -\varphi_k\tau_k K & \psi_{k+1}\mathrm{Id}\end{pmatrix} \geq 0.$

We now proceed backward by deriving the step length rules as sufficient conditions for these two inequalities. First, clearly (10.29) holds if for all $k \in \mathbb{N}$ we can guarantee that

(10.31) $\quad \varphi_{k+1} \leq \varphi_k(1 + 2\gamma\tau_k), \qquad \psi_{k+1} \leq \psi_k(1 + 2\rho\sigma_k), \qquad\text{and}\qquad \varphi_k\tau_k = \psi_k\sigma_k.$

We deal with (10.30) and the lower bounds on $Z_{k+1}M_{k+1}$ in one go. By Young's inequality, we have for any $\delta \in (0, 1)$ that

$$2\varphi_k\tau_k\langle Kx, y\rangle_Y \leq (1 - \delta)\varphi_k\|x\|_X^2 + \varphi_k\tau_k^2(1-\delta)^{-1}\|K^*y\|_X^2 \qquad (x \in X,\ y \in Y),$$

hence recalling (10.28) also

(10.32) $\quad Z_{k+1}M_{k+1} \geq \begin{pmatrix}\delta\varphi_k\mathrm{Id} & 0\\ 0 & \psi_{k+1}\mathrm{Id} - \varphi_k\tau_k^2(1-\delta)^{-1}KK^*\end{pmatrix}.$

The condition (10.30) is therefore satisfied and $Z_{k+1}M_{k+1} \geq \varepsilon Z_{k+1}$ for some $\varepsilon > 0$ if (10.31) holds and for some $\zeta > 0$ both

(10.33) $\quad (1 - \zeta)\delta\varphi_k \geq \varphi_k\tau_kL/2 \qquad\text{and}\qquad \psi_{k+1} \geq \varphi_k\tau_k^2(1-\delta)^{-1}\|K\|^2_{\mathbb{L}(X;Y)}.$

Since $\psi_{k+1} \geq \psi_k$ for our choices of the testing parameters, using also the last part of (10.31), we see (10.33) to hold if

(10.34) $\quad (1 - \zeta)\delta \geq \tau_kL/2 \qquad\text{and}\qquad 1 - \delta \geq \tau_k\sigma_k\|K\|^2_{\mathbb{L}(X;Y)}.$

If we choose $\tau_k$ and $\sigma_k$ such that their product stays constant (i.e., $\tau_k\sigma_k = \tau_0\sigma_0$), it is then optimal to take $\delta := 1 - \tau_0\sigma_0\|K\|^2_{\mathbb{L}(X;Y)}$, which has to be positive. Inserting this into the first part of (10.34), we see that it holds for some $\zeta > 0$ if $\tau_kL/2 < 1 - \tau_0\sigma_0\|K\|^2_{\mathbb{L}(X;Y)}$. Since $\{\tau_k\}_{k\in\mathbb{N}}$ is nonincreasing, we see that (10.34) and hence (10.30) is satisfied when the initialization condition (10.24) holds.
To apply Theorem 10.5, all that remains is to verify (10.31) and that $\tau_k\sigma_k = \tau_0\sigma_0$. To obtain convergence rates, we need to further study the rate of increase of $\varphi_N$ and $\psi_{N+1}$, which we recall we wish to make as high as possible.

(i) If $\gamma > 0$ and $\rho = 0$, the best possible choice allowed by (10.31) is $\psi_k \equiv \psi_0$ and $\varphi_{k+1} = \varphi_k(1 + 2\gamma\tau_k)$ with $\sigma_k = \varphi_k\tau_k/\psi_0$. Together with the condition $\tau_k\sigma_k = \tau_0\sigma_0$, this forces $\tau_0\sigma_0 = \varphi_k\tau_k^2/\psi_0$. If we take $\psi_0 = 1/(\tau_0\sigma_0)$, we thus need $\varphi_k = \tau_k^{-2}$. Since $\sigma_{k+1} = \tau_0\sigma_0/\tau_{k+1} = 1/(\psi_0\tau_{k+1})$, we obtain the relations

$$\omega_k = \frac{\varphi_k\tau_k}{\psi_{k+1}\sigma_{k+1}} = \frac{\varphi_k^{1/2}}{\varphi_{k+1}^{1/2}} = \frac1{\sqrt{1 + 2\gamma\tau_k}},$$

which are satisfied for the choices of $\omega_k$, $\tau_{k+1}$, and $\sigma_{k+1}$ in (10.25).

We now use Theorem 10.5 and Corollary 10.6 and (10.32) to obtain

$$\frac{\delta\varphi_N}2\|x^N - \hat x\|_X^2 \leq \frac12\|u^N - \hat u\|^2_{Z_{N+1}M_{N+1}} \leq C_0 := \frac12\|u^0 - \hat u\|^2_{Z_1M_1}.$$

Although this does not tell us anything about the convergence of the dual iterates $\{y^N\}_{N\in\mathbb{N}}$, as $\psi_N \equiv \psi_0$ stays constant, Lemma 10.7 shows that the primal test $\varphi_N$ grows at the rate $\Omega(N^2)$. Hence we obtain the claimed convergence of the primal iterates at the rate $O(1/N^2)$.

(ii) If $\gamma > 0$ and $\rho > 0$ and we take $\tau_k \equiv \tau_0$ and $\sigma_k \equiv \sigma_0$, the last condition of (10.31) forces $\psi_k = \varphi_k\tau_0/\sigma_0$. Inserting this into the second condition yields $\varphi_{k+1} \leq \varphi_k(1 + 2\rho\sigma_0)$. Together with the first condition, we can therefore at best take $\varphi_{k+1} = \varphi_k(1 + 2\theta)$ for $\theta := \min\{\rho\sigma_0, \gamma\tau_0\}$. Reversing the roles of $\varphi$ and $\psi$, we see that we can at best take $\psi_{k+1} = \psi_k(1 + 2\theta)$. This leads to the relations

$$\omega_k = \frac{\varphi_k\tau_0}{\psi_{k+1}\sigma_0} = \frac{\varphi_k}{\varphi_{k+1}} = \frac1{1 + 2\theta},$$

which are again satisfied by the respective choices in (10.26).

We finish the proof with Theorem 10.5 and Corollary 10.6, observing now from (10.32) that $Z_{N+1}M_{N+1} \geq C(1 + 2\theta)^N\mathrm{Id}$ for some $C > 0$. ∎
Note that if $\gamma = 0$ and $\rho = 0$, (10.31) at best allows $\varphi_k \equiv \varphi_0$ as well as $\psi_k \equiv \psi_0$. If we take $\varphi_k \equiv 1$, then we also have to take $\psi_k = \tau_k/\sigma_k$. We can use this to define $\psi_0$ if we also fix $\tau_k \equiv \tau_0$ and $\sigma_k \equiv \sigma_0$. This also forces $\omega_k \equiv 1$. We thus again arrive at (10.31) as well as $\tau_k\sigma_k = \tau_0\sigma_0$. However, we cannot obtain from this convergence rates for the iterates, merely boundedness and hence weak convergence as in Section 9.4.
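The accelerated rules (10.25) can be sketched on a purely quadratic instance of (10.22), for which the exact solution is available by solving a linear system; the problem, data, and error thresholds below are illustrative assumptions.

```python
import numpy as np

def accel_pdps(K, b, gamma, iters):
    """Accelerated PDPS (10.23) with rule (10.25) for the quadratic toy problem
       min_x (gamma/2)||x||^2 [F_0] + 0.5*||x - b||^2 [E] + 0.5*||Kx||^2 [G(Kx)],
    so prox_{t F_0}(v) = v/(1 + t*gamma) and prox_{s G*}(w) = w/(1 + s)."""
    m, n = K.shape
    x, y = np.zeros(n), np.zeros(m)
    L = 1.0                                       # Lipschitz factor of grad E
    sigma = 1.0
    tau = 0.9 / (L / 2 + sigma * np.linalg.norm(K, 2) ** 2)   # initialization (10.24)
    xs = []
    for _ in range(iters):
        v = x - tau * (K.T @ y) - tau * (x - b)   # forward step on E plus dual transport
        x_new = v / (1 + tau * gamma)             # prox of tau*F_0
        omega = 1.0 / np.sqrt(1 + 2 * gamma * tau)   # acceleration rule (10.25)
        x_bar = x_new + omega * (x_new - x)
        tau, sigma = tau * omega, sigma / omega   # tau_k -> 0, sigma_k -> infinity
        y = (y + sigma * (K @ x_bar)) / (1 + sigma)  # prox of sigma*G*
        x = x_new
        xs.append(x.copy())
    return np.array(xs), tau

rng = np.random.default_rng(2)
K = rng.standard_normal((5, 8))
b = rng.standard_normal(8)
gamma = 1.0
xs, tau_final = accel_pdps(K, b, gamma, 5000)
# exact minimizer: ((1 + gamma) I + K^T K) x = b
x_exact = np.linalg.solve((1 + gamma) * np.eye(8) + K.T @ K, b)
errs = np.linalg.norm(xs - x_exact, axis=1)
```

Note how the primal step length shrinks (roughly like $1/(\gamma N)$, so the primal test $\varphi_N = \tau_N^{-2}$ grows like $N^2$) while the dual step length grows correspondingly.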
11 GAPS AND ERGODIC RESULTS
We continue with the testing framework introduced in Chapter 10 for proving rates of convergence of iterates of optimization methods. This generally required strong convexity, which is not always available. In this chapter, we use the testing idea to derive convergence rates of objective function values and other, more general, gap functionals that indicate algorithm convergence more indirectly than iterate convergence. This can be useful in cases where we can only obtain weak convergence of iterates but can still obtain rates of convergence for such a gap functional. Nevertheless, this gap convergence will often only be ergodic, i.e., the estimates only apply to a weighted sum of the history of iterates instead of the most recent iterate. In fact, we will first derive ergodic estimates for all algorithms. If we can additionally show that the algorithm is monotonic with respect to the gap, we can improve the ergodic estimates to nonergodic ones as in the previous chapters.
11.1 gap functionals

We recall that one of the three fundamental ingredients in the convergence proofs of Chapter 9 was the monotonicity of $H$ (with one of the points fixed to a root $\hat x$). We now modify this requirement to be able to prove estimates on the convergence of function values when $H = \partial F$ for some proper, convex, and lower semicontinuous $F : X \to \overline{\mathbb{R}}$. In this case, by the definition of the convex subdifferential,

(11.1) $\quad \langle\partial F(x^{k+1}), x^{k+1} - \tilde x\rangle_X \geq F(x^{k+1}) - F(\tilde x) \qquad (\tilde x \in X).$

On the other hand, for an $L$-smooth functional $G : X \to \mathbb{R}$, we can use the three-point estimates of Corollary 7.2 to obtain

(11.2) $\quad \langle\nabla G(x^k), x^{k+1} - \tilde x\rangle_X \geq G(x^{k+1}) - G(\tilde x) - \frac L2\|x^{k+1} - x^k\|_X^2 \qquad (\tilde x \in X).$

These two inequalities are enough to obtain function value estimates for the more general case $H = \partial F + \nabla G$, including a forward step with respect to $G$. We will produce such estimates in Section 11.2.
generic gap functionals

More generally, when $H$ does not directly arise from subdifferentials or gradients but has a more complicated structure, we introduce several gap functionals. We identified in Chapter 9 that for some lifted functionals $F$ and $\widetilde G$ and a skew-adjoint operator $\Xi = -\Xi^*$, the unaccelerated PDPS, PDES, and DRS methods consist in taking $H = \partial F + \nabla\widetilde G + \Xi$ and iterating

(11.3) $\quad 0 \in \partial F(x^{k+1}) + \nabla\widetilde G(x^k) + \Xi x^{k+1} + M(x^{k+1} - x^k),$

where the skew-adjoint operator $\Xi$ does not arise as a subdifferential of any function. Working with this requires extra effort, especially when we later study accelerated methods.

Note that by the skew-adjointness of $\Xi$, we have $\langle\Xi x, x\rangle_X = 0$. Using this and the estimates (11.1) and (11.2) on $F$ and $\widetilde G$, we obtain for the basic unaccelerated scheme (11.3) the estimate

$$\langle\partial F(x^{k+1}) + \nabla\widetilde G(x^k) + \Xi x^{k+1}, x^{k+1} - \tilde x\rangle_X \geq \mathcal{G}(x^{k+1}; \tilde x) - \frac L2\|x^{k+1} - x^k\|_X^2$$

with the generic gap functional

(11.4) $\quad \mathcal{G}(x; \tilde x) := (\widetilde G + F)(x) - (\widetilde G + F)(\tilde x) + \langle\Xi\tilde x, x\rangle_X.$

In the next lemma, we collect some elementary properties of this functional. Note that $\mathcal{G}(x; z) = 0$ is possible even for $x \neq z$.

Lemma 11.1. Let $H := \partial F + \nabla\widetilde G + \Xi$, where $\Xi \in \mathbb{L}(X; X)$ is skew-adjoint and $\widetilde G : X \to \overline{\mathbb{R}}$ and $F : X \to \overline{\mathbb{R}}$ are convex, proper, and lower semicontinuous. If $\hat x \in H^{-1}(0)$, then $\mathcal{G}(\,\cdot\,; \hat x) \geq 0$ and $\mathcal{G}(\hat x; \hat x) = 0$.

Proof. We first note that $\hat x \in H^{-1}(0)$ is equivalent to $-\Xi\hat x \in \partial(F + \widetilde G)(\hat x)$. Hence, using the definition of the convex subdifferential and the fact that $\langle\Xi\hat x, \hat x\rangle_X = 0$ due to the skew-adjointness of $\Xi$, we deduce for arbitrary $x \in X$ that

$$(F + \widetilde G)(x) - (F + \widetilde G)(\hat x) \geq \langle-\Xi\hat x, x - \hat x\rangle_X = \langle-\Xi\hat x, x\rangle_X,$$

i.e., $\mathcal{G}(x; \hat x) \geq 0$. The fact that $\mathcal{G}(\hat x; \hat x) = 0$ follows immediately from the skew-adjointness of $\Xi$. ∎

The function value estimates (11.1) and (11.2), unlike simple monotonicity-based nonnegativity estimates, do not depend on $\tilde x$ being a root of $H$. Therefore, taking any bounded set $B \subset X$ such that $H^{-1}(0) \cap B \neq \emptyset$, we see that the partial gap

$$\mathcal{G}(x; B) := \sup_{\tilde x\in B}\mathcal{G}(x; \tilde x)$$

also satisfies $\mathcal{G}(\,\cdot\,; B) \geq 0$.
the lagrangian duality gap

Let us now return to the problem

(11.5) $\quad \min_{x\in X} F(x) + G(Kx),$

where we split $F = F_0 + E$, assuming $E$ to have a Lipschitz continuous gradient. With the notation $u = (x, y)$, we recall that Theorem 5.10 guarantees the existence of a primal-dual solution $\hat u$ whenever its conditions are satisfied. This, we further recall, can be written as $0 \in H(\hat u)$ for

(11.6a) $\quad H(u) := \begin{pmatrix}\partial F(x) + K^*y\\ \partial G^*(y) - Kx\end{pmatrix}.$

As we have already seen in, e.g., Theorem 10.8, we can express this choice of $H$ in the present framework with

(11.6b) $\quad F(u) := F_0(x) + G^*(y), \qquad \widetilde G(u) := E(x), \qquad\text{and}\qquad \Xi := \begin{pmatrix}0 & K^*\\ -K & 0\end{pmatrix}.$

With this, the generic gap functional $\mathcal{G}$ from (11.4) becomes the Lagrangian duality gap

(11.7) $\quad \mathcal{G}^L(u; \tilde u) := \bigl(F(x) + \langle\tilde y, Kx\rangle - G^*(\tilde y)\bigr) - \bigl(F(\tilde x) + \langle y, K\tilde x\rangle - G^*(y)\bigr) \leq \mathcal{G}(u),$

where

$$\mathcal{G}(u) := F(x) + G(Kx) + F^*(-K^*y) + G^*(y)$$

is the real duality gap, cf. (5.16). As Corollary 5.13 shows, when its conditions are satisfied and $\tilde u = \hat u \in H^{-1}(0)$, the Lagrangian duality gap is nonnegative.

Since (11.1) and (11.2) do not depend on $\tilde x$ being a root of $H$, convergence results for the Lagrangian duality gap can sometimes be improved slightly by taking any bounded set $B \subset X \times Y$ such that $B \cap H^{-1}(0) \neq \emptyset$ and defining the partial duality gap

(11.8) $\quad \mathcal{G}(u; B) := \sup_{\tilde u\in B}\mathcal{G}^L(u; \tilde u).$

This satisfies $0 \leq \mathcal{G}(u; B) \leq \mathcal{G}(u)$. Moreover, by the definition of $F^*$ and $G^{**} = G$, we have $\mathcal{G}(u; X \times Y) = \mathcal{G}(u)$, which explains both the importance of partial duality gaps and the term "partial gap".
bregman divergences and gap functionals

Although we will not need this in the following, we briefly discuss a possible extension to Banach spaces. Let $J : X \to \overline{\mathbb{R}}$ be convex on a Banach space $X$. Then for $x \in \operatorname{dom}J$ and $p \in \partial J(x)$, one can define the asymmetric Bregman divergence (or distance)

$$B_J^p(z, x) := J(z) - J(x) - \langle p, z - x\rangle_X \qquad (z \in X).$$

Due to the definition of the convex subdifferential, this is nonnegative. It is also possible to symmetrize the divergence by considering $\bar B_J(x, z) := B_J^q(x, z) + B_J^p(z, x)$ with $q \in \partial J(z)$ and $z \in \operatorname{dom}J$, but even the symmetrized divergence is not generally a true distance, as it can happen that $\bar B_J(x, z) = 0$ even if $x \neq z$.

The Bregman divergence satisfies a three-point identity for any $\tilde x \in \operatorname{dom}J$: we have

$$B_J^p(\tilde x, x) - B_J^q(\tilde x, z) + B_J^q(x, z) = [J(\tilde x) - J(x) - \langle p, \tilde x - x\rangle_X] - [J(\tilde x) - J(z) - \langle q, \tilde x - z\rangle_X] + [J(x) - J(z) - \langle q, x - z\rangle_X],$$

which immediately gives the three-point identity

(11.9) $\quad \langle q - p, \tilde x - x\rangle_X = B_J^p(\tilde x, x) - B_J^q(\tilde x, z) + B_J^q(x, z) \qquad (\tilde x, x, z \in X,\ q \in \partial J(z),\ p \in \partial J(x)).$

If $X$ is a Hilbert space, we can take $J(x) = \frac12\|x\|_X^2$ to obtain $B_J^x(z, x) = \frac12\bar B_J(z, x) = \frac12\|z - x\|_X^2$. Therefore this three-point identity generalizes the classical three-point identity (9.1) in Hilbert spaces. This could be used to generalize our convergence proofs to Banach spaces to treat methods of the general form

$$0 \in H(x^{k+1}) + \partial_1 B_J^{p^k}(x^{k+1}, x^k),$$

where $\partial_1$ denotes taking a subdifferential with respect to the first variable. To see how (11.9) applies, observe that

$$\partial_1 B_J^{p^k}(x^{k+1}, x^k) = \partial J(x^{k+1}) - p^k = \{q^{k+1} - p^k \mid q^{k+1} \in \partial J(x^{k+1})\}.$$

This would, however, not provide convergence in norm but with respect to $B_J$. For a general approach to primal-dual methods based on Bregman divergences, see [Valkonen 2020a].
Returning to our generic gap functional G defined in (11.4), we have already observed in the proof of Lemma 11.1 that −Ξx̂ ∈ ∂(G̃ + F̃)(x̂). Since, due to the skew-adjointness of Ξ, we also have ⟨Ξx̂, x⟩_X = ⟨Ξx̂, x − x̂⟩_X for a solution x̂ ∈ H^{−1}(0), this means that

G(x; x̂) = B^{−Ξx̂}_{G̃+F̃}(x, x̂).

In other words, the gap based at a solution x̂ ∈ H^{−1}(0) is also a Bregman divergence. In general, as we have already remarked, it can be zero for x ≠ x̂.
11.2 convergence of function values
We start with the fundamental algorithms: the proximal point method and explicit splitting. In the following, we write G_min ≔ min_{x∈X} G(x) whenever the minimum exists.
Theorem 11.2 (proximal point method ergodic function value). Let G be proper, lower semicontinuous, and (strongly) convex with factor γ ≥ 0. Suppose [∂G]^{−1}(0) ≠ ∅. Pick an arbitrary x^0 ∈ X. Let φ_{k+1} ≔ φ_k(1 + γτ_k) and φ_0 ≔ 1. For the iterates x^{k+1} ≔ prox_{τ_kG}(x^k) of the proximal point method, define the "ergodic sequence"

(11.10)  x̃_N ≔ (1/ζ_N) Σ_{k=0}^{N−1} φ_kτ_k x^{k+1}  for  ζ_N ≔ Σ_{k=0}^{N−1} φ_kτ_k  (N ≥ 1).

(i) If τ_k ≡ τ > 0 and G is not strongly convex (γ = 0), then G(x̃_N) → G_min at the rate O(1/N).

(ii) If τ_k ≡ τ > 0 and G is strongly convex (γ > 0), then G(x̃_N) → G_min linearly.

(iii) If τ_k → ∞ and G is strongly convex, then G(x̃_N) → G_min superlinearly.
Proof. Let the root x̂ ∈ [∂G]^{−1}(0) be arbitrary; by assumption at least one exists. Then G_min = G(x̂) by Theorem 4.2. We recall that the proximal point iteration for minimizing G can be written as

(11.11)  0 ∈ τ_k∂G(x^{k+1}) + (x^{k+1} − x^k).

As in the proof of Theorem 9.4, we test (11.11) by the application of φ_k⟨·, x^{k+1} − x̂⟩_X for some testing parameter φ_k > 0 to obtain

(11.12)  0 ∈ τ_kφ_k⟨∂G(x^{k+1}), x^{k+1} − x̂⟩_X + φ_k⟨x^{k+1} − x^k, x^{k+1} − x̂⟩_X.

The next step differs from the proof of Theorem 9.4, as we want a value estimate. Indeed, by the subdifferential characterization of strong convexity, Lemma 7.4 (ii),

⟨∂G(x^{k+1}), x^{k+1} − x̂⟩_X ≥ G(x^{k+1}) − G(x̂) + (γ/2)‖x^{k+1} − x̂‖²_X.

Using this and the three-point identity (9.1) in (11.12), we obtain, similarly to the proof of Theorem 10.1, the estimate

(11.13)  (φ_k(1 + τ_kγ)/2)‖x^{k+1} − x̂‖²_X + φ_kτ_k[G(x^{k+1}) − G(x̂)] + (φ_k/2)‖x^{k+1} − x^k‖²_X ≤ (φ_k/2)‖x^k − x̂‖²_X.

We now impose the recursion

(11.14)  φ_k(1 + τ_kγ) = φ_{k+1}.

(Observe the factor-of-two difference compared to (10.5).) Thus

(11.15)  (φ_{k+1}/2)‖x^{k+1} − x̂‖²_X + φ_kτ_k[G(x^{k+1}) − G(x̂)] + (φ_k/2)‖x^{k+1} − x^k‖²_X ≤ (φ_k/2)‖x^k − x̂‖²_X.
Summing over k = 0, ..., N − 1 then yields

(11.16)  (φ_N/2)‖x^N − x̂‖²_X + Σ_{k=0}^{N−1} φ_kτ_k[G(x^{k+1}) − G(x̂)] + Σ_{k=0}^{N−1} (φ_k/2)‖x^{k+1} − x^k‖²_X ≤ (φ_0/2)‖x^0 − x̂‖²_X =: C_0.

Using Jensen's inequality, it follows for the ergodic sequence defined in (11.10) that

ζ_N[G(x̃_N) − G(x̂)] ≤ C_0.

If τ_k ≡ τ_0 and γ = 0, we therefore have ζ_N = τ_0N and thus obtain O(1/N) convergence of function values for the ergodic variable x̃_N.

If τ_k ≡ τ_0 and γ > 0, we deduce from (11.14) that ζ_N = Σ_{k=0}^{N−1} (1 + γτ_0)^k τ_0. This grows exponentially, and hence we obtain the claimed linear convergence.
Finally, if τ_k → ∞, we would, similarly to Theorem 10.1 (ii), obtain superlinear convergence if ζ_N/ζ_{N+1} → 0 were to hold. To show this, we can write

ζ_N/ζ_{N+1} = (Σ_{k=0}^{N−1} φ_kτ_k)/(Σ_{k=0}^{N} φ_kτ_k) = d_N/(1 + d_N)  for  d_N ≔ Σ_{k=0}^{N−1} (φ_kτ_k)/(φ_Nτ_N).

So it suffices to show that d_N → 0 as N → ∞. This we obtain by estimating

d_N = Σ_{k=0}^{N−1} (τ_k/τ_N)/Π_{j=k}^{N−1}(1 + γτ_j)
    ≤ Σ_{k=0}^{N−1} ((1 + γτ_k)/(1 + γτ_N))/Π_{j=k}^{N−1}(1 + γτ_j)
    = Σ_{k=0}^{N−1} 1/Π_{j=k+1}^{N}(1 + γτ_j)
    ≤ Σ_{k=0}^{N−1} (1 + γτ_{k+1})^{−(N−k)}.

In the first and last steps we have used that {τ_k}_{k∈ℕ} is increasing. Now we pick c > 1 and find k_0 ∈ ℕ such that 1 + γτ_k ≥ c for k ≥ k_0. Then for N > k_0,

d_N ≤ Σ_{k=0}^{k_0−1} (1 + γτ_{k+1})^{−(N−k)} + Σ_{k=k_0}^{N−1} c^{−(N−k)} = Σ_{k=0}^{k_0−1} (1 + γτ_{k+1})^{−(N−k)} + Σ_{j=1}^{N−k_0} c^{−j}.

The first term goes to zero as N → ∞, while the second term, as a geometric series, is bounded by c^{−1}/(1 − c^{−1}). We therefore deduce that lim sup_{N→∞} d_N ≤ c^{−1}/(1 − c^{−1}). Letting c → ∞, we see that d_N → 0. ∎
It is possible to improve the result to be nonergodic by showing that the proximal point method is in fact monotonic.

Corollary 11.3 (proximal point method function value). The proximal point method is monotonic, i.e., G(x^{k+1}) ≤ G(x^k) for all k ∈ ℕ. Therefore the convergence rates of Theorem 11.2 also hold for G(x^N) − G_min.
Proof. We know from (11.11) that

0 ≤ τ_k^{−1}‖x^{k+1} − x^k‖²_X = ⟨∂G(x^{k+1}), x^k − x^{k+1}⟩_X ≤ G(x^k) − G(x^{k+1}).

This proves monotonicity. Now (11.16) gives

ζ_N[G(x^N) − G(x̂)] ≤ C_0,

and we proceed using the growth estimates for ζ_N in the proof of Theorem 11.2. ∎
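The monotone decrease just proved is easy to observe numerically. A toy sketch (our own example, not from the text) for the strongly convex G(x) = x²/2, whose proximal map is prox_{τG}(x) = x/(1 + τ):

```python
# Proximal point iterations x^{k+1} = prox_{tau G}(x^k) for G(x) = x^2/2,
# where prox_{tau G}(x) = x / (1 + tau). The function values G(x^k)
# decrease monotonically, as in Corollary 11.3.

def prox_G(x, tau):
    return x / (1.0 + tau)

tau, x = 1.0, 4.0
values = [0.5 * x * x]
for _ in range(20):
    x = prox_G(x, tau)
    values.append(0.5 * x * x)

monotone = all(v1 >= v2 for v1, v2 in zip(values, values[1:]))
```

Since γ > 0 here, the decrease is in fact linear: each step halves the iterate.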
These results can be extended to the explicit splitting method,

x^{k+1} ≔ prox_{τG}(x^k − τ∇F(x^k)),

in a straightforward manner. In the next theorem, observe, in comparison to Theorem 10.2, that τL ≤ 1 instead of τL ≤ 2. This kind of factor-of-two stricter step length or Lipschitz factor bound is a general feature of function value estimates of methods involving an explicit step, as well as of the gap estimates in the following sections. It stems from the corresponding difference between the value estimate (7.8) and the non-value estimate (7.9) in Corollary 7.2.

Theorem 11.4 (explicit splitting function value). Let J ≔ F + G, where G: X → ℝ and F: X → ℝ are convex, proper, and lower semicontinuous, with F moreover L-smooth. Suppose [∂J]^{−1}(0) ≠ ∅. If τL ≤ 1, the explicit splitting method satisfies J(x̃_N) → J_min at the rate O(1/N). If G is strongly convex, then this convergence is linear.
Proof. With τ_k ≡ τ, as usual, we write the method as

(11.17)  0 ∈ τ_k[∂G(x^{k+1}) + ∇F(x^k)] + (x^{k+1} − x^k).

We then take arbitrary x̂ ∈ [∂(F + G)]^{−1}(0) and use the three-point smoothness of F proved in Corollary 7.2, and the subdifferential characterization of strong convexity of G, Lemma 7.4 (ii), to obtain

⟨∂G(x^{k+1}) + ∇F(x^k), x^{k+1} − x̂⟩_X ≥ J(x^{k+1}) − J(x̂) + (γ/2)‖x^{k+1} − x̂‖²_X − (L/2)‖x^{k+1} − x^k‖²_X.

As in the proof of Theorem 11.2, after testing (11.17) by the application of φ_k⟨·, x^{k+1} − x̂⟩_X, we now obtain

(11.18)  (φ_{k+1}/2)‖x^{k+1} − x̂‖²_X + φ_kτ_k[J(x^{k+1}) − J(x̂)] + (φ_k(1 − τ_kL)/2)‖x^{k+1} − x^k‖²_X ≤ (φ_k/2)‖x^k − x̂‖²_X.

Since τ_kL ≤ 1, we may proceed as in Theorem 11.2 to prove the ergodic convergences. ∎
Again, we can show nonergodic convergence due to the monotonicity of the iteration.

Corollary 11.5. The convergence rates of Theorem 11.4 also hold for J(x^N) − J_min.

Proof. We obtain from (11.17) and the smoothness of F (see (7.5)) that

τ_k^{−1}‖x^{k+1} − x^k‖²_X = ⟨∂G(x^{k+1}) + ∇F(x^k), x^k − x^{k+1}⟩_X ≤ J(x^k) − J(x^{k+1}) + (L/2)‖x^{k+1} − x^k‖²_X.

Since Lτ_k ≤ 1 < 2, we obtain monotonicity. The rest now follows as in Theorem 11.2 and Corollary 11.3. ∎
Remark 11.6. Based on Corollary 7.7, any strong convexity of ๐น can also be used to obtain linearconvergence by adapting the steps of the proof of Theorem 11.4.
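For illustration (a made-up scalar instance, not from the text), explicit splitting with τL ≤ 1 applied to F(x) = (x − b)²/2 and G(x) = λ|x|, where ∇F(x) = x − b, L = 1, and prox_{τG} is the soft-thresholding operator:

```python
# Explicit splitting x^{k+1} = prox_{tau G}(x^k - tau * gradF(x^k)) for the
# scalar problem F(x) = (x - b)^2/2 and G(x) = lam * |x|, whose prox is
# soft-thresholding. Here L = 1 and tau * L = 1, the value-estimate bound.

def soft_threshold(v, t):          # prox of t * |.|
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

b, lam, tau = 2.0, 0.5, 1.0
x = 0.0
for _ in range(50):
    x = soft_threshold(x - tau * (x - b), tau * lam)
```

The limit x = 1.5 is the soft-thresholded data b, i.e., the unique minimizer of J = F + G in this example.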
11.3 ergodic gap estimates
We now study the convergence of gap functionals for general unaccelerated schemes of the form (11.3). Since F̃ may in general not have the same factor L of smoothness on all subspaces, we introduce the condition (11.19) of the next result. It is simply a version of the standard result of Corollary 7.2 that allows a block-separable structure through the operator Λ in place of the factor L; compare Example 10.4.

Theorem 11.7. Let H ≔ ∂G̃ + ∇F̃ + Ξ, where Ξ ∈ 𝕃(X;X) is skew-adjoint and G̃: X → ℝ and F̃: X → ℝ are convex, proper, and lower semicontinuous. Suppose F̃ satisfies for some Λ ∈ 𝕃(X;X) the three-point smoothness condition

(11.19)  ⟨∇F̃(z), x − x̄⟩_X ≥ F̃(x) − F̃(x̄) − ½‖z − x‖²_Λ  (x, x̄, z ∈ X).

Also let M ∈ 𝕃(X;X) be positive semi-definite and self-adjoint. Pick x^0 ∈ X, and let the sequence {x^{k+1}}_{k∈ℕ} be generated through the iterative solution of (11.3). Then for every x̄ ∈ X,

(11.20)  ½‖x^N − x̄‖²_M + Σ_{k=0}^{N−1} (G(x^{k+1}; x̄) + ½‖x^{k+1} − x^k‖²_{M−Λ}) ≤ ½‖x^0 − x̄‖²_M.
Proof. Observe that (11.19) implies

(11.21)  ⟨∇F̃(z), x − x̄⟩ ≥ F̃(x) − F̃(x̄) − ½‖z − x‖²_Λ  (x, z ∈ X).

Likewise, by the convexity of G̃ we have

(11.22)  ⟨∂G̃(x), x − x̄⟩ ≥ G̃(x) − G̃(x̄)  (x ∈ X).
Using (11.21) and (11.22), we obtain

(11.23)  ⟨∂G̃(x^{k+1}) + ∇F̃(x^k) + Ξx^{k+1}, x^{k+1} − x̄⟩
  ≥ (G̃ + F̃)(x^{k+1}) − (G̃ + F̃)(x̄) + ⟨Ξx^{k+1}, x^{k+1} − x̄⟩_X − ½‖x^k − x^{k+1}‖²_Λ
  = G(x^{k+1}; x̄) − ½‖x^k − x^{k+1}‖²_Λ.

In the final step we have also referred to the definition of G in (11.4) and the skew-adjointness of Ξ.

From here on, our arguments are already standard: we test (11.3) through the application of ⟨·, x^{k+1} − x̄⟩, obtaining

0 ∈ ⟨∂G̃(x^{k+1}) + ∇F̃(x^k) + Ξx^{k+1} + M(x^{k+1} − x^k), x^{k+1} − x̄⟩.

Then we insert (11.23), which gives

½‖x^{k+1} − x̄‖²_M + G(x^{k+1}; x̄) + ½‖x^{k+1} − x^k‖²_{M−Λ} ≤ ½‖x^k − x̄‖²_M.

Summing over k = 0, ..., N − 1 yields (11.20). ∎
In particular, we obtain the following corollary, which shows that G(x̃_N; x̄) → G(x̂; x̄) = 0 at the rate O(1/N) for any x̂ ∈ H^{−1}(0). Even further, taking any bounded set B ⊂ X such that H^{−1}(0) ∩ B ≠ ∅, we see that also the partial gap G(x̃_N; B) → G(x̂; B) = 0.

Corollary 11.8. In Theorem 11.7, suppose in addition that M ≥ Λ, and define the ergodic sequence

x̃_N ≔ (1/N) Σ_{k=0}^{N−1} x^{k+1}.

Then

G(x̃_N; x̄) ≤ (1/(2N))‖x^0 − x̄‖²_M.

Proof. This follows immediately from using M ≥ Λ to eliminate the term ½‖x^{k+1} − x^k‖²_{M−Λ} from (11.20), and then using Jensen's inequality on the gap. ∎
Due to the presence of ฮ, we cannot in general prove monotonicity of the abstract proximalpoint method and thus get rid of the ergodicity of the estimates.
implicit splitting
We now consider the solution of

min_{x∈X} F(x) + G(x).

Setting B = ∂F and A = ∂G in (9.16), the Douglas–Rachford or implicit splitting method can be written in the general form (11.3) with u = (x, y, z),

G̃(u) ≔ G(y) + F(x),  F̃ ≡ 0,

Ξ ≔ (  0    Id   −Id
      −Id    0    Id
       Id   −Id    0 ),  and  M ≔ ( 0   0   0
                                     0   0   0
                                     0   0   Id ).

Moreover,

(11.24)  H(u) ≔ ∂G̃(u) + Ξu.

We then have the following ergodic estimate for

G_DRS(u; û) ≔ [G(y) + F(x)] − [G(x̂) + F(x̂)] + ⟨x̂ − ẑ, x − y⟩ ≥ 0.

Theorem 11.9. Let F: X → ℝ and G: X → ℝ be proper, convex, and lower semicontinuous. Let û ∈ H^{−1}(0) for H given by (11.24). Then for any initial iterate u^0 = (x^0, y^0, z^0) ∈ X³, the iterates {u^k}_{k∈ℕ} of the implicit splitting method (8.8) satisfy

G_DRS(ũ_N; û) ≤ (1/(2Nτ))‖u^0 − û‖²_M,  where  ũ_N ≔ (1/N) Σ_{k=0}^{N−1} u^{k+1}.

Proof. Clearly M is self-adjoint and positive semi-definite, and M ≥ Λ ≡ 0. The rest is clear from Corollary 11.8 by moving τ from G̃ to the right-hand side, and using that x̂ = ŷ. ∎
Clearly, following the discussion in Section 11.1, we can define a partial version of G_DRS and obtain its convergence from Theorem 11.9.
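To see the method in action on a toy problem (our own sketch, using one common way of writing the Douglas–Rachford iteration rather than necessarily the exact form (8.8)), take F(x) = (x − a)²/2 and G(x) = (x − b)²/2, whose sum is minimized at (a + b)/2:

```python
# A common Douglas-Rachford form: x = prox_{tau F}(z), y = prox_{tau G}(2x - z),
# z <- z + y - x. For the quadratic F(x) = (x - a)^2/2 the prox is
# prox_{tau F}(v) = (v + tau * a) / (1 + tau), and similarly for G.

def prox_quad(v, center, tau):
    return (v + tau * center) / (1.0 + tau)

a, b, tau = 0.0, 4.0, 1.0
z = 10.0
for _ in range(200):
    x = prox_quad(z, a, tau)             # backward step on F
    y = prox_quad(2.0 * x - z, b, tau)   # backward step on G
    z = z + y - x                        # update of the driving variable

# x and y both converge to the minimizer (a + b)/2 = 2 of F + G.
```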
primal-dual explicit splitting
We recall that the PDES method (8.23) for (11.5) corresponds to (11.6) with the choice F0 = 0 and E = F, while the preconditioning operator is given by

M ≔ ( Id   0
       0   Id − KK* ).

With this, we obtain the following estimate for the Lagrangian duality gap defined in (11.7).
Theorem 11.10. Let F: X → ℝ and G: Y → ℝ be proper, convex, and lower semicontinuous, and K ∈ 𝕃(X;Y). Suppose F is Gâteaux differentiable with L-Lipschitz gradient for L ≤ 1, and that ‖K‖_{𝕃(X;Y)} ≤ 1. Then for any initial iterate u^0 ∈ X × Y, the iterates {u^k = (x^k, y^k)}_{k∈ℕ} of (8.23) satisfy for all ū = (x̄, ȳ) ∈ X × Y the ergodic gap estimate

G_L(ũ_N; ū) ≤ (1/(2N))‖u^0 − ū‖²_M,  where  ũ_N ≔ (1/N) Σ_{k=0}^{N−1} u^{k+1}.

In particular, if B ⊂ X × Y is bounded and B ∩ H^{−1}(0) ≠ ∅, the partial duality gap G(ũ_N; B) → 0 at the rate O(1/N).

Proof. We use Corollary 11.8. Using the assumed bound ‖K‖_{𝕃(X;Y)} ≤ 1, clearly M is self-adjoint and positive semi-definite. By Corollary 7.2, the three-point smoothness condition (11.19) holds with

Λ ≔ ( L·Id   0
       0     0 ),

where L is the Lipschitz factor of ∇F. Since ‖K‖_{𝕃(X;Y)} ≤ 1 and L ≤ 1, we also verify M ≥ Λ. The rest now follows from Corollary 11.8 as well as the nonnegativity of the partial duality gap (11.8). ∎
primal-dual proximal splitting
We continue with the problem (11.5) and the corresponding structure (11.6) for H. We recall from Corollaries 9.13 and 9.18 that for the unaccelerated PDPS we take the preconditioning operator

(11.25)  M ≔ ( τ^{−1}Id   −K*
               −K         σ^{−1}Id )

for some primal and dual step length parameters τ, σ > 0. We now obtain the following result for the Lagrangian duality gap defined in (11.7).

Theorem 11.11. Let F0: X → ℝ, E: X → ℝ, and G: Y → ℝ be proper, convex, and lower semicontinuous, and K ∈ 𝕃(X;Y). Suppose E is Gâteaux differentiable with L-Lipschitz gradient. Take τ, σ > 0 satisfying

Lτ + τσ‖K‖² < 1.

Then for any initial iterate u^0 ∈ X × Y, the iterates {u^k = (x^k, y^k)}_{k∈ℕ} of the PDPS method (9.29) satisfy for any ū = (x̄, ȳ) ∈ X × Y the ergodic gap estimate

G_L(ũ_N; ū) ≤ (1/(2N))‖u^0 − ū‖²_M,  where  ũ_N ≔ (1/N) Σ_{k=0}^{N−1} u^{k+1}.

In particular, if B ⊂ X × Y is bounded and B ∩ H^{−1}(0) ≠ ∅, the partial duality gap G(ũ_N; B) → 0 at the rate O(1/N).
Proof. We use Corollary 11.8. By Corollary 7.2, the three-point smoothness condition (11.19) holds with

Λ ≔ ( L·Id   0
       0     0 ),

where L is the Lipschitz factor of ∇E. In Corollary 9.18 we have already proved that M is self-adjoint and positive semi-definite. Similarly to the proof of that corollary, we verify that the condition Lτ + τσ‖K‖² < 1 guarantees M ≥ Λ. (The only difference to the conditions in that result is the standard gap-estimate factor-of-two difference in the term containing L.) The rest is clear from Corollary 11.8 as well as the nonnegativity of the partial duality gap (11.8). ∎
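As a numerical sanity check (a scalar toy instance of our own, not from the text), the unaccelerated PDPS iteration for min_x F0(x) + G(Kx) with F0(x) = x²/2, G(y) = (y − b)²/2, and K = 1, together with the ergodic primal average:

```python
# Unaccelerated PDPS (Chambolle-Pock form) on scalar data: F0(x) = x^2/2,
# G(y) = (y - b)^2/2, K = 1. Then G*(y) = y^2/2 + b*y, so
# prox_{sigma G*}(v) = (v - sigma*b)/(1 + sigma). Step sizes satisfy
# tau * sigma * K^2 < 1. The saddle point is (x, y) = (b/2, -b/2).

def prox_F0(v, tau):                    # prox of tau * x^2/2
    return v / (1.0 + tau)

def prox_Gstar(v, sigma, b):            # prox of sigma * G*
    return (v - sigma * b) / (1.0 + sigma)

b, K = 2.0, 1.0
tau = sigma = 0.9                       # tau * sigma * K^2 = 0.81 < 1
x, y = 0.0, 0.0
xs = []
for _ in range(500):
    x_new = prox_F0(x - tau * K * y, tau)
    x_bar = 2.0 * x_new - x             # over-relaxed primal point
    y = prox_Gstar(y + sigma * K * x_bar, sigma, b)
    x = x_new
    xs.append(x)

x_erg = sum(xs) / len(xs)               # ergodic primal iterate
```

The pointwise iterates converge to (1, −1) here, and the ergodic average approaches the primal solution as in the theorem above, at a slower O(1/N) pace because early iterates stay in the average.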
11.4 the testing approach in its general form
We now want to produce gap estimates for accelerated methods. As we have seen in Section 10.1, as an extension of (11.3) these iteratively solve

(11.26)  0 ∈ W_{k+1}[∂G̃(x^{k+1}) + ∇F̃(x^k) + Ξx^{k+1}] + M_{k+1}(x^{k+1} − x^k)

for iteration-dependent step length and preconditioning operators W_{k+1} ∈ 𝕃(X;X) and M_{k+1} ∈ 𝕃(X;X). We also introduced testing operators Z_{k+1} ∈ 𝕃(X;X) such that Z_{k+1}M_{k+1} is self-adjoint and positive semi-definite.

Unless Z_{k+1}W_{k+1} is a scalar multiple of the identity, we will not be able to extract in a straightforward way any of the gap functionals of Section 11.1 out of (11.26). Indeed, it is not clear how to provide a completely general approach to gap functionals of accelerated or otherwise complex algorithms. We will specifically see the difficulties when performing gap realignment for the accelerated PDPS in Section 11.5 and when developing very specific gap functionals for the ADMM in Section 11.6.

Towards brevity in the following sections, we however do some general preparatory work. Observe that the method (11.26) can be written more abstractly as

(11.27)  0 ∈ H_{k+1}(x^{k+1}) + M_{k+1}(x^{k+1} − x^k)

for some iteration-dependent set-valued function H_{k+1}: X ⇉ X. The estimate (11.28) in the next theorem is in essence a quantitative or variable-metric version of the three-point smoothness and strong convexity estimate (7.16). The proof of the following result is already standard; the abstract value V_{k+1}(x̄) models a suitable gap functional for the iterate x^{k+1}.

Theorem 11.12. On a Hilbert space X, let H_{k+1}: X ⇉ X and M_{k+1}, Z_{k+1} ∈ 𝕃(X;X) for k ∈ ℕ. Suppose (11.27) is solvable for the iterates {x^k}_{k∈ℕ}. If Z_{k+1}M_{k+1} is self-adjoint and

(11.28)  ⟨H_{k+1}(x^{k+1}), x^{k+1} − x̄⟩_{Z_{k+1}} ≥ V_{k+1}(x̄) + ½‖x^{k+1} − x̄‖²_{Z_{k+2}M_{k+2}−Z_{k+1}M_{k+1}} − ½‖x^{k+1} − x^k‖²_{Z_{k+1}M_{k+1}}
for all k ∈ ℕ and some x̄ ∈ X and V_{k+1}(x̄) ∈ ℝ, then both

(11.29)  ½‖x^{k+1} − x̄‖²_{Z_{k+2}M_{k+2}} + V_{k+1}(x̄) ≤ ½‖x^k − x̄‖²_{Z_{k+1}M_{k+1}}  (k ∈ ℕ)

and

(11.30)  ½‖x^N − x̄‖²_{Z_{N+1}M_{N+1}} + Σ_{k=0}^{N−1} V_{k+1}(x̄) ≤ ½‖x^0 − x̄‖²_{Z_1M_1}  (N ≥ 1).
Proof. Inserting (11.27) into (11.28), we obtain

(11.31)  −⟨x^{k+1} − x^k, x^{k+1} − x̄⟩_{Z_{k+1}M_{k+1}} ≥ ½‖x^{k+1} − x̄‖²_{Z_{k+2}M_{k+2}−Z_{k+1}M_{k+1}} − ½‖x^{k+1} − x^k‖²_{Z_{k+1}M_{k+1}} + V_{k+1}(x̄).

We recall for general self-adjoint M the three-point formula (9.1), i.e.,

⟨x^{k+1} − x^k, x^{k+1} − x̄⟩_M = ½‖x^{k+1} − x^k‖²_M − ½‖x^k − x̄‖²_M + ½‖x^{k+1} − x̄‖²_M.

Using this with M = Z_{k+1}M_{k+1}, we rewrite (11.31) as (11.29). Summing (11.29) over k = 0, ..., N − 1, we obtain (11.30). ∎
11.5 ergodic gaps for accelerated primal-dual methods
To derive ergodic gap estimates for the accelerated primal-dual proximal splitting of Theorem 10.8, we need to perform significant additional work, due to the fact that η_k ≔ φ_kτ_k ≠ ψ_{k+1}σ_{k+1}. The overall idea of the proof remains the same, but we need to pay special attention to the blockwise structure of the problem and to do some realignment of the blocks to get the same factor η_k in front of both F and G*.
duality gap realignment
We continue with the problem (11.5) and the setup (11.6). Working with the general scheme (11.27), we write

(11.32a)  H_{k+1}(u) ≔ W_{k+1}(∂G̃(u) + ∇F̃(u^k) + Ξu),

taking as in Theorem 10.8 the testing and step length operators

(11.32b)  Z_{k+1} ≔ ( φ_k Id   0
                       0        ψ_{k+1}Id )  and  W_{k+1} ≔ ( τ_k Id   0
                                                               0        σ_{k+1}Id )
for some step length and testing parameters τ_k, σ_{k+1}, φ_k, ψ_{k+1} > 0. Throughout this section we also take

(11.32c)  Γ ≔ ( γ·Id   0
                 0      ρ·Id )  and  Λ ≔ ( L·Id   0
                                            0      0 ).

For the moment, we do not yet need to know the specific structure of M_{k+1}; hence the following estimates apply not only to the PDPS method but also to the PDES method and its potential accelerated variants.
Lemma 11.13. Let us be given K ∈ 𝕃(X;Y), F = F0 + E with F0: X → ℝ, E: X → ℝ, and G*: Y → ℝ convex, proper, and lower semicontinuous on Hilbert spaces X and Y. Suppose F0 and G* are (strongly) convex for some γ, ρ ≥ 0, and E has L-Lipschitz continuous gradient. With the setup of (11.6) and (11.32), for any u, ū ∈ X × Y and any k ∈ ℕ we have

⟨H_{k+1}(u), u − ū⟩_{Z_{k+1}} ≥ G_{k+1}(u; ū) + ½‖u − ū‖²_{Z_{k+1}W_{k+1}(2Ξ+Γ)} − ½‖u − u^k‖²_{Z_{k+1}W_{k+1}Λ}

for

G_{k+1}(u; ū) ≔ φ_kτ_k(F(x) − F(x̄)) + ψ_{k+1}σ_{k+1}(G*(y) − G*(ȳ))
  + ⟨(φ_kτ_kK*)ȳ, x⟩_X − ⟨(ψ_{k+1}σ_{k+1}K)x̄, y⟩_Y − ⟨(Kφ_kτ_k − ψ_{k+1}σ_{k+1}K)x̄, ȳ⟩_Y.
Proof. Expanding H_{k+1}, we have

⟨H_{k+1}(u), u − ū⟩_{Z_{k+1}} = φ_kτ_k⟨∂F0(x), x − x̄⟩_X + φ_kτ_k⟨∇E(x^k), x − x̄⟩_X + ψ_{k+1}σ_{k+1}⟨∂G*(y), y − ȳ⟩_Y
  + ⟨(φ_kτ_kK*)y, x − x̄⟩_X − ⟨(ψ_{k+1}σ_{k+1}K)x, y − ȳ⟩_Y.

Observe that

⟨(φ_kτ_kK*)y, x − x̄⟩_X − ⟨(ψ_{k+1}σ_{k+1}K)x, y − ȳ⟩_Y
  = ⟨(Kφ_kτ_k − ψ_{k+1}σ_{k+1}K)(x − x̄), y − ȳ⟩_Y + ⟨(φ_kτ_kK*)ȳ, x − x̄⟩_X − ⟨(ψ_{k+1}σ_{k+1}K)x̄, y − ȳ⟩_Y
  = ½‖u − ū‖²_{2Z_{k+1}W_{k+1}Ξ} − ⟨(Kφ_kτ_k − ψ_{k+1}σ_{k+1}K)x̄, ȳ⟩_Y + ⟨(φ_kτ_kK*)ȳ, x⟩_X − ⟨(ψ_{k+1}σ_{k+1}K)x̄, y⟩_Y.

Therefore

(11.33)  ⟨H_{k+1}(u), u − ū⟩_{Z_{k+1}} = φ_kτ_k⟨∂F0(x), x − x̄⟩_X + φ_kτ_k⟨∇E(x^k), x − x̄⟩_X + ψ_{k+1}σ_{k+1}⟨∂G*(y), y − ȳ⟩_Y
  + ½‖u − ū‖²_{2Z_{k+1}W_{k+1}Ξ} − ⟨(Kφ_kτ_k − ψ_{k+1}σ_{k+1}K)x̄, ȳ⟩_Y + ⟨(φ_kτ_kK*)ȳ, x⟩_X − ⟨(ψ_{k+1}σ_{k+1}K)x̄, y⟩_Y.
Due to the smoothness three-point corollaries, specifically (7.8), we have

(11.34a)  ⟨∇E(x^k), x − x̄⟩_X ≥ E(x) − E(x̄) − (L/2)‖x − x^k‖²_X.

Also, by the (strong) convexity of F0, we have

(11.34b)  ⟨∂F0(x), x − x̄⟩_X ≥ F0(x) − F0(x̄) + (γ/2)‖x − x̄‖²_X,

as well as, by the (strong) convexity of G*,

(11.34c)  ⟨∂G*(y), y − ȳ⟩_Y ≥ G*(y) − G*(ȳ) + (ρ/2)‖y − ȳ‖²_Y.

Applying these estimates in (11.33), and using the structure (11.32b) and (11.32c) of the involved operators, we obtain the claim. ∎
If φ_kτ_k = ψ_{k+1}σ_{k+1}, clearly G_{k+1}(u^{k+1}; ū) ≥ φ_kτ_k G_L(u^{k+1}; ū). This is the case in the unaccelerated setting already considered in Theorems 11.10 and 11.11; some specific stochastic accelerated algorithms also satisfy this [see Valkonen 2019]. Applying the techniques of Section 11.3, we could then use Jensen's inequality to estimate Σ_{k=0}^{N−1} G_{k+1}(u^{k+1}; ū) ≥ Σ_{k=0}^{N−1} φ_kτ_k G_L(u^{k+1}; ū) further from below to obtain a gap on suitable ergodic sequences. However, in our primary accelerated algorithm of interest, the PDPS method, instead φ_kτ_k = ψ_kσ_k. We will therefore have to do some rearrangements.
Lemma 11.14. Let K ∈ 𝕃(X;Y), F = F0 + E with F0: X → ℝ, E: X → ℝ, and G*: Y → ℝ convex, proper, and lower semicontinuous on Hilbert spaces X and Y. Suppose F0 and G* are (strongly) convex for some γ, ρ ≥ 0, and E has L-Lipschitz gradient. With the setup of (11.6) and (11.32), suppose φ_kτ_k = ψ_kσ_k. If û ∈ H^{−1}(0), then

(11.35)  ⟨H_{k+1}(u^{k+1}), u^{k+1} − û⟩_{Z_{k+1}} ≥ G_{*,k+1}(x^{k+1}, y^k; û) + ½‖u^{k+1} − û‖²_{Z_{k+1}W_{k+1}(2Ξ+Γ)} − ½‖u^{k+1} − u^k‖²_{Z_{k+1}W_{k+1}Λ}

for some G_{*,k+1}(x^{k+1}, y^k; û) satisfying, with G_L given by (11.7), the estimate

(11.36)  Σ_{k=0}^{N−1} G_{*,k+1}(x^{k+1}, y^k; û) ≥ Σ_{k=1}^{N−1} φ_kτ_k G_L(x^{k+1}, y^k; û)  (N ≥ 2).

Proof. First, note that (11.35) holds for

G_{*,k+1}(x^{k+1}, y^k; û) ≔ inf_{w^{k+1}∈H_{k+1}(u^{k+1})} ⟨w^{k+1}, u^{k+1} − û⟩_{Z_{k+1}}
  − ½‖u^{k+1} − û‖²_{Z_{k+1}W_{k+1}(2Ξ+Γ)} + ½‖u^{k+1} − u^k‖²_{Z_{k+1}W_{k+1}Λ}.
It remains to prove the estimate (11.36) for this choice.

With N ≥ 1, let us define

S_N ≔ Σ_{k=0}^{N−1} (⟨H_{k+1}(u^{k+1}), u^{k+1} − û⟩_{Z_{k+1}} − ½‖u^{k+1} − û‖²_{Z_{k+1}W_{k+1}(2Ξ+Γ)} + ½‖u^{k+1} − u^k‖²_{Z_{k+1}W_{k+1}Λ})
  = Σ_{k=0}^{N−1} (φ_kτ_k(⟨∂F0(x^{k+1}) + ∇E(x^k), x^{k+1} − x̂⟩_X − (γ/2)‖x^{k+1} − x̂‖²_X + (L/2)‖x^{k+1} − x^k‖²_X)
    + ψ_{k+1}σ_{k+1}(⟨∂G*(y^{k+1}), y^{k+1} − ŷ⟩_Y − (ρ/2)‖y^{k+1} − ŷ‖²_Y)
    + ⟨(φ_kτ_kK*)ŷ, x^{k+1} − x̂⟩_X − ⟨(ψ_{k+1}σ_{k+1}K)x̂, y^{k+1} − ŷ⟩_Y).

Observe that in the second expression, the quadratic term ½‖u^{k+1} − û‖²_{2Z_{k+1}W_{k+1}Ξ} has canceled the corresponding component of H_{k+1}. Then it is enough to prove that S_N ≥ Σ_{k=1}^{N−1} φ_kτ_k G_L(x^{k+1}, y^k; û). To do this, we need to shift y^{k+1} to y^k. With N ≥ 2, we therefore rearrange terms to obtain S_N = A_N + B_N for

A_N ≔ φ_0τ_0(⟨∂F0(x^1) + ∇E(x^0), x^1 − x̂⟩_X − (γ/2)‖x^1 − x̂‖²_X + (L/2)‖x^1 − x^0‖²_X)
  + ψ_Nσ_N(⟨∂G*(y^N), y^N − ŷ⟩_Y − (ρ/2)‖y^N − ŷ‖²_Y)
  − ⟨(Kφ_0τ_0 − ψ_Nσ_NK)x̂, ŷ⟩_Y + ⟨(φ_0τ_0K*)ŷ, x^1⟩_X − ⟨(ψ_Nσ_NK)x̂, y^N⟩_Y

and

B_N ≔ Σ_{k=1}^{N−1} (φ_kτ_k(⟨∂F0(x^{k+1}) + ∇E(x^k), x^{k+1} − x̂⟩_X − (γ/2)‖x^{k+1} − x̂‖²_X + (L/2)‖x^{k+1} − x^k‖²_X)
  + ψ_kσ_k(⟨∂G*(y^k), y^k − ŷ⟩_Y − (ρ/2)‖y^k − ŷ‖²_Y)
  + ⟨(φ_kτ_kK*)ŷ, x^{k+1}⟩_X − ⟨(φ_kτ_kK)x̂, y^k⟩_Y).

Observe that in B_N we only sum over k = 1, ..., N − 1 instead of k = 0, ..., N − 1.
We can now use (11.34) and our assumption φ_kτ_k = ψ_kσ_k to estimate

(11.37)  B_N ≥ Σ_{k=1}^{N−1} φ_kτ_k G_L(x^{k+1}, y^k; û).

By Corollary 7.2, E satisfies the three-point monotonicity estimate (7.9); in particular,

⟨∇E(x^0) − ∇E(x̂), x^1 − x̂⟩_X ≥ −(L/2)‖x^1 − x^0‖²_X.
Since Kx̂ ∈ ∂G*(ŷ) and −K*ŷ ∈ ∂F0(x̂) + ∇E(x̂), and ∂F0 and ∂G* are strongly monotone, we also obtain

⟨∂F0(x^1) + ∇E(x̂) + K*ŷ, x^1 − x̂⟩_X − (γ/2)‖x^1 − x̂‖²_X ≥ 0  and
⟨∂G*(y^N) − Kx̂, y^N − ŷ⟩_Y − (ρ/2)‖y^N − ŷ‖²_Y ≥ 0.

Rearranging and using these estimates, we obtain

(11.38)  A_N = φ_0τ_0(⟨∂F0(x^1) + ∇E(x^0) + K*ŷ, x^1 − x̂⟩_X − (γ/2)‖x^1 − x̂‖²_X + (L/2)‖x^1 − x^0‖²_X)
  + ψ_Nσ_N(⟨∂G*(y^N) − Kx̂, y^N − ŷ⟩_Y − (ρ/2)‖y^N − ŷ‖²_Y) ≥ 0.

The estimates (11.37) and (11.38) finally give S_N ≥ Σ_{k=1}^{N−1} φ_kτ_k G_L(x^{k+1}, y^k; û), as we set out to prove. ∎
In the proof of Lemma 11.14, we required û ∈ H^{−1}(0) to show that A_N ≥ 0. Therefore, as the estimate (11.35) will not hold for an arbitrary base point ū in place of û, we will not be able to obtain for accelerated methods the convergence of the partial duality gap (11.8) that we obtained for unaccelerated methods.

The next theorem is our main result regarding ergodic gaps for general accelerated methods. As γ and ρ feature as acceleration parameters in the algorithms, the conditions of this theorem imply that gap estimates require slower acceleration.
Theorem 11.15. Let K ∈ 𝕃(X;Y), F = F0 + E with F0: X → ℝ, E: X → ℝ, and G*: Y → ℝ convex, proper, and lower semicontinuous on Hilbert spaces X and Y. Suppose F0 and G* are (strongly) convex for some γ, ρ ≥ 0, and E has L-Lipschitz gradient. Assume the setup (11.6) and (11.32). For each k ∈ ℕ, also take M_{k+1} ∈ 𝕃(X × Y; X × Y) such that Z_{k+1}M_{k+1} is self-adjoint. Pick an initial iterate u^0 ∈ X × Y and suppose {u^{k+1} = (x^{k+1}, y^{k+1})}_{k∈ℕ} are generated by (11.27). Let û = (x̂, ŷ) ∈ H^{−1}(0). If φ_kτ_k = ψ_kσ_k and

(11.39)  ½‖u^{k+1} − u^k‖²_{Z_{k+1}(M_{k+1}−W_{k+1}Λ)} + ½‖u^{k+1} − û‖²_{Z_{k+1}(M_{k+1}+W_{k+1}(2Ξ+Γ))−Z_{k+2}M_{k+2}} ≥ 0,

then

(11.40)  ½‖u^N − û‖²_{Z_{N+1}M_{N+1}} + ζ_{*,N} G_L(x̃_{*,N}, ỹ_{*,N}; û) ≤ ‖u^0 − û‖²_{Z_1M_1}  (N ≥ 2)

for G_L given by (11.7) and the ergodic sequences

x̃_{*,N} ≔ ζ_{*,N}^{−1} Σ_{k=1}^{N−1} φ_kτ_k x^{k+1}  and  ỹ_{*,N} ≔ ζ_{*,N}^{−1} Σ_{k=1}^{N−1} φ_kτ_k y^k  for  ζ_{*,N} ≔ Σ_{k=1}^{N−1} η_k.
Proof. Using (11.35) in (11.39), we obtain (11.28) for V_{k+1}(û) ≔ G_{*,k+1}(x^{k+1}, y^k; û). By (11.36) and Jensen's inequality,

Σ_{k=0}^{N−1} G_{*,k+1}(x^{k+1}, y^k; û) ≥ ζ_{*,N} G_L(x̃_{*,N}, ỹ_{*,N}; û).

We therefore obtain (11.40) from (11.30) in Theorem 11.12. ∎
accelerated primal-dual proximal splitting
We now obtain gap estimates for the accelerated PDPS method. Observe the factor-of-two differences in the definitions of ω_k and in the initial conditions for the step lengths in the following theorem compared to Theorem 10.8. Because strong convexity with factor γ implies strong convexity with the factor γ/2, the conditions and step length rules of this theorem imply the iterate convergence results of Corollary 9.18 and Theorem 10.8 as well.
Theorem 11.16 (gap estimates for PDPS). Let F0: X → ℝ, E: X → ℝ, and G: Y → ℝ be convex, proper, and lower semicontinuous on Hilbert spaces X and Y with ∇E L-Lipschitz. Also let K ∈ 𝕃(X;Y), and let û = (x̂, ŷ) be a primal-dual solution to the problem (11.5). Pick initial step lengths τ_0, σ_0 > 0 subject to Lτ_0 + τ_0σ_0‖K‖²_{𝕃(X;Y)} < 1. For any initial iterate u^0 ∈ X × Y, suppose {u^{k+1}}_{k∈ℕ} are generated by the (accelerated) PDPS method (10.23). Let the Lagrangian duality gap functional G_L be given by (11.7), and the ergodic iterates x̃_{*,N} and ỹ_{*,N} by Theorem 11.15.

(i) If we take τ_k ≡ τ_0 and σ_k ≡ σ_0, then the ergodic gap G_L(x̃_{*,N}, ỹ_{*,N}; û) → 0 at the rate O(1/N).

(ii) If F0 is strongly convex with factor γ > 0, and we take

ω_k ≔ 1/√(1 + γτ_k),  τ_{k+1} ≔ τ_kω_k,  and  σ_{k+1} ≔ σ_k/ω_k,

then G_L(x̃_{*,N}, ỹ_{*,N}; û) → 0 at the rate O(1/N²).

(iii) If both F0 and G* are strongly convex with respective factors γ > 0 and ρ > 0, and we take

ω_k ≔ 1/√(1 + θ),  θ ≔ min{ρσ_0, γτ_0},  τ_k ≡ τ_0,  and  σ_k ≡ σ_0,

then G_L(x̃_{*,N}, ỹ_{*,N}; û) → 0 linearly.
Proof. We use Theorem 11.15 in place of Theorem 10.5 in the proof of Theorem 10.8. We recall that the latter consists of showing Z_{k+1}M_{k+1} to be self-adjoint and (10.17) and M ≥ Λ to hold, i.e.,

Z_{k+1}(M_{k+1} + 2W_{k+1}Γ) ≥ Z_{k+2}M_{k+2}  and  Z_{k+1}(M_{k+1} − W_{k+1}Λ/2) ≥ 0.
Now, to prove (11.39), we instead prove the self-adjointness as well as

Z_{k+1}(M_{k+1} + W_{k+1}Γ) ≥ Z_{k+2}M_{k+2}  and  Z_{k+1}(M_{k+1} − W_{k+1}Λ) ≥ 0.

These all follow from the proof of Theorem 10.8, with the factor-of-two differences in the formulas for ω_k and in the initialization condition apparent from the statements of these two theorems. The proof of Theorem 10.8 also verifies that φ_kτ_k = ψ_kσ_k.

All the conditions of Theorem 11.15 are therefore satisfied, so (11.40) holds; in particular,

ζ_{*,N} G_L(x̃_{*,N}, ỹ_{*,N}; û) ≤ C_0 ≔ ‖u^0 − û‖²_{Z_1M_1}

for all N ≥ 2. It remains to study the convergence rate of the gap from this estimate. We have ζ_{*,N} = Σ_{k=1}^{N−1} φ_k^{1/2}. In the unaccelerated case (γ = 0), we get ζ_{*,N} = (N − 1)φ_0^{1/2}. This gives the claimed O(1/N) rate. In the accelerated case, φ_k is of the order Ω(k²) by the proof of Theorem 10.8; therefore ζ_{*,N} is of the order Ω(N²), so we get the claimed O(1/N²) convergence. In the linear convergence case, likewise, φ_k grows exponentially, and therefore so does ζ_{*,N}. ∎
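The step length rules of case (ii) are easy to tabulate numerically (an illustrative sketch with made-up initial values): the product σ_kτ_k stays constant, so the step length condition is preserved, while τ_k decays like O(1/k), which is what drives the O(1/N²) rate.

```python
# Acceleration rule of Theorem 11.16 (ii):
#   omega_k = 1/sqrt(1 + gamma*tau_k), tau_{k+1} = tau_k*omega_k,
#   sigma_{k+1} = sigma_k/omega_k.
# The product sigma_k * tau_k is invariant, and tau_k behaves like 2/(gamma*k).

gamma, tau, sigma = 1.0, 0.5, 0.5
prod0 = sigma * tau
taus = [tau]
for _ in range(1000):
    omega = 1.0 / (1.0 + gamma * tau) ** 0.5
    tau, sigma = tau * omega, sigma / omega
    taus.append(tau)

invariant_ok = abs(sigma * tau - prod0) < 1e-9
```

Summing the resulting test parameters φ_k (of order k²) recovers the ζ_{*,N} growth used in the proof above.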
Remark 11.17 (spatially adaptive and stochastic methods). Recalling the block-separability of Example 10.3, consider the spaces X = X_1 × ··· × X_m and Y = Y_1 × ··· × Y_n. Suppose F(x) = Σ_{j=1}^m F_j(x_j) and G*(y) = Σ_{ℓ=1}^n G*_ℓ(y_ℓ) for x = (x_1, ..., x_m) ∈ X and y = (y_1, ..., y_n) ∈ Y. Take

Z_{k+1} ≔ ( Φ_k   0
             0     Ψ_{k+1} )  as well as  W_{k+1} ≔ ( T_k   0
                                                       0     Σ_{k+1} )

for T_k ≔ Σ_{j=1}^m τ_{k,j}P_j, and similar expressions for Φ_k, Ψ_{k+1}, and Σ_{k+1}, where P_jx ≔ x_j projects onto X_j. Instead of φ_kτ_k = ψ_kσ_k that we required in (10.8), imposing 𝔼[Φ_kT_k] = η_k·Id and 𝔼[Ψ_kΣ_k] = η_k·Id for some scalar η_k, we may then start following through the proof of Theorem 10.8 to derive stochastic block-coordinate methods that randomly update only some of the blocks on each iteration, as well as methods that adapt the blockwise step lengths to the spatial or blockwise structure of the problem. With somewhat more effort, we can also follow through the proofs of the present Section 11.5. Specifically, if we replace our ergodic sequences by

x̃_{*,N} ≔ ζ_{*,N}^{−1} 𝔼[Σ_{k=1}^{N−1} Φ_kT_k x^{k+1}]  and  ỹ_{*,N} ≔ ζ_{*,N}^{−1} 𝔼[Σ_{k=1}^{N−1} Ψ_kΣ_k y^k]  for  ζ_{*,N} ≔ Σ_{k=1}^{N−1} η_k,

we then obtain in place of (11.40) the estimate

𝔼[½‖u^N − û‖²_{Z_{N+1}M_{N+1}}] + ζ_{*,N} G_L(x̃_{*,N}, ỹ_{*,N}; û) + Σ_{k=0}^{N−1} 𝔼[V_{k+1}(û)] ≤ ‖u^0 − û‖²_{Z_1M_1}.

If instead 𝔼[Φ_kT_k] = η_k·Id and 𝔼[Ψ_{k+1}Σ_{k+1}] = η_k·Id, we get the result for the ergodic sequences

x̃_N ≔ ζ_N^{−1} 𝔼[Σ_{k=0}^{N−1} Φ_kT_k x^{k+1}]  and  ỹ_N ≔ ζ_N^{−1} 𝔼[Σ_{k=0}^{N−1} Ψ_{k+1}Σ_{k+1} y^{k+1}],  where  ζ_N ≔ Σ_{k=0}^{N−1} η_k.

In either case, if we do not or cannot, due to lack of strong convexity of some of the F_j, accelerate all of the blockwise step lengths τ_{k+1,j} with the same factor γ = γ_j, it will generally be the case that 𝔼[V_{k+1}(û)] < 0. This quantity will have such an order of magnitude that we get mixed O(1/N²) + O(1/N) convergence rates. We refer to [Valkonen 2019] for details on such spatially adaptive and stochastic primal-dual methods, and to [Wright 2015] for an introduction to the idea of stochastic coordinate descent.
11.6 convergence of the admm
Let G: X → ℝ and F: Z → ℝ be convex, proper, and lower semicontinuous, A ∈ 𝕃(X;Y), B ∈ 𝕃(Z;Y), and c ∈ Y. Recall the problem

(11.41)  min_{x,z} J(x, z) ≔ G(x) + F(z) + δ_C(x, z),  where  C ≔ {(x, z) ∈ X × Z | Ax + Bz = c}.

We now show an ergodic convergence result for the ADMM applied to this problem, which we recall from (8.30) to read

(11.42)  x^{k+1} ∈ (A*A + ρ^{−1}∂G)^{−1}(A*(c − Bz^k − ρ^{−1}λ^k)),
         z^{k+1} ∈ (B*B + ρ^{−1}∂F)^{−1}(B*(c − Ax^{k+1} − ρ^{−1}λ^k)),
         λ^{k+1} ≔ λ^k + ρ(Ax^{k+1} + Bz^{k+1} − c).

The general structure of the convergence proof is very similar to that of all the other algorithms we have studied. However, now the forward-step component does not arise from a gradient ∇E but is a special nonself-adjoint preconditioner M̃_{k+1}. Moreover, in the first stage of the proof we obtain a convergence estimate for a duality gap that we then refine at the end of the proof to separate function value and constraint satisfaction estimates.
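Before the theorem, a minimal numerical sketch (scalar data of our own, written in a generic scaled form of the ADMM updates rather than the resolvent form (11.42)) for min G(x) + F(z) subject to x − z = 0, with G(x) = x²/2 and F(z) = (z − d)²/2:

```python
# Toy scaled-form ADMM for min G(x) + F(z) s.t. x - z = 0, with
# G(x) = x^2/2 and F(z) = (z - d)^2/2; the solution is x = z = d/2.
# Here u = lambda/rho is the scaled multiplier.

d, rho = 2.0, 1.0
x = z = u = 0.0
for _ in range(50):
    x = rho * (z - u) / (1.0 + rho)        # argmin_x G(x) + rho/2 (x - z + u)^2
    z = (d + rho * (x + u)) / (1.0 + rho)  # argmin_z F(z) + rho/2 (x - z + u)^2
    u = u + x - z                          # multiplier update

residual = abs(x - z)
```

Both the function values and the constraint residual ‖Ax + Bz − c‖ converge, mirroring the two separate estimates of the theorem below; on this strongly convex example the convergence is in fact immediate rather than merely O(1/N).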
Theorem 11.18. Let G: X → ℝ and F: Z → ℝ be convex, proper, and lower semicontinuous, A ∈ 𝕃(X;Y), B ∈ 𝕃(Z;Y), and c ∈ Y. Let J be defined as in (11.41), which we assume to admit a solution (x̂, ẑ) ∈ X × Z. For arbitrary initial iterates (x^0, z^0, λ^0), let {(x^{k+1}, z^{k+1}, λ^{k+1})}_{k∈ℕ} ⊂ X × Z × Y be generated by the ADMM (11.42) for (11.41). Define the ergodic sequences x̃_N ≔ (1/N) Σ_{k=0}^{N−1} x^{k+1} and z̃_N ≔ (1/N) Σ_{k=0}^{N−1} z^{k+1}. Then both G(x̃_N) + F(z̃_N) → min J and ‖Ax̃_N + Bz̃_N − c‖_Y → 0 at the rate O(1/N).
Proof. We consider the augmented problem
\[
\min_{(x,z)\in X\times Z} J_\rho(x,z) := F(x) + G(z) + \delta_C(x,z) + \frac{\rho}{2}\|Ax + Bz - c\|_Y^2,
\]
which has the same solutions as (11.41). As the normal cone to the constraint set $C$ at any point $(x,z) \in C$ is given by $N_C(x,z) = \{(A^*\lambda, B^*\lambda) \mid \lambda \in Y\}$, setting $u = (x,z,\lambda)$ and
\[
H(u) := \begin{pmatrix} \partial F(x) + A^*\lambda + \rho A^*(Ax+Bz-c) \\ \partial G(z) + B^*\lambda + \rho B^*(Ax+Bz-c) \\ -(Ax+Bz-c) \end{pmatrix},
\]
the optimality conditions for this problem can be written as $0 \in H(u)$. In particular, there exists $\hat\lambda \in Y$ such that $(\hat x, \hat z, \hat\lambda) \in H^{-1}(0)$. However, we will not be needing this, and take $\hat\lambda$ arbitrary.
We could rewrite the algorithm (11.42) as (11.27) with
\[
H_{k+1}(u) = H(u) \quad\text{and}\quad M_{k+1} = \begin{pmatrix} 0 & -\rho A^*B & -A^* \\ 0 & 0 & -B^* \\ 0 & 0 & \rho^{-1}I \end{pmatrix}.
\]
However, $M_{k+1}$ is nonsymmetric, and any symmetrizing $Z_{k+1}$ would make $Z_{k+1}H_{k+1}$ difficult to analyze. We therefore take instead
\[
H_{k+1}(u) := H(u) + \tilde\Lambda_{k+1}(u - u^k) \quad\text{with}\quad \tilde\Lambda_{k+1} := \begin{pmatrix} 0 & -\rho A^*B & -A^* \\ 0 & -\rho B^*B & -B^* \\ 0 & 0 & 0 \end{pmatrix},
\]
as well as
\[
M_{k+1} := \begin{pmatrix} 0 & 0 & 0 \\ 0 & \rho B^*B & 0 \\ 0 & 0 & \rho^{-1}I \end{pmatrix} \quad\text{and}\quad Z_{k+1} := I.
\]
Clearly $Z_{k+1}M_{k+1}$ is self-adjoint.
Let us set
\[
\Gamma := \rho\begin{pmatrix} A^*A & A^*B & 0 \\ B^*A & B^*B & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad\text{and}\quad \Lambda := \begin{pmatrix} 0 & 0 & A^* \\ 0 & 0 & B^* \\ -A & -B & 0 \end{pmatrix}.
\]
Using the fact that $A\hat x + B\hat z = c$, observe that we can split $H = \partial\mathcal{F} + \Lambda$, where
\[
\mathcal{F}(u) := F(x) + G(z) + \frac{\rho}{2}\|Ax + Bz - c\|_Y^2 + \langle c, \lambda\rangle_Y
= F(x) + G(z) + \frac{1}{2}\|u - \hat u\|_\Gamma^2 + \langle c, \lambda\rangle_Y.
\]
It follows that
\[
\langle H(u^{k+1}), u^{k+1} - \hat u\rangle_{Z_{k+1}} \ge \mathcal{F}(u^{k+1}) - \mathcal{F}(\hat u) + \tfrac{1}{2}\|u^{k+1}-\hat u\|_\Gamma^2 + \langle \hat u, u^{k+1}\rangle_\Lambda
= [F(x^{k+1})+G(z^{k+1})] - [F(\hat x)+G(\hat z)] + \langle c, \lambda^{k+1}-\hat\lambda\rangle_Y + \|u^{k+1}-\hat u\|_\Gamma^2 + \langle \hat u, u^{k+1}\rangle_\Lambda.
\]
Again using $A\hat x + B\hat z = c$, we expand
\[
\langle \hat u, u^{k+1}\rangle_\Lambda = \langle\hat\lambda, Ax^{k+1}+Bz^{k+1}\rangle_Y - \langle A\hat x + B\hat z, \lambda^{k+1}\rangle_Y
= \langle\hat\lambda, Ax^{k+1}+Bz^{k+1}-c\rangle_Y - \langle c, \lambda^{k+1}-\hat\lambda\rangle_Y.
\]
Thus
\[
(11.43)\qquad \langle H(u^{k+1}), u^{k+1} - \hat u\rangle_{Z_{k+1}} \ge [F(x^{k+1})+G(z^{k+1})] - [F(\hat x)+G(\hat z)] + \|u^{k+1}-\hat u\|_\Gamma^2 + \langle\hat\lambda, Ax^{k+1}+Bz^{k+1}-c\rangle_Y
= \tilde{\mathcal{F}}(u^{k+1};\hat\lambda) - \tilde{\mathcal{F}}(\hat u;\hat\lambda) + \|u^{k+1}-\hat u\|_\Gamma^2
\]
for
\[
(11.44)\qquad \tilde{\mathcal{F}}(u;\lambda) := F(x) + G(z) + \langle\lambda, Ax + Bz - c\rangle_Y.
\]
On the other hand,
\[
\langle u^{k+1}-u^k, u^{k+1}-\hat u\rangle_{Z_{k+1}\tilde\Lambda_{k+1}}
= \langle -\rho B(z^{k+1}-z^k) - (\lambda^{k+1}-\lambda^k), A(x^{k+1}-\hat x)\rangle_Y
+ \langle -\rho B(z^{k+1}-z^k) - (\lambda^{k+1}-\lambda^k), B(z^{k+1}-\hat z)\rangle_Y
= \langle -\rho B(z^{k+1}-z^k) - (\lambda^{k+1}-\lambda^k), A(x^{k+1}-\hat x) + B(z^{k+1}-\hat z)\rangle_Y.
\]
From (11.42) we recall
\[
\lambda^{k+1} - \lambda^k = \rho(Ax^{k+1} + Bz^{k+1} - c) = \rho[A(x^{k+1}-\hat x) + B(z^{k+1}-\hat z)].
\]
Hence
\[
(11.45)\qquad \langle u^{k+1}-u^k, u^{k+1}-\hat u\rangle_{Z_{k+1}\tilde\Lambda_{k+1}}
= -\|u^{k+1}-\hat u\|_\Gamma^2 - \langle B(z^{k+1}-z^k), \lambda^{k+1}-\lambda^k\rangle_Y
\ge -\|u^{k+1}-\hat u\|_\Gamma^2 - \tfrac{1}{2}\|u^{k+1}-u^k\|^2_{Z_{k+1}M_{k+1}}.
\]
Combining (11.43) and (11.45), it follows that
\[
\langle H_{k+1}(u^{k+1}), u^{k+1}-\hat u\rangle_{Z_{k+1}} \ge \tilde{\mathcal{F}}(u^{k+1};\hat\lambda) - \tilde{\mathcal{F}}(\hat u;\hat\lambda) - \tfrac{1}{2}\|u^{k+1}-u^k\|^2_{Z_{k+1}M_{k+1}}.
\]
By Theorem 11.12 now
\[
\tfrac{1}{2}\|u^N - \hat u\|^2_{Z_{N+1}M_{N+1}} + \sum_{k=0}^{N-1}\bigl(\tilde{\mathcal{F}}(u^{k+1};\hat\lambda) - \tilde{\mathcal{F}}(\hat u;\hat\lambda)\bigr) \le \tfrac{1}{2}\|u^0 - \hat u\|^2_{Z_1M_1} \qquad (N \ge 1).
\]
Writing $\tilde u_N = (\tilde x_N, \tilde z_N, \tilde\lambda_N) := \frac{1}{N}\sum_{k=0}^{N-1}u^{k+1}$, Jensen's inequality now shows that
\[
(11.46)\qquad \tilde{\mathcal{F}}(\tilde u_N;\hat\lambda) - \tilde{\mathcal{F}}(\hat u;\hat\lambda) \le \frac{1}{2N}\|u^0 - \hat u\|^2_{Z_1M_1} \qquad (N \ge 1).
\]
Since $A\hat x + B\hat z = c$, observe that $\tilde{\mathcal{F}}(\,\cdot\,;\hat\lambda) - \tilde{\mathcal{F}}(\hat u;\hat\lambda)$ is the Lagrangian duality gap (11.7) for the minmax formulation (8.27) of (11.41), hence nonnegative when $\hat u \in H^{-1}(0)$. So (11.46) shows the convergence of the duality gap. However, we can improve the result somewhat, since $\hat\lambda$ was taken arbitrary. Expanding $\tilde{\mathcal{F}}$ using (11.44) and taking the supremum over $\hat\lambda \in B(0,\kappa)$ in (11.46), we thus obtain for any $\kappa > 0$ the estimate
\[
0 \le [F(\tilde x_N)+G(\tilde z_N)] - [F(\hat x)+G(\hat z)] + \kappa\|A\tilde x_N + B\tilde z_N - c\|_Y
= \sup_{\hat\lambda\in B(0,\kappa)}\bigl(\tilde{\mathcal{F}}(\tilde u_N;\hat\lambda) - \tilde{\mathcal{F}}(\hat u;\hat\lambda)\bigr)
\le \sup_{\hat\lambda\in B(0,\kappa)}\frac{1}{2N}\|u^0 - \hat u\|^2_{Z_1M_1}.
\]
This gives the claim. □
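As a concrete numerical illustration (our own toy example, not from the text), the ADMM iteration (11.42) can be sketched for a consensus problem with quadratic $F$ and $G$, where every resolvent has a closed form; all problem data below are assumptions made for the sketch.

```python
import numpy as np

# ADMM (11.42) sketch for the toy consensus problem
#   min_{x,z} 1/2||x - a||^2 + 1/2||z - b||^2  subject to  x - z = 0,
# i.e. F(x) = 1/2||x - a||^2, G(z) = 1/2||z - b||^2, A = I, B = -I, c = 0.
# The closed-form solution is x* = z* = (a + b)/2.

def admm(a, b, rho=1.0, iters=200):
    x = np.zeros_like(a)
    z = np.zeros_like(b)
    lam = np.zeros_like(a)
    for _ in range(iters):
        # x-step: 0 in dF(x) + A*lam + rho*A*(Ax + Bz - c), with dF(x) = x - a
        x = (a - lam + rho * z) / (1.0 + rho)
        # z-step: 0 in dG(z) + B*lam + rho*B*(Ax + Bz - c), with dG(z) = z - b
        z = (b + lam + rho * x) / (1.0 + rho)
        # multiplier update: lam^{k+1} = lam^k + rho*(Ax + Bz - c)
        lam = lam + rho * (x - z)
    return x, z, lam

a = np.array([1.0, 3.0])
b = np.array([3.0, -1.0])
x, z, lam = admm(a, b)
# x and z both approach (a + b)/2 = [2, 1]
```

On this strongly convex toy problem the iteration contracts linearly, so far fewer than 200 iterations would already suffice.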
12 META-ALGORITHMS
In this chapter, we consider several "meta-algorithms" for accelerating minimization algorithms such as the ones derived in the previous chapters. These include inertia and over-relaxation, as well as line searches. These schemes differ from the strong convexity based acceleration of Chapter 9 in that no additional assumptions are made on $F$ and $G$. Rather, through the use of an additional extrapolated or interpolated point, the first two schemes attempt to obtain a second-order approximation of the function. Line search, on the other hand, can be used to find optimal parameters or to estimate unknown parameters. Throughout the chapter, we base our work on the abstract algorithm (11.27), i.e.,
\[
(12.1)\qquad 0 \in H_{k+1}(x^{k+1}) + M_{k+1}(x^{k+1} - x^k),
\]
where the iteration-dependent set-valued operator $H_{k+1}\colon X \rightrightarrows X$ in a suitable sense approximates a (monotone) operator $H\colon X \rightrightarrows X$, whose root we intend to find, and $M_{k+1} \in \mathbb{L}(X;X)$ is a linear preconditioner.
12.1 over-relaxation
We start with over-relaxation. Essentially, this amounts to taking (12.1) and replacing $x^k$ in the preconditioner by an over-relaxed point $z^k$, defined for some parameters $\lambda_k > 0$ through the recurrence
\[
(12.2)\qquad z^{k+1} := \lambda_k^{-1}x^{k+1} + (1 - \lambda_k^{-1})z^k.
\]
We thus seek to solve
\[
(12.3)\qquad 0 \in H_{k+1}(x^{k+1}) + M_{k+1}(x^{k+1} - z^k).
\]
Since $z^{k+1} - z^k = \lambda_k^{-1}(x^{k+1} - z^k)$, we can write (12.3) as
\[
(12.4)\qquad 0 \in H_{k+1}(x^{k+1}) + \lambda_k M_{k+1}(z^{k+1} - z^k).
\]
We can therefore lift the overall algorithm into the form (12.1) as
\[
(12.5)\qquad 0 \in \tilde H_{k+1}(q^{k+1}) + \tilde M_{k+1}(q^{k+1} - q^k)
\]
by taking $q = (x, z)$ with
\[
(12.6)\qquad \tilde H_{k+1}(q) := \begin{pmatrix} H_{k+1}(x) \\ \lambda_k^{-1}(z - x) \end{pmatrix}
\quad\text{and}\quad
\tilde M_{k+1} := \begin{pmatrix} 0 & \lambda_k M_{k+1} \\ 0 & (1 - \lambda_k^{-1})I \end{pmatrix}.
\]
To be able to use our previous estimate on $\langle H_{k+1}(x^{k+1}), x^{k+1} - \hat x\rangle_{Z_{k+1}}$, we would like to test with
\[
\tilde Z_{k+1} := \begin{pmatrix} \lambda_k Z_{k+1} & 0 \\ 0 & 0 \end{pmatrix}.
\]
Unfortunately, $\tilde Z_{k+1}\tilde M_{k+1}$ is not self-adjoint, so Theorem 11.12 does not apply. However, observing from (12.2) that
\[
(12.7)\qquad z^{k+1} - x^{k+1} = (1 - \lambda_k)(z^{k+1} - z^k),
\]
we are able to proceed along the same lines of proof.
Theorem 12.1. On a Hilbert space $X$, let $H_{k+1}\colon X \rightrightarrows X$, and $M_{k+1}, Z_{k+1} \in \mathbb{L}(X;X)$ for $k \in \mathbb{N}$. Suppose (12.3) is solvable for the iterates $\{x^k\}_{k\in\mathbb{N}}$. If $Z_{k+1}M_{k+1}$ is self-adjoint,
\[
(12.8)\qquad \lambda_k^2 Z_{k+1}M_{k+1} \ge \lambda_{k+1}^2 Z_{k+2}M_{k+2},
\]
and
\[
(12.9)\qquad \langle H_{k+1}(x^{k+1}), x^{k+1} - \hat x\rangle_{Z_{k+1}} \ge \mathcal{V}_{k+1}(\hat x) - \tfrac{1}{2}\|x^{k+1} - z^k\|^2_{Z_{k+1}D_{k+1}}
\]
for some $D_{k+1} \in \mathbb{L}(X;X)$, for all $k \in \mathbb{N}$ and some $\hat x \in X$ and $\mathcal{V}_{k+1}(\hat x) \in \mathbb{R}$, then
\[
(12.10)\qquad \frac{\lambda_{k+1}^2}{2}\|z^{k+1} - \hat x\|^2_{Z_{k+2}M_{k+2}} + \lambda_k\mathcal{V}_{k+1}(\hat x) + \frac{\lambda_k}{2}\|z^{k+1} - z^k\|^2_{\lambda_k(2\lambda_k-1)Z_{k+1}M_{k+1} - Z_{k+1}D_{k+1}} \le \frac{\lambda_k^2}{2}\|z^k - \hat x\|^2_{Z_{k+1}M_{k+1}} \qquad (k \in \mathbb{N}).
\]
Proof. Taking $\hat q := (\hat x, \hat x)$, we apply $\langle\,\cdot\,, q^{k+1} - \hat q\rangle_{\tilde Z_{k+1}}$ to (12.5). Thus
\[
0 \in \langle\tilde H_{k+1}(q^{k+1}) + \tilde M_{k+1}(q^{k+1} - q^k), q^{k+1} - \hat q\rangle_{\tilde Z_{k+1}}.
\]
Observe that
\[
\tilde Z_{k+1}\tilde M_{k+1} = \begin{pmatrix} 0 & \lambda_k^2 Z_{k+1}M_{k+1} \\ 0 & 0 \end{pmatrix}.
\]
Thus
\[
0 \in \langle H_{k+1}(x^{k+1}), x^{k+1} - \hat x\rangle_{\lambda_k Z_{k+1}} + \lambda_k^2\langle z^{k+1} - z^k, x^{k+1} - \hat x\rangle_{Z_{k+1}M_{k+1}}.
\]
Using (12.7) we then get
\[
0 \in \langle H_{k+1}(x^{k+1}), x^{k+1} - \hat x\rangle_{\lambda_k Z_{k+1}} - \lambda_k^2(1-\lambda_k)\|z^{k+1} - z^k\|^2_{Z_{k+1}M_{k+1}} + \lambda_k^2\langle z^{k+1} - z^k, z^{k+1} - \hat x\rangle_{Z_{k+1}M_{k+1}}.
\]
Using the three-point identity (9.1), we rearrange this into
\[
0 \in \langle H_{k+1}(x^{k+1}), x^{k+1} - \hat x\rangle_{\lambda_k Z_{k+1}} + \frac{\lambda_k^2 - 2\lambda_k^2(1-\lambda_k)}{2}\|z^{k+1} - z^k\|^2_{Z_{k+1}M_{k+1}}
+ \frac{\lambda_k^2}{2}\|z^{k+1} - \hat x\|^2_{Z_{k+1}M_{k+1}} - \frac{\lambda_k^2}{2}\|z^k - \hat x\|^2_{Z_{k+1}M_{k+1}}.
\]
Observe that $\lambda_k^2 - 2\lambda_k^2(1-\lambda_k) = \lambda_k^2(2\lambda_k - 1)$. Using (12.2), (12.9), and (12.8), this gives (12.10). □
Clearly we should try to ensure $\lambda_k(2\lambda_k - 1)Z_{k+1}M_{k+1} \ge Z_{k+1}D_{k+1}$. If $Z_{k+1}M_{k+1} \equiv Z_1M_1$ is constant and $D_{k+1} = 0$, this holds if $\{\lambda_k\}_{k\in\mathbb{N}}$ is nonincreasing and satisfies $\lambda_k \ge 1/2$. Therefore, we cannot get any convergence rates for the iterates in this case. It is, however, possible to obtain convergence of a gap, and it would be possible to obtain weak convergence.
The next result is a variant of Corollary 11.8 for over-relaxed methods.
Corollary 12.2. Let $H := \partial G + \nabla F + \Xi$, where $\Xi \in \mathbb{L}(X;X)$ is skew-adjoint, and $G\colon X \to \overline{\mathbb{R}}$ and $F\colon X \to \mathbb{R}$ are convex, proper, and lower semicontinuous. Suppose $F$ satisfies for some $\Lambda \in \mathbb{L}(X;X)$ the three-point smoothness condition (11.19). Also let $M \in \mathbb{L}(X;X)$ be positive semi-definite and self-adjoint. Pick $x^0 = z^0 \in X$, and define the sequence $\{(x^{k+1}, z^{k+1})\}_{k\in\mathbb{N}}$ through
\[
(12.11)\qquad
\begin{cases}
0 \in [\partial G(x^{k+1}) + \nabla F(z^k) + \Xi x^{k+1}] + M(x^{k+1} - z^k),\\
z^{k+1} := \lambda_k^{-1}x^{k+1} - (\lambda_k^{-1} - 1)z^k.
\end{cases}
\]
Suppose $\{\lambda_k\}_{k\in\mathbb{N}}$ is nonincreasing and
\[
(12.12)\qquad \lambda_k(2\lambda_k - 1)M \ge \Lambda \qquad (k \in \mathbb{N}).
\]
Then for every $\hat x \in H^{-1}(0)$ and the gap functional $\mathcal{G}$ defined in (11.4),
\[
(12.13)\qquad \mathcal{G}(\tilde x_N; \hat x) \le \frac{\lambda_0^2}{2\sum_{k=0}^{N-1}\lambda_k}\|z^0 - \hat x\|_M^2, \quad\text{where}\quad \tilde x_N := \frac{1}{\sum_{k=0}^{N-1}\lambda_k}\sum_{k=0}^{N-1}\lambda_k x^{k+1}.
\]
Proof. The method (12.11) is (12.3) with $H_{k+1}(x) := \partial G(x) + \nabla F(z^k) + \Xi x$ as well as $M_{k+1} \equiv M$ and $Z_{k+1} \equiv \mathrm{Id}$. Using (11.19) for $F$, the convexity of $G$, and the skew-adjointness of $\Xi$, we obtain as in the proof of Corollary 11.8 the estimate
\[
\langle H_{k+1}(x^{k+1}), x^{k+1} - \hat x\rangle \ge \mathcal{G}(x^{k+1}; \hat x) - \tfrac{1}{2}\|z^k - x^{k+1}\|_\Lambda^2.
\]
This provides (12.9), while (12.12) and the constant choice of the testing and preconditioning operators guarantee that $\lambda_k(2\lambda_k - 1)Z_{k+1}M_{k+1} \ge Z_{k+1}D_{k+1}$ for $D_{k+1} \equiv \Lambda$. By Theorem 12.1, we now obtain
\[
(12.14)\qquad \frac{\lambda_{k+1}^2}{2}\|z^{k+1} - \hat x\|_M^2 + \lambda_k\mathcal{G}(x^{k+1}; \hat x) \le \frac{\lambda_k^2}{2}\|z^k - \hat x\|_M^2.
\]
Summing over $k = 0, \dots, N-1$ and an application of Jensen's inequality finishes the proof. □
over-relaxed proximal point method
We apply the above results to the over-relaxed proximal point method
\[
(12.15)\qquad
\begin{cases}
x^{k+1} := \mathrm{prox}_{\tau G}(z^k),\\
z^{k+1} := \lambda_k^{-1}x^{k+1} - (\lambda_k^{-1} - 1)z^k.
\end{cases}
\]
Theorem 12.3. Let $G\colon X \to \overline{\mathbb{R}}$ be convex, proper, and lower semicontinuous with $[\partial G]^{-1}(0) \ne \emptyset$. Pick an initial iterate $x^0 = z^0 \in X$. If $\{\lambda_k\}_{k\in\mathbb{N}} \subset [1/2, \infty)$ is nonincreasing, the ergodic sequence $\{\tilde x_N\}_{N\in\mathbb{N}}$ defined in (12.13) and generated from the iterates $\{x^k\}_{k\in\mathbb{N}}$ of the over-relaxed proximal point method (12.15) satisfies $G(\tilde x_N) \to G_{\min} := \min_{x\in X}G(x)$ at the rate $O(1/N)$.
Proof. We apply Corollary 12.2 with $G = G$, $F = 0$, and $M = \tau^{-1}\mathrm{Id}$. Clearly $F$ satisfies (11.19) with $\Lambda = 0$. Then (12.12) holds if $2\lambda_k \ge 1$, that is to say $\lambda_k \ge 1/2$. For $\hat x \in \operatorname{arg\,min} G$, we have $\mathcal{G}(x; \hat x) = G(x) - G(\hat x) = G(x) - G_{\min}$. Therefore Corollary 12.2 gives
\[
(12.16)\qquad G(\tilde x_N) \le G_{\min} + \frac{\lambda_0^2}{2\tau\sum_{k=0}^{N-1}\lambda_k}\|z^0 - \hat x\|_X^2.
\]
Since $\sum_{k=0}^{N-1}\lambda_k \ge N/2$ by the lower bound on $\lambda_k$, we get the claimed $O(1/N)$ convergence rate of the function values for the ergodic sequence. □
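The following minimal Python sketch (a toy setup of our own) runs the over-relaxed proximal point method (12.15) with a constant $\lambda_k = 0.7 \ge 1/2$ on $G(x) = \frac12\|x - a\|^2$, whose proximal map has the closed form used below, and evaluates $G$ at the ergodic point of (12.13):

```python
import numpy as np

# Over-relaxed proximal point method (12.15) for G(x) = 1/2||x - a||^2,
# whose proximal map is prox_{tau G}(z) = (z + tau*a)/(1 + tau).
# A constant lambda_k >= 1/2 satisfies the assumptions of Theorem 12.3.

def overrelaxed_prox_point(a, tau=1.0, lam=0.7, iters=100):
    z = np.zeros_like(a)
    xs = []
    for _ in range(iters):
        x = (z + tau * a) / (1.0 + tau)        # x^{k+1} = prox_{tau G}(z^k)
        z = x / lam - (1.0 / lam - 1.0) * z    # z^{k+1} per (12.15)
        xs.append(x)
    # ergodic iterate from (12.13); with constant weights it is the plain mean
    return sum(xs) / len(xs)

a = np.array([2.0, -1.0])
x_tilde = overrelaxed_prox_point(a)
gap = 0.5 * np.linalg.norm(x_tilde - a) ** 2   # G(x_tilde) - G_min, G_min = 0
```

The function-value gap decays like $O(1/N)$ here, in line with (12.16).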
over-relaxed forward-backward splitting
For a smooth function $F$, the over-relaxed forward-backward splitting iterates
\[
(12.17)\qquad
\begin{cases}
x^{k+1} := \mathrm{prox}_{\tau G}(z^k - \tau\nabla F(z^k)),\\
z^{k+1} := \lambda_k^{-1}x^{k+1} - (\lambda_k^{-1} - 1)z^k.
\end{cases}
\]
Theorem 12.4. Let $J := G + F$ for $G\colon X \to \overline{\mathbb{R}}$ and $F\colon X \to \mathbb{R}$ convex, proper, and lower semicontinuous with $\nabla F$ $L$-Lipschitz. Suppose $[\partial J]^{-1}(0) \ne \emptyset$. Pick an initial iterate $x^0 = z^0 \in X$. If $\{\lambda_k\}_{k\in\mathbb{N}}$ is nonincreasing and satisfies
\[
(12.18)\qquad \lambda_k \ge \tfrac{1}{4}\bigl(1 + \sqrt{1 + 8L\tau}\bigr),
\]
then the ergodic sequence $\{\tilde x_N\}_{N\in\mathbb{N}}$ defined in (12.13) and generated from the iterates $\{x^k\}_{k\in\mathbb{N}}$ of the over-relaxed forward-backward splitting (12.17) satisfies $J(\tilde x_N) \to J_{\min} := \min_{x\in X}J(x)$ at the rate $O(1/N)$.

Proof. We apply Corollary 12.2 with $G = G$, $F = F$, and $M = \tau^{-1}\mathrm{Id}$. By Corollary 7.2, $F$ satisfies the three-point smoothness condition (11.19) with $\Lambda = L\,\mathrm{Id}$. The condition (12.12) consequently holds if $\lambda_k(2\lambda_k - 1) \ge L\tau$, which holds under the assumption (12.18). The rest follows as in the proof of Theorem 12.3. □
over-relaxed pdps
With $F = F_0 + E\colon X \to \overline{\mathbb{R}}$, $G^*\colon Y \to \overline{\mathbb{R}}$, and $K \in \mathbb{L}(X;Y)$, take $H\colon X \times Y \rightrightarrows X \times Y$ as well as $\mathcal{F}$, $\mathcal{G}$, and $\Xi$ as in (11.6), and the preconditioner $M$ as in (11.25) for fixed step length parameters $\tau, \sigma > 0$. Writing $z^k = (\xi^k, \zeta^k)$, and, as usual, $u^k = (x^k, y^k)$, the method (12.4) then becomes the over-relaxed PDPS method with a forward step, also known as the Vũ–Condat method:
\[
(12.19)\qquad
\begin{cases}
x^{k+1} := (I + \tau\partial F_0)^{-1}(\xi^k - \tau K^*\zeta^k - \tau\nabla E(\xi^k)),\\
\bar x^{k+1} := (x^{k+1} - \xi^k) + x^{k+1},\\
y^{k+1} := (I + \sigma\partial G^*)^{-1}(\zeta^k + \sigma K\bar x^{k+1}),\\
\xi^{k+1} := \lambda_k^{-1}x^{k+1} - (\lambda_k^{-1} - 1)\xi^k,\\
\zeta^{k+1} := \lambda_k^{-1}y^{k+1} - (\lambda_k^{-1} - 1)\zeta^k.
\end{cases}
\]
For the statement of the next result, we recall that for the primal-dual saddle-point operator $H$ from (11.6), the generic gap functional $\mathcal{G}$ becomes the primal-dual gap given in (11.7).
Theorem 12.5. Suppose $F_0\colon X \to \overline{\mathbb{R}}$, $E\colon X \to \mathbb{R}$, and $G\colon Y \to \overline{\mathbb{R}}$ are convex, proper, and lower semicontinuous on Hilbert spaces $X$ and $Y$ with $\nabla E$ $L$-Lipschitz. Let also $K \in \mathbb{L}(X;Y)$. With $F = F_0 + E$, suppose the assumptions of Theorem 5.10 are satisfied. Pick an initial iterate $u^0 = z^0 \in X \times Y$. If the sequence $\{\lambda_k\}_{k\in\mathbb{N}}$ is nonincreasing and satisfies
\[
(12.20)\qquad \lambda_k \ge \tfrac{1}{4}\Bigl(1 + \sqrt{1 + 8L\tau/(1 - \tau\sigma\|K\|^2)}\Bigr) \quad\text{and}\quad \tau\sigma\|K\|^2 < 1,
\]
then the ergodic sequence $\{\tilde u_N = (\tilde x_N, \tilde y_N)\}_{N\in\mathbb{N}}$ defined as in (12.13) and generated from the iterates $\{u^k = (x^k, y^k)\}_{k\in\mathbb{N}}$ of the over-relaxed PDPS method (12.19) satisfies $\mathcal{G}(\tilde x_N, \tilde y_N) \to 0$ at the rate $O(1/N)$.
Proof. We recall that $H^{-1}(0) \ne \emptyset$ under the assumptions of Theorem 5.10. Clearly $M$ is self-adjoint. The condition (12.12) can with (10.32) be reduced to
\[
\begin{pmatrix} \lambda_k(2\lambda_k - 1)\delta\tau^{-1}\mathrm{Id} & 0 \\ 0 & \sigma^{-1}I - \tau(1-\delta)^{-1}KK^* \end{pmatrix}
\ge \begin{pmatrix} L\,\mathrm{Id} & 0 \\ 0 & 0 \end{pmatrix}
\]
for some $\delta \in (0,1)$. As in (10.34) in the proof of Theorem 10.8, these conditions reduce to
\[
(12.21)\qquad \lambda_k(2\lambda_k - 1)\delta \ge \tau L \quad\text{and}\quad 1 - \delta \ge \tau\sigma\|K\|^2.
\]
The first inequality holds if $\lambda_k \ge \frac{1}{4}\bigl(1 + \sqrt{1 + 8L\tau\delta^{-1}}\bigr)$. Solving the second inequality as an equality for $\delta$ yields the condition
\[
\lambda_k \ge \tfrac{1}{4}\Bigl(1 + \sqrt{1 + 8L\tau(1 - \tau\sigma\|K\|^2)^{-1}}\Bigr),
\]
i.e., (12.20). Now we obtain the gap convergence from Corollary 12.2. □
Remark 12.6. The method (12.19) is due to [Vũ 2013; Condat 2013]. The convergence of the ergodic gap was observed in [Chambolle & Pock 2015].
12.2 inertia
Our next, inertial, meta-algorithm will likewise not yield convergence of the main iterates, but through a special arrangement of variables combined with intricate unrolling arguments it is able to do away with the word "ergodic" in the gap estimates. In essence, the meta-algorithm replaces the previous iterate $x^k$ in the linear preconditioner of (12.1) by an inertial point
\[
(12.22)\qquad \bar x^k := (1 + \alpha_k)x^k - \alpha_k x^{k-1} \quad\text{for}\quad \alpha_k := \lambda_k(\lambda_{k-1}^{-1} - 1)
\]
for some inertial parameters $\{\lambda_k\}_{k\in\mathbb{N}}$. We thus solve
\[
(12.23)\qquad 0 \in H_{k+1}(x^{k+1}) + M_{k+1}(x^{k+1} - \bar x^k).
\]
We can relate this to over-relaxation as follows: we simply replace $z^k$ in the definition (12.2) of $z^{k+1}$ by $x^k$, i.e., we take
\[
(12.24)\qquad z^{k+1} := \lambda_k^{-1}x^{k+1} - (\lambda_k^{-1} - 1)x^k.
\]
Since
\[
(12.25)\qquad \lambda_k(z^{k+1} - z^k) = x^{k+1} - (1 - \lambda_k)x^k - \lambda_k[\lambda_{k-1}^{-1}x^k - (\lambda_{k-1}^{-1} - 1)x^{k-1}]
= x^{k+1} - [1 - \lambda_k + \lambda_k\lambda_{k-1}^{-1}]x^k + \lambda_k(\lambda_{k-1}^{-1} - 1)x^{k-1}
= x^{k+1} - [(1 + \alpha_k)x^k - \alpha_k x^{k-1}] = x^{k+1} - \bar x^k,
\]
we obtain the method (12.4), with the differing update (12.24) of $z^{k+1}$. Again we can also lift the overall algorithm into the form (12.1), specifically (12.5), by taking $q = (x, z)$ with
\[
\tilde H_{k+1}(q) := \begin{pmatrix} H_{k+1}(x) \\ z - x \end{pmatrix}
\quad\text{and}\quad
\tilde M_{k+1} := \begin{pmatrix} 0 & \lambda_k M_{k+1} \\ (1 - \lambda_k^{-1})I & 0 \end{pmatrix}.
\]
Now comes the trick with inertial methods: unlike with over-relaxed methods, where we wanted to avoid having to estimate $\langle H_{k+1}(x^{k+1}), z^{k+1} - \hat x\rangle_{Z_{k+1}}$, with inertial methods we are brave enough to do this. Indeed, our specific choice (12.24) makes this possible, as we shall see below. We therefore test with
\[
\tilde Z_{k+1} := \begin{pmatrix} 0 & 0 \\ \lambda_k Z_{k+1} & 0 \end{pmatrix}
\]
to obtain the self-adjoint and positive semi-definite
\[
(12.26)\qquad \tilde Z_{k+1}\tilde M_{k+1} = \begin{pmatrix} 0 & 0 \\ 0 & \lambda_k^2 Z_{k+1}M_{k+1} \end{pmatrix}.
\]
Therefore Theorem 11.12 applies, and we obtain the following:

Theorem 12.7. Let the inertial parameters $\{\lambda_k\}_{k\in\mathbb{N}} \subset (0,\infty)$. On a Hilbert space $X$, let $H_{k+1}\colon X \rightrightarrows X$, and $M_{k+1}, Z_{k+1} \in \mathbb{L}(X;X)$ for $k \in \mathbb{N}$. Suppose (12.23) is solvable for the iterates $\{x^k\}_{k\in\mathbb{N}}$. If $Z_{k+1}M_{k+1}$ is self-adjoint and
\[
(12.27)\qquad \lambda_k\langle H_{k+1}(x^{k+1}), z^{k+1} - \hat x\rangle_{Z_{k+1}} \ge \mathcal{V}_{k+1}(\hat x) + \tfrac{1}{2}\|z^{k+1} - \hat x\|^2_{\lambda_{k+1}^2Z_{k+2}M_{k+2} - \lambda_k^2Z_{k+1}M_{k+1}} - \frac{\lambda_k^2}{2}\|z^{k+1} - z^k\|^2_{Z_{k+1}M_{k+1}}
\]
for all $k \in \mathbb{N}$ and some $\hat x \in X$ and $\mathcal{V}_{k+1}(\hat x) \in \mathbb{R}$, then
\[
(12.28)\qquad \frac{\lambda_N^2}{2}\|z^N - \hat x\|^2_{Z_{N+1}M_{N+1}} + \sum_{k=0}^{N-1}\mathcal{V}_{k+1}(\hat x) \le \frac{\lambda_0^2}{2}\|z^0 - \hat x\|^2_{Z_1M_1} \qquad (N \ge 1).
\]

Proof. This follows directly from Theorem 11.12 and the expansion (12.26). □
We now provide examples of how to apply this result to the proximal point method and forward-backward splitting. As we recall, in these algorithms we take $M_{k+1} = I$ and $Z_{k+1} = \varphi_k I$, with the step length $\tau_k$ absorbed into $H_{k+1}$. To proceed, we will need a few further general-purpose technical lemmas. The first one is the fundamental lemma for inertia, which provides inertial function value unrolling.
Lemma 12.8. Let $G\colon X \to \overline{\mathbb{R}}$ be convex, proper, and lower semicontinuous. Suppose $\lambda_k \in [0,1]$ and $\varphi_k, \tau_k > 0$ for $k \in \mathbb{N}$ with
\[
(12.29)\qquad \varphi_{k+1}\tau_{k+1}(1 - \lambda_{k+1}) \le \varphi_k\tau_k \qquad (k \ge 0).
\]
Assume $p^{k+1} \in \partial G(x^{k+1})$ for $k = 0, \dots, N-1$, and $0 \in \partial G(\hat x)$. Then
\[
(12.30)\qquad S_{G,N} := \sum_{k=0}^{N-1}\varphi_k\tau_k\lambda_k\langle p^{k+1}, z^{k+1} - \hat x\rangle_X
\ge \varphi_{N-1}\tau_{N-1}(G(x^N) - G(\hat x)) - \varphi_0\tau_0(1 - \lambda_0)(G(x^0) - G(\hat x)).
\]

Proof. Using (12.24), observe that
\[
(12.31)\qquad \lambda_k(z^{k+1} - \hat x) = \lambda_k[\lambda_k^{-1}x^{k+1} - (\lambda_k^{-1} - 1)x^k - \hat x]
= \lambda_k(x^{k+1} - \hat x) + (1 - \lambda_k)(x^{k+1} - x^k).
\]
Using (12.31) and the convexity of $G$, we can estimate
\[
(12.32)\qquad S_{G,N} = \sum_{k=0}^{N-1}\varphi_k\tau_k\bigl[\lambda_k\langle p^{k+1}, x^{k+1} - \hat x\rangle_X + (1 - \lambda_k)\langle p^{k+1}, x^{k+1} - x^k\rangle_X\bigr]
\ge \sum_{k=0}^{N-1}\varphi_k\tau_k\bigl[\lambda_k(G(x^{k+1}) - G(\hat x)) + (1 - \lambda_k)(G(x^{k+1}) - G(x^k))\bigr]
= \sum_{k=0}^{N-1}\bigl[\varphi_k\tau_k(G(x^{k+1}) - G(\hat x)) - \varphi_k\tau_k(1 - \lambda_k)(G(x^k) - G(\hat x))\bigr].
\]
Since $G(x^k) \ge G(\hat x)$, the recurrence inequality (12.29) together with a telescoping argument now gives
\[
S_{G,N} \ge \varphi_{N-1}\tau_{N-1}(G(x^N) - G(\hat x)) - \varphi_0\tau_0(1 - \lambda_0)(G(x^0) - G(\hat x)).
\]
This is the claim. □
Lemma 12.9. Suppose $\lambda_0 = 1$ and $\lambda_k^{-2} = \lambda_{k+1}^{-2} - \lambda_{k+1}^{-1}$ for $k = 0, \dots, N-1$. Then
\[
(12.33)\qquad \lambda_{k+1} = \frac{2}{1 + \sqrt{1 + 4\lambda_k^{-2}}} \qquad (k = 0, \dots, N-1)
\]
and $\lambda_N^{-1} \ge (N+1)/2$.
Proof. First, the recurrence (12.33) is the positive solution of the assumed quadratic equation. We show the lower bound by total induction on $N$. Assume that $\lambda_n^{-1} \ge (n+1)/2$ for all $n = 0, \dots, N-1$. Rearranging the original update as
\[
\lambda_{k+1}^{-2} - \lambda_{k+1}^{-1} = \lambda_k^{-2} - \lambda_k^{-1} + \lambda_k^{-1},
\]
summing over $k = 0, \dots, N-1$, and telescoping (using $\lambda_0 = 1$) yields
\[
\lambda_N^{-2} - \lambda_N^{-1} = \sum_{k=0}^{N-1}\lambda_k^{-1}.
\]
From the induction assumption, we thus obtain $\lambda_N^{-2} - \lambda_N^{-1} \ge N(N+1)/4$. Solving this quadratic inequality as an equality then shows that
\[
\lambda_N^{-1} \ge \tfrac{1}{2}\bigl(1 + \sqrt{1 + N(N+1)}\bigr) \ge \tfrac{N+1}{2},
\]
which completes the proof. □
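The parameter rule (12.33) and the growth bound of Lemma 12.9 are easy to check numerically; the following short sketch (ours) generates the sequence:

```python
import math

# FISTA-style parameter rule (12.33): lambda_0 = 1 and
#   lambda_{k+1} = 2 / (1 + sqrt(1 + 4/lambda_k^2)),
# equivalently lambda_k = 1/t_k for the classical t_{k+1} = (1 + sqrt(1 + 4 t_k^2))/2.

def lambda_sequence(n):
    lams = [1.0]
    for _ in range(n):
        lam = lams[-1]
        lams.append(2.0 / (1.0 + math.sqrt(1.0 + 4.0 / lam**2)))
    return lams

lams = lambda_sequence(50)
```

The defining relation $\lambda_k^{-2} = \lambda_{k+1}^{-2} - \lambda_{k+1}^{-1}$ holds by construction, and the growth $\lambda_k^{-1} \ge (k+1)/2$ is what drives the $O(1/N^2)$ rates below.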
inertial proximal point method
Let $H := \partial G$ and $H_{k+1} := \tau\partial G$ for a convex, proper, lower semicontinuous function $G\colon X \to \overline{\mathbb{R}}$. Take $\tau > 0$ and $\lambda_{k+1}$ by (12.33) for $\lambda_0 = 1$. Then (12.23) becomes the inertial proximal point method
\[
(12.34)\qquad
\begin{cases}
x^{k+1} := \mathrm{prox}_{\tau G}(\bar x^k),\\
\alpha_{k+1} := \lambda_{k+1}(\lambda_k^{-1} - 1),\\
\bar x^{k+1} := (1 + \alpha_{k+1})x^{k+1} - \alpha_{k+1}x^k.
\end{cases}
\]
Theorem 12.10. Let $G\colon X \to \overline{\mathbb{R}}$ be convex, proper, and lower semicontinuous. Suppose $[\partial G]^{-1}(0) \ne \emptyset$. Take $\tau > 0$ and $\lambda_0 = 1$, and pick an initial iterate $x^0 = \bar x^0 \in X$. Then the inertial proximal point method (12.34) satisfies $G(x^N) \to G_{\min}$ at the rate $O(1/N^2)$.
Proof. If we take $\tau_k := \tau$ as stated and $\varphi_k := \lambda_k^{-2}$, then the relation $\lambda_k^{-2} = \lambda_{k+1}^{-2} - \lambda_{k+1}^{-1}$ from Lemma 12.9 verifies (12.29). Since now $\lambda_{k+1}^2\varphi_{k+1} = \lambda_k^2\varphi_k$, (12.27) holds if
\[
(12.35)\qquad \lambda_k\varphi_k\tau_k\langle \partial G(x^{k+1}), z^{k+1} - \hat x\rangle_X \ge \mathcal{V}_{k+1}(\hat x) - \frac{\lambda_k^2\varphi_k}{2}\|z^{k+1} - z^k\|_X^2
\]
for some $\mathcal{V}_{k+1}(\hat x) \in \mathbb{R}$. This is verified by Lemma 12.8 for some $\mathcal{V}_{k+1}(\hat x)$ such that
\[
\sum_{k=0}^{N-1}\mathcal{V}_{k+1}(\hat x) \ge \varphi_{N-1}\tau_{N-1}(G(x^N) - G(\hat x)) - \varphi_0\tau_0(1 - \lambda_0)(G(x^0) - G(\hat x)).
\]
Since $\lambda_0 = 1$, Theorem 12.7 gives the estimate
\[
\frac{\varphi_N\lambda_N^2}{2}\|z^N - \hat x\|_X^2 + \varphi_{N-1}\tau_{N-1}(G(x^N) - G(\hat x)) \le \frac{\varphi_0\lambda_0^2}{2}\|x^0 - \hat x\|_X^2.
\]
By Lemma 12.9 now $\varphi_{N-1}\tau_{N-1} = \lambda_{N-1}^{-2}\tau \ge \tau N^2/4$. Therefore we obtain the claimed convergence rate. □
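A minimal numerical sketch of the inertial proximal point method (12.34), again for the quadratic $G(x) = \frac12\|x - a\|^2$ with its explicit proximal map (the problem data are a toy choice of ours):

```python
import numpy as np

# Inertial proximal point method (12.34) for G(x) = 1/2||x - a||^2, whose
# proximal map is prox_{tau G}(v) = (v + tau*a)/(1 + tau), using the
# parameter rule (12.33); Theorem 12.10 gives O(1/N^2) for the values.

def inertial_prox_point(a, tau=1.0, iters=100):
    lam = 1.0
    x = xbar = np.zeros_like(a)
    for _ in range(iters):
        x_new = (xbar + tau * a) / (1.0 + tau)               # prox_{tau G}(xbar^k)
        lam_new = 2.0 / (1.0 + np.sqrt(1.0 + 4.0 / lam**2))  # (12.33)
        alpha = lam_new * (1.0 / lam - 1.0)                  # alpha_{k+1}
        xbar = x_new + alpha * (x_new - x)                   # inertial point (12.34)
        x, lam = x_new, lam_new
    return x

a = np.array([3.0, -2.0])
x = inertial_prox_point(a)
# x approaches the minimizer a
```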
inertial forward-backward splitting
Let $H := \partial G + \nabla F$ and $H_{k+1}(x) := \tau(\partial G(x) + \nabla F(\bar x^k))$ for convex, proper, lower semicontinuous functions $G$ and $F$ with $F$ smooth. Take $\tau > 0$ and $\lambda_{k+1}$ by (12.33) for $\lambda_0 = 1$. Then (12.23) becomes the inertial forward-backward splitting method
\[
(12.36)\qquad
\begin{cases}
x^{k+1} := \mathrm{prox}_{\tau G}(\bar x^k - \tau\nabla F(\bar x^k)),\\
\alpha_{k+1} := \lambda_{k+1}(\lambda_k^{-1} - 1),\\
\bar x^{k+1} := (1 + \alpha_{k+1})x^{k+1} - \alpha_{k+1}x^k.
\end{cases}
\]
To prove the convergence of this method, we need to incorporate the forward step into Lemma 12.8.

Lemma 12.11. Let $J := F + G$ for $G\colon X \to \overline{\mathbb{R}}$ and $F\colon X \to \mathbb{R}$ convex, proper, and lower semicontinuous. Suppose $F$ has $L$-Lipschitz gradient, and that $\lambda_k \in [0,1]$ and $\varphi_k, \tau_k > 0$ satisfy the recurrence inequality (12.29) for $k \in \mathbb{N}$. Assume $w^{k+1} \in \partial G(x^{k+1})$ for all $k = 0, \dots, N-1$, and that $0 \in \partial J(\hat x)$. Then
\[
(12.37)\qquad S_N := \sum_{k=0}^{N-1}\Bigl(\varphi_k\tau_k\lambda_k\langle w^{k+1} + \nabla F(\bar x^k), z^{k+1} - \hat x\rangle_X + \frac{\varphi_k\tau_k\lambda_k^2 L}{2}\|z^{k+1} - z^k\|_X^2\Bigr)
\ge \varphi_{N-1}\tau_{N-1}(J(x^N) - J(\hat x)) - \varphi_0\tau_0(1 - \lambda_0)(J(x^0) - J(\hat x)).
\]
Proof. We recall from (12.25) that $\frac{\lambda_k^2}{2}\|z^{k+1} - z^k\|_X^2 = \frac{1}{2}\|x^{k+1} - \bar x^k\|_X^2$. We therefore estimate using Corollary 7.2 that
\[
(12.38)\qquad S_{F,N} := \sum_{k=0}^{N-1}\Bigl(\varphi_k\tau_k\lambda_k\langle\nabla F(\bar x^k), z^{k+1} - \hat x\rangle_X + \frac{\varphi_k\tau_k\lambda_k^2 L}{2}\|z^{k+1} - z^k\|_X^2\Bigr)
\]
\[
= \sum_{k=0}^{N-1}\varphi_k\tau_k\Bigl[\lambda_k\langle\nabla F(\bar x^k), x^{k+1} - \hat x\rangle_X + (1 - \lambda_k)\langle\nabla F(\bar x^k), x^{k+1} - x^k\rangle_X + \frac{L}{2}\|x^{k+1} - \bar x^k\|_X^2\Bigr]
\]
\[
\ge \sum_{k=0}^{N-1}\varphi_k\tau_k\bigl[\lambda_k(F(x^{k+1}) - F(\hat x)) + (1 - \lambda_k)(F(x^{k+1}) - F(x^k))\bigr]
= \sum_{k=0}^{N-1}\bigl[\varphi_k\tau_k(F(x^{k+1}) - F(\hat x)) - \varphi_k\tau_k(1 - \lambda_k)(F(x^k) - F(\hat x))\bigr].
\]
Summing with the estimate (12.32) for $G$, we deduce
\[
S_N \ge \sum_{k=0}^{N-1}\bigl[\varphi_k\tau_k((F+G)(x^{k+1}) - (F+G)(\hat x)) - \varphi_k\tau_k(1 - \lambda_k)((F+G)(x^k) - (F+G)(\hat x))\bigr].
\]
Since $(F+G)(x^k) \ge (F+G)(\hat x)$, the recurrence inequality (12.29) together with a telescoping argument now gives the claim. □
Theorem 12.12. Let $J := G + F$ for $G\colon X \to \overline{\mathbb{R}}$ and $F\colon X \to \mathbb{R}$ convex, proper, and lower semicontinuous with $\nabla F$ $L$-Lipschitz. Suppose $[\partial J]^{-1}(0) \ne \emptyset$. Take $\tau \in (0, 1/L]$ and $\lambda_0 = 1$, and pick an initial iterate $x^0 = \bar x^0 \in X$. Then the inertial forward-backward splitting (12.36) satisfies $J(x^N) \to \min_{x\in X}J(x)$ at the rate $O(1/N^2)$.
Proof. The proof follows that of Theorem 12.10: in place of (12.35), we reduce (12.27) to the condition
\[
\lambda_k\varphi_k\tau_k\langle \partial G(x^{k+1}) + \nabla F(\bar x^k), z^{k+1} - \hat x\rangle_X \ge \mathcal{V}_{k+1}(\hat x) - \frac{\lambda_k^2\varphi_k}{2}\|z^{k+1} - z^k\|_X^2.
\]
This is verified by using (12.37) in place of Lemma 12.8. □
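As an illustration of (12.36) (a sketch with our own toy data, taking $\tau = 1/L$), consider $F(x) = \frac12\|Qx - b\|^2$ and $G = \|\cdot\|_1$, whose proximal map is soft-thresholding:

```python
import numpy as np

# Inertial forward-backward splitting (12.36) for the toy problem
#   min_x J(x) = 1/2||Qx - b||^2 + ||x||_1,
# with step tau = 1/L for L = ||Q^T Q|| and the parameter rule (12.33).

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def inertial_fb(Q, b, iters=300):
    L = np.linalg.norm(Q.T @ Q, 2)
    tau = 1.0 / L
    lam = 1.0
    x = xbar = np.zeros(Q.shape[1])
    for _ in range(iters):
        grad = Q.T @ (Q @ xbar - b)
        x_new = soft_threshold(xbar - tau * grad, tau)        # prox step
        lam_new = 2.0 / (1.0 + np.sqrt(1.0 + 4.0 / lam**2))   # (12.33)
        alpha = lam_new * (1.0 / lam - 1.0)                   # alpha_{k+1}
        xbar = x_new + alpha * (x_new - x)                    # inertial point
        x, lam = x_new, lam_new
    return x

Q = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([6.0, 0.5])
x = inertial_fb(Q, b)
# separable optimality conditions give x* = (17/9, 0) for this data
```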
Remark 12.13 (accelerated gradient methods, FISTA). The inertial scheme was first introduced by [Nesterov 1983] for the basic gradient descent method for smooth functions. The extension to forward-backward splitting is due to [Beck & Teboulle 2009a], which proposed a fast iterative shrinkage-thresholding algorithm (FISTA) for the specific problem of minimizing a least-squares term plus a weighted $\ell^1$ norm. (Note that in most treatments of FISTA, our $\lambda_k^{-1}$ is written as $t_k$.) We refer to [Nesterov 2004; Beck 2017] for a further discussion of these algorithms and more general accelerated gradient methods based on combinations of a history of iterates.
Remark 12.14 (PDPS, Douglas–Rachford, and correctors). The above unrolling arguments cannot be directly applied to the PDPS, Douglas–Rachford splitting, and other methods based on (12.1) with non-maximally monotone $H$. Following [Chambolle & Pock 2015], one can apply inertia to the PDPS method with the restricted choice $\alpha_k \in (0, 1/3)$. This prevents the use of the FISTA rule (12.33) and only yields $O(1/N)$ convergence of an ergodic gap. Based on alternative argumentation, when one of the functions is quadratic, [Patrinos, Stella & Bemporad 2014] managed to employ the FISTA rule and obtain $O(1/N^2)$ rates for inertial Douglas–Rachford splitting. Moreover, [Valkonen 2020b] observed that by introducing a corrector for the non-subdifferential component of $H$, in essence $\Xi_{k+1}$, the gap unrolling arguments can be performed. This approach also allows combining inertial acceleration with strong monotonicity based acceleration.
12.3 line search
Let us return to the basic results on weak convergence (Theorem 9.6), strong convergence with rates (Theorem 10.2), and function value convergence (Theorem 11.4) of the explicit splitting method. These results depend on the three-point inequalities of Corollary 7.2 (or, for faster rates under strong convexity, Corollary 7.7), specifically either the non-value estimate
\[
(12.39)\qquad \langle\nabla F(x^k) - \nabla F(\hat x), x^{k+1} - \hat x\rangle_X \ge -\frac{L}{4}\|x^{k+1} - x^k\|_X^2
\]
or the value estimate
\[
(12.40)\qquad \langle\nabla F(x^k), x^{k+1} - \hat x\rangle_X \ge F(x^{k+1}) - F(\hat x) - \frac{L}{2}\|x^{k+1} - x^k\|_X^2.
\]
Recall that for weak convergence of iterates, we required the step length parameters $\{\tau_k\}_{k\in\mathbb{N}}$ to satisfy on each iteration the bound $\tau_kL < 2$. Under a strong convexity assumption, the bound $\tau_kL \le 2$ was sufficient for strong convergence of iterates. Function value convergence was finally shown under the bound $\tau_kL \le 1$. All cases thus hold for $\tau_kL \le 1$, which we assume in the following for simplicity.
In this section, we address the following question: what if we do not know the Lipschitz factor $L$? A basic idea is to take $L$ large enough. But what is large enough? Finding such a large enough $L$ is the same as taking $\tau_k$ small enough and $L = 1/\tau_k$. This leads us to the following rough line search rule: for some $\bar\tau > 0$ and line search parameter $\theta \in (0,1)$, start with $\tau_k := \bar\tau$, and iterate $\tau_k \mapsto \theta\tau_k$ until (12.40) (or (12.39)) is satisfied with $L = 1/\tau_k$. Note that on each update of $\tau_k$, we need to recalculate $x^{k+1} := \mathrm{prox}_{\tau_kG}(x^k - \tau_k\nabla F(x^k))$.

Performing this line search still appears to depend on knowing $\hat x$ through (12.40). However, going back to the proof of Corollary 7.2, we see that what is really needed is to satisfy the smoothness (or descent) inequality (7.5) which was used to derive (12.40). We are therefore led to the following practical line search method to guarantee the inequality
\[
(12.41)\qquad \langle\nabla F(x^k), x^{k+1} - x^k\rangle_X \ge F(x^{k+1}) - F(x^k) - \frac{1}{2\tau_k}\|x^{k+1} - x^k\|_X^2
\]
on every iteration:

0. Pick $\theta \in (0,1)$, $\bar\tau > 0$, $x^0 \in X$; set $k := 0$.

1. Set $\tau_k := \bar\tau$.

2. Calculate $x^{k+1} := \mathrm{prox}_{\tau_kG}(x^k - \tau_k\nabla F(x^k))$.

3. If (12.41) does not hold, update $\tau_k \mapsto \theta\tau_k$, and go back to step 2.

4. Set $k \mapsto k + 1$, and continue from step 1.
Theorem 12.15 (explicit splitting line search). Let $J := F + G$, where $G\colon X \to \overline{\mathbb{R}}$ and $F\colon X \to \mathbb{R}$ are convex, proper, and lower semicontinuous, with $\nabla F$ moreover Lipschitz. Suppose $[\partial J]^{-1}(0) \ne \emptyset$. Then the above line search method satisfies $J(x^N) \to \min_{x\in X}J(x)$ at the rate $O(1/N)$. If $G$ is strongly convex, then this convergence is linear.

Proof. Since $\nabla F$ is $\tilde L$-Lipschitz for some unknown $\tilde L > 0$, the line search procedure eventually satisfies $1/\tau_k \ge \tilde L$. Hence (12.41) is satisfied, and $\tau_k \ge \varepsilon > 0$ for some $\varepsilon > 0$. We can therefore follow through the proof of Theorem 11.4 with $L = 1/\tau_k$. □
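The method above can be sketched in a few lines; the toy objective (a quadratic $F$ plus the indicator of the nonnegative orthant as $G$) is our own choice, and the inner loop implements step 3 with the test (12.41):

```python
import numpy as np

# Backtracking explicit splitting from Section 12.3. Toy problem:
#   F(x) = 1/2||x - a||^2 (true L = 1, treated as unknown),
#   G = indicator of {x >= 0}, so prox_{tau G} is the projection max(., 0).

def backtracking_fb(a, bar_tau=10.0, theta=0.5, iters=50):
    x = np.zeros_like(a)
    for _ in range(iters):
        tau = bar_tau                          # step 1: reset tau_k
        gradF = x - a
        Fx = 0.5 * np.dot(x - a, x - a)
        while True:
            x_new = np.maximum(x - tau * gradF, 0.0)   # step 2: prox step
            Fnew = 0.5 * np.dot(x_new - a, x_new - a)
            d = x_new - x
            # step 3: inequality (12.41)
            if np.dot(gradF, d) >= Fnew - Fx - np.dot(d, d) / (2.0 * tau):
                break
            tau *= theta                       # shrink and retry
        x = x_new
    return x

a = np.array([1.0, -2.0])
x = backtracking_fb(a)
# the minimizer of 1/2||x - a||^2 over x >= 0 is max(a, 0) = [1, 0]
```

For this quadratic $F$, the test (12.41) is equivalent to $\tau_k \le 1$, so the loop stops once backtracking reaches $\tau_k = 0.625$.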
We can also combine the line search method with the inertial forward-backward splitting (12.36). If in place of (12.41) we seek to satisfy
\[
(12.42)\qquad \langle\nabla F(\bar x^k), x^{k+1} - \bar x^k\rangle_X \ge F(x^{k+1}) - F(\bar x^k) - \frac{1}{2\tau_k}\|x^{k+1} - \bar x^k\|_X^2,
\]
then also
\[
\langle\nabla F(\bar x^k), x^{k+1} - \hat x\rangle_X \ge F(x^{k+1}) - F(\hat x) - \frac{1}{2\tau_k}\|x^{k+1} - \bar x^k\|_X^2.
\]
This allows the inequality of (12.38) to be shown.
We are therefore led to the following practical backtracking inertial forward-backward splitting:

0. Pick $\theta \in (0,1)$, $\bar\tau > 0$, $\lambda_0 := 1$, $x^0 = \bar x^0 \in X$; set $k := 0$.

1. Set $\tau_k := \bar\tau$.

2. Calculate $x^{k+1} := \mathrm{prox}_{\tau_kG}(\bar x^k - \tau_k\nabla F(\bar x^k))$.

3. If (12.42) does not hold, update $\tau_k \mapsto \theta\tau_k$, and go back to step 2.

4. Set $\bar x^{k+1} := (1 + \alpha_{k+1})x^{k+1} - \alpha_{k+1}x^k$ for $\alpha_{k+1} := \lambda_{k+1}(\lambda_k^{-1} - 1)$.

5. Set $k \mapsto k + 1$, and continue from step 1.
The proof of the following is immediate:

Theorem 12.16. Let $J := G + F$ for $G\colon X \to \overline{\mathbb{R}}$ and $F\colon X \to \mathbb{R}$ convex, proper, and lower semicontinuous with $\nabla F$ Lipschitz. Suppose $[\partial J]^{-1}(0) \ne \emptyset$. Take $\bar\tau > 0$ and $\lambda_0 = 1$, and pick an initial iterate $x^0 \in X$. Then the above backtracking inertial forward-backward splitting satisfies $J(x^N) \to \min_{x\in X}J(x)$ at the rate $O(1/N^2)$.
The reader may now work out how to use line search to satisfy the nonnegativity of the metric $Z_{k+1}M_{k+1}$ in the PDPS method when $\|K\|$ is not known, or how to satisfy the condition $L\tau_0 + \tau_0\sigma_0\|K\|^2 < 1$ when the Lipschitz factor $L$ of the forward step component $E$ is not known.
Remark 12.17 (adaptive inertial parameters, quasi-Newton methods, and primal-dual proximal line searches). Regarding our statement in the beginning of the chapter about inertial methods attempting to construct a second-order approximation of the function, [Ochs & Pock 2019] show that an adaptive inertial forward-backward splitting, performing an optimal line search on $\lambda_k$ instead of $\tau_k$, is equivalent to a proximal quasi-Newton method. Such a method is a further development of variants [see Beck & Teboulle 2009b] of the method that attempt to restore the monotonicity of forward-backward splitting that is lost by inertia. Indeed, if $J(x^{k+1}) \le J(x^k)$ does not hold for $\lambda_k < 1$, we can revert to $\lambda_k = 1$ to ensure descent, as the step then reduces to basic forward-backward splitting, which we know to be monotone by Theorem 11.4. Finally, a line search for the PDPS method is studied in [Malitsky & Pock 2018].
Part III
NONCONVEX ANALYSIS
13 CLARKE SUBDIFFERENTIALS
We now turn to a concept of generalized derivatives that covers, among others, both Fréchet derivatives and convex subdifferentials. Again, we start with the general class of functionals that admit such a derivative. It is clear that we need to require some continuity properties, since otherwise there would be no relation between functional values at neighboring points and thus no hope of characterizing optimality through pointwise properties. In Part II, we used lower semicontinuity for this purpose, which together with convexity yielded the required properties. In this part, we want to drop the latter, global, assumption; in turn, we need to strengthen the local continuity assumption. We thus consider now locally Lipschitz continuous functionals. Recall that $F\colon X \to \mathbb{R}$ is locally Lipschitz continuous near $x \in X$ if there exist $\delta > 0$ and $L > 0$ (which in the following will always denote the local Lipschitz constant of $F$) such that
\[
|F(x_1) - F(x_2)| \le L\|x_1 - x_2\|_X \qquad\text{for all } x_1, x_2 \in B(x, \delta).
\]
We will refer to the $B(x, \delta)$ from this definition as the Lipschitz neighborhood of $x$. Note that for this we have to require that $F$ is (locally) finite-valued (but see Remark 13.27 below). Throughout this chapter, we will assume that $X$ is a Banach space unless stated otherwise.
13.1 definition and basic properties
We proceed as for the convex subdifferential and first define for $F\colon X \to \mathbb{R}$ the generalized directional derivative at $x \in X$ in direction $h \in X$ as
\[
(13.1)\qquad F^\circ(x; h) := \limsup_{y\to x,\ t\searrow 0}\frac{F(y + th) - F(y)}{t}.
\]
Note the difference to the classical directional derivative: we no longer require the existence of a limit but merely of accumulation points. We will need the following properties.
Lemma 13.1. Let $F\colon X \to \mathbb{R}$ be locally Lipschitz continuous near $x \in X$ with the factor $L$. Then the mapping $h \mapsto F^\circ(x; h)$ is

(i) Lipschitz continuous with the factor $L$ and satisfies $|F^\circ(x; h)| \le L\|h\|_X < \infty$;
(ii) subadditive, i.e., $F^\circ(x; h + g) \le F^\circ(x; h) + F^\circ(x; g)$ for all $h, g \in X$;

(iii) positively homogeneous, i.e., $F^\circ(x; \alpha h) = \alpha F^\circ(x; h)$ for all $\alpha > 0$ and $h \in X$;

(iv) reflective, i.e., $F^\circ(x; -h) = (-F)^\circ(x; h)$ for all $h \in X$.
Proof. (i): Let $h, g \in X$ be arbitrary. The local Lipschitz continuity of $F$ implies that
\[
F(y + th) - F(y) \le F(y + tg) - F(y) + tL\|h - g\|_X
\]
for all $y$ sufficiently close to $x$ and $t$ sufficiently small. Dividing by $t > 0$ and taking the $\limsup$ then yields that
\[
F^\circ(x; h) \le F^\circ(x; g) + L\|h - g\|_X.
\]
Exchanging the roles of $h$ and $g$ shows the Lipschitz continuity of $F^\circ(x; \cdot)$, which also yields the claimed boundedness since $F^\circ(x; g) = 0$ for $g = 0$ from the definition.

(ii): Since $t\searrow 0$ and $g \in X$ is fixed, $y \to x$ if and only if $y + tg \to x$. The definition of the $\limsup$ and the productive zero thus immediately yield
\[
F^\circ(x; h + g) = \limsup_{y\to x,\ t\searrow 0}\frac{F(y + th + tg) - F(y)}{t}
\le \limsup_{y\to x,\ t\searrow 0}\frac{F(y + th + tg) - F(y + tg)}{t} + \limsup_{y\to x,\ t\searrow 0}\frac{F(y + tg) - F(y)}{t}
= F^\circ(x; h) + F^\circ(x; g).
\]

(iii): Again from the definition, we obtain for $\alpha > 0$ that
\[
F^\circ(x; \alpha h) = \limsup_{y\to x,\ t\searrow 0}\frac{F(y + t(\alpha h)) - F(y)}{t}
= \limsup_{y\to x,\ \alpha t\searrow 0}\alpha\frac{F(y + (\alpha t)h) - F(y)}{\alpha t}
= \alpha F^\circ(x; h).
\]

(iv): Similarly, since $t\searrow 0$ and $h \in X$ is fixed, $y \to x$ if and only if $w := y - th \to x$. We thus have that
\[
F^\circ(x; -h) = \limsup_{y\to x,\ t\searrow 0}\frac{F(y - th) - F(y)}{t}
= \limsup_{w\to x,\ t\searrow 0}\frac{-F(w + th) - (-F(w))}{t}
= (-F)^\circ(x; h). \qquad\square
\]
In particular, Lemma 13.1 (i)–(iii) imply that the mapping $h \mapsto F^\circ(x; h)$ is proper, convex, and lower semicontinuous.

We now define for a locally Lipschitz continuous functional $F\colon X \to \mathbb{R}$ the Clarke subdifferential at $x \in X$ as
\[
(13.2)\qquad \partial_C F(x) := \{x^* \in X^* \mid \langle x^*, h\rangle_X \le F^\circ(x; h) \text{ for all } h \in X\}.
\]
The definition together with Lemma 13.1 (i) directly implies the following properties.

Lemma 13.2. Let $F\colon X \to \mathbb{R}$ be locally Lipschitz continuous and $x \in X$. Then $\partial_C F(x)$ is convex, weakly-$*$ closed, and bounded. Specifically, if $F$ is Lipschitz near $x$ with constant $L$, then $\partial_C F(x) \subset B(0, L)$.
Furthermore, we have the following useful continuity property.
Lemma 13.3. Let ๐น : ๐ โ โ. Then ๐๐ถ๐น (๐ฅ) is strong-to-weak-โ outer semicontinuous, i.e., if๐ฅ๐ โ ๐ฅ and if ๐๐ถ๐น (๐ฅ๐) 3 ๐ฅโ๐ โโ ๐ฅโ, then ๐ฅโ โ ๐๐ถ๐น (๐ฅ).
Proof. Let โ โ ๐ be arbitrary. By assumption, we then have that ใ๐ฅโ๐, โใ๐ โค ๐น โฆ(๐ฅ๐;โ) for all ๐ โ โ. The weak-โ convergence of {๐ฅโ๐}๐โโ then implies that
ใ๐ฅโ, โใ๐ = lim_{๐โโ} ใ๐ฅโ๐, โใ๐ โค lim sup_{๐โโ} ๐น โฆ(๐ฅ๐;โ).
Hence we are finished if we can show that lim sup_{๐โโ} ๐น โฆ(๐ฅ๐;โ) โค ๐น โฆ(๐ฅ ;โ) (since then ๐ฅโ โ ๐๐ถ๐น (๐ฅ) by definition).
For this, we use that by definition of ๐น โฆ(๐ฅ๐;โ), there exist sequences {๐ฆ๐,๐}๐โโ and {๐ก๐,๐}๐โโ with ๐ฆ๐,๐ โ ๐ฅ๐ and ๐ก๐,๐ ↘ 0 for ๐ โ โ realizing each lim sup. Hence, for all ๐ โ โ we can find a ๐ฆ๐ โ ๐ฆ๐,๐(๐) and a ๐ก๐ โ ๐ก๐,๐(๐) such that โ๐ฆ๐ โ ๐ฅ๐โ๐ + ๐ก๐ < ๐โปยน (and hence in particular ๐ฆ๐ โ ๐ฅ and ๐ก๐ ↘ 0) as well as
๐น โฆ(๐ฅ๐;โ) โ 1/๐ โค [๐น (๐ฆ๐ + ๐ก๐โ) โ ๐น (๐ฆ๐)]/๐ก๐
for ๐ sufficiently large. Taking the lim sup for ๐ โ โ on both sides yields the desired inequality. □
Again, the construction immediately yields a Fermat principle.¹
¹Similarly to Theorem 4.2, we do not need to require Lipschitz continuity of ๐น โ the Fermat principle for the Clarke subdifferential characterizes (among others) any local minimizer. However, if we want to use this principle to verify that a given ๐ฅ โ ๐ is indeed a (candidate for) a minimizer, we need a suitable characterization of the subdifferential โ and this is only possible for (certain) locally Lipschitz continuous functionals.
Theorem 13.4 (Fermat principle). If ๐น : ๐ โ โ has a local minimum in ๐ฅ , then 0 โ ๐๐ถ๐น (๐ฅ).
Proof. If ๐ฅ โ ๐ is a local minimizer of ๐น , then ๐น (๐ฅ) โค ๐น (๐ฅ + ๐กโ) for all โ โ ๐ and ๐ก > 0 sufficiently small (since the topological interior is always included in the algebraic interior). But this implies that
ใ0, โใ๐ = 0 โค lim inf_{๐ก↘0} [๐น (๐ฅ + ๐กโ) โ ๐น (๐ฅ)]/๐ก โค lim sup_{๐ก↘0} [๐น (๐ฅ + ๐กโ) โ ๐น (๐ฅ)]/๐ก โค ๐น โฆ(๐ฅ ;โ)
and hence 0 โ ๐๐ถ๐น (๐ฅ) by definition. □
Note that ๐น is not assumed to be convex and hence the condition is in general not sufficient (consider, e.g., ๐ (๐ก) = โ|๐ก |).
13.2 fundamental examples
Next, we show that the Clarke subdifferential is indeed a generalization of the derivative concepts we have studied so far.
Theorem 13.5. Let ๐น : ๐ โ โ be continuously Fréchet differentiable in a neighborhood ๐ of ๐ฅ โ ๐ . Then ๐๐ถ๐น (๐ฅ) = {๐น โฒ(๐ฅ)}.
Proof. First, we note that ๐น is locally Lipschitz continuous near ๐ฅ by Lemma 2.11. We now show that ๐น โฆ(๐ฅ ;โ) = ๐น โฒ(๐ฅ)โ (= ๐น โฒ(๐ฅ ;โ)) for all โ โ ๐ . Take again sequences {๐ฆ๐}๐โโ and {๐ก๐}๐โโ with ๐ฆ๐ โ ๐ฅ and ๐ก๐ ↘ 0 realizing the lim sup in (13.1). Applying the mean value Theorem 2.10 and using the continuity of ๐น โฒ yields for any โ โ ๐ that
๐น โฆ(๐ฅ ;โ) = lim_{๐โโ} [๐น (๐ฆ๐ + ๐ก๐โ) โ ๐น (๐ฆ๐)]/๐ก๐ = lim_{๐โโ} โซ₀¹ (1/๐ก๐) ใ๐น โฒ(๐ฆ๐ + ๐ (๐ก๐โ)), ๐ก๐โใ๐ ๐๐  = ใ๐น โฒ(๐ฅ), โใ๐
since the integrand converges uniformly in ๐  โ [0, 1] to ใ๐น โฒ(๐ฅ), โใ๐ . Hence by definition, ๐ฅโ โ ๐๐ถ๐น (๐ฅ) if and only if ใ๐ฅโ, โใ๐ โค ใ๐น โฒ(๐ฅ), โใ๐ for all โ โ ๐ , which is only possible for ๐ฅโ = ๐น โฒ(๐ฅ). □
The following example shows that Theorem 13.5 does not hold if ๐น is merely Fréchet differentiable.
Example 13.6. Let ๐ : โ โ โ, ๐ (๐ก) = ๐ก² sin(๐กโปยน). Then it is straightforward (if tedious) to show that ๐ is differentiable on โ with
๐ โฒ(๐ก) = 2๐ก sin(๐กโปยน) โ cos(๐กโปยน) for ๐ก ≠ 0 and ๐ โฒ(0) = 0.
In particular, ๐ is not continuously differentiable at ๐ก = 0. But a similar limit argument shows that for all โ โ โ,
๐ โฆ(0;โ) = |โ |
and hence that
๐๐ถ ๐ (0) = [โ1, 1] ⊋ {0} = {๐ โฒ(0)}.
(The first equality also follows more directly from Theorem 13.26 below.)
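The two limits in Example 13.6 can also be checked numerically. A minimal sketch (the particular sequences ๐ฆ๐ and steps ๐ก๐ below are our own choice, not from the text): the ordinary difference quotient at 0 vanishes, while along points where cos(๐กโปยน) = โ1, with steps far shorter than the local oscillation period, the difference quotient tends to 1 = |โ| for โ = 1.

```python
import math

def f(t):
    return t**2 * math.sin(1.0 / t) if t != 0 else 0.0

# f'(0) = lim_{t -> 0} f(t)/t = lim t*sin(1/t) = 0:
diff_quot_at_0 = [f(t) / t for t in (1e-2, 1e-4, 1e-6)]
print(diff_quot_at_0)    # tends to 0

# f°(0; 1) = 1: take y_n -> 0 with cos(1/y_n) = -1 (so f'(y_n) ~ 1)
# and steps t_n = y_n^3, far shorter than the oscillation period ~y_n^2:
clarke_quot = []
for n in (10, 100, 1000):
    y = 1.0 / (2 * math.pi * n + math.pi)
    t = y**3
    clarke_quot.append((f(y + t) - f(y)) / t)
print(clarke_quot)       # tends to 1
```

The choice ๐ก๐ = ๐ฆ๐³ is essential: averaging the derivative over a full oscillation period would drive the quotient back toward 0.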
As the example suggests, we always have the following weaker relation.
Lemma 13.7. Let ๐น : ๐ โ โ be Lipschitz continuous and Gâteaux differentiable in a neighborhood ๐ of ๐ฅ โ ๐ . Then ๐ท๐น (๐ฅ) โ ๐๐ถ๐น (๐ฅ).
Proof. Let โ โ ๐ be arbitrary. First, note that we always have that
(13.3) ๐น โฒ(๐ฅ ;โ) = lim_{๐ก↘0} [๐น (๐ฅ + ๐กโ) โ ๐น (๐ฅ)]/๐ก โค lim sup_{๐ฆโ๐ฅ, ๐ก↘0} [๐น (๐ฆ + ๐กโ) โ ๐น (๐ฆ)]/๐ก = ๐น โฆ(๐ฅ ;โ).
Since ๐น is Gâteaux differentiable, it follows that
ใ๐ท๐น (๐ฅ), โใ๐ = ๐น โฒ(๐ฅ ;โ) โค ๐น โฆ(๐ฅ ;โ) for all โ โ ๐,
and thus ๐ท๐น (๐ฅ) โ ๐๐ถ๐น (๐ฅ) by definition. □
Similarly, the Clarke subdifferential reduces to the convex subdifferential in some situations.
Theorem 13.8. Let ๐น : ๐ โ โ be convex and lower semicontinuous. Then ๐๐ถ๐น (๐ฅ) = ๐๐น (๐ฅ) for all ๐ฅ โ int(dom ๐น ).
Proof. By Theorem 3.13, ๐น is locally Lipschitz continuous near ๐ฅ โ int(dom ๐น ). We now show that ๐น โฆ(๐ฅ ;โ) = ๐น โฒ(๐ฅ ;โ) for all โ โ ๐ , which together with Lemma 4.4 yields the claim. By (13.3), we always have that ๐น โฒ(๐ฅ ;โ) โค ๐น โฆ(๐ฅ ;โ). To show the reverse inequality,
let ๐ฟ > 0 be arbitrary. Since the difference quotient of convex functionals is increasing by Lemma 4.3 (i), we obtain that
๐น โฆ(๐ฅ ;โ) = lim_{ε↘0} sup_{๐ฆโ๐น(๐ฅ,๐ฟε)} sup_{0<๐ก<ε} [๐น (๐ฆ + ๐กโ) โ ๐น (๐ฆ)]/๐ก
โค lim_{ε↘0} sup_{๐ฆโ๐น(๐ฅ,๐ฟε)} [๐น (๐ฆ + εโ) โ ๐น (๐ฆ)]/ε
โค lim_{ε↘0} [๐น (๐ฅ + εโ) โ ๐น (๐ฅ)]/ε + 2๐ฟ๐ฟ = ๐น โฒ(๐ฅ ;โ) + 2๐ฟ๐ฟ,
where the last inequality follows by taking two productive zeros and using the local Lipschitz continuity in ๐ฅ . Since ๐ฟ > 0 was arbitrary, this implies that ๐น โฆ(๐ฅ ;โ) โค ๐น โฒ(๐ฅ ;โ), and the claim follows. □
A locally Lipschitz continuous functional ๐น : ๐ โ โ with ๐น โฆ(๐ฅ ;โ) = ๐น โฒ(๐ฅ ;โ) for all โ โ ๐ is called regular in ๐ฅ โ ๐ . We have just shown that every continuously differentiable and every convex and lower semicontinuous functional is regular; intuitively, a function is thus regular at any point in which it is either differentiable or at least has a “convex kink”.
Finally, similarly to Theorem 4.11 one can show the following pointwise characterization of the Clarke subdifferential of integral functionals with Lipschitz continuous integrands. We again assume that Ω โ โ๐ is open and bounded.
Theorem 13.9. Let ๐ : โ โ โ be Lipschitz continuous and ๐น : ๐ฟ๐ (Ω) โ โ with 1 โค ๐ < โ as in Lemma 3.7. Then we have for all ๐ข โ ๐ฟ๐ (Ω) with ๐ = ๐/(๐ โ 1) (where ๐ = โ for ๐ = 1) that
๐๐ถ๐น (๐ข) โ {๐ขโ โ ๐ฟ๐ (Ω) | ๐ขโ(๐ฅ) โ ๐๐ถ ๐ (๐ข (๐ฅ)) for almost every ๐ฅ โ Ω} .
If ๐ is regular at ๐ข (๐ฅ) for almost every ๐ฅ โ Ω, then ๐น is regular at ๐ข, and equality holds.
Proof. First, by the properties of the Lebesgue integral and the Lipschitz continuity of ๐ , we have for any ๐ข, ๐ฃ โ ๐ฟ๐ (Ω) that
|๐น (๐ข) โ ๐น (๐ฃ) | โค โซ_Ω |๐ (๐ข (๐ฅ)) โ ๐ (๐ฃ (๐ฅ)) | ๐๐ฅ โค ๐ฟ โซ_Ω |๐ข (๐ฅ) โ ๐ฃ (๐ฅ) | ๐๐ฅ โค ๐ฟ๐ถ๐ โ๐ข โ ๐ฃ โ๐ฟ๐ ,
where ๐ฟ is the Lipschitz constant of ๐ and ๐ถ๐ the constant from the continuous embedding ๐ฟ๐ (Ω) โฉโ ๐ฟ1(Ω) for 1 โค ๐ โค โ. Hence ๐น : ๐ฟ๐ (Ω) โ โ is Lipschitz continuous and therefore finite-valued as well.
Let now ξ โ ๐๐ถ๐น (๐ข) be given and โ โ ๐ฟ๐ (Ω) be arbitrary. By definition, we thus have
(13.4) ใξ, โใ๐ฟ๐ โค ๐น โฆ(๐ข;โ) = lim sup_{๐ฃโ๐ข, ๐ก↘0} [๐น (๐ฃ + ๐กโ) โ ๐น (๐ฃ)]/๐ก
โค โซ_Ω lim sup_{๐ฃโ๐ข, ๐ก↘0} [๐ (๐ฃ (๐ฅ) + ๐กโ(๐ฅ)) โ ๐ (๐ฃ (๐ฅ))]/๐ก ๐๐ฅ
โค โซ_Ω lim sup_{๐ฃ๐ฅโ๐ข (๐ฅ), ๐ก๐ฅ↘0} [๐ (๐ฃ๐ฅ + ๐ก๐ฅโ(๐ฅ)) โ ๐ (๐ฃ๐ฅ )]/๐ก๐ฅ ๐๐ฅ
= โซ_Ω ๐ โฆ(๐ข (๐ฅ);โ(๐ฅ)) ๐๐ฅ,
where we were able to use the reverse Fatou lemma to exchange the lim sup with the integral in the first inequality since the integrand is bounded from above by the integrable function ๐ฟ |โ | due to Lemma 13.1 (i); the second inequality follows by bounding for almost every ๐ฅ โ Ω the (pointwise) limit over the sequences realizing the lim sup in the second line by the lim sup over all admissible sequences.
To interpret (13.4) pointwise, we define for ๐ฅ โ Ω
๐๐ฅ : โ โ โ, ๐๐ฅ (๐ก) โ ๐ โฆ(๐ข (๐ฅ); ๐ก).
From Lemma 13.1 (ii)โ(iii), it follows that ๐๐ฅ is convex; Lemma 13.1 (i) further implies that the function ๐ฅ โฆโ ๐๐ฅ (โ(๐ฅ)) is measurable for any โ โ ๐ฟ๐ (Ω). Since ๐๐ฅ (0) = 0, (13.4) implies that
ใξ, โ โ 0ใ๐ฟ๐ โค โซ_Ω ๐๐ฅ (โ(๐ฅ)) ๐๐ฅ โ โซ_Ω ๐๐ฅ (0) ๐๐ฅ,
i.e., ξ โ ๐๐บ (0) for the superposition operator ๐บ (โ) โ โซ_Ω ๐๐ฅ (โ(๐ฅ)) ๐๐ฅ . Arguing exactly as in the proof of Theorem 4.11, this implies that ξ = ๐ขโ โ ๐ฟ๐ (Ω) with ๐ขโ(๐ฅ) โ ๐๐๐ฅ (0) for almost every ๐ฅ โ Ω, i.e.,
๐ขโ(๐ฅ)โ(๐ฅ) = ๐ขโ(๐ฅ) (โ(๐ฅ) โ 0) โค ๐๐ฅ (โ(๐ฅ)) โ ๐๐ฅ (0) = ๐ โฆ(๐ข (๐ฅ);โ(๐ฅ))
for almost every ๐ฅ โ Ω. Since โ โ ๐ฟ๐ (Ω) was arbitrary, this implies that ๐ขโ(๐ฅ) โ ๐๐ถ ๐ (๐ข (๐ฅ)) almost everywhere as claimed.
It remains to show the remaining assertions when ๐ is regular. In this case, it follows from (13.4) that for any โ โ ๐ฟ๐ (Ω),
(13.5) ๐น โฆ(๐ข;โ) โค โซ_Ω ๐ โฆ(๐ข (๐ฅ);โ(๐ฅ)) ๐๐ฅ = โซ_Ω ๐ โฒ(๐ข (๐ฅ);โ(๐ฅ)) ๐๐ฅ โค lim_{๐ก↘0} [๐น (๐ข + ๐กโ) โ ๐น (๐ข)]/๐ก = ๐น โฒ(๐ข;โ) โค ๐น โฆ(๐ข;โ),
where the second inequality is obtained by applying Fatou's lemma, this time appealing to the integrable lower bound โ๐ฟ |โ(๐ฅ) |. This shows that ๐น โฒ(๐ข;โ) = ๐น โฆ(๐ข;โ) and hence that ๐น is regular. We further obtain for any ๐ขโ โ ๐ฟ๐ (Ω) with ๐ขโ(๐ฅ) โ ๐๐ถ ๐ (๐ข (๐ฅ)) almost everywhere and any โ โ ๐ฟ๐ (Ω) that
ใ๐ขโ, โใ๐ฟ๐ = โซ_Ω ๐ขโ(๐ฅ)โ(๐ฅ) ๐๐ฅ โค โซ_Ω ๐ โฆ(๐ข (๐ฅ);โ(๐ฅ)) ๐๐ฅ โค ๐น โฆ(๐ข;โ),
where we have used (13.5) in the last inequality. Since โ โ ๐ฟ๐ (Ω) was arbitrary, this implies that ๐ขโ โ ๐๐ถ๐น (๐ข). □
Under additional assumptions similar to those of Theorem 2.12 and with more technical arguments, this result can be extended to spatially varying integrands ๐ : Ω × โ โ โ; see, e.g., [Clarke 1990, Theorem 2.7.5].
13.3 calculus rules
We now turn to calculus rules. The first one follows directly from the definition.
Theorem 13.10. Let ๐น : ๐ โ โ be locally Lipschitz continuous near ๐ฅ โ ๐ and ๐ผ โ โ. Then,
๐๐ถ (๐ผ๐น ) (๐ฅ) = ๐ผ๐๐ถ (๐น ) (๐ฅ).
Proof. First, ๐ผ๐น is clearly locally Lipschitz continuous near ๐ฅ for any ๐ผ โ โ. If ๐ผ = 0, both sides of the claimed equality are zero (which is easiest seen from Theorem 13.5). If ๐ผ > 0, we have that (๐ผ๐น )โฆ(๐ฅ ;โ) = ๐ผ๐น โฆ(๐ฅ ;โ) for all โ โ ๐ from the definition. Hence,
๐ผ๐๐ถ๐น (๐ฅ) = {๐ผ๐ฅโ โ ๐ โ | ใ๐ฅโ, โใ๐ โค ๐น โฆ(๐ฅ ;โ) for all โ โ ๐ }
= {๐ผ๐ฅโ โ ๐ โ | ใ๐ผ๐ฅโ, โใ๐ โค ๐ผ๐น โฆ(๐ฅ ;โ) for all โ โ ๐ }
= {๐ฆโ โ ๐ โ | ใ๐ฆโ, โใ๐ โค (๐ผ๐น )โฆ(๐ฅ ;โ) for all โ โ ๐ }
= ๐๐ถ (๐ผ๐น ) (๐ฅ).
To conclude the proof, it suffices to show the claim for ๐ผ = โ1. For that, we use Lemma 13.1 (iv) to obtain that
๐๐ถ (โ๐น ) (๐ฅ) = {๐ฅโ โ ๐ โ | ใ๐ฅโ, โใ๐ โค (โ๐น )โฆ(๐ฅ ;โ) for all โ โ ๐ }
= {๐ฅโ โ ๐ โ | ใโ๐ฅโ,โโใ๐ โค ๐น โฆ(๐ฅ ;โโ) for all โ โ ๐ }
= {โ๐ฆโ โ ๐ โ | ใ๐ฆโ, ๐ใ๐ โค ๐น โฆ(๐ฅ ;๐) for all ๐ โ ๐ }
= โ๐๐ถ๐น (๐ฅ). □
Corollary 13.11. Let ๐น : ๐ โ โ be locally Lipschitz continuous near ๐ฅ โ ๐ . If ๐น has a local maximum in ๐ฅ , then 0 โ ๐๐ถ๐น (๐ฅ).
Proof. If ๐ฅ is a local maximizer of ๐น , it is a local minimizer of โ๐น . Hence, Theorems 13.4 and 13.10 imply that
0 โ ๐๐ถ (โ๐น ) (๐ฅ) = โ๐๐ถ๐น (๐ฅ),
i.e., 0 = โ0 โ ๐๐ถ๐น (๐ฅ). □
support functionals
The remaining rules are significantly more involved. As in the previous proofs, a key step is to relate different sets of the form (13.2), which we will do with the help of the following lemmas.
Lemma 13.12. Let ๐น : ๐ โ โ be positively homogeneous, subadditive, and lower semicontinuous, and let
๐ด = {๐ฅโ โ ๐ โ | ใ๐ฅโ, ๐ฅใ๐ โค ๐น (๐ฅ) for all ๐ฅ โ ๐ } .
Then,
(13.6) ๐น (๐ฅ) = sup_{๐ฅโโ๐ด} ใ๐ฅโ, ๐ฅใ๐ for all ๐ฅ โ ๐ .
Proof. By definition of ๐ด, the inequality ใ๐ฅโ, ๐ฅใ๐ โ ๐น (๐ฅ) โค 0 holds for all ๐ฅ โ ๐ if and only if ๐ฅโ โ ๐ด. Thus, a case distinction as in Example 5.3 (ii) using the positive homogeneity of ๐น shows that
๐น โ(๐ฅโ) = sup_{๐ฅโ๐} (ใ๐ฅโ, ๐ฅใ๐ โ ๐น (๐ฅ)) = 0 if ๐ฅโ โ ๐ด, and = โ otherwise,
i.e., ๐น โ = ๐ฟ๐ด. Further, ๐น by assumption is also subadditive and hence convex as well as lower semicontinuous. Theorem 5.1 thus implies that
๐น (๐ฅ) = ๐น โโ(๐ฅ) = (๐ฟ๐ด)โ(๐ฅ) = sup_{๐ฅโโ๐ด} ใ๐ฅโ, ๐ฅใ๐ . □
The right-hand side of (13.6) is called the support functional of ๐ด โ ๐ โ; see, e.g., [Hiriart-Urruty & Lemaréchal 2001] for their use in convex analysis (in finite dimensions).
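In one dimension, Lemma 13.12 can be illustrated directly. A small sketch (the function ๐น (๐ฅ) = max(๐๐ฅ, ๐๐ฅ) and the endpoints ๐, ๐ are our own illustrative choice, not from the text): here ๐ด = [๐, ๐], and the support functional over ๐ด recovers ๐น .

```python
import numpy as np

# F(x) = max(a*x, b*x) with a < b is positively homogeneous and subadditive.
a, b = -1.0, 2.0
F = lambda x: max(a * x, b * x)

# A = {x* : x* x <= F(x) for all x} is the interval [a, b]:
# x = 1 forces x* <= b, x = -1 forces x* >= a.
A = np.linspace(a, b, 1001)

# Lemma 13.12: F is recovered as the support functional of A.
for x in (-3.0, -0.5, 0.0, 1.0, 2.5):
    support = max(xs * x for xs in A)
    assert abs(support - F(x)) < 1e-9, (x, support, F(x))
print("F coincides with the support functional of A =", (a, b))
```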
Lemma 13.13. Let ๐ด, ๐ต โ ๐ โ be nonempty, convex, and weakly-โ closed. Then ๐ด โ ๐ต if and only if
(13.7) sup_{๐ฅโโ๐ด} ใ๐ฅโ, ๐ฅใ๐ โค sup_{๐ฅโโ๐ต} ใ๐ฅโ, ๐ฅใ๐ for all ๐ฅ โ ๐ .
Proof. If ๐ด โ ๐ต, then the right-hand side of (13.7) is obviously not less than the left-hand side. Conversely, assume that there exists an ๐ฅโ โ ๐ด with ๐ฅโ โ ๐ต. By the assumptions on ๐ด and ๐ต, we then obtain from Theorem 1.13 an ๐ฅ โ ๐ and a λ โ โ with
ใ๐งโ, ๐ฅใ๐ โค λ < ใ๐ฅโ, ๐ฅใ๐ for all ๐งโ โ ๐ต.
Taking the supremum over all ๐งโ โ ๐ต and estimating the right-hand side by the supremum over all ๐ฅโ โ ๐ด then yields that
sup_{๐งโโ๐ต} ใ๐งโ, ๐ฅใ๐ < sup_{๐ฅโโ๐ด} ใ๐ฅโ, ๐ฅใ๐ .
Hence (13.7) is violated, and the claim follows by contraposition. □
Corollary 13.14. Let ๐ด, ๐ต โ ๐ โ be nonempty, convex, and weakly-โ closed. Then ๐ด = ๐ต if and only if
(13.8) sup_{๐ฅโโ๐ด} ใ๐ฅโ, ๐ฅใ๐ = sup_{๐ฅโโ๐ต} ใ๐ฅโ, ๐ฅใ๐ for all ๐ฅ โ ๐ .
Proof. Again, the claim is obvious if ๐ด = ๐ต. Conversely, if (13.8) holds, then in particular (13.7) holds, and we obtain from Lemma 13.13 that ๐ด โ ๐ต. Exchanging the roles of ๐ด and ๐ต now yields the claim. □
Lemma 13.12 together with Lemma 13.1 directly yields the following useful representation.
Corollary 13.15. Let ๐น : ๐ โ โ be locally Lipschitz continuous and ๐ฅ โ ๐ . Then
๐น โฆ(๐ฅ ;โ) = sup_{๐ฅโโ๐๐ถ๐น (๐ฅ)} ใ๐ฅโ, โใ๐ for all โ โ ๐ .
For example, this implies a converse result to Theorem 13.5.
Corollary 13.16. Let ๐น : ๐ โ โ be locally Lipschitz continuous near ๐ฅ . If ๐๐ถ๐น (๐ฅ) = {๐ฅโ} for some ๐ฅโ โ ๐ โ, then ๐น is Gâteaux differentiable in ๐ฅ with ๐ท๐น (๐ฅ) = ๐ฅโ.
Proof. Under the assumption, it follows from Corollary 13.15 that
๐น โฆ(๐ฅ ;โ) = sup_{๐ฅโโ๐๐ถ๐น (๐ฅ)} ใ๐ฅโ, โใ๐ = ใ๐ฅโ, โใ๐
for all โ โ ๐ . In particular, ๐น โฆ(๐ฅ ;โ) is linear (and not just sublinear) in โ. It thus follows from Lemma 13.1 (iv) that for any โ โ ๐ ,
lim inf_{๐ฆโ๐ฅ, ๐ก↘0} [๐น (๐ฆ + ๐กโ) โ ๐น (๐ฆ)]/๐ก = โlim sup_{๐ฆโ๐ฅ, ๐ก↘0} [โ๐น (๐ฆ + ๐กโ) โ (โ๐น (๐ฆ))]/๐ก = โ(โ๐น )โฆ(๐ฅ ;โ) = โ๐น โฆ(๐ฅ ;โโ) = ๐น โฆ(๐ฅ ;โ) = lim sup_{๐ฆโ๐ฅ, ๐ก↘0} [๐น (๐ฆ + ๐กโ) โ ๐น (๐ฆ)]/๐ก .
Hence the lim sup is a proper limit, and thus ๐น โฆ(๐ฅ ;โ) = ๐น โฒ(๐ฅ ;โ); i.e., ๐น is regular in ๐ฅ . This shows that ๐น โฒ(๐ฅ ;โ) is linear and bounded in โ, and hence ๐ฅโ is by definition the Gâteaux derivative. □
It is not hard to verify from the definition and the Lipschitz continuity of ๐น that in this case, ๐ฅโ is in fact a Fréchet derivative.
We can also use this to show the promised nonemptiness of the convex subdifferential.
Theorem 13.17. Let ๐ be a Banach space and let ๐น : ๐ โ โ be proper, convex, and lower semicontinuous, and ๐ฅ โ int(dom ๐น ). Then ๐๐น (๐ฅ) is nonempty, convex, weakly-โ closed, and bounded.
Proof. Since ๐ฅ โ int(dom ๐น ), Theorem 13.8 shows that ๐๐ถ๐น (๐ฅ) = ๐๐น (๐ฅ) and that ๐น is regular in ๐ฅ . It thus follows from Corollary 13.15 and Lemma 4.3 (iii) that sup_{๐ฅโโ๐๐น (๐ฅ)} ใ๐ฅโ, โใ๐ = ๐น โฒ(๐ฅ ;โ) โ โ for ๐ฅ โ int(dom ๐น ), and hence the supremum cannot be over the empty set (for which any supremum is โโ by convention). The remaining properties follow from Lemma 13.2. □
By a similar argument, we now obtain the promised converse of Theorem 4.5; we combine both statements here for the sake of reference.
Theorem 13.18. Let ๐ be a Banach space and let ๐น : ๐ โ โ be convex. If ๐น is Gâteaux differentiable at ๐ฅ , then ๐๐น (๐ฅ) = {๐ท๐น (๐ฅ)}. Conversely, if ๐ฅ โ int(dom ๐น ) and ๐๐น (๐ฅ) = {๐ฅโ} is a singleton, then ๐น is Gâteaux differentiable at ๐ฅ with ๐ท๐น (๐ฅ) = ๐ฅโ.
Proof. The first claim was already shown in Theorem 4.5, while the second follows from Corollary 13.16 together with Theorem 13.8. □
As another consequence, we can show that the MoreauโYosida regularization defined in Section 7.3 preserves (global!) Lipschitz continuity.
Lemma 13.19. Let ๐ be a Hilbert space and let ๐น : ๐ โ โ be Lipschitz continuous with constant ๐ฟ. Then ๐น๐พ is Lipschitz continuous with constant ๐ฟ as well. If ๐น is in addition convex, then ๐น โ ๐พ๐ฟ²/2 โค ๐น๐พ โค ๐น .
Proof. Let ๐ฅ, ๐ง โ ๐ . We expand
๐น๐พ (๐ฅ) โ ๐น๐พ (๐ง) = sup_{๐ฆ๐งโ๐} inf_{๐ฆ๐ฅโ๐} (๐น (๐ฆ๐ฅ ) โ ๐น (๐ฆ๐ง) + (1/(2๐พ))โ๐ฆ๐ฅ โ ๐ฅ โ²๐ โ (1/(2๐พ))โ๐ฆ๐ง โ ๐งโ²๐).
Taking ๐ฆ๐ฅ = ๐ฆ๐ง + ๐ฅ โ ๐ง, we estimate
๐น๐พ (๐ฅ) โ ๐น๐พ (๐ง) โค sup_{๐ฆ๐งโ๐} (๐น (๐ฆ๐ง + ๐ฅ โ ๐ง) โ ๐น (๐ฆ๐ง)) โค ๐ฟโ๐ฅ โ ๐งโ๐ .
Exchanging ๐ฅ and ๐ง, we obtain the first claim.
For the second claim, we first observe that by assumption dom ๐น = ๐ . Hence by Theorem 13.17 and Lemma 13.2, for every ๐ฅ โ ๐ , there exists some ๐ฅโ โ ๐๐น (๐ฅ) with โ๐ฅโโ๐ โ โค ๐ฟ. Thus, using Lemma 4.4, for any ๐ฅโ โ ๐๐น (๐ฅ) and any ๐ง โ ๐ ,
๐น (๐ง) + (1/(2๐พ))โ๐ฅ โ ๐งโ²๐ โฅ ๐น (๐ฅ) + ใ๐ฅโ, ๐ง โ ๐ฅใ๐ + (1/(2๐พ))โ๐ฅ โ ๐งโ²๐ .
The CauchyโSchwarz and generalized Young inequalities then yield ๐น๐พ (๐ฅ) โฅ ๐น (๐ฅ) โ (๐พ/2)โ๐ฅโโ²๐ โ โฅ ๐น (๐ฅ) โ (๐พ/2)๐ฟ². The second inequality follows by estimating the infimum in (7.18) by ๐ง = ๐ฅ . □
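Both claims of Lemma 13.19 can be observed numerically for ๐น (๐ฅ) = |๐ฅ | on โ, whose Moreau envelope is the Huber function. A sketch (the grid-based minimization below is our own crude approximation, not from the text):

```python
import numpy as np

gamma, L = 0.5, 1.0              # F(x) = |x| is Lipschitz with constant L = 1
F = lambda x: np.abs(x)

def F_gamma(x):
    # Moreau envelope F_gamma(x) = inf_y F(y) + |x - y|^2 / (2*gamma),
    # approximated by minimizing over a fine grid.
    y = np.linspace(-5.0, 5.0, 20001)
    return float(np.min(F(y) + (x - y) ** 2 / (2 * gamma)))

xs = np.linspace(-2.0, 2.0, 41)
vals = np.array([F_gamma(x) for x in xs])

# F - gamma * L^2 / 2 <= F_gamma <= F:
assert np.all(vals <= F(xs) + 1e-6)
assert np.all(vals >= F(xs) - gamma * L**2 / 2 - 1e-6)

# F_gamma inherits the Lipschitz constant L:
slopes = np.abs(np.diff(vals) / np.diff(xs))
assert np.all(slopes <= L + 1e-6)
print("Moreau envelope bounds and Lipschitz constant verified")
```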
sum rule
With the aid of these results on support functionals, we can now show a sum rule.
Theorem 13.20. Let ๐น,๐บ : ๐ โ โ be locally Lipschitz continuous near ๐ฅ โ ๐ . Then,
๐๐ถ (๐น +๐บ) (๐ฅ) โ ๐๐ถ๐น (๐ฅ) + ๐๐ถ๐บ (๐ฅ) .
If ๐น and ๐บ are regular at ๐ฅ , then ๐น +๐บ is regular at ๐ฅ and equality holds.
Proof. It is clear that ๐น + ๐บ is locally Lipschitz continuous near ๐ฅ . Furthermore, from the properties of the lim sup we always have for all โ โ ๐ that
(13.9) (๐น +๐บ)โฆ(๐ฅ ;โ) โค ๐น โฆ(๐ฅ ;โ) +๐บโฆ(๐ฅ ;โ).
If ๐น and ๐บ are regular at ๐ฅ , the calculus of limits yields that
๐น โฆ(๐ฅ ;โ) +๐บโฆ(๐ฅ ;โ) = ๐น โฒ(๐ฅ ;โ) +๐บโฒ(๐ฅ ;โ) = (๐น +๐บ)โฒ(๐ฅ ;โ) โค (๐น +๐บ)โฆ(๐ฅ ;โ),
192
13 clarke subdifferentials
which implies that (๐น +๐บ)โฆ(๐ฅ ;โ) = (๐น +๐บ)โฒ(๐ฅ ;โ), i.e., ๐น +๐บ is regular.
By definition, it follows from (13.9) that
๐๐ถ (๐น +๐บ) (๐ฅ) โ {๐ฅโ โ ๐ โ | ใ๐ฅโ, โใ๐ โค ๐น โฆ(๐ฅ ;โ) +๐บโฆ(๐ฅ ;โ) for all โ โ ๐ } =: ๐ด
(with equality if ๐น and ๐บ are regular); it thus remains to show that ๐ด = ๐๐ถ๐น (๐ฅ) + ๐๐ถ๐บ (๐ฅ). For this, we use that both ๐๐ถ๐น (๐ฅ) and ๐๐ถ๐บ (๐ฅ) are convex and weakly-โ closed by Lemma 13.2, and, as shown in Lemma 13.1, that generalized directional derivatives and hence their sums are positively homogeneous, convex, and lower semicontinuous. We thus obtain from Lemma 13.12 for all โ โ ๐ that
sup_{๐ฅโโ๐๐ถ๐น (๐ฅ)+๐๐ถ๐บ (๐ฅ)} ใ๐ฅโ, โใ๐ = sup_{๐ฅโ1โ๐๐ถ๐น (๐ฅ)} ใ๐ฅโ1 , โใ๐ + sup_{๐ฅโ2โ๐๐ถ๐บ (๐ฅ)} ใ๐ฅโ2, โใ๐ = ๐น โฆ(๐ฅ ;โ) +๐บโฆ(๐ฅ ;โ) = sup_{๐ฅโโ๐ด} ใ๐ฅโ, โใ๐ .
The claimed equality of ๐ด and the sum of the subdifferentials now follows from Corollary 13.14. □
Note the differences to the convex sum rule: the generic inclusion is now in the other direction; furthermore, both functionals have to be regular, and in exactly the point where the sum rule is applied. By induction, one obtains from this a sum rule for an arbitrary finite number of functionals (which all have to be regular).
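The direction of the inclusion in Theorem 13.20 can be seen already for ๐น (๐ก) = |๐ก | and ๐บ (๐ก) = โ|๐ก | on โ (our own standard example, not from the text): ๐๐ถ (๐น +๐บ) (0) = {0} is strictly smaller than ๐๐ถ๐น (0) + ๐๐ถ๐บ (0) = [โ2, 2], since ๐บ is not regular at 0. A sketch that recovers the extreme points of each subdifferential from gradients at nearby differentiable points (anticipating Theorem 13.26):

```python
F = lambda x: abs(x)
G = lambda x: -abs(x)

# Central-difference gradients at differentiable points near 0
# (|x| is kept well above the step h so the quotient is exact):
pts = [s * 10.0 ** (-k) for s in (-1, 1) for k in range(3, 7)]
grad = lambda f, x, h=1e-9: (f(x + h) - f(x - h)) / (2 * h)

dF = sorted({round(grad(F, x), 6) for x in pts})   # extreme points of [-1, 1]
dG = sorted({round(grad(G, x), 6) for x in pts})   # extreme points of [-1, 1]
dFG = sorted({round(grad(lambda x: F(x) + G(x), x), 6) for x in pts})

print(dF, dG, dFG)   # [-1.0, 1.0] [-1.0, 1.0] [0.0]
```

Taking convex hulls, the sampled sets confirm that the sum of the subdifferentials strictly contains the subdifferential of the sum.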
chain rule
To prove a chain rule, we need the following “nonsmooth” mean value theorem.
Theorem 13.21. Let ๐น : ๐ โ โ be locally Lipschitz continuous near ๐ฅ โ ๐ and let ๐ฅฬ be in the Lipschitz neighborhood of ๐ฅ . Then there exist a λ̄ โ (0, 1) and an ๐ฅโ โ ๐๐ถ๐น (๐ฅ + λ̄(๐ฅฬ โ ๐ฅ)) such that
๐น (๐ฅฬ) โ ๐น (๐ฅ) = ใ๐ฅโ, ๐ฅฬ โ ๐ฅใ๐ .
Proof. Define ψ, φ : [0, 1] โ โ as
ψ(λ) โ ๐น (๐ฅ + λ(๐ฅฬ โ ๐ฅ)), φ(λ) โ ψ(λ) + λ(๐น (๐ฅ) โ ๐น (๐ฅฬ)).
By the assumptions on ๐น and ๐ฅฬ, both ψ and φ are Lipschitz continuous. In addition, φ(0) = ๐น (๐ฅ) = φ(1), and hence φ has a local minimum or maximum in an interior point λ̄ โ (0, 1). From the Fermat principle Theorem 13.4 or Corollary 13.11, respectively, together with the sum rule from Theorem 13.20 and the characterization of the subdifferential of the second term from Theorem 13.5, we thus obtain that
0 โ ๐๐ถφ(λ̄) โ ๐๐ถψ(λ̄) + {๐น (๐ฅ) โ ๐น (๐ฅฬ)}.
Hence we are finished if we can show for ๐ฅλ̄ โ ๐ฅ + λ̄(๐ฅฬ โ ๐ฅ) that
(13.10) ๐๐ถψ(λ̄) โ {ใ๐ฅโ, ๐ฅฬ โ ๐ฅใ๐ | ๐ฅโ โ ๐๐ถ๐น (๐ฅλ̄)} =: ๐ด.
For this purpose, consider for arbitrary ๐  โ โ the generalized derivative
ψโฆ(λ̄; ๐ ) = lim sup_{λโλ̄, ๐ก↘0} [ψ(λ + ๐ก๐ ) โ ψ(λ)]/๐ก
= lim sup_{λโλ̄, ๐ก↘0} [๐น (๐ฅ + (λ + ๐ก๐ ) (๐ฅฬ โ ๐ฅ)) โ ๐น (๐ฅ + λ(๐ฅฬ โ ๐ฅ))]/๐ก
โค lim sup_{๐งโ๐ฅλ̄, ๐ก↘0} [๐น (๐ง + ๐ก๐ (๐ฅฬ โ ๐ฅ)) โ ๐น (๐ง)]/๐ก = ๐น โฆ(๐ฅλ̄; ๐ (๐ฅฬ โ ๐ฅ)),
where the inequality follows from considering arbitrary sequences ๐ง โ ๐ฅλ̄ (instead of special sequences of the form ๐ง๐ = ๐ฅ + λ๐ (๐ฅฬ โ ๐ฅ)) in the last lim sup. Lemma 13.13 thus implies that
(13.11) ๐๐ถψ(λ̄) โ {๐กโ โ โ | ๐กโ๐  โค ๐น โฆ(๐ฅλ̄; ๐ (๐ฅฬ โ ๐ฅ)) for all ๐  โ โ} =: ๐ต.
It remains to show that the sets ๐ด and ๐ต from (13.10) and (13.11) coincide. But this follows again from Lemma 13.12 and Corollary 13.14, since for all ๐  โ โ we have that
sup_{๐กโโ๐ด} ๐กโ๐  = sup_{๐ฅโโ๐๐ถ๐น (๐ฅλ̄)} ใ๐ฅโ, ๐ (๐ฅฬ โ ๐ฅ)ใ๐ = ๐น โฆ(๐ฅλ̄; ๐ (๐ฅฬ โ ๐ฅ)) = sup_{๐กโโ๐ต} ๐กโ๐ . □
We also need the following generalization of the argument in Theorem 13.5.
Lemma 13.22. Let ๐,๐ be Banach spaces and ๐น : ๐ โ ๐ be continuously Fréchet differentiable at ๐ฅ โ ๐ . Let {๐ฅ๐}๐โโ โ ๐ be a sequence with ๐ฅ๐ โ ๐ฅ and {๐ก๐}๐โโ โ (0,โ) be a sequence with ๐ก๐ ↘ 0. Then for any โ โ ๐ ,
lim_{๐โโ} [๐น (๐ฅ๐ + ๐ก๐โ) โ ๐น (๐ฅ๐)]/๐ก๐ = ๐น โฒ(๐ฅ)โ.
Proof. Let โ โ ๐ be arbitrary. By the HahnโBanach extension Theorem 1.4, for every ๐ โ โ there exists a ๐ฆโ๐ โ ๐ โ with โ๐ฆโ๐ โ๐ โ = 1 and
โ๐ก๐โปยน(๐น (๐ฅ๐ + ๐ก๐โ) โ ๐น (๐ฅ๐)) โ ๐น โฒ(๐ฅ)โโ๐ = ใ๐ฆโ๐, ๐ก๐โปยน(๐น (๐ฅ๐ + ๐ก๐โ) โ ๐น (๐ฅ๐)) โ ๐น โฒ(๐ฅ)โใ๐ .
Applying now the classical mean value theorem to the scalar functions
๐๐ : [0, 1] โ โ, ๐๐ (๐ ) = ใ๐ฆโ๐, ๐น (๐ฅ๐ + ๐ ๐ก๐โ)ใ๐ ,
we obtain similarly to the proof of Theorem 2.10 for all ๐ โ โ that
โ๐ก๐โปยน(๐น (๐ฅ๐ + ๐ก๐โ) โ ๐น (๐ฅ๐)) โ ๐น โฒ(๐ฅ)โโ๐ = ๐ก๐โปยน โซ₀¹ ใ๐ฆโ๐, ๐น โฒ(๐ฅ๐ + ๐ ๐ก๐โ)๐ก๐โใ๐ ๐๐  โ ใ๐ฆโ๐, ๐น โฒ(๐ฅ)โใ๐
= โซ₀¹ ใ๐ฆโ๐, (๐น โฒ(๐ฅ๐ + ๐ ๐ก๐โ) โ ๐น โฒ(๐ฅ))โใ๐ ๐๐ 
โค โซ₀¹ โ๐น โฒ(๐ฅ๐ + ๐ ๐ก๐โ) โ ๐น โฒ(๐ฅ)โ๐(๐ ;๐ ) ๐๐  โโโ๐ ,
where we have used (1.1) together with โ๐ฆโ๐ โ๐ โ = 1 in the last step. Since ๐น โฒ is continuous by assumption, the integrand goes to zero as ๐ โ โ uniformly in ๐  โ [0, 1], and the claim follows. □
We now come to the chain rule, which in contrast to the convex case does not require the inner mapping to be linear; this is one of the main advantages of the Clarke subdifferential in the context of nonsmooth optimization.
Theorem 13.23. Let ๐ be a separable Banach space, ๐น : ๐ โ ๐ be continuously Fréchet differentiable at ๐ฅ โ ๐ , and ๐บ : ๐ โ โ be locally Lipschitz continuous near ๐น (๐ฅ). Then,
๐๐ถ (๐บ โฆ ๐น ) (๐ฅ) โ ๐น โฒ(๐ฅ)โ๐๐ถ๐บ (๐น (๐ฅ)) โ {๐น โฒ(๐ฅ)โ๐ฆโ | ๐ฆโ โ ๐๐ถ๐บ (๐น (๐ฅ))} .
If ๐บ is regular at ๐น (๐ฅ), then ๐บ โฆ ๐น is regular at ๐ฅ , and equality holds.
Proof. Let us write ๐๐น (๐ฅ) for the neighborhood of ๐น (๐ฅ) where ๐บ is Lipschitz with factor ๐ฟ. The local Lipschitz continuity of ๐บ โฆ ๐น follows from that of ๐บ and ๐น (which in turn follows from Lemma 2.11). For the claimed inclusion (respectively, equality), we argue as above. First we show that for every โ โ ๐ there exists a ๐ฆโ โ ๐๐ถ๐บ (๐น (๐ฅ)) with
(13.12) (๐บ โฆ ๐น )โฆ(๐ฅ ;โ) = ใ๐ฆโ, ๐น โฒ(๐ฅ)โใ๐ .
For this, consider for given โ โ ๐ sequences {๐ฅ๐}๐โโ โ ๐ and {๐ก๐}๐โโ โ (0,โ) with ๐ฅ๐ โ ๐ฅ , ๐ก๐ ↘ 0, and
(๐บ โฆ ๐น )โฆ(๐ฅ ;โ) = lim_{๐โโ} [๐บ (๐น (๐ฅ๐ + ๐ก๐โ)) โ๐บ (๐น (๐ฅ๐))]/๐ก๐ .
Furthermore, by continuity of ๐น , we can find ๐0 โ โ such that for all ๐ โฅ ๐0, both ๐น (๐ฅ๐), ๐น (๐ฅ๐ + ๐ก๐โ) โ ๐๐น (๐ฅ) . Theorem 13.21 thus yields for all ๐ โฅ ๐0 a λ๐ โ (0, 1) and a ๐ฆโ๐ โ ๐๐ถ๐บ (๐ฆ๐) for ๐ฆ๐ := ๐น (๐ฅ๐) + λ๐ (๐น (๐ฅ๐ + ๐ก๐โ) โ ๐น (๐ฅ๐)) with
(13.13) [๐บ (๐น (๐ฅ๐ + ๐ก๐โ)) โ๐บ (๐น (๐ฅ๐))]/๐ก๐ = ใ๐ฆโ๐, ๐๐ใ๐ for ๐๐ โ [๐น (๐ฅ๐ + ๐ก๐โ) โ ๐น (๐ฅ๐)]/๐ก๐ .
Since λ๐ โ (0, 1) is uniformly bounded, we also have that ๐ฆ๐ โ ๐น (๐ฅ) for ๐ โ โ. Hence, for ๐ large enough, ๐ฆ๐ โ ๐๐น (๐ฅ) . By Lemma 13.1 and the discussion following (13.2) then, eventually, ๐ฆโ๐ โ ๐๐ถ๐บ (๐ฆ๐) โ ๐น(0, ๐ฟ). This implies that {๐ฆโ๐}๐โโ โ ๐ โ is bounded, and the BanachโAlaoglu Theorem 1.11 yields a weakly-โ convergent subsequence with limit ๐ฆโ โ ๐๐ถ๐บ (๐น (๐ฅ)) by Lemma 13.3. Finally, since ๐น is continuously Fréchet differentiable, ๐๐ โ ๐น โฒ(๐ฅ)โ strongly in ๐ by Lemma 13.22. Hence, ใ๐ฆโ๐, ๐๐ใ๐ โ ใ๐ฆโ, ๐น โฒ(๐ฅ)โใ๐ as the duality pairing of weakly-โ and strongly converging sequences. Passing to the limit in (13.13) therefore yields (13.12) (first along the subsequence chosen above; by convergence of the left-hand side of (13.12) and the uniqueness of limits then for the full sequence as well). By definition of the Clarke subdifferential, we thus have for a ๐ฆโ โ ๐๐ถ๐บ (๐น (๐ฅ)) that
(13.14) (๐บ โฆ ๐น )โฆ(๐ฅ ;โ) = ใ๐ฆโ, ๐น โฒ(๐ฅ)โใ๐ โค ๐บโฆ(๐น (๐ฅ); ๐น โฒ(๐ฅ)โ).
If ๐บ is now regular at ๐น (๐ฅ), we have that ๐บโฆ(๐น (๐ฅ); ๐น โฒ(๐ฅ)โ) = ๐บโฒ(๐น (๐ฅ); ๐น โฒ(๐ฅ)โ) and hence by the local Lipschitz continuity of ๐บ and the Fréchet differentiability of ๐น that
๐บโฆ(๐น (๐ฅ); ๐น โฒ(๐ฅ)โ) = lim_{๐ก↘0} [๐บ (๐น (๐ฅ) + ๐ก๐น โฒ(๐ฅ)โ) โ๐บ (๐น (๐ฅ))]/๐ก
= lim_{๐ก↘0} [๐บ (๐น (๐ฅ) + ๐ก๐น โฒ(๐ฅ)โ) โ๐บ (๐น (๐ฅ + ๐กโ)) +๐บ (๐น (๐ฅ + ๐กโ)) โ๐บ (๐น (๐ฅ))]/๐ก
โค lim_{๐ก↘0} (๐ฟโโโ๐ โ๐น (๐ฅ) + ๐น โฒ(๐ฅ)๐กโ โ ๐น (๐ฅ + ๐กโ)โ๐ /(๐กโโโ๐) + [๐บ (๐น (๐ฅ + ๐กโ)) โ๐บ (๐น (๐ฅ))]/๐ก)
= (๐บ โฆ ๐น )โฒ(๐ฅ ;โ) โค (๐บ โฆ ๐น )โฆ(๐ฅ ;โ).
(Since both the sum and the second summand in the next-to-last line converge, this has to be the case for the first summand as well.) Together with (13.14), this implies that (๐บ โฆ ๐น )โฒ(๐ฅ ;โ) = (๐บ โฆ ๐น )โฆ(๐ฅ ;โ) (i.e., ๐บ โฆ ๐น is regular at ๐ฅ ) and that
(13.15) (๐บ โฆ ๐น )โฆ(๐ฅ ;โ) = ๐บโฆ(๐น (๐ฅ); ๐น โฒ(๐ฅ)โ).
As before, Lemma 13.12 now implies for all โ โ ๐ that
sup_{๐ฅโโ๐น โฒ(๐ฅ)โ๐๐ถ๐บ (๐น (๐ฅ))} ใ๐ฅโ, โใ๐ = sup_{๐ฆโโ๐๐ถ๐บ (๐น (๐ฅ))} ใ๐ฆโ, ๐น โฒ(๐ฅ)โใ๐ = ๐บโฆ(๐น (๐ฅ); ๐น โฒ(๐ฅ)โ)
and hence by Lemma 13.13 that
๐น โฒ(๐ฅ)โ๐๐ถ๐บ (๐น (๐ฅ)) = {๐ฅโ โ ๐ โ | ใ๐ฅโ, โใ๐ โค ๐บโฆ(๐น (๐ฅ); ๐น โฒ(๐ฅ)โ) for all โ โ ๐ } .
Combined with (13.14) or (13.15) and the definition of the Clarke subdifferential in (13.2), this now yields the claimed equality or inclusion for the Clarke subdifferential of the composition. □
Again, the generic inclusion is the reverse of the one in the convex chain rule. Note that equality in the chain rule also holds if โ๐บ is regular, since we can then apply Theorem 13.23 to โ๐บ โฆ ๐น and use that ๐๐ถ (โ๐บ) (๐น (๐ฅ)) = โ๐๐ถ๐บ (๐น (๐ฅ)) by Theorem 13.10. Furthermore, if ๐บ is not regular but ๐น โฒ(๐ฅ) is surjective, a similar proof shows that equality (but not the regularity of ๐บ โฆ ๐น ) holds in the chain rule; see [Clarke 2013, Theorem 10.19].
Example 13.24. As a simple example, we consider
๐ : โ2 โ โ, (๐ฅ1, ๐ฅ2) โฆโ |๐ฅ1๐ฅ2 |,
which is not convex. To compute the Clarke subdifferential, we write ๐ = ๐ โฆ ๐ for
๐ : โ โ โ, ๐ก โฆโ |๐ก |, ๐ : โ2 โ โ, (๐ฅ1, ๐ฅ2) โฆโ ๐ฅ1๐ฅ2,
where ๐ is finite-valued, convex, and Lipschitz continuous, and hence regular at any ๐ก โ โ, and ๐ is continuously differentiable for all ๐ฅ โ โ2 with Fréchet derivative
๐ โฒ(๐ฅ) : โ2 โ โ, ๐ โฒ(๐ฅ)โ = ๐ฅ2โ1 + ๐ฅ1โ2.
Its adjoint is easily verified to be given by
๐ โฒ(๐ฅ)โ : โ โ โ2, ๐ โฒ(๐ฅ)โ๐ก = (๐ฅ2๐ก, ๐ฅ1๐ก).
Hence, Theorem 13.23 together with Theorem 13.8 yields that ๐ is regular at any ๐ฅ โ โ2 and that
๐๐ถ ๐ (๐ฅ) = ๐ โฒ(๐ฅ)โ๐๐(๐ (๐ฅ)) = (๐ฅ2, ๐ฅ1) sign(๐ฅ1๐ฅ2),
for the set-valued sign function from Example 4.7.
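The formula in Example 13.24 can be sanity-checked numerically at points where ๐ฅ1๐ฅ2 ≠ 0, where ๐ is differentiable and the subdifferential reduces to the gradient. A minimal sketch (the test points are our own choice):

```python
import numpy as np

f = lambda x1, x2: abs(x1 * x2)

def num_grad(x1, x2, h=1e-6):
    # Central-difference approximation of the gradient of f.
    return np.array([(f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h),
                     (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)])

# Where x1*x2 != 0, the chain-rule formula gives grad f = sign(x1*x2) * (x2, x1):
for x1, x2 in [(1.0, 2.0), (-1.5, 0.5), (2.0, -3.0)]:
    formula = np.sign(x1 * x2) * np.array([x2, x1])
    assert np.allclose(num_grad(x1, x2), formula, atol=1e-4)

# On the axes the sign is set-valued: e.g. at (x1, 0) with x1 != 0 the
# subdifferential is the segment {(0, s * x1) : s in [-1, 1]}.
print("chain-rule formula matches numerical gradients")
```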
13.4 characterization in finite dimensions
A more explicit characterization of the Clarke subdifferential is possible in finite-dimensional spaces. The basis is the following theorem, which only holds in โ๐ ; a proof can be found in, e.g., [DiBenedetto 2002, Theorem 23.2] or [Heinonen 2005, Theorem 3.1].
Theorem 13.25 (Rademacher). Let ๐ โ โ๐ be open and ๐น : ๐ โ โ be Lipschitz continuous. Then ๐น is Fréchet differentiable at almost every ๐ฅ โ ๐ .
This result allows replacing the lim sup in the definition of the Clarke subdifferential (now considered as a subset of โ๐ , i.e., identifying the dual of โ๐ with โ๐ itself) with a proper limit.
Theorem 13.26. Let ๐น : โ๐ โ โ be locally Lipschitz continuous near ๐ฅ โ โ๐ . Then ๐น is Fréchet differentiable on โ๐ \ ๐ธ๐น for a set ๐ธ๐น โ โ๐ of Lebesgue measure 0 and
(13.16) ๐๐ถ๐น (๐ฅ) = co {lim_{๐โโ} โ๐น (๐ฅ๐) | ๐ฅ๐ โ ๐ฅ, ๐ฅ๐ โ ๐ธ๐น },
where co๐ด denotes the convex hull of ๐ด โ โ๐ .
Proof. We first note that the Rademacher Theorem ensures that such a set ๐ธ๐น exists and โ possibly after intersection with the Lipschitz neighborhood of ๐ฅ โ has Lebesgue measure 0. Hence there indeed exist sequences {๐ฅ๐}๐โโ โ โ๐ \ ๐ธ๐น with ๐ฅ๐ โ ๐ฅ . Furthermore, the local Lipschitz continuity of ๐น yields that for any ๐ฅ๐ in the Lipschitz neighborhood of ๐ฅ and any โ โ โ๐ , we have that
|ใโ๐น (๐ฅ๐), โใ| = |lim_{๐ก↘0} [๐น (๐ฅ๐ + ๐กโ) โ ๐น (๐ฅ๐)]/๐ก | โค ๐ฟโโโ
and hence that โโ๐น (๐ฅ๐)โ โค ๐ฟ. This implies that {โ๐น (๐ฅ๐)}๐โโ is bounded and thus contains a convergent subsequence. The set on the right-hand side of (13.16) is therefore nonempty.
Let now {๐ฅ๐}๐โโ โ โ๐ \ ๐ธ๐น be an arbitrary sequence with ๐ฅ๐ โ ๐ฅ and โ๐น (๐ฅ๐) โ ๐ฅโ for some ๐ฅโ โ โ๐ . Since ๐น is differentiable at every ๐ฅ๐ โ ๐ธ๐น by definition, Lemma 13.7 yields that โ๐น (๐ฅ๐) โ ๐๐ถ๐น (๐ฅ๐), and hence ๐ฅโ โ ๐๐ถ๐น (๐ฅ) by Lemma 13.3. The convexity of ๐๐ถ๐น (๐ฅ) now implies that any convex combination of such limits ๐ฅโ is contained in ๐๐ถ๐น (๐ฅ), which shows the inclusion “⊃” in (13.16).
For the other inclusion, we first show for all โ โ โ๐ and ε > 0 that
(13.17) ๐น โฆ(๐ฅ ;โ) โ ε โค lim sup_{๐ธ๐นโ๐ฆโ๐ฅ} ใโ๐น (๐ฆ), โใ =: ๐ (โ).
Indeed, by definition of ๐ (โ) and of the lim sup, for every ε > 0 there exists a ๐ฟ > 0 such that
ใโ๐น (๐ฆ), โใ โค ๐ (โ) + ε for all ๐ฆ โ ๐(๐ฅ, ๐ฟ) \ ๐ธ๐น .
Here, ๐ฟ > 0 can be chosen sufficiently small for ๐น to be Lipschitz continuous on ๐(๐ฅ, ๐ฟ). In particular, ๐ธ๐น โฉ ๐(๐ฅ, ๐ฟ) is a set of zero measure. Hence, ๐น is differentiable at ๐ฆ + ๐กโ for almost all ๐ฆ โ ๐(๐ฅ, ๐ฟ/2) and almost all ๐ก โ (0, ๐ฟ/(2โโโ)) by Fubini's Theorem. The classical mean value theorem therefore yields for all such ๐ฆ and ๐ก that
(13.18) ๐น (๐ฆ + ๐กโ) โ ๐น (๐ฆ) = โซ₀แต ใโ๐น (๐ฆ + ๐ โ), โใ ๐๐  โค ๐ก (๐ (โ) + ε)
since ๐ฆ + ๐ โ โ ๐(๐ฅ, ๐ฟ) for all ๐  โ (0, ๐ก) by the choice of ๐ก . The continuity of ๐น implies that the full inequality (13.18) even holds for all ๐ฆ โ ๐(๐ฅ, ๐ฟ/2) and all ๐ก โ (0, ๐ฟ/(2โโโ)). Dividing by ๐ก > 0 and taking the lim sup over all ๐ฆ โ ๐ฅ and ๐ก ↘ 0 now yields (13.17). Since ε > 0 was arbitrary, we conclude that ๐น โฆ(๐ฅ ;โ) โค ๐ (โ) for all โ โ โ๐ .
As in Lemma 13.1, one can show that the mapping โ โฆโ ๐ (โ) is positively homogeneous, subadditive, and lower semicontinuous. We are thus finished if we can show that the set on the right-hand side of (13.16) โ hereafter denoted by co๐ด โ can be written as
co๐ด = {๐ฅโ โ โ๐ | ใ๐ฅโ, โใ โค ๐ (โ) for all โ โ โ๐}.
For this, we once again appeal to Corollary 13.14 (since both sets are nonempty, convex, and closed). First, we note that the definition of the convex hull implies for all โ โ โ๐ that
sup_{๐ฅโโco๐ด} ใ๐ฅโ, โใ = sup_{๐ฅโ๐ โ๐ด, Σ๐ก๐=1, ๐ก๐โฅ0} Σ๐ ๐ก๐ ใ๐ฅโ๐ , โใ = sup_{Σ๐ก๐=1, ๐ก๐โฅ0} Σ๐ ๐ก๐ sup_{๐ฅโ๐ โ๐ด} ใ๐ฅโ๐ , โใ = sup_{๐ฅโโ๐ด} ใ๐ฅโ, โใ
since the sum is maximal if and only if each summand is maximal. Now we have that
๐ (โ) = lim sup_{๐ธ๐นโ๐ฆโ๐ฅ} ใโ๐น (๐ฆ), โใ = sup_{๐ธ๐นโ๐ฅ๐โ๐ฅ} ใlim_{๐โโ} โ๐น (๐ฅ๐), โใ = sup_{๐ฅโโ๐ด} ใ๐ฅโ, โใ,
and hence the claim follows from Lemma 13.12. □
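Theorem 13.26 suggests a simple numerical procedure in โ๐: sample gradients at nearby differentiable points and take their convex hull. A sketch for ๐น (๐ฅ) = max(๐ฅ1, ๐ฅ2) at ๐ฅ = 0 (the function and the sampling scheme are our own illustration): the sampled gradients are the extreme points (1, 0) and (0, 1), whose convex hull is ๐๐ถ๐น (0), and taking suprema over them recovers ๐น โฆ(0;โ) = max(โ1, โ2) as in Corollary 13.15.

```python
import numpy as np

# F(x) = max(x1, x2) is Lipschitz on R^2, differentiable off the diagonal.
F = lambda x: max(x[0], x[1])

def num_grad(x, h=1e-9):
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2); e[i] = h
        g[i] = (F(x + e) - F(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(1)
# Keep only points safely away from the nondifferentiability set {x1 = x2}:
pts = [x for x in rng.normal(scale=1e-3, size=(500, 2))
       if abs(x[0] - x[1]) > 1e-6]
grads = {tuple(np.round(num_grad(x), 6)) for x in pts}
print(sorted(grads))      # the extreme points (0, 1) and (1, 0)

# Support over the extreme points = support over their convex hull = F°(0; h):
h = np.array([0.3, -0.7])
assert abs(max(h) - max(np.dot(g, h) for g in grads)) < 1e-6
```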
Remark 13.27. It is possible to extend the Clarke subdifferential defined here to extended real-valued functions using an equivalent, more geometrical, construction involving generalized normal cones to epigraphs; see [Clarke 1990, Definition 2.4.10]. We will follow this approach when studying the more general subdifferentials for set-valued functionals in Chapters 18 and 20.
14 SEMISMOOTH NEWTON METHODS
The proximal point and splitting methods in Chapter 8 are generalizations of gradient methods and in general have at most linear convergence. In this chapter, we will therefore consider second-order methods, specifically a generalization of Newton's method which admits (locally) superlinear convergence.
14.1 convergence of generalized newton methods
As a motivation, we first consider the most general form of a Newton-type method. Let ๐ and ๐ be Banach spaces and ๐น : ๐ โ ๐ be given and suppose we are looking for an ๐ฅ โ ๐ with ๐น (๐ฅ) = 0. A Newton-type method to find such an ๐ฅ then consists of repeating the following steps:
1. choose an invertible ๐๐ โ ๐ (๐ฅ๐) โ ๐(๐ ;๐ );
2. solve the Newton step ๐๐๐ ๐ = โ๐น (๐ฅ๐);
3. update ๐ฅ๐+1 = ๐ฅ๐ + ๐ ๐ .
We can now ask under which conditions this method converges to ๐ฅ , and in particular, when the convergence is superlinear, i.e.,
(14.1) lim_{๐โโ} โ๐ฅ๐+1 โ ๐ฅ โ๐ / โ๐ฅ๐ โ ๐ฅ โ๐ = 0.
(Recall the discussion in the beginning of Chapter 10.) For this purpose, we set ๐๐ โ ๐ฅ๐ โ ๐ฅ and use the Newton step together with the fact that ๐น (๐ฅ) = 0 to obtain that
โ๐ฅ๐+1 โ ๐ฅ โ๐ = โ๐ฅ๐ โ๐ (๐ฅ๐)โปยน๐น (๐ฅ๐) โ ๐ฅ โ๐
= โ๐ (๐ฅ๐)โปยน[๐น (๐ฅ๐) โ ๐น (๐ฅ) โ๐ (๐ฅ๐) (๐ฅ๐ โ ๐ฅ)]โ๐
= โ๐ (๐ฅ + ๐๐)โปยน[๐น (๐ฅ + ๐๐) โ ๐น (๐ฅ) โ๐ (๐ฅ + ๐๐)๐๐]โ๐
โค โ๐ (๐ฅ + ๐๐)โปยนโ๐(๐ ;๐ ) โ๐น (๐ฅ + ๐๐) โ ๐น (๐ฅ) โ๐ (๐ฅ + ๐๐)๐๐ โ๐ .
Hence, (14.1) holds under
(i) a regularity condition: there exists a $C>0$ with
$$\|M(\bar x+e^k)^{-1}\|_{\mathbb{L}(Y;X)}\le C\quad\text{for all }k\in\mathbb{N};$$

(ii) an approximation condition:
$$\lim_{k\to\infty}\frac{\|F(\bar x+e^k)-F(\bar x)-M(\bar x+e^k)e^k\|_Y}{\|e^k\|_X}=0.$$

This motivates the following definition: We call $F:X\to Y$ Newton differentiable in $x\in X$ if there exist a neighborhood $U\subset X$ of $x$ and a mapping $D_NF:U\to\mathbb{L}(X;Y)$ such that
$$(14.2)\qquad\lim_{\|h\|_X\to 0}\frac{\|F(x+h)-F(x)-D_NF(x+h)h\|_Y}{\|h\|_X}=0.$$
We then call $D_NF(x)$ a Newton derivative of $F$ at $x$. Note the differences to the Fréchet derivative: First, the Newton derivative is evaluated in $x+h$ instead of $x$. More importantly, we have not required any connection between $D_NF$ and $F$, while the only possible candidate for the Fréchet derivative was the Gâteaux derivative (which itself was linked to $F$ via the directional derivative). A function thus can only be Newton differentiable (or not) with respect to a concrete choice of $D_NF$. In particular, Newton derivatives are not unique.

If $F$ is Newton differentiable with Newton derivative $D_NF$, we can set $M(x^k)=D_NF(x^k)$ and obtain the semismooth Newton method
$$(14.3)\qquad x^{k+1}=x^k-D_NF(x^k)^{-1}F(x^k).$$
Its local superlinear convergence follows directly from the construction.

Theorem 14.1. Let $X,Y$ be Banach spaces and let $F:X\to Y$ be Newton differentiable at $\bar x\in X$ with $F(\bar x)=0$ with Newton derivative $D_NF(\bar x)$. Assume further that there exist $\delta>0$ and $C>0$ with $\|D_NF(x)^{-1}\|_{\mathbb{L}(Y;X)}\le C$ for all $x\in\mathbb{B}(\bar x,\delta)$. Then the semismooth Newton method (14.3) converges to $\bar x$ for all $x^0$ sufficiently close to $\bar x$.

Proof. The proof is virtually identical to that for the classical Newton method. We have already shown that for any $x^0\in\mathbb{B}(\bar x,\delta)$,
$$(14.4)\qquad\|e^1\|_X\le C\|F(\bar x+e^0)-F(\bar x)-D_NF(\bar x+e^0)e^0\|_Y.$$
Let now $\varepsilon\in(0,1)$ be arbitrary. The Newton differentiability of $F$ then implies that there exists a $\rho>0$ such that
$$\|F(\bar x+h)-F(\bar x)-D_NF(\bar x+h)h\|_Y\le\frac{\varepsilon}{C}\|h\|_X\quad\text{for all }\|h\|_X\le\rho.$$
Hence, if we choose $x^0$ such that $\|\bar x-x^0\|_X\le\min\{\delta,\rho\}$, the estimate (14.4) implies that $\|\bar x-x^1\|_X\le\varepsilon\|\bar x-x^0\|_X$. By induction, we obtain from this that $\|\bar x-x^k\|_X\le\varepsilon^k\|\bar x-x^0\|_X\to 0$. Since $\varepsilon\in(0,1)$ was arbitrary, we can take in each step $k$ a different $\varepsilon_k\to 0$, which shows that the convergence is in fact superlinear. □
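The iteration (14.3) can be illustrated with a minimal numerical sketch (a hypothetical example, not from the text): we apply the semismooth Newton method to the piecewise linear function $f(x)=\max\{0,x\}+x/2-1$ with root $2/3$, using as Newton derivative an element of the Clarke subdifferential. Since $f$ is piecewise linear, the method even terminates after finitely many steps.

```python
def f(x):
    # piecewise linear function with root x = 2/3
    return max(0.0, x) + 0.5 * x - 1.0

def DN_f(x):
    # Newton derivative: an element of the Clarke subdifferential of max{0, .}
    # plus 1/2; at x = 0 any value in [0, 1] may be chosen for the max-part
    return (1.0 if x >= 0 else 0.0) + 0.5

x = -5.0
for _ in range(20):
    if abs(f(x)) < 1e-14:
        break
    x = x - f(x) / DN_f(x)   # semismooth Newton step (14.3)
# x is now (up to rounding) the root 2/3, reached in three steps
```

Starting far from the root, the first step jumps across the kink at $0$; once the correct affine piece is identified, one further step solves the equation exactly.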
14.2 newton derivatives
The remainder of this chapter is dedicated to the construction of Newton derivatives (although it should be pointed out that the verification of the approximation condition is usually the much more involved step in practice). We begin with the obvious connection with the Fréchet derivative.

Theorem 14.2. Let $X,Y$ be Banach spaces. If $F:X\to Y$ is continuously differentiable at $x\in X$, then $F$ is also Newton differentiable at $x$ with Newton derivative $D_NF(x)=F'(x)$.

Proof. We have for arbitrary $h\in X$ that
$$\|F(x+h)-F(x)-F'(x+h)h\|_Y\le\|F(x+h)-F(x)-F'(x)h\|_Y+\|F'(x)-F'(x+h)\|_{\mathbb{L}(X;Y)}\|h\|_X,$$
where the first summand is $o(\|h\|_X)$ by definition of the Fréchet derivative and the second by the continuity of $F'$. □

Calculus rules can be shown similarly to those for Fréchet derivatives. For the sum rule this is immediate; here we prove a chain rule by way of example.

Theorem 14.3. Let $X$, $Y$, and $Z$ be Banach spaces, and let $F:X\to Y$ be Newton differentiable at $x\in X$ with Newton derivative $D_NF(x)$ and $G:Y\to Z$ be Newton differentiable at $y\coloneqq F(x)\in Y$ with Newton derivative $D_NG(y)$. If $D_NF$ and $D_NG$ are uniformly bounded in a neighborhood of $x$ and $y$, respectively, then $G\circ F$ is also Newton differentiable at $x$ with Newton derivative
$$D_N(G\circ F)(x)=D_NG(F(x))\circ D_NF(x).$$

Proof. We proceed as in the proof of Theorem 2.7. For $h\in X$ and $g\coloneqq F(x+h)-F(x)$ we have that
$$(G\circ F)(x+h)-(G\circ F)(x)=G(y+g)-G(y).$$
The Newton differentiability of $G$ then implies that
$$\|(G\circ F)(x+h)-(G\circ F)(x)-D_NG(y+g)g\|_Z=r_1(\|g\|_Y)$$
with $r_1(t)/t\to 0$ for $t\to 0$. The Newton differentiability of $F$ further implies that
$$\|g-D_NF(x+h)h\|_Y=r_2(\|h\|_X)$$
with $r_2(t)/t\to 0$ for $t\to 0$. In particular,
$$\|g\|_Y\le\|D_NF(x+h)\|_{\mathbb{L}(X;Y)}\|h\|_X+r_2(\|h\|_X).$$
The uniform boundedness of $D_NF$ now implies that $\|g\|_Y\to 0$ for $\|h\|_X\to 0$. Hence, using that $y+g=F(x+h)$, we obtain
$$\begin{aligned}
\|(G\circ F)(x+h)&-(G\circ F)(x)-D_NG(F(x+h))D_NF(x+h)h\|_Z\\
&\le\|G(y+g)-G(y)-D_NG(y+g)g\|_Z+\|D_NG(y+g)[g-D_NF(x+h)h]\|_Z\\
&\le r_1(\|g\|_Y)+\|D_NG(y+g)\|_{\mathbb{L}(Y;Z)}\,r_2(\|h\|_X),
\end{aligned}$$
and the claim thus follows from the uniform boundedness of $D_NG$. □

Finally, it follows directly from the definition of the product norm and Newton differentiability that Newton derivatives of vector-valued functions can be computed componentwise.

Theorem 14.4. Let $X,Y_i$ be Banach spaces and let $F_i:X\to Y_i$ be Newton differentiable with Newton derivative $D_NF_i$ for $1\le i\le m$. Then
$$F:X\to(Y_1\times\dots\times Y_m),\qquad x\mapsto(F_1(x),\dots,F_m(x))^T,$$
is also Newton differentiable with Newton derivative
$$D_NF(x)=(D_NF_1(x),\dots,D_NF_m(x))^T.$$

Since the definition of a Newton derivative is not constructive, allowing different choices, the question remains how to obtain a candidate for which the approximation condition in the definition can be verified. For two classes of functions, such an explicit construction is known.

locally lipschitz continuous functions on $\mathbb{R}^N$

If $F:\mathbb{R}^N\to\mathbb{R}$ is locally Lipschitz continuous, candidates can be taken from the Clarke subdifferential, which has an explicit characterization by Theorem 13.26. Under some additional assumptions, each candidate is indeed a Newton derivative.

A function $F:\mathbb{R}^N\to\mathbb{R}$ is called piecewise (continuously) differentiable or PC¹ function, if

(i) $F$ is continuous on $\mathbb{R}^N$, and

(ii) for all $x\in\mathbb{R}^N$ there exist an open neighborhood $U_x\subset\mathbb{R}^N$ of $x$ and a finite set $\{F_i:U_x\to\mathbb{R}\}_{i\in I_x}$ of continuously differentiable functions with
$$F(y)\in\{F_i(y)\}_{i\in I_x}\quad\text{for all }y\in U_x.$$
In this case, we call $F$ a continuous selection of the $F_i$ in $U_x$. The set
$$I_a(x)\coloneqq\{i\in I_x\mid F(x)=F_i(x)\}$$
is called the active index set at $x$. Since the $F_i$ are continuous, we have that $F(y)\neq F_j(y)$ for all $j\notin I_a(x)$ and $y$ sufficiently close to $x$. Hence, indices that are only active on sets of zero measure do not have to be considered in the following. We thus define the essentially active index set
$$I_e(x)\coloneqq\{i\in I_x\mid x\in\operatorname{cl}\left(\operatorname{int}\{y\in U_x\mid F(y)=F_i(y)\}\right)\}\subset I_a(x).$$

An example of an active but not essentially active index set is the following.

Example 14.5. Consider the function
$$f:\mathbb{R}\to\mathbb{R},\qquad t\mapsto\max\{0,t,t/2\},$$
i.e., $f_1(t)=0$, $f_2(t)=t$, and $f_3(t)=t/2$. Then $I_a(0)=\{1,2,3\}$ but $I_e(0)=\{1,2\}$, since $f_3$ is active only at $t=0$ and hence $\operatorname{int}\{t\in\mathbb{R}\mid f(t)=f_3(t)\}=\emptyset=\operatorname{cl}\emptyset$.
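The index sets of Example 14.5 can also be checked numerically. The following sketch uses agreement on more than a single point of a fine grid as a stand-in for agreement on a set with nonempty interior; all names are illustrative.

```python
import numpy as np

pieces = [lambda t: 0.0 * t, lambda t: 1.0 * t, lambda t: t / 2]   # f_1, f_2, f_3
f = lambda t: np.maximum(0.0, np.maximum(t, t / 2))                # f(t) = max{0, t, t/2}

t0 = 0.0
active = [i + 1 for i, fi in enumerate(pieces) if fi(t0) == f(t0)]

# proxy for essential activity: f must agree with f_i on more than a null set
# near t0, which on a fine grid means agreement at more than the point t0 itself
ts = np.linspace(t0 - 1e-3, t0 + 1e-3, 20001)
essentially_active = [
    i + 1 for i, fi in enumerate(pieces)
    if np.count_nonzero(np.isclose(fi(ts), f(ts))) > 1
]
# active == [1, 2, 3] while essentially_active == [1, 2]
```

The piece $f_3$ touches $f$ only at the single grid point $t=0$ and is therefore filtered out, matching $I_e(0)=\{1,2\}$.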
Since any $C^1$ function $F_i:U_x\to\mathbb{R}$ is Lipschitz continuous with Lipschitz constant $L_i\coloneqq\sup_{y\in U_x}\|\nabla F_i(y)\|$ by Lemma 2.11, PC¹ functions are always locally Lipschitz continuous; see [Scholtes 2012, Corollary 4.1.1].

Theorem 14.6. Let $F:\mathbb{R}^N\to\mathbb{R}$ be piecewise differentiable. Then $F$ is locally Lipschitz continuous on $\mathbb{R}^N$ with local constant $L(x)=\max_{i\in I_e(x)}L_i$.

This yields the following explicit characterization of the Clarke subdifferential of PC¹ functions.

Theorem 14.7. Let $F:\mathbb{R}^N\to\mathbb{R}$ be piecewise differentiable and $x\in\mathbb{R}^N$. Then
$$\partial_CF(x)=\operatorname{co}\{\nabla F_i(x)\mid i\in I_e(x)\}.$$

Proof. Let $x\in\mathbb{R}^N$ be arbitrary. By Theorem 13.26 it suffices to show that
$$\Big\{\lim_{n\to\infty}\nabla F(x_n)\ \Big|\ x_n\to x,\ x_n\notin\mathbb{E}_F\Big\}=\{\nabla F_i(x)\mid i\in I_e(x)\},$$
where $\mathbb{E}_F$ is the set of Lebesgue measure $0$ where $F$ is not differentiable from Rademacher's Theorem. For this, let $\{x_n\}_{n\in\mathbb{N}}\subset\mathbb{R}^N$ be a sequence with $x_n\to x$, $F$ is differentiable at $x_n$ for all $n\in\mathbb{N}$, and $\nabla F(x_n)\to x^*\in\mathbb{R}^N$. Since $F$ is differentiable at $x_n$, it must hold that $F(y)=F_{i_n}(y)$ for some $i_n\in I_a(x_n)$ and all $y$ sufficiently close to $x_n$, which implies that $\nabla F(x_n)=\nabla F_{i_n}(x_n)$. For sufficiently large $n\in\mathbb{N}$, we can further assume that $i_n\in I_e(x)$ (if necessary, by adding $x_n$ with $i_n\notin I_e(x)$ to $\mathbb{E}_F$, which does not increase its Lebesgue measure). If we now consider subsequences $\{x_{n_k}\}_{k\in\mathbb{N}}$ with constant index $i_{n_k}=i\in I_e(x)$ (which exist since $I_e(x)$ is finite), we obtain using the continuity of $\nabla F_i$ that
$$x^*=\lim_{k\to\infty}\nabla F(x_{n_k})=\lim_{k\to\infty}\nabla F_i(x_{n_k})\in\{\nabla F_i(x)\mid i\in I_e(x)\}.$$
Conversely, for every $\nabla F_i(x)$ with $i\in I_e(x)$ there exists by definition of the essentially active indices a sequence $\{x_n\}_{n\in\mathbb{N}}$ with $x_n\to x$ and $F=F_i$ in a sufficiently small neighborhood of each $x_n$ for $n$ large enough. The continuous differentiability of the $F_i$ thus implies that $\nabla F(x_n)=\nabla F_i(x_n)$ for all $n\in\mathbb{N}$ large enough and hence that
$$\nabla F_i(x)=\lim_{n\to\infty}\nabla F_i(x_n)=\lim_{n\to\infty}\nabla F(x_n).\qquad\Box$$
From this, we obtain the Newton differentiability of PC¹ functions.

Theorem 14.8. Let $F:\mathbb{R}^N\to\mathbb{R}$ be piecewise differentiable. Then $F$ is Newton differentiable at every $x\in\mathbb{R}^N$, and every $D_NF(x)\in\partial_CF(x)$ is a Newton derivative.

Proof. Let $x\in\mathbb{R}^N$ be arbitrary and $h\in\mathbb{R}^N$ with $x+h\in U_x$. By Theorem 14.7, every $D_NF(x+h)\in\partial_CF(x+h)$ is of the form
$$D_NF(x+h)=\sum_{i\in I_e(x+h)}t_i\nabla F_i(x+h)\quad\text{for}\quad\sum_{i\in I_e(x+h)}t_i=1,\ t_i\ge 0.$$
Since $F$ is continuous, we have for all $h\in\mathbb{R}^N$ sufficiently small that $I_e(x+h)\subset I_a(x+h)\subset I_a(x)$, where the second inclusion follows from the fact that by continuity, $F(x)\neq F_i(x)$ implies that $F(x+h)\neq F_i(x+h)$. Hence, $F(x+h)=F_i(x+h)$ and $F(x)=F_i(x)$ for all $i\in I_e(x+h)$. Theorem 14.2 then yields that
$$|F(x+h)-F(x)-D_NF(x+h)h|\le\sum_{i\in I_e(x+h)}t_i|F_i(x+h)-F_i(x)-\nabla F_i(x+h)h|=o(\|h\|),$$
since all $F_i$ are continuously differentiable by assumption. □

A natural application of the above are the proximal mappings of convex functionals.
Example 14.9.
(i) We first consider the proximal mapping for the indicator function $\delta_A:\mathbb{R}^N\to\overline{\mathbb{R}}$ of the set $A\coloneqq\{x\in\mathbb{R}^N\mid x_i\in[a,b]\}$ for some $a<b\in\mathbb{R}$. Analogously to Example 6.23 (iii), the corresponding proximal mapping is the componentwise projection
$$[\operatorname{proj}_A(x)]_i=\operatorname{proj}_{[a,b]}x_i=\begin{cases}a&\text{if }x_i<a,\\x_i&\text{if }x_i\in[a,b],\\b&\text{if }x_i>b,\end{cases}$$
which is clearly piecewise differentiable. Theorem 14.7 thus yields (also componentwise) that
$$\partial_C[\operatorname{proj}_A(x)]_i=\begin{cases}\{1\}&\text{if }x_i\in(a,b),\\\{0\}&\text{if }x_i\notin[a,b],\\[0,1]&\text{if }x_i\in\{a,b\}.\end{cases}$$
By Theorems 14.4 and 14.8, a possible Newton derivative is therefore given by
$$[D_N\operatorname{proj}_A(x)h]_i=[\mathbb{1}_{[a,b]}(x)\odot h]_i\coloneqq\begin{cases}h_i&\text{if }x_i\in[a,b],\\0&\text{if }x_i\notin[a,b],\end{cases}$$
where the choice of which case to include $x_i\in\{a,b\}$ in is arbitrary. (The componentwise product $[x\odot y]_i\coloneqq x_iy_i$ on $\mathbb{R}^N$ is also known as the Hadamard product.)
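A minimal NumPy sketch of (i), assuming the concrete box $[a,b]=[-1,1]$: the projection is componentwise clipping, and the Newton derivative acts on a direction $h$ as a componentwise mask. Here the boundary case $x_i\in\{a,b\}$ is included in the first branch, one of the admissible choices.

```python
import numpy as np

a, b = -1.0, 1.0   # assumed box bounds

def proj_A(x):
    # componentwise projection onto [a, b]
    return np.clip(x, a, b)

def DN_proj_A(x, h):
    # Newton derivative applied to h: Hadamard product with the indicator of [a, b]
    return np.where((a <= x) & (x <= b), h, 0.0)

x = np.array([-2.0, 0.5, 3.0])
h = np.ones(3)
# proj_A(x) = [-1, 0.5, 1] and DN_proj_A(x, h) = [0, 1, 0]
```

Only the inactive (interior) component passes the direction through; on the active components the derivative vanishes, which is exactly what the primal-dual active set interpretation of the semismooth Newton method exploits.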
(ii) Consider now $G:\mathbb{R}^N\to\mathbb{R}$, $G(x)\coloneqq\|x\|_1$, whose proximal mapping for arbitrary $\gamma>0$ is given by Example 6.23 (ii) componentwise as
$$[\operatorname{prox}_{\gamma G}(x)]_i=\begin{cases}x_i-\gamma&\text{if }x_i>\gamma,\\0&\text{if }x_i\in[-\gamma,\gamma],\\x_i+\gamma&\text{if }x_i<-\gamma.\end{cases}$$
Again, this is clearly piecewise differentiable, and Theorem 14.7 thus yields (also componentwise) that
$$\partial_C[\operatorname{prox}_{\gamma G}(x)]_i=\begin{cases}\{1\}&\text{if }|x_i|>\gamma,\\\{0\}&\text{if }|x_i|<\gamma,\\[0,1]&\text{if }|x_i|=\gamma.\end{cases}$$
By Theorems 14.4 and 14.8, a possible Newton derivative is therefore given by
$$[D_N\operatorname{prox}_{\gamma G}(x)h]_i=[\mathbb{1}_{\{|t|\ge\gamma\}}(x)\odot h]_i\coloneqq\begin{cases}h_i&\text{if }|x_i|\ge\gamma,\\0&\text{if }|x_i|<\gamma,\end{cases}$$
where again we could have taken the value $th_i$ for any $t\in[0,1]$ for $|x_i|=\gamma$.
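The Newton derivative from (ii) is what drives semismooth Newton methods for $\ell^1$-regularized least squares. The following sketch, with assumed random data $A,b$ and hypothetical helper names, solves the forward-backward fixed-point equation $u=\operatorname{prox}_{\tau\gamma\|\cdot\|_1}(u-\tau A^T(Au-b))$, whose roots are the minimizers of $\tfrac12\|Au-b\|^2+\gamma\|u\|_1$. Global convergence of this iteration is not guaranteed in general, but for well-conditioned data it typically terminates in a few steps.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
gam = 0.5                                   # regularization weight (assumed)
tau = 1.0 / np.linalg.norm(A.T @ A, 2)      # forward step size

def prox_l1(x, t):
    # soft-thresholding: prox of t*||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def DN_prox_l1(x, t):
    # Newton derivative from (ii): diagonal indicator of {|x_i| >= t}
    return np.diag((np.abs(x) >= t).astype(float))

def F(u):
    # forward-backward fixed-point residual
    return u - prox_l1(u - tau * A.T @ (A @ u - b), tau * gam)

def DN_F(u):
    w = u - tau * A.T @ (A @ u - b)
    return np.eye(5) - DN_prox_l1(w, tau * gam) @ (np.eye(5) - tau * A.T @ A)

u = np.linalg.lstsq(A, b, rcond=None)[0]    # warm start at the least-squares solution
for _ in range(50):
    if np.linalg.norm(F(u)) < 1e-12:
        break
    u = u - np.linalg.solve(DN_F(u), F(u))
```

Since $F$ is piecewise affine, each Newton step solves one affine piece exactly; once the correct active set $\{|w_i|\ge\tau\gamma\}$ is identified, the method terminates.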
superposition operators on $L^p(\Omega)$

Rademacher's Theorem does not hold in infinite-dimensional function spaces, and hence the Clarke subdifferential no longer yields an algorithmically useful candidate for a Newton derivative in general. One exception is the class of superposition operators defined by
scalar Newton differentiable functions, for which the Newton derivative can be evaluated pointwise as well.

We thus consider as in Section 2.3 for an open and bounded domain $\Omega\subset\mathbb{R}^N$, a Carathéodory function $f:\Omega\times\mathbb{R}\to\mathbb{R}$ (i.e., $(x,z)\mapsto f(x,z)$ is measurable in $x$ and continuous in $z$), and $1\le p,q\le\infty$ the corresponding superposition operator
$$F:L^p(\Omega)\to L^q(\Omega),\qquad[F(u)](x)=f(x,u(x))\quad\text{for almost every }x\in\Omega.$$
The goal is now to similarly obtain a Newton derivative $D_NF$ for $F$ as a superposition operator defined by the Newton derivative $D_Nf(x,z)$ for $z\mapsto f(x,z)$. Here, the assumption that $D_Nf$ is also a Carathéodory function is too restrictive, since we want to allow discontinuous derivatives as well (see Example 14.9). Luckily, for our purpose, a weaker property is sufficient: A function is called a Baire–Carathéodory function if it can be written as a pointwise limit of Carathéodory functions.

Under certain growth conditions on $f$ and $D_Nf$,¹ we can transfer the Newton differentiability of $f$ to $F$, but we again have to take a two-norm discrepancy into account.

Theorem 14.10. Let $f:\Omega\times\mathbb{R}\to\mathbb{R}$ be a Carathéodory function. Furthermore, assume that

(i) $z\mapsto f(x,z)$ is uniformly Lipschitz continuous for almost every $x\in\Omega$ and $f(x,0)$ is bounded;

(ii) $z\mapsto f(x,z)$ is Newton differentiable with Newton derivative $z\mapsto D_Nf(x,z)$ for almost every $x\in\Omega$;

(iii) $D_Nf$ is a Baire–Carathéodory function and uniformly bounded.

Then for any $1\le q<p<\infty$, the corresponding superposition operator $F:L^p(\Omega)\to L^q(\Omega)$ is Newton differentiable with Newton derivative
$$D_NF:L^p(\Omega)\to\mathbb{L}(L^p(\Omega);L^q(\Omega)),\qquad[D_NF(u)h](x)=D_Nf(x,u(x))h(x)$$
for almost every $x\in\Omega$ and all $h\in L^p(\Omega)$.

Proof. First, the uniform Lipschitz continuity together with the reverse triangle inequality yields that
$$|f(x,z)|\le|f(x,0)|+L|z|\le C+L|z|^{p/q}\quad\text{for almost every }x\in\Omega,\ z\in\mathbb{R},$$
and hence the growth condition (2.5) for all $1\le q\le\infty$. Due to the continuous embedding $L^p(\Omega)\hookrightarrow L^q(\Omega)$ for all $1\le q\le p\le\infty$, the superposition operator $F:L^p(\Omega)\to L^q(\Omega)$ is therefore well-defined and continuous by Theorem 2.12.

¹which can be significantly relaxed; see [Schiela 2008, Proposition A.1]
For any measurable $u:\Omega\to\mathbb{R}$, we have that $x\mapsto D_Nf(x,u(x))$ is by definition the pointwise limit of measurable functions and hence itself measurable. Furthermore, its uniform boundedness in particular implies the growth condition (2.5) for $p'\coloneqq p$ and $q'\coloneqq p-q>0$. As in the proof of Theorem 2.13, we deduce that the corresponding superposition operator $D_NF:L^p(\Omega)\to L^s(\Omega)$ is well-defined and continuous for $s\coloneqq\frac{pq}{p-q}$, and that for any $u\in L^p(\Omega)$, the mapping $h\mapsto D_NF(u)h$ defines a bounded linear operator $D_NF(u):L^p(\Omega)\to L^q(\Omega)$. (This time, we do not distinguish in notation between the linear operator and the function that defines this operator by pointwise multiplication.)

To show that $D_NF(u)$ is a Newton derivative for $F$ in $u\in L^p(\Omega)$, we consider the pointwise residual
$$r:\Omega\times\mathbb{R}\to\mathbb{R},\qquad r(x,z)\coloneqq\begin{cases}\dfrac{|f(x,z)-f(x,u(x))-D_Nf(x,z)(z-u(x))|}{|z-u(x)|}&\text{if }z\neq u(x),\\[1ex]0&\text{if }z=u(x).\end{cases}$$
Since $f$ is a Carathéodory function and $D_Nf$ is a Baire–Carathéodory function, the function $x\mapsto r(x,\tilde u(x))=:[R(\tilde u)](x)$ is measurable for any measurable $\tilde u:\Omega\to\mathbb{R}$ (since sums, products, and quotients of measurable functions are again measurable). Furthermore, for $\tilde u\in L^p(\Omega)$, the uniform Lipschitz continuity of $f$ and the uniform boundedness of $D_Nf$ imply that for almost every $x\in\Omega$ with $\tilde u(x)\neq u(x)$,
$$(14.5)\qquad|[R(\tilde u)](x)|=\frac{|f(x,\tilde u(x))-f(x,u(x))-D_Nf(x,\tilde u(x))(\tilde u(x)-u(x))|}{|\tilde u(x)-u(x)|}\le L+C$$
and thus that $R(\tilde u)\in L^\infty(\Omega)$. Hence, the superposition operator $R:L^p(\Omega)\to L^s(\Omega)$ is well-defined.

Let now $\{u_n\}_{n\in\mathbb{N}}\subset L^p(\Omega)$ be a sequence with $u_n\to u\in L^p(\Omega)$. Then there exists a subsequence, again denoted by $\{u_n\}_{n\in\mathbb{N}}$, with $u_n(x)\to u(x)$ for almost every $x\in\Omega$. Since $z\mapsto f(x,z)$ is Newton differentiable almost everywhere, we have by definition that $r(x,u_n(x))\to 0$ for almost every $x\in\Omega$. Together with the boundedness from (14.5), Lebesgue's dominated convergence theorem therefore yields that $R(u_n)\to 0$ in $L^s(\Omega)$ (and hence along the full sequence since the limit is unique).² For any $\tilde u\in L^p(\Omega)$, the Hölder inequality with $\frac1s+\frac1p=\frac1q$ thus yields that
$$\|F(\tilde u)-F(u)-D_NF(\tilde u)(\tilde u-u)\|_{L^q}=\|R(\tilde u)(\tilde u-u)\|_{L^q}\le\|R(\tilde u)\|_{L^s}\|\tilde u-u\|_{L^p}.$$
If we now set $\tilde u\coloneqq u+h$ for $h\in L^p(\Omega)$ with $\|h\|_{L^p}\to 0$, we have that $\|R(u+h)\|_{L^s}\to 0$ and hence by definition the Newton differentiability of $F$ in $u$ with Newton derivative $h\mapsto D_NF(u)h$ as claimed. □

²This is the step that fails for $F:L^\infty(\Omega)\to L^\infty(\Omega)$, since pointwise convergence and boundedness together do not imply uniform convergence almost everywhere.
Example 14.11.

(i) Consider
$$A\coloneqq\{u\in L^2(\Omega)\mid a\le u(x)\le b\text{ for almost every }x\in\Omega\}$$
and $\operatorname{proj}_A:L^p(\Omega)\to L^2(\Omega)$ for $p>2$. Applying Theorem 14.10 to Example 14.9 (i) then yields the pointwise almost everywhere Newton derivative
$$[D_N\operatorname{proj}_A(u)h](x)=[\mathbb{1}_{[a,b]}(u)h](x)\coloneqq\begin{cases}h(x)&\text{if }u(x)\in[a,b],\\0&\text{if }u(x)\notin[a,b].\end{cases}$$

(ii) Consider now
$$G:L^2(\Omega)\to\mathbb{R},\qquad G(u)=\|u\|_{L^1}=\int_\Omega|u(x)|\,dx,$$
and $\operatorname{prox}_{\gamma G}:L^p(\Omega)\to L^2(\Omega)$ for $p>2$ and $\gamma>0$. Applying Theorem 14.10 to Example 14.9 (ii) then yields the pointwise almost everywhere Newton derivative
$$[D_N\operatorname{prox}_{\gamma G}(u)h](x)=[\mathbb{1}_{\{|t|\ge\gamma\}}(u)h](x)\coloneqq\begin{cases}h(x)&\text{if }|u(x)|\ge\gamma,\\0&\text{if }|u(x)|<\gamma.\end{cases}$$

For $q=p\in[1,\infty]$, however, the claim is false in general, as can be shown by counterexamples.
Example 14.12. We take
$$f:\mathbb{R}\to\mathbb{R},\qquad f(z)=\max\{0,z\}=\begin{cases}0&\text{if }z\le 0,\\z&\text{if }z\ge 0.\end{cases}$$
This is a piecewise differentiable function, and hence by Theorem 14.8 we can for any $\delta\in[0,1]$ take as Newton derivative
$$D_Nf(z)h=\begin{cases}0&\text{if }z<0,\\\delta h&\text{if }z=0,\\h&\text{if }z>0.\end{cases}$$
We now consider the corresponding superposition operators $F:L^p(\Omega)\to L^p(\Omega)$ and $D_NF(u)\in\mathbb{L}(L^p(\Omega);L^p(\Omega))$ for any $p\in[1,\infty)$ and show that the approximation condition (14.2) is violated for $\Omega=(-1,1)$, $u(x)=-|x|$, and
$$h_n(x)=\begin{cases}\frac1n&\text{if }|x|\le\frac1n,\\0&\text{if }|x|>\frac1n.\end{cases}$$
First, it is straightforward to compute $\|h_n\|_{L^p(\Omega)}^p=\frac{2}{n^{p+1}}$. Then since $[F(u)](x)=\max\{0,-|x|\}=0$ almost everywhere, we have that
$$[F(u+h_n)-F(u)-D_NF(u+h_n)h_n](x)=\begin{cases}-|x|&\text{if }|x|<\frac1n,\\0&\text{if }|x|>\frac1n,\\-\frac\delta n&\text{if }|x|=\frac1n,\end{cases}$$
and thus
$$\|F(u+h_n)-F(u)-D_NF(u+h_n)h_n\|_{L^p(\Omega)}^p=\int_{-\frac1n}^{\frac1n}|x|^p\,dx=\frac{2}{p+1}\Big(\frac1n\Big)^{p+1}.$$
This implies that
$$\lim_{n\to\infty}\frac{\|F(u+h_n)-F(u)-D_NF(u+h_n)h_n\|_{L^p(\Omega)}}{\|h_n\|_{L^p(\Omega)}}=\Big(\frac{1}{p+1}\Big)^{\frac1p}\neq 0$$
and hence that $F$ is not Newton differentiable from $L^p(\Omega)$ to $L^p(\Omega)$ for any $p<\infty$.
For the case $q=p=\infty$, we take $\Omega=(0,1)$, $u(x)=x$, and
$$h_n(x)=\begin{cases}nx-1&\text{if }x\le\frac1n,\\0&\text{if }x\ge\frac1n,\end{cases}$$
such that $\|h_n\|_{L^\infty(\Omega)}=1$ for all $n\in\mathbb{N}$. We also have that $x+h_n(x)=(1+n)x-1\le 0$ for $x\le\frac{1}{n+1}\le\frac1n$ and hence that
$$[F(u+h_n)-F(u)-D_NF(u+h_n)h_n](x)=\begin{cases}(1+n)x-1&\text{if }x\le\frac{1}{n+1},\\0&\text{if }x\ge\frac{1}{n+1},\end{cases}$$
since either $h_n=0$ or $F(u+h_n)=F(u)+D_NF(u+h_n)h_n$ in the second case. Now,
$$\sup_{x\in(0,\frac{1}{n+1}]}|(1+n)x-1|=1\quad\text{for all }n\in\mathbb{N},$$
which implies that
$$\lim_{n\to\infty}\frac{\|F(u+h_n)-F(u)-D_NF(u+h_n)h_n\|_{L^\infty(\Omega)}}{\|h_n\|_{L^\infty(\Omega)}}=1\neq 0$$
and hence that $F$ is not Newton differentiable from $L^\infty(\Omega)$ to $L^\infty(\Omega)$ either.
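The first counterexample can be checked by numerical quadrature (an illustrative sketch; the grid and the choice $p=2$ are assumptions): the normalized residual stays near $(1/(p+1))^{1/p}=3^{-1/2}\approx 0.577$ instead of vanishing as $n\to\infty$.

```python
import numpy as np

p = 2
x, dx = np.linspace(-1, 1, 200_001, retstep=True)
u = -np.abs(x)

ratios = []
for n in [10, 100, 1000]:
    h = np.where(np.abs(x) < 1.0 / n, 1.0 / n, 0.0)
    # F(v) = max(0, v) pointwise; D_N F(u+h) h = h on the set {u + h > 0}
    resid = np.maximum(0.0, u + h) - np.maximum(0.0, u) - np.where(u + h > 0, h, 0.0)
    num = (np.sum(np.abs(resid) ** p) * dx) ** (1 / p)
    den = (np.sum(np.abs(h) ** p) * dx) ** (1 / p)
    ratios.append(num / den)
# each ratio is close to (1/(p+1))**(1/p) ≈ 0.577, not to 0
```

The residual equals $-|x|$ on $\{|x|<1/n\}$, so both numerator and denominator scale like $n^{-(p+1)/p}$ and their ratio stays bounded away from zero, exactly as computed above.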
Remark 14.13. Semismoothness was introduced in [Mifflin 1977] for Lipschitz continuous functionals $F:\mathbb{R}^N\to\mathbb{R}$ as a condition relating Clarke subderivatives and directional derivatives near a point. This definition was extended to functions $F:\mathbb{R}^N\to\mathbb{R}^M$ in [Qi 1993; Qi & Sun 1993] and shown to imply a uniform version of the approximation condition (14.2) for all elements of the Clarke subdifferential and hence superlinear convergence of the semismooth Newton method in finite dimensions. A semismooth Newton method specifically for PC¹ functions was already considered in [Kojima & Shindo 1986]. In Banach spaces, [Kummer 1988] was the first to study an abstract class of Newton methods for nonsmooth equations based on the condition (14.2), unifying the previous results; see [Klatte & Kummer 2002]. In all these works, the analysis was based on semismoothness as a property relating $F:X\to Y$ to a set-valued mapping $G:X\rightrightarrows\mathbb{L}(X;Y)$, whose elements (uniformly) satisfy (14.2). In contrast, [Kummer 2000; Chen, Nashed & Qi 2000] considered, as we do in this book, single-valued Newton derivatives (named Newton maps in the former and slanting functions in the latter) in Banach spaces. This approach was later followed in [Hintermüller, Ito & Kunisch 2002; Ito & Kunisch 2008] to show that for a specific choice of Newton derivative, the classical primal-dual active set method for solving quadratic optimization problems under linear inequality constraints can be interpreted as a semismooth Newton method. In parallel, [Ulbrich 2002; Ulbrich 2011] showed that superposition operators defined by semismooth functions (in the sense of [Qi & Sun 1993]) are semismooth (in the sense of [Kummer 1988]) between the right spaces. A similar result for single-valued Newton derivatives was shown in [Schiela 2008] using a proof that is much closer to the one for the classical differentiability of superposition operators; compare Theorems 2.13 and 14.10. It should, however, be mentioned that not all calculus results for semismooth functions are available in the single-valued setting; for example, the implicit function theorem from [Kruse 2018] requires set-valued Newton derivatives, since the selection of the Newton derivative of the implicit function need not correspond to the selection of the given mapping. Finally, we remark that the notion of semismoothness and semismooth Newton methods were very recently extended to set-valued mappings in [Gfrerer & Outrata 2019].
15 NONLINEAR PRIMAL-DUAL PROXIMAL SPLITTING
In this chapter, our goal is to extend the primal-dual proximal splitting (PDPS) method to nonlinear operators $K\in C^1(X;Y)$, i.e., to problems of the form
$$(15.1)\qquad\min_{x\in X}F(x)+G(K(x)),$$
where we still assume $F:X\to\overline{\mathbb{R}}$ and $G:Y\to\overline{\mathbb{R}}$ to be convex, proper, and lower semicontinuous on the Hilbert spaces $X$ and $Y$. For simplicity, we will only consider linear convergence under a strong convexity assumption and refer to the literature for weak convergence and acceleration under partial strong convexity (see Remark 15.12 below). As in earlier chapters, we use the same notation for the inner product as for the duality pairing in Hilbert spaces to distinguish them better from pairs of elements.

We recall the three-point program for convergence proofs of first-order methods from Chapter 9, which remains fundamentally the same in the nonlinear setting. However, we need to make some of the concepts local. Thus the three main ingredients of our convergence proofs will be the following.

(i) The three-point identity (1.5).

(ii) The local monotonicity of the operator $H$ whose roots correspond to the (primal-dual) critical points of (15.1). We fix one of the points in the definition of monotonicity in Section 6.2 to a root $\widehat u$ of $H$, and only vary the other point in a neighborhood of $\widehat u$. This is essentially a nonsmooth variant of the standard second-order sufficient (or local quadratic growth) condition $\nabla^2F(\bar x)\succ 0$ (i.e., positive definiteness of the Hessian) for minimizing a smooth function $F:\mathbb{R}^N\to\mathbb{R}$.

(iii) The nonnegativity of the preconditioning operators $M_{i+1}$ defining the implicit form of the algorithm. These will now in general depend on the current iterate, and thus we can only show the nonnegativity in a neighborhood of suitable $\widehat u$.
15.1 nonconvex explicit splitting
To motivate our more specific assumptions on $K$, we start by showing that forward-backward splitting can be applied to a nonconvex function for the forward step. We
thus consider for the problem
$$(15.2)\qquad\min_{x\in X}G(x)+F(x),$$
with $F$ smooth but possibly nonconvex, the algorithm
$$(15.3)\qquad x^{i+1}=\operatorname{prox}_{\tau G}(x^i-\tau\nabla F(x^i)).$$
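A minimal sketch of (15.3) for a concrete nonconvex choice (assumed for illustration): $F(x)=x^4/4-x^2/2$ and $G$ the indicator function of $[0.5,2]$, so that $\operatorname{prox}_{\tau G}$ is the projection onto that interval.

```python
# forward-backward splitting (15.3) with a nonconvex smooth F (hypothetical example):
#   F(x) = x**4/4 - x**2/2  with gradient x**3 - x,  G = indicator of [0.5, 2]
grad_F = lambda x: x**3 - x
prox_G = lambda x: min(max(x, 0.5), 2.0)   # prox of the indicator = projection

tau = 0.1   # 0 < tau < 2/L for a Lipschitz constant L of grad_F on the region
x = 1.8
for _ in range(200):
    x = prox_G(x - tau * grad_F(x))
# x converges to the critical point 1.0 (the local minimizer of F on [0.5, 2])
```

The iterates decrease monotonically toward $x=1$; near the minimizer the fixed-point map contracts with factor $1-\tau(3x^2-1)\approx 0.8$, which is the linear rate the theorem below establishes in the strongly (locally) convex case.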
To show convergence of this algorithm, we extend the non-value three-point smoothness inequalities of Corollaries 7.2 and 7.7 from convex smooth functions to $C^2$ functions. (It is also possible to obtain corresponding value inequalities.)

Lemma 15.1. Suppose $F\in C^2(X)$. Let $z,\bar x\in X$, and suppose for some $L>0$ and $\gamma\ge 0$ for all $\zeta\in\mathbb{B}(\bar x,\|z-\bar x\|_X)$ that $\gamma\cdot\operatorname{Id}\le\nabla^2F(\zeta)\le L\cdot\operatorname{Id}$. Then for any $\beta\in(0,2]$ and $x\in X$ we have
$$(15.4)\qquad\langle\nabla F(z)-\nabla F(\bar x),x-\bar x\rangle_X\ge\frac{\gamma(2-\beta)}{2}\|x-\bar x\|_X^2-\frac{L}{2\beta}\|x-z\|_X^2.$$
Proof. By the one-dimensional mean value theorem applied to $t\mapsto\langle\nabla F(\bar x+t(z-\bar x)),x-\bar x\rangle_X$, we obtain for $\zeta=\bar x+s(z-\bar x)$ for some $s\in[0,1]$ that
$$\langle\nabla F(z)-\nabla F(\bar x),x-\bar x\rangle_X=\langle\nabla^2F(\zeta)(z-\bar x),x-\bar x\rangle_X.$$
Therefore, for any $\beta>0$,
$$(15.5)\qquad\langle\nabla F(z)-\nabla F(\bar x),x-\bar x\rangle_X=\|x-\bar x\|_{\nabla^2F(\zeta)}^2+\langle\nabla^2F(\zeta)(z-x),x-\bar x\rangle_X\ge\frac{2-\beta}{2}\|x-\bar x\|_{\nabla^2F(\zeta)}^2-\frac{1}{2\beta}\|x-z\|_{\nabla^2F(\zeta)}^2.$$
By the definition of $\gamma$ and $L$, we obtain (15.4). □
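The estimate (15.4) can be spot-checked numerically in one dimension (an illustrative sketch under assumed bounds): for $F(x)=x^4/4$ on $[1,2]$ we have $\gamma\cdot\operatorname{Id}\le F''(\zeta)=3\zeta^2\le L\cdot\operatorname{Id}$ with $\gamma=3$ and $L=12$, so the inequality must hold for all sampled points and all $\beta\in(0,2]$.

```python
import numpy as np

# spot check of the three-point estimate (15.4) for F(x) = x**4/4 on [1, 2],
# where gamma = 3 and L = 12 bound F''(zeta) = 3*zeta**2 (hypothetical example)
dF = lambda x: x**3
gamma, L = 3.0, 12.0

rng = np.random.default_rng(0)
xbar, z, x = rng.uniform(1.0, 2.0, size=(3, 100_000))
beta = rng.uniform(1e-3, 2.0, size=100_000)
lhs = (dF(z) - dF(xbar)) * (x - xbar)
rhs = gamma * (2 - beta) / 2 * (x - xbar) ** 2 - L / (2 * beta) * (x - z) ** 2
violations = int(np.count_nonzero(lhs < rhs - 1e-12))
# violations == 0: (15.4) holds at every sampled point
```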
The following result is almost a carbon copy of Theorems 9.6 and 10.2 for convex smooth $F$. However, since our present problem is nonconvex, we can only expect to locally find a critical point of $J\coloneqq F+G$.

Theorem 15.2. Let $F\in C^2(X)$ and let $G:X\to\overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous. Given an initial iterate $x^0$ and a critical point $\bar x\in[\partial G+\nabla F]^{-1}(0)$ of $J\coloneqq F+G$, let $\mathbb{X}\coloneqq\mathbb{B}(\bar x,\|x^0-\bar x\|)$, and suppose for some $L>0$ and $\gamma\ge 0$ that
$$(15.6)\qquad\gamma\cdot\operatorname{Id}\le\nabla^2F(\zeta)\le L\cdot\operatorname{Id}\qquad(\zeta\in\mathbb{X}).$$
Take $0<\tau<2L^{-1}$.

(i) If $\gamma>0$, the sequence $\{x^i\}_{i\in\mathbb{N}}$ generated by (15.3) converges linearly to $\bar x$.
(ii) If $\gamma=0$, the sequence $\{x^i\}_{i\in\mathbb{N}}$ converges weakly to a critical point of $J$.

Note that if $G$ is locally finite-valued, then by Theorem 13.20 our definition of a critical point in this theorem means $\bar x\in[\partial_CJ]^{-1}(0)$.

Proof. As usual, we write (15.3) as
$$(15.7)\qquad 0\in\tau[\partial G(x^{i+1})+\nabla F(x^i)]+(x^{i+1}-x^i).$$
Suppose $x^i\in\mathbb{X}$ and let $\beta\in(L\tau,2)$ be arbitrary (which is possible since $\tau L<2$). By the monotonicity of $\partial G$ and the local three-point monotonicity (15.4) of $F$ implied by Lemma 15.1, we obtain
$$\langle\partial G(x^{i+1})+\nabla F(x^i),x^{i+1}-\bar x\rangle_X\ge\frac{\gamma(2-\beta)}{2}\|x^{i+1}-\bar x\|_X^2-\frac{L}{2\beta}\|x^{i+1}-x^i\|_X^2.$$
Observe that if we had $x^{i+1}=x^i$ (or $F=0$), this would show the local quadratic growth of $F+G$ at $\bar x$. Since $x^{i+1}\neq x^i$, we however need to compensate for taking the forward step with respect to $F$.

Testing (15.7) by the application of $\varphi_i\langle\,\cdot\,,x^{i+1}-\bar x\rangle_X$ for some testing parameter $\varphi_i>0$ yields
$$\frac{\varphi_i\gamma\tau(2-\beta)}{2}\|x^{i+1}-\bar x\|_X^2-\frac{\varphi_iL\tau}{2\beta}\|x^{i+1}-x^i\|_X^2+\varphi_i\langle x^{i+1}-x^i,x^{i+1}-\bar x\rangle_X\le 0.$$
Taking
$$(15.8)\qquad\varphi_{i+1}\coloneqq\varphi_i(1+\gamma\tau(2-\beta))\quad\text{with }\varphi_0>0,$$
the three-point formula (9.1) yields
$$(15.9)\qquad\frac{\varphi_{i+1}}{2}\|x^{i+1}-\bar x\|_X^2+\frac{\varphi_i(1-\tau L/\beta)}{2}\|x^{i+1}-x^i\|_X^2\le\frac{\varphi_i}{2}\|x^i-\bar x\|_X^2.$$
Since $\beta\in(L\tau,2)$ and $x^i\in\mathbb{X}$, this implies that $x^{i+1}\in\mathbb{X}$. By induction, we thus obtain that $\{x^i\}_{i\in\mathbb{N}}\subset\mathbb{X}$ under our assumption $x^0\in\mathbb{X}$.

If $\gamma>0$, the recursion (15.8) together with $\beta<2$ shows that $\varphi_i$ grows exponentially. Using that $\tau L/\beta\le 1$ and telescoping (15.9) then shows the claimed linear convergence.

Let us then consider weak convergence. With $\gamma=0$ and $\beta<2$, the recursion (15.8) reduces to $\varphi_i\equiv\varphi_0>0$. Since $\tau L\le\beta$, the estimate (15.9) yields Fejér monotonicity of the iterates $\{x^i\}_{i\in\mathbb{N}}$. Moreover, we establish for $w^{i+1}\coloneqq-\tau^{-1}(x^{i+1}-x^i)$ that $\|w^{i+1}\|_X\to 0$ and $w^{i+1}\in\partial G(x^{i+1})+\nabla F(x^i)$ for all $i\in\mathbb{N}$. Let $\tilde x$ be any weak limit point of $\{x^i\}_{i\in\mathbb{N}}$, i.e., there exists a subsequence $\{x^{i_k}\}_{k\in\mathbb{N}}$ with $x^{i_k}\rightharpoonup\tilde x\in\mathbb{X}$. Then also $x^{i_k+1}\rightharpoonup\tilde x\in\mathbb{X}$. Since $\nabla F$ is by (15.6) Lipschitz continuous in $\mathbb{X}$, we have $\nabla F(x^{i_k+1})-\nabla F(x^{i_k})\to 0$. Consequently, $\partial G(x^{i_k+1})+\nabla F(x^{i_k+1})\ni w^{i_k+1}+\nabla F(x^{i_k+1})-\nabla F(x^{i_k})\to 0$. By the outer semicontinuity of $\partial G+\nabla F$, it follows that $0\in\partial G(\tilde x)+\nabla F(\tilde x)$ and therefore $\tilde x\in(\partial G+\nabla F)^{-1}(0)\subset\mathbb{X}$. The claim thus follows by applying Opial's Lemma 9.1. □
15.2 nl-pdps formulation and assumptions
As mentioned above, we consider the problem (15.1) with $F:X\to\overline{\mathbb{R}}$ and $G:Y\to\overline{\mathbb{R}}$ convex, proper, and lower semicontinuous, and $K\in C^1(X;Y)$. We will soon state more precise assumptions on $K$. When either the null space of $[\nabla K(\widehat x)]^*$ is trivial or $\operatorname{dom}G=Y$, we can apply the chain rule Theorem 13.23 for Clarke subdifferentials as well as the equivalences of Theorems 13.5 and 13.8 for convex and differentiable functions, respectively, to rewrite as in Section 8.4 the critical point conditions for this problem as $0\in H(u)$ for the set-valued operator $H:X\times Y\rightrightarrows X\times Y$ defined for $u=(x,y)\in X\times Y$ as
$$(15.10)\qquad H(u)\coloneqq\begin{pmatrix}\partial F(x)+[\nabla K(x)]^*y\\\partial G^*(y)-K(x)\end{pmatrix}.$$
Throughout the rest of this chapter, we write $\widehat u=(\widehat x,\widehat y)\in H^{-1}(0)$ for an arbitrary root of $H$ that we assume to exist.

In analogy to the basic PDPS method, the basic unaccelerated NL-PDPS method then iterates
$$(15.11)\qquad\begin{cases}x^{i+1}\coloneqq(I+\tau\partial F)^{-1}(x^i-\tau[\nabla K(x^i)]^*y^i),\\\bar x^{i+1}\coloneqq 2x^{i+1}-x^i,\\y^{i+1}\coloneqq(I+\sigma\partial G^*)^{-1}(y^i+\sigma K(\bar x^{i+1})).\end{cases}$$
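A toy scalar instance of (15.11) can make the structure concrete (all problem choices here are assumptions for illustration): $F(x)=\tfrac12(x-z)^2$, $G(y)=\tfrac12(y-b)^2$, and the nonlinear $K(x)=x^2$, for which both resolvents are available in closed form.

```python
# NL-PDPS sketch (15.11) for a toy scalar instance of (15.1) (hypothetical choices):
#   F(x) = (x - z)**2 / 2,  G(y) = (y - b)**2 / 2,  K(x) = x**2,  grad K(x) = 2x
z, b = 1.0, 2.0
K = lambda x: x**2
dK = lambda x: 2.0 * x
prox_F = lambda v, t: (v + t * z) / (1 + t)        # resolvent (I + tau dF)^{-1}
prox_Gstar = lambda w, s: (w - s * b) / (1 + s)    # resolvent (I + sigma dG*)^{-1}

tau = sigma = 0.2   # chosen so that tau*sigma*|dK(x)|**2 < 1 near the solution
x, y = 1.5, 0.0
for _ in range(2000):
    x_new = prox_F(x - tau * dK(x) * y, tau)
    x_bar = 2.0 * x_new - x                        # over-relaxation step
    y = prox_Gstar(y + sigma * K(x_bar), sigma)
    x = x_new

residual = (x - z) + dK(x) * (K(x) - b)   # critical point condition for this instance
```

At convergence $y=K(x)-b$ and $(x-z)+\nabla K(x)^*y=0$, i.e., the first line of $0\in H(u)$ for this instance; here the iterates settle at $x=(1+\sqrt3)/2$.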
We can write this algorithm in the general form of Theorem 11.12 as follows. For some primal and dual testing parameters $\varphi_i,\psi_{i+1}>0$, we define the step length and testing operators
$$W_{i+1}\equiv W\coloneqq\begin{pmatrix}\tau\operatorname{Id}&0\\0&\sigma\operatorname{Id}\end{pmatrix}\quad\text{and}\quad Z_{i+1}\coloneqq\begin{pmatrix}\varphi_i\operatorname{Id}&0\\0&\psi_{i+1}\operatorname{Id}\end{pmatrix}.$$
We also define the linear preconditioner $M_{i+1}$ and the step length weighted partial linearization $H_{i+1}$ of $H$ by
$$(15.12)\qquad M_{i+1}\coloneqq\begin{pmatrix}\operatorname{Id}&-\tau[\nabla K(x^i)]^*\\-\sigma\nabla K(x^i)&\operatorname{Id}\end{pmatrix},\quad\text{and}$$
$$(15.13)\qquad H_{i+1}(u)\coloneqq W\begin{pmatrix}\partial F(x)+[\nabla K(x^i)]^*y\\\partial G^*(y)-K(\bar x^{i+1})-\nabla K(x^i)(x-\bar x^{i+1})\end{pmatrix}.$$
Observe that $H_{i+1}(u)$ simplifies to $WH(u)$ for linear $K$. Then (15.11) becomes
$$(15.14)\qquad 0\in H_{i+1}(u^{i+1})+M_{i+1}(u^{i+1}-u^i).$$
We will need $K$ to be locally Lipschitz differentiable.
Assumption 15.3 (locally Lipschitz $\nabla K$). The operator $K:X\to Y$ is Fréchet differentiable, and for some $L\ge 0$ and a neighborhood $\mathbb{X}_K$ of $\widehat x$,
$$(15.15)\qquad\|\nabla K(x)-\nabla K(z)\|_{\mathbb{L}(X;Y)}\le L\|x-z\|_X\qquad(x,z\in\mathbb{X}_K).$$

We also require a three-point assumption on $K$. This assumption combines a second-order growth condition with a three-point smoothness estimate. Note that the factor $\gamma_K$ can be negative; if it is, it will need to be offset by sufficient strong convexity of $F$.

Assumption 15.4 (three-point condition on $K$). For a neighborhood $\mathbb{X}_K$ of $\widehat x$, and some $\gamma_K\in\mathbb{R}$ and $\theta,\lambda\ge 0$, we require
$$(15.16)\qquad\langle[\nabla K(z)-\nabla K(\widehat x)]^*\widehat y,x-\widehat x\rangle_X\ge\gamma_K\|x-\widehat x\|_X^2+\theta\|K(x)-K(\widehat x)-\nabla K(\widehat x)(x-\widehat x)\|_Y-\frac{\lambda}{2}\|x-z\|_X^2\qquad(x,z\in\mathbb{X}_K).$$

For linear $K$, Assumption 15.4 trivially holds for any $\gamma_K\le 0$, $\theta\ge 0$, and $\lambda=0$. Furthermore, if we take $G^*=\delta_{\{1\}}$ (so that $K:X\to\mathbb{R}$), the problem (15.1) reduces to (15.2) with $K$ in place of $F$. Minding that in this case $\widehat y=1$, Lemma 15.1 with $\beta=1$ proves Assumption 15.4 for $\lambda=L$, any $\theta\ge 0$, and $\gamma_K\le\gamma$ with $\gamma,L\ge 0$ satisfying $\gamma\cdot\operatorname{Id}\le\nabla^2K(\zeta)\le L\cdot\operatorname{Id}$ for all $\zeta\in\mathbb{X}_K$. In more general settings, the verification of Assumption 15.4 can demand some effort. We refer to [Clason, Mazurenko & Valkonen 2019] for further examples.
15.3 convergence proof
For simplicity of treatment, and to demonstrate the main ideas without excessive technicalities, we only show linear convergence under strong convexity of both $F$ and $G^*$.

We will base our proof on Theorem 11.12 and thus have to verify its assumptions. Most of the work is in verifying the inequality (11.28), which we do in several steps. First, we ensure that the operator $Z_{k+1}M_{k+1}$ giving rise to the local metric is self-adjoint. Then we show that $Z_{k+2}M_{k+2}$ and the update $Z_{k+1}(M_{k+1} + \Delta_{k+1})$ actually performed by the algorithm yield identical norms, where $\Delta_{k+1}$ represents some off-diagonal components from the algorithm as well as any strong convexity provided by $F$ and $G^*$. Finally, we estimate the local monotonicity of $H_{k+1}$.

We write $\gamma_F, \gamma_{G^*} \ge 0$ for the factors of (strong) convexity of $F$ and $G^*$, and recall the factor $\gamma_K \in \mathbb{R}$ from Assumption 15.4. Then for some "acceleration parameters" $\tilde\gamma_F, \tilde\gamma_{G^*} \ge 0$ and $\kappa \in [0, 1)$, we require that

(15.17a) $\gamma_F + \gamma_K \ge \tilde\gamma_F \ge 0, \qquad \gamma_{G^*} \ge \tilde\gamma_{G^*} \ge 0,$

(15.17b) $\eta_k := \tau_k\varphi_k = \sigma_{k+1}\psi_{k+1}, \qquad \tau_k\sigma_{k+1}\|\nabla K(x^k)\|^2 \le 1 - \kappa,$

(15.17c) $\varphi_{k+1} = \varphi_k(1 + 2\tilde\gamma_F\tau_k) \quad\text{and}\quad \psi_{k+1} = \psi_k(1 + 2\tilde\gamma_{G^*}\sigma_k) \qquad (k \in \mathbb{N}).$
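The coupling between the step lengths and the testing parameters in (15.17) can be checked numerically. The following sketch is illustrative only: the values for $\tilde\gamma_F$, $\tilde\gamma_{G^*}$, and $\tau$ are assumptions, and we use the constant-step choice $\sigma = \tau\tilde\gamma_F/\tilde\gamma_{G^*}$ from Theorem 15.9 below, under which both updates in (15.17c) have the same growth factor.

```python
# Sanity check of the parameter coupling (15.17) with constant step lengths.
# All numeric values are illustrative assumptions, not taken from the text.
gamma_F, gamma_G = 0.5, 2.0      # acceleration parameters (assumed)
tau = 0.1                        # primal step length (assumed)
sigma = tau * gamma_F / gamma_G  # dual step length coupling

phi = 1.0          # phi_0
psi = tau / sigma  # psi_1, so that tau*phi_0 = sigma*psi_1

for k in range(50):
    # eta_k := tau_k*phi_k = sigma_{k+1}*psi_{k+1} must hold at every step
    assert abs(tau * phi - sigma * psi) <= 1e-9 * tau * phi
    phi *= 1 + 2 * gamma_F * tau    # phi_{k+1} = phi_k(1 + 2*gamma_F*tau)
    psi *= 1 + 2 * gamma_G * sigma  # psi update with the same growth factor

# both testing parameters grow exponentially; here phi_50 = 1.1**50 > 100
assert phi > 100
```

As the assertions confirm, the choice of $\sigma$ keeps the primal and dual products equal at every iteration while the testing parameters grow geometrically, which is exactly what yields the linear rate below.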
15 nonlinear primal-dual proximal splitting
The next lemma adapts Lemma 9.11.
Lemma 15.5. Fix $k \in \mathbb{N}$ and suppose (15.17) holds. Then $Z_{k+1}M_{k+1}$ is self-adjoint and satisfies
$$Z_{k+1}M_{k+1} \ge \begin{pmatrix} \delta\varphi_k \cdot \mathrm{Id} & 0 \\ 0 & (\kappa - \delta)(1 - \delta)^{-1}\psi_{k+1} \cdot \mathrm{Id} \end{pmatrix}$$
for any $\delta \in [0, \kappa]$.

Proof. From (15.17) we have $\tau_k\varphi_k = \sigma_{k+1}\psi_{k+1} = \eta_k$. By (15.12) then

(15.18) $Z_{k+1}M_{k+1} = \begin{pmatrix} \varphi_k \cdot \mathrm{Id} & -\eta_k[\nabla K(x^k)]^* \\ -\eta_k\nabla K(x^k) & \psi_{k+1} \cdot \mathrm{Id} \end{pmatrix}$.

This shows that $Z_{k+1}M_{k+1}$ is self-adjoint. Young's inequality furthermore implies that

(15.19) $Z_{k+1}M_{k+1} \ge \begin{pmatrix} \delta\varphi_k\,\mathrm{Id} & 0 \\ 0 & \psi_{k+1}\bigl(\mathrm{Id} - \frac{\tau_k\sigma_{k+1}}{1-\delta}\nabla K(x^k)[\nabla K(x^k)]^*\bigr) \end{pmatrix}$.

The claimed estimate then follows from the assumptions (15.17). □
Our next step is to simplify $Z_{k+1}M_{k+1} - Z_{k+2}M_{k+2}$ in (11.28).

Lemma 15.6. Fix $k \in \mathbb{N}$, and suppose (15.17) holds. Then $\frac12\|\cdot\|^2_{Z_{k+1}(M_{k+1}+\Delta_{k+1}) - Z_{k+2}M_{k+2}} = 0$ for

(15.20) $\Delta_{k+1} := \begin{pmatrix} 2\tilde\gamma_F\tau_k\,\mathrm{Id} & 2\tau_k[\nabla K(x^k)]^* \\ -2\sigma_{k+1}\nabla K(x^{k+1}) & 2\tilde\gamma_{G^*}\sigma_{k+1}\,\mathrm{Id} \end{pmatrix}$.

Proof. Using (15.17) and (15.18), we can write
$$Z_{k+1}(M_{k+1} + \Delta_{k+1}) - Z_{k+2}M_{k+2} = D_{k+1}$$
for the skew-symmetric operator
$$D_{k+1} := \begin{pmatrix} 0 & [\eta_{k+1}\nabla K(x^{k+1}) + \eta_k\nabla K(x^k)]^* \\ -[\eta_{k+1}\nabla K(x^{k+1}) + \eta_k\nabla K(x^k)] & 0 \end{pmatrix}.$$
This yields the claim. □
For our main results, we need to assume that the dual variables stay bounded within the "nonlinear range" of $K$. To this end, we introduce the (possibly trivial) subspace $Y_{NL}$ of $Y$, orthogonal to which $K$ acts linearly, i.e.,
$$Y_L := \{y \in Y \mid \text{the mapping } x \mapsto \langle y, K(x)\rangle_Y \text{ is linear}\} \quad\text{and}\quad Y_{NL} := Y_L^\perp.$$
We then denote by $P_{NL}$ the orthogonal projection onto $Y_{NL}$ and write $\|y\|_{P_{NL}} := \|P_{NL}y\|_Y$ for the corresponding seminorm. We also write
$$\mathbb{B}_{NL}(\bar y, r) := \{y \in Y \mid \|y - \bar y\|_{P_{NL}} \le r\}$$
for the closed cylinder in $Y$ of radius $r$ with axis orthogonal to $Y_{NL}$.

With $\mathcal{X}_K$ given by Assumption 15.3, we now define for some radius $\rho_y > 0$ the neighborhood

(15.21) $\mathcal{U}(\rho_y) := \mathcal{X}_K \times \mathbb{B}_{NL}(\bar y, \rho_y)$.

We will require that the iterates $\{u^k\}_{k\in\mathbb{N}}$ of (15.11) stay within this neighborhood for some fixed $\rho_y > 0$.

The next lemma provides the necessary three-point inequality to estimate the linearizations performed within $H_{k+1}$.

Lemma 15.7. For a fixed $k \in \mathbb{N}$, suppose $x^{k+1} \in \mathcal{X}_K$, and let $\rho_y \ge 0$ be such that $u^k, u^{k+1} \in \mathcal{U}(\rho_y)$. Suppose $K$ satisfies Assumptions 15.3 and 15.4 with $\theta \ge \rho_y$. If (15.17) holds, then
$$\langle H_{k+1}(u^{k+1}), u^{k+1} - \bar u\rangle_{Z_{k+1}W_{k+1}} \ge \frac12\|u^{k+1} - \bar u\|^2_{Z_{k+1}\Delta_{k+1}} - \frac{\eta_k[\lambda + 3L\rho_y]}{2}\|x^{k+1} - x^k\|_X^2.$$
Proof. From (15.10), (15.13), (15.17), and (15.20), we calculate

(15.22) $\begin{aligned} D &:= \langle H_{k+1}(u^{k+1}), u^{k+1} - \bar u\rangle_{Z_{k+1}W_{k+1}} - \tfrac12\|u^{k+1} - \bar u\|^2_{Z_{k+1}\Delta_{k+1}} \\ &= \langle H(u^{k+1}), u^{k+1} - \bar u\rangle_{Z_{k+1}W_{k+1}} - \eta_k\tilde\gamma_F\|x^{k+1} - \bar x\|_X^2 - \eta_k\tilde\gamma_{G^*}\|y^{k+1} - \bar y\|_Y^2 \\ &\quad + \eta_k\langle[\nabla K(x^k) - \nabla K(x^{k+1})](x^{k+1} - \bar x), y^{k+1}\rangle_Y \\ &\quad + \eta_k\langle K(x^{k+1}) - K(\bar x^{k+1}) - \nabla K(x^k)(x^{k+1} - \bar x^{k+1}), y^{k+1} - \bar y\rangle_Y \\ &\quad + \eta_k\langle(\nabla K(x^{k+1}) - \nabla K(x^k))(\bar x^{k+1} - \bar x), y^{k+1} - \bar y\rangle_Y. \end{aligned}$

Here the first of the terms involving $K$ comes from the first lines of $H_{k+1}$ and $H$, the second of the terms from the second lines, and the third from $\Delta_{k+1}$. Since $0 \in H(\bar u)$, we have $\bar q_F := -[\nabla K(\bar x)]^*\bar y \in \partial F(\bar x)$ and $\bar q_{G^*} := K(\bar x) \in \partial G^*(\bar y)$. Using (15.17), we can therefore expand
$$\begin{aligned} \langle H(u^{k+1}), u^{k+1} - \bar u\rangle_{Z_{k+1}W_{k+1}} &= \eta_k\langle\partial F(x^{k+1}) - \bar q_F, x^{k+1} - \bar x\rangle_X + \eta_k\langle\partial G^*(y^{k+1}) - \bar q_{G^*}, y^{k+1} - \bar y\rangle_Y \\ &\quad + \eta_k\langle[\nabla K(x^{k+1})]^*y^{k+1} - [\nabla K(\bar x)]^*\bar y, x^{k+1} - \bar x\rangle_X \\ &\quad + \eta_k\langle K(\bar x) - K(x^{k+1}), y^{k+1} - \bar y\rangle_Y. \end{aligned}$$
Using the $\gamma_F$-strong monotonicity of $\partial F$ and the $\gamma_{G^*}$-strong monotonicity of $\partial G^*$, and rearranging terms, we obtain
$$\begin{aligned} \langle H(u^{k+1}), u^{k+1} - \bar u\rangle_{Z_{k+1}W_{k+1}} &\ge \eta_k\gamma_F\|x^{k+1} - \bar x\|_X^2 + \eta_k\gamma_{G^*}\|y^{k+1} - \bar y\|_Y^2 \\ &\quad + \eta_k\langle\nabla K(x^{k+1})(x^{k+1} - \bar x), y^{k+1}\rangle_Y - \eta_k\langle\nabla K(\bar x)(x^{k+1} - \bar x), \bar y\rangle_Y \\ &\quad + \eta_k\langle K(\bar x) - K(x^{k+1}), y^{k+1} - \bar y\rangle_Y. \end{aligned}$$
Combining this estimate with (15.22) and rearranging terms, we obtain
$$\begin{aligned} D &\ge \eta_k(\gamma_F - \tilde\gamma_F)\|x^{k+1} - \bar x\|_X^2 + \eta_k(\gamma_{G^*} - \tilde\gamma_{G^*})\|y^{k+1} - \bar y\|_Y^2 \\ &\quad - \eta_k\langle\nabla K(\bar x)(x^{k+1} - \bar x), \bar y\rangle_Y + \eta_k\langle\nabla K(x^k)(x^{k+1} - \bar x), y^{k+1}\rangle_Y \\ &\quad + \eta_k\langle K(\bar x) - K(\bar x^{k+1}) - \nabla K(x^k)(x^{k+1} - \bar x^{k+1}), y^{k+1} - \bar y\rangle_Y \\ &\quad + \eta_k\langle(\nabla K(x^{k+1}) - \nabla K(x^k))(\bar x^{k+1} - \bar x), y^{k+1} - \bar y\rangle_Y. \end{aligned}$$
Further rearrangements and $\gamma_F + \gamma_K \ge \tilde\gamma_F$ and $\gamma_{G^*} \ge \tilde\gamma_{G^*}$ give

(15.23) $\begin{aligned} D &\ge -\eta_k\gamma_K\|x^{k+1} - \bar x\|_X^2 + \eta_k\langle[\nabla K(x^k) - \nabla K(\bar x)](x^{k+1} - \bar x), \bar y\rangle_Y \\ &\quad + \eta_k\langle K(\bar x) - K(x^{k+1}) - \nabla K(x^{k+1})(\bar x - x^{k+1}), y^{k+1} - \bar y\rangle_Y \\ &\quad + \eta_k\langle K(x^{k+1}) - K(\bar x^{k+1}) + \nabla K(x^{k+1})(\bar x^{k+1} - x^{k+1}), y^{k+1} - \bar y\rangle_Y \\ &\quad + \eta_k\langle(\nabla K(x^k) - \nabla K(x^{k+1}))(\bar x^{k+1} - x^{k+1}), y^{k+1} - \bar y\rangle_Y. \end{aligned}$

Using Assumption 15.3 and the mean value theorem in the form
$$K(x') = K(x) + \nabla K(x)(x' - x) + \int_0^1 \bigl(\nabla K(x + s(x' - x)) - \nabla K(x)\bigr)(x' - x)\,ds,$$
we obtain for any $x, x' \in \mathcal{X}_K$ and $y \in \operatorname{dom} G^*$ the inequality

(15.24) $\langle K(x') - K(x) - \nabla K(x)(x' - x), y\rangle_Y \le (L/2)\|x - x'\|_X^2\,\|y\|_{P_{NL}}$.

Applying Assumption 15.3, the estimate (15.24), and $\bar x^{k+1} - x^{k+1} = x^{k+1} - x^k$ to the last two terms of (15.23), we obtain
$$\langle K(x^{k+1}) - K(\bar x^{k+1}) + \nabla K(x^{k+1})(\bar x^{k+1} - x^{k+1}), y^{k+1} - \bar y\rangle_Y \ge -\frac{L}{2}\|x^{k+1} - x^k\|_X^2\,\|y^{k+1} - \bar y\|_{P_{NL}}$$
and
$$\langle(\nabla K(x^k) - \nabla K(x^{k+1}))(\bar x^{k+1} - x^{k+1}), y^{k+1} - \bar y\rangle_Y \ge -L\|x^{k+1} - x^k\|_X^2\,\|y^{k+1} - \bar y\|_{P_{NL}}.$$
These estimates together with (15.17) and $u^{k+1} \in \mathcal{U}(\rho_y)$ now imply that $D \ge \eta_k D^K_{k+1}$ for
$$\begin{aligned} D^K_{k+1} &:= \langle[\nabla K(x^k) - \nabla K(\bar x)](x^{k+1} - \bar x), \bar y\rangle_Y - \gamma_K\|x^{k+1} - \bar x\|_X^2 - L(1 + 1/2)\rho_y\|x^{k+1} - x^k\|_X^2 \\ &\quad - \|y^{k+1} - \bar y\|_{P_{NL}}\|K(\bar x) - K(x^{k+1}) - \nabla K(x^{k+1})(\bar x - x^{k+1})\|_Y. \end{aligned}$$
Finally, we use Assumption 15.4 and Young's inequality to estimate
$$D^K_{k+1} \ge (\theta - \|y^{k+1} - \bar y\|_{P_{NL}})\|K(\bar x) - K(x^{k+1}) - \nabla K(x^{k+1})(\bar x - x^{k+1})\|_Y - \frac{\lambda + 3L\rho_y}{2}\|x^{k+1} - x^k\|_X^2.$$
Now observe that $\theta - \|y^{k+1} - \bar y\|_{P_{NL}} \ge \theta - \rho_y \ge 0$. Combining this with the estimate $D \ge \eta_k D^K_{k+1}$, we therefore obtain our claim. □
We now have all the necessary tools in hand to prove the main estimate (11.28) needed for the application of Theorem 11.12.

Theorem 15.8. Let $F : X \to \overline{\mathbb{R}}$ and $G^* : Y \to \overline{\mathbb{R}}$ be convex, proper, and lower semicontinuous. Suppose $K : X \to Y$ satisfies Assumptions 15.3 and 15.4. Fix $k \in \mathbb{N}$, and also suppose $x^{k+1} \in \mathcal{X}_K$ and that $u^k, u^{k+1} \in \mathcal{U}(\rho_y)$ for some $\rho_y \ge 0$. Suppose (15.17) holds for some $\kappa \in [0, 1)$ and

(15.25) $\tau < \dfrac{\kappa}{\lambda + 3L\rho_y}$.

Then there exists $\delta > 0$ such that

(15.26) $\dfrac12\|u^{k+1} - \bar u\|^2_{Z_{k+2}M_{k+2}} + \dfrac{\delta}{2}\|u^{k+1} - u^k\|^2_{Z_{k+1}M_{k+1}} \le \dfrac12\|u^k - \bar u\|^2_{Z_{k+1}M_{k+1}} \quad (k \in \mathbb{N})$.
Proof. We show that (11.28) holds with $\mathcal{V}_{k+1} \ge \frac{\delta}{2}\|u^{k+1} - u^k\|^2_{Z_{k+1}M_{k+1}}$ for some $\delta > 0$, i.e., that

(15.27) $\langle H_{k+1}(u^{k+1}), u^{k+1} - \bar u\rangle_{Z_{k+1}W_{k+1}} \ge \dfrac{\delta}{2}\|u^{k+1} - u^k\|^2_{Z_{k+1}M_{k+1}} + \dfrac12\|u^{k+1} - \bar u\|^2_{Z_{k+2}M_{k+2} - Z_{k+1}M_{k+1}}$.

The claim then follows from Theorem 11.12 and Lemma 15.5, the latter of which provides the necessary self-adjointness of $Z_{k+1}M_{k+1}$.

Let thus $\delta \in (0, \kappa)$ be arbitrary, and define
$$S_{k+1} := \begin{pmatrix} (\delta\varphi_k - \eta_k[\lambda + 3L\rho_y])\,\mathrm{Id} & 0 \\ 0 & \psi_{k+1}\bigl(\mathrm{Id} - \frac{\tau_k\sigma_{k+1}}{1-\delta}\nabla K(x^k)[\nabla K(x^k)]^*\bigr) \end{pmatrix}.$$
Using (15.19) and (15.18) and, in the second and third step, Lemmas 15.6 and 15.7, we estimate
$$\begin{aligned} \frac12\|u^{k+1} - u^k\|^2_{S_{k+1} - Z_{k+1}M_{k+1}} &\le -\frac{\eta_k[\lambda + 3L\rho_y]}{2}\|x^{k+1} - x^k\|_X^2 \\ &\le \langle H_{k+1}(u^{k+1}), u^{k+1} - \bar u\rangle_{Z_{k+1}W_{k+1}} - \frac12\|u^{k+1} - \bar u\|^2_{Z_{k+1}\Delta_{k+1}} \\ &= \langle H_{k+1}(u^{k+1}), u^{k+1} - \bar u\rangle_{Z_{k+1}W_{k+1}} - \frac12\|u^{k+1} - \bar u\|^2_{Z_{k+2}M_{k+2} - Z_{k+1}M_{k+1}}. \end{aligned}$$
Then (15.27) holds if $S_{k+1} \ge \tilde\delta \cdot \mathrm{Id}$ for some $\tilde\delta > 0$. Since (15.25) implies $\tau < \delta/(\lambda + 3L\rho_y)$ for some $\delta \in (0, \kappa)$, this follows from (15.17) and (15.25). □
Theorem 15.9. Let $F : X \to \overline{\mathbb{R}}$ and $G^* : Y \to \overline{\mathbb{R}}$ be strongly convex, proper, and lower semicontinuous. Suppose $K : X \to Y$ satisfies Assumptions 15.3 and 15.4. Let $R_K > 0$ be such that $\sup_{x\in\mathcal{X}_K}\|\nabla K(x)\| \le R_K$. Pick $0 < \tau < 1/(\lambda + 3L\rho_y)$ for a given $\rho_y \ge 0$, and take $\sigma = \tau\tilde\gamma_F/\tilde\gamma_{G^*}$ for some $\tilde\gamma_F \in (0, \gamma_F + \gamma_K]$ and $\tilde\gamma_{G^*} \in (0, \gamma_{G^*}]$. Let the iterates $\{(u^k, x^{k+1})\}_{k\in\mathbb{N}}$ be generated by the NL-PDPS method (15.11). If $x^{k+1} \in \mathcal{X}_K$ and $u^k \in \mathcal{U}(\rho_y)$ for all $k \in \mathbb{N}$ and some $\bar u \in H^{-1}(0)$ for $H$ given in (15.10), then $u^k \to \bar u$ linearly.
Proof. Take $\varphi_{k+1} := \varphi_k(1 + 2\tilde\gamma_F\tau)$ and $\psi_{k+1} := \psi_k(1 + 2\tilde\gamma_{G^*}\sigma)$ for $\varphi_0 = 1$ and $\psi_1 := \tau/\sigma$. Then $\tau\varphi_k = \sigma\psi_{k+1}$ if and only if $1 + 2\tilde\gamma_F\tau = 1 + 2\tilde\gamma_{G^*}\sigma$, i.e., for $\sigma = \tau\tilde\gamma_F/\tilde\gamma_{G^*}$. Consequently (15.17) is satisfied, and the testing parameters $\varphi_k$ and $\psi_{k+1}$ grow exponentially. Clearly (15.25) holds for some $\kappa \in [0, 1)$. Combining (15.26) from Theorem 15.8 with Lemma 15.5 now shows the claimed linear convergence. □
Besides step length bounds and structural properties of the problem, Theorem 15.9 still requires us to ensure that the iterates stay close enough to the critical point $\bar x$. This can be done if we initialize close enough to a critical point. As the proof is very technical, we merely state the following result.
Theorem 15.10 ([Clason, Mazurenko & Valkonen 2019, Proposition 4.8]). Under the assumptions of Theorem 15.9, for any $\rho_y > 0$ there exists an $\varepsilon > 0$ such that $\{u^k\}_{k\in\mathbb{N}} \subset \mathcal{U}(\rho_y)$ for all initial iterates $u^0 = (x^0, y^0)$ satisfying

(15.28) $\sqrt{2\delta^{-1}\bigl(\|x^0 - \bar x\|^2 + \tau\sigma^{-1}\|y^0 - \bar y\|^2\bigr)} \le \varepsilon$.
Remark 15.11 (weaker assumptions, weaker convergence). We have only demonstrated linear convergence of the method under the strong convexity of both $F$ and $G^*$. However, under similarly lesser assumptions as for the basic PDPS method familiar from Part II, both an accelerated $O(1/N^2)$ rate and weak convergence can be proved. We refer to [Clason, Mazurenko & Valkonen 2019] for details, noting that Opial's Lemma 9.1 extends straightforwardly to the quantitative Fejér monotonicity (10.21) that is the basis of our proofs here. We also note that our linear convergence result differs from that in [Clason, Mazurenko & Valkonen 2019] by taking the over-relaxation parameter $\omega = 1$ in (15.11) instead of $\omega = 1/(1 + 2\tilde\gamma_F\tau) < 1$; compare Theorem 10.8.
Remark 15.12 (historical development of the NL-PDPS). The NL-PDPS method was first introduced in [Valkonen 2014] in finite dimensions with applications to inverse problems in magnetic resonance imaging. The method was later extended in [Clason & Valkonen 2017a] to infinite dimensions and applied to PDE-constrained optimization problems. In these works, only (weak) convergence of the iterates is shown, based on the metric regularity of the operator $H$. We discuss metric regularity later in Chapter 27. Convergence rates were then first shown in [Clason, Mazurenko & Valkonen 2019]. In that paper, alternative forms of the three-point condition Assumption 15.4 on $K$ are also discussed.

Similarly to how we showed in Section 8.7 that the preconditioned ADMM is equivalent to the PDPS method, it is possible to derive a preconditioned nonlinear ADMM that is equivalent to the NL-PDPS method; such algorithms are considered in [Benning et al. 2016]. The NL-PDPS method has been extended in [Clason, Mazurenko & Valkonen 2020] by replacing $\langle K(x), y\rangle_Y$ by a general saddle term $K(x, y)$, which can be applied to nonconvex optimization problems such as $\ell^0$-TV denoising or elliptic Nash equilibrium problems. Block-adapted and stochastic variants in the spirit of Remark 11.17 can be found in [Mazurenko, Jauhiainen & Valkonen 2020]. Finally, a simplified approach using the Bregman divergences of Section 11.1 is presented in [Valkonen 2020a].
16 LIMITING SUBDIFFERENTIALS
While the Clarke subdifferential is a suitable concept for nonsmooth but convex or nonconvex but smooth functionals, it has severe drawbacks for nonsmooth and nonconvex functionals: As shown in Corollary 13.11, its Fermat principle cannot distinguish minimizers from maximizers. The reason is that the Clarke subdifferential is always convex, which is a direct consequence of its construction (13.2) via polarity with respect to (generalized) directional derivatives. To obtain sharper results for such functionals, it is therefore necessary to construct nonconvex subdifferentials directly via a primal limiting process. On the other hand, deriving calculus rules for the previous subdifferentials crucially exploited their convexity by applying Hahn-Banach separation theorems, and calculus rules for nonconvex subdifferentials are thus significantly more difficult to obtain. As in Chapter 13, we will assume throughout this chapter that $X$ is a Banach space unless stated otherwise.
16.1 bouligand subdifferentials
The first definition is motivated by Theorem 13.26: We define a subdifferential as a suitable limit of classical derivatives (without convexification). For $F : X \to \overline{\mathbb{R}}$, we first define the set of Gâteaux points
$$G_F := \{x \in X \mid F \text{ is Gâteaux differentiable at } x\} \subset \operatorname{dom} F$$
and then the Bouligand subdifferential of $F$ at $x$ as

(16.1) $\partial_B F(x) := \{x^* \in X^* \mid DF(x_n) \rightharpoonup^* x^* \text{ for some } G_F \ni x_n \to x\}$.

For $F : \mathbb{R}^N \to \mathbb{R}$ locally Lipschitz, it follows from Theorem 13.26 that $\partial_C F(x) = \operatorname{co}\partial_B F(x)$. However, unless $X$ is finite-dimensional, it is not clear a priori that the Bouligand subdifferential is nonempty even for $x \in \operatorname{dom} F$.¹ Furthermore, the subdifferential does not admit a satisfactory calculus; not even a Fermat principle holds.
¹Although in special cases it is possible to give a full characterization in Hilbert spaces; see, e.g., [Christof, Clason, et al. 2018].
Example 16.1. Let $F : \mathbb{R} \to \mathbb{R}$, $F(x) := |x|$. Then $F$ is differentiable at every $x \ne 0$ with $F'(x) = \operatorname{sign}(x)$. Correspondingly,
$$0 \notin \{-1, 1\} = \partial_B F(0).$$
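In the scalar case the limiting process in (16.1) is easy to carry out numerically: sampling the classical derivative of $F(x) = |x|$ along sequences of differentiability points tending to $0$ from either side produces exactly the two limits $\pm 1$. A minimal sketch (the function and the sequences are specific to this example):

```python
# Bouligand subdifferential of F(x) = |x| at 0: limits of DF(x_n) along
# sequences of Gâteaux points x_n -> 0. Away from 0, DF(x) = sign(x).
def dF(x):
    return 1.0 if x > 0 else -1.0   # classical derivative for x != 0

limits = set()
for sign in (1.0, -1.0):
    seq = [sign * 2.0 ** (-n) for n in range(1, 30)]   # G_F ∋ x_n -> 0
    derivs = {dF(x) for x in seq}
    assert len(derivs) == 1   # the derivative is constant along each sequence
    limits |= derivs

assert sorted(limits) == [-1.0, 1.0]   # = ∂_B F(0); in particular 0 is absent
```

In particular, the minimizer $0$ fails the Fermat principle for $\partial_B F$, exactly as stated above.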
To make this approach work therefore requires a more delicate limiting process. The remainder of this chapter is devoted to one such approach, where we only give an overview and state important results following [Mordukhovich 2006]. The full theory is based on a geometric construction similar to Lemma 4.10 making use of tangent and normal cones (corresponding to generalized directional derivatives and subgradients, respectively) that also allows for differentiation of set-valued mappings. We will develop this theory in Chapters 18 to 21. For an alternative, more axiomatic, approach to generalized derivatives of nonconvex functionals, we refer to [Penot 2013; Ioffe 2017].

16.2 fréchet subdifferentials

We begin with the following limiting construction, which combines the characterizations of both the Fréchet derivative and the convex subdifferential. Let $X$ be a Banach space and $F : X \to \overline{\mathbb{R}}$. The Fréchet subdifferential (or regular subdifferential or presubdifferential) of $F$ at $x$ is then defined as²

(16.2) $\partial_F F(x) := \Bigl\{x^* \in X^* \,\Bigm|\, \liminf_{y\to x} \dfrac{F(y) - F(x) - \langle x^*, y - x\rangle_X}{\|y - x\|_X} \ge 0\Bigr\}$.

Note how this "localizes" the definition of the convex subdifferential around the point of interest: the numerator does not need to be nonnegative for all $y$; it suffices if this holds for any $y$ sufficiently close to $x$. By a similar argument as for Theorem 4.2, we thus obtain a Fermat principle for local minimizers.
Theorem 16.2. Let $F : X \to \overline{\mathbb{R}}$ be proper and $\bar x \in \operatorname{dom} F$ be a local minimizer. Then $0 \in \partial_F F(\bar x)$.

Proof. Let $\bar x \in \operatorname{dom} F$ be a local minimizer. Then there exists an $\varepsilon > 0$ such that $F(\bar x) \le F(y)$ for all $y \in \mathbb{B}(\bar x, \varepsilon)$, which is equivalent to
$$\frac{F(y) - F(\bar x) - \langle 0, y - \bar x\rangle_X}{\|y - \bar x\|_X} \ge 0 \quad\text{for all } y \in \mathbb{B}(\bar x, \varepsilon).$$
Now for any strongly convergent sequence $y_n \to \bar x$, we have that $y_n \in \mathbb{B}(\bar x, \varepsilon)$ for $n$ large enough. Taking the lim inf in the above inequality thus yields $0 \in \partial_F F(\bar x)$. □

²The equivalence of (16.2) with the usual definition based on corresponding normal cones follows from, e.g., [Mordukhovich 2006, Theorem 1.86].
For convex functions, of course, the numerator is always nonnegative by definition, and the Fréchet subdifferential reduces to the convex subdifferential.

Theorem 16.3. Let $F : X \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous and $x \in \operatorname{dom} F$. Then $\partial_F F(x) = \partial F(x)$.

Proof. By definition of the convex subdifferential, any $x^* \in \partial F(x)$ satisfies
$$F(y) - F(x) - \langle x^*, y - x\rangle_X \ge 0 \quad\text{for all } y \in X.$$
Dividing by $\|x - y\|_X > 0$ for $y \ne x$ and taking the lim inf as $y \to x$ thus yields $x^* \in \partial_F F(x)$. Conversely, let $x^* \in \partial_F F(x)$. Let $h \in X \setminus \{0\}$ be arbitrary. Then there exists an $\varepsilon > 0$ such that
$$\frac{F(x + th) - F(x) - \langle x^*, th\rangle_X}{t\|h\|_X} \ge 0 \quad\text{for all } t \in (0, \varepsilon).$$
Multiplying by $\|h\|_X > 0$ and letting $t \to 0$, we obtain from Lemma 4.3 that

(16.3) $\langle x^*, h\rangle_X \le \dfrac{F(x + th) - F(x)}{t} \to F'(x; h)$.

By Lemma 4.4, this implies that $x^* \in \partial F(x)$. □
Similarly, for Fréchet differentiable functionals, the limit in (16.2) is zero for all sequences.

Theorem 16.4. Let $F : X \to \overline{\mathbb{R}}$ be Fréchet differentiable at $x \in X$. Then $\partial_F F(x) = \{F'(x)\}$.

Proof. The definition of the Fréchet derivative immediately yields
$$\lim_{y\to x} \frac{F(y) - F(x) - \langle F'(x), y - x\rangle_X}{\|x - y\|_X} = \lim_{\|h\|_X\to 0} \frac{F(x + h) - F(x) - F'(x)h}{\|h\|_X} = 0$$
and hence $F'(x) \in \partial_F F(x)$. Conversely, let $x^* \in \partial_F F(x)$ and let again $h \in X \setminus \{0\}$ be arbitrary. As in the proof of Theorem 16.3, we then obtain that

(16.4) $\langle x^*, h\rangle_X \le F'(x; h) = \langle F'(x), h\rangle_X$.

Applying the same argument to $-h$ then yields $\langle x^*, h\rangle_X = \langle F'(x), h\rangle_X$ for all $h \in X$, i.e., $x^* = F'(x)$. □
For nonsmooth nonconvex functions, the Fréchet subdifferential can be strictly smaller than the Clarke subdifferential.
Example 16.5. Consider $F : \mathbb{R} \to \mathbb{R}$, $F(x) := -|x|$. For any $x \ne 0$, it follows from Theorem 16.4 that $\partial_F F(x) = \{-\operatorname{sign} x\}$. But for $x = 0$ and arbitrary $x^* \in \mathbb{R}$, we have that
$$\liminf_{y\to 0} \frac{F(y) - F(0) - \langle x^*, y - 0\rangle}{|y - 0|} = \liminf_{y\to 0}\,\bigl(-1 - x^* \cdot \operatorname{sign}(y)\bigr) = -1 - |x^*| < 0$$
and hence that $\partial_F F(0) = \emptyset \subsetneq [-1, 1] = \partial_C F(0)$.
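The computation in Example 16.5 can be replayed numerically: the difference quotient in (16.2) equals $-1 - x^*\operatorname{sign}(y)$, so its infimum over small $y$ is $-1 - |x^*| < 0$ for every candidate $x^*$. A small sketch:

```python
# Example 16.5 numerically: for F(x) = -|x|, the Fréchet-subdifferential
# quotient at 0 is (F(y) - F(0) - xstar*y)/|y| = -1 - xstar*sign(y), so no
# candidate xstar satisfies the liminf >= 0 condition: ∂_F F(0) is empty.
F = lambda t: -abs(t)

def quotient(xstar, y):
    return (F(y) - F(0.0) - xstar * y) / abs(y)

for xstar in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0):
    ys = [s * 2.0 ** (-n) for n in range(1, 40) for s in (1.0, -1.0)]
    liminf = min(quotient(xstar, y) for y in ys)
    assert abs(liminf - (-1.0 - abs(xstar))) < 1e-12
    assert liminf < 0   # so xstar is not a Fréchet subgradient at 0
```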
Note that $0 \in \operatorname{dom} F$ in this example. Although the Fréchet subdifferential does not pick up a maximizer in contrast to the Clarke subdifferential, the fact that $\partial_F F(x)$ can be empty even for $x \in \operatorname{dom} F$ is a problem when trying to derive calculus rules that hold with equality. In fact, as Example 16.5 shows, the Fréchet subdifferential fails to be outer semicontinuous, which is also not desirable. This leads to the next and final definition.
16.3 mordukhovich subdifferentials
Let $X$ be a reflexive Banach space and $F : X \to \overline{\mathbb{R}}$. The Mordukhovich subdifferential (or basic subdifferential or limiting subdifferential) of $F$ at $x \in \operatorname{dom} F$ is then defined as the strong-to-weak-$*$ outer closure of $\partial_F F(x)$, i.e.,³

(16.5) $\partial_M F(x) := \operatorname*{w-*-lim\,sup}_{y\to x}\,\partial_F F(y) = \bigl\{x^* \in X^* \bigm| x_n^* \rightharpoonup^* x^* \text{ for some } x_n^* \in \partial_F F(x_n) \text{ with } x_n \to x\bigr\},$

which can be seen as a generalization of the definition (16.1) of the Bouligand subdifferential. Note that in contrast to (16.1), this definition includes the constant sequence $x_n^* \equiv x^*$ even at nondifferentiable points, which makes this a more useful concept in general. This also implies that $\partial_F F(x) \subset \partial_M F(x)$ for any $F$, and Theorem 16.2 immediately yields a Fermat principle.
Corollary 16.6. Let $F : X \to \overline{\mathbb{R}}$ be proper and $\bar x \in \operatorname{dom} F$ be a local minimizer. Then $0 \in \partial_M F(\bar x)$.

As for the Fréchet subdifferential, maximizers do not satisfy the Fermat principle.

Example 16.7. Consider again $F : \mathbb{R} \to \mathbb{R}$, $F(x) := -|x|$. Using Example 16.5, we directly
³The equivalence of this definition with the original geometric definition, which holds in reflexive Banach spaces, follows from [Mordukhovich 2006, Theorem 2.34].
obtain from (16.5) that $\partial_M F(0) = \{-1, 1\} = \partial_B F(0)$.

Since the convex subdifferential is strong-to-weak-$*$ outer semicontinuous, the Mordukhovich subdifferential reduces to the convex subdifferential as well.

Theorem 16.8. Let $X$ be a reflexive Banach space, $F : X \to \overline{\mathbb{R}}$ be proper, convex, and lower semicontinuous, and $x \in \operatorname{dom} F$. Then $\partial_M F(x) = \partial F(x)$.
Proof. From Theorem 16.3, it follows that $\partial F(x) = \partial_F F(x) \subset \partial_M F(x)$. Let therefore $x^* \in \partial_M F(x)$ be arbitrary. Then by definition there exists a sequence $\{x_n^*\}_{n\in\mathbb{N}}$ with $x_n^* \rightharpoonup^* x^*$ and $x_n^* \in \partial_F F(x_n) = \partial F(x_n)$ for $x_n \to x$. From Theorem 6.11 and Lemma 6.8, it then follows that $x^* \in \partial F(x)$ as well. □
A similar result holds for continuously differentiable functionals.

Theorem 16.9. Let $X$ be a reflexive Banach space and $F : X \to \overline{\mathbb{R}}$ be continuously differentiable at $x \in X$. Then $\partial_M F(x) = \{F'(x)\}$.
Proof. From Theorem 16.4, it follows that $\{F'(x)\} = \partial_F F(x) \subset \partial_M F(x)$. Let therefore $x^* \in \partial_M F(x)$ be arbitrary. Then by definition there exists a sequence $\{x_n^*\}_{n\in\mathbb{N}}$ with $x_n^* \rightharpoonup^* x^*$ and $x_n^* \in \partial_F F(x_n) = \{F'(x_n)\}$ for $x_n \to x$. The continuity of $F'$ then immediately implies that $F'(x_n) \to F'(x)$, and since strong limits are also weak-$*$ limits, we obtain $x^* = F'(x)$. □
The same function as in Example 13.6 shows that this equality does not hold if $F$ is merely Fréchet differentiable.

We also have the following relation to Clarke subdifferentials, which should be compared to Theorem 13.26. We will give a proof in a more restricted setting in Chapter 20, cf. Corollary 20.21.

Theorem 16.10 ([Mordukhovich 2006, Theorem 3.57]). Let $X$ be a reflexive Banach space and $F : X \to \mathbb{R}$ be locally Lipschitz continuous around $x \in X$. Then $\partial_C F(x) = \operatorname{cl}^*\operatorname{co}\partial_M F(x)$, where $\operatorname{cl}^* A$ stands for the weak-$*$ closure of the set $A \subset X^*$.⁴

The following example illustrates that the Mordukhovich subdifferential can be nonconvex.
⁴Of course, in reflexive Banach spaces the weak-$*$ closure coincides with the weak closure. The statement holds more generally in so-called Asplund spaces, which include some nonreflexive Banach spaces.
Example 16.11. Let $F : \mathbb{R}^2 \to \mathbb{R}$, $F(x_1, x_2) = |x_1| - |x_2|$. Since $F$ is continuously differentiable at any $(x_1, x_2)$ with $x_1, x_2 \ne 0$, where
$$\nabla F(x_1, x_2) \in \{(1, 1), (-1, 1), (1, -1), (-1, -1)\},$$
we obtain from (16.2) that
$$\partial_F F(x_1, x_2) = \begin{cases} \{(1, -1)\} & \text{if } x_1 > 0,\ x_2 > 0, \\ \{(-1, -1)\} & \text{if } x_1 < 0,\ x_2 > 0, \\ \{(-1, 1)\} & \text{if } x_1 < 0,\ x_2 < 0, \\ \{(1, 1)\} & \text{if } x_1 > 0,\ x_2 < 0, \\ \{(t, -1) \mid t \in [-1, 1]\} & \text{if } x_1 = 0,\ x_2 > 0, \\ \{(t, 1) \mid t \in [-1, 1]\} & \text{if } x_1 = 0,\ x_2 < 0, \\ \emptyset & \text{if } x_2 = 0. \end{cases}$$
In particular, $\partial_F F(0, 0) = \emptyset$. However, from (16.5) it follows that
$$\partial_M F(0, 0) = \{(t, -1) \mid t \in [-1, 1]\} \cup \{(t, 1) \mid t \in [-1, 1]\}.$$
In particular, $0 \notin \partial_M F(0, 0)$. On the other hand, Theorem 16.10 then yields that

(16.6) $\partial_C F(0, 0) = \{(t, s) \mid t, s \in [-1, 1]\} = [-1, 1]^2$

and hence $0 \in \partial_C F(0, 0)$. (Note that $F$ admits neither a minimum nor a maximum on $\mathbb{R}^2$, while $(0, 0)$ is a nonsmooth saddle point.)
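A quick numerical companion to Example 16.11 (illustrative only): collecting the quadrant gradients confirms the four values above, and since every element of $\partial_M F(0,0)$ has second component $\pm 1$, the origin cannot belong to it, while it clearly lies in the Clarke hull $[-1,1]^2$.

```python
# Gradients of F(x1, x2) = |x1| - |x2| on the four open quadrants.
import itertools

def grad(x1, x2):   # classical gradient away from the coordinate axes
    return (1.0 if x1 > 0 else -1.0, -1.0 if x2 > 0 else 1.0)

quads = {grad(0.5 * s1, 0.5 * s2)
         for s1, s2 in itertools.product((1, -1), repeat=2)}
assert quads == {(1.0, -1.0), (-1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)}

# Every limiting subgradient at (0,0) has second component -1 or +1,
# so (0,0) ∉ ∂_M F(0,0) ...
assert all(abs(g2) == 1.0 for _, g2 in quads)
# ... while (0,0) lies in the convex hull [-1,1]^2 = ∂_C F(0,0).
assert all(-1.0 <= c <= 1.0 for c in (0.0, 0.0))
```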
In contrast to the Bouligand subdifferential, the Mordukhovich subdifferential admits a satisfying calculus, although the assumptions are understandably more restrictive than in the convex setting. The first rule follows as always straight from the definition.

Theorem 16.12. Let $X$ be a reflexive Banach space and $F : X \to \overline{\mathbb{R}}$. Then for any $\lambda \ge 0$ and $x \in X$,
$$\partial_M(\lambda F)(x) = \lambda\partial_M F(x).$$

Full calculus in infinite-dimensional spaces holds only for a rather small class of mappings.

Theorem 16.13 ([Mordukhovich 2006, Proposition 1.107]). Let $X$ be a reflexive Banach space, $F : X \to \mathbb{R}$ be continuously differentiable, and $G : X \to \overline{\mathbb{R}}$ be arbitrary. Then for any $x \in \operatorname{dom} G$,
$$\partial_M(F + G)(x) = \{F'(x)\} + \partial_M G(x).$$
While the previous two theorems also hold for the Fréchet subdifferential (the latter even for merely Fréchet differentiable $F$), the following chain rule is only valid for the Mordukhovich subdifferential. Compared to Theorem 13.23, it also allows the outer functional to be extended real-valued.

Theorem 16.14 ([Mordukhovich 2006, Proposition 1.112]). Let $X$ be a reflexive Banach space, $F : X \to Y$ be continuously differentiable, and $G : Y \to \overline{\mathbb{R}}$ be arbitrary. Then for any $x \in X$ with $F(x) \in \operatorname{dom} G$ and $F'(x) : X \to Y$ surjective,
$$\partial_M(G \circ F)(x) = F'(x)^*\partial_M G(F(x)).$$

More general calculus rules require $Y$ to be a reflexive Banach⁵ space as well as additional, nontrivial, assumptions on $F$ and $G$; see, e.g., [Mordukhovich 2006, Theorem 3.36 and Theorem 3.41].

We will illustrate how to prove the above calculus results and more in Section 20.4 and Chapter 25, after studying the differentiation of set-valued mappings.

⁵or Asplund
17 ε-SUBDIFFERENTIALS AND APPROXIMATE FERMAT PRINCIPLES

We now study an approximate variant of the Fréchet subdifferential of Section 16.2 as well as related approximate Fermat principles; these will be needed in Chapter 18 to study limiting tangent and normal cones.

17.1 ε-subdifferentials

Just like the ε-minimizers in Section 2.4, it can be useful to consider "relaxed" ε-subdifferentials. In particular, it is possible to derive exact calculus rules for these relaxed subdifferentials, which can lead to tighter results than inclusions for the corresponding exact subdifferentials (in particular, for the Fréchet subdifferential). We will make use of this in Chapter 27.
Similarly to the Fréchet subdifferential (16.2), we thus define for $F : X \to \overline{\mathbb{R}}$ the $\varepsilon$(-Fréchet)-subdifferential by

(17.1) $\partial_\varepsilon F(x) := \Bigl\{x^* \in X^* \,\Bigm|\, \liminf_{y\to x} \dfrac{F(y) - F(x) - \langle x^*, y - x\rangle_X}{\|y - x\|_X} \ge -\varepsilon\Bigr\},$

where $\partial_0 F = \partial_F F$. The following lemma provides further insight into the $\varepsilon$-subdifferential.

Lemma 17.1. Let $F : X \to \overline{\mathbb{R}}$ on a Banach space $X$, and $\varepsilon \ge 0$. Then the following are equivalent:

(i) $x^* \in \partial_\varepsilon F(x)$;

(ii) $x^* \in \partial_F[F + \varepsilon\|\cdot - x\|_X](x)$;

(iii) $0 \in \partial_F[F + \varepsilon\|\cdot - x\|_X - \langle x^*, \cdot - x\rangle](x)$.

Proof. For each of the alternatives, (17.1) is equivalent to
$$\liminf_{y\to x} \frac{\varepsilon\|y - x\|_X + F(y) - F(x) - \langle x^*, y - x\rangle_X}{\|y - x\|_X} \ge 0. \qquad\Box$$
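As a concrete instance of (17.1) (not taken from the text), consider $F(x) = |x|$ on $X = \mathbb{R}$ at $x = 0$: the difference quotient equals $1 - x^*\operatorname{sign}(y)$, whose liminf is $1 - |x^*|$, so $\partial_\varepsilon F(0) = [-(1+\varepsilon), 1+\varepsilon]$. A short sketch:

```python
# ε-subdifferential of F(x) = |x| at 0: the quotient in (17.1) equals
# 1 - xstar*sign(y), with liminf 1 - |xstar|; hence
# xstar ∈ ∂_ε F(0) iff 1 - |xstar| >= -eps, i.e. |xstar| <= 1 + eps.
eps = 0.25

def liminf_quotient(xstar):
    return min(1.0 - xstar * s for s in (1.0, -1.0))   # = 1 - |xstar|

members = [x / 100.0 for x in range(-200, 201)
           if liminf_quotient(x / 100.0) >= -eps]
assert min(members) == -1.25 and max(members) == 1.25   # = ±(1 + eps)
```

For $\varepsilon = 0$ this recovers the convex subdifferential $\partial|\cdot|(0) = [-1, 1]$, consistent with $\partial_0 F = \partial_F F$ and Theorem 16.3.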
We have the following "fuzzy" $\varepsilon$-sum rule.

Lemma 17.2. Let $X$ be a Banach space, $G : X \to \overline{\mathbb{R}}$, and $F : X \to \mathbb{R}$ be convex with $\partial F(x) \subset \mathbb{B}(x^*, \varepsilon)$ for some $\varepsilon \ge 0$ and $x^* \in X^*$. Then for all $\delta \ge 0$,
$$\partial_\delta G(x) + \partial F(x) \subset \partial_\delta[G + F](x) \subset \partial_{\varepsilon+\delta}G(x) + \{x^*\}.$$
In particular, if $x^* \in \partial F(x)$, then
$$\partial_\delta G(x) + \partial F(x) \subset \partial_\delta[G + F](x) \subset \partial_{\varepsilon+\delta}G(x) + \partial F(x).$$
Proof. We start with the first inclusion. Let $z^* \in \partial F(x)$ and $\tilde x^* \in \partial_\delta G(x)$. Then the definitions (4.1) and (17.1), respectively, imply that
$$\liminf_{y\to x} \frac{G(y) - G(x) + F(y) - F(x) - \langle\tilde x^* + z^*, y - x\rangle_X}{\|y - x\|_X} \ge \liminf_{y\to x} \frac{G(y) - G(x) - \langle\tilde x^*, y - x\rangle_X}{\|y - x\|_X} \ge -\delta,$$
i.e., $\tilde x^* + z^* \in \partial_\delta[G + F](x)$.

To prove the second inclusion, let $\tilde x^* \in \partial_\delta[G + F](x)$ and $h \in X$ with $\|h\|_X = 1$. Then (17.1) implies that for all $t_n \searrow 0$ and $h_n \to h$,

(17.2) $\liminf_{n\to\infty} \dfrac{F(x + t_nh_n) - F(x) + G(x + t_nh_n) - G(x) - t_n\langle\tilde x^*, h_n\rangle_X}{t_n} \ge -\delta$.

Since $F$ is directionally differentiable by Lemma 4.3 and locally Lipschitz around $x \in \operatorname{int}(\operatorname{dom} F) = X$ by Theorem 3.13 with Lipschitz constant $L > 0$, we have
$$\lim_{n\to\infty} \frac{F(x + t_nh_n) - F(x)}{t_n} \le \lim_{n\to\infty}\Bigl(\frac{F(x + t_nh) - F(x)}{t_n} + L\|h_n - h\|_X\Bigr) = F'(x; h).$$
Let now $\rho > 0$ be arbitrary. Then by Lemma 4.4, Theorem 13.8, and Corollary 13.15 there exists an $x^*_{h,\rho} \in \partial F(x)$ such that $F'(x; h) \le \langle x^*_{h,\rho}, h\rangle_X + \rho$. Therefore
$$\lim_{n\to\infty} \frac{F(x + t_nh_n) - F(x) - t_n\langle x^*, h_n\rangle_X}{t_n} \le F'(x; h) - \langle x^*, h\rangle_X \le \langle x^*_{h,\rho} - x^*, h\rangle_X + \rho \le \varepsilon + \rho,$$
where we have used that $\partial F(x) \subset \mathbb{B}(x^*, \varepsilon)$ and $\|h\|_X = 1$ in the last inequality. Since $\rho > 0$ was arbitrary, the characterization (17.2) now implies
$$\liminf_{n\to\infty} \frac{G(x + t_nh_n) - G(x) - t_n\langle\tilde x^* - x^*, h_n\rangle_X}{t_n} \ge -(\delta + \varepsilon).$$
Since $y_n := x + t_nh_n \to x$ was arbitrary, this proves $\tilde x^* - x^* \in \partial_{\varepsilon+\delta}G(x)$, i.e., $\partial_\delta[G + F](x) \subset \partial_{\varepsilon+\delta}G(x) + \{x^*\}$. □
The following is now immediate from Theorem 4.5, since we are allowed to take $\varepsilon = 0$ if $\partial F(x)$ is a singleton.

Corollary 17.3. Let $X$ be a Banach space, $G : X \to \overline{\mathbb{R}}$, and $F : X \to \mathbb{R}$ be convex and Gâteaux differentiable at $x \in X$. Then for every $\delta \ge 0$,
$$\partial_\delta[G + F](x) = \partial_\delta G(x) + \{DF(x)\}.$$
In particular,
$$\partial_F[G + F](x) = \partial_F G(x) + \{DF(x)\}.$$
17.2 smooth spaces
For the remaining results in this chapter, we need additional assumptions on the normed vector space $X$. In particular, we need to assume that the norm is Gâteaux differentiable on $X \setminus \{0\}$; we call such spaces Gâteaux smooth.

Recalling from Chapter 7 the duality between differentiability and convexity, it is not surprising that this property can be related to the convexity of the dual norm. Here we need the following property: a normed vector space $X$ is called locally uniformly convex if for any $x \in X$ with $\|x\|_X = 1$ and all $\varepsilon \in (0, 2]$ there exists a $\delta(\varepsilon, x) > 0$ such that

(17.3) $\|\tfrac12(x + y)\|_X \le 1 - \delta(\varepsilon, x)$ for all $y \in X$ with $\|y\|_X = 1$ and $\|x - y\|_X \ge \varepsilon$.
Lemma 17.4. Let $X$ be a Banach space and $X^*$ be locally uniformly convex. Then $X$ is Gâteaux smooth.

Proof. Let $x \in X \setminus \{0\}$ be given. Since norms are convex, it suffices by Theorem 13.18 to show that $\partial\|\cdot\|_X(x)$ is a singleton. Let therefore $x_1^*, x_2^* \in \partial\|\cdot\|_X(x)$, i.e., satisfying by Theorem 4.6
$$\|x_1^*\|_{X^*} = \|x_2^*\|_{X^*} = 1, \qquad \langle x_1^*, x\rangle_X = \langle x_2^*, x\rangle_X = \|x\|_X.$$
This implies that
$$2 = \frac{1}{\|x\|_X}\bigl(\langle x_1^*, x\rangle_X + \langle x_2^*, x\rangle_X\bigr) = \Bigl\langle x_1^* + x_2^*, \frac{x}{\|x\|_X}\Bigr\rangle_X \le \|x_1^* + x_2^*\|_{X^*}$$
by (1.1) and hence that $\|\tfrac12(x_1^* + x_2^*)\|_{X^*} \ge 1$. Since $X^*$ is locally uniformly convex, this is only possible if $x_1^* = x_2^*$, as otherwise we could choose for $\varepsilon := \|x_1^* - x_2^*\|_{X^*} \in (0, 2]$ a $\delta(\varepsilon, x_1^*) > 0$ such that $\|\tfrac12(x_1^* + x_2^*)\|_{X^*} \le 1 - \delta(\varepsilon, x_1^*) < 1$. □
Remark 17.5. In fact, if $X$ is additionally reflexive, the norm is even continuously (Fréchet) differentiable; see [Schirotzek 2007, Proposition 4.7.10]. We will not need this stronger property, however. In addition, locally uniformly convex spaces always have the Radon-Riesz property; see [Schirotzek 2007, Lemma 4.7.9].
Example 17.6. The following spaces are locally uniformly convex:

(i) $X$ a Hilbert space. This follows from the parallelogram identity
$$\|\tfrac12(x + y)\|_X^2 = \tfrac12\|x\|_X^2 + \tfrac12\|y\|_X^2 - \tfrac14\|x - y\|_X^2 \quad\text{for all } x, y \in X,$$
which in fact characterizes precisely those norms that are induced by an inner product. This identity immediately yields for all $\varepsilon > 0$ and all $x, y \in X$ satisfying $\|x\|_X = \|y\|_X = 1$ and $\|x - y\|_X \ge \varepsilon$ that
$$\|\tfrac12(x + y)\|_X^2 \le 1 - \frac{\varepsilon^2}{4} \le \Bigl(1 - \frac{\varepsilon^2}{8}\Bigr)^2,$$
which in particular verifies (17.3) with $\delta := \frac{\varepsilon^2}{8}$.

(ii) $X = L^p(\Omega)$ for $p \in (2, \infty)$. This follows from the algebraic inequality
$$|a + b|^p + |a - b|^p \le 2^{p-1}\bigl(|a|^p + |b|^p\bigr) \quad\text{for all } a, b \in \mathbb{R},$$
see [Cioranescu 1990, Lemma II.4.1]. This implies that
$$\|\tfrac12(u + v)\|_{L^p(\Omega)}^p \le \tfrac12\|u\|_{L^p(\Omega)}^p + \tfrac12\|v\|_{L^p(\Omega)}^p - \tfrac1{2^p}\|u - v\|_{L^p(\Omega)}^p \quad\text{for all } u, v \in L^p(\Omega).$$
We can now argue exactly as in case (i).

(iii) $X = L^p(\Omega)$ for $p \in (1, 2)$. This follows from the algebraic inequality
$$|a + b|^q + |a - b|^q \le 2\bigl(|a|^p + |b|^p\bigr)^{1/(p-1)} \quad\text{for all } a, b \in \mathbb{R}, \text{ where } q := \tfrac{p}{p-1},$$
see [Cioranescu 1990, Lemma II.4.1], implying a similar inequality for the $L^p(\Omega)$ norms from which the claim follows as for (i) and (ii).

Hence every Hilbert space (by identifying $X$ with $X^*$) and every $L^p(\Omega)$ for $p \in (1, \infty)$ (identifying $L^p(\Omega)^*$ with $L^q(\Omega)$, $q = \frac{p}{p-1} \in (1, \infty)$) is Gâteaux smooth.
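The modulus $\delta(\varepsilon) = \varepsilon^2/8$ from case (i) can be verified by random sampling; the following sketch checks it for unit vectors in the Hilbert space $\mathbb{R}^3$ (the dimension, sample size, and seed are arbitrary choices):

```python
# Local uniform convexity of a Hilbert norm: if ||x|| = ||y|| = 1 and
# ||x - y|| >= eps, the parallelogram identity gives
# ||(x+y)/2||^2 = 1 - ||x-y||^2/4 <= 1 - eps^2/4 <= (1 - eps^2/8)^2.
import math
import random

def norm(v):
    return math.sqrt(sum(c * c for c in v))

random.seed(0)
eps = 0.5
checked = 0
for _ in range(1000):
    x = [random.gauss(0.0, 1.0) for _ in range(3)]
    y = [random.gauss(0.0, 1.0) for _ in range(3)]
    x = [c / norm(x) for c in x]   # normalize to the unit sphere
    y = [c / norm(y) for c in y]
    if norm([a - b for a, b in zip(x, y)]) >= eps:
        mid = norm([(a + b) / 2.0 for a, b in zip(x, y)])
        assert mid <= 1.0 - eps ** 2 / 8.0 + 1e-12   # delta(eps) = eps^2/8
        checked += 1
assert checked > 0   # the bound was actually exercised
```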
In fact, the celebrated Lindenstrauss and Trojanski renorming theorems show that every reflexive Banach space admits an equivalent norm such that the space (with that norm) becomes locally uniformly convex; see [Cioranescu 1990, Theorem III.2.10]. (Of course, even though that means that the dual space of the renormed space is Gâteaux smooth, this does not imply anything about the differentiability of the original norm, as the obvious example of $\mathbb{R}^N$ endowed with the 1- or the $\infty$-norm shows.) For many more details on smooth and uniformly convex spaces, see [Fabian et al. 2001; Schirotzek 2007; Cioranescu 1990].
Note that even in Gâteaux smooth spaces, the norm will not be differentiable at $x = 0$. But this can be addressed by considering $\|x\|_X^q$ for $q > 1$; for later use, we state this for $q = 2$.

Lemma 17.7. Let $X$ be a Gâteaux smooth Banach space and $F(x) := \|x\|_X^2$. Then $F$ is Gâteaux differentiable at any $x \in X$ with
$$DF(x) = 2\|x\|_X x^* \quad\text{for any } x^* \in X^* \text{ with } \|x^*\|_{X^*} = 1 \text{ and } \langle x^*, x\rangle_X = \|x\|_X.$$

Proof. Since norms are convex, we can apply Theorems 4.6 and 4.19 to obtain that
$$\partial F(x) = \bigl\{2\|x\|_X x^* \bigm| x^* \in X^* \text{ with } \|x^*\|_{X^*} = 1 \text{ and } \langle x^*, x\rangle_X = \|x\|_X\bigr\} \quad (x \in X).$$
At any $x \ne 0$, this set is a singleton by Theorem 4.5 and the assumption that $X$ is Gâteaux smooth. Clearly also $\partial F(0) = \{0\}$, and hence the claim follows from Theorem 13.18. □
Remark 17.8 (Asplund spaces). Asplund spaces are, by (one equivalent) denition, those Banachspaces where every continuous, convex, real-valued function is Frรฉchet-dierentiable on a denseset. (This is a limited version of Rademacherโs Theorem 13.25 in โ๐ .) We refer to [Yost 1993] for anintroduction to Asplund spaces. Importantly, reexive Banach spaces are Asplund.
The norm of an Asplund space is thus differentiable on a dense set D. It was shown in [Ekeland & Lebourg 1976] that perturbed optimization problems on Asplund spaces have solutions on a dense set of perturbation parameters and that the objective function is differentiable at such a solution. If we worked in the following sections with perturbed optimization problems and applied such an existence result instead of the Ekeland or the Borwein–Preiss variational principles (Theorem 2.14 or Theorem 2.15, respectively), we would be able to extend the following results to Asplund spaces.
17.3 fuzzy fermat principles
The following result generalizes the Fermat principle of Theorem 16.2 to sums of two functions in a "fuzzy" fashion. We will use it to show a fuzzy containment formula for ε-subdifferentials. Its generalizations to more than two functions can also be used to derive more advanced fuzzy sum rules than Lemma 17.2. Our focus is, however, on exact calculus, so we will not develop such generalizations.
Lemma 17.9 (fuzzy Fermat principle). Let X be a Gâteaux smooth Banach space and F, G : X → ℝ ∪ {∞}. If F + G attains a local minimum at a point x̄ ∈ X where F is lower semicontinuous and G is locally Lipschitz, then for any δ, μ > 0 we have

0 ∈ ⋃_{x,y ∈ 𝔹(x̄,δ)} (∂_F F(x) + ∂_F G(y)) + μ 𝔹_{X*}.
Proof. Let α > 0 be arbitrary. The idea is to separate the two nonsmooth functions F and G, and hence be able to use the exact sum rule of Corollary 17.3, by locally relaxing the problem min_{x∈X} (F + G) to

inf_{x,y∈X} J_α(x, y) ≔ F(x) + G(y) + α‖x − y‖²_X + ‖x − x̄‖²_X + δ_{𝔹(x̄,ρ)²}(x, y).

We take ρ > 0 small enough that x̄ minimizes F + G within 𝔹(x̄, ρ), and both F ≥ F(x̄) − 1 and G ≥ G(x̄) − 1 on 𝔹(x̄, ρ). The first requirement is possible by the assumption of F + G attaining its local minimum at x̄, while the latter follows from the lower semicontinuity of F and the local Lipschitz continuity of G. In the following, we denote by L the Lipschitz constant of G on 𝔹(x̄, ρ). It follows that J_α(x, y) ≥ F(x̄) + G(x̄) − 2 for all (x, y) ∈ 𝔹(x̄, ρ)² = dom J_α, and hence J_α is bounded from below.
We study the approximate solutions of the relaxed problem in several steps.
Step 1: constrained infimal values converge to J_α(x̄, x̄). Let x_α, y_α ∈ 𝔹(x̄, ρ) be such that

J_α(x_α, y_α) < j_α + α⁻¹, where j_α ≔ inf_{x,y∈X} J_α(x, y).

We show that

(17.4) J_α(x̄, x̄) < j_α + ε_α for ε_α ≔ L √((α⁻¹ + 2)/α) + α⁻¹.

To start with, we have

F(x̄) + G(x̄) + α⁻¹ = J_α(x̄, x̄) + α⁻¹ ≥ j_α + α⁻¹ > J_α(x_α, y_α) = F(x_α) + G(y_α) + α‖x_α − y_α‖²_X + ‖x_α − x̄‖²_X ≥ F(x̄) + G(x̄) + α‖x_α − y_α‖²_X + ‖x_α − x̄‖²_X − 2.

This implies that ‖x_α − y_α‖_X < √((α⁻¹ + 2)/α). Since x̄ minimizes F + G within 𝔹(x̄, ρ), we obtain the bound (17.4) through

J_α(x̄, x̄) = F(x̄) + G(x̄) ≤ F(x_α) + G(x_α) ≤ F(x_α) + G(y_α) + L‖x_α − y_α‖_X ≤ J_α(x_α, y_α) + L‖x_α − y_α‖_X < j_α + ε_α.
Step 2: exact unconstrained minimizers exist for a perturbed problem. By (17.4), we can apply the Borwein–Preiss variational principle (Theorem 2.15) for any λ, α > 0, small enough ρ > 0 (all to be fixed later), and p = 2 to obtain a sequence {μ_i}_{i≥0} of nonnegative weights summing to 1 and a sequence {(x^i, y^i)}_{i≥0} ⊂ X × X with (x⁰, y⁰) = (x̄, x̄) converging strongly to some (x̃_α, ỹ_α) ∈ X × X (endowed with the Euclidean product norm) such that
(i) ‖x^i − x̃_α‖²_X + ‖y^i − ỹ_α‖²_X ≤ λ² for all i ≥ 0 (in particular, ‖x̄ − x̃_α‖_X ≤ λ);

(ii) the function

H_α(x, y) ≔ J_α(x, y) + (ε_α/λ²) Σ_{i=0}^∞ μ_i (‖x − x^i‖² + ‖y − y^i‖²)

attains its global minimum at (x̃_α, ỹ_α).

Note that since J_α includes the constraint (x, y) ∈ 𝔹(x̄, ρ)², we have (x̃_α, ỹ_α) ∈ 𝔹(x̄, ρ)². In fact, by taking λ ∈ (0, ρ), it follows from (i) and the convergence (x^i, y^i) → (x̃_α, ỹ_α) that the minimizer (x̃_α, ỹ_α) ∈ 𝔹(x̄, λ)² ⊂ int 𝔹(x̄, ρ)² is unconstrained.

Step 3: the perturbed minimizers satisfy the claim for large α and small λ. Setting Ψ_y(x) ≔ ‖x − y‖²_X, it follows from Lemma 17.7 that Ψ_y is Gâteaux differentiable for any y ∈ X with DΨ_y(x) ∈ 2‖x − y‖_X 𝔹_{X*}. Furthermore, since (x̃_α, ỹ_α) ∈ int 𝔹(x̄, ρ)², we have ∂δ_{𝔹(x̄,ρ)²}(x̃_α, ỹ_α) = {(0, 0)}. Hence the only nonsmooth component of H_α at (x̃_α, ỹ_α) is (x, y) ↦ F(x) + G(y). We can thus apply Theorem 16.2 and Corollary 17.3 to obtain

0 ∈ ∂_F H_α(x̃_α, ỹ_α) = (∂_F F(x̃_α) + α DΨ_{ỹ_α}(x̃_α) + DΨ_{x̄}(x̃_α) + (ε_α/λ²) Σ_{i=0}^∞ μ_i DΨ_{x^i}(x̃_α)) × (∂_F G(ỹ_α) + α DΨ_{x̃_α}(ỹ_α) + (ε_α/λ²) Σ_{i=0}^∞ μ_i DΨ_{y^i}(ỹ_α)).
By (i) we have ‖x̃_α − x^i‖_X, ‖ỹ_α − y^i‖_X ≤ λ for all i ≥ 0. In addition, Σ_{i=0}^∞ μ_i = 1, and thus (ε_α/λ²) Σ_{i=0}^∞ μ_i DΨ_{x^i}(x̃_α) ∈ (2ε_α/λ) 𝔹_{X*}, and likewise for the terms involving DΨ_{y^i} (so that in fact we were justified in differentiating the series term-wise). By (i) also ‖x̃_α − x̄‖_X ≤ λ, so that DΨ_{x̄}(x̃_α) ∈ 2λ 𝔹_{X*}. Finally, since −x* ∈ ∂‖·‖_X(−x) for any x* ∈ ∂‖·‖_X(x) and any x ∈ X, we have DΨ_y(x) = −DΨ_x(y) for all x, y ∈ X. We thus have

−α DΨ_{ỹ_α}(x̃_α) ∈ ∂_F F(x̃_α) + (2λ + 2ε_α/λ) 𝔹_{X*},

α DΨ_{ỹ_α}(x̃_α) ∈ ∂_F G(ỹ_α) + (2ε_α/λ) 𝔹_{X*},

which implies that

0 ∈ ∂_F F(x̃_α) + ∂_F G(ỹ_α) + (2λ + 4ε_α/λ) 𝔹_{X*}.

Since (x̃_α, ỹ_α) ∈ 𝔹(x̄, λ)², the claim now follows by taking λ ∈ (0, min{ρ, δ}) small enough and then α > 0 large (and thus ε_α small) enough. ∎
Remark 17.10 (fuzzy Fermat principles and trustworthy subdifferentials). Lemma 17.9 is due to [Fabian 1988]. Such fuzzy Fermat principles are studied in more detail from the point of view of fuzzy variational principles in [Ioffe 2017]. Specifically, the claim of Lemma 17.9 has to hold for an arbitrary subdifferential operator ∂* for it to be trustworthy, whereas the opposite inclusion ∂*G(x̄) + ∂*F(x̄) ⊂ ∂*[G + F](x̄) is required for the subdifferential to be elementary.
Remark 17.11 (notes on the proof of Lemma 17.9). Note how we had to apply the Borwein–Preiss variational principle instead of Ekeland's to obtain a differentiable convex perturbation and thus to be able to apply the sum rule of Corollary 17.3. In contrast, the proof in [Ioffe 2017] is based on the Deville–Godefroy–Zizler variational principle, which makes no convexity assumption on the perturbation function and hence requires the stronger property of Fréchet smoothness (i.e., Fréchet instead of Gâteaux differentiability of the norm outside the origin).
Finally, with an additional argument showing J_α(x̃_α, ỹ_α) ≤ j_α + β_α for a suitable β_α, it would be possible to further constrain |F(x) − F(x̄)| ≤ δ in the claim of Lemma 17.9, as is done in [Ioffe 2017, Theorem 4.30].
Corollary 17.12. Let X be a Gâteaux smooth Banach space, let F : X → ℝ ∪ {∞} be lower semicontinuous near x̄ ∈ X, and let ε > 0. Then for any δ > 0 and ε′ > ε we have

∂_ε F(x̄) ⊂ ⋃_{z ∈ 𝔹(x̄,δ)} ∂_F F(z) + ε′ 𝔹_{X*}.
Proof. We may assume that x̄ ∈ dom F, in particular that there exists some x* ∈ ∂_ε F(x̄), i.e., such that

lim inf_{x̄ ≠ y → x̄} (F(y) − F(x̄) − ⟨x*, y − x̄⟩_X)/‖y − x̄‖_X ≥ −ε.

Taking any ε′ > ε and defining

F̃(x) ≔ F(x) − ⟨x*, x − x̄⟩_X and G(x) ≔ ε′‖x − x̄‖_X,

we obtain as in Lemma 17.1 that

lim inf_{x̄ ≠ y → x̄} ((G + F̃)(y) − (G + F̃)(x̄))/‖y − x̄‖_X ≥ ε′ − ε > 0.

Thus F̃ + G attains a local minimum at x̄. The function G is convex and Lipschitz, while F̃ is lower semicontinuous. Hence Lemma 17.9 implies for any δ > 0 and μ′ > 0 that

0 ∈ ⋃_{z,y ∈ 𝔹(x̄,δ)} (∂_F F̃(y) + ∂_F G(z)) + μ′ 𝔹_{X*}.

Since ∂_F F̃(y) = ∂_F F(y) − {x*} (by Corollary 17.3 or directly from the definition) and ∂_F G(z) = ∂G(z) ⊂ ε′ 𝔹_{X*}, we obtain

x* ∈ ⋃_{z ∈ 𝔹(x̄,δ)} ∂_F F(z) + (μ′ + ε′) 𝔹_{X*}.

Since μ′ > 0 and ε′ > ε were arbitrary, the claim follows. ∎
17.4 approximate fermat principles and projections
We now introduce an approximate Fermat principle, which can be invoked when we do not know whether a minimizer exists; in particular, when F fails to be weakly lower semicontinuous, so that Theorem 2.1 is not applicable.
Theorem 17.13. Let X be a Banach space and F : X → ℝ ∪ {∞} be proper, lower semicontinuous, and bounded from below. Then for every ε, δ > 0 there exists an x_ε ∈ X such that

(i) F(x_ε) ≤ inf_{x∈X} F(x) + ε;

(ii) F(x_ε) < F(x) + δ‖x − x_ε‖_X for all x ≠ x_ε;

(iii) 0 ∈ ∂_δ F(x_ε).
Proof. Since F is bounded from below, inf_{x∈X} F(x) > −∞. We can thus take a minimizing sequence {x^n}_{n∈ℕ} with F(x^n) → inf_{x∈X} F(x) and find an n(ε) ∈ ℕ such that x̃_ε ≔ x^{n(ε)} satisfies (i). Ekeland's variational principle (Theorem 2.14) thus yields for λ ≔ ε/δ an x_ε ≔ x_{ε,λ} such that ‖x_ε − x̃_ε‖_X ≤ λ,

F(x_ε) ≤ F(x_ε) + (ε/λ)‖x_ε − x̃_ε‖_X ≤ F(x̃_ε),

as well as

F(x_ε) < F(x) + (ε/λ)‖x_ε − x‖_X (x ≠ x_ε).

Thus (i) as well as (ii) hold, since ε/λ = δ. The latter implies for all x ≠ x_ε that

(F(x) − F(x_ε) − ⟨0, x − x_ε⟩_X)/‖x − x_ε‖_X ≥ −δ,

i.e., 0 ∈ ∂_δ F(x_ε) by definition. ∎
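The approximate Fermat principle can be illustrated numerically. A minimal sketch (our own construction, not from the text): F(x) = exp(−x) on ℝ is proper, lower semicontinuous, and bounded from below with inf F = 0, but has no minimizer; a grid search still finds a point x_ε satisfying (i) and the Ekeland-type condition (ii), with both conditions checked only on the grid.

```python
import math

def F(x):
    """Bounded below (inf F = 0) but without a minimizer."""
    return math.exp(-x)

def approximate_fermat(F, eps, delta, grid):
    """Search the grid for x_eps with F(x_eps) <= inf_grid F + eps and
    F(x_eps) < F(x) + delta*|x - x_eps| for all other grid points x."""
    inf_F = min(F(x) for x in grid)
    for xe in grid:
        if F(xe) > inf_F + eps:
            continue  # violates (i)
        if all(F(xe) < F(x) + delta * abs(x - xe) for x in grid if x != xe):
            return xe  # satisfies both (i) and (ii) on the grid
    return None

grid = [k * 0.01 for k in range(0, 2001)]  # sample of [0, 20]
x_eps = approximate_fermat(F, eps=1e-3, delta=1e-2, grid=grid)
assert x_eps is not None
```

Here x_ε sits where the slope of F has decayed below δ, mirroring the role of Ekeland's principle in the proof.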
As an example of a possible application of approximate Fermat principles, we use them to prove the following result on projections and approximate projections onto a nonconvex set C ⊂ X. For nonconvex sets, even the exact projection need no longer be unique; furthermore, for the reasons discussed before Theorem 17.13, the set of projections P_C(x) may be empty when C ≠ ∅ is closed but not weakly closed. We recall that by Lemma 1.10, convex closed sets are weakly closed, as are, of course, finite-dimensional closed sets. More generally, however, weak closedness can be elusive. Hence we will need to perform approximate projections in Part IV. It is not surprising that this requires additional assumptions on the containing space to make up for the lack of weak closedness.
Theorem 17.14. Let X be a Gâteaux smooth Banach space and let C ⊂ X be nonempty and closed. Define the (possibly multi-valued) projection

P_C : X ⇒ X, P_C(x) ≔ argmin_{x̃∈C} ‖x̃ − x‖_X,

and the corresponding distance function

d_C : X → ℝ, d_C(x) ≔ inf_{x̃∈C} ‖x̃ − x‖_X.

Then the following hold:

(i) For any x̄ ∈ P_C(x), there exists an x* ∈ ∂_F δ_C(x̄) such that

(17.5) ⟨x*, x − x̄⟩_X = ‖x − x̄‖_X, ‖x*‖_{X*} ≤ 1.

(ii) For any ε > 0, there exists an approximate projection x_ε ∈ C satisfying

‖x_ε − x‖_X ≤ d_C(x) + ε

as well as (17.5) for some x* ∈ ∂_ε δ_C(x_ε).

(iii) If X is a Hilbert space, then x − x̄ ∈ ∂_ε δ_C(x̄) for all ε ≥ 0.
Proof. (i): We may assume x ∉ C, since otherwise x* ≔ 0 ∈ ∂_F δ_C(x̄) for x̄ = x ∈ C by the definition of the Fréchet subdifferential. Set F(x̃) ≔ ‖x̃ − x‖_X and assume that x̄ ∈ P_C(x). The Fermat principle (Theorem 16.2) then yields 0 ∈ ∂_F [δ_C + F](x̄). Since x̄ ∈ C and x ∉ C, by assumption F is differentiable at x̄. Thus Theorem 4.5 shows that ∂F(x̄) = {DF(x̄)} is a singleton. The sum rule of Corollary 17.3 then yields that x* ≔ −DF(x̄) ∈ ∂_F δ_C(x̄). The claim (17.5) now follows from Theorem 4.6.

(ii): Compared to (i), we merely invoke the approximate Fermat principle of Theorem 17.13 in place of Theorem 16.2, which establishes the existence of x_ε ∈ C satisfying ‖x_ε − x‖_X ≤ d_C(x) + ε and 0 ∈ ∂_ε [δ_C + F](x_ε). The sum rule of Lemma 17.2 then shows that x* ≔ −DF(x_ε) ∈ ∂_ε δ_C(x_ε).

(iii): In a Hilbert space, we can identify −DF(x̄) with the corresponding gradient −∇F(x̄) = (x − x̄)/‖x − x̄‖_X ∈ X for x̄ ≠ x (otherwise −∇F(x̄) = 0 = x − x̄). Since ∂_ε δ_C(x̄) is a cone, this implies that x − x̄ ∈ ∂_ε δ_C(x̄) as well. ∎
In the next chapters, we will see that ∂_F δ_C(x̄) coincides with a suitable normal cone to C at x̄. In other words, x* is a normal vector to the set C. In Hilbert spaces, this normal vector can be identified with the (normalized) vector pointing from x̄ to x.
Part IV
SET-VALUED ANALYSIS
18 TANGENT AND NORMAL CONES
We now start our study of the stability properties of solutions to nonsmooth optimization problems. As we have characterized the latter via subdifferential inclusions, we need to study the sensitivity of such relations to perturbations. As in the smooth case, this can be done through derivatives of these conditions with respect to relevant parameters; however, these conditions are expressed as inclusions instead of simple equations. Hence we require notions of derivatives for set-valued mappings.
To motivate how we will develop differential calculus for set-valued mappings, recall from Lemma 4.10 how the subdifferential of a convex function F can be defined in terms of the normal cone to the epigraph of F. This idea forms the basis of differentiating general set-valued mappings H : X ⇒ Y, where instead of taking the normal cone to epi F at (x, F(x)), we do so at any point (x, y) of graph H ≔ {(x, y) ∈ X × Y | y ∈ H(x)}. Since we are generally not in the nice convex setting (even for a convex function F, the set graph ∂F is not convex unless F is linear), there are some complications, which result in having to deal with various nonequivalent definitions. In this chapter, we introduce the relevant graphical notions of tangent and normal cones. In Chapter 19, we develop specific expressions for these cones for sets in L^p(Ω) defined pointwise via finite-dimensional sets. In the following Chapters 20 to 25, we then define and further develop notions of differentiation of set-valued mappings based on these cones.
18.1 definitions and examples
the fundamental cones
Our first type of tangent cone is defined using roughly the same limiting process on difference quotients as basic directional derivatives. Let X be a Banach space. We define the tangent cone (or Bouligand or contingent cone) of the set C ⊂ X at x̄ ∈ X as

(18.1) T_C(x̄) ≔ lim sup_{τ↘0} (C − x̄)/τ = {Δx ∈ X | Δx = lim_{k→∞} (x_k − x̄)/τ_k for some C ∋ x_k → x̄, τ_k ↘ 0},
i.e., the tangent cone is the outer limit (in the sense of Section 6.1) of the "blown-up" sets (C − x̄)/τ as τ ↘ 0.
The tangent cone is closely related to the Fréchet normal cone, which is based on the same limiting process as the Fréchet subdifferential in Chapter 16:

(18.2) N̂_C(x̄) ≔ {x* ∈ X* | lim sup_{C∋x→x̄} ⟨x*, x − x̄⟩_X/‖x − x̄‖_X ≤ 0}.
limiting cones in finite dimensions
One difficulty with the Fréchet normal cone is that it is not outer semicontinuous. By taking its outer limit (in the sense of set-valued mappings), we obtain the less "irregular" (basic or limiting or Mordukhovich) normal cone. This definition is somewhat more involved in infinite dimensions, so we first consider C ⊂ ℝ^N at x̄ ∈ ℝ^N. In this case, the limiting normal cone is defined as

(18.3) N_C(x̄) ≔ lim sup_{C∋x→x̄} N̂_C(x) = {x* ∈ ℝ^N | x* = lim_{k→∞} x*_k for some x*_k ∈ N̂_C(x_k), C ∋ x_k → x̄}.
Despite N_C being obtained by the outer semicontinuous regularization of N̂_C, it is the latter that is sometimes called the regular normal cone in the literature. We stick to the convention of calling N̂_C the Fréchet normal cone and N_C the limiting normal cone.
The limiting variant of the tangent cone is the Clarke tangent cone (also known as the regular tangent cone), defined for a set C ⊂ ℝ^N at x̄ ∈ ℝ^N as the inner limit

(18.4) T̂_C(x̄) ≔ lim inf_{C∋x→x̄, τ↘0} (C − x)/τ = {Δx ∈ ℝ^N | for all τ_k ↘ 0, C ∋ x_k → x̄ there exist C ∋ x̃_k → x̄ with (x̃_k − x_k)/τ_k → Δx}.
We will later see in Corollary 18.20 that for a closed set C ⊂ ℝ^N, we in fact have T̂_C(x̄) = lim inf_{C∋x→x̄} T_C(x). The following example as well as Figure 18.1 illustrate the different cones.
Example 18.1. We compute the different tangent and normal cones at all points x̄ ∈ C for different sets C ⊂ ℝ².
[Figure 18.1 shows four panels at the base point x̄ of a set C: (a) tangent cone T_C(x̄); (b) Clarke tangent cone T̂_C(x̄); (c) Fréchet normal cone N̂_C(x̄) = {0}; (d) limiting normal cone N_C(x̄).]

Figure 18.1: Illustration of the different normal and tangent cones at a nonregular point of a set C. The dot indicates the base point x̄. The thick arrows and dark filled-in areas indicate the directions included in the cones.
(i) C = 𝔹(0, 1): Clearly, if x̄ ∈ int C, then

N̂_C(x̄) = N_C(x̄) = {0}, T_C(x̄) = T̂_C(x̄) = ℝ².

For any x̄ ∈ bd C, on the other hand,

N̂_C(x̄) = N_C(x̄) = [0, ∞)x̄ ≔ {tx̄ | t ≥ 0}, T_C(x̄) = T̂_C(x̄) = {z | ⟨z, x̄⟩ ≤ 0}.

(ii) C = [0, 1]²: For x̄ ∈ int C, we again have N̂_C(x̄) = N_C(x̄) = {0} and T_C(x̄) = T̂_C(x̄) = ℝ²; similarly, for x̄ ∈ bd C \ {(0, 0), (0, 1), (1, 0), (1, 1)} (i.e., x̄ is not one of the corners of C), again N̂_C(x̄) = N_C(x̄) = [0, ∞)ν(x̄) for ν(x̄) the outward unit normal of the edge containing x̄, and T_C(x̄) = T̂_C(x̄) = {z | ⟨z, ν(x̄)⟩ ≤ 0}. Of the corners, we concentrate on x̄ = (1, 1), the others being analogous. Then

N̂_C(x̄) = N_C(x̄) = {(Δx, Δy) | Δx, Δy ≥ 0}, T_C(x̄) = T̂_C(x̄) = {(Δx, Δy) | Δx, Δy ≤ 0}.

(iii) C = [0, 1]² \ (1/2, 1]²: Here as well N̂_C(x̄) = N_C(x̄) = {0} and T_C(x̄) = T̂_C(x̄) = ℝ² for x̄ ∈ int C. Other points on bd C are computed analogously to similar corners and edges of the square [0, 1]², but we have to be careful with the "interior corner" x̄ = (1/2, 1/2). Here, similarly to Figure 18.1c, we see that N̂_C(x̄) = {0}. However, as a lim sup,

N_C(x̄) = (0, 1)[0, ∞) ∪ (1, 0)[0, ∞).
For the tangent cones, we then get

T_C(x̄) = {(Δx, Δy) | Δx ≤ 0 or Δy ≤ 0},

while, as a lim inf,

T̂_C(x̄) = {(Δx, Δy) | Δx ≤ 0 and Δy ≤ 0}.
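The cone computations in Example 18.1 can be sanity-checked numerically. For a convex set C, a direction d lies in T_C(x̄) exactly when dist(x̄ + τd, C)/τ → 0 as τ ↘ 0; a rough sketch for the corner x̄ = (1, 1) of C = [0, 1]² from part (ii), with helper names of our own:

```python
import numpy as np

def dist_to_box(p):
    """Distance to C = [0,1]^2; the projection is componentwise clipping."""
    return float(np.linalg.norm(p - np.clip(p, 0.0, 1.0)))

def tangent_quotient(xbar, d, tau):
    """Difference quotient dist(xbar + tau*d, C)/tau from definition (18.1);
    for convex C it tends to 0 iff d lies in the tangent cone T_C(xbar)."""
    xbar, d = np.asarray(xbar, float), np.asarray(d, float)
    return dist_to_box(xbar + tau * d) / tau

xbar = [1.0, 1.0]  # corner of the square
# (-1,-1) points back into C: the quotient vanishes, so it is tangent.
assert tangent_quotient(xbar, [-1.0, -1.0], 1e-4) < 1e-12
# (1,1) points out of C: the quotient tends to |(1,1)|, so it is not.
assert abs(tangent_quotient(xbar, [1.0, 1.0], 1e-4) - np.sqrt(2)) < 1e-9
print("corner (1,1): T_C contains (-1,-1) but not (1,1)")
```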
limiting cones in infinite dimensions
Let now X again be a Banach space. Although the fundamental cones, i.e., the (basic) tangent cone and the Fréchet normal cone, were defined based on strongly convergent sequences, in infinite-dimensional spaces weak modes of convergence better replicate various relationships between the different cones. We thus call an element Δx ∈ X weakly tangent to C at x̄ if

(18.5) Δx = w-lim_{k→∞} (x_k − x̄)/τ_k for some C ∋ x_k → x̄, τ_k ↘ 0,
where the w-lim of course stands for the weak convergence τ_k⁻¹(x_k − x̄) ⇀ Δx. We denote by T^w_C(x̄) ⊂ X the weak tangent cone (or weak contingent cone) of all such Δx. Using the notion of outer limits of set-valued mappings from Chapter 6, we can also write

(18.6) T^w_C(x̄) = w-lim sup_{τ↘0} (C − x̄)/τ.
Likewise, the limiting normal cone N_C(x̄) to C ⊂ X in a general infinite-dimensional Banach space X is based on weak-∗ limits. Moreover, several proofs will be easier if we slightly relax the definition. Therefore, given ε ≥ 0, we first introduce the ε-normal cone

(18.7) N̂^ε_C(x̄) ≔ {x* ∈ X* | lim sup_{C∋x→x̄} ⟨x*, x − x̄⟩_X/‖x − x̄‖_X ≤ ε}.

The Fréchet normal cone is then simply N̂_C(x̄) ≔ N̂⁰_C(x̄).
Now, the (basic or limiting or Mordukhovich) normal cone is defined as

(18.8) N_C(x̄) ≔ w-∗-lim sup_{x→x̄, ε↘0} N̂^ε_C(x).

In other words, x* ∈ N_C(x̄) if and only if there exist C ∋ x_k → x̄, ε_k ↘ 0, and x*_k ∈ N̂^{ε_k}_C(x_k) such that x*_k ⇀∗ x*.
In Gâteaux smooth Banach spaces, we can fix ε ≡ 0 in (18.8). Thus such spaces can be treated similarly to the finite-dimensional case in (18.3).
Theorem 18.2. Let X be a Gâteaux smooth Banach space, C ⊂ X, and x̄ ∈ X. Then

(18.9) N_C(x̄) = w-∗-lim sup_{x→x̄} N̂_C(x).

Proof. Denote by K the set on the right-hand side of (18.9). Then by the definition (18.8), clearly N_C(x̄) ⊃ K. To show N_C(x̄) ⊂ K, let x* ∈ N_C(x̄). Then (18.8) yields x_k → x̄, ε_k ↘ 0, and x*_k ⇀∗ x* with x*_k ∈ N̂^{ε_k}_C(x_k). We need to show that there exist some x̃_k → x̄ and x̃*_k ⇀∗ x* with x̃*_k ∈ N̂_C(x̃_k). Indeed, since N̂^ε_C = ∂_ε δ_C, by Corollary 17.12 applied to F = δ_C, we have for any sequence δ_k ↘ 0 with δ_k > ε_k that

x*_k ∈ N̂^{ε_k}_C(x_k) ⊂ ⋃_{x ∈ 𝔹(x_k, δ_k)} N̂_C(x) + δ_k 𝔹_{X*} (k ∈ ℕ).

In particular, there exist x̃_k ∈ 𝔹(x_k, δ_k) and x̃*_k ∈ N̂_C(x̃_k) ∩ 𝔹(x*_k, δ_k), which implies that x̃_k → x̄ and x̃*_k ⇀∗ x* as desired. ∎
Remark 18.3. Theorem 18.2 can be extended to Asplund spaces, in particular to reflexive Banach spaces. The equivalence of (18.9) and (18.8) can, in fact, be used as a definition of an Asplund space. For details we refer to [Mordukhovich 2006, Theorem 2.35].
Finally, the Clarke tangent cone is defined as in finite dimensions as the inner limit

(18.10) T̂_C(x̄) ≔ lim inf_{C∋x→x̄, τ↘0} (C − x)/τ = {Δx ∈ X | for all τ_k ↘ 0, C ∋ x_k → x̄ there exist C ∋ x̃_k → x̄ with (x̃_k − x_k)/τ_k → Δx}.
In infinite-dimensional spaces, however, we in general only have the inclusion T̂_C(x̄) ⊂ lim inf_{C∋x→x̄} T_C(x); see Corollary 18.20.
Remark 18.4 (a much too brief history of various cones). The (Bouligand) tangent cone was already introduced for smooth sets by Peano in 1908 [Peano 1908]; the term contingent cone is due to Bouligand [Bouligand 1930]. The Clarke tangent cone (also called the circatangent cone) was introduced in [Clarke 1973; Clarke 1975]; see also [Clarke 1990]. The limiting normal cone can be found in [Mordukhovich 1976], who stressed the need of defining (nonconvex) normal cones directly rather than as (necessarily convex) polars of tangent cones. The history of the Fréchet normal cone is harder to trace, but it has appeared in the literature as the polar of the tangent cone. We will see that in finite dimensions, N̂_C(x̄) = T_C(x̄)°. In infinite dimensions, T_C(x̄)° is sometimes called the Dini normal cone and is in general not equal to the Fréchet normal cone.
We do not attempt to do full justice to the muddier parts of the historical development here, and rather refer to the accounts in [Dolecki & Greco 2011; Bigolin & Golo 2014] as well as [Rockafellar & Wets 1998, Commentary to Ch. 6] and [Mordukhovich 2018, Commentary to Ch. 1]. Various further cones are also discussed in [Aubin & Frankowska 1990].
18.2 basic relationships and properties
As seen in Example 18.1, the limiting normal cone N_C(x̄) can be larger than the Fréchet normal cone N̂_C(x̄); conversely, the Clarke tangent cone T̂_C(x̄) is smaller than the tangent cone T_C(x̄); see Figure 18.1. These inclusions hold in general.
Theorem 18.5. Let C ⊂ X and x̄ ∈ X. Then

(i) T̂_C(x̄) ⊂ T_C(x̄) ⊂ T^w_C(x̄);

(ii) N̂_C(x̄) ⊂ N_C(x̄).
Proof. If we fix the base point x as x̄ in the definition (18.10) of T̂_C(x̄), the tangent inclusion T̂_C(x̄) ⊂ T_C(x̄) is clear from the definition (18.1) of T_C(x̄) as an outer limit and of T̂_C(x̄) as an inner limit. The inclusion T_C(x̄) ⊂ T^w_C(x̄) is likewise clear from the definition of T_C(x̄) as a strong outer limit and of T^w_C(x̄) as the corresponding weak outer limit.

The normal inclusion N̂_C(x̄) ⊂ N_C(x̄) follows from the definition (18.8) of N_C(x̄) as the outer limit of N̂^ε_C(x) as x → x̄ and ε ↘ 0. (In finite dimensions, we can fix ε = 0 in this argument or refer to the equivalence of definitions shown in Theorem 18.2.) ∎
For a closed and convex set C, however, both the Fréchet and the limiting normal cone coincide with the convex normal cone defined in Lemma 4.8 (which we here denote by ∂δ_C(x̄) to avoid confusion).
Lemma 18.6. Let C ⊂ X be nonempty, closed, and convex. Then for all x̄ ∈ X,

(i) N̂_C(x̄) = ∂δ_C(x̄);

(ii) if X is Gâteaux smooth (in particular, finite-dimensional), N_C(x̄) = ∂δ_C(x̄).
Proof. If x̄ ∉ C, it follows from their definitions that all three cones are empty. We can thus assume that x̄ ∈ C.

(i): If x* ∈ ∂δ_C(x̄), we have by definition that

⟨x*, y − x̄⟩_X ≤ 0 for all y ∈ C.

Taking in particular y = x and passing to the limit C ∋ x → x̄ thus implies that x* ∈ N̂_C(x̄). Conversely, let x* ∈ N̂_C(x̄) and let y ∈ C be arbitrary. Since C is convex, x_t ≔ x̄ + t(y − x̄) ∈ C for any t ∈ (0, 1) as well. We also have x_t → x̄ for t → 0. From (18.2), it then follows by inserting the definition of x_t and dividing by t > 0 that

0 ≥ lim_{t→0} ⟨x*, x_t − x̄⟩_X/‖x_t − x̄‖_X = ⟨x*, y − x̄⟩_X/‖y − x̄‖_X,
and hence, since y ∈ C was arbitrary, that x* ∈ ∂δ_C(x̄).

(ii): By Lemmas 2.5 and 6.8 and Theorem 6.11, ∂δ_C is strong-to-weak-∗ outer semicontinuous, which by Theorem 18.5 and the ε ≡ 0 characterization of Theorem 18.2 implies that

N_C(x̄) = w-∗-lim sup_{x→x̄} N̂_C(x) = w-∗-lim sup_{x→x̄} ∂δ_C(x) ⊂ ∂δ_C(x̄) = N̂_C(x̄) ⊂ N_C(x̄).

Hence N_C(x̄) = ∂δ_C(x̄). ∎
Note that convexity was only used for the second inclusion, and hence ∂δ_C(x̄) ⊂ N̂_C(x̄) always holds. In general, comparing (18.2) with (16.2), we have the following relation.
Corollary 18.7. Let C ⊂ X and x̄ ∈ X. Then N̂_C(x̄) = ∂_F δ_C(x̄).
The next theorem lists some of the most basic properties of the various tangent and normalcones.
Theorem 18.8. Let C ⊂ X and x̄ ∈ X. Then

(i) T_C(x̄), T̂_C(x̄), N̂_C(x̄), and N_C(x̄) are cones;

(ii) T_C(x̄), T̂_C(x̄), and N̂^ε_C(x̄) for every ε ≥ 0 are closed;

(iii) T̂_C(x̄) and N̂^ε_C(x̄) for every ε ≥ 0 are convex;

(iv) if X is finite-dimensional, then N_C(x̄) is closed.
Proof. We argue the different properties for each type of cone in turn.
The Fréchet (ε-)normal cone: It is clear from the definition of N̂_C(x̄) that it is a cone, i.e., that x* ∈ N̂_C(x̄) implies λx* ∈ N̂_C(x̄) for all λ > 0.

Let now ε ≥ 0 be arbitrary, and let x*_k ∈ N̂^ε_C(x̄) converge to some x* ∈ X*. Also suppose C ∋ x_ℓ → x̄. Then for any ℓ, k ∈ ℕ, we have by the Cauchy–Schwarz inequality that

⟨x*, x_ℓ − x̄⟩_X/‖x_ℓ − x̄‖_X ≤ ⟨x*_k, x_ℓ − x̄⟩_X/‖x_ℓ − x̄‖_X + ‖x*_k − x*‖_{X*}

and thus that

lim sup_{ℓ→∞} ⟨x*, x_ℓ − x̄⟩_X/‖x_ℓ − x̄‖_X ≤ ε + ‖x*_k − x*‖_{X*}.

Since k ∈ ℕ was arbitrary and x*_k → x*, we see that x* ∈ N̂^ε_C(x̄) and may conclude that N̂^ε_C(x̄) is closed.
To show convexity, take x*_1, x*_2 ∈ N̂^ε_C(x̄) and let x* ≔ λx*_1 + (1 − λ)x*_2 for some λ ∈ (0, 1). We then have

⟨x*, x − x̄⟩_X/‖x − x̄‖_X = λ ⟨x*_1, x − x̄⟩_X/‖x − x̄‖_X + (1 − λ) ⟨x*_2, x − x̄⟩_X/‖x − x̄‖_X.

Taking the lim sup as C ∋ x → x̄ now yields x* ∈ N̂^ε_C(x̄) and hence the convexity.

The limiting normal cone: If X is finite-dimensional, the set N_C(x̄) is a closed cone as the strong outer limit of the (closed) cones N̂_C(x) as x → x̄; see Lemma 6.2.

The tangent cone: By Lemma 6.2, T_C(x̄) is closed as the outer limit of the sets C_τ ≔ (C − x̄)/τ as τ ↘ 0. To see that it is a cone, suppose Δx ∈ T_C(x̄). Then there exist by definition τ_k ↘ 0 and C ∋ x_k → x̄ such that (x_k − x̄)/τ_k → Δx. Now, for any λ > 0, taking τ̃_k ≔ λ⁻¹τ_k, we have (x_k − x̄)/τ̃_k → λΔx. Hence λΔx ∈ T_C(x̄).

The Clarke tangent cone: Finally, T̂_C(x̄) is a closed set through its definition as an inner limit, cf. Corollary 6.3, as well as a cone by arguments analogous to those for T_C(x̄). To see that it is convex, take Δx¹, Δx² ∈ T̂_C(x̄). Since T̂_C(x̄) is a cone, we only need to show that Δx ≔ Δx¹ + Δx² ∈ T̂_C(x̄). By the definition of T̂_C(x̄) as an inner limit, we therefore have to show that for any sequence τ_k ↘ 0 and any "base point sequence" C ∋ x_k → x̄, there exist x̃_k ∈ C such that (x̃_k − x_k)/τ_k → Δx. We do this by using the varying base point in the definition of T̂_C(x̄) to "bridge" between the sequences generating Δx¹ and Δx²; see Figure 18.2. First, since Δx¹ ∈ T̂_C(x̄), by the very same definition of T̂_C(x̄) as an inner limit, we can find for the base point sequence {x_k}_{k∈ℕ} points C ∋ x¹_k with (x¹_k − x_k)/τ_k → Δx¹. Continuing in the same way, since Δx² ∈ T̂_C(x̄), we can now find with {x¹_k}_{k∈ℕ} as the base point sequence points x²_k ∈ C such that (x²_k − x¹_k)/τ_k → Δx². It follows that

(x²_k − x_k)/τ_k = (x²_k − x¹_k)/τ_k + (x¹_k − x_k)/τ_k → Δx¹ + Δx² = Δx.

Thus {x̃_k}_{k∈ℕ} = {x²_k}_{k∈ℕ} is the sequence we are looking for, showing that Δx ∈ T̂_C(x̄) and hence that the Clarke tangent cone is convex. ∎
One might expect T^w_C(x̄) to be weakly closed and N_C(x̄) to be weak-∗ closed. However, this is not necessarily the case, since weak and weak-∗ inner and outer limits need not be closed in the respective topologies. Consequently, N_C may also not be (strong-to-weak-∗) outer semicontinuous at a point x̄, as this would imply N_C(x̄) to be weak-∗ closed and hence closed. However, in finite dimensions we do have outer semicontinuity.
Corollary 18.9. If X is finite-dimensional, then the mapping x ↦ N_C(x) is outer semicontinuous.
[Figure 18.2]

Figure 18.2: Illustration of the "bridging" argument in the proof of Theorem 18.8. As x_k converges to x̄, the dashed arrows converge to the solid arrows, while the dotted arrow converges to the dash-dotted one, which depicts the point Δx¹_k + Δx²_k that we are trying to prove to be in T̂_C(x̄).
Proof. Let C ∋ x_k → x̄ and x*_k ∈ N_C(x_k) with x*_k → x*. Then for δ_k ↘ 0, the definition (18.3) provides x̃_k ∈ C and x̃*_k ∈ N̂_C(x̃_k) with ‖x̃*_k − x*_k‖ ≤ δ_k and ‖x̃_k − x_k‖ ≤ δ_k. It follows that C ∋ x̃_k → x̄ and x̃*_k → x* with x̃*_k ∈ N̂_C(x̃_k). Thus by definition, x* ∈ N_C(x̄), and hence N_C is outer semicontinuous. ∎
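The jump from the Fréchet to the limiting normal cone can be observed numerically at the interior corner of Example 18.1 (iii): the Fréchet normal cone at the corner is {0}, yet normal directions at nearby boundary points persist in the outer limit. A sampling-based sketch (our own construction; grid size, radius, and tolerances are illustrative):

```python
import numpy as np

def sample_C(n=201):
    """Grid sample of C = [0,1]^2 \\ (1/2,1]^2 from Example 18.1 (iii)."""
    xs = np.linspace(0.0, 1.0, n)
    return np.array([[a, b] for a in xs for b in xs
                     if not (a > 0.5 and b > 0.5)])

def in_frechet_normal(xstar, xbar, C, radius=0.2, tol=1e-9):
    """Discretized test of (18.2): the sup over nearby C-points of
    <x*, x - xbar>/|x - xbar| should be <= 0."""
    d = np.linalg.norm(C - xbar, axis=1)
    mask = (d > 0) & (d <= radius)
    q = (C[mask] - xbar) @ xstar / d[mask]
    return q.max() <= tol

C = sample_C()
# At the interior corner (1/2,1/2) the Frechet normal cone is {0} ...
assert not in_frechet_normal(np.array([0.0, 1.0]), np.array([0.5, 0.5]), C)
# ... yet (0,1) is a Frechet normal at nearby notch-edge points, so it
# survives into the limiting normal cone N_C((1/2,1/2)).
assert in_frechet_normal(np.array([0.0, 1.0]), np.array([0.75, 0.5]), C)
```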
18.3 polarity and limiting relationships
The tangent and normal cones satisfy various polarity relationships. To state these, recall from Section 1.2 for a general set C ⊂ X the definition of the polar cone

C° = {x* ∈ X* | ⟨x*, x⟩_X ≤ 0 for all x ∈ C}

as well as of the bipolar cone C°° = (C°)° ⊂ X.
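For a finitely generated cone K, membership in K° only needs to be checked against the generators, since every element of K is a nonnegative combination of them. A small sketch (our own, hypothetical helper) illustrating the finite-dimensional polarity N̂_C(x̄) = T_C(x̄)° of Lemma 18.10 below at the corner of the square from Example 18.1 (ii):

```python
import numpy as np

def in_polar(xstar, generators, tol=1e-12):
    """x* lies in K° for K = cone(generators): <x*, g> <= 0 for all g."""
    return all(np.dot(xstar, g) <= tol for g in generators)

# For C = [0,1]^2 at the corner (1,1), the tangent cone is generated by
# (-1,0) and (0,-1); its polar recovers the normal cone {Δx, Δy >= 0}.
T_gens = [np.array([-1.0, 0.0]), np.array([0.0, -1.0])]
assert in_polar(np.array([1.0, 1.0]), T_gens)       # normal direction
assert not in_polar(np.array([-1.0, 1.0]), T_gens)  # not a normal direction
```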
the fundamental cones
The relations in the following result will be crucial.
Lemma 18.10. Let X be a Banach space, C ⊂ X, and x̄ ∈ X. Then

(i) N̂_C(x̄) ⊂ T^w_C(x̄)° ⊂ T_C(x̄)°;

(ii) if X is reflexive, then N̂_C(x̄) = T^w_C(x̄)°;
(iii) if X is finite-dimensional, then N̂_C(x̄) = T_C(x̄)°.
Proof. (i): We take Δx ∈ T^w_C(x̄) and x* ∈ N̂_C(x̄). Then there exist τ_k ↘ 0 and C ∋ x_k → x̄ such that (x_k − x̄)/τ_k ⇀ Δx weakly in X. Thus

⟨x*, Δx⟩_X = lim sup_{k→∞} ⟨x*, x_k − x̄⟩_X/τ_k = lim sup_{k→∞} (⟨x*, x_k − x̄⟩_X/‖x_k − x̄‖_X) · (‖x_k − x̄‖_X/τ_k).

Since x* ∈ N̂_C(x̄) and C ∋ x_k → x̄, we have by definition that lim sup_{k→∞} ⟨x*, x_k − x̄⟩_X/‖x_k − x̄‖_X ≤ 0. Moreover, (x_k − x̄)/τ_k ⇀ Δx implies that ‖x_k − x̄‖_X/τ_k is bounded. Passing to the limit, it therefore follows that ⟨x*, Δx⟩_X ≤ 0. Since this holds for every Δx ∈ T^w_C(x̄), we see that x* ∈ T^w_C(x̄)°. This shows that N̂_C(x̄) ⊂ T^w_C(x̄)°. Since T_C(x̄) ⊂ T^w_C(x̄) by Theorem 18.5, T^w_C(x̄)° ⊂ T_C(x̄)° follows from Theorem 1.8.

(ii): Due to (i), we only need to show "⊃". Let x* ∉ N̂_C(x̄). Then, by definition, there exist C ∋ x_k → x̄ with

(18.11) lim_{k→∞} ⟨x*, Δx_k⟩_X > 0 for Δx_k ≔ (x_k − x̄)/‖x_k − x̄‖_X.

We now use the reflexivity of X and the Eberlein–Šmulian Theorem 1.9 to pass to a subsequence, unrelabelled, such that Δx_k ⇀ Δx for some Δx ∈ X that by definition satisfies Δx ∈ T^w_C(x̄). However, passing to the limit in (18.11) now shows that ⟨x*, Δx⟩_X > 0 and hence that x* ∉ T^w_C(x̄)°.

(iii): This is immediate from (ii) since T_C(x̄) = T^w_C(x̄) in finite-dimensional spaces. ∎
the limiting cones: preliminary lemmas
For a polarity relationship between the basic normal cone and the Clarke tangent cone, we need to work significantly harder. We start here with some preliminary lemmas shared between the finite-dimensional and infinite-dimensional settings, and then treat the two in that order.
Lemma 18.11. Let X be a reflexive Banach space, C ⊂ X, and x̄ ∈ X. Then

(18.12) T̂_C(x̄) ⊂ lim inf_{C∋x→x̄} T^w_C(x).

If X = ℝ^N, then

T̂_C(x̄) ⊂ lim inf_{C∋x→x̄} T_C(x).
Proof. The case X = ℝ^N trivially follows from (18.12). To prove (18.12), denote by K the set on its right-hand side. If Δx ∉ K, then there exist ε > 0 and a sequence C ∋ x_k → x̄ such that

(18.13) inf_{Δx_k ∈ T^w_C(x_k)} ‖Δx_k − Δx‖_X ≥ 3ε.

Fix k ∈ ℕ and suppose that for some τ_ℓ ↘ 0 and x_ℓ ∈ C,

(18.14) ‖(x_ℓ − x_k)/τ_ℓ − Δx‖_X ≤ 2ε (ℓ ∈ ℕ).

Using the reflexivity of X and the Eberlein–Šmulian Theorem 1.9, we then find a further, unrelabelled, subsequence of {(x_ℓ, τ_ℓ)}_{ℓ∈ℕ} such that (x_ℓ − x_k)/τ_ℓ ⇀ Δx_k as ℓ → ∞ for some Δx_k ∈ T^w_C(x_k) with ‖Δx_k − Δx‖_X ≤ 2ε, in contradiction to (18.13). We thus have

lim inf_{τ↘0} inf_{x∈C} ‖(x − x_k)/τ − Δx‖_X ≥ 2ε.

Since this holds for all k ∈ ℕ, we can find τ_k > 0 with τ_k ↘ 0 satisfying

lim inf_{k→∞} inf_{x∈C} ‖(x − x_k)/τ_k − Δx‖_X ≥ ε,

implying that Δx ∉ T̂_C(x̄). Therefore (18.12) holds. ∎
Lemma 18.12. Let $X$ be a reflexive and Gâteaux smooth (or finite-dimensional) Banach space, $C \subset X$, and $x \in X$. Then

$\widehat T_C(x) \subset N_C(x)^\circ$.
Proof. Take $x^* \in N_C(x)$ and $\Delta x \in \widehat T_C(x)$. This gives by Theorem 18.2 (or (18.3) if $X$ is finite-dimensional) sequences $x_k \to x$ and $x^*_k \rightharpoonup^* x^*$ with $x^*_k \in \widehat N_C(x_k)$. By Lemma 18.11, we can find for each $k \in \mathbb{N}$ a $\Delta x_k \in T^w_C(x_k)$ such that $\Delta x_k \to \Delta x$. Since $\widehat N_C(x_k) = T^w_C(x_k)^\circ$ by Lemma 18.10 (ii) when $X$ is reflexive, we have $\langle x^*_k, \Delta x_k\rangle_X \le 0$. Combining all these observations, we obtain

$\langle x^*, \Delta x\rangle_X = \lim_{k\to\infty} \bigl( \langle x^*_k, \Delta x_k\rangle_X + \langle x^* - x^*_k, \Delta x\rangle_X + \langle x^*_k, \Delta x - \Delta x_k\rangle_X \bigr) = \lim_{k\to\infty} \langle x^*_k, \Delta x_k\rangle_X \le 0$.

Since $x^* \in N_C(x)$ was arbitrary, we deduce that $\Delta x \in N_C(x)^\circ$ and hence the claim. □
(a) By assumption, the interior of the ball around $x+z$ of radius $\varepsilon$ does not intersect $C$ (shaded). In this example, the point $\tilde x \in C$ intersects the boundary; however, it is not on the leading edge (thick lines), where the normal vector $x^*$ would satisfy $\langle z, x^*\rangle \ge \tilde\varepsilon$. Reducing $\theta < 1$ produces an intersecting point $\tilde x$ on the leading edge.

(b) The "ice cream cone" emanating from $\tilde x$ along the line $[\tilde x, \tilde x + \theta z]$ with a ball of radius $\tilde\varepsilon\theta$ does not intersect $C$ (light shading). From this it follows that the tangent cone $T_C(\tilde x)$ (incomplete dark shading) is at a distance $\tilde\varepsilon$ from $z$.
Figure 18.3: Geometric illustration of the construction in the proof of Lemma 18.13.
the limiting cones in finite dimensions
We now start our development of polarity relationships between the limiting cones, as well as limiting relationships between the tangent and Clarke tangent cones. Our main tool will be the following "ice cream cone lemma", for which it is important that we endow $\mathbb{R}^N$ with the Euclidean norm.
Lemma 18.13. Let $C \subset \mathbb{R}^N$ be closed and let $x \in C$. Let $z \in \mathbb{R}^N \setminus \{0\}$ and $\varepsilon > 0$ be such that

(18.15)  $\operatorname{int} \mathbb{B}(x+z, \varepsilon) \cap C = \emptyset$.

Then for any $\tilde\varepsilon \in (0, \varepsilon)$, there exists an $\tilde x \in C$ such that there exist

(i) $\theta \in (0,1]$ satisfying $\|(\tilde x + \theta z) - (x+z)\| \le \tilde\varepsilon$ and $\inf_{\Delta x \in T_C(\tilde x)} \|\Delta x - z\| \ge \tilde\varepsilon$;

(ii) $x^* \in \widehat N_C(\tilde x)$ satisfying $\langle x^*, z\rangle \ge \tilde\varepsilon$ and $\|x^*\| \le 1$.
Proof. We define the increasing real function $\varphi(t) \coloneqq \sqrt{1+t^2}$ and $F, G \colon \mathbb{R}^N \times \mathbb{R} \to \overline{\mathbb{R}}$ by

$F(\hat x, \hat\theta) \coloneqq \varphi(\tilde\varepsilon)\hat\theta + \varphi(\|(\hat x + \hat\theta z) - (x+z)\|) \quad\text{and}\quad G(\hat x, \hat\theta) \coloneqq \delta_C(\hat x) + \delta_{[0,\infty)}(\hat\theta)$.
Then $F+G$ is proper, coercive, and lower semicontinuous and hence admits a minimizer $(\tilde x, \theta) \in C \times [0,\infty)$ by Theorem 2.1. (We illustrate the idea of such a minimizer geometrically in Figure 18.3.) Let $\tilde y \coloneqq (\tilde x + \theta z) - (x+z)$.

(i): We first prove $\theta \in (0,1]$. Suppose $\theta = 0$. Since $\tilde x \in C$, we obtain using (18.15) that

$[F+G](\tilde x, 0) = \varphi(\|\tilde x - (x+z)\|) \ge \varphi(\varepsilon) > \varphi(\tilde\varepsilon) = [F+G](x,1)$.

This is a contradiction to $(\tilde x, 0)$ being a minimizer. Thus $\theta \neq 0$. Likewise,

$\varphi(\tilde\varepsilon)\theta + \varphi(\|\tilde y\|) = [F+G](\tilde x, \theta) \le [F+G](x, 1) = \varphi(\tilde\varepsilon)$,

where both terms on the left-hand side are nonnegative. Hence $\theta \le 1$. By the monotonicity of $\varphi$, this also verifies the claim $\|\tilde y\| \le \tilde\varepsilon$.

We still need to prove the claim on the tangent cone. Since $(\tilde x, \theta)$ is a minimizer of $F+G$, for any $\hat\theta \ge 0$ and $\hat x \in C$ we have

$\varphi(\tilde\varepsilon)\theta + \varphi(\|\tilde y\|) = [F+G](\tilde x, \theta) \le [F+G](\hat x, \hat\theta) = \varphi(\tilde\varepsilon)\hat\theta + \varphi(\|\hat y\|)$.

Letting $\hat y \coloneqq (\hat x + \hat\theta z) - (x+z)$ and using first this inequality and then the convexity of $\varphi$ with $\varphi'(t) = t/\varphi(t) \le 1$ for all $t \ge 0$ yields

$\varphi(\tilde\varepsilon)(\theta - \hat\theta) \le \varphi(\|\hat y\|) - \varphi(\|\tilde y\|) \le \varphi'(\|\hat y\|)\,(\|\hat y\| - \|\tilde y\|) \le \|\hat y - \tilde y\| = \|\hat x - \tilde x - (\theta - \hat\theta)z\|$.

Dividing by $\tau \coloneqq \theta - \hat\theta$ for $\hat\theta \in [0,\theta)$, we obtain that $\tilde\varepsilon \le \varphi(\tilde\varepsilon) \le \bigl\| \frac{\hat x - \tilde x}{\tau} - z \bigr\|$. Taking the infimum over $\hat x \in C$ and $\tau \in (0,\theta]$ thus yields $\inf_{\Delta x \in T_C(\tilde x)} \|\Delta x - z\| \ge \tilde\varepsilon$.

(ii): By Lemma 3.4 (iv), $F$ is convex. Furthermore, $\operatorname{int}(\operatorname{dom} F) = \mathbb{R}^{N+1}$, so that $F$ is Lipschitz near $(\tilde x, \theta)$ by Theorem 3.13. Using Theorems 4.6, 4.17 and 4.19 with $K(\hat x, \hat\theta) \coloneqq \hat x + \hat\theta z$, it follows that

(18.16)  $\partial F(\tilde x, \theta) = \left\{ \begin{pmatrix} \varphi'(\|\tilde y\|)\tilde y^* \\ \varphi(\tilde\varepsilon) + \varphi'(\|\tilde y\|)\langle z, \tilde y^*\rangle \end{pmatrix} \,\middle|\, \begin{array}{ll} \langle \tilde y^*, \tilde y\rangle = \|\tilde y\|,\ \|\tilde y^*\| = 1 & \text{if } \tilde y \neq 0, \\ \|\tilde y^*\| \le 1 & \text{if } \tilde y = 0 \end{array} \right\}$.

Since $\mathbb{R}^N$ endowed with the Euclidean norm is a Hilbert space, the norm $x \mapsto \|x\|_2$ is Gâteaux differentiable by Example 17.6 (i) and Lemma 17.7. Hence $\partial F(\tilde x, \theta)$ is a singleton, and therefore $F$ is Gâteaux differentiable at $(\tilde x, \theta)$ due to Lemma 13.7 and Theorem 13.8. We can thus apply the Fermat principle (Theorem 16.2) and the Fréchet sum rule (Corollary 17.3) to deduce $0 \in \partial_F F(\tilde x, \theta) + \partial_F G(\tilde x, \theta)$. Since $\theta > 0$, we have $\partial_F G(\tilde x, \theta) = \widehat N_C(\tilde x) \times \{0\}$ by Corollary 18.7, which implies that

(18.17)  $-\varphi'(\|\tilde y\|)\tilde y^* \in \widehat N_C(\tilde x)$ and $\varphi(\tilde\varepsilon) + \varphi'(\|\tilde y\|)\langle z, \tilde y^*\rangle = 0$.

Since $\varphi(\tilde\varepsilon) > 0$, the second equation in (18.17) yields $\varphi'(\|\tilde y\|) \neq 0$ as well. As $\varphi'(t) \in (0,1)$ and $\varphi(t) > t$ for all $t > 0$, we can set $x^* \coloneqq -\varphi'(\|\tilde y\|)\tilde y^*$ to obtain $x^* \in \widehat N_C(\tilde x)$ with $\|x^*\| \le 1$ and $\langle z, x^*\rangle = \varphi(\tilde\varepsilon) \ge \tilde\varepsilon$. □
The following consequence of the ice cream cone lemma will be useful for several polarity relations. We call a set $C$ closed near $x \in C$ if there exists a $\delta > 0$ such that $C \cap \mathbb{B}(x,\delta)$ is closed.
Lemma 18.14. Let $C \subset \mathbb{R}^N$ be closed near $x$. If $z \notin \widehat T_C(x)$, then there exist $\varepsilon > 0$ and a sequence $C \ni x_k \to x$ such that for all $k \in \mathbb{N}$,

(i) $\inf_{\Delta x_k \in T_C(x_k)} \|\Delta x_k - z\| \ge \varepsilon$;

(ii) there exists $x^*_k \in \widehat N_C(x_k)$ with $\|x^*_k\| \le 1$ and $\langle x^*_k, z\rangle \ge \varepsilon$.
Proof. First, $z \notin \widehat T_C(x)$ implies by (18.10) the existence of $\varepsilon > 0$, $C \ni x_k \to x$, and $\tau_k \to 0$ such that

$\inf_{\tilde x \in C} \left\| \dfrac{\tilde x - x_k}{\tau_k} - z \right\| \ge \varepsilon \qquad (k \in \mathbb{N})$,

implying that

$\operatorname{int} \mathbb{B}(x_k + \tau_k z, \tau_k\varepsilon) \cap C = \emptyset$.

By taking $\tau_k$ small enough (i.e., $k \in \mathbb{N}$ large enough), we may without loss of generality assume that $C$ is closed. For any $\tilde\varepsilon \in (0, \varepsilon)$ and every $k \in \mathbb{N}$, Lemma 18.13 now yields $\tilde x_k \in C$ and $\theta_k \in (0,1]$ satisfying

(i') $\|(\tilde x_k + \theta_k\tau_k z) - (x_k + \tau_k z)\| \le \tilde\varepsilon\tau_k$ and $\inf_{\Delta x_k \in T_C(\tilde x_k)} \|\Delta x_k - \tau_k z\| \ge \tilde\varepsilon\tau_k$;

(ii') there exists an $x^*_k \in \widehat N_C(\tilde x_k)$ such that $\langle x^*_k, \tau_k z\rangle \ge \tau_k\tilde\varepsilon$ and $\|x^*_k\| \le 1$.

We readily obtain (i) from (i') and (ii) from (ii'). Since (i') also shows that $\tilde x_k \to x$ as $\tau_k \to 0$, this finishes the proof. □
We can now show the converse inclusion of Lemma 18.12 when the set is closed near $x$.

Theorem 18.15. If $C \subset \mathbb{R}^N$ is closed near $x$, then

$\widehat T_C(x) = N_C(x)^\circ$.
Proof. By Lemma 18.12, we only need to prove $N_C(x)^\circ \subset \widehat T_C(x)$. We argue by contraposition. Let $z \notin \widehat T_C(x)$. Then Lemma 18.14 yields a sequence $\{x^*_k\}_{k\in\mathbb{N}} \subset \mathbb{R}^N$ such that $x^*_k \in \widehat N_C(x_k)$ for $C \ni x_k \to x$ and $\langle x^*_k, z\rangle \ge \varepsilon > 0$ as well as $\|x^*_k\| \le 1$. Since $\{x^*_k\}_{k\in\mathbb{N}}$ is bounded, we can extract a subsequence that converges to some $x^* \in \mathbb{R}^N$. By definition of the limiting normal cone, $x^* \in N_C(x)$. Moreover, $\langle x^*, z\rangle \ge \varepsilon > 0$. This provides, as required, that $z \notin N_C(x)^\circ$. □
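To make the polarity of Theorem 18.15 concrete, here is a small worked example (our own illustration, not part of the text) for a nonconvex set; the same "cross" reappears in Example 19.4 below.

```latex
\begin{align*}
 &C := ([0,\infty)\times\{0\}) \cup (\{0\}\times[0,\infty)) \subset \mathbb{R}^2,
 \qquad x = (0,0):\\
 &\widehat N_C(0,0) = (-\infty,0]^2, \qquad
 \widehat N_C(t,0) = \{0\}\times\mathbb{R}, \qquad
 \widehat N_C(0,t) = \mathbb{R}\times\{0\} \quad (t>0),\\
 &N_C(0,0) = (-\infty,0]^2 \cup (\{0\}\times\mathbb{R}) \cup (\mathbb{R}\times\{0\})
 \quad\text{(outer limit along } C \ni x_k \to (0,0)\text{)},\\
 &N_C(0,0)^\circ
 = \{\Delta x \mid \langle \Delta x, n\rangle \le 0 \ \forall n \in N_C(0,0)\}
 = \{(0,0)\} = \widehat T_C(0,0).
\end{align*}
```

Indeed, taking $n = \pm e_2 \in N_C(0,0)$ forces $\Delta x_2 = 0$ and $n = \pm e_1$ forces $\Delta x_1 = 0$, so the polar collapses to the origin, matching the Clarke tangent cone.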
the limiting cones in infinite dimensions
We now repeat the arguments above in innite dimensions, however, we need extra careand extra assumptions. Besides reexivity (to obtain weak-โ compactness from EberleinโSmulyan Theorem 1.9) and Gรขteaux smoothness (to obtain dierentiability of the norm),we need to use the approximate Fermat principle of Theorem 17.13 since exact projectionsto general sets ๐ถ may not exist; compare Theorem 17.14. This introduces Y-normal conesinto the proof. The geometric ideas of the proof, however, are the same as illustrated inFigure 18.3.
Lemma 18.16. Let $X$ be a Banach space, $C \subset X$ be closed, and $x \in C$. Let $z \in X \setminus \{0\}$ and $\varepsilon > 0$ be such that

(18.18)  $\operatorname{int} \mathbb{B}(x+z, \varepsilon) \cap C = \emptyset$.

Then for any $\tilde\varepsilon \in (0, \varepsilon)$ and $\lambda > 0$, there exists $\tilde x \in C$ such that there exist

(i) $\theta \in (0,1]$ such that $\|(\tilde x + \theta z) - (x+z)\|_X \le \tilde\varepsilon$ and $\inf_{\Delta x \in T_C(\tilde x)} \|\Delta x - z\|_X \ge \tilde\varepsilon$;

(ii) if $X$ is Gâteaux smooth, $x^* \in \widehat N^\lambda_C(\tilde x)$ such that $\langle x^*, z\rangle_X \ge \tilde\varepsilon$ and $\|x^*\|_{X^*} \le 1$.
Proof. We define the convex and increasing real function $\varphi(t) \coloneqq \sqrt{1+t^2}$ and pick arbitrary

(18.19)  $\hat\varepsilon \in (\tilde\varepsilon, \varepsilon)$, $\quad 0 < \lambda < \dfrac{\varphi(\hat\varepsilon) - \tilde\varepsilon}{2 + \tilde\varepsilon}$, $\quad$ and $\quad 0 < \delta < \varphi(\varepsilon) - \varphi(\hat\varepsilon)$.

The upper bound on $\lambda$ is without loss of generality for (ii) because $\widehat N^\lambda_C(\tilde x) \subset \widehat N^{\lambda'}_C(\tilde x)$ for $\lambda' \ge \lambda$. Then we define $F, G \colon X \times \mathbb{R} \to \overline{\mathbb{R}}$ by

$F(\hat x, \hat\theta) \coloneqq \varphi(\hat\varepsilon)\hat\theta + \varphi(\|(\hat x + \hat\theta z) - (x+z)\|_X) \quad\text{and}\quad G(\hat x, \hat\theta) \coloneqq \delta_C(\hat x) + \delta_{[0,\infty)}(\hat\theta)$.

The function $F+G$ is proper and coercive, hence $\inf (F+G) > -\infty$. However, it may not admit a minimizer. Nevertheless, the approximate Fermat principle of Theorem 17.13 produces an approximate minimizer $(\tilde x, \theta) \in C \times [0,\infty)$ with

(a) $[F+G](\tilde x, \theta) \le \inf\,[F+G] + \delta$,

(b) $[F+G](\tilde x, \theta) < [F+G](\hat x, \hat\theta) + \lambda\|\hat x - \tilde x\|_X + \lambda|\hat\theta - \theta|$ for all $(\hat x, \hat\theta) \neq (\tilde x, \theta)$, and

(c) $0 \in \partial_\lambda [F+G](\tilde x, \theta)$.

Let again $\tilde y \coloneqq (\tilde x + \theta z) - (x+z)$.

(i): We first prove $\theta \in (0, \frac{\varphi(\varepsilon)}{\varphi(\hat\varepsilon)}]$, which will in particular imply that $\theta \in (0, 1+\hat\varepsilon)$. Suppose $\theta = 0$. Since $\tilde x \in C$, using (18.18) and the convexity of $\varphi$, we obtain

$[F+G](\tilde x, 0) - \delta = \varphi(\|\tilde x - (x+z)\|_X) - \delta \ge \varphi(\varepsilon) - \delta > \varphi(\hat\varepsilon) = [F+G](x, 1)$
in contradiction to (a). Thus $\theta \neq 0$. Likewise,

$\varphi(\hat\varepsilon)\theta + \varphi(\|\tilde y\|_X) = [F+G](\tilde x, \theta) \le [F+G](x, 1) + \delta = \varphi(\hat\varepsilon) + \delta < \varphi(\varepsilon)$,

where both terms on the left-hand side are nonnegative. Hence $\theta \le \frac{\varphi(\varepsilon)}{\varphi(\hat\varepsilon)}$. By monotonicity of $\varphi$, this also verifies the claim $\|\tilde y\|_X \le \tilde\varepsilon$.

We still need to prove the claim on the tangent cone. Letting $\hat y \coloneqq (\hat x + \hat\theta z) - (x+z)$, we rearrange (b) as

(18.20)  $\varphi(\hat\varepsilon)(\theta - \hat\theta) - \lambda|\hat\theta - \theta| \le \varphi(\|\hat y\|_X) - \varphi(\|\tilde y\|_X) + \lambda\|\hat x - \tilde x\|_X$.

Using the convexity of $\varphi$, we also have

$\varphi(\|\hat y\|_X) - \varphi(\|\tilde y\|_X) \le \varphi'(\|\hat y\|_X)\,(\|\hat y\|_X - \|\tilde y\|_X) \le \|\hat y - \tilde y\|_X = \|\hat x - \tilde x - (\theta - \hat\theta)z\|_X$.

Further estimating $\|\hat x - \tilde x\|_X \le \|\hat x - \tilde x - (\theta - \hat\theta)z\|_X + |\theta - \hat\theta|$, (18.20) now yields

$[\varphi(\hat\varepsilon) - 2\lambda](\theta - \hat\theta) \le (1+\lambda)\|\hat x - \tilde x - (\theta - \hat\theta)z\|_X \qquad (\hat\theta \in [0,\theta),\ \hat x \in C)$.

Dividing by $(1+\lambda)(\theta - \hat\theta)$ and using (18.19) (for the first inequality), we obtain that

$\tilde\varepsilon \le \dfrac{\varphi(\hat\varepsilon) - 2\lambda}{1+\lambda} \le \inf_{\hat x \in C,\ \hat\theta \in [0,\theta)} \left\| \dfrac{\hat x - \tilde x}{\theta - \hat\theta} - z \right\|_X$.
This shows $\inf_{\Delta x \in T_C(\tilde x)} \|\Delta x - z\|_X \ge \tilde\varepsilon$.

(ii): By Lemma 3.4 (iv), $F$ is convex. Furthermore, $\operatorname{int}(\operatorname{dom} F) = X \times \mathbb{R}$, and hence $F$ is Lipschitz near $(\tilde x, \theta)$ by Theorem 3.13. Using Theorems 4.6, 4.17 and 4.19 with $K(\hat x, \hat\theta) \coloneqq \hat x + \hat\theta z$, it follows that

(18.21)  $\partial F(\tilde x, \theta) = \left\{ \begin{pmatrix} \varphi'(\|\tilde y\|_X)\tilde y^* \\ \varphi(\hat\varepsilon) + \varphi'(\|\tilde y\|_X)\langle z, \tilde y^*\rangle_X \end{pmatrix} \,\middle|\, \begin{array}{ll} \langle \tilde y^*, \tilde y\rangle_X = \|\tilde y\|_X,\ \|\tilde y^*\|_{X^*} = 1 & \text{if } \tilde y \neq 0, \\ \|\tilde y^*\|_{X^*} \le 1 & \text{if } \tilde y = 0 \end{array} \right\}$.

Again, $\partial F(\tilde x, \theta)$ is a singleton by Lemma 17.7 and the assumption that $X$ is Gâteaux smooth. We can thus apply the $\varepsilon$-sum rule (Lemma 17.2) in (c) to deduce $0 \in \partial F(\tilde x, \theta) + \partial_\lambda G(\tilde x, \theta)$. Since $\theta > 0$, we have $\partial_\lambda G(\tilde x, \theta) = \widehat N^\lambda_C(\tilde x) \times \{0\}$, which implies that

(18.22)  $-\varphi'(\|\tilde y\|_X)\tilde y^* \in \widehat N^\lambda_C(\tilde x)$ and $\varphi(\hat\varepsilon) + \varphi'(\|\tilde y\|_X)\langle z, \tilde y^*\rangle_X = 0$.

Since $\varphi(\hat\varepsilon) > 0$, the second equation in (18.22) yields $\varphi'(\|\tilde y\|_X) \neq 0$ as well. As $\varphi'(t) \in (0,1)$ and $\varphi(t) > t$ for all $t > 0$, we can set $x^* \coloneqq -\varphi'(\|\tilde y\|_X)\tilde y^*$ to obtain $x^* \in \widehat N^\lambda_C(\tilde x)$ with $\|x^*\|_{X^*} \le 1$ and $\langle z, x^*\rangle_X = \varphi(\hat\varepsilon) \ge \tilde\varepsilon$. □
Remark 18.17. If $X$ is in addition reflexive, we can use the Eberlein–Smulyan Theorem 1.9 to pass to the limit as $\lambda \to 0$ in Lemma 18.16 and produce $x^* \in \widehat N_C(\tilde x)$ satisfying the other claims of the lemma.
Lemma 18.18. Let $X$ be a Banach space and $C \subset X$ be closed near $x \in C$. If $z \notin \widehat T_C(x)$, then there exist $\varepsilon > 0$ and a sequence $C \ni x_k \to x$ such that for all $k \in \mathbb{N}$,

(i) $\inf_{\Delta x_k \in T_C(x_k)} \|\Delta x_k - z\|_X \ge \varepsilon$;

(ii) if $X$ is Gâteaux smooth, there exists $x^*_k \in \widehat N^{\varepsilon_k}_C(x_k)$ for some $\varepsilon_k \to 0$ with $\|x^*_k\|_{X^*} \le 1$ and $\langle x^*_k, z\rangle_X \ge \varepsilon$.
Proof. The assumption $z \notin \widehat T_C(x)$ implies by (18.10) the existence of $\varepsilon > 0$, $C \ni x_k \to x$, and $\tau_k \to 0$ such that

$\inf_{\tilde x \in C} \left\| \dfrac{\tilde x - x_k}{\tau_k} - z \right\|_X \ge \varepsilon \qquad (k \in \mathbb{N})$.

This implies that

$\operatorname{int} \mathbb{B}(x_k + \tau_k z, \tau_k\varepsilon) \cap C = \emptyset$.

Since the argument is local, by taking $\tau_k$ small enough (i.e., $k \in \mathbb{N}$ large enough), we may without loss of generality assume that $C$ is closed. For any $\tilde\varepsilon \in (0, \varepsilon)$ and every $k \in \mathbb{N}$, Lemma 18.16 now produces $\tilde x_k \in C$ and $\theta_k \in (0,1]$ satisfying

(i') $\|(\tilde x_k + \theta_k\tau_k z) - (x_k + \tau_k z)\|_X \le \tilde\varepsilon\tau_k$ and $\inf_{\Delta x_k \in T_C(\tilde x_k)} \|\Delta x_k - \tau_k z\|_X \ge \tilde\varepsilon\tau_k$;

(ii') if $X$ is Gâteaux smooth, there exists $x^*_k \in \widehat N^{\tilde\varepsilon\tau_k}_C(\tilde x_k)$ such that $\langle x^*_k, \tau_k z\rangle_X \ge \tau_k\tilde\varepsilon$ and $\|x^*_k\|_{X^*} \le 1$.

We readily obtain (i) from (i') and (ii) from (ii'). Since (i') also shows that $\tilde x_k \to x$ as $\tau_k \to 0$, this finishes the proof. □
Theorem 18.19. Let $X$ be a reflexive and Gâteaux smooth Banach space and let $C \subset X$ be closed near $x \in C$. Then

$\widehat T_C(x) = N_C(x)^\circ$.

Proof. By Lemma 18.12, we only need to prove $N_C(x)^\circ \subset \widehat T_C(x)$. Let $z \notin \widehat T_C(x)$. Then Lemma 18.18 yields a sequence $\{x^*_k\}_{k\in\mathbb{N}} \subset \mathbb{B}_{X^*}$ such that $x^*_k \in \widehat N^{\varepsilon_k}_C(x_k)$ for some $\varepsilon_k \to 0$ and $\langle x^*_k, z\rangle_X \ge \varepsilon$. Since $X$ is reflexive, $X^*$ is reflexive as well, and so we can apply Theorem 1.9 to extract a subsequence of $\{x^*_k\}_{k\in\mathbb{N}}$ that converges weakly and thus, again by reflexivity, also weakly-* to some $x^* \in N_C(x)$ (by definition of the limiting normal cone) with $\langle x^*, z\rangle_X \ge \varepsilon > 0$. This shows, as required, that $z \notin N_C(x)^\circ$. □
the clarke tangent cone
We can now show the promised alternative characterization of the Clarke tangent cone $\widehat T_C(x)$ as the inner limit of tangent cones.
Corollary 18.20. Let $X$ be a reflexive Banach space and let $C \subset X$ be closed near $x \in C$. Then

(18.23)  $\liminf_{C \ni \tilde x \to x} T_C(\tilde x) \subset \widehat T_C(x) \subset \liminf_{C \ni \tilde x \to x} T^w_C(\tilde x)$.

In particular, if $X$ is finite-dimensional, then

$\widehat T_C(x) = \liminf_{C \ni \tilde x \to x} T_C(\tilde x)$.

Proof. We have already proved the second inclusion of (18.23) in Lemma 18.11. For the first inclusion, suppose $z \notin \widehat T_C(x)$. Then Lemma 18.18 yields an $\varepsilon > 0$ and a sequence $C \ni x_k \to x$ such that $\inf_{\Delta x_k \in T_C(x_k)} \|\Delta x_k - z\|_X \ge \varepsilon$ for all $k$. This shows that $z \notin \liminf_{C\ni\tilde x\to x} T_C(\tilde x)$. □
Remark 18.21. Lemma 18.18 and thus the first inclusion of (18.23) do not actually require the reflexivity of $X$. In contrast, Lemma 18.11 and thus the second inclusion of (18.23) do not require the local closedness assumption. Besides $X$ being reflexive, it holds more generally if $X$ has the Radon–Riesz property and is Fréchet smooth; see [Mordukhovich 2006, Theorem 1.9] and compare Remark 17.5.
18.4 regularity
It stands to reason that without any assumptions on the set $C \subset X$ such as convexity, there is little hope of obtaining precise characterizations or exact transformation rules for the various cones. Similarly, precise characterizations or exact calculus rules for the derivatives of set-valued mappings (which, respectively, we will derive from the former) require strong assumptions on these mappings. This is especially true of the limiting cones. As befits the introductory character of this textbook, we will therefore only develop calculus for the derivatives based on the limiting cones when they are equal to the corresponding basic cones, i.e., when they are regular. This will allow deriving exact results that are nevertheless applicable to the situations we have been focusing on in the previous parts, such as problems of the form (P). These conditions can be compared to constraint qualifications in nonlinear optimization that guarantee that the tangent cone coincides with the linearization cone. However, "fuzzy" results are available under more general assumptions, for which we refer to the monographs [Aubin & Frankowska 1990; Rockafellar & Wets 1998; Mordukhovich 2018; Mordukhovich 2006].
Specifically, we say that $C \subset X$ is tangentially regular at $x \in C$ if $T_C(x) = \widehat T_C(x)$, and normally regular at $x$ if $N_C(x) = \widehat N_C(x)$. We call $C$ regular at $x$ if $C$ is both normally and tangentially regular.
Example 18.22. With $C \subset \mathbb{R}^2$ as in Example 18.1, we see that $C = \mathbb{B}(0,1)$ and $C = [0,1]^2$ are regular at every $x \in C$, while $C = [0,1]^2 \setminus [\frac12,1]^2$ is regular everywhere except at $x = (\frac12,\frac12)$.
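For the last set, the failure of regularity at the reentrant corner can be made explicit by computing the cones from the definitions; this computation is our own addition and uses only that, near $x = (\frac12,\frac12)$, the set agrees with the complement of the open quadrant $(\frac12,1)^2$ within a neighborhood:

```latex
\begin{align*}
 &\text{For } C = [0,1]^2 \setminus [\tfrac12,1]^2 \text{ and } x = (\tfrac12,\tfrac12):\\
 &T_C(x) = \{\Delta x \in \mathbb{R}^2 \mid \min(\Delta x_1, \Delta x_2) \le 0\},
 \qquad \widehat T_C(x) = (-\infty,0]^2,
\end{align*}
```

where the Clarke cone is obtained from Corollary 18.20: tangent cones at nearby boundary points are $(-\infty,0]\times\mathbb{R}$ or $\mathbb{R}\times(-\infty,0]$, and their inner limit is the intersection $(-\infty,0]^2$. Hence $\widehat T_C(x) \subsetneq T_C(x)$, so $C$ is not tangentially regular at $x$.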
In finite dimensions, the two concepts of regularity are equivalent and have various characterizations. By Lemma 18.6, these hold in particular for closed convex sets.
Theorem 18.23. Let $C \subset \mathbb{R}^N$ be closed near $x$. Then the following conditions are equivalent:

(i) $C$ is normally regular at $x$;

(ii) $C$ is tangentially regular at $x$;

(iii) $\widehat N_C$ is outer semicontinuous at $x$;

(iv) $T_C$ is inner semicontinuous at $x$ (relative to $C$).

In particular, if any of these hold, $C$ is regular at $x$.
Proof. (i) ⇒ (ii): If (i) holds, then by Theorems 1.8, 18.5 and 18.15 and Lemma 18.10

$T_C(x) \subset T_C(x)^{\circ\circ} = \widehat N_C(x)^\circ = N_C(x)^\circ = \widehat T_C(x) \subset T_C(x)$,

which shows (ii). The other direction is completely analogous, exchanging the roles of "$N$" and "$T$" to obtain

$N_C(x) \subset N_C(x)^{\circ\circ} = \widehat T_C(x)^\circ = T_C(x)^\circ = \widehat N_C(x) \subset N_C(x)$.

(i) ⇔ (iii): If (i) holds, then the outer semicontinuity of $N_C$ (Corollary 18.9) and the inclusion $\widehat N_C(\tilde x) \subset N_C(\tilde x)$ from Theorem 18.5 show that $\limsup_{\tilde x \to x} \widehat N_C(\tilde x) \subset \widehat N_C(x)$, i.e., the outer semicontinuity of $\widehat N_C$. Conversely, the outer semicontinuity of $\widehat N_C$ and the definition $N_C(x) = \limsup_{\tilde x \to x} \widehat N_C(\tilde x)$ show that $N_C(x) \subset \widehat N_C(x)$. Combined with the inclusion $\widehat N_C(x) \subset N_C(x)$ from Theorem 18.5, we obtain (i).

(ii) ⇔ (iv): To show that (iv) implies (ii), recall from Corollary 18.20 that

(18.24)  $\widehat T_C(x) = \liminf_{C\ni\tilde x\to x} T_C(\tilde x)$.

By the assumed inner semicontinuity and the definition of the inner limit, we thus obtain that $\widehat T_C(x) = \liminf_{C\ni\tilde x\to x} T_C(\tilde x) = T_C(x)$. For the other direction, we simply use $T_C(x) = \widehat T_C(x)$ in (18.24). □
Combining the previous result with Lemma 18.10 and Theorem 18.15, we deduce the following.

Corollary 18.24. If $C \subset \mathbb{R}^N$ is regular at $x$ and closed near $x$, then both $T_C(x)$ and $N_C(x)$ are convex. Furthermore,

(i) $T_C(x) = N_C(x)^\circ$;

(ii) $N_C(x) = T_C(x)^\circ$.
In infinite dimensions, our main equivalent characterization of normal regularity is the following. (We do not have a similar characterization of tangential regularity.)

Theorem 18.25. Let $X$ be a reflexive and Gâteaux smooth Banach space. Then $C \subset X$ is normally regular at $x \in C$ if and only if $N_C(x) = T_C(x)^\circ$.
Proof. Suppose first that $N_C(x) = T_C(x)^\circ$. Since $\widehat T_C(x) \subset N_C(x)^\circ$ by Lemma 18.12, we have $N_C(x)^{\circ\circ} \subset \widehat T_C(x)^\circ$. Furthermore, Theorem 18.5 (ii) yields $\widehat T_C(x) \subset T_C(x)$ and thus $T_C(x)^\circ \subset \widehat T_C(x)^\circ$ by Theorem 1.8. It follows that $\widehat N_C(x)^\circ = N_C(x)^\circ$. We now recall from Theorem 18.8 that $\widehat T_C(x)$ is closed and convex. Hence $x^* \in N_C(x) \setminus \widehat N_C(x)$ implies by Theorem 1.13 that there exist $\hat x \in X$ and $\lambda \in \mathbb{R}$ such that

$\langle \hat x^*, \hat x\rangle_X \le \lambda < \langle x^*, \hat x\rangle_X \qquad (\hat x^* \in \widehat N_C(x))$.

Since $\widehat N_C(x)$ is a cone, this is only possible for $\lambda \ge 0$. Thus the first inequality shows that $\hat x \in \widehat N_C(x)^\circ$ and the second that $\hat x \notin N_C(x)^\circ$. This is in contradiction to $\widehat N_C(x)^\circ = N_C(x)^\circ$. Hence $\widehat N_C(x) = N_C(x)$, i.e., $C$ is normally regular at $x$.
Conversely, if $C$ is normally regular at $x$, we obtain using Lemma 18.12 that

$N_C(x) \subset \widehat T_C(x)^\circ = T_C(x)^\circ$.

By Lemma 18.10 (i), Theorem 18.5 (i), and Theorem 1.8, using the fact that $\widehat T_C(x)$ is a closed convex cone by Theorem 18.8, we also have

$T_C(x)^\circ \subset \widehat N_C(x)^{\circ\circ} = \widehat N_C(x) = N_C(x)$.

Therefore $N_C(x) = T_C(x)^\circ$ as claimed. □
In sufficiently regular spaces, normal regularity implies tangential regularity of closed sets.

Lemma 18.26. Let $X$ be a reflexive and Gâteaux smooth Banach space and let $C \subset X$ be closed near $x \in C$. If $C$ is normally regular at $x$, then $C$ is tangentially regular at $x$.

Proof. Arguing as in the proof of Theorem 18.23 (i) ⇒ (ii), by Theorems 1.8, 18.5 and 18.19 and Lemma 18.10 we have

$T_C(x) \subset T^w_C(x) \subset T^w_C(x)^{\circ\circ} = \widehat N_C(x)^\circ = N_C(x)^\circ = \widehat T_C(x) \subset T_C(x)$.

This shows that $T_C(x) = \widehat T_C(x)$. □
From Lemmas 18.6 and 18.26, we immediately obtain the following regularity result.

Corollary 18.27. Let $X$ be a Gâteaux smooth Banach space and let $C \subset X$ be nonempty, closed, and convex. Then $C$ is normally regular at every $x \in C$. If $X$ is additionally reflexive, then $C$ is also tangentially regular at every $x \in C$.
19 TANGENT AND NORMAL CONES OF POINTWISE-DEFINED SETS
As we have seen in Chapter 18, the relationships between the different tangent and normal cones are less complete in infinite-dimensional spaces than in finite-dimensional ones. In this chapter, however, we show that certain pointwise-defined sets in $L^p(\Omega)$ for $p \in (1,\infty)$ largely satisfy the finite-dimensional relations. We will use these results in Chapter 21 to derive expressions for generalized derivatives of pointwise-defined set-valued mappings, in particular for subdifferentials of integral functionals. As mentioned in Section 18.4, these relations are less satisfying for the limiting cones than for the basic cones. To treat the limiting cones, we will therefore assume the regularity of the underlying pointwise sets. For the basic cones, we also require an assumption, which however is weaker than (tangential) regularity.
19.1 derivability
We start with the fundamental regularity assumption. Let $X$ be a Banach space and $C \subset X$. We then say that a tangent vector $\Delta x \in T_C(x)$ at $x \in C$ is derivable if there exist an $\varepsilon > 0$ and a curve $\xi \colon [0,\varepsilon] \to C$ that generates $\Delta x$ at $0$, i.e.,

(19.1)  $\xi(0) = x$ and $\Delta x = \lim_{\tau \to 0} \dfrac{\xi(\tau) - \xi(0)}{\tau} = \xi'(0)$.

Note that we do not make any assumptions on the differentiability or continuity of $\xi$ except at $\tau = 0$. We say that $C$ is geometrically derivable at $x \in C$ if every $\Delta x \in T_C(x)$ is derivable.
As the next lemma shows, the point of this definition is that derivable tangent vectors are characterized by a full limit instead of just an inner limit; this additional property will allow us to construct tangent vectors in $L^p(\Omega)$ from pointwise tangent vectors, similarly to how Clarke regularity was used to obtain equality in the pointwise characterization of Clarke subdifferentials of integral functionals in Theorem 13.9.
Lemma 19.1. Let $C \subset X$ and $x \in C$. Then the set $T^0_C(x)$ of derivable tangent vectors is given by

(19.2)  $T^0_C(x) = \liminf_{\tau \to 0} \dfrac{C - x}{\tau}$.

Proof. We first recall that by definition of the inner limit, $\Delta x$ is an element of the set on the right-hand side if for every sequence $\tau_k \to 0$ there exist $x_k \in C$ such that $(x_k - x)/\tau_k \to \Delta x$. For a derivable tangent vector $\Delta x \in T^0_C(x)$ and any $\tau_k \to 0$, we can simply take $x_k = \xi(\tau_k)$. For the converse inclusion, let $\Delta x$ be an element of the right-hand side set. Let now $\tau_k \to 0$ be given and take $x_k \in C$ realizing the inner limit. Since $\tau_k \to 0$ was arbitrary, setting $\xi(\tau_k) \coloneqq x_k$ for all $k \in \mathbb{N}$ defines a curve $\xi \colon [0,\varepsilon] \to C$ for some $\varepsilon > 0$, and hence $\Delta x \in T^0_C(x)$. □
By taking $x_k \equiv x$ constant in (18.10) and comparing with (19.2), we immediately obtain that all Clarke tangent vectors are derivable.

Corollary 19.2. Let $C \subset X$ and $x \in C$. Then every $\Delta x \in \widehat T_C(x)$ is derivable.

Clearly, if $C$ is tangentially regular at $x$, then also every tangent vector is derivable.

Corollary 19.3. If $C \subset X$ is tangentially regular at $x \in C$, then every $\Delta x \in T_C(x)$ is derivable.
However, a set can be geometrically derivable without being tangentially regular.

Example 19.4. Let $C \coloneqq ([0,\infty)\times\{0\}) \cup (\{0\}\times[0,\infty)) \subset \mathbb{R}^2$. Then we obtain directly from the definition of the tangent cone that

$T_C(x_1,x_2) = \begin{cases} C & \text{if } (x_1,x_2) = (0,0), \\ \mathbb{R}\times\{0\} & \text{if } x_1 > 0,\ x_2 = 0, \\ \{0\}\times\mathbb{R} & \text{if } x_1 = 0,\ x_2 > 0, \\ \emptyset & \text{otherwise}. \end{cases}$

However, it follows from Corollary 18.20 that $\widehat T_C(0,0) = \{(0,0)\}$. Thus $C$ is not tangentially regular at $(0,0)$.

On the other hand, for any $\Delta x = (t_1, 0) \in T_C(0,0)$, $t_1 \ge 0$, setting $\xi(\tau) \coloneqq (\tau t_1, 0)$ yields $\xi(0) = (0,0)$ and $\xi'(0) = (t_1, 0) = \Delta x$. Hence $\Delta x$ is derivable. Similarly, setting $\xi(\tau) \coloneqq (0, \tau t_2)$ shows that $\Delta x = (0, t_2) \in T_C(0,0)$ is derivable for every $t_2 \ge 0$. Thus $C$ is geometrically derivable at $(0,0)$.
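The generating curves in Example 19.4 can also be checked numerically. The following sketch (our own illustration; the function names are ours) verifies that the difference quotients in (19.1) recover the tangent vector $(t_1, 0)$ along the curve $\xi(\tau) = (\tau t_1, 0)$:

```python
import math

# Numerical sketch for Example 19.4 (our own illustration, not from the text):
# C = ([0,inf) x {0}) u ({0} x [0,inf)) in R^2. The curve xi(tau) = (tau*t1, 0)
# stays in C and generates the tangent vector (t1, 0) at the origin in the
# sense of (19.1): the difference quotients (xi(tau) - xi(0)) / tau converge
# (here they are even exactly equal, since xi is linear in tau).

def xi(tau, t1):
    # curve in C generating the tangent vector (t1, 0) at (0, 0)
    return (tau * t1, 0.0)

def diff_quotient(tau, t1):
    x0 = xi(0.0, t1)
    xt = xi(tau, t1)
    return ((xt[0] - x0[0]) / tau, (xt[1] - x0[1]) / tau)

t1 = 3.0
for tau in (1e-1, 1e-4, 1e-8):
    q = diff_quotient(tau, t1)
    assert math.isclose(q[0], t1) and q[1] == 0.0
print("difference quotients recover the tangent vector", (t1, 0.0))
```

The analogous check with $\xi(\tau) = (0, \tau t_2)$ covers the vertical tangent vectors; this is consistent with the example, in which every element of $T_C(0,0)$ is derivable even though $\widehat T_C(0,0) = \{(0,0)\}$.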
19.2 tangent and normal cones
As the goal is to define derivatives of set-valued mappings $F \colon X \rightrightarrows Y$ via tangent cones to their epigraphs $\operatorname{epi} F \subset X \times Y$, we need to consider product spaces of $p$-integrable functions (with possibly different $p$). Let therefore $\Omega \subset \mathbb{R}^d$ be an open and bounded domain. For $\vec p \coloneqq (p_1,\dots,p_m) \in (1,\infty)^m$, we then define

$L^{\vec p}(\Omega) \coloneqq L^{p_1}(\Omega) \times \cdots \times L^{p_m}(\Omega)$,

endowed with the canonical Euclidean product norm, i.e.,

$\|u\|_{L^{\vec p}} \coloneqq \Bigl(\sum_{j=1}^m \|u_j\|^2_{L^{p_j}}\Bigr)^{1/2} \qquad (u = (u_1,\dots,u_m) \in L^{\vec p}(\Omega))$.

We will need the case $m = 2$ in Chapter 21; on first reading of the present chapter, we recommend picturing $m = 1$, i.e., $L^{\vec p}(\Omega) = L^p(\Omega)$ for some $p \in (1,\infty)$. We further denote by $p^*$ the conjugate exponent of $p \in (1,\infty)$, defined as satisfying $1/p + 1/p^* = 1$, and write $\vec p^* \coloneqq (p_1^*,\dots,p_m^*)$ so that $L^{\vec p}(\Omega)^* \cong L^{\vec p^*}(\Omega)$. Note that $L^{\vec p}(\Omega)$ is reflexive and Gâteaux smooth as the product of reflexive and Gâteaux smooth spaces; cf. Example 17.6. Finally, we will write $\mathcal{L}(\Omega)$ for the $d$-dimensional Lebesgue measure of $\Omega$ and recall the characteristic function $\mathbb{1}_A$ of a set $A \subset \Omega$, which satisfies $\mathbb{1}_A(x) = (1,\dots,1) \in \mathbb{R}^m$ if $x \in A$ and $\mathbb{1}_A(x) = 0 \in \mathbb{R}^m$ otherwise.
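As a concrete sketch of the product norm (our own illustration, with an arbitrary grid-based discretization of $\Omega = (0,1)$ chosen by us), the following approximates $\|u\|_{L^{\vec p}}$ for $m = 2$ components by Riemann sums:

```python
import math

# Discrete sketch of the L^{\vec p} product norm (our illustration):
# approximate ||u_j||_{L^{p_j}} by a midpoint Riemann sum on a uniform grid
# over Omega = (0, 1), then combine the components Euclidean-style as in the
# definition of ||.||_{L^{\vec p}}.

def lp_norm(values, p, cell):
    # Riemann-sum approximation of the L^p norm of one component
    return (sum(abs(v) ** p for v in values) * cell) ** (1.0 / p)

def product_norm(components, exponents, cell):
    # ||u||_{L^{\vec p}} = sqrt(sum_j ||u_j||_{L^{p_j}}^2)
    return math.sqrt(sum(lp_norm(u, p, cell) ** 2
                         for u, p in zip(components, exponents)))

n = 100000
cell = 1.0 / n
xs = [(i + 0.5) * cell for i in range(n)]
u1 = [1.0 for x in xs]   # constant 1:  ||u1||_{L^2} = 1
u2 = [x for x in xs]     # identity:    ||u2||_{L^3} = (1/4)^{1/3}
norm = product_norm([u1, u2], [2.0, 3.0], cell)
expected = math.sqrt(1.0 + 0.25 ** (2.0 / 3.0))
assert abs(norm - expected) < 1e-3
print("discrete product norm:", norm)
```

Note that the quadrature weight `cell` stands in for the Lebesgue measure of each grid cell; this is only a finite-dimensional stand-in for the genuine function-space norm.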
We then call a set $M \subset L^{\vec p}(\Omega)$ for $\vec p \in (1,\infty)^m$ pointwise defined if

$M \coloneqq \bigl\{ u \in L^{\vec p}(\Omega) \bigm| u(x) \in C(x) \text{ for a.e. } x \in \Omega \bigr\}$

for a Borel-measurable mapping $C \colon \Omega \rightrightarrows \mathbb{R}^m$ with $C(x) \neq \emptyset$. We say that $M$ is pointwise derivable if $C(x)$ is geometrically derivable at every point $\xi \in C(x)$ for almost every $x \in \Omega$.
the fundamental cones
We now derive pointwise characterizations of the fundamental cones of pointwise defined sets, starting with the tangent cone.

Theorem 19.5. Let $M \subset L^{\vec p}(\Omega)$ be pointwise derivable. Then for every $u \in M$,

(19.3)  $T_M(u) = \bigl\{ \Delta u \in L^{\vec p}(\Omega) \bigm| \Delta u(x) \in T_{C(x)}(u(x)) \text{ for a.e. } x \in \Omega \bigr\}$.
Proof. The inclusion "$\subset$" follows from (18.1) and the fact that a sequence convergent in $L^{\vec p}(\Omega)$ for $\vec p \in (1,\infty)^m$ converges, after possibly passing to a subsequence, pointwise almost everywhere.
For the converse inclusion, we take for almost every $x \in \Omega$ a tangent vector $\Delta u(x) \in T_{C(x)}(u(x))$ at $u(x) \in C(x)$. We only need to consider the case $\Delta u \in L^{\vec p}(\Omega)$. By geometric derivability, we may find for almost every $x \in \Omega$ an $\varepsilon(x) > 0$ and a curve $\xi(\,\cdot\,, x) \colon [0,\varepsilon(x)] \to C(x)$ such that $\xi(0,x) = u(x)$ and $\xi'_+(0,x) = \Delta u(x)$. In particular, for any given $\rho > 0$, we may find $\varepsilon_\rho(x) \in (0, \varepsilon(x)]$ such that

(19.4)  $\dfrac{|\xi(t,x) - \xi(0,x) - \Delta u(x)t|_2}{t} \le \rho \qquad (t \in (0,\varepsilon_\rho(x)],\ \text{a.e. } x \in \Omega)$.

For $t > 0$, let us set

$E_{\rho,t} \coloneqq \{x \in \Omega \mid t \le \varepsilon_\rho(x)\}$

and define

$\tilde u_{\rho,t}(x) \coloneqq \begin{cases} \xi(t,x) & \text{if } x \in E_{\rho,t}, \\ u(x) & \text{if } x \in \Omega\setminus E_{\rho,t}. \end{cases}$

Writing $\xi = (\xi_1,\dots,\xi_m)$ and $\Delta u = (\Delta u_1,\dots,\Delta u_m)$, we have from (19.4) that

(19.5)  $\dfrac{|\xi_j(t,x) - \xi_j(0,x) - \Delta u_j(x)t|}{t} \le \rho \qquad (j = 1,\dots,m,\ t \in (0,\varepsilon_\rho(x)] \text{ for a.e. } x \in \Omega)$.

Therefore, using the elementary inequality $(a+b)^2 \le 2a^2 + 2b^2$, we obtain

(19.6)  $\|\tilde u_{\rho,t} - u\|^2_{L^{\vec p}} = \sum_{j=1}^m \|[\tilde u_{\rho,t} - u]_j\|^2_{L^{p_j}} \le \sum_{j=1}^m \Bigl(\int_\Omega t^{p_j}(\rho + |\Delta u_j(x)|)^{p_j}\,dx\Bigr)^{2/p_j} \le \sum_{j=1}^m \bigl(t\rho\,\mathcal{L}(\Omega)^{1/p_j} + t\|\Delta u_j\|_{L^{p_j}}\bigr)^2 \le 2t^2\sum_{j=1}^m \bigl(\rho\,\mathcal{L}(\Omega)^{1/p_j}\bigr)^2 + 2t^2\|\Delta u\|^2_{L^{\vec p}}$.

Similarly, (19.5) and the same elementary inequality together with Minkowski's inequality in the form $(a^p + b^p)^{1/p} \le |a| + |b|$ yield

(19.7)  $\dfrac{\|\tilde u_{\rho,t} - u - t\Delta u\|^2_{L^{\vec p}}}{t^2} = \sum_{j=1}^m \frac{1}{t^2}\Bigl(\int_{E_{\rho,t}} |\xi_j(t,x) - \xi_j(0,x) - t\Delta u_j(x)|^{p_j}\,dx + \int_{\Omega\setminus E_{\rho,t}} |\Delta u_j(x)t|^{p_j}\,dx\Bigr)^{2/p_j} \le \sum_{j=1}^m \bigl(\rho^{p_j}\mathcal{L}(\Omega) + \|\Delta u\mathbb{1}_{\Omega\setminus E_{\rho,t}}\|^{p_j}_{L^{\vec p}}\bigr)^{2/p_j} \le 2\sum_{j=1}^m \bigl(\rho\,\mathcal{L}(\Omega)^{1/p_j}\bigr)^2 + 2\|\Delta u\mathbb{1}_{\Omega\setminus E_{\rho,t}}\|^2_{L^{\vec p}}$.
Now for each $k \in \mathbb{N}$, we can find $t_k \to 0$ such that $\|\Delta u\mathbb{1}_{\Omega\setminus E_{1/k,t_k}}\|_{L^{\vec p}} \le 1/k$. This follows from Lebesgue's dominated convergence theorem and the fact that $\mathcal{L}(\Omega\setminus E_{\rho,t}) \to 0$ as $t \to 0$. The estimates (19.6) and (19.7) with $\rho = 1/k$ and $t = t_k$ thus show for $u_k \coloneqq \tilde u_{1/k,t_k}$ that $u_k \to u$ and $(u_k - u)/t_k \to \Delta u$, i.e., $\Delta u \in T_M(u)$. □
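A standard special case (our addition) is the sign-constraint set with $m = 1$ and $C(x) = [0,\infty)$; each $C(x)$ is closed and convex, hence geometrically derivable, and (19.3) reduces to the familiar formula:

```latex
\begin{align*}
 M &= \{u \in L^p(\Omega) \mid u(x) \ge 0 \text{ for a.e. } x \in \Omega\},\\
 T_{C(x)}(u(x)) &= \begin{cases} [0,\infty) & \text{if } u(x) = 0,\\
                                 \mathbb{R} & \text{if } u(x) > 0,\end{cases}\\
 T_M(u) &= \{\Delta u \in L^p(\Omega) \mid \Delta u(x) \ge 0
            \text{ for a.e. } x \text{ with } u(x) = 0\}.
\end{align*}
```

Geometric derivability of each $C(x)$ here follows from Lemma 19.1: for convex sets the quotients $(C(x) - z)/\tau$ increase as $\tau \to 0$, so their inner limit is the full tangent cone.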
We next consider the Fréchet normal cone.

Theorem 19.6. Let $M \subset L^{\vec p}(\Omega)$ be pointwise derivable. Then for every $u \in M$,

(19.8)  $\widehat N_M(u) = \bigl\{ u^* \in L^{\vec p^*}(\Omega) \bigm| u^*(x) \in \widehat N_{C(x)}(u(x)) \text{ for a.e. } x \in \Omega \bigr\}$.
Proof. Recalling the definition of $\widehat N_M(u)$ from (18.7), we need to find all $u^* \in L^{\vec p^*}(\Omega)$ satisfying for every given sequence $M \ni u_k \to u$

(19.9)  $0 \ge \limsup_{k\to\infty} \dfrac{\langle u^*, u_k - u\rangle_{L^{\vec p}}}{\|u_k - u\|_{L^{\vec p}}} \eqqcolon \limsup_{k\to\infty} \delta_k$.

Let $\varepsilon > 0$ be arbitrary and set $v_k \coloneqq u_k - u$ as well as

(19.10a)  $A^1_k \coloneqq \{x \in \Omega \mid |v_k(x)|_2 \le \varepsilon^{-1}\|v_k\|_{L^{\vec p}}\} \qquad (k \in \mathbb{N})$.

Furthermore, let $A^2 \subset \Omega$ be such that

(19.10b)  $u^*$ is bounded on $A^2$,

(19.10c)  $\mathcal{L}(\Omega \setminus A^2) \le \varepsilon$.

Using Hölder's inequality, (19.10a) and (19.10c), we then estimate

$\delta_k = \dfrac{\int_{\Omega\setminus(A^1_k\cap A^2)} \langle u^*(x), v_k(x)\rangle_2\,dx}{\|v_k\|_{L^{\vec p}}} + \dfrac{\int_{A^1_k\cap A^2} \langle u^*(x), v_k(x)\rangle_2\,dx}{\|v_k\|_{L^{\vec p}}} \le \|\mathbb{1}_{\Omega\setminus(A^1_k\cap A^2)}u^*\|_{L^{\vec p^*}} + \int_{A^1_k\cap A^2} \dfrac{\langle u^*(x), v_k(x)\rangle_2}{|v_k(x)|_2}\cdot\dfrac{|v_k(x)|_2}{\|v_k\|_{L^{\vec p}}}\,dx \le \|\mathbb{1}_{\Omega\setminus(A^1_k\cap A^2)}u^*\|_{L^{\vec p^*}} + \varepsilon^{-1}\int_{A^2} \max\Bigl\{0, \dfrac{\langle u^*(x), v_k(x)\rangle_2}{|v_k(x)|_2}\Bigr\}\,dx$.

If now for almost every $x \in \Omega$ we have that $u^*(x) \in \widehat N_{C(x)}(u(x))$, then (passing to a subsequence such that $u_k(x) \to u(x)$ almost everywhere) also $\limsup_{k\to\infty}\max\{0, \langle u^*(x), v_k(x)\rangle_2/|v_k(x)|_2\} = 0$ for almost every $x \in \Omega$. It follows using (19.10b) and the reverse Fatou inequality in the previous estimate that

(19.11)  $\limsup_{k\to\infty} \delta_k \le \limsup_{k\to\infty} \|\mathbb{1}_{\Omega\setminus(A^1_k\cap A^2)}u^*\|_{L^{\vec p^*}}$.

Since $|v_k(x)|_2 \ge \varepsilon^{-1}\|v_k\|_{L^{\vec p}}$ for $x \in \Omega\setminus A^1_k$, we have

$\|v_k\|_{L^{\vec p}} \ge \|\mathbb{1}_{\Omega\setminus A^1_k}v_k\|_{L^{\vec p}} \ge \bigl(\varepsilon^{-p}\mathcal{L}(\Omega\setminus A^1_k)\bigr)^{1/p}\|v_k\|_{L^{\vec p}}$.
Hence $\mathcal{L}(\Omega\setminus A^1_k) \le \varepsilon^p$ and $\mathcal{L}(\Omega\setminus(A^1_k\cap A^2)) \le \mathcal{L}(\Omega\setminus A^1_k) + \mathcal{L}(\Omega\setminus A^2) \le C\varepsilon$ for some constant $C > 0$ and small enough $\varepsilon > 0$. It therefore follows from Egorov's theorem that $\mathbb{1}_{\Omega\setminus(A^1_k\cap A^2)}u^*$ converge to $0$ in measure as $k \to \infty$. Since $u^* \in L^{\vec p^*}(\Omega)$ and $|\mathbb{1}_{\Omega\setminus(A^1_k\cap A^2)}u^*| \le |u^*|$, it follows from Vitali's convergence theorem (see, e.g., [Fonseca & Leoni 2007, Proposition 2.27]) that $\limsup_{k\to\infty} \|\mathbb{1}_{\Omega\setminus(A^1_k\cap A^2)}u^*\|_{L^{\vec p^*}} = 0$. Since $\varepsilon > 0$ was arbitrary, we deduce from (19.11) that (19.9) holds and, consequently,

$\widehat N_M(u) \supset \bigl\{u^* \in L^{\vec p^*}(\Omega) \bigm| u^*(x) \in \widehat N_{C(x)}(u(x)) \text{ for a.e. } x \in \Omega\bigr\}$.

This proves one direction of (19.8), which therefore holds even without geometric derivability.

For the converse inclusion, let $u^* \in \widehat N_M(u)$. We have to show that $u^*(x) \in \widehat N_{C(x)}(u(x))$ for almost every $x \in \Omega$, which we do by contradiction. Assume therefore that the pointwise inclusion does not hold. By the polarity relationship $\widehat N_{C(x)}(u(x)) = T_{C(x)}(u(x))^\circ$ from Lemma 18.10, we can find $\delta > 0$ and a Borel set $E \subset \Omega$ of finite positive Lebesgue measure such that for each $x \in E$, there exists $w(x) \in T_{C(x)}(u(x))$ with $|w(x)|_2 = 1$ and $\langle u^*(x), w(x)\rangle_2 \ge \delta$. We may without loss of generality assume that the tangent vector $w(x)$ is derivable at $u(x)$ for every $x \in E$, i.e., for each $x \in E$ there exists a curve $\xi(\,\cdot\,,x) \colon [0,\varepsilon(x)] \to C(x)$ such that $\xi'_+(0,x) = w(x)$ and $\xi(0,x) = u(x)$. Let now $\rho \in (0,\delta)$ be arbitrary. By replacing $E$ by a subset of positive measure, we may by Egorov's theorem assume the existence of $\varepsilon > 0$ such that

(19.12)  $|\xi(t,x) - \xi(0,x) - w(x)t|_2 \le \rho t \qquad (t \in [0,\varepsilon],\ x \in E)$.

Let us define

$\tilde u_t(x) \coloneqq \begin{cases} \xi(t,x) & \text{if } x \in E, \\ u(x) & \text{if } x \in \Omega\setminus E. \end{cases}$

Setting $v_t \coloneqq \tilde u_t - u$, we have $v_t(x) = \xi(t,x) - \xi(0,x)$ for $x \in E$ and $v_t(x) = 0$ for $x \in \Omega\setminus E$. Therefore, writing $v_t = (v_{t,1},\dots,v_{t,m})$, $w = (w_1,\dots,w_m)$, and $\xi = (\xi_1,\dots,\xi_m)$, we obtain using (19.12) for $t \in (0,\varepsilon]$ and some $C' > 0$ that

$\|v_t\|^2_{L^{\vec p}} = \sum_{j=1}^m \Bigl(\int_E |\xi_j(t,x) - \xi_j(0,x)|^{p_j}\,dx\Bigr)^{2/p_j} \le \sum_{j=1}^m \Bigl(\int_E (|w_j(x)|t + \rho t)^{p_j}\,dx\Bigr)^{2/p_j} \le C' t^2$.

Likewise,

$\langle u^*(x), v_t(x)\rangle_2 \ge \langle u^*(x), w(x)t\rangle_2 - |u^*(x)|_2 \cdot |\xi(t,x) - \xi(0,x) - w(x)t|_2 \ge \delta t - \rho t$.

It follows that

$\limsup_{t\to 0} \int_E \dfrac{\langle u^*(x), v_t(x)\rangle_2}{\|v_t\|_{L^{\vec p}}}\,dx \ge \limsup_{t\to 0} \dfrac{\mathcal{L}(E)(\delta t - \rho t)}{\sqrt{C'}\,t} = \dfrac{\mathcal{L}(E)(\delta - \rho)}{\sqrt{C'}} > 0$.
Taking $u_k \coloneqq \tilde u_{1/k}$ for $k \in \mathbb{N}$, we obtain $\limsup_{k\to\infty}\delta_k > 0$ and therefore $u^* \notin \widehat N_M(u)$. By contraposition, this shows that $u^*(x) \in \widehat N_{C(x)}(u(x))$ for almost every $x \in \Omega$. □
We can now derive a similar polarity relationships as to the nite-dimensional one inLemma 18.10.
Corollary 19.7. Let๐ โ ๐ฟ ยฎ๐ (ฮฉ) be pointwise derivable and ๐ข โ ๐ . Then ๐๐ (๐ข) = ๐๐ (๐ข)โฆ.
Proof. By Theorems 19.5 and 19.6 and Lemma 18.10, we have
(19.13) ๐ขโ โ ๐๐ (๐ข) โ ๐ขโ(๐ฅ) โ ๐๐ถ (๐ฅ) (๐ข (๐ฅ)) (a.e. ๐ฅ โ ฮฉ)โ ใ๐ขโ(๐ฅ),ฮ๐ข (๐ฅ)ใ2 โค 0 (a.e. ๐ฅ โ ฮฉ when ฮ๐ข (๐ฅ) โ ๐๐ถ (๐ฅ) (๐ข (๐ฅ)))โ ใ๐ขโ,ฮ๐ขใ๐ฟ ยฎ๐ โค 0 (when ฮ๐ข โ ๐๐ (๐ข))โ ๐ขโ โ ๐๐ (๐ข)โฆ.
Hence N̂_U(u) ⊂ T_U(u)°.

For the converse inclusion, we need to improve the implications in (19.13) to equivalences. We argue by contradiction. Assume that u* ∈ T_U(u)° and that there exist some Δu ∈ T_U(u) and a subset E ⊂ Ω with L(E) > 0 and

    ⟨u*(x), Δu(x)⟩_2 > 0    (x ∈ E).

Taking Δũ(x) := (1 + t𝟙_E(x))Δu(x) ∈ T_U(u), we obtain for sufficiently large t that ⟨u*, Δũ⟩_{L^p⃗} > 0. This contradicts u* ∈ T_U(u)°. Hence N̂_U(u) ⊃ T_U(u)°. □
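The polarity between pointwise tangents and normals can be sanity-checked numerically in a discretized setting. The following sketch is our own illustration (all names are hypothetical, not from the text): it takes the box U = [−1, 1]^m ⊂ ℝ^m as a stand-in for a pointwise-defined set and verifies ⟨u*, Δu⟩ ≤ 0 for coordinatewise normals u* against coordinatewise tangents Δu, as the polarity of Corollary 19.7 predicts in finite dimensions.

```python
import numpy as np

def tangent_box(u, du, tol=1e-12):
    """Coordinatewise tangent-cone test for U = [-1,1]^m:
    du_i <= 0 where u_i = 1 and du_i >= 0 where u_i = -1."""
    upper = np.abs(u - 1.0) <= tol
    lower = np.abs(u + 1.0) <= tol
    return bool(np.all(du[upper] <= tol) and np.all(du[lower] >= -tol))

def normal_box(u, us, tol=1e-12):
    """Coordinatewise normal-cone test: us_i >= 0 where u_i = 1,
    us_i <= 0 where u_i = -1, and us_i = 0 where |u_i| < 1."""
    interior = np.abs(u) < 1.0 - tol
    upper = np.abs(u - 1.0) <= tol
    lower = np.abs(u + 1.0) <= tol
    return bool(np.all(np.abs(us[interior]) <= tol)
                and np.all(us[upper] >= -tol) and np.all(us[lower] <= tol))

# Spot-check the polarity <u*, du> <= 0 for pointwise normals against tangents.
rng = np.random.default_rng(0)
u = np.array([1.0, -1.0, 0.3, 1.0])
us = np.array([2.0, -0.5, 0.0, 0.0])   # a pointwise normal direction
assert normal_box(u, us)
for _ in range(100):
    du = rng.standard_normal(4)
    du[0] = -abs(du[0]); du[1] = abs(du[1]); du[3] = -abs(du[3])  # force tangency
    assert tangent_box(u, du)
    assert us @ du <= 1e-9
```

The constraint pattern mirrors (19.13): the pairing is nonpositive exactly because each coordinate pairs a normal direction with a tangent direction of the interval [−1, 1].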
the limiting cones
For the limiting cones, we in general only have an inclusion of the pointwise cones.
Theorem 19.8. Let U ⊂ L^p⃗(Ω) be pointwise derivable. Then for every u ∈ U,

    T̂_U(u) ⊃ { Δu ∈ L^p⃗(Ω) | Δu(x) ∈ T̂_{C(x)}(u(x)) for a.e. x ∈ Ω }.
Proof. Let Δu ∈ L^p⃗(Ω) with Δu(x) ∈ T̂_{C(x)}(u(x)) for almost every x ∈ Ω and let u_n → u in L^p⃗(Ω). In particular, we then have u_n(x) → u(x) for almost every x ∈ Ω. Furthermore, by the inner limit characterization of T̂_{C(x)}(u(x)) in Corollary 18.20, there exist Δũ_n(x) ∈ T_{C(x)}(u_n(x)) with Δũ_n(x) → Δu(x). Egorov's theorem then yields for all ℓ ≥ 1 a Borel-measurable set E_ℓ ⊂ Ω such that L(Ω \ E_ℓ) < 1/ℓ and Δũ_n → Δu uniformly on E_ℓ. Since T_{C(x)}(u_n(x)) is a cone, we have 0 ∈ T_{C(x)}(u_n(x)). It follows that

    T_{C(x)}(u_n(x)) ∋ Δu_{ℓ,n}(x) := 𝟙_{E_ℓ}(x)Δũ_n(x).
In particular, (19.3) shows that Δu_{ℓ,n} ∈ T_U(u_n) with Δu_{ℓ,n} → Δu_ℓ := Δu𝟙_{E_ℓ} in L^p⃗(Ω) as n → ∞. By Vitali's convergence theorem (compare the proof of Theorem 19.6), Δu𝟙_{E_ℓ} → Δu in L^p⃗(Ω) as ℓ → ∞. Therefore, we may extract a diagonal subsequence {Δũ_n := Δu_{ℓ_n,n}}_{n≥1} of {Δu_{ℓ,n}}_{n,ℓ≥1} such that Δũ_n → Δu. Since u_n → u was arbitrary and Δũ_n ∈ T_U(u_n), we deduce that Δu ∈ T̂_U(u). □
Theorem 19.9. Let U ⊂ L^p⃗(Ω) be pointwise derivable. Then for every u ∈ U,

    N_U(u) ⊃ { u* ∈ L^p⃗'(Ω) | u*(x) ∈ N_{C(x)}(u(x)) for a.e. x ∈ Ω }.
Proof. Let u* ∈ L^p⃗'(Ω) with u*(x) ∈ N_{C(x)}(u(x)) for almost every x ∈ Ω. Then by definition, for almost all x ∈ Ω there exist C(x) ∋ ũ_n(x) → u(x) as well as N̂_{C(x)}(ũ_n(x)) ∋ ũ*_n(x) → u*(x). By Egorov's theorem, for every ℓ ≥ 1 there exists a Borel-measurable set E_ℓ ⊂ Ω such that L(Ω \ E_ℓ) < 1/ℓ and ũ*_n → u* as well as ũ_n → u uniformly on E_ℓ. We set u_{ℓ,n} := 𝟙_{E_ℓ}ũ_n + (1 − 𝟙_{E_ℓ})u and u*_{ℓ,n} := 𝟙_{E_ℓ}ũ*_n. Then u*_{ℓ,n}(x) ∈ N̂_{C(x)}(u_{ℓ,n}(x)) for almost every x ∈ Ω. By Vitali's convergence theorem (compare the proof of Theorem 19.6), both u_{ℓ,n} → u in L^p⃗(Ω) and u*_{ℓ,n} → u*_ℓ in L^p⃗'(Ω) for u*_ℓ := 𝟙_{E_ℓ}u*. Since u*_ℓ → u* in L^p⃗'(Ω), we can extract a diagonal subsequence of {(u_{ℓ,n}, u*_{ℓ,n})}_{ℓ,n≥1} to deduce that u* ∈ N_U(u). □
If the pointwise sets C(x) are regular, we have the following polarity between the cones to the pointwise-defined set U.
Lemma 19.10. Let U ⊂ L^p⃗(Ω) be pointwise derivable and u ∈ U. If C(x) is regular at u(x) and closed near u(x) for almost every x ∈ Ω, then N_U(u) = T_U(u)°.
Proof. By the regularity of C(x) at u(x) for almost every x ∈ Ω and Theorem 19.6, we have

    N_U(u) = { u* ∈ L^p⃗'(Ω) | u*(x) ∈ N_{C(x)}(u(x)) for a.e. x ∈ Ω }.

By Theorem 18.15, T_{C(x)}(u(x))° = N_{C(x)}(u(x)) for almost every x ∈ Ω. Arguing as in the proof of Corollary 19.7, we thus obtain

    N_U(u)° = { Δu ∈ L^p⃗(Ω) | Δu(x) ∈ T_{C(x)}(u(x)) for a.e. x ∈ Ω }.

The regularity of C(x) also implies that T̂_{C(x)}(u(x)) = T_{C(x)}(u(x)) for almost every x ∈ Ω. The claims now follow from Theorem 19.5. □
We can use this result to transfer the regularity of C(x) to U.
Lemma 19.11. Let U ⊂ L^p⃗(Ω) be pointwise derivable and u ∈ U. If C(x) is regular at u(x) and closed near u(x) for almost every x ∈ Ω, then U is regular at u and

    T^w_U(u) = T_U(u) = T̂_U(u).
Proof. Since L^p⃗(Ω) is reflexive, we have N̂_U(u) = T^w_U(u)° by Lemma 18.10 (ii). This fact together with Lemma 19.10 and Theorems 1.8 and 18.5 shows that

    T^w_U(u) ⊂ T^w_U(u)°° = N_U(u)° = T_U(u) ⊂ T^w_U(u).

Furthermore, by the regularity and closedness assumptions, we obtain from Theorems 19.5 and 19.8 that T_U(u) = T̂_U(u), which also implies tangential regularity.

Since L^p⃗(Ω) for p⃗ ∈ (1, ∞)^m is reflexive and Gâteaux smooth, normal regularity follows from Theorem 18.25 together with Lemma 19.10. □
From this, we obtain pointwise expressions with equality. For the Clarke tangent cone, we only require local closedness of the underlying sets.
Theorem 19.12. Let U ⊂ L^p⃗(Ω) be pointwise derivable. If C(x) is closed near u(x) for almost every x ∈ Ω for every u ∈ U, then

    T̂_U(u) = { Δu ∈ L^p⃗(Ω) | Δu(x) ∈ T̂_{C(x)}(u(x)) for a.e. x ∈ Ω }.
Proof. The inclusion "⊃" was already shown in Theorem 19.8. To prove the converse inclusion when C(x) is closed near u(x) for almost every x ∈ Ω, we only need to observe from Lemma 18.12 and Theorem 19.9 that

    T̂_U(u) ⊂ N_U(u)° ⊂ { u* ∈ L^p⃗'(Ω) | u*(x) ∈ N_{C(x)}(u(x)) for a.e. x ∈ Ω }°
            = { Δu ∈ L^p⃗(Ω) | Δu(x) ∈ T̂_{C(x)}(u(x)) for a.e. x ∈ Ω },

where the last equality again follows from Theorem 18.15 together with an argument as in the proof of Corollary 19.7. □
For the limiting normal cone, however, we do require regularity.
Theorem 19.13. Let U ⊂ L^p⃗(Ω) be pointwise derivable. If C(x) is regular at u(x) and closed near u(x) for almost every x ∈ Ω, then for every u ∈ U,

    N_U(u) = { u* ∈ L^p⃗'(Ω) | u*(x) ∈ N_{C(x)}(u(x)) for a.e. x ∈ Ω }.
Proof. The inclusion "⊃" was already shown in Theorem 19.9. The converse inclusion for regular and closed C(x) follows from Lemma 19.11 and Theorem 19.6. □
Remark 19.14. Theorems 19.5 and 19.6 on the fundamental cones are based on [Clason & Valkonen 2017b]. Without regularity, the characterization of the limiting normal cone of a pointwise-defined set is much more delicate. A full characterization was given in [Mehlitz & Wachsmuth 2018; Mehlitz & Wachsmuth 2019], which showed that even for a closed nonconvex set, the limiting normal cone contains the convex hull of the strong limiting normal cone (where the limit is taken with respect to strong convergence instead of weak-* convergence) and is dense in the Dini normal cone T_{C(x)}°; in the words of the authors, it may be "unpleasantly large". This is due to an inherent convexifying effect of integration with respect to the Lebesgue measure.

A characterization of specific pointwise-defined sets in Sobolev spaces was derived in [Harder & Wachsmuth 2018], with similar conclusions.
20 DERIVATIVES AND CODERIVATIVES OF SET-VALUED MAPPINGS
We are now ready to differentiate set-valued mappings; as already discussed, these generalized derivatives are based on the tangent and normal cones of the previous Chapter 18. To account for the changed focus, we will slightly switch notation and use in this and the following chapters of Part IV uppercase letters for set-valued mappings and lowercase letters for scalar-valued functionals such that, e.g., F(x) = ∂f(x). We focus in this chapter on examples, basic properties, and relationships between the various derivative concepts. In the following Chapters 22 to 25, we then develop calculus rules for each of the different derivatives and coderivatives.
20.1 definitions
To motivate the following definitions, it is instructive to recall the geometric intuition behind the classical derivative of a scalar function f as the limit of a difference quotient: given an (infinitesimal) change Δx of the argument x, it gives the corresponding (infinitesimal) change Δy of the value y = f(x) required to stay on the graph of f. In other words, (Δx, Δy) is a tangent vector to graph f. For a proper set-valued mapping F, however, it is also possible to remain on the graph of F by varying y without changing x; it thus also makes sense to ask the "dual" question of, given a change Δy in image space, what change Δx in domain space is required to stay inside the graph of F. In geometric terms, the answer is given by Δx such that (Δx, Δy) is a normal vector to graph F. (Note that normal vectors point away from a set, while we are trying to correct by moving towards it. Recall also that (f′(x), −1) is normal to epi f for a smooth function f; see Figure 20.1 and compare Lemma 4.10 as well as Section 20.4 below.) In Banach spaces, of course, normal vectors are subsets of the dual space.
We thus distinguish
(i) graphical derivatives, which generalize classical derivatives and are based on tangent cones;
(ii) coderivatives, which generalize adjoint derivatives and are based on normal cones.
Figure 20.1: Illustration why the coderivatives negate y* in comparison to the normal cone. (In-figure labels: epi f; Δx = x*, Δy = f′(x)Δx; −y* = −x*/f′(x); N_graph f; T_graph f.)
In each case, we can use either basic or limiting cones, leading to four different definitions.
Specifically, let X, Y be Banach spaces and F: X ⇒ Y. Then we define
(i) the graphical derivative of F at x ∈ X for y ∈ Y as

    DF(x|y): X ⇒ Y,    DF(x|y)(Δx) := { Δy ∈ Y | (Δx, Δy) ∈ T_graph F(x, y) };

(ii) the Clarke graphical derivative of F at x ∈ X for y ∈ Y as

    D̂F(x|y): X ⇒ Y,    D̂F(x|y)(Δx) := { Δy ∈ Y | (Δx, Δy) ∈ T̂_graph F(x, y) };

(iii) the Fréchet coderivative of F at x ∈ X for y ∈ Y as

    D̂*F(x|y): Y* ⇒ X*,    D̂*F(x|y)(y*) := { x* ∈ X* | (x*, −y*) ∈ N̂_graph F(x, y) };

(iv) the (basic or limiting or Mordukhovich) coderivative of F at x ∈ X for y ∈ Y as

    D*F(x|y): Y* ⇒ X*,    D*F(x|y)(y*) := { x* ∈ X* | (x*, −y*) ∈ N_graph F(x, y) }.
Observe how the coderivatives operate from Y* to X*, while the derivatives operate from X to Y. It is crucial that these are defined directly via (possibly nonconvex) normal cones rather than via polarity from the corresponding graphical derivatives to avoid convexification. This will allow for sharper results involving these coderivatives.
We illustrate these definitions with the simplest example of a single-valued linear operator.
Example 20.1 (single-valued linear operators). Let F(x) := {Ax} for A ∈ 𝕃(X; Y) and u = (x, Ax) ∈ graph F. Note that graph F is a linear subspace of X × Y. Since graph F is regular by Corollary 18.27, both of the tangent cones are given by

    T_graph F(u) = T̂_graph F(u) = graph F = { (Δx, AΔx) ∈ X × Y | Δx ∈ X },
while the normal cones are given by

    N̂_graph F(u) = N_graph F(u) = { u* ∈ X* × Y* | u* ⊥ graph F }
        = { (x*, y*) ∈ X* × Y* | ⟨x*, Δx⟩_X + ⟨y*, AΔx⟩_Y = 0 for all Δx ∈ X }
        = { (A*y*, −y*) ∈ X* × Y* | y* ∈ Y* }.
This immediately yields the graphical derivatives

    DF(x|Ax)(Δx) = D̂F(x|Ax)(Δx) = {AΔx}

as well as the coderivatives

    D̂*F(x|y)(y*) = D*F(x|y)(y*) = {A*y*}.
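The normal cone computation behind Example 20.1 can be checked numerically. The following sketch (ours, with hypothetical names) verifies in ℝ^n that the pair (Aᵀy*, −y*) annihilates every tangent direction (Δx, AΔx) of graph F, so that Aᵀy* is indeed the coderivative value:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))   # a linear operator A in L(R^4; R^3)
ystar = rng.standard_normal(3)
xstar = A.T @ ystar               # the claimed coderivative value A* y*

for _ in range(100):
    dx = rng.standard_normal(4)
    # <x*, dx> + <-y*, A dx> = 0 for every tangent direction (dx, A dx):
    assert abs(xstar @ dx - ystar @ (A @ dx)) < 1e-9
```

This is simply the adjoint identity ⟨Aᵀy*, Δx⟩ = ⟨y*, AΔx⟩, which is exactly the orthogonality condition u* ⊥ graph F computed in the example.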
Using (18.1), we can also write the graphical derivative as

(20.1)    DF(x|y)(Δx) = lim sup_{t↘0, Δx̃→Δx} (F(x + tΔx̃) − y)/t,

since

    (Δx, Δy) ∈ lim sup_{t↘0} (graph F − (x, y))/t

if and only if there exist t_n ↘ 0 and x_n such that

(20.2)    Δx = lim_{n→∞} (x_n − x)/t_n    and    Δy ∈ lim sup_{n→∞} (F(x_n) − y)/t_n.

The former forces x_n = x + t_nΔx_n for Δx_n → Δx, so the latter gives (20.1).
In infinite-dimensional spaces, we also have to distinguish the weak graphical derivative D_wF(x|y) and the ε-coderivative D*_εF(x|y), both constructed analogously from the weak tangent cone T^w_graph F(x, y) and the ε-normal cone N^ε_graph F(x, y), respectively. However, we will not be working directly with these and instead switch to the setting of the corresponding cones when they would be needed.
Remark 20.2 (a much too brief history of various (co)derivatives). As for the various tangent and normal cones, the (more recent) development of derivatives and coderivatives of set-valued mappings is convoluted, and we do not attempt to give a full account, instead referring to the commentaries to [Rockafellar & Wets 1998, Chapter 8], [Mordukhovich 2006, Chapter 1.4.12], and [Mordukhovich 2018, Chapter 1].
The graphical derivative goes back to Aubin [Aubin 1981], who also introduced the Clarke graphical derivative (under the name circatangent derivative) in [Aubin 1984]. Coderivatives based on normal cones were mainly treated there for mappings whose graphs are convex, for which these cones can be defined as polars of the appropriate tangent cones. Graphical derivatives were further studied in
[Thibault 1983]. In parallel, Mordukhovich introduced the (nonconvex) limiting coderivative via his limiting normal cone in [Morduhovič 1980], again stressing the need for a genuinely nonconvex direct construction. The term coderivative was coined by Ioffe, who was the first to study these mappings systematically in [Ioffe 1984].
20.2 basic properties
We now translate various results of Chapter 18 on tangent and normal cones to the setting of graphical derivatives and coderivatives. From Theorem 18.5, we immediately obtain
Corollary 20.3. For F: X ⇒ Y, x ∈ X, and y ∈ Y, we have the inclusions

(i) D̂F(x|y)(Δx) ⊂ DF(x|y)(Δx) ⊂ D_wF(x|y)(Δx) for all Δx ∈ X;
(ii) D̂*F(x|y)(y*) ⊂ D*F(x|y)(y*) for all y* ∈ Y*.
Similarly, we obtain from Theorem 18.8 the following outer semicontinuity and convexity properties.
Corollary 20.4. For F: X ⇒ Y, x ∈ X, and y ∈ Y,

(i) DF(x|y), D̂F(x|y), and D̂*F(x|y) are closed;
(ii) if X and Y are finite-dimensional, then D*F(x|y) is closed;
(iii) D̂F(x|y) and D̂*F(x|y) are convex.
Graphical derivatives and coderivatives behave completely symmetrically with respect to inversion of a set-valued mapping (which we recall is always possible in the sense of preimages).
Lemma 20.5. Let F: X ⇒ Y, x ∈ X, and y ∈ Y. Then

    Δy ∈ DF(x|y)(Δx) ⟺ Δx ∈ DF⁻¹(y|x)(Δy),
    Δy ∈ D̂F(x|y)(Δx) ⟺ Δx ∈ D̂F⁻¹(y|x)(Δy),
    x* ∈ D̂*F(x|y)(y*) ⟺ −y* ∈ D̂*F⁻¹(y|x)(−x*),
    x* ∈ D*F(x|y)(y*) ⟺ −y* ∈ D*F⁻¹(y|x)(−x*).
Proof. We have

    Δy ∈ DF(x|y)(Δx) ⟺ (Δx, Δy) ∈ T_graph F(x, y)
                     ⟺ (Δy, Δx) ∈ T_graph F⁻¹(y, x)
                     ⟺ Δx ∈ DF⁻¹(y|x)(Δy).

The proof for the regular derivative and the coderivatives is completely analogous. □
adjoints of set-valued mappings
From the various relations between normal and tangent cones, we obtain corresponding relations between these derivatives. To state these relationships, we need to introduce the upper and lower adjoints of set-valued mappings. Let H: X ⇒ Y be a set-valued mapping. Then the upper adjoint of H is defined as

    H°+(y*) := { x* | ⟨x*, x⟩_X ≤ ⟨y*, y⟩_Y for all y ∈ H(x), x ∈ X },

and the lower adjoint of H as

    H°−(y*) := { x* | ⟨x*, x⟩_X ≥ ⟨y*, y⟩_Y for all y ∈ H(x), x ∈ X }.
As the next example shows, these notions generalize the definition of the adjoint of a linear operator.
Example 20.6 (upper and lower adjoints of linear mappings). Let H(x) := {Ax} for A ∈ 𝕃(X; Y). Then

    H°+(y*) = { x* ∈ X* | ⟨x*, x⟩_X ≤ ⟨y*, y⟩_Y for all y = Ax, x ∈ X }
            = { x* ∈ X* | ⟨x*, x⟩_X ≤ ⟨y*, Ax⟩_Y for all x ∈ X }
            = { x* ∈ X* | ⟨x* − A*y*, x⟩_X ≤ 0 for all x ∈ X }
            = {A*y*}.

Similarly, H°−(y*) = {A*y*}.
For solution mappings of linear equations, we have the following adjoints.
Example 20.7 (upper and lower adjoints of solution maps to linear equations). Let H(x) := { y | Ay = x } for A ∈ 𝕃(Y; X). Then

    H°+(y*) = { x* | ⟨x*, x⟩ ≤ ⟨y*, y⟩ for all Ay = x, x ∈ X }.

If y* ∉ ran A*, then since ran A* ⊥ ker A, there exists y₀ ∈ ker A with ⟨y*, y₀⟩ ≠ 0, so for every x* ∈ X* and x ∈ X we can choose y ∈ Y such that the above condition is not satisfied. Therefore H°+(y*) = ∅. Otherwise, if y* = A*x̄*, we continue to calculate

    H°+(y*) = { x* ∈ X* | ⟨x*, x⟩_X ≤ ⟨x̄*, x⟩_X for all x ∈ ran A } = { x* ∈ X* | A*x* = A*x̄* }.

Therefore

    H°+(y*) = { x* ∈ X* | A*x* = y* }.

A similar argument shows that H°−(y*) = H°+(y*).
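The two branches of Example 20.7 can be illustrated numerically; the sketch below is our own (names hypothetical). For y* = Aᵀx̄* the defining inequality of the upper adjoint holds with equality along every solution y ∈ H(x), while a y* with a nonzero component along ker A makes the inequality unbounded below:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))   # A in L(R^5; R^3), wide, so ker A != {0}
xbar = rng.standard_normal(3)
ystar = A.T @ xbar                # ensures y* lies in ran A*

for _ in range(100):
    y = rng.standard_normal(5)
    x = A @ y                     # so that y is a solution in H(x)
    # <xbar, x> = <A* xbar, y> = <y*, y>: the upper-adjoint inequality with equality
    assert abs(xbar @ x - ystar @ y) < 1e-9

# Conversely, a kernel direction detects y* outside ran A*:
y0 = np.linalg.svd(A)[2][3:][0]   # a unit vector spanning part of ker A (rank 3)
ybad = ystar + y0                 # has a nonzero component along ker A
# along y + t*y0 (all still solutions of Ay = x), <ybad, y + t y0> is
# unbounded below in t, so H°+(ybad) = emptyset.
assert abs(ybad @ y0) > 1e-9
```

The first loop is the finite-dimensional content of the step y* = A*x̄*; the kernel test mirrors the argument that ran A* ⊥ ker A.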
These examples and Example 20.1 suggest the adjoint relationships of the next corollary. Note that in infinite-dimensional spaces, we only have a relationship between the limiting derivatives, i.e., between the Clarke graphical derivative and the limiting coderivative.
Corollary 20.8. Let X, Y be Banach spaces and F: X ⇒ Y.

(i) If X and Y are finite-dimensional, then

    D̂*F(x|y) = DF(x|y)°+.

(ii) If X and Y are reflexive and Gâteaux smooth (in particular, if they are finite-dimensional), and graph F is closed near (x, y), then

    D̂F(x|y) = D*F(x|y)°−.
Proof. (i): Identifying X* with X and Y* with Y in finite dimensions, we have by definition that

    DF(x|y)(Δx) = { Δy ∈ Y | (Δx, Δy) ∈ T_graph F(x, y) }

and

    D̂*F(x|y)(Δy) = { Δx ∈ X | (Δx, −Δy) ∈ N̂_graph F(x, y) }.

Using Lemma 18.10 (iii), we then see that

    x* ∈ DF(x|y)°+(y*) ⟺ ⟨x*, Δx⟩_X ≤ ⟨y*, Δy⟩_Y for Δy ∈ DF(x|y)(Δx)
                       ⟺ ⟨x*, Δx⟩_X + ⟨−y*, Δy⟩_Y ≤ 0 for (Δx, Δy) ∈ T_graph F(x, y)
                       ⟺ (x*, −y*) ∈ T_graph F(x, y)° = N̂_graph F(x, y)
                       ⟺ x* ∈ D̂*F(x|y)(y*).

This proves the claim.

(ii): We proceed analogously to (i) using Theorem 18.19 (or Theorem 18.15 if X and Y are finite-dimensional):

    Δy ∈ D*F(x|y)°−(Δx) ⟺ ⟨y*, Δy⟩_Y ≥ ⟨x*, Δx⟩_X for x* ∈ D*F(x|y)(y*)
                        ⟺ ⟨x*, Δx⟩_X + ⟨−y*, Δy⟩_Y ≤ 0 for (x*, −y*) ∈ N_graph F(x, y)
                        ⟺ (Δx, Δy) ∈ N_graph F(x, y)° = T̂_graph F(x, y)
                        ⟺ Δy ∈ D̂F(x|y)(Δx). □
limiting characterizations in finite dimensions
In finite dimensions, we can characterize the limiting coderivative and the Clarke graphical derivative directly as outer and inner limits, respectively.
Corollary 20.9. Let X and Y be finite-dimensional and F: X ⇒ Y. Then for all (x, y) ∈ X × Y and all y* ∈ Y,

(20.3)    D*F(x|y)(y*) = { x* ∈ X | there exist graph F ∋ (x̃, ỹ) → (x, y) and (x̃*, ỹ*) → (x*, y*) with x̃* ∈ D̂*F(x̃|ỹ)(ỹ*) }.

If graph F is closed near (x, y), then for all Δx ∈ X,

(20.4)    D̂F(x|y)(Δx) = { Δy ∈ Y | for all graph F ∋ (x̃, ỹ) → (x, y) there exist (Δx̃, Δỹ) → (Δx, Δy) with Δỹ ∈ DF(x̃|ỹ)(Δx̃) }.
Proof. The characterization (20.3) of the limiting coderivative is a direct application of the definition of the limiting normal cone (18.3) as an outer limit of the Fréchet normal cones. The characterization (20.4) of the Clarke graphical derivative follows from the characterization of Corollary 18.20 of the Clarke tangent cone as an inner limit of (basic) tangent cones. □
regularity
Based on the regularity concepts of sets from Section 18.4, we can define concepts of regularity of set-valued mappings. We say that F at (x, y) ∈ graph F (or at x for y ∈ F(x)) is

(i) T-regular if D̂F(x|y) = DF(x|y) (i.e., if graph F has tangential regularity);

(ii) N-regular if D̂*F(x|y) = D*F(x|y) (i.e., if graph F has normal regularity).

If F is both T- and N-regular at (x, y), we say that F is graphically regular.
From Theorem 18.25, we immediately obtain the following characterization of N-regularity.

Corollary 20.10. Let X, Y be reflexive and Gâteaux smooth Banach spaces, F: X ⇒ Y, and let (x, y) ∈ graph F with graph F closed near (x, y). Then F is N-regular at (x, y) if and only if DF(x|y) = [D*F(x|y)]°−.
Writing out various alternatives of Theorem 18.23 for set-valued mappings, we obtain full equivalence of the notions and alternative characterizations in finite dimensions.
Corollary 20.11. Let X, Y be finite-dimensional and F: X ⇒ Y. If graph F is closed near (x, y), then the following conditions are equivalent:

(i) F is N-regular at x for y, i.e., D̂*F(x|y) = D*F(x|y);

(ii) F is T-regular at x for y, i.e., D̂F(x|y) = DF(x|y);

(iii) D̂*F(x|y)(y*) ⊃ { x* ∈ X | there exist graph F ∋ (x̃, ỹ) → (x, y) and (x̃*, ỹ*) → (x*, y*) with x̃* ∈ D̂*F(x̃|ỹ)(ỹ*) };

(iv) DF(x|y)(Δx) ⊂ { Δy ∈ Y | for all graph F ∋ (x̃, ỹ) → (x, y) there exist (Δx̃, Δỹ) → (Δx, Δy) with Δỹ ∈ DF(x̃|ỹ)(Δx̃) }.

In particular, if any of these hold, F is graphically regular at x for y.
20.3 examples
As the following examples demonstrate, the graphical derivatives and coderivatives generalize classical (sub)differentials.
single-valued mappings and their inverses
For the Clarke graphical derivative and the limiting coderivative (which are obtained as inner or outer limits), we have to require, just as for the Clarke subdifferential in Theorem 13.5, slightly more than just Fréchet differentiability.
Theorem 20.12. Let X, Y be Banach spaces and let F: X → Y be single-valued and Fréchet differentiable at x ∈ X. Then

    DF(x|y)(Δx) = {F′(x)Δx}  if y = F(x),    DF(x|y)(Δx) = ∅  otherwise,

and

    D̂*F(x|y)(y*) = {F′(x)*y*}  if y = F(x),    D̂*F(x|y)(y*) = ∅  otherwise.

If F is continuously Fréchet differentiable at x, then F is graphically regular at x for F(x), and hence the corresponding expressions also hold for D̂F(x|y) and D*F(x|y).
Proof. The graphical derivative: We have (Δx, Δy) ∈ T_graph F(x, y) if and only if for some x_n → x, y_n ∈ F(x_n), and t_n ↘ 0 there holds

(20.5a)    Δx = lim_{n→∞} (x_n − x)/t_n =: lim_{n→∞} Δx_n

and

(20.5b)    Δy = lim_{n→∞} (y_n − y)/t_n = lim_{n→∞} (F(x + t_nΔx_n) − F(x))/t_n.

If Δx_n = 0 for all sufficiently large n ∈ ℕ, clearly both Δx = 0 and Δy = 0. This satisfies the claimed expression. So we may assume that Δx_n ≠ 0 for all n ∈ ℕ. In this case, (20.5b) holds if and only if

    lim_{n→∞} (F(x + h_n) − F(x) − t_nΔy_n)/‖h_n‖_X = 0

for h_n := t_nΔx_n and any Δy_n → Δy. Since F is Fréchet differentiable, this clearly holds with

    Δy_n := t_n⁻¹F′(x)h_n = F′(x)Δx_n → F′(x)Δx =: Δy.

This shows that DF(x|y)(Δx) = {F′(x)Δx}.
This shows that ๐ท๐น (๐ฅ |๐ฆ) (ฮ๐ฅ) = {๐น โฒ(๐ฅ)ฮ๐ฅ}.The Clarke graphical derivative: To calculate ๐ท๐น (๐ฅ |๐ฆ), we have to nd all ฮ๐ฅ and ฮ๐ฆ suchthat for every ๐๐โ 0 and (๐ฅ๐ , ๏ฟฝ๏ฟฝ๐) โ (๐ฅ, ๐ฆ) with ๏ฟฝ๏ฟฝ๐ = ๐น (๐ฅ๐), there exists ๐ฅ๐ โ ๐ฅ with
ฮ๐ฅ = lim๐โโ
๐ฅ๐ โ ๐ฅ๐๐๐
and ฮ๐ฆ = lim๐โโ
๐น (๐ฅ๐) โ ๐น (๐ฅ๐)๐๐
.
Setting ๐ฅ๐ = ๐ฅ๐ + ๐๐ฮ๐ฅ๐ with ฮ๐ฅ๐ โ ฮ๐ฅ , the second condition becomes
ฮ๐ฆ = lim๐โโ
๐น (๐ฅ๐ + ๐๐ฮ๐ฅ๐) โ ๐น (๐ฅ๐)๐๐
.
Taking ๐ฅ๐ = ๐ฅ , arguing as for ๐ท๐น shows that ฮ๐ฆ = ๐น โฒ(๐ฅ)ฮ๐ฅ is the only candidate. It justremains to show that any choice of ๐ฅ๐ gives the same limit, i.e., that
lim๐โโ
๐น (๐ฅ๐ + ๐๐ฮ๐ฅ๐) โ ๐น (๐ฅ๐) โ ๐๐๐น โฒ(๐ฅ)ฮ๐ฅ๐๐
= 0.
But this follows from the assumed continuous dierentiability using Lemma 13.22. Thusfor ๐ฆ = ๐น (๐ฅ),
๐ท๐น (๐ฅ |๐ฆ) (ฮ๐ฅ) = {๐น โฒ(๐ฅ)ฮ๐ฅ} = ๐ท๐น (๐ฅ |๐ฆ) (ฮ๐ฅ).This shows that ๐น is T-regular at ๐ฅ for ๐ฆ .
The Fréchet coderivative: The claim follows from proving that

(20.6)    D*_εF(x|y)(y*) = 𝔹(F′(x)*y*, ε)  if y = F(x),  and ∅ otherwise.

To show this, we note that x* ∈ D*_εF(x|y)(y*) if and only if for every sequence x_n → x with F(x_n) → F(x),

    lim sup_{n→∞} (⟨x*, x_n − x⟩_X − ⟨y*, F(x_n) − F(x)⟩_Y)/√(‖x_n − x‖²_X + ‖F(x_n) − F(x)‖²_Y) ≤ ε.

Dividing both numerator and denominator by ‖x_n − x‖_X > 0, we obtain the equivalent condition that

    lim sup_{n→∞} q_n ≤ ε    for    q_n := (⟨x*, x_n − x⟩_X − ⟨y*, F(x_n) − F(x)⟩_Y)/‖x_n − x‖_X.

If we take x* ∈ 𝔹(F′(x)*y*, ε), this condition is verified by the Fréchet differentiability of F at x. Conversely, to show that this implies x* ∈ 𝔹(F′(x)*y*, ε), we take x_n := x + t_nh for some t_n ↘ 0 and h ∈ X with ‖h‖_X = 1. Then again by the Fréchet differentiability of F,

    ε ≥ lim_{n→∞} q_n = ⟨x*, h⟩ − ⟨y*, F′(x)h⟩.
Since h was arbitrary, this shows that x* ∈ 𝔹(F′(x)*y*, ε).

The limiting coderivative: By the definition (18.8), the formula (20.6) for the ε-coderivatives, and the continuous differentiability, we have

    N_graph F(x, F(x)) = w-*-lim sup_{x̃→x, ε↘0} N^ε_graph F(x̃, F(x̃))
        = w-*-lim sup_{x̃→x, ε↘0} { (F′(x̃)*y* + z*, −y*) ∈ X* × Y* | y* ∈ Y*, z* ∈ 𝔹(0, ε) }
        = w-*-lim sup_{x̃→x} { (F′(x̃)*y*, −y*) ∈ X* × Y* | y* ∈ Y* }
        = { (F′(x)*y*, −y*) ∈ X* × Y* | y* ∈ Y* }.

This shows the claimed formula for the limiting coderivative and hence N-regularity and therefore graphical regularity. □
Remark 20.13. In finite-dimensional spaces, it would be possible to prove the expression for D̂F(x|y) more concisely using Corollary 18.20. Likewise, we could use the polarity relationships of Corollary 20.8 to obtain the expression for D̂*F(x|y). These approaches will, however, not be possible in more general spaces.
Combining Theorem 20.12 with Lemma 20.5 allows us to compute the graphical derivatives and coderivatives of inverses of single-valued functions.
Corollary 20.14. Let X, Y be Banach spaces and let F: X → Y be single-valued and Fréchet differentiable at x ∈ X. Then

    DF⁻¹(y|x)(Δy) = { Δx ∈ X | F′(x)Δx = Δy }  if y = F(x),  and ∅ otherwise,

and

    D̂*F⁻¹(y|x)(x*) = { y* ∈ Y* | F′(x)*y* = x* }  if y = F(x),  and ∅ otherwise.

If F is continuously Fréchet differentiable at x, then F⁻¹ is graphically regular at y = F(x) for x, and hence the corresponding expressions also hold for D̂F⁻¹(y|x) and D*F⁻¹(y|x).
It is important that Theorem 20.12 concerns the strong graphical derivative DF instead of the weak graphical derivative D_wF. Indeed, as the next counter-example demonstrates, D_wF is more of a theoretical tool (with the important property in reflexive spaces that D̂*F(x|y) = D_wF(x|y)°+ by Lemma 18.10 (ii)) which does not enjoy a rich calculus consistent with conventional notions. In the following chapters, we will therefore not develop calculus rules for the weak graphical derivative.
Example 20.15 (counter-example to single-valued weak graphical derivatives). Let f ∈ C¹(ℝ), Ω ⊂ ℝ^d be open, and

    F: L²(Ω) → ℝ,    F(u) = ∫₀¹ f(u(x)) dx.

Then by the above,

    DF(u|F(u))(Δu) = { ∫₀¹ f′(u(x))Δu(x) dx }.

In particular, DF(u|F(u))(0) = {0}.

However, choosing, e.g., f(t) = √(1 + t²), Ω = (0, 1), and u_n(x) := sign sin(2πnx), we have u_n ⇀ 0 in L²(Ω) but |u_n(x)| = 1 for a.e. x ∈ [0, 1]. Take now ũ_n := αt_nu_n for any given t_n ↘ 0 and α > 0. Then ũ_n ⇀ 0 as well, while

    F(ũ_n) − F(0) = √(1 + α²t_n²) − 1 → 0.

Moreover, (ũ_n − 0)/t_n = αu_n ⇀ 0 and lim_{n→∞} (√(1 + α²t_n²) − 1)/t_n = α². As α > 0 was arbitrary, we deduce that D_wF(u|F(u))(0) ⊃ [0, ∞).
derivatives and coderivatives of subdifferentials

We now apply these notions to set-valued mappings arising as subdifferentials of convex functionals. First, we directly obtain from Theorem 20.12 an expression for the squared norm in Hilbert spaces.
Corollary 20.16. Let X be a Hilbert space and f(x) = ½‖x‖²_X for x ∈ X. Then

    D[∂f](x|y)(Δx) = D̂[∂f](x|y)(Δx) = {Δx}  if y = x,  and ∅ otherwise,

and

    D̂*[∂f](x|y)(y*) = D*[∂f](x|y)(y*) = {y*}  if y = x,  and ∅ otherwise.

In particular, ∂f is graphically regular at every x ∈ X.
Of course, we are more interested in subdifferentials of nonsmooth functionals. We first study the indicator functional of an interval; see Figure 20.2.
Theorem 20.17. Let f(x) := δ_{[−1,1]}(x) for x ∈ ℝ. Then

(20.7)    D[∂f](x|y)(Δx) =
    ℝ         if |x| = 1, y ∈ (0, ∞)x, Δx = 0,
    [0, ∞)x   if |x| = 1, y = 0, Δx = 0,
    {0}       if |x| = 1, y = 0, xΔx < 0,
    {0}       if |x| < 1, y = 0,
    ∅         otherwise,

(20.8)    D̂*[∂f](x|y)(y*) =
    ℝ         if |x| = 1, y ∈ (0, ∞)x, y* = 0,
    [0, ∞)x   if |x| = 1, y = 0, xy* ≥ 0,
    {0}       if |x| < 1, y = 0,
    ∅         otherwise,

(20.9)    D̂[∂f](x|y)(Δx) =
    ℝ         if |x| = 1, y ∈ (0, ∞)x, Δx = 0,
    {0}       if |x| = 1, y = 0, Δx = 0,
    {0}       if |x| < 1, y = 0,
    ∅         otherwise,

and

(20.10)    D*[∂f](x|y)(y*) =
    ℝ         if |x| = 1, y ∈ [0, ∞)x, y* = 0,
    [0, ∞)x   if |x| = 1, y = 0, xy* > 0,
    {0}       if |x| = 1, y = 0, xy* < 0,
    {0}       if |x| < 1, y = 0,
    ∅         otherwise.

In particular, ∂f is graphically regular at x for y ∈ ∂f(x) if and only if |x| < 1 or y ≠ 0.
Figure 20.2: Illustration of the different graphical derivatives and coderivatives of ∂f for f = δ_{[−1,1]}: (a) graphical derivative D[∂f]; (b) convex hull co D[∂f]; (c) Fréchet coderivative D̂*[∂f]; (d) limiting coderivative D*[∂f]; (e) convex hull co D*[∂f]; (f) Clarke graphical derivative D̂[∂f]. The dashed line is graph ∂f. The dots indicate the base points (x, y) where D[∂f](x|y) is calculated, and the thick arrows and filled-in areas the directions of (Δx, Δy) (resp. (Δx, −Δy) for the coderivatives) relative to the base point. Observe that there is no graphical regularity at (x, y) ∈ {(−1, 0), (1, 0)}. Everywhere else, ∂f is graphically regular. Observe also that the cones in the last figures of each row are polar to the cones in the first and the second figures on the same row.
Proof. We first of all recall from Example 4.9 that graph ∂f is closed with

(20.11)    ∂f(x) = [0, ∞)x if |x| = 1,    {0} if |x| < 1,    ∅ otherwise.
We now verify (20.7). If y ∈ ∂f(x) and Δy ∈ D[∂f](x|y)(Δx), there exist by (20.1) sequences t_n ↘ 0, x_n → x, and y_n ∈ ∂f(x + t_nΔx_n) such that

(20.12)    Δx = lim_{n→∞} (x_n − x)/t_n    and    Δy = lim_{n→∞} (y_n − y)/t_n.
We proceed by case distinction.
(i) |x| = 1, Δx = 0, and y ∈ (0, ∞)x: Then choosing x_n ≡ x, any Δy ∈ ℝ and n large enough, we can take y_n = y + t_nΔy ∈ [0, ∞)x = ∂f(x). This yields the first case of (20.7).

(ii) |x| = 1, Δx = 0, but y = 0: In this case, choosing x_n ≡ x, we can take any y_n ∈ ∂f(x + t_nΔx_n) = ∂f(x) = [0, ∞)x. Picking any Δy ∈ [0, ∞)x and setting y_n := y + t_nΔy, we deduce that Δy ∈ D[∂f](x|y)(Δx). Thus "⊃" holds in the second case of (20.7). Since Δy ∈ −(0, ∞)x is clearly not obtainable with y_n ∈ [0, ∞)x, also "⊂" holds.

(iii) |x| = 1 and Δx = 0, but y ∈ −(0, ∞)x: Then we have y_n ∈ [0, ∞)x for n large enough since in this case either x_n = x or x_n ∈ (−1, 1). Thus |y − y_n| ≥ |y| > 0, so the second limit in (20.12) cannot exist. Therefore the derivative is empty, which is covered by the last case of (20.7).

(iv) |x| = 1 and xΔx > 0: Then the first limit in (20.12) requires that x_n ∉ dom ∂f, and hence ∂f(x_n) = ∅ for n large enough. This is again covered by the last case of (20.7).

(v) |x| = 1 and xΔx < 0 (the case xΔx = 0 being covered by (i)–(iii)): Since Δx ≠ 0 has a different sign from x, it follows from the first limit in (20.12) that x_n ∈ (−1, 1) for n large enough. Consequently, ∂f(x_n) = {0}, i.e., y_n = 0. The limit (20.12) in this case only exists if y = 0, in which case also Δy = 0. This is covered by the third case of (20.7), while y ≠ 0 is covered by the last case.

(vi) |x| < 1: Then y = 0 and necessarily y_n = 0 for n large enough. Therefore also Δy = 0, which yields the fourth case in (20.7).

(vii) |x| > 1: Then ∂f(x) = ∅ and therefore the derivative is empty as well, yielding again the final case of (20.7).
The expression for D̂*[∂f](x|y) can be verified using Corollary 20.8 (i). It can also be seen graphically from Figure 20.2.

By the inner and outer limit characterizations of Corollary 20.9, we now obtain the expressions for the Clarke graphical derivative D̂[∂f](x|y) and the limiting coderivative D*[∂f](x|y). Since graph ∂f is locally contained in an affine subspace outside of the "corner cases" (x, y) ∈ {(1, 0), (−1, 0)}, only the latter need special inspection. For the Clarke graphical derivative, we need to write Δy as the limit of Δy_n ∈ D[∂f](x_n|y_n)(Δx_n) for some Δx_n → Δx and all graph ∂f ∋ (x_n, y_n) → (x, y). Consider for example (x, y) = (−1, 0). Trying both (x_n, y_n) = (−1 + 1/n, 0) and (x_n, y_n) = (−1, −1/n), we see that this is only possible for (Δx, Δy) = (Δx_n, Δy_n) = (0, 0). This yields the second case of (20.9). Conversely, for the limiting coderivative, it suffices to find one such sequence from the Fréchet coderivative. Choosing for (x, y) = (−1, 0) again (x_n, y_n) = (−1 + 1/n, 0) and (x_n, y_n) = (−1, −1/n) as well as the constant sequence (x_n, y_n) = (−1, 0) yields the second, third, and first case of (20.10), respectively.
20 derivatives and coderivatives of set-valued mappings
Figure 20.3: Illustration of the different graphical derivatives and coderivatives of ∂f for f = |·|: (a) graphical derivative D[∂f]; (b) convex hull co D[∂f]; (c) Fréchet coderivative D̂*[∂f]; (d) limiting coderivative D*[∂f]; (e) convex hull co D*[∂f]; (f) Clarke graphical derivative D̃[∂f]. The dashed line is graph ∂f. The dots indicate the base points (x, y) where D[∂f](x|y) is calculated, and the thick arrows and filled-in areas the directions of (Δx, Δy) (resp. (Δx, −Δy) for the coderivatives) relative to the base point. Observe that there is no graphical regularity at (x, y) ∈ {(0, −1), (0, 1)}. Everywhere else, ∂f is graphically regular. Observe that the cones in the last figure of each row are polar to the cones in the first and second figures of the same row.
Finally, in finite dimensions the mapping ∂f is graphically regular if and only if D[∂f](x|y) = D̃[∂f](x|y) by Corollary 20.11, which is the case exactly when |x| < 1 or y ≠ 0 as claimed. □

In nonlinear optimization with inequality constraints, the case where ∂f is graphically regular corresponds precisely to strict complementarity of the minimizer x̄ and the Lagrange multiplier ȳ for the constraint x ∈ [−1, 1].

We next study the different derivatives and graphical regularity of the subdifferential of the absolute value function; see Figure 20.3.
Theorem 20.18. Let f(x) ≔ |x| for x ∈ ℝ. Then

(20.13)  D[∂f](x|y)(Δx) =
  {0}        if x ≠ 0, y = sign x,
  {0}        if x = 0, Δx ≠ 0, y = sign Δx,
  (−∞, 0]y   if x = 0, Δx = 0, |y| = 1,
  ℝ          if x = 0, Δx = 0, |y| < 1,
  ∅          otherwise,

(20.14)  D̂*[∂f](x|y)(y*) =
  {0}        if x ≠ 0, y = sign x,
  (−∞, 0]y   if x = 0, yy* ≤ 0, |y| = 1,
  ℝ          if x = 0, y* = 0, |y| < 1,
  ∅          otherwise,

(20.15)  D̃[∂f](x|y)(Δx) =
  {0}        if x ≠ 0, y = sign x,
  {0}        if x = 0, Δx = 0, |y| = 1,
  ℝ          if x = 0, Δx = 0, |y| < 1,
  ∅          otherwise,

and

(20.16)  D*[∂f](x|y)(y*) =
  {0}        if x ≠ 0, y = sign x,
  {0}        if x = 0, yy* > 0, |y| = 1,
  (−∞, 0]y   if x = 0, yy* < 0, |y| = 1,
  ℝ          if x = 0, y* = 0, |y| ≤ 1,
  ∅          otherwise.

In particular, ∂f is graphically regular if and only if x ≠ 0 or |y| < 1.
Proof. To start with proving (20.13), we recall from Example 4.7 that

(20.17)  ∂f(x) = sign(x) ≔
  {1}       if x > 0,
  {−1}      if x < 0,
  [−1, 1]   if x = 0.

To calculate the graphical derivative, we use that if y ∈ ∂f(x) and Δy ∈ D[∂f](x|y)(Δx), there exist by (20.1) sequences t_k ↘ 0, x_k → x, and y_k ∈ ∂f(x_k) such that

(20.18)  Δx = lim_{k→∞} (x_k − x)/t_k  and  Δy = lim_{k→∞} (y_k − y)/t_k.

We proceed by case distinction:

(i) x ≠ 0 and y ≠ sign x: Then y ∉ ∂f(x) and therefore D[∂f](x|y) = ∅, which is covered by the last case of (20.13).
(ii) x ≠ 0 and y = sign x: Then for any x_k → x, we have that ∂f(x_k) = ∂f(x) = {sign x} for k large enough. Therefore, for any Δx ∈ ℝ we have that Δy = 0, which is the first case of (20.13).

(iii) x = 0 and Δx ≠ 0: Then x_k → 0 and y_k = sign x_k = sign Δx for k large enough. Therefore the limits in (20.18) only exist if y = sign Δx (in particular, |y| = 1), in which case Δy = 0, i.e., we obtain the second case of (20.13).

(iv) x = 0 and Δx = 0: Then taking x_k ≡ x, we can choose y_k ∈ [−1, 1] arbitrarily. If |y| = 1, then (y_k − y) sign y ≤ 0, so (20.18) shows that Δy sign y ≤ 0, which is the third case of (20.13). If |y| < 1, we may obtain any Δy ∈ ℝ by the limit in (20.18). This is the fourth case of (20.13).
The expression (20.14) for D̂*[∂f](x|y) can be verified using Corollary 20.8 (i). It can also be read off graphically from Figure 20.3.

By the inner and outer limit characterizations of Corollary 20.9, we now obtain the expressions for the Clarke graphical derivative D̃[∂f](x|y) and the limiting coderivative D*[∂f](x|y). Since graph ∂f is locally contained in an affine subspace outside of the "corner cases" (x, y) ∈ {(0, 1), (0, −1)}, only the latter need special inspection. For the Clarke graphical derivative, we need to write Δy as the limit of Δy_k ∈ D[∂f](x_k, y_k)(Δx_k) for some Δx_k → Δx and all graph ∂f ∋ (x_k, y_k) → (x, y). Consider for example (x, y) = (0, −1). Trying both (x_k, y_k) = (0, −1 + 1/k) and (x_k, y_k) = (−1/k, −1), we see that this is only possible for (Δx, Δy) = (Δx_k, Δy_k) = (0, 0). This yields the second case of (20.15). Conversely, for the limiting coderivative, it suffices to find one such sequence from the Fréchet coderivative. Choosing for (x, y) = (0, −1) again (x_k, y_k) = (0, −1 + 1/k) and (x_k, y_k) = (−1/k, −1) as well as the constant sequence (x_k, y_k) = (0, −1) yields the fourth, second, and third case of (20.16), respectively.

Finally, in finite dimensions the mapping ∂f is graphically regular if and only if D[∂f](x|y) = D̃[∂f](x|y) by Corollary 20.11, which is the case exactly when x ≠ 0 or |y| < 1 as claimed. □
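The case distinctions in (20.13) can be probed numerically along the defining difference quotients (20.18). A small sketch (the helper names are ours; this is not part of the formal development):

```python
# Numerical sanity check of the graphical-derivative formula (20.13) for
# the subdifferential of f(x) = |x|, by evaluating one difference quotient
# (y_k - y)/t_k along a concrete sequence t_k -> 0.

def subdiff_abs(x):
    """The convex subdifferential sign(x) from (20.17), as an interval (lo, hi)."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)

def quotient_limit(x, y, dx, k=10**6):
    """One quotient (y_k - y)/t_k with x_k = x + t_k*dx and y_k forced by (20.17)."""
    t = 1.0 / k
    xk = x + t * dx
    lo, hi = subdiff_abs(xk)
    if lo == hi:            # subdifferential is a singleton: y_k is determined
        return (lo - y) / t
    return None             # y_k free in [-1, 1]; quotient not determined

# Case (ii): x != 0, y = sign x  =>  Delta y = 0 for any Delta x.
case2 = quotient_limit(1.0, 1.0, -5.0)
# Case (iii): x = 0, Delta x != 0, y = sign Delta x  =>  Delta y = 0.
case3 = quotient_limit(0.0, 1.0, 1.0)
print(case2, case3)
```

Both quotients vanish, matching the first and second cases of (20.13).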
20.4 relation to subdifferentials
All of the subdifferentials that we have studied in Part III can be constructed from the corresponding normal cones to the epigraph of a functional J : X → ℝ̄, as in the convex case; see Lemma 4.10. For the Fréchet and limiting subdifferentials, it is easy to see the relationships

(20.19)  ∂_F J(x) = {x* ∈ X* | (x*, −1) ∈ N̂_{epi J}(x, J(x))},
(20.20)  ∂_M J(x) = {x* ∈ X* | (x*, −1) ∈ N_{epi J}(x, J(x))},
from the corresponding definitions. For the Clarke subdifferential, however, we have to work a bit harder.

First, we define for A ⊂ X and x ∈ X the Clarke normal cone

(20.21)  N^C_A(x) ≔ T̂_A(x)°,

i.e., the polar of the Clarke tangent cone T̂_A(x).
We can now extend the definition of the Clarke subdifferential to arbitrary functionals J : X → ℝ̄ on Gâteaux smooth Banach spaces via the Clarke normal cone to their epigraph.

Lemma 20.19. Let X be a reflexive and Gâteaux smooth Banach space and let J : X → ℝ be locally Lipschitz continuous around x ∈ X. Then

∂_C J(x) = {x* ∈ X* | (x*, −1) ∈ N^C_{epi J}(x, J(x))}.
Proof. The Clarke tangent cone to epi J is by definition

T̂_{epi J}(x, J(x)) = { (Δx, Δt) ∈ X × ℝ | for all τ_k ↘ 0, x_k → x, J(x_k) ≤ t_k → J(x) there exist x̃_k ∈ X and t̃_k ≥ J(x̃_k) with (x̃_k − x_k)/τ_k → Δx and (t̃_k − t_k)/τ_k → Δt }.

If (Δx, Δt) ∈ T̂_{epi J}(x, J(x)), then replacing t̃_k by t̃_k + τ_k(Δs − Δt) ≥ J(x̃_k) shows that also (Δx, Δs) ∈ T̂_{epi J}(x, J(x)) for all Δs ≥ Δt. Thus we may make the minimal choices t_k = J(x_k) and t̃_k = J(x̃_k) to see that

T̂_{epi J}(x, J(x)) = { (Δx, Δt) ∈ X × ℝ | for all τ_k ↘ 0, x_k → x there exist x̃_k ∈ X with (x̃_k − x_k)/τ_k → Δx and limsup_{k→∞} (J(x̃_k) − J(x_k))/τ_k ≤ Δt }.

Since J is locally Lipschitz continuous, it suffices to take x̃_k = x_k + τ_kΔx to obtain

T̂_{epi J}(x, J(x)) = { (Δx, Δt) ∈ X × ℝ | Δx ∈ X, Δt ≥ J°(x; Δx) } = epi[J°(x; · )].

Hence (x*, −1) ∈ N^C_{epi J}(x, J(x)) = T̂_{epi J}(x, J(x))° if and only if ⟨x*, Δx⟩_X ≤ J°(x; Δx) for all Δx ∈ X, which by definition is equivalent to x* ∈ ∂_C J(x). □
We furthermore have the following relationship between the Clarke and limiting normal cones.

Corollary 20.20. Let X be a reflexive and Gâteaux smooth Banach space and A ⊂ X be closed near x ∈ A. Then

N^C_A(x) = N_A(x)°° = cl* co N_A(x),

where cl* co denotes the weak-* closed convex hull.
Proof. First, N_A(x) ≠ ∅ since x ∈ A. Furthermore, cl* co N_A(x) is the smallest weak-*-closed and convex set that contains N_A(x), and therefore Theorem 1.8 and Lemma 1.10 imply N_A(x)°° = (cl* co N_A(x))°° = cl* co N_A(x). The relationship N^C_A(x) = N_A(x)°° is an immediate consequence of Theorem 18.19. □
Assuming that X is Gâteaux smooth, we now have everything at hand to give a proof of Theorem 16.10, which characterizes the Clarke subdifferential as the weak-* closed convex hull of the limiting subdifferential.

Corollary 20.21. Let X be a reflexive and Gâteaux smooth Banach space and J : X → ℝ be locally Lipschitz continuous around x ∈ X. Then ∂_C J(x) = cl* co ∂_M J(x).

Proof. Together, Lemma 20.19, Corollary 20.20, and (20.20) directly yield

∂_C J(x) = {x* ∈ X* | (x*, −1) ∈ N^C_{epi J}(x, J(x))}
         = {x* ∈ X* | (x*, −1) ∈ cl* co N_{epi J}(x, J(x))}
         = cl* co {x* ∈ X* | (x*, −1) ∈ N_{epi J}(x, J(x))}
         = cl* co ∂_M J(x). □

(The Gâteaux smoothness of X can be relaxed to X being an Asplund space following Remark 17.8.)
From the corresponding definitions, it also follows that

∂_F J(x) = D̂*[epi_f J](x|J(x))(−1),
∂_M J(x) = D*[epi_f J](x|J(x))(−1),

where the epigraphical function

epi_f J(x) ≔ {t ∈ ℝ | t ≥ J(x)}

satisfies graph[epi_f J] = epi J. Thus the results of the following Chapters 23 and 25 can be used to derive the missing calculus rules for the Fréchet and limiting subdifferentials. In particular, Theorem 25.14 will provide the missing proof of the sum rule (Theorem 16.13) for the limiting subdifferential.
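The identity graph[epi_f J] = epi J can be illustrated by sampling; a minimal sketch for J(x) = |x| on ℝ (the function names are ours):

```python
# A sketch of the epigraphical function epi_f J for J(x) = |x| on R,
# illustrating graph[epi_f J] = epi J by sampling.

def J(x):
    return abs(x)

def epi_f(x, t):
    """Membership test for epi_f J(x) = {t in R | t >= J(x)}."""
    return t >= J(x)

# graph[epi_f J] = {(x, t) | t in epi_f J(x)} coincides with
# epi J = {(x, t) | t >= |x|} at every sample point:
samples = [(-2.0, 3.0), (-2.0, 1.0), (0.0, 0.0), (1.5, 1.5), (1.5, 1.0)]
agree = all(epi_f(x, t) == (t >= abs(x)) for x, t in samples)
print(agree)  # True
```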
21 DERIVATIVES AND CODERIVATIVES OF POINTWISE-DEFINED MAPPINGS
Just as for tangent and normal cones, the relationships between the basic and limiting derivatives and coderivatives are less complete in infinite-dimensional spaces than in finite-dimensional ones. In this chapter, we apply the results of Chapter 19 to derive pointwise characterizations analogous to Theorem 4.11 for the basic derivatives of pointwise-defined set-valued mappings, which (only) in the case of graphical regularity transfer to their limiting variants.
21.1 proto-differentiability
For our superposition formulas, we need some regularity from the finite-dimensional mappings. The appropriate notion is that of proto-differentiability, which corresponds to the geometric derivability of the underlying tangent cone.
Let X, Y be Banach spaces. We say that a set-valued mapping F : X ⇉ Y is proto-differentiable at x ∈ X for y ∈ F(x) if

(21.1)  for every Δy ∈ DF(x|y)(Δx) and τ_k ↘ 0, there exist x_k ∈ X with (x_k − x)/τ_k → Δx and y_k ∈ F(x_k) with (y_k − y)/τ_k → Δy.

In other words, in addition to the basic limit (20.1) defining DF(x|y), a corresponding inner limit holds in the graph space.
By application of Lemma 19.1 and Corollary 19.3, we immediately obtain the following equivalent characterization.

Corollary 21.1. Let X, Y be Banach spaces and F : X ⇉ Y. Then F is proto-differentiable at x ∈ X for y ∈ F(x) if and only if graph F is geometrically derivable at (x, y). In particular, if F is graphically regular at (x, y), then F is proto-differentiable at x for y.

Clearly, differentiable single-valued mappings are proto-differentiable. Another large class are maximally monotone set-valued mappings on Hilbert spaces.
Lemma 21.2. Let X be a Hilbert space and let A : X ⇉ X be maximally monotone. Then A is proto-differentiable at any x ∈ dom A for any x* ∈ A(x).
Proof. Let Δx* ∈ D[A](x|x*)(Δx). By definition, there then exist τ_k ↘ 0 and (x_k, x*_k) ∈ graph A such that (x_k − x)/τ_k → Δx and (x*_k − x*)/τ_k → Δx*. To show that A is proto-differentiable, we will construct for an arbitrary sequence σ_k ↘ 0 sequences (x̃_k, x̃*_k) ∈ graph A such that (x̃_k − x)/σ_k → Δx and (x̃*_k − x*)/σ_k → Δx*. We will do so using resolvents. Similarly to Lemma 6.18, we have

x* ∈ A(x)  ⟺  x ∈ A⁻¹(x*)  ⟺  x* + x ∈ {x*} + A⁻¹(x*)  ⟺  x* ∈ R_{A⁻¹}(x* + x).

Since A is maximally monotone and X is reflexive, A⁻¹ is maximally monotone by Lemma 6.7 as well, and thus the resolvent R_{A⁻¹} is single-valued by Corollary 6.14. We therefore take

x̃_k ≔ x + (σ_k/τ_k)(x_k − x) + (σ_k/τ_k)x*_k + (1 − σ_k/τ_k)x* − x̃*_k

and

x̃*_k ≔ R_{A⁻¹}( x + (σ_k/τ_k)(x_k − x) + (σ_k/τ_k)x*_k + (1 − σ_k/τ_k)x* − σ_k(Δx + Δx*) ) + σ_kΔx*
     = R_{A⁻¹}( x̃*_k + x̃_k − σ_k(Δx + Δx*) ) + σ_kΔx*.

Since resolvents of maximally monotone operators are 1-Lipschitz by Lemma 6.13, we have

lim_{k→∞} ‖x̃*_k − x* − σ_kΔx*‖_X/σ_k
  = lim_{k→∞} ‖R_{A⁻¹}(x̃*_k + x̃_k − σ_k(Δx + Δx*)) − R_{A⁻¹}(x* + x)‖_X/σ_k
  ≤ lim_{k→∞} ‖(x̃*_k + x̃_k − σ_k(Δx + Δx*)) − (x* + x)‖_X/σ_k
  = lim_{k→∞} ‖(x̃_k − x − σ_kΔx) + (x̃*_k − x* − σ_kΔx*)‖_X/σ_k = 0.

Likewise, by inserting the definition of x̃_k and using the triangle inequality, we obtain

lim_{k→∞} ‖x̃_k − x − σ_kΔx‖_X/σ_k
  ≤ lim_{k→∞} ‖(x̃_k − x − σ_kΔx) + (x̃*_k − x* − σ_kΔx*)‖_X/σ_k
  + lim_{k→∞} ‖x̃*_k − x* − σ_kΔx*‖_X/σ_k = 0.

This shows the claimed proto-differentiability. □
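The 1-Lipschitz property of resolvents invoked above (Lemma 6.13) can be checked numerically in the simplest case A = ∂|·| on ℝ, whose resolvent (Id + A)⁻¹ is the soft-thresholding map. A sketch under these assumptions (the function name is ours):

```python
import random

# The resolvent (Id + A)^(-1) of the maximally monotone A = subdiff|.| on R:
# solve x in z + subdiff|.|(z), i.e. soft-thresholding at level 1.
def resolvent_abs(x):
    if x > 1.0:
        return x - 1.0
    if x < -1.0:
        return x + 1.0
    return 0.0

# Lemma 6.13 asserts resolvents of maximally monotone operators are
# 1-Lipschitz; verify the estimate on random pairs of points.
random.seed(0)
pairs = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(1000)]
lipschitz_ok = all(
    abs(resolvent_abs(a) - resolvent_abs(b)) <= abs(a - b) + 1e-12
    for a, b in pairs
)
print(lipschitz_ok)  # True
```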
Since subdifferentials of convex and lower semicontinuous functionals on reflexive Banach spaces are maximally monotone by Theorem 6.11, we immediately obtain the following.
Corollary 21.3. Let X be a Hilbert space and let J : X → ℝ̄ be proper, convex, and lower semicontinuous. Then ∂J is proto-differentiable at any x ∈ dom J for any x* ∈ ∂J(x).

This corollary combined with Theorems 20.17 and 20.18 shows that proto-differentiability is a strictly weaker property than graphical regularity.
21.2 graphical derivatives and coderivatives
As a corollary of the tangent and normal cone representations from Theorems 19.5 and 19.6, we obtain explicit characterizations of the graphical derivative and the Fréchet coderivative of a class of pointwise-defined set-valued mappings. In the following, let Ω ⊂ ℝⁿ be an open and bounded domain, and write again p* for the conjugate exponent of p ∈ (1, ∞), satisfying 1/p + 1/p* = 1.
Theorem 21.4. Let F : L^p(Ω) ⇉ L^q(Ω) for p, q ∈ (1, ∞) have the form

F(u) = {w ∈ L^q(Ω) | w(x) ∈ f(u(x)) for a.e. x ∈ Ω}

for some pointwise almost everywhere proto-differentiable mapping f : ℝ ⇉ ℝ. Then for every w* ∈ L^{q*}(Ω) and Δu ∈ L^p(Ω),

(21.2a)  D̂*F(u|w)(w*) = {u* ∈ L^{p*}(Ω) | u*(x) ∈ D̂*f(u(x)|w(x))(w*(x)) for a.e. x ∈ Ω},
(21.2b)  DF(u|w)(Δu) = {Δw ∈ L^q(Ω) | Δw(x) ∈ Df(u(x)|w(x))(Δu(x)) for a.e. x ∈ Ω}.

Moreover, if f is graphically regular at u(x) for w(x) for almost every x ∈ Ω, then F is graphically regular at u for w and

DF(u|w) = D_wF(u|w) = D̃F(u|w),  D̂*F(u|w) = D*F(u|w).

Proof. First, graph f is geometrically derivable by Corollary 21.1 due to the assumed proto-differentiability of f. We further have

graph F = {(u, w) ∈ L^p(Ω) × L^q(Ω) | (u(x), w(x)) ∈ graph f for a.e. x ∈ Ω}.

Now (21.2b) and (21.2a) follow from Theorems 19.5 and 19.6, respectively, applied to the constant family C : x ↦ graph f, whose associated pointwise-defined set is graph F, together with the definitions of the graphical derivative in terms of the tangent cone and of the Fréchet coderivative in terms of the Fréchet normal cone. The remaining claims under graphical regularity follow similarly from Lemma 19.11. □
The above result directly applies to second derivatives of integral functionals.
Corollary 21.5. Let J : L^p(Ω) → ℝ̄ for p ∈ (1, ∞) be given by

J(u) = ∫_Ω f(u(x)) dx

for some proper, convex, and lower semicontinuous integrand f : ℝ → (−∞, ∞]. Then

D̂*[∂J](u|u*)(Δu) = {Δu* ∈ L^{p*}(Ω) | Δu*(x) ∈ D̂*[∂f](u(x)|u*(x))(Δu(x)) for a.e. x ∈ Ω},
D[∂J](u|u*)(Δu) = {Δu* ∈ L^{p*}(Ω) | Δu*(x) ∈ D[∂f](u(x)|u*(x))(Δu(x)) for a.e. x ∈ Ω}.

Moreover, if ∂f is graphically regular at u(x) for u*(x) for almost every x ∈ Ω, then ∂J is graphically regular at u for u* and

D[∂J](u|u*) = D_w[∂J](u|u*) = D̃[∂J](u|u*),  D̂*[∂J](u|u*) = D*[∂J](u|u*).

Proof. By Corollary 21.3, ∂f is proto-differentiable. Since

∂J(u) = {u* ∈ L^{p*}(Ω) | u*(x) ∈ ∂f(u(x)) for a.e. x ∈ Ω}

by Theorem 4.11 and therefore

graph[∂J] = {(u, u*) ∈ L^p(Ω) × L^{p*}(Ω) | u*(x) ∈ ∂f(u(x)) for a.e. x ∈ Ω},

the remaining claims follow from Theorem 21.4 with F = ∂J, f = ∂f, and q = p*. □
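The pointwise reduction used in this proof (Theorem 4.11) lends itself to a direct sketch: membership of u* in ∂J(u) for the integrand f = |·| is checked sample by sample (the helper name is ours):

```python
# Theorem 4.11 reduces the subdifferential of an integral functional to the
# pointwise subdifferential of its integrand. A sketch for f = |.| on a grid:
# u* in dJ(u) iff u*(x) in sign(u(x)) pointwise (a.e.).

def in_subdiff_abs(ustar, u):
    """Pointwise test u* in sign(u), with sign(0) = [-1, 1]."""
    if u > 0:
        return ustar == 1.0
    if u < 0:
        return ustar == -1.0
    return -1.0 <= ustar <= 1.0

u_vals     = [-2.0, 0.0, 0.0, 3.0]   # sampled values of u on a grid
ustar_vals = [-1.0, 0.3, -0.7, 1.0]  # candidate subgradient values
member = all(in_subdiff_abs(us, uv) for us, uv in zip(ustar_vals, u_vals))
print(member)  # True
```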
Remark 21.6. The case of vector-valued and spatially-varying set-valued mappings and convexintegrands can be found in [Clason & Valkonen 2017b].
We illustrate this result with the usual examples. To keep the presentation simple, we focus on the case p* = p = 2, such that L²(Ω) is a Hilbert space and we can identify X ≅ X*.

First, we immediately obtain from Corollary 20.16 together with Corollary 21.5 the following.
Corollary 21.7. Let J : L²(Ω) → ℝ be given by

J(u) ≔ ∫_Ω ½|u(x)|² dx.

Then for u* = u and all Δu ∈ L²(Ω), we have

D[∂J](u|u*)(Δu) = D_w[∂J](u|u*)(Δu) = D̃[∂J](u|u*)(Δu) = Δu,
D̂*[∂J](u|u*)(Δu) = D*[∂J](u|u*)(Δu) = Δu.

If u* ≠ u, all the derivatives and coderivatives are empty.
From Theorem 20.17, we also obtain expressions for the basic derivatives of indicator functionals for pointwise constraints. For the limiting derivatives, we only obtain expressions at points where graphical regularity (corresponding to strict complementarity) holds; cf. Remark 19.14.
Corollary 21.8. Let J : L²(Ω) → ℝ̄ be given by

J(u) ≔ ∫_Ω δ_{[−1,1]}(u(x)) dx.

Let u ∈ dom J and u* ∈ ∂J(u). Then Δu* ∈ D[∂J](u|u*)(Δu) ⊂ L²(Ω) if and only if for almost every x ∈ Ω,

Δu*(x) ∈
  ℝ            if |u(x)| = 1, u*(x) ∈ (0, ∞)u(x), Δu(x) = 0,
  [0, ∞)u(x)   if |u(x)| = 1, u*(x) = 0, Δu(x) = 0,
  {0}          if |u(x)| = 1, u*(x) = 0, u(x)Δu(x) < 0,
  {0}          if |u(x)| < 1, u*(x) = 0,
  ∅            otherwise.

Similarly, Δu ∈ D̂*[∂J](u|u*)(Δu*) ⊂ L²(Ω) if and only if for almost every x ∈ Ω,

Δu(x) ∈
  ℝ            if |u(x)| = 1, u*(x) ∈ (0, ∞)u(x), Δu*(x) = 0,
  [0, ∞)u(x)   if |u(x)| = 1, u*(x) = 0, u(x)Δu*(x) ≥ 0,
  {0}          if |u(x)| < 1, u*(x) = 0,
  ∅            otherwise.

If for almost every x ∈ Ω either |u(x)| < 1 or u*(x) ≠ 0, then Δu* ∈ D̃[∂J](u|u*)(Δu) = D*[∂J](u|u*)(Δu) if and only if for almost every x ∈ Ω,

Δu*(x) ∈
  ℝ    if |u(x)| = 1, u*(x) ∈ (0, ∞)u(x), Δu(x) = 0,
  {0}  if |u(x)| < 1, Δu(x) ∈ ℝ,
  ∅    otherwise.
A similar characterization holds for the basic derivatives of the L¹ norm (as a functional on L²(Ω)).
Corollary 21.9. Let J : L²(Ω) → ℝ be given by

J(u) ≔ ∫_Ω |u(x)| dx.

Let u ∈ dom J and u* ∈ ∂J(u). Then Δu* ∈ D[∂J](u|u*)(Δu) ⊂ L²(Ω) if and only if for almost every x ∈ Ω,

Δu*(x) ∈
  {0}              if u(x) ≠ 0, u*(x) = sign u(x),
  {0}              if u(x) = 0, Δu(x) ≠ 0, u*(x) = sign Δu(x),
  (−∞, 0]u*(x)     if u(x) = 0, Δu(x) = 0, |u*(x)| = 1,
  ℝ                if u(x) = 0, Δu(x) = 0, |u*(x)| < 1,
  ∅                otherwise.

Similarly, Δu ∈ D̂*[∂J](u|u*)(Δu*) ⊂ L²(Ω) if and only if for almost every x ∈ Ω,

Δu(x) ∈
  {0}              if u(x) ≠ 0, u*(x) = sign u(x),
  (−∞, 0]u*(x)     if u(x) = 0, u*(x)Δu*(x) ≤ 0, |u*(x)| = 1,
  ℝ                if u(x) = 0, Δu*(x) = 0, |u*(x)| < 1,
  ∅                otherwise.

If for almost every x ∈ Ω either u(x) ≠ 0 or |u*(x)| < 1, then Δu* ∈ D̃[∂J](u|u*)(Δu) = D*[∂J](u|u*)(Δu) if and only if for almost every x ∈ Ω,

Δu*(x) ∈
  {0}  if u(x) ≠ 0, u*(x) = sign u(x), Δu(x) ∈ ℝ,
  ℝ    if u(x) = 0, |u*(x)| < 1, Δu(x) = 0,
  ∅    otherwise.
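The final case distinction (on the graphically regular region) can be transcribed into a pointwise decision procedure. A minimal sketch (the function name and string encoding of the three outcomes are ours):

```python
# A sketch of the pointwise case distinction in the last display of
# Corollary 21.9 (the region u(x) != 0 or |u*(x)| < 1): at each point the
# derivative value is {0}, all of R, or empty.

def pointwise_derivative(u, ustar, du):
    """Return '0', 'R', or 'empty' for the pointwise derivative value."""
    sign = (u > 0) - (u < 0)
    if u != 0 and ustar == sign:
        return "0"            # any du is admissible, value {0}
    if u == 0 and abs(ustar) < 1 and du == 0:
        return "R"            # value is all of R
    return "empty"

print(pointwise_derivative(2.0, 1.0, -3.5))   # '0'
print(pointwise_derivative(0.0, 0.2, 0.0))    # 'R'
print(pointwise_derivative(0.0, 0.2, 1.0))    # 'empty'
```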
Obtaining similar characterizations for derivatives of the Clarke subdifferential of integral functionals with nonsmooth nonconvex integrands requires verifying proto-differentiability of the pointwise subdifferential mapping. This is challenging, since the Clarke subdifferential in general does not have the nice properties of the convex subdifferential as a set-valued mapping. For problems of the form (P) in the introduction, it is therefore simpler to first apply the calculus rules from the following chapters (assuming they are applicable) and then to use the above results for the derivatives of the convex or smooth component mappings.
22 CALCULUS FOR THE GRAPHICAL DERIVATIVE
We now turn to calculus rules such as sum and product rules. We concentrate on the situation where at least one of the mappings involved is classically differentiable, which allows exact results and is already useful in practice. For a much fuller picture of infinite-dimensional calculus in high generality, the reader is referred to [Mordukhovich 2006]. For further finite-dimensional calculus, we refer to [Rockafellar & Wets 1998; Mordukhovich 2018].

The rules we develop for the various (co)derivatives are in each case based on linear transformation formulas for the underlying cones as well as on a fundamental composition lemma. These fundamental lemmas, however, require further regularity assumptions that are satisfied in particular by (continuously) Fréchet differentiable single-valued mappings and their inverses. For the sake of presentation, we treat each derivative in its own chapter, starting with the relevant regularity concept, then proving the fundamental lemmas, and finally deriving the calculus rules. We start with the (basic) graphical derivative.
22.1 semi-differentiability
Let X, Y be Banach spaces and F : X ⇉ Y. We say that F is semi-differentiable at x ∈ X for y ∈ F(x) if

(22.1)  for every Δy ∈ DF(x|y)(Δx) and x_k → x, τ_k ↘ 0 with (x_k − x)/τ_k → Δx, there exist y_k ∈ F(x_k) with (y_k − y)/τ_k → Δy.

In other words, DF(x|y) is a full limit.
Lemma 22.1. A mapping F : X ⇉ Y is semi-differentiable at x ∈ X for y ∈ Y if and only if

(22.2)  DF(x|y)(Δx) = lim_{τ ↘ 0, Δ̃x → Δx} (F(x + τΔ̃x) − y)/τ  (Δx ∈ X).
Proof. First, note that (20.1) shows that DF(x|y)(Δx) is the outer limit corresponding to (22.2). Similarly, by (22.1), F is semi-differentiable if DF(x|y) is the corresponding inner limit. (For any sequence τ_k ↘ 0, we can relate x_k in (22.1) and Δ̃x =: Δ̃x_k in (22.2) via Δ̃x_k = (x_k − x)/τ_k.) Hence, F is semi-differentiable if and only if the outer limit in (20.1) is a full limit. □
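For a single-valued Fréchet differentiable F, the full limit (22.2) reduces to DF(x|F(x))(Δx) = F′(x)Δx along any perturbed directions Δ̃x → Δx. A numerical sketch with F = sin (our choice of example, not from the text):

```python
import math

# Difference quotients (F(x + t*dxk) - F(x))/t with perturbed directions
# dxk -> dx converge to F'(x)dx when F is Frechet differentiable (Lemma 22.1).
x, dx = 0.3, 2.0
quotients = []
for k in range(1, 6):
    t = 10.0 ** (-k)
    dxk = dx + t                  # perturbed direction, dxk -> dx as t -> 0
    quotients.append((math.sin(x + t * dxk) - math.sin(x)) / t)

target = math.cos(x) * dx         # F'(x)dx
print(abs(quotients[-1] - target) < 1e-3)  # True
```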
Compared to the definition of proto-differentiability in Section 21.1, we now require that Δy can be written as the limit of a difference quotient taken from F(x_k) for any sequence {x_k} realizing Δx (while for proto-differentiability, this only has to be possible for one such sequence). Hence, semi-differentiability is a stronger property than proto-differentiability, the former implying the latter.

Example 22.2 (proto-differentiable but not semi-differentiable). Let F : ℝ ⇉ ℝ have graph F = ℚ × {0}. Then F is proto-differentiable at any x ∈ ℚ by the density of ℚ in ℝ. However, F is not semi-differentiable, as we can take x_k ∉ ℚ in (22.1).
For single-valued mappings and their (set-valued) inverses, this implies the following.

Lemma 22.3. Let X, Y be Banach spaces and F : X → Y.

(i) If F is Fréchet differentiable at x, then F is semi-differentiable at x for y = F(x).

(ii) If F is continuously differentiable at x and F′(x)* ∈ 𝕃(Y*; X*) has a left-inverse F′(x)*† ∈ 𝕃(X*; Y*), then F⁻¹ : Y ⇉ X is semi-differentiable at y = F(x) for x.
Proof. (i): This follows directly from the definitions of semi-differentiability and of the Fréchet derivative.

(ii): By Corollary 20.14, DF⁻¹(y|x)(Δy) = {Δx ∈ X | F′(x)Δx = Δy} for y = F(x). Hence (22.1) for F⁻¹ requires showing that for all τ_k ↘ 0 and y_k ∈ Y with (y_k − y)/τ_k → F′(x)Δx, there exist x_k with y_k = F(x_k) and (x_k − x)/τ_k → Δx. We will construct such x_k through the inverse function theorem applied to an extended function.

Let A ≔ F′(x) and A† ≔ (F′(x)*†)*. Then AA† = Id because (AA†)* = (A†)*A* = F′(x)*†F′(x)* = Id. Moreover, P ≔ Id − A†A projects into ker A = ker F′(x), so that AP = 0. We then define

F̃ : X → Y × ker F′(x),  F̃(x̃) ≔ (F(x̃), Px̃),

such that F̃′(x)Δx = (AΔx, PΔx) for all Δx ∈ X. We further define

S : Y × ker F′(x) → X,  S(ỹ, x̃) ≔ A†ỹ + x̃,

such that for all Δx ∈ X,

S F̃′(x)Δx = A†AΔx + PΔx = Δx.

Thus S is a left-inverse of F̃′(x), and thus ker F̃′(x) = {0}. Similarly, we verify that S is also a right-inverse of F̃′(x) on Y × ker F′(x). Hence F̃′(x) is bijective.

By the inverse function Theorem 2.8, F̃⁻¹ exists in a neighborhood of w ≔ (y, Px) = F̃(x) in Y × ker F′(x) with (F̃⁻¹)′(w) = S and F̃⁻¹(w) = x. Observe that F̃⁻¹(ỹ, x̃) ∈ F⁻¹(ỹ) for (ỹ, x̃) in this neighborhood. Taking x_k ≔ F̃⁻¹(y_k, Px + τ_kPΔx), we have

lim_{k→∞} (x_k − x)/τ_k = (F̃⁻¹)′(w)(Δy, PΔx) = S(Δy, PΔx) = A†Δy + PΔx = A†AΔx + PΔx = Δx

for Δy ≔ F′(x)Δx, which proves the claim. □
Remark 22.4. In Lemma 22.3 (ii), if Y is finite-dimensional, it suffices to assume that F is continuously differentiable with ker F′(x)* = {0}. In this case we can take F′(x)*† ≔ (AA*)⁻¹A for A ≔ F′(x), since then F′(x)*†F′(x)* = (AA*)⁻¹AA* = Id.
22.2 cone transformation formulas
At its heart, the calculus rules for (co)derivatives of set-valued mappings derive from corresponding transformation formulas for the underlying cones. To formulate these, let C ⊂ Y, R ∈ 𝕃(Y; X), and x ∈ RC ≔ {Ry | y ∈ C}. We then say that there exists a family of continuous inverse selections

{R⁻¹_y : U_y → C | y ∈ C, Ry = x}

of R to C at x ∈ RC if for each y ∈ C with Ry = x there exist a neighborhood U_y ⊂ RC of x = Ry and a map R⁻¹_y : U_y → C continuous at x with R⁻¹_y x = y and RR⁻¹_y x̃ = x̃ for every x̃ ∈ U_y.
Example 22.5. Let G : ℝ^{n−1} → ℝ be continuous at x̄, and set C ≔ epi G as well as R(x, t) ≔ x. Then by the classical inverse function Theorem 2.8,

{R⁻¹_{(x̄,t)}(x) ≔ (x, t − G(x̄) + G(x)) | t ≥ G(x̄)}

is a family of continuous inverse selections to C at x̄. If G is Fréchet differentiable at x̄, then so is R⁻¹_{(x̄,t)}.
Lemma 22.6. Let X, Y be Banach spaces, and assume there exists a family of continuous inverse selections {R⁻¹_y : U_y → C | y ∈ C, Ry = x} of R ∈ 𝕃(Y; X) to C ⊂ Y at x ∈ X. If each R⁻¹_y is Fréchet differentiable at x, then

T_{RC}(x) = ⋃_{y : Ry = x} R T_C(y).
Proof. We first prove "⊃". Suppose Δy ∈ T_C(y) for some y ∈ cl C with Ry = x. Then Δy = lim_{k→∞} (y_k − y)/τ_k for some y_k ∈ C and τ_k ↘ 0. Consequently, since R is bounded, R(y_k − y)/τ_k → RΔy. But Ry_k ∈ RC, so RΔy ∈ T_{RC}(x). On the other hand, if y ∉ cl C, then T_C(y) = ∅ and thus there is nothing to show. Hence "⊃" holds.

To establish "⊂", we first of all note that T_{cl RC}(x) = ∅ if x ∉ cl RC. So suppose x ∈ cl RC and Δx ∈ T_{RC}(x). Then x = Ry for some y ∈ cl C. Since 0 ∈ T_C(y), we can concentrate on Δx ≠ 0. Then Δx = lim_{k→∞} (x_k − x)/τ_k for some x_k ∈ RC and τ_k ↘ 0. We have x_k = Ry_k for y_k ≔ R⁻¹_y(x_k). If we can show that (y_k − y)/τ_k → Δy for some Δy ∈ Y, then Δy ∈ T_C(y) and Δx = RΔy. Since R⁻¹_y is Fréchet differentiable at x, letting h_k ≔ x_k − x and using that (h_k − τ_kΔx)/τ_k = (x_k − x)/τ_k − Δx → 0 and ‖h_k‖_X/τ_k → ‖Δx‖_X, indeed

lim_{k→∞} ( (y_k − y)/τ_k − (R⁻¹_y)′(x)Δx ) = lim_{k→∞} ( R⁻¹_y(x_k) − R⁻¹_y(x) − τ_k(R⁻¹_y)′(x)Δx )/τ_k
= lim_{k→∞} ( R⁻¹_y(x + h_k) − R⁻¹_y(x) − (R⁻¹_y)′(x)h_k )/τ_k = 0.

Thus Δy = (R⁻¹_y)′(x)Δx, which proves "⊂". □
Remark 22.7 (qualification conditions in finite dimensions). If X and Y are finite-dimensional, we could replace the existence of the family {R⁻¹_y} of continuous selections in Lemma 22.6 by the more conventional qualification condition

⋃_{y : Ry = x} T_C(y) ∩ ker R = {0}.

We do not employ such a condition, as the extension to Banach spaces would have to be based not on T_C(y) but on the weak tangent cone T^w_C(y), which is difficult to compute explicitly.
We base all our calculus rules on the previous linear transformation lemma and the following composition lemma for the tangent cone T_C.
Lemma 22.8 (fundamental lemma on compositions). Let X, Y, Z be Banach spaces and

C ≔ {(x, y, z) | y ∈ F(x), z ∈ G(y)}

for F : X ⇉ Y and G : Y ⇉ Z. If (x, y, z) ∈ C and either

(i) G is semi-differentiable at y for z, or

(ii) F⁻¹ is semi-differentiable at y for x,

then

(22.3)  T_C(x, y, z) = {(Δx, Δy, Δz) | Δy ∈ DF(x|y)(Δx), Δz ∈ DG(y|z)(Δy)}.
Proof. We only consider the case (i); the case (ii) is shown analogously. By definition, we have (Δx, Δy, Δz) ∈ T_C(x, y, z) if and only if for some (x_k, y_k, z_k) ∈ C and τ_k ↘ 0,

Δx = lim_{k→∞} (x_k − x)/τ_k,  Δy = lim_{k→∞} (y_k − y)/τ_k,  Δz = lim_{k→∞} (z_k − z)/τ_k.

On the other hand, we have Δy ∈ DF(x|y)(Δx) if and only if the first two limits hold for some (x_k, y_k) ∈ graph F and τ_k ↘ 0. Likewise, we have Δz ∈ DG(y|z)(Δy) if and only if the last two limits hold. This immediately yields "⊂".

To prove "⊃", take τ_k ↘ 0 and (x_k, y_k) ∈ graph F such that the first two limits hold. By the semi-differentiability of G at y for z, for any Δz ∈ DG(y|z)(Δy) we can find z_k ∈ G(y_k) such that (z_k − z)/τ_k → Δz. This shows the remaining limit. □
If one of the two mappings is single-valued, we can use Lemma 22.3 to verify its semi-differentiability and Theorem 20.12 for the expression of its graphical derivative, obtaining from Lemma 22.8 the following two special cases.
Corollary 22.9 (fundamental lemma on compositions: single-valued outer mapping). Let X, Y, Z be Banach spaces and

C ≔ {(x, y, G(y)) | y ∈ F(x)}

for F : X ⇉ Y and G : Y → Z. If (x, y, z) ∈ C and G is Fréchet differentiable at y, then

T_C(x, y, z) = {(Δx, Δy, G′(y)Δy) | Δy ∈ DF(x|y)(Δx)}.
Corollary 22.10 (fundamental lemma on compositions: single-valued inner mapping). Let X, Y, Z be Banach spaces and

C ≔ {(x, y, z) | y = F(x), z ∈ G(y)}

for F : X → Y and G : Y ⇉ Z. If (x, y, z) ∈ C, F is continuously Fréchet differentiable at x, and F′(x)* has a left-inverse F′(x)*† ∈ 𝕃(X*; Y*), then

T_C(x, y, z) = {(Δx, Δy, Δz) | Δy = F′(x)Δx, Δz ∈ DG(y|z)(Δy)}.
22.3 calculus rules
Combining now the previous results, we quickly obtain various calculus rules. We begin as usual with a sum rule.
Theorem 22.11 (addition of a single-valued differentiable mapping). Let X, Y be Banach spaces, let G : X → Y be Fréchet differentiable, and let F : X ⇉ Y. Then for any x ∈ X and y ∈ H(x) ≔ F(x) + G(x),

DH(x|y)(Δx) = DF(x|y − G(x))(Δx) + G′(x)Δx  (Δx ∈ X).

Proof. We have graph H = RC for

(22.4)  C ≔ {(u, x, G(x)) | x ∈ X, u ∈ F(x)}  and  R(u, x, v) ≔ (x, u + v).

We now use Lemma 22.6 to calculate T_{RC}. Accordingly, for all (u, x̃, G(x̃)) ∈ C such that R(u, x̃, G(x̃)) = (x, y) — i.e., only for x̃ = x and u = y − G(x) — we define the inverse selection

R⁻¹_{(u,x,G(x))} : RC → C,  R⁻¹_{(u,x,G(x))}(x̃, ỹ) ≔ (ỹ − G(x̃), x̃, G(x̃)).

Then R⁻¹_{(u,x,G(x))}(x, u + G(x)) = (u, x, G(x)) and R⁻¹_{(u,x,G(x))}(x̃, ỹ) ∈ C for every (x̃, ỹ) ∈ RC. Furthermore, R⁻¹_{(u,x,G(x))} is continuous and Fréchet differentiable at (x, y).

Lemma 22.6 now yields

T_{graph H}(x, y) = {(Δx, Δu + Δv) | (Δu, Δx, Δv) ∈ T_C(y − G(x), x, G(x))}.

Moreover, C given in (22.4) coincides with the C defined in Corollary 22.9 with F⁻¹ in place of F. Using Corollary 22.9 and inserting the expression from Lemma 20.5 for DF⁻¹, it follows that

T_C(u, x, v) = {(Δu, Δx, G′(x)Δx) | Δu ∈ DF(x|u)(Δx)}.

Thus

DH(x|y)(Δx) = {Δu + Δv | (Δu, Δx, Δv) ∈ T_C(y − G(x), x, G(x))}
            = {Δu + G′(x)Δx | Δu ∈ DF(x|y − G(x))(Δx)},

which yields the claim. □
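The sum rule can be checked numerically at a point where F is locally single-valued. A sketch (the example choices F = ∂|·|, G(x) = x² are ours):

```python
# A numerical sketch of the sum rule (Theorem 22.11): H = F + G with
# F = subdiff|.| (so F(x) = {1} near x = 1 and DF(1|1)(dx) = {0} by (20.13))
# and the smooth G(x) = x**2, hence DH(1|.)(dx) = 0 + G'(1)dx.

def H(x):                       # single-valued branch of F + G near x = 1
    return (1.0 if x > 0 else -1.0) + x * x

x = 1.0
gprime = 2.0 * x                # G'(1) = 2
dx = 1.0
quotients = [(H(x + t * dx) - H(x)) / t for t in (1e-3, 1e-5, 1e-7)]
close = abs(quotients[-1] - gprime * dx) < 1e-4
print(close)  # True
```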
We now turn to chain rules, beginning with the case where the outer mapping is single-valued.
Theorem 22.12 (outer composition with a single-valued differentiable mapping). Let X, Y, Z be Banach spaces, F : X ⇉ Y, and G : Y → Z. Let x ∈ X and z ∈ H(x) ≔ G(F(x)) be given. If G is Fréchet differentiable at every y ∈ F(x), invertible on ran G near z, and the inverse G⁻¹ is Fréchet differentiable at z, then

DH(x|z)(Δx) = ⋃_{y : G(y) = z} G′(y) DF(x|y)(Δx)  (Δx ∈ X).
Proof. Observing that graph H = RC for

(22.5)  C ≔ {(x, ỹ, G(ỹ)) | ỹ ∈ F(x)}  and  R(x, ỹ, z̃) ≔ (x, z̃),

we again use Lemma 22.6 to calculate T_{RC}. Accordingly, we define for ȳ ∈ G⁻¹(z̄) ∩ F(x̄) the family of inverse selections

R⁻¹_{(x̄,ȳ,z̄)} : RC → C,  R⁻¹_{(x̄,ȳ,z̄)}(x, z) ≔ (x, G⁻¹(z), z).

Clearly, R⁻¹_{(x̄,ȳ,z̄)}(x̄, z̄) = (x̄, ȳ, z̄). Furthermore, G is by assumption invertible on its range near z̄ = G(ȳ). Hence G⁻¹(z) ∈ F(x), and thus in fact R⁻¹_{(x̄,ȳ,z̄)}(x, z) ∈ C for all (x, z) ∈ RC. Moreover, R⁻¹_{(x̄,ȳ,z̄)} is continuous and Fréchet differentiable at (x̄, z̄) since G⁻¹ has these properties at z̄.

Applying Lemma 22.6 now yields

T_{graph H}(x̄, z̄) = ⋃_{ȳ : G(ȳ)=z̄} {(Δx, Δz) | (Δx, Δy, Δz) ∈ T_C(x̄, ȳ, z̄)}.

Using Corollary 22.9, we then obtain

DH(x̄|z̄)(Δx) = ⋃_{ȳ : G(ȳ)=z̄} {Δz | (Δx, Δy, Δz) ∈ T_C(x̄, ȳ, z̄)}
            = ⋃_{ȳ : G(ȳ)=z̄} {G′(ȳ)Δy | Δy ∈ DF(x̄|ȳ)(Δx)}.

After further simplification, we arrive at the claimed expression. ∎
In particular, this result holds if G is Fréchet differentiable and G′(ȳ) is bijective, since in this case the inverse function Theorem 2.8 guarantees the local existence and differentiability of G⁻¹.

Another useful special case is when the mapping G is linear.
Corollary 22.13 (outer composition with a linear operator). Let X, Y, Z be Banach spaces, A ∈ 𝕃(Y; Z), and F : X ⇒ Y. If A has a bounded left-inverse A†, then for any x̄ ∈ X and z̄ ∈ H(x̄) ≔ AF(x̄),

DH(x̄|z̄)(Δx) = A DF(x̄|ȳ)(Δx)  (Δx ∈ X)

for the unique ȳ ∈ Y such that Aȳ = z̄.
Proof. We apply Theorem 22.12 to G(y) ≔ Ay, which is clearly continuously differentiable at every y ∈ F(x̄). Since A has a bounded left-inverse A†, G⁻¹(y) ≔ A†y is an inverse of G on G(Y) = ran A, which is also clearly differentiable. Moreover, {ȳ | G(ȳ) = z̄} is a singleton, which removes the intersections and unions from the expressions provided by Theorem 22.12. ∎
The assumption is in particular satisfied if Y and Z are Hilbert spaces and A is injective and has closed range, since in this case we can take A†y = (A*A)⁻¹A*y (the Moore–Penrose pseudoinverse of A).
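In finite dimensions this construction is easy to check directly. The sketch below uses a randomly generated tall matrix as a stand-in for an injective operator with closed range and verifies that (A*A)⁻¹A* is a left inverse, while AA† is only a projection onto ran A (not the identity):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # tall matrix: injective with probability 1

# Moore-Penrose left inverse for injective A: A_dag @ A = I
A_dag = np.linalg.inv(A.T @ A) @ A.T
assert np.allclose(A_dag @ A, np.eye(3))

# A @ A_dag is merely the orthogonal projection onto ran A:
P = A @ A_dag
assert np.allclose(P @ P, P)       # idempotent
assert not np.allclose(P, np.eye(5))  # not a two-sided inverse
```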
We next consider chain rules where the inner mapping is single-valued.
Theorem 22.14 (inner composition with a single-valued differentiable mapping). Let X, Y, Z be Banach spaces, F : X → Y and G : Y ⇒ Z. Let x̄ ∈ X and z̄ ∈ H(x̄) ≔ G(F(x̄)). If F is continuously Fréchet differentiable near x̄ and F′(x̄)* has a bounded left-inverse F′(x̄)*† ∈ 𝕃(X*; Y*), then

DH(x̄|z̄)(Δx) = DG(F(x̄)|z̄)(F′(x̄)Δx)  (Δx ∈ X).
Proof. Observing that graph H = RC for

(22.6)  C ≔ {(x, ỹ, z) | ỹ = F(x), z ∈ G(ỹ)}  and  R(x, ỹ, z) ≔ (x, z),

we again use Lemma 22.6 to compute T_{RC}. Accordingly, we define a family of inverse selections for all (x, ỹ, z) ∈ C such that R(x, ỹ, z) = (x̄, z̄). But this only holds for (x, ỹ, z) = (x̄, F(x̄), z̄), and hence we only need

R⁻¹_{(x̄,F(x̄),z̄)} : RC → C,  R⁻¹_{(x̄,F(x̄),z̄)}(x, z) ≔ (x, F(x), z).

Clearly R⁻¹_{(x̄,F(x̄),z̄)}(x̄, z̄) = (x̄, F(x̄), z̄) and, if (x, z) ∈ RC, then R⁻¹_{(x̄,F(x̄),z̄)}(x, z) ∈ C and R⁻¹_{(x̄,F(x̄),z̄)} is continuous and differentiable at (x̄, z̄). Thus Lemma 22.6 yields

T_{graph H}(x̄, z̄) = {(Δx, Δz) | (Δx, Δy, Δz) ∈ T_C(x̄, F(x̄), z̄)}.

On the other hand, we can apply Corollary 22.10, due to the continuous differentiability of F and the left-invertibility of F′(x̄)*, to obtain

T_C(x̄, ȳ, z̄) = {(Δx, Δy, Δz) | Δy = F′(x̄)Δx, Δz ∈ DG(ȳ|z̄)(Δy)}.

Thus

DH(x̄|z̄)(Δx) = {Δz | (Δx, Δy, Δz) ∈ T_C(x̄, F(x̄), z̄)}
            = {Δz | Δy = F′(x̄)Δx, Δz ∈ DG(F(x̄)|z̄)(Δy)},

which yields the claim. ∎
Again, we can specialize this result to the case where the single-valued mapping is linear.
Corollary 22.15 (inner composition with a linear operator). Let X, Y, Z be Banach spaces, A ∈ 𝕃(X; Y), and G : Y ⇒ Z, and set H ≔ G ∘ A. If A* has a left-inverse A*† ∈ 𝕃(X*; Y*), then for all x̄ ∈ X and z̄ ∈ H(x̄) ≔ G(Ax̄),

DH(x̄|z̄)(Δx) = DG(Ax̄|z̄)(AΔx)  (Δx ∈ X).
We wish to apply these results to further differentiate the chain rules from Theorems 4.17 and 13.23. For the former, this is straightforward based on the two corollaries obtained so far.
Corollary 22.16 (second derivative chain rule for the convex subdifferential). Let X, Y be Banach spaces, let f : Y → ℝ̄ be proper, convex, and lower semicontinuous, and let A ∈ 𝕃(X; Y) be such that A* has a left-inverse A*† ∈ 𝕃(X*; Y*) and ran A ∩ int dom f ≠ ∅. Let h ≔ f ∘ A. Then for any x̄ ∈ X and x̄* ∈ ∂h(x̄) = A*∂f(Ax̄),

D[∂h](x̄|x̄*)(Δx) = A*D[∂f](Ax̄|ȳ*)(AΔx)  (Δx ∈ X)

for the unique ȳ* ∈ Y* satisfying A*ȳ* = x̄*.
Proof. The expression for ∂h(x̄) follows from Theorem 4.17, to which we apply Corollary 22.15 as well as Corollary 22.13 with A* in place of A. ∎
To further differentiate the result of applying a chain rule such as Theorem 13.23, we also need a product rule for a single-valued mapping G and a set-valued mapping F. In principle, this could be obtained as a composition of x ↦ (x, x), (x₁, x₂) ↦ {G(x₁)} × F(x₂), and (y₁, y₂) ↦ y₁y₂; however, the last of these mappings does not possess the left-inverse required by Corollary 22.13. We therefore take another route.
Theorem 22.17 (product rule). Let X, Y, Z be Banach spaces, let G : X → 𝕃(Y; Z) be Fréchet differentiable, and F : X ⇒ Y. If G(x) ∈ 𝕃(Y; Z) has a left-inverse G(x)† on ran G(x) for x near x̄ ∈ X, and the mapping x ↦ G(x)† is Fréchet differentiable at x̄, then for all z̄ ∈ H(x̄) ≔ G(x̄)F(x̄) ≔ ⋃_{y ∈ F(x̄)} G(x̄)y,

DH(x̄|z̄)(Δx) = [G′(x̄)Δx]ȳ + G(x̄)DF(x̄|ȳ)(Δx)  (z̄ ∈ H(x̄), Δx ∈ X)

for the unique ȳ ∈ F(x̄) satisfying G(x̄)ȳ = z̄.
Proof. First, define F̃ : X ⇒ X × Y by

graph F̃ = R₀ graph F  for  R₀(x, ỹ) ≔ (x, x, ỹ).

Then we have graph H = R₁ graph(G̃ ∘ F̃) for

G̃(x, ỹ) ≔ (x, G(x)ỹ)  and  R₁(x₁, x₂, z) ≔ (x₁, z).

Let now ȳ ∈ F(x̄). By Lemma 22.6, we have

T_{R₀ graph F}(x̄, x̄, ȳ) = {(Δx, Δx, Δy) | (Δx, Δy) ∈ T_{graph F}(x̄, ȳ)}

so that

DF̃(x̄|x̄, ȳ)(Δx) = {Δx} × DF(x̄|ȳ)(Δx).

We wish to apply Theorem 22.12. First, G̃ is single-valued and differentiable. Since G(x) is assumed left-invertible on its range for x near x̄, the mapping R : (x, z) ↦ (x, G(x)†z) is an inverse of G̃, which is Fréchet differentiable at (x̄, z̄) since x ↦ G(x)† is Fréchet differentiable at x̄. Finally, we also have

G̃′(x, y)(Δx, Δy) = (Δx, [G′(x)Δx]y + G(x)Δy).

Thus Theorem 22.12 yields

D[G̃ ∘ F̃](x̄|x̄, z̄)(Δx) = ⋃_{ȳ : G̃(x̄,ȳ)=(x̄,z̄)} G̃′(x̄, ȳ)DF̃(x̄|x̄, ȳ)(Δx)
  = ⋃_{ȳ : G(x̄)ȳ=z̄} G̃′(x̄, ȳ)(Δx, DF(x̄|ȳ)(Δx))
  = ⋃_{ȳ : G(x̄)ȳ=z̄} {Δx} × ([G′(x̄)Δx]ȳ + G(x̄)DF(x̄|ȳ)(Δx)).

It follows that

T_{graph(G̃∘F̃)}(x̄, x̄, z̄) = ⋃_{ȳ : G(x̄)ȳ=z̄} {(Δx, Δx, Δz) | Δz ∈ [G′(x̄)Δx]ȳ + G(x̄)DF(x̄|ȳ)(Δx)}.

Observe now that R₁ is linear and invertible on R₁ graph(G̃ ∘ F̃). Therefore, another application of Lemma 22.6 yields

T_{graph H}(x̄, z̄) = ⋃_{ȳ : G(x̄)ȳ=z̄} {(Δx, Δz) | Δz ∈ [G′(x̄)Δx]ȳ + G(x̄)DF(x̄|ȳ)(Δx)}.

Since ȳ is unique by our invertibility assumptions on G(x̄) and exists due to z̄ ∈ H(x̄), we obtain the claim. ∎
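When F is single-valued and smooth, say F(x) = {y(x)}, the formula of Theorem 22.17 reduces to the classical product rule for x ↦ G(x)y(x). The following finite-difference sketch checks this reduction; the concrete G and y are hypothetical smooth examples, not from the text:

```python
import numpy as np

def G(x):  # hypothetical matrix-valued G : R -> L(R^2; R^2)
    return np.array([[np.cos(x), x],
                     [x ** 2, 1.0]])

def y(x):  # hypothetical smooth single-valued "F"
    return np.array([np.exp(x), np.sin(x)])

def H(x):  # product mapping H(x) = G(x)y(x)
    return G(x) @ y(x)

x, dx, h = 0.4, 1.0, 1e-6
lhs = (H(x + h * dx) - H(x)) / h          # DH(x|H(x))(dx), numerically
dG = (G(x + h * dx) - G(x)) / h           # [G'(x)dx]
dy = (y(x + h * dx) - y(x)) / h           # y'(x)dx
rhs = dG @ y(x) + G(x) @ dy               # [G'(x)dx]y + G(x)y'(x)dx
assert np.allclose(lhs, rhs, atol=1e-4)
```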
Corollary 22.18 (second derivative chain rule for the Clarke subdifferential). Let X, Y be Banach spaces, let f : Y → ℝ be locally Lipschitz continuous, and let S : X → Y be twice continuously differentiable. Set h : X → ℝ, h(x) ≔ f(S(x)). If there exists a neighborhood U of x̄ ∈ X such that

(i) f is Clarke regular at S(x) for all x ∈ U;

(ii) S′(x)* has a bounded left-inverse S′(x)*† ∈ 𝕃(X*; Y*) for all x ∈ U;

(iii) the mapping x ↦ S′(x)*† is Fréchet differentiable at x̄;

then for all x̄* ∈ ∂_C h(x̄) = S′(x̄)*∂_C f(S(x̄)),

D[∂_C h](x̄|x̄*)(Δx) = (S′′(x̄)Δx)*ȳ* + S′(x̄)*D[∂_C f](S(x̄)|ȳ*)(S′(x̄)Δx)  (Δx ∈ X)

for the unique ȳ* ∈ ∂_C f(S(x̄)) such that S′(x̄)*ȳ* = x̄*.
Proof. The expression for ∂_C h(x̄) follows from Theorem 13.23. Let now T : U → 𝕃(Y*; X*), T(x) ≔ S′(x)*. Then T is Fréchet differentiable in U as well, which together with assumption (iii) allows us to apply Theorem 22.17 to obtain

D[∂_C(f ∘ S)](x̄|x̄*)(Δx) = [T′(x̄)Δx]ȳ* + S′(x̄)*D[(∂_C f) ∘ S](x̄|ȳ*)(Δx)  (Δx ∈ X).

Furthermore, since S′(x̄)* has a bounded left-inverse, we can apply Theorem 22.14 to obtain for all x ∈ U and all x̃* ∈ ∂_C f(S(x))

D[(∂_C f) ∘ S](x|x̃*)(Δx) = D[∂_C f](S(x)|x̃*)(S′(x)Δx)  (Δx ∈ X)

for the unique x̃* ∈ ∂_C f(S(x)) such that S′(x)*x̃* = x̄*. Finally, since the adjoint mapping A ↦ A* is linear and an isometry, it is straightforward to verify from the definition that T′(x̄)Δx = (S′′(x̄)Δx)*, which yields the claim. ∎
23 CALCULUS FOR THE FRÉCHET CODERIVATIVE
We continue with calculus rules for the Fréchet coderivative. As in Chapter 22, we start with the relevant regularity concept, then prove the fundamental lemmas, and finally derive the calculus rules.
23.1 semi-codifferentiability
Let X, Y be Banach spaces. We say that F : X ⇒ Y is semi-codifferentiable at x̄ ∈ X for ȳ ∈ F(x̄) if for each y* ∈ Y* there exists some x* ∈ D*F(x̄|ȳ)(y*) such that

(23.1)  lim_{graph F ∋ (x_k, y_k) → (x̄, ȳ)} [⟨x*, x_k − x̄⟩_X − ⟨y*, y_k − ȳ⟩_Y] / ‖(x_k − x̄, y_k − ȳ)‖_{X×Y} = 0.
Recalling (18.7), this is equivalent to requiring that −x* ∈ D*F(x̄|ȳ)(−y*) as well. By Lemma 20.5, a mapping is therefore semi-codifferentiable if and only if its inverse is. In particular, we have the following.
Lemma 23.1. Let X, Y be Banach spaces and let F : X → Y be single-valued. If F is Fréchet differentiable at x̄ ∈ X, then

(i) F is semi-codifferentiable at x̄ for ȳ = F(x̄);

(ii) F⁻¹ is semi-codifferentiable at ȳ = F(x̄) for x̄.
Proof. According to the discussion above, in both cases we need to prove that there exists an x* ∈ D*F(x̄|ȳ)(y*) such that also −x* ∈ D*F(x̄|ȳ)(−y*). This follows immediately from the expression for D*F in Theorem 20.12. ∎
23.2 cone transformation formulas

In the following, we consider more generally ε-normal cones for ε ≥ 0, as these results will be needed later in Chapter 25 for proving the corresponding expressions for the limiting normal cone. We refer to Section 22.2 for the definition of a family of continuous inverse selections.
Lemma 23.2. Let X, Y be Banach spaces and assume there exists a family of continuous inverse selections {R⁻¹_y : RC → C | y ∈ C, Ry = x} of R ∈ 𝕃(Y; X) to C ⊂ Y at x ∈ X. If each R⁻¹_y is Fréchet differentiable at x and we set ℓ ≔ ‖(R⁻¹_y)′(x)‖_{𝕃(X;Y)}, then for all ε ≥ 0,

(23.2)  N^{ε/‖R‖_{𝕃(Y;X)}}_{RC}(x) ⊂ ⋃_{y : Ry=x} {x* ∈ X* | R*x* ∈ N^ε_C(y)} ⊂ N^{εℓ}_{RC}(x).

In particular,

N_{RC}(x) = ⋃_{y : Ry=x} {x* ∈ X* | R*x* ∈ N_C(y)}.
Proof. By (18.7), x* ∈ N^ε_{RC}(x) if and only if

lim sup_{RC ∋ Ry_k → Ry} ⟨R*x*, y_k − y⟩_Y / ‖R(y_k − y)‖_X ≤ ε

for some y such that Ry = x or, equivalently,

(23.3)  lim sup_{RC ∋ x_k → x} ⟨R*x*, R⁻¹_y(x_k) − R⁻¹_y(x)⟩_Y / ‖x_k − x‖_X ≤ ε.

This in turn holds if and only if for every ε′ > ε there exists a δ > 0 such that

(23.4)  ⟨R*x*, y_k − y⟩_Y ≤ ε′‖R(y_k − y)‖_X  (y_k ∈ 𝔹(y, δ) ∩ C).

Similarly, R*x* ∈ N^ε_C(y) if and only if

(23.5)  lim sup_{C ∋ y_k → y} ⟨R*x*, y_k − y⟩_Y / ‖y_k − y‖_Y ≤ ε.

Now if x* ∈ N^{ε/‖R‖_{𝕃(Y;X)}}_{RC}(x), then using (23.4) and estimating ‖R(y_k − y)‖_X ≤ ‖R‖_{𝕃(Y;X)}‖y_k − y‖_Y yields (23.5). Hence the first inclusion in (23.2) holds. On the other hand, if R*x* ∈ N^ε_C(y), then we can take y_k = R⁻¹_y(x_k) for some C ∋ x_k → x in (23.5) to obtain (23.3) with ε replaced by εℓ, where ℓ ≥ lim sup_{k→∞} ‖R⁻¹_y(x_k) − R⁻¹_y(x)‖_Y / ‖x_k − x‖_X is finite by the Fréchet differentiability of R⁻¹_y at x. Hence the second inclusion in (23.2) holds as well. ∎
Remark 23.3 (polarity and qualification conditions in finite dimensions). In finite dimensions, Lemma 23.2 for ε = 0 could also be proved with the help of the polarity relationships N_{RC}(x) = T_{RC}(x)° and N_C(y) = T_C(y)° from Lemma 18.10. Furthermore, the existence of a family of continuous selections can be replaced by a qualification condition as in Remark 22.7.
We are now ready to prove the fundamental composition lemma, this time for the Fréchet normal cone.
Lemma 23.4 (fundamental lemma on compositions). Let X, Y, Z be Banach spaces and

C ≔ {(x, y, z) | y ∈ F(x), z ∈ G(y)}

for F : X ⇒ Y and G : Y ⇒ Z. Let (x̄, ȳ, z̄) ∈ C.

(i) If G is semi-codifferentiable at ȳ for z̄, then for all ε ≥ 0,

N^ε_C(x̄, ȳ, z̄) = {(x*, y*, z*) | x* ∈ D*_ε F(x̄|ȳ)(−ỹ* − y*), ỹ* ∈ D*G(ȳ|z̄)(z*)}.

(ii) If F⁻¹ is semi-codifferentiable at ȳ for x̄, then for all ε ≥ 0,

N^ε_C(x̄, ȳ, z̄) = {(x*, y*, z*) | x* ∈ D*F(x̄|ȳ)(−ỹ* − y*), −ỹ* ∈ D*_ε G(ȳ|z̄)(−z*)}.
Proof. We recall that (x*, y*, z*) ∈ N^ε_C(x̄, ȳ, z̄) if and only if

(23.6)  lim sup_{C ∋ (x_k, y_k, z_k) → (x̄, ȳ, z̄)} [⟨x*, x_k − x̄⟩_X + ⟨y*, y_k − ȳ⟩_Y + ⟨z*, z_k − z̄⟩_Z] / ‖(x_k, y_k, z_k) − (x̄, ȳ, z̄)‖_{X×Y×Z} ≤ ε.

In case (i), there exists a ỹ* ∈ D*G(ȳ|z̄)(z*) such that

lim_{graph G ∋ (y_k, z_k) → (ȳ, z̄)} [⟨ỹ*, y_k − ȳ⟩_Y − ⟨z*, z_k − z̄⟩_Z] / ‖(y_k, z_k) − (ȳ, z̄)‖_{Y×Z} = 0.

Thus (23.6) holds if and only if

lim sup_{C ∋ (x_k, y_k, z_k) → (x̄, ȳ, z̄)} [⟨x*, x_k − x̄⟩_X + ⟨ỹ* + y*, y_k − ȳ⟩_Y] / ‖(x_k, y_k, z_k) − (x̄, ȳ, z̄)‖_{X×Y×Z} ≤ ε.

But this is equivalent to x* ∈ D*_ε F(x̄|ȳ)(−ỹ* − y*), which yields the claim.

In case (ii), the semi-codifferentiability of F⁻¹ yields the existence of a ỹ* ∈ −y* + D*F⁻¹(ȳ|x̄)(−x*), i.e., such that x* ∈ D*F(x̄|ȳ)(−ỹ* − y*), satisfying

lim_{graph F⁻¹ ∋ (y_k, x_k) → (ȳ, x̄)} [−⟨ỹ* + y*, y_k − ȳ⟩_Y − ⟨x*, x_k − x̄⟩_X] / ‖(y_k, x_k) − (ȳ, x̄)‖_{Y×X} = 0.

Thus (23.6) holds if and only if

lim sup_{C ∋ (x_k, y_k, z_k) → (x̄, ȳ, z̄)} [⟨z*, z_k − z̄⟩_Z − ⟨ỹ*, y_k − ȳ⟩_Y] / ‖(x_k, y_k, z_k) − (x̄, ȳ, z̄)‖_{X×Y×Z} ≤ ε.

But this is equivalent to −ỹ* ∈ D*_ε G(ȳ|z̄)(−z*), which yields the claim. ∎
For the remaining results, we fix ε = 0. If one of the two mappings is single-valued, we can use Lemma 23.1 to verify its semi-codifferentiability and Theorem 20.12 for the expression of its coderivative to obtain from Lemma 23.4 the following two special cases.
Corollary 23.5 (fundamental lemma on compositions: single-valued outer mapping). Let X, Y, Z be Banach spaces and

C ≔ {(x, y, G(y)) | y ∈ F(x)}

for F : X ⇒ Y and G : Y → Z. If (x̄, ȳ, z̄) ∈ C and G is Fréchet differentiable at ȳ, then

N_C(x̄, ȳ, z̄) = {(x*, y*, z*) | x* ∈ D*F(x̄|ȳ)(−[G′(ȳ)]*z* − y*), y* ∈ Y*}.
Corollary 23.6 (fundamental lemma on compositions: single-valued inner mapping). Let X, Y, Z be Banach spaces and

C ≔ {(x, y, z) | y = F(x), z ∈ G(y)}

for F : X → Y and G : Y ⇒ Z. If (x̄, ȳ, z̄) ∈ C, F is continuously Fréchet differentiable at x̄, and F′(x̄)* has a left-inverse F′(x̄)*† ∈ 𝕃(X*; Y*), then

N_C(x̄, ȳ, z̄) = {(F′(x̄)*(−ỹ* − y*), y*, z*) | −ỹ* ∈ D*G(ȳ|z̄)(−z*), y* ∈ Y*}.
23.3 calculus rules

Using these lemmas, we again obtain calculus rules. The proofs are similar to those in Section 22.3, and we only note the differences.
Theorem 23.7 (addition of a single-valued differentiable mapping). Let X, Y be Banach spaces, G : X → Y be Fréchet differentiable, and F : X ⇒ Y. Then for any x̄ ∈ X and ȳ ∈ H(x̄) ≔ F(x̄) + G(x̄),

D*H(x̄|ȳ)(y*) = D*F(x̄|ȳ − G(x̄))(y*) + [G′(x̄)]*y*  (y* ∈ Y*).
Proof. We have graph H = RC for C and R as given by (22.4) in the proof of Theorem 22.11. Applying Lemma 23.2 in place of Lemma 22.6 in the proof of Theorem 22.11 gives

N_{graph H}(x̄, ȳ) = {(x*, y*) | (y*, x*, y*) ∈ N_C(ȳ − G(x̄), x̄, G(x̄))}.

Moreover, C given in (22.4) coincides with the C defined in Corollary 23.5 with F⁻¹ in place of F. Using Corollary 23.5 and inserting the expression from Lemma 20.5 for D*F⁻¹, it follows that

N_C(u, x, v) = {(u*, x*, v*) ∈ Y* × X* × Y* | u* ∈ D*F⁻¹(u|x)(−[G′(x)]*v* − x*)}
            = {(u*, x*, v*) ∈ Y* × X* × Y* | [G′(x)]*v* + x* ∈ D*F(x|u)(−u*)}.

Thus

D*H(x̄|ȳ)(y*) = {x* | (−y*, x*, −y*) ∈ N_C(ȳ − G(x̄), x̄, G(x̄))}
             = {x* | −[G′(x̄)]*y* + x* ∈ D*F(x̄|ȳ − G(x̄))(y*)},

which yields the claim. ∎
Theorem 23.8 (outer composition with a single-valued differentiable mapping). Let X, Y, Z be Banach spaces, F : X ⇒ Y, and G : Y → Z. Let x̄ ∈ X and z̄ ∈ H(x̄) ≔ G(F(x̄)) be given. If G is Fréchet differentiable at every ȳ ∈ F(x̄), invertible on ran G near z̄, and the inverse G⁻¹ is Fréchet differentiable at z̄, then

D*H(x̄|z̄)(z*) = ⋃_{ȳ : G(ȳ)=z̄} D*F(x̄|ȳ)([G′(ȳ)]*z*)  (z* ∈ Z*).
Proof. We have graph H = RC for R and C as given by (22.5) in the proof of Theorem 22.12. Applying Lemma 23.2 in place of Lemma 22.6 then yields

N_{graph H}(x̄, z̄) = ⋃_{ȳ : G(ȳ)=z̄} {(x*, z*) | (x*, 0, z*) ∈ N_C(x̄, ȳ, z̄)}.

Corollary 23.5 then shows that

D*H(x̄|z̄)(z*) = ⋃_{ȳ : G(ȳ)=z̄} {x* | (x*, 0, −z*) ∈ N_C(x̄, ȳ, z̄)}
             = ⋃_{ȳ : G(ȳ)=z̄} {x* | x* ∈ D*F(x̄|ȳ)([G′(ȳ)]*z*)}.

After further simplification, we arrive at the claimed expression. ∎
Corollary 23.9 (outer composition with a linear operator). Let X, Y, Z be Banach spaces, A ∈ 𝕃(Y; Z), and F : X ⇒ Y. If A has a bounded left-inverse A†, then for any x̄ ∈ X and z̄ ∈ H(x̄) ≔ AF(x̄),

D*H(x̄|z̄)(z*) = D*F(x̄|ȳ)(A*z*)  (z* ∈ Z*)

for the unique ȳ ∈ Y such that Aȳ = z̄.
Proof. We only need to verify that G(y) ≔ Ay satisfies the assumptions of Theorem 23.8, which can be done exactly as in the proof of Corollary 22.13. ∎
Theorem 23.10 (inner composition with a single-valued differentiable mapping). Let X, Y, Z be Banach spaces, F : X → Y and G : Y ⇒ Z. Let x̄ ∈ X and z̄ ∈ H(x̄) ≔ G(F(x̄)). If F is continuously Fréchet differentiable near x̄ and F′(x̄)* has a bounded left-inverse F′(x̄)*† ∈ 𝕃(X*; Y*), then

D*H(x̄|z̄)(z*) = [F′(x̄)]*D*G(F(x̄)|z̄)(z*)  (z* ∈ Z*).
Proof. We have graph H = RC for C and R as given by (22.6) in the proof of Theorem 22.14. Applying Lemma 23.2 in place of Lemma 22.6 then yields

N_{graph H}(x̄, z̄) = {(x*, z*) | (x*, 0, z*) ∈ N_C(x̄, F(x̄), z̄)}.

On the other hand, since F is Fréchet differentiable, Corollary 23.6 implies that

N_C(x̄, ȳ, z̄) = {(F′(x̄)*(−ỹ* − y*), y*, z*) | −ỹ* ∈ D*G(ȳ|z̄)(−z*), y* ∈ Y*}.

Thus

D*H(x̄|z̄)(z*) = {x* | (x*, 0, −z*) ∈ N_C(x̄, F(x̄), z̄)}
             = {F′(x̄)*ỹ* | ỹ* ∈ D*G(F(x̄)|z̄)(z*)},

which yields the claim. ∎
Corollary 23.11 (inner composition with a linear operator). Let X, Y, Z be Banach spaces, A ∈ 𝕃(X; Y), and G : Y ⇒ Z, and set H ≔ G ∘ A. If A* has a left-inverse A*† ∈ 𝕃(X*; Y*), then for all x̄ ∈ X and z̄ ∈ H(x̄) ≔ G(Ax̄),

D*H(x̄|z̄)(z*) = A*D*G(Ax̄|z̄)(z*)  (z* ∈ Z*).
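For single-valued continuously differentiable G, Corollary 23.11 reduces to the classical adjoint chain rule (G ∘ A)′(x)* = A*G′(Ax)*. A finite-dimensional sketch of this smooth special case, with a hypothetical G and a randomly generated A:

```python
import numpy as np

def G(y):  # hypothetical smooth G : R^3 -> R^2
    return np.array([y[0] * y[1], np.sin(y[2])])

def G_jac(y):  # exact Jacobian of G at y
    return np.array([[y[1], y[0], 0.0],
                     [0.0, 0.0, np.cos(y[2])]])

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))   # A in L(R^4; R^3)
x = rng.standard_normal(4)

H_jac = G_jac(A @ x) @ A          # (G o A)'(x) by the smooth chain rule
zstar = rng.standard_normal(2)

# Coderivative/adjoint form: D*H(x|z)(z*) = A* G'(Ax)* z*
assert np.allclose(H_jac.T @ zstar, A.T @ (G_jac(A @ x).T @ zstar))
```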
We again apply this to the chain rule from Theorem 4.17. Compare the following expression with that from Corollary 22.16, noting that ∂f : Y ⇒ Y* in Banach spaces, so that D*[∂f](ȳ|ȳ*) : Y** ⇒ Y*.
Corollary 23.12 (second derivative chain rule for the convex subdifferential). Let X, Y be Banach spaces, f : Y → ℝ̄ be proper, convex, and lower semicontinuous, and A ∈ 𝕃(X; Y) be such that A* has a left-inverse A*† ∈ 𝕃(X*; Y*) and ran A ∩ int dom f ≠ ∅. Let h ≔ f ∘ A. Then for any x̄ ∈ X and x̄* ∈ ∂h(x̄) = A*∂f(Ax̄),

D*[∂h](x̄|x̄*)(x**) = A*D*[∂f](Ax̄|ȳ*)(A**x**)  (x** ∈ X**)

for the unique ȳ* ∈ Y* satisfying A*ȳ* = x̄*.
Proof. The expression for ∂h(x̄) follows from Theorem 4.17, to which we apply Corollary 23.11 as well as Corollary 23.9 with A* in place of A. ∎
Hence if Y is reflexive, the expression for the coderivative is identical to that for the graphical derivative.

For the corresponding result for the Clarke subdifferential, we again need a product rule.
Theorem 23.13 (product rule). Let X, Y, Z be Banach spaces, let G : X → 𝕃(Y; Z) be Fréchet differentiable, and F : X ⇒ Y. If G(x) ∈ 𝕃(Y; Z) has a left-inverse G(x)† on ran G(x) for x near x̄ ∈ X, and the mapping x ↦ G(x)† is Fréchet differentiable at x̄, then for all z̄ ∈ H(x̄) ≔ G(x̄)F(x̄) ≔ ⋃_{y ∈ F(x̄)} G(x̄)y,

D*H(x̄|z̄)(z*) = D*F(x̄|ȳ)(G(x̄)*z*) + ([G′(x̄) · ]ȳ)*z*  (z* ∈ Z*)

for the unique ȳ ∈ F(x̄) satisfying G(x̄)ȳ = z̄.
Proof. The proof is analogous to that of Theorem 22.17 for the graphical derivative. We again have graph H = R₁ graph(G̃ ∘ F̃) for R₁, F̃, and G̃ defined in the proof of Theorem 22.17. Let ȳ ∈ F(x̄). By Lemma 23.2, we have

N_{R₀ graph F}(x̄, x̄, ȳ) = {(x*, −x*₀, −y*) | (x* − x*₀, −y*) ∈ N_{graph F}(x̄, ȳ)}

so that

D*F̃(x̄|x̄, ȳ)(x*₀, y*) = D*F(x̄|ȳ)(y*) + x*₀.

We also have

G̃′(x̄, ȳ)*(x*₀, z*) = (x*₀ + ([G′(x̄) · ]ȳ)*z*, G(x̄)*z*).

We now apply Theorem 23.8, whose remaining assumptions are verified exactly as those of Theorem 22.12, which yields

D*[G̃ ∘ F̃](x̄|x̄, z̄)(x*₀, z*) = ⋃_{ȳ : G̃(x̄,ȳ)=(x̄,z̄)} D*F̃(x̄|x̄, ȳ)(G̃′(x̄, ȳ)*(x*₀, z*))
  = ⋃_{ȳ : G(x̄)ȳ=z̄} D*F̃(x̄|x̄, ȳ)(x*₀ + ([G′(x̄) · ]ȳ)*z*, G(x̄)*z*)
  = ⋃_{ȳ : G(x̄)ȳ=z̄} x*₀ + D*F(x̄|ȳ)(G(x̄)*z*) + ([G′(x̄) · ]ȳ)*z*.

It follows that

N_{graph(G̃∘F̃)}(x̄, x̄, z̄) = ⋃_{ȳ : G(x̄)ȳ=z̄} {(x*, −x*₀, −z*) | x* − x*₀ ∈ D*F(x̄|ȳ)(G(x̄)*z*) + ([G′(x̄) · ]ȳ)*z*}.

Observe now that R₁ is linear and invertible on R₁ graph(G̃ ∘ F̃). Therefore, another application of Lemma 23.2 yields

N_{graph H}(x̄, z̄) = ⋃_{ȳ : G(x̄)ȳ=z̄} {(x*, −z*) | x* ∈ D*F(x̄|ȳ)(G(x̄)*z*) + ([G′(x̄) · ]ȳ)*z*}.

Since ȳ is unique by our invertibility assumptions on G(x̄) and exists due to z̄ ∈ H(x̄), we obtain the claim. ∎
Corollary 23.14 (second derivative chain rule for the Clarke subdifferential). Let X, Y be Banach spaces, let f : Y → ℝ be locally Lipschitz continuous, and let S : X → Y be twice continuously differentiable. Set h : X → ℝ, h(x) ≔ f(S(x)). If there exists a neighborhood U of x̄ ∈ X such that

(i) f is Clarke regular at S(x) for all x ∈ U;

(ii) S′(x)* has a bounded left-inverse S′(x)*† ∈ 𝕃(X*; Y*) for all x ∈ U;

(iii) the mapping x ↦ S′(x)*† is Fréchet differentiable at x̄;

then for all x̄* ∈ ∂_C h(x̄) = S′(x̄)*∂_C f(S(x̄)),

D*[∂_C h](x̄|x̄*)(x**) = Q(x̄)*x** + S′(x̄)*D*[∂_C f](S(x̄)|ȳ*)(S′(x̄)**x**)  (x** ∈ X**)

for the linear operator Q : X → 𝕃(X; Y*), Q(x̄)Δx ≔ (S′′(x̄)Δx)*ȳ*, and the unique ȳ* ∈ ∂_C f(S(x̄)) such that S′(x̄)*ȳ* = x̄*.
Proof. The expression for ∂_C h(x̄) follows from Theorem 13.23. Let now T : U → 𝕃(Y*; X*), T(x) ≔ S′(x)*. Then T is Fréchet differentiable in U as well, which together with assumption (iii) allows us to apply Theorem 23.13 to obtain

D*[∂_C(f ∘ S)](x̄|x̄*)(x**) = ([T′(x̄) · ]ȳ*)*x** + D*[(∂_C f) ∘ S](x̄|ȳ*)(S′(x̄)**x**)  (x** ∈ X**).

Furthermore, since S′(x̄)* has a bounded left-inverse, we can apply Theorem 23.10 to obtain for all x ∈ U and all x̃* ∈ ∂_C f(S(x))

D*[(∂_C f) ∘ S](x|x̃*)(y**) = S′(x)*D*[∂_C f](S(x)|x̃*)(y**)  (y** ∈ Y**)

for the unique x̃* ∈ ∂_C f(S(x)) such that S′(x)*x̃* = x̄*. The claim now follows again from the fact that T′(x̄)Δx = (S′′(x̄)Δx)*. ∎
Note that Q(x̄)Δx ≔ (S′′(x̄)Δx)*ȳ* also occurs in the corresponding Corollary 22.18, and recall from Examples 20.1 and 20.6 and Theorem 20.12 that coderivatives of differentiable single-valued mappings amount to taking adjoints of their Fréchet derivatives.
24 CALCULUS FOR THE CLARKE GRAPHICAL DERIVATIVE
We now turn to the limiting (co)derivatives. Compared to the basic (co)derivatives, calculus rules for these are much more challenging and require even more assumptions. In this chapter, we consider the Clarke graphical derivative, where in addition to strict differentiability we will for the sake of simplicity assume T-regularity of the set-valued mapping (so that the Clarke graphical derivative coincides with the graphical derivative) and show that this regularity is preserved under addition and composition with a single-valued mapping.
24.1 strict differentiability

The following concept generalizes the notion of strict differentiability for single-valued mappings (see Remark 2.6) to set-valued mappings. Let X, Y be Banach spaces. We say that F : X ⇒ Y is strictly differentiable at x̄ ∈ X for ȳ ∈ F(x̄) if graph F is closed near (x̄, ȳ) and

(24.1a)  for every Δy ∈ DF(x̄|ȳ)(Δx), τ_k ↘ 0, and x_k, x̃_k → x̄ with (x_k − x̃_k)/τ_k → Δx, and ỹ_k ∈ F(x̃_k) with ỹ_k → ȳ,

(24.1b)  there exist y_k ∈ F(x_k) with (y_k − ỹ_k)/τ_k → Δy.

Compared to semi-differentiability, strict differentiability requires that the limits realizing the various directions are interchangeable with limits of the base points; in other words, that the graphical derivative is itself an inner limit, i.e.,

(24.2)  DF(x̄|ȳ)(Δx) = lim inf_{τ ↘ 0, Δx̃ → Δx, graph F ∋ (x, ỹ) → (x̄, ȳ)} [F(x + τΔx̃) − ỹ]/τ  (Δx ∈ X).
Lemma 24.1. If X and Y are finite-dimensional, then F : X ⇒ Y is strictly differentiable at x̄ ∈ X for ȳ ∈ Y if and only if

(24.3)  DF(x̄|ȳ)(Δx) = lim inf_{graph F ∋ (x, ỹ) → (x̄, ȳ), Δx̃ → Δx, DF(x|ỹ)(Δx̃) ≠ ∅} DF(x|ỹ)(Δx̃)  (Δx ∈ X).
Proof. We first show that

(24.4)  graph DF(x̄, ȳ) ⊂ {(Δx, Δy) | Δy ∈ lim inf_{graph F ∋ (x, ỹ) → (x̄, ȳ), Δx̃ → Δx, DF(x|ỹ)(Δx̃) ≠ ∅} DF(x|ỹ)(Δx̃)} ≕ K.

If (Δx, Δy) ∉ K, then there exist graph F ∋ (x_k, ỹ_k) → (x̄, ȳ) and Δx_k → Δx with DF(x_k|ỹ_k)(Δx_k) ≠ ∅ such that for some ε > 0 and an infinite subset N ⊂ ℕ,

inf_{Δy_k ∈ DF(x_k|ỹ_k)(Δx_k)} ‖Δy_k − Δy‖_Y ≥ 2ε  (k ∈ N).

By the characterization (20.1) of DF(x_k|ỹ_k), this implies the existence of τ_k ↘ 0 such that

lim sup_{k→∞} inf_{y_k ∈ F(x_k + τ_k Δx_k)} ‖(y_k − ỹ_k)/τ_k − Δy‖_Y ≥ ε.

Thus (Δx, Δy) ∉ graph DF(x̄, ȳ), so that (24.4) holds.

Writing now (24.3) as

DF(x̄|ȳ)(Δx) = {Δy ∈ Y | (x, ỹ, Δx̃) → (x̄, ȳ, Δx) ⟹ ∃ Δỹ → Δy with Δỹ ∈ DF(x|ỹ)(Δx̃)},

the characterization (20.4) of DF(x̄, ȳ) provides the opposite inclusion graph DF(x̄, ȳ) ⊃ K. Therefore (24.3) holds. ∎
In particular, single-valued continuously differentiable mappings and their inverses are strictly differentiable.

Lemma 24.2. Let X, Y be Banach spaces and let F : X → Y be single-valued.

(i) If F is continuously differentiable at x̄ ∈ X, then F is strictly differentiable at x̄ for ȳ = F(x̄).

(ii) If F is continuously differentiable near x̄ ∈ X and F′(x̄)* has a left-inverse F′(x̄)*† ∈ 𝕃(X*; Y*), then F⁻¹ is strictly differentiable at ȳ = F(x̄) for x̄.

Proof. The proof is analogous to that of Lemma 22.3, since the inverse function Theorem 2.8 establishes the continuous differentiability of F⁻¹ and hence its strict differentiability. ∎

Remark 24.3. As in Remark 22.4, if X is finite-dimensional, it suffices in Lemma 24.2 (ii) to assume that F is continuously differentiable with ker F′(x̄)* = {0}.
24.2 cone transformation formulas

The main aim of the following lemmas is to show that tangential regularity is preserved under certain transformations. We do this by proceeding as in Section 22.2: we derive explicit expressions for the transformed cones and then compare them with the corresponding expressions obtained there for the graphical derivative.
Lemma 24.4. Let X, Y be Banach spaces and assume there exists a family of continuous inverse selections {R⁻¹_y : RC → C | y ∈ C, Ry = x} of R ∈ 𝕃(Y; X) to C ⊂ Y at x ∈ X. If each R⁻¹_y is Fréchet differentiable at x and C is tangentially regular at all y ∈ C with Ry = x, then RC is tangentially regular at x and

T_{RC}(x) = ⋃_{y : Ry=x} R T_C(y).
Proof. We first prove "⊃". Suppose Δy ∈ T_C(y) for some y ∈ C with Ry = x. Then for any C ∋ ỹ_k → y there exist y_k ∈ C and τ_k ↘ 0 such that Δy = lim_{k→∞}(y_k − ỹ_k)/τ_k. Consequently, since R is bounded, R(y_k − ỹ_k)/τ_k → RΔy. To show that RΔy ∈ T_{RC}(x), let RC ∋ x_k → x be given. Take now ỹ_k ≔ R⁻¹_y(x_k), which satisfies ỹ_k → y = R⁻¹_y(x) due to x_k → x. Then x_k = Rỹ_k ∈ RC satisfies (Ry_k − x_k)/τ_k → RΔy, which shows "⊃".

To prove "⊂", suppose that Δx lies in the Clarke tangent cone of RC at x; then in particular Δx ∈ T_{RC}(x) by Theorem 18.5. By Lemma 22.6, Δx = RΔy for some y ∈ C with Ry = x and Δy ∈ T_C(y), which by the assumed tangential regularity of C at y is also the Clarke tangent cone of C at y. This shows "⊂".

Comparing now these expressions with the expression for T_{RC}(x) provided by Lemma 22.6 and using the tangential regularity of C shows the claimed tangential regularity of RC. ∎
Remark 24.5 (regularity assumptions). The assumption in Lemma 24.4 that C is tangentially regular is not needed if ker R = {0} or, more generally, if R is a continuously differentiable mapping with ker ∇R(y) = {0}; see [Mordukhovich 1994, Corollary 5.4].
Lemma 24.6 (fundamental lemma on compositions). Let X, Y, Z be Banach spaces and

C ≔ {(x, y, z) | y ∈ F(x), z ∈ G(y)}

for F : X ⇒ Y and G : Y ⇒ Z. If (x̄, ȳ, z̄) ∈ C and either

(a) G is strictly differentiable at ȳ for z̄, or

(b) F⁻¹ is strictly differentiable at ȳ for x̄,
then

(24.5)  T_C(x̄, ȳ, z̄) = {(Δx, Δy, Δz) | Δy ∈ DF(x̄|ȳ)(Δx), Δz ∈ DG(ȳ|z̄)(Δy)}.

Moreover, if F is T-regular at x̄ for ȳ and G is T-regular at ȳ for z̄, then C is tangentially regular at (x̄, ȳ, z̄).

Proof. The proof is analogous to that of Lemma 22.8, using in this case the strict differentiability of G in place of semi-differentiability. We only consider the case (a), as the case (b) is proved similarly. First, we have (Δx, Δy, Δz) ∈ T_C(x̄, ȳ, z̄) if and only if for all τ_k ↘ 0 and C ∋ (x̃_k, ỹ_k, z̃_k) → (x̄, ȳ, z̄), there exist (x_k, y_k, z_k) ∈ C such that

Δx = lim_{k→∞} (x_k − x̃_k)/τ_k,  Δy = lim_{k→∞} (y_k − ỹ_k)/τ_k,  Δz = lim_{k→∞} (z_k − z̃_k)/τ_k.

This immediately yields "⊂".

To prove "⊃", suppose Δy ∈ DF(x̄|ȳ)(Δx) and Δz ∈ DG(ȳ|z̄)(Δy), and take τ_k ↘ 0 and C ∋ (x̃_k, ỹ_k, z̃_k) → (x̄, ȳ, z̄). By the definition of DF(x̄|ȳ), there exist (x_k, y_k) ∈ graph F such that the first two limits hold. By the strict differentiability of G at ȳ for z̄, we can also find z_k ∈ G(y_k) such that (z_k − z̃_k)/τ_k → Δz. This shows the remaining limit.

Finally, the tangential regularity of C follows from the assumed T-regularities of F and G by comparing (24.5) with the corresponding expression (22.3). ∎
If one of the two mappings is single-valued, we can use Lemma 24.2 to verify its strict differentiability and Theorem 20.12 for the expression of its graphical derivative to obtain from Lemma 24.6 the following two special cases.
Corollary 24.7 (fundamental lemma on compositions: single-valued outer mapping). Let X, Y, Z be Banach spaces and

C ≔ {(x, y, G(y)) | y ∈ F(x)}

for F : X ⇒ Y and G : Y → Z. If (x̄, ȳ, z̄) ∈ C and G is continuously differentiable at ȳ, then

T_C(x̄, ȳ, z̄) = {(Δx, Δy, G′(ȳ)Δy) | Δy ∈ DF(x̄|ȳ)(Δx)}.

Moreover, if F is T-regular at (x̄, ȳ), then C is tangentially regular at (x̄, ȳ, G(ȳ)).
Corollary 24.8 (fundamental lemma on compositions: single-valued inner mapping). Let X, Y, Z be Banach spaces and

C ≔ {(x, y, z) | y = F(x), z ∈ G(y)}

for F : X → Y and G : Y ⇉ Z. If (x, y, z) ∈ C, F is continuously Fréchet differentiable at x, and F′(x)* has a left-inverse F′(x)*† ∈ 𝕃(X*; Y*), then

T_C(x, y, z) = {(Δx, Δy, Δz) | Δy = F′(x)Δx, Δz ∈ DG(y|z)(Δy)}.

Moreover, if G is T-regular at (y, z), then C is tangentially regular at (x, y, z).
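As a finite-dimensional plausibility check of Corollary 24.7, consider the purely illustrative (not from the text) pair F(x) ≔ [|x|, ∞), whose graph is the convex set epi|·|, and G(y) ≔ y². At (0, 0, 0) the corollary predicts that every (Δx, Δy, 0) with Δy ≥ |Δx| lies in T_C(0, 0, 0), since DF(0|0)(Δx) = [|Δx|, ∞) and G′(0) = 0. A minimal numerical sketch following a curve in C:

```python
import numpy as np

def in_C(x, y, z, tol=1e-12):
    # C = {(x, y, G(y)) | y in F(x)} for F(x) = [|x|, inf), G(y) = y^2
    return y >= abs(x) - tol and abs(z - y ** 2) <= tol

dx, dy = 0.5, 0.8                                # a direction with dy >= |dx|
taus = [10.0 ** (-k) for k in range(1, 7)]
quotients = []
for tau in taus:
    p = (tau * dx, tau * dy, (tau * dy) ** 2)    # curve in C through (0, 0, 0)
    assert in_C(*p)
    quotients.append(np.array(p) / tau)          # difference quotient (p - 0)/tau

# the quotients converge to (dx, dy, G'(0)dy) = (dx, dy, 0),
# matching the tangent-direction formula of Corollary 24.7
assert np.allclose(quotients[-1], [dx, dy, 0.0], atol=1e-5)
```

Only one tangent direction is verified here; the equality in Corollary 24.7 is of course a statement about all directions and base points.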
24 calculus for the clarke graphical derivative
24.3 calculus rules
Using these lemmas, we again obtain calculus rules under the assumption that the involved set-valued mapping is regular.
Theorem 24.9 (addition of a single-valued differentiable mapping). Let X, Y be Banach spaces, let G : X → Y be Fréchet differentiable, and F : X ⇉ Y. If G is continuously Fréchet differentiable at x ∈ X and F is T-regular at (x, y − G(x)) for y ∈ H(x) ≔ F(x) + G(x), then H is T-regular at (x, y) and

DH(x|y)(Δx) = DF(x|y − G(x))(Δx) + G′(x)Δx (Δx ∈ X).
Proof. We construct H from C and P as in Theorem 22.11. Due to the assumptions (noting that continuous differentiability implies strict differentiability), C and PC are tangentially regular by Lemmas 24.4 and 24.6, respectively. We now obtain the claimed expression from Theorem 22.11. □
Theorem 24.10 (outer composition with a single-valued differentiable mapping). Let X, Y, Z be Banach spaces, F : X ⇉ Y, and G : Y → Z. Let x ∈ X and z ∈ H(x) ≔ G(F(x)) be given. If G is continuously Fréchet differentiable at each y ∈ F(x), invertible on ran G near z with Fréchet differentiable inverse at z, and F is T-regular at (x, y), then H is T-regular at (x, z) and

DH(x|z)(Δx) = ⋃_{y : G(y)=z} G′(y)DF(x|y)(Δx) (Δx ∈ X).
Proof. We construct H from C and P as in Theorem 22.12. Due to the assumptions, C and PC are tangentially regular by Corollary 24.7 and Lemma 24.4, respectively. We now obtain the claimed expression from Theorem 22.12. □
The special case of a linear operator follows from this exactly as in the proof of Corollary 22.13.
Corollary 24.11 (outer composition with a linear operator). Let X, Y, Z be Banach spaces, A ∈ 𝕃(Y; Z), and F : X ⇉ Y. If A has a bounded left-inverse A† and, for x ∈ X and z ∈ H(x) ≔ AF(x), F is T-regular at (x, y) for the unique y ∈ Y with Ay = z, then H is T-regular at (x, z) and

DH(x|z)(Δx) = A DF(x|y)(Δx) (Δx ∈ X).
Theorem 24.12 (inner composition with a single-valued differentiable mapping). Let X, Y, Z be Banach spaces, F : X → Y and G : Y ⇉ Z. Let x ∈ X and z ∈ H(x) ≔ G(F(x)). If F is continuously Fréchet differentiable near x such that F′(x)* has a bounded left-inverse F′(x)*† ∈ 𝕃(X*; Y*) and G is T-regular at (F(x), z), then H is T-regular at (x, z) and

DH(x|z)(Δx) = DG(F(x)|z)(F′(x)Δx) (Δx ∈ X).
Proof. We construct H from C and P as in Theorem 22.14. Due to the assumptions, C and PC are tangentially regular by Corollary 24.8 and Lemma 24.4, respectively. We now obtain the claimed expression from Theorem 22.14. □
Corollary 24.13 (inner composition with a linear operator). Let X, Y, Z be Banach spaces, A ∈ 𝕃(X; Y), and G : Y ⇉ Z, and set H ≔ G ∘ A. If A* has a left-inverse A*† ∈ 𝕃(X*; Y*) and G is T-regular at (Ax, z) for x ∈ X and z ∈ H(x) ≔ G(Ax), then H is T-regular at (x, z) and

DH(x|z)(Δx) = DG(Ax|z)(AΔx) (Δx ∈ X).
As in Section 22.3, we can apply these results to chain rules for subdifferentials, this time only at points where these subdifferentials are T-regular.
Corollary 24.14 (second derivative chain rule for the convex subdifferential). Let X, Y be Banach spaces, let f : Y → ℝ̄ be proper, convex, and lower semicontinuous, and let A ∈ 𝕃(X; Y) be such that A* has a left-inverse A*† ∈ 𝕃(X*; Y*) and ran A ∩ int dom f ≠ ∅. Let h ≔ f ∘ A. If ∂f is T-regular at Ax, x ∈ X, for y* ∈ ∂f(Ax), then ∂h is T-regular at x for x* = A*y* and

D[∂h](x|x*)(Δx) = A*D[∂f](Ax|y*)(AΔx) (Δx ∈ X).
Theorem 24.15 (product rule). Let X, Y, Z be Banach spaces, let G : X → 𝕃(Y; Z) be Fréchet differentiable, and F : X ⇉ Y. Assume that G(x) ∈ 𝕃(Y; Z) has a left-inverse G(x)† on ran G(x) for x near x̄ ∈ X, and that the mapping x ↦ G(x)† is Fréchet differentiable at x̄. Let x̄ ∈ X and z ∈ H(x̄) ≔ G(x̄)F(x̄) ≔ ⋃_{y∈F(x̄)} G(x̄)y. If F is T-regular at x̄ for the unique y ∈ F(x̄) satisfying G(x̄)y = z, and G is continuously differentiable at x̄, then H is T-regular at x̄ for z and

DH(x̄|z)(Δx) = [G′(x̄)Δx]y + G(x̄)DF(x̄|y)Δx (Δx ∈ X).
Proof. We construct H from P₁ and graph(G ∘ F) as in Theorem 22.17. Due to the assumptions, G and F are T-regular, and hence H is tangentially regular by Theorem 24.10 and Lemma 24.4. We now obtain the claimed expression from Theorem 22.17. □
Corollary 24.16 (second derivative chain rule for the Clarke subdifferential). Let X, Y be Banach spaces, let f : Y → ℝ be locally Lipschitz continuous, and let S : X → Y be twice continuously differentiable. Set h : X → ℝ, h(x) ≔ f(S(x)). If there exists a neighborhood U of x̄ ∈ X such that

(i) f is Clarke regular at S(x) for all x ∈ U;

(ii) S′(x)* has a bounded left-inverse S′(x)*† ∈ 𝕃(X*; Y*) for all x ∈ U;

(iii) the mapping x ↦ S′(x)*† is Fréchet differentiable at x̄;

and ∂_C f is T-regular at S(x̄) for y* ∈ ∂_C f(S(x̄)), then ∂_C h is T-regular at x̄ for x* = S′(x̄)*y* and

D[∂_C h](x̄|x*)(Δx) = (S″(x̄)Δx)*y* + S′(x̄)*D[∂_C f](S(x̄)|y*)(S′(x̄)Δx) (Δx ∈ X).
25 CALCULUS FOR THE LIMITING CODERIVATIVE
The limiting coderivative is the most challenging of all the graphical derivatives and coderivatives, and developing exact calculus rules for it requires the most assumptions. In particular, we will here assume a stronger variant of the assumptions of Chapter 23 for the Fréchet coderivative that also implies N-regularity of the set-valued mapping, so that we can exploit the stronger properties of the Fréchet coderivative. To prove the fundamental composition lemmas, we will also need to introduce the concept of partial sequential normal compactness, which will be used to prevent certain unit-length coderivatives from converging weakly-* to zero. This concept will also be needed in Chapter 27.
25.1 strict codifferentiability
Let X, Y be Banach spaces and F : X ⇉ Y. We say that F is strictly codifferentiable at x ∈ X for y ∈ F(x) if

(25.1) D*F(x|y)(y*) = {x* ∈ X* | ∃ graph F ∋ (x_k, y_k) → (x, y), ε_k ↘ 0 : ∃ (x*_k, y*_k) ⇀* (x*, y*) with x*_k ∈ D*_{ε_k}F(x_k|y_k)(y*_k)},
i.e., if (18.8) is a full weak-* limit. From Theorem 20.12 and Corollary 20.14, it is clear that single-valued continuously differentiable mappings and their inverses are strictly codifferentiable.
Lemma 25.1. Let X, Y be Banach spaces, F : X → Y, x ∈ X, and y = F(x).

(i) If F is continuously differentiable at x, then F is strictly codifferentiable at x for y.

(ii) If F is continuously differentiable near x, then F⁻¹ is strictly codifferentiable at y for x.
The next lemma and counterexample demonstrate that strict codifferentiability is a strictly stronger assumption than N-regularity.
Lemma 25.2. Let X, Y be Banach spaces and let F : X ⇉ Y be strictly codifferentiable at x for y. Then F is N-regular at x for y.
Proof. By Theorem 18.5, strict codifferentiability, and the definition of the inner limit, respectively,

N̂_graph F(x, y) ⊂ N_graph F(x, y) = lim inf_{graph F ∋ (x̃, ỹ) → (x, y), ε ↘ 0} N̂^ε_graph F(x̃, ỹ) ⊂ N̂_graph F(x, y).

Therefore N̂_graph F(x, y) = N_graph F(x, y), i.e., graph F is normally regular at (x, y). □
Example 25.3 (graphical regularity does not imply strict codifferentiability). Consider F(x) ≔ [|x|, ∞), x ∈ ℝ. Then graph F = epi |·| is a convex set and therefore graphically regular at all points, and

N̂_graph F(x, |x|) = (sign x, −1)[0, ∞) if x ≠ 0, while N̂_graph F(0, 0) = (graph F)° = {(x*, y*) | −y* ≥ |x*|}.

Hence N̂_graph F is not continuous and therefore, a fortiori, F is not strictly codifferentiable at (0, 0).
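The polar-cone description N̂_graph F(0, 0) = {(x*, y*) | −y* ≥ |x*|} from Example 25.3 can be checked numerically: (x*, y*) lies in the polar of the convex cone epi|·| exactly when x*x + y*y ≤ 0 for every y ≥ |x|. The sketch below samples the cone on a bounded grid, so a True outcome is only a plausibility check while a False outcome is an actual certificate of non-membership:

```python
import numpy as np

def in_polar_by_sampling(xs, ys, n=200, r=10.0):
    # test <(xs, ys), (x, y)> <= 0 over sampled points of epi|.|
    x = np.linspace(-r, r, n)
    for t in np.linspace(0.0, r, n):     # y = |x| + t sweeps the epigraph
        if np.any(xs * x + ys * (np.abs(x) + t) > 1e-9):
            return False
    return True

def in_polar_exact(xs, ys):
    # the closed-form description from Example 25.3
    return -ys >= abs(xs)

for pair in [(1.0, -2.0), (0.5, -0.5), (1.0, -0.5), (0.0, 1.0)]:
    assert in_polar_by_sampling(*pair) == in_polar_exact(*pair)
```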
25.2 partial sequential normal compactness
One central difficulty in working with infinite-dimensional spaces is the need to distinguish weak-* convergence and strong convergence. In particular, we need to prevent certain sequences whose norm is bounded away from zero from weak-* converging to zero. As we cannot guarantee this in general, we need to add it as an assumption. In our specific setting, this is the partial sequential normal compactness (PSNC) of G : Y ⇉ Z at y for z, which holds if
(25.2) ε_k ↘ 0, (y_k, z_k) → (y, z), y*_k ⇀* 0, ‖z*_k‖_{Z*} → 0, and y*_k ∈ D*_{ε_k}G(y_k|z_k)(z*_k) ⟹ ‖y*_k‖_{Y*} → 0.
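The behaviour that (25.2) is designed to exclude is already visible in ℓ²: the orthonormal basis vectors e_k satisfy ‖e_k‖ = 1 for every k, yet ⟨e_k, y⟩ → 0 for each fixed y ∈ ℓ², so they converge weakly(-*) to zero without converging in norm. A minimal numerical sketch, truncating ℓ² to its first n coordinates:

```python
import numpy as np

n = 10_000
y = 1.0 / np.arange(1, n + 1)      # a fixed element of (truncated) l^2

def pairing_with_basis(k):
    # <e_k, y> simply picks out the k-th coordinate of y
    e_k = np.zeros(n)
    e_k[k - 1] = 1.0
    return float(e_k @ y)

# the norms stay at 1 while the duality pairings vanish:
norms = [1.0] * 4                  # ||e_k|| = 1 for every k
pairings = [pairing_with_basis(k) for k in (1, 10, 100, 1000)]
# pairings == [1.0, 0.1, 0.01, 0.001]
```

PSNC of G at y for z rules out exactly this scenario for the sequence y*_k in (25.2): weak-* vanishing dual elements whose norms do not vanish.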
Obviously, if Y* is finite-dimensional, then every mapping G : Y ⇉ Z is PSNC. To prove the PSNC property of single-valued mappings and their inverses, we will need an estimate of ε-coderivatives.
Lemma 25.4. Let X, Y be Banach spaces and let F : X → Y be continuously differentiable at x ∈ X. Then for any ε > 0, L ≥ ‖F′(x)‖_{𝕃(X;Y)}, and y = F(x),

D*_ε F(x|y)(y*) ⊂ 𝔹(F′(x)*y*, (L + 1)ε) (y* ∈ Y*).
Proof. By definition, x* ∈ D*_ε F(x|y)(y*) if and only if for every sequence x_k → x,

(25.3) lim sup_{k→∞} (⟨x*, x_k − x⟩_X − ⟨y*, F(x_k) − F(x)⟩_Y) / (‖x_k − x‖²_X + ‖F(x_k) − F(x)‖²_Y)^{1/2} ≤ ε.
Let ℓ > L. Then by the continuous differentiability, and therefore local Lipschitz continuity, of F at x, we have ‖F(x_k) − F(x)‖_Y ≤ ℓ‖x_k − x‖_X for large enough k, and therefore

lim sup_{k→∞} (⟨x*, x_k − x⟩_X − ⟨y*, F(x_k) − F(x)⟩_Y) / ‖x_k − x‖_X ≤ ε(ℓ + 1).
Furthermore, the Fréchet differentiability of F implies that

lim sup_{k→∞} (⟨F′(x)*y*, x_k − x⟩_X − ⟨y*, F(x_k) − F(x)⟩_Y) / ‖x_k − x‖_X = 0

and hence that

lim sup_{k→∞} ⟨x* − F′(x)*y*, x_k − x⟩_X / ‖x_k − x‖_X ≤ ε(ℓ + 1).

Since x_k → x was arbitrary, this implies ‖x* − F′(x)*y*‖_{X*} ≤ ε(ℓ + 1), and since ℓ > L was arbitrary, the claim follows. □
Lemma 25.5. Let Y, Z be Banach spaces and G : Y → Z. If either

(a) G is continuously differentiable near y ∈ Y, or

(b) Y* is finite-dimensional,

then G is PSNC at y for z = G(y).
Proof. The finite-dimensional case (b) is clear from the definition (25.2) of the PSNC property.
For case (a), we have from Lemma 25.4 that D*_{ε_k}G(y_k|z_k)(z*_k) ⊂ 𝔹(G′(y_k)*z*_k, ℓε_k) for any ℓ > ‖G′(y_k)‖_{𝕃(Y;Z)}. By the continuous differentiability of G, this will hold for ℓ > ‖G′(y)‖_{𝕃(Y;Z)} and any k ∈ ℕ large enough. Thus there exist q*_k ∈ 𝔹(0, ℓε_k) such that

y*_k = G′(y_k)*z*_k + q*_k = G′(y)*z*_k + [G′(y_k) − G′(y)]*z*_k + q*_k → 0

since q*_k → 0 (due to ε_k ↘ 0), ‖z*_k‖_{Z*} → 0, y_k → y, and G is continuously differentiable near y. □
Lemma 25.6. Let Y, Z be Banach spaces and G : Y → Z. If either

(a) G is continuously differentiable near y ∈ Y and G′(y)* ∈ 𝕃(Z*; Y*) has a left-inverse G′(y)*† ∈ 𝕃(Y*; Z*), i.e., G′(y)*†G′(y)* = Id, or
(b) Z* is finite-dimensional,

then G⁻¹ is PSNC at z = G(y) for y.
Proof. The finite-dimensional case (b) is clear from the definition (25.2) of the PSNC property.
For case (a), we have from the definition of D*_ε F via N̂^ε_graph F that Δz*_k ∈ D*_ε G⁻¹(z_k|y_k)(Δy*_k) if and only if Δy*_k ∈ D*_ε G(y_k|z_k)(Δz*_k). We thus have to show that

ε_k ↘ 0, (y_k, z_k) → (y, z), z*_k ⇀* 0, ‖y*_k‖_{Y*} → 0, and y*_k ∈ D*_{ε_k}G(y_k|z_k)(z*_k) ⟹ ‖z*_k‖_{Z*} → 0.
From Lemma 25.4, it follows that D*_{ε_k}G(y_k|z_k)(z*_k) ⊂ 𝔹(G′(y_k)*z*_k, ℓε_k) for any ℓ > ‖G′(y_k)‖_{𝕃(Y;Z)}. As in Lemma 25.5, we now deduce that y*_k = G′(y_k)*z*_k + q*_k for some q*_k ∈ 𝔹(0, ℓε_k). Since y*_k − q*_k → 0, we also have G′(y_k)*z*_k → 0 and thus G′(y)*z*_k + [G′(y_k) − G′(y)]*z*_k → 0. Since {z*_k}_{k∈ℕ} is bounded, the continuous differentiability of G and y_k → y yield [G′(y_k) − G′(y)]*z*_k → 0, so we obtain G′(y)*z*_k → 0. Since G′(y)* is assumed to have a bounded left-inverse, this implies z*_k → 0 as required. □
We will use the PSNC property to obtain the following partial compactness property for the limiting coderivative, for which we need to assume reflexivity (or finite-dimensionality) of Y.
Lemma 25.7. Let Y, Z be Banach spaces and G : Y ⇉ Z. Let y ∈ Y and z ∈ G(y) be given. Assume that y* ∈ D*G(y|z)(0) implies y* = 0, and that either

(a) Y is finite-dimensional or

(b) Y is reflexive and G is PSNC at y for z.

If

(y_k, z_k) → (y, z), z*_k ⇀* z*, ε_k ↘ 0, and y*_k ∈ D*_{ε_k}G(y_k|z_k)(z*_k),

then there exists a subsequence such that y*_k ⇀* y* ∈ D*G(y|z)(z*).
Proof. We first show that {y*_k}_{k∈ℕ} is bounded. We argue by contradiction and suppose that {y*_k}_{k∈ℕ} is unbounded. We may then assume that ‖y*_k‖_{Y*} → ∞ by switching to an (unrelabelled) subsequence. Since D*_{ε_k}G(y_k|z_k) is formed from a cone, we also have

𝔹_{Y*} ∋ y*_k/‖y*_k‖_{Y*} ∈ D*_{ε_k}G(y_k|z_k)(z*_k/‖y*_k‖_{Y*}).

Observe that ‖z*_k/‖y*_k‖_{Y*}‖_{Z*} → 0 because {z*_k}_{k∈ℕ} is bounded. Since Y is reflexive, we can use the Eberlein–Šmulyan Theorem 1.9 to extract a subsequence such that y*_k/‖y*_k‖_{Y*} ⇀* ŷ* for some ŷ* ∈ D*G(y|z)(0). If Y is finite-dimensional, clearly ŷ* ≠ 0. Otherwise we need to use the assumed PSNC property: if ŷ* = 0, then (25.2) implies that 1 = ‖y*_k/‖y*_k‖_{Y*}‖_{Y*} → 0, which is a contradiction. Therefore ŷ* ≠ 0. However, we have assumed that y* ∈ D*G(y|z)(0) implies y* = 0, so we obtain a contradiction.

Therefore {y*_k}_{k∈ℕ} is bounded, so we may again use the Eberlein–Šmulyan Theorem 1.9 to extract a subsequence converging weakly-* to some y* ∈ Y*. By the definition of the limiting coderivative, this implies y* ∈ D*G(y|z)(z*) and hence the claim. □
Remark 25.8. The PSNC property, its stronger variant sequential normal compactness (SNC), and their implications are studied in significant detail in [Mordukhovich 2006].
25.3 cone transformation formulas
As in Section 24.2, we now show that normal regularity is preserved under certain transformations by deriving explicit expressions for the transformed cones and then comparing them with the corresponding expressions for the Fréchet coderivative.
Lemma 25.9. Let X, Y be Banach spaces and assume there exists a family of continuous inverse selections {P⁻¹_y : U_y → C | y ∈ C, Py = x} of P ∈ 𝕃(Y; X) to C ⊂ Y at x ∈ X. If each P⁻¹_y is Fréchet differentiable at x and C is normally regular at all y ∈ C with Py = x, then PC is normally regular at x and

N_{PC}(x) = ⋃_{y : Py=x} {x* ∈ X* | P*x* ∈ N_C(y)}.
Proof. We first prove "⊂". Let x* ∈ N_{PC}(x). By definition, this holds if and only if there exist ε_k ↘ 0 as well as x*_k ⇀* x* and x_k → x with x*_k ∈ N̂^{ε_k}_{PC}(x_k). Let y ∈ Y be such that Py = x. Defining y_k ≔ P⁻¹_y x_k, we have Py_k = x_k and C ∋ y_k → y. Thus Lemma 23.2 yields P*x*_k ∈ N̂^{ε_k δ}_C(y_k). By definition of the limiting coderivative, this implies that P*x* ∈ N_C(y). Since this holds for all y ∈ Y with Py = x, we obtain "⊂".

For "⊃", let x* ∈ X* be such that P*x* ∈ N_C(y) for all y ∈ Y with Py = x. Then the assumed regularity of C at y implies that P*x* ∈ N̂_C(y). Hence, taking y_k = y, x*_k = x*, and x_k = x, we deduce from Lemma 23.2 that x*_k ∈ N̂_{PC}(x_k). Again by definition, this implies that x* ∈ N_{PC}(x).

Finally, the normal regularity of PC at x is clear from writing N_C(y) = N̂_C(y) and comparing our expression for N_{PC}(x) to the expression for N̂_{PC}(x) provided by Lemma 23.2. □
Remark 25.10 (regularity assumptions). Again, the assumption in Lemma 24.4 that C is normally regular is not needed if ker P = {0} or, more generally, if P is a continuously differentiable mapping with ker ∇P(y) = {0}.
For the fundamental lemma for the limiting coderivative, we need to assume reflexivity of Y in order to apply the PSNC property via Lemma 25.7.
Lemma 25.11 (fundamental lemma on compositions). Let X, Y, Z be Banach spaces with Y reflexive and

C ≔ {(x, y, z) | y ∈ F(x), z ∈ G(y)}

for F : X ⇉ Y and G : Y ⇉ Z. Let (x, y, z) ∈ C.

(i) If G is strictly codifferentiable and PSNC at y for z, semi-codifferentiable near (y, z) ∈ graph G, and y* ∈ D*G(y|z)(0) implies y* = 0, then

N_C(x, y, z) = {(x*, y*, z*) | x* ∈ D*F(x|y)(−ỹ* − y*), ỹ* ∈ D*G(y|z)(z*)}.

(ii) If F⁻¹ is strictly codifferentiable and PSNC at y for x, semi-codifferentiable near (y, x) ∈ graph F⁻¹, and y* ∈ D*F⁻¹(y|x)(0) implies y* = 0, then

N_C(x, y, z) = {(x*, y*, z*) | x* ∈ D*F(x|y)(−ỹ* − y*), −ỹ* ∈ D*G(y|z)(−z*)}.

Moreover, if F is N-regular at x for y and G is N-regular at y for z, then C is normally regular at (x, y, z).
Proof. We only consider the case (i); the case (ii) is shown analogously. To show the inclusion "⊂", let (x*, y*, z*) ∈ N_C(x, y, z), which by definition holds if and only if there exist ε_k ↘ 0 as well as (x*_k, y*_k, z*_k) ⇀* (x*, y*, z*) and C ∋ (x_k, y_k, z_k) → (x, y, z) with (x*_k, y*_k, z*_k) ∈ N̂^{ε_k}_C(x_k, y_k, z_k). Since by assumption G is semi-codifferentiable at (y_k, z_k) ∈ graph G for k ∈ ℕ sufficiently large, we can apply Lemma 23.4 (i) to obtain a ỹ*_k ∈ D*G(y_k|z_k)(z*_k) such that

(25.4) x*_k ∈ D*_{ε_k}F(x_k|y_k)(−ỹ*_k − y*_k).

Since z*_k ⇀* z*, (y_k, z_k) → (y, z), and ε_k ↘ 0, we deduce from Lemma 25.7 that ỹ*_k ⇀* ỹ* (for a subsequence) for some ỹ* ∈ D*G(y|z)(z*). Since also x*_k ⇀* x* and y*_k ⇀* y*, by (25.4) and the definition of the limiting coderivative, this implies that x* ∈ D*F(x|y)(−ỹ* − y*).

To show "⊃", let x* ∈ D*F(x|y)(−ỹ* − y*) and ỹ* ∈ D*G(y|z)(z*). We can then, by the definition of D*F(x|y), find ε_k ↘ 0 as well as (x_k, y_k) → (x, y) and (x*_k, ŷ*_k) ⇀* (x*, ỹ* + y*) with x*_k ∈ D*_{ε_k}F(x_k|y_k)(−ŷ*_k). Since G is strictly codifferentiable at y for z, taking any z_k → z, we can now find z*_k ⇀* z* and ỹ*_k ⇀* ỹ* with ỹ*_k ∈ D*_{ε_k}G(y_k|z_k)(z*_k). Letting y*_k ≔ ŷ*_k − ỹ*_k, this implies that y*_k ⇀* y* and that x*_k ∈ D*_{ε_k}F(x_k|y_k)(−ỹ*_k − y*_k). By Lemma 23.4 (i), it follows that (x*_k, y*_k, z*_k) ∈ N̂^{ε_k}_C(x_k, y_k, z_k). The claim now follows again from the definition of N_C(x, y, z) as the corresponding outer limit.

Finally, the normal regularity of C follows from the N-regularity of F and G (via Lemma 25.2) by comparing Lemma 23.4 with Lemma 23.4 (i) for ε = 0. □
If one of the two mappings is single-valued, we can use Lemma 25.1 for verifying its strict codifferentiability and Theorem 20.12 for the expression of its coderivative to obtain from Lemma 25.11 the following two special cases.
Corollary 25.12 (fundamental lemma on compositions: single-valued outer mapping). Let X, Y, Z be Banach spaces with Y reflexive and

C ≔ {(x, y, G(y)) | y ∈ F(x)}

for F : X ⇉ Y and G : Y → Z. If (x, y, z) ∈ C and G is continuously differentiable near y, then

N_C(x, y, z) = {(x*, y*, z*) | x* ∈ D*F(x|y)(−[G′(y)]*z* − y*), y* ∈ Y*}.

Moreover, if F is N-regular at (x, y), then C is normally regular at (x, y, G(y)).
Proof. We apply Lemma 25.11, where the strict and semi-codifferentiability requirements on G are verified by Lemmas 23.1 and 25.1; the PSNC requirement follows from Lemma 25.5; and the requirement that y* ∈ D*G(y|z)(0) imply y* = 0 follows from the expression of Theorem 20.12 for D*G(y|z)(0). The claimed normal regularity of C for N-regular F follows from the N-regularity of G established by Theorem 20.12. □
Corollary 25.13 (fundamental lemma on compositions: single-valued inner mapping). Let X, Y, Z be Banach spaces with Y reflexive and

C ≔ {(x, y, z) | y = F(x), z ∈ G(y)}

for F : X → Y and G : Y ⇉ Z. If (x, y, z) ∈ C, F is continuously differentiable near x, and either

(a) F′(x)* has a left-inverse F′(x)*† ∈ 𝕃(X*; Y*) or

(b) X* is finite-dimensional,

then

N_C(x, y, z) = {(F′(x)*(−ỹ* − y*), y*, z*) | −ỹ* ∈ D*G(y|z)(−z*), y* ∈ Y*}.

Moreover, if G is N-regular at (y, z), then C is normally regular at (x, y, z).
Proof. We apply Lemma 25.11, where the strict and semi-codifferentiability requirements on F⁻¹ are verified by Lemmas 23.1 and 25.1; the PSNC requirement follows from Lemma 25.6; and the requirement that y* ∈ D*F⁻¹(y|x)(0) imply y* = 0 follows from the expression of Corollary 20.14 for D*F⁻¹(y|x)(0). The claimed normal regularity of C for N-regular G follows from the N-regularity of F established by Theorem 20.12. □
25.4 calculus rules
Using these lemmas, we again obtain calculus rules.
Theorem 25.14 (addition of a single-valued differentiable mapping). Let X, Y be Banach spaces with Y reflexive, let G : X → Y be Fréchet differentiable, and F : X ⇉ Y. If G is continuously Fréchet differentiable at x ∈ X and F is N-regular at (x, y − G(x)) for y ∈ H(x) ≔ F(x) + G(x), then H is N-regular at (x, y) and

D*H(x|y)(y*) = D*F(x|y − G(x))(y*) + [G′(x)]*y* (y* ∈ Y*).
Proof. We construct H from C and P as in Theorem 23.7. Due to the assumptions (noting that continuous differentiability implies strict differentiability), C and PC are normally regular by Lemmas 25.9 and 25.11, respectively. We now obtain the claimed expression from Theorem 23.7. □
Theorem 25.15 (outer composition with a single-valued differentiable mapping). Let X, Y, Z be Banach spaces with Y reflexive, F : X ⇉ Y, and G : Y → Z. Let x ∈ X and z ∈ H(x) ≔ G(F(x)) be given. If G is continuously Fréchet differentiable at each y ∈ F(x), invertible on ran G near z with Fréchet differentiable inverse at z, and F is N-regular at (x, y), then H is N-regular at (x, z) and

D*H(x|z)(z*) = ⋃_{y : G(y)=z} D*F(x|y)([G′(y)]*z*) (z* ∈ Z*).
Proof. We construct H from C and P as in Theorem 23.8. Due to the assumptions, C and PC are normally regular by Corollary 25.12 and Lemma 25.9, respectively. We now obtain the claimed expression from Theorem 23.8. □
Corollary 25.16 (outer composition with a linear operator). Let X, Y, Z be Banach spaces with Y reflexive, A ∈ 𝕃(Y; Z), and F : X ⇉ Y. If A has a bounded left-inverse A† and, for x ∈ X and z ∈ H(x) ≔ AF(x), F is N-regular at (x, y) for the unique y ∈ Y with Ay = z, then H is N-regular at (x, z) and

D*H(x|z)(z*) = D*F(x|y)(A*z*) (z* ∈ Z*).
Theorem 25.17 (inner composition with a single-valued differentiable mapping). Let X, Y, Z be Banach spaces with Y reflexive, F : X → Y and G : Y ⇉ Z. Let x ∈ X and z ∈ H(x) ≔ G(F(x)). If F is continuously Fréchet differentiable near x such that F′(x)* has a bounded left-inverse F′(x)*† ∈ 𝕃(X*; Y*) and G is N-regular at (F(x), z), then H is N-regular at (x, z) and

D*H(x|z)(z*) = [F′(x)]*D*G(F(x)|z)(z*) (z* ∈ Z*).
Proof. We construct H from C and P as in Theorem 23.10. Due to the assumptions, C and PC are normally regular by Corollary 25.13 and Lemma 25.9, respectively. We now obtain the claimed expression from Theorem 23.10. □
Corollary 25.18 (inner composition with a linear operator). Let X, Y, Z be Banach spaces with Y reflexive, A ∈ 𝕃(X; Y), and G : Y ⇉ Z, and set H ≔ G ∘ A. If A* has a left-inverse A*† ∈ 𝕃(X*; Y*) and G is N-regular at (Ax, z) for x ∈ X and z ∈ H(x) ≔ G(Ax), then H is N-regular at (x, z) and

D*H(x|z)(z*) = A*D*G(Ax|z)(z*) (z* ∈ Z*).
To apply these results to chain rules for subdifferentials, we now need to assume that both spaces are reflexive in addition to N-regularity.
Corollary 25.19 (second derivative chain rule for the convex subdifferential). Let X, Y be reflexive Banach spaces, let f : Y → ℝ̄ be proper, convex, and lower semicontinuous, and let A ∈ 𝕃(X; Y) be such that A* has a left-inverse A*† ∈ 𝕃(X*; Y*) and ran A ∩ int dom f ≠ ∅. Let h ≔ f ∘ A. If ∂f is N-regular at Ax, x ∈ X, for y* ∈ ∂f(Ax), then ∂h is N-regular at x for x* = A*y* and

D*[∂h](x|x*)(Δx) = A*D*[∂f](Ax|y*)(AΔx) (Δx ∈ X).
Theorem 25.20 (product rule). Let X, Y, Z be Banach spaces with Y, Z reflexive, let G : X → 𝕃(Y; Z) be Fréchet differentiable, and F : X ⇉ Y. Assume that G(x) ∈ 𝕃(Y; Z) has a left-inverse G(x)† on ran G(x) for x near x̄ ∈ X, and that the mapping x ↦ G(x)† is Fréchet differentiable at x̄. Let x̄ ∈ X and z ∈ H(x̄) ≔ G(x̄)F(x̄) ≔ ⋃_{y∈F(x̄)} G(x̄)y. If F is N-regular at x̄ for the unique y ∈ F(x̄) satisfying G(x̄)y = z, and G is continuously differentiable at x̄, then H is N-regular at x̄ for z and

D*H(x̄|z)(z*) = D*F(x̄|y)(G(x̄)*z*) + ([G′(x̄) · ]y)*z* (z* ∈ Z*).
Proof. We construct H from P₁ and graph(G ∘ F) as in Theorem 23.13. Due to the assumptions, G and F are N-regular, and hence H is normally regular by Theorem 25.15 and Lemma 25.9. We now obtain the claimed expression from Theorem 23.13. □
Corollary 25.21 (second derivative chain rule for the Clarke subdifferential). Let X, Y be reflexive Banach spaces, let f : Y → ℝ be locally Lipschitz continuous, and let S : X → Y be twice continuously differentiable. Set h : X → ℝ, h(x) ≔ f(S(x)). If there exists a neighborhood U of x̄ ∈ X such that

(i) f is Clarke regular at S(x) for all x ∈ U;

(ii) S′(x)* has a bounded left-inverse S′(x)*† ∈ 𝕃(X*; Y*) for all x ∈ U;

(iii) the mapping x ↦ S′(x)*† is Fréchet differentiable at x̄;
and ∂_C f is N-regular at S(x̄) for y* ∈ ∂_C f(S(x̄)), then ∂_C h is N-regular at x̄ for x* = S′(x̄)*y* and
D*[∂_C h](x̄|x*)(x**) = (S″(x̄)x**)*y* + S′(x̄)*D*[∂_C f](S(x̄)|y*)(S′(x̄)*†x**) (x** ∈ X**).
Remark 25.22. Even in finite dimensions, calculus rules for the sum F + G of arbitrary set-valued mappings F, G : ℝⁿ ⇉ ℝᵐ or the composition F ∘ H for H : ℝⁿ ⇉ ℝᵐ are much more limited, and in general only yield inclusions of the form

D*[F + G](x|y)(y*) ⊂ ⋃_{y=y₁+y₂, y₁∈F(x), y₂∈G(x)} D*F(x|y₁)(y*) + D*G(x|y₂)(y*)

and

D*[F ∘ H](x|y)(y*) ⊂ ⋃_{z∈H(x)∩F⁻¹(y)} D*H(x|z) ∘ D*F(z|y)(y*).
We refer to [Rockafellar & Wets 1998; Mordukhovich 2018] for these and other results.
26 SECOND-ORDER OPTIMALITY CONDITIONS
We now illustrate the use of set-valued derivatives for optimization problems by showing how they can be used to derive second-order (sufficient and necessary) optimality conditions for nonsmooth problems. Again, we do not aim for the most general or sharpest possible results and focus instead on problems of the form (P) involving the composition of a nonsmooth convex functional with a smooth nonlinear operator. As in the previous chapters, we will also assume a regularity condition that allows for cleaner results.
26.1 second-order derivatives
Let X be a Banach space and f : X → ℝ̄. In this chapter, we set

∂_C f(x) ≔ {x* ∈ X* | (x*, −1) ∈ N^C_{epi f}(x, f(x))},

where N^C_A ≔ T°_A is the Clarke normal cone. By Lemma 20.19, this coincides with the classical Clarke subdifferential if f : X → ℝ is locally Lipschitz continuous.
As in the smooth case, second-order conditions are based on a local quadratic model built from curvature information at a point. Since in the nonsmooth case second derivatives, i.e., graphical derivatives of the subdifferential, are no longer unique, we need to consider the entire set of them when building this curvature information. We therefore need to distinguish a lower curvature model at x ∈ X for x* ∈ ∂_C f(x) in direction Δx ∈ X,

Q̲_f(Δx; x|x*) ≔ inf_{Δx* ∈ D[∂_C f](x|x*)(Δx)} ⟨Δx*, Δx⟩_X,

as well as an upper curvature model

Q̄_f(Δx; x|x*) ≔ sup_{Δx* ∈ D[∂_C f](x|x*)(Δx)} ⟨Δx*, Δx⟩_X.
It turns out that even for Δx ≠ 0, we need to consider the stationary upper model

Q̄⁰_f(Δx; x|x*) ≔ sup_{Δx* ∈ D[∂_C f](x|x*)(0)} ⟨Δx*, Δx⟩_X,
which we use to define the extended upper model

Q̂_f(Δx; x|x*) ≔ max{Q̄_f(Δx; x|x*), Q̄⁰_f(Δx; x|x*)} = sup_{Δx* ∈ D[∂_C f](x|x*)(Δx) ∪ D[∂_C f](x|x*)(0)} ⟨Δx*, Δx⟩_X.
For smooth functionals, these models coincide with the usual Hessian.
Theorem 26.1. Let X be a Banach space and let f : X → ℝ be twice continuously differentiable. Then for every x, Δx ∈ X,

Q̲_f(Δx; x|f′(x)) = Q̄_f(Δx; x|f′(x)) = ⟨f″(x)Δx, Δx⟩_X

and

Q̂_f(Δx; x|f′(x)) = max{0, ⟨f″(x)Δx, Δx⟩_X}.
Proof. Since ∂_C f(x) = {f′(x)} by Theorem 13.5, it follows from Theorem 20.12 that

D[∂_C f](x|f′(x))(Δx) = {f″(x)Δx}

and in particular D[∂_C f](x|f′(x))(0) = {0}, which immediately yields the claim. □
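Theorem 26.1 can also be illustrated numerically: for a smooth f, both curvature models reduce to ⟨f″(x)Δx, Δx⟩, which a difference quotient of f′ approximates. The choice f(t) = t⁴ + sin t below is purely illustrative:

```python
import math

def fp(t):    # f'(t) for f(t) = t**4 + sin(t)
    return 4 * t ** 3 + math.cos(t)

def fpp(t):   # f''(t)
    return 12 * t ** 2 - math.sin(t)

def curvature_model(t, dx, h=1e-6):
    # D[d_C f](t|f'(t))(dx) = {f''(t) dx}, so both curvature models equal
    # <f''(t) dx, dx>; approximate f''(t) dx by a difference quotient of f'
    return (fp(t + h * dx) - fp(t)) / h * dx

t, dx = 0.7, 1.3
exact = fpp(t) * dx * dx
assert abs(curvature_model(t, dx) - exact) < 1e-4
```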
We illustrate the nonsmooth case with the usual examples of the indicator functional of the unit ball and the norm on ℝ.
Lemma 26.2. Let f(x) = δ_{[−1,1]}(x), x ∈ ℝ. Then for every x* ∈ ∂f(x) and Δx ∈ ℝ,

Q̲_f(Δx; x|x*) = { ∞ if |x| = 1, x* = 0, xΔx > 0;  ∞ if |x| = 1, x* ∈ (0, ∞)x, Δx ≠ 0;  0 otherwise },

Q̄_f(Δx; x|x*) = { −∞ if |x| = 1, x* = 0, xΔx > 0;  −∞ if |x| = 1, x* ∈ (0, ∞)x, Δx ≠ 0;  0 otherwise },

and

Q̂_f(Δx; x|x*) = Q̄⁰_f(Δx; x|x*) = { ∞ if |x| = 1, x* ∈ (0, ∞)x;  ∞ if |x| = 1, x* = 0, xΔx > 0;  0 if |x| = 1, x* = 0, xΔx ≤ 0;  0 if |x| < 1 }.
Proof. The claims follow directly from the expression (20.7) in Theorem 20.17 with sup ∅ = −∞ and inf ∅ = ∞. □
Lemma 26.3. Let f(x) = |x|, x ∈ ℝ. Then for every x* ∈ ∂f(x) and Δx ∈ ℝ,

Q̲_f(Δx; x|x*) = { ∞ if x = 0, Δx ≠ 0, sign Δx ≠ x*;  0 otherwise },

Q̄_f(Δx; x|x*) = { −∞ if x = 0, Δx ≠ 0, sign Δx ≠ x*;  0 otherwise },

and

Q̂_f(Δx; x|x*) = Q̄⁰_f(Δx; x|x*) = { 0 if x ≠ 0, x* = sign x;  0 if x = 0, |x*| = 1, x*Δx ≥ 0;  ∞ if x = 0, |x*| = 1, x*Δx < 0;  ∞ if x = 0, |x*| < 1 }.
Proof. The claims follow directly from the expression (20.13) in Theorem 20.18 with sup ∅ = −∞ and inf ∅ = ∞. □
These results can be lifted to the corresponding integral functionals on L^p(Ω) using the results of Chapter 21. Similarly, we obtain calculus rules for the curvature functionals from the corresponding results in Chapter 22.
Theorem 26.4 (sum rule). Let X be a Banach space, let f : X → ℝ be locally Lipschitz continuous, and let g : X → ℝ be twice continuously differentiable. Set j(x) ≔ f(x) + g(x). Then for every x ∈ X and x* ∈ ∂_C f(x),

Q̲_j(Δx; x|x* + g′(x)) = Q̲_f(Δx; x|x*) + ⟨g″(x)Δx, Δx⟩_X (Δx ∈ X),
Q̄_j(Δx; x|x* + g′(x)) = Q̄_f(Δx; x|x*) + ⟨g″(x)Δx, Δx⟩_X (Δx ∈ X).
Proof. We only show the expression for the upper model, the lower model being analogous. First, by Theorem 13.20, we have ∂_C j(x) = {x* + g′(x) | x* ∈ ∂_C f(x)}. The sum rule of Theorem 22.11 for the graphical derivative together with Theorem 20.12 then yields

D[∂_C j](x|x* + g′(x))(Δx) = D[∂_C f](x|x*)(Δx) + g″(x)Δx

and therefore

Q̄_j(Δx; x|x* + g′(x)) = sup_{Δx* ∈ D[∂_C j](x|x*+g′(x))(Δx)} ⟨Δx*, Δx⟩_X = sup_{Δx* ∈ D[∂_C f](x|x*)(Δx)} ⟨Δx*, Δx⟩_X + ⟨g″(x)Δx, Δx⟩_X. □
Theorem 26.5 (chain rule). Let X, Y be Banach spaces, let f : Y → ℝ be convex, and let S : X → Y be twice continuously differentiable. Set h(x) ≔ f(S(x)). If there exists a neighborhood U of x̄ ∈ X such that
(i) f is Clarke regular at S(x) for all x ∈ U;

(ii) S′(x)* has a bounded left-inverse S′(x)*† ∈ 𝕃(X*; Y*) for all x ∈ U;

(iii) the mapping x ↦ S′(x)*† is Fréchet differentiable at x̄;

then for all x* ∈ ∂_C h(x̄) = S′(x̄)*∂_C f(S(x̄)),

Q̲_h(Δx; x̄|x*) = ⟨y*, [S″(x̄)Δx]Δx⟩_Y + Q̲_f(S′(x̄)Δx; S(x̄)|y*) (Δx ∈ X),
Q̄_h(Δx; x̄|x*) = ⟨y*, [S″(x̄)Δx]Δx⟩_Y + Q̄_f(S′(x̄)Δx; S(x̄)|y*) (Δx ∈ X),

for the unique y* ∈ ∂_C f(S(x̄)) such that S′(x̄)*y* = x*.
Proof. We again only consider the upper model Q_j, the lower model being analogous. Due to our assumptions, we can apply Corollary 24.16 to obtain

  D[∂_C(f ∘ S)](x̄ | x̄*)(Δx) = [S″(x̄)Δx]*ȳ* + S′(x̄)* D[∂f](S(x̄) | ȳ*)(S′(x̄)Δx),

where [S″(x̄)Δx]* ∈ 𝕃(Y*; X*). Thus every Δx* ∈ D[∂_C(f ∘ S)](x̄ | x̄*)(Δx) can be written as Δx* = [S″(x̄)Δx]*ȳ* + S′(x̄)*Δy* for some Δy* ∈ D[∂f](S(x̄) | ȳ*)(S′(x̄)Δx). Inserting this into the definition of Q_j yields

  Q_j(Δx; x̄ | x̄*) = sup_{Δy* ∈ D[∂f](S(x̄)|ȳ*)(S′(x̄)Δx)} ⟨[S″(x̄)Δx]*ȳ* + S′(x̄)*Δy*, Δx⟩_X
                   = ⟨ȳ*, [S″(x̄)Δx]Δx⟩_Y + sup_{Δy* ∈ D[∂f](S(x̄)|ȳ*)(S′(x̄)Δx)} ⟨Δy*, S′(x̄)Δx⟩_Y. ∎
26.2 subconvexity
We say that f : X → ℝ is subconvex near x̄ for x̄* ∈ ∂_C f(x̄) if for all ρ > 0, there exists ε > 0 such that

(26.1) f(x) − f(x̃) ≥ ⟨x̃*, x − x̃⟩_X − (ρ/2)‖x − x̃‖²_X  (x, x̃ ∈ 𝔹(x̄, ε); x̃* ∈ ∂_C f(x̃) ∩ 𝔹(x̄*, ε)).

We say that f is subconvex at x̄ for x̄* if this holds with x = x̄ fixed. It is clear that convex functions are subconvex near any point for any subderivative. By extension, scalar functions such as t ↦ |t|^q for q ∈ (0, 1) that are locally minorized by x ↦ f(x̄) + ⟨x̄*, x − x̄⟩_X at points of nonsmoothness are also subconvex.
The sum of two subconvex functions for which the subdifferential sum rule holds is clearly also subconvex. The next result shows that smooth functions only need to have a nonnegative Hessian at the single point x̄ to be subconvex, in contrast to the everywhere nonnegative Hessian of convex functions.
Lemma 26.6. Let X be a Banach space and let f : X → ℝ be twice continuously differentiable. If ⟨f″(x̄)Δx, Δx⟩_X ≥ 0 for all Δx ∈ X, then f is subconvex near x̄ ∈ X for f′(x̄).

Proof. Fix ρ > 0. We apply Theorem 2.10 first to f to obtain for every x, h ∈ X that

  f(x + h) − f(x) = ∫₀¹ ⟨f′(x + th), h⟩_X dt.

Similarly, the same theorem applied to t ↦ ⟨f′(x + th), h⟩_X for any x, h ∈ X yields

  ⟨f′(x + th), h⟩_X − ⟨f′(x), h⟩_X = ∫₀¹ ⟨f″(x + sth)th, h⟩_X ds.

Combined, these two expansions yield

(26.2) f(x + h) − f(x) = ⟨f′(x), h⟩_X + ∫₀¹ ∫₀¹ t⟨f″(x + sth)h, h⟩_X ds dt.

Since ⟨f″(x̄)h, h⟩_X ≥ 0, we have

  ⟨f″(x̄ + d)h, h⟩_X ≥ ⟨[f″(x̄ + d) − f″(x̄)]h, h⟩_X  (d, h ∈ X).

Therefore, by the continuity of f″, for any ρ > 0 we can find ε > 0 such that

  ⟨f″(x + d)h, h⟩_X ≥ −(ρ/2)‖h‖²_X  (d ∈ 𝔹(0, ε), x ∈ 𝔹(x̄, ε), h ∈ X).

Taking d = sth, this and (26.2) show that

  f(x + h) − f(x) ≥ ⟨f′(x), h⟩_X − (ρ/2)‖h‖²_X.

The claim now follows by applying this estimate at x̃ with h = x − x̃. ∎
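In one dimension, the conclusion of Lemma 26.6 can be sampled numerically. The sketch below (all parameter values hypothetical, not from the text) checks the subconvexity inequality (26.1) for f(x) = x⁴ at x̄ = 0, where f″(0) = 0 ≥ 0 and the only subgradient is x̃* = f′(x̃):

```python
import itertools

def f(x): return x**4          # f''(0) = 0 >= 0, so Lemma 26.6 applies at x̄ = 0
def df(x): return 4 * x**3     # ∂_C f(x̃) = {f'(x̃)} for smooth f

rho, eps = 0.1, 0.05           # hypothetical: some eps must work once rho is fixed
grid = [k * eps / 20 for k in range(-20, 21)]   # samples of the ball 𝔹(0, eps)

# Subconvexity (26.1): f(x) - f(xt) >= f'(xt)*(x - xt) - (rho/2)*(x - xt)**2
violations = [(x, xt) for x, xt in itertools.product(grid, grid)
              if f(x) - f(xt) < df(xt) * (x - xt) - rho / 2 * (x - xt) ** 2 - 1e-12]
print(len(violations))  # prints 0: no sampled pair violates the inequality
```

Since x⁴ is convex, the inequality in fact holds even with ρ = 0, so no violations are found.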
Remark 26.7. Subconvexity, which to our knowledge has not previously been treated in the literature, is a stronger condition than the prox-regularity introduced in [Poliquin & Rockafellar 1996]. The latter requires (26.1) to hold merely for a fixed ρ > 0. The definition in [Rockafellar & Wets 1998] is slightly broader and implies the earlier one. Their definition is itself a modification of the primal-lower-nice functions of [Thibault & Zagrodny 1995]. Our notion of subconvexity is also related to those of subsmooth sets and submonotone operators introduced in [Aussel, Daniilidis & Thibault 2005]. An alternative concept for functions, subsmoothness and lower-C¹, has been introduced in [Rockafellar 1981].
26.3 sufficient and necessary conditions
We start with sucient conditions, which are based on the upper model.
Theorem 26.8. Let X be a Banach space and f : X → ℝ. If for x̄ ∈ X,

(i) f is subconvex near x̄ for x̄* = 0;

(ii) 0 ∈ ∂_C f(x̄);

(iii) there exists a μ > 0 such that

  Q̂_f(Δx; x̄ | 0) ≥ μ‖Δx‖²_X  (Δx ∈ X);

then x̄ is a strict local minimizer of f.
Proof. Let x̄* := 0 and Δx ∈ X. By the assumed subconvexity, for every ρ > 0 there exists ε_ρ > 0 such that for x ∈ 𝔹(x̄, ε_ρ/2) and x* ∈ ∂_C f(x) ∩ 𝔹(x̄*, ε_ρ), we have for every t > 0 with t‖Δx‖_X < ε_ρ/2 that

  [f(x + tΔx) − f(x) − t⟨x̄*, Δx⟩_X]/t² ≥ ⟨x* − x̄*, Δx⟩_X/t − (ρ/2)‖Δx‖²_X.

Since ρ > 0 was arbitrary, we thus obtain for every Δx̃ ∈ X and Δx* ∈ D[∂_C f](x̄ | x̄*)(Δx̃) that

  A(Δx, Δx̃, Δx*) := liminf_{t→0, (x−x̄)/t→Δx̃, (x*−x̄*)/t→Δx*, x*∈∂_C f(x)} [f(x + tΔx) − f(x) − t⟨x̄*, Δx⟩_X]/t²
                   ≥ liminf_{t→0, (x−x̄)/t→Δx̃, (x*−x̄*)/t→Δx*, x*∈∂_C f(x)} ⟨x* − x̄*, Δx⟩_X/t
                   = ⟨Δx*, Δx⟩_X.

This implies that

  sup_{Δx* ∈ D[∂_C f](x̄|x̄*)(Δx̃)} A(Δx, Δx̃, Δx*) ≥ sup_{Δx* ∈ D[∂_C f](x̄|x̄*)(Δx̃)} ⟨Δx*, Δx⟩_X =: B(Δx, Δx̃).

Since x̄* = 0, we can fix x = x̄ + tΔx and Δx̃ = Δx in the liminf above and use (iii) to obtain

(26.3) liminf_{t→0} [f(x̄ + 2tΔx) − f(x̄ + tΔx)]/t² ≥ B(Δx, Δx).

Similarly, fixing x = x̄ and Δx̃ = 0 yields

(26.4) liminf_{t→0} [f(x̄ + tΔx) − f(x̄)]/t² ≥ B(Δx, 0) ≥ 0,

where the final inequality follows from the definition of B by taking Δx* = 0 (which is possible since x̄* ∈ ∂_C f(x̄)). We now make a case distinction.

(I) B(Δx, 0) ≥ μ‖Δx‖²_X. In this case, the liminf in (26.4) is strictly positive for Δx ≠ 0, and hence f(x̄ + tΔx) > f(x̄) for all t > 0 sufficiently small.

(II) B(Δx, 0) < μ‖Δx‖²_X. In this case, it follows from (iii) that

  μ‖Δx‖²_X ≤ Q̂_f(Δx; x̄ | 0) = max{B(Δx, Δx), B(Δx, 0)}

and hence that B(Δx, Δx) = Q̂_f(Δx; x̄ | 0) ≥ μ‖Δx‖²_X. Summing (26.3) and (26.4) then yields

  liminf_{t→0} [f(x̄ + 2tΔx) − f(x̄)]/t² ≥ B(Δx, Δx) ≥ μ‖Δx‖²_X,

which again implies for Δx ≠ 0 that f(x̄ + tΔx) > f(x̄) for all t > 0 sufficiently small.

Since Δx ∈ X was arbitrary, x̄ is by definition a strict local minimizer of f. ∎
Remark 26.9. The use of the stationarity curvature model Q_{f,0} in the second-order condition is required since the upper curvature model may not provide any information about the growth of f at x̄ in certain directions. However, since D[∂_C f](x̄ | x̄*)(0) is a cone, if it contains any element Δx* such that ⟨Δx*, Δx⟩_X > 0, then B(Δx, 0) = Q_{f,0}(Δx; x̄ | x̄*) = ∞, ensuring that the condition (iii) holds in the direction Δx for any μ > 0. For example, if f(x) = |x|, then Lemma 26.3 shows that Q_f(Δx; 0 | 0) ≤ 0 for Δx ≠ 0, which indeed does not provide any information about the growth of f at 0. Conversely, Q_{f,0}(Δx; 0 | 0) = ∞ for any Δx ≠ 0, so the growth is more rapid than Q_f can measure.
Combining Theorem 26.8 with Theorem 26.1, we obtain the classical sufficient second-order condition. (Recall that in infinite-dimensional spaces, positive definiteness and coercivity are no longer equivalent, and the latter, stronger property is usually required.)
Corollary 26.10. Let X be a Banach space and let f : X → ℝ be twice continuously differentiable. If for x̄ ∈ X,

(i) f′(x̄) = 0;

(ii) there exists a μ > 0 such that

  ⟨f″(x̄)Δx, Δx⟩_X ≥ μ‖Δx‖²_X  (Δx ∈ X);

then x̄ is a local minimizer of f.

Proof. To apply Theorem 26.8, it suffices to note that ∂_C f(x̄) = {f′(x̄)} by Theorem 13.5 and that the second-order condition ensures subconvexity of f at x̄ for x̄* = 0 by Lemma 26.6. ∎
For nonsmooth functionals, we merely illustrate the sufficient second-order condition with a simple but nontrivial scalar example.
Corollary 26.11. Let X = ℝ and j := g + f for g : ℝ → ℝ twice continuously differentiable and f(x) = |x|. Then the sufficient condition of Theorem 26.8 holds at x̄ ∈ ℝ if and only if one of the following cases holds:

(a) x̄ = 0 and |g′(x̄)| < 1;

(b) x̄ = 0, |g′(x̄)| = 1, and g″(x̄) > 0; or

(c) x̄ ≠ 0, g′(x̄) = −sign x̄, and g″(x̄) > 0.
Proof. We apply Theorem 26.8, for which we need to verify its conditions. First, note that (ii) is equivalent to 0 = x̄* + g′(x̄) for some x̄* ∈ ∂f(x̄) = sign x̄ by Theorem 13.20 and Example 4.7.

We now verify the subconvexity of j near x̄ for the subgradient 0 = x̄* + g′(x̄). Expanding the definition (26.1), this requires

(26.5) |x| − |x̃| + g(x) − g(x̃) ≥ ⟨x̃* + g′(x̃), x − x̃⟩ − (ρ/2)|x − x̃|²  (x, x̃ ∈ 𝔹(x̄, ε); x̃* ∈ ∂_C|·|(x̃) ∩ 𝔹(−g′(x̃), ε)).

In cases (b) and (c), we can apply Lemma 26.6 to deduce the subconvexity of g and therefore of j = f + g, since f is convex. For case (a), we have x̄ = 0 with |g′(x̄)| < 1. Since g′ is continuous, we consequently have −g′(x̃) ∈ (−1, 1) when |x̃ − x̄| = |x̃| is small enough. Since ∂f(x̃) ⊂ {−1, 1} for x̃ ≠ 0, it follows that ∂_C|·|(x̃) ∩ 𝔹(−g′(x̃), ε) = ∅ for x̃ ∈ 𝔹(x̄, ε) \ {x̄} for small enough ε > 0. Therefore, for small enough ε > 0, the condition (26.5) reduces to

(26.6) |x| + g(x) − g(0) ≥ ⟨x̃* + g′(0), x⟩ − (ρ/2)|x|²  (x ∈ [−ε, ε], |x̃*| ≤ 1, |x̃* + g′(0)| ≤ ε).

Furthermore, |g′(0)| < 1 implies that for every ρ > 0 and C > 0, we can find an ε > 0 sufficiently small that

  (1 − ε − |g′(0)|)|x| ≥ ((C − ρ)/2)|x|²  (x ∈ [−ε, ε]).

Since g : ℝ → ℝ is twice continuously differentiable, we can apply a Taylor expansion at x = 0 to obtain for some C > 0 and |x| sufficiently small that

  g(0) ≤ g(x) + ⟨g′(0), −x⟩ + (C/2)|x|².
Adding this to the previous inequality, we obtain for sufficiently small ε > 0 and x̃* ∈ [−1, 1] satisfying |x̃* + g′(0)| ≤ ε that

  |x| + g(x) − g(0) ≥ (|g′(0)| + ε)|x| + ⟨g′(0), x⟩ − (ρ/2)|x|²
                   ≥ ⟨x̃* + g′(0), x⟩ − (ρ/2)|x|²

for every |x| ≤ ε, which is (26.6). Hence j = f + g is subconvex near x̄ = 0 for 0 = x̄* + g′(0).

To verify (iii), we compute the upper curvature model. Let Δx ∈ ℝ. Then by Theorems 26.1 and 26.4,

  Q_j(Δx; x̄ | x̄* + g′(x̄)) = Q_f(Δx; x̄ | x̄*) + ⟨g″(x̄)Δx, Δx⟩,
  Q_{j,0}(Δx; x̄ | x̄* + g′(x̄)) = Q_{f,0}(Δx; x̄ | x̄*),

where Q_f is given by Lemma 26.3. It follows that

  Q_j(Δx; x̄ | x̄* + g′(x̄)) = { −∞              if x̄ = 0, Δx ≠ 0, sign Δx ≠ x̄*,
                               ⟨g″(x̄)Δx, Δx⟩  otherwise,

and

  Q_{j,0}(Δx; x̄ | x̄* + g′(x̄)) = { 0  if x̄ ≠ 0, x̄* = sign x̄,
                                   0  if x̄ = 0, |x̄*| = 1, x̄*Δx ≥ 0,
                                   ∞  if x̄ = 0, |x̄*| = 1, x̄*Δx < 0,
                                   ∞  if x̄ = 0, |x̄*| < 1.

Thus

  Q̂_j(Δx; x̄ | x̄* + g′(x̄)) = { max{0, ⟨g″(x̄)Δx, Δx⟩}  if x̄ ≠ 0, x̄* = sign x̄,
                                max{0, ⟨g″(x̄)Δx, Δx⟩}  if x̄ = 0, |x̄*| = 1, x̄*Δx ≥ 0,
                                ∞                       if x̄ = 0, |x̄*| = 1, x̄*Δx < 0,
                                ∞                       if x̄ = 0, |x̄*| < 1.

The condition (iii) is thus equivalent to

  max{0, ⟨g″(x̄)Δx, Δx⟩} ≥ μ‖Δx‖²  when either x̄ ≠ 0, or x̄ = 0, |g′(x̄)| = 1, and g′(x̄)Δx < 0.

This inequality can only hold for all such Δx ∈ ℝ if μ = g″(x̄) > 0. Hence (ii) and (iii) hold if and only if one of the cases (a)–(c) holds. ∎
Note that case (a) corresponds to the case of strict complementarity or graphical regularity of ∂f in Theorem 20.18. Conversely, cases (b) and (c) imply that g, and therefore j, is locally convex; recall from Theorem 4.2 that for convex functionals, the first-order optimality conditions are necessary and sufficient.
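Case (a) of Corollary 26.11 can be illustrated numerically. The following sketch (with a hypothetical g not taken from the text) uses g(x) = −x/2 + x², so that x̄ = 0 falls under case (a) since |g′(0)| = 1/2 < 1, and samples the strict local minimality of j = f + g at 0:

```python
def g(x): return -x / 2 + x**2   # hypothetical smooth part: |g'(0)| = 1/2 < 1
def j(x): return abs(x) + g(x)   # j = f + g with f(x) = |x|

# Case (a): 0 ∈ ∂_C j(0) = [-1, 1] + g'(0), so Theorem 26.8 promises a strict
# local minimizer at x̄ = 0; sample a punctured neighborhood to confirm.
samples = [k / 1000 for k in range(-100, 101) if k != 0]
strict = all(j(x) > j(0) for x in samples)
print(strict)  # prints True
```

Near 0 the kink of |x| dominates the slope −1/2 of the smooth part, which is exactly the mechanism behind case (a).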
Now we formulate our necessary condition, which is based on the lower curvature model.
Theorem 26.12. Let X be a Banach space and f : X → ℝ. If x̄ ∈ X is a local minimizer of f and f is locally Lipschitz continuous and subconvex at x̄ for 0 ∈ X*, then

  Q̲_f(Δx; x̄ | 0) ≥ 0  (Δx ∈ X).
Proof. We have from Theorem 13.4 that x̄* := 0 ∈ ∂_C f(x̄). By the assumed subconvexity, for every ρ > 0 there exists ε > 0 such that for x̄ + tΔx̃ ∈ 𝔹(x̄, ε/2) and x*_t ∈ ∂_C f(x̄ + tΔx̃) ∩ 𝔹(x̄*, ε), we have for every t > 0 with t‖Δx̃‖_X < ε/2 that

  [f(x̄ + tΔx̃) − f(x̄) − t⟨x̄*, Δx̃⟩_X]/t² ≤ ⟨x*_t − x̄*, Δx̃⟩_X/t + (ρ/2)‖Δx̃‖²_X.

For every Δx* ∈ D[∂_C f](x̄ | x̄*)(Δx), by definition there exist Δx̃ → Δx and, for small enough t > 0, x*_t ∈ ∂_C f(x̄ + tΔx̃) ∩ 𝔹(x̄*, ε) such that (x*_t − x̄*)/t → Δx* in X*. Since ρ > 0 was arbitrary and x̄* = 0, it follows that

  liminf_{Δx̃→Δx, t→0} [f(x̄ + tΔx̃) − f(x̄)]/t²
    ≤ liminf_{Δx̃→Δx, t→0} ( ⟨x*_t − x̄*, Δx⟩_X/t + ⟨x*_t − x̄*, Δx̃ − Δx⟩_X/t )
    = liminf_{t→0} ⟨x*_t − x̄*, Δx⟩_X/t
    ≤ inf_{Δx* ∈ D[∂_C f](x̄|x̄*)(Δx)} ⟨Δx*, Δx⟩_X
    = Q̲_f(Δx; x̄ | x̄*) = Q̲_f(Δx; x̄ | 0).

Since x̄ is a local minimizer, we have f(x̄) ≤ f(x̄ + tΔx̃) for t > 0 sufficiently small and Δx̃ sufficiently close to Δx. Rearranging and passing to the limit thus yields the claimed nonnegativity of Q̲_f(Δx; x̄ | 0). ∎
Remark 26.13. Compared to the sufficient condition of Theorem 26.8, the necessary condition does not involve a "stationary lower model"

  Q̲_{f,0}(Δx; x̄ | 0) := inf_{Δx* ∈ D[∂_C f](x̄|0)(0)} ⟨Δx*, Δx⟩_X.

In fact, Q̲_{f,0}(Δx; x̄ | 0) ≥ 0 is not a necessary optimality condition: let f(x) = |x|, x ∈ ℝ, and x̄ = 0. Then by Theorem 20.18, D[∂f](0 | 0)(0) = ℝ and hence Q̲_{f,0}(Δx; 0 | 0) = −∞ for all Δx ≠ 0.
For smooth functions, we recover the usual second-order necessary condition from Theorem 26.1.
Corollary 26.14. Let X be a Banach space and let f : X → ℝ be twice continuously differentiable. If x̄ ∈ X is a local minimizer of f, then

  ⟨f″(x̄)Δx, Δx⟩_X ≥ 0  (Δx ∈ X).
We again illustrate the nonsmooth case with a scalar example.
Corollary 26.15. Let X = ℝ and j := g + f for g : ℝ → ℝ twice continuously differentiable and f(x) = |x|. Then the necessary condition of Theorem 26.12 holds at x̄ if and only if g″(x̄) ≥ 0.
Proof. We apply Theorem 26.12, for which we need to verify its conditions. Both f and g are locally Lipschitz continuous by Theorem 3.13 and Lemma 2.11, respectively, and hence so is j. We have already verified the subconvexity of j in Corollary 26.11.

By Theorems 13.4 and 13.20 and Example 4.7, we again have 0 = x̄* + g′(x̄) for some x̄* ∈ ∂f(x̄) = sign x̄. It remains to compute the lower curvature model. Let Δx ∈ ℝ. By Theorems 26.1 and 26.4,

  Q̲_j(Δx; x̄ | x̄* + g′(x̄)) = Q̲_f(Δx; x̄ | x̄*) + ⟨g″(x̄)Δx, Δx⟩,

where Q̲_f is given by Lemma 26.3. It follows that

  Q̲_j(Δx; x̄ | x̄* + g′(x̄)) = { ∞               if x̄ = 0, Δx ≠ 0, sign Δx ≠ x̄*,
                               ⟨g″(x̄)Δx, Δx⟩  otherwise.

Hence the condition Q̲_j(Δx; x̄ | 0) ≥ 0 for all Δx ∈ ℝ reduces to g″(x̄) ≥ 0. ∎
Remark 26.16. Second-order optimality conditions can also be based on epigraphical derivatives, which were introduced in [Rockafellar 1985; Rockafellar 1988]; we refer to [Rockafellar & Wets 1998] for a detailed discussion. A related approach based on second-order directional curvature functionals was used in [Christof & Wachsmuth 2018] for deriving necessary and sufficient second-order optimality conditions for smooth optimization problems subject to nonsmooth and possibly nonconvex constraints.
27 LIPSCHITZ-LIKE PROPERTIES AND STABILITY
A related issue to second-order conditions is that of the stability of solutions to optimization problems under perturbations. To motivate the following, let f : X → ℝ and suppose we wish to find x̄ ∈ X such that 0 ∈ ∂f(x̄) for a suitable subdifferential. Suppose further that we are given some x ∈ X with w ∈ ∂f(x) and ‖w‖_{X*} ≤ ε — say, from one of the algorithms in Chapter 8. A natural question is then for an error estimate on ‖x − x̄‖_X in terms of ε. Clearly, if ∂f has a single-valued and Lipschitz continuous inverse, this is the case, since then

  ‖x − x̄‖_X = ‖(∂f)⁻¹(w) − (∂f)⁻¹(0)‖_X ≤ L‖w‖_{X*}.

Of course, the situation is much more complicated in the set-valued case. To treat this, we first have to define suitable notions of Lipschitz-like behavior of set-valued mappings, which we then characterize using coderivatives (generalizing the characterization of the Lipschitz constant of a differentiable single-valued mapping through the norm of its derivative). We return to the question of stability of minimizers in the more general context of perturbations of parametrized solution mappings.
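The single-valued motivating estimate can be made concrete in the simplest possible case. A minimal sketch (hypothetical example, not from the text): for f(x) = x²/2 we have ∂f(x) = {x}, so (∂f)⁻¹ is the identity, Lipschitz with L = 1, and the error bound ‖x − x̄‖ ≤ L‖w‖ can be checked directly:

```python
def subgradient(x):
    return x  # ∂f(x) = {x} for f(x) = x**2 / 2; single-valued here

L = 1.0           # Lipschitz constant of (∂f)^{-1} (the identity)
x_exact = 0.0     # the exact solution: 0 ∈ ∂f(0)
x_approx = 1e-3   # a hypothetical algorithmic iterate
w = subgradient(x_approx)   # w ∈ ∂f(x_approx)

# Error estimate: |x_approx - x_exact| <= L * |w - 0|
ok = abs(x_approx - x_exact) <= L * abs(w)
print(ok)  # prints True
```

For set-valued ∂f, no such single-valued Lipschitz inverse exists in general, which is what the Lipschitz-like properties below address.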
27.1 lipschitz-like properties of set-valued mappings
To set up the definition of Lipschitz-like properties for set-valued mappings, it is helpful to recall from Section 1.1 for single-valued functions the distinction between (point-based) local Lipschitz continuity at a point and (neighborhood-based) local Lipschitz continuity near a point. (Figure 27.2a below shows a function that is locally Lipschitz at but not near the given point.) Similarly, we will have to distinguish for set-valued mappings the corresponding notions of the Aubin property (which is neighborhood-based) and calmness (which is point-based). If these properties hold for the inverse of a mapping, we will call the mapping itself metrically regular and metrically subregular, respectively. These four properties are illustrated in Figure 27.1.
Recall also from Lemma 17.4 the definition of the distance of a point x ∈ X to a set A ⊂ X, which we here write for the sake of convenience as

  dist(A, x) := dist(x, A) := d_A(x) = inf_{x̃ ∈ A} ‖x − x̃‖_X.
[Figure 27.1 shows six panels: (a) locally Lipschitz f; (b) locally Lipschitz f⁻¹; (c) Aubin property of F; (d) metric regularity of F; (e) calmness of F; (f) metric subregularity of F.]

Figure 27.1: Illustration of Lipschitz-like properties using cones. The thick lines are the graph of the function; if this graph is locally contained in a filled cone, the property holds, while a cross-hatched cone indicates that the property is violated.
We then say that F : X ⇒ Y has the Aubin or pseudo-Lipschitz property at x̄ for ȳ if graph F is closed near (x̄, ȳ) and there exist δ, κ > 0 such that

(27.1) dist(y, F(x)) ≤ κ dist(F⁻¹(y), x)  (x ∈ 𝔹(x̄, δ), y ∈ 𝔹(ȳ, δ)).

We call the infimum of all κ > 0 for which (27.1) holds for some δ > 0 the graphical modulus of F at x̄ for ȳ, written lip F(x̄ | ȳ).

When we are interested in the stability of the optimality condition 0 ∈ F(x), it is typically more beneficial to study the Aubin property of the inverse F⁻¹. This is called the metric regularity of F at a point (x̄, ȳ) ∈ graph F, which holds if there exist κ, δ > 0 such that

(27.2) dist(x, F⁻¹(y)) ≤ κ dist(y, F(x))  (x ∈ 𝔹(x̄, δ), y ∈ 𝔹(ȳ, δ)).

We call the infimum of all κ > 0 for which (27.2) holds for some δ > 0 the modulus of metric regularity of F at x̄ for ȳ, written reg F(x̄ | ȳ).
The metric regularity and Aubin property are too strong to be satisfied in many applications. A weaker notion is provided by (metric) subregularity at (x̄, ȳ) ∈ graph F, which holds if there exist κ, δ > 0 such that

(27.3) dist(x, F⁻¹(ȳ)) ≤ κ dist(ȳ, F(x))  (x ∈ 𝔹(x̄, δ)).

Compared to metric regularity, this allows much more leeway for F by fixing y = ȳ ∈ F(x̄) (while still allowing x to vary). We call the infimum of all κ > 0 for which (27.3) holds for some δ > 0 the modulus of (metric) subregularity of F at x̄ for ȳ, written subreg F(x̄ | ȳ).

The counterpart of metric subregularity that relaxes the Aubin property is known as calmness. We say that F : X ⇒ Y is calm at x̄ for ȳ if there exist κ, δ > 0 such that

(27.4) dist(y, F(x̄)) ≤ κ dist(x̄, F⁻¹(y))  (y ∈ 𝔹(ȳ, δ)).

We call the infimum of all κ > 0 for which (27.4) holds for some δ > 0 the modulus of calmness of F at x̄ for ȳ, written calm F(x̄ | ȳ). Clearly the Aubin property implies calmness, while metric regularity implies metric subregularity.
Unfortunately, the direct calculation of the different moduli is often infeasible in practice. Much of the rest of this chapter concentrates on calculating the graphical modulus and the modulus of metric regularity in special cases. We will consider metric subregularity (as well as a related, weaker notion of strong submonotonicity) in the following Section 28.1.
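As a simple illustration of these definitions (a hypothetical example, not from the text), metric subregularity already fails for F(x) = {x²} at (x̄, ȳ) = (0, 0): the bound dist(x, F⁻¹(0)) ≤ κ dist(0, F(x)) would require |x| ≤ κx², which no fixed κ can satisfy near 0:

```python
# F(x) = {x**2}: F^{-1}(0) = {0}, so dist(x, F^{-1}(0)) = |x| and dist(0, F(x)) = x**2.
def dist_preimage(x): return abs(x)
def dist_image(x): return x * x

kappa = 100.0  # try any fixed candidate modulus
xs = [10.0 ** (-k) for k in range(1, 8)]  # points approaching x̄ = 0
# The subregularity bound |x| <= kappa * x**2 fails once |x| < 1/kappa:
fails = any(dist_preimage(x) > kappa * dist_image(x) for x in xs)
print(fails)  # prints True
```

The same computation with any larger kappa fails for correspondingly smaller |x|, so subreg F(0 | 0) = ∞ here.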
Remark 27.1. The Aubin property is due to [Aubin 1984], whereas metric subregularity is due to [Ioffe 1979], first given its modern name in [Dontchev & Rockafellar 2004]. Calmness was introduced in [Robinson 1981] as the upper Lipschitz property. Metric regularity is equivalent to openness at a linear rate near (x̄, ȳ) and holds for smooth maps by the classical Lyusternik–Graves theorem. We refer in particular to [Dontchev & Rockafellar 2014; Ioffe 2017] for further information on these and other related properties.

In particular, related to metric subregularity is the stronger concept of strong metric subregularity, which was introduced in [Rockafellar 1989] and requires the existence of κ, δ > 0 such that

  ‖x − x̄‖_X ≤ κ dist(ȳ, F(x))  (x ∈ 𝔹(x̄, δ)),

i.e., a bound on the norm distance to x̄ rather than to the closest preimage of ȳ. Its many properties are studied in [Cibulka, Dontchev & Kruger 2018], which also introduced q-exponent versions. Particularly worth noting is that strong metric subregularity is invariant with respect to perturbations by smooth functions, while metric subregularity is not.

Weaker and "partial" concepts of regularity have also been considered in the literature. Of particular note is the directional metric subregularity of [Gfrerer 2013]. The idea here is to study necessary optimality conditions by requiring metric regularity or subregularity only along critical directions instead of all directions. In [Valkonen 2021], by contrast, the norms in the definition of subregularity are made operator-relative to study partial subregularity on subspaces; compare the testing of algorithms for structured problems in Section 10.2.
[Figure 27.2 shows two panels: (a) an oscillating single-valued function; (b) the graph of x ↦ F(x̃) + 𝔹(0, ℓ‖x − x̃‖_X).]

Figure 27.2: The oscillating example in (a) illustrates a function f that is locally Lipschitz (or calm) at x̄, but not locally Lipschitz (or does not have the Aubin property) near the same point: the graph of the function stays in the cone formed by the thick lines and based at (x̄, f(x̄)) ∈ graph f. If, however, we move the cone locally along the graph, even increasing its width, the graph will not be contained in the cone. In (b) we illustrate the "fat cone" structure graph(x ↦ F(x̃) + 𝔹(0, ℓ‖x − x̃‖_X)) appearing on the right-hand side in Theorem 27.2 (i), varying with the second base point x̃ around x̄. This is to be contrasted with the leaner cone graph(x ↦ f(x̄) + 𝔹(0, ℓ‖x − x̄‖_X)) bounding the function in (a).
We now provide alternative characterizations of the Aubin property and of calmness. These extend to metric regularity and subregularity, respectively, by application to the inverse.

The right-hand side of the set-inclusion characterization (i) in the next theorem forms a "fat cone" that we illustrate in Figure 27.2b. For the Aubin property to be satisfied, this cone should locally bound F at each base point x̃ around x̄. Based on the formulation (i), we illustrate in Figure 27.3 the satisfaction and dissatisfaction of the Aubin property. The other two new characterizations show that the restriction of x̃ to a small neighborhood of x̄ is immaterial both in (i) and in the original characterization (27.1).
Theorem 27.2. Let X, Y be Banach spaces and F : X ⇒ Y. Then the following are equivalent for x̄ ∈ X and ȳ ∈ F(x̄):

(i) There exist κ, δ > 0 such that

  F(x̃) ∩ 𝔹(ȳ, δ) ⊂ F(x) + 𝔹(0, κ‖x − x̃‖_X)  (x, x̃ ∈ 𝔹(x̄, δ)).

(ii) There exist κ, δ > 0 such that

  F(x̃) ∩ 𝔹(ȳ, δ) ⊂ F(x) + 𝔹(0, κ‖x − x̃‖_X)  (x ∈ 𝔹(x̄, δ); x̃ ∈ X).

(iii) The Aubin property (27.1).

(iv) There exist κ, δ > 0 such that

  dist(y, F(x)) ≤ κ dist(F⁻¹(y) ∩ 𝔹(x̄, δ), x)  (x ∈ 𝔹(x̄, δ), y ∈ 𝔹(ȳ, δ)).
The infimum of κ > 0 for which each of these characterizations holds is equal to the graphical modulus lip F(x̄ | ȳ). (The radius of validity δ > 0 for any given κ > 0 may be distinct in each of the characterizations, however.)
Proof. (i) ⇔ (ii): Clearly (ii) implies (i) with the same κ, δ > 0. To show the implication in the other direction, we start by applying (i) with x̃ = x̄, which yields

  F(x̄) ∩ 𝔹(ȳ, δ) ⊂ F(x) + 𝔹(0, κ‖x − x̄‖_X)  (x ∈ 𝔹(x̄, δ)).

Taking x ∈ 𝔹(x̄, δ′) for some δ′ ∈ (0, δ], we thus deduce that

  ȳ ∈ F(x) + 𝔹(0, κ‖x − x̄‖_X) ⊂ F(x) + 𝔹(0, κδ′).

In particular, for any ε′ > 0, we have

(27.5) 𝔹(ȳ, ε′) ⊂ F(x) + 𝔹(0, κδ′ + ε′).

For x̃ ∈ 𝔹(x̄, δ), (ii) is immediate from (i), so we may concentrate on x̃ ∈ X \ 𝔹(x̄, δ). Then

  ‖x − x̃‖_X ≥ ‖x̃ − x̄‖_X − ‖x − x̄‖_X ≥ δ − δ′.

If we pick ε′, δ′ > 0 such that κδ′ + ε′ ≤ κ(δ − δ′), it follows that

  κδ′ + ε′ ≤ κ‖x − x̃‖_X.

Thus (27.5) gives, as illustrated in Figure 27.4,

  F(x̃) ∩ 𝔹(ȳ, ε′) ⊂ 𝔹(ȳ, ε′) ⊂ F(x) + 𝔹(0, κδ′ + ε′) ⊂ F(x) + κ𝔹(0, ‖x − x̃‖_X),

which is (ii).

(ii) ⇒ (iii): We expand (ii) as

  {ỹ} ∩ 𝔹(ȳ, δ) ⊂ F(x) + 𝔹(0, κ‖x − x̃‖_X)  (ỹ ∈ F(x̃); x ∈ 𝔹(x̄, δ); x̃ ∈ X).

By rearranging and taking the infimum over all y ∈ F(x), this yields

  inf_{y ∈ F(x)} ‖ỹ − y‖_Y ≤ κ‖x − x̃‖_X  (ỹ ∈ F(x̃) ∩ 𝔹(ȳ, δ); x ∈ 𝔹(x̄, δ); x̃ ∈ X).

This may further be rewritten as

  inf_{y ∈ F(x)} ‖ỹ − y‖_Y ≤ inf_{x̃ ∈ F⁻¹(ỹ)} κ‖x − x̃‖_X  (x ∈ 𝔹(x̄, δ); ỹ ∈ 𝔹(ȳ, δ)).

Thus (iii) is equivalent to (ii).

(iii) ⇒ (iv): This is immediate from the definition of dist, which yields

  dist(F⁻¹(y), x) ≤ dist(F⁻¹(y) ∩ 𝔹(x̄, δ), x).
[Figure 27.3 shows two panels: (a) the property is satisfied; (b) the property is not satisfied.]

Figure 27.3: Illustration of satisfaction and dissatisfaction of the Aubin property for x̃ = x̄. The dashed lines indicate 𝔹(ȳ, ε), and the dot marks (x̄, ȳ), while the dark gray thick lines indicate F(x) ∩ 𝔹(ȳ, ε). This set should remain within the bounds of the black thick lines indicating the "fat" cone F(x̃) + 𝔹(0, κ‖x − x̃‖_X). The violation of the bounds at the bottom in (a) does not matter, because we are only interested in the area between the dashed lines.
(iv) ⇒ (i): We express (iv) as

  inf_{ỹ ∈ F(x)} ‖y − ỹ‖_Y ≤ κ‖x − x̃‖_X  (x ∈ 𝔹(x̄, δ), y ∈ 𝔹(ȳ, δ), x̃ ∈ F⁻¹(y) ∩ 𝔹(x̄, δ)).

This can be rearranged to imply that

  {y} ⊂ F(x) + 𝔹(0, κ‖x − x̃‖_X)  (x ∈ 𝔹(x̄, δ), y ∈ 𝔹(ȳ, δ) ∩ F(x̃), x̃ ∈ 𝔹(x̄, δ)),

which can be further rewritten as

  F(x̃) ∩ 𝔹(ȳ, δ) ⊂ F(x) + 𝔹(0, κ‖x − x̃‖_X)  (x, x̃ ∈ 𝔹(x̄, δ)),

yielding (i). ∎
We have similar characterizations of calmness. The proof is analogous, simply fixing x = x̄.
Corollary 27.3. Let X, Y be Banach spaces and F : X ⇒ Y. Then the following are equivalent for x̄ ∈ X and ȳ ∈ F(x̄):

(i) There exist κ, δ > 0 such that

  F(x̃) ∩ 𝔹(ȳ, δ) ⊂ F(x̄) + 𝔹(0, κ‖x̄ − x̃‖_X)  (x̃ ∈ 𝔹(x̄, δ)).

(ii) There exist κ, δ > 0 such that

  F(x̃) ∩ 𝔹(ȳ, δ) ⊂ F(x̄) + 𝔹(0, κ‖x̄ − x̃‖_X)  (x̃ ∈ X).
[Figure 27.4 shows two panels: (a) an illustration of the proof technique; (b) the critical areas for the Aubin property.]

Figure 27.4: (a) Illustration of the technique in Theorem 27.2 to prove the equivalence of the two set-inclusion formulations of the Aubin property. For x̃ outside the ball 𝔹(x̄, δ), the set 𝔹(ȳ, ε′), indicated by the thick dark gray line, is completely contained in the fat-cone structure F(x) + κ𝔹(0, ‖x − x̃‖_X) of Figure 27.2b, indicated by the thick black and dotted lines. Closer to x̄, within 𝔹(x̄, δ), this is not the case, although F(x̃) ∩ 𝔹(ȳ, ε′) itself is still contained in the structure. (b) highlights in darker color the areas that are critical for the Aubin property to hold.
(iii) Calmness (27.4).

(iv) There exist κ, δ > 0 such that

  dist(y, F(x̄)) ≤ κ dist(F⁻¹(y) ∩ 𝔹(x̄, δ), x̄)  (y ∈ 𝔹(ȳ, δ)).

The infimum of κ > 0 for which each of these characterizations holds is equal to the modulus of calmness calm F(x̄ | ȳ). (The radius of validity δ > 0 for any given κ > 0 may be distinct in each of the characterizations, however.)
27.2 neighborhood-based coderivative criteria
Our goal is now to relate the Aubin property to "outer norms" of limiting coderivatives, just as the Lipschitz property of differentiable single-valued functions can be related to norms of their derivatives. Before embarking on this in the next section, as a preparatory step we relate in this section the Aubin property to neighborhood-based criteria on Fréchet coderivatives. To this end, we define for a set-valued mapping F : X ⇒ Y, (x̄, ȳ) ∈ graph F, and δ, ε > 0

(27.6) κ^ε_δ(x̄ | ȳ) := sup{ ‖x*‖_{X*} | x* ∈ D*_ε F(x | y)(y*), ‖y*‖_{Y*} ≤ 1, x ∈ 𝔹(x̄, δ), y ∈ F(x) ∩ 𝔹(ȳ, δ) },

which measures locally the opening of the cones N^ε_{graph F}(x, y) around (x̄, ȳ); for smooth functions and ε = 0, it coincides with the local supremum of ‖DF(x)‖_{𝕃(X;Y)} around (x̄, F(x̄)) (cf. Theorem 20.12). The next lemma bounds these openings in terms of the graphical modulus.
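For a smooth single-valued map, the coincidence with the derivative norm noted above can be verified by hand. A sketch (hypothetical example, not from the text): for F(x) = {2x}, the Aubin inequality (27.1) holds with constant κ = 2 = |F′(x)|, since the two distances are exactly proportional:

```python
def F(x): return 2.0 * x        # single-valued, F'(x) = 2 everywhere
def F_inv(y): return y / 2.0    # F^{-1}(y) = {y / 2}

pairs = [(x / 10.0, y / 10.0) for x in range(-5, 6) for y in range(-5, 6)]
# (27.1): dist(y, F(x)) = |y - 2x| and dist(F^{-1}(y), x) = |y/2 - x|,
# and |y - 2x| = 2 * |y/2 - x| exactly, so lip F = 2.
ok = all(abs(y - F(x)) <= 2.0 * abs(F_inv(y) - x) + 1e-12 for x, y in pairs)
print(ok)  # prints True
```

No smaller constant works (the ratio equals 2 whenever y ≠ 2x), matching the graphical modulus lip F = |F′| for this linear map.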
Lemma 27.4. Let X, Y be Banach spaces and F : X ⇒ Y. If graph F is closed near (x̄, ȳ), then

  inf_{δ>0} κ^δ_δ(x̄ | ȳ) ≤ inf_{δ>0} κ^0_δ(x̄ | ȳ) ≤ lip F(x̄ | ȳ).

Proof. Since D*F(x | y)(y*) ⊂ D*_ε F(x | y)(y*), we always have κ^0_δ(x̄ | ȳ) ≤ κ^δ_δ(x̄ | ȳ). It hence suffices to prove for any choice of ε(δ) ∈ [0, δ] that

  κ := inf_{δ>0} κ^{ε(δ)}_δ(x̄ | ȳ) ≤ lip F(x̄ | ȳ).

We may assume that lip F(x̄ | ȳ) < ∞, since otherwise there is nothing to prove. This implies in particular that the Aubin property holds, so the definition (27.1) yields for any κ′ > lip F(x̄ | ȳ) a δ′ > 0 such that

(27.7) inf_{ỹ ∈ F(x̃)} ‖ỹ − y‖_Y ≤ κ′‖x − x̃‖_X  (y ∈ F(x) ∩ 𝔹(ȳ, δ′); x, x̃ ∈ 𝔹(x̄, δ′)).

Pick κ̃ ∈ (0, κ) and δ ∈ (0, δ′). By the definition of κ^{ε(δ)}_δ(x̄ | ȳ), there exist x ∈ 𝔹(x̄, δ), y ∈ F(x) ∩ 𝔹(ȳ, δ), and (x*, −y*) ∈ N^{ε(δ)}_{graph F}(x, y) such that ‖x*‖_{X*} ≥ κ̃ and ‖y*‖_{Y*} ≤ 1. Theorem 1.4 then yields a Δx ∈ X such that

(27.8) ⟨x*, Δx⟩_X = ‖x*‖_{X*} and ‖Δx‖_X = 1.

Let t_n → 0 with t_n ≤ δ and set x_n := x + t_nΔx. Then taking x̃ = x_n in (27.7), we can take y_n ∈ F(x_n) such that

(27.9) liminf_{n→∞} t_n⁻¹‖y_n − y‖_Y ≤ κ′‖Δx‖_X.

In particular, after passing to a subsequence if necessary, we may assume that y_n → y strongly in Y. Using (27.8), ‖x*‖_{X*} ≥ κ̃, and ‖y*‖_{Y*} ≤ 1, this leads to

(27.10) limsup_{n→∞} t_n⁻¹(⟨x*, x_n − x⟩_X − ⟨y*, y_n − y⟩_Y)
        = limsup_{n→∞} (⟨x*, Δx⟩_X − t_n⁻¹⟨y*, y_n − y⟩_Y)
        ≥ (‖x*‖_{X*} − κ′)‖Δx‖_X ≥ κ̃ − κ′.

By (27.9) (for the chosen subsequence) and the construction of x_n, we have

(27.11) limsup_{n→∞} t_n⁻¹‖(x_n, y_n) − (x, y)‖_{X×Y} ≤ (1 + κ′)‖Δx‖_X = 1 + κ′.

Since (x*, −y*) ∈ N^{ε(δ)}_{graph F}(x, y), the defining equation (18.7) of N^{ε(δ)}_{graph F}(x, y) yields

(27.12) limsup_{n→∞} [⟨x*, x_n − x⟩_X − ⟨y*, y_n − y⟩_Y] / ‖(x_n, y_n) − (x, y)‖_{X×Y} ≤ ε(δ).

Therefore, (27.10), (27.11), and (27.12) together yield

  (1 + κ′)ε(δ) ≥ κ̃ − κ′.

Taking the infimum over δ > 0, it follows that κ̃ ≤ κ′. Since κ′ > lip F(x̄ | ȳ) and κ̃ < κ were arbitrary, we obtain κ ≤ lip F(x̄ | ȳ) as desired. ∎
For the next theorem, recall the denition of Gรขteaux smooth spaces from Section 17.2.
Theorem 27.5. Let ๐,๐ be Gรขteaux smooth Banach spaces and let ๐น : ๐ โ ๐ be such thatgraph ๐น is closed near (๐ฅ, ๐ฆ) โ ๐ ร๐ . Then ๐น has the Aubin property at ๐ฅ for ๐ฆ if and only if^๐ฟ๐ฟ(๐ฅ |๐ฆ) < โ or ^0
๐ฟ(๐ฅ |๐ฆ) < โ for some ๐ฟ > 0. Furthermore, in this case
(27.13) inf๐ฟ>0
^๐ฟ๐ฟ (๐ฅ |๐ฆ) = inf๐ฟ>0
^0๐ฟ (๐ฅ |๐ฆ) = lip ๐น (๐ฅ |๐ฆ).
Proof. By Lemma 27.4, it suffices to show that

$\kappa \coloneqq \inf_{\delta > 0} \kappa_\delta^\delta(\bar x|\bar y) \ge \operatorname{lip} F(\bar x|\bar y).$

We may assume that $\operatorname{lip} F(\bar x|\bar y) > 0$, as otherwise there is nothing to show. Our plan is now to take arbitrary $0 < \tilde\kappa < \operatorname{lip} F(\bar x|\bar y)$ and show that $\kappa \ge \tilde\kappa$. This implies $\kappa \ge \operatorname{lip} F(\bar x|\bar y)$ as desired.

To show that $\kappa \ge \tilde\kappa$, it suffices to show that $\kappa_\delta^\delta(\bar x|\bar y) \ge \tilde\kappa$ for all $\delta > 0$. To do this, for a parameter $t \searrow 0$, we take $\varepsilon_t \searrow 0$ and $(x_t, y_t) \to (\bar x, \bar y)$ as $t \searrow 0$ as well as $\varepsilon_t$-normals $(x^*_t, -y^*_t) \in \hat N^{\varepsilon_t}_{\operatorname{graph} F}(x_t, y_t)$ that satisfy $\liminf_{t \searrow 0} \|x^*_t\|_{X^*} \ge \tilde\kappa$ and $\limsup_{t \searrow 0} \|y^*_t\|_{Y^*} \le 1$. By the definition of $\kappa_\delta^\delta(\bar x|\bar y)$ in (27.6), taking for each $\delta > 0$ the index $t > 0$ such that $\max\{\varepsilon_t, \|x_t - \bar x\|_X, \|y_t - \bar y\|_Y\} \le \delta$, this shows as claimed that $\kappa_\delta^\delta(\bar x|\bar y) \ge \tilde\kappa$. The rough idea is to construct the $\varepsilon_t$-normals by projecting points not in $\operatorname{graph} F$ back onto this set. There are, however, some technical difficulties along the way. We divide the construction into three steps.
Step 1: setting up the projection problem. Let $0 < \tilde\kappa < \operatorname{lip} F(\bar x|\bar y)$. Since the Aubin property then does not hold for $\tilde\kappa$, by the characterization of Theorem 27.2 (iv) there exist

(27.14)  $\tilde y_t \in F(\tilde x_t) \cap \mathbb B(\bar y, t)$ and $x_t, \tilde x_t \in \mathbb B(\bar x, t)$ for all $t > 0$

such that

(27.15)  $\inf_{y_t \in F(x_t)} \|y_t - \tilde y_t\|_Y > \tilde\kappa \|x_t - \tilde x_t\|_X.$
Since $\inf_{\tilde y \in F(\tilde x_t)} \|\tilde y - \tilde y_t\|_Y = 0$, this implies that $x_t \ne \tilde x_t$ and $(x_t, \tilde y_t) \notin \operatorname{graph} F$. We want to locally project $(x_t, \tilde y_t)$ back onto $\operatorname{graph} F$. However, the nondifferentiability of the distance function $\|\,\cdot\, - \tilde y_t\|_Y$ at $\tilde y_t$ would cause difficulties, so, similarly to the proof of Lemma 18.13, we modify the projection by composing the norm with the "smoothing function"

(27.16)  $\varphi_\mu(s) \coloneqq \sqrt{\mu^2 + s^2} - \mu.$

By Theorems 4.5, 4.6 and 4.19 and the assumed differentiability of $\|\cdot\|_Y$ away from the origin, $\varphi_\mu(\|\cdot\|_Y)$ is convex and has a single-valued subdifferential mapping with elements of norm less than one. Hence this smoothed distance function is Gâteaux differentiable by Lemma 13.7. Due to (27.16), for every $t > 0$ and $\mu_t > 0$, we further have

(27.17)  $\|y - \tilde y_t\|_Y - \mu_t \le \varphi_{\mu_t}(\|y - \tilde y_t\|_Y) \le \|y - \tilde y_t\|_Y \quad (y \in Y).$

To locally project $(x_t, \tilde y_t)$ onto $\operatorname{graph} F$, we thus seek to minimize the function

(27.18)  $J_t(x, y) \coloneqq \delta_{C_t}(x, y) + \tilde\kappa\|x - x_t\|_X + \varphi_{\mu_t}(\|y - \tilde y_t\|_Y)$

for

$C_t \coloneqq [\mathbb B(\bar x, t + 2\tilde\kappa) \times \mathbb B(\bar y, t + 2\tilde\kappa)] \cap \operatorname{graph} F.$

Clearly, $J_t$ is bounded from below by $-\mu_t$ as well as coercive since $C_t$ is bounded. If $t$ is small enough, then $C_t$ is closed by the local closedness of $\operatorname{graph} F$. Therefore $J_t$ is lower semicontinuous (but not weakly lower semicontinuous, since $\operatorname{graph} F$ need not be convex).
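The bounds (27.17) rest only on the elementary scalar inequality $s - \mu \le \varphi_\mu(s) \le s$ for $s \ge 0$, together with the fact that $|\varphi_\mu'| < 1$. A quick numerical sanity check of these properties (a sketch, not part of the proof):

```python
import math

def phi(mu, s):
    # smoothing function from (27.16): phi_mu(s) = sqrt(mu^2 + s^2) - mu
    return math.sqrt(mu**2 + s**2) - mu

def dphi(mu, s):
    # derivative phi_mu'(s) = s / sqrt(mu^2 + s^2), strictly below 1 in magnitude
    return s / math.sqrt(mu**2 + s**2)

mu = 0.1
for i in range(200):
    s = 0.05 * i
    assert s - mu <= phi(mu, s) <= s   # the bounds (27.17)
    assert abs(dphi(mu, s)) < 1.0      # subdifferential elements of norm < 1
print("bounds verified")
```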
Step 2: finding approximate minimizers. We would like to find a minimizer of $J_t$, but the lack of weak lower semicontinuity prevents the use of Tonelli's direct method of Theorem 2.1. We therefore use Ekeland's variational principle (Theorem 2.14) to find an approximate minimizer. Towards this end, choose for every $t > 0$

(27.19)  $\mu_t \coloneqq t^{-1/2}\tilde\kappa\|x_t - \tilde x_t\|_X^2 \le \tilde\kappa t^{1/2}\|x_t - \tilde x_t\|_X$ and $\lambda_t \coloneqq \|x_t - \tilde x_t\|_X + t^{1/2} \le t + t^{1/2},$

where the inequalities hold due to (27.14). Then

(27.20)  $J_t(\tilde x_t, \tilde y_t) = \tilde\kappa\|\tilde x_t - x_t\|_X \le (\tilde\kappa\|\tilde x_t - x_t\|_X + \mu_t) + \inf J_t.$

Therefore, applying Theorem 2.14 for $\lambda = \lambda_t$ and

$\varepsilon = \tilde\kappa\|\tilde x_t - x_t\|_X + \mu_t = \tilde\kappa\|\tilde x_t - x_t\|_X\, t^{-1/2}\lambda_t = \frac{\mu_t \lambda_t}{\|\tilde x_t - x_t\|_X},$

we obtain for each $t > 0$ a strict minimizer $(\check x_t, \check y_t)$ of

(27.21a)  $\bar J_t(x, y) \coloneqq J_t(x, y) + \frac{\mu_t}{\|\tilde x_t - x_t\|_X}\big(\|x - \check x_t\|_X + \|y - \check y_t\|_Y\big)$

with

(27.21b)  $J_t(\check x_t, \check y_t) + \frac{\mu_t}{\|\tilde x_t - x_t\|_X}\big(\|\check x_t - \tilde x_t\|_X + \|\check y_t - \tilde y_t\|_Y\big) \le J_t(\tilde x_t, \tilde y_t) = \tilde\kappa\|\tilde x_t - x_t\|_X$
and

(27.21c)  $\|\check x_t - \tilde x_t\|_X + \|\check y_t - \tilde y_t\|_Y \le \lambda_t.$

We claim that $\check x_t \ne x_t$, which we show by contradiction. Assume therefore that $\check x_t = x_t$. Then $\check y_t \in F(x_t)$, and (27.17) yields

$J_t(\check x_t, \check y_t) = \varphi_{\mu_t}(\|\check y_t - \tilde y_t\|_Y) \ge \|\check y_t - \tilde y_t\|_Y - \mu_t.$

Thus by (27.20) and (27.21b),

$\|\check y_t - \tilde y_t\|_Y - \mu_t + \frac{\mu_t}{\|\tilde x_t - x_t\|_X}\big(\|\check x_t - \tilde x_t\|_X + \|\tilde y_t - \check y_t\|_Y\big) \le \tilde\kappa\|\tilde x_t - x_t\|_X,$

and hence, since $\|\check x_t - \tilde x_t\|_X = \|x_t - \tilde x_t\|_X$, also $\|\check y_t - \tilde y_t\|_Y \le \tilde\kappa\|\tilde x_t - x_t\|_X$.
But this contradicts (27.15), as $\check y_t \in F(x_t)$.

Step 3: constructing $\varepsilon$-normals. We are now ready to construct the desired $\varepsilon$-normals. We write

(27.22)  $\bar J_t(x, y) = \delta_{C_t}(x, y) + R(x, y)$

for the convex and Lipschitz continuous function

$R(x, y) \coloneqq \tilde\kappa\|x - x_t\|_X + \varphi_{\mu_t}(\|y - \tilde y_t\|_Y) + \frac{\mu_t}{\|\tilde x_t - x_t\|_X}\big(\|x - \check x_t\|_X + \|y - \check y_t\|_Y\big).$

Since we assume $X$ to be Gâteaux smooth, $x \mapsto \tilde\kappa\|x - x_t\|_X$ is Gâteaux differentiable at $\check x_t \ne x_t$. Furthermore, $y \mapsto \varphi_{\mu_t}(\|y - \tilde y_t\|_Y)$ is by construction Gâteaux differentiable for all $y$. By (27.19), we have $\frac{\mu_t}{\|\tilde x_t - x_t\|_X} \le t^{1/2}\tilde\kappa$. Since $\check x_t \ne x_t$, Theorems 4.6, 4.14 and 4.19 now yield

(27.23)  $\partial R(\check x_t, \check y_t) \subset \mathbb B\big((-x^*_t, y^*_t),\, t^{1/2}\tilde\kappa\big)$ for $\begin{cases} -x^*_t = \tilde\kappa\, D[\|\cdot\, - x_t\|_X](\check x_t), \\ \phantom{-}y^*_t = D[\varphi_{\mu_t}(\|\cdot\, - \tilde y_t\|_Y)](\check y_t). \end{cases}$

Since $\check x_t \ne x_t$, we have $\|x^*_t\|_{X^*} = \tilde\kappa$ by Theorem 4.6. Moreover, $\|y^*_t\|_{Y^*} \le 1$ as observed in Step 1. Theorem 16.2 further yields $0 \in \partial_F \bar J_t(\check x_t, \check y_t)$. Due to (27.22) and (27.23), Lemma 17.2 now shows that

$(x^*_t, -y^*_t) \in \hat N^{\varepsilon_t}_{C_t}(\check x_t, \check y_t), \quad\text{i.e.,}\quad x^*_t \in \hat D^*_{\varepsilon_t} F(\check x_t|\check y_t)(y^*_t) \quad\text{for } \varepsilon_t \coloneqq t^{1/2}\tilde\kappa.$

We illustrate this construction in Figure 27.5. Since $\lambda_t \le t + t^{1/2}$ by (27.19), it follows from (27.21c) that $\|\check x_t - \bar x\|_X, \|\check y_t - \bar y\|_Y \le 2t + t^{1/2}$ and hence that $(\check x_t, \check y_t) \to (\bar x, \bar y)$ as $t \searrow 0$. We also have both $\liminf_{t \searrow 0}\|x^*_t\|_{X^*} \ge \tilde\kappa$ and $\limsup_{t \searrow 0}\|y^*_t\|_{Y^*} \le 1$. Thus we have constructed the desired sequence of $\varepsilon_t$-normals. □
Figure 27.5: the construction in the final part of the proof of Theorem 27.5. The dotted arrow indicates how $\check y_t$ minimizes the distance to $\tilde y_t$ within $F(x_t)$, which ensures that $\|\check y_t - \tilde y_t\|_Y \ge \tilde\kappa d$ for $d \coloneqq \|x_t - \tilde x_t\|_X$. The point $(x_t, \tilde y_t)$ is outside $\operatorname{graph} F$; when projected back as $(\check x_t, \check y_t)$, the normal vector to $\operatorname{graph} F$ indicated by the solid arrow has $x$-component larger than the $y$-component $d_y$ by the factor $\tilde\kappa$. The dashed arrow indicates the convergence of the other points to $(\bar x, \bar y)$ as $t \searrow 0$.
Remark 27.6. Our proof of Theorem 27.5 differs from those in [Mordukhovich 2018; Mordukhovich 2006] by the specific construction of the point $(x_t, \tilde y_t) \notin \operatorname{graph} F$ and the use of the smoothed distance $\varphi_{\mu_t}(\|\cdot\|_Y)$. In contrast, the earlier proofs first translate the Aubin property (or metric regularity) into a covering or linear openness property to construct the point outside $\operatorname{graph} F$ that is to be projected back onto this set. In finite dimensions, [Mordukhovich 2018] develops calculus for the limiting subdifferential of Section 16.3 to avoid the lack of calculus for the Fréchet subdifferential; we instead apply the fuzzy calculus of Lemma 17.2 to the smoothed distance function $\varphi_{\mu_t}(\|\cdot\|_Y)$. A further alternative in finite dimensions involves the proximal subdifferentials used in [Rockafellar & Wets 1998]. In infinite dimensions, [Mordukhovich 2006] develops advanced extremal principles to work with the Fréchet subdifferential.
Remark 27.7 (relaxation of Gâteaux smoothness). The assumption that $Y$ (or, with somewhat more work, $X$) is Gâteaux smooth in Theorem 27.5 may be replaced with the assumption of the existence of a family $\{\theta_\mu \colon Y \to \mathbb R\}_{\mu > 0}$ of Gâteaux differentiable norm approximations satisfying

$\|y\|_Y - \mu \le \theta_\mu(y) \le \|y\|_Y \quad (y \in Y).$

Then (27.17) holds with $\theta_{\mu_t}(y - \tilde y_t)$ in place of $\varphi_{\mu_t}(\|y - \tilde y_t\|_Y)$. For example, with $\varphi_\mu$ as in (27.16), in $L^1(\Omega)$ we can set

$\theta_\mu(y) \coloneqq \|\varphi_\mu(|y(\xi)|)\|_{L^1(\Omega)} \quad (y \in L^1(\Omega)).$

With somewhat more effort, the Gâteaux smoothness of $X$ can be similarly relaxed.
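The $L^1$ construction above can be probed numerically on $\Omega = (0,1)$ by a midpoint discretization (a sketch with an assumed sample function; note the lower bound uses $|\Omega| \le 1$, since $\varphi_\mu(s) \ge s - \mu$ only integrates to $\|y\|_{L^1} - \mu|\Omega|$):

```python
import math

# discretize Omega = (0, 1) and approximate the L^1 norm and theta_mu by
# midpoint sums; y is an assumed sample function, not from the text
n, mu = 1000, 0.05
xs = [(i + 0.5) / n for i in range(n)]
y = [math.sin(6 * x) for x in xs]

def phi(s):
    # smoothing function (27.16)
    return math.sqrt(mu**2 + s**2) - mu

l1 = sum(abs(v) for v in y) / n           # approximates ||y||_{L^1(0,1)}
theta = sum(phi(abs(v)) for v in y) / n   # approximates theta_mu(y)
assert l1 - mu <= theta <= l1             # the required two-sided bound
print("theta_mu bounds verified")
```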
27.3 point-based coderivative criteria
We will now convert the neighborhood-based criterion of Lemma 27.4 and Theorem 27.5 into a simpler point-based criterion. For the statement, we need to introduce a new, smaller
Figure 27.6: points $(w, z)$ achieving the supremum in the expression of the outer norm $|H|_+$ for (a) a general set-valued mapping $H$ and (b) a mapping whose graph is a cone.
coderivative of $F\colon X \rightrightarrows Y$ at $\bar x$ for $\bar y$, the mixed (limiting) coderivative $D^*_M F(\bar x|\bar y)\colon Y^* \rightrightarrows X^*$,

(27.24)  $D^*_M F(\bar x|\bar y)(y^*) \coloneqq \operatorname*{w-*-limsup}_{(x, \tilde y) \to (\bar x, \bar y),\ \tilde y^* \to y^*,\ \varepsilon \searrow 0} \hat D^*_\varepsilon F(x|\tilde y)(\tilde y^*),$

which differs from the "normal" coderivative

(27.25)  $D^* F(\bar x|\bar y)(y^*) = \operatorname*{w-*-limsup}_{(x, \tilde y) \to (\bar x, \bar y),\ \tilde y^* \rightharpoonup^* y^*,\ \varepsilon \searrow 0} \hat D^*_\varepsilon F(x|\tilde y)(\tilde y^*)$

by the use of weak-* convergence in $X^*$ and strong convergence in $Y^*$ instead of weak-* convergence in both. (The mixed coderivative is not obtained directly from any of the usual normal cones, although one can naturally define corresponding mixed normal cones on product spaces.)

We further define for any $H\colon X \rightrightarrows Y$ the outer norm

$|H|_+ \coloneqq \sup\{\|z\|_Y \mid z \in H(w),\ \|w\|_X \le 1\}.$
We illustrate the outer norm with two examples in Figure 27.6. We are mainly interested in the outer norms of coderivatives, in particular of

(27.26)  $|D^*_M F(\bar x|\bar y)|_+ = \sup\{\|x^*\|_{X^*} \mid x^* \in D^*_M F(\bar x|\bar y)(y^*),\ \|y^*\|_{Y^*} \le 1\}.$

Recalling Theorem 18.5, we have

(27.27)  $\hat D^* F(\bar x|\bar y)(y^*) \subset D^*_M F(\bar x|\bar y)(y^*) \subset D^* F(\bar x|\bar y)(y^*),$

so the outer norms satisfy

$|D^*_M F(\bar x|\bar y)|_+ \le |D^* F(\bar x|\bar y)|_+.$
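In the simplest case of a single-valued linear mapping $H(w) = \{Aw\}$ between Euclidean spaces, the outer norm $|H|_+$ reduces to the operator (spectral) norm of $A$. The following sketch, with an assumed $2 \times 2$ example matrix, approximates the supremum in the definition by sampling the unit circle:

```python
import math

# assumed example matrix; |H|_+ for H(w) = {A w} is the spectral norm of A
A = [[2.0, 0.0],
     [0.0, 0.5]]

def apply(A, w):
    return [A[0][0]*w[0] + A[0][1]*w[1], A[1][0]*w[0] + A[1][1]*w[1]]

def norm(v):
    return math.hypot(v[0], v[1])

# outer norm: sup { ||z|| : z in H(w), ||w|| <= 1 }; for a linear map the
# supremum is attained on the unit sphere, which we sample densely
outer = max(norm(apply(A, (math.cos(t), math.sin(t))))
            for t in [2 * math.pi * k / 10000 for k in range(10000)])
print(round(outer, 6))  # close to 2.0, the largest singular value of A
```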
We say that $F$ is coderivatively normal at $\bar x$ for $\bar y$ if $|D^*_M F(\bar x|\bar y)|_+ = |D^* F(\bar x|\bar y)|_+$. Of course, if $X$ is finite-dimensional, then $D^*_M F(\bar x|\bar y) = D^* F(\bar x|\bar y)$, and thus $F$ is always coderivatively normal. Note that $|D^* F(\bar x|\bar y)|_+$ can be directly related to the neighborhood-based $\kappa_\delta^\delta$ defined in (27.6). In particular, it measures the opening of the cone $N_{\operatorname{graph} F}(\bar x, \bar y)$; compare Figure 27.6(b).

As the central result of this chapter, we now use this connection to derive a characterization of the Aubin property and the graphical modulus (and hence also of metric regularity and the modulus of metric regularity) through the outer norm of the mixed limiting coderivative. This Mordukhovich criterion generalizes the classical relation between the Lipschitz constant of a $C^1$ function and the norm of its derivative.
Lemma 27.8 (Mordukhovich criterion in general Banach spaces). Let $X, Y$ be Banach spaces and let $F\colon X \rightrightarrows Y$ be such that $\operatorname{graph} F$ is closed near $(\bar x, \bar y) \in X \times Y$. If $F$ has the Aubin property at $\bar x$ for $\bar y$, then

(27.28)  $D^*_M F(\bar x|\bar y)(0) = \{0\}$

and

(27.29)  $|D^*_M F(\bar x|\bar y)|_+ \le \operatorname{lip} F(\bar x|\bar y).$
Proof. As the first step, we show that the Aubin property implies (27.29) and hence that $\kappa \coloneqq |D^*_M F(\bar x|\bar y)|_+ < \infty$. Let $\rho > 0$. By the definition of $D^*_M F(\bar x|\bar y)$ in (27.24), there then exist $\delta \in (0, \rho)$, $x \in \mathbb B(\bar x, \rho)$, and $y \in F(x) \cap \mathbb B(\bar y, \rho)$ as well as $y^* \in Y^*$ and $x^* \in \hat D^*_\delta F(x|y)(y^*)$ such that $\|y^*\|_{Y^*} \le 1 + \rho$ and $\|x^*\|_{X^*} \ge \kappa(1 - \rho)^2$. (The upper bound on $\|y^*\|_{Y^*}$ is why we need the mixed coderivative, since $\|\cdot\|_{Y^*}$ is continuous only in the strong topology. For the lower bound on $\|x^*\|_{X^*}$, in contrast, the weak-* lower semicontinuity of $\|\cdot\|_{X^*}$ is sufficient.) Since $\hat D^*_\delta F(x|y)$ is formed from a cone, we may divide $x^*$ and $y^*$ by $1 + \rho$ and thus assume that $\|y^*\|_{Y^*} \le 1$ and $\|x^*\|_{X^*} \ge \kappa(1 - \rho)$. Consequently,

$\kappa(1 - \rho) \le \kappa_\delta^\delta(\bar x|\bar y) = \sup\{\|x^*\|_{X^*} \mid x^* \in \hat D^*_\delta F(x|y)(y^*),\ \|y^*\|_{Y^*} \le 1,\ x \in \mathbb B(\bar x, \delta),\ y \in F(x) \cap \mathbb B(\bar y, \delta)\}.$

Taking the infimum over $\delta > 0$ and letting $\rho \searrow 0$ thus shows

$\kappa \le \inf_{\delta > 0} \kappa_\delta^\delta(\bar x|\bar y).$

It now follows from Lemma 27.4 that $\kappa \le \operatorname{lip} F(\bar x|\bar y)$, which yields (27.29).
As the second step, we prove that the Aubin property implies (27.28). We argue by contraposition. First, note that since $\operatorname{graph} D^*_M F(\bar x|\bar y)$ is a cone, $0 \in D^*_M F(\bar x|\bar y)(0)$. Hence if (27.28) does not hold, there exists $x^* \in X^* \setminus \{0\}$ such that

$x^*[0, \infty) \subset D^*_M F(\bar x|\bar y)(0).$

By (27.26) and the first step, this implies that $\infty = \kappa \le \operatorname{lip} F(\bar x|\bar y)$ and hence that the Aubin property of $F$ at $\bar x$ for $\bar y$ is violated. □
Applied to $F^{-1}$, we obtain a corresponding result for metric regularity.

Corollary 27.9 (Mordukhovich criterion for metric regularity in general Banach spaces). Let $X, Y$ be Banach spaces and let $F\colon X \rightrightarrows Y$ be such that $\operatorname{graph} F$ is closed near $(\bar x, \bar y) \in X \times Y$. If $F$ is metrically regular at $(\bar x, \bar y)$, then

(27.30)  $0 \in D^*_M F(\bar x|\bar y)(y^*) \implies y^* = 0$

and

(27.31)  $|D^*_M F^{-1}(\bar y|\bar x)|_+ \le \operatorname{reg} F(\bar x|\bar y).$

Proof. We apply Lemma 27.8 to $F^{-1}$, observing that (27.28) applied to $F^{-1}$ is (27.30). □
Under stronger assumptions on the spaces and the set-valued mapping, we obtain equivalence. For the following theorem, recall the definition of partial sequential normal compactness (PSNC) from Section 25.2.

Theorem 27.10 (Mordukhovich criterion in smooth Banach spaces). Let $X, Y$ be Gâteaux smooth Banach spaces with $X$ reflexive and let $F\colon X \rightrightarrows Y$ be such that $\operatorname{graph} F$ is closed near $(\bar x, \bar y) \in X \times Y$. If $F$ is PSNC at $\bar x$ for $\bar y$, then the following are equivalent:

(i) the Aubin property of $F$ at $\bar x$ for $\bar y$;

(ii) the implication (27.28);

(iii) $|D^*_M F(\bar x|\bar y)|_+ < \infty$.
Proof. Due to Lemma 27.8, it suffices to show that (iii) ⇒ (ii) ⇒ (i). We start with the second implication. Since $X$ and $Y$ are Gâteaux smooth, Theorem 27.5 yields

(27.32)  $\operatorname{lip} F(\bar x|\bar y) = \tilde\kappa \coloneqq \inf_{\delta > 0} \sup\{\|x^*\|_{X^*} \mid x^* \in \hat D^*_\delta F(x|y)(y^*),\ \|y^*\|_{Y^*} \le 1,\ x \in \mathbb B(\bar x, \delta),\ y \in F(x) \cap \mathbb B(\bar y, \delta)\}$

and that the Aubin property holds if $\tilde\kappa < \infty$. We now argue by contradiction. Assume that the Aubin property does not hold. Then $\tilde\kappa = \infty$, and hence we can find $(x_i, y_i) \to (\bar x, \bar y)$, $\varepsilon_i \searrow 0$, and $x^*_i \in \hat D^*_{\varepsilon_i} F(x_i|y_i)(y^*_i)$ with $\|y^*_i\|_{Y^*} \le 1$ and $\|x^*_i\|_{X^*} \to \infty$. In particular, $y^*_i/\|x^*_i\|_{X^*} \to 0$. Since $X$ is reflexive, we can apply the Eberlein–Šmulyan Theorem 1.9 to extract a subsequence (not relabelled) such that $x^*_i/\|x^*_i\|_{X^*} \rightharpoonup^* x^*$ for some $x^* \in X^*$. Since $\operatorname{graph} \hat D^*_{\varepsilon_i} F(x_i|y_i)$ is a cone, we also have

$x^*_i/\|x^*_i\|_{X^*} \in \hat D^*_{\varepsilon_i} F(x_i|y_i)(y^*_i/\|x^*_i\|_{X^*}).$

By the definition (27.24) of the mixed coderivative, we deduce that $x^* \in D^*_M F(\bar x|\bar y)(0)$. We now make a case distinction: if $x^* \ne 0$, then this contradicts the qualification condition (27.28). On the other hand, if $x^* = 0$, then the PSNC of $F$ at $\bar x$ for $\bar y$ implies that $1 = \|x^*_i/\|x^*_i\|_{X^*}\|_{X^*} \to 0$, which is also a contradiction. Therefore (27.28) implies the Aubin property.

It remains to show that (iii) ⇒ (ii). First, since $\operatorname{graph} D^*_M F(\bar x|\bar y)$ is a cone, $D^*_M F(\bar x|\bar y)(0)$ is a cone as well. Hence by (27.26), $|D^*_M F(\bar x|\bar y)|_+ < \infty$ implies that $D^*_M F(\bar x|\bar y)(0) = \{0\}$, which is (27.28). □
Again, applying Theorem 27.10 to $F^{-1}$ yields a characterization of metric regularity.

Corollary 27.11 (Mordukhovich criterion for metric regularity in smooth Banach spaces). Let $X, Y$ be Gâteaux smooth Banach spaces with $Y$ reflexive and let $F\colon X \rightrightarrows Y$ be such that $\operatorname{graph} F$ is closed near $(\bar x, \bar y) \in X \times Y$. If $F^{-1}$ is PSNC at $\bar y$ for $\bar x$, then the following are equivalent:

(i) the metric regularity of $F$ at $(\bar x, \bar y)$;

(ii) the implication (27.30);

(iii) $|D^*_M F^{-1}(\bar y|\bar x)|_+ < \infty$.

Remark 27.12 (separable and Asplund spaces). The reflexivity of $X$ (resp. $Y$) was used to obtain the weak-* compactness of the unit ball in $X^*$ via the Eberlein–Šmulyan Theorem 1.9 applied to $X^*$. Alternatively, this can be obtained by assuming separability of $X$ and using the Banach–Alaoglu Theorem 1.11. More generally, dual spaces of Asplund spaces have weak-*-compact unit balls; we refer to [Mordukhovich 2006] for the full theory in Asplund spaces.
In finite dimensions, we have a full characterization of the graphical modulus via the outer norm of the limiting coderivative (which here coincides with the mixed coderivative).

Corollary 27.13 (Mordukhovich criterion for the graphical modulus in finite dimensions). Let $X, Y$ be finite-dimensional Gâteaux smooth Banach spaces and let $F\colon X \rightrightarrows Y$ be such that $\operatorname{graph} F$ is closed near $(\bar x, \bar y) \in X \times Y$. Then

$\operatorname{lip} F(\bar x|\bar y) = |D^* F(\bar x|\bar y)|_+.$

Proof. Due to Lemma 27.8, we only have to show that

(27.33)  $\operatorname{lip} F(\bar x|\bar y) \le |D^* F(\bar x|\bar y)|_+.$

As in the proof of Theorem 27.10, the smoothness of $X$ and $Y$ allows applying Theorem 27.5 to obtain that $\operatorname{lip} F(\bar x|\bar y) = \tilde\kappa$ given by (27.32). It therefore suffices to show that $\tilde\kappa \le |D^* F(\bar x|\bar y)|_+$. Let $\kappa' < \tilde\kappa$ be arbitrary. By (27.32), we can then find $(x_i, y_i) \to (\bar x, \bar y)$ and
Figure 27.7: illustration of $|D^* F(\bar x|\bar y)|_+ = \sup\{\|x^*\|_{X^*} \mid (x^*, -y^*) \in N_{\operatorname{graph} F}(\bar x, \bar y),\ \|y^*\|_{Y^*} \le 1\}$, where the arrows denote the directions contained in the normal cone. In (a), $-y^* \in [0, \infty)$ but $x^* = 0$, hence $|D^* F(\bar x|\bar y)|_+ = 0$ and the Aubin property is satisfied. In (b), we can take for $y^* = 0$ any $x^* \in (-\infty, 0]$, hence $|D^* F(\bar x|\bar y)|_+ = \infty$ and the Aubin property is not satisfied.
$\varepsilon_i \searrow 0$ as well as $x^*_i \in \hat D^*_{\varepsilon_i} F(x_i|y_i)(y^*_i)$ with $\|y^*_i\|_{Y^*} \le 1$ and $\tilde\kappa \ge \|x^*_i\|_{X^*} \ge \kappa'$. Since $X$ and $Y$ are finite-dimensional, we can apply the Heine–Borel theorem to extract strongly converging subsequences (not relabelled) such that $x^*_i \to x^*$ with $\|x^*\|_{X^*} \ge \kappa'$ and $y^*_i \to y^*$ with $\|y^*\|_{Y^*} \le 1$. Since strongly converging sequences also converge weakly-*, the expression (27.25) for the normal coderivative implies that $x^* \in D^* F(\bar x|\bar y)(y^*)$ and that $|D^* F(\bar x|\bar y)|_+ \ge \|x^*\|_{X^*} \ge \kappa'$. Since $\kappa' < \tilde\kappa$ was arbitrary, we obtain (27.33). □
We illustrate in Figure 27.7 how the outer norm of the coderivative relates to the Aubin property.
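For a continuously differentiable single-valued $F$, Corollary 27.13 recovers the classical fact that the graphical modulus equals the norm of the derivative, since then $D^* F(\bar x|F(\bar x))(y^*) = \{F'(\bar x)^* y^*\}$ (cf. (27.34) below). A numerical sketch with the assumed example $F(x) = x^2$ at $\bar x = 1$, estimating the local Lipschitz modulus by sampling difference quotients:

```python
# assumed example: F(x) = x^2 at xbar = 1, where F'(xbar) = 2,
# so Corollary 27.13 predicts lip F(xbar|F(xbar)) = |D*F|_+ = 2
def F(x):
    return x * x

xbar, delta = 1.0, 1e-3
# estimate lip F(xbar|ybar) as the largest difference quotient near xbar
samples = [xbar + delta * (k / 50.0 - 1.0) for k in range(101)]
lip_est = max(abs(F(a) - F(b)) / abs(a - b)
              for a in samples for b in samples if a != b)
print(round(lip_est, 3))  # approximately 2.0 = |F'(xbar)|
```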
Corollary 27.14 (Mordukhovich criterion for the modulus of metric regularity in finite dimensions). Let $X, Y$ be finite-dimensional Gâteaux smooth Banach spaces and let $F\colon X \rightrightarrows Y$ be such that $\operatorname{graph} F$ is closed near $(\bar x, \bar y) \in X \times Y$. Then

$\operatorname{reg} F(\bar x|\bar y) = |D^* F(\bar x|\bar y)^{-1}|_+.$

Proof. By Lemma 20.5, we have

$|D^* F^{-1}(\bar y|\bar x)|_+ = \sup\{\|y^*\|_{Y^*} \mid -y^* \in D^* F^{-1}(\bar y|\bar x)(-x^*),\ \|x^*\|_{X^*} \le 1\} = \sup\{\|y^*\|_{Y^*} \mid x^* \in D^* F(\bar x|\bar y)(y^*),\ \|x^*\|_{X^*} \le 1\} = |[D^* F(\bar x|\bar y)]^{-1}|_+.$

The claim now follows by applying Corollary 27.13 to $F^{-1}$ together with Corollary 27.9. □
Remark 27.15. Derivative-based characterizations of calmness and metric subregularity are significantly more involved than those of the Aubin property and metric regularity discussed above. We refer to [Henrion, Jourani & Outrata 2002; Zheng & Ng 2010; Gfrerer 2011; Gfrerer & Outrata 2016] for a few characterizations in special cases.
To close this section, we relate the Mordukhovich criterion to the classical inverse function theorem (Theorem 2.8).

Corollary 27.16 (inverse function theorem). Let $X, Y$ be reflexive and Gâteaux smooth Banach spaces and let $F\colon X \to Y$ be continuously differentiable around $\bar x \in X$. If $F'(\bar x)^* \in \mathbb L(Y^*; X^*)$ has a left-inverse $F'(\bar x)^{*\dagger} \in \mathbb L(X^*; Y^*)$, then there exist $\kappa > 0$ and $\delta > 0$ such that for all $y \in \mathbb B(F(\bar x), \delta)$ there exists a single-valued selection $J(y) \in F^{-1}(y)$ with

$\|\bar x - J(y)\|_X \le \kappa\|F(\bar x) - y\|_Y.$

Proof. Let $\bar y \coloneqq F(\bar x)$. By Theorem 20.12 and the reflexivity of $X$ and $Y$,

(27.34)  $D^* F(\bar x|\bar y) = \hat D^* F(\bar x|\bar y) = \{F'(\bar x)^*\}.$

We have both $\hat D^* F^{-1}(\bar y|\bar x) = [\hat D^* F(\bar x|\bar y)]^{-1}$ and $D^* F^{-1}(\bar y|\bar x) = [D^* F(\bar x|\bar y)]^{-1}$ by Lemma 20.5. Due to (27.27), this then implies that $D^*_M F^{-1}(\bar y|\bar x) \subset D^* F^{-1}(\bar y|\bar x) = [D^* F(\bar x|\bar y)]^{-1}$. The existence of a left-inverse implies that $F'(\bar x)^*$ is injective, which together with (27.34) yields (27.30).

By the continuity of $F$, $\operatorname{graph} F^{-1}$ is closed near $(\bar y, \bar x)$. By Lemma 25.6, $F^{-1}$ is PSNC at $\bar y$ for $\bar x$. Consequently, Corollary 27.11 shows that $F$ is metrically regular at $\bar x$ for $\bar y$. By the definition (27.2) of metric regularity, for any $\tilde\kappa > \operatorname{reg} F(\bar x|\bar y)$ there thus exists $\delta > 0$ such that

$\inf_{\tilde x \in F^{-1}(y)} \|\tilde x - x\|_X \le \tilde\kappa\|F(x) - y\|_Y \quad (x \in \mathbb B(\bar x, \delta),\ y \in \mathbb B(\bar y, \delta)).$

Taking in particular $x = \bar x$ yields

$\inf_{\tilde x \in F^{-1}(y)} \|\tilde x - \bar x\|_X \le \tilde\kappa\|F(\bar x) - y\|_Y \quad (y \in \mathbb B(F(\bar x), \delta)).$

Although the infimum might not be attained, this implies that we can take arbitrary $\kappa > \tilde\kappa$ to obtain for any $y \in \mathbb B(F(\bar x), \delta)$ the existence of some $J(y) \in F^{-1}(y)$ satisfying $\|\bar x - J(y)\|_X \le \kappa\|F(\bar x) - y\|_Y$, which is the claim. □
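The conclusion of Corollary 27.16 is easy to observe numerically (a sketch with the assumed scalar example $F(x) = x^3 + x$, $\bar x = 0$, where $F'(\bar x) = 1$ is invertible): solving $F(x) = y$ for $y$ near $F(\bar x) = 0$ by bisection yields a selection $J(y)$ with $|\bar x - J(y)| \le \kappa\,|F(\bar x) - y|$ for any $\kappa$ slightly above $1/|F'(\bar x)| = 1$:

```python
# assumed example: F(x) = x^3 + x, xbar = 0, F(xbar) = 0, F'(xbar) = 1
def F(x):
    return x**3 + x

def J(y, lo=-2.0, hi=2.0):
    # single-valued selection of F^{-1} near xbar via bisection
    # (this F is strictly increasing, so the root is unique)
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if F(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

kappa, delta = 1.01, 0.05   # kappa slightly above 1/|F'(xbar)| = 1
for k in range(-10, 11):
    y = delta * k / 10.0
    assert abs(0.0 - J(y)) <= kappa * abs(0.0 - y) + 1e-9
print("inverse function estimate holds")
```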
27.4 stability with respect to perturbations

We now return to the question of stability, where we are interested more generally in the effect of perturbations of an optimization problem on its solution. In the context of the introductory example (P), this could be the data $z$, the penalty parameter $\alpha$, the constraint set $C \subset X$, or other parameters in the model. More generally, let $X, P$ be Banach spaces and $f\colon X \times P \to \overline{\mathbb R}$. We then consider for some parameter $p \in P$ the optimization problem

(27.35)  $\min_{x \in X} f(x; p)$

and study how a minimizer (or critical point) $\bar x \in X$ depends on changes in $p$. For this purpose, we introduce the set-valued solution mapping (or, if $x \mapsto f(x; p)$ is not convex, critical point map)

(27.36)  $S\colon P \rightrightarrows X, \quad S(p) \coloneqq \{x \in X \mid 0 \in \partial_x f(x; p)\},$

where $\partial_x$ is a suitable (convex, Clarke) subdifferential with respect to $x$ for fixed $p$. We apply the concepts from Section 27.1 to this problem. Specifically, if $S$ has the Aubin property at $\bar p$ for $\bar x$, then we can take $y = \bar y$ in (27.1) to obtain

$\inf_{x \in S(p)} \|x - \bar x\|_X \le \kappa\|p - \bar p\|_P \quad (p \in \mathbb B(\bar p, \delta))$

for some $\delta, \kappa > 0$. In other words, the Aubin property of the solution map $S$ at $\bar p$ for $\bar x$ implies the local Lipschitz stability of solutions $x \in S(p)$ under perturbations $p$ around the parameter $\bar p$. The question now is when a solution mapping has the Aubin property.
We start with a simple special case. Returning to the motivation at the beginning of this chapter, $w \in \partial f(x)$ is of course equivalent to $0 \in \partial f(x) - \{w\} = \partial(f - \langle w, \cdot\rangle_X)(x)$, since continuous linear mappings are differentiable. Such a perturbation of $f$ is called a tilt perturbation, with $w \in X^*$ called the tilt parameter.

To make this more precise, let $f\colon X \to \mathbb R$ be locally Lipschitz. For a tilt parameter $p \in X^*$, we then define

(27.37)  $f(x; p) = f(x) - \langle p, x\rangle_X$

and refer to the stability of minimizers (or critical points) of $f$ with respect to $p$ as tilt stability. By Theorems 13.4 and 13.20, the solution mapping for $f$ is

$S(p) = \{x \in X \mid p \in \partial_C f(x)\} = (\partial_C f)^{-1}(p),$

which thus has the Aubin property (and $f$ is tilt-stable) if and only if $\partial_C f$ is metrically regular at $\bar x$ for $0$, i.e., by (27.2), that there exist $\kappa, \delta > 0$ such that

(27.38)  $\operatorname{dist}(x, (\partial_C f)^{-1}(x^*)) \le \kappa \operatorname{dist}(\partial_C f(x), x^*) \quad (x^* \in \mathbb B(0, \delta);\ x \in \mathbb B(\bar x, \delta)).$
We illustrate this with two examples. The first concerns data stability of least squares fitting, which in Hilbert spaces can be formulated as tilt stability.

Example 27.17 (data stability of least squares fitting). Let $X, Y$ be Hilbert spaces and $f(x) = \frac12\|Ax - b\|_Y^2$ for some $A \in \mathbb L(X; Y)$ and $b \in Y$. Taking $p = A^*\Delta b$ for some $\Delta b \in Y$, we can write this in the form of (27.37) via

$f(x; p) = f(x) - \langle A^*\Delta b, x\rangle_X = \frac12\|Ax - (b + \Delta b)\|_Y^2 - \frac12\|\Delta b\|_Y^2 - \langle \Delta b, b\rangle_Y.$

Data stability thus follows from the metric regularity of $\partial f$ at a minimizer $\bar x$ of the convex functional $f$. We have $\partial_C f(x) = \{A^*(Ax - b)\}$, so

$(\partial f)^{-1}(x^*) = \{\tilde x \mid A^*A\tilde x = A^*b + x^*\}.$

Therefore (27.38) is equivalent to

$\inf_{\tilde x}\{\|x - \tilde x\|_X \mid A^*A\tilde x = A^*b + x^*\} \le \kappa\|A^*Ax - (A^*b + x^*)\|_X \quad (x^* \in \mathbb B(0, \delta);\ x \in \mathbb B(\bar x, \delta)).$

If $A^*A$ has a bounded inverse $(A^*A)^{-1} \in \mathbb L(X; X)$, then we can take $\kappa = \|(A^*A)^{-1}\|_{\mathbb L(X;X)}$ for any $\delta > 0$. On the other hand, if $A^*A$ is not surjective, then there cannot be metric regularity (simply take an appropriate choice of $x^*$ outside $\operatorname{ran} A^*A$).
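The stability bound of Example 27.17 can be checked numerically in a small finite-dimensional instance (assumed data, not from the text): with invertible $A^*A$, the minimizer is $x(b) = (A^*A)^{-1}A^*b$, so perturbing $b$ by $\Delta b$ moves it by at most $\|(A^*A)^{-1}\|\,\|A^*\Delta b\|$:

```python
# assumed 2x2 example: A = diag(1, 2), b = (1, 1), tilt p = A^T db;
# the minimizer of (1/2)||Ax - b||^2 is x(b) = (A^T A)^{-1} A^T b
a1, a2 = 1.0, 2.0            # diagonal entries of A
b = (1.0, 1.0)
db = (0.05, -0.02)           # data perturbation Delta b

def solve(b):
    # for diagonal A, (A^T A)^{-1} A^T b has components b_i / a_i
    return (b[0] / a1, b[1] / a2)

x0 = solve(b)
x1 = solve((b[0] + db[0], b[1] + db[1]))
move = max(abs(x1[0] - x0[0]), abs(x1[1] - x0[1]))    # max-norm surrogate

kappa = 1.0 / min(a1 * a1, a2 * a2)           # ||(A^T A)^{-1}|| for diagonal A
tilt = max(abs(a1 * db[0]), abs(a2 * db[1]))  # max-norm surrogate for ||A^T db||
assert move <= kappa * tilt + 1e-12
print("stability estimate holds")
```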
For a genuinely nonsmooth example, we consider the (academic) problem of minimizing the (non-squared) norm on a Hilbert space.

Example 27.18 (tilt stability of least norm fitting). Let $X$ be a Hilbert space and $f(x) = \|x - z\|_X$ for some $z \in X$. To show tilt stability, we have to verify (27.38) for some $\kappa, \delta > 0$. For $x \ne z$, we have $\partial f(x) = \{(x - z)/\|x - z\|_X\}$, and for $x = z$, we have $\partial f(x) = \mathbb B(0, 1)$. Thus (27.38) reads

(27.39)  $\operatorname{dist}(x, (\partial f)^{-1}(x^*)) \le \kappa \begin{cases} \big\|\frac{x - z}{\|x - z\|_X} - x^*\big\|_X & \text{if } x \ne z, \\ \operatorname{dist}(x^*, \mathbb B(0, 1)) & \text{if } x = z \end{cases}$

for all $x^* \in \mathbb B(0, \delta)$ and $x \in \mathbb B(\bar x, \delta)$, where

$\operatorname{dist}(x, (\partial f)^{-1}(x^*)) = \begin{cases} \operatorname{dist}(x - z, x^*[0, \infty)) & \text{if } \|x^*\|_X = 1, \\ \|x - z\|_X & \text{if } \|x^*\|_X < 1, \\ \infty & \text{if } \|x^*\|_X > 1. \end{cases}$

As the inequality cannot hold if $\|x^*\|_X > 1$, we take $\delta \in (0, 1]$ to ensure that this does not happen. If $x = z$, then (27.39) trivially holds for any $\kappa > 0$, both sides being zero. For $x^* \in \mathbb B(0, \delta)$ and $x \in \mathbb B(\bar x, \delta) \setminus \{z\}$, the inequality (27.39) reads

$\kappa\Big\|\frac{x - z}{\|x - z\|_X} - x^*\Big\|_X \ge \begin{cases} \operatorname{dist}(x - z, x^*[0, \infty)) & \text{if } \|x^*\|_X = 1, \\ \|x - z\|_X & \text{if } \|x^*\|_X < 1. \end{cases}$

Choosing $x^* = \lambda(x - z)/\|x - z\|_X$ and letting $\lambda \to 1$, we see that the inequality cannot hold unless $\delta \in (0, 1)$ (which prevents $\lambda \to 1$). Thus, taking the infimum of the left-hand side over $\|x^*\|_X \le \delta < 1$ and the supremum of the right-hand side over $x \in \mathbb B(\bar x, \delta)$, the inequality holds if $\kappa(1 - \delta) \ge \delta$. This can be satisfied for any $\kappa > 0$ for sufficiently small $\delta \in (0, 1)$.

Since $x^* \in X$ is comparable to the tilt parameter $p \in X$, this says that we can only stably "tilt" $f$ by an amount $\|p\|_X < 1$. If we tilt with $\|p\|_X > 1$, the tilted function has no minimizer, while for $\|p\|_X = 1$, every $x = z + tp$ for $t \ge 0$ is a minimizer.
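The dichotomy at the end of Example 27.18 is easy to observe numerically in one dimension (an illustrative sketch with the assumed choice $z = 0$): the tilted function $x \mapsto |x| - px$ has the minimizer $x = 0$ for $|p| < 1$ but is unbounded below for $|p| > 1$:

```python
# tilted least-norm functional in 1D with z = 0: f(x; p) = |x| - p*x
def tilted(x, p):
    return abs(x) - p * x

# |p| < 1: x = 0 minimizes, since |x| - p*x >= (1 - |p|)*|x| >= 0
p = 0.9
assert all(tilted(0.1 * k, p) >= tilted(0.0, p) for k in range(-50, 51))

# |p| > 1: unbounded below along x = t*p as t grows
p = 1.1
values = [tilted(t * p, p) for t in (1.0, 10.0, 100.0)]
assert values[0] > values[1] > values[2]
print("tilt dichotomy verified")
```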
We now return to the general solution mapping (27.36). The following theorem, applied to $F(x, p) \coloneqq \partial_x f(x; p)$, provides a general tool for our analysis.

Theorem 27.19. Let $X$, $Y$, and $P$ be reflexive and Gâteaux smooth Banach spaces. For $F\colon X \times P \rightrightarrows Y$, let

$S(p) \coloneqq \{x \in X \mid 0 \in F(x, p)\}.$

Then $S$ has the Aubin property at $\bar p$ for $\bar x \in S(\bar p)$ if

(27.40)  $(0, p^*) \in D^*_M F(\bar x, \bar p|0)(y^*) \implies y^* = 0,\ p^* = 0$

and

$Q(y, p) \coloneqq \{x \in X \mid y \in F(x, p)\}$

is PSNC at $(0, \bar p)$ for $\bar x$.

Proof. We have $S(p) = Q(0, p)$. Hence if we can show that $Q$ has the Aubin property at $(0, \bar p)$ for $\bar x$, this will imply the Aubin property of $S$ at $\bar p$ for $\bar x$ by simple restriction of the free variables in Theorem 27.2 (i) to the subspace $\{0\} \times P$. We do this by applying Theorem 27.10 to $Q$, which holds if we can show that

$D^*_M Q(0, \bar p|\bar x)(0) = \{0\}.$

By (27.27), a sufficient assumption for this is that

$D^* Q(0, \bar p|\bar x)(0) = \{0\},$

which can equivalently be expressed as

(27.41)  $(y^*, p^*, 0) \in N_{\operatorname{graph} Q}(0, \bar p, \bar x) \implies y^* = 0,\ p^* = 0.$

Now

$\operatorname{graph} Q = \{(y, p, x) \mid y \in F(x, p)\} = \pi \operatorname{graph} F$

for the permutation $\pi(x, p, y) \coloneqq (y, p, x)$ (which, applied to a set, should be understood as applied to every element of that set). We thus also have

$N_{\operatorname{graph} Q}(y, p, x) = \pi N_{\operatorname{graph} F}(\pi(y, p, x)).$

In particular, (27.41) becomes

$(0, p^*, y^*) \in N_{\operatorname{graph} F}(\bar x, \bar p, 0) \implies y^* = 0,\ p^* = 0.$

But this is equivalent to (27.40). □
Remark 27.20. Theorem 27.19 is related to the classical implicit function theorem. If $F$ is graphically regular at $(\bar x, \bar p, 0)$, it is also possible to derive formulas for $DS$, such as

$DS(\bar p|\bar x)(\Delta p) = \{\Delta x \in X \mid DF(\bar x, \bar p|0)(\Delta x, \Delta p) \ni 0\}.$

For details in finite dimensions, we refer to [Rockafellar & Wets 1998, Theorem 9.56 & Proposition 8.41].
We close this chapter by illustrating the requirements of Theorem 27.19 for the stability of specific problems of the form (P) with respect to the penalty parameter $\alpha$. (Naturally, these can be relaxed or made further explicit in more concrete situations.)

Example 27.21. Let $X$ be a finite-dimensional and Gâteaux smooth Banach space and let $\ell\colon X \to \mathbb R$ be twice continuously differentiable and $g\colon X \to \overline{\mathbb R}$ be convex, proper, and lower semicontinuous. We then consider for $\alpha > 0$ the problem

(27.42)  $\min_x \ell(x) + \alpha g(x).$

By Theorems 13.4, 13.5 and 13.20, the solution mapping with respect to the parameter $\alpha$ is then

$S(\alpha) \coloneqq \{x \in X \mid 0 \in F(x; \alpha)\} \quad\text{for}\quad F(x; \alpha) \coloneqq \ell'(x) + \alpha\, \partial g(x).$

To apply Theorem 27.19 to obtain the Aubin property of $S$ at $\bar\alpha$ for $\bar x \in S(\bar\alpha)$, we need to verify its assumptions. First, by Theorems 25.14 and 25.20, we have

$D^* F(\bar x; \bar\alpha|0)(y) = \begin{pmatrix} \ell''(\bar x)^* y + \bar\alpha\, D^*[\partial g](\bar x|{-\bar\alpha^{-1}\ell'(\bar x)})(y) \\ -\langle \ell'(\bar x), y\rangle_X \end{pmatrix}.$

Thus (27.40) holds if

(27.43)  $0 \in \ell''(\bar x)^* y + \bar\alpha\, D^*[\partial g](\bar x|{-\bar\alpha^{-1}\ell'(\bar x)})(y) \implies y = 0.$

Furthermore, since $X^* \times \mathbb R$ is finite-dimensional, the PSNC holds at every $(y, \alpha)$ with $y \in F(\bar x, \alpha)$ and $\alpha > 0$ by Lemma 25.5. Hence Theorem 27.19 is indeed applicable and implies that $S$ has the Aubin property at $\bar\alpha$ for $\bar x$. We therefore have the stability estimate

$\inf_{x \in S(\alpha)} \|x - \bar x\|_X \le \kappa|\alpha - \bar\alpha|$

for some $\kappa > 0$ and all $\alpha$ sufficiently close to $\bar\alpha$.
28 FASTER CONVERGENCE FROM REGULARITY
As we have seen in Chapter 10, proximal point and splitting methods can be accelerated if at least one of the involved functionals is strongly convex. However, this can be too strong a requirement, and we will show in this chapter how we can obtain faster convergence (even without acceleration) under the weaker requirement of metric subregularity or of strong submonotonicity. We first study these notions in general before illustrating their effect on splitting methods by showing local linear convergence of forward-backward splitting.
28.1 subregularity and submonotonicity of subdifferentials

Throughout this section, let $X$ be a Banach space and $G\colon X \to \overline{\mathbb R}$ be convex, proper, and lower semicontinuous. Our goal is now to give conditions for metric subregularity and strong submonotonicity of $\partial G\colon X \rightrightarrows X^*$ at a critical point $\bar x \in X$ with $0 \in \partial G(\bar x)$.

metric subregularity

We recall from (27.3) that a set-valued mapping $H\colon X \rightrightarrows X^*$ is metrically subregular at $\bar x \in X$ for $\bar w \in X^*$ if there exist $\delta > 0$ and $\kappa > 0$ such that

$\operatorname{dist}(x, H^{-1}(\bar w)) \le \kappa \operatorname{dist}(\bar w, H(x)) \quad (x \in \mathbb B(\bar x, \delta)).$

We also recall that the infimum of all $\kappa > 0$ for which this inequality holds for some $\delta > 0$ is $\operatorname{subreg} H(\bar x|\bar w)$, the modulus of (metric) subregularity of $H$ at $\bar x$ for $\bar w$. In the following, we will also make use of the squared distance of $x \in X$ to a set $A \subset X$,

$\operatorname{dist}^2(x, A) \coloneqq \inf_{\tilde x \in A} \|x - \tilde x\|_X^2.$
Theorem 28.1. Let $G\colon X \to \overline{\mathbb R}$ be convex, proper, and lower semicontinuous and let $\bar x \in X$ with $0 \in \partial G(\bar x)$. If there exist $\gamma > 0$ and $\delta > 0$ such that

(28.1)  $G(x) \ge G(\bar x) + \gamma \operatorname{dist}^2(x, [\partial G]^{-1}(0)) \quad (x \in \mathbb B_X(\bar x, \delta)),$

then $\partial G$ is metrically subregular at $\bar x$ for $0$ with $\operatorname{subreg} \partial G(\bar x|0) \le \gamma^{-1}$.

Conversely, if $\operatorname{subreg} \partial G(\bar x|0) < (4\gamma)^{-1}$ for some $\gamma > 0$, then there exists $\delta > 0$ such that (28.1) holds for this $\gamma$.
Proof. Let first (28.1) hold for $\gamma, \delta > 0$. We need to show that

(28.2)  $\gamma \operatorname{dist}(x, [\partial G]^{-1}(0)) \le \operatorname{dist}(0, \partial G(x)) \quad (x \in \mathbb B(\bar x, \delta)).$

To that end, let $x \in \mathbb B(\bar x, \delta)$. Clearly, if $\partial G(x) = \emptyset$, there is nothing to prove. So assume that there exists an $x^* \in \partial G(x)$. For each $\varepsilon > 0$, we can also find $\tilde x_\varepsilon \in [\partial G]^{-1}(0)$ such that

(28.3)  $\|x - \tilde x_\varepsilon\|_X \le \operatorname{dist}(x, [\partial G]^{-1}(0)) + \varepsilon.$

By the definition of the convex subdifferential and $\bar x, \tilde x_\varepsilon \in \arg\min G$, we have

$\langle x^*, x - \tilde x_\varepsilon\rangle_X \ge G(x) - G(\tilde x_\varepsilon) = G(x) - G(\bar x).$

Combined with (28.1) and (28.3), this yields

$\gamma \operatorname{dist}^2(x, [\partial G]^{-1}(0)) \le \langle x^*, x - \tilde x_\varepsilon\rangle_X \le \|x^*\|_{X^*}\|x - \tilde x_\varepsilon\|_X \le \|x^*\|_{X^*}\big(\operatorname{dist}(x, [\partial G]^{-1}(0)) + \varepsilon\big).$

Since $\varepsilon > 0$ and $x^* \in \partial G(x)$ were arbitrary, we obtain (28.2) and thus metric subregularity with the claimed modulus.
Conversely, let ๐๐บ be metrically subregular at ๐ฅ for 0 with subreg ๐๐บ (๐ฅ |0) = 4/๐พ for some๐พ > 0. Let ๐พ โ (0, 1/(4๐พ)). We argue by contradiction. Assume that (28.1) does not hold forarbitrary ๐ฟ > 0. Then we can nd some ๐ฅ โ ๐น(๐ฅ, 2๐ฟ/3) such that
(28.4) ๐บ (๐ฅ) < ๐บ (๐ฅ) + ๐พ dist2(๐ฅ, [๐๐บ]โ1(0)) .However, ๐ฅ is a minimizer of ๐บ , so necessarily ๐พ dist2(๐ฅ, [๐๐บ]โ1(0)) > 0. By Ekelandโsvariational principle (Theorem 2.14), we can thus nd ๐ฆ โ ๐ satisfying
(28.5) โ๐ฆ โ ๐ฅ โ๐ โค 12 dist(๐ฅ, [๐๐บ]
โ1(0))and for all ๐ฅ โ ๐ that
๐บ (๐ฅ) โฅ ๐บ (๐ฆ) โ ๐พ dist2(๐ฅ, [๐๐บ]โ1(0))
12 dist(๐ฅ, [๐๐บ]โ1(0))
โ๐ฅ โ ๐ฆ โ๐ = ๐บ (๐ฆ) โ 2๐พ dist(๐ฅ, [๐๐บ]โ1(0))โ๐ฅ โ ๐ฆ โ๐ .
In follows that ๐ฆ minimizes๐บ + 2๐พ dist(๐ฅ, [๐๐บ]โ1(0))โ ยท โ ๐ฆ โ๐ , which by Theorems 4.2, 4.6and 4.14 is equivalent to 0 โ ๐๐บ (๐ฆ) + 2๐พ dist(๐ฅ, [๐๐บ]โ1(0))๐น๐ โ . Hence we can nd some๐ฆโ โ ๐๐บ (๐ฆ) satisfying โ๐ฆโโ๐ โ โค 2๐พ dist(๐ฅ, [๐๐บ]โ1(0)). Using (28.5), we now obtain
2๐พ dist(0, ๐๐บ (๐ฆ)) < (2๐พ)โ1 dist(0, ๐๐บ (๐ฆ))โค (2๐พ)โ1โ๐ฆโโ๐ โ โค dist(๐ฅ, [๐๐บ]โ1(0))= 2 dist(๐ฅ, [๐๐บ]โ1(0)) โ dist(๐ฅ, [๐๐บ]โ1(0))โค 2โ๐ฆ โ ๐ฅ โ๐ + 2 dist(๐ฆ, [๐๐บ]โ1(0)) โ dist(๐ฅ, [๐๐บ]โ1(0))โค 2 dist(๐ฆ, [๐๐บ]โ1(0)) .
28 faster convergence from regularity
By (28.5) and our choice of x ∈ 𝔹(x̄, 2δ/3),

‖y − x̄‖_X ≤ ‖y − x‖_X + ‖x − x̄‖_X ≤ (3/2)‖x − x̄‖_X ≤ δ.

Therefore y ∈ 𝔹(x̄, δ) violates the assumed metric subregularity with the modulus κ, and hence (28.1) holds. □
Applying Theorem 28.1 to x ↦ G(x) − ⟨x̄*, x⟩_X now yields the following characterization due to [Aragón Artacho & Geoffroy 2014].
Corollary 28.2. Let G: X → ℝ be convex, proper, and lower semicontinuous, and let x̄ ∈ X and x̄* ∈ ∂G(x̄). If there exist γ > 0 and δ > 0 such that

(28.6)  G(x) ≥ G(x̄) + ⟨x̄*, x − x̄⟩_X + γ dist²(x, [∂G]⁻¹(x̄*))  (x ∈ 𝔹_X(x̄, δ)),

then ∂G is metrically subregular at x̄ for x̄* with subreg ∂G(x̄|x̄*) ≤ γ⁻¹. Conversely, if subreg ∂G(x̄|x̄*) ≤ (4γ)⁻¹ for some γ > 0, then there exists δ > 0 such that (28.6) holds for this γ.
Remark 28.3 (strong metric subregularity). As in Remark 27.1, we can also characterize strong metric subregularity using a strong notion of local subdifferentiability. In the setting of Corollary 28.2, it was shown in [Aragón Artacho & Geoffroy 2014] that strong metric subregularity of ∂G at x̄ for x̄* is equivalent to

(28.7)  G(x) ≥ G(x̄) + ⟨x̄*, x − x̄⟩_X + γ‖x − x̄‖²_X  (x ∈ 𝔹_X(x̄, δ)),

i.e., a local form of strong subdifferentiability. Compared to the characterization of metric subregularity in (28.6), intuitively the strong version does not "squeeze" [∂G]⁻¹(x̄*) into a single point.
Strong metric subregularity may almost trivially be used in the convergence proofs of Part II and Chapter 15 as a relaxation of strong convexity; compare [Clason, Mazurenko & Valkonen 2020]. Also observe that (28.7) can be expressed in terms of the Bregman divergence (see Section 11.1) as

B^{x̄*}_G(x, x̄) ≥ γ‖x − x̄‖²_X  (x ∈ 𝔹_X(x̄, δ)),

i.e., that B^{x̄*}_G is elliptic at x̄ in the sense of [Valkonen 2020a]. In optimization methods based on preconditioning by Bregman divergences instead of the linear preconditioner M as discussed in Section 11.1, this generalizes the positive definiteness requirement on M.
strong submonotonicity
An alternative is to relax the strong monotonicity assumption of Chapter 10 more directly. We say that a set-valued mapping H: X ⇒ X* is (γ, θ)-strongly submonotone at x̄ for x̄* ∈ H(x̄) with θ ≥ γ > 0 if there exists δ > 0 such that for all x ∈ 𝔹_X(x̄, δ) and x* ∈ H(x) ∩ 𝔹_{X*}(x̄*, δ),

(28.8)  inf_{x̃ ∈ H⁻¹(x̄*)} (⟨x* − x̄*, x − x̃⟩_X + (θ − γ)‖x − x̃‖²_X) ≥ θ dist²(x, H⁻¹(x̄*)).

If this only holds for θ ≥ γ = 0, then we call H submonotone at x̄ for x̄*.
Clearly, (strong) monotonicity (see Lemma 7.4) implies (strong) submonotonicity at any x̄ ∈ X and x̄* ∈ H(x̄). However, subdifferentials of convex functionals need not be strongly monotone. The next theorem shows that local second-order growth away from the set of minimizers implies strong submonotonicity of such subdifferentials at any minimizer x̄ for x̄* = 0, which is the monotonicity-based analogue of the characterization Theorem 28.1 of metric subregularity.
Theorem 28.4. Let G: X → ℝ be convex, proper, and lower semicontinuous, and let x̄ ∈ X with 0 ∈ ∂G(x̄). If there exist γ, δ > 0 such that

(28.9)  G(x) ≥ G(x̄) + γ dist²(x, [∂G]⁻¹(0))  (x ∈ 𝔹_X(x̄, δ)),

then ∂G is (γ, θ)-strongly submonotone at x̄ for 0 for any θ ≥ γ.
Proof. Since θ ≥ γ, (28.9) is equivalent to

(28.10)  inf_{x̃ ∈ [∂G]⁻¹(0)} (G(x) − G(x̃) + (θ − γ)‖x − x̃‖²_X) ≥ θ dist²(x, [∂G]⁻¹(0))

for all x ∈ 𝔹_X(x̄, δ). By the definition of the convex subdifferential, we have for all x̃ ∈ [∂G]⁻¹(0), x* ∈ ∂G(x), and x̄* = 0 that

⟨x* − x̄*, x − x̃⟩_X ≥ G(x) − G(x̃) = G(x) − G(x̄).

Inserting this into (28.10) yields the definition (28.8) of strong submonotonicity for H = ∂G. □
Together with Theorem 28.1, this shows that for convex subdifferentials, metric subregularity implies strong submonotonicity, which is thus a weaker property.
Remark 28.5. Submonotonicity was introduced in [Valkonen 2021] together with its "partial" (i.e., only holding on a subspace) variants.
examples
We conclude this section by showing that the subdifferentials of the indicator functional of the finite-dimensional unit ball and of the absolute value function are both subregular and strongly submonotone. Note that neither of these subdifferentials is strongly monotone in the conventional sense. Here we restrict ourselves to showing (γ, γ)-strong submonotonicity for some (x̄, x̄*) ∈ graph ∂G, i.e., that there exists δ > 0 such that

(28.11)  ⟨x* − x̄*, x − x̄⟩_X ≥ γ dist²(x, [∂G]⁻¹(x̄*))  (x ∈ 𝔹(x̄, δ), x* ∈ ∂G(x)).
Lemma 28.6. Let G ≔ δ_{𝔹(0,α)} on (ℝ^N, ‖·‖₂) and (x̄, x̄*) ∈ graph ∂G. Then ∂G is

(i) metrically subregular at x̄ for x̄* for any δ ∈ (0, α] and

κ ≥ { 2α/‖x̄*‖₂ if x̄* ≠ 0; 0 if x̄* = 0 };

(ii) (γ, γ)-strongly submonotone at x̄ for x̄* for any δ > 0 and

γ ≤ { ‖x̄*‖₂/(2α) if x̄* ≠ 0; ∞ if x̄* = 0 }.
Proof. We first verify (28.6) for δ = α and γ = κ⁻¹ as stated. To that end, let x ∈ 𝔹(0, α). If x̄* = 0, then (28.6) trivially holds by the subdifferentiability of G and dist²(x, [∂G]⁻¹(x̄*)) = dist²(x, 𝔹(0, α)) = 0. Let therefore x̄* ≠ 0. Then [∂G]⁻¹(x̄*) = {x̄} as well as ‖x̄‖₂ = α and x̄* = βx̄ for β = ‖x̄*‖₂/‖x̄‖₂. Since γ ≤ ‖x̄*‖₂/(2α), we have β ≥ 2γ. Then ‖x‖₂ ≤ α yields

γ dist²(x, [∂G]⁻¹(x̄*)) = γ‖x − x̄‖₂² ≤ (β/2)‖x − x̄‖₂² = β⟨x̄, x̄ − x⟩₂ − (β/2)‖x̄‖₂² + (β/2)‖x‖₂² ≤ β⟨x̄, x̄ − x⟩₂ = ⟨x̄*, x̄ − x⟩₂ ≤ ⟨x̄*, x̄ − x⟩₂ + G(x) − G(x̄).

Since dom G = 𝔹(0, α), this shows that (28.6) holds for any δ > 0.
Corollary 28.2 now yields (i). Adding

⟨x*, x − x̄⟩₂ ≥ G(x) − G(x̄)  (x* ∈ ∂G(x))

to (28.6), we also obtain (28.11) and thus (ii). □
Lemma 28.7. Let G ≔ |·| on ℝ and (x̄, x̄*) ∈ graph ∂G. Then ∂G is
(i) metrically subregular at x̄ for x̄* for any κ > 0 and

δ ≤ { 2κ if x̄* ∈ {1, −1}; κ if |x̄*| < 1 };

(ii) (γ, γ)-strongly submonotone at x̄ for x̄* for any γ > 0 and

δ ≤ { 2γ⁻¹ if x̄* ∈ {1, −1}; γ⁻¹ if |x̄*| < 1 }.
Proof. We first verify (28.6) for δ > 0 and γ = κ⁻¹ as stated. Suppose first that x̄* = 1, so that x̄ ∈ [∂G]⁻¹(x̄*) = [0, ∞). This implies that x̄ = |x̄|, and hence (28.6) becomes

|x| ≥ x + γ inf_{x̃ ≥ 0} (x̃ − x)²  (|x − x̄| ≤ δ).

If x ≥ 0, this trivially holds by taking x̃ = x. If x ≤ 0, the right-hand side is minimized by x̃ = 0, and thus the inequality holds for x ≥ −2γ⁻¹. Since x̄ ≥ 0, this is guaranteed by our bound on δ. The case x̄* = −1 is analogous.

If |x̄*| < 1, then x̄ ∈ [∂G]⁻¹(x̄*) = {0}, and hence (28.6) becomes

|x| ≥ x̄*x + γ|x|²  (|x| ≤ δ).

This again holds by our choice of δ.
Corollary 28.2 now yields (i). Adding

⟨x*, x − x̄⟩ ≥ G(x) − G(x̄)  (x* ∈ ∂G(x))

to (28.6), we also obtain (28.11) and thus (ii). □
Remark 28.8. If we allow in the definition of subregularity or submonotonicity an arbitrary neighborhood of x̄ instead of a ball, then Lemma 28.7 holds in a much larger neighborhood.
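As a quick numerical sanity check of the pointwise inequality (28.11) for G = |·|, the following sketch verifies it on a grid. The particular constants x̄ = 0, x̄* = 0.5, γ = 0.5, δ = 1 are illustrative assumptions for this check, not the ones claimed in Lemma 28.7:

```python
import numpy as np

def subdiff_abs(x):
    # ∂|·|(x) is {sign(x)} for x ≠ 0 and [−1, 1] for x = 0;
    # we return the interval endpoints.
    if x > 0.0:
        return (1.0, 1.0)
    if x < 0.0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)

xbar, xbar_star, gamma, delta = 0.0, 0.5, 0.5, 1.0
# Here [∂G]⁻¹(x̄*) = {0} since |x̄*| < 1, so dist²(x, ·) = x².
ok = True
for x in np.linspace(xbar - delta, xbar + delta, 201):
    lo, hi = subdiff_abs(x)
    # (28.11) must hold for every x* ∈ ∂G(x); since the left-hand side is
    # linear in x*, checking the interval endpoints suffices.
    worst = min((lo - xbar_star) * (x - xbar), (hi - xbar_star) * (x - xbar))
    ok = ok and worst >= gamma * x * x - 1e-12

print(ok)  # prints True
```

The check is tight at x = δ = 1, mirroring how the admissible neighborhood in the lemma is limited by the chosen γ.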
28.2 local linear convergence of explicit splitting
Returning to the notation used in Chapters 8 to 12, we assume throughout that X is a Hilbert space, that F, G: X → ℝ are convex, proper, and lower semicontinuous, and that F is Fréchet differentiable and has a Lipschitz continuous gradient ∇F with Lipschitz constant L ≥ 0. Let further an initial iterate x⁰ ∈ X and a step size τ > 0 be given, and let the sequence {x^i}_{i∈ℕ} be generated by the forward-backward splitting method (or the basic proximal point method if F = 0), i.e., by solving for x^{i+1} in

(28.12)  0 ∈ τ[∂G(x^{i+1}) + ∇F(x^i)] + (x^{i+1} − x^i).
We also write H ≔ ∂G + ∇F: X ⇒ X. Finally, it is worth recalling the approach of Chapter 10 for encoding convergence rates into "testing" parameters φ_i > 0.
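Concretely, (28.12) is the update x^{i+1} = prox_{τG}(x^i − τ∇F(x^i)). The following minimal sketch implements this iteration for an illustrative ℓ¹-regularized least-squares problem; the data, the step size τ = 1/L, and the soft-thresholding prox of G = α‖·‖₁ are assumptions for the example, not taken from the text:

```python
import numpy as np

def forward_backward(prox_G, grad_F, x0, tau, num_iters):
    # Iteration (28.12): 0 ∈ τ[∂G(x^{i+1}) + ∇F(x^i)] + (x^{i+1} − x^i),
    # i.e., x^{i+1} = prox_{τG}(x^i − τ ∇F(x^i)).
    x = x0
    for _ in range(num_iters):
        x = prox_G(x - tau * grad_F(x), tau)
    return x

# Illustrative data: F(x) = 0.5*||Ax − b||², G(x) = alpha*||x||₁,
# whose proximal map is componentwise soft-thresholding.
def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
alpha = 0.1
L = np.linalg.norm(A.T @ A, 2)    # Lipschitz constant of ∇F
tau = 1.0 / L
x = forward_backward(lambda z, t: soft_threshold(z, alpha * t),
                     lambda z: A.T @ (A @ z - b),
                     np.zeros(5), tau, 2000)
# x is (approximately) a fixed point of the forward-backward map.
```

The fixed-point residual ‖x − prox_{τG}(x − τ∇F(x))‖ serves as a computable optimality measure for the iteration.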
We start our analysis by adapting the proofs of Theorems 10.2 and 11.4 to employ the squared distance function x ↦ dist²(x; X̂) to the entire solution set X̂ = H⁻¹(0) in place of the squared distance function x ↦ ‖x − x̂‖²_X to a fixed x̂ ∈ H⁻¹(0).
Lemma 28.9. Let X̂ ⊂ X. If for all i ∈ ℕ and w^{i+1} ≔ −∇F(x^i) − τ⁻¹(x^{i+1} − x^i) ∈ ∂G(x^{i+1}),

(28.13)  inf_{x̂ ∈ X̂} ((φ_i/2)‖x^{i+1} − x̂‖²_X + φ_i τ⟨w^{i+1} + ∇F(x^i), x^{i+1} − x̂⟩_X) ≥ (φ_{i+1}/2) dist²(x^{i+1}, X̂) − (φ_i/2)‖x^{i+1} − x^i‖²_X,

then

(28.14)  (φ_N/2) dist²(x^N, X̂) ≤ (φ_0/2) dist²(x⁰, X̂)  (N ≥ 1).
Proof. Inserting (28.12) into (28.13) yields

(28.15)  inf_{x̂ ∈ X̂} φ_i ((1/2)‖x^{i+1} − x^i‖²_X + (1/2)‖x^{i+1} − x̂‖²_X − ⟨x^{i+1} − x^i, x^{i+1} − x̂⟩_X) ≥ (φ_{i+1}/2) dist²(x^{i+1}; X̂).

Using the three-point formula (9.1), we can then rewrite (28.15) as

(φ_i/2) dist²(x^i; X̂) ≥ (φ_{i+1}/2) dist²(x^{i+1}; X̂).

The claim now follows by a telescoping sum over i = 0, …, N − 1. □
rates from error bounds and metric subregularity
Our first approach for the satisfaction of (28.13) is based on error bounds, which we will prove using metric subregularity. The essence of error bounds is to prove for some θ > 0 that

‖x^{i+1} − x^i‖_X ≥ θ‖x^{i+1} − x̂‖_X.

We slightly weaken this condition and assume the bound to be relative to the entire solution set, i.e.,

(28.16)  ‖x^{i+1} − x^i‖²_X ≥ θ dist²(x^{i+1}; H⁻¹(0)).

This bound holds under metric subregularity. We first need the following technical lemma on the iteration (28.12).
Lemma 28.10. If 0 < τL < 1, then

(1/2)‖x^{i+1} − x^i‖²_X ≥ (τ²/(4[1 + L²τ²])) dist²(0, ∂G(x^{i+1}) + ∇F(x^{i+1})).

Proof. Since −(x^{i+1} − x^i) ∈ τ[∂G(x^{i+1}) + ∇F(x^i)] by (28.12), we have

(28.17)  (1/2)‖x^{i+1} − x^i‖²_X = (1/2) dist²(0, {−(x^{i+1} − x^i)}) ≥ (1/2) dist²(0, τ[∂G(x^{i+1}) + ∇F(x^i)]).

The generalized Young's inequality for any α ∈ (0, 1) then yields

(1/2) dist²(0, τ[∂G(x^{i+1}) + ∇F(x^i)])
= (τ²/2) dist²(∇F(x^{i+1}) − ∇F(x^i), ∂G(x^{i+1}) + ∇F(x^{i+1}))
= inf_{q ∈ ∂G(x^{i+1})} (τ²/2)‖(∇F(x^{i+1}) − ∇F(x^i)) − (q + ∇F(x^{i+1}))‖²_X
≥ (τ²(1 − α⁻¹)/2)‖∇F(x^{i+1}) − ∇F(x^i)‖²_X + inf_{q ∈ ∂G(x^{i+1})} (τ²(1 − α)/2)‖q + ∇F(x^{i+1})‖²_X
≥ (τ²(1 − α⁻¹)L²/2)‖x^{i+1} − x^i‖²_X + (τ²(1 − α)/2) dist²(0, ∂G(x^{i+1}) + ∇F(x^{i+1})),

where we have used in the last step that 1 − α⁻¹ < 0 and that ∇F is Lipschitz continuous. Combining this estimate with (28.17), we obtain

((1 − τ²(1 − α⁻¹)L²)/2)‖x^{i+1} − x^i‖²_X ≥ (τ²(1 − α)/2) dist²(0, ∂G(x^{i+1}) + ∇F(x^{i+1})).

Rearranging and using that τ²(1 − α⁻¹)L² < 1 then yields

(1/2)‖x^{i+1} − x^i‖²_X ≥ (θ/2) dist²(0, ∂G(x^{i+1}) + ∇F(x^{i+1}))

for

θ ≔ τ²(1 − α)/(1 − τ²(1 − α⁻¹)L²),

which for α = 1/2 yields the claim. □
Metric subregularity then immediately yields the error bound (28.16).

Lemma 28.11. Let H be metrically subregular at x̂ for w = 0 with κ > 0 and δ > 0. If 0 < τL ≤ 2 and x^{i+1} ∈ 𝔹(x̂, δ), then (28.16) holds with θ = τ²/(2κ²[1 + L²τ²]).
Proof. Combining Lemma 28.10 and the definition of metric subregularity yields

(1/2)‖x^{i+1} − x^i‖²_X ≥ (τ²/(4[1 + L²τ²])) dist²(0, H(x^{i+1})) ≥ (τ²/(4κ²[1 + L²τ²])) dist²(x^{i+1}, H⁻¹(0)). □
From this lemma, we now obtain local linear convergence of the forward-backward splitting method when H is metrically subregular at a solution.

Theorem 28.12. Let H be metrically subregular at x̂ ∈ H⁻¹(0) for w = 0 with κ > 0 and δ > 0. If 0 < τL ≤ 2 and x^i ∈ 𝔹(x̂, δ) for all i ∈ ℕ, then (28.14) holds for φ_{i+1} ≔ φ_i(1 + θ) and φ_0 = 1 with θ = τ²/(2κ²[1 + L²τ²]). In particular, dist²(x^i; H⁻¹(0)) → 0 at a linear rate.
Proof. Let x̂ ∈ H⁻¹(0) and w^{i+1} ∈ ∂G(x^{i+1}) be as in Lemma 28.9. From (10.11) in the proof of Theorem 10.2 and using τL ≤ 2, we obtain

⟨w^{i+1} + ∇F(x^i), x^{i+1} − x̂⟩_X ≥ −(L/4)‖x^{i+1} − x^i‖²_X ≥ −(1/(2τ))‖x^{i+1} − x^i‖²_X.

Lemma 28.11 now yields the error bound (28.16) and hence for all x̂ ∈ H⁻¹(0) that

(φ_i/2)‖x^{i+1} − x^i‖²_X + (φ_i/2)‖x^{i+1} − x̂‖²_X ≥ (φ_{i+1}/2) dist²(x^{i+1}; H⁻¹(0)).

Summing these two estimates yields

(φ_i/2)‖x^{i+1} − x^i‖²_X + (φ_i/2)‖x^{i+1} − x̂‖²_X + φ_i τ⟨w^{i+1} + ∇F(x^i), x^{i+1} − x̂⟩_X ≥ (φ_{i+1}/2) dist²(x^{i+1}; H⁻¹(0)) − (φ_i/2)‖x^{i+1} − x^i‖²_X.

Taking the infimum over x̂ ∈ H⁻¹(0), we obtain (28.13) for X̂ = H⁻¹(0). The claim now follows from Lemma 28.9 and the exponential growth of φ_i. □
The convergence is local due to the requirement x^{i+1} ∈ 𝔹(x̂, δ) for applying subregularity. In finite dimensions, the weak convergence result of Theorem 9.6 of course guarantees that the iterates enter and remain in this neighborhood after a finite number of steps.
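The mechanism of Theorem 28.12 can be observed numerically on an assumed toy example (not from the text): gradient descent, i.e., (28.12) with G = 0, on a rank-deficient least-squares problem. The solution set X̂ is then an affine subspace, so F is not strongly convex, yet an error bound holds and dist(x^i, X̂) contracts by a fixed factor:

```python
import numpy as np

# F(x) = 0.5*||A x − b||² with A rank-deficient: the solution set is the
# line X̂ = {(1, 2, t) : t ∈ ℝ}, so F is not strongly convex, but ∇F is
# metrically subregular (an error bound holds on each bounded set).
A = np.diag([1.0, 0.5, 0.0])      # null space = span(e₃)
b = np.array([1.0, 1.0, 0.0])
tau = 1.0                         # = 1/L with L = ||AᵀA||₂ = 1

def dist_to_solution_set(x):
    # distance to X̂ = {(1, 2, t)}: the e₃-component is free.
    return np.hypot(x[0] - 1.0, x[1] - 2.0)

x = np.array([5.0, 5.0, 5.0])
dists = []
for _ in range(30):
    dists.append(dist_to_solution_set(x))
    x = x - tau * (A.T @ (A @ x - b))

ratios = [dists[i + 1] / dists[i] for i in range(len(dists) - 1)]
# every contraction factor is at most |1 − τ·0.25| = 0.75, determined by
# the worst nonzero curvature direction: linear decay of dist(x^i, X̂).
```

The iterates converge to some point of X̂ depending on the initialization, while the distance to the set itself decays geometrically, exactly the quantity controlled by (28.14).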
Remark 28.13 (local linear convergence). Local linear convergence was first derived from error bounds in [Luo & Tseng 1992] for matrix splitting problems and was studied for other methods, including the ADMM and the proximal point method among others, in [Han & Yuan 2013; Aspelmeier, Charitha & Luke 2016; Leventhal 2009; Li & Mordukhovich 2012]. An alternative approach to the proximal point method was taken in [Aragón Artacho & Gaydu 2012] based on Lyusternik–Graves style estimates, while [Adly, Cibulka & Ngai 2015] presented an approach based on metric regularity to Newton's method for variational inclusions. Furthermore, [Zhou & So 2017] proposed a unified approach to error bounds for generic smooth constrained problems. Finally, [Liu et al. 2018; Valkonen 2021] introduced partial or subspace versions of error bounds and showed the fast convergence of only some variables of structured algorithms such as the ADMM or the PDPS. The relationships between error bounds and metric subregularity are studied in more detail in [Gfrerer 2011; Ioffe 2017; Kruger 2015; Dontchev & Rockafellar 2014; Ngai & Théra 2008].
rates from strong submonotonicity
If H is instead strongly submonotone, we can (locally) ensure (28.13) directly.
Theorem 28.14. Let H be (γ/2, θ/2)-strongly submonotone at x̂ ∈ H⁻¹(0) for w = 0 with δ > 0. If γ > θ + L²τ and x⁰ ∈ 𝔹(x̂, ε) for some ε > 0 sufficiently small, then (28.14) holds for φ_{i+1} ≔ φ_i[1 + (γ − L²τ)τ] and φ_0 = 1. In particular, dist²(x^i; H⁻¹(0)) → 0 at a linear rate.
Proof. Let w^{i+1} ≔ −τ⁻¹(x^{i+1} − x^i) − ∇F(x^i) ∈ ∂G(x^{i+1}) by (28.12). By (9.9) in the proof of Theorem 9.6, if x⁰ ∈ 𝔹(x̂, ε) for ε > 0 small enough, then ‖x^{i+1} − x^i‖_X ≤ δ/(L + τ⁻¹) for all i ∈ ℕ, such that the Lipschitz continuity of ∇F yields

‖∇F(x^{i+1}) − ∇F(x^i) − τ⁻¹(x^{i+1} − x^i)‖_X ≤ δ.

Thus w^{i+1} ∈ ∂G(x^{i+1}) ∩ 𝔹(−∇F(x^{i+1}), δ) and x^{i+1} ∈ 𝔹(x̂, δ) for all i ∈ ℕ. Now, for all x̃ ∈ H⁻¹(0), the strong submonotonicity of H at x̂ for 0 implies that

φ_i τ⟨w^{i+1} + ∇F(x^{i+1}), x^{i+1} − x̃⟩_X + ((θ − γ)φ_i τ/2)‖x^{i+1} − x̃‖²_X ≥ (θφ_i τ/2) dist²(x^{i+1}; H⁻¹(0))

for all i ∈ ℕ. Cauchy's inequality and the Lipschitz continuity of ∇F then yield

φ_i τ⟨∇F(x^i) − ∇F(x^{i+1}), x^{i+1} − x̃⟩_X ≥ −(φ_i/2)‖x^{i+1} − x^i‖²_X − (φ_i τ²L²/2)‖x^{i+1} − x̃‖²_X.
We now sum the last two inequalities to obtain

(φ_i/2)‖x^{i+1} − x^i‖²_X + φ_i τ⟨w^{i+1} + ∇F(x^i), x^{i+1} − x̃⟩_X ≥ (θφ_i τ/2) dist²(x^{i+1}; H⁻¹(0)) + ((γ − θ − L²τ)φ_i τ/2)‖x^{i+1} − x̃‖²_X.

Using that θ − γ + L²τ < 0 and taking the infimum over all x̃ ∈ H⁻¹(0) then yields

inf_{x̃ ∈ H⁻¹(0)} ((φ_i/2)‖x^{i+1} − x̃‖²_X + φ_i τ⟨w^{i+1} + ∇F(x^i), x^{i+1} − x̃⟩_X) ≥ (((γ − L²τ)τ + 1)φ_i/2) dist²(x^{i+1}; H⁻¹(0)) − (φ_i/2)‖x^{i+1} − x^i‖²_X.

Since γ − L²τ > 0 and φ_{i+1} = φ_i[1 + (γ − L²τ)τ], this shows (28.13) with X̂ = H⁻¹(0). The claim now follows from Lemma 28.9 and the exponential growth of φ_i. □
Remark 28.15. Similarly to Theorem 10.1 (ii), if F ≡ 0 we can let τ → ∞ to obtain local superlinear convergence of the proximal point method under strong submonotonicity of ∂G at the solution.
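As an illustrative sketch (not from the text), the proximal point method applied to G = |·| — the iteration x^{i+1} = prox_{τ|·|}(x^i) — reaches the solution set {0} exactly after finitely many steps, consistent with the fast local convergence available under strong submonotonicity:

```python
def prox_abs(x, tau):
    # proximal map of τ|·|: scalar soft-thresholding.
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0

x0, tau = 5.0, 1.0
trajectory = [x0]
while trajectory[-1] != 0.0 and len(trajectory) < 50:
    trajectory.append(prox_abs(trajectory[-1], tau))
# 5 → 4 → 3 → 2 → 1 → 0: the exact minimizer is reached in finitely many steps.
```

Increasing τ shortens this trajectory further, mirroring the τ → ∞ acceleration mentioned in the remark.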
BIBLIOGRAPHY
S. Adly, R. Cibulka & H. V. Ngai (2015), Newton's method for solving inclusions using set-valued approximations, SIAM Journal on Optimization 25(1), 159–184, doi: 10.1137/130926730.
H. W. Alt (2016), Linear Functional Analysis, An Application-Oriented Introduction, Universitext, Springer, London, doi: 10.1007/978-1-4471-7280-2.
J. Appell & P. Zabreiko (1990), Nonlinear Superposition Operators, Cambridge University Press, New York, doi: 10.1515/9783110265118.385.
F. J. Aragón Artacho & M. Gaydu (2012), A Lyusternik–Graves theorem for the proximal point method, Computational Optimization and Applications 52(3), 785–803, doi: 10.1007/s10589-011-9439-6.
F. J. Aragón Artacho & M. H. Geoffroy (2014), Metric subregularity of the convex subdifferential in Banach spaces, Journal of Nonlinear and Convex Analysis 15(1), 35–47, arxiv: 1303.3654.
K. J. Arrow, L. Hurwicz & H. Uzawa (1958), Studies in Linear and Non-Linear Programming, Stanford University Press, Stanford.
T. Aspelmeier, C. Charitha & D. R. Luke (2016), Local linear convergence of the ADMM/Douglas–Rachford algorithms without strong convexity and application to statistical imaging, SIAM Journal on Imaging Sciences 9(2), 842–868, doi: 10.1137/15m103580x.
H. Attouch, G. Buttazzo & G. Michaille (2014), Variational Analysis in Sobolev and BV Spaces, 2nd ed., vol. 6, MOS-SIAM Series on Optimization, Society for Industrial and Applied Mathematics, Philadelphia, PA, doi: 10.1137/1.9781611973488.
H. Attouch & H. Brezis (1986), Duality for the sum of convex functions in general Banach spaces, in: Aspects of Mathematics and its Applications, vol. 34, North-Holland Math. Library, North-Holland, Amsterdam, 125–133, doi: 10.1016/s0924-6509(09)70252-1.
J. Aubin & H. Frankowska (1990), Set-Valued Analysis, Birkhäuser Basel, doi: 10.1007/978-0-8176-4848-0.
J.-P. Aubin (1981), Contingent derivatives of set-valued maps and existence of solutions to nonlinear inclusions and differential inclusions, in: Mathematical Analysis and Applications, Part A, vol. 7, Adv. in Math. Suppl. Stud., Academic Press, New York–London, 159–229.
J.-P. Aubin (1984), Lipschitz behavior of solutions to convex minimization problems, Mathematics of Operations Research 9(1), 87–111, doi: 10.1287/moor.9.1.87.
D. Aussel, A. Daniilidis & L. Thibault (2005), Subsmooth sets: functional characterizations and related concepts, Transactions of the American Mathematical Society 357(4), 1275–1301, doi: 10.1090/s0002-9947-04-03718-3.
D. Azé & J.-P. Penot (1995), Uniformly convex and uniformly smooth convex functions, Annales de la Faculté des sciences de Toulouse: Mathématiques 4(4), 705–730, url: http://eudml.org/doc/73364.
M. Bačák & U. Kohlenbach (2018), On proximal mappings with Young functions in uniformly convex Banach spaces, Journal of Convex Analysis 25(4), 1291–1318, arxiv: 1709.04700.
A. Bagirov, N. Karmitsa & M. M. Mäkelä (2014), Introduction to Nonsmooth Optimization, Theory, Practice and Software, Springer, Cham, doi: 10.1007/978-3-319-08114-4.
V. Barbu & T. Precupanu (2012), Convexity and Optimization in Banach Spaces, 4th ed., Springer Monographs in Mathematics, Springer, Dordrecht, doi: 10.1007/978-94-007-2247-7.
H. H. Bauschke & P. L. Combettes (2017), Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd ed., CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC, Springer, New York, doi: 10.1007/978-3-319-48311-5.
A. Beck (2017), First-Order Methods in Optimization, Society for Industrial and Applied Mathematics, Philadelphia, PA, doi: 10.1137/1.9781611974997.
A. Beck & M. Teboulle (2009a), A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences 2(1), 183–202, doi: 10.1137/080716542.
A. Beck & M. Teboulle (2009b), Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems, IEEE Transactions on Image Processing 18(11), 2419–2434, doi: 10.1109/tip.2009.2028250.
M. Benning, F. Knoll, C.-B. Schönlieb & T. Valkonen (2016), Preconditioned ADMM with nonlinear operator constraint, in: System Modeling and Optimization: 27th IFIP TC 7 Conference, CSMO 2015, Sophia Antipolis, France, June 29–July 3, 2015, Revised Selected Papers, ed. by L. Bociu, J.-A. Désidéri & A. Habbal, Springer International Publishing, Cham, 117–126, doi: 10.1007/978-3-319-55795-3_10, arxiv: 1511.00425, url: http://tuomov.iki.fi/m/nonlinearADMM.pdf.
F. Bigolin & S. N. Golo (2014), A historical account on characterizations of C¹-manifolds in Euclidean spaces by tangent cones, Journal of Mathematical Analysis and Applications 412(1), 63–76, doi: 10.1016/j.jmaa.2013.10.035, arxiv: 1003.1332.
J. M. Borwein & D. Preiss (1987), A smooth variational principle with applications to subdifferentiability and to differentiability of convex functions, Transactions of the American Mathematical Society 303, 517–527, doi: 10.2307/2000681.
J. M. Borwein & Q. J. Zhu (2005), Techniques of Variational Analysis, vol. 20, CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC, Springer-Verlag, New York, doi: 10.1007/0-387-28271-8.
G. Bouligand (1930), Sur quelques points de méthodologie géométrique, Rev. Gén. des Sciences 41, 39–43.
K. Bredies & H. Sun (2016), Accelerated Douglas–Rachford methods for the solution of convex-concave saddle-point problems, Preprint, arxiv: 1604.06282.
H. Brezis (2010), Functional Analysis, Sobolev Spaces and Partial Differential Equations, Springer, New York, doi: 10.1007/978-0-387-70914-7.
H. Brezis, M. G. Crandall & A. Pazy (1970), Perturbations of nonlinear maximal monotone sets in Banach space, Comm. Pure Appl. Math. 23, 123–144, doi: 10.1002/cpa.3160230107.
M. Brokate (2014), Konvexe Analysis und Evolutionsprobleme, Lecture notes, Zentrum Mathematik, TU München, url: http://www-m6.ma.tum.de/~brokate/cev_ss14.pdf.
F. E. Browder (1965), Nonexpansive nonlinear operators in a Banach space, Proceedings of the National Academy of Sciences of the United States of America 54(4), 1041, doi: 10.1073/pnas.54.4.1041.
F. E. Browder (1967), Convergence theorems for sequences of nonlinear operators in Banach spaces, Mathematische Zeitschrift 100(3), 201–225, doi: 10.1007/bf01109805.
A. Cegielski (2012), Iterative Methods for Fixed Point Problems in Hilbert Spaces, vol. 2057, Lecture Notes in Mathematics, Springer, Heidelberg, doi: 10.1007/978-3-642-30901-4.
A. Chambolle, R. A. DeVore, N.-y. Lee & B. J. Lucier (1998), Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage, IEEE Transactions on Image Processing 7(3), 319–335, doi: 10.1109/83.661182.
A. Chambolle, M. Ehrhardt, P. Richtárik & C. Schönlieb (2018), Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications, SIAM Journal on Optimization 28(4), 2783–2808, doi: 10.1137/17m1134834.
A. Chambolle & T. Pock (2011), A first-order primal-dual algorithm for convex problems with applications to imaging, Journal of Mathematical Imaging and Vision 40(1), 120–145, doi: 10.1007/s10851-010-0251-1.
A. Chambolle & T. Pock (2015), On the ergodic convergence rates of a first-order primal–dual algorithm, Mathematical Programming, 1–35, doi: 10.1007/s10107-015-0957-3.
P. Chen, J. Huang & X. Zhang (2013), A primal-dual fixed point algorithm for convex separable minimization with applications to image restoration, Inverse Problems 29(2), 025011, doi: 10.1088/0266-5611/29/2/025011.
X. Chen, Z. Nashed & L. Qi (2000), Smoothing methods and semismooth methods for nondifferentiable operator equations, SIAM Journal on Numerical Analysis 38(4), 1200–1216, doi: 10.1137/s0036142999356719.
C. Christof, C. Clason, C. Meyer & S. Walter (2018), Optimal control of a non-smooth semilinear elliptic equation, Mathematical Control and Related Fields 8(1), 247–276, doi: 10.3934/mcrf.2018011.
C. Christof & G. Wachsmuth (2018), No-gap second-order conditions via a directional curvature functional, SIAM Journal on Optimization 28(3), 2097–2130, doi: 10.1137/17m1140418.
R. Cibulka, A. L. Dontchev & A. Y. Kruger (2018), Strong metric subregularity of mappings in variational analysis and optimization, Journal of Mathematical Analysis and Applications 457(2), Special Issue on Convex Analysis and Optimization: New Trends in Theory and Applications, 1247–1282, doi: 10.1016/j.jmaa.2016.11.045.
I. Cioranescu (1990), Geometry of Banach Spaces, Duality Mappings and Nonlinear Problems, vol. 62, Mathematics and Its Applications, Springer, Dordrecht, doi: 10.1007/978-94-009-2121-4.
F. Clarke (2013), Functional Analysis, Calculus of Variations and Optimal Control, Springer, London, doi: 10.1007/978-1-4471-4820-3.
F. H. Clarke (1973), Necessary Conditions for Nonsmooth Problems in Optimal Control and the Calculus of Variations, PhD thesis, University of Washington.
F. H. Clarke (1975), Generalized gradients and applications, Transactions of the American Mathematical Society 205, 247–262.
F. H. Clarke (1990), Optimization and Nonsmooth Analysis, vol. 5, Classics in Applied Mathematics, Society for Industrial and Applied Mathematics, Philadelphia, PA, doi: 10.1137/1.9781611971309.
C. Clason (2020), Introduction to Functional Analysis, Compact Textbooks in Mathematics, Birkhäuser, Basel, doi: 10.1007/978-3-030-52784-6.
C. Clason & K. Kunisch (2011), A duality-based approach to elliptic control problems in non-reflexive Banach spaces, ESAIM: Control, Optimisation and Calculus of Variations 17(1), 243–266, doi: 10.1051/cocv/2010003.
C. Clason, S. Mazurenko & T. Valkonen (2019), Acceleration and global convergence of a first-order primal–dual method for nonconvex problems, SIAM Journal on Optimization 29(1), 933–963, doi: 10.1137/18m1170194, arxiv: 1802.03347.
C. Clason, S. Mazurenko & T. Valkonen (2020), Primal–dual proximal splitting and generalized conjugation in non-smooth non-convex optimization, Applied Mathematics & Optimization, doi: 10.1007/s00245-020-09676-1, arxiv: 1901.02746, url: http://tuomov.iki.fi/m/nlpdhgm_general.pdf.
C. Clason & T. Valkonen (2017a), Primal-dual extragradient methods for nonlinear nonsmooth PDE-constrained optimization, SIAM Journal on Optimization 27(3), 1313–1339, doi: 10.1137/16m1080859, arxiv: 1606.06219.
C. Clason & T. Valkonen (2017b), Stability of saddle points via explicit coderivatives of pointwise subdifferentials, Set-Valued and Variational Analysis 25(1), 69–112, doi: 10.1007/s11228-016-0366-7, arxiv: 1509.06582, url: http://tuomov.iki.fi/m/pdex2_stability.pdf.
P. L. Combettes & N. N. Reyes (2013), Moreau's decomposition in Banach spaces, Mathematical Programming 139(1), 103–114, doi: 10.1007/s10107-013-0663-y.
L. Condat (2013), A primal–dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms, Journal of Optimization Theory and Applications 158(2), 460–479, doi: 10.1007/s10957-012-0245-9.
I. Daubechies, M. Defrise & C. De Mol (2004), An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Communications on Pure and Applied Mathematics 57(11), 1413–1457, doi: 10.1002/cpa.20042.
E. DiBenedetto (2002), Real Analysis, Birkhäuser Boston, Inc., Boston, MA, doi: 10.1007/978-1-4612-0117-5.
S. Dolecki & G. H. Greco (2011), Tangency vis-à-vis differentiability by Peano, Severi and Guareschi, Journal of Convex Analysis 18(2), 301–339, arxiv: 1003.1332.
A. L. Dontchev & R. Rockafellar (2014), Implicit Functions and Solution Mappings, A View from Variational Analysis, 2nd ed., Springer Series in Operations Research and Financial Engineering, Springer, New York, doi: 10.1007/978-1-4939-1037-3.
A. L. Dontchev & R. T. Rockafellar (2004), Regularity and conditioning of solution mappings in variational analysis, Set-Valued and Variational Analysis 12(1–2), 79–109, doi: 10.1023/b:svan.0000023394.19482.30.
J. Douglas, Jr. & H. H. Rachford (1956), On the numerical solution of heat conduction problems in two and three space variables, Transactions of the American Mathematical Society 82(2), 421–439, doi: 10.2307/1993056.
Y. Drori, S. Sabach & M. Teboulle (2015), A simple algorithm for a class of nonsmooth convex–concave saddle-point problems, Operations Research Letters 43(2), 209–214, doi: 10.1016/j.orl.2015.02.001.
J. Eckstein & D. P. Bertsekas (1992), On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical Programming 55(1–3), 293–318, doi: 10.1007/bf01581204.
I. Ekeland & G. Lebourg (1976), Generic Fréchet-differentiability and perturbed optimization problems in Banach spaces, Transactions of the American Mathematical Society 224(2), 193–216, doi: 10.1090/s0002-9947-1976-0431253-2.
I. Ekeland & R. Témam (1999), Convex Analysis and Variational Problems, vol. 28, Classics in Applied Mathematics, Society for Industrial and Applied Mathematics, Philadelphia, PA, doi: 10.1137/1.9781611971088.
E. Esser, X. Zhang & T. F. Chan (2010), A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science, SIAM Journal on Imaging Sciences 3(4), 1015–1046, doi: 10.1137/09076934x.
M. Fabian (1988), On classes of subdifferentiability spaces of Ioffe, Nonlinear Analysis: Theory, Methods & Applications 12(1), 63–74, doi: 10.1016/0362-546x(88)90013-2.
M. Fabian, P. Habala, P. Hájek, V. M. Santalucía, J. Pelant & V. Zizler (2001), Differentiability of norms, in: Functional Analysis and Infinite-Dimensional Geometry, Springer, New York, NY, 241–284, doi: 10.1007/978-1-4757-3480-5_8.
F. Facchinei & J.-S. Pang (2003a), Finite-Dimensional Variational Inequalities and Complementarity Problems. Vol. I, Springer Series in Operations Research, Springer-Verlag, New York, doi: 10.1007/b97543.
F. Facchinei & J.-S. Pang (2003b), Finite-Dimensional Variational Inequalities and Complementarity Problems. Vol. II, Springer Series in Operations Research, Springer-Verlag, New York, doi: 10.1007/b97544.
L. Fejér (1922), Über die Lage der Nullstellen von Polynomen, die aus Minimumforderungen gewisser Art entspringen, Mathematische Annalen 85(1), 41–48, doi: 10.1007/bf01449600.
I. Fonseca & G. Leoni (2007), Modern Methods in the Calculus of Variations: Lᵖ Spaces, Springer, doi: 10.1007/978-0-387-69006-3.
D. Gabay (1983), Applications of the method of multipliers to variational inequalities, in: Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems, ed. by M. Fortin & R. Glowinski, vol. 15, Studies in Mathematics and its Applications, North-Holland, Amsterdam, 299–331.
H. Gfrerer & J. Outrata (2019), On a semismooth* Newton method for solving generalized equations, Preprint, arxiv: 1904.09167.
H. Gfrerer & J. Outrata (2016), On Lipschitzian properties of implicit multifunctions,SIAM Journal on Optimization 26(4), 2160โ2189, doi: 10.1137/15m1052299.
H. Gfrerer (2011), First order and second order characterizations of metric subregularityand calmness of constraint set mappings, SIAM Journal on Optimization 21(4), 1439โ1474,doi: 10.1137/100813415.
H. Gfrerer (2013), On directional metric regularity, subregularity and optimality conditionsfor nonsmooth mathematical programs, Set-Valued and Variational Analysis 21(2), 151โ176,doi: 10.1007/s11228-012-0220-5.
R. Gribonval & M. Nikolova (2020), A characterization of proximity operators, doi: 10.1007/s10851-020-00951-y, arxiv: 1807.04014.
D. Han & X. Yuan (2013), Local linear convergence of the Alternating Direction Method of Multipliers for quadratic programs, SIAM Journal on Numerical Analysis 51(6), 3446–3457, doi: 10.1137/120886753.
F. Harder & G. Wachsmuth (2018), The limiting normal cone of a complementarity set in Sobolev spaces, Optimization 67(5), 1579–1603, doi: 10.1080/02331934.2018.1484467.
B. He & X. Yuan (2012), Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective, SIAM Journal on Imaging Sciences 5(1), 119–149, doi: 10.1137/100814494.
J. Heinonen (2005), Lectures on Lipschitz analysis, vol. 100, Rep. Univ. Jyväskylä Dept. Math. Stat., University of Jyväskylä, url: http://www.math.jyu.fi/research/reports/rep100.pdf.
R. Henrion, A. Jourani & J. Outrata (2002), On the calmness of a class of multifunctions, SIAM Journal on Optimization 13(2), 603–618, doi: 10.1137/s1052623401395553.
M. Hintermüller, K. Ito & K. Kunisch (2002), The primal-dual active set strategy as a semismooth Newton method, SIAM Journal on Optimization 13(3), 865–888 (2003), doi: 10.1137/s1052623401383558.
M. Hintermüller & K. Kunisch (2004), Total bounded variation regularization as a bilaterally constrained optimization problem, SIAM Journal on Applied Mathematics 64(4), 1311–1333, doi: 10.1137/s0036139903422784.
J.-B. Hiriart-Urruty & C. Lemaréchal (1993a), Convex Analysis and Minimization Algorithms I, Fundamentals, vol. 305, Grundlehren der Mathematischen Wissenschaften, Springer-Verlag, Berlin, doi: 10.1007/978-3-662-02796-7.
J.-B. Hiriart-Urruty & C. Lemaréchal (1993b), Convex Analysis and Minimization Algorithms II, Advanced Theory and Bundle Methods, vol. 306, Grundlehren der Mathematischen Wissenschaften, Springer-Verlag, Berlin, doi: 10.1007/978-3-662-06409-2.
J.-B. Hiriart-Urruty & C. Lemarรฉchal (2001), Fundamentals of Convex Analysis, Springer-Verlag, Berlin, doi: 10.1007/978-3-642-56468-0.
T. Hohage & C. Homann (2014), A generalization of the Chambolle–Pock algorithm to Banach spaces with applications to inverse problems, Preprint, arxiv: 1412.0126.
A. D. Ioffe (1984), Approximate subdifferentials and applications. I. The finite-dimensional theory, Trans. Amer. Math. Soc. 281(1), 389–416, doi: 10.2307/1999541.
A. Ioffe (2017), Variational Analysis of Regular Mappings: Theory and Applications, SpringerMonographs in Mathematics, Springer International Publishing, doi: 10.1007/978-3-319-64277-2.
bibliography
A. D. Ioffe (1979), Regular points of Lipschitz functions, Transactions of the American Mathematical Society 251, 61–69, doi: 10.1090/s0002-9947-1979-0531969-6.
K. Ito & K. Kunisch (2008), Lagrange Multiplier Approach to Variational Problems and Applications, vol. 15, Advances in Design and Control, Society for Industrial and Applied Mathematics, Philadelphia, PA, doi: 10.1137/1.9780898718614.
D. Klatte & B. Kummer (2002), Nonsmooth Equations in Optimization, vol. 60, Nonconvex Optimization and its Applications, Regularity, calculus, methods and applications, Kluwer Academic Publishers, Dordrecht, doi: 10.1007/b130810.
M. Kojima & S. Shindo (1986), Extension of Newton and quasi-Newton methods to systems of PC1 equations, J. Oper. Res. Soc. Japan 29(4), 352–375, doi: 10.15807/jorsj.29.352.
M. A. Krasnosel'skii (1955), Two remarks on the method of successive approximations, Akademiya Nauk SSSR i Moskovskoe Matematicheskoe Obshchestvo. Uspekhi Matematicheskikh Nauk 10(1(63)), 123–127.
A. Y. Kruger (2015), Error bounds and metric subregularity, Optimization 64(1), 49–79, doi: 10.1080/02331934.2014.938074.
F. Kruse (2018), Semismooth implicit functions, Journal of Convex Analysis 28(2), 595–622, url: https://imsc.uni-graz.at/mannel/SIF_Kruse.pdf.
B. Kummer (1988), Newton's method for non-differentiable functions, Mathematical Research 45, 114–125.
B. Kummer (2000), Generalized Newton and NCP-methods: convergence, regularity, actions, Discuss. Math. Differ. Incl. Control Optim. 20(2), 209–244, doi: 10.7151/dmdico.1013.
D. Leventhal (2009), Metric subregularity and the proximal point method, Journal of Mathematical Analysis and Applications 360(2), 681–688, doi: 10.1016/j.jmaa.2009.07.012.
G. Li & B. S. Mordukhovich (2012), Hölder metric subregularity with applications to proximal point method, SIAM Journal on Optimization 22(4), 1655–1684, doi: 10.1137/120864660.
P. Lions & B. Mercier (1979), Splitting algorithms for the sum of two nonlinear operators, SIAM Journal on Numerical Analysis 16(6), 964–979, doi: 10.1137/0716071.
Y. Liu, X. Yuan, S. Zeng & J. Zhang (2018), Partial error bound conditions and the linear convergence rate of the Alternating Direction Method of Multipliers, SIAM Journal on Numerical Analysis 56(4), 2095–2123, doi: 10.1137/17m1144623.
I. Loris & C. Verhoeven (2011), On a generalization of the iterative soft-thresholding algorithm for the case of non-separable penalty, Inverse Problems 27(12), 125007, doi: 10.1088/0266-5611/27/12/125007.
Z.-Q. Luo & P. Tseng (1992), Error bound and convergence analysis of matrix splitting algorithms for the affine variational inequality problem, SIAM Journal on Optimization 2(1), 43–54, doi: 10.1137/0802004.
M. M. Mäkelä & P. Neittaanmäki (1992), Nonsmooth Optimization, Analysis and algorithms with applications to optimal control, World Scientific Publishing Co., Inc., River Edge, NJ, doi: 10.1142/1493.
Y. Malitsky & T. Pock (2018), A first-order primal-dual algorithm with linesearch, SIAM Journal on Optimization 28(1), 411–432, doi: 10.1137/16m1092015.
W. R. Mann (1953), Mean value methods in iteration, Proc. Amer. Math. Soc. 4, 506–510, doi: 10.2307/2032162.
B. Martinet (1970), Régularisation d'inéquations variationnelles par approximations successives, Rev. Française Informat. Recherche Opérationnelle 4(Sér. R-3), 154–158.
S. Mazurenko, J. Jauhiainen & T. Valkonen (2020), Primal-dual block-proximal splitting for a class of non-convex problems, Electronic Transactions on Numerical Analysis 52, 509–552, doi: 10.1553/etna_vol52s509, arxiv: 1911.06284.
P. Mehlitz & G. Wachsmuth (2018), The limiting normal cone to pointwise defined sets in Lebesgue spaces, Set-Valued and Variational Analysis 26(3), 449–467, doi: 10.1007/s11228-016-0393-4.
P. Mehlitz & G. Wachsmuth (2019), The weak sequential closure of decomposable sets in Lebesgue spaces and its application to variational geometry, Set-Valued and Variational Analysis 27(1), 265–294, doi: 10.1007/s11228-017-0464-1.
R. Mifflin (1977), Semismooth and semiconvex functions in constrained optimization, SIAM Journal on Control and Optimization 15(6), 959–972, doi: 10.1137/0315061.
T. Möllenhoff, E. Strekalovskiy, M. Moeller & D. Cremers (2015), The primal-dual hybrid gradient method for semiconvex splittings, SIAM Journal on Imaging Sciences 8(2), 827–857, doi: 10.1137/140976601.
B. Š. Morduhovič (1980), Metric approximations and necessary conditions for optimality for general classes of nonsmooth extremal problems, Dokl. Akad. Nauk SSSR 254(5), 1072–1076.
B. S. Mordukhovich (1994), Generalized differential calculus for nonsmooth and set-valued mappings, Journal of Mathematical Analysis and Applications 183(1), 250–288, doi: 10.1006/jmaa.1994.1144.
B. S. Mordukhovich (2006), Variational Analysis and Generalized Differentiation I, Basic Theory, vol. 330, Grundlehren der mathematischen Wissenschaften, Springer, doi: 10.1007/3-540-31247-1.
B. S. Mordukhovich (2018), Variational Analysis and Applications, Springer Monographs in Mathematics, Springer International Publishing, doi: 10.1007/978-3-319-92775-6.
B. S. Mordukhovich (1976), Maximum principle in the problem of time optimal response with nonsmooth constraints, Journal of Applied Mathematics and Mechanics 40(6), 960–969, doi: 10.1016/0021-8928(76)90136-2.
J.-J. Moreau (1965), Proximité et dualité dans un espace hilbertien, Bulletin de la Société mathématique de France 93, 273–299.
T. S. Motzkin & I. J. Schoenberg (1954), The relaxation method for linear inequalities, Canadian Journal of Mathematics 6, 393–404, doi: 10.4153/cjm-1954-038-x.
Y. E. Nesterov (1983), A method for solving the convex programming problem with convergence rate O(1/k²), Soviet Math. Doklad. 27(2), 372–376.
Y. Nesterov (2004), Introductory Lectures on Convex Optimization, vol. 87, Applied Optimization, Kluwer Academic Publishers, Boston, MA, doi: 10.1007/978-1-4419-8853-9.
H. V. Ngai & M. Théra (2008), Error bounds in metric spaces and application to the perturbation stability of metric regularity, SIAM Journal on Optimization 19(1), 1–20, doi: 10.1137/060675721.
P. Ochs & T. Pock (2019), Adaptive FISTA for non-convex optimization, SIAM Journal on Optimization 29(4), 2482–2503, doi: 10.1137/17m1156678, arxiv: 1711.04343.
Z. Opial (1967), Weak convergence of the sequence of successive approximations for nonexpansive mappings, Bulletin of the American Mathematical Society 73(4), 591–597, doi: 10.1090/s0002-9904-1967-11761-0.
J. Outrata, M. Kočvara & J. Zowe (1998), Nonsmooth Approach to Optimization Problems with Equilibrium Constraints, vol. 28, Nonconvex Optimization and its Applications, Theory, applications and numerical results, Kluwer Academic Publishers, Dordrecht, doi: 10.1007/978-1-4757-2825-5.
N. Parikh & S. Boyd (2014), Proximal algorithms, Foundations and Trends in Optimization 1(3), 123–231, doi: 10.1561/2400000003.
P. Patrinos, L. Stella & A. Bemporad (2014), Douglas–Rachford splitting: complexity estimates and accelerated variants, in: 53rd IEEE Conference on Decision and Control, 4234–4239, doi: 10.1109/cdc.2014.7040049.
G. Peano (1908), Formulario Mathematico, Fratelli Bocca, Torino.
J.-P. Penot (2013), Calculus Without Derivatives, vol. 266, Graduate Texts in Mathematics, Springer, New York, doi: 10.1007/978-1-4614-4538-8.
W. V. Petryshyn (1966), Construction of fixed points of demicompact mappings in Hilbert space, Journal of Mathematical Analysis and Applications 14(2), 276–284, doi: 10.1016/0022-247x(66)90027-8.
T. Pock, D. Cremers, H. Bischof & A. Chambolle (2009), An algorithm for minimizing the Mumford–Shah functional, in: 12th IEEE Conference on Computer Vision, 1133–1140, doi: 10.1109/iccv.2009.5459348.
R. Poliquin & R. Rockafellar (1996), Prox-regular functions in variational analysis, Transactions of the American Mathematical Society 348(5), 1805–1838, doi: 10.1090/s0002-9947-96-01544-9.
L. Qi (1993), Convergence analysis of some algorithms for solving nonsmooth equations, Math. Oper. Res. 18(1), 227–244, doi: 10.1287/moor.18.1.227.
L. Qi & J. Sun (1993), A nonsmooth version of Newton's method, Mathematical Programming 58(3, Ser. A), 353–367, doi: 10.1007/bf01581275.
M. Renardy & R. C. Rogers (2004), An Introduction to Partial Differential Equations, 2nd ed., vol. 13, Texts in Applied Mathematics, Springer-Verlag, New York, doi: 10.1007/b97427.
S. M. Robinson (1981), Some continuity properties of polyhedral multifunctions, in: Mathematical Programming at Oberwolfach, ed. by H. König, B. Korte & K. Ritter, Springer Berlin Heidelberg, Berlin, Heidelberg, 206–214, doi: 10.1007/bfb0120929.
R. T. Rockafellar (1988), First- and second-order epi-differentiability in nonlinear programming, Transactions of the American Mathematical Society 307(1), 75–108, doi: 10.1090/s0002-9947-1988-0936806-9.
R. T. Rockafellar (1970), On the maximal monotonicity of subdifferential mappings, Pacific J. Math. 33, 209–216, doi: 10.2140/pjm.1970.33.209.
R. T. Rockafellar (1976a), Integral functionals, normal integrands and measurable selections, in: Nonlinear Operators and the Calculus of Variations (Summer School, Univ. Libre Bruxelles, Brussels, 1975), vol. 543, Lecture Notes in Math., Springer, Berlin, 157–207, doi: 10.1007/bfb0079944.
R. T. Rockafellar (1976b), Monotone operators and the proximal point algorithm, SIAM Journal on Control and Optimization 14(5), 877–898, doi: 10.1137/0314056.
R. T. Rockafellar (1981), Favorable classes of Lipschitz continuous functions in subgradient optimization, in: Progress in Nondifferentiable Optimization, IIASA, Laxenburg, Austria, 125–143, url: http://pure.iiasa.ac.at/id/eprint/1760/.
R. T. Rockafellar & R. J.-B. Wets (1998), Variational Analysis, vol. 317, Grundlehren der mathematischen Wissenschaften, Springer, doi: 10.1007/978-3-642-02431-3.
R. Rockafellar (1985), Maximal monotone relations and the second derivatives of nonsmooth functions, Annales de l'Institut Henri Poincare (C) Non Linear Analysis 2(3), 167–184, doi: 10.1016/s0294-1449(16)30401-2.
R. Rockafellar (1989), Proto-Differentiability of Set-Valued Mappings and its Applications in Optimization, Annales de l'Institut Henri Poincare (C) Non Linear Analysis 6, 449–482, doi: 10.1016/s0294-1449(17)30034-3.
W. Rudin (1991), Functional Analysis, 2nd ed., McGraw-Hill, New York.
A. Ruszczynski (2006), Nonlinear Optimization, Princeton University Press, Princeton, NJ, doi: 10.2307/j.ctvcm4hcj.
B. P. Rynne & M. A. Youngson (2008), Linear Functional Analysis, 2nd ed., Springer Undergraduate Mathematics Series, Springer, London, doi: 10.1007/978-1-84800-005-6.
H. Schaefer (1957), Über die Methode sukzessiver Approximationen, Jahresbericht der Deutschen Mathematiker-Vereinigung 59, 131–140, url: http://eudml.org/doc/146424.
A. Schiela (2008), A simplified approach to semismooth Newton methods in function space, SIAM Journal on Optimization 19(3), 1417–1432, doi: 10.1137/060674375.
W. Schirotzek (2007), Nonsmooth Analysis, Universitext, Springer, Berlin, doi: 10.1007/978-3-540-71333-3.
S. Scholtes (2012), Introduction to Piecewise Differentiable Equations, Springer Briefs in Optimization, Springer, New York, doi: 10.1007/978-1-4614-4340-7.
S. Simons (2009), A new proof of the maximal monotonicity of subdifferentials, Journal of Convex Analysis 16(1), 165–168.
L. Thibault & D. Zagrodny (1995), Integration of subdifferentials of lower semicontinuous functions on Banach spaces, Journal of Mathematical Analysis and Applications 189(1), 33–58, doi: 10.1006/jmaa.1995.1003.
L. Thibault (1983), Tangent cones and quasi-interiorly tangent cones to multifunctions, Trans. Amer. Math. Soc. 277(2), 601–621, doi: 10.2307/1999227.
M. Ulbrich (2002), Semismooth Newton methods for operator equations in function spaces, SIAM Journal on Optimization 13(3), 805–842 (2003), doi: 10.1137/s1052623400371569.
M. Ulbrich (2011), Semismooth Newton Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces, vol. 11, MOS-SIAM Series on Optimization, Society for Industrial and Applied Mathematics, Philadelphia, PA, doi: 10.1137/1.9781611970692.
T. Valkonen (2014), A primal-dual hybrid gradient method for nonlinear operators with applications to MRI, Inverse Problems 30(5), 055012, doi: 10.1088/0266-5611/30/5/055012.
T. Valkonen (2019), Block-proximal methods with spatially adapted acceleration, Electronic Transactions on Numerical Analysis 51, 15–49, doi: 10.1553/etna_vol51s15, arxiv: 1609.07373, url: http://tuomov.iki.fi/m/blockcp.pdf.
T. Valkonen (2020a), First-order primal-dual methods for nonsmooth nonconvex optimisation, in: Handbook of Mathematical Models and Algorithms in Computer Vision and Imaging, ed. by K. Chen, C.-B. Schönlieb, X.-C. Tai & L. Younes, accepted, Springer, arxiv: 1910.00115.
T. Valkonen (2020b), Inertial, corrected, primal-dual proximal splitting, SIAM Journal on Optimization 30(2), 1391–1420, doi: 10.1137/18m1182851, arxiv: 1804.08736.
T. Valkonen (2020c), Testing and non-linear preconditioning of the proximal point method, Applied Mathematics & Optimization 82(2), 591–636, doi: 10.1007/s00245-018-9541-6, arxiv: 1703.05705, url: http://tuomov.iki.fi/m/proxtest.pdf.
T. Valkonen (2021), Preconditioned proximal point methods and notions of partial subregularity, Journal of Convex Analysis 28(1), in press, arxiv: 1711.05123, url: http://tuomov.iki.fi/m/proxtest.pdf.
B. C. Vu (2013), A splitting algorithm for dual monotone inclusions involving cocoercive operators, Advances in Computational Mathematics 38(3), 667–681, doi: 10.1007/s10444-011-9254-8.
S. J. Wright (2015), Coordinate descent algorithms, Mathematical Programming 151(1), 3–34, doi: 10.1007/s10107-015-0892-3.
S. J. Wright, R. D. Nowak & M. A. T. Figueiredo (2009), Sparse reconstruction by separable approximation, IEEE Transactions on Signal Processing 57(7), 2479–2493, doi: 10.1109/tsp.2009.2016892.
D. Yost (1993), Asplund spaces for beginners, Acta Universitatis Carolinae. Mathematica et Physica 34(2), 159–177, url: https://dml.cz/dmlcz/702006.
X. Zhang, M. Burger & S. Osher (2011), A unified primal-dual algorithm framework based on Bregman iteration, Journal of Scientific Computing 46(1), 20–46, doi: 10.1007/s10915-010-9408-8.
X. Y. Zheng & K. F. Ng (2010), Metric subregularity and calmness for nonconvex generalized equations in Banach spaces, SIAM Journal on Optimization 20(5), 2119–2136, doi: 10.1137/090772174.
Z. Zhou & A. M.-C. So (2017), A unified approach to error bounds for structured convex optimization problems, Mathematical Programming 165(2), 689–728, doi: 10.1007/s10107-016-1100-9.
M. Zhu & T. Chan (2008), An efficient primal-dual hybrid gradient algorithm for total variation image restoration, CAM Report 08-34, UCLA, url: ftp://ftp.math.ucla.edu/pub/camreport/cam08-34.pdf.