nonsmooth analysis and optimization - iki.fi1.2 dual spaces, separation, and weak convergence6 1.3...

Post on 26-Mar-2021

6 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

INTRODUCTION TO NONSMOOTH

ANALYSIS AND OPTIMIZATION

Christian Clason

christian.clason@uni-due.de

hps://udue.de/clason

orcid: 0000-0002-9948-8426

Tuomo Valkonen

tuomov@iki.fi

hps://tuomov.iki.fi

orcid: 0000-0001-6683-3572

2020-12-31

arxiv: 2001.00216v3

๐œ•๐‘“ (๐‘ข)๐‘“ (๐‘ข) + ใ€ˆ0, ๐‘ขใ€‰

๐‘“ (๐‘ข)

๐‘“ (๐‘ข) + ใ€ˆ๐‘“ โ€ฒ(๐‘ข;โˆ’1), ๐‘ขใ€‰

๐‘ข

CONTENTS

preface v

I BACKGROUND

1 functional analysis 2

1.1 Normed vector spaces 2

1.2 Dual spaces, separation, and weak convergence 6

1.3 Hilbert spaces 12

2 calculus of variations 15

2.1 The direct method 15

2.2 Dierential calculus in Banach spaces 20

2.3 Superposition operators 24

2.4 Variational principles 26

II CONVEX ANALYSIS

3 convex functions 32

3.1 Basic properties 32

3.2 Existence of minimizers 37

3.3 Continuity properties 39

4 convex subdifferentials 42

4.1 Definition and basic properties 42

4.2 Fundamental examples 45

4.3 Calculus rules 50

5 fenchel duality 57

5.1 Fenchel conjugates 57

5.2 Duality of optimization problems 63

i

contents

6 monotone operators and proximal points 68

6.1 Basic properties of set-valued mappings 68

6.2 Monotone operators 71

6.3 Resolvents and proximal points 78

7 smoothness and convexity 86

7.1 Smoothness 86

7.2 Strong convexity 89

7.3 Moreauโ€“Yosida regularization 93

8 proximal point and splitting methods 101

8.1 Proximal point method 101

8.2 Explicit spliing 102

8.3 Implicit spliing 103

8.4 Primal-dual proximal spliing 105

8.5 Primal-dual explicit spliing 107

8.6 The Augmented Lagrangian and alternating direction minimization 109

8.7 Connections 111

9 weak convergence 117

9.1 Opialโ€™s lemma and Fejรฉr monotonicity 117

9.2 The fundamental methods: proximal point and explicit spliing 119

9.3 Preconditioned proximal point methods: DRS and PDPS 122

9.4 Preconditioned explicit spliing methods: PDES and more 127

9.5 Fixed-point theorems 131

10 rates of convergence by testing 134

10.1 The fundamental methods 135

10.2 Structured algorithms and acceleration 138

11 gaps and ergodic results 145

11.1 Gap functionals 145

11.2 Convergence of function values 148

11.3 Ergodic gap estimates 152

11.4 The testing approach in its general form 156

11.5 Ergodic gaps for accelerated primal-dual methods 157

11.6 Convergence of the ADMM 164

12 meta-algorithms 167

12.1 Over-relaxation 167

12.2 Inertia 172

12.3 Line search 177

ii

contents

III NONCONVEX ANALYSIS

13 clarke subdifferentials 181

13.1 Definition and basic properties 181

13.2 Fundamental examples 184

13.3 Calculus rules 188

13.4 Characterization in finite dimensions 197

14 semismooth newton methods 200

14.1 Convergence of generalized Newton methods 200

14.2 Newton derivatives 202

15 nonlinear primal-dual proximal splitting 212

15.1 Nonconvex explicit spliing 212

15.2 NL-PDPS formulation and assumptions 215

15.3 Convergence proof 216

16 limiting subdifferentials 222

16.1 Bouligand subdierentials 222

16.2 Frรฉchet subdierentials 223

16.3 Mordukhovich subdierentials 225

17 Y-subdifferentials and approximate fermat principles 229

17.1 Y-subdierentials 229

17.2 Smooth spaces 231

17.3 Fuzzy Fermat principles 233

17.4 Approximate Fermat principles and projections 237

IV SET-VALUED ANALYSIS

18 tangent and normal cones 240

18.1 Definitions and examples 240

18.2 Basic relationships and properties 245

18.3 Polarity and limiting relationships 248

18.4 Regularity 257

19 tangent and normal cones of pointwise-defined sets 260

19.1 Derivability 260

19.2 Tangent and normal cones 262

20 derivatives and coderivatives of set-valued mappings 270

20.1 Definitions 270

20.2 Basic properties 273

20.3 Examples 277

20.4 Relation to subdierentials 286

iii

contents

21 derivatives and coderivatives of pointwise-defined mappings 289

21.1 Proto-dierentiability 289

21.2 Graphical derivatives and coderivatives 291

22 calculus for the graphical derivative 295

22.1 Semi-dierentiability 295

22.2 Cone transformation formulas 297

22.3 Calculus rules 299

23 calculus for the frรฉchet coderivative 306

23.1 Semi-codierentiability 306

23.2 Cone transformation formulas 307

23.3 Calculus rules 309

24 calculus for the clarke graphical derivative 314

24.1 Strict dierentiability 314

24.2 Cone transformation formulas 316

24.3 Calculus rules 318

25 calculus for the limiting coderivative 321

25.1 Strict codierentiability 321

25.2 Partial sequential normal compactness 322

25.3 Cone transformation formulas 325

25.4 Calculus rules 328

26 second-order optimality conditions 331

26.1 Second-order derivatives 331

26.2 Subconvexity 334

26.3 Suicient and necessary conditions 336

27 lipschitz-like properties and stability 342

27.1 Lipschitz-like properties of set-valued mappings 342

27.2 Neighborhood-based coderivative criteria 348

27.3 Point-based coderivative criteria 353

27.4 Stability with respect to perturbations 359

28 faster convergence from regularity 364

28.1 Subregularity and submonotonicity of subdierentials 364

28.2 Local linear convergence of explicit spliing 369

iv

PREFACE

Optimization is concerned with nding solutions to problems of the form

min๐‘ฅโˆˆ๐‘ˆ

๐น (๐‘ฅ)

for a function ๐น : ๐‘‹ โ†’ โ„ and a set ๐‘ˆ โŠ‚ ๐‘‹ . Specically, one considers the followingquestions:

1. Does this problem admit a solution, i.e., is there an ๐‘ฅ โˆˆ ๐‘ˆ such that

๐น (๐‘ฅ) โ‰ค ๐น (๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘ˆ ?

2. Is there an intrinsic characterization of ๐‘ฅ , i.e., one not requiring comparison with allother ๐‘ฅ โˆˆ ๐‘ˆ ?

3. How can this ๐‘ฅ be computed (eciently)?

4. Is ๐‘ฅ stable, e.g., with respect to computational errors?

For๐‘ˆ โŠ‚ โ„๐‘ , these questions can be answered in turn roughly as follows:

1. If๐‘ˆ is compact and ๐น is continuous, the WeierstraรŸ Theorem yields that ๐น attains itsminimum at a point ๐‘ฅ โˆˆ ๐‘ˆ (as well as its maximum).

2. If ๐น is dierentiable and๐‘ˆ is open, the Fermat principle

0 = ๐น โ€ฒ(๐‘ฅ)

holds.

3. If ๐น is continuously dierentiable and ๐‘ˆ is open, one can apply the steepest descentor gradient method to compute an ๐‘ฅ satisfying the Fermat principle: Choosing astarting point ๐‘ฅ0 and setting

๐‘ฅ๐‘˜+1 = ๐‘ฅ๐‘˜ โˆ’ ๐‘ก๐‘˜๐น โ€ฒ(๐‘ฅ๐‘˜), ๐‘˜ = 0, . . . ,

for suitable step sizes ๐‘ก๐‘˜ , we have that ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ for ๐‘˜ โ†’ โˆž.

v

preface

If ๐น is even twice continuously dierentiable, one can apply Newtonโ€™s method to theFermat principle: Choosing a suitable starting point ๐‘ฅ0 and setting

๐‘ฅ๐‘˜+1 = ๐‘ฅ๐‘˜ โˆ’ ๐น โ€ฒโ€ฒ(๐‘ฅ๐‘˜)โˆ’1๐น โ€ฒ(๐‘ฅ๐‘˜), ๐‘˜ = 0, . . . ,

we have that ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ for ๐‘˜ โ†’ โˆž.

4. If ๐น is twice continuously dierentiable and ๐น โ€ฒโ€ฒ(๐‘ฅ) is invertible, then the inversefunction theorem yields that (๐น โ€ฒ)โˆ’1 exists locally around ๐‘ฅ and is continuously dier-entiable and hence Lipschitz continuous. If we now have computed an approximatesolution ๐‘ฅ to the Fermat principle with ๐น โ€ฒ(๐‘ฅ) = ๐‘ค โ‰  0, we obtain from this theestimate

โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€– = โ€–(๐น โ€ฒ)โˆ’1(0) โˆ’ (๐น โ€ฒ)โˆ’1(๐‘ค)โ€– โ‰ค โ€–((๐น โ€ฒ)โˆ’1)โ€ฒ(0)โ€–โ€–๐‘ค โ€–.

A small residual in the Fermat principle therefore implies a small error for theminimizer.

However, there are many practically relevant functions that are not dierentiable, suchas the absolute value or maximum function. The aim of nonsmooth analysis is thereforeto nd generalized derivative concepts that on the one hand allow the above sketchedapproach for such functions and on the other hand admit a suciently rich calculus togive explicit derivatives for a suciently large class of functions. Here we concentrate onthe two classes of

i) convex functions,

ii) locally Lipschitz continuous functions,

which together cover a wide spectrum of applications. In particular, the rst class will leadus to generalized gradient methods, while the second class are the basis for generalizedNewton methods. To x ideas, we aim at treating problems of the form

(P) min๐‘ฅโˆˆ๐ถ

1๐‘โ€–๐‘† (๐‘ฅ) โˆ’ ๐‘งโ€–๐‘๐‘Œ + ๐›ผ

๐‘žโ€–๐‘ฅ โ€–๐‘ž๐‘‹

for a convex set ๐ถ โŠ‚ ๐‘‹ , a (possibly nonlinear but dierentiable) operator ๐‘† : ๐‘‹ โ†’ ๐‘Œ ,๐›ผ โ‰ฅ 0 and ๐‘, ๐‘ž โˆˆ [1,โˆž) (in particular, ๐‘ = 1 and/or ๐‘ž = 1). Such problems are ubiquitousin inverse problems, imaging, and optimal control of dierential equations. Hence, weconsider optimization in innite-dimensional function spaces; i.e., we are looking forfunctions as minimizers. The main benet (beyond the frequently cleaner notation) is thatthe developed algorithms become discretization independent: they can be applied to any(reasonable) nite-dimensional approximation, and the details โ€“ in particular, the nenessโ€“ of the approximation do not inuence the convergence behavior of the algorithms. Aspecial role will be played throughout the book by integral functionals and superpositionoperators that act pointwise on functions, since these allow transferring the often moreexplicit nite-dimensional calculus to the innite-dimensional setting.

vi

preface

Nonsmooth analysis and optimization in nite dimensions has a long history; we referhere to the classical textbooks [Mรคkelรค & Neittaanmรคki 1992; Hiriart-Urruty & Lemarรฉchal1993a; Hiriart-Urruty & Lemarรฉchal 1993b; Rockafellar & Wets 1998] as well as the recent[Bagirov, Karmitsa & Mรคkelรค 2014; Beck 2017]. There also exists a large body of literatureon specic nonsmooth optimization problems, in particular ones involving variationalinequalities and equilibrium constraints; see, e.g., [Outrata, Koฤvara & Zowe 1998; Facchinei& Pang 2003a; Facchinei & Pang 2003b]. In contrast, the innite-dimensional setting isstill being actively developed, with monographs and textbooks focusing on either theory[Clarke 1990; Mordukhovich 2006; Schirotzek 2007; Barbu & Precupanu 2012; Penot 2013;Clarke 2013; Ioe 2017; Mordukhovich 2018] or algorithms [Ito & Kunisch 2008; Ulbrich2011] or restricted settings [Bauschke & Combettes 2017]. The aim of this book is thusto draw together results scattered throughout the literature in order to give a uniedpresentation of theory and algorithms โ€“ both rst- and second-order โ€“ in Banach spacesthat is suitable for an advanced class on mathematical optimization. In order to do this, wefocus on optimization of nonsmooth functionals rather than nonsmooth constraints; inparticular, we do not treat optimization with complementarity or equilibrium constraints,which still see signicant active development in innite dimensions. Regarding generalizedderivatives of set-valued mappings required for the mentioned stability results, we similarlydo not aim for a (possibly fuzzy) general theory and instead restrict ourselves to situationswhere one of a zoo of regularity conditions holds that allows deriving exact results thatstill apply to problems of the form (P). The general theory can be found in, e.g., [Aubin &Frankowska 1990; Rockafellar & Wets 1998; Mordukhovich 2018; Mordukhovich 2006] (towhich this book is, among other things, intended as a gentle introduction).

The book is intended for students and researchers with a solid background in analysis andlinear algebra and an interest in the mathematical foundations of nonsmooth optimization.Since we deal with innite-dimensional spaces, some knowledge of functional analysisis assumed, but the necessary background will be summarized in Chapter 1. Similarly,Chapter 2 collects needed fundamental results from the calculus of variations, including thedirect method for existence of minimizers and the related notion of lower semicontinuity aswell as dierential calculus in Banach spaces, where the results on pointwise superpositionoperators on Lebesgue spaces require elementary (Lebesgue) measure and integrationtheory. Basic familiarity with classical nonlinear optimization is helpful but not necessary.

In Part II we then start our study of convex optimization problems. After introducing convexfunctionals and their basic properties in Chapter 3, we dene our rst generalized derivativein Chapter 4: the convex subdierential, which is no longer a single unique derivative butconsists of a set of equally admissible subderivatives. Nevertheless, we obtain a usefulcorresponding Fermat principle as well as calculus rules. A particularly useful calculusrule in convex optimization is Fenchel duality, which assigns to any optimization problema dual problem that can help treating the original primal problem; this is the content ofChapter 5. We change our viewpoint in Chapter 6 slightly to study the subdierential as aset-valued monotone operator, which leads us to the corresponding resolvent or proximal

vii

preface

point mapping, which will later become the basis of all algorithms. The following Chapter 7discusses the relation between convexity and smoothness of primal and dual problem andintroduces the Moreauโ€“Yosida regularization, which has better properties in both regardsthat can be used to accelerate the convergence of algorithms. We turn to these in Chapter 8,where we start by deriving a number of popular rst-order methods including forward-backward splitting and primal-dual proximal splitting (also known as the Chambolleโ€“Pockmethod). Their convergence under rather general assumptions is then shown in Chapter 9.If additional convexity properties hold, we can even show convergence rates for the iteratesusing a general testing approach; this is carried out in Chapter 10. Otherwise we eitherhave to restrict ourselves to more abstract criticality measures as in Chapter 11 or modifythe algorithms to include over-relaxation or inertia as in Chapter 12. One philosophy wehere wish to pass to the reader is that the development of optimization methods consists,rstly, in suitable reformulation of the problem; secondly, in the preconditioning of theraw optimality conditions; and, thirdly, in testing with appropriate operators whether thisyields fast convergence.

We leave the convex world in Part III. For locally Lipschitz continuous functions, weintroduce the Clarke subdierential in Chapter 13 and derive calculus rules. Not only isthis useful for obtaining a Fermat principle for problems of the form (P), it is also the basisfor dening a further generalized derivative that can be used in place of the Hessian in ageneralized Newton method. This Newton derivative and the corresponding semismoothNewton method is studied in Chapter 14. We also derive and analyze a variant of the primal-dual proximal splitting method suitable for (P) in Chapter 15. We end this part with a shortoutlook Chapters 16 and 17 to further subdierential concepts that can lead to sharperoptimality conditions but in general admit a weaker calculus; we will look at some of thesein detail in the next part.

To derive stability properties of minimization problems, we need to study the sensitivity ofsubdierentials to perturbations and hence generalized derivative concepts for set-valuedmappings; this is the goal of Part IV. The construction of the generalized derivatives isgeometric, based on tangent and normal cones introduced in Chapter 18. From these, weobtain Frรฉchet and limiting (co)derivatives in Chapter 20 and derive calculus rules forthem in Chapters 22 to 25. In particular, we show how to lift the (more extensive) nite-dimensional theory to the special case of pointwise-dened sets and mappings operatorson Lebesgue spaces in Chapters 19 and 21. We then address second-order conditions fornonsmooth nonconvex optimization problems in Chapter 26. In Chapter 27, we use thesederivatives to characterize Lipschitz-like properties of set-valued mappings, which thenare used to obtain the desired stability properties. We also show in Chapter 28 that theseregularity properties imply faster convergence of rst-order methods.

This book can serve as a textbook for several dierent classes:

(i) an introductory course on convex optimization based on Chapters 3 to 10 (excludingSection 3.3 and results on superposition operators) and adding Chapters 11, 12 and 15

viii

preface

as time permits;

(ii) an intermediate course on nonsmooth optimization based on Chapters 3 to 9 (includ-ing Section 3.3 and results on superposition operators) together with Chapters 13, 14,16 and 17;

(iii) an intermediate course on nonsmooth analysis based on Chapters 3 to 6 togetherwith Chapter 13 and Chapters 16 to 20, adding Chapters 22 to 21 as time permits;

(iv) an advanced course on set-valued analysis based on Chapters 16 to 28.

This book is based in part on such graduate lectures given by the rst author in 2014 (inslightly dierent form) and 2016โ€“2017 at the University of Duisburg-Essen and by thesecond author at the University of Cambridge in 2015 and Escuela Politรฉcnica Nacionalin Quito in 2020. Shorter seminars were also delivered at the University of Jyvรคskylรค andthe Escuela Politรฉcnica Nacional in 2017. Part IV of the book was also used in a courseon variational analysis at the EPN in 2019. Parts of the book were also taught by bothauthors at the Winter School โ€œModern Methods in Nonsmooth Optimizationโ€ organized byChristian Kanzow and Daniel Wachsmuth at the University Wรผrzburg in February 2018,for which the notes were further adapted and extended. As such, much (but not all) ofthis material is classical. In particular, Chapters 3 to 7 as well as Chapter 13 are based on[Barbu & Precupanu 2012; Brokate 2014; Schirotzek 2007; Attouch, Buttazzo & Michaille2014; Bauschke & Combettes 2017; Clarke 2013], Chapter 14 is based on [Ulbrich 2002; Ito& Kunisch 2008; Schiela 2008], Chapter 16 is extracted from [Mordukhovich 2006], andChapters 18 to 25 are adapted from [Rockafellar & Wets 1998; Mordukhovich 2006]. Partsof Chapter 17 are adapted from [Ioe 2017], and other parts are original work. On the otherhand, Chapters 8 to 12 as well as Chapters 15, 21 and 28 are adapted from [Valkonen 2020c;Valkonen 2021; Clason, Mazurenko & Valkonen 2019], [Clason & Valkonen 2017b], and[Valkonen 2021], respectively.

Finally, we would like to thank Sebastian Angerhausen, Fernando Jimenez Torres, EnsioSuonperรค, Diego Vargas Jaramillo, Daniel Wachsmuth, and in particular Gerd Wachsmuthfor carefully reading parts of the manuscript, nding mistakes and bits that could beexpressed more clearly, and making helpful suggestions. All remaining errors are of courseour own.

Essen and Quito/Helsinki, December 2020

ix

Part I

BACKGROUND

1

1 FUNCTIONAL ANALYSIS

Functional analysis is the study of innite-dimensional vector spaces and of the operatorsacting between them, and has since its foundations in the beginning of the 20th centurygrown into the lingua franca of modern applied mathematics. In this chapter we collectthe basic concepts and results (and, more importantly, x notations) from linear functionalanalysis that will be used throughout the rest of the book. For details and proofs, the readeris referred to the standard literature, e.g., [Alt 2016; Brezis 2010; Rynne & Youngson 2008],or to [Clason 2020].

1.1 normed vector spaces

In the following, ๐‘‹ will denote a vector space over the eld ๐•‚, where we restrict ourselvesfor the sake of simplicity to the case ๐•‚ = โ„. A mapping โ€– ยท โ€– : ๐‘‹ โ†’ โ„+ โ‰” [0,โˆž) is calleda norm (on ๐‘‹ ), if for all ๐‘ฅ โˆˆ ๐‘‹ there holds

(i) โ€–_๐‘ฅ โ€– = |_ |โ€–๐‘ฅ โ€– for all _ โˆˆ ๐•‚,

(ii) โ€–๐‘ฅ + ๐‘ฆ โ€– โ‰ค โ€–๐‘ฅ โ€– + โ€–๐‘ฆ โ€– for all ๐‘ฆ โˆˆ ๐‘‹ ,(iii) โ€–๐‘ฅ โ€– = 0 if and only if ๐‘ฅ = 0 โˆˆ ๐‘‹ .

Example 1.1. (i) The following mappings dene norms on ๐‘‹ = โ„๐‘ :

โ€–๐‘ฅ โ€–๐‘ =(๐‘โˆ‘๐‘–=1

|๐‘ฅ๐‘– |๐‘) 1/๐‘

, 1 โ‰ค ๐‘ < โˆž,

โ€–๐‘ฅ โ€–โˆž = max๐‘–=1,...,๐‘

|๐‘ฅ๐‘– |.

(ii) The following mappings dene norms on ๐‘‹ = โ„“๐‘ (the space of real-valued se-

2

1 functional analysis

quences for which these terms are nite):

โ€–๐‘ฅ โ€–๐‘ =( โˆžโˆ‘๐‘–=1

|๐‘ฅ๐‘– |๐‘) 1/๐‘

, 1 โ‰ค ๐‘ < โˆž,

โ€–๐‘ฅ โ€–โˆž = sup๐‘–=1,...,โˆž

|๐‘ฅ๐‘– |.

(iii) The following mappings dene norms on ๐‘‹ = ๐ฟ๐‘ (ฮฉ) (the space of real-valuedmeasurable functions on the domain ฮฉ โŠ‚ โ„๐‘‘ for which these terms are nite):

โ€–๐‘ขโ€–๐‘ =(โˆซ

ฮฉ|๐‘ข (๐‘ฅ) |๐‘

) 1/๐‘, 1 โ‰ค ๐‘ < โˆž,

โ€–๐‘ขโ€–โˆž = ess sup๐‘ฅโˆˆฮฉ

|๐‘ข (๐‘ฅ) |,

where ess sup stands for the essential supremum; for details on these denitions,see, e.g., [Alt 2016].

(iv) The following mapping denes a norm on ๐‘‹ = ๐ถ (ฮฉ) (the space of continuousfunctions on ฮฉ):

โ€–๐‘ขโ€–๐ถ = sup๐‘ฅโˆˆฮฉ

|๐‘ข (๐‘ฅ) |.

An analogous norm is dened on ๐‘‹ = ๐ถ0(ฮฉ) (the space of continuous functionson ฮฉ with compact support), if the supremum is taken only over the space ofcontinuous functions on ฮฉ with compact support), if the supremum is taken onlyover ๐‘ฅ โˆˆ ฮฉ.

If โ€– ยท โ€– is a norm on ๐‘‹ , the tuple (๐‘‹, โ€– ยท โ€–) is called a normed vector space, and one frequentlydenotes this by writing โ€– ยท โ€–๐‘‹ . If the norm is canonical (as in Example 1.1 (ii)โ€“(iv)), it is oftenomitted, and one speaks simply of โ€œthe normed vector space ๐‘‹ โ€.

Two norms โ€– ยท โ€–1, โ€– ยท โ€–2 are called equivalent on ๐‘‹ , if there are constants ๐‘1, ๐‘2 > 0 suchthat

๐‘1โ€–๐‘ฅ โ€–2 โ‰ค โ€–๐‘ฅ โ€–1 โ‰ค ๐‘2โ€–๐‘ฅ โ€–2 for all ๐‘ฅ โˆˆ ๐‘‹ .If ๐‘‹ is nite-dimensional, all norms on ๐‘‹ are equivalent. However, the corresponding con-stants ๐‘1 and ๐‘2 may depend on the dimension ๐‘ of ๐‘‹ ; avoiding such dimension-dependentconstants is one of the main reasons to consider optimization in innite-dimensionalspaces.

If (๐‘‹, โ€– ยท โ€–๐‘‹ ) and (๐‘Œ, โ€– ยท โ€–๐‘Œ ) are normed vector spaces with ๐‘‹ โŠ‚ ๐‘Œ , we call ๐‘‹ continuouslyembedded in ๐‘Œ , denoted by ๐‘‹ โ†ฉโ†’ ๐‘Œ , if there exists a ๐ถ > 0 with

โ€–๐‘ฅ โ€–๐‘Œ โ‰ค ๐ถ โ€–๐‘ฅ โ€–๐‘‹ for all ๐‘ฅ โˆˆ ๐‘‹ .

3

1 functional analysis

A norm directly induces a notion of convergence, the so-called strong convergence. Asequence {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘‹ converges (strongly in ๐‘‹ ) to a ๐‘ฅ โˆˆ ๐‘‹ , denoted by ๐‘ฅ๐‘› โ†’ ๐‘ฅ , if

lim๐‘›โ†’โˆž โ€–๐‘ฅ๐‘› โˆ’ ๐‘ฅ โ€–๐‘‹ = 0.

A set๐‘ˆ โŠ‚ ๐‘‹ is called

โ€ข closed, if for every convergent sequence {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘ˆ the limit ๐‘ฅ โˆˆ ๐‘‹ is an elementof๐‘ˆ as well;

โ€ข compact, if every sequence {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘ˆ contains a convergent subsequence {๐‘ฅ๐‘›๐‘˜ }๐‘˜โˆˆโ„•with limit ๐‘ฅ โˆˆ ๐‘ˆ .

A mapping ๐น : ๐‘‹ โ†’ ๐‘Œ is continuous if and only if ๐‘ฅ๐‘› โ†’ ๐‘ฅ implies ๐น (๐‘ฅ๐‘›) โ†’ ๐น (๐‘ฅ). If ๐‘ฅ๐‘› โ†’ ๐‘ฅand ๐น (๐‘ฅ๐‘›) โ†’ ๐‘ฆ imply that ๐น (๐‘ฅ) = ๐‘ฆ (i.e., graph ๐น โŠ‚ ๐‘‹ ร— ๐‘Œ is a closed set), we say that ๐นhas closed graph.

Further we dene for later use for ๐‘ฅ โˆˆ ๐‘‹ and ๐‘Ÿ > 0

โ€ข the open ball ๐•†(๐‘ฅ, ๐‘Ÿ ) โ‰” {๐‘ง โˆˆ ๐‘‹ | โ€–๐‘ฅ โˆ’ ๐‘งโ€–๐‘‹ < ๐‘Ÿ } andโ€ข the closed ball ๐”น(๐‘ฅ, ๐‘Ÿ ) โ‰” {๐‘ง โˆˆ ๐‘‹ | โ€–๐‘ฅ โˆ’ ๐‘งโ€–๐‘‹ โ‰ค ๐‘Ÿ }.

The closed ball around 0 โˆˆ ๐‘‹ with radius 1 is also referred to as the unit ball ๐”น๐‘‹ . A set๐‘ˆ โŠ‚ ๐‘‹ is called

โ€ข open, if for all ๐‘ฅ โˆˆ ๐‘ˆ there exists an ๐‘Ÿ > 0 with๐•†(๐‘ฅ, ๐‘Ÿ ) โŠ‚ ๐‘ˆ (i.e., all ๐‘ฅ โˆˆ ๐‘ˆ are interiorpoints of๐‘ˆ );

โ€ข bounded, if it is contained in ๐”น(0, ๐‘Ÿ ) for a ๐‘Ÿ > 0;

โ€ข convex, if for any ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘ˆ and _ โˆˆ [0, 1] also _๐‘ฅ + (1 โˆ’ _)๐‘ฆ โˆˆ ๐‘ˆ .

In normed vector spaces it always holds that the complement of an open set is closed andvice versa (i.e., the closed sets in the sense of topology are exactly the (sequentially) closedset as dened above). The denition of a norm directly implies that both open and closedballs are convex.

For arbitrary ๐‘ˆ , we denote by cl๐‘ˆ the closure of ๐‘ˆ , dened as the smallest closed set thatcontains๐‘ˆ (which coincides with the set of all limit points of convergent sequences in๐‘ˆ );we write int๐‘ˆ for the interior of๐‘ˆ , which is the largest open set contained in๐‘ˆ ; and wewrite bd๐‘ˆ โ‰” cl๐‘ˆ \ int๐‘ˆ for the boundary of๐‘ˆ . Finally, we write co๐‘ˆ for the convex hullof๐‘ˆ , dened as the smallest convex set that contains๐‘ˆ .

A normed vector space ๐‘‹ is called complete if every Cauchy sequence in ๐‘‹ is convergent;in this case,๐‘‹ is called a Banach space. All spaces in Example 1.1 are Banach spaces. Convexsubsets of Banach spaces have the following useful property which derives from the BaireTheorem.

4

1 functional analysis

Lemma 1.2. Let ๐‘‹ be a Banach space and๐‘ˆ โŠ‚ ๐‘‹ be closed and convex. Then

int๐‘ˆ = {๐‘ฅ โˆˆ ๐‘ˆ | for all โ„Ž โˆˆ ๐‘‹ there is a ๐›ฟ > 0 with ๐‘ฅ + ๐‘กโ„Ž โˆˆ ๐‘ˆ for all ๐‘ก โˆˆ [0, ๐›ฟ]} .

The set on the right-hand side is called algebraic interior or core. For this reason, Lemma 1.2is sometimes referred to as the โ€œcore-int Lemmaโ€. Note that the inclusion โ€œโŠ‚โ€ always holdsin normed vector spaces due to the denition of interior points via open balls.

We now consider mappings between normed vector spaces. In the following, let (๐‘‹, โ€– ยท โ€–๐‘‹ )and (๐‘Œ, โ€– ยท โ€–๐‘Œ ) be normed vector spaces,๐‘ˆ โŠ‚ ๐‘‹ , and ๐น : ๐‘ˆ โ†’ ๐‘Œ be a mapping. We denoteby

โ€ข ker ๐น โ‰” {๐‘ฅ โˆˆ ๐‘ˆ | ๐น (๐‘ฅ) = 0} the kernel or null space of ๐น ;โ€ข ran ๐น โ‰” {๐น (๐‘ฅ) โˆˆ ๐‘Œ | ๐‘ฅ โˆˆ ๐‘ˆ } the range of ๐น ;โ€ข graph ๐น โ‰” {(๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘‹ ร— ๐‘Œ | ๐‘ฆ = ๐น (๐‘ฅ)} the graph of ๐น .

We call ๐น : ๐‘ˆ โ†’ ๐‘Œ

โ€ข continuous at ๐‘ฅ โˆˆ ๐‘ˆ , if for all Y > 0 there exists a ๐›ฟ > 0 with

โ€–๐น (๐‘ฅ) โˆ’ ๐น (๐‘ง)โ€–๐‘Œ โ‰ค Y for all ๐‘ง โˆˆ ๐•†(๐‘ฅ, ๐›ฟ) โˆฉ๐‘ˆ ;

โ€ข Lipschitz continuous, if there exists an ๐ฟ > 0 (called Lipschitz constant) with

โ€–๐น (๐‘ฅ1) โˆ’ ๐น (๐‘ฅ2)โ€–๐‘Œ โ‰ค ๐ฟโ€–๐‘ฅ1 โˆ’ ๐‘ฅ2โ€–๐‘‹ for all ๐‘ฅ1, ๐‘ฅ2 โˆˆ ๐‘ˆ .

โ€ข locally Lipschitz continuous at ๐‘ฅ โˆˆ ๐‘ˆ , if there exists a ๐›ฟ > 0 and a ๐ฟ = ๐ฟ(๐‘ฅ, ๐›ฟ) > 0with

โ€–๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ)โ€–๐‘Œ โ‰ค ๐ฟโ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ for all ๐‘ฅ โˆˆ ๐•†(๐‘ฅ, ๐›ฟ) โˆฉ๐‘ˆ ;

โ€ข locally Lipschitz continuous near ๐‘ฅ โˆˆ ๐‘ˆ , if there exists a ๐›ฟ > 0 and a ๐ฟ = ๐ฟ(๐‘ฅ, ๐›ฟ) > 0with

โ€–๐น (๐‘ฅ1) โˆ’ ๐น (๐‘ฅ2)โ€–๐‘Œ โ‰ค ๐ฟโ€–๐‘ฅ1 โˆ’ ๐‘ฅ2โ€–๐‘‹ for all ๐‘ฅ1, ๐‘ฅ2 โˆˆ ๐•†(๐‘ฅ, ๐›ฟ) โˆฉ๐‘ˆ .We will refer to the ๐•†(๐‘ฅ, ๐›ฟ) as the Lipschitz neighborhood of ๐‘ฅ (for ๐น ). If ๐น is locallyLipschitz continuous near every ๐‘ฅ โˆˆ ๐‘ˆ , we call ๐น locally Lipschitz continuous on ๐‘ˆ .

If ๐‘‡ : ๐‘‹ โ†’ ๐‘Œ is linear, continuity is equivalent to the existence of a constant ๐ถ > 0 with

โ€–๐‘‡๐‘ฅ โ€–๐‘Œ โ‰ค ๐ถ โ€–๐‘ฅ โ€–๐‘‹ for all ๐‘ฅ โˆˆ ๐‘‹ .

5

1 functional analysis

For this reason, continuous linear mappings are called bounded; one speaks of a boundedlinear operator. The space ๐•ƒ(๐‘‹ ;๐‘Œ ) of bounded linear operators is itself a normed vectorspace if endowed with the operator norm

โ€–๐‘‡ โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) = sup๐‘ฅโˆˆ๐‘‹\{0}

โ€–๐‘‡๐‘ฅ โ€–๐‘Œโ€–๐‘ฅ โ€–๐‘‹ = sup

โ€–๐‘ฅ โ€–๐‘‹=1โ€–๐‘‡๐‘ฅ โ€–๐‘Œ = sup

โ€–๐‘ฅ โ€–๐‘‹โ‰ค1โ€–๐‘‡๐‘ฅ โ€–๐‘Œ

(which is equal to the smallest possible constant ๐ถ in the denition of continuity). If(๐‘Œ, โ€– ยท โ€–๐‘Œ ) is a Banach space, then so is (๐•ƒ(๐‘‹ ;๐‘Œ ), โ€– ยท โ€–๐•ƒ(๐‘‹ ;๐‘Œ )).Finally, if ๐‘‡ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) is bijective, the inverse ๐‘‡ โˆ’1 : ๐‘Œ โ†’ ๐‘‹ is continuous if and only ifthere exists a ๐‘ > 0 with

๐‘ โ€–๐‘ฅ โ€–๐‘‹ โ‰ค โ€–๐‘‡๐‘ฅ โ€–๐‘Œ for all ๐‘ฅ โˆˆ ๐‘‹ .In this case, โ€–๐‘‡ โˆ’1โ€–๐•ƒ(๐‘Œ ;๐‘‹ ) = ๐‘โˆ’1 for the largest possible choice of ๐‘ .

1.2 dual spaces, separation, and weak convergence

Of particular importance to us is the special case ๐•ƒ(๐‘‹ ;๐‘Œ ) for ๐‘Œ = โ„, the space of boundedlinear functionals on ๐‘‹ . In this case, ๐‘‹ โˆ— โ‰” ๐•ƒ(๐‘‹ ;โ„) is called the dual space (or just dual) of๐‘‹ . For ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— and ๐‘ฅ โˆˆ ๐‘‹ , we set

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰” ๐‘ฅโˆ—(๐‘ฅ) โˆˆ โ„.

This duality pairing indicates that we can also interpret it as ๐‘ฅ acting on ๐‘ฅโˆ—, which willbecome important later. The denition of the operator norm immediately implies that

(1.1) ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ€–๐‘ฅ โ€–๐‘‹ for all ๐‘ฅ โˆˆ ๐‘‹, ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—.

In many cases, the dual of a Banach space can be identied with another known Banachspace.

Example 1.3. (i) (โ„๐‘ , โ€– ยท โ€–๐‘)โˆ— ๏ฟฝ (โ„๐‘ , โ€– ยท โ€–๐‘ž) with ๐‘โˆ’1 +๐‘žโˆ’1 = 1, where we set 0โˆ’1 = โˆžand โˆžโˆ’1 = 0. The duality pairing is given by

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘ =๐‘โˆ‘๐‘–=1

๐‘ฅโˆ—๐‘– ๐‘ฅ๐‘– .

(ii) (โ„“๐‘)โˆ— ๏ฟฝ (โ„“๐‘ž) for 1 < ๐‘ < โˆž. The duality pairing is given by

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘ =โˆžโˆ‘๐‘–=1

๐‘ฅโˆ—๐‘– ๐‘ฅ๐‘– .

Furthermore, (โ„“1)โˆ— = โ„“โˆž, but (โ„“โˆž)โˆ— is not a sequence space.

6

1 functional analysis

(iii) Analogously, ๐ฟ๐‘ (ฮฉ)โˆ— ๏ฟฝ ๐ฟ๐‘ž (ฮฉ) with ๐‘โˆ’1 + ๐‘žโˆ’1 = 1 for 1 < ๐‘ < โˆž. The dualitypairing is given by

ใ€ˆ๐‘ขโˆ—, ๐‘ขใ€‰๐‘ =โˆซฮฉ๐‘ขโˆ—(๐‘ฅ)๐‘ข (๐‘ฅ) ๐‘‘๐‘ฅ.

Furthermore, ๐ฟ1(ฮฉ)โˆ— ๏ฟฝ ๐ฟโˆž(ฮฉ), but ๐ฟโˆž(ฮฉ)โˆ— is not a function space.

(iv) ๐ถ0(ฮฉ)โˆ— ๏ฟฝ M(ฮฉ), the space of Radon measure; it contains among others theLebesguemeasure as well as Diracmeasures ๐›ฟ๐‘ฅ for๐‘ฅ โˆˆ ฮฉ, dened via ๐›ฟ๐‘ฅ (๐‘ข) = ๐‘ข (๐‘ฅ)for ๐‘ข โˆˆ ๐ถ0(ฮฉ). The duality pairing is given by

ใ€ˆ๐‘ขโˆ—, ๐‘ขใ€‰๐ถ =โˆซฮฉ๐‘ข (๐‘ฅ) ๐‘‘๐‘ขโˆ—.

A central result on dual spaces is the Hahnโ€“Banach Theorem, which comes in both analgebraic and a geometric version.

Theorem 1.4 (Hahnโ€“Banach, algebraic). Let ๐‘‹ be a normed vector space and ๐‘ฅ โˆˆ ๐‘‹ \ {0}.Then there exists a ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— with

โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— = 1 and ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ = โ€–๐‘ฅ โ€–๐‘‹ .

Theorem 1.5 (Hahnโ€“Banach, geometric). Let ๐‘‹ be a normed vector space and ๐ด, ๐ต โŠ‚ ๐‘‹ beconvex, nonempty, and disjoint.

(i) If ๐ด is open, there exists an ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— and a _ โˆˆ โ„ with

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ1ใ€‰๐‘‹ < _ โ‰ค ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ2ใ€‰๐‘‹ for all ๐‘ฅ1 โˆˆ ๐ด, ๐‘ฅ2 โˆˆ ๐ต.

(ii) If ๐ด is closed and ๐ต is compact, there exists an ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— and a _ โˆˆ โ„ with

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ1ใ€‰๐‘‹ โ‰ค _ < ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ2ใ€‰๐‘‹ for all ๐‘ฅ1 โˆˆ ๐ด, ๐‘ฅ2 โˆˆ ๐ต.

Particularly the geometric version โ€“ also referred to as separation theorems โ€“ is of crucialimportance in convex analysis. We will also require their following variant, which is knownas Eidelheit Theorem.

Corollary 1.6. Let ๐‘‹ be a normed vector space and ๐ด, ๐ต โŠ‚ ๐‘‹ be convex and nonempty. If theinterior int๐ด of ๐ด is nonempty and disjoint with ๐ต, there exists an ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— \ {0} and a _ โˆˆ โ„

withใ€ˆ๐‘ฅโˆ—, ๐‘ฅ1ใ€‰๐‘‹ โ‰ค _ โ‰ค ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ2ใ€‰๐‘‹ for all ๐‘ฅ1 โˆˆ ๐ด, ๐‘ฅ2 โˆˆ ๐ต.

7

1 functional analysis

Proof. Theorem 1.5 (i) yields the existence of ๐‘ฅโˆ— and _ satisfying the claim for all ๐‘ฅ1 โˆˆ int๐ด;this inequality is even strict, which also implies ๐‘ฅโˆ— โ‰  0. It thus remains to show that therst inequality also holds for the remaining ๐‘ฅ1 โˆˆ ๐ด \ int๐ด. Since int๐ด is nonempty, thereexists an ๐‘ฅ0 โˆˆ int๐ด, i.e., there is an ๐‘Ÿ > 0 with ๐•†(๐‘ฅ0, ๐‘Ÿ ) โŠ‚ ๐ด. The convexity of ๐ด thenimplies that ๐‘ก๐‘ฅ + (1 โˆ’ ๐‘ก)๐‘ฅ โˆˆ ๐ด for all ๐‘ฅ โˆˆ ๐•†(๐‘ฅ0, ๐‘Ÿ ) and ๐‘ก โˆˆ [0, 1]. Hence,

๐‘ก๐•†(๐‘ฅ0, ๐‘Ÿ ) + (1 โˆ’ ๐‘ก)๐‘ฅ = ๐•†(๐‘ก๐‘ฅ0 + (1 โˆ’ ๐‘ก)๐‘ฅ, ๐‘ก๐‘Ÿ ) โŠ‚ ๐ด,

and in particular ๐‘ฅ (๐‘ก) โ‰” ๐‘ก๐‘ฅ0 + (1 โˆ’ ๐‘ก)๐‘ฅ โˆˆ int๐ด for all ๐‘ก โˆˆ (0, 1).We can thus nd a sequence {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ int๐ด (e.g., ๐‘ฅ๐‘› = ๐‘ฅ (๐‘›โˆ’1)) with ๐‘ฅ๐‘› โ†’ ๐‘ฅ . Due to thecontinuity of ๐‘ฅโˆ— โˆˆ ๐‘‹ = ๐•ƒ(๐‘‹ ;โ„) we can thus pass to the limit ๐‘› โ†’ โˆž and obtain

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ = lim๐‘›โ†’โˆžใ€ˆ๐‘ฅ

โˆ—, ๐‘ฅ๐‘›ใ€‰๐‘‹ โ‰ค _. ๏ฟฝ

This can be used to characterize a normed vector space by its dual. For example, a directconsequence of Theorem 1.4 is that the norm on a Banach space can be expressed as anoperator norm.

Corollary 1.7. Let ๐‘‹ be a Banach space. Then for all ๐‘ฅ โˆˆ ๐‘‹ ,

โ€–๐‘ฅ โ€–๐‘‹ = supโ€–๐‘ฅโˆ—โ€–๐‘‹โˆ—โ‰ค1

|ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ |,

and the supremum is attained.

A vector ๐‘ฅ โˆˆ ๐‘‹ can therefore be considered as a linear and, by (1.1), bounded functional on๐‘‹ โˆ—, i.e., as an element of the bidual ๐‘‹ โˆ—โˆ— โ‰” (๐‘‹ โˆ—)โˆ—. The embedding ๐‘‹ โ†ฉโ†’ ๐‘‹ โˆ—โˆ— is realized bythe canonical injection

(1.2) ๐ฝ : ๐‘‹ โ†’ ๐‘‹ โˆ—โˆ—, ใ€ˆ๐ฝ๐‘ฅ, ๐‘ฅโˆ—ใ€‰๐‘‹ โˆ— โ‰” ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ for all ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—.

Clearly, ๐ฝ is linear; Theorem 1.4 furthermore implies that โ€– ๐ฝ๐‘ฅ โ€–๐‘‹ โˆ—โˆ— = โ€–๐‘ฅ โ€–๐‘‹ . If the canonicalinjection is surjective and we can thus identify ๐‘‹ โˆ—โˆ— with ๐‘‹ , the space ๐‘‹ is called reexive.All nite-dimensional spaces are reexive, as are Example 1.1 (ii) and (iii) for 1 < ๐‘ < โˆž;however, โ„“1, โ„“โˆž as well as ๐ฟ1(ฮฉ), ๐ฟโˆž(ฮฉ) and ๐ถ (ฮฉ) are not reexive. In general, a normedvector space is reexive if and only if its dual space is reexive.

The following consequence of the separation Theorem 1.5 will be of crucial importance inPart IV. For a set ๐ด โŠ‚ ๐‘‹ , we dene the polar cone

๐ดโ—ฆ โ‰” {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค 0 for all ๐‘ฅ โˆˆ ๐ด} ,

cf. Figure 1.1. Similarly, we dene for ๐ต โŠ‚ ๐‘‹ โˆ— the prepolar cone

๐ตโ—ฆ โ‰” {๐‘ฅ โˆˆ ๐‘‹ | ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค 0 for all ๐‘ฅโˆ— โˆˆ ๐ต} .

8

1 functional analysis

๐ด

๐ดโ—ฆ0

Figure 1.1: The polar cone ๐ดโ—ฆ is the normal cone at zero to the smallest cone containing ๐ด.

The bipolar cone of ๐ด โŠ‚ ๐‘‹ is then dened as

๐ดโ—ฆโ—ฆ โ‰” (๐ดโ—ฆ)โ—ฆ โŠ‚ ๐‘‹ .

(If ๐‘‹ is reexive, ๐ดโ—ฆโ—ฆ = (๐ดโ—ฆ)โ—ฆ.) For the following statement about polar cones, recall that aset ๐ถ โŠ‚ ๐‘‹ is called a cone if ๐‘ฅ โˆˆ ๐ถ and _ > 0 implies that _๐‘ฅ โˆˆ ๐ถ (such that (pre-, bi-)polarcones are indeed cones).

Theorem 1.8 (bipolar theorem). Let ๐‘‹ be a normed vector space and ๐ด โŠ‚ ๐‘‹ . Then(i) ๐ดโ—ฆ is closed and convex;

(ii) ๐ด โŠ‚ ๐ดโ—ฆโ—ฆ;

(iii) If ๐ด โŠ‚ ๐ต, then ๐ตโ—ฆ โŠ‚ ๐ดโ—ฆ.

(iv) if ๐ถ is a nonempty, closed, and convex cone, then ๐ถ = ๐ถโ—ฆโ—ฆ.

Proof. (i): This follows directly from the denition and the continuity of the duality pairing.

(ii): Let ๐‘ฅ โˆˆ ๐ด be arbitrary. Then by denition of the polar cone, every ๐‘ฅโˆ— โˆˆ ๐ดโ—ฆ satises

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค 0,

i.e., ๐‘ฅ โˆˆ (๐ดโ—ฆ)โ—ฆ = ๐ดโ—ฆโ—ฆ.

(iii): This is immediate from the denition.

(iv): By (ii), we only need to prove๐ถโ—ฆโ—ฆ โŠ‚ ๐ถ which we do by contradiction. Assume thereforethat there exists ๐‘ฅ โˆˆ ๐ถโ—ฆโ—ฆ \ {0} with ๐‘ฅ โˆ‰ ๐ถ . Applying Theorem 1.4 (ii) to the nonempty(due to (ii)) closed, and convex set ๐ถโ—ฆโ—ฆ and the disjoint compact convex set {๐‘ฅ}, we obtain๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— \ {0} and _ โˆˆ โ„ such that

(1.3) ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค _ < ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ for all ๐‘ฅ โˆˆ ๐ถ.

9

1 functional analysis

Since๐ถ is a cone, the rst inequality must also hold for ๐‘ก๐‘ฅ โˆˆ ๐ถ for every ๐‘ก > 0. This impliesthat

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค ๐‘กโˆ’1_ โ†’ 0 for ๐‘ก โ†’ โˆž,i.e., ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค 0 for all ๐‘ฅ โˆˆ ๐ถ must hold, i.e., ๐‘ฅโˆ— โˆˆ ๐ถโ—ฆ. On the other hand, if _ < 0, we obtainby the same argument that

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค ๐‘กโˆ’1_ โ†’ โˆ’โˆž for ๐‘ก โ†’ 0,

which cannot hold. Hence, we can take _ = 0 in (1.3). Together, we obtain from ๐‘ฅ โˆˆ ๐ถโ—ฆโ—ฆ thecontradiction

0 < ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค 0. ๏ฟฝ

The duality pairing induces further notions of convergence.

(i) A sequence {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘‹ converges weakly (in ๐‘‹ ) to ๐‘ฅ โˆˆ ๐‘‹ , denoted by ๐‘ฅ๐‘› โ‡€ ๐‘ฅ , if

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘›ใ€‰๐‘‹ โ†’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ for all ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—.

(ii) A sequence {๐‘ฅโˆ—๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘‹ โˆ— converges weakly-โˆ— (in๐‘‹ โˆ—) to ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—, denoted by ๐‘ฅโˆ—๐‘› โˆ—โ‡€ ๐‘ฅโˆ—,if

ใ€ˆ๐‘ฅโˆ—๐‘›, ๐‘ฅใ€‰๐‘‹ โ†’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ for all ๐‘ฅ โˆˆ ๐‘‹ .

Weak convergence generalizes the concept of componentwise convergence in โ„๐‘ , which โ€“as can be seen from the proof of the Heineโ€“Borel Theorem โ€“ is the appropriate conceptin the context of compactness. Strong convergence in ๐‘‹ implies weak convergence bycontinuity of the duality pairing; in the same way, strong convergence in๐‘‹ โˆ— implies weak-โˆ—convergence. If ๐‘‹ is reexive, weak and weak-โˆ— convergence (both in ๐‘‹ = ๐‘‹ โˆ—โˆ—) coincide.In nite-dimensional spaces, all convergence notions coincide.

Weakly convergent sequences are always bounded; if ๐‘‹ is a Banach space, so are weakly-โˆ—convergent sequences. If ๐‘ฅ๐‘› โ†’ ๐‘ฅ and ๐‘ฅโˆ—๐‘› โˆ—โ‡€ ๐‘ฅโˆ— or ๐‘ฅ๐‘› โ‡€ ๐‘ฅ and ๐‘ฅโˆ—๐‘› โ†’ ๐‘ฅโˆ—, then ใ€ˆ๐‘ฅโˆ—๐‘›, ๐‘ฅ๐‘›ใ€‰๐‘‹ โ†’ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ . However, the duality pairing of weak(-โˆ—) convergent sequences does not convergein general.

As for strong convergence, one denes weak(-โˆ—) continuity and closedness of mappingsas well as weak(-โˆ—) closedness and compactness of sets. The last property is of fundamen-tal importance in optimization; its characterization is therefore a central result of thischapter.

Theorem 1.9 (Eberleinโ€“Smulyan). If ๐‘‹ is a normed vector space, then ๐”น๐‘‹ is weakly compactif and only if ๐‘‹ is reexive.

10

1 functional analysis

Hence in a reexive space, all bounded sequences contain a weakly (but in general notstrongly) convergent subsequence. Note that weak closedness is a stronger claim thanclosedness, since the property has to hold for more sequences. For convex sets, however,both concepts coincide.

Lemma 1.10. Let ๐‘‹ be a normed vector space and๐‘ˆ โŠ‚ ๐‘‹ be convex. Then๐‘ˆ is weakly closedif and only if๐‘ˆ is closed.

Proof. Weakly closed sets are always closed since a convergent sequence is also weaklyconvergent. Let now๐‘ˆ โŠ‚ ๐‘‹ be convex closed and nonempty (otherwise nothing has to beshown) and consider a sequence {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘ˆ with ๐‘ฅ๐‘› โ‡€ ๐‘ฅ โˆˆ ๐‘‹ . Assume that ๐‘ฅ โˆˆ ๐‘‹ \๐‘ˆ .Then the sets ๐‘ˆ and {๐‘ฅ} satisfy the premise of Theorem 1.5 (ii); we thus nd an ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—

and a _ โˆˆ โ„ withใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘›ใ€‰๐‘‹ โ‰ค _ < ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ for all ๐‘› โˆˆ โ„•.

Passing to the limit ๐‘› โ†’ โˆž in the rst inequality yields the contradiction

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ < ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ . ๏ฟฝ

If ๐‘‹ is not reexive (e.g., ๐‘‹ = ๐ฟโˆž(ฮฉ)), we have to turn to weak-โˆ— convergence.

Theorem 1.11 (Banachโ€“Alaoglu). If ๐‘‹ is a separable normed vector space (i.e., contains acountable dense subset), then ๐”น๐‘‹ โˆ— is weakly-โˆ— compact.

By the WeierstraรŸ Approximation Theorem, both ๐ถ (ฮฉ) and ๐ฟ๐‘ (ฮฉ) for 1 โ‰ค ๐‘ < โˆž areseparable; also, โ„“๐‘ is separable for 1 โ‰ค ๐‘ < โˆž. Hence, bounded and weakly-โˆ— closed balls inโ„“โˆž, ๐ฟโˆž(ฮฉ), andM(ฮฉ) are weakly-โˆ— compact. However, these spaces themselves are notseparable.

We also have the following straightforward improvement of Theorem 1.8 (i).

Lemma 1.12. Let๐‘‹ be a separable normed vector space and๐ด โŠ‚ ๐‘‹ . Then๐ดโ—ฆ is weakly-โˆ— closedand convex.

Note, however, that arbitrary closed convex sets in nonreexive spaces do not have to beweakly-โˆ— closed.Finally, we will also need the following โ€œweak-โˆ—โ€ separation theorem, whose proof isanalogous to the proof of Theorem 1.5 (using the fact that the linear weakly-โˆ— continuousfunctionals are exactly those of the form ๐‘ฅโˆ— โ†ฆโ†’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ for some ๐‘ฅ โˆˆ ๐‘‹ ); see also [Rudin1991, Theorem 3.4(b)].

11

1 functional analysis

Theorem 1.13. Let ๐‘‹ be a normed vector space and ๐ด โŠ‚ ๐‘‹ โˆ— be a nonempty, convex, andweakly-โˆ— closed subset and ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— \๐ด. Then there exist an ๐‘ฅ โˆˆ ๐‘‹ and a _ โˆˆ โ„ with

ใ€ˆ๐‘งโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค _ < ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ for all ๐‘งโˆ— โˆˆ ๐ด.

Since a normed vector space is characterized by its dual, this is also the case for linearoperators acting on this space. For any ๐‘‡ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ), the adjoint operator ๐‘‡ โˆ— โˆˆ ๐•ƒ(๐‘Œ โˆ—;๐‘‹ โˆ—) isdened via

ใ€ˆ๐‘‡ โˆ—๐‘ฆโˆ—, ๐‘ฅใ€‰๐‘‹ = ใ€ˆ๐‘ฆโˆ—,๐‘‡๐‘ฅใ€‰๐‘Œ for all ๐‘ฅ โˆˆ ๐‘‹, ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ—.

It always holds that โ€–๐‘‡ โˆ—โ€–๐•ƒ(๐‘Œ โˆ—;๐‘‹ โˆ—) = โ€–๐‘‡ โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) . Furthermore, the continuity of๐‘‡ implies that๐‘‡ โˆ— is weakly-โˆ— continuous (and ๐‘‡ weakly continuous).

1.3 hilbert spaces

Especially strong duality properties hold in Hilbert spaces. A mapping (ยท | ยท) : ๐‘‹ ร— ๐‘‹ โ†’ โ„

on a vector space ๐‘‹ over โ„ is called inner product, if

(i) (๐›ผ๐‘ฅ + ๐›ฝ๐‘ฆ | ๐‘ง) = ๐›ผ (๐‘ฅ | ๐‘ง) + ๐›ฝ (๐‘ฆ | ๐‘ง) for all ๐‘ฅ, ๐‘ฆ, ๐‘ง โˆˆ ๐‘‹ and ๐›ผ, ๐›ฝ โˆˆ โ„;

(ii) (๐‘ฅ | ๐‘ฆ) = (๐‘ฆ | ๐‘ฅ) for all ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ ;(iii) (๐‘ฅ | ๐‘ฅ) โ‰ฅ 0 for all ๐‘ฅ โˆˆ ๐‘‹ with equality if and only if ๐‘ฅ = 0.

An inner product induces a norm

โ€–๐‘ฅ โ€–๐‘‹ โ‰”โˆš(๐‘ฅ | ๐‘ฅ)๐‘‹ ,

which satises the Cauchyโ€“Schwarz inequality

(๐‘ฅ | ๐‘ฆ)๐‘‹ โ‰ค โ€–๐‘ฅ โ€–๐‘‹ โ€–๐‘ฆ โ€–๐‘‹ .

If๐‘‹ is complete with respect to the induced norm (i.e., if (๐‘‹, โ€– ยท โ€–๐‘‹ ) is a Banach space), then๐‘‹ is called a Hilbert space; if the inner product is canonical, it is frequently omitted, and theHilbert space is simply denoted by ๐‘‹ . The spaces in Example 1.3 (i)โ€“(iii) for ๐‘ = 2(= ๐‘ž) areall Hilbert spaces, where the inner product coincides with the duality pairing and inducesthe canonical norm.

Directly from the denition of the induced norm we obtain the binomial expansion

(1.4) โ€–๐‘ฅ + ๐‘ฆ โ€–2๐‘‹ = โ€–๐‘ฅ โ€–2๐‘‹ + 2(๐‘ฅ | ๐‘ฆ)๐‘‹ + โ€–๐‘ฆ โ€–2๐‘‹ ,

which in turn can be used to verify the three-point identity

(1.5) (๐‘ฅ โˆ’ ๐‘ฆ | ๐‘ฅ โˆ’ ๐‘ง)๐‘‹ =12 โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹ โˆ’ 1

2 โ€–๐‘ฆ โˆ’ ๐‘งโ€–2๐‘‹ + 12 โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹ for all ๐‘ฅ, ๐‘ฆ, ๐‘ง โˆˆ ๐‘‹ .

12

1 functional analysis

(This can be seen as a generalization of the classical Pythagorean theorem in plane geome-try.)

The relevant point in our context is that the dual of a Hilbert space ๐‘‹ can be identiedwith ๐‘‹ itself.

Theorem 1.14 (Frรฉchetโ€“Riesz). Let ๐‘‹ be a Hilbert space. Then for each ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— there exists aunique ๐‘ง๐‘ฅโˆ— โˆˆ ๐‘‹ with โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— = โ€–๐‘ง๐‘ฅโˆ— โ€–๐‘‹ and

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ = (๐‘ฅ | ๐‘ง๐‘ฅโˆ—)๐‘‹ for all ๐‘ฅ โˆˆ ๐‘‹ .

The element ๐‘ง๐‘ฅโˆ— is called Riesz representation of ๐‘ฅโˆ—. The (linear) mapping ๐ฝ๐‘‹ : ๐‘‹ โˆ— โ†’ ๐‘‹ ,๐‘ฅโˆ— โ†ฆโ†’ ๐‘ง๐‘ฅโˆ— , is called Riesz isomorphism, and can be used to show that every Hilbert space isreexive.

Theorem 1.14 allows to use the inner product instead of the duality pairing in Hilbert spaces.For example, a sequence {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘‹ converges weakly to ๐‘ฅ โˆˆ ๐‘‹ if and only if

(๐‘ฅ๐‘› | ๐‘ง)๐‘‹ โ†’ (๐‘ฅ | ๐‘ง)๐‘‹ for all ๐‘ง โˆˆ ๐‘‹ .

This implies that if ๐‘ฅ๐‘› โ‡€ ๐‘ฅ and in addition โ€–๐‘ฅ๐‘›โ€–๐‘‹ โ†’ โ€–๐‘ฅ โ€–๐‘‹ (in which case we say that ๐‘ฅ๐‘›strictly converges to ๐‘ฅ ),

(1.6) โ€–๐‘ฅ๐‘› โˆ’ ๐‘ฅ โ€–2๐‘‹ = โ€–๐‘ฅ๐‘›โ€–2๐‘‹ โˆ’ 2(๐‘ฅ๐‘› | ๐‘ฅ)๐‘‹ + โ€–๐‘ฅ โ€–2๐‘‹ โ†’ 0,

i.e.,๐‘ฅ๐‘› โ†’ ๐‘ฅ . A normed vector space in which strict convergence implies strong convergenceis said to have the Radonโ€“Riesz property.

Similar statements hold for linear operators on Hilbert spaces. For a linear operator ๐‘‡ โˆˆ๐•ƒ(๐‘‹ ;๐‘Œ ) between Hilbert spaces ๐‘‹ and ๐‘Œ , the Hilbert space adjoint operator ๐‘‡โ˜… โˆˆ ๐•ƒ(๐‘Œ ;๐‘‹ ) isdened via

(๐‘‡โ˜…๐‘ฆ | ๐‘ฅ)๐‘‹ = (๐‘‡๐‘ฅ | ๐‘ฆ)๐‘Œ for all ๐‘ฅ โˆˆ ๐‘‹, ๐‘ฆ โˆˆ ๐‘Œ .If ๐‘‡โ˜… = ๐‘‡ , the operator ๐‘‡ is called self-adjoint. A self-adjoint operator is called positivedenite, if there exists a ๐‘ > 0 such that

(๐‘‡๐‘ฅ | ๐‘ฅ)๐‘‹ โ‰ฅ ๐‘ โ€–๐‘ฅ โ€–2๐‘‹ for all ๐‘ฅ โˆˆ ๐‘‹ .

In this case, ๐‘‡ has a bounded inverse ๐‘‡ โˆ’1 with โ€–๐‘‡ โˆ’1โ€–๐•ƒ(๐‘‹ ;๐‘‹ ) โ‰ค ๐‘โˆ’1. We will also use thenotation ๐‘† โ‰ฅ ๐‘‡ for two operators ๐‘†,๐‘‡ : ๐‘‹ โ†’ ๐‘‹ if

(๐‘†๐‘ฅ | ๐‘ฅ)๐‘‹ โ‰ฅ (๐‘‡๐‘ฅ | ๐‘ฅ)๐‘‹ for all ๐‘ฅ โˆˆ ๐‘‹ .

Hence ๐‘‡ is positive denite if and only if ๐‘‡ โ‰ฅ ๐‘Id for some ๐‘ > 0; if ๐‘‡ โ‰ฅ 0, we say that ๐‘‡ ismerely positive semi-denite.

13

1 functional analysis

The Hilbert space adjoint is related to the (Banach space) adjoint via ๐‘‡โ˜… = ๐ฝ๐‘‹๐‘‡ โˆ—๐ฝโˆ’1๐‘Œ . If thecontext is obvious, we will not distinguish the two in notation. Similarly, we will also โ€“ bya moderate abuse of notation โ€“ use angled brackets to denote inner products in Hilbertspaces except where we need to refer to both at the same time (which will rarely be thecase, and the danger of confusing inner products with elements of a product space is muchgreater).

14

2 CALCULUS OF VARIATIONS

We rst consider the question about the existence of minimizers of a (nonlinear) functional๐น : ๐‘ˆ โ†’ โ„ for a subset ๐‘ˆ of a Banach space ๐‘‹ . Answering such questions is one of thegoals of the calculus of variations.

2.1 the direct method

It is helpful to include the constraint ๐‘ฅ โˆˆ ๐‘ˆ into the functional by extending ๐น to all of ๐‘‹with the valueโˆž. We thus consider

๐น : ๐‘‹ โ†’ โ„ โ‰” โ„ โˆช {โˆž}, ๐น (๐‘ฅ) ={๐น (๐‘ฅ) if ๐‘ฅ โˆˆ ๐‘ˆ ,โˆž if ๐‘ฅ โˆˆ ๐‘‹ \๐‘ˆ .

We use the usual arithmetic on โ„, i.e., ๐‘ก < โˆž and ๐‘ก + โˆž = โˆž for all ๐‘ก โˆˆ โ„; subtraction andmultiplication of negative numbers with โˆž and in particular ๐น (๐‘ฅ) = โˆ’โˆž is not allowed,however. Thus if there is any ๐‘ฅ โˆˆ ๐‘ˆ at all, a minimizer ๐‘ฅ necessarily must lie in๐‘ˆ .

We thus consider from now on functionals ๐น : ๐‘‹ โ†’ โ„. The set on which ๐น is nite is calledthe eective domain

dom ๐น โ‰” {๐‘ฅ โˆˆ ๐‘‹ | ๐น (๐‘ฅ) < โˆž} .If dom ๐น โ‰  โˆ…, the functional ๐น is called proper.

We now generalize the WeierstraรŸ Theorem (every real-valued continuous function ona compact set attains its minimum and maximum) to Banach spaces and in particular tofunctions of the form ๐น . Since we are only interested in minimizers, we only require aโ€œone-sidedโ€ continuity: We call ๐น lower semicontinuous in ๐‘ฅ โˆˆ ๐‘‹ if

๐น (๐‘ฅ) โ‰ค lim inf๐‘›โ†’โˆž ๐น (๐‘ฅ๐‘›) for every {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘‹ with ๐‘ฅ๐‘› โ†’ ๐‘ฅ,

see Figure 2.1. Analogously, we dene weakly(-โˆ—) lower semicontinuous functionals viaweakly(-โˆ—) convergent sequences. Finally,๐น is called coercive if for every sequence {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚๐‘‹ with โ€–๐‘ฅ๐‘›โ€–๐‘‹ โ†’ โˆž we also have ๐น (๐‘ฅ๐‘›) โ†’ โˆž.

15

2 calculus of variations

๐‘ฅ

๐น1(๐‘ฅ)๐‘ฅ๐‘›

๐น1(๐‘ฅ๐‘›)

(a) ๐น1 is lower semicontinuous at ๐‘ฅ

๐‘ฅ

๐น2(๐‘ฅ)

๐‘ฅ๐‘›

๐น2(๐‘ฅ๐‘›)

(b) ๐น2 is not lower semicontinuous at ๐‘ฅ

Figure 2.1: Illustration of lower semicontinuity: two functions ๐น1, ๐น2 : โ„ โ†’ โ„ and a se-quence {๐‘ฅ๐‘›}๐‘›โˆˆโ„• realizing their (identical) limes inferior.

We now have everything at hand to prove the central existence result in the calculus ofvariations. The strategy for its proof is known as the direct method.1

Theorem 2.1. Let ๐‘‹ be a reexive Banach space and ๐น : ๐‘‹ โ†’ โ„ be proper, coercive, andweakly lower semicontinuous. Then the minimization problem

min๐‘ฅโˆˆ๐‘‹

๐น (๐‘ฅ)

has a solution ๐‘ฅ โˆˆ dom ๐น .

Proof. The proof can be separated into three steps.

(i) Pick a minimizing sequence.

Since ๐น is proper, there exists an ๐‘€ โ‰” inf๐‘ฅโˆˆ๐‘‹ ๐น (๐‘ฅ) < โˆž (although ๐‘€ = โˆ’โˆž is notexcluded so far). We can thus nd a sequence {๐‘ฆ๐‘›}๐‘›โˆˆโ„• โŠ‚ ran ๐น \ {โˆž} โŠ‚ โ„ with๐‘ฆ๐‘› โ†’ ๐‘€ , i.e., there exists a sequence {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘‹ with

๐น (๐‘ฅ๐‘›) โ†’ ๐‘€ = inf๐‘ฅโˆˆ๐‘‹

๐น (๐‘ฅ).

Such a sequence is called minimizing sequence. Note that from the convergence of{๐น (๐‘ฅ๐‘›)}๐‘›โˆˆโ„• we cannot conclude the convergence of {๐‘ฅ๐‘›}๐‘›โˆˆโ„• (yet).

(ii) Show that the minimizing sequence contains a convergent subsequence.

Assume to the contrary that {๐‘ฅ๐‘›}๐‘›โˆˆโ„• is unbounded, i.e., that โ€–๐‘ฅ๐‘›โ€–๐‘‹ โ†’ โˆž for ๐‘› โ†’ โˆž.The coercivity of ๐น then implies that ๐น (๐‘ฅ๐‘›) โ†’ โˆž as well, in contradiction to ๐น (๐‘ฅ๐‘›) โ†’๐‘€ < โˆž by denition of the minimizing sequence. Hence, the sequence is bounded, i.e.,

1This strategy is applied so often in the literature that one usually just writes โ€œExistence of a minimizerfollows from the direct method.โ€ or even just โ€œExistence follows from standard arguments.โ€ The basicidea goes back to Hilbert; the version based on lower semicontinuity which we use here is due to LeonidaTonelli (1885โ€“1946), who through it had a lasting inuence on the modern calculus of variations.

16

2 calculus of variations

there is an ๐‘€ > 0 with โ€–๐‘ฅ๐‘›โ€–๐‘‹ โ‰ค ๐‘€ for all ๐‘› โˆˆ โ„•. In particular, {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐”น(0, ๐‘€).The Eberleinโ€“Smulyan Theorem 1.9 therefore implies the existence of a weaklyconverging subsequence {๐‘ฅ๐‘›๐‘˜ }๐‘˜โˆˆโ„• with limit ๐‘ฅ โˆˆ ๐‘‹ . (This limit is a candidate for theminimizer.)

(iii) Show that its limit is a minimizer.

From the denition of the minimizing sequence, we also have ๐น (๐‘ฅ๐‘›๐‘˜ ) โ†’ ๐‘€ for๐‘˜ โ†’ โˆž. Together with the weak lower semicontinuity of ๐น and the denition of theinmum we thus obtain

inf๐‘ฅโˆˆ๐‘‹

๐น (๐‘ฅ) โ‰ค ๐น (๐‘ฅ) โ‰ค lim inf๐‘˜โ†’โˆž

๐น (๐‘ฅ๐‘›๐‘˜ ) = ๐‘€ = inf๐‘ฅโˆˆ๐‘‹

๐น (๐‘ฅ) < โˆž.

This implies that ๐‘ฅ โˆˆ dom ๐น and that inf๐‘ฅโˆˆ๐‘‹ ๐น (๐‘ฅ) = ๐น (๐‘ฅ) > โˆ’โˆž. Hence, the inmumis attained in ๐‘ฅ which is therefore the desired minimizer. ๏ฟฝ

Remark 2.2. If ๐‘‹ is not reexive but the dual of a separable Banach space, we can argue analogouslyfor weakly-โˆ— lower semicontinuous functionals using the Banachโ€“Alaoglu Theorem 1.11

Note how the topology on ๐‘‹ used in the proof is restricted in step (ii) and (iii): Step (ii)prots from a coarse topology (in which more sequences are convergent), while step (iii)prots from a ne topology (the fewer sequences are convergent, the easier it is to satisfythe lim inf conditions). Since in the cases of interest to us no more than boundedness of aminimizing sequence can be expected, we cannot use a ner than the weak topology. Wethus have to ask whether a suciently large class of (interesting) functionals are weaklylower semicontinuous.

A rst example is the class of bounded linear functionals: For any ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—, the functional

๐น : ๐‘‹ โ†’ โ„, ๐‘ฅ โ†ฆโ†’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ ,

is weakly continuous by denition of weak convergence and hence a fortiori weakly lowersemicontinuous. Another advantage of (weak) lower semicontinuity is that it is preservedunder certain operations.

Lemma 2.3. Let๐‘‹ and๐‘Œ be Banach spaces and ๐น : ๐‘‹ โ†’ โ„ be weakly(-โˆ—) lower semicontinuous.Then the following functionals are weakly(-โˆ—) lower semicontinuous as well:

(i) ๐›ผ๐น for all ๐›ผ โ‰ฅ 0;

(ii) ๐น +๐บ for ๐บ : ๐‘‹ โ†’ โ„ weakly(-โˆ—) lower semicontinuous;

(iii) ๐œ‘ โ—ฆ ๐น for ๐œ‘ : โ„ โ†’ โ„ lower semicontinuous and monotonically increasing.

(iv) ๐น โ—ฆ ฮฆ for ฮฆ : ๐‘Œ โ†’ ๐‘‹ weakly(-โˆ—) continuous, i.e., ๐‘ฆ๐‘› โ‡€(โˆ—) ๐‘ฆ implies ฮฆ(๐‘ฆ๐‘›) โ‡€(โˆ—) ฮฆ(๐‘ฆ);

17

2 calculus of variations

(v) ๐‘ฅ โ†ฆโ†’ sup๐‘–โˆˆ๐ผ ๐น๐‘– (๐‘ฅ) with ๐น๐‘– : ๐‘‹ โ†’ โ„ weakly(-โˆ—) lower semicontinuous for all ๐‘– โˆˆ ๐ผ andan arbitrary set ๐ผ .

Note that (v) does not hold for continuous functions.

Proof. We only show the claim for the case of weak lower semicontinuity; the statementsfor weak-โˆ— lower semicontinuity follow by the same arguments.

Statements (i) and (ii) follow directly from the properties of the limes inferior.

For statement (iii), it rst follows from the monotonicity and weak lower semicontinuityof ๐œ‘ that ๐‘ฅ๐‘› โ‡€ ๐‘ฅ implies

๐œ‘ (๐น (๐‘ฅ)) โ‰ค ๐œ‘ (lim inf๐‘›โ†’โˆž ๐น (๐‘ฅ๐‘›)) .

It remains to show that the right-hand side can be bounded by lim inf๐‘›โ†’โˆž ๐œ‘ (๐น (๐‘ฅ๐‘›)).For that purpose, we consider the subsequence {๐œ‘ (๐น (๐‘ฅ๐‘›๐‘˜ )}๐‘˜โˆˆโ„• which realizes the lim inf ,i.e., for which lim inf๐‘›โ†’โˆž ๐œ‘ (๐น (๐‘ฅ๐‘›)) = lim๐‘˜โ†’โˆž ๐œ‘ (๐น (๐‘ฅ๐‘›๐‘˜ )). By passing to a further subse-quence which we index by ๐‘˜โ€ฒ for simplicity, we can also obtain that lim inf๐‘˜โ†’โˆž ๐น (๐‘ฅ๐‘›๐‘˜ ) =lim๐‘˜ โ€ฒโ†’โˆž ๐น (๐‘ฅ๐‘›๐‘˜ โ€ฒ ). Since the lim inf restricted to a subsequence can never be smaller thanthat of the full sequence, the monotonicity of๐œ‘ together with its weak lower semicontinuitynow implies that

๐œ‘ (lim inf๐‘›โ†’โˆž ๐น (๐‘ฅ๐‘›)) โ‰ค ๐œ‘ ( lim

๐‘˜ โ€ฒโ†’โˆž๐น (๐‘ฅ๐‘›๐‘˜ โ€ฒ )) โ‰ค lim inf

๐‘˜ โ€ฒโ†’โˆž๐œ‘ (๐น (๐‘ฅ๐‘›๐‘˜ โ€ฒ )) = lim inf

๐‘›โ†’โˆž ๐œ‘ (๐น (๐‘ฅ๐‘›)),

where we have used in the last step that a subsequence of a convergent sequence has thesame limit (which coincides with the lim inf).

Statement (iv) follows directly from the weak continuity of ฮฆ, as ๐‘ฆ๐‘› โ‡€ ๐‘ฆ implies that๐‘ฅ๐‘› โ‰” ฮฆ(๐‘ฆ๐‘›) โ‡€ ฮฆ(๐‘ฆ) =: ๐‘ฅ , and the lower semicontinuity of ๐น yields

๐น (ฮฆ(๐‘ฆ)) โ‰ค lim inf๐‘›โ†’โˆž ๐น (ฮฆ(๐‘ฆ๐‘›)) .

Finally, let {๐‘ฅ๐‘›}๐‘›โˆˆโ„• be a weakly converging sequence with limit ๐‘ฅ โˆˆ ๐‘‹ . Then the denitionof the supremum implies that

๐น ๐‘— (๐‘ฅ) โ‰ค lim inf๐‘›โ†’โˆž ๐น ๐‘— (๐‘ฅ๐‘›) โ‰ค lim inf

๐‘›โ†’โˆž sup๐‘–โˆˆ๐ผ

๐น๐‘– (๐‘ฅ๐‘›) for all ๐‘— โˆˆ ๐ผ .

Taking the supremum over all ๐‘— โˆˆ ๐ผ on both sides yields statement (v). ๏ฟฝ

Corollary 2.4. If ๐‘‹ is a Banach space, the norm โ€– ยท โ€–๐‘‹ is proper, coercive, and weakly lowersemicontinuous. Similarly, the dual norm โ€– ยท โ€–๐‘‹ โˆ— is proper, coercive, and weakly-โˆ— lowersemicontinuous.

18

2 calculus of variations

Proof. Coercivity and dom โ€– ยท โ€–๐‘‹ = ๐‘‹ follow directly from the denition. Weak lowersemicontinuity follows from Lemma 2.3 (v) and Corollary 1.7 since

โ€–๐‘ฅ โ€–๐‘‹ = supโ€–๐‘ฅโˆ—โ€–๐‘‹โˆ—โ‰ค1

|ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ |.

The claim for โ€– ยท โ€–๐‘‹ โˆ— follows analogously using the denition of the operator norm in placeof Corollary 1.7. ๏ฟฝ

Another frequently occurring functional is the indicator function2 of a set ๐‘ˆ โŠ‚ ๐‘‹ , denedas

๐›ฟ๐‘ˆ (๐‘ฅ) ={0 ๐‘ฅ โˆˆ ๐‘ˆ ,โˆž ๐‘ฅ โˆˆ ๐‘‹ \๐‘ˆ .

The purpose of this denition is of course to reduce the minimization of a functional๐น : ๐‘‹ โ†’ โ„ over ๐‘ˆ to the minimization of ๐น โ‰” ๐น + ๐›ฟ๐‘ˆ over ๐‘‹ . The following result istherefore important for showing the existence of a minimizer.

Lemma 2.5. Let ๐‘‹ be a Banach space and๐‘ˆ โŠ‚ ๐‘‹ . Then ๐›ฟ๐‘ˆ : ๐‘‹ โ†’ โ„ is

(i) proper if๐‘ˆ is nonempty;

(ii) weakly lower semicontinuous if๐‘ˆ is convex and closed;

(iii) coercive if๐‘ˆ is bounded.

Proof. Statement (i) is clear. For (ii), consider a weakly converging sequence {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘‹with limit ๐‘ฅ โˆˆ ๐‘‹ . If ๐‘ฅ โˆˆ ๐‘ˆ , then ๐›ฟ๐‘ˆ โ‰ฅ 0 immediately yields

๐›ฟ๐‘ˆ (๐‘ฅ) = 0 โ‰ค lim inf๐‘›โ†’โˆž ๐›ฟ๐‘ˆ (๐‘ฅ๐‘›).

Let now ๐‘ฅ โˆ‰ ๐‘ˆ . Since ๐‘ˆ is convex and closed and hence by Lemma 1.10 also weakly closed,there must be a ๐‘ โˆˆ โ„• with ๐‘ฅ๐‘› โˆ‰ ๐‘ˆ for all ๐‘› โ‰ฅ ๐‘ (otherwise we could โ€“ by passing to asubsequence if necessary โ€“ construct a sequence with ๐‘ฅ๐‘› โ‡€ ๐‘ฅ โˆˆ ๐‘ˆ , in contradiction to theassumption). Thus, ๐›ฟ๐‘ˆ (๐‘ฅ๐‘›) = โˆž for all ๐‘› โ‰ฅ ๐‘ , and therefore

๐›ฟ๐‘ˆ (๐‘ฅ) = โˆž = lim inf๐‘›โ†’โˆž ๐›ฟ๐‘ˆ (๐‘ฅ๐‘›).

For (iii), let๐‘ˆ be bounded, i.e., there exist an๐‘€ > 0 with๐‘ˆ โŠ‚ ๐”น(0, ๐‘€). If โ€–๐‘ฅ๐‘›โ€–๐‘‹ โ†’ โˆž, thenthere exists an ๐‘ โˆˆ โ„• with โ€–๐‘ฅ๐‘›โ€–๐‘‹ > ๐‘€ for all ๐‘› โ‰ฅ ๐‘ , and thus ๐‘ฅ๐‘› โˆ‰ ๐”น(0, ๐‘€) โŠƒ ๐‘ˆ for all๐‘› โ‰ฅ ๐‘ . Hence, ๐›ฟ๐‘ˆ (๐‘ฅ๐‘›) โ†’ โˆž as well. ๏ฟฝ

2not to be confused with the characteristic function ๐Ÿ™๐‘ˆ with ๐Ÿ™๐‘ˆ (๐‘ฅ) = 1 for ๐‘ฅ โˆˆ ๐‘ˆ and 0 else

19

2 calculus of variations

2.2 differential calculus in banach spaces

To characterize minimizers of functionals on innite-dimensional spaces using the Fermatprinciple, we transfer the classical derivative concepts to Banach spaces.

Let ๐‘‹ and ๐‘Œ be Banach spaces, ๐น : ๐‘‹ โ†’ ๐‘Œ be a mapping, and ๐‘ฅ, โ„Ž โˆˆ ๐‘‹ be given.

โ€ข If the one-sided limit

๐น โ€ฒ(๐‘ฅ ;โ„Ž) โ‰” lim๐‘กโ†’ 0

๐น (๐‘ฅ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฅ)๐‘ก

โˆˆ ๐‘Œ

(where ๐‘กโ†’ 0 denotes the limit for arbitrary positive decreasing null sequences) exists,it is called the directional derivative of ๐น in ๐‘ฅ in direction โ„Ž.

โ€ข If ๐น โ€ฒ(๐‘ฅ ;โ„Ž) exists for all โ„Ž โˆˆ ๐‘‹ and

๐ท๐น (๐‘ฅ) : ๐‘‹ โ†’ ๐‘Œ, โ„Ž โ†ฆโ†’ ๐น โ€ฒ(๐‘ฅ ;โ„Ž)

denes a bounded linear operator, we call ๐น Gรขteaux dierentiable (at ๐‘ฅ ) and๐ท๐น (๐‘ฅ) โˆˆ๐•ƒ(๐‘‹ ;๐‘Œ ) its Gรขteaux derivative.

โ€ข If additionallylim

โ€–โ„Žโ€–๐‘‹โ†’0

โ€–๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฅ) โˆ’ ๐ท๐น (๐‘ฅ)โ„Žโ€–๐‘Œโ€–โ„Žโ€–๐‘‹ = 0,

then ๐น is called Frรฉchet dierentiable (in ๐‘ฅ ) and ๐น โ€ฒ(๐‘ฅ) โ‰” ๐ท๐น (๐‘ฅ) โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) its Frรฉchetderivative.

โ€ข If additionally the mapping ๐น โ€ฒ : ๐‘‹ โ†’ ๐•ƒ(๐‘‹ ;๐‘Œ ) is (Lipschitz) continuous, we call ๐น(Lipschitz) continuously dierentiable.

The dierence between Gรขteaux and Frรฉchet dierentiable lies in the approximation errorof ๐น near ๐‘ฅ by ๐น (๐‘ฅ) + ๐ท๐น (๐‘ฅ)โ„Ž: While it only has to be bounded in โ€–โ„Žโ€–๐‘‹ โ€“ i.e., linear inโ€–โ„Žโ€–๐‘‹ โ€“ for a Gรขteaux dierentiable function, it has to be superlinear in โ€–โ„Žโ€–๐‘‹ if ๐น is Frรฉchetdierentiable. (For a xed directionโ„Ž, this is of course also the case forGรขteaux dierentiablefunctions; Frรฉchet dierentiability thus additionally requires a uniformity in โ„Ž.) We alsopoint out that continuous dierentiability always entails Frรฉchet dierentiability.

Remark 2.6. Sometimes a weaker notion than continuous dierentiability is used. A mapping๐น : ๐‘‹ โ†’ ๐‘Œ is called strictly dierentiable in ๐‘ฅ if

(2.1) lim๐‘ฆโ†’๐‘ฅ

โ€–โ„Ž โ€–๐‘‹โ†’0

โ€–๐น (๐‘ฆ + โ„Ž) โˆ’ ๐น (๐‘ฆ) โˆ’ ๐น โ€ฒ(๐‘ฅ)โ„Žโ€–๐‘Œโ€–โ„Žโ€–๐‘‹ = 0.

The benet of this denition over that of continuous dierentiability is that the limit process is nowin the function ๐น rather than the derivative ๐น โ€ฒ; strict dierentiability can therefore hold if everyneighborhood of ๐‘ฅ contains points where ๐น is not dierentiable. However, if ๐น is dierentiable

20

2 calculus of variations

everywhere in a neighborhood of ๐‘ฅ , then ๐น is strictly dierentiable if and only if ๐น โ€ฒ is continuous;see [Dontchev & Rockafellar 2014, Proposition 1D.7]. Although many results of Chapters 13 to 25actually hold under the weaker assumption of strict dierentiability, we will therefore work onlywith the more standard notion of continuous dierentiability.

If ๐น is Gรขteaux dierentiable, the Gรขteaux derivative can be computed via

๐ท๐น (๐‘ฅ)โ„Ž =(๐‘‘๐‘‘๐‘ก ๐น (๐‘ฅ + ๐‘กโ„Ž)

) ๏ฟฝ๏ฟฝ๏ฟฝ๐‘ก=0.

Bounded linear operators ๐น โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) are obviously Frรฉchet dierentiable with derivative๐น โ€ฒ(๐‘ฅ) = ๐น โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) for all ๐‘ฅ โˆˆ ๐‘‹ . Further derivatives can be obtained through the usualcalculus, whose proof in Banach spaces is exactly as in โ„๐‘ . As an example, we prove achain rule.

Theorem 2.7. Let ๐‘‹ , ๐‘Œ , and ๐‘ be Banach spaces, and let ๐น : ๐‘‹ โ†’ ๐‘Œ be Frรฉchet dierentiableat ๐‘ฅ โˆˆ ๐‘‹ and ๐บ : ๐‘Œ โ†’ ๐‘ be Frรฉchet dierentiable at ๐‘ฆ โ‰” ๐น (๐‘ฅ) โˆˆ ๐‘Œ . Then ๐บ โ—ฆ ๐น is Frรฉchetdierentiable at ๐‘ฅ and

(๐บ โ—ฆ ๐น )โ€ฒ(๐‘ฅ) = ๐บโ€ฒ(๐น (๐‘ฅ)) โ—ฆ ๐น โ€ฒ(๐‘ฅ).

Proof. For โ„Ž โˆˆ ๐‘‹ with ๐‘ฅ + โ„Ž โˆˆ dom ๐น we have

(๐บ โ—ฆ ๐น ) (๐‘ฅ + โ„Ž) โˆ’ (๐บ โ—ฆ ๐น ) (๐‘ฅ) = ๐บ (๐น (๐‘ฅ + โ„Ž)) โˆ’๐บ (๐น (๐‘ฅ)) = ๐บ (๐‘ฆ + ๐‘”) โˆ’๐บ (๐‘ฆ)

with ๐‘” โ‰” ๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฅ). The Frรฉchet dierentiability of ๐บ thus implies that

โ€–(๐บ โ—ฆ ๐น ) (๐‘ฅ + โ„Ž) โˆ’ (๐บ โ—ฆ ๐น ) (๐‘ฅ) โˆ’๐บโ€ฒ(๐‘ฆ)๐‘”โ€–๐‘ = ๐‘Ÿ1(โ€–๐‘”โ€–๐‘Œ )

with ๐‘Ÿ1(๐‘ก)/๐‘ก โ†’ 0 for ๐‘ก โ†’ 0. The Frรฉchet dierentiability of ๐น further implies

โ€–๐‘” โˆ’ ๐น โ€ฒ(๐‘ฅ)โ„Žโ€–๐‘Œ = ๐‘Ÿ2(โ€–โ„Žโ€–๐‘‹ )

with ๐‘Ÿ2(๐‘ก)/๐‘ก โ†’ 0 for ๐‘ก โ†’ 0. In particular,

(2.2) โ€–๐‘”โ€–๐‘Œ โ‰ค โ€–๐น โ€ฒ(๐‘ฅ)โ„Žโ€–๐‘Œ + ๐‘Ÿ2(โ€–โ„Žโ€–๐‘‹ ) .

Hence, with ๐‘ โ‰” โ€–๐บโ€ฒ(๐น (๐‘ฅ))โ€–๐•ƒ(๐‘Œ ;๐‘ ) we have

โ€–(๐บ โ—ฆ ๐น ) (๐‘ฅ + โ„Ž) โˆ’ (๐บ โ—ฆ ๐น ) (๐‘ฅ) โˆ’๐บโ€ฒ(๐น (๐‘ฅ))๐น โ€ฒ(๐‘ฅ)โ„Žโ€–๐‘ โ‰ค ๐‘Ÿ1(โ€–๐‘”โ€–๐‘Œ ) + ๐‘ ๐‘Ÿ2(โ€–โ„Žโ€–๐‘‹ ).

If โ€–โ„Žโ€–๐‘‹ โ†’ 0, we obtain from (2.2) and ๐น โ€ฒ(๐‘ฅ) โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) that โ€–๐‘”โ€–๐‘Œ โ†’ 0 as well, and theclaim follows. ๏ฟฝ

A similar rule for Gรขteaux derivatives does not hold, however.

Of special importance in Part IV will be the following inverse function theorem, whoseproof can be found, e.g., in [Renardy & Rogers 2004, Theorem 10.4].

21

2 calculus of variations

Theorem 2.8 (inverse function theorem). Let ๐น : ๐‘‹ โ†’ ๐‘Œ be a continuously dierentiablemapping between the Banach spaces ๐‘‹ and ๐‘Œ and ๐‘ฅ โˆˆ ๐‘‹ . If ๐น โ€ฒ(๐‘ฅ) : ๐‘‹ โ†’ ๐‘Œ is bijective,then there exists an open set ๐‘‰ โŠ‚ ๐‘Œ with ๐น (๐‘ฅ) โˆˆ ๐‘‰ such that ๐นโˆ’1 : ๐‘‰ โ†’ ๐‘‹ exists and iscontinuously dierentiable.

Of particular relevance in optimization is of course the special case ๐น : ๐‘‹ โ†’ โ„, where๐ท๐น (๐‘ฅ) โˆˆ ๐•ƒ(๐‘‹ ;โ„) = ๐‘‹ โˆ— (if the Gรขteaux derivative exists). Following the usual notationfrom Section 1.2, we will then write ๐น โ€ฒ(๐‘ฅ ;โ„Ž) = ใ€ˆ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘‹ for directional derivative indirection โ„Ž โˆˆ ๐‘‹ . Our rst result is the classical Fermat principle characterizing minimizersof a dierentiable functions.

Theorem 2.9 (Fermat principle). Let ๐น : ๐‘‹ โ†’ โ„ be Gรขteaux dierentiable and ๐‘ฅ โˆˆ ๐‘‹ be alocal minimizer of ๐น . Then ๐ท๐น (๐‘ฅ) = 0, i.e.,

ใ€ˆ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘‹ = 0 for all โ„Ž โˆˆ ๐‘‹ .

Proof. Let โ„Ž โˆˆ ๐‘‹ be arbitrary. Since ๐‘ฅ is a local minimizer, the coreโ€“int Lemma 1.2 impliesthat there exists an Y > 0 such that ๐น (๐‘ฅ) โ‰ค ๐น (๐‘ฅ + ๐‘กโ„Ž) for all ๐‘ก โˆˆ (0, Y), i.e.,

(2.3) 0 โ‰ค ๐น (๐‘ฅ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฅ)๐‘ก

โ†’ ๐น โ€ฒ(๐‘ฅ ;โ„Ž) = ใ€ˆ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘‹ for ๐‘ก โ†’ 0,

where we have used the Gรขteaux dierentiability and hence directional dierentiability of๐น . Since the right-hand side is linear in โ„Ž, the same argument for โˆ’โ„Ž yields ใ€ˆ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘‹ โ‰ค 0and therefore the claim. ๏ฟฝ

We will also need the following version of the mean value theorem.

Theorem 2.10. Let ๐น : ๐‘‹ โ†’ โ„ be Frรฉchet dierentiable. Then for all ๐‘ฅ, โ„Ž โˆˆ ๐‘‹ ,

๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฅ) =โˆซ 1

0ใ€ˆ๐น โ€ฒ(๐‘ฅ + ๐‘กโ„Ž), โ„Žใ€‰๐‘‹ ๐‘‘๐‘ก .

Proof. Consider the scalar function

๐‘“ : [0, 1] โ†’ โ„, ๐‘ก โ†ฆโ†’ ๐น (๐‘ฅ + ๐‘กโ„Ž).From Theorem 2.7 we obtain that ๐‘“ (as a composition of mappings on Banach spaces) isdierentiable with

๐‘“ โ€ฒ(๐‘ก) = ใ€ˆ๐น โ€ฒ(๐‘ฅ + ๐‘กโ„Ž), โ„Žใ€‰๐‘‹ ,and the fundamental theorem of calculus in โ„ yields that

๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฅ) = ๐‘“ (1) โˆ’ ๐‘“ (0) =โˆซ 1

0๐‘“ โ€ฒ(๐‘ก) ๐‘‘๐‘ก =

โˆซ 1

0ใ€ˆ๐น โ€ฒ(๐‘ฅ + ๐‘กโ„Ž), โ„Žใ€‰๐‘‹ ๐‘‘๐‘ก . ๏ฟฝ

22

2 calculus of variations

As in classical analysis, this result is useful for relating local and pointwise properties ofsmooth functions. A typical example is the following lemma.

Lemma 2.11. Let ๐น : ๐‘‹ โ†’ ๐‘Œ be continuously Frรฉchet dierentiable in a neighborhood ๐‘ˆ of๐‘ฅ โˆˆ ๐‘‹ . Then ๐น is locally Lipschitz continuous near ๐‘ฅ โˆˆ ๐‘ˆ .

Proof. Since ๐น โ€ฒ : ๐‘ˆ โ†’ ๐•ƒ(๐‘‹ ;๐‘Œ ) is continuous in ๐‘ˆ , there exists a ๐›ฟ > 0 with โ€–๐น โ€ฒ(๐‘ง) โˆ’๐น โ€ฒ(๐‘ฅ)โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) โ‰ค 1 and hence โ€–๐น โ€ฒ(๐‘ง)โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) โ‰ค 1 + โ€–๐น โ€ฒ(๐‘ฅ)โ€–๐‘‹ โˆ— for all ๐‘ง โˆˆ ๐”น(๐‘ฅ, ๐›ฟ) โŠ‚ ๐‘ˆ . For any๐‘ฅ1, ๐‘ฅ2 โˆˆ ๐”น(๐‘ฅ, ๐›ฟ) we also have ๐‘ฅ2 + ๐‘ก (๐‘ฅ1โˆ’๐‘ฅ2) โˆˆ ๐”น(๐‘ฅ, ๐›ฟ) for all ๐‘ก โˆˆ [0, 1] (since balls in normedvector spaces are convex), and hence Theorem 2.10 implies that

โ€–๐น (๐‘ฅ1) โˆ’ ๐น (๐‘ฅ2)โ€–๐‘Œ โ‰คโˆซ 1

0โ€–๐น โ€ฒ(๐‘ฅ2 + ๐‘ก (๐‘ฅ1 โˆ’ ๐‘ฅ2))โ€–๐•ƒ(๐‘‹ ;๐‘Œ )๐‘ก โ€–๐‘ฅ1 โˆ’ ๐‘ฅ2โ€–๐‘‹ ๐‘‘๐‘ก

โ‰ค 1 + โ€–๐น โ€ฒ(๐‘ฅ)โ€–๐•ƒ(๐‘‹ ;๐‘Œ )2 โ€–๐‘ฅ1 โˆ’ ๐‘ฅ2โ€–๐‘‹ .

and thus local Lipschitz continuity near ๐‘ฅ with constant ๐ฟ = 12 (1 + โ€–๐น โ€ฒ(๐‘ฅ)โ€–๐•ƒ(๐‘‹ ;๐‘Œ )). ๏ฟฝ

Note that since the Gรขteaux derivative of ๐น : ๐‘‹ โ†’ โ„ is an element of๐‘‹ โˆ—, it cannot be addedto elements in ๐‘‹ (as required for, e.g., a steepest descent method). However, in Hilbertspaces (and in particular in โ„๐‘ ), we can use the Frรฉchetโ€“Riesz Theorem 1.14 to identify๐ท๐น (๐‘ฅ) โˆˆ ๐‘‹ โˆ— with an element โˆ‡๐น (๐‘ฅ) โˆˆ ๐‘‹ , called the gradient of ๐น at ๐‘ฅ , in a canonical wayvia

ใ€ˆ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘‹ = (โˆ‡๐น (๐‘ฅ) | โ„Ž)๐‘‹ for all โ„Ž โˆˆ ๐‘‹ .As an example, let us consider the functional ๐น (๐‘ฅ) = 1

2 โ€–๐‘ฅ โ€–2๐‘‹ , where the norm is induced bythe inner product. Then we have for all ๐‘ฅ, โ„Ž โˆˆ ๐‘‹ that

๐น โ€ฒ(๐‘ฅ ;โ„Ž) = lim๐‘กโ†’ 0

12 (๐‘ฅ + ๐‘กโ„Ž | ๐‘ฅ + ๐‘กโ„Ž)๐‘‹ โˆ’ 1

2 (๐‘ฅ | ๐‘ฅ)๐‘‹๐‘ก

= (๐‘ฅ | โ„Ž)๐‘‹ = ใ€ˆ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘‹ ,

since the inner product is linear in โ„Ž for xed ๐‘ฅ . Hence, the squared norm is Gรขteauxdierentiable at every ๐‘ฅ โˆˆ ๐‘‹ with derivative ๐ท๐น (๐‘ฅ) = โ„Ž โ†ฆโ†’ (๐‘ฅ | โ„Ž)๐‘‹ โˆˆ ๐‘‹ โˆ—; it is even Frรฉchetdierentiable since

limโ€–โ„Žโ€–๐‘‹โ†’0

๏ฟฝ๏ฟฝ 12 โ€–๐‘ฅ + โ„Žโ€–2๐‘‹ โˆ’ 1

2 โ€–๐‘ฅ โ€–2๐‘‹ โˆ’ (๐‘ฅ, โ„Ž)๐‘‹๏ฟฝ๏ฟฝ

โ€–โ„Žโ€–๐‘‹ = limโ€–โ„Žโ€–๐‘‹โ†’0

12 โ€–โ„Žโ€–๐‘‹ = 0.

The gradient โˆ‡๐น (๐‘ฅ) โˆˆ ๐‘‹ by denition is given by

(โˆ‡๐น (๐‘ฅ) | โ„Ž)๐‘‹ = ใ€ˆ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘‹ = (๐‘ฅ | โ„Ž)๐‘‹ for all โ„Ž โˆˆ ๐‘‹,

i.e., โˆ‡๐น (๐‘ฅ) = ๐‘ฅ . To illustrate how the gradient (in contrast to the derivative) depends on theinner product, let๐‘€ โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) be self-adjoint and positive denite (and thus continuouslyinvertible). Then (๐‘ฅ | ๐‘ฆ)๐‘ โ‰” (๐‘€๐‘ฅ | ๐‘ฆ)๐‘‹ also denes an inner product on the vector space ๐‘‹

23

2 calculus of variations

and induces an (equivalent) norm โ€–๐‘ฅ โ€–๐‘ โ‰” (๐‘ฅ | ๐‘ฅ)1/2๐‘ on ๐‘‹ . Hence (๐‘‹, (ยท | ยท)๐‘ ) is a Hilbertspace as well, which we will denote by ๐‘ . Consider now the functional ๐น : ๐‘ โ†’ โ„ with๐น (๐‘ฅ) โ‰” 1

2 โ€–๐‘ฅ โ€–2๐‘‹ (which is well-dened since โ€– ยท โ€–๐‘‹ is also an equivalent norm on๐‘ ). Then, thederivative ๐ท๐น (๐‘ฅ) โˆˆ ๐‘ โˆ— is still given by ใ€ˆ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘ = (๐‘ฅ | โ„Ž)๐‘‹ for all โ„Ž โˆˆ ๐‘ (or, equivalently,for all โ„Ž โˆˆ ๐‘‹ since we dened ๐‘ via the same vector space). However, โˆ‡๐น (๐‘ฅ) โˆˆ ๐‘ is nowcharacterized by

(๐‘ฅ | โ„Ž)๐‘‹ = ใ€ˆ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘ = (โˆ‡๐น (๐‘ฅ) | โ„Ž)๐‘ = (๐‘€โˆ‡๐น (๐‘ฅ) | โ„Ž)๐‘‹ for all โ„Ž โˆˆ ๐‘,i.e., โˆ‡๐น (๐‘ฅ) = ๐‘€โˆ’1๐‘ฅ โ‰  โˆ‡๐น (๐‘ฅ). (The situation is even more delicate if ๐‘€ is only positivedenite on a subspace, as in the case of ๐‘‹ = ๐ฟ2(ฮฉ) and ๐‘ = ๐ป 1(ฮฉ).)

2.3 superposition operators

A special class of operators on function spaces arise from pointwise application of a real-valued function, e.g.,๐‘ข (๐‘ฅ) โ†ฆโ†’ sin(๐‘ข (๐‘ฅ)). We thus consider for ๐‘“ : ฮฉร—โ„ โ†’ โ„ with ฮฉ โŠ‚ โ„๐‘‘

open and bounded as well as ๐‘, ๐‘ž โˆˆ [1,โˆž] the corresponding superposition or Nemytskiioperator

(2.4) ๐น : ๐ฟ๐‘ (ฮฉ) โ†’ ๐ฟ๐‘ž (ฮฉ), [๐น (๐‘ข)] (๐‘ฅ) = ๐‘“ (๐‘ฅ,๐‘ข (๐‘ฅ)) for almost every ๐‘ฅ โˆˆ ฮฉ.

For this operator to be well-dened requires certain restrictions on ๐‘“ . We call ๐‘“ : ฮฉร—โ„ โ†’ โ„

a Carathรฉodory function if

(i) for all ๐‘ง โˆˆ โ„, the mapping ๐‘ฅ โ†ฆโ†’ ๐‘“ (๐‘ฅ, ๐‘ง) is measurable;

(ii) for almost every ๐‘ฅ โˆˆ ฮฉ, the mapping ๐‘ง โ†ฆโ†’ ๐‘“ (๐‘ฅ, ๐‘ง) is continuous.We additionally require the following growth condition: For given ๐‘, ๐‘ž โˆˆ [1,โˆž) there exist๐‘Ž โˆˆ ๐ฟ๐‘ž (ฮฉ) and ๐‘ โˆˆ ๐ฟโˆž(ฮฉ) with(2.5) |๐‘“ (๐‘ฅ, ๐‘ง) | โ‰ค ๐‘Ž(๐‘ฅ) + ๐‘ (๐‘ฅ) |๐‘ง |๐‘/๐‘ž .Under these conditions, ๐น is even continuous.

Theorem 2.12. If the Carathรฉodory function ๐‘“ : ฮฉ ร— โ„ โ†’ โ„ satises the growth condition(2.5) for ๐‘, ๐‘ž โˆˆ [1,โˆž), then the superposition operator ๐น : ๐ฟ๐‘ (ฮฉ) โ†’ ๐ฟ๐‘ž (ฮฉ) dened via (2.4) iscontinuous.

Proof. We sketch the essential steps; a complete proof can be found in, e.g., [Appell &Zabreiko 1990, Theorems 3.1, 3.7]. First, one shows for given ๐‘ข โˆˆ ๐ฟ๐‘ (ฮฉ) the measurabilityof ๐น (๐‘ข) using the Carathรฉodory properties. It then follows from (2.5) and the triangleinequality that

โ€–๐น (๐‘ข)โ€–๐ฟ๐‘ž โ‰ค โ€–๐‘Žโ€–๐ฟ๐‘ž + โ€–๐‘โ€–๐ฟโˆž โ€–|๐‘ข |๐‘/๐‘ž โ€–๐ฟ๐‘ž = โ€–๐‘Žโ€–๐ฟ๐‘ž + โ€–๐‘โ€–๐ฟโˆž โ€–๐‘ขโ€–๐‘/๐‘ž๐ฟ๐‘ < โˆž,

24

2 calculus of variations

i.e., ๐น (๐‘ข) โˆˆ ๐ฟ๐‘ž (ฮฉ).To show continuity, we consider a sequence {๐‘ข๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐ฟ๐‘ (ฮฉ) with ๐‘ข๐‘› โ†’ ๐‘ข โˆˆ ๐ฟ๐‘ (ฮฉ). Thenthere exists a subsequence, again denoted by {๐‘ข๐‘›}๐‘›โˆˆโ„•, that converges pointwise almosteverywhere in ฮฉ, as well as a ๐‘ฃ โˆˆ ๐ฟ๐‘ (ฮฉ) with |๐‘ข๐‘› (๐‘ฅ) | โ‰ค |๐‘ฃ (๐‘ฅ) | + |๐‘ข1(๐‘ฅ) | =: ๐‘”(๐‘ฅ) for all๐‘› โˆˆ โ„• and almost every ๐‘ฅ โˆˆ ฮฉ (see, e.g., [Alt 2016, Lemma 3.22 as well as (3-14) in the proofof Theorem 3.17]). The continuity of ๐‘ง โ†ฆโ†’ ๐‘“ (๐‘ฅ, ๐‘ง) then implies ๐น (๐‘ข๐‘›) โ†’ ๐น (๐‘ข) pointwisealmost everywhere as well as

| [๐น (๐‘ข๐‘›)] (๐‘ฅ) | โ‰ค ๐‘Ž(๐‘ฅ) + ๐‘ (๐‘ฅ) |๐‘ข๐‘› (๐‘ฅ) |๐‘/๐‘ž โ‰ค ๐‘Ž(๐‘ฅ) + ๐‘ (๐‘ฅ) |๐‘”(๐‘ฅ) |๐‘/๐‘ž for almost every ๐‘ฅ โˆˆ ฮฉ.

Since ๐‘” โˆˆ ๐ฟ๐‘ (ฮฉ), the right-hand side is in ๐ฟ๐‘ž (ฮฉ), and we can apply Lebesgueโ€™s dominatedconvergence theorem to deduce that ๐น (๐‘ข๐‘›) โ†’ ๐น (๐‘ข) in ๐ฟ๐‘ž (ฮฉ). As this argument can beapplied to any subsequence, the whole sequence must converge to ๐น (๐‘ข), which yield theclaimed continuity. ๏ฟฝ

In fact, the growth condition (2.5) is also necessary for continuity; see [Appell & Zabreiko1990, Theorem 3.2]. In addition, it is straightforward to show that for ๐‘ = ๐‘ž = โˆž, thegrowth condition (2.5) (with ๐‘/๐‘ž โ‰” 0 in this case) implies that ๐น is even locally Lipschitzcontinuous.

Similarly, one would like to show that dierentiability of ๐‘“ implies dierentiability of thecorresponding superposition operator ๐น , ideally with โ€œpointwiseโ€ derivative [๐น โ€ฒ(๐‘ข)โ„Ž] (๐‘ฅ) =๐‘“ โ€ฒ(๐‘ข (๐‘ฅ))โ„Ž(๐‘ฅ) (compare Example 1.3 (iii)). However, this does not hold in general; for ex-ample, the superposition operator dened by ๐‘“ (๐‘ฅ, ๐‘ง) = sin(๐‘ง) is not dierentiable at ๐‘ข = 0for 1 โ‰ค ๐‘ = ๐‘ž < โˆž. The reason is that for a Frรฉchet dierentiable superposition operator๐น : ๐ฟ๐‘ (ฮฉ) โ†’ ๐ฟ๐‘ž (ฮฉ) and a direction โ„Ž โˆˆ ๐ฟ๐‘ (ฮฉ), the pointwise(!) product has to satisfy๐น โ€ฒ(๐‘ข)โ„Ž โˆˆ ๐ฟ๐‘ž (ฮฉ). This leads to additional conditions on the superposition operator ๐น โ€ฒdened by ๐‘“ โ€ฒ, which is known as two norm discrepancy.

Theorem 2.13. Let ๐‘“ : ฮฉ ร— โ„ โ†’ โ„ be a Carathรฉodory function that satises the growthcondition (2.5) for 1 โ‰ค ๐‘ž < ๐‘ < โˆž. If the partial derivative ๐‘“ โ€ฒ๐‘ง is a Carathรฉodory functionas well and satises (2.5) for ๐‘โ€ฒ = ๐‘ โˆ’ ๐‘ž, the superposition operator ๐น : ๐ฟ๐‘ (ฮฉ) โ†’ ๐ฟ๐‘ž (ฮฉ) iscontinuously Frรฉchet dierentiable, and its derivative in ๐‘ข โˆˆ ๐ฟ๐‘ (ฮฉ) in direction โ„Ž โˆˆ ๐ฟ๐‘ (ฮฉ) isgiven by

[๐น โ€ฒ(๐‘ข)โ„Ž] (๐‘ฅ) = ๐‘“ โ€ฒ๐‘ง (๐‘ฅ,๐‘ข (๐‘ฅ))โ„Ž(๐‘ฅ) for almost every ๐‘ฅ โˆˆ ฮฉ.

Proof. Theorem 2.12 yields that for ๐‘Ÿ โ‰” ๐‘๐‘ž๐‘โˆ’๐‘ž (i.e.,

๐‘Ÿ๐‘ = ๐‘ž

๐‘ โ€ฒ ), the superposition operator

๐บ : ๐ฟ๐‘ (ฮฉ) โ†’ ๐ฟ๐‘Ÿ (ฮฉ), [๐บ (๐‘ข)] (๐‘ฅ) = ๐‘“ โ€ฒ๐‘ง (๐‘ฅ,๐‘ข (๐‘ฅ)) for almost every ๐‘ฅ โˆˆ ฮฉ,

is well-dened and continuous. The Hรถlder inequality further implies that for any ๐‘ข โˆˆ๐ฟ๐‘ (ฮฉ),(2.6) โ€–๐บ (๐‘ข)โ„Žโ€–๐ฟ๐‘ž โ‰ค โ€–๐บ (๐‘ข)โ€–๐ฟ๐‘Ÿ โ€–โ„Žโ€–๐ฟ๐‘ for all โ„Ž โˆˆ ๐ฟ๐‘ (ฮฉ),

25

2 calculus of variations

i.e., the pointwise multiplication โ„Ž โ†ฆโ†’ ๐บ (๐‘ข)โ„Ž denes a bounded linear operator ๐ท๐น (๐‘ข) :๐ฟ๐‘ (ฮฉ) โ†’ ๐ฟ๐‘ž (ฮฉ).Let now โ„Ž โˆˆ ๐ฟ๐‘ (ฮฉ) be arbitrary. Since ๐‘ง โ†ฆโ†’ ๐‘“ (๐‘ฅ, ๐‘ง) is continuously dierentiable byassumption, the classical mean value theorem together with the properties of the integral(in particular, monotonicity, Jensenโ€™s inequality on [0, 1], and Fubiniโ€™s theorem) and (2.6)implies that

โ€–๐น (๐‘ข + โ„Ž) โˆ’ ๐น (๐‘ข) โˆ’ ๐ท๐น (๐‘ข)โ„Žโ€–๐ฟ๐‘ž

=

(โˆซฮฉ|๐‘“ (๐‘ฅ,๐‘ข (๐‘ฅ) + โ„Ž(๐‘ฅ)) โˆ’ ๐‘“ (๐‘ฅ,๐‘ข (๐‘ฅ)) โˆ’ ๐‘“ โ€ฒ๐‘ง (๐‘ฅ,๐‘ข (๐‘ฅ))โ„Ž(๐‘ฅ) |๐‘ž ๐‘‘๐‘ฅ

) 1๐‘ž

=

(โˆซฮฉ

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝโˆซ 1

0๐‘“ โ€ฒ๐‘ง (๐‘ฅ,๐‘ข (๐‘ฅ) + ๐‘กโ„Ž(๐‘ฅ))โ„Ž(๐‘ฅ) ๐‘‘๐‘ก โˆ’ ๐‘“ โ€ฒ๐‘ง (๐‘ฅ,๐‘ข (๐‘ฅ))โ„Ž(๐‘ฅ)

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ๐‘ž ๐‘‘๐‘ฅ) 1๐‘ž

โ‰ค(โˆซ 1

0

โˆซฮฉ

๏ฟฝ๏ฟฝ (๐‘“ โ€ฒ๐‘ง (๐‘ฅ,๐‘ข (๐‘ฅ) + ๐‘กโ„Ž(๐‘ฅ)) โˆ’ ๐‘“ โ€ฒ๐‘ง (๐‘ฅ,๐‘ข (๐‘ฅ))) โ„Ž(๐‘ฅ)๏ฟฝ๏ฟฝ๐‘ž ๐‘‘๐‘ฅ ๐‘‘๐‘ก ) 1๐‘ž

=โˆซ 1

0โ€–(๐บ (๐‘ข + ๐‘กโ„Ž) โˆ’๐บ (๐‘ข))โ„Žโ€–๐ฟ๐‘ž ๐‘‘๐‘ก

โ‰คโˆซ 1

0โ€–๐บ (๐‘ข + ๐‘กโ„Ž) โˆ’๐บ (๐‘ข)โ€–๐ฟ๐‘Ÿ ๐‘‘๐‘ก โ€–โ„Žโ€–๐ฟ๐‘ .

Due to the continuity of ๐บ : ๐ฟ๐‘ (ฮฉ) โ†’ ๐ฟ๐‘Ÿ (ฮฉ), the integrand tends to zero uniformly in[0, 1] for โ€–โ„Žโ€–๐ฟ๐‘ โ†’ 0, and hence ๐น is by denition Frรฉchet dierentiable with derivative๐น โ€ฒ(๐‘ข) = ๐ท๐น (๐‘ข) (whose continuity we have already shown). ๏ฟฝ

2.4 variational principles

As the example ๐‘“ (๐‘ก) = 1/๐‘ก on {๐‘ก โˆˆ โ„ : ๐‘ก โ‰ฅ 1} shows, the coercivity requirement inTheorem 2.1 is necessary to obtain minimizers even if the functional is bounded from below.However, sometimes one does not need an exact minimizer and is satised with โ€œalmostminimizersโ€. Variational principles state that such almost minimizers can be obtained asminimizers of a perturbed functional and even give a precise relation between the size ofthe perturbation needed in terms of the desired distance from the inmum.

The most well-known variational principle is Ekelandโ€™s variational principle, which holdsin general complete metric spaces but which we here state in Banach spaces for the sakeof notation. In the statement of the following theorem, note that we do not assume thefunctional to be weakly lower semicontinuous.

Theorem 2.14 (Ekelandโ€™s variational principle). Let ๐‘‹ be a Banach space and ๐น : ๐‘‹ โ†’ โ„ beproper, lower semicontinuous, and bounded from below. Let Y > 0 and ๐‘งY โˆˆ ๐‘‹ be such that

๐น (๐‘งY) < inf๐‘ฅโˆˆ๐‘‹

๐น (๐‘ฅ) + Y.

26

2 calculus of variations

Then for any _ > 0, there exists an ๐‘ฅ_ โˆˆ ๐‘‹ with

(i) โ€–๐‘ฅ_ โˆ’ ๐‘งY โ€–๐‘‹ โ‰ค _,

(ii) ๐น (๐‘ฅ_) + Y_ โ€–๐‘ฅ_ โˆ’ ๐‘งY โ€–๐‘‹ โ‰ค ๐น (๐‘งY),

(iii) ๐น (๐‘ฅ_) < ๐น (๐‘ฅ) + Y_ โ€–๐‘ฅ โˆ’ ๐‘ฅ_โ€–๐‘‹ for all ๐‘ฅ โˆˆ ๐‘‹ \ {๐‘ฅ_}.

Proof. The proof proceeds similarly to that of Theorem 2.1: We construct an โ€œalmostminimizingโ€ sequence, show that it converges, and verify that the limit has the desiredproperties. Here we proceed inductively. First, set ๐‘ฅ0 โ‰” ๐‘งY . For given ๐‘ฅ๐‘› , dene now

๐‘†๐‘› โ‰”{๐‘ฅ โˆˆ ๐‘‹

๏ฟฝ๏ฟฝ๏ฟฝ ๐น (๐‘ฅ) + Y_โ€–๐‘ฅ โˆ’ ๐‘ฅ๐‘›โ€–๐‘‹ โ‰ค ๐น (๐‘ฅ๐‘›)

}.

Since ๐‘ฅ๐‘› โˆˆ ๐‘†๐‘› , this set is nonempty. We can thus choose ๐‘ฅ๐‘›+1 โˆˆ ๐‘†๐‘› such that

(2.7) ๐น (๐‘ฅ๐‘›+1) โ‰ค 12๐น (๐‘ฅ๐‘›) +

12 inf๐‘ฅโˆˆ๐‘†๐‘›

๐น (๐‘ฅ),

which is possible because either the right-hand side equals ๐น (๐‘ฅ๐‘›) (in which case we choose๐‘ฅ๐‘›+1 = ๐‘ฅ๐‘›) or is strictly greater, in which case there must exist such an ๐‘ฅ๐‘›+1 by the propertiesof the inmum. By construction, the sequence {๐น (๐‘ฅ๐‘›)}๐‘›โˆˆโ„• is thus decreasing as well asbounded from below and therefore convergent. Using the triangle inequality, the fact that๐‘ฅ๐‘›+1 โˆˆ ๐‘†๐‘› , and the telescoping sum, we also obtain that for any๐‘š โ‰ฅ ๐‘› โˆˆ โ„•,

(2.8) Y

_โ€–๐‘ฅ๐‘› โˆ’ ๐‘ฅ๐‘šโ€–๐‘‹ โ‰ค

๐‘šโˆ’1โˆ‘๐‘—=๐‘›

Y

_โ€–๐‘ฅ ๐‘— โˆ’ ๐‘ฅ ๐‘—+1โ€–๐‘‹ โ‰ค ๐น (๐‘ฅ๐‘›) โˆ’ ๐น (๐‘ฅ๐‘š).

Hence, {๐‘ฅ๐‘›}๐‘›โˆˆโ„• is a Cauchy sequence since {๐น (๐‘ฅ๐‘›)}๐‘›โˆˆโ„• is one and hence converges to some๐‘ฅ_ โˆˆ ๐‘‹ since ๐‘‹ is complete.

We now show that this limit has the claimed properties. We begin with (ii), for which weuse the fact that both ๐น and the norm in ๐‘‹ are lower semicontinuous and hence obtainfrom (2.8) by taking๐‘š โ†’ โˆž that

(2.9) Y

_โ€–๐‘ฅ๐‘› โˆ’ ๐‘ฅ_โ€–๐‘‹ + ๐น (๐‘ฅ_) โ‰ค lim sup

๐‘šโ†’โˆžY

_โ€–๐‘ฅ๐‘› โˆ’ ๐‘ฅ๐‘šโ€–๐‘‹ + ๐น (๐‘ฅ๐‘š) โ‰ค ๐น (๐‘ฅ๐‘›) for any ๐‘› โ‰ฅ 0.

Choosing in particular ๐‘› = 0 such that ๐‘ฅ0 = ๐‘งY yields (ii).

Furthermore, by denition of ๐‘งY , this implies that

Y

_โ€–๐‘งY โˆ’ ๐‘ฅ_โ€–๐‘‹ โ‰ค ๐น (๐‘งY) โˆ’ ๐น (๐‘ฅ_) โ‰ค ๐น (๐‘งY) โˆ’ inf

๐‘ฅโˆˆ๐‘‹๐น (๐‘ฅ) < Y

and hence (i).

27

2 calculus of variations

Assume now that (iii) does not hold, i.e., that there exists an ๐‘ฅ โˆˆ ๐‘‹ \ {๐‘ฅ_} such that

(2.10) ๐น (๐‘ฅ) โ‰ค ๐น (๐‘ฅ_) โˆ’Y

_โ€–๐‘ฅ โˆ’ ๐‘ฅ_โ€–๐‘‹ < ๐น (๐‘ฅ_).

Estimating ๐น (๐‘ฅ_) using (2.9) and then using the productive zero together with the triangleinequality, we obtain from the rst inequality that for all ๐‘› โˆˆ โ„•,

๐น (๐‘ฅ) โ‰ค ๐น (๐‘ฅ๐‘›) โˆ’ Y

_โ€–๐‘ฅ๐‘› โˆ’ ๐‘ฅ_โ€–๐‘‹ โˆ’ Y

_โ€–๐‘ฅ โˆ’ ๐‘ฅ_โ€–๐‘‹ โ‰ค ๐น (๐‘ฅ๐‘›) โˆ’ Y

_โ€–๐‘ฅ๐‘› โˆ’ ๐‘ฅ โ€–๐‘‹ .

Hence, ๐‘ฅ โˆˆ ๐‘†๐‘› for all ๐‘› โˆˆ โ„•. From (2.7), we then deduce that

2๐น (๐‘ฅ๐‘›+1) โˆ’ ๐น (๐‘ฅ๐‘›) โ‰ค ๐น (๐‘ฅ) for all ๐‘› โˆˆ โ„•.

The convergence of {๐น (๐‘ฅ๐‘›)}๐‘›โˆˆโ„• together with (2.10) and the lower semicontinuity of ๐น thusyields the contradiction

lim๐‘›โ†’โˆž ๐น (๐‘ฅ๐‘›) โ‰ค ๐น (๐‘ฅ) < ๐น (๐‘ฅ_) โ‰ค lim

๐‘›โ†’โˆž ๐น (๐‘ฅ๐‘›). ๏ฟฝ

Ekelandโ€™s variational principle has the disadvantage that even for dierentiable ๐น , theperturbed function that is minimized by ๐‘ฅ_ is inherently nonsmooth. This is dierent forsmooth variational principles such as the following one due to Borwein and Preiss [Borwein& Preiss 1987].

Theorem 2.15 (Borweinโ€“Preiss variational principle). Let๐‘‹ be a Banach space and ๐น : ๐‘‹ โ†’ โ„

be proper, lower semicontinuous, and bounded from below. Let Y > 0 and ๐‘งY โˆˆ ๐‘‹ be such that

๐น (๐‘งY) < inf๐‘ฅโˆˆ๐‘‹

๐น (๐‘ฅ) + Y.

Then for any _ > 0 and ๐‘ โ‰ฅ 1, there exists

โ€ข a sequence {๐‘ฅ๐‘›}๐‘›โˆˆโ„•0 โŠ‚ ๐‘‹ with ๐‘ฅ0 = ๐‘งY converging strongly to some ๐‘ฅ_ โˆˆ ๐‘‹ and

โ€ข a sequence {`๐‘›}๐‘›โˆˆโ„•0 โŠ‚ (0,โˆž) with โˆ‘โˆž๐‘›=0 `๐‘› = 1

such that

(i) โ€–๐‘ฅ_ โˆ’ ๐‘ฅ๐‘›โ€–๐‘‹ โ‰ค _ for all ๐‘› โˆˆ โ„• โˆช {0},(ii) ๐น (๐‘ฅ_) + Y

_๐‘โˆ‘โˆž๐‘›=0 `๐‘›โ€–๐‘ฅ_ โˆ’ ๐‘ฅ๐‘›โ€–๐‘๐‘‹ โ‰ค ๐น (๐‘งY),

(iii) ๐น (๐‘ฅ_) + Y_๐‘

โˆ‘โˆž๐‘›=0 `๐‘›โ€–๐‘ฅ_ โˆ’ ๐‘ฅ๐‘›โ€–๐‘๐‘‹ โ‰ค ๐น (๐‘ฅ) + Y

_๐‘โˆ‘โˆž๐‘›=0 `๐‘›โ€–๐‘ฅ โˆ’ ๐‘ฅ๐‘›โ€–๐‘๐‘‹ for all ๐‘ฅ โˆˆ ๐‘‹ .

Proof. We proceed similarly to the proof of Theorem 2.14 by induction. First, we choseconstants ๐›พ, [, `, \ > 0 such that

โ€ข ๐น (๐‘งY) โˆ’ inf๐‘ฅโˆˆ๐‘‹ ๐น (๐‘ฅ) < [ < ๐›พ < Y,

28

2 calculus of variations

โ€ข ` < 1 โˆ’ ๐›พY ,

โ€ข \ < `

(1 โˆ’

([๐›พ

) 1/๐‘ )๐‘.

Let now ๐‘ฅ0 โ‰” ๐‘งY and ๐น0 โ‰” ๐น and set ๐›ฟ := (1 โˆ’ `) Y_๐‘ > 0. We then dene

๐น1(๐‘ฅ) := ๐น0(๐‘ฅ) + ๐›ฟ`โ€–๐‘ฅ โˆ’ ๐‘ฅ0โ€–๐‘ for all ๐‘ฅ โˆˆ ๐‘‹ .By construction, we then have

inf๐‘ฅโˆˆ๐‘‹

๐น1(๐‘ฅ) โ‰ค ๐น1(๐‘ฅ0) = ๐น0(๐‘ฅ0),

and thus we can nd, by the same argument as for (2.7), an ๐‘ฅ1 โˆˆ ๐‘‹ with

๐น1(๐‘ฅ1) โ‰ค \๐น0(๐‘ฅ0) + (1 โˆ’ \ ) inf๐‘ฅโˆˆ๐‘‹

๐น1(๐‘ฅ).

Continuing in this manner, we obtain sequences {๐‘ฅ๐‘›}๐‘›โˆˆโ„• and {๐น๐‘›}๐‘›โˆˆโ„• with

(2.11) ๐น๐‘›+1(๐‘ฅ) = ๐น๐‘› (๐‘ฅ) + ๐›ฟ`๐‘›โ€–๐‘ฅ โˆ’ ๐‘ฅ๐‘›โ€–๐‘๐‘‹and

(2.12) ๐น๐‘›+1(๐‘ฅ๐‘›+1) โ‰ค \๐น๐‘› (๐‘ฅ๐‘›) + (1 โˆ’ \ ) inf๐‘ฅโˆˆ๐‘‹

๐น (๐‘ฅ).

Set now ๐‘ ๐‘› โ‰” inf๐‘ฅโˆˆ๐‘‹ ๐น๐‘› (๐‘ฅ) and ๐‘Ž๐‘› โ‰” ๐น๐‘› (๐‘ฅ๐‘›). Then (2.11) implies that {๐‘ ๐‘›}๐‘›โ‰ฅ0 is monoton-ically increasing, while (2.12) implies that {๐‘Ž๐‘›}๐‘›โ‰ฅ0 is monotonically decreasing. We thushave

(2.13) ๐‘ ๐‘› โ‰ค ๐‘ ๐‘›+1 โ‰ค ๐‘Ž๐‘›+1 โ‰ค \๐‘Ž๐‘› + (1 โˆ’ \ )๐‘ ๐‘›+1 โ‰ค ๐‘Ž๐‘›,which can be rearranged to show for all ๐‘› โ‰ฅ 0 that

(2.14) ๐‘Ž๐‘›+1 โˆ’ ๐‘ ๐‘›+1 โ‰ค \๐‘Ž๐‘› + (1 โˆ’ \ )๐‘ ๐‘›+1 โˆ’ ๐‘ ๐‘›+1 = \ (๐‘Ž๐‘› โˆ’ ๐‘ ๐‘›+1) โ‰ค \ (๐‘Ž๐‘› โˆ’ ๐‘ ๐‘›) โ‰ค \๐‘› (๐‘Ž0 โˆ’ ๐‘ 0).This together with the monotonicity of the two sequences and the boundedness of ๐น frombelow shows that lim๐‘›โ†’โˆž ๐‘Ž๐‘› = lim๐‘›โ†’โˆž ๐‘ ๐‘› โˆˆ โ„. We now use (2.11) in (2.13) to obtain that

๐‘Ž๐‘› โ‰ฅ ๐‘Ž๐‘›+1 = ๐น๐‘› (๐‘ฅ๐‘›) + ๐›ฟ`๐‘›โ€–๐‘ฅ๐‘›+1 โˆ’ ๐‘ฅ๐‘›โ€–๐‘ โ‰ฅ ๐‘ ๐‘› + ๐›ฟ`๐‘›โ€–๐‘ฅ๐‘›+1 โˆ’ ๐‘ฅ๐‘›โ€–๐‘,which together with (2.14) and the choice of [ yields

๐›ฟ`๐‘›โ€–๐‘ฅ๐‘›+1 โˆ’ ๐‘ฅ๐‘›โ€–๐‘๐‘‹ โ‰ค ๐‘Ž๐‘› โˆ’ ๐‘ ๐‘› โ‰ค \๐‘› (๐‘Ž0 โˆ’ ๐‘ 0) < [\๐‘› .The choice of \ and ` now ensure that 0 < \

` < 1, which implies that

(2.15) โ€–๐‘ฅ๐‘š โˆ’ ๐‘ฅ๐‘›โ€–๐‘‹ โ‰ค๐‘šโˆ’๐‘›โˆ’1โˆ‘๐‘˜=๐‘›

โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–๐‘‹ โ‰ค([๐›ฟ

) 1/๐‘ ๐‘šโˆ’๐‘›โˆ’1โˆ‘๐‘˜=๐‘›

(\

`

)๐‘˜/๐‘โ‰ค

([๐›ฟ

) 1/๐‘ (\

`

)๐‘›/๐‘ (1 โˆ’

(\

`

) 1/๐‘)โˆ’1for all๐‘š,๐‘› โ‰ฅ 0

29

2 calculus of variations

using the partial geometric series

๐‘šโˆ’๐‘›โˆ’1โˆ‘๐‘˜=๐‘›

๐›ผ๐‘˜ =๐‘šโˆ’๐‘›โˆ’1โˆ‘๐‘˜=0

๐›ผ๐‘˜ โˆ’๐‘›โˆ’1โˆ‘๐‘˜=0

๐›ผ๐‘˜ =1 โˆ’ ๐›ผ๐‘šโˆ’๐‘›

1 โˆ’ ๐›ผ โˆ’ 1 โˆ’ ๐›ผ๐‘›1 โˆ’ ๐›ผ <

๐›ผ๐‘›

1 โˆ’ ๐›ผ

valid for any ๐›ผ โˆˆ (0, 1). Hence {๐‘ฅ๐‘›}๐‘› โˆˆ โ„• is a Cauchy sequence which therefore convergesto some ๐‘ฅ_ โˆˆ ๐‘‹ . Setting `๐‘› โ‰” `๐‘› (1 โˆ’ `) > 0, we also have โˆ‘โˆž

๐‘›=0 `๐‘› = 1 by the choice of` < 1. Furthermore, the denition of `๐‘› and ๐›ฟ implies for all ๐‘ฅ โˆˆ ๐‘‹ that

(2.16) ๐น (๐‘ฅ) + Y

_๐‘

โˆžโˆ‘๐‘˜=0

`๐‘˜ โ€–๐‘ฅ โˆ’ ๐‘ฅ๐‘˜ โ€–๐‘๐‘‹ = lim๐‘›โ†’โˆž ๐น (๐‘ฅ) +

๐‘›โˆ‘๐‘˜=0

๐›ฟ`๐‘˜ โ€–๐‘ฅ โˆ’ ๐‘ฅ๐‘˜ โ€–๐‘๐‘‹ = lim๐‘›โ†’โˆž ๐น๐‘› (๐‘ฅ).

It remains to verify the claims on ๐‘ฅ_ . First, (2.15) together with the choice of \ and ๐›ฟ impliesfor all ๐‘›,๐‘š โ‰ฅ 0 that

โ€–๐‘ฅ๐‘š โˆ’ ๐‘ฅ๐‘›โ€–๐‘‹ โ‰ค([๐›ฟ

) 1/๐‘ ([

๐›พ

)โˆ’1/๐‘=

(๐›พ๐›ฟ

) 1/๐‘<

( Y๐›ฟ

) 1/๐‘(1 โˆ’ `)1/๐‘ = _.

Letting๐‘š โ†’ โˆž for xed ๐‘› โˆˆ โ„• โˆช {0} now shows (i).

Second, by (2.11) and the denition of ๐›ฟ , we have

๐น (๐‘ฅ๐‘›) + Y

_๐‘

โˆžโˆ‘๐‘˜=0

`๐‘˜ โ€–๐‘ฅ๐‘› โˆ’ ๐‘ฅ๐‘˜ โ€–๐‘๐‘‹ = ๐น๐‘› (๐‘ฅ๐‘›) + Y

_๐‘

โˆžโˆ‘๐‘˜=๐‘›+1

`๐‘˜ โ€–๐‘ฅ๐‘› โˆ’ ๐‘ฅ๐‘˜ โ€–๐‘๐‘‹ โ‰ค ๐‘Ž๐‘› + Yโˆžโˆ‘

๐‘˜=๐‘›+1`๐‘˜ ,

where the inequality follows from (i). The lower semicontinuity of ๐น and of the norm thusyield

(2.17) ๐น (๐‘ฅ_) +Y

_๐‘

โˆžโˆ‘๐‘˜=0

`๐‘˜ โ€–๐‘ฅ_ โˆ’ ๐‘ฅ๐‘˜ โ€–๐‘๐‘‹ โ‰ค lim๐‘›โ†’โˆž๐‘Ž๐‘› โ‰ค ๐‘Ž0 = ๐น (๐‘งY)

since {๐‘Ž๐‘›}๐‘›โ‰ฅ0 is monotonically decreasing. This shows (ii).

Finally, (2.16) and the denition of ๐‘ ๐‘› imply for all ๐‘ฅ โˆˆ ๐‘‹ that

๐น (๐‘ฅ) + Y

_๐‘

โˆžโˆ‘๐‘˜=0

`๐‘˜ โ€–๐‘ฅ โˆ’ ๐‘ฅ๐‘˜ โ€–๐‘๐‘‹ = lim๐‘›โ†’โˆž ๐น๐‘› (๐‘ฅ) โ‰ฅ lim

๐‘›โ†’โˆž ๐‘ ๐‘› = lim๐‘›โ†’โˆž๐‘Ž๐‘›,

which together with (2.17) yields (iii). ๏ฟฝ

The Borweinโ€“Preiss variational principle therefore guarantees a smooth perturbationif, e.g., ๐‘‹ is a Hilbert space and ๐‘ = 2. Further smooth variational principles that allowfor more general smooth perturbations such as the Devilleโ€“Godefroyโ€“Zizzler variationalprinciple can be found in, e.g., [Borwein & Zhu 2005; Schirotzek 2007].

30

Part II

CONVEX ANALYSIS

31

3 CONVEX FUNCTIONS

The classical derivative concepts from the previous chapter are not sucient for ourpurposes, since many interesting functionals are not dierentiable in this sense; also, theycannot handle functionals with values in โ„. We therefore need a derivative concept that ismore general than Gรขteaux and Frรฉchet derivatives and still allows a Fermat principle aswell as a rich calculus. Throughout this and the following chapters, ๐‘‹ will be a normedvector space unless noted otherwise.

3.1 basic properties

We rst consider a general class of functionals that admit such a generalized derivative. Afunctional ๐น : ๐‘‹ โ†’ โ„ is called convex if for all ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ and _ โˆˆ [0, 1], it holds that

(3.1) ๐น (_๐‘ฅ + (1 โˆ’ _)๐‘ฆ) โ‰ค _๐น (๐‘ฅ) + (1 โˆ’ _)๐น (๐‘ฆ)

(where the function valueโˆž is allowed on both sides). If for all ๐‘ฅ, ๐‘ฆ โˆˆ dom ๐น with ๐‘ฅ โ‰  ๐‘ฆand all _ โˆˆ (0, 1) we even have

๐น (_๐‘ฅ + (1 โˆ’ _)๐‘ฆ) < _๐น (๐‘ฅ) + (1 โˆ’ _)๐น (๐‘ฆ),

we call ๐น strictly convex.

An alternative characterization of the convexity of a functional ๐น : ๐‘‹ โ†’ โ„ is based on itsepigraph

epi ๐น โ‰” {(๐‘ฅ, ๐‘ก) โˆˆ ๐‘‹ ร—โ„ | ๐น (๐‘ฅ) โ‰ค ๐‘ก} .

Lemma 3.1. Let ๐น : ๐‘‹ โ†’ โ„. Then epi ๐น is

(i) nonempty if and only if ๐น is proper;

(ii) convex if and only if ๐น is convex;

(iii) (weakly) closed if and only if ๐น is (weakly) lower semicontinuous.1

1For that reason, some authors use the term closed to refer to lower semicontinuous functionals. We willstick with the latter, much less ambiguous, term throughout the following.

32

3 convex functions

Proof. Statement (i) follows directly from the denition: ๐น is proper if and only if thereexists an ๐‘ฅ โˆˆ ๐‘‹ and a ๐‘ก โˆˆ โ„ with ๐น (๐‘ฅ) โ‰ค ๐‘ก < โˆž, i.e., (๐‘ฅ, ๐‘ก) โˆˆ epi ๐น .

For (ii), let ๐น be convex and (๐‘ฅ, ๐‘Ÿ ), (๐‘ฆ, ๐‘ ) โˆˆ epi ๐น be given. For any _ โˆˆ [0, 1], the denition(3.1) then implies that

๐น (_๐‘ฅ + (1 โˆ’ _)๐‘ฆ) โ‰ค _๐น (๐‘ฅ) + (1 โˆ’ _)๐น (๐‘ฆ) โ‰ค _๐‘Ÿ + (1 โˆ’ _)๐‘ ,

i.e., that_(๐‘ฅ, ๐‘Ÿ ) + (1 โˆ’ _) (๐‘ฆ, ๐‘ ) = (_๐‘ฅ + (1 โˆ’ _)๐‘ฆ, _๐‘Ÿ + (1 โˆ’ _)๐‘ ) โˆˆ epi ๐น,

and hence epi ๐น is convex. Let conversely epi ๐น be convex and ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ be arbitrary,where we can assume that ๐น (๐‘ฅ) < โˆž and ๐น (๐‘ฆ) < โˆž (otherwise (3.1) is trivially satised).We clearly have (๐‘ฅ, ๐น (๐‘ฅ)), (๐‘ฆ, ๐น (๐‘ฆ)) โˆˆ epi ๐น . The convexity of epi ๐น then implies for all_ โˆˆ [0, 1] that

(_๐‘ฅ + (1 โˆ’ _)๐‘ฆ, _๐น (๐‘ฅ) + (1 โˆ’ _)๐น (๐‘ฆ)) = _(๐‘ฅ, ๐น (๐‘ฅ)) + (1 โˆ’ _) (๐‘ฆ, ๐น (๐‘ฆ)) โˆˆ epi ๐น,

and hence by denition of epi ๐น that (3.1) holds.

Finally, we show (iii): Let rst ๐น be lower semicontinuous, and let {(๐‘ฅ๐‘›, ๐‘ก๐‘›)}๐‘›โˆˆโ„• โŠ‚ epi ๐น bean arbitrary sequence with (๐‘ฅ๐‘›, ๐‘ก๐‘›) โ†’ (๐‘ฅ, ๐‘ก) โˆˆ ๐‘‹ ร—โ„. Then we have that

๐น (๐‘ฅ) โ‰ค lim inf๐‘›โ†’โˆž ๐น (๐‘ฅ๐‘›) โ‰ค lim sup

๐‘›โ†’โˆž๐‘ก๐‘› = ๐‘ก,

i.e., (๐‘ฅ, ๐‘ก) โˆˆ epi ๐น . Let conversely epi ๐น be closed and assume that ๐น is proper (otherwisethe claim holds trivially) and not lower semicontinuous. Then there exists a sequence{๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘‹ with ๐‘ฅ๐‘› โ†’ ๐‘ฅ โˆˆ ๐‘‹ and

๐น (๐‘ฅ) > lim inf๐‘›โ†’โˆž ๐น (๐‘ฅ๐‘›) =: ๐‘€ โˆˆ [โˆ’โˆž,โˆž) .

We now distinguish two cases.

a) ๐‘ฅ โˆˆ dom ๐น : In this case, we can select a subsequence, again denoted by {๐‘ฅ๐‘›}๐‘›โˆˆโ„•, suchthat there exists an Y > 0with ๐น (๐‘ฅ๐‘›) โ‰ค ๐น (๐‘ฅ) โˆ’Y and thus (๐‘ฅ๐‘›, ๐น (๐‘ฅ) โˆ’Y) โˆˆ epi ๐น for all๐‘› โˆˆ โ„•. From ๐‘ฅ๐‘› โ†’ ๐‘ฅ and the closedness of epi ๐น , we deduce that (๐‘ฅ, ๐น (๐‘ฅ) โˆ’ Y) โˆˆ epi ๐นand hence ๐น (๐‘ฅ) โ‰ค ๐น (๐‘ฅ) โˆ’ Y, contradicting Y > 0.

b) ๐‘ฅ โˆ‰ dom ๐น : In this case, we can argue similarly using ๐น (๐‘ฅ๐‘›) โ‰ค ๐‘€ + Y for๐‘€ > โˆ’โˆž or๐น (๐‘ฅ๐‘›) โ‰ค Y for๐‘€ = โˆ’โˆž to obtain a contradiction with ๐น (๐‘ฅ) = โˆž.

The equivalence of weak lower semicontinuity and weak closedness follows in exactly thesame way. ๏ฟฝ

Note that (๐‘ฅ, ๐‘ก) โˆˆ epi ๐น implies that ๐‘ฅ โˆˆ dom ๐น ; hence the eective domain of a proper,convex, and lower semicontinuous functional is always nonempty, convex, and closed aswell. Also, together with Lemma 1.10 we immediately obtain

33

3 convex functions

Corollary 3.2. Let ๐น : ๐‘‹ โ†’ โ„ be convex. Then ๐น is weakly lower semicontinuous if and only๐น is lower semicontinuous.

Also useful for the study of a functional ๐น : ๐‘‹ โ†’ โ„ are the corresponding sublevel sets

sub๐‘ก ๐น โ‰” {๐‘ฅ โˆˆ ๐‘‹ | ๐น (๐‘ฅ) โ‰ค ๐‘ก} , ๐‘ก โˆˆ โ„,

for which one shows as in Lemma 3.1 the following properties.

Lemma 3.3. Let ๐น : ๐‘‹ โ†’ โ„.

(i) If ๐น is convex, sub๐‘ก ๐น is convex for all ๐‘ก โˆˆ โ„ (but the converse does not hold).

(ii) ๐น is (weakly) lower semicontinuous if and only if sub๐‘ก ๐น is (weakly) closed for all ๐‘ก โˆˆ โ„.

Directly from the denition we obtain the convexity of

(i) continuous ane functionals of the form ๐‘ฅ โ†ฆโ†’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐›ผ for xed ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— and๐›ผ โˆˆ โ„;

(ii) the norm โ€– ยท โ€–๐‘‹ in a normed vector space ๐‘‹ ;

(iii) the indicator function ๐›ฟ๐ถ for a convex set ๐ถ .

If ๐‘‹ is a Hilbert space, ๐น (๐‘ฅ) = โ€–๐‘ฅ โ€–2๐‘‹ is even strictly convex: For ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ with ๐‘ฅ โ‰  ๐‘ฆ andany _ โˆˆ (0, 1),

โ€–_๐‘ฅ + (1 โˆ’ _)๐‘ฆ โ€–2๐‘‹ = (_๐‘ฅ + (1 โˆ’ _)๐‘ฆ | _๐‘ฅ + (1 โˆ’ _)๐‘ฆ)๐‘‹= _2(๐‘ฅ | ๐‘ฅ)๐‘‹ + 2_(1 โˆ’ _) (๐‘ฅ | ๐‘ฆ)๐‘‹ + (1 โˆ’ _)2(๐‘ฆ | ๐‘ฆ)๐‘‹= _

(_(๐‘ฅ | ๐‘ฅ)๐‘‹ โˆ’ (1 โˆ’ _) (๐‘ฅ โˆ’ ๐‘ฆ | ๐‘ฅ)๐‘‹ + (1 โˆ’ _) (๐‘ฆ | ๐‘ฆ)๐‘‹

)+ (1 โˆ’ _)

(_(๐‘ฅ | ๐‘ฅ)๐‘‹ + _(๐‘ฅ โˆ’ ๐‘ฆ | ๐‘ฆ)๐‘‹ + (1 โˆ’ _) (๐‘ฆ | ๐‘ฆ)๐‘‹

)= (_ + (1 โˆ’ _))

(_(๐‘ฅ | ๐‘ฅ)๐‘‹ + (1 โˆ’ _) (๐‘ฆ | ๐‘ฆ)๐‘‹

)โˆ’ _(1 โˆ’ _) (๐‘ฅ โˆ’ ๐‘ฆ | ๐‘ฅ โˆ’ ๐‘ฆ)๐‘‹

= _โ€–๐‘ฅ โ€–2๐‘‹ + (1 โˆ’ _)โ€–๐‘ฆ โ€–2๐‘‹ โˆ’ _(1 โˆ’ _)โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹< _โ€–๐‘ฅ โ€–2๐‘‹ + (1 โˆ’ _)โ€–๐‘ฆ โ€–2๐‘‹ .

Further examples can be constructed as in Lemma 2.3 through the following operations.

Lemma 3.4. Let ๐‘‹ and ๐‘Œ be normed vector spaces and let ๐น : ๐‘‹ โ†’ โ„ be convex. Then thefollowing functionals are convex as well:

(i) ๐›ผ๐น for all ๐›ผ โ‰ฅ 0;

(ii) ๐น +๐บ for ๐บ : ๐‘‹ โ†’ โ„ convex (if ๐น or ๐บ are strictly convex, so is ๐น +๐บ);

34

3 convex functions

(iii) ๐œ‘ โ—ฆ ๐น for ๐œ‘ : โ„ โ†’ โ„ convex and increasing;

(iv) ๐น โ—ฆ ๐พ for ๐พ : ๐‘Œ โ†’ ๐‘‹ linear;

(v) ๐‘ฅ โ†ฆโ†’ sup๐‘–โˆˆ๐ผ ๐น๐‘– (๐‘ฅ) with ๐น๐‘– : ๐‘‹ โ†’ โ„ convex for an arbitrary set ๐ผ .

Lemma 3.4 (v) in particular implies that the pointwise supremum of continuous anefunctionals is always convex. In fact, any convex functional can be written in this way. Toshow this, we dene for a proper functional ๐น : ๐‘‹ โ†’ โ„ the convex envelope

๐น ฮ“ (๐‘ฅ) โ‰” sup {๐‘Ž(๐‘ฅ) | ๐‘Ž continuous ane with ๐‘Ž(๐‘ฅ) โ‰ค ๐น (๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘‹ } .

Note that ๐น ฮ“ : ๐‘‹ โ†’ [โˆ’โˆž,โˆž] without further assumptions of ๐น .

Lemma 3.5. Let ๐น : ๐‘‹ โ†’ โ„ be proper. Then ๐น is convex and lower semicontinuous if and onlyif ๐น = ๐น ฮ“ .

Proof. Since ane functionals are convex, Lemma 3.4 (v) and Lemma 2.3 (v) imply that๐น = ๐น ฮ“ is always continuous and lower semicontinuous.

Let now ๐น : ๐‘‹ โ†’ โ„ be proper, convex, and lower semicontinuous. It is clear from thedenition of ๐น ฮ“ as a pointwise supremum that ๐น ฮ“ โ‰ค ๐น always holds. Assume therefore that๐น ฮ“ < ๐น . Then there exists an ๐‘ฅ0 โˆˆ ๐‘‹ and a _ โˆˆ โ„ with

๐น ฮ“ (๐‘ฅ0) < _ < ๐น (๐‘ฅ0).

We now use the Hahnโ€“Banach separation theorem to construct a continuous ane func-tional ๐‘Ž โˆˆ ๐‘‹ โˆ— with ๐‘Ž โ‰ค ๐น but ๐‘Ž(๐‘ฅ0) > _ > ๐น ฮ“ (๐‘ฅ0), which would contradict the denition of๐น ฮ“ . Since ๐น is proper, convex, and lower semicontinuous, epi ๐น is nonempty, convex, andclosed by Lemma 3.1. Furthermore, {(๐‘ฅ0, _)} is compact and, as _ < ๐น (๐‘ฅ0), disjoint withepi ๐น . Theorem 1.5 (ii) hence yields a ๐‘งโˆ— โˆˆ (๐‘‹ ร—โ„)โˆ— and an ๐›ผ โˆˆ โ„ with

ใ€ˆ๐‘งโˆ—, (๐‘ฅ, ๐‘ก)ใ€‰๐‘‹ร—โ„ โ‰ค ๐›ผ < ใ€ˆ๐‘งโˆ—, (๐‘ฅ0, _)ใ€‰๐‘‹ร—โ„ for all (๐‘ฅ, ๐‘ก) โˆˆ epi ๐น .

We now dene an ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— via ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ = ใ€ˆ๐‘งโˆ—, (๐‘ฅ, 0)ใ€‰๐‘‹ร—โ„ for all ๐‘ฅ โˆˆ ๐‘‹ and set ๐‘  โ‰”ใ€ˆ๐‘งโˆ—, (0, 1)ใ€‰๐‘‹ร—โ„ โˆˆ โ„. Then ใ€ˆ๐‘งโˆ—, (๐‘ฅ, ๐‘ก)ใ€‰๐‘‹ร—โ„ = ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ + ๐‘ ๐‘ก and hence

(3.2) ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ + ๐‘ ๐‘ก โ‰ค ๐›ผ < ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ0ใ€‰๐‘‹ + ๐‘ _ for all (๐‘ฅ, ๐‘ก) โˆˆ epi ๐น .

Now for (๐‘ฅ, ๐‘ก) โˆˆ epi ๐น we also have (๐‘ฅ, ๐‘ก โ€ฒ) โˆˆ epi ๐น for all ๐‘ก โ€ฒ > ๐‘ก , and the rst inequality in(3.2) implies that for all suciently large ๐‘ก โ€ฒ > 0,

๐‘  โ‰ค ๐›ผ โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹๐‘ก โ€ฒ

โ†’ 0 for ๐‘ก โ€ฒ โ†’ โˆž.

Hence ๐‘  โ‰ค 0. We continue with a case distinction.

35

3 convex functions

(i) ๐‘  < 0: We set๐‘Ž : ๐‘‹ โ†’ โ„, ๐‘ฅ โ†ฆโ†’ ๐›ผ โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹

๐‘ ,

which is ane and continuous. Furthermore, (๐‘ฅ, ๐น (๐‘ฅ)) โˆˆ epi ๐น for any ๐‘ฅ โˆˆ dom ๐น ,and using the productive zero in the rst inequality in (3.2) implies (noting ๐‘  < 0!)that

๐‘Ž(๐‘ฅ) = 1๐‘  (๐›ผ โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐‘ ๐น (๐‘ฅ)) + ๐น (๐‘ฅ) โ‰ค ๐น (๐‘ฅ).

(For ๐‘ฅ โˆ‰ dom ๐น this holds trivially.) But the second inequality in (3.2) implies that

๐‘Ž(๐‘ฅ0) = 1๐‘  (๐›ผ โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ0ใ€‰๐‘‹ ) > _.

(ii) ๐‘  = 0: Then ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค ๐›ผ < ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ0ใ€‰๐‘‹ for all ๐‘ฅ โˆˆ dom ๐น , which can only hold for๐‘ฅ0 โˆ‰ dom ๐น . But ๐น is proper, and hence we can nd a ๐‘ฆ0 โˆˆ dom ๐น , for which wecan construct as in case (i) by separating epi ๐น and (๐‘ฆ0, `) for suciently small ` acontinuous ane functional ๐‘Ž0 : ๐‘‹ โ†’ โ„ with ๐‘Ž0 โ‰ค ๐น pointwise. For ๐œŒ > 0 we nowset

๐‘Ž๐œŒ : ๐‘‹ โ†’ โ„, ๐‘ฅ โ†ฆโ†’ ๐‘Ž0(๐‘ฅ) + ๐œŒ (ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐›ผ) ,which is continuous ane as well. Since ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค ๐›ผ , we also have that ๐‘Ž๐œŒ (๐‘ฅ) โ‰ค๐‘Ž0(๐‘ฅ) โ‰ค ๐น (๐‘ฅ) for all ๐‘ฅ โˆˆ dom ๐น and arbitrary ๐œŒ > 0. But due to ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ0ใ€‰๐‘‹ > ๐›ผ , wecan choose ๐œŒ > 0 with ๐‘Ž๐œŒ (๐‘ฅ0) > _.

In both cases, the denition of ๐น ฮ“ as a supremum implies that ๐น ฮ“ (๐‘ฅ0) > _ as well, contra-dicting the assumption ๐น ฮ“ (๐‘ฅ0) < _. ๏ฟฝ

Remark 3.6. Using the weak-โˆ— Hahnโ€“Banach Theorem 1.13 in place of Theorem 1.5, the same proofshows that a proper functional ๐น : ๐‘‹ โˆ— โ†’ โ„ is convex and weakly-โˆ— lower semicontinuous if andonly if ๐น = ๐นฮ“ for

๐นฮ“ (๐‘ฅโˆ—) โ‰” sup {ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ + ๐›ผ | ๐‘ฅ โˆˆ ๐‘‹, ๐›ผ โˆˆ โ„, ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ + ๐›ผ โ‰ค ๐น (๐‘ฅโˆ—) for all ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—} .

(Note that a convex and weakly-โˆ— lower semicontinuous functional need not be lower semicontinu-ous, since convex and closed sets need not be weakly-โˆ— closed.)

A particularly useful class of convex functionals in the calculus of variations arises fromintegral functionals with convex integrands dened through superposition operators.

Lemma 3.7. Let ๐‘“ : โ„ โ†’ โ„ be proper, convex, and lower semicontinuous. If ฮฉ โŠ‚ โ„๐‘‘ isbounded and 1 โ‰ค ๐‘ โ‰ค โˆž, this also holds for

๐น : ๐ฟ๐‘ (ฮฉ) โ†’ โ„, ๐‘ข โ†ฆโ†’{โˆซ

ฮฉ๐‘“ (๐‘ข (๐‘ฅ)) ๐‘‘๐‘ฅ if ๐‘“ โ—ฆ ๐‘ข โˆˆ ๐ฟ1(ฮฉ),

โˆž else.

36

3 convex functions

Proof. First, Lemma 3.5 implies that there exist ๐‘, ๐‘ โˆˆ โ„ such that

(3.3) ๐‘“ (๐‘ก) โ‰ฅ ๐‘๐‘ก โˆ’ ๐‘ for all ๐‘ก โˆˆ โ„.

Since ฮฉ is bounded, this proves that ๐น (๐‘ข) > โˆ’โˆž for all ๐‘ข โˆˆ ๐ฟ๐‘ (ฮฉ) โŠ‚ ๐ฟ1(ฮฉ), consequentlyfor ๐‘ข โˆˆ ๐ฟ๐‘ (ฮฉ) as ๐ฟ๐‘ (ฮฉ) โŠ‚ ๐ฟ1(ฮฉ) due to the boundedness of ฮฉ. Since ๐‘“ is proper, there is a๐‘ก0 โˆˆ dom ๐‘“ . Hence, also using that ฮฉ is bounded, the constant function ๐‘ข0 โ‰ก ๐‘ก0 โˆˆ dom ๐นsatises ๐น (๐‘ข0) < โˆž. This shows that ๐น is proper.

To show convexity, we take ๐‘ข, ๐‘ฃ โˆˆ dom ๐น (since otherwise (3.1) is trivially satised) and_ โˆˆ [0, 1] arbitrary. The convexity of ๐‘“ now implies that

๐‘“ (_๐‘ข (๐‘ฅ) + (1 โˆ’ _)๐‘ฃ (๐‘ฅ)) โ‰ค _๐‘“ (๐‘ข (๐‘ฅ)) + (1 โˆ’ _) ๐‘“ (๐‘ฃ (๐‘ฅ)) for almost every ๐‘ฅ โˆˆ ฮฉ.

Since ๐‘ข, ๐‘ฃ โˆˆ dom ๐น and ๐ฟ1(ฮฉ) is a vector space, _๐‘“ (๐‘ข (๐‘ฅ)) + (1 โˆ’ _) ๐‘“ (๐‘ฃ (๐‘ฅ)) โˆˆ ๐ฟ1(ฮฉ) as well.Similarly, the left-hand side is bounded from below by ๐‘ (_๐‘ข (๐‘ฅ) + (1 โˆ’ _)๐‘ฃ (๐‘ฅ)) โˆ’ ๐‘ โˆˆ ๐ฟ1(ฮฉ)by (3.3). We can thus integrate the inequality over ฮฉ to obtain the convexity of ๐น .

To show lower semicontinuity, we use Lemma 3.1. Let {(๐‘ข๐‘›, ๐‘ก๐‘›)}๐‘›โˆˆโ„• โŠ‚ epi ๐น with ๐‘ข๐‘› โ†’ ๐‘ขin ๐ฟ๐‘ (ฮฉ) and ๐‘ก๐‘› โ†’ ๐‘ก in โ„. Then there exists a subsequence {๐‘ข๐‘›๐‘˜ }๐‘˜โˆˆโ„• with ๐‘ข๐‘›๐‘˜ (๐‘ฅ) โ†’ ๐‘ข (๐‘ฅ)almost everywhere. Hence, the lower semicontinuity of ๐‘“ together with Fatouโ€™s Lemmaimplies thatโˆซ

ฮฉ๐‘“ (๐‘ข (๐‘ฅ)) โˆ’ (๐‘๐‘ข (๐‘ฅ) โˆ’ ๐›ผ) ๐‘‘๐‘ฅ โ‰ค

โˆซฮฉlim inf๐‘˜โ†’โˆž

(๐‘“ (๐‘ข๐‘›๐‘˜ (๐‘ฅ)) โˆ’ (๐‘๐‘ข๐‘›๐‘˜ (๐‘ฅ) โˆ’ ๐›ผ)) ๐‘‘๐‘ฅ

โ‰ค lim inf๐‘˜โ†’โˆž

โˆซฮฉ๐‘“ (๐‘ข๐‘›๐‘˜ (๐‘ฅ)) โˆ’ (๐‘๐‘ข๐‘›๐‘˜ (๐‘ฅ) โˆ’ ๐›ผ) ๐‘‘๐‘ฅ

= lim inf๐‘˜โ†’โˆž

โˆซฮฉ๐‘“ (๐‘ข๐‘›๐‘˜ (๐‘ฅ)) ๐‘‘๐‘ฅ โˆ’

โˆซฮฉ๐‘๐‘ข (๐‘ฅ) โˆ’ ๐›ผ ๐‘‘๐‘ฅ

as the integrands are nonnegative due to (3.3). Since (๐‘ข๐‘›๐‘˜ , ๐‘ก๐‘›๐‘˜ ) โˆˆ epi ๐น , this yields

๐น (๐‘ข) =โˆซฮฉ๐‘“ (๐‘ข (๐‘ฅ)) ๐‘‘๐‘ฅ โ‰ค lim inf

๐‘˜โ†’โˆž

โˆซฮฉ๐‘“ (๐‘ข๐‘›๐‘˜ (๐‘ฅ)) ๐‘‘๐‘ฅ = lim inf

๐‘˜โ†’โˆž๐น (๐‘ข๐‘›๐‘˜ ) โ‰ค lim

๐‘˜โ†’โˆž๐‘ก๐‘›๐‘˜ = ๐‘ก,

i.e., (๐‘ข, ๐‘ก) โˆˆ epi ๐น . Hence epi ๐น is closed, and the lower semicontinuity of ๐น follows fromLemma 3.1 (iii). ๏ฟฝ

3.2 existence of minimizers

After all this preparation, we can quickly prove the main result on existence of solutionsto convex minimization problems.

Theorem 3.8. Let ๐‘‹ be a reexive Banach space and let

37

3 convex functions

(i) ๐‘ˆ โŠ‚ ๐‘‹ be nonempty, convex, and closed;

(ii) ๐น : ๐‘‹ โ†’ โ„ be proper, convex, and lower semicontinuous with dom ๐น โˆฉ๐‘ˆ โ‰  โˆ…;(iii) ๐‘ˆ be bounded or ๐น be coercive.

Then the problemmin๐‘ฅโˆˆ๐‘ˆ

๐น (๐‘ฅ)admits a solution ๐‘ฅ โˆˆ ๐‘ˆ โˆฉ dom ๐น . If ๐น is strictly convex, the solution is unique.

Proof. We consider the extended functional ๐น = ๐น + ๐›ฟ๐‘ˆ : ๐‘‹ โ†’ โ„. Assumption (i) togetherwith Lemma 2.5 implies that ๐›ฟ๐‘ˆ is proper, convex, and weakly lower semicontinuous.From (i) we obtain an ๐‘ฅ0 โˆˆ ๐‘ˆ with ๐น (๐‘ฅ0) < โˆž, and hence ๐น is proper, convex, and (byCorollary 3.2) weakly lower semicontinuous. Finally, due to (iii), ๐น is coercive since forbounded ๐‘ˆ , we can use that ๐น > โˆ’โˆž, and for coercive ๐น , we can use that ๐›ฟ๐‘ˆ โ‰ฅ 0. Hencewe can apply Theorem 2.1 to obtain the existence of a minimizer ๐‘ฅ โˆˆ dom ๐น = ๐‘ˆ โˆฉ dom ๐นof ๐น with

๐น (๐‘ฅ) = ๐น (๐‘ฅ) โ‰ค ๐น (๐‘ฅ) = ๐น (๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘ˆ ,i.e., ๐‘ฅ is the claimed solution.

Let now ๐น be strictly convex, and let ๐‘ฅ and ๐‘ฅโ€ฒ โˆˆ ๐‘ˆ be two dierent minimizers, i.e.,๐น (๐‘ฅ) = ๐น (๐‘ฅโ€ฒ) = min๐‘ฅโˆˆ๐‘ˆ ๐น (๐‘ฅ) and ๐‘ฅ โ‰  ๐‘ฅโ€ฒ. Then by the convexity of ๐‘ˆ we have for all_ โˆˆ (0, 1) that

๐‘ฅ_ โ‰” _๐‘ฅ + (1 โˆ’ _)๐‘ฅโ€ฒ โˆˆ ๐‘ˆ ,while the strict convexity of ๐น implies that

๐น (๐‘ฅ_) < _๐น (๐‘ฅ) + (1 โˆ’ _)๐น (๐‘ฅโ€ฒ) = ๐น (๐‘ฅ).

But this is a contradiction to ๐น (๐‘ฅ) โ‰ค ๐น (๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘ˆ . ๏ฟฝ

Note that for a sum of two convex functionals to be coercive, it is in general not sucientthat only one of them is. Functionals for which this is the case โ€“ such as the indicatorfunction of a bounded set โ€“ are called supercoercive; another example which will be helpfullater is the squared norm.

Lemma 3.9. Let ๐น : ๐‘‹ โ†’ โ„ be proper, convex, and lower semicontinuous, and ๐‘ฅ0 โˆˆ ๐‘‹ begiven. Then the functional

๐ฝ : ๐‘‹ โ†’ โ„, ๐‘ฅ โ†ฆโ†’ ๐น (๐‘ฅ) + 12 โ€–๐‘ฅ โˆ’ ๐‘ฅ0โ€–2๐‘‹

is coercive.

38

3 convex functions

Proof. Since ๐น is proper, convex, and lower semicontinuous, it follows from Lemma 3.5 that๐น is bounded from below by a continuous ane functional, i.e., there exists an ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—

and an ๐›ผ โˆˆ โ„ with ๐น (๐‘ฅ) โ‰ฅ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ + ๐›ผ for all ๐‘ฅ โˆˆ ๐‘‹ . Together with the reverse triangleinequality and (1.1), we obtain that

๐ฝ (๐‘ฅ) โ‰ฅ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ + ๐›ผ + 12 (โ€–๐‘ฅ โ€–๐‘‹ โˆ’ โ€–๐‘ฅ0โ€–๐‘‹ )2

โ‰ฅ โˆ’โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ€–๐‘ฅ โ€–๐‘‹ + ๐›ผ + 12 โ€–๐‘ฅ โ€–2๐‘‹ โˆ’ โ€–๐‘ฅ โ€–๐‘‹ โ€–๐‘ฅ0โ€–๐‘‹

= โ€–๐‘ฅ โ€–๐‘‹( 12 โ€–๐‘ฅ โ€–๐‘‹ โˆ’ โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โˆ’ โ€–๐‘ฅ0โ€–๐‘‹

) + ๐›ผ.Since ๐‘ฅโˆ— and ๐‘ฅ0 are xed, the term in parentheses is positive for โ€–๐‘ฅ โ€–๐‘‹ suciently large,and hence ๐ฝ (๐‘ฅ) โ†’ โˆž for โ€–๐‘ฅ โ€–๐‘‹ โ†’ โˆž as claimed. ๏ฟฝ

3.3 continuity properties

To close this chapter, we show the following remarkable result: Any (locally) bounded convexfunctional is (locally) continuous. (An extended real-valued proper functional is necessarilydiscontinuous at some point.) Besides being of use in later chapters, this result illustratesthe beauty of convex analysis: an algebraic but global property (convexity) connects twotopological but local properties (neighborhood and continuity). Here we consider of coursethe strong topology in a normed vector space.

Lemma 3.10. Let ๐‘‹ be a normed vector space, ๐น : ๐‘‹ โ†’ โ„ be convex, and ๐‘ฅ โˆˆ ๐‘‹ . If there is a๐œŒ > 0 such that ๐น is bounded from above on ๐•†(๐‘ฅ, ๐œŒ), then ๐น is locally Lipschitz continuousnear ๐‘ฅ .

Proof. By assumption, there exists an๐‘€ โˆˆ โ„ with ๐น (๐‘ฆ) โ‰ค ๐‘€ for all ๐‘ฆ โˆˆ ๐•†(๐‘ฅ, ๐œŒ). We rstshow that ๐น is locally bounded from below as well. Let ๐‘ฆ โˆˆ ๐•†(๐‘ฅ, ๐œŒ) be arbitrary. Sinceโ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–๐‘‹ < ๐œŒ , we also have that ๐‘ง โ‰” 2๐‘ฅ โˆ’ ๐‘ฆ = ๐‘ฅ โˆ’ (๐‘ฆ โˆ’ ๐‘ฅ) โˆˆ ๐•†(๐‘ฅ, ๐œŒ), and the convexity of๐น implies that ๐น (๐‘ฅ) = ๐น ( 1

2๐‘ฆ + 12๐‘ง

) โ‰ค 12๐น (๐‘ฆ) + 1

2๐น (๐‘ง) and hence that

โˆ’๐น (๐‘ฆ) โ‰ค ๐น (๐‘ง) โˆ’ 2๐น (๐‘ฅ) โ‰ค ๐‘€ โˆ’ 2๐น (๐‘ฅ) =:๐‘š,

i.e., โˆ’๐‘š โ‰ค ๐น (๐‘ฆ) โ‰ค ๐‘€ for all ๐‘ฆ โˆˆ ๐•†(๐‘ฅ, ๐œŒ).We now show that this implies Lipschitz continuity on ๐•†(๐‘ฅ, ๐œŒ2 ). Let ๐‘ฆ1, ๐‘ฆ2 โˆˆ ๐•†(๐‘ฅ, ๐œŒ2 ) with๐‘ฆ1 โ‰  ๐‘ฆ2 and set

๐‘ง โ‰” ๐‘ฆ1 + ๐œŒ2๐‘ฆ1 โˆ’ ๐‘ฆ2

โ€–๐‘ฆ1 โˆ’ ๐‘ฆ2โ€–๐‘‹ โˆˆ ๐•†(๐‘ฅ, ๐œŒ),

which holds because โ€–๐‘ง โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค โ€–๐‘ฆ1 โˆ’ ๐‘ฅ โ€–๐‘‹ + ๐œŒ2 < ๐œŒ . By construction, we thus have that

๐‘ฆ1 = _๐‘ง + (1 โˆ’ _)๐‘ฆ2 for _ โ‰”โ€–๐‘ฆ1 โˆ’ ๐‘ฆ2โ€–๐‘‹

โ€–๐‘ฆ1 โˆ’ ๐‘ฆ2โ€–๐‘‹ + ๐œŒ2โˆˆ (0, 1),

39

3 convex functions

and the convexity of ๐น now implies that ๐น (๐‘ฆ1) โ‰ค _๐น (๐‘ง) + (1 โˆ’ _)๐น (๐‘ฆ2). Together with thedenition of _ as well as ๐น (๐‘ง) โ‰ค ๐‘€ and โˆ’๐น (๐‘ฆ2) โ‰ค ๐‘š = ๐‘€ โˆ’ 2๐น (๐‘ฅ), this yields the estimate

๐น (๐‘ฆ1) โˆ’ ๐น (๐‘ฆ2) โ‰ค _(๐น (๐‘ง) โˆ’ ๐น (๐‘ฆ2)) โ‰ค _(2๐‘€ โˆ’ 2๐น (๐‘ฅ))=

2(๐‘€ โˆ’ ๐น (๐‘ฅ))โ€–๐‘ฆ1 โˆ’ ๐‘ฆ2โ€–๐‘‹ + ๐œŒ

2โ€–๐‘ฆ1 โˆ’ ๐‘ฆ2โ€–๐‘‹

โ‰ค 2(๐‘€ โˆ’ ๐น (๐‘ฅ))๐œŒ/2 โ€–๐‘ฆ1 โˆ’ ๐‘ฆ2โ€–๐‘‹ .

Exchanging the roles of ๐‘ฆ1 and ๐‘ฆ2, we obtain that

|๐น (๐‘ฆ1) โˆ’ ๐น (๐‘ฆ2) | โ‰ค 2(๐‘€ โˆ’ ๐น (๐‘ฅ))๐œŒ/2 โ€–๐‘ฆ1 โˆ’ ๐‘ฆ2โ€–๐‘‹ for all ๐‘ฆ1, ๐‘ฆ2 โˆˆ ๐•†(๐‘ฅ, ๐œŒ2 )

and hence the local Lipschitz continuity with constant ๐ฟ(๐‘ฅ, ๐œŒ/2) โ‰” 4(๐‘€ โˆ’ ๐น (๐‘ฅ))/๐œŒ . ๏ฟฝ

This result can be extended by showing that convex functions are bounded everywhere inthe interior (again a topological concept!) of their eective domain. As an intermediarystep, we rst consider the scalar case.2

Lemma 3.11. Let ๐‘“ : โ„ โ†’ โ„ be convex. Then ๐‘“ is locally bounded from above on int(dom ๐‘“ ).

Proof. Let ๐‘ฅ โˆˆ int(dom ๐‘“ ), i.e., there exist ๐‘Ž, ๐‘ โˆˆ โ„ with ๐‘ฅ โˆˆ (๐‘Ž, ๐‘) โŠ‚ dom ๐‘“ . Let now๐‘ง โˆˆ (๐‘Ž, ๐‘). Since intervals are convex, there exists a _ โˆˆ (0, 1) with ๐‘ง = _๐‘Ž + (1 โˆ’ _)๐‘. Byconvexity, we thus have

๐‘“ (๐‘ง) โ‰ค _๐‘“ (๐‘Ž) + (1 โˆ’ _) ๐‘“ (๐‘) โ‰ค max{๐‘“ (๐‘Ž), ๐‘“ (๐‘)} < โˆž.Hence ๐‘“ is locally bounded from above in ๐‘ฅ . ๏ฟฝ

The proof of the general case requires further assumptions on ๐‘‹ and ๐น .

Theorem 3.12. Let ๐‘‹ be a Banach space ๐น : ๐‘‹ โ†’ โ„ be convex and lower semicontinuous.Then ๐น is locally bounded from above on int(dom ๐น ).

Proof. We rst show the claim for the case ๐‘ฅ = 0 โˆˆ int(dom ๐น ), which implies in particularthat๐‘€ โ‰” |๐น (0) | is nite. Consider now for arbitrary โ„Ž โˆˆ ๐‘‹ the mapping

๐‘“ : โ„ โ†’ โ„, ๐‘ก โ†ฆโ†’ ๐น (๐‘กโ„Ž).It is straightforward to verify that ๐‘“ is convex and lower semicontinuous as well andsatises 0 โˆˆ int(dom ๐‘“ ). By Lemmas 3.10 and 3.11, ๐‘“ is thus locally Lipschitz continuous2With a bit more eort, one can show that the claim holds for ๐น : โ„๐‘ โ†’ โ„ with arbitrary ๐‘ โˆˆ โ„•; see, e.g.,[Schirotzek 2007, Corollary 1.4.2].

40

3 convex functions

near 0; hence in particular |๐‘“ (๐‘ก) โˆ’ ๐‘“ (0) | โ‰ค ๐ฟ๐‘ก โ‰ค 1 for suciently small ๐‘ก > 0. The reversetriangle inequality therefore yields a ๐›ฟ > 0 with

๐น (0 + ๐‘กโ„Ž) โ‰ค |๐น (0 + ๐‘กโ„Ž) | = |๐‘“ (๐‘ก) | โ‰ค |๐‘“ (0) | + 1 = ๐‘€ + 1 for all ๐‘ก โˆˆ [0, ๐›ฟ] .

Hence, 0 lies in the algebraic interior of the sublevel set sub๐‘€+1 ๐น , which is convex andclosed (since we assumed ๐น to be lower semicontinuous) by Lemma 3.3. The coreโ€“intLemma 1.2 thus yields that 0 โˆˆ int(sub๐‘€+1 ๐น ), i.e., there exists a ๐œŒ > 0 with ๐น (๐‘ง) โ‰ค ๐‘€ + 1for all ๐‘ง โˆˆ ๐•†(0, ๐œŒ).For the general case ๐‘ฅ โˆˆ int(dom ๐น ), consider

๐น : ๐‘‹ โ†’ โ„, ๐‘ฆ โ†ฆโ†’ ๐น (๐‘ฆ โˆ’ ๐‘ฅ).

Again, it is straightforward to verify convexity and lower semicontinuity of ๐น and that0 โˆˆ int(dom ๐น ). It follows from the above that ๐น is locally bounded from above on ๐•†(0, ๐œŒ),which immediately implies that ๐น is locally bounded from above on ๐•†(๐‘ฅ, ๐œŒ). ๏ฟฝ

Together with Lemma 3.10, we thus obtain the desired result.

Theorem 3.13. Let ๐‘‹ be a Banach space ๐น : ๐‘‹ โ†’ โ„ be convex and lower semicontinuous.Then ๐น is locally Lipschitz continuous on int(dom ๐น ).

We shall have several more occasions to observe the unreasonably nice behavior of convexlower semicontinuous functions on the interior of their eective domain.

41

4 CONVEX SUBDIFFERENTIALS

We now turn to the characterization of minimizers of convex functionals via a Fermatprinciple.

4.1 definition and basic properties

We rst dene our notion of generalized derivative. The motivation is geometric: Theclassical derivative ๐‘“ โ€ฒ(๐‘ก) of a scalar function ๐‘“ : โ„ โ†’ โ„ at ๐‘ก can be interpreted as the slopeof the tangent at ๐‘“ (๐‘ก). If the function is not dierentiable, the tangent โ€“ if it exists at all โ€“need no longer be unique. The idea is thus to dene as the generalized derivative the set ofall tangent slopes. Correspondingly, we dene in a normed vector space ๐‘‹ the (convex)subdierential of ๐น : ๐‘‹ โ†’ โ„ at ๐‘ฅ โˆˆ dom ๐น as

(4.1) ๐œ•๐น (๐‘ฅ) โ‰” {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค ๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘‹ } .

(Note that ๐‘ฅ โˆ‰ dom ๐น is allowed since in this case the inequality is trivially satised.) For๐‘ฅ โˆ‰ dom ๐น , we set ๐œ•๐น (๐‘ฅ) = โˆ…. An element ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ) is called a subderivative. (Followingthe terminology for classical derivatives, we reserve the more common term subgradientfor its Riesz representation ๐‘ง๐‘ฅโˆ— โˆˆ ๐‘‹ when ๐‘‹ is a Hilbert space.)

The following example shows that the subdierential can also be empty for ๐‘ฅ โˆˆ dom ๐น ,even if ๐น is convex.

Example 4.1. We take ๐‘‹ = โ„ (and hence ๐‘‹ โˆ— ๏ฟฝ ๐‘‹ = โ„) and consider

๐น (๐‘ฅ) ={โˆ’โˆš๐‘ฅ if ๐‘ฅ โ‰ฅ 0,โˆž if ๐‘ฅ < 0.

Since (3.1) is trivially satised if ๐‘ฅ or ๐‘ฆ is negative, we can assume ๐‘ฅ, ๐‘ฆ โ‰ฅ 0 so thatwe are allowed to take the square of both sides of (3.1). A straightforward algebraicmanipulation then shows that this is equivalent to ๐‘ก (๐‘ก โˆ’ 1) (โˆš๐‘ฅ โˆ’โˆš

๐‘ฆ)2 โ‰ฅ 0, which holdsfor any ๐‘ฅ, ๐‘ฆ โ‰ฅ 0 and ๐‘ก โˆˆ [0, 1]. Hence, ๐น is convex.

42

4 convex subdifferentials

However, for ๐‘ฅ = 0, any ๐‘ฅโˆ— โˆˆ ๐œ•๐น (0) by denition must satisfy

๐‘ฅโˆ— ยท ๐‘ฅ โ‰ค โˆ’โˆš๐‘ฅ for all ๐‘ฅ โˆˆ โ„.

Taking now ๐‘ฅ > 0 arbitrary, we can divide by it on both sides and let ๐‘ฅ โ†’ 0 to obtain

๐‘ฅโˆ— โ‰ค โˆ’(โˆš๐‘ฅ)โˆ’1

โ†’ โˆ’โˆž.

This is impossible for ๐‘ฅโˆ— โˆˆ โ„ ๏ฟฝ ๐‘‹ โˆ—. Hence, ๐œ•๐น (0) is empty.

In fact, it will become clear that the nonexistence of tangents is much more problematicthan the nonuniqueness. However, we will later show that for proper, convex, and lowersemicontinuous functionals, ๐œ•๐น (๐‘ฅ) is nonempty (and bounded) for all ๐‘ฅ โˆˆ int(dom ๐น ); seeTheorem 13.17. Furthermore, it follows directly from the denition that for all ๐‘ฅ โˆˆ ๐‘‹ , theset ๐œ•๐น (๐‘ฅ) is convex and weakly-โˆ— closed.The denition immediately yields a Fermat principle.

Theorem 4.2 (Fermat principle). Let ๐น : ๐‘‹ โ†’ โ„ and ๐‘ฅ โˆˆ dom ๐น . Then the followingstatements are equivalent:

(i) 0 โˆˆ ๐œ•๐น (๐‘ฅ);(ii) ๐น (๐‘ฅ) = min

๐‘ฅโˆˆ๐‘‹๐น (๐‘ฅ).

Proof. This is a direct consequence of the denitions: 0 โˆˆ ๐œ•๐น (๐‘ฅ) if and only if

0 = ใ€ˆ0, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค ๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘‹,

i.e., ๐น (๐‘ฅ) โ‰ค ๐น (๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘‹ .1 ๏ฟฝ

This matches the geometrical intuition: If ๐‘‹ = โ„ ๏ฟฝ ๐‘‹ โˆ—, the ane function ๐น (๐‘ฅ) โ‰”๐น (๐‘ฅ) + ๐‘ฅโˆ—(๐‘ฅ โˆ’ ๐‘ฅ) with ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ) describes a tangent at (๐‘ฅ, ๐น (๐‘ฅ)) with slope ๐‘ฅโˆ—; thecondition ๐‘ฅโˆ— = 0 โˆˆ ๐œ•๐น (๐‘ฅ) thus means that ๐น has a horizontal tangent in ๐‘ฅ . (Conversely, thefunction from Example 4.1 only has a vertical tangent in ๐‘ฅ = 0, which corresponds to aninnite slope that is not an element of any vector space.)

1Note that convexity of ๐น is not required for Theorem 4.2! The condition 0 โˆˆ ๐œ•๐น (๐‘ฅ) therefore characterizesthe global(!) minimizers of any function ๐น . However, nonconvex functionals can also have localminimizers,for which the subdierential inclusion is not satised. In fact, (convex) subdierentials of nonconvexfunctionals are usually empty. (And conversely, one can show that ๐œ•๐น (๐‘ฅ) โ‰  โˆ… for all ๐‘ฅ โˆˆ dom ๐น impliesthat ๐น is convex.) This leads to problems in particular for the proof of calculus rules, for which we willindeed have to assume convexity.

43

4 convex subdifferentials

Not surprisingly, the convex subdierential behaves more nicely for convex functions. Thekey property is an alternative characterization using directional derivatives, which exist(at least in the extended real-valued sense) for any convex function.

Lemma 4.3. Let ๐น : ๐‘‹ โ†’ โ„ be convex and let ๐‘ฅ โˆˆ dom ๐น and โ„Ž โˆˆ ๐‘‹ be given. Then:

(i) the function

๐œ‘ : (0,โˆž) โ†’ โ„, ๐‘ก โ†ฆโ†’ ๐น (๐‘ฅ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฅ)๐‘ก

,

is increasing;

(ii) there exists a limit ๐น โ€ฒ(๐‘ฅ ;โ„Ž) = lim๐‘กโ†’ 0 ๐œ‘ (๐‘ก) โˆˆ [โˆ’โˆž,โˆž], which satises

๐น โ€ฒ(๐‘ฅ ;โ„Ž) โ‰ค ๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฅ);

(iii) if ๐‘ฅ โˆˆ int(dom ๐น ), the limit ๐น โ€ฒ(๐‘ฅ ;โ„Ž) is nite.

Proof. (i): Inserting the denition and sorting terms shows that for all 0 < ๐‘  โ‰ค ๐‘ก , thecondition ๐œ‘ (๐‘ ) โ‰ค ๐œ‘ (๐‘ก) is equivalent to

๐น (๐‘ฅ + ๐‘ โ„Ž) โ‰ค ๐‘ 

๐‘ก๐น (๐‘ฅ + ๐‘กโ„Ž) +

(1 โˆ’ ๐‘ 

๐‘ก

)๐น (๐‘ฅ),

which follows from the convexity of ๐น since ๐‘ฅ + ๐‘ โ„Ž = ๐‘ ๐‘ก (๐‘ฅ + ๐‘กโ„Ž) + (1 โˆ’ ๐‘ 

๐‘ก )๐‘ฅ .(ii): The claim immediately follows from (i) since

๐น โ€ฒ(๐‘ฅ ;โ„Ž) = lim๐‘กโ†’ 0

๐œ‘ (๐‘ก) = inf๐‘ก>0

๐œ‘ (๐‘ก) โ‰ค ๐œ‘ (1) = ๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฅ) โˆˆ โ„.

(iii): Since int(dom ๐น ) is contained in the algebraic interior of dom ๐น , there exists an Y > 0such that ๐‘ฅ + ๐‘กโ„Ž โˆˆ dom ๐น for all ๐‘ก โˆˆ (โˆ’Y, Y). Proceeding as in (i), we obtain that ๐œ‘ (๐‘ ) โ‰ค ๐œ‘ (๐‘ก)for all ๐‘  < ๐‘ก < 0 as well. From ๐‘ฅ = 1

2 (๐‘ฅ + ๐‘กโ„Ž) + 12 (๐‘ฅ โˆ’ ๐‘กโ„Ž) for ๐‘ก > 0, we also obtain that

๐œ‘ (โˆ’๐‘ก) = ๐น (๐‘ฅ โˆ’ ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฅ)โˆ’๐‘ก โ‰ค ๐น (๐‘ฅ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฅ)

๐‘ก= ๐œ‘ (๐‘ก)

and hence that ๐œ‘ is increasing on all โ„ \ {0}. As in (ii), the choice of Y now implies that

โˆ’โˆž < ๐œ‘ (โˆ’Y) โ‰ค ๐น โ€ฒ(๐‘ฅ ;โ„Ž) โ‰ค ๐œ‘ (Y) < โˆž. ๏ฟฝ

Lemma 4.4. Let ๐น : ๐‘‹ โ†’ โ„ be convex and ๐‘ฅ โˆˆ dom ๐น . Then

๐œ•๐น (๐‘ฅ) = {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ โ‰ค ๐น โ€ฒ(๐‘ฅ ;โ„Ž) for all โ„Ž โˆˆ ๐‘‹ } .

Proof. Since any ๐‘ฅ โˆˆ ๐‘‹ can be written as ๐‘ฅ = ๐‘ฅ +โ„Ž for some โ„Ž โˆˆ ๐‘‹ and vice versa, it sucesto show that for any ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—, the following statements are equivalent:

44

4 convex subdifferentials

(i) ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ โ‰ค ๐น โ€ฒ(๐‘ฅ ;โ„Ž) for all โ„Ž โˆˆ ๐‘‹ ;(ii) ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ โ‰ค ๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฅ) for all โ„Ž โˆˆ ๐‘‹ .

If ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— satises ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ โ‰ค ๐น โ€ฒ(๐‘ฅ ;โ„Ž) for all โ„Ž โˆˆ ๐‘‹ , we immediately obtain fromLemma 4.3 (ii) that

ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ โ‰ค ๐น โ€ฒ(๐‘ฅ ;โ„Ž) โ‰ค ๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฅ) for all โ„Ž โˆˆ ๐‘‹ .

Setting ๐‘ฅ = ๐‘ฅ + โ„Ž โˆˆ ๐‘‹ then yields ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ).Conversely, if ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰ โ‰ค ๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฅ) holds for all โ„Ž := ๐‘ฅ โˆ’ ๐‘ฅ โˆˆ ๐‘‹ , it also holds for ๐‘กโ„Ž forall โ„Ž โˆˆ ๐‘‹ and ๐‘ก > 0. Dividing by ๐‘ก and passing to the limit then yields that

ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ โ‰ค lim๐‘กโ†’ 0

๐น (๐‘ฅ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฅ)๐‘ก

= ๐น โ€ฒ(๐‘ฅ ;โ„Ž). ๏ฟฝ

4.2 fundamental examples

We now look at some examples. First, the construction from the directional derivativeindicates that the subdierential is indeed a generalization of the Gรขteaux derivative.

Theorem 4.5. Let ๐น : ๐‘‹ โ†’ โ„ be convex. If ๐น Gรขteaux dierentiable at ๐‘ฅ , then ๐œ•๐น (๐‘ฅ) ={๐ท๐น (๐‘ฅ)}.

Proof. By denition of the Gรขteaux derivative, we have that

ใ€ˆ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘‹ = ๐ท๐น (๐‘ฅ)โ„Ž = ๐น โ€ฒ(๐‘ฅ ;โ„Ž) for all โ„Ž โˆˆ ๐‘‹ .

Lemma 4.4 now immediately yields ๐ท๐น (๐‘ฅ) โˆˆ ๐œ•๐น (๐‘ฅ). Conversely, ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ) again byLemma 4.4 implies that

ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ โ‰ค ๐น โ€ฒ(๐‘ฅ ;โ„Ž) = ใ€ˆ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘‹ for all โ„Ž โˆˆ ๐‘‹ .

Taking the supremum over all โ„Ž with โ€–โ„Žโ€–๐‘‹ โ‰ค 1 now yields that โ€–๐‘ฅโˆ— โˆ’ ๐ท๐น (๐‘ฅ)โ€–๐‘‹ โˆ— โ‰ค 0, i.e.,๐‘ฅโˆ— = ๐ท๐น (๐‘ฅ). ๏ฟฝ

The converse holds as well: If ๐‘ฅ โˆˆ int(dom ๐น ) and ๐œ•๐น (๐‘ฅ) is a singleton, then ๐น is Gรขteauxdierentiable; see Theorem 13.18.

Of course, we also want to compute subdierentials of functionals that are not dierentiable.The canonical example is the norm โ€– ยท โ€–๐‘‹ on a normed vector space, which even for ๐‘‹ = โ„

is not dierentiable at ๐‘ฅ = 0.

45

4 convex subdifferentials

Theorem 4.6. For any ๐‘ฅ โˆˆ ๐‘‹ ,

๐œ•(โ€– ยท โ€–๐‘‹ ) (๐‘ฅ) ={{๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ = โ€–๐‘ฅ โ€–๐‘‹ and โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— = 1} if ๐‘ฅ โ‰  0,๐”น๐‘‹ โˆ— if ๐‘ฅ = 0.

Proof. For ๐‘ฅ = 0, we have ๐‘ฅโˆ— โˆˆ ๐œ•(โ€– ยท โ€–๐‘‹ ) (๐‘ฅ) by denition if and only if

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค โ€–๐‘ฅ โ€–๐‘‹ for all ๐‘ฅ โˆˆ ๐‘‹ \ {0}

(since the inequality is trivial for ๐‘ฅ = 0), which by the denition of the operator norm isequivalent to โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ‰ค 1.

Let now ๐‘ฅ โ‰  0 and consider ๐‘ฅโˆ— โˆˆ ๐œ•(โ€– ยท โ€–๐‘‹ ) (๐‘ฅ). Inserting rst ๐‘ฅ = 0 and then ๐‘ฅ = 2๐‘ฅ intothe denition (4.1) yields the sequence of inequalities

โ€–๐‘ฅ โ€–๐‘‹ โ‰ค ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ = ใ€ˆ๐‘ฅโˆ—, 2๐‘ฅ โˆ’ ๐‘ฅใ€‰ โ‰ค โ€–2๐‘ฅ โ€–๐‘‹ โˆ’ โ€–๐‘ฅ โ€–๐‘‹ = โ€–๐‘ฅ โ€–๐‘‹ ,

which imply that ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ = โ€–๐‘ฅ โ€–๐‘‹ . Similarly, we have for all ๐‘ฅ โˆˆ ๐‘‹ that

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ = ใ€ˆ๐‘ฅโˆ—, (๐‘ฅ + ๐‘ฅ) โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค โ€–๐‘ฅ + ๐‘ฅ โ€–๐‘‹ โˆ’ โ€–๐‘ฅ โ€–๐‘‹ โ‰ค โ€–๐‘ฅ โ€–๐‘‹ ,

As in the case ๐‘ฅ = 0, this implies that โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ‰ค 1. For ๐‘ฅ = ๐‘ฅ/โ€–๐‘ฅ โ€–๐‘‹ we thus have that

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ = โ€–๐‘ฅ โ€–โˆ’1๐‘‹ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ = โ€–๐‘ฅ โ€–โˆ’1๐‘‹ โ€–๐‘ฅ โ€–๐‘‹ = 1.

Hence, โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— = 1 is in fact attained.

Conversely, let ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— with ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ = โ€–๐‘ฅ โ€–๐‘‹ and โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— = 1. Then we obtain for all ๐‘ฅ โˆˆ ๐‘‹from (1.1) the relation

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ = ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค โ€–๐‘ฅ โ€–๐‘‹ โˆ’ โ€–๐‘ฅ โ€–๐‘‹ ,

and hence ๐‘ฅโˆ— โˆˆ ๐œ•(โ€– ยท โ€–๐‘‹ ) (๐‘ฅ) by denition. ๏ฟฝ

Example 4.7. In particular, we obtain for ๐‘‹ = โ„ the subdierential of the absolute valuefunction as

(4.2) ๐œ•( | ยท |) (๐‘ก) = sign(๐‘ก) โ‰”{1} if ๐‘ก > 0,{โˆ’1} if ๐‘ก < 0,[โˆ’1, 1] if ๐‘ก = 0,

cf. Figure 4.1a.2

46

4 convex subdifferentials

๐‘ฅ

๐œ•๐น (๐‘ฅ)

โˆ’1

0

1

(a) ๐น (๐‘ฅ) = |๐‘ฅ |

๐‘ฅ

๐œ•๐น (๐‘ฅ)

โˆ’1

โˆ’10 1

(b) ๐น (๐‘ฅ) = ๐›ฟ [โˆ’1,1] (๐‘ฅ)

Figure 4.1: Illustration of graph ๐œ•๐น for some example functions ๐น : โ„ โ†’ โ„

We can also give a more explicit characterization of the subdierential of the indicatorfunctional of a set ๐ถ โŠ‚ ๐‘‹ .

Lemma 4.8. For any ๐ถ โŠ‚ ๐‘‹ ,๐œ•๐›ฟ๐ถ (๐‘ฅ) = {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค 0 for all ๐‘ฅ โˆˆ ๐ถ} .

Proof. For any ๐‘ฅ โˆˆ ๐ถ = dom๐›ฟ๐ถ , we have that

๐‘ฅโˆ— โˆˆ ๐œ•๐›ฟ๐ถ (๐‘ฅ) โ‡” ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค ๐›ฟ๐ถ (๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘‹โ‡” ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค 0 for all ๐‘ฅ โˆˆ ๐ถ,

since the rst inequality is trivially satised for all ๐‘ฅ โˆ‰ ๐ถ . ๏ฟฝ

The set ๐‘๐ถ (๐‘ฅ) โ‰” ๐œ•๐›ฟ๐ถ (๐‘ฅ) is also called the (convex) normal cone to ๐ถ at ๐‘ฅ (which maybe empty if ๐ถ is not convex). Depending on the set ๐ถ , this can be made even more ex-plicit.

Example 4.9. Let ๐‘‹ = โ„ and ๐ถ = [โˆ’1, 1], and let ๐‘ก โˆˆ ๐ถ . Then we have ๐‘ฅโˆ— โˆˆ ๐œ•๐›ฟ [โˆ’1,1] (๐‘ก) ifand only if ๐‘ฅโˆ—(๐‘ก โˆ’ ๐‘ก) โ‰ค 0 for all ๐‘ก โˆˆ [โˆ’1, 1]. We proceed by distinguishing three cases.

Case 1: ๐‘ก = 1. Then ๐‘ก โˆ’ ๐‘ก โˆˆ [โˆ’2, 0], and hence the product is nonpositive if andonly if ๐‘ฅโˆ— โ‰ฅ 0.

Case 2: ๐‘ก = โˆ’1. Then ๐‘ก โˆ’ ๐‘ก โˆˆ [0, 2], and hence the product is nonpositive if andonly if ๐‘ฅโˆ— โ‰ค 0.

Case 3: ๐‘ก โˆˆ (โˆ’1, 1). Then ๐‘ก โˆ’ ๐‘ก can be positive as well as negative, and hence only

2Note that this set-valued denition of sign(๐‘ก) diers from the usual (single-valued) one, in particular for๐‘ก = 0; to make this distinction clear, one often refers to (4.2) as the sign in the sense of convex analysis.Throughout this book, we will always use the sign in this sense.

47

4 convex subdifferentials

๐‘ฅโˆ— = 0 is possible.

We thus obtain that

(4.3) ๐œ•๐›ฟ [โˆ’1,1] (๐‘ก) =

[0,โˆž) if ๐‘ก = 1,(โˆ’โˆž, 0] if ๐‘ก = โˆ’1,{0} if ๐‘ก โˆˆ (โˆ’1, 1),โˆ… if ๐‘ก โˆˆ โ„ \ [โˆ’1, 1],

cf. Figure 4.1b. Readers familiar with (non)linear optimization will recognize these as thecomplementarity conditions for Lagrange multipliers corresponding to the inequalitiesโˆ’1 โ‰ค ๐‘ก โ‰ค 1.

Conversely, subdierentials of convex functionals can be obtained from normal cones tocorresponding epigraphs (which are convex sets by Lemma 3.1). This relation will be thebasis for dening further subdierentials for more general classes of mappings in Part IV.

Lemma 4.10. Let ๐น : ๐‘‹ โ†’ โ„ be convex and ๐‘ฅ โˆˆ dom ๐น . Then ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ) if and only if(๐‘ฅโˆ—,โˆ’1) โˆˆ ๐‘epi ๐น (๐‘ฅ, ๐น (๐‘ฅ)).

Proof. By denition of the normal cone, (๐‘ฅโˆ—,โˆ’1) โˆˆ ๐‘epi ๐น (๐‘ฅ, ๐น (๐‘ฅ)) is equivalent to

(4.4) ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ (๐‘ก โˆ’ ๐น (๐‘ฅ)) โ‰ค 0 for all (๐‘ฅ, ๐‘ก) โˆˆ epi ๐น,

i.e., for all ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ก โ‰ฅ ๐น (๐‘ฅ). Taking ๐‘ก = ๐น (๐‘ฅ) and rearranging, this yields that ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ).Conversely, from ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ) we immediately obtain that

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค ๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ) โ‰ค ๐‘ก โˆ’ ๐น (๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘‹, ๐‘ก โ‰ฅ ๐น (๐‘ฅ),

i.e., (4.4) and thus (๐‘ฅโˆ—,โˆ’1) โˆˆ epi ๐น . ๏ฟฝ

The following result furnishes a crucial link between nite- and innite-dimensionalconvex optimization. We again assume (as we will from now on) that ฮฉ โŠ‚ โ„๐‘‘ is open andbounded.

Theorem 4.11. Let ๐‘“ : โ„ โ†’ โ„ be proper, convex, and lower semicontinuous, and let ๐น :๐ฟ๐‘ (ฮฉ) โ†’ โ„ with 1 โ‰ค ๐‘ < โˆž be as in Lemma 3.7. Then we have for all ๐‘ข โˆˆ dom ๐น with๐‘ž โ‰” ๐‘

๐‘โˆ’1 that

๐œ•๐น (๐‘ข) = {๐‘ขโˆ— โˆˆ ๐ฟ๐‘ž (ฮฉ) | ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐œ•๐‘“ (๐‘ข (๐‘ฅ)) for almost every ๐‘ฅ โˆˆ ฮฉ} .

48

4 convex subdifferentials

Proof. Let ๐‘ข, ๏ฟฝ๏ฟฝ โˆˆ dom ๐น , i.e., ๐‘“ โ—ฆ ๐‘ข, ๐‘“ โ—ฆ ๏ฟฝ๏ฟฝ โˆˆ ๐ฟ1(ฮฉ), and let ๐‘ขโˆ— โˆˆ ๐ฟ๐‘ž (ฮฉ) be arbitrary. If๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐œ•๐‘“ (๐‘ข (๐‘ฅ)) almost everywhere, we can integrate over all ๐‘ฅ โˆˆ ฮฉ to obtain

๐น (๏ฟฝ๏ฟฝ) โˆ’ ๐น (๐‘ข) =โˆซฮฉ๐‘“ (๏ฟฝ๏ฟฝ (๐‘ฅ)) โˆ’ ๐‘“ (๐‘ข (๐‘ฅ)) ๐‘‘๐‘ฅ โ‰ฅ

โˆซฮฉ๐‘ขโˆ—(๐‘ฅ) (๏ฟฝ๏ฟฝ (๐‘ฅ) โˆ’ ๐‘ข (๐‘ฅ)) ๐‘‘๐‘ฅ = ใ€ˆ๐‘ขโˆ—, ๏ฟฝ๏ฟฝ โˆ’ ๐‘ขใ€‰๐ฟ๐‘ ,

i.e., ๐‘ขโˆ— โˆˆ ๐œ•๐น (๐‘ข).Conversely, let ๐‘ขโˆ— โˆˆ ๐œ•๐น (๐‘ข). Then by denition it holds thatโˆซ

ฮฉ๐‘ขโˆ—(๐‘ฅ) (๏ฟฝ๏ฟฝ (๐‘ฅ) โˆ’ ๐‘ข (๐‘ฅ)) ๐‘‘๐‘ฅ โ‰ค

โˆซฮฉ๐‘“ (๏ฟฝ๏ฟฝ (๐‘ฅ)) โˆ’ ๐‘“ (๐‘ข (๐‘ฅ)) ๐‘‘๐‘ฅ for all ๏ฟฝ๏ฟฝ โˆˆ ๐ฟ๐‘ (ฮฉ).

Let now ๐‘ก โˆˆ โ„ be arbitrary and let ๐ด โŠ‚ ฮฉ be an arbitrary measurable set. Setting

๏ฟฝ๏ฟฝ (๐‘ฅ) โ‰”{๐‘ก if ๐‘ฅ โˆˆ ๐ด,๐‘ข (๐‘ฅ) if ๐‘ฅ โˆ‰ ๐ด,

the above inequality implies due to ๏ฟฝ๏ฟฝ โˆˆ ๐ฟ๐‘ (ฮฉ) thatโˆซ๐ด๐‘ขโˆ—(๐‘ฅ) (๐‘ก โˆ’ ๐‘ข (๐‘ฅ)) ๐‘‘๐‘ฅ โ‰ค

โˆซ๐ด๐‘“ (๐‘ก) โˆ’ ๐‘“ (๐‘ข (๐‘ฅ)) ๐‘‘๐‘ฅ.

Since ๐ด was arbitrary, it must hold that

๐‘ขโˆ—(๐‘ฅ) (๐‘ก โˆ’ ๐‘ข (๐‘ฅ)) โ‰ค ๐‘“ (๐‘ก) โˆ’ ๐‘“ (๐‘ข (๐‘ฅ)) for almost every ๐‘ฅ โˆˆ ฮฉ.

Furthermore, since ๐‘ก โˆˆ โ„ was arbitrary, we obtain that ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐œ•๐‘“ (๐‘ข (๐‘ฅ)) for almost every๐‘ฅ โˆˆ ฮฉ. ๏ฟฝ

Remark 4.12. A similar representation representation can be shown for vector-valued and spatially-dependent integrands ๐‘“ : ฮฉ ร—โ„ โ†’ โ„๐‘š under stronger assumptions; see, e.g., [Rockafellar 1976a,Corollary 3F].

A similar proof shows that for ๐น : โ„๐‘ โ†’ โ„ with ๐น (๐‘ฅ) = โˆ‘๐‘๐‘–=1 ๐‘“๐‘– (๐‘ฅ๐‘–) and ๐‘“๐‘– : โ„ โ†’ โ„ convex,

we have for any ๐‘ฅ โˆˆ dom ๐น that

๐œ•๐น (๐‘ฅ) = {๐‘ฅโˆ— โˆˆ โ„๐‘

๏ฟฝ๏ฟฝ ๐‘ฅโˆ—๐‘– โˆˆ ๐œ•๐‘“๐‘– (๐‘ฅ๐‘–), 1 โ‰ค ๐‘– โ‰ค ๐‘}.

Together with the above examples, this yields componentwise expressions for the subdif-ferential of the norm โ€– ยท โ€–1 as well as of the indicator functional of the unit ball with respectto the supremum norm in โ„๐‘ .

49

4 convex subdifferentials

4.3 calculus rules

As for classical derivatives, one rarely obtains subdierentials from the fundamental deni-tion but rather by applying calculus rules. It stands to reason that these are more dicultto derive the weaker the derivative concept is (i.e., the more functionals are dierentiablein that sense). For convex subdierentials, the following two rules still follow directly fromthe denition.

Lemma 4.13. Let ๐น : ๐‘‹ โ†’ โ„ be convex and ๐‘ฅ โˆˆ dom ๐น . Then,

(i) ๐œ•(_๐น ) (๐‘ฅ) = _(๐œ•๐น (๐‘ฅ)) โ‰” {_๐‘ฅโˆ— | ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ)} for _ โ‰ฅ 0;

(ii) ๐œ•๐น (ยท + ๐‘ฅ0) (๐‘ฅ) = ๐œ•๐น (๐‘ฅ + ๐‘ฅ0) for ๐‘ฅ0 โˆˆ ๐‘‹ with ๐‘ฅ + ๐‘ฅ0 โˆˆ dom ๐น .

Already the sum rule is considerably more delicate.

Theorem 4.14 (sum rule). Let ๐‘‹ be a Banach space, ๐น,๐บ : ๐‘‹ โ†’ โ„ be convex and lowersemicontinuous, and ๐‘ฅ โˆˆ dom ๐น โˆฉ dom๐บ . Then

๐œ•๐น (๐‘ฅ) + ๐œ•๐บ (๐‘ฅ) โŠ‚ ๐œ•(๐น +๐บ) (๐‘ฅ),

with equality if there exists an ๐‘ฅ0 โˆˆ int(dom ๐น ) โˆฉ dom๐บ .

Proof. The inclusion follows directly from adding the denitions of the two subdierentials.Let therefore ๐‘ฅ โˆˆ dom ๐น โˆฉ dom๐บ and ๐‘ฅโˆ— โˆˆ ๐œ•(๐น +๐บ) (๐‘ฅ), i.e., satisfying

(4.5) ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค (๐น (๐‘ฅ) +๐บ (๐‘ฅ)) โˆ’ (๐น (๐‘ฅ) +๐บ (๐‘ฅ)) for all ๐‘ฅ โˆˆ ๐‘‹ .

Our goal is now to use (as in the proof of Lemma 3.5) the characterization of convexfunctionals via their epigraph together with the Hahnโ€“Banach separation theorem toconstruct a bounded linear functional ๐‘ฆโˆ— โˆˆ ๐œ•๐บ (๐‘ฅ) โŠ‚ ๐‘‹ โˆ— with ๐‘ฅโˆ— โˆ’ ๐‘ฆโˆ— โˆˆ ๐œ•๐น (๐‘ฅ), i.e.,

๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ใ€ˆ๐‘ฆโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ for all ๐‘ฅ โˆˆ dom ๐น,

๐บ (๐‘ฅ) โˆ’๐บ (๐‘ฅ) โ‰ค ใ€ˆ๐‘ฆโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ for all ๐‘ฅ โˆˆ dom๐บ.

For that purpose, we dene the sets

๐ถ1 โ‰” {(๐‘ฅ, ๐‘ก โˆ’ (๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ )) | ๐‘ฅ โˆˆ dom ๐น, ๐‘ก โ‰ฅ ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ } ,๐ถ2 โ‰” {(๐‘ฅ,๐บ (๐‘ฅ) โˆ’ ๐‘ก) | ๐‘ฅ โˆˆ dom๐บ, ๐‘ก โ‰ฅ ๐บ (๐‘ฅ)} ,

i.e.,๐ถ1 = epi(๐น โˆ’ ๐‘ฅโˆ—) โˆ’ (0, ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ ), ๐ถ2 = โˆ’(epi๐บ โˆ’ (0,๐บ (๐‘ฅ))),

cf. Figure 4.2. To apply Corollary 1.6 to these sets, we have to verify its conditions.

50

4 convex subdifferentials

(i) Since ๐‘ฅ โˆˆ dom ๐น โˆฉ dom๐บ , both ๐ถ1 and ๐ถ2 are nonempty. Furthermore, since ๐น and๐บ are convex, it is straightforward (if tedious) to verify from the denition that ๐ถ1and ๐ถ2 are convex.

(ii) The critical point is of course the nonemptiness of int๐ถ1, for which we argue asfollows. Since ๐‘ฅ0 โˆˆ int(dom ๐น ), we know from Theorem 3.12 that ๐น is bounded in anopen neighborhood ๐‘ˆ โŠ‚ int(dom ๐น ) of ๐‘ฅ0. We can thus nd an open interval ๐ผ โŠ‚ โ„

such that ๐‘ˆ ร— ๐ผ โŠ‚ ๐ถ1. Since ๐‘ˆ ร— ๐ผ is open by the denition of the product topologyon ๐‘‹ ร—โ„, any (๐‘ฅ0, ๐›ผ) with ๐›ผ โˆˆ ๐ผ is an interior point of ๐ถ1.

(iii) It remains to show that int๐ถ1 โˆฉ๐ถ2 = โˆ…. Assume there exists a (๐‘ฅ, ๐›ผ) โˆˆ int๐ถ1 โˆฉ๐ถ2.But then the denitions of these sets and of the product topology imply that

๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ < ๐›ผ โ‰ค ๐บ (๐‘ฅ) โˆ’๐บ (๐‘ฅ),

contradicting (4.5). Hence int๐ถ1 and ๐ถ2 are disjoint.

We can thus apply Corollary 1.6 to obtain a pair (๐‘งโˆ—, ๐‘ ) โˆˆ (๐‘‹ โˆ— ร—โ„) \ {(0, 0)} ๏ฟฝ (๐‘‹ ร—โ„)โˆ— \{(0, 0)} and a _ โˆˆ โ„ with

ใ€ˆ๐‘งโˆ—, ๐‘ฅใ€‰๐‘‹ + ๐‘  (๐‘ก โˆ’ (๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ )) โ‰ค _, ๐‘ฅ โˆˆ dom ๐น, ๐‘ก โ‰ฅ ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ ,(4.6a)ใ€ˆ๐‘งโˆ—, ๐‘ฅใ€‰๐‘‹ + ๐‘  (๐บ (๐‘ฅ) โˆ’ ๐‘ก) โ‰ฅ _, ๐‘ฅ โˆˆ dom๐บ, ๐‘ก โ‰ฅ ๐บ (๐‘ฅ).(4.6b)

We now show that ๐‘  < 0. If ๐‘  = 0, we can insert ๐‘ฅ = ๐‘ฅ0 โˆˆ dom ๐น โˆฉ dom๐บ to obtain thecontradiction

ใ€ˆ๐‘งโˆ—, ๐‘ฅ0ใ€‰๐‘‹ < _ โ‰ค ใ€ˆ๐‘งโˆ—, ๐‘ฅ0ใ€‰๐‘‹ ,which follows since (๐‘ฅ0, ๐›ผ) for ๐›ผ large enough is an interior point of ๐ถ1 and hence can bestrictly separated from ๐ถ2 by Theorem 1.5 (i). If ๐‘  > 0, choosing ๐‘ก > ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ makesthe term in parentheses in (4.6a) strictly positive, and taking ๐‘ก โ†’ โˆž with xed ๐‘ฅ leads to acontradiction to the boundedness by _.

Hence ๐‘  < 0, and (4.6a) with ๐‘ก = ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ and (4.6b) with ๐‘ก = ๐บ (๐‘ฅ) imply that

๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐‘ โˆ’1(_ โˆ’ ใ€ˆ๐‘งโˆ—, ๐‘ฅใ€‰๐‘‹ ), for all ๐‘ฅ โˆˆ dom ๐น,

๐บ (๐‘ฅ) โˆ’๐บ (๐‘ฅ) โ‰ค ๐‘ โˆ’1(_ โˆ’ ใ€ˆ๐‘งโˆ—, ๐‘ฅใ€‰๐‘‹ ), for all ๐‘ฅ โˆˆ dom๐บ.

Taking ๐‘ฅ = ๐‘ฅ โˆˆ dom ๐น โˆฉ dom๐บ in both inequalities immediately yields that _ = ใ€ˆ๐‘งโˆ—, ๐‘ฅใ€‰๐‘‹ .Hence, ๐‘ฆโˆ— = ๐‘ โˆ’1๐‘งโˆ— โˆˆ ๐‘‹ โˆ— is the desired functional with (๐‘ฅโˆ— โˆ’ ๐‘ฆโˆ—) โˆˆ ๐œ•๐น (๐‘ฅ) and ๐‘ฆโˆ— โˆˆ ๐œ•๐บ (๐‘ฅ),i.e., ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ) + ๐œ•๐บ (๐‘ฅ). ๏ฟฝ

The following example demonstrates that the inclusion is strict in general (althoughnaturally the situation in innite-dimensional vector spaces is nowhere near as obvious).

51

4 convex subdifferentials

๐น (๐‘ฅ) โˆ’ ๐‘ฅโˆ— ยท ๐‘ฅ

๐ถ1

โˆ’๐บ (๐‘ฅ)

๐ถ2 ๐‘ฆโˆ— ยท ๐‘ฅ

๐‘ก

๐‘ฅ

Figure 4.2: Illustration of the proof of Theo-rem 4.14 for ๐น (๐‘ฅ) = 1

2 |๐‘ฅ |2, ๐บ (๐‘ฅ) =|๐‘ฅ |, and ๐‘ฅโˆ— = 1

2 โˆˆ ๐œ•(๐น +๐บ) (0). Thedashed line is the separating hy-perplane {(๐‘ฅ, ๐‘ก) | ๐‘งโˆ— ยท ๐‘ฅ + ๐‘ ๐‘ก = _},i.e., _ = 0, ๐‘งโˆ— = โˆ’1, ๐‘  = โˆ’2 andhence ๐‘ฆโˆ— = 1

2 โˆˆ ๐œ•๐บ (0).

๐ถ1

๐ถ2

Figure 4.3: Illustration of the situation inExample 4.15. Here the dashedseparating hyperplane corre-sponds to the vertical line{(๐‘ฅ, ๐‘ก) | ๐‘ฅ = 0} (i.e., ๐‘งโˆ— = 1and ๐‘  = 0), and hence ๐‘ฆโˆ— โˆ‰ โ„.

Example 4.15. We take again ๐‘‹ = โ„ and ๐น : ๐‘‹ โ†’ โ„ from Example 4.1, i.e.,

๐น (๐‘ฅ) ={โˆ’โˆš๐‘ฅ if ๐‘ฅ โ‰ฅ 0,โˆž if ๐‘ฅ < 0,

as well as๐บ (๐‘ฅ) = ๐›ฟ (โˆ’โˆž,0] (๐‘ฅ). Both ๐น and๐บ are convex, and 0 โˆˆ dom ๐น โˆฉ dom๐บ . In fact,(๐น +๐บ) (๐‘ฅ) = ๐›ฟ{0} (๐‘ฅ) and hence it is straightforward to verify that ๐œ•(๐น +๐บ) (0) = โ„.

On the other hand, we know from Example 4.1 and the argument leading to (4.3) that

๐œ•๐น (0) = โˆ…, ๐œ•๐บ (0) = [0,โˆž),

and hence that๐œ•๐น (0) + ๐œ•๐บ (0) = โˆ… ( โ„ = ๐œ•(๐น +๐บ) (0).

(As ๐น only admits a vertical tangent as ๐‘ฅ = 0, this example corresponds to the situationwhere ๐‘  = 0 in (4.6a), cf. Figure 4.3.)

Remark 4.16. There exist alternative conditions that guarantee that the sum rule holds with equality.For example, if ๐‘‹ is a Banach space and ๐น and ๐บ are in addition lower semicontinuous, this holds

52

4 convex subdifferentials

under the Attouchโ€“Brรฉzis condition thatโ‹ƒ_โ‰ฅ0

_ (dom ๐น โˆ’ dom๐บ) =: ๐‘ is a closed subspace of ๐‘‹,

see [Attouch & Brezis 1986]. (Note that this condition is not satised in Example 4.15 either, since inthis case ๐‘ = โˆ’ dom๐บ = [0,โˆž) which is closed but not a subspace.)

It is not dicult to see that the condition ๐‘ฅ0 โˆˆ int(dom ๐น ) โˆฉ dom๐บ in the statement of Lemma 4.13implies the Attouchโ€“Brรฉzis condition. In fact, the latter allows us to generalize the condition to๐‘ฅ0 โˆˆ ri(dom ๐น ) โˆฉ dom๐บ where ri๐ด for a set ๐ด denotes the relative interior : the interior of ๐ด withrespect to the smallest closed ane set that contains ๐ด. As an example, ri{๐‘} = {๐‘} for a point๐‘ โˆˆ ๐‘‹ .

By induction, we obtain from this sum rules for an arbitrary (nite) number of functionals(where ๐‘ฅ0 has to be an interior point of all but one eective domain). A chain rule for linearoperators can be proved similarly.

Theorem 4.17 (chain rule). Let ๐‘‹,๐‘Œ be Banach spaces, ๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ), ๐น : ๐‘Œ โ†’ โ„ be proper,convex, and lower semicontinuous, and ๐‘ฅ โˆˆ dom(๐น โ—ฆ ๐พ). Then,

๐œ•(๐น โ—ฆ ๐พ) (๐‘ฅ) โŠƒ ๐พโˆ—๐œ•๐น (๐พ๐‘ฅ) โ‰” {๐พโˆ—๐‘ฆโˆ— | ๐‘ฆโˆ— โˆˆ ๐œ•๐น (๐พ๐‘ฅ)}

with equality if there exists an ๐‘ฅ0 โˆˆ ๐‘‹ with ๐พ๐‘ฅ0 โˆˆ int(dom ๐น ).

Proof. The inclusion is again a direct consequence of the denition: If ๐‘ฆโˆ— โˆˆ ๐œ•๐น (๐พ๐‘ฅ) โŠ‚ ๐‘Œ โˆ—,we in particular have for all ๏ฟฝ๏ฟฝ = ๐พ๐‘ฅ โˆˆ ๐‘Œ with ๐‘ฅ โˆˆ ๐‘‹ that

๐น (๐พ๐‘ฅ) โˆ’ ๐น (๐พ๐‘ฅ) โ‰ฅ ใ€ˆ๐‘ฆโˆ—, ๐พ๐‘ฅ โˆ’ ๐พ๐‘ฅใ€‰๐‘Œ = ใ€ˆ๐พโˆ—๐‘ฆโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ ,

i.e., ๐‘ฅโˆ— โ‰” ๐พโˆ—๐‘ฆโˆ— โˆˆ ๐œ•(๐น โ—ฆ ๐พ) โŠ‚ ๐‘‹ โˆ—.

To show the claimed equality, let ๐‘ฅ โˆˆ dom(๐น โ—ฆ ๐พ) and ๐‘ฅโˆ— โˆˆ ๐œ•(๐น โ—ฆ ๐พ) (๐‘ฅ), i.e.,

๐น (๐พ๐‘ฅ) + ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค ๐น (๐พ๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘‹ .

We now construct a ๐‘ฆโˆ— โˆˆ ๐œ•๐น (๐พ๐‘ฅ) with ๐‘ฅโˆ— = ๐พโˆ—๐‘ฆโˆ— by applying the sum rule to3

๐ป : ๐‘‹ ร— ๐‘Œ โ†’ โ„, (๐‘ฅ, ๐‘ฆ) โ†ฆโ†’ ๐น (๐‘ฆ) + ๐›ฟgraph๐พ (๐‘ฅ, ๐‘ฆ).

Since ๐พ is linear, graph๐พ and hence ๐›ฟgraph๐พ are convex. Furthermore, ๐พ๐‘ฅ โˆˆ dom ๐น byassumption and thus (๐‘ฅ, ๐พ๐‘ฅ) โˆˆ dom๐ป .

3This technique of โ€œliftingโ€ a problem to a product space in order to separate operators is also useful inmany other contexts.

53

4 convex subdifferentials

We begin by showing that ๐‘ฅโˆ— โˆˆ ๐œ•(๐น โ—ฆ ๐พ) (๐‘ฅ) if and only if (๐‘ฅโˆ—, 0) โˆˆ ๐œ•๐ป (๐‘ฅ, ๐พ๐‘ฅ). First, let(๐‘ฅโˆ—, 0) โˆˆ ๐œ•๐ป (๐‘ฅ, ๐พ๐‘ฅ). Then we have for all ๐‘ฅ โˆˆ ๐‘‹, ๏ฟฝ๏ฟฝ โˆˆ ๐‘Œ that

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ + ใ€ˆ0, ๏ฟฝ๏ฟฝ โˆ’ ๐พ๐‘ฅใ€‰๐‘Œ โ‰ค ๐น (๏ฟฝ๏ฟฝ) โˆ’ ๐น (๐พ๐‘ฅ) + ๐›ฟgraph๐พ (๐‘ฅ, ๏ฟฝ๏ฟฝ) โˆ’ ๐›ฟgraph๐พ (๐‘ฅ, ๐พ๐‘ฅ).In particular, this holds for all ๏ฟฝ๏ฟฝ โˆˆ ran(๐พ) = {๐พ๐‘ฅ | ๐‘ฅ โˆˆ ๐‘‹ }. By ๐›ฟgraph๐พ (๐‘ฅ, ๐พ๐‘ฅ) = 0 we thusobtain that

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค ๐น (๐พ๐‘ฅ) โˆ’ ๐น (๐พ๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘‹,i.e., ๐‘ฅโˆ— โˆˆ ๐œ•(๐น โ—ฆ ๐พ) (๐‘ฅ). Conversely, let ๐‘ฅโˆ— โˆˆ ๐œ•(๐น โ—ฆ ๐พ) (๐‘ฅ). Since ๐›ฟgraph๐พ (๐‘ฅ, ๐พ๐‘ฅ) = 0 and๐›ฟgraph๐พ (๐‘ฅ, ๏ฟฝ๏ฟฝ) โ‰ฅ 0, it then follows for all ๐‘ฅ โˆˆ ๐‘‹ and ๏ฟฝ๏ฟฝ โˆˆ ๐‘Œ that

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ + ใ€ˆ0, ๏ฟฝ๏ฟฝ โˆ’ ๐พ๐‘ฅใ€‰๐‘Œ = ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹โ‰ค ๐น (๐พ๐‘ฅ) โˆ’ ๐น (๐พ๐‘ฅ) + ๐›ฟgraph๐พ (๐‘ฅ, ๏ฟฝ๏ฟฝ) โˆ’ ๐›ฟgraph๐พ (๐‘ฅ, ๐พ๐‘ฅ)= ๐น (๏ฟฝ๏ฟฝ) โˆ’ ๐น (๐พ๐‘ฅ) + ๐›ฟgraph๐พ (๐‘ฅ, ๏ฟฝ๏ฟฝ) โˆ’ ๐›ฟgraph๐พ (๐‘ฅ, ๐พ๐‘ฅ),

where we have used that last equality holds trivially as โˆž = โˆž for ๏ฟฝ๏ฟฝ โ‰  ๐พ๐‘ฅ . Hence,(๐‘ฅโˆ—, 0) โˆˆ ๐œ•๐ป (๐‘ฅ, ๐พ๐‘ฅ).We now consider ๐น : ๐‘‹ ร— ๐‘Œ โ†’ โ„, (๐‘ฅ, ๐‘ฆ) โ†ฆโ†’ ๐น (๐‘ฆ), and (๐‘ฅ0, ๐พ๐‘ฅ0) โˆˆ graph๐พ = dom๐›ฟgraph๐พ .Since ๐พ๐‘ฅ0 โˆˆ int(dom ๐น ) โŠ‚ ๐‘Œ by assumption, (๐‘ฅ0, ๐พ๐‘ฅ0) โˆˆ int(dom ๐น ) as well. We can thusapply Theorem 4.14 to obtain

(๐‘ฅโˆ—, 0) โˆˆ ๐œ•๐ป (๐‘ฅ, ๐พ๐‘ฅ) = ๐œ•๐น (๐พ๐‘ฅ) + ๐œ•๐›ฟgraph๐พ (๐‘ฅ, ๐พ๐‘ฅ),

i.e., (๐‘ฅโˆ—, 0) = (๐‘ฅโˆ—1 , ๐‘ฆโˆ—1 ) + (๐‘ฅโˆ—2, ๐‘ฆโˆ—2) for some (๐‘ฅโˆ—1 , ๐‘ฆโˆ—1 ) โˆˆ ๐œ•๐น (๐พ๐‘ฅ) and (๐‘ฅโˆ—2, ๐‘ฆโˆ—2) โˆˆ ๐œ•๐›ฟgraph๐พ (๐‘ฅ, ๐พ๐‘ฅ).Now we have (๐‘ฅโˆ—1 , ๐‘ฆโˆ—1 ) โˆˆ ๐œ•๐น (๐พ๐‘ฅ) if and only if

ใ€ˆ๐‘ฅโˆ—1 , ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ + ใ€ˆ๐‘ฆโˆ—1 , ๏ฟฝ๏ฟฝ โˆ’ ๐พ๐‘ฅใ€‰๐‘Œ โ‰ค ๐น (๏ฟฝ๏ฟฝ) โˆ’ ๐น (๐พ๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘‹, ๏ฟฝ๏ฟฝ โˆˆ ๐‘Œ .Fixing ๐‘ฅ = ๐‘ฅ and ๏ฟฝ๏ฟฝ = ๐พ๐‘ฅ implies that ๐‘ฆโˆ—1 โˆˆ ๐œ•๐น (๐พ๐‘ฅ) and ๐‘ฅโˆ—1 = 0, respectively. Furthermore,(๐‘ฅโˆ—2, ๐‘ฆโˆ—2) โˆˆ ๐œ•๐›ฟgraph๐พ (๐‘ฅ, ๐พ๐‘ฅ) if and only if

ใ€ˆ๐‘ฅโˆ—2, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ + ใ€ˆ๐‘ฆโˆ—2, ๏ฟฝ๏ฟฝ โˆ’ ๐พ๐‘ฅใ€‰๐‘Œ โ‰ค 0 for all (๐‘ฅ, ๏ฟฝ๏ฟฝ) โˆˆ graph๐พ,

i.e., for all ๐‘ฅ โˆˆ ๐‘‹ and ๏ฟฝ๏ฟฝ = ๐พ๐‘ฅ . Therefore,

ใ€ˆ๐‘ฅโˆ—2 + ๐พโˆ—๐‘ฆโˆ—2, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค 0 for all ๐‘ฅ โˆˆ ๐‘‹and hence ๐‘ฅโˆ—2 = โˆ’๐พโˆ—๐‘ฆโˆ—2 . Together we obtain

(๐‘ฅโˆ—, 0) = (0, ๐‘ฆโˆ—1 ) + (โˆ’๐พโˆ—๐‘ฆโˆ—2, ๐‘ฆโˆ—2),

which implies ๐‘ฆโˆ—1 = โˆ’๐‘ฆโˆ—2 and thus ๐‘ฅโˆ— = โˆ’๐พโˆ—๐‘ฆโˆ—2 = ๐พโˆ—๐‘ฆโˆ—1 with ๐‘ฆโˆ—1 โˆˆ ๐œ•๐น (๐พ๐‘ฅ) as claimed. ๏ฟฝ

The condition for equality in particular holds if ๐พ is surjective and dom ๐น has nonemptyinterior. Again, the inequality can be strict.

54

4 convex subdifferentials

Example 4.18. Here we take ๐‘‹ = ๐‘Œ = โ„ and again ๐น : ๐‘‹ โ†’ โ„ from Examples 4.1and 4.15 as well as

๐พ : โ„ โ†’ โ„, ๐พ๐‘ฅ = 0.

Clearly, (๐น โ—ฆ ๐พ) (๐‘ฅ) = 0 for all ๐‘ฅ โˆˆ โ„ and hence ๐œ•(๐น โ—ฆ ๐พ) (๐‘ฅ) = {0} by Theorem 4.5. Onthe other hand, ๐œ•๐น (0) = โˆ… by Example 4.1 and hence

๐พโˆ—๐œ•๐น (๐พ๐‘ฅ) = ๐พโˆ—๐œ•๐น (0) = โˆ… ( {0}.

(Note the problem: ๐พโˆ— is far from surjective, and ran๐พ โˆฉ int(dom ๐น ) = โˆ….)

We can also obtain a chain rule when the inner mapping is nondierentiable.

Theorem 4.19. Let ๐น : ๐‘‹ โ†’ โ„ be convex and ๐œ‘ : โ„ โ†’ โ„ be convex, increasing, anddierentiable. Then ๐œ‘ โ—ฆ ๐น is convex, and for all ๐‘ฅ โˆˆ ๐‘‹ ,

๐œ•[๐œ‘ โ—ฆ ๐น ] (๐‘ฅ) = ๐œ‘โ€ฒ(๐น (๐‘ฅ))๐œ•๐น (๐‘ฅ) = {๐œ‘โ€ฒ(๐น (๐‘ฅ))๐‘ฅโˆ— | ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ)} .

Proof. First, the convexity of ๐œ‘ โ—ฆ ๐น follows from Lemma 3.4 (iii). To calculate the subdier-ential, we x ๐‘ฅ โˆˆ ๐‘‹ and observe from Theorem 3.13 that ๐œ‘ is Lipschitz continuous withsome constant ๐ฟ near ๐น (๐‘ฅ) โˆˆ int(dom๐œ‘) = โ„. Thus, for any โ„Ž โˆˆ ๐‘‹ ,

(๐œ‘ โ—ฆ ๐น )โ€ฒ(๐‘ฅ ;โ„Ž) = lim๐‘กโ†’ 0

[๐œ‘ โ—ฆ ๐น ] (๐‘ฅ + ๐‘กโ„Ž) โˆ’ [๐œ‘ โ—ฆ ๐น ] (๐‘ฅ)๐‘ก

= lim๐‘กโ†’ 0

๐œ‘ (๐น (๐‘ฅ + ๐‘กโ„Ž)) โˆ’ ๐œ‘ (๐น (๐‘ฅ) + ๐‘ก๐น โ€ฒ(๐‘ฅ ;โ„Ž))๐‘ก

+ lim๐‘กโ†’ 0

๐œ‘ (๐น (๐‘ฅ) + ๐‘ก๐น โ€ฒ(๐‘ฅ ;โ„Ž)) โˆ’ ๐œ‘ (๐น (๐‘ฅ))๐‘ก

โ‰ค lim๐‘กโ†’ 0

๐ฟ

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ๐น (๐‘ฅ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฅ)๐‘ก

โˆ’ ๐น โ€ฒ(๐‘ฅ ;โ„Ž)๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ + ๐œ‘โ€ฒ(๐น (๐‘ฅ); ๐น โ€ฒ(๐‘ฅ ;โ„Ž))

= ๐œ‘โ€ฒ(๐น (๐‘ฅ); ๐น โ€ฒ(๐‘ฅ ;โ„Ž)),where we have used the directional dierentiability of ๐น in ๐‘ฅ โˆˆ int(dom ๐น ) = ๐‘‹ in the laststep. Similarly, we prove the opposite inequality using ๐œ‘ (๐‘ก1) โˆ’ ๐œ‘ (๐‘ก2) โ‰ฅ โˆ’๐ฟ |๐‘ก1 โˆ’ ๐‘ก2 | for all๐‘ก1, ๐‘ก2 suciently close to ๐น (๐‘ฅ). Hence [๐œ‘ โ—ฆ ๐น ] (๐‘ฅ ;โ„Ž) = ๐œ‘โ€ฒ(๐น (๐‘ฅ); ๐น โ€ฒ(๐‘ฅ ;โ„Ž)) = ๐œ‘โ€ฒ(๐น (๐‘ฅ))๐น โ€ฒ(๐‘ฅ ;โ„Ž)by the dierentiability of ๐œ‘ .

Now Lemma 4.4 yields that

๐œ•(๐œ‘ โ—ฆ ๐น ) (๐‘ฅ) = {๐‘งโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘งโˆ—, โ„Žใ€‰ โ‰ค ๐œ‘โ€ฒ(๐น (๐‘ฅ))๐น โ€ฒ(๐‘ฅ ;โ„Ž) for all โ„Ž โˆˆ ๐‘‹ }= {๐œ‘โ€ฒ(๐น (๐‘ฅ))๐‘ฅโˆ— | ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—, ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰ โ‰ค ๐น โ€ฒ(๐‘ฅ ;โ„Ž) for all โ„Ž โˆˆ ๐‘‹ }= {๐œ‘โ€ฒ(๐น (๐‘ฅ))๐‘ฅโˆ— | ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ)}.

For the second step, note that๐œ‘โ€ฒ(๐น (๐‘ฅ)) = 0 implies that ๐‘งโˆ— = 0 as well; otherwise we can set๐‘งโˆ— โ‰” ๐œ‘โ€ฒ(๐น (๐‘ฅ))๐‘ฅโˆ— and use ๐œ‘โ€ฒ(๐น (๐‘ฅ)) > 0 (since ๐œ‘ is increasing) to simplify the inequality. ๏ฟฝ

55

4 convex subdifferentials

Remark 4.20. The dierentiability assumption on ๐œ‘ in Theorem 4.19 is not necessary, but the proofis otherwise much more involved and demands the support functional machinery of Section 13.3.See also [Hiriart-Urruty & Lemarรฉchal 2001, Section D.4.3] for a version with set-valued ๐น in nitedimensions.

The Fermat principle together with the sum rule yields the following characterization ofminimizers of convex functionals under convex constraints.

Corollary 4.21. Let ๐‘ˆ โŠ‚ ๐‘‹ be nonempty, convex, and closed, and let ๐น : ๐‘‹ โ†’ โ„ be proper,convex, and lower semicontinuous. If there exists an ๐‘ฅ0 โˆˆ int๐‘ˆ โˆฉ dom ๐น , then ๐‘ฅ โˆˆ ๐‘ˆ solves

min๐‘ฅโˆˆ๐‘ˆ

๐น (๐‘ฅ)

if and only if 0 โˆˆ ๐œ•๐น (๐‘ฅ) + ๐‘๐‘ˆ (๐‘ฅ) or, in other words, if there exists an ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— with

(4.7){ โˆ’ ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ),ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค 0 for all ๐‘ฅ โˆˆ ๐‘ˆ .

Proof. Due to the assumptions on ๐น and ๐‘ˆ , we can apply Theorem 4.2 to ๐ฝ โ‰” ๐น + ๐›ฟ๐‘ˆ .Furthermore, since ๐‘ฅ0 โˆˆ int๐‘ˆ = int(dom๐›ฟ๐‘ˆ ), we can also apply Theorem 4.14. Hence ๐นhas a minimum in ๐‘ฅ if and only if

0 โˆˆ ๐œ•๐ฝ (๐‘ฅ) = ๐œ•๐น (๐‘ฅ) + ๐œ•๐›ฟ๐‘ˆ (๐‘ฅ).

Together with the characterization of subdierentials of indicator functionals as normalcones, this yields (4.7). ๏ฟฝ

If ๐น : ๐‘‹ โ†’ โ„ is Gรขteaux dierentiable (and hence nite-valued), (4.7) coincide with theclassical Karushโ€“Kuhnโ€“Tucker conditions; the existence of an interior point ๐‘ฅ0 โˆˆ int๐‘ˆ isrelated to a Slater condition in nonlinear optimization needed to show existence of theLagrange multiplier ๐‘ฅโˆ— for inequality constraints.

56

5 FENCHEL DUALITY

One of the main tools in convex optimization is duality: Any convex optimization problemcan be related to a dual problem, and the joint study of both problems yields additionalinformation about the solution. Our main objective in this chapter, the Fenchelโ€“Rockafellarduality theorem, will be our main tool for deriving explicit optimality conditions as well asnumerical algorithms for convex minimization problems that can be expressed as the sumof (simple) functionals.

5.1 fenchel conjugates

Let ๐‘‹ be a normed vector space and ๐น : ๐‘‹ โ†’ โ„ be proper but not necessarily convex. Wethen dene the Fenchel conjugate (or convex conjugate) of ๐น as

๐น โˆ— : ๐‘‹ โˆ— โ†’ โ„, ๐น โˆ—(๐‘ฅโˆ—) = sup๐‘ฅโˆˆ๐‘‹

{ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐น (๐‘ฅ)} .

(Since dom ๐น = โˆ… is excluded, we have that ๐น โˆ—(๐‘ฅโˆ—) > โˆ’โˆž for all ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—, and hence thedenition is meaningful.) An alternative interpretation is that ๐น โˆ—(๐‘ฅโˆ—) is the (negative ofthe) ane part of the tangent to ๐น (in the point ๐‘ฅ at which the supremum is attained)with slope ๐‘ฅโˆ—, see Figure 5.1. Lemma 3.4 (v) and Lemma 2.3 (v) immediately imply that ๐น โˆ—is always convex and weakly-โˆ— lower semicontinuous (as long as ๐น is indeed proper). If๐น is bounded from below by an ane functional (which is always the case if ๐น is proper,convex, and lower semicontinuous by Lemma 3.5), then ๐น โˆ— is proper as well. Finally, thedenition directly yields the Fenchelโ€“Young inequality

(5.1) ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค ๐น (๐‘ฅ) + ๐น โˆ—(๐‘ฅโˆ—) for all ๐‘ฅ โˆˆ ๐‘‹, ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—.

If ๐‘‹ is not reexive, we can similarly dene for (weakly-โˆ— lower semicontinuous) ๐น : ๐‘‹ โˆ— โ†’โ„ the Fenchel preconjugate

๐นโˆ— : ๐‘‹ โ†’ โ„, ๐นโˆ—(๐‘ฅ) = sup๐‘ฅโˆ—โˆˆ๐‘‹ โˆ—

{ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐น (๐‘ฅโˆ—)} .

The point of this convention is that even in nonreexive spaces, the biconjugate

๐น โˆ—โˆ— : ๐‘‹ โ†’ โ„, ๐น โˆ—โˆ—(๐‘ฅ) = (๐น โˆ—)โˆ—(๐‘ฅ)

57

5 fenchel duality

๐‘ฅ

๐‘ก

๐‘ฅโˆ— ยท ๐‘ฅ โˆ’ ๐น (๐‘ฅ)

๐น โˆ—(๐‘ฅโˆ—)

(a) ๐น โˆ—(๐‘ฅโˆ—) as maximizer of ๐‘ฅโˆ— ยท ๐‘ฅ โˆ’ ๐น (๐‘ฅ)

๐‘ฅ

๐‘ก

๐น (๐‘ฅ)

โˆ’๐น โˆ—(๐‘ฅโˆ—) + ๐‘ฅโˆ— ยท ๐‘ฅ

(๐‘ฅ, ๐น (๐‘ฅ))

โˆ’๐น โˆ—(๐‘ฅโˆ—)

(b) Alternative interpretation: โˆ’๐น โˆ—(๐‘ฅโˆ—) as osetfor tangent to ๐น with given slope ๐‘ฅโˆ—. Note thatin this case, ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ) and โˆ’๐น โˆ—(๐‘ฅโˆ—) +๐‘ฅโˆ— ยท๐‘ฅ =๐น (๐‘ฅ).

Figure 5.1: Geometrical illustration of the Fenchel conjugate

is again dened on ๐‘‹ (rather than ๐‘‹ โˆ—โˆ— โŠƒ ๐‘‹ ). For reexive spaces, of course, we have๐น โˆ—โˆ— = (๐น โˆ—)โˆ—. Intuitively, ๐น โˆ—โˆ— is the convex envelope of ๐น , which by Lemma 3.5 coincideswith ๐น itself if ๐น is convex.

Theorem 5.1 (Fenchelโ€“Moreauโ€“Rockafellar). Let ๐น : ๐‘‹ โ†’ โ„ be proper. Then,

(i) ๐น โˆ—โˆ— โ‰ค ๐น ;

(ii) ๐น โˆ—โˆ— = ๐น ฮ“ ;

(iii) ๐น โˆ—โˆ— = ๐น if and only if ๐น is convex and lower semicontinuous.

Proof. For (i), we take the supremum over all ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— in the Fenchelโ€“Young inequality (5.1)and obtain that

๐น (๐‘ฅ) โ‰ฅ sup๐‘ฅโˆ—โˆˆ๐‘‹ โˆ—

{ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐น โˆ—(๐‘ฅโˆ—)} = ๐น โˆ—โˆ—(๐‘ฅ).

For (ii), we rst note that ๐น โˆ—โˆ— is convex and lower semicontinuous by denition as a Fenchelconjugate as well as proper by (i). Hence, Lemma 3.5 yields that

๐น โˆ—โˆ—(๐‘ฅ) = (๐น โˆ—โˆ—)ฮ“ (๐‘ฅ) = sup {๐‘Ž(๐‘ฅ) | ๐‘Ž : ๐‘‹ โ†’ โ„ continuous ane with ๐‘Ž โ‰ค ๐น โˆ—โˆ—} .

We now show that we can replace ๐น โˆ—โˆ— with ๐น on the right-hand side. For this, let ๐‘Ž(๐‘ฅ) =ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐›ผ with arbitrary ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— and ๐›ผ โˆˆ โ„. If ๐‘Ž โ‰ค ๐น โˆ—โˆ—, then (i) implies that ๐‘Ž โ‰ค ๐น .

58

5 fenchel duality

Conversely, if ๐‘Ž โ‰ค ๐น , we have that ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐น (๐‘ฅ) โ‰ค ๐›ผ for all ๐‘ฅ โˆˆ ๐‘‹ , and taking thesupremum over all ๐‘ฅ โˆˆ ๐‘‹ yields that ๐›ผ โ‰ฅ ๐น โˆ—(๐‘ฅโˆ—). By denition of ๐น โˆ—โˆ—, we thus obtain that

๐‘Ž(๐‘ฅ) = ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐›ผ โ‰ค ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐น โˆ—(๐‘ฅโˆ—) โ‰ค ๐น โˆ—โˆ—(๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘‹,

i.e., ๐‘Ž โ‰ค ๐น โˆ—โˆ—.

Statement (iii) now directly follows from (ii) and Lemma 3.5. ๏ฟฝ

Remark 5.2. Continuing fromRemark 3.6,we can adapt the proof of Theorem 5.1 to proper functionals๐น : ๐‘‹ โˆ— โ†’ โ„ to show that ๐น = (๐นโˆ—)โˆ— if and only if ๐น is convex and weakly-โˆ— lower semicontinuous.

We again consider some relevant examples.

Example 5.3.

(i) Let ๐”น๐‘‹ be the unit ball in the normed vector space ๐‘‹ and take ๐น = ๐›ฟ๐”น๐‘‹ . Then wehave for any ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— that

(๐›ฟ๐”น๐‘‹ )โˆ—(๐‘ฅโˆ—) = sup๐‘ฅโˆˆ๐‘‹

{ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐›ฟ๐”น๐‘‹ (๐‘ฅ)}= sup

โ€–๐‘ฅ โ€–๐‘‹โ‰ค1{ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ } = โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— .

Similarly, one shows using the denition of the Fenchel preconjugate and Corol-lary 1.7 that (๐›ฟ๐”น๐‘‹โˆ— )โˆ—(๐‘ฅ) = โ€–๐‘ฅ โ€–๐‘‹ .

(ii) Let ๐‘‹ be a normed vector space and take ๐น (๐‘ฅ) = โ€–๐‘ฅ โ€–๐‘‹ . We now distinguish twocases for a given ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—.

Case 1: โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ‰ค 1. Then it follows from (1.1) that ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ โ€–๐‘ฅ โ€–๐‘‹ โ‰ค 0 for all๐‘ฅ โˆˆ ๐‘‹ . Furthermore, ใ€ˆ๐‘ฅโˆ—, 0ใ€‰ = 0 = โ€–0โ€–๐‘‹ , which implies that

๐น โˆ—(๐‘ฅโˆ—) = sup๐‘ฅโˆˆ๐‘‹

{ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ โ€–๐‘ฅ โ€–๐‘‹ } = 0.

Case 2: โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— > 1. Then by denition of the dual norm, there exists an ๐‘ฅ0 โˆˆ ๐‘‹with ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ0ใ€‰๐‘‹ > โ€–๐‘ฅ0โ€–๐‘‹ . Hence, taking ๐‘ก โ†’ โˆž in

0 < ๐‘ก (ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ0ใ€‰๐‘‹ โˆ’ โ€–๐‘ฅ0โ€–๐‘‹ ) = ใ€ˆ๐‘ฅโˆ—, ๐‘ก๐‘ฅ0ใ€‰๐‘‹ โˆ’ โ€–๐‘ก๐‘ฅ0โ€–๐‘‹ โ‰ค ๐น โˆ—(๐‘ฅโˆ—)

yields ๐น โˆ—(๐‘ฅโˆ—) = โˆž.

Together we obtain that ๐น โˆ— = ๐›ฟ๐”น๐‘‹โˆ— . As above, a similar argument shows that(โ€– ยท โ€–๐‘‹ โˆ—)โˆ— = ๐›ฟ๐”น๐‘‹ .

We can generalize Example 5.3 (ii) to powers of norms.

59

5 fenchel duality

Lemma 5.4. Let ๐‘‹ be a normed vector space and ๐น (๐‘ฅ) โ‰” 1๐‘ โ€–๐‘ฅ โ€–

๐‘๐‘‹ for ๐‘ โˆˆ (1,โˆž). Then

๐น โˆ—(๐‘ฅโˆ—) = 1๐‘ž โ€–๐‘ฅโˆ—โ€–

๐‘ž๐‘‹ โˆ— for ๐‘ž โ‰” ๐‘

๐‘โˆ’1 .

Proof. Werst consider the scalar function๐œ‘ (๐‘ก) โ‰” 1๐‘ |๐‘ก |๐‘ and compute the Fenchel conjugate

๐œ‘โˆ—(๐‘ ) for ๐‘  โˆˆ โ„. By the choice of ๐‘ and ๐‘ž, we then can write 1๐‘ž = 1 โˆ’ 1

๐‘ as well as |๐‘  |๐‘ž =

sign(๐‘ )๐‘  |๐‘  |1/(๐‘โˆ’1) = | sign(๐‘ ) |๐‘  |1/(๐‘โˆ’1) |๐‘ for any ๐‘  โˆˆ โ„ and therefore obtain

1๐‘ž|๐‘  |๐‘ž =

(sign(๐‘ ) |๐‘  |1/(๐‘โˆ’1)

)๐‘  โˆ’ 1

๐‘

๏ฟฝ๏ฟฝ๏ฟฝsign(๐‘ ) |๐‘  |1/(๐‘โˆ’1) ๏ฟฝ๏ฟฝ๏ฟฝ๐‘ โ‰ค sup๐‘กโˆˆโ„

{๐‘ก๐‘  โˆ’ 1

๐‘|๐‘ก |๐‘

}โ‰ค 1๐‘ž|๐‘  |๐‘ž,

where we have used the classical Young inequality ๐‘ก๐‘  โ‰ค 1๐‘ž |๐‘ก |๐‘ + 1

๐‘ž |๐‘  |๐‘ž in the last step. Thisshows that ๐œ‘โˆ—(๐‘ ) = 1

๐‘ž |๐‘  |๐‘ž .1

We now write using the denition of the norm in ๐‘‹ โˆ— that

๐น โˆ—(๐‘ฅโˆ—) = sup๐‘ฅโˆˆ๐‘‹

{ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ 1

๐‘โ€–๐‘ฅ โ€–๐‘๐‘‹

}= sup

๐‘กโ‰ฅ0

{sup๐‘ฅโˆˆ๐”น๐‘‹

{ใ€ˆ๐‘ฅโˆ—, ๐‘ก๐‘ฅใ€‰๐‘‹ โˆ’ 1

๐‘โ€–๐‘ก๐‘ฅ โ€–๐‘๐‘‹

}}= sup

๐‘กโ‰ฅ0

{๐‘ก โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โˆ’ 1

๐‘|๐‘ก |๐‘

}=

1๐‘ž|โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— |๐‘ž

since ๐œ‘ is even and the supremum over all ๐‘ก โˆˆ โ„ is thus attained for ๐‘ก โ‰ฅ 0. ๏ฟฝ

As for convex subdierentials, Fenchel conjugates of integral functionals can be computedpointwise.

Theorem 5.5. Let ๐‘“ : โ„ โ†’ โ„ be measurable, proper and lower semicontinuous, and let๐น : ๐ฟ๐‘ (ฮฉ) โ†’ โ„ with 1 โ‰ค ๐‘ < โˆž be dened as in Lemma 3.7. Then we have for ๐‘ž = ๐‘

๐‘โˆ’1 that

๐น โˆ— : ๐ฟ๐‘ž (ฮฉ) โ†’ โ„, ๐น โˆ—(๐‘ขโˆ—) =โˆซฮฉ๐‘“ โˆ—(๐‘ขโˆ—(๐‘ฅ)) ๐‘‘๐‘ฅ .

Proof. We argue similarly as in the proof of Theorem 4.11, with some changes that areneeded sincemeasurability of ๐‘“ โ—ฆ๐‘ข does not immediately imply that of ๐‘“ โˆ—โ—ฆ๐‘ขโˆ—. Let๐‘ขโˆ— โˆˆ ๐ฟ๐‘ž (ฮฉ)be arbitrary and consider for all ๐‘ฅ โˆˆ ฮฉ the functions

๐œ‘ (๐‘ฅ) โ‰” sup๐‘กโˆˆโ„

{๐‘ก๐‘ขโˆ—(๐‘ฅ) โˆ’ ๐‘“ (๐‘ก)} = ๐‘“ โˆ—(๐‘ขโˆ—(๐‘ฅ)),as well as for ๐‘› โˆˆ โ„•

๐œ‘๐‘› (๐‘ฅ) โ‰” sup|๐‘ก |โ‰ค๐‘›

{๐‘ก๐‘ขโˆ—(๐‘ฅ) โˆ’ ๐‘“ (๐‘ก)} .

1Which is how the Fenchelโ€“Young inequality got its name.

60

5 fenchel duality

Since ๐‘ขโˆ— is measurable, so is ๐‘ก๐‘ขโˆ— โˆ’ ๐‘“ (๐‘ก) for all ๐‘ก โˆˆ dom ๐‘“ โ‰  โˆ… and hence ๐œ‘๐‘› as the pointwisesupremum of measurable functions. Furthermore, by assumption there exists a ๐‘ก0 โˆˆ dom ๐‘“and hence ๐‘ข0 โ‰” ๐‘ก0๐‘ข

โˆ—(๐‘ฅ) โˆ’ ๐‘“ (๐‘ก0) is measurable as well and satises ๐œ‘๐‘› (๐‘ฅ) โ‰ฅ ๐‘ข0 for all๐‘› โ‰ฅ |๐‘ก0 |. Finally, by construction, ๐œ‘๐‘› (๐‘ฅ) is monotonically increasing and converges to ๐œ‘ (๐‘ฅ)for all ๐‘ฅ โˆˆ ฮฉ. The sequence {๐œ‘๐‘› โˆ’ ๐‘ข0}๐‘›โˆˆโ„• of functions is thus measurable and nonnegative,and the monotone convergence theorem yields thatโˆซ

ฮฉ๐œ‘ (๐‘ฅ) โˆ’ ๐‘ข0(๐‘ฅ) ๐‘‘๐‘ฅ =

โˆซฮฉsup๐‘›โˆˆโ„•

๐œ‘๐‘› (๐‘ฅ) โˆ’ ๐‘ข0 ๐‘‘๐‘ฅ = sup๐‘›โˆˆโ„•

โˆซฮฉ๐œ‘๐‘› (๐‘ฅ) โˆ’ ๐‘ข0(๐‘ฅ) ๐‘‘๐‘ฅ .

Hence, the pointwise limit ๐œ‘ = ๐‘“ โˆ— โ—ฆ ๐‘ขโˆ— is measurable. By a measurable selection theorem([Ekeland & Tรฉmam 1999, Theorem VIII.1.2]), the pointwise supremum in the denitionof ๐œ‘๐‘› is thus attained at some ๐‘ข๐‘› (๐‘ฅ) and denes a measurable mapping ๐‘ฅ โ†ฆโ†’ ๐‘ข๐‘› (๐‘ฅ) withโ€–๐‘ข๐‘›โ€–๐ฟโˆž (ฮฉ) โ‰ค ๐‘›. We thus haveโˆซ

ฮฉ๐‘“ โˆ—(๐‘ขโˆ—(๐‘ฅ)) ๐‘‘๐‘ฅ = sup

๐‘›โˆˆโ„•

โˆซฮฉsup|๐‘ก |โ‰ค๐‘›

{๐‘ก๐‘ขโˆ—(๐‘ฅ) โˆ’ ๐‘“ (๐‘ก)} ๐‘‘๐‘ฅ

= sup๐‘›โˆˆโ„•

โˆซฮฉ๐‘ขโˆ—(๐‘ฅ)๐‘ข๐‘› (๐‘ฅ) โˆ’ ๐‘“ (๐‘ข๐‘› (๐‘ฅ)) ๐‘‘๐‘ฅ

โ‰ค sup๐‘ขโˆˆ๐ฟ๐‘ (ฮฉ)

โˆซฮฉ๐‘ขโˆ—(๐‘ฅ)๐‘ข (๐‘ฅ) โˆ’ ๐‘“ (๐‘ข (๐‘ฅ)) ๐‘‘๐‘ฅ = ๐น โˆ—(๐‘ขโˆ—),

since ๐‘ข๐‘› โˆˆ ๐ฟโˆž(ฮฉ) โŠ‚ ๐ฟ๐‘ (ฮฉ) for all ๐‘› โˆˆ โ„•.

For the converse inequality, we can now proceed as in the proof of Theorem 4.11. For any๐‘ข โˆˆ ๐ฟ๐‘ (ฮฉ) and ๐‘ขโˆ— โˆˆ ๐ฟ๐‘ž (ฮฉ), we have by the Fenchelโ€“Young inequality (5.1) applied to ๐‘“ and๐‘“ โˆ— that

๐‘“ (๐‘ข (๐‘ฅ)) + ๐‘“ โˆ—(๐‘ขโˆ—(๐‘ฅ)) โ‰ฅ ๐‘ขโˆ—(๐‘ฅ)๐‘ข (๐‘ฅ) for almost every ๐‘ฅ โˆˆ ฮฉ.

Since both sides are measurable, this implies thatโˆซฮฉ๐‘“ โˆ—(๐‘ขโˆ—(๐‘ฅ)) ๐‘‘๐‘ฅ โ‰ฅ

โˆซฮฉ๐‘ขโˆ—(๐‘ฅ)๐‘ข (๐‘ฅ) โˆ’ ๐‘“ (๐‘ข (๐‘ฅ)) ๐‘‘๐‘ฅ,

and taking the supremum over all ๐‘ข โˆˆ ๐ฟ๐‘ (ฮฉ) yields claim. ๏ฟฝ

Remark 5.6. A similar representation representation can be shown for vector-valued and spatially-dependent integrands ๐‘“ : ฮฉ ร—โ„ โ†’ โ„๐‘š under stronger assumptions; see, e.g., [Rockafellar 1976a,Corollary 3C].

Fenchel conjugates satisfy a number of useful calculus rules, which follow directly fromthe properties of the supremum.

Lemma 5.7. Let ๐น : ๐‘‹ โ†’ โ„ be proper. Then,

61

5 fenchel duality

(i) (๐›ผ๐น )โˆ— = ๐›ผ๐น โˆ— โ—ฆ (๐›ผโˆ’1Id) for any ๐›ผ > 0;

(ii) (๐น (ยท + ๐‘ฅ0) + ใ€ˆ๐‘ฅโˆ—0, ยทใ€‰๐‘‹ )โˆ— = ๐น โˆ—(ยท โˆ’ ๐‘ฅโˆ—0) โˆ’ ใ€ˆยท โˆ’ ๐‘ฅโˆ—0, ๐‘ฅ0ใ€‰๐‘‹ for all ๐‘ฅ0 โˆˆ ๐‘‹ , ๐‘ฅโˆ—0 โˆˆ ๐‘‹ โˆ—;

(iii) (๐น โ—ฆ ๐พ)โˆ— = ๐น โˆ— โ—ฆ ๐พโˆ’โˆ— for continuously invertible ๐พ โˆˆ ๐•ƒ(๐‘Œ ;๐‘‹ ) and ๐พโˆ’โˆ— โ‰” (๐พโˆ’1)โˆ—.

Proof. (i): For any ๐›ผ > 0, we have that

(๐›ผ๐น )โˆ—(๐‘ฅโˆ—) = sup๐‘ฅโˆˆ๐‘‹

{๐›ผ ใ€ˆ๐›ผโˆ’1๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐›ผ๐น (๐‘ฅ)} = ๐›ผ sup

๐‘ฅโˆˆ๐‘‹

{ใ€ˆ๐›ผโˆ’1๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐น (๐‘ฅ)} = ๐›ผ๐น โˆ—(๐›ผโˆ’1๐‘ฅโˆ—).

(ii): Since {๐‘ฅ + ๐‘ฅ0 | ๐‘ฅ โˆˆ ๐‘‹ } = ๐‘‹ , we have that

(๐น (ยท + ๐‘ฅ0) + ใ€ˆ๐‘ฅโˆ—0, ยทใ€‰๐‘‹ )โˆ—(๐‘ฅโˆ—) = sup๐‘ฅโˆˆ๐‘‹

{ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐น (๐‘ฅ + ๐‘ฅ0)} โˆ’ ใ€ˆ๐‘ฅโˆ—0, ๐‘ฅใ€‰๐‘‹= sup๐‘ฅโˆˆ๐‘‹

{ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—0, ๐‘ฅ + ๐‘ฅ0ใ€‰๐‘‹ โˆ’ ๐น (๐‘ฅ + ๐‘ฅ0)} โˆ’ ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—0, ๐‘ฅ0ใ€‰๐‘‹

= sup๐‘ฅ=๐‘ฅ+๐‘ฅ0,๐‘ฅโˆˆ๐‘‹

{ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—0, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐น (๐‘ฅ)} โˆ’ ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—0, ๐‘ฅ0ใ€‰๐‘‹

= ๐น โˆ—(๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—0) โˆ’ ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—0, ๐‘ฅ0ใ€‰๐‘‹ .

(iii): Since ๐‘‹ = ran๐พ , we have that

(๐น โ—ฆ ๐พ)โˆ—(๐‘ฆโˆ—) = sup๐‘ฆโˆˆ๐‘Œ

{ใ€ˆ๐‘ฆโˆ—, ๐พโˆ’1๐พ๐‘ฆใ€‰๐‘Œ โˆ’ ๐น (๐พ๐‘ฆ)}= sup๐‘ฅ=๐พ๐‘ฆ,๐‘ฆโˆˆ๐‘Œ

{ใ€ˆ๐พโˆ’โˆ—๐‘ฆโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐น (๐‘ฅ)} = ๐น โˆ—(๐พโˆ’โˆ—๐‘ฆโˆ—). ๏ฟฝ

There are some obvious similarities between the denitions of the Fenchel conjugate andof the subdierential, which yield the following very useful property that plays the roleof a โ€œconvex inverse function theoremโ€. (See also Figure 5.1b and compare Figures 4.1aand 4.1b.)

Lemma 5.8 (Fenchelโ€“Young). Let ๐น : ๐‘‹ โ†’ โ„ be proper, convex, and lower semicontinuous.Then the following statements are equivalent for any ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—:

(i) ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ = ๐น (๐‘ฅ) + ๐น โˆ—(๐‘ฅโˆ—);(ii) ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ);(iii) ๐‘ฅ โˆˆ ๐œ•๐น โˆ—(๐‘ฅโˆ—).

62

5 fenchel duality

Proof. If (i) holds, the denition of ๐น โˆ— as a supremum immediately implies that

(5.2) ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐น (๐‘ฅ) = ๐น โˆ—(๐‘ฅโˆ—) โ‰ฅ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐น (๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘‹,which again by denition is equivalent to (ii). Conversely, taking the supremum over all๐‘ฅ โˆˆ ๐‘‹ in (5.2) yields

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐น (๐‘ฅ) + ๐น โˆ—(๐‘ฅโˆ—),which together with the Fenchelโ€“Young inequality (5.1) leads to (i).

Similarly, (i) in combination with Theorem 5.1 yields that for all ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—,

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐น โˆ—(๐‘ฅโˆ—) = ๐น (๐‘ฅ) = ๐น โˆ—โˆ—(๐‘ฅ) โ‰ฅ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐น โˆ—(๐‘ฅโˆ—),yielding as above the equivalence of (i) and (iii). ๏ฟฝ

Remark 5.9. Recall that ๐œ•๐น โˆ—(๐‘ฅโˆ—) โŠ‚ ๐‘‹ โˆ—โˆ—. Therefore, if ๐‘‹ is not reexive, ๐‘ฅ โˆˆ ๐œ•๐น โˆ—(๐‘ฅโˆ—) in (iii) has to beunderstood via the canonical injection ๐ฝ : ๐‘‹ โ†ฉโ†’ ๐‘‹ โˆ—โˆ— as ๐ฝ๐‘ฅ โˆˆ ๐œ•๐น โˆ—(๐‘ฅโˆ—), i.e., as

ใ€ˆ๐ฝ๐‘ฅ, ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—ใ€‰๐‘‹ โˆ— = ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค ๐น โˆ—(๐‘ฅโˆ—) โˆ’ ๐น โˆ—(๐‘ฅโˆ—) for all ๐‘ฅโˆ— โˆˆ ๐‘‹ .Using (iii) to conclude equality in (i) or, equivalently, the subdierential inclusion (ii) thereforerequires the additional condition that ๐‘ฅ โˆˆ ๐‘‹ โ†ฉโ†’ ๐‘‹ โˆ—โˆ—. Conversely, if (i) or (ii) hold, (iii) also guaranteesthat the subderivative ๐‘ฅ is an element of ๐œ•๐น โˆ—(๐‘ฅโˆ—) โˆฉ ๐‘‹ , which is a stronger claim.

Similar statements apply to (weakly-โˆ— lower semicontinuous) ๐น : ๐‘‹ โˆ— โ†’ โ„ and ๐นโˆ— : ๐‘‹ โ†’ โ„.

5.2 duality of optimization problems

Lemma 5.8 can be used to replace the subdierential of a (complicated) norm with that of a(simpler) conjugate indicator functional (or vice versa). For example, given a problem ofthe form

(5.3) inf๐‘ฅโˆˆ๐‘‹

๐น (๐‘ฅ) +๐บ (๐พ๐‘ฅ)

for ๐น : ๐‘‹ โ†’ โ„ and๐บ : ๐‘Œ โ†’ โ„ proper, convex, and lower semicontinuous, and๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ),we can use Theorem 5.1 to replace ๐บ with the denition of ๐บโˆ—โˆ— and obtain the saddle-pointproblem

(5.4) inf๐‘ฅโˆˆ๐‘‹

sup๐‘ฆโˆ—โˆˆ๐‘Œ โˆ—

๐น (๐‘ฅ) + ใ€ˆ๐‘ฆโˆ—, ๐พ๐‘ฅใ€‰๐‘Œ โˆ’๐บโˆ—(๐‘ฆโˆ—).

If(!) we were now able to exchange inf and sup, we could write (with inf ๐น = โˆ’ sup(โˆ’๐น ))inf๐‘ฅโˆˆ๐‘‹

sup๐‘ฆโˆ—โˆˆ๐‘Œ โˆ—

๐น (๐‘ฅ) + ใ€ˆ๐‘ฆโˆ—, ๐พ๐‘ฅใ€‰๐‘Œ โˆ’๐บโˆ—(๐‘ฆโˆ—) = sup๐‘ฆโˆ—โˆˆ๐‘Œ โˆ—

inf๐‘ฅโˆˆ๐‘‹

๐น (๐‘ฅ) + ใ€ˆ๐‘ฆโˆ—, ๐พ๐‘ฅใ€‰๐‘Œ โˆ’๐บโˆ—(๐‘ฆโˆ—)

= sup๐‘ฆโˆ—โˆˆ๐‘Œ โˆ—

โˆ’{sup๐‘ฅโˆˆ๐‘‹

โˆ’๐น (๐‘ฅ) + ใ€ˆโˆ’๐พโˆ—๐‘ฆโˆ—, ๐‘ฅใ€‰๐‘‹}โˆ’๐บโˆ—(๐‘ฆโˆ—).

63

5 fenchel duality

From the denition of ๐น โˆ—, we thus obtain the dual problem

(5.5) sup๐‘ฆโˆ—โˆˆ๐‘Œ โˆ—

โˆ’๐บโˆ—(๐‘ฆโˆ—) โˆ’ ๐น โˆ—(โˆ’๐พโˆ—๐‘ฆโˆ—).

As a side eect, we have shifted the operator ๐พ from๐บ to ๐น โˆ— without having to invert it.

The following theorem in an elegant way uses the Fermat principle, the sum and chain rules,and the Fenchelโ€“Young equality to derive sucient conditions for the exchangeability.

Theorem 5.10 (Fenchelโ€“Rockafellar). Let ๐‘‹ and ๐‘Œ be Banach spaces, ๐น : ๐‘‹ โ†’ โ„ and ๐บ :๐‘Œ โ†’ โ„ be proper, convex, and lower semicontinuous, and ๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ). Assume furthermorethat

(i) the primal problem (5.3) admits a solution ๐‘ฅ โˆˆ ๐‘‹ ;(ii) there exists an ๐‘ฅ0 โˆˆ dom(๐บ โ—ฆ ๐พ) โˆฉ dom ๐น with ๐พ๐‘ฅ0 โˆˆ int(dom๐บ) .

Then the dual problem (5.5) admits a solution ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ— and

(5.6) min๐‘ฅโˆˆ๐‘‹

๐น (๐‘ฅ) +๐บ (๐พ๐‘ฅ) = max๐‘ฆโˆ—โˆˆ๐‘Œ โˆ— โˆ’๐บ

โˆ—(๐‘ฆโˆ—) โˆ’ ๐น โˆ—(โˆ’๐พโˆ—๐‘ฆโˆ—).

Furthermore, ๐‘ฅ and ๐‘ฆโˆ— are solutions to (5.3) and (5.5), respectively, if and only if

(5.7){

๐‘ฆโˆ— โˆˆ ๐œ•๐บ (๐พ๐‘ฅ),โˆ’๐พโˆ—๐‘ฆโˆ— โˆˆ ๐œ•๐น (๐‘ฅ).

Proof. Let rst ๐‘ฅ โˆˆ ๐‘‹ be a solution to (5.3). By assumption (ii), Theorems 4.14 and 4.17 areapplicable; Theorem 4.2 thus implies that

0 โˆˆ ๐œ•(๐น +๐บ โ—ฆ ๐พ) (๐‘ฅ) = ๐พโˆ—๐œ•๐บ (๐พ๐‘ฅ) + ๐œ•๐น (๐‘ฅ)

and thus the existence of a ๐‘ฆโˆ— โˆˆ ๐œ•๐บ (๐พ๐‘ฅ) with โˆ’๐พโˆ—๐‘ฆโˆ— โˆˆ ๐œ•๐น (๐‘ฅ), i.e., satisfying (5.7).Conversely, let (5.7) hold for ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ—. Then again by Theorems 4.2, 4.14 and 4.17,๐‘ฅ is a solution to (5.3). Furthermore, (5.7) together with Lemma 5.8 imply equality in theFenchelโ€“Young inequalities for ๐น and ๐บ , i.e.,

(5.8){ ใ€ˆ๐‘ฆโˆ—, ๐พ๐‘ฅใ€‰๐‘Œ = ๐บ (๐พ๐‘ฅ) +๐บโˆ—(๐‘ฆโˆ—),ใ€ˆโˆ’๐พโˆ—๐‘ฆโˆ—, ๐‘ฅใ€‰๐‘‹ = ๐น (๐‘ฅ) + ๐น โˆ—(โˆ’๐พโˆ—๐‘ฆโˆ—).

Adding both equations and rearranging now yields

(5.9) ๐น (๐‘ฅ) +๐บ (๐พ๐‘ฅ) = โˆ’๐น โˆ—(โˆ’๐พโˆ—๐‘ฆโˆ—) โˆ’๐บโˆ—(๐‘ฆโˆ—) .

It remains to show that ๐‘ฆโˆ— is a solution to (5.5). For this purpose, we introduce

(5.10) ๐ฟ : ๐‘‹ ร— ๐‘Œ โˆ— โ†’ โ„, ๐ฟ(๐‘ฅ, ๐‘ฆโˆ—) = ๐น (๐‘ฅ) + ใ€ˆ๐‘ฆโˆ—, ๐พ๐‘ฅใ€‰๐‘Œ โˆ’๐บโˆ—(๐‘ฆโˆ—).

64

5 fenchel duality

For all ๐‘ฅ โˆˆ ๐‘‹ and ๏ฟฝ๏ฟฝโˆ— โˆˆ ๐‘Œ โˆ—, we always have that

(5.11) sup๐‘ฆโˆ—โˆˆ๐‘Œ โˆ—

๐ฟ(๐‘ฅ, ๐‘ฆโˆ—) โ‰ฅ ๐ฟ(๐‘ฅ, ๏ฟฝ๏ฟฝโˆ—) โ‰ฅ inf๐‘ฅโˆˆ๐‘‹

๐ฟ(๐‘ฅ, ๏ฟฝ๏ฟฝโˆ—),

and hence (taking the inmum over all ๐‘ฅ in the rst and the supremum over all ๏ฟฝ๏ฟฝโˆ— in thesecond inequality) that

(5.12) inf๐‘ฅโˆˆ๐‘‹

sup๐‘ฆโˆ—โˆˆ๐‘Œ โˆ—

๐ฟ(๐‘ฅ, ๐‘ฆโˆ—) โ‰ฅ sup๐‘ฆโˆ—โˆˆ๐‘Œ โˆ—

inf๐‘ฅโˆˆ๐‘‹

๐ฟ(๐‘ฅ, ๐‘ฆโˆ—).

We thus obtain that

(5.13) ๐น (๐‘ฅ) +๐บ (๐พ๐‘ฅ) = inf๐‘ฅโˆˆ๐‘‹

sup๐‘ฆโˆ—โˆˆ๐‘Œ โˆ—

๐น (๐‘ฅ) + ใ€ˆ๐‘ฆโˆ—, ๐พ๐‘ฅใ€‰๐‘Œ โˆ’๐บโˆ—(๐‘ฆโˆ—)

โ‰ฅ sup๐‘ฆโˆ—โˆˆ๐‘Œ โˆ—

inf๐‘ฅโˆˆ๐‘‹

๐น (๐‘ฅ) + ใ€ˆ๐‘ฆโˆ—, ๐พ๐‘ฅใ€‰๐‘Œ โˆ’๐บโˆ—(๐‘ฆโˆ—)

= sup๐‘ฆโˆ—โˆˆ๐‘Œ โˆ—

โˆ’๐บโˆ—(๐‘ฆโˆ—) โˆ’ ๐น โˆ—(โˆ’๐พโˆ—๐‘ฆโˆ—).

Combining this with (5.9) yields that

โˆ’๐บโˆ—(๐‘ฆโˆ—) โˆ’ ๐น (โˆ’๐พโˆ—๐‘ฆโˆ—) = ๐น (๐‘ฅ) +๐บ (๐พ๐‘ฅ) โ‰ฅ sup๐‘ฆโˆ—โˆˆ๐‘Œ โˆ—

โˆ’๐บโˆ—(๐‘ฆโˆ—) โˆ’ ๐น โˆ—(โˆ’๐พโˆ—๐‘ฆโˆ—),

i.e., ๐‘ฆโˆ— is a solution to (5.5), which in particular shows the claimed existence of a solution.

Since all solutions to (5.5) have by denition the same (maximal) functional value, (5.9) alsoimplies (5.6).

Finally, if ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ— are solutions to (5.3) and (5.5), respectively, the just derivedstrong duality (5.6) conversely implies that (5.9) holds. Together with the productive zero,we obtain from this that

0 = [๐บ (๐พ๐‘ฅ) +๐บโˆ—(๐‘ฆโˆ—) โˆ’ ใ€ˆ๐‘ฆโˆ—, ๐พ๐‘ฅใ€‰๐‘‹ ] + [๐น (๐‘ฅ) + ๐น โˆ—(โˆ’๐พโˆ—๐‘ฆโˆ—) โˆ’ ใ€ˆโˆ’๐พโˆ—๐‘ฆโˆ—, ๐‘ฅใ€‰๐‘Œ ] .Since both brackets have to be nonnegative due to the Fenchelโ€“Young inequality, they eachhave to be zero. We therefore deduce that (5.8) holds, and hence Lemma 5.8 implies (5.7). ๏ฟฝ

Remark 5.11. If ๐‘‹ is the dual of a separable Banach space ๐‘‹โˆ—, it is possible to derive a similar dualityresult with the (weakly-โˆ— lower semicontinuous) preconjugate ๐นโˆ— : ๐‘‹โˆ— โ†’ โ„ in place of ๐น โˆ— : ๐‘‹ โˆ— โ†’ โ„

under the additional assumption that ran๐พโˆ— โŠ‚ ๐‘‹โˆ— ( ๐‘‹ โˆ— (using Remark 5.9 in (5.8)). If ๐‘‹โˆ— is a โ€œnicerโ€space than ๐‘‹ โˆ— (e.g., for ๐‘‹ = M(ฮฉ), the space of bounded Radon measures on a domain ฮฉ with๐‘‹โˆ— = ๐ถ0(ฮฉ), the space of continuous functions with compact support), the predual problem

sup๐‘ฆโˆ—โˆˆ๐‘Œ โˆ—

โˆ’๐บโˆ—(๐‘ฆโˆ—) โˆ’ ๐นโˆ—(โˆ’๐พโˆ—๐‘ฆโˆ—)

may be easier to treat than the dual problem (5.5). This is the basis of the โ€œpreduality trickโ€ used in,e.g., [Hintermรผller & Kunisch 2004; Clason & Kunisch 2011].

65

5 fenchel duality

Remark 5.12. The condition (ii) was only used to guarantee equality in the sum and chain rulesTheorems 4.14 and 4.17 applied to ๐น +๐บ โ—ฆ ๐พ . Since these rules hold under the weaker conditionof Remark 4.16 (recall that the chain rule was proved by reduction to the sum rule), Theorem 5.10and Corollary 5.13 hold under this weaker condition as well.

The relations (5.7) are referred to as Fenchel extremality conditions; we can use Lemma 5.8to generate further, equivalent, optimality conditions by inverting one or the other sub-dierential inclusion. We will later exploit this to derive implementable algorithms forsolving optimization problems of the form (5.3). Furthermore, Theorem 5.10 characterizesthe subderivative ๐‘ฆโˆ— produced by the sum and chain rules as solution to a convex mini-mization problem, which may be useful. For example, if either ๐น โˆ— or ๐บโˆ— is strongly convex,this subderivative will be unique, which has benecial consequences for the stability andthe convergence of algorithms for the computation of solutions to (5.7).

For their analysis, it will sometimes be more convenient to apply the consequences ofTheorem 5.10 in the form of the saddle-point problem (5.4). For a general mapping ๐ฟ :๐‘‹ ร— ๐‘Œ โˆ— โ†’ โ„, we call (๐‘ฅ, ๏ฟฝ๏ฟฝโˆ—) a saddle point of ๐ฟ if

(5.14) sup๐‘ฆโˆ—โˆˆ๐‘Œ โˆ—

๐ฟ(๐‘ฅ, ๐‘ฆโˆ—) โ‰ค ๐ฟ(๐‘ฅ, ๏ฟฝ๏ฟฝโˆ—) โ‰ค inf๐‘ฅโˆˆ๐‘‹

๐ฟ(๐‘ฅ, ๏ฟฝ๏ฟฝโˆ—).

(Note that the opposite inequality (5.11) always holds.)

Corollary 5.13. Assume that the conditions of Theorem 5.10 hold. Then there exists a saddlepoint (๐‘ฅ, ๐‘ฆโˆ—) โˆˆ ๐‘‹ ร— ๐‘Œ โˆ— to

๐ฟ(๐‘ฅ, ๐‘ฆโˆ—) โ‰” ๐น (๐‘ฅ) + ใ€ˆ๐‘ฆโˆ—, ๐พ๐‘ฅใ€‰๐‘Œ โˆ’๐บโˆ—(๐‘ฆโˆ—).

Furthermore, for any (๐‘ฅ, ๐‘ฆโˆ—) โˆˆ ๐‘‹ ร— ๐‘Œ โˆ—,

(5.15) ๐น (๐‘ฅ) + ใ€ˆ๐‘ฆโˆ—, ๐พ๐‘ฅใ€‰๐‘Œ โˆ’๐บโˆ—(๐‘ฆโˆ—) โ‰ค ๐น (๐‘ฅ) + ใ€ˆ๐‘ฆโˆ—, ๐พ๐‘ฅใ€‰๐‘Œ โˆ’๐บโˆ—(๐‘ฆโˆ—)โ‰ค ๐น (๐‘ฅ) + ใ€ˆ๐‘ฆโˆ—, ๐พ๐‘ฅใ€‰๐‘Œ โˆ’๐บโˆ—(๐‘ฆโˆ—).

Proof. Both statements follow from the fact that under the assumption, the inequality in(5.13) and hence in (5.14) holds as an equality. ๏ฟฝ

With the notation ๐‘ข = (๐‘ฅ, ๐‘ฆ), let us dene the duality gap

(5.16) G(๐‘ข) โ‰” ๐น (๐‘ฅ) +๐บ (๐พ๐‘ฅ) +๐บโˆ—(๐‘ฆโˆ—) + ๐น โˆ—(โˆ’๐พโˆ—๐‘ฆโˆ—).

By Theorem 5.10, we have G โ‰ฅ 0 and G(๐‘ข) = 0 if and only if ๐‘ข is a saddle point.

On the other hand, for any saddle point ๐‘ข = (๐‘ฅ, ๐‘ฆโˆ—) of a Lagrangian ๐ฟ : ๐‘‹ ร— ๐‘Œ โˆ— โ†’ โ„, wecan also dene the Lagrangian duality gap

G๐ฟ (๐‘ข;๐‘ข) โ‰” ๐ฟ(๐‘ฅ, ๐‘ฆโˆ—) โˆ’ ๐ฟ(๐‘ฅ, ๐‘ฆโˆ—).

66

5 fenchel duality

For ๐ฟ dened in (5.10), we always have by the denition of the convex conjugate that

0 โ‰ค G๐ฟ (๐‘ข;๐‘ข) โ‰ค G(๐‘ข).

However, G๐ฟ (๐‘ข;๐‘ข) = 0 does not necessarily imply that ๐‘ข is a saddle point. (This is the caseif ๐ฟ is strictly convex in ๐‘ฅ or strictly concave in ๐‘ฆ , i.e., if either ๐น or ๐บโˆ— is strictly convex.)Nevertheless, as we will see in later chapters, the Lagrangian duality gap can generally beshown to converge for iterates produced by optimization algorithms, while this is moredicult for the conventional duality gap.

67

6 MONOTONE OPERATORS AND PROXIMAL POINTS

Any minimizer ๐‘ฅ โˆˆ ๐‘‹ of a convex functional ๐น : ๐‘‹ โ†’ โ„ satises by Theorem 4.2 theFermat principle 0 โˆˆ ๐œ•๐น (๐‘ฅ). To use this to characterize ๐‘ฅ , and, later, to derive implementablealgorithms for its iterative computation, we now study the mapping ๐‘ฅ โ†ฆโ†’ ๐œ•๐น (๐‘ฅ) in moredetail.

6.1 basic properties of set-valued mappings

We start with some basic concepts. For two normed vector spaces ๐‘‹ and ๐‘Œ we consider aset-valued mapping ๐ด : ๐‘‹ โ†’ P(๐‘Œ ), also denoted by ๐ด : ๐‘‹ โ‡’ ๐‘Œ , and dene

โ€ข its domain of denition dom๐ด = {๐‘ฅ โˆˆ ๐‘‹ | ๐ด(๐‘ฅ) โ‰  โˆ…};โ€ข its range ran๐ด =

โ‹ƒ๐‘ฅโˆˆ๐‘‹ ๐ด(๐‘ฅ);

โ€ข its graph graph๐ด = {(๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘‹ ร— ๐‘Œ | ๐‘ฆ โˆˆ ๐ด(๐‘ฅ)};โ€ข its inverse ๐ดโˆ’1 : ๐‘Œ โ‡’ ๐‘‹ via ๐ดโˆ’1(๐‘ฆ) = {๐‘ฅ โˆˆ ๐‘‹ | ๐‘ฆ โˆˆ ๐ด(๐‘ฅ)} for all ๐‘ฆ โˆˆ ๐‘Œ .

(Note that ๐ดโˆ’1(๐‘ฆ) = โˆ… is allowed by the denition; hence for set-valued mappings, theirinverse and preimage โ€“ which always exists โ€“ coincide.)

For ๐ด, ๐ต : ๐‘‹ โ‡’ ๐‘Œ , ๐ถ : ๐‘Œ โ‡’ ๐‘ , and _ โˆˆ โ„ we further dene

โ€ข _๐ด : ๐‘‹ โ‡’ ๐‘Œ via (_๐ด) (๐‘ฅ) = {_๐‘ฆ | ๐‘ฆ โˆˆ ๐ด(๐‘ฅ)};โ€ข ๐ด + ๐ต : ๐‘‹ โ‡’ ๐‘Œ via (๐ด + ๐ต) (๐‘ฅ) = {๐‘ฆ + ๐‘ง | ๐‘ฆ โˆˆ ๐ด(๐‘ฅ), ๐‘ง โˆˆ ๐ต(๐‘ฅ)};โ€ข ๐ถ โ—ฆ๐ด : ๐‘‹ โ‡’ ๐‘ via (๐ถ โ—ฆ๐ด) (๐‘ฅ) = {๐‘ง | there is ๐‘ฆ โˆˆ ๐ด(๐‘ฅ) with ๐‘ง โˆˆ ๐ถ (๐‘ฆ)}.

Of particular importance not only in the following but also in Part IV is the continuity ofset-valued mappings. We rst introduce notions of convergence of sets. So let {๐‘‹๐‘›}๐‘›โˆˆโ„• bea sequence of subsets of ๐‘‹ . We dene

(i) the outer limit as the set

lim sup๐‘›โ†’โˆž

๐‘‹๐‘› โ‰”

{๐‘ฅ โˆˆ ๐‘‹

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ there exists {๐‘›๐‘˜}๐‘˜โˆˆโ„• with ๐‘ฅ๐‘›๐‘˜ โˆˆ ๐‘‹๐‘›๐‘˜ and lim๐‘˜โ†’โˆž

๐‘ฅ๐‘›๐‘˜ = ๐‘ฅ

},

68

6 monotone operators and proximal points

๐‘‹1 ๐‘‹2

๐‘‹3

๐‘‹4

๐‘‹5

. . .

. . .

0

1

Figure 6.1: Illustration of Example 6.1 with lim sup๐‘›โ†’โˆž๐‘‹๐‘› = [0, 1] while lim inf๐‘›โ†’โˆž๐‘‹๐‘› = โˆ….

(ii) the inner limit as the set

lim inf๐‘›โ†’โˆž ๐‘‹๐‘› โ‰”

{๐‘ฅ โˆˆ ๐‘‹

๏ฟฝ๏ฟฝ๏ฟฝ there exist ๐‘ฅ๐‘› โˆˆ ๐‘‹๐‘› with lim๐‘›โ†’โˆž๐‘ฅ๐‘› = ๐‘ฅ

}.

Correspondingly, we dene the weak outer limit and the weak inner limit, denoted byw-lim sup ๐‘›โ†’โˆž๐‘‹๐‘› and w-lim inf ๐‘›โ†’โˆž๐‘‹๐‘›, respectively, using weakly converging (sub)se-quences. Similarly, for a dual space๐‘‹ โˆ—, we dene theweak-โˆ— outer limit w-โˆ—-lim sup ๐‘›โ†’โˆž๐‘‹

โˆ—๐‘›

and the weak-โˆ— inner limit w-โˆ—-lim inf ๐‘›โ†’โˆž๐‘‹ โˆ—๐‘› .

The outer limit consists of all points approximable through some subsequence of the sets๐‘‹๐‘›, while the inner limit has to be approximable through every subsequence. The vastdierence between inner and outer limits is illustrated by the following extreme example.

Example 6.1. Let ๐‘‹ = โ„ and {๐‘‹๐‘›}๐‘›โˆˆโ„•, ๐‘‹๐‘› โŠ‚ [0, 1], be given as

๐‘‹๐‘› โ‰”

[0, 13 ) if ๐‘› = 3๐‘˜ โˆ’ 2 for some ๐‘˜ โˆˆ โ„•,

[ 13 , 23 ) if ๐‘› = 3๐‘˜ โˆ’ 1 for some ๐‘˜ โˆˆ โ„•,

[ 23 , 1] if ๐‘› = 3๐‘˜ for some ๐‘˜ โˆˆ โ„•,

see Figure 6.1. Then,

lim sup๐‘›โ†’โˆž

๐‘‹๐‘› = [0, 1],

since for any๐‘ฅ โˆˆ [0, 1],we can nd a subsequence of {๐‘‹๐‘›}๐‘›โˆˆโ„• (by selecting subsequenceswith, e.g., ๐‘› = 3๐‘˜ โˆ’ 1 for ๐‘˜ โˆˆ โ„• if ๐‘ฅ < 1

3 ) that contain ๐‘ฅ . On the other hand,

lim inf๐‘›โ†’โˆž ๐‘‹๐‘› = โˆ…,

since for any ๐‘ฅ โˆˆ [0, 1], there will be a subsequence of ๐‘‹๐‘› (again, selecting only subse-quences with, e.g., ๐‘› = 3๐‘˜ for ๐‘˜ โˆˆ โ„• if ๐‘ฅ < 1

3 ) that will not contain points arbitrarilyclose to ๐‘ฅ .

Lemma 6.2. Let {๐‘‹๐‘›}๐‘›โˆˆโ„•, ๐‘‹๐‘› โŠ‚ ๐‘‹ . Then lim sup๐‘›โ†’โˆž๐‘‹๐‘› and lim inf๐‘›โ†’โˆž๐‘‹๐‘› are (possiblyempty) closed sets.

69

6 monotone operators and proximal points

Proof. Let ๐‘‹โˆž โ‰” lim sup๐‘›โ†’โˆž๐‘‹๐‘›. If ๐‘‹โˆž is empty, there is nothing to prove. So suppose,{๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• โŠ‚ ๐‘‹โˆž converges to some ๐‘ฆ โˆˆ ๐‘‹ . Then by the denition of ๐‘‹โˆž as an outerlimit, there exist innite subsets ๐‘๐‘˜ โŠ‚ โ„• and subsequences ๐‘ฅ๐‘˜,๐‘› โˆˆ ๐‘‹๐‘› for ๐‘› โˆˆ ๐‘๐‘˜ withlim๐‘๐‘˜3๐‘›โ†’โˆž ๐‘ฅ๐‘˜,๐‘› = ๐‘ฅ๐‘˜ . We can nd for each ๐‘˜ โˆˆ โ„• an index ๐‘›๐‘˜ โˆˆ ๐‘๐‘˜ such that โ€–๐‘ฅ๐‘˜ โˆ’๐‘ฅ๐‘˜,๐‘›๐‘˜ โ€–๐‘‹ โ‰ค1/๐‘›. Thus โ€–๐‘ฆ โˆ’ ๐‘ฅ๐‘˜,๐‘›๐‘˜ โ€–๐‘‹ โ‰ค โ€–๐‘ฆ โˆ’ ๐‘ฅ๐‘˜ โ€–๐‘‹ + 1/๐‘˜ . Letting ๐‘˜ โ†’ โˆž we see that ๐‘‹๐‘›๐‘˜ 3 ๐‘ฅ๐‘˜,๐‘›๐‘˜ โ†’ ๐‘ฆ .Thus ๐‘ฆ โˆˆ ๐‘‹โˆž, that is, ๐‘‹โˆž is (strongly) closed.

Let then ๐‘‹โˆž โ‰” lim inf๐‘›โ†’โˆž๐‘‹๐‘›. If ๐‘‹โˆž is empty, there is nothing to prove. So suppose{๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• โŠ‚ ๐‘‹โˆž converges to some ๐‘ฆ โˆˆ ๐‘‹ . Then for each ๐‘› โˆˆ โ„• there exist ๐‘ฅ๐‘˜,๐‘› โˆˆ ๐‘‹๐‘›with lim๐‘›โ†’โˆž ๐‘ฅ๐‘˜,๐‘› = ๐‘ฅ๐‘˜ . We can consequently nd for each ๐‘˜ โˆˆ โ„• an index ๐‘›๐‘˜ โˆˆ โ„• suchthat โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐‘˜,๐‘›โ€–๐‘‹ < 1/๐‘˜ for ๐‘› โ‰ฅ ๐‘›๐‘˜ . Thus for every ๐‘› โˆˆ โ„• we can nd ๐‘˜๐‘› โˆˆ โ„• suchthat โ€–๐‘ฅ๐‘˜๐‘› โˆ’ ๐‘ฅ๐‘˜๐‘›,๐‘›โ€–๐‘‹ โ‰ค 1/๐‘˜๐‘› with ๐‘˜๐‘› โ†’ โˆž as ๐‘› โ†’ โˆž. Since this implies โ€–๐‘ฆ โˆ’ ๐‘ฅ๐‘˜๐‘›,๐‘›โ€–๐‘‹ โ‰คโ€–๐‘ฆ โˆ’ ๐‘ฅ๐‘˜๐‘› โ€–๐‘‹ + 1/๐‘˜๐‘› , letting ๐‘› โ†’ โˆž we see that ๐‘‹๐‘› 3 ๐‘ฅ๐‘˜๐‘›,๐‘› โ†’ ๐‘ฆ . Thus ๐‘ฆ โˆˆ ๐‘‹โˆž, that is, ๐‘‹โˆž is(strongly) closed. ๏ฟฝ

With these denitions, we can dene limits and continuity of set-valued mappings. Speci-cally, for ๐ด : ๐‘‹ โ‡’ ๐‘Œ , and a subset ๐ถ โŠ‚ ๐‘‹ , we dene the inner and outer limits (relative to๐ถ , if ๐ถ โ‰  ๐‘‹ ) as

lim sup๐ถ3๐‘ฅโ†’๐‘ฅ

๐ด(๐‘ฅ) โ‰”โ‹ƒ

๐ถ3๐‘ฅ๐‘›โ†’๐‘ฅ

lim sup๐‘›โ†’โˆž

๐ด(๐‘ฅ๐‘›),

andlim inf๐ถ3๐‘ฅโ†’๐‘ฅ

๐ด(๐‘ฅ) โ‰”โ‹‚

๐ถ3๐‘ฅ๐‘›โ†’๐‘ฅ

lim inf๐‘›โ†’โˆž ๐ด(๐‘ฅ๐‘›).

If ๐ถ = ๐‘‹ , we drop ๐ถ from the notations. Analogously, we dene weak-to-strong, strong-to-weak, and weak-to-weak limits by replacing ๐‘ฅ๐‘› โ†’ ๐‘ฅ by ๐‘ฅ๐‘› โ‡€ ๐‘ฅ and/or the outer/innerlimit by the weak outer/inner limit.

Corollary 6.3. Let ๐ด : ๐‘‹ โ‡’ ๐‘Œ and ๐‘ฅ โˆˆ ๐‘‹ . Then lim sup๐‘ฅโ†’๐‘ฅ ๐ด(๐‘ฅ) and lim inf๐‘ฅโ†’๐‘ฅ ๐ด(๐‘ฅ) are(possibly empty) closed sets.

Proof. The proof of the closedness of the outer limit is analogous to Lemma 6.2, while theproof of the closedness of the inner limit is a consequence of Lemma 6.2 and of the factthat the intersections of closed sets are closed. ๏ฟฝ

Let then ๐ด : ๐‘‹ โ‡’ ๐‘Œ be a set-valued mapping. We say that

(i) ๐ด is outer semicontinuous at ๐‘ฅ if lim sup๐ถ3๐‘ฅโ†’๐‘ฅ ๐ด(๐‘ฅ) โŠ‚ ๐ด(๐‘ฅ).(ii) ๐ด is inner semicontinuous at ๐‘ฅ if lim inf๐ถ3๐‘ฅโ†’๐‘ฅ ๐ด(๐‘ฅ) โŠƒ ๐ด(๐‘ฅ).(iii) The map ๐ด is outer/inner semicontinuous if it is outer/inner semicontinuous at all

๐‘ฅ โˆˆ ๐‘‹ .

70

6 monotone operators and proximal points

๐ด

๐‘ฅ1 ๐‘ฅ2

Figure 6.2: Illustration of outer and inner semicontinuity. The black line indicates thebounds on the boundary of graph ๐น that belong to the graph. The set-valuedmapping ๐ด is not outer semicontinuous at ๐‘ฅ1, because ๐ด(๐‘ฅ1) does not includeall limits from the right. It is outer semicontinuous at the โ€œdiscontinuousโ€ point๐‘ฅ2, as ๐ด(๐‘ฅ2) includes all limits from both sides. The mapping ๐ด is not innersemicontinuous at ๐‘ฅ2, because at this point, ๐ด(๐‘ฅ) cannot be approximated fromboth sides. It is inner semicontinuous at every other point ๐‘ฅ , including ๐‘ฅ1, as atthis points ๐ด(๐‘ฅ) can be approximated from both sides.

(iv) continuous (at ๐‘ฅ ) if it is both inner and outer semicontinuous (at ๐‘ฅ ).

(v) We say that these properties are โ€œrelative๐ถโ€ when we restrict ๐‘ฅ โˆˆ ๐ถ for some๐ถ โŠ‚ ๐‘‹ .These concepts are illustrated in Figure 6.2.

Just like lower semicontinuity of functionals, the outer semicontinuity of set-valued map-pings can be interpreted as a closedness property and will be crucial. The following lemmais stated for strong-to-strong outer semicontinuity, but corresponding statements hold(with identical proof) for weak-to-strong, strong-to-weak, and weak-to-weak outer semi-continuity as well.

Lemma 6.4. A set-valued mapping ๐ด : ๐‘‹ โ‡’ ๐‘Œ is outer semicontinuous if and only ifgraph๐ด โŠ‚ ๐‘‹ ร— ๐‘Œ is closed, i.e., ๐‘ฅ๐‘› โ†’ ๐‘ฅ and ๐ด(๐‘ฅ๐‘›) 3 ๐‘ฆ๐‘› โ†’ ๐‘ฆ imply that ๐‘ฆ โˆˆ ๐ด(๐‘ฅ).

Proof. Let ๐‘ฅ๐‘› โ†’ ๐‘ฅ and ๐‘ฆ๐‘› โˆˆ ๐ด(๐‘ฅ๐‘›), and suppose also ๐‘ฆ๐‘› โ†’ ๐‘ฆ . Then if graph๐ด is closed,(๐‘ฅ, ๐‘ฆ) โˆˆ graph๐ด and hence ๐‘ฆ โˆˆ ๐ด(๐‘ฅ). Since this holds for arbitrary sequences {๐‘ฅ๐‘›}๐‘›โˆˆโ„•, ๐ดis outer semicontinuous.

If, on the other hand, ๐ด is outer semicontinuous, and (๐‘ฅ๐‘›, ๐‘ฆ๐‘›) โˆˆ graph๐ด converge to(๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘‹ ร— ๐‘Œ โ€š then ๐‘ฆ โˆˆ ๐ด(๐‘ฅ) and hence (๐‘ฅ, ๐‘ฆ) โˆˆ graph๐ด. Since this holds for arbitrarysequences {(๐‘ฅ๐‘›, ๐‘ฆ๐‘›)}๐‘›โˆˆโ„•, graph๐ด is closed. ๏ฟฝ

6.2 monotone operators

For the codomain ๐‘Œ = ๐‘‹ โˆ— (as in the case of ๐‘ฅ โ†ฆโ†’ ๐œ•๐น (๐‘ฅ)), additional properties becomeimportant. A set-valued mapping๐ด : ๐‘‹ โ‡’ ๐‘‹ โˆ— is calledmonotone if graph๐ด โ‰  โˆ… (to exclude

71

6 monotone operators and proximal points

trivial cases) and

(6.1) ใ€ˆ๐‘ฅโˆ—1 โˆ’ ๐‘ฅโˆ—2, ๐‘ฅ1 โˆ’ ๐‘ฅ2ใ€‰๐‘‹ โ‰ฅ 0 for all (๐‘ฅ1, ๐‘ฅโˆ—1 ), (๐‘ฅ2, ๐‘ฅโˆ—2) โˆˆ graph๐ด.

If ๐น : ๐‘‹ โ†’ โ„ is convex, then ๐œ•๐น : ๐‘‹ โ‡’ ๐‘‹ โˆ—, ๐‘ฅ โ†ฆโ†’ ๐œ•๐น (๐‘ฅ), is monotone: For any ๐‘ฅ1, ๐‘ฅ2 โˆˆ ๐‘‹with ๐‘ฅโˆ—1 โˆˆ ๐œ•๐น (๐‘ฅ1) and ๐‘ฅโˆ—2 โˆˆ ๐œ•๐น (๐‘ฅ2), we have by denition that

ใ€ˆ๐‘ฅโˆ—1 , ๐‘ฅ โˆ’ ๐‘ฅ1ใ€‰๐‘‹ โ‰ค ๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ1) for all ๐‘ฅ โˆˆ ๐‘‹,ใ€ˆ๐‘ฅโˆ—2, ๐‘ฅ โˆ’ ๐‘ฅ2ใ€‰๐‘‹ โ‰ค ๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ2) for all ๐‘ฅ โˆˆ ๐‘‹ .

Adding the rst inequality for ๐‘ฅ = ๐‘ฅ2 and the second for ๐‘ฅ = ๐‘ฅ1 and rearranging theresult yields (6.1). (This generalizes the well-known fact that if ๐‘“ : โ„ โ†’ โ„ is convex anddierentiable, ๐‘“ โ€ฒ is monotonically increasing.) Furthermore, if๐ด, ๐ต : ๐‘‹ โ‡’ ๐‘‹ โˆ— are monotoneand _ โ‰ฅ 0, then _๐ด and ๐ด + ๐ต are monotone as well.

In fact, we will need the following, stronger, property, which guarantees that ๐ด is outersemicontinuous: A monotone operator ๐ด : ๐‘‹ โ‡’ ๐‘‹ โˆ— is called maximally monotone, if theredoes not exist another monotone operator ๏ฟฝ๏ฟฝ : ๐‘‹ โ‡’ ๐‘‹ โˆ— such that graph๐ด ( graph ๏ฟฝ๏ฟฝ. Inother words, ๐ด is maximal monotone if for any ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— the condition

(6.2) ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ 0 for all (๐‘ฅ, ๐‘ฅโˆ—) โˆˆ graph๐ด

implies that ๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ). (In other words, (6.2) holds if and only if (๐‘ฅ, ๐‘ฅโˆ—) โˆˆ graph๐ด.)For xed ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—, the condition claims that if ๐ด is monotone, then so is theextension

๏ฟฝ๏ฟฝ : ๐‘‹ โ‡’ ๐‘‹ โˆ—, ๐‘ฅ โ†ฆโ†’{๐ด(๐‘ฅ) โˆช {๐‘ฅโˆ—} if ๐‘ฅ = ๐‘ฅ,

๐ด(๐‘ฅ) if ๐‘ฅ โ‰  ๐‘ฅ .

For ๐ด to be maximally monotone means that this is not a true extension, i.e., ๏ฟฝ๏ฟฝ = ๐ด. Forexample, the operator

๐ด : โ„ โ‡’ โ„, ๐‘ก โ†ฆโ†’{1} if ๐‘ก > 0,{0} if ๐‘ก = 0,{โˆ’1} if ๐‘ก < 0,

is monotone but not maximally monotone, since ๐ด is a proper subset of the monotoneoperator dened by ๏ฟฝ๏ฟฝ(๐‘ก) = sign(๐‘ก) = ๐œ•( | ยท |) (๐‘ก) from Example 4.7.

Several useful properties follow directly from the denition.

Lemma 6.5. If ๐ด : ๐‘‹ โ‡’ ๐‘‹ โˆ— is maximally monotone, then so is _๐ด for all _ > 0.

Proof. Let ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—, and assume that

0 โ‰ค ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ = _ใ€ˆ_โˆ’1๐‘ฅโˆ— โˆ’ _โˆ’1๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ for all (๐‘ฅ, ๐‘ฅโˆ—) โˆˆ graph _๐ด.

Since ๐‘ฅโˆ— โˆˆ _๐ด(๐‘ฅ) if and only if _โˆ’1๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ) and ๐ด is maximally monotone, this impliesthat _โˆ’1๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ), i.e., ๐‘ฅโˆ— โˆˆ (_๐ด) (๐‘ฅ). Hence, _๐ด is maximally monotone. ๏ฟฝ

72

6 monotone operators and proximal points

Lemma 6.6. If ๐ด : ๐‘‹ โ‡’ ๐‘‹ โˆ— is maximally monotone, then ๐ด(๐‘ฅ) is convex and closed for all๐‘ฅ โˆˆ ๐‘‹ .

Proof. Closedness follows from Lemma 6.8. Assume then that ๐ด(๐‘ฅ) is not convex, i.e.,๐‘ฅโˆ—_โ‰” _๐‘ฅโˆ— + (1 โˆ’ _)๐‘ฅโˆ— โˆ‰ ๐ด(๐‘ฅ) for some ๐‘ฅโˆ—, ๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ) and _ โˆˆ (0, 1). We then show that ๐ด is

not maximal. To see this, we dene ๏ฟฝ๏ฟฝ via

๏ฟฝ๏ฟฝ(๐‘ฆ) โ‰”{๐ด(๐‘ฆ) ๐‘ฆ โ‰  ๐‘ฅ,

๐ด(๐‘ฅ) โˆช {๐‘ฅโˆ—_}, ๐‘ฆ = ๐‘ฅ,

and show that ๏ฟฝ๏ฟฝ is monotone. By the denition of ๏ฟฝ๏ฟฝ, it suces to show for all ๐‘ฆ โˆˆ ๐‘‹ and๐‘ฆโˆ— โˆˆ ๐ด(๐‘ฆ) that

ใ€ˆ๐‘ฅโˆ—_ โˆ’ ๐‘ฆโˆ—, ๐‘ฅ โˆ’ ๐‘ฆใ€‰๐‘‹ โ‰ฅ 0.

But this follows directly from the denition of ๐‘ฅ๐‘ฅ_and the monotonicity of ๐ด. ๏ฟฝ

Lemma 6.7. Let ๐‘‹ be a reexive Banach space. If ๐ด : ๐‘‹ โ‡’ ๐‘‹ โˆ— is maximally monotone, thenso is ๐ดโˆ’1 : ๐‘‹ โˆ— โ‡’ ๐‘‹ โˆ—โˆ— ' ๐‘‹ .

Proof. First, recall that the inverse๐ดโˆ’1 : ๐‘‹ โˆ— โ‡’ ๐‘‹ always exists as a set-valued mapping andcan be identied with a set-valued mapping from ๐‘‹ โˆ— to ๐‘‹ โˆ—โˆ— with the aid of the canonicalinjection ๐ฝ : ๐‘‹ โ†’ ๐‘‹ โˆ—โˆ— from (1.2), i.e.,

๐ดโˆ’1(๐‘ฅโˆ—) โ‰” {๐ฝ๐‘ฅ โˆˆ ๐‘‹ โˆ—โˆ— | ๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ)} for all ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—

From this and the denition (1.2), it is clear that ๐ดโˆ’1 is monotone if and only if ๐ด is.

Let now ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— and ๐‘ฅโˆ—โˆ— โˆˆ ๐‘‹ โˆ—โˆ— be given, and assume that

(6.3) ใ€ˆ๐‘ฅโˆ—โˆ— โˆ’ ๐‘ฅโˆ—โˆ—, ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—ใ€‰๐‘‹ โˆ— for all (๐‘ฅโˆ—, ๐‘ฅโˆ—โˆ—) โˆˆ graph๐ดโˆ’1.

Since ๐‘‹ is reexive, ๐ฝ is surjective such that there exists an ๐‘ฅ โˆˆ ๐‘‹ with ๐‘ฅโˆ—โˆ— = ๐ฝ๐‘ฅ . Similarly,we can write ๐‘ฅโˆ—โˆ— = ๐ฝ๐‘ฅ for some ๐‘ฅ โˆˆ ๐‘‹ with ๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ). By denition of the duality pairing,(6.3) is thus equivalent to

ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ 0

for all ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ). But since๐ด is maximally monotone, this implies that ๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ)and hence ๐‘ฅโˆ—โˆ— = ๐ฝ๐‘ฅ โˆˆ ๐ดโˆ’1(๐‘ฅ). ๏ฟฝ

We now come to the outer semicontinuity.

Lemma 6.8. Let ๐ด : ๐‘‹ โ‡’ ๐‘‹ โˆ— be maximally monotone. Then ๐ด is both weak-to-strong andstrong-to-weak-โˆ— outer semicontinuous.

73

6 monotone operators and proximal points

Proof. Let ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— and consider sequences {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘‹ with ๐‘ฅ๐‘› โ‡€ ๐‘ฅ and{๐‘ฅโˆ—๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘‹ โˆ— with ๐‘ฅโˆ—๐‘› โˆˆ ๐ด(๐‘ฅ๐‘›) and ๐‘ฅโˆ—๐‘› โ†’ ๐‘ฅโˆ— (or ๐‘ฅ๐‘› โ†’ ๐‘ฅ and ๐‘ฅโˆ—๐‘› โˆ—โ‡€ ๐‘ฅโˆ—). For arbitrary ๐‘ฅ โˆˆ ๐‘‹and ๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ), the monotonicity of ๐ด implies that

0 โ‰ค ใ€ˆ๐‘ฅโˆ—๐‘› โˆ’ ๐‘ฅโˆ—, ๐‘ฅ๐‘› โˆ’ ๐‘ฅใ€‰๐‘‹ โ†’ ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹since the duality pairing of strongly and weakly (or weakly-โˆ— and strongly) convergingsequences is convergent. Since ๐ด is maximally monotone, we obtain that ๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ) andhence๐ด is weak-to-strong (or strong-to-weakly-โˆ—) outer semicontinuous by Lemma 6.4. ๏ฟฝ

Since the pairing of weakly and weakly-โˆ— convergent sequences does not converge ingeneral, weak-to-weak-โˆ— outer semicontinuity requires additional assumptions on the twosequences. Although we will not need to make use of it, the following notion can proveuseful in other contexts. We call a set-valuedmapping๐ด : ๐‘‹ โ‡’ ๐‘‹ โˆ— BCP outer semicontinuous(for Brezisโ€“Crandallโ€“Pazy), if for any sequences {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘‹ and {๐‘ฅโˆ—๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘‹ โˆ— with

(i) ๐‘ฅ๐‘› โ‡€ ๐‘ฅ and ๐ด(๐‘ฅ๐‘›) 3 ๐‘ฅโˆ—๐‘› โˆ—โ‡€ ๐‘ฅโˆ—,

(ii) lim sup๐‘›โ†’โˆž

ใ€ˆ๐‘ฅโˆ—๐‘› โˆ’ ๐‘ฅโˆ—, ๐‘ฅ๐‘› โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค 0,

we have ๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ). The following result from [Brezis, Crandall & Pazy 1970, Lemma 1.2](hence the name) shows that maximally monotone operators are BCP outer semicontinu-ous.

Lemma 6.9. Let ๐‘‹ be a Banach space and let ๐ด : ๐‘‹ โ‡’ ๐‘‹ โˆ— be maximally monotone. Then ๐ด isBCP outer semicontinuous.

Proof. First, the monotonicity of ๐ด and assumption (ii) imply that

(6.4) 0 โ‰ค lim inf๐‘›โ†’โˆž ใ€ˆ๐‘ฅโˆ—๐‘› โˆ’ ๐‘ฅโˆ—, ๐‘ฅ๐‘› โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค lim sup

๐‘›โ†’โˆžใ€ˆ๐‘ฅโˆ—๐‘› โˆ’ ๐‘ฅโˆ—, ๐‘ฅ๐‘› โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค 0.

Furthermore, from assumption (i) and the fact that ๐‘‹ is a Banach space, it follows that{๐‘ฅ๐‘›}๐‘›โˆˆโ„• and {๐‘ฅโˆ—๐‘›}๐‘›โˆˆโ„• and hence also {ใ€ˆ๐‘ฅโˆ—๐‘›, ๐‘ฅ๐‘›ใ€‰๐‘‹ }๐‘›โˆˆโ„• are bounded. Thus there exists a sub-sequence such that ใ€ˆ๐‘ฅโˆ—๐‘›๐‘˜ , ๐‘ฅ๐‘›๐‘˜ ใ€‰๐‘‹ โ†’ ๐ฟ for some ๐ฟ โˆˆ โ„. Passing to the limit, and using (6.4),we obtain that

0 = lim๐‘˜โ†’โˆž

ใ€ˆ๐‘ฅโˆ—๐‘›๐‘˜ โˆ’ ๐‘ฅโˆ—, ๐‘ฅ๐‘›๐‘˜ โˆ’ ๐‘ฅใ€‰๐‘‹= lim๐‘˜โ†’โˆž

ใ€ˆ๐‘ฅโˆ—๐‘›๐‘˜ , ๐‘ฅ๐‘›๐‘˜ ใ€‰๐‘‹ โˆ’ lim๐‘˜โ†’โˆž

ใ€ˆ๐‘ฅโˆ—๐‘›๐‘˜ , ๐‘ฅใ€‰๐‘‹ โˆ’ lim๐‘˜โ†’โˆž

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘›๐‘˜ ใ€‰๐‘‹ + ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹= ๐ฟ โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ .

Since the limit does not depend on the subsequence, we have that ใ€ˆ๐‘ฅโˆ—๐‘›, ๐‘ฅ๐‘›ใ€‰๐‘‹ โ†’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ .

74

6 monotone operators and proximal points

Let now ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ) be arbitrary. Using again the monotonicity of ๐ด andassumption (i) together with the rst claim yields

0 โ‰ค lim inf๐‘›โ†’โˆž ใ€ˆ๐‘ฅโˆ—๐‘› โˆ’ ๐‘ฅโˆ—, ๐‘ฅ๐‘› โˆ’ ๐‘ฅใ€‰๐‘‹

โ‰ค lim๐‘›โ†’โˆžใ€ˆ๐‘ฅ

โˆ—๐‘›, ๐‘ฅ๐‘›ใ€‰๐‘‹ โˆ’ lim

๐‘›โ†’โˆžใ€ˆ๐‘ฅโˆ—๐‘›, ๐‘ฅใ€‰๐‘‹ โˆ’ lim

๐‘›โ†’โˆžใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘›ใ€‰๐‘‹ + ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹

= ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹and hence that ๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ) by the maximal monotonicity of ๐ด. ๏ฟฝ

The usefulness of BCP outer semicontinuity arises from the fact that it also implies weak-to-strong outer semicontinuity under slightly weaker conditions on ๐ด.

Lemma 6.10. Suppose ๐ด : ๐‘‹ โ‡’ ๐‘‹ โˆ— is monotone (but not necessarily maximally monotone)and BCP outer semicontinuous. Then ๐ด is also weak-to-strong outer semicontinuous.

Proof. Let ๐‘ฅ๐‘› โ‡€ ๐‘ฅ and ๐‘ฅโˆ—๐‘› โ†’ ๐‘ฅโˆ— with ๐‘ฅโˆ—๐‘› โˆˆ ๐ด(๐‘ฅ๐‘›) for all ๐‘› โˆˆ โ„•. This implies that ๐‘ฅโˆ—๐‘› โˆ—โ‡€ ๐‘ฅโˆ—

as well and that {๐‘ฅ๐‘›}๐‘›โˆˆโ„• is bounded. We thus have for some ๐ถ > 0 that

lim sup๐‘›โ†’โˆž

ใ€ˆ๐‘ฅโˆ—๐‘› โˆ’ ๐‘ฅโˆ—, ๐‘ฅ๐‘› โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค ๐ถ lim sup๐‘›โ†’โˆž

โ€–๐‘ฅโˆ—๐‘› โˆ’ ๐‘ฅโˆ—โ€–๐‘‹ โˆ— = 0.

Hence, condition (ii) is satised, and the BCP outer semicontinuity yields ๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ). ๏ฟฝ

We now show that convex subdierentials are maximally monotone. Although this result(known as Rockafellarโ€™s Theorem, see [Rockafellar 1970]) holds in arbitrary Banach spaces,the proof (adapted from [Simons 2009]) greatly simplies in reexive Banach spaces.

Theorem 6.11. Let ๐‘‹ be a reexive Banach space and ๐น : ๐‘‹ โ†’ โ„ be proper, convex, and lowersemicontinuous. Then ๐œ•๐น : ๐‘‹ โ‡’ ๐‘‹ โˆ— is maximally monotone.

Proof. First, we already know that ๐œ•๐น is monotone. Let now ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— be givensuch that

(6.5) ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ 0 for all ๐‘ฅ โˆˆ ๐‘‹, ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ).

We consider๐ฝ : ๐‘‹ โ†’ โ„, ๐‘ง โ†ฆโ†’ ๐น (๐‘ง + ๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘งใ€‰๐‘‹ + 1

2 โ€–๐‘งโ€–2๐‘‹ ,

which is proper, convex and lower semicontinuous by the assumptions on ๐น . Furthermore,๐ฝ is coercive by Lemma 3.9. Theorem 3.8 thus yields a ๐‘ง โˆˆ ๐‘‹ with ๐ฝ (๐‘ง) = min๐‘งโˆˆ๐‘‹ ๐ฝ (๐‘ง). ByTheorems 4.2, 4.5 and 4.14 and Lemma 4.13 (ii) then

(6.6) 0 โˆˆ ๐œ•๐น (๐‘ง + ๐‘ฅ) โˆ’ {๐‘ฅโˆ—} + ๐œ• ๐‘— (๐‘ง),

75

6 monotone operators and proximal points

where we have introduced ๐‘— (๐‘ง) โ‰” 12 โ€–๐‘งโ€–2๐‘‹ . In other words, there exists a ๐‘งโˆ— โˆˆ ๐œ• ๐‘— (๐‘ง) such that

๐‘ฅโˆ— โˆ’ ๐‘งโˆ— โˆˆ ๐œ•๐น (๐‘ง + ๐‘ฅ). Combining Lemma 5.4 for ๐‘ = ๐‘ž = 2 and Lemma 5.8, we furthermorehave that ๐‘งโˆ— โˆˆ ๐œ• ๐‘— (๐‘ง) if and only if

(6.7) ใ€ˆ๐‘งโˆ—, ๐‘งใ€‰๐‘‹ =12 โ€–๐‘งโ€–

2๐‘‹ + 1

2 โ€–๐‘งโˆ—โ€–2๐‘‹ โˆ— .

Applying now (6.5) for ๐‘ฅ = ๐‘ง + ๐‘ฅ and ๐‘ฅโˆ— = ๐‘ฅโˆ— โˆ’ ๐‘งโˆ— โˆˆ ๐œ•๐น (๐‘ฅ), we obtain using (6.7) that

0 โ‰ค ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ— + ๐‘งโˆ—, ๐‘ฅ โˆ’ ๐‘ง โˆ’ ๐‘ฅใ€‰๐‘‹ = โˆ’ใ€ˆ๐‘งโˆ—, ๐‘งใ€‰๐‘‹ = โˆ’ 12 โ€–๐‘ง

โˆ—โ€–2๐‘‹ โˆ— โˆ’ 12 โ€–๐‘งโ€–

2๐‘‹ ,

implying that both ๐‘ง = 0 and ๐‘งโˆ— = 0. Hence by (6.6) we conclude that ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ), whichshows that ๐œ•๐น is maximally monotone. ๏ฟฝ

The argument in the preceding proof can be modied to give a characterization of maximalmonotonicity for general monotone operators; this is known as Mintyโ€™s Theorem and isa central result in the theory of monotone operators. We again make use of the dualitymapping ๐œ• ๐‘— : ๐‘‹ โ‡’ ๐‘‹ โˆ— for ๐‘— (๐‘ฅ) = 1

2 โ€–๐‘ฅ โ€–2๐‘‹ , noting for later use that if ๐‘‹ is a Hilbert space(and we identify ๐‘‹ โˆ— with ๐‘‹ ), then ๐œ• ๐‘— = Id.

Theorem 6.12 (Minty). Let ๐‘‹ be a reexive Banach space and ๐ด : ๐‘‹ โ‡’ ๐‘‹ โˆ— be monotone. If ๐ดis maximally monotone, then ๐œ• ๐‘— +๐ด is surjective.

Proof. We proceed similarly as in the proof of Theorem 6.11 by constructing a functional๐น๐ด which plays the same role for๐ด as ๐น does for ๐œ•๐น . Specically, we dene for a maximallymonotone operator ๐ด : ๐‘‹ โ‡’ ๐‘‹ โˆ— the Fitzpatrick functional

(6.8) ๐น๐ด : ๐‘‹ ร— ๐‘‹ โˆ— โ†’ [โˆ’โˆž,โˆž], (๐‘ฅ, ๐‘ฅโˆ—) โ†ฆโ†’ sup(๐‘ง,๐‘งโˆ—)โˆˆgraph๐ด

(ใ€ˆ๐‘ฅโˆ—, ๐‘งใ€‰๐‘‹ + ใ€ˆ๐‘งโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ใ€ˆ๐‘งโˆ—, ๐‘งใ€‰๐‘‹ ) ,

which can be written equivalently as

(6.9) ๐น๐ด (๐‘ฅ, ๐‘ฅโˆ—) = ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ inf(๐‘ง,๐‘งโˆ—)โˆˆgraph๐ด

ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘งโˆ—, ๐‘ฅ โˆ’ ๐‘งใ€‰๐‘‹ .

Each characterization implies useful properties.

(i) By maximal monotonicity of๐ด, we have by denition that ใ€ˆ๐‘ฅโˆ—โˆ’๐‘งโˆ—, ๐‘ฅ โˆ’๐‘งใ€‰๐‘‹ โ‰ฅ 0 for all(๐‘ง, ๐‘งโˆ—) โˆˆ graph๐ด if and only if (๐‘ฅ, ๐‘ฅโˆ—) โˆˆ graph๐ด; in particular, ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘งโˆ—, ๐‘ฅ โˆ’ ๐‘งใ€‰๐‘‹ < 0for all (๐‘ฅ, ๐‘ฅโˆ—) โˆ‰ graph๐ด. Hence, (6.9) implies that ๐น๐ด (๐‘ฅ, ๐‘ฅโˆ—) โ‰ฅ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ , with equalityif and only if (๐‘ฅ, ๐‘ฅโˆ—) โˆˆ graph๐ด (since in this case the inmum is attained in (๐‘ง, ๐‘งโˆ—) =(๐‘ฅ, ๐‘ฅโˆ—)). In particular, ๐น๐ด is proper.

76

6 monotone operators and proximal points

(ii) On the other hand, the denition (6.8) yields that

๐น๐ด = (๐บ๐ด)โˆ— for ๐บ๐ด (๐‘งโˆ—, ๐‘ง) = ใ€ˆ๐‘งโˆ—, ๐‘งใ€‰๐‘‹ + ๐›ฟgraph๐ดโˆ’1 (๐‘งโˆ—, ๐‘ง)

(since (๐‘ง, ๐‘งโˆ—) โˆˆ graph๐ด if and only if (๐‘งโˆ—, ๐‘ง) โˆˆ graph๐ดโˆ’1). As part of the monotonicityof ๐ด, we have required that graph๐ด โ‰  โˆ…; hence ๐น๐ด is the Fenchel conjugate of aproper functional and therefore convex and lower semicontinuous.

As a rst step, we now show the result for the special case ๐‘ง = 0, i.e., that 0 โˆˆ ran(๐œ• ๐‘— +๐ด).We now set ฮž โ‰” ๐‘‹ ร— ๐‘‹ โˆ— as well as b โ‰” (๐‘ฅ, ๐‘ฅโˆ—) โˆˆ ฮž and consider the functional

๐ฝ๐ด : ฮž โ†’ โ„, b โ†ฆโ†’ ๐น๐ด (b) + 12 โ€–b โ€–

2ฮž.

We rst note that property (i) implies for all b โˆˆ ฮž that

(6.10) ๐ฝ๐ด (b) = ๐น๐ด (b) + 12 โ€–b โ€–

2ฮž = ๐น๐ด (๐‘ฅ, ๐‘ฅโˆ—) + 1

2 โ€–๐‘ฅ โ€–2๐‘‹ + 1

2 โ€–๐‘ฅโˆ—โ€–2๐‘‹ โˆ—

โ‰ฅ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ + 12 โ€–๐‘ฅ โ€–

2๐‘‹ + 1

2 โ€–๐‘ฅโˆ—โ€–2๐‘‹ โˆ—

โ‰ฅ 0,

where the last inequality follows from the Fenchelโ€“Young inequality for ๐‘— applied to(๐‘ฅ,โˆ’๐‘ฅโˆ—). Furthermore, ๐ฝ๐ด is proper, convex, lower semicontinuous, and (by Lemma 3.9)coercive. Theorem 3.8 thus yields a b โ‰” (๐‘ฅ, ๐‘ฅโˆ—) โˆˆ ฮž with ๐ฝ๐ด (b) = minbโˆˆฮž ๐ฝ๐ด (b), which byTheorems 4.2, 4.5 and 4.14 satises that

0 โˆˆ ๐œ•๐ฝ๐ด (b) = ๐œ•(12 โ€–b โ€–

2๐‘‹

)+ ๐œ•๐น๐ด (b),

i.e., there exists a bโˆ— = (๏ฟฝ๏ฟฝโˆ—, ๏ฟฝ๏ฟฝ) โˆˆ ฮžโˆ— ' ๐‘‹ โˆ— ร— ๐‘‹ (since ๐‘‹ is reexive) such that bโˆ— โˆˆ ๐œ•๐น๐ด (b)and โˆ’bโˆ— โˆˆ ๐œ•( 12 โ€–b โ€–2๐‘‹ ).By denition of the subdierential, we thus have for all b โˆˆ ฮž that

๐น๐ด (b) โ‰ฅ ๐น๐ด (b) + ใ€ˆbโˆ—, b โˆ’ bใ€‰ฮž = ๐ฝ๐ด (b) + 12 โ€–b

โˆ—โ€–2ฮžโˆ— + ใ€ˆbโˆ—, bใ€‰ฮž โ‰ฅ 12 โ€–b

โˆ—โ€–2ฮž + ใ€ˆbโˆ—, bใ€‰ฮž,

where the second step uses again the Fenchelโ€“Young inequality, which holds with equalityfor (b,โˆ’bโˆ—), and the last step follows from (6.10). Property (i) then implies for all (๐‘ฅ, ๐‘ฅโˆ—) โˆˆgraph๐ด that

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ = ๐น๐ด (๐‘ฅ, ๐‘ฅโˆ—) โ‰ฅ 12 โ€–๏ฟฝ๏ฟฝ

โˆ—โ€–2๐‘‹ โˆ— + 12 โ€–๏ฟฝ๏ฟฝ โ€–2๐‘‹ + ใ€ˆ๏ฟฝ๏ฟฝโˆ—, ๐‘ฅใ€‰๐‘‹ + ใ€ˆ๐‘ฅโˆ—, ๏ฟฝ๏ฟฝใ€‰๐‘‹ .

Adding ใ€ˆ๏ฟฝ๏ฟฝโˆ—, ๏ฟฝ๏ฟฝใ€‰๐‘‹ on both sides and rearranging yields

(6.11) ใ€ˆ๐‘ฅโˆ— โˆ’ ๏ฟฝ๏ฟฝโˆ—, ๐‘ฅ โˆ’ ๏ฟฝ๏ฟฝใ€‰๐‘‹ โ‰ฅ ใ€ˆ๏ฟฝ๏ฟฝโˆ—, ๏ฟฝ๏ฟฝใ€‰๐‘‹ + 12 โ€–๏ฟฝ๏ฟฝ

โˆ—โ€–2๐‘‹ โˆ— + 12 โ€–๏ฟฝ๏ฟฝ โ€–2๐‘‹ โ‰ฅ 0,

77

6 monotone operators and proximal points

again by the Fenchelโ€“Young inequality. The maximal monotonicity of ๐ด thus yields that๏ฟฝ๏ฟฝโˆ— โˆˆ ๐ด(๏ฟฝ๏ฟฝ), i.e., (๏ฟฝ๏ฟฝ, ๏ฟฝ๏ฟฝโˆ—) โˆˆ graph๐ด. Inserting this for (๐‘ฅ, ๐‘ฅโˆ—) in (6.11) then shows that

ใ€ˆ๏ฟฝ๏ฟฝโˆ—, ๏ฟฝ๏ฟฝใ€‰๐‘‹ + 12 โ€–๏ฟฝ๏ฟฝ

โˆ—โ€–2๐‘‹ โˆ— + 12 โ€–๏ฟฝ๏ฟฝ โ€–2๐‘‹ = 0.

Hence the Fenchelโ€“Young inequality for ๐œ• ๐‘— holds with equality at (๏ฟฝ๏ฟฝ,โˆ’๏ฟฝ๏ฟฝโˆ—), implyingโˆ’๏ฟฝ๏ฟฝโˆ— โˆˆ ๐œ• ๐‘— (๏ฟฝ๏ฟฝ). Together, we obtain that 0 = โˆ’๏ฟฝ๏ฟฝโˆ— + ๏ฟฝ๏ฟฝโˆ— โˆˆ (๐œ• ๐‘— +๐ด) (๏ฟฝ๏ฟฝ).Finally, let ๐‘งโˆ— โˆˆ ๐‘‹ โˆ— be arbitrary and set ๐ต : ๐‘‹ โ‡’ ๐‘‹ โˆ—, ๐‘ฅ โ†ฆโ†’ {โˆ’๐‘งโˆ—} + ๐ด(๐‘ฅ). Using thedenition, it is straightforward to verify that ๐ต is maximally monotone as well. As we havejust shown, there now exists a ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— with 0 โˆˆ (๐œ• ๐‘— + ๐ต) (๐‘ฅโˆ—) = {๐‘ฅโˆ—} + {โˆ’๐‘งโˆ—} +๐ด(๐‘ฅโˆ—), i.e.,๐‘งโˆ— โˆˆ (๐œ• ๐‘— +๐ด) (๐‘ฅโˆ—). Hence ๐œ• ๐‘— +๐ด is surjective. ๏ฟฝ

6.3 resolvents and proximal points

The proof of Theorem 6.11 is based on associating to any ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ) an element ๐‘ง โˆˆ ๐‘‹as the minimizer of a suitable functional. If ๐‘‹ is a Hilbert space, this functional is evenstrictly convex and hence the minimizer ๐‘ง is unique. This property can be exploited todene a new single-valued mapping that is more useful for algorithms than the set-valuedsubdierential mapping. For this purpose, we restrict the discussion in the remainder ofthis chapter to Hilbert spaces (but see Remark 6.26 below). This allows identifying ๐‘‹ โˆ— with๐‘‹ ; in particular, we will from now on identify the set ๐œ•๐น (๐‘ฅ) โŠ‚ ๐‘‹ โˆ— of subderivatives withthe corresponding set in ๐‘‹ of subgradients (i.e., their Riesz representations). By the sametoken, we will also use the same notation for inner products as for duality pairings to avoidthe danger of confusing pairs of elements (๐‘ฅ, ๐‘ฅโˆ—) โˆˆ graph ๐œ•๐น with their inner product.

We can then dene for a maximally monotone operator ๐ด : ๐‘‹ โ‡’ ๐‘‹ the resolvent

R๐ด : ๐‘‹ โ‡’ ๐‘‹, R๐ด (๐‘ฅ) = (Id +๐ด)โˆ’1๐‘ฅ,as well as for a proper, convex, and lower semicontinuous functional ๐น : ๐‘‹ โ†’ โ„ theproximal point mapping

(6.12) prox๐น : ๐‘‹ โ†’ ๐‘‹, prox๐น (๐‘ฅ) = argmin๐‘งโˆˆ๐‘‹

12 โ€–๐‘ง โˆ’ ๐‘ฅ โ€–

2๐‘‹ + ๐น (๐‘ง).

Since a similar argument as in the proof of Theorem 6.11 shows that๐‘ค โˆˆ R๐œ•๐น (๐‘ฅ) is equivalentto the necessary and sucient conditions for the proximal point ๐‘ค to be a minimizer of thestrictly convex functional in (6.12), we have that

(6.13) prox๐น = (Id + ๐œ•๐น )โˆ’1 = R๐œ•๐น .

Resolvents of monotone and, in particular, maximal monotone operators have useful prop-erties.

78

6 monotone operators and proximal points

Lemma 6.13. If ๐ด : ๐‘‹ โ‡’ ๐‘‹ is monotone, R๐ด is rmly nonexpansive, i.e.,

(6.14) โ€–๐‘ง1 โˆ’ ๐‘ง2โ€–2๐‘‹ โ‰ค ใ€ˆ๐‘ฅ1 โˆ’ ๐‘ฅ2, ๐‘ง1 โˆ’ ๐‘ง2ใ€‰๐‘‹ for all (๐‘ฅ1, ๐‘ง1), (๐‘ฅ2, ๐‘ง2) โˆˆ graphR๐ดor equivalently,

(6.15) โ€–๐‘ง1 โˆ’ ๐‘ง2โ€–2๐‘‹ + โ€–(๐‘ฅ1 โˆ’ ๐‘ง1) โˆ’ (๐‘ฅ2 โˆ’ ๐‘ง2)โ€–2๐‘‹ โ‰ค โ€–๐‘ฅ1 โˆ’ ๐‘ฅ2โ€–2๐‘‹for all (๐‘ฅ1, ๐‘ง1), (๐‘ฅ2, ๐‘ง2) โˆˆ graphR๐ด .

Proof. Let ๐‘ฅ1, ๐‘ฅ2 โˆˆ domR๐ด as well as ๐‘ง1 โˆˆ R๐ด (๐‘ฅ1) and ๐‘ง2 โˆˆ R๐ด (๐‘ฅ2). By denition of theresolvent, this implies that ๐‘ฅ1 โˆ’ ๐‘ง1 โˆˆ ๐ด(๐‘ง1) and ๐‘ฅ2 โˆ’ ๐‘ง2 โˆˆ ๐ด(๐‘ง2). By the monotonicity of ๐ด,we thus have

0 โ‰ค ใ€ˆ(๐‘ฅ1 โˆ’ ๐‘ง1) โˆ’ (๐‘ฅ2 โˆ’ ๐‘ง2), ๐‘ง1 โˆ’ ๐‘ง2ใ€‰๐‘‹ ,which after rearranging yields (6.14). The equivalence of (6.14) and (6.15) is straightforwardto verify using binomial expansion. ๏ฟฝ

Corollary 6.14. Let ๐ด : ๐‘‹ โ‡’ ๐‘‹ be maximally monotone. Then R๐ด : ๐‘‹ โ†’ ๐‘‹ is single-valuedand Lipschitz continuous with constant ๐ฟ = 1.

Proof. Since ๐ด is maximally monotone, Id + ๐ด is surjective by Theorem 6.12, which im-plies that domR๐ด = ๐‘‹ . Let now ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ง1, ๐‘ง2 โˆˆ R๐ด (๐‘ฅ). Since ๐ด is monotone, R๐ดis nonexpansive by Lemma 6.13, which yields both single-valuedness of R๐ด (by taking๐‘ฅ1 = ๐‘ฅ2 = ๐‘ฅ implies ๐‘ง1 = ๐‘ง2) and its Lipschitz continuity (by applying the Cauchyโ€“Schwarzinequality). ๏ฟฝ

In particular, by Theorem 6.11, this holds for the proximal point mapping prox๐น : ๐‘‹ โ†’ ๐‘‹of a proper, convex, and lower semicontinuous functional ๐น : ๐‘‹ โ†’ โ„.

Lipschitz continuous mappings with constant ๐ฟ = 1 are also called nonexpansive. A relatedconcept that is sometimes used is the following. A mapping๐‘‡ : ๐‘‹ โ†’ ๐‘‹ is called ๐›ผ-averagedfor some ๐›ผ โˆˆ (0, 1), if ๐‘‡ = (1 โˆ’ ๐›ผ)Id + ๐›ผ ๐ฝ for some nonexpansive ๐ฝ : ๐‘‹ โ†’ ๐‘‹ . We then havethe following relation.

Lemma 6.15. Let๐‘‡ : ๐‘‹ โ†’ ๐‘‹ . Then๐‘‡ is rmly nonexpansive if and only if๐‘‡ is (1/2)-averaged.

Proof. Suppose ๐‘‡ is (1/2)-averaged. Then ๐‘‡ = 12 (Id + ๐ฝ ) for some nonexpansive ๐ฝ . We

compute

โ€–๐‘‡ (๐‘ฅ) โˆ’๐‘‡ (๐‘ฆ)โ€–2๐‘‹ =14

(โ€– ๐ฝ (๐‘ฅ) โˆ’ ๐ฝ (๐‘ฆ)โ€–2๐‘‹ + 2ใ€ˆ๐ฝ (๐‘ฅ) โˆ’ ๐ฝ (๐‘ฆ), ๐‘ฅ โˆ’ ๐‘ฆใ€‰๐‘‹ + โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹)

โ‰ค 12

(ใ€ˆ๐ฝ (๐‘ฅ) โˆ’ ๐ฝ (๐‘ฆ), ๐‘ฅ โˆ’ ๐‘ฆใ€‰๐‘‹ + โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹)

= ใ€ˆ๐‘‡ (๐‘ฅ) โˆ’๐‘‡ (๐‘ฆ), ๐‘ฅ โˆ’ ๐‘ฆใ€‰๐‘‹ .

79

6 monotone operators and proximal points

Thus ๐‘‡ is rmly nonexpansive.

Suppose then that ๐‘‡ is rmly nonexpansive. If we show that ๐ฝ โ‰” 2๐‘‡ โˆ’ Id is nonexpansive,it follows that ๐‘‡ is (1/2)-averaged. This is established by the simple calculations

โ€– ๐ฝ (๐‘ฅ) โˆ’ ๐ฝ (๐‘ฆ)โ€–2๐‘‹ = 4โ€–๐‘‡ (๐‘ฅ) โˆ’๐‘‡ (๐‘ฆ)โ€–2๐‘‹ โˆ’ 4ใ€ˆ๐‘‡ (๐‘ฅ) โˆ’๐‘‡ (๐‘ฆ), ๐‘ฅ โˆ’ ๐‘ฆใ€‰๐‘‹ + โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹โ‰ค โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹ .

This completes the proof. ๏ฟฝ

Like maximally monotone operators, ๐›ผ-averaged operators always have outer semiconti-nuity properties. To show this, we will use that in Hilbert spaces, the converse of Mintyโ€™sTheorem 6.12 holds (with the duality mapping ๐œ• ๐‘— = Id).

Lemma 6.16. Let ๐ด : ๐‘‹ โ‡’ ๐‘‹ be monotone. If Id + ๐ด is surjective, then ๐ด is maximallymonotone.

Proof. Consider ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฅโˆ— โˆˆ ๐‘‹ with

(6.16) ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ 0 for all (๐‘ฅ, ๐‘ฅโˆ—) โˆˆ graph๐ด.

If Id +๐ด is surjective, then for ๐‘ฅ + ๐‘ฅโˆ— โˆˆ ๐‘‹ there exist a ๐‘ง โˆˆ ๐‘‹ and a ๐‘งโˆ— โˆˆ ๐ด(๐‘ง) with

(6.17) ๐‘ฅ + ๐‘ฅโˆ— = ๐‘ง + ๐‘งโˆ— โˆˆ (Id +๐ด)๐‘ง.

Inserting (๐‘ฅ, ๐‘ฅโˆ—) = (๐‘ง, ๐‘งโˆ—) into (6.16) then yields that

0 โ‰ค ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘งโˆ—, ๐‘ฅ โˆ’ ๐‘งใ€‰๐‘‹ = ใ€ˆ๐‘ง โˆ’ ๐‘ฅ, ๐‘ฅ โˆ’ ๐‘งใ€‰๐‘‹ = โˆ’โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹ โ‰ค 0,

i.e., ๐‘ฅ = ๐‘ง. From (6.17) we further obtain ๐‘ฅโˆ— = ๐‘งโˆ— โˆˆ ๐ด(๐‘ง) = ๐ด(๐‘ฅ), and hence ๐ด is maximallymonotone. ๏ฟฝ

Lemma 6.17. Let๐‘‡ : ๐‘‹ โ†’ ๐‘‹ be ๐›ผ-averaged. Then๐‘‡ is weak-to-strong and strong-to-weakly-โˆ—outer semicontinuous, and the set of xed points ๐‘ฅ = ๐‘‡ (๐‘ฅ) of ๐‘‡ is convex and closed.

Proof. Let ๐‘‡ = (1 โˆ’ ๐›ผ)Id + ๐›ผ ๐ฝ for some nonexpansive operator ๐ฝ : ๐‘‹ โ†’ ๐‘‹ . Then clearly๐‘ฅ โˆˆ ๐‘‹ is a xed point of ๐‘‡ if and only if ๐‘ฅ is a xed point of ๐ฝ . It thus suces to show theclaim for the xed-point set {๐‘ฅ | ๐‘ฅ = ๐ฝ (๐‘ฅ)} = (Id โˆ’ ๐ฝ )โˆ’1(0) of a nonexpansive operator ๐ฝ .By Lemmas 6.6 to 6.8, we thus only need to show that Id โˆ’ ๐ฝ is maximally monotone.

First, Idโˆ’ ๐ฝ is clearly monotone. Moreover, 2Idโˆ’ ๐ฝ = Id+(Idโˆ’ ๐ฝ ) is surjective since otherwise2๐‘ฅ โˆ’ ๐ฝ (๐‘ฅ) = 2๐‘ฆ โˆ’ ๐ฝ (๐‘ฆ) for ๐‘ฅ โ‰  ๐‘ฆ , which together with the assumed nonexpansivity wouldlead to the contradiction 0 โ‰  2โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€– โ‰ค โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–. Lemma 6.16 then shows that Id โˆ’ ๐ฝ ismaximally monotone, and the claim follows. ๏ฟฝ

80

6 monotone operators and proximal points

The following useful result allows characterizing minimizers of convex functionals asproximal points.

Lemma 6.18. Let ๐น : ๐‘‹ โ†’ โ„ be proper, convex, and lower semicontinuous, and ๐‘ฅ, ๐‘ฅโˆ— โˆˆ ๐‘‹ .Then for any ๐›พ > 0,

๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ) โ‡” ๐‘ฅ = prox๐›พ๐น (๐‘ฅ + ๐›พ๐‘ฅโˆ—).

Proof. Multiplying both sides of the subdierential inclusion by ๐›พ > 0 and adding ๐‘ฅ yieldsthat

๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ) โ‡” ๐‘ฅ + ๐›พ๐‘ฅโˆ— โˆˆ (Id + ๐›พ๐œ•๐น ) (๐‘ฅ)โ‡” ๐‘ฅ โˆˆ (Id + ๐›พ๐œ•๐น )โˆ’1(๐‘ฅ + ๐›พ๐‘ฅโˆ—)โ‡” ๐‘ฅ = prox๐›พ๐น (๐‘ฅ + ๐›พ๐‘ฅโˆ—),

where in the last step we have used that ๐›พ๐œ•๐น = ๐œ•(๐›พ๐น ) by Lemma 4.13 (i) and hence thatprox๐›พ๐น = R๐œ•(๐›พ๐น ) = R๐›พ๐œ•๐น . ๏ฟฝ

By applying Lemma 6.18 to the Fermat principle 0 โˆˆ ๐œ•๐น (๐‘ฅ), we obtain the followingxed-point characterization of minimizers of ๐น .

Corollary 6.19. Let ๐น : ๐‘‹ โ†’ โ„ be proper, convex and lower semicontinuous, and ๐›พ > 0 bearbitrary. Then ๐‘ฅ โˆˆ dom ๐น is a minimizer of ๐น if and only if

๐‘ฅ = prox๐›พ๐น (๐‘ฅ).

This simple result should not be underestimated: It allows replacing (explicit) set inclusionsin optimality conditions by equivalent (implicit) Lipschitz continuous equations, which (aswe will show in following chapters) can be solved by xed-point iteration or Newton-typemethods.

We can also derive a generalization of the orthogonal decomposition of vector spaces.

Theorem 6.20 (Moreau decomposition). Let ๐น : ๐‘‹ โ†’ โ„ be proper, convex, and lowersemicontinuous. Then we have for all ๐‘ฅ โˆˆ ๐‘‹ that

๐‘ฅ = prox๐น (๐‘ฅ) + prox๐น โˆ— (๐‘ฅ).

Proof. Setting๐‘ค = prox๐น (๐‘ฅ), Lemmas 5.8 and 6.18 for ๐›พ = 1 imply that

๐‘ค = prox๐น (๐‘ฅ) = prox๐น (๐‘ค + (๐‘ฅ โˆ’๐‘ค)) โ‡” ๐‘ฅ โˆ’๐‘ค โˆˆ ๐œ•๐น (๐‘ค)โ‡” ๐‘ค โˆˆ ๐œ•๐น โˆ—(๐‘ฅ โˆ’๐‘ค)โ‡” ๐‘ฅ โˆ’๐‘ค = prox๐น โˆ— ((๐‘ฅ โˆ’๐‘ค) +๐‘ค) = prox๐น โˆ— (๐‘ฅ). ๏ฟฝ

The following calculus rules will prove useful.

81

6 monotone operators and proximal points

Lemma 6.21. Let ๐น : ๐‘‹ โ†’ โ„ be proper, convex, and lower semicontinuous. Then,

(i) for _ โ‰  0 and ๐‘ง โˆˆ ๐‘‹ we have with ๐ป (๐‘ฅ) โ‰” ๐น (_๐‘ฅ + ๐‘ง) that

prox๐ป (๐‘ฅ) = _โˆ’1(prox_2๐น (_๐‘ฅ + ๐‘ง) โˆ’ ๐‘ง);

(ii) for ๐›พ > 0 we have that

prox๐›พ๐น โˆ— (๐‘ฅ) = ๐‘ฅ โˆ’ ๐›พ prox๐›พโˆ’1๐น (๐›พโˆ’1๐‘ฅ);

(iii) for proper, convex, lower semicontinuous๐บ : ๐‘Œ โ†’ โ„ and๐›พ > 0we have with๐ป (๐‘ฅ, ๐‘ฆ) โ‰”๐น (๐‘ฅ) +๐บ (๐‘ฆ) that

prox๐›พ๐ป (๐‘ฅ, ๐‘ฆ) =(prox๐›พ๐น (๐‘ฅ)prox๐›พ๐บ (๐‘ฆ)

).

Proof. (i): By denition,

prox๐ป (๐‘ฅ) = argmin๐‘คโˆˆ๐‘‹

12 โ€–๐‘ค โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐น (_๐‘ค + ๐‘ง) =: ๏ฟฝ๏ฟฝ .

Now note that since ๐‘‹ is a vector space,

min๐‘คโˆˆ๐‘‹

12 โ€–๐‘ค โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐น (_๐‘ค + ๐‘ง) = min

๐‘ฃโˆˆ๐‘‹12 โ€–_

โˆ’1(๐‘ฃ โˆ’ ๐‘ง) โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐น (๐‘ฃ),

and the respective minimizers ๏ฟฝ๏ฟฝ and ๐‘ฃ are related by ๐‘ฃ = _๏ฟฝ๏ฟฝ + ๐‘ง. The claim then followsfrom

๐‘ฃ = argmin๐‘ฃโˆˆ๐‘‹

12 โ€–_

โˆ’1(๐‘ฃ โˆ’ ๐‘ง) โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐น (๐‘ฃ)

= argmin๐‘ฃโˆˆ๐‘‹

12_2 โ€–๐‘ฃ โˆ’ (_๐‘ฅ + ๐‘ง)โ€–2๐‘‹ + ๐น (๐‘ฃ)

= argmin๐‘ฃโˆˆ๐‘‹

12 โ€–๐‘ฃ โˆ’ (_๐‘ฅ + ๐‘ง)โ€–2๐‘‹ + _2๐น (๐‘ฃ)

= prox_2๐น (_๐‘ฅ + ๐‘ง).

(ii): Theorem 6.20, Lemma 5.7 (i), and (i) for _ = ๐›พโˆ’1 and ๐‘ง = 0 together imply that

prox๐›พ๐น (๐‘ฅ) = ๐‘ฅ โˆ’ prox(๐›พ๐น )โˆ— (๐‘ฅ)= ๐‘ฅ โˆ’ prox๐›พ๐น โˆ—โ—ฆ(๐›พโˆ’1Id) (๐‘ฅ)= ๐‘ฅ โˆ’ ๐›พ prox๐›พ (๐›พโˆ’2๐น โˆ—) (๐›พโˆ’1๐‘ฅ).

Applying this to ๐น โˆ— and using that ๐น โˆ—โˆ— = ๐น by Theorem 5.1 (iii) now yields the claim.

82

6 monotone operators and proximal points

(iii): By denition of the norm on the product space ๐‘‹ ร— ๐‘Œ , we have that

prox๐›พ๐ป (๐‘ฅ, ๐‘ฆ) = argmin(๐‘ข,๐‘ฃ)โˆˆ๐‘‹ร—๐‘Œ

12 โ€–(๐‘ข, ๐‘ฃ) โˆ’ (๐‘ฅ, ๐‘ฆ)โ€–2๐‘‹ร—๐‘Œ + ๐›พ๐ป (๐‘ข, ๐‘ฃ)

= argmin๐‘ขโˆˆ๐‘‹,๐‘ฃโˆˆ๐‘Œ

(12 โ€–๐‘ข โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐›พ๐น (๐‘ข)

)+

(12 โ€–๐‘ฃ โˆ’ ๐‘ฆ โ€–

2๐‘Œ + ๐›พ๐บ (๐‘ฃ)

).

Since there are no mixed terms in ๐‘ข and ๐‘ฃ , the two terms in parentheses can be minimizedseparately. Hence, prox๐›พ๐ป (๐‘ฅ, ๐‘ฆ) = (๐‘ข, ๐‘ฃ) for

๐‘ข = argmin๐‘ขโˆˆ๐‘‹

12 โ€–๐‘ข โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐›พ๐น (๐‘ข) = prox๐›พ๐น (๐‘ฅ),

๐‘ฃ = argmin๐‘ฃโˆˆ๐‘Œ

12 โ€–๐‘ฃ โˆ’ ๐‘ฆ โ€–

2๐‘Œ + ๐›พ๐บ (๐‘ฃ) = prox๐›พ๐บ (๐‘ฅ) . ๏ฟฝ

Computing proximal points is dicult in general since evaluating prox๐น by its denitionentails minimizing ๐น . In some cases, however, it is possible to give an explicit formula forprox๐น .

Example 6.22. We rst consider scalar functions ๐‘“ : โ„ โ†’ โ„.

(i) ๐‘“ (๐‘ก) = 12 |๐‘ก |2. Since ๐‘“ is dierentiable, we can set the derivative of 1

2 (๐‘  โˆ’ ๐‘ก)2 +๐›พ2๐‘ 

2

to zero and solve for ๐‘  to obtain prox๐›พ ๐‘“ (๐‘ก) = (1 + ๐›พ)โˆ’1๐‘ก .(ii) ๐‘“ (๐‘ก) = |๐‘ก |. By Example 4.7, we have that ๐œ•๐‘“ (๐‘ก) = sign(๐‘ก); hence ๐‘  โ‰” prox๐›พ ๐‘“ (๐‘ก) =

(Id +๐›พ sign)โˆ’1(๐‘ก) if and only if ๐‘ก โˆˆ {๐‘ } +๐›พ sign(๐‘ ). Let ๐‘ก be given and assume thisholds for some ๐‘  . We now proceed by case distinction.

Case 1: ๐‘  > 0. This implies that ๐‘ก = ๐‘  + ๐›พ , i.e., ๐‘  = ๐‘ก โˆ’ ๐›พ , and hence that ๐‘ก > ๐›พ .

Case 2: ๐‘  < 0. This implies that ๐‘ก = ๐‘  โˆ’ ๐›พ , i.e., ๐‘  = ๐‘ก + ๐›พ , and hence that ๐‘ก < โˆ’๐›พ .Case 3: ๐‘  = 0. This implies that ๐‘ก โˆˆ ๐›พ [โˆ’1, 1] = [โˆ’๐›พ,๐›พ].Since this yields a complete and disjoint case distinction for ๐‘ก , we can concludethat

prox๐›พ ๐‘“ (๐‘ก) =๐‘ก โˆ’ ๐›พ if ๐‘ก > ๐›พ,0 if ๐‘ก โˆˆ [โˆ’๐›พ,๐›พ],๐‘ก + ๐›พ if ๐‘ก < โˆ’๐›พ .

This mapping is also known as the soft-shrinkage or soft-thresholding operator.

(iii) ๐‘“ (๐‘ก) = ๐›ฟ [โˆ’1,1] (๐‘ก). We can proceed here in the same way as in (ii), but for the sakeof variety we instead use Lemma 6.21 (ii) to compute the proximal point mapping

83

6 monotone operators and proximal points

from that of ๐‘“ โˆ—(๐‘ก) = |๐‘ก | (see Example 5.3 (ii)) via

prox๐›พ ๐‘“ (๐‘ก) = ๐‘ก โˆ’ ๐›พ prox๐›พโˆ’1 ๐‘“ โˆ— (๐›พโˆ’1๐‘ก)

=

๐‘ก โˆ’ ๐›พ (๐›พโˆ’1๐‘ก โˆ’ ๐›พโˆ’1) if ๐›พโˆ’1๐‘ก > ๐›พโˆ’1,๐‘ก โˆ’ 0 if ๐›พโˆ’1๐‘ก โˆˆ [โˆ’๐›พโˆ’1, ๐›พโˆ’1],๐‘ก โˆ’ ๐›พ (๐›พโˆ’1๐‘ก + ๐›พโˆ’1) if ๐›พโˆ’1๐‘ก < โˆ’๐›พโˆ’1

=

1 if ๐‘ก > 1,๐‘ก if ๐‘ก โˆˆ [โˆ’1, 1],

โˆ’1 if ๐‘ก < โˆ’1.

For every ๐›พ > 0, the proximal point of ๐‘ก is thus its projection onto [โˆ’1, 1].

Example 6.23. We can generalize Example 6.22 to ๐‘‹ = โ„๐‘ (endowed with the Euclideaninner product) by applying Lemma 6.21 (iii) ๐‘ times. We thus obtain componentwise

(i) for ๐น (๐‘ฅ) = 12 โ€–๐‘ฅ โ€–22 =

โˆ‘๐‘๐‘–=1

12๐‘ฅ

2๐‘– that

[prox๐›พ๐น (๐‘ฅ)]๐‘– =(

11 + ๐›พ

)๐‘ฅ๐‘–, 1 โ‰ค ๐‘– โ‰ค ๐‘ ;

(ii) for ๐น (๐‘ฅ) = โ€–๐‘ฅ โ€–1 = โˆ‘๐‘๐‘–=1 |๐‘ฅ๐‘– | that

[prox๐›พ๐น (๐‘ฅ)]๐‘– = ( |๐‘ฅ๐‘– | โˆ’ ๐›พ)+ sign(๐‘ฅ๐‘–), 1 โ‰ค ๐‘– โ‰ค ๐‘ ;

(iii) for ๐น (๐‘ฅ) = ๐›ฟ๐”นโˆž (๐‘ฅ) =โˆ‘๐‘๐‘–=1 ๐›ฟ [โˆ’1,1] (๐‘ฅ๐‘–) that

[prox๐›พ๐น (๐‘ฅ)]๐‘– = ๐‘ฅ๐‘– โˆ’ (๐‘ฅ๐‘– โˆ’ 1)+ โˆ’ (๐‘ฅ๐‘– + 1)โˆ’ =๐‘ฅ๐‘–

max{1, |๐‘ฅ๐‘– |} , 1 โ‰ค ๐‘– โ‰ค ๐‘ .

Here we have used the convenient notation (๐‘ก)+ โ‰” max{๐‘ก, 0} and (๐‘ก)โˆ’ โ‰” min{๐‘ก, 0}.

Many more examples of projection operators and proximal mappings can be found in[Cegielski 2012], [Parikh&Boyd 2014, ยง 6.5], [Beck 2017], as well as athps://www.proximity-

operator.net.

Since the subdierential of convex integral functionals can be evaluated pointwise byTheorem 4.11, the same holds for the denition (6.13) of the proximal point mapping.

Corollary 6.24. Let ๐‘“ : โ„ โ†’ โ„ be proper, convex, and lower semicontinuous, and ๐น : ๐ฟ2(ฮฉ) โ†’โ„ be dened by superposition as in Lemma 3.7. Then we have for all ๐›พ > 0 and ๐‘ข โˆˆ ๐ฟ2(ฮฉ)

84

6 monotone operators and proximal points

that[prox๐›พ๐น (๐‘ข)] (๐‘ฅ) = prox๐›พ ๐‘“ (๐‘ข (๐‘ฅ)) for almost every ๐‘ฅ โˆˆ ฮฉ.

Example 6.25. Let ๐‘‹ be a Hilbert space. Similarly to Example 6.22 one can show that

(i) for ๐น = 12 โ€– ยท โ€–2๐‘‹ = 1

2 ใ€ˆยท, ยทใ€‰๐‘‹ , that

prox๐›พ๐น (๐‘ฅ) =(

11 + ๐›พ

)๐‘ฅ ;

(ii) for ๐น = โ€– ยท โ€–๐‘‹ , using a case distinction as in Theorem 4.6, that

prox๐›พ๐น (๐‘ฅ) =(1 โˆ’ ๐›พ

โ€–๐‘ฅ โ€–๐‘‹

)+๐‘ฅ ;

(iii) for ๐น = ๐›ฟ๐ถ with ๐ถ โŠ‚ ๐‘‹ nonempty, convex, and closed, that by denition

prox๐›พ๐น (๐‘ฅ) = proj๐ถ (๐‘ฅ) โ‰” argmin๐‘งโˆˆ๐ถ

โ€–๐‘ง โˆ’ ๐‘ฅ โ€–๐‘‹

the metric projection of ๐‘ฅ onto ๐ถ; the proximal point mapping thus generalizesthe concept projection onto convex sets. Explicit or at least constructive formulasfor the projection onto dierent classes of sets can be found in [Cegielski 2012,Chapter 4.1].

Remark 6.26. The results of this section can be extended to (reexive) Banach spaces if the identityis replaced by the duality mapping ๐‘ฅ โ†ฆโ†’ ๐œ•(โ€– ยท โ€–๐‘‹ ) (๐‘ฅ); see, e.g., [Cioranescu 1990, Theorem 3.11]. If thenorm is dierentiable (which is the case if the unit ball of๐‘‹ โˆ— is strictly convex as for, e.g.,๐‘‹ = ๐ฟ๐‘ (ฮฉ)with ๐‘ โˆˆ (1,โˆž)), the duality mapping is in fact single-valued [Cioranescu 1990, Theorem 2.16] andhence the corresponding resolvent (๐œ• ๐‘— + ๐ด)โˆ’1 is well-dened. However, the proximal mappingneed no longer be Lipschitz continuous, although the denition can be modied to obtain uniformcontinuity; see [Baฤรกk & Kohlenbach 2018]. Similarly, the Moreau decomposition (Theorem 6.20)needs to be modied appropriately; see [Combettes & Reyes 2013].

The main diculty from our point of view, however, lies in the evaluation of the proximal mapping,which then rarely admits a closed form even for simple functionals. This problem already arisesin Hilbert spaces if ๐‘‹ โˆ— is not identied with ๐‘‹ and hence the Riesz isomorphism (which coincideswith ๐ฝโˆ’1๐‘‹ โˆ— in this case) has to be inverted to obtain a proximal point.

Remark 6.27. By Corollary 6.14, the proximal mapping of any proper, convex, and lower semicon-tinuous functional is nonexpansive. Conversely, it can be shown that every nonexpansive mapping๐‘‡ : ๐‘‹ โ†’ ๐‘‹ that satises ๐‘‡ (๐‘ฅ) โˆˆ ๐œ•๐บ (๐‘ฅ), for all ๐‘ฅ โˆˆ ๐‘‹ for some proper, convex, and lower semi-continuous functional ๐บ : ๐‘‹ โ†’ โ„ is the proximal mapping of some proper, convex, and lowersemicontinuous functional; see [Moreau 1965; Gribonval & Nikolova 2020].

85

7 SMOOTHNESS AND CONVEXITY

Before we turn to algorithms for the solution of nonsmooth optimization problems, wederive consequences of convexity for dierentiable functionals that will be useful in prov-ing convergence of splitting methods for functionals involving a smooth component. Inparticular, we will show that Lipschitz continuous dierentiability is linked via Fenchelduality to strong convexity.

7.1 smoothness

We now derive useful consequences of Lipschitz dierentiability and their relation toconvexity. Recall from Theorem 4.5 that for ๐น : ๐‘‹ โ†’ โ„ convex and Gรขteaux dierentiable,๐œ•๐น (๐‘ฅ) = {๐ท๐น (๐‘ฅ)} (which can be identied with {โˆ‡๐น (๐‘ฅ)} โŠ‚ ๐‘‹ in Hilbert spaces).

Lemma 7.1. Let ๐‘‹ be a Banach space and let ๐น : ๐‘‹ โ†’ โ„ be Gรขteaux dierentiable. Considerthe properties:

(i) The property

(7.1) ๐น (๐‘ฆ) โ‰ค ๐น (๐‘ฅ) + ใ€ˆ๐ท๐น (๐‘ฆ), ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ 12๐ฟ โ€–๐ท๐น (๐‘ฅ) โˆ’ ๐ท๐น (๐‘ฆ)โ€–

2๐‘‹ โˆ— for all ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ .

(ii) The co-coercivity of ๐ท๐น with factor ๐ฟโˆ’1:

(7.2) ๐ฟโˆ’1โ€–๐ท๐น (๐‘ฅ) โˆ’ ๐ท๐น (๐‘ฆ)โ€–2๐‘‹ โˆ— โ‰ค ใ€ˆ๐ท๐น (๐‘ฅ) โˆ’ ๐ท๐น (๐‘ฆ), ๐‘ฅ โˆ’ ๐‘ฆใ€‰๐‘‹ for all ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ .

(iii) Lipschitz continuity of ๐ท๐น with factor ๐ฟ:

(7.3) โ€–๐ท๐น (๐‘ฅ) โˆ’ ๐ท๐น (๐‘ฆ)โ€–๐‘‹ โˆ— โ‰ค ๐ฟโ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–๐‘‹ for all ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ .

(iv) The property

(7.4) ใ€ˆ๐ท๐น (๐‘ฅ + โ„Ž) โˆ’ ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘‹ โ‰ค ๐ฟโ€–โ„Žโ€–2๐‘‹ for all ๐‘ฅ, โ„Ž โˆˆ ๐‘‹ .

86

7 smoothness and convexity

(v) The smoothness (also known as descent lemma) of ๐น with factor ๐ฟ:

(7.5) ๐น (๐‘ฅ + โ„Ž) โ‰ค ๐น (๐‘ฅ) + ใ€ˆ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘‹ + ๐ฟ2 โ€–โ„Žโ€–2๐‘‹ for all ๐‘ฅ, โ„Ž โˆˆ ๐‘‹ .

(vi) The uniform smoothness of ๐น with factor ๐ฟ:

(7.6) ๐น (_๐‘ฅ + (1 โˆ’ _)๐‘ฆ) + _(1 โˆ’ _)๐ฟ2 โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹โ‰ฅ _๐น (๐‘ฅ) + (1 โˆ’ _)๐น (๐‘ฆ) for all ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹, _ โˆˆ [0, 1] .

Then (i) โ‡’ (ii) โ‡’ (iii) โ‡’ (iv) โ‡” (v) โ‡” (vi). If ๐น is convex and ๐‘‹ is reexive, then all theproperties are equivalent.

Proof. (i)โ‡’ (ii): Summing the estimate (7.1) with the same estimate with ๐‘ฅ and ๐‘ฆ exchanged,we obtain (7.2).

(ii) โ‡’ (iii): This follows immediately from (1.1).

(iii) โ‡’ (iv): Taking ๐‘ฆ = ๐‘ฅ + โ„Ž and multiplying (7.3) by โ€–โ„Žโ€–๐‘‹ , the property follows againfrom (1.1).

(iv) โ‡’ (v): Using the mean value Theorem 2.10 and (7.4), we obtain

๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘‹ =โˆซ 1

0ใ€ˆ๐ท๐น (๐‘ฅ + ๐‘กโ„Ž), โ„Žใ€‰๐‘‹ ๐‘‘๐‘ก โˆ’ ใ€ˆ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘‹

=โˆซ 1

0ใ€ˆ๐ท๐น (๐‘ฅ + ๐‘กโ„Ž) โˆ’ ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘‹ ๐‘‘๐‘ก

โ‰คโˆซ 1

0๐‘ก ๐‘‘๐‘ก ยท ๐ฟโ€–โ„Žโ€–2๐‘‹ =

๐ฟ

2 โ€–โ„Žโ€–2๐‘‹ .

(v)โ‡’ (iv): This follows by adding together (7.5) and the same inequality with ๐‘ฅ +โ„Ž in placeof ๐‘ฅ .

(v) โ‡’ (vi): Set ๐‘ฅ_ โ‰” _๐‘ฅ + (1 โˆ’ _)๐‘ฆ . Multiplying (7.5) rst for ๐‘ฅ = ๐‘ฅ_ and โ„Ž = ๐‘ฅ โˆ’ ๐‘ฅ_ =(1 โˆ’ _) (๐‘ฅ โˆ’ ๐‘ฆ) with _ and then for ๐‘ฅ = ๐‘ฅ_ and โ„Ž = ๐‘ฆ โˆ’ ๐‘ฅ_ = _(๐‘ฆ โˆ’ ๐‘ฅ) with 1 โˆ’ _ and addingthe results yields (7.6).

(vi) โ‡’ (v): This follows by dividing (7.6) by _ > 0 and taking the limit _ โ†’ 0.

(v)โ‡’ (i) when ๐น is convex and ๐‘‹ is reexive: Since ๐น is convex, we have from Theorem 4.5that

ใ€ˆ๐ท๐น (๐‘ฆ), (๐‘ฅ + โ„Ž) โˆ’ ๐‘ฆใ€‰๐‘‹ โ‰ค ๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฆ).

87

7 smoothness and convexity

Combining this with (7.5) yields

(7.7) ๐น (๐‘ฆ) โ‰ค ๐น (๐‘ฅ) + ใ€ˆ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘‹ โˆ’ ใ€ˆ๐ท๐น (๐‘ฆ), (๐‘ฅ + โ„Ž) โˆ’ ๐‘ฆใ€‰๐‘‹ + ๐ฟ2 โ€–โ„Žโ€–2๐‘‹

= ๐น (๐‘ฅ) + ใ€ˆ๐ท๐น (๐‘ฆ), ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹ + ใ€ˆ๐ท๐น (๐‘ฅ) โˆ’ ๐ท๐น (๐‘ฆ), โ„Žใ€‰๐‘‹ + ๐ฟ2 โ€–โ„Žโ€–2๐‘‹ .

Let ๐‘งโˆ— โ‰” โˆ’๐ฟโˆ’1(๐ท๐น (๐‘ฅ) โˆ’ ๐ท๐น (๐‘ฆ)). Since ๐‘‹ is reexive, the algebraic Hahnโ€“Banach Theo-rem 1.4 yields (after multiplication by โ€–๐‘งโˆ—โ€–๐‘‹ โˆ—) an โ„Ž โˆˆ ๐‘‹ such that

โ€–โ„Žโ€–๐‘‹ = โ€–๐‘งโˆ—โ€–๐‘‹ โˆ— and ใ€ˆ๐‘งโˆ—, โ„Žใ€‰๐‘‹ = โ€–๐‘งโˆ—โ€–2๐‘‹ โˆ— .

Consequently, continuing from (7.7),

๐น (๐‘ฆ) โ‰ค ๐น (๐‘ฅ) + ใ€ˆ๐ท๐น (๐‘ฆ), ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ๐ฟใ€ˆ๐‘งโˆ—, โ„Žใ€‰๐‘‹ + ๐ฟ2 โ€–โ„Žโ€–2๐‘‹

= ๐น (๐‘ฅ) + ใ€ˆ๐ท๐น (๐‘ฆ), ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ๐ฟ

2 โ€–๐‘งโˆ—โ€–2๐‘‹ โˆ—

= ๐น (๐‘ฅ) + ใ€ˆ๐ท๐น (๐‘ฆ), ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ 12๐ฟ โ€–๐ท๐น (๐‘ฅ) โˆ’ ๐ท๐น (๐‘ฆ)โ€–

2๐‘‹ โˆ— .

This proves (7.1). ๏ฟฝ

The next โ€œsmoothness three-point corollaryโ€ will be valuable for the study of splittingmethods that involve a smooth component function.

Corollary 7.2. Let ๐‘‹ be a reexive Banach space and let ๐น : ๐‘‹ โ†’ โ„ be convex and Gรขteauxdierentiable. Then the following are equivalent:

(i) ๐น has ๐ฟโˆ’1-co-coercive derivative (or any of the equivalent properties of Lemma 7.1).

(ii) The three-point smoothness

(7.8) ใ€ˆ๐ท๐น (๐‘ง), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ) โˆ’ ๐ฟ

2 โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹ for all ๐‘ฅ, ๐‘ง, ๐‘ฅ โˆˆ ๐‘‹,

(iii) The three-point monotonicity

(7.9) ใ€ˆ๐ท๐น (๐‘ง) โˆ’ ๐ท๐น (๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ โˆ’๐ฟ4 โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹ for all ๐‘ฅ, ๐‘ง, ๐‘ฅ โˆˆ ๐‘‹ .

Proof. If โˆ‡๐น is ๐ฟโˆ’1-co-coercive, using Lemma 7.1, we have the ๐ฟ-smoothness

๐น (๐‘ง) โˆ’ ๐น (๐‘ฅ) โ‰ฅ ใ€ˆ๐ท๐น (๐‘ง), ๐‘ง โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ๐ฟ

2 โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹ .

By convexity ๐น (๐‘ฅ) โˆ’ ๐น (๐‘ง) โ‰ฅ ใ€ˆ๐ท๐น (๐‘ง), ๐‘ฅ โˆ’ ๐‘งใ€‰๐‘‹ . Summing up, we obtain (7.8).

88

7 smoothness and convexity

Regarding (7.9), by assumption we have the co-coercivity

ใ€ˆ๐ท๐น (๐‘ง) โˆ’ ๐ท๐น (๐‘ฅ), ๐‘ง โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐ฟโˆ’1โ€–๐ท๐น (๐‘ง) โˆ’ ๐ท๐น (๐‘ฅ)โ€–2๐‘‹ โˆ— .

Thus, using (1.1) and Youngโ€™s inequality in the form ๐‘Ž๐‘ โ‰ค 12๐›ผ๐‘Ž

2 + ๐›ผ2๐‘

2 for ๐‘Ž, ๐‘ โˆˆ โ„ and ๐›ผ > 0,we obtain

ใ€ˆ๐ท๐น (๐‘ง) โˆ’ ๐ท๐น (๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ = ใ€ˆ๐ท๐น (๐‘ง) โˆ’ ๐ท๐น (๐‘ฅ), ๐‘ง โˆ’ ๐‘ฅใ€‰๐‘‹ + ใ€ˆ๐ท๐น (๐‘ง) โˆ’ ๐ท๐น (๐‘ฅ), ๐‘ฅ โˆ’ ๐‘งใ€‰๐‘‹โ‰ฅ ๐ฟโˆ’1โ€–๐ท๐น (๐‘ง) โˆ’ ๐ท๐น (๐‘ฅ)โ€–2๐‘‹ โˆ— โˆ’ โ€–๐ท๐น (๐‘ง) โˆ’ ๐ท๐น (๐‘ฅ)โ€–๐‘‹ โˆ— โ€–๐‘ฅ โˆ’ ๐‘งโ€–๐‘‹โ‰ฅ โˆ’๐ฟ4 โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹ .

This is (7.9)

For the reverse implications,we assume that (7.9) holds and set ๐‘งโˆ— โ‰” โˆ’2๐ฟโˆ’1(๐ท๐น (๐‘ง)โˆ’๐ท๐น (๐‘ฅ)).By the assumed reexivity, we can again apply the algebraic Hahnโ€“Banach Theorem 1.4 toobtain an โ„Ž โˆˆ ๐‘‹ such that

โ€–โ„Žโ€–๐‘‹ = โ€–๐‘งโˆ—โ€–๐‘‹ โˆ— and ใ€ˆ๐‘งโˆ—, โ„Žใ€‰๐‘‹ = โ€–๐‘งโˆ—โ€–2๐‘‹ โˆ— .

With ๐‘ฅ = ๐‘ง + โ„Ž, (7.9) gives

ใ€ˆ๐ท๐น (๐‘ง) โˆ’ ๐ท๐น (๐‘ฅ), ๐‘ง โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ โˆ’ใ€ˆ๐ท๐น (๐‘ง) โˆ’ ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘‹ โˆ’ ๐ฟ

4 โ€–โ„Žโ€–2๐‘‹

=๐ฟ

2 ใ€ˆ๐‘งโˆ—, โ„Žใ€‰๐‘‹ โˆ’ ๐ฟ

4 โ€–๐‘งโˆ—โ€–2๐‘‹ โˆ—

=๐ฟ

4 โ€–๐‘งโˆ—โ€–2๐‘‹ โˆ— =

1๐ฟโ€–๐ท๐น (๐‘ง) โˆ’ ๐ท๐น (๐‘ฅ)โ€–2๐‘‹ โˆ— .

This is the ๐ฟโˆ’1-co-coercivity (7.2). The remaining equivalences follow from Lemma 7.1. ๏ฟฝ

7.2 strong convexity

The central notion in this chapter (and later for obtaining higher convergence rates forrst-order algorithms) is the following โ€œquantitativeโ€ version of convexity. We say that๐น : ๐‘‹ โ†’ โ„ is strongly convex with the factor ๐›พ > 0 if for all ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ and _ โˆˆ [0, 1],

(7.10) ๐น (_๐‘ฅ + (1 โˆ’ _)๐‘ฆ) + _(1 โˆ’ _)๐›พ2 โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹ โ‰ค _๐น (๐‘ฅ) + (1 โˆ’ _)๐น (๐‘ฆ).

Obviously, strong convexity implies strict convexity, so strongly convex functions havea unique minimizer. If ๐‘‹ is a Hilbert space, it is straightforward if tedious to verify byexpanding the squared norm that (7.10) is equivalent to ๐น โˆ’ ๐›พ

2 โ€– ยท โ€–2๐‘‹ being convex.

We have the following important duality result that was rst shown in [Azรฉ & Penot1995].

89

7 smoothness and convexity

Theorem 7.3. Let ๐น : ๐‘‹ โ†’ โ„ be proper and convex. Then the following are true:

(i) If ๐น is strongly convex with factor ๐›พ , then ๐น โˆ— is uniformly smooth with factor ๐›พโˆ’1.

(ii) If ๐น is uniformly smooth with factor ๐ฟ, then ๐น โˆ— is strongly convex with factor ๐ฟโˆ’1.

(iii) If ๐น is lower semicontinuous, then ๐น is uniformly smooth with factor ๐ฟ if and only if ๐น โˆ—

is strongly convex with factor ๐ฟโˆ’1.

Proof. (i): Let ๐‘ฅโˆ—, ๐‘ฆโˆ— โˆˆ ๐‘‹ โˆ— and ๐›ผ๐‘ฅ , ๐›ผ๐‘ฆ โˆˆ โ„ with ๐›ผ๐‘ฅ < ๐น โˆ—(๐‘ฅโˆ—) and ๐›ผ๐‘ฆ < ๐น โˆ—(๐‘ฆโˆ—). From thedenition of the Fenchel conjugate, there exist ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ such that

๐›ผ๐‘ฅ < ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐น (๐‘ฅ), ๐›ผ๐‘ฆ < ใ€ˆ๐‘ฆโˆ—, ๐‘ฆใ€‰๐‘‹ โˆ’ ๐น (๐‘ฆ).

Multiplying the rst inequality with _ โˆˆ [0, 1], the second with (1 โˆ’ _), and using theFenchelโ€“Young inequality (5.1) in the form

0 โ‰ค ๐น (๐‘ฅ_) + ๐น โˆ—(๐‘ฅโˆ—_) โˆ’ ใ€ˆ๐‘ฅโˆ—_, ๐‘ฅ_ใ€‰๐‘‹for ๐‘ฅโˆ—

_โ‰” _๐‘ฅโˆ— + (1 โˆ’ _)๐‘ฆโˆ— and ๐‘ฅ_ โ‰” _๐‘ฅ + (1 โˆ’ _)๐‘ฆ then yields

_๐›ผ๐‘ฅ + (1 โˆ’ _)๐›ผ๐‘ฆ โ‰ค ๐น (๐‘ฅ_) + ๐น โˆ—(๐‘ฅโˆ—_) โˆ’ _๐น (๐‘ฅ) โˆ’ (1 โˆ’ _)๐น (๐‘ฆ) + _(1 โˆ’ _)ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฆโˆ—, ๐‘ฅ โˆ’ ๐‘ฆใ€‰๐‘‹โ‰ค ๐น โˆ—(๐‘ฅโˆ—_) + _(1 โˆ’ _)

(ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฆโˆ—, ๐‘ฅ โˆ’ ๐‘ฆใ€‰๐‘‹ โˆ’ ๐›พ2 โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹

)โ‰ค ๐น โˆ—(๐‘ฅโˆ—_) + _(1 โˆ’ _) sup

๐‘งโˆˆ๐‘‹

{ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฆโˆ—, ๐‘งใ€‰๐‘‹ โˆ’ ๐›พ2 โ€–๐‘งโ€–

2๐‘‹

}= ๐น โˆ—(๐‘ฅโˆ—_) + _(1 โˆ’ _)

12๐›พ โ€–๐‘ฅ

โˆ— โˆ’ ๐‘ฆโˆ—โ€–2๐‘‹ โˆ—,

where we have used the denition (7.10) of strong convexity in the second inequality andLemma 5.4 together with Lemma 5.7 (i) in the nal equality. Letting now ๐›ผ๐‘ฅ โ†’ ๐น โˆ—(๐‘ฅโˆ—) and๐›ผ๐‘ฆ โ†’ ๐น โˆ—(๐‘ฆโˆ—), we obtain (7.6) for ๐น โˆ— with ๐ฟ โ‰” ๐›พโˆ’1.

(ii): Let ๐‘ฅโˆ—, ๐‘ฆโˆ— โˆˆ ๐‘‹ โˆ— and _ โˆˆ [0, 1]. Set again ๐‘ฅโˆ—_โ‰” _๐‘ฅโˆ— + (1 โˆ’ _)๐‘ฆโˆ—. Then we obtain from

the denition of the Fenchel conjugate and (7.6) that for any ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ ,

_๐น โˆ—(๐‘ฅโˆ—) + (1 โˆ’ _)๐น โˆ—(๐‘ฆโˆ—) โ‰ฅ _ [ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ + (1 โˆ’ _)๐‘ฆใ€‰๐‘‹ โˆ’ ๐น (๐‘ฅ + (1 โˆ’ _)๐‘ฆ)]+ (1 โˆ’ _) [ใ€ˆ๐‘ฆโˆ—, ๐‘ฅ โˆ’ _๐‘ฆใ€‰๐‘‹ โˆ’ ๐น (๐‘ฅ โˆ’ _๐‘ฆ)]

โ‰ฅ _ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ + (1 โˆ’ _)๐‘ฆใ€‰๐‘‹ + (1 โˆ’ _)ใ€ˆ๐‘ฆโˆ—, ๐‘ฅ โˆ’ _๐‘ฆใ€‰๐‘‹โˆ’ ๐น (๐‘ฅ) โˆ’ _(1 โˆ’ _)๐ฟ2 โ€–๐‘ฆ โ€–

2๐‘‹

= ใ€ˆ๐‘ฅโˆ—_, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐น (๐‘ฅ) + _(1 โˆ’ _)(ใ€ˆ๐‘ฆโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฆใ€‰๐‘‹ โˆ’ ๐ฟ

2 โ€–๐‘ฆ โ€–2๐‘‹

).

Taking now the supremum over all ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ and using again Lemma 5.4 together withLemma 5.7 (i), we obtain the strong convexity (7.10) with ๐›พ โ‰” ๐ฟโˆ’1.

90

7 smoothness and convexity

(iii): One direction of the claim is clear from (ii). For the other direction, if ๐น โˆ— is stronglyconvex with factor ๐ฟโˆ’1, then its preconjugate (๐น โˆ—)โˆ— is uniformly smooth with factor ๐ฟ by aproof completely analogous to (i). Then we use Theorem 5.1 to see that ๐น = ๐น โˆ—โˆ— โ‰” (๐น โˆ—)โˆ—under the lower semicontinuity assumption. ๏ฟฝ

Just as convexity of ๐น implies monotonicity of ๐œ•๐น , strong convexity has the followingconsequences.

Lemma 7.4. Let ๐‘‹ be a Banach space and ๐น : ๐‘‹ โ†’ โ„. Consider the properties:

(i) ๐น is strongly convex with factor ๐›พ > 0.

(ii) ๐น is strongly subdierentiable with factor ๐›พ :

(7.11) ๐น (๐‘ฆ) โˆ’ ๐น (๐‘ฅ) โ‰ฅ ใ€ˆ๐‘ฅโˆ—, ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹ + ๐›พ2 โ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–2๐‘‹ for all ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ ; ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ).

(iii) ๐œ•๐น is strongly monotone with factor ๐›พ :

(7.12) ใ€ˆ๐‘ฆโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐›พ โ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–2๐‘‹ for all ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ ; ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ), ๐‘ฆโˆ— โˆˆ ๐œ•๐น (๐‘ฆ).

Then (i) โ‡’ (ii) โ‡’ (iii). If ๐‘‹ is reexive and ๐น is proper, convex, and lower semicontinuous,then also (iii) โ‡’ (i).

Proof. (i)โ‡’ (ii): Let๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ and_ โˆˆ (0, 1) be arbitrary. Dividing (7.10) by _ and rearrangingyields

๐น (๐‘ฆ + _(๐‘ฅ โˆ’ ๐‘ฆ)) โˆ’ ๐น (๐‘ฆ)_

โ‰ค ๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฆ) โˆ’ (1 โˆ’ _)๐›พ2 โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹ .Since strongly convex functions are also convex, we can apply Lemma 4.3 (ii) to pass to thelimit _ โ†’ 0 on both sides to obtain

๐น โ€ฒ(๐‘ฆ, ๐‘ฅ โˆ’ ๐‘ฆ) โ‰ค ๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฆ) โˆ’ ๐›พ2 โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹ .

Using Lemma 4.4 for โ„Ž = ๐‘ฅ โˆ’ ๐‘ฆ , we thus obtain that for any ๐‘ฆโˆ— โˆˆ ๐œ•๐น (๐‘ฆ),

ใ€ˆ๐‘ฆโˆ—, ๐‘ฅ โˆ’ ๐‘ฆใ€‰๐‘‹ โ‰ค ๐น โ€ฒ(๐‘ฆ, ๐‘ฅ โˆ’ ๐‘ฆ) โ‰ค ๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฆ) โˆ’ ๐›พ2 โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹ .

Exchanging the roles of ๐‘ฅ and ๐‘ฆ and rearranging yields (7.11).

(ii) โ‡’ (iii): Adding (7.11) with the same inequality with ๐‘ฅ and ๐‘ฆ exchanged immediatelyyields (7.12).

(iii)โ‡’ (i): Suppose rst that ๐œ•๐น is surjective. Then dom ๐œ•๐น โˆ— = ๐‘‹ โˆ—. Using the duality between๐œ•๐น and ๐œ•๐น โˆ— in Lemma 5.8, we rewrite (7.12) as

(7.13) ใ€ˆ๐‘ฆโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐›พ โ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–2๐‘‹ for all ๐‘ฅโˆ—, ๐‘ฆโˆ— โˆˆ ๐‘‹ โˆ—; ๐‘ฅ โˆˆ ๐œ•๐น โˆ—(๐‘ฅโˆ—), ๐‘ฆ โˆˆ ๐œ•๐น โˆ—(๐‘ฆโˆ—).

91

7 smoothness and convexity

Taking ๐‘ฆ = ๐‘ฅ , this implies that ๐‘ฅโˆ— = ๐‘ฆโˆ—, i.e., ๐œ•๐น โˆ—(๐‘ฅโˆ—) is a singleton for all ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—. Here weuse that dom ๐œ•๐น โˆ— = ๐‘‹ โˆ— to avoid the possibility that ๐œ•๐น โˆ—(๐‘ฅโˆ—) = โˆ…. By Theorem 4.5 it followsthat ๐น โˆ— is Gรขteaux dierentiable. Thus (7.13) describes the co-coercivity (7.2) of ๐ท๐น โˆ— withfactor ๐›พ . By Lemma 7.1 it follows that ๐น โˆ— is uniformly smooth with factor ๐›พโˆ’1. Consequently,by Theorem 7.3 ๐น is strongly convex with factor ๐›พ .

If ๐œ•๐น is not surjective, we replace ๐น by ๐น + Y ๐‘— for the duality mapping ๐‘— (๐‘ฅ) โ‰” 12 โ€–๐‘ฅ โ€–2๐‘‹ and

some Y > 0. By Theorem 6.11 and Mintyโ€™s Theorem 6.12 now ๐œ•(๐น + Y ๐‘—) is surjective. It alsoremains strongly monotone with factor ๐›พ as ๐œ• ๐‘— is monotone. Now, by the above reasoning,๐น + Y ๐‘— is strongly convex with factor ๐›พ . Since Y > 0 was arbitrary, we deduce from thedening (7.10) that ๐น is strongly convex with factor ๐›พ . ๏ฟฝ

Note that the factor ๐›พ enters into the strong monotonicity (7.12) directly rather than as ๐›พ2as in the strong subdierentiability (7.11) (and strong convexity).

We can also derive a stronger, quantitative, version of the fact that for convex functions,points that satisfy the Fermat principle are minimizers.

Lemma 7.5. Let ๐‘‹ be a Banach space and let ๐น : ๐‘‹ โ†’ โ„ be strongly convex with factor๐›พ > 0. Assume that ๐น admits a minimum๐‘€ โ‰” min๐‘ฅโˆˆ๐‘‹ ๐น (๐‘ฅ). Then the Polyakโ€“ลojasewiczinequality holds:

(7.14) ๐น (๐‘ฅ) โˆ’๐‘€ โ‰ค 12๐›พ โ€–๐‘ฅ

โˆ—โ€–2๐‘‹ โˆ— for all ๐‘ฅ โˆˆ ๐‘‹, ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ).

Proof. Let ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ) be arbitrary. Then from Lemma 7.4 (ii) we have that

โˆ’๐น (๐‘ฅ) + ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฆใ€‰๐‘‹ โˆ’ ๐›พ2 โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹ โ‰ฅ โˆ’๐น (๐‘ฆ) .

Taking the supremum over all ๐‘ฆ โˆˆ ๐‘‹ , noting that this is equivalent to taking the supremumover all ๐‘ฅโˆ’๐‘ฆ โˆˆ ๐‘‹ , and inserting the Fenchel conjugate of the squared norm from Lemma 5.4together with Lemma 5.7 (i), we obtain

โˆ’๐น (๐‘ฅ) + 12๐›พ โ€–๐‘ฅ

โˆ—โ€–2๐‘‹ โˆ— โ‰ฅ sup๐‘ฆโˆˆ๐‘‹

โˆ’๐น (๐‘ฆ) = โˆ’min๐‘ฆโˆˆ๐‘‹

๐น (๐‘ฆ)

and hence, after rearranging, (7.14). ๏ฟฝ

Comparing the consequences of strong convexity in Lemma 7.4 and those of uniformsmoothness in Lemma 7.1, we can already see a certain duality between them: While theformer give lower bounds, the latter give upper bounds and vice versa. A simple exampleis the following

92

7 smoothness and convexity

Corollary 7.6. If ๐น : ๐‘‹ โ†’ โ„ is strongly convex with factor ๐›พ and uniformly smooth withfactor ๐ฟ, then

(7.15) ๐›พ โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹ โ‰ค ใ€ˆ๐ท๐น (๐‘ฅ) โˆ’ ๐ท๐น (๐‘ฆ), ๐‘ฅ โˆ’ ๐‘ฆใ€‰๐‘‹ โ‰ค ๐ฟโ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹ for all ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ .

Proof. The rst inequality follows from Lemma 7.4 (iii), while the second follows from (1.1)together with Lemma 7.1 (iii). ๏ฟฝ

The estimates of Corollary 7.2 can be improved if ๐น is in addition strongly convex.

Corollary 7.7. Let ๐‘‹ be a Banach space and let ๐น : ๐‘‹ โ†’ โ„ be strongly convex with factor๐›พ > 0 as well as Lipschitz dierentiable with constant ๐ฟ > 0. Then for any ๐›ผ > 0,

(7.16) ใ€ˆ๐ท๐น (๐‘ง), ๐‘ฅ โˆ’๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ) + ๐›พ โˆ’ ๐›ผ๐ฟ2 โ€–๐‘ฅ โˆ’๐‘ฅ โ€–2๐‘‹ โˆ’๐ฟ

2๐›ผ โ€–๐‘ฅ โˆ’๐‘งโ€–2๐‘‹ for all ๐‘ฅ, ๐‘ง, ๐‘ฅ โˆˆ ๐‘‹,

as well as

(7.17) ใ€ˆ๐ท๐น (๐‘ง) โˆ’ ๐ท๐น (๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ (๐›พ โˆ’ ๐›ผ๐ฟ)โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ ๐ฟ

4๐›ผ โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹ for all ๐‘ฅ, ๐‘ง, ๐‘ฅ โˆˆ ๐‘‹ .

Proof. Using the strong subdierentiability from Lemma 7.4 (ii), the Lipschitz continuity of๐ท๐น , (1.1), and Youngโ€™s inequality, we obtain

ใ€ˆ๐ท๐น (๐‘ง), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ = ใ€ˆ๐ท๐น (๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ + ใ€ˆ๐ท๐น (๐‘ง) โˆ’ ๐ท๐น (๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹โ‰ฅ ๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ) + ๐›พ2 โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ ๐›ผ๐ฟ

2 โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ 12๐›ผ๐ฟ โ€–๐ท๐น (๐‘ง) โˆ’ ๐ท๐น (๐‘ฅ)โ€–

2๐‘‹ โˆ—

โ‰ฅ ๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ) + ๐›พ2 โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ ๐›ผ๐ฟ

2 โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ ๐ฟ

2๐›ผ โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹ .

For (7.17), we can use the strong monotonicity of ๐ท๐น from Lemma 7.4 (iii) to estimateanalogously

ใ€ˆ๐ท๐น (๐‘ง) โˆ’ ๐ท๐น (๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ = ใ€ˆ๐ท๐น (๐‘ฅ) โˆ’ ๐ท๐น (๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ + ใ€ˆ๐ท๐น (๐‘ง) โˆ’ ๐ท๐น (๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹โ‰ฅ ๐›พ โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ ๐›ผ๐ฟโ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ ๐ฟ

4๐›ผ โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹ . ๏ฟฝ

7.3 moreauโ€“yosida regularization

We now look at another way to reformulate optimality conditions using proximal pointmappings. Although these are no longer equivalent reformulations, they will serve as alink to the Newton-type methods which will be introduced in Chapter 14.

93

7 smoothness and convexity

We again assume that๐‘‹ is a Hilbert space and identify๐‘‹ โˆ— with๐‘‹ via the Riesz isomorphism.Let ๐ด : ๐‘‹ โ‡’ ๐‘‹ be a maximally monotone operator and ๐›พ > 0. Then we dene the Yosidaapproximation of ๐ด as

๐ด๐›พ โ‰”1๐›พ

(Id โˆ’ R๐›พ๐ด

).

In particular, the Yosida approximation of the subdierential of a proper, convex, and lowersemicontinuous functional ๐น : ๐‘‹ โ†’ โ„ is given by

(๐œ•๐น )๐›พ โ‰” 1๐›พ

(Id โˆ’ prox๐›พ๐น

),

which by Corollary 6.14 and Theorem 6.11 is always Lipschitz continuous with constant๐ฟ = ๐›พโˆ’1.

An alternative point of view is the following. For a proper, convex, and lower semicontinuousfunctional ๐น : ๐‘‹ โ†’ โ„ and ๐›พ > 0, we dene the Moreau envelope1

(7.18) ๐น๐›พ : ๐‘‹ โ†’ โ„, ๐‘ฅ โ†ฆโ†’ inf๐‘งโˆˆ๐‘‹

12๐›พ โ€–๐‘ง โˆ’ ๐‘ฅ โ€–

2๐‘‹ + ๐น (๐‘ง),

see Figure 7.1. Comparing this with the denition (6.12) of the proximal point mapping of๐น , we see that

(7.19) ๐น๐›พ (๐‘ฅ) = 12๐›พ โ€–prox๐›พ๐น (๐‘ฅ) โˆ’ ๐‘ฅ โ€–

2๐‘‹ + ๐น (prox๐›พ๐น (๐‘ฅ)) .

(Note that multiplying a functional by ๐›พ > 0 does not change its minimizers.) Hence ๐น๐›พ isindeed well-dened on ๐‘‹ and single-valued. Furthermore, we can deduce from (7.19) that๐น๐›พ is convex as well.

Lemma 7.8. Let ๐น : ๐‘‹ โ†’ โ„ be proper, convex, and lower semicontinuous, and ๐›พ > 0. Then ๐น๐›พis convex.

Proof. We rst show that for any convex ๐บ : ๐‘‹ โ†’ โ„, the mapping

๐ป : ๐‘‹ ร— ๐‘‹ โ†’ โ„, (๐‘ฅ, ๐‘ง) โ†ฆโ†’ ๐น (๐‘ง) +๐บ (๐‘ง โˆ’ ๐‘ฅ)

is convex as well. Indeed, for any (๐‘ฅ1, ๐‘ง1), (๐‘ฅ2, ๐‘ง2) โˆˆ ๐‘‹ ร— ๐‘‹ and _ โˆˆ [0, 1], convexity of ๐นand ๐บ implies that

๐ป (_(๐‘ฅ1, ๐‘ง1) + (1 โˆ’ _) (๐‘ฅ2, ๐‘ง2)) = ๐น (_๐‘ง1 + (1 โˆ’ _)๐‘ง2) +๐บ (_(๐‘ง1 โˆ’ ๐‘ฅ1) + (1 โˆ’ _) (๐‘ง2 โˆ’ ๐‘ฅ2))โ‰ค _ (๐น (๐‘ง1) +๐บ (๐‘ง1 โˆ’ ๐‘ฅ1)) + (1 โˆ’ _) (๐น (๐‘ง2) +๐บ (๐‘ง2 โˆ’ ๐‘ฅ2))= _๐ป (๐‘ฅ1, ๐‘ง1) + (1 โˆ’ _)๐ป (๐‘ฅ2, ๐‘ง2).

1not to be confused with the convex envelope ๐น ฮ“!

94

7 smoothness and convexity

Let now ๐‘ฅ1, ๐‘ฅ2 โˆˆ ๐‘‹ and _ โˆˆ [0, 1]. Since ๐น๐›พ (๐‘ฅ) = inf๐‘งโˆˆ๐‘‹ ๐ป (๐‘ฅ, ๐‘ง) for ๐บ (๐‘ฆ) โ‰” 12๐›พ โ€–๐‘ฆ โ€–2๐‘‹ , there

exist two minimizing sequences {๐‘ง1๐‘›}๐‘›โˆˆโ„•, {๐‘ง2๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘‹ with๐ป (๐‘ฅ1, ๐‘ง1๐‘›) โ†’ ๐น๐›พ (๐‘ฅ1), ๐ป (๐‘ฅ2, ๐‘ง2๐‘›) โ†’ ๐น๐›พ (๐‘ฅ2).

From the denition of the inmum together with the convexity of ๐ป , we thus obtain forall ๐‘› โˆˆ โ„• that

๐น๐›พ (_๐‘ฅ1 + (1 โˆ’ _)๐‘ฅ2) โ‰ค ๐ป (_(๐‘ฅ1, ๐‘ง1๐‘›) + (1 โˆ’ _) (๐‘ฅ2, ๐‘ง2๐‘›))โ‰ค _๐ป (๐‘ฅ1, ๐‘ง1๐‘›) + (1 โˆ’ _)๐ป (๐‘ฅ2, ๐‘ง2๐‘›),

and passing to the limit ๐‘› โ†’ โˆž yields the desired convexity. ๏ฟฝ

We will also show later that Moreauโ€“Yosida regularization preserves (global!) Lipschitzcontinuity.

The next theorem links the two concepts of Moreau envelope and of Yosida approximationand hence justies the term Moreauโ€“Yosida regularization.

Theorem 7.9. Let ๐น : ๐‘‹ โ†’ โ„ be proper, convex, and lower semicontinuous, and ๐›พ > 0. Then๐น๐›พ is Frรฉchet dierentiable with

โˆ‡(๐น๐›พ ) = (๐œ•๐น )๐›พ .

Proof. Let ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ be arbitrary and set ๐‘ฅโˆ— = prox๐›พ๐น (๐‘ฅ) and ๐‘ฆโˆ— = prox๐›พ๐น (๐‘ฆ). We rst showthat

(7.20) 1๐›พใ€ˆ๐‘ฆโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅโˆ—ใ€‰๐‘‹ โ‰ค ๐น (๐‘ฆโˆ—) โˆ’ ๐น (๐‘ฅโˆ—).

(Note that for proper ๐น , the denition of proximal points as minimizers necessarily impliesthat ๐‘ฅโˆ—, ๐‘ฆโˆ— โˆˆ dom ๐น .) To this purpose, consider for ๐‘ก โˆˆ (0, 1) the point ๐‘ฅโˆ—๐‘ก โ‰” ๐‘ก๐‘ฆโˆ— + (1 โˆ’ ๐‘ก)๐‘ฅโˆ—.Using the minimizing property of the proximal point ๐‘ฅโˆ— together with the convexity of ๐นand completing the square, we obtain that

๐น (๐‘ฅโˆ—) โ‰ค ๐น (๐‘ฅโˆ—๐‘ก ) +12๐›พ โ€–๐‘ฅ

โˆ—๐‘ก โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ 1

2๐›พ โ€–๐‘ฅโˆ— โˆ’ ๐‘ฅ โ€–2๐‘‹

โ‰ค ๐‘ก๐น (๐‘ฆโˆ—) + (1 โˆ’ ๐‘ก)๐น (๐‘ฅโˆ—) โˆ’ ๐‘ก

๐›พใ€ˆ๐‘ฅ โˆ’ ๐‘ฅโˆ—, ๐‘ฆโˆ— โˆ’ ๐‘ฅโˆ—ใ€‰๐‘‹ + ๐‘ก2

2๐›พ โ€–๐‘ฅโˆ— โˆ’ ๐‘ฆโˆ—โ€–2๐‘‹ .

Rearranging the terms, dividing by ๐‘ก > 0 and passing to the limit ๐‘ก โ†’ 0 then yields (7.20).Combining this with (7.19) implies that

๐น๐›พ (๐‘ฆ) โˆ’ ๐น๐›พ (๐‘ฅ) = ๐น (๐‘ฆโˆ—) โˆ’ ๐น (๐‘ฅโˆ—) + 12๐›พ

(โ€–๐‘ฆ โˆ’ ๐‘ฆโˆ—โ€–2๐‘‹ โˆ’ โ€–๐‘ฅ โˆ’ ๐‘ฅโˆ—โ€–2๐‘‹)

โ‰ฅ 12๐›พ

(2ใ€ˆ๐‘ฆโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅโˆ—ใ€‰๐‘‹ + โ€–๐‘ฆ โˆ’ ๐‘ฆโˆ—โ€–2๐‘‹ โˆ’ โ€–๐‘ฅ โˆ’ ๐‘ฅโˆ—โ€–2๐‘‹

)=

12๐›พ

(2ใ€ˆ๐‘ฆ โˆ’ ๐‘ฅ, ๐‘ฅ โˆ’ ๐‘ฅโˆ—ใ€‰๐‘‹ + โ€–๐‘ฆ โˆ’ ๐‘ฆโˆ— โˆ’ ๐‘ฅ + ๐‘ฅโˆ—โ€–2๐‘‹

)โ‰ฅ 1๐›พใ€ˆ๐‘ฆ โˆ’ ๐‘ฅ, ๐‘ฅ โˆ’ ๐‘ฅโˆ—ใ€‰๐‘‹ .

95

7 smoothness and convexity

By exchanging the roles of ๐‘ฅโˆ— and ๐‘ฆโˆ— in (7.20), we obtain that

๐น๐›พ (๐‘ฆ) โˆ’ ๐น๐›พ (๐‘ฅ) โ‰ค 1๐›พใ€ˆ๐‘ฆ โˆ’ ๐‘ฅ, ๐‘ฆ โˆ’ ๐‘ฆโˆ—ใ€‰๐‘‹ .

Together, these two inequalities yield that

0 โ‰ค ๐น๐›พ (๐‘ฆ) โˆ’ ๐น๐›พ (๐‘ฅ) โˆ’ 1๐›พใ€ˆ๐‘ฆ โˆ’ ๐‘ฅ, ๐‘ฅ โˆ’ ๐‘ฅโˆ—ใ€‰๐‘‹

โ‰ค 1๐›พใ€ˆ๐‘ฆ โˆ’ ๐‘ฅ, (๐‘ฆ โˆ’ ๐‘ฆโˆ—) โˆ’ (๐‘ฅ โˆ’ ๐‘ฅโˆ—)ใ€‰๐‘‹

โ‰ค 1๐›พ

(โ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ โ€–๐‘ฆโˆ— โˆ’ ๐‘ฅโˆ—โ€–2๐‘‹)

โ‰ค 1๐›พโ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–2๐‘‹ ,

where the next-to-last inequality follows from the rm nonexpansivity of proximal pointmappings (Lemma 6.13).

If we now set ๐‘ฆ = ๐‘ฅ + โ„Ž for arbitrary โ„Ž โˆˆ ๐‘‹ , we obtain that

0 โ‰ค ๐น๐›พ (๐‘ฅ + โ„Ž) โˆ’ ๐น๐›พ (๐‘ฅ) โˆ’ ใ€ˆ๐›พโˆ’1(๐‘ฅ โˆ’ ๐‘ฅโˆ—), โ„Žใ€‰๐‘‹โ€–โ„Žโ€–๐‘‹ โ‰ค 1

๐›พโ€–โ„Žโ€–๐‘‹ โ†’ 0 for โ„Ž โ†’ 0,

i.e., ๐น๐›พ is Frรฉchet dierentiable with gradient 1๐›พ (๐‘ฅ โˆ’ ๐‘ฅโˆ—) = (๐œ•๐น )๐›พ (๐‘ฅ). ๏ฟฝ

Since ๐น๐›พ is convex by Lemma 7.8, this result together with Theorem 4.5 yields the catchyrelation ๐œ•(๐น๐›พ ) = (๐œ•๐น )๐›พ .

Example 7.10. We consider again ๐‘‹ = โ„๐‘ .

(i) For ๐น (๐‘ฅ) = โ€–๐‘ฅ โ€–1, we have from Example 6.23 (ii) that the proximal point mappingis given by the component-wise soft-shrinkage operator. Inserting this into thedenition yields that

[(๐œ•โ€– ยท โ€–1)๐›พ (๐‘ฅ)]๐‘– = 1๐›พ (๐‘ฅ๐‘– โˆ’ (๐‘ฅ๐‘– โˆ’ ๐›พ)) = 1 if ๐‘ฅ๐‘– > ๐›พ,1๐›พ๐‘ฅ๐‘– if ๐‘ฅ๐‘– โˆˆ [โˆ’๐›พ,๐›พ],1๐›พ (๐‘ฅ๐‘– โˆ’ (๐‘ฅ๐‘– + ๐›พ)) = โˆ’1 if ๐‘ฅ๐‘– < ๐›พ .

Comparing this to the corresponding subdierential (4.2), we see that the set-valued case in the point ๐‘ฅ๐‘– = 0 has been replaced by a linear function on a smallinterval.

96

7 smoothness and convexity

(a) ๐‘“ (๐‘ก) = |๐‘ก | (b) ๐‘“ (๐‘ก) = ๐›ฟ [โˆ’1,1] (๐‘ก)

Figure 7.1: Illustration of the Moreauโ€“Yosida regularization (thick solid line) of ๐น (thin solidline). The dotted line indicates the quadratic function ๐‘ง โ†ฆโ†’ 1

2๐›พ โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹ , whilethe dashed line is ๐‘ง โ†ฆโ†’ ๐น (๐‘ง) + 1

2๐›พ โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹ . The dots and the horizontal andvertical lines (nontrivial only in the second point of (a)) emanating from thedots indicate the pair (๐‘ฅ, ๐น๐›พ (๐‘ฅ)) and how it relates to the minimization of theshifted quadratic functional. (In (b) the two lines are overlaid within [โˆ’1, 1], asonly the domain of denition of the two functions is dierent.)

Similarly, inserting the denition of the proximal point into (7.19) shows that

๐น๐›พ (๐‘ฅ) =๐‘โˆ‘๐‘–=1

๐‘“๐›พ (๐‘ฅ๐‘–) for ๐‘“๐›พ (๐‘ก) =

12๐›พ |๐‘ก โˆ’ (๐‘ก โˆ’ ๐›พ) |2 + |๐‘ก โˆ’ ๐›พ | = ๐‘ก โˆ’ ๐›พ

2 if ๐‘ก > ๐›พ,12๐›พ |๐‘ก |2 if ๐‘ก โˆˆ [โˆ’๐›พ,๐›พ],12๐›พ |๐‘ก โˆ’ (๐‘ก + ๐›พ) |2 + |๐‘ก + ๐›พ | = โˆ’๐‘ก โˆ’ ๐›พ

2 if ๐‘ก < โˆ’๐›พ .

For small values, the absolute value is thus replaced by a quadratic function(which removes the nondierentiability at 0). This modication is well-knownunder the name Huber norm; see Figure 7.1a.

(ii) For ๐น (๐‘ฅ) = ๐›ฟ๐”นโˆž (๐‘ฅ), we have from Example 6.23 (iii) that the proximal mapping isgiven by the component-wise projection onto [โˆ’1, 1] and hence that[(๐œ•๐›ฟ๐”นโˆž)๐›พ (๐‘ฅ)

]๐‘–=

1๐›พ

(๐‘ฅ๐‘– โˆ’

(๐‘ฅ๐‘– โˆ’ (๐‘ฅ๐‘– โˆ’ 1)+ โˆ’ (๐‘ฅ๐‘– + 1)โˆ’) ) = 1

๐›พ(๐‘ฅ๐‘– โˆ’ 1)+ + 1

๐›พ(๐‘ฅ๐‘– + 1)โˆ’.

Similarly, inserting this and using that prox๐›พ๐น (๐‘ฅ) โˆˆ ๐”นโˆž and ใ€ˆ(๐‘ฅ+1)โˆ’, (๐‘ฅโˆ’1)+ใ€‰๐‘‹ = 0yields that

(๐›ฟ๐”นโˆž)๐›พ (๐‘ฅ) =12๐›พ โ€–(๐‘ฅ โˆ’ 1)+โ€–22 +

12๐›พ โ€–(๐‘ฅ + 1)โˆ’โ€–22,

which corresponds to the classical penalty functional for the inequality constraints๐‘ฅ โˆ’ 1 โ‰ค 0 and ๐‘ฅ + 1 โ‰ฅ 0 in nonlinear optimization; see Figure 7.1b.

By Theorem 7.9, ๐น๐›พ is Frรฉchet dierentiable with Lipschitz continuous gradient with factor๐›พโˆ’1. From Theorem 7.3, we thus know that ๐น โˆ—๐›พ is strongly convex with factor ๐›พ , which in

97

7 smoothness and convexity

Hilbert spaces is equivalent to ๐น โˆ—๐›พ โˆ’ ๐›พ2 โ€– ยท โ€–2๐‘‹ being convex. In fact, this can be made even

more explicit.

Theorem 7.11. Let ๐น : ๐‘‹ โ†’ โ„ be proper, convex, and lower semicontinuous. Then we have forall ๐›พ > 0 that

(๐น๐›พ )โˆ— = ๐น โˆ— + ๐›พ2 โ€– ยท โ€–2๐‘‹ .

Proof. We obtain directly from the denition of the Fenchel conjugate in Hilbert spacesand of the Moreau envelope that

(๐น๐›พ )โˆ—(๐‘ฅโˆ—) = sup๐‘ฅโˆˆ๐‘‹

{ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ inf

๐‘งโˆˆ๐‘‹

[12๐›พ โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹ + ๐น (๐‘ง)

]}= sup๐‘ฅโˆˆ๐‘‹

{ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ + sup

๐‘งโˆˆ๐‘‹

{โˆ’ 1

2๐›พ โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹ โˆ’ ๐น (๐‘ง)}}

= sup๐‘งโˆˆ๐‘‹

{ใ€ˆ๐‘ฅโˆ—, ๐‘งใ€‰๐‘‹ โˆ’ ๐น (๐‘ง) + sup

๐‘ฅโˆˆ๐‘‹

{ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘งใ€‰๐‘‹ โˆ’ 1

2๐›พ โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹}}

= ๐น โˆ—(๐‘ฅโˆ—) +(12๐›พ โ€– ยท โ€–2๐‘‹

)โˆ—(๐‘ฅโˆ—),

since for any given ๐‘ง โˆˆ ๐‘‹ , the inner supremum is always taken over the full space ๐‘‹ . Theclaim now follows from Lemma 5.4 with ๐‘ = 2 (using again the fact that we have identied๐‘‹ โˆ— with ๐‘‹ ) and Lemma 5.7 (i). ๏ฟฝ

With this, we can show the converse of Theorem 7.9: every smooth function can be obtainedthrough Moreauโ€“Yosida regularization.

Corollary 7.12. Let ๐น : ๐‘‹ โ†’ โ„ be convex and ๐ฟ-smooth. Then for all ๐‘ฅ โˆˆ ๐‘‹ ,

๐น (๐‘ฅ) = (๐บโˆ—)๐ฟโˆ’1 (๐‘ฅ) and โˆ‡๐น (๐‘ฅ) = prox๐ฟ๐บ (๐ฟ๐‘ฅ)

for

๐บ : ๐‘‹ โ†’ โ„, ๐บ (๐‘ฅ) = ๐น โˆ—(๐‘ฅ) โˆ’ 12๐ฟ โ€–๐‘ฅ โ€–

2๐‘‹ .

Proof. Since ๐น is convex and ๐ฟ-smooth and๐‘‹ is a Hilbert space, Lemma 7.1 and Theorem 7.3yields that ๐น โˆ— is strongly convex with factor ๐ฟโˆ’1 and thus that ๐บ is convex. Furthermore,as a Fenchel conjugate of a proper convex functional, ๐น โˆ— and thus ๐บ is proper and lowersemicontinuous. Theorems 5.1 and 7.11 now imply that for all ๐‘ฅ โˆˆ ๐‘‹ ,

(๐บโˆ—)๐ฟโˆ’1 (๐‘ฅ) = (๐บโˆ—)โˆ—โˆ—๐ฟโˆ’1 (๐‘ฅ) =(๐บ + 1

2๐ฟ โ€– ยท โ€–2)โˆ—

(๐‘ฅ) = ๐น โˆ—โˆ—(๐‘ฅ) = ๐น (๐‘ฅ).

98

7 smoothness and convexity

Furthermore, by Lemma 4.13 and Theorems 4.5 and 4.14, we have that

๐œ•๐บ (๐‘ง) = ๐œ•๐น โˆ—(๐‘ง) โˆ’ {๐ฟโˆ’1๐‘ง} for all ๐‘ง โˆˆ ๐‘‹ .By the denition of the proximal mapping, this is equivalent to ๐‘ง = prox๐ฟ๐บ๐ฟ๐‘ฅ for any๐‘ฅ โˆˆ ๐œ•๐น โˆ—(๐‘ง). But by Lemma 5.8, ๐‘ฅ โˆˆ ๐œ•๐น โˆ—(๐‘ง) holds if and only if ๐‘ง โˆˆ ๐œ•๐น (๐‘ฅ) = {โˆ‡๐น (๐‘ฅ)}, andcombining these two yields the rst expression for the gradient. ๏ฟฝ

Let us briey consider the relevance of the previous results to optimization.

Approximation by smoothmappings For a convex functional ๐น : ๐‘‹ โ†’ โ„, everyminimizer๐‘ฅ โˆˆ ๐‘‹ satises the Fermat principle 0 โˆˆ ๐œ•๐น (๐‘ฅ), which we can write equivalently as๐‘ฅ โˆˆ ๐œ•๐น โˆ—(0). If we now replace ๐œ•๐น โˆ— with its Yosida approximation (๐œ•๐น โˆ—)๐›พ , we obtain theregularized optimality condition

๐‘ฅ๐›พ = (๐œ•๐น โˆ—)๐›พ (0) = โˆ’ 1๐›พprox๐›พ๐น โˆ— (0).

This is now an explicit and even Lipschitz continuous relation. Although ๐‘ฅ๐›พ is no longer aminimizer of ๐น , the convexity of ๐น๐›พ implies that ๐‘ฅ๐›พ โˆˆ (๐œ•๐น โˆ—)๐›พ (0) = ๐œ•(๐น โˆ—๐›พ ) (0) is equivalentto

0 โˆˆ ๐œ•(๐น โˆ—๐›พ )โˆ—(๐‘ฅ๐›พ ) = ๐œ•(๐น โˆ—โˆ— + ๐›พ

2 โ€– ยท โ€–2๐‘‹) (๐‘ฅ๐›พ ) = ๐œ• (

๐น + ๐›พ2 โ€– ยท โ€–2๐‘‹

) (๐‘ฅ๐›พ ),i.e., ๐‘ฅ๐›พ is the (unique due to the strict convexity of the squared norm) minimizer of thefunctional ๐น + ๐›พ

2 โ€– ยท โ€–2๐‘‹ . Hence, the regularization of ๐œ•๐น โˆ— has not made the original problemsmooth but merely (more) strongly convex. The equivalence can also be used to show(similarly to the proof of Theorem 2.1) that ๐‘ฅ๐›พ โ‡€ ๐‘ฅ for ๐›พ โ†’ 0. In practice, this straight-forward approach fails due to the diculty of computing ๐น โˆ— and prox๐น โˆ— and is thereforeusually combined with one of the splitting techniques that will be introduced in the nextchapter.

Conversion between gradients and proximal mappings According to Corollary 7.12,solving min๐‘ฅ ๐น (๐‘ฅ) for an ๐ฟ-smooth function ๐น is equivalent to solving

min๐‘ฅ,๐‘ฅโˆˆ๐‘‹

๐บโˆ—(๐‘ฅ) + 12๐ฟ โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2.

Observe that๐บโˆ—may be nonsmooth. Supposewe apply an algorithm for the latter thatmakesuse of the proximal mapping of ๐บโˆ— (such as the splitting methods that will be discussedin the following chapters). Then using the Moreau decomposition of Lemma 6.21 (ii) withCorollary 7.12, we see that

prox๐ฟโˆ’1๐บโˆ— (๐‘ฅ) = ๐‘ฅ โˆ’ ๐ฟโˆ’1โˆ‡๐น (๐‘ฅ).Therefore, this can still be done purely in terms of the gradient evaluations of ๐น .

99

7 smoothness and convexity

Remark 7.13. Continuing from Remark 6.26, Moreauโ€“Yosida regularization can also be dened inreexive Banach spaces; we refer to [Brezis, Crandall & Pazy 1970] for details. Again, the mainissue is the practical evaluation of ๐น๐›พ and (๐œ•๐น )๐›พ if the duality mapping (or the Riesz isomorphismin Hilbert spaces that are not identied with their dual) is no longer the identity.

100

8 PROXIMAL POINT AND SPLITTING METHODS

We now turn to the development of algorithms for computing minimizers of functionals๐ฝ : ๐‘‹ โ†’ โ„ of the form

๐ฝ (๐‘ฅ) โ‰” ๐น (๐‘ฅ) +๐บ (๐‘ฅ)for ๐น,๐บ : ๐‘‹ โ†’ โ„ convex but not necessarily dierentiable. One of the main dicultiescompared to the dierentiable setting is that the naive equivalent to steepest descent, theiteration

๐‘ฅ๐‘˜+1 โˆˆ ๐‘ฅ๐‘˜ โˆ’ ๐œ๐‘˜๐œ•๐ฝ (๐‘ฅ๐‘˜),does not work since even in nite dimensions, arbitrary subgradients need not be descentdirections โ€“ this can only be guaranteed for the subgradient of minimal norm; see, e.g.,[Ruszczynski 2006, Example 7.1, Lemma 2.77]. Furthermore, the minimal norm subgradientof ๐ฝ cannot be computed easily from those of ๐น and๐บ . We thus follow a dierent approachand look for a root ๐‘ฅ of the set-valued mapping ๐‘ฅ โ†ฆโ†’ ๐œ•๐ฝ (๐‘ฅ) (which coincides with theminimizer ๐‘ฅ of ๐ฝ if ๐ฝ is convex). In this chapter, we only derive methods, postponing proofsof convergence, in various dierent senses, to Chapters 9 to 11. For the reasons mentionedin the beginning of Section 6.3, we will assume in this and the following chapters that๐‘‹ (aswell as all further occurring spaces) is a Hilbert space so that we can identify ๐‘‹ โˆ— ๏ฟฝ ๐‘‹ .

8.1 proximal point method

We have seen in Corollary 6.19 that a root ๐‘ฅ of ๐œ•๐ฝ : ๐‘‹ โ‡’ ๐‘‹ can be characterized as a xedpoint of prox๐œ ๐ฝ for any ๐œ > 0. This suggests a xed-point iteration: Choose ๐‘ฅ0 โˆˆ ๐‘‹ and foran appropriate sequence {๐œ๐‘˜}๐‘˜โˆˆโ„• of step sizes set

(8.1) ๐‘ฅ๐‘˜+1 = prox๐œ๐‘˜ ๐ฝ (๐‘ฅ๐‘˜).This iteration naturally generalizes to nding a root ๐‘ฅ โˆˆ ๐ดโˆ’1(0) of a set-valued (usuallymonotone) operator ๐ด : ๐‘‹ โ‡’ ๐‘‹ as

(8.2) ๐‘ฅ๐‘˜+1 = R๐œ๐‘˜๐ด (๐‘ฅ๐‘˜).This is the proximal point method, which is the basic building block for all methods in thischapter. Using the denition of the resolvent, this can also be written in implicit form as

(8.3) 0 โˆˆ ๐œ๐‘˜๐ด(๐‘ฅ๐‘˜+1) + (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜),

101

8 proximal point and splitting methods

which will be useful for the analysis of the method.

If ๐ด is maximal monotone (in particular if ๐ด = ๐œ•๐ฝ ), Lemma 6.13 shows that the iterationmap ๐‘ฅ โ†ฆโ†’ R๐œ๐‘˜๐ด (๐‘ฅ) is rmly nonexpansive. Mere (nonrm) nonexpansivity already impliesthat

โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€– = โ€–R๐œ๐‘˜๐ด (๐‘ฅ๐‘˜) โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹ .In other words, the method does not escape from a xed point. Either a more renedanalysis based on rm nonexpansivity of the iteration map or a more direct analysis basedon the maximal monotonicity of ๐ด can be used to further show that the iterates {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„•indeed converge to a xed point ๐‘ฅ for an initial iterate ๐‘ฅ0. The latter will be the topic ofChapter 9.

A practical issue is the steps (8.1) of the basic proximal point method are typically just asdicult as the original problem, so the method is not feasible for problems that demand aniterative method for their solution in the rst place. However, the proximal step does forman important building block of several more practical splitting methods for problems of theform ๐ฝ = ๐น +๐บ , which we derive in the following by additional clever manipulations.

Remark 8.1. The proximal point algorithm can be traced back to Krasnoselโ€ฒskiฤฑ [Krasnoselโ€ฒskiฤฑ 1955]and Mann [Mann 1953] (as a special case of the Krasnoselโ€ฒskiฤฑโ€“Mann iteration); it was also studiedin [Martinet 1970]. The formulation considered here was proposed in [Rockafellar 1976b].

8.2 explicit splitting

As we have noted, the proximal point method is not feasible for most functionals of theform ๐ฝ (๐‘ฅ) = ๐น (๐‘ฅ) + ๐บ (๐‘ฅ), since the evaluation of prox๐ฝ is not signicantly easier thansolving the original minimization problem โ€“ even if prox๐น and prox๐บ have a closed-formexpression. (Such functionals are called prox-simple). We thus proceed dierently: insteadof applying the proximal point reformulation directly to 0 โˆˆ ๐œ•๐ฝ (๐‘ฅ), we rst apply thesubdierential sum rule (Theorem 4.14) to deduce the existence of ๐‘ โˆˆ ๐‘‹ with

(8.4){๐‘ โˆˆ ๐œ•๐น (๐‘ฅ),

โˆ’๐‘ โˆˆ ๐œ•๐บ (๐‘ฅ).We can now replace one or both of these subdierential inclusions by a proximal pointreformulation that only involves ๐น or ๐บ .

Explicit splitting methods โ€“ also known as forward-backward splitting โ€“ are based onapplying Lemma 6.18 only to, e.g., the second inclusion in (8.4) to obtain

(8.5){๐‘ โˆˆ ๐œ•๐น (๐‘ฅ),๐‘ฅ = prox๐œ๐บ (๐‘ฅ โˆ’ ๐œ๐‘).

The corresponding xed-point iteration then consists in

102

8 proximal point and splitting methods

(i) choosing ๐‘๐‘˜ โˆˆ ๐œ•๐น (๐‘ฅ๐‘˜) (with minimal norm);

(ii) setting ๐‘ฅ๐‘˜+1 = prox๐œ๐‘˜๐บ (๐‘ฅ๐‘˜ โˆ’ ๐œ๐‘˜๐‘๐‘˜).Again, computing a subgradient with minimal norm can be complicated in general. It is,however, easy if ๐น is additionally dierentiable since in this case ๐œ•๐น (๐‘ฅ) = {โˆ‡๐น (๐‘ฅ)} byTheorem 4.5. This leads to the proximal gradient or forward-backward splitting method

(8.6) ๐‘ฅ๐‘˜+1 = prox๐œ๐‘˜๐บ (๐‘ฅ๐‘˜ โˆ’ ๐œ๐‘˜โˆ‡๐น (๐‘ฅ๐‘˜)).

(The special case ๐บ = ๐›ฟ๐ถ โ€“ i.e., prox๐œ๐บ (๐‘ฅ) = proj๐ถ (๐‘ฅ) โ€“ is also known as the projectedgradient method). Similarly to the proximal point method, this method can be written inimplicit form as

(8.7) 0 โˆˆ ๐œ๐‘˜ [๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜)] + (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜).

Based on this, we will see in Chapter 9 that the iterates {๐‘ฅ๐‘˜} converge weakly if ๐œ๐‘˜๐ฟ < 2 for๐ฟ the Lipschitz factor of โˆ‡๐น . The need to know ๐ฟ is one drawback of the explicit splittingmethod. This can to some extend be circumvented by performing a line search, i.e., testingfor various choices of ๐œ๐‘˜ until a sucient decrease in function values is achieved. Wewill discuss such strategies later on in Section 12.3. Another highly successful variant ofexplicit splitting applies inertia to the iterates for faster convergence; this we will discussin Section 12.2 after developing tools for the study of convergence rates.

Remark 8.2. Forward-backward splitting for nding the root of the sum of two monotone operatorswas already proposed in [Lions & Mercier 1979]. It has become especially popular under the nameiterative soft-thresholding (ISTA) in the context of sparse reconstruction (i.e., regularization of linearinverse problems with โ„“ 1 penalties), see, e.g., [Chambolle, DeVore, et al. 1998; Daubechies, Defrise &De Mol 2004; Wright, Nowak & Figueiredo 2009].

8.3 implicit splitting

Even with a line search, the restriction on the step sizes ๐œ๐‘˜ in explicit splitting remainunsatisfactory. Such restrictions are not needed in implicit splitting methods. (Comparethe properties of explicit vs. implicit Euler methods for dierential equations.) Here, theproximal point formulation is applied to both subdierential inclusions in (8.4), whichyields the optimality system {

๐‘ฅ = prox๐œ๐น (๐‘ฅ + ๐œ๐‘),๐‘ฅ = prox๐œ๐บ (๐‘ฅ โˆ’ ๐œ๐‘).

To eliminate ๐‘ from these equations, we set ๏ฟฝ๏ฟฝ โ‰” ๐‘ฅ +๐œ๐‘ and๐‘ค โ‰” ๐‘ฅ โˆ’๐œ๐‘ = 2๐‘ฅ โˆ’ ๏ฟฝ๏ฟฝ. It remainsto derive a recursion for ๏ฟฝ๏ฟฝ, which we obtain from the productive zero ๏ฟฝ๏ฟฝ = ๏ฟฝ๏ฟฝ + (๐‘ฅ โˆ’ ๐‘ฅ).

103

8 proximal point and splitting methods

Further replacing some copies of ๐‘ฅ by a new variable ๐‘ฆ leads to the overall xed pointsystem

๐‘ฅ = prox๐œ๐น (๏ฟฝ๏ฟฝ),๐‘ฆ = prox๐œ๐บ (2๐‘ฅ โˆ’ ๏ฟฝ๏ฟฝ),๏ฟฝ๏ฟฝ = ๏ฟฝ๏ฟฝ + ๐‘ฅ โˆ’ ๐‘ฆ.

The corresponding xed-point iteration leads to the Douglasโ€“Rachford splitting (DRS)method

(8.8)

๐‘ฅ๐‘˜+1 = prox๐œ๐น (๐‘ง๐‘˜),๐‘ฆ๐‘˜+1 = prox๐œ๐บ (2๐‘ฅ๐‘˜+1 โˆ’ ๐‘ง๐‘˜),๐‘ง๐‘˜+1 = ๐‘ง๐‘˜ + ๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜+1.

Of course, the algorithm and its derivation generalize to arbitrary monotone operators๐ด, ๐ต : ๐‘‹ โ‡’ ๐‘‹ :

(8.9)

๐‘ฅ๐‘˜+1 = R๐œ๐ต (๐‘ง๐‘˜),๐‘ฆ๐‘˜+1 = R๐œ๐ด (2๐‘ฅ๐‘˜+1 โˆ’ ๐‘ง๐‘˜),๐‘ง๐‘˜+1 = ๐‘ง๐‘˜ + ๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜+1.

We can also write the DRS method in more implicit form. Indeed, inverting the resolventsin (8.9) and using the last update to change variables in the rst two yields

0 โˆˆ ๐œ๐ต(๐‘ฅ๐‘˜+1) + ๐‘ฆ๐‘˜+1 โˆ’ ๐‘ง๐‘˜+1,0 โˆˆ ๐œ๐ด(๐‘ฆ๐‘˜+1) + ๐‘ง๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜+1,0 = ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฆ๐‘˜+1 + (๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜).

Therefore, with ๐‘ข โ‰” (๐‘ฅ, ๐‘ฆ, ๐‘ง) โˆˆ ๐‘‹ 3, and the operators1

(8.10) ๐ป (๐‘ฅ, ๐‘ฆ, ๐‘ง) โ‰” ยฉยญยซ๐œ๐ต(๐‘ฅ) + ๐‘ฆ โˆ’ ๐‘ง๐œ๐ด(๐‘ฆ) + ๐‘ง โˆ’ ๐‘ฅ

๐‘ฅ โˆ’ ๐‘ฆยชยฎยฌ and ๐‘€ โ‰”

ยฉยญยซ0 0 00 0 00 0 Id

ยชยฎยฌ ,we can write the DRS method as the preconditioned proximal point method

(8.11) 0 โˆˆ ๐ป (๐‘ข๐‘˜+1) +๐‘€ (๐‘ข๐‘˜+1 โˆ’ ๐‘ข๐‘˜).

Indeed, the basic proximal point in implicit form (8.3) is just (8.11) with the preconditioner๐‘€ = ๐œโˆ’1Id. It is furthermore straightforward to verify that 0 โˆˆ ๐ป (๐‘ข) is equivalent to0 โˆˆ ๐ด(๐‘ฅ) + ๐ต(๐‘ฅ).1Here and in the following, we identify ๐‘ฅ โˆˆ ๐‘‹ with the singleton set {๐‘ฅ} โŠ‚ ๐‘‹ whenever there is no dangerof confusion.

104

8 proximal point and splitting methods

The formulation (8.11) will in the following chapter form the basis for proving the con-vergence of the method. Recalling the discussion on convergence in Section 8.1, it seemsbenecial for ๐ป to be maximally monotone, as then (although this is not immediate fromLemma 6.13) it is reasonable to expect the nonexpansivity of the iterates with respect tothe semi-norm ๐‘ข โ†ฆโ†’ โ€–๐‘ขโ€–๐‘€ โ‰”

โˆšใ€ˆ๐‘€๐‘ข,๐‘ขใ€‰ on ๐‘‹ 3 induced by the self-adjoint operator๐‘€ , i.e.,

thatโ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–๐‘€ โ‰ค โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘€ .

While it is straightforward to verify that ๐ป is monotone if ๐ด and ๐ต are, the question ofmaximal monotonicity is more involved and will be addressed in Chapter 9. There, we willalso show that the expected nonexpansivity holds in a slightly stronger sense and that thiswill yield the convergence of the method.

Remark 8.3. The Douglasโ€“Rachford splitting was rst introduced in [Douglas & Rachford 1956]; therelationship to the proximal point method was discovered in [Eckstein & Bertsekas 1992]. It is alsopossible to devise acceleration schemes under strong monotonicity [see, e.g., Bredies & Sun 2016].

8.4 primal-dual proximal splitting

We now consider problems of the form

(8.12) min๐‘ฅโˆˆ๐‘‹

๐น (๐‘ฅ) +๐บ (๐พ๐‘ฅ)

for ๐น : ๐‘‹ โ†’ โ„ and๐บ : ๐‘Œ โ†’ โ„ proper, convex, and lower semicontinuous, and๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ).Applying Theorem 5.10 and Lemma 5.8 to such a problem yields the Fenchel extremalityconditions

(8.13){โˆ’๐พโˆ—๐‘ฆ โˆˆ ๐œ•๐น (๐‘ฅ),

๐‘ฆ โˆˆ ๐œ•๐บ (๐พ๐‘ฅ), โ‡”{โˆ’๐พโˆ—๐‘ฆ โˆˆ ๐œ•๐น (๐‘ฅ),

๐พ๐‘ฅ โˆˆ ๐œ•๐บโˆ—(๐‘ฆ).

With the general notation ๐‘ข โ‰” (๐‘ฅ, ๐‘ฆ), this can be written as 0 โˆˆ ๐ป (๐‘ข) for

(8.14) ๐ป (๐‘ข) โ‰”(๐œ•๐น (๐‘ฅ) + ๐พโˆ—๐‘ฆ๐œ•๐บโˆ—(๐‘ฆ) โˆ’ ๐พ๐‘ฅ

),

It is again not dicult to see that ๐ป is monotone. This suggests that we might be able toapply the proximal point method to nd a root of ๐ป . In practice we however need to worka little bit more, as the resolvent of ๐ป can rarely be given an explicit, easily solvable form.If, however, the resolvents of ๐บ and ๐น โˆ— can individually be computed explicitly, it makessense to try to decouple the primal and dual variables. This is what we will do.

105

8 proximal point and splitting methods

To do so, we reformulate for arbitrary ๐œŽ, ๐œ > 0 the extremality conditions (8.13) usingLemma 6.18 as {

๐‘ฅ = prox๐œ๐น (๐‘ฅ โˆ’ ๐œ๐พโˆ—๐‘ฆ),๐‘ฆ = prox๐œŽ๐บโˆ— (๐‘ฆ + ๐œŽ๐พ๐‘ฅ).

This suggests the xed-point iterations

(8.15){๐‘ฅ๐‘˜+1 = prox๐œ๐น (๐‘ฅ๐‘˜ โˆ’ ๐œ๐พโˆ—๐‘ฆ๐‘˜),๐‘ฆ๐‘˜+1 = prox๐œŽ๐บโˆ— (๐‘ฆ๐‘˜ + ๐œŽ๐พ๐‘ฅ๐‘˜+1).

In the rst equation, we now use prox๐œ๐น = (Id + ๐œ๐œ•๐น )โˆ’1 to obtain that

(8.16) ๐‘ฅ๐‘˜+1 = prox๐œ๐น (๐‘ฅ๐‘˜ โˆ’ ๐œ๐พโˆ—๐‘ฆ๐‘˜) โ‡” ๐‘ฅ๐‘˜ โˆ’ ๐œ๐พโˆ—๐‘ฆ๐‘˜ โˆˆ ๐‘ฅ๐‘˜+1 + ๐œ๐œ•๐น (๐‘ฅ๐‘˜+1)โ‡” 0 โˆˆ ๐œโˆ’1(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜) โˆ’ ๐พโˆ—(๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ๐‘˜)

+ [๐œ•๐น (๐‘ฅ๐‘˜+1) + ๐พโˆ—๐‘ฆ๐‘˜+1] .

Similarly, the second equation of (8.15) gives

(8.17) ๐‘ฆ๐‘˜+1 = prox๐œŽ๐บโˆ— (๐‘ฆ๐‘˜ + ๐œŽ๐พ๐‘ฅ๐‘˜+1) โ‡” ๐œŽโˆ’1๐‘ฆ๐‘˜ โˆˆ ๐œŽโˆ’1๐‘ฆ๐‘˜+1 + ๐œ•๐บโˆ—(๐‘ฆ๐‘˜+1) โˆ’ ๐พ๐‘ฅ๐‘˜+1โ‡” 0 โˆˆ ๐œŽโˆ’1(๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ๐‘˜) + [๐œ•๐บโˆ—(๐‘ฆ๐‘˜+1) โˆ’ ๐พ๐‘ฅ๐‘˜+1] .

With the help of (8.16), (8.17), and the operator

๏ฟฝ๏ฟฝ โ‰”

(๐œโˆ’1Id โˆ’๐พโˆ—

0 ๐œŽโˆ’1Id

),

we can then rearrange (8.15) as the preconditioned proximal point method (8.11). Further-more, provided the step lengths are such that๐‘€ = ๏ฟฝ๏ฟฝ is invertible, this can be written

(8.18) 0 โˆˆ ๐ป (๐‘ข๐‘˜+1) +๐‘€ (๐‘ข๐‘˜+1 โˆ’ ๐‘ข๐‘˜) โ‡” ๐‘ข๐‘˜+1 = R๐‘€โˆ’1๐ป๐‘ข๐‘˜ .

However, considering our rough discussions on convergence in the previous sections, thereis a problem:๐‘€ is not self-adjoint and therefore does not induce a (semi-)norm on ๐‘‹ ร— ๐‘Œ .We therefore change our algorithm and take

(8.19) ๐‘€ โ‰”

(๐œโˆ’1Id โˆ’๐พโˆ—

โˆ’๐พ ๐œŽโˆ’1Id

).

Correspondingly, replacing (8.17) by

๐‘ฆ๐‘˜+1 = prox๐œŽ๐บโˆ— (๐‘ฆ๐‘˜ + ๐œŽ๐พ (2๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜)) โ‡” ๐œŽโˆ’1๐‘ฆ๐‘˜ โˆ’ ๐พ๐‘ฅ๐‘˜ โˆˆ ๐œŽโˆ’1๐‘ฆ๐‘˜+1 + ๐œ•๐บโˆ—(๐‘ฆ๐‘˜+1) โˆ’ 2๐พ๐‘ฅ๐‘˜+1

โ‡” 0 โˆˆ ๐œŽโˆ’1(๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ๐‘˜) โˆ’ ๐พ (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜)+ [๐œ•๐บโˆ—(๐‘ฆ๐‘˜+1) โˆ’ ๐พ๐‘ฅ๐‘˜+1],

106

8 proximal point and splitting methods

we then obtain from (8.18) the Primal-Dual Proximal Splitting (PDPS) method

(8.20)

๐‘ฅ๐‘˜+1 = prox๐œ๐น (๐‘ฅ๐‘˜ โˆ’ ๐œ๐พโˆ—๐‘ฆ๐‘˜),๐‘ฅ๐‘˜+1 = 2๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ ,๐‘ฆ๐‘˜+1 = prox๐œŽ๐บโˆ— (๐‘ฆ๐‘˜ + ๐œŽ๐พ๐‘ฅ๐‘˜+1).

The middle over-relaxation step is a consequence of our choice of the bottom-left cornerof ๐‘€ dened in (8.20). This itself was forced to have its current form through the self-adjointness requirement on๐‘€ and the choice of the top-right corner of๐‘€ . As mentionedabove, the role of the latter is to decouple the primal update from the dual update by shifting๐พโˆ—๐‘ฆ๐‘˜+1 within ๐ป to ๐พโˆ—๐‘ฆ๐‘˜ so that the primal iterate ๐‘ฅ๐‘˜+1 can be computed without knowing๐‘ฆ๐‘˜+1. (Alternatively, we could zero out the o-diagonal of๐‘€ and still have a self-adjointoperator, but then we would generally not be able to compute ๐‘ฅ๐‘˜+1 independent of ๐‘ฆ๐‘˜+1.)

Wewill in the following chapters demonstrate that the PDPSmethod converges if๐œŽ๐œ โ€–๐พ โ€–2 <1, and that in fact it has particularly good convergence properties. Note that although theiteration (9.19) is implicit in ๐น and ๐บ , it is still explicit in ๐พ ; it is therefore not surprisingthat step size restrictions based on ๐พ remain. Applying, for example, the PDPS methodwith ๏ฟฝ๏ฟฝ (๐‘ฅ) โ‰” ๐บ (๐พ๐‘ฅ) (i.e., applying only the sum rule but not the chain rule) would lead to afully implicit method. This would, however, require computing ๐พโˆ’1 in the primal proximalstep involving prox๐œŽ๏ฟฝ๏ฟฝโˆ— . It is precisely the point of the primal-dual proximal splitting toavoid having to invert ๐พ , which is often prohibitively expensive if not impossible (e.g., if ๐พdoes not have closed range as in many inverse problems).

Remark 8.4. The primal-dual proximal splitting was rst introduced in [Pock et al. 2009] for specicimage segmentation problems, and later more generally in [Chambolle & Pock 2011]. For this reason,it is frequently referred to as the Chambolleโ€“Pock method. The relation to proximal point methodswas rst pointed out in [He & Yuan 2012]. In [Esser, Zhang & Chan 2010] it was classied as thePrimal-Dual Hybrid Gradient method, Modied or PDHGM after the method (8.15), which is calledthe PDHG. The latter is due to [Zhu & Chan 2008].

Banach space generalizations of the PDPS method, based on a so-called Bregman divergence in placeof ๐‘ข โ†ฆโ†’ 1

2 โ€–๐‘ขโ€–2, were introduced in [Hohage & Homann 2014]. We will discuss Bregman divergencesin further detail in Section 11.1.

The PDPS method has been also generalized to dierent types of nonconvex problems in [Valkonen2014; Mรถllenho et al. 2015]. Stochastic generalizations are considered in [Valkonen 2019; Chambolle,Ehrhardt, et al. 2018].

8.5 primal-dual explicit splitting

The PDPS method is useful for dealing with the sum of functionals where one summandincludes a linear operator. However, if this is the case for both operators, i.e.,

min๐‘ฅโˆˆ๐‘‹

๐น (๐ด๐‘ฅ) +๐บ (๐พ๐‘ฅ)

107

8 proximal point and splitting methods

for ๐น : ๐‘ โ†’ โ„, ๐บ : ๐‘Œ โ†’ โ„, ๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘ ) and ๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ), we again have the problem ofdealing with a complicated proximal mapping. One workaround is the following โ€œliftingtrickโ€: we introduce

(8.21) ๐น (๐‘ฅ) โ‰” 0, ๏ฟฝ๏ฟฝ (๐‘ฆ, ๐‘ง) โ‰” ๐บ (๐‘ฆ) + ๐น (๐‘ง) and ๏ฟฝ๏ฟฝ๐‘ฅ โ‰” (๐พ๐‘ฅ,๐ด๐‘ฅ),

and then apply the PDPS method to the reformulated problem min๐‘ฅ ๐น (๐‘ฅ) + ๏ฟฝ๏ฟฝ (๏ฟฝ๏ฟฝ๐‘ฅ). Ac-cording to Lemma 6.21 (iii), the dual step of the PDPS method will then split into separateproximal steps with respect to ๐บโˆ— and ๐น โˆ—, while the proximal map in the primal step willbe trivial. However, an additional dual variable will have been introduced through theintroduction of ๐‘ง above, which can be costly.

An alternative approach is the following. Analogously to (8.15), but only using Lemma 6.18on the second relation of (8.13) together with the chain rule (Theorem 4.17), we can refor-mulate the latter as

(8.22){๐‘ฅ โˆˆ ๐‘ฅ โˆ’ ๐œ [๐œ•๐ดโˆ—๐น (๐ด๐‘ฅ) + ๐พโˆ—๐‘ฆ],๐‘ฆ = prox๐œŽ๐บโˆ— (๐‘ฆ + ๐œŽ๐พ๐‘ฅ).

(For ๐พ = Id, we can alternatively obtain (8.22) from the derivation of explicit splitting byusing Moreauโ€™s identity, Theorem 6.20, in the second relation of (8.5).)

If ๐น is Gรขteaux dierentiable (and taking ๐ด = Id for the sake of presentation), inserting therst relation in the second relation, (8.22) can be further rewritten as{

๐‘ฅ = ๐‘ฅ โˆ’ ๐œ [โˆ‡๐น (๐‘ฅ) + ๐พโˆ—๐‘ฆ],๐‘ฆ = prox๐œŽ๐บโˆ— (๐‘ฆ + ๐œŽ๐พ๐‘ฅ โˆ’ ๐œŽ๐œ๐พ [โˆ‡๐น (๐‘ฅ) + ๐พโˆ—๐‘ฆ]).

Reordering the lines and xing ๐œ = ๐œŽ = 1, the corresponding xed-point iteration leads tothe primal-dual explicit splitting (PDES) method

(8.23){๐‘ฆ๐‘˜+1 = prox๐บโˆ— ((Id โˆ’ ๐พ๐พโˆ—)๐‘ฆ๐‘˜ + ๐พ (๐‘ฅ๐‘˜ โˆ’ โˆ‡๐น (๐‘ฅ๐‘˜))),๐‘ฅ๐‘˜+1 = ๐‘ฅ๐‘˜ โˆ’ โˆ‡๐น (๐‘ฅ๐‘˜) โˆ’ ๐พโˆ—๐‘ฆ๐‘˜+1.

Again, we can write (8.23) in more implicit form as{0 โˆˆ ๐œ•๐บโˆ—(๐‘ฆ๐‘˜+1) โˆ’ ๐พ (๐‘ฅ๐‘˜ โˆ’ โˆ‡๐น (๐‘ฅ๐‘˜) โˆ’ ๐พโˆ—๐‘ฆ๐‘˜) + (๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ๐‘˜),0 = โˆ‡๐น (๐‘ฅ๐‘˜) + ๐พโˆ—๐‘ฆ๐‘˜+1 + (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜).

Inserting the second relation in the rst, this is{0 โˆˆ ๐œ•๐บโˆ—(๐‘ฆ๐‘˜+1) โˆ’ ๐พ๐‘ฅ๐‘˜+1 + (Id โˆ’ ๐พ๐พโˆ—) (๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ๐‘˜),0 = โˆ‡๐น (๐‘ฅ๐‘˜) + ๐พโˆ—๐‘ฆ๐‘˜+1 + (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜).

108

8 proximal point and splitting methods

If we now introduce the preconditioning operator

(8.24) ๐‘€ โ‰”

(Id 00 Id โˆ’ ๐พ๐พโˆ—

),

then in terms of the monotone operator ๐ป introduced in (8.14) for the PDPS method and๐‘ข = (๐‘ฅ, ๐‘ฆ), the PDES method (8.23) can be written in implicit form as

(8.25) 0 โˆˆ ๐ป (๐‘ข๐‘˜+1) +(โˆ‡๐น (๐‘ฅ๐‘˜) โˆ’ โˆ‡๐น (๐‘ฅ๐‘˜+1)

0

)+๐‘€ (๐‘ข๐‘˜+1 โˆ’ ๐‘ข๐‘˜).

The middle term switches the step with respect to ๐น to be explicit. Note that (8.7) couldhave also been written with a similar middle term; we can therefore think of the PDESmethod as a preconditioned explicit splitting method.

The preconditioning operator๐‘€ is self-adjoint as well as positive semi-denite if โ€–๐พ โ€– โ‰ค 1.It does not have the o-diagonal decoupling terms that the preconditioner for the PDPSmethod has. Instead, through the special structure of the problem the term Id โˆ’ ๐พ๐พโˆ—

decouple ๐‘ฆ๐‘˜+1 from ๐‘ฅ๐‘˜+1, allowing ๐‘ฆ๐‘˜+1 be computed rst.

We will in Section 9.4 see that the iterates of the PDES method converge weakly when โˆ‡๐นis Lipschitz with factor strictly less than 2.

Remark 8.5. The primal-dual explicit splitting was introduced in [Loris & Verhoeven 2011] asGeneralized Iterative Soft Thresholding (GIST) for ๐น (๐‘ฅ) = 1

2 โ€–๐‘ โˆ’ ๐‘ฅ โ€–2. The general case has later beencalled the primal-dual xed point method (PDFP) in [Chen, Huang & Zhang 2013] and the proximalalternating predictor corrector (PAPC) in [Drori, Sabach & Teboulle 2015].

8.6 the augmented lagrangian and alternating direction minimization

Let ๐น : ๐‘‹ โ†’ โ„ and ๐บ : ๐‘ โ†’ โ„ be convex, proper, and lower semicontinuous. Also let๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ), and ๐ต โˆˆ ๐•ƒ(๐‘ ;๐‘Œ ), and consider for some ๐‘ โˆˆ ๐‘Œ the problem

(8.26) min๐‘ฅ,๐‘ง

๐น (๐‘ฅ) +๐บ (๐‘ง) s.t. ๐ด๐‘ฅ + ๐ต๐‘ง = ๐‘.

A traditional way to handle this kind of constraint problems is by means of the AugmentedLagrangian. We start by introducing the Lagrangian

L(๐‘ฅ, ๐‘ง; _) โ‰” ๐น (๐‘ฅ) +๐บ (๐‘ง) + ใ€ˆ๐ด๐‘ฅ + ๐ต๐‘ง โˆ’ ๐‘, _ใ€‰๐‘Œ .

Then (8.26) has the same solutions as the saddle-point problem

(8.27) min๐‘ฅโˆˆ๐‘‹,๐‘งโˆˆ๐‘

max_โˆˆ๐‘Œ

L(๐‘ฅ, ๐‘ง; _).

109

8 proximal point and splitting methods

We may then โ€œaugmentโ€ the Lagrangian by a squared penalty on the violation of theconstraint, hence obtaining the equivalent problem

(8.28) min๐‘ฅโˆˆ๐‘‹,๐‘งโˆˆ๐‘

max_โˆˆ๐‘Œ

L๐œ (๐‘ฅ, ๐‘ง; _) โ‰” ๐น (๐‘ฅ) +๐บ (๐‘ง) + ใ€ˆ๐ด๐‘ฅ + ๐ต๐‘ง โˆ’ ๐‘, _ใ€‰๐‘Œ + ๐œ2 โ€–๐ด๐‘ฅ + ๐ต๐‘ง โˆ’ ๐‘ โ€–2๐‘Œ .

The saddle-point functional L๐œ is called the augmented Lagrangian.

A classical approach for the solution of (8.28) is by alternatingly solving for one variable,keeping the others xed. If we take a proximal step for the dual variable or Lagrangemultiplier _, this yields the Alternating Directions Method of Multipliers (ADMM)

(8.29)

๐‘ฅ๐‘˜+1 โ‰” argmin๐‘ฅโˆˆ๐‘‹

L๐œ (๐‘ฅ, ๐‘ง๐‘˜ ; _๐‘˜),

๐‘ง๐‘˜+1 โ‰” argmin๐‘งโˆˆ๐‘

L๐œ (๐‘ฅ๐‘˜+1, ๐‘ง; _๐‘˜),

_๐‘˜+1 โ‰” argmax_โˆˆ๐‘Œ

L๐œ (๐‘ฅ๐‘˜+1, ๐‘ง๐‘˜+1; _) โˆ’ 12๐œ โ€–_ โˆ’ _

๐‘˜ โ€–2๐‘Œ .

This can be rewritten as

(8.30)

๐‘ฅ๐‘˜+1 โˆˆ (๐ดโˆ—๐ด + ๐œโˆ’1๐œ•๐น )โˆ’1(๐ดโˆ—(๐‘ โˆ’ ๐ต๐‘ง๐‘˜ โˆ’ ๐œโˆ’1_๐‘˜)),๐‘ง๐‘˜+1 โˆˆ (๐ตโˆ—๐ต + ๐œโˆ’1๐œ•๐บ)โˆ’1(๐ตโˆ—(๐‘ โˆ’๐ด๐‘ฅ๐‘˜+1 โˆ’ ๐œโˆ’1_๐‘˜)),_๐‘˜+1 โ‰” _๐‘˜ + ๐œ (๐ด๐‘ฅ๐‘˜+1 + ๐ต๐‘ง๐‘˜+1 โˆ’ ๐‘).

As can be observed, the ADMM requires inverting relatively complicated set-valued opera-tors in place of simple proximal point operations. This is why the basic ADMM is seldompractically implementable without the application of a further optimization method tosolve the ๐‘ฅ and ๐‘ง updates.

In the literature, there have been various remedies to the nonimplementability of theADMM. In particular, one can modify the ADMM iterations by adding to (8.29) additionalproximal terms. Introducing for some๐‘„๐‘ฅ โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) and๐‘„๐‘ง โˆˆ ๐•ƒ(๐‘ ;๐‘ ) the weighted normsโ€–๐‘ฅ โ€–๐‘„๐‘ฅ โ‰”

โˆšใ€ˆ๐‘„๐‘ฅ๐‘ฅ, ๐‘ฅใ€‰๐‘‹ and โ€–๐‘งโ€–๐‘„๐‘ง โ‰”

โˆšใ€ˆ๐‘„๐‘ง๐‘ง, ๐‘งใ€‰๐‘ , this leads to the iteration

(8.31)

๐‘ฅ๐‘˜+1 โ‰” argmin๐‘ฅโˆˆ๐‘‹

L๐œ (๐‘ฅ, ๐‘ง๐‘˜ ; _๐‘˜) + 12 โ€–๐‘ฅ โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘„๐‘ฅ ,

๐‘ง๐‘˜+1 โ‰” argmin๐‘งโˆˆ๐‘

L๐œ (๐‘ฅ๐‘˜+1, ๐‘ง; _๐‘˜) + 12 โ€–๐‘ง โˆ’ ๐‘ง

๐‘˜ โ€–2๐‘„๐‘ง ,

_๐‘˜+1 โ‰” argmax_โˆˆ๐‘Œ

L๐œ (๐‘ฅ๐‘˜+1, ๐‘ง๐‘˜+1; _) โˆ’ 12๐œ โ€–_ โˆ’ _

๐‘˜ โ€–2๐‘Œ .

If we specically take ๐‘„๐‘ฅ โ‰” ๐œŽโˆ’1Id โˆ’ ๐œ๐ดโˆ—๐ด and ๐‘„๐‘ง โ‰” \โˆ’1Id โˆ’ ๐œ๐ตโˆ—๐ต for some ๐œŽ, \ > 0 with

110

8 proximal point and splitting methods

\๐œ โ€–๐ดโ€– < 1 and ๐œŽ๐œ โ€–๐ตโ€– < 1, then we can expand

L๐œ (๐‘ฅ, ๐‘ง; _) + 12 โ€–๐‘ฅ โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘„๐‘ฅ = ๐น (๐‘ฅ) +๐บ (๐‘ง) + ใ€ˆ๐ด๐‘ฅ + ๐ต๐‘ง โˆ’ ๐‘, _ใ€‰๐‘Œ

+ ๐œ ใ€ˆ๐‘ฅ,๐ดโˆ—(๐ต๐‘ง โˆ’ ๐‘)ใ€‰๐‘‹ + ๐œ2 โ€–๐ต๐‘ง โˆ’ ๐‘ โ€–2๐‘Œ

+ 12๐œŽ โ€–๐‘ฅ โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ + ๐œ ใ€ˆ๐‘ฅ๐‘˜+1, ๐ดโˆ—๐ด๐‘ฅ๐‘˜ใ€‰๐‘‹ โˆ’ ๐œ

2 โ€–๐ด๐‘ฅ๐‘˜ โ€–2๐‘Œ ,

which has the โ€œpartialโ€ subdierential ๐œ•๐‘ฅ with respect to ๐‘ฅ (keeping ๐‘ง, _ xed)

๐œ•๐‘ฅL๐œ (๐‘ฅ, ๐‘ง; _) = ๐œ•๐น (๐‘ฅ) +๐ดโˆ—_ + ๐œ๐ดโˆ—(๐ต๐‘ง โˆ’ ๐‘) + ๐œŽโˆ’1(๐‘ฅ โˆ’ ๐‘ฅ๐‘˜) + ๐œ๐ดโˆ—๐ด๐‘ฅ๐‘˜ .

Similarly computing the partial subdierential ๐œ•๐‘ง with respect to ๐‘ง, (8.31) can thus be writtenas the preconditioned ADMM

(8.32)

๐‘ฅ๐‘˜+1 โ‰” prox๐œŽ๐น ((Id โˆ’ ๐œŽ๐œ)๐ดโˆ—๐ด๐‘ฅ๐‘˜ + ๐œŽ๐ดโˆ—(๐œ (๐‘ โˆ’ ๐ต๐‘ง๐‘˜) โˆ’ _๐‘˜)),๐‘ง๐‘˜+1 โ‰” prox\๐บ ((Id โˆ’ \๐œ)๐ตโˆ—๐ต๐‘ง๐‘˜ + \๐ตโˆ—(๐œ (๐‘ โˆ’๐ด๐‘ฅ๐‘˜+1) โˆ’ _๐‘˜)),_๐‘˜+1 โ‰” _๐‘˜ + ๐œ (๐ด๐‘ฅ๐‘˜+1 + ๐ต๐‘ง๐‘˜+1 โˆ’ ๐‘).

We will see in the next section that this method is just the PDPS method with the primaland dual variables exchanged.

Remark 8.6. The ADMM was introduced in [Gabay 1983; Arrow, Hurwicz & Uzawa 1958] as analternating approach to the classical Augmented Lagrangian method. The preconditioned ADMMis due to [Zhang, Burger & Osher 2011].

8.7 connections

In Section 8.5 we have seen the importance and interplay of problem formulation andalgorithm choice for problems with a specic structure. We will now see that many of thealgorithmswe have presented are actually equivalentwhen applied to diering formulationsof the problem. Hence, if one algorithm is ecient on one formulation of the problem,another algorithm may work equally well on a dierent formulation.

We start by considering the ADMM problem (8.26), which we can reformulate as

min๐‘ฅ,๐‘ง

๐น (๐‘ฅ) +๐บ (๐‘ง) + ๐›ฟ{๐‘} (๐ด๐‘ฅ + ๐ต๐‘ง).

Applying the PDPS method (8.20) to this formulation yields the algorithm

(8.33)

๐‘ฅ๐‘˜+1 โ‰” prox๐œ๐น (๐‘ฅ๐‘˜ โˆ’ ๐œ๐ดโˆ—_๐‘˜),๐‘ง๐‘˜+1 โ‰” prox๐œ๐บ (๐‘ง๐‘˜ โˆ’ ๐œ๐ตโˆ—_๐‘˜),๐‘ฅ๐‘˜+1 โ‰” 2๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ ,๐‘ง๐‘˜+1 โ‰” 2๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜ ,_๐‘˜+1 โ‰” _๐‘˜ + ๐œŽ (๐ด๐‘ฅ๐‘˜+1 + ๐ต๐‘ง๐‘˜+1 โˆ’ ๐‘).

111

8 proximal point and splitting methods

Note that both the ADMM (8.30) and the preconditioned ADMM (8.32) have a very similarform to this iteration. We will now demonstrate that if๐ด = Id and so ๐‘‹ = ๐‘Œ , i.e., if we wantto solve the (primal) problem

(8.34) min๐‘งโˆˆ๐‘

๐น (๐‘ โˆ’ ๐ต๐‘ง) +๐บ (๐‘ง),

then the ADMM is equivalent to the PDPS method (8.20) applied to the (dual) problem

(8.35) min๐‘ฆโˆˆ๐‘Œ

[๐บโˆ—(๐ตโˆ—๐‘ฆ) โˆ’ ใ€ˆ๐‘, ๐‘ฆใ€‰๐‘Œ ] + ๐น โˆ—(๐‘ฆ),

where the dual step will be performed with respect to ๐น โˆ—.

To make the exact way the PDPS method is applied in each instance clearer, and to highlightthe primal-dual nature of the PDPS method, it will be more convenient to write the problemto which the PDPS method is applied in saddle-point form. Specically, minding (5.4)together with the discussion following Theorem 5.10, the problem min๐‘ฅ ๐น (๐‘ฅ) +๐บ (๐พ๐‘ฅ) canbe written as the saddle-point problem

min๐‘ฅโˆˆ๐‘‹

max๐‘ฆโˆˆ๐‘Œ

๐น (๐‘ฅ) + ใ€ˆ๐พ๐‘ฅ, ๐‘ฆใ€‰๐‘Œ โˆ’๐บโˆ—(๐‘ฆ).

This formulation also shows the dual variable directly in the problem formulation. Appliedto (8.35), we then obtain the problem

(8.36) min๐‘ฆโˆˆ๐‘Œ

max๐‘ฅโˆˆ๐‘‹

[๐บโˆ—(๐ตโˆ—๐‘ฆ) โˆ’ ใ€ˆ๐‘, ๐‘ฆใ€‰๐‘Œ ] + ใ€ˆ๐‘ฅ, ๐‘ฆใ€‰๐‘Œ โˆ’ ๐น (๐‘ฅ).

Our claim is that the PDPS method applied to this saddle-point formulation is equivalentto the ADMM in case of ๐ด = Id. The iterates of the two algorithms will be dierent, as thevariables solved for will be dierent aside from the shared ๐‘ฅ . However, all the variableswill be related by ane transformations.

We will also demonstrate that the preconditioned ADMM is equivalent to the PDPS methodwhen ๐ต = Id. In fact, we will demonstrate a chain of relationships from ADMM or precon-ditioned ADMM (primal problem) via the PDPS (saddle-point problem) method to the DRSmethod (dual problem); the equivalence between the ADMM and the DRS method evenholds generally.

To demonstrate the idea, we start with ๐ด = ๐ต = Id. Then (8.30) reads

(8.37)

๐‘ฅ๐‘˜+1 โ‰” prox๐œโˆ’1๐น (๐‘ โˆ’ ๐‘ง๐‘˜ โˆ’ ๐œโˆ’1_๐‘˜),๐‘ง๐‘˜+1 โ‰” prox๐œโˆ’1๐บ (๐‘ โˆ’ ๐‘ฅ๐‘˜+1 โˆ’ ๐œโˆ’1_๐‘˜),_๐‘˜+1 โ‰” _๐‘˜ + ๐œ (๐‘ฅ๐‘˜+1 + ๐‘ง๐‘˜+1 โˆ’ ๐‘).

Using the third step for the previous iteration to obtain an expression for ๐‘ง๐‘˜ , we can rewritethe rst step as

๐‘ฅ๐‘˜+1 โ‰” prox๐œโˆ’1๐น (๐‘ฅ๐‘˜ โˆ’ ๐œโˆ’1(2_๐‘˜ โˆ’ _๐‘˜โˆ’1)) .

112

8 proximal point and splitting methods

If we use Lemma 6.21 (ii), the second step reads

๐‘ง๐‘˜+1 โ‰” (๐‘ โˆ’ ๐‘ฅ๐‘˜+1 โˆ’ ๐œโˆ’1_๐‘˜) โˆ’ ๐œโˆ’1prox๐œ๐บโˆ— (๐œ (๐‘ โˆ’ ๐‘ฅ๐‘˜+1) โˆ’ _๐‘˜).

Minding the third step of (8.37), this yields _๐‘˜+1 = โˆ’prox๐œ๐บโˆ— (๐œ (๐‘ โˆ’ ๐‘ฅ๐‘˜+1) โˆ’ _๐‘˜). Replacing_๐‘˜+1 by ๐‘ฆ๐‘˜+1 โ‰” โˆ’_๐‘˜+1, moving ๐‘ into the proximal part, and reordering the steps such that๐‘ฅ๐‘˜+1 becomes ๐‘ฅ๐‘˜ , transforms (8.37) into

(8.38){๐‘ฆ๐‘˜+1 โ‰” prox๐œ (๐บโˆ—โˆ’ใ€ˆ๐‘, ยท ใ€‰) (๐‘ฆ๐‘˜ โˆ’ ๐œ๐‘ฅ๐‘˜),๐‘ฅ๐‘˜+1 โ‰” prox๐œโˆ’1๐น (๐‘ฅ๐‘˜ + ๐œโˆ’1(2๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ๐‘˜)) .

This is the PDPS method applied to (8.36) with ๐ต = Id. However, the step lengths ๐œ and๐œŽ = ๐œโˆ’1 do not satisfy ๐œ๐œŽ โ€–๐พ โ€–2 < 1, which would be needed to deduce convergence of theADMM from that of the PDPS method. But we will see in Chapter 11 that these step lengthsat least lead to convergence of a certain โ€œLagrangian duality gapโ€, and for the ADMM wecan in general only prove such gap estimates.

To show the relation of ADMM to implicit splitting, we further use Lemma 6.21 (ii) in thesecond step of (8.38) to obtain

๐‘ฅ๐‘˜+1 = ๐œโˆ’1(2๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ๐‘˜) + ๐‘ฅ๐‘˜ โˆ’ ๐œโˆ’1prox๐œ๐น โˆ— (2๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ๐‘˜ + ๐œ๐‘ฅ๐‘˜).

Introducing๐‘ค๐‘˜+1 โ‰” ๐‘ฆ๐‘˜+1 โˆ’ ๐œ๐‘ฅ๐‘˜+1 and changing variables, we thus transform (8.38) into

(8.39){๐‘ฆ๐‘˜+1 โ‰” prox๐œ (๐บโˆ—โˆ’ใ€ˆ๐‘, ยท ใ€‰) (๐‘ค๐‘˜),๐‘ค๐‘˜+1 โ‰” ๐‘ค๐‘˜ โˆ’ ๐‘ฆ๐‘˜+1 + prox๐œ๐น โˆ— (2๐‘ฆ๐‘˜+1 โˆ’๐‘ค๐‘˜).

But this is the DRS method (8.8) applied to

min๐‘ฅโˆˆ๐‘‹

๐น โˆ—(๐‘ฅ) + [๐บโˆ—(๐‘ฅ) โˆ’ ใ€ˆ๐‘, ๐‘ฅใ€‰๐‘‹ ] .

Recall now from Lemma 5.4 that [๐บ (๐‘ โˆ’ ยท )]โˆ— = ๐บโˆ—(โˆ’ ยท ) + ใ€ˆ๐‘, ยท ใ€‰๐‘Œ . Theorem 5.10 thus showsthat this is the dual problem of (8.34).

We canmake the correspondence more general with the help of the following generalizationof Moreauโ€™s identity (Theorem 6.20).

Lemma 8.7. Let ๐‘† = ๐บ โ—ฆ ๐พ for convex, proper, and lower semicontinuous ๐บ : ๐‘Œ โ†’ โ„ and๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ). Then for all ๐‘ฅ โˆˆ ๐‘‹ and ๐›พ > 0,

๐‘ฅ = prox๐›พ๐‘† (๐‘ฅ) + ๐›พ๐พโˆ—(๐พ๐พโˆ— + ๐›พโˆ’1๐œ•๐บโˆ—)โˆ’1(๐›พโˆ’1๐พ๐‘ฅ).

In particular,prox๐‘†โˆ— (๐‘ฅ) = ๐พโˆ—(๐พ๐พโˆ— + ๐œ•๐บโˆ—)โˆ’1(๐พ๐‘ฅ).

113

8 proximal point and splitting methods

Proof. By Theorem 5.10,๐‘ค = prox๐›พ๐‘† (๐‘ฅ) if and only if for some ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ— holds{โˆ’๐พโˆ—๐‘ฆโˆ— โˆˆ ๐‘ค โˆ’ ๐‘ฅ,๐‘ฆโˆ— โˆˆ ๐›พ๐œ•๐บ (๐พ๐‘ค) .

In other words, by Lemma 5.8, {โˆ’๐พโˆ—๐‘ฆโˆ— = ๐‘ค โˆ’ ๐‘ฅ,๐พ๐‘ค โˆˆ ๐œ•๐บโˆ—(๐›พโˆ’1๐‘ฆโˆ—).

Applying ๐พ to the rst relation, inserting the second, and multiplying by ๐›พโˆ’1 yields

๐พ๐พโˆ—๐›พโˆ’1๐‘ฆโˆ— + ๐›พโˆ’1๐œ•๐บโˆ—(๐›พโˆ’1๐‘ฅโˆ—) = ๐›พโˆ’1๐พ๐‘ฆ,

i.e., ๐›พโˆ’1๐‘ฅโˆ— โˆˆ (๐พ๐พโˆ— +๐›พโˆ’1๐œ•๐บโˆ—)โˆ’1(๐›พโˆ’1๐พ๐‘ฅ). Combined with โˆ’๐พโˆ—๐‘ฆโˆ— = ๐‘ค โˆ’ ๐‘ฅ , this yields the rstclaim. The second claim then follows from Theorem 7.11 together with the rst claim for๐›พ = 1. ๏ฟฝ

Theorem 8.8. Let ๐น : ๐‘‹ โ†’ โ„ and ๐บ : ๐‘ โ†’ โ„ be convex, proper, and lower semicontinuous.Also let ๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ), and ๐ต โˆˆ ๐•ƒ(๐‘ ;๐‘Œ ), and ๐‘ โˆˆ ๐‘Œ . Assume the existence of a point (๐‘ฅ0, ๐‘ง0) โˆˆdom ๐น ร— dom๐บ with ๐ด๐‘ฅ0 + ๐ต๐‘ง0 = ๐‘ . Then subject to ane transformations to obtain iteratesnot explicitly generated in each case, the following are equivalent:

(i) The ADMM applied to the (primal) problem

(8.40) min๐‘ฅโˆˆ๐‘‹,๐‘งโˆˆ๐‘

๐น (๐‘ฅ) +๐บ (๐‘ง) s.t. ๐ด๐‘ฅ + ๐ต๐‘ง = ๐‘.

(ii) The DRS method applied to the (dual) problem

(8.41) min๐‘ฆโˆˆ๐‘Œ

๐น โˆ—(๐ดโˆ—๐‘ฆ) + [๐บโˆ—(๐ตโˆ—๐‘ฆ) โˆ’ ใ€ˆ๐‘, ๐‘ฆใ€‰๐‘Œ ] .

(iii) If ๐ด = Id, ๐‘‹ = ๐‘Œ , and ๐œŽ = ๐œโˆ’1, the PDPS method applied to the (saddle-point) problem

(8.42) min๐‘ฆโˆˆ๐‘Œ

max๐‘ฅโˆˆ๐‘‹

[๐บโˆ—(๐ตโˆ—๐‘ฆ) โˆ’ ใ€ˆ๐‘, ๐‘ฆใ€‰๐‘Œ ] + ใ€ˆ๐‘ฅ, ๐‘ฆใ€‰๐‘Œ โˆ’ ๐น (๐‘ฅ) .

Proof. The assumption on the existence of (๐‘ฅ0, ๐‘ง0) ensures that the inmum in (8.40) isnite. Multiplying the rst and second updates of (8.30) by๐ด and ๐ต, and changing variables๐‘ฅ๐‘˜+1 and ๐‘ง๐‘˜+1 to ๐‘ฅ๐‘˜+1 โ‰” ๐ด๐‘ฅ๐‘˜+1 and ๐‘ง๐‘˜+1 โ‰” ๐ต๐‘ง๐‘˜+1, we obtain

๐‘ฅ๐‘˜+1 โˆˆ ๐ด(๐ดโˆ—๐ด + ๐œโˆ’1๐œ•๐น )โˆ’1(๐ดโˆ—(๐‘ โˆ’ ๐‘ง๐‘˜ โˆ’ ๐œโˆ’1_๐‘˜)),๐‘ง๐‘˜+1 โˆˆ ๐ต(๐ตโˆ—๐ต + ๐œโˆ’1๐œ•๐บ)โˆ’1(๐ตโˆ—(๐‘ โˆ’ ๐‘ฅ๐‘˜+1 โˆ’ ๐œโˆ’1_๐‘˜)),_๐‘˜+1 โ‰” _๐‘˜ + ๐œ (๐ด๐‘ฅ๐‘˜+1 + ๐ต๐‘ง๐‘˜+1 โˆ’ ๐‘).

114

8 proximal point and splitting methods

Using Lemma 8.7 in place of Lemma 6.21 (ii), with ๐‘ฆ๐‘˜ โ‰” โˆ’_๐‘˜ and โˆ’๐‘ฆ๐‘˜+1 = โˆ’๐‘ฆ๐‘˜ + ๐œ (๐‘ฅ๐‘˜+1 +๐‘ง๐‘˜+1 โˆ’ ๐‘), we transform this as above to

(8.43){๐‘ฆ๐‘˜+1 โˆˆ prox๐œ๐บโˆ—โ—ฆ๐ตโˆ— (๐œ (๐‘ โˆ’ ๐‘ฅ๐‘˜) + ๐‘ฆ๐‘˜),๐‘ฅ๐‘˜+1 โˆˆ ๐ด(๐ดโˆ—๐ด + ๐œโˆ’1๐œ•๐น )โˆ’1(๐ดโˆ—(2๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ๐‘˜ + ๐‘ฅ๐‘˜)) .

If ๐ด = Id, this is the PDPS method with the iterate equivalence ๐‘ฅ๐‘˜+1 = ๐‘ฅ๐‘˜+1. We continuewith Lemma 8.7 and๐‘ค๐‘˜+1 โ‰” ๐‘ฆ๐‘˜+1 โˆ’ ๐œ๐‘ฅ๐‘˜+1 to transform (8.43) further into{

๐‘ฆ๐‘˜+1 โ‰” prox๐œ (๐บโˆ—โ—ฆ๐ตโˆ—โˆ’ใ€ˆ๐‘, ยท ใ€‰) (๐‘ค๐‘˜),๐‘ค๐‘˜+1 โ‰” ๐‘ค๐‘˜ โˆ’ ๐‘ฆ๐‘˜+1 + prox๐œ๐น โˆ—โ—ฆ๐ดโˆ— (2๐‘ฆ๐‘˜+1 โˆ’๐‘ค๐‘˜).

This is the DRS method.

We now want to apply Theorem 5.10 to ๐น (๐‘ฅ, ๐‘ง) โ‰” ๐น (๐‘ฅ) + ๐บ (๐‘ง), ๏ฟฝ๏ฟฝ (๐‘ฆ) โ‰” ๐›ฟ{๐‘ฆ=๐‘} (๐‘ฆ), and๏ฟฝ๏ฟฝ โ‰” (๐ด๐ต) to establish the claimed duality relationship between (8.40) and (8.41). However,dom๐บ = {๐‘} has empty interior, so condition (ii) of the theorem does not hold. RecallingRemarks 4.16 and 5.12, we can however replace the interior with the relative interiorri dom ๏ฟฝ๏ฟฝ = {๐‘}. Thus the condition reduces to the existence of ๐‘ฆ0 โˆˆ dom ๐น with ๐พ๐‘ฆ0 = ๐‘ ,which is satised by ๐‘ฆ0 = (๐‘ฅ0, ๐‘ง0).Finally, the relationship to (8.42) when ๐ด = Id is immediate from (8.41) and the denitionof the conjugate function ๐น โˆ—. The existence of a saddle point follows from the proof ofTheorem 5.10. ๏ฟฝ

The methods in the proof of Theorem 8.8 are rarely computationally feasible or ecientunless ๐ด = ๐ต = Id, due to the dicult proximal mappings for compositions of functionalswith operators or the set-valued operator inversions required. On the other hand, the PDPSmethod (8.33) only requires that we can compute the proximal mappings of ๐บ and ๐น . Thisdemonstrates the importance of problem formulation.

Similar connections hold for the preconditioned ADMM (8.32). With the help of the thirdstep of (8.32), the rst step can be rewritten

๐‘ฅ๐‘˜+1 โ‰” prox๐œŽ๐น (๐‘ฅ๐‘˜ โˆ’ ๐œŽ๐ดโˆ—(2_๐‘˜ โˆ’ _๐‘˜โˆ’1)) .

If \๐œ = 1 and ๐ต = Id, the second step reads

๐‘ง๐‘˜+1 โ‰” prox๐œโˆ’1๐บ ((๐‘ โˆ’๐ด๐‘ฅ๐‘˜+1) โˆ’ ๐œโˆ’1_๐‘˜).

We transform this with Lemma 6.21 (ii) into

๐‘ง๐‘˜+1 = (๐‘ โˆ’๐ด๐‘ฅ๐‘˜+1) โˆ’ ๐œโˆ’1_๐‘˜ โˆ’ ๐œโˆ’1prox๐œ๐บโˆ— (๐œ (๐‘ โˆ’๐ด๐‘ฅ๐‘˜+1) โˆ’ _๐‘˜).

115

8 proximal point and splitting methods

Using the third step of (8.32), this is equivalent to

โˆ’_๐‘˜+1 = prox๐œ๐บโˆ— (๐œ (๐‘ โˆ’๐ด๐‘ฅ๐‘˜+1) โˆ’ _๐‘˜).

Introducing ๐‘ฆ๐‘˜+1 โ‰” โˆ’_๐‘˜+1 and changing the order of the rst and second step, we thereforetransform (8.32) into the PDPS method

(8.44){๐‘ฆ๐‘˜+1 โ‰” prox๐œ๐บโˆ— (๐‘ฆ๐‘˜ โˆ’ ๐œ๐ด๐‘ฅ๐‘˜),๐‘ฅ๐‘˜+1 โ‰” prox๐œŽ๐น (๐‘ฅ๐‘˜ + ๐œŽ๐ดโˆ—(2๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ๐‘˜)) .

We therefore have obtained the following result.

Theorem 8.9. Let ๐น : ๐‘‹ โ†’ โ„ and ๐บ : ๐‘Œ โ†’ โ„ be convex, proper, and lower semicontinuous.Also let ๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) and ๐‘ โˆˆ ๐‘Œ . Assume the existence of a point (๐‘ฅ0, ๐‘ง0) โˆˆ dom ๐น ร— dom๐บwith ๐ด๐‘ฅ0 + ๐ต๐‘ง0 = ๐‘ . Take \ = ๐œโˆ’1. Then subject to ane transformations to obtain iterates notexplicitly generated in each case, the following are equivalent:

(i) The preconditioned ADMM (8.32) applied to the (primal) problem

min๐‘ฅโˆˆ๐‘‹,๐‘งโˆˆ๐‘Œ

๐น (๐‘ฅ) +๐บ (๐‘ง) s.t. ๐ด๐‘ฅ + ๐‘ง = ๐‘.

(ii) The PDPS method applied to the (saddle point) problem

min๐‘ฆโˆˆ๐‘Œ

max๐‘ฅโˆˆ๐‘‹

[๐บโˆ—(๐‘ฆ) โˆ’ ใ€ˆ๐‘, ๐‘ฆใ€‰๐‘Œ ] + ใ€ˆ๐ด๐‘ฅ, ๐‘ฆใ€‰๐‘Œ โˆ’ ๐น (๐‘ฅ).

(iii) If ๐ด = Id, ๐‘‹ = ๐‘Œ , and ๐œŽ = ๐œโˆ’1, the Douglasโ€“Rachford splitting applied to the (dual)problem

min๐‘ฆโˆˆ๐‘‹

๐น โˆ—(๐‘ฆ) + [๐บโˆ—(๐‘ฆ) โˆ’ ใ€ˆ๐‘, ๐‘ฆใ€‰๐‘Œ ] .

Proof. We have already proved the equivalence of the preconditioned ADMM and thePDPS method. For equivalence to the Douglasโ€“Rachford splitting, we observe that underthe additional assumptions of this theorem, (8.44) reduces to (8.38). ๏ฟฝ

116

9 WEAK CONVERGENCE

Now that we have in the previous chapter derived several iterative procedures through themanipulation of xed-point equations, we have to show that they indeed converge to axed point (which by construction is then the solution of an optimization problem, makingthese procedures optimization algorithms). We start with weak convergence, as this is themost that can generally be expected.

The classical approach to proving weak convergence is by introducing suitable contractive(or at least rmly nonexpansive) operators related to the algorithm and then applyingclassical xed-point theorems (see Remark 9.5 below). We will instead introduce a verydirect approach that will then extend in the following chapters to be also capable of provingconvergence rates. The three main ingredients of all convergence proofs will be

(i) The three-point identity (1.5), which we recall here as

(9.1) ใ€ˆ๐‘ฅ โˆ’ ๐‘ฆ, ๐‘ฅ โˆ’ ๐‘งใ€‰๐‘‹ =12 โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹ โˆ’ 1

2 โ€–๐‘ฆ โˆ’ ๐‘งโ€–2๐‘‹ + 12 โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹ for all ๐‘ฅ, ๐‘ฆ, ๐‘ง โˆˆ ๐‘‹ .

(ii) The monotonicity of the operator ๐ป whose roots we seek to nd (which in thesimplest case equals ๐œ•๐น for the functional ๐น we want to minimize).

(iii) The nonnegativity of the preconditioning operators๐‘€ dening the implicit forms ofthe algorithms we presented in Chapter 8.

In the later chapters, stronger versions of the last two ingredients will be required to obtainconvergence rates and the convergence of function value dierences ๐น (๐‘ฅ๐‘˜+1) โˆ’ ๐น (๐‘ฅ) or ofmore general gap functionals.

9.1 opialโ€™s lemma and fejรฉr monotonicity

The next lemma forms the basis of all our weak convergence proofs. It is a generalizedsubsequence argument, showing that if all weak limit points of a sequence lie in a set andif the sequence does not diverge (in the strong sense) away from this set, the full sequenceconverges weakly. We recall that ๐‘ฅ โˆˆ ๐‘‹ is a weak(-โˆ—) limit point of the sequence {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„•,if there exists a subsequence such that ๐‘ฅ๐‘˜โ„“ โ‡€ ๐‘‹ weakly(-โˆ—) in ๐‘‹ .

117

9 weak convergence

Lemma 9.1 (Opial). On a Hilbert space ๐‘‹ , let ๐‘‹ โŠ‚ ๐‘‹ be nonempty, and {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• โŠ‚ ๐‘‹ satisfythe following conditions:

(i) โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹ for all ๐‘ฅ โˆˆ ๐‘‹ and ๐‘˜ โˆˆ โ„•;

(ii) all weak limit points of {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• belong to ๐‘‹ ;

Then ๐‘ฅ๐‘˜ โ‡€ ๐‘ฅ weakly in ๐‘‹ for some ๐‘ฅ โˆˆ ๐‘‹ .

Proof. Condition (i) implies that the sequence {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• is bounded and hence by Theorem 1.9contains a weakly convergent subsequence. Let now ๐‘ฅ and๐‘ฅ be weak limit points. Condition(i) then implies that both {โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹ }๐‘˜โˆˆโ„• and {โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹ }๐‘˜โˆˆโ„• are decreasing and boundedand therefore convergent. This implies that

ใ€ˆ๐‘ฅ๐‘˜ , ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ =12

(โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ + โ€–๐‘ฅ โ€–2๐‘‹ โˆ’ โ€–๐‘ฅ โ€–2๐‘‹

)โ†’ ๐‘ โˆˆ โ„.

Since ๐‘ฅ is a weak accumulation point, there exists a subsequence {๐‘ฅ๐‘˜๐‘› }๐‘›โˆˆโ„• with ๐‘ฅ๐‘˜๐‘› โ‡€ ๐‘ฅ ;similarly, there exists a subsequence {๐‘ฅ๐‘˜๐‘š }๐‘šโˆˆโ„• with ๐‘ฅ๐‘˜๐‘š โ‡€ ๐‘ฅ . Hence,

ใ€ˆ๐‘ฅ, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ = lim๐‘›โ†’โˆžใ€ˆ๐‘ฅ

๐‘˜๐‘› , ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ = ๐‘ = lim๐‘šโ†’โˆžใ€ˆ๐‘ฅ

๐‘˜๐‘š , ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ = ใ€ˆ๐‘ฅ, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ ,

and therefore0 = ใ€ˆ๐‘ฅ โˆ’ ๐‘ฅ, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ = โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ ,

i.e., ๐‘ฅ = ๐‘ฅ . Every convergent subsequence thus has the same weak limit (which lies in ๐‘‹by condition (ii)). Since every subsequence contains a convergent subsequence, taking asubsequence of {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• assumed not to converge (to ๐‘ฅ ), we obtain a contradiction, thereforededucing that ๐‘ฅ is the weak limit of the full sequence {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„•. ๏ฟฝ

A sequence satisfying the condition (i) is called Fejรฉr monotone (with respect to ๐‘‹ ); this is acrucial property of iterates generated by any xed-point algorithm.

Remark 9.2. Lemma 9.1 rst appeared in the proof of [Opial 1967, Theorem 1]. (There ๐‘‹ is assumedto be closed and convex, but we do not require this since Condition (ii) is already sucient to showthe claim.)

The concept of Fejรฉr monotone sequences rst appears in [Fejรฉr 1922], where it was observed thatfor every point outside the convex hull of a subset of the Euclidean plane, it is always possible toconstruct a point that is closer to each point in the subset than the original point (and that thisproperty in fact characterizes the convex hull). The term Fejรฉr monotone itself appears in [Motzkin& Schoenberg 1954], where this construction is used to show convergence of an iterative schemefor the projection onto a convex polytope.

118

9 weak convergence

9.2 the fundamental methods: proximal point and explicit splitting

Using Opialโ€™s Lemma 9.1, we can fairly directly show weak convergence of the proximalpoint and forward-backward splitting methods.

proximal point method

We recall our most fundamental nonsmooth optimization algorithm, the proximal pointmethod. For later use, we treat the general version of (8.1) for an arbitrary set-valuedoperator ๐ป : ๐‘‹ โ‡’ ๐‘‹ , i.e.,

(9.2) ๐‘ฅ๐‘˜+1 = R๐œ๐‘˜๐ป (๐‘ฅ๐‘˜).

We will need the next lemma to allow a very general choice of the step lengths {๐œ๐‘˜}๐‘˜โˆˆโ„•. (Ifwe assume ๐œ๐‘˜ โ‰ฅ Y > 0, in particular if we keep ๐œ๐‘˜ โ‰ก ๐œ constant, it will not be needed.) For thestatement, note that by the denition of the resolvent, (9.2) is equivalent to ๐œโˆ’1

๐‘˜(๐‘ฅ๐‘˜ โˆ’๐‘ฅ๐‘˜+1) โˆˆ

๐ป (๐‘ฅ๐‘˜+1).

Lemma 9.3. Let {๐œ๐‘˜}๐‘˜โˆˆโ„• โŠ‚ (0,โˆž) with โˆ‘โˆž๐‘˜=0 ๐œ

2๐‘˜= โˆž, and let ๐ป : ๐‘‹ โ‡’ ๐‘‹ be monotone.

Suppose {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• and๐‘ค๐‘˜+1 โ‰” โˆ’๐œโˆ’1๐‘˜(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜) satises

(i) 0 โ‰  ๐‘ค๐‘˜+1 โˆˆ ๐ป (๐‘ฅ๐‘˜+1) and

(ii)โˆžโˆ‘๐‘˜=0

๐œ2๐‘˜ โ€–๐‘ค๐‘˜ โ€–2๐‘‹ < โˆž.

Then โ€–๐‘ค๐‘˜ โ€–๐‘‹ โ†’ 0.

Proof. Since๐‘ค๐‘˜ โˆˆ ๐ป (๐‘ฅ๐‘˜) and ๐ป is monotone, we have from the denition of๐‘ค๐‘˜ that

0 โ‰ค ใ€ˆ๐‘ค๐‘˜+1 โˆ’๐‘ค๐‘˜ , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ใ€‰๐‘‹ = ๐œโˆ’1๐‘˜ ใ€ˆ๐‘ค๐‘˜ โˆ’๐‘ค๐‘˜+1,๐‘ค๐‘˜+1ใ€‰๐‘‹ โ‰ค ๐œโˆ’1๐‘˜ โ€–๐‘ค๐‘˜+1โ€–๐‘‹ (โ€–๐‘ค๐‘˜ โ€–๐‘‹ โˆ’ โ€–๐‘ค๐‘˜+1โ€–๐‘‹ ).

Thus the nonnegative sequence {โ€–๐‘ค๐‘˜ โ€–๐‘‹ }๐‘˜โˆˆโ„• is decreasing and hence converges to some๐‘€ โ‰ฅ 0. Since โˆ‘โˆž

๐‘˜=0 ๐œ2๐‘˜= โˆž, the second assumption implies that lim inf๐‘˜โ†’โˆž โ€–๐‘ค๐‘˜ โ€–๐‘‹ = 0.

Since the full sequence converges,๐‘€ = 0, i.e., โ€–๐‘ค๐‘˜ โ€–๐‘‹ โ†’ 0 as claimed. ๏ฟฝ

This shows that the โ€œgeneralized residualโ€ ๐‘ค๐‘˜ in the inclusion ๐‘ค๐‘˜ โˆˆ ๐ป (๐‘ฅ๐‘˜) converges(strongly) to zero. As usual, this does not (yet) imply that {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• itself converges; but if itdoes, we expect the limit to be a root of ๐ป . This is what we prove next, using the threefundamental ingredients we introduced in the beginning of the chapter.

119

9 weak convergence

Theorem 9.4. Let ๐ป : ๐‘‹ โ‡’ ๐‘‹ be monotone and weak-to-strong outer semicontinuous with๐ปโˆ’1(0) โ‰  0. Furthermore, let {๐œ๐‘˜}๐‘˜โˆˆโ„• โŠ‚ (0,โˆž) with โˆ‘โˆž

๐‘˜=0 ๐œ2๐‘˜= โˆž. If {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• โŠ‚ ๐‘‹ is given

by the iteration (9.2) for any initial iterate ๐‘ฅ0 โˆˆ ๐‘‹ , then ๐‘ฅ๐‘˜ โ‡€ ๐‘ฅ for some root ๐‘ฅ โˆˆ ๐ปโˆ’1(0).

Proof. We recall that the proximal point iteration can be written in implicit form as

(9.3) 0 โˆˆ ๐œ๐‘˜๐ป (๐‘ฅ๐‘˜+1) + (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜).We โ€œtestโ€ (9.3) by the application of ใ€ˆ ยท , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ for an arbitrary ๐‘ฅ โˆˆ ๐ปโˆ’1(0). Thus weobtain

(9.4) 0 โˆˆ ใ€ˆ๐œ๐‘˜๐ป (๐‘ฅ๐‘˜+1) + (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ ,where the right-hand side should be understood as the set of all possible inner productsinvolving elements of ๐ป (๐‘ฅ๐‘˜+1). By the monotonicity of ๐ป , since 0 โˆˆ ๐ป (๐‘ฅ), we have

ใ€ˆ๐ป (๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ 0,

which again should be understood to hold for any๐‘ค โˆˆ ๐ป (๐‘ฅ๐‘˜+1). (We will frequently makeuse of this notation and the one from (9.4) throughout this and the following chapters tokeep the presentation concise.) Thus (9.4) yields

ใ€ˆ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค 0.

Applying now the three-point identity (9.1) for ๐‘ฅ = ๐‘ฅ๐‘˜+1, ๐‘ฆ = ๐‘ฅ๐‘˜ , and ๐‘ง = ๐‘ฅ , yields

(9.5) 12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + 12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ค 12 โ€–๐‘ฅ

๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ .

This shows the Fejรฉr monotonicity of {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• with respect to ๐‘‹ = ๐ปโˆ’1(0).Furthermore, summing (9.5) over ๐‘˜ = 0, . . . , ๐‘ โˆ’ 1 gives

(9.6) 12 โ€–๐‘ฅ

๐‘ โˆ’ ๐‘ฅ โ€–2๐‘‹ +๐‘โˆ’1โˆ‘๐‘˜=0

12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ค 12 โ€–๐‘ฅ

0 โˆ’ ๐‘ฅ โ€–2๐‘‹ =: ๐ถ0.

Writing ๐‘ค๐‘˜+1 โ‰” โˆ’๐œโˆ’1๐‘˜(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜), the implicit iteration (9.3) shows that ๐‘ค๐‘˜+1 โˆˆ ๐ป (๐‘ฅ๐‘˜+1).

From (9.6) we also deduce that๐‘โˆ’1โˆ‘๐‘˜=0

๐œ2๐‘˜ โ€–๐‘ค๐‘˜+1โ€–2๐‘‹ โ‰ค 2๐ถ0.

If ๐œ๐‘˜ โ‰ฅ Y > 0, letting ๐‘ โ†’ โˆž shows โ€–๐‘ค๐‘˜+1โ€–๐‘‹ โ†’ 0. Otherwise, we can use Lemma 9.3 toestablish the same.

Let nally ๐‘ฅ be any weak limit point of {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„•, that is ๐‘ฅ๐‘˜๐‘– โ‡€ ๐‘ฅ for a subsequence{๐‘˜๐‘–}๐‘–โˆˆโ„• โŠ‚ โ„•. Recall that ๐‘ค๐‘˜๐‘– โˆˆ ๐ป (๐‘ฅ๐‘˜๐‘– ). The weak-to-strong outer semicontinuity of ๐ปnow immediately yields 0 โˆˆ ๐ป (๐‘ฅ). We then nish by applying Opialโ€™s Lemma 9.1 for theset ๐‘‹ = ๐ปโˆ’1(0). ๏ฟฝ

120

9 weak convergence

Note that the conditions of Theorem 9.4 are in particularly satised if๐ป is either maximallymonotone (Lemma 6.8) or monotone and BCP outer semicontinuous (Lemma 6.10). Inparticular, applying Theorem 9.4 to ๐ป = ๐œ•๐ฝ yields the convergence of the proximal pointmethod (8.1) for any proper, convex, and lower semicontinuous functional ๐ฝ : ๐‘‹ โ†’ โ„.

Remark 9.5. A conventional way of proving the convergence of the proximal point method is withBrowderโ€™s xed-point theorem [Browder 1965], which shows the existence of xed points of rmlynonexpansive or, more generally, ๐›ผ-averaged mappings. (We have already shown in Lemma 6.13the rm nonexpansivity of the proximal map.) On the other hand, to prove Browderโ€™s xed-pointtheorem itself, we can use similar arguments as Theorem 9.4, see Theorem 9.20 below.

explicit splitting

The convergence of the forward-backward splitting method

(9.7) ๐‘ฅ๐‘˜+1 = prox๐œ๐‘˜๐บ (๐‘ฅ๐‘˜ โˆ’ ๐œ๐‘˜โˆ‡๐น (๐‘ฅ๐‘˜))

can be shown analogously. To do so, we need to assume the Lipschitz continuity of thegradient of ๐น (since we are not using a proximal point mapping for ๐น which is alwaysrmly nonexpansive and hence Lipschitz continuous).

Theorem 9.6. Let ๐น : ๐‘‹ โ†’ โ„ and๐บ : ๐‘‹ โ†’ โ„ be proper, convex, and lower semicontinuous.Suppose (๐œ•(๐น + ๐บ))โˆ’1(0) โ‰  โˆ…, i.e., that ๐ฝ โ‰” ๐น + ๐บ has a minimizer. Furthermore, let ๐น beGรขteaux dierentiable with ๐ฟ-Lipschitz gradient. If 0 < ๐œmin โ‰ค ๐œ๐‘˜ < 2๐ฟโˆ’1, then for any initialiterate ๐‘ฅ0 โˆˆ ๐‘‹ the sequence generated by (9.7) converges weakly to a root ๐‘ฅ โˆˆ (๐œ•(๐น +๐บ))โˆ’1(0).

Proof. We again start by writing (9.7) in implicit form as

(9.8) 0 โˆˆ ๐œ๐‘˜ [๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜)] + (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜).

By the monotonicity of ๐œ•๐บ and the three-point monotonicity (7.9) of ๐น from Corollary 7.2,we rst deduce for any ๐‘ฅ โˆˆ ๐‘‹ โ‰” (๐œ•(๐น +๐บ))โˆ’1(0) that

ใ€ˆ๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ โˆ’๐ฟ4 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ .

Thus, again testing (9.8) with ใ€ˆ ยท , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰yields

ใ€ˆ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค ๐ฟ๐œ๐‘˜4 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ .

The three-point identity (9.1) now implies that

(9.9) 12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + 1 โˆ’ ๐œ๐‘˜๐ฟ/22 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ค 1

2 โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ .

121

9 weak convergence

The assumption 2 > ๐œ๐‘˜๐ฟ then establishes the Fejรฉr monotonicity of {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• with respectto ๐‘‹ . Let now ๐‘ฅ be a weak limit point of {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„•, i.e., ๐‘ฅ๐‘˜๐‘– โ‡€ ๐‘ฅ for a subsequence {๐‘˜๐‘–}๐‘–โˆˆโ„• โŠ‚โ„•. Since (9.9) implies ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ†’ 0 strongly, we have โˆ‡๐น (๐‘ฅ๐‘˜๐‘–+1) โˆ’ โˆ‡๐น (๐‘ฅ๐‘˜๐‘– ) โ†’ 0 bythe Lipschitz continuity of โˆ‡๐น . Consequently, using again the subdierential sum ruleTheorem 4.14, ๐œ•(๐บ + ๐น ) (๐‘ฅ๐‘˜๐‘–+1) 3 ๐‘ค๐‘˜๐‘–+1 + โˆ‡๐น (๐‘ฅ๐‘˜๐‘–+1) โˆ’ โˆ‡๐น (๐‘ฅ๐‘˜๐‘– ) โ†’ 0. By the weak-to-strongouter semicontinuity of ๐œ•(๐บ + ๐น ) from Lemma 6.8 and Theorem 6.11, it follows that 0 โˆˆ๐œ•(๐บ + ๐น ) (๐‘ฅ). We nish by applying Opialโ€™s Lemma 9.1 with ๐‘‹ = (๐œ•(๐น +๐บ))โˆ’1(0). ๏ฟฝ

9.3 preconditioned proximal point methods: drs and pdps

We now extend the analysis of the previous section to the preconditioned proximal pointmethod (8.11), which we recall can be written in implicit form as

(9.10) 0 โˆˆ ๐ป (๐‘ฅ๐‘˜+1) +๐‘€ (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜)

for some preconditioning operator๐‘€ โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) and includes the Douglasโ€“Rachford split-ting (DRS) and the primal-dual proximal splitting (PDPS) methods as special cases. To dealwith๐‘€ , we need to improve Theorem 9.6 slightly. First, we introduce the preconditionednorm โ€–๐‘ฅ โ€–๐‘€ โ‰”

โˆšใ€ˆ๐‘€๐‘ฅ, ๐‘ฅใ€‰, which satises the preconditioned three-point identity

(9.11) ใ€ˆ๐‘€ (๐‘ฅ โˆ’ ๐‘ฆ), ๐‘ฅ โˆ’ ๐‘งใ€‰ = 12 โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘€ โˆ’ 1

2 โ€–๐‘ฆ โˆ’ ๐‘งโ€–2๐‘€ + 12 โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘€ for all ๐‘ฅ, ๐‘ฆ, ๐‘ง โˆˆ ๐‘‹ .

The boundedness assumption in the statement of the next theorem holds in particular for๐‘€ = Id and ๐ป maximally monotone by Corollary 6.14.

Theorem 9.7. Suppose๐ป : ๐‘‹ โ‡’ ๐‘‹ is monotone and weak-to-strong outer semicontinuous with๐ปโˆ’1(0) โ‰  โˆ…, that๐‘€ โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) is self-adjoint and positive semi-denite, and that either๐‘€ hasa bounded inverse, or (๐ป +๐‘€)โˆ’1โ—ฆ๐‘€ 1/2 is bounded on bounded sets. Let the initial iterate ๐‘ฅ0 โˆˆ ๐‘‹be arbitrary, and assume that (9.10) has a unique solution ๐‘ฅ๐‘˜+1 for all ๐‘˜ โˆˆ โ„•. Then the iterates{๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• of (9.10) are bounded and satisfy 0 โˆˆ lim sup๐‘˜โ†’โˆž๐ป (๐‘ฅ๐‘˜) and๐‘€ 1/2(๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ) โ‡€ 0 forsome ๐‘ฅ โˆˆ ๐ปโˆ’1(0).

Proof. Let ๐‘ฅ โˆˆ ๐ปโˆ’1(0) be arbitrary. By the monotonicity of ๐ป , we then have as before

ใ€ˆ๐ป (๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ 0,

which together with (9.10) yields

(9.12) ใ€ˆ๐‘€ (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค 0.

122

9 weak convergence

Applying the preconditioned three-point identity (9.11) for ๐‘ฅ = ๐‘ฅ๐‘˜+1, ๐‘ฆ = ๐‘ฅ๐‘˜ , and ๐‘ง = ๐‘ฅ in(9.12) shows that

(9.13) 12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘€ + 12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘€ โ‰ค 12 โ€–๐‘ฅ

๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘€ ,

and summing (9.13) over ๐‘˜ = 0, . . . , ๐‘ โˆ’ 1 yields

(9.14) 12 โ€–๐‘ฅ

๐‘ โˆ’ ๐‘ฅ โ€–2๐‘€ +๐‘โˆ’1โˆ‘๐‘˜=0

12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘€ โ‰ค 12 โ€–๐‘ฅ

0 โˆ’ ๐‘ฅ โ€–2๐‘€ .

Let now ๐‘ง๐‘˜ โ‰” ๐‘€ 1/2๐‘ฅ๐‘˜ . Our objective is then to show ๐‘ง๐‘˜ โ‡€ ๐‘ง for some ๐‘ง โˆˆ ๐‘ โ‰” ๐‘€ 1/2๐ปโˆ’1(0),which we do by using Opialโ€™s Lemma 9.1. From (9.13), we obtain the necessary Fejรฉrmonotonicity of {๐‘ง๐‘˜}๐‘˜โˆˆโ„• with respect to the set ๐‘ . It remains to verify that ๐‘ contains allweak limit points of {๐‘ง๐‘˜}๐‘˜โˆˆโ„•.Let therefore ๐‘ง be such a limit point, i.e., ๐‘ง๐‘˜๐‘– โ‡€ ๐‘ง for a subsequence {๐‘˜๐‘–}๐‘–โˆˆโ„•. We want toshow that now see that ๐‘ง = ๐‘€ 1/2๐‘ฅ for a weak limit point ๐‘ฅ of {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„•. We proceed by rstshowing in two cases the boundedness of {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„•:

(i) If ๐‘€ has a bounded inverse, then ๐‘€ โ‰ฅ \๐ผ for some \ > 0, and thus the sequence{๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• is bounded by (9.14).

(ii) Otherwise, (๐ป + ๐‘€)โˆ’1 โ—ฆ ๐‘€ 1/2 is bounded on bounded sets. Now (9.14) only givesboundedness of {๐‘ง๐‘˜}๐‘˜โˆˆโ„•. However, ๐‘ฅ๐‘˜+1 โˆˆ (๐ป +๐‘€)โˆ’1(๐‘€๐‘ฅ๐‘˜) = (๐ป +๐‘€)โˆ’1(๐‘€ 1/2๐‘ง๐‘˜),and {๐‘ง๐‘˜}๐‘˜โˆˆโ„• is bounded by (9.14), so we obtain the boundedness of {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„•.

Thus there exists a further subsequence of {๐‘ฅ๐‘˜๐‘– }๐‘–โˆˆโ„•, weakly converging to some ๐‘ฅ โˆˆ ๐‘‹ .Since ๐‘ง๐‘˜ = ๐‘€ 1/2๐‘ฅ๐‘˜ , it follows that ๐‘ง = ๐‘€ 1/2๐‘ฅ . To show that ๐‘ง โˆˆ ๐‘ , if therefore suces toshow that the weak limit points of {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• belong to ๐ปโˆ’1(0).Let thus ๐‘ฅ be any weak limit point of {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„•, i.e., ๐‘ฅ๐‘˜๐‘– โ‡€ ๐‘ฅ for some subsequence {๐‘˜๐‘–}๐‘˜โˆˆโ„• โŠ‚โ„•. From (9.14), we obtain rst that๐‘€ 1/2(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜) โ†’ 0 and hence that๐‘ค๐‘˜+1 โ‰” โˆ’๐‘€ (๐‘ฅ๐‘˜+1 โˆ’๐‘ฅ๐‘˜) โ†’ 0. From (8.18), we also know that๐‘ค๐‘˜+1 โˆˆ ๐ป (๐‘ฅ๐‘˜+1). It follows that 0 = lim๐‘˜โ†’โˆž๐‘ค๐‘˜+1 โˆˆlim sup๐‘˜โ†’โˆž๐ป (๐‘ฅ๐‘˜+1). The weak-to-strong outer semicontinuity now immediately yields0 โˆˆ ๐ป (๐‘ฅ). Hence, ๐‘ contains all weak limit points of {๐‘ง๐‘˜}๐‘˜โˆˆโ„•.The claim now follows from Lemma 9.1. ๏ฟฝ

In the following, we verify that the DRS and PDPS methods satisfy the assumptions of thistheorem.

123

9 weak convergence

douglasโ€“rachford splitting

Recall that the DRS method (8.8), i.e.,

(9.15)

๐‘ฅ๐‘˜+1 = prox๐œ๐น (๐‘ง๐‘˜),๐‘ฆ๐‘˜+1 = prox๐œ๐บ (2๐‘ฅ๐‘˜+1 โˆ’ ๐‘ง๐‘˜),๐‘ง๐‘˜+1 = ๐‘ง๐‘˜ + ๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜+1,

can be written as the preconditioned proximal point method (8.11) in terms of๐‘ข = (๐‘ฅ, ๐‘ฆ, ๐‘ง) โˆˆ๐‘ˆ โ‰” ๐‘‹ 3 and the operators

(9.16) ๐ป (๐‘ฅ, ๐‘ฆ, ๐‘ง) โ‰” ยฉยญยซ๐œ๐ต(๐‘ฅ) + ๐‘ฆ โˆ’ ๐‘ง๐œ๐ด(๐‘ฆ) + ๐‘ง โˆ’ ๐‘ฅ

๐‘ฅ โˆ’ ๐‘ฆยชยฎยฌ and ๐‘€ โ‰”

ยฉยญยซ0 0 00 0 00 0 ๐ผ

ยชยฎยฌfor ๐ต = ๐œ•๐น and๐ด = ๐œ•๐บ . We are now interested in the properties of ๐ป in terms of those of๐ดand ๐ต. For this, we can make use of the generic structure of ๐ป , which will reappear severaltimes in the following.

Lemma 9.8. If ๐ด : ๐‘‹ โ‡’ ๐‘‹ is maximally monotone and ฮž โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) is skew-adjoint (i.e.,ฮžโˆ— = โˆ’ฮž), then ๐ป โ‰” ๐ด + ฮž is maximally monotone. In particular, any skew-adjoint operatorฮž is maximally monotone.

Proof. Let ๐‘ฅ, ๐‘งโˆ— โˆˆ ๐‘‹ be given such that

ใ€ˆ๐‘งโˆ— โˆ’ ๐‘งโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ 0 for all ๐‘ฅ โˆˆ ๐‘‹, ๐‘งโˆ— โˆˆ ๐ป (๐‘ฅ).

Recalling (6.2), we need to show that ๐‘งโˆ— โˆˆ ๐ป (๐‘ฅ). By the denition of ๐ป , for any ๐‘งโˆ— โˆˆ ๐ป (๐‘ฅ)there exists a ๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ) with ๐‘งโˆ— = ๐‘ฅโˆ— + ฮž๐‘ฅ . On the other hand, setting ๐‘ฅโˆ— โ‰” ๐‘งโˆ— โˆ’ ฮž๐‘ฅ ,we have ๐‘งโˆ— = ๐‘ฅโˆ— + ฮž๐‘ฅ . We are thus done if we can show that ๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ). But using theskew-adjointness of ๐ป and the symmetry of the inner product, we can write

(9.17) 0 โ‰ค ใ€ˆ๐‘งโˆ— โˆ’ ๐‘งโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹= ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ + ใ€ˆฮž(๐‘ฅ โˆ’ ๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹= ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ + 1

2 ใ€ˆฮž(๐‘ฅ โˆ’ ๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ 12 ใ€ˆ๐‘ฅ โˆ’ ๐‘ฅ,ฮž(๐‘ฅ โˆ’ ๐‘ฅ)ใ€‰๐‘‹

= ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ ,

and ๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ) follows from the maximal monotonicity of ๐ด.

To prove the nal claim about skew-adjoint operators being maximally monotone, wetake ๐ด = {0} = ๐œ•๐‘† for the constant functional ๐‘† โ‰ก 0, which is maximally monotone byTheorem 6.11. ๏ฟฝ

124

9 weak convergence

Corollary 9.9. Let ๐ด and ๐ต be maximally monotone. Then the operator ๐ป dened in (9.16) ismaximally monotone.

Proof. Let

๏ฟฝ๏ฟฝ(๐‘ข) โ‰” ยฉยญยซ๐œ๐ต(๐‘ฅ)๐œ๐ด(๐‘ฆ)

0

ยชยฎยฌ and ฮž โ‰”ยฉยญยซ0 Id โˆ’Idโˆ’Id 0 IdId โˆ’Id 0

ยชยฎยฌ .From the denition of the inner product on the product space ๐‘‹ 3 together with Lemma 6.5,we have that ๏ฟฝ๏ฟฝ is maximally monotone, while ฮž is clearly skew-adjoint. The claim nowfollows from Lemma 9.8. ๏ฟฝ

We can now show convergence of the DRS method.

Corollary 9.10. Let ๐ด, ๐ต : ๐‘‹ โ‡’ ๐‘‹ be maximally monotone, and suppose (๐ด + ๐ต)โˆ’1(0) โ‰  โˆ….Pick a step length ๐œ > 0 and an initial iterate ๐‘ง0 โˆˆ ๐‘‹ . Then the iterates {(๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜ , ๐‘ง๐‘˜)}๐‘˜โˆˆโ„• ofthe DRS method (9.15) converge weakly to (๐‘ฅ, ๐‘ฆ, ๏ฟฝ๏ฟฝ) โˆˆ ๐ปโˆ’1(0) satisfying ๐‘ฅ = ๐‘ฆ โˆˆ (๐ด +๐ต)โˆ’1(0).Moreover, ๐‘ฅ๐‘˜ โˆ’ ๐‘ฆ๐‘˜ โ†’ 0.

Proof. Since ๐ด and ๐ต are maximally monotone, Corollary 6.14 shows that the DRS iterationis always solvable for ๐‘ข๐‘˜+1. Regarding convergence, we start by proving that the sequence{๐‘ข๐‘˜ = (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜ , ๐‘ง๐‘˜)}๐‘˜โˆˆโ„• is bounded, ๐‘ง๐‘˜ โ‡€ ๏ฟฝ๏ฟฝ, and 0 โˆˆ lim sup๐‘˜โ†’โˆž๐ป (๐‘ข๐‘˜). Note that thelatter implies as claimed that ๐‘ฅ๐‘˜ โˆ’ ๐‘ฆ๐‘˜ โ†’ 0 strongly. We do this using Theorem 9.7 whoseconditions we have to verify. By Corollary 9.9, ๐ป is maximally monotone and hence weak-to-strong outer semicontinuous by Lemma 6.8. Since๐‘€ is noninvertible, we also have toverify that (๐ป +๐‘€)โˆ’1โ—ฆ๐‘€ 1/2 is bounded on bounded sets. But since๐‘ข๐‘˜+1 โˆˆ (๐ป +๐‘€)โˆ’1(๐‘€๐‘ข๐‘˜) =(๐ป +๐‘€)โˆ’1(๐‘€ 1/2๐‘ข๐‘˜) is an equivalent formulation of the iteration (9.15), this follows fromthe Lipschitz continuity of the resolvent (Corollary 6.14). Hence, we can apply Theorem 9.7to deduce ๐‘ง๐‘˜ โ‡€ ๏ฟฝ๏ฟฝ by the denition of ๐‘€ . Furthermore, there exist ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ such that0 โˆˆ ๐ป (๐‘ฅ, ๐‘ฆ, ๏ฟฝ๏ฟฝ). The third relation of this inclusion gives ๐‘ฅ = ๐‘ฆ , and adding the remaininginclusion now yields that 0 โˆˆ ๐ด(๐‘ฅ) + ๐ต(๐‘ฅ).It remains to show weak convergence of the other variables. Since {๐‘ข๐‘˜}๐‘˜โˆˆโ„• is bounded,it contains a subsequence converging weakly to some ๏ฟฝ๏ฟฝ = (๐‘ฅ, ๏ฟฝ๏ฟฝ, ๐‘ง) which satises 0 โˆˆ๐ป (๐‘ฅ, ๏ฟฝ๏ฟฝ, ๐‘ง) such that ๐‘ฅ = ๏ฟฝ๏ฟฝ . Since ๐‘ง๐‘˜ โ‡€ ๏ฟฝ๏ฟฝ, we have ๐‘ง = ๏ฟฝ๏ฟฝ. The rst relation of the inclusionthen can be rearranged to ๏ฟฝ๏ฟฝ = ๐‘ฅ = R๐œ๐ต (๐‘ง) = R๐œ๐ต (๏ฟฝ๏ฟฝ) by the single-valuedness of theresolvent (Corollary 6.14). The limit is thus independent of the subsequence, and hence asubsequenceโ€“subsequence argument shows that the full sequence converges. ๏ฟฝ

In particular, this convergence result applies to the special case of ๐ต = ๐œ•๐น and ๐ด = ๐œ•๐บ forproper, convex, lower semicontinuous ๐น,๐บ : ๐‘‹ โ†’ โ„. However, the xed point providedby the DRS method is related to a solution of the problem min๐‘ฅโˆˆ๐‘‹ ๐น (๐‘ฅ) +๐บ (๐‘ฅ) only if thesubdierential sum rule (Theorem 4.14) holds with equality.

125

9 weak convergence

primal-dual proximal splitting

To study the PDPS method, we recall from (8.14) and (8.19) the operators

(9.18) ๐ป (๐‘ข) โ‰”(๐œ•๐น (๐‘ฅ) + ๐พโˆ—๐‘ฆ๐œ•๐บโˆ—(๐‘ฆ) โˆ’ ๐พ๐‘ฅ

), and ๐‘€ โ‰”

(๐œโˆ’1Id โˆ’๐พโˆ—

โˆ’๐พ ๐œŽโˆ’1Id

)for ๐‘ข = (๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘‹ ร— ๐‘Œ =: ๐‘ˆ . With these we have already shown in Section 8.4 that thePDPS method

(9.19)

๐‘ฅ๐‘˜+1 = prox๐œ๐น (๐‘ฅ๐‘˜ โˆ’ ๐œ๐พโˆ—๐‘ฆ๐‘˜),๐‘ฅ๐‘˜+1 = 2๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ ,๐‘ฆ๐‘˜+1 = prox๐œŽ๐บโˆ— (๐‘ฆ๐‘˜ + ๐œŽ๐พ๐‘ฅ๐‘˜+1).

has the form (9.10) of the preconditioned proximal point method. To show convergence,we rst have to establish some basic properties of both ๐ป and๐‘€ .

Lemma 9.11. The operator ๐‘€ : ๐‘ˆ โ†’ ๐‘ˆ dened in (8.19) is bounded and self-adjoint. If๐œŽ๐œ โ€–๐พ โ€–2

๐•ƒ(๐‘‹ ;๐‘Œ ) < 1, then๐‘€ is positive denite.

Proof. The denition of๐‘€ directly implies boundedness (since ๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) is bounded)and self-adjointness. Let now ๐‘ข = (๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘ˆ be given. Then

(9.20)

ใ€ˆ๐‘€๐‘ข,๐‘ขใ€‰๐‘ˆ = ใ€ˆ๐œโˆ’1๐‘ฅ โˆ’ ๐พโˆ—๐‘ฆ, ๐‘ฅใ€‰๐‘‹ + ใ€ˆ๐œŽโˆ’1๐‘ฆ โˆ’ ๐พ๐‘ฅ, ๐‘ฆใ€‰๐‘Œ= ๐œโˆ’1โ€–๐‘ฅ โ€–2๐‘‹ โˆ’ 2ใ€ˆ๐‘ฅ, ๐พโˆ—๐‘ฆใ€‰๐‘‹ + ๐œŽโˆ’1โ€–๐‘ฆ โ€–2๐‘Œโ‰ฅ ๐œโˆ’1โ€–๐‘ฅ โ€–2๐‘‹ โˆ’ 2โ€–๐พ โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) โ€–๐‘ฅ โ€–๐‘‹ โ€–๐‘ฆ โ€–๐‘Œ + ๐œŽโˆ’1โ€–๐‘ฆ โ€–2๐‘Œโ‰ฅ ๐œโˆ’1โ€–๐‘ฅ โ€–2๐‘‹ โˆ’ โ€–๐พ โ€–๐•ƒ(๐‘‹ ;๐‘Œ )

โˆš๐œŽ๐œ (๐œโˆ’1โ€–๐‘ฅ โ€–2๐‘‹ + ๐œŽโˆ’1โ€–๐‘ฆ โ€–2๐‘Œ ) + ๐œŽโˆ’1โ€–๐‘ฆ โ€–2๐‘Œ

= (1 โˆ’ โ€–๐พ โ€–๐•ƒ(๐‘‹ ;๐‘Œ )โˆš๐œŽ๐œ) (๐œโˆ’1โ€–๐‘ฅ โ€–2๐‘‹ + ๐œŽโˆ’1โ€–๐‘ฆ โ€–2๐‘Œ )

โ‰ฅ ๐ถ (โ€–๐‘ฅ โ€–2๐‘‹ + โ€–๐‘ฆ โ€–2๐‘Œ )

for ๐ถ โ‰” (1 โˆ’ โ€–๐พ โ€–๐•ƒ(๐‘‹ ;๐‘Œ )โˆš๐œŽ๐œ)min{๐œโˆ’1, ๐œŽโˆ’1} > 0. Hence, ใ€ˆ๐‘€๐‘ข,๐‘ขใ€‰๐‘ˆ โ‰ฅ ๐ถ โ€–๐‘ขโ€–2๐‘ˆ for all ๐‘ข โˆˆ ๐‘ˆ ,

and therefore๐‘€ is positive denite. ๏ฟฝ

Lemma 9.12. The operator ๐ป : ๐‘ˆ โ‡’ ๐‘ˆ dened in (9.18) is maximally monotone.

Proof. Let ๐ด(๐‘ข) โ‰”(๐œ•๐น (๐‘ฅ)๐œ•๐บโˆ— (๐‘ฆ)

)and ฮž โ‰”

( 0 ๐พโˆ—โˆ’๐พ 0

). Then ฮž is skew-adjoint, and ๐ด is maximally

monotone by the denition of the inner product on๐‘ˆ = ๐‘‹ ร—๐‘Œ and Theorem 6.11. The claimnow follows from Lemma 9.8. ๏ฟฝ

With this, we can deduce the convergence of the PDPS method.

126

9 weak convergence

Corollary 9.13. Let the convex, proper, and lower semicontinuous functions ๐น : ๐‘‹ โ†’ โ„,๐บ : ๐‘Œ โ†’ โ„, and the linear operator ๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) satisfy the assumptions of Theorem 5.10.If, moreover, ๐œŽ๐œ โ€–๐พ โ€–2

๐•ƒ(๐‘‹ ;๐‘Œ ) < 1, then the sequence {๐‘ข๐‘˜ โ‰” (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜)}๐‘˜โˆˆโ„• generated by thePDPS method (9.19) for any initial iterate ๐‘ข0 โˆˆ ๐‘‹ ร— ๐‘Œ converges weakly in ๐‘ˆ to a pair๐‘ข โ‰” (๐‘ฅ, ๐‘ฆ) โˆˆ ๐ปโˆ’1(0), i.e., satisfying (8.13).

Proof. By Lemma 9.11,๐‘€ is self-adjoint and positive denite and thus has a bounded inverse.Minding Lemma 9.12, we can therefore apply Theorem 9.7 to show that (๐‘ข๐‘˜ โˆ’ ๐‘ข) โ‡€ 0 forsome ๐‘ข โˆˆ ๐ปโˆ’1(0) with respect to the inner product ใ€ˆ๐‘€ ยท , ยท ใ€‰๐‘ˆ . Since ๐‘€ is has a boundedinverse, this implies that

ใ€ˆ๐‘ข๐‘˜ , ๐‘€๐‘คใ€‰๐‘ˆ = ใ€ˆ๐‘€๐‘ข๐‘˜ ,๐‘คใ€‰๐‘ˆ โ†’ ใ€ˆ๐‘€๐‘ข,๐‘คใ€‰๐‘ˆ = ใ€ˆ๐‘ข,๐‘€๐‘คใ€‰๐‘ˆ for all๐‘ค โˆˆ ๐‘ˆ

and hence ๐‘ข๐‘˜ โ‡€ ๐‘ข in๐‘ˆ since ran๐‘€ = ๐‘ˆ due to the invertibility of๐‘€ . ๏ฟฝ

9.4 preconditioned explicit splitting methods: pdes and more

Let ๐ด, ๐ต : ๐‘‹ โ‡’ ๐‘‹ be monotone operators and consider the iterative scheme

(9.21) 0 โˆˆ ๐ด(๐‘ฅ๐‘˜+1) + ๐ต(๐‘ฅ๐‘˜) +๐‘€ (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜),

which is implicit in ๐ด but explicit in ๐ต. We obviously intend to use this method to ndsome ๐‘ฅ โˆˆ (๐ด + ๐ต)โˆ’1(0).As we have seen, the proximal point, PDPS, and DRS methods are all of the form (9.21) with๐ต = 0. The basic explicit splitting method is also of this form with ๐ด = ๐œ•๐บ , ๐ต = โˆ‡๐น , and๐‘€ = ๐œโˆ’1Id. It is moreover not dicult to see from (8.25) that primal-dual explicit splitting(PDES) method is also of the form (9.21) with nonzero ๐ต. So to prove the convergence ofthis algorithm, we want to improve Theorem 9.6 to be able to deal with the preconditioningoperator๐‘€ and the general monotone operators ๐ด and ๐ต in place of subdierentials andgradients.

To proceed, we need a suitable notion of smoothness for ๐ต to be able to deal with theexplicit step. In Theorem 9.6 we only used the Lipschitz continuity of โˆ‡๐น in two places:rst, to establish the three-point monotonicity using Corollary 7.2, and second, at the endof the proof for a continuity argument. To simplify dealing with ๐ต that may only act on asubspace, as in the case of the primal-dual explicit splitting in Section 8.5, we now makethis three-point monotonicity with respect to an operator ฮ› our main assumption.

Specically, we say that ๐ต : ๐‘‹ โ‡’ ๐‘‹ is three-point monotone at ๐‘ฅ โˆˆ ๐‘‹ with respect toฮ› โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) if

(9.22) ใ€ˆ๐ต(๐‘ง) โˆ’ ๐ต(๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰ โ‰ฅ โˆ’ 14 โ€–๐‘ง โˆ’ ๐‘ฅ โ€–

2ฮ› for all ๐‘ฅ, ๐‘ง โˆˆ ๐‘‹ .

127

9 weak convergence

If this holds for every ๐‘ฅ , we say that ๐ต is three-point monotone with respect to ฮ›. FromCorollary 7.2, it is clear that if โˆ‡๐น is Lipschitz continuous with constant ๐ฟ, then ๐ต = โˆ‡๐น isthree-point monotone with respect to ฮ› = ๐ฟ Id.

We again start with a lemma exploiting the structural properties of the saddle-point operatorto show a โ€œshifted outer semicontinuityโ€.

Lemma 9.14. Let ๐ป = ๐ด + ๐ต : ๐‘‹ โ‡’ ๐‘‹ be weak-to-strong outer semicontinuous with ๐ตsingle-valued and Lipschitz continuous. If๐‘ค๐‘˜+1 โˆˆ ๐ด(๐‘ฅ๐‘˜+1) + ๐ต(๐‘ง๐‘˜) for ๐‘˜ โˆˆ โ„• with๐‘ค๐‘˜ โ†’ ๏ฟฝ๏ฟฝand ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ง๐‘˜ โ†’ 0 strongly in ๐‘‹ and ๐‘ฅ๐‘˜ โ‡€ ๐‘ฅ weakly in ๐‘‹ , then ๏ฟฝ๏ฟฝ โˆˆ ๐ป (๐‘ฅ).

Proof. We have๐‘ค๐‘˜+1 โˆˆ ๐ด(๐‘ฅ๐‘˜+1) + ๐ต(๐‘ง๐‘˜) so that

๏ฟฝ๏ฟฝ๐‘˜+1 โ‰” ๐‘ค๐‘˜+1 โˆ’ ๐ต(๐‘ง๐‘˜) + ๐ต(๐‘ฅ๐‘˜+1) โˆˆ ๐ป (๐‘ฅ๐‘˜+1).Since๐‘ค๐‘˜+1 โ†’ ๏ฟฝ๏ฟฝ and ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ง๐‘˜ โ†’ 0 and ๐ต is Lipschitz continuous, we have ๏ฟฝ๏ฟฝ๐‘˜+1 โ†’ ๏ฟฝ๏ฟฝ aswell. The weak-to-strong outer semicontinuity of๐ป then immediately yields ๏ฟฝ๏ฟฝ โˆˆ ๐ป (๐‘ฅ). ๏ฟฝ

Theorem 9.15. Let๐ป = ๐ด+๐ต with๐ปโˆ’1(0) โ‰  โˆ… for๐ด, ๐ต : ๐‘‹ โ‡’ ๐‘‹ with๐ดmonotone and๐ต single-valued Lipschitz continuous and three-point monotone with respect to some ฮ› โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ).Furthermore, let๐‘€ โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) be self-adjoint, positive denite, with a bounded inverse, andsatisfy (2 โˆ’ Y)๐‘€ โ‰ฅ ฮ› for some Y > 0. Suppose ๐ป is weak-to-strong outer semicontinuous.Let the starting point ๐‘ฅ0 โˆˆ ๐‘‹ be arbitrary and assume that (9.21) has a unique solution ๐‘ฅ๐‘˜+1for every ๐‘˜ โˆˆ โ„•. Then the iterates {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• of (9.21) satisfy ๐‘€ 1/2(๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ) โ‡€ 0 for some๐‘ฅ โˆˆ ๐ปโˆ’1(0).

Proof. The proof follows along the same lines as that of Theorem 9.7 with minor modica-tions. First, since 0 โˆˆ ๐ป (๐‘ฅ), the monotonicity of ๐ด and the three-point monotonicity (9.22)of ๐ต yields

ใ€ˆ๐ด(๐‘ฅ๐‘˜+1) + ๐ต(๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰ โ‰ฅ โˆ’ 14 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2ฮ›,which together with (9.21) leads to

ใ€ˆ๐‘€ (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰ โ‰ค 14 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2ฮ›.

From the preconditioned three-point identity (9.11) we then obtain

(9.23) 12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘€ + 12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘€โˆ’ฮ›/2 โ‰ค12 โ€–๐‘ฅ

๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘€ .

Our assumption that (2 โˆ’ Y)๐‘€ โ‰ฅ ฮ› implies that ๐‘€ โˆ’ ฮ›/2 โ‰ฅ Y๐‘€/2. By denition, we cantherefore bound the second norm on the left-hand side from below to obtain (9.13) with anadditional constant depending on Y. We may thus proceed as in the proof of Theorem 9.7 toestablish๐‘ค๐‘˜+1 โ‰” โˆ’๐‘€ (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜) โ†’ 0. We now have๐‘ค๐‘˜+1 โˆˆ ๐ด(๐‘ฅ๐‘˜+1) + ๐ต(๐‘ฅ๐‘˜) and thereforeuse Lemma 9.14 with ๐‘ง๐‘˜ = ๐‘ฅ๐‘˜ and ๏ฟฝ๏ฟฝ = 0 to establish 0 โˆˆ ๐ป (๐‘ฅ). The rest of the proof againproceeds as for Theorem 9.7 with the application of Opialโ€™s Lemma 9.1. ๏ฟฝ

128

9 weak convergence

We again apply this result to show the convergence of specic splitting methods containingan explicit step.

primal-dual explicit splitting

We now return to algorithms for problems of the form

min๐‘ฅโˆˆ๐‘‹

๐น (๐‘ฅ) +๐บ (๐พ๐‘ฅ)

for Gรขteaux dierentiable ๐น and linear๐พ . Recall from (8.23) the primal-dual explicit splitting(PDES) method

(9.24){๐‘ฆ๐‘˜+1 = prox๐บโˆ— ((Id โˆ’ ๐พ๐พโˆ—)๐‘ฆ๐‘˜ + ๐พ (๐‘ฅ๐‘˜ โˆ’ โˆ‡๐น (๐‘ฅ๐‘˜))),๐‘ฅ๐‘˜+1 = ๐‘ฅ๐‘˜ โˆ’ โˆ‡๐น (๐‘ฅ๐‘˜) โˆ’ ๐พโˆ—๐‘ฆ๐‘˜+1,

which can be written in implicit form as

(9.25) 0 โˆˆ ๐ป (๐‘ข๐‘˜+1) +(โˆ‡๐น (๐‘ฅ๐‘˜) โˆ’ โˆ‡๐น (๐‘ฅ๐‘˜+1)

0

)+๐‘€ (๐‘ข๐‘˜+1 โˆ’ ๐‘ข๐‘˜)

with

(9.26) ๐ป (๐‘ข) โ‰”(๐œ•๐น (๐‘ฅ) + ๐พโˆ—๐‘ฆ๐œ•๐บโˆ—(๐‘ฆ) โˆ’ ๐พ๐‘ฅ

)and ๐‘€ โ‰”

(Id 00 Id โˆ’ ๐พ๐พโˆ—

),

for ๐‘ข = (๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘‹ ร— ๐‘Œ =: ๐‘ˆ .

Corollary 9.16. Let ๐น : ๐‘‹ โ†’ โ„ and ๐บ : ๐‘‹ โ†’ โ„ be proper, convex, and lower semicontinuous,and ๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ). Suppose ๐น is Gรขteaux dierentiable with ๐ฟ-Lipschitz gradient for ๐ฟ < 2,that โ€–๐พ โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) < 1, and that the assumptions of Theorem 5.10 are satised. Then for anyinitial iterate ๐‘ข0 โˆˆ ๐‘‹ ร— ๐‘Œ the iterates {๐‘ข๐‘˜ = (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜)}๐‘˜โˆˆโ„• of the (8.23) converge weakly tosome ๐‘ข โˆˆ ๐ปโˆ’1(0) with ๐ป given by (8.14).

Proof. We recall that Theorem 5.10 guarantees that ๐ปโˆ’1(0) โ‰  โˆ…. To apply Theorem 9.15, wewrite ๐ป = ๐ด + ๐ต for

๐ด(๐‘ข) โ‰”(

0๐œ•๐บโˆ—(๐‘ฆ)

)+ ฮž๐‘ข, ๐ต(๐‘ข) โ‰”

(โˆ‡๐น (๐‘ฅ)0

), ฮž โ‰”

(0 ๐พโˆ—

โˆ’๐พ 0

).

We rst note that ๐‘€ as given in (9.26) is self-adjoint and positive denite under ourassumption โ€–๐พ โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) < 1. By Corollary 7.2, the three-point monotonicity (9.22) holds forฮ› โ‰”

(๐ฟ 00 0

). Since ๐ฟ < 2, there furthermore exists an Y > 0 suciently small such that

(2โˆ’Y)๐‘€ โ‰ฅ ฮ›. Finally, Lemma 9.12 shows that๐ป is maximally monotone and hence weak-to-strong outer semicontinuous by Lemma 6.8. The claim now follows from Theorem 9.15. ๏ฟฝ

Remark 9.17. It is possible to improve the result to โ€–๐พ โ€– โ‰ค 1 if we increase the complexity ofTheorem 9.15 slightly to allow for ๐‘€ โ‰ฅ 0. However, in this case it is only possible to show theconvergence of the partial iterates {๐‘ฅ๐‘˜ }๐‘˜โˆˆโ„•.

129

9 weak convergence

primal-dual proximal splitting with an additional forward step

Using a similar switching term as in the implicit formulation (9.25) of the PDES method,it is possible to incorporate additional forward steps in the PDPS method. For ๐น = ๐น0 + ๐ธwith ๐น0, ๐ธ convex and ๐ธ Gรขteaux dierentiable, we therefore consider

(9.27) min๐‘ฅโˆˆ๐‘‹

๐น0(๐‘ฅ) + ๐ธ (๐‘ฅ) +๐บ (๐พ๐‘ฅ).

With ๐‘ข = (๐‘ฅ, ๐‘ฆ) and following Section 8.4, any minimizer ๐‘ฅ โˆˆ ๐‘‹ satises 0 โˆˆ ๐ป (๐‘ข) for

(9.28) ๐ป (๐‘ข) โ‰”(๐œ•๐น (๐‘ฅ) + โˆ‡๐ธ (๐‘ฅ) + ๐พโˆ—๐‘ฆ

๐œ•๐บโˆ—(๐‘ฆ) โˆ’ ๐พ๐‘ฅ).

Similarly, following the arguments in Section 8.4, we can show that the iteration

(9.29)

๐‘ฅ๐‘˜+1 = prox๐œ๐น0 (๐‘ฅ๐‘˜ โˆ’ ๐œโˆ‡๐ธ (๐‘ฅ๐‘˜) โˆ’ ๐œ๐พโˆ—๐‘ฆ๐‘˜),๐‘ฅ๐‘˜+1 = 2๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ ,๐‘ฆ๐‘˜+1 = prox๐œŽ๐บโˆ— (๐‘ฆ๐‘˜ + ๐œŽ๐พ๐‘ฅ๐‘˜+1),

is equivalent to the implicit formulation

0 โˆˆ(๐œ•๐น0(๐‘ฅ๐‘˜+1) + โˆ‡๐ธ (๐‘ฅ๐‘˜) + ๐พโˆ—๐‘ฆ๐‘˜+1

๐œ•๐บ (๐‘ฆ๐‘˜+1) โˆ’ ๐พ๐‘ฅ๐‘˜+1)+๐‘€ (๐‘ข๐‘˜+1 โˆ’ ๐‘ข๐‘˜)

with the preconditioner๐‘€ dened as in (9.18). The convergence can thus be shown as forthe PDES method.

Corollary 9.18. Let ๐ธ : ๐‘‹ โ†’ โ„, ๐น0 : ๐‘‹ โ†’ โ„, and ๐บ : ๐‘Œ โ†’ โ„ be proper, convex, and lowersemicontinuous, and ๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ). Suppose ๐ธ is Gรขteaux dierentiable with an ๐ฟ-Lipschitzgradient, and that the assumptions of Theorem 5.10 are satised with ๐น โ‰” ๐น0 + ๐ธ. Assume,moreover, that ๐œ, ๐œŽ > 0 satisfy

(9.30) 1 > โ€–๐พ โ€–2๐•ƒ(๐‘‹ ;๐‘Œ )๐œ๐œŽ + ๐œ ๐ฟ2 .

Then for any initial iterate ๐‘ข0 โˆˆ ๐‘‹ ร—๐‘Œ the iterates {๐‘ข๐‘˜}๐‘˜โˆˆโ„• of (9.29) converge weakly to some๐‘ข โˆˆ ๐ปโˆ’1(0) for ๐ป given by (9.28).

Proof. As before, Theorem 5.10 guarantees that ๐ปโˆ’1(0) โ‰  โˆ…. We apply Theorem 9.15 to

๐ด(๐‘ข) โ‰”(๐œ•๐น0(๐‘ฅ)๐œ•๐บโˆ—(๐‘ฆ)

)+ ฮž๐‘ข, ๐ต(๐‘ข) โ‰”

(โˆ‡๐ธ (๐‘ฅ)0

), ฮž โ‰”

(0 ๐พโˆ—

โˆ’๐พ 0

),

130

9 weak convergence

and ๐‘€ given by (9.18). By Corollary 7.2, the three-point monotonicity (9.22) holds withฮ› โ‰”

(๐ฟ 00 0

). We have already shown in Lemma 9.11 that ๐‘€ is self-adjoint and positive

denite. Furthermore, from (9.20) in the proof of Lemma 9.11, we have

ใ€ˆ๐‘€๐‘ข,๐‘ขใ€‰ โ‰ฅ (1 โˆ’ โ€–๐พ โ€–๐•ƒ(๐‘‹ ;๐‘Œ )โˆš๐œŽ๐œ) (๐œโˆ’1โ€–๐‘ฅ โ€–2๐‘‹ + ๐œŽโˆ’1โ€–๐‘ฆ โ€–2๐‘Œ )

Thus (9.30) implies that๐‘€ is positive denite. Arguing similarly to (9.20), we also estimate

ใ€ˆ๐‘€๐‘ข,๐‘ขใ€‰๐‘ˆ โ‰ฅ ๐œโˆ’1โ€–๐‘ฅ โ€–2๐‘‹ โˆ’ 2โ€–๐พ โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) โ€–๐‘ฅ โ€–๐‘‹ โ€–๐‘ฆ โ€–๐‘Œ + ๐œŽโˆ’1โ€–๐‘ฆ โ€–2๐‘Œ โ‰ฅ (1 โˆ’ โ€–๐พ โ€–2๐•ƒ(๐‘‹ ;๐‘Œ )๐œŽ๐œ)๐œโˆ’1โ€–๐‘ฅ โ€–2๐‘‹ .

By the strict inequality in (9.30), we thus deduce (2 โˆ’ Y)๐‘€ โ‰ฅ ฮ› for some Y > 0.

Now by Lemma 9.12, ๐ป is again maximally monotone and therefore weak-to-strong outersemicontinuous by Lemma 6.8, and the claim follows from Theorem 9.15. ๏ฟฝ

Remark 9.19. The forward step was introduced to the basic PDPS method in [Condat 2013; Vu 2013],see also [Chambolle & Pock 2015]. These papers also introduced an additional over-relaxation stepthat we will discuss in Chapter 12.

9.5 fixed-point theorems

Based on our generic approach, we now prove the classical Browderโ€™s xed-point theorem,which can itself be used to prove the convergence of optimization methods and other xed-point iterations (see Remark 9.5). We recall from Lemma 6.15 that rmly nonexpansive mapsare (1/2)-averaged, so the result applies by Lemma 6.13 to the resolvents of maximallymonotone maps in particular โ€“ hence proving the convergence of the proximal pointmethod.

Theorem 9.20 (Browderโ€™s fixed-point theorem). On a Hilbert space ๐‘‹ , suppose ๐‘‡ : ๐‘‹ โ†’ ๐‘‹is ๐›ผ-averaged for some ๐›ผ โˆˆ (0, 1) and has a xed point ๐‘ฅ = ๐‘‡ (๐‘ฅ). Let ๐‘ฅ๐‘˜+1 โ‰” ๐‘‡ (๐‘ฅ๐‘˜). Then๐‘ฅ๐‘˜ โ‡€ ๐‘ฅ weakly in ๐‘‹ for some xed point ๐‘ฅ of ๐‘‡ .

Proof. Finding a xed point of ๐‘‡ is equivalent to nding a root of ๐ป (๐‘ฅ) โ‰” ๐‘‡ (๐‘ฅ) โˆ’ ๐‘ฅ .Similarly, we can rewrite the xed-point iteration as solving for ๐‘ฅ๐‘˜+1 the inclusion

(9.31) 0 = ๐‘ฅ๐‘˜ โˆ’๐‘‡ (๐‘ฅ๐‘˜) + (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜).

Proceeding as in the previous sections, we test this by the application of ใ€ˆ ยท , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ .After application of the three-point identity (9.1), we then obtain

(9.32) 12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + 12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ + ใ€ˆ๐‘ฅ๐‘˜ โˆ’๐‘‡ (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค 12 โ€–๐‘ฅ

๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ .

131

9 weak convergence

Since ๐‘ฅ๐‘˜+1 = ๐‘‡ (๐‘ฅ๐‘˜), ๐‘ฅ is a xed point of ๐‘‡ , and by assumption ๐‘‡ = (1 โˆ’ ๐›ผ)Id + ๐›ผ ๐ฝ for somenonexpansive operator ๐ฝ : ๐‘‹ โ†’ ๐‘‹ , we have

ใ€ˆ๐‘ฅ๐‘˜ โˆ’๐‘‡ (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ = ใ€ˆ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โˆ’ (๐‘‡ (๐‘ฅ๐‘˜) โˆ’๐‘‡ (๐‘ฅ)),๐‘‡ (๐‘ฅ๐‘˜) โˆ’๐‘‡ (๐‘ฅ)ใ€‰๐‘‹= ๐›ผ ใ€ˆ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โˆ’ (๐ฝ (๐‘ฅ๐‘˜) โˆ’ ๐ฝ (๐‘ฅ)), (1 โˆ’ ๐›ผ) (๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ) + ๐›ผ (๐ฝ (๐‘ฅ๐‘˜) โˆ’ ๐ฝ (๐‘ฅ))ใ€‰๐‘‹= (๐›ผ โˆ’ ๐›ผ2)โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ ๐›ผ2โ€– ๐ฝ (๐‘ฅ๐‘˜) โˆ’ ๐ฝ (๐‘ฅ)โ€–2๐‘‹+ (2๐›ผ2 โˆ’ ๐›ผ)ใ€ˆ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ, ๐ฝ (๐‘ฅ๐‘˜) โˆ’ ๐ฝ (๐‘ฅ)ใ€‰๐‘‹

as well as12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ =12 โ€–๐‘‡ (๐‘ฅ

๐‘˜) โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ =๐›ผ2

2 โ€– ๐ฝ (๐‘ฅ๐‘˜) โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ =๐›ผ2

2 โ€– ๐ฝ (๐‘ฅ๐‘˜) โˆ’ ๐ฝ (๐‘ฅ) โˆ’ (๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ)โ€–2๐‘‹

=๐›ผ2

2 โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐›ผ2

2 โ€– ๐ฝ (๐‘ฅ๐‘˜) โˆ’ ๐ฝ (๐‘ฅ)โ€–2๐‘‹ โˆ’ ๐›ผ2ใ€ˆ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ, ๐ฝ (๐‘ฅ๐‘˜) โˆ’ ๐ฝ (๐‘ฅ)ใ€‰๐‘‹ .

Thus, for any ๐›ฟ > 0,

1 โˆ’ ๐›ฟ2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ + ใ€ˆ๐‘ฅ๐‘˜ โˆ’๐‘‡ (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ = ((1 + ๐›ฟ)๐›ผ2 โˆ’ ๐›ผ)ใ€ˆ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ, ๐ฝ (๐‘ฅ๐‘˜) โˆ’ ๐ฝ (๐‘ฅ)ใ€‰๐‘‹

+ 2๐›ผ โˆ’ (1 + ๐›ฟ)๐›ผ22 โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ (1 + ๐›ฟ)๐›ผ2

2 โ€– ๐ฝ (๐‘ฅ๐‘˜) โˆ’ ๐ฝ (๐‘ฅ)โ€–2๐‘‹ .

Taking ๐›ฟ = 1๐›ผ โˆ’ 1, we have ๐›ฟ > 0 and ๐›ผ = (1 + ๐›ฟ)๐›ผ2. Thus the factor in front of the inner

product term is positive, and hence we obtain by the nonexpansivity of ๐ฝ

1 โˆ’ ๐›ฟ2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ + ใ€ˆ๐‘ฅ๐‘˜ โˆ’๐‘‡ (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ =

๐›ผ

2 โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ ๐›ผ

2 โ€– ๐ฝ (๐‘ฅ๐‘˜) โˆ’ ๐ฝ (๐‘ฅ)โ€–2๐‘‹ โ‰ฅ 0.

From (9.32), it now follows that

12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐›ฟ2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ค 1

2 โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ .

As before, this implies Fejรฉr monotonicity of {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• and that โ€–๐‘ฅ๐‘˜+1โˆ’๐‘ฅ๐‘˜ โ€–๐‘‹ โ†’ 0. The latterimplies โ€–๐‘‡ (๐‘ฅ๐‘˜) โˆ’ ๐‘ฅ๐‘˜ โ€–๐‘‹ โ†’ 0 via (9.31). Let ๐‘ฅ be any weak limit point of {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„•. Denote by๐‘ โŠ‚ โ„• be the indices of the corresponding subsequence. We snow that ๐‘ฅ is a xed point of๐‘‡ . Since by Lemma 6.17 the set of xed points is convex and closed, the claim then followsfrom Opialโ€™s Lemma 9.1.

To show that ๐‘ฅ is a xed point of ๐‘‡ , rst, we expand12 โ€–๐‘ฅ

๐‘˜ โˆ’๐‘‡ (๐‘ฅ)โ€–2๐‘‹ =12 โ€–๐‘ฅ

๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ + 12 โ€–๐‘ฅ โˆ’๐‘‡ (๐‘ฅ)โ€–2๐‘‹ + ใ€ˆ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ, ๐‘ฅ โˆ’๐‘‡ (๐‘ฅ)ใ€‰๐‘‹ .

Since ๐‘ฅ๐‘˜ โ‡€ ๐‘ฅ , this gives

lim sup๐‘3๐‘˜โ†’โˆž

12 โ€–๐‘ฅ

๐‘˜ โˆ’๐‘‡ (๐‘ฅ)โ€–2๐‘‹ โ‰ฅ lim sup๐‘3๐‘˜โ†’โˆž

12 โ€–๐‘ฅ

๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ + โ€–๐‘ฅ โˆ’๐‘‡ (๐‘ฅ)โ€–2๐‘‹ .

132

9 weak convergence

On the other hand, by the nonexpansivity of ๐‘‡ and ๐‘‡ (๐‘ฅ๐‘˜) โˆ’ ๐‘ฅ๐‘˜ โ†’ 0, we have

lim sup๐‘3๐‘˜โ†’โˆž

โ€–๐‘ฅ๐‘˜ โˆ’๐‘‡ (๐‘ฅ)โ€–๐‘‹ โ‰ค lim sup๐‘3๐‘˜โ†’โˆž

(โ€–๐‘‡ (๐‘ฅ๐‘˜) โˆ’๐‘‡ (๐‘ฅ)โ€–๐‘‹ + โ€–๐‘ฅ๐‘˜ โˆ’๐‘‡ (๐‘ฅ๐‘˜)โ€–๐‘‹

)โ‰ค lim sup

๐‘3๐‘˜โ†’โˆžโ€–๐‘ฅ๐‘˜ โˆ’๐‘ฅ โ€–๐‘‹ .

Together this two inequalities show, as desired, that โ€–๐‘‡ (๐‘ฅ) โˆ’ ๐‘ฅ โ€– = 0. ๏ฟฝ

Remark 9.21. Theorem 9.20 in its modern form (stated for rmly nonexpansive or more generally๐›ผ-averaged maps) can be rst found in [Browder 1967]. However, similar results for what are nowcalled Krasnosel โ€ฒskiฤฑโ€“Mann iterations โ€“ which are closely related to ๐›ผ-averaged maps โ€“ were statedin more limited settings in [Mann 1953; Schaefer 1957; Petryshyn 1966; Krasnoselโ€ฒskiฤฑ 1955; Opial1967].

133

10 RATES OF CONVERGENCE BY TESTING

As we have seen, minimizers of convex problems in a Hilbert space ๐‘‹ can generally becharacterized by the inclusion

0 โˆˆ ๐ป (๐‘ฅ)for the unknown ๐‘ฅ โˆˆ ๐‘‹ and a suitable monotone operator ๐ป : ๐‘‹ โ‡’ ๐‘‹ . This inclusion inturn can be solved using a (preconditioned) proximal point iteration that converges weaklyunder suitable assumptions. In the present chapter, we want to improve this analysis toobtain convergence rates, i.e., estimates of the distance โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹ of iterates to ๐‘ฅ in terms ofthe iteration number ๐‘˜ . Our general approach will be to consider this distance multipliedby an iteration-dependent testing parameter ๐œ‘๐‘˜ (or, for structured algorithms, considerthe norm relative to a testing operator) and to show by roughly the same arguments as inChapter 9 that this product stays bounded: ๐œ‘๐‘˜ โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค ๐ถ . If we can then show that thistesting parameter grows at a certain rate, the distance must decay at the reciprocal rate.Consequently, we can now avoid the complications of dealing with weak convergence; infact, this chapter will consist of simple algebraic manipulations. However, for this to workwe need to assume additional properties of ๐ป , namely strong monotonicity. Recall fromLemma 7.4 that ๐ป is called strongly monotone with factor ๐›พ > 0 if

(10.1) ใ€ˆ๐ป (๐‘ฅ) โˆ’ ๐ป (๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐›พ โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ (๐‘ฅ, ๐‘ฅ โˆˆ ๐‘‹ ),

where, in a slight abuse of notation, the left-hand side is understood to stand for any choiceof elements from ๐ป (๐‘ฅ) and ๐ป (๐‘ฅ).Before we turn to the actual estimates, we rst dene various notions of convergence rates.Consider a function ๐‘Ÿ : โ„• โ†’ [0,โˆž) (e.g., ๐‘Ÿ (๐‘˜) = โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹ or ๐‘Ÿ (๐‘˜) = ๐บ (๐‘ฅ๐‘˜) โˆ’๐บ (๐‘ฅ) for ๐‘ฅ aminimizer of ๐บ).

(i) We say that ๐‘Ÿ (๐‘˜) converges (to zero as ๐‘˜ โ†’ โˆž) at the rate ๐‘‚ (๐‘“ (๐‘˜)) if ๐‘Ÿ (๐‘˜) โ‰ค ๐ถ๐‘“ (๐‘˜)for some constant ๐ถ > 0 for all ๐‘˜ โˆˆ โ„• and a decreasing function ๐‘“ : โ„• โ†’ [0,โˆž)with lim๐‘˜โ†’โˆž ๐‘“ (๐‘˜) = 0 (e.g., ๐‘“ (๐‘˜) = 1/๐‘˜ or ๐‘“ (๐‘˜) = 1/๐‘˜2).

(ii) Analogously, we say that a function ๐‘… : โ„• โ†’ โˆž grows at the rate ฮฉ(๐น (๐‘˜)) if ๐‘…(๐‘˜) โ‰ฅ๐‘๐น (๐‘˜) for all๐‘˜ โˆˆ โ„• for some constant๐‘ > 0 and an increasing function ๐‘“ : โ„• โ†’ [0,โˆž)with lim๐‘˜โ†’โˆž ๐น (๐‘˜) = โˆž.

134

10 rates of convergence by testing

Clearly ๐‘Ÿ = 1/๐‘… converges to zero at the rate ๐‘“ = 1/๐น if and only if ๐‘… grows at the rate ๐น .The most common cases are ๐น (๐‘˜) = ๐‘˜ or ๐น (๐‘˜) = ๐‘˜2.We can alternatively characterize orders of convergence via

` โ‰” lim๐‘˜โ†’โˆž

๐‘Ÿ (๐‘˜ + 1)๐‘Ÿ (๐‘˜) .

(i) If ` = 1, we say that ๐‘Ÿ (๐‘˜) converges (to zero as ๐‘˜ โ†’ โˆž) sublinearly.

(ii) If ` โˆˆ (0, 1), then this convergence is linear. This is equivalent to a convergence atthe rate ๐‘‚ ( หœ๐‘˜) for any หœ โˆˆ (`, 1).

(iii) If ` = 0, then the convergence is superlinear.

Dierent rates of superlinear convergence can also be studied. We say that ๐‘Ÿ (๐‘˜) converges(to zero as ๐‘˜ โ†’ โˆž) superlinearly with order ๐‘ž > 1 if

lim๐‘˜โ†’โˆž

๐‘Ÿ (๐‘˜ + 1)๐‘Ÿ (๐‘˜)๐‘ž < โˆž.

The most common case is ๐‘ž = 2, which is also known as quadratic convergence. (This is notto be confused with the โ€“ much slower โ€“ convergence at the rate ๐‘‚ (1/๐‘˜2); similarly, linearconvergence is dierent from โ€“ and much faster โ€“ than convergence at the rate๐‘‚ (1/๐‘˜).)

10.1 the fundamental methods

Before going into this abstract operator-based theory, we demonstrate the general conceptof testing by studying the fundamental methods, the proximal point and explicit splittingmethods. These are purely primal methods with a single step length parameter, whichsimplies the testing approach since we only need a single testing parameter. (It shouldbe pointed out that the proofs in this section can be carried out โ€“ and in fact shortened โ€“without introducing testing parameters at all. Nevertheless, we follow this approach sinceit provides a blueprint for the proofs for the structured primal-dual methods where theseare required.)

proximal point method

We start with the basic proximal point method for solving 0 โˆˆ ๐ป (๐‘ฅ) for a monotoneoperator ๐ป : ๐‘‹ โ‡’ ๐‘‹ , which we recall can be written in implicit form as

(10.2) 0 โˆˆ ๐œ๐‘˜๐ป (๐‘ฅ๐‘˜+1) + (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜).

135

10 rates of convergence by testing

Theorem 10.1 (proximal point method iterate rates). Suppose ๐ป : ๐‘‹ โ‡’ ๐‘‹ is stronglymonotone with ๐ปโˆ’1(0) โ‰  โˆ…. Let ๐‘ฅ๐‘˜+1 โ‰” R๐œ๐‘˜๐ป (๐‘ฅ๐‘˜) for some {๐œ๐‘˜}๐‘˜โˆˆโ„• โŠ‚ (0,โˆž) and ๐‘ฅ0 โˆˆ ๐‘‹ bearbitrary. Then the following hold for the iterates {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• and the unique point ๐‘ฅ โˆˆ ๐ปโˆ’1(0):

(i) If ๐œ๐‘˜ โ‰ก ๐œ is constant, then โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹ โ†’ 0 linearly.

(ii) If ๐œ๐‘˜ โ†’โˆž, then โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹ โ†’ 0 superlinearly.

Proof. Let ๐‘ฅ โˆˆ ๐ปโˆ’1(0); by assumption, such a point exists and is unique due to the assumedstrong monotonicity of ๐ป (since inserting any two roots ๐‘ฅ, ๐‘ฅ โˆˆ ๐‘‹ of ๐ป in (10.1) yieldsโ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค 0). For each iteration ๐‘˜ โˆˆ โ„•, pick a testing parameter ๐œ‘๐‘˜ > 0 and apply the test๐œ‘๐‘˜ ใ€ˆ ยท , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ to (10.2) to obtain (using the same notation from Theorem 9.4)

(10.3) 0 โˆˆ ๐œ‘๐‘˜๐œ๐‘˜ ใ€ˆ๐ป (๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ + ๐œ‘๐‘˜ ใ€ˆ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ .By the strong monotonicity of ๐ป , and the fact that 0 โˆˆ ๐ป (๐‘ฅ), for some ๐›พ > 0,

ใ€ˆ๐ป (๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐›พ โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹Multiplying this inequality with ๐œ‘๐‘˜๐œ๐‘˜ and using (10.3), we obtain

๐œ‘๐‘˜๐œ๐‘˜๐›พ โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐œ‘๐‘˜ ใ€ˆ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค 0.

An application of the three-point identity (9.1) then yields

(10.4) ๐œ‘๐‘˜ (1 + 2๐œ๐‘˜๐›พ)2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐œ‘๐‘˜2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ค ๐œ‘๐‘˜

2 โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ .

Let us now force on the testing parameters the recursion

(10.5) ๐œ‘0 = 1, ๐œ‘๐‘˜+1 = ๐œ‘๐‘˜ (1 + 2๐œ๐‘˜๐›พ).Then (10.4) yields

(10.6) ๐œ‘๐‘˜+12 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐œ‘๐‘˜2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ค ๐œ‘๐‘˜

2 โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ .

We now distinguish the two cases for the step sizes ๐œ๐‘˜ .

(i) Summing (10.6) for ๐‘˜ = 0, . . . , ๐‘ โˆ’ 1 gives

๐œ‘๐‘2 โ€–๐‘ฅ๐‘ โˆ’ ๐‘ฅ โ€–2๐‘‹ +

๐‘โˆ’1โˆ‘๐‘˜=0

๐œ‘๐‘˜2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ค ๐œ‘0

2 โ€–๐‘ฅ0 โˆ’ ๐‘ฅ โ€–2๐‘‹ .

In particular, ๐œ‘0 = 1 implies that

โ€–๐‘ฅ๐‘ โˆ’ ๐‘ฅ โ€–2๐‘‹ โ‰ค โ€–๐‘ฅ0 โˆ’ ๐‘ฅ โ€–2๐‘‹/๐œ‘๐‘ .Since ๐œ๐‘˜ โ‰ก ๐œ , (10.5) implies that ๐œ‘๐‘ = (1 + 2๐œ๐›พ)๐‘ . Setting หœ โ‰” (1 + 2๐œ๐›พ)โˆ’1/2 < 1 nowgives convergence at the rate ๐‘‚ ( หœโˆ’๐‘ ) and therefore the claimed linear rate.

136

10 rates of convergence by testing

(ii) From (10.6) combined with (10.5) follows directly that

โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹

โ‰ค ๐œ‘๐‘˜๐œ‘๐‘˜+1

= (1 + 2๐œ๐‘˜๐›พ)โˆ’1 โ†’ 0

since ๐œ๐‘˜ โ†’ โˆž, which implies the claimed superlinear convergence of โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹ . (Asimilar argument can be used to directly show linear convergence for constant stepsizes.) ๏ฟฝ

explicit splitting

We now return to problems of the form

(10.7) min๐‘ฅโˆˆ๐‘‹

๐น (๐‘ฅ) +๐บ (๐‘ฅ)

for Gรขteaux dierentiable ๐น , and study the convergence rates of the forward-backwardsplitting method

(10.8) ๐‘ฅ๐‘˜+1 โ‰” prox๐œ๐‘˜๐บ (๐‘ฅ๐‘˜ โˆ’ ๐œโˆ‡๐น (๐‘ฅ๐‘˜)),which we recall can be written in implicit form as

(10.9) 0 โˆˆ ๐œ [๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜)] + (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜).

Theorem 10.2 (forward-backward spliing iterate rates). Let ๐น : ๐‘‹ โ†’ โ„ and ๐บ : ๐‘‹ โ†’ โ„

be convex, proper, and lower semicontinuous. Suppose further that ๐น is Gรขteaux dierentiable,โˆ‡๐น is Lipschitz continuous with constant ๐ฟ > 0, and๐บ is ๐›พ-strongly convex for some ๐›พ > 0. If[๐œ•(๐น +๐บ)]โˆ’1(0) โ‰  โˆ… and the step length parameter ๐œ > 0 satises ๐œ๐ฟ โ‰ค 2, then for any initialiterate ๐‘ฅ0 โˆˆ ๐‘‹ the iterates {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• generated by the explicit splitting method (10.8) convergelinearly to the unique minimizer of (10.7).

Proof. Let ๐‘ฅ โˆˆ [๐œ•(๐น +๐บ)]โˆ’1(0); by assumption, such a point exists and is unique due tothe strong and therefore strict convexity of ๐บ . As in the proof of Theorem 10.1, for eachiteration ๐‘˜ โˆˆ โ„•, pick a testing parameter ๐œ‘๐‘˜ > 0 and apply the test ๐œ‘๐‘˜ ใ€ˆ ยท , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰ to (10.9)to obtain

(10.10) 0 โˆˆ ๐œ‘๐‘˜๐œ ใ€ˆ๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ + ๐œ‘๐‘˜ ใ€ˆ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ .Since ๐บ is strongly convex, it follows from (10.9) and Lemma 7.4 (iii) that

ใ€ˆ๐œ•๐บ (๐‘ฅ๐‘˜+1) โˆ’ ๐œ•๐บ (๐‘ฅ), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐›พ โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ .Similarly, since โˆ‡๐น is Lipschitz continuous, it follows from Corollary 7.2 that

ใ€ˆโˆ‡๐น (๐‘ฅ๐‘˜) โˆ’ โˆ‡๐น (๐‘ฅ), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ โˆ’๐ฟ4 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ .

137

10 rates of convergence by testing

Combining the last two inequalities with 0 โˆˆ ๐œ•๐บ (๐‘ฅ) + โˆ‡๐น (๐‘ฅ), we obtain

(10.11) ใ€ˆ๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐›พ โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ ๐ฟ

4 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ .

Inserting this into (10.10) and using the three-point identity, as in the proof of Theorem 10.1,we now obtain

(10.12) ๐œ‘๐‘˜ (1 + 2๐œ๐›พ)2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐œ‘๐‘˜ (1 โˆ’ ๐œ๐ฟ/2)2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ค ๐œ‘๐‘˜

2 โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ .

Since 1 โˆ’ ๐œ๐ฟ/2 โ‰ฅ 0, summing over ๐‘˜ = 0, . . . , ๐‘ โˆ’ 1, we arrive at๐œ‘๐‘2 โ€–๐‘ฅ๐‘ โˆ’ ๐‘ฅ โ€–2๐‘‹ โ‰ค ๐œ‘0

2 โ€–๐‘ฅ0 โˆ’ ๐‘ฅ โ€–2๐‘‹ .

As in Theorem 10.1, the denition of ๐œ‘๐‘˜ shows that โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ โ†’ 0 linearly. ๏ฟฝ

Observe that it is not possible to obtain superlinear convergence in this case since theassumption ๐œ๐‘˜ โ‰ค 2๐ฟโˆ’1 forces the step lengths to remain bounded.

10.2 structured algorithms and acceleration

We now to extend the analysis above to the structured case where ๐ป = ๐ด + ๐ต, since wehave already seen that most common rst-order algorithm can be written as calculatingin each step the next iterate ๐‘ฅ๐‘˜+1 from a specic instance of the general preconditionedimplicitโ€“explicit splitting method

(10.13) 0 โˆˆ ๐ด(๐‘ฅ๐‘˜+1) + ๐ต(๐‘ฅ๐‘˜) +๐‘€ (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜) .

In the proofs of convergence of the proximal point and explicit splitting methods (e.g., inTheorems 10.1 and 10.2 as well as in Chapter 9), we had the step length ๐œ๐‘˜ in front of ๐ปor โˆ‡๐น + ๐œ•๐บ . On the other hand, in Section 9.3 on structured algorithms, we incorporatedthe step length parameters into the preconditioning operator ๐‘€ . To transfer the testingapproach from these fundamental methods to the structured methods, we will now splitthem out from๐‘€ and move them in front of ๐ป as well by introducing a step length operator๐‘Š๐‘˜+1. We will also allow the preconditioner๐‘€๐‘˜+1 to vary by iteration; as we will see below,this is required for accelerated versions of the PDPS method. Correspondingly, we considerthe scheme

(10.14) 0 โˆˆ๐‘Š๐‘˜+1 [๐ด(๐‘ฅ๐‘˜+1) + ๐ต(๐‘ฅ๐‘˜)] +๐‘€๐‘˜+1(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜).

Since we now have a step length operator instead of a single scalar step length, we willalso have to consider instead of a scalar testing parameter an iteration-dependent testingoperator ๐‘๐‘˜+1 โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ). The rough idea is that ๐‘๐‘˜+1๐‘€ โ€“ or, as needed for accelerated

138

10 rates of convergence by testing

algorithms, ๐‘๐‘˜+1๐‘€๐‘˜+1 โ€“ will form a โ€œlocal normโ€ that measures the rate of convergencein a nonuniform way; and rather than testing the (scalar) three-point identity (10.4), wewill build the testing already into the initial strong monotonicity inequality. We thereforerequire an operator-level version of strong monotonicity, which we introduce next.

Let ๐ด : ๐‘‹ โ‡’ ๐‘‹ and let ๐‘, ฮ“ โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) be such that ๐‘ฮ“ is self-adjoint and positive semi-denite. Then we say that ๐ด is ฮ“-strongly monotone at ๐‘ฅ โˆˆ ๐‘‹ with respect to ๐‘ if

(10.15) ใ€ˆ๐ด(๐‘ฅ) โˆ’๐ด(๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘ โ‰ฅ โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘ฮ“ (๐‘ฅ โˆˆ ๐‘‹ ).

If this holds for all ๐‘ฅ โˆˆ ๐‘‹ , we say that ๐ด is ฮ“-strongly monotone with respect to ๐‘ .

It is clear that strongly monotone operators with parameter ๐›พ > 0 are ๐›พ ยท Id-strongly mono-tone with respect to ๐‘ = Id. More generally, operators with a separable block-structure,๐ด(๐‘ฅ) = (๐ด1(๐‘ฅ1), . . . , ๐ด๐‘› (๐‘ฅ๐‘›)) for ๐‘ฅ = (๐‘ฅ1, . . . , ๐‘ฅ๐‘›) satisfy the property, as as illustrated inmore detail in the next example for the two-block case.

Example 10.3. Let ๐ด(๐‘ฅ) = (๐ด1(๐‘ฅ1), ๐ด2(๐‘ฅ2)) for ๐‘ฅ = (๐‘ฅ1, ๐‘ฅ2) โˆˆ ๐‘‹1 ร—๐‘‹2 and the monotoneoperators ๐ด1 : ๐‘‹1 โ‡’ ๐‘‹1 and ๐ด2 : ๐‘‹2 โ‡’ ๐‘‹2. Suppose ๐ด1 and ๐ด2 are, respectively ๐›พ1- and๐›พ2-(strongly) monotone for ๐›พ1, ๐›พ2 โ‰ฅ 0. Then (10.15) holds for any ๐œ‘1, ๐œ‘2 > 0 for

ฮ“ โ‰”

(๐›พ1Id 00 ๐›พ2Id

)and ๐‘ โ‰”

(๐œ‘1Id 00 ๐œ‘2Id

)Let further ๐ต : ๐‘‹ โ‡’ ๐‘‹ and let ๐‘,ฮ› โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) be such that ๐‘ฮ› is self-adjoint and positivesemi-denite. Then we say that ๐ต is three-point monotone at ๐‘ฅ โˆˆ ๐‘‹ with respect to ๐‘ and ฮ›if

(10.16) ใ€ˆ๐ต(๐‘ง) โˆ’ ๐ต(๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘ โ‰ฅ โˆ’ 14 โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘ฮ› (๐‘ฅ, ๐‘ง โˆˆ ๐‘‹ ).

If this holds for all ๐‘ฅ โˆˆ ๐‘‹ , we say that ๐ต is three-point monotone with respect to ๐‘ and ฮ›.

Example 10.4. Let ๐ต(๐‘ฅ) = (โˆ‡๐ธ1(๐‘ฅ1),โˆ‡๐ธ2(๐‘ฅ2)) for ๐‘ฅ = (๐‘ฅ1, ๐‘ฅ2) โˆˆ ๐‘‹1 ร— ๐‘‹2 and the respec-tively ๐ฟ1- and ๐ฟ2-smooth convex functions ๐ธ1 : ๐‘‹1 โ†’ โ„ and ๐ธ2 : ๐‘‹2 โ†’ โ„. Then areferral to Corollary 7.2 shows (10.16) to hold for any ๐œ‘1, ๐œ‘2 > 0 for

ฮ› โ‰”

(๐ฟ1Id 00 ๐ฟ2Id

)and ๐‘ โ‰”

(๐œ‘1Id 00 ๐œ‘2Id

)More generally, we can take ๐ต(๐‘ฅ) = (๐ต1(๐‘ฅ1), ๐ต2(๐‘ฅ2)) for ๐ต1 : ๐‘‹1 โ†’ ๐‘‹1 and ๐ต2 : ๐‘‹2 โ†’ ๐‘‹2three-point monotone as dened in (7.9).

139

10 rates of convergence by testing

Clearly Example 10.4 as Example 10.3 generalizes to a large number of blocks, and both tooperators acting separably on more general direct sums of orthogonal subspaces.

We are now ready to forge our hammer for producing convergence rates for structuredalgorithms.

Theorem 10.5. Let๐ด, ๐ต : ๐‘‹ โ‡’ ๐‘‹ and๐ป โ‰” ๐ด+๐ต. For each ๐‘˜ โˆˆ โ„•, let further๐‘๐‘˜+1,๐‘Š๐‘˜+1, ๐‘€๐‘˜+1 โˆˆ๐•ƒ(๐‘‹ ;๐‘‹ ) be such that ๐‘๐‘˜+1๐‘€๐‘˜+1 is self-adjoint and positive semi-denite. Assume that thereexists a ๐‘ฅ โˆˆ ๐ปโˆ’1(0). For each ๐‘˜ โˆˆ โ„•, suppose for some ฮ“,ฮ› โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) that ๐ด is ฮ“-stronglymonotone at ๐‘ฅ with respect to ๐‘๐‘˜+1๐‘Š๐‘˜+1 and that ๐ต is three-point monotone at ๐‘ฅ with respectto ๐‘๐‘˜+1๐‘Š๐‘˜+1 and ฮ›. Let the initial iterate ๐‘ฅ0 โˆˆ ๐‘‹ be arbitrary, and suppose {๐‘ฅ๐‘˜+1}๐‘˜โˆˆโ„• aregenerated by (10.14). If for every ๐‘˜ โˆˆ โ„• both

๐‘๐‘˜+1(๐‘€๐‘˜+1 + 2๐‘Š๐‘˜+1ฮ“) โ‰ฅ ๐‘๐‘˜+2๐‘€๐‘˜+2 and(10.17)๐‘๐‘˜+1๐‘€๐‘˜+1 โ‰ฅ ๐‘๐‘˜+1๐‘Š๐‘˜+1ฮ›/2.(10.18)

hold, then

(10.19) 12 โ€–๐‘ฅ

๐‘ โˆ’ ๐‘ฅ โ€–2๐‘๐‘+1๐‘€๐‘+1 โ‰ค12 โ€–๐‘ฅ

0 โˆ’ ๐‘ฅ โ€–2๐‘1๐‘€1.

Proof. For brevity, denote๐ป๐‘˜+1(๐‘ฅ๐‘˜+1) โ‰”๐‘Š๐‘˜+1 [๐ด(๐‘ฅ๐‘˜+1)+๐ต(๐‘ฅ๐‘˜)]. First, from (10.15) and (10.16)we have that

(10.20) ใ€ˆ๐ป๐‘˜+1(๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘๐‘˜+1 โ‰ฅ โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘๐‘˜+1๐‘Š๐‘˜+1ฮ“ โˆ’14 โ€–๐‘ฅ

๐‘˜ โˆ’ ๐‘ฅ๐‘˜+1โ€–2๐‘๐‘˜+1๐‘Š๐‘˜+1ฮ›.

Multiplying (10.14) with ๐‘๐‘˜+1 and rearranging, we obtain

๐‘๐‘˜+1(๐‘ฅ โˆ’ ๐‘ฅ๐‘˜+1 โˆ’๐‘€๐‘˜+1(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜)) โˆˆ ๐‘๐‘˜+1๐ป๐‘˜+1(๐‘ฅ๐‘˜+1).Inserting this into (10.20) and applying the preconditioned three-point formula (9.11) for๐‘€ = ๐‘๐‘˜+1๐‘€๐‘˜+1 yields

12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘๐‘˜+1 (๐‘€๐‘˜+1+2๐‘Š๐‘˜+1ฮ“) +12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘๐‘˜+1 (๐‘€๐‘˜+1โˆ’๐‘Š๐‘˜+1ฮ›/2) โ‰ค12 โ€–๐‘ฅ

๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘๐‘˜+1๐‘€๐‘˜+1 .

Using (10.17) and (10.18), this implies that

(10.21) 12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘๐‘˜+2๐‘€๐‘˜+2 โ‰ค12 โ€–๐‘ฅ

๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘๐‘˜+1๐‘€๐‘˜+1 .

Summing over ๐‘˜ = 0, . . . , ๐‘ โˆ’ 1 now yields the claim. ๏ฟฝ

The inequality (10.21) is a quantitative or variable-metric version of the Fejรฉr monotonicityof Lemma 9.1 (i) with respect to ๐‘‹ = {๐‘ฅ}.If Theorem 10.5 is applicable, we immediately obtain the convergence rate result.

Corollary 10.6 (convergence with a rate). If (10.19) holds and ๐‘๐‘+1๐‘€๐‘+1 โ‰ฅ ` (๐‘ )๐ผ for some` : โ„• โ†’ โ„, then โ€–๐‘ฅ๐‘ โˆ’ ๐‘ขโ€–2 โ†’ 0 at the rate ๐‘‚ (1/` (๐‘ )).

140

10 rates of convergence by testing

primal-dual proximal splitting methods

We now apply this operator-testing technique to primal-dual splitting methods for thesolution of

(10.22) min๐‘ฅโˆˆ๐‘‹

๐น0(๐‘ฅ) + ๐ธ (๐‘ฅ) +๐บ (๐พ๐‘ฅ)

with ๐น0 : ๐‘‹ โ†’ โ„, ๐ธ : ๐‘‹ โ†’ โ„, and ๐บ : ๐‘Œ โ†’ โ„ convex, proper, and lower semicontinuousand ๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ). We will also write ๐น โ‰” ๐น0 + ๐ธ. The methods include in particularthe PDPS method with a forward step (9.29). Now allowing varying step lengths and anover-relaxation parameter ๐œ”๐‘˜ , this can be written

(10.23)

๐‘ฅ๐‘˜+1 โ‰” (๐ผ + ๐œ๐‘˜๐œ•๐น0)โˆ’1(๐‘ฅ๐‘˜ โˆ’ ๐œ๐‘˜๐พโˆ—๐‘ฆ๐‘˜ โˆ’ ๐œ๐‘˜โˆ‡๐ธ (๐‘ฅ๐‘˜)),๐‘ฅ๐‘˜+1 โ‰” ๐œ”๐‘˜ (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜) + ๐‘ฅ๐‘˜+1,๐‘ฆ๐‘˜+1 โ‰” (๐ผ + ๐œŽ๐‘˜+1๐œ•๐บโˆ—)โˆ’1(๐‘ฆ๐‘˜ + ๐œŽ๐‘˜+1๐พ๐‘ฅ๐‘˜+1).

For the basic version of the algorithm with ๐œ”๐‘˜ = 1, ๐œ๐‘˜ โ‰ก ๐œ0 > 0, and ๐œŽ๐‘˜ โ‰ก ๐œŽ0 > 0, wehave seen in Corollary 9.18 that the iterates converge weakly if the step length parameterssatisfy

(10.24) ๐ฟ๐œ0/2 + ๐œ0๐œŽ0โ€–๐พ โ€–2๐•ƒ(๐‘‹ ;๐‘Œ ) < 1,

where ๐ฟ is the Lipschitz constant of โˆ‡๐ธ. We will now show that under strong convexity of๐น0, we can choose these parameters to accelerate the algorithm to yield convergence at arate๐‘‚ (1/๐‘ 2). If both ๐น0 and๐บโˆ— are strongly convex, we can even obtain linear convergence.Throughout ๐‘ข = (๐‘ฅ, ๐‘ฆ) denotes a root of

๐ป (๐‘ข) โ‰”(๐œ•๐น0(๐‘ฅ) + โˆ‡๐ธ (๐‘ฅ) + ๐พโˆ—๐‘ฆ

๐œ•๐บโˆ—(๐‘ฆ) โˆ’ ๐พ๐‘ฅ),

which we assume exists. From Theorem 5.10, this is the case if an interior point conditionis satised for ๐บ โ—ฆ ๐พ and (10.22) admits a solution.

We will also require the following technical lemma in place of the simpler growth argumentfor the choice (10.5).

Lemma 10.7. Pick ๐œ‘0 > 0 arbitrarily, and dene iteratively ๐œ‘๐‘˜+1 โ‰” ๐œ‘๐‘˜(1 + 2๐›พ๐œ‘โˆ’1/2

๐‘˜

)for some

๐›พ > 0. Then there exists a constant ๐‘ > 0 such that ๐œ‘๐‘˜ โ‰ฅ (๐‘๐‘˜ + ๐œ‘ 1/2

0)2 for all ๐‘˜ โˆˆ โ„•.

Proof. Replacing ๐œ‘๐‘˜ by ๐œ‘โ€ฒ๐‘˜ โ‰” ๐›พโˆ’2๐œ‘๐‘˜ , we may assume without loss of generality that ๐›พ = 1.We claim that ๐œ‘ 1/2

๐‘˜โ‰ฅ ๐‘๐‘˜ + ๐œ‘ 1/2

0 for some ๐‘ > 0. We proceed by induction. The case ๐‘˜ = 0

141

10 rates of convergence by testing

is clear. If the claim holds for ๐‘˜ = 0, . . . , ๐‘ โˆ’ 1, we can unroll the recursion to obtain theestimate

๐œ‘๐‘ โˆ’ ๐œ‘0 =๐‘โˆ’1โˆ‘๐‘˜=0

2๐œ‘ 1/2๐‘˜

โ‰ฅ 2๐‘โˆ’1โˆ‘๐‘˜=0

๐‘๐‘˜ + 2๐œ‘ 1/20 ๐‘ = ๐‘๐‘ (๐‘ โˆ’ 1) + 2๐œ‘ 1/2

0 ๐‘ = ๐‘๐‘ 2 + (2๐œ‘ 1/20 โˆ’ ๐‘)๐‘ .

Expanding (๐‘๐‘ + ๐œ‘ 1/20 )2 = ๐‘2๐‘ 2 + 2๐‘๐œ‘ 1/2

0 ๐‘ + ๐œ‘0, we see that the claim for ๐œ‘๐‘ holds if๐‘ โ‰ฅ ๐‘2 and 2๐œ‘ 1/2

0 โˆ’ ๐‘ โ‰ฅ 2๐‘๐œ‘ 1/20 . Taking the latter with equality and solving for ๐‘ yields

๐‘ = 2๐œ‘ 1/20 /(1 + 2๐œ‘ 1/2

0 ) < 1 and hence also the former. Since this choice of ๐‘ does not dependon ๐‘ , the claim follows. ๏ฟฝ

Theorem 10.8 (accelerated and linearly convergent PDPS). Let ๐น0 : ๐‘‹ โ†’ โ„, ๐ธ : ๐‘‹ โ†’ โ„

and ๐บ : ๐‘Œ โ†’ โ„ be convex, proper, and lower semicontinuous with โˆ‡๐ธ Lipschitz continuouswith constant ๐ฟ > 0. Also let ๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ), and suppose the assumptions of Theorem 5.10 aresatised with ๐น โ‰” ๐น0 + ๐ธ. Pick initial step lengths ๐œ0, ๐œŽ0 > 0 subject to (10.24). For any initialiterate ๐‘ข0 โˆˆ ๐‘‹ ร— ๐‘Œ , suppose {๐‘ข๐‘˜+1 = (๐‘ฅ๐‘˜+1, ๐‘ฆ๐‘˜+1)}๐‘˜โˆˆโ„• are generated by (10.23).

(i) If ๐น0 is strongly convex with factor ๐›พ > 0, and we take

(10.25) ๐œ”๐‘˜ โ‰” 1/โˆš1 + 2๐›พ๐œ๐‘˜ , ๐œ๐‘˜+1 โ‰” ๐œ๐‘˜๐œ”๐‘˜ , and ๐œŽ๐‘˜+1 โ‰” ๐œŽ๐‘˜/๐œ”๐‘˜ ,

then โ€–๐‘ฅ๐‘ โˆ’ ๐‘ฅ โ€–2๐‘‹ โ†’ 0 at the rate ๐‘‚ (1/๐‘ 2).(ii) If both ๐น0 and๐บโˆ— are strongly convex with factor ๐›พ > 0 and ๐œŒ > 0, respectively, and we

take

(10.26) ๐œ”๐‘˜ โ‰” 1/โˆš1 + 2\, \ โ‰” min{๐œŒ๐œŽ0, ๐›พ๐œ0}, ๐œ๐‘˜ โ‰” ๐œ0 and ๐œŽ๐‘˜ โ‰” ๐œŽ0,

then โ€–๐‘ฅ๐‘ โˆ’ ๐‘ฅ โ€–2๐‘‹ + โ€–๐‘ฆ๐‘ โˆ’ ๐‘ฆ โ€–2๐‘Œ โ†’ 0 linearly.

Proof. Recalling Corollary 9.18, we write (10.23) in the form (10.14) by taking

๐ด(๐‘ข) โ‰”(๐œ•๐น0(๐‘ฅ)๐œ•๐บโˆ—(๐‘ฆ)

)+ ฮž๐‘ข, ๐ต(๐‘ข) โ‰”

(โˆ‡๐ธ (๐‘ฅ)0

), ฮž โ‰”

(0 ๐พโˆ—

โˆ’๐พ 0

),

๐‘Š๐‘˜+1 โ‰”(๐œ๐‘˜ Id 00 ๐œŽ๐‘˜+1Id

), and ๐‘€๐‘˜+1 โ‰”

(Id โˆ’๐œ๐‘˜๐พโˆ—

โˆ’๐œ”๐‘˜๐œŽ๐‘˜+1๐พ Id

).

As before, Theorem 5.10 guarantees that ๐ปโˆ’1(0) โ‰  โˆ…. For some primal and dual testingparameters ๐œ‘๐‘˜ ,๐œ“๐‘˜+1 > 0, we also take as our testing operator

(10.27) ๐‘๐‘˜+1 โ‰”(๐œ‘๐‘˜ Id 00 ๐œ“๐‘˜+1Id

).

142

10 rates of convergence by testing

By Examples 10.3 and 10.4, ๐ด is then ฮ“-strongly monotone with respect to ๐‘๐‘˜+1๐‘Š๐‘˜+1 and ๐ตis three-point monotone with respect to ๐‘๐‘˜+1๐‘Š๐‘˜+1 and ฮ› for

ฮ“ โ‰” ฮž +(๐›พ Id 00 ๐œŒId

), and ฮ› โ‰”

(๐ฟ Id 00 0

),

where ๐œŒ = 0 if ๐บโˆ— is not strongly convex.

We will apply Theorem 10.5. Taking ๐œ”๐‘˜ โ‰” ๐œŽโˆ’1๐‘˜+1๐œ“

โˆ’1๐‘˜+1๐œ‘๐‘˜๐œ๐‘˜ , we expand

(10.28) ๐‘๐‘˜+1๐‘€๐‘˜+1 =(๐œ‘๐‘˜ Id โˆ’๐œ‘๐‘˜๐œ๐‘˜๐พโˆ—

โˆ’๐œ‘๐‘˜๐œ๐‘˜๐พ ๐œ“๐‘˜+1Id

).

Thus ๐‘๐‘˜+1๐‘€๐‘˜+1 is self-adjoint as required. We still need to show that it is nonnegative andindeed grows at a rate that gives our claims. We also need to verify (10.17) and (10.18), whichexpand as ( (๐œ‘๐‘˜ (1 + 2๐›พ๐œ๐‘˜) โˆ’ ๐œ‘๐‘˜+1)Id (๐œ‘๐‘˜๐œ๐‘˜ + ๐œ‘๐‘˜+1๐œ๐‘˜+1)๐พโˆ—

(๐œ‘๐‘˜+1๐œ๐‘˜+1 โˆ’ 2๐œ“๐‘˜+1๐œŽ๐‘˜+1 โˆ’ ๐œ‘๐‘˜๐œ๐‘˜)๐พ (๐œ“๐‘˜+1(1 + 2๐œŒ๐œŽ๐‘˜+1) โˆ’๐œ“๐‘˜+2)Id)โ‰ฅ 0, and(10.29) (

๐œ‘๐‘˜ (1 โˆ’ ๐œ๐‘˜๐ฟ/2)Id โˆ’๐œ‘๐‘˜๐œ๐‘˜๐พโˆ—

โˆ’๐œ‘๐‘˜๐œ๐‘˜๐พ ๐œ“๐‘˜+1Id

)โ‰ฅ 0.(10.30)

We now proceed backward by deriving the step length rules as sucient conditions forthese two inequalities. First, clearly (10.29) holds if for all ๐‘˜ โˆˆ โ„• we can guarantee that

(10.31) ๐œ‘๐‘˜+1 โ‰ค ๐œ‘๐‘˜ (1 + 2๐›พ๐œ๐‘˜), ๐œ“๐‘˜+1 โ‰ค ๐œ“๐‘˜ (1 + 2๐œŒ๐œŽ๐‘˜), and ๐œ‘๐‘˜๐œ๐‘˜ = ๐œ“๐‘˜๐œŽ๐‘˜ .

We deal with (10.30) and the lower bounds on ๐‘๐‘˜+1๐‘€๐‘˜+1 in one go. By Youngโ€™s inequality,we have for any ๐›ฟ โˆˆ (0, 1) that

๐œ‘๐‘˜๐œ๐‘˜ ใ€ˆ๐พ๐‘ฅ, ๐‘ฆใ€‰ โ‰ค (1 โˆ’ ๐›ฟ)๐œ‘๐‘˜ โ€–๐‘ฅ โ€–2 + ๐œ‘๐‘˜๐œ2๐‘˜ (1 โˆ’ ๐›ฟ)โˆ’1โ€–๐พโˆ—๐‘ฆ โ€–2 (๐‘ฅ โˆˆ ๐‘‹, ๐‘ฆ โˆˆ ๐‘Œ ),hence recalling (10.28) also

(10.32) ๐‘๐‘˜+1๐‘€๐‘˜+1 โ‰ฅ(๐›ฟ๐œ‘๐‘˜ Id 00 ๐œ“๐‘˜+1Id โˆ’ ๐œ‘๐‘˜๐œ2๐‘˜ (1 โˆ’ ๐›ฟ)โˆ’1๐พ๐พโˆ—

).

The condition (10.30) is therefore satised and ๐‘๐‘˜+1๐‘€๐‘˜+1 โ‰ฅ Y๐‘๐‘˜+1 if (10.31) holds, and forsome Z > 0 both

(10.33) (1 โˆ’ Z )๐›ฟ๐œ‘๐‘˜ โ‰ฅ ๐œ‘๐‘˜๐œ๐‘˜๐ฟ/2 and ๐œ“๐‘˜+1 โ‰ฅ ๐œ‘๐‘˜๐œ2๐‘˜ (1 โˆ’ ๐›ฟ)โˆ’1โ€–๐พ โ€–2.By (10.31),๐œ“๐‘˜+1 โ‰ฅ ๐œ“๐‘˜ , so using also using the last part of (10.31), we see (10.33) to hold if

(10.34) (1 โˆ’ Z )๐›ฟ โ‰ฅ ๐œ๐‘˜๐ฟ/2 and 1 โˆ’ ๐›ฟ โ‰ฅ ๐œ๐‘˜๐œŽ๐‘˜ โ€–๐พ โ€–2.If we choose ๐œ๐‘˜ and ๐œŽ๐‘˜ such that their product stays constant (i.e., ๐œ๐‘˜๐œŽ๐‘˜ = ๐œŽ0๐œ0), it is thenoptimal to take ๐›ฟ โ‰” 1 โˆ’ ๐œŽ0๐œ0โ€–๐พ โ€–2, which has to be positive. Inserting this into the rst

143

10 rates of convergence by testing

part of (10.34), we see that it holds for some Z > 0 if ๐œ๐‘˜๐ฟ/2 < 1 โˆ’ ๐œŽ0๐œ0โ€–๐พ โ€–2. Since {๐œ๐‘˜}๐‘˜โˆˆโ„•is nonincreasing, we see that (10.34) and hence (10.30) is satised when the initializationcondition (10.24) holds.

To apply Theorem 10.5, all that remains is to verify (10.31) and that ๐œ๐‘˜๐œŽ๐‘˜ = ๐œ0๐œŽ0. To obtainconvergence rates, we need to further study the rate of increase of ๐œ‘๐‘˜ and๐œ“๐‘˜+1, which werecall that we wish to make as high as possible.

(i) If ๐›พ > 0 and ๐œŒ = 0, the best possible choice allowed by (10.31) is ๐œ“๐‘˜ โ‰ก ๐œ“0 and๐œ‘๐‘˜+1 = ๐œ‘๐‘˜ (1 + 2๐›พ๐œ๐‘˜) with ๐œŽ๐‘˜ = ๐œ‘๐‘˜๐œ๐‘˜/๐œ“0. Together with the condition ๐œ๐‘˜๐œŽ๐‘˜ = ๐œŽ0๐œ0,this forces ๐œŽ0๐œ0 = ๐œ‘๐‘˜๐œ2๐‘˜/๐œ“0. If we take๐œ“0 = 1/(๐œŽ0๐œ0), we thus need ๐œ๐‘˜ = ๐œ‘โˆ’1/2

๐‘˜. Since

๐œŽ๐‘˜+1 = ๐œŽ0๐œ0/๐œ๐‘˜+1 = 1/(๐œ“0๐œ๐‘˜+1), we obtain the relations

๐œ”๐‘˜ =๐œ‘๐‘˜๐œ๐‘˜

๐œŽ๐‘˜+1๐œ“๐‘˜+1=๐œ‘ 1/2๐‘˜

๐œ‘ 1/2๐‘˜+1

=1โˆš

1 + 2๐›พ๐œ๐‘˜,

which are satised for the choices of ๐œ”๐‘˜ , ๐œ๐‘˜+1, and ๐œŽ๐‘˜+1 in (10.25).

We now use Theorem 10.5 and Corollary 10.6 and (10.32) to obtain

๐›ฟ๐œ‘๐‘2 โ€–๐‘ฅ๐‘ โˆ’ ๐‘ฅ โ€–2๐‘‹ โ‰ค 1

2 โ€–๐‘ข๐‘ โˆ’ ๐‘ขโ€–2๐‘๐‘+1๐‘€๐‘+1 โ‰ค ๐ถ0 โ‰”

12 โ€–๐‘ข

0 โˆ’ ๐‘ขโ€–2๐‘1๐‘€1.

Although this does not tell us anything about the convergence of the dual iterates{๐‘ฆ๐‘ }๐‘โˆˆโ„• as๐œ“๐‘ โ‰ก ๐œ“ stays constant, Lemma 10.7 shows that the primal test ๐œ‘๐‘ growsat the rate ฮฉ(๐‘ 2) Hence we obtain the claimed convergence of the primal iterates atthe rate ๐‘‚ (1/๐‘ 2).

(ii) If ๐›พ > 0 and ๐œŒ > 0 and we take ๐œ๐‘˜ โ‰ก ๐œ0 and ๐œŽ๐‘˜ โ‰ก ๐œŽ0, the last condition of (10.31) forces๐œ“๐‘˜ = ๐œ‘๐‘˜๐œ0/๐œŽ0. Inserting this into the second condition yields ๐œ‘๐‘˜+1 โ‰ค ๐œ‘๐‘˜ (1 + 2๐œŒ๐œŽ0).Together with the rst condition, we therefore at best can take ๐œ‘๐‘˜+1 = ๐œ‘๐‘˜ (1 + 2\ ) for\ โ‰” min{๐œŒ๐œŽ0, ๐›พ๐œ0}. Reversing the roles of๐œ“ and ๐œ‘ , we see that we can at best take๐œ“๐‘˜+1 = ๐œ“๐‘˜ (1 + 2\ ). This leads to the relations

๐œ”๐‘˜ =๐œ‘๐‘˜๐œ0๐œŽ0๐œ“๐‘˜+1

=๐œ‘๐‘˜๐œ‘๐‘˜+1

=1

1 + 2\ ,

which are again satised by the respective choices in (10.26).

We nish the proof with Theorem 10.5 and Corollary 10.6, observing now from (10.32)that ๐‘๐‘๐‘€๐‘ โ‰ฅ ๐ถ (1 + 2\ )๐‘ Id for some ๐ถ > 0. ๏ฟฝ

Note that if๐›พ = 0 and ๐œŒ = 0, (10.31) forces ๐œ‘๐‘˜ โ‰ก ๐œ‘0 as well as๐œ“๐‘˜ โ‰ก ๐œ“0. If we take ๐œ‘๐‘˜ โ‰ก 1, thenwe also have to take ๐œ๐‘˜ = ๐œŽ๐‘˜๐œ“0. We can use this to dene๐œ“0 if we also x ๐œ๐‘˜ โ‰ก ๐œ0 and๐œŽ๐‘˜ โ‰ก ๐œŽ0.This also forces ๐œ”๐‘˜ โ‰ก 1. We thus again arrive at (10.31) as well as ๐œ๐‘˜๐œŽ๐‘˜ = ๐œŽ0๐œ0. However, wecannot obtain from this convergence rates for the iterates, merely boundedness and henceweak convergence as in Section 9.4.

144

11 GAPS AND ERGODIC RESULTS

We continue with the testing framework introduced in Chapter 10 for proving rates ofconvergence of iterates of optimization methods. This generally required strong convexity,which is not always available. In this chapter, we use the testing idea to derive convergencerates of objective function values and other, more general, gap functionals that indicatealgorithm convergence more indirectly than iterate convergence. This can be useful incases where we can only obtain weak convergence of iterates, but can obtain rates ofconvergence of such a gap functional. Nevertheless, this gap convergence often will onlybe ergodic, i.e., the estimates only apply to a weighted sum of the history of iterates insteadof the most recent iterate. In fact, we will rst derive ergodic estimates for all algorithms.If we can additionally show that the algorithm is monotonic with respect to this gap, wecan improve the ergodic estimate to the nonergodic ones as in the previous chapters.

11.1 gap functionals

We recall that one of the three fundamental ingredients in the convergence proofs ofChapter 9 was the monotonicity of ๐ป (with one of the points xed to a root ๐‘ฅ). We nowmodify this requirement to be able to prove estimates on the convergence of functionvalues when ๐ป = ๐œ•๐น for some proper, convex, and lower semicontinuous ๐น : ๐‘‹ โ†’ โ„. Inthis case, by the denition of the convex subdierential,

(11.1) ใ€ˆ๐œ•๐น (๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐น (๐‘ฅ๐‘˜+1) โˆ’ ๐น (๐‘ฅ) (๐‘ฅ โˆˆ ๐‘‹ ).

On the other hand, for an ๐ฟ-smooth functional ๐บ : ๐‘‹ โ†’ โ„, we can use the three-pointestimates of Corollary 7.2 to obtain

(11.2) ใ€ˆโˆ‡๐บ (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐บ (๐‘ฅ๐‘˜+1) โˆ’๐บ (๐‘ฅ) โˆ’ 12๐ฟ โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ (๐‘ฅ โˆˆ ๐‘‹ ).

These two inequalities are enough to obtain function value estimates for the more generalcase ๐ป = ๐œ•๐น + โˆ‡๐บ including a forward step with respect to ๐บ . We will produce suchestimates in Section 11.2.

145

11 gaps and ergodic results

generic gap functionals

More generally, when ๐ป does not directly arise from subdierentials or gradients buthas a more complicated structure, we introduce several gap functionals. We identied inChapter 9 that for some lifted functionals ๐น and ๏ฟฝ๏ฟฝ and a skew-adjoint operator ฮž = โˆ’ฮžโˆ—,the unaccelerated PDPS, PDES, and DRS consist in taking ๐ป = ๐œ•๐น +โˆ‡๏ฟฝ๏ฟฝ +ฮž and iterating

(11.3) 0 โˆˆ ๐œ•๏ฟฝ๏ฟฝ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜) + ฮž๐‘ฅ๐‘˜+1 +๐‘€ (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜),

where the skew-adjoint operator ฮž does not arise as a subdierential of any function. Work-ing with this requires extra eort, especially when we later study accelerated methods.

Note that by the skew-adjointness of ฮž, we have ใ€ˆฮž๐‘ฅ, ๐‘ฅใ€‰๐‘‹ = 0. Using this and the estimates(11.1) and (11.2) on ๐น and ๏ฟฝ๏ฟฝ , we obtain for the basic unaccelerated scheme (11.3) the estimate

ใ€ˆ๐œ•๏ฟฝ๏ฟฝ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜) + ฮž๐‘ฅ๐‘˜+1, ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ G(๐‘ฅ ;๐‘ฅ) โˆ’ 12๐ฟ โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹

with the generic gap functional

(11.4) G(๐‘ฅ ;๐‘ฅ) โ‰” (๏ฟฝ๏ฟฝ + ๐น ) (๐‘ฅ) โˆ’ (๏ฟฝ๏ฟฝ + ๐น ) (๐‘ฅ) + ใ€ˆฮž๐‘ฅ, ๐‘ฅใ€‰๐‘‹ .

In the next lemma, we collect some elementary properties of this functional. Note thatG(๐‘ฅ, ๐‘ง) = 0 is possible even for ๐‘ฅ โ‰  ๐‘ง.

Lemma 11.1. Let ๐ป โ‰” ๐œ•๐น + โˆ‡๏ฟฝ๏ฟฝ + ฮž, where ฮž โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) is skew-adjoint and ๏ฟฝ๏ฟฝ : ๐‘‹ โ†’ โ„ and๐น : ๐‘‹ โ†’ โ„ are convex, proper, and lower semicontinuous. If ๐‘ฅ โˆˆ ๐ปโˆ’1(0), then G( ยท ;๐‘ฅ) โ‰ฅ 0and G(๐‘ฅ ;๐‘ฅ) = 0.

Proof. We rst note that ๐‘ฅ โˆˆ ๐ปโˆ’1(0) is equivalent to โˆ’ฮž๐‘ฅ โˆˆ ๐œ•(๐น + ๏ฟฝ๏ฟฝ) (๐‘ฅ). Hence usingthe denition of the convex subdierential and the fact that ใ€ˆฮž๐‘ฅ, ๐‘ฅใ€‰๐‘‹ = 0 due to theskew-adjointness of ฮž, we deduce for arbitrary ๐‘ฅ โˆˆ ๐‘‹ that

(๐น + ๏ฟฝ๏ฟฝ) (๐‘ฅ) โˆ’ (๐น + ๏ฟฝ๏ฟฝ) (๐‘ฅ) โ‰ฅ ใ€ˆโˆ’ฮž๐‘ฅ, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ = ใ€ˆโˆ’ฮž๐‘ฅ, ๐‘ฅใ€‰๐‘‹ ,

i.e., G(๐‘ฅ, ๐‘ฅ) โ‰ฅ 0. The fact that G(๐‘ฅ, ๐‘ฅ) = 0 follows immediately from the skew-adjointnessof ฮž. ๏ฟฝ

The function value estimates (11.1) and (11.2) โ€“ unlike simple monotonicity-based nonnega-tivity estimates โ€“ do not depend on ๐‘ฅ being a root of ๐ป . Therefore, taking any bounded set๐ต โŠ‚ ๐‘‹ such that ๐ปโˆ’1(0) โˆฉ ๐ต โ‰  โˆ…, we see that the partial gap

G(๐‘ฅ ;๐ต) โ‰” sup๐‘ฅโˆˆ๐ต

G(๐‘ฅ ;๐‘ฅ)

also satises G( ยท ;๐ต) โ‰ฅ 0.

146

11 gaps and ergodic results

the lagrangian duality gap

Let us now return to the problem

(11.5) min๐‘ฅโˆˆ๐‘‹

๐น (๐‘ฅ) +๐บ (๐พ๐‘ฅ),

where we split ๐น = ๐น0 + ๐ธ assuming ๐ธ to have a Lipschitz-continuous gradient. With thenotation ๐‘ข = (๐‘ฅ, ๐‘ฆ), we recall that Theorem 5.10 guarantees the existence of a primal-dualsolution ๐‘ข whenever its conditions are satised. This, we further recall, can be written as0 โˆˆ ๐ป (๐‘ข) for

(11.6a) ๐ป (๐‘ข) โ‰”(๐œ•๐น (๐‘ฅ) + ๐พโˆ—๐‘ฆ๐œ•๐บโˆ—(๐‘ฆ) โˆ’ ๐พ๐‘ฅ

).

As we have already seen in, e.g., Theorem 10.8, we can express this choice of ๐ป in thepresent framework with

๐น (๐‘ข) โ‰” ๐น0(๐‘ฅ) +๐บ (๐‘ฆ), ๏ฟฝ๏ฟฝ (๐‘ข) โ‰” ๐ธ (๐‘ฅ), and ฮž โ‰”

(0 ๐พโˆ—

โˆ’๐พ 0

).(11.6b)

With this, the generic gap functional G from (11.4) becomes the Lagrangian duality gap

(11.7) G๐ฟ (๐‘ข;๐‘ข) โ‰”(๐น (๐‘ฅ) + ใ€ˆ๐‘ฆ, ๐พ๐‘ฅใ€‰ โˆ’๐บโˆ—(๐‘ฆ)) โˆ’ (

๐น (๐‘ฅ) + ใ€ˆ๐‘ฆ, ๐พ๐‘ฅใ€‰ โˆ’๐บโˆ—(๐‘ฆ)) โ‰ค G(๐‘ข),where

G(๐‘ข) โ‰” ๐น (๐‘ฅ) +๐บ (๐พ๐‘ฅ) + ๐น โˆ—(โˆ’๐พ๐‘ฆ) +๐บโˆ—(๐‘ฆ)is the real duality gap, cf. (5.16). As Corollary 5.13 shows, when its conditions are satisedand ๐‘ข = ๐‘ข โˆˆ ๐ปโˆ’1(0), the Lagrangian duality gap is nonnegative.

Since (11.1) and (11.2) do not depend on ๐‘ฅ being a root of ๐ป , convergence results for theLagrangian duality gap can sometimes be improved slightly by taking any bounded set๐ต โŠ‚ ๐‘‹ ร— ๐‘Œ such that ๐ต โˆฉ ๐ปโˆ’1(0) โ‰  โˆ… and dening the partial duality gap

(11.8) G(๐‘ข;๐ต) โ‰” sup๐‘ขโˆˆ๐ต

G๐ฟ (๐‘ข;๐‘ข).

This satises 0 โ‰ค G(๐‘ข;๐ต) โ‰ค G(๐‘ข). Moreover, by the denition of ๐น โˆ— and๐บโˆ—โˆ— = ๐บ , we haveG(๐‘ข;๐‘‹ ร— ๐‘Œ ) = G(๐‘ข), which explains both the importance of partial duality gaps and theterm โ€œpartial gapโ€.

bregman divergences and gap functionals

Although we will not need this in the following, we briey discuss a possible extension toBanach spaces. Let ๐ฝ : ๐‘‹ โ†’ โ„ be convex on a Banach space ๐‘‹ . Then for ๐‘ฅ โˆˆ dom ๐ฝ and๐‘ โˆˆ ๐œ•๐ฝ (๐‘ฅ), one can dene the asymmetric Bregman divergence (or distance)

๐ต๐‘๐ฝ (๐‘ง, ๐‘ฅ) โ‰” ๐ฝ (๐‘ง) โˆ’ ๐ฝ (๐‘ฅ) โˆ’ ใ€ˆ๐‘, ๐‘ง โˆ’ ๐‘ฅใ€‰๐‘‹ , (๐‘ฅ โˆˆ ๐‘‹ ).

147

11 gaps and ergodic results

Due to the denition of the convex subdierential, this is nonnegative. It is also possible tosymmetrize the distance by considering ๏ฟฝ๏ฟฝ ๐ฝ (๐‘ฅ, ๐‘ง) โ‰” ๐ต

๐‘ž๐ฝ (๐‘ฅ, ๐‘ง) + ๐ต

๐‘๐ฝ (๐‘ง, ๐‘ฅ) with ๐‘ž โˆˆ ๐œ•๐ฝ (๐‘ง) and

๐‘ง โˆˆ dom ๐ฝ , but even the symmetrized divergence is not generally a true distance as it canhappen that ๐ต ๐ฝ (๐‘ฅ, ๐‘ง) = 0 even if ๐‘ฅ โ‰  ๐‘ง.

The Bregman divergence satises a three-point identity for any ๐‘ฅ โˆˆ dom ๐ฝ : We have

๐ต๐‘๐ฝ (๐‘ฅ, ๐‘ฅ) โˆ’ ๐ต

๐‘๐ฝ (๐‘ฅ, ๐‘ง) + ๐ต

๐‘ž๐ฝ (๐‘ฅ, ๐‘ง) = [๐ฝ (๐‘ฅ) โˆ’ ๐ฝ (๐‘ฅ) โˆ’ ใ€ˆ๐‘, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ ] โˆ’ [๐ฝ (๐‘ฅ) โˆ’ ๐ฝ (๐‘ง) โˆ’ ใ€ˆ๐‘ž, ๐‘ฅ โˆ’ ๐‘งใ€‰๐‘‹ ]

+ [๐ฝ (๐‘ฅ) โˆ’ ๐ฝ (๐‘ง) โˆ’ ใ€ˆ๐‘ž, ๐‘ฅ โˆ’ ๐‘งใ€‰๐‘‹ ],

which immediately gives the three-point identity

(11.9) ใ€ˆ๐‘ โˆ’๐‘ž, ๐‘ฅ โˆ’๐‘ฅใ€‰๐‘‹ = ๐ต๐‘๐ฝ (๐‘ฅ, ๐‘ฅ) โˆ’๐ต๐‘ž๐ฝ (๐‘ฅ, ๐‘ง) +๐ต

๐‘ž๐ฝ (๐‘ฅ, ๐‘ง) (๐‘ฅ, ๐‘ฅ, ๐‘ง โˆˆ ๐‘‹, ๐‘ โˆˆ ๐œ•๐ฝ (๐‘ง), ๐‘ž โˆˆ ๐œ•๐ฝ (๐‘ฅ)) .

If๐‘‹ is a Hilbert space,we can take ๐ฝ (๐‘ฅ) = 12 โ€–๐‘ฅ โ€–2 to obtain ๐ต๐‘ฅโˆ’๐‘ง๐ฝ (๐‘ง, ๐‘ฅ) = ๏ฟฝ๏ฟฝ ๐ฝ (๐‘ง, ๐‘ฅ) = 1

2 โ€–๐‘งโˆ’๐‘ฅ โ€–2๐‘‹ .Therefore this three-point identity generalizes the classical three-point identity (9.1) inHilbert spaces. This could be used to generalize our convergence proofs to Banach spacesto treat methods of the general form

0 โˆˆ ๐ป (๐‘ฅ๐‘˜+1) + ๐œ•1๐ต๐‘ž๐‘˜

๐ฝ (๐‘ฅ๐‘˜+1, ๐‘ฅ๐‘˜),

where ๐œ•1 denotes taking a subdierential with respect to the rst variable. To see how (11.9)applies, observe that

๐œ•1๐ต๐‘ž๐‘˜

๐ฝ (๐‘ฅ๐‘˜+1, ๐‘ฅ๐‘˜) = ๐œ•๐ฝ (๐‘ฅ๐‘˜+1) โˆ’ ๐‘ž๐‘˜ = {๐‘๐‘˜+1 โˆ’ ๐‘ž๐‘˜ | ๐‘ž๐‘˜+1 โˆˆ ๐œ•๐ฝ (๐‘ฅ๐‘˜+1)}.

This would, however, not provide convergence in norm but with respect to ๐ต ๐ฝ . For a generalapproach to primal-dual methods based on Bregman divergences, see [Valkonen 2020a].

Returning to our generic gap functional G dened in (11.4), we have already observed inthe proof of Lemma 11.1 that โˆ’ฮž๐‘ฅ โˆˆ ๐œ•(๐น + ๏ฟฝ๏ฟฝ) (๐‘ฅ). Since due to the skew-adjointness of ฮžwe also have ใ€ˆฮž๐‘ฅ, ๐‘ฅใ€‰๐‘‹ = ใ€ˆฮž๐‘ฅ, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ for a solution ๐‘ฅ โˆˆ ๐ปโˆ’1(0), this means that

G(๐‘ฅ, ๐‘ฅ) = ๐ตโˆ’ฮž๐‘ฅ๏ฟฝ๏ฟฝ+๐น (๐‘ฅ, ๐‘ฅ).

In other words, the gap based at a solution ๐‘ฅ โˆˆ ๐ปโˆ’1(0) is also a Bregman divergence. Ingeneral, as we have already remarked, it can be zero for ๐‘ฅ โ‰  ๐‘ฅ .

11.2 convergence of function values

We start with the fundamental algorithms: the proximal point method and explicit splitting.In the following, we write ๐บmin โ‰” min๐‘ฅโˆˆ๐‘‹ ๐บ (๐‘ฅ) whenever the minimum exists.

148

11 gaps and ergodic results

Theorem 11.2 (proximal pointmethod ergodic function value). Let๐บ be proper, lower semicon-tinuous, and (strongly) convex with factor ๐›พ โ‰ฅ 0. Suppose [๐œ•๐บ]โˆ’1(0) โ‰  โˆ…. Pick an arbitrary๐‘ฅ0 โˆˆ ๐‘‹ . Let ๐œ‘๐‘˜+1 โ‰” ๐œ‘๐‘˜ (1 + ๐›พ๐œ๐‘˜), and ๐œ‘0 โ‰” 1. For the iterates ๐‘ฅ๐‘˜+1 โ‰” prox๐œ๐‘˜๐บ (๐‘ฅ๐‘˜) of theproximal point method, dene the โ€œergodic sequenceโ€

(11.10) ๐‘ฅ๐‘ โ‰”1Z๐‘

๐‘โˆ’1โˆ‘๐‘˜=0

๐œ๐‘˜๐œ‘๐‘˜๐‘ฅ๐‘˜+1 for Z๐‘ โ‰”

๐‘โˆ’1โˆ‘๐‘˜=0

๐œ๐‘˜๐œ‘๐‘˜ (๐‘ โ‰ฅ 1).

(i) If ๐œ๐‘˜ โ‰ก ๐œ > 0 and ๐บ is not strongly convex (๐›พ = 0), then ๐บ (๐‘ฅ๐‘ ) โ†’ ๐บmin at the rate๐‘‚ (1/๐‘ ).

(ii) If ๐œ๐‘˜ โ‰ก ๐œ > 0 and ๐บ is strongly convex (๐›พ > 0), then ๐บ (๐‘ฅ๐‘ ) โ†’ ๐บmin linearly.

(iii) If ๐œ๐‘˜ โ†’โˆž and ๐บ is strongly convex, then ๐บ (๐‘ฅ๐‘ ) โ†’ ๐บmin superlinearly.

Proof. Let the root ๐‘ฅ โˆˆ [๐œ•๐บ]โˆ’1(0) be arbitrary; by assumption at least one exists. Then๐บmin = ๐บ (๐‘ฅ) by Theorem 4.2. We recall that the proximal point iteration for minimizing ๐บcan be written as

(11.11) 0 โˆˆ ๐œ๐‘˜๐œ•๐บ (๐‘ฅ๐‘˜+1) + (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜).

As in the proof of Theorem 9.4, we test (11.11) by the application of ๐œ‘๐‘˜ ใ€ˆ ยท , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ forsome testing parameter ๐œ‘๐‘˜ > 0 to obtain

(11.12) 0 โˆˆ ๐œ‘๐‘˜๐œ๐‘˜ ใ€ˆ๐œ•๐บ (๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ + ๐œ‘๐‘˜ ใ€ˆ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ .

The next step will dier from the proof of Theorem 9.4, as we want a value estimate. Indeed,by the subdierential characterization of strong convexity, Lemma 7.4 (ii),

ใ€ˆ๐œ•๐บ (๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐บ (๐‘ฅ๐‘˜+1) โˆ’๐บ (๐‘ฅ) + ๐›พ2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ .

Using this and the three-point-identity (9.1) in (11.12), we obtain similarly to the proof ofTheorem 10.1 the estimate

(11.13) ๐œ‘๐‘˜ (1 + ๐œ๐‘˜๐›พ)2 โ€–๐‘ฅ๐‘˜+1 โˆ’๐‘ฅ โ€–2๐‘‹ +๐œ‘๐‘˜๐œ๐‘˜ [๐บ (๐‘ฅ๐‘˜+1) โˆ’๐บ (๐‘ฅ)] + ๐œ‘๐‘˜2 โ€–๐‘ฅ๐‘˜+1 โˆ’๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ค ๐œ‘๐‘˜2 โ€–๐‘ฅ๐‘˜ โˆ’๐‘ฅ โ€–2๐‘‹ .

We now impose the recursion

(11.14) ๐œ‘๐‘˜ (1 + ๐œ๐‘˜๐›พ) = ๐œ‘๐‘˜+1.

(Observe the factor-of-two dierence compared to (10.5).) Thus

(11.15) ๐œ‘๐‘˜+12 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐œ‘๐‘˜๐œ๐‘˜ [๐บ (๐‘ฅ๐‘˜+1) โˆ’๐บ (๐‘ฅ)] + ๐œ‘๐‘˜2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ค ๐œ‘๐‘˜

2 โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ .

149

11 gaps and ergodic results

Summing over ๐‘˜ = 0, . . . , ๐‘ โˆ’ 1 then yields

(11.16) ๐œ‘๐‘2 โ€–๐‘ฅ๐‘โˆ’๐‘ฅ โ€–2๐‘‹+๐‘โˆ’1โˆ‘๐‘˜=0

๐œ‘๐‘˜๐œ๐‘˜ [๐บ (๐‘ฅ๐‘˜+1)โˆ’๐บ (๐‘ฅ)]+๐‘โˆ’1โˆ‘๐‘˜=0

๐œ‘๐‘˜2 โ€–๐‘ฅ๐‘˜+1โˆ’๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ค ๐œ‘0

2 โ€–๐‘ฅ0โˆ’๐‘ฅ โ€–2๐‘‹ =: ๐ถ0.

Using Jensenโ€™s inequality, it follows for the ergodic sequence dened in (11.10) that

Z๐‘ [๐บ (๐‘ฅ๐‘ ) โˆ’๐บ (๐‘ฅ)] โ‰ค ๐ถ0.

If ๐œ‘๐‘˜ โ‰ก ๐œ‘0 and ๐›พ = 0, we therefore have that Z๐‘ = ๐‘๐œ‘0๐œ and thus obtain ๐‘‚ (1/๐‘ ) conver-gence of function values for the ergodic variable ๐‘ฅ๐‘ .

If ๐œ‘๐‘˜ โ‰ก ๐œ‘0 and ๐›พ > 0, we deduce from (11.14) that Z๐‘ =โˆ‘๐‘โˆ’1๐‘˜=0 (1 + ๐›พ๐œ๐‘˜)๐‘˜๐œ๐‘˜๐œ‘0. This grows

exponentially and hence we obtain the claimed linear convergence.

Finally, if ๐œ๐‘˜ โ†’ โˆž, we would similarly to Theorem 10.1 (ii) obtain superlinear convergenceif Z๐‘ /Z๐‘+1 โ†’ 0 were to hold. To show this, we can write

Z๐‘Z๐‘+1

=

โˆ‘๐‘โˆ’1๐‘˜=0 ๐œ‘๐‘˜๐œ๐‘˜โˆ‘๐‘๐‘˜=0 ๐œ‘๐‘˜๐œ๐‘˜

=

โˆ‘๐‘โˆ’1๐‘˜=0

๐œ‘๐‘˜๐œ๐‘˜๐œ‘๐‘ ๐œ๐‘

1 + โˆ‘๐‘โˆ’1๐‘˜=0

๐œ‘๐‘˜๐œ๐‘˜๐œ‘๐‘ ๐œ๐‘

So it suces to show that ๐‘๐‘ โ‰”โˆ‘๐‘โˆ’1๐‘˜=๐‘˜0

๐œ‘๐‘˜๐œ๐‘˜๐œ‘๐‘ ๐œ๐‘

โ†’ 0 as ๐‘ โ†’ โˆž. This we obtain by estimating

๐‘๐‘ =๐‘โˆ’1โˆ‘๐‘˜=0

๐œ๐‘˜/๐œ๐‘โˆ๐‘โˆ’1๐‘—=๐‘˜ (1 + ๐›พ๐œ ๐‘— )

โ‰ค๐‘โˆ’1โˆ‘๐‘˜=0

(1 + ๐›พ๐œ๐‘˜)/(1 + ๐›พ๐œ๐‘ )โˆ๐‘โˆ’1๐‘—=๐‘˜ (1 + ๐›พ๐œ ๐‘— )

=๐‘โˆ’1โˆ‘๐‘˜=0

1โˆ๐‘๐‘—=๐‘˜+1(1 + ๐›พ๐œ ๐‘— )

โ‰ค๐‘โˆ’1โˆ‘๐‘˜=0

(1 + ๐›พ๐œ๐‘˜+1)โˆ’(๐‘โˆ’๐‘˜) .

In the rst and last step we have used that {๐œ๐‘˜}๐‘˜โˆˆโ„• is increasing. Now we pick ๐‘Ž > 1 andnd ๐‘˜0 โˆˆ โ„• such that 1 + ๐›พ๐œ๐‘˜ โ‰ฅ ๐‘Ž for ๐‘˜ โ‰ฅ ๐‘˜0. Then for ๐‘ > ๐‘˜0,

๐‘๐‘ โ‰ค๐‘˜0โˆ’1โˆ‘๐‘˜=0

(1 + ๐›พ๐œ๐‘˜+1)โˆ’(๐‘โˆ’๐‘˜) +๐‘โˆ’1โˆ‘๐‘˜=๐‘˜0

๐‘Žโˆ’(๐‘โˆ’๐‘˜) =๐‘˜0โˆ’1โˆ‘๐‘˜=0

(1 + ๐›พ๐œ๐‘˜+1)โˆ’(๐‘โˆ’๐‘˜) +๐‘โˆ’๐‘˜0โˆ‘๐‘—=1

๐‘Žโˆ’ ๐‘— .

The rst term goes to zero as ๐‘ โ†’ โˆž while the second term, as a geometric series,converges to ๐‘Žโˆ’1/(1 โˆ’ ๐‘Žโˆ’1). We therefore deduce that lim๐‘โ†’โˆž ๐‘๐‘ โ‰ค ๐‘Žโˆ’1/(1 โˆ’ ๐‘Žโˆ’1). Letting๐‘Ž โ†’ โˆž, we see that ๐‘๐‘โ†’ 0. ๏ฟฝ

It is possible to improve the result to be nonergodic by showing that the proximal pointmethod is in fact monotonic.

Corollary 11.3 (proximal point method function value). The proximal point method is mono-tonic, i.e.,๐บ (๐‘ฅ๐‘˜+1) โ‰ค ๐บ (๐‘ฅ๐‘˜) for all ๐‘˜ โˆˆ โ„•. Therefore the convergence rates of Theorem 11.2 alsohold for ๐บ (๐‘ฅ๐‘ ) โ†’ ๐บmin.

150

11 gaps and ergodic results

Proof. We know from (11.11) that

0 โ‰ค ๐œโˆ’1๐‘˜ โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ = ใ€ˆ๐œ•๐บ (๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐‘˜+1ใ€‰๐‘‹ โ‰ค ๐บ (๐‘ฅ๐‘˜) โˆ’๐บ (๐‘ฅ๐‘˜+1).

This proves monotonicity. Now (11.16) gives

Z๐‘ [๐บ (๐‘ฅ๐‘ ) โˆ’๐บ (๐‘ฅ)] โ‰ค ๐ถ0.

Now we proceed using the growth estimates for Z๐‘ in the proof of Theorem 11.2. ๏ฟฝ

These results can be extended to the explicit splitting method,

๐‘ฅ๐‘˜+1 โ‰” prox๐œ๐บ (๐‘ฅ๐‘˜ โˆ’ ๐œโˆ‡๐น (๐‘ฅ๐‘˜)),

in a straightforward manner. In the next theorem, observe in comparison to Theorem 10.2that ๐œ๐ฟ โ‰ค 1 instead of ๐œ๐ฟ โ‰ค 2. This kind of factor-of-two stricter step length or Lipschitzfactor bound is a general feature of function value estimates of methods involving anexplicit step, as well as of the gap estimates in the following sections. It stems from thecorresponding dierence between the value estimate (7.8) and the non-value estimate (7.9)in Corollary 7.2.

Theorem 11.4 (explicit spliing function value). Let ๐ฝ โ‰” ๐น + ๐บ where ๐บ : ๐‘‹ โ†’ โ„ and๐น : ๐‘‹ โ†’ โ„ are convex, proper, and lower semicontinuous, with ๐น moreover ๐ฟ-smooth. Suppose[๐œ•๐ฝ ]โˆ’1(0) โ‰  โˆ…. If ๐œ๐ฟ โ‰ค 1, the explicit splitting method satises both ๐ฝ (๐‘ฅ๐‘ ) โ†’ ๐ฝmin at the rate๐‘‚ (1/๐‘ ). If ๐บ is strongly convex, then this convergence is linear.

Proof. With ๐œ๐‘˜ โ‰” ๐œ , as usual, we write the method as

(11.17) 0 โˆˆ ๐œ๐‘˜ [๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜)] + (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜).

We then take arbitrary ๐‘ฅ โˆˆ [๐œ•(๐น + ๐บ)]โˆ’1(0) and use the three-point smoothness of ๐นproved in Corollary 7.2, and the subdierential characterization of strong convexity of ๐บ ,Lemma 7.4 (ii), to obtain

ใ€ˆ๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐ฝ (๐‘ฅ๐‘˜+1) โˆ’ ๐ฝ (๐‘ฅ) + ๐›พ2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ ๐ฟ

4 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ .

As in the proof of Theorem 11.2, after testing (11.17) by the application of ๐œ‘๐‘˜ ใ€ˆ ยท , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ ,we now obtain

(11.18) ๐œ‘๐‘˜+12 โ€–๐‘ฅ๐‘˜+1โˆ’๐‘ฅ โ€–2๐‘‹ +๐œ‘๐‘˜๐œ๐‘˜ [๐ฝ (๐‘ฅ๐‘˜+1) โˆ’ ๐ฝ (๐‘ฅ)] +๐œ‘๐‘˜ (1 โˆ’ ๐œ๐‘˜๐ฟ)

2 โ€–๐‘ฅ๐‘˜+1โˆ’๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ค ๐œ‘๐‘˜2 โ€–๐‘ฅ๐‘˜ โˆ’๐‘ฅ โ€–2๐‘‹ .

Since ๐œ๐‘˜๐ฟ โ‰ค 1, we may proceed as in Theorem 11.2 to prove the ergodic convergences.

๏ฟฝ

151

11 gaps and ergodic results

Again, we can show nonergodic convergence due to the monotonicity of the iteration.

Corollary 11.5. The convergence rates of Theorem 11.4 also hold for ๐ฝ (๐‘ฅ๐‘ ) โ†’ ๐ฝmin.

Proof. We obtain from (11.17) and the smoothness of ๐น (see (7.5)) that

๐œโˆ’1๐‘˜ โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ = ใ€ˆ๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐‘˜+1ใ€‰๐‘‹ โ‰ค ๐ฝ (๐‘ฅ๐‘˜) โˆ’ ๐ฝ (๐‘ฅ๐‘˜+1) + ๐ฟ2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ .

Since ๐ฟ๐œ๐‘˜ โ‰ค 1 < 2, we obtain monotonicity. The rest now follows as in Theorem 11.2and Corollary 11.3. ๏ฟฝ

Remark 11.6. Based on Corollary 7.7, any strong convexity of ๐น can also be used to obtain linearconvergence by adapting the steps of the proof of Theorem 11.4.

11.3 ergodic gap estimates

We now study the convergence of gap functionals for general unaccelerated schemes ofthe form (11.3). Since ๏ฟฝ๏ฟฝ may in general not have the same factor ๐ฟ of smoothness on allsubspaces, we introduce the condition (11.19) of the next result. It is simply a version ofthe standard result of Corollary 7.2 that allows a block-separable structure through theoperator ฮ› in place of the factor ๐ฟ; compare Example 10.4.

Theorem 11.7. Let ๐ป โ‰” ๐œ•๐น + โˆ‡๏ฟฝ๏ฟฝ + ฮž, where ฮž โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) is skew-adjoint and ๏ฟฝ๏ฟฝ : ๐‘‹ โ†’ โ„

and ๐น : ๐‘‹ โ†’ โ„ are convex, proper, and lower semicontinuous. Suppose ๐น satises for someฮ› โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) the three-point smoothness condition

(11.19) ใ€ˆโˆ‡๐น (๐‘ง), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ) โˆ’ 12 โ€–๐‘ง โˆ’ ๐‘ฅ โ€–

2ฮ› (๐‘ฅ, ๐‘ฅ, ๐‘ง โˆˆ ๐‘‹ ).

Also let ๐‘€ โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) be positive semi-denite and self-adjoint. Pick ๐‘ฅ0 โˆˆ ๐‘‹ , and let thesequence {๐‘ฅ๐‘˜+1}๐‘˜โˆˆโ„• be generated through the iterative solution of (11.3). Then for every ๐‘ฅ โˆˆ ๐‘‹ ,

(11.20) 12 โ€–๐‘ฅ

๐‘ โˆ’ ๐‘ฅ โ€–2๐‘๐‘€ +๐‘โˆ’1โˆ‘๐‘˜=0

(G(๐‘ฅ๐‘˜+1;๐‘ฅ) + 1

2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘€โˆ’ฮ›

)โ‰ค 1

2 โ€–๐‘ฅ1 โˆ’ ๐‘ฅ โ€–2๐‘๐‘€ .

Proof. Observe that (11.19) implies

(11.21) ใ€ˆโˆ‡๐น (๐‘ง), ๐‘ฅ โˆ’ ๐‘ฅใ€‰ โ‰ฅ ๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ) โˆ’ 12 โ€–๐‘ง โˆ’ ๐‘ฅ โ€–

2ฮ› (๐‘ฅ, ๐‘ง โˆˆ ๐‘‹ ).

Likewise, by the convexity of ๐น we have

(11.22) ใ€ˆ๐œ•๏ฟฝ๏ฟฝ (๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰ โ‰ฅ ๏ฟฝ๏ฟฝ (๐‘ฅ) โˆ’ ๏ฟฝ๏ฟฝ (๐‘ฅ) (๐‘ฅ โˆˆ ๐‘‹ ).

152

11 gaps and ergodic results

Using (11.21) and (11.22), we obtain

(11.23) ใ€ˆ๐œ•๏ฟฝ๏ฟฝ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜) + ฮž๐‘ฅ๐‘˜+1, ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰โ‰ฅ (๏ฟฝ๏ฟฝ + ๐น ) (๐‘ฅ๐‘˜+1) โˆ’ (๏ฟฝ๏ฟฝ + ๐น ) (๐‘ฅ) + ใ€ˆฮž๐‘ฅ๐‘˜+1, ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ 1

2 โ€–๐‘ง โˆ’ ๐‘ฅ โ€–2ฮ›

= G(๐‘ฅ๐‘˜+1;๐‘ฅ) โˆ’ 12 โ€–๐‘ง โˆ’ ๐‘ฅ โ€–

2ฮ›.

In the nal stepwe have also referred to the denition of G in (11.4) and the skew-adjointnessof ฮž.

From here on, our arguments are already standard: We test (11.3) through the applicationof ใ€ˆ ยท , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰, obtaining

0 โˆˆ ใ€ˆ๐œ•๏ฟฝ๏ฟฝ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜) + ฮž๐‘ฅ๐‘˜+1 +๐‘€ (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰.

Then we insert (11.23), which gives

12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘€ + G(๐‘ฅ๐‘˜+1;๐‘ฅ) + 12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘€โˆ’ฮ› โ‰ค 12 โ€–๐‘ฅ

๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘€ .

Summing over ๐‘˜ = 0, . . . , ๐‘ โˆ’ 1 yields (11.20). ๏ฟฝ

In particular, we obtain the following corollary that shows that G(๐‘ฅ๐‘ ;๐‘ฅ) โ†’ G(๐‘ฅ ;๐‘ฅ) = 0at the rate ๐‘‚ (1/๐‘ ) for any ๐‘ฅ โˆˆ ๐ปโˆ’1(0). Even further, taking any bounded set ๐ต โŠ‚ ๐‘‹ suchthat ๐ปโˆ’1(0) โˆฉ ๐ต โ‰  โˆ…, we see that also the partial gap G(๐‘ฅ๐‘ ;๐ต) โ†’ G(๐‘ฅ ;๐ต) = 0.

Corollary 11.8. In Theorem 11.7, suppose in addition that ๐‘€ โ‰ฅ ฮ› and dene the ergodicsequence

๐‘ฅ๐‘ โ‰”1๐‘

๐‘โˆ’1โˆ‘๐‘˜=0

๐‘ฅ๐‘˜+1.

ThenG(๐‘ฅ๐‘ ;๐‘ฅ) โ‰ค 1

2๐‘ โ€–๐‘ฅ 1 โˆ’ ๐‘ฅ โ€–2๐‘€ .

Proof. This follows immediately from using๐‘€ โ‰ฅ ฮ› to eliminate the term 12 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘€โˆ’ฮ›

from (11.20) and then using Jensenโ€™s inequality on the gap. ๏ฟฝ

Due to the presence of ฮž, we cannot in general prove monotonicity of the abstract proximalpoint method and thus get rid of the ergodicity of the estimates.

153

11 gaps and ergodic results

implicit splitting

We now consider the solution of

min๐‘ฅโˆˆ๐‘‹

๐น (๐‘ฅ) +๐บ (๐‘ฅ).

Setting ๐ต = ๐œ•๐น and ๐ด = ๐œ•๐บ , (9.16), the Douglasโ€“Rachford or implicit splitting method canbe written in the general form (11.3) with ๐‘ข = (๐‘ฅ, ๐‘ฆ, ๐‘ง),

๏ฟฝ๏ฟฝ (๐‘ข) โ‰” ๐œ๐บ (๐‘ฆ) + ๐œ๐น (๐‘ฅ), ๐น โ‰ก 0,

ฮž โ‰”ยฉยญยซ0 Id โˆ’Idโˆ’Id 0 IdId โˆ’Id 0

ยชยฎยฌ , and ๐‘€ โ‰”ยฉยญยซ0 0 00 0 00 0 ๐ผ

ยชยฎยฌ .Moreover,

(11.24) ๐ป (๐‘ข) โ‰” ๐œ•๏ฟฝ๏ฟฝ (๐‘ข) + ฮž๐‘ข.

We then have the following ergodic estimate for

GDRS(๐‘ข;๐‘ข) = [๐บ (๐‘ฆ) + ๐น (๐‘ฅ)] โˆ’ [๐บ (๐‘ฅ) + ๐น (๐‘ฅ)] + ใ€ˆ๐‘ฅ โˆ’ ๏ฟฝ๏ฟฝ, ๐‘ฅ โˆ’ ๐‘ฆใ€‰ โ‰ฅ 0.

Theorem 11.9. Let ๐น : ๐‘‹ โ†’ โ„ and ๐บ : ๐‘‹ โ†’ โ„ be proper, convex, and lower semicontinuous.Let ๐‘ข โˆˆ ๐ปโˆ’1(0) for ๐ป given by (11.24). Then for any initial iterate ๐‘ข0 = (๐‘ฅ0, ๐‘ฆ0, ๐‘ง0) โˆˆ ๐‘‹ 3, theiterates {๐‘ข๐‘˜}๐‘˜โˆˆ๐‘ of the implicit splitting method (8.8) satisfy

GDRS(๏ฟฝ๏ฟฝ๐‘ ;๐‘ข) โ‰ค 12๐‘๐œ โ€–๐‘ข

1 โˆ’ ๐‘ขโ€–2๐‘€ , where ๏ฟฝ๏ฟฝ๐‘ โ‰”1๐‘

๐‘โˆ’1โˆ‘๐‘˜=0

๐‘ข๐‘˜+1.

Proof. Clearly๐‘€ is self-adjoint and positive semi-denite, and๐‘€ โ‰ฅ ฮ› โ‰” 0. The rest is clearfrom Corollary 11.8 by moving ๐œ from G on the right-hand side, and using that ๐‘ฅ = ๐‘ฆ . ๏ฟฝ

Clearly, following the discussion in Section 11.1, we can dene a partial version of GDRSand obtain its convergence from Theorem 11.9.

primal-dual explicit splitting

We recall that the PDES method (8.23) for (11.5) corresponds to (11.5) with the choice ๐น0 = 0and ๐ธ = ๐น , while the preconditioning operator is given by

๐‘€ โ‰”

(Id 00 Id โˆ’ ๐พ๐พโˆ—

)With this, we obtain the following estimate for the Lagrangian duality gap dened in(11.7).

154

11 gaps and ergodic results

Theorem 11.10. Let ๐น : ๐‘‹ โ†’ โ„ and ๐บ : ๐‘‹ โ†’ โ„ be proper, convex, and lower semicontinuous,and ๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ). Suppose ๐น is be Gรขteaux dierentiable with ๐ฟ-Lipschitz gradient for ๐ฟ โ‰ค 1,and that โ€–๐พ โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) โ‰ค 1. Then for any initial iterate๐‘ข0 โˆˆ ๐‘‹ ร—๐‘Œ , the iterates {๐‘ข๐‘˜ = (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜)}๐‘˜โˆˆโ„•of (8.23) satisfy for all ๐‘ข = (๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘‹ ร— ๐‘Œ the ergodic gap estimate

G(๏ฟฝ๏ฟฝ๐‘ ;๐‘ข) โ‰ค 12๐‘ โ€–๐‘ข1 โˆ’ ๐‘ขโ€–2๐‘€ , where ๏ฟฝ๏ฟฝ๐‘ โ‰”

1๐‘

๐‘โˆ’1โˆ‘๐‘˜=0

๐‘ข๐‘˜+1.

In particular, if ๐ต โŠ‚ ๐‘‹ is bounded and ๐ต โˆฉ๐ปโˆ’1(0) โ‰  โˆ…, the partial duality gap G(๐‘ข๐‘ , ๐ต) โ†’ 0at the rate ๐‘‚ (1/๐‘ ).

Proof. We use Corollary 11.8. Using the assumed bound โ€–๐พ โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) โ‰ค 1, clearly ๐‘€ is self-adjoint and positive semi-denite. By Corollary 7.2, the three-point smoothness condition(11.19) holds with ฮ› โ‰”

(๐ฟ 00 0

), where ๐ฟ is the Lipschitz factor of โˆ‡๐น . Since โ€–๐พ โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) โ‰ค 1

and ๐ฟ โ‰ค 1, we also verify๐‘€ โ‰ฅ ฮ›. The rest now follows from Corollary 11.8 as well as thenonnegativity of the partial duality gap (11.8). ๏ฟฝ

primal-dual proximal splitting

We continue with the problem (11.5) and the corresponding structure (11.6) for ๐ป . We recallfrom Corollaries 9.13 and 9.18 that for the unaccelerated PDPS we take the preconditioningoperator as

(11.25) ๐‘€ โ‰”

(๐œโˆ’1Id โˆ’๐พโˆ—

โˆ’๐พ ๐œŽโˆ’1Id

)for some primal and dual step length parameters ๐œ, ๐œŽ > 0. We now obtain the followingresult for the Lagrangian duality gap dened in (11.7).

Theorem 11.11. Let ๐น0 : ๐‘‹ โ†’ โ„, ๐ธ : ๐‘‹ โ†’ โ„, and ๐บ : ๐‘‹ โ†’ โ„ be proper, convex, andlower semicontinuous, and ๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ). Suppose ๐ธ is Gรขteaux dierentiable with ๐ฟ-Lipschitzgradient. Take ๐œŽ, ๐œ > 0 satisfying

๐ฟ๐œ + ๐œ๐œŽ โ€–๐พ โ€–2 < 1.

Then for any initial iterate ๐‘ข0 โˆˆ ๐‘‹ ร— ๐‘Œ the iterates {๐‘ข๐‘˜ = (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜)}๐‘˜โˆˆโ„• of the PDPS method(9.29) satisfy for any ๐‘ข = (๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘‹ ร— ๐‘Œ the ergodic gap estimate

G(๏ฟฝ๏ฟฝ๐‘ ;๐‘ข) โ‰ค 12๐‘๐œ โ€–๐‘ข

1 โˆ’ ๐‘ขโ€–2๐‘€ , where ๏ฟฝ๏ฟฝ๐‘ โ‰”1๐‘

๐‘โˆ’1โˆ‘๐‘˜=0

๐‘ข๐‘˜+1.

In particular, if ๐ต โŠ‚ ๐‘‹ is bounded and ๐ต โˆฉ๐ปโˆ’1(0) โ‰  โˆ…, the partial duality gap G(๐‘ข๐‘ , ๐ต) โ†’ 0at the rate ๐‘‚ (1/๐‘ ).

155

11 gaps and ergodic results

Proof. We use Corollary 11.8. By Corollary 7.2, the three-point smoothness condition (11.19)holds with ฮ› โ‰”

(๐ฟ 00 0

), where ๐ฟ is the Lipschitz factor of โˆ‡๐ธ. In Corollary 9.18 we have

already proved that ๐‘๐‘€ is self-adjoint and positive semi-denite. Similarly to the proofof the corollary, we verify that the condition ๐ฟ๐œ + ๐œ๐œŽ โ€–๐พ โ€–2 < 1 guarantees ๐‘€ โ‰ฅ ฮ›. (Theonly dierence to the conditions in that result is the standard gap estimate factor-of-twodierence in the term containing ๐ฟ.) The rest is clear from Corollary 11.8 as well as thenonnegativity of the partial duality gap (11.8). ๏ฟฝ

11.4 the testing approach in its general form

We now want to produce gap estimates for accelerated methods. As we have seen inSection 10.1, as an extension of (11.3) these iteratively solve

(11.26) 0 โˆˆ๐‘Š๐‘˜+1 [๐œ•๏ฟฝ๏ฟฝ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜) + ฮž๐‘ฅ๐‘˜+1] +๐‘€๐‘˜+1(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜)

for iteration-dependent step length and preconditioning operators๐‘Š๐‘˜+1 โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) and๐‘€๐‘˜+1 โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ). We also introduced testing operators ๐‘๐‘˜+1 โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) such that ๐‘๐‘˜+1๐‘€๐‘˜+1is self-adjoint and positive semi-denite.

Unless ๐‘๐‘˜+1๐‘Š๐‘˜+1 is a scalar multiple of the identity, we will not be able to extract in astraightforward way any of the gap functionals of Section 11.1 out of (11.26). Indeed, it isnot clear how to provide a completely general approach to gap functionals of acceleratedor otherwise complex algorithms. We will specically see the diculties when performinggap realignment for the accelerated PDPS in Section 11.5 and when developing very specicgap functionals for the ADMM in Section 11.6.

Towards brevity in the following sections, we however do some general preparatory work.Observe that the method (11.26) can be written more abstractly as

(11.27) 0 โˆˆ ๐ป๐‘˜+1(๐‘ฅ๐‘˜+1) +๐‘€๐‘˜+1(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜)

for some iteration-dependent set-valued function ๐ป๐‘˜+1 : ๐‘‹ โ‡’ ๐‘‹ . The estimate (11.28) inthe next theorem is in essence a quantitative or variable-metric version of the three-pointsmoothness and strong convexity estimate (7.16). The proof of the following result is alreadystandard, where the abstract value V๐‘˜+1(๐‘ฅ) models a suitable gap functional for iterate๐‘ฅ๐‘˜+1.

Theorem 11.12. On a Hilbert space ๐‘‹ , let ๐ป๐‘˜+1 : ๐‘‹ โ‡’ ๐‘‹ , and๐‘€๐‘˜+1, ๐‘๐‘˜+1 โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) for ๐‘˜ โˆˆ โ„•.Suppose (11.27) is solvable for the iterates {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„•. If ๐‘๐‘˜+1๐‘€๐‘˜+1 is self-adjoint and

(11.28) ใ€ˆ๐ป๐‘˜+1(๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘๐‘˜+1 โ‰ฅ V๐‘˜+1(๐‘ฅ) +12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘๐‘˜+2๐‘€๐‘˜+2โˆ’๐‘๐‘˜+1๐‘€๐‘˜+1โˆ’ 12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘๐‘˜+1๐‘€๐‘˜+1

156

11 gaps and ergodic results

for all ๐‘˜ โˆˆ โ„• and some ๐‘ฅ โˆˆ ๐‘‹ and V๐‘˜+1(๐‘ฅ) โˆˆ โ„, then both

(11.29) 12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘๐‘˜+2๐‘€๐‘˜+2 + V๐‘˜+1(๐‘ฅ) โ‰ค12 โ€–๐‘ฅ

๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘๐‘˜+1๐‘€๐‘˜+1 (๐‘˜ โˆˆ โ„•)

and

(11.30) 12 โ€–๐‘ฅ

๐‘ โˆ’ ๐‘ฅ โ€–2๐‘๐‘+1๐‘€๐‘+1 +๐‘โˆ’1โˆ‘๐‘˜=0

V๐‘˜+1(๐‘ฅ) โ‰ค12 โ€–๐‘ฅ

0 โˆ’ ๐‘ฅ โ€–2๐‘1๐‘€1(๐‘ โ‰ฅ 1).

Proof. Inserting (11.27) into (11.28), we obtain

(11.31) โˆ’ใ€ˆ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘๐‘˜+1๐‘€๐‘˜+1 โ‰ฅ12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘๐‘˜+2๐‘€๐‘˜+2โˆ’๐‘๐‘˜+1๐‘€๐‘˜+1โˆ’ 12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘๐‘˜+1๐‘€๐‘˜+1 + V๐‘˜+1(๐‘ฅ).

We recall for general self-adjoint๐‘€ the three-point formula (9.1), i.e.,

ใ€ˆ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘€ =12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘€ โˆ’ 12 โ€–๐‘ฅ

๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘€ + 12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘€ .

Using this with ๐‘€ = ๐‘๐‘˜+1๐‘€๐‘˜+1, we rewrite (11.31) as (11.29). Summing (11.29) over ๐‘˜ =0, . . . , ๐‘ โˆ’ 1, we obtain (11.30). ๏ฟฝ

11.5 ergodic gaps for accelerated primal-dual methods

To derive ergodic gap estimates for the accelerated primal-dual proximal splitting of Theo-rem 10.8, we need to perform signicant additional work due to the fact that [๐‘˜ โ‰” ๐œ‘๐‘˜๐œ๐‘˜ โ‰ ๐œ“๐‘˜+1๐œŽ๐‘˜+1. The overall idea of the proof remains the same, but we need to pay special atten-tion to the blockwise structure of the problem and to do some realignment of the blocks toget the same factor [๐‘˜ in front of both ๐บ and ๐น .

duality gap realignment

We continue with the problem (11.5) and the setup (11.6). Working with the general scheme(11.27), we write

(11.32a) ๐ป๐‘˜+1(๐‘ข) โ‰”๐‘Š๐‘˜+1(๐œ•๏ฟฝ๏ฟฝ (๐‘ข๐‘˜+1) + โˆ‡๐น (๐‘ข๐‘˜) + ฮž)

taking as in Theorem 10.8 the testing and step length operators

๐‘Š๐‘˜+1 โ‰”(๐œ๐‘˜ Id 00 ๐œŽ๐‘˜+1Id

)and ๐‘๐‘˜+1 โ‰”

(๐œ‘๐‘˜ Id 00 ๐œ“๐‘˜+1Id

)(11.32b)

157

11 gaps and ergodic results

for some step length and testing parameters ๐œ๐‘˜ , ๐œŽ๐‘˜+1, ๐œ‘๐‘˜ , ๐œŽ๐‘˜+1 > 0. Throughout this sectionwe also take

(11.32c) ฮ“ โ‰”

(๐›พ ยท Id 00 ๐œŒ ยท Id

)and ฮ› โ‰”

(๐ฟ ยท Id 00 0

).

For the moment, we do not yet need to know the specic structure of ๐‘€๐‘˜+1; hence thefollowing estimates apply not only to the PDPS method but also to the PDES method andits potential accelerated variants.

Lemma 11.13. Let us be given ๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ), ๐น = ๐น0 + ๐ธ with ๐น0 : ๐‘‹ โ†’ โ„, ๐ธ : ๐‘‹ โ†’ โ„, and๐บโˆ— : ๐‘Œ โ†’ โ„ convex, proper, and lower semicontinuous on Hilbert spaces ๐‘‹ and ๐‘Œ . Suppose ๐น0and ๐บโˆ— are (strongly) convex for some ๐›พ, ๐œŒ โ‰ฅ 0, and ๐ธ has ๐ฟ-Lipschitz continuous gradient.With the setup of (11.6) and (11.32), for any ๐‘ข,๐‘ข โˆˆ ๐‘‹ ร— ๐‘Œ and any ๐‘˜ โˆˆ โ„• we have

ใ€ˆ๐ป๐‘˜+1(๐‘ข), ๐‘ข โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1 โ‰ฅ G๐‘˜+1(๐‘ข;๐‘ข) +12 โ€–๐‘ข โˆ’ ๐‘ขโ€–2๐‘๐‘˜+1๐‘Š๐‘˜+1 (2ฮž+ฮ“) โˆ’

14 โ€–๐‘ข โˆ’ ๐‘ข๐‘˜ โ€–2๐‘๐‘˜+1๐‘Š๐‘˜+1ฮ›

for

G๐‘˜+1(๐‘ข;๐‘ข) โ‰” ๐œ‘๐‘˜๐œ๐‘˜ (๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ)) +๐œ“๐‘˜+1๐œŽ๐‘˜+1(๐บโˆ—(๐‘ฆ) โˆ’๐บโˆ—(๐‘ฆ))+ ใ€ˆ(๐œ‘๐‘˜๐œ๐‘˜๐พโˆ—)๐‘ฆ, ๐‘ฅใ€‰๐‘‹ โˆ’ ใ€ˆ(๐œ“๐‘˜+1๐œŽ๐‘˜+1๐พ)๐‘ฅ, ๐‘ฆใ€‰๐‘Œ โˆ’ ใ€ˆ(๐พ๐œ‘๐‘˜๐œ๐‘˜ โˆ’๐œ“๐‘˜+1๐œŽ๐‘˜+1๐พ)๐‘ฅ, ๐‘ฆใ€‰๐‘Œ .

Proof. Expanding ๐ป๐‘˜+1, we have

ใ€ˆ๐ป๐‘˜+1(๐‘ข), ๐‘ข โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1 = ๐œ‘๐‘˜๐œ๐‘˜ ใ€ˆ๐œ•๐น0(๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹+ ๐œ‘๐‘˜๐œ๐‘˜ ใ€ˆโˆ‡๐ธ (๐‘ฅ๐‘˜), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹+๐œ“๐‘˜+1๐œŽ๐‘˜+1ใ€ˆ๐œ•๐บโˆ—(๐‘ฆ), ๐‘ฆ โˆ’ ๐‘ฆใ€‰๐‘Œ+ ใ€ˆ(๐œ‘๐‘˜๐œ๐‘˜๐พโˆ—)๐‘ฆ, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ใ€ˆ(๐œ“๐‘˜+1๐œŽ๐‘˜+1๐พ)๐‘ฅ, ๐‘ฆ โˆ’ ๐‘ฆใ€‰๐‘Œ .

Observe thatใ€ˆ(๐œ‘๐‘˜๐œ๐‘˜๐พโˆ—)๐‘ฆ, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ใ€ˆ(๐œ“๐‘˜+1๐œŽ๐‘˜+1๐พ)๐‘ฅ, ๐‘ฆ โˆ’ ๐‘ฆใ€‰๐‘Œ

= ใ€ˆ(๐พ๐œ‘๐‘˜๐œ๐‘˜ โˆ’๐œ“๐‘˜+1๐œŽ๐‘˜+1๐พ) (๐‘ฅ โˆ’ ๐‘ฅ), ๐‘ฆ โˆ’ ๐‘ฆใ€‰๐‘Œ+ ใ€ˆ(๐œ‘๐‘˜๐œ๐‘˜๐พโˆ—)๐‘ฆ, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ใ€ˆ(๐œ“๐‘˜+1๐œŽ๐‘˜+1๐พ)๐‘ฅ, ๐‘ฆ โˆ’ ๐‘ฆใ€‰๐‘Œ

=12 โ€–๐‘ข โˆ’ ๐‘ขโ€–22๐‘๐‘˜+1๐‘Š๐‘˜+1ฮž โˆ’ ใ€ˆ(๐พ๐œ‘๐‘˜๐œ๐‘˜ โˆ’๐œ“๐‘˜+1๐œŽ๐‘˜+1๐พ)๐‘ฅ, ๐‘ฆใ€‰๐‘Œ+ ใ€ˆ(๐œ‘๐‘˜๐œ๐‘˜๐พโˆ—)๐‘ฆ, ๐‘ฅใ€‰๐‘‹ โˆ’ ใ€ˆ(๐œ“๐‘˜+1๐œŽ๐‘˜+1๐พ)๐‘ฅ, ๐‘ฆใ€‰๐‘Œ .

Therefore

(11.33) ใ€ˆ๐ป๐‘˜+1(๐‘ข), ๐‘ข โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1 = ๐œ‘๐‘˜๐œ๐‘˜ ใ€ˆ๐œ•๐น0(๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹+ ๐œ‘๐‘˜๐œ๐‘˜ ใ€ˆโˆ‡๐ธ (๐‘ฅ๐‘˜), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹+๐œ“๐‘˜+1๐œŽ๐‘˜+1ใ€ˆ๐œ•๐บโˆ—(๐‘ฆ), ๐‘ฆ โˆ’ ๐‘ฆใ€‰๐‘Œ+ 12 โ€–๐‘ข โˆ’ ๐‘ขโ€–22๐‘๐‘˜+1๐‘Š๐‘˜+1ฮž โˆ’ ใ€ˆ(๐พ๐œ‘๐‘˜๐œ๐‘˜ โˆ’๐œ“๐‘˜+1๐œŽ๐‘˜+1๐พ)๐‘ฅ, ๐‘ฆใ€‰๐‘Œ

+ ใ€ˆ(๐œ‘๐‘˜๐œ๐‘˜๐พโˆ—)๐‘ฆ, ๐‘ฅใ€‰๐‘‹ โˆ’ ใ€ˆ(๐œ“๐‘˜+1๐œŽ๐‘˜+1๐พ)๐‘ฅ, ๐‘ฆใ€‰๐‘Œ .

158

11 gaps and ergodic results

Due to the smoothness three-point corollaries, specically (7.8), we have

(11.34a) ใ€ˆโˆ‡๐ธ (๐‘ฅ๐‘˜), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐ธ (๐‘ฅ) โˆ’ ๐ธ (๐‘ฅ) โˆ’ ๐ฟ

2 โ€–๐‘ฅ โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ .

Also, by the (strong) convexity of ๐น0, we have

(11.34b) ใ€ˆ๐œ•๐น0(๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐น0(๐‘ฅ) โˆ’ ๐น0(๐‘ฅ) + ๐›พ2 โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ ,

as well as by the (strong) convexity of ๐บโˆ—

(11.34c) ใ€ˆ๐œ•๐บโˆ—(๐‘ฆ), ๐‘ฆ โˆ’ ๐‘ฆใ€‰๐‘Œ โ‰ฅ ๐บโˆ—(๐‘ฆ) โˆ’๐บโˆ—(๐‘ฆ) + ๐œŒ2 โ€–๐‘ฆ โˆ’ ๐‘ฆ โ€–2๐‘Œ .

Applying these estimates in (11.33), and using the structure (11.32b) and (11.32c) of theinvolved operators, we obtain the claim. ๏ฟฝ

If๐œ‘๐‘˜๐œ๐‘˜ = ๐œ“๐‘˜+1๐œŽ๐‘˜+1, clearly G๐‘˜+1(๐‘ข๐‘˜+1;๐‘ข) โ‰ฅ ๐œ‘๐‘˜๐œ๐‘˜G(๐‘ข๐‘˜+1). This is the case in the unacceleratedcase already considered in Theorems 11.10 and 11.11. Some specic stochastic acceleratedalgorithms also satisfy this [see Valkonen 2019]. Applying the techniques of Section 11.3,we could then use Jensenโ€™s inequality to estimate โˆ‘๐‘›โˆ’1

๐‘˜=0 G๐‘˜+1(๐‘ข๐‘˜+1;๐‘ข) โ‰ฅโˆ‘๐‘โˆ’1๐‘˜=0 ๐œ‘๐‘˜๐œ๐‘˜G(๐‘ข๐‘˜+1)

further from below to obtain a gap on suitable ergodic sequences. However, in our primaryaccelerated algorithm of interest, the PDPS method, instead ๐œ‘๐‘˜๐œ๐‘˜ = ๐œ“๐‘˜๐œŽ๐‘˜ . We will thereforehave to do some rearrangements.

Lemma 11.14. Let ๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ), ๐น = ๐น0 + ๐ธ with ๐น0 : ๐‘‹ โ†’ โ„, ๐ธ : ๐‘‹ โ†’ โ„, and ๐บโˆ— : ๐‘Œ โ†’ โ„

convex, proper, and lower semicontinuous on Hilbert spaces ๐‘‹ and ๐‘Œ . Suppose ๐น0 and ๐บโˆ— are(strongly) convex for some ๐›พ, ๐œŒ โ‰ฅ 0, and ๐ธ has ๐ฟ-Lipschitz gradient. With the setup of (11.6)and (11.32), suppose ๐œ‘๐‘˜๐œ๐‘˜ = ๐œ“๐‘˜๐œŽ๐‘˜ . If ๐‘ข โˆˆ ๐ปโˆ’1(0), then

(11.35) ใ€ˆ๐ป๐‘˜+1(๐‘ข๐‘˜+1), ๐‘ข๐‘˜+1 โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1 โ‰ฅ Gโˆ—,๐‘˜+1(๐‘ฅ๐‘˜+1, ๐‘ฆ๐‘˜ ;๐‘ข) +12 โ€–๐‘ข

๐‘˜+1 โˆ’ ๐‘ขโ€–2๐‘๐‘˜+1๐‘Š๐‘˜+1 (2ฮž+ฮ“)

โˆ’ 12 โ€–๐‘ข

๐‘˜+1 โˆ’ ๐‘ข๐‘˜ โ€–2๐‘๐‘˜+1๐‘Š๐‘˜+1ฮ› (๐‘ โ‰ฅ 2)

for some Gโˆ—,๐‘˜+1(๐‘ฅ๐‘˜+1, ๐‘ฆ๐‘˜ ;๐‘ข) satisfying with G given by (11.7) the estimate

(11.36)๐‘โˆ’1โˆ‘๐‘˜=0

Gโˆ—,๐‘˜+1(๐‘ฅ๐‘˜+1, ๐‘ฆ๐‘˜ ;๐‘ข) โ‰ฅ๐‘โˆ’1โˆ‘๐‘˜=1

๐œ‘๐‘˜๐œ๐‘˜G(๐‘ฅ๐‘˜+1, ๐‘ฆ๐‘˜ ;๐‘ข).

Proof. First, note that (11.35) holds for

Gโˆ—,๐‘˜+1(๐‘ฅ๐‘˜+1, ๐‘ฆ๐‘˜ ;๐‘ข) โ‰” inf๐‘ค๐‘˜+1โˆˆ๐ป๐‘˜+1 (๐‘ข๐‘˜+1)

ใ€ˆ๐‘ค๐‘˜+1, ๐‘ข๐‘˜+1 โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1

โˆ’ 12 โ€–๐‘ข

๐‘˜+1 โˆ’ ๐‘ขโ€–2๐‘๐‘˜+1๐‘Š๐‘˜+1 (2ฮž+ฮ“) +12 โ€–๐‘ข

๐‘˜+1 โˆ’ ๐‘ข๐‘˜ โ€–2๐‘๐‘˜+1๐‘Š๐‘˜+1ฮ›.

159

11 gaps and ergodic results

It remains to prove the estimate (11.36) for this choice.

With ๐‘ โ‰ฅ 1, let us dene the set

๐‘†๐‘ โ‰”๐‘โˆ’1โˆ‘๐‘˜=0

(ใ€ˆ๐ป๐‘˜+1(๐‘ข๐‘˜+1), ๐‘ข๐‘˜+1 โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1 โˆ’

12 โ€–๐‘ข

๐‘˜+1 โˆ’ ๐‘ขโ€–2๐‘๐‘˜+1๐‘Š๐‘˜+1 (2ฮž+ฮ“) +12 โ€–๐‘ข

๐‘˜+1 โˆ’ ๐‘ข๐‘˜ โ€–2๐‘๐‘˜+1๐‘Š๐‘˜+1ฮ›

)=๐‘โˆ’1โˆ‘๐‘˜=0

(๐œ‘๐‘˜๐œ๐‘˜

(ใ€ˆ๐œ•๐น0(๐‘ฅ๐‘˜+1) + โˆ‡๐ธ (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ๐›พ2 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐ฟ2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹

)+๐œ“๐‘˜+1๐œŽ๐‘˜+1

(ใ€ˆ๐œ•๐บโˆ—(๐‘ฆ๐‘˜+1), ๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆใ€‰๐‘Œ โˆ’ ๐œŒ

2 โ€–๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ โ€–2๐‘Œ

)).

Observe that in the second expression, ๐‘๐‘˜+1๐‘Š๐‘˜+1ฮž has canceled the corresponding compo-nent of ๐ป๐‘˜+1. Then it is enough to prove that ๐‘†๐‘ โ‰ฅ โˆ‘๐‘โˆ’1

๐‘˜=1 ๐œ‘๐‘˜๐œ๐‘˜G(๐‘ฅ๐‘˜+1, ๐‘ฆ๐‘˜ ;๐‘ข). To do this, weneed to shift ๐‘ฆ๐‘˜+1 to ๐‘ฆ๐‘˜ . With ๐‘ โ‰ฅ 2, we therefore rearrange terms to obtain

๐‘†๐‘ = ๐ด๐‘ + ๐ต๐‘for

๐ด๐‘ = ๐œ‘0๐œ0(ใ€ˆ๐œ•๐น0(๐‘ฅ 1) + โˆ‡๐ธ (๐‘ฅ0), ๐‘ฅ 1 โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ๐›พ2 โ€–๐‘ฅ

1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐ฟ2 โ€–๐‘ฅ1 โˆ’ ๐‘ฅ0โ€–2๐‘‹

)+๐œ“๐‘๐œŽ๐‘

(ใ€ˆ๐œ•๐บโˆ—(๐‘ฆ๐‘ ), ๐‘ฆ๐‘ โˆ’ ๐‘ฆใ€‰๐‘Œ โˆ’ ๐œŒ

2 โ€–๐‘ฆ๐‘ โˆ’ ๐‘ฆ โ€–2๐‘Œ

)โˆ’ ใ€ˆ(๐พ๐œ‘0๐œ0 โˆ’๐œ“๐‘๐œŽ๐‘๐พ)๐‘ฅ, ๐‘ฆใ€‰๐‘Œ + ใ€ˆ(๐œ‘0๐œ0๐พโˆ—)๐‘ฆ, ๐‘ฅ 1ใ€‰๐‘‹ โˆ’ ใ€ˆ(๐œ“๐‘๐œŽ๐‘๐พ)๐‘ฅ, ๐‘ฆ๐‘ ใ€‰๐‘Œ

and

๐ต๐‘ โ‰”๐‘โˆ’1โˆ‘๐‘˜=1

(๐œ‘๐‘˜๐œ๐‘˜

(ใ€ˆ๐œ•๐น0(๐‘ฅ๐‘˜+1) + โˆ‡๐ธ (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ๐›พ2 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐ฟ2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹

)+๐œ“๐‘˜๐œŽ๐‘˜

(ใ€ˆ๐œ•๐บโˆ—(๐‘ฆ๐‘˜), ๐‘ฆ๐‘˜ โˆ’ ๐‘ฆใ€‰๐‘Œ โˆ’ ๐œŒ

2 โ€–๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ โ€–2๐‘Œ

)+ ใ€ˆ(๐œ‘๐‘˜๐œ๐‘˜๐พโˆ—)๐‘ฆ, ๐‘ฅ๐‘˜+1ใ€‰๐‘‹ โˆ’ ใ€ˆ(๐œ“๐‘˜๐œŽ๐‘˜๐พ)๐‘ฅ, ๐‘ฆ๐‘˜ใ€‰๐‘Œ

)Observe that we only sum over ๐‘˜ = 1, . . . , ๐‘ โˆ’ 1 instead of ๐‘˜ = 0, . . . , ๐‘ โˆ’ 1.

We can now use (11.34) and our assumption ๐œ‘๐‘˜๐œ๐‘˜ = ๐œ“๐‘˜๐œŽ๐‘˜ to estimate

(11.37) ๐ต๐‘ โ‰ฅ๐‘โˆ’1โˆ‘๐‘˜=1

๐œ‘๐‘˜๐œ๐‘˜G(๐‘ฅ๐‘˜+1, ๐‘ฆ๐‘˜).

By Corollary 7.2, ๐ธ satises the three-point monotonicity estimate (7.9); in particular,

ใ€ˆโˆ‡๐ธ (๐‘ฅ0) โˆ’ โˆ‡๐ธ (๐‘ฅ), ๐‘ฅ 1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ โˆ’๐ฟ2 โ€–๐‘ฅ1 โˆ’ ๐‘ฅ0โ€–2๐‘‹ .

160

11 gaps and ergodic results

Since ๐พโˆ—๐‘ฅ โˆˆ ๐œ•๐บโˆ—(๐‘ฆ), and โˆ’๐พ๐‘ฆ โˆˆ ๐œ•๐น0(๐‘ฅ) + โˆ‡๐ธ (๐‘ฅ), and ๐œ•๐น0 and ๐œ•๐บ are strongly monotone,we also obtain

ใ€ˆ๐œ•๐น0(๐‘ฅ 1) + โˆ‡๐ธ (๐‘ฅ) + ๐พโˆ—๐‘ฆ, ๐‘ฅ 1 โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ๐›พ2 โ€–๐‘ฅ1 โˆ’ ๐‘ฅ โ€–2๐‘‹ โ‰ฅ 0 and

ใ€ˆ๐œ•๐บโˆ—(๐‘ฆ๐‘ ) โˆ’ ๐พ๐‘ฅ, ๐‘ฆ๐‘ โˆ’ ๐‘ฆใ€‰๐‘Œ โˆ’ ๐œŒ

2 โ€–๐‘ฆ๐‘ โˆ’ ๐‘ฆ โ€–2๐‘Œ โ‰ฅ 0.

Rearranging and using these estimates we obtain

(11.38) ๐ด๐‘ = ๐œ‘0๐œ0(ใ€ˆ๐œ•๐น0(๐‘ฅ 1) + โˆ‡๐ธ (๐‘ฅ0) + ๐พโˆ—๐‘ฆ, ๐‘ฅ 1 โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ๐›พ2 โ€–๐‘ฅ

1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐ฟ2 โ€–๐‘ฅ1 โˆ’ ๐‘ฅ0โ€–2๐‘‹

)+๐œ“๐‘๐œŽ๐‘

(ใ€ˆ๐œ•๐บโˆ—(๐‘ฆ๐‘ ) โˆ’ ๐พ๐‘ฅ, ๐‘ฆ๐‘ โˆ’ ๐‘ฆใ€‰๐‘Œ โˆ’ ๐›พ2 โ€–๐‘ฆ

๐‘ โˆ’ ๐‘ฆ โ€–2๐‘Œ)โ‰ฅ 0.

The estimates (11.37) and (11.38) nally give ๐‘†๐‘ โ‰ฅ โˆ‘๐‘โˆ’1๐‘˜=1 ๐œ‘๐‘˜๐œ๐‘˜G(๐‘ฅ๐‘˜+1, ๐‘ฆ๐‘˜ ;๐‘ข) as we set out to

prove. ๏ฟฝ

In the proof of Lemma 11.14, we required ๐‘ข โˆˆ ๐ปโˆ’1(0) to show that ๐ด๐‘ โ‰ฅ 0. Therefore, asthe estimate (11.35) will not hold for an arbitrary base point ๐‘ข in place ๐‘ข, we will not beable to obtain for accelerated methods the convergence of the partial duality gap (11.8) thatconverges for unaccelerated methods.

The next theorem is our main result regarding ergodic gaps for general accelerated methods.As ๐›พ and ๐œŒ feature as acceleration parameters in algorithms, the conditions of this theoremimply that gap estimates require slower acceleration.

Theorem 11.15. Let ๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ), ๐น = ๐น0 + ๐ธ with ๐น0 : ๐‘‹ โ†’ โ„, ๐ธ : ๐‘‹ โ†’ โ„, and ๐บโˆ— : ๐‘Œ โ†’ โ„

convex, proper, and lower semicontinuous on Hilbert spaces ๐‘‹ and ๐‘Œ . Suppose ๐น0 and ๐บโˆ—

are (strongly) convex for some ๐›พ, ๐œŒ โ‰ฅ 0, and ๐ธ has ๐ฟ-Lipschitz gradient. Assume the setup(11.6) and (11.32). For each ๐‘˜ โˆˆ โ„•, also take ๐‘€๐‘˜+1 โˆˆ ๐•ƒ(๐‘‹ ร— ๐‘Œ ;๐‘‹ ร— ๐‘Œ ) such that ๐‘๐‘˜+1๐‘€๐‘˜+1is self-adjoint. Pick an initial iterate ๐‘ข0 โˆˆ ๐‘‹ ร— ๐‘Œ and suppose {๐‘ข๐‘˜+1 = (๐‘ฅ๐‘˜+1, ๐‘ฆ๐‘˜+1)}๐‘˜โˆˆโ„• aregenerated by (11.27). Let ๐‘ข = (๐‘ฅ, ๐‘ฆ) โˆˆ ๐ปโˆ’1(0). If ๐œ‘๐‘˜๐œ๐‘˜ = ๐œ“๐‘˜๐œŽ๐‘˜ , and

(11.39) 12 โ€–๐‘ข

๐‘˜+1 โˆ’ ๐‘ข๐‘˜ โ€–2๐‘๐‘˜+1 (๐‘€๐‘˜+1โˆ’๐‘Š๐‘˜+1ฮ›) +12 โ€–๐‘ข

๐‘˜+1 โˆ’ ๐‘ขโ€–2๐‘๐‘˜+1 (๐‘€๐‘˜+1+๐‘Š๐‘˜+1 (2ฮž+ฮ“))โˆ’๐‘๐‘˜+2๐‘€๐‘˜+2 โ‰ฅ 0,

then

(11.40) 12 โ€–๐‘ข

๐‘ โˆ’ ๐‘ขโ€–2๐‘๐‘+1๐‘€๐‘+1 + Zโˆ—,๐‘G(๐‘ฅโˆ—,๐‘ , ๏ฟฝ๏ฟฝโˆ—,๐‘ ;๐‘ข) โ‰ค โ€–๐‘ข0 โˆ’ ๐‘ขโ€–2๐‘1๐‘€1(๐‘ โ‰ฅ 2)

for G given by (11.7) and the ergodic sequences

๐‘ฅโˆ—,๐‘ โ‰” Z โˆ’1โˆ—,๐‘๐‘โˆ’1โˆ‘๐‘˜=1

๐œ๐‘˜๐œ‘๐‘˜๐‘ฅ๐‘˜+1 and ๏ฟฝ๏ฟฝโˆ—,๐‘ โ‰” Z โˆ’1โˆ—,๐‘

๐‘โˆ’1โˆ‘๐‘˜=1

๐œŽ๐‘˜๐œ“๐‘˜๐‘ฆ๐‘˜ for Zโˆ—,๐‘ โ‰”

๐‘โˆ’1โˆ‘๐‘˜=1

[๐‘˜ .

161

11 gaps and ergodic results

Proof. Using (11.35) in (11.39), we obtain (11.28) forV๐‘˜+1(๐‘ข) โ‰” Gโˆ—,๐‘˜+1(๐‘ฅ๐‘˜+1, ๐‘ฆ๐‘˜ ;๐‘ข). By Jensenโ€™sinequality,

๐‘โˆ’1โˆ‘๐‘˜=0

Gโˆ—,๐‘˜+1(๐‘ฅ๐‘˜+1, ๐‘ฆ๐‘˜ ;๐‘ข) โ‰ฅ Zโˆ—,๐‘G(๐‘ฅโˆ—,๐‘ , ๏ฟฝ๏ฟฝโˆ—,๐‘ ;๐‘ข).

We therefore obtain (11.40) from (11.30) in Theorem 11.12. ๏ฟฝ

accelerated primal-dual proximal splitting

We now obtain gap estimates for the accelerated PDPS method. Observe the factor-of-twodierences in the denitions of ๐œ”๐‘˜ and in the initial conditions for the step lengths inthe following theorem compared to Theorem 10.8. Because strong convexity with factor๐›พ implies strong convexity with the factor ๐›พ/2, the conditions and step length rules ofthis theorem imply the iterate convergence results of Corollary 9.18 and Theorem 10.8 aswell.

Theorem 11.16 (gap estimates for PDPS). Let ๐น0 : ๐‘‹ โ†’ โ„, ๐ธ : ๐‘‹ โ†’ โ„ and ๐บ : ๐‘Œ โ†’ โ„ beconvex, proper, and lower semicontinuous on Hilbert spaces๐‘‹ and ๐‘Œ with โˆ‡๐ธ ๐ฟ-Lipschitz. Alsolet ๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) and let ๐‘ข = (๐‘ฅ, ๐‘ฆ) be a primal-dual solution to the problem (11.5). Pick initialstep lengths ๐œ0, ๐œŽ0 > 0 subject to ๐ฟ๐œ0 + ๐œ0๐œŽ0โ€–๐พ โ€–2๐•ƒ(๐‘‹ ;๐‘Œ ) < 1. For any initial iterate ๐‘ข0 โˆˆ ๐‘‹ ร— ๐‘Œ ,suppose {๐‘ข๐‘˜+1}๐‘˜โˆˆโ„• are generated by the (accelerated) PDPS method (10.23). Let the Lagrangianduality gap functional G be given by (11.7), and the ergodic iterates ๐‘ฅโˆ—,๐‘ and ๏ฟฝ๏ฟฝโˆ—,๐‘ by (11.15).

(i) If we take ๐œ๐‘˜ โ‰ก ๐œ0 and ๐œŽ๐‘˜ โ‰ก ๐œŽ0, then the ergodic gap G(๐‘ฅโˆ—,๐‘ , ๏ฟฝ๏ฟฝโˆ—,๐‘ ;๐‘ข) โ†’ 0 at the rate๐‘‚ (1/๐‘ ).

(ii) If ๐น0 is strongly convex with factor ๐›พ > 0, and we take

๐œ”๐‘˜ โ‰” 1/โˆš1 + ๐›พ๐œ๐‘˜ , ๐œ๐‘˜+1 โ‰” ๐œ๐‘˜๐œ”๐‘˜ , and ๐œŽ๐‘˜+1 โ‰” ๐œŽ๐‘˜/๐œ”๐‘˜ ,

then G(๐‘ฅโˆ—,๐‘ , ๏ฟฝ๏ฟฝโˆ—,๐‘ ;๐‘ข) โ†’ 0 at the rate ๐‘‚ (1/๐‘ 2)(iii) If both ๐น0 and ๐บโˆ— are strongly convex with respective factors ๐›พ > 0 and ๐œŒ > 0, and we

take๐œ”๐‘˜ โ‰” 1/

โˆš1 + \, \ โ‰” min{๐œŒ๐œŽ0, ๐›พ๐œ0}, ๐œ๐‘˜ โ‰” ๐œ0 and ๐œŽ๐‘˜ โ‰” ๐œŽ0,

then G(๐‘ฅโˆ—,๐‘ , ๏ฟฝ๏ฟฝโˆ—,๐‘ ;๐‘ข) โ†’ 0 linearly.

Proof. We use Theorem 11.15 in place of Theorem 10.5 in the proof of Theorem 10.8. Werecall that the latter consists of showing ๐‘๐‘˜+1๐‘€๐‘˜+1 to be self-adjoint and (10.17) and๐‘€ โ‰ฅ ฮ›to hold, i.e.,

๐‘๐‘˜+1(๐‘€๐‘˜+1 + 2๐‘Š๐‘˜+1ฮ“) โ‰ฅ ๐‘๐‘˜+2๐‘€๐‘˜+2, and ๐‘๐‘˜+1(๐‘€๐‘˜+1 โˆ’๐‘Š๐‘˜+1ฮ›/2) โ‰ฅ 0,

162

11 gaps and ergodic results

Now, to prove (11.39), we instead prove the self-adjointness as well as

๐‘๐‘˜+1(๐‘€๐‘˜+1 +๐‘Š๐‘˜+1ฮ“) โ‰ฅ ๐‘๐‘˜+2๐‘€๐‘˜+2, and ๐‘๐‘˜+1(๐‘€๐‘˜+1 โˆ’๐‘Š๐‘˜+1ฮ›) โ‰ฅ 0.

These all follows from the proof of Theorem 10.8 with the factor-of-two dierences in theformulas for ๐œ”๐‘˜ and the initialization condition apparent from the statements of these twotheorems. The proof of Theorem 10.8 also veries that ๐œ‘๐‘˜๐œ๐‘˜ = ๐œ“๐‘˜๐œŽ๐‘˜ .

All the conditions Theorem 11.15 are therefore satised, so (11.40) holds; in particular,Zโˆ—,๐‘G(๐‘ฅโˆ—,๐‘ , ๏ฟฝ๏ฟฝโˆ—,๐‘ ;๐‘ข) โ‰ค ๐ถ0 โ‰” โ€–๐‘ข0โˆ’๐‘ขโ€–2๐‘1๐‘€1

for all ๐‘ โ‰ฅ 2. It remains to study the convergencerate of the gap from this estimate. We have Zโˆ—,๐‘ =

โˆ‘๐‘โˆ’1๐‘˜=1 ๐œ‘

1/2๐‘˜

. In the unaccelerated case(๐›พ = 0), we get Zโˆ—,๐‘ = ๐‘๐œ‘ 1/2

0 . This gives the claimed๐‘‚ (1/๐‘ ) rate. In the accelerated case, ๐œ‘๐‘˜is of the order ฮฉ(๐‘˜2) by the proof of Theorem 10.8. Therefore also Zโˆ—,๐‘ is of the orderฮ˜(๐‘ 2),so we get the claimed๐‘‚ (1/๐‘ 2) convergence. In the linear convergence case, likewise, ๐œ‘๐‘˜ isexponential. Therefore so is Zโˆ—,๐‘ . ๏ฟฝ

Remark 11.17 (spatially adaptive and stochastic methods). Recalling the block-separability Exam-ple 10.3, consider the spaces ๐‘‹ = ๐‘‹1 ร— ยท ยท ยท ร— ๐‘‹๐‘š and ๐‘Œ = ๐‘Œ1 ร— ยท ยท ยท ร— ๐‘Œ๐‘› . Suppose ๐น (๐‘ฅ) = โˆ‘๐‘š

๐‘—=1 ๐น ๐‘— (๐‘ฅ ๐‘— )and๐บโˆ—(๐‘ฆ) = โˆ‘๐‘›

โ„“=1๐บโˆ—โ„“ (๐‘ฆโ„“ ) for ๐‘ฅ = (๐‘ฅ1, . . . , ๐‘ฅ๐‘š) โˆˆ ๐‘‹ and ๐‘ฆ = (๐‘ฆ1, . . . , ๐‘ฆ๐‘›) โˆˆ ๐‘Œ . Take ๐‘๐‘˜+1 โ‰”

(ฮฆ๐‘˜ 00 ฮจ๐‘˜+1

)as well as๐‘Š๐‘˜+1 โ‰”

(๐‘‡๐‘˜ 00 ฮฃ๐‘˜+1

)for ๐‘‡๐‘˜ โ‰”

โˆ‘๐‘›๐‘—=1 ๐œ๐‘˜,๐‘—๐‘ƒ ๐‘— , and similar expressions for ฮฆ๐‘˜ , ฮฃ๐‘˜+1, and ฮฃ๐‘˜+1,

where ๐‘ƒ ๐‘—๐‘ฅ โ‰” ๐‘ฅ ๐‘— projects into ๐‘‹ ๐‘— . Instead of ๐œ‘๐‘˜๐œ๐‘˜ = ๐œ“๐‘˜๐œŽ๐‘˜ that we required in (10.8), imposing๐”ผ[ฮฆ๐‘˜๐‘‡๐‘˜ ] = [๐‘˜ ๐ผ and ๐”ผ[ฮจ๐‘˜ฮฃ๐‘˜ ] = [๐‘˜ ๐ผ for some scalar [๐‘˜ , we may then start following through theproof of Theorem 10.8 to derive stochastic block-coordinate methods that randomly update onlysome of the blocks on each iteration, as well as methods that adapt the blockwise step lengths tothe spatial or blockwise structure of the problem. With somewhat more eort, we can also followthrough the proofs of the present Section 11.5. Specically, if we replace our ergodic sequences by

๐‘ฅโˆ—,๐‘ โ‰” Z โˆ’1โˆ—,๐‘๐”ผ

[๐‘โˆ’1โˆ‘๐‘˜=1

๐‘‡ โˆ—๐‘˜ ฮฆ

โˆ—๐‘˜๐‘ฅ๐‘˜+1

]and ๏ฟฝ๏ฟฝโˆ—,๐‘ โ‰” Z โˆ’1โˆ—,๐‘๐”ผ

[๐‘โˆ’1โˆ‘๐‘˜=1

ฮฃโˆ—๐‘˜ฮจ

โˆ—๐‘˜๐‘ฆ

๐‘˜

]for Zโˆ—,๐‘ โ‰”

๐‘โˆ’1โˆ‘๐‘˜=1

[๐‘˜ ,

we then obtain in place of (11.40) the estimate

๐”ผ

[12 โ€–๐‘ข

๐‘ โˆ’ ๐‘ขโ€–2๐‘๐‘ +1๐‘€๐‘ +1

]+ Zโˆ—,๐‘G(๐‘ฅโˆ—,๐‘ , ๏ฟฝ๏ฟฝโˆ—,๐‘ ) +

๐‘โˆ’1โˆ‘๐‘˜=0

๐”ผ [V๐‘˜+1(๐‘ข)] โ‰ค โ€–๐‘ข0 โˆ’ ๐‘ขโ€–2๐‘1๐‘€1.

If instead ๐”ผ[ฮฆ๐‘˜๐‘‡๐‘˜ ] = [๐‘˜ ๐ผ , and ๐”ผ[ฮจ๐‘˜+1ฮฃ๐‘˜+1] = [๐‘˜ ๐ผ , we get the result for the ergodic sequences

๐‘ฅ๐‘ โ‰” Z โˆ’1๐‘ ๐”ผ

[๐‘โˆ’1โˆ‘๐‘˜=0

๐‘‡ โˆ—๐‘˜ ฮฆ

โˆ—๐‘˜๐‘ฅ๐‘˜+1

]and ๏ฟฝ๏ฟฝ๐‘ โ‰” Z โˆ’1๐‘ ๐”ผ

[๐‘โˆ’1โˆ‘๐‘˜=0

ฮฃโˆ—๐‘˜+1ฮจ

โˆ—๐‘˜+1๐‘ฆ

๐‘˜+1]

where Z๐‘ โ‰”๐‘โˆ’1โˆ‘๐‘˜=0

[๐‘˜ .

In either case, if we do not or cannot, due to lack of strong convexity of some of the ๐นโ„“ , accelerateall of the blockwise step lengths ๐œ๐‘˜+1, ๐‘— with the same factor ๐›พ = ๐›พ ๐‘— , it will generally be the casethat ๐”ผ [V๐‘˜+1(๐‘ข)] < 0. This quantity will have such an order of magnitude that we get mixed๐‘‚ (1/๐‘ 2) + ๐‘‚ (1/๐‘ ) convergence rates. We refer to [Valkonen 2019] for details on such spatiallyadaptive and stochastic primal-dual methods, and [Wright 2015] for an introduction to the idea ofstochastic coordinate descent.

163

11 gaps and ergodic results

11.6 convergence of the admm

Let ๐บ : ๐‘‹ โ†’ โ„, ๐น : ๐‘ โ†’ โ„ be convex, proper, and lower semicontinuous, ๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ),๐ต โˆˆ ๐•ƒ(๐‘ ;๐‘Œ ), and ๐‘ โˆˆ ๐‘Œ . Recall the problem(11.41) min

๐‘ฅ,๐‘ง๐ฝ (๐‘ฅ, ๐‘ง) โ‰” ๐บ (๐‘ฅ) + ๐น (๐‘ง) + ๐›ฟ๐ถ (๐‘ฅ, ๐‘ง),

where๐ถ โ‰” {(๐‘ฅ, ๐‘ง) โˆˆ ๐‘‹ ร— ๐‘ | ๐ด๐‘ฅ + ๐ต๐‘ง = ๐‘}.

We now show an ergodic convergence result for the ADMM applied to this problem, whichwe recall from (8.30) to read

(11.42)

๐‘ฅ๐‘˜+1 โˆˆ (๐ดโˆ—๐ด + ๐œโˆ’1๐œ•๐น )โˆ’1(๐ดโˆ—(๐‘ โˆ’ ๐ต๐‘ง๐‘˜ โˆ’ ๐œโˆ’1_๐‘˜)),๐‘ง๐‘˜+1 โˆˆ (๐ตโˆ—๐ต + ๐œโˆ’1๐œ•๐บ)โˆ’1(๐ตโˆ—(๐‘ โˆ’๐ด๐‘ฅ๐‘˜+1 โˆ’ ๐œโˆ’1_๐‘˜)),_๐‘˜+1 โ‰” _๐‘˜ + ๐œ (๐ด๐‘ฅ๐‘˜+1 + ๐ต๐‘ง๐‘˜+1 โˆ’ ๐‘).

The general structure of the convergence proof is very similar to all the other algorithmswe have studied. However, now the forward-step component does not arise as a gradientโˆ‡๐ธ but is a special nonself-adjoint preconditioner ๏ฟฝ๏ฟฝ๐‘–+1. Moreover, in the rst stage of theproof we obtain a convergence estimate for a duality gap that we then rene at the end ofthe proof to separate function value and constraint satisfaction estimates.

Theorem 11.18. Let ๐บ : ๐‘‹ โ†’ โ„ and ๐น : ๐‘ โ†’ โ„ be convex, proper, and lower semicontinuous,๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ), ๐ต โˆˆ ๐•ƒ(๐‘ ;๐‘Œ ), and ๐‘ โˆˆ ๐‘Œ . Let ๐ฝ be dened as in (11.41), which we assume to admita solution (๐‘ฅ, ๏ฟฝ๏ฟฝ) โˆˆ ๐‘‹ ร—๐‘ . For arbitrary initial iterates (๐‘ฅ0, ๐‘ฆ0, ๐‘ง0), let {(๐‘ฅ๐‘˜+1, ๐‘ง๐‘˜+1, _๐‘˜+1)}๐‘˜โˆˆโ„• โŠ‚๐‘‹ ร— ๐‘ ร— ๐‘Œ be generated by the ADMM (11.42) for (11.41). Dene the ergodic sequences ๐‘ฅ๐‘ โ‰”1๐‘

โˆ‘๐‘โˆ’1๐‘˜=0 ๐‘ฅ

๐‘˜+1 and ๐‘ง๐‘ โ‰” 1๐‘

โˆ‘๐‘โˆ’1๐‘˜=0 ๐‘ง

๐‘˜+1. Then both (๐บ+๐น ) (๐‘ฅ๐‘ , ๐‘ง๐‘ ) โ†’ min๐‘ฅโˆˆ๐‘‹ ๐ฝ (๐‘ฅ) and โ€–๐ด๐‘ฅ๐‘ +๐ต๐‘ง๐‘ โˆ’ ๐‘ โ€–๐‘Œ โ†’ 0 at the rate ๐‘‚ (1/๐‘ ).

Proof. We consider the augmented problem

min(๐‘ฅ,๐‘ง)โˆˆ๐‘‹ร—๐‘

๐ฝ๐œ (๐‘ฅ, ๐‘ง) โ‰” ๐บ (๐‘ฅ) + ๐น (๐‘ง) + ๐›ฟ๐ถ (๐‘ฅ, ๐‘ง) + ๐œ2 โ€–๐ด๐‘ฅ + ๐ต๐‘ง โˆ’ ๐‘ โ€–2๐‘Œ ,

which has the same solutions as (11.41). As the normal cone to the constraint set ๐ถ at anypoint (๐‘ฅ, ๐‘ง) โˆˆ ๐ถ is given by ๐‘๐ถ (๐‘ฅ, ๐‘ง) = {(๐ดโˆ—_, ๐ตโˆ—_) | _ โˆˆ ๐‘Œ }, setting ๐‘ข = (๐‘ฅ, ๐‘ง, _) and

๐ป (๐‘ข) โ‰” ยฉยญยซ๐œ•๐บ (๐‘ฅ) +๐ดโˆ—_ + ๐œ๐ดโˆ—(๐ด๐‘ฅ + ๐ต๐‘ง โˆ’ ๐‘)๐œ•๐น (๐‘ง) + ๐ตโˆ—_ + ๐œ๐ตโˆ—(๐ด๐‘ฅ + ๐ต๐‘ง โˆ’ ๐‘)

โˆ’(๐ด๐‘ฅ + ๐ต๐‘ง โˆ’ ๐‘)ยชยฎยฌ ,

the optimality conditions for this problem can be written as 0 โˆˆ ๐ป (๐‘ข). In particular, thereexists _ โˆˆ ๐‘Œ such that (๐‘ฅ, ๏ฟฝ๏ฟฝ, _) โˆˆ ๐ปโˆ’1(0). However, we will not be needing this, and take _arbitrary.

164

11 gaps and ergodic results

We could rewrite the algorithm (11.42) as (11.27) with

๐ป๐‘˜+1(๐‘ข) = ๐ป (๐‘ข) and ๐‘€๐‘˜+1 =ยฉยญยซ0 โˆ’๐œ๐ดโˆ—๐ต โˆ’๐ดโˆ—

0 0 โˆ’๐ตโˆ—0 0 ๐œโˆ’1๐ผ

ยชยฎยฌ .However,๐‘€๐‘˜+1 is nonsymmetric, and any symmetrizing ๐‘๐‘˜+1 would make ๐‘๐‘˜+1๐ป๐‘˜+1 dicultto analyze. We therefore take instead

๐ป๐‘˜+1(๐‘ข) โ‰” ๐ป (๐‘ข) + ๏ฟฝ๏ฟฝ๐‘˜+1(๐‘ข โˆ’ ๐‘ข๐‘˜) with ๏ฟฝ๏ฟฝ๐‘˜+1 โ‰”ยฉยญยซ0 โˆ’๐œ๐ดโˆ—๐ต โˆ’๐ดโˆ—

0 โˆ’๐œ๐ตโˆ—๐ต โˆ’๐ตโˆ—0 0 0

ยชยฎยฌ ,as well as

๐‘€๐‘˜+1 โ‰”ยฉยญยซ0 0 00 ๐œ๐ตโˆ—๐ต 00 0 ๐œโˆ’1๐ผ

ยชยฎยฌ , and ๐‘๐‘˜+1 โ‰” ๐ผ .

Clearly ๐‘๐‘˜+1๐‘€๐‘˜+1 is self-adjoint.

Let us set

ฮ“ โ‰” ๐œยฉยญยซ๐ดโˆ—๐ด ๐ดโˆ—๐ต 0๐ตโˆ—๐ด ๐ตโˆ—๐ต 00 0 0

ยชยฎยฌ and ฮž โ‰”ยฉยญยซ0 0 ๐ดโˆ—

0 0 ๐ตโˆ—

โˆ’๐ด โˆ’๐ต 0

ยชยฎยฌ .Using the fact that ๐ด๐‘ฅ + ๐ต๐‘ฅ = ๐‘ , observe that we can split ๐ป = ๐œ•๐น + ฮž, where

๐น (๐‘ข) โ‰” ๐บ (๐‘ฅ) + ๐น (๐‘ง) + ๐œ2 โ€–๐ด๐‘ฅ + ๐ต๐‘ง โˆ’ ๐‘ โ€–2๐‘Œ + ใ€ˆ๐‘, _ใ€‰๐‘Œ

= ๐บ (๐‘ฅ) + ๐น (๐‘ง) + 12 โ€–๐‘ข โˆ’ ๐‘ขโ€–2ฮ“ + ใ€ˆ๐‘, _ใ€‰๐‘Œ .

It follows

ใ€ˆ๐ป (๐‘ข๐‘˜+1), ๐‘ข๐‘˜+1 โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1 โ‰ฅ ๐น (๐‘ข๐‘˜+1) โˆ’ ๐น (๐‘ข) + 12 โ€–๐‘ข

๐‘˜+1 โˆ’ ๐‘ขโ€–2ฮ“ + ใ€ˆ๐‘ข,๐‘ข๐‘˜+1ใ€‰ฮž= [๐น (๐‘ฅ๐‘˜+1) +๐บ (๐‘ง๐‘˜+1)] โˆ’ [๐น (๐‘ฅ) +๐บ (๐‘ฅ)] + ใ€ˆ๐‘, _๐‘˜+1 โˆ’ _ใ€‰๐‘Œ+ โ€–๐‘ข๐‘˜+1 โˆ’ ๐‘ขโ€–2ฮ“ + ใ€ˆ๐‘ข,๐‘ข๐‘˜+1ใ€‰ฮž.

Again using ๐ด๐‘ฅ + ๐ต๐‘ฅ = ๐‘ , we expand

ใ€ˆ๐‘ข,๐‘ข๐‘˜+1ใ€‰ฮž = ใ€ˆ_, ๐ด๐‘ฅ๐‘˜+1 + ๐ต๐‘ง๐‘˜+1ใ€‰๐‘Œ โˆ’ ใ€ˆ๐ด๐‘ฅ + ๐ต๏ฟฝ๏ฟฝ, _๐‘˜+1ใ€‰๐‘Œ= ใ€ˆ_, ๐ด๐‘ฅ๐‘˜+1 + ๐ต๐‘ง๐‘˜+1 โˆ’ ๐‘ใ€‰๐‘Œ โˆ’ ใ€ˆ๐‘, _๐‘˜+1 โˆ’ _ใ€‰๐‘Œ .

Thus

(11.43) ใ€ˆ๐ป (๐‘ข๐‘˜+1), ๐‘ข๐‘˜+1 โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1 โ‰ฅ [๐น (๐‘ฅ๐‘˜+1) +๐บ (๐‘ง๐‘˜+1)] โˆ’ [๐น (๐‘ฅ) +๐บ (๐‘ฅ)]+ โ€–๐‘ข๐‘˜+1 โˆ’ ๐‘ขโ€–2ฮ“ + ใ€ˆ_, ๐ด๐‘ฅ๐‘˜+1 + ๐ต๐‘ง๐‘˜+1 โˆ’ ๐‘ใ€‰๐‘Œ

= ๐น (๐‘ข๐‘˜+1; _) โˆ’ ๐น (๐‘ข; _) + โ€–๐‘ข๐‘˜+1 โˆ’ ๐‘ขโ€–2ฮ“

165

11 gaps and ergodic results

for

(11.44) ๐น (๐‘ข; _) โ‰” ๐น (๐‘ฅ) +๐บ (๐‘ง) + ใ€ˆ_, ๐ด๐‘ฅ + ๐ต๐‘ง โˆ’ ๐‘ใ€‰๐‘Œ .

On the other hand,

ใ€ˆ๐‘ข๐‘˜+1 โˆ’ ๐‘ข๐‘˜ , ๐‘ข๐‘˜+1 โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1๏ฟฝ๏ฟฝ๐‘˜+1 = ใ€ˆโˆ’๐œ๐ต(๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜) โˆ’ (_๐‘˜+1 โˆ’ _๐‘˜), ๐ด(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ)ใ€‰๐‘Œ+ ใ€ˆโˆ’๐œ๐ต(๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜) โˆ’ (_๐‘˜+1 โˆ’ _๐‘˜), ๐ต(๐‘ง๐‘˜+1 โˆ’ ๏ฟฝ๏ฟฝ)ใ€‰๐‘Œ

= ใ€ˆโˆ’๐œ๐ต(๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜) โˆ’ (_๐‘˜+1 โˆ’ _๐‘˜), ๐ด(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ) + ๐ต(๐‘ง๐‘˜+1 โˆ’ ๏ฟฝ๏ฟฝ)ใ€‰๐‘Œ .From (11.42) we recall

_๐‘˜+1 โˆ’ _๐‘˜ = ๐œ (๐ด๐‘ฅ๐‘˜+1 + ๐ต๐‘ง๐‘˜+1 โˆ’ ๐‘) = ๐œ [๐ด(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ) + ๐ต(๐‘ง๐‘˜+1 โˆ’ ๏ฟฝ๏ฟฝ)] .Hence

(11.45) ใ€ˆ๐‘ข๐‘˜+1 โˆ’ ๐‘ข๐‘˜ , ๐‘ข๐‘˜+1 โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1๏ฟฝ๏ฟฝ๐‘˜+1 = โˆ’โ€–๐‘ข๐‘˜+1 โˆ’ ๐‘ขโ€–2ฮ“ โˆ’ ใ€ˆ๐ต(๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜), _๐‘˜+1 โˆ’ _๐‘˜ใ€‰๐‘Œโ‰ฅ โˆ’โ€–๐‘ข๐‘˜+1 โˆ’ ๐‘ขโ€–2ฮ“ โˆ’

12 โ€–๐‘ข

๐‘˜+1 โˆ’ ๐‘ข๐‘˜ โ€–2๐‘๐‘–+1๐‘€๐‘–+1 .

Combining (11.43) and (11.45) it follows that

ใ€ˆ๐ป๐‘˜+1(๐‘ข๐‘˜+1), ๐‘ข๐‘˜+1 โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1 โ‰ฅ ๐น (๐‘ข๐‘˜+1; _) โˆ’ ๐น (๐‘ข; _) โˆ’ 12 โ€–๐‘ข

๐‘˜+1 โˆ’ ๐‘ข๐‘˜ โ€–2๐‘๐‘–+1๐‘€๐‘–+1 .

By Theorem 11.12 now

12 โ€–๐‘ข

๐‘ โˆ’ ๐‘ขโ€–2๐‘๐‘+1๐‘€๐‘+1 +๐‘โˆ’1โˆ‘๐‘˜=0

(๐น (๐‘ข๐‘˜+1; _) โˆ’ ๐น (๐‘ข; _)

)โ‰ค 1

2 โ€–๐‘ข0 โˆ’ ๐‘ขโ€–2๐‘1๐‘€1

(๐‘ โ‰ฅ 1).

Writing ๏ฟฝ๏ฟฝ๐‘ = (๐‘ฅ๐‘ , ๏ฟฝ๏ฟฝ๐‘ , _๐‘ ) โ‰” 1๐‘

โˆ‘๐‘โˆ’1๐‘˜=0 ๐‘ข

๐‘˜+1, Jensenโ€™s inequality now shows that

(11.46) ๐น (๏ฟฝ๏ฟฝ๐‘ ; _) โˆ’ ๐น (๐‘ข; _) โ‰ค 12๐‘ โ€–๐‘ข0 โˆ’ ๐‘ขโ€–2๐‘1๐‘€1

(๐‘ โ‰ฅ 1).

Since ๐ด๐‘ฅ + ๐ต๏ฟฝ๏ฟฝ = ๐‘ , observe that ๐น ( ยท ; _) โˆ’ ๐น (๐‘ข; _) is the Lagrangian duality gap (11.7) forthe minmax formulation (8.27) of (11.41), hence nonnegative when ๐‘ข โˆˆ ๐ปโˆ’1(0). So (11.46)shows the convergence of the duality gap. However, we can improve the result somewhatsince _ was taken as arbitrary. Expanding ๐น using (11.44) and taking the supremum over_ โˆˆ ๐”น(0, ^) in (11.46), we thus obtain for any ^ > 0 the estimate

0 โ‰ค [๐น (๐‘ฅ๐‘ ) +๐บ (๐‘ง๐‘ )] โˆ’ [๐น (๐‘ฅ) +๐บ (๐‘ฅ)] + ^โ€–๐ด๐‘ฅ๐‘ + ๐ต๐‘ง๐‘ โˆ’ ๐‘ โ€–๐‘Œ= sup_โˆˆ๐”น(0, )

(๐น (๏ฟฝ๏ฟฝ๐‘ ; _) โˆ’ ๐น (๐‘ข; _)

)โ‰ค sup_โˆˆ๐”น(0, )

12๐‘ โ€–๐‘ข0 โˆ’ ๐‘ขโ€–2๐‘1๐‘€1

.

This gives the claim. ๏ฟฝ

166

12 META-ALGORITHMS

In this chapter, we consider several โ€œmeta-algorithmsโ€ for accelerating minimization al-gorithms such as the ones derived in the previous chapters. These include inertia andover-relaxation, as well as line searches. These schemes dier from the strong convexitybased acceleration of Chapter 9 in that no additional assumptions are made on ๐น and ๐บ .Rather, through the use of an additional extrapolated or interpolated point, the rst twoschemes attempt to obtain a second-order approximation of the function. Line search, onthe other hand, can be used to nd optimal parameters or to estimate unknown parameters.Throughout the chapter, we base our work in the abstract algorithm (11.27), i.e.,

(12.1) 0 โˆˆ ๐ป๐‘˜+1(๐‘ฅ๐‘˜+1) +๐‘€๐‘˜+1(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜),

where the iteration-dependent set-valued operator ๐ป๐‘˜+1 : ๐‘‹ โ‡’ ๐‘‹ in suitable sense approxi-mates a (monotone) operator๐ป : ๐‘‹ โ‡’ ๐‘‹ , whose root we intend to nd, and๐‘€๐‘˜+1 โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ )is a linear preconditioner.

12.1 over-relaxation

We start with over-relaxation. Essentially, this amounts to taking (12.1) and replacing ๐‘ฅ๐‘˜in the preconditioner by an over-relaxed point ๐‘ง๐‘˜ dened for some parameters _๐‘˜ > 0through the recurrence

(12.2) ๐‘ง๐‘˜+1 โ‰” _โˆ’1๐‘˜ ๐‘ฅ๐‘˜+1 + (1 โˆ’ _โˆ’1๐‘˜ )๐‘ง๐‘˜ .

We thus seek to solve

(12.3) 0 โˆˆ ๐ป๐‘˜+1(๐‘ฅ๐‘˜+1) +๐‘€๐‘˜+1(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ง๐‘˜) .

Since ๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜ = _โˆ’1๐‘˜(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ง๐‘˜), we can write (12.1) as

(12.4) 0 โˆˆ ๐ป๐‘˜+1(๐‘ฅ๐‘˜+1) + _๐‘˜๐‘€๐‘˜+1(๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜).

We can therefore lift the overall algorithm into the form (12.1) as

(12.5) 0 โˆˆ ๏ฟฝ๏ฟฝ๐‘˜+1(๐‘ž๐‘˜+1) + ๏ฟฝ๏ฟฝ๐‘˜+1(๐‘ž๐‘˜+1 โˆ’ ๐‘ž๐‘˜)

167

12 meta-algorithms

by taking ๐‘ž โ‰” (๐‘ฅ, ๐‘ง) with

(12.6) ๏ฟฝ๏ฟฝ๐‘˜+1(๐‘ž) โ‰”(๐ป๐‘˜+1(๐‘ฅ)_โˆ’1๐‘˜(๐‘ง โˆ’ ๐‘ฅ)

)and ๏ฟฝ๏ฟฝ๐‘˜+1 โ‰”

(0 _๐‘˜๐‘€๐‘˜+10 (๐ผ โˆ’ _โˆ’1

๐‘˜)๐ผ

).

To be able to use our previous estimate on ใ€ˆ๐ป๐‘˜+1(๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘๐‘˜+1 , we would like to testwith

๐‘๐‘˜+1 โ‰”(_๐‘˜๐‘๐‘˜+1 0

0 0

).

Unfortunately, ๐‘๐‘˜+1๐‘€๐‘˜+1 is not self-adjoint, so Theorem 11.12 does not apply. However,observing from (12.2) that

(12.7) ๐‘ง๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜+1 = (1 โˆ’ _๐‘˜) (๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜),

we are able to proceed along the same lines of proof.

Theorem 12.1. On a Hilbert space ๐‘‹ , let ๐ป๐‘˜+1 : ๐‘‹ โ‡’ ๐‘‹ , and๐‘€๐‘˜+1, ๐‘๐‘˜+1 โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) for ๐‘˜ โˆˆ โ„•.Suppose (12.3) is solvable for the iterates {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„•. If ๐‘๐‘˜+1๐‘€๐‘˜+1 is self-adjoint,

(12.8) _2๐‘˜๐‘๐‘˜+1๐‘€๐‘˜+1 โ‰ฅ _2๐‘˜+1๐‘๐‘˜+2๐‘€๐‘˜+2,

and

(12.9) ใ€ˆ๐ป๐‘˜+1(๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘๐‘˜+1 โ‰ฅ V๐‘˜+1(๐‘ฅ) โˆ’12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ง๐‘˜ โ€–2๐‘๐‘˜+1๐‘„๐‘˜+1for some ๐‘„๐‘˜+1 โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ), for all ๐‘˜ โˆˆ โ„• and some ๐‘ฅ โˆˆ ๐‘‹ and V๐‘˜+1(๐‘ฅ) โˆˆ โ„, then

(12.10)_2๐‘˜+12 โ€–๐‘ง๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘๐‘˜+2๐‘€๐‘˜+2 + _๐‘˜V๐‘˜+1(๐‘ฅ) +

_๐‘˜2 โ€–๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜ โ€–2_๐‘˜ (2_๐‘˜โˆ’1)๐‘๐‘˜+1๐‘€๐‘˜+1โˆ’๐‘๐‘˜+1๐‘„๐‘˜+1

โ‰ค _2๐‘˜

2 โ€–๐‘ง๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘๐‘˜+1๐‘€๐‘˜+1 (๐‘˜ โˆˆ โ„•).

Proof. Taking ๐‘ž โ‰” (๐‘ฅ, ๐‘ฅ), we apply ใ€ˆ ยท , ๐‘ž๐‘˜+1 โˆ’ ๐‘žใ€‰๐‘๐‘˜+1 to (12.3). Thus

0 โˆˆ ใ€ˆ๏ฟฝ๏ฟฝ๐‘˜+1(๐‘ž๐‘˜+1) + ๏ฟฝ๏ฟฝ๐‘˜+1(๐‘ž๐‘˜+1 โˆ’ ๐‘ž๐‘˜), ๐‘ž๐‘˜+1 โˆ’ ๐‘žใ€‰๐‘๐‘˜+1 .

Observe that๐‘๐‘˜+1๏ฟฝ๏ฟฝ๐‘˜+1 =

(0 _2

๐‘˜๐‘๐‘˜+1๐‘€๐‘˜+1

0 0

).

Thus0 โˆˆ ใ€ˆ๐ป๐‘˜+1(๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰_๐‘˜๐‘๐‘˜+1 + _2๐‘˜ ใ€ˆ๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜ , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘๐‘˜+1๐‘€๐‘˜+1 .

Using (12.7) we then get

0 โˆˆ ใ€ˆ๐ป๐‘˜+1(๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜+1โˆ’๐‘ฅใ€‰_๐‘˜๐‘๐‘˜+1 โˆ’_2๐‘˜ (1โˆ’_๐‘˜)โ€–๐‘ง๐‘˜+1โˆ’๐‘ง๐‘˜ โ€–2๐‘๐‘˜+1๐‘€๐‘˜+1 +_2๐‘˜ ใ€ˆ๐‘ง๐‘˜+1โˆ’๐‘ง๐‘˜ , ๐‘ง๐‘˜+1โˆ’๐‘ฅใ€‰๐‘๐‘˜+1๐‘€๐‘˜+1 .

168

12 meta-algorithms

Using the three-point-identity (9.1), we rearrange this into

0 โˆˆ ใ€ˆ๐ป๐‘˜+1(๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰_๐‘˜๐‘๐‘˜+1 +_2๐‘˜โˆ’ 2_2

๐‘˜(1 โˆ’ _๐‘˜)2 โ€–๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜ โ€–2๐‘๐‘˜+1๐‘€๐‘˜+1

+ _2๐‘˜

2 โ€–๐‘ง๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘๐‘˜+1๐‘€๐‘˜+1 โˆ’_2๐‘˜

2 โ€–๐‘ง๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘๐‘˜+1๐‘€๐‘˜+1 .

Observe that _2๐‘˜โˆ’ 2_2

๐‘˜(1 โˆ’ _๐‘˜) = _2

๐‘˜(2_๐‘˜ โˆ’ 1). Using (12.2), (12.9), and (12.8), this gives

(12.10). ๏ฟฝ

Clearly we should try to ensure _๐‘˜ (2_๐‘˜ โˆ’ 1)๐‘๐‘˜+1๐‘€๐‘˜+1 โ‰ฅ ๐‘๐‘˜+1๐‘„๐‘˜+1. If ๐‘๐‘˜+1๐‘€๐‘˜+1 = ๐‘0๐‘€0 is con-stant and ๐‘„๐‘˜+1 = 0, this holds if {_๐‘˜}๐‘˜โˆˆ๐‘ is nonincreasing and satises _๐‘˜ โ‰ฅ 1/2. Therefore,we cannot get any convergence rates from the iterates in this case. It is, however, possibleto obtain convergence of a gap, and it would be possible to obtain weak convergence.

The next result is a variant of Corollary 11.8 for over-relaxed methods.

Corollary 12.2. Let ๐ป โ‰” ๐œ•๐น + โˆ‡๏ฟฝ๏ฟฝ + ฮž, where ฮž โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) is skew-adjoint, and ๏ฟฝ๏ฟฝ : ๐‘‹ โ†’ โ„

and ๐น : ๐‘‹ โ†’ โ„ convex, proper, and lower semicontinuous. Suppose ๐น satises for someฮ› โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) the three-point smoothness condition (11.19). Also let ๐‘€ โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) be positivesemi-denite and self-adjoint. Pick ๐‘ฅ0 = ๐‘ง0 โˆˆ ๐‘‹ , and dene the sequence {(๐‘ฅ๐‘˜+1, ๐‘ง๐‘˜+1)}๐‘˜โˆˆโ„•through

(12.11){

0 โˆˆ [๐œ•๏ฟฝ๏ฟฝ (๐‘ฅ๐‘˜+1) + ๐œ•๐น (๐‘ง๐‘˜) + ฮž๐‘ฅ๐‘˜+1] +๐‘€ (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ง๐‘˜),๐‘ง๐‘˜+1 โ‰” _โˆ’1๐‘˜ ๐‘ฅ

๐‘˜+1 โˆ’ (_โˆ’1๐‘˜ โˆ’ 1)๐‘ง๐‘˜ .

Suppose {_๐‘˜}๐‘˜โˆˆโˆˆโ„• is nonincreasing and

(12.12) _๐‘˜ (2_๐‘˜ โˆ’ 1)๐‘€ โ‰ฅ ฮ› (๐‘˜ โˆˆ โ„•).

Then for every ๐‘ฅ โˆˆ ๐ปโˆ’1(0) and the gap functional G dened in (11.4),

(12.13) G(๐‘ฅ๐‘ ;๐‘ฅ) โ‰ค _202โˆ‘๐‘โˆ’1

๐‘˜=0 _๐‘˜โ€–๐‘ง0 โˆ’ ๐‘ฅ โ€–2๐‘€ , where ๐‘ฅ๐‘ โ‰”

1โˆ‘๐‘โˆ’1๐‘˜=0 _๐‘˜

๐‘โˆ’1โˆ‘๐‘˜=0

_๐‘˜๐‘ฅ๐‘˜+1.

Proof. The method (12.11) is (12.3) with ๏ฟฝ๏ฟฝ๐‘˜+1(๐‘ฅ) โ‰” ๐œ•๏ฟฝ๏ฟฝ (๐‘ฅ) +โˆ‡๐น (๐‘ง๐‘˜) +ฮž๐‘ฅ as well as๐‘€๐‘˜+1 โ‰ก ๐‘€and ๐‘๐‘˜+1 โ‰ก Id. Using (11.19) for ๐น , the convexity of ๏ฟฝ๏ฟฝ , and the assumption ๐‘๐‘Š = [Id, weobtain as in the proof of (11.7) the estimate

ใ€ˆ๐ป๐‘˜+1(๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰ โ‰ฅ G(๐‘ฅ๐‘˜+1;๐‘ฅ) โˆ’ 12 โ€–๐‘ง

๐‘˜ โˆ’ ๐‘ฅ๐‘˜+1โ€–2ฮ›

169

12 meta-algorithms

This provides (12.9) while (12.12) and the constant choice of the testing and preconditioningoperators guarantee that _๐‘˜ (2_๐‘˜ โˆ’ 1)๐‘๐‘˜+1๐‘€๐‘˜+1 โ‰ฅ ๐‘๐‘˜+1๐‘„๐‘˜+1 for ๐‘„๐‘˜+1 โ‰ก ฮ›. By Theorem 12.1,we now obtain

(12.14)_2๐‘˜+12 โ€–๐‘ง๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘€ + _๐‘˜ G(๐‘ฅ๐‘˜+1;๐‘ฅ) โ‰ค _2

๐‘˜

2 โ€–๐‘ง๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘€ .

Summing over ๐‘˜ = 0, . . . , ๐‘ โˆ’ 1 and an application of Jensenโ€™s inequality nishes theproof. ๏ฟฝ

over-relaxed proximal point method

We apply the above results to the over-relaxed proximal point method

(12.15){๐‘ฅ๐‘˜+1 โ‰” prox๐œ๐บ (๐‘ง๐‘˜),๐‘ง๐‘˜+1 โ‰” _โˆ’1๐‘˜ ๐‘ฅ

๐‘˜+1 โˆ’ (_โˆ’1๐‘˜ โˆ’ 1)๐‘ง๐‘˜ .

Theorem 12.3. Let๐บ : ๐‘‹ โ†’ โ„ be convex, proper, and lower semicontinuous with [๐œ•๐บ]โˆ’1(0) โ‰ โˆ…. Pick an initial iterate ๐‘ฅ0 = ๐‘ง0 โˆˆ ๐‘‹ . If {_๐‘˜}๐‘˜โˆˆโ„• โ‰ฅ 1/2 is nonincreasing, the ergodicsequence {๐‘ฅ๐‘ }๐‘โˆˆโ„• dened in (12.13) and generated from the iterates {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• of the over-relaxed proximal point method (12.15) satises ๐บ (๐‘ฅ๐‘ ) โ†’ ๐บmin โ‰” min๐‘ฅโˆˆ๐‘‹ ๐บ (๐‘ฅ) at the rate๐‘‚ (1/๐‘ ).

Proof. We apply Corollary 12.2 with ๏ฟฝ๏ฟฝ = ๐บ , ๐น = 0, ๐‘€ = ๐œโˆ’1Id. Clearly ๐น satises (11.19)with ฮ› = 0. Then (12.12) holds if 2_๐‘˜ โ‰ฅ 1, that is to say _๐‘˜ โ‰ฅ 1/2. For ๐‘ฅ โˆˆ argmin๐บ , wehave G(๐‘ฅ ;๐‘ฅ) = ๐บ (๐‘ฅ) โˆ’๐บ (๐‘ฅ) = ๐บ (๐‘ฅ) โˆ’๐บmin. Therefore Corollary 12.2 gives

(12.16) ๐บ (๐‘ฅ๐‘ ) โ‰ค ๐บmin +_20

2๐œ โˆ‘๐‘โˆ’1๐‘˜=0 _๐‘˜

โ€–๐‘ง0 โˆ’ ๐‘ฅ โ€–2๐‘‹

Since โˆ‘๐‘โˆ’1๐‘˜=0 _๐‘˜ โ‰ฅ ๐‘ /2 by the lower bound on _๐‘˜ , we get the claimed ๐‘‚ (1/๐‘ ) convergence

rate of the function values for the ergodic sequence. ๏ฟฝ

over-relaxed forward-backward splitting

For a smooth function ๐น , the over-relaxed forward-backward splitting iterates

(12.17){๐‘ฅ๐‘˜+1 โ‰” prox๐œ๐บ (๐‘ง๐‘˜ โˆ’ ๐œโˆ‡๐น (๐‘ง๐‘˜)),๐‘ง๐‘˜+1 โ‰” _โˆ’1๐‘˜ ๐‘ฅ

๐‘˜+1 โˆ’ (_โˆ’1๐‘˜ โˆ’ 1)๐‘ง๐‘˜ .

170

12 meta-algorithms

Theorem 12.4. Let ๐ฝ โ‰” ๐บ + ๐น for ๐บ : ๐‘‹ โ†’ โ„ and ๐น : ๐‘‹ โ†’ โ„ be convex, proper, andlower semicontinuous with โˆ‡๐น ๐ฟ-Lipschitz. Suppose [๐œ•๐ฝ ]โˆ’1(0) โ‰  โˆ…. Pick an initial iterate๐‘ฅ0 = ๐‘ง0 โˆˆ ๐‘‹ . If {_๐‘˜}๐‘˜โˆˆโ„• is nonincreasing and satises

(12.18) _๐‘˜ โ‰ฅ 14 (1 +

โˆš1 + 8๐ฟ๐œ),

then the ergodic sequence {๐‘ฅ๐‘ }๐‘โˆˆโ„• dened in (12.13) and generated from the iterates {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„•of the over-relaxed forward-backward splitting (12.17) satises ๐ฝ (๐‘ฅ๐‘ ) โ†’ ๐ฝmin โ‰” min๐‘ฅโˆˆ๐‘‹ ๐ฝ (๐‘ฅ)at the rate ๐‘‚ (1/๐‘ ).

Proof. We apply Corollary 12.2 with ๏ฟฝ๏ฟฝ = ๐บ , ๐น = ๐น , and ๐‘€ = ๐œโˆ’1Id. By Corollary 7.2, ๐นsatises the three-point smoothness condition (11.19) with ฮ› = ๐ฟ Id. The condition (12.12)consequently holds if _๐‘˜ (2_๐‘˜ โˆ’ 1) > ๐ฟ๐œ , which holds under the assumption (12.18). The restfollows as in the proof of Theorem 12.3. ๏ฟฝ

over-relaxed pdps

With ๐น = ๐น0 + ๐ธ : ๐‘‹ โ†’ โ„, ๐บโˆ— : ๐‘Œ โ†’ โ„, and ๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ), take ๐ป : ๐‘‹ ร— ๐‘Œ โ‡’ ๐‘‹ ร— ๐‘Œas well as ๐น, ๏ฟฝ๏ฟฝ , and ฮž as in (11.6), and the preconditioner ๐‘€ as in (11.25) for xed steplength parameters ๐œ, ๐œŽ > 0. Writing ๐‘ง๐‘˜ = (b๐‘˜ , ๐œ๐‘˜), and, as usual ๐‘ข๐‘˜ = (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜), the method(12.4) then becomes the over-relaxed PDPS method with a forward step, also known as theVuโ€“Condat method:

(12.19)

๐‘ฅ๐‘˜+1 โ‰” (๐ผ + ๐œ๐œ•๐น0)โˆ’1(b๐‘˜ โˆ’ ๐œ๐พโˆ—๐‘ฆ๐‘˜ โˆ’ ๐œโˆ‡๐ธ (b๐‘˜)),๐‘ฅ๐‘˜+1 โ‰” (๐‘ฅ๐‘˜+1 โˆ’ b๐‘˜) + ๐‘ฅ๐‘˜+1,๐‘ฆ๐‘˜+1 โ‰” (๐ผ + ๐œŽ๐œ•๐บโˆ—)โˆ’1(๐œ๐‘˜ + ๐œŽ๐พ๐‘ฅ๐‘˜+1),b๐‘˜+1 โ‰” _โˆ’1๐‘˜ ๐‘ฅ

๐‘˜+1 โˆ’ (_โˆ’1๐‘˜ โˆ’ 1)b๐‘˜ ,๐œ๐‘˜+1 โ‰” _โˆ’1๐‘˜ ๐‘ฆ

๐‘˜+1 โˆ’ (_โˆ’1๐‘˜ โˆ’ 1)๐œ๐‘˜ .For the statement of the next result, we recall that for the primal-dual saddle-point operator๐ป from (11.6), the generic gap functional G becomes the primal-dual gap G given in (11.7).

Theorem 12.5. Suppose ๐น0 : ๐‘‹ โ†’ โ„, ๐ธ : ๐‘‹ โ†’ โ„ and ๐บ : ๐‘Œ โ†’ โ„ are convex, proper, andlower semicontinuous on Hilbert spaces ๐‘‹ and ๐‘Œ with โˆ‡๐ธ ๐ฟ-Lipschitz. Let also ๐พ โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ).With ๐น = ๐น0 + ๐ธ, suppose the assumptions of Theorem 5.10 are satised. Pick an initial iterate๐‘ข0 = ๐‘ง0 โˆˆ ๐‘‹ ร— ๐‘Œ . If the sequence {_๐‘˜}๐‘˜โˆˆโ„• is nonincreasing and satises

(12.20) _๐‘˜ โ‰ฅ 14 (1 +

โˆš1 + 8๐ฟ๐œ/(1 โˆ’ ๐œ๐œŽ โ€–๐พ โ€–2)) and ๐œ๐œŽ โ€–๐พ โ€–2 โ‰ค 1,

then the ergodic sequence {๏ฟฝ๏ฟฝ๐‘ = (๐‘ฅ๐‘ , ๏ฟฝ๏ฟฝ๐‘ )}๐‘โˆˆโ„• dened as in (12.13) and generated from theiterates {๐‘ข๐‘˜ = (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜)}๐‘˜โˆˆโ„• of the over-relaxed PDPS method (12.19) satises G(๐‘ฅโˆ—,๐‘ , ๏ฟฝ๏ฟฝโˆ—,๐‘ ) โ†’0 at the rate ๐‘‚ (1/๐‘ ).

171

12 meta-algorithms

Proof. We recall that ๐ปโˆ’1(0) โ‰  โˆ… under the assumptions of Theorem 5.10. Clearly ๐‘€ isself-adjoint. The condition (12.12) can with (10.32) be reduced to(

_๐‘˜ (2_๐‘˜ โˆ’ 1)๐›ฟ๐œId 00 ๐œŽโˆ’1๐ผ โˆ’ ๐œ (1 โˆ’ ๐›ฟ)โˆ’1๐พ๐พโˆ—

)โ‰ฅ

(๐ฟ 00 0

)for some ๐›ฟ โˆˆ (0, 1). As in (10.34) in the proof of Theorem 10.8, these conditions reduce to

(12.21) _๐‘˜ (2_๐‘˜ โˆ’ 1)๐›ฟ โ‰ฅ ๐œ๐ฟ and 1 โˆ’ ๐›ฟ โ‰ฅ ๐œ๐œŽ โ€–๐พ โ€–2.

The rst inequality holds if _๐‘˜ โ‰ฅ 14 (1 +

โˆš1 + 8๐ฟ๐œ๐›ฟโˆ’1). Solving the second inequality as an

equality for ๐›ฟ yields the condition

_๐‘˜ โ‰ฅ 14 (1 +

โˆš1 + 8๐ฟ๐œ [(1 โˆ’ ๐œ๐œŽ โ€–๐พ โ€–2)]โˆ’1),

i.e., (12.20). Now we obtain the gap convergence from Corollary 12.2. ๏ฟฝ

Remark 12.6. The method (12.19) is due to [Vu 2013; Condat 2013]. The convergence of the ergodicgap was observed in [Chambolle & Pock 2015].

12.2 inertia

Our next inertial meta-algorithm will likewise not yield convergence of the main iterates,but through a special arrangement of variables combinedwith intricate unrolling arguments,is able to do awaywith the word ergodic in the gap estimates. In essence, the meta-algorithmreplaces the previous iterate ๐‘ฅ๐‘˜ in the linear preconditioner of (12.1) by an inertial point

(12.22) ๐‘ฅ๐‘˜ โ‰” (1 + ๐›ผ๐‘˜)๐‘ฅ๐‘˜ โˆ’ ๐›ผ๐‘˜๐‘ฅ๐‘˜โˆ’1 for ๐›ผ๐‘˜ โ‰” _๐‘˜ (_โˆ’1๐‘˜โˆ’1 โˆ’ 1)

for some inertial parameters {_๐‘˜}๐‘˜โˆˆโ„•. We thus solve

(12.23) 0 โˆˆ ๐ป๐‘˜+1(๐‘ฅ๐‘˜+1) +๐‘€๐‘˜+1(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜).

We can relate this to over-relaxation as follows: we simply replace ๐‘ง๐‘˜ in the denition (12.2)of ๐‘ง๐‘˜+1 by ๐‘ฅ๐‘˜ , i.e., we take

(12.24) ๐‘ง๐‘˜+1 โ‰” _โˆ’1๐‘˜ ๐‘ฅ๐‘˜+1 โˆ’ (_โˆ’1๐‘˜ โˆ’ 1)๐‘ฅ๐‘˜ .

Since

(12.25) _๐‘˜ (๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜) = ๐‘ฅ๐‘˜+1 โˆ’ (1 โˆ’ _๐‘˜)๐‘ฅ๐‘˜ โˆ’ _๐‘˜ [_โˆ’1๐‘˜โˆ’1๐‘ฅ๐‘˜ โˆ’ (_โˆ’1๐‘˜โˆ’1 โˆ’ 1)๐‘ฅ๐‘˜โˆ’1]= ๐‘ฅ๐‘˜+1 โˆ’ [1 โˆ’ _๐‘˜ + _๐‘˜_โˆ’1๐‘˜โˆ’1]๐‘ฅ๐‘˜ + _๐‘˜ (_โˆ’1๐‘˜โˆ’1 โˆ’ 1)๐‘ฅ๐‘˜โˆ’1= ๐‘ฅ๐‘˜+1 โˆ’ [(1 + ๐›ผ๐‘˜)๐‘ฅ๐‘˜ โˆ’ ๐›ผ๐‘˜๐‘ฅ๐‘˜โˆ’1] = ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ ,

172

12 meta-algorithms

we obtain the method (12.4), with the diering update (12.24) of ๐‘ง๐‘˜+1. Again we can also liftthe overall algorithm into the form (12.1), specically (12.5), by taking ๐‘ž โ‰” (๐‘ฅ, ๐‘ง) with

๏ฟฝ๏ฟฝ๐‘˜+1(๐‘ž) โ‰”(๐ป๐‘˜+1(๐‘ฅ)๐‘ง โˆ’ ๐‘ฅ

), and ๏ฟฝ๏ฟฝ๐‘˜+1 โ‰”

(0 _๐‘˜๐‘€๐‘˜+1

(๐ผ โˆ’ _โˆ’1๐‘˜)๐ผ 0

).

Now comes the trick with inertial methods: Unlike with over-relaxed methods, where wewanted to avoid having to estimate ใ€ˆ๐ป๐‘˜+1(๐‘ฅ๐‘˜+1), ๐‘ง๐‘˜+1 โˆ’ ๏ฟฝ๏ฟฝใ€‰๐‘๐‘˜+1 , with inertial methods we arebrave enough to do this. Indeed, our specic choice (12.24) makes this possible, as we shallsee below. We therefore test with

๐‘๐‘˜+1 โ‰”(

0 0_๐‘˜๐‘๐‘˜+1 0

)to obtain a self-adjoint and positive semi-denite

(12.26) ๐‘๐‘˜+1๐‘€๐‘˜+1 =(0 00 _2

๐‘˜๐‘๐‘˜+1๐‘€๐‘˜+1

).

Therefore Theorem 11.12 applies, and we obtain the following:

Theorem 12.7. Let the inertial parameters {_๐‘˜}๐‘˜โˆˆโ„• โŠ‚ (0,โˆž). On a Hilbert space ๐‘‹ , let๐ป๐‘˜+1 : ๐‘‹ โ‡’ ๐‘‹ , and๐‘€๐‘˜+1, ๐‘๐‘˜+1 โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ) for ๐‘˜ โˆˆ โ„•. Suppose (12.23) is solvable for the iterates{๐‘ฅ๐‘˜}๐‘˜โˆˆโ„•. If ๐‘๐‘˜+1๐‘€๐‘˜+1 is self-adjoint, and

(12.27) _๐‘˜ ใ€ˆ๐ป๐‘˜+1(๐‘ฅ๐‘˜+1), ๐‘ง๐‘˜+1 โˆ’ ๏ฟฝ๏ฟฝใ€‰๐‘๐‘˜+1 โ‰ฅ V๐‘˜+1(๐‘ฅ) +12 โ€–๐‘ง

๐‘˜+1 โˆ’ ๏ฟฝ๏ฟฝโ€–2_2๐‘˜+1๐‘๐‘˜+2๐‘€๐‘˜+2โˆ’_2๐‘˜๐‘๐‘˜+1๐‘€๐‘˜+1

โˆ’ _2๐‘˜

2 โ€–๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜ โ€–2๐‘๐‘˜+1๐‘€๐‘˜+1

for all ๐‘˜ โˆˆ โ„• and some ๐‘ฅ โˆˆ ๐‘‹ and V๐‘˜+1(๐‘ฅ) โˆˆ โ„, then

(12.28)_2๐‘2 โ€–๐‘ง๐‘ โˆ’ ๐‘ฅ โ€–2๐‘๐‘+1๐‘€๐‘+1 +

๐‘โˆ’1โˆ‘๐‘˜=0

V๐‘˜+1(๐‘ฅ) โ‰ค_202 โ€–๐‘ง0 โˆ’ ๐‘ฅ โ€–2๐‘1๐‘€1

(๐‘ โ‰ฅ 1).

Proof. This follows directly from Theorem 11.12 and the expansion (12.26). ๏ฟฝ

We now provide examples of how to apply this result to the proximal point method andforward-backward splitting. As we recall, in these algorithms we take ๐‘๐‘˜+1 = ๐œ‘๐‘˜๐ผ and๐‘Š๐‘˜+1 = ๐œ๐‘˜๐ผ . To proceed, we will need a few further general-purpose technical lemmas.The rst one is the fundamental lemma for inertia, which provides inertial function valueunrolling.

173

12 meta-algorithms

Lemma 12.8. Let๐บ : ๐‘‹ โ†’ โ„ be convex, proper, and lower semicontinuous. Suppose _๐‘˜ โˆˆ [0, 1]and ๐œ‘๐‘˜ , ๐œ๐‘˜ > 0 for ๐‘˜ โˆˆ โ„• with

(12.29) ๐œ‘๐‘˜+1๐œ๐‘˜+1(1 โˆ’ _๐‘˜+1) โ‰ค ๐œ‘๐‘˜๐œ๐‘˜ (๐‘˜ โ‰ฅ 0).

Assume ๐‘ž๐‘˜+1 โˆˆ ๐œ•๐บ (๐‘ฅ๐‘˜+1) for ๐‘˜ = 0, . . . , ๐‘ โˆ’ 1, and 0 โˆˆ ๐œ•๐บ (๐‘ฅ). Then

(12.30) ๐‘ ๐บ,๐‘ โ‰”๐‘โˆ’1โˆ‘๐‘˜=0

๐œ‘๐‘˜๐œ๐‘˜_๐‘˜ ใ€ˆ๐‘ž๐‘˜+1, ๐‘ง๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹

โ‰ฅ ๐œ‘๐‘โˆ’1๐œ๐‘โˆ’1(๐บ (๐‘ฅ๐‘ ) โˆ’๐บ (๐‘ฅ)) โˆ’ ๐œ‘0๐œ0(1 โˆ’ _0) (๐บ (๐‘ฅ0) โˆ’๐บ (๐‘ฅ)).

Proof. Using (12.24), observe that

(12.31) _๐‘˜ (๐‘ง๐‘˜+1 โˆ’ ๐‘ฅ) = _๐‘˜ [_โˆ’1๐‘˜+1๐‘ฅ๐‘˜+1 โˆ’ (_โˆ’1๐‘˜ โˆ’ 1)๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ]= _๐‘˜ (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ) + (1 โˆ’ _๐‘˜) (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜).

Recalling from (12.31) that _๐‘˜ (๐‘ง๐‘˜+1 โˆ’ ๐‘ฅ) = _๐‘˜ (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ) + (1 โˆ’ _๐‘˜) (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜) and using theconvexity of ๐บ , we can estimate

(12.32) ๐‘ ๐บ,๐‘ =๐‘โˆ’1โˆ‘๐‘˜=0

๐œ‘๐‘˜๐œ๐‘˜[_๐‘˜ ใ€ˆ๐‘ž๐‘˜+1, ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ + (1 โˆ’ _๐‘˜)ใ€ˆ๐‘ž๐‘˜+1, ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ใ€‰๐‘‹

]โ‰ฅ

๐‘โˆ’1โˆ‘๐‘˜=0

๐œ‘๐‘˜๐œ๐‘˜[_๐‘˜ (๐บ (๐‘ฅ๐‘˜+1) โˆ’๐บ (๐‘ฅ)) + (1 โˆ’ _๐‘˜) (๐บ (๐‘ฅ๐‘˜+1) โˆ’๐บ (๐‘ฅ๐‘˜))

]=๐‘โˆ’1โˆ‘๐‘˜=0

[๐œ‘๐‘˜๐œ๐‘˜ (๐บ (๐‘ฅ๐‘˜+1) โˆ’๐บ (๐‘ฅ)) โˆ’ ๐œ‘๐‘˜๐œ๐‘˜ (1 โˆ’ _๐‘˜) (๐บ (๐‘ฅ๐‘˜) โˆ’๐บ (๐‘ฅ))

].

Since๐บ (๐‘ฅ๐‘˜) โ‰ฅ ๐บ (๐‘ฅ), the recurrence inequality (12.29) together with a telescoping argumentnow gives

๐‘ ๐บ,๐‘ โ‰ฅ ๐œ‘๐‘โˆ’1๐œ๐‘โˆ’1(๐บ (๐‘ฅ๐‘ ) โˆ’๐บ (๐‘ฅ)) โˆ’ ๐œ‘0๐œ0(1 โˆ’ _0) (๐บ (๐‘ฅ0) โˆ’๐บ (๐‘ฅ)) .

This is the claim. ๏ฟฝ

Lemma 12.9. Suppose _0 = 1 and _โˆ’2๐‘˜

= _โˆ’2๐‘˜+1 โˆ’ _โˆ’1๐‘˜+1 for ๐‘˜ = 0, . . . , ๐‘ โˆ’ 1. Then

(12.33) _๐‘˜+1 =2

1 +โˆš1 + 4_โˆ’2

๐‘˜

(๐‘˜ = 0, . . . , ๐‘ โˆ’ 1)

and _โˆ’1๐‘ โ‰ฅ (๐‘ + 1).

174

12 meta-algorithms

Proof. First, the recurrence (12.33) is a simple solution of the assumed quadratic equation.We show the lower bound by total induction on ๐‘ . Assume that _โˆ’1

๐‘˜โ‰ฅ (๐‘˜ + 1) for all

๐‘˜ = 0, . . . , ๐‘ โˆ’ 1. Rearranging the original update as

_โˆ’2๐‘˜+1 โˆ’ _โˆ’1๐‘˜+1 = _โˆ’2๐‘˜ โˆ’ _โˆ’1๐‘˜ + _โˆ’1๐‘˜ ,summing over ๐‘˜ = 0, . . . , ๐‘ โˆ’ 1, and telescoping yields

_โˆ’2๐‘ โˆ’ _โˆ’1๐‘ =๐‘โˆ’1โˆ‘๐‘˜=0

_โˆ’1๐‘˜ .

From the induction assumption, we thus obtain _โˆ’2๐‘ โˆ’ _โˆ’1๐‘ โ‰ฅ (๐‘ + 2) (๐‘ + 1). Solving thisquadratic inequality as an equality then shows that _โˆ’1๐‘ โ‰ฅ (1 +

โˆš1 + 4(๐‘ + 2) (๐‘ + 1)) โ‰ฅ

(๐‘ + 1), which completes the proof. ๏ฟฝ

inertial proximal point method

Let ๐ป = ๐œ•๐บ and ๏ฟฝ๏ฟฝ๐‘˜+1 = ๐œ๐œ•๐บ for a convex, proper, lower semicontinuous function ๐บ . Take๐œ > 0 and _๐‘˜+1 by (12.33) for _0 = 1. Then (12.23) becomes the inertial proximal pointmethod

(12.34)

๐‘ฅ๐‘˜+1 โ‰” prox๐œ๐บ (๐‘ฅ๐‘˜),๐›ผ๐‘˜+1 โ‰” _๐‘˜+1(_โˆ’1๐‘˜ โˆ’ 1),๐‘ฅ๐‘˜+1 โ‰” (1 + ๐›ผ๐‘˜+1)๐‘ฅ๐‘˜+1 โˆ’ ๐›ผ๐‘˜+1๐‘ฅ๐‘˜ .

Theorem 12.10. Let ๐บ : ๐‘‹ โ†’ โ„ be convex, proper, and lower semicontinuous. Suppose[๐œ•๐บ]โˆ’1(0) โ‰  โˆ…. Take ๐œ > 0 and _0 = 1, and pick an initial iterate ๐‘ฅ0 โˆˆ ๐‘‹ . Then the inertialproximal point method (12.34) satises ๐บ (๐‘ฅ๐‘ ) โ†’ ๐บmin at the rate ๐‘‚ (1/๐‘ 2).

Proof. If we take ๐œ๐‘˜ = ๐œ as stated and ๐œ‘๐‘˜ = _โˆ’2๐‘˜, then (12.9) veries (12.29). Since now

_2๐‘˜+1๐œ‘๐‘˜+1 = _

2๐‘˜๐œ‘๐‘˜ , (12.27) holds if

(12.35) _๐‘˜๐œ‘๐‘˜๐œ๐‘˜ ใ€ˆ๐œ•๐บ (๐‘ฅ๐‘˜+1), ๐‘ง๐‘˜+1 โˆ’ ๏ฟฝ๏ฟฝใ€‰๐‘‹ โ‰ฅ V๐‘˜+1(๐‘ฅ) โˆ’_2๐‘˜๐œ‘๐‘˜

2 โ€–๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜ โ€–2๐‘‹for someV๐‘˜+1(๐‘ฅ) โˆˆ โ„. This is veried by Lemma 12.8 for someV๐‘˜+1(๐‘ฅ) such that

๐‘โˆ’1โˆ‘๐‘˜=0

V๐‘˜+1(๐‘ฅ) โ‰ฅ ๐œ‘๐‘โˆ’1๐œ๐‘โˆ’1(๐บ (๐‘ฅ๐‘ ) โˆ’๐บ (๐‘ฅ)) โˆ’ ๐œ‘0๐œ0(1 โˆ’ _0) (๐บ (๐‘ฅ0) โˆ’๐บ (๐‘ฅ)).

Since _0 = 1, Theorem 12.7 gives the estimate๐œ‘๐‘_

2๐‘

2 โ€–๐‘ฅ๐‘ โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐œ‘๐‘โˆ’1๐œ๐‘โˆ’1(๐บ (๐‘ฅ๐‘ ) โˆ’๐บ (๐‘ฅ)) โ‰ค ๐œ‘0_20

2 โ€–๐‘ฅ0 โˆ’ ๐‘ฅ โ€–2๐‘‹ .

By Lemma 12.9 now๐œ‘๐‘โˆ’1๐œ๐‘โˆ’1 = _โˆ’2๐‘โˆ’1๐œ โ‰ฅ ๐œ๐‘ 2. Thereforewe obtain the claimed convergencerate. ๏ฟฝ

175

12 meta-algorithms

inertial forward-backward splitting

Let๐ป = ๐œ•๐บ+โˆ‡๐น and ๏ฟฝ๏ฟฝ๐‘˜+1(๐‘ฅ) = ๐œ (๐œ•๐บ (๐‘ฅ)+โˆ‡๐น (๐‘ฅ๐‘˜)) for convex, proper, lower semicontinuousfunctions ๐บ and ๐น with ๐น smooth. Take ๐œ > 0 and _๐‘˜+1 by (12.33) for _0 = 1. Then (12.23)becomes the inertial forward-backward splitting method

(12.36)

๐‘ฅ๐‘˜+1 โ‰” prox๐œ๐บ (๐‘ฅ๐‘˜ โˆ’ ๐œโˆ‡๐น (๐‘ฅ๐‘˜)),๐›ผ๐‘˜+1 โ‰” _๐‘˜+1(_โˆ’1๐‘˜ โˆ’ 1),๐‘ฅ๐‘˜+1 โ‰” (1 + ๐›ผ๐‘˜+1)๐‘ฅ๐‘˜+1 โˆ’ ๐›ผ๐‘˜+1๐‘ฅ๐‘˜ .

To prove the convergence of this method, we need to incorporate the forward step intoLemma 12.8.

Lemma 12.11. Let ๐ฝ โ‰” ๐น +๐บ for ๐บ : ๐‘‹ โ†’ โ„ and ๐น : ๐‘‹ โ†’ โ„ be convex, proper, and lowersemicontinuous. Suppose ๐น has ๐ฟ-Lipschitz gradient and that _๐‘˜ โˆˆ [0, 1] and ๐œ‘๐‘˜ , ๐œ๐‘˜ > 0 satisfythe recurrence inequality (12.29) for ๐‘˜ โˆˆ โ„•. Assume๐‘ค๐‘˜+1 โˆˆ ๐œ•๐บ (๐‘ฅ๐‘˜+1) for all ๐‘˜ = 0, . . . , ๐‘ โˆ’ 1,and that 0 โˆˆ ๐œ•๐ฝ (๐‘ฅ). Then

(12.37) ๐‘ ๐‘ โ‰”๐‘โˆ’1โˆ‘๐‘˜=0

(๐œ‘๐‘˜๐œ๐‘˜_๐‘˜ ใ€ˆ๐‘ค๐‘˜+1

๐‘‹ + โˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ง๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ + ๐œ‘๐‘˜๐œ๐‘˜_2๐‘˜๐ฟ

2 โ€–๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜ โ€–2๐‘‹)

โ‰ฅ ๐œ‘๐‘โˆ’1๐œ๐‘โˆ’1(๐ฝ (๐‘ฅ๐‘ ) โˆ’ ๐ฝ (๐‘ฅ)) โˆ’ ๐œ‘0๐œ0(1 โˆ’ _0) (๐ฝ (๐‘ฅ0) โˆ’ ๐ฝ (๐‘ฅ)) .

Proof. We recall from (12.25) that _2๐‘˜2 โ€–๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜ โ€–2๐‘‹ = 1

2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ .We therefore estimateusing Corollary 7.2 that(12.38)

๐‘ ๐น,๐‘ โ‰”๐‘โˆ’1โˆ‘๐‘˜=0

(๐œ‘๐‘˜๐œ๐‘˜_๐‘˜ ใ€ˆโˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ง๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ + ๐œ‘๐‘˜๐œ๐‘˜_

2๐‘˜๐ฟ

2 โ€–๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜ โ€–2๐‘‹)

=๐‘โˆ’1โˆ‘๐‘˜=0

๐œ‘๐‘˜๐œ๐‘˜

[_๐‘˜ ใ€ˆโˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ + (1 โˆ’ _๐‘˜)ใ€ˆโˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ใ€‰๐‘‹ + ๐ฟ2 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹]

โ‰ฅ๐‘โˆ’1โˆ‘๐‘˜=0

๐œ‘๐‘˜๐œ๐‘˜[_๐‘˜ (๐น (๐‘ฅ๐‘˜+1) โˆ’ ๐น (๐‘ฅ)) + (1 โˆ’ _๐‘˜) (๐น (๐‘ฅ๐‘˜+1) โˆ’ ๐น (๐‘ฅ๐‘˜))

]=๐‘โˆ’1โˆ‘๐‘˜=0

[๐œ‘๐‘˜๐œ๐‘˜ (๐น (๐‘ฅ๐‘˜+1) โˆ’ ๐น (๐‘ฅ)) โˆ’ ๐œ‘๐‘˜๐œ๐‘˜ (1 โˆ’ _๐‘˜) (๐น (๐‘ฅ๐‘˜) โˆ’ ๐น (๐‘ฅ))

].

Summing with the estimate (12.32) for ๐บ , we deduce

๐‘ ๐‘ โ‰ฅ๐‘โˆ’1โˆ‘๐‘˜=0

[๐œ‘๐‘˜๐œ๐‘˜ ((๐น +๐บ) (๐‘ฅ๐‘˜+1) โˆ’ (๐น +๐บ) (๐‘ฅ)) โˆ’ ๐œ‘๐‘˜๐œ๐‘˜ (1 โˆ’ _๐‘˜) ((๐น +๐บ) (๐‘ฅ๐‘˜) โˆ’ (๐น +๐บ) (๐‘ฅ))

].

Since (๐น +๐บ) (๐‘ฅ๐‘˜) โ‰ฅ (๐น +๐บ) (๐‘ฅ), the recurrence inequality (12.29) together with a telescopingargument now gives the claim. ๏ฟฝ

176

12 meta-algorithms

Theorem 12.12. Let ๐ฝ โ‰” ๐บ + ๐น for ๐บ : ๐‘‹ โ†’ โ„ and ๐น : ๐‘‹ โ†’ โ„ be convex, proper, andlower semicontinuous with โˆ‡๐น Lipschitz. Suppose [๐œ•๐ฝ ]โˆ’1(0) โ‰  โˆ…. Take ๐œ > 0 and _0 = 1, andpick an initial iterate ๐‘ฅ0 โˆˆ ๐‘‹ . Then the inertial forward-backward splitting (12.36) satises๐ฝ (๐‘ฅ๐‘ ) โ†’ min๐‘ฅโˆˆ๐‘‹ ๐ฝ (๐‘ฅ) at the rate ๐‘‚ (1/๐‘ 2).

Proof. The proof follows that of Theorem 12.10: in place of (12.35) we reduce (12.27) to thecondition

_๐‘˜๐œ‘๐‘˜๐œ๐‘˜ ใ€ˆ๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ง๐‘˜+1 โˆ’ ๏ฟฝ๏ฟฝใ€‰๐‘‹ โ‰ฅ V๐‘˜+1(๐‘ฅ) โˆ’_2๐‘˜๐œ‘๐‘˜

2 โ€–๐‘ง๐‘˜+1 โˆ’ ๐‘ง๐‘˜ โ€–2๐‘‹ .

This is veried by using (12.37) in place of Lemma 12.8. ๏ฟฝ

Remark 12.13 (accelerated gradient methods, FISTA). The inertial scheme was rst introducedby [Nesterov 1983] for the basic gradient descent method for smooth functions. The extensionto forward-backward splitting is due to [Beck & Teboulle 2009a], which proposed a fast iterativeshrinkage-thresholding algorithm (FISTA) for the specic problem of minimizing a least-squaresterm plus a weighted โ„“ 1 norm. (Note that in most treatments of FISTA, our _โˆ’1

๐‘˜is written as ๐‘ก๐‘˜ .) We

refer to [Nesterov 2004; Beck 2017] for a further discussion of these algorithms and more generalaccelerated gradient methods based on combinations of a history of iterates.

Remark 12.14 (PDPS, Douglasโ€“Rachford, and correctors). The above unrolling arguments cannotbe directly applied to the PDPS, Douglasโ€“Rachford splitting, and other methods based on (12.1)with non-maximally monotone ๐ป . Following [Chambolle & Pock 2015], one can apply inertia tothe PDPS method with the restricted choice ๐›ผ๐‘˜ โˆˆ (0, 1/3). This prevents the use of the FISTA rule(12.33) and only yields ๐‘‚ (1/๐‘ ) convergence of an ergodic gap. Based on alternative argumentation,when one of the functions is quadratic, [Patrinos, Stella & Bemporad 2014] managed to employ theFISTA rule and obtain ๐‘‚ (1/๐‘ 2) rates for inertial Douglasโ€“Rachford splitting. Moreover, [Valkonen2020b] observed that by introducing a corrector for the non-subdierential component of ๐ป , inessence ฮž๐‘˜+1, the gap unrolling arguments can be performed. This approach also allows combininginertial acceleration with strong monotonicity based acceleration.

12.3 line search

Let us return to the basic results on weak convergence (Theorem 9.6), strong convergencewith rates (Theorem 10.2), and function value convergence (Theorem 11.4) of the explicitsplitting method. These results depend on the three-point inequalities of Corollary 7.2 (or,for faster rates under strong convexity, Corollary 7.7), specically either the non-valueestimate

ใ€ˆโˆ‡๐น (๐‘ฅ๐‘˜) โˆ’ โˆ‡๐น (๐‘ฅ), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ โˆ’๐ฟ4 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹(12.39)

177

12 meta-algorithms

or the value estimate

ใ€ˆโˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐น (๐‘ฅ๐‘˜+1) โˆ’ ๐น (๐‘ฅ) โˆ’ ๐ฟ

2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ .(12.40)

Recall that for weak convergence of iterates, we required the step length parameters {๐œ๐‘˜}๐‘˜โˆˆโ„•to satisfy on each iteration the bound ๐œ๐‘˜๐ฟ < 2. Under a strong convexity assumption, thebound๐œ๐‘˜๐ฟ โ‰ค 2was sucient for strong convergence of iterates. Function value convergencewas nally shown under the bound ๐œ๐‘˜๐ฟ โ‰ค 1. All cases thus hold for ๐œ๐‘˜๐ฟ โ‰ค 1, which weassume in the following for simplicity.

In this section, we address the following question: What if we do not know the Lipschitzfactor ๐ฟ? A basic idea is to take ๐ฟ large enough. But what is large enough? Finding such alarge enough ๐ฟ is the same as taking ๐œ๐‘˜ small enough and ๐ฟ = 1/๐œ๐‘˜ . This leads us to thefollowing rough line search rule: for some ๐œ > 0 and line search parameter \ โˆˆ (0, 1), startwith ๐œ๐‘˜ โ‰” ๐œ , and iterate ๐œ๐‘˜ โ†ฆโ†’ \๐œ๐‘˜ until (12.40) (or (12.39)) is satised with ๐ฟ = 1/๐œ๐‘˜ . Notethat on each update of ๐œ๐‘˜ , we need to recalculate ๐‘ฅ๐‘˜+1 โ‰” prox๐œ๐‘˜๐บ (๐‘ฅ๐‘˜ โˆ’ ๐œ๐‘˜โˆ‡๐น (๐‘ฅ๐‘˜))).Performing this line search still appears to depend on knowing ๐‘ฅ through (12.40). However,going back to the proof of Corollary 7.2, we see that what is really needed is to satisfy thesmoothness (or descent) inequality (7.5) which was used to derive (12.40). We are thereforelead to the following practical line search method to guarantee the inequality

(12.41) ใ€ˆโˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ใ€‰๐‘‹ โ‰ฅ ๐น (๐‘ฅ๐‘˜+1) โˆ’ ๐น (๐‘ฅ๐‘˜) โˆ’ 12๐œ๐‘˜

โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹

on every iteration:

0. Pick \ โˆˆ (0, 1), ๐œ > 0, _0 โ‰” 1, ๐‘ฅ0 โˆˆ ๐‘‹ ; set ๐‘˜ = 0.

1. Set ๐œ๐‘˜ = ๐œ .

2. Calculate ๐‘ฅ๐‘˜+1 โ‰” prox๐œ๐‘˜๐บ (๐‘ฅ๐‘˜ โˆ’ ๐œ๐‘˜โˆ‡๐น (๐‘ฅ๐‘˜))).3. If (12.41) does not hold, update ๐œ๐‘˜ โ‰” \๐œ๐‘˜ , and go back to step 2.

4. Set ๐‘˜ โ‰” ๐‘˜ + 1, and continue from step 1.

Theorem 12.15 (explicit spliing line search). Let ๐ฝ โ‰” ๐น + ๐บ where ๐บ : ๐‘‹ โ†’ โ„ and๐น : ๐‘‹ โ†’ โ„ are convex, proper, and lower semicontinuous, with โˆ‡๐น moreover Lipschitz.Suppose [๐œ•๐ฝ ]โˆ’1(0) โ‰  โˆ…. Then the above line search method satises ๐ฝ (๐‘ฅ๐‘ ) โ†’ min๐‘ฅโˆˆ๐‘‹ ๐ฝ (๐‘ฅ)at the rate ๐‘‚ (1/๐‘ ). If ๐บ is strongly convex, then this convergence is linear.

Proof. Since โˆ‡๐น is ๏ฟฝ๏ฟฝ-smooth for some unknown ๏ฟฝ๏ฟฝ > 0, eventually the line search proceduresatises 1/๐œ๐‘˜ โ‰ฅ ๏ฟฝ๏ฟฝ. Hence (12.41) is satised, and ๐œ๐‘˜ โ‰ฅ Y > 0 for some Y > 0. We can thereforefollow through the proof of Theorem 11.4 with ๐ฟ = 1/๐œ๐‘˜ . ๏ฟฝ

178

12 meta-algorithms

We can also combine the line search method with the inertial forward-backward splitting(12.36). If in place of (12.41) we seek to satisfy

(12.42) ใ€ˆโˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ใ€‰๐‘‹ โ‰ฅ ๐น (๐‘ฅ๐‘˜+1) โˆ’ ๐น (๐‘ฅ๐‘˜) โˆ’ 12๐œ๐‘˜

โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ ,

then alsoใ€ˆโˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐น (๐‘ฅ๐‘˜+1) โˆ’ ๐น (๐‘ฅ) โˆ’ 1

2๐œ๐‘˜โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ .

This allows the inequality of (12.38) to be shown.

We are therefore lead to the following practical backtracking inertial forward-backwardsplitting:

0. Pick \ โˆˆ (0, 1), ๐œ > 0, _0 โ‰” 1, ๐‘ฅ0 = ๐‘ฅ0 โˆˆ ๐‘‹ ; set ๐‘˜ = 0.

1. Set ๐œ๐‘˜ = ๐œ .

2. Calculate ๐‘ฅ๐‘˜+1 โ‰” prox๐œ๐‘˜๐บ (๐‘ฅ๐‘˜ โˆ’ ๐œ๐‘˜โˆ‡๐น (๐‘ฅ๐‘˜))).3. If (12.42) does not hold, update ๐œ๐‘˜ โ‰” \๐œ๐‘˜ , and go back to step 2.

4. Set ๐‘ฅ๐‘˜+1 โ‰” (1 + ๐›ผ๐‘˜+1)๐‘ฅ๐‘˜+1 โˆ’ ๐›ผ๐‘˜+1๐‘ฅ๐‘˜ for ๐›ผ๐‘˜+1 โ‰” _๐‘˜+1(_โˆ’1๐‘˜ โˆ’ 1).5. Set ๐‘˜ โ‰” ๐‘˜ + 1, and continue from step 1.

The proof of the following is immediate:

Theorem 12.16. Let ๐ฝ โ‰” ๐บ + ๐น for ๐บ : ๐‘‹ โ†’ โ„ and ๐น : ๐‘‹ โ†’ โ„ be convex, proper, and lowersemicontinuous with โˆ‡๐น Lipschitz. Suppose [๐œ•๐ฝ ]โˆ’1(0) โ‰  โˆ…. Take ๐œ > 0 and _0 = 1, and pickan initial iterate ๐‘ฅ0 โˆˆ ๐‘‹ . Then the above backtracking inertial forward-backward splittingsatises ๐ฝ (๐‘ฅ๐‘ ) โ†’ min๐‘ฅโˆˆ๐‘‹ ๐ฝ (๐‘ฅ) at the rate ๐‘‚ (1/๐‘ 2).

The reader may now work out how to use line search to satisfy the nonnegativity ofthe metric ๐‘๐‘˜+1๐‘€๐‘˜+1 in the PDPS method when โ€–๐พ โ€– is not known, or how to satisfy thecondition ๐ฟ๐œ0 + ๐œ0๐œŽ0โ€–๐พ โ€–2 < 1 when the Lipschitz factor ๐ฟ of the forward step component๐ธ is not known.

Remark 12.17 (adaptive inertial parameters, quasi-Newton methods, and primal-dual proximal

line searches). Regarding our statement in the beginning of the chapter about inertia methodsattempting to construct a second-order approximation of the function, [Ochs & Pock 2019] show thatan adaptive inertial forward-backward splitting, performing an optimal line search on _๐‘˜ insteadof ๐œ๐‘˜ , is equivalent to a proximal quasi-Newton method. Such a method is a further developmentof variants [see Beck & Teboulle 2009b] of the method that attempt to restore the monotonicityof forward-backward splitting that is lost by inertia. Indeed, if ๐ฝ (๐‘ฅ๐‘˜+1) โ‰ค ๐ฝ (๐‘ฅ๐‘˜ ) does not hold for_๐‘˜ < 1, we can revert to _๐‘˜ = 1 to ensure descent as the step reduces to basic forward-backwardsplitting, which we know to be monotone by Theorem 11.4. Finally, a line search for the PDPSmethod is studied in [Malitsky & Pock 2018].

179

Part III

NONCONVEX ANALYSIS

180

13 CLARKE SUBDIFFERENTIALS

We now turn to a concept of generalized derivatives that covers, among others, both Frรฉchetderivatives and convex subdierentials. Again, we start with the general class of functionalsthat admit such a derivative. It is clear that we need to require some continuity properties,since otherwise there would be no relation between functional values at neighboring pointsand thus no hope of characterizing optimality through pointwise properties. In Part II, weused lower semicontinuity for this purpose, which together with convexity yielded therequired properties. In this part, we want to drop the latter, global, assumption; in turn weneed to strengthen the local continuity assumption. We thus consider now locally Lipschitzcontinuous functionals. Recall that ๐น : ๐‘‹ โ†’ โ„ is locally Lipschitz continuous near ๐‘ฅ โˆˆ ๐‘‹if there exist a ๐›ฟ > 0 and an ๐ฟ > 0 (which in the following will always denote the localLipschitz constant of ๐น ) such that

|๐น (๐‘ฅ1) โˆ’ ๐น (๐‘ฅ2) | โ‰ค ๐ฟโ€–๐‘ฅ1 โˆ’ ๐‘ฅ2โ€–๐‘‹ for all ๐‘ฅ1, ๐‘ฅ2 โˆˆ ๐•†(๐‘ฅ, ๐›ฟ).

We will refer to the ๐•†(๐‘ฅ, ๐›ฟ) from the denition as the Lipschitz neighborhood of ๐‘ฅ . Notethat for this we have to require that ๐น is (locally) nite-valued (but see Remark 13.27 below).Throughout this chapter, we will assume that๐‘‹ is a Banach space unless stated otherwise.

13.1 definition and basic properties

We proceed as for the convex subdierential and rst dene for ๐น : ๐‘‹ โ†’ โ„ the generalizeddirectional derivative in ๐‘ฅ โˆˆ ๐‘‹ in direction โ„Ž โˆˆ ๐‘‹ as

(13.1) ๐น โ—ฆ(๐‘ฅ ;โ„Ž) โ‰” lim sup๐‘ฆโ†’๐‘ฅ๐‘กโ†’ 0

๐น (๐‘ฆ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฆ)๐‘ก

.

Note the dierence to the classical directional derivative: We no longer require the existenceof a limit but merely of accumulation points. We will need the following properties.

Lemma 13.1. Let ๐น : ๐‘‹ โ†’ โ„ be locally Lipschitz continuous near ๐‘ฅ โˆˆ ๐‘‹ with the factor ๐ฟ.Then the mapping โ„Ž โ†ฆโ†’ ๐น โ—ฆ(๐‘ฅ ;โ„Ž) is

(i) Lipschitz continuous with the factor ๐ฟ and satises |๐น โ—ฆ(๐‘ฅ ;โ„Ž) | โ‰ค ๐ฟโ€–โ„Žโ€–๐‘‹ < โˆž;

181

13 clarke subdifferentials

(ii) subadditive, i.e., ๐น โ—ฆ(๐‘ฅ ;โ„Ž + ๐‘”) โ‰ค ๐น โ—ฆ(๐‘ฅ ;โ„Ž) + ๐น โ—ฆ(๐‘ฅ ;๐‘”) for all โ„Ž,๐‘” โˆˆ ๐‘‹ ;(iii) positively homogeneous, i.e., ๐น โ—ฆ(๐‘ฅ ;๐›ผโ„Ž) = (๐›ผ๐น )โ—ฆ(๐‘ฅ ;โ„Ž) for all ๐›ผ > 0 and โ„Ž โˆˆ ๐‘‹ ;(iv) reective, i.e., ๐น โ—ฆ(๐‘ฅ ;โˆ’โ„Ž) = (โˆ’๐น )โ—ฆ(๐‘ฅ ;โ„Ž) for all โ„Ž โˆˆ ๐‘‹ .

Proof. (i): Let โ„Ž,๐‘” โˆˆ ๐‘‹ be arbitrary. The local Lipschitz continuity of ๐น implies that

๐น (๐‘ฆ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฆ) โ‰ค ๐น (๐‘ฆ + ๐‘ก๐‘”) โˆ’ ๐น (๐‘ฆ) + ๐‘ก๐ฟโ€–โ„Ž โˆ’ ๐‘”โ€–๐‘‹for all ๐‘ฆ suciently close to ๐‘ฅ and ๐‘ก suciently small. Dividing by ๐‘ก > 0 and taking thelim sup then yields that

๐น โ—ฆ(๐‘ฅ ;โ„Ž) โ‰ค ๐น โ—ฆ(๐‘ฅ ;๐‘”) + ๐ฟโ€–โ„Ž โˆ’ ๐‘”โ€–๐‘‹ .Exchanging the roles of โ„Ž and ๐‘” shows the Lipschitz continuity of ๐น โ—ฆ(๐‘ฅ ; ยท), which also yieldsthe claimed boundedness since ๐น โ—ฆ(๐‘ฅ ;๐‘”) = 0 for ๐‘” = 0 from the denition.

(ii): Since ๐‘กโ†’ 0 and ๐‘” โˆˆ ๐‘‹ is xed, ๐‘ฆ โ†’ ๐‘ฅ if and only if ๐‘ฆ + ๐‘ก๐‘” โ†’ ๐‘ฅ . The denition of thelim sup and the productive zero thus immediately yield

๐น โ—ฆ(๐‘ฅ ;โ„Ž + ๐‘”) = lim sup๐‘ฆโ†’๐‘ฅ๐‘กโ†’ 0

๐น (๐‘ฆ + ๐‘กโ„Ž + ๐‘ก๐‘”) โˆ’ ๐น (๐‘ฆ)๐‘ก

โ‰ค lim sup๐‘ฆโ†’๐‘ฅ๐‘กโ†’ 0

๐น (๐‘ฆ + ๐‘กโ„Ž + ๐‘ก๐‘”) โˆ’ ๐น (๐‘ฆ + ๐‘ก๐‘”)๐‘ก

+ lim sup๐‘ฆโ†’๐‘ฅ๐‘กโ†’ 0

๐น (๐‘ฆ + ๐‘ก๐‘”) โˆ’ ๐น (๐‘ฆ)๐‘ก

= ๐น โ—ฆ(๐‘ฅ ;โ„Ž) + ๐น โ—ฆ(๐‘ฅ ;๐‘”).

(iii): Again from the denition we obtain for ๐›ผ > 0 that

๐น โ—ฆ(๐‘ฅ ;๐›ผโ„Ž) = lim sup๐‘ฆโ†’๐‘ฅ๐‘กโ†’ 0

๐น (๐‘ฆ โˆ’ ๐‘ก (๐›ผโ„Ž)) โˆ’ ๐น (๐‘ฆ)๐‘ก

= lim sup๐‘ฆโ†’๐‘ฅ๐›ผ๐‘กโ†’ 0

๐›ผ๐น (๐‘ฆ + (๐›ผ๐‘ก)โ„Ž) โˆ’ ๐น (๐‘ฆ)

๐›ผ๐‘ก= (๐›ผ๐น )โ—ฆ(๐‘ฅ ;โ„Ž).

(iv): Similarly, since ๐‘กโ†’ 0 and โ„Ž โˆˆ ๐‘‹ is xed, ๐‘ฆ โ†’ ๐‘ฅ if and only if ๐‘ค โ‰” ๐‘ฆ โˆ’ ๐‘กโ„Ž โ†’ ๐‘ฅ . Wethus have that

๐น โ—ฆ(๐‘ฅ ;โˆ’โ„Ž) = lim sup๐‘ฆโ†’๐‘ฅ๐‘กโ†’ 0

๐น (๐‘ฆ โˆ’ ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฆ)๐‘ก

= lim sup๐‘คโ†’๐‘ฅ๐‘กโ†’ 0

โˆ’๐น (๐‘ค + ๐‘กโ„Ž) โˆ’ (โˆ’๐น (๐‘ค))๐‘ก

= (โˆ’๐น )โ—ฆ(๐‘ฅ ;โ„Ž). ๏ฟฝ

182

13 clarke subdifferentials

In particular, Lemma 13.1 (i)โ€“(iii) imply that the mapping โ„Ž โ†ฆโ†’ ๐น โ—ฆ(๐‘ฅ ;โ„Ž) is proper, convex,and lower semicontinuous.

We now dene for a locally Lipschitz continuous functional ๐น : ๐‘‹ โ†’ โ„ the Clarkesubdierential in ๐‘ฅ โˆˆ ๐‘‹ as

(13.2) ๐œ•๐ถ๐น (๐‘ฅ) โ‰” {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ โ‰ค ๐น โ—ฆ(๐‘ฅ ;โ„Ž) for all โ„Ž โˆˆ ๐‘‹ } .

The denition together with Lemma 13.1 (i) directly implies the following properties.

Lemma 13.2. Let ๐น : ๐‘‹ โ†’ โ„ be locally Lipschitz continuous and ๐‘ฅ โˆˆ ๐‘‹ . Then ๐œ•๐ถ๐น (๐‘ฅ) isconvex, weakly-โˆ— closed, and bounded. Specically, if ๐น is Lipschitz near ๐‘ฅ with constant ๐ฟ,then ๐œ•๐ถ๐น (๐‘ฅ) โŠ‚ ๐”น(0, ๐ฟ).

Furthermore, we have the following useful continuity property.

Lemma 13.3. Let ๐น : ๐‘‹ โ†’ โ„. Then ๐œ•๐ถ๐น (๐‘ฅ) is strong-to-weak-โˆ— outer semicontinuous, i.e., if๐‘ฅ๐‘› โ†’ ๐‘ฅ and if ๐œ•๐ถ๐น (๐‘ฅ๐‘›) 3 ๐‘ฅโˆ—๐‘› โˆ—โ‡€ ๐‘ฅโˆ—, then ๐‘ฅโˆ— โˆˆ ๐œ•๐ถ๐น (๐‘ฅ).

Proof. Let โ„Ž โˆˆ ๐‘‹ be arbitrary. By assumption, we then have that ใ€ˆ๐‘ฅโˆ—๐‘›, โ„Žใ€‰๐‘‹ โ‰ค ๐น โ—ฆ(๐‘ฅ๐‘›;โ„Ž) forall ๐‘› โˆˆ โ„•. The weak-โˆ— convergence of {๐‘ฅโˆ—๐‘›}๐‘›โˆˆโ„• then implies that

ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ = lim๐‘›โ†’โˆžใ€ˆ๐‘ฅ

โˆ—๐‘›, โ„Žใ€‰๐‘‹ โ‰ค lim sup

๐‘›โ†’โˆž๐น โ—ฆ(๐‘ฅ๐‘›;โ„Ž).

Hence we are nished if we can show that lim sup๐‘›โ†’โˆž ๐นโ—ฆ(๐‘ฅ๐‘›;โ„Ž) โ‰ค ๐น โ—ฆ(๐‘ฅ ;โ„Ž) (since then

๐‘ฅโˆ— โˆˆ ๐œ•๐ถ๐น (๐‘ฅ) by denition).

For this,we use that by denition of ๐น โ—ฆ(๐‘ฅ๐‘›;โ„Ž), there exist sequences {๐‘ฆ๐‘›,๐‘š}๐‘šโˆˆโ„• and {๐‘ก๐‘›,๐‘š}๐‘šโˆˆโ„•with ๐‘ฆ๐‘›,๐‘š โ†’ ๐‘ฅ๐‘› and ๐‘ก๐‘›,๐‘šโ†’ 0 for๐‘š โ†’ โˆž realizing each lim sup. Hence, for all ๐‘› โˆˆ โ„• wecan nd a ๐‘ฆ๐‘› โ‰” ๐‘ฆ๐‘›,๐‘š(๐‘›) and a ๐‘ก๐‘› โ‰” ๐‘ก๐‘›,๐‘š(๐‘›) such that โ€–๐‘ฆ๐‘› โˆ’ ๐‘ฅ๐‘›โ€–๐‘‹ + ๐‘ก๐‘› < ๐‘›โˆ’1 (and hence inparticular ๐‘ฆ๐‘› โ†’ ๐‘ฅ and ๐‘ก๐‘›โ†’ 0) as well as

๐น โ—ฆ(๐‘ฅ๐‘›;โ„Ž) โˆ’ 1๐‘› โ‰ค ๐น (๐‘ฆ๐‘› + ๐‘ก๐‘›โ„Ž) โˆ’ ๐น (๐‘ฆ๐‘›)

๐‘ก๐‘›

for ๐‘› suciently large. Taking the lim sup for ๐‘› โ†’ โˆž on both sides yields the desiredinequality. ๏ฟฝ

Again, the construction immediately yields a Fermat principle.1

1Similarly to Theorem 4.2, we do not need to require Lipschitz continuity of ๐น โ€“ the Fermat principle forthe Clarke subdierential characterizes (among others) any local minimizer. However, if we want to usethis principle to verify that a given ๐‘ฅ โˆˆ ๐‘‹ is indeed a (candidate for) a minimizer, we need a suitablecharacterization of the subdierential โ€“ and this is only possible for (certain) locally Lipschitz continuousfunctionals.

183

13 clarke subdifferentials

Theorem 13.4 (Fermat principle). If ๐น : ๐‘‹ โ†’ โ„ has a local minimum in ๐‘ฅ , then 0 โˆˆ ๐œ•๐ถ๐น (๐‘ฅ).

Proof. If ๐‘ฅ โˆˆ ๐‘‹ is a local minimizer of ๐น , then ๐น (๐‘ฅ) โ‰ค ๐น (๐‘ฅ + ๐‘กโ„Ž) for all โ„Ž โˆˆ ๐‘‹ and ๐‘ก > 0suciently small (since the topological interior is always included in the algebraic interior).But this implies that

ใ€ˆ0, โ„Žใ€‰๐‘‹ = 0 โ‰ค lim inf๐‘กโ†’ 0

๐น (๐‘ฅ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฅ)๐‘ก

โ‰ค lim sup๐‘กโ†’ 0

๐น (๐‘ฅ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฅ)๐‘ก

โ‰ค ๐น โ—ฆ(๐‘ฅ ;โ„Ž)

and hence 0 โˆˆ ๐œ•๐ถ๐น (๐‘ฅ) by denition. ๏ฟฝ

Note that ๐น is not assumed to be convex and hence the condition is in general not sucient(consider, e.g., ๐‘“ (๐‘ก) = โˆ’|๐‘ก |).

13.2 fundamental examples

Next, we show that the Clarke subdierential is indeed a generalization of the derivativeconcepts weโ€™ve studied so far.

Theorem 13.5. Let ๐น : ๐‘‹ โ†’ โ„ be continuously Frรฉchet dierentiable in a neighborhood๐‘ˆ of๐‘ฅ โˆˆ ๐‘‹ . Then ๐œ•๐ถ๐น (๐‘ฅ) = {๐น โ€ฒ(๐‘ฅ)}.

Proof. First, we note that ๐น is locally Lipschitz continuous near ๐‘ฅ by Lemma 2.11. We nowshow that ๐น โ—ฆ(๐‘ฅ ;โ„Ž) = ๐น โ€ฒ(๐‘ฅ)โ„Ž (= ๐น โ€ฒ(๐‘ฅ ;โ„Ž)) for all โ„Ž โˆˆ ๐‘‹ . Take again sequences {๐‘ฆ๐‘›}๐‘›โˆˆโ„• and{๐‘ก๐‘›}๐‘›โˆˆโ„• with ๐‘ฆ๐‘› โ†’ ๐‘ฅ and ๐‘ก๐‘›โ†’ 0 realizing the lim sup in (13.1). Applying the mean valueTheorem 2.10 and using the continuity of ๐น โ€ฒ yields for any โ„Ž โˆˆ ๐‘‹ that

๐น โ—ฆ(๐‘ฅ ;โ„Ž) = lim๐‘›โ†’โˆž

๐น (๐‘ฆ๐‘› + ๐‘ก๐‘›โ„Ž) โˆ’ ๐น (๐‘ฆ๐‘›)๐‘ก๐‘›

= lim๐‘›โ†’โˆž

โˆซ 1

0

1๐‘ก๐‘›ใ€ˆ๐น โ€ฒ(๐‘ฆ๐‘› + ๐‘  (๐‘ก๐‘›โ„Ž)), ๐‘ก๐‘›โ„Žใ€‰๐‘‹ ๐‘‘๐‘ 

= ใ€ˆ๐น โ€ฒ(๐‘ฅ), โ„Žใ€‰๐‘‹since the integrand converges uniformly in ๐‘  โˆˆ [0, 1] to ใ€ˆ๐น โ€ฒ(๐‘ฅ), โ„Žใ€‰๐‘‹ . Hence by denition,๐‘ฅโˆ— โˆˆ ๐œ•๐ถ๐น (๐‘ฅ) if and only if ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ โ‰ค ใ€ˆ๐น โ€ฒ(๐‘ฅ), โ„Žใ€‰๐‘‹ for all โ„Ž โˆˆ ๐‘‹ , which is only possible for๐‘ฅโˆ— = ๐น โ€ฒ(๐‘ฅ). ๏ฟฝ

The following example shows that Theorem 13.5 does not hold if ๐น is merely Frรฉchetdierentiable.

184

13 clarke subdifferentials

Example 13.6. Let ๐‘“ : โ„ โ†’ โ„, ๐‘“ (๐‘ก) = ๐‘ก2 sin(๐‘กโˆ’1). Then it is straightforward (if tedious)to show that ๐‘“ is dierentiable on โ„ with

๐‘“ โ€ฒ(๐‘ก) ={2๐‘ก sin(๐‘กโˆ’1) โˆ’ cos(๐‘กโˆ’1) if ๐‘ก โ‰  0,0 if ๐‘ก = 0.

In particular, ๐‘“ is not continuously dierentiable at ๐‘ก = 0. But a similar limit argumentshows that for all โ„Ž โˆˆ โ„,

๐‘“ โ—ฆ(0;โ„Ž) = |โ„Ž |and hence that

๐œ•๐ถ ๐‘“ (0) = [โˆ’1, 1] ) {0} = {๐‘“ โ€ฒ(0)}.(The rst equality also follows more directly from Theorem 13.26 below.)

As the example suggests, we always have the following weaker relation.

Lemma 13.7. Let ๐น : ๐‘‹ โ†’ โ„ be Lipschitz continuous and Gรขteaux dierentiable in aneighborhood๐‘ˆ of ๐‘ฅ โˆˆ ๐‘‹ . Then ๐ท๐น (๐‘ฅ) โˆˆ ๐œ•๐ถ๐น (๐‘ฅ).

Proof. Let โ„Ž โˆˆ ๐‘‹ be arbitrary. First, note that we always have that

(13.3) ๐น โ€ฒ(๐‘ฅ ;โ„Ž) = lim๐‘กโ†’ 0

๐น (๐‘ฅ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฅ)๐‘ก

โ‰ค lim sup๐‘ฆโ†’๐‘ฅ๐‘กโ†’ 0

๐น (๐‘ฆ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฆ)๐‘ก

= ๐น โ—ฆ(๐‘ฅ ;โ„Ž).

Since ๐น is Gรขteaux dierentiable, it follows that

ใ€ˆ๐ท๐น (๐‘ฅ), โ„Žใ€‰๐‘‹ = ๐น โ€ฒ(๐‘ฅ ;โ„Ž) โ‰ค ๐น โ—ฆ(๐‘ฅ ;โ„Ž) for all โ„Ž โˆˆ ๐‘‹,

and thus ๐ท๐น (๐‘ฅ) โˆˆ ๐œ•๐ถ๐น (๐‘ฅ) by denition. ๏ฟฝ

Similarly, the Clarke subdierential reduces to the convex subdierential in some situa-tions.

Theorem 13.8. Let ๐น : ๐‘‹ โ†’ โ„ be convex and lower semicontinuous. Then ๐œ•๐ถ๐น (๐‘ฅ) = ๐œ•๐น (๐‘ฅ)for all ๐‘ฅ โˆˆ int(dom ๐น ).

Proof. By Theorem 3.13, ๐น is locally Lipschitz continuous near ๐‘ฅ โˆˆ int(dom ๐น ). We nowshow that ๐น โ—ฆ(๐‘ฅ ;โ„Ž) = ๐น โ€ฒ(๐‘ฅ ;โ„Ž) for all โ„Ž โˆˆ ๐‘‹ , which together with Lemma 4.4 yields theclaim. By (13.3), we always have that ๐น โ€ฒ(๐‘ฅ ;โ„Ž) โ‰ค ๐น โ—ฆ(๐‘ฅ ;โ„Ž). To show the reverse inequality,

185

13 clarke subdifferentials

let ๐›ฟ > 0 be arbitrary. Since the dierence quotient of convex functionals is increasing byLemma 4.3 (i), we obtain that

๐น โ—ฆ(๐‘ฅ ;โ„Ž) = limYโ†’ 0

sup๐‘ฆโˆˆ๐”น(๐‘ฅ,๐›ฟY)

sup0<๐‘ก<Y

๐น (๐‘ฆ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฆ)๐‘ก

โ‰ค limYโ†’ 0

sup๐‘ฆโˆˆ๐”น(๐‘ฅ,๐›ฟY)

๐น (๐‘ฆ + Yโ„Ž) โˆ’ ๐น (๐‘ฆ)Y

โ‰ค limYโ†’ 0

๐น (๐‘ฅ + Yโ„Ž) โˆ’ ๐น (๐‘ฅ)Y

+ 2๐ฟ๐›ฟ

= ๐น โ€ฒ(๐‘ฅ ;โ„Ž) + 2๐ฟ๐›ฟ,

where the last inequality follows by taking two productive zeros and using the localLipschitz continuity in ๐‘ฅ . Since ๐›ฟ > 0 was arbitrary, this implies that ๐น โ—ฆ(๐‘ฅ ;โ„Ž) โ‰ค ๐น โ€ฒ(๐‘ฅ ;โ„Ž),and the claim follows. ๏ฟฝ

A locally Lipschitz continuous functional ๐น : ๐‘‹ โ†’ โ„ with ๐น โ—ฆ(๐‘ฅ ;โ„Ž) = ๐น โ€ฒ(๐‘ฅ ;โ„Ž) for all โ„Ž โˆˆ ๐‘‹is called regular in ๐‘ฅ โˆˆ ๐‘‹ . We have just shown that every continuously dierentiable andevery convex and lower semicontinuous functional is regular; intuitively, a function is thusregular at any points in which it is either dierentiable or at least has a โ€œconvex kinkโ€.

Finally, similarly to Theorem 4.11 one can show the following pointwise characterizationof the Clarke subdierential of integral functionals with Lipschitz continuous integrands.We again assume that ฮฉ โŠ‚ โ„๐‘‘ is open and bounded.

Theorem 13.9. Let ๐‘“ : โ„ โ†’ โ„ be Lipschitz continuous and ๐น : ๐ฟ๐‘ (ฮฉ) โ†’ โ„ with 1 โ‰ค ๐‘ < โˆžas in Lemma 3.7. Then we have for all ๐‘ข โˆˆ ๐ฟ๐‘ (ฮฉ) with ๐‘ž = ๐‘

๐‘โˆ’1 (where ๐‘ž = โˆž for ๐‘ = 1) that

๐œ•๐ถ๐น (๐‘ข) โŠ‚ {๐‘ขโˆ— โˆˆ ๐ฟ๐‘ž (ฮฉ) | ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐œ•๐ถ ๐‘“ (๐‘ข (๐‘ฅ)) for almost every ๐‘ฅ โˆˆ ฮฉ} .

If ๐‘“ is regular at ๐‘ข (๐‘ฅ) for almost every ๐‘ฅ โˆˆ ฮฉ, then ๐น is regular at ๐‘ข, and equality holds.

Proof. First, by the properties of the Lebesgue integral and the Lipschitz continuity of ๐‘“ ,we have for any ๐‘ข, ๐‘ฃ โˆˆ ๐ฟ๐‘ (ฮฉ) that

|๐น (๐‘ข) โˆ’ ๐น (๐‘ฃ) | โ‰คโˆซฮฉ|๐‘“ (๐‘ข (๐‘ฅ)) โˆ’ ๐‘“ (๐‘ฃ (๐‘ฅ)) | ๐‘‘๐‘ฅ โ‰ค ๐ฟ

โˆซฮฉ|๐‘ข (๐‘ฅ) โˆ’ ๐‘ฃ (๐‘ฅ) | ๐‘‘๐‘ฅ โ‰ค ๐ฟ๐ถ๐‘ โ€–๐‘ข โˆ’ ๐‘ฃ โ€–๐ฟ๐‘ ,

where ๐ฟ is the Lipschitz constant of ๐‘“ and๐ถ๐‘ the constant from the continuous embedding๐ฟ๐‘ (ฮฉ) โ†ฉโ†’ ๐ฟ1(ฮฉ) for 1 โ‰ค ๐‘ โ‰ค โˆž. Hence ๐น : ๐ฟ๐‘ (ฮฉ) โ†’ โ„ is Lipschitz continuous andtherefore nite-valued as well.

186

13 clarke subdifferentials

Let now b โˆˆ ๐œ•๐ถ๐น (๐‘ข) be given and โ„Ž โˆˆ ๐ฟ๐‘ (ฮฉ) be arbitrary. By denition, we thus have

(13.4) ใ€ˆb, โ„Žใ€‰๐ฟ๐‘ โ‰ค ๐น โ—ฆ(๐‘ข;โ„Ž) = lim sup๐‘ฃโ†’๐‘ข๐‘กโ†’ 0

๐น (๐‘ฃ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฃ)๐‘ก

โ‰คโˆซฮฉlim sup๐‘ฃโ†’๐‘ข๐‘กโ†’ 0

๐‘“ (๐‘ฃ (๐‘ฅ) + ๐‘กโ„Ž(๐‘ฅ)) โˆ’ ๐‘“ (๐‘ฃ (๐‘ฅ))๐‘ก

๐‘‘๐‘ฅ

โ‰คโˆซฮฉlim sup๐‘ฃ๐‘ฅโ†’๐‘ข (๐‘ฅ)๐‘ก๐‘ฅโ†’ 0

๐‘“ (๐‘ฃ๐‘ฅ + ๐‘ก๐‘ฅโ„Ž(๐‘ฅ)) โˆ’ ๐‘“ (๐‘ฃ๐‘ฅ )๐‘ก๐‘ฅ

๐‘‘๐‘ฅ

=โˆซฮฉ๐‘“ โ—ฆ(๐‘ข (๐‘ฅ);โ„Ž(๐‘ฅ)) ๐‘‘๐‘ฅ,

where we were able to use the Reverse Fatou Lemma to exchange the lim sup with theintegral in the rst inequality since the integrand is bounded from above by the integrablefunction ๐ฟ |โ„Ž | due to Lemma 13.1 (i); the second inequality follows by bounding for almostevery ๐‘ฅ โˆˆ ฮฉ the (pointwise) limit over the sequences realizing the lim sup in the secondline by the lim sup over all admissible sequences.

To interpret (13.4) pointwise, we dene for ๐‘ฅ โˆˆ ฮฉ

๐‘”๐‘ฅ : โ„ โ†’ โ„, ๐‘”๐‘ฅ (๐‘ก) โ‰” ๐‘“ โ—ฆ(๐‘ข (๐‘ฅ); ๐‘ก).

From Lemma 13.1 (ii)โ€“(iii), it follows that ๐‘”๐‘ฅ is convex; Lemma 13.1 (i) further implies thatthe function ๐‘ฅ โ†ฆโ†’ ๐‘”๐‘ฅ (โ„Ž(๐‘ฅ)) is measurable for any โ„Ž โˆˆ ๐ฟ๐‘ (ฮฉ). Since ๐‘”๐‘ฅ (0) = 0, (13.4) impliesthat

ใ€ˆb, โ„Ž โˆ’ 0ใ€‰๐ฟ๐‘ โ‰คโˆซฮฉ๐‘”๐‘ฅ (โ„Ž(๐‘ฅ)) ๐‘‘๐‘ฅ โˆ’

โˆซฮฉ๐‘”๐‘ฅ (0) ๐‘‘๐‘ฅ,

i.e., b โˆˆ ๐œ•๐บ (0) for the superposition operator ๐บ (โ„Ž) โ‰”โˆซฮฉ๐‘”๐‘ฅ (โ„Ž(๐‘ฅ)) ๐‘‘๐‘ฅ . Arguing exactly as

in the proof of Theorem 4.11, this implies that b = ๐‘ขโˆ— โˆˆ ๐ฟ๐‘ž (ฮฉ) with ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐œ•๐‘”๐‘ฅ (0) foralmost every ๐‘ฅ โˆˆ ฮฉ, i.e.,

๐‘ขโˆ—(๐‘ฅ)โ„Ž(๐‘ฅ) = ๐‘ขโˆ—(๐‘ฅ) (โ„Ž(๐‘ฅ) โˆ’ 0) โ‰ค ๐‘”๐‘ฅ (โ„Ž(๐‘ฅ)) โˆ’ ๐‘”๐‘ฅ (0) = ๐‘“ โ—ฆ(๐‘ข (๐‘ฅ);โ„Ž(๐‘ฅ))

for almost every ๐‘ฅ โˆˆ ฮฉ. Since โ„Ž โˆˆ ๐ฟ๐‘ (ฮฉ) was arbitrary, this implies that๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐œ•๐ถ ๐‘“ (๐‘ข (๐‘ฅ))almost everywhere as claimed.

It remains to show the remaining assertions when ๐‘“ is regular. In this case, it follows from(13.4) that for any โ„Ž โˆˆ ๐ฟ๐‘ (ฮฉ),

(13.5) ๐น โ—ฆ(๐‘ข;โ„Ž) โ‰คโˆซฮฉ๐‘“ โ—ฆ(๐‘ข (๐‘ฅ);โ„Ž(๐‘ฅ)) ๐‘‘๐‘ฅ =

โˆซฮฉ๐‘“ โ€ฒ(๐‘ข (๐‘ฅ);โ„Ž(๐‘ฅ)) ๐‘‘๐‘ฅ

โ‰ค lim๐‘กโ†’ 0

๐น (๐‘ข + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ข)๐‘ก

= ๐น โ€ฒ(๐‘ข;โ„Ž) โ‰ค ๐น โ—ฆ(๐‘ข;โ„Ž),

187

13 clarke subdifferentials

where the second inequality is obtained by applying Fatouโ€™s Lemma, this time appealingto the integrable lower bound โˆ’๐ฟ |โ„Ž(๐‘ฅ) |. This shows that ๐น โ€ฒ(๐‘ข;โ„Ž) = ๐น โ—ฆ(๐‘ข;โ„Ž) and hencethat ๐น is regular. We further obtain for any ๐‘ขโˆ— โˆˆ ๐ฟ๐‘ž (ฮฉ) with ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐œ•๐ถ ๐‘“ (๐‘ข (๐‘ฅ)) almosteverywhere and any โ„Ž โˆˆ ๐ฟ๐‘ (ฮฉ), that

ใ€ˆ๐‘ขโˆ—, โ„Žใ€‰๐ฟ๐‘ =โˆซฮฉ๐‘ขโˆ—(๐‘ฅ)โ„Ž(๐‘ฅ) ๐‘‘๐‘ฅ โ‰ค

โˆซฮฉ๐‘“ โ—ฆ(๐‘ข (๐‘ฅ);โ„Ž(๐‘ฅ)) ๐‘‘๐‘ฅ โ‰ค ๐น โ—ฆ(๐‘ข,โ„Ž),

where we have used (13.5) in the last inequality. Since โ„Ž โˆˆ ๐ฟ๐‘ (ฮฉ) was arbitrary, this impliesthat ๐‘ขโˆ— โˆˆ ๐œ•๐ถ๐น (๐‘ข). ๏ฟฝ

Under additional assumptions similar to those of Theorem 2.12 and with more technicalarguments, this result can be extended to spatially varying integrands ๐‘“ : ฮฉ ร—โ„ โ†’ โ„; see,e.g., [Clarke 1990, Theorem 2.7.5].

13.3 calculus rules

We now turn to calculus rules. The rst one follows directly from the denition.

Theorem 13.10. Let ๐น : ๐‘‹ โ†’ โ„ be locally Lipschitz continuous near ๐‘ฅ โˆˆ ๐‘‹ and ๐›ผ โˆˆ โ„. Then,

๐œ•๐ถ (๐›ผ๐น ) (๐‘ฅ) = ๐›ผ๐œ•๐ถ (๐น ) (๐‘ฅ).

Proof. First, ๐›ผ๐น is clearly locally Lipschitz continuous near ๐‘ฅ for any ๐›ผ โˆˆ โ„. If ๐›ผ = 0, bothsides of the claimed equality are zero (which is easiest seen from Theorem 13.5). If ๐›ผ > 0,we have that (๐›ผ๐น )โ—ฆ(๐‘ฅ ;โ„Ž) = ๐›ผ๐น โ—ฆ(๐‘ฅ ;โ„Ž) for all โ„Ž โˆˆ ๐‘‹ from the denition. Hence,

๐›ผ๐œ•๐ถ๐น (๐‘ฅ) = {๐›ผ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ โ‰ค ๐น โ—ฆ(๐‘ฅ ;โ„Ž) for all โ„Ž โˆˆ ๐‘‹ }= {๐›ผ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐›ผ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ โ‰ค ๐›ผ๐น โ—ฆ(๐‘ฅ ;โ„Ž) for all โ„Ž โˆˆ ๐‘‹ }= {๐‘ฆโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฆโˆ—, โ„Žใ€‰๐‘‹ โ‰ค (๐›ผ๐น )โ—ฆ(๐‘ฅ ;โ„Ž) for all โ„Ž โˆˆ ๐‘‹ }= ๐œ•๐ถ (๐›ผ๐น ) (๐‘ฅ).

To conclude the proof, it suces to show the claim for๐›ผ = โˆ’1. For that,we use Lemma 13.1 (iv)to obtain that

๐œ•๐ถ (โˆ’๐น ) (๐‘ฅ) = {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ โ‰ค (โˆ’๐น )โ—ฆ(๐‘ฅ ;โ„Ž) for all โ„Ž โˆˆ ๐‘‹ }= {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆโˆ’๐‘ฅโˆ—,โˆ’โ„Žใ€‰๐‘‹ โ‰ค ๐น โ—ฆ(๐‘ฅ ;โˆ’โ„Ž) for all โ„Ž โˆˆ ๐‘‹ }= {โˆ’๐‘ฆโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฆโˆ—, ๐‘”ใ€‰๐‘‹ โ‰ค ๐น โ—ฆ(๐‘ฅ ;๐‘”) for all ๐‘” โˆˆ ๐‘‹ }= โˆ’๐œ•๐ถ๐น (๐‘ฅ). ๏ฟฝ

Corollary 13.11. Let ๐น : ๐‘‹ โ†’ โ„ be locally Lipschitz continuous near ๐‘ฅ โˆˆ ๐‘‹ . If ๐น has a localmaximum in ๐‘ฅ , then 0 โˆˆ ๐œ•๐ถ๐น (๐‘ฅ).

188

13 clarke subdifferentials

Proof. If ๐‘ฅ is a local maximizer of ๐น , it is a local minimizer of โˆ’๐น . Hence, Theorems 13.4and 13.10 imply that

0 โˆˆ ๐œ•๐ถ (โˆ’๐น ) (๐‘ฅ) = โˆ’๐œ•๐ถ๐น (๐‘ฅ),i.e., 0 = โˆ’0 โˆˆ ๐œ•๐ถ๐น (๐‘ฅ). ๏ฟฝ

support functionals

The remaining rules are signicantly more involved. As in the previous proofs, a key stepis to relate dierent sets of the form (13.2), which we will do with the help of the followinglemmas.

Lemma 13.12. Let ๐น : ๐‘‹ โ†’ โ„ be positively homogeneous, subadditive, and lower semicontin-uous, and let

๐ด = {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค ๐น (๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘‹ } .Then,

(13.6) ๐น (๐‘ฅ) = sup๐‘ฅโˆ—โˆˆ๐ด

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ for all ๐‘ฅ โˆˆ ๐‘‹ .

Proof. By denition of ๐ด, the inequality ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐น (๐‘ฅ) โ‰ค 0 holds for all ๐‘ฅ โˆˆ ๐‘‹ if and onlyif ๐‘ฅโˆ— โˆˆ ๐ด. Thus, a case distinction as in Example 5.3 (ii) using the positively homogeneityof ๐น shows that

๐น โˆ—(๐‘ฅโˆ—) = sup๐‘ฅโˆˆ๐‘‹

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โˆ’ ๐น (๐‘ฅ) ={0 ๐‘ฅโˆ— โˆˆ ๐ด,โˆž ๐‘ฅโˆ— โˆ‰ ๐ด,

i.e., ๐น โˆ— = ๐›ฟ๐ด. Further, ๐น by assumption is also subadditive and hence convex as well aslower semicontinuous. Theorem 5.1 thus implies that

๐น (๐‘ฅ) = ๐น โˆ—โˆ—(๐‘ฅ) = (๐›ฟ๐ด)โˆ—(๐‘ฅ) = sup๐‘ฅโˆ—โˆˆ๐ด

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ . ๏ฟฝ

The right-hand side of (13.6) is called the support functional of ๐ด โŠ‚ ๐‘‹ โˆ—; see, e.g., [Hiriart-Urruty & Lemarรฉchal 2001] for their use in convex analysis (in nite dimensions).

Lemma 13.13. Let ๐ด, ๐ต โŠ‚ ๐‘‹ โˆ— be nonempty, convex and weakly-โˆ— closed. Then ๐ด โŠ‚ ๐ต if andonly if

(13.7) sup๐‘ฅโˆ—โˆˆ๐ด

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค sup๐‘ฅโˆ—โˆˆ๐ต

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ for all ๐‘ฅ โˆˆ ๐‘‹ .

189

13 clarke subdifferentials

Proof. If ๐ด โŠ‚ ๐ต, then the right-hand side of (13.7) is obviously not less than the left-handside. Conversely, assume that there exists an ๐‘ฅโˆ— โˆˆ ๐ด with ๐‘ฅโˆ— โˆ‰ ๐ต. By the assumptions on ๐ดand ๐ต, we then obtain from Theorem 1.13 an ๐‘ฅ โˆˆ ๐‘‹ and a _ โˆˆ โ„ with

ใ€ˆ๐‘งโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค _ < ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ for all ๐‘งโˆ— โˆˆ ๐ต.

Taking the supremum over all ๐‘งโˆ— โˆˆ ๐ต and estimating the right-hand side by the supremumover all ๐‘ฅโˆ— โˆˆ ๐ด then yields that

sup๐‘งโˆ—โˆˆ๐ต

ใ€ˆ๐‘งโˆ—, ๐‘ฅใ€‰๐‘‹ < sup๐‘ฅโˆ—โˆˆ๐ด

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ .

Hence (13.7) is violated, and the claim follows by contraposition. ๏ฟฝ

Corollary 13.14. Let ๐ด, ๐ต โŠ‚ ๐‘‹ โˆ— be nonempty, convex and weakly-โˆ— closed. Then ๐ด = ๐ต if andonly if

(13.8) sup๐‘ฅโˆ—โˆˆ๐ด

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ = sup๐‘ฅโˆ—โˆˆ๐ต

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ for all ๐‘ฅ โˆˆ ๐‘‹ .

Proof. Again, the claim is obvious if ๐ด = ๐ต. Conversely, if (13.8) holds, then in particular(13.7) holds, and we obtain from Lemma 13.13 that ๐ด โŠ‚ ๐ต. Exchanging the roles of ๐ด and ๐ตnow yields the claim. ๏ฟฝ

Lemma 13.12 together with Lemma 13.1 directly yields the following useful representation.

Corollary 13.15. Let ๐น : ๐‘‹ โ†’ โ„ be locally Lipschitz continuous and ๐‘ฅ โˆˆ ๐‘‹ . Then

๐น โ—ฆ(๐‘ฅ ;โ„Ž) = sup๐‘ฅโˆ—โˆˆ๐œ•๐ถ๐น (๐‘ฅ)

ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ for all โ„Ž โˆˆ ๐‘‹ .

For example, this implies a converse result to Theorem 13.5.

Corollary 13.16. Let ๐น : ๐‘‹ โ†’ โ„ be locally Lipschitz continuous near ๐‘ฅ . If ๐œ•๐ถ๐น (๐‘ฅ) = {๐‘ฅโˆ—} forsome ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—, then ๐น is Gรขteaux dierentiable in ๐‘ฅ with ๐ท๐น (๐‘ฅ) = ๐‘ฅโˆ—.

Proof. Under the assumption, it follows from Corollary 13.15 that

๐น โ—ฆ(๐‘ฅ ;โ„Ž) = sup๐‘ฅโˆ—โˆˆ๐œ•๐น๐ถ (๐‘ฅ)

ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ = ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹

190

13 clarke subdifferentials

for all โ„Ž โˆˆ ๐‘‹ . In particular, ๐น โ—ฆ(๐‘ฅ ;โ„Ž) is linear (and not just reective) in โ„Ž. It thus followsfrom Lemma 13.1 (iv) that for any โ„Ž โˆˆ ๐‘‹ ,

lim inf๐‘ฆโ†’๐‘ฅ๐‘กโ†’ 0

๐น (๐‘ฆ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฆ)๐‘ก

= โˆ’ lim sup๐‘ฆโ†’๐‘ฅ๐‘กโ†’ 0

โˆ’๐น (๐‘ฆ + ๐‘กโ„Ž) โˆ’ (โˆ’๐น (๐‘ฆ))๐‘ก

= โˆ’(โˆ’๐น )โ—ฆ(๐‘ฅ ;โ„Ž) = โˆ’๐น โ—ฆ(๐‘ฅ ;โˆ’โ„Ž) = ๐น โ—ฆ(๐‘ฅ, โ„Ž)= lim sup

๐‘ฆโ†’๐‘ฅ๐‘กโ†’ 0

๐น (๐‘ฆ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฆ)๐‘ก

.

Hence the lim sup is a proper limit, and thus ๐น โ—ฆ(๐‘ฅ ;โ„Ž) = ๐น โ€ฒ(๐‘ฅ ;โ„Ž); i.e., ๐น is regular in ๐‘ฅ . Thisshows that ๐น โ€ฒ(๐‘ฅ ;โ„Ž) is linear and bounded in โ„Ž, and hence ๐‘ฅโˆ— is by denition the Gรขteauxderivative. ๏ฟฝ

It is not hard to verify from the denition and the Lipschitz continuity of ๐น that in thiscase, ๐‘ฅโˆ— is in fact a Frรฉchet derivative.

We can also use this to show the promised nonemptiness of the convex subdierential.

Theorem 13.17. Let ๐‘‹ be a Banach space and let ๐น : ๐‘‹ โ†’ โ„ be proper, convex, and lowersemicontinuous, and ๐‘ฅ โˆˆ int(dom ๐น ). Then ๐œ•๐น (๐‘ฅ) is nonempty, convex, weakly-โˆ— closed, andbounded.

Proof. Since ๐‘ฅ โˆˆ int(dom ๐น ), Theorem 13.8 shows that ๐œ•๐ถ๐น (๐‘ฅ) = ๐œ•๐น (๐‘ฅ) and that ๐น is regularin ๐‘ฅ . It thus follows from Corollary 13.15 and Lemma 4.3 (iii) that sup๐‘ฅโˆ—โˆˆ๐œ•๐น (๐‘ฅ) ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ =๐น โ€ฒ(๐‘ฅ ;โ„Ž) โˆˆ โ„ for ๐‘ฅ โˆˆ int(dom ๐น ), and hence the supremum cannot be over the empty set(for which any supremum is โˆ’โˆž by convention). The remaining properties follow fromLemma 13.2. ๏ฟฝ

By a similar argument, we now obtain the promised converse of Theorem 4.5; we combineboth statements here for the sake of reference.

Theorem 13.18. Let ๐‘‹ be a Banach space and let ๐น : ๐‘‹ โ†’ โ„ be convex. If ๐น is Gรขteauxdierentiable at ๐‘ฅ , then ๐œ•๐น (๐‘ฅ) = {๐ท๐น (๐‘ฅ)}. Conversely, if ๐‘ฅ โˆˆ int(dom ๐น ) and ๐œ•๐น (๐‘ฅ) = {๐‘ฅโˆ—}is a singleton, then ๐น is Gรขteaux dierentiable at ๐‘ฅ with ๐ท๐น (๐‘ฅ) = ๐‘ฅโˆ—.

Proof. The rst claim was already shown in Theorem 4.5, while the second follows fromCorollary 13.16 together with Theorem 13.8. ๏ฟฝ

As another consequence, we can show that Moreauโ€“Yosida regularization dened in Sec-tion 7.3 preserves (global!) Lipschitz continuity.

191

13 clarke subdifferentials

Lemma 13.19. Let ๐‘‹ be a Hilbert space and let ๐น : ๐‘‹ โ†’ โ„ be Lipschitz continuous withconstant ๐ฟ. Then ๐น๐›พ is Lipschitz continuous with constant ๐ฟ as well. If ๐น is in addition convex,

then ๐น โˆ’ ๐›พ๐ฟ2

2 โ‰ค ๐น๐›พ โ‰ค ๐น .

Proof. Let ๐‘ฅ, ๐‘ง โˆˆ ๐‘‹ . We expand

๐น๐›พ (๐‘ฅ) โˆ’ ๐น๐›พ (๐‘ง) = sup๐‘ฆ๐‘งโˆˆ๐‘‹

inf๐‘ฆ๐‘ฅโˆˆ๐‘‹

(๐น (๐‘ฆ๐‘ฅ ) โˆ’ ๐น (๐‘ฆ๐‘ง) + 1

2๐›พ โ€–๐‘ฆ๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ 1

2๐›พ โ€–๐‘ฆ๐‘ง โˆ’ ๐‘งโ€–2๐‘‹

).

Taking ๐‘ฆ๐‘ฅ = ๐‘ฆ๐‘ง + ๐‘ฅ โˆ’ ๐‘ง, we estimate

๐น๐›พ (๐‘ฅ) โˆ’ ๐น๐›พ (๐‘ง) โ‰ค sup๐‘ฆ๐‘งโˆˆ๐‘‹

(๐น (๐‘ฆ๐‘ง + ๐‘ฅ โˆ’ ๐‘ง) โˆ’ ๐น (๐‘ฆ๐‘ง)) โ‰ค ๐ฟโ€–๐‘ฅ โˆ’ ๐‘งโ€–๐‘‹ .

Exchanging ๐‘ฅ and ๐‘ฆ , we obtain the rst claim.

For the second claim, we rst observe that by assumption dom ๐น = ๐‘‹ . Hence by Theo-rem 13.17 and Lemma 13.2, for every ๐‘ฅ โˆˆ ๐‘‹ , there exists some ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ) with โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ‰ค ๐ฟ.Thus, using Lemma 4.4, for any ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ),

๐น๐›พ (๐‘ฅ) = inf๐‘ฆโˆˆ๐‘‹

๐น (๐‘ง) + 12๐›พ โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹ โ‰ฅ ๐น (๐‘ฅ) + ใ€ˆ๐‘ฅโˆ—, ๐‘ง โˆ’ ๐‘ฅใ€‰๐‘‹ + 1

2๐›พ โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹ .

The Cauchyโ€“Schwarz and generalized Youngโ€™s inequality then yield ๐น๐›พ (๐‘ฅ) โ‰ฅ ๐น (๐‘ฅ) โˆ’๐›พ2 โ€–๐‘ฅโˆ—โ€–2๐‘‹ โˆ— โ‰ฅ ๐น (๐‘ฅ) โˆ’ ๐›พ

2๐ฟ2. The second inequality follows by estimating the inmum in (7.18)

by ๐‘ง = ๐‘ฅ . ๏ฟฝ

sum rule

With the aid of these results on support functionals, we can now show a sum rule.

Theorem 13.20. Let ๐น,๐บ : ๐‘‹ โ†’ โ„ be locally Lipschitz continuous near ๐‘ฅ โˆˆ ๐‘‹ . Then,

๐œ•๐ถ (๐น +๐บ) (๐‘ฅ) โŠ‚ ๐œ•๐ถ๐น (๐‘ฅ) + ๐œ•๐ถ๐บ (๐‘ฅ) .

If ๐น and ๐บ are regular at ๐‘ฅ , then ๐น +๐บ is regular at ๐‘ฅ and equality holds.

Proof. It is clear that ๐น +๐บ is locally Lipschitz continuous near ๐‘ฅ . Furthermore, from theproperties of the lim sup we always have for all โ„Ž โˆˆ ๐‘‹ that

(13.9) (๐น +๐บ)โ—ฆ(๐‘ฅ ;โ„Ž) โ‰ค ๐น โ—ฆ(๐‘ฅ ;โ„Ž) +๐บโ—ฆ(๐‘ฅ ;โ„Ž).

If ๐น and ๐บ are regular at ๐‘ฅ , the calculus of limits yields that

๐น โ—ฆ(๐‘ฅ ;โ„Ž) +๐บโ—ฆ(๐‘ฅ ;โ„Ž) = ๐น โ€ฒ(๐‘ฅ ;โ„Ž) +๐บโ€ฒ(๐‘ฅ ;โ„Ž) = (๐น +๐บ)โ€ฒ(๐‘ฅ ;โ„Ž) โ‰ค (๐น +๐บ)โ—ฆ(๐‘ฅ ;โ„Ž),

192

13 clarke subdifferentials

which implies that (๐น +๐บ)โ—ฆ(๐‘ฅ ;โ„Ž) = (๐น +๐บ)โ€ฒ(๐‘ฅ ;โ„Ž), i.e., ๐น +๐บ is regular.

By denition, it follows from (13.9) that

๐œ•๐ถ (๐น +๐บ) (๐‘ฅ) โŠ‚ {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ โ‰ค ๐น โ—ฆ(๐‘ฅ ;โ„Ž) +๐บโ—ฆ(๐‘ฅ ;โ„Ž) for all โ„Ž โˆˆ ๐‘‹ } =: ๐ด(with equality if ๐น and๐บ are regular); it thus remains to show that๐ด = ๐œ•๐ถ๐น (๐‘ฅ) +๐œ•๐ถ๐บ (๐‘ฅ). Forthis, we use that both ๐œ•๐ถ๐น (๐‘ฅ) and ๐œ•๐ถ๐บ (๐‘ฅ) are convex and weakly-โˆ— closed by Lemma 13.2,and, as shown in Lemma 13.1, that generalized directional derivatives and hence their sumsare positively homogeneous, convex, and lower semicontinuous. We thus obtain fromLemma 13.12 for all โ„Ž โˆˆ ๐‘‹ that

sup๐‘ฅโˆ—โˆˆ๐œ•๐ถ๐น (๐‘ฅ)+๐œ•๐ถ๐บ (๐‘ฅ)

ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ = sup๐‘ฅโˆ—1 โˆˆ๐œ•๐ถ๐น (๐‘ฅ)

ใ€ˆ๐‘ฅโˆ—1 , โ„Žใ€‰๐‘‹ + sup๐‘ฅโˆ—2โˆˆ๐œ•๐ถ๐บ (๐‘ฅ)

ใ€ˆ๐‘ฅโˆ—2, โ„Žใ€‰๐‘‹

= ๐น โ—ฆ(๐‘ฅ ;โ„Ž) +๐บโ—ฆ(๐‘ฅ ;โ„Ž) = sup๐‘ฅโˆ—โˆˆ๐ด

ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ .

The claimed equality of ๐ด and the sum of the subdierentials now follows from Corol-lary 13.14. ๏ฟฝ

Note the dierences to the convex sum rule: The generic inclusion is now in the otherdirection; furthermore, both functionals have to be regular, and in exactly the point wherethe sum rule is applied. By induction, one obtains from this sum rule for an arbitrarynumber of functionals (which all have to be regular).

chain rule

To prove a chain rule, we need the following โ€œnonsmoothโ€ mean value theorem.

Theorem 13.21. Let ๐น : ๐‘‹ โ†’ โ„ be locally Lipschitz continuous near ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฅ be in theLipschitz neighborhood of ๐‘ฅ . Then there exists a _ โˆˆ (0, 1) and an ๐‘ฅโˆ— โˆˆ ๐œ•๐ถ๐น (๐‘ฅ + _(๐‘ฅ โˆ’ ๐‘ฅ))such that

๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ) = ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ .

Proof. Dene๐œ“,๐œ‘ : [0, 1] โ†’ โ„ as

๐œ“ (_) โ‰” ๐น (๐‘ฅ + _(๐‘ฅ โˆ’ ๐‘ฅ)), ๐œ‘ (_) โ‰” ๐œ“ (_) + _(๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ)) .By the assumptions on ๐น and ๐‘ฅ , both๐œ“ and ๐œ‘ are Lipschitz continuous. In addition, ๐œ‘ (0) =๐น (๐‘ฅ) = ๐œ‘ (1), and hence ๐œ‘ has a local minimum or maximum in an interior point _ โˆˆ (0, 1).From the Fermat principle Theorem 13.4 or Corollary 13.11, respectively, together with thesum rule from Theorem 13.20 and the characterization of the subdierential of the secondterm from Theorem 13.5, we thus obtain that

0 โˆˆ ๐œ•๐ถ๐œ‘ (_) โŠ‚ ๐œ•๐ถ๐œ“ (_) + {๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ)}.

193

13 clarke subdifferentials

Hence we are nished if we can show for ๐‘ฅ_ โ‰” ๐‘ฅ + _(๐‘ฅ โˆ’ ๐‘ฅ) that

(13.10) ๐œ•๐ถ๐œ“ (_) โŠ‚{ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹

๏ฟฝ๏ฟฝ ๐‘ฅโˆ— โˆˆ ๐œ•๐ถ๐น (๐‘ฅ_)} =: ๐ด.

For this purpose, consider for arbitrary ๐‘  โˆˆ โ„ the generalized derivative

๐œ“ โ—ฆ(_; ๐‘ ) = lim sup_โ†’_๐‘กโ†’ 0

๐œ“ (_ + ๐‘ก๐‘ ) โˆ’๐œ“ (_)๐‘ก

= lim sup_โ†’_๐‘กโ†’ 0

๐น (๐‘ฅ + (_ + ๐‘ก๐‘ ) (๐‘ฅ โˆ’ ๐‘ฅ)) โˆ’ ๐น (๐‘ฅ + _(๐‘ฅ โˆ’ ๐‘ฅ))๐‘ก

โ‰ค lim sup๐‘งโ†’๐‘ฅ_๐‘กโ†’ 0

๐น (๐‘ง + ๐‘ก๐‘  (๐‘ฅ โˆ’ ๐‘ฅ)) โˆ’ ๐น (๐‘ง)๐‘ก

= ๐น โ—ฆ(๐‘ฅ_; ๐‘  (๐‘ฅ โˆ’ ๐‘ฅ)),

where the inequality follows from considering arbitrary sequences ๐‘ง โ†’ ๐‘ฅ_ (instead ofspecial sequences of the form ๐‘ง๐‘› = ๐‘ฅ + _๐‘› (๐‘ฅ โˆ’ ๐‘ฅ)) in the last lim sup. Lemma 13.13 thusimplies that

(13.11) ๐œ•๐ถ๐œ“ (_) โŠ‚{๐‘กโˆ— โˆˆ โ„

๏ฟฝ๏ฟฝ ๐‘กโˆ—๐‘  โ‰ค ๐น โ—ฆ(๐‘ฅ_; ๐‘  (๐‘ฅ โˆ’ ๐‘ฅ)) for all ๐‘  โˆˆ โ„}=: ๐ต.

It remains to show that the sets ๐ด and ๐ต from (13.10) and (13.11) coincide. But this followsagain from Lemma 13.12 and Corollary 13.14, since for all ๐‘  โˆˆ โ„ we have that

sup๐‘กโˆ—โˆˆ๐ด

๐‘กโˆ—๐‘  = sup๐‘ฅโˆ—โˆˆ๐œ•๐ถ๐น (๐‘ฅ_)

ใ€ˆ๐‘ฅโˆ—, ๐‘  (๐‘ฅ โˆ’ ๐‘ฅ)ใ€‰๐‘‹ = ๐น โ—ฆ(๐‘ฅ_; ๐‘  (๐‘ฅ โˆ’ ๐‘ฅ)) = sup๐‘กโˆ—โˆˆ๐ต

๐‘กโˆ—๐‘  . ๏ฟฝ

We also need the following generalization of the argument in Theorem 13.5.

Lemma 13.22. Let๐‘‹,๐‘Œ be Banach spaces and ๐น : ๐‘‹ โ†’ ๐‘Œ be continuously Frรฉchet dierentiableat ๐‘ฅ โˆˆ ๐‘‹ . Let {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘‹ be a sequence with ๐‘ฅ๐‘› โ†’ ๐‘ฅ and {๐‘ก๐‘›}๐‘›โˆˆโ„• โŠ‚ (0,โˆž) be a sequencewith ๐‘ก๐‘›โ†’ 0. Then for any โ„Ž โˆˆ ๐‘‹ ,

lim๐‘›โ†’โˆž

๐น (๐‘ฅ๐‘› + ๐‘ก๐‘›โ„Ž) โˆ’ ๐น (๐‘ฅ๐‘›)๐‘ก๐‘›

= ๐น โ€ฒ(๐‘ฅ)โ„Ž.

Proof. Let โ„Ž โˆˆ ๐‘‹ be arbitrary. By the Hahnโ€“Banach extension Theorem 1.4, for every ๐‘› โˆˆ โ„•

there exists a ๐‘ฆโˆ—๐‘› โˆˆ ๐‘Œ โˆ— with โ€–๐‘ฆโˆ—๐‘› โ€–๐‘Œ โˆ— = 1 and

โ€–๐‘กโˆ’1๐‘› (๐น (๐‘ฅ๐‘› + ๐‘ก๐‘›โ„Ž) โˆ’ ๐น (๐‘ฅ๐‘›)) โˆ’ ๐น โ€ฒ(๐‘ฅ)โ„Žโ€–๐‘Œ = ใ€ˆ๐‘ฆโˆ—๐‘›, ๐‘กโˆ’1๐‘› (๐น (๐‘ฅ๐‘› + ๐‘ก๐‘›โ„Ž) โˆ’ ๐น (๐‘ฅ๐‘›)) โˆ’ ๐น โ€ฒ(๐‘ฅ)โ„Žใ€‰๐‘Œ .

Applying now the classical mean value theorem to the scalar functions

๐‘“๐‘› : [0, 1] โ†’ โ„, ๐‘“๐‘› (๐‘ ) = ใ€ˆ๐‘ฆโˆ—๐‘›, ๐น (๐‘ฅ๐‘› + ๐‘ ๐‘ก๐‘›โ„Ž)ใ€‰๐‘Œ ,

194

13 clarke subdifferentials

we obtain similarly to the proof of Theorem 2.10 for all ๐‘› โˆˆ โ„• that

โ€–๐‘กโˆ’1๐‘› (๐น (๐‘ฅ๐‘› + ๐‘ก๐‘›โ„Ž) โˆ’ ๐น (๐‘ฅ๐‘›)) โˆ’ ๐น โ€ฒ(๐‘ฅ)โ„Žโ€–๐‘Œ = ๐‘กโˆ’1๐‘›

โˆซ 1

0ใ€ˆ๐‘ฆโˆ—๐‘›, ๐น โ€ฒ(๐‘ฅ๐‘› + ๐‘ ๐‘ก๐‘›โ„Ž)๐‘ก๐‘›โ„Žใ€‰๐‘Œ ๐‘‘๐‘  โˆ’ ใ€ˆ๐‘ฆโˆ—๐‘›, ๐น โ€ฒ(๐‘ฅ)โ„Žใ€‰๐‘Œ

=โˆซ 1

0ใ€ˆ๐‘ฆโˆ—๐‘›, (๐น โ€ฒ(๐‘ฅ๐‘› + ๐‘ ๐‘ก๐‘›โ„Ž) โˆ’ ๐น โ€ฒ(๐‘ฅ))โ„Žใ€‰๐‘Œ ๐‘‘๐‘ 

โ‰คโˆซ 1

0โ€–๐น โ€ฒ(๐‘ฅ๐‘› + ๐‘ ๐‘ก๐‘›โ„Ž) โˆ’ ๐น โ€ฒ(๐‘ฅ))โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) ๐‘‘๐‘  โ€–โ„Žโ€–๐‘‹ ,

where we have used (1.1) together with โ€–๐‘ฆโˆ—๐‘› โ€–๐‘Œ โˆ— = 1 in the last step. Since ๐น โ€ฒ is continuousby assumption, the integrand goes to zero as ๐‘› โ†’ โˆž uniformly in ๐‘  โˆˆ [0, 1], and the claimfollows. ๏ฟฝ

We now come to the chain rule, which in contrast to the convex case does not require thedual mapping to be linear; this is one of the main advantages of the Clarke subdierentialin the context of nonsmooth optimization.

Theorem 13.23. Let ๐‘Œ be a separable Banach space, ๐น : ๐‘‹ โ†’ ๐‘Œ be continuously Frรฉchetdierentiable at ๐‘ฅ โˆˆ ๐‘‹ , and ๐บ : ๐‘Œ โ†’ โ„ be locally Lipschitz continuous near ๐น (๐‘ฅ). Then,

๐œ•๐ถ (๐บ โ—ฆ ๐น ) (๐‘ฅ) โŠ‚ ๐น โ€ฒ(๐‘ฅ)โˆ—๐œ•๐ถ๐บ (๐น (๐‘ฅ)) โ‰” {๐น โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ— | ๐‘ฆโˆ— โˆˆ ๐œ•๐ถ๐บ (๐น (๐‘ฅ))} .

If ๐บ is regular at ๐น (๐‘ฅ), then ๐บ โ—ฆ ๐น is regular at ๐‘ฅ , and equality holds.

Proof. Let us write ๐‘ˆ๐น (๐‘ฅ) for the neighborhood of ๐น (๐‘ฅ) where ๐บ is Lipschitz with factor ๐ฟ.The local Lipschitz continuity of๐บ โ—ฆ ๐น follows from that of๐บ and ๐น (which in turn followsfrom Lemma 2.11). For the claimed inclusion (respectively, equality), we argue as above.First we show that for every โ„Ž โˆˆ ๐‘‹ there exists a ๐‘ฆโˆ— โˆˆ ๐œ•๐ถ๐บ (๐น (๐‘ฅ)) with

(13.12) (๐บ โ—ฆ ๐น )โ—ฆ(๐‘ฅ ;โ„Ž) = ใ€ˆ๐‘ฆโˆ—, ๐น โ€ฒ(๐‘ฅ)โ„Žใ€‰๐‘Œ .

For this, consider for given โ„Ž โˆˆ ๐‘‹ sequences {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘‹ and {๐‘ก๐‘›}๐‘›โˆˆโ„• โŠ‚ (0,โˆž) with๐‘ฅ๐‘› โ†’ ๐‘ฅ , ๐‘ก๐‘›โ†’ 0, and

(๐บ โ—ฆ ๐น )โ—ฆ(๐‘ฅ ;โ„Ž) = lim๐‘›โ†’โˆž

๐บ (๐น (๐‘ฅ๐‘› + ๐‘ก๐‘›โ„Ž)) โˆ’๐บ (๐น (๐‘ฅ๐‘›))๐‘ก๐‘›

.

Furthermore, by continuity of ๐น , we can nd ๐‘›0 โˆˆ โ„• such that for all ๐‘› โ‰ฅ ๐‘›0, both๐น (๐‘ฅ๐‘›), ๐น (๐‘ฅ๐‘› + ๐‘ก๐‘›โ„Ž) โˆˆ ๐‘ˆ๐น (๐‘ฅ) . Theorem 13.21 thus yields for all ๐‘› โ‰ฅ ๐‘›0 a _๐‘› โˆˆ (0, 1) anda ๐‘ฆโˆ—๐‘› โˆˆ ๐œ•๐ถ๐บ (๐‘ฆ๐‘›) for ๐‘ฆ๐‘› := ๐น (๐‘ฅ๐‘›) + _๐‘› (๐น (๐‘ฅ๐‘› + ๐‘ก๐‘›โ„Ž) โˆ’ ๐น (๐‘ฅ๐‘›)) with

(13.13) ๐บ (๐น (๐‘ฅ๐‘› + ๐‘ก๐‘›โ„Ž)) โˆ’๐บ (๐น (๐‘ฅ๐‘›))๐‘ก๐‘›

= ใ€ˆ๐‘ฆโˆ—๐‘›, ๐‘ž๐‘›ใ€‰๐‘Œ for ๐‘ž๐‘› โ‰”๐น (๐‘ฅ๐‘› + ๐‘ก๐‘›โ„Ž) โˆ’ ๐น (๐‘ฅ๐‘›)

๐‘ก๐‘›

195

13 clarke subdifferentials

Since _๐‘› โˆˆ (0, 1) is uniformly bounded, we also have that ๐‘ฆ๐‘› โ†’ ๐น (๐‘ฅ) for ๐‘› โ†’ โˆž. Hence,for ๐‘› large enough, ๐‘ฆ๐‘› โˆˆ ๐‘ˆ๐น (๐‘ฅ) . By Lemma 13.1 and the discussion following (13.2) then,eventually, ๐‘ฆโˆ—๐‘› โˆˆ ๐œ•๐ถ๐บ (๐‘ฆ๐‘›) โŠ‚ ๐”น(0, ๐ฟ). This implies that {๐‘ฆโˆ—๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐‘Œ โˆ— is bounded, andthe Banachโ€“Alaoglu Theorem 1.11 yields a weakly-โˆ— convergent subsequence with limit๐‘ฆโˆ— โˆˆ ๐œ•๐ถ๐บ (๐น (๐‘ฅ)) by Lemma 13.3. Finally, since ๐น is continuously Frรฉchet dierentiable,๐‘ž๐‘› โ†’ ๐น โ€ฒ(๐‘ฅ)โ„Ž strongly in ๐‘Œ by Lemma 13.22. Hence, ใ€ˆ๐‘ฆโˆ—๐‘›, ๐‘ž๐‘›ใ€‰๐‘Œ โ†’ ใ€ˆ๐‘ฆโˆ—, ๐น โ€ฒ(๐‘ฅ)โ„Žใ€‰ as the dualitypairing of weakly-โˆ— and strongly converging sequences. Passing to the limit in (13.13)therefore yields (13.12) (rst along the subsequence chosen above; by convergence of theleft-hand side of (13.12) and the uniqueness of limits then for the full sequence as well). Bydenition of the Clarke subdierential, we thus have for a ๐‘ฆโˆ— โˆˆ ๐œ•๐ถ๐บ (๐น (๐‘ฅ)) that

(13.14) (๐บ โ—ฆ ๐น )โ—ฆ(๐‘ฅ ;โ„Ž) = ใ€ˆ๐‘ฆโˆ—, ๐น โ€ฒ(๐‘ฅ)โ„Žใ€‰๐‘Œ โ‰ค ๐บโ—ฆ(๐น (๐‘ฅ); ๐น โ€ฒ(๐‘ฅ)โ„Ž).

If ๐บ is now regular at ๐‘ฅ , we have that ๐บโ—ฆ(๐น (๐‘ฅ); ๐น โ€ฒ(๐‘ฅ)โ„Ž) = ๐บโ€ฒ(๐น (๐‘ฅ); ๐น โ€ฒ(๐‘ฅ)โ„Ž) and hence bythe local Lipschitz continuity of ๐บ and the Frรฉchet dierentiability of ๐น that

๐บโ—ฆ(๐น (๐‘ฅ); ๐น โ€ฒ(๐‘ฅ)โ„Ž)= lim๐‘กโ†’ 0

๐บ (๐น (๐‘ฅ) + ๐‘ก๐น โ€ฒ(๐‘ฅ)โ„Ž) โˆ’๐บ (๐น (๐‘ฅ))๐‘ก

= lim๐‘กโ†’ 0

๐บ (๐น (๐‘ฅ) + ๐‘ก๐น โ€ฒ(๐‘ฅ)โ„Ž) โˆ’๐บ (๐น (๐‘ฅ + ๐‘กโ„Ž)) +๐บ (๐น (๐‘ฅ + ๐‘กโ„Ž)) โˆ’๐บ (๐น (๐‘ฅ))๐‘ก

โ‰ค lim๐‘กโ†’ 0

(๐ฟโ€–โ„Žโ€–๐‘‹ โ€–๐น (๐‘ฅ) + ๐น โ€ฒ(๐‘ฅ)๐‘กโ„Ž โˆ’ ๐น (๐‘ฅ + ๐‘กโ„Ž)โ€–๐‘Œ

โ€–๐‘กโ„Žโ€–๐‘‹ + ๐บ (๐น (๐‘ฅ + ๐‘กโ„Ž)) โˆ’๐บ (๐น (๐‘ฅ))๐‘ก

)= (๐บ โ—ฆ ๐น )โ€ฒ(๐‘ฅ ;โ„Ž) โ‰ค (๐บ โ—ฆ ๐น )โ—ฆ(๐‘ฅ ;โ„Ž).

(Since both the sum and the second summand in the next-to-last line converge, this hasto be the case for the rst summand as well.) Together with (13.14), this implies that(๐บ โ—ฆ ๐น )โ€ฒ(๐‘ฅ ;โ„Ž) = (๐บ โ—ฆ ๐น )โ—ฆ(๐‘ฅ ;โ„Ž) (i.e., ๐บ โ—ฆ ๐น is regular at ๐‘ฅ ) and that

(13.15) (๐บ โ—ฆ ๐น )โ—ฆ(๐‘ฅ ;โ„Ž) = ๐บโ—ฆ(๐น (๐‘ฅ); ๐น โ€ฒ(๐‘ฅ)โ„Ž).

As before, Lemma 13.12 now implies for all โ„Ž โˆˆ ๐‘‹ that

sup๐‘ฅโˆ—โˆˆ๐น โ€ฒ(๐‘ฅ)โˆ—๐œ•๐ถ๐บ (๐น (๐‘ฅ))

ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ = sup๐‘ฆโˆ—โˆˆ๐œ•๐ถ๐บ (๐น (๐‘ฅ))

ใ€ˆ๐‘ฆโˆ—, ๐น โ€ฒ(๐‘ฅ)โ„Žใ€‰๐‘Œ = ๐บโ—ฆ(๐น (๐‘ฅ); ๐น โ€ฒ(๐‘ฅ)โ„Ž)

and hence by Lemma 13.13 that

๐น โ€ฒ(๐‘ฅ)โˆ—๐œ•๐ถ๐บ (๐น (๐‘ฅ)) = {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ โ‰ค ๐บโ—ฆ(๐น (๐‘ฅ); ๐น โ€ฒ(๐‘ฅ)โ„Ž) for all โ„Ž โˆˆ ๐‘‹ } .

Combined with (13.14) or (13.15) and the denition of the Clarke subdierential in (13.2),this now yields the claimed equality or inclusion for the Clarke subdierential of thecomposition. ๏ฟฝ

196

13 clarke subdifferentials

Again, the generic inclusion is the reverse of the one in the convex chain rule. Note thatequality in the chain rule also holds if โˆ’๐บ is regular, since we can then apply Theorem 13.23to โˆ’๐บ โ—ฆ ๐น and use that ๐œ•๐ถ (โˆ’๐บ) (๐น (๐‘ฅ)) = โˆ’๐œ•๐ถ๐บ (๐น (๐‘ฅ)) by Theorem 13.10. Furthermore, if๐บ is not regular but ๐น โ€ฒ(๐‘ฅ) is surjective, a similar proof shows that equality (but not theregularity of ๐บ โ—ฆ ๐น ) holds in the chain rule; see [Clarke 2013, Theorem 10.19].

Example 13.24. As a simple example, we consider

๐‘“ : โ„2 โ†’ โ„, (๐‘ฅ1, ๐‘ฅ2) โ†ฆโ†’ |๐‘ฅ1๐‘ฅ2 |,

which is not convex. To compute the Clarke subdierential, we write ๐‘“ = ๐‘” โ—ฆ๐‘‡ for

๐‘” : โ„ โ†’ โ„, ๐‘ก โ†ฆโ†’ |๐‘ก |, ๐‘‡ : โ„2 โ†’ โ„, (๐‘ฅ1, ๐‘ฅ2) โ†ฆโ†’ ๐‘ฅ1๐‘ฅ2,

where, ๐‘” is nite-valued, convex, and Lipschitz continuous, and hence regular at any๐‘ก โˆˆ โ„, and ๐‘‡ is continuously dierentiable for all ๐‘ฅ โˆˆ โ„2 with Frรฉchet derivative

๐‘‡ โ€ฒ(๐‘ฅ) : โ„โ†’โ„, ๐‘‡ โ€ฒ(๐‘ฅ)โ„Ž โ‰” ๐‘ฅ2โ„Ž1 + ๐‘ฅ1โ„Ž2.

Its adjoint is easily veried to be given by

๐‘‡ โ€ฒ(๐‘ฅ)โˆ— : โ„ โ†’ โ„2, ๐‘‡ โ€ฒ(๐‘ฅ)โˆ—๐‘  โ‰” ( ๐‘ฅ2๐‘ก๐‘ฅ1๐‘ก

).

Hence, Theorem 13.23 together with Theorem 13.8 yields that ๐‘“ is regular at any ๐‘ฅ โˆˆ โ„2

and that๐œ•๐ถ ๐‘“ (๐‘ฅ) = ๐‘‡ โ€ฒ(๐‘ฅ)โˆ—๐œ•๐‘”(๐‘‡ (๐‘ฅ)) =

(๐‘ฅ2๐‘ฅ1

)sign(๐‘ฅ1๐‘ฅ2),

for the set-valued sign function from Example 4.7.

13.4 characterization in finite dimensions

Amore explicit characterization of the Clarke subdierential is possible in nite-dimensionalspaces. The basis is the following theorem, which only holds in โ„๐‘ ; a proof can be foundin, e.g., [DiBenedetto 2002, Theorem 23.2] or [Heinonen 2005, Theorem 3.1].

Theorem 13.25 (Rademacher). Let ๐‘ˆ โŠ‚ โ„๐‘ be open and ๐น : ๐‘ˆ โ†’ โ„ be Lipschitz continuous.Then ๐น is Frรฉchet dierentiable at almost every ๐‘ฅ โˆˆ ๐‘ˆ .

This result allows replacing the lim sup in the denition of the Clarke subdierential (nowconsidered as a subset of โ„๐‘ , i.e., identifying the dual of โ„๐‘ with โ„๐‘ itself) with a properlimit.

197

13 clarke subdifferentials

Theorem 13.26. Let ๐น : โ„๐‘ โ†’ โ„ be locally Lipschitz continuous near ๐‘ฅ โˆˆ โ„๐‘ . Then ๐น isFrรฉchet dierentiable on โ„๐‘ \ ๐ธ๐น for a set ๐ธ๐น โŠ‚ โ„๐‘ of Lebesgue measure 0 and

(13.16) ๐œ•๐ถ๐น (๐‘ฅ) = co{lim๐‘›โ†’โˆžโˆ‡๐น (๐‘ฅ๐‘›)

๏ฟฝ๏ฟฝ๏ฟฝ ๐‘ฅ๐‘› โ†’ ๐‘ฅ, ๐‘ฅ๐‘› โˆ‰ ๐ธ๐น},

where co๐ด denotes the convex hull of ๐ด โŠ‚ โ„๐‘ .

Proof. We rst note that the Rademacher Theorem ensures that such a set ๐ธ๐น exists and โ€“possibly after intersection with the Lipschitz neighborhood of ๐‘ฅ โ€“ has Lebesgue measure0. Hence there indeed exist sequences {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โˆˆ โ„๐‘ \ ๐ธ๐น with ๐‘ฅ๐‘› โ†’ ๐‘ฅ . Furthermore, thelocal Lipschitz continuity of ๐น yields that for any ๐‘ฅ๐‘› in the Lipschitz neighborhood of ๐‘ฅand any โ„Ž โˆˆ โ„๐‘ , we have that

|ใ€ˆโˆ‡๐น (๐‘ฅ๐‘›), โ„Žใ€‰| =๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝlim๐‘กโ†’ 0

๐น (๐‘ฅ๐‘› + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฅ๐‘›)๐‘ก

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ โ‰ค ๐ฟโ€–โ„Žโ€–

and hence that โ€–โˆ‡๐น (๐‘ฅ๐‘›)โ€– โ‰ค ๐ฟ. This implies that {โˆ‡๐น (๐‘ฅ๐‘›)}๐‘›โˆˆโ„• is bounded and thus containsa convergent subsequence. The set on the right-hand side of (13.16) is therefore nonempty.

Let now {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ โ„๐‘ \ ๐ธ๐น be an arbitrary sequence with ๐‘ฅ๐‘› โ†’ ๐‘ฅ and {โˆ‡๐น (๐‘ฅ๐‘›)}๐‘›โˆˆโ„• โ†’ ๐‘ฅโˆ—

for some ๐‘ฅโˆ— โˆˆ โ„๐‘ . Since ๐น is dierentiable at every ๐‘ฅ๐‘› โˆ‰ ๐ธ๐น by denition, Lemma 13.7yields that โˆ‡๐น (๐‘ฅ๐‘›) โˆˆ ๐œ•๐ถ๐น (๐‘ฅ๐‘›), and hence ๐‘ฅโˆ— โˆˆ ๐œ•๐ถ๐น (๐‘ฅ) by Lemma 13.3. The convexity of๐œ•๐ถ๐น (๐‘ฅ) now implies that any convex combination of such limits ๐‘ฅโˆ— is contained in ๐œ•๐ถ๐น (๐‘ฅ),which shows the inclusion โ€œโŠƒโ€ in (13.16).

For the other inclusion, we rst show for all โ„Ž โˆˆ โ„๐‘ and Y > 0 that

(13.17) ๐น โ—ฆ(๐‘ฅ ;โ„Ž) โˆ’ Y โ‰ค lim sup๐ธ๐นโˆŒ๐‘ฆโ†’๐‘ฅ

ใ€ˆโˆ‡๐น (๐‘ฆ), โ„Žใ€‰ =: ๐‘€ (โ„Ž).

Indeed, by denition of๐‘€ (โ„Ž) and of the lim sup, for every Y > 0 there exists a ๐›ฟ > 0 suchthat

ใ€ˆโˆ‡๐น (๐‘ฆ), โ„Žใ€‰ โ‰ค ๐‘€ (โ„Ž) + Y for all ๐‘ฆ โˆˆ ๐•†(๐‘ฅ, ๐›ฟ) \ ๐ธ๐น .Here, ๐›ฟ > 0 can be chosen suciently small for ๐น to be Lipschitz continuous on ๐•†(๐‘ฅ, ๐›ฟ).In particular, ๐ธ๐น โˆฉ๐•†(๐‘ฅ, ๐›ฟ) is a set of zero measure. Hence, ๐น is dierentiable at ๐‘ฆ + ๐‘กโ„Ž foralmost all ๐‘ฆ โˆˆ ๐•†(๐‘ฅ, ๐›ฟ2 ) and almost all ๐‘ก โˆˆ (0, ๐›ฟ

2โ€–โ„Žโ€– ) by Fubiniโ€™s Theorem. The classical meanvalue theorem therefore yields for all such ๐‘ฆ and ๐‘ก that

(13.18) ๐น (๐‘ฆ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฆ) =โˆซ ๐‘ก

0ใ€ˆโˆ‡๐น (๐‘ฆ + ๐‘ โ„Ž), โ„Žใ€‰ ๐‘‘๐‘  โ‰ค ๐‘ก (๐‘€ (โ„Ž) + Y)

since ๐‘ฆ + ๐‘ โ„Ž โˆˆ ๐•†(๐‘ฅ, ๐›ฟ) for all ๐‘  โˆˆ (0, ๐‘ก) by the choice of ๐‘ก . The continuity of ๐น implies thatthe full inequality (13.18) even holds for all ๐‘ฆ โˆˆ ๐•†(๐‘ฅ, ๐›ฟ2 ) and all ๐‘ก โˆˆ (0, ๐›ฟ

2โ€–โ„Žโ€– ). Dividing by๐‘ก > 0 and taking the lim sup over all ๐‘ฆ โ†’ ๐‘ฅ and ๐‘กโ†’ 0 now yields (13.17). Since Y > 0 wasarbitrary, we conclude that ๐น โ—ฆ(๐‘ฅ ;โ„Ž) โ‰ค ๐‘€ (โ„Ž) for all โ„Ž โˆˆ โ„๐‘ .

198

13 clarke subdifferentials

As in Lemma 13.1, one can show that the mapping โ„Ž โ†ฆโ†’ ๐‘€ (โ„Ž) is positively homogeneous,subadditive, and lower semicontinuous. We are thus nished if we can show that the seton the right-hand side of (13.16) โ€“ hereafter denoted by co๐ด โ€“ can be written as

co๐ด ={๐‘ฅโˆ— โˆˆ โ„๐‘

๏ฟฝ๏ฟฝ ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰ โ‰ค ๐‘€ (โ„Ž) for all โ„Ž โˆˆ โ„๐‘}.

For this, we once again appeal to Corollary 13.14 (since both sets are nonempty, convex,and closed). First, we note that the denition of the convex hull implies for all โ„Ž โˆˆ โ„๐‘ that

sup๐‘ฅโˆ—โˆˆco๐ด

ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰ = sup๐‘ฅโˆ—๐‘– โˆˆ๐ดโˆ‘

๐‘– ๐‘ก๐‘–=1,๐‘ก๐‘–โ‰ฅ0

โˆ‘๐‘–

๐‘ก๐‘– ใ€ˆ๐‘ฅโˆ—๐‘– , โ„Žใ€‰ = supโˆ‘๐‘– ๐‘ก๐‘–=1,๐‘ก๐‘–โ‰ฅ0

โˆ‘๐‘–

๐‘ก๐‘– sup๐‘ฅโˆ—๐‘– โˆˆ๐ด

ใ€ˆ๐‘ฅโˆ—๐‘– , โ„Žใ€‰ = sup๐‘ฅโˆ—โˆˆ๐ด

ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰

since the sum is maximal if and only if each summand is maximal. Now we have that

๐‘€ (โ„Ž) = lim sup๐ธ๐นโˆŒ๐‘ฆโ†’๐‘ฅ

ใ€ˆโˆ‡๐น (๐‘ฆ), โ„Žใ€‰ = sup๐ธ๐นโˆŒ๐‘ฅ๐‘›โ†’๐‘ฅ

ใ€ˆlim๐‘›โ†’โˆž โˆ‡๐น (๐‘ฅ๐‘›), โ„Žใ€‰ = sup๐‘ฅโˆ—โˆˆ๐ด

ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰,

and hence the claim follows from Lemma 13.12. ๏ฟฝ

Remark 13.27. It is possible to extend the Clarke subdierential dened here to extended-real valuedfunctions using an equivalent, more geometrical, construction involving generalized normal conesto epigraphs; see [Clarke 1990, Denition 2.4.10]. We will follow this approach when studying themore general subdierentials for set-valued functionals in Chapters 18 and 20.

199

14 SEMISMOOTH NEWTON METHODS

The proximal point and splitting methods in Chapter 8 are generalizations of gradientmethods and in general have at most linear convergence. In this chapter, we will thereforeconsider second-order methods, specically a generalization of Newtonโ€™s method whichadmits (locally) superlinear convergence.

14.1 convergence of generalized newton methods

As a motivation, we rst consider the most general form of a Newton-type method. Let ๐‘‹and ๐‘Œ be Banach spaces and ๐น : ๐‘‹ โ†’ ๐‘Œ be given and suppose we are looking for an ๐‘ฅ โˆˆ ๐‘‹with ๐น (๐‘ฅ) = 0. A Newton-type method to nd such an ๐‘ฅ then consists of repeating thefollowing steps:

1. choose an invertible๐‘€๐‘˜ โ‰” ๐‘€ (๐‘ฅ๐‘˜) โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ );2. solve the Newton step ๐‘€๐‘˜๐‘ 

๐‘˜ = โˆ’๐น (๐‘ฅ๐‘˜);3. update ๐‘ฅ๐‘˜+1 = ๐‘ฅ๐‘˜ + ๐‘ ๐‘˜ .

We can now ask under which conditions this method converges to ๐‘ฅ , and in particular,when the convergence is superlinear, i.e.,

(14.1) lim๐‘˜โ†’โˆž

โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–๐‘‹โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹

= 0.

(Recall the discussion in the beginning of Chapter 10.) For this purpose, we set ๐‘’๐‘˜ โ‰” ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅand use the Newton step together with the fact that ๐น (๐‘ฅ) = 0 to obtain that

โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–๐‘‹ = โ€–๐‘ฅ๐‘˜ โˆ’๐‘€ (๐‘ฅ๐‘˜)โˆ’1๐น (๐‘ฅ๐‘˜) โˆ’ ๐‘ฅ โ€–๐‘‹= โ€–๐‘€ (๐‘ฅ๐‘˜)โˆ’1

[๐น (๐‘ฅ๐‘˜) โˆ’ ๐น (๐‘ฅ) โˆ’๐‘€ (๐‘ฅ๐‘˜) (๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ)

]โ€–๐‘‹

= โ€–๐‘€ (๐‘ฅ + ๐‘’๐‘˜)โˆ’1[๐น (๐‘ฅ + ๐‘’๐‘˜) โˆ’ ๐น (๐‘ฅ) โˆ’๐‘€ (๐‘ฅ + ๐‘’๐‘˜)๐‘’๐‘˜

]โ€–๐‘‹

โ‰ค โ€–๐‘€ (๐‘ฅ + ๐‘’๐‘˜)โˆ’1โ€–๐•ƒ(๐‘Œ ;๐‘‹ ) โ€–๐น (๐‘ฅ + ๐‘’๐‘˜) โˆ’ ๐น (๐‘ฅ) โˆ’๐‘€ (๐‘ฅ + ๐‘’๐‘˜)๐‘’๐‘˜ โ€–๐‘Œ .

Hence, (14.1) holds under

200

14 semismooth newton methods

(i) a regularity condition: there exists a ๐ถ > 0 with

โ€–๐‘€ (๐‘ฅ + ๐‘’๐‘˜)โˆ’1โ€–๐•ƒ(๐‘Œ ;๐‘‹ ) โ‰ค ๐ถ for all ๐‘˜ โˆˆ โ„•;

(ii) an approximation condition:

lim๐‘˜โ†’โˆž

โ€–๐น (๐‘ฅ + ๐‘’๐‘˜) โˆ’ ๐น (๐‘ฅ) โˆ’๐‘€ (๐‘ฅ + ๐‘’๐‘˜)๐‘’๐‘˜ โ€–๐‘Œโ€–๐‘’๐‘˜ โ€–๐‘‹

= 0.

This motivates the following denition: We call ๐น : ๐‘‹ โ†’ ๐‘Œ Newton dierentiable in ๐‘ฅ โˆˆ ๐‘‹if there exists a neighborhood๐‘ˆ โŠ‚ ๐‘‹ of ๐‘ฅ and a mapping ๐ท๐‘ ๐น : ๐‘ˆ โ†’ ๐•ƒ(๐‘‹ ;๐‘Œ ) such that

(14.2) limโ€–โ„Žโ€–๐‘‹โ†’0

โ€–๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฅ) โˆ’ ๐ท๐‘ ๐น (๐‘ฅ + โ„Ž)โ„Žโ€–๐‘Œโ€–โ„Žโ€–๐‘‹ = 0.

We then call ๐ท๐‘ ๐น (๐‘ฅ) a Newton derivative of ๐น at ๐‘ฅ . Note the dierences to the Frรฉchetderivative: First, the Newton derivative is evaluated in ๐‘ฅ +โ„Ž instead of ๐‘ฅ . More importantly,we have not required any connection between๐ท๐‘ ๐น with ๐น ,while the only possible candidatefor the Frรฉchet derivative was the Gรขteaux derivative (which itself was linked to ๐น via thedirectional derivative). A function thus can only be Newton dierentiable (or not) withrespect to a concrete choice of ๐ท๐‘ ๐น . In particular, Newton derivatives are not unique.

If ๐น is Newton dierentiable with Newton derivative ๐ท๐‘ ๐น , we can set๐‘€ (๐‘ฅ๐‘˜) = ๐ท๐‘ ๐น (๐‘ฅ๐‘˜)and obtain the semismooth Newton method

(14.3) ๐‘ฅ๐‘˜+1 = ๐‘ฅ๐‘˜ โˆ’ ๐ท๐‘ ๐น (๐‘ฅ๐‘˜)โˆ’1๐น (๐‘ฅ๐‘˜).Its local superlinear convergence follows directly from the construction.

Theorem 14.1. Let๐‘‹,๐‘Œ be Banach spaces and let ๐น : ๐‘‹ โ†’ ๐‘Œ be Newton dierentiable at ๐‘ฅ โˆˆ ๐‘‹with ๐น (๐‘ฅ) = 0 with Newton derivative ๐ท๐‘ ๐น (๐‘ฅ). Assume further that there exist ๐›ฟ > 0 and๐ถ > 0 with โ€–๐ท๐‘ ๐น (๐‘ฅ)โˆ’1โ€–๐•ƒ(๐‘Œ ;๐‘‹ ) โ‰ค ๐ถ for all ๐‘ฅ โˆˆ ๐•†(๐‘ฅ, ๐›ฟ). Then the semismooth Newton method(14.3) converges to ๐‘ฅ for all ๐‘ฅ0 suciently close to ๐‘ฅ .

Proof. The proof is virtually identical to that for the classical Newton method. We havealready shown that for any ๐‘ฅ0 โˆˆ ๐•†(๐‘ฅ, ๐›ฟ),(14.4) โ€–๐‘’1โ€–๐‘‹ โ‰ค ๐ถ โ€–๐น (๐‘ฅ + ๐‘’0) โˆ’ ๐น (๐‘ฅ) โˆ’ ๐ท๐‘ ๐น (๐‘ฅ + ๐‘’0)๐‘’0โ€–๐‘Œ .Let now Y โˆˆ (0, 1) be arbitrary. The Newton dierentiability of ๐น then implies that thereexists a ๐œŒ > 0 such that

โ€–๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฅ) โˆ’ ๐ท๐‘ ๐น (๐‘ฅ + โ„Ž)โ„Žโ€–๐‘Œ โ‰ค Y

๐ถโ€–โ„Žโ€–๐‘‹ for all โ€–โ„Žโ€–๐‘‹ โ‰ค ๐œŒ.

Hence, if we choose ๐‘ฅ0 such that โ€–๐‘ฅ โˆ’ ๐‘ฅ0โ€–๐‘‹ โ‰ค min{๐›ฟ, ๐œŒ}, the estimate (14.4) implies thatโ€–๐‘ฅโˆ’๐‘ฅ 1โ€–๐‘‹ โ‰ค Yโ€–๐‘ฅโˆ’๐‘ฅ0โ€–๐‘‹ . By induction,we obtain from this that โ€–๐‘ฅโˆ’๐‘ฅ๐‘˜ โ€–๐‘‹ โ‰ค Y๐‘˜ โ€–๐‘ฅโˆ’๐‘ฅ0โ€–๐‘‹ โ†’ 0.Since Y โˆˆ (0, 1) was arbitrary, we can take in each step ๐‘˜ a dierent Y๐‘˜ โ†’ 0, which showsthat the convergence is in fact superlinear. ๏ฟฝ

201

14 semismooth newton methods

14.2 newton derivatives

The remainder of this chapter is dedicated to the construction of Newton derivatives(although it should be pointed out that the verication of the approximation condition isusually the much more involved step in practice). We begin with the obvious connectionwith the Frรฉchet derivative.

Theorem 14.2. Let ๐‘‹,๐‘Œ be Banach spaces. If ๐น : ๐‘‹ โ†’ ๐‘Œ is continuously dierentiable at๐‘ฅ โˆˆ ๐‘‹ , then ๐น is also Newton dierentiable at ๐‘ฅ with Newton derivative ๐ท๐‘ ๐น (๐‘ฅ) = ๐น โ€ฒ(๐‘ฅ).

Proof. We have for arbitrary โ„Ž โˆˆ ๐‘‹ that

โ€–๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฅ) โˆ’ ๐น โ€ฒ(๐‘ฅ + โ„Ž)โ„Žโ€–๐‘Œ โ‰ค โ€–๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฅ) โˆ’ ๐น โ€ฒ(๐‘ฅ)โ„Žโ€–๐‘Œ+ โ€–๐น โ€ฒ(๐‘ฅ) โˆ’ ๐น โ€ฒ(๐‘ฅ + โ„Ž)โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) โ€–โ„Žโ€–๐‘‹ ,

where the rst summand is ๐‘œ (โ€–โ„Žโ€–๐‘‹ ) by denition of the Frรฉchet derivative and the secondby the continuity of ๐น โ€ฒ. ๏ฟฝ

Calculus rules can be shown similarly to those for Frรฉchet derivatives. For the sum rulethis is immediate; here we prove a chain rule by way of example.

Theorem 14.3. Let ๐‘‹ , ๐‘Œ , and ๐‘ be Banach spaces, and let ๐น : ๐‘‹ โ†’ ๐‘Œ be Newton dierentiableat ๐‘ฅ โˆˆ ๐‘‹ with Newton derivative ๐ท๐‘ ๐น (๐‘ฅ) and ๐บ : ๐‘Œ โ†’ ๐‘ be Newton dierentiable at๐‘ฆ โ‰” ๐น (๐‘ฅ) โˆˆ ๐‘Œ with Newton derivative ๐ท๐‘๐บ (๐‘ฆ). If ๐ท๐‘ ๐น and ๐ท๐‘๐บ are uniformly bounded ina neighborhood of ๐‘ฅ and ๐‘ฆ , respectively, then ๐บ โ—ฆ ๐น is also Newton dierentiable at ๐‘ฅ withNewton derivative

๐ท๐‘ (๐บ โ—ฆ ๐น ) (๐‘ฅ) = ๐ท๐‘๐บ (๐น (๐‘ฅ)) โ—ฆ ๐ท๐‘ ๐น (๐‘ฅ).

Proof. We proceed as in the proof of Theorem 2.7. For โ„Ž โˆˆ ๐‘‹ and ๐‘” โ‰” ๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฅ) wehave that

(๐บ โ—ฆ ๐น ) (๐‘ฅ + โ„Ž) โˆ’ (๐บ โ—ฆ ๐น ) (๐‘ฅ) = ๐บ (๐‘ฆ + ๐‘”) โˆ’๐บ (๐‘ฆ).The Newton dierentiability of ๐บ then implies that

โ€–(๐บ โ—ฆ ๐น ) (๐‘ฅ + โ„Ž) โˆ’ (๐บ โ—ฆ ๐น ) (๐‘ฅ) โˆ’ ๐ท๐‘๐บ (๐‘ฆ + ๐‘”)๐‘”โ€–๐‘ = ๐‘Ÿ1(โ€–๐‘”โ€–๐‘Œ )

with ๐‘Ÿ1(๐‘ก)/๐‘ก โ†’ 0 for ๐‘ก โ†’ 0. The Newton dierentiability of ๐น further implies that

โ€–๐‘” โˆ’ ๐ท๐‘ ๐น (๐‘ฅ + โ„Ž)โ„Žโ€–๐‘Œ = ๐‘Ÿ2(โ€–โ„Žโ€–๐‘‹ )

with ๐‘Ÿ2(๐‘ก)/๐‘ก โ†’ 0 for ๐‘ก โ†’ 0. In particular,

โ€–๐‘”โ€–๐‘Œ โ‰ค โ€–๐ท๐‘ ๐น (๐‘ฅ + โ„Ž)โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) โ€–โ„Žโ€–๐‘Œ + ๐‘Ÿ2(โ€–โ„Žโ€–๐‘‹ ).

202

14 semismooth newton methods

The uniform boundedness of ๐ท๐‘ ๐น now implies that โ€–๐‘”โ€–๐‘Œ โ†’ 0 for โ€–โ„Žโ€–๐‘‹ โ†’ 0. Hence, usingthat ๐‘ฆ + ๐‘” = ๐น (๐‘ฅ + โ„Ž), we obtain

โ€–(๐บ โ—ฆ ๐น ) (๐‘ฅ + โ„Ž) โˆ’ (๐บ โ—ฆ ๐น ) (๐‘ฅ) โˆ’ ๐ท๐‘๐บ (๐น (๐‘ฅ + โ„Ž))๐ท๐‘ ๐น (๐‘ฅ + โ„Ž)โ„Žโ€–๐‘โ‰ค โ€–๐บ (๐‘ฆ + ๐‘”) โˆ’๐บ (๐‘ฆ) โˆ’ ๐ท๐‘๐บ (๐‘ฆ + ๐‘”)๐‘”โ€–๐‘+ โ€–๐ท๐‘๐บ (๐‘ฆ + ๐‘”) [๐‘” โˆ’ ๐ท๐‘ ๐น (๐‘ฅ + โ„Ž)โ„Ž] โ€–๐‘

โ‰ค ๐‘Ÿ1(โ€–๐‘”โ€–๐‘Œ ) + โ€–๐ท๐‘๐บ (๐‘ฆ + ๐‘”)โ€–๐•ƒ(๐‘Œ ;๐‘ )๐‘Ÿ2(โ€–โ„Žโ€–๐‘‹ ),

and the claim thus follows from the uniform boundedness of ๐ท๐‘๐บ . ๏ฟฝ

Finally, it follows directly from the denition of the product norm andNewton dierentiabil-ity that Newton derivatives of vector-valued functions can be computed componentwise.

Theorem 14.4. Let ๐‘‹,๐‘Œ๐‘– be Banach spaces and let ๐น๐‘– : ๐‘‹ โ†’ ๐‘Œ๐‘– be Newton dierentiable withNewton derivative ๐ท๐‘ ๐น๐‘– for 1 โ‰ค ๐‘– โ‰ค ๐‘š. Then

๐น : ๐‘‹ โ†’ (๐‘Œ1 ร— ยท ยท ยท ร— ๐‘Œ๐‘š), ๐‘ฅ โ†ฆโ†’ (๐น1(๐‘ฅ), . . . , ๐น๐‘š (๐‘ฅ))๐‘‡

is also Newton dierentiable with Newton derivative

๐ท๐‘ ๐น (๐‘ฅ) = (๐ท๐‘ ๐น1(๐‘ฅ), . . . , ๐ท๐‘ ๐น๐‘š (๐‘ฅ))๐‘‡ .

Since the denition of a Newton derivative is not constructive, allowing dierent choices,the question remains how to obtain a candidate for which the approximation condition inthe denition can be veried. For two classes of functions, such an explicit construction isknown.

locally lipschitz continuous functions on โ„๐‘

If ๐น : โ„๐‘ โ†’ โ„ is locally Lipschitz continuous, candidates can be taken from the Clarkesubdierential, which has an explicit characterization by Theorem 13.26. Under someadditional assumptions, each candidate is indeed a Newton derivative.

A function ๐น : โ„๐‘ โ†’ โ„ is called piecewise (continuously) dierentiable or PC1 function, if

(i) ๐น is continuous on โ„๐‘ , and

(ii) for all ๐‘ฅ โˆˆ โ„๐‘ there exists an open neighborhood ๐‘ˆ๐‘ฅ โŠ‚ โ„๐‘ of ๐‘ฅ and a nite set{๐น๐‘– : ๐‘ˆ๐‘ฅ โ†’ โ„}๐‘–โˆˆ๐ผ๐‘ฅ of continuously dierentiable functions with

๐น (๐‘ฆ) โˆˆ {๐น๐‘– (๐‘ฆ)}๐‘–โˆˆ๐ผ๐‘ฅ for all ๐‘ฆ โˆˆ ๐‘ˆ๐‘ฅ .

203

14 semismooth newton methods

In this case, we call ๐น a continuous selection of the ๐น๐‘– in๐‘ˆ๐‘ฅ . The set

๐ผ๐‘Ž (๐‘ฅ) โ‰” {๐‘– โˆˆ ๐ผ๐‘ฅ | ๐น (๐‘ฅ) = ๐น๐‘– (๐‘ฅ)}

is called the active index set at ๐‘ฅ . Since the ๐น๐‘– are continuous, we have that ๐น (๐‘ฆ) โ‰  ๐น ๐‘— (๐‘ฆ)for all ๐‘— โˆ‰ ๐ผ๐‘Ž (๐‘ฅ) and ๐‘ฆ suciently close to ๐‘ฅ . Hence, indices that are only active on sets ofzero measure do not have to be considered in the following. We thus dene the essentiallyactive index set

๐ผ๐‘’ (๐‘ฅ) โ‰” {๐‘– โˆˆ ๐ผ๐‘ฅ | ๐‘ฅ โˆˆ cl (int {๐‘ฆ โˆˆ ๐‘ˆ๐‘ฅ | ๐น (๐‘ฆ) = ๐น๐‘– (๐‘ฆ)})} โŠ‚ ๐ผ๐‘Ž (๐‘ฅ).

An example of an active but not essentially active index set is the following.

Example 14.5. Consider the function

๐‘“ : โ„ โ†’ โ„, ๐‘ก โ†ฆโ†’ max{0, ๐‘ก, ๐‘ก/2},

i.e., ๐‘“1(๐‘ก) = 0, ๐‘“2(๐‘ก) = ๐‘ก , and ๐‘“3(๐‘ก) = ๐‘ก/2. Then ๐ผ๐‘Ž (0) = {1, 2, 3} but ๐ผ๐‘’ (0) = {1, 2}, since๐‘“3 is active only at ๐‘ก = 0 and hence int {๐‘ก โˆˆ โ„ | ๐‘“ (๐‘ก) = ๐‘“3(๐‘ก)} = โˆ… = cl โˆ….

Since any ๐ถ1 function ๐น๐‘– : ๐‘ˆ๐‘ฅ โ†’ โ„ is Lipschitz continuous with Lipschitz constant๐ฟ๐‘– โ‰” sup๐‘ฆโˆˆ๐‘ˆ๐‘ฅ |โˆ‡๐น (๐‘ฆ) | by Lemma 2.11, PC1 functions are always locally Lipschitz continuous;see [Scholtes 2012, Corollary 4.1.1].

Theorem 14.6. Let ๐น : โ„๐‘ โ†’ โ„ be piecewise dierentiable. Then ๐น is locally Lipschitzcontinuous on โ„๐‘ with local constant ๐ฟ(๐‘ฅ) = max๐‘–โˆˆ๐ผ๐‘Ž (๐‘ฅ) ๐ฟ๐‘– .

This yields the following explicit characterization of the Clarke subdierential of PC1

functions.

Theorem 14.7. Let ๐น : โ„๐‘ โ†’ โ„ be piecewise dierentiable and ๐‘ฅ โˆˆ โ„๐‘ . Then

๐œ•๐ถ๐น (๐‘ฅ) = co {โˆ‡๐น๐‘– (๐‘ฅ) | ๐‘– โˆˆ ๐ผ๐‘’ (๐‘ฅ)} .

Proof. Let ๐‘ฅ โˆˆ โ„๐‘ be arbitrary. By Theorem 13.26 it suces to show that{lim๐‘›โ†’โˆžโˆ‡๐น (๐‘ฅ๐‘›)

๏ฟฝ๏ฟฝ๏ฟฝ ๐‘ฅ๐‘› โ†’ ๐‘ฅ, ๐‘ฅ๐‘› โˆ‰ ๐ธ๐น}= {โˆ‡๐น๐‘– (๐‘ฅ) | ๐‘– โˆˆ ๐ผ๐‘’ (๐‘ฅ)} ,

where ๐ธ๐น is the set of Lebesgue measure 0 where ๐น is not dierentiable from Rademacherโ€™sTheorem. For this, let {๐‘ฅ๐‘›}๐‘›โˆˆโ„• โŠ‚ โ„๐‘ be a sequence with ๐‘ฅ๐‘› โ†’ ๐‘ฅ , ๐น is dierentiable at ๐‘ฅ๐‘›for all ๐‘› โˆˆ โ„•, and โˆ‡๐น (๐‘ฅ๐‘›) โ†’ ๐‘ฅโˆ— โˆˆ โ„๐‘ . Since ๐น is dierentiable at ๐‘ฅ๐‘›, it must hold that๐น (๐‘ฆ) = ๐น๐‘–๐‘› (๐‘ฆ) for some ๐‘–๐‘› โˆˆ ๐ผ๐‘Ž (๐‘ฅ) and all ๐‘ฆ suciently close to ๐‘ฅ๐‘› , which implies thatโˆ‡๐น (๐‘ฅ๐‘›) = โˆ‡๐น๐‘–๐‘› (๐‘ฅ๐‘›). For suciently large ๐‘› โˆˆ โ„•, we can further assume that ๐‘–๐‘› โˆˆ ๐ผ๐‘’ (๐‘ฅ)

204

14 semismooth newton methods

(if necessary, by adding ๐‘ฅ๐‘› with ๐‘–๐‘› โˆ‰ ๐ผ๐‘’ (๐‘ฅ) to ๐ธ๐น , which does not increase its Lebesguemeasure). If we now consider subsequences {๐‘ฅ๐‘›๐‘˜ }๐‘˜โˆˆโ„• with constant index ๐‘–๐‘›๐‘˜ = ๐‘– โˆˆ ๐ผ๐‘’ (๐‘ฅ)(which exist since ๐ผ๐‘’ (๐‘ฅ) is nite), we obtain using the continuity of โˆ‡๐น๐‘– that

๐‘ฅโˆ— = lim๐‘˜โ†’โˆž

โˆ‡๐น (๐‘ฅ๐‘›๐‘˜ ) = lim๐‘˜โ†’โˆž

โˆ‡๐น๐‘– (๐‘ฅ๐‘›๐‘˜ ) โˆˆ {โˆ‡๐น๐‘– (๐‘ฅ) | ๐‘– โˆˆ ๐ผ๐‘’ (๐‘ฅ)} .

Conversely, for everyโˆ‡๐น๐‘– (๐‘ฅ) with ๐‘– โˆˆ ๐ผ๐‘’ (๐‘ฅ) there exists by denition of the essentially activeindices a sequence {๐‘ฅ๐‘›}๐‘›โˆˆโ„• with ๐‘ฅ๐‘› โ†’ ๐‘ฅ and ๐น = ๐น๐‘– in a suciently small neighborhoodof each ๐‘ฅ๐‘› for ๐‘› large enough. The continuous dierentiability of the ๐น๐‘– thus implies thatโˆ‡๐น (๐‘ฅ๐‘›) = โˆ‡๐น๐‘– (๐‘ฅ๐‘›) for all ๐‘› โˆˆ โ„• large enough and hence that

โˆ‡๐น๐‘– (๐‘ฅ) = lim๐‘›โ†’โˆžโˆ‡๐น๐‘– (๐‘ฅ๐‘›) = lim

๐‘›โ†’โˆžโˆ‡๐น (๐‘ฅ๐‘›). ๏ฟฝ

From this, we obtain the Newton dierentiability of PC1 functions.

Theorem 14.8. Let ๐น : โ„๐‘ โ†’ โ„ be piecewise dierentiable. Then ๐น is Newton dierentiablefor all ๐‘ฅ โˆˆ โ„๐‘ , and every ๐ท๐‘ ๐น (๐‘ฅ) โˆˆ ๐œ•๐ถ๐น (๐‘ฅ) is a Newton derivative.

Proof. Let ๐‘ฅ โˆˆ โ„๐‘ be arbitrary and โ„Ž โˆˆ ๐‘‹ with ๐‘ฅ + โ„Ž โˆˆ ๐‘ˆ๐‘ฅ . By Theorem 14.7, every๐ท๐‘ ๐น (๐‘ฅ + โ„Ž) โˆˆ ๐œ•๐ถ๐น (๐‘ฅ + โ„Ž) is of the form

๐ท๐‘ ๐น (๐‘ฅ + โ„Ž) =โˆ‘

๐‘–โˆˆ๐ผ๐‘’ (๐‘ฅ+โ„Ž)๐‘ก๐‘–โˆ‡๐น๐‘– (๐‘ฅ + โ„Ž) for

โˆ‘๐‘–โˆˆ๐ผ๐‘’ (๐‘ฅ+โ„Ž)

๐‘ก๐‘– = 1, ๐‘ก๐‘– โ‰ฅ 0.

Since ๐น is continuous, we have for all โ„Ž โˆˆ โ„๐‘ suciently small that ๐ผ๐‘’ (๐‘ฅ +โ„Ž) โŠ‚ ๐ผ๐‘Ž (๐‘ฅ +โ„Ž) โŠ‚๐ผ๐‘Ž (๐‘ฅ), where the second inclusion follows from the fact that by continuity, ๐น (๐‘ฅ) โ‰  ๐น๐‘– (๐‘ฅ)implies that ๐น (๐‘ฅ + โ„Ž) โ‰  ๐น๐‘– (๐‘ฅ + โ„Ž). Hence, ๐น (๐‘ฅ + โ„Ž) = ๐น๐‘– (๐‘ฅ + โ„Ž) and ๐น (๐‘ฅ) = ๐น๐‘– (๐‘ฅ) for all๐‘– โˆˆ ๐ผ๐‘’ (๐‘ฅ + โ„Ž). Theorem 14.2 then yields that

|๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฅ) โˆ’ ๐ท๐‘ ๐น (๐‘ฅ + โ„Ž)โ„Ž | โ‰คโˆ‘

๐‘–โˆˆ๐ผ๐‘’ (๐‘ฅ+โ„Ž)๐‘ก๐‘– |๐น๐‘– (๐‘ฅ + โ„Ž) โˆ’ ๐น๐‘– (๐‘ฅ) โˆ’ โˆ‡๐น๐‘– (๐‘ฅ + โ„Ž)โ„Ž | = ๐‘œ (โ€–โ„Žโ€–),

since all ๐น๐‘– are continuously dierentiable by assumption. ๏ฟฝ

A natural application of the above are proximal point mappings of convex function-als.

Example 14.9.

(i) We rst consider the proximal mapping for the indicator function ๐›ฟ๐ด : โ„๐‘ โ†’ โ„

of the set ๐ด โ‰”{๐‘ฅ โˆˆ โ„๐‘

๏ฟฝ๏ฟฝ ๐‘ฅ๐‘– โˆˆ [๐‘Ž, ๐‘]} for some ๐‘Ž < ๐‘ โˆˆ โ„. Analogously to (iii),the corresponding proximal mapping is the componentwise projection

[proj๐ด (๐‘ฅ)]๐‘– = proj[๐‘Ž,๐‘]๐‘ฅ๐‘– =

๐‘Ž if ๐‘ฅ๐‘– < ๐‘Ž,๐‘ฅ๐‘– if ๐‘ฅ๐‘– โˆˆ [๐‘Ž, ๐‘],๐‘ if ๐‘ฅ๐‘– < ๐‘,

205

14 semismooth newton methods

which is clearly piecewise dierentiable. Theorem 14.7 thus yields (also compo-nentwise) that

๐œ•๐ถ [proj๐ด (๐‘ฅ)]๐‘– ={1} if ๐‘ฅ๐‘– โˆˆ (๐‘Ž, ๐‘),{0} if ๐‘ฅ๐‘– โˆ‰ [๐‘Ž, ๐‘],[0, 1] if ๐‘ฅ๐‘– โˆˆ {๐‘Ž, ๐‘}.

By Theorems 14.4 and 14.8, a possible Newton derivative is therefore given by

[๐ท๐‘proj๐ด (๐‘ฅ)โ„Ž]๐‘– = [๐Ÿ™[๐‘Ž,๐‘] (๐‘ฅ) ๏ฟฝ โ„Ž]๐‘– โ‰”{โ„Ž๐‘– if ๐‘ฅ๐‘– โˆˆ [๐‘Ž, ๐‘],0 if ๐‘ฅ๐‘– โˆ‰ [๐‘Ž, ๐‘],

where the choice of which case to include ๐‘ฅ๐‘– โˆˆ {๐‘Ž, ๐‘} in is arbitrary. (The com-ponentwise product [๐‘ฅ ๏ฟฝ ๐‘ฆ]๐‘– โ‰” ๐‘ฅ๐‘–๐‘ฆ๐‘– on โ„๐‘ is also known as the Hadamardproduct.)

(ii) Consider now the proximal mapping for ๐บ : โ„๐‘ โ†’ โ„, ๐บ (๐‘ฅ) โ‰” โ€–๐‘ฅ โ€–1, whoseproximal mapping for arbitrary๐›พ > 0 is given by Example 6.23 (ii) componentwiseas

[prox๐›พ๐บ (๐‘ฅ)]๐‘– =๐‘ฅ๐‘– โˆ’ ๐›พ if ๐‘ฅ๐‘– > ๐›พ,0 if ๐‘ฅ๐‘– โˆˆ [โˆ’๐›พ,๐›พ],๐‘ฅ๐‘– + ๐›พ if ๐‘ฅ๐‘– < โˆ’๐›พ .

Again, this is clearly piecewise dierentiable, and Theorem 14.7 thus yields (alsocomponentwise) that

๐œ•๐ถ [(prox๐›พ๐บ ) (๐‘ฅ)]๐‘– ={1} if |๐‘ฅ๐‘– | > ๐›พ,{0} if |๐‘ฅ๐‘– | < ๐›พ,[0, 1] if |๐‘ฅ๐‘– | = ๐›พ .

By Theorems 14.4 and 14.8, a possible Newton derivative is therefore given by

[๐ท๐‘prox๐›พ๐บ (๐‘ฅ)โ„Ž]๐‘– = [๐Ÿ™{|๐‘ก |โ‰ฅ๐›พ} (๐‘ฅ) ๏ฟฝ โ„Ž]๐‘– โ‰”{โ„Ž๐‘– if |๐‘ฅ๐‘– | โ‰ฅ ๐›พ,0 if |๐‘ฅ๐‘– | < ๐›พ,

where again we could have taken the value ๐‘กโ„Ž๐‘– for any ๐‘ก โˆˆ [0, 1] for |๐‘ฅ๐‘– | = ๐›พ .

superposition operators on ๐ฟ๐‘ (ฮฉ)

Rademacherโ€™s Theorem does not hold in innite-dimensional function spaces, and hencethe Clarke subdierential no longer yields an algorithmically useful candidate for a Newtonderivative in general. One exception is the class of superposition operators dened by

206

14 semismooth newton methods

scalar Newton dierentiable functions, for which the Newton derivative can be evaluatedpointwise as well.

We thus consider as in Section 2.3 for an open and bounded domainฮฉ โŠ‚ โ„๐‘ , a Carathรฉodoryfunction ๐‘“ : ฮฉ ร—โ„ โ†’ โ„ (i.e., (๐‘ฅ, ๐‘ง) โ†ฆโ†’ ๐‘“ (๐‘ฅ, ๐‘ง) is measurable in ๐‘ฅ and continuous in ๐‘ง), and1 โ‰ค ๐‘, ๐‘ž โ‰ค โˆž the corresponding superposition operator

๐น : ๐ฟ๐‘ (ฮฉ) โ†’ ๐ฟ๐‘ž (ฮฉ), [๐น (๐‘ข)] (๐‘ฅ) = ๐‘“ (๐‘ฅ,๐‘ข (๐‘ฅ)) for almost every ๐‘ฅ โˆˆ ฮฉ.

The goal is now to similarly obtain a Newton derivative ๐ท๐‘ ๐น for ๐น as a superpositionoperator dened by the Newton derivative ๐ท๐‘ ๐‘“ (๐‘ฅ, ๐‘ง) for ๐‘ง โ†ฆโ†’ ๐‘“ (๐‘ฅ, ๐‘ง). Here, the assump-tion that ๐ท๐‘ ๐‘“ is also a Carathรฉodory function is too restrictive, since we want to allowdiscontinuous derivatives as well (see Example 14.9). Luckily, for our purpose, a weakerproperty is sucient: A function is called Baireโ€“Carathรฉodory function if it can be writtenas a pointwise limit of Carathรฉodory functions.

Under certain growth conditions on ๐‘“ and ๐ท๐‘ ๐‘“ ,1 we can transfer the Newton dierentia-bility of ๐‘“ to ๐น , but we again have to take a two norm discrepancy into account.

Theorem 14.10. Let ๐‘“ : ฮฉ ร—โ„ โ†’ โ„ be a Carathรฉodory function. Furthermore, assume that

(i) ๐‘ง โ†ฆโ†’ ๐‘“ (๐‘ฅ, ๐‘ง) is uniformly Lipschitz continuous for almost every ๐‘ฅ โˆˆ ฮฉ and ๐‘“ (๐‘ฅ, 0) isbounded;

(ii) ๐‘ง โ†ฆโ†’ ๐‘“ (๐‘ฅ, ๐‘ง) is Newton dierentiable with Newton derivative ๐‘ง โ†ฆโ†’ ๐ท๐‘ ๐‘“ (๐‘ฅ, ๐‘ง) for almostevery ๐‘ฅ โˆˆ ฮฉ;

(iii) ๐ท๐‘ ๐‘“ is a Baireโ€“Carathรฉodory function and uniformly bounded.

Then for any 1 โ‰ค ๐‘ž < ๐‘ < โˆž, the corresponding superposition operator ๐น : ๐ฟ๐‘ (ฮฉ) โ†’ ๐ฟ๐‘ž (ฮฉ)is Newton dierentiable with Newton derivative

๐ท๐‘ ๐น : ๐ฟ๐‘ (ฮฉ) โ†’ ๐•ƒ(๐ฟ๐‘ (ฮฉ);๐ฟ๐‘ž (ฮฉ)), [๐ท๐‘ ๐น (๐‘ข)โ„Ž] (๐‘ฅ) = ๐ท๐‘ ๐‘“ (๐‘ฅ,๐‘ข (๐‘ฅ))โ„Ž(๐‘ฅ)

for almost every ๐‘ฅ โˆˆ ฮฉ and all โ„Ž โˆˆ ๐ฟ๐‘ (ฮฉ).

Proof. First, the uniform Lipschitz continuity together with the reverse triangle inequalityyields that

|๐‘“ (๐‘ฅ, ๐‘ง) | โ‰ค |๐‘“ (๐‘ฅ, 0) | + ๐ฟ |๐‘ง | โ‰ค ๐ถ + ๐ฟ |๐‘ง |๐‘ž/๐‘ž for almost every ๐‘ฅ โˆˆ ฮฉ, ๐‘ง โˆˆ โ„,

and hence the growth condition (2.5) for all 1 โ‰ค ๐‘ž โ‰ค โˆž. Due to the continuous embedding๐ฟ๐‘ (ฮฉ) โ†ฉโ†’ ๐ฟ๐‘ž (ฮฉ) for all 1 โ‰ค ๐‘ž โ‰ค ๐‘ โ‰ค โˆž, the superposition operator ๐น : ๐ฟ๐‘ (ฮฉ) โ†’ ๐ฟ๐‘ž (ฮฉ) istherefore well-dened and continuous by Theorem 2.12.

1which can be signicantly relaxed; see [Schiela 2008, Proposition a.1]

207

14 semismooth newton methods

For any measurable ๐‘ข : ฮฉ โ†’ โ„, we have that ๐‘ฅ โ†ฆโ†’ ๐ท๐‘ ๐‘“ (๐‘ฅ,๐‘ข (๐‘ฅ)) is by denition thepointwise limit of measurable functions and hence itself measurable. Furthermore, itsuniform boundedness in particular implies the growth condition (2.5) for ๐‘โ€ฒ โ‰” ๐‘ and๐‘žโ€ฒ โ‰” ๐‘ โˆ’ ๐‘ž > 0. As in the proof of Theorem 2.13, we deduce that the correspondingsuperposition operator ๐ท๐‘ ๐น : ๐ฟ๐‘ (ฮฉ) โ†’ ๐ฟ๐‘  (ฮฉ) is well-dened and continuous for ๐‘  โ‰” ๐‘๐‘ž

๐‘โˆ’๐‘ž ,and that for any ๐‘ข โˆˆ ๐ฟ๐‘ (ฮฉ), the mapping โ„Ž โ†ฆโ†’ ๐ท๐‘ ๐น (๐‘ข)โ„Ž denes a bounded linear operator๐ท๐‘ ๐น (๐‘ข) : ๐ฟ๐‘ (ฮฉ) โ†’ ๐ฟ๐‘ž (ฮฉ). (This time, we do not distinguish in notation between the linearoperator and the function that denes this operator by pointwise multiplication.)

To show that ๐ท๐‘ ๐น (๐‘ข) is a Newton derivative for ๐น in ๐‘ข โˆˆ ๐ฟ๐‘ (ฮฉ), we consider the pointwiseresidual

๐‘Ÿ : ฮฉ ร—โ„ โ†’ โ„, ๐‘Ÿ (๐‘ฅ, ๐‘ง) โ‰”{ |๐‘“ (๐‘ฅ,๐‘ง)โˆ’๐‘“ (๐‘ฅ,๐‘ข (๐‘ฅ))โˆ’๐ท๐‘ ๐‘“ (๐‘ฅ,๐‘ง) (๐‘งโˆ’๐‘ข (๐‘ฅ)) |

|๐‘งโˆ’๐‘ข (๐‘ฅ) | if ๐‘ง โ‰  ๐‘ข (๐‘ฅ),0 if ๐‘ง = ๐‘ข (๐‘ฅ).

Since ๐‘“ is a Carathรฉodory function and๐ท๐‘ ๐‘“ is a Baireโ€“Carathรฉodory function, the function๐‘ฅ โ†ฆโ†’ ๐‘Ÿ (๐‘ฅ, ๏ฟฝ๏ฟฝ (๐‘ฅ)) =: [๐‘…(๏ฟฝ๏ฟฝ)] (๐‘ฅ) is measurable for any measurable ๏ฟฝ๏ฟฝ : ฮฉ โ†’ โ„ (since sums,products, and quotients of measurable functions are again measurable). Furthermore, for๏ฟฝ๏ฟฝ โˆˆ ๐ฟ๐‘ (ฮฉ), the uniform Lipschitz continuity of ๐‘“ and the uniform boundedness of ๐ท๐‘ ๐‘“imply that for almost every ๐‘ฅ โˆˆ ฮฉ with ๏ฟฝ๏ฟฝ (๐‘ฅ) โ‰  ๐‘ข (๐‘ฅ),

(14.5) | [๐‘…(๏ฟฝ๏ฟฝ)] (๐‘ฅ) | = |๐‘“ (๐‘ฅ, ๏ฟฝ๏ฟฝ (๐‘ฅ)) โˆ’ ๐‘“ (๐‘ฅ,๐‘ข (๐‘ฅ)) โˆ’ ๐ท๐‘ ๐‘“ (๐‘ฅ, ๏ฟฝ๏ฟฝ (๐‘ฅ)) (๏ฟฝ๏ฟฝ (๐‘ฅ) โˆ’ ๐‘ข (๐‘ฅ)) ||๏ฟฝ๏ฟฝ (๐‘ฅ) โˆ’ ๐‘ข (๐‘ฅ) | โ‰ค ๐ฟ +๐ถ

and thus that ๐‘…(๏ฟฝ๏ฟฝ) โˆˆ ๐ฟโˆž(ฮฉ). Hence, the superposition operator ๐‘… : ๐ฟ๐‘ (ฮฉ) โ†’ ๐ฟ๐‘  (ฮฉ) iswell-dened.

Let now {๐‘ข๐‘›}๐‘›โˆˆโ„• โŠ‚ ๐ฟ๐‘ (ฮฉ) be a sequence with ๐‘ข๐‘› โ†’ ๐‘ข โˆˆ ๐ฟ๐‘ (ฮฉ). Then there exists asubsequence, again denoted by {๐‘ข๐‘›}๐‘›โˆˆโ„•, with ๐‘ข๐‘› (๐‘ฅ) โ†’ ๐‘ข (๐‘ฅ) for almost every ๐‘ฅ โˆˆ ฮฉ.Since ๐‘ง โ†ฆโ†’ ๐‘“ (๐‘ฅ, ๐‘ง) is Newton dierentiable almost everywhere, we have by denitionthat ๐‘Ÿ (๐‘ฅ,๐‘ข๐‘› (๐‘ฅ)) โ†’ 0 for almost every ๐‘ฅ โˆˆ ฮฉ. Together with the boundedness from (14.5),Lebesgueโ€™s dominated convergence theorem therefore yields that ๐‘…(๐‘ข๐‘›) โ†’ 0 in ๐ฟ๐‘  (ฮฉ) (andhence along the full sequence since the limit is unique).2 For any ๏ฟฝ๏ฟฝ โˆˆ ๐ฟ๐‘ (ฮฉ), the Hรถlderinequality with 1

๐‘ + 1๐‘  =

1๐‘ž thus yields that

โ€–๐น (๏ฟฝ๏ฟฝ) โˆ’ ๐น (๐‘ข) โˆ’ ๐ท๐‘ ๐น (๏ฟฝ๏ฟฝ) (๏ฟฝ๏ฟฝ โˆ’ ๐‘ข)โ€–๐ฟ๐‘ž = โ€–๐‘…(๏ฟฝ๏ฟฝ) (๏ฟฝ๏ฟฝ โˆ’ ๐‘ข)โ€–๐ฟ๐‘ž โ‰ค โ€–๐‘…(๏ฟฝ๏ฟฝ)โ€–๐ฟ๐‘  โ€–๏ฟฝ๏ฟฝ โˆ’ ๐‘ขโ€–๐ฟ๐‘ .

If we now set ๏ฟฝ๏ฟฝ โ‰” ๐‘ข + โ„Ž for โ„Ž โˆˆ ๐ฟ๐‘ (ฮฉ) with โ€–โ„Žโ€–๐ฟ๐‘ โ†’ 0, we have that โ€–๐‘…(๐‘ข + โ„Ž)โ€–๐ฟ๐‘  โ†’ 0and hence by denition the Newton dierentiability of ๐น in ๐‘ข with Newton derivativeโ„Ž โ†ฆโ†’ ๐ท๐‘ ๐น (๐‘ข)โ„Ž as claimed. ๏ฟฝ

2This is the step that fails for ๐น : ๐ฟโˆž (ฮฉ) โ†’ ๐ฟโˆž (ฮฉ), since pointwise convergence and boundedness togetherdo not imply uniform convergence almost everywhere.

208

14 semismooth newton methods

Example 14.11.

(i) Consider

๐ด โ‰”{๐‘ข โˆˆ ๐ฟ2(ฮฉ)

๏ฟฝ๏ฟฝ ๐‘Ž โ‰ค ๐‘ข (๐‘ฅ) โ‰ค ๐‘ for almost every ๐‘ฅ โˆˆ ฮฉ}

and proj๐ด : ๐ฟ๐‘ (ฮฉ) โ†’ ๐ฟ2(ฮฉ) for ๐‘ > 2. Applying Theorem 14.10 to Example 14.9 (i)then yields the pointwise almost everywhere Newton derivative

[๐ท๐‘proj๐ด (๐‘ข)โ„Ž] (๐‘ฅ) = [๐Ÿ™[๐‘Ž,๐‘] (๐‘ข)โ„Ž] (๐‘ฅ) โ‰”{โ„Ž(๐‘ฅ) if ๐‘ข (๐‘ฅ) โˆˆ [๐‘Ž, ๐‘],0 if ๐‘ข (๐‘ฅ) โˆ‰ [๐‘Ž, ๐‘] .

(ii) Consider now

๐บ : ๐ฟ2(ฮฉ) โ†’ โ„, ๐บ (๐‘ข) = โ€–๐‘ขโ€–๐ฟ1 =โˆซฮฉ|๐‘ข (๐‘ฅ) | ๐‘‘๐‘ฅ

and prox๐›พ๐บ : ๐ฟ๐‘ (ฮฉ) โ†’ ๐ฟ2(ฮฉ) for ๐‘ > 2 and ๐›พ > 0. Applying Theorem 14.10 toExample 14.9 (ii) then yields the pointwise almost everywhere Newton derivative

[๐ท๐‘prox๐›พ๐บ (๐‘ข)โ„Ž] (๐‘ฅ) = [๐Ÿ™{|๐‘ก |โ‰ฅ๐›พ} (๐‘ข)โ„Ž] (๐‘ฅ) โ‰”{โ„Ž(๐‘ฅ) if |๐‘ข (๐‘ฅ) | โ‰ฅ ๐›พ,0 if |๐‘ข (๐‘ฅ) | < ๐›พ .

For ๐‘ = ๐‘ž โˆˆ [1,โˆž], however, the claim is false in general, as can be shown by counterexam-ples.

Example 14.12. We take

๐‘“ : โ„ โ†’ โ„, ๐‘“ (๐‘ง) = max{0, ๐‘ง} โ‰”{0 if ๐‘ง โ‰ค 0,๐‘ก if ๐‘ง โ‰ฅ 0.

This is a piecewise dierentiable function, and hence by Theorem 14.8 we can for any๐›ฟ โˆˆ [0, 1] take as Newton derivative

๐ท๐‘ ๐‘“ (๐‘ง)โ„Ž =

0 if ๐‘ง < 0,๐›ฟ if ๐‘ง = 0,โ„Ž if ๐‘ง > 0.

We now consider the corresponding superposition operators ๐น : ๐ฟ๐‘ (ฮฉ) โ†’ ๐ฟ๐‘ (ฮฉ)and ๐ท๐‘ ๐น (๐‘ข) โˆˆ ๐•ƒ(๐ฟ๐‘ (ฮฉ);๐ฟ๐‘ (ฮฉ)) for any ๐‘ โˆˆ [1,โˆž) and show that the approximation

209

14 semismooth newton methods

condition (14.2) is violated for ฮฉ = (โˆ’1, 1), ๐‘ข (๐‘ฅ) = โˆ’|๐‘ฅ |, and

โ„Ž๐‘› (๐‘ฅ) ={

1๐‘› if |๐‘ฅ | < 1

๐‘› ,

0 if |๐‘ฅ | โ‰ฅ 1๐‘› .

First, it is straightforward to compute โ€–โ„Ž๐‘›โ€–๐‘๐ฟ๐‘ (ฮฉ) = 2๐‘›๐‘+1 . Then since [๐น (๐‘ข)] (๐‘ฅ) =

max{0,โˆ’|๐‘ฅ |} = 0 almost everywhere, we have that

[๐น (๐‘ข + โ„Ž๐‘›) โˆ’ ๐น (๐‘ข) โˆ’ ๐ท๐‘ ๐น (๐‘ข + โ„Ž๐‘›)โ„Ž๐‘›] (๐‘ฅ) =โˆ’|๐‘ฅ | if |๐‘ฅ | < 1

๐‘› ,

0 if |๐‘ฅ | > 1๐‘› ,

โˆ’๐›ฟ๐‘› if |๐‘ฅ | = 1

๐‘› ,

and thus

โ€–๐น (๐‘ข + โ„Ž๐‘›) โˆ’ ๐น (๐‘ข) โˆ’ ๐ท๐‘ ๐น (๐‘ข + โ„Ž๐‘›)โ„Ž๐‘›โ€–๐‘๐ฟ๐‘ (ฮฉ) =โˆซ 1

๐‘›

โˆ’ 1๐‘›

|๐‘ฅ |๐‘ ๐‘‘๐‘ฅ =2

๐‘ + 1

(1๐‘›

)๐‘+1.

This implies that

lim๐‘›โ†’โˆž

โ€–๐น (๐‘ข + โ„Ž๐‘›) โˆ’ ๐น (๐‘ข) โˆ’ ๐ท๐‘ ๐น (๐‘ข + โ„Ž๐‘›)โ„Ž๐‘›โ€–๐ฟ๐‘ (ฮฉ)โ€–โ„Ž๐‘›โ€–๐ฟ๐‘ (ฮฉ)

=

(1

๐‘ + 1

) 1๐‘

โ‰  0

and hence that ๐น is not Newton dierentiable from ๐ฟ๐‘ (ฮฉ) to ๐ฟ๐‘ (ฮฉ) for any ๐‘ < โˆž.

For the case ๐‘ = ๐‘ž = โˆž, we take ฮฉ = (0, 1), ๐‘ข (๐‘ฅ) = ๐‘ฅ , and

โ„Ž๐‘› (๐‘ฅ) ={๐‘›๐‘ฅ โˆ’ 1 if ๐‘ฅ โ‰ค 1

๐‘› ,

0 if ๐‘ฅ โ‰ฅ 1๐‘› ,

such that โ€–โ„Ž๐‘›โ€–๐ฟโˆž (ฮฉ) = 1 for all ๐‘› โˆˆ โ„•. We also have that ๐‘ฅ + โ„Ž๐‘› = (1 + ๐‘›)๐‘ฅ โˆ’ 1 โ‰ค 0 for๐‘ฅ โ‰ค 1

๐‘›+1 โ‰ค 1๐‘› and hence that

[๐น (๐‘ข + โ„Ž๐‘›) โˆ’ ๐น (๐‘ข) โˆ’ ๐ท๐‘ ๐น (๐‘ข + โ„Ž๐‘›)โ„Ž๐‘›] (๐‘ฅ) ={(1 + ๐‘›)๐‘ฅ โˆ’ 1 if ๐‘ฅ โ‰ค 1

๐‘›+1 ,0 if ๐‘ฅ โ‰ฅ 1

๐‘›+1

since either โ„Ž๐‘› = 0 or ๐น (๐‘ข + โ„Ž๐‘›) = ๐น (๐‘ข) + ๐ท๐‘ ๐น (๐‘ข)โ„Ž๐‘› in the second case. Now,

sup๐‘ฅโˆˆ(0, 1

๐‘›+1 ]| (1 + ๐‘›)๐‘ฅ โˆ’ 1| = 1 for all ๐‘› โˆˆ โ„•,

210

14 semismooth newton methods

which implies that

lim๐‘›โ†’โˆž

โ€–๐น (๐‘ข + โ„Ž๐‘›) โˆ’ ๐น (๐‘ข) โˆ’ ๐ท๐‘ ๐น (๐‘ข + โ„Ž๐‘›)โ„Ž๐‘›โ€–๐ฟ๐‘ (ฮฉ)โ€–โ„Ž๐‘›โ€–๐ฟ๐‘ (ฮฉ)

= 1 โ‰  0

and hence that ๐น is not Newton dierentiable from ๐ฟโˆž(ฮฉ) to ๐ฟโˆž(ฮฉ) either.

Remark 14.13. Semismoothness was introduced in [Miin 1977] for Lipschitz-continuous functionals๐น : โ„๐‘ โ†’ โ„ as a condition relating Clarke subderivatives and directional derivatives near a point.This denition was extended to functions ๐น : โ„๐‘ โ†’ โ„๐‘€ in [Qi 1993; Qi & Sun 1993] and shownto imply a uniform version of the approximation condition (14.2) for all elements of the Clarkesubdierential and hence superlinear convergence of the semismooth Newton method in nitedimensions. A semismooth Newton method specically for PC1 functions was already consideredin [Kojima & Shindo 1986]. In Banach spaces, [Kummer 1988] was the rst to study an abstract classof Newton methods for nonsmooth equations based on the condition (14.2), unifying the previousresults; see [Klatte & Kummer 2002]. In all these works, the analysis was based on semismoothnessas a property relating ๐น : ๐‘‹ โ†’ ๐‘Œ to a set-valued mapping ๐บ : ๐‘‹ โ‡’ ๐ฟ(๐‘‹,๐‘Œ ), whose elements(uniformly) satisfy (14.2). In contrast, [Kummer 2000; Chen, Nashed & Qi 2000] considered โ€“ as wedo in this book โ€“ single-valued Newton derivatives (named Newton maps in the former and slantingfunctions in the latter) in Banach spaces. This approach was later followed in [Hintermรผller, Ito& Kunisch 2002; Ito & Kunisch 2008] to show that for a specic choice of Newton derivative, theclassical primal-dual active set method for solving quadratic optimization problems under linearinequality constraints can be interpreted as a semismooth Newton method. In parallel, [Ulbrich2002; Ulbrich 2011] showed that superposition operators dened by semismooth functions (in thesense of [Qi & Sun 1993]) are semismooth (in the sense of [Kummer 1988]) between the right spaces.A similar result for single-valued Newton derivatives was shown in [Schiela 2008] using a proofthat is much closer to the one for the classical dierentiability of superposition operators; compareTheorems 2.13 and 14.10. It should, however, be mentioned that not all calculus results for semismoothfunctions are available in the single-valued setting; for example, the implicit function theorem from[Kruse 2018] requires set-valued Newton derivatives, since the selection of the Newton derivative ofthe implicit function need not correspond to the selection of the given mapping. Finally, we remarkthat the notion of semismoothness and semismooth Newton methods were very recently extendedto set-valued mappings in [Gferer & Outrata 2019].

211

15 NONLINEAR PRIMAL-DUAL PROXIMAL SPLITTING

In this chapter, our goal is to extend the primal-dual proximal splitting (PDPS) method tononlinear operators ๐พ โˆˆ ๐ถ1(๐‘‹ ;๐‘Œ ), i.e., to problems of the form

(15.1) min๐‘‹๐น (๐‘ฅ) +๐บ (๐พ (๐‘ฅ)),

where we still assume ๐น : ๐‘‹ โ†’ โ„ and ๐บ : ๐‘Œ โ†’ โ„ to be convex, proper, and lowersemicontinuous on the Hilbert spaces ๐‘‹ and ๐‘Œ . For simplicity, we will only consider linearconvergence under a strong convexity assumption and refer to the literature for weakconvergence and acceleration under partial strong convexity (see Remark 15.12 below). Asin earlier chapters, we use the same notation for the inner product as for the duality pairingin Hilbert spaces to distinguish them better from pairs of elements.

We recall the three-point program for convergence proofs of rst-order methods fromChapter 9, which remains fundamentally the same in the nonlinear setting. However,we need to make some of the concepts local. Thus the three main ingredients of ourconvergence proofs will be the following.

(i) The three-point identity (1.5).

(ii) The local monotonicity of the operator๐ป whose roots correspond to the (primal-dual)critical points of (15.1). We x one of the points in the denition of monotonicityin Section 6.2 to a root ๐‘ฅ of ๐ป , and only vary the other point in a neighborhood of๐‘ฅ . This is essentially a nonsmooth variant of the standard second-order sucient(or local quadratic growth) condition โˆ‡2๐น (๐‘ฅ) ๏ฟฝ 0 (i.e., positive deniteness of theHessian) for minimizing a smooth function ๐น : โ„๐‘ โ†’ โ„.

(iii) The nonnegativity of the preconditioning operators๐‘€๐‘˜+1 dening the implicit formof the algorithm. These will now in general depend on the current iterate, and thuswe can only show the nonnegativity in a neighborhood of suitable ๐‘ฅ .

15.1 nonconvex explicit splitting

To motivate our more specic assumptions on ๐พ , we start by showing that forward-backward splitting can be applied to a nonconvex function for the forward step. We

212

15 nonlinear primal-dual proximal splitting

thus consider for the problem

(15.2) min๐‘ฅโˆˆ๐‘‹

๐บ (๐‘ฅ) + ๐น (๐‘ฅ),

with ๐น smooth but possibly nonconvex, the algorithm

(15.3) ๐‘ฅ๐‘˜+1 = prox๐œ๐บ (๐‘ฅ๐‘˜ โˆ’ ๐œโˆ‡๐น (๐‘ฅ๐‘˜)) .

To show convergence of this algorithm, we extend the non-value three-point smoothnessinequalities of Corollaries 7.2 and 7.7 from convex smooth functions to ๐ถ2 functions. (It isalso possible to obtain corresponding value inequalities.)

Lemma 15.1. Suppose ๐น โˆˆ ๐ถ2(๐‘‹ ). Let ๐‘ง, ๐‘ฅ โˆˆ ๐‘‹ , and suppose for some ๐ฟ > 0 and ๐›พ โ‰ฅ 0 for allZ โˆˆ ๐”น(๐‘ฅ, โ€–๐‘ง โˆ’ ๐‘ฅ โ€–๐‘‹ ) that ๐›พ ยท Id โ‰ค โˆ‡2๐น (Z ) โ‰ค ๐ฟ ยท Id. Then for any ๐›ฝ โˆˆ (0, 2] and ๐‘ฅ โˆˆ ๐‘‹ we have

(15.4) ใ€ˆโˆ‡๐น (๐‘ง) โˆ’ โˆ‡๐น (๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐›พ (2 โˆ’ ๐›ฝ)2 โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ ๐ฟ

2๐›ฝ โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹ .

Proof. By the one-dimensionalmean value theorem applied to ๐‘ก โ†ฆโ†’ ใ€ˆโˆ‡๐น (๐‘ฅ+๐‘ก (๐‘งโˆ’๐‘ฅ)), ๐‘ฅโˆ’๐‘ฅใ€‰๐‘‹ ,we obtain for Z = ๐‘ฅ + ๐‘  (๐‘ง โˆ’ ๐‘ฅ) for some ๐‘  โˆˆ [0, 1] that

ใ€ˆโˆ‡๐น (๐‘ง) โˆ’ โˆ‡๐น (๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ = ใ€ˆโˆ‡2๐น (Z ) (๐‘ง โˆ’ ๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ .

Therefore, for any ๐›ฝ > 0,

(15.5) ใ€ˆโˆ‡๐น (๐‘ง) โˆ’ โˆ‡๐น (๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ = โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2โˆ‡2๐น (Z ) + ใ€ˆโˆ‡2๐น (Z ) (๐‘ง โˆ’ ๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹

โ‰ฅ 2 โˆ’ ๐›ฝ2 โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2โˆ‡2๐น (Z ) โˆ’

12๐›ฝ โ€–๐‘ฅ โˆ’ ๐‘งโ€–2โˆ‡2๐น (Z ) .

By the denition of ๐›พ and ๐ฟ, we obtain (15.4). ๏ฟฝ

The following result is almost a carbon copy of Theorems 9.6 and 10.2 for convex smooth๐น . However, since our present problem is nonconvex, we can only expect to locally nd acritical point of ๐ฝ โ‰” ๐น +๐บ .

Theorem 15.2. Let ๐น โˆˆ ๐ถ2(๐‘‹ ) and let๐บ : ๐‘‹ โ†’ โ„ be proper, convex, and lower semicontinuous.Given an initial iterate ๐‘ฅ0 and a critical point ๐‘ฅ โˆˆ [๐œ•๐บ + โˆ‡๐น ]โˆ’1(0) of ๐ฝ โ‰” ๐น + ๐บ , letX โ‰” ๐”น(๐‘ฅ, โ€–๐‘ฅ0 โˆ’ ๐‘ฅ โ€–), and suppose for some ๐ฟ > 0 and ๐›พ โ‰ฅ 0 that

(15.6) ๐›พ ยท Id โ‰ค โˆ‡2๐น (Z ) โ‰ค ๐ฟ ยท Id (Z โˆˆ X).

Take 0 < ๐œ < 2๐ฟโˆ’1.

(i) If ๐›พ > 0, the sequence {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• generated by (15.3) converges linearly to ๐‘ฅ .

213

15 nonlinear primal-dual proximal splitting

(ii) If ๐›พ = 0, the sequence {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• converges weakly to a critical point of ๐ฝ .

Note that if ๐บ is locally nite-valued, then by Theorem 13.20 our denition of a criticalpoint in this theorem means ๐‘ฅ โˆˆ [๐œ•๐ถ ๐ฝ ]โˆ’1(0).

Proof. As usual, we write (15.3) as

(15.7) 0 โˆˆ ๐œ [๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜)] + (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜).Suppose ๐‘ฅ๐‘˜ โˆˆ X and let ๐›ฝ โˆˆ (๐ฟ๐œ, 2) be arbitrary (which is possible since ๐œ๐ฟ < 2). By themonotonicity of ๐œ•๐บ and the local three-pointmonotonicity (15.4) of ๐น implied by Lemma 15.1,we obtain

ใ€ˆ๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐›พ (2 โˆ’ ๐›ฝ)2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ ๐ฟ

2๐›ฝ โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ .

Observe that if we had ๐‘ฅ๐‘˜+1 = ๐‘ฅ๐‘˜ (or ๐น = 0), this would show the local quadratic growth of๐น +๐บ at ๐‘ฅ . Since ๐‘ฅ๐‘˜+1 = ๐‘ฅ๐‘˜ , we however need to compensate for taking the forward stepwith respect to ๐น .

Testing (15.7) by the application of ๐œ‘๐‘˜ ใ€ˆ ยท , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ for some testing parameter ๐œ‘๐‘˜ > 0yields

๐œ‘๐‘˜๐›พ๐œ (2 โˆ’ ๐›ฝ)2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ ๐œ‘๐‘˜๐ฟ๐œ

2๐›ฝ โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ + ๐œ‘๐‘˜ ใ€ˆ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค 0.

Taking

(15.8) ๐œ‘๐‘˜+1 โ‰” ๐œ‘๐‘˜ (1 + ๐›พ๐œ (2 โˆ’ ๐›ฝ)) with ๐œ‘0 > 0,

the three-point formula (9.1) yields

(15.9) ๐œ‘๐‘˜+12 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐œ‘๐‘˜ (1 โˆ’ ๐œ๐ฟ/๐›ฝ)2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ค ๐œ‘๐‘˜

2 โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ .

Since ๐›ฝ โˆˆ (๐ฟ๐œ, 2) and ๐‘ฅ๐‘˜ โˆˆ X, this implies that ๐‘ฅ๐‘˜+1 โˆˆ X. By induction, we thus obtain that{๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• โŠ‚ X under our assumption ๐‘ฅ0 โˆˆ X.

If ๐›พ > 0, the recursion (15.8) together with ๐›ฝ < 2 shows that ๐œ‘๐‘˜ grows exponentially. Usingthat ๐œ๐ฟ/๐›ฝ โ‰ค 1 and telescoping (15.9) then shows the claimed linear convergence.

Let us then consider weak convergence. With ๐›พ = 0 and ๐›ฝ < 2, the recursion (15.8) reducesto ๐œ‘๐‘˜+1 โ‰ก ๐œ‘0 > 0. Since ๐œ๐ฟ โ‰ค ๐›ฝ , the estimate (15.9) yields Fejรฉr monotonicity of theiterates {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„•. Moreover, we establish for๐‘ค๐‘˜+1 โ‰” โˆ’๐œโˆ’1(๐‘ฅ๐‘˜+1โˆ’๐‘ฅ๐‘˜) that โ€–๐‘ค๐‘˜+1โ€–๐‘‹ โ†’ 0 and๐‘ค๐‘˜+1 โˆˆ ๐œ•๐บ (๐‘ฅ๐‘˜+1) +โˆ‡๐น (๐‘ฅ๐‘˜) for all ๐‘˜ โˆˆ โ„•. Let ๐‘ฅ be any weak limit point of {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„•, i.e., thereexists a subsequence {๐‘ฅ๐‘˜๐‘› }๐‘›โˆˆโ„• with ๐‘ฅ๐‘˜๐‘› โ‡€ ๐‘ฅ โˆˆ X. Then also ๐‘ฅ๐‘˜๐‘›+1 โ‡€ ๐‘ฅ โˆˆ X. Since โˆ‡๐นis by (15.6) Lipschitz continuous in X, we have โˆ‡๐น (๐‘ฅ๐‘˜๐‘›+1) โˆ’ โˆ‡๐น (๐‘ฅ๐‘˜๐‘› ) โ†’ 0. Consequently,๐œ•๐บ (๐‘ฅ๐‘˜๐‘›+1) + โˆ‡๐น (๐‘ฅ๐‘˜๐‘›+1) 3 ๐‘ค๐‘˜๐‘›+1 + โˆ‡๐น (๐‘ฅ๐‘˜๐‘›+1) โˆ’ โˆ‡๐น (๐‘ฅ๐‘˜๐‘› ) โ†’ 0. By the outer semicontinuityof ๐œ•๐บ + โˆ‡๐น , it follows that 0 โˆˆ ๐œ•๐บ (๐‘ฅ) + โˆ‡๐น (๐‘ฅ) and therefore ๐‘ฅ โˆˆ (๐œ•๐น + โˆ‡๐บ)โˆ’1(0) โŠ‚ X. Theclaim thus follows by applying Opialโ€™s Lemma 9.1. ๏ฟฝ

214

15 nonlinear primal-dual proximal splitting

15.2 nl-pdps formulation and assumptions

Asmentioned above,we consider the problem (15.1) with ๐น : ๐‘‹ โ†’ โ„ and๐บ : ๐‘Œ โ†’ โ„ convex,proper, and lower semicontinuous, and ๐พ โˆˆ ๐ถ1(๐‘‹ ;๐‘Œ ). We will soon state more preciseassumptions on ๐พ . When either the null space of [โˆ‡๐พ (๐‘ฅ)]โˆ— is trivial or dom๐บ = ๐‘‹ , we canapply the chain rule Theorem 13.23 for Clarke subdierentials as well as the equivalencesof Theorems 13.5 and 13.8 for convex and dierentiable functions, respectively, to rewrite asin Section 8.4 the critical point conditions for this problem as 0 โˆˆ ๐ป (๐‘ข) for the set-valuedoperator ๐ป : ๐‘‹ ร— ๐‘Œ โ‡’ ๐‘‹ ร— ๐‘Œ dened for ๐‘ข = (๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘‹ ร— ๐‘Œ as

(15.10) ๐ป (๐‘ข) โ‰”(๐œ•๐น (๐‘ฅ) + [โˆ‡๐พ (๐‘ฅ)]โˆ—๐‘ฆ๐œ•๐บโˆ—(๐‘ฆ) โˆ’ ๐พ (๐‘ฅ)

).

Throughout the rest of this chapter, we write ๐‘ข = (๐‘ฅ, ๐‘ฆ) โˆˆ ๐ปโˆ’1(0) for an arbitrary root of๐ป that we assume to exist.

In analogy to the basic PDPS method, the basic unaccelerated NL-PDPS method theniterates

(15.11)

๐‘ฅ๐‘˜+1 โ‰” (๐ผ + ๐œ๐œ•๐น )โˆ’1(๐‘ฅ๐‘˜ โˆ’ ๐œ [โˆ‡๐พ (๐‘ฅ๐‘˜)]โˆ—๐‘ฆ๐‘˜),๐‘ฅ๐‘˜+1 โ‰” 2๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ ,๐‘ฆ๐‘˜+1 โ‰” (๐ผ + ๐œŽ๐œ•๐บโˆ—)โˆ’1(๐‘ฆ๐‘˜ + ๐œŽ๐พ (๐‘ฅ๐‘˜+1)).

We can write this algorithm in the general form of Theorem 11.12 as follows. For someprimal and dual testing parameters ๐œ‘๐‘˜ ,๐œ“๐‘˜+1 > 0, we dene the step length and testingoperators

๐‘Š๐‘˜+1 โ‰ก๐‘Š โ‰”

(๐œId 00 ๐œŽId

)and ๐‘๐‘˜+1 โ‰”

(๐œ‘๐‘˜ Id 00 ๐œ“๐‘˜+1Id

).

We also dene the linear preconditioner ๐‘€๐‘˜+1 and the step length weighted partial lin-earization ๐ป๐‘˜+1 of ๐ป by

๐‘€๐‘˜+1 โ‰”(

Id โˆ’๐œ [โˆ‡๐พ (๐‘ฅ๐‘˜)]โˆ—โˆ’๐œŽโˆ‡๐พ (๐‘ฅ๐‘˜) Id

), and(15.12)

๐ป๐‘˜+1(๐‘ข) โ‰”๐‘Š๐‘˜+1

(๐œ•๐น (๐‘ฅ) + [โˆ‡๐พ (๐‘ฅ๐‘˜)]โˆ—๐‘ฆ

๐œ•๐บโˆ—(๐‘ฆ) โˆ’ ๐พ (๐‘ฅ๐‘˜+1) โˆ’ โˆ‡๐พ (๐‘ฅ๐‘˜) (๐‘ฅ โˆ’ ๐‘ฅ๐‘˜+1)).(15.13)

Observe that ๐ป๐‘˜+1(๐‘ข) simplies to๐‘Š๐‘˜+1๐ป (๐‘ข) for linear ๐พ . Then (15.11) becomes

(15.14) 0 โˆˆ ๐ป๐‘˜+1(๐‘ข๐‘˜+1) +๐‘€๐‘˜+1(๐‘ข๐‘˜+1 โˆ’ ๐‘ข๐‘˜).

We will need ๐พ to be locally Lipschitz dierentiable.

215

15 nonlinear primal-dual proximal splitting

Assumption 15.3 (locally Lipschitz โˆ‡๐พ ). The operator ๐พ : ๐‘‹ โ†’ ๐‘Œ is Frรฉchet dierentiable,and for some ๐ฟ โ‰ฅ 0 and a neighborhood X๐พ of ๐‘ฅ ,

(15.15) โ€–โˆ‡๐พ (๐‘ฅ) โˆ’ โˆ‡๐พ (๐‘ง)โ€–๐•ƒ(๐‘‹,๐‘Œ ) โ‰ค ๐ฟโ€–๐‘ฅ โˆ’ ๐‘งโ€–๐‘‹ (๐‘ฅ, ๐‘ง โˆˆ X๐พ ).

We also require a three-point assumption on ๐พ . This assumption combines a second-ordergrowth condition with a three-point smoothness estimate. Note that the factor ๐›พ๐พ can benegative; if it is, it will need to be oset by sucient strong convexity of ๐น .

Assumption 15.4 (three-point condition on ๐พ). For a neighborhood X๐พ of ๐‘ฅ , and some๐›พ๐พ โˆˆ โ„ and ๐ฟ, \ โ‰ฅ 0, we require

(15.16) ใ€ˆ[โˆ‡๐พ (๐‘ง) โˆ’ โˆ‡๐พ (๐‘ฅ)]โˆ—๐‘ฆ, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹โ‰ฅ ๐›พ๐พ โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ + \ โ€–๐พ (๐‘ฅ) โˆ’ ๐พ (๐‘ฅ) โˆ’ โˆ‡๐พ (๐‘ฅ) (๐‘ฅ โˆ’ ๐‘ฅ)โ€–๐‘Œ โˆ’ _

2 โ€–๐‘ฅ โˆ’ ๐‘งโ€–2๐‘‹ (๐‘ฅ, ๐‘ง โˆˆ X๐พ ).

For linear ๐พ , Assumption 15.4 trivially holds for any ๐›พ๐พ โ‰ค 0, \ โ‰ฅ 0 and _ = 0. Furthermore,if we take ๐บโˆ— = ๐›ฟ{1} (so that ๐พ : ๐‘‹ โ†’ โ„), the problem (15.1) reduces to (15.2) with ๐พ inplace of ๐น . Minding that in this case ๐‘ฆ = 1, Lemma 15.1 with ๐›ฝ = 1 proves Assumption 15.4for _ = ๐ฟ, any \ โ‰ฅ 0 and ๐›พ๐พ โ‰ค ๐›พ with ๐›พ, ๐ฟ โ‰ฅ 0 satisfying ๐›พ ยท Id โ‰ค โˆ‡2๐พ (Z ) โ‰ค ๐ฟ ยท Id or allZ โˆˆ X๐พ . In more general settings, the verication of Assumption 15.4 can demand someeort. We refer to [Clason, Mazurenko & Valkonen 2019] for further examples.

15.3 convergence proof

For simplicity of treatment, and to demonstrate the main ideas without excessive techni-calities, we only show linear convergence under strong convexity of both ๐น and ๐บโˆ—.

We will base our proof on Theorem 11.12 and thus have to verify its assumptions. Mostof the work is in verifying the inequality (11.28), which we do in several steps. First, weensure that the operator ๐‘๐‘˜+1๐‘€๐‘˜+1 giving rise to the local metric is self-adjoint. Then weshow that ๐‘๐‘˜+2๐‘€๐‘˜+2 and the update ๐‘๐‘˜+1(๐‘€๐‘˜+1 +ฮž๐‘˜+1) actually performed by the algorithmyield identical norms, where ฮž๐‘˜+1 represents some o-diagonal components from thealgorithm as well as any strong convexity provided by ๐น and ๐บโˆ—. Finally, we estimate thelocal monotonicity of ๐ป๐‘˜+1.

We write ๐›พ๐น , ๐›พ๐บโˆ— โ‰ฅ 0 for the factors of (strong) convexity of ๐น and ๐บโˆ—, and recall the factor๐›พ๐พ โˆˆ โ„ from Assumption 15.4 Then for some โ€œacceleration parametersโ€ ๐›พ๐น , ๐›พ๐บโˆ— โ‰ฅ 0 and^ โˆˆ [0, 1), we require that

๐›พ๐น + ๐›พ๐พ โ‰ฅ ๐›พ๐น โ‰ฅ 0, ๐›พ๐บโˆ— โ‰ฅ ๐›พ๐บโˆ— โ‰ฅ 0,(15.17a)[๐‘˜ โ‰” ๐œ‘๐‘˜๐œ = ๐œ“๐‘˜+1๐œŽ, 1 โˆ’ ^ โ‰ค ๐œ๐œŽ โ€–โˆ‡๐พ (๐‘ฅ๐‘˜)โ€–2,(15.17b)

๐œ‘๐‘˜+1 = ๐œ‘๐‘˜ (1 + 2๐›พ๐น๐œ), and ๐œ“๐‘˜+1 = ๐œ“๐‘˜ (1 + 2๐›พ๐บโˆ—๐œŽ) (๐‘˜ โˆˆ โ„•).(15.17c)

216

15 nonlinear primal-dual proximal splitting

The next lemma adapts Lemma 9.11.

Lemma 15.5. Fix ๐‘˜ โˆˆ โ„• and suppose (15.17) holds. Then ๐‘๐‘˜+1๐‘€๐‘˜+1 is self-adjoint and satises

๐‘๐‘˜+1๐‘€๐‘˜+1 โ‰ฅ(๐›ฟ๐œ‘๐‘˜ ยท Id 0

0 (^ โˆ’ ๐›ฟ) (1 โˆ’ ๐›ฟ)โˆ’1๐œ“๐‘˜+1 ยท Id)

for any ๐›ฟ โˆˆ [0, ^] .

Proof. From (15.17) we have ๐œ‘๐‘˜๐œ = ๐œ“๐‘˜+1๐œŽ = [๐‘˜ . By (15.12) then

(15.18) ๐‘๐‘˜+1๐‘€๐‘˜+1 =(

๐œ‘๐‘˜ ยท Id โˆ’[๐‘˜ [โˆ‡๐พ (๐‘ฅ๐‘˜)]โˆ—โˆ’[๐‘˜โˆ‡๐พ (๐‘ฅ๐‘˜) ๐œ“๐‘˜+1 ยท Id

).

This shows that ๐‘๐‘˜+1๐‘€๐‘˜+1 is self-adjoint. Youngโ€™s inequality furthermore implies that

(15.19) ๐‘๐‘˜+1๐‘€๐‘˜+1 โ‰ฅ(๐›ฟ๐œ‘๐‘˜ Id 00 ๐œ“๐‘˜+1

(Id โˆ’ ๐œ๐œŽ

1โˆ’๐›ฟโˆ‡๐พ (๐‘ฅ๐‘˜) [โˆ‡๐พ (๐‘ฅ๐‘˜)]โˆ—) ) .

The claimed estimate then follows from the assumptions (15.17). ๏ฟฝ

Our next step is to simplify ๐‘๐‘˜+1๐‘€๐‘˜+1 โˆ’ ๐‘๐‘˜+2๐‘€๐‘˜+2 in (11.28).

Lemma 15.6. Fix ๐‘˜ โˆˆ โ„•, and suppose (15.17) holds. Then 12 โ€– ยท โ€–2๐‘๐‘˜+1 (๐‘€๐‘˜+1+ฮž๐‘˜+1)โˆ’๐‘๐‘˜+2๐‘€๐‘˜+2 = 0 for

(15.20) ฮž๐‘˜+1 โ‰”(

2๐›พ๐น๐œId 2๐œ [โˆ‡๐พ (๐‘ฅ๐‘˜)]โˆ—โˆ’2๐œŽโˆ‡๐พ (๐‘ฅ๐‘˜+1) 2๐›พ๐บโˆ—๐œŽId

).

Proof. Using (15.17) and (15.18) can write

๐‘๐‘˜+1(๐‘€๐‘˜+1 + ฮž๐‘˜+1) โˆ’ ๐‘๐‘˜+2๐‘€๐‘˜+2 = ๐ท๐‘˜+1

for the skew-symmetric operator

๐ท๐‘˜+1 โ‰”(

0 [[๐‘˜+1โˆ‡๐พ (๐‘ฅ๐‘˜+1) + [๐‘˜โˆ‡๐พ (๐‘ฅ๐‘˜)]โˆ—โˆ’[[๐‘˜+1โˆ‡๐พ (๐‘ฅ๐‘˜+1) + [๐‘˜โˆ‡๐พ (๐‘ฅ๐‘˜)] 0

).

This yields the claim. ๏ฟฝ

For our main results, we need to assume that the dual variables stay bounded within theโ€œnonlinear rangeโ€ of ๐พ . To this end, we introduce the (possibly empty) subspace ๐‘ŒNL of ๐‘Œ inwhich ๐พ acts linearly, i.e.,

๐‘ŒL โ‰” {๐‘ฆ โˆˆ ๐‘Œ | the mapping ๐‘ฅ โ†ฆโ†’ ใ€ˆ๐‘ฆ, ๐พ (๐‘ฅ)ใ€‰ is linear} and ๐‘ŒNL โ‰” ๐‘ŒโŠฅL .

We then denote by ๐‘ƒNL the orthogonal projection to ๐‘ŒNL. We also write

๐”นNL(๐‘ฆ, ๐‘Ÿ ) โ‰” {๐‘ฆ โˆˆ ๐‘Œ | โ€–๐‘ฆ โˆ’ ๐‘ฆ โ€–๐‘ƒNL โ‰ค ๐‘Ÿ }

217

15 nonlinear primal-dual proximal splitting

for the closed cylinder in ๐‘Œ of the radius ๐‘Ÿ with axis orthogonal to ๐‘ŒNL.

With X๐พ given by Assumption 15.3, we now dene for some radius ๐œŒ๐‘ฆ > 0 the neighbor-hood

(15.21) U(๐œŒ๐‘ฆ) โ‰” X๐พ ร— ๐”นNL(๐‘ฆ, ๐œŒ๐‘ฆ).

We will require that the iterates {๐‘ข๐‘˜}๐‘ขโˆˆโ„• of (15.11) stay within this neighborhood for somexed ๐œŒ๐‘ฆ > 0.

The next lemma provides the necessary three-point inequality to estimate the linearizationsperformed within ๐ป๐‘˜+1.

Lemma 15.7. For a xed ๐‘˜ โˆˆ โ„•, suppose ๐‘ฅ๐‘˜+1 โˆˆ X๐พ , and let ๐œŒ๐‘ฆ โ‰ฅ 0 be such that ๐‘ข๐‘˜ , ๐‘ข๐‘˜+1 โˆˆU(๐œŒ๐‘ฆ). Suppose ๐พ satises Assumptions 15.3 and 15.4 with \ โ‰ฅ ๐œŒ๐‘ฆ . If (15.17) holds, then

ใ€ˆ๐ป๐‘˜+1(๐‘ข๐‘˜+1), ๐‘ข๐‘˜+1 โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1 โ‰ฅ12 โ€–๐‘ข

๐‘˜+1 โˆ’ ๐‘ขโ€–2๐‘๐‘˜+1ฮž๐‘˜+1 โˆ’[๐‘˜ [_ + 3๐ฟ๐œŒ๐‘ฆ]

2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ .

Proof. From (15.10), (15.13), (15.17), and (15.20), we calculate

(15.22) ๐ท โ‰” ใ€ˆ๐ป๐‘˜+1(๐‘ข๐‘˜+1), ๐‘ข๐‘˜+1 โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1 โˆ’12 โ€–๐‘ข

๐‘˜+1 โˆ’ ๐‘ขโ€–2๐‘๐‘˜+1ฮž๐‘˜+1= ใ€ˆ๐ป (๐‘ข๐‘˜+1), ๐‘ข๐‘˜+1 โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1๐‘Š๐‘˜+1 โˆ’ [๐‘˜๐›พ๐น โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ [๐‘˜๐›พ๐บโˆ— โ€–๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ โ€–2๐‘Œ+ [๐‘˜ ใ€ˆ[โˆ‡๐พ (๐‘ฅ๐‘˜) โˆ’ โˆ‡๐พ (๐‘ฅ๐‘˜+1)] (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ), ๐‘ฆ๐‘˜+1ใ€‰๐‘Œ+ [๐‘˜ ใ€ˆ๐พ (๐‘ฅ๐‘˜+1) โˆ’ ๐พ (๐‘ฅ๐‘˜+1) โˆ’ โˆ‡๐พ (๐‘ฅ๐‘˜) (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜+1), ๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆใ€‰๐‘Œ+ [๐‘˜ ใ€ˆ(โˆ‡๐พ (๐‘ฅ๐‘˜+1) โˆ’ โˆ‡๐พ (๐‘ฅ๐‘˜)) (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ), ๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆใ€‰๐‘Œ .

Here the rst of the terms involving ๐พ comes from the rst lines of ๐ป๐‘˜+1 and ๐ป , the secondof the terms from the second line, and the third from ฮž๐‘˜+1. Since 0 โˆˆ ๐ป (๐‘ข), we have๐‘ž๐น โ‰” โˆ’[โˆ‡๐พ (๐‘ฅ)]โˆ—๐‘ฆ โˆˆ ๐œ•๐น (๐‘ฅ) and ๐‘ž๐บโˆ— โ‰” ๐พ (๐‘ฅ) โˆˆ ๐œ•๐บโˆ—(๐‘ฆ). Using (15.17), we can thereforeexpand

ใ€ˆ๐ป (๐‘ข๐‘˜+1), ๐‘ข๐‘˜+1 โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1๐‘Š๐‘˜+1 = [๐‘˜ ใ€ˆ๐œ•๐น (๐‘ฅ๐‘˜+1) โˆ’ ๐‘ž๐น , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ + [๐‘˜ ใ€ˆ๐œ•๐บโˆ—(๐‘ฆ๐‘˜+1) โˆ’ ๐‘ž๐บโˆ—, ๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆใ€‰๐‘Œ+ [๐‘˜ ใ€ˆ[โˆ‡๐พ (๐‘ฅ๐‘˜+1)]โˆ—๐‘ฆ๐‘˜+1 โˆ’ [โˆ‡๐พ (๐‘ฅ)]โˆ—๐‘ฆ, ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹+ [๐‘˜ ใ€ˆ๐พ (๐‘ฅ) โˆ’ ๐พ (๐‘ฅ๐‘˜+1), ๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆใ€‰๐‘Œ .

Using the ๐›พ๐น -strong monotonicity of ๐œ•๐น and the ๐›พ๐บโˆ—-strong monotonicity of ๐œ•๐บโˆ—, andrearranging terms, we obtain

ใ€ˆ๐ป (๐‘ข๐‘˜+1), ๐‘ข๐‘˜+1 โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1๐‘Š๐‘˜+1 โ‰ฅ [๐‘˜๐›พ๐น โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + [๐‘˜๐›พ๐บโˆ— โ€–๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ โ€–2๐‘Œ+ [๐‘˜ ใ€ˆโˆ‡๐พ (๐‘ฅ๐‘˜+1) (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ), ๐‘ฆ๐‘˜+1ใ€‰๐‘Œโˆ’ [๐‘˜ ใ€ˆโˆ‡๐พ (๐‘ฅ) (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ), ๐‘ฆใ€‰๐‘Œ + [๐‘˜ ใ€ˆ๐พ (๐‘ฅ) โˆ’ ๐พ (๐‘ฅ๐‘˜+1), ๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆใ€‰๐‘Œ .

218

15 nonlinear primal-dual proximal splitting

Combining this estimate with (15.22) and rearranging terms, we obtain

๐ท โ‰ฅ [๐‘˜ (๐›พ๐น โˆ’ ๐›พ๐น )โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + [๐‘˜ (๐›พ๐บโˆ— โˆ’ ๐›พ๐บโˆ—)โ€–๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ โ€–2๐‘Œโˆ’ [๐‘˜ ใ€ˆโˆ‡๐พ (๐‘ฅ) (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ), ๐‘ฆใ€‰๐‘Œ + [๐‘˜ ใ€ˆโˆ‡๐พ (๐‘ฅ๐‘˜) (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ), ๐‘ฆ๐‘˜+1ใ€‰๐‘Œ+ [๐‘˜ ใ€ˆ๐พ (๐‘ฅ) โˆ’ ๐พ (๐‘ฅ๐‘˜+1) โˆ’ โˆ‡๐พ (๐‘ฅ๐‘˜) (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜+1), ๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆใ€‰๐‘Œ+ [๐‘˜ ใ€ˆ(โˆ‡๐พ (๐‘ฅ๐‘˜+1) โˆ’ โˆ‡๐พ (๐‘ฅ๐‘˜)) (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ), ๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆใ€‰๐‘Œ .

Further rearrangements and ๐›พ๐น + ๐›พ๐พ โ‰ฅ ๐›พ๐น and ๐›พ๐บโˆ— โ‰ฅ ๐›พ๐บโˆ— give

(15.23) ๐ท โ‰ฅ โˆ’[๐‘˜๐›พ๐พ โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + [๐‘˜ ใ€ˆ[โˆ‡๐พ (๐‘ฅ๐‘˜) โˆ’ โˆ‡๐พ (๐‘ฅ)] (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ), ๐‘ฆใ€‰๐‘Œ+ [๐‘˜ ใ€ˆ๐พ (๐‘ฅ) โˆ’ ๐พ (๐‘ฅ๐‘˜+1) โˆ’ โˆ‡๐พ (๐‘ฅ๐‘˜+1) (๐‘ฅ โˆ’ ๐‘ฅ๐‘˜+1), ๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆใ€‰๐‘Œ+ [๐‘˜ ใ€ˆ๐พ (๐‘ฅ๐‘˜+1) โˆ’ ๐พ (๐‘ฅ๐‘˜+1) + โˆ‡๐พ (๐‘ฅ๐‘˜+1) (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜+1), ๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆใ€‰๐‘Œ+ [๐‘˜ ใ€ˆ(โˆ‡๐พ (๐‘ฅ๐‘˜) โˆ’ โˆ‡๐พ (๐‘ฅ๐‘˜+1)) (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜+1), ๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆใ€‰๐‘Œ .

Using Assumption 15.3 and the mean value theorem in the form

๐พ (๐‘ฅโ€ฒ) = ๐พ (๐‘ฅ) + โˆ‡๐พ (๐‘ฅ) (๐‘ฅโ€ฒ โˆ’ ๐‘ฅ) +โˆซ 1

0(โˆ‡๐พ (๐‘ฅ + ๐‘  (๐‘ฅโ€ฒ โˆ’ ๐‘ฅ)) โˆ’ โˆ‡๐พ (๐‘ฅ)) (๐‘ฅโ€ฒ โˆ’ ๐‘ฅ)๐‘‘๐‘ ,

we obtain for any ๐‘ฅ, ๐‘ฅโ€ฒ โˆˆ X๐พ and ๐‘ฆ โˆˆ dom๐บโˆ— the inequality

(15.24) ใ€ˆ๐พ (๐‘ฅโ€ฒ) โˆ’ ๐พ (๐‘ฅ) โˆ’ โˆ‡๐พ (๐‘ฅ) (๐‘ฅโ€ฒ โˆ’ ๐‘ฅ), ๐‘ฆใ€‰๐‘Œ โ‰ค (๐ฟ/2)โ€–๐‘ฅ โˆ’ ๐‘ฅโ€ฒโ€–2๐‘‹ โ€–๐‘ฆ โ€–๐‘ƒNL .Applying Assumption 15.3, the identity (15.24), and ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜+1 = ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ to the last twoterms of (15.23), we obtain

ใ€ˆ๐พ (๐‘ฅ๐‘˜+1) โˆ’ ๐พ (๐‘ฅ๐‘˜+1) + โˆ‡๐พ (๐‘ฅ๐‘˜+1) (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜+1), ๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆใ€‰๐‘Œ โ‰ฅ โˆ’๐ฟ2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2โ€–๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ โ€–๐‘ƒNL

andใ€ˆ(โˆ‡๐พ (๐‘ฅ๐‘˜) โˆ’ โˆ‡๐พ (๐‘ฅ๐‘˜+1)) (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜+1), ๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆใ€‰๐‘Œ โ‰ฅ โˆ’๐ฟโ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โ€–๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ โ€–๐‘ƒNL .

These estimates together with (15.17) and ๐‘ข๐‘˜+1 โˆˆ U(๐œŒ๐‘ฆ) now imply that ๐ท โ‰ฅ [๐‘˜๐ท๐พ๐‘˜+1 for

๐ท๐พ๐‘˜+1 โ‰” ใ€ˆ[โˆ‡๐พ (๐‘ฅ๐‘˜) โˆ’ โˆ‡๐พ (๐‘ฅ)] (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ), ๐‘ฆใ€‰๐‘Œ โˆ’ ๐›พ๐พ โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ ๐ฟ(1 + 1/2)๐œŒ๐‘ฆ โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹โˆ’ โ€–๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ โ€–๐‘ƒNL โ€–๐พ (๐‘ฅ) โˆ’ ๐พ (๐‘ฅ๐‘˜+1) โˆ’ โˆ‡๐พ (๐‘ฅ๐‘˜+1) (๐‘ฅ โˆ’ ๐‘ฅ๐‘˜+1)โ€–๐‘Œ .

Finally, we use Assumption 15.4 and Youngโ€™s inequality to estimate

๐ท๐พ๐‘˜+1 โ‰ฅ (\ โˆ’ โ€–๐‘ฆ๐‘˜+1 โˆ’ ๐‘ฆ โ€–๐‘ƒNL)โ€–๐พ (๐‘ฅ) โˆ’ ๐พ (๐‘ฅ๐‘˜+1) โˆ’ โˆ‡๐พ (๐‘ฅ๐‘˜+1) (๐‘ฅ โˆ’ ๐‘ฅ๐‘˜+1)โ€–๐‘Œโˆ’ _ + 3๐ฟ๐œŒ๐‘ฆ

2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ .

Now observe that \ โˆ’ โ€–๐‘ฆ๐‘˜+1โˆ’๐‘ฆ โ€–๐‘ƒNL โ‰ฅ \ โˆ’๐œŒ๐‘ฆ โ‰ฅ 0. Combining with the estimate๐ท โ‰ฅ [๐‘˜๐ท๐พ๐‘˜+1,we therefore obtain our claim. ๏ฟฝ

219

15 nonlinear primal-dual proximal splitting

We now have all the necessary tools in hand to prove the main estimate (11.28) needed forthe application of Theorem 11.12.

Theorem 15.8. Let ๐น : ๐‘‹ โ†’ โ„ and ๐บ : ๐‘Œ โ†’ โ„ be convex, proper, and lower semicontinuous.Suppose ๐พ : ๐‘‹ โ†’ ๐‘Œ satises Assumptions 15.3 and 15.4. Fix ๐‘˜ โˆˆ โ„•, and also suppose ๐‘ฅ๐‘˜+1 โˆˆ X๐พand that ๐‘ข๐‘˜ , ๐‘ข๐‘˜+1 โˆˆ U(๐œŒ๐‘ฆ) for some ๐œŒ๐‘ฆ โ‰ฅ 0. Suppose (15.17) holds for some ^ โˆˆ [0, 1) and

(15.25) ๐œ <^

_ + 3๐ฟ๐œŒ๐‘ฆ.

Then there exists ๐›ฟ > 0 such that

(15.26) 12 โ€–๐‘ข

๐‘˜+1 โˆ’ ๐‘ขโ€–2๐‘๐‘˜+2๐‘€๐‘˜+2 +๐›ฟ

2 โ€–๐‘ข๐‘˜+1 โˆ’ ๐‘ข๐‘˜ โ€–2๐‘๐‘˜+1๐‘€๐‘˜+1 โ‰ค

12 โ€–๐‘ข

๐‘˜ โˆ’ ๐‘ขโ€–2๐‘๐‘˜+1๐‘€๐‘˜+1 (๐‘˜ โˆˆ โ„•).

Proof. We show that (15.27) holds withV๐‘˜+1 โ‰ฅ ๐›ฟ2 โ€–๐‘ข๐‘˜+1โˆ’๐‘ข๐‘˜ โ€–2๐‘๐‘˜+1๐‘€๐‘˜+1 for some ๐›ฟ > 0, i.e., that

(15.27) ใ€ˆ๐ป๐‘˜+1(๐‘ข๐‘˜+1), ๐‘ข๐‘˜+1 โˆ’๐‘ขใ€‰๐‘๐‘˜+1 โ‰ฅ๐›ฟ โˆ’ 12 โ€–๐‘ข๐‘˜+1 โˆ’๐‘ข๐‘˜ โ€–2๐‘๐‘˜+1๐‘€๐‘˜+1 +

12 โ€–๐‘ข

๐‘˜+1 โˆ’๐‘ขโ€–2๐‘๐‘˜+2๐‘€๐‘˜+2โˆ’๐‘๐‘˜+1๐‘€๐‘˜+1 .

The claim then follows from Theorem 11.12 and Lemma 15.5, the latter of which providesthe necessary self-adjointness of ๐‘๐‘˜+1๐‘€๐‘˜+1.

Let thus ๐›ฟ โˆˆ (0, ^) be arbitrary, and dene

๐‘†๐‘˜+1 โ‰”((๐›ฟ๐œ‘๐‘˜ โˆ’ [๐‘˜ [_ + 3๐ฟ๐œŒ๐‘ฆ])Id 0

0 ๐œ“๐‘˜(Id โˆ’ ๐œ๐œŽ

1โˆ’๐›ฟโˆ‡๐พ (๐‘ฅ๐‘˜) [โˆ‡๐พ (๐‘ฅ๐‘˜)]โˆ—) ) .

Using (15.19) and (15.18) and, in the second and third step, Lemmas 15.6 and 15.7, we estimate

12 โ€–๐‘ข

๐‘˜+1 โˆ’ ๐‘ข๐‘˜ โ€–2๐‘†๐‘˜+1โˆ’๐‘๐‘˜+1๐‘€๐‘˜+1 โ‰ค โˆ’[๐‘˜ [_ + 3๐ฟ๐œŒ๐‘ฆ]2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ .

โ‰ค ใ€ˆ๐ป๐‘˜+1(๐‘ข๐‘˜+1), ๐‘ข๐‘˜+1 โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1 โˆ’12 โ€–๐‘ข

๐‘˜+1 โˆ’ ๐‘ขโ€–2๐‘๐‘˜+1ฮž๐‘˜+1= ใ€ˆ๐ป๐‘˜+1(๐‘ข๐‘˜+1), ๐‘ข๐‘˜+1 โˆ’ ๐‘ขใ€‰๐‘๐‘˜+1 โˆ’

12 โ€–๐‘ข

๐‘˜+1 โˆ’ ๐‘ขโ€–2๐‘๐‘˜+2๐‘€๐‘˜+2โˆ’๐‘๐‘˜+1๐‘€๐‘˜+1 .

Then (15.27) holds if ๐‘†๐‘˜+1 > ๐›ฟ ยท Id for some ๐›ฟ > 0. Since (15.25) implies ๐œ < ๐›ฟ/(_ + 3๐ฟ๐œŒ๐‘ฆ) forsome ๐›ฟ โˆˆ (0, ^), this follows from (15.17) and (15.25). ๏ฟฝ

Theorem 15.9. Let ๐น : ๐‘‹ โ†’ โ„ and ๐บ : ๐‘Œ โ†’ โ„ be strongly convex, proper, and lowersemicontinuous. Suppose ๐พ : ๐‘‹ โ†’ ๐‘Œ satises Assumptions 15.3 and 15.4. Let ๐‘…๐พ > 0 besuch that sup๐‘ฅโˆˆX๐พ โ€–โˆ‡๐พ (๐‘ฅ)โ€– โ‰ค ๐‘…๐พ . Pick 0 < ๐œ < 1/(_ + 3๐ฟ๐œŒ๐‘ฆ) for a given ๐œŒ๐‘ฆ โ‰ฅ 0, and take๐œŽ = ๐œ๐›พ๐น/๐›พ๐บโˆ— for some ๐›พ๐น โˆˆ (0, ๐›พ๐น + ๐›พ๐พ ] and ๐›พ๐บโˆ— โˆˆ (0, ๐›พ๐บ ]. Let the iterates {(๐‘ข๐‘˜ , ๐‘ฅ๐‘˜+1)}๐‘˜โˆˆโ„• begenerated by the NL-PDPS method (15.11). If ๐‘ฅ๐‘˜+1 โˆˆ X๐พ and ๐‘ข๐‘˜ โˆˆ U(๐œŒ๐‘ฆ) for all ๐‘˜ โˆˆ โ„• andsome ๐‘ข โˆˆ ๐ปโˆ’1(0) for ๐ป given in (15.10), then ๐‘ข๐‘˜ โ†’ ๐‘ข linearly.

220

15 nonlinear primal-dual proximal splitting

Proof. Take ๐œ‘๐‘˜+1 โ‰” ๐œ‘๐‘˜ (1 + 2๐›พ๐น๐œ) and๐œ“๐‘˜+1 โ‰” ๐œ“๐‘˜ (1 + 2๐›พ๐บโˆ—๐œŽ) for ๐œ‘0 = 1 and๐œ“1 โ‰” ๐œ/๐œŽ . Then๐œ‘๐‘˜๐œ = ๐œ“๐‘˜+1๐œŽ if and only if 1 + 2๐›พ๐น๐œ = 1 + 2๐›พ๐บโˆ—๐œŽ , i.e., for ๐œŽ = ๐œ๐›พ๐น/๐›พ๐บโˆ— . Consequently (15.17) issatised and the testing parameters ๐œ‘๐‘˜ and๐œ“๐‘˜+1 grow exponentially. Clearly (15.25) holdsfor some ^ โˆˆ [0, 1). Combining (15.26) from Theorem 15.8 with Lemma 15.5 now shows theclaimed linear convergence. ๏ฟฝ

Besides step length bounds and structural properties of the problem, Theorem 15.9 stillrequires us to ensure that the iterates stay close enough to the critical point ๐‘ฅ . This canbe done if we initialize close enough to a critical point. As the proof is very technical, wemerely state the following result.

Theorem 15.10 ([Clason, Mazurenko & Valkonen 2019, Proposition 4.8]). Under the assump-tions of Theorem 15.9, for any ๐œŒ๐‘ฆ > 0 there exists an Y > 0 such that {๐‘ข๐‘˜}๐‘˜โˆˆโ„• โŠ‚ U(๐œŒ๐‘ฆ) forall initial iterates ๐‘ข0 = (๐‘ฅ0, ๐‘ฆ0) satisfying

(15.28)โˆš2๐›ฟโˆ’1(โ€–๐‘ฅ0 โˆ’ ๐‘ฅ โ€–2 + ๐œ๐œŽโˆ’1โ€–๐‘ฆ0 โˆ’ ๐‘ฆ โ€–2) โ‰ค Y.

Remark 15.11 (weaker assumptions, weaker convergence). We have only demonstrated linear conver-gence of the method under the strong convexity of both ๐น and ๐บโˆ—. However, under similarly lesserassumptions as for the basic PDPS method familiar from Part II, both an accelerated ๐‘‚ (1/๐‘ 2) rateand weak convergence can be proved. We refer to [Clason, Mazurenko & Valkonen 2019] for details,noting that Opialโ€™s Lemma 9.1 extends straightforwardly to the quantitative Fejรฉr monotonicity(10.21) that is the basis of our proofs here. We also note that our linear convergence result diersfrom that in [Clason, Mazurenko & Valkonen 2019] by taking the over-relaxation parameter ๐œ” = 1in (15.11) instead of ๐œ” = 1/(1 + 2๐›พ๐น๐œ) < 1; compare Theorem 10.8.

Remark 15.12 (historical development of the NL-PDPS). The NL-PDPS method was rst introducedin [Valkonen 2014] in nite dimensions with applications to inverse problems in magnetic resonanceimaging. The method was later extended in [Clason & Valkonen 2017a] to innite dimensions andapplied to PDE-constrained optimization problems. In these works, only (weak) convergence of theiterates is shown, based on the metric regularity of the operator ๐ป . We discuss metric regularitylater in Chapter 27. Convergence rates were then rst shown in [Clason, Mazurenko & Valkonen2019]. In that paper, alternative forms of the three-point condition Assumption 15.4 on ๐พ are alsodiscussed.

Similarly to how we showed in Section 8.7 that the preconditioned ADMM is equivalent to thePDPS method, it is possible to derive a preconditioned nonlinear ADMM that is equivalent to theNL-PDPS method; such algorithms are considered in [Benning et al. 2016]. The NL-PDPS methodhas been extended in [Clason, Mazurenko & Valkonen 2020] by replacing ใ€ˆ๐พ (๐‘ฅ), ๐‘ฆใ€‰๐‘Œ by a generalsaddle term ๐พ (๐‘ฅ, ๐‘ฆ), which can be applied to nonconvex optimization problems such as โ„“0-TVdenoising or elliptic Nash equilibrium problems. Block-adapted and stochastic variants in the spiritof Remark 11.17 can be found in [Mazurenko, Jauhiainen & Valkonen 2020]. Finally, a simpliedapproach using the Bregman divergences of Section 11.1 is presented in [Valkonen 2020a].

221

16 LIMITING SUBDIFFERENTIALS

While the Clarke subdierential is a suitable concept for nonsmooth but convex or non-convex but smooth functionals, it has severe drawbacks for nonsmooth and nonconvexfunctionals: As shown in Corollary 13.11, its Fermat principle cannot distinguish minimizersfrom maximizers. The reason is that the Clarke subdierential is always convex, whichis a direct consequence of its construction (13.2) via polarity with respect to (generalized)directional derivatives. To obtain sharper results for such functionals, it is therefore nec-essary to construct nonconvex subdierentials directly via a primal limiting process. Onthe other hand, deriving calculus rules for the previous subdierentials crucially exploitedtheir convexity by applying Hahnโ€“Banach separation theorems, and calculus rules fornonconvex subdierentials are thus signicantly more dicult to obtain. As in Chapter 13,we will assume throughout this chapter that๐‘‹ is a Banach space unless stated otherwise.

16.1 bouligand subdifferentials

The rst denition is motivated by Theorem 13.26: We dene a subdierential as a suitablelimit of classical derivatives (without convexication). For ๐น : ๐‘‹ โ†’ โ„, we rst dene theset of Gรขteaux points

๐บ๐น โ‰” {๐‘ฅ โˆˆ ๐‘‹ | ๐น is Gรขteaux dierentiable at ๐‘ฅ} โŠ‚ dom ๐น

and then the Bouligand subdierential of ๐น at ๐‘ฅ as

(16.1) ๐œ•๐ต๐น (๐‘ฅ) โ‰” {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ๐ท๐น (๐‘ฅ๐‘›) โˆ—โ‡€ ๐‘ฅโˆ— for some ๐บ๐น 3 ๐‘ฅ๐‘› โ†’ ๐‘ฅ} .

For ๐น : โ„๐‘ โ†’ โ„ locally Lipschitz, it follows from Theorem 13.26 that ๐œ•๐ถ๐น (๐‘ฅ) = co ๐œ•๐ต๐น (๐‘ฅ).However, unless ๐‘‹ is nite-dimensional, it is not clear a priori that the Bouligand subdif-ferential is nonempty even for ๐‘ฅ โˆˆ dom ๐น .1 Furthermore, the subdierential does not admita satisfactory calculus; not even a Fermat principle holds.

1Although in special cases it is possible to give a full characterization in Hilbert spaces; see, e.g., [Christof,Clason, et al. 2018].

222

16 limiting subdifferentials

Example 16.1. Let ๐น : โ„ โ†’ โ„, ๐น (๐‘ฅ) โ‰” |๐‘ฅ |. Then ๐น is dierentiable at every ๐‘ฅ โ‰  0 with๐น โ€ฒ(๐‘ฅ) = sign(๐‘ฅ). Correspondingly,

0 โˆ‰ {โˆ’1, 1} = ๐œ•๐ต๐น (0).

To make this approach work therefore requires a more delicate limiting process. Theremainder of this chapter is devoted to one such approach, where we only give an overviewand state important results following [Mordukhovich 2006]. The full theory is based on ageometric construction similar to Lemma 4.10 making use of tangent and normal cones(corresponding to generalized directional derivatives and subgradients, respectively) thatalso allows for dierentiation of set-valued mappings. We will develop this theory inChapters 18 to 21. For an alternative, more axiomatic, approach to generalized derivativesof nonconvex functionals, we refer to [Penot 2013; Ioe 2017].

16.2 frรฉchet subdifferentials

We begin with the following limiting construction, which combines the characterizationsof both the Frรฉchet derivative and the convex subdierential. Let ๐‘‹ be a Banach space and๐น : ๐‘‹ โ†’ โ„. The Frรฉchet subdierential (or regular subdierential or presubdierential) of ๐นat ๐‘ฅ is then dened as2

(16.2) ๐œ•๐น๐น (๐‘ฅ) โ‰”{๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ lim inf๐‘ฆโ†’๐‘ฅ

๐น (๐‘ฆ) โˆ’ ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ฅ 0

}.

Note how this โ€œlocalizesโ€ the denition of the convex subdierential around the point ofinterest: the numerator does not need to be nonnegative for all ๐‘ฆ ; it suces if this holdsfor any ๐‘ฆ suciently close to ๐‘ฅ . By a similar argument as for Theorem 4.2, we thus obtaina Fermat principle for local minimizers.

Theorem 16.2. Let ๐น : ๐‘‹ โ†’ โ„ be proper and๐‘ฅ โˆˆ dom ๐น be a localminimizer. Then 0 โˆˆ ๐œ•๐น๐น (๐‘ฅ).

Proof. Let๐‘ฅ โˆˆ dom ๐น be a local minimizer. Then there exists an Y > 0 such that ๐น (๐‘ฅ) โ‰ค ๐น (๐‘ฆ)for all ๐‘ฆ โˆˆ ๐•†(๐‘ฅ, Y), which is equivalent to

๐น (๐‘ฆ) โˆ’ ๐น (๐‘ฅ) โˆ’ ใ€ˆ0, ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ฅ 0 for all ๐‘ฆ โˆˆ ๐•†(๐‘ฅ, Y).

Now for any strongly convergent sequence ๐‘ฆ๐‘› โ†’ ๐‘ฅ , we have that ๐‘ฆ๐‘› โˆˆ ๐•†(๐‘ฅ, Y) for ๐‘› largeenough. Taking the lim inf in the above inequality thus yields 0 โˆˆ ๐œ•๐น (๐‘ฅ). ๏ฟฝ

2The equivalence of (16.2) with the usual denition based on corresponding normal cones follows from,e.g., [Mordukhovich 2006, Theorem 1.86].

223

16 limiting subdifferentials

For convex functions, of course, the numerator is always nonnegative by denition, andthe Frรฉchet subdierential reduces to the convex subdierential.

Theorem 16.3. Let ๐น : ๐‘‹ โ†’ โ„ be proper, convex, and lower semicontinuous and ๐‘ฅ โˆˆ dom ๐น .Then ๐œ•๐น๐น (๐‘ฅ) = ๐œ•๐น (๐‘ฅ).

Proof. By denition of the convex subdierential, any ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ) satises

๐น (๐‘ฆ) โˆ’ ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ 0 for all ๐‘ฆ โˆˆ ๐‘‹ .

Dividing by โ€–๐‘ฅโˆ’๐‘ฆ โ€–๐‘‹ > 0 for ๐‘ฆ โ‰  ๐‘ฅ and taking the lim inf as ๐‘ฆ โ†’ ๐‘ฅ thus yields ๐‘ฅโˆ— โˆˆ ๐œ•๐น๐น (๐‘ฅ).Conversely, let ๐‘ฅโˆ— โˆˆ ๐œ•๐น๐น (๐‘ฅ). Let โ„Ž โˆˆ ๐‘‹ \ {0} be arbitrary. Then there exists an Y > 0 suchthat

๐น (๐‘ฅ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘กโ„Žใ€‰๐‘‹๐‘ก โ€–โ„Žโ€–๐‘‹ โ‰ฅ 0 for all ๐‘ก โˆˆ (0, Y) .

Multiplying by โ€–โ„Žโ€–๐‘‹ > 0 and letting ๐‘ก โ†’ 0, we obtain from Lemma 4.3 that

(16.3) ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ โ‰ค ๐น (๐‘ฅ + ๐‘กโ„Ž) โˆ’ ๐น (๐‘ฅ)๐‘ก

โ†’ ๐น โ€ฒ(๐‘ฅ ;โ„Ž).

By Lemma 4.4, this implies that ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ). ๏ฟฝ

Similarly, for Frรฉchet dierentiable functionals, the limit in (16.2) is zero for all sequences.

Theorem 16.4. Let ๐น : ๐‘‹ โ†’ โ„ be Frรฉchet dierentiable at ๐‘ฅ โˆˆ ๐‘‹ . Then ๐œ•๐น๐น (๐‘ฅ) = {๐น โ€ฒ(๐‘ฅ)}.

Proof. The denition of the Frรฉchet derivative immediately yields

lim๐‘ฆโ†’๐‘ฅ

๐น (๐‘ฆ) โˆ’ ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐น โ€ฒ(๐‘ฅ), ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–๐‘‹ = lim

โ€–โ„Žโ€–๐‘‹โ†’0

๐น (๐‘ฅ + โ„Ž) โˆ’ ๐น (๐‘ฅ) โˆ’ ๐น โ€ฒ(๐‘ฅ)โ„Žโ€–โ„Žโ€–๐‘‹ = 0

and hence ๐น โ€ฒ(๐‘ฅ) โˆˆ ๐œ•๐น๐น (๐‘ฅ).Conversely, let ๐‘ฅโˆ— โˆˆ ๐œ•๐น๐น (๐‘ฅ) and let again โ„Ž โˆˆ ๐‘‹ \ {0} be arbitrary. As in the proof ofTheorem 16.3, we then obtain that

(16.4) ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ โ‰ค ๐น โ€ฒ(๐‘ฅ ;โ„Ž) = ใ€ˆ๐น โ€ฒ(๐‘ฅ), โ„Žใ€‰๐‘‹ .

Applying the same argument to โˆ’โ„Ž then yields ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ = ใ€ˆ๐น โ€ฒ(๐‘ฅ), โ„Žใ€‰๐‘‹ for all โ„Ž โˆˆ ๐‘‹ , i.e.,๐‘ฅโˆ— = ๐น โ€ฒ(๐‘ฅ). ๏ฟฝ

For nonsmooth nonconvex functions the Frรฉchet subdierential can be strictly smallerthan the Clarke subdierential.

224

16 limiting subdifferentials

Example 16.5. Consider ๐น : โ„ โ†’ โ„, ๐น (๐‘ฅ) โ‰” โˆ’|๐‘ฅ |. For any ๐‘ฅ โ‰  0, it follows fromTheorem 16.4 that ๐œ•๐น๐น (๐‘ฅ) = {โˆ’ sign๐‘ฅ}. But for ๐‘ฅ = 0 and arbitrary ๐‘ฅโˆ— โˆˆ โ„, we havethat

lim inf๐‘ฆโ†’0

๐น (๐‘ฆ) โˆ’ ๐น (0) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฆ โˆ’ 0ใ€‰|๐‘ฆ โˆ’ 0| = lim inf

๐‘ฆโ†’0(โˆ’1 โˆ’ ๐‘ฅโˆ— ยท sign(๐‘ฆ)) = โˆ’1 โˆ’ |๐‘ฅโˆ— | < 0

and hence that๐œ•๐น๐น (0) = โˆ… ( [โˆ’1, 1] = ๐œ•๐ถ๐น (0).

Note that 0 โˆˆ dom ๐น in this example. Although the Frรฉchet subdierential does not pickup a maximizer in contrast to the Clarke subdierential, the fact that ๐œ•๐น๐น (๐‘ฅ) can be emptyeven for ๐‘ฅ โˆˆ dom ๐น is a problem when trying to derive calculus rules that hold with equality.In fact, as Example 16.5 shows, the Frรฉchet subdierential fails to be outer semicontinuous,which is also not desirable. This leads to the next and nal denition.

16.3 mordukhovich subdifferentials

Let ๐‘‹ be a reexive Banach space and ๐น : ๐‘‹ โ†’ โ„. The Mordukhovich subdierential (orbasic subdierential or limiting subdierential) of ๐น at ๐‘ฅ โˆˆ dom ๐น is then dened as thestrong-to-weakโˆ— outer closure of ๐œ•๐น๐น (๐‘ฅ), i.e.,3

(16.5) ๐œ•๐‘€๐น (๐‘ฅ) โ‰” w-โˆ—-lim sup๐‘ฆโ†’๐‘ฅ

๐œ•๐น๐น (๐‘ฆ)

={๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— ๏ฟฝ๏ฟฝ ๐‘ฅโˆ—๐‘› โˆ—โ‡€ ๐‘ฅโˆ— for some ๐‘ฅโˆ—๐‘› โˆˆ ๐œ•๐น๐น (๐‘ฅ๐‘›) with ๐‘ฅ๐‘› โ†’ ๐‘ฅ

},

which can be seen as a generalization of the denition (16.1) of the Bouligand subdierential.Note that in contrast to (16.1), this denition includes the constant sequence ๐‘ฅโˆ—๐‘› โ‰ก ๐‘ฅโˆ— evenat nondierentiable points, which makes this a more useful concept in general. This alsoimplies that ๐œ•๐น๐น (๐‘ฅ) โŠ‚ ๐œ•๐‘€๐น (๐‘ฅ) for any ๐น , and Theorem 16.2 immediately yields a Fermatprinciple.

Corollary 16.6. Let ๐น : ๐‘‹ โ†’ โ„ be proper and ๐‘ฅ โˆˆ dom ๐น be a local minimizer. Then0 โˆˆ ๐œ•๐‘€๐น (๐‘ฅ).

As for the Frรฉchet subdierential, maximizers do not satisfy the Fermat principle.

Example 16.7. Consider again ๐น : โ„ โ†’ โ„, ๐น (๐‘ฅ) โ‰” โˆ’|๐‘ฅ |. Using Example 16.5, we directly

3The equivalence of this denition with the original geometric denition โ€“ which holds in reexive Banachspaces โ€“ follows from [Mordukhovich 2006, Theorem 2.34].

225

16 limiting subdifferentials

obtain from (16.5) that ๐œ•๐‘€๐น (0) = {โˆ’1, 1} = ๐œ•๐ต๐น (0).

Since the convex subdierential is strong-to-weakโˆ— outer semicontinuous, theMordukhovichsubdierential reduces to the convex subdierential as well.

Theorem 16.8. Let ๐‘‹ be a reexive Banach space, ๐น : ๐‘‹ โ†’ โ„ be proper, convex, and lowersemicontinuous, and ๐‘ฅ โˆˆ dom ๐น . Then ๐œ•๐‘€๐น (๐‘ฅ) = ๐œ•๐น (๐‘ฅ).

Proof. From Theorem 16.3, it follows that ๐œ•๐น (๐‘ฅ) = ๐œ•๐น๐น (๐‘ฅ) โŠ‚ ๐œ•๐‘€๐น (๐‘ฅ). Let therefore ๐‘ฅโˆ— โˆˆ๐œ•๐‘€๐น (๐‘ฅ) be arbitrary. Then by denition there exists a sequence {๐‘ฅโˆ—๐‘›}๐‘›โˆˆโ„• โŠ‚ with ๐‘ฅโˆ—๐‘› โˆ—โ‡€ ๐‘ฅโˆ—

and ๐‘ฅโˆ—๐‘› โˆˆ ๐œ•๐น๐น (๐‘ฅ๐‘›) = ๐œ•๐น (๐‘ฅ๐‘›) for ๐‘ฅ๐‘› โ†’ ๐‘ฅ . From Theorem 6.11 and Lemma 6.8, it then followsthat ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ) as well. ๏ฟฝ

A similar result holds for continuously dierentiable functionals.

Theorem 16.9. Let๐‘‹ be a reexive Banach space and ๐น : ๐‘‹ โ†’ โ„ be continuously dierentiableat ๐‘ฅ โˆˆ ๐‘‹ . Then ๐œ•๐‘€๐น (๐‘ฅ) = {๐น โ€ฒ(๐‘ฅ)}.

Proof. From Theorem 16.3, it follows that {๐น โ€ฒ(๐‘ฅ)} = ๐œ•๐น๐น (๐‘ฅ) โŠ‚ ๐œ•๐‘€๐น (๐‘ฅ). Let therefore๐‘ฅโˆ— โˆˆ ๐œ•๐‘€๐น (๐‘ฅ) be arbitrary. Then by denition there exists a sequence {๐‘ฅโˆ—๐‘›}๐‘›โˆˆโ„• โŠ‚ with๐‘ฅโˆ—๐‘›

โˆ—โ‡€ ๐‘ฅโˆ— and ๐‘ฅโˆ—๐‘› โˆˆ ๐œ•๐น๐น (๐‘ฅ๐‘›) = {๐น โ€ฒ(๐‘ฅ๐‘›)} for ๐‘ฅ๐‘› โ†’ ๐‘ฅ . The continuity of ๐น โ€ฒ then immediatelyimplies that ๐น โ€ฒ(๐‘ฅ๐‘›) โ†’ ๐น (๐‘ฅ), and since strong limits are also weak-โˆ— limits, we obtain๐‘ฅโˆ— = ๐น โ€ฒ(๐‘ฅ). ๏ฟฝ

The same function as in Example 13.6 shows that this equality does not hold if ๐น is merelyFrรฉchet dierentiable.

We also have the following relation to Clarke subdierentials, which should be comparedto Theorem 13.26. We will give a proof in a more restricted setting in Chapter 20, cf. Corol-lary 20.21.

Theorem 16.10 ([Mordukhovich 2006, Theorem 3.57]). Let ๐‘‹ be a reexive Banach spaceand ๐น : ๐‘‹ โ†’ โ„ be locally Lipschitz continuous around ๐‘ฅ โˆˆ ๐‘‹ . Then ๐œ•๐ถ๐น (๐‘ฅ) = clโˆ— co ๐œ•๐‘€๐น (๐‘ฅ),where clโˆ—๐ด stands for the weak-โˆ— closure of the set ๐ด โŠ‚ ๐‘‹ โˆ—.4

The following example illustrates that the Mordukhovich subdierential can be noncon-vex.

4Of course, in reexive Banach spaces the weak-โˆ— closure coincides with the weak closure. The statementholds more general in so-called Asplund spaces which include some nonreexive Banach spaces.

226

16 limiting subdifferentials

Example 16.11. Let ๐น : โ„2 โ†’ โ„, ๐น (๐‘ฅ1, ๐‘ฅ2) = |๐‘ฅ1 | โˆ’ |๐‘ฅ2 |. Since ๐น is continuously dieren-tiable for any (๐‘ฅ1, ๐‘ฅ2) where ๐‘ฅ1, ๐‘ฅ2 โ‰  0 with

โˆ‡๐น (๐‘ฅ1, ๐‘ฅ2) โˆˆ {(1, 1), (โˆ’1, 1), (1,โˆ’1), (โˆ’1,โˆ’1)},

we obtain from (16.2) that

๐œ•๐น๐น (๐‘ฅ1, ๐‘ฅ2) =

{(1,โˆ’1)} if ๐‘ฅ1 > 0, ๐‘ฅ2 > 0,{(โˆ’1,โˆ’1)} if ๐‘ฅ1 < 0, ๐‘ฅ2 > 0,{(โˆ’1, 1)} if ๐‘ฅ1 < 0, ๐‘ฅ2 < 0,{(โˆ’1,โˆ’1)} if ๐‘ฅ1 > 0, ๐‘ฅ2 < 0,{(๐‘ก,โˆ’1) | ๐‘ก โˆˆ [โˆ’1, 1]} if ๐‘ฅ1 = 0, ๐‘ฅ2 > 0,{(๐‘ก, 1) | ๐‘ก โˆˆ [โˆ’1, 1]} if ๐‘ฅ1 = 0, ๐‘ฅ2 < 0,โˆ… if ๐‘ฅ2 = 0.

In particular, ๐œ•๐น๐น (0, 0) = โˆ…. However, from (16.5) it follows that

๐œ•๐‘€๐น (0, 0) = {(๐‘ก,โˆ’1) | ๐‘ก โˆˆ [โˆ’1, 1]} โˆช {(๐‘ก, 1) | ๐‘ก โˆˆ [โˆ’1, 1]} .

In particular, 0 โˆ‰ ๐œ•๐‘€๐น (0, 0). On the other hand, Theorem 16.10 then yields that

(16.6) ๐œ•๐ถ๐น (0, 0) = {(๐‘ก, ๐‘ ) | ๐‘ก, ๐‘  โˆˆ [โˆ’1, 1]} = [โˆ’1, 1]2

and hence 0 โˆˆ ๐œ•๐ถ๐น (0, 0). (Note that ๐น admits neither a minimum nor a maximum onโ„2, while (0, 0) is a nonsmooth saddle-point.)

In contrast to the Bouligand subdierential, the Mordukhovich subdierential admits asatisfying calculus, although the assumptions are understandably more restrictive than inthe convex setting. The rst rule follows as always straight from the denition.

Theorem 16.12. Let ๐‘‹ be a reexive Banach space and ๐น : ๐‘‹ โ†’ โ„. Then for any _ โ‰ฅ 0 and๐‘ฅ โˆˆ ๐‘‹ ,

๐œ•๐‘€ (_๐น ) (๐‘ฅ) = _๐œ•๐‘€๐น (๐‘ฅ).

Full calculus in innite-dimensional spaces holds only for a rather small class ofmappings.

Theorem 16.13 ([Mordukhovich 2006, Proposition 1.107]). Let ๐‘‹ be a reexive Banach space,๐น : ๐‘‹ โ†’ โ„ be continuously dierentiable, and ๐บ : ๐‘‹ โ†’ โ„ be arbitrary. Then for any๐‘ฅ โˆˆ dom๐บ ,

๐œ•๐‘€ (๐น +๐บ) (๐‘ฅ) = {๐น โ€ฒ(๐‘ฅ)} + ๐œ•๐‘€๐บ (๐‘ฅ).

227

16 limiting subdifferentials

While the previous two theorems also hold for the Frรฉchet subdierential (the latter even formerely Frรฉchet dierentiable ๐น ), the following chain rule is only valid for the Mordukhovichsubdierential. Compared to Theorem 13.23, it also allows for the outer functional to beextended-real valued.

Theorem 16.14 ([Mordukhovich 2006, Proposition 1.112]). Let ๐‘‹ be a reexive Banach space,๐น : ๐‘‹ โ†’ ๐‘Œ be continuously dierentiable, and ๐บ : ๐‘Œ โ†’ โ„ be arbitrary. Then for any ๐‘ฅ โˆˆ ๐‘‹with ๐น (๐‘ฅ) โˆˆ dom๐บ and ๐น โ€ฒ(๐‘ฅ) : ๐‘‹ โ†’ ๐‘Œ surjective,

๐œ•๐‘€ (๐บ โ—ฆ ๐น ) (๐‘ฅ) = ๐น โ€ฒ(๐‘ฅ)โˆ—๐œ•๐‘€๐บ (๐น (๐‘ฅ)) .

More general calculus rules require ๐‘‹ to be a reexive Banach5 space as well as additional,nontrivial, assumptions on ๐น and ๐บ ; see, e.g., [Mordukhovich 2006, Theorem 3.36 andTheorem 3.41].

Wewill illustrate how to prove the above calculus results andmore in Section 20.4 and Chap-ter 25, after studying the dierentiation of set-valued mappings.

5or Asplund

228

17 Y-SUBDIFFERENTIALS AND APPROXIMATE FERMAT

PRINCIPLES

We now study an approximate variant of the Frรฉchet subdierential of Section 16.2 aswell as related approximate Fermat principles; these will be needed in Chapter 18 to studylimiting tangent and normal cones.

17.1 Y-subdifferentials

Just like the Y-minimizers in Section 2.4, it can be useful to consider โ€œrelaxedโ€ Y-subdierenti-als. In particular, it is possible to derive exact calculus rules for these relaxed subdierentials,which can lead to tighter results than inclusions for the corresponding exact subdierentials(in particular, for the Frรฉchet subdierential). We will make use of this in Chapter 27.

Similarly to the Frรฉchet subdierential (16.2), we thus dene for ๐น : ๐‘‹ โ†’ โ„ the Y(-Frรฉchet)-subdierential by

(17.1) ๐œ•Y๐น (๐‘ฅ) โ‰”{๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ lim inf๐‘ฆโ†’๐‘ฅ

๐น (๐‘ฆ) โˆ’ ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ฅ โˆ’Y

},

where ๐œ•0๐น = ๐œ•๐น๐น . The following lemma provides further insight into the Y-subdierential.

Lemma 17.1. Let ๐น : ๐‘‹ โ†’ โ„ on a Banach space ๐‘‹ , and Y โ‰ฅ 0. Then the following areequivalent:

(i) ๐‘ฅโˆ— โˆˆ ๐œ•Y๐น (๐‘ฅ);(ii) ๐‘ฅโˆ— โˆˆ ๐œ•๐น [๐น + Yโ€– ยท โˆ’ ๐‘ฅ โ€–๐‘‹ ] (๐‘ฅ);(iii) 0 โˆˆ ๐œ•๐น [๐น + Yโ€– ยท โˆ’ ๐‘ฅ โ€–๐‘‹ โˆ’ ใ€ˆ๐‘ฅโˆ—, ยท โˆ’ ๐‘ฅใ€‰] (๐‘ฅ).

Proof. For each of the alternatives, (17.1) is equivalent to

lim inf๐‘ฆโ†’๐‘ฅ

Yโ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–๐‘‹ + ๐น (๐‘ฆ) โˆ’ ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ฅ 0. ๏ฟฝ

229

17 Y-subdifferentials and approximate fermat principles

We have the following โ€œfuzzyโ€ Y-sum rule.

Lemma 17.2. Let ๐‘‹ be a Banach space, ๐บ : ๐‘‹ โ†’ โ„, and ๐น : ๐‘‹ โ†’ โ„ be convex with๐œ•๐น (๐‘ฅ) โŠ‚ ๐”น(๐‘ฅโˆ—, Y) for some Y โ‰ฅ 0 and ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—. Then for all ๐›ฟ โ‰ฅ 0,

๐œ•๐›ฟ๐บ (๐‘ฅ) + ๐œ•๐น (๐‘ฅ) โŠ‚ ๐œ•๐›ฟ [๐บ + ๐น ] (๐‘ฅ) โŠ‚ ๐œ•Y+๐›ฟ๐บ (๐‘ฅ) + {๐‘ฅโˆ—}.In particular, if ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ), then

๐œ•๐›ฟ๐บ (๐‘ฅ) + ๐œ•๐น (๐‘ฅ) โŠ‚ ๐œ•๐›ฟ [๐บ + ๐น ] (๐‘ฅ) โŠ‚ ๐œ•Y+๐›ฟ๐บ (๐‘ฅ) + ๐œ•๐น (๐‘ฅ).

Proof. We start with the rst inclusion. Let ๐‘ฅโˆ— โˆˆ ๐œ•๐น (๐‘ฅ) and ๐‘ฅโˆ— โˆˆ ๐œ•๐›ฟ๐บ (๐‘ฅ). Then the deni-tions (4.1) and (17.1), respectively, imply that

lim inf๐‘ฆโ†’๐‘ฅ

๐บ (๐‘ฆ) โˆ’๐บ (๐‘ฅ) + ๐น (๐‘ฆ) โˆ’ ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ— + ๐‘ฅโˆ—, ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–๐‘‹

โ‰ฅ lim inf๐‘ฆโ†’๐‘ฅ

๐บ (๐‘ฆ) โˆ’๐บ (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ฅ โˆ’๐›ฟ,

i.e., ๐‘ฅโˆ— + ๐‘ฅโˆ— โˆˆ ๐œ•๐›ฟ [๐บ + ๐น ] (๐‘ฅ).To prove the second inclusion, let ๐‘ฅโˆ— โˆˆ ๐œ•๐›ฟ [๐บ + ๐น ] (๐‘ฅ) and โ„Ž โˆˆ ๐‘‹ with โ€–โ„Žโ€–๐‘‹ = 1. Then (17.1)implies that for all ๐‘ก๐‘›โ†’ 0 and โ„Ž๐‘› โ†’ โ„Ž,

(17.2) lim inf๐‘›โ†’โˆž

๐น (๐‘ฅ + ๐‘ก๐‘›โ„Ž๐‘›) โˆ’ ๐น (๐‘ฅ) +๐บ (๐‘ฅ + ๐‘ก๐‘›โ„Ž๐‘›) โˆ’๐บ (๐‘ฅ) โˆ’ ๐‘ก๐‘›ใ€ˆ๐‘ฅโˆ—, โ„Ž๐‘›ใ€‰๐‘‹๐‘ก๐‘›

โ‰ฅ โˆ’๐›ฟ.

Since ๐น is directionally dierentiable by Lemma 4.3 and locally Lipschitz around ๐‘ฅ โˆˆint(dom ๐น ) = ๐‘‹ by Theorem 3.13 with Lipschitz constant ๐ฟ > 0, we have

lim๐‘›โ†’โˆž

๐น (๐‘ฅ + ๐‘ก๐‘›โ„Ž๐‘›) โˆ’ ๐น (๐‘ฅ)๐‘ก๐‘›

โ‰ค lim๐‘›โ†’โˆž

(๐น (๐‘ฅ + ๐‘ก๐‘›โ„Ž) โˆ’ ๐น (๐‘ฅ)

๐‘ก๐‘›+ ๐ฟโ€–โ„Ž๐‘› โˆ’ โ„Žโ€–๐‘‹

)= ๐น โ€ฒ(๐‘ฅ ;โ„Ž).

Let now ๐œŒ > 0 be arbitrary. Then by Lemma 4.4, Theorem 13.8, and Corollary 13.15 thereexists an ๐‘ฅโˆ—

โ„Ž,๐œŒโˆˆ ๐œ•๐น (๐‘ฅ) such that ๐น โ€ฒ(๐‘ฅ ;โ„Ž) โ‰ค ใ€ˆ๐‘ฅโˆ—

โ„Ž,๐œŒ, โ„Žใ€‰๐‘‹ + ๐œŒ . Therefore

lim๐‘›โ†’โˆž

๐น (๐‘ฅ + ๐‘ก๐‘›โ„Ž๐‘›) โˆ’ ๐น (๐‘ฅ) โˆ’ ๐‘ก๐‘›ใ€ˆ๐‘ฅโˆ—, โ„Ž๐‘›ใ€‰๐‘‹๐‘ก๐‘›

โ‰ค ๐น โ€ฒ(๐‘ฅ ;โ„Ž) โˆ’ ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹โ‰ค ใ€ˆ๐‘ฅโˆ—โ„Ž,๐œŒ โˆ’ ๐‘ฅโˆ—, โ„Žใ€‰๐‘‹ + ๐œŒโ‰ค Y + ๐œŒ,

where we have used that ๐œ•๐น (๐‘ฅ) โŠ‚ ๐”น(๐‘ฅโˆ—, Y) and โ€–โ„Žโ€–๐‘‹ = 1 in the last inequality. Since ๐œŒ > 0was arbitrary, the characterization (17.2) now implies

lim inf๐‘›โ†’โˆž

๐บ (๐‘ฅ + ๐‘ก๐‘›โ„Ž๐‘›) โˆ’๐บ (๐‘ฅ) โˆ’ ๐‘ก๐‘›ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—, โ„Ž๐‘›ใ€‰๐‘‹๐‘ก๐‘›

โ‰ฅ โˆ’(๐›ฟ + Y).

Since ๐‘ฆ๐‘› โ‰” ๐‘ฅ + ๐‘ก๐‘›โ„Ž๐‘› โ†’ ๐‘ฅ was arbitrary, this proves ๐‘ฅโˆ— โˆ’๐‘ฅโˆ— โˆˆ ๐œ•Y+๐›ฟ๐บ (๐‘ฅ), i.e., ๐œ•๐›ฟ [๐บ + ๐น ] (๐‘ฅ) โŠ‚๐œ•Y+๐›ฟ๐บ (๐‘ฅ) + {๐‘ฅโˆ—}. ๏ฟฝ

230

17 Y-subdifferentials and approximate fermat principles

The following is now immediate from Theorem 4.5, since we are allowed to take Y = 0 if๐œ•๐น (๐‘ฅ) is a singleton.

Corollary 17.3. Let ๐‘‹ be a Banach space,๐บ : ๐‘‹ โ†’ โ„, and ๐น : ๐‘‹ โ†’ โ„ be convex and Gรขteauxdierentiable at ๐‘ฅ โˆˆ ๐‘‹ . Then for every ๐›ฟ โ‰ฅ 0,

๐œ•๐›ฟ [๐บ + ๐น ] (๐‘ฅ) = ๐œ•๐›ฟ๐บ (๐‘ฅ) + {๐ท๐น (๐‘ฅ)}.In particular,

๐œ•๐น [๐บ + ๐น ] (๐‘ฅ) = ๐œ•๐น๐บ (๐‘ฅ) + {๐ท๐น (๐‘ฅ)}.

17.2 smooth spaces

For the remaining results in this chapter, we need additional assumptions on the normedvector space ๐‘‹ . In particular, we need to assume that the norm is Gรขteaux dierentiable on๐‘‹ \ {0}; we call such spaces Gรขteaux smooth.

Recalling from Chapter 7 the duality between dierentiability and convexity, it is notsurprising that this property can be related to the convexity of the dual norm. Here weneed the following property: a normed vector space ๐‘‹ is called locally uniformly convex iffor any ๐‘ฅ โˆˆ ๐‘‹ with โ€–๐‘ฅ โ€–๐‘‹ = 1 and all Y โˆˆ (0, 2] there exists a ๐›ฟ (Y, ๐‘ฅ) > 0 such that

(17.3) โ€– 12 (๐‘ฅ + ๐‘ฆ)โ€–๐‘‹ โ‰ค 1 โˆ’ ๐›ฟ (Y, ๐‘ฅ) for all ๐‘ฆ โˆˆ ๐‘‹ with โ€–๐‘ฆ โ€–๐‘‹ = 1 and โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–๐‘‹ โ‰ฅ Y.

Lemma 17.4. Let ๐‘‹ be a Banach space and ๐‘‹ โˆ— be locally uniformly convex. Then ๐‘‹ is Gรขteauxsmooth.

Proof. Let ๐‘ฅ โˆˆ ๐‘‹ \ {0} be given. Since norms are convex, it suces by Theorem 13.18to show that ๐œ•โ€– ยท โ€–๐‘‹ (๐‘ฅ) is a singleton. Let therefore ๐‘ฅโˆ—1 , ๐‘ฅโˆ—2 โˆˆ ๐œ•โ€– ยท โ€–๐‘‹ (๐‘ฅ), i.e., satisfying byTheorem 4.6

โ€–๐‘ฅโˆ—1 โ€–๐‘‹ โˆ— = โ€–๐‘ฅโˆ—2 โ€–๐‘‹ โˆ— = 1, ใ€ˆ๐‘ฅโˆ—1 , ๐‘ฅใ€‰๐‘‹ = ใ€ˆ๐‘ฅโˆ—2, ๐‘ฅใ€‰๐‘‹ = โ€–๐‘ฅ โ€–๐‘‹ .This implies that

2 =1

โ€–๐‘ฅ โ€–๐‘‹(ใ€ˆ๐‘ฅโˆ—1 , ๐‘ฅใ€‰๐‘‹ + ใ€ˆ๐‘ฅโˆ—2, ๐‘ฅใ€‰๐‘‹

)= ใ€ˆ๐‘ฅโˆ—1 + ๐‘ฅโˆ—2, ๐‘ฅ

โ€–๐‘ฅ โ€–๐‘‹ ใ€‰๐‘‹ โ‰ค โ€–๐‘ฅโˆ—1 + ๐‘ฅโˆ—2 โ€–๐‘‹ โˆ—

by (1.1) and hence that โ€– 12 (๐‘ฅโˆ—1 +๐‘ฅโˆ—2)โ€–๐‘‹ โˆ— โ‰ฅ 1. Since ๐‘‹ โˆ— is locally uniformly convex, this is only

possible if ๐‘ฅโˆ—1 = ๐‘ฅโˆ—2 , as otherwise we could choose for Y โ‰” โ€–๐‘ฅโˆ—1 โˆ’ ๐‘ฅโˆ—2 โ€–๐‘‹ โˆ— โˆˆ (0, 2] a ๐›ฟ (Y, ๐‘ฅ) > 0such that โ€– 1

2 (๐‘ฅโˆ—1 + ๐‘ฅโˆ—2)โ€–๐‘‹ โˆ— โ‰ค 1 โˆ’ ๐›ฟ (Y, ๐›ฟ) < 1. ๏ฟฝ

Remark 17.5. In fact, if ๐‘‹ is additionally reexive, the norm is even continuously (Frรฉchet) dieren-tiable; see [Schirotzek 2007, Proposition 4.7.10]. We will not need this stronger property, however.In addition, locally uniformly convex spaces always have the Radonโ€“Riesz property; see [Schirotzek2007, Lemma 4.7.9].

231

17 Y-subdifferentials and approximate fermat principles

Example 17.6. The following spaces are locally uniformly convex:

(i) ๐‘‹ a Hilbert space. This follows from the parallelogram identity

โ€– 12 (๐‘ฅ + ๐‘ฆ)โ€–2๐‘‹ =

12 โ€–๐‘ฅ โ€–

2๐‘‹ + 1

2 โ€–๐‘ฆ โ€–2๐‘‹ โˆ’ 1

4 โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹ for all ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹,

which in fact characterizes precisely those norms that are induced by an innerproduct. This identity immediately yields for all Y > 0 and all ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ satisfyingโ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–๐‘‹ โ‰ฅ Y that

โ€– 12 (๐‘ฅ + ๐‘ฆ)โ€–2๐‘‹ โ‰ค 1 โˆ’ Y2

4 โ‰ค(1 โˆ’ Y2

8

)2,

which in particular veries (17.3) with ๐›ฟ := Y2

8 .

(ii) ๐‘‹ = ๐ฟ๐‘ (ฮฉ) for ๐‘ โˆˆ (2,โˆž). This follows from the algebraic inequality

|๐‘Ž + ๐‘ |๐‘ + |๐‘Ž โˆ’ ๐‘ |๐‘ โ‰ค 2๐‘โˆ’1( |๐‘Ž |๐‘ + |๐‘ |๐‘) for all ๐‘Ž, ๐‘ โˆˆ โ„,

see [Cioranescu 1990, Lemma II.4.1]. This implies that

โ€– 12 (๐‘ข + ๐‘ฃ)โ€–

๐‘๐ฟ๐‘ (ฮฉ) โ‰ค

12 โ€–๐‘ขโ€–

๐‘๐ฟ๐‘ (ฮฉ) +

12 โ€–๐‘ฃ โ€–

๐‘๐ฟ๐‘ (ฮฉ) โˆ’

12๐‘ โ€–๐‘ข โˆ’ ๐‘ฃ โ€–

๐‘๐ฟ๐‘ (ฮฉ) for all ๐‘ข, ๐‘ฃ โˆˆ ๐ฟ๐‘ (ฮฉ).

We can now argue exactly as in case (i).

(iii) ๐‘‹ = ๐ฟ๐‘ (ฮฉ) for ๐‘ โˆˆ (1, 2). This follows from the algebraic inequality

|๐‘Ž + ๐‘ |๐‘ + |๐‘Ž โˆ’ ๐‘ |๐‘ โ‰ค 2( |๐‘Ž |๐‘ + |๐‘ |๐‘)๐‘/(๐‘โˆ’1) for all ๐‘Ž, ๐‘ โˆˆ โ„,

see [Cioranescu 1990, Lemma II.4.1], implying a similar inequality for the ๐ฟ๐‘ (ฮฉ)norms from which the claim follows as for (i) and (ii).

Hence every Hilbert space (by identifying ๐‘‹ with ๐‘‹ โˆ—) and every ๐ฟ๐‘ (ฮฉ) for ๐‘ โˆˆ (1,โˆž)(identifying ๐ฟ๐‘ (ฮฉ) with ๐ฟ๐‘ž (ฮฉ), ๐‘ž = ๐‘

๐‘โˆ’1 โˆˆ (1,โˆž) is Gรขteaux smooth.

In fact, the celebrated Lindenstrauss and Trojanski renorming theorems show that everyreexive Banach space admits an equivalent norm such that the space (with that norm)becomes locally uniformly convex; see [Cioranescu 1990, Theorem III.2.10]. (Of course,even though that means that the dual space of the renormed space is Gรขteaux smooth, thisdoes not imply anything about the dierentiability of the original norm, as the obviousexample of โ„๐‘ endowed with the 1- or the โˆž-norm shows.) For many more details onsmooth and uniformly convex spaces, see [Fabian et al. 2001; Schirotzek 2007; Cioranescu1990].

232

17 Y-subdifferentials and approximate fermat principles

Note that even in Gรขteaux smooth spaces, the norm will not be dierentiable at ๐‘ฅ = 0. Butthis can be addressed by considering โ€–๐‘ฅ โ€–๐‘๐‘‹ for ๐‘ > 1; for later use, we state this for ๐‘ = 2.

Lemma 17.7. Let ๐‘‹ be a Gรขteaux smooth Banach space and ๐น (๐‘ฅ) = โ€–๐‘ฅ โ€–2๐‘‹ . Then ๐น is Gรขteauxdierentiable at any ๐‘ฅ โˆˆ ๐‘‹ with

๐ท๐น (๐‘ฅ) = 2โ€–๐‘ฅ โ€–๐‘‹๐‘ฅโˆ— for any ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— with โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— = 1 and ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ = โ€–๐‘ฅ โ€–๐‘‹ .

Proof. Since norms are convex, we can apply Theorems 4.6 and 4.19 to obtain that

๐œ•๐น (๐‘ฅ) = {2โ€–๐‘ฅ โ€–๐‘‹๐‘ฅโˆ— | ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— with โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— = 1 and ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ = โ€–๐‘ฅ โ€–๐‘‹ } (๐‘ฅ โˆˆ ๐‘‹ ).

At any ๐‘ฅ โ‰  0, this set is a singleton by Theorem 4.5 and the assumption that ๐‘‹ is Gรขteauxsmooth. Clearly also ๐œ•๐น (0) = {0}, and hence the claim follows from Theorem 13.18. ๏ฟฝ

Remark 17.8 (Asplund spaces). Asplund spaces are, by (one equivalent) denition, those Banachspaces where every continuous, convex, real-valued function is Frรฉchet-dierentiable on a denseset. (This is a limited version of Rademacherโ€™s Theorem 13.25 in โ„๐‘ .) We refer to [Yost 1993] for anintroduction to Asplund spaces. Importantly, reexive Banach spaces are Asplund.

The norm of an Asplund space is thus dierentiable on a dense set ๐ท . It was shown in [Ekeland &Lebourg 1976] that perturbed optimization problems on Asplund spaces have solutions on a denseset of perturbation parameters and that the objective function is dierentiable at such a solution. Ifwe worked in the following sections with perturbed optimization problems and applied such anexistence result instead of the Ekeland or the Borweinโ€“Preiss variational principles (Theorem 2.14or Theorem 2.15, respectively), we would be able to extend the following results to Asplund spaces.

17.3 fuzzy fermat principles

The following result generalizes the Fermat principle of Theorem 16.2 to sums of twofunctions in a โ€œfuzzyโ€ fashion. We will use it to show a fuzzy containment formula forY-subdierentials. Its generalizations to more than two functions can also be used to derivemore advanced fuzzy sum rules than 17.2. Our focus is, however, on exact calculus, so wewill not be developing such generalizations.

Lemma 17.9 (fuzzy Fermat principle). Let ๐‘‹ be a Gรขteaux smooth Banach space and ๐น,๐บ :๐‘‹ โ†’ โ„. If ๐น +๐บ attains a local minimum at a point ๐‘ฅ โˆˆ ๐‘‹ where ๐น is lower semicontinuousand ๐บ is locally Lipschitz, then for any ๐›ฟ, ` > 0 we have

0 โˆˆโ‹ƒ

๐‘ฅ,๐‘ฆโˆˆ๐”น(๐‘ฅ,๐›ฟ)(๐œ•๐น๐น (๐‘ฅ) + ๐œ•๐น๐บ (๐‘ฆ)) + `๐”น๐‘‹ โˆ— .

233

17 Y-subdifferentials and approximate fermat principles

Proof. Let ๐œŒ, ๐›ผ > 0 be arbitrary. The idea is to separate the two nonsmooth functions ๐นand๐บ , and hence be able to use the exact sum rule of Corollary 17.3, by locally relaxing theproblem min๐‘ฅโˆˆ๐‘‹ (๐น +๐บ) to

inf๐‘ฅ,๐‘ฆโˆˆ๐‘‹

๐ฝ๐›ผ (๐‘ฅ, ๐‘ฆ) โ‰” ๐น (๐‘ฅ) +๐บ (๐‘ฆ) + ๐›ผ โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹ + โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐›ฟ๐”น(๐‘ฅ,๐œŒ)2 (๐‘ฅ, ๐‘ฆ).

We take ๐œŒ > 0 small enough that ๐‘ฅ minimizes ๐น +๐บ within ๐”น(๐‘ฅ, ๐œŒ), and both ๐น โ‰ฅ ๐น (๐‘ฅ) โˆ’ 1and๐บ โ‰ฅ ๐บ (๐‘ฆ) โˆ’ 1 on ๐”น(๐‘ฅ, ๐œŒ). The rst requirement is possible by the assumption of ๐น +๐บattaining its local minimum at ๐‘ฅ , while the latter follows from the lower semicontinuity of ๐นand the local Lipschitz continuity of๐บ . In the following, we denote by ๐ฟ the Lipschitz factorof๐บ on ๐”น(๐‘ฅ, ๐œŒ). It follows that ๐ฝ๐›ผ (๐‘ฅ, ๐‘ฆ) โ‰ฅ ๐น (๐‘ฅ) +๐บ (๐‘ฅ) โˆ’2 for all (๐‘ฅ, ๐‘ฆ) โˆˆ ๐”น(๐‘ฅ, ๐œŒ)2 = dom ๐ฝ๐›ผ ,and hence ๐ฝ๐›ผ is bounded from below.

We study the approximate solutions of the relaxed problem in several steps.

Step 1: constrained inmal values converge to ๐ฝ (๐‘ฅ, ๐‘ฅ). Let ๐‘ฅ๐›ผ , ๐‘ฆ๐›ผ โˆˆ ๐”น(๐‘ฅ, ๐œŒ) be such that

(17.4) ๐ฝ๐›ผ (๐‘ฅ๐›ผ , ๐‘ฆ๐›ผ ) < ๐‘—๐›ผ + ๐›ผโˆ’1 where ๐‘—๐›ผ โ‰” inf๐‘ฅ,๐‘ฆโˆˆ๐‘‹

๐ฝ๐›ผ (๐‘ฅ, ๐‘ฆ).

We show that

๐ฝ๐›ผ (๐‘ฅ, ๐‘ฅ) < ๐‘—๐›ผ + Y๐›ผ for Y๐›ผ โ‰” ๐ฟ

โˆš๐›ผโˆ’1 + 2๐›ผ

+ ๐›ผโˆ’1.To start with, we have

๐น (๐‘ฅ) +๐บ (๐‘ฅ) + ๐›ผโˆ’1 = ๐ฝ (๐‘ฅ, ๐‘ฅ) + ๐›ผโˆ’1โ‰ฅ ๐‘—๐›ผ + ๐›ผโˆ’1> ๐ฝ๐›ผ (๐‘ฅ๐›ผ , ๐‘ฆ๐›ผ )= ๐น (๐‘ฅ๐›ผ ) +๐บ (๐‘ฆ๐›ผ ) + ๐›ผ โ€–๐‘ฅ๐›ผ โˆ’ ๐‘ฆ๐›ผ โ€–2๐‘‹ + โ€–๐‘ฅ๐›ผ โˆ’ ๐‘ฅ โ€–2๐‘‹โ‰ฅ ๐น (๐‘ฅ) +๐บ (๐‘ฅ) + ๐›ผ โ€–๐‘ฅ๐›ผ โˆ’ ๐‘ฆ๐›ผ โ€–2๐‘‹ + โ€–๐‘ฅ๐›ผ โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ 2.

This implies that โ€–๐‘ฅ๐›ผ โˆ’ ๐‘ฆ๐›ผ โ€–๐‘‹ <

โˆš๐›ผโˆ’1+2๐›ผ . Since ๐‘ฅ minimizes ๐น +๐บ within ๐”น(๐‘ฅ, ๐œŒ), we obtain

the bound (17.4) through

๐ฝ๐›ผ (๐‘ฅ, ๐‘ฅ) = ๐น (๐‘ฅ) +๐บ (๐‘ฅ)โ‰ค ๐น (๐‘ฅ๐›ผ ) +๐บ (๐‘ฅ๐›ผ )โ‰ค ๐น (๐‘ฅ๐›ผ ) +๐บ (๐‘ฆ๐›ผ ) + ๐ฟโ€–๐‘ฅ๐›ผ โˆ’ ๐‘ฆ๐›ผ โ€–๐‘‹ .โ‰ค ๐ฝ (๐‘ฅ๐›ผ , ๐‘ฆ๐›ผ ) + ๐ฟโ€–๐‘ฅ๐›ผ โˆ’ ๐‘ฆ๐›ผ โ€–๐‘‹ .< ๐‘—๐›ผ + Y๐›ผ .

Step 2: exact unconstrained minimizers exist for a perturbed problem. By (17.4), we can applythe Borweinโ€“Preiss variational principle (Theorem 2.15) for any _, ๐›ผ > 0, small enough๐œŒ > 0 (all to be xed later), and ๐‘ = 2 to obtain a sequence {`๐‘›}๐‘›โ‰ฅ0 of nonnegative weightssumming to 1 and a sequence {(๐‘ฅ๐‘›, ๐‘ฆ๐‘›)}๐‘›โ‰ฅ0 โŠ‚ ๐‘‹ โˆˆ ๐‘‹ with (๐‘ฅ0, ๐‘ฆ0) = (๐‘ฅ, ๐‘ฅ) convergingstrongly to some (๐‘ฅ๐›ผ , ๐‘ฆ๐›ผ ) โˆˆ ๐‘‹ ร— ๐‘‹ (endowed with the euclidean product norm) such that

234

17 Y-subdifferentials and approximate fermat principles

(i) โ€–๐‘ฅ๐‘› โˆ’ ๐‘ฅ๐›ผ โ€–2๐‘‹ + โ€–๐‘ฆ๐‘› โˆ’ ๐‘ฆ๐›ผ โ€–2๐‘‹ โ‰ค _2 for all ๐‘› โ‰ฅ 0 (in particular, โ€–๐‘ฅ โˆ’ ๐‘ฅ๐›ผ โ€–๐‘‹ โ‰ค _);

(ii) the function

๐ป๐›ผ (๐‘ฅ, ๐‘ฆ) โ‰” ๐ฝ๐›ผ (๐‘ฅ, ๐‘ฆ) + Y๐›ผ_2

โˆžโˆ‘๐‘›=0

`๐‘›(โ€–๐‘ฅ โˆ’ ๐‘ฅ๐‘›โ€–2 + โ€–๐‘ฆ โˆ’ ๐‘ฆ๐‘›โ€–2

)attains its global minimum at (๐‘ฅ๐›ผ , ๐‘ฆ๐›ผ ).

Note that since ๐ฝ๐›ผ includes the constraint (๐‘ฅ, ๐‘ฆ) โˆˆ ๐”น(๐‘ฅ, ๐œŒ)2, we have (๐‘ฅ๐›ผ , ๐‘ฆ๐›ผ ) โˆˆ ๐”น(๐‘ฅ, ๐œŒ)2.In fact, by taking _ โˆˆ (0, ๐œŒ), it follows from (i) and the convergence (๐‘ฅ๐‘›, ๐‘ฆ๐‘›) โ†’ (๐‘ฅ๐›ผ , ๐‘ฆ๐›ผ )that the minimizer (๐‘ฅ๐›ผ , ๐‘ฆ๐›ผ ) โˆˆ ๐”น(๐‘ฅ, _)2 โŠ‚ int๐”น(๐‘ฅ, ๐œŒ)2 is unconstrained.Step 3: the perturbed minimizers satisfy the claim for large ๐›ผ and small _. Setting ฮจ๐‘ฆ (๐‘ฅ) โ‰”โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–2๐‘‹ , it follows from Lemma 17.7 that ฮจ๐‘ฆ is Gรขteaux dierentiable for any ๐‘ฆ โˆˆ๐‘‹ with ๐ทฮจ๐‘ฆ (๐‘ฅ) โˆˆ 2โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–๐‘‹๐”น๐‘‹ โˆ— . Furthermore, since (๐‘ฅ๐›ผ , ๐‘ฆ๐›ผ ) โˆˆ int๐”น(๐‘ฅ, ๐œŒ)2, we have๐œ•๐›ฟ๐”น(๐‘ฅ,๐œŒ)2 (๐‘ฅ๐›ผ , ๐‘ฆ๐›ผ ) = (0, 0). Hence the only nonsmooth component of ๐ป๐›ผ at (๐‘ฅ๐›ผ , ๐‘ฆ๐›ผ ) is(๐‘ฅ, ๐‘ฆ) โ†ฆโ†’ ๐น (๐‘ฅ) +๐บ (๐‘ฆ). We can thus apply Theorem 16.2 and Corollary 17.3 to obtain

0 โˆˆ ๐œ•๐น๐ป๐›ผ (๐‘ฅ๐›ผ , ๐‘ฆ๐›ผ ) =(๐œ•๐น๐น (๐‘ฅ) + ๐›ผ๐ทฮจ๐‘ฆ๐›ผ (๐‘ฅ๐›ผ ) + ๐ทฮจ๐‘ฅ (๐‘ฅ๐›ผ ) + Y๐›ผ

_2โˆ‘โˆž๐‘›=0 `๐‘›๐ทฮจ๐‘ฅ๐‘› (๐‘ฅ๐›ผ )

๐œ•๐น๐บ (๐‘ฅ) + ๐›ผ๐ทฮจ๐‘ฅ๐›ผ (๐‘ฆ๐›ผ ) + Y๐›ผ_2

โˆ‘โˆž๐‘›=0 `๐‘›๐ทฮจ๐‘ฆ๐‘› (๐‘ฆ๐›ผ )

).

By (i) and ๐‘ฅ๐›ผ , ๐‘ฆ๐›ผ โˆˆ ๐”น(๐‘ฅ, _) we have โ€–๐‘ฅ๐›ผ โˆ’ ๐‘ฅ๐‘›โ€–๐‘‹ , โ€–๐‘ฆ๐›ผ โˆ’ ๐‘ฆ๐‘›โ€–๐‘‹ โ‰ค _ for all ๐‘› โ‰ฅ 0. In addition,โˆ‘โˆž๐‘›=0 `๐‘› = 1, and thus Y๐›ผ

_2โˆ‘โˆž๐‘›=0 `๐‘›๐ทฮจ๐‘ฅ๐‘› (๐‘ฅ๐›ผ ) โˆˆ 2Y๐›ผ

_ ๐”น๐‘‹ โˆ— and likewise for ๐ทฮจ๐‘ฆ๐‘› (so that in factwe were justied in dierentiating the series term-wise). By (i) also โ€–๐‘ฅ๐›ผ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค _, so that๐ทฮจ๐‘ฅ (๐‘ฅ๐›ผ ) โˆˆ 2_๐”น๐‘‹ โˆ— . Finally, since โˆ’๐‘ฅโˆ— โˆˆ ๐œ•โ€– ยท โ€–๐‘‹ (โˆ’๐‘ฅ) for any ๐‘ฅโˆ— โˆˆ ๐œ•โ€– ยท โ€–๐‘‹ (๐‘ฅ) and any ๐‘ฅ โˆˆ ๐‘‹ ,we have ๐ทฮจ๐‘ฆ (๐‘ฅ) = โˆ’๐ทฮจ๐‘ฅ (๐‘ฆ) for all ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘‹ . We thus have

โˆ’๐›ผ๐ทฮจ๐‘ฆ๐›ผ (๐‘ฅ๐›ผ ) โˆˆ ๐œ•๐น๐น (๐‘ฅ๐›ผ ) +(2_ + 2Y๐›ผ

_

)๐”น๐‘‹ โˆ—,

๐›ผ๐ทฮจ๐‘ฆ๐›ผ (๐‘ฅ๐›ผ ) โˆˆ ๐œ•๐น๐บ (๐‘ฆ๐›ผ ) + 2Y๐›ผ_

๐”น๐‘‹ โˆ—,

which implies that

0 โˆˆ ๐œ•๐น๐น (๐‘ฅ๐›ผ ) + ๐œ•๐น๐บ (๐‘ฆ๐›ผ ) +(2_ + 4Y๐›ผ

_

)๐”น๐‘‹ โˆ— .

Since (๐‘ฅ๐›ผ , ๐‘ฆ๐›ผ ) โˆˆ ๐”น(๐‘ฅ, _)2, the claim now follows by taking _ โˆˆ (0, ๐œŒ) small enough andthen ๐›ผ > 0 large (and thus Y๐›ผ small) enough. ๏ฟฝ

Remark 17.10 (fuzzy Fermat principles and trustworthy subdierentials). Lemma 17.9 is due to[Fabian 1988]. Such fuzzy Fermat principles are studied in more detail from the point of view offuzzy variational principles in [Ioe 2017]. Specically, the claim of Lemma 17.9 has to hold foran arbitrary subdierential operator ๐œ•โˆ— for it to be trustworthy, whereas the opposite inclusion๐œ•โˆ—๐บ (๐‘ฅ) + ๐œ•โˆ—๐น (๐‘ฅ) โŠ‚ ๐œ•โˆ— [๐บ + ๐น ] (๐‘ฅ) is required for the subdierential to be elementary.

235

17 Y-subdifferentials and approximate fermat principles

Remark 17.11 (notes on the proof of Lemma 17.9). Note how we had to apply the Borweinโ€“Preissvariational principle instead of Ekelandโ€™s to obtain a dierentiable convex perturbation and thusto be able to apply the sum rule Corollary 17.3. In contrast, the proof in [Ioe 2017] is based onthe Devilleโ€“Godefroyโ€“Zizler variational principle that makes no convexity assumption on theperturbation function and hence requires the stronger property of Frรฉchet smoothness (i.e., Frรฉchetinstead of Gรขteaux dierentiability of the norm outside the origin).

Finally, with an additional argument showing ๐ฝ๐›ผ (๐‘ฅ๐›ผ , ๐‘ฆ๐›ผ ) โ‰ค ๐‘—๐›ผ + ๐›ฝ๐›ผ for a suitable ๐›ฝ๐›ผ , it would bepossible to further constrain |๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅ) | โ‰ค ๐›ฟ in the claim of Lemma 17.9, as is done in [Ioe 2017,Theorem 4.30].

Corollary 17.12. Let ๐‘‹ be a Gรขteaux smooth Banach space, let ๐น : ๐‘‹ โ†’ โ„ be lower semicon-tinuous near ๐‘ฅ โˆˆ ๐‘‹ , and Y > 0. Then for any ๐›ฟ > 0 and Yโ€ฒ > Y we have

๐œ•Y๐น (๐‘ฅ) โŠ‚โ‹ƒ

๐‘งโˆˆ๐”น(๐‘ฅ,๐›ฟ)๐œ•๐น๐น (๐‘ง) + Yโ€ฒ๐”น๐‘‹ โˆ— .

Proof. We may assume that ๐‘ฅ โˆˆ dom ๐น , in particular that there exists some ๐‘ฅโˆ— โˆˆ ๐œ•Y๐น (๐‘ฅ),i.e., such that

lim inf๐‘ฅโ‰ ๐‘ฆโ†’๐‘ฅ

๐น (๐‘ฆ) โˆ’ ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ฅ โˆ’Y.

Taking any Yโ€ฒ > Y and dening

๐น (๐‘ฅ) โ‰” ๐น (๐‘ฅ) โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ and ๐บ (๐‘ฅ) โ‰” Yโ€ฒโ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ ,

we obtain as in Lemma 17.1 that

lim inf๐‘ฅโ‰ ๐‘ฆโ†’๐‘ฅ

(๐บ + ๐น ) (๐‘ฆ) โˆ’ (๐บ + ๐น ) (๐‘ฅ)โ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ฅ (Yโ€ฒ โˆ’ Y).

Thus ๐น +๐บ achieves its local minimum at ๐‘ฅ . The function ๐บ is convex and Lipschitz while๐น lower semicontinuous. Hence Lemma 17.9 implies for any ๐›ฟ > 0 and `โ€ฒ > 0 that

0 โˆˆโ‹ƒ

๐‘ง,๐‘ฆโˆˆ๐”น(๐‘ฅ,๐›ฟ)

(๐œ•๐น๐น (๐‘ฆ) + ๐œ•๐น๐บ (๐‘ง)) + `โ€ฒ๐”น๐‘‹ .

Since ๐œ•๐น๐น (๐‘ฆ) = ๐œ•๐น๐น (๐‘ฆ) โˆ’ {๐‘ฅโˆ—} (by Corollary 17.3 or directly from the denition) and๐œ•๐น๐บ (๐‘ง) = ๐œ•๐บ (๐‘ง) โŠ‚ Yโ€ฒ๐”น๐‘‹ , we obtain

๐‘ฅโˆ— โˆˆโ‹ƒ

๐‘งโˆˆ๐”น(๐‘ฅ,๐›ฟ)๐œ•๐น๐น (๐‘ง) + (`โ€ฒ + Yโ€ฒ)๐”น๐‘‹ .

Since `โ€ฒ > 0 and Yโ€ฒ > Y were arbitrary, the claim follows. ๏ฟฝ

236

17 Y-subdifferentials and approximate fermat principles

17.4 approximate fermat principles and projections

We now introduce an approximate Fermat principle, which can be invoked when we do notknow whether a minimizer exists; in particular, when ๐น fails to be weakly lower semicon-tinuous so that Theorem 2.1 is not applicable.

Theorem 17.13. Let ๐‘‹ be a Banach space and ๐น : ๐‘‹ โ†’ โ„ be proper, lower semicontinuous,and bounded from below. Then for every Y, ๐›ฟ > 0 there exists an ๐‘ฅY โˆˆ ๐‘‹ such that

(i) ๐น (๐‘ฅY) โ‰ค inf๐‘ฅโˆˆ๐‘‹ ๐น (๐‘ฅ) + Y;(ii) ๐น (๐‘ฅY) < ๐น (๐‘ฅ) + ๐›ฟ โ€–๐‘ฅ โˆ’ ๐‘ฅY โ€–๐‘‹ for all ๐‘ฅ โ‰  ๐‘ฅY ;

(iii) 0 โˆˆ ๐œ•๐›ฟ๐น (๐‘ฅY).

Proof. Since ๐น is bounded from below, inf๐‘ฅโˆˆ๐‘‹ ๐น (๐‘ฅ) > โˆ’โˆž. We can thus take a minimizingsequence {๐‘ฅ๐‘›}๐‘›โˆˆโ„• with ๐น (๐‘ฅ๐‘›)โ†’ inf๐‘ฅโˆˆ๐‘‹ ๐น (๐‘ฅ) and nd a ๐‘›(Y) โˆˆ โ„• such that ๐‘ฅY โ‰” ๐‘ฅ๐‘›(Y)satises (i). Ekelandโ€™s variational principle Theorem 2.14 thus yields for _ โ‰” Y/๐›ฟ an๐‘ฅY โ‰” ๐‘ฅY,_ such that โ€–๐‘ฅY โˆ’ ๐‘ฅY โ€–๐‘‹ โ‰ค _,

๐น (๐‘ฅY) โ‰ค ๐น (๐‘ฅY) + Y_โ€–๐‘ฅY โˆ’ ๐‘ฅY โ€–๐‘‹ โ‰ค ๐น (๐‘ฅY),

as well as๐น (๐‘ฅY) < ๐น (๐‘ฅ) + Y

_โ€–๐‘ฅY โˆ’ ๐‘ฅ โ€–๐‘‹ (๐‘ฅ โ‰  ๐‘ฅY).

Thus (i) as well as (ii) hold. The latter implies for all ๐‘ฅ โ‰  ๐‘ฅY that

๐น (๐‘ฅ) โˆ’ ๐น (๐‘ฅY) โˆ’ ใ€ˆ0, ๐‘ฅ โˆ’ ๐‘ฅYใ€‰๐‘‹โ€–๐‘ฅ โˆ’ ๐‘ฅY โ€–๐‘‹ โ‰ฅ โˆ’๐›ฟ,

i.e., 0 โˆˆ ๐œ•๐›ฟ๐น (๐‘ฅY) by denition. ๏ฟฝ

As an example for possible applications of approximate Fermat principles, we use it to provethe following result on projections and approximate projections onto a nonconvex set๐ถ โŠ‚ ๐‘‹ .For nonconvex sets, even the exact projection need no longer be unique; furthermore, forthe reasons discussed before Theorem 17.13, the set of projections ๐‘ƒ๐ถ (๐‘ฅ) may be emptywhen ๐ถ โ‰  โˆ… is closed but not weakly closed. We recall that by Lemma 1.10, convex closedsets are weakly closed, as are, of course, nite-dimensional closed sets. However, moregenerally, weak closedness can be elusive. Hence we will need to perform approximateprojections in Part IV. It is not surprising that this requires additional assumptions on thecontaining space to make up for this.

237

17 Y-subdifferentials and approximate fermat principles

Theorem 17.14. Let ๐‘‹ be a Gรขteaux smooth Banach space and let ๐ถ โŠ‚ ๐‘‹ be nonempty andclosed. Dene the (possibly multi-valued) projection

๐‘ƒ๐ถ : ๐‘‹ โ‡’ ๐‘‹, ๐‘ƒ๐ถ (๐‘ฅ) โ‰” argmin๐‘ฅโˆˆ๐ถ

โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹

and the corresponding distance function

๐‘‘๐ถ : ๐‘‹ โ†’ โ„, ๐‘‘๐ถ (๐‘ฅ) โ‰” inf๐‘ฅโˆˆ๐ถ

โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ .

Then the following hold:

(i) For any ๐‘ฅ โˆˆ ๐‘ƒ๐ถ (๐‘ฅ), there exists an ๐‘ฅโˆ— โˆˆ ๐œ•๐น๐›ฟ๐ถ (๐‘ฅ) such that

(17.5) ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ = โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ , โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ‰ค 1.

(ii) For any Y > 0, there exists an approximate projection ๐‘ฅY โˆˆ ๐ถ satisfying

โ€–๐‘ฅY โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค ๐‘‘๐ถ (๐‘ฅ) + Y

as well as (17.5) for some ๐‘ฅโˆ— โˆˆ ๐œ•Y๐›ฟ๐ถ (๐‘ฅY).(iii) If ๐‘‹ is a Hilbert space, then ๐‘ฅ โˆ’ ๐‘ฅ โˆˆ ๐œ•Y๐›ฟ๐ถ (๐‘ฅ) for all Y โ‰ฅ 0.

Proof. (i): Let ๐‘ฅ โˆ‰ ๐ถ , since otherwise ๐‘ฅโˆ— โ‰” 0 โˆˆ ๐œ•๐น (๐‘ฅ) for ๐‘ฅ = ๐‘ฅ โˆˆ ๐ถ by the denition ofthe Frรฉchet subdierential. Set ๐น (๐‘ฅ) := โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ and assume that ๐‘ฅ โˆˆ ๐‘ƒ๐ถ (๐‘ฅ). The Fermatprinciple Theorem 16.2 then yields that 0 โˆˆ ๐œ•๐น [๐›ฟ๐ถ + ๐น ] (๐‘ฅ). Since ๐‘ฅ โˆ‰ ๐ถ and ๐‘ฅ โˆˆ ๐ถ , byassumption ๐น is dierentiable at ๐‘ฅ . Thus Theorem 4.5 shows that ๐œ•๐น (๐‘ฅ) = {๐ท๐น (๐‘ฅ)} is asingleton. The sum rule of Corollary 17.3 then yields that ๐‘ฅโˆ— โ‰” โˆ’๐ท๐น (๐‘ฅ) โˆˆ ๐œ•๐น๐›ฟ๐ถ (๐‘ฅ). Theclaim of (17.5) now follows from Theorem 4.6.

(ii): Compared to (i), we merely invoke the approximate Fermat principle of Theorem 17.13 inplace of Theorem 16.2,which establishes the existence of๐‘ฅY โˆˆ ๐ถ satisfying โ€–๐‘ฅYโˆ’๐‘ฅ โ€–๐‘‹ โ‰ค ๐‘‘๐ถ (๐‘ฅ)and 0 โˆˆ ๐œ•Y [๐›ฟ๐ถ+๐น ] (๐‘ฅ). The sum rule of Lemma 17.2 then shows that๐‘ฅโˆ— โ‰” โˆ’๐ท๐น (๐‘ฅ) โˆˆ ๐œ•Y๐›ฟ๐ถ (๐‘ฅ).(iii): In a Hilbert space, we can identify โˆ’๐ท๐น (๐‘ฅ) with the corresponding gradient โˆ’โˆ‡๐น (๐‘ฅ) =(๐‘ฅ โˆ’ ๐‘ฅ)/โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ โˆˆ ๐‘‹ for ๐‘ฅ โ‰  0 (otherwise โˆ’โˆ‡๐น (๐‘ฅ) = 0 = ๐‘ฅ โˆ’ ๐‘ฅ). Since ๐œ•Y๐›ฟ๐ถ (๐‘ฅ) is a cone,this implies that ๐‘ฅ โˆ’ ๐‘ฅ โˆˆ ๐œ•๐น๐›ฟ๐ถ (๐‘ฅ) as well. ๏ฟฝ

In the next chapters, we will see that ๐œ•๐น๐›ฟ๐ถ (๐‘ฅ) coincides with a suitable normal cone to๐ถ at๐‘ฅ . In other words, ๐‘ฅโˆ— is a normal vector to the set ๐ถ . In Hilbert spaces, this normal vectorcan be identied with the (normalized) vector pointing from ๐‘ฅ to ๐‘ฅ .

238

Part IV

SET-VALUED ANALYSIS

239

18 TANGENT AND NORMAL CONES

We now start our study of stability properties of the solutions to nonsmooth optimizationproblems. As we have characterized the latter via subdierential inclusions, we need tostudy the sensitivity of such relations to perturbations. As in the smooth case, this can bedone through derivatives of these conditions with respect to relevant parameters; however,these conditions are expressed as inclusions instead of simple equations. Hence we requirenotions of derivatives for set-valued mappings.

To motivate how we will develop dierential calculus for set-valued mappings, recall fromLemma 4.10 how the subdierential of a convex function ๐น can be dened in terms ofthe normal cone to the epigraph of ๐น . This idea forms the basis of dierentiating generalset-valued mappings ๐ป : ๐‘‹ โ‡’ ๐‘Œ , where instead of taking the normal cone at (๐‘ฅ, ๐น (๐‘ฅ))to epi ๐น , we do this at any point (๐‘ฅ, ๐‘ฆ) of graph๐ป โ‰” {(๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘‹ ร— ๐‘Œ | ๐‘ฆ โˆˆ ๐ป (๐‘ฅ)}. Sincewe are generally not in the nice convex setting โ€“ even for a convex function ๐น โ€š the setgraph ๐œ•๐น is not convex unless ๐น is linear โ€“ there are some complications which result inhaving to deal with various nonequivalent denitions. In this chapter, we introduce therelevant graphical notions of tangent and normal cones. In Chapter 19, we develop specicexpressions for these conses to sets in ๐ฟ๐‘ (ฮฉ) dened as pointwise via nite-dimensionalsets. In the following Chapters 20 to 25, we then dene and further develop notions ofdierentiation of set-valued mappings based on these cones.

18.1 definitions and examples

the fundamental cones

Our rst type of tangent cone is dened using roughly the same limiting process ondierence quotients as basic directional derivatives. Let ๐‘‹ be a Banach space. We denethe tangent cone (or Bouligand or contingent cone) of the set ๐ถ โŠ‚ ๐‘‹ at ๐‘ฅ โˆˆ ๐‘‹ as

(18.1) ๐‘‡๐ถ (๐‘ฅ) โ‰” lim sup๐œโ†’ 0

๐ถ โˆ’ ๐‘ฅ๐œ

=

{ฮ”๐‘ฅ โˆˆ ๐‘‹

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ ฮ”๐‘ฅ = lim๐‘˜โ†’โˆž

๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐œ๐‘˜

for some ๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ, ๐œ๐‘˜โ†’ 0},

240

18 tangent and normal cones

i.e., the tangent cone is the outer limit (in the sense of Section 6.1) of the โ€œblown upโ€ sets(๐ถ โˆ’ ๐‘ฅ)/๐œ as ๐œโ†’ 0.

The tangent cone is closely related to the Frรฉchet normal cone, which is based on the samelimiting process as the Frรฉchet subdierential in Chapter 16:

(18.2) ๐‘๐ถ (๐‘ฅ) โ‰”{๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ lim sup๐ถ3๐‘ฅโ†’๐‘ฅ

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค 0

}.

limiting cones in finite dimensions

One diculty with the Frรฉchet normal cone is that it is not outer semicontinuous. Bytaking their outer limit (in the sense of set-valued mappings), we obtain the less โ€œirregularโ€(basic or limiting orMordukhovich) normal cone. This denition is somewhat more involvedin innite dimensions, so we rst consider ๐ถ โŠ‚ โ„๐‘ at ๐‘ฅ โˆˆ โ„๐‘ . In this case, the limitingnormal cone is dened as

(18.3) ๐‘๐ถ (๐‘ฅ) โ‰” lim sup๐ถ3๐‘ฅโ†’๐‘ฅ

๐‘๐ถ (๐‘ฅ)

=

{๐‘ฅโˆ— โˆˆ โ„๐‘

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ ๐‘ฅโˆ— = lim๐‘˜โ†’โˆž

๐‘ฅโˆ—๐‘˜ for some ๐‘ฅโˆ—๐‘˜ โˆˆ ๐‘๐ถ (๐‘ฅ๐‘˜), ๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ

}.

Despite ๐‘๐ถ being obtained by the outer semicontinuous regularization of ๐‘๐ถ , the latter issometimes in the literature called the regular normal cone. We stick to the convention ofcalling ๐‘๐ถ the Frรฉchet normal cone and ๐‘๐ถ the limiting normal cone.

The limiting variant of the tangent cone is the Clarke tangent cone (also known as theregular tangent cone), dened for a set ๐ถ โŠ‚ โ„๐‘ at ๐‘ฅ โˆˆ โ„๐‘ as the inner limit

(18.4) ๐‘‡๐ถ (๐‘ฅ) โ‰” lim inf๐ถ3๐‘ฅโ†’๐‘ฅ,๐œโ†’ 0

๐ถ โˆ’ ๐‘ฅ๐œ

=

{ฮ”๐‘ฅ โˆˆ โ„๐‘

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ for all ๐œ๐‘˜โ†’ 0, ๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ there exists ๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅwith (๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐‘˜)/๐œ๐‘˜ โ†’ ฮ”๐‘ฅ

}.

We will later in Corollary 18.20 see that for a closed set ๐ถ โŠ‚ โ„๐‘ , we in fact have that๐‘‡๐ถ (๐‘ฅ) = lim inf๐ถ3๐‘ฅโ†’๐‘ฅ ๐‘‡๐ถ (๐‘ฅ).The following example as well as Figure 18.1 illustrate the dierent cones.

Example 18.1. We compute the dierent tangent and normal cones at all points ๐‘ฅ โˆˆ ๐ถfor dierent ๐ถ โŠ‚ โ„2.

241

18 tangent and normal cones

๐ถ

๐‘ฅ

(a) tangent cone ๐‘‡๐ถ (๐‘ฅ)

๐ถ

๐‘ฅ

(b) Clarke tangent cone๐‘‡๐ถ (๐‘ฅ)

๐ถ

๐‘ฅ

(c) Frรฉchet normal cone๐‘๐ถ (๐‘ฅ) = {0}

๐ถ

๐‘ฅ

(d) limiting normalcone ๐‘๐ถ (๐‘ฅ)

Figure 18.1: Illustration of the dierent normal and tangent cones at a nonregular point ofa set ๐ถ . The dot indicates the base point ๐‘ฅ . The thick arrows and dark lled-inareas indicate the directions included in the cones.

(i) ๐ถ = ๐”น(0, 1): Clearly, if ๐‘ฅ โˆˆ int๐ถ , then

๐‘๐ถ (๐‘ฅ) = ๐‘๐ถ (๐‘ฅ) = {0},๐‘‡๐ถ (๐‘ฅ) = ๐‘‡๐ถ (๐‘ฅ) = โ„2.

For any ๐‘ฅ โˆˆ bd๐ถ , on the other hand,

๐‘๐ถ (๐‘ฅ) = ๐‘๐ถ (๐‘ฅ) = [0,โˆž)๐‘ฅ โ‰” {๐‘ก๐‘ฅ | ๐‘ก โ‰ฅ 0} ,๐‘‡๐ถ (๐‘ฅ) = ๐‘‡๐ถ (๐‘ฅ) = {๐‘ง | ใ€ˆ๐‘ง, ๐‘ฅใ€‰ โ‰ค 0}.

(ii) ๐ถ = [0, 1]2: For ๐‘ฅ โˆˆ int๐ถ , we again have that ๐‘๐ถ (๐‘ฅ) = ๐‘๐ถ (๐‘ฅ) = {0} and ๐‘‡๐ถ (๐‘ฅ) =๐‘‡๐ถ (๐‘ฅ) = โ„2; similarly, for ๐‘ฅ โˆˆ bd๐ถ \ {(0, 0), (0, 1), (1, 0), (1, 1)} (i.e., ๐‘ฅ is not oneof the corners of ๐ถ), again ๐‘๐ถ (๐‘ฅ) = ๐‘๐ถ (๐‘ฅ) = [0,โˆž)๐‘ฅ and ๐‘‡๐ถ (๐‘ฅ) = ๐‘‡๐ถ (๐‘ฅ) ={๐‘ง | ใ€ˆ๐‘ง, ๐‘ฅใ€‰ = 0}. Of the corners, we concentrate on ๐‘ฅ = (1, 1), the others beinganalogous. Then

๐‘๐ถ (๐‘ฅ) = ๐‘๐ถ (๐‘ฅ) = {(ฮ”๐‘ฅ,ฮ”๐‘ฆ) | ฮ”๐‘ฅ,ฮ”๐‘ฆ โ‰ฅ 0},๐‘‡๐ถ (๐‘ฅ) = ๐‘‡๐ถ (๐‘ฅ) = {(ฮ”๐‘ฅ,ฮ”๐‘ฆ) | ฮ”๐‘ฅ,ฮ”๐‘ฆ โ‰ค 0}.

(iii) ๐ถ = [0, 1]2 \ [ 12 , 1]2: Here as well ๐‘๐ถ (๐‘ฅ) = ๐‘๐ถ (๐‘ฅ) = {0} and ๐‘‡๐ถ (๐‘ฅ) = ๐‘‡๐ถ (๐‘ฅ) = โ„2

for ๐‘ฅ โˆˆ int๐ถ . Other points on bd๐ถ are computed analogously to similar cornersand edges of the square [0, 1]2, but we have to be careful with the โ€œinterior cornerโ€๐‘ฅ = ( 12 , 12 ). Here, similarly to Figure 18.1c, we see that ๐‘๐ถ (๐‘ฅ) = {0}. However, asa lim sup,

๐‘๐ถ (๐‘ฅ) = (0, 1) [0,โˆž) โˆช (1, 0) [0,โˆž).

242

18 tangent and normal cones

For the tangent cones, we then get

๐‘‡๐ถ (๐‘ฅ) = {(ฮ”๐‘ฅ,ฮ”๐‘ฆ) | ฮ”๐‘ฅ โ‰ค 0 or ฮ”๐‘ฆ โ‰ค 0},while, as a lim inf ,

๐‘‡๐ถ (๐‘ฅ) = ๐‘‡๐ถ (๐‘ฅ) โˆช (1, 0)โ„ โˆช (0, 1)โ„.

limiting cones in infinite dimensions

Let now ๐‘‹ be again a Banach space. Although the fundamental cones โ€“ the (basic) tangentcone and the Frรฉchet normal cone โ€“ were dened based on strongly convergent sequences,in innite-dimensional spaces weak modes of convergence better replicate various rela-tionships between the dierent cones. We thus call an element ฮ”๐‘ฅ โˆˆ ๐‘‹ weakly tangent to๐ถ at ๐‘ฅ if

(18.5) ฮ”๐‘ฅ = w-lim๐‘˜โ†’โˆž

๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐œ๐‘˜

for some ๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ, ๐œ๐‘˜โ†’ 0,

where the w-lim of course stands for ๐œโˆ’1๐‘˜(๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ) โ‡€ ฮ”๐‘ฅ . We denote by the weak tangent

cone (or weak contingent cone) ๐‘‡๐‘ค๐ถ (๐‘ฅ) โŠ‚ ๐‘‹ the set of all such ฮ”๐‘ฅ . Using the notion of outerlimits of set-valued mappings from Chapter 6, we can also write

(18.6) ๐‘‡๐‘ค๐ถ (๐‘ฅ) = w-lim sup๐œโ†’ 0

๐ถ โˆ’ ๐‘ฅ๐œ

.

Likewise, the limiting normal cone ๐‘๐ถ (๐‘ฅ) to ๐ถ โŠ‚ ๐‘‹ in a general innite-dimensionalBanach space ๐‘‹ is based on weak-โˆ— limits. Moreover, several proofs will be easier if weslightly relax the denition. Therefore, given Y โ‰ฅ 0 we rst introduce the Y-normal cone of๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— satisfying

(18.7) ๐‘ Y๐ถ (๐‘ฅ) โ‰”

{๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ lim sup๐ถ3๐‘ฅโ†’๐‘ฅ

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค Y

}.

The Frรฉchet normal cone is then simply ๐‘๐ถ (๐‘ฅ) โ‰” ๐‘ 0๐ถ (๐‘ฅ).

Now, the (basic or limiting or Mordukhovich) normal cone is dened as

(18.8) ๐‘๐ถ (๐‘ฅ) โ‰” w-โˆ—-lim sup๐‘ฅโ†’๐‘ฅ, Yโ†’ 0

๐‘ Y๐ถ (๐‘ฅ).

In other words, ๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ) if and only if there exist ๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ , Y๐‘˜โ†’ 0 and ๐‘ฅโˆ—๐‘˜โˆˆ ๐‘ Y๐‘˜

๐ถ (๐‘ฅ๐‘˜)such that ๐‘ฅโˆ—

๐‘˜โˆ—โ‡€ ๐‘ฅโˆ—.

In Gรขteaux smooth Banach spaces, we can x Y โ‰ก 0 in (18.8). Thus such spaces can betreated similarly to the nite-dimensional case in (18.3).

243

18 tangent and normal cones

Theorem 18.2. Let ๐‘‹ be a Gรขteaux smooth Banach space, ๐ถ โŠ‚ ๐‘‹ , and ๐‘ฅ โˆˆ ๐‘‹ . Then

(18.9) ๐‘๐ถ (๐‘ฅ) = w-โˆ—-lim sup๐‘ฅโ†’๐‘ฅ

๐‘๐ถ (๐‘ฅ).

Proof. Denote by ๐พ the set on the right hand side of (18.9). Then by the denition (18.8),clearly ๐‘๐ถ (๐‘ฅ) โŠƒ ๐พ . To show ๐‘๐ถ (๐‘ฅ) โŠ‚ ๐พ , let ๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ). Then (18.8) yields ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ , Y๐‘˜โ†’ 0,and ๐‘ฅโˆ—

๐‘˜โˆ—โ‡€ ๐‘ฅ๐‘˜ with ๐‘ฅโˆ—

๐‘˜โˆˆ ๐‘ Y๐‘˜

๐ถ (๐‘ฅ๐‘˜). We need to show that there exist some ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ and๐‘ฅโˆ—๐‘˜

โˆ—โ‡€ ๐‘ฅโˆ— with ๐‘ฅโˆ—๐‘˜โˆˆ ๐‘๐ถ (๐‘ฅ๐‘˜). Indeed, since ๐‘ Y

๐ถ = ๐œ•Y๐›ฟ๐ถ , by Corollary 17.12 applied to ๐น = ๐›ฟ๐ถ ,we have for any sequence ๐›ฟ๐‘˜โ†’ 0 that

๐‘ฅโˆ—๐‘˜ โˆˆ ๐‘ Y๐‘˜๐ถ (๐‘ฅ๐‘˜) โŠ‚

โ‹ƒ๐‘ฅโˆˆ๐”น(๐‘ฅ๐‘˜ ,๐›ฟ๐‘˜ )

๐‘๐ถ (๐‘ฅ) + ๐›ฟ๐‘˜๐”น๐‘‹ โˆ— (๐‘˜ โˆˆ โ„•).

In particular, there exist ๐‘ฅ๐‘˜ โˆˆ ๐”น(๐‘ฅ๐‘˜ , ๐›ฟ๐‘˜) and ๐‘ฅโˆ—๐‘˜ โˆˆ ๐‘๐ถ (๐‘ฅ๐‘˜) โˆฉ ๐”น(๐‘ฅโˆ—๐‘˜, ๐›ฟ๐‘˜), which implies that

๐‘ฅ๐‘˜ โ†’ ๐‘ฅ and ๐‘ฅโˆ—๐‘˜

โˆ—โ‡€ ๐‘ฅโˆ— as desired. ๏ฟฝ

Remark 18.3. Theorem 18.2 can be extended to Asplund spaces โ€“ in particular to reexive Banachspaces. The equivalence of (18.9) and (18.8) can, in fact, be used as a denition of an Asplund space.For details we refer to [Mordukhovich 2006, Theorem 2.35].

Finally, the Clarke tangent cone is dened as in nite dimensions as

(18.10) ๐‘‡๐ถ (๐‘ฅ) โ‰” lim inf๐ถ3๐‘ฅโ†’๐‘ฅ,๐œโ†’ 0

๐ถ โˆ’ ๐‘ฅ๐œ

=

{ฮ”๐‘ฅ โˆˆ ๐‘‹

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ for all ๐œ๐‘˜โ†’ 0, ๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ there exists ๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅwith (๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐‘˜)/๐œ๐‘˜ โ†’ ฮ”๐‘ฅ

}.

In innite-dimensional spaces, however, we in general only have the inclusion ๐‘‡๐ถ (๐‘ฅ) โŠ‚lim inf๐ถ3๐‘ฅโ†’๐‘ฅ ๐‘‡๐ถ (๐‘ฅ); see Corollary 18.20.

Remark 18.4 (a much too brief history of various cones). The (Bouligand) tangent cone was alreadyintroduced for smooth sets by Peano in 1908 [Peano 1908]; the term contingent cone is due toBouligand [Bouligand 1930]. The Clarke tangent cone (also called circatangent cone) was introducedin [Clarke 1973; Clarke 1975]; see also [Clarke 1990]. The limiting normal cone can be found in[Mordukhovich 1976], who stressed the need of dening (nonconvex) normal cones directly ratherthan as (necessarily convex) polars of tangent cones. The history of the Frรฉchet normal cone isharder to trace, but it has appeared in the literature as the polar of the tangent cone. We will seethat in nite dimensions, ๐‘๐ถ (๐‘ฅ) = ๐‘‡๐ถ (๐‘ฅ)โ—ฆ. In innite dimensions, ๐‘‡๐ถ (๐‘ฅ)โ—ฆ is sometimes called theDini normal cone and is in general not equal to the Frรฉchet normal cone.

We do not attempt to do full justice to the muddier parts of the historical development here, andrather refer to the accounts in [Dolecki & Greco 2011; Bigolin & Golo 2014] as well as [Rockafellar &Wets 1998, Commentary to Ch. 6] and [Mordukhovich 2018, Commentary to Ch. 1]. Various furthercones are also discussed in [Aubin & Frankowska 1990].

244

18 tangent and normal cones

18.2 basic relationships and properties

As seen in Example 18.1, the limiting normal cone ๐‘๐ถ (๐‘ฅ) can be larger than the Frรฉchetnormal cone ๐‘๐ถ (๐‘ฅ); conversely, the Clarke tangent cone ๐‘‡๐ถ (๐‘ฅ) is smaller than the tangentcone ๐‘‡๐ถ (๐‘ฅ); see Figure 18.1. These inclusions hold in general.

Theorem 18.5. Let ๐ถ โŠ‚ ๐‘‹ and ๐‘ฅ โˆˆ ๐‘‹ . Then(i) ๐‘‡๐ถ (๐‘ฅ) โŠ‚ ๐‘‡๐ถ (๐‘ฅ) โŠ‚ ๐‘‡๐‘ค๐ถ (๐‘ฅ);(ii) ๐‘๐ถ (๐‘ฅ) โŠ‚ ๐‘๐ถ (๐‘ฅ).

Proof. If we x the base point ๐‘ฅ as ๐‘ฅ in the denition (18.10) of๐‘‡๐ถ (๐‘ฅ), the tangent inclusion๐‘‡๐ถ (๐‘ฅ) โŠ‚ ๐‘‡๐ถ (๐‘ฅ) is clear from the denition (18.1) of ๐‘‡๐ถ (๐‘ฅ) as an outer limit and of ๐‘‡๐ถ (๐‘ฅ) asan inner limit. The inclusion ๐‘‡๐ถ (๐‘ฅ) โŠ‚ ๐‘‡๐‘ค๐ถ (๐‘ฅ) is likewise clear from the denition of ๐‘‡๐ถ (๐‘ฅ)as a strong outer limit and of ๐‘‡๐‘ค๐ถ (๐‘ฅ) is the corresponding weak outer limit.

The normal inclusion ๐‘๐ถ (๐‘ฅ) โŠ‚ ๐‘๐ถ (๐‘ฅ) follows from the denition (18.8) of ๐‘๐ถ (๐‘ฅ) as theouter limit of ๐‘ Y

๐ถ (๐‘ฅ) as ๐‘ฅ โ†’ ๐‘ฅ and Yโ†’ 0. (In nite dimensions, we can x Y = 0 in thisargument or refer to the equivalence of denitions shon in Theorem 18.2.) ๏ฟฝ

For a closed and convex set๐ถ , however, both the Frรฉchet and limiting normal cones coincidewith the convex normal cone dened in Lemma 4.8 (which we here denote by ๐œ•๐›ฟ๐ถ (๐‘ฅ) toavoid confusion).

Lemma 18.6. Let ๐ถ โŠ‚ ๐‘‹ be nonempty, closed, and convex. Then for all ๐‘ฅ โˆˆ ๐‘‹ ,(i) ๐‘๐ถ (๐‘ฅ) = ๐œ•๐›ฟ๐ถ (๐‘ฅ);(ii) if ๐‘‹ is Gรขteaux smooth (in particular, nite-dimensional), ๐‘๐ถ (๐‘ฅ) = ๐œ•๐›ฟ๐ถ (๐‘ฅ).

Proof. If ๐‘ฅ โˆ‰ ๐ถ , it follows from their denitions that all three cones are empty. We can thusassume that ๐‘ฅ โˆˆ ๐ถ .(i): If ๐‘ฅโˆ— โˆˆ ๐œ•๐›ฟ๐ถ (๐‘ฅ), we have by denition that

ใ€ˆ๐‘ฅโˆ—, ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ค 0 for all ๐‘ฆ โˆˆ ๐ถ.

Taking in particular ๐‘ฆ = ๐‘ฅ and passing to the limit ๐‘ฅ โ†’ ๐‘ฅ thus implies that ๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ).Conversely, let ๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ) and let ๐‘ฆ โˆˆ ๐ถ be arbitrary. Since ๐ถ is convex, this implies that๐‘ฅ๐‘ก โ‰” ๐‘ฅ + ๐‘ก (๐‘ฆ โˆ’ ๐‘ฅ) โˆˆ ๐ถ for any ๐‘ก โˆˆ (0, 1) as well. We also have that ๐‘ฅ๐‘ก โ†’ ๐‘ฅ for ๐‘ก โ†’ 0. From(18.2), it then follows by inserting the denition of ๐‘ฅ๐‘ก and dividing by ๐‘ก > 0 that

0 โ‰ฅ lim๐‘กโ†’0

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘ก โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ โ€–๐‘‹ =

ใ€ˆ๐‘ฅโˆ—, ๐‘ฆ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–๐‘‹ .

245

18 tangent and normal cones

and hence, since ๐‘ฆ โˆˆ ๐ถ was arbitrary, that ๐‘ฅโˆ— โˆˆ ๐œ•๐›ฟ๐ถ (๐‘ฅ).(ii): By Lemmas 2.5 and 6.8 and Theorem 6.11, ๐œ•๐›ฟ๐ถ is strong-to-weak-โˆ— outer semicontinuous,which by Theorem 18.5 and the Y โ‰ก 0 characterization of Theorem 18.2 implies that

๐‘๐ถ (๐‘ฅ) = w-โˆ—-lim sup๐‘ฅโ†’๐‘ฅ

๐‘๐ถ (๐‘ฅ) = w-โˆ—-lim sup๐‘ฅโ†’๐‘ฅ

๐œ•๐›ฟ๐ถ (๐‘ฅ) โŠ‚ ๐œ•๐›ฟ๐ถ (๐‘ฅ) = ๐‘๐ถ (๐‘ฅ) โŠ‚ ๐‘๐ถ (๐‘ฅ).

Hence ๐‘๐ถ (๐‘ฅ) = ๐‘๐ถ (๐‘ฅ). ๏ฟฝ

Note that convexity was only used for the second inclusion, and hence ๐œ•๐›ฟ๐ถ (๐‘ฅ) โŠ‚ ๐‘๐ถ (๐‘ฅ)always holds. In general, comparing (18.2) with (16.2), we have the following relation.

Corollary 18.7. Let ๐ถ โŠ‚ ๐‘‹ and ๐‘ฅ โˆˆ ๐‘‹ . Then ๐‘๐ถ (๐‘ฅ) = ๐œ•๐น๐›ฟ๐ถ (๐‘ฅ).

The next theorem lists some of the most basic properties of the various tangent and normalcones.

Theorem 18.8. Let ๐ถ โŠ‚ ๐‘‹ and ๐‘ฅ โˆˆ ๐‘‹ . Then(i) ๐‘‡๐ถ (๐‘ฅ), ๐‘‡๐ถ (๐‘ฅ), ๐‘๐ถ (๐‘ฅ), and ๐‘๐ถ (๐‘ฅ) are cones;(ii) ๐‘‡๐ถ (๐‘ฅ), ๐‘‡๐ถ (๐‘ฅ), and ๐‘ Y

๐ถ (๐‘ฅ) for every Y โ‰ฅ 0 are closed;

(iii) ๐‘‡๐ถ (๐‘ฅ) and ๐‘ Y๐ถ (๐‘ฅ) for every Y โ‰ฅ 0 are convex;

(iv) if ๐‘‹ is nite-dimensional, then ๐‘๐ถ (๐‘ฅ) is closed.

Proof. We argue the dierent properties for each type of cone in turn.

The Frรฉchet (Y-)normal cone: It is clear from the denition of ๐‘๐ถ (๐‘ฅ) that it is a cone, i.e.,that ๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ) implies that _๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ) for all _ > 0.

Let now Y โ‰ฅ 0 be arbitrary. Let ๐‘ฅโˆ—๐‘˜โˆˆ ๐‘ Y

๐ถ (๐‘ฅ) converge to some ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—. Also suppose๐ถ 3 ๐‘ฅโ„“ โ†’ ๐‘ฅ . Then for any โ„“, ๐‘˜ โˆˆ โ„•, we have by the Cauchyโ€“Schwarz inequality that

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅโ„“ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฅโ„“ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค ใ€ˆ๐‘ฅโˆ—

๐‘˜, ๐‘ฅโ„“ โˆ’ ๐‘ฅใ€‰๐‘‹

โ€–๐‘ฅโ„“ โˆ’ ๐‘ฅ โ€–๐‘‹ + โ€–๐‘ฅโˆ—๐‘˜ โˆ’ ๐‘ฅโˆ—โ€–๐‘‹

and thus thatlim supโ„“โ†’โˆž

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅโ„“ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฅโ„“ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค Y + โ€–๐‘ฅโˆ—๐‘˜ โˆ’ ๐‘ฅโˆ—โ€–๐‘‹ .

Since ๐‘˜ โˆˆ โ„• was arbitrary and ๐‘ฅโˆ—๐‘˜โ†’ ๐‘ฅโˆ—, we see that ๐‘ฅโˆ— โˆˆ ๐‘ Y

๐ถ (๐‘ฅ) and may conclude that๐‘ Y๐ถ (๐‘ฅ) is closed.

246

18 tangent and normal cones

To show convexity, take ๐‘ฅโˆ—1 , ๐‘ฅโˆ—2 โˆˆ ๐‘ Y๐ถ (๐‘ฅ) and let ๐‘ฅโˆ— โ‰” _๐‘ฅโˆ—1 + (1 โˆ’ _)๐‘ฅโˆ—2 for some _ โˆˆ (0, 1).

We then have

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅโ„“ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฅโ„“ โˆ’ ๐‘ฅ โ€–๐‘‹ = _

ใ€ˆ๐‘ฅโˆ—1 , ๐‘ฅโ„“ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฅโ„“ โˆ’ ๐‘ฅ โ€–๐‘‹ + (1 โˆ’ _) ใ€ˆ๐‘ฅ

โˆ—2, ๐‘ฅโ„“ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฅโ„“ โˆ’ ๐‘ฅ โ€–๐‘‹ .

Taking the limit ๐‘ฅโ„“ โ†’ ๐‘ฅ now yields ๐‘ฅโˆ— โˆˆ ๐‘ Y๐ถ (๐‘ฅ) and hence the convexity.

The limiting normal cone: If ๐‘‹ is nite-dimensional, the set ๐‘๐ถ (๐‘ฅ) is a closed cone as thestrong outer limit of the (closed) cones ๐‘๐ถ (๐‘ฅโ„“) as ๐‘ฅโ„“ โ†’ ๐‘ฅ ; see Lemma 6.2.

The tangent cone: By Lemma 6.2,๐‘‡๐ถ (๐‘ฅ) is closed as the outer limit of the sets๐ถ๐œ โ‰” (๐ถโˆ’๐‘ฅ)/๐œas ๐œโ†’ 0. To see that it is a cone, suppose ฮ”๐‘ฅ โˆˆ ๐‘‡๐ถ (๐‘ฅ). Then there exist by denition ๐œ๐‘˜โ†’ 0and ๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ such that (๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ)/๐œ๐‘˜ โ†’ ฮ”๐‘ฅ . Now, for any _ > 0, taking ๐œ๐‘˜ โ‰” _โˆ’1๐œ๐‘˜ , wehave (๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ)/๐œ๐‘˜ โ†’ _ฮ”๐‘ฅ . Hence _ฮ”๐‘ฅ โˆˆ ๐‘‡๐ถ (๐‘ฅ).The Clarke tangent cone: Finally, ๐‘‡๐ถ (๐‘ฅ) is a closed set through its denition as an innerlimit, cf. Corollary 6.3, as well as a cone by analogous arguments as for ๐‘‡๐ถ (๐‘ฅ). To seethat it is convex, take ฮ”๐‘ฅ 1,ฮ”๐‘ฅ2 โˆˆ ๐‘‡๐ถ (๐‘ฅ). Since ๐‘‡๐ถ (๐‘ฅ) is a cone, we only need to showthat ฮ”๐‘ฅ โ‰” ฮ”๐‘ฅ 1 + ฮ”๐‘ฅ2 โˆˆ ๐‘‡๐ถ (๐‘ฅ). By the denition of ๐‘‡๐ถ (๐‘ฅ) as an inner limit, we thereforehave to show that for any sequence ๐œ๐‘˜โ†’ 0 and any โ€œbase point sequenceโ€ ๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ ,there exist ๐‘ฅ๐‘˜ โˆˆ ๐ถ such that (๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐‘˜)/๐œ๐‘˜ โ†’ ฮ”๐‘ฅ . We do this by using the varying basepoint in the denition of ๐‘‡๐ถ (๐‘ฅ) to โ€œbridgeโ€ between the sequences generating ฮ”๐‘ฅ1 andฮ”๐‘ฅ2; see Figure 18.2. First, since ฮ”๐‘ฅ 1 โˆˆ ๐‘‡๐ถ (๐‘ฅ), by the very same denition of ๐‘‡๐ถ (๐‘ฅ) as aninner limit, we can nd for the base point sequence {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• points ๐ถ 3 ๐‘ฅ 1

๐‘˜โ†’ ๐‘‹ with

(๐‘ฅ 1๐‘˜โˆ’๐‘ฅ๐‘˜)/๐œ๐‘˜ โ†’ ฮ”๐‘ฅ 1. Continuing in the same way, since ฮ”๐‘ฅ2 โˆˆ ๐‘‡๐ถ (๐‘ฅ), we can now nd with

{๐‘ฅ 1๐‘˜}๐‘˜โˆˆโ„• as the base point sequence points ๐‘ฅ2

๐‘˜โˆˆ ๐ถ such that (๐‘ฅ2

๐‘˜โˆ’ ๐‘ฅ 1

๐‘˜)/๐œ๐‘˜ โ†’ ฮ”๐‘ฅ2. It follows

๐‘ฅ2๐‘˜โˆ’ ๐‘ฅ๐‘˜๐œ๐‘˜

=๐‘ฅ2๐‘˜โˆ’ ๐‘ฅ 1

๐‘˜

๐œ๐‘˜+ ๐‘ฅ

1๐‘˜โˆ’ ๐‘ฅ๐‘˜๐œ๐‘˜

โ†’ ฮ”๐‘ฅ 1 + ฮ”๐‘ฅ2 = ฮ”๐‘ฅ .

Thus {๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• = {๐‘ฅ2๐‘˜}๐‘˜โˆˆโ„• is the sequence we are looking for, showing that ฮ”๐‘ฅ โˆˆ ๐‘‡๐ถ (๐‘ฅ) and

hence that the Clarke tangent cone is convex. ๏ฟฝ

One might expect ๐‘‡๐‘ค๐ถ (๐‘ฅ) to be weakly closed and ๐‘๐ถ (๐‘ฅ) to be weak-โˆ— closed. However,this is not necessarily the case, since weak and weak-โˆ— inner and outer limits need not beclosed in the respective topologies. Consequently, ๐‘๐ถ may also not be (strong-to-weak-โˆ—)outer semicontinuous at a point ๐‘ฅ , as this would imply ๐‘๐ถ (๐‘ฅ) to be weak-โˆ— closed andhence closed. However, in nite dimensions we do have outer semicontinuity.

Corollary 18.9. If ๐‘‹ is nite-dimensional, then the mapping ๐‘ฅ โ†ฆโ†’ ๐‘๐ถ (๐‘ฅ) is outer semicontinu-ous.

247

18 tangent and normal cones

๐‘ฅ๐‘ฅ๐‘˜

ฮ”๐‘ฅ2๐‘˜

ฮ”๐‘ฅ1๐‘˜

๐‘ฅ1๐‘˜๐‘ฅ2๐‘˜

ฮ”๐‘ฅ1๐‘˜+ ฮ”๐‘ฅ2

๐‘˜

Figure 18.2: Illustration of the โ€œbridgingโ€ argument in the proof of Theorem 18.8. As ๐‘ฅ๐‘˜converges to ๐‘ฅ , the dashed arrows converge to the solid arrows,while the dottedarrow converges to the dash-dotted one, which depicts the point ฮ”๐‘ฅ 1

๐‘˜+ ฮ”๐‘ฅ2

๐‘˜

that we are trying to prove to be in ๐‘‡๐ถ (๐‘ฅ).

Proof. Let๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ and ๐‘ฅโˆ—๐‘˜โˆˆ ๐‘๐ถ (๐‘ฅ๐‘˜) with ๐‘ฅโˆ—๐‘˜ โ†’ ๐‘ฅโˆ—. Then for ๐›ฟ๐‘˜โ†’ 0, the denition (18.3)

provides ๐‘ฅ๐‘˜ โˆˆ ๐ถ and ๐‘ฅโˆ—๐‘˜โˆˆ ๐‘๐ถ (๐‘ฅ๐‘˜) with โ€–๐‘ฅโˆ—

๐‘˜โˆ’ ๐‘ฅโˆ—

๐‘˜โ€– โ‰ค ๐›ฟ๐‘˜ and โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐‘˜ โ€– โ‰ค ๐›ฟ๐‘˜ . It follows that

๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ and ๐‘ฅโˆ—๐‘˜โ†’ ๐‘ฅโˆ— with ๐‘ฅโˆ—

๐‘˜โˆˆ ๐‘๐ถ (๐‘ฅ๐‘˜). Thus by denition, ๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ), and hence

๐‘๐ถ is outer semicontinuous. ๏ฟฝ

18.3 polarity and limiting relationships

The tangent and normal cones satisfy various polarity relationships. To state these, recallfrom Section 1.2 for a general set ๐ถ โŠ‚ ๐‘‹ the denition of the polar cone

๐ถโ—ฆ = {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค 0 for all ๐‘ฅ โˆˆ ๐ถ}

as well as of the bipolar cone ๐ถโ—ฆโ—ฆ = (๐ถโ—ฆ)โ—ฆ โŠ‚ ๐‘‹ .

the fundamental cones

The relations in the following result will be crucial.

Lemma 18.10. Let ๐‘‹ be a Banach space, ๐ถ โŠ‚ ๐‘‹ , and ๐‘ฅ โˆˆ ๐‘‹ . Then(i) ๐‘๐ถ (๐‘ฅ) โŠ‚ ๐‘‡๐‘ค๐ถ (๐‘ฅ)โ—ฆ โŠ‚ ๐‘‡๐ถ (๐‘ฅ)โ—ฆ;(ii) if ๐‘‹ is reexive, then ๐‘๐ถ (๐‘ฅ) = ๐‘‡๐‘ค๐ถ (๐‘ฅ)โ—ฆ;

248

18 tangent and normal cones

(iii) if ๐‘‹ is nite-dimensional, then ๐‘๐ถ (๐‘ฅ) = ๐‘‡๐ถ (๐‘ฅ)โ—ฆ.

Proof. (i): We take ฮ”๐‘ฅ โˆˆ ๐‘‡๐‘ค๐ถ (๐‘ฅ) and ๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ). Then there exist ๐œ๐‘˜โ†’ 0 and ๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅsuch that (๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ)/๐œ๐‘˜ โ‡€ ฮ”๐‘ฅ weakly in ๐‘‹ . Thus

ใ€ˆ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ = lim sup๐‘˜โ†’โˆž

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅใ€‰๐‘‹๐œ๐‘˜

= lim sup๐‘˜โ†’โˆž

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹

ยท โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹๐œ๐‘˜

.

Since ๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ) and๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ ,we have by denition that lim sup๐‘˜โ†’โˆžใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘˜โˆ’๐‘ฅใ€‰๐‘‹/โ€–๐‘ฅ๐‘˜โˆ’๐‘ฅ โ€–๐‘‹ โ‰ค 0. Moreover, (๐‘ฅ๐‘˜ โˆ’๐‘ฅ)/๐œ๐‘˜ โ‡€ ฮ”๐‘ฅ implies that โ€–๐‘ฅ๐‘˜ โˆ’๐‘ฅ โ€–๐‘‹/๐œ๐‘˜ is bounded. Passing to thelimit, it therefore follows that ใ€ˆ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ โ‰ค 0. Since this holds for every ฮ”๐‘ฅ โˆˆ ๐‘‡๐‘ค๐ถ (๐‘ฅ), we seethat ๐‘ฅโˆ— โˆˆ ๐‘‡๐‘ค๐ถ (๐‘ฅ)โ—ฆ. This shows that ๐‘๐ถ (๐‘ฅ) โŠ‚ ๐‘‡๐‘ค๐ถ (๐‘ฅ)โ—ฆ. Since๐‘‡๐ถ (๐‘ฅ) โŠ‚ ๐‘‡๐‘ค๐ถ (๐‘ฅ) by Theorem 18.5,๐‘‡๐‘ค๐ถ (๐‘ฅ)โ—ฆ โŠ‚ ๐‘‡๐ถ (๐‘ฅ)โ—ฆ follows from Theorem 1.8.

(ii): Due to (i), we only need to show โ€œโŠƒโ€. Let ๐‘ฅโˆ— โˆ‰ ๐‘๐ถ (๐‘ฅ). Then, by denition, there exist๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ with

(18.11) lim๐‘˜โ†’โˆž

ใ€ˆ๐‘ฅโˆ—,ฮ”๐‘ฅ๐‘˜ใ€‰ > 0 for ฮ”๐‘ฅ๐‘˜ โ‰”๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ

โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹.

We now use the reexivity of ๐‘‹ and the Eberleinโ€“Smulyan Theorem 1.9 to pass to asubsequence, unrelabelled, such that that ฮ”๐‘ฅ๐‘˜ โ‡€ ฮ”๐‘ฅ for some ฮ”๐‘ฅ โˆˆ ๐‘‹ that by denitionsatises ฮ”๐‘ฅ โˆˆ ๐‘‡๐‘ค๐ถ (๐‘ฅ). However, passing to the limit in (18.11) now shows that ใ€ˆ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ > 0and hence that ๐‘ฅโˆ— โˆ‰ ๐‘‡๐‘ค๐ถ (๐‘ฅ)โ—ฆ.(iii): This is immediate from (ii) since ๐‘‡๐ถ (๐‘ฅ) = ๐‘‡๐‘ค๐ถ (๐‘ฅ) in nite-dimensional spaces. ๏ฟฝ

the limiting cones: preliminary lemmas

For a polarity relationship between the basic normal cone and the Clarke tangent cone, weneed to work signicantly harder. We start here with some preliminary lemmas sharedbetween the nite-dimensional and innite-dimensional setting, and then treat the two inthat order.

Lemma 18.11. Let ๐‘‹ be a reexive Banach space, ๐ถ โŠ‚ ๐‘‹ , and ๐‘ฅ โˆˆ ๐‘‹ . Then

(18.12) ๐‘‡๐ถ (๐‘ฅ) โŠ‚ lim inf๐ถ3๐‘ฅโ†’๐‘ฅ

๐‘‡๐‘ค๐ถ (๐‘ฅ).

If ๐‘‹ = โ„๐‘ , then๐‘‡๐ถ (๐‘ฅ) โŠ‚ lim inf

๐ถ3๐‘ฅโ†’๐‘ฅ๐‘‡๐ถ (๐‘ฅ).

249

18 tangent and normal cones

Proof. The case ๐‘‹ = โ„๐‘ trivially follows from (18.12). To prove (18.12), denote by ๐พ the seton its right-hand side. If ฮ”๐‘ฅ โˆ‰ ๐พ , then there exist Y > 0 and a sequence ๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ suchthat

(18.13) infฮ”๐‘ฅ๐‘˜โˆˆ๐‘‡๐‘ค๐ถ (๐‘ฅ๐‘˜ )

โ€–ฮ”๐‘ฅ๐‘˜ โˆ’ ฮ”๐‘ฅ โ€–๐‘‹ โ‰ฅ 3Y.

Fix ๐‘˜ โˆˆ โ„• and suppose that for some ๐œโ„“โ†’ 0 and ๐‘ฅโ„“ โˆˆ ๐ถ ,

(18.14) ๐‘ฅโ„“โˆ’๐‘ฅ๐‘˜๐œโ„“

โˆ’ ฮ”๐‘ฅ ๐‘‹โ‰ค 2Y (โ„“ โˆˆ โ„•).

Using the reexivity of ๐‘‹ and the Eberleinโ€“Smulyan Theorem 1.9, we then nd a further,unrelabelled, subsequence of {(๐‘ฅโ„“ , ๐œโ„“)}โ„“โˆˆโ„• such that (๐‘ฅโ„“ โˆ’ ๐‘ฅ๐‘˜)/๐œ๐‘˜โ„“ โ‡€ ฮ”๐‘ฅ๐‘˜ as โ„“ โ†’ โˆž forsome ฮ”๐‘ฅ๐‘˜ โˆˆ ๐‘‡๐‘ค๐ถ (๐‘ฅ๐‘˜) with โ€–ฮ”๐‘ฅ๐‘˜ โˆ’ ฮ”๐‘ฅ โ€–๐‘‹ โ‰ค 2Y, in contradiction to (18.13). We thus have

lim๐œโ†’ 0

inf๐‘ฅโˆˆ๐ถ

๐‘ฅโˆ’๐‘ฅ๐‘˜๐œ โˆ’ ฮ”๐‘ฅ ๐‘‹โ‰ฅ 2Y.

Since this holds for all ๐‘˜ โˆˆ โ„•, we can nd ๐œ๐‘˜ > 0 with ๐œ๐‘˜โ†’ 0 satisfying the inequality

lim inf๐‘˜โ†’โˆž

inf๐‘ฅโˆˆ๐ถ

๐‘ฅโˆ’๐‘ฅ๐‘˜๐œ๐‘˜โˆ’ ฮ”๐‘ฅ

๐‘‹โ‰ฅ Y

implying that ฮ”๐‘ฅ โˆ‰ ๐‘‡๐ถ (๐‘ฅ). Therefore (18.12) holds. ๏ฟฝ

Lemma 18.12. Let ๐‘‹ be a reexive and Gรขteaux smooth (or nite-dimensional) Banach space,๐ถ โŠ‚ ๐‘‹ , and ๐‘ฅ โˆˆ ๐‘‹ . Then

๐‘‡๐ถ (๐‘ฅ) โŠ‚ ๐‘๐ถ (๐‘ฅ)โ—ฆ.

Proof. Take ๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ) and ฮ”๐‘ฅ โˆˆ ๐‘‡๐ถ (๐‘ฅ). This gives by Theorem 18.2 (or (18.3) if ๐‘‹ isnite-dimensional) sequences ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ and ๐‘ฅโˆ—

๐‘˜โˆ—โ‡€ ๐‘ฅโˆ— with ๐‘ฅโˆ—

๐‘˜โˆˆ ๐‘๐ถ (๐‘ฅ๐‘˜). By Lemma 18.11,

we can nd for each ๐‘˜ โˆˆ โ„• a ฮ”๐‘ฅ๐‘˜ โˆˆ ๐‘‡๐‘ค๐ถ (๐‘ฅ๐‘˜) such that ฮ”๐‘ฅ๐‘˜ โ†’ ฮ”๐‘ฅ . Since ๐‘๐ถ (๐‘ฅ๐‘˜) = ๐‘‡๐‘ค๐ถ (๐‘ฅ๐‘˜)โ—ฆby Lemma 18.10 (ii) when ๐‘‹ is reexive, we have ใ€ˆ๐‘ฅโˆ—

๐‘˜,ฮ”๐‘ฅ๐‘˜ใ€‰๐‘‹ โ‰ค 0. Combining all these

observations, we obtain

ใ€ˆ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ = lim๐‘˜โ†’โˆž

(ใ€ˆ๐‘ฅโˆ—๐‘˜ ,ฮ”๐‘ฅ๐‘˜ใ€‰๐‘‹ + ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—๐‘˜ ,ฮ”๐‘ฅใ€‰๐‘‹ + ใ€ˆ๐‘ฅโˆ—๐‘˜ ,ฮ”๐‘ฅ โˆ’ ฮ”๐‘ฅ๐‘˜ใ€‰๐‘‹ )= lim๐‘˜โ†’โˆž

ใ€ˆ๐‘ฅโˆ—๐‘˜ ,ฮ”๐‘ฅ๐‘˜ใ€‰๐‘‹ โ‰ค 0.

Since ๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ) was arbitrary, we deduce that ฮ”๐‘ฅ โˆˆ ๐‘๐ถ (๐‘ฅ)โ—ฆ and hence the claim. ๏ฟฝ

250

18 tangent and normal cones

๐‘ฅ๐‘ฅ๐‘ฅ

๐‘ฅ + ๐‘ง๐‘ฅ + \๐‘ง๐‘ฅโˆ—๐‘ฅโˆ—

(a) By assumption, the interior of the ball around ๐‘ฅ + ๐‘ง of radius Y does not intersect ๐ถ (shaded). Inthis example, the point ๐‘ฅ โˆˆ ๐ถ intersects the boundary; however, it is not on the leading edge(thick lines) where the normal vector ๐‘ฅโˆ— would satisfy ใ€ˆ๐‘ง, ๐‘ฅโˆ—ใ€‰ โ‰ฅ Y. Reducing \ < 1 produces anintersecting point ๐‘ฅ on the leading edge.

๐‘ฅ

๐‘ฅ + \๐‘ง

๐‘‡๐ถ (๐‘ฅ)

๐‘ฅโˆ—

(b) The โ€œice cream coneโ€ emanating from ๐‘ฅ along the line [๐‘ฅ, ๐‘ฅ + \๐‘ง] with a ball of radius Y\ doesnot intersect๐ถ (light shading). From this it follows that the tangent cone๐‘‡๐ถ (๐‘ฅ) (incomplete darkshading) is at a distance Y from ๐‘ง

Figure 18.3: Geometric illustration of the construction in the proof of Lemma 18.13.

the limiting cones in finite dimensions

We now start our development of polarity relationships between the limiting cones, aswell as limiting relationships between the tangent and Clarke tangent cones. Our maintool will be the following โ€œice cream cone lemmaโ€, for which it is important that we endowโ„๐‘ with the Euclidean norm.

Lemma 18.13. Let ๐ถ โŠ‚ โ„๐‘ be closed and let ๐‘ฅ โˆˆ ๐ถ . Let ๐‘ง โˆˆ โ„๐‘ \ {0} and Y > 0 be such that

(18.15) int๐”น(๐‘ฅ + ๐‘ง, Y) โˆฉ๐ถ = โˆ….

Then for any Y โˆˆ (0, Y), there exists an ๐‘ฅ โˆˆ ๐ถ such that there exist

(i) \ โˆˆ (0, 1] satisfying โ€–(๐‘ฅ + \๐‘ง) โˆ’ (๐‘ฅ + ๐‘ง)โ€– โ‰ค Y and infฮ”๐‘ฅโˆˆ๐‘‡๐ถ (๐‘ฅ) โ€–ฮ”๐‘ฅ โˆ’ ๐‘งโ€– โ‰ฅ Y;(ii) ๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ) satisfying ใ€ˆ๐‘ฅโˆ—, ๐‘งใ€‰ โ‰ฅ Y and โ€–๐‘ฅโˆ—โ€– โ‰ค 1.

Proof. We dene the increasing real function ๐œ‘ (๐‘ก) โ‰”โˆš1 + ๐‘ก2 and ๐น,๐บ : โ„๐‘ ร—โ„ โ†’ โ„ by

๐น (๐‘ฅ, \ ) โ‰” ๐œ‘ (Y)\ + ๐œ‘ (โ€–(๐‘ฅ + \๐‘ง) โˆ’ (๐‘ฅ + ๐‘ง)โ€–) and ๐บ (๐‘ฅ, \ ) โ‰” ๐›ฟ๐ถ (๐‘ฅ) + ๐›ฟ [0,โˆž) (\ ).

251

18 tangent and normal cones

Then ๐น +๐บ is proper, coercive, and lower semicontinuous and hence admits a minimizer(๐‘ฅ, \ ) โˆˆ ๐ถร— [0,โˆž) by Theorem 2.1. (We illustrate the idea of such a minimizer geometricallyin Figure 18.3.) Let ๐‘ฆ := (๐‘ฅ + \๐‘ง) โˆ’ (๐‘ฅ + ๐‘ง).(i): We rst prove \ โˆˆ (0, 1]. Suppose \ = 0. Since ๐‘ฅ โˆˆ ๐ถ we obtain using (18.15) that

[๐น +๐บ] (๐‘ฅ, 0) = ๐œ‘ (โ€–๐‘ฅ โˆ’ (๐‘ฅ + ๐‘ง)โ€–) โ‰ฅ ๐œ‘ (Y) > ๐œ‘ (Y) = [๐น +๐บ] (๐‘ฅ, 1).This is a contradiction to (๐‘ฅ, 0) being a minimizer. Thus \ โ‰  0. Likewise,

๐œ‘ (Y)\ + ๐œ‘ (โ€–๐‘ฆ โ€–) = [๐น +๐บ] (๐‘ฅ, \ ) โ‰ค [๐น +๐บ] (๐‘ฅ, 1) = ๐œ‘ (Y),where both terms on the left-hand side are nonnegative. Hence \ โ‰ค 1. By the monotonicityof ๐œ‘ , this also veries the claim โ€–๐‘ฆ โ€– โ‰ค Y.We still need to prove the claim on the tangent cone. Since (๐‘ฅ, \ ) is a minimizer of ๐น +๐บ ,for any \ โ‰ฅ 0 and ๐‘ฅ โˆˆ ๐ถ we have

๐œ‘ (Y)\ + ๐œ‘ (โ€–๐‘ฆ โ€–) โ‰ค [๐น +๐บ] (๐‘ฅ, \ ) โ‰ค [๐น +๐บ] (๐‘ฅ, \ ) = ๐œ‘ (Y)\ + ๐œ‘ (โ€–๐‘ฆ โ€–) .Letting ๐‘ฆ โ‰” (๐‘ฅ + \๐‘ง) โˆ’ (๐‘ฅ + ๐‘ง) and using rst this inequality and then the convexity of ๐œ‘with ๐œ‘โ€ฒ(๐‘ก) = ๐‘ก/๐œ‘ (๐‘ก) โ‰ค 1 for all ๐‘ก โ‰ฅ 0 yields

๐œ‘ (Y) (\ โˆ’ \ ) โ‰ค ๐œ‘ (โ€–๐‘ฆ โ€–) โˆ’ ๐œ‘ (โ€–๐‘ฆ โ€–)โ‰ค ๐œ‘โ€ฒ(โ€–๐‘ฆ โ€–) (โ€–๐‘ฆ โ€– โˆ’ โ€–๐‘ฆ โ€–)โ‰ค โ€–๐‘ฆ โˆ’ ๐‘ฆ โ€– = โ€–๐‘ฅ โˆ’ ๐‘ฅ โˆ’ (\ โˆ’ \ )๐‘งโ€–.

Dividing by ๐œ = \ โˆ’ \ for \ โˆˆ [0, \ ), we obtain that Y โ‰ค ๐œ‘ (Y) โ‰ค ๐‘ฅโˆ’๐‘ฅ

๐œ โˆ’ ๐‘ง . Taking the

inmum over ๐‘ฅ โˆˆ ๐ถ and ๐œ โˆˆ (0, \ ] thus yields infฮ”๐‘ฅโˆˆ๐‘‡๐ถ (๐‘ฅ) โ€–ฮ”๐‘ฅ โˆ’ ๐‘งโ€– โ‰ฅ Y.(ii): By Lemma 3.4 (iv), ๐น is convex. Furthermore, int(dom ๐น ) = โ„๐‘+1 so that ๐น is Lipschitznear (๐‘ฅ, \ ) by Theorem 3.13. Using Theorems 4.6, 4.17 and 4.19 with ๐พ (๐‘ฅ, \ ) โ‰” ๐‘ฅ + \๐‘ง, itfollows that

๐œ•๐น (๐‘ฅ, \ ) ={(

๐œ‘โ€ฒ(โ€–๐‘ฆ โ€–)๐‘ฆโˆ—๐œ‘ (Y) + ๐œ‘โ€ฒ(โ€–๐‘ฆ โ€–)ใ€ˆ๐‘ง, ๐‘ฆโˆ—ใ€‰

) ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ ใ€ˆ๐‘ฆโˆ—, ๐‘ฆใ€‰ = โ€–๐‘ฆ โ€–, โ€–๐‘ฆโˆ—โ€– = 1 if ๐‘ฆ โ‰  0โ€–๐‘ฆโˆ—โ€– โ‰ค 1 if ๐‘ฆ = 0

}.(18.16)

Since โ„๐‘ endowed with the euclidean norm is a Hilbert space, ๐‘ฅ โ†ฆโ†’ โ€–๐‘ฅ โ€–2 is Gรขteaux dier-entiable by Example 17.6 (i) and Lemma 17.7. Hence ๐œ•๐น (๐‘ฅ, \ ) is a singleton, and therefore ๐นis Gรขteaux dierentiable at (๐‘ฅ, \ ) due to Lemma 13.7 and Theorem 13.8. We can thus applythe Fermat principle (Theorem 16.2) and the Frรฉchet sum rule (Corollary 17.3) to deduce0 โˆˆ ๐œ•๐น๐น (๐‘ฅ, \ ) + ๐œ•๐น๐บ (๐‘ฅ, \ ). Since \ > 0, we have ๐œ•๐น๐บ (๐‘ฅ, \ ) = ๐‘๐ถ (๐‘ฅ) ร— {0} by Corollary 18.7,which implies that

(18.17) โˆ’ ๐œ‘โ€ฒ(โ€–๐‘ฆ โ€–)๐‘ฆโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ) and ๐œ‘ (Y) + ๐œ‘โ€ฒ(โ€–๐‘ฆ โ€–)ใ€ˆ๐‘ง, ๐‘ฆโˆ—ใ€‰ = 0.

Since ๐œ‘ (Y) > 0, the second equation in (18.17) yields ๐œ‘โ€ฒ(โ€–๐‘ฆ โ€–) โ‰  0 as well. As ๐œ‘โ€ฒ(๐‘ก) โˆˆ (0, 1)and ๐œ‘ (๐‘ก) > ๐‘ก for all ๐‘ก > 0, we can set ๐‘ฅโˆ— โ‰” โˆ’๐œ‘โ€ฒ(โ€–๐‘ฆ โ€–)๐‘ฆโˆ— to obtain ๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ) with โ€–๐‘ฅโˆ—โ€– โ‰ค 1and ใ€ˆ๐‘ง, ๐‘ฅโˆ—ใ€‰ = ๐œ‘ (Y)

๐œ‘ โ€ฒ(โ€–๐‘ฆ โ€–) โ‰ฅ ๐œ‘ (Y) โ‰ฅ Y . ๏ฟฝ

252

18 tangent and normal cones

The following consequence of the ice cream cone lemma will be useful for several polarityrelations. We call a set ๐ถ closed near ๐‘ฅ โˆˆ ๐ถ , if there exists a ๐›ฟ > 0 such that ๐ถ โˆฉ ๐”น(๐‘ฅ, ๐›ฟ) isclosed.

Lemma 18.14. Let๐ถ โŠ‚ โ„๐‘ be closed near ๐‘ฅ . If ๐‘ง โˆ‰ ๐‘‡๐ถ (๐‘ฅ), then there exist Y > 0 and a sequence๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ such that for all ๐‘˜ โˆˆ โ„•,

(i) infฮ”๐‘ฅ๐‘˜โˆˆ๐‘‡๐ถ (๐‘ฅ๐‘˜ ) โ€–ฮ”๐‘ฅ๐‘˜ โˆ’ ๐‘งโ€– โ‰ฅ Y;(ii) there exists ๐‘ฅโˆ—

๐‘˜โˆˆ ๐‘๐ถ (๐‘ฅ๐‘˜) with โ€–๐‘ฅโˆ—

๐‘˜โ€– โ‰ค 1 and ใ€ˆ๐‘ฅโˆ—

๐‘˜, ๐‘งใ€‰ โ‰ฅ Y.

Proof. First, ๐‘ง โˆ‰ ๐‘‡๐ถ (๐‘ฅ) implies by (18.10) the existence of Y > 0, ๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ , and ๐œ๐‘˜โ†’ 0such that

inf๐‘ฅโˆˆ๐ถ

๐‘ฅโˆ’๐‘ฅ๐‘˜๐œ๐‘˜โˆ’ ๐‘ง

โ‰ฅ Y (๐‘˜ โˆˆ โ„•),implying that

int๐”น(๐‘ฅ๐‘˜ + ๐œ๐‘˜๐‘ง, ๐œ๐‘˜Y) โˆฉ๐ถ = โˆ….By taking ๐œ๐‘˜ small enough โ€“ i.e., ๐‘˜ โˆˆ โ„• large enough โ€“ we may without loss of generalityassume that๐ถ is closed. For any Y โˆˆ (0, Y) and every ๐‘˜ โˆˆ โ„•, Lemma 18.13 now yields ๐‘ฅ๐‘˜ โˆˆ ๐ถand \๐‘˜ โˆˆ (0, 1] satisfying(iโ€™) โ€–(๐‘ฅ๐‘˜ + \๐‘˜๐œ๐‘˜๐‘ง) โˆ’ (๐‘ฅ + ๐œ๐‘˜๐‘ง)โ€– โ‰ค Y๐œ๐‘˜ and infฮ”๐‘ฅ๐‘˜โˆˆ๐‘‡๐ถ (๐‘ฅ๐‘˜ ) โ€–ฮ”๐‘ฅ๐‘˜ โˆ’ ๐œ๐‘˜๐‘งโ€– โ‰ฅ Y๐œ๐‘˜ ;(iiโ€™) there exists an ๐‘ฅโˆ—

๐‘˜โˆˆ ๐‘๐ถ (๐‘ฅ๐‘˜) such that ใ€ˆ๐‘ฅโˆ—

๐‘˜, ๐œ๐‘˜๐‘งใ€‰ โ‰ฅ ๐œ๐‘˜ Y and โ€–๐‘ฅโˆ—

๐‘˜โ€– โ‰ค 1.

We readily obtain (i) from (iโ€™) and (ii) from (iiโ€™) Since (iโ€™) also shows that ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ as ๐œ๐‘˜โ†’ 0,this nishes the proof. ๏ฟฝ

We can now show the converse inclusion of Lemma 18.12 when the set is closed near ๐‘ฅ .

Theorem 18.15. If ๐ถ โŠ‚ โ„๐‘ is closed near ๐‘ฅ , then

๐‘‡๐ถ (๐‘ฅ) = ๐‘๐ถ (๐‘ฅ)โ—ฆ.

Proof. By Lemma 18.12, we only need to prove๐‘‡๐ถ (๐‘ฅ) โŠƒ ๐‘๐ถ (๐‘ฅ)โ—ฆ. We argue by contraposition.Let ๐‘ง โˆ‰ ๐‘‡๐ถ (๐‘ฅ). Then Lemma 18.14 yields a sequence {๐‘ฅโˆ—

๐‘˜}๐‘˜โˆˆโ„• โŠ‚ โ„๐‘ such that ๐‘ฅโˆ—

๐‘˜โˆˆ ๐‘๐ถ (๐‘ฅ๐‘˜)

for ๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ and ใ€ˆ๐‘ฅโˆ—๐‘˜, ๐‘งใ€‰ โ‰ฅ Y > 0 as well as โ€–๐‘ฅโˆ—

๐‘˜โ€– โ‰ค 1. Since {๐‘ฅโˆ—

๐‘˜}๐‘˜โˆˆโ„• is bounded, we

can extract a subsequence that converges to some ๐‘ฅโˆ— โˆˆ โ„๐‘ . By denition of the limitingnormal cone, ๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ). Moreover, ใ€ˆ๐‘ฅโˆ—, ๐‘งใ€‰ โ‰ฅ Y > 0. This provides, as required, that๐‘ง โˆ‰ ๐‘๐ถ (๐‘ฅ)โ—ฆ. ๏ฟฝ

253

18 tangent and normal cones

the limiting cones in infinite dimensions

We now repeat the arguments above in innite dimensions, however, we need extra careand extra assumptions. Besides reexivity (to obtain weak-โˆ— compactness from Eberleinโ€“Smulyan Theorem 1.9) and Gรขteaux smoothness (to obtain dierentiability of the norm),we need to use the approximate Fermat principle of Theorem 17.13 since exact projectionsto general sets ๐ถ may not exist; compare Theorem 17.14. This introduces Y-normal conesinto the proof. The geometric ideas of the proof, however, are the same as illustrated inFigure 18.3.

Lemma 18.16. Let ๐‘‹ be a Banach space, ๐ถ โŠ‚ ๐‘‹ be closed, and ๐‘ฅ โˆˆ ๐ถ . Let ๐‘ง โˆˆ ๐‘‹ \ {0} andY > 0 be such that

(18.18) int๐”น(๐‘ฅ + ๐‘ง, Y) โˆฉ๐ถ = โˆ….

Then for any Y โˆˆ (0, Y) and ๐œŒ > 0, there exists ๐‘ฅ โˆˆ ๐ถ such that there exist

(i) \ โˆˆ (0, 1] such that โ€–(๐‘ฅ + \๐‘ง) โˆ’ (๐‘ฅ + ๐‘ง)โ€–๐‘‹ โ‰ค Y and infฮ”๐‘ฅโˆˆ๐‘‡๐ถ (๐‘ฅ) โ€–ฮ”๐‘ฅ โˆ’ ๐‘งโ€–๐‘‹ โ‰ฅ Y;(ii) if ๐‘‹ is Gรขteaux smooth, ๐‘ฅโˆ— โˆˆ ๐‘ ๐œŒ

๐ถ (๐‘ฅ) such that ใ€ˆ๐‘ฅโˆ—, ๐‘งใ€‰๐‘‹ โ‰ฅ Y and โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ‰ค 1.

Proof. We dene the convex and increasing real function ๐œ‘ (๐‘ก) โ‰”โˆš1 + ๐‘ก2 and pick arbitrary

(18.19) Y โˆˆ (Y, Y), 0 < ๐œŒ <๐œ‘ (Y) โˆ’ Y2 + Y , and 0 < ๐›ฟ < ๐œ‘ (Y) โˆ’ ๐œ‘ (Y).

The upper bound on ๐œŒ is without loss of generality for (ii) because ๐‘ ๐œŒ๐ถ (๐‘ฅ) โŠ‚ ๐‘

๐œŒ โ€ฒ๐ถ (๐‘ฅ) for

๐œŒโ€ฒ โ‰ฅ ๐œŒ . Then we dene ๐น,๐บ : ๐‘‹ ร—โ„ โ†’ โ„ by

๐น (๐‘ฅ, \ ) โ‰” ๐œ‘ (Y)\ + ๐œ‘ (โ€–(๐‘ฅ + \๐‘ง) โˆ’ (๐‘ฅ + ๐‘ง)โ€–๐‘‹ ) and ๐บ (๐‘ฅ, \ ) โ‰” ๐›ฟ๐ถ (๐‘ฅ) + ๐›ฟ [0,โˆž) (\ ).

The function ๐น + ๐บ is proper and coercive, hence inf (๐น + ๐บ) > โˆ’โˆž. However, it maynot admit a minimizer. Nevertheless, the approximate Fermat principle of Theorem 17.13produces an approximate minimizer (๐‘ฅ, \ ) โˆˆ ๐ถ ร— [0,โˆž) with(a) [๐น +๐บ] (๐‘ฅ, \ ) โ‰ค inf [๐น +๐บ] + ๐›ฟ ,(b) [๐น +๐บ] (๐‘ฅ, \ ) < [๐น +๐บ] (๐‘ฅ, \ ) + ๐œŒ โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ + ๐œŒ |\ โˆ’ \ | for all (๐‘ฅ, \ ) โ‰  (๐‘ฅ, \ ), and(c) 0 โˆˆ ๐œ•๐œŒ [๐น +๐บ] (๐‘ฅ, \ ).

Let again ๐‘ฆ := (๐‘ฅ + \๐‘ง) โˆ’ (๐‘ฅ + ๐‘ง).(i): We rst prove \ โˆˆ (0, ๐œ‘ (Y)๐œ‘ (Y) ], which will in particular imply that \ โˆˆ (0, 1 + Y). Suppose\ = 0. Since ๐‘ฅ โˆˆ ๐ถ , using (18.18) and the convexity of ๐œ‘ , we obtain

[๐น +๐บ] (๐‘ฅ, 0) โˆ’ ๐›ฟ = ๐œ‘ (โ€–๐‘ฅ โˆ’ (๐‘ฅ + ๐‘ง)โ€–๐‘‹ ) โˆ’ ๐›ฟ โ‰ฅ ๐œ‘ (Y) โˆ’ ๐›ฟ > ๐œ‘ (Y) = [๐น +๐บ] (๐‘ฅ, 1)

254

18 tangent and normal cones

in contradiction to (a). Thus \ โ‰  0. Likewise,

๐œ‘ (Y)\ + ๐œ‘ (โ€–๐‘ฆ โ€–๐‘‹ ) = [๐น +๐บ] (๐‘ฅ, \ ) โ‰ค [๐น +๐บ] (๐‘ฅ, 1) + ๐›ฟ = ๐œ‘ (Y) + ๐›ฟ < ๐œ‘ (Y).

where both terms on the left-hand side are nonnegative. Hence \ โ‰ค ๐œ‘ (Y)๐œ‘ (Y) . By monotonicity

of ๐œ‘ , this also veries the claim โ€–๐‘ฆ โ€–๐‘‹ โ‰ค Y.We still need to prove the claim on the tangent cone. Letting ๐‘ฆ โ‰” (๐‘ฅ + \๐‘ง) โˆ’ (๐‘ฅ + ๐‘ง), werearrange (b) as

(18.20) ๐œ‘ (Y) (\ โˆ’ \ ) โˆ’ ๐œŒ |\ โˆ’ \ | โ‰ค ๐œ‘ (โ€–๐‘ฆ โ€–๐‘‹ ) โˆ’ ๐œ‘ (โ€–๐‘ฆ โ€–๐‘‹ ) + ๐œŒ โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹Using the convexity of ๐œ‘ , we also have

๐œ‘ (โ€–๐‘ฆ โ€–๐‘‹ ) โˆ’ ๐œ‘ (โ€–๐‘ฆ โ€–๐‘‹ ) โ‰ค 1๐œ‘ (โ€–๐‘ฆ โ€–๐‘‹ ) (โ€–๐‘ฆ โ€–๐‘‹ โˆ’ โ€–๐‘ฆ โ€–๐‘‹ ) โ‰ค โ€–๐‘ฆ โˆ’ ๐‘ฆ โ€–๐‘‹ = โ€–๐‘ฅ โˆ’ ๐‘ฅ โˆ’ (\ โˆ’ \ )๐‘งโ€–๐‘‹

Further estimating โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค โ€–๐‘ฅ โˆ’ ๐‘ฅ โˆ’ (\ โˆ’ \ )๐‘งโ€–๐‘‹ + |\ โˆ’ \ |, (18.20) now yields

[๐œ‘ (Y) โˆ’ 2๐œŒ] (\ โˆ’ \ ) โ‰ค (1 + ๐œŒ)โ€–๐‘ฅ โˆ’ ๐‘ฅ โˆ’ (\ โˆ’ \ )๐‘งโ€–๐‘‹ (\ โˆˆ [0, \ ), ๐‘ฅ โˆˆ ๐ถ).Dividing by (1 + ๐œŒ) (\ โˆ’ \ ) and using (18.19) (for the rst inequality), we obtain that

Y โ‰ค ๐œ‘ (Y) โˆ’ 2๐œŒ1 + ๐œŒ โ‰ค inf

๐‘ฅโˆˆ๐ถ, \โˆˆ[0,\ )

๐‘ฅ โˆ’ ๐‘ฅ\ โˆ’ \ โˆ’ ๐‘ง

๐‘‹

.

This shows infฮ”๐‘ฅโˆˆ๐‘‡๐ถ (๐‘ฅ) โ€–ฮ”๐‘ฅ โˆ’ ๐‘งโ€–๐‘‹ โ‰ฅ Y.(ii): By Lemma 3.4 (iv), ๐น is convex. Furthermore int(dom ๐น ) = ๐‘‹ ร— โ„, and hence ๐น isLipschitz near (๐‘ฅ, \ ) by Theorem 3.13. Using Theorems 4.6, 4.17 and 4.19 with ๐พ (๐‘ฅ, \ ) โ‰”๐‘ฅ + \๐‘ง, it follows that

๐œ•๐น (๐‘ฅ, \ ) ={(

๐œ‘โ€ฒ(โ€–๐‘ฆ โ€–๐‘‹ )๐‘ฆโˆ—๐œ‘ (Y) + ๐œ‘โ€ฒ(โ€–๐‘ฆ โ€–๐‘‹ )ใ€ˆ๐‘ง, ๐‘ฆโˆ—ใ€‰๐‘‹

) ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ ใ€ˆ๐‘ฆโˆ—, ๐‘ฆใ€‰๐‘‹ = โ€–๐‘ฆ โ€–๐‘‹ , โ€–๐‘ฆโˆ—โ€–๐‘‹ โˆ— = 1 if ๐‘ฆ โ‰  0โ€–๐‘ฆโˆ—โ€–๐‘‹ โˆ— โ‰ค 1 if ๐‘ฆ = 0

}.(18.21)

Again, ๐œ•๐น (๐‘ฅ, \ ) is a singleton by Lemma 17.7 and the assumption that ๐‘‹ is Gรขteaux smooth.

We can thus apply the Y-sum rule (Lemma 17.2) in (c) to deduce 0 โˆˆ ๐œ•๐น (๐‘ฅ, \ ) + ๐œ•๐œŒ๐บ (๐‘ฅ, \ ).Since \ > 0, we have ๐œ•๐œŒ๐บ (๐‘ฅ, \ ) = ๐‘ ๐œŒ

๐ถ (๐‘ฅ) ร— {0}, which implies that

(18.22) โˆ’ ๐œ‘โ€ฒ(โ€–๐‘ฆ โ€–๐‘‹ )๐‘ฆโˆ— โˆˆ ๐‘ ๐œŒ๐ถ (๐‘ฅ) and ๐œ‘ (Y) + ๐œ‘โ€ฒ(โ€–๐‘ฆ โ€–๐‘‹ )ใ€ˆ๐‘ง, ๐‘ฆโˆ—ใ€‰๐‘‹ = 0.

Since ๐œ‘ (Y) > 0, the second equation in (18.22) yields ๐œ‘โ€ฒ(โ€–๐‘ฆ โ€–๐‘‹ ) โ‰  0 as well. As ๐œ‘โ€ฒ(๐‘ก) โˆˆ (0, 1)and ๐œ‘ (๐‘ก) > ๐‘ก for all ๐‘ก > 0, we can set ๐‘ฅโˆ— โ‰” โˆ’๐œ‘โ€ฒ(โ€–๐‘ฆ โ€–๐‘‹ )๐‘ฆโˆ— to obtain ๐‘ฅโˆ— โˆˆ ๐‘

๐œŒ๐ถ (๐‘ฅ) with

โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ‰ค 1 and ใ€ˆ๐‘ง, ๐‘ฅโˆ—ใ€‰๐‘‹ = ๐œ‘ (Y)๐œ‘ โ€ฒ(โ€–๐‘ฆ โ€–๐‘‹ ) โ‰ฅ ๐œ‘ (Y) โ‰ฅ Y . ๏ฟฝ

Remark 18.17. If ๐‘‹ is in addition reexive, we can use the Eberleinโ€“Smulyan Theorem 1.9 to pass tothe limit as ๐œŒโ†’ 0 in Lemma 18.16 and produce ๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ) satisfying the other claims of the lemma.

255

18 tangent and normal cones

Lemma 18.18. Let ๐‘‹ be a Banach space and ๐ถ โŠ‚ ๐‘‹ be closed near ๐‘ฅ โˆˆ ๐ถ . If ๐‘ง โˆ‰ ๐‘‡๐ถ (๐‘ฅ), thenthere exist Y > 0 and a sequence ๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ such that for all ๐‘˜ โˆˆ โ„•,

(i) infฮ”๐‘ฅ๐‘˜โˆˆ๐‘‡๐ถ (๐‘ฅ๐‘˜ ) โ€–ฮ”๐‘ฅ๐‘˜ โˆ’ ๐‘งโ€–๐‘‹ โ‰ฅ Y;(ii) if ๐‘‹ is Gรขteaux smooth, there exists ๐‘ฅโˆ—

๐‘˜โˆˆ ๐‘๐ถ (๐‘ฅ๐‘˜) with โ€–๐‘ฅโˆ—

๐‘˜โ€–๐‘‹ โˆ— โ‰ค 1 and ใ€ˆ๐‘ฅโˆ—

๐‘˜, ๐‘งใ€‰๐‘‹ โˆ— โ‰ฅ Y.

Proof. The assumption ๐‘ง โˆ‰ ๐‘‡๐ถ (๐‘ฅ) implies by (18.10) the existence of Y > 0,๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ , and๐œ๐‘˜โ†’ 0 such that

inf๐‘ฅโˆˆ๐ถ

๐‘ฅโˆ’๐‘ฅ๐‘˜๐œ๐‘˜โˆ’ ๐‘ง

๐‘‹โ‰ฅ Y (๐‘˜ โˆˆ โ„•).

This implies thatint๐”น(๐‘ฅ๐‘˜ + ๐œ๐‘˜๐‘ง, ๐œ๐‘˜Y) โˆฉ๐ถ = โˆ….

Since the argument is local, by taking ๐œ๐‘˜ small enough โ€“ i.e., ๐‘˜ โˆˆ โ„• large enough โ€“ wemay without loss of generality assume that ๐ถ is closed. For any Y โˆˆ (0, Y) and every ๐‘˜ โˆˆ โ„•,Lemma 18.16 now produces ๐‘ฅ๐‘˜ โˆˆ ๐ถ and \๐‘˜ โˆˆ (0, 1] satisfying(iโ€™) โ€–(๐‘ฅ๐‘˜ + \๐‘˜๐œ๐‘˜๐‘ง) โˆ’ (๐‘ฅ + ๐œ๐‘˜๐‘ง)โ€–๐‘‹ โ‰ค Y๐œ๐‘˜ and infฮ”๐‘ฅ๐‘˜โˆˆ๐‘‡๐ถ (๐‘ฅ๐‘˜ ) โ€–ฮ”๐‘ฅ๐‘˜ โˆ’ ๐œ๐‘˜๐‘งโ€–๐‘‹ โ‰ฅ Y๐œ๐‘˜ ;(iiโ€™) if ๐‘‹ is Gรขteaux smooth, there exists ๐‘ฅโˆ—

๐‘˜โˆˆ ๐‘ Y๐œ๐‘˜

๐ถ (๐‘ฅ๐‘˜) such that ใ€ˆ๐‘ฅโˆ—๐‘˜, ๐œ๐‘˜๐‘งใ€‰๐‘‹ โ‰ฅ ๐œ๐‘˜ Y and

โ€–๐‘ฅโˆ—๐‘˜โ€–๐‘‹ โˆ— โ‰ค 1.

We readily obtain (i) from (iโ€™) and (ii) from (iiโ€™). Since (iโ€™) also shows that ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ as ๐œ๐‘˜โ†’ 0,this nishes the proof. ๏ฟฝ

Theorem 18.19. Let ๐‘‹ be a reexive and Gรขteaux smooth Banach space and let ๐ถ โŠ‚ ๐‘‹ beclosed near ๐‘ฅ โˆˆ ๐ถ . Then

๐‘‡๐ถ (๐‘ฅ) = ๐‘๐ถ (๐‘ฅ)โ—ฆ.

Proof. By Lemma 18.12, we only need to prove ๐‘‡๐ถ (๐‘ฅ) โŠƒ ๐‘๐ถ (๐‘ฅ)โ—ฆ. Let ๐‘ง โˆ‰ ๐‘‡๐ถ (๐‘ฅ). ThenLemma 18.18 yields a sequence {๐‘ฅโˆ—

๐‘˜}๐‘˜โˆˆโ„• โŠ‚ ๐”น๐‘‹ โˆ— such that ๐‘ฅโˆ—

๐‘˜โˆˆ ๐‘๐ถ (๐‘ฅ๐‘˜) and ใ€ˆ๐‘ฅโˆ—

๐‘˜, ๐‘งใ€‰๐‘‹ โ‰ฅ Y.

Since ๐‘‹ is reexive, ๐‘‹ โˆ— is reexive as well, and so we can apply Theorem 1.9 to extract asubsequence of {๐‘ฅโˆ—

๐‘˜}๐‘˜โˆˆโ„• that converges weakly and thus, again by reexivity, also weakly-โˆ—

to some ๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ) (by denition of the limiting normal cone) with ใ€ˆ๐‘ฅโˆ—, ๐‘งใ€‰๐‘‹ โ‰ฅ Y > 0. ๏ฟฝ

the clarke tangent cone

We can now show the promised alternative characterization of the Clarke tangent cone๐‘‡๐ถ (๐‘ฅ) as the inner limit of tangent cones.

256

18 tangent and normal cones

Corollary 18.20. Let ๐‘‹ be a reexive Banach space and let ๐ถ โŠ‚ ๐‘‹ be closed near ๐‘ฅ โˆˆ ๐‘‹ . Then

(18.23) lim inf๐ถ3๐‘ฅโ†’๐‘ฅ

๐‘‡๐ถ (๐‘ฅ) โŠ‚ ๐‘‡๐ถ (๐‘ฅ) โŠ‚ lim inf๐ถ3๐‘ฅโ†’๐‘ฅ

๐‘‡๐‘ค๐ถ (๐‘ฅ).

In particular, if ๐‘‹ is nite-dimensional, then

๐‘‡๐ถ (๐‘ฅ) = lim inf๐ถ3๐‘ฅโ†’๐‘ฅ

๐‘‡๐ถ (๐‘ฅ).

Proof. We have already proved the second inclusion of (18.23) in Lemma 18.11. For the rstinclusion, suppose ๐‘ง โˆ‰ ๐‘‡๐ถ (๐‘ฅ). Then Lemma 18.18 yields an Y > 0 and a sequence๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅsuch that infฮ”๐‘ฅ๐‘˜โˆˆ๐‘‡๐ถ (๐‘ฅ๐‘˜ ) โ€–ฮ”๐‘ฅ๐‘˜ โˆ’๐‘งโ€–๐‘‹ โ‰ฅ Y for all ๐‘˜ . This shows that ๐‘ง โˆ‰ lim inf๐ถ3๐‘ฅโ†’๐‘ฅ ๐‘‡๐ถ (๐‘ฅ). ๏ฟฝ

Remark 18.21. Lemma 18.18 and thus the rst inclusion of (18.23) do not actually require the reexivityof ๐‘‹ . In contrast , Lemma 18.11 and thus the second inclusion of (18.23) do not require the localclosedness assumption. Besides ๐‘‹ being reexive, it holds more generally if ๐‘‹ has the Radonโ€“Rieszproperty and is Frรฉchet smooth; see [Mordukhovich 2006, Theorem 1.9] and compare Remark 17.5.

18.4 regularity

It stands to reason that without any assumptions on the set ๐ถ โŠ‚ ๐‘‹ such as convexity,there is little hope of obtaining precise characterizations or exact transformation rulesfor the various cones. Similarly, precise characterizations or exact calculus rules for thederivatives of set-valued mappings โ€“ which, respectively, we will derive from the formerโ€“ require strong assumptions on these mappings. This is especially true of the limitingcones. As betting the introductory character of this textbook, we will therefore onlydevelop calculus for the derivatives based on the limiting cones when they are equal thecorresponding basic cones โ€“ i.e., when they are regular. This will allow deriving exactresults that are nevertheless applicable to the situations we have been focusing on in theprevious parts, such as problems of the form (P). These conditions can be compared toconstraint qualications in nonlinear optimization that guarantee that the tangent conecoincides with the linearization cone. However, โ€œfuzzyโ€ results are available under moregeneral assumptions, for which we refer to the monographs [Aubin & Frankowska 1990;Rockafellar & Wets 1998; Mordukhovich 2018; Mordukhovich 2006].

Specically, we say that ๐ถ โŠ‚ ๐‘‹ is tangentially regular at ๐‘ฅ โˆˆ ๐ถ if ๐‘‡๐ถ (๐‘ฅ) = ๐‘‡๐ถ (๐‘ฅ), andnormally regular at ๐‘ฅ if ๐‘๐ถ (๐‘ฅ) = ๐‘๐ถ (๐‘ฅ). We call ๐ถ regular at ๐‘ฅ if ๐ถ is both normally andtangentially regular.

Example 18.22. With ๐ถ โŠ‚ โ„2 as in Example 18.1, we see that ๐ถ = ๐”น(0, 1) and ๐ถ = [0, 1]2are regular at every ๐‘ฅ โˆˆ ๐ถ , while ๐ถ = [0, 1]2 \ [ 12 , 1]2 is regular everywhere except at๐‘ฅ = ( 12 , 12 ).

257

18 tangent and normal cones

In nite dimensions, the two concepts of regularity are equivalent and have various char-acterizations. By Lemma 18.6, these hold in particular for closed convex sets.

Theorem 18.23. Let ๐ถ โŠ‚ โ„๐‘ be closed near ๐‘ฅ . Then the following conditions are equivalent:

(i) ๐ถ is normally regular at ๐‘ฅ ;

(ii) ๐ถ is tangentially regular at ๐‘ฅ ;

(iii) ๐‘๐ถ is outer semicontinuous at ๐‘ฅ ;

(iv) ๐‘‡๐ถ is inner semicontinuous at ๐‘ฅ (relative to ๐ถ).

In particular, if any of these hold, ๐ถ is regular at ๐‘ฅ .

Proof. (i) โ‡” (ii): If (i) holds, then by Theorems 1.8, 18.5 and 18.15 and Lemma 18.10

๐‘‡๐ถ (๐‘ฅ) โŠ‚ ๐‘‡๐ถ (๐‘ฅ)โ—ฆโ—ฆ = ๐‘๐ถ (๐‘ฅ)โ—ฆ = ๐‘๐ถ (๐‘ฅ)โ—ฆ = ๐‘‡๐ถ (๐‘ฅ) โŠ‚ ๐‘‡๐ถ (๐‘ฅ),

which shows (ii). The other direction is completely analogous, exchanging the roles of โ€œ๐‘ โ€and โ€œ๐‘‡ โ€ to obtain

๐‘๐ถ (๐‘ฅ) โŠ‚ ๐‘๐ถ (๐‘ฅ)โ—ฆโ—ฆ = ๐‘‡๐ถ (๐‘ฅ)โ—ฆ = ๐‘‡๐ถ (๐‘ฅ)โ—ฆ = ๐‘๐ถ (๐‘ฅ) โŠ‚ ๐‘๐ถ (๐‘ฅ).

(i)โ‡” (iii): If (i) holds, then the outer semicontinuity of ๐‘๐ถ (Corollary 18.9) and the inclu-sion ๐‘๐ถ (๐‘ฅ) โŠ‚ ๐‘๐ถ (๐‘ฅ) from Theorem 18.5 show that lim sup๐‘ฅโ†’๐‘ฅ ๐‘๐ถ (๐‘ฅ) โŠ‚ ๐‘๐ถ (๐‘ฅ), i.e., theouter semicontinuity of ๐‘๐ถ . Conversely, the outer semicontinuity of ๐‘๐ถ and the deni-tion ๐‘๐ถ (๐‘ฅ) = lim sup๐‘ฅโ†’๐‘ฅ ๐‘๐ถ (๐‘ฅ) show that ๐‘๐ถ (๐‘ฅ) โŠ‚ ๐‘๐ถ (๐‘ฅ). Combined with the inclusion๐‘๐ถ (๐‘ฅ) โŠ‚ ๐‘๐ถ (๐‘ฅ) from Theorem 18.5, we obtain (i).

(ii) โ‡” (iv): To show that (iv) implies (ii), recall from Corollary 18.20 that

(18.24) ๐‘‡๐ถ (๐‘ฅ) = lim inf๐ถ3๐‘ฅโ†’๐‘ฅ

๐‘‡๐ถ (๐‘ฅ).

By the assumed inner semicontinuity and the denition of the inner limit, we thus obtainthat ๐‘‡๐ถ (๐‘ฅ) = lim inf๐ถ3๐‘ฅโ†’๐‘ฅ ๐‘‡๐ถ (๐‘ฅ) = ๐‘‡๐ถ (๐‘ฅ). For the other direction, we simply use ๐‘‡๐ถ (๐‘ฅ) =๐‘‡๐ถ (๐‘ฅ) in (18.24). ๏ฟฝ

Combining the previous result with Lemma 18.10 and Theorem 18.15, we deduce the follow-ing.

Corollary 18.24. If ๐ถ โŠ‚ โ„๐‘ is regular at ๐‘ฅ and closed near ๐‘ฅ , then both ๐‘‡๐ถ (๐‘ฅ) and ๐‘๐ถ (๐‘ฅ) areconvex. Furthermore,

(i) ๐‘๐ถ (๐‘ฅ) = ๐‘‡๐ถ (๐‘ฅ)โ—ฆ;(ii) ๐‘‡๐ถ (๐‘ฅ) = ๐‘๐ถ (๐‘ฅ)โ—ฆ.

258

18 tangent and normal cones

In innite dimensions, our main equivalent characterization of normal regularity is thefollowing. (We do not have a similar characterization of tangential regularity.)

Theorem 18.25. Let ๐‘‹ be a reexive and Gรขteaux smooth Banach space. Then ๐ถ โŠ‚ ๐‘‹ isnormally regular at ๐‘ฅ โˆˆ ๐ถ if and only if ๐‘‡๐ถ (๐‘ฅ) = ๐‘๐ถ (๐‘ฅ)โ—ฆ.

Proof. Suppose rst that ๐‘‡๐ถ (๐‘ฅ) = ๐‘๐ถ (๐‘ฅ)โ—ฆ. Since ๐‘‡๐ถ (๐‘ฅ) โŠ‚ ๐‘๐ถ (๐‘ฅ)โ—ฆ by Lemma 18.12, wehave ๐‘๐ถ (๐‘ฅ)โ—ฆ โŠ‚ ๐‘๐ถ (๐‘ฅ)โ—ฆ. Furthermore, Theorem 18.5 (ii) yields ๐‘๐ถ (๐‘ฅ) โŠ‚ ๐‘๐ถ (๐‘ฅ) and thus๐‘๐ถ (๐‘ฅ)โ—ฆ โŠƒ ๐‘๐ถ (๐‘ฅ)โ—ฆ by Theorem 1.8. It follows that ๐‘๐ถ (๐‘ฅ)โ—ฆ = ๐‘๐ถ (๐‘ฅ)โ—ฆ. We now recall fromTheorem 18.8 that ๐‘๐ถ (๐‘ฅ) is closed and convex. Hence ๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ) \ ๐‘๐ถ (๐‘ฅ) implies byTheorem 1.13 that there exists ๐‘ฅ โˆˆ ๐‘‹ and _ โˆˆ โ„ such that

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค _ < ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ (๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฅ)) .Since ๐‘๐ถ (๐‘ฅ) is a cone, this is only possible for _ โ‰ฅ 0. Thus the rst inequality shows that๐‘ฅ โˆˆ ๐‘๐ถ (๐‘ฅ)โ—ฆ and the second that ๐‘ฅ โˆ‰ ๐‘๐ถ (๐‘ฅ)โ—ฆ. This is in contradiction to ๐‘๐ถ (๐‘ฅ)โ—ฆ = ๐‘๐ถ (๐‘ฅ)โ—ฆ.Hence ๐‘๐ถ (๐‘ฅ) = ๐‘๐ถ (๐‘ฅ), i.e., ๐ถ is normally regular at ๐‘ฅ .

Conversely, if ๐ถ is normally regular at ๐‘ฅ , we obtain using Lemma 18.12 that

๐‘‡๐ถ (๐‘ฅ) โŠ‚ ๐‘๐ถ (๐‘ฅ)โ—ฆ = ๐‘๐ถ (๐‘ฅ)โ—ฆ.By Lemma 18.10 (i), Theorem 18.5 (i), and Theorem 1.8 using the fact that ๐‘‡๐ถ (๐‘ฅ) is a closedconvex cone by Theorem 18.8, we also have

๐‘๐ถ (๐‘ฅ)โ—ฆ โŠƒ ๐‘‡๐ถ (๐‘ฅ)โ—ฆโ—ฆ โŠƒ ๐‘‡๐ถ (๐‘ฅ)โ—ฆโ—ฆ = ๐‘‡๐ถ (๐‘ฅ).Therefore ๐‘‡๐ถ (๐‘ฅ) = ๐‘๐ถ (๐‘ฅ)โ—ฆ as claimed. ๏ฟฝ

In suciently regular spaces, normal regularity implies tangential regularity of closedsets.

Lemma 18.26. Let ๐‘‹ be a reexive and Gรขteaux smooth Banach space and let๐ถ โŠ‚ ๐‘‹ be closednear ๐‘ฅ โˆˆ ๐ถ . If ๐ถ is normally regular at ๐‘ฅ , then ๐ถ is tangentially regular at ๐‘ฅ .

Proof. Arguing as in the proof of Theorem 18.23 (i) โ‡” (ii), by Theorems 1.8, 18.5 and 18.19and Lemma 18.10 we have

๐‘‡๐ถ (๐‘ฅ) โŠ‚ ๐‘‡๐‘ค๐ถ (๐‘ฅ) โŠ‚ ๐‘‡๐‘ค๐ถ (๐‘ฅ)โ—ฆโ—ฆ = ๐‘๐ถ (๐‘ฅ)โ—ฆ = ๐‘๐ถ (๐‘ฅ)โ—ฆ = ๐‘‡๐ถ (๐‘ฅ) โŠ‚ ๐‘‡๐ถ (๐‘ฅ).This shows that ๐‘‡๐ถ (๐‘ฅ) = ๐‘‡๐ถ (๐‘ฅ). ๏ฟฝ

From Lemmas 18.6 and 18.26, we immediately obtain the following regularity result.

Corollary 18.27. Let ๐‘‹ be a Gรขteaux smooth Banach space and let๐ถ โŠ‚ ๐‘‹ be nonempty, closed,and convex. Then ๐ถ is normally regular at every ๐‘ฅ โˆˆ ๐ถ . If ๐‘‹ is additionally reexive, then ๐ถis also tangentially regular at every ๐‘ฅ โˆˆ ๐ถ .

259

19 TANGENT AND NORMAL CONES OF

POINTWISE-DEFINED SETS

As we have seen in Chapter 18, the relationships between the dierent tangent and normalcones are less complete in innite-dimensional spaces than in nite-dimensional ones. Inthis chapter, however, we show that certain pointwise-dened sets on ๐ฟ๐‘ (ฮฉ) for ๐‘ โˆˆ (1,โˆž)largely satisfy the nite-dimensional relations. We will use these results in Chapter 21 toderive expressions for generalized derivatives of pointwise-dened set-valued mappings,in particular for subdierentials of integral functionals. As mentioned in Section 18.4, theserelations are less satisfying for the limiting cones than for the basic cones. To treat thelimiting cones, we will therefore assume the regularity of the underlying pointwise sets. Forthe basic cones, we also require an assumption, which however is weaker than (tangential)regularity.

19.1 derivability

We start with the fundamental regularity assumption. Let ๐‘‹ be a Banach space and ๐ถ โŠ‚ ๐‘‹ .We then say that a tangent vector ฮ”๐‘ฅ โˆˆ ๐‘‡๐ถ (๐‘ฅ) at ๐‘ฅ โˆˆ ๐ถ is derivable if there exists an Y > 0and a curve b : [0, Y] โ†’ ๐ถ that generates ฮ”๐‘ฅ at 0, i.e.,

(19.1) b (0) = ๐‘ฅ and ฮ”๐‘ฅ = lim๐œโ†’ 0

b (๐œ) โˆ’ b (0)๐œ

= bโ€ฒ(0).

Note that we do not make any assumptions on the dierentiability or continuity of bexcept at ๐œ = 0. We say that ๐ถ is geometrically derivable at ๐‘ฅ โˆˆ ๐ถ if every ฮ”๐‘ฅ โˆˆ ๐‘‡๐ถ (๐‘ฅ) isderivable.

As the next lemma shows, the point of this denition is that derivable tangent vectors arecharacterized by a full limit instead of just an inner limit; this additional property willallow us to construct tangent vectors in ๐ฟ๐‘ (ฮฉ) from pointwise tangent vectors, similarlyto how Clarke regularity was used to obtain equality in the pointwise characterization ofClarke subdierentials of integral functionals in Theorem 13.9.

260

19 tangent and normal cones of pointwise-defined sets

Lemma 19.1. Let๐ถ โŠ‚ ๐‘‹ and ๐‘ฅ โˆˆ ๐ถ . Then the set๐‘‡ 0๐ถ (๐‘ฅ) of derivable tangent vectors is given by

(19.2) ๐‘‡ 0๐ถ (๐‘ฅ) = lim inf

๐œโ†’ 0

๐ถ โˆ’ ๐‘ฅ๐œ

.

Proof. We rst recall that by denition of the inner limit, ฮ”๐‘ฅ is an element of the set on theright-hand side if for every sequence ๐œ๐‘˜โ†’ 0 there exist ๐‘ฅ๐‘˜ โˆˆ ๐ถ such that (๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ)/๐œ๐‘˜ โ†’ ฮ”๐‘ฅ .For a derivable tangent vector ฮ”๐‘ฅ โˆˆ ๐‘‡ 0

๐ถ (๐‘ฅ) and any ๐œ๐‘˜โ†’ 0, we can simply take ๐‘ฅ๐‘˜ = b (๐œ๐‘˜).For the converse inclusion, let ฮ”๐‘ฅ be an element of the right-hand side set. Let now ๐œ๐‘˜โ†’ 0be given and take ๐‘ฅ๐‘˜ โˆˆ ๐ถ realizing the inner limit. Since ๐œ๐‘˜โ†’ 0 was arbitrary, settingb (๐œ๐‘˜) โ‰” ๐‘ฅ๐‘˜ for all ๐‘˜ โˆˆ โ„• denes a curve b : [0, Y] โ†’ ๐ถ for some Y > 0, and henceฮ”๐‘ฅ โˆˆ ๐‘‡ 0

๐ถ (๐‘ฅ). ๏ฟฝ

By taking ๐‘ฅ โ‰ก ๐‘ฅ constant in (18.10) and comparing with (19.2), we immediately obtain thatall Clarke tangent vectors are derivable.

Corollary 19.2. Let ๐ถ โŠ‚ ๐‘‹ and ๐‘ฅ โˆˆ ๐ถ . Then every ฮ”๐‘ฅ โˆˆ ๐‘‡๐ถ (๐‘ฅ) is derivable.

Clearly, if ๐ถ is tangentially regular at ๐‘ฅ , then also every tangent vector is derivable.

Corollary 19.3. If ๐ถ โŠ‚ ๐‘‹ is tangentially regular at ๐‘ฅ โˆˆ ๐ถ , then every ฮ”๐‘ฅ โˆˆ ๐‘‡๐ถ (๐‘ฅ) is derivable.

However, a set can be geometrically derivable without being tangentially regular.

Example 19.4. Let ๐ถ โ‰” ( [0,โˆž) ร— {0}) โˆช ({0} ร— [0,โˆž)) โŠ‚ โ„2. Then we obtain directlyfrom the denition of the tangent cone that

๐‘‡๐ถ (๐‘ฅ1, ๐‘ฅ2) =

๐ถ, if (๐‘ฅ1, ๐‘ฅ2) = (0, 0),โ„ ร— {0}, if ๐‘ฅ1 = 0, ๐‘ฅ2 > 0,{0} ร—โ„, if ๐‘ฅ1 > 0, ๐‘ฅ2 = 0,โˆ…, otherwise.

However, it follows from Corollary 18.20 that ๐‘‡๐ถ (0, 0) = {(0, 0)}. Thus ๐ถ is not tangen-tially regular at (0, 0).On the other hand, for any ฮ”๐‘ฅ = (๐‘ก1, 0) โˆˆ ๐‘‡๐ถ (0, 0), ๐‘ก1 โˆˆ โ„, setting b (๐‘ ) := (๐‘ ๐‘ก1, 0)yields b (0) = (0, 0) and bโ€ฒ(0) = (๐‘ก1, 0) = ฮ”๐‘ฅ . Hence ฮ”๐‘ฅ is derivable. Similarly, settingb (๐‘ ) := (0, ๐‘ ๐‘ก2) shows that ฮ”๐‘ฅ = (0, ๐‘ก2) โˆˆ ๐‘‡๐ถ (0, 0) is derivable for every ๐‘ก2 โˆˆ โ„. Thus ๐ถis geometrically derivable at (0, 0).

261

19 tangent and normal cones of pointwise-defined sets

19.2 tangent and normal cones

As the goal is to dene derivatives of set-valued mappings ๐น : ๐‘‹ โ‡’ ๐‘Œ via tangent cones totheir epigraphs epi ๐น โŠ‚ ๐‘‹ ร—๐‘Œ , we need to consider product spaces of ๐‘-integrable functions(with possibly dierent ๐‘). Let therefore ฮฉ โŠ‚ โ„๐‘‘ be an open and bounded domain. Forยฎ๐‘ โ‰” (๐‘1, . . . , ๐‘๐‘š) โˆˆ (1,โˆž)๐‘š , we then dene

๐ฟ ยฎ๐‘ (ฮฉ) โ‰” ๐ฟ๐‘1 (ฮฉ) ร— ยท ยท ยท ร— ๐ฟ๐‘๐‘š (ฮฉ),

endowed with the canonical euclidean product norm, i.e.,

โ€–๐‘ขโ€–๐ฟ ยฎ๐‘ โ‰”

โˆš๐‘šโˆ‘๐‘˜=1

โ€–๐‘ข๐‘˜ โ€–2๐ฟ๐‘๐‘˜ (๐‘ข = (๐‘ข1, . . . , ๐‘ข๐‘š) โˆˆ ๐ฟ ยฎ๐‘).

We will need the case ๐‘š = 2 in Chapter 21; on rst reading of the present chapter, werecommend picturing๐‘š = 1, i.e., ๐ฟ ยฎ๐‘ (ฮฉ) = ๐ฟ๐‘ (ฮฉ) for some ๐‘ โˆˆ (1,โˆž). We further denoteby ๐‘โˆ— the conjugate exponent of ๐‘ โˆˆ (1,โˆž), dened as satisfying 1/๐‘ + 1/๐‘โˆ— = 1, and writeยฎ๐‘โˆ— โ‰” (๐‘โˆ—1 , . . . , ๐‘โˆ—๐‘š) so that ๐ฟ ยฎ๐‘ (ฮฉ)โˆ— ๏ฟฝ ๐ฟ ยฎ๐‘โˆ— (ฮฉ). Note that ๐ฟ ยฎ๐‘ (ฮฉ) is reexive and Gรขteauxsmooth as the product of reexive and Gรขteaux smooth spaces; cf. Example 17.6. Finally, wewill write L(ฮฉ) for the ๐‘‘-dimensional Lebesgue measure of ฮฉ and recall the characteristicfunction ๐Ÿ™๐‘ˆ of a set ๐‘ˆ โŠ‚ ๐ฟ ยฎ๐‘ (ฮฉ), which satises ๐Ÿ™๐‘ˆ (๐‘ข) = (1, . . . , 1) โˆˆ โ„๐‘š if ๐‘ข โˆˆ ๐‘ˆ and๐Ÿ™๐‘ˆ (๐‘ข) = 0 โˆˆ โ„๐‘š otherwise.

We then call a set๐‘ˆ โŠ‚ ๐ฟ ยฎ๐‘ (ฮฉ) for ยฎ๐‘ โˆˆ (1,โˆž)๐‘š pointwise dened if

๐‘ˆ โ‰”{๐‘ข โˆˆ ๐ฟ ยฎ๐‘ (ฮฉ) | ๐‘ข (๐‘ฅ) โˆˆ ๐ถ (๐‘ฅ) for a.e. ๐‘ฅ โˆˆ ฮฉ

}for a Borel-measurable mapping ๐ถ : ฮฉ โ‡’ โ„๐‘š with ๐ถ (๐‘ฅ) โŠ‚ โ„๐‘š . We say that๐‘ˆ is pointwisederivable if ๐ถ (๐‘ฅ) is geometrically derivable at every b โˆˆ ๐ถ (๐‘ฅ) for almost every ๐‘ฅ โˆˆ ฮฉ.

the fundamental cones

We now derive pointwise characterizations of the fundamental cones to pointwise denedsets, starting with the tangent cone.

Theorem 19.5. Let๐‘ˆ โŠ‚ ๐ฟ ยฎ๐‘ (ฮฉ) be pointwise derivable. Then for every ๐‘ข โˆˆ ๐‘ˆ ,

(19.3) ๐‘‡๐‘ˆ (๐‘ข) ={ฮ”๐‘ข โˆˆ ๐ฟ ยฎ๐‘ (ฮฉ)

๏ฟฝ๏ฟฝ ฮ”๐‘ข (๐‘ฅ) โˆˆ ๐‘‡๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) for a.e. ๐‘ฅ โˆˆ ฮฉ}.

Proof. The inclusion โ€œโŠ‚โ€ follows from (18.1) and the fact that a sequence convergent in๐ฟ ยฎ๐‘ (ฮฉ) for ยฎ๐‘ โˆˆ (1,โˆž) converges, after possibly passing to a subsequence, pointwise almosteverywhere.

262

19 tangent and normal cones of pointwise-defined sets

For the converse inclusion, we take for almost every ๐‘ฅ โˆˆ ฮฉ a tangent vector ฮ”๐‘ข (๐‘ฅ) โˆˆ๐‘‡๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) at ๐‘ข (๐‘ฅ) โˆˆ ๐ถ (๐‘ฅ). We only need to consider the case ฮ”๐‘ข โˆˆ ๐ฟ ยฎ๐‘ (ฮฉ). By geometricderivability, we may nd for almost every ๐‘ฅ โˆˆ ฮฉ an Y (๐‘ฅ) > 0 and a curve b ( ยท , ๐‘ฅ) :[0, Y (๐‘ฅ)] โ†’ ๐ถ (๐‘ฅ) such that b (0, ๐‘ฅ) = ๐‘ข (๐‘ฅ) and bโ€ฒ+(0, ๐‘ฅ) = ฮ”๐‘ข (๐‘ฅ). In particular, for anygiven ๐œŒ > 0, we may nd Y๐œŒ (๐‘ฅ) โˆˆ (0, Y (๐‘ฅ)] such that

(19.4) |b (๐‘ก, ๐‘ฅ) โˆ’ b (0, ๐‘ฅ) โˆ’ ฮ”๐‘ข (๐‘ฅ)๐‘ก |2๐‘ก

โ‰ค ๐œŒ (๐‘ก โˆˆ (0, Y๐œŒ (๐‘ฅ)], a.e. ๐‘ฅ โˆˆ ฮฉ).

For ๐‘ก > 0, let us set๐ธ๐œŒ,๐‘ก โ‰” {๐‘ฅ โˆˆ ฮฉ | ๐‘ก โ‰ค Y๐œŒ (๐‘ฅ)}

and dene

๏ฟฝ๏ฟฝ๐œŒ,๐‘ก (๐‘ฅ) โ‰”{b (๐‘ก, ๐‘ฅ) if ๐‘ฅ โˆˆ ๐ธ๐œŒ,๐‘ก ,๐‘ข (๐‘ฅ) if ๐‘ฅ โˆˆ ฮฉ \ ๐ธ๐œŒ,๐‘ก .

Writing b = (b1, . . . , b๐‘š) and ฮ”๐‘ข = (ฮ”๐‘ข1, . . . ,ฮ”๐‘ข๐‘š), we have from (19.4) that

(19.5)|b ๐‘— (๐‘ก, ๐‘ฅ) โˆ’ b ๐‘— (0, ๐‘ฅ) โˆ’ ฮ”๐‘ข ๐‘— (๐‘ฅ)๐‘ก |

๐‘กโ‰ค ๐œŒ ( ๐‘— = 1, . . . ,๐‘š, ๐‘ก โˆˆ (0, Y๐œŒ (๐‘ฅ)] for a.e. ๐‘ฅ โˆˆ ฮฉ).

Therefore, using the elementary inequality (๐‘Ž + ๐‘)2 โ‰ค 2๐‘Ž2 + 2๐‘2, we obtain

(19.6) โ€–๏ฟฝ๏ฟฝ๐œŒ,๐‘ก โˆ’ ๐‘ขโ€–2๐ฟ ยฎ๐‘ =

๐‘šโˆ‘๐‘—=1

โ€– [๏ฟฝ๏ฟฝ๐œŒ,๐‘ก๐‘— โˆ’ ๐‘ข] ๐‘— โ€–2๐ฟ๐‘ ๐‘—

โ‰ค๐‘šโˆ‘๐‘—=1

(โˆซฮฉ๐‘ก๐‘ ๐‘— (๐œŒ + |ฮ”๐‘ข ๐‘— (๐‘ฅ) |)๐‘ ๐‘— ๐‘‘๐‘ฅ

)2/๐‘ ๐‘—โ‰ค

๐‘šโˆ‘๐‘—=1

(๐‘ก๐œŒL(ฮฉ)1/๐‘ ๐‘— + ๐‘ก โ€–ฮ”๐‘ข ๐‘— โ€–๐ฟ๐‘ ๐‘—

)2โ‰ค 2๐‘ก2

๐‘šโˆ‘๐‘—=1

(๐œŒL(ฮฉ)1/๐‘ ๐‘—

)2+ 2๐‘ก2โ€–ฮ”๐‘ขโ€–2

๐ฟ ยฎ๐‘ .

Similarly, (19.5) and the same elementary inequality together with Minkowskiโ€™s inequalityin the form (๐‘Ž๐‘ + ๐‘๐‘)1/๐‘ โ‰ค |๐‘Ž | + |๐‘ | yield

(19.7)โ€–๏ฟฝ๏ฟฝ๐œŒ,๐‘ก โˆ’ ๐‘ข โˆ’ ๐‘กฮ”๐‘ขโ€–2

๐ฟ ยฎ๐‘

๐‘ก2=

๐‘šโˆ‘๐‘—=1

1๐‘ก2

(โˆซ๐ธ๐œŒ,๐‘ก

|b ๐‘— (๐‘ก, ๐‘ฅ) โˆ’ b ๐‘— (0, ๐‘ฅ) โˆ’ ๐‘กฮ”๐‘ข ๐‘— (๐‘ฅ) |๐‘ ๐‘— ๐‘‘๐‘ฅ

+โˆซฮฉ\๐ธ๐œŒ,๐‘ก

|ฮ”๐‘ข ๐‘— (๐‘ฅ)๐‘ก |๐‘ ๐‘— ๐‘‘๐‘ฅ)2/๐‘ ๐‘—

โ‰ค๐‘šโˆ‘๐‘—=1

(๐œŒ๐‘ ๐‘—L(ฮฉ) + โ€–ฮ”๐‘ข๐Ÿ™ฮฉ\๐ธ๐œŒ,๐‘ก โ€–

๐‘ ๐‘—

๐ฟ ยฎ๐‘

)2/๐‘ ๐‘—โ‰ค 2

๐‘šโˆ‘๐‘—=1

(๐œŒL(ฮฉ)1/๐‘ ๐‘—

)2+ 2โ€–ฮ”๐‘ข๐Ÿ™ฮฉ\๐ธ๐œŒ,๐‘ก โ€–2๐ฟ ยฎ๐‘ .

263

19 tangent and normal cones of pointwise-defined sets

Now for each ๐‘˜ โˆˆ โ„•, we can nd ๐‘ก๐‘˜โ†’ 0 such that โ€–ฮ”๐‘ข๐Ÿ™ฮฉ\๐ธ1/๐‘˜,๐‘ก๐‘˜ โ€–๐ฟ ยฎ๐‘ โ‰ค 1/๐‘˜ . This follows fromLebesgueโ€™s dominated convergence theorem and the fact that L(ฮฉ \ ๐ธ๐œŒ,๐‘ก ) โ†’ 0 as ๐‘ก โ†’ 0.The estimates (19.6) and (19.7) with ๐œŒ = 1/๐‘˜ and ๐‘ก = ๐‘ก๐‘˜ thus show for ๐‘ข๐‘˜ โ‰” ๏ฟฝ๏ฟฝ1/๐‘˜,๐‘ก๐‘˜ that๐‘ข๐‘˜ โ†’ ๐‘ข and (๐‘ข๐‘˜ โˆ’ ๐‘ข)/๐‘ก๐‘˜ โ†’ ฮ”๐‘ข, i.e., ฮ”๐‘ข โˆˆ ๐‘‡๐‘ˆ (๐‘ข). ๏ฟฝ

We next consider the Frรฉchet normal cone.

Theorem 19.6. Let๐‘ˆ โŠ‚ ๐ฟ ยฎ๐‘ (ฮฉ) be pointwise derivable. Then for every ๐‘ข โˆˆ ๐‘ˆ ,

(19.8) ๐‘๐‘ˆ (๐‘ข) ={๐‘ขโˆ— โˆˆ ๐ฟ ยฎ๐‘โˆ— (ฮฉ)

๏ฟฝ๏ฟฝ ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐‘๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) for a.e. ๐‘ฅ โˆˆ ฮฉ}.

Proof. Recalling the denition of ๐‘๐‘ˆ (๐‘ข) from (18.7), we need to nd all ๐‘ขโˆ— โˆˆ ๐ฟ ยฎ๐‘โˆ— (ฮฉ) satis-fying for every given sequence๐‘ˆ 3 ๐‘ข๐‘˜ โ†’ ๐‘ข

(19.9) 0 โ‰ฅ lim sup๐‘˜โ†’โˆž

ใ€ˆ๐‘ขโˆ—, ๐‘ข๐‘˜ โˆ’ ๐‘ขใ€‰๐ฟ ยฎ๐‘

โ€–๐‘ข๐‘˜ โˆ’ ๐‘ขโ€–๐ฟ ยฎ๐‘=: lim sup

๐‘˜โ†’โˆž๐ฟ๐‘˜ .

Let Y > 0 be arbitrary and set ๐‘ฃ๐‘˜ โ‰” ๐‘ข โˆ’ ๐‘ข๐‘˜ as well as(19.10a) ๐‘ 1

๐‘˜ โ‰” {๐‘ฅ โˆˆ ฮฉ | |๐‘ฃ๐‘˜ (๐‘ฅ) |2 โ‰ค Yโˆ’1โ€–๐‘ฃ๐‘˜ โ€–๐ฟ ยฎ๐‘ } (๐‘˜ โˆˆ โ„•).Furthermore, let ๐‘ 2 โŠ‚ ฮฉ be such that

๐‘ขโˆ— is bounded on ๐‘ 2,(19.10b)L(๐‘ 1

๐‘˜ \ ๐‘ 2) โ‰ค Y (๐‘˜ โˆˆ โ„•).(19.10c)

Using Hรถlderโ€™s inequality, (19.10a) and (19.10c), we then estimate for ๐‘˜ = 1, . . . ,๐‘š

๐ฟ๐‘˜ =

โˆซฮฉ\(๐‘ 1

๐‘˜โˆฉ๐‘ 2) ใ€ˆ๐‘ขโˆ—(๐‘ฅ), ๐‘ฃ๐‘˜ (๐‘ฅ)ใ€‰2 ๐‘‘๐‘ฅ

โ€–๐‘ฃ๐‘˜ โ€–๐ฟ ยฎ๐‘+

โˆซ๐‘ 1๐‘˜โˆฉ๐‘ 2 ใ€ˆ๐‘ขโˆ—(๐‘ฅ), ๐‘ฃ๐‘˜ (๐‘ฅ)ใ€‰2 ๐‘‘๐‘ฅ

โ€–๐‘ฃ๐‘˜ โ€–๐ฟ ยฎ๐‘

โ‰คโ€–๐Ÿ™ฮฉ\(๐‘ 1

๐‘˜โˆฉ๐‘ 2)๐‘ขโˆ—โ€–๐ฟ ยฎ๐‘โˆ— โ€–๐‘ฃ๐‘˜ โ€–๐ฟ ยฎ๐‘

โ€–๐‘ฃ๐‘˜ โ€–๐ฟ ยฎ๐‘+

โˆซ๐‘ 1๐‘˜โˆฉ๐‘ 2

ใ€ˆ๐‘ขโˆ—(๐‘ฅ), ๐‘ฃ๐‘˜ (๐‘ฅ)ใ€‰2|๐‘ฃ๐‘˜ (๐‘ฅ) |2

ยท |๐‘ฃ๐‘˜ (๐‘ฅ) |2โ€–๐‘ฃ๐‘˜ โ€–๐ฟ ยฎ๐‘๐‘‘๐‘ฅ

โ‰ค โ€–๐Ÿ™ฮฉ\(๐‘ 1๐‘˜โˆฉ๐‘ 2)๐‘ข

โˆ—โ€–๐ฟ ยฎ๐‘โˆ— + Yโˆ’1โˆซ๐‘ 2

max{0, ใ€ˆ๐‘ข

โˆ—(๐‘ฅ), ๐‘ฃ๐‘˜ (๐‘ฅ)ใ€‰2|๐‘ฃ๐‘˜ (๐‘ฅ) |2

}๐‘‘๐‘ฅ .

If now for almost every๐‘ฅ โˆˆ ฮฉwe have that๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐‘๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)), then also ใ€ˆ๐‘ขโˆ—(๐‘ฅ), ๐‘ฃ๐‘˜ (๐‘ฅ)ใ€‰2 โ‰ค0 for almost every ๐‘ฅ โˆˆ ฮฉ. It follows using (19.10b) and the reverse Fatou inequality in theprevious estimate that

(19.11) lim sup๐‘˜โ†’โˆž

๐ฟ๐‘˜ โ‰ค lim sup๐‘˜โ†’โˆž

โ€–๐Ÿ™ฮฉ\(๐‘ 1๐‘˜โˆฉ๐‘ 2)๐‘ข

โˆ—โ€–๐ฟ ยฎ๐‘โˆ— .

Since |๐‘ฃ๐‘˜ (๐‘ฅ) |2 โ‰ฅ Yโˆ’1โ€–๐‘ฃ๐‘˜ โ€–๐ฟ ยฎ๐‘ for ๐‘ฅ โˆˆ ฮฉ \ ๐‘๐‘ž๐‘˜, we have

โ€–๐‘ฃ๐‘˜ โ€–๐ฟ ยฎ๐‘ โ‰ฅ โ€–๐Ÿ™ฮฉ\๐‘ 1๐‘˜๐‘ฃ๐‘˜ โ€–๐ฟ ยฎ๐‘ โ‰ฅ (Yโˆ’๐‘L(ฮฉ \ ๐‘ 1

๐‘˜))1/๐‘ โ€–๐‘ฃ๐‘˜ โ€–๐ฟ ยฎ๐‘ .

264

19 tangent and normal cones of pointwise-defined sets

Hence L(ฮฉ \ ๐‘ 1๐‘˜) โ‰ค Y๐‘ and L(ฮฉ \ (๐‘ 1

๐‘˜โˆฉ ๐‘2)) โ‰ค L(ฮฉ \ ๐‘ 1

๐‘˜) + L(ฮฉ \ ๐‘2) โ‰ค ๐ถY for some

constant ๐ถ > 0 and small enough Y > 0. It therefore follows from Egorovโ€™s theorem that๐Ÿ™ฮฉ\(๐‘ 1

๐‘˜โˆฉ๐‘ 2)๐‘ขโˆ— converge to 0 in measure as ๐‘˜ โ†’ โˆž. Since๐‘ขโˆ— โˆˆ ๐ฟ ยฎ๐‘โˆ— (ฮฉ) and ๐Ÿ™ฮฉ\(๐‘ 1

๐‘˜โˆฉ๐‘ 2)๐‘ขโˆ— โ‰ค ๐‘ขโˆ—,

it follows from Vitaliโ€™s convergence theorem (see, e.g., [Fonseca & Leoni 2007, Proposition2.27]) that lim sup๐‘˜โ†’โˆž โ€–๐Ÿ™ฮฉ\(๐‘ 1

๐‘˜โˆฉ๐‘ 2)๐‘ขโˆ—โ€–๐ฟ ยฎ๐‘โˆ— = 0. Since Y > 0 was arbitrary, we deduce from

(19.11) that (19.9) holds and, consequently,

๐‘๐‘ˆ (๐‘ข) โŠƒ {๐‘ขโˆ— โˆˆ ๐ฟ ยฎ๐‘โˆ— (ฮฉ) | ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐‘๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) for a.e. ๐‘ฅ โˆˆ ฮฉ}.This proves one direction of (19.8), which therefore holds even without geometric deriv-ability.

For the converse inclusion, let ๐‘ขโˆ— โˆˆ ๐‘๐‘ˆ (๐‘ข). We have to show that ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐‘๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ))for almost every ๐‘ฅ โˆˆ ฮฉ, which we do by contradiction. Assume therefore that the point-wise inclusion does not hold. By the polarity relationship ๐‘๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) = ๐‘‡๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ))โ—ฆfrom Lemma 18.10, we can nd ๐›ฟ > 0 and a Borel set ๐ธ โŠ‚ ฮฉ of nite positive Lebesguemeasure such that for each ๐‘ฅ โˆˆ ๐ธ, there exists ๐‘ค (๐‘ฅ) โˆˆ ๐‘‡๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) with |๐‘ค (๐‘ฅ) |2 = 1and ใ€ˆ๐‘ขโˆ—(๐‘ฅ),๐‘ค (๐‘ฅ)ใ€‰2 โ‰ฅ ๐›ฟ . We may without loss of generality assume that ๐ถ (๐‘ฅ) is geo-metrically derivable at ๐‘ค (๐‘ฅ) for every ๐‘ฅ โˆˆ ๐ธ, i.e., for each ๐‘ฅ โˆˆ ๐ธ there exists a curveb ( ยท , ๐‘ฅ) : [0, Y (๐‘ฅ)] โ†’ ๐ถ (๐‘ฅ) such that bโ€ฒ+(0, ๐‘ฅ) = ๐‘ค (๐‘ฅ) and b (0, ๐‘ฅ) = ๐‘ข (๐‘ฅ). Let now ๐‘ โˆˆ (0, ๐›ฟ)be arbitrary. By replacing ๐ธ by a subset of positive measure, we may by Egorovโ€™s theoremassume the existence of Y > 0 such that

(19.12) |b (๐‘ก, ๐‘ฅ) โˆ’ b (0, ๐‘ฅ) โˆ’๐‘ค (๐‘ฅ)๐‘ก |2 โ‰ค ๐‘๐‘ก (๐‘ก โˆˆ [0, Y], ๐‘ฅ โˆˆ ๐ธ).

Let us dene

๏ฟฝ๏ฟฝ๐‘ก (๐‘ฅ) โ‰”{b (๐‘ก, ๐‘ฅ) if ๐‘ฅ โˆˆ ๐ธ,๐‘ข (๐‘ฅ) if ๐‘ฅ โˆˆ ฮฉ \ ๐ธ.

Setting ๐‘ฃ๐‘ก โ‰” ๏ฟฝ๏ฟฝ๐‘ก โˆ’๐‘ข, we have ๐‘ฃ๐‘ก (๐‘ฅ) = b (๐‘ก, ๐‘ฅ) โˆ’ b (0, ๐‘ฅ) for ๐‘ฅ โˆˆ ๐ธ and ๐‘ฃ๐‘ก (๐‘ฅ) = 0 for ๐‘ฅ โˆˆ ฮฉ \ ๐ธ.Therefore, writing ๐‘ฃ๐‘ก = (๐‘ฃ๐‘ก1, . . . , ๐‘ฃ๐‘ก๐‘š),๐‘ค = (๐‘ค1, . . . ,๐‘ค๐‘š), and b = (b1, . . . b๐‘š), we obtain using(19.12) for ๐‘ก โˆˆ (0, Y] and some ๐‘โ€ฒ > 0 that

โ€–๐‘ฃ๐‘ก โ€–2๐ฟ ยฎ๐‘ =

๐‘šโˆ‘๐‘—=1

(โˆซ๐ธ|b ๐‘— (๐‘ก, ๐‘ฅ) โˆ’ b ๐‘— (0, ๐‘ฅ) |๐‘ ๐‘— ๐‘‘๐‘ฅ

)2/๐‘ ๐‘—โ‰ค

๐‘šโˆ‘๐‘—=1

(โˆซ๐ธ( |๐‘ค ๐‘— (๐‘ฅ) |๐‘ก + ๐‘๐‘ก)๐‘ ๐‘— ๐‘‘๐‘ฅ

)2/๐‘ ๐‘—โ‰ค ๐‘โ€ฒ๐‘ก2.

Likewise,

ใ€ˆ๐‘ขโˆ—(๐‘ฅ), ๐‘ฃ๐‘ก (๐‘ฅ)ใ€‰2 โ‰ฅ ใ€ˆ๐‘ขโˆ—(๐‘ฅ),๐‘ค (๐‘ฅ)ใ€‰2 โˆ’ |๐‘ขโˆ—(๐‘ฅ) |2 ยท |b (๐‘ก, ๐‘ฅ) โˆ’ b (0, ๐‘ฅ) โˆ’๐‘ค๐‘ก |2 โ‰ฅ ๐›ฟ๐‘ก โˆ’ ๐‘๐‘ก .It follows that

lim sup๐‘กโ†’ 0

โˆซ๐ธ

ใ€ˆ๐‘ขโˆ—(๐‘ฅ), ๐‘ฃ๐‘ก (๐‘ฅ)ใ€‰2โ€–๐‘ฃ๐‘ก โ€–๐ฟ ยฎ๐‘

๐‘‘๐‘ฅ โ‰ฅ lim sup๐‘กโ†’ 0

L(๐ธ) (๐›ฟ๐‘ก โˆ’ ๐‘๐‘ก)๐‘โ€ฒ๐‘ก

=L(๐ธ) (๐›ฟ โˆ’ ๐‘)

๐‘โ€ฒ> 0.

265

19 tangent and normal cones of pointwise-defined sets

Taking ๐‘ข๐‘˜ โ‰” ๏ฟฝ๏ฟฝ1/๐‘˜ for ๐‘˜ โˆˆ โ„•, we obtain lim๐‘˜โ†’โˆž ๐ฟ๐‘˜ > 0 and therefore ๐‘ขโˆ— โˆ‰ ๐‘๐‘ˆ (๐‘ข). Bycontraposition, this shows that ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐‘๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) for almost every ๐‘ฅ โˆˆ ฮฉ. ๏ฟฝ

We can now derive a similar polarity relationships as to the nite-dimensional one inLemma 18.10.

Corollary 19.7. Let๐‘ˆ โŠ‚ ๐ฟ ยฎ๐‘ (ฮฉ) be pointwise derivable and ๐‘ข โˆˆ ๐‘ˆ . Then ๐‘๐‘ˆ (๐‘ข) = ๐‘‡๐‘ˆ (๐‘ข)โ—ฆ.

Proof. By Theorems 19.5 and 19.6 and Lemma 18.10, we have

(19.13) ๐‘ขโˆ— โˆˆ ๐‘๐‘ˆ (๐‘ข) โ‡” ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐‘๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) (a.e. ๐‘ฅ โˆˆ ฮฉ)โ‡” ใ€ˆ๐‘ขโˆ—(๐‘ฅ),ฮ”๐‘ข (๐‘ฅ)ใ€‰2 โ‰ค 0 (a.e. ๐‘ฅ โˆˆ ฮฉ when ฮ”๐‘ข (๐‘ฅ) โˆˆ ๐‘‡๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)))โ‡’ ใ€ˆ๐‘ขโˆ—,ฮ”๐‘ขใ€‰๐ฟ ยฎ๐‘ โ‰ค 0 (when ฮ”๐‘ข โˆˆ ๐‘‡๐‘ˆ (๐‘ข))โ‡” ๐‘ขโˆ— โˆˆ ๐‘‡๐‘ˆ (๐‘ข)โ—ฆ.

Hence ๐‘๐‘ˆ (๐‘ข) โŠ‚ ๐‘‡๐‘ˆ (๐‘ข)โ—ฆ.For the converse inclusion, we need to improve the implication in (19.13) to an equivalence.We argue by contradiction. Assume that๐‘ขโˆ— โˆˆ ๐‘‡๐‘ˆ (๐‘ข)โ—ฆ and that there exists some ฮ”๐‘ข โˆˆ ๐‘‡๐‘ˆ (๐‘ข)and a subset ๐ธ โŠ‚ ฮฉ with L(ฮฉ \ ๐ธ) > 0 and

ใ€ˆ๐‘ขโˆ—(๐‘ฅ),ฮ”๐‘ข (๐‘ฅ)ใ€‰2 > 0 (๐‘ฅ โˆˆ ๐ธ).

Taking ๐‘ขโˆ—(๐‘ฅ) โ‰” (1 + ๐‘ก๐Ÿ™๐ธ (๐‘ฅ))๐‘ขโˆ—(๐‘ฅ), we obtain for sucient large ๐‘ก that ใ€ˆ๐‘ขโˆ—,ฮ”๐‘ขใ€‰๐ฟ ยฎ๐‘ > 0. Thiscontradicts that ๐‘ขโˆ— โˆˆ ๐‘‡๐‘ˆ (๐‘ข)โ—ฆ. Hence ๐‘๐‘ˆ (๐‘ข) โŠƒ ๐‘‡๐‘ˆ (๐‘ข)โ—ฆ. ๏ฟฝ

the limiting cones

For the limiting cones, we in general only have an inclusion of the pointwise cones.

Theorem 19.8. Let๐‘ˆ โŠ‚ ๐ฟ ยฎ๐‘ (ฮฉ) be pointwise derivable. Then for every ๐‘ข โˆˆ ๐‘ˆ ,

๐‘‡๐‘ˆ (๐‘ข) โŠƒ{ฮ”๐‘ข โˆˆ ๐ฟ ยฎ๐‘ (ฮฉ)

๏ฟฝ๏ฟฝ ฮ”๐‘ข (๐‘ฅ) โˆˆ ๐‘‡๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) for a.e. ๐‘ฅ โˆˆ ฮฉ}.

Proof. Let ฮ”๐‘ข โˆˆ ๐ฟ ยฎ๐‘ฮฉ) with ฮ”๐‘ข (๐‘ฅ) โˆˆ ๐‘‡๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) for almost every ๐‘ฅ โˆˆ ฮฉ and let ๐‘ข๐‘˜ โ†’ ๐‘ข in๐ฟ ยฎ๐‘ (ฮฉ). In particular, we then have ๐‘ข๐‘˜ (๐‘ฅ) โ†’ ๐‘ข (๐‘ฅ) for almost every ๐‘ฅ โˆˆ ฮฉ. Furthermore,by the inner limit characterization of ๐‘‡๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) in Corollary 18.20, there exist ฮ”๏ฟฝ๏ฟฝ๐‘˜ (๐‘ฅ) โˆˆ๐‘‡๐ถ (๐‘ฅ) (๐‘ข๐‘˜ (๐‘ฅ)) with ฮ”๏ฟฝ๏ฟฝ๐‘˜ (๐‘ฅ) โ†’ ฮ”๐‘ข (๐‘ฅ). Egorovโ€™s theorem, then yields for all โ„“ โ‰ฅ 1 a Borel-measurable set ๐ธโ„“ โŠ‚ ฮฉ such that L(ฮฉ \ ๐ธโ„“) < 1/โ„“ and ฮ”๏ฟฝ๏ฟฝ๐‘˜ โ†’ ฮ”๐‘ข uniformly on ๐ธโ„“ . Since๐‘‡๐ถ (๐‘ฅ) (๐‘ข๐‘˜ (๐‘ฅ)) is a cone, we have 0 โˆˆ ๐‘‡๐ถ (๐‘ฅ) (๐‘ข๐‘˜ (๐‘ฅ)). It follows that

๐‘‡๐ถ (๐‘ฅ) (๐‘ข๐‘˜ (๐‘ฅ)) 3 ฮ”๐‘ขโ„“,๐‘˜ (๐‘ฅ) โ‰” ๐Ÿ™๐ธโ„“ (๐‘ฅ)ฮ”๏ฟฝ๏ฟฝ๐‘˜ (๐‘ฅ).

266

19 tangent and normal cones of pointwise-defined sets

In particular, (19.3) shows that ฮ”๐‘ขโ„“,๐‘˜ โˆˆ ๐‘‡๐‘ˆ (๐‘ข๐‘˜) with ฮ”๐‘ขโ„“,๐‘˜ โ†’ ฮ”๐‘ขโ„“ โ‰” ฮ”๐‘ข๐Ÿ™๐ธโ„“ in ๐ฟ ยฎ๐‘ (ฮฉ) as๐‘˜ โ†’ โˆž. By Vitaliโ€™s convergence theorem (compare the proof of Theorem 19.6),ฮ”๐‘ข๐Ÿ™๐ธโ„“ โ†’ ฮ”๐‘ข

in ๐ฟ ยฎ๐‘ (ฮฉ) as โ„“ โ†’ โˆž. Therefore, we may extract a diagonal subsequence {ฮ”๏ฟฝ๏ฟฝ๐‘˜ โ‰” ฮ”๐‘ขโ„“๐‘˜ ,๐‘˜}๐‘˜โ‰ฅ1of {ฮ”๐‘ขโ„“,๐‘˜}๐‘˜,โ„“โ‰ฅ1 such that ฮ”๏ฟฝ๏ฟฝ๐‘˜ โ†’ ฮ”๐‘ข. Since ๐‘ข๐‘˜ โ†’ ๐‘ข was arbitrary and ฮ”๏ฟฝ๏ฟฝ๐‘˜ โˆˆ ๐‘‡๐‘ˆ (๐‘ข๐‘˜), wededuce that ฮ”๐‘ข โˆˆ ๐‘‡๐‘ˆ (๐‘ข). ๏ฟฝ

Theorem 19.9. Let๐‘ˆ โŠ‚ ๐ฟ ยฎ๐‘ (ฮฉ) be pointwise derivable. Then for every ๐‘ข โˆˆ ๐‘ˆ ,

๐‘๐‘ˆ (๐‘ข) โŠƒ{๐‘ขโˆ— โˆˆ ๐ฟ ยฎ๐‘โˆ— (ฮฉ)

๏ฟฝ๏ฟฝ ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐‘๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) for a.e. ๐‘ฅ โˆˆ ฮฉ}.

Proof. Let ๐‘ขโˆ— โˆˆ ๐ฟ ยฎ๐‘โˆ— (ฮฉ) with ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐‘๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) for almost every ๐‘ฅ โˆˆ ฮฉ. Then by def-inition, for almost all ๐‘ฅ โˆˆ ฮฉ there exist ๐ถ (๐‘ฅ) 3 ๏ฟฝ๏ฟฝ๐‘˜ (๐‘ฅ) โ†’ ๐‘ข (๐‘ฅ) as well as ๐‘๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) 3๏ฟฝ๏ฟฝโˆ—๐‘˜(๐‘ฅ) โ†’ ๐‘ขโˆ—(๐‘ฅ). By Egorovโ€™s theorem, for every โ„“ โ‰ฅ 1 there exists a Borel-measurable set

๐ธโ„“ โŠ‚ ฮฉ such that L(ฮฉ \ ๐ธโ„“) < 1/โ„“ and ๏ฟฝ๏ฟฝโˆ—๐‘˜โ†’ ๐‘ขโˆ— as well as ๏ฟฝ๏ฟฝ๐‘˜ โ†’ ๐‘ข uniformly on ๐ธโ„“ . We

set ๐‘ขโ„“,๐‘˜ โ‰” ๐Ÿ™๐ธโ„“๐‘ข๐‘˜ + (1 โˆ’ ๐Ÿ™๐ธโ„“ )๐‘ข and ๐‘ขโˆ—โ„“,๐‘˜

โ‰” ๐Ÿ™๐ธ๐›ฟ๏ฟฝ๏ฟฝโˆ—๐‘˜. Then ๐‘ขโˆ—

โ„“,๐‘˜(๐‘ฅ) โˆˆ ๐‘๐ถ (๐‘ฅ) (๐‘ขโ„“,๐‘˜ (๐‘ฅ)) for almost

every ๐‘ฅ โˆˆ ฮฉ. By Vitaliโ€™s convergence theorem (compare the proof of Theorem 19.6), both๐‘ขโ„“,๐‘˜ โ†’ ๐‘ข in ๐ฟ ยฎ๐‘ (ฮฉ) and ๐‘ขโˆ—

โ„“,๐‘˜โ†’ ๐‘ขโˆ—โ„“ in ๐ฟ

ยฎ๐‘โˆ— (ฮฉ) for ๐‘ขโˆ—โ„“ โ‰” ๐Ÿ™๐ธโ„“๐‘ขโˆ—. Since ๐‘ขโˆ—โ„“ โ†’ ๐‘ขโˆ— in ๐ฟ ยฎ๐‘โˆ— (ฮฉ), we

can extract a diagonal subsequence of {(๐‘ขโ„“,๐‘˜ , ๐‘ขโˆ—โ„“,๐‘˜)}โ„“,๐‘˜โ‰ฅ1 to deduce that ๐‘ขโˆ— โˆˆ ๐‘๐‘ˆ (๐‘ข). ๏ฟฝ

If the pointwise sets๐ถ (๐‘ฅ) are regular, we have the following polarity between the cones tothe pointwise-dened set๐‘ˆ .

Lemma 19.10. Let ๐‘ˆ โŠ‚ ๐ฟ ยฎ๐‘ (ฮฉ) be pointwise derivable and ๐‘ข โˆˆ ๐‘ˆ . If ๐ถ (๐‘ฅ) is regular at ๐‘ข (๐‘ฅ)and closed near ๐‘ข (๐‘ฅ) for almost every ๐‘ฅ โˆˆ ฮฉ, then ๐‘‡๐‘ˆ (๐‘ข) = ๐‘๐‘ˆ (๐‘ข)โ—ฆ.

Proof. By the regularity of๐ถ (๐‘ฅ) at ๐‘ข (๐‘ฅ) for almost every ๐‘ฅ โˆˆ ฮฉ and Theorem 19.6, we have

๐‘๐‘ˆ (๐‘ข) ={๐‘ขโˆ— โˆˆ ๐ฟ ยฎ๐‘โˆ— (ฮฉ)

๏ฟฝ๏ฟฝ ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐‘๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) for a.e. ๐‘ฅ โˆˆ ฮฉ}.

By Theorem 18.15, ๐‘๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ))โ—ฆ = ๐‘‡๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) for almost every ๐‘ฅ โˆˆ ฮฉ. Arguing as in theproof of Corollary 19.7, we thus obtain

๐‘๐‘ˆ (๐‘ข)โ—ฆ ={ฮ”๐‘ข โˆˆ ๐ฟ ยฎ๐‘ (ฮฉ)

๏ฟฝ๏ฟฝ ฮ”๐‘ข (๐‘ฅ) โˆˆ ๐‘‡๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) for a.e. ๐‘ฅ โˆˆ ฮฉ}.

The regularity of ๐ถ (๐‘ฅ) also implies that ๐‘‡๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) = ๐‘‡๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) for almost every ๐‘ฅ โˆˆ ฮฉ.The claims now follow from Theorem 19.5. ๏ฟฝ

We can use this result to transfer the regularity of ๐ถ (๐‘ฅ) to๐‘ˆ .

Lemma 19.11. Let ๐‘ˆ โŠ‚ ๐ฟ ยฎ๐‘ (ฮฉ) be pointwise derivable and ๐‘ข โˆˆ ๐‘ˆ . If ๐ถ (๐‘ฅ) is regular at ๐‘ข (๐‘ฅ)and closed near ๐‘ข (๐‘ฅ) for almost every ๐‘ฅ โˆˆ ฮฉ, then๐‘ˆ is regular at ๐‘ข and

๐‘‡๐‘ค๐‘ˆ (๐‘ข) = ๐‘‡๐‘ˆ (๐‘ข) = ๐‘‡๐‘ˆ (๐‘ข).

267

19 tangent and normal cones of pointwise-defined sets

Proof. Since ๐ฟ ยฎ๐‘ (ฮฉ) is reexive, we have ๐‘๐‘ˆ (๐‘ข) = ๐‘‡๐‘ค๐‘ˆ (๐‘ข)โ—ฆ by Lemma 18.10 (ii). This facttogether with Lemma 19.10 and Theorems 1.8 and 18.5 shows that

๐‘‡๐‘ค๐‘ˆ (๐‘ข) โŠ‚ ๐‘‡๐‘ค๐‘ˆ (๐‘ข)โ—ฆโ—ฆ = ๐‘๐‘ˆ (๐‘ข)โ—ฆ = ๐‘‡๐‘ˆ (๐‘ข) โŠ‚ ๐‘‡๐‘ค๐‘ˆ (๐‘ข).

Furthermore, by the regularity and closedness assumptions, we obtain from Theorems 19.5and 19.8 that ๐‘‡๐‘ˆ (๐‘ข) = ๐‘‡๐‘ˆ (๐‘ข), which also implies tangential regularity.

Since ๐ฟ ยฎ๐‘ (ฮฉ) for ยฎ๐‘ โˆˆ (1,โˆž)๐‘š is reexive and Gรขteaux smooth, normal regularity followsfrom Theorem 18.25 together with Lemma 19.10. ๏ฟฝ

From this, we obtain pointwise expressions with equality. For the Clarke tangent cone, weonly require local closedness of the underlying sets.

Theorem 19.12. Let๐‘ˆ โŠ‚ ๐ฟ ยฎ๐‘ (ฮฉ) be pointwise derivable. If ๐ถ (๐‘ฅ) is closed near ๐‘ข (๐‘ฅ) for almostevery ๐‘ฅ โˆˆ ฮฉ for every ๐‘ข โˆˆ ๐‘ˆ , then

๐‘‡๐‘ˆ (๐‘ข) ={ฮ”๐‘ข โˆˆ ๐ฟ ยฎ๐‘ (ฮฉ)

๏ฟฝ๏ฟฝ ฮ”๐‘ข (๐‘ฅ) โˆˆ ๐‘‡๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) for a.e. ๐‘ฅ โˆˆ ฮฉ}.

Proof. The inclusion โ€œโŠƒโ€ was already shown in Theorem 19.8. To prove the converseinclusion when ๐ถ (๐‘ฅ) is closed near ๐‘ข (๐‘ฅ) for almost every ๐‘ฅ โˆˆ ฮฉ, we only need to observefrom Lemma 18.12 and Theorem 19.9 and

๐‘‡๐ถ (๐‘ข) โŠ‚ ๐‘๐ถ (๐‘ข)โ—ฆ โŠ‚{๐‘ขโˆ— โˆˆ ๐ฟ ยฎ๐‘โˆ— (ฮฉ)

๏ฟฝ๏ฟฝ ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐‘๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) for a.e. ๐‘ฅ โˆˆ ฮฉ}โ—ฆ

={ฮ”๐‘ข โˆˆ ๐ฟ ยฎ๐‘ (ฮฉ)

๏ฟฝ๏ฟฝ ฮ”๐‘ข (๐‘ฅ) โˆˆ ๐‘‡๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) for a.e. ๐‘ฅ โˆˆ ฮฉ},

where the last equality again follows from Theorem 18.15 together with an argument as inthe proof of Corollary 19.7. ๏ฟฝ

For the limiting normal cone, however, we do require regularity.

Theorem 19.13. Let๐‘ˆ โŠ‚ ๐ฟ ยฎ๐‘ (ฮฉ) be pointwise derivable. If ๐ถ (๐‘ฅ) is regular at ๐‘ข (๐‘ฅ) and closednear ๐‘ข (๐‘ฅ) for almost every ๐‘ฅ โˆˆ ฮฉ, then for every ๐‘ข โˆˆ ๐‘ˆ ,

๐‘๐‘ˆ (๐‘ข) ={๐‘ขโˆ— โˆˆ ๐ฟ ยฎ๐‘โˆ— (ฮฉ)

๏ฟฝ๏ฟฝ ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐‘๐ถ (๐‘ฅ) (๐‘ข (๐‘ฅ)) for a.e. ๐‘ฅ โˆˆ ฮฉ}.

Proof. The inclusion โ€œโŠƒโ€ was already shown in Theorem 19.9. The converse inclusion forregular and closed ๐ถ (๐‘ฅ) follows from Lemma 19.11 and Theorem 19.6. ๏ฟฝ

268

19 tangent and normal cones of pointwise-defined sets

Remark 19.14. Theorems 19.5 and 19.6 on the fundamental cones are based on [Clason & Valkonen2017b]. Without regularity, the characterization of the limiting normal cone of a pointwise-denedset is much more delicate. A full characterization was given in [Mehlitz & Wachsmuth 2018; Mehlitz& Wachsmuth 2019], which showed that even for a closed nonconvex set, the limiting normal conecontains the convex hull of the strong limiting normal cone (where the limit is taken with respectto strong convergence instead of weak-โˆ— convergence) and is dense in the Dini normal cone๐‘‡ โ—ฆ

๐ถ (๐‘ฅ) โ€“in the words of the authors, it may be โ€œunpleasantly largeโ€. This is due to an inherent convexifyingeect of integration with respect to the Lebesgue measure.

A characterization of specic pointwise-dened sets in Sobolev spaces was derived in [Harder &Wachsmuth 2018], with similar conclusions.

269

20 DERIVATIVES AND CODERIVATIVES OF SET-VALUED

MAPPINGS

We are now ready to dierentiate set-valued mappings; as already discussed, these gener-alized derivatives are based on the tangent and normal cones of the previous Chapter 18.To account for the changed focus, we will slightly switch notation and use in this and thefollowing chapters of Part IV uppercase letters for set-valued mappings and lowercaseletters for scalar-valued functionals such that, e.g., ๐น (๐‘ฅ) = ๐œ•๐‘“ (๐‘ฅ). We focus in this chapteron examples, basic properties, and relationships between the various derivative concepts.In the following Chapters 22 to 25, we then develop calculus rules for each of the dierentderivatives and coderivatives.

20.1 definitions

To motivate the following denitions, it is instructive to recall the geometric intuitionbehind the classical derivative of a scalar function ๐‘“ as limit of a dierence quotient: givenan (innitesimal) change ฮ”๐‘ฅ of the argument ๐‘ฅ , it gives the corresponding (innitesimal)change ฮ”๐‘ฆ of the value ๐‘ฆ = ๐‘“ (๐‘ฅ) required to stay on the graph of ๐‘“ . In other words,(ฮ”๐‘ฅ,ฮ”๐‘ฆ) is a tangent vector to graph ๐‘“ . For a proper set-valued mapping ๐น , however, it isalso possible to remain on the graph of ๐น by varying ๐‘ฆ without changing ๐‘ฅ ; it thus alsomakes sense to ask the โ€œdualโ€ question of, given a change ฮ”๐‘ฆ in image space, what changeฮ”๐‘ฅ in domain space is required to stay inside the graph of ๐น . In geometric terms, the answeris given by ฮ”๐‘ฆ such that (ฮ”๐‘ฅ,ฮ”๐‘ฆ) is a normal vector to graph ๐‘“ (Note that normal vectorspoint away from a set, while we are trying to correct by moving towards it. Recall alsothat (๐‘“ โ€ฒ(๐‘ฅ),โˆ’1) is normal to epi ๐‘“ for a smooth function ๐‘“ ; see Figure 20.1 and compareLemma 4.10 as well as Section 20.4 below.) In Banach spaces, of course, normal vectors aresubsets of the dual space.

We thus distinguish

(i) graphical derivatives, which generalize classical derivatives and are based on tangentcones;

(ii) coderivatives, which generalize adjoint derivatives and are based on normal cones.

270

20 derivatives and coderivatives of set-valued mappings

epi ๐‘“

ฮ”๐‘ฅ = ๐‘ฅโˆ—ฮ”๐‘ฆ = ๐‘“ โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ

โˆ’๐‘ฆโˆ— = โˆ’๐‘ฅโˆ—/๐‘“ โ€ฒ(๐‘ฅ)

๐‘‡graph ๐‘“

๐‘graph ๐‘“

Figure 20.1: Illustration why the coderivatives negate ๐‘ฆโˆ— in comparison to the normal cone.

In each case, we can use either basic or limiting cones, leading to four dierent denitions.

Specically, let ๐‘‹,๐‘Œ be Banach spaces and ๐น : ๐‘‹ โ‡’ ๐‘Œ . Then we dene

(i) the graphical derivative of ๐น at ๐‘ฅ โˆˆ ๐‘‹ for ๐‘ฆ โˆˆ ๐‘Œ as

๐ท๐น (๐‘ฅ |๐‘ฆ) : ๐‘‹ โ‡’ ๐‘Œ, ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) โ‰” {ฮ”๐‘ฆ โˆˆ ๐‘Œ

๏ฟฝ๏ฟฝ (ฮ”๐‘ฅ,ฮ”๐‘ฆ) โˆˆ ๐‘‡graph ๐น (๐‘ฅ, ๐‘ฆ)} ;(ii) the Clarke graphical derivative of ๐น at ๐‘ฅ โˆˆ ๐‘‹ for ๐‘ฆ โˆˆ ๐‘Œ as

๐ท๐น (๐‘ฅ |๐‘ฆ) : ๐‘‹ โ‡’ ๐‘Œ, ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) โ‰”{ฮ”๐‘ฆ โˆˆ ๐‘Œ

๏ฟฝ๏ฟฝ๏ฟฝ (ฮ”๐‘ฅ,ฮ”๐‘ฆ) โˆˆ ๐‘‡graph ๐น (๐‘ฅ, ๐‘ฆ)} ;(iii) the Frรฉchet coderivative of ๐น at ๐‘ฅ โˆˆ ๐‘‹ for ๐‘ฆ โˆˆ ๐‘Œ as

๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) : ๐‘Œ โˆ— โ‡’ ๐‘‹ โˆ—, ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) โ‰”{๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—

๏ฟฝ๏ฟฝ๏ฟฝ (๐‘ฅโˆ—,โˆ’๐‘ฆโˆ—) โˆˆ ๐‘graph ๐น (๐‘ฅ, ๐‘ฆ)};

(iv) the (basic or limiting or Mordukhovich) coderivative of ๐น at ๐‘ฅ โˆˆ ๐‘‹ for ๐‘ฆ โˆˆ ๐‘Œ as

๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) : ๐‘Œ โˆ— โ‡’ ๐‘‹ โˆ—, ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) โ‰” {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— ๏ฟฝ๏ฟฝ (๐‘ฅโˆ—,โˆ’๐‘ฆโˆ—) โˆˆ ๐‘graph ๐น (๐‘ฅ, ๐‘ฆ)

}.

Observe how the coderivatives operate from ๐‘Œ โˆ— to๐‘‹ โˆ—, while the derivatives operate from๐‘‹to๐‘Œ . It is crucial that these are dened directly via (possibly nonconvex) normal cones ratherthan via polarity from the corresponding graphical derivatives to avoid convexication.This will allow for sharper results involving these coderivatives.

We illustrate these denitions with the simplest example of a single-valued linear opera-tor.

Example 20.1 (single-valued linear operators). Let ๐น (๐‘ฅ) โ‰” {๐ด๐‘ฅ} for ๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) and๐‘ข = (๐‘ฅ,๐ด๐‘ฅ) โˆˆ graph ๐น . Note that graph ๐น is a linear subspace of ๐‘‹ ร—๐‘Œ . Since graph ๐น isregular by Corollary 18.27, both of the tangent cones are given by

๐‘‡graph ๐น (๐‘ข) = ๐‘‡graph ๐น (๐‘ข) = graph ๐น = {(ฮ”๐‘ฅ,๐ดฮ”๐‘ฅ) โˆˆ ๐‘‹ ร— ๐‘Œ | ฮ”๐‘ฅ โˆˆ ๐‘‹ },

271

20 derivatives and coderivatives of set-valued mappings

while the normal cones are given by

๐‘graph ๐น (๐‘ข) = ๐‘graph ๐น (๐‘ข) = {๐‘ขโˆ— โˆˆ ๐‘‹ โˆ— ร— ๐‘Œ โˆ— | ๐‘ขโˆ— โŠฅ graph ๐น }= {(๐‘ฅโˆ—, ๐‘ฆโˆ—) โˆˆ ๐‘‹ โˆ— ร— ๐‘Œ โˆ— | ใ€ˆ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ + ใ€ˆ๐‘ฆโˆ—, ๐ดฮ”๐‘ฅใ€‰๐‘Œ = 0 for all ฮ”๐‘ฅ โˆˆ ๐‘‹ }= {(๐ดโˆ—๐‘ฆโˆ—,โˆ’๐‘ฆโˆ—) โˆˆ ๐‘‹ โˆ— ร— ๐‘Œ โˆ— | ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ—}.

This immediately yields the graphical derivatives

๐ท๐น (๐‘ฅ |๐ด๐‘ฅ) (ฮ”๐‘ฅ) = ๐ท๐น (๐‘ฅ |๐ด๐‘ฅ) (ฮ”๐‘ฅ) = {๐ดฮ”๐‘ฅ}

as well as the coderivatives

๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) = ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) = {๐ดโˆ—๐‘ฆโˆ—}.

Using (18.1), we can also write the graphical derivative as

(20.1) ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) = lim sup๐‘กโ†’ 0,ฮ”๐‘ฅโ†’ฮ”๐‘ฅ

๐น (๐‘ฅ + ๐‘กฮ”๐‘ฅ) โˆ’ ๐‘ฆ๐‘ก

,

since(ฮ”๐‘ฅ,ฮ”๐‘ฆ) โˆˆ lim sup

๐œโ†’ 0

graph ๐น โˆ’ (๐‘ฅ, ๐‘ฆ)๐œ

if and only if there exist ๐œ๐‘˜โ†’ 0 and ๐‘ฅ๐‘˜ such that

(20.2) ฮ”๐‘ฅ = lim๐‘˜โ†’โˆž

๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐œ๐‘˜

and ฮ”๐‘ฆ โˆˆ lim sup๐‘˜โ†’โˆž

๐น (๐‘ฅ๐‘˜) โˆ’ ๐‘ฆ๐œ๐‘˜

.

The former forces ๐‘ฅ๐‘˜ = ๐‘ฅ โˆ’ ๐œ๐‘˜ฮ”๐‘ฅ๐‘˜ for ฮ”๐‘ฅ๐‘˜ โ†’ ฮ”๐‘ฅ , so the latter gives (20.1).

In innite-dimensional spaces, we also have to distinguish the weak graphical derivative๐ท๐‘ค๐น (๐‘ฅ |๐‘ฆ) and the Y-coderivative ๐ทโˆ—

Y ๐น (๐‘ฅ |๐‘ฆ), both constructed analogously from the weaktangent cone ๐‘‡๐‘คgraph ๐น (๐‘ฅ, ๐‘ฆ) and the Y-normal cone ๐‘ Y

graph ๐น (๐‘ฅ, ๐‘ฆ), respectively. However, wewill not beworking directlywith these and instead switch to the setting of the correspondingcones when they would be needed.

Remark 20.2 (a much too brief history of various (co)derivatives). As for the various tangentand normal cones, the (more recent) development of derivatives and coderivatives of set-valuedmappings is convoluted, and we do not attempt to give a full account, instead referring to thecommentaries to [Rockafellar & Wets 1998, Chapter 8], [Mordukhovich 2006, Chapter 1.4.12], and[Mordukhovich 2018, Chapter 1].

The graphical derivative goes back to Aubin [Aubin 1981], who also introduced the Clarke graphicalderivative (under the name circatangent derivative) in [Aubin 1984]. Coderivatives based on normalcones were mainly treated there for mappings whose graphs are convex, for which these cones canbe dened as polars of the appropriate tangent cones. Graphical derivatives were further studied in

272

20 derivatives and coderivatives of set-valued mappings

[Thibault 1983]. In parallel, Mordukhovich introduced the (nonconvex) limiting coderivative via hislimiting normal cone in [Morduhoviฤ 1980], again stressing the need for a genuinely nonconvexdirect construction. The term coderivative was coined by Ioe, who was the rst to study thesemappings systematically in [Ioe 1984].

20.2 basic properties

We now translate various results of Chapter 18 on tangent and normal cones to the settingof graphical derivatives and coderivatives. From Theorem 18.5, we immediately obtain

Corollary 20.3. For ๐น : ๐‘‹ โ‡’ ๐‘Œ , ๐‘ฅ โˆˆ ๐‘‹ , and ๐‘ฆ โˆˆ ๐‘Œ , we have the inclusions(i) ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) โŠ‚ ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) โŠ‚ ๐ท๐‘ค๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) for all ฮ”๐‘ฅ โˆˆ ๐‘‹ ;(ii) ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) โŠ‚ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) for all ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ—.

Similarly, we obtain from Theorem 18.8 the following outer semicontinuity and convexityproperties.

Corollary 20.4. For ๐น : ๐‘‹ โ‡’ ๐‘Œ , ๐‘ฅ โˆˆ ๐‘‹ , and ๐‘ฆ โˆˆ ๐‘Œ ,(i) ๐ท๐น (๐‘ฅ |๐‘ฆ), ๐ท๐น (๐‘ฅ |๐‘ฆ), and ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) are closed;(ii) if ๐‘‹ and ๐‘Œ are nite-dimensional, then ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) is closed;(iii) ๐ท๐น (๐‘ฅ |๐‘ฆ) and ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) are convex.

Graphical derivatives and coderivatives behave completely symmetrically with respectto inversion of a set-valued mapping (which we recall is always possible in the sense ofpreimages).

Lemma 20.5. Let ๐น : ๐‘‹ โ‡’ ๐‘Œ , ๐‘ฅ โˆˆ ๐‘‹ , and ๐‘ฆ โˆˆ ๐‘Œ . Thenฮ”๐‘ฆ โˆˆ ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) โ‡” ฮ”๐‘ฅ โˆˆ ๐ท๐นโˆ’1(๐‘ฆ |๐‘ฅ) (ฮ”๐‘ฆ),ฮ”๐‘ฆ โˆˆ ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) โ‡” ฮ”๐‘ฅ โˆˆ ๐ท๐นโˆ’1(๐‘ฆ |๐‘ฅ) (ฮ”๐‘ฆ),๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) โ‡” โˆ’๐‘ฆโˆ— โˆˆ ๐ทโˆ—๐นโˆ’1(๐‘ฆ |๐‘ฅ) (โˆ’๐‘ฅโˆ—),๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) โ‡” โˆ’๐‘ฆโˆ— โˆˆ ๐ทโˆ—๐นโˆ’1(๐‘ฆ |๐‘ฅ) (โˆ’๐‘ฅโˆ—).

Proof. We have

ฮ”๐‘ฆ โˆˆ ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) โ‡” (ฮ”๐‘ฅ,ฮ”๐‘ฆ) โˆˆ ๐‘‡graph ๐น (๐‘ฅ, ๐‘ฆ)โ‡” (ฮ”๐‘ฆ,ฮ”๐‘ฅ) โˆˆ ๐‘‡graph ๐นโˆ’1 (๐‘ฆ, ๐‘ฅ)โ‡” ฮ”๐‘ฅ โˆˆ ๐ท๐นโˆ’1(๐‘ฆ |๐‘ฅ) (ฮ”๐‘ฆ).

The proof for the regular derivative and the coderivatives is completely analogous. ๏ฟฝ

273

20 derivatives and coderivatives of set-valued mappings

adjoints of set-valued mappings

From the various relations between normal and tangent cones, we obtain correspondingrelations between these derivatives. To state these relationships, we need to introduce theupper and lower adjoints of set-valued mappings. Let ๐ป : ๐‘‹ โ‡’ ๐‘Œ be a set-valued mapping.Then the upper adjoint of ๐ป is dened as

๐ป โ—ฆ+(๐‘ฆโˆ—) โ‰” {๐‘ฅโˆ— | ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค ใ€ˆ๐‘ฆโˆ—, ๐‘ฆใ€‰๐‘Œ for all ๐‘ฆ โˆˆ ๐ป (๐‘ฅ), ๐‘ฅ โˆˆ ๐‘‹ },

and the lower adjoint of ๐ป as

๐ป โ—ฆโˆ’(๐‘ฆโˆ—) โ‰” {๐‘ฅโˆ— | ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ฅ ใ€ˆ๐‘ฆโˆ—, ๐‘ฆใ€‰๐‘Œ for all ๐‘ฆ โˆˆ ๐ป (๐‘ฅ), ๐‘ฅ โˆˆ ๐‘‹ }.

As the next example shows, these notions generalize the denition of the adjoint of a linearoperator.

Example 20.6 (upper and lower adjoints of linear mappings). Let ๐ป (๐‘ฅ) โ‰” {๐ด๐‘ฅ} for๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ). Then

๐ป โ—ฆ+(๐‘ฆโˆ—) = {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค ใ€ˆ๐‘ฆโˆ—, ๐‘ฆใ€‰๐‘Œ for all ๐‘ฆ = ๐ด๐‘ฅ, ๐‘ฅ โˆˆ ๐‘‹ }= {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค ใ€ˆ๐‘ฆโˆ—, ๐ด๐‘ฅใ€‰๐‘Œ for all ๐‘ฅ โˆˆ ๐‘‹ }= {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฅโˆ— โˆ’๐ดโˆ—๐‘ฆโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค 0 for all ๐‘ฅ โˆˆ ๐‘‹ }= {๐ดโˆ—๐‘ฆโˆ—}.

Similarly, ๐ป โ—ฆโˆ’(๐‘ฆโˆ—) = {๐ดโˆ—๐‘ฆโˆ—}.

For solution mappings of linear equations, we have the following adjoints.

Example 20.7 (upper and lower adjoints of solution maps to linear equations). Let๐ป (๐‘ฅ) โ‰” {๐‘ฆ | ๐ด๐‘ฆ = ๐‘ฅ} for ๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ). Then

๐ป โ—ฆ+(๐‘ฆโˆ—) = {๐‘ฅโˆ— | ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰ โ‰ค ใ€ˆ๐‘ฆโˆ—, ๐‘ฆใ€‰ for all ๐ด๐‘ฆ = ๐‘ฅ, ๐‘ฅ โˆˆ ๐‘‹ }

If ๐‘ฆโˆ— โˆ‰ ran๐ดโˆ—, then ran๐ดโˆ— โŠฅ ker๐ด โ‰  โˆ…, so for every ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— and ๐‘ฅ โˆˆ ๐‘‹ we can choose๐‘ฆ โˆˆ ๐‘Œ such that the above condition is not satised. Therefore ๐ป โ—ฆ+(๐‘ฆ) = โˆ…. Otherwise,if ๐‘ฆโˆ— = ๐ดโˆ—๐‘ฅโˆ—, we continue to calculate

๐ป โ—ฆ+(๐‘ฆโˆ—) = {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ โ‰ค ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ for all ๐‘ฅ โˆˆ ๐‘‹ } = {๐‘ฅโˆ—}.

Therefore๐ป โ—ฆ+(๐‘ฆโˆ—) = {๐‘ฅโˆ— โˆˆ ๐‘‹ | ๐ดโˆ—๐‘ฅโˆ— = ๐‘ฆโˆ—}.

A similar argument shows that ๐ป โ—ฆโˆ’(๐‘ฆโˆ—) = ๐ป โ—ฆ+(๐‘ฆโˆ—).

274

20 derivatives and coderivatives of set-valued mappings

These examples and Example 20.1 suggest the adjoint relationships of the next corollary.Note that in innite-dimensional spaces, we only have a relationship between the limitingderivatives, i.e., between the Clarke graphical derivative and the limiting coderivative.

Corollary 20.8. Let ๐‘‹,๐‘Œ be Banach spaces and ๐น : ๐‘‹ โ‡’ ๐‘Œ .

(i) If ๐‘‹ and ๐‘Œ are nite-dimensional, then

๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) = ๐ท๐น (๐‘ฅ |๐‘ฆ)โ—ฆ+.

(ii) If๐‘‹ and๐‘Œ are reexive andGรขteaux smooth (in particular, if they are nite-dimensional),and graph ๐น is closed near (๐‘ฅ, ๐‘ฆ), then

๐ท๐น (๐‘ฅ |๐‘ฆ) = ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ)โ—ฆโˆ’.

Proof. (i): Identifying ๐‘‹ โˆ— with ๐‘‹ and ๐‘Œ โˆ— with ๐‘Œ in nite dimension, we have by denitionthat

๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) = {ฮ”๐‘ฆ โˆˆ ๐‘Œ

๏ฟฝ๏ฟฝ (ฮ”๐‘ฅ,ฮ”๐‘ฆ) โˆˆ ๐‘‡graph ๐น (๐‘ฅ, ๐‘ฆ)}and

๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฆ) ={ฮ”๐‘ฅ โˆˆ ๐‘‹

๏ฟฝ๏ฟฝ๏ฟฝ (ฮ”๐‘ฅ,โˆ’ฮ”๐‘ฆ) โˆˆ ๐‘graph ๐น (๐‘ฅ, ๐‘ฆ)}.

Using Lemma 18.10 (iii), we then see that

๐‘ฅโˆ— โˆˆ ๐ท๐น (๐‘ฅ |๐‘ฆ)โ—ฆ+(๐‘ฆโˆ—) โ‡” ใ€ˆ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ โ‰ค ใ€ˆ๐‘ฆโˆ—,ฮ”๐‘ฆใ€‰๐‘Œ for ฮ”๐‘ฆ โˆˆ ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ)โ‡” ใ€ˆ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ + ใ€ˆโˆ’๐‘ฆโˆ—,ฮ”๐‘ฆใ€‰๐‘Œ โ‰ค 0 for (ฮ”๐‘ฅ,ฮ”๐‘ฆ) โˆˆ ๐‘‡graph ๐น (๐‘ฅ, ๐‘ฆ)โ‡” (๐‘ฅโˆ—,โˆ’๐‘ฆโˆ—) โˆˆ ๐‘‡graph ๐น (๐‘ฅ, ๐‘ฆ)โ—ฆ = ๐‘graph ๐น (๐‘ฅ, ๐‘ฆ)โ‡” ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—).

This proves the claim.

(ii): We proceed analogously to (i) using Theorem 18.19 (or Theorem 18.15 if ๐‘‹ and ๐‘Œ arenite-dimensional):

ฮ”๐‘ฆ โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ)โ—ฆโˆ’(ฮ”๐‘ฅ) โ‡” ใ€ˆ๐‘ฆโˆ—,ฮ”๐‘ฆใ€‰๐‘Œ โ‰ฅ ใ€ˆ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ for ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—)โ‡” ใ€ˆ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ + ใ€ˆโˆ’๐‘ฆโˆ—,ฮ”๐‘ฆใ€‰๐‘Œ โ‰ค 0 for (๐‘ฅโˆ—,โˆ’๐‘ฆโˆ—) โˆˆ ๐‘graph ๐น (๐‘ฅ, ๐‘ฆ)โ‡” (ฮ”๐‘ฅ,ฮ”๐‘ฆ) โˆˆ ๐‘graph ๐น (๐‘ฅ, ๐‘ฆ)โ—ฆ = ๐‘‡graph ๐น (๐‘ฅ, ๐‘ฆ)โ‡” ฮ”๐‘ฆ โˆˆ ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ). ๏ฟฝ

275

20 derivatives and coderivatives of set-valued mappings

limiting characterizations in finite dimensions

In nite dimensions, we can characterize the limiting coderivative and the Clarke derivativedirectly as inner and outer limits, respectively.

Corollary 20.9. Let๐‘‹ and๐‘Œ be nite-dimensional and ๐น : ๐‘‹ โ‡’ ๐‘Œ . Then for all (๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘‹ ร—๐‘Œand all ๐‘ฆโˆ— โˆˆ ๐‘Œ ,

๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) =๐‘ฅโˆ— โˆˆ ๐‘‹

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝthere exists graph ๐น 3 (๐‘ฅ, ๏ฟฝ๏ฟฝ) โ†’ (๐‘ฅ, ๐‘ฆ)

and (๐‘ฅโˆ—, ๏ฟฝ๏ฟฝโˆ—) โ†’ (๐‘ฅโˆ—, ๐‘ฆโˆ—)with ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๏ฟฝ๏ฟฝ) (๏ฟฝ๏ฟฝโˆ—)

.(20.3)

If graph ๐น is closed near (๐‘ฅ, ๐‘ฆ), then for all ฮ”๐‘ฅ โˆˆ โ„๐‘

๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) =ฮ”๐‘ฆ โˆˆ ๐‘Œ

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ for all graph ๐น 3 (๐‘ฅ, ๏ฟฝ๏ฟฝ) โ†’ (๐‘ฅ, ๐‘ฆ)there exists (ฮ”๐‘ฅ,ฮ”๏ฟฝ๏ฟฝ) โ†’ (ฮ”๐‘ฅ,ฮ”๐‘ฆ)

with ฮ”๏ฟฝ๏ฟฝ โˆˆ ๐ท๐น (๐‘ฅ |๏ฟฝ๏ฟฝ) (ฮ”๐‘ฅ)

.(20.4)

Proof. The characterization (20.3) of the limiting coderivative is a direct application of thedenition of the limiting normal cone (18.3) as an outer limit of the Frรฉchet normal. Thecharacterization (20.4) of the Clarke graphical derivative follows from the characterizationof Corollary 18.20 of the Clarke tangent cone as an inner limit of (basic) tangent cones. ๏ฟฝ

regularity

Based on the regularity concepts of sets from Section 18.4, we can dene concepts ofregularity of set-valued mappings. We say that ๐น at (๐‘ฅ, ๐‘ฆ) โˆˆ graph ๐น (or at ๐‘ฅ for ๐‘ฆ โˆˆ ๐น (๐‘ฅ))is

(i) T-regular if ๐ท๐น (๐‘ฅ |๐‘ฆ) = ๐ท๐น (๐‘ฅ |๐‘ฆ) (i.e., if graph ๐น has tangential regularity);

(ii) N-regular, if ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) = ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (i.e., if graph ๐น has normal regularity).

If ๐น is both T- and N-regular at (๐‘ฅ, ๐‘ฆ), we say that ๐น is graphically regular.

FromTheorem 18.25,we immediately obtain the following characterization of๐‘ -regularity.

Corollary 20.10. Let ๐‘‹,๐‘Œ be reexive and Gรขteaux smooth Banach spaces, ๐น : ๐‘‹ โ‡’ ๐‘Œ , andlet (๐‘ฅ, ๐‘ฆ) โˆˆ graph ๐น with graph ๐น closed near (๐‘ฅ, ๐‘ฆ). Then ๐น is N-regular at (๐‘ฅ, ๐‘ฆ) if and onlyif ๐ท๐น (๐‘ฅ |๐‘ฆ) = [๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ)]โ—ฆโˆ’.

Writing out various alternatives of Theorem 18.23 for set-valued mappings, we obtain fullequivalence of the notions and alternative characterizations in nite dimensions.

276

20 derivatives and coderivatives of set-valued mappings

Corollary 20.11. Let ๐‘‹,๐‘Œ be nite-dimensional and ๐น : ๐‘‹ โ‡’ ๐‘Œ . If graph ๐น is closed near(๐‘ฅ, ๐‘ฆ), then the following conditions are equivalent:

(i) ๐น is N-regular at ๐‘ฅ for ๐‘ฆ , i.e., ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) = ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ);(ii) ๐น is T-regular at ๐‘ฅ for ๐‘ฆ , i.e., ๐ท๐น (๐‘ฅ |๐‘ฆ) = ๐ท๐น (๐‘ฅ |๐‘ฆ);

(iii) ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) โŠƒ๐‘ฅโˆ— โˆˆ ๐‘‹

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝthere exists graph ๐น 3 (๐‘ฅ, ๏ฟฝ๏ฟฝ) โ†’ (๐‘ฅ, ๐‘ฆ)

and (๐‘ฅโˆ—, ๏ฟฝ๏ฟฝโˆ—) โ†’ (๐‘ฅโˆ—, ๐‘ฆโˆ—)with ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๏ฟฝ๏ฟฝ) (๏ฟฝ๏ฟฝโˆ—)

;

(iv) ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) โŠ‚ฮ”๐‘ฆ โˆˆ ๐‘Œ

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ for all (๐‘ฅ, ๏ฟฝ๏ฟฝ) โ†’ (๐‘ฅ, ๐‘ฆ)there exists graph ๐น 3 (ฮ”๐‘ฅ,ฮ”๏ฟฝ๏ฟฝ) โ†’ (ฮ”๐‘ฅ,ฮ”๐‘ฆ)

with ฮ”๏ฟฝ๏ฟฝ โˆˆ ๐ท๐น (๐‘ฅ |๏ฟฝ๏ฟฝ) (ฮ”๐‘ฅ)

.In particular, if any of these hold, ๐น is graphically regular at ๐‘ฅ for ๐‘ฆ .

20.3 examples

As the following examples demonstrate, the graphical derivatives and coderivatives gener-alize classical (sub)dierentials.

single-valued mappings and their inverses

For the Clarke graphical derivative and the limiting coderivatives (which are obtainedas inner or outer limits), we have to require โ€“ just as for the Clarke subdierential inTheorem 13.5 โ€“ slightly more than just Frรฉchet dierentiability.

Theorem 20.12. Let ๐‘‹,๐‘Œ be Banach spaces and let ๐น : ๐‘‹ โ†’ ๐‘Œ be single-valued and Frรฉchet-dierentiable at ๐‘ฅ โˆˆ ๐‘‹ . Then

๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) ={{๐น โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ} if ๐‘ฆ = ๐น (๐‘ฅ),โˆ… otherwise,

and

๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) ={{๐น โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ—} if ๐‘ฆ = ๐น (๐‘ฅ),โˆ… otherwise.

If ๐น is continuously Frรฉchet-dierentiable at ๐‘ฅ , then ๐น is graphically regular at ๐‘ฅ for ๐น (๐‘ฅ),and hence the corresponding expressions also hold for ๐ท๐น (๐‘ฅ |๐‘ฆ) and ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ).

277

20 derivatives and coderivatives of set-valued mappings

Proof. The graphical derivative: We have (ฮ”๐‘ฅ,ฮ”๐‘ฆ) โˆˆ ๐‘‡graph ๐น (๐‘ฅ, ๐‘ฆ) if and only if for some๐‘ฅ๐‘˜ โ†’ ๐‘ฅ , ๐‘ฆ๐‘˜ โ‰” ๐น (๐‘ฅ๐‘˜), and ๐œ๐‘˜โ†’ 0 there holds

ฮ”๐‘ฅ = lim๐‘˜โ†’โˆž

๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐œ๐‘˜

=: lim๐‘˜โ†’โˆž

ฮ”๐‘ฅ๐‘˜(20.5a)

and

ฮ”๐‘ฆ = lim๐‘˜โ†’โˆž

๐‘ฆ๐‘˜ โˆ’ ๐‘ฆ๐œ๐‘˜

= lim๐‘˜โ†’โˆž

๐น (๐‘ฅ + ๐œ๐‘˜ฮ”๐‘ฅ๐‘˜) โˆ’ ๐น (๐‘ฅ)๐œ๐‘˜

.(20.5b)

If ฮ”๐‘ฅ๐‘˜ = 0 for all suciently large ๐‘˜ โˆˆ โ„•, clearly both ฮ”๐‘ฅ = 0 and ฮ”๐‘ฆ = 0. This satisesthe claimed expression. So we may assume that ฮ”๐‘ฅ๐‘˜ โ‰  0 for all ๐‘˜ โˆˆ โ„•. In this case, (20.5b)holds if and only if

lim๐‘˜โ†’โˆž

๐น (๐‘ฅ + โ„Ž๐‘˜) โˆ’ ๐น (๐‘ฅ) โˆ’ ๐œ๐‘˜ฮ”๐‘ฆ๐‘˜โ€–โ„Ž๐‘˜ โ€–๐‘‹

= 0

for โ„Ž๐‘˜ โ‰” ๐œ๐‘˜ฮ”๐‘ฅ๐‘˜ and any ฮ”๐‘ฆ๐‘˜ โ†’ ฮ”๐‘ฆ . Since ๐น is Frรฉchet dierentiable, this clearly holdswith

ฮ”๐‘ฆ๐‘˜ โ‰” ๐œโˆ’1๐‘˜ ๐นโ€ฒ(๐‘ฅ)โ„Ž๐‘˜ = ๐น โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ๐‘˜ โ†’ ๐น โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ =: ฮ”๐‘ฆ.

This shows that ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) = {๐น โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ}.The Clarke graphical derivative: To calculate ๐ท๐น (๐‘ฅ |๐‘ฆ), we have to nd all ฮ”๐‘ฅ and ฮ”๐‘ฆ suchthat for every ๐œ๐‘˜โ†’ 0 and (๐‘ฅ๐‘˜ , ๏ฟฝ๏ฟฝ๐‘˜) โ†’ (๐‘ฅ, ๐‘ฆ) with ๏ฟฝ๏ฟฝ๐‘˜ = ๐น (๐‘ฅ๐‘˜), there exists ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ with

ฮ”๐‘ฅ = lim๐‘˜โ†’โˆž

๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐‘˜๐œ๐‘˜

and ฮ”๐‘ฆ = lim๐‘˜โ†’โˆž

๐น (๐‘ฅ๐‘˜) โˆ’ ๐น (๐‘ฅ๐‘˜)๐œ๐‘˜

.

Setting ๐‘ฅ๐‘˜ = ๐‘ฅ๐‘˜ + ๐œ๐‘˜ฮ”๐‘ฅ๐‘˜ with ฮ”๐‘ฅ๐‘˜ โ†’ ฮ”๐‘ฅ , the second condition becomes

ฮ”๐‘ฆ = lim๐‘˜โ†’โˆž

๐น (๐‘ฅ๐‘˜ + ๐œ๐‘˜ฮ”๐‘ฅ๐‘˜) โˆ’ ๐น (๐‘ฅ๐‘˜)๐œ๐‘˜

.

Taking ๐‘ฅ๐‘˜ = ๐‘ฅ , arguing as for ๐ท๐น shows that ฮ”๐‘ฆ = ๐น โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ is the only candidate. It justremains to show that any choice of ๐‘ฅ๐‘˜ gives the same limit, i.e., that

lim๐‘˜โ†’โˆž

๐น (๐‘ฅ๐‘˜ + ๐œ๐‘˜ฮ”๐‘ฅ๐‘˜) โˆ’ ๐น (๐‘ฅ๐‘˜) โˆ’ ๐œ๐‘˜๐น โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ๐œ๐‘˜

= 0.

But this follows from the assumed continuous dierentiability using Lemma 13.22. Thusfor ๐‘ฆ = ๐น (๐‘ฅ),

๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) = {๐น โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ} = ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ).This shows that ๐น is T-regular at ๐‘ฅ for ๐‘ฆ .

The Frรฉchet coderivative: The claim follows from proving that

(20.6) ๐ทโˆ—Y ๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) =

{๐”น(๐น โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ—, Y) if ๐‘ฆ = ๐น (๐‘ฅ),โˆ… otherwise,

278

20 derivatives and coderivatives of set-valued mappings

To show this, we note that ๐‘ฅโˆ— โˆˆ ๐ทโˆ—Y ๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) if and only if for every sequence ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ

with ๐น (๐‘ฅ๐‘˜) โ†’ ๐น (๐‘ฅ),

lim sup๐‘˜โ†’โˆž

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ใ€ˆ๐‘ฆโˆ—, ๐น (๐‘ฅ๐‘˜) โˆ’ ๐น (๐‘ฅ)ใ€‰๐‘Œโˆšโ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ + โ€–๐น (๐‘ฅ๐‘˜) โˆ’ ๐น (๐‘ฅ)โ€–2๐‘Œ

โ‰ค Y.

Dividing both numerator and denominator by โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹ > 0, we obtain the equivalentcondition that

lim sup๐‘˜โ†’โˆž

๐‘ž๐‘˜ โ‰ค Y for ๐‘ž๐‘˜ โ‰”ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ใ€ˆ๐‘ฆโˆ—, ๐น (๐‘ฅ๐‘˜) โˆ’ ๐น (๐‘ฅ)ใ€‰๐‘Œ

โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹.

If we take ๐‘ฅโˆ— โˆˆ ๐”น(๐น โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ—, Y), this condition is veried by the Frรฉchet dierentiability of๐น at ๐‘ฅ . Conversely, to show that this implies ๐‘ฅโˆ— โˆˆ ๐”น(๐น โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ—, Y), we take ๐‘ฅ๐‘˜ โ‰” ๐‘ฅ + ๐œ๐‘˜โ„Ž forsome ๐œ๐‘˜โ†’ 0 and โ„Ž โˆˆ ๐‘‹ with โ€–โ„Žโ€–๐‘‹ = 1. Then again by the Frรฉchet dierentiability of ๐น ,

Y โ‰ฅ lim๐‘˜โ†’โˆž

๐‘ž๐‘˜ = ใ€ˆ๐‘ฅโˆ—, โ„Žใ€‰ โˆ’ ใ€ˆ๐‘ฆโˆ—, ๐น โ€ฒ(๐‘ฅ)โ„Žใ€‰.

Since โ„Ž โˆˆ ๐”น๐‘‹ was arbitrary, this shows that ๐‘ฅโˆ— โˆˆ ๐”น(๐น โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ—, Y).The limiting coderivative: By the denition (18.8), the formula (20.6) for Y-coderivatives,and the continuous dierentiability, we have

๐‘graph ๐น (๐‘ฅ, ๐น (๐‘ฅ)) = w-โˆ—-lim sup๐‘ฅโ†’๐‘ฅ, Yโ†’ 0

๐‘ Ygraph ๐น (๐‘ฅ, ๐น (๐‘ฅ))

= w-โˆ—-lim sup๐‘ฅโ†’๐‘ฅ Yโ†’ 0

{(๐‘ฆโˆ—, ๐น โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ— + ๐‘งโˆ—) โˆˆ ๐‘Œ โˆ— ร— ๐‘‹ โˆ— | ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ—, ๐‘งโˆ— โˆˆ ๐”น(0, Y)}

= w-โˆ—-lim sup๐‘ฅโ†’๐‘ฅ

{(๐‘ฆโˆ—, ๐น โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ—) โˆˆ ๐‘Œ โˆ— ร— ๐‘‹ โˆ— | ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ—}

= {(๐‘ฆโˆ—, ๐น โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ—) โˆˆ ๐‘Œ โˆ— ร— ๐‘‹ โˆ— | ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ—}.This shows the claimed formula for the limiting coderivative and hence N- and thereforegraphical regularity. ๏ฟฝ

Remark 20.13. In nite dimensional spaces, it would be possible to more concisely prove theexpression for ๐ท๐น (๐‘ฅ |๐‘ฆ) using Corollary 18.20. Likewise, we could use the polarity relationshipsof Corollary 20.8 to obtain the expression for ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ). These approaches will, however, not bepossible in more general spaces.

Combining Theorem 20.12 with Lemma 20.5 allows us to compute the graphical derivativesand coderivatives of inverses of single-valued functions.

Corollary 20.14. Let ๐‘‹,๐‘Œ be Banach spaces and let ๐น : ๐‘‹ โ†’ ๐‘Œ be single-valued and Frรฉchet-dierentiable at ๐‘ฅ โˆˆ ๐‘‹ . Then

๐ท๐นโˆ’1(๐‘ฆ |๐‘ฅ) (ฮ”๐‘ฆ) ={{ฮ”๐‘ฅ โˆˆ ๐‘‹ | ๐น โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ = ฮ”๐‘ฆ} if ๐‘ฆ = ๐น (๐‘ฅ),โˆ… otherwise,

279

20 derivatives and coderivatives of set-valued mappings

and

๐ทโˆ—๐นโˆ’1(๐‘ฆ |๐‘ฅ) (๐‘ฅโˆ—) ={{๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ— | ๐น โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ— = ๐‘ฅโˆ—} if ๐‘ฆ = ๐น (๐‘ฅ),โˆ… otherwise.

If ๐น is continuously Frรฉchet-dierentiable at ๐‘ฅ , then ๐นโˆ’1 is graphically at ๐‘ฆ = ๐น (๐‘ฅ) for ๐‘ฅ , andhence the corresponding expressions also hold for ๐ท๐นโˆ’1(๐‘ฆ |๐‘ฅ) and ๐ทโˆ—๐นโˆ’1(๐‘ฆ |๐‘ฅ).

It is important that Theorem 20.12 concerns the strong graphical derivatives ๐ท๐น instead ofthe weak graphical derivative๐ท๐‘ค๐น . Indeed, as the next counter-example demonstrates,๐ท๐‘ค๐นis more of a theoretical tool (with the important property in reexive spaces that๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) =๐ท๐‘ค๐น (๐‘ฅ |๐‘ฆ)โ—ฆ+ by Lemma 18.10 (ii)) which does not enjoy a rich calculus consistent withconventional notions. In the following chapters, we will therefore not develop calculusrules for the weak graphical derivative.

Example 20.15 (counter-example to single-valued weak graphical derivatives). Let๐‘“ โˆˆ ๐ถ1(โ„), ฮฉ โŠ‚ โ„๐‘‘ be open, and

๐น : ๐ฟ2(ฮฉ) โ†’ โ„, ๐น (๐‘ข) =โˆซ 1

0๐‘“ (๐‘ข (๐‘ฅ)) ๐‘‘๐‘ฅ.

Then by the above,

๐ท๐น (๐‘ข |๐น (๐‘ข)) (ฮ”๐‘ข) ={โˆซ 1

0๐‘“ โ€ฒ(๐‘ข (๐‘ฅ))ฮ”๐‘ข (๐‘ฅ) ๐‘‘๐‘ฅ

}.

In particular, ๐ท๐น (๐‘ข |๐น (๐‘ข)) (0) = {0}.However, choosing, e.g., ๐‘“ (๐‘ก) =

โˆš1 + ๐‘ก2, ฮฉ = (0, 1), and ๐‘ข๐‘˜ (๐‘ฅ) โ‰” sign sin(2๐‘˜๐œ‹๐‘ฅ), we

have ๐‘ข๐‘˜ โ‡€ 0 in ๐ฟ2(ฮฉ) but |๐‘ข๐‘˜ (๐‘ฅ) | = 1 for a.e. ๐‘ฅ โˆˆ [0, 1]. Take now ๏ฟฝ๏ฟฝ๐‘˜ โ‰” ๐›ผ๐œ๐‘˜๐‘ข๐‘˜ for anygiven ๐œ๐‘˜โ†’ 0 and ๐›ผ > 0. Then ๏ฟฝ๏ฟฝ๐‘˜ โ‡€ 0 as well, while

๐น (๏ฟฝ๏ฟฝ๐‘˜) โˆ’ ๐น (0) =โˆš1 + ๐›ผ2๐œ2

๐‘˜โˆ’ 1 โ†’ 0.

Moreover, (๏ฟฝ๏ฟฝ๐‘˜ โˆ’ 0)/๐œ๐‘˜ = ๐›ผ๐‘ข๐‘˜ โ‡€ 0 and lim๐‘˜โ†’โˆž(โˆš

1 + ๐›ผ2๐œ2๐‘˜โˆ’ 1

)/๐œ๐‘˜ = ๐›ผ2. As ๐›ผ > 0 was

arbitrary, we deduce that ๐ท๐‘ค๐น (๐‘ข |๐น (๐‘ข)) (0) โŠƒ [0,โˆž).

derivatives and coderivatives of subdifferentials

We now apply these notions to set-valued mappings arising as subdierentials of convexfunctionals. First, we directly obtain from Theorem 20.12 an expression for the squarednorm in Hilbert spaces.

280

20 derivatives and coderivatives of set-valued mappings

Corollary 20.16. Let ๐‘‹ be a Hilbert space and ๐‘“ (๐‘ฅ) = 12 โ€–๐‘ฅ โ€–2๐‘‹ for ๐‘ฅ โˆˆ ๐‘‹ . Then

๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) = ๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) ={{ฮ”๐‘ฅ} if ๐‘ฆ = ๐‘ฅ,

โˆ… otherwise,

and

๐ทโˆ— [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) = ๐ทโˆ— [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) ={{๐‘ฆโˆ—} if ๐‘ฆ = ๐‘ฅ,

โˆ… otherwise.

In particular, ๐œ•๐‘“ is graphically regular at every ๐‘ฅ โˆˆ ๐‘‹ .

Of course, we are more interested in subdierentials of nonsmooth functionals. We rststudy the indicator functional of an interval; see Figure 20.2.

Theorem 20.17. Let ๐‘“ (๐‘ฅ) โ‰” ๐›ฟ [โˆ’1,1] (๐‘ฅ) for ๐‘ฅ โˆˆ โ„. Then

๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) =

โ„ if |๐‘ฅ | = 1, ๐‘ฆ โˆˆ (0,โˆž)๐‘ฅ, ฮ”๐‘ฅ = 0,[0,โˆž)๐‘ฅ if |๐‘ฅ | = 1, ๐‘ฆ = 0, ฮ”๐‘ฅ = 0,{0} if |๐‘ฅ | = 1, ๐‘ฆ = 0, ๐‘ฅฮ”๐‘ฅ < 0,{0} if |๐‘ฅ | < 1, ๐‘ฆ = 0,โˆ… otherwise,

(20.7)

๐ทโˆ— [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) =

โ„, if |๐‘ฅ | = 1, ๐‘ฆ โˆˆ (0,โˆž)๐‘ฅ, ๐‘ฆโˆ— = 0[0,โˆž)๐‘ฅ if |๐‘ฅ | = 1, ๐‘ฆ = 0, ๐‘ฅ๐‘ฆโˆ— โ‰ฅ 0,{0} if |๐‘ฅ | < 1, ๐‘ฆ = 0,โˆ… otherwise,

(20.8)

๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) =

โ„ if |๐‘ฅ | = 1, ๐‘ฆ โˆˆ (0,โˆž)๐‘ฅ, ฮ”๐‘ฅ = 0,{0} if |๐‘ฅ | = 1, ๐‘ฆ = 0, ฮ”๐‘ฅ = 0,{0} |๐‘ฅ | < 1, ๐‘ฆ = 0,โˆ… if otherwise,

(20.9)

and

๐ทโˆ— [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) =

โ„ if |๐‘ฅ | = 1, ๐‘ฆ โˆˆ [0,โˆž)๐‘ฅ, ๐‘ฆโˆ— = 0[0,โˆž)๐‘ฅ if |๐‘ฅ | = 1, ๐‘ฆ = 0, ๐‘ฅ๐‘ฆโˆ— > 0,{0} if |๐‘ฅ | = 1, ๐‘ฆ = 0, ๐‘ฅ๐‘ฆโˆ— < 0,{0} if |๐‘ฅ | < 1, ๐‘ฆ = 0,โˆ… otherwise.

(20.10)

In particular, ๐œ•๐‘“ is graphically regular at ๐‘ฅ for ๐‘ฆ โˆˆ ๐œ•๐‘“ (๐‘ฅ) if and only if |๐‘ฅ | < 1 or ๐‘ฆ โ‰  0.

281

20 derivatives and coderivatives of set-valued mappings

(a) graphical derivative ๐ท [๐œ•๐‘“ ] (b) convex hull co๐ท [๐œ•๐‘“ ] (c) Frรฉchet coderivative ๐ทโˆ— [๐œ•๐‘“ ]

(d) limiting coderivative๐ทโˆ— [๐œ•๐‘“ ]

(e) convex hull co๐ทโˆ— [๐œ•๐‘“ ] (f) Clarke graphical derivative๐ท [๐œ•๐‘“ ]

Figure 20.2: Illustration of the dierent graphical derivatives and coderivatives of ๐œ•๐‘“ for๐‘“ = ๐›ฟ [โˆ’1,1] . The dashed line is graph ๐œ•๐‘“ . The dots indicate the base points (๐‘ฅ, ๐‘ฆ)where ๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) is calculated, and the thick arrows and lled-in areas the di-rections of (ฮ”๐‘ฅ,ฮ”๐‘ฆ) (resp. (ฮ”๐‘ฅ,โˆ’ฮ”๐‘ฆ) for the coderivatives) relative to the basepoint. Observe that there is no graphical regularity at (๐‘ฅ, ๐‘ฆ) โˆˆ {(โˆ’1, 0), (1, 0)}.Everywhere else, ๐œ•๐‘“ is graphically regular. Observe also that cones in the lastgures of each row are polar to the cones in the rst and the second gureson the same row.

Proof. We rst of all recall from Example 4.9 that graph ๐œ•๐‘“ is closed with

(20.11) ๐œ•๐‘“ (๐‘ฅ) =[0,โˆž)๐‘ฅ if |๐‘ฅ | = 1,{0} if |๐‘ฅ | < 1,โˆ… otherwise.

We now verify (20.7). If ๐‘ฆ โˆˆ ๐œ•๐‘“ (๐‘ฅ) and ฮ”๐‘ฆ โˆˆ ๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ), there exist by (20.1) se-quences ๐‘ก๐‘˜โ†’ 0, ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ , and ๐‘ฆ๐‘˜ โˆˆ ๐œ•๐‘“ (๐‘ฅ + ๐‘ก๐‘˜ฮ”๐‘ฅ๐‘˜) such that

(20.12) ฮ”๐‘ฅ = lim๐‘˜โ†’โˆž

๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐‘ก๐‘˜

and ฮ”๐‘ฆ = lim๐‘˜โ†’โˆž

๐‘ฆ๐‘˜ โˆ’ ๐‘ฆ๐‘ก๐‘˜

.

We proceed by case distinction.

282

20 derivatives and coderivatives of set-valued mappings

(i) |๐‘ฅ | = 1, ฮ”๐‘ฅ = 0, and ๐‘ฆ โˆˆ (0,โˆž)๐‘ฅ : Then choosing ๐‘ฅ๐‘˜ โ‰ก ๐‘ฅ , any ฮ”๐‘ฆ โˆˆ โ„ and ๐‘˜ largeenough, we can take ๐‘ฆ๐‘˜ = ๐‘ฆ + ๐‘ก๐‘˜ฮ”๐‘ฆ โˆˆ [0,โˆž)๐‘ฅ = ๐œ•๐‘“ (๐‘ฅ). This yields the rst case of(20.7).

(ii) |๐‘ฅ | = 1, ฮ”๐‘ฅ = 0, but ๐‘ฆ = 0: In this case, choosing ๐‘ฅ๐‘˜ โ‰ก ๐‘ฅ , we can take any ๐‘ฆ๐‘˜ โˆˆ ๐œ•๐‘“ (๐‘ฅ +๐‘ก๐‘˜ฮ”๐‘ฅ๐‘˜) = ๐œ•๐‘“ (๐‘ฅ) = [0,โˆž)๐‘ฅ . Picking any ฮ”๐‘ฆ โˆˆ [0,โˆž)๐‘ฅ and setting ๐‘ฆ๐‘˜ โ‰” ๐‘ฆ + ๐‘ก๐‘˜ฮ”๐‘ฆ ,we deduce that ฮ”๐‘ฆ โˆˆ ๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ). Thus โ€œโŠƒโ€ holds in the second case of (20.7).Since ฮ”๐‘ฆ โˆˆ โˆ’(0,โˆž)๐‘ฅ is clearly not obtainable with ๐‘ฆ๐‘˜ โˆˆ [0,โˆž)๐‘ฅ , also โ€œโŠ‚โ€ holds.

(iii) |๐‘ฅ | = 1 and ฮ”๐‘ฅ = 0, but ๐‘ฆ โˆˆ โˆ’(0,โˆž)๐‘ฅ : Then we have ๐‘ฆ๐‘˜ โˆˆ [0,โˆž)๐‘ฅ for ๐‘˜ large enoughsince in this case either ๐‘ฅ๐‘˜ = ๐‘ฅ or ๐‘ฅ๐‘˜ โˆˆ (โˆ’1, 1). Thus |๐‘ฆ โˆ’ ๐‘ฆ๐‘˜ | โ‰ฅ |๐‘ฆ | > 0, so the secondlimit in (20.12) cannot exist. Therefore the coderivative is empty, which is coveredby the last case of (20.7).

(iv) |๐‘ฅ | = 1 and ๐‘ฅฮ”๐‘ฅ > 0: Then the rst limit in (20.12) requires that ๐‘ฅ๐‘˜ โˆ‰ dom ๐œ•๐‘“ , andhence ๐œ•๐‘“ (๐‘ฅ๐‘˜) = โˆ… for ๐‘˜ large enough. This is again covered by the last case of (20.7).

(v) |๐‘ฅ | = 1 and ๐‘ฅฮ”๐‘ฅ < 0 (the case ๐‘ฅฮ”๐‘ฅ = 0 being covered by (i)โ€“(iii)): Since ฮ”๐‘ฅ โ‰  0 has adierent sign from ๐‘ฅ , it follows from the rst limit in (20.12) that ๐‘ฅ๐‘˜ โˆˆ (โˆ’1, 1) for ๐‘˜large enough. Consequently, ๐œ•๐‘“ (๐‘ฅ๐‘˜) = {0}, i.e., ๐‘ฆ๐‘˜ = 0. The limit (20.12) in this caseonly exists if ๐‘ฆ = 0, in which case also ฮ”๐‘ฆ = 0. This is covered by the third caseof (20.7), while ๐‘ฆ โ‰  0 is covered by the last case.

(vi) |๐‘ฅ | < 1: Then ๐‘ฆ = 0 and necessarily ๐‘ฆ๐‘˜ = 0 for ๐‘˜ large enough. Therefore also ฮ”๐‘ฆ = 0,which yields the fourth case in (20.7).

(vii) |๐‘ฅ | > 1: Then ๐œ•๐‘“ (๐‘ฅ) = โˆ… and therefore the coderivative is empty as well, yieldingagain the nal case (20.7).

The expression for ๐ทโˆ— [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) can be veried using Corollary 20.8 (i). It can also be seengraphically from Figure 20.2.

By the inner and outer limit characterizations of Corollary 20.9, we now obtain the ex-pressions for the Clarke graphical derivative ๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) and the limiting coderivative๐ทโˆ— [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ). Since graph ๐œ•๐‘“ is locally contained in an ane subspace outside of the โ€œcor-ner casesโ€ (๐‘ฅ, ๐‘ฆ) โˆˆ {(1, 0), (โˆ’1, 0)}, only the latter need special inspection. For the Clarkegraphical derivative,we need to writeฮ”๐‘ฆ as the limit ofฮ”๐‘ฆ๐‘˜ โˆˆ ๐ท [๐œ•๐‘“ ] (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) (ฮ”๐‘ฅ๐‘˜) for someฮ”๐‘ฅ๐‘˜ โ†’ ฮ”๐‘ฅ and all graph ๐œ•๐‘“ 3 (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) โ†’ (๐‘ฅ, ๐‘ฆ). Consider for example (๐‘ฅ, ๐‘ฆ) = (โˆ’1, 0).Trying both (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) = (โˆ’1+ 1/๐‘˜, 0) and (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) = (โˆ’1,โˆ’1/๐‘˜), we see that this is only possi-ble for (ฮ”๐‘ฅ,ฮ”๐‘ฆ) = (ฮ”๐‘ฅ๐‘˜ ,ฮ”๐‘ฆ๐‘˜) = (0, 0). This yields the second case of (20.9). Conversely, forthe limiting coderivative, it suces to nd one such sequence from the Frรฉchet coderivative.Choosing for (๐‘ฅ, ๐‘ฆ) = (โˆ’1, 0) again (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) = (โˆ’1 + 1/๐‘˜, 0) and (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) = (โˆ’1,โˆ’1/๐‘˜) aswell as the constant sequence (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) = (โˆ’1, 0) yields the second, third, and rst case of(20.16), respectively.

283

20 derivatives and coderivatives of set-valued mappings

(a) graphical derivative ๐ท [๐œ•๐‘“ ] (b) convex hull co๐ท [๐œ•๐‘“ ] (c) Frรฉchet coderivative ๐ทโˆ— [๐œ•๐‘“ ]

(d) limiting coderivative๐ทโˆ— [๐œ•๐‘“ ]

(e) convex hull co๐ทโˆ— [๐œ•๐‘“ ] (f) Clarke graphical derivative๐ท [๐œ•๐‘“ ]

Figure 20.3: Illustration of the dierent graphical derivatives and coderivatives of ๐œ•๐‘“ for๐‘“ = | ยท |. The dashed line is graph ๐œ•๐‘“ . The dots indicate the base points (๐‘ฅ, ๐‘ฆ)where ๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) is calculated, and the thick arrows and lled-in areas the di-rections of (ฮ”๐‘ฅ,ฮ”๐‘ฆ) (resp. (ฮ”๐‘ฅ,โˆ’ฮ”๐‘ฆ) for the coderivatives) relative to the basepoint. Observe that there is no graphical regularity at (๐‘ฅ, ๐‘ฆ) โˆˆ {(0,โˆ’1), (0, 1)}.Everywhere else, ๐œ•๐‘“ is graphically regular. Observe that cones in the last g-ures of each row are polar to the cones in the rst and the second gures onthe same row.

Finally, in nite dimensions themapping ๐œ•๐‘“ is graphically regular if and only if๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) =๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) by Corollary 20.11, which is the case exactly when |๐‘ฅ | < 1 or ๐‘ฆ โ‰  0 asclaimed. ๏ฟฝ

In nonlinear optimization with inequality constraints, the case where ๐œ•๐‘“ is graphicallyregular corresponds precisely to the case of strict complementarity of the minimizer ๐‘ฅ andthe Lagrange multiplier ๐‘ฆ for the constraint ๐‘ฅ โˆˆ [โˆ’1, 1].We next study the dierent derivatives and graphical regularity of the subdierential ofthe absolute value function; see Figure 20.3.

284

20 derivatives and coderivatives of set-valued mappings

Theorem 20.18. Let ๐‘“ (๐‘ฅ) โ‰” |๐‘ฅ | for ๐‘ฅ โˆˆ โ„. Then

๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) =

{0} if ๐‘ฅ โ‰  0, ๐‘ฆ = sign๐‘ฅ,{0} if ๐‘ฅ = 0, ฮ”๐‘ฅ โ‰  0, ๐‘ฆ = signฮ”๐‘ฅ,(โˆ’โˆž, 0]๐‘ฆ if ๐‘ฅ = 0, ฮ”๐‘ฅ = 0, |๐‘ฆ | = 1,โ„ if ๐‘ฅ = 0, ฮ”๐‘ฅ = 0, |๐‘ฆ | < 1,โˆ… if otherwise,

(20.13)

๐ทโˆ— [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) =

{0} if ๐‘ฅ โ‰  0, ๐‘ฆ = sign๐‘ฅ,(โˆ’โˆž, 0]๐‘ฆ if ๐‘ฅ = 0, ๐‘ฆ๐‘ฆโˆ— โ‰ค 0, |๐‘ฆ | = 1,โ„ if ๐‘ฅ = 0, ๐‘ฆโˆ— = 0, |๐‘ฆ | < 1,โˆ… otherwise,

(20.14)

๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) =

{0} if ๐‘ฅ โ‰  0, ๐‘ฆ = sign๐‘ฅ,{0} if ๐‘ฅ = 0, ฮ”๐‘ฅ = 0, |๐‘ฆ | = 1,โ„ if ๐‘ฅ = 0, ฮ”๐‘ฅ = 0, |๐‘ฆ | < 1,โˆ… otherwise,

(20.15)

and

๐ทโˆ— [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) =

{0} if ๐‘ฅ โ‰  0, ๐‘ฆ = sign๐‘ฅ,{0} if ๐‘ฅ = 0, ๐‘ฆ๐‘ฆโˆ— > 0, |๐‘ฆ | = 1,(โˆ’โˆž, 0]๐‘ฆ if ๐‘ฅ = 0, ๐‘ฆ๐‘ฆโˆ— < 0, |๐‘ฆ | = 1,โ„ if ๐‘ฅ = 0, ๐‘ฆโˆ— = 0, |๐‘ฆ | โ‰ค 1,โˆ… otherwise.

(20.16)

In particular, ๐œ•๐‘“ is graphically regular if and only if ๐‘ฅ โ‰  0 or |๐‘ฆ | < 1.

Proof. To start with proving (20.13), we recall from Example 4.7 that

(20.17) ๐œ•๐‘“ (๐‘ฅ) = sign(๐‘ฅ) ={1} if ๐‘ฅ > 0{โˆ’1} if ๐‘ฅ < 0[โˆ’1, 1] if ๐‘ฅ = 0.

To calculate the graphical derivative, we use that if ๐‘ฆ โˆˆ ๐œ•๐‘“ (๐‘ฅ) and ฮ”๐‘ฆ โˆˆ ๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ),there exist by (20.1) sequences ๐‘ก๐‘˜โ†’ 0, ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ , and ๐‘ฆ๐‘˜ โˆˆ ๐œ•๐‘“ (๐‘ฅ + ๐‘ก๐‘˜ฮ”๐‘ฅ๐‘˜) such that

(20.18) ฮ”๐‘ฅ = lim๐‘˜โ†’โˆž

๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐‘ก๐‘˜

and ฮ”๐‘ฆ = lim๐‘˜โ†’โˆž

๐‘ฆ๐‘˜ โˆ’ ๐‘ฆ๐‘ก๐‘˜

.

We proceed by case distinction:

(i) ๐‘ฅ โ‰  0 and ๐‘ฆ โ‰  sign๐‘ฅ : Then ๐‘ฆ โˆ‰ ๐œ•๐‘“ (๐‘ฅ) and therefore ๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) = โˆ…, which iscovered by the last case of (20.13).

285

20 derivatives and coderivatives of set-valued mappings

(ii) ๐‘ฅ โ‰  0 and ๐‘ฆ = sign๐‘ฅ : Then for any ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ , we have that ๐œ•๐‘“ (๐‘ฅ๐‘˜) = ๐œ•๐‘“ (๐‘ฅ) = {sign๐‘ฅ}for ๐‘˜ large enough. Therefore, for any ฮ”๐‘ฅ โˆˆ โ„ we have that ฮ”๐‘ฆ = 0, which is therst case of (20.13).

(iii) ๐‘ฅ = 0 and ฮ”๐‘ฅ โ‰  0: Then ๐‘ฅ๐‘˜ โ‰  0 and ๐‘ฆ๐‘˜ = sign๐‘ฅ๐‘˜ = signฮ”๐‘ฅ . Therefore the limits in(20.18) will only exist if |๐‘ฆ | = 1, which holds from ๐‘ฆ = signฮ”๐‘ฅ . Thus ฮ”๐‘ฆ = 0, i.e., weobtain the second case of (20.13).

(iv) ๐‘ฅ = 0 and ฮ”๐‘ฅ = 0: Then taking ๐‘ฅ๐‘˜ โ‰ก ๐‘ฅ , we can choose ๐‘ฆ๐‘˜ โˆˆ [โˆ’1,โˆ’1] arbitrarily. If|๐‘ฆ | = 1, then (๐‘ฆ โˆ’ ๐‘ฆ๐‘˜) sign ๐‘ฆ โ‰ค 0, so (20.18) shows that ฮ”๐‘ฆ sign ๐‘ฆ โ‰ค 0, which is thethird case of (20.13). If |๐‘ฆ | < 1, we may obtain any ฮ”๐‘ฆ โˆˆ โ„ by the limit in (20.18).This is the fourth case of (20.13).

The expression for ๐ทโˆ— [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) can be veried using Corollary 20.8 (i). It can also be seengraphically from Figure 20.3.

By the inner and outer limit characterizations of Corollary 20.9, we now obtain the ex-pressions for the Clarke graphical derivative ๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) and the limiting coderivative๐ทโˆ— [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ). Since graph ๐œ•๐‘“ is locally contained in an ane subspace outside of the โ€œcor-ner casesโ€ (๐‘ฅ, ๐‘ฆ) โˆˆ {(0, 1), (0,โˆ’1)}, only the latter need special inspection. For the Clarkegraphical derivative,we need to writeฮ”๐‘ฆ as the limit ofฮ”๐‘ฆ๐‘˜ โˆˆ ๐ท [๐œ•๐‘“ ] (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) (ฮ”๐‘ฅ๐‘˜) for someฮ”๐‘ฅ๐‘˜ โ†’ ฮ”๐‘ฅ and all graph ๐œ•๐‘“ 3 (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) โ†’ (๐‘ฅ, ๐‘ฆ). Consider for example (๐‘ฅ, ๐‘ฆ) = (0,โˆ’1).Trying both (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) = (0,โˆ’1+ 1/๐‘˜) and (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) = (โˆ’1/๐‘˜,โˆ’1), we see that this is only possi-ble for (ฮ”๐‘ฅ,ฮ”๐‘ฆ) = (ฮ”๐‘ฅ๐‘˜ ,ฮ”๐‘ฆ๐‘˜) = (0, 0). This yields the third case of (20.15). Conversely, forthe limiting coderivative, it suces to nd one such sequence from the Frรฉchet coderivative.Choosing for (๐‘ฅ, ๐‘ฆ) = (0,โˆ’1) again (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) = (0,โˆ’1 + 1/๐‘˜) and (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) = (โˆ’1/๐‘˜, 1) as wellas the constant sequence (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) = (โˆ’1, 0) yields the fourth, second, and third case of(20.16), respectively.

Finally, in nite dimensions themapping ๐œ•๐‘“ is graphically regular if and only if๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) =๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฆ) by Corollary 20.11, which is the case exactly when ๐‘ฅ โ‰  0 or |๐‘ฆ | < 1 asclaimed. ๏ฟฝ

20.4 relation to subdifferentials

All of the subdierentials that we have studied in Part III can be constructed from thecorresponding normal cones to the epigraph of a functional ๐ฝ : ๐‘‹ โ†’ โ„ as in the convexcase; see Lemma 4.10. For the Frรฉchet and limiting subdierentials, it is easy to see therelationships

๐œ•๐น ๐ฝ (๐‘ฅ) = {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | (๐‘ฅโˆ—,โˆ’1) โˆˆ ๐‘epi ๐ฝ (๐‘ฅ, ๐ฝ (๐‘ฅ))},(20.19)๐œ•๐‘€ ๐ฝ (๐‘ฅ) = {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | (๐‘ฅโˆ—,โˆ’1) โˆˆ ๐‘epi ๐ฝ (๐‘ฅ, ๐ฝ (๐‘ฅ))},(20.20)

286

20 derivatives and coderivatives of set-valued mappings

from the corresponding denitions. For the Clarke subdierential, however, we have towork a bit harder.

First, we dene for ๐ด โŠ‚ ๐‘‹ and ๐‘ฅ โˆˆ ๐‘‹ the Clarke normal cone

(20.21) ๐‘๐ถ๐ด (๐‘ฅ) โ‰” ๐‘‡๐ด (๐‘ฅ)โ—ฆ.

We can now extend the denition of the Clarke subdierential to arbitrary functionals๐ฝ : ๐‘‹ โ†’ โ„ on Gรขteaux smooth Banach spaces via the Clarke normal cone to theirepigraph.

Lemma 20.19. Let ๐‘‹ be a reexive and Gรขteaux smooth Banach space and let ๐ฝ : ๐‘‹ โ†’ โ„ belocally Lipschitz continuous around ๐‘ฅ โˆˆ ๐‘‹ . Then

๐œ•๐ถ ๐ฝ (๐‘ฅ) = {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | (๐‘ฅโˆ—,โˆ’1) โˆˆ ๐‘๐ถepi ๐ฝ (๐‘ฅ, ๐ฝ (๐‘ฅ))}.

Proof. The Clarke tangent cone to epi ๐ฝ by denition is

๐‘‡epi ๐ฝ (๐‘ฅ, ๐ฝ (๐‘ฅ)) =(ฮ”๐‘ฅ,ฮ”๐‘ก) โˆˆ ๐‘‹ ร—โ„

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ for all ๐œ๐‘˜โ†’ 0, ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ, ๐ฝ (๐‘ฅ๐‘˜) โ‰ค ๐‘ก๐‘˜ โ†’ ๐ฝ (๐‘ฅ)there exist ๐‘ฅ๐‘˜ โˆˆ ๐‘‹ and ๐‘ก๐‘˜ โ‰ฅ ๐ฝ (๐‘ฅ๐‘˜)

with (๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐‘˜)/๐œ๐‘˜ โ†’ ฮ”๐‘ฅ and (๐‘ก๐‘˜ โˆ’ ๐‘ก๐‘˜)/๐œ๐‘˜ โ†’ ฮ”๐‘ก

.If (ฮ”๐‘ฅ,ฮ”๐‘ก) โˆˆ ๐‘‡epi ๐ฝ (๐‘ฅ, ๐ฝ (๐‘ฅ)), then replacing ๐‘ก๐‘˜ by ๐‘ก๐‘˜ + ๐œ๐‘˜ (ฮ”๐‘  โˆ’ ฮ”๐‘ก) โ‰ฅ ๐ฝ (๐‘ฅ๐‘˜) shows thatalso (ฮ”๐‘ฅ,ฮ”๐‘ ) โˆˆ ๐‘‡epi ๐ฝ (๐‘ฅ, ๐ฝ (๐‘ฅ)) for all ฮ”๐‘  โ‰ฅ ฮ”๐‘ก . Thus we may make the minimal choices๐‘ก๐‘˜ = ๐ฝ (๐‘ฅ๐‘˜) and ๐‘ก๐‘˜ = ๐ฝ (๐‘ฅ๐‘˜) to see that

๐‘‡epi ๐ฝ (๐‘ฅ, ๐ฝ (๐‘ฅ)) =(ฮ”๐‘ฅ,ฮ”๐‘ก) โˆˆ ๐‘‹ ร—โ„

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ for all ๐œ๐‘˜โ†’ 0, ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ there exist ๐‘ฅ๐‘˜ โˆˆ ๐‘‹with (๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐‘˜)/๐œ๐‘˜ โ†’ ฮ”๐‘ฅ

and lim sup๐‘˜โ†’โˆž(๐ฝ (๐‘ฅ๐‘˜) โˆ’ ๐ฝ (๐‘ฅ๐‘˜))/๐œ๐‘˜ โ‰ค ฮ”๐‘ก

.Since ๐ฝ is locally Lipschitz continuous, it suces to take ๐‘ฅ๐‘˜ = ๐‘ฅ๐‘˜ + ๐œ๐‘˜ฮ”๐‘ฅ to obtain

๐‘‡epi ๐ฝ (๐‘ฅ, ๐ฝ (๐‘ฅ)) = {(ฮ”๐‘ฅ,ฮ”๐‘ก) โˆˆ ๐‘‹ ร—โ„ | ๐‘ฅ โˆˆ ๐‘‹, ฮ”๐‘ก โ‰ฅ ๐ฝ โ—ฆ(๐‘ฅ ;ฮ”๐‘ฅ)} = epi[๐ฝ โ—ฆ(๐‘ฅ ; ยท )] .

Hence (๐‘ฅโˆ—,โˆ’1) โˆˆ ๐‘๐ถepi ๐ฝ (๐‘ฅ, ๐ฝ (๐‘ฅ)) = ๐‘‡epi ๐ฝ (๐‘ฅ, ๐ฝ (๐‘ฅ))โ—ฆ if and only if ใ€ˆ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ โ‰ค ๐ฝ โ—ฆ(๐‘ฅ ;ฮ”๐‘ฅ) for

all ๐‘ฅ โˆˆ ๐‘‹ , which by denition is equivalent to ๐‘ฅโˆ— โˆˆ ๐œ•๐ถ ๐ฝ (๐‘ฅ). ๏ฟฝ

We furthermore have the following relationship between the Clarke and limiting normalcones.

Corollary 20.20. Let ๐‘‹ be a reexive and Gรขteaux smooth Banach space and ๐ด โŠ‚ ๐‘‹ be closednear ๐‘ฅ โˆˆ ๐ด. Then

๐‘๐ถ๐ด (๐‘ฅ) = ๐‘๐ด (๐‘ฅ)โ—ฆโ—ฆ = cl coโˆ— ๐‘๐ด (๐‘ฅ),

where cl coโˆ— denotes the weak-โˆ— closed convex hull.

287

20 derivatives and coderivatives of set-valued mappings

Proof. First, ๐‘๐ด (๐‘ฅ) โ‰  โˆ… since ๐‘ฅ โˆˆ ๐ด. Furthermore, cl coโˆ— ๐‘๐ด (๐‘ฅ) is the smallest weak-โˆ—-closed and convex set that contains ๐‘๐ด (๐‘ฅ), and therefore Theorem 1.8 and Lemma 1.10imply ๐‘๐ด (๐‘ฅ)โ—ฆโ—ฆ = cl coโˆ— ๐‘๐ด (๐‘ฅ)โ—ฆโ—ฆ = cl coโˆ— ๐‘๐ด (๐‘ฅ). The relationship ๐‘๐ถ

๐ด (๐‘ฅ) = ๐‘๐ด (๐‘ฅ)โ—ฆโ—ฆ is animmediate consequence of Theorem 18.19. ๏ฟฝ

Assuming that ๐‘‹ is Gรขteaux smooth, we now have everything at hand to give a proof ofTheorem 16.10, which characterizes the Clarke subdierential as the weak-โˆ— closed convexhull of the limiting subdierential.

Corollary 20.21. Let ๐‘‹ be a reexive and Gรขteaux smooth Banach space and ๐ฝ : ๐‘‹ โ†’ โ„ belocally Lipschitz continuous around ๐‘ฅ โˆˆ ๐‘‹ . Then ๐œ•๐ถ ๐ฝ (๐‘ฅ) = clโˆ— co ๐œ•๐‘€ ๐ฝ (๐‘ฅ).

Proof. Together, Lemma 20.19 and Corollary 20.20 and (20.20) directly yield

๐œ•๐ถ ๐ฝ (๐‘ฅ) = {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | (๐‘ฅโˆ—,โˆ’1) โˆˆ ๐‘๐ถepi ๐ฝ (๐‘ฅ, ๐ฝ (๐‘ฅ))}

= {๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | (๐‘ฅโˆ—,โˆ’1) โˆˆ clโˆ— co๐‘epi ๐ฝ (๐‘ฅ, ๐ฝ (๐‘ฅ))}= clโˆ— co{๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | (๐‘ฅโˆ—,โˆ’1) โˆˆ ๐‘epi ๐ฝ (๐‘ฅ, ๐ฝ (๐‘ฅ))}= clโˆ— co ๐œ•๐‘€ ๐ฝ (๐‘ฅ). ๏ฟฝ

(The Gรขteaux smoothness of ๐‘‹ can be relaxed to ๐‘‹ being an Asplund space followingRemark 17.8.)

From the corresponding denitions, it also follows that

๐œ•๐น ๐ฝ (๐‘ฅ) = ๐ทโˆ— [epif ๐ฝ ] (๐‘ฅ |๐ฝ (๐‘ฅ)) (โˆ’1),๐œ•๐‘€ ๐ฝ (๐‘ฅ) = ๐ท [epif ๐ฝ ] (๐‘ฅ |๐ฝ (๐‘ฅ)) (โˆ’1),

where the epigraphical function

epif ๐ฝ (๐‘ฅ) โ‰” {๐‘ก โˆˆ โ„ | ๐‘ก โ‰ฅ ๐ฝ (๐‘ฅ)}

satises graph[epif ๐ฝ ] = epi ๐ฝ . Thus the results of the following Chapters 23 and 25 can beused to derive the missing calculus rules for the Frรฉchet and limiting subdierentials. Inparticular, Theorem 25.14 will provide the missing proof of the sum rule (Theorem 16.13)for the limiting subdierential.

288

21 DERIVATIVES AND CODERIVATIVES OF

POINTWISE-DEFINED MAPPINGS

Just as for tangent and normal cones, the relationships between the basic and limitingderivatives and coderivatives are less complete in innite-dimensional spaces than in nite-dimensional ones. In this chapter, we apply the results of Chapter 19 to derive pointwisecharacterizations analogous to Theorem 4.11 for the basic derivatives of pointwise-denedset-valued mappings, which (only) in the case of graphically regularity transfer to theirlimiting variants.

21.1 proto-differentiability

For our superposition formulas, we need some regularity from the nite-dimensionalmappings. The appropriate notion is that of proto-dierentiability, which corresponds tothe geometric derivability of the underlying tangent cone.

Let ๐‘‹,๐‘Œ be Banach spaces. We say that a set-valued mapping ๐น : ๐‘‹ โ‡’ ๐‘Œ is proto-dierentiable at ๐‘ฅ โˆˆ ๐‘‹ for ๐‘ฆ โˆˆ ๐น (๐‘ฅ) if

for every ฮ”๐‘ฆ โˆˆ ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) and ๐œ๐‘˜โ†’ 0,(21.1a)

there exist ๐‘ฅ๐‘˜ โˆˆ ๐‘‹ with ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐œ๐‘˜

โ†’ ฮ”๐‘ฅ and ๐‘ฆ๐‘˜ โˆˆ ๐น (๐‘ฅ๐‘˜) with๐‘ฆ๐‘˜ โˆ’ ๐‘ฆ๐œ๐‘˜

โ†’ ฮ”๐‘ฆ.(21.1b)

In other words, in addition to the basic limit (20.1) dening ๐ท๐น (๐‘ฅ |๐‘ฆ), a corresponding innerlimit holds in the graph space.

By application of Lemma 19.1 and Corollary 19.3, we immediately obtain the followingequivalent characterization.

Corollary 21.1. Let ๐‘‹,๐‘Œ be Banach spaces and ๐น : ๐‘‹ โ‡’ ๐‘Œ . Then ๐น is proto-dierentiable atevery ๐‘ฅ โˆˆ ๐‘‹ for every ๐‘ฆ โˆˆ ๐น (๐‘ฅ) if and only if graph ๐น is geometrically derivable at (๐‘ฅ, ๐‘ฆ). Inparticular, if ๐น is graphically regular at (๐‘ฅ, ๐‘ฆ), then ๐น is proto-dierentiable at ๐‘ฅ for ๐‘ฆ .

Clearly, dierentiable single-valued mappings are proto-dierentiable. Another large classare maximally monotone set-valued mappings on Hilbert spaces.

289

21 derivatives and coderivatives of pointwise-defined mappings

Lemma 21.2. Let ๐‘‹ be a Hilbert space and let ๐ด : ๐‘‹ โ‡’ ๐‘‹ be maximally monotone. Then ๐ด isproto-dierentiable at any ๐‘ฅ โˆˆ dom๐ด for any ๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ).

Proof. Let ฮ”๐‘ฅโˆ— โˆˆ ๐ท [๐ด] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ). By denition, there then exist ๐œ๐‘˜โ†’ 0 and (๐‘ฅ๐‘˜ , ๐‘ฅโˆ—๐‘˜ ) โˆˆgraph๐ด such that (๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ)/๐œ๐‘˜ โ†’ ฮ”๐‘ฅ and (๐‘ฅโˆ—

๐‘˜โˆ’ ๐‘ฅโˆ—)/๐œ๐‘˜ โ†’ ฮ”๐‘ฅโˆ—. To show that ๐ด is proto-

dierentiable, we will construct for an arbitrary sequence ๐œ๐‘˜โ†’ 0 sequences (๐‘ฅ๐‘˜ , ๐‘ฅโˆ—๐‘˜ ) โˆˆgraph๐ด such that (๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ)/๐œ๐‘˜ โ†’ ฮ”๐‘ฅ and (๐‘ฅโˆ—

๐‘˜โˆ’ ๐‘ฅโˆ—)/๐œ๐‘˜ โ†’ ฮ”๐‘ฅโˆ—. We will do so using

resolvents. Similarly to Lemma 6.18, we have that

๐‘ฅโˆ— โˆˆ ๐ด(๐‘ฅ) โ‡” ๐‘ฅ โˆˆ ๐ดโˆ’1(๐‘ฅโˆ—) โ‡” ๐‘ฅโˆ— + ๐‘ฅ โˆˆ {๐‘ฅโˆ—} +๐ดโˆ’1(๐‘ฅโˆ—)โ‡” ๐‘ฅโˆ— โˆˆ R๐ดโˆ’1 (๐‘ฅโˆ— + ๐‘ฅ).

Since๐ด is maximallymonotone and๐‘‹ is reexive,๐ดโˆ’1 is maximallymonotone by Lemma 6.7as well, and thus the resolvent R๐ดโˆ’1 is single-valued by Corollary 6.14. We therefore take

๐‘ฅ๐‘˜ โ‰” ๐‘ฅ + ๐œ๐‘˜๐œ๐‘˜

(๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ) +๐œ๐‘˜๐œ๐‘˜๐‘ฅโˆ—๐‘˜ +

(1 โˆ’ ๐œ๐‘˜

๐œ๐‘˜

)๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—๐‘˜ and

๐‘ฅโˆ—๐‘˜ โ‰” R๐ดโˆ’1

(๐‘ฅ + ๐œ๐‘˜

๐œ๐‘˜(๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ) +

๐œ๐‘˜๐œ๐‘˜๐‘ฅโˆ—๐‘˜ +

(1 โˆ’ ๐œ๐‘˜

๐œ๐‘˜

)๐‘ฅโˆ— โˆ’ ๐œ๐‘˜ (ฮ”๐‘ฅ + ฮ”๐‘ฅโˆ—)

)+ ๐œ๐‘˜ฮ”๐‘ฅโˆ—

= R๐ดโˆ’1 (๐‘ฅโˆ—๐‘˜ + ๐‘ฅ๐‘˜ โˆ’ ๐œ๐‘˜ (ฮ”๐‘ฅ + ฮ”๐‘ฅโˆ—)) + ๐œ๐‘˜ฮ”๐‘ฅโˆ—.

Since resolvents of maximally monotone operators are 1-Lipschitz by Lemma 6.13, we have

lim๐‘˜โ†’โˆž

โ€–๐‘ฅโˆ—๐‘˜โˆ’ ๐‘ฅโˆ— โˆ’ ๐œ๐‘˜ฮ”๐‘ฅโˆ—โ€–๐‘‹

๐œ๐‘˜= lim๐‘˜โ†’โˆž

โ€–R๐ดโˆ’1 (๐‘ฅโˆ—๐‘˜+ ๐‘ฅ๐‘˜ โˆ’ ๐œ๐‘˜ (ฮ”๐‘ฅ + ฮ”๐‘ฅโˆ—)) โˆ’ R๐ดโˆ’1 (๐‘ฅโˆ— + ๐‘ฅ)โ€–๐‘‹

๐œ๐‘˜

โ‰ค lim๐‘˜โ†’โˆž

โ€–(๐‘ฅโˆ—๐‘˜+ ๐‘ฅ๐‘˜ โˆ’ ๐œ๐‘˜ (ฮ”๐‘ฅ + ฮ”๐‘ฅโˆ—)) โˆ’ (๐‘ฅโˆ— + ๐‘ฅ)โ€–๐‘‹

๐œ๐‘˜

= lim๐‘˜โ†’โˆž

โ€–(๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โˆ’ ๐œ๐‘˜ฮ”๐‘ฅ) + (๐‘ฅโˆ—๐‘˜โˆ’ ๐‘ฅโˆ— โˆ’ ๐œ๐‘˜ฮ”๐‘ฅโˆ—)โ€–๐‘‹

๐œ๐‘˜= 0.

Likewise, by inserting the denition of ๐‘ฅ๐‘˜ and using the triangle inequality, we obtain

lim๐‘˜โ†’โˆž

โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โˆ’ ๐œ๐‘˜ฮ”๐‘ฅ โ€–๐‘‹๐œ๐‘˜

โ‰ค lim๐‘˜โ†’โˆž

โ€–(๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โˆ’ ๐œ๐‘˜ฮ”๐‘ฅ) + (๐‘ฅโˆ—๐‘˜โˆ’ ๐‘ฅโˆ— โˆ’ ๐œ๐‘˜ฮ”๐‘ฅโˆ—)โ€–๐‘‹

๐œ๐‘˜

+ lim๐‘˜โ†’โˆž

โ€–๐‘ฅโˆ—๐‘˜โˆ’ ๐‘ฅโˆ— โˆ’ ๐œ๐‘˜ฮ”๐‘ฅโˆ—โ€–๐‘‹

๐œ๐‘˜= 0.

This shows the claimed proto-dierentiability. ๏ฟฝ

Since subdierentials of convex and lower semicontinuous functionals on reexive Banachspaces are maximally monotone by Theorem 6.11, we immediately obtain the following.

290

21 derivatives and coderivatives of pointwise-defined mappings

Corollary 21.3. Let ๐‘‹ be a Hilbert space and let ๐ฝ : ๐‘‹ โ†’ โ„ be proper, convex, and lowersemicontinuous. Then ๐œ•๐ฝ is proto-dierentiable at any ๐‘ฅ โˆˆ dom ๐ฝ for any ๐‘ฅโˆ— โˆˆ ๐œ•๐ฝ (๐‘ฅ).

This corollary combined with Theorems 20.17 and 20.18 shows that proto-dierentiabilityis a strictly weaker property than graphical regularity.

21.2 graphical derivatives and coderivatives

As a corollary of the tangent and normal cone representations from Theorems 19.5 and 19.6,we obtain explicit characterizations of the graphical derivative and the Frรฉchet coderivativeof a class of pointwise-dened set-valued mappings. In the following, let ฮฉ โŠ‚ โ„๐‘‘ be anopen and bounded domain and write again ๐‘โˆ— for the conjugate exponent of ๐‘ โˆˆ (1,โˆž)satisfying 1/๐‘ + 1/๐‘โˆ— = 1.

Theorem 21.4. Let ๐น : ๐ฟ๐‘ (ฮฉ) โ‡’ ๐ฟ๐‘ž (ฮฉ) for ๐‘, ๐‘ž โˆˆ (1,โˆž) have the form

๐น (๐‘ข) = {๐‘ค โˆˆ ๐ฟ๐‘ž (ฮฉ) | ๐‘ค (๐‘ฅ) โˆˆ ๐‘“ (๐‘ข (๐‘ฅ)) for a.e. ๐‘ฅ โˆˆ ฮฉ}

for some pointwise almost everywhere proto-dierentiable mapping ๐‘“ : โ„ โ‡’ โ„. Then forevery๐‘คโˆ— โˆˆ ๐ฟ๐‘žโˆ— (ฮฉ) and ฮ”๐‘ข โˆˆ ๐ฟ๐‘ (ฮฉ),

๐ทโˆ—๐น (๐‘ข |๐‘ค) (๐‘คโˆ—) ={๐‘ขโˆ— โˆˆ ๐ฟ๐‘โˆ— (ฮฉ)

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐ทโˆ—๐‘“ (๐‘ข (๐‘ฅ) |๐‘ค (๐‘ฅ)) (๐‘คโˆ—(๐‘ฅ))for a.e. ๐‘ฅ โˆˆ ฮฉ

},(21.2a)

๐ท๐น (๐‘ข |๐‘ค) (ฮ”๐‘ข) ={ฮ”๐‘ค โˆˆ ๐ฟ๐‘ž (ฮฉ)

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ ฮ”๐‘ค (๐‘ฅ) โˆˆ ๐ท๐‘“ (๐‘ข (๐‘ฅ) |๐‘ค (๐‘ฅ)) (ฮ”๐‘ข (๐‘ฅ))for a.e. ๐‘ฅ โˆˆ ฮฉ

}.(21.2b)

Moreover, if ๐‘“ is graphically regular at ๐‘ข (๐‘ฅ) for ๐‘ค (๐‘ฅ) for almost every ๐‘ฅ โˆˆ ฮฉ, then ๐น isgraphically regular at ๐‘ข for๐‘ค and

๐ท๐น (๐‘ข |๐‘ค) = ๐ท๐‘ค๐น (๐‘ข |๐‘ค) = ๐ท๐น (๐‘ข |๐‘ค),๐ทโˆ—๐น (๐‘ข |๐‘ค) = ๐ทโˆ—๐น (๐‘ข |๐‘ค).

Proof. First, graph ๐‘“ is geometrically derivable by Corollary 21.1 due to the assumed proto-dierentiability of ๐‘“ . We further have

graph ๐น ={(๐‘ข,๐‘ค) โˆˆ ๐ฟ๐‘ (ฮฉ) ร— ๐ฟ๐‘ž (ฮฉ)

๏ฟฝ๏ฟฝ (๐‘ข (๐‘ฅ),๐‘ค (๐‘ฅ)) โˆˆ graph ๐‘“ for a.e. ๐‘ฅ โˆˆ ฮฉ}.

Now (21.2b) and (21.2a) follow fromTheorems 19.5 and 19.6, respectively, for๐ถ : ๐‘ฅ โ†ฆโ†’ graph ๐‘“and๐‘ˆ = graph ๐น togetherwith denitions of the graphical derivative in terms of the tangentcone the Frรฉchet coderivative in terms of the Frรฉchet normal cone. The remaining claimsunder graphical regularity follow similarly from Lemma 19.11. ๏ฟฝ

291

21 derivatives and coderivatives of pointwise-defined mappings

The above result directly applies to second derivatives of integral functionals.

Corollary 21.5. Let ๐ฝ : ๐ฟ๐‘ (ฮฉ) โ†’ โ„ for ๐‘ โˆˆ (1,โˆž) be given by

๐ฝ (๐‘ข) =โˆซฮฉ๐‘— (๐‘ข (๐‘ฅ)) ๐‘‘๐‘ฅ

for some proper, convex, and lower semicontinuous integrand ๐‘— : โ„ โ†’ (โˆ’โˆž,โˆž]. Then

๐ทโˆ— [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) (ฮ”๐‘ข) ={ฮ”๐‘ขโˆ— โˆˆ ๐ฟ๐‘โˆ— (ฮฉ)

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ ฮ”๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐ทโˆ— [๐œ• ๐‘—] (๐‘ข (๐‘ฅ) |๐‘ขโˆ—(๐‘ฅ)) (ฮ”๐‘ข (๐‘ฅ))for a.e. ๐‘ฅ โˆˆ ฮฉ

},

๐ท [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) (ฮ”๐‘ข) ={ฮ”๐‘ขโˆ— โˆˆ ๐ฟ๐‘โˆ— (ฮฉ)

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ ฮ”๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐ท [๐œ• ๐‘—] (๐‘ข (๐‘ฅ) |๐‘ขโˆ—(๐‘ฅ)) (ฮ”๐‘ข (๐‘ฅ))for a.e. ๐‘ฅ โˆˆ ฮฉ

}.

Moreover, if ๐œ• ๐‘— is graphically regular at ๐‘ข (๐‘ฅ) for ๐‘ขโˆ—(๐‘ฅ) for almost every ๐‘ฅ โˆˆ ฮฉ, then ๐œ•๐ฝ isgraphically regular at ๐‘ข for ๐‘ขโˆ— and

๐ท [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) = ๐ท๐‘ค [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) = ๐ท [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—),๐ทโˆ— [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) = ๐ทโˆ— [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—).

Proof. By Corollary 21.3, ๐œ• ๐‘— is proto-dierentiable. Since

๐œ•๐ฝ (๐‘ข) ={๐‘ขโˆ— โˆˆ ๐ฟ๐‘โˆ— (ฮฉ) | ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐œ• ๐‘— (๐‘ข (๐‘ฅ)) for a.e. ๐‘ฅ โˆˆ ฮฉ

}by Theorem 4.11 and therefore

graph[๐œ•๐ฝ ] ={(๐‘ข,๐‘ขโˆ—) โˆˆ ๐ฟ๐‘ (ฮฉ) ร— ๐ฟ๐‘โˆ— (ฮฉ) | ๐‘ขโˆ—(๐‘ฅ) โˆˆ ๐œ• ๐‘— (๐‘ข (๐‘ฅ)) for a.e. ๐‘ฅ โˆˆ ฮฉ

},

the remaining claims follow from Theorem 21.4 with ๐น = ๐œ•๐ฝ , ๐‘“ = ๐œ• ๐‘— , and ๐‘ž = ๐‘โˆ—. ๏ฟฝ

Remark 21.6. The case of vector-valued and spatially-varying set-valued mappings and convexintegrands can be found in [Clason & Valkonen 2017b].

We illustrate this result with the usual examples. To keep the presentation simple, we focuson the case ๐‘โˆ— = ๐‘ = 2 such that ๐ฟ2(ฮฉ) is a Hilbert space and we can identify ๐‘‹ ๏ฟฝ ๐‘‹ โˆ—.

First, we immediately obtain from Corollary 20.16 together with Corollary 21.5

Corollary 21.7. Let ๐ฝ : ๐ฟ2(ฮฉ) โ†’ โ„ be given by

๐ฝ (๐‘ข) โ‰”โˆซฮฉ

12 |๐‘ข (๐‘ฅ) |

2 ๐‘‘๐‘ฅ.

Then for ๐‘ขโˆ— = ๐‘ข and all ฮ”๐‘ข โˆˆ ๐ฟ2(ฮฉ), we have๐ท [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) (ฮ”๐‘ข) = ๐ท๐‘ค [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) (ฮ”๐‘ข) = ๐ท [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) (ฮ”๐‘ข) = ฮ”๐‘ข,

๐ทโˆ— [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) (ฮ”๐‘ข) = ๐ทโˆ— [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) (ฮ”๐‘ข) = ฮ”๐‘ข.

If ๐‘ขโˆ— โ‰  ๐‘ข, all the derivatives and coderivatives are empty.

292

21 derivatives and coderivatives of pointwise-defined mappings

From Theorem 20.17, we also obtain expressions for the basic derivatives of indicatorfunctionals for pointwise constraints. For the limiting derivatives, we only obtain expres-sions at points where graphical regularity (corresponding to strict complementarity) holds;cf. Remark 19.14.

Corollary 21.8. Let ๐ฝ : ๐ฟ2(ฮฉ) โ†’ โ„ be given by

๐ฝ (๐‘ข) โ‰”โˆซฮฉ๐›ฟ [โˆ’1,1] (๐‘ข (๐‘ฅ)) ๐‘‘๐‘ฅ.

Let ๐‘ข โˆˆ dom ๐ฝ and ๐‘ขโˆ— โˆˆ ๐œ•๐ฝ (๐‘ข). Then ฮ”๐‘ขโˆ— โˆˆ ๐ท [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) (ฮ”๐‘ข) โŠ‚ ๐ฟ2(ฮฉ) if and only if foralmost every ๐‘ฅ โˆˆ ฮฉ,

ฮ”๐‘ขโˆ—(๐‘ฅ) โˆˆ

โ„ if |๐‘ข (๐‘ฅ) | = 1, ๐‘ขโˆ—(๐‘ฅ) โˆˆ (0,โˆž)๐‘ข (๐‘ฅ), ฮ”๐‘ข (๐‘ฅ) = 0,[0,โˆž)๐‘ข (๐‘ฅ) if |๐‘ข (๐‘ฅ) | = 1, ๐‘ขโˆ—(๐‘ฅ) = 0, ฮ”๐‘ข (๐‘ฅ) = 0,{0} if |๐‘ข (๐‘ฅ) | = 1, ๐‘ขโˆ—(๐‘ฅ) = 0, ๐‘ข (๐‘ฅ)ฮ”๐‘ข (๐‘ฅ) < 0,{0} if |๐‘ข (๐‘ฅ) | < 1, ๐‘ขโˆ—(๐‘ฅ) = 0,โˆ… otherwise.

Similarly, ฮ”๐‘ข โˆˆ ๐ท [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) (ฮ”๐‘ขโˆ—) โŠ‚ ๐ฟ2(ฮฉ) if and only if for almost every ๐‘ฅ โˆˆ ฮฉ,

ฮ”๐‘ข (๐‘ฅ) โˆˆ

โ„, if |๐‘ข (๐‘ฅ) | = 1, ๐‘ขโˆ—(๐‘ฅ) โˆˆ (0,โˆž)๐‘ข (๐‘ฅ),ฮ”๐‘ขโˆ—(๐‘ฅ) = 0,[0,โˆž)๐‘ข (๐‘ฅ) if |๐‘ข (๐‘ฅ) | = 1, ๐‘ขโˆ—(๐‘ฅ) = 0, ๐‘ข (๐‘ฅ)ฮ”๐‘ขโˆ—(๐‘ฅ) โ‰ฅ 0,{0} if |๐‘ข (๐‘ฅ) | < 1, ๐‘ขโˆ—(๐‘ฅ) = 0,โˆ… otherwise.

If either |๐‘ข (๐‘ฅ) | < 1 or ๐‘ขโˆ—(๐‘ฅ) โ‰  0, then ฮ”๐‘ขโˆ— โˆˆ ๐ท [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) (ฮ”๐‘ข) = ๐ทโˆ— [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) (ฮ”๐‘ข) if andonly if for almost every ๐‘ฅ โˆˆ ฮฉ,

ฮ”๐‘ขโˆ—(๐‘ฅ) โˆˆโ„ if |๐‘ข (๐‘ฅ) | = 1, ๐‘ขโˆ—(๐‘ฅ) โˆˆ (0,โˆž)๐‘ข (๐‘ฅ), ฮ”๐‘ข (๐‘ฅ) = 0,{0} if |๐‘ข (๐‘ฅ) | < 1, ฮ”๐‘ข (๐‘ฅ) โˆˆ โ„,

โˆ… otherwise.

A similar characterization holds for the basic derivatives of the ๐ฟ1 norm (as a functional on๐ฟ2(ฮฉ)).

Corollary 21.9. Let ๐ฝ : ๐ฟ2(ฮฉ) โ†’ โ„ be given by

๐ฝ (๐‘ข) โ‰”โˆซฮฉ|๐‘ข (๐‘ฅ) | ๐‘‘๐‘ฅ.

293

21 derivatives and coderivatives of pointwise-defined mappings

Let ๐‘ข โˆˆ dom ๐ฝ and ๐‘ขโˆ— โˆˆ ๐œ•๐ฝ (๐‘ข). Then ฮ”๐‘ขโˆ— โˆˆ ๐ท [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) (ฮ”๐‘ข) โŠ‚ ๐ฟ2(ฮฉ) if and only if foralmost every ๐‘ฅ โˆˆ ฮฉ,

ฮ”๐‘ขโˆ—(๐‘ฅ) โˆˆ

โ„ if |๐‘ข (๐‘ฅ) | = 1, ๐‘ขโˆ—(๐‘ฅ) โˆˆ (0,โˆž)๐‘ข (๐‘ฅ), ฮ”๐‘ข (๐‘ฅ) = 0,[0,โˆž)๐‘ข (๐‘ฅ) if |๐‘ข (๐‘ฅ) | = 1, ๐‘ขโˆ—(๐‘ฅ) = 0, ฮ”๐‘ข (๐‘ฅ) = 0,{0} if |๐‘ข (๐‘ฅ) | = 1, ๐‘ขโˆ—(๐‘ฅ) = 0, ๐‘ข (๐‘ฅ)ฮ”๐‘ข (๐‘ฅ) < 0,{0} if |๐‘ข (๐‘ฅ) | < 1, ๐‘ขโˆ—(๐‘ฅ) = 0,โˆ… otherwise,

Similarly, ฮ”๐‘ข โˆˆ ๐ท [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) (ฮ”๐‘ขโˆ—) โŠ‚ ๐ฟ2(ฮฉ) if and only if for almost every ๐‘ฅ โˆˆ ฮฉ,

ฮ”๐‘ข (๐‘ฅ) โˆˆ

โ„, if |๐‘ข (๐‘ฅ) | = 1, ๐‘ขโˆ—(๐‘ฅ) โˆˆ (0,โˆž)๐‘ข (๐‘ฅ),ฮ”๐‘ขโˆ—(๐‘ฅ) = 0,[0,โˆž)๐‘ข (๐‘ฅ) if |๐‘ข (๐‘ฅ) | = 1, ๐‘ขโˆ—(๐‘ฅ) = 0, ๐‘ข (๐‘ฅ)ฮ”๐‘ขโˆ—(๐‘ฅ) โ‰ฅ 0,{0} if |๐‘ข (๐‘ฅ) | < 1, ๐‘ขโˆ—(๐‘ฅ) = 0,โˆ… otherwise,

If either ๐‘ข (๐‘ฅ) โ‰  0 or |๐‘ขโˆ—(๐‘ฅ) | < 1, then ฮ”๐‘ขโˆ— โˆˆ ๐ท [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) (ฮ”๐‘ข) = ๐ทโˆ— [๐œ•๐ฝ ] (๐‘ข |๐‘ขโˆ—) (ฮ”๐‘ข) if andonly if for almost every ๐‘ฅ โˆˆ ฮฉ,

ฮ”๐‘ขโˆ—(๐‘ฅ) โˆˆ{0} if ๐‘ข (๐‘ฅ) โ‰  0, ๐‘ขโˆ—(๐‘ฅ) = sign๐‘ข (๐‘ฅ), ฮ”๐‘ข (๐‘ฅ) โˆˆ โ„,

โ„ if ๐‘ข (๐‘ฅ) = 0, |๐‘ขโˆ—(๐‘ฅ) | < 1, ฮ”๐‘ข (๐‘ฅ) = 0,โˆ… otherwise.

Obtaining similar characterizations for derivatives of the Clarke subdierential of integralfunctions with nonsmooth nonconvex integrands requires verifying proto-dierentiabilityof the pointwise subdierential mapping, which is challenging since the Clarke subdif-ferential in general does not have the nice properties of the convex subdierential as aset-valued mapping. For problems of the form (P) in the introduction, it is therefore simplerto rst apply the calculus rules from the following chapters (assuming they are applicable)and to then use the above results for the derivatives of the convex or smooth componentmappings.

294

22 CALCULUS FOR THE GRAPHICAL DERIVATIVE

We now turn to calculus such as sum and product rules. We concentrate on the situationwhere at least one of the mappings involved is classically dierentiable, which allows exactresults and is already useful in practice. For a much fuller picture of innite-dimensionalcalculus in high generality, the reader is referred to [Mordukhovich 2006]. For furthernite-dimensional calculus we refer to [Rockafellar & Wets 1998; Mordukhovich 2018].

The rules we develop for the various (co)derivatives are in each case based on lineartransformation formulas of the underlying cones as well as on a fundamental compositionlemma. These fundamental lemmas, however, require further regularity assumptions thatare satised in particular by (continuously) Frรฉchet dierentiable single-valued mappingsand their inverses. For the sake of presentation, we treat each derivative in its own chapter,starting with the relevant regularity concept, then proving the fundamental lemmas, andnally deriving the calculus rules. We start with the (basic) graphical derivative.

22.1 semi-differentiability

Let ๐‘‹,๐‘Œ be Banach spaces and ๐น : ๐‘‹ โ‡’ ๐‘Œ . We say that ๐น is semi-dierentiable at ๐‘ฅ โˆˆ ๐‘‹ for๐‘ฆ โˆˆ ๐น (๐‘ฅ) if

for every ฮ”๐‘ฆ โˆˆ ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) and ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ, ๐œ๐‘˜โ†’ 0 with ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐œ๐‘˜

โ†’ ฮ”๐‘ฅ(22.1a)

there exist ๐‘ฆ๐‘˜ โˆˆ ๐น (๐‘ฅ๐‘˜) with ๐‘ฆ๐‘˜ โˆ’ ๐‘ฆ๐œ๐‘˜

โ†’ ฮ”๐‘ฆ.(22.1b)

In other words, ๐ท๐น (๐‘ฅ |๐‘ฆ) is a full limit.

Lemma 22.1. A mapping ๐น : ๐‘‹ โ‡’ ๐‘Œ is semi-dierentiable at ๐‘ฅ โˆˆ ๐‘‹ for ๐‘ฆ โˆˆ ๐‘Œ if and only if

(22.2) ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) = lim๐œโ†’ 0,ฮ”๐‘ฅโ†’ฮ”๐‘ฅ

๐น (๐‘ฅ + ๐œฮ”๐‘ฅ) โˆ’ ๐‘ฆ๐œ

(ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

Proof. First, note that (20.1) shows that ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) is the outer limit corresponding to(22.2). Similarly, by (22.1), ๐น is semidierentiable if ๐ท๐น (๐‘ฅ |๐‘ฆ) is the corresponding innerlimit. (For any sequence ๐œ๐‘˜โ†’ 0, we can relate ๐‘ฅ๐‘˜ in (22.1) and ฮ”๐‘ฅ =: ฮ”๐‘ฅ๐‘˜ in (22.2) via

295

22 calculus for the graphical derivative

ฮ”๐‘ฅ๐‘˜ = (๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ)/๐œ๐‘˜ .) Hence, ๐น is semidierentiable if and only if the outer limit in (20.1) is afull limit. ๏ฟฝ

Compared to the denition of proto-dierentiability in Section 21.1, we now require thatฮ”๐‘ฆ can be written as the limit of a dierence quotient taken from ๐น (๐‘ฅ๐‘˜) for any sequence{๐‘ฅ๐‘˜} similarly realizing ฮ”๐‘ฅ (while for proto-dierentiability, this only has to be possiblefor one such sequence). Hence, semi-dierentiability is a stronger property than proto-dierentiability with the former implying the latter.

Example 22.2 (proto-dierentiable but not semi-dierentiable). Let ๐น : โ„ โ‡’ โ„ havegraph ๐น = โ„š ร— {0}. Then ๐น is proto-dierentiable at any ๐‘ฅ โˆˆ โ„š by the density of โ„š inโ„. However, ๐น is not semi-dierentiable, as we can take ๐‘ฅ๐‘˜ โˆ‰ โ„š in (22.1).

For single-valued mappings and their (set-valued) inverses, this implies the following.

Lemma 22.3. Let ๐‘‹,๐‘Œ be Banach spaces and ๐น : ๐‘‹ โ†’ ๐‘Œ .

(i) If ๐น is Frรฉchet dierentiable at ๐‘ฅ , then ๐น is semi-dierentiable at ๐‘ฅ for ๐‘ฆ = ๐น (๐‘ฅ).(ii) If ๐น is continuously dierentiable at ๐‘ฅ and ๐น โ€ฒ(๐‘ฅ)โˆ— โˆˆ ๐•ƒ(๐‘Œ โˆ—;๐‘‹ โˆ—) has a left-inverse

๐น โ€ฒ(๐‘ฅ)โˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—), then ๐นโˆ’1 : ๐‘Œ โ‡’ ๐‘‹ is semi-dierentiable at ๐‘ฆ = ๐น (๐‘ฅ) for ๐‘ฅ .

Proof. (i): This follows directly from the denition of semi-dierentiability and the Frรฉchetderivative.

(ii): By Corollary 20.14, ๐ท๐นโˆ’1(๐‘ฆ |๐‘ฅ) (ฮ”๐‘ฆ) = {ฮ”๐‘ฅ โˆˆ ๐‘‹ | ๐น โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ = ฮ”๐‘ฆ} for ๐‘ฆ = ๐น (๐‘ฅ). Hence(22.1) for ๐นโˆ’1 requires showing that for all ๐œ๐‘˜โ†’ 0 and ๐‘ฆ๐‘˜ โˆˆ ๐‘Œ with (๐‘ฆ๐‘˜ โˆ’ ๐‘ฆ)/๐œ๐‘˜ โ†’ ๐น โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ ,there exist ๐‘ฅ๐‘˜ with ๐‘ฆ๐‘˜ = ๐น (๐‘ฅ๐‘˜) and (๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ)/๐œ๐‘˜ โ†’ ฮ”๐‘ฅ . We will construct such ๐‘ฅ๐‘˜ throughthe inverse function theorem applied to an extended function.

Let ๐ด โ‰” ๐น โ€ฒ(๐‘ฅ) and ๐ดโ€  โ‰” (๐น โ€ฒ(๐‘ฅ)โˆ—โ€ )โˆ—. Then ๐ด๐ดโ€  = Id because (๐ด๐ดโ€ )โˆ— = (๐ดโ€ )โˆ—๐ดโˆ— =๐น โ€ฒ(๐‘ฅ)โˆ—โ€ ๐น โ€ฒ(๐‘ฅ)โˆ— = Id. Moreover, ๐‘ƒ โ‰” Idโˆ’๐ดโ€ ๐ด projects into ker๐ด = ker ๐น โ€ฒ(๐‘ฅ), so that๐ด๐‘ƒ = 0.We then dene

๐น : ๐‘‹ โ†’ ๐‘Œ ร— ker ๐น โ€ฒ(๐‘ฅ), ๐น (๐‘ฅ) โ‰” (๐น (๐‘ฅ), ๐‘ƒ๐‘ฅ),such that ๐น (๐‘ฅ)โ€ฒฮ”๐‘ฅ = (๐ดฮ”๐‘ฅ, ๐‘ƒฮ”๐‘ฅ) for all ฮ”๐‘ฅ โˆˆ ๐‘‹ . We further dene

๐‘€ : ๐‘Œ ร— ker ๐น โ€ฒ(๐‘ฅ) โ†’ ๐‘‹, ๐‘€ (๏ฟฝ๏ฟฝ, ๐‘ฅ) โ‰” ๐ดโ€ ๏ฟฝ๏ฟฝ + ๐‘ฅ,

such that for all ฮ”๐‘ฅ โˆˆ ๐‘‹ ,

๐‘€๐น โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ = ๐ดโ€ ๐ดฮ”๐‘ฅ + ๐‘ƒฮ”๐‘ฅ = ฮ”๐‘ฅ .

Thus ๐‘€ is a left-inverse of ๐น โ€ฒ(๐‘ฅ) and thus ker ๐น โ€ฒ(๐‘ฅ) = {0}. Similarly, we verify that ๐‘€ isalso the right-inverse of ๐น โ€ฒ(๐‘ฅ) on ๐‘Œ ร— ker ๐น โ€ฒ(๐‘ฅ). Hence ๐น โ€ฒ(๐‘ฅ) is bijective.

296

22 calculus for the graphical derivative

By the inverse function Theorem 2.8, ๐นโˆ’1 exists in a neighborhood of๐‘ค = (๐‘ฆ, ๐‘ž) โ‰” ๐น (๐‘ฅ)in ๐‘Œ ร— ker ๐น โ€ฒ(๐‘ฅ) with (๐นโˆ’1)โ€ฒ(๐‘ฆ) = ๐‘€ and ๐นโˆ’1(๐‘ค) = ๐‘ฅ . Observe that ๐นโˆ’1(๏ฟฝ๏ฟฝ) โˆˆ ๐นโˆ’1(๏ฟฝ๏ฟฝ) for๏ฟฝ๏ฟฝ = (๏ฟฝ๏ฟฝ, ๐‘ž) in this neighborhood. Taking ๐‘ฅ๐‘˜ โ‰” ๐นโˆ’1(๐‘ฆ + ๐œ๐‘˜ฮ”๐‘ฆ, ๐‘ž + ๐œ๐‘˜๐‘ƒฮ”๐‘ฅ), we have

lim๐‘˜โ†’โˆž

๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐œ๐‘˜

= (๐นโˆ’1)โ€ฒ(๐‘ค) (ฮ”๐‘ฆ, ๐‘ƒฮ”๐‘ฅ) = ๐‘€ (ฮ”๐‘ฆ, ๐‘ƒฮ”๐‘ฅ)

= ๐ดโ€ ฮ”๐‘ฆ + ๐‘ƒฮ”๐‘ฅ = ๐ดโ€ ๐ดฮ”๐‘ฅ + ๐‘ƒฮ”๐‘ฅ = ฮ”๐‘ฅ,

which proves the claim. ๏ฟฝ

Remark 22.4. In Lemma 22.3 (ii), if๐‘‹ is nite-dimensional, it suces to assume that ๐น is continuouslydierentiable with ker ๐น โ€ฒ(๐‘ฅ)โˆ— = {0}. In this case we can take ๐น โ€ฒ(๐‘ฅ)โ€ โˆ— โ‰” ๐ดโˆ—(๐ด๐ดโˆ—)โˆ’1 for ๐ด โ‰” ๐น โ€ฒ(๐‘ฅ).

22.2 cone transformation formulas

At its heart, calculus rules for (co)derivatives of set-valued mappings derive from corre-sponding transformation formulas for the underlying cones. To formulate these, let ๐ถ โŠ‚ ๐‘Œand ๐‘… โˆˆ ๐•ƒ(๐‘Œ ;๐‘‹ ), and ๐‘ฅ โˆˆ ๐‘…๐ถ โ‰” {๐‘…๐‘ฆ | ๐‘ฆ โˆˆ ๐ถ}. We then say that there exists a family ofcontinuous inverse selections

{๐‘…โˆ’1๐‘ฆ : ๐‘ˆ๐‘ฆ โ†’ ๐ถ | ๐‘ฆ โˆˆ ๐ถ, ๐‘…๐‘ฆ = ๐‘ฅ}

of ๐‘… to ๐ถ at ๐‘ฅ โˆˆ ๐‘…๐ถ if for each ๐‘ฆ โˆˆ ๐ถ with ๐‘…๐‘ฆ = ๐‘ฅ there exists a neighborhood๐‘ˆ๐‘ฆ โŠ‚ ๐‘…๐ถ of๐‘ฅ = ๐‘…๐‘ฆ and a map ๐‘…โˆ’1๐‘ฆ : ๐‘ˆ๐‘ฆ โ†’ ๐ถ continuous at ๐‘ฅ with ๐‘…โˆ’1๐‘ฆ ๐‘ฅ = ๐‘ฆ and ๐‘…๐‘…โˆ’1๐‘ฆ ๐‘ฅ = ๐‘ฅ for every๐‘ฅ โˆˆ ๐‘ˆ๐‘ฅ a neighborhood of ๐‘ฅ .

Example 22.5. Let ๐บ : โ„๐‘โˆ’1 โ†’ โ„ be continuous at ๐‘ฅ , and set ๐ถ โ‰” epi๐บ as well as๐‘…(๐‘ฅ, ๐‘ก) โ‰” ๐‘ฅ . Then by the classical inverse function Theorem 2.8,

{๐‘…โˆ’1(๐‘ฅ,๐‘ก) (๐‘ฅ) โ‰” (๐‘ฅ, ๐‘ก โˆ’๐บ (๐‘ฅ) +๐บ (๐‘ฅ)) | ๐‘ก โ‰ฅ ๐บ (๐‘ฅ)}

is a family of continuous inverse selections to ๐ถ at ๐‘ฅ . If ๐บ is Frรฉchet dierentiable at ๐‘ฅ ,then so is ๐‘…โˆ’1(๐‘ก,๐‘ฅ) .

Lemma 22.6. Let๐‘‹,๐‘Œ be Banach spaces and assume there exists a family of continuous inverseselections {๐‘…โˆ’1๐‘ฆ : ๐‘ˆ๐‘ฆ โ†’ ๐ถ | ๐‘ฆ โˆˆ ๐ถ, ๐‘…๐‘ฆ = ๐‘ฅ} of ๐‘… โˆˆ ๐•ƒ(๐‘Œ ;๐‘‹ ) to ๐ถ โŠ‚ ๐‘Œ at ๐‘ฅ โˆˆ ๐‘‹ . If each ๐‘…โˆ’1๐‘ฆ isFrรฉchet dierentiable at ๐‘ฅ , then

๐‘‡๐‘…๐ถ (๐‘ฅ) =โ‹ƒ

๐‘ฆ :๐‘…๐‘ฆ=๐‘ฅ๐‘…๐‘‡๐ถ (๐‘ฆ).

297

22 calculus for the graphical derivative

Proof. We rst prove โ€œโŠƒโ€. Suppose ฮ”๐‘ฆ โˆˆ ๐‘‡๐ถ (๐‘ฆ) for some ๐‘ฆ โˆˆ cl๐ถ with ๐‘…๐‘ฆ = ๐‘ฅ . Thenฮ”๐‘ฆ = lim๐‘˜โ†’โˆž(๐‘ฆ๐‘˜ โˆ’ ๐‘ฆ)/๐œ๐‘˜ for some ๐‘ฆ๐‘˜ โˆˆ ๐ถ and ๐œ๐‘˜โ†’ 0. Consequently, since ๐‘… is bounded,๐‘…(๐‘ฆ๐‘˜ โˆ’ ๐‘ฆ)/๐œ๐‘˜ โ†’ ๐‘…ฮ”๐‘ฆ . But ๐‘…๐‘ฆ โˆˆ cl๐‘…๐ถ , so ๐‘…ฮ”๐‘ฆ โˆˆ ๐‘‡๐‘…๐ถ (๐‘ฅ). On the other hand, if ๐‘ฆ โˆ‰ cl๐ถ ,then ๐‘‡๐ถ (๐‘ฆ) = โˆ… and thus there is nothing to show. Hence โ€œโŠƒโ€ holds.To establish โ€œโŠ‚โ€, we rst of all note that ๐‘‡cl๐‘…๐ถ (๐‘ฅ) = โˆ… if ๐‘ฅ โˆ‰ cl๐‘…๐ถ . So suppose ๐‘ฅ โˆˆ cl๐‘…๐ถand ฮ”๐‘ฅ โˆˆ ๐‘‡๐‘…๐ถ (๐‘ฅ). Then ๐‘ฅ = ๐‘…๐‘ฆ for some ๐‘ฆ โˆˆ cl๐ถ . Since 0 โˆˆ ๐‘‡๐ถ (๐‘ฆ), we can concentrate onฮ”๐‘ฅ โ‰  0. Then ฮ”๐‘ฅ = lim๐‘˜โ†’โˆž(๐‘ฅ๐‘˜ โˆ’๐‘ฅ)/๐œ๐‘˜ for some ๐‘ฅ๐‘˜ โˆˆ ๐‘…๐ถ and ๐œ๐‘˜โ†’ 0. We have ๐‘ฅ๐‘˜ = ๐‘…๐‘ฆ๐‘˜ for๐‘ฆ๐‘˜ โ‰” ๐‘…โˆ’1๐‘ฆ (๐‘ฅ๐‘˜). If we can show that (๐‘ฆ๐‘˜ โˆ’ ๐‘ฆ)/๐œ๐‘˜ โ†’ ฮ”๐‘ฆ for some ฮ”๐‘ฆ โˆˆ ๐‘Œ , then ฮ”๐‘ฆ โˆˆ ๐‘‡๐ถ (๐‘ฅ)and ฮ”๐‘ฅ = ๐‘…ฮ”๐‘ฆ . Since ๐‘…โˆ’1๐‘ฆ is Frรฉchet dierentiable at ๐‘ฅ , letting โ„Ž๐‘˜ โ‰” ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ and using that(โ„Ž๐‘˜ โˆ’ ๐œ๐‘˜ฮ”๐‘ฅ)/๐œ๐‘˜ = (๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ)/๐œ๐‘˜ โˆ’ ฮ”๐‘ฅ โ†’ 0 and โ€–โ„Ž๐‘˜ โ€–๐‘‹/๐œ๐‘˜ โ†’ โ€–ฮ”๐‘ฅ โ€–๐‘‹ , indeed

lim๐‘˜โ†’โˆž

(๐‘ฆ๐‘˜ โˆ’ ๐‘ฆ๐œ๐‘˜

โˆ’ (๐‘…โˆ’1๐‘ฆ )โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ)= lim๐‘˜โ†’โˆž

๐‘…โˆ’1๐‘ฆ (๐‘ฅ๐‘˜) โˆ’ ๐‘…โˆ’1๐‘ฆ (๐‘ฅ) โˆ’ ๐œ๐‘˜ (๐‘…โˆ’1๐‘ฆ )โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ๐œ๐‘˜

= lim๐‘˜โ†’โˆž

๐‘…โˆ’1๐‘ฆ (๐‘ฅ + โ„Ž๐‘˜) โˆ’ ๐‘…โˆ’1๐‘ฆ (๐‘ฅ) โˆ’ (๐‘…โˆ’1๐‘ฆ )โ€ฒ(๐‘ฅ)โ„Ž๐‘˜๐œ๐‘˜

= 0.

Thus ฮ”๐‘ฆ = (๐‘…โˆ’1๐‘ฆ )โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ , which proves โ€œโŠ‚โ€. ๏ฟฝ

Remark 22.7 (qualification conditions in finite dimensions). If ๐‘‹ and ๐‘Œ are nite-dimensional, wecould replace the existence of the family of {๐‘…โˆ’1๐‘ฆ } of continuous selections in Lemma 22.6 by themore conventional qualication conditionโ‹ƒ

๐‘ฆ :๐‘…๐‘ฆ=๐‘ฅ๐‘‡๐ถ (๐‘ฆ) โˆฉ ker๐‘… = {0}.

We do not employ such a condition, as the extension to Banach spaces would have to be based noton ๐‘‡๐ถ (๐‘ฆ) but on the weak tangent cone ๐‘‡๐‘ค๐ถ (๐‘ฆ) that is dicult to compute explicitly.

Webase all our calculus rules on the previous linear transformation lemma and the followingcomposition lemma for the tangent cone ๐‘‡๐ถ .

Lemma 22.8 (fundamental lemma on compositions). Let ๐‘‹,๐‘Œ, ๐‘ be Banach spaces and

๐ถ โ‰” {(๐‘ฅ, ๐‘ฆ, ๐‘ง) | ๐‘ฆ โˆˆ ๐น (๐‘ฅ), ๐‘ง โˆˆ ๐บ (๐‘ฆ)}

for ๐น : ๐‘‹ โ‡’ ๐‘Œ , and ๐บ : ๐‘Œ โ‡’ ๐‘ . If (๐‘ฅ, ๐‘ฆ, ๐‘ง) โˆˆ ๐ถ and either

(i) ๐บ is semi-dierentiable at ๐‘ฆ for ๐‘ง, or

(ii) ๐นโˆ’1 is semi-dierentiable at ๐‘ฆ for ๐‘ฅ ,

then

(22.3) ๐‘‡๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) = {(ฮ”๐‘ฅ,ฮ”๐‘ฆ,ฮ”๐‘ง) | ฮ”๐‘ฆ โˆˆ ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ), ฮ”๐‘ง โˆˆ ๐ท๐บ (๐‘ฆ |๐‘ง) (ฮ”๐‘ฆ)}.

298

22 calculus for the graphical derivative

Proof. We only consider the case (i); the case (ii) is shown analogously. By denition, wehave (ฮ”๐‘ฅ,ฮ”๐‘ฆ,ฮ”๐‘ง) โˆˆ ๐‘‡๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) if and only if for some (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜ , ๐‘ง๐‘˜) โˆˆ ๐ถ and ๐œ๐‘˜โ†’ 0,

ฮ”๐‘ฅ = lim๐‘˜โ†’โˆž

๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐œ๐‘˜

, ฮ”๐‘ฆ = lim๐‘˜โ†’โˆž

๐‘ฆ๐‘˜ โˆ’ ๐‘ฆ๐œ๐‘˜

, ฮ”๐‘ง = lim๐‘˜โ†’โˆž

๐‘ง๐‘˜ โˆ’ ๐‘ง๐œ๐‘˜

.

On the other hand, we have ฮ”๐‘ฆ โˆˆ ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) if and only if the rst two limits hold forsome (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) โˆˆ graph ๐น and ๐œ๐‘˜โ†’ 0. Likewise, we have ฮ”๐‘ง โˆˆ ๐ท๐บ (๐‘ฆ |๐‘ง) (ฮ”๐‘ฆ) if and only ifthe last two limits hold. This immediately yields โ€œโŠ‚โ€.To prove โ€œโŠƒโ€, take ๐œ๐‘˜ > 0 and (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) โˆˆ graph ๐น such that the rst two limits hold. By thesemi-dierentiability of ๐บ at ๐‘ฆ for ๐‘ง, for any ฮ”๐‘ง โˆˆ ๐ท๐บ (๐‘ฆ |๐‘ง) (ฮ”๐‘ฆ) we can nd ๐‘ง๐‘˜ โˆˆ ๐บ (๐‘ฆ๐‘˜)such that (๐‘ง๐‘˜ โˆ’ ๐‘ง)/๐œ๐‘˜ โ†’ ฮ”๐‘ง. This shows the remaining limit. ๏ฟฝ

If one of the two mappings is single-valued, we can use Lemma 22.3 for verifying its semi-dierentiability and Theorem 20.12 for the expression of its graphical derivative to obtainfrom Lemma 22.8 the following two special cases.

Corollary 22.9 (fundamental lemma on compositions: single-valued outer mapping). Let๐‘‹,๐‘Œ, ๐‘ be Banach spaces and

๐ถ โ‰” {(๐‘ฅ, ๐‘ฆ,๐บ (๐‘ฆ)) | ๐‘ฆ โˆˆ ๐น (๐‘ฅ)}

for ๐น : ๐‘‹ โ‡’ ๐‘Œ and ๐บ : ๐‘Œ โ†’ ๐‘ . If (๐‘ฅ, ๐‘ฆ, ๐‘ง) โˆˆ ๐ถ and ๐บ is Frรฉchet dierentiable at ๐‘ฆ , then

๐‘‡๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) = {(ฮ”๐‘ฅ,ฮ”๐‘ฆ,๐บโ€ฒ(๐‘ฆ)ฮ”๐‘ฆ) | ฮ”๐‘ฆ โˆˆ ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ)}.

Corollary 22.10 (fundamental lemma on compositions: single-valued inner mapping). Let๐‘‹,๐‘Œ, ๐‘ be Banach spaces and

๐ถ โ‰” {(๐‘ฅ, ๐‘ฆ, ๐‘ง) | ๐‘ฆ = ๐น (๐‘ฅ), ๐‘ง โˆˆ ๐บ (๐‘ฆ)}

for ๐น : ๐‘‹ โ‡’ ๐‘Œ and ๐บ : ๐‘Œ โ†’ ๐‘ . If (๐‘ฅ, ๐‘ฆ, ๐‘ง) โˆˆ ๐ถ , ๐น is continuously Frรฉchet dierentiable at ๐‘ฅand ๐น โ€ฒ(๐‘ฅ)โˆ— has a left-inverse ๐น โ€ฒ(๐‘ฅ)โˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—), then

๐‘‡๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) = {(ฮ”๐‘ฅ,ฮ”๐‘ฆ,ฮ”๐‘ง) | ฮ”๐‘ฆ = ๐น โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ, ฮ”๐‘ง โˆˆ ๐ท๐บ (๐‘ฆ |๐‘ง) (ฮ”๐‘ฆ)}.

22.3 calculus rules

Combining now the previous results, we quickly obtain various calculus rules. We beginas usual with a sum rule.

299

22 calculus for the graphical derivative

Theorem 22.11 (addition of a single-valued dierentiable mapping). Let ๐‘‹,๐‘Œ be Banachspaces, let ๐บ : ๐‘‹ โ†’ ๐‘Œ be Frรฉchet dierentiable, and ๐น : ๐‘‹ โ‡’ ๐‘Œ . Then for any ๐‘ฅ โˆˆ ๐‘‹ and๐‘ฆ โˆˆ ๐ป (๐‘ฅ) โ‰” ๐น (๐‘ฅ) +๐บ (๐‘ฅ),

๐ท๐ป (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) = ๐ท๐น (๐‘ฅ |๐‘ฆ โˆ’๐บ (๐‘ฅ)) (ฮ”๐‘ฅ) +๐บโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ (ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

Proof. We have graph๐ป = ๐‘…๐ถ for

(22.4) ๐ถ โ‰” {(๐‘ข, ๐‘ฅ,๐บ (๐‘ฅ)) | ๐‘ฅ โˆˆ ๐‘‹, ๐‘ข โˆˆ ๐น (๐‘ฅ)} and ๐‘…(๐‘ข, ๐‘ฅ, ๐‘ฃ) โ‰” (๐‘ฅ,๐‘ข + ๐‘ฃ).

We now use Lemma 22.6 to calculate ๐‘‡๐‘…๐ถ . Accordingly, for all (๐‘ข, ๐‘ฅ,๐บ (๐‘ฅ)) โˆˆ ๐ถ such that๐‘…(๐‘ข, ๐‘ฅ,๐บ (๐‘ฅ)) = (๐‘ฅ, ๐‘ฆ) โ€“ i.e., only for ๐‘ฅ = ๐‘ฅ and ๐‘ข = ๐‘ฆ โˆ’ ๐บ (๐‘ฅ) โ€“ we dene the inverseselection

๐‘…โˆ’1(๐‘ข,๐‘ฅ,๐บ (๐‘ฅ)) : ๐‘…๐ถ โ†’ ๐ถ, ๐‘…โˆ’1(๐‘ข,๐‘ฅ,๐บ (๐‘ฅ)) (๐‘ฅ, ๏ฟฝ๏ฟฝ) โ‰” (๏ฟฝ๏ฟฝ โˆ’๐บ (๐‘ฅ), ๐‘ฅ,๐บ (๐‘ฅ)),

Then ๐‘…โˆ’1(๐‘ข,๐‘ฅ,๐บ (๐‘ฅ)) (๐‘ฅ,๐‘ข +๐บ (๐‘ฅ)) = (๐‘ข, ๐‘ฅ,๐บ (๐‘ฅ)) and ๐‘…โˆ’1(๐‘ข,๐‘ฅ,๐บ (๐‘ฅ)) (๐‘ฅ, ๏ฟฝ๏ฟฝ) โˆˆ ๐ถ for every (๐‘ฅ, ๏ฟฝ๏ฟฝ) โˆˆ ๐‘…๐ถ .Furthermore, ๐‘…โˆ’1(๐‘ข,๐‘ฅ,๐บ (๐‘ฅ)) is continuous and Frรฉchet dierentiable at (๐‘ฅ, ๐‘ง).Lemma 22.6 now yields

๐‘‡graph๐ป (๐‘ฅ, ๐‘ฆ) = {(ฮ”๐‘ฅ,ฮ”๐‘ข + ฮ”๐‘ฃ) | (ฮ”๐‘ข,ฮ”๐‘ฅ,ฮ”๐‘ฃ) โˆˆ ๐‘‡๐ถ (๐‘ฆ โˆ’๐บ (๐‘ฅ), ๐‘ฅ,๐บ (๐‘ฅ))}.

Moreover, ๐ถ given in (22.4) coincides with the ๐ถ dened in Corollary 22.9 with ๐นโˆ’1 inplace of ๐น . Using Corollary 22.9 and inserting the expression from Lemma 20.5 for ๐ท๐นโˆ’1, itfollows

๐‘‡๐ถ (๐‘ข, ๐‘ฅ, ๐‘ฃ) = {(ฮ”๐‘ข,ฮ”๐‘ฅ,๐บโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ) | ฮ”๐‘ข โˆˆ ๐ท๐น (๐‘ฅ |๐‘ข) (ฮ”๐‘ฅ)}.Thus

๐ท๐ป (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) = {ฮ”๐‘ข + ฮ”๐‘ฃ | (ฮ”๐‘ข,ฮ”๐‘ฅ,ฮ”๐‘ฃ) โˆˆ ๐‘‡๐ถ (๐‘ฆ โˆ’๐บ (๐‘ฅ), ๐‘ฅ,๐บ (๐‘ฅ))}= {ฮ”๐‘ข +๐บโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ | ฮ”๐‘ข โˆˆ ๐ท๐น (๐‘ฅ |๐‘ฆ โˆ’๐บ (๐‘ฅ)) (ฮ”๐‘ฅ)},

which yields the claim. ๏ฟฝ

We now turn to chain rules, beginning with the case that the outer mapping is single-valued.

Theorem 22.12 (outer composition with a single-valued dierentiable mapping). Let ๐‘‹,๐‘Œbe Banach spaces, ๐น : ๐‘‹ โ‡’ ๐‘Œ , and๐บ : ๐‘Œ โ†’ ๐‘ . Let ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ง โˆˆ ๐ป (๐‘ฅ) โ‰” ๐บ (๐น (๐‘ฅ)) be given.If ๐บ is Frรฉchet dierentiable at every ๐‘ฆ โˆˆ ๐น (๐‘ฅ), invertible on ran๐บ near ๐‘ง, and the inverse๐บโˆ’1 is Frรฉchet dierentiable at ๐‘ง, then

๐ท๐ป (๐‘ฅ |๐‘ง) (ฮ”๐‘ฅ) =โ‹ƒ

๐‘ฆ :๐บ (๐‘ฆ)=๐‘ง๐บโ€ฒ(๐‘ฆ)๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) (ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

300

22 calculus for the graphical derivative

Proof. Observing that graph๐ป = ๐‘…๐ถ for

(22.5) ๐ถ โ‰” {(๐‘ฅ, ๏ฟฝ๏ฟฝ,๐บ (๏ฟฝ๏ฟฝ)) | ๏ฟฝ๏ฟฝ โˆˆ ๐น (๐‘ฅ)} and ๐‘…(๐‘ฅ, ๏ฟฝ๏ฟฝ, ๐‘ง) โ‰” (๐‘ฅ, ๐‘ง),we again use Lemma 22.6 to calculate ๐‘‡๐‘…๐ถ . Accordingly, we dene for ๐‘ฆ โˆˆ ๐บโˆ’1(๐‘ง) โˆฉ ๐น (๐‘ฅ)the family of inverse selections

๐‘…โˆ’1(๐‘ฅ,๐‘ฆ,๐‘ง) : ๐‘…๐ถ โ†’ ๐ถ, ๐‘…โˆ’1(๐‘ฅ,๐‘ฆ,๐‘ง) (๐‘ฅ, ๐‘ง) โ‰” (๐‘ฅ,๐บโˆ’1(๐‘ง), ๐‘ง),

Clearly, ๐‘…โˆ’1(๐‘ฅ,๐‘ฆ,๐‘ง) (๐‘ฅ, ๐‘ง) = (๐‘ฅ, ๐‘ฆ, ๐‘ง). Furthermore, ๐บ is by assumption invertible on its rangenear ๐‘ง = ๐บ (๐‘ฆ). Hence ๐บโˆ’1(๐‘ง) โˆˆ ๐น (๐‘ฅ), and thus in fact ๐‘…โˆ’1(๐‘ฅ,๐‘ฆ,๐‘ง) (๐‘ฅ, ๐‘ง) โˆˆ ๐ถ for all (๐‘ฅ, ๐‘ง) โˆˆ ๐‘…๐ถ .Moreover, ๐‘…โˆ’1(๐‘ฅ,๐‘ฆ,๐‘ง) is continuous and Frรฉchet dierentiable at (๐‘ฅ, ๐‘ง) since ๐บโˆ’1 has theseproperties at ๐‘ง.

Applying Lemma 22.6 now yields

๐‘‡graph๐ป (๐‘ฅ, ๐‘ง) =โ‹ƒ

๐‘ฆ :๐บ (๐‘ฆ)=๐‘ง{(ฮ”๐‘ฅ,ฮ”๐‘ง) | (ฮ”๐‘ฅ,ฮ”๐‘ฆ,ฮ”๐‘ง) โˆˆ ๐‘‡๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง)}.

Using Corollary 22.9, we then obtain

๐ท๐ป (๐‘ฅ |๐‘ง) (ฮ”๐‘ฅ) =โ‹ƒ

๐‘ฆ :๐บ (๐‘ฆ)=๐‘ง{ฮ”๐‘ง | (ฮ”๐‘ฅ,ฮ”๐‘ฆ,ฮ”๐‘ง) โˆˆ ๐‘‡๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง)}

=โ‹ƒ

๐‘ฆ :๐บ (๐‘ฆ)=๐‘ง{๐บโ€ฒ(๐‘ฆ)ฮ”๐‘ฆ | ฮ”๐‘ฆ โˆˆ ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ)}.

After further simplication, we arrive at the claimed expression. ๏ฟฝ

In particular, this result holds if๐บ is Frรฉchet dierentiable and๐บโ€ฒ(๐‘ฆ) is bijective, since in thiscase the inverse function Theorem 2.8 guarantees the local existence and dierentiabilityof ๐บโˆ’1.

Another useful special case is when the mapping ๐บ is linear.

Corollary 22.13 (outer composition with a linear operator). Let ๐‘‹,๐‘Œ, ๐‘ be Banach spaces,๐ด โˆˆ ๐•ƒ(๐‘Œ ;๐‘ ), and ๐น : ๐‘‹ โ‡’ ๐‘Œ . If ๐ด has a bounded left-inverse ๐ดโ€ , then for any ๐‘ฅ โˆˆ ๐‘‹ and๐‘ง โˆˆ ๐ป (๐‘ฅ) := ๐ด๐น (๐‘ฅ),

๐ท๐ป (๐‘ฅ |๐‘ง) (ฮ”๐‘ฅ) = ๐ด๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) (ฮ”๐‘ฅ โˆˆ ๐‘‹ )for the unique ๐‘ฆ โˆˆ ๐‘Œ such that ๐ด๐‘ฆ = ๐‘ง.

Proof. We apply Theorem 22.12 to๐บ (๐‘ฆ) โ‰” ๐ด๐‘ฆ , which is clearly continuously dierentiableat every ๐‘ฆ โˆˆ ๐น (๐‘ฅ). Since ๐ด has a bounded left-inverse ๐ดโ€ , ๐บโˆ’1(๐‘ฆ) = ๐ดโ€ ๐‘ฆ is an inverse of๐บ on ๐บ (๐‘ฆ) = ran๐ด, which is also clearly dierentiable. Moreover, {๐‘ฆ | ๐บ (๐‘ฆ) = ๐‘ง} is asingleton, which removes the intersections and unions from the expressions provided byTheorem 22.12. ๏ฟฝ

301

22 calculus for the graphical derivative

The assumption is in particular satised if ๐‘Œ and ๐‘ are Hilbert spaces and ๐ด is injectiveand has closed range, since in this case we can take ๐ดโ€  = (๐ดโˆ—๐ด)โˆ’1๐ดโˆ—๐‘ฆ (the Mooreโ€“Penrosepseudoinverse of ๐ด).

We next consider chain rules where the inner mapping is single-valued.

Theorem 22.14 (inner composition with a single-valued dierentiable mapping). Let๐‘‹,๐‘Œ, ๐‘be Banach spaces, ๐น : ๐‘‹ โ†’ ๐‘Œ and ๐บ : ๐‘Œ โ‡’ ๐‘ . Let ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ง โˆˆ ๐ป (๐‘ฅ) := ๐บ (๐น (๐‘ฅ)). If ๐น iscontinuously Frรฉchet dierentiable near ๐‘ฅ and ๐น โ€ฒ(๐‘ฅ)โˆ— has a bounded left-inverse ๐น โ€ฒ(๐‘ฅ)โˆ—โ€  โˆˆ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—), then

๐ท๐ป (๐‘ฅ |๐‘ง) (ฮ”๐‘ฅ) = ๐ท๐บ (๐น (๐‘ฅ) |๐‘ง) (๐น โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ) (ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

Proof. Observing that graph๐ป = ๐‘…๐ถ for

(22.6) ๐ถ โ‰” {(๐‘ฅ, ๏ฟฝ๏ฟฝ, ๐‘ง) | ๏ฟฝ๏ฟฝ = ๐น (๐‘ฅ), ๐‘ง โˆˆ ๐บ (๏ฟฝ๏ฟฝ)} and ๐‘…(๐‘ฅ, ๏ฟฝ๏ฟฝ, ๐‘ง) โ‰” (๐‘ฅ, ๐‘ง),

we again use Lemma 22.6 to compute ๐‘‡๐‘…๐ถ . Accordingly, we dene a family of inverseselections for all (๐‘ฅ, ๏ฟฝ๏ฟฝ, ๐‘ง) โˆˆ ๐ถ such that ๐‘…(๐‘ฅ, ๏ฟฝ๏ฟฝ, ๐‘ง) = (๐‘ฅ, ๐‘ง). But this only holds for (๐‘ฅ, ๏ฟฝ๏ฟฝ, ๐‘ฅ) =(๐‘ฅ, ๐น (๐‘ฅ), ๐‘ง), and hence we only need

๐‘…โˆ’1(๐‘ฅ,๐น (๐‘ฅ),๐‘ง) : ๐‘…๐ถ โ†’ ๐ถ, ๐‘…โˆ’1(๐‘ฅ,๐น (๐‘ฅ),๐‘ง) (๐‘ฅ, ๐‘ง) โ‰” (๐‘ฅ, ๐น (๐‘ฅ), ๐‘ง).

Clearly ๐‘…โˆ’1(๐‘ฅ,๐น (๐‘ฅ),๐‘ง) (๐‘ฅ, ๐‘ง) = (๐‘ฅ, ๐น (๐‘ฅ), ๐‘ง) and, if (๐‘ฅ, ๐‘ง) โˆˆ ๐‘…๐ถ , then ๐‘…โˆ’1(๐‘ฅ,๐น (๐‘ฅ),๐‘ง) (๐‘ฅ, ๐‘ง) โˆˆ ๐ถ and๐‘…โˆ’1(๐‘ฅ,๐น (๐‘ฅ),๐‘ง) is continuous and dierentiable at (๐‘ฅ, ๐‘ง).Thus Lemma 22.6 yields

๐‘‡graph๐ป (๐‘ฅ, ๐‘ง) = {(ฮ”๐‘ฅ,ฮ”๐‘ง) | (ฮ”๐‘ฅ,ฮ”๐‘ฆ,ฮ”๐‘ง) โˆˆ ๐‘‡๐ถ (๐‘ฅ, ๐น (๐‘ฅ), ๐‘ง)}.

On the other hand, we can apply Corollary 22.10 due to the continuous dierentiability of๐น and left-invertibility of ๐น โ€ฒ(๐‘ฅ)โˆ— to obtain

๐‘‡๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) = {(ฮ”๐‘ฅ,ฮ”๐‘ฆ,ฮ”๐‘ง) | ฮ”๐‘ฆ = ๐น โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ, ฮ”๐‘ง โˆˆ ๐ท๐บ (๐‘ฆ |๐‘ง) (ฮ”๐‘ฆ)}.

Thus๐ท๐ป (๐‘ฅ |๐‘ง) (ฮ”๐‘ฅ) = {ฮ”๐‘ง | (ฮ”๐‘ฅ,ฮ”๐‘ฆ,ฮ”๐‘ง) โˆˆ ๐‘‡๐ถ (๐‘ฅ, ๐น (๐‘ฅ), ๐‘ง)}

= {ฮ”๐‘ง | ฮ”๐‘ฆ = ๐น โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ, ฮ”๐‘ง โˆˆ ๐ท๐บ (๐น (๐‘ฅ) |๐‘ง) (ฮ”๐‘ฆ)},which yields the claim. ๏ฟฝ

Again, we can specialize this result to the case where the single-valued mapping is linear.

302

22 calculus for the graphical derivative

Corollary 22.15 (inner composition with a linear operator). Let ๐‘‹,๐‘Œ, ๐‘ be Banach spaces,๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ), and ๐บ : ๐‘Œ โ‡’ ๐‘ . Let ๐ป โ‰” ๐บ โ—ฆ ๐ด for ๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) and ๐บ : ๐‘Œ โ‡’ ๐‘ onBanach spaces ๐‘‹,๐‘Œ , and ๐‘ . If ๐ดโˆ— has a left-inverse ๐ดโˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—), then for all ๐‘ฅ โˆˆ ๐‘‹ and๐‘ง โˆˆ ๐ป (๐‘ฅ) โ‰” ๐บ (๐ด๐‘ฅ),

๐ท๐ป (๐‘ฅ |๐‘ง) (ฮ”๐‘ฅ) = ๐ท๐บ (๐ด๐‘ฅ |๐‘ง) (๐ดฮ”๐‘ฅ) (ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

We wish to apply these results to further dierentiate the chain rules from Theorems 4.17and 13.23. For the former, this is straight-forward based on the two corollaries so farobtained.

Corollary 22.16 (second derivative chain rule for convex subdierential). Let๐‘‹,๐‘Œ be Banachspaces, let ๐‘“ : ๐‘Œ โ†’ โ„ be proper, convex, and lower semicontinuous, and ๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) be suchthat ๐ดโˆ— has a left-inverse ๐ดโˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—), and ran๐ด โˆฉ int dom ๐‘“ โ‰  โˆ…. Let โ„Ž โ‰” ๐‘“ โ—ฆ๐ด. Thenfor any ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฅโˆ— โˆˆ ๐œ•โ„Ž(๐‘ฅ) = ๐ดโˆ—๐œ•๐‘“ (๐ด๐‘ฅ),

๐ท [๐œ•โ„Ž] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ) = ๐ดโˆ—๐ท [๐œ•๐‘“ ] (๐ด๐‘ฅ |๐‘ฆโˆ—) (๐ดฮ”๐‘ฅ) (ฮ”๐‘ฅ โˆˆ ๐‘‹ )for the unique ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ— satisfying ๐ดโˆ—๐‘ฆโˆ— = ๐‘ฅโˆ—.

Proof. The expression for ๐œ•โ„Ž(๐‘ฅ) follows from Theorem 4.17, to which we apply Corol-lary 22.15 as well as Corollary 22.13 with ๐ดโˆ— in place of ๐ด. ๏ฟฝ

To further dierentiate the result of applying a chain rule such as Theorem 13.23, we alsoneed a product rule for a single-valued mapping๐บ and a set-valued mapping ๐น . In principle,this could be obtained as a composition of ๐‘ฅ โ†ฆโ†’ (๐‘ฅ1, ๐‘ฅ2), (๐‘ฅ1, ๐‘ฅ2) โ†ฆโ†’ {๐บ (๐‘ฅ1)} ร— ๐น (๐‘ฅ2), and(๐‘ฆ1, ๐‘ฆ2) โ†ฆโ†’ ๐‘ฆ1๐‘ฆ2; however, the last one of these mappings does not possess the left-inverserequired by Corollary 22.13. We therefore take another route.

Theorem 22.17 (product rule). Let ๐‘‹,๐‘Œ, ๐‘ be Banach spaces, let ๐บ : ๐‘‹ โ†’ ๐•ƒ(๐‘Œ ;๐‘ ) be Frรฉchetdierentiable, and ๐น : ๐‘‹ โ‡’ ๐‘Œ . If ๐บ (๐‘ฅ) โˆˆ ๐•ƒ(๐‘Œ ;๐‘ ) has a left-inverse ๐บ (๐‘ฅ)โ€ โˆ— on ran๐บ (๐‘ฅ)for ๐‘ฅ near ๐‘ฅ โˆˆ ๐‘‹ and the mapping ๐‘ฅ โ†ฆโ†’ ๐บ (๐‘ฅ)โ€ โˆ— is Frรฉchet dierentiable at ๐‘ฅ , then for all๐‘ง โˆˆ ๐ป (๐‘ฅ) โ‰” ๐บ (๐‘ฅ)๐น (๐‘ฅ) โ‰” โ‹ƒ

๐‘ฆโˆˆ๐น (๐‘ฅ)๐บ (๐‘ฅ)๐‘ฆ ,๐ท๐ป (๐‘ฅ |๐‘ง) (ฮ”๐‘ฅ) = [๐บโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ]๐‘ฆ +๐บ (๐‘ฅ)๐ท๐น (๐‘ฅ |๐‘ฆ)ฮ”๐‘ฅ (๐‘ง โˆˆ ๐ป (๐‘ฅ), ฮ”๐‘ฅ โˆˆ ๐‘‹ )

for the unique ๐‘ฆ โˆˆ ๐น (๐‘ฅ) satisfying ๐บ (๐‘ฅ)๐‘ฆ = ๐‘ง.

Proof. First, dene ๐น : ๐‘‹ โ‡’ ๐‘‹ ร— ๐‘Œ by

graph ๐น = ๐‘…0 graph ๐น for ๐‘…0(๐‘ฅ, ๏ฟฝ๏ฟฝ) โ‰” (๐‘ฅ, ๐‘ฅ, ๏ฟฝ๏ฟฝ).Then we have graph๐ป = ๐‘…1 graph(๐บ โ—ฆ ๐น ) for

๐บ (๐‘ฅ, ๏ฟฝ๏ฟฝ) = (๐‘ฅ,๐บ (๐‘ฅ)๏ฟฝ๏ฟฝ) and ๐‘…1(๐‘ฅ1, ๐‘ฅ2, ๐‘ง) โ‰” (๐‘ฅ1, ๐‘ง).

303

22 calculus for the graphical derivative

Let now ๐‘ฆ โˆˆ ๐น (๐‘ฅ). By Lemma 22.6, we have

๐‘‡๐‘…0 graph ๐น (๐‘ฅ, ๐‘ฅ, ๐‘ฆ) = {(ฮ”๐‘ฅ,ฮ”๐‘ฅ,ฮ”๐‘ฆ) | (ฮ”๐‘ฅ,ฮ”๐‘ฆ) โˆˆ ๐‘‡graph ๐น (๐‘ฅ, ๐‘ฆ)}

so that๐ท๐น (๐‘ฅ |๐‘ฅ, ๐‘ฆ) (ฮ”๐‘ฅ) = {ฮ”๐‘ฅ} ร— ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ).

We wish to apply Theorem 22.12. First, ๐บ is single-valued and dierentiable. Since ๐บ (๐‘ฅ) isassumed left-invertible on its range for ๐‘ฅ near ๐‘ฅ , the mapping ๐‘„ : (๐‘ฅ, ๐‘ง) โ†ฆโ†’ (๐‘ฅ,๐บ (๐‘ฅ)โ€ โˆ—๐‘ง)is an inverse of ๐บ , which is Frรฉchet dierentiable at (๐‘ฅ, ๐‘ง) since ๐‘ฅ โ†’ ๐บ (๐‘ฅ)โ€ โˆ— is Frรฉchetdierentiable at ๐‘ฅ . Finally, we also have

๐บโ€ฒ(๐‘ฅ, ๐‘ฆ) (ฮ”๐‘ฅ,ฮ”๐‘ฆ) = (ฮ”๐‘ฅ, [๐บโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ]๐‘ฆ +๐บ (๐‘ฅ)ฮ”๐‘ฆ) .

Thus Theorem 22.12 yields

๐ท [๐บ โ—ฆ ๐น ] (๐‘ฅ |๐‘ฅ, ๐‘ง) (ฮ”๐‘ฅ) =โ‹ƒ

๐‘ฆ :๐บ (๐‘ฅ,๐‘ฆ)=(๐‘ฅ,๐‘ง)๐บโ€ฒ(๐‘ฅ, ๐‘ฆ)๐ท๐น (๐‘ฅ |๐‘ฅ, ๐‘ฆ) (ฮ”๐‘ฅ)

=โ‹ƒ

๐‘ฆ :๐บ (๐‘ฅ)๐‘ฆ=๐‘ง๐บโ€ฒ(๐‘ฅ, ๐‘ฆ) (ฮ”๐‘ฅ, ๐ท๐น (๐‘ฅ |๐‘ฆ)ฮ”๐‘ฅ)

=โ‹ƒ

๐‘ฆ :๐บ (๐‘ฅ)๐‘ฆ=๐‘ง{ฮ”๐‘ฅ} ร— ([๐บโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ]๐‘ฆ +๐บ (๐‘ฅ)๐ท๐น (๐‘ฅ |๐‘ฆ)ฮ”๐‘ฅ).

It follows that

๐‘‡graph(๐บโ—ฆ๐น ) (๐‘ฅ, ๐‘ฅ, ๐‘ง) =โ‹ƒ

๐‘ฆ :๐บ (๐‘ฅ)๐‘ฆ=๐‘ง{(ฮ”๐‘ฅ,ฮ”๐‘ฅ,ฮ”๐‘ง) | ฮ”๐‘ง โˆˆ ([๐บโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ]๐‘ฆ +๐บ (๐‘ฅ)๐ท๐น (๐‘ฅ |๐‘ฆ)ฮ”๐‘ฅ)}.

Observe now that ๐‘…1 is linear and invertible on ๐‘… graph(๐บ โ—ฆ ๐น ). Therefore, another applica-tion of Lemma 22.6 yields

๐‘‡graph๐ป (๐‘ฅ, ๐‘ง) =โ‹ƒ

๐‘ฆ :๐บ (๐‘ฅ)๐‘ฆ=๐‘ง{(ฮ”๐‘ฅ,ฮ”๐‘ง) | ฮ”๐‘ง โˆˆ ([๐บโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ]๐‘ฆ +๐บ (๐‘ฅ)๐ท๐น (๐‘ฅ |๐‘ฆ)ฮ”๐‘ฅ)}.

Since the ๐‘ฆ is unique by our invertibility assumptions on๐บ (๐‘ฅ) and exists due to ๐‘ง โˆˆ ๐ป (๐‘ฅ),we obtain the claim. ๏ฟฝ

Corollary 22.18 (second derivative chain rule for Clarke subdierential). Let๐‘‹,๐‘Œ be Banachspaces, let ๐‘“ : ๐‘Œ โ†’ ๐‘… be locally Lipschitz continuous, and let ๐‘† : ๐‘‹ โ†’ ๐‘Œ be twice continuouslydierentiable. Set โ„Ž : ๐‘‹ โ†’ ๐‘Œ , โ„Ž(๐‘ฅ) โ‰” ๐‘“ (๐‘† (๐‘ฅ)). If there exists a neighborhood ๐‘ˆ of ๐‘ฅ โˆˆ ๐‘‹such that

(i) ๐‘“ is Clarke regular at ๐‘† (๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘‹ ;(ii) ๐‘†โ€ฒ(๐‘ฅ)โˆ— has a bounded left-inverse ๐‘†โ€ฒ(๐‘ฅ)โˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—) for all ๐‘ฅ โˆˆ ๐‘ˆ ;

(iii) the mapping ๐‘ฅ โ†ฆโ†’ ๐‘†โ€ฒ(๐‘ฅ)โ€ โˆ— is Frรฉchet dierentiable at ๐‘ฅ ;

304

22 calculus for the graphical derivative

then for all ๐‘ฅโˆ— โˆˆ ๐œ•๐ถโ„Ž(๐‘ฅ) = ๐‘†โ€ฒ(๐‘ฅ)โˆ—๐œ•๐ถ ๐‘“ (๐‘† (๐‘ฅ)),

๐ท [๐œ•๐ถโ„Ž] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ) = (๐‘†โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ)โˆ—๐‘ฆโˆ— + ๐‘†โ€ฒ(๐‘ฅ)โˆ—๐ท [๐œ•๐ถ ๐‘“ ] (๐‘† (๐‘ฅ) |๐‘ฆโˆ—) (๐‘†โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ) (ฮ”๐‘ฅ โˆˆ ๐‘‹ )

for the unique ๐‘ฆโˆ— โˆˆ ๐œ•๐ถ ๐‘“ (๐‘† (๐‘ฅ)) such that ๐‘†โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ— = ๐‘ฅโˆ—.

Proof. The expression for ๐œ•๐ถโ„Ž(๐‘ฅ) follows from Theorem 13.23. Let now ๐‘† : ๐‘‹ โ†’ ๐•ƒ(๐‘Œ โˆ—;๐‘‹ โˆ—),๐‘† (๐‘ฅ) โ‰” ๐‘†โ€ฒ(๐‘ฅ)โˆ—. Then ๐‘† is Frรฉchet dierentiable in๐‘ˆ as well, which together with assump-tion (iii) allows us to apply Theorem 22.17 to obtain

๐ท [๐œ•๐ถ (๐‘“ โ—ฆ ๐‘†)] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ) = (๐‘†โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ)๐‘ฆโˆ— + ๐‘†โ€ฒ(๐‘ฅ)โˆ—๐ท [(๐œ•๐ถ ๐‘“ ) โ—ฆ ๐‘†] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ) (ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

Furthermore, since ๐‘†โ€ฒ(๐‘ฅ)โˆ— has a bounded left-inverse, we can apply Theorem 22.14 to obtainfor all ๐‘ฅ โˆˆ ๐‘ˆ and all ๐‘ฅโˆ— โˆˆ ๐œ•๐ถ ๐‘“ (๐‘† (๐‘ฅ))

๐ท [(๐œ•๐ถ ๐‘“ ) โ—ฆ ๐‘†] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ) = ๐ท [๐œ•๐ถ ๐‘“ ] (๐‘† (๐‘ฅ) |๏ฟฝ๏ฟฝโˆ—) (๐‘†โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ) (ฮ”๐‘ฅ โˆˆ ๐‘‹ )

for the unique ๏ฟฝ๏ฟฝโˆ— โˆˆ ๐œ•๐ถ ๐‘“ (๐‘† (๐‘ฅ)) such that ๐‘†โ€ฒ(๐‘ฅ)โˆ—๏ฟฝ๏ฟฝโˆ— = ๐‘ฅโˆ—. Finally, since the adjoint mapping๐ด โ†ฆโ†’ ๐ดโˆ— is linear and an isometry, it is straightforward to verify using the denition that๐‘†โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ = (๐‘†โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ)โˆ—, which yields the claim. ๏ฟฝ

305

23 CALCULUS FOR THE FRร‰CHET CODERIVATIVE

We continue with calculus rules for the Frรฉchet coderivative. As in Chapter 22, we startwith the relevant regularity concept, then prove the fundamental lemmas, and nally derivethe calculus rules.

23.1 semi-codifferentiability

Let ๐‘‹,๐‘Œ be Banach spaces. We say that ๐น is semi-codierentiable at ๐‘ฅ โˆˆ ๐‘‹ for ๐‘ฆ โˆˆ ๐น (๐‘ฅ) iffor each ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ— there exists some ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) such that

(23.1) limgraph ๐น3(๐‘ฅ๐‘˜ ,๐‘ฆ๐‘˜ )โ†’(๐‘ฅ,๐‘ฆ)

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ใ€ˆ๐‘ฆโˆ—, ๐‘ฆ๐‘˜ โˆ’ ๐‘ฆใ€‰๐‘Œโ€–(๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ, ๐‘ฆ๐‘˜ โˆ’ ๐‘ฆ)โ€–๐‘‹ร—๐‘Œ

= 0.

Recalling (18.7), this is equivalent to requiring that โˆ’๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (โˆ’๐‘ฆโˆ—) as well. ByLemma 20.5, a mapping is therefore semi-codierentiable if and only if its inverse is. Inparticular, we have

Lemma 23.1. Let ๐‘‹,๐‘Œ be Banach spaces and let ๐น : ๐‘‹ โ†’ ๐‘Œ be single-valued. If ๐น is Frรฉchetdierentiable at ๐‘ฅ โˆˆ ๐‘‹ , then

(i) ๐น is semi-codierentiable at ๐‘ฅ for ๐‘ฆ = ๐น (๐‘ฅ);(ii) ๐นโˆ’1 is semi-codierentiable at ๐‘ฆ = ๐น (๐‘ฅ) for ๐‘ฅ .

Proof. According to the discussion above, in both cases we need to prove that there existsan ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) such that also โˆ’๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (โˆ’๐‘ฆโˆ—). This follows immediatelyfrom the expression for ๐ทโˆ—๐น in Theorem 20.12. ๏ฟฝ

306

23 calculus for the frรฉchet coderivative

23.2 cone transformation formulas

In the following, we consider more generally Y-normal cones for Y โ‰ฅ 0 as these results willbe needed later in Chapter 25 for proving the corresponding expressions for the limitingnormal cone. We refer to Section 22.2 for the denition of a family of continuous inverseselections.

Lemma 23.2. Let๐‘‹,๐‘Œ be Banach spaces and assume there exists a family of continuous inverseselections {๐‘…โˆ’1๐‘ฆ : ๐‘ˆ๐‘ฆ โ†’ ๐ถ | ๐‘ฆ โˆˆ ๐ถ, ๐‘…๐‘ฆ = ๐‘ฅ} of ๐‘… โˆˆ ๐•ƒ(๐‘Œ ;๐‘‹ ) to ๐ถ โŠ‚ ๐‘Œ at ๐‘ฅ โˆˆ ๐‘‹ . If each ๐‘…โˆ’1๐‘ฆ isFrรฉchet dierentiable at ๐‘ฅ and we set ๐ฟ โ‰” โ€–(๐‘…โˆ’1๐‘ฆ )โ€ฒ(๐‘ฅ)โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) , then for all Y โ‰ฅ 0,

(23.2) ๐‘Y/โ€–๐‘…โ€–๐•ƒ (๐‘Œ ;๐‘‹ )๐‘…๐ถ (๐‘ฅ) โŠ‚

โ‹‚๐‘ฆ :๐‘…๐‘ฆ=๐‘ฅ

{๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ๐‘…โˆ—๐‘ฅโˆ— โˆˆ ๐‘ Y๐ถ (๐‘ฆ)} โŠ‚ ๐‘ Y๐ฟ

๐‘…๐ถ (๐‘ฅ).

In particular,๐‘๐‘…๐ถ (๐‘ฅ) =

โ‹‚๐‘ฆ :๐‘…๐‘ฆ=๐‘ฅ

{๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ๐‘…โˆ—๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฆ)}.

Proof. By (18.7), ๐‘ฅโˆ— โˆˆ ๐‘ Y๐‘…๐ถ (๐‘ฅ) if and only if

lim sup๐‘…๐ถ3๐‘…๐‘ฆ๐‘˜โ†’๐‘…๐‘ฆ

ใ€ˆ๐‘…โˆ—๐‘ฅโˆ—, ๐‘ฆ๐‘˜ โˆ’ ๐‘ฆใ€‰๐‘Œโ€–๐‘…(๐‘ฆ๐‘˜ โˆ’ ๐‘ฆ)โ€–๐‘‹

โ‰ค Y

for some ๐‘ฆ such that ๐‘…๐‘ฆ = ๐‘ฅ or, equivalently,

(23.3) lim sup๐‘…๐ถ3๐‘ฅ๐‘˜โ†’๐‘ฅ

ใ€ˆ๐‘…โˆ—๐‘ฅโˆ—, ๐‘…โˆ’1๐‘ฆ (๐‘ฅ๐‘˜) โˆ’ ๐‘…โˆ’1๐‘ฆ (๐‘ฅ)ใ€‰๐‘‹โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹

โ‰ค Y .

This in turn holds if and only if for every Yโ€ฒ > Y, there exists a ๐›ฟ > 0 such that

(23.4) ใ€ˆ๐‘…โˆ—๐‘ฅโˆ—, ๐‘ฆ๐‘˜ โˆ’ ๐‘ฆใ€‰๐‘Œ โ‰ค Yโ€ฒโ€–๐‘…(๐‘ฆ๐‘˜ โˆ’ ๐‘ฆ)โ€–๐‘‹ (๐‘ฆ๐‘˜ โˆˆ ๐”น(๐‘ฆ, ๐›ฟ) โˆฉ๐ถ).

Similarly, ๐‘…โˆ—๐‘ฅโˆ— โˆˆ ๐‘ Y๐ถ (๐‘ฆ) if and only if

(23.5) lim sup๐ถ3๐‘ฆ๐‘˜โ†’๐‘ฆ

ใ€ˆ๐‘…โˆ—๐‘ฅโˆ—, ๐‘ฆ๐‘˜ โˆ’ ๐‘ฆใ€‰๐‘Œโ€–๐‘ฆ๐‘˜ โˆ’ ๐‘ฆ โ€–๐‘Œ

โ‰ค Y.

Now if๐‘ฅโˆ— โˆˆ ๐‘ Y/โ€–๐‘…โ€–๐•ƒ (๐‘Œ ;๐‘‹ )๐‘…๐ถ (๐‘ฅ), then using (23.4) and estimating โ€–๐‘…(๐‘ฆ๐‘˜โˆ’๐‘ฆ)โ€–๐‘‹ โ‰ค โ€–๐‘…โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) โ€–๐‘ฆ๐‘˜โˆ’

๐‘ฆ โ€–๐‘Œ yields (23.5). Hence the rst inclusion in (23.2) holds. On the other hand, if ๐‘…โˆ—๐‘ฅโˆ— โˆˆ๐‘ Y๐ถ (๐‘ฆ), then we can take ๐‘ฆ๐‘˜ = ๐‘…โˆ’1๐‘ฆ (๐‘ฅ๐‘˜) for some ๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ in (23.5) to obtain (23.3) for

Y = Y๐ฟ, where ๐ฟ โ‰ฅ lim sup๐‘˜โ†’โˆž โ€–๐‘…โˆ’1๐‘ฆ (๐‘ฅ๐‘˜) โˆ’ ๐‘…โˆ’1๐‘ฆ (๐‘ฅ)โ€–๐‘Œ/โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹ is nite by the Frรฉchetdierentiability of ๐‘…โˆ’1๐‘ฆ at ๐‘ฅ . Hence the second inclusion in (23.2) holds as well. ๏ฟฝ

307

23 calculus for the frรฉchet coderivative

Remark 23.3 (polarity and qualification condition in finite dimensions). In nite dimensions,Lemma 23.2 for Y = 0 could also be proved with the help of the polarity relationships ๐‘๐‘…๐ถ (๐‘ฅ) =๐‘‡๐‘…๐ถ (๐‘ฅ)โ—ฆ and๐‘๐ถ (๐‘ฆ) = ๐‘‡๐ถ (๐‘ฆ)โ—ฆ from Lemma 18.10. Furthermore, the existence of a family of continuousselections can be replaced by a qualication condition as in Remark 22.7.

We are now ready to prove the fundamental composition lemma, this time for the Frรฉchetnormal cone.

Lemma 23.4 (fundamental lemma on compositions). Let ๐‘‹,๐‘Œ, ๐‘ be Banach spaces and

๐ถ โ‰” {(๐‘ฅ, ๐‘ฆ, ๐‘ง) | ๐‘ฆ โˆˆ ๐น (๐‘ฅ), ๐‘ง โˆˆ ๐บ (๐‘ฆ)}for ๐น : ๐‘‹ โ‡’ ๐‘Œ , and ๐บ : ๐‘Œ โ‡’ ๐‘ . Let (๐‘ฅ, ๐‘ฆ, ๐‘ง) โˆˆ ๐ถ .

(i) If ๐บ is semi-codierentiable at ๐‘ฆ for ๐‘ง, then for all Y โ‰ฅ 0,

๐‘ Y๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) = {(๐‘ฅโˆ—, ๐‘ฆโˆ—, ๐‘งโˆ—) | ๐‘ฅโˆ— โˆˆ ๐ทโˆ—

Y ๐น (๐‘ฅ |๐‘ฆ) (โˆ’๏ฟฝ๏ฟฝโˆ— โˆ’ ๐‘ฆโˆ—), ๏ฟฝ๏ฟฝโˆ— โˆˆ ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (๐‘งโˆ—)}.

(ii) If ๐นโˆ’1 is semi-codierentiable at ๐‘ฆ for ๐‘ฅ , then for all Y โ‰ฅ 0,

๐‘ Y๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) = {(๐‘ฅโˆ—, ๐‘ฆโˆ—, ๐‘งโˆ—) | ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (โˆ’๏ฟฝ๏ฟฝโˆ— โˆ’ ๐‘ฆโˆ—), โˆ’๏ฟฝ๏ฟฝโˆ— โˆˆ ๐ทโˆ—

Y๐บ (๐‘ฆ |๐‘ง) (โˆ’๐‘งโˆ—)}.

Proof. We recall that (๐‘ฅโˆ—, ๐‘ฆโˆ—, ๐‘งโˆ—) โˆˆ ๐‘ Y๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) if and only if

(23.6) lim sup๐ถ3(๐‘ฅ๐‘˜ ,๐‘ฆ๐‘˜ ,๐‘ง๐‘˜ )โ†’(๐‘ฅ,๐‘ฆ,๐‘ง)

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅใ€‰๐‘‹ + ใ€ˆ๐‘ฆโˆ—, ๐‘ฆ๐‘˜ โˆ’ ๐‘ฆใ€‰๐‘Œ + ใ€ˆ๐‘งโˆ—, ๐‘ง๐‘˜ โˆ’ ๐‘งใ€‰๐‘โ€–(๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜ , ๐‘ง๐‘˜) โˆ’ (๐‘ฅ, ๐‘ฆ, ๐‘ง)โ€–๐‘‹ร—๐‘Œร—๐‘ โ‰ค Y.

In case (i), there exists a ๏ฟฝ๏ฟฝโˆ— โˆˆ ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (๐‘งโˆ—) such that

limgraph๐บ3(๐‘ฆ๐‘˜ ,๐‘ง๐‘˜ )โ†’(๐‘ฆ,๐‘ง)

ใ€ˆ๏ฟฝ๏ฟฝโˆ—, ๐‘ฆ๐‘˜ โˆ’ ๐‘ฆใ€‰๐‘Œ โˆ’ ใ€ˆ๐‘งโˆ—, ๐‘ง๐‘˜ โˆ’ ๐‘งใ€‰๐‘โ€–(๐‘ฆ๐‘˜ , ๐‘ง๐‘˜) โˆ’ (๐‘ฆ, ๐‘ง)โ€–๐‘Œร—๐‘ = 0.

Thus (23.6) holds if and only if

lim sup๐ถ3(๐‘ฅ๐‘˜ ,๐‘ฆ๐‘˜ ,๐‘ง๐‘˜ )โ†’(๐‘ฅ,๐‘ฆ,๐‘ง)

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅใ€‰๐‘‹ + ใ€ˆ๏ฟฝ๏ฟฝโˆ— + ๐‘ฆโˆ—, ๐‘ฆ๐‘˜ โˆ’ ๐‘ฆใ€‰๐‘Œโ€–(๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜ , ๐‘ง๐‘˜) โˆ’ (๐‘ฅ, ๐‘ฆ, ๐‘ง)โ€–๐‘‹ร—๐‘Œร—๐‘ โ‰ค Y.

But this is equivalent to ๐‘ฅโˆ— โˆˆ ๐ทโˆ—Y ๐น (๐‘ฅ |๐‘ฆ) (โˆ’๏ฟฝ๏ฟฝโˆ— โˆ’ ๐‘ฆโˆ—), which yields the claim.

In case (ii), the semi-codierentiability of ๐นโˆ’1 yields the existence of a ๏ฟฝ๏ฟฝโˆ— โˆˆ โˆ’๐‘ฆโˆ— +๐ทโˆ—๐นโˆ’1(๐‘ฆ |๐‘ฅ) (โˆ’๐‘ฅโˆ—), i.e., such that ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (โˆ’๏ฟฝ๏ฟฝโˆ— โˆ’ ๐‘ฆโˆ—), satisfying

limgraph๐บ3(๐‘ฆ๐‘˜ ,๐‘ฅ๐‘˜ )โ†’(๐‘ฆ,๐‘ฅ)

โˆ’ใ€ˆ๏ฟฝ๏ฟฝโˆ— + ๐‘ฆโˆ—, ๐‘ฆ๐‘˜ โˆ’ ๐‘ฆใ€‰๐‘Œ โˆ’ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–(๐‘ฆ๐‘˜ , ๐‘ฅ๐‘˜) โˆ’ (๐‘ฆ, ๐‘ฅ)โ€–๐‘Œร—๐‘‹ = 0.

Thus (23.6) holds if and only if

lim sup๐ถ3(๐‘ฅ๐‘˜ ,๐‘ฆ๐‘˜ ,๐‘ง๐‘˜ )โ†’(๐‘ฅ,๐‘ฆ,๐‘ง)

ใ€ˆ๐‘งโˆ—, ๐‘ง๐‘˜ โˆ’ ๐‘งใ€‰๐‘‹ โˆ’ ใ€ˆ๏ฟฝ๏ฟฝโˆ—, ๐‘ฆ๐‘˜ โˆ’ ๐‘ฆใ€‰๐‘Œโ€–(๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜ , ๐‘ง๐‘˜) โˆ’ (๐‘ฅ, ๐‘ฆ, ๐‘ง)โ€–๐‘‹ร—๐‘Œร—๐‘ โ‰ค Y.

But this is equivalent to โˆ’๏ฟฝ๏ฟฝโˆ— โˆˆ ๐ทโˆ—Y๐บ (๐‘ฆ |๐‘ง) (โˆ’๐‘งโˆ—), which yields the claim. ๏ฟฝ

308

23 calculus for the frรฉchet coderivative

For the remaining results, we x Y = 0. If one of the two mappings is single-valued, we canuse Lemma 23.1 for verifying its semi-dierentiability and Theorem 20.12 for the expressionof its graphical derivative to obtain from Lemma 23.4 the following two special cases.

Corollary 23.5 (fundamental lemma on compositions: single-valued outer mapping). Let๐‘‹,๐‘Œ, ๐‘ be Banach spaces and

๐ถ โ‰” {(๐‘ฅ, ๐‘ฆ,๐บ (๐‘ฆ)) | ๐‘ฆ โˆˆ ๐น (๐‘ฅ)}for ๐น : ๐‘‹ โ‡’ ๐‘Œ and ๐บ : ๐‘Œ โ†’ ๐‘ . If (๐‘ฅ, ๐‘ฆ, ๐‘ง) โˆˆ ๐ถ and ๐บ is Frรฉchet dierentiable at ๐‘ฆ , then

๐‘๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) = {(๐‘ฅโˆ—, ๐‘ฆโˆ—, ๐‘งโˆ—) | ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (โˆ’[๐บโ€ฒ(๐‘ฆ)]โˆ—๐‘งโˆ— โˆ’ ๐‘ฆโˆ—), ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ—}.

Corollary 23.6 (fundamental lemma on compositions: single-valued inner mapping). Let๐‘‹,๐‘Œ, ๐‘ be Banach spaces and

๐ถ โ‰” {(๐‘ฅ, ๐‘ฆ, ๐‘ง) | ๐‘ฆ = ๐น (๐‘ฅ), ๐‘ง โˆˆ ๐บ (๐‘ฆ)}for ๐น : ๐‘‹ โ‡’ ๐‘Œ and ๐บ : ๐‘Œ โ†’ ๐‘ . If (๐‘ฅ, ๐‘ฆ, ๐‘ง) โˆˆ ๐ถ , ๐น is continuously Frรฉchet dierentiable at ๐‘ฅand ๐น โ€ฒ(๐‘ฅ)โˆ— has a left-inverse ๐น โ€ฒ(๐‘ฅ)โˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—), then

๐‘๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) = {(๐น โ€ฒ(๐‘ฅ)โˆ—(โˆ’๏ฟฝ๏ฟฝโˆ— โˆ’ ๐‘ฆโˆ—), ๐‘ฆโˆ—, ๐‘งโˆ—) | โˆ’๏ฟฝ๏ฟฝโˆ— โˆˆ ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (โˆ’๐‘งโˆ—), ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ—}.

23.3 calculus rules

Using these lemmas, we again obtain calculus rules. The proofs are similar to those inSection 22.3, and we only note the dierences.

Theorem 23.7 (addition of a single-valued dierentiable mapping). Let ๐‘‹,๐‘Œ be Banachspaces, ๐บ : ๐‘‹ โ†’ ๐‘Œ be Frรฉchet dierentiable, and ๐น : ๐‘‹ โ‡’ ๐‘Œ . Then for any ๐‘ฅ โˆˆ ๐‘‹ and๐‘ฆ โˆˆ ๐ป (๐‘ฅ) โ‰” ๐น (๐‘ฅ) +๐บ (๐‘ฅ),

๐ทโˆ—๐ป (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) = ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ โˆ’๐บ (๐‘ฅ)) (๐‘ฆโˆ—) + [๐บโ€ฒ(๐‘ฅ)]โˆ—๐‘ฆโˆ— (๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ—).

Proof. We have graph๐ป = ๐‘…๐ถ for ๐ถ and ๐‘… as given by (22.4) in the proof of Theorem 22.11.Applying Lemma 23.2 in place of Lemma 22.6 in the proof of Theorem 22.11 gives

๐‘graph๐ป (๐‘ฅ, ๐‘ฆ) = {(๐‘ฅโˆ—, ๐‘ฆโˆ—) | (๐‘ฆโˆ—, ๐‘ฅโˆ—, ๐‘ฆโˆ—) โˆˆ ๐‘๐ถ (๐‘ฆ โˆ’๐บ (๐‘ฅ), ๐‘ฅ,๐บ (๐‘ฅ))}.Moreover,๐ถ given in (22.4) coincides with the๐ถ dened in Corollary 23.5 with ๐นโˆ’1 in placeof ๐น . Using Corollary 23.5 and inserting the expression from Lemma 20.5 for ๐ทโˆ—๐นโˆ’1, itfollows

๐‘๐ถ (๐‘ข, ๐‘ฅ, ๐‘ฃ) = {(๐‘ขโˆ—, ๐‘ฅโˆ—, ๐‘ฃโˆ—) โˆˆ ๐‘Œ โˆ— ร— ๐‘‹ โˆ— ร— ๐‘Œ โˆ— | ๐‘ขโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (โˆ’[๐บโ€ฒ(๐‘ฆ)]โˆ—๐‘ฃโˆ— โˆ’ ๐‘ฅโˆ—)}= {(๐‘ขโˆ—, ๐‘ฅโˆ—, ๐‘ฃโˆ—) โˆˆ ๐‘Œ โˆ— ร— ๐‘‹ โˆ— ร— ๐‘Œ โˆ— | [๐บโ€ฒ(๐‘ฅ)]โˆ—๐‘ฃโˆ— + ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ข) (โˆ’๐‘ขโˆ—)}.

309

23 calculus for the frรฉchet coderivative

Thus๐ทโˆ—๐ป (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) = {๐‘ฅโˆ— | (โˆ’๐‘ฆโˆ—, ๐‘ฅโˆ—,โˆ’๐‘ฆโˆ—) โˆˆ ๐‘๐ถ (๐‘ฆ โˆ’๐บ (๐‘ฅ), ๐‘ฅ,๐บ (๐‘ฅ))}

= {๐‘ฅโˆ— | โˆ’[๐บโ€ฒ(๐‘ฅ)]โˆ—๐‘ฆโˆ— + ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ โˆ’๐บ (๐‘ฅ)) (๐‘ฆโˆ—)},which yields the claim. ๏ฟฝ

Theorem 23.8 (outer composition with a single-valued dierentiable mapping). Let ๐‘‹,๐‘Œ beBanach spaces, ๐น : ๐‘‹ โ‡’ ๐‘Œ , and๐บ : ๐‘Œ โ†’ ๐‘ . Let ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ง โˆˆ ๐ป (๐‘ฅ) โ‰” ๐บ (๐น (๐‘‹ )) be given. If๐บ is Frรฉchet dierentiable at every ๐‘ฆ โˆˆ ๐น (๐‘ฅ), invertible on ran๐บ near ๐‘ง, and the inverse ๐บโˆ’1

is Frรฉchet dierentiable at ๐‘ง, then

๐ทโˆ—๐ป (๐‘ฅ |๐‘ง) (๐‘งโˆ—) =โ‹‚

๐‘ฆ :๐บ (๐‘ฆ)=๐‘ง๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) ( [๐บโ€ฒ(๐‘ฆ)]โˆ—๐‘งโˆ—) (๐‘งโˆ— โˆˆ ๐‘ โˆ—).

Proof. We have graph๐ป = ๐‘…๐ถ for ๐‘… and ๐ถ as given by (22.5) in the proof of Theorem 22.12.Applying Lemma 23.2 in place of Lemma 22.6 then yields

๐‘graph๐ป (๐‘ฅ, ๐‘ง) =โ‹‚

๐‘ฆ :๐บ (๐‘ฆ)=๐‘ง{(๐‘ฅโˆ—, ๐‘งโˆ—) | (๐‘ฅโˆ—, 0, ๐‘งโˆ—) โˆˆ ๐‘๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง)}.

Corollary 23.5 then shows that

๐ทโˆ—๐ป (๐‘ฅ |๐‘ง) (๐‘งโˆ—) =โ‹‚

๐‘ฆ :๐บ (๐‘ฆ)=๐‘ง{๐‘ฅโˆ— | (๐‘ฅโˆ—, 0,โˆ’๐‘งโˆ—) โˆˆ ๐‘๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง)}

=โ‹‚

๐‘ฆ :๐บ (๐‘ฆ)=๐‘ง{๐‘ฅโˆ— | ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) ( [๐บโ€ฒ(๐‘ฆ)]โˆ—๐‘งโˆ—)}.

After further simplication, we arrive at the claimed expression. ๏ฟฝ

Corollary 23.9 (outer composition with a linear operator). Let ๐‘‹,๐‘Œ, ๐‘ be Banach spaces,๐ด โˆˆ ๐•ƒ(๐‘Œ ;๐‘ ), and ๐น : ๐‘‹ โ‡’ ๐‘Œ . If ๐ด has a bounded left-inverse ๐ดโ€ , then for any ๐‘ฅ โˆˆ ๐‘‹ and๐‘ง โˆˆ ๐ป (๐‘ฅ) := ๐ด๐น (๐‘ฅ),

๐ทโˆ—๐ป (๐‘ฅ |๐‘ง) (๐‘งโˆ—) = ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐ดโˆ—๐‘งโˆ—) (๐‘งโˆ— โˆˆ ๐‘ โˆ—)

for the unique ๐‘ฆ โˆˆ ๐‘Œ such that ๐ด๐‘ฆ = ๐‘ง.

Proof. We only need to verify that ๐บ (๐‘ฆ) โ‰” ๐ด๐‘ง satises the assumptions of Theorem 23.8,which can be done exactly as in the proof of Corollary 22.13. ๏ฟฝ

Theorem 23.10 (inner composition with a single-valued mapping). Let ๐‘‹,๐‘Œ, ๐‘ be Banachspaces, ๐น : ๐‘‹ โ†’ ๐‘Œ and ๐บ : ๐‘Œ โ‡’ ๐‘ . Let ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ง โˆˆ ๐ป (๐‘ฅ) := ๐บ (๐น (๐‘ฅ)). If ๐น is continuouslyFrรฉchet dierentiable near ๐‘ฅ and ๐น โ€ฒ(๐‘ฅ)โˆ— has a bounded left-inverse ๐น โ€ฒ(๐‘ฅ)โˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—), then

๐ทโˆ—๐ป (๐‘ฅ |๐‘ง) (๐‘งโˆ—) = [๐น โ€ฒ(๐‘ฅ)]โˆ—๐ทโˆ—๐บ (๐น (๐‘ฅ) |๐‘ง) (๐‘งโˆ—) (๐‘งโˆ— โˆˆ ๐‘ โˆ—).

310

23 calculus for the frรฉchet coderivative

Proof. We have graph๐ป = ๐‘…๐ถ for๐ถ and ๐‘… as given by (22.6) in the proof of Theorem 22.14.Applying Lemma 23.2 in place of Theorem 22.14 then yields

๐‘graph๐ป (๐‘ฅ, ๐‘ง) = {(๐‘ฅโˆ—, ๐‘งโˆ—) | (๐‘ฅโˆ—, 0, ๐‘งโˆ—) โˆˆ ๐‘๐ถ (๐‘ฅ, ๐น (๐‘ฅ), ๐‘ง)}.

On the other hand, since ๐น is Frรฉchet dierentiable, Corollary 23.6 implies that

๐‘๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) = {(๐น โ€ฒ(๐‘ฅ)โˆ—(โˆ’๏ฟฝ๏ฟฝโˆ— โˆ’ ๐‘ฆโˆ—), ๐‘ฆโˆ—, ๐‘งโˆ—) | โˆ’๏ฟฝ๏ฟฝโˆ— โˆˆ ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (โˆ’๐‘งโˆ—), ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ—}.

Thus๐ทโˆ—๐ป (๐‘ฅ |๐‘ง) (๐‘งโˆ—) = {๐‘ฅโˆ— | (๐‘ฅโˆ—, 0,โˆ’๐‘งโˆ—) โˆˆ ๐‘๐ถ (๐‘ฅ, ๐น (๐‘ฅ), ๐‘ง)}

= {๐น โ€ฒ(๐‘ฅ)โˆ—๏ฟฝ๏ฟฝโˆ— | ๏ฟฝ๏ฟฝโˆ— โˆˆ ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (๐‘งโˆ—)},which yields the claim. ๏ฟฝ

Corollary 23.11 (inner composition with a linear operator). Let ๐‘‹,๐‘Œ, ๐‘ be Banach spaces,๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ), and ๐บ : ๐‘Œ โ‡’ ๐‘ . Let ๐ป โ‰” ๐บ โ—ฆ ๐ด for ๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) and ๐บ : ๐‘Œ โ‡’ ๐‘ onBanach spaces ๐‘‹,๐‘Œ , and ๐‘ . If ๐ดโˆ— has a left-inverse ๐ดโˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—), then for all ๐‘ฅ โˆˆ ๐‘‹ and๐‘ง โˆˆ ๐ป (๐‘ฅ) โ‰” ๐บ (๐ด๐‘ฅ),

๐ทโˆ—๐ป (๐‘ฅ |๐‘ง) (๐‘งโˆ—) = ๐ดโˆ—๐ทโˆ—๐บ (๐ด๐‘ฅ |๐‘ง) (๐‘งโˆ—) (๐‘งโˆ— โˆˆ ๐‘ โˆ—).

We again apply this to the chain rule from Theorem 4.17. Compare the following expressionwith that from Corollary 22.16, noting that ๐œ•๐‘“ : ๐‘‹ โ‡’ ๐‘‹ โˆ— in Banach spaces such that๐ทโˆ— [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฅโˆ—) : ๐‘‹ โˆ—โˆ— โ†’ ๐‘‹ โˆ—.

Corollary 23.12 (second derivative chain rule for convex subdierential). Let๐‘‹,๐‘Œ be Banachspaces, ๐‘“ : ๐‘Œ โ†’ โ„ be proper, convex, and lower semicontinuous, and ๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) be suchthat ๐ดโˆ— has a left-inverse ๐ดโˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—), and ran๐ด โˆฉ int dom ๐‘“ โ‰  โˆ…. Let โ„Ž โ‰” ๐‘“ โ—ฆ๐ด. Thenfor any ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฅโˆ— โˆˆ ๐œ•โ„Ž(๐‘ฅ) = ๐ดโˆ—๐œ•๐‘“ (๐ด๐‘ฅ),

๐ทโˆ— [๐œ•โ„Ž] (๐‘ฅ |๐‘ฅโˆ—) (๐‘ฅโˆ—โˆ—) = ๐ดโˆ—๐ทโˆ— [๐œ•๐‘“ ] (๐ด๐‘ฅ |๐‘ฆโˆ—) (๐ดโˆ—โˆ—๐‘ฅโˆ—โˆ—) (๐‘ฅโˆ—โˆ— โˆˆ ๐‘‹ โˆ—โˆ—)

for the unique ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ— satisfying ๐ดโˆ—๐‘ฆโˆ— = ๐‘ฅโˆ—.

Proof. The expression for ๐œ•โ„Ž(๐‘ฅ) follows from Theorem 4.17, to which we apply Corol-lary 23.11 as well as Corollary 23.9 with ๐ดโˆ— in place of ๐ด. ๏ฟฝ

Hence if ๐‘‹ is reexive, the expression for the coderivative is identical to that for thegraphical derivative.

For the corresponding result for the Clarke subdierential, we again need a product rule.

311

23 calculus for the frรฉchet coderivative

Theorem 23.13 (product rule). Let ๐‘‹,๐‘Œ, ๐‘ be Banach spaces, ๐บ : ๐‘‹ โ†’ ๐•ƒ(๐‘Œ ;๐‘ ) be Frรฉchetdierentiable, and ๐น : ๐‘‹ โ‡’ ๐‘Œ . If ๐บ (๐‘ฅ) โˆˆ ๐•ƒ(๐‘Œ ;๐‘ ) has a left-inverse ๐บ (๐‘ฅ)โ€ โˆ— on ran๐บ (๐‘ฅ)for ๐‘ฅ near ๐‘ฅ โˆˆ ๐‘‹ and the mapping ๐‘ฅ โ†ฆโ†’ ๐บ (๐‘ฅ)โ€ โˆ— is Frรฉchet dierentiable at ๐‘ฅ , then for all๐‘ง โˆˆ ๐ป (๐‘ฅ) โ‰” ๐บ (๐‘ฅ)๐น (๐‘ฅ) โ‰” โ‹ƒ

๐‘ฆโˆˆ๐น (๐‘ฅ)๐บ (๐‘ฅ)๐‘ฆ ,

๐ทโˆ—๐ป (๐‘ฅ |๐‘ง) (๐‘งโˆ—) = ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐บ (๐‘ฅ)โˆ—๐‘งโˆ—) + ([๐บโ€ฒ(๐‘ฅ) ยท ]๐‘ฆ)โˆ—๐‘งโˆ— (๐‘งโˆ— โˆˆ ๐‘ โˆ—)

for the unique ๐‘ฆ โˆˆ ๐น (๐‘ฅ) satisfying that ๐บ (๐‘ฅ)๐‘ฆ = ๐‘ง.

Proof. The proof is analogous to Theorem 22.17 for the graphical derivative. We againhave graph๐ป = ๐‘…1 graph(๐บ โ—ฆ ๐น ) for ๐‘…1, ๐น , and๐บ dened in the proof of Theorem 22.17. Let๐‘ฆ โˆˆ ๐น (๐‘ฅ). By Lemma 23.2, we have

๐‘๐‘…0 graph ๐น (๐‘ฅ, ๐‘ฅ, ๐‘ฆ) = {(๐‘ฅโˆ—,โˆ’๐‘ฅโˆ—0,โˆ’๐‘ฆโˆ—) | (๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—0,โˆ’๐‘ฆโˆ—) โˆˆ ๐‘graph ๐น (๐‘ฅ, ๐‘ฆ)}.

so that๐ทโˆ—๐น (๐‘ฅ |๐‘ฅ, ๐‘ฆ) (๐‘ฅโˆ—0, ๐‘ฆโˆ—) = ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) + ๐‘ฅโˆ—0

We also have๐บโ€ฒ(๐‘ฅ, ๐‘ฆ)โˆ—(๐‘ฅโˆ—0, ๐‘งโˆ—) = (๐‘ฅโˆ—0 + ([๐บโ€ฒ(๐‘ฅ) ยท ]๐‘ฆ)โˆ—๐‘งโˆ—,๐บ (๐‘ฅ)โˆ—๐‘งโˆ—).

We now apply Theorem 23.8, whose remaining assumptions are veried exactly as thoseof Theorem 22.12, which yields

๐ทโˆ— [๐บ โ—ฆ ๐น ] (๐‘ฅ |๐‘ฅ, ๐‘ง) (๐‘ฅโˆ—0, ๐‘งโˆ—) =โ‹‚

๐‘ฆ :๐บ (๐‘ฅ,๐‘ฆ)=(๐‘ฅ,๐‘ง)๐ทโˆ—๐น (๐‘ฅ |๐‘ฅ, ๐‘ฆ) (๐บโ€ฒ(๐‘ฅ, ๐‘ฆ)โˆ—(๐‘ฅโˆ—0, ๐‘งโˆ—))

=โ‹‚

๐‘ฆ :๐บ (๐‘ฅ)๐‘ฆ=๐‘ง๐ทโˆ—๐น (๐‘ฅ |๐‘ฅ, ๐‘ฆ) (๐‘ฅโˆ—0 + ([๐บโ€ฒ(๐‘ฅ) ยท ]๐‘ฆ)โˆ—๐‘งโˆ—,๐บ (๐‘ฅ)โˆ—๐‘งโˆ—)

=โ‹‚

๐‘ฆ :๐บ (๐‘ฅ)๐‘ฆ=๐‘ง๐‘ฅโˆ—0 + ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐บ (๐‘ฅ)โˆ—๐‘งโˆ—) + ([๐บโ€ฒ(๐‘ฅ) ยท ]๐‘ฆ)โˆ—๐‘งโˆ—.

It follows that

๐‘graph(๐บโ—ฆ๐น ) (๐‘ฅ, ๐‘ฅ, ๐‘ง) =โ‹‚

๐‘ฆ :๐บ (๐‘ฅ)๐‘ฆ=๐‘ง

{(๐‘ฅโˆ—,โˆ’๐‘ฅโˆ—0,โˆ’๐‘งโˆ—)

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—0 โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐บ (๐‘ฅ)โˆ—๐‘งโˆ—)+([๐บโ€ฒ(๐‘ฅ) ยท ]๐‘ฆ)โˆ—๐‘งโˆ—

}.

Observe now that ๐‘…1 is linear and invertible on ๐‘… graph(๐บ โ—ฆ ๐น ). Therefore, another applica-tion of Lemma 23.2 yields

๐‘graph๐ป (๐‘ฅ, ๐‘ง) =โ‹‚

๐‘ฆ :๐บ (๐‘ฅ)๐‘ฆ=๐‘ง{(๐‘ฅโˆ—,โˆ’๐‘งโˆ—) | ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐บ (๐‘ฅ)โˆ—๐‘งโˆ—) + ([๐บโ€ฒ(๐‘ฅ) ยท ]๐‘ฆ)โˆ—๐‘งโˆ—}.

Since the ๐‘ฆ is unique by our invertibility assumptions on๐บ (๐‘ฅ) and exists due to ๐‘ง โˆˆ ๐ป (๐‘ฅ),we obtain the claim. ๏ฟฝ

312

23 calculus for the frรฉchet coderivative

Corollary 23.14 (second derivative chain rule for Clarke subdierential). Let ๐‘‹,๐‘Œ be Banachspaces, let ๐‘“ : ๐‘Œ โ†’ ๐‘… be locally Lipschitz continuous, and let ๐‘† : ๐‘‹ โ†’ ๐‘Œ be twice continuouslydierentiable. Set โ„Ž : ๐‘‹ โ†’ ๐‘Œ , โ„Ž(๐‘ฅ) โ‰” ๐‘“ (๐‘† (๐‘ฅ)). If there exists a neighborhood ๐‘ˆ of ๐‘ฅ โˆˆ ๐‘‹such that

(i) ๐‘“ is Clarke regular at ๐‘† (๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘‹ ;(ii) ๐‘†โ€ฒ(๐‘ฅ)โˆ— has a bounded left-inverse ๐‘†โ€ฒ(๐‘ฅ)โˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—) for all ๐‘ฅ โˆˆ ๐‘ˆ ;

(iii) the mapping ๐‘ฅ โ†ฆโ†’ ๐‘†โ€ฒ(๐‘ฅ)โ€ โˆ— is Frรฉchet dierentiable at ๐‘ฅ ;then for all ๐‘ฅโˆ— โˆˆ ๐œ•๐ถโ„Ž(๐‘ฅ) = ๐‘†โ€ฒ(๐‘ฅ)โˆ—๐œ•๐ถ ๐‘“ (๐‘† (๐‘ฅ)),

๐ทโˆ— [๐œ•๐ถโ„Ž] (๐‘ฅ |๐‘ฅโˆ—) (๐‘ฅโˆ—โˆ—) = ๐‘† (๐‘ฅ)โˆ—๐‘ฅโˆ—โˆ— + ๐‘†โ€ฒ(๐‘ฅ)โˆ—๐ทโˆ— [๐œ•๐ถ ๐‘“ ] (๐‘† (๐‘ฅ) |๐‘ฆโˆ—) (๐‘†โ€ฒ(๐‘ฅ)โˆ—โˆ—๐‘ฅโˆ—โˆ—) (๐‘ฅโˆ—โˆ— โˆˆ ๐‘‹ โˆ—โˆ—)

for the linear operator ๐‘† : ๐‘‹ โ†’ ๐•ƒ(๐‘‹ ;๐‘‹ โˆ—), ๐‘† (๐‘ฅ)ฮ”๐‘ฅ := (๐‘†โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ)โˆ—๐‘ฆโˆ— and the unique ๐‘ฆโˆ— โˆˆ๐œ•๐ถ ๐‘“ (๐‘† (๐‘ฅ)) such that ๐‘†โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ— = ๐‘ฅโˆ—.

Proof. The expression for ๐œ•๐ถโ„Ž(๐‘ฅ) follows from Theorem 13.23. Let now ๐‘† : ๐‘‹ โ†’ ๐•ƒ(๐‘Œ โˆ—;๐‘‹ โˆ—),๐‘† (๐‘ฅ) โ‰” ๐‘†โ€ฒ(๐‘ฅ)โˆ—. Then ๐‘† is Frรฉchet dierentiable in๐‘ˆ as well, which together with assump-tion (iii) allows us to apply Theorem 23.13 to obtain

๐ทโˆ— [๐œ•๐ถ (๐‘“ โ—ฆ๐‘†)] (๐‘ฅ |๐‘ฅโˆ—) (๐‘ฅโˆ—โˆ—) = (๐‘†โ€ฒ(๐‘ฅ)๐‘ฆโˆ—)โˆ—๐‘ฅโˆ—โˆ—+๐ทโˆ— [(๐œ•๐ถ ๐‘“ )โ—ฆ๐‘†] (๐‘ฅ |๐‘ฆโˆ—) (๐‘†โ€ฒ(๐‘ฅ)โˆ—โˆ—๐‘ฅโˆ—โˆ—) (๐‘ฅโˆ—โˆ— โˆˆ ๐‘‹ โˆ—โˆ—).

Furthermore, since ๐‘†โ€ฒ(๐‘ฅ)โˆ— has a bounded left-inverse, we can apply Theorem 23.10 to obtainfor all ๐‘ฅ โˆˆ ๐‘ˆ and all ๐‘ฅโˆ— โˆˆ ๐œ•๐ถ ๐‘“ (๐‘† (๐‘ฅ))

๐ทโˆ— [(๐œ•๐ถ ๐‘“ ) โ—ฆ ๐‘†] (๐‘ฅ |๏ฟฝ๏ฟฝโˆ—) (๐‘ฆโˆ—โˆ—) = ๐‘†โ€ฒ(๐‘ฅ)โˆ—๐ทโˆ— [๐œ•๐ถ ๐‘“ ] (๐‘† (๐‘ฅ) |๏ฟฝ๏ฟฝโˆ—) (๐‘ฆโˆ—โˆ—) (๐‘ฆโˆ—โˆ— โˆˆ ๐‘Œ โˆ—โˆ—)

for the unique ๏ฟฝ๏ฟฝโˆ— โˆˆ ๐œ•๐ถ ๐‘“ (๐‘† (๐‘ฅ)) such that ๐‘†โ€ฒ(๐‘ฅ)โˆ—๏ฟฝ๏ฟฝโˆ— = ๐‘ฅโˆ—. The claim now follows again fromthe fact that ๐‘†โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ = (๐‘†โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ)โˆ—. ๏ฟฝ

Note that ๐‘† (๐‘ฅ)ฮ”๐‘ฅ := (๐‘†โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ)โˆ—๐‘ฆโˆ— also occurs in the corresponding Corollary 22.18 andrecall from Examples 20.1 and 20.6 and Theorem 20.12 that coderivatives for dierentiablesingle-valued mappings amount to taking adjoints of their Frรฉchet derivative.

313

24 CALCULUS FOR THE CLARKE GRAPHICAL

DERIVATIVE

We now turn to the limiting (co)derivatives. Compared to the basic (co)derivatives, calculusrules for these are much more challenging and require even more assumptions. In this chap-ter, we consider the Clarke graphical derivative, where in addition to strict dierentiabilitywe will for the sake of simplicity assume T-regularity of the set-valued mapping (so thatthe Clarke graphical derivative coincides with the graphical derivative) and show that thisregularity is preserved under addition and composition with a single-valued mapping.

24.1 strict differentiability

The following concept generalizes the notion of strict dierentiability for single-valuedmappings (see Remark 2.6) to set-valued mappings. Let ๐‘‹,๐‘Œ be Banach spaces. We say that๐น : ๐‘‹ โ‡’ ๐‘Œ is strictly dierentiable at ๐‘ฅ โˆˆ ๐‘‹ for ๐‘ฆ โˆˆ ๐น (๐‘ฅ) if graph ๐น is closed near (๐‘ฅ, ๐‘ฆ)and

for every ฮ”๐‘ฆ โˆˆ ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ), ๐œ๐‘˜โ†’ 0, ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ with ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐‘˜๐œ๐‘˜

โ†’ ฮ”๐‘ฅ,

and ๏ฟฝ๏ฟฝ๐‘˜ โˆˆ ๐น (๐‘ฅ๐‘˜) with ๏ฟฝ๏ฟฝ๐‘˜ โ†’ ๐‘ฆ,

(24.1a)

there exist ๐‘ฆ๐‘˜ โˆˆ ๐น (๐‘ฅ๐‘˜) with ๐‘ฆ๐‘˜ โˆ’ ๏ฟฝ๏ฟฝ๐‘˜๐œ๐‘˜

โ†’ ฮ”๐‘ฆ.(24.1b)

Compared to semi-dierentiability, strict dierentiability requires that the limits realizingthe various directions are interchangeable with limits of the base points; in other words,that the graphical derivative is itself an inner limit, i.e., if

(24.2) ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) = lim inf๐œโ†’ 0,ฮ”๐‘ฅโ†’ฮ”๐‘ฅ

graph ๐น3(๐‘ฅ,๏ฟฝ๏ฟฝ)โ†’(๐‘ฅ,๐‘ฆ)

๐น (๐‘ฅ + ๐œฮ”๐‘ฅ) โˆ’ ๏ฟฝ๏ฟฝ๐œ

(ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

Lemma 24.1. If ๐‘‹ and ๐‘Œ are nite-dimensional, then ๐น : ๐‘‹ โ‡’ ๐‘Œ is strictly dierentiable at๐‘ฅ โˆˆ ๐‘‹ for ๐‘ฆ โˆˆ ๐‘Œ if and only if

(24.3) ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) = lim infgraph ๐น3(๐‘ฅ,๏ฟฝ๏ฟฝ)โ†’(๐‘ฅ,๐‘ฆ),

ฮ”๐‘ฅโ†’ฮ”๐‘ฅ, ๐ท๐น (๐‘ฅ |๏ฟฝ๏ฟฝ) (ฮ”๐‘ฅ)โ‰ โˆ…๐ท๐น (๐‘ฅ |๏ฟฝ๏ฟฝ) (ฮ”๐‘ฅ) (ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

314

24 calculus for the clarke graphical derivative

Proof. We rst show that

(24.4) graph๐ท๐น (๐‘ฅ, ๐‘ฆ) โŠ‚(ฮ”๐‘ฅ,ฮ”๐‘ฆ)

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝฮ”๐‘ฆ โˆˆ lim infgraph ๐น3(๐‘ฅ,๏ฟฝ๏ฟฝ)โ†’(๐‘ฅ,๐‘ฆ),

ฮ”๐‘ฅโ†’ฮ”๐‘ฅ, ๐ท๐น (๐‘ฅ |๏ฟฝ๏ฟฝ) (ฮ”๐‘ฅ)โ‰ โˆ…๐ท๐น (๐‘ฅ |๏ฟฝ๏ฟฝ) (ฮ”๐‘ฅ)

โ‰• ๐พ.

If (ฮ”๐‘ฅ,ฮ”๐‘ฆ) โˆ‰ ๐พ , then there exist graph ๐น 3 (๐‘ฅ๐‘˜ , ๏ฟฝ๏ฟฝ๐‘˜) โ†’ (๐‘ฅ, ๐‘ฆ) and ฮ”๐‘ฅ๐‘˜ โ†’ ฮ”๐‘ฅ with๐ท๐น (๐‘ฅ๐‘˜ |๏ฟฝ๏ฟฝ๐‘˜) (ฮ”๐‘ฅ๐‘˜) โ‰  โˆ… such that for some Y > 0 and an innite subset ๐‘ โŠ‚ โ„•,

infฮ”๐‘ฆ๐‘˜โˆˆ๐ท๐น (๐‘ฅ๐‘˜ |๏ฟฝ๏ฟฝ๐‘˜ ) (ฮ”๐‘ฅ๐‘˜ )

โ€–ฮ”๐‘ฆ๐‘˜ โˆ’ ฮ”๐‘ฆ โ€– โ‰ฅ 2Y (๐‘˜ โˆˆ ๐‘ ).

By the characterization (20.1) of ๐ท๐น (๐‘ฅ๐‘˜ |๏ฟฝ๏ฟฝ๐‘˜), this implies the existence of ๐œ๐‘˜โ†’ 0 such that

lim sup๐‘˜โ†’โˆž

inf๐‘ฆ๐‘˜โˆˆ๐น (๐‘ฅ๐‘˜+๐œ๐‘˜ฮ”๐‘ฅ๐‘˜ )

๐‘ฆ๐‘˜ โˆ’ ๏ฟฝ๏ฟฝ๐‘˜๐œ๐‘˜โˆ’ ฮ”๐‘ฆ

โ‰ฅ Y.

Thus (ฮ”๐‘ฅ,ฮ”๐‘ฆ) โˆ‰ graph๐ท๐น (๐‘ฅ, ๐‘ฆ), so that (24.4) holds.

Writing now (24.3) as

๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) ={ฮ”๐‘ฆ โˆˆ ๐‘Œ

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ (๐‘ฅ, ๏ฟฝ๏ฟฝ,ฮ”๐‘ฅ) โ†’ (๐‘ฅ, ๐‘ฆ,ฮ”๐‘ฅ) โ‡’ โˆƒฮ”๏ฟฝ๏ฟฝ โ†’ ฮ”๐‘ฆwith ฮ”๏ฟฝ๏ฟฝ โˆˆ ๐ท๐น (๐‘ฅ |๏ฟฝ๏ฟฝ) (ฮ”๐‘ฅ)

},

the characterization (20.4) of๐ท๐น (๐‘ฅ, ๐‘ฆ) provides the opposite inclusion graph๐ท๐น (๐‘ฅ, ๐‘ฆ) โŠ‚ ๐พ .Therefore (24.3) holds. ๏ฟฝ

In particular, single-valued continuously dierentiable mappings and their inverses arestrictly dierentiable.

Lemma 24.2. Let ๐‘‹,๐‘Œ be Banach spaces and let ๐น : ๐‘‹ โ†’ ๐‘Œ be single-valued.

(i) If ๐น is continuously dierentiable at ๐‘ฅ โˆˆ ๐‘‹ , then ๐น is strictly dierentiable at ๐‘ฅ for๐‘ฆ = ๐น (๐‘ฅ).

(ii) If ๐น is continuously dierentiable near ๐‘ฅ โˆˆ ๐‘‹ and ๐น โ€ฒ(๐‘ฅ)โˆ— has a left-inverse ๐น โ€ฒ(๐‘ฅ)โˆ—โ€  โˆˆ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—), then ๐นโˆ’1 is strictly dierentiable at ๐‘ฆ = ๐น (๐‘ฅ) for ๐‘ฅ .

Proof. The proof is analogous to Lemma 22.3, since the inverse function Theorem 2.8establishes the continuous dierentiability of ๐นโˆ’1 and hence strict dierentiability. ๏ฟฝ

Remark 24.3. As in Remark 22.4, if ๐‘‹ is nite-dimensional, it suces in Lemma 24.2 (ii) to assumethat ๐น is continuously dierentiable with ker ๐น โ€ฒ(๐‘ฅ)โˆ— = {0}.

315

24 calculus for the clarke graphical derivative

24.2 cone transformation formulas

The main aim in the following lemmas is to show that tangential regularity is preservedunder certain transformations. We do this by proceeding as in Section 22.2 to derive explicitexpressions for the transformed cones and then comparing them with the correspondingexpressions obtained there for the graphical derivative.

Lemma 24.4. Let๐‘‹,๐‘Œ be Banach spaces and assume there exists a family of continuous inverseselections {๐‘…โˆ’1๐‘ฆ : ๐‘ˆ๐‘ฆ โ†’ ๐ถ | ๐‘ฆ โˆˆ ๐ถ, ๐‘…๐‘ฆ = ๐‘ฅ} of ๐‘… โˆˆ ๐•ƒ(๐‘Œ ;๐‘‹ ) to ๐ถ โŠ‚ ๐‘Œ at ๐‘ฅ โˆˆ ๐‘‹ . If each ๐‘…โˆ’1๐‘ฆ isFrรฉchet dierentiable at ๐‘ฅ and ๐ถ is tangentially regular at all ๐‘ฆ โˆˆ ๐ถ with ๐‘…๐‘ฆ = ๐‘ฅ , then ๐‘…๐ถ istangentially regular at ๐‘ฅ and

๐‘‡๐‘…๐ถ (๐‘ฅ) =โ‹ƒ

๐‘ฆ :๐‘…๐‘ฆ=๐‘ฅ๐‘…๐‘‡๐ถ (๐‘ฆ).

Proof. We rst prove โ€œโŠ‚โ€. Suppose ฮ”๐‘ฆ โˆˆ ๐‘‡๐ถ (๐‘ฆ) for some ๐‘ฆ โˆˆ ๐‘Œ with ๐‘…๐‘ฆ = ๐‘ฅ . Then forany ๐ถ 3 ๏ฟฝ๏ฟฝ๐‘˜ โ†’ ๐‘ฆ there exist ๐‘ฆ๐‘˜ โˆˆ ๐ถ and ๐œ๐‘˜โ†’ 0 such that ฮ”๐‘ฆ = lim๐‘˜โ†’โˆž(๐‘ฆ๐‘˜ โˆ’ ๏ฟฝ๏ฟฝ๐‘˜)/๐œ๐‘˜ .Consequently, since ๐‘… is bounded, ๐‘…(๐‘ฆ๐‘˜ โˆ’ ๏ฟฝ๏ฟฝ๐‘˜)/๐œ๐‘˜ โ†’ ๐‘…ฮ”๐‘ฆ . To show that ๐‘…ฮ”๐‘ฆ โˆˆ ๐‘‡๐‘…๐ถ (๐‘ฅ), let๐‘…๐ถ 3 ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ be given. Take now ๏ฟฝ๏ฟฝ๐‘˜ = ๐‘…โˆ’1๐‘ฆ (๐‘ฅ๐‘˜), which satises ๏ฟฝ๏ฟฝ๐‘˜ โ†’ ๐‘ฆ = ๐‘…โˆ’1๐‘ฆ (๐‘ฅ) due to๐‘ฅ๐‘˜ โ†’ ๐‘ฅ . Then ๐‘ฅ๐‘˜ = ๐‘…๏ฟฝ๏ฟฝ๐‘˜ โˆˆ ๐‘…๐ถ satises (๐‘…๐‘ฆ๐‘˜ โˆ’ ๐‘ฅ๐‘˜)/๐œ๐‘˜ โ†’ ๐‘…ฮ”๐‘ฆ , which shows โ€œโŠƒโ€.To prove โ€œโŠƒโ€, suppose that ฮ”๐‘ฅ โˆˆ ๐‘‡๐‘…๐ถ (๐‘ฅ) and hence ฮ”๐‘ฅ โˆˆ ๐‘‡๐‘…๐ถ (๐‘ฅ) by Theorem 18.5. ByLemma 22.6, ฮ”๐‘ฅ = ๐‘…ฮ”๐‘ฆ for some ๐‘ฆ โˆˆ ๐‘Œ with ๐‘…๐‘ฆ = ๐‘ฅ and ฮ”๐‘ฆ โˆˆ ๐‘‡๐ถ (๐‘ฆ) = ๐‘‡๐ถ (๐‘ฆ) by theassumed tangential regularity of ๐ถ at ๐‘ฆ . This shows โ€œโŠ‚โ€.Comparing now the expression for๐‘‡๐‘…๐ถ (๐‘ฆ) = ๐‘‡๐ถ (๐‘ฆ) with the expression for๐‘‡๐‘…๐ถ (๐‘ฅ) providedby Lemma 22.6 and using the tangential regularity of ๐ถ shows the claimed tangentialregularity of ๐‘…๐ถ . ๏ฟฝ

Remark 24.5 (regularity assumptions). The assumption in Lemma 24.4 that๐ถ is tangentially regularis not needed if ker๐‘… = {0} or, more generally, if ๐‘… is a continuously dierentiable mapping withkerโˆ‡๐‘…(๐‘ฆ) = {0}; see [Mordukhovich 1994, Corollary 5.4].

Lemma 24.6 (fundamental lemma on compositions). Let ๐‘‹,๐‘Œ, ๐‘ be Banach spaces and

๐ถ โ‰” {(๐‘ฅ, ๐‘ฆ, ๐‘ง) | ๐‘ฆ โˆˆ ๐น (๐‘ฅ), ๐‘ง โˆˆ ๐บ (๐‘ฆ)}

for ๐น : ๐‘‹ โ‡’ ๐‘Œ , and ๐บ : ๐‘Œ โ‡’ ๐‘ . If (๐‘ฅ, ๐‘ฆ, ๐‘ง) โˆˆ ๐ถ and either

(a) ๐บ is strictly dierentiable at ๐‘ฆ for ๐‘ง, or

(b) ๐นโˆ’1 is strictly dierentiable at ๐‘ฆ for ๐‘ฅ ,

316

24 calculus for the clarke graphical derivative

then

(24.5) ๐‘‡๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) = {(ฮ”๐‘ฅ,ฮ”๐‘ฆ,ฮ”๐‘ง) | ฮ”๐‘ฆ โˆˆ ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ), ฮ”๐‘ง โˆˆ ๐ท๐บ (๐‘ฆ |๐‘ง) (ฮ”๐‘ฆ)}.Moreover, if ๐น is T-regular at ๐‘ฅ for ๐‘ฆ and ๐บ is T-regular at ๐‘ฆ for ๐‘ง, then ๐ถ is tangentiallyregular at (๐‘ฅ, ๐‘ฆ, ๐‘ง).

Proof. The proof is analogous to Lemma 22.8, using in this case the strict dierentiabilityof ๐บ in place of semi-dierentiability. We only consider the case (a) as the case (b) is againproved similarly. First, we have (ฮ”๐‘ฅ,ฮ”๐‘ฆ,ฮ”๐‘ง) โˆˆ ๐‘‡๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) if and only if for all ๐œ๐‘˜โ†’ 0 and(๐‘ฅ๐‘˜ , ๏ฟฝ๏ฟฝ๐‘˜ , ๐‘ง๐‘˜) โ†’ (๐‘ฅ, ๐‘ฆ, ๐‘ง), there exist (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜ , ๐‘ง๐‘˜) โˆˆ ๐ถ such that

ฮ”๐‘ฅ = lim๐‘˜โ†’โˆž

๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐‘˜๐œ๐‘˜

, ฮ”๐‘ฆ = lim๐‘˜โ†’โˆž

๐‘ฆ๐‘˜ โˆ’ ๏ฟฝ๏ฟฝ๐‘˜๐œ๐‘˜

, ฮ”๐‘ง = lim๐‘˜โ†’โˆž

๐‘ง๐‘˜ โˆ’ ๐‘ง๐‘˜๐œ๐‘˜

.

This immediately yields โ€œโŠ‚โ€.To prove โ€œโŠƒโ€, suppose ฮ”๐‘ฆ โˆˆ ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) and ฮ”๐‘ง โˆˆ ๐ท๐บ (๐‘ฆ |๐‘ง) (ฮ”๐‘ฆ) and take ๐œ๐‘˜โ†’ 0 and(๐‘ฅ๐‘˜ , ๏ฟฝ๏ฟฝ๐‘˜ , ๐‘ง๐‘˜) โ†’ (๐‘ฅ, ๐‘ฆ, ๐‘ง). Furthermore, by denition of ๐ท๐น (๐‘ฅ |๐‘ฆ), there exist (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) โˆˆgraph ๐น such that the rst two limits hold. By the strict dierentiability of ๐บ at ๐‘ฆ for๐‘ง, we can also nd ๐‘ง๐‘˜ โˆˆ ๐บ (๐‘ฆ๐‘˜) such that (๐‘ง๐‘˜ โˆ’ ๐‘ง๐‘˜)/๐œ๐‘˜ โ†’ ฮ”๐‘ง. This shows the remaininglimit.

Finally, the tangential regularity of ๐ถ follows from the assumed ๐‘‡ -regularities of ๐น and ๐บby comparing (24.5) with the corresponding expression (22.3). ๏ฟฝ

If one of the two mappings is single-valued, we can use Lemma 24.2 for verifying itssemi-dierentiability and Theorem 20.12 for the expression of its graphical derivative toobtain from Lemma 24.6 the following two special cases.

Corollary 24.7 (fundamental lemma on compositions: single-valued outer mapping). Let๐‘‹,๐‘Œ, ๐‘ be Banach spaces and

๐ถ โ‰” {(๐‘ฅ, ๐‘ฆ,๐บ (๐‘ฆ)) | ๐‘ฆ โˆˆ ๐น (๐‘ฅ)}for ๐น : ๐‘‹ โ‡’ ๐‘Œ and๐บ : ๐‘Œ โ†’ ๐‘ . If (๐‘ฅ, ๐‘ฆ, ๐‘ง) โˆˆ ๐ถ and๐บ is continuously dierentiable at ๐‘ฆ , then

๐‘‡๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) = {(ฮ”๐‘ฅ,ฮ”๐‘ฆ,๐บโ€ฒ(๐‘ฆ)ฮ”๐‘ฆ) | ฮ”๐‘ฆ โˆˆ ๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ)}.Moreover, if ๐น is T-regular at (๐‘ฅ, ๐‘ฆ), then ๐ถ is tangentially-regular at (๐‘ฅ, ๐‘ฆ,๐บ (๐‘ฆ)).

Corollary 24.8 (fundamental lemma on compositions: single-valued inner mapping). Let๐‘‹,๐‘Œ, ๐‘ be Banach spaces and

๐ถ โ‰” {(๐‘ฅ, ๐‘ฆ, ๐‘ง) | ๐‘ฆ = ๐น (๐‘ฅ), ๐‘ง โˆˆ ๐บ (๐‘ฆ)}for ๐น : ๐‘‹ โ‡’ ๐‘Œ and ๐บ : ๐‘Œ โ†’ ๐‘ . If (๐‘ฅ, ๐‘ฆ, ๐‘ง) โˆˆ ๐ถ , ๐น is continuously Frรฉchet dierentiable at ๐‘ฅ ,and ๐น โ€ฒ(๐‘ฅ)โˆ— has a left-inverse ๐น โ€ฒ(๐‘ฅ)โˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—), then

๐‘‡๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) = {(ฮ”๐‘ฅ,ฮ”๐‘ฆ,ฮ”๐‘ง) | ฮ”๐‘ฆ = ๐น โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ, ฮ”๐‘ง โˆˆ ๐ท๐บ (๐‘ฆ |๐‘ง) (ฮ”๐‘ฆ)}.Moreover, if ๐บ is T-regular at (๐‘ฆ, ๐‘ง), then ๐ถ is tangentially regular at (๐‘ฅ, ๐‘ฆ, ๐‘ง).

317

24 calculus for the clarke graphical derivative

24.3 calculus rules

Using these lemmas, we again obtain calculus rules under the assumption that the involvedset-valued mapping is regular.

Theorem 24.9 (addition of a single-valued dierentiable mapping). Let ๐‘‹,๐‘Œ be Banachspaces, let๐บ : ๐‘‹ โ†’ ๐‘Œ be Frรฉchet dierentiable, and ๐น : ๐‘‹ โ‡’ ๐‘Œ . If ๐บ is continuously Frรฉchetdierentiable at ๐‘ฅ โˆˆ ๐‘‹ and ๐น is T-regular at (๐‘ฅ, ๐‘ฆ โˆ’๐บ (๐‘ฅ)) for ๐‘ฆ โˆˆ ๐ป (๐‘ฅ) โ‰” ๐น (๐‘ฅ) +๐บ (๐‘ฅ), then๐ป is T-regular at (๐‘ฅ, ๐‘ฆ) and

๐ท๐ป (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) = ๐ท๐น (๐‘ฅ |๐‘ฆ โˆ’๐บ (๐‘ฅ)) (ฮ”๐‘ฅ) +๐บโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ (ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

Proof. We construct ๐ป from ๐ถ and ๐‘… as in Theorem 22.11. Due to the assumptions (notingthat continuous dierentiability implies strict dierentiability), ๐ถ and ๐‘…๐ถ are tangentiallyregular by Lemmas 24.4 and 24.6, respectively. We now obtain the claimed expression fromTheorem 22.11. ๏ฟฝ

Theorem 24.10 (outer composition with a single-valued dierentiable mapping). Let๐‘‹,๐‘Œ, ๐‘be Banach spaces, ๐น : ๐‘‹ โ‡’ ๐‘Œ , and๐บ : ๐‘Œ โ†’ ๐‘ . Let ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ง โˆˆ ๐ป (๐‘ฅ) โ‰” ๐บ (๐น (๐‘‹ )) be given.If ๐บ is continuously Frรฉchet dierentiable at each ๐‘ฆ โˆˆ ๐น (๐‘ฅ), invertible on ran๐บ near ๐‘ง withFrรฉchet dierentiable inverse at ๐‘ง, and ๐น is T-regular at (๐‘ฅ, ๐‘ฆ), then ๐ป is T-regular at (๐‘ฅ, ๐‘ง)and

๐ท๐ป (๐‘ฅ |๐‘ง) (ฮ”๐‘ฅ) =โ‹ƒ

๐‘ฆ :๐บ (๐‘ฆ)=๐‘ง๐บโ€ฒ(๐‘ฆ)๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) (ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

Proof. We construct ๐ป from ๐ถ and ๐‘… as in Theorem 22.12. Due to the assumptions, ๐ถ and๐‘…๐ถ are tangentially regular by Corollary 24.7 and Lemma 24.4, respectively. We now obtainthe claimed expression from Theorem 22.12. ๏ฟฝ

The special case for a linear operator follows from this exactly as in the proof of Corol-lary 22.13.

Corollary 24.11 (outer composition with a linear operator). Let ๐‘‹,๐‘Œ, ๐‘ be Banach spaces,๐ด โˆˆ ๐•ƒ(๐‘Œ ;๐‘ ), and ๐น : ๐‘‹ โ‡’ ๐‘Œ . If ๐ด has a bounded left-inverse ๐ดโ€  and ๐น is T-regular at (๐‘ฅ, ๐‘ฆ)for ๐‘ฅ โˆˆ ๐‘‹ and the unique ๐‘ฆ โˆˆ ๐‘Œ with ๐ด๐‘ฆ = ๐‘ง, then for any ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ง โˆˆ ๐ป (๐‘ฅ) := ๐ด๐น (๐‘ฅ),then ๐ป is T-regular at (๐‘ฅ, ๐‘ง) and

๐ท๐ป (๐‘ฅ |๐‘ง) (ฮ”๐‘ฅ) = ๐ด๐ท๐น (๐‘ฅ |๐‘ฆ) (ฮ”๐‘ฅ) (ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

318

24 calculus for the clarke graphical derivative

Theorem 24.12 (inner composition with a single-valued dierentiable mapping). Let๐‘‹,๐‘Œ, ๐‘be Banach spaces, ๐น : ๐‘‹ โ†’ ๐‘Œ and ๐บ : ๐‘Œ โ‡’ ๐‘ . Let ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ง โˆˆ ๐ป (๐‘ฅ) := ๐บ (๐น (๐‘ฅ)). If๐น is continuously Frรฉchet dierentiable near ๐‘ฅ such that ๐น โ€ฒ(๐‘ฅ)โˆ— has a bounded left-inverse๐น โ€ฒ(๐‘ฅ)โˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—) and ๐บ is T-regular at (๐น (๐‘ฅ), ๐‘ง), then ๐ป is T-regular at (๐‘ฅ, ๐‘ง) and

๐ท๐ป (๐‘ฅ |๐‘ง) (ฮ”๐‘ฅ) = ๐ท๐บ (๐น (๐‘ฅ) |๐‘ง) (๐น โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ) (ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

Proof. We construct ๐ป from ๐ถ and ๐‘… as in Theorem 22.14. Due to the assumptions, ๐ถ and๐‘…๐ถ are tangentially regular by Corollary 24.8 and Lemma 24.4, respectively. We now obtainthe claimed expression from Theorem 22.14. ๏ฟฝ

Corollary 24.13 (inner composition with a linear operator). Let ๐‘‹,๐‘Œ, ๐‘ be Banach spaces,๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ), and ๐บ : ๐‘Œ โ‡’ ๐‘ . Let ๐ป โ‰” ๐บ โ—ฆ ๐ด for ๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) and ๐บ : ๐‘Œ โ‡’ ๐‘ on Banachspaces ๐‘‹,๐‘Œ , and ๐‘ . If ๐ดโˆ— has a left-inverse ๐ดโˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—) and ๐บ is T-regular at (๐ด๐‘ฅ, ๐‘ง) for๐‘ฅ โˆˆ ๐‘‹ and ๐‘ง โˆˆ ๐ป (๐‘ฅ) โ‰” ๐บ (๐ด๐‘ฅ), then ๐ป is ๐‘‡ -regular at (๐‘ฅ, ๐‘ง) and

๐ท๐ป (๐‘ฅ |๐‘ง) (ฮ”๐‘ฅ) = ๐ท๐บ (๐ด๐‘ฅ |๐‘ง) (๐ดฮ”๐‘ฅ) (ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

As in Section 22.3, we can apply these results to chain rules for subdierentials, this timeonly at points where these subdierentials are T-regular.

Corollary 24.14 (second derivative chain rule for convex subdierential). Let๐‘‹,๐‘Œ be Banachspaces, let ๐‘“ : ๐‘Œ โ†’ โ„ be proper, convex, and lower semicontinuous, and ๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) be suchthat ๐ดโˆ— has a left-inverse ๐ดโˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—), and ran๐ด โˆฉ int dom ๐‘“ โ‰  โˆ…. Let โ„Ž โ‰” ๐‘“ โ—ฆ๐ด. If ๐œ•๐‘“is T-regular at ๐ด๐‘ฅ , ๐‘ฅ โˆˆ ๐‘‹ , for ๐‘ฆโˆ— โˆˆ ๐œ•๐‘“ (๐ด๐‘ฅ), then ๐œ•โ„Ž is T-regular at ๐‘ฅ for ๐‘ฅโˆ— = ๐ดโˆ—๐‘ฆโˆ— and

๐ท [๐œ•โ„Ž] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ) = ๐ดโˆ—๐ท [๐œ•๐‘“ ] (๐ด๐‘ฅ |๐‘ฆโˆ—) (๐ดฮ”๐‘ฅ) (ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

Theorem 24.15 (product rule). Let ๐‘‹,๐‘Œ, ๐‘ be Banach spaces, let ๐บ : ๐‘‹ โ†’ ๐•ƒ(๐‘Œ ;๐‘ ) be Frรฉchetdierentiable, and ๐น : ๐‘‹ โ‡’ ๐‘Œ . Assume that ๐บ (๐‘ฅ) โˆˆ ๐•ƒ(๐‘Œ ;๐‘ ) has a left-inverse ๐บ (๐‘ฅ)โ€ โˆ— onran๐บ (๐‘ฅ) for ๐‘ฅ near ๐‘ฅ โˆˆ ๐‘‹ and the mapping ๐‘ฅ โ†ฆโ†’ ๐บ (๐‘ฅ)โ€ โˆ— is Frรฉchet dierentiable at ๐‘ฅ . Let๐‘ฅ โˆˆ ๐‘‹ and ๐‘ง โˆˆ ๐ป (๐‘ฅ) โ‰” ๐บ (๐‘ฅ)๐น (๐‘ฅ) โ‰” โ‹ƒ

๐‘ฆโˆˆ๐น (๐‘ฅ)๐บ (๐‘ฅ)๐‘ฆ . If ๐น is T-regular at ๐‘ฅ for the unique๐‘ฆ โˆˆ ๐น (๐‘ฅ) satisfying ๐บ (๐‘ฅ)๐‘ฆ = ๐‘ง and ๐บ is continuously dierentiable at ๐‘ฆ , then ๐ป is T-regularat ๐‘ฅ for ๐‘ง and

๐ท๐ป (๐‘ฅ |๐‘ง) (ฮ”๐‘ฅ) = [๐บโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ]๐‘ฆ +๐บ (๐‘ฅ)๐ท๐น (๐‘ฅ |๐‘ฆ)ฮ”๐‘ฅ (ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

Proof. We construct ๐ป from ๐‘…1 and graph(๐บ โ—ฆ ๐น ) as in Theorem 22.17. Due to the as-sumptions, ๐บ and ๐น are T-regular, and hence ๐ป is tangentially regular by Theorem 24.10and Lemma 24.4. We now obtain the claimed expression from Theorem 22.17. ๏ฟฝ

319

24 calculus for the clarke graphical derivative

Corollary 24.16 (second derivative chain rule for Clarke subdierential). Let๐‘‹,๐‘Œ be Banachspaces, let ๐‘“ : ๐‘Œ โ†’ ๐‘… be locally Lipschitz continuous, and let ๐‘† : ๐‘‹ โ†’ ๐‘Œ be twice continuouslydierentiable. Set โ„Ž : ๐‘‹ โ†’ ๐‘Œ , โ„Ž(๐‘ฅ) โ‰” ๐‘“ (๐‘† (๐‘ฅ)). If there exists a neighborhood ๐‘ˆ of ๐‘ฅ โˆˆ ๐‘‹such that

(i) ๐‘“ is Clarke regular at ๐‘† (๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘‹ ;(ii) ๐‘†โ€ฒ(๐‘ฅ)โˆ— has a bounded left-inverse ๐‘†โ€ฒ(๐‘ฅ)โˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—) for all ๐‘ฅ โˆˆ ๐‘ˆ ;

(iii) the mapping ๐‘ฅ โ†ฆโ†’ ๐‘†โ€ฒ(๐‘ฅ)โ€ โˆ— is Frรฉchet dierentiable at ๐‘ฅ ;and ๐œ•๐ถ ๐‘“ is T-regular at ๐‘† (๐‘ฅ) for ๐‘ฆโˆ— โˆˆ ๐œ•๐ถ ๐‘“ (๐‘† (๐‘ฅ)), then ๐œ•๐ถโ„Ž is T-regular at ๐‘ฅ for ๐‘ฅโˆ— = ๐‘†โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ—and

๐ท [๐œ•๐ถโ„Ž] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ) = (๐‘†โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ)โˆ—๐‘ฆโˆ— + ๐‘†โ€ฒ(๐‘ฅ)โˆ—๐ท [๐œ•๐ถ ๐‘“ ] (๐‘† (๐‘ฅ) |๐‘ฆโˆ—) (๐‘†โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ) (ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

320

25 CALCULUS FOR THE LIMITING CODERIVATIVE

The limiting coderivative is the most challenging of all the graphical and coderivatives,and developing exact calculus rules for it requires the most assumptions. In particular,we will here assume a stronger variant of the assumptions of Chapter 23 for the Frรฉchetcoderivative that also implies N-regularity of the set-valued mapping so that we can exploitthe stronger properties of the Frรฉchet coderivative. To prove the fundamental compositionlemmas, we will also need to introduce the concept of partial sequential normal compactnessthat will be used to prevent certain unit-length coderivatives from converging weakly-โˆ— tozero. This concept will also be needed in Chapter 27.

25.1 strict codifferentiability

Let ๐‘‹,๐‘Œ be Banach spaces. We say that ๐น is strictly codierentiable at ๐‘ฅ โˆˆ ๐‘‹ for ๐‘ฆ โˆˆ ๐น (๐‘ฅ)if

(25.1) ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) ={๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ โˆ€graph ๐น 3 (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) โ†’ (๐‘ฅ, ๐‘ฆ), Y๐‘˜โ†’ 0 :โˆƒ(๐‘ฅโˆ—

๐‘˜, ๐‘ฆโˆ—๐‘˜) โˆ—โ‡€ (๐‘ฅโˆ—, ๐‘ฆโˆ—) with ๐‘ฅโˆ—

๐‘˜โˆˆ ๐ทโˆ—

Y๐‘˜๐น (๐‘ฅ๐‘˜ |๐‘ฆ๐‘˜) (๐‘ฆโˆ—๐‘˜ )

},

i.e., if (18.8) is a full weak-โˆ—-limit. From Theorem 20.12 and Corollary 20.14, it is clearthat single-valued continuously dierentiable mappings and their inverses are strictlycodierentiable.

Lemma 25.1. Let ๐‘‹,๐‘Œ be Banach spaces, ๐น : ๐‘‹ โ†’ ๐‘Œ , ๐‘ฅ โˆˆ ๐‘‹ , and ๐‘ฆ = ๐น (๐‘ฅ).(i) If ๐น is continuously dierentiable at ๐‘ฅ , then ๐น is strictly codierentiable at ๐‘ฅ for ๐‘ฆ .

(ii) If ๐น is continuously dierentiably near ๐‘ฅ , then ๐นโˆ’1 is strictly codierentiable at ๐‘ฆ for ๐‘ฅ .

The next lemma and counterexample demonstrate that strict codierentiability is a strictlystronger assumption than N-regularity.

Lemma 25.2. Let ๐‘‹,๐‘Œ be Banach spaces and let ๐น : ๐‘‹ โ‡’ ๐‘Œ be strictly codierentiable at ๐‘ฅ for๐‘ฆ . Then ๐น is N-regular at ๐‘ฅ for ๐‘ฆ .

321

25 calculus for the limiting coderivative

Proof. By Theorem 18.5, strict codierentiability, and the denition of the inner limit,respectively,

๐‘graph ๐น (๐‘ฅ, ๐‘ฆ) โŠ‚ ๐‘graph ๐น (๐‘ฅ, ๐‘ฆ)= lim inf

graph ๐น3(๐‘ฅ,๏ฟฝ๏ฟฝ)โ†’(๐‘ฅ,๐‘ฆ), Yโ†’ 0๐‘ Ygraph ๐น (๐‘ฅ, ๏ฟฝ๏ฟฝ)

โŠ‚ ๐‘graph ๐น (๐‘ฅ, ๐‘ฆ).Therefore ๐‘graph ๐น (๐‘ฅ, ๐‘ฆ) = ๐‘graph ๐น (๐‘ฅ, ๐‘ฆ), i.e., graph ๐น is normally regular at (๐‘ฅ, ๐‘ฆ). ๏ฟฝ

Example 25.3 (graphical regularity does not imply strict codierentiability). Consider๐น (๐‘ฅ) โ‰” [|๐‘ฅ |,โˆž),๐‘ฅ โˆˆ โ„. Then graph ๐น = epi | ยท | is a convex set and therefore graphicallyregular at all points and

๐‘graph ๐น (๐‘ฅ, |๐‘ฅ |) ={(sign๐‘ฅ,โˆ’1) [0,โˆž) if ๐‘ฅ โ‰  0,graph ๐น โ—ฆ = {(๐‘ฅโˆ—, ๐‘ฆโˆ—) | โˆ’๐‘ฆโˆ— โ‰ฅ |๐‘ฅโˆ— |} if ๐‘ฅ = 0.

Hence๐‘graph ๐น is not continuous and therefore, a fortiori, ๐น is not strictly codierentiableat (0, 0).

25.2 partial sequential normal compactness

One central diculty in working with innite-dimensional spaces is the need to distinguishweak-โˆ— convergence and strong convergence. In particular, we need to prevent certainsequences whose norm is bounded away from zero from weak-โˆ— converging to zero. Aswe cannot guarantee this in general, we need to add this as an assumption. In our specicsetting, this is the partial sequential normal compactness (PSNC) of ๐บ : ๐‘Œ โ‡’ ๐‘ at ๐‘ฆ for ๐‘ง,which holds if

(25.2) Y๐‘˜โ†’ 0, (๐‘ฆ๐‘˜ , ๐‘ง๐‘˜) โ†’ (๐‘ฆ, ๐‘ง), ๐‘ฆโˆ—๐‘˜ โˆ—โ‡€ 0, โ€–๐‘งโˆ—๐‘˜ โ€–๐‘ โˆ— โ†’ 0, and ๐‘ฆโˆ—๐‘˜ โˆˆ ๐ทโˆ—Y๐‘˜๐บ (๐‘ฆ๐‘˜ |๐‘ง๐‘˜) (๐‘งโˆ—๐‘˜)

โ‡’ โ€–๐‘ฆโˆ—๐‘˜ โ€–๐‘Œ โˆ— โ†’ 0.

Obviously, if ๐‘Œ โˆ— nite-dimensional, then every mapping ๐บ : ๐‘Œ โ‡’ ๐‘ is PSNC. To prove thePSNC property of single-valued mappings and their inverses, we will need an estimate ofY-coderivatives.

Lemma 25.4. Let ๐‘‹,๐‘Œ be Banach spaces and let ๐น : ๐‘‹ โ†’ ๐‘Œ be continuously dierentiable at๐‘ฅ โˆˆ ๐‘‹ . Then for any Y > 0, ๐ฟ โ‰” โ€–๐น โ€ฒ(๐‘ฅ)โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) , and ๐‘ฆ = ๐น (๐‘ฅ),

๐ทโˆ—Y ๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) โŠ‚ ๐”น(๐น โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ—, (๐ฟ + 1)Y) (๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ—) .

322

25 calculus for the limiting coderivative

Proof. By denition, ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) if and only if for every sequence ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ ,

(25.3) lim sup๐‘˜โ†’โˆž

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ใ€ˆ๐‘ฆโˆ—, ๐น (๐‘ฅ๐‘˜) โˆ’ ๐น (๐‘ฅ)ใ€‰๐‘Œโˆšโ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–2๐‘‹ + โ€–๐น (๐‘ฅ๐‘˜) โˆ’ ๐น (๐‘ฅ)โ€–2๐‘Œ

โ‰ค Y.

Let โ„“ > ๐ฟ. Then by the continuous dierentiability and therefore local Lipschitz continuityof ๐น at ๐‘ฅ , we have โ€–๐น (๐‘ฅ๐‘˜) โˆ’ ๐น (๐‘ฅ)โ€–๐‘Œ โ‰ค โ„“ โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹ for large enough ๐‘˜ and therefore

lim sup๐‘˜โ†’โˆž

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ใ€ˆ๐‘ฆโˆ—, ๐น (๐‘ฅ๐‘˜) โˆ’ ๐น (๐‘ฅ)ใ€‰๐‘Œโ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹

โ‰ค Y (โ„“ + 1).

Furthermore, the Frรฉchet dierentiability of ๐น implies that

lim sup๐‘˜โ†’โˆž

ใ€ˆ๐น โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ—, ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ใ€ˆ๐‘ฆโˆ—, ๐น (๐‘ฅ๐‘˜) โˆ’ ๐น (๐‘ฅ)ใ€‰๐‘Œโ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹

= 0

and hence thatlim sup๐‘˜โ†’โˆž

ใ€ˆ๐‘ฅโˆ— โˆ’ ๐น โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ—, ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅใ€‰๐‘‹โ€–๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ โ€–๐‘‹

โ‰ค Y (โ„“ + 1).

Since ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ was arbitrary, this implies โ€–๐‘ฅโˆ— โˆ’ ๐น โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ—โ€–๐‘‹ โˆ— โ‰ค Y (โ„“ + 1), and since โ„“ > ๐ฟ wasarbitrary, the claim follows. ๏ฟฝ

Lemma 25.5. Let ๐‘Œ, ๐‘ be Banach spaces and ๐บ : ๐‘Œ โ†’ ๐‘ . If either

(a) ๐บ is continuously dierentiable near ๐‘ฆ โˆˆ ๐‘Œ or

(b) ๐‘Œ โˆ— is nite-dimensional,

then ๐บ is PSNC at ๐‘ฆ for ๐‘ง = ๐บ (๐‘ฆ).

Proof. The nite-dimensional case (b) is clear from the denition (25.2) of the PSNC prop-erty.

For case (a), we have from Lemma 25.4 that ๐ทโˆ—Y๐‘˜๐บ (๐‘ฆ๐‘˜ |๐‘ง๐‘˜) (๐‘งโˆ—๐‘˜) โŠ‚ ๐”น(๐บโ€ฒ(๐‘ฆ๐‘˜)โˆ—๐‘งโˆ—๐‘˜ , โ„“Y๐‘˜) for

any โ„“ > โ€–๐บโ€ฒ(๐‘ฆ๐‘˜)โ€–๐•ƒ(๐‘Œ ;๐‘ ) . By the continuous dierentiability of ๐บ , this will hold for โ„“ >

โ€–๐บโ€ฒ(๐‘ฆ)โ€–๐•ƒ(๐‘Œ ;๐‘ ) and any ๐‘˜ โˆˆ โ„• large enough. Thus there exist ๐‘‘โˆ—๐‘˜โˆˆ ๐”น(0, โ„“Y๐‘˜) such that

๐‘ฆโˆ—๐‘˜ = ๐บโ€ฒ(๐‘ฆ๐‘˜)โˆ—๐‘งโˆ—๐‘˜ + ๐‘‘โˆ—๐‘˜ = ๐บโ€ฒ(๐‘ฆ)โˆ—๐‘งโˆ—๐‘˜ + [๐บโ€ฒ(๐‘ฆ๐‘˜) โˆ’๐บโ€ฒ(๐‘ฆ)]โˆ—๐‘งโˆ—๐‘˜ + ๐‘‘โˆ—๐‘˜ โ†’ 0

since ๐‘‘โˆ—๐‘˜โ†’ 0 (due to Y๐‘˜โ†’ 0), โ€–๐‘งโˆ—

๐‘˜โ€–๐‘ โˆ— โ†’ 0, ๐‘ฆ๐‘˜ โ†’ ๐‘ฆ , and ๐บ is continuously dierentiable

near ๐‘ฆ . ๏ฟฝ

Lemma 25.6. Let ๐‘Œ, ๐‘ be Banach spaces and ๐บ : ๐‘Œ โ†’ ๐‘ . If either

(a) ๐บ is continuously dierentiable near ๐‘ฆ โˆˆ ๐‘Œ and ๐บโ€ฒ(๐‘ฆ)โˆ— โˆˆ ๐•ƒ(๐‘ โˆ—;๐‘Œ โˆ—) has a left-inverse๐บโ€ฒ(๐‘ฆ)โˆ—โ€  โˆˆ ๐•ƒ(๐‘Œ โˆ—;๐‘ โˆ—), i.e., ๐บโ€ฒ(๐‘ฆ)โˆ—โ€ ๐บโ€ฒ(๐‘ฆ)โˆ— = Id, or

323

25 calculus for the limiting coderivative

(b) ๐‘ โˆ— is nite-dimensional,

then ๐บโˆ’1 is PSNC at ๐‘ง = ๐บ (๐‘ฆ) for ๐‘ฆ .

Proof. The nite-dimensional case (b) is clear from the denition (25.2) of the PSNC prop-erty.

For case (a), we have from the denition of ๐ทโˆ—Y ๐น via ๐‘ Y

graph ๐น that ฮ”๐‘งโˆ—๐‘˜โˆˆ ๐ทโˆ—

Y๐บโˆ’1(๐‘ง๐‘˜ |๐‘ฆ๐‘˜) (ฮ”๐‘ฆโˆ—๐‘˜ )

if and only if ฮ”๐‘ฆโˆ—๐‘˜โˆˆ ๐ทโˆ—

Y๐บ (๐‘ฆ๐‘˜ |๐‘ง๐‘˜) (ฮ”๐‘งโˆ—๐‘˜). We thus have to show that

Y๐‘˜โ†’ 0, (๐‘ฆ๐‘˜ , ๐‘ง๐‘˜) โ†’ (๐‘ฆ, ๐‘ง), ๐‘งโˆ—๐‘˜ โˆ—โ‡€ 0, โ€–๐‘ฆโˆ—๐‘˜ โ€–๐‘Œ โˆ— โ†’ 0, and ๐‘ฆโˆ—๐‘˜ โˆˆ ๐ทโˆ—Y๐‘˜๐บ (๐‘ฆ๐‘˜ |๐‘ง๐‘˜) (๐‘งโˆ—๐‘˜)

โ‡’ โ€–๐‘งโˆ—๐‘˜ โ€–๐‘ โˆ— โ†’ 0.

From Lemma 25.4, it follows that ๐ทโˆ—Y๐‘˜๐บ (๐‘ฆ๐‘˜ |๐‘ง๐‘˜) (๐‘งโˆ—๐‘˜) โŠ‚ ๐”น(๐บโ€ฒ(๐‘ฆ๐‘˜)โˆ—๐‘งโˆ—๐‘˜ , โ„“Y๐‘˜)) for any โ„“ >

โ€–๐บโ€ฒ(๐‘ฆ๐‘˜)โ€–๐•ƒ(๐‘Œ ;๐‘ ) . As in Lemma 25.5, we now deduce that ๐‘ฆโˆ—๐‘˜= ๐บโ€ฒ(๐‘ฆ๐‘˜)โˆ—๐‘งโˆ—๐‘˜ + ๐‘‘โˆ—

๐‘˜for some

๐‘‘โˆ—๐‘˜โˆˆ ๐”น(0, โ„“Y๐‘˜). Since ๐‘ฆโˆ—๐‘˜ โˆ’ ๐‘‘โˆ—

๐‘˜โ†’ 0, we also have ๐บโ€ฒ(๐‘ฆ๐‘˜)โˆ—๐‘งโˆ—๐‘˜ โ†’ 0 and thus ๐บโ€ฒ(๐‘ฆ)โˆ—๐‘งโˆ—

๐‘˜+

[๐บโ€ฒ(๐‘ฆ๐‘˜) โˆ’ ๐บโ€ฒ(๐‘ฆ)]โˆ—๐‘งโˆ—๐‘˜โ†’ 0. Since {๐‘งโˆ—

๐‘˜}๐‘˜โˆˆโ„• is bounded by the continuous dierentiability

of ๐บ and ๐‘ฆ๐‘˜ โ†’ ๐‘ฆ , we obtain ๐บโ€ฒ(๐‘ฆ)โˆ—๐‘งโˆ—๐‘˜โ†’ 0. Since ๐บโ€ฒ(๐‘ฆ)โˆ— is assumed to have a bounded

left-inverse, this implies ๐‘งโˆ—๐‘˜โ†’ 0 as required. ๏ฟฝ

We will use PSNC to obtain the following partial compactness property for the limitingcoderivative, for which we need to assume reexivity (or nite-dimensionality) of ๐‘Œ .

Lemma 25.7. Let ๐‘Œ, ๐‘ be Banach spaces and ๐บ : ๐‘Œ โ‡’ ๐‘ . Let ๐‘ฆ โˆˆ ๐‘Œ and ๐‘ง โˆˆ ๐บ (๐‘ฆ) be given.Assume ๐‘ฆโˆ— โˆˆ ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (0) implies ๐‘ฆโˆ— = 0 and either

(a) ๐‘Œ is nite-dimensional or

(b) ๐‘Œ is reexive and ๐บ is PSNC at ๐‘ฆ for ๐‘ง.

If(๐‘ฆ๐‘˜ , ๐‘ง๐‘˜) โ†’ (๐‘ฆ, ๐‘ง), ๐‘งโˆ—๐‘˜

โˆ—โ‡€ ๐‘งโˆ—, Y๐‘˜โ†’ 0, and ๐‘ฆโˆ—๐‘˜ โˆˆ ๐ทโˆ—Y๐‘˜๐บ (๐‘ฆ๐‘˜ |๐‘ง๐‘˜) (๐‘งโˆ—๐‘˜),

then there exists a subsequence such that ๐‘ฆโˆ—๐‘˜

โˆ—โ‡€ ๐‘ฆโˆ— โˆˆ ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (๐‘งโˆ—).

Proof. We rst show that {๐‘ฆโˆ—๐‘˜}๐‘˜โˆˆโ„• is bounded. We argue by contradiction and suppose

that {๐‘ฆโˆ—๐‘˜}๐‘˜โˆˆโ„• is unbounded. We may then assume that โ€–๐‘ฆโˆ—

๐‘˜โ€–๐‘Œ โˆ— โ†’ โˆž by switching to an

(unrelabelled) subsequence. Since ๐ทโˆ—Y๐‘˜๐บ (๐‘ฆ๐‘˜ |๐‘ง๐‘˜) is formed from a cone, we also have

๐ต๐‘Œ โˆ— 3 ๐‘ฆโˆ—๐‘˜/โ€–๐‘ฆโˆ—๐‘˜ โ€–๐‘Œ โˆ— โˆˆ ๐ทโˆ—Y๐‘˜๐บ (๐‘ฆ๐‘˜ |๐‘ง๐‘˜) (๐‘งโˆ—๐‘˜/โ€–๐‘ฆโˆ—๐‘˜ โ€–๐‘Œ โˆ—) .

Observe that โ€–๐‘งโˆ—๐‘˜/โ€–๐‘ฆโˆ—

๐‘˜โ€–๐‘Œ โˆ— โ€–๐‘ โˆ— โ†’ 0 because {๐‘งโˆ—

๐‘˜}๐‘˜โˆˆโ„• is bounded. Since ๐‘Œ is reexive, we can

use the Eberleinโ€“Smulyan Theorem 1.9 to extract a subsequence such that ๐‘ฆโˆ—๐‘˜/โ€–๐‘ฆโˆ—

๐‘˜โ€–๐‘Œ โˆ— โˆ—โ‡€ ๐‘ฆโˆ—

for some ๐‘ฆโˆ— โˆˆ ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (0). If๐‘Œ is nite-dimensional, clearly ๐‘ฆโˆ— โ‰  0. Otherwise we need to

324

25 calculus for the limiting coderivative

use the assumed PSNC property. If ๐‘ฆโˆ— = 0, then (25.2) implies that 1 = โ€–๐‘ฆโˆ—๐‘˜/โ€–๐‘ฆโˆ—

๐‘˜โ€–๐‘Œ โˆ— โ€–๐‘Œ โˆ— โ†’ 0,

which is a contradiction. Therefore ๐‘ฆโˆ— โ‰  0. However, we have assumed ๐‘ฆโˆ— โˆˆ ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (0)to imply ๐‘ฆโˆ— = 0, so we obtain a contradiction.

Therefore {๐‘ฆโˆ—๐‘˜}๐‘˜โˆˆโ„• is bounded, so we may again use the Eberleinโ€“Smulyan Theorem 1.9

to extract a subsequence converging to some ๐‘ฆโˆ— โˆˆ ๐‘Œ . By the denition of the limitingcoderivative, this implies ๐‘ฆโˆ— โˆˆ ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (๐‘งโˆ—) and hence the claim. ๏ฟฝ

Remark 25.8. The PSNC property, its stronger variant sequential normal compactness (SNC), andtheir implications are studied in signicant detail in [Mordukhovich 2006].

25.3 cone transformation formulas

As in Section 24.2, we now show that normal regularity is preserved under certain trans-formations by deriving explicit expressions for the transformed cones and then comparingthem with the corresponding expressions of the Frรฉchet coderivative.

Lemma 25.9. Let๐‘‹,๐‘Œ be Banach spaces and assume there exists a family of continuous inverseselections {๐‘…โˆ’1๐‘ฆ : ๐‘ˆ๐‘ฆ โ†’ ๐ถ | ๐‘ฆ โˆˆ ๐ถ, ๐‘…๐‘ฆ = ๐‘ฅ} of ๐‘… โˆˆ ๐•ƒ(๐‘Œ ;๐‘‹ ) to ๐ถ โŠ‚ ๐‘Œ at ๐‘ฅ โˆˆ ๐‘‹ . If each ๐‘…โˆ’1๐‘ฆis Frรฉchet dierentiable at ๐‘ฅ and ๐ถ is normally regular at all ๐‘ฆ โˆˆ ๐ถ with ๐‘…๐‘ฆ = ๐‘ฅ , then ๐‘…๐ถ isnormally regular at ๐‘ฅ and

๐‘๐‘…๐ถ (๐‘ฅ) =โ‹‚

๐‘ฆ :๐‘…๐‘ฆ=๐‘ฅ{๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— | ๐‘…โˆ—๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฆ)}.

Proof. We rst prove โ€œโŠ‚โ€. Let ๐‘ฅโˆ— โˆˆ ๐‘๐‘…๐ถ (๐‘ฅ). By denition, this holds if and only if thereexist Y๐‘˜โ†’ 0 as well as ๐‘ฅโˆ—

๐‘˜โˆ—โ‡€ ๐‘ฅโˆ— and ๐‘ฅ๐‘˜ โ†’ ๐‘ฅ with ๐‘ฅโˆ—

๐‘˜โˆˆ ๐‘ Y๐‘˜

๐‘…๐ถ (๐‘ฅ๐‘˜). Let ๐‘ฆ โˆˆ ๐‘Œ be such that๐‘…๐‘ฆ = ๐‘ฅ . Dening ๐‘ฆ๐‘˜ โ‰” ๐‘…โˆ’1๐‘ฆ ๐‘ฅ๐‘˜ , we have ๐‘…๐‘ฆ๐‘˜ = ๐‘ฅ๐‘˜ and๐ถ 3 ๐‘ฆ๐‘˜ โ†’ ๐‘ฆ . Thus Lemma 23.2 yields๐‘…โˆ—๐‘ฅโˆ—

๐‘˜โˆˆ ๐‘ Y๐‘˜๐ฟ

๐ถ (๐‘ฆ๐‘˜). By denition of the limiting coderivative, this implies that ๐‘…โˆ—๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฆ).Since this holds for all ๐‘ฆ โˆˆ ๐‘Œ with ๐‘…๐‘ฆ = ๐‘ฅ , we obtain โ€œโŠ‚โ€.For โ€œโŠƒโ€, Let ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— be such that ๐‘…โˆ—๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฆ) for all ๐‘ฆ โˆˆ ๐‘Œ with ๐‘…๐‘ฆ = ๐‘ฅ . Then theassumption of regularity of ๐ถ at ๐‘ฆ implies that ๐‘…โˆ—๐‘ฅโˆ— โˆˆ ๐‘๐ถ (๐‘ฆ). Hence taking ๐‘ฆ๐‘˜ = ๐‘ฆ ,๐‘ฅโˆ—๐‘˜= ๐‘ฅโˆ—, and ๐‘ฅ๐‘˜ = ๐‘ฅ , we deduce from Lemma 23.2 that ๐‘ฅโˆ—

๐‘˜โˆˆ ๐‘๐‘…๐ถ (๐‘ฅ๐‘˜). Again by denition,

this implies that ๐‘ฅโˆ— โˆˆ ๐‘๐‘…๐ถ (๐‘ฅ).Finally, the normal regularity of๐‘…๐ถ at๐‘ฅ is clear fromwriting๐‘๐ถ (๐‘ฆ) = ๐‘๐ถ (๐‘ฆ) and comparingour expression for ๐‘๐‘…๐ถ (๐‘ฅ) to the expression for ๐‘๐‘…๐ถ (๐‘ฅ) provided by Lemma 23.2. ๏ฟฝ

Remark 25.10 (regularity assumptions). Again, the assumption in Lemma 24.4 that ๐ถ is normallyregular is not needed if ker๐‘… = {0} or, more generally, if ๐‘… is a continuously dierentiable mappingwith kerโˆ‡๐‘…(๐‘ฆ) = {0}.

325

25 calculus for the limiting coderivative

For the fundamental lemma for the limiting coderivative, we need to assume reexivity of๐‘Œ in order to apply the PSNC via Lemma 25.7.

Lemma 25.11 (fundamental lemma on compositions). Let ๐‘‹,๐‘Œ, ๐‘ be Banach spaces with ๐‘Œreexive and

๐ถ โ‰” {(๐‘ฅ, ๐‘ฆ, ๐‘ง) | ๐‘ฆ โˆˆ ๐น (๐‘ฅ), ๐‘ง โˆˆ ๐บ (๐‘ฆ)}for ๐น : ๐‘‹ โ‡’ ๐‘Œ , and ๐บ : ๐‘Œ โ‡’ ๐‘ . Let (๐‘ฅ, ๐‘ฆ, ๐‘ง) โˆˆ ๐ถ .

(i) If ๐บ is strictly codierentiable and PSNC at ๐‘ฆ for ๐‘ง, semi-codierentiable near (๐‘ฆ, ๐‘ง) โˆˆgraph๐บ , and ๐‘ฆโˆ— โˆˆ ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (0) implies ๐‘ฆโˆ— = 0, then

๐‘๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) = {(๐‘ฅโˆ—, ๐‘ฆโˆ—, ๐‘งโˆ—) | ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (โˆ’๏ฟฝ๏ฟฝโˆ— โˆ’ ๐‘ฆโˆ—), ๏ฟฝ๏ฟฝโˆ— โˆˆ ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (๐‘งโˆ—)}.

(ii) If ๐นโˆ’1 is strictly codierentiable and PSNC at ๐‘ฆ for ๐‘ฅ , semi-codierentiable near (๐‘ฆ, ๐‘ฅ) โˆˆgraph ๐นโˆ’1, and ๐‘ฆโˆ— โˆˆ ๐ทโˆ—๐นโˆ’1(๐‘ฆ |๐‘ฅ) (0) implies ๐‘ฆโˆ— = 0, then

๐‘๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) = {(๐‘ฅโˆ—, ๐‘ฆโˆ—, ๐‘งโˆ—) | ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (โˆ’๏ฟฝ๏ฟฝโˆ— โˆ’ ๐‘ฆโˆ—), โˆ’๏ฟฝ๏ฟฝโˆ— โˆˆ ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (โˆ’๐‘งโˆ—)}.

Moreover, if ๐น is N-regular at ๐‘ฅ for ๐‘ฆ and๐บ is ๐‘ -regular at ๐‘ฆ for ๐‘ง, then๐ถ is normally regularat (๐‘ฅ, ๐‘ฆ, ๐‘ง).

Proof. We only consider the case (i); the case (ii) is shown analogously. To show the in-clusion โ€œโŠ‚โ€, let (๐‘ฅโˆ—, ๐‘ฆโˆ—, ๐‘งโˆ—) โˆˆ ๐‘๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง), which by denition holds if and only if thereexist Y๐‘˜โ†’ 0 as well as (๐‘ฅโˆ—

๐‘˜, ๐‘ฆโˆ—๐‘˜, ๐‘งโˆ—๐‘˜) โˆ—โ‡€ (๐‘ฅโˆ—, ๐‘ฆโˆ—, ๐‘งโˆ—) and ๐ถ 3 (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜ , ๐‘ง๐‘˜) โ†’ (๐‘ฅ, ๐‘ฆ, ๐‘ง) with

(๐‘ฅโˆ—๐‘˜, ๐‘ฆโˆ—๐‘˜, ๐‘งโˆ—๐‘˜) โˆˆ ๐‘ Y๐‘˜

๐ถ (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜ , ๐‘ง๐‘˜). Since by assumption ๐บ is semi-codierentiable at (๐‘ฆ๐‘˜ , ๐‘ง๐‘˜) โˆˆgraph๐บ for ๐‘˜ โˆˆ โ„• suciently large, we can apply Lemma 23.4 (i) to obtain a ๏ฟฝ๏ฟฝโˆ—

๐‘˜โˆˆ

๐ทโˆ—๐บ (๐‘ฆ๐‘˜ |๐‘ง๐‘˜) (๐‘งโˆ—๐‘˜) such that

(25.4) ๐‘ฅโˆ—๐‘˜ โˆˆ ๐ทโˆ—Y๐‘˜๐น (๐‘ฅ๐‘˜ |๐‘ฆ๐‘˜) (โˆ’๏ฟฝ๏ฟฝโˆ—๐‘˜ โˆ’ ๐‘ฆโˆ—๐‘˜ ).

Since ๐‘งโˆ—๐‘˜

โˆ—โ‡€ ๐‘งโˆ—, (๐‘ฆ๐‘˜ , ๐‘ง๐‘˜) โ†’ (๐‘ฆ, ๐‘ง), and Y๐‘˜โ†’ 0, we deduce from Lemma 25.7 that ๏ฟฝ๏ฟฝโˆ—๐‘˜

โˆ—โ‡€ ๏ฟฝ๏ฟฝโˆ— (fora subsequence) for some ๏ฟฝ๏ฟฝโˆ— โˆˆ ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (๐‘งโˆ—). Since also ๐‘ฅโˆ—

๐‘˜โˆ—โ‡€ ๐‘ฅโˆ— and ๐‘ฆโˆ—

๐‘˜โˆ—โ‡€ ๐‘ฆโˆ—, by (25.4)

and the denition of the limiting coderivative, this implies that ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (โˆ’๏ฟฝ๏ฟฝโˆ— โˆ’ ๐‘ฆโˆ—).To show โ€œโŠƒโ€, let ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (โˆ’๏ฟฝ๏ฟฝโˆ— โˆ’ ๐‘ฆโˆ—) and ๏ฟฝ๏ฟฝโˆ— โˆˆ ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (๐‘งโˆ—). We can then by thedenition of ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) nd Y๐‘˜โ†’ 0 as well as (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) โ†’ (๐‘ฅ, ๐‘ฆ) and (๐‘ฅโˆ—

๐‘˜, ๐‘ฆโˆ—๐‘˜) โˆ—โ‡€ (๐‘ฅโˆ—, ๏ฟฝ๏ฟฝโˆ— + ๐‘ฆโˆ—)

with๐‘ฅโˆ—๐‘˜โˆˆ ๐ทโˆ—

Y๐‘˜๐น (๐‘ฅ๐‘˜ |๐‘ฆ๐‘˜) (โˆ’๐‘ฆโˆ—๐‘˜ ). Since๐บ is strictly codierentiable at ๐‘ฆ for๐‘ง, taking any๐‘ง๐‘˜ โ†’ ๐‘ง,

we can now nd ๐‘งโˆ—๐‘˜

โˆ—โ‡€ ๐‘งโˆ— and ๏ฟฝ๏ฟฝโˆ—๐‘˜

โˆ—โ‡€ ๏ฟฝ๏ฟฝโˆ— with ๏ฟฝ๏ฟฝโˆ—๐‘˜โˆˆ ๐ทโˆ—

Y๐‘˜๐บ (๐‘ฆ๐‘˜ |๐‘ง๐‘˜) (๐‘งโˆ—๐‘˜). Letting ๐‘ฆโˆ—๐‘˜ โ‰” ๐‘ฆโˆ—

๐‘˜โˆ’ ๏ฟฝ๏ฟฝโˆ—

๐‘˜,

this implies that ๐‘ฆโˆ—๐‘˜

โˆ—โ‡€ ๐‘ฆโˆ— and that ๐‘ฅโˆ—๐‘˜โˆˆ ๐ทโˆ—

Y๐‘˜๐น (๐‘ฅ๐‘˜ |๐‘ฆ๐‘˜) (โˆ’๏ฟฝ๏ฟฝโˆ—๐‘˜ โˆ’ ๐‘ฆโˆ—

๐‘˜). By Lemma 23.4 (i), it

follows that (๐‘ฅโˆ—๐‘˜, ๐‘ฆโˆ—๐‘˜, ๐‘งโˆ—๐‘˜) โˆˆ ๐‘ Y๐‘˜

๐ถ (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜ , ๐‘ง๐‘˜). The claim now follows again from the denitionof ๐‘๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) as the corresponding outer limit.

Finally, the normal regularity of๐ถ follows from the N-regularity of ๐น and๐บ (via Lemma 25.2)by comparing Lemma 23.4 with Lemma 23.4 (i) for Y = 0. ๏ฟฝ

326

25 calculus for the limiting coderivative

If one of the two mappings is single-valued, we can use Lemma 25.1 for verifying its semi-dierentiability and Theorem 20.12 for the expression of its graphical derivative to obtainfrom Lemma 25.11 the following two special cases.

Corollary 25.12 (fundamental lemma on compositions: single-valued outer mapping). Let๐‘‹,๐‘Œ, ๐‘ be Banach spaces with ๐‘Œ reexive and

๐ถ โ‰” {(๐‘ฅ, ๐‘ฆ,๐บ (๐‘ฆ)) | ๐‘ฆ โˆˆ ๐น (๐‘ฅ)}

for ๐น : ๐‘‹ โ‡’ ๐‘Œ and ๐บ : ๐‘Œ โ†’ ๐‘ . If (๐‘ฅ, ๐‘ฆ, ๐‘ง) โˆˆ ๐ถ and ๐บ is continuously dierentiable near ๐‘ฆ ,then

๐‘๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) = {(๐‘ฅโˆ—, ๐‘ฆโˆ—, ๐‘งโˆ—) | ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (โˆ’[๐บโ€ฒ(๐‘ฆ)]โˆ—๐‘งโˆ— โˆ’ ๐‘ฆโˆ—), ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ—}.

Moreover, if ๐น is N-regular at (๐‘ฅ, ๐‘ฆ), then ๐ถ is normally regular at (๐‘ฅ, ๐‘ฆ,๐บ (๐‘ฆ)).

Proof. We apply Lemma 25.11, where the strict and semi-codierentiability requirementson๐บ are veried by Lemmas 23.1 and 25.1; the PSNC requirement follows from Lemma 25.5;and the requirement of ๐‘ฆโˆ— โˆˆ ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (0) implying ๐‘ฆโˆ— = 0 follows from the expressionof Theorem 20.12 for ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (0). The claimed normal regularity of ๐ถ for ๐‘ -regular ๐นfollows from the ๐‘ -regularity of ๐บ established by Theorem 20.12. ๏ฟฝ

Corollary 25.13 (fundamental lemma on compositions: single-valued inner mapping). Let๐‘‹,๐‘Œ, ๐‘ be Banach spaces with ๐‘Œ reexive and

๐ถ โ‰” {(๐‘ฅ, ๐‘ฆ, ๐‘ง) | ๐‘ฆ = ๐น (๐‘ฅ), ๐‘ง โˆˆ ๐บ (๐‘ฆ)}

for ๐น : ๐‘‹ โ‡’ ๐‘Œ and ๐บ : ๐‘Œ โ†’ ๐‘ . If (๐‘ฅ, ๐‘ฆ, ๐‘ง) โˆˆ ๐ถ , ๐น is continuously dierentiable near ๐‘ฅ , andeither

(a) ๐น โ€ฒ(๐‘ฅ)โˆ— has a left-inverse ๐น โ€ฒ(๐‘ฅ)โˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—) or(b) ๐‘Œ โˆ— is nite-dimensional,

then

๐‘๐ถ (๐‘ฅ, ๐‘ฆ, ๐‘ง) = {(๐น โ€ฒ(๐‘ฅ)โˆ—(โˆ’๏ฟฝ๏ฟฝโˆ— โˆ’ ๐‘ฆโˆ—), ๐‘ฆโˆ—, ๐‘งโˆ—) | โˆ’๏ฟฝ๏ฟฝโˆ— โˆˆ ๐ทโˆ—๐บ (๐‘ฆ |๐‘ง) (โˆ’๐‘งโˆ—), ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ—}.

Moreover, if ๐บ is N-regular at (๐‘ฆ, ๐‘ง), then ๐ถ is normally regular at (๐‘ฅ, ๐‘ฆ, ๐‘ง).

Proof. We apply Lemma 25.11, where the strict and semi-codierentiability requirements on๐นโˆ’1 are veried by Lemmas 23.1 and 25.1; the PSNC requirement follows from Lemma 25.6;and the requirement of ๐‘ฆโˆ— โˆˆ ๐ทโˆ—๐นโˆ’1(๐‘ฆ |๐‘ฅ) (0) implying ๐‘ฆโˆ— = 0 follows from the expressionof Corollary 20.14 for ๐ทโˆ—๐นโˆ’1(๐‘ฆ |๐‘ฅ) (0). The claimed normal regularity of ๐ถ for ๐‘ -regular ๐บfollows from the ๐‘ -regularity of ๐น established by Theorem 20.12. ๏ฟฝ

327

25 calculus for the limiting coderivative

25.4 calculus rules

Using these lemmas, we obtain again calculus rules.

Theorem 25.14 (addition of a single-valued dierentiable mapping). Let ๐‘‹,๐‘Œ be Banachspaces with ๐‘‹ reexive, let ๐บ : ๐‘‹ โ†’ ๐‘Œ be Frรฉchet dierentiable, and ๐น : ๐‘‹ โ‡’ ๐‘Œ . If ๐บis continuously Frรฉchet dierentiable at ๐‘ฅ โˆˆ ๐‘‹ and ๐น is N-regular at (๐‘ฅ, ๐‘ฆ โˆ’ ๐บ (๐‘ฅ)) for๐‘ฆ โˆˆ ๐ป (๐‘ฅ) โ‰” ๐น (๐‘ฅ) +๐บ (๐‘ฅ), then ๐ป is N-regular at (๐‘ฅ, ๐‘ฆ) and

๐ทโˆ—๐ป (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) = ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ โˆ’๐บ (๐‘ฅ)) (๐‘ฆโˆ—) + [๐บโ€ฒ(๐‘ฅ)]โˆ—๐‘ฆโˆ— (๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ—).

Proof. We construct ๐ป from ๐ถ and ๐‘… as in Theorem 23.7. Due to the assumptions (notingthat continuous dierentiability implies strict dierentiability), ๐ถ and ๐‘…๐ถ are normallyregular by Lemmas 25.9 and 25.11, respectively. We now obtain the claimed expression fromTheorem 23.7. ๏ฟฝ

Theorem 25.15 (outer composition with a single-valued dierentiable mapping). Let ๐‘‹,๐‘Œ, ๐‘be Banach spaces with ๐‘Œ reexive, ๐น : ๐‘‹ โ‡’ ๐‘Œ , and ๐บ : ๐‘Œ โ†’ ๐‘ . Let ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ง โˆˆ ๐ป (๐‘ฅ) โ‰”๐บ (๐น (๐‘‹ )) be given. If ๐บ is continuously Frรฉchet dierentiable at each ๐‘ฆ โˆˆ ๐น (๐‘ฅ), invertible onran๐บ near ๐‘ง with Frรฉchet dierentiable inverse at ๐‘ง, and ๐น is N-regular at (๐‘ฅ, ๐‘ฆ), then ๐ป isN-regular at (๐‘ฅ, ๐‘ง) and

๐ทโˆ—๐ป (๐‘ฅ |๐‘ง) (๐‘งโˆ—) =โ‹‚

๐‘ฆ :๐บ (๐‘ฆ)=๐‘ง๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) ( [๐บโ€ฒ(๐‘ฆ)]โˆ—๐‘งโˆ—) (๐‘งโˆ— โˆˆ ๐‘ โˆ—).

Proof. We construct ๐ป from ๐ถ and ๐‘… as in Theorem 23.8. Due to the assumptions, ๐ถ and๐‘…๐ถ are normally regular by Corollary 25.12 and Lemma 25.9, respectively. We now obtainthe claimed expression from Theorem 23.8. ๏ฟฝ

Corollary 25.16 (outer composition with a linear operator). Let ๐‘‹,๐‘Œ, ๐‘ be Banach spaceswith ๐‘Œ reexive, ๐ด โˆˆ ๐•ƒ(๐‘Œ ;๐‘ ), and ๐น : ๐‘‹ โ‡’ ๐‘Œ . If ๐ด has a bounded left-inverse ๐ดโ€  and ๐น isN-regular at (๐‘ฅ, ๐‘ฆ) for ๐‘ฅ โˆˆ ๐‘‹ and the unique ๐‘ฆ โˆˆ ๐‘Œ with ๐ด๐‘ฆ = ๐‘ง, then for any ๐‘ฅ โˆˆ ๐‘‹ and๐‘ง โˆˆ ๐ป (๐‘ฅ) := ๐ด๐น (๐‘ฅ), then ๐ป is N-regular at (๐‘ฅ, ๐‘ง) and

๐ทโˆ—๐ป (๐‘ฅ |๐‘ง) (๐‘งโˆ—) = ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐ดโˆ—๐‘งโˆ—) (๐‘งโˆ— โˆˆ ๐‘ โˆ—).

Theorem 25.17 (inner composition with a single-valued dierentiable mapping). Let ๐‘‹,๐‘Œ, ๐‘be Banach spaces with ๐‘Œ reexive, ๐น : ๐‘‹ โ†’ ๐‘Œ and ๐บ : ๐‘Œ โ‡’ ๐‘ . Let ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ง โˆˆ ๐ป (๐‘ฅ) :=๐บ (๐น (๐‘ฅ)). If ๐น is continuously Frรฉchet dierentiable near ๐‘ฅ such that ๐น โ€ฒ(๐‘ฅ)โˆ— has a boundedleft-inverse ๐น โ€ฒ(๐‘ฅ)โˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—) and ๐บ is T-regular at (๐น (๐‘ฅ), ๐‘ง), then ๐ป is T-regular at (๐‘ฅ, ๐‘ง)and

๐ทโˆ—๐ป (๐‘ฅ |๐‘ง) (๐‘งโˆ—) = [๐น โ€ฒ(๐‘ฅ)]โˆ—๐ทโˆ—๐บ (๐น (๐‘ฅ) |๐‘ง) (๐‘งโˆ—) (๐‘งโˆ— โˆˆ ๐‘ โˆ—).

328

25 calculus for the limiting coderivative

Proof. We construct ๐ป from ๐ถ and ๐‘… as in Theorem 23.10. Due to the assumptions, ๐ถ and๐‘…๐ถ are normally regular by Corollary 25.13 and Lemma 25.9, respectively. We now obtainthe claimed expression from Theorem 23.10. ๏ฟฝ

Corollary 25.18 (inner composition with a linear operator). Let๐‘‹,๐‘Œ, ๐‘ be Banach spaces with๐‘Œ reexive, ๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ), and ๐บ : ๐‘Œ โ‡’ ๐‘ . Let ๐ป โ‰” ๐บ โ—ฆ ๐ด for ๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) and ๐บ : ๐‘Œ โ‡’ ๐‘on Banach spaces ๐‘‹,๐‘Œ , and ๐‘ . If ๐ดโˆ— has a left-inverse ๐ดโˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—) and ๐บ is N-regular at(๐ด๐‘ฅ, ๐‘ง) for ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ง โˆˆ ๐ป (๐‘ฅ) โ‰” ๐บ (๐ด๐‘ฅ), then ๐ป is ๐‘ -regular at (๐‘ฅ, ๐‘ง) and

๐ทโˆ—๐ป (๐‘ฅ |๐‘ง) (๐‘งโˆ—) = ๐ดโˆ—๐ทโˆ—๐บ (๐ด๐‘ฅ |๐‘ง) (๐‘งโˆ—) (๐‘งโˆ— โˆˆ ๐‘ โˆ—).

To apply these results for chain rules of subdierentials, we now need to assume that bothspaces are reexive in addition to N-regularity.

Corollary 25.19 (second derivative chain rule for convex subdierential). Let๐‘‹,๐‘Œ be reexiveBanach spaces, let ๐‘“ : ๐‘Œ โ†’ โ„ be proper, convex, and lower semicontinuous, and ๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ )be such that๐ดโˆ— has a left-inverse๐ดโˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—), and ran๐ดโˆฉ int dom ๐‘“ โ‰  โˆ…. Let โ„Ž โ‰” ๐‘“ โ—ฆ๐ด.If ๐œ•๐‘“ is N-regular at๐ด๐‘ฅ , ๐‘ฅ โˆˆ ๐‘‹ , for ๐‘ฆโˆ— โˆˆ ๐œ•๐‘“ (๐ด๐‘ฅ), then ๐œ•โ„Ž is N-regular at ๐‘ฅ for ๐‘ฅโˆ— = ๐ดโˆ—๐‘ฆโˆ— and

๐ทโˆ— [๐œ•โ„Ž] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ) = ๐ดโˆ—๐ทโˆ— [๐œ•๐‘“ ] (๐ด๐‘ฅ |๐‘ฆโˆ—) (๐ดฮ”๐‘ฅ) (ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

Theorem 25.20 (product rule). Let ๐‘‹,๐‘Œ, ๐‘ be Banach spaces with ๐‘‹,๐‘Œ reexive, let ๐บ : ๐‘‹ โ†’๐•ƒ(๐‘Œ ;๐‘ ) be Frรฉchet dierentiable, and ๐น : ๐‘‹ โ‡’ ๐‘Œ . Assume that ๐บ (๐‘ฅ) โˆˆ ๐•ƒ(๐‘Œ ;๐‘ ) has aleft-inverse ๐บ (๐‘ฅ)โ€ โˆ— on ran๐บ (๐‘ฅ) for ๐‘ฅ near ๐‘ฅ โˆˆ ๐‘‹ and the mapping ๐‘ฅ โ†ฆโ†’ ๐บ (๐‘ฅ)โ€ โˆ— is Frรฉchetdierentiable at ๐‘ฅ . Let ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ง โˆˆ ๐ป (๐‘ฅ) โ‰” ๐บ (๐‘ฅ)๐น (๐‘ฅ) โ‰” โ‹ƒ

๐‘ฆโˆˆ๐น (๐‘ฅ)๐บ (๐‘ฅ)๐‘ฆ . If ๐น is N-regularat ๐‘ฅ for the unique ๐‘ฆ โˆˆ ๐น (๐‘ฅ) satisfying ๐บ (๐‘ฅ)๐‘ฆ = ๐‘ง and ๐บ is continuously dierentiable at ๐‘ฆ ,then ๐ป is N-regular at ๐‘ฅ for ๐‘ง and

๐ทโˆ—๐ป (๐‘ฅ |๐‘ง) (๐‘งโˆ—) = ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐บ (๐‘ฅ)โˆ—๐‘งโˆ—) + ([๐บโ€ฒ(๐‘ฅ) ยท ]๐‘ฆ)โˆ—๐‘งโˆ— (๐‘งโˆ— โˆˆ ๐‘ โˆ—).

Proof. We construct ๐ป from ๐‘…1 and graph(๐บ โ—ฆ ๐น ) as in Theorem 23.13. Due to the as-sumptions, ๐บ and ๐น are T-regular, and hence ๐ป is tangentially regular by Theorem 25.15and Lemma 25.9. We now obtain the claimed expression from Theorem 23.13. ๏ฟฝ

Corollary 25.21 (second derivative chain rule for Clarke subdierential). Let๐‘‹,๐‘Œ be reexiveBanach spaces, let ๐‘“ : ๐‘Œ โ†’ ๐‘… be locally Lipschitz continuous, and let ๐‘† : ๐‘‹ โ†’ ๐‘Œ be twicecontinuously dierentiable. Set โ„Ž : ๐‘‹ โ†’ ๐‘Œ , โ„Ž(๐‘ฅ) โ‰” ๐‘“ (๐‘† (๐‘ฅ)). If there exists a neighborhood๐‘ˆof ๐‘ฅ โˆˆ ๐‘‹ such that

(i) ๐‘“ is Clarke regular at ๐‘† (๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘‹ ;(ii) ๐‘†โ€ฒ(๐‘ฅ)โˆ— has a bounded left-inverse ๐‘†โ€ฒ(๐‘ฅ)โˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—) for all ๐‘ฅ โˆˆ ๐‘ˆ ;

(iii) the mapping ๐‘ฅ โ†ฆโ†’ ๐‘†โ€ฒ(๐‘ฅ)โ€ โˆ— is Frรฉchet dierentiable at ๐‘ฅ ;

329

25 calculus for the limiting coderivative

and ๐œ•๐ถ ๐‘“ is N-regular at ๐‘† (๐‘ฅ) for ๐‘ฆโˆ— โˆˆ ๐œ•๐ถ ๐‘“ (๐‘† (๐‘ฅ)), then ๐œ•๐ถโ„Ž is N-regular at ๐‘ฅ for ๐‘ฅโˆ— = ๐‘†โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ—and

๐ทโˆ— [๐œ•๐ถโ„Ž] (๐‘ฅ |๐‘ฅโˆ—) (๐‘ฅโˆ—โˆ—) = ๐‘† (๐‘ฅ)โˆ—๐‘ฅโˆ—โˆ— + ๐‘†โ€ฒ(๐‘ฅ)โˆ—๐ทโˆ— [๐œ•๐ถ ๐‘“ ] (๐‘† (๐‘ฅ) |๐‘ฆโˆ—) (๐‘†โ€ฒ(๐‘ฅ)โˆ—โˆ—๐‘ฅโˆ—โˆ—) (๐‘ฅโˆ—โˆ— โˆˆ ๐‘‹ โˆ—โˆ—).

Remark 25.22. Even in nite dimensions, calculus rules for the sum ๐น +๐บ of arbitrary set-valuedmappings ๐น,๐บ : โ„๐‘ โ‡’ โ„๐‘€ or the composition ๐น โ—ฆ ๐ป for ๐ป : โ„๐‘ โ‡’ โ„๐‘ are much more limited,and in general only yield inclusions of the form

๐ทโˆ— [๐น +๐บ] (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) โŠ‚โ‹ƒ

๐‘ฆ=๐‘ฆ1+๐‘ฆ2,๐‘ฆ1โˆˆ๐น (๐‘ฅ),๐‘ฆ2โˆˆ๐บ (๐‘ฅ)

๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ1) (๐‘ฆโˆ—) + ๐ทโˆ—๐บ (๐‘ฅ |๐‘ฆ2) (๐‘ฆโˆ—),

and๐ทโˆ— [๐น โ—ฆ ๐ป ] (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) โŠ‚

โ‹ƒ๐‘งโˆˆ๐ป (๐‘ฅ)โˆฉ๐นโˆ’1 (๐‘ฆ)

๐ทโˆ—๐ป (๐‘ฅ |๐‘ง) โ—ฆ ๐ทโˆ—๐น (๐‘ง |๐‘ฆ) (๐‘ฆโˆ—).

We refer to [Rockafellar & Wets 1998; Mordukhovich 2018] for these and other results.

330

26 SECOND-ORDER OPTIMALITY CONDITIONS

We now illustrate the use of set-valued derivatives for optimization problems by showinghow these can be used to derive second-order (sucient and necessary) optimality con-ditions for non-smooth problems. Again, we do not aim for the most general or sharpestpossible results and focus instead on problems having the form (P) involving the com-position of a nonsmooth convex functional with a smooth nonlinear operator. As in theprevious chapters, we will also assume a regularity conditions that allows for cleanerresults.

26.1 second-order derivatives

Let ๐‘‹ be a Banach space and ๐‘“ : ๐‘‹ โ†’ โ„. In this chapter, we set

๐œ•๐ถ ๐‘“ (๐‘ฅ) โ‰”{๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—

๏ฟฝ๏ฟฝ๏ฟฝ (๐‘ฅโˆ—,โˆ’1) โˆˆ ๐‘๐ถepi ๐‘“ (๐‘ฅ, ๐‘“ (๐‘ฅ))

},

where๐‘๐ถ๐ด โ‰” ๐‘‡ โ—ฆ

๐ด is the Clarke normal cone. By Lemma 20.19, this coincides with the classicalClarke subdierential if ๐‘“ : ๐‘‹ โ†’ โ„ is locally Lipschitz continuous.

As in the smooth case, second-order conditions are based on a local quadratic model builtfrom curvature information at a point. Since in the nonsmooth case, second derivatives,i.e., graphical derivatives of the subdierential, are no longer unique, we need to considerthe entire set of them when building this curvature information. We therefore need todistinguish a lower curvature model at ๐‘ฅ โˆˆ ๐‘‹ for ๐‘ฅโˆ— โˆˆ ๐œ•๐ถ ๐‘“ (๐‘ฅ) in direction ฮ”๐‘ฅ โˆˆ ๐‘‹

๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) โ‰” infฮ”๐‘ฅโˆ—โˆˆ๐ท [๐œ•๐ถ ๐‘“ ] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ)

ใ€ˆฮ”๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹

as well as an upper curvature model

๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) โ‰” supฮ”๐‘ฅโˆ—โˆˆ๐ท [๐œ•๐ถ ๐‘“ ] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ)

ใ€ˆฮ”๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ .

It turns out that even for ฮ”๐‘ฅ โ‰  0, we need to consider the stationary upper model

๐‘„๐‘“0 (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) โ‰” sup

ฮ”๐‘ฅโˆ—โˆˆ๐ท [๐œ•๐ถ ๐‘“ ] (๐‘ฅ |๐‘ฅโˆ—) (0)ใ€ˆฮ”๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ ,

331

26 second-order optimality conditions

which we use to dene the extended upper model

๏ฟฝ๏ฟฝ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) โ‰” max{๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—), ๐‘„ ๐‘“

0 (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—)}

= supฮ”๐‘ฅโˆ—โˆˆ๐ท [๐œ•๐ถ ๐‘“ ] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ)โˆช๐ท [๐œ•๐ถ ๐‘“ ] (๐‘ฅ |๐‘ฅโˆ—) (0)

ใ€ˆฮ”๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ .

For smooth functionals, these models coincide with the usual Hessian.

Theorem 26.1. Let๐‘‹ be a Banach space and let ๐‘“ : ๐‘‹ โ†’ โ„ be twice continuously dierentiable.Then for every ๐‘ฅ,ฮ”๐‘ฅ โˆˆ ๐‘‹ ,

๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘“ โ€ฒ(๐‘ฅ)) = ๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘“ โ€ฒ(๐‘ฅ)) = ใ€ˆ๐‘“ โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ,ฮ”๐‘ฅใ€‰๐‘‹and

๏ฟฝ๏ฟฝ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘“ โ€ฒ(๐‘ฅ)) = max {0, ใ€ˆ๐‘“ โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ,ฮ”๐‘ฅใ€‰๐‘‹ } .

Proof. Since ๐œ•๐ถ ๐‘“ (๐‘ฅ) = {๐‘“ โ€ฒ(๐‘ฅ)} by Theorem 13.5, it follows from Theorem 20.12 that

๐ท [๐œ•๐ถ ๐‘“ (๐‘ฅ)] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ) = ใ€ˆ๐‘“ โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ,ฮ”๐‘ฅใ€‰๐‘‹and in particular ๐ท [๐œ•๐ถ ๐‘“ (๐‘ฅ)] (๐‘ฅ |๐‘ฅโˆ—) (0) = 0, which immediately yields the claim. ๏ฟฝ

We illustrate the nonsmooth case with the usual examples of the indicator functional ofthe unit ball and the norm on โ„.

Lemma 26.2. Let ๐‘“ (๐‘ฅ) = ๐›ฟ [โˆ’1,1] (๐‘ฅ), ๐‘ฅ โˆˆ โ„. Then for every ๐‘ฅโˆ— โˆˆ ๐œ•๐‘“ (๐‘ฅ) and ฮ”๐‘ฅ โˆˆ โ„,

๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) =โˆž if |๐‘ฅ | = 1, ๐‘ฅโˆ— = 0, ๐‘ฅฮ”๐‘ฅ > 0,โˆž if |๐‘ฅ | = 1, ๐‘ฅโˆ— โˆˆ (0,โˆž)๐‘ฅ, ฮ”๐‘ฅ โ‰  0,0, otherwise,

๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) =โˆ’โˆž if |๐‘ฅ | = 1, ๐‘ฅโˆ— = 0, ๐‘ฅฮ”๐‘ฅ > 0,โˆ’โˆž if |๐‘ฅ | = 1, ๐‘ฅโˆ— โˆˆ (0,โˆž)๐‘ฅ, ฮ”๐‘ฅ โ‰  0,0, otherwise,

and

๏ฟฝ๏ฟฝ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) = ๐‘„ ๐‘“0 (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) =

โˆž if |๐‘ฅ | = 1, ๐‘ฅโˆ— โˆˆ (0,โˆž)๐‘ฅ,โˆž if |๐‘ฅ | = 1, ๐‘ฅโˆ— = 0, ๐‘ฅฮ”๐‘ฅ > 0,0 if |๐‘ฅ | = 1, ๐‘ฅโˆ— = 0, ๐‘ฅฮ”๐‘ฅ โ‰ค 0,0 if |๐‘ฅ | < 1.

Proof. The claims follow directly from the expression (20.7) in Theorem 20.17 with sup โˆ… =โˆ’โˆž and inf โˆ… = โˆž. ๏ฟฝ

332

26 second-order optimality conditions

Lemma 26.3. Let ๐‘“ (๐‘ฅ) = |๐‘ฅ |, ๐‘ฅ โˆˆ โ„. Then for every ๐‘ฅโˆ— โˆˆ ๐œ•๐‘“ (๐‘ฅ) and ฮ”๐‘ฅ โˆˆ โ„,

๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) ={โˆž if ๐‘ฅ = 0, ฮ”๐‘ฅ โ‰  0, signฮ”๐‘ฅ โ‰  ๐‘ฅโˆ—,0 otherwise,

๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) ={โˆ’โˆž if ๐‘ฅ = 0, ฮ”๐‘ฅ โ‰  0, signฮ”๐‘ฅ โ‰  ๐‘ฅโˆ—,0 otherwise,

and

๏ฟฝ๏ฟฝ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) = ๐‘„ ๐‘“0 (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) =

0 if ๐‘ฅ โ‰  0, ๐‘ฅโˆ— = sign๐‘ฅ,0 if ๐‘ฅ = 0, |๐‘ฅโˆ— | = 1, ๐‘ฅโˆ—ฮ”๐‘ฅ โ‰ฅ 0,โˆž if ๐‘ฅ = 0, |๐‘ฅโˆ— | = 1, ๐‘ฅโˆ—ฮ”๐‘ฅ < 0,โˆž if ๐‘ฅ = 0, |๐‘ฅโˆ— | < 1.

Proof. The claims follow directly from the expression (20.13) in Theorem 20.18 with sup โˆ… =โˆ’โˆž and inf โˆ… = โˆž. ๏ฟฝ

These results can be lifted to the corresponding integral functionals on ๐ฟ๐‘ (ฮฉ) using theresults of Chapter 21. Similarly, we obtain calculus rules for the curvature functionals fromthe corresponding results in Chapter 22.

Theorem 26.4 (sum rule). Let ๐‘‹ be a Banach space, let ๐‘“ : ๐‘‹ โ†’ โ„ be locally Lipschitzcontinuous, and let ๐‘” : ๐‘‹ โ†’ โ„ be twice continuously dierentiable. Set ๐‘— (๐‘ฅ) โ‰” ๐‘“ (๐‘ฅ) + ๐‘”(๐‘ฅ).Then for every ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฅโˆ— โˆˆ ๐œ•๐ถ ๐‘“ (๐‘ฅ),

๐‘„ ๐‘— (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ— + ๐‘”โ€ฒ(๐‘ฅ)) = ๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) + ใ€ˆ๐‘”โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ,ฮ”๐‘ฅใ€‰๐‘‹ (ฮ”๐‘ฅ โˆˆ ๐‘‹ ),๐‘„ ๐‘— (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ— + ๐‘”โ€ฒ(๐‘ฅ)) = ๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) + ใ€ˆ๐‘”โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ,ฮ”๐‘ฅใ€‰๐‘‹ (ฮ”๐‘ฅ โˆˆ ๐‘‹ ).

Proof. We only show the expression for the upper model, the lower model being analo-gous. First, by Theorem 13.20, we have ๐œ•๐ถ ๐‘— (๐‘ฅ) = {๐‘ฅโˆ— + ๐‘”โ€ฒ(๐‘ฅ) | ๐‘ฅโˆ— โˆˆ ๐œ•๐ถ ๐‘“ (๐‘ฅ)}. The sum ruleTheorem 22.11 for the graphical derivative together with Theorem 20.12 then yields

๐ท [๐œ•๐ถ ๐‘—] (๐‘ฅ |๐‘ฅโˆ— + ๐‘”โ€ฒ(๐‘ฅ)) (ฮ”๐‘ฅ) = ๐ท [๐œ•๐ถ ๐‘“ ] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ) + ๐‘”โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅand therefore

๐‘„ ๐‘— (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ— + ๐‘”โ€ฒ(๐‘ฅ)) = supฮ”๐‘ฅโˆ—โˆˆ๐ท [๐œ•๐ถ ๐‘—] (๐‘ฅ |๐‘ฅโˆ—+๐‘”โ€ฒ(๐‘ฅ)) (ฮ”๐‘ฅ)

ใ€ˆฮ”๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹

= supฮ”๐‘ฅโˆ—โˆˆ๐ท [๐œ•๐ถ ๐‘“ ] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ)

ใ€ˆฮ”๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ + ใ€ˆ๐‘”โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ,ฮ”๐‘ฅใ€‰๐‘‹ . ๏ฟฝ

Theorem 26.5 (chain rule). Let ๐‘‹,๐‘Œ be Banach spaces, let ๐‘“ : ๐‘Œ โ†’ โ„ be convex, and let๐‘† : ๐‘‹ โ†’ ๐‘Œ be twice continuously dierentiable. Set ๐‘— (๐‘ฅ) โ‰” ๐‘“ (๐‘† (๐‘ฅ)). If there exists aneighborhood๐‘ˆ of ๐‘ฅ โˆˆ ๐‘‹ such that

333

26 second-order optimality conditions

(i) ๐‘“ is Clarke regular at ๐‘† (๐‘ฅ) for all ๐‘ฅ โˆˆ ๐‘‹ ;(ii) ๐‘†โ€ฒ(๐‘ฅ)โˆ— has a bounded left-inverse ๐‘†โ€ฒ(๐‘ฅ)โˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—) for all ๐‘ฅ โˆˆ ๐‘ˆ ;

(iii) the mapping ๐‘ฅ โ†ฆโ†’ ๐‘†โ€ฒ(๐‘ฅ)โ€ โˆ— is Frรฉchet dierentiable at ๐‘ฅ ;then for all ๐‘ฅโˆ— โˆˆ ๐œ•๐ถโ„Ž(๐‘ฅ) = ๐‘†โ€ฒ(๐‘ฅ)โˆ—๐œ•๐ถ ๐‘“ (๐‘† (๐‘ฅ)),

๐‘„ ๐‘— (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) = ใ€ˆ๐‘ฆโˆ—, [๐‘†โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ]ฮ”๐‘ฅใ€‰๐‘Œ +๐‘„ ๐‘“ (๐‘†โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ ; ๐‘† (๐‘ฅ) |๐‘ฆโˆ—) (ฮ”๐‘ฅ โˆˆ ๐‘‹ ),๐‘„ ๐‘— (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) = ใ€ˆ๐‘ฆโˆ—, [๐‘†โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ]ฮ”๐‘ฅใ€‰๐‘Œ +๐‘„ ๐‘“ (๐‘†โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ ; ๐‘† (๐‘ฅ) |๐‘ฆโˆ—) (ฮ”๐‘ฅ โˆˆ ๐‘‹ ),

for the unique ๐‘ฆโˆ— โˆˆ ๐œ•๐ถ ๐‘“ (๐‘† (๐‘ฅ)) such that ๐‘†โ€ฒ(๐‘ฅ)โˆ—๐‘ฆโˆ— = ๐‘ฅโˆ—.

Proof. We again only consider the upper model ๐‘„ ๐‘— , the lower model being analogous. Dueto our assumptions, we can apply Corollary 24.16 to obtain

๐ท [๐œ•๐ถ (๐‘“ โ—ฆ ๐‘†)] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ) = [๐‘†โ€ฒโ€ฒ(๐‘ฅ)โˆ—ฮ”๐‘ฅ]๐‘ฆโˆ— + ๐‘†โ€ฒ(๐‘ฅ)โˆ—๐ท [๐œ•๐‘“ ] (๐‘† (๐‘ฅ) |๐‘ฆโˆ—) (๐‘†โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ),

where ๐‘†โ€ฒโ€ฒ : ๐‘‹ โ†’ [๐‘‹ โ†’ ๐•ƒ(๐‘Œ โˆ—;๐‘‹ โˆ—)]. Thus every ฮ”๐‘ฅโˆ— โˆˆ ๐ท [๐œ•๐ถ (๐‘“ โ—ฆ ๐‘†)] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ) can bewritten for some ฮ”๐‘ฆโˆ— โˆˆ ๐ท [๐œ•๐‘“ ] (๐‘† (๐‘ฅ) |๐‘ฆโˆ—) (๐‘†โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ) as ฮ”๐‘ฅโˆ— = [๐‘†โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ]โˆ—๐‘ฆโˆ— + ๐‘†โ€ฒ(๐‘ฅ)โˆ—ฮ”๐‘ฆโˆ—.Inserting this into the denition of ๐‘„ ๐‘— yields

๐‘„ ๐‘— (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) = supฮ”๐‘ฆโˆ—โˆˆ๐ท [๐œ•๐‘“ ] (๐‘† (๐‘ฅ) |๐‘ฆโˆ—) (๐‘† โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ)

ใ€ˆ[๐‘†โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ]โˆ—๐‘ฆโˆ— + ๐‘†โ€ฒ(๐‘ฅ)โˆ—ฮ”๐‘ฆโˆ—,ฮ”๐‘ฅใ€‰๐‘‹

= ใ€ˆ๐‘ฆโˆ—, [๐‘†โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ]ฮ”๐‘ฅใ€‰๐‘Œ + supฮ”๐‘ฆโˆ—โˆˆ๐ท [๐œ•๐‘“ ] (๐‘† (๐‘ฅ) |๐‘ฆโˆ—) (๐‘† โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ)

ใ€ˆฮ”๐‘ฆโˆ—, ๐‘†โ€ฒ(๐‘ฅ)ฮ”๐‘ฅใ€‰๐‘Œ . ๏ฟฝ

26.2 subconvexity

We say that ๐‘“ : ๐‘‹ โ†’ โ„ is subconvex near ๐‘ฅ for ๐‘ฅโˆ— โˆˆ ๐œ•๐ถ ๐‘“ (๐‘ฅ) if for all ๐œŒ > 0, there existsY > 0 such that

(26.1) ๐‘“ (๐‘ฅ)โˆ’ ๐‘“ (๐‘ฅ) โ‰ฅ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅโˆ’๐‘ฅใ€‰๐‘‹ โˆ’ ๐œŒ2 โ€–๐‘ฅโˆ’๐‘ฅ โ€–2๐‘‹ (๐‘ฅ, ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, Y); ๐‘ฅโˆ— โˆˆ ๐œ•๐ถ ๐‘“ (๐‘ฅ)โˆฉ๐”น(๐‘ฅโˆ—, Y)) .

We say that ๐‘“ is subconvex at ๐‘ฅ for ๐‘ฅโˆ— if this holds with ๐‘ฅ = ๐‘ฅ xed. It is clear that convexfunctions are subconvex near any point for any subderivative. By extension, scalar functionssuch as ๐‘ก โ†ฆโ†’ |๐‘ก |๐‘ž for ๐‘ž โˆˆ (0, 1) that are locally minorized by ๐‘ฅ โ†ฆโ†’ ๐‘“ (๐‘ฅ) + ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ atpoints of nonsmoothness are also subconvex.

The sum of two subconvex functions for which the subdierential sum rule holds is clearlyalso subconvex. The next result shows that smooth functions simply need to have a non-negative Hessian at the point ๐‘ฅ to be subconvex. This is in contrast to the everywherenon-negative Hessian of convex functions.

334

26 second-order optimality conditions

Lemma 26.6. Let๐‘‹ be a Banach space and let ๐‘“ : ๐‘‹ โ†’ โ„ be twice continuously dierentiable.If ใ€ˆ๐‘“ โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ,ฮ”๐‘ฅใ€‰๐‘‹ โ‰ฅ 0 for all ฮ”๐‘ฅ โˆˆ ๐‘‹ , then ๐‘“ is subconvex near ๐‘ฅ โˆˆ ๐‘‹ for ๐‘“ โ€ฒ(๐‘ฅ).

Proof. Fix ๐œŒ > 0. We apply Theorem 2.10 rst to ๐‘“ to obtain for every ๐‘ฅ, โ„Ž โˆˆ ๐‘‹ that

๐‘“ (๐‘ฅ + โ„Ž) โˆ’ ๐‘“ (๐‘ฅ) =โˆซ 1

0ใ€ˆ๐‘“ โ€ฒ(๐‘ฅ + ๐‘กโ„Ž), โ„Žใ€‰๐‘‹ ๐‘‘๐‘ก .

Similarly, the same theorem applied to ๐‘ก โ†ฆโ†’ ใ€ˆ๐‘“ โ€ฒ(๐‘ฅ + ๐‘กโ„Ž), โ„Žใ€‰ for any ๐‘ฅ, โ„Ž โˆˆ ๐‘‹ yields

ใ€ˆ๐‘“ โ€ฒ(๐‘ฅ + ๐‘กโ„Ž), โ„Žใ€‰๐‘‹ โˆ’ ใ€ˆ๐‘“ โ€ฒ(๐‘ฅ), โ„Žใ€‰๐‘‹ =โˆซ 1

0ใ€ˆ๐‘“ โ€ฒโ€ฒ(๐‘ฅ + ๐‘ ๐‘กโ„Ž)โ„Ž,โ„Žใ€‰๐‘‹ ๐‘‘๐‘ .

Combined these two expansions yield

(26.2) ๐‘“ (๐‘ฅ + โ„Ž) โˆ’ ๐‘“ (๐‘ฅ) = ใ€ˆ๐‘“ โ€ฒ(๐‘ฅ), โ„Žใ€‰๐‘‹ +โˆซ 1

0

โˆซ 1

0ใ€ˆ๐‘“ โ€ฒโ€ฒ(๐‘ฅ + ๐‘ ๐‘กโ„Ž)โ„Ž,โ„Žใ€‰๐‘‹ ๐‘‘๐‘  ๐‘‘๐‘ก .

Since ใ€ˆ๐‘“ โ€ฒโ€ฒ(๐‘ฅ)โ„Ž,โ„Žใ€‰๐‘‹ โ‰ฅ 0, we have

ใ€ˆ๐‘“ โ€ฒโ€ฒ(๐‘ฅ + ๐‘ž)โ„Ž,โ„Žใ€‰๐‘‹ โ‰ฅ ใ€ˆ[๐‘“ โ€ฒโ€ฒ(๐‘ฅ + ๐‘ž) โˆ’ ๐‘“ โ€ฒโ€ฒ(๐‘ฅ)]โ„Ž,โ„Žใ€‰๐‘‹ (๐‘ฅ, ๐‘ž, โ„Ž โˆˆ ๐‘‹ ).

Therefore, by the continuity of ๐‘“ โ€ฒโ€ฒ, for any ๐œŒ > 0 we can nd Y > 0 such that

ใ€ˆ๐‘“ โ€ฒโ€ฒ(๐‘ฅ + ๐‘ž)โ„Ž,โ„Žใ€‰๐‘‹ โ‰ฅ โˆ’๐œŒ2 โ€–โ„Žโ€–2๐‘‹ (๐‘ž โˆˆ ๐”น(0, Y), ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, Y), โ„Ž โˆˆ ๐‘‹ ).

Taking ๐‘ž = ๐‘ ๐‘กโ„Ž, this and (26.2) shows that

๐‘“ (๐‘ฅ + โ„Ž) โˆ’ ๐‘“ (๐‘ฅ) โ‰ฅ ใ€ˆ๐‘“ โ€ฒ(๐‘ฅ), โ„Žใ€‰๐‘‹ โˆ’ ๐œŒ

2 โ€–โ„Žโ€–2๐‘‹ .

The claim now follows by taking โ„Ž = ๐‘ฅ โˆ’ ๐‘ฅ . ๏ฟฝ

Remark 26.7. Subconvexity,which to our knowledge has not previously been treated in the literature,is a stronger condition than the prox-regularity introduced in [Poliquin & Rockafellar 1996]. Thelatter requires (26.1) to hold merely for a xed ๐œŒ > 0. The denition in [Rockafellar & Wets 1998] isslightly broader and implies the earlier one. Their denition is itself a modication of the primal-lower-nice functions of [Thibault & Zagrodny 1995]. Our notion of subconvexity is also related tothose of subsmooth sets and submonotone operators introduced in [Aussel, Daniilidis & Thibault2005]. An alternative concept for functions, subsmoothness and lower-๐ถ๐‘˜ , has been introduced in[Rockafellar 1981].

335

26 second-order optimality conditions

26.3 sufficient and necessary conditions

We start with sucient conditions, which are based on the upper model.

Theorem 26.8. Let ๐‘‹ be a Banach space and ๐‘“ : ๐‘‹ โ†’ โ„. If for ๐‘ฅ โˆˆ ๐‘‹ ,(i) ๐‘“ is subconvex near ๐‘ฅ for ๐‘ฅโˆ— = 0;

(ii) 0 โˆˆ ๐œ•๐ถ ๐‘“ (๐‘ฅ);(iii) there exists a ` > 0 such that

๏ฟฝ๏ฟฝ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |0) โ‰ฅ `โ€–ฮ”๐‘ฅ โ€–2๐‘‹ (ฮ”๐‘ฅ โˆˆ ๐‘‹ );

then ๐‘ฅ is a strict local minimizer of ๐‘“ .

Proof. Let ๐‘ฅโˆ— โ‰” 0 and ฮ”๐‘ฅ โˆˆ ๐‘‹ . By the assumed subconvexity, for every ๐œŒ > 0 there existsY๐œŒ > 0 such that for ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, Y๐œŒ/2) and ๐‘ฅโˆ— โˆˆ ๐œ•๐ถ ๐‘“ (๐‘ฅ) โˆฉ ๐”น(๐‘ฅโˆ—, Y๐œŒ), we have for every ๐‘ก > 0with ๐‘ก โ€–ฮ”๐‘ฅ โ€–๐‘‹ < 1

2Y๐œŒ that

๐‘“ (๐‘ฅ + ๐‘กฮ”๐‘ฅ) โˆ’ ๐‘“ (๐‘ฅ) โˆ’ ๐‘ก ใ€ˆ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹๐‘ก2

โ‰ฅ ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹๐‘ก

โˆ’ ๐œŒ

2 โ€–ฮ”๐‘ฅ โ€–2๐‘‹ .

Since ๐œŒ > 0 was arbitrary, we thus obtain for every ฮ”๐‘ฅ โˆˆ ๐‘‹ and ฮ”๐‘ฅโˆ— โˆˆ ๐ท [๐œ•๐‘“ ] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ)that

๐ด(ฮ”๐‘ฅ,ฮ”๐‘ฅ,ฮ”๐‘ฅโˆ—) โ‰” lim inf๐‘กโ†’ 0, (๐‘ฅโˆ’๐‘ฅ)/๐‘กโ†’ฮ”๐‘ฅ

(๐‘ฅโˆ—โˆ’๐‘ฅโˆ—)/๐‘กโ†’ฮ”๐‘ฅโˆ—, ๐‘ฅโˆ—โˆˆ๐œ•๐ถ ๐‘“ (๐‘ฅ)

๐‘“ (๐‘ฅ + ๐‘กฮ”๐‘ฅ) โˆ’ ๐‘“ (๐‘ฅ) โˆ’ ๐‘ก ใ€ˆ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹๐‘ก2

โ‰ฅ lim inf๐‘กโ†’ 0, (๐‘ฅโˆ’๐‘ฅ)/๐‘กโ†’ฮ”๐‘ฅ

(๐‘ฅโˆ—โˆ’๐‘ฅโˆ—)/๐‘กโ†’ฮ”๐‘ฅโˆ—, ๐‘ฅโˆ—โˆˆ๐œ•๐ถ ๐‘“ (๐‘ฅ)

ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹๐‘ก

= ใ€ˆฮ”๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ .

This implies that

supฮ”๐‘ฅโˆ—โˆˆ๐ท [๐œ•๐ถ ๐‘“ ] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ)

๐ด(ฮ”๐‘ฅ,ฮ”๐‘ฅ,ฮ”๐‘ฅโˆ—) โ‰ฅ supฮ”๐‘ฅโˆ—โˆˆ๐ท [๐œ•๐ถ ๐‘“ ] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ)

ใ€ˆฮ”๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ =: ๐ต(ฮ”๐‘ฅ,ฮ”๐‘ฅ).

Since ๐‘ฅโˆ— = 0, we can x ๐‘ฅ = ๐‘ฅ + ๐‘กฮ”๐‘ฅ and ฮ”๐‘ฅ = ฮ”๐‘ฅ in the lim inf above and use (iii) to obtain

(26.3) lim inf๐‘กโ†’ 0

๐‘“ (๐‘ฅ + 2๐‘กฮ”๐‘ฅ) โˆ’ ๐‘“ (๐‘ฅ + ๐‘กฮ”๐‘ฅ)๐‘ก2

โ‰ฅ ๐ต(ฮ”๐‘ฅ,ฮ”๐‘ฅ).

Similarly, xing ๐‘ฅ = ๐‘ฅ and ฮ”๐‘ฅ = 0 yields

(26.4) lim inf๐‘กโ†’ 0

๐‘“ (๐‘ฅ + ๐‘กฮ”๐‘ฅ) โˆ’ ๐‘“ (๐‘ฅ)๐‘ก2

โ‰ฅ ๐ต(ฮ”๐‘ฅ, 0) โ‰ฅ 0,

where the nal inequality follows from the denition of ๐ต by taking ฮ”๐‘ฅโˆ— = 0 (which ispossible since ๐‘ฅโˆ— โˆˆ ๐œ•๐ถ ๐‘“ (๐‘ฅ)). We now make a case distinction.

336

26 second-order optimality conditions

(I) ๐ต(ฮ”๐‘ฅ, 0) โ‰ฅ `โ€–ฮ”๐‘ฅ โ€–2๐‘‹ . In this case, the lim inf is strictly positive for ฮ”๐‘ฅ โ‰  0 and hence๐‘“ (๐‘ฅ + ๐‘กฮ”๐‘ฅ) > ๐‘“ (๐‘ฅ) for all ๐‘ก > 0 suciently small.

(II) ๐ต(ฮ”๐‘ฅ, 0) < `โ€–ฮ”๐‘ฅ โ€–2๐‘‹ . In this case, it follows from (iii) that

`โ€–ฮ”๐‘ฅ โ€–2๐‘‹ โ‰ค ๏ฟฝ๏ฟฝ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |0) = max{๐ต(ฮ”๐‘ฅ,ฮ”๐‘ฅ), ๐ต(ฮ”๐‘ฅ, 0)}

and hence that ๐ต(ฮ”๐‘ฅ,ฮ”๐‘ฅ) = ๏ฟฝ๏ฟฝ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |0) โ‰ฅ `โ€–ฮ”๐‘ฅ โ€–2๐‘‹ . Summing (26.3) and (26.4) thenyields

lim inf๐‘กโ†’ 0

๐‘“ (๐‘ฅ + 2๐‘กฮ”๐‘ฅ) โˆ’ ๐‘“ (๐‘ฅ)๐‘ก2

โ‰ฅ ๐ต(ฮ”๐‘ฅ,ฮ”๐‘ฅ) โ‰ฅ `โ€–ฮ”๐‘ฅ โ€–2๐‘‹ ,

which again implies for ฮ”๐‘ฅ โ‰  0 that ๐‘“ (๐‘ฅ + ๐‘กฮ”๐‘ฅ) > 0๐‘“ (๐‘ฅ) for all ๐‘ก > 0 sucientlysmall.

Since ฮ”๐‘ฅ โˆˆ ๐‘‹ was arbitrary, ๐‘ฅ is by denition a strict local minimizer of ๐‘“ . ๏ฟฝ

Remark 26.9. The use of the stationarity curvature model ๐‘„ ๐‘“0 in the second-order condition is

required since the upper curvature model may not provide any information about the growth of๐‘“ at ๐‘ฅ in certain directions. However, since ๐ท [๐œ•๐ถ ๐‘“ ] (๐‘ฅ |๐‘ฅโˆ—) (0) is a cone, if it contains any elementฮ”๐‘ฅโˆ— such that ใ€ˆฮ”๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ > 0, then ๐ต(ฮ”๐‘ฅ, 0) = ๐‘„ ๐‘“

0 (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) = โˆž, ensuring that the condition (iii)holds in the direction ฮ”๐‘ฅ for any ` > 0. For example, if ๐‘“ (๐‘ฅ) = |๐‘ฅ |, then Lemma 26.3 shows that๐‘„ ๐‘“ (ฮ”๐‘ฅ ; 0|0) = 0 for ฮ”๐‘ฅ โ‰  0, which indeed does not provide any information about the growthof ๐‘“ at 0. Conversely, ๐‘„ ๐‘“

0 (ฮ”๐‘ฅ ; 0|0) = โˆž for any ฮ”๐‘ฅ โ‰  0, so the growth is more rapid than ๐‘„ ๐‘“ canmeasure.

Combining Theorem 26.8 with Theorem 26.1, we obtain the classical sucient second-ordercondition. (Recall that in innite-dimensional spaces, positive deniteness and coercivityare no longer equivalent, and the latter, stronger, property is usually required.)

Corollary 26.10. Let ๐‘‹ be a Banach space and let ๐‘“ : ๐‘‹ โ†’ โ„ be twice continuously dieren-tiable. If for ๐‘ฅ โˆˆ ๐‘‹ ,

(i) ๐‘“ โ€ฒ(๐‘ฅ) = 0;

(ii) there exists a ` > 0 such that

ใ€ˆ๐‘“ โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ,ฮ”๐‘ฅใ€‰๐‘‹ โ‰ฅ `โ€–ฮ”๐‘ฅ โ€–2๐‘‹ (ฮ”๐‘ฅ โˆˆ ๐‘‹ );

then ๐‘ฅ is a local minimizer of ๐‘“ .

Proof. To apply Theorem 26.8, it suces to note that ๐œ•๐ถ ๐‘“ (๐‘ฅ) = {๐‘“ โ€ฒ(๐‘ฅ)} by Theorem 13.5 andthat the second-order condition ensures subconvexity of ๐‘“ at ๐‘ฅ for ๐‘ฅโˆ— = 0 by Lemma 26.6.

๏ฟฝ

337

26 second-order optimality conditions

For nonsmooth functionals, we merely illustrate the sucient second-order condition witha simple but nontrivial scalar example.

Corollary 26.11. Let ๐‘‹ = โ„ and ๐‘— โ‰” ๐‘“ + ๐‘” for ๐‘” : โ„ โ†’ โ„ twice continuously dierentiableand ๐‘“ (๐‘ฅ) = |๐‘ฅ |. Then the sucient condition of Theorem 26.8 holds at ๐‘ฅ โˆˆ โ„ if and only ifone of the following cases holds:

(a) ๐‘ฅ = 0 and |๐‘”โ€ฒ(๐‘ฅ) | < 1;

(b) ๐‘ฅ = 0, |๐‘”โ€ฒ(๐‘ฅ) | = 1, and ๐‘”โ€ฒโ€ฒ(๐‘ฅ) > 0; or

(c) ๐‘ฅ โ‰  0, ๐‘”โ€ฒ(๐‘ฅ) = โˆ’ sign๐‘ฅ , and ๐‘”โ€ฒโ€ฒ(๐‘ฅ) > 0.

Proof. We apply Theorem 26.8, for which we need to verify its conditions. First, note that(ii) is equivalent to 0 = ๐‘ฅโˆ— + ๐‘”โ€ฒ(๐‘ฅ) for some ๐‘ฅโˆ— โˆˆ ๐œ•๐‘“ (๐‘ฅ) = sign๐‘ฅ by Theorem 13.20 andExample 4.7.

We now verify the subconvexity of ๐‘— near ๐‘ฅ for ๐‘ฅโˆ— = 0. Expanding the denition (26.1), thisrequires

(26.5) |๐‘ฅ | โˆ’ |๐‘ฅ | + ๐‘”(๐‘ฅ) โˆ’ ๐‘”(๐‘ฅ) โ‰ฅ ใ€ˆ๐‘ฅโˆ— + ๐‘”โ€ฒ(๐‘ฅ), ๐‘ฅ โˆ’ ๐‘ฅใ€‰ โˆ’ ๐œŒ

2 โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2

(๐‘ฅ, ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, Y); ๐‘ฅโˆ— โˆˆ ๐œ•๐ถ | ยท | (๐‘ฅ) โˆฉ ๐”น(๐‘ฅโˆ— โˆ’ ๐‘”โ€ฒ(๐‘ฅ), Y)) .

In cases (b) and (c), we can apply Lemma 26.6 to deduce the subconvexity of ๐‘” and thereforeof ๐‘— = ๐‘“ + ๐‘” since ๐‘“ is convex. For case (a), we have ๐‘ฅ = 0 with |๐‘”โ€ฒ(๐‘ฅ) | < 1. Since ๐‘”โ€ฒ iscontinuous, we consequently have ๐‘ฅโˆ— โˆ’๐‘”โ€ฒ(๐‘ฅ) = โˆ’๐‘”โ€ฒ(๐‘ฅ) โˆˆ (โˆ’1, 1) when |๐‘ฅ โˆ’๐‘ฅ | = |๐‘ฅ | is smallenough. Since ๐œ•๐‘“ (๐‘ฅ) โˆˆ {โˆ’1, 1} for ๐‘ฅ โ‰  0, it follows that ๐œ•๐ถ | ยท | (๐‘ฅ) โˆฉ ๐”น(๐‘ฅโˆ— โˆ’ ๐‘”โ€ฒ(๐‘ฅ), Y) = โˆ… for๐‘ฅ โˆˆ ๐”น(๐‘ฅ, Y) \ {๐‘ฅ} for small enough Y > 0. Therefore, for small enough Y > 0, the condition(26.5) reduces to

(26.6) |๐‘ฅ | +๐‘”(๐‘ฅ) โˆ’๐‘”(0) โ‰ฅ ใ€ˆ๐‘ฅโˆ— +๐‘”โ€ฒ(0), ๐‘ฅใ€‰ โˆ’ ๐œŒ2 |๐‘ฅ |2 (๐‘ฅ โˆˆ [โˆ’Y, Y], |๐‘ฅโˆ— | โ‰ค 1, |๐‘ฅโˆ— +๐‘”โ€ฒ(0) | โ‰ค Y).

Furthermore, |๐‘”โ€ฒ(0) | < 1 implies that for every ๐œŒ > 0 and ๐‘ > 0, we can nd an Y > 0suciently small that

(1 โˆ’ Y โˆ’ |๐‘”โ€ฒ(0) |) |๐‘ฅ | โ‰ฅ ๐‘ โˆ’ ๐œŒ2 |๐‘ฅ |2 (๐‘ฅ โˆˆ [โˆ’Y, Y]) .

Since ๐‘” : โ„ โ†’ โ„ is twice continuously dierentiable, we can apply a Taylor expansion in๐‘ฅ = 0 to obtain for some ๐‘ > 0 and |๐‘ฅ | suciently small that

๐‘”(0) โ‰ค ๐‘”(๐‘ฅ) + ใ€ˆ๐‘”โ€ฒ(0),โˆ’๐‘ฅใ€‰ + ๐‘2 |๐‘ฅ |2.

338

26 second-order optimality conditions

Adding this to the previous inequality, we obtain for suciently small Y > 0 and ๐‘ฅโˆ— โˆˆ [โˆ’1, 1]satisfying |๐‘ฅโˆ— + ๐‘”โ€ฒ(0) | โ‰ค Y that

|๐‘ฅ | + ๐‘”(๐‘ฅ) โˆ’ ๐‘”(0) โ‰ฅ (|๐‘”โ€ฒ(0) | + Y) |๐‘ฅ | + ใ€ˆ๐‘”โ€ฒ(0), ๐‘ฅใ€‰ โˆ’ ๐œŒ

2 |๐‘ฅ |2

โ‰ฅ ใ€ˆ๐‘ฅโˆ— + ๐‘”โ€ฒ(0), ๐‘ฅใ€‰ โˆ’ ๐œŒ

2 |๐‘ฅ |2

for every |๐‘ฅ | โ‰ค Y, which is (26.6). Hence ๐‘— = ๐‘“ +๐‘” is subconvex near ๐‘ฅ = 0 for 0 = ๐‘ฅโˆ— +๐‘”โ€ฒ(0).To verify (iii), we compute the upper curvature model. Let ฮ”๐‘ฅ โˆˆ ๐‘‹ . Then by Theorems 26.1and 26.4,

๐‘„ ๐‘— (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ— + ๐‘”โ€ฒ(๐‘ฅ)) = ๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) + ใ€ˆ๐‘”โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ,ฮ”๐‘ฅใ€‰,๐‘„ ๐‘—0(ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ— + ๐‘”โ€ฒ(๐‘ฅ)) = ๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—),

where ๐‘„ ๐‘“ is given by Lemma 26.3. It follows that

๐‘„ ๐‘— (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ— + ๐‘”โ€ฒ(๐‘ฅ)) ={โˆ’โˆž if ๐‘ฅ = 0, ฮ”๐‘ฅ โ‰  0, signฮ”๐‘ฅ โ‰  ๐‘ฅโˆ—,ใ€ˆ๐‘”โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ,ฮ”๐‘ฅใ€‰ otherwise,

and

๐‘„ ๐‘—0(ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ— + ๐‘”โ€ฒ(๐‘ฅ)) =

0 if ๐‘ฅ โ‰  0, ๐‘ฅโˆ— = sign๐‘ฅ,0 if ๐‘ฅ = 0, |๐‘ฅโˆ— | = 1, ๐‘ฅโˆ—ฮ”๐‘ฅ โ‰ฅ 0,โˆž if ๐‘ฅ = 0, |๐‘ฅโˆ— | = 1, ๐‘ฅโˆ—ฮ”๐‘ฅ < 0,โˆž if ๐‘ฅ = 0, |๐‘ฅโˆ— | < 1.

Thus

๏ฟฝ๏ฟฝ ๐‘— (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ— + ๐‘”โ€ฒ(๐‘ฅ)) =

max{0, ใ€ˆ๐‘”โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ,ฮ”๐‘ฅใ€‰} if ๐‘ฅ โ‰  0, ๐‘ฅโˆ— = sign๐‘ฅ,max{0, ใ€ˆ๐‘”โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ,ฮ”๐‘ฅใ€‰} if ๐‘ฅ = 0, |๐‘ฅโˆ— | = 1, ๐‘ฅโˆ—ฮ”๐‘ฅ โ‰ฅ 0,โˆž if ๐‘ฅ = 0, |๐‘ฅโˆ— | = 1, ๐‘ฅโˆ—ฮ”๐‘ฅ < 0,โˆž if ๐‘ฅ = 0, |๐‘ฅโˆ— | < 1.

The condition (iii) is thus equivalent to

max{0, ใ€ˆ๐‘”โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ,ฮ”๐‘ฅใ€‰} โ‰ฅ `โ€–ฮ”๐‘ฅ โ€–2 when{๐‘ฅ โ‰  0 or๐‘ฅ = 0, |๐‘”โ€ฒ(๐‘ฅ) | = 1, and ๐‘”โ€ฒ(๐‘ฅ)ฮ”๐‘ฅ < 0.

The left inequality can only hold for arbitrary ฮ”๐‘ฅ โˆˆ โ„ if ` = ๐‘”โ€ฒโ€ฒ(๐‘ฅ) > 0. Hence (ii) and (iii)hold if and only if one of the cases (a)โ€“(c) holds. ๏ฟฝ

Note that case (a) corresponds to the case of strict complementarity or graphical regularityof ๐œ•๐‘“ in Theorem 20.18. Conversely, cases (b) and (c) imply that ๐‘” and therefore ๐‘— is locallyconvex, recalling from Theorem 4.2 that for convex functionals, the rst-order optimalityconditions are necessary and sucient.

Now we formulate our necessary condition, which is based on the lower curvature model.

339

26 second-order optimality conditions

Theorem 26.12. Let ๐‘‹ be a Banach space and ๐‘“ : ๐‘‹ โ†’ โ„. If ๐‘ฅ โˆˆ ๐‘‹ is a local minimizer of ๐‘“and ๐‘“ is locally Lipschitz continuous and subconvex at ๐‘ฅ for 0 โˆˆ ๐‘‹ โˆ—, then

๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |0) โ‰ฅ 0 (ฮ”๐‘ฅ โˆˆ ๐‘‹ ) .

Proof. We have from Theorem 13.4 that ๐‘ฅโˆ— โ‰” 0 โˆˆ ๐œ•๐ถ ๐‘“ (๐‘ฅ). By the assumed subconvexity, forevery ๐œŒ > 0 there exists Y > 0 such that for ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, Y/2) and ๐‘ฅโˆ—๐‘ก โˆˆ ๐œ•๐ถ ๐‘“ (๐‘ฅ + ๐‘กฮ”๐‘ฅ) โˆฉ๐”น(๐‘ฅโˆ—, Y),we have for every ๐‘ก > 0 with ๐‘ก โ€–ฮ”๐‘ฅ โ€–๐‘‹ < Y/2 that

๐‘“ (๐‘ฅ + ๐‘กฮ”๐‘ฅ) โˆ’ ๐‘“ (๐‘ฅ) โˆ’ ๐‘ก ใ€ˆ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹๐‘ก2

โ‰ค ใ€ˆ๐‘ฅโˆ—๐‘ก โˆ’ ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹๐‘ก

+ ๐œŒ2 โ€–ฮ”๐‘ฅ โ€–2๐‘‹ .

For every ฮ”๐‘ฅโˆ— โˆˆ ๐ท [๐œ•๐ถ ๐‘“ ] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ), by denition there exist ฮ”๐‘ฅ โ†’ ฮ”๐‘ฅ and, for smallenough ๐‘ก > 0, ๐‘ฅโˆ—๐‘ก โˆˆ ๐œ•๐ถ ๐‘“ (๐‘ฅ + ๐‘กฮ”๐‘ฅ) โˆฉ๐”น(๐‘ฅโˆ—, Y) such that (๐‘ฅโˆ—๐‘ก โˆ’๐‘ฅโˆ—)/๐‘ก โ†’ ฮ”๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—. Since ๐œŒ > 0was arbitrary and ๐‘ฅโˆ— = 0, it follows that

lim infฮ”๐‘ฅโ†’ฮ”๐‘ฅ๐‘กโ†’ 0

๐‘“ (๐‘ฅ + ๐‘กฮ”๐‘ฅ) โˆ’ ๐‘“ (๐‘ฅ)๐‘ก2

โ‰ค lim infฮ”๐‘ฅโ†’ฮ”๐‘ฅ๐‘กโ†’ 0

( ใ€ˆ๐‘ฅโˆ—๐‘ก โˆ’ ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹๐‘ก

+ ใ€ˆ๐‘ฅโˆ—๐‘ก โˆ’ ๐‘ฅโˆ—,ฮ”๐‘ฅ โˆ’ ฮ”๐‘ฅใ€‰๐‘‹๐‘ก

)= lim inf

๐‘กโ†’ 0

ใ€ˆ๐‘ฅโˆ—๐‘ก โˆ’ ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹๐‘ก

โ‰ค infฮ”๐‘ฅโˆ—โˆˆ๐ท [๐œ•๐ถ ๐‘“ ] (๐‘ฅ |๐‘ฅโˆ—) (ฮ”๐‘ฅ)

ใ€ˆฮ”๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹= ๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) = ๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |0).

Since ๐‘ฅ is a local minimizer, we have ๐‘“ (๐‘ฅ) โ‰ค ๐‘“ (๐‘ฅ + ๐‘กฮ”๐‘ฅ) for ๐‘ก > 0 suciently small andฮ”๐‘ฅ suciently close to ฮ”๐‘ฅ . Rearranging and passing to the limit thus yields the claimednonnegativity of ๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |0). ๏ฟฝ

Remark 26.13. Compared to the sucient condition of Theorem 26.8, the necessary condition doesnot involve a โ€œstationary lower modelโ€

๐‘„ ๐‘“ ,0(ฮ”๐‘ฅ ;๐‘ฅ |0) โ‰” infฮ”๐‘ฅโˆ—โˆˆ๐ท [๐œ•๐ถ ๐‘“ ] (๐‘ฅ |0) (0)

ใ€ˆฮ”๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ .

In fact, ๐‘„ ๐‘“ ,0(ฮ”๐‘ฅ ;๐‘ฅ |0) โ‰ฅ 0 is not a necessary optimality condition: let ๐‘“ (๐‘ฅ) = |๐‘ฅ |, ๐‘ฅ โˆˆ โ„, and ๐‘ฅ = 0.Then by Theorem 20.18, ๐ท [๐œ•๐‘“ ] (0|0) (0) = โ„ and hence ๐‘„ ๐‘“ ,0(ฮ”๐‘ฅ ; 0|0) = โˆ’โˆž for all ฮ”๐‘ฅ โ‰  0.

For smooth functions, we recover the usual second-order necessary condition from Theo-rem 26.1.

Corollary 26.14. Let ๐‘‹ be a Banach space and let ๐‘“ : ๐‘‹ โ†’ โ„ be twice continuously dieren-tiable. If ๐‘ฅ โˆˆ ๐‘‹ is a local minimizer of ๐‘“ , then

ใ€ˆ๐‘“ โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ,ฮ”๐‘ฅใ€‰๐‘‹ โ‰ฅ 0 (ฮ”๐‘ฅ โˆˆ ๐‘‹ ) .

340

26 second-order optimality conditions

We again illustrate the nonsmooth case with a scalar example.

Corollary 26.15. Let๐‘‹ = โ„ and ๐‘— โ‰” ๐‘“ +๐‘” for๐‘” : โ„ โ†’ โ„ twice continuously dierentiable and๐‘“ (๐‘ฅ) = |๐‘ฅ |. Then the necessary condition of Theorem 26.12 holds at ๐‘ฅ if and only if ๐‘”โ€ฒโ€ฒ(๐‘ฅ) โ‰ฅ 0.

Proof. We apply Theorem 26.12, for which we need to verify its conditions. Both ๐‘“ and ๐‘”are locally Lipschitz continuous by Theorem 3.13 and Lemma 2.11, respectively, and henceso is ๐‘— . We have already veried the subconvexity of ๐‘— in Corollary 26.11.

By Theorems 13.4 and 13.20 and Example 4.7, we again have 0 = ๐‘ฅโˆ— + ๐‘”โ€ฒ(๐‘ฅ) for some๐‘ฅโˆ— โˆˆ ๐œ•๐‘“ (๐‘ฅ) = sign๐‘ฅ . It remains to compute the lower curvature model. Let ฮ”๐‘ฅ โˆˆ ๐‘‹ . ByTheorems 26.1 and 26.4,

๐‘„ ๐‘— (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ— + ๐‘”โ€ฒ(๐‘ฅ)) = ๐‘„ ๐‘“ (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ—) + ใ€ˆ๐‘”โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ,ฮ”๐‘ฅใ€‰,

where ๐‘„ ๐‘“ is given by Lemma 26.3. It follows that

๐‘„ ๐‘— (ฮ”๐‘ฅ ;๐‘ฅ |๐‘ฅโˆ— + ๐‘”โ€ฒ(๐‘ฅ)) ={โˆž if ๐‘ฅ = 0, ฮ”๐‘ฅ โ‰  0, signฮ”๐‘ฅ โ‰  ๐‘ฅโˆ—,ใ€ˆ๐‘”โ€ฒโ€ฒ(๐‘ฅ)ฮ”๐‘ฅ,ฮ”๐‘ฅใ€‰ otherwise.

Hence the condition ๐‘„ ๐‘— (ฮ”๐‘ฅ ;๐‘ฅ |0) โ‰ฅ 0 for all ฮ”๐‘ฅ โˆˆ ๐‘‹ reduces to ๐‘”โ€ฒโ€ฒ(๐‘ฅ) โ‰ฅ 0. ๏ฟฝ

Remark 26.16. Second-order optimality conditions can also be based on epigraphical derivatives,which were introduced in [Rockafellar 1985; Rockafellar 1988]; we refer to [Rockafellar & Wets1998] for a detailed discussion. A related approach based on second-order directional curvaturefunctionals was used in [Christof & Wachsmuth 2018] for deriving necessary and sucient second-order optimality conditions for smooth optimization problems subject to nonsmooth and possiblynonconvex constraints.

341

27 LIPSCHITZ-LIKE PROPERTIES AND STABILITY

A related issue to second-order conditions is that of stability of the solution to optimizationproblems under perturbation. To motivate the following, let ๐‘“ : ๐‘‹ โ†’ โ„ and suppose wewish to nd ๐‘ฅ โˆˆ ๐‘‹ such that 0 โˆˆ ๐œ•๐‘“ (๐‘ฅ) for a suitable subdierential. Suppose further thatwe are given some ๐‘ฅ โˆˆ ๐‘‹ with๐‘ค โˆˆ ๐œ•๐‘“ (๐‘ฅ) with โ€–๐‘ค โ€–๐‘‹ โˆ— โ‰ค Y โ€“ say, from one of the algorithmsin Chapter 8. A natural question is then for an error estimate โ€–๐‘ฅ โˆ’๐‘ฅ โ€–๐‘‹ in terms of Y. Clearly,if ๐œ•๐‘“ has a single-valued and Lipschitz continuous inverse, this is the case since then

โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ = โ€–(๐œ•๐‘“ )โˆ’1(0) โˆ’ (๐œ•๐‘“ )โˆ’1(๐‘ค)โ€–๐‘‹ โ‰ค ๐ฟโ€–๐‘ค โ€–๐‘‹ โˆ— .

Of course, the situation is much more complicated in the set-valued case. To treat this, werst have to dene suitable notions of Lipschitz-like behavior of set-valuedmappings,whichwe then characterize using coderivatives (generalizing the characterization of the Lipschitzconstant of a dierentiable single-valued mapping through the norm of its derivative). Wereturn to the question of stability of minimizers in the more general context of perturbationsof parametrized solution mappings.

27.1 lipschitz-like properties of set-valued mappings

To set up the denition of Lipschitz-like properties for set-valued mappings, it is helpful torecall from Section 1.1 for single-valued functions the distinction between (point-based)local Lipschitz continuity at a point and (neighborhood-based) local Lipschitz continuitynear a point. (Figure 27.2b below shows a function that is locally Lipschitz at but notnear the give point.) Similarly, we will have to distinguish for set-valued mappings thecorresponding notions of Aubin property (which is point-based) and calmness (which isneighborhood-based). If these properties hold for the inverse of a mapping, we will callthe mapping itself metrically regular and metrically subregular, respectively. These fourproperties are illustrated in Figure 27.1.

Recall also from Lemma 17.4 the denition of the distance of a point ๐‘ฅ โˆˆ ๐‘‹ to a set ๐ด โŠ‚ ๐‘‹ ,which we here write for the sake of convenience as

dist(๐ด, ๐‘ฅ) โ‰” dist(๐‘ฅ,๐ด) โ‰” ๐‘‘๐ด (๐‘ฅ) = inf๐‘ฅโˆˆ๐ด

โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ .

342

27 lipschitz-like properties and stability

(๐‘ฅ, ๐‘“ (๐‘ฅ))(a) locally Lipschitz ๐‘“

(๐‘ฆ, ๐‘“ โˆ’1(๐‘ฆ))

(b) locally Lipschitz ๐‘“ โˆ’1

(๐‘ฅ, ๐‘ฆ)

(๐‘ฅ, ๐‘ฆ)

(c) Aubin property of ๐น

(๐‘ฅ, ๐‘ฆ)

(๐‘ฅ, ๐‘ฆ)

(d) metric regularity of ๐น

(๐‘ฅ, ๐‘ฆ)

(๐‘ฅ, ๐‘ฆ)

(e) calmness of ๐น

(๐‘ฅ, ๐‘ฆ)

(๐‘ฅ, ๐‘ฆ)

(f) metric subregularity of ๐น

Figure 27.1: Illustration of Lipschitz-like properties using cones. The thick lines are thegraph of the function; if this graph is locally contained in a lled cone, the prop-erty holds, while a cross-hatched cone indicates that the property is violated.

We then say that ๐น : ๐‘‹ โ‡’ ๐‘Œ has the Aubin or pseudo-Lipschitz property at ๐‘ฅ for ๐‘ฆ if graph ๐นis closed near (๐‘ฅ, ๐‘ฆ) and there exist ๐›ฟ, ^ > 0 such that

(27.1) dist(๐‘ฆ, ๐น (๐‘ฅ)) โ‰ค ^ dist(๐นโˆ’1(๐‘ฆ), ๐‘ฅ) (๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ), ๐‘ฆ โˆˆ ๐”น(๐‘ฆ, ๐›ฟ)) .We call the inmum of all ^ > 0 for which (27.1) holds for some ๐›ฟ > 0 the graphical modulusof ๐น at ๐‘ฅ for ๐‘ฆ , written lip ๐น (๐‘ฅ |๐‘ฆ).When we are interested in the stability of the optimality condition 0 โˆˆ ๐น (๐‘ฅ), it is typicallymore benecial to study the Aubin property of the inverse ๐นโˆ’1. This is called the metricregularity of ๐น at a point (๐‘ฅ, ๐‘ฆ) โˆˆ graph ๐น , which holds if there exist ^, ๐›ฟ > 0 such that

(27.2) dist(๐‘ฅ, ๐นโˆ’1(๐‘ฆ)) โ‰ค ^ dist(๐‘ฆ, ๐น (๐‘ฅ)) (๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ), ๐‘ฆ โˆˆ ๐”น(๐‘ฆ, ๐›ฟ)) .We call the inmum of all ^ > 0 for which (27.2) holds for some ๐›ฟ > 0 themodulus of metricregularity of ๐น at ๐‘ฅ for ๐‘ฆ , written reg ๐น (๐‘ฅ |๐‘ฆ).

343

27 lipschitz-like properties and stability

The metric regularity and Aubin property are too strong to be satised in many applications.A weaker notion is provided by (metric) subregularity at (๐‘ฅ, ๐‘ฆ) โˆˆ graph ๐น , which holds ifthere exist ^, ๐›ฟ > 0 such that

(27.3) dist(๐‘ฅ, ๐นโˆ’1(๐‘ฆ)) โ‰ค ^ dist(๐‘ฆ, ๐น (๐‘ฅ)) (๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ)) .

Compared to metric regularity, this allows much more leeway for ๐น by xing ๐‘ฆ = ๐‘ฆ โˆˆ ๐น (๐‘ฅ)(while still allowing ๐‘ฅ to vary). We call the inmum of all ^ > 0 for which (27.3) holds forsome ๐›ฟ > 0 for themodulus of (metric) subregularity of ๐น at ๐‘ฅ for ๐‘ฆ , written subreg ๐น (๐‘ฅ |๐‘ฆ).The counterpart of metric subregularity that relaxes the Aubin property is known ascalmness. We say that ๐น : ๐‘‹ โ‡’ ๐‘Œ is calm at ๐‘ฅ for ๐‘ฆ if there exist ^, ๐›ฟ > 0 such that

(27.4) dist(๐‘ฆ, ๐น (๐‘ฅ)) โ‰ค ^ dist(๐‘ฅ, ๐นโˆ’1(๐‘ฆ)) (๐‘ฆ โˆˆ ๐”น(๐‘ฆ, ๐›ฟ)) .

We call the inmum of all ^ > 0 for which (27.4) holds for some ๐›ฟ > 0 the modulus ofcalmness of ๐น at ๐‘ฅ for ๐‘ฆ , written calm ๐น (๐‘ฅ |๐‘ฆ). Clearly the Aubin property implies calmness,while metric regularity implies metric subregularity.

Unfortunately, the direct calculation of the dierent moduli is often infeasible in practice.Much of the rest of this chapter concentrates on calculating the graphical modulus and themodulus of metric regularity in special cases. We will consider metric subregularity (as wellas a related, weaker, notion of strong submonotonicity) in the following Section 28.1.

Remark 27.1. The Aubin property is due to [Aubin 1984], whereas metric subregularity is dueto [Ioe 1979], rst given the modern name in [Dontchev & Rockafellar 2004]. Calmness wasintroduced in [Robinson 1981] as the upper Lipschitz property. Metric regularity is equivalent toopenness at a linear rate near (๐‘ข,๐‘ค) and holds for smooth maps by the classical Lyusternikโ€“Gravestheorem. We refer in particular to [Dontchev & Rockafellar 2014; Ioe 2017] for further informationon these and other related properties.

In particular, related to metric subregularity is the stronger concept of strong metric subregularity,which was introduced in [Rockafellar 1989] and requires the existence of ^, ๐›ฟ > 0 such that

โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค ^ dist(๐‘ฆ, ๐น (๐‘ฅ)) (๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ)),

i.e., a bound on the norm distance to ๐‘ฅ rather than the closest preimage of ๐‘ฆ . Its many propertiesare studied in [Cibulka, Dontchev & Kruger 2018], which also introduced ๐‘ž-exponent versions. Par-ticularly worth noting is that strong metric subregularity is invariant with respect to perturbationsby smooth functions, while metric subregularity is not.

Weaker and โ€œpartialโ€ concepts of regularity have also been considered in the literature. Of particularnote is the directional metric subregularity of [Gfrerer 2013]. The idea here is to study necessaryoptimality conditions by requiring metric regularity or subregularity only along critical directionsinstead of all directions. In [Valkonen 2021], by contrast, the norms in the denition of subregularityare made operator-relative to study the partial subregularity on subspaces; compare the testing ofalgorithms for structured problems in Section 10.2.

344

27 lipschitz-like properties and stability

๐‘ฅ

(a) oscillating single-valued function

๐น (๐‘ฅ) + ๐”น(0, โ„“ โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ )๐น (๐‘ฅ)

๐‘ฅ ๐‘ฅ

(b) graph of ๐‘ฅ โ†ฆโ†’ ๐น (๐‘ฅ) + ๐”น(0, โ„“ โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ )

Figure 27.2: The oscillating example in (a) illustrates a function ๐‘“ that is locally Lipschitz(or calm) at ๐‘ฅ , but not locally Lipschitz (or does not have the Aubin property)near the same point: the graph of the function stays in the cone formed bythe thick lines and based at (๐‘ฅ, ๐‘“ (๐‘ฅ)) โˆˆ graph ๐‘“ . If, however, we move thecone locally along the graph, even increasing its width, the graph will not becontained the cone. In (b) we illustrate the โ€œfat coneโ€ structure graph(๐‘ฅ โ†ฆโ†’๐น (๐‘ฅ) + ๐”น(0, โ„“ โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ ) appearing on the right-hand-side in Theorem 27.2 (i),and varying with the second base point ๐‘ฅ around ๐‘ฅ . This is to be contrastedwith the leaner cone graph(๐‘ฅ โ†ฆโ†’ ๐‘“ (๐‘ฅ)+๐”น(0, โ„“ โ€–๐‘ฅโˆ’๐‘ฅ โ€–๐‘‹ )) bounding the functionin (a).

We now provide alternative characterizations of the Aubin property and of calmness. Theseextend to metric regularity and subregularity, respectively, by application to the inverse.

The right-hand-side of the set-inclusion characterization (i) in the next theorem forms aโ€œfat coneโ€ that we illustrate in Figure 27.2b. It should locally at each base point ๐‘ฅ around ๐‘ฅbound ๐น for the Aubin property to be satised. Based on the formulation (i), we illustratein Figure 27.3 the satisfaction and dissatisfaction of the Aubin property. The other two newcharacterizations show that we do not need to restrict ๐‘ฅ to a tiny neighborhood of ๐‘ฅ inneither (i) nor the original characterization (27.1).

Theorem 27.2. Let ๐‘‹,๐‘Œ be Banach spaces and ๐น : ๐‘‹ โ‡’ ๐‘Œ . Then the following are equivalentfor ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฆ โˆˆ ๐น (๐‘ฅ):

(i) There exists ^, ๐›ฟ > 0 such that

๐น (๐‘ฅ) โˆฉ ๐”น(๐‘ฆ, ๐›ฟ) โŠ‚ ๐น (๐‘ฅ) + ๐”น(0, ^โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ ) (๐‘ฅ, ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ)) .

(ii) There exists ^, ๐›ฟ > 0 such that

๐น (๐‘ฅ) โˆฉ ๐”น(๐‘ฆ, ๐›ฟ) โŠ‚ ๐น (๐‘ฅ) + ๐”น(0, ^โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ ) (๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ); ๐‘ฅ โˆˆ ๐‘‹ ).

(iii) The Aubin property (27.1).

(iv) There exists ^, ๐›ฟ > 0 such that

dist(๐‘ฆ, ๐น (๐‘ฅ)) โ‰ค ^ dist(๐นโˆ’1(๐‘ฆ) โˆฉ ๐”น(๐‘ฅ, ๐›ฟ), ๐‘ฅ) (๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ), ๐‘ฆ โˆˆ ๐”น(๐‘ฆ, ๐›ฟ)) .

345

27 lipschitz-like properties and stability

The inmum of ^ > 0 for which each of these characterizations holds is equal to the graphicalmodulus lip ๐น (๐‘ฅ |๐‘ฆ). (The radius of validity ๐›ฟ > 0 for any given ^ > 0 may be distinct in eachof the characterizations, however.)

Proof. (i) โ‡” (ii): Clearly (ii) implies (i) with the same ^, ๐›ฟ > 0. To show the implication inthe other direction, we start by applying (i) with ๐‘ฅ = ๐‘ฅ , which yields

๐น (๐‘ฅ) โˆฉ ๐”น(๐‘ฆ, ๐›ฟ) โŠ‚ ๐น (๐‘ฅ) + ๐”น(0, ^โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ ) (๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ)) .

Taking ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟโ€ฒ) for some ๐›ฟโ€ฒ โˆˆ (0, ๐›ฟ], we thus deduce that

๐‘ฆ โˆˆ ๐น (๐‘ฅ) + ๐”น(0, ^โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ ) โŠ‚ ๐น (๐‘ฅ) + ๐”น(0, ^๐›ฟโ€ฒ).

In particular, for any Yโ€ฒ > 0, we have

(27.5) ๐”น(๐‘ฆ, Yโ€ฒ) โŠ‚ ๐น (๐‘ฅ) + ๐”น(0, ^๐›ฟโ€ฒ + Yโ€ฒ).

For ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ), (ii) is immediate from (i), so we may concentrate on ๐‘ฅ โˆˆ ๐‘‹ \ ๐”น(๐‘ฅ, ๐›ฟ). Then

โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ฅ โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ โˆ’ โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ฅ ๐›ฟ โˆ’ ๐›ฟโ€ฒ.

If we pick Yโ€ฒ, ๐›ฟโ€ฒ > 0 such that ^๐›ฟโ€ฒ + Yโ€ฒ โ‰ค ^ (๐›ฟ โˆ’ ๐›ฟโ€ฒ), it follows

^๐›ฟโ€ฒ + Yโ€ฒ โ‰ค ^โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ .

Thus (27.5) gives, as illustrated in Figure 27.4,

๐น (๐‘ฅ) โˆฉ ๐”น(๐‘ฆ, Yโ€ฒ) โŠ‚ ๐”น(๐‘ฆ, Yโ€ฒ) โŠ‚ ๐น (๐‘ฅ) + ๐”น(0, ^๐›ฟโ€ฒ + Yโ€ฒ) โŠ‚ ๐น (๐‘ฅ) + ^๐”น(0, โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ ),

which is (ii).

(ii)โ‡” (iii): We expand (ii) as

{๏ฟฝ๏ฟฝ} โˆฉ ๐”น(๐‘ฆ, ๐›ฟ) โŠ‚ ๐น (๐‘ฅ) + ๐”น(0, ^โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ ) (๏ฟฝ๏ฟฝ โˆˆ ๐น (๐‘ฅ);๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ); ๐‘ฅ โˆˆ ๐‘‹ ).

By rearranging and taking the inmum over all ๐‘ฆ โˆˆ ๐น (๐‘ฅ), this yields

inf๐‘ฆโˆˆ๐น (๐‘ฅ)

โ€–๏ฟฝ๏ฟฝ โˆ’ ๐‘ฆ โ€–๐‘Œ โ‰ค ^โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ (๏ฟฝ๏ฟฝ โˆˆ ๐น (๐‘ฅ) โˆฉ ๐”น(๐‘ฆ, ๐›ฟ); ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ); ๐‘ฅ โˆˆ ๐‘‹ ).

This may further be rewritten as

inf๐‘ฆโˆˆ๐น (๐‘ฅ)

โ€–๏ฟฝ๏ฟฝ โˆ’ ๐‘ฆ โ€–๐‘Œ โ‰ค inf๐‘ฅโˆˆ๐นโˆ’1 (๏ฟฝ๏ฟฝ)

^โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ (๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ); ๏ฟฝ๏ฟฝ โˆฉ ๐”น(๐‘ฆ, ๐›ฟ)) .

Thus (iii) is equivalent to (i).

(iii)โ‡’ (iv): This is immediate from the denition of dist, which yields

dist(๐นโˆ’1(๐‘ฆ), ๐‘ฅ) โ‰ค dist(๐นโˆ’1(๐‘ฆ) โˆฉ ๐”น(๐‘ฅ, ๐›ฟ), ๐‘ฅ).

346

27 lipschitz-like properties and stability

๐น๐‘ฅ๐‘ฅ

๐”น(๐‘ฆ, ๐œŒ)

๐”น(๐‘ฅ, ๐›ฟ)(a) property is satised

๐น

๐‘ฅ ๐‘ฅ

๐”น(๐‘ฆ, ๐œŒ)

๐”น(๐‘ฅ, ๐›ฟ)(b) property is not satised

Figure 27.3: Illustration of satisfaction and dissatisfaction of the Aubin property for ๐‘ฅ = ๐‘ฅ .The dashed lines indicate ๐”น(๐‘ฆ, ๐œŒ), and the dot marks (๐‘ฅ, ๐‘ฆ), while the dark graythick lines indicate ๐น (๐‘ฅ) โˆฉ ๐”น(๐‘ฆ, ๐œŒ). It should remain within the bounds of theblack thick lines indicating โ€œfatโ€ cone ๐น (๐‘ฅ) +๐”น(0, ^โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ ). The violation ofthe bounds at the bottom in (a) does not matter, because we are only interestedin the area between the dashed lines.

(iv)โ‡’ (i): We express (iv) as

inf๏ฟฝ๏ฟฝโˆˆ๐น (๐‘ฅ)

โ€–๐‘ฆ โˆ’ ๏ฟฝ๏ฟฝ โ€–๐‘Œ โ‰ค ^โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ (๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ), ๐‘ฆ โˆˆ ๐”น(๐‘ฆ, ๐›ฟ), ๐‘ฅ โˆˆ ๐นโˆ’1(๐‘ฆ) โˆฉ ๐”น(๐‘ฅ, ๐›ฟ)) .

This can be rearranged to imply that

{๐‘ฆ} โŠ‚ ๐น (๐‘ฅ) + ๐”น(0, ^โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ ) (๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ), ๐‘ฆ โˆˆ ๐”น(๐‘ฆ, ๐›ฟ) โˆฉ ๐น (๐‘ฅ), ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ)),

which can be further rewritten as

๐น (๐‘ฅ) โˆฉ ๐”น(๐‘ฆ, ๐›ฟ) โŠ‚ ๐น (๐‘ฅ) + ๐”น(0, ^โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ ) (๐‘ฅ, ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ)),

yielding (i). ๏ฟฝ

We have similar characterizations of calmness. The proof is analogous, simply xing๐‘ฅ = ๐‘ฅ .

Corollary 27.3. Let ๐‘‹,๐‘Œ be Banach spaces and ๐น : ๐‘‹ โ‡’ ๐‘Œ . Then the following are equivalentfor ๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฆ โˆˆ ๐น (๐‘ฅ):

(i) There exists ^, ๐›ฟ > 0 such that

๐น (๐‘ฅ) โˆฉ ๐”น(๐‘ฆ, ๐›ฟ) โŠ‚ ๐น (๐‘ฅ) + ๐”น(0, ^โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ ) (๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ)) .

(ii) There exists ^, ๐›ฟ > 0 such that

๐น (๐‘ฅ) โˆฉ ๐”น(๐‘ฆ, ๐›ฟ) โŠ‚ ๐น (๐‘ฅ) + ๐”น(0, ^โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ ) (๐‘ฅ โˆˆ ๐‘‹ ).

347

27 lipschitz-like properties and stability

(๐‘ฅ, ๐‘ฆ)

๐”น(๐‘ฅ, ๐›ฟ)๐”น(๐‘ฅ, ๐›ฟ โ€ฒ)

๐‘ฅ๐น (๐‘ฅ) + ^๐”น(0, โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ )

{๐‘ฅ} ร— ๐”น(๐‘ฆ, Yโ€ฒ)

(a) illustration of technique (b) critical areas

Figure 27.4: (a) Illustration of the technique in Theorem 27.2 to prove the equivalence ofthe two set inclusion formulations of the Aubin property. For ๐‘ฅ outside theball ๐”น(๐‘ฅ, ๐›ฟ), the set ๐”น(๐‘ฆ, Yโ€ฒ) indicated by the thick dark gray line, is completelycontained in the fat-cone structure ๐น (๐‘ฅ) + ^๐”น(0, โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ ) of Figure 27.2b,indicated by the thick black and dotted lines. Closer to ๐‘ฅ , within ๐”น(๐‘ฅ, ๐›ฟ), thisis not the case, although ๐น (๐‘ฅ) โˆฉ๐”น(๐‘ฆ, Yโ€ฒ) itself is still contained in the structure.(b) highlights in darker color the areas that are critical for the Aubin propertyto hold.

(iii) Calmness (27.4).

(iv) There exists ^, ๐›ฟ > 0 such that

dist(๐‘ฆ, ๐น (๐‘ฅ)) โ‰ค ^ dist(๐นโˆ’1(๐‘ฆ) โˆฉ ๐”น(๐‘ฅ, ๐›ฟ), ๐‘ฅ) (๐‘ฆ โˆˆ ๐”น(๐‘ฆ, ๐›ฟ)) .

The inmum of ^ > 0 for which each of these characterizations holds is equal to the modulusof calmness calm ๐น (๐‘ฅ |๐‘ฆ). (The radius of validity ๐›ฟ > 0 for any given ^ > 0 may be distinct ineach of the characterizations, however.)

27.2 neighborhood-based coderivative criteria

Our goal is now to relate the Aubin property to โ€œouter normsโ€ of limiting coderivatives,just as the Lipschitz property of dierentiable single-valued functions can be related tonorms of their derivatives. Before embarking on this in the next section, as a preparatorystep we relate in this section the Aubin property to neighborhood-based criteria on Frรฉchetcoderivatives. To this end, we dene for a set-valued mapping ๐น : ๐‘‹ โ‡’ ๐‘Œ , (๐‘ฅ, ๐‘ฆ) โˆˆ graph ๐น ,and ๐›ฟ, Y > 0

(27.6) ^Y๐›ฟ (๐‘ฅ |๐‘ฆ) โ‰” sup{โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ—

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ ๐‘ฅโˆ— โˆˆ ๐ทโˆ—Y ๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—), โ€–๐‘ฆโˆ—โ€–๐‘Œ โˆ— โ‰ค 1,

๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ), ๐‘ฆ โˆˆ ๐น (๐‘ฅ) โˆฉ ๐”น(๐‘ฆ, ๐›ฟ)},

348

27 lipschitz-like properties and stability

which measures locally the opening of the cones ๐‘ Ygraph ๐น (๐‘ฅ |๐‘ฆ) around (๐‘ฅ, ๐‘ฆ); for smooth

functions and Y = 0, it coincides with the local supremum of โ€–๐ท๐น (๐‘ฅ)โ€–๐•ƒ(๐‘‹ ;๐‘Œ ) around (๐‘ฅ, ๐น (๐‘ฅ))(cf. Theorem 20.12). The next lemma bounds these openings in terms of the graphicalmodulus.

Lemma 27.4. Let ๐‘‹,๐‘Œ be Banach spaces and ๐น : ๐‘‹ โ‡’ ๐‘Œ . If graph ๐น is closed near (๐‘ฅ, ๐‘ฆ), then

inf๐›ฟ>0

^๐›ฟ๐›ฟ (๐‘ฅ |๐‘ฆ) โ‰ค inf๐›ฟ>0

^0๐›ฟ (๐‘ฅ |๐‘ฆ) โ‰ค lip ๐น (๐‘ฅ |๐‘ฆ).

Proof. Since ๐ทโˆ—Y ๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) โŠ‚ ๐ทโˆ—(๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—), we always have ^๐›ฟ

๐›ฟ(๐‘ฅ |๐‘ฆ) โ‰ค ^0

๐›ฟ(๐‘ฅ |๐‘ฆ). It hence

suces to prove for any choice of Y (๐›ฟ) โˆˆ [0, ๐›ฟ] that

^ โ‰” inf๐›ฟ>0

^๐›ฟY (๐›ฟ) (๐‘ฅ |๐‘ฆ) โ‰ค lip ๐น (๐‘ฅ |๐‘ฆ) .

We may assume that lip ๐น (๐‘ฅ |๐‘ฆ) < โˆž, since otherwise there is nothing to prove. Thisimplies in particular that the Aubin property holds, so the denition (27.1) yields for any^โ€ฒ > lip ๐น (๐‘ฅ |๐‘ฆ) a ๐›ฟโ€ฒ > 0 such that

(27.7) inf๏ฟฝ๏ฟฝโˆˆ๐น (๐‘ฅ)

โ€–๏ฟฝ๏ฟฝ โˆ’ ๐‘ฆ โ€–๐‘Œ โ‰ค ^โ€ฒโ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ , (๐‘ฆ โˆˆ ๐น (๐‘ฅ) โˆฉ ๐”น(๐‘ฆ, ๐›ฟโ€ฒ), ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟโ€ฒ)) .

Pick หœ โˆˆ (0, ^) and ๐›ฟ โˆˆ (0, ๐›ฟโ€ฒ). By the denition of ^Y (๐›ฟ)๐›ฟ

(๐‘ฅ |๐‘ฆ), there exist ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ),๐‘ฆ โˆˆ ๐น (๐‘ฅ) โˆฉ ๐”น(๐‘ฆ, ๐›ฟ), and (๐‘ฅโˆ—,โˆ’๐‘ฆโˆ—) โˆˆ ๐‘ Y (๐›ฟ)

graph ๐น (๐‘ฅ, ๐‘ฆ) such that โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ‰ฅ หœ and โ€–๐‘ฆโˆ—โ€–๐‘Œ โˆ— โ‰ค 1.Theorem 1.4 then yields a ฮ”๐‘ฅ โˆˆ ๐‘‹ such that

(27.8) ใ€ˆ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ = โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— and โ€–ฮ”๐‘ฅ โ€–๐‘‹ = 1.

Let ๐œ๐‘˜โ†’ 0 with ๐œ๐‘˜ โ‰ค ๐›ฟ and set ๐‘ฅ๐‘˜ โ‰” ๐‘ฅ + ๐œ๐‘˜ฮ”๐‘ฅ . Then taking ๐‘ฅ = ๐‘ฅ๐‘˜ in (27.7), we can take๐‘ฆ๐‘˜ โˆˆ ๐น (๐‘ฅ๐‘˜) such that

(27.9) lim inf๐‘˜โ†’โˆž

๐œโˆ’1๐‘˜ โ€–๐‘ฆ๐‘˜ โˆ’ ๐‘ฆ โ€–๐‘Œ โ‰ค ^โ€ฒโ€–ฮ”๐‘ฅ โ€–๐‘‹ .

In particular, after passing to a subsequence if necessary, we may assume that ๐‘ฆ๐‘˜ โ†’ ๐‘ฆstrongly in ๐‘Œ . Using (27.8), โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ‰ฅ หœ , and โ€–๐‘ฆโˆ—โ€–๐‘Œ โˆ— โ‰ค 1, this leads to

(27.10) lim sup๐‘˜โ†’โˆž

๐œโˆ’1๐‘˜ (ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ใ€ˆ๐‘ฆโˆ—, ๐‘ฆ๐‘˜ โˆ’ ๐‘ฆใ€‰๐‘Œ )

= lim sup๐‘˜โ†’โˆž

(ใ€ˆ๐‘ฅโˆ—,ฮ”๐‘ฅใ€‰๐‘‹ โˆ’ ๐œโˆ’1๐‘˜ ใ€ˆ๐‘ฆโˆ—, ๐‘ฆ๐‘˜ โˆ’ ๐‘ฆใ€‰๐‘Œ)

โ‰ฅ (โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โˆ’ ^โ€ฒ)โ€–ฮ”๐‘ฅ โ€–๐‘‹ โ‰ฅ หœ โˆ’ ^โ€ฒ.

By (27.9) (for the chosen subsequence) and the construction of ๐‘ฅ๐‘˜ , we have

(27.11) lim sup๐‘˜โ†’โˆž

๐œโˆ’1๐‘˜ โ€–(๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) โˆ’ (๐‘ฅ, ๐‘ฆ)โ€–๐‘‹ร—๐‘Œ โ‰ค (1 + ^โ€ฒ)โ€–ฮ”๐‘ฅ โ€–๐‘‹ = 1 + ^โ€ฒ.

349

27 lipschitz-like properties and stability

Since (๐‘ฅโˆ—,โˆ’๐‘ฆโˆ—) โˆˆ ๐‘ Y (๐›ฟ)graph ๐น (๐‘ฅ, ๐‘ฆ), the dening equation (18.7) of ๐‘ Y (๐›ฟ)

graph ๐น (๐‘ฅ, ๐‘ฆ) we have

(27.12) lim sup๐‘˜โ†’โˆž

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ๐‘˜ โˆ’ ๐‘ฅใ€‰๐‘‹ โˆ’ ใ€ˆ๐‘ฆโˆ—, ๐‘ฆ๐‘˜ โˆ’ ๐‘ฆใ€‰๐‘Œโ€–(๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) โˆ’ (๐‘ฅ, ๐‘ฆ)โ€–๐‘‹ร—๐‘Œ โ‰ค Y (๐›ฟ).

Therefore, (27.10), (27.11), and (27.12) together yield

(1 + ^โ€ฒ)Y (๐›ฟ) โ‰ฅ หœ โˆ’ ^โ€ฒ.Taking the inmum over ๐›ฟ > 0, it follows that หœ โ‰ฅ ^โ€ฒ. Since ^โ€ฒ > lip ๐น (๐‘ฅ |๐‘ฆ) and หœ < ^were arbitrary, we obtain ^ โ‰ค lip ๐น (๐‘ฅ |๐‘ฆ) as desired. ๏ฟฝ

For the next theorem, recall the denition of Gรขteaux smooth spaces from Section 17.2.

Theorem 27.5. Let ๐‘‹,๐‘Œ be Gรขteaux smooth Banach spaces and let ๐น : ๐‘‹ โ‡’ ๐‘Œ be such thatgraph ๐น is closed near (๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘‹ ร—๐‘Œ . Then ๐น has the Aubin property at ๐‘ฅ for ๐‘ฆ if and only if^๐›ฟ๐›ฟ(๐‘ฅ |๐‘ฆ) < โˆž or ^0

๐›ฟ(๐‘ฅ |๐‘ฆ) < โˆž for some ๐›ฟ > 0. Furthermore, in this case

(27.13) inf๐›ฟ>0

^๐›ฟ๐›ฟ (๐‘ฅ |๐‘ฆ) = inf๐›ฟ>0

^0๐›ฟ (๐‘ฅ |๐‘ฆ) = lip ๐น (๐‘ฅ |๐‘ฆ).

Proof. By Lemma 27.4, it suces to show that

^ โ‰” inf๐›ฟ>0

^๐›ฟ๐›ฟ (๐‘ฅ |๐‘ฆ) โ‰ฅ lip ๐น (๐‘ฅ |๐‘ฆ).

We may assume that lip ๐น (๐‘ฅ |๐‘ฆ) > 0 as otherwise there is nothing to show. Our plan is nowto take arbitrary 0 < หœ < lip ๐น (๐‘ฅ |๐‘ฆ) and show that ^ โ‰ฅ หœ . This implies ^ โ‰ฅ lip ๐น (๐‘ฅ |๐‘ฆ) asdesired.

To show that ^ โ‰ฅ หœ , it suces to show that ^๐›ฟ๐›ฟ(๐‘ฅ |๐‘ฆ) โ‰ฅ หœ for all ๐›ฟ > 0. To do this, for

a parameter ๐‘กโ†’ 0, we take Y๐‘กโ†’ 0 and (๐‘ฅ๐‘ก , ๐‘ฆ๐‘ก ) โ†’ (๐‘ฅ, ๐‘ฆ) as ๐‘กโ†’ 0 as well as Y๐‘ก -normals(๐‘ฅโˆ—๐‘ก ,โˆ’๐‘ฆโˆ—๐‘ก ) โˆˆ ๐‘ Y๐‘ก

graph ๐น (๐‘ฅ๐‘ก , ๐‘ฆ๐‘ก ) that satisfy lim inf๐‘กโ†’0 โ€–๐‘ฅโˆ—๐‘ก โ€–๐‘‹ โˆ— โ‰ฅ หœ and lim sup๐‘กโ†’0 โ€–๐‘ฆโˆ—๐‘ก โ€–๐‘Œ โˆ— โ‰ค 1.By the denition of ^๐›ฟ

๐›ฟ(๐‘ฅ |๐‘ฆ) in (27.6), taking for each ๐›ฟ > 0 the index ๐‘ก > 0 such that

max{Y๐‘ก , โ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ โ€–๐‘‹ , โ€–๐‘ฆ๐‘ก โˆ’ ๐‘ฆ โ€–๐‘Œ } โ‰ค ๐›ฟ , this shows as claimed that ^๐›ฟ๐›ฟ(๐‘ฅ |๐‘ฆ) โ‰ฅ หœ . The rough idea

is to construct the Y๐‘ก -normals by projecting points not in graph ๐น back onto this set. Thereare, however, some technical diculties along our way. We divide the construction intothree steps.

Step 1: setting up the projection problem. Let 0 < หœ < lip ๐น (๐‘ฅ |๐‘ฆ). Since then the Aubinproperty does not hold for หœ , by the characterization of Theorem 27.2 (iv) there exist

(27.14) ๏ฟฝ๏ฟฝ๐‘ก โˆˆ ๐น (๐‘ฅ๐‘ก ) โˆฉ ๐”น(๐‘ฆ, ๐‘ก) and ๐‘ฅ๐‘ก , ๐‘ฅ๐‘ก โˆˆ ๐”น(๐‘ฅ, ๐‘ก) for all ๐‘ก > 0

such that

(27.15) inf๐‘ฆ๐‘กโˆˆ๐น (๐‘ฅ๐‘ก )

โ€–๐‘ฆ๐‘ก โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ > หœ โ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ .

350

27 lipschitz-like properties and stability

Since inf๐‘ฆ๐‘กโˆˆ๐น (๐‘ฅ๐‘ก ) โ€–๐‘ฆ๐‘ก โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ = 0, this implies that ๐‘ฅ๐‘ก โ‰  ๐‘ฅ๐‘ก and (๐‘ฅ๐‘ก , ๏ฟฝ๏ฟฝ๐‘ก ) โˆ‰ graph ๐น . We want tolocally project (๐‘ฅ๐‘ก , ๏ฟฝ๏ฟฝ๐‘ก ) back onto graph ๐น . However, the nondierentiability of the distancefunction โ€– ยท โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ at ๏ฟฝ๏ฟฝ๐‘ก would cause diculties, so โ€“ similarly to the proof of Lemma 18.13โ€“ we modify the projection by composing the norm with the โ€œsmoothing functionโ€

(27.16) ๐œ‘` (๐‘Ÿ ) โ‰”โˆš`2 + ๐‘Ÿ 2 โˆ’ `.

By Theorems 4.5, 4.6 and 4.19 and the assumed dierentiability of โ€– ยท โ€–๐‘Œ away from theorigin, ๐œ‘` (โ€– ยท โ€–๐‘Œ ) is convex and has a single-valued subdierential mapping with elementsof norm less than one. Hence this smoothed distance function is Gรขteaux dierentiable byLemma 13.7. Due to (27.16), for every ๐‘ก > 0 and `๐‘ก > 0, we further have

(27.17) โ€–๐‘ฆ โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ โˆ’ `๐‘ก โ‰ค ๐œ‘`๐‘ก (โ€–๐‘ฆ โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ ) โ‰ค โ€–๐‘ฆ โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ (๐‘ฆ โˆˆ ๐‘Œ ).To locally project (๐‘ฅ๐‘ก , ๏ฟฝ๏ฟฝ๐‘ก ) onto graph ๐น , we thus seek to minimize the function

(27.18) ๐œ“๐‘ก (๐‘ฅ, ๐‘ฆ) โ‰” ๐›ฟ๐ถ๐‘ก (๐‘ฅ, ๐‘ฆ) + หœ โ€–๐‘ฅ โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ + ๐œ‘`๐‘ก (โ€–๐‘ฆ โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ )for

๐ถ๐‘ก โ‰” [๐”น(๐‘ฅ, ๐‘ก + 2 หœ) ร— ๐”น(๐‘ฆ, ๐‘ก + 2 หœ)] โˆฉ graph ๐น .Clearly,๐œ“๐‘ก is bounded from below by โˆ’`๐‘ก as well as coercive since ๐ถ๐‘ก is bounded. If ๐‘ก is issmall enough, then ๐ถ๐‘ก is closed by the local closedness of graph ๐น . Therefore๐œ“๐‘ก is lowersemicontinuous (but not weakly lower semicontinuous since graph ๐น need not be convex).

Step 2: nding approximate minimizers.We would like to nd a minimizer of๐œ“๐‘ก , but the lackof weak lower semicontinuity prevents the use of Tonelliโ€™s direct method of Theorem 2.1.We therefore use Ekelandโ€™s variational principle (Theorem 2.14) to nd an approximateminimizer. Towards this end, choose for every ๐‘ก > 0

(27.19) `๐‘ก โ‰” ๐‘กโˆ’1/2 หœ โ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–2๐‘‹ โ‰ค หœ๐‘ก 1/2โ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ and _๐‘ก โ‰” โ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ + ๐‘ก 1/2 โ‰ค ๐‘ก + ๐‘ก 1/2,where the inequalities hold due to (27.14). Then

(27.20) ๐œ“๐‘ก (๐‘ฅ๐‘ก , ๏ฟฝ๏ฟฝ๐‘ก ) = หœ โ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ โ‰ค ( หœ โ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ + `๐‘ก ) + inf๐œ“๐‘ก .

Therefore, applying Theorem 2.14 for _ = _๐‘ก and

Y = หœ โ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ + `๐‘ก = หœ โ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ ๐‘กโˆ’1/2_๐‘ก = `๐‘ก_๐‘กโ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ ,

we obtain for each ๐‘ก > 0 a strict minimizer (๐‘ฅ๐‘ก , ๐‘ฆ๐‘ก ) of

๐œ“๐‘ก (๐‘ฅ, ๐‘ฆ) โ‰” ๐œ“๐‘ก (๐‘ฅ, ๐‘ฆ) + `๐‘กโ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ (โ€–๐‘ฅ โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ + โ€–๐‘ฆ โˆ’ ๐‘ฆ๐‘ก โ€–๐‘Œ )(27.21a)

with๐œ“๐‘ก (๐‘ฅ๐‘ก , ๐‘ฆ๐‘ก ) + `๐‘ก

โ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ (โ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ + โ€–๏ฟฝ๏ฟฝ๐‘ก โˆ’ ๐‘ฆ๐‘ก โ€–๐‘Œ ) โ‰ค ๐œ“๐‘ก (๐‘ฅ๐‘ก , ๏ฟฝ๏ฟฝ๐‘ก ) = ^โ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹(27.21b)

351

27 lipschitz-like properties and stability

andโ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ + โ€–๐‘ฆ๐‘ก โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ โ‰ค _๐‘ก .(27.21c)

We claim that ๐‘ฅ๐‘ก โ‰  ๐‘ฅ๐‘ก , which we show by contradiction. Assume therefore that ๐‘ฅ๐‘ก = ๐‘ฅ๐‘ก .Then ๐‘ฆ๐‘ก โˆˆ ๐น (๐‘ฅ๐‘ก ), and (27.17) yields

๐œ“๐‘ก (๐‘ฅ๐‘ก , ๐‘ฆ๐‘ก ) = ๐œ‘`๐‘ก (โ€–๐‘ฆ๐‘ก โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ ) โ‰ฅ โ€–๐‘ฆ๐‘ก โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ โˆ’ `๐‘ก .

Thus by (27.20) and (27.21b),

โ€–๐‘ฆ๐‘ก โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ โ‰ค โ€–๐‘ฆ๐‘ก โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ โˆ’ `๐‘ก + `๐‘กโ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ (โ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ + โ€–๏ฟฝ๏ฟฝ๐‘ก โˆ’ ๐‘ฆ๐‘ก โ€–๐‘Œ ) โ‰ค หœ โ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ .

But this contradicts (27.15) as ๐‘ฆ๐‘ก โˆˆ ๐น (๐‘ฅ๐‘ก ).Step 3: constructing Y-normals.We are now ready to construct the desired Y-normals. Wewrite

(27.22) ๐œ“๐‘ก (๐‘ฅ๐‘ก , ๐‘ฆ๐‘ก ) = ๐›ฟ๐ถ๐‘ก (๐‘ฅ, ๐‘ฆ) + ๐น (๐‘ฅ, ๐‘ฆ)

for the convex and Lipschitz continuous function

๐น (๐‘ฅ, ๐‘ฆ) โ‰” หœ โ€–๐‘ฅ โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ + ๐œ‘`๐‘ก (โ€–๐‘ฆ โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ ) + `๐‘กโ€–๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ (โ€–๐‘ฅ โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ + โ€–๐‘ฆ โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ ) .

Since we assume ๐‘‹ to be Gรขteaux smooth, ๐‘ฅ โ†ฆโ†’ หœ โ€–๐‘ฅ โˆ’ ๐‘ฅ๐‘ก โ€–๐‘Œ is Gรขteaux dierentiable at๐‘ฅ๐‘ก โ‰  ๐‘ฅ๐‘ก . Furthermore, ๐‘ฆ โ†ฆโ†’ ๐œ‘`๐‘ก (โ€–๐‘ฆ โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ ) is by construction Gรขteaux dierentiable for all๐‘ฆ . By (27.19), we have `๐‘ก

โ€–๐‘ฅ๐‘กโˆ’๐‘ฅ๐‘ก โ€–๐‘‹ โ‰ค ๐‘ก 1/2 หœ . Since ๐‘ฅ๐‘ก โ‰  ๐‘ฅ๐‘ก , Theorems 4.6, 4.14 and 4.19 now yield

(27.23) ๐œ•๐น (๐‘ฅ๐‘ก , ๐‘ฆ๐‘ก ) โŠ‚ ๐”น((โˆ’๐‘ฅโˆ—๐‘ก , ๐‘ฆโˆ—๐‘ก ), ๐‘ก 1/2 หœ) for{โˆ’๐‘ฅโˆ—๐‘ก = หœ๐ท [โ€– ยท โˆ’ ๐‘ฅ๐‘ก โ€–๐‘‹ ] (๐‘ฅ๐‘ก ),๐‘ฆโˆ—๐‘ก = ๐ท [๐œ‘โ€ฒ`๐‘ก (โ€– ยท โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ )] (๐‘ฆ๐‘ก ).

Since ๐‘ฅ๐‘ก โ‰  ๐‘ฅ๐‘ก , we have โ€–๐‘ฅโˆ—๐‘ก โ€–๐‘‹ โˆ— = หœ by Theorem 4.6. Moreover, โ€–๐‘ฆโˆ—๐‘ก โ€–๐‘Œ โˆ— โ‰ค 1 as observed inStep 1. Theorem 16.2 further yields 0 โˆˆ ๐œ•๐น๐œ“๐‘ก (๐‘ฅ๐‘ก , ๐‘ฆ๐‘ก ).Due to (27.22) and (27.23), Lemma 17.2 now shows that

(๐‘ฅโˆ—๐‘ก ,โˆ’๐‘ฆโˆ—๐‘ก ) โˆˆ ๐‘ Y๐‘ก๐ถ๐‘ก(๐‘ฅ๐‘ก , ๐‘ฆ๐‘ก ), i.e., ๐‘ฅโˆ—๐‘ก โˆˆ ๐ทโˆ—

Y๐‘ก ๐น (๐‘ฅ๐‘ก |๐‘ฆ๐‘ก ) (๐‘ฆโˆ—๐‘ก ) for Y๐‘ก โ‰” ๐‘ก 1/2 หœ .

We illustrate this construction in Figure 27.5. Since _๐‘ก โ‰ค ๐‘ก + ๐‘ก 1/2 by (27.19), it follows from(27.21c) that โ€–๐‘ฅ๐‘กโˆ’๐‘ฅ โ€–๐‘‹ , โ€–๐‘ฆ๐‘กโˆ’๐‘ฆ โ€–๐‘Œ โ‰ค 2๐‘ก+๐‘ก 1/2 and hence that (๐‘ฅ๐‘ก , ๐‘ฆ๐‘ก ) โ†’ (๐‘ฅ, ๐‘ฆ) as ๐‘กโ†’ 0. We alsohave both lim inf๐‘กโ†’ 0 โ€–๐‘ฅโˆ—๐‘ก โ€–๐‘‹ โˆ— โ‰ฅ หœ and lim sup๐‘กโ†’ 0 โ€–๐‘ฆโˆ—๐‘ก โ€–๐‘Œ โˆ— โ‰ค 1. Thus we have constructedthe desired sequence of Y๐‘ก -normals. ๏ฟฝ

352

27 lipschitz-like properties and stability

๐น

(๐‘ฅ, ๐‘ฆ)

(๐‘ฅ๐‘ก , ๏ฟฝ๏ฟฝ๐‘ก )

(๐‘ฅ๐‘ก , ๐‘ฆ๐‘ก )

(๐‘ฅ๐‘ก , ๏ฟฝ๏ฟฝ๐‘ก )

(๐‘ฅ๐‘ก , ๐‘ฆ๐‘ก ) = proj(๐‘ฅ๐‘ก , ๏ฟฝ๏ฟฝ๐‘ก )

๐‘‘

โ‰ฅ หœ๐‘‘

โ‰ฅ หœ๐‘›๐‘ฆ

๐‘›๐‘ฆ

Figure 27.5: The construction in the nal part of the proof of Theorem 27.5. The dotted arrowindicates how ๐‘ฆ๐‘ก minimizes the distance to ๏ฟฝ๏ฟฝ๐‘ก within ๐น (๐‘ฅ๐‘ก ), which ensures thatโ€–๐‘ฆ๐‘ก โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ โ‰ฅ หœ๐‘‘ for ๐‘‘ โ‰” โ€–๐‘ฅ๐‘ก โˆ’๐‘ฅ๐‘ก โ€–๐‘‹ . The point (๐‘ฅ๐‘ก , ๏ฟฝ๏ฟฝ๐‘ก ) is outside graph ๐น ; whenprojected back as (๐‘ฅ๐‘ก , ๐‘ฆ๐‘ก ), the normal vector to graph ๐น indicated by the solidarrow has ๐‘ฅ-component larger than the ๐‘ฆ-component ๐‘›๐‘ฆ by the factor หœ . Thedashed arrow indicates the convergence of the other points to (๐‘ฅ, ๐‘ฆ) as ๐‘กโ†’ 0.

Remark 27.6. Our proof of Theorem 27.5 diers from those in [Mordukhovich 2018; Mordukhovich2006] by the specic construction of the point (๏ฟฝ๏ฟฝ๐‘ก , ๐‘ฅ๐‘ก ) โˆ‰ graph ๐น and the use of the smootheddistance ๐œ‘Y๐‘ก (โ€– ยท โ€–๐‘‹ ). In contrast, the earlier proofs rst translate the Aubin property (or metricregularity) into a covering or linear openness property to construct the point outside graph ๐น that isto be projected back onto this set. In nite dimensions, [Mordukhovich 2018] develops calculus forthe limiting subdierential of Section 16.3 to avoid the lack of calculus for the Frรฉchet subdierential;we instead apply the fuzzy calculus of Lemma 17.2 to the smoothed distance function ๐œ‘Y๐‘ก (โ€– ยท โ€–๐‘‹ ). Afurther alternative in nite dimensions involves the proximal subdierentials used in [Rockafellar& Wets 1998]. In innite dimensions, [Mordukhovich 2006] develops advanced extremal principlesto work with the Frรฉchet subdierential.

Remark 27.7 (relaxation of Gรขteaux smoothness). The assumption that ๐‘Œ (or, with somewhat morework, ๐‘‹ ) is Gรขteaux smooth in Theorem 27.5 may be replaced with the assumption of the existenceof a family {\` : ๐‘Œ โ†’ โ„}`>0 of Gรขteaux dierentiable norm approximations satisfying

โ€–๐‘ฆ โ€–๐‘Œ โˆ’ ` โ‰ค \` (๐‘ฆ) โ‰ค โ€–๐‘ฆ โ€–๐‘Œ (๐‘ฆ โˆˆ ๐‘Œ ) .Then (27.17) holds with \`๐‘ก (๐‘ฆ โˆ’ ๏ฟฝ๏ฟฝ๐‘ก ) in place of ๐œ‘`๐‘ก (โ€–๐‘ฆ โˆ’ ๏ฟฝ๏ฟฝ๐‘ก โ€–๐‘Œ ). For example, with ๐œ‘` as in (27.16),in ๐ฟ๐‘ (ฮฉ) we can set

\` (๐‘ฆ) โ‰” โ€–๐œ‘` ( |๐‘ฆ (b) |)โ€–๐ฟ๐‘ (ฮฉ) (๐‘ฆ โˆˆ ๐ฟ1(ฮฉ)) .With somewhat more eort, the Gรขteaux smoothness of ๐‘‹ can be similarly relaxed.

27.3 point-based coderivative criteria

We will now convert the neighborhood-based criterion of Lemma 27.4 and Theorem 27.5into a simpler point-based criterion. For the statement, we need to introduce a new smaller

353

27 lipschitz-like properties and stability

๐ป

[โˆ’1, 1]

(๐‘ค, ๐‘ง)

(0, 0)

(a) general set-valued mapping ๐ป

๐ป

[โˆ’1, 1]

(๐‘ค, ๐‘ง)

(0, 0)

(b) mapping whose graph๐ป is a cone

Figure 27.6: points (๐‘ค, ๐‘ง) achieving the supremum in the expression of the outer norm |๐ป |+

coderivative of ๐น : ๐‘‹ โ‡’ ๐‘Œ at ๐‘ฅ for ๐‘ฆ , the mixed (limiting) coderivative ๐ทโˆ—๐‘€๐น (๐‘ฅ |๐‘ฆ) : ๐‘Œ โˆ— โ‡’

๐‘‹ โˆ—,

(27.24) ๐ทโˆ—๐‘€๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) โ‰” w-โˆ—-lim sup

(๐‘ฅ,๏ฟฝ๏ฟฝ)โ†’(๐‘ฅ,๐‘ฆ)๏ฟฝ๏ฟฝโˆ—โ†’๐‘ฆโˆ—, Yโ†’ 0

๐ทโˆ—Y ๐น (๐‘ฅ |๏ฟฝ๏ฟฝ) (๏ฟฝ๏ฟฝโˆ—),

which diers from the โ€œnormalโ€ coderivative

(27.25) ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) = w-โˆ—-lim sup(๐‘ฅ,๏ฟฝ๏ฟฝ)โ†’(๐‘ฅ,๐‘ฆ)๏ฟฝ๏ฟฝโˆ— โˆ—โ‡€๐‘ฆโˆ—, Yโ†’ 0

๐ทโˆ—Y ๐น (๐‘ฅ |๏ฟฝ๏ฟฝ) (๏ฟฝ๏ฟฝโˆ—),

by the use of weak-โˆ— convergence in ๐‘‹ โˆ— and strong convergence in ๐‘Œ โˆ— instead of weak-โˆ—convergence in both. (The mixed coderivative is not obtained directly from any of theusual normal cones, although one can naturally dene corresponding mixed normal coneson product spaces.)

We further dene for any ๐ป :๐‘Š โ‡’ ๐‘ the outer norm

|๐ป |+ โ‰” sup{โ€–๐‘งโ€–๐‘ | ๐‘ง โˆˆ ๐ป (๐‘ค), โ€–๐‘ค โ€–๐‘Š โ‰ค 1}.

We illustrate the outer norm by two examples in Figure 27.6. We are mainly interested inthe outer norms of coderivatives, in particular of

(27.26) |๐ทโˆ—๐‘€๐น (๐‘ฅ |๐‘ฆ) |+ = sup{โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— | ๐‘ฅโˆ— โˆˆ ๐ทโˆ—

๐‘€๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—), โ€–๐‘ฆโˆ—โ€–๐‘Œ โˆ— โ‰ค 1}.

Recalling Theorem 18.5, we have

(27.27) ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) โŠ‚ ๐ทโˆ—๐‘€๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) โŠ‚ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—),

so the outer norms satisfy

|๐ทโˆ—๐‘€๐น (๐‘ฅ |๐‘ฆ) |+ โ‰ค |๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) |+.

354

27 lipschitz-like properties and stability

We say that ๐น is coderivatively normal at ๐‘ฅ for ๐‘ฆ if |๐ทโˆ—๐‘€๐น (๐‘ฅ |๐‘ฆ) |+ = |๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) |+. Of course,

if ๐‘Œ is nite-dimensional, then ๐ทโˆ—๐‘€๐น (๐‘ฅ |๐‘ฆ) = ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) and thus ๐น is always coderivatively

normal. Note that |๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) |+ can be directly related to the neighborhood-based ^๐›ฟ๐›ฟde-

ned in (27.6). In particular, it measures the opening of the cone ๐‘graph ๐น (๐‘ฅ, ๐‘ฆ); compareFigure 27.6b.

As the central result of this chapter, we now use this connection to derive a characterizationof the Aubin property and the graphical modulus (and hence also of metric regularity andthe modulus of metric regularity) through the outer norm of the mixed limiting coderivative.ThisMordukhovich criterion generalizes the classical relation between the Lipschitz constantof a ๐ถ1 function and the norm of its derivative.

Lemma 27.8 (Mordukhovich criterion in general Banach spaces). Let ๐‘‹,๐‘Œ be Banach spacesand let ๐น : ๐‘‹ โ‡’ ๐‘Œ be such that graph ๐น is closed near (๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘‹ ร— ๐‘Œ . If ๐น has the Aubinproperty at ๐‘ฅ for ๐‘ฆ , then

(27.28) ๐ทโˆ—๐‘€๐น (๐‘ฅ |๐‘ฆ) (0) = {0}

and

(27.29) |๐ทโˆ—๐‘€๐น (๐‘ฅ |๐‘ฆ) |+ โ‰ค lip ๐น (๐‘ฅ |๐‘ฆ).

Proof. As the rst step, we show that the Aubin property implies (27.29) and hence that^ โ‰” |๐ทโˆ—

๐‘€๐น (๐‘ฅ |๐‘ฆ) |+ < โˆž. Let ๐œŒ > 0. By the denition of๐ทโˆ—๐‘€๐น (๐‘ฅ |๐‘ฆ) in (27.24), there then exist

๐›ฟ โˆˆ (0, ๐œŒ), ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐œŒ), and ๐‘ฆ โˆˆ ๐น (๐‘ฅ) โˆฉ ๐”น(๐‘ฆ, ๐œŒ) as well as ๐‘ฆโˆ— โˆˆ ๐‘Œ โˆ— and ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐›ฟ๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—)

such that โ€–๐‘ฆโˆ—โ€–๐‘Œ โˆ— โ‰ค 1 + ๐œŒ and โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ‰ฅ ^ (1 โˆ’ ๐œŒ)2. (The upper bound on โ€–๐‘ฆโˆ—โ€–๐‘Œ โˆ— is whywe need the mixed coderivative, since โ€– ยท โ€–๐‘Œ โˆ— is continuous only in the strong topology.For the lower bound on โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— , in contrast, the weak-โˆ— lower semicontinuity of โ€– ยท โ€–๐‘‹ โˆ— issucient.) Since ๐ทโˆ—

๐›ฟ๐น (๐‘ฅ |๐‘ฆ) is formed from a cone, we may divide ๐‘ฅโˆ— and ๐‘ฆโˆ— by 1 + ๐œŒ and

thus assume that โ€–๐‘ฆโˆ—โ€–๐‘Œ โˆ— โ‰ค 1 and โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ‰ฅ ^ (1 โˆ’ ๐œŒ). Consequently

^ (1 โˆ’ ๐œŒ) โ‰ค ^๐›ฟ๐›ฟ (๐‘ฅ |๐‘ฆ) = sup{โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ—

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐›ฟ๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—), โ€–๐‘ฆโˆ—โ€–๐‘Œ โˆ— โ‰ค 1,

๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ), ๐‘ฆ โˆˆ ๐น (๐‘ฅ) โˆฉ ๐”น(๐‘ฆ, ๐›ฟ)}.

Taking the inmum over ๐›ฟ > 0 and letting ๐œŒโ†’ 0 thus shows

^ โ‰ค inf๐›ฟ>0

^๐›ฟ๐›ฟ (๐‘ฅ |๐‘ฆ).

It now follows from Lemma 27.4 that ^ โ‰ค lip ๐น (๐‘ฅ |๐‘ฆ), which yields (27.29).

As the second step, we prove that the Aubin property implies (27.28). We argue by con-traposition. First, note that since graph๐ทโˆ—

๐‘€๐น (๐‘ฅ |๐‘ฆ) is a cone, 0 โˆˆ ๐ทโˆ—๐‘€๐น (๐‘ฅ |๐‘ฆ) (0). Hence if

(27.28) does not hold, there exists ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ— \ {0} such that

๐‘ฅโˆ— [0,โˆž) โŠ‚ ๐ทโˆ—๐‘€๐น (๐‘ฅ |๐‘ฆ) (0).

By (27.26) and the rst step, this implies thatโˆž = ^ โ‰ค lip ๐น (๐‘ฅ |๐‘ฆ) and hence that the Aubinproperty of ๐น at ๐‘ฅ for ๐‘ฆ is violated. ๏ฟฝ

355

27 lipschitz-like properties and stability

Applied to ๐นโˆ’1, we obtain a corresponding result for metric regularity.

Corollary 27.9 (Mordukhovich criterion for metric regularity in general Banach spaces). Let๐‘‹,๐‘Œ be Banach spaces and let ๐น : ๐‘‹ โ‡’ ๐‘Œ be such that graph ๐น is closed near (๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘‹ ร— ๐‘Œ .If ๐น is metrically regular at (๐‘ฅ, ๐‘ฆ), then

(27.30) 0 โˆˆ ๐ทโˆ—๐‘€๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) โ‡’ ๐‘ฆโˆ— = 0

and

(27.31) |๐ทโˆ—๐‘€๐น

โˆ’1(๐‘ฆ |๐‘ฅ) |+ โ‰ค reg ๐น (๐‘ฅ |๐‘ฆ).

Proof. We apply Lemma 27.8 to ๐นโˆ’1, observing that (27.28) applied to ๐นโˆ’1 is (27.30). ๏ฟฝ

Under stronger assumptions on the spaces and the set-valued mapping, we obtain equiva-lence. For the following theorem, recall the denition of partial sequential normal com-pactness (PSNC) from Section 25.2.

Theorem 27.10 (Mordukhovich criterion in smooth Banach spaces). Let ๐‘‹,๐‘Œ be Gรขteauxsmooth Banach spaces with ๐‘‹ reexive and let ๐น : ๐‘‹ โ‡’ ๐‘Œ be such that graph ๐น is closed near(๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘‹ ร— ๐‘Œ . If ๐น is PSNC at ๐‘ฅ for ๐‘ฆ , then the following are equivalent:

(i) the Aubin property of ๐น at ๐‘ฅ for ๐‘ฆ ;

(ii) the implication (27.28);

(iii) |๐ทโˆ—๐‘€๐น (๐‘ฅ |๐‘ฆ) |+ < โˆž.

Proof. Due to Lemma 27.8, it suces to show that (iii)โ‡’ (ii)โ‡’ (i). We start with the secondimplication. Since ๐‘‹ and ๐‘Œ are Gรขteaux smooth, Theorem 27.5 yields

(27.32) lip ๐น (๐‘ฅ |๐‘ฆ) = หœ โ‰” inf๐›ฟ>0

sup{โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ—

๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐›ฟ๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—), โ€–๐‘ฆโˆ—โ€–๐‘Œ โˆ— โ‰ค 1,

๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ), ๐‘ฆ โˆˆ ๐น (๐‘ฅ) โˆฉ ๐”น(๐‘ฆ, ๐›ฟ)}

and that the Aubin property holds if หœ < โˆž. We now argue by contradiction. Assumethat the Aubin property does not hold. Then หœ = โˆž and hence we can nd (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) โ†’(๐‘ฅ, ๐‘ฆ), Y๐‘˜โ†’ 0, and ๐‘ฅโˆ—

๐‘˜โˆˆ ๐ทโˆ—

Y๐‘˜๐น (๐‘ฅ๐‘˜ |๐‘ฆ๐‘˜) (๐‘ฆโˆ—๐‘˜ ) with โ€–๐‘ฆโˆ—

๐‘˜โ€–๐‘Œ โˆ— โ‰ค 1 and โ€–๐‘ฅโˆ—

๐‘˜โ€–๐‘‹ โˆ— โ†’ โˆž. In particular,

๐‘ฆโˆ—๐‘˜/โ€–๐‘ฅโˆ—

๐‘˜โ€–๐‘‹ โˆ— โ†’ 0. Since ๐‘‹ is reexive, we can apply the Eberleinโ€“Smulyan Theorem 1.9 to

extract a subsequence (not relabelled) such that ๐‘ฅโˆ—๐‘˜/โ€–๐‘ฅโˆ—

๐‘˜โ€–๐‘‹ โˆ— โˆ—โ‡€ ๐‘ฅโˆ— for some ๐‘ฅโˆ— โˆˆ ๐‘‹ โˆ—. Since

graph๐ทโˆ—Y๐‘˜๐น (๐‘ฅ๐‘˜ |๐‘ฆ๐‘˜) is a cone, we also have

๐‘ฅโˆ—๐‘˜/โ€–๐‘ฅโˆ—๐‘˜ โ€–๐‘‹ โˆ— โˆˆ ๐ทโˆ—Y๐‘˜๐น (๐‘ฅ๐‘˜ |๐‘ฆ๐‘˜) (๐‘ฆโˆ—๐‘˜/โ€–๐‘ฅโˆ—๐‘˜ โ€–๐‘‹ โˆ—).

356

27 lipschitz-like properties and stability

By the denition (27.24) of the mixed coderivative, we deduce that ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐‘€๐น (๐‘ฅ |๐‘ฆ) (0).

We now make a case distinction: If ๐‘ฅโˆ— โ‰  0, then this contradicts the qualication con-dition (27.28). On the other hand, if ๐‘ฅโˆ— = 0, the PSNC of ๐น at ๐‘ฅ for ๐‘ฆ , implies that1 = โ€–๐‘ฅโˆ—

๐‘˜/โ€–๐‘ฅโˆ—

๐‘˜โ€–๐‘‹ โˆ— โ€–๐‘‹ โˆ— โ†’ 0, which is also a contradiction. Therefore (27.28) implies the

Aubin property.

It remains to show that (iii)โ‡’ (ii). First, since graph๐ทโˆ—๐‘€๐น (๐‘ฅ |๐‘ฆ) is a cone, ๐ทโˆ—

๐‘€๐น (๐‘ฅ |๐‘ฆ) (0) is acone as well. Hence by (27.26), |๐ทโˆ—

๐‘€๐น (๐‘ฅ |๐‘ฆ) |+ < โˆž implies that ๐ทโˆ—๐‘€๐น (๐‘ฅ |๐‘ฆ) (0) = {0}, which

is (27.28). ๏ฟฝ

Again, applying Theorem 27.10 to ๐นโˆ’1 yields a characterization of metric regularity.

Corollary 27.11 (Mordukhovich criterion for metric regularity in smooth Banach spaces).

Let ๐‘‹,๐‘Œ be Gรขteaux smooth Banach spaces with ๐‘‹ reexive and let ๐น : ๐‘‹ โ‡’ ๐‘Œ be suchthat graph ๐น is closed near (๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘‹ ร— ๐‘Œ . If ๐นโˆ’1 is PSNC at ๐‘ฆ for ๐‘ฅ , then the following areequivalent:

(i) the metric regularity of ๐น at (๐‘ฅ, ๐‘ฆ);(ii) the implication (27.30);

(iii) |๐ทโˆ—๐‘€๐น

โˆ’1(๐‘ฆ |๐‘ฅ) |+ < โˆž.

Remark 27.12 (separable and Asplund spaces). The reexivity of ๐‘‹ (resp. ๐‘Œ ) was used to obtain theweak-โˆ— compactness of the unit ball in ๐‘‹ โˆ— via the Eberleinโ€“Smulyan Theorem 1.9 applied to ๐‘‹ โˆ—.Alternatively, this can be obtained by assuming separability of ๐‘‹ and using the Banachโ€“AlaogluTheorem 1.11. More generally, dual spaces of Asplund spaces have weak-โˆ—-compact unit balls; werefer to [Mordukhovich 2006] for the full theory in Asplund spaces.

In nite dimensions, we have a full characterization of the graphical modulus via the outernorm of the limiting coderivative (which here coincides with the mixed coderivative).

Corollary 27.13 (Mordukhovich criterion for the graphical modulus in finite dimensions).

Let ๐‘‹,๐‘Œ be nite-dimensional Gรขteaux smooth Banach spaces and let ๐น : ๐‘‹ โ‡’ ๐‘Œ be such thatgraph ๐น is closed near (๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘‹ ร— ๐‘Œ . Then

lip ๐น (๐‘ฅ |๐‘ฆ) = |๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) |+.

Proof. Due to Lemma 27.8, we only have to show that

(27.33) lip ๐น (๐‘ฅ |๐‘ฆ) โ‰ค |๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) |+.

As in the proof of Theorem 27.10, the smoothness of๐‘‹ and ๐‘Œ allows applying Theorem 27.5to obtain that lip ๐น (๐‘ฅ |๐‘ฆ) = หœ given by (27.32). It therefore suces to show that หœ โ‰ค|๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) |+. Let ^โ€ฒ < หœ be arbitrary. By (27.32), we can then nd (๐‘ฅ๐‘˜ , ๐‘ฆ๐‘˜) โ†’ (๐‘ฅ, ๐‘ฆ) and

357

27 lipschitz-like properties and stability

๐น

(a) property is satised

๐น

(b) property is not satised

Figure 27.7: Illustration of |๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) |+ = sup{โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— | (๐‘ฅโˆ—,โˆ’๐‘ฆโˆ—) โˆˆ๐‘graph ๐น (๐‘ฅ, ๐‘ฆ), โ€–๐‘ฆโˆ—โ€–๐‘Œ โˆ— โ‰ค 1}, where the arrows denote the directionscontained in the normal cone. In (a), โˆ’๐‘ฆโˆ— โˆˆ [0,โˆž) but ๐‘ฅโˆ— = 0, hence|๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) |+ = 0 and the Aubin property is satised. In (b), we can take for๐‘ฆโˆ— = 0 any ๐‘ฅโˆ— โˆˆ (โˆ’โˆž, 0], hence |๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) |+ = โˆž and the Aubin property isnot satised.

Y๐‘˜โ†’ 0 as well as ๐‘ฅโˆ—๐‘˜โˆˆ ๐ทโˆ—

Y๐‘˜๐น (๐‘ฅ๐‘˜ |๐‘ฆ๐‘˜) (๐‘ฆโˆ—๐‘˜ ) with โ€–๐‘ฆโˆ—

๐‘˜โ€–๐‘Œ โˆ— โ‰ค 1, and หœ โ‰ฅ โ€–๐‘ฅโˆ—

๐‘˜โ€– โ‰ฅ ^โ€ฒ. Since ๐‘‹

and ๐‘Œ are nite-dimensional, we can apply the Heineโ€“Borel Theorem to extract stronglyconverging subsequences (not relabelled) such that ๐‘ฅโˆ—

๐‘˜โ†’ ๐‘ฅโˆ— with โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ‰ฅ ^โ€ฒ and ๐‘ฆโˆ—

๐‘˜โ†’

๐‘ฆโˆ— with โ€–๐‘ฆโˆ—โ€–๐‘Œ โˆ— โ‰ค 1. Since strongly converging sequences also converge weakly-โˆ—, theexpression (27.25) for the normal coderivative implies that ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—) and that|๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) |+ โ‰ฅ โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ‰ฅ ^โ€ฒ. Since ^โ€ฒ < หœ was arbitrary, we obtain (27.33). ๏ฟฝ

We illustrate in Figure 27.7 how the outer norm of the coderivative relates to the Aubinproperty.

Corollary 27.14 (Mordukhovich criterion for the modulus of metric regularity in finite

dimensions). Let๐‘‹,๐‘Œ be nite-dimensional Gรขteaux smooth Banach spaces and let ๐น : ๐‘‹ โ‡’ ๐‘Œbe such that graph ๐น is closed near (๐‘ฅ, ๐‘ฆ) โˆˆ ๐‘‹ ร— ๐‘Œ . Then

reg ๐น (๐‘ฅ |๐‘ฆ) = |๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ)โˆ’1 |+.

Proof. By Lemma 20.5, we have

|๐ทโˆ—๐นโˆ’1(๐‘ฆ |๐‘ฅ) |+ = sup{โ€–๐‘ฆโˆ—โ€–๐‘Œ โˆ— | โˆ’๐‘ฆโˆ— โˆˆ ๐ทโˆ—๐นโˆ’1(๐‘ฆ |๐‘ฅ) (โˆ’๐‘ฅโˆ—), โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ‰ค 1}= sup{โ€–๐‘ฆโˆ—โ€–๐‘Œ โˆ— | ๐‘ฅโˆ— โˆˆ ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) (๐‘ฆโˆ—), โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ‰ค 1}= | [๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ)]โˆ’1 |+.

The claim now follows by applying Corollary 27.13 to ๐นโˆ’1 together with Corollary 27.9. ๏ฟฝ

Remark 27.15. Derivative-based characterizations of calmness and metric subregularity are signi-cantly more involved than those of the Aubin property and metric regularity discussed above. Werefer to [Henrion, Jourani & Outrata 2002; Zheng & Ng 2010; Gfrerer 2011; Gfrerer & Outrata 2016]to a few characterizations in special cases.

358

27 lipschitz-like properties and stability

To close this section, we relate the Mordukhovich criterion to the classical inverse functiontheorem (Theorem 2.8).

Corollary 27.16 (inverse function theorem). Let๐‘‹,๐‘Œ be reexive and Gรขteaux smooth Banachspaces and let ๐น : ๐‘‹ โ†’ ๐‘Œ be continuously dierentiable around ๐‘ฅ โˆˆ ๐‘‹ . If ๐น โ€ฒ(๐‘ฅ)โˆ— โˆˆ ๐•ƒ(๐‘Œ โˆ—;๐‘‹ โˆ—)has a left-inverse ๐น โ€ฒ(๐‘ฅ)โˆ—โ€  โˆˆ ๐•ƒ(๐‘‹ โˆ—;๐‘Œ โˆ—), then there exist ^ > 0 and ๐›ฟ > 0 such that for all๐‘ฆ โˆˆ ๐”น(๐น (๐‘ฅ), ๐›ฟ) there exists a single-valued selection ๐ฝ (๐‘ฆ) โˆˆ ๐นโˆ’1(๐‘ฆ) with

โ€–๐‘ฅ โˆ’ ๐ฝ (๐‘ฆ)โ€–๐‘‹ โ‰ค ^โ€–๐น (๐‘ฅ) โˆ’ ๐‘ฆ โ€–๐‘Œ .

Proof. Let ๐‘ฆ โ‰” ๐น (๐‘ฅ). By Theorem 20.12 and the reexivity of ๐‘‹ and ๐‘Œ ,

(27.34) ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) = ๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ) = {๐น โ€ฒ(๐‘ฅ)โˆ—}.

We have both๐ทโˆ—๐นโˆ’1(๐‘ฆ |๐‘ฅ) = [๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ)]โˆ’1 and๐ทโˆ—๐นโˆ’1(๐‘ฆ |๐‘ฅ) = [๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ)]โˆ’1 by Lemma 20.5.Due to (27.27), this then implies that ๐ทโˆ—

๐‘€๐นโˆ’1(๐‘ฆ |๐‘ฅ) โŠ‚ ๐ทโˆ—๐นโˆ’1(๐‘ฆ |๐‘ฅ) = [๐ทโˆ—๐น (๐‘ฅ |๐‘ฆ)]โˆ’1. The

existence of a left-inverse implies that ๐น โ€ฒ(๐‘ฅ)โˆ— is injective, which together with (27.34) yields(27.30).

By the continuity of ๐น , graph ๐นโˆ’1 is closed near (๐‘ฆ, ๐‘ฅ). By Lemma 25.6, ๐นโˆ’1 is PSNC at ๐‘ฆfor ๐‘ฅ . Consequently, Corollary 27.14 shows that ๐น is metrically regular at ๐‘ฅ for ๐‘ฆ . By thedenition (27.2) of metrical regularity, there thus exists for any หœ > reg ๐น (๐‘ฅ |๐‘ฆ) a ๐›ฟ > 0such that

inf๐‘ฅโˆˆ๐นโˆ’1 (๐‘ฆ)

โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค หœ โ€–๐น (๐‘ฅ) โˆ’ ๐‘ฆ โ€–๐‘Œ (๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ), ๐‘ฆ โˆˆ ๐”น(๐‘ฆ, ๐›ฟ)) .

Taking in particular ๐‘ฅ = ๐‘ฅ yields

inf๐‘ฅโˆˆ๐นโˆ’1 (๐‘ฆ)

โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค หœ โ€–๐น (๐‘ฅ) โˆ’ ๐‘ฆ โ€–๐‘Œ (๐‘ฆ โˆˆ ๐”น(๐น (๐‘ฅ), ๐›ฟ)) .

Although the inmum might not be attained, this implies that we can take arbitrary ^ > หœto obtain for any ๐‘ฆ โˆˆ ๐”น(๐น (๐‘ฅ), ๐›ฟ) the existence of some ๐ฝ (๐‘ฆ) โ‰” ๐‘ฅ โˆˆ ๐นโˆ’1(๐‘ฆ) satisfyingโ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค ^โ€–๐น (๐‘ฅ) โˆ’ ๐‘ฆ โ€–๐‘Œ , which is the claim. ๏ฟฝ

27.4 stability with respect to perturbations

We now return to the question of stability, where we are interested more generally in theeect of perturbations of an optimization problem on its solution. In the context of theintroductory example (P), this could be the data ๐‘ง, the penalty parameter ๐›ผ , the constraintset๐ถ โŠ‚ ๐‘‹ , or other parameters in the model. More generally, let ๐‘‹, ๐‘ƒ be Banach spaces and๐‘“ : ๐‘‹ ร— ๐‘ƒ โ†’ โ„. We then consider for some parameter ๐‘ โˆˆ ๐‘ƒ the optimization problem

(27.35) min๐‘ฅโˆˆ๐‘‹

๐‘“ (๐‘ฅ ;๐‘)

359

27 lipschitz-like properties and stability

and study how a minimizer (or critical point) ๐‘ฅ โˆˆ ๐‘‹ depends on changes in ๐‘ . For thispurpose, we introduce the set-valued solution mapping (or, if ๐‘ฅ โ†ฆโ†’ ๐‘“ (๐‘ฅ ;๐‘) is not convex,critical point map)

(27.36) ๐‘† : ๐‘ƒ โ‡’ ๐‘‹, ๐‘† (๐‘) โ‰” {๐‘ฅ โˆˆ ๐‘‹ | 0 โˆˆ ๐œ•๐‘ฅ ๐‘“ (๐‘ฅ ;๐‘)},where ๐œ•๐‘ฅ is a suitable (convex, Clarke) subdierential with respect to ๐‘ฅ for xed ๐‘ . We applythe concepts from Section 27.1 to this problem. Specically, if ๐‘† has the Aubin property at๐‘ for ๐‘ฅ , then we can take ๐‘ฆ = ๐‘ฆ in(27.1) to obtain

inf๐‘ฅโˆˆ๐‘† (๐‘)

โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค ^โ€–๐‘ โˆ’ ๐‘ โ€–๐‘ƒ (๐‘ โˆˆ ๐”น(๐‘, ๐›ฟ))

for some ๐›ฟ, ^ > 0. In other words, the Aubin property of the solution map ๐‘† at ๐‘ for ๐‘ฅ๐‘ฅimplies the local Lipschitz stability of solutions ๐‘ฅ = ๐‘† (๐‘) under perturbations ๐‘ around theparameter ๐‘ . The question now is when a solution mapping has the Aubin property.

We start with a simple special case. Returning to the motivation at the beginning of thischapter,๐‘ค โˆˆ ๐œ•๐‘“ (๐‘ฅ) is of course equivalent to 0 โˆˆ ๐œ•๐‘“ (๐‘ฅ) โˆ’ {๐‘ค} = ๐œ•(๐‘“ โˆ’ ใ€ˆ๐‘ค, ยท ใ€‰๐‘‹ ) (๐‘ฅ) sincecontinuous linear mappings are dierentiable. Such a perturbation of ๐‘“ is called a tiltperturbation, with๐‘ค โˆˆ ๐‘‹ โˆ— called tilt parameter.

To make this more precise, let ๐‘” : ๐‘‹ โ†’ โ„ be locally Lipschitz. For a tilt parameter ๐‘ โˆˆ ๐‘‹ โˆ—,we then dene

(27.37) ๐‘“ (๐‘ฅ ;๐‘) = ๐‘”(๐‘ฅ) โˆ’ ใ€ˆ๐‘, ๐‘ฅใ€‰๐‘‹and refer to the stability of minimizers (or critical points) of ๐‘“ with respect to ๐‘ as tiltstability. By Theorems 13.4 and 13.20, the solution mapping for ๐‘“ is

๐‘† (๐‘) = {๐‘ฅ โˆˆ ๐‘‹ | ๐‘ โˆˆ ๐œ•๐ถ๐‘”(๐‘ฅ)} = (๐œ•๐‘”๐ถ)โˆ’1(๐‘),which thus has the Aubin property โ€“ and ๐‘“ is tilt-stable โ€“ if and only if ๐œ•๐ถ๐‘” is metricallyregular at ๐‘ฅ for 0, i.e., by (27.2) that there exist ^, ๐›ฟ > 0 such that

(27.38) dist(๐‘ฅ, (๐œ•๐ถ๐‘”)โˆ’1(๐‘ฅโˆ—)) โ‰ค ^ dist(๐œ•๐ถ๐‘”(๐‘ฅ), ๐‘ฅโˆ—) (๐‘ฅโˆ— โˆˆ ๐”น(0, ๐›ฟ); ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ)).

We illustrate this with two examples. The rst concerns data stability of least squarestting, which in Hilbert spaces can be formulated as tilt stability.

Example 27.17 (data stability of least squares fiing). Let ๐‘‹,๐‘Œ be Hilbert spaces and๐‘”(๐‘ฅ) = 1

2 โ€–๐ด๐‘ฅ โˆ’๐‘โ€–2๐‘Œ for some๐ด โˆˆ ๐•ƒ(๐‘‹ ;๐‘Œ ) and ๐‘ โˆˆ ๐‘Œ . Taking ๐‘ = ๐ดโˆ—ฮ”๐‘ for some ฮ”๐‘ โˆˆ ๐‘Œ ,we can write this in the form of (27.37) via

๐‘“ (๐‘ฅ ;๐‘) = ๐‘”(๐‘ฅ) โˆ’ ใ€ˆ๐ดโˆ—ฮ”๐‘, ๐‘ฅใ€‰๐‘‹ =12 โ€–๐ด๐‘ฅ โˆ’ (๐‘ + ฮ”๐‘)โ€–2๐‘Œ โˆ’ 1

2 โ€–ฮ”๐‘โ€–2๐‘Œ .

Data stability thus follows from the metric regularity of ๐œ•๐‘” at a minimizer ๐‘ฅ of the

360

27 lipschitz-like properties and stability

convex functional ๐‘”. We have ๐œ•๐ถ๐‘”(๐‘ฅ) = {๐ดโˆ—(๐ด๐‘ฅ โˆ’ ๐‘)}, so

(๐œ•๐‘”)โˆ’1(๐‘ฅโˆ—) = {๐‘ฅ | ๐ดโˆ—๐ด๐‘ฅ = ๐ดโˆ—๐‘ + ๐‘ฅโˆ—}.

Therefore (27.38) is equivalent to

inf๐‘ฅ{โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ | ๐ดโˆ—๐ด๐‘ฅ = ๐ดโˆ—๐‘ + ๐‘ฅโˆ—} โ‰ค ^โ€–๐ดโˆ—๐ด๐‘ฅ โˆ’ (๐ดโˆ—๐‘ + ๐‘ฆ)โ€–๐‘‹

(๐‘ฅโˆ— โˆˆ ๐”น(0, ๐›ฟ); ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ)) .

If๐ดโˆ—๐ด has a bounded inverse (๐ดโˆ—๐ด)โˆ’1 โˆˆ ๐•ƒ(๐‘‹ ;๐‘‹ ), then we can take ^ = โ€–(๐ดโˆ—๐ด)โˆ’1โ€–๐•ƒ(๐‘‹ ;๐‘‹ )for any ๐›ฟ > 0. On the other hand, if ๐ดโˆ—๐ด is not surjective, then there cannot be metricregularity (simply take an appropriate choice of ๐‘ฅโˆ— outside ran๐ดโˆ—๐ด).

For a genuinely nonsmooth example, we consider the (academic) problem of minimizingthe (non-squared) norm on a Hilbert space.

Example 27.18 (tilt stability of least norm fiing). Let ๐‘‹ be a Hilbert space and ๐‘”(๐‘ฅ) =โ€–๐‘ฅ โˆ’ ๐‘งโ€–๐‘‹ for some ๐‘ง โˆˆ ๐‘‹ . To show tilt stability, we have to verify (27.38) for some^, ๐›ฟ > 0. For ๐‘ฅ โ‰  ๐‘ง, we have ๐œ•๐‘”(๐‘ฅ) = {(๐‘ฅ โˆ’ ๐‘ง)/โ€–๐‘ฅ โˆ’ ๐‘งโ€–๐‘‹ }, and for ๐‘ฅ = ๐‘ง, we have๐œ•๐‘”(๐‘ฅ) = ๐”น(0, 1). Thus (27.38) reads

dist(๐‘ฅ, (๐œ•๐‘”)โˆ’1(๐‘ฅโˆ—)) โ‰ค ^{ ๐‘ฅโˆ’๐‘ง

โ€–๐‘ฅโˆ’๐‘งโ€–๐‘‹ โˆ’ ๐‘ฅโˆ— ๐‘‹

if ๐‘ฅ โ‰  ๐‘ง,

dist(๐‘ฅโˆ—,๐”น(0, 1)) if ๐‘ฅ = ๐‘ง(27.39)

for all ๐‘ฅโˆ— โˆˆ ๐”น(0, ๐›ฟ) and ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ) where

dist(๐‘ฅ, (๐œ•๐‘”)โˆ’1(๐‘ฅโˆ—)) =dist(๐‘ฅ โˆ’ ๐‘ง, ๐‘ฅโˆ— [0,โˆž)) if โ€–๐‘ฅโˆ—โ€–๐‘‹ = 1,โ€–๐‘ฅ โˆ’ ๐‘งโ€–๐‘‹ if โ€–๐‘ฅโˆ—โ€–๐‘‹ < 1,โˆž if โ€–๐‘ฅโˆ—โ€–๐‘‹ > 1.

As the inequality cannot hold if โ€–๐‘ฅโˆ—โ€–๐‘‹ > 1, we take ๐›ฟ โˆˆ (0, 1] to ensure that this doesnot happen. If ๐‘ฅ = ๐‘ง, then (27.39) trivially holds for any ^ > 0, both sides being zero.For ๐‘ฅโˆ— โˆˆ ๐”น(0, ๐›ฟ) and ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ) \ {๐‘ง}, the inequality (27.39) reads

^

๐‘ฅ โˆ’ ๐‘งโ€–๐‘ฅ โˆ’ ๐‘งโ€–๐‘‹ โˆ’ ๐‘ฅโˆ—

๐‘‹

โ‰ฅ{dist(๐‘ฅ โˆ’ ๐‘ง, ๐‘ฅโˆ— [0,โˆž)) if โ€–๐‘ฅโˆ—โ€–๐‘‹ = 1,โ€–๐‘ฅ โˆ’ ๐‘งโ€–๐‘‹ if โ€–๐‘ฅโˆ—โ€–๐‘‹ < 1.

Choosing ๐‘ฅโˆ— = _(๐‘ฅ โˆ’ ๐‘ง)/โ€–๐‘ฅ โˆ’ ๐‘งโ€–๐‘‹ , and letting _โ†’1, we see that the inequality cannothold unless ๐›ฟ โˆˆ (0, 1) (which prevents _โ†’1). Thus, taking the inmum of the left-handside over โ€–๐‘ฅโˆ—โ€–๐‘‹ โ‰ค ๐›ฟ < 1 and the supremum of the right-hand side over ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ),the inequality holds if ^ (1 โˆ’ ๐›ฟ) โ‰ฅ ๐›ฟ . This can be satised for any ^ > 0 for sucientlysmall ๐›ฟ โˆˆ (0, 1).

361

27 lipschitz-like properties and stability

Since ๐‘ฅโˆ— โˆˆ ๐‘‹ is comparable to the tilt parameter ๐‘ โˆˆ ๐‘‹ , this says that we can onlystably โ€œtiltโ€ ๐‘” by an amount โ€–๐‘ โ€–๐‘‹ < 1. If we tilt with โ€–๐‘โ€–๐‘‹ > 1, the tilted function hasno minimizer, while for โ€–๐‘โ€–๐‘‹ = 1, every ๐‘ฅ = ๐‘ง + ๐‘ก๐‘ for ๐‘ก โ‰ฅ 0 is a minimizer.

We now return to the general solution mapping (27.36). The following proposition appliedto ๐น (๐‘ฅ, ๐‘) โ‰” ๐œ•๐‘ฅ ๐‘“ (๐‘ฅ ;๐‘) provides a general tool for our analysis.

Theorem 27.19. Let ๐‘ƒ , ๐‘‹ , and ๐‘Œ be reexive and Gรขteaux smooth Banach spaces. For ๐น :๐‘‹ ร— ๐‘ƒ โ†’ ๐‘Œ , let

๐‘† (๐‘) โ‰” {๐‘ฅ โˆˆ ๐‘‹ | 0 โˆˆ ๐น (๐‘ฅ, ๐‘)}.Then ๐‘† has the Aubin property at ๐‘ for ๐‘ฅ โˆˆ ๐‘† (๐‘) if(27.40) (0, ๐‘โˆ—) โˆˆ ๐ทโˆ—

๐‘ ๐น (๐‘ฅ, ๐‘ |0) (๐‘ฆโˆ—) โ‡’ ๐‘ฆโˆ— = 0, ๐‘โˆ— = 0

and๐‘„ (๐‘ฆ, ๐‘) โ‰” {๐‘ฅ โˆˆ ๐‘‹ | ๐‘ฆ โˆˆ ๐น (๐‘ฅ, ๐‘)}.

is PSNC at (๐‘ฆ, ๐‘) for ๐‘ฅ .

Proof. We have ๐‘† (๐‘) = ๐‘„ (0, ๐‘). Hence if we can show that ๐‘„ has the Aubin property at(0, ๐‘) for ๐‘ฅ , this will imply the Aubin property of ๐‘† at ๐‘ for ๐‘ฅ by simple restriction of thefree variables in Theorem 27.2 (i) to the subspace {0} ร— ๐‘ƒ .We do this by applying Theorem 27.10 to ๐‘„ , which holds if we can show that

๐ทโˆ—๐‘€๐‘„ (0, ๐‘ |๐‘ฅ) (0) = {0}.

By (27.27), a sucient assumption for this is that

๐ทโˆ—๐‘„ (0, ๐‘ |๐‘ฅ) (0) = {0},which can equivalently be expressed as

(27.41) (๐‘ฆโˆ—, ๐‘โˆ—, 0) โˆˆ ๐‘graph๐‘„ (0, ๐‘, ๐‘ฅ) โ‡’ ๐‘ฆโˆ— = 0, ๐‘โˆ— = 0.

Nowgraph๐‘„ = {(๐‘ฆ, ๐‘, ๐‘ฅ) | ๐‘ฆ โˆˆ ๐น (๐‘ฅ, ๐‘)} = ๐œ‹ graph ๐น

for the permutation ๐œ‹ (๐‘ฅ, ๐‘, ๐‘ฆ) โ‰” (๐‘ฆ, ๐‘, ๐‘ฅ) (which applied to a set should be understood asapplied to every element of that set). We thus also have

๐‘graph๐‘„ (๐‘ฆ, ๐‘, ๐‘ฅ) = ๐œ‹๐‘graph ๐น (๐œ‹ (๐‘ฆ, ๐‘, ๐‘ฅ)) .In particular, (27.41) becomes

(0, ๐‘โˆ—, ๐‘ฆโˆ—) โˆˆ ๐‘graph ๐น (๐‘ฅ, ๐‘, 0) โ‡’ ๐‘ฆโˆ— = 0, ๐‘โˆ— = 0.

But this is equivalent to (27.40). ๏ฟฝ

362

27 lipschitz-like properties and stability

Remark 27.20. Theorem 27.19 is related to the classical implicit function theorem. If ๐น is graphicallyregular at (๐‘ฅ, ๐‘, 0), it also possible derive formulas for ๐ท๐‘† , such as

๐ท๐‘† (๐‘ |๐‘ฅ) (ฮ”๐‘) = {ฮ”๐‘ฅ โˆˆ ๐‘‹ | ๐ท๐น (๐‘ฅ, ๐‘ |0) (ฮ”๐‘ฅ,ฮ”๐‘) 3 0}.

For details in nite dimensions, we refer to [Rockafellar & Wets 1998, Theorem 9.56 & Proposition8.41].

We close this chapter by illustrating the requirements of Theorem 27.19 for the stability ofspecic problems of the form (P) with respect to the penalty parameter ๐›ผ . (Naturally, thesecan be relaxed or made further explicit in more concrete situations.)

Example 27.21. Let ๐‘‹ be a nite-dimensional and Gรขteaux smooth Banach space andlet โ„Ž : ๐‘‹ โ†’ โ„ be twice continuously dierentiable and ๐‘” : ๐‘‹ โ†’ โ„ be convex, proper,and lower semicontinuous. We then consider for ๐›ผ > 0 the problem

(27.42) min๐‘ฅโ„Ž(๐‘ฅ) + ๐›ผ๐‘”(๐‘ฅ).

By Theorems 13.4, 13.5 and 13.20, the solution mapping with respect to the parameter ๐›ผis then

๐‘† (๐›ผ) โ‰” {๐‘ฅ โˆˆ โ„๐‘› | 0 โˆˆ ๐น (๐‘ฅ ;๐›ผ)} for ๐น (๐‘ฅ ;๐›ผ) โ‰” โ„Žโ€ฒ(๐‘ฅ) + ๐›ผ๐œ•๐‘”(๐‘ฅ).

To apply Theorem 27.19 to obtain the Aubin property of ๐‘† at ๐›ผ for ๐‘ฅ โˆˆ ๐‘† (๐›ผ), we need toverify its assumptions. First, by Theorems 25.14 and 25.20, we have

๐ทโˆ—๐น (๐‘ฅ ;๐›ผ |0) (๐‘ฆ) =(โ„Žโ€ฒโ€ฒ(๐‘ฅ)โˆ—๐‘ฆ + ๐›ผ๐ทโˆ— [๐œ•๐‘”] (๐‘ฅ | โˆ’ ๐›ผโˆ’1โ„Žโ€ฒ(๐‘ฅ)) (๐‘ฆ)

โˆ’ใ€ˆโ„Žโ€ฒ(๐‘ฅ), ๐‘ฆใ€‰๐‘‹

).

Thus (27.40) holds if

(27.43) 0 โˆˆ โ„Žโ€ฒโ€ฒ(๐‘ฅ)โˆ—๐‘ฆ + ๐›ผ๐ทโˆ— [๐œ•๐‘”] (๐‘ฅ | โˆ’ ๐›ผโˆ’1โ„Žโ€ฒ(๐‘ฅ)) (๐‘ฆ) โ‡’ ๐‘ฆ = 0.

Furthermore, since ๐‘‹ โˆ— ร—โ„ is nite-dimensional, the PSNC holds at every (๐‘ฆ, ๐›ผ) with๐‘ฆ โˆˆ ๐น (๐‘ฅ, ๐›ผ) and ๐›ผ > 0 by Lemma 25.5. Hence Theorem 27.19 is indeed applicable andimplies that ๐‘† has the Aubin property at ๐›ผ . We therefore have the stability estimate

inf๐‘ฅโˆˆ๐‘† (๐›ผ)

โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค ^ |๐›ผ โˆ’ ๐›ผ |

for some ^ > 0 and all ๐›ผ suciently close to ๐›ผ .

363

28 FASTER CONVERGENCE FROM REGULARITY

As we have seen in Chapter 10, proximal point and splitting methods can be accelerated ifat least one of the involved functionals is strongly convex. However, this can be a too strongrequirement, and we will show in this chapter how we can obtain faster convergence (evenwithout acceleration) under the weaker requirement of metric subregularity or of strongsubmonotonicity. We rst study these notions in general before illustrating their eect onsplitting methods by showing local linear convergence of forward-backward splitting.

28.1 subregularity and submonotonicity of subdifferentials

Throughout this section, let ๐‘‹ be a Banach space and ๐บ : ๐‘‹ โ†’ โ„ be convex, proper, andlower semicontinuous. Our goal is now to give conditions for metric subregularity andstrong submonotonicity of ๐œ•๐บ : ๐‘‹ โ‡’ ๐‘‹ โˆ— at a critical point ๐‘ฅ โˆˆ ๐‘‹ with 0 โˆˆ ๐œ•๐บ (๐‘ฅ).

metric subregularity

We recall from (27.3) that a set-valued mapping ๐ป : ๐‘‹ โ‡’ ๐‘‹ โˆ— is metrically subregular at๐‘ฅ โˆˆ ๐‘‹ for๐‘ค โˆˆ ๐‘‹ โˆ— if there exist ๐›ฟ > 0 and ^ > 0 such that

dist(๐‘ฅ, ๐ปโˆ’1(๐‘ค)) โ‰ค ^ dist(๐‘ค,๐ป (๐‘ฅ)) (๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ)) .

We also recall that the inmum of all ^ > 0 for which this inequality holds for some ๐›ฟ > 0is subreg๐ป (๐‘ฅ |๐‘ค), the modulus of (metric) subregularity of ๐ป at ๐‘ฅ for๐‘ค . In the following,we will also make use of the squared distance of ๐‘ฅ โˆˆ ๐‘‹ to a set ๐ด โŠ‚ ๐‘‹ ,

dist2(๐‘ฅ,๐ด) โ‰” inf๐‘ฅโˆˆ๐ด

โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ .

Theorem 28.1. Let ๐บ : ๐‘‹ โ†’ โ„ be convex, proper, and lower semicontinuous and let ๐‘ฅ โˆˆ ๐‘‹with 0 โˆˆ ๐œ•๐บ (๐‘ฅ). If there exist ๐›พ > 0 and ๐›ฟ > 0 such that

(28.1) ๐บ (๐‘ฅ) โ‰ฅ ๐บ (๐‘ฅ) + ๐›พ dist2(๐‘ฅ, [๐œ•๐บ]โˆ’1(0)) (๐‘ฅ โˆˆ ๐”น๐‘‹ (๐‘ฅ, ๐›ฟ)),

then ๐œ•๐บ is metrically subregular at ๐‘ฅ for 0 with subreg ๐œ•๐บ (๐‘ฅ |0) โ‰ค ๐›พโˆ’1.

364

28 faster convergence from regularity

Conversely, if subreg ๐œ•๐บ (๐‘ฅ |0) โ‰ค 4๐›พโˆ’1 for some ๐›พ > 0, then there exists ๐›ฟ > 0 such that (28.1)holds for this ๐›พ .

Proof. Let rst (28.1) hold for ๐›พ, ๐›ฟ > 0. We need to show that

(28.2) ๐›พ dist(๐‘ฅ, [๐œ•๐บ]โˆ’1(0)) โ‰ค dist(0, ๐œ•๐บ (๐‘ฅ)) (๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ)) .To that end, let ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ). Clearly, if ๐œ•๐บ (๐‘ฅ) = โˆ…, there is nothing to prove. So assume thatthere exists an ๐‘ฅโˆ— โˆˆ ๐œ•๐บ (๐‘ฅ). For each Y > 0, we can also nd ๐‘ฅY โˆˆ [๐œ•๐บ]โˆ’1(0) such that

(28.3) โ€–๐‘ฅ โˆ’ ๐‘ฅY โ€–๐‘‹ โ‰ค dist(๐‘ฅ, [๐œ•๐บ]โˆ’1(0)) + Y.By the denition of the convex subdierential and ๐‘ฅ, ๐‘ฅY โˆˆ argmin๐บ , we have

ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅYใ€‰๐‘‹ โ‰ฅ ๐บ (๐‘ฅ) โˆ’๐บ (๐‘ฅY) = ๐บ (๐‘ฅ) โˆ’๐บ (๐‘ฅ).Combined with (28.1) and (28.3), this yields

๐›พ dist2(๐‘ฅ, [๐œ•๐บ]โˆ’1(0)) โ‰ค ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅYใ€‰๐‘‹โ‰ค โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ€–๐‘ฅ โˆ’ ๐‘ฅY โ€–๐‘‹ โ‰ค โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— (dist(๐‘ฅ, [๐œ•๐บ]โˆ’1(0)) + Y).

Since Y > 0 was arbitrary and โ€–๐‘ฅโˆ—โ€–๐‘‹ โˆ— โ‰ค dist(0, ๐œ•๐บ (๐‘ฅ)), we obtain (28.2) and thus metricsubregularity with the claimed modulus.

Conversely, let ๐œ•๐บ be metrically subregular at ๐‘ฅ for 0 with subreg ๐œ•๐บ (๐‘ฅ |0) = 4/๐›พ for some๐›พ > 0. Let ๐›พ โˆˆ (0, 1/(4๐›พ)). We argue by contradiction. Assume that (28.1) does not hold forarbitrary ๐›ฟ > 0. Then we can nd some ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, 2๐›ฟ/3) such that

(28.4) ๐บ (๐‘ฅ) < ๐บ (๐‘ฅ) + ๐›พ dist2(๐‘ฅ, [๐œ•๐บ]โˆ’1(0)) .However, ๐‘ฅ is a minimizer of ๐บ , so necessarily ๐›พ dist2(๐‘ฅ, [๐œ•๐บ]โˆ’1(0)) > 0. By Ekelandโ€™svariational principle (Theorem 2.14), we can thus nd ๐‘ฆ โˆˆ ๐‘‹ satisfying

(28.5) โ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค 12 dist(๐‘ฅ, [๐œ•๐บ]

โˆ’1(0))and for all ๐‘ฅ โˆˆ ๐‘‹ that

๐บ (๐‘ฅ) โ‰ฅ ๐บ (๐‘ฆ) โˆ’ ๐›พ dist2(๐‘ฅ, [๐œ•๐บ]โˆ’1(0))

12 dist(๐‘ฅ, [๐œ•๐บ]โˆ’1(0))

โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–๐‘‹ = ๐บ (๐‘ฆ) โˆ’ 2๐›พ dist(๐‘ฅ, [๐œ•๐บ]โˆ’1(0))โ€–๐‘ฅ โˆ’ ๐‘ฆ โ€–๐‘‹ .

In follows that ๐‘ฆ minimizes๐บ + 2๐›พ dist(๐‘ฅ, [๐œ•๐บ]โˆ’1(0))โ€– ยท โˆ’ ๐‘ฆ โ€–๐‘‹ , which by Theorems 4.2, 4.6and 4.14 is equivalent to 0 โˆˆ ๐œ•๐บ (๐‘ฆ) + 2๐›พ dist(๐‘ฅ, [๐œ•๐บ]โˆ’1(0))๐”น๐‘‹ โˆ— . Hence we can nd some๐‘ฆโˆ— โˆˆ ๐œ•๐บ (๐‘ฆ) satisfying โ€–๐‘ฆโˆ—โ€–๐‘‹ โˆ— โ‰ค 2๐›พ dist(๐‘ฅ, [๐œ•๐บ]โˆ’1(0)). Using (28.5), we now obtain

2๐›พ dist(0, ๐œ•๐บ (๐‘ฆ)) < (2๐›พ)โˆ’1 dist(0, ๐œ•๐บ (๐‘ฆ))โ‰ค (2๐›พ)โˆ’1โ€–๐‘ฆโˆ—โ€–๐‘‹ โˆ— โ‰ค dist(๐‘ฅ, [๐œ•๐บ]โˆ’1(0))= 2 dist(๐‘ฅ, [๐œ•๐บ]โˆ’1(0)) โˆ’ dist(๐‘ฅ, [๐œ•๐บ]โˆ’1(0))โ‰ค 2โ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–๐‘‹ + 2 dist(๐‘ฆ, [๐œ•๐บ]โˆ’1(0)) โˆ’ dist(๐‘ฅ, [๐œ•๐บ]โˆ’1(0))โ‰ค 2 dist(๐‘ฆ, [๐œ•๐บ]โˆ’1(0)) .

365

28 faster convergence from regularity

By (28.5) and our choice of ๐‘ฅ โˆˆ ๐”น(๐‘ฅ, 2๐›ฟ/3),

โ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค โ€–๐‘ฆ โˆ’ ๐‘ฅ โ€–๐‘‹ + โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค 32 โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ค ๐›ฟ.

Therefore ๐‘ฆ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ) violates the assumed metric subregularity (28.2) with the factor ๐›พ ,and hence (28.1) holds. ๏ฟฝ

Applying Theorem 28.1 to ๐‘ฅ โ†ฆโ†’ ๐บ (๐‘ฅ) + ใ€ˆ๐‘ฅโˆ—, ๐‘ฅใ€‰๐‘‹ now yields the following characterizationdue to [Aragรณn Artacho & Georoy 2014].

Corollary 28.2. Let๐บ : ๐‘‹ โ†’ โ„ be convex, proper, and lower semicontinuous and let ๐‘ฅ โˆˆ ๐‘‹and ๐‘ฅโˆ— โˆˆ ๐œ•๐บ (๐‘ฅ). If there exist ๐›พ > 0 and ๐›ฟ > 0 such that

(28.6) ๐บ (๐‘ฅ) โ‰ฅ ๐บ (๐‘ฅ) + ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ + ๐›พ dist2(๐‘ฅ, [๐œ•๐บ]โˆ’1(๐‘ฅโˆ—)) (๐‘ฅ โˆˆ ๐”น๐‘‹ (๐‘ฅ, ๐›ฟ)),

then ๐œ•๐บ is metrically subregular at ๐‘ฅ for ๐‘ฅโˆ— with subreg ๐œ•๐บ (๐‘ฅ |๐‘ฅโˆ—) โ‰ค ๐›พโˆ’1.Conversely, if subreg ๐œ•๐บ (๐‘ฅ |๐‘ฅโˆ—) โ‰ค 4/๐›พ for some ๐›พ > 0, then there exists ๐›ฟ > 0 such that (28.6)holds for this ๐›พ .

Remark 28.3 (strong metric subregularity). As in Remark 27.1, we can also characterize strong metricsubregularity using a strong notion of local subdierentiability. In the setting of Corollary 28.2, itwas shown in [Aragรณn Artacho & Georoy 2014] that strong metric subregularity of ๐œ•๐บ at ๐‘ฅ for ๐‘ฅโˆ—is equivalent to

(28.7) ๐บ (๐‘ฅ) โ‰ฅ ๐บ (๐‘ฅ) + ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ + ๐›พ โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ (๐‘ฅ โˆˆ ๐”น๐‘‹ (๐‘ฅ, ๐›ฟ)),

i.e., a local form of strong subdierentiability. Compared to the characterization of metric subregu-larity in (28.6), intuitively the strong version does not โ€œsqueezeโ€ [๐œ•๐บ]โˆ’1(๐‘ฅโˆ—) into a single point.

Strong metric subregularity may almost trivially be used in the convergence proofs of Part IIand Chapter 15 as a relaxation of strong convexity; compare [Clason, Mazurenko & Valkonen 2020].Also observe that (28.7) can be expressed in terms of the Bregman divergence (see Section 11.1) as

๐ต๐‘ฅโˆ—๐บ (๐‘ฅ, ๐‘ฅ) โ‰ฅ ๐›พ โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ (๐‘ฅ โˆˆ ๐”น๐‘‹ (๐‘ฅ, ๐›ฟ)),

i.e., that ๐ต๐‘ฅโˆ—๐บ is elliptic at ๐‘ฅ in the sense of [Valkonen 2020a]. In optimization methods based onpreconditioning by Bregman divergences instead of the linear preconditioner ๐‘€ as discussed inSection 11.1, this generalizes the positive deniteness requirement on๐‘€ .

366

28 faster convergence from regularity

strong submonotonicity

An alternative is to relax the strong monotonicity assumption of Chapter 10 more directly.We say that a set-valued mapping ๐ป : ๐‘‹ โ‡’ ๐‘‹ โˆ— is (๐›พ, \ )-strongly submonotone at ๐‘ฅ for๐‘ฅโˆ— โˆˆ ๐ป (๐‘ฅ) with \ โ‰ฅ ๐›พ > 0 if there exists ๐›ฟ > 0 such that for all ๐‘ฅ โˆˆ ๐”น๐‘‹ (๐‘ฅ, ๐›ฟ) and๐‘ฅโˆ— โˆˆ ๐ป (๐‘ฅ) โˆฉ ๐”น๐‘‹ โˆ— (๐‘ฅโˆ—, ๐›ฟ),

(28.8) inf๐‘ฅโˆˆ๐ปโˆ’1 (๐‘ฅโˆ—)

(ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ + (\ โˆ’ ๐›พ)โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹) โ‰ฅ \ dist2(๐‘ฅ, ๐ปโˆ’1(๐‘ฅโˆ—)) .

If this only holds for \ โ‰ฅ ๐›พ = 0, then we call ๐ป submonotone at ๐‘ฅ for ๐‘ฅโˆ—.

Clearly, (strong) monotonicity (see Lemma 7.4) implies (strong) submonotonicity at any๐‘ฅ โˆˆ ๐‘‹ and ๐‘ฅโˆ— โˆˆ ๐ป (๐‘ฅ). However, subdierentials of convex functionals need not be stronglymonotone. The next theorem shows that local second-order growth away from the set ofminimizers implies strong submonotonicity of such subdierentials at any minimizer ๐‘ฅ for๐‘ฅโˆ— = 0, which is the monotonicity-based analogue of the characterization Theorem 28.1 ofmetric subregularity.

Theorem 28.4. Let ๐บ : ๐‘‹ โ†’ โ„ be convex, proper, and lower semicontinuous and let ๐‘ฅ โˆˆ ๐‘‹with 0 โˆˆ ๐œ•๐บ (๐‘ฅ). If there exists ๐›ฟ > 0 such that

(28.9) ๐บ (๐‘ฅ) โ‰ฅ ๐บ (๐‘ฅ) + ๐›พ dist2(๐‘ฅ, [๐œ•๐บ]โˆ’1(0)) (๐‘ฅ โˆˆ ๐”น๐‘‹ (๐‘ฅ, ๐›ฟ)),

then ๐œ•๐บ is (๐›พ, \ )-strongly submonotone at ๐‘ฅ for 0 for any \ โ‰ฅ ๐›พ .

Proof. Since \ โ‰ฅ ๐›พ , (28.9) is equivalent to

(28.10) inf๐‘ฅโˆˆ[๐œ•๐บ]โˆ’1 (0)

(๐บ (๐‘ฅ) โˆ’๐บ (๐‘ฅ) + (\ โˆ’ ๐›พ)โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹

) โ‰ฅ \ dist2(๐‘ฅ, [๐œ•๐บ]โˆ’1(0))for all ๐‘ฅ โˆˆ ๐”น๐‘‹ (๐‘ฅ, ๐›ฟ). By the denition of the convex subdierential, we have for all ๐‘ฅ โˆˆ[๐œ•๐บ]โˆ’1(0) and ๐‘ฅโˆ— = 0 that

ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐บ (๐‘ฅ) โˆ’๐บ (๐‘ฅ) = ๐บ (๐‘ฅ) โˆ’๐บ (๐‘ฅ).

Inserting this into (28.10) yields the denition (28.8) of strong submonotonicity for ๐ป =๐œ•๐บ . ๏ฟฝ

TogetherwithTheorem 28.1, this shows that for convex subdierentials,metric subregularityimplies strong submonotonicity, which is thus a weaker property.

Remark 28.5. Submonotonicity was introduced in [Valkonen 2021] together with its โ€œpartialโ€ (i.e.,only holding on subspace) variants.

367

28 faster convergence from regularity

examples

We conclude this section by showing that the subdierentials of the indicator functionalof the nite-dimensional unit ball and of the absolute value function are both subreg-ular and strongly submonotone. Note that neither of these subdierentials is stronglymonotone in the conventional sense. Here we restrict ourselves to showing (๐›พ,๐›พ)-strongsubmonotonicity for some (๐‘ฅ, ๐‘ฅโˆ—) โˆˆ graph ๐œ•๐บ , i.e., that there exists ๐›ฟ > 0 such that

(28.11) ใ€ˆ๐‘ฅโˆ— โˆ’ ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ ๐›พ dist2(๐‘ฅ, [๐œ•๐บ]โˆ’1(๐‘ฅโˆ—)) (๐‘ฅ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ), ๐‘ฅโˆ— โˆˆ ๐œ•๐บ (๐‘ฅ)) .

Lemma 28.6. Let ๐บ โ‰” ๐›ฟ๐”น(0,๐›ผ) on (โ„๐‘ , โ€– ยท โ€–2) and (๐‘ฅ, ๐‘ฅโˆ—) โˆˆ graph ๐œ•๐บ . Then ๐œ•๐บ is

(i) metrically subregular at ๐‘ฅ for ๐‘ฅโˆ— for any ๐›ฟ โˆˆ (0, ๐›ผ] and

^ โ‰ฅ{2๐›ผ/โ€–๐‘ฅโˆ—โ€–2 if ๐‘ฅโˆ— โ‰  0,0 if ๐‘ฅโˆ— = 0;

(ii) (๐›พ,๐›พ)-strongly submonotone at ๐‘ฅ for ๐‘ฅโˆ— for any ๐›ฟ > 0 and

๐›พ โ‰ค{โ€–๐‘ฅโˆ—โ€–2/(2๐›ผ) if ๐‘ฅโˆ— โ‰  0,โˆž if ๐‘ฅโˆ— = 0.

Proof. We rst verify (28.6) for ๐›ฟ = ๐›ผ and ๐›พ = ^โˆ’1 as stated. To that end, let ๐‘ฅ โˆˆ ๐”น(0, ๐›ผ). If๐‘ฅโˆ— = 0, then (28.6) trivially holds by the subdierentiability of๐บ and dist2(๐‘ฅ, [๐œ•๐บ]โˆ’1(๐‘ฅโˆ—)) =dist2(๐‘ฅ,๐”น(0, ๐›ผ)) = 0. Let therefore ๐‘ฅโˆ— โ‰  0. Then [๐œ•๐บ]โˆ’1(๐‘ฅโˆ—) = {๐‘ฅ} as well as โ€–๐‘ฅ โ€–2 = ๐›ผ and๐‘ฅโˆ— = ๐›ฝ๐‘ฅ for ๐›ฝ = โ€–๐‘ฅโˆ—โ€–2/โ€–๐‘ฅ โ€–2. Since ๐›พ โ‰ค โ€–๐‘ฅโˆ—โ€–2/(2๐›ผ), we have ๐›ฝ โ‰ฅ 2๐›พ . Then โ€–๐‘ฅ โ€–2 โ‰ค ๐›ผ yields

๐›พ dist2(๐‘ฅ, [๐œ•๐บ]โˆ’1(๐‘ฅโˆ—)) = ๐›พ โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–22โ‰ค ๐›ฝ ใ€ˆ๐‘ฅ, ๐‘ฅ โˆ’ ๐‘ฅใ€‰2 โˆ’ ๐›ฝ

2 โ€–๐‘ฅ โ€–22 +

๐›ฝ

2 โ€–๐‘ฅ โ€–22

โ‰ค ๐›ฝ ใ€ˆ๐‘ฅ, ๐‘ฅ โˆ’ ๐‘ฅใ€‰2= ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰2โ‰ค ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰2 +๐บ (๐‘ฅ) โˆ’๐บ (๐‘ฅ).

Since dom๐บ = ๐”น(0, ๐›ผ), this shows that (28.6) holds for any ๐›ฟ > 0.

Corollary 28.2 now yields (i). Adding

๐บ (๐‘ฅ) โˆ’๐บ (๐‘ฅ) โ‰ฅ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰2 (๐‘ฅโˆ— โˆˆ ๐œ•๐บ (๐‘ฅ))

to (28.6), we also obtain (28.11) and thus (ii). ๏ฟฝ

Lemma 28.7. Let ๐บ โ‰” | ยท | on โ„ and (๐‘ฅ, ๐‘ฅโˆ—) โˆˆ graph ๐œ•๐บ . Then ๐œ•๐บ is

368

28 faster convergence from regularity

(i) metrically subregular at ๐‘ฅ for ๐‘ฅโˆ— for any ^ > 0 and

๐›ฟ โ‰ค{2^ if ๐‘ฅโˆ— โˆˆ {1,โˆ’1},^ if |๐‘ฅโˆ— | < 1;

(ii) (๐›พ,๐›พ)-strongly submonotone at ๐‘ฅ for ๐‘ฅโˆ— for any ๐›พ > 0 and

๐›ฟ โ‰ค{2๐›พโˆ’1 if ๐‘ฅโˆ— โˆˆ {1,โˆ’1},๐›พโˆ’1 if |๐‘ฅโˆ— | < 1.

Proof. We rst verify (28.6) for any ๐›ฟ > 0 and ๐›พ = ^โˆ’1 as stated. Suppose rst that ๐‘ฅโˆ— = 1so that ๐‘ฅ โˆˆ [๐œ•๐บ]โˆ’1(๐‘ฅโˆ—) = [0,โˆž). This implies that ๐‘ฅ = |๐‘ฅ |, and hence (28.6) becomes

|๐‘ฅ | โ‰ฅ ๐‘ฅ + ๐›พ inf๐‘ฅโ‰ฅ0

(๐‘ฅ โˆ’ ๐‘ฅ)2 ( |๐‘ฅ โˆ’ ๐‘ฅ | โ‰ค ๐›ฟ).

If ๐‘ฅ โ‰ฅ 0, this trivially holds by taking ๐‘ฅ = ๐‘ฅ . If ๐‘ฅ โ‰ค 0, the right-hand side is minimized by๐‘ฅ = 0, and thus the inequality holds for ๐‘ฅ โ‰ฅ โˆ’2๐›พโˆ’1. Since ๐‘ฅ โ‰ฅ 0, this is guaranteed by ourbound on ๐›ฟ . The case ๐‘ฅโˆ— = โˆ’1 is analogous.If |๐‘ฅโˆ— | < 1, then ๐‘ฅ โˆˆ [๐œ•๐บ]โˆ’1(๐‘ฅโˆ—) = {0}, and hence (28.6) becomes

|๐‘ฅ | โ‰ฅ ๐›พ |๐‘ฅ |2 ( |๐‘ฅ | โ‰ค ๐›ฟ).This again holds by our choice of ๐›ฟ .

Corollary 28.2 now yields (i). Adding

๐บ (๐‘ฅ) โˆ’๐บ (๐‘ฅ) โ‰ฅ ใ€ˆ๐‘ฅโˆ—, ๐‘ฅ โˆ’ ๐‘ฅใ€‰ (๐‘ฅโˆ— โˆˆ ๐œ•๐บ (๐‘ฅ))to (28.6), we also obtain (28.11) and thus (ii). ๏ฟฝ

Remark 28.8. If we allow in the denition of subregularity or submonotonicity an arbitrary neigh-borhood of ๐‘ฅ instead of a ball, then Lemma 28.7 holds in a much larger neighborhood.

28.2 local linear convergence of explicit splitting

Returning to the notation used in Chapters 8 to 12, we assume throughout that๐‘‹ is a Hilbertspace, ๐น,๐บ : ๐‘‹ โ†’ โ„ are convex, proper, and lower semicontinuous, and that ๐น is Frรฉchetdierentiable and has a Lipschitz continuous gradient โˆ‡๐น with Lipschitz constant ๐ฟ โ‰ฅ 0.Let further an initial iterate ๐‘ฅ0 โˆˆ ๐‘‹ and a step size ๐œ > 0 be given and let the sequence{๐‘ฅ๐‘˜}๐‘˜โˆˆโ„• be generated by the forward-backward splitting method (or basic proximal pointmethod if ๐น = 0), i.e., by solving for ๐‘ฅ๐‘˜+1 in

(28.12) 0 โˆˆ ๐œ [๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜)] + (๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜).

369

28 faster convergence from regularity

We also write ๐ป โ‰” ๐œ•๐บ + โˆ‡๐น : ๐‘‹ โ‡’ ๐‘‹ . Finally, it is worth recalling the approach ofChapter 10 for encoding convergence rate into โ€œtestingโ€ parameters ๐œ‘๐‘˜ > 0.

We start our analysis by adapting the proofs of Theorems 10.2 and 11.4 to employ thesquared distance function ๐‘ฅ โ†ฆโ†’ dist2(๐‘ฅ ;๐‘‹ ) to the entire solution set ๐‘‹ = ๐ปโˆ’1(0) in placeof the squared distance function ๐‘ฅ โ†ฆโ†’ โ€–๐‘ฅ โˆ’ ๐‘ฅ โ€–2๐‘‹ to a xed ๐‘ฅ โˆˆ ๐ปโˆ’1(0).

Lemma 28.9. Let ๐‘‹ โŠ‚ ๐‘‹ . If for all ๐‘˜ โˆˆ โ„• and๐‘ค๐‘˜+1 โ‰” โˆ’โˆ‡๐น (๐‘ฅ๐‘˜) โˆ’ ๐œโˆ’1(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜) โˆˆ ๐œ•๐บ (๐‘ฅ๐‘˜+1),

(28.13) inf๐‘ฅโˆˆ๐‘‹

(๐œ‘๐‘˜2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐œ‘๐‘˜๐œ ใ€ˆ๐‘ค๐‘˜+1 + โˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹

)โ‰ฅ ๐œ‘๐‘˜+1

2 dist2(๐‘ฅ๐‘˜+1, ๐‘‹ ) โˆ’ ๐œ‘๐‘˜2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ ,

then

(28.14) ๐œ‘๐‘2 dist2(๐‘ฅ๐‘ , ๐‘‹ ) โ‰ค ๐œ‘1

2 dist2(๐‘ฅ0, ๐‘‹ ) (๐‘ โ‰ฅ 1).

Proof. Inserting (28.12) into (28.13) yields

(28.15) inf๐‘ฅโˆˆ๐ปโˆ’1 (0)

๐œ‘๐‘˜

(12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ + 12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ โˆ’ ใ€ˆ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ , ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹)

โ‰ฅ ๐œ‘๐‘˜+12 dist2(๐‘ฅ๐‘˜+1;๐ปโˆ’1(0)) .

Using the three-point formula (9.1), we can then rewrite (28.15) as๐œ‘๐‘˜2 dist2(๐‘ฅ๐‘˜ ;๐ปโˆ’1(0)) โ‰ฅ ๐œ‘๐‘˜+1

2 dist2(๐‘ฅ๐‘˜+1;๐ปโˆ’1(0)) .

The claim now follows by a telescoping sum over ๐‘˜ = 0, . . . , ๐‘ โˆ’ 1. ๏ฟฝ

rates from error bounds and metric subregularity

Our rst approach for the satisfaction of (28.13) is based on error bounds, which we willprove using metric subregularity. The essence of error bounds is to prove for some \ > 0that

โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–๐‘‹ โ‰ฅ \ โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–๐‘‹ .We slightly weaken this condition, and assume the bound to be relative to the entire solutionset, i.e.,

(28.16) โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ฅ \ dist2(๐‘ฅ๐‘˜+1;๐ปโˆ’1(0)) .

This bound holds under metric subregularity. We rst need the following technical lemmaon the iteration (28.12).

370

28 faster convergence from regularity

Lemma 28.10. If 0 < ๐œ๐ฟ < 1, then

12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ฅ ๐œ2

4[1 + ๐ฟ2๐œ2] dist2(0, ๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜+1)) .

Proof. Since โˆ’(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜) โˆˆ ๐œ [๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜)] by (28.12), we have

(28.17) 12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ =12 dist

2(0, {โˆ’(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜)}) โ‰ฅ 12 dist

2(0, ๐œ [๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜)]).

The generalized Youngโ€™s inequality for any ๐›ผ โˆˆ (0, 1) then yields

12 dist

2(0, ๐œ [๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜)])

=๐œ2

2 dist2(โˆ‡๐น (๐‘ฅ๐‘˜+1) โˆ’ โˆ‡๐น (๐‘ฅ๐‘˜), ๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜+1))

= inf๐‘žโˆˆ๐œ•๐บ (๐‘ฅ๐‘˜+1)

๐œ2

2 โ€–(โˆ‡๐น (๐‘ฅ๐‘˜+1) โˆ’ โˆ‡๐น (๐‘ฅ๐‘˜)) โˆ’ (๐‘ž + โˆ‡๐น (๐‘ฅ๐‘˜+1))โ€–2๐‘‹

โ‰ฅ ๐œ2(1 โˆ’ ๐›ผโˆ’1)2 โ€–โˆ‡๐น (๐‘ฅ๐‘˜+1) โˆ’ โˆ‡๐น (๐‘ฅ๐‘˜)โ€–2๐‘‹ + inf

๐‘žโˆˆ๐œ•๐บ (๐‘ฅ๐‘˜+1)๐œ2(1 โˆ’ ๐›ผ)

2 โ€–๐‘ž + โˆ‡๐น (๐‘ฅ๐‘˜+1)โ€–2๐‘‹

โ‰ฅ ๐œ2(1 โˆ’ ๐›ผโˆ’1)๐ฟ22 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ + ๐œ

2(1 โˆ’ ๐›ผ)2 dist2(0, ๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜+1)),

where we have used in the last step that 1 โˆ’ ๐›ผโˆ’1 < 0 and that โˆ‡๐น is Lipschitz continuous.Combining this estimate with (28.17), we obtain that

1 โˆ’ ๐œ2(1 โˆ’ ๐›ผโˆ’1)๐ฟ22 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ฅ ๐œ2(1 โˆ’ ๐›ผ)

2 dist2(0, ๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜+1)) .

Rearranging and using that 1 > ๐œ2(1 โˆ’ ๐›ผโˆ’1)๐ฟ2 by assumption then yields

12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ฅ \

2 dist2(0, ๐œ•๐บ (๐‘ฅ๐‘˜+1) + โˆ‡๐น (๐‘ฅ๐‘˜+1)) .

for\ โ‰”

๐œ2(1 โˆ’ ๐›ผ)1 โˆ’ ๐œ2(1 โˆ’ ๐›ผโˆ’1)๐ฟ2 ,

which for ๐›ผ = 1/2 yields the claim. ๏ฟฝ

Metric subregularity then immediately yields the error bound (28.16).

Lemma 28.11. Let๐ป be metrically subregular at ๐‘ฅ for๐‘ค = 0 for^ > 0 and ๐›ฟ > 0. If 0 < ๐œ๐ฟ โ‰ค 2and ๐‘ฅ๐‘˜+1 โˆˆ ๐”น(๐‘ฅ, ๐›ฟ), then (28.16) holds with \ = ๐œ2

2^2 [1+๐ฟ2๐œ2] .

371

28 faster convergence from regularity

Proof. Combining Lemma 28.10 and the denition of metric subregularity yields12 โ€–๐‘ฅ

๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ฅ ๐œ2

4[1 + ๐ฟ2๐œ2] dist2(0, ๐ป (๐‘ฅ๐‘˜+1)) โ‰ฅ ๐œ2

4^2 [1 + ๐ฟ2๐œ2] dist2(๐‘ฅ๐‘˜+1, ๐ปโˆ’1(0)) . ๏ฟฝ

From this lemma,we now obtain local linear convergence of the forward-backward splittingmethod when ๐ป is metrically subregular at a solution.

Theorem 28.12. Let ๐ป be metrically subregular at ๐‘ฅ โˆˆ ๐ปโˆ’1(0) for๐‘ค = 0 for ^ > 0 and ๐›ฟ > 0.If 0 < ๐œ๐ฟ โ‰ค 2 and ๐‘ฅ๐‘˜ โˆˆ ๐”น(๐‘ฅ, ๐›ฟ) for all ๐‘˜ โˆˆ โ„•, then (28.14) holds for ๐œ‘๐‘˜+1 โ‰” ๐œ‘๐‘˜ (1 + \ ) and๐œ‘0 = 1 with \ = ๐œ2

2^2 [1+๐ฟ2๐œ2] . In particular, dist2(๐‘ฅ๐‘ ;๐ปโˆ’1(0)) โ†’ 0 at a linear rate.

Proof. Let ๐‘ฅ โˆˆ ๐ปโˆ’1(0) and๐‘ค๐‘˜+1 โˆˆ ๐œ•๐บ (๐‘ฅ๐‘˜+1) as in Lemma 28.9. From (10.11) in the proof ofTheorem 10.2 and using ๐œ๐ฟ โ‰ค 2, we obtain

ใ€ˆ๐‘ค๐‘˜+1 + โˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ โˆ’๐ฟ4 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โ‰ฅ โˆ’ 1

2๐œ โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹

Lemma 28.11 now yields the error bound (28.16) and hence for all ๐‘ฅ โˆˆ ๐ปโˆ’1(0) that๐œ‘๐‘˜2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ + ๐œ‘๐‘˜+1 โˆ’ ๐œ‘๐‘˜\2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–๐‘‹ โ‰ฅ ๐œ‘๐‘˜+1

2 dist2(๐‘ฅ๐‘˜+1;๐ปโˆ’1(0)) .Summing these two estimates yields๐œ‘๐‘˜2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ + ๐œ‘๐‘˜2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐œ‘๐‘˜๐œ ใ€ˆ๐‘ค๐‘˜+1 + โˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹

โ‰ฅ ๐œ‘๐‘˜+12 dist2(๐‘ฅ๐‘˜+1;๐ปโˆ’1(0)) .

Taking the inmum over ๐‘ฅ โˆˆ ๐ปโˆ’1(0), we obtain (28.13) for ๐‘‹ = ๐ปโˆ’1(0). The claim nowfollows from Lemma 28.9 and the exponential growth of ๐œ‘๐‘˜ . ๏ฟฝ

The convergence is local due to the requirement ๐‘ฅ๐‘˜+1 โˆˆ ๐”น(๐‘ฅ, ๐›ฟ) for applying subregularity.In nite dimensions, the weak convergence result of Theorem 9.6 of course guaranteesthat the iterates enter and remain in this neighborhood after a nite number of steps.

Remark 28.13 (local linear convergence). Local linear convergence was rst derived from errorbounds in [Luo & Tseng 1992] for matrix splitting problems and was studied for other methods,including the ADMMand the proximal pointmethod among others, in [Han&Yuan 2013; Aspelmeier,Charitha & Luke 2016; Leventhal 2009; Li & Mordukhovich 2012]. An alternative approach to theproximal point method was taken in [Aragรณn Artacho & Gaydu 2012] based on Lyusternikโ€“Gravesstyle estimates, while [Adly, Cibulka & Ngai 2015] presented an approach based on metric regularityto Newtonโ€™s method for variational inclusions. Furthermore, [Zhou & So 2017] proposed a uniedapproach to error bounds for generic smooth constrained problems. Finally, [Liu et al. 2018; Valkonen2021] introduced partial or subspace versions of error bounds and show the fast convergence ofonly some variables of structured algorithms such as the ADMM or the PDPS. The relationshipsbetween error bounds and metric subregularity is studied in more detail in [Gfrerer 2011; Ioe 2017;Kruger 2015; Dontchev & Rockafellar 2014; Ngai & Thรฉra 2008].

372

28 faster convergence from regularity

rates from strong submonotonicity

If ๐ป is instead strongly submonotone, we can (locally) ensure (28.13) directly.

Theorem 28.14. Let ๐ป be (๐›พ/2, \/2)-strongly submonotone at ๐‘ฅ โˆˆ ๐ปโˆ’1(0) for๐‘ค = 0 for ๐›ฟ > 0.If ๐›พ > \ + ๐ฟ2๐œ and ๐‘ฅ0 โˆˆ ๐”น(๐‘ฅ, Y) for some Y > 0 suciently small, then (28.14) holds for๐œ‘๐‘˜+1 โ‰” ๐œ‘๐‘˜ [1 + (๐›พ โˆ’ ๐ฟ2๐œ)๐œ] and ๐œ‘0 = 1. In particular, dist2(๐‘ฅ๐‘ ;๐ปโˆ’1(0)) โ†’ 0 at a linear rate.

Proof. Let๐‘ค๐‘˜+1 โ‰” โˆ’๐œโˆ’1(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜) โˆ’ โˆ‡๐น (๐‘ฅ๐‘˜) โˆˆ ๐œ•๐บ (๐‘ฅ๐‘˜+1) by (28.12). By (9.9) in the proof ofTheorem 9.6, if ๐‘ฅ0 โˆˆ ๐”น(๐‘ฅ, Y) for Y > 0 small enough, then โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–๐‘‹ โ‰ค ๐›ฟ/(๐ฟ + ๐œโˆ’1) forall ๐‘˜ โˆˆ โ„• such that the Lipschitz continuity of โˆ‡๐น yields

โ€–โˆ‡๐น (๐‘ฅ๐‘˜+1) โˆ’ โˆ‡๐น (๐‘ฅ๐‘˜) โˆ’ ๐œโˆ’1(๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜)โ€–๐‘‹ โ‰ค ๐›ฟ.

Thus ๐‘ค๐‘˜+1 โˆˆ ๐œ•๐บ (๐‘ฅ๐‘˜+1) โˆฉ ๐”น(โˆ’โˆ‡๐น (๐‘ฅ๐‘˜+1), ๐›ฟ) and ๐‘ฅ๐‘˜+1 โˆˆ ๐”น(๐‘ฅ, ๐›ฟ) for all ๐‘˜ โˆˆ โ„•. Now, for all๐‘ฅ โˆˆ ๐ปโˆ’1(0), the strong submonotonicity of ๐ป at ๐‘ฅ for 0 implies that

๐œ‘๐‘˜๐œ ใ€ˆ๐‘ค๐‘˜+1 + โˆ‡๐น (๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ + (\ โˆ’ ๐›พ)๐œ‘๐‘˜๐œ2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ โ‰ฅ \๐œ‘๐‘˜๐œ

2 dist2(๐‘ฅ๐‘˜+1;๐ปโˆ’1(0))

for all ๐‘˜ โˆˆ โ„•. Cauchyโ€™s inequality and the Lipschitz continuity of โˆ‡๐น then yields

๐œ‘๐‘˜๐œ ใ€ˆโˆ‡๐น (๐‘ฅ๐‘˜) โˆ’ โˆ‡๐น (๐‘ฅ๐‘˜+1), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹ โ‰ฅ โˆ’๐œ‘๐‘˜2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ โˆ’ ๐œ‘๐‘˜๐œ2๐ฟ2

2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ .

We now sum the last two inequalities to obtain

๐œ‘๐‘˜2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ + ๐œ‘๐‘˜๐œ ใ€ˆ๐‘ค๐‘˜+1 + โˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹

โ‰ฅ \๐œ‘๐‘˜๐œ

2 dist2(๐‘ฅ๐‘˜+1;๐ปโˆ’1(0)) + (๐›พ โˆ’ \ โˆ’ ๐ฟ2๐œ)๐œ‘๐‘˜๐œ2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ .

Using that \ โˆ’ ๐›พ + ๐ฟ2๐œ < 0 and taking the inmum over all ๐‘ฅ โˆˆ ๐ปโˆ’1(0) then yields

inf๐‘ฅโˆˆ๐ปโˆ’1 (0)

(๐œ‘๐‘˜2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ โ€–2๐‘‹ + ๐œ‘๐‘˜๐œ ใ€ˆ๐‘ค๐‘˜+1 + โˆ‡๐น (๐‘ฅ๐‘˜), ๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅใ€‰๐‘‹

)โ‰ฅ (๐›พ โˆ’ ๐ฟ2๐œ)๐œ‘๐‘˜๐œ + ๐œ‘๐‘˜

2 dist2(๐‘ฅ๐‘˜+1;๐ปโˆ’1(0)) โˆ’ ๐œ‘๐‘˜2 โ€–๐‘ฅ๐‘˜+1 โˆ’ ๐‘ฅ๐‘˜ โ€–2๐‘‹ .

Since ๐›พ โˆ’ ๐ฟ2๐œ > 0 and ๐œ‘๐‘˜+1 = ๐œ‘๐‘˜ [1 + (๐›พ โˆ’ ๐ฟ2๐œ)๐œ], this shows (28.13) with ๐‘‹ = ๐ปโˆ’1(0). Theclaim now follows from Lemma 28.9 and the exponential growth of ๐œ‘๐‘˜ . ๏ฟฝ

Remark 28.15. Similarly to Theorem 10.1 (ii), if ๐น โ‰ก 0 we can let ๐œโ†’โˆž to obtain local superlinearconvergence of the proximal point method under strong submonotonicity of ๐œ•๐บ at the solution.

373

BIBLIOGRAPHY

S. Adly, R. Cibulka & H.V. Ngai (2015), Newtonโ€™s method for solving inclusions usingset-valued approximations, SIAM Journal on Optimization 25(1), 159โ€“184, doi: 10.1137/130926730.

H.W. Alt (2016), Linear Functional Analysis, An Application-Oriented Introduction, Universi-text, Springer, London, doi: 10.1007/978-1-4471-7280-2.

J. Appell & P. Zabreiko (1990), Nonlinear Superposition Operators, Cambridge UniversityPress, New York, doi: 10.1515/9783110265118.385.

F. J. Aragรณn Artacho & M. Gaydu (2012), A Lyusternikโ€“Graves theorem for the proximalpoint method, Computational Optimization and Applications 52(3), 785โ€“803, doi: 10.1007/s10589-011-9439-6.

F. J. Aragรณn Artacho & M.H. Geoffroy (2014), Metric subregularity of the convex subdif-ferential in Banach spaces, J. Nonlinear Convex Anal. 15(1), 35โ€“47, arxiv: 1303.3654.

K. J. Arrow, L. Hurwicz & H. Uzawa (1958), Studies in Linear and Non-Linear Programming,Stanford University Press, Stanford.

T. Aspelmeier, C. Charitha & D. R. Luke (2016), Local linear convergence of the ADMM/Douglasโ€“Rachford algorithms without strong convexity and application to statisticalimaging, SIAM Journal on Imaging Sciences 9(2), 842โ€“868, doi: 10.1137/15m103580x.

H. Attouch, G. Buttazzo & G. Michaille (2014), Variational Analysis in Sobolev and BVSpaces, 2nd ed., vol. 6, MOS-SIAM Series on Optimization, Society for Industrial andApplied Mathematics, Philadelphia, PA, doi: 10.1137/1.9781611973488.

H. Attouch & H. Brezis (1986), Duality for the sum of convex functions in general Banachspaces, in: Aspects of mathematics and its applications, vol. 34, North-Holland Math.Library, North-Holland, Amsterdam, 125โ€“133, doi: 10.1016/s0924-6509(09)70252-1.

J. Aubin & H. Frankowska (1990), Set-Valued Analysis, Birkhรคuser Basel, doi: 10.1007/978-0-8176-4848-0.

J.-P. Aubin (1981), Contingent derivatives of set-valued maps and existence of solutionsto nonlinear inclusions and dierential inclusions, in: Mathematical analysis and appli-cations, Part A, vol. 7, Adv. in Math. Suppl. Stud. Academic Press, New York-London,159โ€“229.

J.-P. Aubin (1984), Lipschitz behavior of solutions to convex minimization problems,Mathe-matics of Operations Research 9(1), 87โ€“111, doi: 10.1287/moor.9.1.87.

374

bibliography

D. Aussel, A. Daniilidis & L. Thibault (2005), Subsmooth sets: functional characterizationsand related concepts, Transactions of the American Mathematical Society 357(4), 1275โ€“1301,doi: 10.1090/s0002-9947-04-03718-3.

D. Azรฉ & J.-P. Penot (1995), Uniformly convex and uniformly smooth convex functions,Annales de la Facultรฉ des sciences de Toulouse: Mathรฉmatiques 4(4), 705โ€“730, url: hp://eudml.org/doc/73364.

M. Baฤรกk & U. Kohlenbach (2018), On proximal mappings with Young functions in uni-formly convex Banach spaces, Journal of Convex Analysis 25(4), 1291โ€“1318, arxiv: 1709.04700.

A. Bagirov, N. Karmitsa & M.M. Mรคkelรค (2014), Introduction to Nonsmooth Optimization,Theory, practice and software, Springer, Cham, doi: 10.1007/978-3-319-08114-4.

V. Barbu & T. Precupanu (2012), Convexity and Optimization in Banach Spaces, 4th ed.,Springer Monographs in Mathematics, Springer, Dordrecht, doi: 10.1007/978-94-007-2247-7.

H.H. Bauschke & P. L. Combettes (2017), Convex Analysis and Monotone Operator Theoryin Hilbert Spaces, 2nd ed., CMS Books in Mathematics/Ouvrages de Mathรฉmatiques de laSMC, Springer, New York, doi: 10.1007/978-3-319-48311-5.

A. Beck (2017), First-Order Methods in Optimization, Society for Industrial and AppliedMathematics, Philadelphia, PA, doi: 10.1137/1.9781611974997.

A. Beck&M. Teboulle (2009a), A fast iterative shrinkage-thresholding algorithm for linearinverse problems, SIAM Journal on Imaging Sciences 2(1), 183โ€“202, doi: 10.1137/080716542.

A. Beck & M. Teboulle (2009b), Fast gradient-based algorithms for constrained totalvariation image denoising and deblurring problems, IEEE Transactions on Image Processing18(11), 2419โ€“2434, doi: 10.1109/tip.2009.2028250.

M. Benning, F. Knoll, C.-B. Schรถnlieb & T. Valkonen (2016), Preconditioned ADMM withnonlinear operator constraint, in: System Modeling and Optimization: 27th IFIP TC 7Conference, CSMO 2015, Sophia Antipolis, France, June 29โ€“July 3, 2015, Revised SelectedPapers, ed. by L. Bociu, J.-A. Dรฉsidรฉri & A. Habbal, Springer International Publishing,Cham, 117โ€“126, doi: 10.1007/978-3-319-55795-3_10, arxiv: 1511.00425, url: hp://tuomov.

iki.fi/m/nonlinearADMM.pdf.F. Bigolin & S. N. Golo (2014), A historical account on characterizations of ๐ถ1-manifoldsin Euclidean spaces by tangent cones, Journal of Mathematical Analysis and Applications412(1), 63โ€“76, doi: 10.1016/j.jmaa.2013.10.035, arxiv: 1003.1332.

J.M. Borwein & D. Preiss (1987), A smooth variational principle with applications tosubdierentiability and to dierentiability of convex functions, Trans. Am. Math. Soc.303, 517โ€“527, doi: 10.2307/2000681.

J.M. Borwein & Q. J. Zhu (2005), Techniques of Variational Analysis, vol. 20, CMS Booksin Mathematics/Ouvrages de Mathรฉmatiques de la SMC, Springer-Verlag, New York,doi: 10.1007/0-387-28271-8.

G. Bouligand (1930), Sur quelques points de mรฉthodologie gรฉomรฉtrique, Rev. Gรฉn. desSciences 41, 39โ€“43.

K. Bredies & H. Sun (2016), Accelerated Douglasโ€“Rachford methods for the solution ofconvex-concave saddle-point problems, Preprint, arxiv: 1604.06282.

375

bibliography

H. Brezis (2010), Functional Analysis, Sobolev Spaces and Partial Dierential Equations,Springer, New York, doi: 10.1007/978-0-387-70914-7.

H. Brezis, M. G. Crandall & A. Pazy (1970), Perturbations of nonlinear maximal monotonesets in Banach space, Comm. Pure Appl. Math. 23, 123โ€“144, doi: 10.1002/cpa.3160230107.

M. Brokate (2014), Konvexe Analysis und Evolutionsprobleme, Lecture notes, ZentrumMathematik, TU Mรผnchen, url: hp://www-m6.ma.tum.de/~brokate/cev_ss14.pdf.

F. E. Browder (1965), Nonexpansive nonlinear operators in a Banach space, Proceedings ofthe National Academy of Sciences of the United States of America 54(4), 1041, doi: 10.1073/pnas.54.4.1041.

F. E. Browder (1967), Convergence theorems for sequences of nonlinear operators in Banachspaces, Mathematische Zeitschrift 100(3), 201โ€“225, doi: 10.1007/bf01109805.

A. Cegielski (2012), Iterative methods for xed point problems in Hilbert spaces, vol. 2057,Lecture Notes in Mathematics, Springer, Heidelberg, doi: 10.1007/978-3-642-30901-4.

A. Chambolle, R. A. DeVore, N.-y. Lee & B. J. Lucier (1998), Nonlinear wavelet imageprocessing: variational problems, compression, and noise removal through waveletshrinkage, IEEE Transactions on Image Processing 7(3), 319โ€“335, doi: 10.1109/83.661182.

A. Chambolle, M. Ehrhardt, P. Richtรกrik & C. Schรถnlieb (2018), Stochastic primal-dualhybrid gradient algorithm with arbitrary sampling and imaging applications, SIAMJournal on Optimization 28(4), 2783โ€“2808, doi: 10.1137/17m1134834.

A. Chambolle & T. Pock (2011), A rst-order primal-dual algorithm for convex problemswith applications to imaging, Journal of Mathematical Imaging and Vision 40(1), 120โ€“145,doi: 10.1007/s10851-010-0251-1.

A. Chambolle & T. Pock (2015), On the ergodic convergence rates of a rst-order primalโ€“dual algorithm, Mathematical Programming, 1โ€“35, doi: 10.1007/s10107-015-0957-3.

P. Chen, J. Huang & X. Zhang (2013), A primal-dual xed point algorithm for convexseparable minimization with applications to image restoration, Inverse Problems 29(2),025011, doi: 10.1088/0266-5611/29/2/025011.

X. Chen, Z. Nashed & L. Qi (2000), Smoothing methods and semismooth methods fornondierentiable operator equations, SIAM Journal on Numerical Analysis 38(4), 1200โ€“1216, doi: 10.1137/s0036142999356719.

C. Christof, C. Clason, C. Meyer & S. Walter (2018), Optimal control of a non-smoothsemilinear elliptic equation,Mathematical Control and Related Fields 8(1), 247โ€“276, doi: 10.3934/mcrf.2018011.

C. Christof & G. Wachsmuth (2018), No-gap second-order conditions via a directional cur-vature functional, SIAM Journal on Optimization 28(3), 2097โ€“2130, doi: 10.1137/17m1140418.

R. Cibulka, A. L. Dontchev & A. Y. Kruger (2018), Strong metric subregularity of mappingsin variational analysis and optimization, Journal of Mathematical Analysis and Applica-tions 457(2), Special Issue on Convex Analysis and Optimization: New Trends in Theoryand Applications, 1247โ€“1282, doi: j.jmaa.2016.11.045.

I. Cioranescu (1990), Geometry of Banach Spaces, Duality Mappings and Nonlinear Problems,vol. 62, Mathematics and Its Applications, Springer, Dordrecht, doi: 10.1007/978-94-009-2121-4.

376

bibliography

F. Clarke (2013), Functional Analysis, Calculus of Variations and Optimal Control, Springer,London, doi: 10.1007/978-1-4471-4820-3.

F. H. Clarke (1973), Necessary conditions for nonsmooth problems in optimal control andthe calculus of variations, PhD thesis, University of Washington.

F. H. Clarke (1975), Generalized gradients and applications, Transactions of the AmericanMathematical Society 205, 247โ€“262.

F. H. Clarke (1990), Optimization and Nonsmooth Analysis, vol. 5, Classics Appl. Math. Soci-ety for Industrial and Applied Mathematics, Philadelphia, PA, doi: 10.1137/1.9781611971309.

C. Clason (2020), Introduction to Functional Analysis, Compact Textbooks in Mathematics,Birkhรคuser, Basel, doi: 10.1007/978-3-030-52784-6.

C. Clason & K. Kunisch (2011), A duality-based approach to elliptic control problems innon-reexive Banach spaces, ESAIM: Control, Optimisation and Calculus of Variations17(1), 243โ€“266, doi: 10.1051/cocv/2010003.

C. Clason, S. Mazurenko & T. Valkonen (2019), Acceleration and global convergence of arst-order primalโ€“dual method for nonconvex problems, SIAM Journal on Optimization29 (1), 933โ€“963, doi: 10.1137/18m1170194, arxiv: 1802.03347.

C. Clason, S. Mazurenko & T. Valkonen (2020), Primalโ€“dual proximal splitting and gen-eralized conjugation in non-smooth non-convex optimization, Applied Mathematics &Optimization, doi: 10.1007/s00245-020-09676-1, arxiv: 1901.02746, url: hp://tuomov.iki.

fi/m/nlpdhgm_general.pdf.C. Clason & T. Valkonen (2017a), Primal-dual extragradient methods for nonlinear nons-mooth PDE-constrained optimization, SIAM Journal on Optimization 27(3), 1313โ€“1339,doi: 10.1137/16m1080859, arxiv: 1606.06219.

C. Clason & T. Valkonen (2017b), Stability of saddle points via explicit coderivatives ofpointwise subdierentials, Set-Valued and Variational Analysis 25 (1), 69โ€“112, doi: 10.1007/s11228-016-0366-7, arxiv: 1509.06582, url: hp://tuomov.iki.fi/m/pdex2_stability.pdf.

P. L. Combettes & N.N. Reyes (2013), Moreauโ€™s decomposition in Banach spaces, Mathe-matical Programming 139(1), 103โ€“114, doi: 10.1007/s10107-013-0663-y.

L. Condat (2013), A primalโ€“dual splitting method for convex optimization involving Lips-chitzian, proximable and linear composite terms, Journal of Optimization Theory andApplications 158(2), 460โ€“479, doi: 10.1007/s10957-012-0245-9.

I. Daubechies, M. Defrise & C. De Mol (2004), An iterative thresholding algorithm forlinear inverse problems with a sparsity constraint, Communications on Pure and AppliedMathematics 57(11), 1413โ€“1457, doi: 10.1002/cpa.20042.

E. DiBenedetto (2002), Real Analysis, Birkhรคuser Boston, Inc., Boston, MA, doi: 10.1007/978-1-4612-0117-5.

S. Dolecki & G.H. Greco (2011), Tangency vis-ร -vis dierentiability by Peano, Severi andGuareschi, Journal of Convex Analysis 18(2), 301โ€“339, arxiv: 1003.1332.

A. L. Dontchev & R. Rockafellar (2014), Implicit Functions and Solution Mappings, A Viewfrom Variational Analysis, 2nd ed., Springer Series in Operations Research and FinancialEngineering, Springer New York, doi: 10.1007/978-1-4939-1037-3.

377

bibliography

A. L. Dontchev & R. T. Rockafellar (2004), Regularity and conditioning of solution map-pings in variational analysis, Set-Valued and Variational Analysis 12(1-2), 79โ€“109, doi: 10.1023/b:svan.0000023394.19482.30.

J. Douglas Jim & J. Rachford H. H. (1956), On the numerical solution of heat conductionproblems in two and three space variables, Transactions of the American MathematicalSociety 82(2), 421โ€“439, doi: 10.2307/1993056.

Y. Drori, S. Sabach &M. Teboulle (2015), A simple algorithm for a class of nonsmoothconvexโ€“concave saddle-point problems,Operations Research Letters 43(2), 209โ€“214,doi: 10.1016/j.orl.2015.02.001.

J. Eckstein & D. P. Bertsekas (1992), On the Douglasโ€“Rachford splitting method and theproximal point algorithm for maximal monotone operators, Mathematical Programming55(1-3), 293โ€“318, doi: 10.1007/bf01581204.

I. Ekeland & G. Lebourg (1976), Generic Frรฉchet-dierentiability and perturbed optimiza-tion problems in Banach spaces, Transactions of the American Mathematical Society 224(2),193โ€“216, doi: 10.1090/s0002-9947-1976-0431253-2.

I. Ekeland & R. Tรฉmam (1999), Convex Analysis and Variational Problems, vol. 28, ClassicsAppl. Math. Society for Industrial and Applied Mathematics, Philadelphia, PA, doi: 10.1137/1.9781611971088.

E. Esser, X. Zhang & T. F. Chan (2010), A general framework for a class of rst orderprimal-dual algorithms for convex optimization in imaging science, SIAM Journal onImaging Sciences 3(4), 1015โ€“1046, doi: 10.1137/09076934x.

M. Fabian (1988), On classes of subdierentiability spaces of Ioe, Nonlinear Analysis:Theory, Methods & Applications 12(1), 63โ€“74, doi: 10.1016/0362-546x(88)90013-2.

M. Fabian, P. Habala, P. Hรกjek, V.M. Santalucรญa, J. Pelant & V. Zizler (2001), Dierentia-bility of norms, in: Functional Analysis and Innite-Dimensional Geometry, Springer NewYork, New York, NY, 241โ€“284, doi: 10.1007/978-1-4757-3480-5_8.

F. Facchinei & J.-S. Pang (2003a), Finite-dimensional Variational Inequalities and Comple-mentarity Problems. Vol. I, Springer Series in Operations Research, Springer-Verlag, NewYork, doi: 10.1007/b97543.

F. Facchinei & J.-S. Pang (2003b), Finite-dimensional Variational Inequalities and Comple-mentarity Problems. Vol. II, Springer Series in Operations Research, Springer-Verlag, NewYork, doi: 10.1007/b97544.

L. Fejรฉr (1922), รœber die Lage der Nullstellen von Polynomen, die aus Minimumforderungengewisser Art entspringen, Math. Ann. 85(1), 41โ€“48, doi: 10.1007/bf01449600.

I. Fonseca & G. Leoni (2007), Modern Methods in the Calculus of Variations: ๐ฟ๐‘ Spaces,Springer, doi: 10.1007/978-0-387-69006-3.

D. Gabay (1983), Applications of the method of multipliers to variational inequalities, in:Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems, ed. by M. Fortin & R. Glowinski, vol. 15, Studies in Mathematics and itsApplications, North-Holland, Amsterdam, 299โ€“331.

H. Gferer & J. Outrata (2019), On a semismoothโˆ— Newton method for solving generalizedequations, Preprint, arxiv: 1904.09167.

378

bibliography

H. Gfrerer & J. Outrata (2016), On Lipschitzian properties of implicit multifunctions,SIAM Journal on Optimization 26(4), 2160โ€“2189, doi: 10.1137/15m1052299.

H. Gfrerer (2011), First order and second order characterizations of metric subregularityand calmness of constraint set mappings, SIAM Journal on Optimization 21(4), 1439โ€“1474,doi: 10.1137/100813415.

H. Gfrerer (2013), On directional metric regularity, subregularity and optimality conditionsfor nonsmooth mathematical programs, Set-Valued and Variational Analysis 21(2), 151โ€“176,doi: 10.1007/s11228-012-0220-5.

R. Gribonval & M. Nikolova (2020), A characterization of proximity operators, doi: 10.1007/s10851-020-00951-y, arxiv: 1807.04014.

D. Han & X. Yuan (2013), Local linear convergence of the Alternating Direction Method ofMultipliers for quadratic programs, SIAM Journal on Numerical Analysis 51(6), 3446โ€“3457,doi: 10.1137/120886753.

F. Harder & G. Wachsmuth (2018), The limiting normal cone of a complementarity set inSobolev spaces, Optimization 67(5), 1579โ€“1603, doi: 10.1080/02331934.2018.1484467.

B. He & X. Yuan (2012), Convergence analysis of primal-dual algorithms for a saddle-pointproblem: from contraction perspective, SIAM Journal on Imaging Sciences 5(1), 119โ€“149,doi: 10.1137/100814494.

J. Heinonen (2005), Lectures on Lipschitz analysis, vol. 100, Rep. Univ. Jyvรคskylรค Dept. Math.Stat. University of Jyvรคskylรค, url: hp://www.math.jyu.fi/research/reports/rep100.pdf.

R. Henrion, A. Jourani & J. Outrata (2002), On the calmness of a class of multifunctions,SIAM Journal on Optimization 13(2), 603โ€“618, doi: 10.1137/s1052623401395553.

M. Hintermรผller, K. Ito & K. Kunisch (2002), The primal-dual active set strategy asa semismooth Newton method, SIAM Journal on Optimization 13(3), 865โ€“888 (2003),doi: 10.1137/s1052623401383558.

M. Hintermรผller & K. Kunisch (2004), Total bounded variation regularization as a bilat-erally constrained optimization problem, SIAM Journal on Applied Mathematics 64(4),1311โ€“1333, doi: 10.1137/s0036139903422784.

J.-B. Hiriart-Urruty & C. Lemarรฉchal (1993a), Convex Analysis and Minimization Al-gorithms I, Fundamentals, vol. 305, Grundlehren der Mathematischen Wissenschaften,Springer-Verlag, Berlin, doi: 10.1007/978-3-662-02796-7.

J.-B. Hiriart-Urruty & C. Lemarรฉchal (1993b), Convex Analysis and Minimization Algo-rithms II, Advanced Theory and Bundle Methods, vol. 306, Grundlehren der Mathematis-chen Wissenschaften, Springer-Verlag, Berlin, doi: 10.1007/978-3-662-06409-2.

J.-B. Hiriart-Urruty & C. Lemarรฉchal (2001), Fundamentals of Convex Analysis, Springer-Verlag, Berlin, doi: 10.1007/978-3-642-56468-0.

T. Hohage & C. Homann (2014), A generalization of the Chambolleโ€“Pock algorithm toBanach spaces with applications to inverse problems, Preprint, arxiv: 1412.0126.

A.D. Ioffe (1984), Approximate subdierentials and applications. I. The nite-dimensionaltheory, Trans. Amer. Math. Soc. 281(1), 389โ€“416, doi: 10.2307/1999541.

A. Ioffe (2017), Variational Analysis of Regular Mappings: Theory and Applications, SpringerMonographs in Mathematics, Springer International Publishing, doi: 10.1007/978-3-319-64277-2.

379

bibliography

A. D. Ioffe (1979), Regular points of Lipschitz functions, Transactions of the American Math-ematical Society 251, 61โ€“69, doi: 10.1090/s0002-9947-1979-0531969-6.

K. Ito & K. Kunisch (2008), Lagrange Multiplier Approach to Variational Problems andApplications, vol. 15, Advances in Design and Control, Society for Industrial and AppliedMathematics, Philadelphia, PA, doi: 10.1137/1.9780898718614.

D. Klatte & B. Kummer (2002), Nonsmooth Equations in Optimization, vol. 60, NonconvexOptimization and its Applications, Regularity, calculus, methods and applications, KluwerAcademic Publishers, Dordrecht, doi: 10.1007/b130810.

M. Kojima & S. Shindo (1986), Extension of Newton and quasi-Newton methods to systemsof PC1 equations, J. Oper. Res. Soc. Japan 29(4), 352โ€“375, doi: 10.15807/jorsj.29.352.

M.A. Krasnoselโ€ฒskii (1955), Two remarks on the method of successive approximations,

Akademiya Nauk SSSR i Moskovskoe Matematicheskoe Obshchestvo. Uspekhi Matematich-eskikh Nauk 10(1(63)), 123โ€“127.

A. Y. Kruger (2015), Error bounds and metric subregularity, Optimization 64(1), 49โ€“79,doi: 10.1080/02331934.2014.938074.

F. Kruse (2018), Semismooth implicit functions, Journal of Convex Analysis 28(2), 595โ€“622,url: hps://imsc.uni-graz.at/mannel/SIF_Kruse.pdf.

B. Kummer (1988), Newtonโ€™s method for non-dierentiable functions,Mathematical Research45, 114โ€“125.

B. Kummer (2000), Generalized Newton and NCP-methods: convergence, regularity, actions,Discuss. Math. Dier. Incl. Control Optim. 20(2), 209โ€“244, doi: 10.7151/dmdico.1013.

D. Leventhal (2009), Metric subregularity and the proximal point method, Journal ofMathematical Analysis and Applications 360(2), 681โ€“688, doi: 10.1016/j.jmaa.2009.07.012.

G. Li & B. S. Mordukhovich (2012), Hรถlder metric subregularity with applications toproximal point method, SIAM Journal on Optimization 22(4), 1655โ€“1684, doi: 10.1137/120864660.

P. Lions & B. Mercier (1979), Splitting algorithms for the sum of two nonlinear operators,SIAM Journal on Numerical Analysis 16(6), 964โ€“979, doi: 10.1137/0716071.

Y. Liu, X. Yuan, S. Zeng & J. Zhang (2018), Partial error bound conditions and the linearconvergence rate of the Alternating Direction Method of Multipliers, SIAM Journal onNumerical Analysis 56(4), 2095โ€“2123, doi: 10.1137/17m1144623.

I. Loris & C. Verhoeven (2011), On a generalization of the iterative soft-thresholdingalgorithm for the case of non-separable penalty, Inverse Problems 27(12), 125007, doi: 10.1088/0266-5611/27/12/125007.

Z.-Q. Luo & P. Tseng (1992), Error bound and convergence analysis of matrix splittingalgorithms for the ane variational inequality problem, SIAM Journal on Optimization2(1), 43โ€“54, doi: 10.1137/0802004.

M.M. Mรคkelรค & P. Neittaanmรคki (1992), Nonsmooth Optimization, Analysis and algorithmswith applications to optimal control, World Scientic Publishing Co., Inc., River Edge,NJ, doi: 10.1142/1493.

Y. Malitsky & T. Pock (2018), A rst-order primal-dual algorithm with linesearch, SIAMJournal on Optimization 28(1), 411โ€“432, doi: 10.1137/16m1092015.

380

bibliography

W. R. Mann (1953), Mean value methods in iteration, Proc. Amer. Math. Soc. 4, 506โ€“510,doi: 10.2307/2032162.

B. Martinet (1970), Rรฉgularisation dโ€™inรฉquations variationnelles par approximations suc-cessives, Rev. Franรงaise Informat. Recherche Opรฉrationnelle 4(Sรฉr. R-3), 154โ€“158.

S. Mazurenko, J. Jauhiainen & T. Valkonen (2020), Primal-dual block-proximal splittingfor a class of non-convex problems, Electronic Transactions on Numerical Analysis 52,509โ€“552, doi: 10.1553/etna_vol52s509, arxiv: 1911.06284.

P. Mehlitz & G. Wachsmuth (2018), The limiting normal cone to pointwise dened sets inLebesgue spaces, Set-Valued and Variational Analysis 26(3), 449โ€“467, doi: 10.1007/s11228-016-0393-4.

P. Mehlitz & G. Wachsmuth (2019), The weak sequential closure of decomposable sets inLebesgue spaces and its application to variational geometry, Set-Valued and VariationalAnalysis 27(1), 265โ€“294, doi: 10.1007/s11228-017-0464-1.

R. Mifflin (1977), Semismooth and semiconvex functions in constrained optimization, SIAMJournal on Control and Optimization 15(6), 959โ€“972, doi: 10.1137/0315061.

T. Mรถllenhoff, E. Strekalovskiy, M. Moeller & D. Cremers (2015), The primal-dualhybrid gradient method for semiconvex splittings, SIAM Journal on Imaging Sciences8(2), 827โ€“857, doi: 10.1137/140976601.

B. ล . Morduhoviฤ (1980), Metric approximations and necessary conditions for optimalityfor general classes of nonsmooth extremal problems, Dokl. Akad. Nauk SSSR 254(5),1072โ€“1076.

B. S. Mordukhovich (1994), Generalized dierential calculus for nonsmooth and set-valuedmappings, Journal of Mathematical Analysis and Applications 183(1), 250โ€“288, doi: 10.1006/jmaa.1994.1144.

B. S. Mordukhovich (2006), Variational Analysis and Generalized Dierentiation I, BasicTheory, vol. 330, Grundlehren der mathematischen Wissenschaften, Springer, doi: 10.1007/3-540-31247-1.

B. S. Mordukhovich (2018), Variational Analysis and Applications, Springer Monographs inMathematics, Springer International Publishing, doi: 10.1007/978-3-319-92775-6.

B. S. Mordukhovich (1976), Maximum principle in the problem of time optimal responsewith nonsmooth constraints, Journal of Applied Mathematics and Mechanics 40(6), 960โ€“969, doi: 10.1016/0021-8928(76)90136-2.

J.-J. Moreau (1965), Proximitรฉ et dualitรฉ dans un espace hilbertien, Bulletin de la Sociรฉtรฉmathรฉmatique de France 93, 273โ€“299.

T. S. Motzkin & I. J. Schoenberg (1954), The relaxation method for linear inequalities,Canadian Journal of Mathematics 6, 393โ€“404, doi: 10.4153/cjm-1954-038-x.

Y. E. Nesterov (1983), A method for solving the convex programming problem with con-vergence rate ๐‘‚ (1/๐‘˜2), Soviet Math. Doklad. 27(2), 372โ€“376.

Y. Nesterov (2004), Introductory Lectures on Convex Optimization, vol. 87, Applied Opti-mization, Kluwer Academic Publishers, Boston, MA, doi: 10.1007/978-1-4419-8853-9.

H.V. Ngai & M. Thรฉra (2008), Error bounds in metric spaces and application to theperturbation stability of metric regularity, SIAM Journal on Optimization 19(1), 1โ€“20,doi: 10.1137/060675721.

381

bibliography

P. Ochs & T. Pock (2019), Adaptive FISTA for non-convex optimization, SIAM Journal onOptimization 29(4), 2482โ€“2503, doi: 10.1137/17m1156678, arxiv: 1711.04343.

Z. Opial (1967), Weak convergence of the sequence of successive approximations fornonexpansive mappings, Bulletin of the American Mathematical Society 73(4), 591โ€“597,doi: 10.1090/s0002-9904-1967-11761-0.

J. Outrata, M. Koฤvara & J. Zowe (1998), Nonsmooth Approach to Optimization Problemswith Equilibrium Constraints, vol. 28, Nonconvex Optimization and its Applications,Theory, applications and numerical results, Kluwer Academic Publishers, Dordrecht,doi: 10.1007/978-1-4757-2825-5.

N. Parikh & S. Boyd (2014), Proximal algorithms, Foundations and Trends in Optimization1(3), 123โ€“231, doi: 10.1561/2400000003.

P. Patrinos, L. Stella & A. Bemporad (2014), Douglasโ€“Rachford splitting: complexityestimates and accelerated variants, in: 53rd IEEE Conference on Decision and Control,4234โ€“4239, doi: 10.1109/cdc.2014.7040049.

G. Peano (1908), Formulario Mathematico, Fratelli Boca, Torino.J.-P. Penot (2013), Calculus Without Derivatives, vol. 266, Graduate Texts in Mathematics,Springer, New York, doi: 10.1007/978-1-4614-4538-8.

W.V. Petryshyn (1966), Construction of xed points of demicompact mappings in Hilbertspace, Journal of Mathematical Analysis and Applications 14(2), 276โ€“284, doi: 10.1016/0022-247x(66)90027-8.

T. Pock, D. Cremers, H. Bischof & A. Chambolle (2009), An algorithm for minimizingthe Mumfordโ€“Shah functional, in: 12th IEEE Conference on Computer Vision, 1133โ€“1140,doi: 10.1109/iccv.2009.5459348.

R. Poliquin & R. Rockafellar (1996), Prox-regular functions in variational analysis, Trans-actions of the American Mathematical Society 348(5), 1805โ€“1838, doi: 10.1090/s0002-9947-96-01544-9.

L. Qi (1993), Convergence analysis of some algorithms for solving nonsmooth equations,Math. Oper. Res. 18(1), 227โ€“244, doi: 10.1287/moor.18.1.227.

L. Qi& J. Sun (1993), A nonsmooth version of Newtonโ€™s method,Mathematical Programming58(3, Ser. A), 353โ€“367, doi: 10.1007/bf01581275.

M. Renardy & R. C. Rogers (2004), An Introduction to Partial Dierential Equations, 2nd ed.,vol. 13, Texts in Applied Mathematics, Springer-Verlag, New York, doi: 10.1007/b97427.

S.M. Robinson (1981), Some continuity properties of polyhedral multifunctions, in: Math-ematical Programming at Oberwolfach, ed. by H. Kรถnig, B. Korte & K. Ritter, SpringerBerlin Heidelberg, Berlin, Heidelberg, 206โ€“214, doi: 10.1007/bfb0120929.

R. T. Rockafellar (1988), First-and second-order epi-dierentiability in nonlinear program-ming, Transactions of the American Mathematical Society 307(1), 75โ€“108, doi: 10.1090/s0002-9947-1988-0936806-9.

R. T. Rockafellar (1970), On the maximal monotonicity of subdierential mappings, PacicJ. Math. 33, 209โ€“216, doi: 10.2140/pjm.1970.33.209.

R. T. Rockafellar (1976a), Integral functionals, normal integrands andmeasurable selections,in:NonlinearOperators and the Calculus of Variations (Summer School,Univ. Libre Bruxelles,

382

bibliography

Brussels, 1975), vol. 543, Lecture Notes in Math. Springer, Berlin, 157โ€“207, doi: 10.1007/bfb0079944.

R. T. Rockafellar (1976b), Monotone operators and the proximal point algorithm, SIAMJournal on Control and Optimization 14(5), 877โ€“898, doi: 10.1137/0314056.

R. T. Rockafellar (1981), Favorable classes of Lipschitz continuous functions in subgradientoptimization, in: Progress in Nondierentiable Optimization, IIASA, Laxenburg, Austria,125โ€“143, url: hp://pure.iiasa.ac.at/id/eprint/1760/.

R. T. Rockafellar & R. J.-B. Wets (1998), Variational Analysis, vol. 317, Grundlehren dermathematischen Wissenschaften, Springer, doi: 10.1007/978-3-642-02431-3.

R. Rockafellar (1985), Maximal monotone relations and the second derivatives of nons-mooth functions,Annales de lโ€™Institut Henri Poincare (C) Non Linear Analysis 2(3), 167โ€“184,doi: 10.1016/s0294-1449(16)30401-2.

R. Rockafellar (1989), Proto-Dierentiability of Set-Valued Mappings and its Applicationsin Optimization, Annales de lโ€™Institut Henri Poincare (C) Non Linear Analysis 6, 449โ€“482,doi: 10.1016/s0294-1449(17)30034-3.

W. Rudin (1991), Functional Analysis, 2nd ed., McGraw-Hill, New York.A. Ruszczynski (2006), Nonlinear Optimization, Princeton University Press, Princeton, NJ,doi: 10.2307/j.ctvcm4hcj.

B. P. Rynne & M.A. Youngson (2008), Linear Functional Analysis, 2nd ed., Springer Under-graduate Mathematics Series, Springer, London, doi: 10.1007/978-1-84800-005-6.

H. Schaefer (1957), รœber die Methode sukzessiver Approximationen, Jahresbericht derDeutschen Mathematiker-Vereinigung 59, 131โ€“140, url: hp://eudml.org/doc/146424.

A. Schiela (2008), A simplied approach to semismooth Newton methods in function space,SIAM Journal on Optimization 19(3), 1417โ€“1432, doi: 10.1137/060674375.

W. Schirotzek (2007), Nonsmooth Analysis, Universitext, Springer, Berlin, doi: 10.1007/978-3-540-71333-3.

S. Scholtes (2012), Introduction to Piecewise Dierentiable Equations, Springer Briefs inOptimization, Springer, New York, doi: 10.1007/978-1-4614-4340-7.

S. Simons (2009), A new proof of the maximal monotonicity of subdierentials, Journal ofConvex Analysis 16(1), 165โ€“168.

L. Thibault & D. Zagrodny (1995), Integration of subdierentials of lower semicontinuousfunctions on Banach spaces, Journal of Mathematical Analysis and Applications 189(1),33โ€“58, doi: 10.1006/jmaa.1995.1003.

L. Thibault (1983), Tangent cones and quasi-interiorly tangent cones to multifunctions,Trans. Amer. Math. Soc. 277(2), 601โ€“621, doi: 10.2307/1999227.

M. Ulbrich (2002), Semismooth Newton methods for operator equations in function spaces,SIAM Journal on Optimization 13(3), 805โ€“842 (2003), doi: 10.1137/s1052623400371569.

M. Ulbrich (2011), Semismooth Newton Methods for Variational Inequalities and ConstrainedOptimization Problems in Function Spaces, vol. 11, MOS-SIAM Series on Optimization, Soci-ety for Industrial and Applied Mathematics, Philadelphia, PA, doi: 10.1137/1.9781611970692.

T. Valkonen (2014), A primal-dual hybrid gradient method for nonlinear operators withapplications to MRI, Inverse Problems 30(5), 055012, doi: 10.1088/0266-5611/30/5/055012.

383

bibliography

T. Valkonen (2019), Block-proximal methods with spatially adapted acceleration, ElectronicTransactions on Numerical Analysis 51, 15โ€“49, doi: 10.1553/etna_vol51s15, arxiv: 1609.07373,url: hp://tuomov.iki.fi/m/blockcp.pdf.

T. Valkonen (2020a), First-order primal-dual methods for nonsmooth nonconvex optimi-sation, in: Handbook of Mathematical Models and Algorithms in Computer Vision andImaging, ed. by K. Chen, C.-B. Schรถnlieb, X.-C. Tai & L. Younes, accepted, Springer,arxiv: 1910.00115.

T. Valkonen (2020b), Inertial, corrected, primal-dual proximal splitting, SIAM Journal onOptimization 30(2), 1391โ€“1420, doi: 10.1137/18m1182851, arxiv: 1804.08736.

T. Valkonen (2020c), Testing and non-linear preconditioning of the proximal point method,Applied Mathematics & Optimization 82(2), 591โ€“636, doi: 10.1007/s00245-018-9541-6,arxiv: 1703.05705, url: hp://tuomov.iki.fi/m/proxtest.pdf.

T. Valkonen (2021), Preconditioned proximal point methods and notions of partial subregu-larity, Journal of Convex Analysis 28(1), in press, arxiv: 1711.05123, url: hp://tuomov.iki.

fi/m/proxtest.pdf.B. C. Vu (2013), A splitting algorithm for dual monotone inclusions involving cocoerciveoperators, Advances in Computational Mathematics 38(3), 667โ€“681, doi: 10.1007/s10444-011-9254-8.

S. J. Wright (2015), Coordinate descent algorithms,Mathematical Programming 151(1), 3โ€“34,doi: 10.1007/s10107-015-0892-3.

S. J. Wright, R. D. Nowak &M.A. T. Figueiredo (2009), Sparse reconstruction by separableapproximation, IEEE Transactions on Signal Processing 57(7), 2479โ€“2493, doi: 10.1109/tsp.2009.2016892.

D. Yost (1993), Asplund spaces for beginners, Acta Universitatis Carolinae. Mathematica etPhysica 34(2), 159โ€“177, url: hps://dml.cz/dmlcz/702006.

X. Zhang, M. Burger & S. Osher (2011), A unied primal-dual algorithm framework basedon Bregman iteration, Journal of Scientic Computing 46(1), 20โ€“46, doi: 10.1007/s10915-010-9408-8.

X. Y. Zheng& K. F. Ng (2010), Metric subregularity and calmness for nonconvex generalizedequations in Banach spaces, SIAM Journal on Optimization 20(5), 2119โ€“2136, doi: 10.1137/090772174.

Z. Zhou & A.M.-C. So (2017), A unied approach to error bounds for structured convexoptimization problems, Mathematical Programming 165(2), 689โ€“728, doi: 10.1007/s10107-016-1100-9.

M. Zhu & T. Chan (2008), An ecient primal-dual hybrid gradient algorithm for totalvariation image restoration, CAM Report 08-34, UCLA, url: p://p.math.ucla.edu/

pub/camreport/cam08-34.pdf.

384

top related