
Functional Analysis and Optimization

Kazufumi Ito∗

November 29, 2016

Abstract

In this monograph we develop the function space method for optimization problems and operator equations in Banach spaces. Optimization is one of the key components of mathematical modeling of real world problems, and the solution method provides an accurate and essential description and validation of the mathematical model. Optimization problems are encountered frequently in engineering and the sciences and have widespread practical applications. In general we optimize an appropriately chosen cost functional subject to constraints, for example equality and inequality constraints. The problem is cast in a function space, and it is important to formulate the problem in a proper function space framework in order to develop an accurate theory and effective algorithms. Many of the constraints of the optimization are governed by partial differential and functional equations. In order to discuss the existence of optimizers it is essential to develop a comprehensive treatment of the constraint equations. In general the necessary optimality condition takes the form of operator equations.

Non-smooth optimization has become a very basic modeling tool and enlarges and enhances the applications of the optimization method in general. For example, in the classical variational formulation we analyze the non-Newtonian energy functional and the nonsmooth friction penalty. For imaging/signal analysis, sparsity optimization is used by means of L1 and TV regularization. In order to develop an efficient solution method for large scale optimizations it is essential to derive a set of necessary conditions in equation form rather than as a variational inequality. The Lagrange multiplier theory for constrained minimization and nonsmooth optimization problems is developed and analyzed. The theory derives the complementarity condition for the multiplier and the solution. Consequently, one can derive the necessary optimality system for the solution and the multiplier for a general class of nonsmooth and nonconvex optimizations. In the light of the theory we derive and analyze numerical algorithms based on the primal-dual active set method and the semi-smooth Newton method.

The monograph also covers the basic materials for real analysis, functional analysis, Banach space theory, convex analysis, operator theory and PDE theory, which makes the book self-contained, comprehensive and complete. The analysis component is naturally connected to the optimization theory. The necessary optimality condition is in general written as nonlinear operator equations for the primal variable and the Lagrange multiplier. The Lagrange multiplier theory of a general class of nonsmooth and

∗Department of Mathematics, North Carolina State University, Raleigh, North Carolina, USA


non-convex optimizations is based on functional analysis tools. For example, the necessary optimality condition for the constrained optimization is in general a nonsmooth, pseudo-monotone type operator equation.

To develop the mathematical treatment of PDE constrained minimization we also present the PDE theory and the linear and nonlinear C0-semigroup theory on Banach spaces, and we give a full discussion of the well-posedness of the control dynamics and optimization problems. That is, the existence and uniqueness of solutions to a general class of controlled linear and nonlinear PDEs and of Cauchy problems for nonlinear evolution equations in Banach spaces are discussed, in particular via the linear operator theory, the (pseudo-) monotone operator and mapping theory, and the fixed point theory. Throughout, the monograph demonstrates the theory and algorithms using concrete examples and describes how to apply them to motivating applications.

1 Introduction

In this monograph we develop the function space approach to optimization problems, e.g., nonlinear programming in Banach spaces, convex and non-convex nonsmooth variational problems, control and inverse problems, image/signal analysis, material design, classification, and resource allocation. We also develop the basic functional analysis tools for nonlinear equations and Cauchy problems in Banach spaces in order to establish the well-posedness of the constrained optimization. The function space optimization method allows us to study the well-posedness for a general class of optimization problems systematically using functional analysis tools, and it also enables us to develop and analyze numerical optimization algorithms based on the calculus of variations and the Lagrange multiplier theory for constrained optimization. The function space analysis provides the regularizing procedure to remedy the ill-posedness of the constraints and of the necessary optimality system, and thus one can develop stable and effective numerical algorithms. The convergence analysis for the function space optimization guides us to analyze and improve the stability and effectiveness of algorithms for the associated large scale optimizations.

A general class of constrained optimization problems in function spaces is considered, and we develop the Lagrange multiplier theory and effective solution algorithms. Applications are presented and analyzed for various examples including control and design optimization, inverse problems, image analysis and variational problems. Non-convex and non-smooth optimization becomes increasingly important for the advanced and effective use of optimization methods and thus these are essential topics of the monograph. We introduce the basic function space and functional analysis tools needed for our analysis. The monograph provides an introduction to functional analysis, real analysis and convex analysis and their basic concepts with illustrative examples. Thus, one can use the book as basic course material for functional analysis and nonlinear operator theory. The nonlinear operator theory and its applications to PDE problems are presented in detail, including classical variational optimization problems in Newtonian and non-Newtonian mechanics and fluids. Emerging applications of optimization include imaging and signal analysis as well as classification and machine learning, which are often formulated as PDE-constrained optimizations.

There are numerous applications of the optimization theory and variational methods in all sciences and to real world applications. For example, the following are motivating examples.

Example 1 (Quadratic programming)
\[
\min\ \tfrac{1}{2}x^T A x - a^T x \quad \text{over } x \in \mathbb{R}^n
\]
\[
\text{subject to } Ex = b \ \text{(equality constraint)}, \quad Gx \le c \ \text{(inequality constraint)},
\]
where $A \in \mathbb{R}^{n\times n}$ is a symmetric positive matrix, $E \in \mathbb{R}^{m\times n}$, $G \in \mathbb{R}^{p\times n}$, and $a \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, $c \in \mathbb{R}^p$ are given vectors. The quadratic programming problem is formulated on a Hilbert space $X$ for the variational problem, in which $(Ax, x)_X$ defines the quadratic form on $X$.
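For instance, in the equality-constrained case (dropping $Gx \le c$) the minimizer can be computed directly from its KKT system $Ax + E^T\mu = a$, $Ex = b$. The following is a minimal sketch of mine in Python with NumPy, with made-up data; the function name and values are illustrative assumptions, not from the text.

```python
import numpy as np

def solve_eq_qp(A, a, E, b):
    """Solve min 1/2 x^T A x - a^T x  s.t.  E x = b  via the KKT system.

    The KKT conditions are  A x + E^T mu = a  and  E x = b,
    i.e. one linear system in (x, mu)."""
    n, m = A.shape[0], E.shape[0]
    K = np.block([[A, E.T], [E, np.zeros((m, m))]])
    rhs = np.concatenate([a, b])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]          # primal x, multiplier mu

# small illustrative data
A = np.array([[2.0, 0.0], [0.0, 4.0]])
a = np.array([1.0, 1.0])
E = np.array([[1.0, 1.0]])
b = np.array([1.0])
x, mu = solve_eq_qp(A, a, E, b)
print(x, mu)
```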

Example 2 (Linear programming) The linear programming problem minimizes $(c, x)_X$ subject to the linear constraints $Ax = b$ and $x \ge 0$. Its dual problem is to maximize $(b, y)$ subject to $A^* y \le c$. It has many important classes of applications. For example, the optimal transport problem is to find the joint distribution function $\mu(x, y) \ge 0$ on $X \times Y$ that transports the probability density $p_0(x)$ to $p_1(y)$, i.e.
\[
\int_Y \mu(x, y)\, dy = p_0(x), \qquad \int_X \mu(x, y)\, dx = p_1(y),
\]
with minimum energy
\[
\int_{X \times Y} c(x, y)\, \mu(x, y)\, dx\, dy = (c, \mu).
\]
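On a finite grid the transport problem becomes exactly the linear program above. The sketch below is my own illustration (grid sizes, marginals and cost are arbitrary assumptions, not from the text); it uses scipy.optimize.linprog to compute a small discrete transport plan.

```python
import numpy as np
from scipy.optimize import linprog

# discrete marginals p0 (length n) and p1 (length m), both summing to 1
p0 = np.array([0.5, 0.3, 0.2])
p1 = np.array([0.4, 0.4, 0.2])
n, m = len(p0), len(p1)

# cost c(x_i, y_j) = |x_i - y_j|^2 on equispaced points
x, y = np.linspace(0, 1, n), np.linspace(0, 1, m)
C = (x[:, None] - y[None, :])**2

# unknown mu_{ij} >= 0, flattened row-wise; marginal constraints A_eq mu = b_eq
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i*m:(i+1)*m] = 1.0          # sum_j mu_ij = p0_i
for j in range(m):
    A_eq[n + j, j::m] = 1.0             # sum_i mu_ij = p1_j
b_eq = np.concatenate([p0, p1])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
mu = res.x.reshape(n, m)
print(mu, res.fun)
```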

Example 3 (Integer programming) Let $F$ be a continuous function on $\mathbb{R}^n$. The integer programming problem is to minimize $F(x)$ over integer variables $x$, i.e.,
\[
\min\ F(x) \quad \text{subject to } x_i \in \{0, 1, \cdots, N\}.
\]
The binary optimization minimizes $F(x)$ over binary variables $x \in \{0, 1\}^n$. The network optimization problem is one of the important applications.

Example 4 (Obstacle problem) Consider the variational problem
\[
\min\ J(u) = \int_0^1 \Big(\frac{1}{2}\Big|\frac{d}{dx}u(x)\Big|^2 - f(x)\,u(x)\Big)\, dx \tag{1.1}
\]
subject to
\[
u(x) \le \psi(x)
\]
over all displacement functions $u \in H_0^1(0, 1)$, where the function $\psi$ represents the upper bound (obstacle) on the deformation. The functional $J$ represents the Newtonian deformation energy and $f(x)$ is the applied force.

Example 5 (Function Interpolation) Consider the interpolation problem
\[
\min\ \int_0^1 \frac{1}{2}\Big|\frac{d^2}{dx^2}u(x)\Big|^2\, dx \quad \text{over all functions } u(x), \tag{1.2}
\]
subject to the interpolation conditions
\[
u(x_i) = b_i, \quad \frac{d}{dx}u(x_i) = c_i, \quad 1 \le i \le m.
\]
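Between the nodes, the Euler-Lagrange equation of (1.2) gives $u'''' = 0$, so with both values and slopes fixed at each node the minimizer coincides on each subinterval with the piecewise cubic Hermite interpolant. A minimal sketch of mine, assuming SciPy is available and using made-up data:

```python
import numpy as np
from scipy.interpolate import CubicHermiteSpline

# illustrative interpolation data (x_i, b_i, c_i), not from the text
xi = np.array([0.0, 0.3, 0.7, 1.0])
bi = np.array([0.0, 1.0, 0.5, 0.0])   # prescribed values u(x_i) = b_i
ci = np.array([1.0, 0.0, -1.0, 0.0])  # prescribed slopes u'(x_i) = c_i

u = CubicHermiteSpline(xi, bi, ci)    # piecewise cubic with these values/slopes

# evaluate the interpolant and its bending energy 1/2 * int |u''|^2 dx
t = np.linspace(0, 1, 1001)
energy = 0.5 * np.sum(u(t, 2)**2) * (t[1] - t[0])
print(u(0.5), energy)
```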


Example 6 (Image/Signal Analysis) Consider the deconvolution problem of finding the image $u$ that satisfies $(Ku)(x) = f(x)$ for the convolution operator $K$,
\[
(Ku)(x) = \int_\Omega k(x, y)\, u(y)\, dy,
\]
over a domain $\Omega$. It is in general ill-posed and we use the (Tikhonov) regularization formulation to obtain a stable deconvolution solution $u$, i.e., for $\Omega = [0, 1]$
\[
\min\ \int_0^1 \Big(|Ku - f|^2 + \frac{\alpha}{2}\Big|\frac{d}{dx}u(x)\Big|^2 + \beta\, u(x)\Big)\, dx \quad \text{over all possible images } u \ge 0, \tag{1.3}
\]
where $\alpha, \beta \ge 0$ are the regularization parameters. The first regularization term penalizes the quadratic variation of the image $u$. Since $|u| = u$ when $u \ge 0$, the second regularization term is an $L^1$ sparsity term. The two regularization parameters are chosen so that the variation of the image $u$ is regularized.
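A discretized version of (1.3) without the sparsity term ($\beta = 0$) reduces to a linear least-squares problem with a derivative penalty. The sketch below is my own illustration with an assumed Gaussian kernel and synthetic data (and the nonnegativity constraint ignored), just to show the effect of $\alpha$:

```python
import numpy as np

n = 200
x = np.linspace(0, 1, n)
h = x[1] - x[0]

# convolution operator K with an assumed Gaussian kernel k(x, y)
K = np.exp(-((x[:, None] - x[None, :])**2) / (2 * 0.03**2)) * h
u_true = ((x > 0.3) & (x < 0.6)).astype(float)        # unknown "image"
f = K @ u_true + 1e-3 * np.random.randn(n)            # noisy data

# first-difference matrix approximating d/dx
D = (np.eye(n, k=1) - np.eye(n)) / h
D = D[:-1]

# Tikhonov solution: minimize |Ku - f|^2 + (alpha/2)|Du|^2
alpha = 1e-4
u_rec = np.linalg.solve(K.T @ K + 0.5 * alpha * (D.T @ D), K.T @ f)
print(np.linalg.norm(u_rec - u_true) * np.sqrt(h))
```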

Example 7 ($L^0$ optimization) Problems of optimal resource allocation, material distribution and optimal time scheduling can be formulated as
\[
\min\ F(u) + \int_\Omega h(u(\omega))\, d\omega \tag{1.4}
\]
with $h(u) = \frac{\alpha}{2}|u|^2 + \beta\, |u|_0$, where
\[
|0|_0 = 0, \qquad |u|_0 = 1 \ \text{otherwise}.
\]
Since
\[
\int_\Omega |u(\omega)|_0\, d\omega = \mathrm{vol}(\{\omega : u(\omega) \neq 0\}),
\]
the solution to (1.4) provides the optimal distribution of $u$ minimizing the performance $F(u)$ with an appropriate choice of (regularization) parameters $\alpha, \beta > 0$.

Example 8 (Bayesian statistics) Consider the statistical optimization
\[
\min\ -\log p(y|x) - \beta \log p(x) \quad \text{over all admissible parameters } x, \tag{1.5}
\]
where $p(y|x)$ is the conditional probability density of the observation $y$ given the parameter $x$ and $p(x)$ is the prior probability density for the parameter $x$. It is the maximum a posteriori estimation of the parameter $x$ given the observation $y$. In the case of inverse problems it has the form
\[
\min\ \phi(x|y) + \alpha\, p(x) \tag{1.6}
\]
where $x$ is in a function space $X$, $\phi(x|y)$ is the fidelity term and $p$ is a regularization semi-norm on $X$. The constant $\alpha > 0$ is the Tikhonov regularization parameter.

Example 9 (Optimal Control Problem) Consider the constrained optimization of the form
\[
\min\ F(x) + H(u) \quad \text{subject to } E(x, u) = 0 \ \text{and } u \in C,
\]
where $x \in X$ and $u \in U$ are the state and control variables in Hilbert spaces $X$ and $U$, respectively. The equality constraint $E$ represents the model equations determining $x = x(u) \in X$ as a function of $u \in C$, a closed convex set in $U$. The separable cost functionals are appropriately selected to achieve a specific performance. Such optimization problems arise in optimal control problems, inverse medium problems and optimal design problems.

Example 10 (Machine Learning) Consider the constrained optimization of the form
\[
L(p, q) =
\]
Multi-dimensional versions of these examples and more illustrative examples will be discussed and analyzed throughout the monograph.

Discretization An important aspect of the function space optimization is the approximation method and theory. For example, let $B_i^n(x)$ be the linear B-spline defined by
\[
B_i^n(x) = \begin{cases} \dfrac{x - x_{i-1}}{x_i - x_{i-1}} & \text{on } [x_{i-1}, x_i] \\[1ex] \dfrac{x - x_{i+1}}{x_i - x_{i+1}} & \text{on } [x_i, x_{i+1}] \\[1ex] 0 & \text{otherwise} \end{cases} \tag{1.7}
\]
and let
\[
u^n(x) = \sum_{i=1}^{n-1} u_i\, B_i^n(x) \sim u(x). \tag{1.8}
\]
Then,
\[
\frac{d}{dx}u^n = \frac{u_i - u_{i-1}}{x_i - x_{i-1}}, \quad x \in (x_{i-1}, x_i).
\]
Let $x_i = \frac{i}{n}$ and $\Delta x = \frac{1}{n}$; then one can define the discretized problem of the obstacle problem (Example 4):
\[
\min\ \sum_{i=1}^{n} \Big(\frac{1}{2}\Big|\frac{u_i - u_{i-1}}{\Delta x}\Big|^2 - f_i u_i\Big)\, \Delta x
\]
subject to
\[
u_0 = u_n = 0, \quad u_i \le \psi_i = \psi(x_i),
\]
where $f_i = f(x_i)$. The discretized problem is a quadratic program for $\vec{u} \in \mathbb{R}^{n-1}$ with
\[
A = \frac{1}{\Delta x}\begin{pmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{pmatrix}, \qquad b = \Delta x \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_{n-2} \\ f_{n-1} \end{pmatrix}.
\]
Thus, we will discuss the convergence analysis of the approximate solutions $u^n$ to the solution $u$ as $n \to \infty$ in an appropriate function space.
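As a quick illustration of this convergence (a sketch of mine, not from the text), drop the obstacle and take $f(x) = \pi^2\sin(\pi x)$, for which $-u'' = f$, $u(0) = u(1) = 0$ has the exact solution $u(x) = \sin(\pi x)$; solving $A\vec{u} = b$ above for increasing $n$ shows the approximation error decreasing.

```python
import numpy as np

def discrete_solution(n, f):
    """Solve the discretized problem A u = b (without the obstacle) on a uniform grid."""
    dx = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    A = (2 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)) / dx
    b = dx * f(x[1:-1])
    u = np.zeros(n + 1)
    u[1:-1] = np.linalg.solve(A, b)
    return x, u

# -u'' = pi^2 sin(pi x), u(0) = u(1) = 0 has exact solution u(x) = sin(pi x)
f = lambda x: np.pi**2 * np.sin(np.pi * x)
u_exact = lambda x: np.sin(np.pi * x)
for n in [8, 16, 32, 64]:
    x, u = discrete_solution(n, f)
    print(n, np.max(np.abs(u - u_exact(x))))   # error decreases roughly like 1/n^2
```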

In general, one has an optimization of the form
\[
\min\ F_0(x) + F_1(x) \quad \text{over } x \in X \quad \text{subject to } E(x) \in K,
\]
where $X$ is a Banach space, $F_0$ is $C^1$, $F_1$ is a nonsmooth functional, and $E(x) \in K$ represents the constraint conditions with $K$ a closed convex cone. For the existence of a minimizer we require a coercivity condition, lower-semicontinuity, and the continuity of the constraint in a weak or weak star topology of $X$. For the necessary optimality condition, we develop a generalized Lagrange multiplier theory:
\[
\begin{aligned}
&F_0'(x^*) + \lambda + E'(x^*)^*\mu = 0, \quad E(x^*) \in K, \\
&C_1(x^*, \lambda) = 0, \\
&C_2(x^*, \mu) = 0,
\end{aligned} \tag{1.9}
\]
where $\lambda$ is the Lagrange multiplier for a relaxed derivative of $F_1$ and $\mu$ is the Lagrange multiplier for the constraint $E(x) \in K$. We derive the complementarity conditions $C_1$ and $C_2$ so that (1.9) is a complete set of equations for determining $(x^*, \lambda, \mu)$.

We discuss the solution method for linear and nonlinear equations in Banach spaces of the form
\[
f \in A(x) \tag{1.10}
\]
where $f \in X^*$, the dual space of a Banach space $X$, and $A$ is a pseudo-monotone mapping $X \to X^*$, and the Cauchy problem
\[
\frac{d}{dt}x(t) \in A(t)(x(t)) + f(t), \quad x(0) = x_0, \tag{1.11}
\]
where $A(t)$ is an m-dissipative mapping in $X \times X$. Also, we note that the necessary optimality system (1.9) is a nonlinear operator equation and we use the nonlinear operator theory to analyze solutions to (1.9). We also discuss PDE constrained optimization problems, in which $f$ and $f(t)$ are functions of control, design and medium parameters. We minimize a proper performance index defined for the state and control functions subject to the constraint (1.10) or (1.11). Thus, we combine the analysis of the PDE theory and the optimization theory to develop a comprehensive treatment and analysis of PDE constrained optimization problems.
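To make the complementarity formulation in (1.9) concrete, here is a sketch of my own, under the assumption that the primal-dual active set idea mentioned in the abstract is applied to the discretized obstacle problem above: the conditions $Au - b + \lambda = 0$, $u \le \psi$, $\lambda \ge 0$, $\lambda^T(u - \psi) = 0$ can be written as $\lambda = \max(0, \lambda + c(u - \psi))$ for any $c > 0$, and the active set iteration solves this coupled system.

```python
import numpy as np

def primal_dual_active_set(A, b, psi, c=1.0, max_iter=50):
    """Solve  min 1/2 u^T A u - b^T u  s.t.  u <= psi  by a primal-dual active set
    iteration on the complementarity condition lambda = max(0, lambda + c(u - psi))."""
    n = len(b)
    u, lam = np.zeros(n), np.zeros(n)
    for _ in range(max_iter):
        active = lam + c * (u - psi) > 0          # predicted active set {u = psi}
        u_new, lam_new = np.zeros(n), np.zeros(n)
        u_new[active] = psi[active]               # on the active set: u = psi
        I = ~active                               # inactive set: lambda = 0, (A u)_I = b_I
        rhs = b[I] - A[np.ix_(I, active)] @ psi[active]
        u_new[I] = np.linalg.solve(A[np.ix_(I, I)], rhs)
        lam_new[active] = (b - A @ u_new)[active] # residual gives the multiplier
        if np.array_equal(active, lam_new + c * (u_new - psi) > 0):
            return u_new, lam_new                 # active set settled: KKT satisfied
        u, lam = u_new, lam_new
    return u, lam

# obstacle problem data from the discretization above (f = 1, psi = 0.05, assumed values)
n = 64; dx = 1.0 / n
A = (2 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)) / dx
b = dx * np.ones(n - 1)
u, lam = primal_dual_active_set(A, b, 0.05 * np.ones(n - 1))
print(u.max(), lam.min())
```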

The outline of the monograph is as follows. In Section 2 we introduce the basic function space and functional analysis tools needed for our analysis. It provides an introduction to functional analysis and real analysis and their basic concepts with illustrative examples, for example the dual space of a normed space, the Hahn-Banach theorem, the open mapping theorem, the closed range theorem, distribution theory and weak solutions to PDEs, and compact operators and spectral theory.

In Section 3 we develop the Lagrange multiplier method for a general class of (smooth) constrained optimization problems in Banach spaces. The Hilbert space theory is introduced first and the Lagrange multiplier is defined as the limit of penalized problems. The constraint qualification condition is introduced and we derive the (normal) necessary optimality system for the primal and dual variables. We discuss the minimum norm problem and the duality principle in Banach spaces. In Section 4 we present the linear operator theory and the C0-semigroup theory on Banach spaces. In Section 5 we develop the monotone and pseudo-monotone operator theory for nonlinear equations in Banach spaces, the nonlinear semigroup theory in Banach spaces, and the fixed point theory for nonlinear equations in Banach spaces.

In Section 7 we develop the basic convex analysis tools and the Lagrange multipliertheory for a general class of convex non-smooth optimization. We introduce and analyzethe augmented Lagrangian method for the constrained optimization and non-smoothoptimization problems.


In Section 8 we develop the Lagrange multiplier theory for a general class of non-smooth and non-convex optimizations.

In this course we discuss the well-posedness of evolution equations in Banach spaces. Such problems arise in PDE dynamics and functional equations. We develop the linear and nonlinear theory for the corresponding solution semigroups. The lectures include, for example, the Hille-Yosida theory and the Lumer-Phillips theory for linear semigroups, the Crandall-Liggett theory for nonlinear contraction semigroups, and the Crandall-Pazy theory for nonlinear evolution equations. In particular, (numerical) approximation theory for PDE solutions is discussed based on the Trotter-Kato theory, the Takahashi-Oharu theory, the Chernoff theory and the operator splitting method. The theory and its applications are examined and demonstrated using many motivating PDE examples including linear dynamics (e.g. heat, wave and hyperbolic equations) and nonlinear dynamics (e.g. nonlinear diffusion, conservation laws, Hamilton-Jacobi and Navier-Stokes equations). New classes of PDE examples are formulated and detailed applications of the theory are carried out.

The lectures also cover the basic elliptic theory via the Lax-Milgram theorem, the Minty-Browder theory and convex optimization. Functional analytic methods are also introduced for the basic PDE theory.

The students are expected to have the basic knowledge in real and functional anal-ysis and PDEs.

Lecture notes will be provided. Reference book: ”Evolution equations and Approx-imation” K. Ito and F. Kappel, World Scientific.

2 Basic Function Space Theory

In this section we discuss the basic theory and concepts of functional analysis and introduce the function spaces which are essential to our function space analysis of a general class of optimizations. For a more comprehensive treatise and discussion of functional analysis we refer to, e.g., Yosida [].

2.1 Complete Normed Space

In this section we introduce vector spaces, normed spaces and Banach spaces.

Definition (Vector Space) A vector space over the scalar field $K$ is a set $X$, whose elements are called vectors, and in which two operations, addition and scalar multiplication, are defined with the following familiar algebraic properties.

(a) To every pair of vectors $x$ and $y$ corresponds a vector $x + y$, in such a way that
\[
x + y = y + x \quad \text{and} \quad x + (y + z) = (x + y) + z;
\]
$X$ contains a unique vector $0$ (the zero vector) such that $x + 0 = x$ for every $x \in X$, and to each $x \in X$ corresponds a unique inverse vector $-x$ such that $x + (-x) = 0$.

(b) To every pair $(\alpha, x)$ with $\alpha \in K$ and $x \in X$ corresponds a vector $\alpha x$, in such a way that
\[
1x = x, \quad \alpha(\beta x) = (\alpha\beta)x,
\]
and such that the two distributive laws
\[
\alpha(x + y) = \alpha x + \alpha y, \quad (\alpha + \beta)x = \alpha x + \beta x
\]
hold.

The symbol $0$ will of course also be used for the zero element of the scalar field. It is easy to see that the additive inverse element $-x$ satisfies $-x = (-1)x$, since $0 = 0x = (1 - 1)x = x + (-1)x$ for each $x \in X$. In what follows we consider vector spaces only over the real number field $\mathbb{R}$ or the complex number field $\mathbb{C}$.

Definition (Subspace, Basis) (1) A subset $S$ of a vector space $X$ is a (linear) subspace of $X$ if $\alpha x_1 + \beta x_2 \in S$ for all $x_1, x_2 \in S$ and $\alpha, \beta \in K$.
(2) A family of vectors $\{x_i\}_{i=1}^n$ is linearly independent if $\sum_{i=1}^n \alpha_i x_i = 0$ if and only if $\alpha_i = 0$ for all $1 \le i \le n$. The dimension $\dim(X)$ is the largest number of linearly independent vectors in $X$.
(3) A basis $\{e_\alpha\}$ of a vector space $X$ is a set of linearly independent vectors that, in linear combinations, can represent every vector in $X$, i.e., $x = \sum_\alpha a_\alpha e_\alpha$.

Example (Vector Space) (1) Let $X = C[0, 1]$ be the vector space of continuous functions $f$ on $[0, 1]$. $X$ is a vector space with
\[
(\alpha f + \beta g)(t) = \alpha f(t) + \beta g(t), \quad t \in [0, 1].
\]
The space $S$ of polynomials of degree less than $n$ is a linear subspace of $X$ with $\dim(S) = n$. The family $\{t^k\}_{k=0}^\infty$ of monomials is linearly independent, and hence $\dim(X) = \infty$.
(2) The set of real number sequences $x = (x_1, x_2, x_3, \cdots)$ is a vector space with
\[
(\alpha x + \beta y)_k = \alpha x_k + \beta y_k, \quad k \in \mathbb{N}.
\]
The family of unit vectors $e_k = (0, \cdots, 0, 1, 0, \cdots)$ (with $1$ in the $k$-th entry), $k \in \mathbb{N}$, is linearly independent.

(3) Let $X$ be the vector space of square integrable functions on $(0, 1)$. Let $e_k(x) = \sqrt{2}\sin(k\pi x)$, $0 \le x \le 1$. Then $\{e_k\}$ is an orthonormal basis, i.e.
\[
(e_k, e_j) = \int_0^1 e_k(x)e_j(x)\, dx = \delta_{k,j},
\]
and
\[
f(x) = \sum_{k=1}^\infty f_k\, e_k(x), \quad \text{the Fourier sine series},
\]
where $f_k = \int_0^1 f(x)e_k(x)\, dx$. Moreover $e_{n+1}$ is not linearly dependent on $(e_1, \cdots, e_n)$: if it were, $e_{n+1} = \sum_{i=1}^n \beta_i e_i$ with $\beta_i = (e_{n+1}, e_i) = 0$, which is a contradiction.
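A quick numerical check of this expansion (an illustration of mine, not from the text): approximate the coefficients $f_k$ by quadrature and compare the truncated sine series with $f$.

```python
import numpy as np

x = np.linspace(0, 1, 2001)
e = lambda k: np.sqrt(2) * np.sin(k * np.pi * x)     # orthonormal sine functions
f = x * (1 - x)                                      # a sample f in L^2(0,1)

# inner product by a simple Riemann sum
dot = lambda g, h: np.sum(g * h) * (x[1] - x[0])
print(dot(e(2), e(2)), dot(e(2), e(3)))              # approximately 1 and 0

N = 20
coeffs = [dot(f, e(k)) for k in range(1, N + 1)]
f_N = sum(c * e(k) for k, c in zip(range(1, N + 1), coeffs))
print(np.max(np.abs(f - f_N)))                       # small truncation error
```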

Definition (Normed Space) A norm $|\cdot|$ on a vector space $X$ is a real-valued functional on $X$ which satisfies
\[
|x| \ge 0 \ \text{for all } x \in X, \ \text{with equality if and only if } x = 0,
\]
\[
|\alpha x| = |\alpha|\,|x| \ \text{for every } x \in X \text{ and } \alpha \in K,
\]
\[
|x + y| \le |x| + |y| \ \text{(triangle inequality) for every } x, y \in X.
\]
A normed space is a vector space $X$ equipped with a norm $|\cdot|$ and will be denoted by $(X, |\cdot|)$. The norm of a normed space $X$ will be denoted by $|\cdot|_X$ or simply by $|\cdot|$ whenever the underlying space $X$ is understood from the context.


Example (Normed space) Let $X = C[0, 1]$ be the vector space of continuous functions $f$ on $[0, 1]$. The following are two normed spaces built on $X$: $X_1$ is equipped with the sup norm
\[
|f|_{X_1} = \max_{x \in [0,1]} |f(x)|,
\]
and $X_2$ is equipped with the $L^2$ norm
\[
|f|_{X_2} = \sqrt{\int_0^1 |f(x)|^2\, dx}.
\]

Definition (Banach Space) Let $X$ be a normed space.
(1) A sequence $\{x_n\}$ in $X$ converges to the limit $x^* \in X$ if and only if $\lim_{n\to\infty} |x_n - x^*| = 0$ in $\mathbb{R}$.
(2) A subspace $S$ of $X$ is said to be dense in $X$ if each $x \in X$ is the limit of a sequence of elements in $S$.
(3) A sequence $\{x_n\}$ in $X$ is called a Cauchy sequence if and only if $\lim_{m,n\to\infty} |x_m - x_n| = 0$, i.e., for all $\varepsilon > 0$ there exists $N$ such that $|x_m - x_n| < \varepsilon$ for all $m, n \ge N$. If every Cauchy sequence in $X$ converges to a limit in $X$, then $X$ is complete and is called a Banach space.

A Cauchy sequence $\{x_n\}$ in a normed space $X$ has a unique limit if it has an accumulation point. In fact, for any accumulation point $x^*$, taking the limit along a subsequence $x_m \to x^*$,
\[
\lim_{n\to\infty} |x^* - x_n| = \lim_{n\to\infty}\lim_{m\to\infty} |x_m - x_n| = 0.
\]

Example (continued) Two norms $|x|_1$ and $|x|_2$ on a vector space $X$ are equivalent if there exist $0 < \underline{c} \le \overline{c} < \infty$ such that
\[
\underline{c}\,|x|_1 \le |x|_2 \le \overline{c}\,|x|_1 \quad \text{for all } x \in X.
\]
(1) All norms on $\mathbb{R}^n$ are equivalent. But for $X = C[0, 1]$ consider $x_n(t) = t^n$. Then $|x_n|_{X_1} = 1$ but $|x_n|_{X_2} = \sqrt{\frac{1}{2n+1}}$ for all $n$. Thus $|x_n|_{X_2} \to 0$ as $n \to \infty$ and the norms of $X_1$ and $X_2$ are not equivalent.
(2) Every bounded sequence in $\mathbb{R}^n$ has a convergent subsequence in $\mathbb{R}^n$ (Bolzano-Weierstrass theorem). Consider the sequence $x_n(t) = t^n$ in $X = C[0, 1]$. Then $\{x_n\}$ has no subsequence convergent in $X_1$ even though $|x_n|_{X_1} = 1$ is bounded.
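A small numerical illustration of (1) (mine, not from the text): the sup norm of $x_n(t) = t^n$ stays at 1 while the $L^2$ norm decays like $1/\sqrt{2n+1}$.

```python
import numpy as np

t = np.linspace(0, 1, 100001)
for n in [1, 5, 25, 125]:
    xn = t**n
    sup_norm = np.max(np.abs(xn))                       # |x_n|_{X1}
    l2_norm = np.sqrt(np.sum(xn**2) * (t[1] - t[0]))    # |x_n|_{X2}, Riemann sum
    print(n, sup_norm, l2_norm, 1 / np.sqrt(2 * n + 1))
```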

Definition (Hilbert Space) If $X$ is a vector space, a functional $(\cdot, \cdot)$ defined on $X \times X$ is called an inner product on $X$ provided that for every $x, y, z \in X$ and $\alpha, \beta \in \mathbb{C}$
\[
(x, y) = \overline{(y, x)},
\]
\[
(\alpha x + \beta y, z) = \alpha(x, z) + \beta(y, z),
\]
\[
(x, x) \ge 0 \ \text{and} \ (x, x) = 0 \ \text{if and only if } x = 0,
\]
where $\bar{c}$ denotes the complex conjugate of $c \in \mathbb{C}$. Given such an inner product, a norm on $X$ can be defined by
\[
|x| = (x, x)^{\frac{1}{2}}.
\]


A vector space $(X, (\cdot, \cdot))$ equipped with an inner product is called a pre-Hilbert space. If $X$ is complete with respect to the induced norm, then it is called a Hilbert space.

Every inner product satisfies the Cauchy-Schwarz inequality
\[
|(x, y)| \le |x|\,|y| \quad \text{for } x, y \in X.
\]
In fact, for all $t \in \mathbb{R}$
\[
0 \le |x + t(x, y)y|^2 = |x|^2 + 2t\,|(x, y)|^2 + t^2\,|(x, y)|^2|y|^2.
\]
Since this quadratic in $t$ is nonnegative, its discriminant must satisfy $|(x, y)|^4 - |(x, y)|^2|x|^2|y|^2 \le 0$, which shows the Cauchy-Schwarz inequality.

Example (Hilbert space) (1) $\mathbb{R}^n$ with inner product
\[
(x, y) = \sum_{k=1}^n x_k y_k \quad \text{for } x, y \in \mathbb{R}^n.
\]
(2) Consider the space $\ell^2$ of square summable real number sequences $x = (x_1, x_2, \cdots)$ and define the inner product
\[
(x, y) = \sum_{k=1}^\infty x_k y_k \quad \text{for } x, y \in \ell^2.
\]
Then $\ell^2$ is a Hilbert space. In fact, let $\{x^n\}$ be a Cauchy sequence in $\ell^2$. Since
\[
|x^n_k - x^m_k| \le |x^n - x^m|_{\ell^2},
\]
$\{x^n_k\}$ is a Cauchy sequence in $\mathbb{R}$ for every $k$. Thus, one can define a sequence $x$ by $x_k = \lim_{n\to\infty} x^n_k$. Since
\[
\lim_{m\to\infty} |x^m - x^n|_{\ell^2} = |x - x^n|_{\ell^2}
\]
for each fixed $n$, we have $x \in \ell^2$ and $|x^n - x|_{\ell^2} \to 0$ as $n \to \infty$.
(3) Consider the space $X_2$ of continuously differentiable functions on $[0, 1]$ vanishing at $x = 0$, $x = 1$ with inner product
\[
(f, g) = \int_0^1 \frac{d}{dx}f\,\frac{d}{dx}g\, dx \quad \text{for } f, g \in X_2.
\]
Then $X_2$ is a pre-Hilbert space.

Every normed space $X$ is either a Banach space or a dense subspace of a Banach space $Y$ whose norm $|x|_Y$ satisfies
\[
|x|_Y = |x| \quad \text{for every } x \in X.
\]
In the latter case $Y$ is called the completion of $X$. The completion of a normed space proceeds as in Cantor's construction of the real numbers from the rational numbers.

Completion The set of Cauchy sequences $\{x_n\}$ of the normed space $X$ can be classified according to the equivalence $\{x_n\} \sim \{y_n\}$ if $\lim_{n\to\infty} |x_n - y_n| = 0$. Since $||x_n| - |x_m|| \le |x_n - x_m|$ and $\mathbb{R}$ is complete, it follows that $\lim_{n\to\infty} |x_n|$ exists. We denote by $\{x_n\}'$ the class containing $\{x_n\}$. Then the set $Y$ of all such equivalence classes $\tilde{x} = \{x_n\}'$ is a vector space with
\[
\{x_n\}' + \{y_n\}' = \{x_n + y_n\}', \qquad \alpha\,\{x_n\}' = \{\alpha x_n\}',
\]
and we define
\[
|\{x_n\}'|_Y = \lim_{n\to\infty} |x_n|.
\]
It is easy to see that these definitions of the vector sum, scalar multiplication and norm do not depend on the particular representatives of the classes $\{x_n\}'$ and $\{y_n\}'$. For example, if $\{x_n\} \sim \{y_n\}$, then $\lim_{n\to\infty} ||x_n| - |y_n|| \le \lim_{n\to\infty} |x_n - y_n| = 0$, and thus $\lim_{n\to\infty} |x_n| = \lim_{n\to\infty} |y_n|$. Since $|\{x_n\}'|_Y = 0$ implies that $\lim_{n\to\infty} |x_n - 0| = 0$, we have $0 \in \{x_n\}'$ and thus $|\cdot|_Y$ is a norm. Since an element $x \in X$ corresponds to the Cauchy sequence
\[
\tilde{x} = \{x, x, \ldots, x, \ldots\}' \in Y,
\]
$X$ is naturally contained in $Y$.

To prove the completeness of $Y$, let $\tilde{x}_k = \{x^{(k)}_n\}'$ be a Cauchy sequence in $Y$. Then for each $k$ we can choose an integer $n_k$ such that
\[
|x^{(k)}_m - x^{(k)}_{n_k}|_X \le k^{-1} \quad \text{for } m > n_k.
\]
We show that the sequence $\tilde{x}_k$ converges to the class containing the Cauchy sequence $\{x^{(k)}_{n_k}\}_k$ of $X$. To this end, denoting by $\hat{x}^{(k)}_{n_k} \in Y$ the class of the constant sequence $\{x^{(k)}_{n_k}, x^{(k)}_{n_k}, \ldots\}$, note that
\[
|\tilde{x}_k - \hat{x}^{(k)}_{n_k}|_Y = \lim_{m\to\infty} |x^{(k)}_m - x^{(k)}_{n_k}|_X \le k^{-1}.
\]
Since
\[
|x^{(k)}_{n_k} - x^{(m)}_{n_m}|_X = |\hat{x}^{(k)}_{n_k} - \hat{x}^{(m)}_{n_m}|_Y \le |\hat{x}^{(k)}_{n_k} - \tilde{x}_k|_Y + |\tilde{x}_k - \tilde{x}_m|_Y + |\tilde{x}_m - \hat{x}^{(m)}_{n_m}|_Y \le k^{-1} + m^{-1} + |\tilde{x}_k - \tilde{x}_m|_Y,
\]
$\{x^{(k)}_{n_k}\}_k$ is a Cauchy sequence of $X$. Let $\tilde{x} = \{x^{(k)}_{n_k}\}'$. Then,
\[
|\tilde{x} - \tilde{x}_k|_Y \le |\tilde{x} - \hat{x}^{(k)}_{n_k}|_Y + |\hat{x}^{(k)}_{n_k} - \tilde{x}_k|_Y \le |\tilde{x} - \hat{x}^{(k)}_{n_k}|_Y + k^{-1}.
\]
Since, as shown above,
\[
|\tilde{x} - \hat{x}^{(k)}_{n_k}|_Y = \lim_{m\to\infty} |x^{(m)}_{n_m} - x^{(k)}_{n_k}|_X \le \lim_{m\to\infty} |\tilde{x}_m - \tilde{x}_k|_Y + k^{-1},
\]
we conclude that $\lim_{k\to\infty} |\tilde{x} - \hat{x}^{(k)}_{n_k}|_Y = 0$ and so $\lim_{k\to\infty} |\tilde{x} - \tilde{x}_k|_Y = 0$. Moreover, the above proof shows that the correspondence $x \in X \mapsto \tilde{x} \in Y$ is isomorphic and isometric and the image of $X$ under this correspondence is dense in $Y$, that is, $X$ is dense in $Y$.

Example (Completion) Let $X = C[0, 1]$ be the vector space of continuous functions $f$ on $[0, 1]$. We consider the two normed spaces built on $X$: $X_1$ is equipped with the sup norm
\[
|f|_{X_1} = \max_{x \in [0,1]} |f(x)|,
\]
and $X_2$ is equipped with the $L^2$ norm
\[
|f|_{X_2} = \sqrt{\int_0^1 |f(x)|^2\, dx}.
\]
We claim that $X_1$ is complete but $X_2$ is not. The completion of $X_2$ is $L^2(0, 1)$, the space of square integrable functions.

First, discuss $X_1$. Let $\{f_n\}$ be a Cauchy sequence in $X_1$. Since
\[
|f_n(x) - f_m(x)| \le |f_n - f_m|_{X_1},
\]
$\{f_n(x)\}$ is a Cauchy sequence in $\mathbb{R}$ for every $x \in [0, 1]$. Thus, one can define a function $f$ by $f(x) = \lim_{n\to\infty} f_n(x)$. Since
\[
|f(x) - f(y)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(y)| + |f_n(y) - f(y)|
\]
and $|f_n(x) - f(x)| \to 0$ uniformly in $x \in [0, 1]$, the candidate $f$ is continuous. Thus, $X_1$ is complete.

Next, discuss $X_2$. Define a Cauchy sequence $\{f_n\}$ by

\[
f_n(x) = \begin{cases} \frac{1}{2} + n\,(x - \frac{1}{2}) & \text{on } [\frac{1}{2} - \frac{1}{2n}, \frac{1}{2} + \frac{1}{2n}] \\[0.5ex] 0 & \text{on } [0, \frac{1}{2} - \frac{1}{2n}] \\[0.5ex] 1 & \text{on } [\frac{1}{2} + \frac{1}{2n}, 1]. \end{cases}
\]
The candidate limit $f$ in this case is defined by $f(x) = 0$ for $x < \frac{1}{2}$, $f(\frac{1}{2}) = \frac{1}{2}$ and $f(x) = 1$ for $x > \frac{1}{2}$, and it is not in $X$. Consider the Cauchy sequences $\{g_n\}$ and $\{h_n\}$ in the equivalence class of $\{f_n\}$:
\[
g_n(x) = \begin{cases} 1 + n\,(x - \frac{1}{2}) & \text{on } [\frac{1}{2} - \frac{1}{n}, \frac{1}{2}] \\[0.5ex] 0 & \text{on } [0, \frac{1}{2} - \frac{1}{n}] \\[0.5ex] 1 & \text{on } [\frac{1}{2}, 1] \end{cases}
\qquad
h_n(x) = \begin{cases} n\,(x - \frac{1}{2}) & \text{on } [\frac{1}{2}, \frac{1}{2} + \frac{1}{n}] \\[0.5ex] 0 & \text{on } [0, \frac{1}{2}] \\[0.5ex] 1 & \text{on } [\frac{1}{2} + \frac{1}{n}, 1]. \end{cases}
\]
The candidate limits $g$ (with $g(x) = 0$ for $x < \frac{1}{2}$, $g(x) = 1$ for $x \ge \frac{1}{2}$) and $h$ (with $h(x) = 0$ for $x \le \frac{1}{2}$, $h(x) = 1$ for $x > \frac{1}{2}$) belong to the same equivalence class as $f$. Note that $f$, $g$ and $h$ differ only at $x = \frac{1}{2}$. In general, functions in one equivalence class differ only on a set of measure zero. The completion $Y$ of $X_2$ is denoted by
\[
L^2(0, 1) = \text{the space of square integrable measurable functions on } (0, 1)
\]
with
\[
|f|^2_{L^2} = \int_0^1 |f(x)|^2\, dx.
\]
Here, integrability is defined in the sense of Lebesgue, as discussed in the next section. The concept of completion states that for every $f \in Y = \bar{X}$, the completion of $X$, there exists a sequence $\{f_n\}$ in $X$ such that $|f_n - f|_Y \to 0$ as $n \to \infty$ and $|f|_Y = \lim_{n\to\infty} |f_n|_X$.


For the example $X_2$: for every square integrable function $f \in L^2(0, 1) = Y = \bar{X}_2$ there exists a sequence of continuous functions $f_n \in X_2$ such that $|f - f_n|_{L^2(0,1)} \to 0$.
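A quick numerical check of this example (my own illustration): the ramps $f_n$ above form a Cauchy sequence in the $L^2$ norm but not in the sup norm.

```python
import numpy as np

x = np.linspace(0, 1, 200001)
dx = x[1] - x[0]

def f(n):
    # continuous ramp from 0 to 1 around x = 1/2 with slope n
    return np.clip(0.5 + n * (x - 0.5), 0.0, 1.0)

l2 = lambda g: np.sqrt(np.sum(g**2) * dx)
for n, m in [(10, 20), (40, 80), (160, 320)]:
    diff = f(n) - f(m)
    print(n, m, l2(diff), np.max(np.abs(diff)))
# the L2 distance shrinks while the sup distance stays bounded away from 0
```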

Example ($H_0^1(0, 1)$) The completion of $C_0^1(0, 1)$, the pre-Hilbert space of continuously differentiable functions on $[0, 1]$ vanishing at $x = 0$, $x = 1$, with respect to the $H_0^1$ inner product
\[
(f, g)_{H_0^1} = \int_0^1 \frac{d}{dx}f(x)\,\frac{d}{dx}g(x)\, dx
\]
is denoted by $H_0^1(0, 1)$. It can be proved that
\[
H_0^1(0, 1) = \text{the space of absolutely continuous functions on } [0, 1] \text{ with square integrable derivative, vanishing at } x = 0,\ x = 1,
\]
with
\[
(f, g)_{H_0^1} = \int_0^1 \frac{d}{dx}f(x)\,\frac{d}{dx}g(x)\, dx,
\]
where integrability is defined in the sense of Lebesgue. Here, $f$ is an absolutely continuous function if there exists an integrable function $g$ such that
\[
f(x) = f(0) + \int_0^x g(s)\, ds, \quad x \in [0, 1],
\]
and then $f$ is almost everywhere (a.e.) differentiable with $\frac{d}{dx}f = g$ a.e. in $(0, 1)$. For example, the linear spline functions $B_k^n(x)$, $1 \le k \le n-1$, are in $H_0^1(0, 1)$.

2.2 Measure Theory

In this section we discuss the measure theory.

Definition (1) A topological space is a set $E$ together with a collection $\tau$ of subsets of $E$, called open sets, satisfying the following axioms: the empty set and $E$ itself are open, any union of open sets is open, and the intersection of any finite number of open sets is open. The collection $\tau$ of open sets is then also called a topology on $E$. We assume that a topological space $(E, \tau)$ satisfies Hausdorff's axiom of separation:
\[
\text{for every distinct } x_1, x_2 \in E \text{ there exist disjoint open sets } G_1, G_2 \text{ such that } x_1 \in G_1,\ x_2 \in G_2.
\]
(2) A metric space $E$ is a topological space with a metric $d$ if for all $x, y, z \in E$
\[
d(x, y) \ge 0, \ \text{and } d(x, y) = 0 \text{ if and only if } x = y,
\]
\[
d(x, y) = d(y, x),
\]
\[
d(x, z) \le d(x, y) + d(y, z).
\]
For $x_0 \in E$ and $r > 0$, $B(x_0, r) = \{x \in E : d(x, x_0) < r\}$ is the open ball of $E$ with center $x_0$ and radius $r$. A set $O$ is called open if for every point $x \in O$ there exists an open ball $B(x, r)$ contained in the set $O$.


(3) A collection $\mathcal{F}$ of subsets of $E$ is a $\sigma$-algebra if $E \in \mathcal{F}$,
\[
A^c = E \setminus A \in \mathcal{F} \quad \text{for all } A \in \mathcal{F},
\]
and for every countable family $\{A_i\}$ in $\mathcal{F}$ the union $\bigcup_{i=1}^\infty A_i \in \mathcal{F}$.
(4) A measure $\mu$ on $(E, \mathcal{F})$ assigns to each $A \in \mathcal{F}$ a nonnegative value $\mu(A)$ and satisfies $\sigma$-additivity:
\[
\mu\Big(\bigcup_{i=1}^\infty A_i\Big) = \sum_{i=1}^\infty \mu(A_i)
\]
for all pairwise disjoint sets $A_i$ in $\mathcal{F}$.

Theorem (Monotone Convergence) A finitely additive measure $\mu$ on $(E, \mathcal{F})$ with $\mu(E) < \infty$ is $\sigma$-additive if and only if for every nondecreasing sequence $\{A_k\}$ in $\mathcal{F}$ with $A = \bigcup_{k\ge 1} A_k \in \mathcal{F}$ one has $\lim_{n\to\infty} \mu(A_n) = \mu(A)$.

Proof: Suppose $\mu$ is $\sigma$-additive and let $\{A_k\}$ be nondecreasing with $A = \bigcup_{k\ge 1} A_k$. Since
\[
A = A_1 + (A_2 \setminus A_1) + (A_3 \setminus A_2) + \cdots
\]
is a disjoint union, we have
\[
\mu(A) = \mu(A_1) + \mu(A_2 \setminus A_1) + \mu(A_3 \setminus A_2) + \cdots = \lim_{n\to\infty}\Big(\mu(A_1) + \sum_{k=1}^{n-1}\big(\mu(A_{k+1}) - \mu(A_k)\big)\Big) = \lim_{n\to\infty} \mu(A_n).
\]
Conversely, let $A_1, A_2, \cdots \in \mathcal{F}$ be pairwise disjoint with $\sum_{k=1}^\infty A_k \in \mathcal{F}$. Then
\[
\mu\Big(\sum_{k=1}^\infty A_k\Big) = \mu\Big(\sum_{k=1}^n A_k\Big) + \mu\Big(\sum_{k=n+1}^\infty A_k\Big).
\]
Since $\sum_{k=n+1}^\infty A_k \downarrow \emptyset$ as $n \to \infty$, applying the continuity assumption to the nondecreasing complements gives $\mu(\sum_{k=n+1}^\infty A_k) \to 0$, and thus
\[
\mu\Big(\sum_{k=1}^\infty A_k\Big) = \lim_{n\to\infty} \sum_{k=1}^n \mu(A_k) = \sum_{k=1}^\infty \mu(A_k).
\]

Example (Metric space) A normed space $X$ is a metric space with $d(x, y) = |x - y|$. A point $x \in E$ is an accumulation point of a sequence $\{x_n\}$ if a subsequence $\{x_{n_k}\}$ satisfies $d(x_{n_k}, x) \to 0$ as $k \to \infty$. The closure $\bar{A}$ of a set $A$ is the collection of all limits of sequences in $A$. The boundary of a set $A$ is the collection of points $x$ such that every open ball at $x$ contains both points of $A$ and of $A^c$. A set $B$ in $E$ is closed if $B = \bar{B}$, i.e., every convergent sequence $\{x_n\}$ in $B$ has its limit in $B$. The complement $A^c$ of an open set $A$ is closed.

Examples ($\sigma$-algebra) There are the trivial $\sigma$-algebras
\[
\mathcal{F}_0 = \{E, \emptyset\}, \qquad \mathcal{F}_* = \text{all subsets of } E.
\]
Let $A$ be a subset of $E$; the $\sigma$-algebra generated by $A$ is
\[
\mathcal{F}_A = \{E, \emptyset, A, A^c\}.
\]
A finite collection of subsets $A_1, A_2, \cdots, A_n$ of $E$ which are pairwise disjoint and whose union is $E$ is called a partition of $E$. It generates the $\sigma$-algebra $\mathcal{A} = \{A = \bigcup_{j\in J} A_j\}$, where $J$ runs over all subsets of $\{1, \cdots, n\}$. This $\sigma$-algebra has $2^n$ elements. Every finite $\sigma$-algebra is of this form. The smallest nonempty elements $A_1, \cdots, A_n$ of this algebra are called atoms.

Example (Countable measure) Let $E$ have a countable decomposition $\{D_k\}$, i.e.,
\[
E = \sum_k D_k, \qquad D_i \cap D_j = \emptyset, \ i \neq j.
\]
Let $\mathcal{F} = \mathcal{F}_*$ and $\mu(D_k) = \alpha_k > 0$ with $\sum_k \alpha_k = 1$. For example, the Poisson measure on the nonnegative integers is given by
\[
D_k = \{k\}, \qquad \mu(D_k) = e^{-\lambda}\frac{\lambda^k}{k!}
\]
for $\lambda > 0$.

Definition For any family $\mathcal{C}$ of subsets of $E$, we can define the $\sigma$-algebra $\sigma(\mathcal{C})$ as the smallest $\sigma$-algebra $\mathcal{A}$ which contains $\mathcal{C}$. The $\sigma$-algebra $\sigma(\mathcal{C})$ is the intersection of all $\sigma$-algebras which contain $\mathcal{C}$; this intersection is again a $\sigma$-algebra, i.e.,
\[
\mathcal{A} = \bigcap_\alpha \mathcal{A}_\alpha,
\]
where the $\mathcal{A}_\alpha$ are all the $\sigma$-algebras that contain $\mathcal{C}$.

The following construction of the measure space and the measure is essential for the triple $(E, \mathcal{F}, m)$ on an uncountable space $E$.

If $(E, \mathcal{O})$ is a topological space, where $\mathcal{O}$ is the set of open sets in $E$, then the $\sigma$-algebra $\mathcal{B}(E)$ generated by $\mathcal{O}$ is called the Borel $\sigma$-algebra of the topological space $E$, $(E, \mathcal{B}(E))$ defines a measurable space, and a set $B$ in $\mathcal{B}(E)$ is called a Borel set. For example, $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ is a measurable space and $\mathcal{B}(\mathbb{R}^n)$ is the $\sigma$-algebra generated by the open balls in $\mathbb{R}^n$.

Caratheodory Theorem Let $\mathcal{B} = \sigma(\mathcal{A})$ be the smallest $\sigma$-algebra containing an algebra $\mathcal{A}$ of subsets of $E$, and let $\mu_0$ be a $\sigma$-additive measure on $(E, \mathcal{A})$. Then there exists a unique measure $\mu$ on $(E, \mathcal{B})$ which is an extension of $\mu_0$, i.e., $\mu(A) = \mu_0(A)$ for $A \in \mathcal{A}$.

Definition A map $f$ from a measurable space $(X, \mathcal{A})$ to another measurable space $(Y, \mathcal{B})$ is called measurable if $f^{-1}(B) = \{x \in X : f(x) \in B\} \in \mathcal{A}$ for all $B \in \mathcal{B}$. If $f : (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n)) \to (\mathbb{R}^m, \mathcal{B}(\mathbb{R}^m))$ is measurable, we say $f$ is a Borel function. In particular, every continuous function $\mathbb{R}^n \to \mathbb{R}^m$ is a Borel function, since the inverse image of an open set in $\mathbb{R}^m$ is open in $\mathbb{R}^n$.

Example (Measure space) Let $E = \mathbb{R}$ and let $\mathcal{B}(\mathbb{R})$ be the Borel $\sigma$-algebra. Note that
\[
(a, b] = \bigcap_n \Big(a, b + \frac{1}{n}\Big), \qquad [a, b] = \bigcap_n \Big(a - \frac{1}{n}, b + \frac{1}{n}\Big) \in \mathcal{B}(\mathbb{R}).
\]
Thus, $\mathcal{B}(\mathbb{R})$ coincides with the $\sigma$-algebra generated by the semi-closed intervals. Let $\mathcal{A}$ be the algebra of finite disjoint unions of semi-closed intervals $(a_i, b_i]$ and define $\mu_0$ by
\[
\mu_0\Big(\sum_{k=1}^n (a_k, b_k]\Big) = \sum_{k=1}^n \big(F(b_k) - F(a_k)\big),
\]
where $x \in \mathbb{R} \to F(x) \in \mathbb{R}^+$ is nondecreasing and right continuous with left limits everywhere. We obtain the measure $\mu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ by the Caratheodory Theorem if $\mu_0$ is $\sigma$-additive on $\mathcal{A}$.

We now prove that $\mu_0$ is $\sigma$-additive on $\mathcal{A}$. By the monotone convergence theorem it suffices to prove that
\[
\mu_0(A_n) \downarrow 0 \quad \text{whenever } A_n \downarrow \emptyset, \ A_n \in \mathcal{A}.
\]
Without loss of generality one can assume that $A_n \subset [-N, N]$. Since $F$ is right continuous, for each $A_n$ and every $\varepsilon > 0$ there exists a set $B_n \in \mathcal{A}$ with closure $\bar{B}_n \subset A_n$ such that
\[
\mu_0(A_n) - \mu_0(B_n) \le \varepsilon\, 2^{-n}.
\]
The collection of open sets $[-N, N] \setminus \bar{B}_n$ is an open covering of the compact set $[-N, N]$, since $\bigcap_n \bar{B}_n \subset \bigcap_n A_n = \emptyset$. By the Heine-Borel theorem there exists a finite subcovering:
\[
\bigcup_{n=1}^{n_0} \big([-N, N] \setminus \bar{B}_n\big) = [-N, N],
\]
and thus $\bigcap_{n=1}^{n_0} B_n = \emptyset$. Therefore,
\[
\mu_0(A_{n_0}) = \mu_0\Big(A_{n_0} \setminus \bigcap_{k=1}^{n_0} B_k\Big) + \mu_0\Big(\bigcap_{k=1}^{n_0} B_k\Big) = \mu_0\Big(A_{n_0} \setminus \bigcap_{k=1}^{n_0} B_k\Big) \le \mu_0\Big(\bigcup_{k=1}^{n_0} (A_k \setminus B_k)\Big) \le \sum_{k=1}^{n_0} \mu_0(A_k \setminus B_k) \le \varepsilon.
\]
Since $\varepsilon > 0$ is arbitrary and $\mu_0(A_n)$ is nonincreasing, $\mu_0(A_n) \to 0$ as $n \to \infty$.

Next, we define the integral of a measurable function $f$ on $(E, \mathcal{F}, \mu)$.

Definition (Elementary function) An elementary function $f$ is defined by
\[
f(x) = \sum_{k=1}^n x_k\, I_{A_k}(x),
\]
where $\{A_k\}$ is a partition of $E$, i.e., the $A_k \in \mathcal{F}$ are disjoint and $\bigcup_k A_k = E$. Then the integral of $f$ is given by
\[
\int_E f(x)\, d\mu(x) = \sum_{k=1}^n x_k\, \mu(A_k).
\]

Theorem (Approximation) For every measurable function $f \ge 0$ on $(E, \mathcal{F})$ there exists a sequence of elementary functions $\{f_n\}$ such that $0 \le f_n(x) \le f(x)$ and $f_n(x) \uparrow f(x)$ for all $x \in E$.


Proof: For $n \ge 1$, define a sequence of elementary functions by
\[
f_n(x) = \sum_{k=1}^{n2^n} \frac{k-1}{2^n}\, I_{k,n}(x) + n\, I_{\{f(x) > n\}},
\]
where $I_{k,n}$ is the indicator function of the set $\{x : \frac{k-1}{2^n} < f(x) \le \frac{k}{2^n}\}$. It is easy to verify that $f_n(x)$ is monotonically nondecreasing in $n$ and $f_n(x) \le f(x)$, and thus $f_n(x) \to f(x)$ for all $x \in E$.
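The dyadic construction in this proof is easy to visualize numerically. A small sketch of mine, assuming $E = [0, 1]$ with Lebesgue measure and a sample $f$ chosen only for illustration:

```python
import numpy as np

def dyadic_approx(f_vals, n):
    """Elementary approximation f_n from the proof: value (k-1)/2^n on the set
    {(k-1)/2^n < f <= k/2^n}, and the value n where f > n."""
    fn = (np.ceil(f_vals * 2**n) - 1) / 2**n
    fn = np.maximum(fn, 0.0)                 # where f = 0, f_n = 0
    return np.where(f_vals > n, float(n), fn)

x = np.linspace(0, 1, 10001)
f = 1.0 / np.sqrt(np.maximum(x, 1e-12))      # unbounded but finite a.e.
for n in [1, 2, 4, 8]:
    fn = dyadic_approx(f, n)
    print(n, np.max(fn - f) <= 0, np.mean(np.abs(f - fn)[f <= n]))
# f_n <= f everywhere and f_n increases pointwise to f
```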

Definition (Integral) For a nonnegative measurable function $f$ on $(E, \mathcal{F})$ we define the integral by
\[
\int_E f(x)\, d\mu(x) = \lim_{n\to\infty} \int_E f_n(x)\, d\mu(x),
\]
where the limit exists since $\int_E f_n(x)\, d\mu(x)$ is an increasing sequence of numbers. For a general measurable $f$, note that $f = f^+ - f^-$ with $f^+(x) = \max(0, f(x))$, $f^-(x) = \max(0, -f(x))$. So we can apply the above Theorem and Definition to $f^+$ and $f^-$:
\[
\int_E f(x)\, d\mu(x) = \int_E f^+(x)\, d\mu(x) - \int_E f^-(x)\, d\mu(x).
\]
If $\int_E f^+(x)\, d\mu(x),\ \int_E f^-(x)\, d\mu(x) < \infty$, then $f$ is integrable and
\[
\int_E |f(x)|\, d\mu(x) = \int_E f^+(x)\, d\mu(x) + \int_E f^-(x)\, d\mu(x).
\]

Corollary Let $f$ be a nonnegative measurable function on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu)$ with $\mu((a, b]) = F(b) - F(a)$. Then we have, as $n \to \infty$,
\[
\int_0^\infty f(x)\, d\mu(x) = \lim_{n\to\infty}\left[\sum_{k=1}^{n2^n} f\Big(\frac{k-1}{2^n}\Big)\Big(F\Big(\frac{k}{2^n}\Big) - F\Big(\frac{k-1}{2^n}\Big)\Big) + f(n)\big(1 - F(n)\big)\right] = \int_0^\infty f(x)\, dF(x),
\]
which is the Lebesgue-Stieltjes integral with respect to the measure $dF$.

The space $L$ of measurable functions on $(E, \mathcal{F}, \mu)$ is a complete metric space with the metric
\[
d(f, g) = \int_E \frac{|f - g|}{1 + |f - g|}\, d\mu.
\]
In fact, $d(f, g) = 0$ if and only if $f = g$ almost everywhere, and since
\[
\frac{|f + g|}{1 + |f + g|} \le \frac{|f|}{1 + |f|} + \frac{|g|}{1 + |g|},
\]
$d$ satisfies the triangle inequality.

Theorem (Completeness) $(L, d)$ is a complete metric space.

Proof: Let $\{f_n\}$ be a Cauchy sequence in $(L, d)$. Select a subsequence $\{f_{n_k}\}$ such that
\[
\sum_{k=1}^\infty d(f_{n_k}, f_{n_{k+1}}) < \infty.
\]
Then,
\[
\sum_k \int_E \frac{|f_{n_k} - f_{n_{k+1}}|}{1 + |f_{n_k} - f_{n_{k+1}}|}\, d\mu < \infty.
\]
Since $|f| \le \frac{2|f|}{1 + |f|}$ for $|f| \le 1$,
\[
\sum_{k=1}^\infty |f_{n_k} - f_{n_{k+1}}| < \infty \quad \text{a.e.},
\]
and thus $\{f_{n_k}\}$ converges almost everywhere to some $f \in L$. Moreover $d(f_{n_k}, f) \to 0$, and since $\{f_n\}$ is Cauchy this implies $d(f_n, f) \to 0$. Thus $(L, d)$ is complete.

Definition (Convergence in Measure) A sequence $\{f_n\}$ of measurable functions on $E$ converges to $f$ in measure if
\[
\text{for every } \varepsilon > 0, \quad \lim_{n\to\infty} \mu(\{x \in E : |f_n(x) - f(x)| \ge \varepsilon\}) = 0.
\]

Theorem (Convergence in measure) $f_n$ converges in measure to $f$ if and only if $f_n$ converges to $f$ in the $d$-metric.

Proof: For $f, g \in L$,
\[
d(f, g) = \int_{\{|f-g| \ge \varepsilon\}} \frac{|f - g|}{1 + |f - g|}\, d\mu + \int_{\{|f-g| < \varepsilon\}} \frac{|f - g|}{1 + |f - g|}\, d\mu \le \mu(\{|f - g| \ge \varepsilon\}) + \frac{\varepsilon}{1 + \varepsilon}\,\mu(E)
\]
holds for all $\varepsilon > 0$. Thus, if $f_n$ converges in measure to $f$, then $f_n$ converges to $f$ in the $d$-metric. Conversely, since
\[
d(f, g) \ge \frac{\varepsilon}{1 + \varepsilon}\, \mu(\{|f - g| \ge \varepsilon\}),
\]
if $f_n$ converges to $f$ in the $d$-metric, then $f_n$ converges in measure to $f$.

Corollary The completion of $C(E)$ with respect to the $L^p(E)$-norm
\[
\Big(\int_E |f|^p\, d\mu\Big)^{\frac{1}{p}}
\]
is $L^p(E) = \{f \in L : \int_E |f|^p\, d\mu < \infty\}$.

2.3 Baire’s Category Theory

In this section we discuss the uniform boundedness principle and the open mapping theorem. The underlying idea of the proofs of these theorems is the Baire theorem for complete metric spaces. It is concerned with the decomposition of a space as a union of subsets. For instance, we have the countable union
\[
\mathbb{R}^2 = \bigcup_{k,j} S_{k,j}, \qquad S_{k,j} = (k, k+1] \times (j, j+1].
\]
On the other hand, we have the uncountable union
\[
\bigcup_{\alpha\in\mathbb{R}} \ell_\alpha, \qquad \ell_\alpha = \{\alpha\} \times \mathbb{R},
\]
but $\ell_\alpha$, a one dimensional affine subspace of $\mathbb{R}^2$, has no interior points. The question is: can we represent $\mathbb{R}^2$ as a countable union of such sets? It turns out that the answer is no.

Definition (Category) Let $(X, d)$ be a metric space. A subset $E$ of $X$ is called nowhere dense if its closure does not contain any nonempty open set. $X$ is said to be of the first category if $X$ is the union of a countable number of nowhere dense sets. Otherwise, $X$ is said to be of the second category, i.e., whenever $X = \bigcup_k E_k$, the closure of at least one of the $E_k$'s has non-empty interior.

Baire-Hausdorff Theorem A complete metric space is of the second category.

Proof: Let $\{E_k\}$ be a sequence of closed sets with $\bigcup_n E_n = X$ in a complete metric space $X$. We will show that at least one of the $E_k$'s has non-empty interior. Assume otherwise, i.e., no $E_n$ contains an interior point. Let $O_n = E_n^c$; then $O_n$ is open and dense in $X$ for every $n \ge 1$. Thus, $O_1$ contains a closed ball $B_1 = \{x : d(x, x_1) \le r_1\}$ with $r_1 \in (0, \frac{1}{2})$. Since $O_2$ is open and dense, there exists a closed ball $B_2 = \{x : d(x, x_2) \le r_2\}$ with $r_2 \in (0, \frac{1}{2^2})$ contained in $B_1 \cap O_2$. By repeating the same process, there exists a sequence $\{B_k\}$ of closed balls $B_k = \{x : d(x, x_k) \le r_k\}$ such that
\[
B_{k+1} \subset B_k, \quad 0 < r_k < \frac{1}{2^k}, \quad B_k \cap E_k = \emptyset.
\]
The sequence $\{x_k\}$ is a Cauchy sequence in $X$ and, since $X$ is complete, $x^* = \lim x_n \in X$ exists. Since
\[
d(x_n, x^*) \le d(x_n, x_m) + d(x_m, x^*) \le r_n + d(x_m, x^*) \to r_n
\]
as $m \to \infty$, $x^* \in B_n$ for every $n$, i.e. $x^* \notin E_n$ for every $n$, and thus $x^* \notin \bigcup_n E_n = X$, which is a contradiction.

Baire Category Theorem Let $M$ be a set of the first category in a complete metric space $X$. Then the complement $M^c$ is dense in $X$.

Theorem (Totally bounded) A subset $M$ of a complete metric space $X$ is relatively compact if and only if it is totally bounded, i.e., for every $\varepsilon > 0$ there exists a finite family of points $m_1, m_2, \cdots, m_n$ in $M$ such that for every point $m \in M$, $d(m, m_k) < \varepsilon$ for at least one $m_k$.

Theorem (Banach-Steinhaus, uniform boundedness principle) Let $X$, $Y$ be Banach spaces and let $T_k$, $k \in K$, be a family (not necessarily countable) of continuous linear operators from $X$ to $Y$. Assume that
\[
\sup_{k\in K} |T_k x|_Y < \infty \quad \text{for all } x \in X.
\]
Then there exists a constant $c$ such that
\[
|T_k x|_Y \le c\,|x|_X \quad \text{for all } k \in K \text{ and } x \in X.
\]

Proof: Define
\[
X_n = \{x \in X : |T_k x| \le n \ \text{for all } k \in K\}.
\]
Then $X_n$ is closed, and by the assumption we have $X = \bigcup_n X_n$. It follows from the Baire category theorem that $\mathrm{int}(X_{n_0})$ is not empty for some $n_0$; select $x_0 \in X$ and $r > 0$ such that $B(x_0, r) \subset \mathrm{int}(X_{n_0})$. We have
\[
|T_k(x_0 + r z)| \le n_0 \quad \text{for all } k \text{ and } z \in B(0, 1),
\]
which implies
\[
r\,|T_k|_{\mathcal{L}(X,Y)} \le n_0 + |T_k x_0| \le 2n_0,
\]
so the claim holds with $c = 2n_0/r$.

The uniform boundedness principle is quite remarkable and surprising, since from pointwise estimates one derives a global (uniform) estimate. We have the following examples.

Examples (1) Let $X$ be a Banach space and $B$ be a subset of $X$. If for every $f \in X^*$ the set $f(B) = \{\langle f, x\rangle : x \in B\}$ is bounded in $\mathbb{R}$, then $B$ is bounded.
(2) Let $X$ be a Banach space and $B^*$ be a subset of $X^*$. If for every $x \in X$ the set $\langle B^*, x\rangle = \{\langle f, x\rangle : f \in B^*\}$ is bounded in $\mathbb{R}$, then $B^*$ is bounded.

Theorem (Banach closed graph theorem) A closed linear operator defined on all of a Banach space $X$ into a Banach space $Y$ is continuous.

2.4 Dual space and Hahn-Banach theorem

In this section we discuss the dual space of a normed space and the Hahn-Banach theorem.

Definition (Dual Space) If $X$ is a normed space, let $X^*$ denote the collection of all bounded linear functionals on $X$. $X^*$ is called the dual space of $X$. We define the duality product $\langle f, x\rangle_{X^*\times X}$ by
\[
\langle f, x\rangle_{X^*\times X} = f(x) \quad \text{for } x \in X,\ f \in X^*.
\]
For $f \in X^*$ we define the operator norm of $f$ by
\[
|f|_{X^*} = \sup_{x \neq 0} \frac{|f(x)|}{|x|_X}.
\]

Theorem (Dual space) The dual space $X^*$ equipped with the operator norm is a Banach space.

Proof: Let $\{f_n\}$ be a Cauchy sequence in $X^*$, i.e.
\[
|f_n(\phi) - f_m(\phi)| \le |f_n - f_m|_{X^*}|\phi| \to 0 \quad \text{as } m, n \to \infty,
\]
for all $\phi \in X$. Thus $\{f_n(\phi)\}$ is Cauchy in $\mathbb{R}$, and we define the functional $f$ on $X$ by
\[
f(\phi) = \lim_{n\to\infty} f_n(\phi), \quad \phi \in X.
\]
Then $f$ is linear, and for all $\varepsilon > 0$ there exists $N_\varepsilon$ such that for $m, n \ge N_\varepsilon$
\[
|f_m(\phi) - f_n(\phi)| \le \varepsilon\,|\phi|.
\]
Letting $m \to \infty$, we have
\[
|f(\phi) - f_n(\phi)| \le \varepsilon\,|\phi| \quad \text{for all } n \ge N_\varepsilon,
\]
and thus $f \in X^*$ and $|f_n - f|_{X^*} \to 0$ as $n \to \infty$.

Examples (Dual space) (1) Let $\ell^p$ be the space of $p$-summable infinite sequences $x = (\xi_1, \xi_2, \cdots)$ with norm $|x|_p = (\sum_k |\xi_k|^p)^{\frac{1}{p}}$. Then $(\ell^p)^* = \ell^q$ with $\frac{1}{p} + \frac{1}{q} = 1$ for $1 \le p < \infty$. Let $\ell^\infty$ be the space of uniformly bounded sequences $x = (\xi_1, \xi_2, \cdots)$ with norm $|x|_\infty = \sup_k |\xi_k|$, and let $c_0$ be the space of bounded sequences with $\lim_{n\to\infty} \xi_n = 0$. Then $(c_0)^* = \ell^1$.

(2) Let $L^p(\Omega)$ be the space of $p$-integrable functions $f$ on $\Omega$ with
\[
|f|_{L^p} = \Big(\int_\Omega |f(x)|^p\, dx\Big)^{\frac{1}{p}}.
\]
Then $L^p(\Omega)^* = L^q(\Omega)$, $\frac{1}{p} + \frac{1}{q} = 1$, for $p \ge 1$. In particular, $L^2(\Omega)^* = L^2(\Omega)$ and $L^1(\Omega)^* = L^\infty(\Omega)$, where
\[
L^\infty(\Omega) = \text{the space of essentially bounded functions on } \Omega
\]
with
\[
|f|_{L^\infty} = \inf\{M : |f(x)| \le M \ \text{almost everywhere on } \Omega\}.
\]

(3) Consider the linear functional $f(x) = x(1)$ on $C[0, 1]$. Then $f$ is in $X_1^*$ but not in $X_2^*$, since for the sequence defined by $x_n(t) = t^n$, $t \in [0, 1]$, we have $f(x_n) = 1$ but $|x_n|_{X_2} = \sqrt{\frac{1}{2n+1}} \to 0$ as $n \to \infty$. On the other hand, on the space of continuously differentiable functions with $x(0) = 0$ equipped with the norm $(\int_0^1 |x'(t)|^2\, dt)^{1/2}$, we have
\[
|x(t)| = \Big|x(0) + \int_0^t x'(s)\, ds\Big| \le \sqrt{\int_0^1 |x'(t)|^2\, dt},
\]
and thus $f$ is bounded with respect to this norm.

(4) The total variation $V_0^1(g)$ of a function $g$ on the interval $[0, 1]$ is defined by
\[
V_0^1(g) = \sup_{P\in\mathcal{P}} \sum_{i=0}^{n-1} |g(t_{i+1}) - g(t_i)|,
\]
where the supremum is taken over all partitions $P = \{0 \le t_0 \le \cdots \le t_n \le 1\}$ of the interval $[0, 1]$. Then $BV(0, 1) = \{$bounded functions $g$ with $V_0^1(g) < \infty\}$ with $|g|_{BV} = V_0^1(g)$. It is a normed space if we assume $g(0) = 0$. For example, every monotone function is of bounded variation. Every real-valued BV function can be expressed as the difference of two increasing functions (Jordan decomposition theorem).

Hahn-Banach Theorem Suppose $S$ is a subspace of a normed space $X$ and $f : S \to \mathbb{R}$ is a linear functional satisfying $f(x) \le |x|$ for all $x \in S$. Then there exists an $F \in X^*$ such that
\[
F(x) \le |x| \ \text{for all } x \in X \quad \text{and} \quad F(s) = f(s) \ \text{for all } s \in S.
\]
In general, the theorem holds for a vector space $X$ and a sub-additive, positively homogeneous function $p$, i.e., for all $x, y \in X$ and $\alpha > 0$
\[
p(x + y) \le p(x) + p(y), \qquad p(\alpha x) = \alpha\, p(x).
\]
Every linear functional $f$ on a subspace $S$ satisfying $f(x) \le p(x)$ on $S$ has an extension $F$ to $X$ satisfying $F(x) \le p(x)$ for all $x \in X$.


Proof: First, we show that there exists a one step extension of $f$: for $x_0 \in X \setminus S$, there exists an extension $f_1$ of $f$ to the subspace $Y_1$ spanned by $(S, x_0)$ such that $f_1(x) \le p(x)$ on $Y_1$. Note that for arbitrary $s_1, s_2 \in S$,
\[
f(s_1) + f(s_2) = f(s_1 + s_2) \le p(s_1 + s_2) \le p(s_1 - x_0) + p(s_2 + x_0).
\]
Hence there exists a constant $c$ such that
\[
\sup_{s\in S}\{f(s) - p(s - x_0)\} \le c \le \inf_{s\in S}\{p(s + x_0) - f(s)\}.
\]
Extend $f$ by defining, for $s \in S$ and $\alpha \in \mathbb{R}$,
\[
f_1(s + \alpha x_0) = f(s) + \alpha\, c,
\]
where $f_1(x_0) = c$ is determined as above. First note that the representation $s + \alpha x_0 \in Y_1$ is unique, since $s_1 + \alpha x_0 = s_2 + \beta x_0$ implies $s_2 - s_1 = (\alpha - \beta)x_0$ and thus $s_1 = s_2$ and $\alpha = \beta$. Next, we claim that
\[
f_1(s + \alpha x_0) \le p(s + \alpha x_0)
\]
for all $\alpha \in \mathbb{R}$ and $s \in S$. Indeed, by the definition of $c$, for $\alpha > 0$
\[
f_1(s + \alpha x_0) = \alpha\Big(c + f\big(\tfrac{s}{\alpha}\big)\Big) \le \alpha\Big(p\big(\tfrac{s}{\alpha} + x_0\big) - f\big(\tfrac{s}{\alpha}\big) + f\big(\tfrac{s}{\alpha}\big)\Big) = p(s + \alpha x_0),
\]
and for $\alpha < 0$
\[
f_1(s + \alpha x_0) = -\alpha\Big(-c + f\big(\tfrac{s}{-\alpha}\big)\Big) \le -\alpha\Big(p\big(\tfrac{s}{-\alpha} - x_0\big) - f\big(\tfrac{s}{-\alpha}\big) + f\big(\tfrac{s}{-\alpha}\big)\Big) = p(s + \alpha x_0).
\]
If $X$ is a separable normed space, let $x_1, x_2, \ldots$ be a countable dense set in $X \setminus S$. Select vectors from this dense set which are independent of $S$, recursively, and extend $f$ to the subspaces $Y_n = \mathrm{span}\{S, x_1, \cdots, x_n\}$, recursively, as described above. Thus we have an extension $F$ on $Y = \mathrm{span}\{S, x_1, x_2, \ldots\}$. Since $Y$ is dense in $X$, the final extension $F$ is defined by the continuous extension of $F$ from $Y$ to $X = \bar{Y}$. In the general case, the remaining proof is done by using Zorn's lemma.

Remark If $X$ is a Hilbert space and $S$ is a closed subspace, then we have the orthogonal decomposition
\[
X = S \oplus S^\perp.
\]
Thus, one can define the extension $F$ of $f$ by
\[
F(x) = f(x_1) \quad \text{for } x = x_1 + x_2, \ x_1 \in S,\ x_2 \in S^\perp.
\]

Corollary Let $X$ be a normed space. For each $x_0 \in X$ there exists an $f \in X^*$ such that
\[
f(x_0) = |f|_{X^*}\,|x_0|_X.
\]
Proof of Corollary: Let $S = \{\alpha x_0 : \alpha \in \mathbb{R}\}$ and define $f(\alpha x_0) = \alpha\,|x_0|_X$. By the Hahn-Banach theorem there exists an extension $F \in X^*$ of $f$ such that $F(x) \le |x|$ for all $x \in X$. Since
\[
-F(x) = F(-x) \le |-x| = |x|,
\]
we have $|F(x)| \le |x|$, in particular $|F|_{X^*} \le 1$. On the other hand, $F(x_0) = f(x_0) = |x_0|$, thus $|F|_{X^*} = 1$ and $F(x_0) = f(x_0) = |F|_{X^*}|x_0|$.

Example (Hahn-Banach Theorem) (1) $|x| = \sup_{|x^*|=1} \langle x^*, x\rangle$.

Next, we prove the geometric form of the Hahn-Banach theorem. It states that given a convex set $K$ containing an interior point and $x_0 \notin K$, then $x_0$ is separated from $K$ by a hyperplane. Let $K$ be a convex set such that $0$ is an interior point of $K$. A hyperplane is a maximal proper affine subspace $x_0 + M$, where $M$ is a subspace of $X$. We introduce the Minkowski functional, which measures the distance from the origin as gauged by $K$.

Definition (Minkowski functional) Let $K$ be a convex set in a normed space $X$ such that $0$ is an interior point of $K$. Then the Minkowski functional $p$ of $K$ is defined by
\[
p(x) = \inf\Big\{r > 0 : \frac{x}{r} \in K\Big\}.
\]
Note that if $K$ is the closed unit ball in $X$, then $p(x) = |x|$.

Lemma (Minkowski functional)
(1) $0 \le p(x) < \infty$.
(2) $p(\alpha x) = \alpha\, p(x)$ for $\alpha \ge 0$.
(3) $p(x_1 + x_2) \le p(x_1) + p(x_2)$.
(4) $p$ is continuous.
(5) $\bar{K} = \{x : p(x) \le 1\}$, $\mathrm{int}(K) = \{x : p(x) < 1\}$.

Proof: (1) Since $K$ contains a ball centered at $0$, for every $x \in X$ there exists an $r > 0$ such that $\frac{x}{r} \in K$, and thus $p(x)$ is finite.
(2) For $\alpha > 0$
\[
p(\alpha x) = \inf\Big\{r : \frac{\alpha x}{r} \in K\Big\} = \inf\Big\{\alpha r' : \frac{x}{r'} \in K\Big\} = \alpha\inf\Big\{r' : \frac{x}{r'} \in K\Big\} = \alpha\, p(x).
\]
(3) Let $p(x_i) < r_i < p(x_i) + \varepsilon$, $i = 1, 2$, for arbitrary $\varepsilon > 0$, and let $r = r_1 + r_2$. By the convexity of $K$, since
\[
\frac{r_1}{r}\frac{x_1}{r_1} + \frac{r_2}{r}\frac{x_2}{r_2} = \frac{x_1 + x_2}{r} \in K,
\]
we have $p(x_1 + x_2) \le r \le p(x_1) + p(x_2) + 2\varepsilon$. Since $\varepsilon > 0$ is arbitrary, $p$ is subadditive.
(4) Let $\varepsilon$ be the radius of a closed ball centered at $0$ and contained in $K$. Since $\varepsilon\frac{x}{|x|} \in K$, $p(\varepsilon\frac{x}{|x|}) \le 1$ and thus $p(x) \le \frac{1}{\varepsilon}|x|$. This shows that $p$ is continuous at $0$. Since $p$ is sublinear,
\[
p(x) = p(x - y + y) \le p(x - y) + p(y) \quad \text{and} \quad p(y) = p(y - x + x) \le p(y - x) + p(x),
\]
and thus
\[
-p(y - x) \le p(x) - p(y) \le p(x - y).
\]
Hence $p$ is continuous since it is continuous at $0$.
(5) It follows from the continuity of $p$.

Lemma (Hyperplane) $H$ is a hyperplane in a vector space $X$ if and only if there exist a nonzero linear functional $f$ on $X$ and a constant $c$ such that $H = \{x : f(x) = c\}$.

Proof: If $H$ is a hyperplane, then $H$ is the translation of a maximal proper subspace $M$ of $X$, i.e., $H = x_0 + M$. If $x_0 \notin M$, then $\mathrm{span}[M, x_0] = X$. If for $x = \alpha x_0 + m$, $m \in M$, we let $f(x) = \alpha$, we have $H = \{x : f(x) = 1\}$. Conversely, if $f$ is a nonzero linear functional on $X$, then $M = \{x : f(x) = 0\}$ is a subspace. Let $x_0 \in X$ be such that $f(x_0) = 1$. Since $f(x - f(x)x_0) = 0$, $X = \mathrm{span}[x_0, M]$ and $\{x : f(x) = c\} = \{x : f(x - c\,x_0) = 0\} = c\,x_0 + M$ is a hyperplane.

Theorem (Mazur's Theorem) Let $K$ be a convex set with nonempty interior in a normed space $X$. Suppose $V$ is an affine subspace of $X$ that contains no interior points of $K$. Then there exist $x^* \in X^*$ and a constant $c$ such that the hyperplane $\{\langle x^*, x\rangle = c\}$ separates $V$ and $K$:
\[
\langle x^*, v\rangle = c \ \text{for all } v \in V \quad \text{and} \quad \langle x^*, k\rangle < c \ \text{for all } k \in \mathrm{int}(K).
\]
Proof: By an appropriate translation one may assume that $0$ is an interior point of $K$. Let $M$ be the subspace spanned by $V$. Since $V$ does not contain $0$, there exists a linear functional $f$ on $M$ such that $V = \{x \in M : f(x) = 1\}$. Let $p$ be the Minkowski functional of $K$. Since $V$ contains no interior points of $K$, $f(x) = 1 \le p(x)$ for all $x \in V$. By the homogeneity of $p$, $f(x) \le p(x)$ for all $x \in M$. By the Hahn-Banach theorem, there exists an extension $F$ of $f$ from $M$ to $X$ with $F(x) \le p(x)$. From the Lemma, $p$ is continuous, hence $F$ is continuous and $F(x) < 1$ for $x \in \mathrm{int}(K)$. Thus $H = \{x : F(x) = 1\}$ is the desired closed hyperplane.

Example (Hilbert space) If $X$ is a Hilbert space and $S$ is a subspace of $X$, then the extension $F$ is defined by
\[
F(x) = \begin{cases} \lim_n f(s_n) & \text{for } x \in \bar{S}, \ x = \lim_n s_n \text{ with } s_n \in S \\ 0 & \text{for } x \in S^\perp, \end{cases}
\]
extended linearly according to the decomposition $X = \bar{S} \oplus S^\perp$.

Example (Integral) Let $X = L^2(0, 1)$, $S = C[0, 1]$, and let $f(x) = R\text{-}\!\int_0^1 x(t)\, dt$ be the Riemann integral of $x \in S$. Then the extension $F$ is $F(x) = L\text{-}\!\int_0^1 x(t)\, dt$, the Lebesgue integral of $x \in X$.

Example (Alignment) (1) For a Hilbert space X, f = x_0 is self-aligned.
(2) Let x_0 ∈ X = L^p(Ω), 1 ≤ p < ∞. For f ∈ X^* = L^q(Ω) defined by

f(t) = |x_0(t)|^{p−2} x_0(t), t ∈ Ω,

we have

f(x_0) = ∫_Ω |x_0(t)|^p dt = |f|_{L^q} |x_0|_{L^p}.

Thus x_0|x_0|^{p−2} ∈ X^* is aligned with x_0.

A function g is called right continuous if lim_{h→0^+} g(t + h) = g(t). Let

BV_0[0,1] = { g ∈ BV[0,1] : g is right continuous on [0,1) and g(0) = 0 }.

Let C[0,1] be the normed space of continuous functions on [0,1] with the sup norm. For x ∈ C[0,1] and g ∈ BV_0[0,1] define the Riemann-Stieltjes integral by

∫_0^1 x(t) dg(t) = lim_{Δt→0} R(x, g, P) = lim_{Δt→0} Σ_{k=1}^n x(z_k)(g(t_k) − g(t_{k−1}))

for partitions P = {0 = t_0 < ··· < t_n = 1} and z_k ∈ [t_{k−1}, t_k]. There is a version of the Riesz representation theorem.


Theorem (Dual of C[0,1]) There is a norm-preserving linear isomorphism from C[0,1]^* to BV_0[0,1], i.e., for F ∈ C[0,1]^* there exists a unique g ∈ BV_0[0,1] such that

F(x) = ∫_0^1 x(t) dg(t)

and

|F|_{C[0,1]^*} = |g|_{BV}.

That is, C[0,1]^* ≅ BV_0[0,1], the space of normalized functions of bounded variation.

Proof: Define the normed space of bounded functions on [0,1] by

B = { x : [0,1] → R bounded } with norm |x|_B = sup_{t∈[0,1]} |x(t)|.

By the Hahn-Banach theorem an extension F of f from C[0,1] to B exists with |F| = |f|. For s ∈ (0,1] define χ_{[0,s]} ∈ B and set v(s) = F(χ_{[0,s]}), v(0) = 0. Then, for any partition {t_k},

Σ_k |v(t_k) − v(t_{k−1})| = Σ_k sign(v(t_k) − v(t_{k−1}))(F(χ_{[0,t_k]}) − F(χ_{[0,t_{k−1}]}))
= F( Σ_k sign(v(t_k) − v(t_{k−1}))(χ_{[0,t_k]} − χ_{[0,t_{k−1}]}) )
≤ |F| |Σ_k sign(v(t_k) − v(t_{k−1})) χ_{(t_{k−1},t_k]}|_B ≤ |f|.

Thus v ∈ BV[0,1] and |v|_{BV} ≤ |F| = |f|. Next, for any x ∈ C[0,1] and partition {t_k} define the step function

z(t) = Σ_k x(t_{k−1})(χ_{[0,t_k]} − χ_{[0,t_{k−1}]}).

Note that

|z − x|_B = max_k max_{t∈[t_{k−1},t_k]} |x(t_{k−1}) − x(t)| → 0 as Δt → 0^+.

Since F is bounded, F(z) → F(x) = f(x), and

F(z) = Σ_k x(t_{k−1})(v(t_k) − v(t_{k−1})) → ∫_0^1 x(t) dv(t),

so that

f(x) = ∫_0^1 x(t) dv(t)

and |f(x)| ≤ |x|_{C[0,1]} |v|_{BV}. Hence |F| = |f| = |v|_{BV}.

Riesz Representation Theorem For every bounded linear functional f on a Hilbert space X there exists a unique element y_f ∈ X such that

f(x) = (x, yf ) for all x ∈ X, and |f |X∗ = |yf |X .

We define the Riesz map R : X∗ → X by

Rf = yf ∈ X for f ∈ X∗


for a Hilbert space X.

Proof: Assume f ≠ 0 (otherwise take y_f = 0). Then there exists z ∈ X with |z| = 1 such that (x, z) = 0 for all x with f(x) = 0. Note that for every x ∈ X we have f(f(x)z − f(z)x) = 0. Thus,

0 = (f(x)z − f(z)x, z) = f(x)(z, z) − f(z)(x, z),

which implies

f(x) = (x, f(z)z).

The existence then follows by taking y_f = f(z)z. For uniqueness, suppose f(x) = (x, u_1) = (x, u_2) for all x ∈ X. Then (x, u_1 − u_2) = 0, and taking x = u_1 − u_2 we obtain |u_1 − u_2| = 0, i.e., u_1 = u_2.

Definition (Reflexive Banach space) For a Banach space X we define an injection i of X into X^{**} = (X^*)^* (the double dual) as follows. If x ∈ X, then i x ∈ X^{**} is defined by (i x)(f) = f(x) for f ∈ X^*. Note that

|i x|_{X^{**}} = sup_{f∈X^*} |f(x)|/|f|_{X^*} ≤ sup_{f∈X^*} (|f|_{X^*}|x|_X)/|f|_{X^*} = |x|_X,

and thus |i x|_{X^{**}} ≤ |x|_X. But by the Hahn-Banach theorem there exists an f ∈ X^* such that |f|_{X^*} = 1 and f(x) = |x|_X. Hence |i x|_{X^{**}} = |x|_X and i : X → X^{**} is an isometry. If i is also surjective, then we say that X is reflexive and we may identify X with X^{**}.

Definition (Weak and Weak Star Convergence) (1) A sequence x_n in a normed space X is said to converge weakly to x ∈ X if f(x_n) → f(x) as n → ∞ for all f ∈ X^*.
(2) A sequence f_n in the dual space X^* of a normed space X is said to converge weakly star to f ∈ X^* if f_n(x) → f(x) as n → ∞ for all x ∈ X.

Notes (1) If x_n converges strongly to x ∈ X, then x_n converges weakly to x, but not conversely.
(2) If x_n converges weakly to x, then |x_n|_X is bounded and |x| ≤ lim inf |x_n|_X, i.e., the norm is weakly lower semi-continuous.
(3) A bounded sequence x_n in a normed space X converges weakly to x ∈ X if f(x_n) → f(x) for all f ∈ S, where S is a dense set in X^*.
(4) Let X be a Hilbert space. If a sequence x_n of X converges weakly to x ∈ X, then x_n converges strongly to x if and only if lim_{n→∞} |x_n| = |x|. In fact,

|x_n − x|^2 = (x_n − x, x_n − x) = |x_n|^2 − (x_n, x) − (x, x_n) + |x|^2 → 0

as n → ∞ if x_n converges weakly to x and |x_n| → |x|.

Example (Weak Convergence) (1) Let X = L^2(0,1) and consider the sequence x_n = √2 sin(nπt), t ∈ (0,1). Then |x_n| = 1 and x_n converges weakly to 0. In fact, the family {sin(kπt)} spans a dense subspace of L^2(0,1) and

(x_n, sin(kπt))_{L^2} → 0 as n → ∞

for all k, so x_n converges weakly to 0. Since |x_n| = 1 does not converge to |0| = 0, the sequence x_n does not converge strongly to 0.
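The following Python sketch (not part of the original text; the grid resolution and the fixed test function are illustrative assumptions) checks this behavior numerically: the norms of x_n stay equal to 1 while the pairing with a fixed test function decays.

```python
# Numerical sketch: x_n(t) = sqrt(2) sin(n*pi*t) has unit L2-norm while its
# inner product with any fixed test function tends to 0 (weak convergence to 0).
import numpy as np

t = np.linspace(0.0, 1.0, 20001)          # quadrature grid on (0,1)
phi = t * (1.0 - t)                        # an arbitrary fixed test function

def l2_inner(f, g):
    return np.trapz(f * g, t)              # trapezoidal approximation of (f, g)_{L2}

for n in (1, 10, 100, 1000):
    x_n = np.sqrt(2.0) * np.sin(n * np.pi * t)
    print(n, l2_inner(x_n, x_n), l2_inner(x_n, phi))
# The norms stay ~1 while (x_n, phi) decays like O(1/n): weak, not strong, convergence.
```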


In general, if x_n → x weakly and y_n → y strongly in a Hilbert space, then (x_n, y_n) → (x, y), since

(x_n, y_n) − (x, y) = (x_n − x, y) + (x_n, y_n − y).

But note that (x_n, x_n) = 2∫_0^1 sin^2(nπt) dt = 1 ≠ 0, so (x_n, x_n) does not converge to 0. In general, (x_n, y_n) does not necessarily converge to (x, y) if x_n → x and y_n → y only weakly in X.

Alaoglu Theorem If X is a normed space and {x^*_n} is a bounded sequence in X^*, then there exists a subsequence that converges weakly star to an element of X^*.

Proof: We prove the theorem for the case when X is separable. Let {x^*_n} be a sequence in X^* with |x^*_n| ≤ 1 and let {x_k} be a dense sequence in X. The sequence ⟨x^*_n, x_1⟩ is bounded in R and thus has a convergent subsequence, denoted ⟨x^*_{n_1}, x_1⟩. Similarly, the sequence ⟨x^*_{n_1}, x_2⟩ contains a convergent subsequence ⟨x^*_{n_2}, x_2⟩. Continuing in this manner we extract nested subsequences such that ⟨x^*_{n_k}, x_k⟩ converges in R. Thus the diagonal sequence x^*_{nn} converges on the dense subset {x_k}, i.e., ⟨x^*_{nn}, x_k⟩ converges for each x_k. We show that x^*_{nn} converges weakly star to an element x^* ∈ X^*. Let x ∈ X be fixed. For arbitrary ε > 0 there exist N and coefficients α_k such that

|x − Σ_{k=1}^N α_k x_k| ≤ ε/3.

Thus,

|⟨x^*_{nn}, x⟩ − ⟨x^*_{mm}, x⟩| ≤ |⟨x^*_{nn}, x − Σ_{k=1}^N α_k x_k⟩| + |⟨x^*_{nn} − x^*_{mm}, Σ_{k=1}^N α_k x_k⟩| + |⟨x^*_{mm}, x − Σ_{k=1}^N α_k x_k⟩| ≤ ε

for sufficiently large nn, mm. Thus ⟨x^*_{nn}, x⟩ is a Cauchy sequence in R and converges to a real number ⟨x^*, x⟩. The functional x^* so defined is obviously linear, and |x^*| ≤ 1 since |⟨x^*_{nn}, x⟩| ≤ |x|.

Example (Weak Star Convergence) (1) Let X = L^1(Ω), so that L^∞(Ω) = X^*. Thus a bounded sequence in L^∞(Ω) has a weakly star convergent subsequence. Let X = L^∞(0,1) and let x_n be the sequence in X defined by

x_n(t) = 1 on [1/2 + 1/n, 1],  x_n(t) = n(t − 1/2) on (1/2 − 1/n, 1/2 + 1/n),  x_n(t) = −1 on [0, 1/2 − 1/n].

Then |x_n|_X = 1 and x_n converges weakly star to the sign function x(t) = sign(t − 1/2). But, since |x_n − x|_{L^∞} ≥ 1/2 for all n, x_n does not converge strongly to x.

(2) Let X = c_0, the space of infinite sequences x = (x_1, x_2, ···) that converge to zero, with norm |x|_∞ = max_k |x_k|. Then X^* = ℓ^1 and X^{**} = ℓ^∞. Let x^*_n in ℓ^1 = X^* be defined by x^*_n = (0, ···, 0, 1, 0, ···), where the 1 appears in the n-th entry only. Then x^*_n converges weakly star to 0 in X^*, but x^*_n does not converge weakly to 0 since ⟨x^{**}, x^*_n⟩ = 1 for all n for x^{**} = (1, 1, ···) ∈ X^{**}.


Example (L^1(Ω) bounded sequence) Let X = L^1(0,1) and let x_n ∈ X be the sequence defined by

x_n(t) = n for t ∈ (1/2 − 1/(2n), 1/2 + 1/(2n)), and x_n(t) = 0 otherwise.

Then |x_n|_{L^1} = 1 and

∫_0^1 x_n(t)φ(t) dt → φ(1/2)

for all φ ∈ C[0,1]. The limit functional φ ↦ φ(1/2) is the Dirac measure at 1/2 and is not represented by any L^1(0,1) function, so x_n does not converge weakly in L^1(0,1). The sequence x_n does, however, converge weakly star in C[0,1]^*, the space of measures.
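A minimal numerical sketch of this example follows (not part of the original text; the test function and grid are illustrative assumptions): the L^1-norms stay equal to 1 while the pairings converge to a point value, the hallmark of concentration at a Dirac mass.

```python
# Sketch: x_n = n * chi_{(1/2 - 1/(2n), 1/2 + 1/(2n))} has |x_n|_{L1} = 1 and
# integrates a continuous phi to approximately phi(1/2), the Dirac mass at 1/2.
import numpy as np

t = np.linspace(0.0, 1.0, 200001)
phi = np.cos(3.0 * t) + t**2               # a fixed continuous test function

for n in (2, 10, 100, 1000):
    x_n = np.where(np.abs(t - 0.5) < 0.5 / n, float(n), 0.0)
    print(n, np.trapz(x_n, t), np.trapz(x_n * phi, t))
# |x_n|_{L1} stays ~1, and the pairing tends to phi(0.5) = cos(1.5) + 0.25,
# a functional that no L1 limit function can represent.
```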

Eberlein-Shmulyan Theorem A Banach space X is reflexive if and only if every closed bounded set is weakly sequentially compact, i.e., every bounded sequence of X contains a subsequence that converges weakly to an element of X.

Next, we consider linear operator T : X → Y .

Definition (Linear Operator) Let X, Y be normed spaces. A linear operator T onD(T ) ⊂ X into Y satisfies

T (αx1 + β x2) = αTx1 + β Tx2

where the domain D(T ) is a subspace of X. Then a linear operator T is continuous ifand only if there exists a constant M ≥ 0 such that

|Tx|Y ≤M |x|X for all x ∈ X.

For a continuous linear operator with D(T) = X we define the operator norm |T| by

|T| = sup_{|x|_X=1} |Tx|_Y = inf{ M : |Tx|_Y ≤ M|x|_X }.

A continuous linear operator from a normed space X into Y is called a bounded linear operator from X into Y and is denoted by T ∈ L(X,Y). The norm of T ∈ L(X,Y) is denoted either by |T|_{L(X,Y)} or simply by |T|. Note that the space L(X,Y) is a Banach space with the operator norm (when Y is complete) and X^* = L(X,R).

Open Mapping Theorem Let T be a continuous linear operator from a Banach space X onto a Banach space Y. Then T is an open map, i.e., T maps every open set of X onto an open set in Y.

Proof: Suppose A : X → Y is a surjective continuous linear operator. In order to prove that A is an open map, it is sufficient to show that A maps the open unit ball in X to a neighborhood of the origin of Y. Let U = B_1(0) ⊂ X and V = B_1(0) ⊂ Y be the open unit balls of X and Y, respectively. Then X = ⋃_{k∈N} kU. Since A is surjective,

Y = A(X) = A(⋃_{k∈N} kU) = ⋃_{k∈N} A(kU).

Since Y is a Banach space, by Baire's category theorem there exists a k ∈ N such that int(cl(A(kU))) ≠ ∅. Let c ∈ int(cl(A(kU))). Then there exists r > 0 such that B_r(c) ⊆ cl(A(kU)).


If v ∈ V, then

c, c + rv ∈ B_r(c) ⊆ cl(A(kU)),

and the difference rv = (c + rv) − c of elements of cl(A(kU)) satisfies rv ∈ cl(A(2kU)). Since A is linear, V ⊆ cl(A((2k/r)U)). Thus it follows that for all y ∈ Y and ε > 0

there exists x ∈ X such that |x|_X < (2k/r)|y|_Y and |y − Ax|_Y < ε. (2.1)

Let δ = r/(2k) and fix y ∈ δV. By (2.1) there is some x_1 ∈ X with |x_1| < 1 and |y − Ax_1| < δ/2. Applying (2.1) repeatedly, one can find a sequence x_n inductively with

|x_n| < 2^{−(n−1)} and |y − A(x_1 + x_2 + ··· + x_n)| < δ 2^{−n}. (2.2)

Let s_n = x_1 + x_2 + ··· + x_n. From the first inequality in (2.2), s_n is a Cauchy sequence, and since X is complete, s_n converges to some x ∈ X. By (2.2) the sequence A s_n tends to y in Y, and thus Ax = y by the continuity of A. Also, we have

|x| = lim_{n→∞} |s_n| ≤ Σ_{n=1}^∞ |x_n| < 2.

This shows that every y ∈ δV belongs to A(2U), or equivalently, that the image A(U) of the unit ball U = B_1(0) in X contains the open ball (δ/2)V ⊂ Y.

Corollary (Open Mapping Theorem) A bounded linear operator T between Banach spaces is boundedly invertible if and only if it is one-to-one and onto.

Definition (Closed Linear Operator) (1) The graph G(T) of a linear operator T on the domain D(T) ⊂ X into Y is the set {(x, Tx) : x ∈ D(T)} in the product space X × Y. Then T is closed if its graph G(T) is a closed linear subspace of X × Y, i.e., if x_n ∈ D(T) converges strongly to x ∈ X and Tx_n converges strongly to y ∈ Y, then x ∈ D(T) and y = Tx. Thus the notion of a closed linear operator is an extension of the notion of a bounded linear operator.
(2) A linear operator T is said to be closable if x_n ∈ D(T) converging strongly to 0 and Tx_n converging strongly to y ∈ Y imply y = 0.

For a closed linear operator T, the domain D(T) is a Banach space when equipped with the graph norm

|x|_{D(T)} = (|x|^2_X + |Tx|^2_Y)^{1/2}.

Example (Closed Linear Operator) Let T = d/dt with X = Y = L^2(0,1). Then T is closed with

dom(T) = H^1(0,1) = { f ∈ L^2(0,1) : f is absolutely continuous on [0,1] with square integrable derivative }.

If y_n = T x_n, then

x_n(t) = x_n(0) + ∫_0^t y_n(s) ds.

If x_n ∈ dom(T) → x and y_n → y in L^2(0,1), then letting n → ∞ we have

x(t) = x(0) + ∫_0^t y(s) ds,


i.e., x ∈ dom(T) and Tx = y.

In general, if λI + T has a bounded inverse (λI + T)^{−1} for some λ ∈ R, then T : dom(T) ⊂ X → X is closed. In fact, Tx_n = y_n is equivalent to

x_n = (λI + T)^{−1}(y_n + λx_n).

Suppose x_n → x and y_n → y in X; letting n → ∞ in this identity we obtain x = (λI + T)^{−1}(λx + y), so x ∈ dom(T) and Tx = (λI + T)x − λx = y.

Definition (Dual Operator) Let T be a linear operator X into Y with dense domainD(T ). The dual operator of T ∗ of T is a linear operator on Y ∗ into X∗ defined by

〈y∗, Tx〉Y ∗×Y = 〈T ∗y∗, x〉X∗×X

for all x ∈ D(T ) and y∗ ∈ D(T ∗).

In fact, for y^* ∈ Y^*, an x^* ∈ X^* satisfying

⟨y^*, Tx⟩ = ⟨x^*, x⟩ for all x ∈ D(T)

is uniquely determined if and only if D(T) is dense. The "only if" part follows since if cl(D(T)) ≠ X, then by the Hahn-Banach theorem there exists a nonzero x^*_0 ∈ X^* such that ⟨x^*_0, x⟩ = 0 for all x ∈ D(T), which contradicts the uniqueness. If T is bounded with D(T) = X, then T^* is bounded with ‖T‖ = ‖T^*‖.

Examples Consider the gradient operator T : L^2(Ω) → L^2(Ω)^n given by

Tu = ∇u = (D_{x_1}u, ···, D_{x_n}u)

with D(T) = H^1(Ω). Then, for v ∈ L^2(Ω)^n,

T^*v = −div v = −Σ_k D_{x_k} v_k

with domain D(T^*) = { v ∈ L^2(Ω)^n : div v ∈ L^2(Ω) and n·v = 0 at the boundary ∂Ω }. In fact, by the divergence theorem,

(Tu, v) = ∫_Ω ∇u · v dx = ∫_{∂Ω} (n·v) u ds − ∫_Ω u (div v) dx

for all v ∈ C^1(Ω̄). For v ∈ D(T^*), testing first with u ∈ H^1_0(Ω), which is dense in L^2(Ω), gives T^*v = −div v ∈ L^2(Ω); then, for general u ∈ H^1(Ω), the boundary term must vanish, so n·v ∈ L^2(∂Ω) and n·v = 0.

Definition (Hilbert space Adjoint Operator) Let X, Y be Hilbert spaces and T a linear operator from X into Y with dense domain D(T). The Hilbert space adjoint T^* of T is the linear operator from Y into X defined by

(y, Tx)_Y = (T^*y, x)_X

for all x ∈ D(T) and y ∈ D(T^*). Note that if T′ : Y^* → X^* is the dual operator of T, then

T^* R_{Y^*→Y} = R_{X^*→X} T′,

where R_{X^*→X} and R_{Y^*→Y} are the Riesz maps.


Examples (Self-adjoint operator) Let X = L^2(Ω) and let T be the Laplace operator

Tu = Δu = Σ_{k=1}^n D_{x_k x_k} u

with domain D(T) = H^2(Ω) ∩ H^1_0(Ω). Then T is self-adjoint, i.e., T^* = T. In fact, by Green's identity,

(Tu, v)_X = ∫_Ω Δu v dx = ∫_{∂Ω} ((n·∇u)v − (n·∇v)u) ds + ∫_Ω Δv u dx = (u, T^*v)

for all v ∈ H^2(Ω) ∩ H^1_0(Ω), since the boundary terms vanish.

Closed Range Theorem Let X and Y be Banach spaces and T a closed linear operator on D(T) ⊂ X into Y. Assume that D(T) is dense in X. Then the following properties are all equivalent:
(a) R(T) is closed in Y.
(b) R(T^*) is closed in X^*.
(c) R(T) = N(T^*)^⊥ = { y ∈ Y : ⟨y^*, y⟩ = 0 for all y^* ∈ N(T^*) }.
(d) R(T^*) = N(T)^⊥ = { x^* ∈ X^* : ⟨x^*, x⟩ = 0 for all x ∈ N(T) }.

Proof: If y^* ∈ N(T^*), then ⟨Tx, y^*⟩ = ⟨x, T^*y^*⟩ = 0 for all x ∈ dom(T), so R(T) ⊂ N(T^*)^⊥ and hence cl(R(T)) ⊂ N(T^*)^⊥. Conversely, if y ∉ cl(R(T)), the Hahn-Banach theorem provides y^* with ⟨y, y^*⟩ ≠ 0 and ⟨Tx, y^*⟩ = 0 for all x ∈ dom(T), i.e., y^* ∈ N(T^*). Thus cl(R(T)) = N(T^*)^⊥, and (a) is equivalent to (c).

Since T is closed, the graph G = G(T) = { {x, Tx} ∈ X × Y : x ∈ dom(T) } is closed. Thus G is a Banach space equipped with the norm |{x, y}| = |x| + |y| of X × Y. Define a continuous linear operator S from G into Y by

S{x, Tx} = Tx.

Then the dual operator S^* of S is a continuous linear operator from Y^* into G^*, and we have

⟨{x, Tx}, S^*y^*⟩ = ⟨S{x, Tx}, y^*⟩ = ⟨Tx, y^*⟩ = ⟨{x, Tx}, {0, y^*}⟩

for x ∈ dom(T) and y^* ∈ Y^*. Thus the functional S^*y^* − {0, y^*} : G → R vanishes on all elements of G. But, since ⟨{x, Tx}, {x^*, y^*_1}⟩ = 0 for all x ∈ dom(T) is equivalent to ⟨x, x^*⟩ = ⟨−Tx, y^*_1⟩ for all x ∈ dom(T), i.e., x^* = −T^*y^*_1, we have

S^*y^* = {0, y^*} + {−T^*y^*_1, y^*_1} = {−T^*y^*_1, y^* + y^*_1}

for all y^* ∈ Y^*. Since y^*_1 is arbitrary, R(S^*) = R(T^*) × Y^*. Therefore R(S^*) is closed in X^* × Y^* if and only if R(T^*) is closed, and since R(S) = R(T), R(S) is closed if and only if R(T) is closed in Y. Hence we only have to prove the equivalence of (a) and (b) in the special case of a bounded operator S.

If T is bounded, then (a) implies (b): since R(T) is a Banach space, one can assume R(T) = Y without loss of generality.


By the open mapping theorem, there exists c > 0 such that for each y ∈ Y there exists an x ∈ X with Tx = y and |x| ≤ c|y|. Thus, for y^* ∈ Y^*, we have

|⟨y, y^*⟩| = |⟨Tx, y^*⟩| = |⟨x, T^*y^*⟩| ≤ c|y||T^*y^*|,

and

|y^*| ≤ sup_{|y|≤1} |⟨y, y^*⟩| ≤ c|T^*y^*|,

so (T^*)^{−1} exists and is bounded, and R(T^*) is closed.
Conversely, suppose R(T^*) is closed, and let T_1 : X → Y_1 = cl(R(T)) be defined by T_1 x = Tx. If y^*_1 ∈ Y^*_1 satisfies

⟨T_1 x, y^*_1⟩ = ⟨x, T^*_1 y^*_1⟩ = 0 for all x ∈ X,

then, since R(T_1) is dense in Y_1, y^*_1 = 0, i.e., N(T^*_1) = {0}. Since R(T^*_1) = R(T^*) is closed, (T^*_1)^{−1} exists and is bounded on R(T^*_1), which implies that T_1 maps onto Y_1, i.e., R(T) = Y_1 is closed.

2.5 Distribution and Generalized Derivatives

In this section we introduce distributions (generalized functions). The concept of a distribution is essential for defining generalized solutions to PDEs and provides the foundation of PDE theory. Let D(Ω) be the vector space C_0^∞(Ω) of all infinitely differentiable functions with compact support in Ω. For any compact set K ⊂ Ω, let D_K(Ω) be the set of all functions f ∈ C_0^∞(Ω) whose support lies in K. Define a family of seminorms on D(Ω) by

p_{K,m}(f) = sup_{x∈K} sup_{|s|≤m} |D^s f(x)|, where D^s = (∂/∂x_1)^{s_1} ··· (∂/∂x_n)^{s_n},

s = (s_1, ···, s_n) is a vector of nonnegative integers and |s| = Σ s_k ≤ m. Then D_K(Ω) is a locally convex topological space.

Definition (Distribution) A linear functional T defined on C∞0 (Ω) is a distributionif for every compact subset K of Ω, there exists a positive constant C and a positiveinteger k such that

|T (φ)| ≤ C sup|s|≤k, x∈K |Dsφ(x)| for all φ ∈ DK(Ω).

Definition (Generalized Derivative) A distribution S defined by

S(φ) = −T (Dxkφ) for all φ ∈ C∞0 (Ω)

is called the distributional derivative of T with respect to xk and we denote S = DxkT .

In general we have

S(φ) = DsT (φ) = (−1)|s| T (Dsφ) for all φ ∈ C∞0 (Ω).


This definition follows naturally from the fact that for a continuously differentiable f,

∫_Ω (D_{x_k} f) φ dx = −∫_Ω f (∂φ/∂x_k) dx,

and thus D_{x_k} T_f = T_{∂f/∂x_k}. We let D^s f denote the distributional derivative of T_f.

Example (Distribution) (1) For a locally integrable function f on Ω one defines the corresponding distribution by

T_f(φ) = ∫_Ω f φ dx for all φ ∈ C_0^∞(Ω),

which satisfies

|T_f(φ)| ≤ (∫_K |f| dx) sup_{x∈K} |φ(x)|.

(2) T(φ) = φ(0) defines the Dirac delta δ_0 at x = 0; indeed

|δ_0(φ)| ≤ sup_{x∈K} |φ(x)|.

(3) Let H be the Heaviside function defined by H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. Then

(DT_H)(φ) = −∫_{−∞}^{∞} H(x)φ′(x) dx = φ(0),

and thus DT_H = δ_0, the Dirac delta at x = 0.

(4) A distributional solution u of −D^2u = δ_{x_0} satisfies

−∫_{−∞}^{∞} u φ″ dx = φ(x_0)

for all φ ∈ C_0^∞(R). Since

−∫_{−∞}^{∞} |x − x_0| φ″ dx = ∫_{−∞}^{∞} sign(x − x_0) φ′(x) dx = −2φ(x_0),

the function u(x) = −(1/2)|x − x_0| is a fundamental solution. In general, for d ≥ 2 let

G(x, x_0) = c_2 log(1/|x − x_0|) for d = 2, and G(x, x_0) = c_d |x − x_0|^{2−d} for d ≥ 3,

with suitable constants c_d > 0. Then

ΔG(x, x_0) = 0 for x ≠ x_0,

and u = G(x, x_0) is the fundamental solution of −Δ in R^d,

−Δu = δ_{x_0}.


In fact, let B_ε = {|x − x_0| ≤ ε} with boundary surface Γ_ε = {|x − x_0| = ε}. Since ΔG = 0 away from x_0, by Green's identity

∫_{R^d \ B_ε} G(x, x_0) Δφ(x) dx = ∫_{Γ_ε} ( G(x, x_0) ∂φ/∂ν − φ(s) ∂G/∂ν(x, x_0) ) ds,

and an explicit computation of the surface integrals (which fixes the normalizing constant c_d) shows that the right-hand side converges to −φ(x_0) as ε → 0^+. That is, G(x, x_0) satisfies

−∫_{R^d} G(x, x_0) Δφ dx = φ(x_0) for all φ ∈ C_0^∞(R^d).

In general, let L be a linear differential operator and L^* its formal adjoint. A locally integrable function u is said to be a distributional solution to Lu = T, where T is a distribution, if

∫_Ω u (L^*φ) dx = T(φ)

for all φ ∈ C_0^∞(Ω).

Definition (Sobolev space) For 1 ≤ p < ∞ and integer m ≥ 0 the Sobolev space is

W^{m,p}(Ω) = { f ∈ L^p(Ω) : D^s f ∈ L^p(Ω), |s| ≤ m }

with norm

|f|_{W^{m,p}(Ω)} = ( ∫_Ω Σ_{|s|≤m} |D^s f|^p dx )^{1/p}.

Here D^s f ∈ L^p(Ω) means that the distributional derivative D^s T_f is represented by an L^p(Ω) function; in particular

|D^s T_f(φ)| ≤ c |φ|_{L^q} with 1/p + 1/q = 1.

Remark (1) X = W^{m,p}(Ω) is complete. In fact, if f_n is Cauchy in X, then D^s f_n is Cauchy in L^p(Ω) for all |s| ≤ m. Since L^p(Ω) is complete, D^s f_n → g_s in L^p(Ω). But since

lim_{n→∞} ∫_Ω f_n D^s φ dx = ∫_Ω f D^s φ dx = (−1)^{|s|} ∫_Ω g_s φ dx,

we have D^s f = g_s for all |s| ≤ m and |f_n − f|_X → 0 as n → ∞.
(2) H^{m,p}(Ω) ⊂ W^{m,p}(Ω). Let H^{m,p}(Ω) be the completion of C^m(Ω) with respect to the W^{m,p}(Ω) norm. That is, for f ∈ H^{m,p}(Ω) there exists a sequence f_n ∈ C^m(Ω) such that f_n → f and D^s f_n → g_s strongly in L^p(Ω), and thus

∫_Ω (D^s f_n) φ dx = (−1)^{|s|} ∫_Ω f_n D^s φ dx → (−1)^{|s|} ∫_Ω f D^s φ dx,

which implies g_s = D^s f and f ∈ W^{m,p}(Ω).
(3) If Ω has a Lipschitz continuous boundary, then

W^{m,p}(Ω) = H^{m,p}(Ω).


2.6 Lax-Milgram Theorem and Banach-Necas-Babuska Theorem

Let X be a Hilbert space and let σ be a (complex-valued) sesquilinear form on X × X satisfying

σ(αx_1 + βx_2, y) = ασ(x_1, y) + βσ(x_2, y), σ(x, αy_1 + βy_2) = ᾱσ(x, y_1) + β̄σ(x, y_2),

|σ(x, y)| ≤ M|x||y| for all x, y ∈ X (boundedness)

and

Re σ(x, x) ≥ δ|x|^2 for all x ∈ X, with δ > 0 (coercivity).

Then for each f ∈ X^* there exists a unique solution x ∈ X to

σ(x, y) = ⟨f, y⟩_{X^*×X} for all y ∈ X,

and

|x|_X ≤ δ^{−1}|f|_{X^*}.

Proof: Let us define the linear operator S from X∗ into X by

Sf = x, f ∈ X∗

where x ∈ X satisfiesσ(x, y) = 〈f, y〉 for all y ∈ X.

The operator S is well defined since if x1, x2 ∈ X satisfy the above, then σ(x1−x2, y) =0 for all y ∈ X and thus δ |x1 − x2|2X ≤ Reσ(x1 − x2, x1 − x2) = 0.

Next we show that dom(S) is closed in X∗. Suppose fn ∈ dom(S), i.e., there existsxn ∈ X satisfying σ(xn, y) = 〈fn, y〉 for all y ∈ X and fn → f in X∗ as n→∞. Then

σ(xn − xm, y) = 〈fn − fm, y〉 for all y ∈ X

Setting y = xn − xm in this we obtain

δ |xn − xm|2X ≤ Reσ(xn − xm, xn − xm) ≤ |fn − fm|X∗ |xn − xm|X .

Thus xn is a Cauchy sequence in X and so xn → x for some x ∈ X as n→∞. Sinceσ and the dual product are continuous, thus x = Sf .

Now we prove that dom(S) = X∗. Suppose dom(S) 6= X∗. Since dom(S) is closedthere exists a nontrivial x0 ∈ X such that 〈f, x0〉 = 0 for all f ∈dom(S). Considerthe linear functional F (y) = σ(x0, y), y ∈ X. Then since σ is bounded F ∈ X∗ andx0 = SF . Thus F (x0) = 0. But since σ(x0, x0) = 〈F, x0〉 = 0, by the coercivity of σx0 = 0, which is a contradiction. Hence dom(S) = X∗.

Assume that σ is coercive. By the Lax-Milgram theorem A has a bounded inverseS = A−1. Thus,

dom (A) = A−1H.

Moreover A is closed. In fact, if

xn ∈ dom (A)→ x and fn = Axn → f in H,


then since xn = Sfn and S is bounded, x = Sf and thus x ∈ dom (A) and Ax = f .If σ is symmetric, σ(x, y) = (x, y)X defines an inner product on X. and SF

coincides with the Riesz representation of F ∈ X∗. Moreover,

〈Ax, y〉 = 〈Ay, x〉 for all x, y ∈ X.

and thus A is a self-adjoint operator in H.

Example (Laplace operator) Consider X = H^1_0(Ω), H = L^2(Ω) and

σ(u, φ) = (u, φ)_X = ∫_Ω ∇u · ∇φ dx.

Then

Au = −Δu = −( ∂^2u/∂x_1^2 + ∂^2u/∂x_2^2 )

and

dom(A) = H^2(Ω) ∩ H^1_0(Ω)

for Ω with sufficiently smooth (e.g. C^{1,1}) boundary or a convex domain Ω.
For Ω = (0,1) and f ∈ L^2(0,1), the variational equation

∫_0^1 (dy/dx)(du/dx) dx = ∫_0^1 f(x)y(x) dx for all y ∈ H^1_0(0,1)

is equivalent to

∫_0^1 (dy/dx) ( du/dx − ∫_x^1 f(s) ds ) dx = 0 for all y ∈ H^1_0(0,1).

Thus

du/dx − ∫_x^1 f(s) ds = c (a constant),

and therefore du/dx ∈ H^1(0,1) and

Au = −d^2u/dx^2 = f in L^2(0,1).
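The 1-D computation above can be checked numerically. The following Python sketch is illustrative and not part of the original text; it uses a standard finite-difference discretization of A = −d^2/dx^2 (an assumption, in place of the weak formulation) and a manufactured right-hand side.

```python
# Minimal sketch: solve -u'' = f on (0,1), u(0) = u(1) = 0, and compare with the
# exact solution for f(x) = pi^2 sin(pi x), namely u(x) = sin(pi x).
import numpy as np

n = 200
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
f = np.pi**2 * np.sin(np.pi * x)

# tridiagonal matrix of -d^2/dx^2 on the interior nodes
A = (np.diag(2.0 * np.ones(n - 1)) - np.diag(np.ones(n - 2), 1)
     - np.diag(np.ones(n - 2), -1)) / h**2
u = np.zeros(n + 1)
u[1:-1] = np.linalg.solve(A, f[1:-1])

print(np.max(np.abs(u - np.sin(np.pi * x))))   # O(h^2) discretization error
```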

Example (Elliptic operator) Consider the second order elliptic boundary value problem

Au = −∇·(a(x)∇u) + b(x)·∇u + c(x)u = f(x) in Ω, a ∂u/∂ν = g at Γ_1, u = 0 at Γ_0,

where Γ_0 and Γ_1 are disjoint and Γ_0 ∪ Γ_1 = Γ = ∂Ω. Integrating against a test function φ, we have

∫_Ω ( a(x)∇u·∇φ + (b(x)·∇u)φ + c(x)uφ ) dx − ∫_{Γ_1} gφ ds = ∫_Ω f(x)φ(x) dx

for all φ ∈ C^1(Ω̄) vanishing at Γ_0. Let X = H^1_{Γ_0}(Ω) be the completion of { φ ∈ C^1(Ω̄) : φ = 0 on Γ_0 } with inner product

(u, φ) = ∫_Ω ∇u · ∇φ dx,


i.e.,

H^1_{Γ_0}(Ω) = { u ∈ H^1(Ω) : u|_{Γ_0} = 0 }.

Define the bilinear form σ on X × X by

σ(u, φ) = ∫_Ω ( a(x)∇u·∇φ + (b(x)·∇u)φ + c(x)uφ ) dx.

Then, by Green's formula,

σ(u, u) = ∫_Ω ( a(x)|∇u|^2 + b(x)·∇(½|u|^2) + c(x)|u|^2 ) dx
= ∫_Ω ( a(x)|∇u|^2 + (c(x) − ½∇·b)|u|^2 ) dx + ∫_{Γ_1} ½(n·b)|u|^2 ds.

If we assume

0 < a_0 ≤ a(x) ≤ ā, c(x) − ½∇·b ≥ 0, n·b ≥ 0 at Γ_1,

then σ is bounded and coercive with δ = a_0.

The Banach space version of Lax-Milgram theorem is as follows.

Banach-Necas-Babuska Theorem Let V and W be Banach spaces. Consider the linear equation for u ∈ W:

a(u, v) = f(v) for all v ∈ V (2.3)

for given f ∈ V^*, where a is a bounded bilinear form on W × V. The problem is well-posed if and only if the following conditions hold:

inf_{u∈W} sup_{v∈V} a(u, v)/(|u|_W |v|_V) ≥ δ > 0,

a(u, v) = 0 for all u ∈ W implies v = 0. (2.4)

Under these conditions the unique solution u ∈ W to (2.3) satisfies

|u|_W ≤ (1/δ)|f|_{V^*}.

Proof: Let A be the bounded linear operator from W to V^* defined by

⟨Au, v⟩ = a(u, v) for all u ∈ W, v ∈ V.

The inf-sup condition is equivalent to

|Aw|_{V^*} ≥ δ|w|_W for all w ∈ W,

and thus the range R(A) of A is closed in V^* and N(A) = {0}. But since V is reflexive and

⟨Au, v⟩_{V^*×V} = ⟨u, A^*v⟩_{W×W^*},


from the second condition N(A∗) = 0. It thus follows from the closed range andopen mapping theorems that A−1 is bounded.

Next, we consider the generalized Stokes system. Let V and Q be Hilbert spaces.We consider the mixed variational problem for (u, p) ∈ V ×Q of the form

a(u, v) + b(p, v) = f(v), b(u, q) = g(q) (2.5)

for all v ∈ V and q ∈ Q, where a and b is bounded bilinear form on V × V and V ×Q.If we define the linear operators A ∈ L(V, V ∗) and B ∈ L(V,Q∗) by

〈Au, v〉 = a(u, v) and 〈Bu, q〉 = b(u, q)

then it is equivalent to the operator form: A B∗

B 0

u

p

=

f

g

.

Assume the coercivity on aa(u, u) ≥ δ |u|2V (2.6)

and the inf-sup condition on b

infq∈P

supu∈V

b(u, q)

|u|V |q|Q≥ β > 0 (2.7)

Note that inf-sup condition that for all q there exists u ∈ V such that Bu = q and|u|V ≤ 1

β |q|Q. Also, it is equivalent to |B∗p|V ∗ ≥ β |p|Q for all p ∈ Q.

Theorem (Mixed problem) Under conditions (2.6)-(2.7) there exits a unique solution(u, p) ∈ V ×Q to (2.5) and

|u|V + |p|Q ≤ c (|f |V ∗ + |g|Q∗)

Proof: For ε > 0 consider the penalized problem

a(uε, v) + b(v, pε) = f(v), for all v ∈ V

−b(uε, q) + ε(pε, q)Q = −g(q), for all q ∈ Q.(2.8)

By the Lax-Milgram theorem for every ε > 0 there exists a unique solution (uε, pε) ∈V ×Q. From the first equation and (2.7),

β |pε|Q ≤ |f −Auε|V ∗ ≤ |f |V ∗ +M |uε|V .

Letting v = uε and q = pε in the first and second equation and (2.7), we have

δ |uε|2V + ε |pε|2Q ≤ |f |V ∗ ||uε|V + |pε|Q|g|Q∗ ≤ C (|f |V ∗ + |g|Q∗)|uε|V ),

and thus |uε|V and thus |pε|Q are bounded uniformly in ε > 0. Thus, (uε, pε) has aweakly convergent subspace to (u, p) in V ×Q and (u, p) satisfies (2.5).


2.7 Compact Operator

A compact operator is a linear operator A from a Banach space X to another Banachspace Y , such that the image under A of any bounded subset of X is a relativelycompact subset of Y , i.e., its closure is compact. For example, the integral operator isa concrete example of compact operators. A Fredholm integral equation gives rise toa compact operator A on function spaces.

A compact set in a Banach space is bounded. The converse does not hold in general: the closed unit ball of a dual space is weakly star compact but not strongly compact unless X is finite dimensional.

Let C(S) be the space of continuous functions on a compact metric space S with norm |x| = max_{s∈S} |x(s)|. The Arzela-Ascoli theorem gives a necessary and sufficient condition for a family {x_α} to be (strongly) relatively compact.

Arzela-Ascoli theorem A family {x_α} in C(S) is relatively compact if and only if the following conditions hold:

sup_α |x_α| < ∞ (equi-bounded),

lim_{δ→0^+} sup_{α, dist(s′,s″)≤δ} |x_α(s′) − x_α(s″)| = 0 (equi-continuous).

Proof: By the Bolzano-Weirstrass theorem for a fixed s ∈ S xα(s) contains a con-vergent subsequence. Since S is compact, there is a countable dense subset sn in Ssuch that for every ε > 0 there exists nε satisfying

sups∈S

inf1≤j≤nε

dist(s, sj) ≤ ε.

Let us denote a convergent subsequence of xn(s1) by xn1(s1). Similarly, the se-quence 〈xn1(s2) contains a a convergent subsequence xn2(s2). Continuing in thismanner we extract subsequences xnk(sk). The diagonal sequence x∗nn(s) convergesfor s = s1, s2, · · · , simultaneously for the dense subset sk. We will show that xnnis a Cauchy sequence in C(S). By the equi-continuity of xn(s) for all ε > 0 thereexists δε such that |xn(s′) − xn(s′′)| ≤ ε for all s′, s′ satisisfying dist(s′, s′′) ≤≤ δ.Thus, for s ∈ S there exists a j ≤ kε such that

|xnn(s)− xmm(s)| ≤ |xnn(s)− xmm(sj)|+ |xnn(sj)− xmm(sj)|+ |xmm(s)− xmm(sj)|

≤ 2ε+ |xnn(sj)− xmm(sj)|

which implies that limm,m→∞ maxs∈S |xnn(s)− xmm(s)| ≤ 2ε for arbitrary ε > 0.

For the space Lp(Ω), we have

Frechet-Kolmogorov theorem Let Ω be a subset of R^d. A family {x_α} of L^p(Ω) functions (extended by zero outside Ω) is relatively compact in L^p(Ω) if and only if the following conditions hold:

sup_α ∫_Ω |x_α|^p ds < ∞ (equi-bounded),

lim_{|t|→0} sup_α ∫ |x_α(t + s) − x_α(s)|^p ds = 0 (equi-continuous in the mean).


Example (Integral Operator) Let k(x, y) be a kernel function satisfying

∫_Ω ∫_Ω |k(x, y)|^2 dx dy < ∞.

Define the integral operator A by

(Au)(x) = ∫_Ω k(x, y)u(y) dy for u ∈ X = L^2(Ω).

Then A ∈ L(X) is compact. This class of operators is called the Hilbert-Schmidt operators.
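A minimal numerical sketch of this example follows (not part of the original text; the kernel k(x, y) = exp(−|x − y|) and the midpoint quadrature are illustrative assumptions): the discretized operator has rapidly decaying singular values, which is the finite-dimensional signature of compactness.

```python
# Sketch: discretize (Au)(x) = \int_0^1 k(x,y) u(y) dy and inspect its singular values.
import numpy as np

n = 400
x = (np.arange(n) + 0.5) / n                        # midpoint quadrature nodes
K = np.exp(-np.abs(x[:, None] - x[None, :])) / n    # quadrature-weighted kernel matrix
s = np.linalg.svd(K, compute_uv=False)
print(s[:8])                                        # singular values fall off quickly
print(np.sqrt(np.sum(s**2)))                        # ~ Hilbert-Schmidt norm (L2 norm of k)
```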

Rellich-Kondrachov theorem Every bounded sequence in W^{1,p}(Ω), for a bounded domain Ω with Lipschitz boundary, has a subsequence that converges in L^q(Ω) provided that q < p^* = dp/(d − p).

Fredholm Alternative theorem Let A ∈ L(X) be compact. For any nonzero λ ∈ C, either
1) the equation Ax − λx = 0 has a nonzero solution x, or
2) the equation Ax − λx = f has a unique solution x for every f ∈ X.
In the second case, (λI − A)^{−1} is bounded.

Exercise
Problem 1 T : D(T) ⊂ X → Y is closable if and only if the closure of its graph G(T) in X × Y is the graph of a linear operator S.
Problem 2 For a closed linear operator T, the domain D(T) is a Banach space when equipped with the graph norm

|x|_{D(T)} = (|x|^2_X + |Tx|^2_Y)^{1/2}.

3 Constrained Optimization

In this section we develop the Lagrange multiplier theory for the constrained minimiza-tion in Banach spaces.

3.1 Hilbert space theory

In this section we develop the Hilbert space theory for the constrained minimization.

Theorem 1 (Riesz Representation) Let X be a Hilbert space. Given F ∈ X^*, consider the minimization of

J(x) = ½(x, x)_X − F(x) over x ∈ X.

There exists a unique solution x^* ∈ X and x^* satisfies

(v, x^*) = F(v) for all v ∈ X. (3.1)

The solution x^* ∈ X is the Riesz representation of F ∈ X^* and it satisfies |x^*|_X = |F|_{X^*}.


Proof: Suppose x^* is a minimizer. Then

½(x^* + tv, x^* + tv) − F(x^* + tv) ≥ ½(x^*, x^*) − F(x^*)

for all t ∈ R and v ∈ X. Thus, for t > 0,

(v, x^*) − F(v) + (t/2)|v|^2 ≥ 0.

Letting t → 0^+, we have (v, x^*) − F(v) ≥ 0 for all v ∈ X. Since −v ∈ X as well, this implies (3.1).
(Uniqueness) Suppose x^*_1, x^*_2 are minimizers. Then

(v, x^*_1) − (v, x^*_2) = F(v) − F(v) = 0 for all v ∈ X.

Letting v = x^*_1 − x^*_2, we have |x^*_1 − x^*_2|^2 = 0, i.e., x^*_1 = x^*_2.
(Existence) Let x_n be a minimizing sequence, i.e., J(x_n) is decreasing and lim_{n→∞} J(x_n) = δ = inf_{x∈X} J(x). By the parallelogram law

|x_n − x_m|^2 + |x_n + x_m|^2 = 2(|x_n|^2 + |x_m|^2),

and, since the F-terms cancel by linearity,

¼|x_n − x_m|^2 = J(x_n) + J(x_m) − 2J((x_n + x_m)/2) ≤ J(x_n) + J(x_m) − 2δ → 0.

This implies x_n is a Cauchy sequence in X. Since X is complete there exists x^* = lim_{n→∞} x_n in X and J(x^*) = δ, i.e., x^* is the minimizer.

Remark The existence and uniqueness proof of the theorem holds if we assume J is a uniformly convex functional with modulus φ, i.e.,

J(λx_1 + (1 − λ)x_2) ≤ λJ(x_1) + (1 − λ)J(x_2) − λ(1 − λ)φ(|x_1 − x_2|),

where φ is increasing and vanishes only at 0. In fact,

¼φ(|x_n − x_m|) ≤ J(x_n) + J(x_m) − 2J((x_n + x_m)/2) ≤ J(x_n) + J(x_m) − 2δ.

The necessary and sufficient optimality is then given by 0 ∈ ∂J(x^*), where ∂J(x^*) is the subdifferential of J at x^*.

Example (Mechanical System)

min ∫_0^1 ½|du/dx|^2 dx − ∫_0^1 f(x)u(x) dx (3.2)

over u ∈ H^1_0(0,1). Here

X = H^1_0(0,1) = { u ∈ H^1(0,1) : u(0) = u(1) = 0 }

is a Hilbert space with the inner product

(u, v)_{H^1_0} = ∫_0^1 (du/dx)(dv/dx) dx.


H^1(0,1) is the space of absolutely continuous functions on [0,1] with square integrable derivative, i.e.,

u(x) = u(0) + ∫_0^x g(s) ds, g = du/dx a.e. and g ∈ L^2(0,1).

The solution y_f = x^* of (3.2) satisfies

∫_0^1 (dv/dx)(dy_f/dx) dx = ∫_0^1 f(x)v(x) dx.

By integration by parts

∫_0^1 f(x)v(x) dx = ∫_0^1 ( ∫_x^1 f(s) ds ) (dv/dx) dx,

and thus

∫_0^1 ( dy_f/dx − ∫_x^1 f(s) ds ) (dv/dx) dx = 0

for all v ∈ H^1_0(0,1). This implies that

dy_f/dx − ∫_x^1 f(s) ds = c, a constant.

Integrating this, we have

y_f(x) = ∫_0^x ∫_s^1 f(τ) dτ ds + cx.

Since y_f(1) = 0 we have

c = −∫_0^1 ∫_s^1 f(τ) dτ ds.

Moreover, dy_f/dx ∈ H^1(0,1) and

−d^2y_f/dx^2 = f(x) a.e.

with y_f(0) = y_f(1) = 0.

Now we consider the case F(v) = v(1/2). Then

|F(v)| = | ∫_0^{1/2} (dv/dx) dx | ≤ ( ∫_0^{1/2} |dv/dx|^2 dx )^{1/2} ( ∫_0^{1/2} 1 dx )^{1/2} ≤ (1/√2)|v|_X.

That is, F ∈ X^*. Thus we have

∫_0^1 ( dy_f/dx − χ_{(0,1/2)}(x) ) (dv/dx) dx = 0

for all v ∈ H^1_0(0,1), where χ_S is the characteristic function of a set S. It is equivalent to

dy_f/dx − χ_{(0,1/2)}(x) = c, a constant.


Integrating this, we have

y_f(x) = (1 + c)x for x ∈ [0, 1/2], and y_f(x) = cx + 1/2 for x ∈ [1/2, 1].

Since y_f(1) = 0 we have c = −1/2. Moreover, y_f satisfies

−d^2y_f/dx^2 = 0 for x ≠ 1/2, y_f(0) = y_f(1) = 0,

y_f((1/2)^−) = y_f((1/2)^+), (dy_f/dx)((1/2)^−) − (dy_f/dx)((1/2)^+) = 1.

Or, equivalently,

−d^2y_f/dx^2 = δ_{1/2}(x),

where δ_{x_0} is the Dirac delta distribution at x = x_0.
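A minimal numerical sketch of this point-load example follows (not part of the original text; lumping the Dirac mass onto the nearest grid node is an assumption of the discretization): the finite-difference solution reproduces the piecewise linear y_f(x) = x/2 on [0, 1/2] and (1 − x)/2 on [1/2, 1].

```python
# Sketch: approximate -y'' = delta_{1/2}, y(0) = y(1) = 0 with finite differences.
import numpy as np

n = 200                                     # even, so x = 1/2 is a grid node
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
A = (np.diag(2.0 * np.ones(n - 1)) - np.diag(np.ones(n - 2), 1)
     - np.diag(np.ones(n - 2), -1)) / h**2
b = np.zeros(n - 1)
b[n // 2 - 1] = 1.0 / h                     # lumped Dirac mass at x = 1/2
y = np.zeros(n + 1)
y[1:-1] = np.linalg.solve(A, b)

exact = np.minimum(x, 1.0 - x) / 2.0        # the piecewise linear Green's function
print(np.max(np.abs(y - exact)))
```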

Theorem 2 (Orthogonal Decomposition X = M ⊕M⊥) LetX be a Hilbert space

and M is a linear subspace of X. Let M be the closure of M and M⊥ = z ∈ X :(z, y) = 0 for all y ∈ M be the orthogonal complement of M . Then every elementu ∈ X has the unique representation

u = u1 + u2, u1 ∈M and u2 ∈M⊥.

Proof: Consider the minimum distance problem

min |x− u|2 over x ∈M. (3.3)

As for the proof of Theorem 1 one can prove that there exists a unique solution x∗ forproblem (3.3) and x∗ ∈M satisfies

(x∗ − u, y) = 0, for all y ∈M

i.e., u− x∗ ∈M⊥. If we let u1 = x∗ ∈M and u2 = u− x∗ ∈M⊥, then u = u1 + u2 isthe desired decomposition.

Example (Closed subspace) (1) M = span{y_k, k ∈ K} = { x ∈ X : x = Σ_{k∈K} α_k y_k } is a subspace of X. If K is finite, M is closed.
(2) Let X = L^2(−1,1) and M = {odd functions} = { f ∈ X : f(−x) = −f(x) a.e. }; then M is closed. Let M_1 = span{sin(kπt) : k ≥ 1} and M_2 = span{cos(kπt) : k ≥ 0}; then L^2(−1,1) = cl(M_1) ⊕ cl(M_2), the decomposition into odd and even parts.
(3) Let E ∈ L(X,Y). Then M = N(E) = { x ∈ X : Ex = 0 } is closed. In fact, if x_n ∈ M and x_n → x in X, then 0 = lim_{n→∞} Ex_n = E lim_{n→∞} x_n = Ex, and thus x ∈ N(E) = M.

Example (Hodge-Decomposition) Let X = L2(Ω)3 with where Ω is a bounded open

set in R3 with Lipschitz boundary Γ. Let M = ∇u, u ∈ H1(Ω) is the gradient field.Then,

M⊥ = ψ ∈ X : ∇ · ψ = 0, n · ψ = 0 at Γ


is the divergence free field. Thus, we have the Hodge decomposition

X = M ⊕M⊥ = gradu, u ∈ H1(Ω) ⊕ the divergence free field.

In fact ψ ∈M⊥ ∫∇φ · ψ =

∫Γn · ψφds−

∫Ω∇ψφdx = 0

and ∇u ∈M, u ∈ H1(Ω)/R is uniquely determined by∫Ω∇u · ∇φdx =

∫Ωf · ∇φdx

for all φ ∈ H1(Ω).

Theorem 3 (Decomposition) Let X, Y be Hilbert spaces and E ∈ L(X,Y ). LetE∗ ∈ L(Y,X) is the adjoint of E, i.e., (Ex, y)Y = (x,E∗y)X for all x ∈ X andy ∈ Y . Then N(E) = R(E∗)⊥ and thus from the orthogonal decomposition theoremX = N(E)⊕R(E∗).

Proof: Suppose x ∈ N(E), then

(x,E∗y)X = (Ex, y)Y = 0

for all y ∈ Y and thus N(E) ⊂ R(E∗)⊥. Conversely, x∗ ∈ R(E∗)⊥, i.e.,

(x∗, E∗y) = (Ex, y) = 0 for all y ∈ Y

Thus, Ex∗ = 0 and R(E∗)⊥ ⊂ N(E).

Remark Suppose R(E) is closed in Y , then R(E∗) is closed in X and

Y = R(E)⊕N(E∗), X = R(E∗)⊕N(E)

Thus, we have b = b1 + b2, b1 ∈ R(E), b2 ∈ N(E∗) for all b ∈ Y and

|Ex− b|2 = |Ex− b1|2 + |b2|2.

Example (R(E) is not closed) Consider X = L2(0, 1) and define a linear operator E ∈L(X) by

(Ef)(t) =

∫ t

0f(s) ds, t ∈ (0, 1)

Since R(E) ⊂ absolutely continuous functions on [0, 1], R(E) = L2(0, 1) and R(E)is not a closed subspace of L2(0, 1).

3.2 Minimum norm problem

We consider

min |x|^2 subject to (x, y_i)_X = c_i, 1 ≤ i ≤ m. (3.4)

Theorem 4 (Minimum Norm) Suppose the y_i are linearly independent. Then there exists a unique solution x^* to (3.4) and x^* = Σ_{i=1}^m β_i y_i, where β = (β_i)_{i=1}^m satisfies

Gβ = c, G_{i,j} = (y_i, y_j).


Proof: Let M = span{y_i}_{i=1}^m. Since X = M ⊕ M^⊥, it is necessary that x^* ∈ M, i.e.,

x^* = Σ_{i=1}^m β_i y_i.

From the constraints (x^*, y_i) = c_i, 1 ≤ i ≤ m, the condition Gβ = c for β ∈ R^m follows. In fact, x ∈ X satisfies (x, y_i) = c_i, 1 ≤ i ≤ m, if and only if x = x^* + z with z ∈ M^⊥, and then |x|^2 = |x^*|^2 + |z|^2.

Remark The y_i are linearly independent if and only if the symmetric matrix G ∈ R^{m×m} is positive definite, since

(x, Gx)_{R^m} = |Σ_{k=1}^m x_k y_k|^2_X for all x ∈ R^m.

Theorem 5 (Minimum Norm) Suppose E ∈ L(X, Y). Consider the minimum norm problem

min |x|^2 subject to Ex = c.

If R(E) = Y, then the minimum norm solution x^* exists and

x^* = E^*y, (EE^*)y = c.

Proof: Since R(E) is closed, R(E^*) is closed by the closed range theorem, and it follows from the decomposition theorem that X = N(E) ⊕ R(E^*). Thus x^* = E^*y provided there exists y ∈ Y such that (EE^*)y = c. Since Y = N(E^*) ⊕ R(E) = R(E), we have N(E^*) = {0}. If EE^*x = 0, then (x, EE^*x) = |E^*x|^2 = 0 and E^*x = 0; thus N(EE^*) = N(E^*) = {0}. Since R(EE^*) = R(E) = Y, it follows from the open mapping theorem that EE^* is boundedly invertible, and there exists a unique solution to (EE^*)y = c.

Remark If R(E) is closed and c ∈ R(E), we may take Y = R(E) and Theorem 5 remains valid. For example, define E ∈ L(X, R^m) by

Ex = ((y_1, x)_X, ···, (y_m, x)_X) ∈ R^m.

If the y_i are linearly dependent, then R(E) is a proper closed subspace of R^m.

Example (DC-motor Control) Let (θ(t), ω(t)) satisfy

dω/dt + ω(t) = u(t), ω(0) = 0,
dθ/dt = ω(t), θ(0) = 0. (3.5)

This models a control problem for a DC motor: the control function u(t) is the current applied to the motor, θ is the angle and ω is the angular velocity. The objective is to find a control u on [0,1] such that the desired state (θ̄, 0) is reached at time t = 1 with minimum norm

∫_0^1 |u(t)|^2 dt.

Let X = L^2(0,1). Given u ∈ X we have the unique solution to (3.5) and

ω(1) = ∫_0^1 e^{t−1} u(t) dt,
θ(1) = ∫_0^1 (1 − e^{t−1}) u(t) dt.


Thus we can formulate this control problem as problem (3.4) by letting X = L^2(0,1) with the standard inner product and

y_1(t) = 1 − e^{t−1}, y_2(t) = e^{t−1}, c_1 = θ̄, c_2 = 0.
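A minimal numerical sketch of this example follows (not part of the original text; the target angle θ̄ = 1 and the quadrature grid are assumptions): it assembles the 2×2 Gram matrix, solves Gβ = c, and forms the minimum-norm control u^* = β_1 y_1 + β_2 y_2.

```python
# Sketch: minimum-norm control for the DC-motor example via the Gram system.
import numpy as np

t = np.linspace(0.0, 1.0, 10001)
y1 = 1.0 - np.exp(t - 1.0)
y2 = np.exp(t - 1.0)
G = np.array([[np.trapz(y1 * y1, t), np.trapz(y1 * y2, t)],
              [np.trapz(y2 * y1, t), np.trapz(y2 * y2, t)]])
c = np.array([1.0, 0.0])                  # reach theta(1) = 1, omega(1) = 0
beta = np.linalg.solve(G, c)
u = beta[0] * y1 + beta[1] * y2

# check the reached state by evaluating the constraint functionals
print(np.trapz(y1 * u, t), np.trapz(y2 * u, t))   # ~ (1, 0)
```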

Example (Linear Control System) Consider the linear control system

dx/dt = Ax(t) + Bu(t), x(0) = x_0 ∈ R^n, (3.6)

where x(t) ∈ R^n and u(t) ∈ R^m are the state and the control function, respectively, and the system matrices A ∈ R^{n×n} and B ∈ R^{n×m} are given. Consider the problem of finding a control that transfers x(0) = 0 to a desired state x̄ ∈ R^n at time T > 0, i.e., x(T) = x̄, with minimum norm. The problem can be formulated as (3.4). First, we have the variation of constants formula

x(t) = e^{At}x_0 + ∫_0^t e^{A(t−s)}Bu(s) ds,

where e^{At} is the matrix exponential and satisfies (d/dt)e^{At} = Ae^{At} = e^{At}A. Thus, with x_0 = 0, the condition x(T) = x̄ is written as

∫_0^T e^{A(T−t)}Bu(t) dt = x̄.

Let X = L^2(0, T; R^m). It then follows from Theorem 4 that the minimum norm solution is given by

u^*(t) = B^t e^{A^t(T−t)} β, Gβ = x̄,

provided that the controllability Gramian

G = ∫_0^T e^{A(T−t)}BB^t e^{A^t(T−t)} dt ∈ R^{n×n}

is positive definite on R^n. If G is positive definite, then the control system (3.6) is controllable, and positivity of G is equivalent to

B^t e^{A^t t}x = 0 for t ∈ [0, T] implies x = 0,

since

(x, Gx) = ∫_0^T |B^t e^{A^t(T−t)}x|^2 dt.

Moreover, G is positive definite if and only if the Kalman rank condition holds, i.e.,

rank [B, AB, ···, A^{n−1}B] = n.

In fact, (x, Gx) = 0 implies B^t e^{A^t t}x = 0 for t ≥ 0; differentiating at t = 0 gives B^t(A^t)^k x = 0 for all k ≥ 0, and thus the claim follows from the Cayley-Hamilton theorem.
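The following Python sketch illustrates the Kalman rank condition and the positivity of the controllability Gramian (not part of the original text; the 2×2 system matrices match the DC-motor example and the crude quadrature are assumptions, and SciPy is assumed available for the matrix exponential).

```python
# Sketch: Kalman rank test and a quadrature approximation of the Gramian.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [0.0, -1.0]])     # e.g. the DC-motor dynamics (theta, omega)
B = np.array([[0.0], [1.0]])
n = A.shape[0]

kalman = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
print(np.linalg.matrix_rank(kalman))        # = 2: the system is controllable

T, m = 1.0, 2000
s = np.linspace(0.0, T, m + 1)
Phi = [expm(A * (T - si)) @ B for si in s]  # e^{A(T-s)} B
G = sum(w * P @ P.T for w, P in zip(np.gradient(s), Phi))  # crude quadrature
print(np.linalg.eigvalsh(G))                # both eigenvalues are positive
```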

Example (Function Interpolation)

(1) Consider the minimum norm problem on X = H^1_0(0,1):

min ∫_0^1 |du/dx|^2 dx over u ∈ H^1_0(0,1) (3.7)


subject to u(x_i) = c_i, 1 ≤ i ≤ m, where 0 < x_1 < ··· < x_m < 1 are given nodes. Let X = H^1_0(0,1). From the mechanical system example we have, for each i, a piecewise linear function y_i ∈ X such that (u, y_i)_X = u(x_i) for all u ∈ X. By Theorem 4 the solution to (3.7) is given by

u^*(x) = Σ_{i=1}^m β_i y_i(x), Gβ = c.

That is, u^* is the piecewise linear interpolant with nodes x_i, 1 ≤ i ≤ m, and nodal values c_i at x_i.

(2) Consider the spline interpolation problem (1.2):

min ∫_0^1 ½|d^2u/dx^2|^2 dx over all functions u

subject to the interpolation conditions u(x_i) = b_i, u′(x_i) = c_i, 1 ≤ i ≤ m. (3.8)

Let X = H^2_0(0,1) be the completion of C^2_0[0,1], the space of twice continuously differentiable functions with u(0) = u′(0) = 0 and u(1) = u′(1) = 0, with respect to the H^2_0(0,1) inner product

(f, g)_{H^2_0} = ∫_0^1 (d^2f/dx^2)(d^2g/dx^2) dx,

i.e.,

H^2_0(0,1) = { u ∈ L^2(0,1) : du/dx, d^2u/dx^2 ∈ L^2(0,1), u(0) = u′(0) = u(1) = u′(1) = 0 }.

Problem (3.8) is a minimum norm problem on X = H^2_0(0,1). For F_i ∈ X^* defined by F_i(v) = v(x_i) we have

(y_i, v)_X − F_i(v) = ∫_0^1 ( d^2y_i/dx^2 + (x − x_i)χ_{[0,x_i]} ) (d^2v/dx^2) dx, (3.9)

and for F̃_i ∈ X^* defined by F̃_i(v) = v′(x_i) we have

(ỹ_i, v)_X − F̃_i(v) = ∫_0^1 ( d^2ỹ_i/dx^2 − χ_{[0,x_i]} ) (d^2v/dx^2) dx. (3.10)

Thus

d^2y_i/dx^2 + (x − x_i)χ_{[0,x_i]} = a + bx, d^2ỹ_i/dx^2 − χ_{[0,x_i]} = ã + b̃x,

where a, ã, b, b̃ are constants, and y_i = RF_i and ỹ_i = RF̃_i are piecewise cubic functions on [0,1]. One can determine y_i, ỹ_i using the conditions y(0) = y′(0) = y(1) = y′(1) = 0. From Theorem 4 the solution to (3.8) is given by

u^*(x) = Σ_{i=1}^m ( β_i y_i(x) + β̃_i ỹ_i(x) ).


That is, u^* is a piecewise cubic function with nodes x_i, 1 ≤ i ≤ m, nodal values b_i and nodal derivatives c_i at x_i.
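As an illustration of case (1) above, the following Python sketch (not part of the original text; the nodes and prescribed values are assumptions) builds the minimum H^1_0-norm interpolant from the representers of point evaluation and the Gram system Gβ = c.

```python
# Sketch: the representer of u -> u(x_i) in H^1_0(0,1) is the piecewise linear
# Green's function y_i(x) = x(1-x_i) for x <= x_i and x_i(1-x) for x >= x_i.
import numpy as np

nodes = np.array([0.2, 0.5, 0.7])
c = np.array([1.0, -0.5, 0.25])             # prescribed values u(x_i) = c_i

def representer(xi, x):
    return np.where(x <= xi, x * (1.0 - xi), xi * (1.0 - x))

# Gram matrix G_ij = (y_i, y_j)_{H^1_0} = y_j(x_i) by the reproducing property
G = np.array([[representer(xj, xi) for xj in nodes] for xi in nodes])
beta = np.linalg.solve(G, c)

x = np.linspace(0.0, 1.0, 1001)
u_star = sum(b * representer(xi, x) for b, xi in zip(beta, nodes))
print([float(np.interp(xi, x, u_star)) for xi in nodes])   # reproduces c
```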

Example (Differential form, Natural Boundary Conditions) (1) Let X be the Hilbertspace defined by

X = H1(0, 1) = u ∈ L2(0, 1) :du

dx∈ L2(0, 1).

Given a, c ∈ C[0, 1] with a > 0 c ≥ 0 and α ≥ 0 consider the minimization

min

∫ 1

0(1

2(a(x)|du

dx|2 + c(x)|u|2)− uf) dx+

α

2|u(1)|2 − gu(1)

over u ∈ X = H1(0, 1). If u∗ is a minimizer, it follows from Theorem 1 that u∗ ∈ Xsatisfies ∫ 1

0(a(x)

du∗

dx

dv

dx+ c(x)u∗v − fv) dx+ αu(1)v(1)− gv(1) = 0 (3.11)

for all v ∈ X. By the integration by parts∫ 1

0(− d

dx(a(x)

du∗

dx)+c(x)u∗(x)−f(x))v(x) dx−a(0)

du∗

dx(0)v(0)+(a(1)

du∗

dx(1)+αu∗(1)−g)v(1) = 0,

assuming a(x)du∗

dx ∈ H1(0, 1). Since v ∈ H1(0, 1) is arbitrary,

− d

dx(a(x)

du∗

dx) + c(x)u∗ = f(x), x ∈ (0, 1)

a(0)du∗

dx(0) = 0, a(1)

du∗

dx(1) + αu∗(1) = g.

(3.12)

(3.11) is the weak form and (3.12) is the strong form of the optimality condition.(2) Let X be the Hilbert space defined by

X = H2(0, 1) = u ∈ L2(0, 1) :du

dx,d2u

dx∈ L2(0, 1).

Given a, b, c ∈ C[0, 1] with a > 0 b, c ≥ 0 and α ≥ 0 consider the minimization onX = H2(0, 1)

min

∫ 1

0(1

2(a(x)|d

2u

dx2|2+b(x)|du

dx|2+c(x)|u|2)−uf) dx+

α

2|u(1)|2+

β

2|u′(1)|2−g1u(1)−g2u

′(1)

over u ∈ X = H2(0, 1) satisfying u(0) = 1. If u∗ is a minimizer, it follows fromTheorem 1 that u∗ ∈ X satisfies∫ 1

0(a(x)

d2u∗

dx2

d2v

dx2+b(x)

du∗

dx

dv

dx+c(x)u∗v−fv) dx+αu∗(1)v(1)+β (u∗)′(1)v′(1)−g1v(1)−g2v

′(1) = 0

(3.13)for all v ∈ X satisfying v(0) = 0. By the integration by parts∫ 1

0(d2

dx2(a(x)

d2

dx2u∗)− d

dx(b(x)

d

dxu∗) + c(x)u∗(x)− f(x))v(x) dx

+(a(1)d2u∗

dx2(1) + β

du∗

dx(1)− g2)v′(1)− a(0)

d2u∗

dx2(0)v′(0)

+(− d

dx(a(x)

d2u∗

dx2+ b(x)

du∗

dx+ αu∗ − g1)(1)v(1)− (− d

dx(a(x)

d2u∗

dx2) + b(x)

du∗

dx)(0)v(0) = 0


assuming u∗ ∈ H4(0, 1). Since v ∈ H2(0, 1) satisfying v(0) = 0 is arbitrary,

d2

dx2(a(x)

d2

dx2u∗)− d

dx(b(x)

du∗

dx) + c(x)u∗ = f(x), x ∈ (0, 1)

u∗(0) = 1, a(0)d2u∗

dx2(0) = 0

a(1)d2u∗

dx2(1) + β

du∗

dx(1)− g2 = 0, (− d

dx(a(x)

d2u∗

dx2) + b(x)

du∗

dx)(1) + αu∗(1)− g1 = 0.

(3.14)(3.13) is the weak form and (3.14) is the strong (differential) form of the optimalitycondition.

Next consider the multi dimensional case. Let Ω be a subdomain in Rd with Lips-chitz boundary and Γ0 is a closed subset of the boundary ∂Ω with Γ1 = ∂Ω \ Γ0. LetΓ is a smooth hypersurface in Ω.

H1Γ0

(Ω) = u ∈ H1(Ω) : u(s) = 0, s ∈ Γ0

with norm |∇u|L2(Ω). Let Γ is a smooth hyper-surface in Ω and consider the weak formof equation for u ∈ H1

Γ0(Ω):∫

(σ(x)∇u · ∇φ+ u(x) (~b(x) · ∇φ) + c(x)u(x)φ(x)) dx+

∫Γ1

α(s)u(s)φ(s) ds

−∫

Γg0(s)φ(s) ds−

∫Ωf(x)φ(x) dx−

∫Γ1

g(s)u(s) ds = 0

for all H1Γ0

(Ω). Then the differential form for u ∈ H1Γ0

(Ω) is given by

−∇ · (σ(x)∇u+~bu) + c(x)u(x) = f in Ω

n · (σ(x)∇u+~bu) + αu = g at Γ1

[n · (σ(x)∇u+~bu)] = g0 at Γ.

By the divergence theorem if u satisfies the strong form, then u is a weak solution.Conversely, letting φ ∈ C0(Ω \ Γ) we obtain the first equation, i.e., ∇ · (σ∇u +~bu) ∈L2(Ω) Thus, n·(σ∇u+~bu) ∈ L2(Γ) and the third equation holds. Also, n·(σ∇u+~bu) ∈L2(Γ1) and the third equation holds.

3.3 Lax-Milgram Theory and Applications

Let H be a Hilbert space with scalar product (φ, ψ) and X be a Hilbert space andX ⊂ H with continuous dense injection. Let X∗ denote the strong dual space of X.H is identified with its dual so that X ⊂ H = H∗ ⊂ X∗ (i.e., H is the pivoting space).The dual product 〈φ, ψ〉 on X∗ ×X is the continuous extension of the scalar productof H restricted to H ×X. This framework is called the Gelfand triple.

Let σ is a bounded coercive bilinear form on X×X. Note that given x ∈ X, F (y) =σ(x, y) defines a bounded linear functional on X. Since given x ∈ X, y → σ(x, y) is a


bounded linear functional on X, say x∗ ∈ X∗. We define a linear operator A from Xinto X∗ by x∗ = Ax. Equation σ(x, y) = F (y) for all y ∈ X is equivalently written asan equation

Ax = F ∈ X∗. (3.15)

It from the Lax-Milgram theorem that A has the bouned inverse A−1 ∈ L(X∗, X) Also,

〈Ax, y〉X∗×X = σ(x, y), x, y ∈ X,

and thus A is a bounded linear operator. In fact,

|Ax|X∗ ≤ sup|y|≤1

|σ(x, y)| ≤M |x|.

Let R be the Riesz operator X∗ → X, i.e.,

|Rx∗|X = |x∗| and (Rx∗, x)X = 〈x∗, x〉 for all x ∈ X,

then A = RA represents the linear operator A ∈ L(X,X). Moreover, we define a linearoperator A on H by

Ax = Ax ∈ H

withdom (A) = x ∈ X : |σ(x, y)| ≤ cx |y|H for all y ∈ X.

That is, A is a restriction of A on dom (A). It thus follows from the Lax-Milgramtheorem

dom (A) = A−1H.

We will use the symbol A for all three linear operators as above in the lecture note andits use should be understood by the underlining context.

3.4 Quadratic Constrained Optimization

In this section we discuss the constrained minimization. First, we consider the case ofequality constraint. Let A be coercive and self-adjoint operator on X and E ∈ L(X,Y ).Let σ : X ×X → R is a symmetric, bounded, coercive bilinear form, i.e.,

σ(αx1 + β x2, y) = ασ(x1, y) + β σ(x2, y) for all x1, x2, y ∈ X and α, β ∈ R

σ(x, y) = σ(y, x), |σ(x, y)| ≤M |x|X |y|X , σ(x, x) ≥ δ |x|2

for some 0 < δ ≤ M < ∞. Thus, σ(x, y) defines an inner product on X. In fact,since σ is bounded and coercive,

√σ(x, x) =

√(Ax, x) is an equivalent norm of X.

Since y → σ(x, y) is a bounded linear functional on X, say x∗ and define a linearoperator from X to X∗ by x∗ = Ax. Then, A ∈ L(X,X∗) with |A|X→X∗ ≤ M and〈Ax, y〉X∗×X = σ(x, y) for all x, y ∈ X. Also, if RX∗→X is the Riesz map of X∗ anddefine A = RX∗→XA ∈ LX,X), then

(Ax, y)X = σ(x, y) for all x, y ∈ X,


and thus A is a self-adjoint operator in X. Throughout this section, we assume X =X∗ and Y = Y ∗, i.e., we identify A as the self-adjoint operator A. Consider theunconstrained minimization problem: for f ∈ X∗

min F (x) =1

2σ(x, x)− f(x). (3.16)

Since for x, v ∈ X and t ∈ R

F (x+ tv)− F (x)

t= σ(x, v)− f(v) +

t

2σ(v, v),

x ∈ X → F (x) ∈ R is differentiable and we have the necessary optimality for (3.16):

σ(x, v)− f(v) = 0 for all v ∈ X. (3.17)

Or, equivalentlyAx = f in X∗

Also, if RX∗→X is the Riesz map of X∗ and define A = RX∗→XA ∈ L(X,X), then

(Ax, y)X = σ(x, y) for all x, y ∈ X,

and thus A is a self-adjoint operator in X. Throughout this section, we assume X = X∗

and Y = Y ∗, i.e., we identify A as the self-adjoint operator A.

Consider the equality constrained minimization

min ½(Ax, x)_X − (a, x)_X subject to Ex = b ∈ Y, (3.18)

or equivalently, for f ∈ X^*,

min ½σ(x, x) − f(x) subject to Ex = b ∈ Y.

The necessary optimality for (3.18) is:

Theorem 6 (Equality Constraint) Suppose there exists an x̄ ∈ X such that Ex̄ = b. Then (3.18) has a unique solution x^* ∈ X and Ax^* − a ⊥ N(E). If R(E^*) is closed, then there exists a Lagrange multiplier λ ∈ Y such that

Ax^* + E^*λ = a, Ex^* = b. (3.19)

If R(E) = Y (E is surjective), the Lagrange multiplier λ ∈ Y is unique.

Proof: Suppose there exists an x̄ ∈ X such that Ex̄ = b. Then (3.18) is equivalent to

min ½(A(z + x̄), z + x̄) − (a, z + x̄) over z ∈ N(E).

Arguing as in Theorems 1 and 2, this problem has a unique minimizer z^* ∈ N(E) and (A(z^* + x̄) − a, v) = 0 for all v ∈ N(E); with x^* = z^* + x̄, Ax^* − a ⊥ N(E). From the orthogonal decomposition theorem, Ax^* − a ∈ cl(R(E^*)). Moreover, if R(E^*) is closed, then Ax^* − a ∈ R(E^*) and there exists λ ∈ Y satisfying


Ax^* + E^*λ = a. If E is surjective (R(E) = Y), then N(E^*) = {0} and thus the multiplier λ is unique.

Remark (Lagrange Formulation) Define the Lagrange functional

L(x, λ) = J(x) + (λ, Ex − b).

The optimality condition (3.19) is equivalent to

∂_x L(x^*, λ) = 0, ∂_λ L(x^*, λ) = 0.

Here x ∈ X is the primal variable and the Lagrange multiplier λ ∈ Y is the dual variable.
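In finite dimensions the optimality system (3.19) is a saddle-point (KKT) linear system. The following Python sketch is illustrative only; the random data are assumptions, not taken from the text.

```python
# Sketch: solve [[A, E^T], [E, 0]] [x; lambda] = [a; b] for the equality-constrained
# quadratic program min (1/2)(Ax, x) - (a, x) subject to Ex = b.
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                  # symmetric positive definite (coercive)
E = rng.standard_normal((m, n))              # surjective with probability 1
a = rng.standard_normal(n)
b = rng.standard_normal(m)

KKT = np.block([[A, E.T], [E, np.zeros((m, m))]])
sol = np.linalg.solve(KKT, np.concatenate([a, b]))
x_star, lam = sol[:n], sol[n:]
print(np.linalg.norm(E @ x_star - b))               # constraint satisfied
print(np.linalg.norm(A @ x_star + E.T @ lam - a))   # stationarity of the Lagrangian
```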

Remark (1) If E is not surjective, one can seek a solution using the penalty formulation

min ½(Ax, x)_X − (a, x)_X + (β/2)|Ex − b|^2_Y (3.20)

for β > 0. The necessary optimality condition is

Ax_β + βE^*(Ex_β − b) − a = 0.

If we let λ_β = β(Ex_β − b), then we have

[ A  E^* ; E  −I/β ] [ x_β ; λ_β ] = [ a ; b ],

and the optimality system (3.19) corresponds to the limit case β = ∞. One can prove the monotonicity

J(x_β) ↑, |Ex_β − b|_Y ↓ as β → ∞.

In fact, for β̄ > β we have

J(x_β) + (β/2)|Ex_β − b|^2_Y ≤ J(x_{β̄}) + (β/2)|Ex_{β̄} − b|^2_Y,
J(x_{β̄}) + (β̄/2)|Ex_{β̄} − b|^2_Y ≤ J(x_β) + (β̄/2)|Ex_β − b|^2_Y.

Adding these inequalities, we obtain

(β̄ − β)(|Ex_{β̄} − b|^2_Y − |Ex_β − b|^2_Y) ≤ 0,

which implies the claim. Suppose there exists an x̄ ∈ X such that Ex̄ = b. Then from Theorem 6

J(x_β) + (β/2)|Ex_β − b|^2 ≤ J(x^*),

and thus |x_β|_X is bounded uniformly in β > 0 and

lim_{β→∞} |Ex_β − b| = 0.

Since X is a Hilbert space, there exists a weak limit x̂ ∈ X with J(x̂) ≤ J(x^*) and Ex̂ = b, since norms are weakly sequentially lower semi-continuous. Since the optimal solution x^* to (3.18) is unique, lim x_β = x^* as β → ∞. Suppose λ_β is bounded in Y;


then there exists a subsequence of λ_β that converges weakly to λ in Y, and (x^*, λ) satisfies Ax^* + E^*λ = a. Conversely, if there is a Lagrange multiplier λ^* ∈ Y, then λ_β is bounded in Y. In fact, we have

(A(x_β − x^*), x_β − x^*) + (x_β − x^*, E^*(λ_β − λ^*)) = 0,

where

β(x_β − x^*, E^*(λ_β − λ^*)) = β(Ex_β − b, λ_β − λ^*) = |λ_β|^2 − (λ_β, λ^*).

Thus we obtain

β(x_β − x^*, A(x_β − x^*)) + |λ_β|^2 = (λ_β, λ^*),

and hence |λ_β| ≤ |λ^*|. Moreover, if R(E) = Y, then N(E^*) = {0}. Since λ_β satisfies E^*λ_β = a − Ax_β ∈ X, we can write

λ_β = Es_β with s_β ∈ N(E)^⊥, s_β = (E^*E)^{−1}(a − Ax_β) on N(E)^⊥,

and thus λ_β ∈ Y is bounded uniformly in β > 0.
(2) If E is not surjective but R(E) is closed, we obtain λ ∈ R(E) ⊂ Y such that Ax^* + E^*λ = a by taking Y = R(E), which is a Hilbert space. Also, (x_β, λ_β) → (x̂, λ̂) satisfying

Ax̂ + E^*λ̂ = a, Ex̂ = P_{R(E)}b,

since

|Ex − b|^2 = |Ex − P_{R(E)}b|^2 + |P_{N(E^*)}b|^2.
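The convergence x_β → x^* and λ_β = β(Ex_β − b) → λ^* can be observed numerically. The sketch below is illustrative (random data are an assumption, following the same convention as the KKT sketch above).

```python
# Sketch: penalty solutions of (3.20) converge to the constrained solution and multiplier.
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
E = rng.standard_normal((m, n))
a = rng.standard_normal(n)
b = rng.standard_normal(m)

# reference solution from the KKT system
sol = np.linalg.solve(np.block([[A, E.T], [E, np.zeros((m, m))]]),
                      np.concatenate([a, b]))
x_star, lam_star = sol[:n], sol[n:]

for beta in (1e1, 1e3, 1e5):
    x_b = np.linalg.solve(A + beta * E.T @ E, a + beta * E.T @ b)   # penalized optimality
    lam_b = beta * (E @ x_b - b)
    print(beta, np.linalg.norm(x_b - x_star), np.linalg.norm(lam_b - lam_star))
```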

Example (Parameterized constrained problem)

min 12 ((Ay, y) + α (p, p))− (a, y)

subject to E1y + E2p = b

where x = (y, p) ∈ X1×X2 = X and E1 ∈ L(X1, Y ), E2 ∈ L(X2, Y ). Assume Y = X1

and E1 is bounded invertible. Then we have the optimalityAy∗ + E∗1λ = a

α p+ E∗2λ = 0

E1y∗ + E2p

∗ = b

Example (Stokes system) Let X = H10 (Ω)×H1

0 (Ω) where Ω is a bounded open set in

R2. A Hilbert space H10 (Ω) is the completion of C0(Ω) with respect to the H1

0 (Ω) innerproduct

(f, g)H10 (Ω) =

∫Ω∇f · ∇g dx


Consider the constrained minimization∫Ω

(1

2(|∇u1|2 + |∇u2|2)− f · u) dx

over the velocity field u = (u1, u2) ∈ X satisfying

divu =∂

∂x1u1 +

∂x2u2 = 0.

From Theorem 5 with Y = L2(Ω) and Eu = divu on X we have the necessary opti-mality

−∆u+ gradp = f, divu = 0

which is called the Stokes system. Here f ∈ L2(Ω) × L2(Ω) is a applied force p =−λ∗ ∈ L2(Ω) is the pressure.

3.5 Variational Inequalities

Let Z be a Hilbert lattice with order ≤ and Z∗ = Z and G ∈ L(X,Z). For exam-ple, Z = L2(Ω) with a.e. pointwise inequality constarint. Consider the quadraticprogramming:

min1

2(Ax, x)X − (a, x)X subject to Ex = b, Gx ≤ c. (3.21)

Note that the constrain set C = x ∈ X,Ex = b, Gx ≤ c is a closed convex set in X.In general we consider the constrained problem on a Hilbert space X:

F0(x) + F1(x) over x ∈ C (3.22)

where C is a closed convex set, F : X → R is C1 and F1 : X → R is convex, i.e.

F1((1− λ)x1 + λx2) ≥ (1− λ)F1(x1) + λF (x2) for all x, x2 ∈ X and 0 ≤ λ ≤ 1.

Then, we have the necessary optimality in the form of the variational inequality:

Theorem 7 (Variational Inequality) Suppose x∗ ∈ C minimizes (3.22), then wehave the optimality condition

F ′0(x∗)(x− x∗) + F1(x)− F1(x∗) ≥ 0 for all x ∈ C. (3.23)

Proof: Let xt = x∗ + t (x− x∗) for all x ∈ C and 0 ≤ t ≤ 1. Since C is convex, xt ∈ C.Since x∗ ∈ C is optimal,

F0(xt)− F0(x∗) + F1(xt)− F1(x∗) ≥ 0, (3.24)

whereF1(xt) ≤ (1− t)F1(x∗) + tF (x)

and thusF1(xt)− F1(x∗) ≤ t (F1(x)− F1(x∗)).

54

Page 55: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

SinceF0(xt)− F0(x∗)

t→ F ′0(x∗)(x− x∗) as t→ 0+,

the variational inequality (3.23) follows from (3.24) by letting t→ 0+.

Example (Inequality constraint) For X = Rn

F ′(x∗)(x− x∗) ≥ 0, x ∈ C

implies thatF ′(x∗) · ν ≤ 0 and F ′(x∗) · τ = 0,

where ν is the outward normal and τ is the tangents of C at x∗. Let X = R2, C =[0, 1]× [0, 1] and assume x∗ = [1, s], 0 < s < 1. Then,

∂F

∂x1(x∗) ≤ 0,

∂F

∂x2(x∗) = 0.

If x∗ = [1, 1] then∂F

∂x1(x∗) ≤ 0,

∂F

∂x2(x∗) ≤ 0.

Example (L1-optimization) Let U be a closed convex subset in R. Consider the mini-

mization problem on X = L2(Ω) and C = u ∈ U a.e. in Ω;

min1

2|Eu− b|2Y + α

∫Ω|u(x)| dx. over u ∈ C. (3.25)

Then, the optimality condition is

(Eu∗ − b, E(u− u∗))Y + α

∫Ω

(|u| − |u∗|) dx ≥ 0 (3.26)

for all u ∈ C. Let p = −E∗(Eu∗ − b) ∈ X and

u = v ∈ U on |x− x| ≤ δ, otherwise u = u∗.

From (4.31) we have

1

δ

∫|x−x|≤δ

(−p(v − u∗(x)) + α (|v| − |u∗(x)|) dx ≥ 0.

Suppose u∗ ∈ L1(Ω) and letting δ → 0+ we obtain

−p(v − u∗(x)) + α (|v| − |u∗(x)|) ≥ 0

a.e. x ∈ Ω (Lebesgue points of |u∗| and for all v ∈ U . That is, u∗ ∈ U minimizes

−pu+ α |u| over u ∈ U. (3.27)

Suppose U = R it implies that if |p| < α, then u∗ = 0 and otherwise p = α u∗

|u∗| =

α sign(u∗). Thus, we obtain the necessary optimality

(E∗(Eu∗ − b))(x) + α∂|u|(u∗(x)) = 0, a.e. in Ω,

where the sub-differential of |u| (see, Section ) is defined by

∂|u|(s) =

1 s > 0[−1, 1] s = 0−1 s < 0.

55

Page 56: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

3.5.1 Inequality constraint problem

Consider the quadratic programing problem.

Theorem 8 (Quadratic Programming) Let Z be a Hilbert space lattice with order≤. For E ∈ L(X,Y ) and G ∈ L(X,Z) consider the quadratic programming (3.21):

min1

2(Ax, x)X − (a, x) subject to Ex = b, Gx ≤ c.

If there exists x ∈ X such that Ex = b and Gx ≤ c, then there exits unique solutionto (3.21) and

(Ax∗ − a, x− x∗) ≥ 0 for all x ∈ C, (3.28)

where C = Ex = b, Gx ≤ c. Let Z = Z+ + Z− with positive corn Z+ and negativecone Z−. Moreover, if we assume the regular point condition

0 ∈ int (E(x− x∗), G(x− x∗)− Z− +G(x∗)− c) : x ∈ X, (3.29)

then there exists λ ∈ Y and µ ∈ Z+ such thatAx∗ + E∗λ+G∗µ = a

Ex∗ = b

Gx∗ − c ≤ 0, µ ≥ 0 and (Gx∗ − c, µ)Z = 0.

(3.30)

Proof: First note that (3.21) is equivalent to

min1

2(A(z + x, z + x)X − (a, z + x)X subject to Ez = 0, Gz ≤ c−Gx = c

Since C = Ez = 0, Gz ≤ c is a closed convex set in the Hilbert space N(E). As inthe proof of Theorem 1 the minimizing sequence zn ∈ C satisfies

1

4|zm − zn| ≤ J(zn) + J(zm)− 2 δ → 0

as m ≥ n → ∞. Since C is closed, there exists a unique minimizer z∗ ∈ C and z∗

satisfies(A(z∗ + x)− a, z − z∗) ≥ 0 for all z ∈ C,

which is equivalent to (3.28). Moreover, it follows from the Lagrange Multiplier theory(below) if the regular point condition holds, then we obtain (3.30).

Example Suppose Z = Rm. If (Gx∗ − c)i = 0, then i is called an active index and letGa the (row) vectors of G corresponding to the active indices. Suppose(

EGa

)is surjective (Kuhn-Tuker condition),

then the regular point condition holds.

Remark (Lagrange Formulation) Define the Lagrange functional;

L(x, λ, µ) = J(x) + (λ,Ex− b) + (µ, (Gx− c)+).

56

Page 57: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

for (x, λ) ∈ X × Y, µ ∈ Z+; The optimality condition (3.30) is equivalent to

∂xL(x∗, λ, µ) = 0,

∂λL(x∗, λ, µ) = 0,

∂µL(x∗, λ, µ) = 0

Here x ∈ X is the primal variable and the Lagrange multiplier λ ∈ Y and µ ∈ Z+ arethe dual variables.

Remark Let A ∈ L(X,X∗) and E : X → Y is surjective. If we define G is naturalinjection X to Z. Define µ ∈ X∗ by

−µ = Ax∗ − a+ E∗λ.

Since 〈t(x∗− c)− (x∗− c), µ〉 = (t− 1) 〈x∗− c, µ〉 ≤ 0 for t > 0 we have 〈x∗− c, µ〉 = 0.Thus,

〈x− c, µ〉 ≤ 0 for all x− c ∈ Z+ ∩X

If we define the dual cone X∗+ by

X∗+ = x∗ ∈ X∗ : 〈x∗, x〉 ≤ 0 for all x ∈ X and x ≤ 0,

then µ ∈ X∗+. Thus, the regular point condition gives the stronger regularity µ ∈ Z+

for the Lagrange multiplier µ.

Example (Source Identification) Let X = Z = L2(Ω) and S : L(X,Y ) and consider theconstrained minimization:

min1

2|Sf − y|2Y +

∫Ωf dx+

α

2|f |2

subject to−f ≤ 0. The second term represents the L1(Ω) regularization for the sparsityof f . In this case, A = α I + S∗S. α > 0, a = S∗y − β and G = −I. The regular pointcondition (??) is given by

0 ∈ int (−Z + f∗)− Z− − f∗,

Since −Z + f∗ contains Z− and the regular point condition holds. Hence there existsµ ∈ Z+ such that

α f∗ + S∗Sf∗ + β + µ = S∗y, µ ≥ 0 and (−f∗, µ) = 0.

For example, the operator S is defined for Poisson equation:

−∆u = f in Ω

with a boundary condition∂

∂u+ u = 0 at Γ = ∂Ω and Sf = Cu with a measurement

operator C, e.g., Cu = u|Γ or Cu = u|Ω with Ω, a subdomain of Ω.

Remark (Penalty Method) Consider the penalized problem

J(x) +β

2|Ex− b|2Y +

β

2|max(0, Gx− c)|2Z . (3.31)

The necessary optimality condition is given by

Ax+ E∗λβ +G∗µβ = a, λβ = β (Exβ − b), µβ = β (Gxβ − c)+.

57

Page 58: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

It follows from Theorem 7 that there exists a unique solution xβto (3.21) and satisfies

J(xβ) +β

2|Exβ − b|2Y +

β

2|max(0, Gxβ − c)|2Z ≤ J(x∗) (3.32)

Thus, xβ is bounded in X and has a weak convergence subsequence that convergesto x in X. Moreover, we have

Ex = b and max(0, Gx− c) = 0

and J(x) ≤ J(x∗) since the norm is weakly sequentially lower semi-continuous. Sincethe minimizer to (3.21) is unique, x = x∗ and limβ→∞ J(xβ) = J(x∗) which impliesthat xβ → x∗ strongly in X.

Suppose λβ ∈ Y and µβ ∈ Z are bounded, there exists a subsequence that convergesweakly to (λ, µ) in Y × Z and µ ≥ 0. Since limβ→∞ J(xβ) = J(x∗) it follows from(3.32) that

(µβ, Gxβ − c)→ 0 as β →∞

and thus (µ,Gx∗ − c) = 0. Hence (x∗, λ, µ) satisfies (3.30). Conversely, if there is aLagrange multiplier (λ∗, µ∗) ∈ Y , then (λβ, µβ) is bounded in Y × Z. In fact, we have

(xβ − x∗, A(xβ − x∗)) + (xβ − x∗, E∗(λβ − λ∗)) + (xβ − x∗, G∗(µβ − µ∗)) ≤ 0

where

β (xβ − x∗, E∗(λc − λ∗)) = β (Exβ − b, λβ − λ∗) = |λβ|2 − (λβ, λ∗)

and

β (xβ − x∗, G∗(µβ − µ∗)) = β (Gxβ − c− (Gx∗ − c), µβ − µ∗) ≥ |µβ|2 − (µβ, µ∗).

Thus we obtain

β (xβ − x∗, A(xβ − x∗)) + |λβ|2 + |µβ|2 ≤ (λβ, λ) + (µβ, µ∗)

and hence |(λβ, µβ)| ≤ |(λ∗, µ∗)|.

Moreover, we have

Lemma Consider the penalized problem

J(x) +β

2|E(x)|2Y +

β

2|max(0, G(x))|2Z .

over x ∈ C, where C is a closed convex set. Let xβ is a minimizer. Then

λβ = β E(xβ), µβ = max(0, Gxβ)

are uniformly bounded.

Proof: The necessary optimality condition is given by

(J ′(xβ) + (E′(xβ))∗λβ + (G′(xβ))∗µβ, x− xβ) ≥ 0 for all x ∈ C

λβ = β E(xβ), µβ = β max(0, G(xβ)).

58

Page 59: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

For all feasible solution x ∈ C

J(xβ) +β

2|E(xβ)|2Y +

β

2|max(0, G(xβ))|2Z ≤ J(x). (3.33)

Assume that xβ is bounded in X (for example, J is J(x) → ∞ as |x| → ∞, x ∈|calC or C is bounded) and thus has a subsequence that converges weakly to x∗ ∈ C.Moreover, if E(xβ)→ E(x∗) and G(xβ)→ G(x∗) weakly in Y and Z, respectively (i.e.,E and G are weakly sequentially continuous), then

E(x∗) = 0 and max(0, G(x∗)) = 0.

and J(x∗) ≤ J(x). Since J is weakly sequentially lower semi-continuous. Hence x∗ isa minimizer of (3.34) and limβ→∞ J(xβ) = J(x∗). If J is convex, xβ → x∗ in X.

Suppose λβ ∈ Y and µβ ∈ Z are bounded, there exists a subsequence that convergesweakly to (λ, µ) in Y × Z and µ ≥ 0. Since limβ→∞ J(xβ) = J(x∗) it follows from(3.33) that

(µβ, G(xβ))→ 0 as β →∞

and thus (µ,G(x∗)) = 0. Hence (x∗, λ, µ) satisfies (3.39).Next we will show that λβ ∈ Y and µβ ∈ Z are bounded under the regular pint

condition. Note that

(λβ, E′(xβ)(x− xβ)) = (λβ, E

′(x∗)(x− x∗) + (E′(xβ)− E′(x∗))(x− x∗) + E′(xβ)(x∗ − xβ))

(µβ, G′(xβ)(x− xβ)) = (µβ, G

′(x∗)(x− x∗) + (G′(xβ)−G′(x∗))(x− x∗) +G′(xβ)(x∗ − xβ))

and it follows from the regular point condition that there exists ρ > 0 and x ∈ C andz ≤ 0 such that

−ρ λβ = E′(x∗)(x− x∗)

−ρµβ = G′(x∗)(x− x∗)− z +G(x∗).

Since for all x ∈ C

(J ′(xβ), x− xβ) + (λβ, E′(xβ)(x− xβ) + (µβ, G

′(xβ)(x− xβ)) ≥ 0

we obtain

ρ (|λβ|2 + |µβ|2) ≤ (J ′(xβ), x∗ − xβ)− (λβ, (E′(xβ)− E′(x∗))(x− x∗))

−(µβ, (G′(xβ)−G′(x∗))(x− x∗))− (µβ, G(xβ)) + (µβ, G(xβ)−G(x∗)).

where we used (µβ, z) ≤ 0. Since |x − x∗| ≤ ρM and |xβ − x∗| → 0, there exist aconstant M > 0 such that for sufficiently large β > 0

|λβ|+ |µβ| ≤ M.

Example (Quadratic Programming on Rn) Let X = Rn. Given a ∈ Rn, b ∈ Rm andc ∈ Rp consider

min1

2(Ax, x)− (a, x)

subject to Bx = b, Cx ≤ c

59

Page 60: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

where A ∈ Rn×n is symmetric and positive, B ∈ Rm×n and C ∈ Rp×n. Let Ca the (row)

vectors of C corresponding to the active indices and assuming

(BCa

)is surjective,

the necessary and sufficient optimality condition is given byAx∗ +Btλ+ Ctµ = a

Bx∗ = b

(Cx∗ − c)j ≤ 0 µj ≥ 0 and (Cx∗ − c)jµj = 0.

Example (Minimum norm problem with inequality constraint)

min |x|2 subject to (x, yi) ≤ ci, 1 ≤ i ≤ m.

Assume yi are linearly independent. Then it has a unique solution x∗ and thenecessary and sufficient optimality is given by

x∗ +∑m

i=1 µiyi = 0

(x∗, yi)− ci ≤ 0, µi ≥ 0 and ((x∗, yi)− ci)µi = 0,; 1 ≤ i ≤ m.

That is, if G is Grammian (Gij = (yi, yj)) then

(−Gx∗ − c)i ≤ 0 µi ≥ 0 and (−Gx∗ − c)iµi = 0, 1 ≤ i ≤ m.

Example (Pointwise Obstacle Problem) Consider

min

∫ 1

0

1

2| ddxu(x)|2 −

∫ 1

0f(x)u(x) dx

subject to u(1

2) ≤ c

over u ∈ X = H10 (0, 1). Let f = 1. The Riesz representation of

F1(u) =

∫ 1

0f(t)u(t) dt, F2(u) = u(

1

2)

are

y1(x) =1

2x(1− x). y2(x) =

1

2(1

2− |x− 1

2|),

respectively. Assume that f = 1. Then, we have

u∗ − y1 + µ y2 = 0, u∗(1

2) =

1

8− µ

4≤ c, µ ≥ 0.

Thus, if c ≥ 18 , then u∗ = y1 and µ = 0. If c ≤ 1

8 , then u∗ = y1−µ y2 and µ = 4(c− 18).

60

Page 61: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

3.6 Constrained minimization in Banach spaces and La-grange multiplier theory

In this section we discuss the general nonlinear programming. Let X, Y be Hilbertspaces and Z be a Hilbert lattice. Consider the constrained minimization;

min J(x) subject to E(x) = 0 and G(x) ≤ 0. (3.34)

over a closed convex set C in X. Here, where J : X → R, E : X → Y and G : X → Zare continuously differentiable.

Definition (Derivatives) (1) The functional F on a Banach space X is Gateauxdifferentiable at u ∈ X if for all d ∈ X

limt→0

F (u+ td)− F (u)

t= F ′(u)(d)

exists. If the G-derivative d ∈ X → F ′(u)(d) ∈ R is liner and bounded, then F isFrechet differentiable at u.(2) E : X → Y , Y ia Banach space is Frechet differentiable at u if there exists a linearbounded operator E′(u) such that

lim|d|X→0

|E(u+ d)− E(u)− E′(u)d|Y|d|X

= 0.

If Frechet derivative u ∈ X → E′(u) ∈ L(X,Y ) is continuous, then E is continuouslydifferentiable.

Definition (Lower semi-continuous) (1) A functional F is lower-semi continuousif

lim infn→∞

F (xn) ≥ F ( limn→∞

xn)

(2) A functional F is weakly lower-semi continuous if

lim infn→∞

F (xn) ≥ F (w − limn→∞ xn)

Theorem (Lower-semicontinuous) (1) Norm is weakly lower-semi continuous.(2) A convex lower-semicontinuous functional is weakly lower-semi continuous.

Proof: Assume xn → x weakly in X. Let x∗ ∈ F (x), i.e., 〈x∗, x〉 = |x∗||x|. Then, wehave

|x|2 = limn→∞

〈x∗, xn〉

and|〈x∗, xn〉| ≤ |xn||x∗|.

Thus,lim infn→∞

|xn| ≥ |x|.

(2) Since F is convex,

F (∑k

tkxk) ≤∑k

tk F (xk)

61

Page 62: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

for all convex combination of xk, i.e.,∑∑

k tk = 1, tk ≥ 0. By the Mazur lemma thereexists a sequence of convex combination of weak convergent sequence (xk, F (xk))to (x, F (x)) in X ×R that converges strongly to (x, F (x)) and thus

F (x) ≤ lim inf n→∞F (xn).

Theorem (Existence) (1) A lower-semicontinuous functional F on a compact set Shas a minimizer.(2) A weak lower-semicontinuous, coercive functional F has a minimizer.

Proof: (1) One can select a minimizing sequence xn such that F (xn) ↓ η = infx∈S F (x).Since S is a compact, there exits a subsequaence that converges strongly to x∗ ∈ S.Since F is lower-semicontinuous, F (x∗) ≤ η.(2) Since F (x)→∞ as |x| → ∞, there exits a bounded minimizing sequence xn of F .A bounded sequence in a reflexible Banach space X has a weak convergent subsequenceto x∗ ∈ X. Since F is weakly lower-semicontinuous, F (x∗) ≤ η.

Next, we consider the constrained optimization in Banach spaces X, Y :

min F (y) subject to G(y) ∈ K (3.35)

over y ∈ C, where K is a closed convex cone of a Banach space Y , F : X → R andG : X → Y are C1 and C is a closed convex set of X. For example, if K = 0, thenG(y) = 0 is the equality constraint. If Y = Z is a Hilbert lattice and

K = z ∈ Z : z ≤ 0,

then G(y) ≤ 0 is the inequality constraint. Then, we have the Lagrange multipliertheory:

Theorem (Lagrange Multiplier Theory) Let y∗ ∈ C be a solution to (3.35) andassume the regular point condition

0 ∈ int G′(y∗)(C − y∗)−K +G(y∗) (3.36)

holds. Then, there exists a Lagrange multiplier

λ ∈ K+ = y ∈ Y : (y, z) ≤ 0 for all z ∈ K satisfying (λ,G(y∗))Y = 0

such that(F ′(y∗) +G′(y∗)∗λ, y − y∗)X ≥ 0 for all y ∈ C. (3.37)

Proof: Let F = y ∈ C : G(y) ∈ K be the feasible set. The sequential tangent coneof F at y∗ is defined by

T (F , y∗) = x ∈ X : x = limn→∞

yn − y∗

tn, tn → 0+, yn ∈ F.

It is easy show that

(F ′(y∗), y − y∗) ≥ 0 for all y ∈ T (F , y∗).

LetC(y∗) = ∪s≥0, y∈C s(y − y∗), K(G(y∗)) = K − tG(y∗), t ≥ 0.

62

Page 63: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

and define the linearizing cone L(F , y∗) of F at y∗ by

L(F , y∗) = z ∈ C(y∗) : G′(y∗)z ∈ K(G(y∗)).

It follows the Listernik’s theorem [] that

L(F , y∗) ⊆ T (F , y∗),

and thus(F ′(y∗), z) ≥ 0 for all z = y − y∗ ∈ L(F , y∗), (3.38)

Define the convex cone B by

B = (F ′(y∗)C(y∗) +R+, G′(y∗)C(y∗)−K(G(y∗)) ⊂ R× Y.

By the regular point condition B has an interior point. From (3.38) the origin (0, 0) isa boundary point of B and thus there exists a hyperplane in R × Y that supports Bat (0, 0), i.e., there exists a nontrivial (α, λ) ∈ R× Y such that

α (F ′(y∗)z + r) + (λ,G′(y∗)z − y) ≥ 0

for all z ∈ C(y∗), y ∈ K(G(y∗)) and r ≥ 0. Setting (z, r) = (0, 0), we have (λ, y) ≤ 0for all y ∈ K(G(y∗)) and thus

λ ∈ K+ and (λ,G(y∗)) = 0.

Letting (r, y) = (0, 0) and z ∈ L(F , y∗), we have α ≥ 0. If α = 0, the regular pointcondition implies λ = 0, which is a contradiction. Without loss of generality we setα = 1 and obtain (3.37) by setting (r, y) = 0.

Corollary (Nonlinear Programming ) If there exists x such that E(x) = 0 andG(x) ≤ 0, then there exits a solution x∗ to (3.34). Assume the regular point condition

0 ∈ int E′(x∗)(C − x∗)× (G′(x∗)(C − x∗)− Z− +G(x∗)),

then there exists λ ∈ Y and µ ∈ Z such that

(J ′(x∗) + (E′(x∗))∗λ+ (G′(x∗)∗µ, x− x∗) ≥ 0 for all x ∈ C,

E(x∗) = 0

G(x∗) ≤ 0, µ ≥ 0 and (G(x∗), µ)Z = 0.

(3.39)

3.7 Control problem

Next, we consider the parameterized nonlinear programming. Let X, Y, U be Hilbertspaces and Z be a Hilbert lattice. We consider the constrained minimization of theform

min J(x, u) = F (x) +H(u) subject to E(x, u) = 0 and G(x, u) ≤ 0 over u ∈ C,(3.40)

where J : X × U → R, E : X × U → Y and G : X × U → Z are continuouslydifferentiable and C is a closed convex set in U . Here we assume that for given u ∈ C

63

Page 64: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

there exists a unique x = x(u) ∈ X that satisfies E(x, u) = 0. Thus, one can eliminatex ∈ X as a function of u ∈ U and thus (3.40) can be reduced to the constrainedminimization over u ∈ C. But it is more advantageous to analyze (3.40) directly usingLangrange Theorem as

Theorem (Lagrange Caluclus) Assume F and H is C1. Suppose that for givenv ∈ C there exists a unique x(t) = x(u + t (v − u)) ∈ X for u(t) = u + t(v − u) ina neighborhood B(x, δ) in X and t ∈ [0, δ] that satisfies E(x(t), u + t (v − u)) = 0.Assume there exist a p satisfies the adjoint equation

Ex(x, u)∗p+ F ′(x) = 0 (3.41)

and for all 0 < t < 1 and v ∈ C

(E(x(t), u(t))− E(x, u)− Ex(x(t)− x(0))− Eu(t(v − u)), p)

+F (x(t))− F (x)− F ′(x)(x(t)− x) = o(t)(3.42)

Then, we have

limt→0

J(x(t), ut(v − u)− J(x, u)

t= (Eu(x, u)∗p+H ′(u), v − u).

Proof For v ∈ C let u(t) = u+ t(v − u) and x(t) = x(u+ t(v − u))

J(x(t), u(t))− J(x, u) = F ′(x)(x(t)− x(0)) +H ′(u)(th) + o(t)ε1 = o(t),

ε1 = F (x(t))− F (x)− F ′(x(t))(x(t)− F (x)(3.43)

From (3.41)F ′(x)(x(t)− x) = −(Ex(x, u)(x(t)− x(0)), p) (3.44)

Since E(x(t), u(t)) = E(x(0), u(0)) = 0 define

Ex(x, u)(x(t)− x(0)) + Eu(x, u)(u(t)− u) =

Ex(x, u)(x(t)− x(0)) + Eu(x, u)(u(t)− u)− E(x(t), u(t)− E(x(0, u(0)) = ε2(3.45)

and thus it follows from it follows from (3.43)–(3.45) that

J(x(t), u(t)− J(x, u) = (Eu(x, u)(u(t)− u), p) +H(u)(t (v − u)) + o(t) + ε1 + ε2

and from the assumption (3.42) the claim holds.

Corollary (Lagrange Multiplier theory (revised)) Let (x∗, u∗) be a minimizerof (3.40), The conditions in Theorem hols at (x∗, u∗). Then the necessary optimalitycondition is given by

(Eu(x∗, u∗)∗p+H ′(u), v − u∗) for all v ∈ C

where p satisfies the adjoint equation

Ex(x∗, u∗)p+ F ′(x∗) = 0.

64

Page 65: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

Remark (1) If E(x, u) is C1 and Ex(x, u) is bijective it follows from the implicitfunction theorem (see, Section??) there exists a neighborhood B((x, u), δ) such thatE(x, u) = 0 has a unique solution x given u. Moreover, t → x(t) ∈ X is continuouslydifferentiable and

Ex(x, u)d

dtx(0) + Eu(x, u)(v − u) = 0.

Thus, assumption (3.42) holds.

(2) Suppose E : Y → Y ∗ and F are C2, p ∈ Y and |x(t)−x(0)|2t → 0 as t → 0, then

assumption (3.42) holds.(3) The penalty formulation is given by

min Jβ(x, u) = F (x) +H(u) +β

2|E(x, u)|2 subject to u ∈ C.

Assume (xβ, uβ) is a solution. Then the necessary optimality condition is given by

Ex(xβ, uβ)∗λβ + F ′(xβ) = 0

(Eu(xβ, uβ)∗λβ +H ′(uβ), v − uβ) = 0 for all v ∈ C

E(xβ, uβ)− λββ = 0.

(3.46)

Consider the iterative method for (xn, un) for as follows. If we apply E(x, u) ∼E(xn, un) + Ex(xn, un)(x+ − xn) + Eu(u+ − u). the linearization of E at (xn, un)for solutions to (3.46), then we have the update formula for (x+, u+), given (xn, un)with 0 < β ≤ ∞ by solving the system:

E(xn, un) + Ex(xn, un)(x+ − xn) + Eu(u+ − u)− λ+

β = 0

Ex(xn, un)∗λ+ + F ′(x+) = 0

(Eu(xn, un)∗λ+ +H ′(u+), v − u+) = 0 for all v ∈ C,

which is a sequential programming as in Section 10. In general the optimality conditionis written as u=Ψ(λ+) and thus the system is for (x+, λ+).Corollary (General case) Let (x∗, u∗) be a minimizer of (3.40) and assume the reg-ular point holds. It follows from Theorem 8 that the necessary optimality condition isgiven by

Jx(x∗, u∗) + Ex(x∗, u∗)∗λ+Gx(x∗, u∗)∗µ = 0

(Ju(x∗, u∗) + Eu(x∗, u∗)∗λ+Gu(x∗, u∗)∗µ, u− u∗) ≥ 0 for all u ∈ C

E(x∗, u∗) = 0

G(x∗, µ∗) ≤ 0, µ ≥ 0, and (G(x∗, u∗), µ) = 0.

(3.47)

65

Page 66: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

Example (Control problem) Consider the optimal control problem

min

∫ T

0(`(x(t)) + h(u(t))) dt+G(x(T ))

subject tod

dtx(t)− f(x, u) = 0, x(0) = x0

u(t) ∈ U = a closed convex set in Rm.

(3.48)

We let X = H1(0, T ;Rn), U = L2(0, T : Rm), Y = L2(0, T ;Rn) and

J(x, u) =

∫ T

0(`(x) + h(u)) dt+G(x(T )), E(x, u) = f(x, u)− d

dtx

and C = u ∈ U, u(t) ∈ U a.e in (0, T ). From (3.47) we have∫ T

0(`′(x∗(t))φ(t) + (p(t), fx(x∗, u∗)φ(t)− d

dtφ) dt+G′(x(T ))φ(T ) = 0

for all φ ∈ H1(0, T ;Rn) with φ(0) = 0. By the integration by part∫ T

0(

∫ T

t(`′(x∗) + fx(x∗, u∗)tp) ds+G′(x∗(T ))− p(t), dφ

dt) dt = 0.

Since φ ∈ H1(0, T ;Rn) satisfying φ(0) = 0 is arbitrary, we have

p(t) = G′(x∗(T )) +

∫ T

t(`′(x∗) + fx(x∗, u∗)tp) ds

and thus the adjoint state p is absolutely continuous and satisfies

− d

dtp(t) = fx(x∗, u∗)tp(t) + `′(x∗), p(T ) = G′(x∗(T )).

From the the optimality, we have∫ T

0(h′(u∗) + fu(x∗, u∗)tp(t))(u(t)− u∗(t)) dt

for all u ∈ C, which implies

(h′(u∗(t)) + fu(x∗(t), u∗(t))tp(t))(v − u∗(t)) ≥ 0 for all v ∈ U

at Lebesgue points t of u∗(t). Hence we obtain the necessary optimality is of the formof Two-Point-Boundary value problem;

ddtx∗(t) = f(x∗, u∗), x(0) = x0

(h′(u∗(t)) + fu(x∗(t), u∗(t))tp(t), v − u∗(t)) ≥ 0 for all v ∈ U

− ddtp(t) = fx(x∗, u∗)tp(t) + lx(x∗), p(T ) = Gx(x∗(T )).

66

Page 67: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

Let U = [−1, 1]m coordinate-wise and h(u) = α2 |u|

2, f(x, u) = f(x) + Bu. Thus,the optimality condition implies

(αu∗(t) +Btp(t), v − u∗(t))Rm ≥ 0 fur all |v|∞ ≤ 1.

Or, equivalently u∗(t) minimizes

α

2|u|2 + (Btp(t), u) over |u|∞ ≤ 1

Thus, we have

u∗(t) =

1 if − Btp(t)α > 1

−Btp(t)α if |B

tp(t)α | ≤ 1

−1 if − Btp(t)α < −1.

(3.49)

Example (Coefficient estimation) Let ω be a bounded open set in Rd and Ω be a subsetin Ω. Consider the parameter estimation problem

min J(u, c) =

∫Ω|y − u|2 dx+

∫Ω|∇c|2 dx

subject to−∆u+ c(x)u = f, c ≥ 0

over (u, c) ∈ H10 (Ω)×H1(Ω). We let Y = (H1

0 (Ω))∗ and

〈E(u, c), φ〉 = (∇u,∇φ) + (c u, φ)L2(Ω) − 〈f, φ〉

for all φ ∈ H10 (Ω). Since

〈Jx + E∗xλ, h〉 = (∇h,∇λ) + (c h, λ)L2(ω) − ((y − u)χΩ, h) = 0,

for all h ∈ H10 (Ω), i.e., the adjoint λ ∈ H1

0 (Ω) satisfies

−∆λ+ c λ = (y − u)χΩ.

Also, there exits µ ≥ 0 in L2(Ω) such that

〈Jc + E∗cλ, d〉 = α (∇c∗,∇d) + (d u, λ)L2(Ω) − (µ, d) = 0

for all d ∈ H1(Ω) and

〈µ, c− c∗〉 = 0 and 〈µ, d〉 ≥ 0 for all d ∈ H1(Ω) satisfying d ≥ 0.

3.8 Minimum norm solution in Banach space

In this section we discuss the minimum distance and the minimum norm problem inBanach spaces. Given a subspace M of a normed space X, the orthogonal complementM⊥ of M is defined by

M⊥ = x∗ ∈ X∗ : 〈x∗, x〉 = 0 for all x ∈M.

67

Page 68: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

Theorem If M is a closed subspace of a normed space, then

(M⊥)⊥ = x ∈ X : 〈x, x∗〉 = 0 for all x∗ ∈M⊥ = M

Proof: If there exits a x ∈ (M⊥)⊥ but x /∈M , define a linear functional f by

f(αx+m) = α

on the space spanned by x+M . Since x /∈ m and M is closed

|f | = supm∈M, α∈R

|f(αx+m)||αx+m|X

= supm∈M

|f(x+m)||x+m|X

=1

infm∈M |x+m|X<∞,

and thus f is bounded. By the Hahn-Banach theorem, f can be extended to x∗ ∈ X∗.Since 〈x∗,m〉 = 0 for all m ∈ M , thus x∗ ∈ M⊥. It is contradicts to the fact that〈x∗, x〉 = 1.

Theorem (Duality I) Given x ∈ X and a subspace M , we have

d = infm∈M

|x−m|X = max|x∗|≤1, x∗∈M⊥

〈x∗, x〉 = 〈x∗0, x〉,

where the maximum is attained by some x∗0 ∈ M⊥ with |x∗0| = 1. If the infimum onthe first equality holds for some m0 ∈M , then x∗0 is aligned with x−m0.

Proof: By the definition of inf, for any ε > 0, there exists mε ∈M such that |x−mε| ≤d+ ε. For any x∗ ∈M⊥, |x∗| ≤ 1, it follows that

〈x∗, x〉 = 〈x∗, x−mε〉 ≤ |x∗||x−mε| ≤ d+ ε.

Since ε > 0 is arbitrary, 〈x∗, x〉 ≤ d. It only remains to show that 〈x∗0, x〉 = d for somex∗0 ∈ X∗. Define a linear functional f on the space spanned by x+M by

f(αx+m) = αd

Since

|f | = supm∈M, α∈R

|f(αx+m)||αx+m|X

=d

infm∈M |x+m|X= 1

f is bounded. By the Hahn-Banach theorem, there exists an extension x∗0 of f with|x∗0| = 1. Since 〈x∗0,m〉 = 0 for all m ∈ M , x∗0 ∈ M⊥. By construction 〈x∗0, x〉 = d.???Moreover, if |x−m0| = d for some m0 ∈M , then

〈x∗0, x−m0〉 = d = |x∗0||x−m0|.

3.8.1 Duality Principle

Corollary (Duality) If m0 ∈ M attains the minimum if and only if there exists a

nonzero x∗ ∈M⊥ that is aligned with x−m0.

Proof: Suppose that x∗ ∈M⊥ is aligned with x−m0,

|x−m0| = 〈x∗, x−m0〉 = 〈x∗, x〉 = 〈x∗, x〉 ≤ |x−m|

for all m ∈M .

68

Page 69: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

Theorem (Duality principle) (1) Given x∗ ∈ X∗, we have

minm∗∈M⊥

|x∗ −m∗|X∗ = supx∈M, |x|≤1

〈x∗, x〉.

(2) The minimum of the left is achieved by some m∗0 ∈M⊥.(3) If the supremum on the right is achieved by some x0 ∈M , then x∗ −m∗0 is alignedwith x0.

Proof: (1) Note that for m∗ ∈∈M⊥

|x∗ −m∗| = sup|x|≤1〈x∗ −m∗, x〉 ≥ sup

|x|≤1, x∈M〈x∗ −m∗, x〉 = sup

|x|≤1, x∈M〈x∗, x〉.

Consider the restriction of x∗ to M (with norm supx∈M, |x|≤1〈x∗, x〉. Let y∗ be the

Hahn-Banach extension of this restricted x∗. Let m∗0 = x∗ − y∗. Then m∗0 ∈ M⊥ and|x∗ − m∗0| = |y∗| = supx∈M, |x|≤1〈x∗, x〉. Thus, the minimum of the left is achieved

by some m∗0 ∈ M⊥. If the supremum on the right is achieved by some x0 ∈ M , thenx∗−m∗0 is aligned with x0. Obviously, |x0| = 1 and |x∗−m∗0| = 〈x∗, x0〉 = 〈x∗−m∗0, x0〉.

In all applications of the duality theory, we use the alignment properties of the spaceand its dual to characterize optimum solutions, guarantee the existence of a solutionby formulating minimum norm problems in the dual space, and examine to see if thedual problem is easier than the primal problem.Example 1(Chebyshev Approximation) Problem is to to find a polynomial p of degreeless than or equal to n that ”best” approximates f ∈ C[0, 1] in the the sense of thesup-norm over [a, b]. Let M be the space of all polynomials of degree less than or equalto n. Then, M is a closed subspace of C[0, 1] of dimension n + 1. The infimum of|f − p|∞ is achievable by some p0 ∈ M since M is closed. The Chebyshev theoremstates that the set Γ = t ∈ [0, 1] : |f(t) − p0(t)| = |f − p0|∞ contains at least n + 2points. In fact, f−p0 must be aligned with some element in M⊥ ⊂ C[0, 1]∗ = BV [0, 1].Assume Γ contains m < n+ 2 points 0 ≤ t1 < · · · < tm ≤ 1. If v ∈ BV [0, 1] is alignedwith f − p0, then v is a piecewise continuous function with jump discontinuities onlyat these tk’s. Let tk be a point of jump discontinuity of v. Define

q(t) = Πmj 6=k(t− tj)

Then, q ∈M but 〈q, v〉 6= 0 and hence v /∈M⊥, which is a contradiction.

Next, we consider the minimum norm problem;

min |x∗|X∗ subject to 〈x∗, yk〉 = ck, 1 ≤ k ≤ n.

where y1, · · · , yn ∈ X are given linearly independent vectors.

Theorem Let x∗ ∈ X∗ satisfying the constraints 〈x∗, yk〉 = ck, 1 ≤ k ≤ n. LetM = spanyk. Then,

d = min〈x∗,yk〉=ck

|x∗|X∗ = minm∗∈M⊥

|x∗ −m∗|.

By the duality principle,

d = minm∗ıM⊥

|x∗ −m∗| = supx∈M, |x|≤1

〈x∗, x〉.

69

Page 70: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

Write x = Y a where Y = [y1, · · · , yn] and a ∈ Rn. Then

d = min |x∗| = max|Y a|≤1

〈x∗, Y a〉 = max|Y a|≤1

(c, a).

Thus, the optimal solution x∗0 must be aligned with Y a.

Example (DC motor control) Let X = L1(0, 1) and X∗ = L∞(0, 1).

min |u|∞

subject to

∫ 1

0et−1u(t) dt = 0,

∫ 1

0(1 + et−1)u(t) = 1.

Let y1 = et−1 and y2 = 1 + et−1. From the duality principle

min〈yk,u〉=ck

|u|∞ = max|a1y1+a2y2|1≤1

a2.

The control problem now is reduced to finding constants a1 and a2 that maximizes a2

subject to∫ 1

0 |(a1 − a2)et−1 + a2| dt ≤ 1. The optimal solution u ∈ L∞(0, 1) should bealigned with (a1 − a2)et−1 + a2 ∈ L1(0, 1). Since (a1 − a2)et−1 + a2, can change signat most once. The alignment condition implies that u must have values |u|∞ and canchange sign at most once, which is the so called bang-bang control.

4 Linear Cauchy problem and C0-semigroup the-

ory

In this section we discuss the Cauchy problem of the form

d

dtu(t) = Au(t) + f(t), u(0) = u0 ∈ X

in a Banach space X, where u0 ∈ X is the initial condition and f ∈ L1(0, T ;X). Suchproblems arise in PDE dynamics and functional equations.

We construct the mild solution u(t) ∈ C(0, T ;X):

u(t) = S(t)u0 +

∫ t

0S(t− s)f(s) ds (4.1)

where a family of bounded linear operator S(t), t ≥ 0 is C0-semigroup on X.

Definition (C0 semigroup) (1) Let X be a Banach space. A family of boundedlinear operators S(t), t ≥ 0 on X is called a strongly continuous (C0) semigroup if

S(t+ s) = S(t)S(s) for t, s ≥ 0 with S(0) = I

|S(t)φ− φ| → 0 as t→ 0+ for all φ ∈ X.

(2) A linear operator A in X defined by

Aφ = limt→0+

S(t)φ− φt

(4.2)

70

Page 71: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

with

dom(A) = φ ∈ X : the strong limit of limt→0+

S(t)φ− φt

in X exists.

is called the infinitesimal generator of the C0 semigroup S(t).

In this section we present the basic theory of the linear C0-semigroup on a Banachspace X. The theory allows to analyze a wide class of the physical and engineeringdynamics using the unified framework. We also present the concrete examples todemonstrate the theory. There is a necessary and sufficient condition (Hile-YosidaTheorem) for a closed, densely defined linear A in X to be the infinitesimal generatorof the C0 semigroup S(t). Moreover, we will show that the mild solution u(t) satisfies

〈u(t), ψ〉 = 〈u0, ψ〉+

∫(〈x(s), A∗ψ〉+ 〈f(s), ψ〉 ds (4.3)

for all ψ ∈ dom (A∗).Examples (1) For A ∈ L(X), define a sequence of linear operators in X

SN (t) =∑k

1

k!(At)k.

Then

|SN (t)| ≤∑ 1

k!(|A|t)k ≤ e|A| t

andd

dtSN (t) = ASN−1(t)

Since S(t) = eAt = limN→∞ SN (t) in the operator norm, we have

d

dtS(t) = AS(t) = S(t)A.

(2) Consider the the hyperbolic equation

ut + ux = 0, u(0, x) = u0(x) in (0, 1). (4.4)

Define the semigroup S(t) of translations on X = L2(0, 1) by

[S(t)u0](x) = u0(x− t), where u0(x) = 0, x ≤ 0, u0 = u0 on [0, 1].

Then, S(t), t ≥ 0 is a C0 semigroup on X. If we define u(t, x) = [S(t)u0](x) withu0 ∈ H1(0, 1) with u0(0) = 0 satisfies (4.6) a.e.. The generator A is given by

Aφ = −φ′ with dom(A) = φ ∈ H1(0, 1) with φ(0) = 0.

In factS(t)u0 − u0

t=u0(x− t)− u0(x)

t= u′0(x), a.e. x ∈ (0, 1).

if u0 ∈ dom(A). Thus, u(t) = S(t)u0 satisfies the Cauchy problem ddtu(t) = Au(t) if

u0 ∈ dom(A).

71

Page 72: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

(3) Let Xt is a Markov process, i.e.

Ex[g(Xt+h|calF t] = E0,Xt [g(Xh)].

for all g ∈ X = L2(Rn). Define the linear operator S(t) by

(S(t)u0)(x) = E0,x[u0(Xt)], t ≥ 0, u0 ∈ X.

The semigroup property of S(t) follows from the Markov property, i.e.

S(t+s)u0 = E0,x[u0(Xt+s)] = E[E[g(X0,xt+s|Ft] = E[EX

0,xt u0(Xs)]] = Ex[(S(t)u0)(Xs)] = S(s)(S(t)u0).

The strong continuity follows from that X0,xt − x is a.s for all x ∈ Rn. If Xt = Bt is a

Brownian motion, then the semigroup S(t) is defined by

[S(t)u0](x) =1

(√

2πtσ)n

∫Rne−|x−y|2

2σ2 t u0(y) dy, (4.5)

and u(t) = S(t)u0 satisfies the heat equation.

ut =σ2

2∆u, u(0, x) = u0(x) in L2(Rn). (4.6)

4.1 Finite difference in time

Let A be closed, densely defined linear operator dom(A) → X. We use the finitedifference method in time to construct the mild solution (4.1). For a stepsize λ > 0consider a sequence un in X generated by

un − un−1

λ= Aun + fn−1, (4.7)

with

fn−1 =1

λ

∫ nλ

(n−1)λf(t) dt.

Assume that for λ > 0 the resolvent operator

Jλ = (I − λA)−1

is bounded. Then, we have the product formula:

un = Jnλu0 +

n−1∑k=0

Jn−kλ fk λ. (4.8)

In order to un ∈ X is uniformly bounded in n for all u0 ∈ X (with f = 0), it isnecessary that

|Jnλ | ≤M

(1− λω)nfor λω < 1, (4.9)

for some M ≥ 1 and ω ∈ R.

Hille’s Theorem Define a piecewise constant function in X by

uλ(t) = uk−1 on [tk−1, tk)

72

Page 73: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

Then,maxt∈[0,T ]

|uλ − u(t)|X → 0

as λ→ 0+ to the mild solution (4.1). That is,

S(t)x = limn→∞

(I − t

nA)[ t

n]x

exists for all x ∈ X and S(t), t ≥ 0 is the C0 semigoup on X and its generator is A,where [s] is the largest integer less than s ∈ R.

Proof: First, note that

|Jλ| ≤M

1− λωand for x ∈ dom(A)

Jλx− x = λJλAx,

and thus

|Jλx− x| = |λJλAx| ≤λ

1− λω|Ax| → 0

as λ→ 0+. Since dom(A) is dense in X it follows that

|Jλx− x| → 0 as λ→ 0+ for all x ∈ X.

Define the linear operators Tλ(t) and Sλ(t) by

Sλ(t) = Jkλ and Tλ(t) = Jk−1λ +

t− tkλ

(Jkλ − Jk−1λ ), on (tk−1, tk].

Then,d

dtTλ(t) = ASλ(t), a.e. in t ∈ [0, T ].

Thus,

Tλ(t)u0−Tµ(t)u0 =

∫ t

0

d

ds(Tλ(s)Tµ(t−s)u0) ds =

∫ t

0(Sλ(s)Tµ(t−s)−Tλ(s)Sµ(t−s))Au0 ds

Since

Tλ(s)u− Sλ(s)u =s− tkλ

Tλ(tk−1)(Jλ − I)u on s ∈ (tk−1, tk].

By the bounded convergence theorem

|Tλ(t)u0 − Tµ(t)u|X → 0

as λ, µ → 0+ for all u ∈ dom(A2). Thus, the unique limit defines the linear operatorS(t) by

S(t)u0 = limλ→0+

Sλ(t)u0. (4.10)

for all u0 ∈ dom(A2). Since

|Sλ(t)| ≤ M

(1− λω)[t/n]≤Meωt

73

Page 74: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

and dom(A2) is dense, (4.10) holds for all u0 ∈ X. Moreover, we have

S(t+ s)u = limλ→0+

Jn+mλ = JnλJ

mλ u = S(t)S(s)u

and limt→0+ S(t)u = limt→0+ Jtu = u for all u ∈ X. Thus, S(t) is the C0 semigroupon X. Moreover, S(t), t ≥ 0 is in the class G(M,ω), i.e.,

|S(t)| ≤Meωt.

Note that

Tλ(t)u0 − u0 = A

∫ t

0Sλu0 ds.

Since limλ→0+ Tλ(t)u0 = limλ→0+ Sλ(t)u0 = S(t)u0 and A is closed, we have

S(t)u0 − u0 = A

∫ t

0S(s)u0 ds,

∫ t

0S(s)u0 ds ∈ dom(A).

If B is a generator of S(t), t ≥ 0, then

Bx = limt→0+

S(t)x− xt

= Ax

if x ∈ dom(A). Conversely, if u0 ∈ dom(B), then u0 ∈ dom(A) since A is closed andt→ S(t)u is continuous at 0 for all u ∈ X and thus

1

tA

∫ t

0S(s)u0 ds = Au0 as t→ 0+.

Hence

Au0 =S(t)u0 − u0

t= Bu0

That is, A is the generator of S(t), t ≥ 0.Similarly, we have

n−1∑k=0

Jn−kλ fk =

∫ t

0Sλ(t− s)f(s) ds→

∫ t

0S(t− s)f(s) ds as λ→ 0+

by the Lebesgue dominated convergence theorem.

The following theorem state the basic properties of C0 semigroups:

Theorem (Semigroup) (1) There exists M ≥ 1, ω ∈ R such that S ∈ G(M,ω) class,i.e.,

|S(t)| ≤M eωt, t ≥ 0. (4.11)

(2) If x(t) = S(t)x0, x0 ∈ X, then x ∈ C(0, T ;X)(3) If x0 ∈ dom (A), then x ∈ C1(0, T ;X) ∩ C(0, T ; dom(A)) and

d

dtx(t) = Ax(t) = AS(t)x0.

(4) The infinitesimal generator A is closed and densely defined. For x ∈ X

S(t)x− x = A

∫ t

0S(s)x ds. (4.12)

74

Page 75: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

(5) λ > ω the resolvent is given by

(λ I −A)−1 =

∫ ∞0

e−λsS(s) ds (4.13)

with estimate

|(λ I −A)−n| ≤ M

(λ− ω)n. (4.14)

Proof: (1) By the uniform boundedness principle there exists M ≥ 1 such that |S(t)| ≤M on [0, t0] For arbitrary t = k t0+τ , k ∈ N and τ ∈ [0, t0) it follows from the semigroupproperty we get

|S(t)| ≤ |S(τ)||S(t0|k ≤Mek log |S(t0)| ≤Meω t

with ω = 1t0

log |S(t0)|.(2) It follows from the semigroup property that for h > 0

x(t+ h)− x(t) = (S(h)− I)S(t)x0

and for t− h ≥ 0x(t− h)− x(t) = S(t− h)(I − S(h))x0

Thus, x ∈ C(0, T ;X) follows from the strong continuity of S(t) at t = 0.(3)–(4) Moreover,

x(t+ h)− x(t)

h=S(h)− I

hS(t)x0 = S(t)

S(h)x0 − x0

h

and thus S(t)x0 ∈ dom(A) and

limh→0+

x(t+ h)− x(t)

h= AS(t)x0 = Ax(t).

Similarly,

limh→0+

x(t− h)− x(t)

−h= lim

h→0+S(t− h)

S(h)φ− φh

= S(t)Ax0.

Hence, for x0 ∈ dom(A)

S(t)x0 − x0 =

∫ t

0S(s)Ax0 ds =

∫ t

0AS(s)x0 ds = A

∫ t

0S(s)x0 ds (4.15)

If xn ∈ don(A)→ x and Axn → y in X, we have

S(t)x− x =

∫ t

0S(s)y ds

Since

limt→0+

1

t

∫ t

0S(s)y ds = y

x ∈ dom(A) and y = Ax and hence A is closed. Since A is closed it follows from (4.15)that for x ∈ X ∫ t

0S(s)x ds ∈ dom(A)

75

Page 76: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

and (4.12) holds. For x ∈ X let

xh =1

h

∫ h

0S(s)x ds ∈ dom(A)

Since xh → x as h→ 0+, dom(A) is dense in X.(5) For λ > ω define Rt ∈ L(X) by

Rt =

∫ t

0e−λsS(s) ds.

Since A− λ I is the infinitesimal generator of the semigroup eλtS(t), from (4.12)

(λ I −A)Rtx = x− e−λtS(t)x.

Since A is closed and |e−λtS(t)| → 0 as t→∞, we have R = limt→∞Rt satisfies

(λ I −A)Rφ = φ.

Conversely, for φ ∈ dom(A)

R(A− λ I)φ =

∫ ∞0

e−λsS(s)(A− λ I)φ = limt→∞

e−λtS(tφ− φ = −φ

Hence

R =

∫ ∞0

e−λsS(s) ds = (λ I −A)−1

Since for φ ∈ X

|Rx| ≤∫ ∞

0|e−λsS(s)x| ≤M

∫ ∞0

e(ω−λ)s|x| ds =M

λ− ω|x|,

we have

|(λ I −A)−1| ≤ M

λ− ω, λ > ω.

Note that

(λ I −A)−2 =

∫ ∞0

e−λtS(t) ds

∫ ∞0

eλ sS(s) ds =

∫ ∞0

∫ ∞0

e−λ(t+s)S(t+ s) ds dt

=

∫ ∞0

∫ ∞t

e−λσS(σ) dσ dt =

∫ ∞0

σe−λσS(σ) dσ.

By induction, we obtain

(λ I −A)−n =1

(n− 1)!

∫ ∞0

tn−1e−λtS(t) dt. (4.16)

Thus,

|(λ I −A)−n| ≤ 1

(n− 1)!

∫ ∞0

tn−1e−(λ−ω)t dt =M

(λ− ω)n.

We then we have the necessary and sufficient condition:

76

Page 77: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

Hile-Yosida Theorem A closed, densely defined linear operator A on a Banach spaceX is the infinitesimal generator of a C0 semigroup of class G(M,ω) if and only if

|(λ I −A)−n| ≤ M

(λ− ω)nfor all λ > ω (4.17)

Proof: The sufficient part follows from the previous Theorem. In addition, we describethe Yosida construction. Define the Yosida approximation Aλ ∈ L(X) of A by

Aλ =Jλ − Iλ

= AJλ. (4.18)

Define the Yosida approximation:

Sλ(t) = eAλt = e−tλ eJλ

tλ .

Since

|Jkλ | ≤M

(1− λω)k

we have

|Sλ(t)| ≤ e−tλ

∞∑k=0

M

k!|Jkλ |(

t

λ)k ≤Me

ω1−λω t.

Sinced

dsSλ(s)Sλ(t− s) = Sλ(s)(Aλ −Aλ)Sλ(t− s),

we have

Sλ(t)x− Sλ(t)x =

∫ t

0Sλ(s)Sλ(t− s)(Aλ −Aλ)x ds

Thus, for x ∈ dom(A)

|Sλ(t)x− Sλ(t)x| ≤M2teωt |(Aλ −Aλ)x| → 0

as λ, λ→ 0+. Since dom(A) is dense in X this implies that

S(t)x = limλ→0+

Sλ(t)x exist for all x ∈ X,

defines a C0 semigroup of G(M,ω) class. The necessary part follows from (4.16)

Theorem (Mild solution) (1) If for f ∈ L1(0, T ;X) define

x(t) = x(0) +

∫ t

0S(t− s)f(s) ds,

then x(t) ∈ C(0, T ;X) and it satisfies

x(t) = A

∫ t

0x(s) ds+

∫ t

0f(s) ds. (4.19)

(2) If Af ∈ L1(0, T ;X) then x ∈ C(0, T ; dom(A)) and

x(t) = x(0) +

∫ t

0(Ax(s) + f(s)) ds.

77

Page 78: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

(3) If f ∈ W 1,1(0, T ;X), i.e. f(t) = f(0) +∫ t

0 f′(s) ds, d

dtf = f ′ ∈ L1(0, T ;X), thenAx ∈ C(0, T ;X) and

A

∫ t

0S(t− s)f(s) ds = S(t)f(0)− f(t) +

∫ t

0S(t− s)f ′(s) ds. (4.20)

Proof: Since ∫ t

0

∫ τ

0S(t− s)f(s) dsdτ =

∫ t

0(

∫ t

sS(τ − s)d τ)f(s) ds,

and

A

∫ t

0S(s) ds = S(t)− I

we have x(t) ∈ dom(A) and

A

∫ t

0x(s) ds = S(t)x− x+

∫ t

0S(t− s)f(s) ds−

∫ t

0f(s) ds.

and we have (4.19).(2) Since for h > 0

x(t+ h)− x(t)

h=

∫ t

0S(t− s)S(h)− I

hf(s) ds+

1

h

∫ t+h

tS(t+ h− s)f(s) ds

if Af ∈ L1(0, T ;X)

limh→0+

x(t+ h)− x(t)

h=

∫ t

0S(t− s)Af(s) ds+ f(t)

a.e. t ∈ (0, T ). Similarly,

x(t− h)− x(t)

−h=

∫ t−h

0S(t− h− s)S(h)− I

hf(s) ds+

1

h

∫ t

t−hS(t− s)f(s) ds

→∫ t

0S(t− s)Af(s) ds+ f(t)

a.e. t ∈ (0, T ).(3) Since

S(h)− Ih

x(t) =1

h(

∫ h

0S(th − s)f(s) ds−

∫ t+h

tS(t+ f − s)f(s) ds

+

∫ t

0S(t− s)f(s+ h)− f(s)

hds,

letting h→ 0+, we obtain(4.20).

It follows from Theorems the mild solution

x(t) = S(t)x(0) +

∫ t

0S(t− s)f(s) ds

78

Page 79: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

satisfies

x(t) = x(0) +A

∫ t

0x(s) +

∫ t

0f(s) ds.

Note that the mild solution x ∈ C(0, T ;X) depends continuously on x(0) ∈ X andf ∈ L1(0, T ;X) with estimate

|x(t)| ≤M(eωt|x(0)|+∫ t

0eω(t−s)|f(s)| ds).

Thus, the mild solution is the limit of a sequence xn of strong solutions with xn(0) ∈dom(A) and fn ∈W 1,1(0, T ;X), i.e., since dom(A) is dense in X and W 1,1(0, T ;X) isdense in L1(0, T ;X),

|xn(t)− x(t)|X → 0 uniformly on [0, T ]

for|xn(0)− x(0)|x → 0 and |fn − f |L1(0,T ;X) → 0 as n→∞.

Moreover, the mild solution x ∈ C(0, T : X) is a weak solution to the Cauchy problem

d

dtx(t) = Ax(t) + f(t) (4.21)

in the sense of (4.3), i.e., for all ψ ∈ dom(A∗) 〈x(t), ψ〉X×X∗ is absolutely continuesand

d

dt〈x(t), ψ〉 = 〈x(t), ψ〉+ 〈f(t), ψ〉 a.e. in (0, T ).

If x(0) ∈ dom(A) and Af ∈ L1(0, T ;X), then Ax ∈ C(0, T ;X), x ∈W 1,1(0, T ;X) and

d

dtx(t) = Ax(t) + f(t), a.e. in (0, T )

If x(0) ∈ dom(A) and f ∈W 1,1(0, T ;X), then x ∈ C(0, T ; dom(A)) ∩ C1(0, T ;X) and

d

dtx(t) = Ax(t) + f(t), everywhere in [0, T ].

4.2 Weak-solution and Ball’s result

Let A be a densely defined, closed linear operator on a Banach space X. Consider theCauchy equation in X:

d

dtu = Au+ f(t), (4.22)

where u(0) = x ∈ X and f ∈ L1(0, τ ;X) is a weak solution to of (4.22) if for everyψ ∈dom(A∗) the function t→ 〈u(t), ψ〉 is absolutely continuous on [0, τ ] and

d

dt〈u(t), ψ〉 = 〈u(t), A∗ψ〉+ 〈f(t), ψ〉, a.e. in [0, τ ]. (4.23)

It has been shown that the mild solution to (4.22) is a weak solution.Lemma B.1 Let A be a densely defined, closed linear operator on a Banach spaceX. If x, y ∈ X satisfy 〈y, ψ〉 = 〈x,A∗ψ〉 for all ψ ∈dom(A∗), then x ∈ dom(A) andy = Ax.

79

Page 80: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

Proof: Let G(A) ⊂ X × X denotes the graph of A. Since A is closed G(A) isclosed. Suppose y 6= Ax. By Hahn-Banach theorem there exist z, z∗ ∈ X∗ such that〈Ax, z〉+ 〈x, z∗〉 = 0 and 〈y, z〉+ 〈x, z∗〉 6= 0. Thus z ∈dom(A∗) and z∗ = A∗z. By thecondition 〈y, z〉+ 〈x, z∗〉 = 0, which is a contradiction.

Then we have the following theorem.

Theorem (Ball) There exists for each x ∈ X and f ∈ L1(0, τ ;X) a unique weaksolution of (4.22)satisfying u(0) = x if and only if A is the generator of a stronglycontinuous semigroup T (t) of bounded linear operator on X, and in this case u(t) isgiven by

u(t) = T (t)x+

∫ t

0T (t− s)f(s) ds. (4.24)

Proof: Let A generate the strongly continuous semigroup T (t) on X. Then, forsome M , |T (t)| ≤ M on t ∈ [0, τ ]. Suppose x ∈dom(A) and f ∈ W 1,1(0, τ ;X). Thenwe have

d

dt〈u(t), ψ〉 = 〈Au(t) + f(t), ψ〉 = 〈u(t), A∗ψ〉+ 〈f(t), ψ〉.

For (x, f) ∈ X×L1(0, τ ;X) there exista a sequence (xn, fn) in dom(A)×W 1,1(0, τ ;X)such that |xn − x|X + |fn − f |L1(0,τ ;X) → 0 as n→∞ If we define

un(t) = T (t)xn +

∫ t

0T (t− s)fn(s) ds,

then we have

〈un(t), ψ〉 = 〈x, ψ〉+

∫ t

0(〈un(s), A∗ψ〉+ 〈fn(s), ψ〉) ds

and

|un(t)− u(t)|X ≤M (|xn − x|X +

∫ t

0|fn(s)− f(s)|X ds).

Passing limit n→]∞, we see that u(t) is a weak solution of (4.22).Next we prove that u(t) is the only weak solution to (4.22) satisfying u(0) = x. Let

u(t) be another such weak solution and set v = u− u. Then we have

〈v(t), ψ〉 = 〈∫ t

0v(s) dt, A∗ψ〉

for all ψ ∈dom(A∗) and t ∈ [0, τ ]. By Lemma B.1 this implies z(t) =∫ t

0 v(s) ds ∈dom(A)

and ddtz(t) = Az(t) with z(0) = 0. Thus z = 0 and hence u(t) = u(t) on [0, τ ].

Suppose that A such that (4.22) has a unique weak solution u(t) satisfying u(0) = x.For t ∈ [0, τ ] we define the linear operator T (t) on X by T (t)x = u(t) − u0(t), whereu0 is the weak solution of (4.22) satisfying u(0) = 0. If for t = nT + s, where n isa nonnegative integer and s ∈ [0, τ) we define T (t)x = T (s)T (τ)nx, then T (t) is asemigroup. The map θ : x→ C(0, τ ;X) defined by θ(x) = T (·)x has a closed graph bythe uniform bounded principle and thus T (t) is a strongly continuous semigroup. LetB be the generator of T (t) and x ∈dom(B). For ψ ∈dom(A∗)

d

dt〈T (t)x, ψ〉|t=0 = 〈Bx,ψ〉 = 〈x,A∗ψ〉.

80

Page 81: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

It follows from Lemma that x ∈dom(A) and Ax = Bx. Thus dom(B) ⊂dom(A). Theproof of Theorem is completed by showing dom(A) ⊂dom(B). Let x ∈dom(A). Sincefor z(t) = T (t)x

〈z(t), ψ〉 = 〈∫ t

0z(s) dt, A∗ψ〉

it follows from Lemma that∫ t

0 T (s)x ds and∫ t

0 T (s)Axds belong to dom(A) and

T (t)x = x+A

∫ t

0T (s)x ds

T (t)Ax = Ax+A

∫ t

0T (s)Axds

(4.25)

Consider the function

w(t) =

∫ t

0T (s)Axds−A

∫ t

0T (s)x ds.

It then follows from (4.25) that z ∈ C(0, τ ; X). Clearly w(0) = 0 and it also followsfrom (4.25) that

d

dt〈w(t), ψ〉 = 〈w(t), A∗ψ〉. (4.26)

for ψ ∈dom(A∗). But it follows from our assumptions that (4.26) has the uniquesolution w = 0. Hence from (4.25)

T (t)x− x = A

∫ t

0T (s)x ds

and thus

limt→0+

T (t)x− xt

= Ax

which implies x ∈dom(B).

4.3 Lumer-Phillips Theorem

The condition (4.17) is very difficult to check for a given A in general. For the caseM = 1 we have a very complete characterization.

Lumer-Phillips Theorem The followings are equivalent:(a) A is the infinitesimal generator of a C0 semigroup of G(1, ω) class.(b) A− ω I is a densely defined linear m-dissipative operator,i.e.

|(λ I −A)x| ≥ (λ− ω)|x| for all x ∈ don(A), λ > ω (4.27)

and for some λ0 > ωR(λ0 I −A) = X. (4.28)

Proof: It follows from the m-dissipativity

|(λ0 I −A)−1| ≤ 1

λ0 − ω

81

Page 82: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

Suppose xn ∈ dom(A)→ x and Axn → y in X, the

x = limn→∞

xn = (λ0 I −A)−1 limn→∞

(λ0 xn −Axn) = (λ0 I −A)−1(λ0 x− y).

Thus, x ∈ dom(A) and y = Ay and hence A is closed. Since for λ > ω

λ I −A = (I + (λ− λ0)(λ0 I −A)−1)(λ0 I −A),

if |λ−λ0|λ0−ω < 1, then (λ I − A)−1 ∈ L(X). Thus by the continuation method we have

(λ I −A)−1 exists and

|(λ I −A)−1| ≤ 1

λ− ω, λ > ω.

It follows from the Hile-Yosida theorem that (b) → (a).(b) → (a) Since for x∗ ∈ F (x), the dual element of x, i.e. x∗ ∈ X∗ satisfying〈x, x∗〉X×X∗ = |x|2 and |x∗| = |x|

〈e−ωtS(t)x, x∗〉 ≤ |x||x∗| = 〈x, x∗〉

we have for all x ∈ dom(A)

0 ≥ limt→0+

〈e−ωtS(t)x− x

t, x∗〉 = 〈(A− ω I)x, x∗〉 for all x∗ ∈ F (x).

which implies A− ω I is dissipative.

Theorem (Dissipative I) (1) A is a ω-dissipative

|λx−Ax| ≥ (λ− ω)|x| for all x ∈ dom (A).

if and only if (2) for all x ∈ dom(A) there exists x∗ ∈ F (x) such that

〈Ax, x∗〉 ≤ ω |x|2. (4.29)

(2)→ (1). Let x ∈ dom(A) and choose x∗ ∈ F (0) such that 〈A, x∗〉 ≤ 0. Then, for anyλ > 0,

|x|2 = 〈x, x∗〉 = 〈λx−Ax+Ax, x∗〉 ≤ 〈λx−Ax, x∗〉 − ω |x|2 ≤ |λx−Ax||x| − ω |x|2,

which implies (2).(1)→ (2). From (1) we obtain the estimate

− 1

λ(|x− λAx| − |x|) ≤ 0

and

〈Ax, x〉− = limλ→0+

=1

λ(|x| − |x− λAx|) ≤ 0

which implies there exists x∗ ∈ F (x) such that (4.29) holds since 〈Ax, x〉− = 〈AX, x∗〉for some x∗ ∈ F (x).

Thus, Lumer-Phillips theorem says that if m-diisipative, then (4.29) hold for allx∗ ∈ F (x).

82

Page 83: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

Theorem (Dissipative II) Let A be a closed densely defined operator on X. If Aand A∗ are dissipative, then A is m-dissipative and thus the infinitesimal generator ofa C − 0-semigroup of contractions.

Proof: Let y ∈ R(I −A) be given. Then there exists a sequence xn ∈ dom(A) suchthat y=xn −Axn → y as n→∞. By the dissipativity of A we obtain

|xn − xm| ≤ |xn − xm −A(xn − xm)| ≤ |y−ym|

Hence xn is a Cauchy sequence in X. We set x = limn→∞xn. Since A is closed, wesee that x ∈ dom (A) and x − Ax = y, i.e., y ∈ R(I − A). Thus R(I − A) is closed.Assume that R(I −A) 6= X. Then there exists an x∗ ∈ X∗ such that

〈(I −A)x, x∗) = 0 for all x ∈ dom (A).

By definition of the dual operator this implies x∗ ∈ dom (A∗) and (I − A)∗x∗ = 0.Dissipativity of A* implies |x∗| < |x∗ −A∗x∗| = 0, which is a contradiction.

Example (revisited example 1)

Aφ = −φ′ in X = L2(0, 1)

and for φ ∈ H1(0, 1)

(Aφ, φ)X = −∫ 1

0φ′(x)φdx ==

1

2(|φ(0)|2 − |φ(1)|2)

Thus, A is dissipative if and only if φ(0) = 0, the in flow condition. Define the domainof A by

dom(A) = φ ∈ H1(0, 1) : φ(0) = 0

The resolvent equation is equivalent to

λu+d

dxu = f

and

u(x) =

∫ x

0e−λ (x−s)f(s) ds,

and R(λ I −A) = X. By the Lumer-Philips theorem A generates the C0 semigroup onX = L2(0, 1).

Example (Conduction equation) Consider the heat conduction equation:

d

dtu = Au =

∑i,j

aij(x)∂2u

∂xi∂xj+ b(x) · ∇u+ c(x)u, in Ω.

Let X = C(Ω) and dom(A) ⊂ C2(Ω). Assume that a ∈ Rn×n ∈ C(Omega) b ∈ Rn,1and c ∈ R are continuous on Ω and a is symmetric and

mI ≤ a(x) ≤M I for 0 < m ≤M <∞.

83

Page 84: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

Then, if x0 is an interior point of Ω at which the maximum of φ ∈ C2(Ω) is attained.Then,

∇φ(x0) = 0,∑ij

aij(x0)∂2u

∂xi∂xj(x0) ≤ 0.

and thus(λφ−Aφ)(x0) ≤ ω φ(x0)

whereω ≤ max

x∈Ωc(x).

Similarly, if x0 is an interior point of Ω at which the minimum of φ ∈ C2(Ω) is attained,then

(λφ−Aφ)(x0) ≥ 0

If x0 ∈ ∂Ω attains the maximum, then

∂νφ(x0) ≤ 0.

Consider the domain with the Robin boundary condition:

dom(A) = u ∈ α(x)u(x) + β(x)∂

∂νu = 0 at ∂Ω

with α, β ≥ 0 and infx∈∂Ω(α(x) + β(x)) > 0. Then,

|λφ−Aφ|X ≥ (λ− ω)|φ|X . (4.30)

for all φ ∈ C2(Ω). It follows from the he Lax Milgram theory that

(λ0 I −A)−1 ∈ L(L2(Ω), H2(Ω),

assuming that coefficients (a, b, c) are sufficiently smooth. Let

dom(A) = (λ0 I −A)−1C(Ω).

Since C2(Ω) is dense in dom(A), (??) holds for all φ ∈ dom(A), which shows A isdissipative.

Example (Advection equation) Consider the advection equation

ut +∇ · (~b(x)u) = ν∆u.

Let X = L1(Ω). Assume~b ∈ L∞(Ω)

Let ρ ∈ C1(R) be a monotonically increasing function satisfying ρ(0) = 0 and ρ(x) =sign(x), |x| ≥ 1 and ρε(s) = ρ( sε ) for ε > 0. For u ∈ C1(Ω)

(Au, u) =

∫Γ(ν

∂nu− n ·~b u, ρε(u)) ds+ (~b u− ν∇u, 1

ερ′ε(u)∇u) + (c u, ρε(u)).

where

(~b u,1

ερ′ε(u)∇u) ≤ ν (∇u, 1

ερ′ε(u)∇u) +

ε

4νmeas(|u| ≤ ε).

84

Page 85: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

Assume the inflow condition u on s ∈ ∂Ω : n · b < 0. Note that for u ∈ L1(Rd)

(u, ρε(u))→ |u|1 and (ψ, ρε(u))→ (ψ, sign0(u)) for ψ ∈ L1(Ω)

as ε→ 0. It thus follows(λ− ω) |u| ≤ |λu− λAu|. (4.31)

Since C2(Ω) is dense in W 2,1(Rd) (4.31) holds for u ∈W 2,1(Ω).

Example (X = Lp(Ω)) Let Au = ν∆u+ b · ∇u with homogeneous boundary conditionu = 0 on X = Lp(Ω). Since

〈∆u, u∗〉 =

∫Ω

(∆u, |u|p−2u) = −(p− 1)

∫Ω

(∇u, |u|p−2∇u)

and

(b · ∇u, |u|p−2u)L2 ≤(p− 1)ν

2|(∇u, |u|p−2∇u)L2 +

|b|2∞2ν(p− 1)

(|u|p, 1)L2

we have〈Au, u∗〉 ≤ ω|u|2

for some ω > 0.

Example (Fractional PDEs I)In this section we consider the nonlocal diffusion equation of the form

ut = Au =

∫RdJ(z)(u(x+ z)− u(x)) dz.

Or, equivalently

Au =

∫(Rd)+

J(z)(u(x+ z)− 2u(x) + u(x− z)) dz

for the symmetric kernel J in Rd. It will be shown that

(Au, φ)L2 =

∫Rd

∫(Rd)+

J(z)(u(x+ z)− u(x))(φ(x+ z)− φ(x)) dz dx

and thus A has a maximum extension.Also, the nonlocal Fourier law is given by

Au = ∇ · (∫RdJ(z)∇u(x+ z) dz).

Thus,

(Au, φ)L2 =

∫Rd×Rd

J(z)∇u(x+ z) · ∇φ(x) dz dx

Under the kernel J is completely monotone, one can prove that A has a maximalmonotone extension.

85

Page 86: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

4.4 Jump diffusion Model for American option

In this section we discuss the American option for the jump diffusion model

ut + (x− σ2

2)ux +

σ2x2

2uxx +Bu+ λ = 0, u(T, x) = ψ,

(λ, u− ψ) = 0, λ ≤ 0, u ≥ ψ

where the generator B for the jump process is given by

Bu =

∫ ∞−∞

k(s)(u(x+ s)− u(x) + (es − 1)ux) ds.

The CMGY model for the jump kernel k is given by

k(s) =

Ce−M |s||s|1+Y = k+(s) s ≥ 0

Ce−G|s||s|1+Y = k−(s) s ≤ 0

Since∫ ∞−∞

k(s)(u(x+ s)− u(x)) ds =

∫ ∞0

k+(s)(u(x+ s)− u(x)) ds+

∫ ∞0

k−(s)(u(x− s)− u(x)) ds

=

∫ ∞0

k+(s) + k−(s)

2(u(x+ s)− 2u(x) + u(x− s)) ds+

∫ ∞0

k+(s)− k−(s)

2(u(x+ s)− u(x− s)) ds.

Thus, ∫ ∞−∞

(

∫ ∞−∞

k(s)(s)(u(x+ s)− u(x) ds)φdx

=

∫ ∞−∞

∫ ∞0

ks(s)(u(x+ s)− u(x))(φ(x+ s)− φ(s)) ds dx

+

∫ ∞−∞

(

∫ ∞0

ku(s)(u(x+ s)− u(s)))φ(x) dx

where

ks(s) =k+(s) + k−(s)

2, ku(s) =

k+(s)− k−(s)

2and hence

(Bu, φ) =

∫ ∞−∞

∫ ∞−∞

ks(s)(u(x+ s)− u(x))(φ(x+ s)− φ(s)) ds dx

+

∫ ∞−∞

(

∫ ∞−∞

ku(s)(u(x+ s)− u(s)))φ(x) dx+ ω

∫ ∞−∞

uxφdx.

where

ω =

∫ ∞−∞

(es − 1)k(s) ds.

If we equip V = H1(R) by

|u|2V =

∫ ∞−∞

∫ ∞−∞

ks(s)|u(x+ s)− u(x)|2 ds dx+σ2

2

∫ ∞−∞|ux|2 dx,

then A+B ∈ L(V, V ∗) and A+B generates the analytic semigroup on X = L2(R).

86

Page 87: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

4.5 Numerical approximation of nonlocal operator

In this section we describe our higher order integration method for the convolution;∫ ∞0

k+(s) + k−(s)

2(u(x+s)−2u(x)+u(x−s)) ds+

∫ ∞0

k+(s)− k−(s)

2(u(x+s)−u(x−s)) ds.

For the symmetric part,∫ ∞−∞

s2ks(s)u(x+ s)− 2u(x) + u(x− s)

s2ds,

where we have

u(x+ s)− 2u(x) + u(x− s)s2

∼ uxx(x) +s2

12uxxxx(x) +O(s4)

We apply the fourth order approximation of uxx by

uxx(x) ∼ u(x+ h)− 2u(x) + u(x− h)

h2− 1

12

u(x+ 2h)− 4u(x) + 6u(x)− 4u(x− h) + u(x− 2h)

h2

and we apply the second order approximation of uxxxx(x) by

uxxxx(x) ∼ u(x+ 2h)− 4u(x) + 6u(x)− 4u(x− h) + u(x− 2h)

h4.

Thus, one can approximate∫ h2

−h2

s2ks(s)u(x+ s)− 2u(x) + u(x− s)

s2ds

by

ρ0 (uk+1 − 2uk + uk−1

h2− 1

12

uk+2 − 4uk+1 + 6uk − 4uk−1 + uk−2

h2)

+ρ1

12

uk+2 − 4uk+1 + 6uk − 4uk−1 + uk−2

h2,

where

ρ0 =

∫ h2

−h2

s2ks(s) ds and ρ1 =1

h2

∫ h2

−h2

s4ks(s) ds.

The remaining part of the convolution∫ (k+ 12

)h

(k− 12

)hu(xk+j + s)ks(s) ds

can be approximated by three point quadrature rule based on

u(xk+j + s) ∼ u(xk+j) + u′(xk+j)s+s2

2u′′(xk+j)

with

u′(xk+j) ∼uk+j+1 − uk+j−1

2h

u′′(xk+j) ∼uk+j+1 − 2uk+j + uk+j−1

h2.

87

Page 88: Functional Analysis and Optimization · Functional Analysis and Optimization Kazufumi Ito November 29, 2016 Abstract In this monograph we develop the function space method for optimization

That is,

∫_{(k−1/2)h}^{(k+1/2)h} u(x_{k+j} + s) k_s(s) ds ∼ ρ₀ᵏ u_{k+j} + ρ₁ᵏ (u_{k+j−1} − u_{k+j+1})/2 + ρ₂ᵏ (u_{k+j+1} − 2u_{k+j} + u_{k+j−1})/2,

where

ρ₀ᵏ = ∫_{(k−1/2)h}^{(k+1/2)h} k_s(s) ds,
ρ₁ᵏ = (1/h) ∫_{(k−1/2)h}^{(k+1/2)h} (s − x_k) k_s(s) ds,
ρ₂ᵏ = (1/h²) ∫_{(k−1/2)h}^{(k+1/2)h} (s − x_k)² k_s(s) ds.

For the skew-symmetric integral,

∫_{−h/2}^{h/2} k_u(s)(u(x+s) − u(x−s)) ds ∼ ρ₂ u_x(x) + (ρ₃/6) h² u_xxx(x),

where

ρ₂ = ∫_{−h/2}^{h/2} 2s k_u(s) ds,  ρ₃ = (1/h²) ∫_{−h/2}^{h/2} 2s³ k_u(s) ds.

We may use the fourth order difference approximation

u_x(x) ∼ (u(x+h) − u(x−h))/(2h) − (u(x+2h) − 2u(x+h) + 2u(x−h) − u(x−2h))/(6h)

and the second order difference approximation

u_xxx(x) ∼ (u(x+2h) − 2u(x+h) + 2u(x−h) − u(x−2h))/h³

and obtain

∫_{−h/2}^{h/2} k_u(s)(u(x+s) − u(x−s)) ds ∼ ρ₂ ((u_{k+1} − u_{k−1})/(2h) − (u_{k+2} − 2u_{k+1} + 2u_{k−1} − u_{k−2})/(6h)) + (ρ₃/6)(u_{k+2} − 2u_{k+1} + 2u_{k−1} − u_{k−2})/h.
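A minimal Python sketch of how the quadrature weights above can be evaluated numerically is given next. The sample kernel k_s and its parameters are illustrative assumptions; only the moment definitions ρ₀, ρ₁ and ρ₀ᵏ, ρ₁ᵏ, ρ₂ᵏ follow the formulas above.

```python
import numpy as np

M, Y = 5.0, 0.5
ks = lambda s: np.exp(-M * np.abs(s)) / np.abs(s) ** (1.0 + Y)   # sample symmetric kernel
h = 0.01

def moment(a, b, p, n=4000):
    """int_a^b s^p ks(s) ds by the trapezoidal rule (a, b away from the singularity at 0)."""
    s = np.linspace(a, b, n)
    return np.trapz(s**p * ks(s), s)

# Singular cell (-h/2, h/2): by symmetry of ks, integrate on (0, h/2) and double.
eps = 1e-10
rho0 = 2.0 * moment(eps, h / 2, 2)            # rho_0 = int s^2 ks(s) ds
rho1 = 2.0 * moment(eps, h / 2, 4) / h**2     # rho_1 = (1/h^2) int s^4 ks(s) ds

def outer_weights(k):
    """Weights rho^k_0, rho^k_1, rho^k_2 of the three-point rule on ((k-1/2)h, (k+1/2)h)."""
    a, b, xk = (k - 0.5) * h, (k + 0.5) * h, k * h
    rk0 = moment(a, b, 0)
    rk1 = (moment(a, b, 1) - xk * rk0) / h
    rk2 = (moment(a, b, 2) - 2 * xk * moment(a, b, 1) + xk**2 * rk0) / h**2
    return rk0, rk1, rk2

print(rho0, rho1, outer_weights(1))
```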

Consider nonlocal PDE equations of the following form.

Example (Fractional PDEs II) Consider the fractional equation of the form

∫_{−t}^{0} g(θ) u′(t+θ) dθ = Au,  u(0) = u₀,


where the kernel g satisfies

g > 0,  g ∈ L¹(−∞, 0),  and g is non-decreasing.

For example, the case of the Caputo (fractional) derivative has

g(θ) = (1/Γ(1−α)) |θ|^{−α}.

Define z(t, θ) = u(t+θ), θ ∈ (−∞, 0]. Then d/dt z = ∂/∂θ z. Thus, we define the linear operator A on Z = C((−∞, 0]; X) by

Az = z′(θ) with dom(A) = {z ∈ Z : ∫_{−∞}^{0} g(θ) z′(θ) dθ = Az(0)}.

Theorem 4.1 Assume A is m-dissipative in a Banach space X. Then A is dissipative on Z and R(λI − A) = Z for λ > 0. Thus, A generates a C₀-semigroup T(t) on Z = C((−∞, 0]; X).

Proof: First we show that A is dissipative. For φ ∈ dom(A) suppose |φ(0)| > |φ(θ)| for all θ < 0. Define

g_ε(θ) = (1/ε) ∫_{θ−ε}^{θ} g(s) ds.

For all x∗ ∈ F(φ(0)),

〈x∗, ∫_{−∞}^{0} g_ε(θ) φ′(θ) dθ〉 = 〈x∗, ∫_{−∞}^{0} (g(θ) − g(θ−ε))/ε · (φ(θ) − φ(0)) dθ〉 ≤ 0

since

〈x∗, φ(θ) − φ(0)〉 ≤ (|φ(θ)| − |φ(0)|)|φ(0)| < 0,  θ < 0.

Thus,

lim_{ε→0+} 〈x∗, ∫_{−∞}^{0} g_ε(θ) φ′(θ) dθ〉 = 〈x∗, ∫_{−∞}^{0} g(θ) φ′(θ) dθ〉 < 0. (4.32)

But there exists an x∗ ∈ F(φ(0)) such that

〈x∗, Aφ(0)〉 ≥ 0,

which contradicts (4.32). Thus, there exists θ₀ such that |φ(θ₀)| = |φ|_Z. Since 〈φ(θ), x∗〉 ≤ |φ(θ₀)| for x∗ ∈ F(φ(θ₀)), θ → 〈φ(θ), x∗〉 attains its maximum at θ₀ and thus 〈x∗, φ′(θ₀)〉 = 0. Hence,

|λφ − φ′|_Z ≥ 〈x∗, λφ(θ₀) − φ′(θ₀)〉 = λ|φ(θ₀)| = λ|φ|_Z. (4.33)

For the range condition λφ − Aφ = f we note that

φ(θ) = e^{λθ} φ(0) + ψ(θ),


where

ψ(θ) = ∫_θ^0 e^{λ(θ−ξ)} f(ξ) dξ.

Thus,

(∆(λ) I − A) φ(0) = ∫_{−∞}^{0} g(θ) ψ′(θ) dθ,

where

∆(λ) = λ ∫_{−∞}^{0} g(θ) e^{λθ} dθ > 0.

Thus,

φ(0) = (∆(λ) I − A)^{−1} ∫_{−∞}^{0} g′(θ) ψ(θ) dθ.

Since A is dissipative and

λψ − ψ′ = f ∈ Z,  ψ(0) = 0,

we have |ψ|_Z ≤ (1/λ)|f|_Z. Thus φ = (λI − A)^{−1} f ∈ Z.

Example (Renewable system) We discuss the renewable system of the form

d p₀/dt = −∑ᵢ λᵢ p₀(t) + ∑ᵢ ∫_0^L μᵢ(x) pᵢ(x, t) dx,

(pᵢ)_t + (pᵢ)_x = −μᵢ(x) pᵢ,  pᵢ(0, t) = λᵢ p₀(t),

for (p₀, pᵢ, 1 ≤ i ≤ d) ∈ R × L¹(0, L)^d. Here, p₀(t) ≥ 0 is the total utility and λᵢ ≥ 0 is the rate of energy conversion to the i-th process pᵢ. The first equation is the energy balance law and s is the source = generation − consumption. The second equation describes the transport (via pipeline and storage) of the process pᵢ; μᵢ ≥ 0 is the renewal rate and μ ≥ 0 is the natural loss rate. The λᵢ ≥ 0 represent the distribution of the utility to the i-th process.

Assume at time t = 0 we have the available utility p₀(0) = 1 and pᵢ = 0. Then we have the conservation

p₀(t) + ∫_0^t pᵢ(s) ds = 1

if t ≤ L. Let X = R × L¹(0, L)^d. Let A = A(μ) be defined by

A(p₀, p) = ( −∑ᵢ λᵢ p₀ + ∑ᵢ ∫_0^L μᵢ(x) pᵢ(x) dx,  −(pᵢ)_x − μᵢ(x) pᵢ )

with domain

dom(A) = {(p₀, pᵢ) ∈ R × W^{1,1}(0, L)^d : pᵢ(0) = λᵢ p₀}.

Let

signε(s) = s/|s| for |s| > ε,  s/ε for |s| ≤ ε.

Then,

(A(p₀, p), (sign₀(p₀), signε(p))) ≤ −(∑ᵢ λᵢ)|p₀| + |∫_0^L μᵢ pᵢ dx| + ∑ᵢ ( Ψε(pᵢ(0)) − Ψε(pᵢ(L)) − ∫_0^L μᵢ pᵢ signε(pᵢ) dx ),

where

Ψε(s) = |s| for |s| > ε,  s²/(2ε) + ε/2 for |s| ≤ ε.

Since signε → sign₀ and Ψε(s) → |s|, by the Lebesgue dominated convergence theorem we have

(A(p₀, p), (sign₀(p₀), sign₀(p))) ≤ 0.

The resolvent equation

A(p₀, p) = (s, f)  (4.34)

has a solution

pᵢ(x) = λᵢ p₀ e^{−∫_0^x μᵢ} + ∫_0^x e^{−∫_s^x μᵢ} f(s) ds,

(∑ᵢ λᵢ)(1 − e^{∫_0^L μᵢ}) p₀ = s + ∫_0^L μᵢ pᵢ(x) dx.

Thus, A generates a contractive C₀-semigroup S(t) on X. Moreover, it is cone preserving, S(t)C⁺ ⊂ C⁺, since the resolvent is positive cone preserving.

Example (Bi-domain equation)

Example (Second order equation) Let V ⊂ H = H∗ ⊂ V∗ be the Gelfand triple. Let ρ be a bounded bilinear form on H × H, and let μ and σ be bounded bilinear forms on V × V. Assume ρ and σ are symmetric and coercive and μ(φ, φ) ≥ 0 for all φ ∈ V. Consider the second order equation

ρ(utt, φ) + µ(ut, φ) + σ(u, φ) = 〈f(t), φ〉 for all φ ∈ V. (4.35)

Define the linear operators M (mass), D (damping) and K (stiffness) by

(Mφ, ψ)_H = ρ(φ, ψ), φ, ψ ∈ H,  〈Dφ, ψ〉 = μ(φ, ψ), φ, ψ ∈ V,  〈Kφ, ψ〉_{V∗×V} = σ(φ, ψ), φ, ψ ∈ V.

Let v = u_t and define A on X = V × H by

A(u, v) = (v, −M^{−1}(Ku + Dv))

with domain

dom(A) = {(u, v) ∈ X : v ∈ V and Ku + Dv ∈ H}.

The state space X is a Hilbert space with inner product

((u₁, v₁), (u₂, v₂)) = σ(u₁, u₂) + ρ(v₁, v₂)

and

E(t) = |(u(t), v(t))|²_X = σ(u(t), u(t)) + ρ(v(t), v(t))


defines the energy of the state x(t) = (u(t), v(t)). First, we show that A is dissipative:

(A(u, v), (u, v))_X = σ(u, v) + ρ(−M^{−1}(Ku + Dv), v) = σ(u, v) − σ(u, v) − μ(v, v) = −μ(v, v) ≤ 0.

Next, we show that R(λI − A) = X. That is, for (f, g) ∈ X there exists a solution (u, v) ∈ dom(A) satisfying

λu − v = f,  λMv + Dv + Ku = Mg,

or equivalently v = λu − f and

λ²Mu + λDu + Ku = Mg + λMf + Df.  (4.36)

Define the bilinear form a on V × V by

a(φ, ψ) = λ²ρ(φ, ψ) + λμ(φ, ψ) + σ(φ, ψ).

Then a is bounded and coercive, and if we let

F(φ) = (M(g + λf), φ)_H + μ(f, φ),

then F ∈ V∗. It thus follows from the Lax–Milgram theory that there exists a unique solution u ∈ V to (4.36) and Dv + Ku ∈ H.

For example, consider the wave equation

(1/c²(x)) u_tt + κ(x) u_t = ∆u,

[∂u/∂n] + αu = γ u_t at Γ.

In this example we let V = H¹(Ω)/R and H = L²(Ω) and define

σ(φ, ψ) = ∫_Ω (∇φ, ∇ψ) dx + ∫_Γ α φψ ds,

μ(φ, ψ) = ∫_Ω κ(x) φψ dx + ∫_Γ γ φψ ds,

ρ(φ, ψ) = ∫_Ω (1/c²(x)) φψ dx.

Example (Maxwell system for electro-magnetic equations)

ε E_t = ∇×H,  ∇·E = ρ,

μ H_t = −∇×E,  ∇·B = 0,

with boundary condition E × n = 0, where E is the electric field, B = μH is the magnetic field and D = εE is the electric displacement, with ε, μ the electric and magnetic permittivity, respectively. Let X = L²(Ω)^d × L²(Ω)^d with the norm defined by

∫_Ω (ε|E|² + μ|H|²) dx.


The dissipativity follows from

∫_Ω (E·(∇×H) − H·(∇×E)) dx = ∫_Ω ∇·(E×H) dx = ∫_{∂Ω} n·(E×H) ds = 0.

Let ρ = 0 and thus ∇·E = 0. The range condition is equivalent to

εE + ∇× (1/μ)(∇×E − g) = f.

The weak form is given by

(εE, ψ) + ((1/μ)∇×E, ∇×ψ) = (f, ψ) + (g, (1/μ)∇×ψ)  (4.37)

for ψ ∈ V = {ψ ∈ H¹(Ω) : ∇·ψ = 0, n×ψ = 0 at ∂Ω}. Since |∇×ψ|² = |∇ψ|² for ∇·ψ = 0, the left hand side of (4.37) defines a bounded coercive quadratic form on V × V, and it follows from the Lax–Milgram theorem that (4.37) has a unique solution in V.

4.6 Dual semigroup

Theorem (Dual semigroup) Let X be a reflexive Banach space. The adjoint S∗(t) of a C₀-semigroup S(t) on X forms a C₀-semigroup and the infinitesimal generator of S∗(t) is A∗. Let X be a Hilbert space, let dom(A∗) be the Hilbert space with the graph norm and let X₋₁ be the strong dual space of dom(A∗); then the extension of S(t) to X₋₁ defines a C₀-semigroup on X₋₁.

Proof: (1) Since for t, s ≥ 0

S∗(t+s) = (S(s)S(t))∗ = S∗(t)S∗(s)

and

〈x, S∗(t)y − y〉_{X×X∗} = 〈S(t)x − x, y〉_{X×X∗} → 0

for x ∈ X and y ∈ X∗, S∗(t) is weakly star continuous at t = 0. Let B be the generator of S∗(t), i.e.,

By = w∗-lim (S∗(t)y − y)/t.

Since

((S(t)x − x)/t, y) = (x, (S∗(t)y − y)/t)

for all x ∈ dom(A) and y ∈ dom(B), we have

〈Ax, y〉_{X×X∗} = 〈x, By〉_{X×X∗}

and thus B = A∗. Thus, A∗ is the generator of a w∗-continuous semigroup on X∗.

(2) Since

S∗(t)y − y = A∗ ∫_0^t S∗(s)y ds

for all y ∈ Y = dom(A∗), S∗(t) is strongly continuous at t = 0 on Y.

(3) If X is reflexive, dom(A∗) is dense in X∗. If not, there exists a nonzero y₀ ∈ X such that 〈y₀, x∗〉_{X×X∗} = 0 for all x∗ ∈ dom(A∗). Thus, for x₀ = (λI − A)^{−1}y₀, 〈λx₀ − Ax₀, x∗〉 = 〈x₀, λx∗ − A∗x∗〉 = 0. Letting x∗ = (λI − A∗)^{−1}x₀∗ for x₀∗ ∈ F(x₀), we have x₀ = 0 and thus y₀ = 0, which yields a contradiction.

(4) X₁ = dom(A∗) is a closed subspace of X∗ and is an invariant set of S∗(t). Since A∗ is closed, S∗(t) is a C₀-semigroup on X₁ equipped with its graph norm. Thus,

(S∗(t))∗ is a C₀-semigroup on X₋₁ = X₁∗

and defines the extension of S(t) to X₋₁. Since for x ∈ X ⊂ X₋₁ and x∗ ∈ X∗

〈S(t)x, x∗〉 = 〈x, S∗(t)x∗〉,

S(t) is the restriction of (S∗(t))∗ onto X.

4.7 Stability

Theorem (Datko 1970, Pazy 1972) A strongly continuous semigroup S(t), t ≥ 0, on a Banach space X is uniformly exponentially stable if and only if for some p ∈ [1, ∞) one has

∫_0^∞ |S(t)x|^p dt < ∞ for all x ∈ X.

Theorem (Gearhart 1978, Pruss 1984, Greiner 1985) A strongly continuous semigroup S(t), t ≥ 0, on a Hilbert space X is uniformly exponentially stable if and only if the half-plane {λ ∈ C : Re λ > 0} is contained in the resolvent set ρ(A) of the generator A and the resolvent satisfies

sup_{Re λ > 0} |(λI − A)^{−1}| < ∞.

4.8 Sectorial operator and Analytic semigroup

In this section we give the representation of the semigroup S(t) in terms of the inverse Laplace transform. Taking the Laplace transform of

d/dt x(t) = Ax(t) + f(t)

we have

x̂ = (λI − A)^{−1}(x(0) + f̂),

where for λ > ω

x̂ = ∫_0^∞ e^{−λt} x(t) dt

is the Laplace transform of x(t). We have the following representation theorem (inverse formula).

Theorem (Resolvent Calculus) For x ∈ dom(A²) and γ > ω,

S(t)x = (1/2πi) ∫_{γ−i∞}^{γ+i∞} e^{λt} (λI − A)^{−1} x dλ.  (4.38)


Proof: Let A_μ be the Yosida approximation of A. Since Re σ(A_μ) ≤ ω₀/(1 − μω₀) < γ, we have

u_μ(t) = S_μ(t)x = (1/2πi) ∫_{γ−i∞}^{γ+i∞} e^{λt} (λI − A_μ)^{−1} x dλ.

Note that

λ(λI − A)^{−1} = I + (λI − A)^{−1}A.  (4.39)

Since

(1/2πi) ∫_{γ−i∞}^{γ+i∞} e^{λt}/λ dλ = 1

and

∫_{γ−i∞}^{γ+i∞} |λ − ω|^{−2} dλ < ∞,

we have

|S_μ(t)x| ≤ M|A²x|

uniformly in μ > 0. Since

(λI − A_μ)^{−1}x − (λI − A)^{−1}x = (μ/(1 + λμ)) (νI − A)^{−1}(λI − A)^{−1}A²x,

where ν = λ/(1 + λμ), u_μ(t) is Cauchy in C(0, T; X) if x ∈ dom(A²). Letting μ → 0+, we obtain (4.38).

Next we consider sectorial operators. For δ > 0 let

Σ^δ_ω = {λ ∈ C : |arg(λ − ω)| < π/2 + δ}

be the sector in the complex plane C. A closed, densely defined, linear operator A on a Banach space X is a sectorial operator if

|(λI − A)^{−1}| ≤ M/|λ − ω| for all λ ∈ Σ^δ_ω.

For 0 < θ < δ let Γ = Γ_{ω,θ} be the integration path defined by

Γ± = {z ∈ C : |z| ≥ δ, arg(z − ω) = ±(π/2 + θ)},
Γ₀ = {z ∈ C : |z| = δ, |arg(z − ω)| ≤ π/2 + θ}.

For 0 < θ < δ define a family S(t), t ≥ 0, of bounded linear operators on X by

S(t)x = (1/2πi) ∫_Γ e^{λt} (λI − A)^{−1} x dλ.  (4.40)

Theorem (Analytic semigroup) If A is a sectorial operator on a Banach space X, then A generates an analytic (C₀) semigroup on X, i.e., for x ∈ X, t → S(t)x is an analytic function on (0, ∞). We have the representation (4.40) for x ∈ X and

|AS(t)x|_X ≤ (M_θ/t) |x|_X.


Proof: The elliptic operator A defined by the Lax–Milgram theorem defines a sectorial operator on a Hilbert space.

Theorem (Sectorial operator) Let V, H be Hilbert spaces and assume H ⊂ V∗. Let ρ(u, v) be a bounded bilinear form on H × H with

ρ(u, u) ≥ |u|²_H for all u ∈ H,

and let σ(u, v) be a bounded bilinear form on V × V with

σ(u, u) ≥ δ|u|²_V for all u ∈ V.

Define the linear operator A by

ρ(Au, φ) = σ(u, φ) for all φ ∈ V.

Then, for Re λ > 0 we have

|(λI − A)^{−1}|_{L(V∗,V)} ≤ 1/δ,
|(λI − A)^{−1}|_{L(H)} ≤ M/|λ|,
|(λI − A)^{−1}|_{L(V∗,H)} ≤ M/√|λ|,
|(λI − A)^{−1}|_{L(H,V)} ≤ M/√|λ|.

Proof: For f ∈ V∗ and Re λ > 0 define M ∈ L(H, H) by

(Mu, v) = ρ(u, v) for all v ∈ H

and A₀ ∈ L(V, V∗) by

〈A₀u, v〉 = σ(u, v) for v ∈ V.

Then A = M^{−1}A₀ and (λI − A)u = M^{−1}f is equivalent to

λρ(u, φ) + σ(u, φ) = 〈f, φ〉 for all φ ∈ V.  (4.41)

It follows from the Lax–Milgram theorem that (4.41) has a unique solution, given f ∈ V∗, and

Re λ ρ(u, u) + σ(u, u) ≤ |f|_{V∗}|u|_V.

Thus,

|(λI − A)^{−1}|_{L(V∗,V)} ≤ 1/δ.

Also,

|λ||u|²_H ≤ |f|_{V∗}|u|_V + M|u|²_V ≤ M₁|f|²_{V∗}

for M₁ = 1 + M/δ², and thus

|(λI − A)^{−1}|_{L(V∗,H)} ≤ √M₁ / |λ|^{1/2}.

For f ∈ H ⊂ V∗,

δ|u|²_V ≤ Re λ ρ(u, u) + σ(u, u) ≤ |f|_H|u|_H,  (4.42)

and

|λ|ρ(u, u) ≤ |f|_H|u|_H + M|u|²_V ≤ M₁|f|_H|u|_H.

Thus,

|(λI − A)^{−1}|_{L(H)} ≤ M₁/|λ|.

Also, from (4.42) and |u|_H ≤ (M₁/|λ|)|f|_H,

δ|u|²_V ≤ |f|_H|u|_H ≤ (M₁/|λ|)|f|²_H,

which implies

|(λI − A)^{−1}|_{L(H,V)} ≤ M₂/|λ|^{1/2}.

4.9 Approximation Theory

In this section we discuss the approximation theory for linear C₀-semigroups. The equivalence theorem of Lax–Richtmyer states that for consistent numerical approximations, stability and convergence are equivalent. In terms of linear semigroup theory we have:

Theorem (Trotter–Kato theorem) Let X and X_n be Banach spaces and let A and A_n be the infinitesimal generators of C₀-semigroups S(t) on X and S_n(t) on X_n of class G(M, ω). Assume the families of uniformly bounded linear operators P_n ∈ L(X, X_n) and E_n ∈ L(X_n, X) satisfy

P_n E_n = I,  |E_n P_n x − x|_X → 0 for all x ∈ X.  (4.43)

Then the following are equivalent.

(1) There exists a λ₀ > ω such that for all x ∈ X

|E_n(λ₀I − A_n)^{−1}P_n x − (λ₀I − A)^{−1}x|_X → 0 as n → ∞.  (4.44)

(2) For every x ∈ X and T ≥ 0

|E_n S_n(t) P_n x − S(t)x|_X → 0 as n → ∞,

uniformly on t ∈ [0, T].

Proof: Since for λ > ω

E_n(λI − A_n)^{−1}P_n x − (λI − A)^{−1}x = ∫_0^∞ e^{−λt}(E_n S_n(t) P_n x − S(t)x) dt,

(1) follows from (2). Conversely, from the representation theory

E_n S_n(t) P_n x − S(t)x = (1/2πi) ∫_{γ−i∞}^{γ+i∞} e^{λt}(E_n(λI − A_n)^{−1}P_n x − (λI − A)^{−1}x) dλ,

where

(λI − A)^{−1} − (λ₀I − A)^{−1} = (λ₀ − λ)(λI − A)^{−1}(λ₀I − A)^{−1}.

Thus, from the proof of Theorem (Resolvent Calculus), (2) holds for x ∈ dom(A²). But since dom(A²) is dense in X, (1) implies (2).

Remark (Stability) If A_n is uniformly dissipative, i.e.,

|λu_n − A_n u_n| ≥ (λ − ω)|u_n|

for all u_n ∈ dom(A_n) and some ω ≥ 0, then A_n generates an ω-contractive semigroup S_n(t) on X_n.

Remark (Consistency) From

(λI − A_n)P_n u_n = P_n f,  P_n(λI − A)u = P_n f,

we have

(λI − A_n)P_n(u − u_n) + P_n Au − A_n P_n u = 0.

Thus

|P_n(u − u_n)| ≤ M|P_n Au − A_n P_n u|.

The consistency (4.44) follows from

|P_n Au − A_n P_n u| → 0

for all u in a dense subset of dom(A).

Corollary Let the assumptions of the Theorem hold. Statement (1) of the Theorem is equivalent to (4.43) together with the following:

(C.1) there exists a subset D of dom(A) such that D is dense in X and (λ₀I − A)D = X;

(C.2) for all u ∈ D there exists a sequence u_n ∈ dom(A_n) such that lim E_n u_n = u and lim E_n A_n u_n = Au.

Proof: Without loss of generality we can assume λ₀ = 0. First we assume that condition (1) holds. We set D = dom(A) and thus AD = X. For u ∈ dom(A) we set u_n = A_n^{−1}P_n Au, and with x = Au,

E_n u_n − u = E_n A_n^{−1}P_n x − A^{−1}x → 0

and

E_n A_n u_n − Au = E_n A_n A_n^{−1}P_n x − AA^{−1}x = E_n P_n x − x → 0

as n → ∞. Hence condition (C.2) holds.

Conversely, we assume conditions (C.1)–(C.2) hold. Then

E_n A_n^{−1}P_n − A^{−1} = E_n(A_n^{−1}P_n A − P_n)A^{−1} + (E_n P_n − I)A^{−1}.

For x ∈ AD we choose u ∈ D such that x = Au and set ū_n = A_n^{−1}P_n x = A_n^{−1}P_n Au. For this u we choose u_n according to (C.2). Thus, we obtain

|u_n − P_n u| = |P_n(E_n u_n − u)| ≤ M|E_n u_n − u| → 0

as n → ∞ and

|ū_n − u_n| ≤ |A_n^{−1}||A_n u_n − P_n Au| ≤ |A_n^{−1}||P_n||E_n A_n u_n − Au| → 0

as n → ∞. It thus follows that

|E_n A_n^{−1}P_n x − A^{−1}x| ≤ |E_n(ū_n − P_n u)| + |E_n P_n u − u| ≤ M(|ū_n − P_n u| + |E_n P_n u − u|) → 0

as n → ∞ for all x ∈ AD, since |ū_n − P_n u| ≤ |ū_n − u_n| + |u_n − P_n u| → 0.

Example 1 (Trotter–Kato theorem) Consider the heat equation on Ω = (0, 1) × (0, 1):

d/dt u(t) = ∆u,  u(0, x) = u₀(x)

with boundary condition u = 0 at the boundary ∂Ω. We use the central difference approximation on the uniform grid points (ih, jh) ∈ Ω with mesh size h = 1/n:

d/dt u_{i,j}(t) = (∆_h u)_{i,j} = (1/h)((u_{i+1,j} − u_{i,j})/h − (u_{i,j} − u_{i−1,j})/h) + (1/h)((u_{i,j+1} − u_{i,j})/h − (u_{i,j} − u_{i,j−1})/h)

for 1 ≤ i, j ≤ n−1, where u_{i,0} = u_{i,n} = u_{0,j} = u_{n,j} = 0 at the boundary nodes. First, let X = C(Ω) and X_n = R^{(n−1)²} with the sup norm. Let E_n u_{i,j} be the piecewise linear interpolation and (P_n u)_{i,j} = u(ih, jh) the pointwise evaluation. We will prove that ∆_h is dissipative on X_n. Suppose u_{i,j} = |u_n|_∞. Then, since

λu_{i,j} − (∆_h u)_{i,j} = f_{i,j}

and

−(∆_h u)_{i,j} = (1/h²)(4u_{i,j} − u_{i+1,j} − u_{i,j+1} − u_{i−1,j} − u_{i,j−1}) ≥ 0,

we have

0 ≤ u_{i,j} ≤ f_{i,j}/λ.

Thus, ∆_h is dissipative on X_n with the sup norm. Next, let X = L²(Ω) and X_n be equipped with the ℓ² norm. Then,

(−∆_h u_n, u_n) = ∑_{i,j} |(u_{i,j} − u_{i−1,j})/h|² + |(u_{i,j} − u_{i,j−1})/h|² ≥ 0,

and thus ∆_h is dissipative on X_n with the ℓ² norm.
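A minimal Python sketch of the five-point difference Laplacian above, together with a numerical check of the ℓ²-dissipativity (−∆_h u, u) ≥ 0, is given below. The grid size and the random test vector are illustrative choices.

```python
import numpy as np

n = 32                      # interior grid is (n-1) x (n-1), mesh size h = 1/n (illustrative)
h = 1.0 / n

def laplacian_h(u):
    """Apply Delta_h to an (n-1)x(n-1) array of interior values, with u = 0 on the boundary."""
    up = np.pad(u, 1)                        # pad with zero boundary values
    return (up[2:, 1:-1] + up[:-2, 1:-1] + up[1:-1, 2:] + up[1:-1, :-2]
            - 4.0 * up[1:-1, 1:-1]) / h**2

rng = np.random.default_rng(0)
u = rng.standard_normal((n - 1, n - 1))
print("(-Delta_h u, u) =", -np.sum(laplacian_h(u) * u) * h**2)   # nonnegative
```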

Example 2 (Galerkin method) Let V ⊂ H = H∗ ⊂ V∗ be the Gelfand triple. Consider the parabolic equation

ρ(d/dt u(t), φ) = a(u(t), φ)  (4.45)

for all φ ∈ V, where ρ is a symmetric mass form with

ρ(φ, φ) ≥ c|φ|²_H

and a is a bounded coercive bilinear form on V × V such that

a(φ, φ) ≥ δ|φ|²_V.

Define A by

ρ(Au, φ) = a(u, φ) for all φ ∈ V.

By the Lax–Milgram theorem

(λI − A)u = f ∈ H

has a unique solution satisfying

λρ(u, φ) − a(u, φ) = (f, φ)_H

for all φ ∈ V. Let dom(A) = (I − A)^{−1}H. Assume

V_n = {u = ∑ a_k φ^n_k, φ^n_k ∈ V} is dense in V.

Consider the Galerkin method, i.e. u_n(t) ∈ V_n satisfies

ρ(d/dt u_n(t), φ) = a(u_n(t), φ) for all φ ∈ V_n.

For the resolvent equation, let u_n ∈ V_n satisfy

λρ(u_n, φ) + a(u_n, φ) = (f, φ) for φ ∈ V_n,

and let ū_n ∈ V_n be an approximation of u, so that

λρ(ū_n, φ) + a(ū_n, φ) = λρ(ū_n − u, φ) + a(ū_n − u, φ) + (f, φ) for φ ∈ V_n.

Subtracting,

λρ(ū_n − u_n, φ) + a(ū_n − u_n, φ) = λρ(ū_n − u, φ) + a(ū_n − u, φ).

Thus,

|ū_n − u_n| ≤ (M/δ)|ū_n − u|_V.
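For concreteness, the following Python sketch assembles the standard piecewise-linear (P1) Galerkin mass and stiffness matrices on (0, 1) and advances the semi-discrete system by backward Euler. The grid, the time step, the initial data and the dissipative sign convention ρ(u_t, φ) = −a(u, φ) are illustrative assumptions, not prescriptions from the text.

```python
import numpy as np

n = 64
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)[1:-1]        # interior nodes (Dirichlet data u = 0)

# P1 mass matrix (rho) and stiffness matrix (a) on a uniform grid.
Mmat = np.diag(np.full(n - 1, 4.0 * h / 6.0)) \
     + np.diag(np.full(n - 2, h / 6.0), 1) + np.diag(np.full(n - 2, h / 6.0), -1)
Kmat = (np.diag(np.full(n - 1, 2.0)) - np.diag(np.ones(n - 2), 1)
        - np.diag(np.ones(n - 2), -1)) / h

# Backward Euler: (M + dt K) u^{m+1} = M u^m, unconditionally stable for this coercive form.
dt = 1e-3
u = np.sin(np.pi * x)
A = Mmat + dt * Kmat
for _ in range(100):
    u = np.linalg.solve(A, Mmat @ u)
print("max u after 100 steps:", u.max())      # decays roughly like exp(-pi^2 t)
```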

Example 2 (Discontinuous Galerkin method) Consider the parabolic system for u = ⃗u ∈ L²(Ω)^d,

∂_t u = ∇·(a(x)∇u) + c(x)u,

where a ∈ R^{d×d} is symmetric and

a̲|ξ|² ≤ (ξ, a(x)ξ)_{R^d} ≤ ā|ξ|²,  ξ ∈ R^d,

for 0 < a̲ ≤ ā < ∞. The region Ω is divided into n non-overlapping subdomains Ωᵢ with boundaries ∂Ωᵢ such that Ω = ∪Ωᵢ. At the interface Γᵢⱼ = ∂Ωᵢ ∩ ∂Ωⱼ define

[[u]] = u|_{∂Ωᵢ} − u|_{∂Ωⱼ},  ⟨⟨u⟩⟩ = (1/2)(u|_{∂Ωᵢ} + u|_{∂Ωⱼ}).

The approximate solution u_h(t) lies in

V_h = {u_h ∈ L²(Ω) : u_h is linear on each Ωᵢ}.

Define the bilinear form on V_h × V_h

a_h(u, v) = ∑ᵢ ∫_{Ωᵢ} (a(x)∇u, ∇v) dx − ∑_{i>j} ∫_{Γᵢⱼ} ( ⟨⟨n·(a∇u)⟩⟩[[v]] ± ⟨⟨n·(a∇v)⟩⟩[[u]] + (β/h)[[u]][[v]] ) ds,

where h is the mesh size and β > 0 is sufficiently large. With the + sign in the third term a_h is symmetric, and with the − sign a_h enjoys the coercivity

a_h(u, u) ≥ ∑ᵢ ∫_{Ωᵢ} (a(x)∇u, ∇u) dx,  u ∈ V_h,


regardless of β > 0.

Example 3 (Population dynamics) Consider the transport equation

∂p/∂t + ∂p/∂x + m(x)p(x, t) = 0,  p(0, t) = ∫ β(x)p(x, t) dx.

Define the difference approximation

A_n p = ( −(pᵢ − pᵢ₋₁)/h − m(xᵢ)pᵢ, 1 ≤ i ≤ n ),  p₀ = ∑ᵢ βᵢ pᵢ.

Then,

(A_n p, sign₀(p)) ≤ ∑ᵢ (mᵢ − βᵢ)|pᵢ| ≤ 0.

Thus, A_n on L¹(0, 1) is dissipative. The consistency is checked by estimating the difference (A·, E_n φ) − (P_n A_n ·, φ).
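The following Python sketch shows the upwind difference operator A_n with the renewal boundary condition and one explicit time integration of the semi-discrete system. The mortality rate m, the birth kernel β, the grid and the explicit Euler step are illustrative assumptions.

```python
import numpy as np

n = 200
h = 1.0 / n
x = np.arange(1, n + 1) * h
m = 0.5 + 0.0 * x                            # mortality rate m(x) (illustrative)
beta = 2.0 * np.exp(-10.0 * (x - 0.5) ** 2)  # renewal kernel beta(x) (illustrative)

def A_n(p):
    p0 = h * np.sum(beta * p)                # renewal boundary value p(0) = int beta p dx
    pm1 = np.concatenate(([p0], p[:-1]))
    return -(p - pm1) / h - m * p            # (A_n p)_i = -(p_i - p_{i-1})/h - m_i p_i

dt = 0.5 * h                                 # CFL-type restriction for the explicit step
p = np.exp(-50.0 * (x - 0.2) ** 2)
for _ in range(400):
    p = p + dt * A_n(p)
print("total population:", h * p.sum())
```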

Example 4 (Yee's scheme) Consider the two dimensional Maxwell equations on the staggered grid; i.e., E = (E¹_{i−1/2,j}, E²_{i,j−1/2}) is defined at the sides and H = H_{i−1/2,j−1/2} is defined at the center of the cell Ω_{i,j} = ((i−1)h, ih) × ((j−1)h, jh):

ε_{i−1/2,j} d/dt E¹_{i−1/2,j} = −(H_{i−1/2,j+1/2} − H_{i−1/2,j−1/2})/h,

ε_{i,j+1/2} d/dt E²_{i,j+1/2} = (H_{i+1/2,j+1/2} − H_{i−1/2,j+1/2})/h,

μ_{i−1/2,j−1/2} d/dt H_{i−1/2,j−1/2} = (E²_{i,j−1/2} − E²_{i−1,j−1/2})/h − (E¹_{i−1/2,j} − E¹_{i−1/2,j−1})/h,  (4.46)

where E¹_{i−1/2,j} = 0 for j = 0, N and E²_{i,j−1/2} = 0 for i = 0, N.

Since

∑_{i=1}^{N} ∑_{j=1}^{N} [ −(H_{i−1/2,j+1/2} − H_{i−1/2,j−1/2})/h · E¹_{i−1/2,j} + (H_{i+1/2,j+1/2} − H_{i−1/2,j+1/2})/h · E²_{i,j+1/2} + ( (E²_{i,j−1/2} − E²_{i−1,j−1/2})/h − (E¹_{i−1/2,j} − E¹_{i−1/2,j−1})/h ) H_{i−1/2,j−1/2} ] = 0,

(4.46) is uniformly dissipative. The range condition (λI − A_h)(E, H) = (f, g) ∈ X_h is equivalent to the minimization over E of

(1/2) ∑ ( ε_{i−1/2,j} |E¹_{i−1/2,j}|² + ε_{i,j+1/2} |E²_{i,j+1/2}|² ) + (1/2) ∑ (1/μ_{i,j}) ( |(E²_{i,j−1/2} − E²_{i−1,j−1/2})/h|² + |(E¹_{i−1/2,j} − E¹_{i−1/2,j−1})/h|² )

− ∑ ( f¹_{i−1/2,j} − (1/μ_{i,j})(g_{i−1/2,j+1/2} − g_{i−1/2,j−1/2})/h ) E¹_{i−1/2,j} − ∑ ( f²_{i,j+1/2} + (1/μ_{i,j})(g_{i+1/2,j+1/2} − g_{i−1/2,j+1/2})/h ) E²_{i,j+1/2}.
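A short Python sketch of a leapfrog (Yee) time discretization of the staggered-grid system (4.46), with perfectly conducting boundary (tangential E = 0), follows. The time stepping, constant ε, μ, grid size and Gaussian initial data are illustrative assumptions; the semi-discrete system itself conserves the discrete energy, consistent with the identity above.

```python
import numpy as np

N, h = 100, 1.0 / 100
eps = mu = 1.0
dt = 0.5 * h * np.sqrt(eps * mu)          # CFL-type restriction dt <= h / (c * sqrt(2))

E1 = np.zeros((N, N + 1))                 # E1 at (i+1/2, j);   E1 = 0 at j = 0, N
E2 = np.zeros((N + 1, N))                 # E2 at (i, j+1/2);   E2 = 0 at i = 0, N
X, Y = np.meshgrid((np.arange(N) + 0.5) * h, (np.arange(N) + 0.5) * h, indexing="ij")
H = np.exp(-200.0 * ((X - 0.5) ** 2 + (Y - 0.5) ** 2))   # H at cell centers

for _ in range(200):
    # E-updates (interior unknowns only; boundary values stay zero)
    E1[:, 1:N] += -(dt / (eps * h)) * (H[:, 1:] - H[:, :-1])
    E2[1:N, :] += (dt / (eps * h)) * (H[1:, :] - H[:-1, :])
    # H-update from the discrete curl of E
    H += (dt / (mu * h)) * ((E2[1:, :] - E2[:-1, :]) - (E1[:, 1:] - E1[:, :-1]))

energy = 0.5 * (eps * (E1**2).sum() + eps * (E2**2).sum() + mu * (H**2).sum()) * h**2
print("discrete energy:", energy)         # remains essentially constant in time
```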

Example 5 (Legendre–Tau method)


5 Dissipative Operators and Semigroups of Nonlinear Contractions

In this section we consider

du/dt ∈ Au(t),  u(0) = u₀ ∈ X,

for a dissipative mapping A on a Banach space X.

Definition (Dissipative) A mapping A on a Banach space X is dissipative if

|x1 − x2 − λ (y1 − y2)| ≥ |x1 − x2| for all λ > 0 and [x1, y1], [x2, y2] ∈ A,

or equivalently

〈y1 − y2, x1 − x2〉− ≤ 0 for all [x1, y1], [x2, y2] ∈ A.

and if in addition R(I − λA) = X, then A is m-dissipative.

In particular, it follows that if A is dissipative, then for λ > 0 the operator (I − λA)^{−1} is single-valued and nonexpansive on R(I − λA), i.e.,

|(I − λA)^{−1}x − (I − λA)^{−1}y| ≤ |x − y| for all x, y ∈ R(I − λA).

Define the resolvent J_λ and the Yosida approximation A_λ of A by

J_λ x = (I − λA)^{−1}x ∈ dom(A),  x ∈ dom(J_λ) = R(I − λA),
A_λ x = λ^{−1}(J_λ x − x),  x ∈ dom(J_λ).  (5.1)
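A minimal Python sketch of these objects for a scalar example is given below. The dissipative map A(x) = −x³ is an illustrative choice (not from the text); J_λ y solves x + λx³ = y, here by Newton's method, and A_λ y → A(y) as λ → 0+.

```python
import numpy as np

def J(lam, y, iters=50):
    """Resolvent J_lambda = (I - lambda A)^{-1} for A(x) = -x**3: solve x + lam*x**3 = y."""
    x = y
    for _ in range(iters):
        x -= (x + lam * x**3 - y) / (1.0 + 3.0 * lam * x**2)   # Newton step
    return x

def A_yosida(lam, y):
    """Yosida approximation A_lambda y = (J_lambda y - y) / lambda."""
    return (J(lam, y) - y) / lam

y = 2.0
for lam in (1.0, 0.1, 0.01, 0.001):
    print(lam, A_yosida(lam, y))          # approaches A(y) = -8 as lambda -> 0+
```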

We summarize some fundamental properties of Jλ and Aλ in the following theorem.

Theorem 1.4 Let A be an ω-dissipative subset of X × X, i.e.,

|x₁ − x₂ − λ(y₁ − y₂)| ≥ (1 − λω)|x₁ − x₂|

for all 0 < λ < ω^{−1} and [x₁, y₁], [x₂, y₂] ∈ A, and define ‖Ax‖ by

‖Ax‖ = inf{|y| : y ∈ Ax}.

Then for 0 < λ < ω^{−1}:

(i) |J_λ x − J_λ y| ≤ (1 − λω)^{−1}|x − y| for x, y ∈ dom(J_λ).

(ii) A_λ x ∈ AJ_λ x for x ∈ R(I − λA).

(iii) For x ∈ dom(J_λ) ∩ dom(A), |A_λ x| ≤ (1 − λω)^{−1}‖Ax‖ and thus |J_λ x − x| ≤ λ(1 − λω)^{−1}‖Ax‖.

(iv) If x ∈ dom(J_λ) and λ, μ > 0, then

(μ/λ)x + ((λ − μ)/λ)J_λ x ∈ dom(J_μ)

and

J_λ x = J_μ( (μ/λ)x + ((λ − μ)/λ)J_λ x ).


(v) If x ∈ dom(J_λ) ∩ dom(A) and 0 < μ ≤ λ < ω^{−1}, then (1 − λω)|A_λ x| ≤ (1 − μω)|A_μ x|.

(vi) A_λ is ω(1 − λω)^{−1}-dissipative and for x, y ∈ dom(J_λ), |A_λ x − A_λ y| ≤ λ^{−1}(1 + (1 − λω)^{−1})|x − y|.

Proof: (i)–(ii) If x, y ∈ dom(J_λ) and we set u = J_λ x and v = J_λ y, then there exist û ∈ Au and v̂ ∈ Av such that x = u − λû and y = v − λv̂. Thus, from (1.8),

|J_λ x − J_λ y| = |u − v| ≤ (1 − λω)^{−1}|u − v − λ(û − v̂)| = (1 − λω)^{−1}|x − y|.

Next, by definition, A_λ x = λ^{−1}(u − x) = û ∈ Au = AJ_λ x.

(iii) Let x ∈ dom(J_λ) ∩ dom(A) and let x̂ ∈ Ax be arbitrary. Then we have J_λ(x − λx̂) = x since x − λx̂ ∈ (I − λA)x. Thus,

|A_λ x| = λ^{−1}|J_λ x − x| = λ^{−1}|J_λ x − J_λ(x − λx̂)| ≤ (1 − λω)^{−1}λ^{−1}|x − (x − λx̂)| = (1 − λω)^{−1}|x̂|,

which implies (iii).

(iv) If x ∈ dom(J_λ) = R(I − λA) then we have x = u − λû for [u, û] ∈ A and thus u = J_λ x. For μ > 0,

(μ/λ)x + ((λ − μ)/λ)J_λ x = (μ/λ)(u − λû) + ((λ − μ)/λ)u = u − μû ∈ R(I − μA) = dom(J_μ)

and

J_μ( (μ/λ)x + ((λ − μ)/λ)J_λ x ) = J_μ(u − μû) = u = J_λ x.

(v) From (i) and (iv) we have

λ|A_λ x| = |J_λ x − x| ≤ |J_λ x − J_μ x| + |J_μ x − x|

≤ |J_μ( (μ/λ)x + ((λ − μ)/λ)J_λ x ) − J_μ x| + |J_μ x − x|

≤ (1 − μω)^{−1}|(μ/λ)x + ((λ − μ)/λ)J_λ x − x| + |J_μ x − x|

= (1 − μω)^{−1}(λ − μ)|A_λ x| + μ|A_μ x|,

which implies (v) by rearranging.

(vi) It follows from (i) that for ρ > 0

|x − y − ρ(A_λ x − A_λ y)| = |(1 + ρ/λ)(x − y) − (ρ/λ)(J_λ x − J_λ y)|

≥ (1 + ρ/λ)|x − y| − (ρ/λ)|J_λ x − J_λ y|

≥ ((1 + ρ/λ) − (ρ/λ)(1 − λω)^{−1})|x − y| = (1 − ρω(1 − λω)^{−1})|x − y|.

The last assertion follows from the definition of A_λ and (i).

Theorem 1.5 (1) A dissipative set A ⊂ X × X is m-dissipative if and only if

R(I − λ0A) = X for some λ0 > 0.


(2) An m-dissipative mapping is maximal dissipative, i.e., every dissipative set containing A in X × X coincides with A.

(3) If X = X∗ = H is a Hilbert space, then the notions of maximal dissipative set and m-dissipative set are equivalent.

Proof: (1) Suppose R(I − λ₀A) = X. Then it follows from Theorem 1.4 (i) that J_{λ₀} is a contraction on X. We note that

I − λA = (λ/λ₀)(I − (1 − λ₀/λ)J_{λ₀})(I − λ₀A)  (5.2)

for 0 < λ < ω^{−1}. For given x ∈ X define the operator T: X → X by

Ty = x + (1 − λ₀/λ)J_{λ₀}y,  y ∈ X.

Then

|Ty − Tz| ≤ |1 − λ₀/λ||y − z|,

where |1 − λ₀/λ| < 1 if 2λ > λ₀. By the Banach fixed-point theorem the operator T has a unique fixed point z ∈ X, i.e., x = (I − (1 − λ₀/λ)J_{λ₀})z. Thus,

x ∈ (I − (1 − λ₀/λ)J_{λ₀})(I − λ₀A) dom(A),

and it thus follows from (5.2) that R(I − λA) = X if λ > λ₀/2. Hence, (1) follows by applying the above argument repeatedly.

(2) Assume A is m-dissipative and suppose Ã is a dissipative set containing A. We need to show that Ã ⊂ A. Let [x, x̂] ∈ Ã. Since x − λx̂ ∈ X = R(I − λA) for λ > 0, there exists [y, ŷ] ∈ A such that x − λx̂ = y − λŷ. Since A ⊂ Ã it follows that [y, ŷ] ∈ Ã and thus

|x − y| ≤ |x − y − λ(x̂ − ŷ)| = 0.

Hence, [x, x̂] = [y, ŷ] ∈ A.

(3) It suffices to show that if A is maximal dissipative, then A is m-dissipative. We use the following extension lemma, Lemma 1.6. Let y be any element of H. By Lemma 1.6, taking C = H, there exists x ∈ H such that

(ξ − x, η − x + y) ≤ 0 for all [ξ, η] ∈ A,

and thus

(ξ − x, η − (x − y)) ≤ 0 for all [ξ, η] ∈ A.

Since A is maximal dissipative, this implies that [x, x − y] ∈ A, that is, x − y ∈ Ax, and therefore y ∈ R(I − A).

Lemma 1.6 Let A be dissipative and C be a closed, convex, non-empty subset of H such that dom(A) ⊂ C. Then for every y ∈ H there exists x ∈ C such that

(ξ − x, η − x + y) ≤ 0 for all [ξ, η] ∈ A.

Proof: Without loss of generality we can assume that y = 0, for otherwise we define A_y = {[ξ, η + y] : [ξ, η] ∈ A} with dom(A_y) = dom(A). Since A is dissipative if and only if A_y is dissipative, we can prove the lemma for A_y. For [ξ, η] ∈ A, define the set

C([ξ, η]) = {x ∈ C : (ξ − x, η − x) ≤ 0}.

Thus, the lemma is proved if we can show that ⋂_{[ξ,η]∈A} C([ξ, η]) is non-empty.


5.0.1 Properties of m-dissipative operators

In this section we discuss some properties of m-dissipative sets.

Lemma 1.7 Let X∗ be a strictly convex Banach space. If A is maximal dissipative,then Ax is a closed convex set of X for each x ∈ dom (A).

Proof: It follows from Lemma that the duality mapping F is single-valued. First, weshow that Ax is convex. Let x1, x2 ∈ Ax and set x = αx1 + (1− α) x2 for 0 ≤ α ≤ 1.Then, Since A is dissipative, for all [y, y] ∈ A

Re 〈x− y, F (x− y)〉 = αRe 〈x1 − y, F (x− y)〉+ (1− α)Re 〈x2 − y, F (x− y)〉 ≤ 0.

Thus, if we define a subset A by

Az =

Az if z ∈ dom (A) \ x

Ax ∪ x if z = x,

then A is a dissipative extension of A and dom (A) = dom (A). Since A is maximaldissipative, it follows that Ax = Ax and thus x ∈ Ax as desired.

Next, we show that Ax is closed. Let xn ∈ Ax and limn→∞ xn = x. Since Ais dissipative, Re 〈xn − y, x − y〉 ≤ 0 for all [y, y] ∈ A. Letting n → ∞, we obtainRe 〈x− y, x− y〉 ≤ 0. Hence, as shown above x ∈ Ax as desired.

Definition 1.4 A subset A of X × X is said to be demiclosed if x_n → x, y_n ⇀ y (weakly) and [x_n, y_n] ∈ A imply that [x, y] ∈ A. A subset A is closed if [x_n, y_n] ∈ A, x_n → x and y_n → y imply that [x, y] ∈ A.

Theorem 1.8 Let A be m-dissipative. Then the followings hold.(i) A is closed.(ii) If xλ ⊂ X such that xλ → x and Aλxλ → y as λ→ 0+, then [x, y] ∈ A.

Proof: (i) Let [xn, xn] ∈ A and (xn, xn) → (x, x) in X × X. Since A is dissipativeRe 〈xn − y, xn − y〉i ≤ 0 for all [y, y] ∈ A. Since 〈·, ·〉i is lower semicontinuous, lettingn → ∞, we obtain Re 〈x − y, x − y〉i ≤ for all [y, y] ∈ A. Then A1 = [x, x] ∪ A is adissipative extension of A. Since A is maximal dissipative, A1 = A and thus [x, x] ∈ A.Hence, A is closed.(ii) Since Aλx is a bounded set in X, by the definition of Aλ, lim |Jλxλ − xλ| → 0and thus Jλxλ → x as λ → 0+. But, since Aλxλ ∈ AJλxλ, it follows from (i) that[x, y] ∈ A.

Theorem 1.9 Let A be m-dissipative and let X∗ be uniformly convex. Then thefollowings hold.(i) A is demiclosed.(ii) If xλ ⊂ X such that xλ → x and |Aλx| is bounded as λ → 0+, then x ∈dom (A). Moreover, if for some subsequence Aλnxn y, then y ∈ Ax.(iii) limλ→0+ |Aλx| = ‖Ax‖.

Proof: (i) Let [xn, xn] ∈ A be such that lim xn = x and w − lim xn = x as n → ∞.Since X∗ is uniformly convex, from Lemma the duality mapping is single-valued anduniformly continuous on the bounded subsets of X. Since A is dissipative Re 〈xn −y, F (xn − y)〉 ≤ 0 for all [y, y] ∈ A. Thus, letting n→∞, we obtain Re 〈x− y, F (x−y)〉 ≤ 0 for all [y, y] ∈ A. Thus, [x, x] ∈ A, by the maximality of A.


Definition 1.5 The minimal section A0 of A is defined by

A0x = {y ∈ Ax : |y| = ‖Ax‖} with dom(A0) = {x ∈ dom(A) : A0x is non-empty}.

Lemma 1.10 Let X∗ be a strictly convex Banach space and let A be maximal dissipative. Then the following hold.
(i) If X is strictly convex, then A0 is single-valued.
(ii) If X is reflexive, then dom(A0) = dom(A).
(iii) If X is strictly convex and reflexive, then A0 is single-valued and dom(A0) = dom(A).

Theorem 1.11 Let X∗ is a uniformly convex Banach space and let A be m-dissipative.Then the followings hold.(i) limλ→0+ F (Aλx) = F (A0x) for each x ∈ dom (A).

Moreover, if X is also uniformly convex, then(ii) limλ→0+ Aλx = A0x for each x ∈ dom (A).

Proof: (1) Let x ∈ dom (A). By (ii) of Theorem 1.2

|Aλx| ≤ ‖Ax‖

Since Aλx is a bounded sequence in a reflexive Banach space (i.e., since X∗ isuniformly convex, X∗ is reflexive and so is X), there exists a weak convergent sub-sequence Aλnx. Now we set y = w − limn→∞ Aλnx. Since from Theorem 1.2Aλnx ∈ AJλnx and limn→∞ Jλnx = x and from Theorem 1.10 A is demiclosed, itfollows that [x, y] ∈ A. Since by the lower-semicontinuity of norm this implies

‖Ax‖ ≤ |y| ≤ lim inf_{n→∞} |A_{λ_n}x| ≤ lim sup_{n→∞} |A_{λ_n}x| ≤ ‖Ax‖,

we have |y| = ‖Ax‖ = limn→∞ |Aλnx| and thus y ∈ A0x. Next, since |F (Aλnx)| =|Aλnx| ≤ ‖Ax‖, F (Aλnx) is a bounded sequence in the reflexive Banach space X∗

and has a weakly convergent subsequence F (Aλkx) of F (Aλnx). If we set y∗ = w −limk→∞ F (Aλkx), then it follows from the dissipativity of A that

Re〈A_{λ_k}x − y, F(A_{λ_k}x)〉 = λ_k^{−1} Re〈A_{λ_k}x − y, F(J_{λ_k}x − x)〉 ≤ 0,

or equivalently |Aλkx|2 ≤ Re 〈y, F (Aλnx)〉. Letting k →∞, we obtain |y|2 ≤ Re 〈y, y∗〉.Combining this with

|y∗| ≤ limk→∞

|F (Aλkx)| = limk→∞

|Aλkx| = |y|,

we have|y∗| ≤ Re 〈y, y∗〉 ≤ |〈y, y∗〉| ≤ |y||y∗| ≤ |y|2.

Hence,〈y, y∗〉 = |y|2 = |y∗|2

and we have y∗ = F (y). Also, limk→∞ |F (Aλkx)| = |y| = |F (y)|. It thus follows fromthe uniform convexity of X∗ that

limk→∞

F (Aλkx) = F (y).


Since Ax is a closed convex set of X from Theorem , we can show that x → F (A0x)is single-valued. In fact, if C is a closed convex subset of X, the y is an element ofminimal norm in C, if and only if

|y| ≤ |(1− α)y + α z| for all z ∈ C and 0 ≤ α ≤ 1.

Hence,〈z − y, y〉+ ≥ 0.

and from Theorem 1.10

0 ≤ Re 〈z − y, f〉 = Re 〈z, f〉 − |y|2 (5.3)

for all z ∈ C and f ∈ F (y). Now, let y1, y2 be arbitrary in A0x. Then, from (5.3)

|y1|2 ≤ Re 〈y2, F (y1)〉 ≤ |y1||y2|

which implies that 〈y2, F (y1)〉 = |y2|2 and |F (y1)| = |y2|. Therefore, F (y1) = F (y2) asdesired. Thus, we have shown that for every sequence λ of positive numbers thatconverge to zero, the sequence F (Aλx) has a subsequence that converges to the samelimit F (A0x). Therefore, limλ→0 F (Aλx)→ F (A0x).

Furthermore, we assume that X is uniformly convex. We have shown above thatfor x ∈ dom (A) the sequence Aλ contains a weak convergent subsequence Aλnxand if y = w − limn→∞ Aλnx then [x, y] ∈ A0 and |y| = limn→∞ |Aλnx|. But sinceX is uniformly convex, it follows from Theorem 1.10 that A0 is single-valued and thusy = A0x. Hence, w − limn→∞ Aλnx = A0x and limn→∞ |Aλnx| = |A0x|. Since X isuniformly convex, this implies that limn→∞ Aλnx = A0x.

Theorem 1.12 Let X is a uniformly convex Banach space and let A be m-dissipative.Then dom (A) is a convex subset of X.

Proof: It follows from Theorem 1.4 that

|Jλx− x| ≤ λ ‖Ax‖ for x ∈ dom (A)

Hence |Jλx− x| → 0 as λ→ 0+. Since Jλx ∈ dom (A) for X ∈ X, it follows that

dom (A) = x ∈ X : |Jλx− x| → 0 as λ→ 0+.

Let x1, x2 ∈ dom (A) and 0 ≤ α ≤ 1 and set

x = αx1 + (1− α)x2.

Then, we have

|J_λ x − x₁| ≤ |x − x₁| + |J_λ x₁ − x₁|,  |J_λ x − x₂| ≤ |x − x₂| + |J_λ x₂ − x₂|,  (5.4)

where x− x1 = (1− α) (x2 − x1) and x− x2 = α (x1 − x2). Since Jλx is a boundedset and a uniformly convex Banach space is reflexive, it follows that there exists asubsequence Jλnx that converges weakly to z. Since the norm is weakly lower semi-continuous, letting n→∞ in (5.4) with λ = λn, we obtain

|z − x1| ≤ (1− α) |x1 − x2|

|z − x2| ≤ α |x1 − x2|


Thus,|x1 − x2| = |(x1 − z) + (z − x2)| ≤ |x1 − z|+ |z − x2| ≤ |x1 − x2|

and therefore |x1−z| = (1−α) |x1−x2|, |z−x2| = α |x1−x2| and |(x1−z)+(z−x2)| =|x1 − x2|. But, since X is uniformly convex we have z = x and w − lim Jλnx = x asn→∞. Since we also have

|x− x1| ≤ lim infn→∞

|Jλnx− Jλnx1| ≤ |x− x1|

|Jλnx− Jλnx1| → |x− x1| and w − lim Jλnx− Jλnx1 = x− x1 as n→∞. Since X isuniformly convex, this implies that limλ→0+ Jλx = x and x ∈ dom (A).

5.0.2 Generation of Nonlinear Semigroups

In this section we consider the generation of nonlinear semigroups by the Crandall–Liggett theory on a Banach space X.

Definition 2.1 Let X₀ be a subset of X. A semigroup S(t), t ≥ 0, of nonlinear contractions on X₀ is a function with domain [0, ∞) × X₀ and range in X₀ satisfying the following conditions:

S(t+ s)x = S(t)S(s)x and S(0)x = x for x ∈ X0, t, s ≥ 0

t→ S(t)x ∈ X is continuous

|S(t)x− S(t)y| ≤ |x− y| for t ≥ 0, x, y ∈ X0

Let A be a ω-dissipative operator and Jλ = (I −λA)−1 is the resolvent. The followingestimate plays an essential role in the Crandall-Liggett generation theory.

Lemma 2.1 Assume a sequence a_{n,m} of positive numbers satisfies

a_{n,m} ≤ α a_{n−1,m−1} + (1 − α) a_{n−1,m}  (5.5)

and a_{0,m} ≤ mλ and a_{n,0} ≤ nμ for λ ≥ μ > 0 and α = μ/λ. Then we have the estimate

a_{n,m} ≤ [(mλ − nμ)² + mλ²]^{1/2} + [(mλ − nμ)² + nλμ]^{1/2}.  (5.6)

Proof: From the assumption, (5.6) holds for either m = 0 or n = 0. We use induction in n, m; that is, if (5.6) holds for (n+1, m) whenever (5.6) is true for (n, m) and (n, m−1), then (5.6) holds for all (n, m). Let β = 1 − α. We assume that (5.6) holds for (n, m) and (n, m−1). Then, by (5.5) and the Cauchy–Schwarz inequality αx + βy ≤ (α + β)^{1/2}(αx² + βy²)^{1/2},

a_{n+1,m} ≤ α a_{n,m−1} + β a_{n,m}

≤ α([((m−1)λ − nμ)² + (m−1)λ²]^{1/2} + [((m−1)λ − nμ)² + nλμ]^{1/2})

+ β([(mλ − nμ)² + mλ²]^{1/2} + [(mλ − nμ)² + nλμ]^{1/2})

≤ (α + β)^{1/2}(α[((m−1)λ − nμ)² + (m−1)λ²] + β[(mλ − nμ)² + mλ²])^{1/2}

+ (α + β)^{1/2}(α[((m−1)λ − nμ)² + nλμ] + β[(mλ − nμ)² + nλμ])^{1/2}

≤ [(mλ − (n+1)μ)² + mλ²]^{1/2} + [(mλ − (n+1)μ)² + (n+1)λμ]^{1/2}.


Here, we used α + β = 1, αλ = μ and

α[((m−1)λ − nμ)² + (m−1)λ²] + β[(mλ − nμ)² + mλ²] = (mλ − nμ)² + mλ² − 2αλ(mλ − nμ) = (mλ − (n+1)μ)² + mλ² − μ²,

α[((m−1)λ − nμ)² + nλμ] + β[(mλ − nμ)² + nλμ] ≤ (mλ − nμ)² + (n+1)λμ − 2αλ(mλ − nμ) ≤ (mλ − (n+1)μ)² + (n+1)λμ − μ².

Theorem 2.2 Assume A is a dissipative subset of X × X and satisfies the range condition

dom(A) ⊂ R(I − λA) for all sufficiently small λ > 0.  (5.7)

Then, there exists a semigroup S(t) of type ω on the closure of dom(A) that satisfies, for x in the closure of dom(A),

S(t)x = lim_{λ→0+} (I − λA)^{−[t/λ]} x,  t ≥ 0,  (5.8)

and

|S(t)x − S(s)x| ≤ |t − s| ‖Ax‖ for x ∈ dom(A) and t, s ≥ 0.
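As an illustration of the exponential formula (5.8), the following Python sketch iterates the resolvent for the scalar dissipative map A(x) = −x³ (an illustrative choice, not from the text). The Cauchy problem u′ = −u³, u(0) = x₀ has the exact solution u(t) = x₀/√(1 + 2tx₀²), so the backward iterates can be checked directly.

```python
import numpy as np

def J(lam, y, iters=50):
    """Resolvent (I - lam A)^{-1} for A(x) = -x**3: solve x + lam*x**3 = y by Newton's method."""
    x = y
    for _ in range(iters):
        x -= (x + lam * x**3 - y) / (1.0 + 3.0 * lam * x**2)
    return x

def S(t, x0, lam):
    """Approximate S(t)x0 by (I - lam A)^{-[t/lam]} x0, cf. (5.8)."""
    x = x0
    for _ in range(int(t / lam)):
        x = J(lam, x)
    return x

t, x0 = 1.0, 2.0
exact = x0 / np.sqrt(1.0 + 2.0 * t * x0**2)
for lam in (0.1, 0.01, 0.001):
    print(lam, S(t, x0, lam), exact)       # converges to the exact value as lam -> 0+
```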

Proof: First, note that from (5.7), dom(A) ⊂ dom(J_λ). Let x ∈ dom(A) and set a_{n,m} = |J_μ^n x − J_λ^m x| for n, m ≥ 0. Then, from Theorem 1.4,

a_{0,m} = |x − J_λ^m x| ≤ |x − J_λ x| + |J_λ x − J_λ² x| + ··· + |J_λ^{m−1} x − J_λ^m x| ≤ m|x − J_λ x| ≤ mλ‖Ax‖.

Similarly, a_{n,0} = |J_μ^n x − x| ≤ nμ‖Ax‖. Moreover,

a_{n,m} = |J_μ^n x − J_λ^m x| ≤ |J_μ^n x − J_μ( (μ/λ)J_λ^{m−1} x + ((λ−μ)/λ)J_λ^m x )|

≤ (μ/λ)|J_μ^{n−1} x − J_λ^{m−1} x| + ((λ−μ)/λ)|J_μ^{n−1} x − J_λ^m x|

= α a_{n−1,m−1} + (1 − α)a_{n−1,m}.

It thus follows from Lemma 2.1 that

|J_μ^{[t/μ]} x − J_λ^{[t/λ]} x| ≤ 2(λ² + λt)^{1/2} ‖Ax‖.  (5.9)

Thus, J_λ^{[t/λ]} x converges to S(t)x uniformly on bounded intervals as λ → 0+. Since J_λ^{[t/λ]} is nonexpansive, so is S(t). Hence (5.8) holds. Next, we show that S(t) satisfies the semigroup property S(t+s)x = S(t)S(s)x for x in the closure of dom(A) and t, s ≥ 0. Letting μ → 0+ in (5.9), we obtain

|S(t)x − J_λ^{[t/λ]} x| ≤ 2(λ² + λt)^{1/2} ‖Ax‖  (5.10)

for x ∈ dom(A). If we let x = J_λ^{[s/λ]} z, then x ∈ dom(A) and

|S(t)J_λ^{[s/λ]} z − J_λ^{[t/λ]} J_λ^{[s/λ]} z| ≤ 2(λ² + λt)^{1/2} ‖Az‖,  (5.11)


where we used that ‖A J_λ^{[s/λ]} z‖ ≤ ‖Az‖ for z ∈ dom(A). Since [(t+s)/λ] − ([t/λ] + [s/λ]) equals 0 or 1, we have

|J_λ^{[(t+s)/λ]} z − J_λ^{[t/λ]} J_λ^{[s/λ]} z| ≤ |J_λ z − z| ≤ λ‖Az‖.  (5.12)

It thus follows from (5.10)–(5.12) that

|S(t+s)z − S(t)S(s)z| ≤ |S(t+s)z − J_λ^{[(t+s)/λ]} z| + |J_λ^{[(t+s)/λ]} z − J_λ^{[t/λ]} J_λ^{[s/λ]} z|

+ |J_λ^{[t/λ]} J_λ^{[s/λ]} z − S(t)J_λ^{[s/λ]} z| + |S(t)J_λ^{[s/λ]} z − S(t)S(s)z| → 0

as λ → 0+. Hence, S(t+s)z = S(t)S(s)z.

Finally, since

|J_λ^{[t/λ]} x − x| ≤ [t/λ]|J_λ x − x| ≤ t‖Ax‖,

we have |S(t)x − x| ≤ t‖Ax‖ for x ∈ dom(A). From this we obtain |S(t)x − x| → 0 as t → 0+ for x in the closure of dom(A), and also

|S(t)x − S(s)x| ≤ |S(t−s)x − x| ≤ (t−s)‖Ax‖

for x ∈ dom(A) and t ≥ s ≥ 0.

6 Cauchy Problem

Definition 3.1 Let x₀ ∈ X and ω ∈ R. Consider the Cauchy problem

d/dt u(t) ∈ Au(t),  u(0) = x₀.  (6.1)

(1) A continuous function u(t): [0, T] → X is called a strong solution of (6.1) if u(t) is Lipschitz continuous with u(0) = x₀, strongly differentiable a.e. t ∈ [0, T], and (6.1) holds a.e. t ∈ [0, T].

(2) A continuous function u(t): [0, T] → X is called an integral solution of type ω of (6.1) if u(t) satisfies

|u(t) − x| − |u(t̂) − x| ≤ ∫_{t̂}^{t} (ω|u(s) − x| + 〈y, u(s) − x〉₊) ds  (6.2)

for all [x, y] ∈ A and t ≥ t̂ in [0, T].

Theorem 3.1 Let A be a dissipative subset of X × X. Then the strong solution to (6.1) is unique. Moreover, if the range condition (5.7) holds, then the strong solution u(t): [0, ∞) → X to (6.1) is given by

u(t) = lim_{λ→0+} (I − λA)^{−[t/λ]} x for x ∈ dom(A) and t ≥ 0.

Proof: Let uᵢ(t), i = 1, 2, be strong solutions to (6.1). Then t → |u₁(t) − u₂(t)| is Lipschitz continuous and thus a.e. differentiable for t ≥ 0. Thus

d/dt |u₁(t) − u₂(t)|² = 2〈u′₁(t) − u′₂(t), u₁(t) − u₂(t)〉ᵢ


a.e. t ≥ 0. Since u′ᵢ(t) ∈ Auᵢ(t), i = 1, 2, from the dissipativeness of A we have d/dt|u₁(t) − u₂(t)|² ≤ 0 and therefore

|u₁(t) − u₂(t)|² ≤ ∫_0^t (d/ds)|u₁(s) − u₂(s)|² ds ≤ 0,

which implies u₁ = u₂.

For 0 < 2λ < s let u_λ(t) = (I − λA)^{−[t/λ]} x and define g_λ(t) = λ^{−1}(u(t) − u(t−λ)) − u′(t) a.e. t ≥ λ. Since lim_{λ→0+} |g_λ(t)| = 0 a.e. t > 0 and |g_λ(t)| ≤ 2M for a.e. t ∈ [λ, s], where M is a Lipschitz constant of u(t) on [0, s], it follows that lim_{λ→0+} ∫_λ^s |g_λ(t)| dt = 0 by the Lebesgue dominated convergence theorem. Next, since

u(t−λ) + λg_λ(t) = u(t) − λu′(t) ∈ (I − λA)u(t),

we have u(t) = (I − λA)^{−1}(u(t−λ) + λg_λ(t)). Hence,

|u_λ(t) − u(t)| ≤ |(I − λA)^{−[(t−λ)/λ]} x − u(t−λ) − λg_λ(t)| ≤ |u_λ(t−λ) − u(t−λ)| + λ|g_λ(t)|

a.e. t ∈ [λ, s]. Integrating this on [λ, s], we obtain

λ^{−1} ∫_{s−λ}^{s} |u_λ(t) − u(t)| dt ≤ λ^{−1} ∫_0^λ |u_λ(t) − u(t)| dt + ∫_λ^s |g_λ(t)| dt.

Letting λ → 0+, it follows from Theorem 2.2 that |S(s)x − u(s)| = 0 since u is Lipschitz continuous, which shows the desired result.

In general, the semigroup S(t) generated on the closure of dom(A) in Theorem 2.2 is not necessarily strongly differentiable. In fact, there is an example of an m-dissipative A satisfying (5.7) for which the semigroup constructed in Theorem 2.2 is not even weakly differentiable for any t ≥ 0. Hence, by Theorem 3.1 the corresponding Cauchy problem (6.1) does not have a strong solution. However, we have the following.

Theorem 3.2 Let A be an ω-dissipative subset satisfying (5.7) and let S(t), t ≥ 0, be the semigroup on the closure of dom(A) constructed in Theorem 2.2. Then the following hold.
(1) u(t) = S(t)x is an integral solution of type ω of the Cauchy problem (6.1).
(2) If v(t) ∈ C(0, T; X) is an integral solution of type ω of (6.1), then |v(t) − u(t)| ≤ e^{ωt}|v(0) − u(0)|.
(3) The Cauchy problem (6.1) has a unique integral solution in the closure of dom(A) in the sense of Definition 3.1.

Proof: A simple modification of the proof of Theorem 2.2 shows that for x0 ∈ dom (A)

S(t)x0 = limλ→0+

(I − λA)−[ tλ

]x0

exists and defines the semigroup S(t) of nonlinear ω-contractions on dom (A), i.e.,

|S(t)x− S(t)y| ≤ eωt |x− y| for t ≥ 0 and x, y ∈ dom (A).


For x0 ∈ dom (A) we define for λ > 0 and k ≥ 1

ykλ = λ−1(Jkλx0 − Jk−1λ x0) = AλJ

k−1λ x0 ∈ AJkλ . (6.3)

Since A is ω-dissipative, 〈yλ,k − y, Jkλx0 − x)〉− ≤ ω |Jkλx0 − x)| for [x, y] ∈ A. Sincefrom Lemma 1.1 (4) 〈y, x〉− − 〈z, x〉+ ≤ 〈y − z, x〉−, it follows that

〈ykλ, Jkλx0 − x〉− ≤ ω |Jkλx0 − x)|+ 〈y, Jkλx0 − x〉+ (6.4)

Since from Lemma 1.1 (3) 〈x+ y, x〉− = |x|+ 〈y, x〉−, we have

〈λykλ, Jkλx0 − x)〉− = |Jkλx0 − x|+ 〈−(Jk−1λ − x), Jkλx0 − x〉≥|Jkλx0 − x)| − |Jk−1

λ x0 − x|.

It thus follows from (6.4) that

|Jkλx0 − x| − |Jk−1λ x0 − x| ≤ λ (ω |Jkλx0 − x|+ 〈y, Jkλx0 − x〉+).

Since J [ tλ

] = Jkλ on t ∈ [kλ, (k + 1)λ), this inequality can be written as

|Jkλx0 − x| − |Jk−1λ x0 − x| ≤

∫ (k+1)λ

kλ(ω |J [ t

λλ x0 − x|+ 〈y, J

[ tλ

]

λ x0 − x〉+ dt.

Hence, summing up this in k from k = [ tλ ] + 1 to [ tλ ] we obtain

|J [ tλ

]

λ x0 − x| − |J[ tλ

]

λ x0 − x| ≤∫ [ t

λ]λ

[ tλ

]λ(ω |J [ s

λ]

λ x0 − x|+ 〈y, J[ sλ

]

λ x0 − x〉+ ds.

Since |J [ sλ

]

λ x0| ≤ (1 − λω)−k |x0| ≤ ekλω |x0|, by Lebesgue dominated convergence the-orem and the upper semicontinuity of 〈·, ·〉+, letting λ→ 0+ we obtain

|S(t)x0 − x| − |S(t)x0 − x| ≤∫ t

t(ω |S(s)x0 − x|+ 〈y, S(t)x0 − x〉+) ds (6.5)

for x0 ∈ dom (A). Similarly, since S(t) is Lipschitz continuous on dom (A), again byLebesgue dominated convergence theorem and the upper semicontinuity of 〈·, ·〉+, (6.5)holds for all x0 ∈ dom (A).(2) Let v(t) ∈ C(0, T ;X) be an integral of type ω to (6.1). Since [Jλk x0, y

kλ] ∈ A, it

follows from (6.2) that

|v(t)− Jkλx0| − |v(t)− Jkλx0| ≤∫ t

t(ω |v(s)− Jkλx0|+ 〈ykλ, v(s)− Jkλx0〉+) ds. (6.6)

Since λ ykλ = −(v(s)−Jkλx0) +(v(s)−Jk−1λ x0) and from Lemma 1.1 (3) 〈−x+y, x〉+ =

−|x|+ 〈y, x〉+, we have

〈λykλ, v(s)−Jkλx0〉 = −|v(s)−Jkλx0|+〈v(s)−Jk−1λ x0, v(s)−Jkλx0〉+ ≤ −|v(s)−Jkλx0|+|v(s)−Jk−1

λ x0|.

Thus, from (6.6)

(|v(t)−Jkλx0|−|v(t)−Jkλx0|)λ ≤∫ t

t(ωλ |v(s)−Jkλx0|−|v(s)−Jkλx0|+|v(s)−Jk−1

λ x0|) ds


Summing up the both sides of this in k from [ τλ ] + 1 to [ τλ ], we obtain∫ [ τλ

[ τλ

]λ(|v(t)− J [σ

λ]

λ x0| − |v(t− J [σλ

]

λ x0| dσ

≤∫ t

t(−|v(s)− J [ τ

λ]

λ x0|+ |v(s)− J [ τλ

]

λ x0|+∫ [ τ

λ]λ

[ τλ

]λω |v(s)− J [σ

λ]

λ x0| dσ) ds.

Now, by Lebesgue dominated convergence theorem, letting λ→ 0+∫ τ

τ(|v(t)− u(σ)| − |v(t− u(σ)|) dσ +

∫ t

t(|v(s)− u(τ)| − |v(s)− u(τ)|) ds

≤∫ t

t

∫ τ

τω |v(s)− u(σ)| dσ ds.

(6.7)

For h > 0 we define Fh by

Fh(t) = h−2

∫ t+h

t

∫ t+h

t|v(s)− u(σ)| dσ ds.

Then from (6.7) we haved

dtFh(t) ≤ ω Fh(t) and thus Fh(t) ≤ eωtFh(0). Since u, v are

continuous we obtain the desired estimate by letting h→ 0+.

Lemma 3.3 Let A be an ω-dissipative subset satisfying (7.3) and S(t), t ≥ 0 be thesemigroup on dom (A), as constructed in Theorem 2.2. Then, for x0 ∈ dom (A) and[x, y] ∈ A

|S(t)x0 − x|2 − |S(t)x0 − x|2 ≤ 2

∫ t

t(ω |S(s)x0 − x|2 + 〈y, S(s)x0 − x〉s ds (6.8)

and for every f ∈ F (x0 − x)

lim supt→0+

Re 〈S(t)x0 − x0

t, f〉 ≤ ω |x0 − x|2 + 〈y, x0 − x〉s. (6.9)

Proof: Let yλk be defined by (6.3). Since A is ω-dissipative, there exists f ∈ F (Jkλx0−x)such that Re (ykλ − y, f〉 ≤ ω |Jkλx0 − x|2. Since

Re 〈ykλ, f〉 = λ−1Re 〈Jkλx0 − x− (Jk−1λ x0 − x), f〉

≥ λ−1(|Jkλx0 − x|2 − |Jk−1λ x0 − x||Jkλx0 − x|) ≥ (2λ)−1(|Jkλx0 − x|2 − |Jk−1

λ x0 − x|2),

we have from Theorem 1.4

|Jkλx0 − x|2 − |Jk−1λ x0 − x|2 ≤ 2λRe 〈ykλ, f〉 ≤ 2λ (ω |Jkλx0 − x|2 + 〈y, Jkλx0 − x〉s.

Since J[ tλ

]

λ x0 = Jkλx0 on [kλ, (k + 1)λ), this can be written as

|Jkλx0 − x|2 − |Jk−1λ x0 − x|2 ≤

∫ (k+1)λ

kλ(ω |J [ t

λλ x0 − x|2 + 〈y, J [ t

λ]

λ x0 − x〉s) dt.


Hence,

|J [ tλ

]

λ x0 − x|2 − |J[ tλ

]

λ x0 − x|2 ≤ 2λ

∫ [ tλ

[ tλ

]λ(ω |J [ s

λ]

λ x0 − x|2 + 〈y, J [ sλ

]

λ x0 − x〉s) ds.

Since |J [ sλ

]

λ x0| ≤ (1 − λω)−k |x0| ≤ ekλω |x0|, by Lebesgue dominated convergence the-orem and the upper semicontinuity of 〈·, ·〉s, letting λ→ 0+ we obtain (6.8).

Next, we show (6.9). For any given f ∈ F (x0 − x) as shown above

2Re 〈S(t)x0 − x0, f〉 ≤ |S(t)x0 − x|2 − |x0 − x|2.

Thus, from (6.8)

Re 〈S(t)x0 − x0, f〉 ≤∫ t

0(ω |S(s)x0 − x|2 + 〈y, S(s)x0 − x〉s) ds

Since s → S(s)x0 is continuous, by the upper semicontinuity of 〈·, ·〉s, we have (6.9).

Theorem 3.4 Assume that A is a close dissipative subset of X ×X and satisfies therange condition (7.3) and let S(t), t ≥ 0 be the semigroup on dom (A), defined inTheorem 2.2. Then, if S(t)x is strongly differentiable at t0 > 0 then

S(t0)x ∈ dom (A) andd

dtS(t)x |t=t0 ∈ AS(t)x,

and moreover

S(t0)x ∈ dom (A0) andd

dtS(t)x

∣∣t=t0 = A0S(t0)x.

Proof: Let ddtS(t)x |t=t0 = y . Then S(t0 − λ)x− (S(t0)x− λ y) = o(λ), where |o(λ)|

λ →0 as λ → 0+. Since S(t0 − λ)x ∈ dom (A), there exists a [xλ, yλ] ∈ A such thatS(t0 − λ)x = xλ − λ yλ and

λ (y − yλ) = S(t0)x− xλ + o(λ). (6.10)

If we let x = xλ, y = yλ and x0 = S(t0)x in (6.9), then we obtain

Re 〈y, f〉 ≤ ω |S(t0)x− xλ|2 + 〈yλ, S(t0)x− xλ〉s.

for all f ∈ (S(t0)x−xλ). It follows from Lemma 1.3 that there exists a g ∈ F (S(t0)x−xλ) such that 〈yλ, S(t)x− xλ〉s = Re 〈yλ, g〉 and thus

Re 〈y − yλ, g〉 ≤ ω |S(t0)x− xλ|2.

From (6.10)

λ−1|S(t0)x− xλ|2 ≤|o(λ)|λ|S(t0)x− xλ|+ ω |S(t0)x− xλ|2

and thus

(1− λω)

∣∣∣∣S(t0)x− xλλ

∣∣∣∣→ 0 as λ→ 0+.


Combining this with (6.10), we obtain

xλ → S(t0)x and yλ → y

as λ → 0+. Since A is closed, it follows that [S(t0)x, y] ∈ A, which shows the firstassertion.

Next, from Theorem 2.2

|S(t0 + λ)x− S(t0)x| ≤ λ ‖AS(t0)x‖ for λ > 0

This implies that |y| ≤ ‖AS(t0)x‖. Since y ∈ AS(t0)x, it follows that S(t0)x ∈dom (A0) and y ∈ A0S(t0)x.

Theorem 3.5 Let A be a dissipative subset of X ×X satisfying the range condition(7.3) and S(t), t ≥ 0 be the semigroup on dom (A), defined in Theorem 2.2. Then thefollowings hold.(1) For ever x ∈ dom (A)

limλ→0+

|Aλx| = lim inft→0+

|S(t)x− x|.

(2) Let x ∈ dom (A). Then, limλ→0+ |Aλx| <∞ if and only if there exists a sequencexn in dom (A) such that limn→∞ xn = x and supn ‖Axn‖ <∞.

Proof: Let x ∈ dom (A). From Theorem 1.4 |Aλx| is monotone decreasing and

lim |Aλx| exists (including ∞). Since |J [ tλ

]

λ − x| ≤ t |Aλx|, it follows from Theorem2.2 that

|S(t)x− x| ≤ t limλ→0+

|Aλx|

thus

lim inft→0+

1

t|S(t)x− x| ≤ lim

λ→0+|Aλx|.

Conversely, from Lemma 3.3

− lim inft→0+

1

t|S(t)x− x||x− u| ≤ 〈v, x− u〉s

for [u, v] ∈ A. We set u = Jλx and v = Aλx. Since x− u = −λAλx,

− lim inft→0+

1

t|S(t)x− x|λ|Aλx| ≤ −λ |Aλx|2

which implies

lim inft→0+

1

t|S(t)x− x| ≥ |Aλx|.

Theorem 3.7 Let A be a dissipative set of X ×X satisfying the range condition (7.3)and let S(t), t ≥ 0 be the semigroup defined on dom (A) in Theorem 2.2.

(1) If x ∈ dom (A) and S(t)x is differentiable a.e., t > 0, then u(t) = S(t)x, t ≥ 0is a unique strong solution of the Cauchy problem (6.1).

(2) If X is reflexive. Then, if x ∈ dom (A) then u(t) = S(t)x, t ≥ 0 is a uniquestrong solution of the Cauchy problem (6.1).

Proof: The assertion (1) follows from Theorems 3.1 and 3.5. If X is reflexive, then sincean X-valued absolute continuous function is a.e. strongly differentiable, (2) followsfrom (1).


6.0.1 Infinitesimal generator

Definition 4.1 Let X0 be a subset of a Banach space X and S(t), t ≥ 0 be a semigroupof nonlinear contractions on X0. Set Ah = h−1(T (h) − I) for h > 0 and define thestrong and weak infinitesimal generators A0 and Aw by

A0x = limh→0+ Ahx with dom (A0) = x ∈ X0 : limh→0+ Ahx exists

A0x = w − limh→0+ Ahx with dom (A0) = x ∈ X0 : w − limh→0+ Ahx exists,(6.11)

respectively. We define the set D by

D = x ∈ X0 : lim infh→0

|Ahx| <∞ (6.12)

Theorem 4.1 Let S(t), t ≥ 0 be a semigroup of nonlinear contractions defined on aclosed subset X0 of X. Then the followings hold.(1) 〈Awx1 − Awx2, x

∗〉 for all x1, x2 ∈ dom (Aw) and x∗ ∈ F (x1 − x2). In particular,A0 and Aw are dissipative.

(2) If X is reflexive, then dom (A0) = dom (Aw) = D.(3) If X is reflexive and strictly convex, then dom(Aw) = D. In addition, if X isuniformly convex, then dom (Aw) = dom (A0) = D and Aw = A0.

Proof: (1) For x1, x2 ∈ X0 and x∗ ∈ F (x1 − x2) we have

〈Ahx1 −Ahx2, x∗〉 = h−1(〈S(h)x1 − S(h)x2, x

∗〉 − |x1 − x2|2)

≤ h−1(|S(h)x1 − S(h)x2||x1 − x2| − |x1 − x2|2) ≤ 0.

Letting h→ 0+, we obtain the desired inequality.(2) Obviously, dom (A0) ⊂ dom (Aw) ⊂ D. Let x ∈ D. It suffices to show thatx ∈ dom (A0). We can show that t → S(t)x is Lipschitz continuous. In fact, thereexists a monotonically decreasing sequence tk of positive numbers and L > 0 suchthat tk → 0 as k → ∞ and |S(tk)x − x| ≤ L tk. Let h > 0 and nk be a nonnegativeinteger such that 0 ≤ h− nktk < tk. Then we have

|S(t+ h)x− S(t)x| ≤ |S(t)x− x| = |S(h− nktk + nktk)x− x|

≤ |S(h− nktk)x− x|+ Lnktk ≤ |S(h− nktk)x− x|+ Lh

By the strong continuity of S(t)x at t = 0, letting k → ∞, we obtain |S(t + h)x −S(t)x| ≤ Lh. Now, since X is reflexive this implies that S(t)x is a.e. differentiableon (0,∞). But since S(t)x ∈ dom (A0) whenever d

dtS(t)x exists, S(t)x ∈ dom (A0) a.e.

t > 0. Thus, since |S(t)x− x| → 0 as t→ 0+, it follows that x ∈ dom (A0).(3) Assume that X is reflexive and strictly convex. Let x0 ∈ D and Y be the set of allweak cluster points of t−1(S(t)x0−x0) as t→ 0+. Let A be a subset of X ×X definedby

A = A0 ∪ [x0, co Y ] and dom (A) = dom (A0) ∪ x0

where co Y denotes the closure of the convex hull of Y . Note that from (1)

〈A0x1 −A0x2, x∗〉 ≤ 0 for all x1, x2 ∈ dom (A0) and x∗ ∈ F (x1 − x2)


and for every y ∈ Y

〈A0x1 − y, x∗〉 ≤ 0 for all x1 ∈ dom (A0) and x∗ ∈ F (x1 − x0).

This implies that A is a dissipative subset ofX×X. But, sinceX is reflexive, t→ S(t)x0

is a.e. differentiable and

d

dtS(t)x0 = A0S(t)x0 ∈ AS(t)x0, a.e., t > 0.

It follows from the dissipativity of A that

〈 ddt

(S(t)x0 − x0), x∗〉 ≤ 〈y, x∗〉, a.e, t > 0 and y ∈ Ax0 (6.13)

for x∗ ∈ F (S(t)x0 − x0). Note that for h > 0

〈h−1(S(t+h)x0−S(t)x0), x∗〉 ≤ h−1(|S(t+h)x0−x0|−|S(t)x0−x0|)|x∗| for x∗ ∈ F (S(t)x0−x0).

Letting h→ 0+, we have

〈 ddt

(S(t)x0 − x0), x∗〉 ≤ |S(t)x0 − x0|d

dt|S(t)x0 − x0|, a.e. t > 0

The converse inequality follows much similarly. Thus, we have

|S(t)x0−x0|d

dt|S(t)x0−x0| = 〈

d

dt(S(t)x0−x0), x∗〉 for x∗ ∈ F (S(t)x0−x0). (6.14)

It follows from (6.13)–(6.14) thatd

dt|S(t)x0 − x0| ≤ |y| for y ∈ Y and a.e. t > 0 and

thus|S(t)x0 − x0| ≤ t ‖Ax0‖ for all t > 0. (6.15)

Note that Ax0 = co Y is a closed convex subset of X. Since X is reflexive and strictlyconvex, there exists a unique element y0 ∈ Ax0 such that |y0| = ‖Ax0‖. Hence, (6.15)implies that co Y = y0 = Awx0 and therefore x0 ∈ dom (Aw).

Next, we assume that X is uniformly convex and let x0 ∈ dom(Aw) = D. Then

w − limS(t)x0 − x0

t= y0 as t→ 0+.

From (6.15)|t−1(S(t)x0 − x0)| ≤ |y0|, a.e. t > 0.

Since X is uniformly convex, these imply that

limS(t)x0 − x0

t= y0 as t→ 0+.

which completes the proof.

Theorem 4.2 Let X and X∗ be uniformly convex Banach spaces. Let S(t), t ≥0 be the semigroup of nonlinear contractions on a closed subset X0 and A0 be theinfinitesimal generator of S(t). If x ∈ dom (A0), then


(i) S(t)x ∈ dom (A0) for all t ≥ 0 and the function t→ A0S(t)x is right continuous on[0,∞).

(ii) S(t)x has a right derivative d+

dt S(t)x for t ≥ 0 and d+

dt S(t)x = A0S(t)x, t ≥ 0.

(iii) ddtS(t)x exists and is continuous except a countable number of values t ≥ 0.

Proof: (i)–(ii) Let x ∈ dom(A0). By Theorem 4.1, dom(A0) = D and thus S(t)x ∈ dom(A0) and

(d+/dt)S(t)x = A0S(t)x for t ≥ 0. (6.16)

Moreover, t → S(t)x is a.e. differentiable and (d/dt)S(t)x = A0S(t)x a.e. t > 0. We next prove that A0S(t)x is right continuous. For h > 0

d/dt (S(t+h)x − S(t)x) = A0S(t+h)x − A0S(t)x, a.e. t > 0.

From (6.14)

|S(t+h)x − S(t)x| d/dt |S(t+h)x − S(t)x| = 〈A0S(t+h)x − A0S(t)x, x∗〉 ≤ 0

for all x∗ ∈ F(S(t+h)x − S(t)x), since A0 is dissipative. Integrating this over [s, t], we obtain

|S(t+h)x − S(t)x| ≤ |S(s+h)x − S(s)x| for 0 ≤ s ≤ t

and therefore

|(d+/dt)S(t)x| ≤ |(d+/ds)S(s)x|.

Hence t → |A0S(t)x| is a monotonically non-increasing function and thus it is right continuous. Let t0 ≥ 0 and let tk be a decreasing sequence of positive numbers such that tk → t0. Without loss of generality, we may assume that w-lim_{k→∞} A0S(tk)x = y0. The right continuity of |A0S(t)x| at t = t0 thus implies that

|y0| ≤ |A0S(t0)x| (6.17)

since the norm is weakly lower semicontinuous. Let A be the maximal dissipative extension of A0. It then follows from Theorem 1.9 that A is demiclosed and thus y0 ∈ AS(t0)x. On the other hand, for x ∈ dom(A0) and y ∈ Ax, we have

〈d/dt (S(t)x − x), x∗〉 ≤ 〈y, x∗〉 for all x∗ ∈ F(S(t)x − x)

a.e. t > 0, since A is dissipative and (d/dt)S(t)x = A0S(t)x ∈ AS(t)x a.e. t > 0. From (6.14) we have

t−1|S(t)x − x| ≤ ‖Ax‖ = |A⁰x| for t ≥ 0,

where A⁰ is the minimal section of A. Hence A⁰x = A0x. It thus follows from (6.17) that y0 = A0S(t0)x and lim_{k→∞} A0S(tk)x = y0 since X is uniformly convex. Thus, we have proved the right continuity of A0S(t)x for t ≥ 0.

(iii) Integrating (6.16) over [t, t+h], we have

S(t+h)x − S(t)x = ∫_t^{t+h} A0S(s)x ds


for t, h ≥ 0. Hence it suffices to prove that the function t → A0S(t)x is continuous except at a countable number of t > 0. Using the same arguments as above, we can show that if |A0S(t)x| is continuous at t = t0, then A0S(t)x is continuous at t = t0. But since |A0S(t)x| is monotone non-increasing, it has at most countably many discontinuities, which completes the proof.

Theorem 4.3 Let X and X∗ be uniformly convex Banach spaces. If A is m-dissipative, then A is demiclosed, dom(A) is a closed convex set and the minimal section A⁰ is a single-valued operator with dom(A⁰) = dom(A). Moreover, A⁰ is the infinitesimal generator of a semigroup of contractions on dom(A).

Proof: It follows from Theorem 1.9 that A is demiclosed. The second assertion follows from Theorem 1.12. Also, from Theorem 3.4

d/dt S(t)x = A⁰S(t)x, a.e. t > 0

and

|S(t)x − x| ≤ t |A⁰x|, t > 0 (6.18)

for x ∈ dom(A). Let A0 be the infinitesimal generator of the semigroup S(t), t ≥ 0 generated by A as defined in Theorem 2.2. Then, (6.18) and Theorem 4.1 imply that x ∈ dom(A0), and by Theorem 4.2, (d+/dt)S(t)x = A0S(t)x and A0S(t)x is right continuous in t. Since A is closed,

A0x = lim_{t→0+} A0S(t)x ∈ Ax.

Hence, (6.17) implies that A0x = A⁰x.

When X is a Hilbert space we have the nonlinear version of the Hille–Yosida theorem as follows.

Theorem 4.4 Let H be a Hilbert space. Then,
(1) The infinitesimal generator A0 of a semigroup of contractions S(t), t ≥ 0 on a closed convex set X0 has a dense domain in X0 and there exists a unique maximal dissipative operator A such that A0 = A⁰.
Conversely,
(2) If A is a maximal dissipative operator, then dom(A) is a closed convex set and A⁰ is the infinitesimal generator of a semigroup of contractions on dom(A).

Proof: (2) Since from Theorem 1.7 a maximal dissipative operator in a Hilbert space is m-dissipative, (2) follows from Theorem 4.3.

Example (Nonlinear Diffusion) Consider the nonlinear diffusion equation of the form

ut = Au = ∆γ(u) − β(u)

on X = L1(Ω). Assume γ : R → R is maximal monotone and β : R → R is monotone. Let

dom(A) = {u ∈ X : there exists a v ∈ W1,1(Ω) such that v ∈ γ(u) and ∆v ∈ X}.

Thus, sign(x − y) = sign(γ(x) − γ(y)). Let ρ ∈ C2(R) be a monotonically increasing function satisfying ρ(0) = 0 and ρ(x) = sign(x) for |x| ≥ 1, and set ρε(x) = ρ(x/ε) for ε > 0. Note that for u ∈ X

(u, ρε(u)) → |u| and (ψ, ρε(u)) → (ψ, sign0(u)) for ψ ∈ X


as ε → 0+. Then

(Au1 − Au2, ρε(γ(u1) − γ(u2))) = −(ρ′ε(γ(u1) − γ(u2)) ∇(γ(u1) − γ(u2)), ∇(γ(u1) − γ(u2))) − (β(u1) − β(u2), ρε(γ(u1) − γ(u2))) ≤ 0.

Letting ε → 0+ we obtain

(Au1 − Au2, sign0(u1 − u2)) ≤ 0

for all u1, u2 ∈ dom(A). For the range condition we consider v ∈ γ(u) such that

λ γ−1(v) − ∆v + β(γ−1(v)) = f. (6.19)

Let ∂j = λ γ−1(·) + β(γ−1(·)). For f ∈ L2(Ω) consider the minimization of

∫Ω ( ½|∇v|2 + j(v) − f(x)v ) dx

over v ∈ H1_0(Ω). It has a unique solution v ∈ H2(Ω) ∩ H1_0(Ω) such that

∆v + f ∈ ∂j(v).

For f ∈ X we choose fn ∈ L2(Ω) such that |fn − f|X → 0 as n → ∞. As shown above, un = ∆vn + fn and

|un − um|X ≤ M |fn − fm|X.

Thus, there exists u ∈ X such that ∆vn → u − f. Moreover, there exists a v ∈ W1,q(Ω), 1 < q < d/(d−1), such that vn → v in X and ∆v = u − f. In fact, let p > d. For all h0 ∈ Lp(Ω) and ~h ∈ Lp(Ω)d,

−∆φ = h0 + ∇·~h

has a unique solution φ ∈ W1,p(Ω) and

|φ|W1,p ≤ M (|h0|p + |~h|p).

By Green's formula

|(h0, vn) − (~h, ∇vn)| = |−(∆vn, φ)| ≤ M (|h0|p + |~h|p) |∆vn|1.

Since h0 ∈ Lp(Ω) and ~h ∈ Lp(Ω)d are arbitrary,

|vn|W1,q ≤ M |∆vn|1.

Since W1,q(Ω) is compactly embedded into L1(Ω), along a subsequence

vn → v, v ∈ W1,q_0(Ω).

Since ∂j is maximal monotone, u ∈ ∂j(v), equivalently u ∈ γ−1(v), and ∆v + f ∈ ∂j(v).
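A finite-dimensional caricature of the range condition (6.19) can be obtained by discretizing one resolvent (backward Euler) step in one space dimension. The sketch below assumes γ(u) = u|u|^{m−1}, homogeneous Dirichlet data, a uniform grid and the SciPy nonlinear solver; all of these are illustrative choices, not the construction used in the text.

import numpy as np
from scipy.optimize import fsolve

# Sketch (illustrative assumptions): one resolvent step
#     u - lam * (gamma(u))_xx = f   on (0,1),  gamma(u) = u|u|^{m-1},
# with zero Dirichlet boundary values on a uniform finite-difference grid,
# i.e. a discrete version of u = (I - lam*A)^{-1} f for A u = (gamma(u))_xx.

n, lam, m = 100, 0.01, 3
x = np.linspace(0.0, 1.0, n + 2)[1:-1]              # interior nodes
h = x[1] - x[0]
f = np.maximum(0.0, 1.0 - 10.0 * (x - 0.5) ** 2)    # sample right-hand side

def gamma(u):
    return u * np.abs(u) ** (m - 1)

def laplacian(v):
    # second difference with zero boundary values
    vp = np.concatenate(([0.0], v, [0.0]))
    return (vp[:-2] - 2.0 * vp[1:-1] + vp[2:]) / h**2

def residual(u):
    return u - lam * laplacian(gamma(u)) - f

u = fsolve(residual, f)                              # discrete resolvent applied to f
print("max residual:", np.max(np.abs(residual(u))))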


Example (Conservation law) We consider the scalar conservation law

ut + (f(u))x + f0(x, u) = 0, t > 0, u(x, 0) = u0(x), x ∈ Rd (6.20)

where f : R → Rd is C1. Let X = L1(Rd) and define

Au = −(f(u))x,

where we assume f0 = 0 for the sake of simplicity of our presentation. Let

C = {φ ∈ Z : φ ≥ 0}.

Since Ac = 0 for all constants c, it follows that

φ − c ∈ C =⇒ (I − λA)−1φ − c ∈ C.

Similarly,

c − φ ∈ C =⇒ c − (I − λA)−1φ ∈ C.

Thus, without loss of generality, one can assume f is bounded. We use the following lemma.

Lemma 2.1 For ψ ∈ H1(Rd) and φ ∈ L2_div = {φ ∈ (L2(Rd))d : ∇·φ ∈ L2(Rd)} we have

(φ, ∇ψ) + (∇·φ, ψ) = 0.

Proof: Note that for ζ ∈ C∞_0(Rd)

(φ, ∇(ζ ψ)) + (∇·φ, ζ ψ) = 0.

Let g ∈ C∞(Rd) satisfy g = 1 for |x| ≤ 1 and g = 0 for |x| ≥ 2, and set ζ = g(x/r). Then we have

(φ, ζ∇ψ + (1/r) ψ∇g) + (∇·φ, ζ ψ) = 0.

Since g(x/r) → 1 a.e. in Rd as r → ∞, the lemma follows from Fatou's lemma.

Note that

−(f(u1)x − f(u2)x, ρε(u1 − u2)) = (f(u1) − f(u2), ρ′ε(u1 − u2)(u1 − u2)x).

If we define Ψ(x) = ∫_0^x σρ′(σ) dσ and ρε(x) = ρ(x/ε) for ε > 0, then

|(η(u1 − u2), ρ′ε(u1 − u2)(u1 − u2)x)| = ε (Ψ((u1 − u2)/ε), ηx) ≤ M ε |ηx|1 → 0

as ε → 0. Note that for u ∈ L1(Rd)

(u, ρε(u)) → |u| and (ψ, ρε(u)) → (ψ, sign0(u)) for ψ ∈ L1(Rd)

as ε → 0+. Thus,

〈Au1 − Au2, sign0(u1 − u2)〉 ≤ 0

and A is dissipative. It will be shown that

range(λ I − A) = X,


i.e., for any g ∈ X there exists an entropy solution satisfying

(sign(u − k)(λu − g), ψ) ≤ (sign(u − k)(f(u) − f(k)), ψx)

for all nonnegative ψ ∈ C1_0(Rd) and k ∈ R. Hence A has a maximal monotone extension in L1(Rd). In fact, for ε > 0 consider the viscous equation

λu − ε∆u + f(u)x = g. (6.21)

First, assuming f is Lipschitz continuous, one can show that

u → λu − ε∆u + f(u)x : H1(Rd) → (H1(Rd))∗

is monotone, hemi-continuous and coercive. It thus follows from [] that (6.21) has a solution u ∈ H1(Rd) for all g ∈ L2(Rd) ∩ L1(Rd). Since f(u)x ∈ L2(Rd), u ∈ H2(Rd).

L∞(Rd) estimate. From (6.21)

(λu − ε∆u + f(u)x, |u|^{p−2}u) = (g, |u|^{p−2}u).

Since

λ|u|^p_p + (f′(u)ux, |u|^{p−2}u) − ε(p − 1)(|u|^{p/2−1}u ux, |u|^{p/2−1}u ux) ≤ (1 − δ/2 − (|f′|∞/(2εδ))(p − 1)) |u|^p_p,

we have

|u|p ≤ ( 1/(2λ) − (|f′|∞/(2ελ))(p − 1) )^{−1} |g|p.

By letting p → ∞,

|u|∞ ≤ (1/λ) |g|∞.

Thus, without loss of generality, f ∈ C1(R)d.

W1,1(Rd) estimate. Assuming g ∈ W1,1(Rd), v = ux satisfies

λ v − ε∆v + (f′(u)v)x = gx.

Using the same arguments as above,

|v|1 ≤ |gx|1.

Entropy condition. We show that for every k ∈ R and every nonnegative function ψ ∈ C∞_0(Rd)

(sign(u − k)(λu − g), ψ) ≤ (sign(u − k)(f(u) − f(k)), ψx) + ε (|u − k|, ∆ψ). (6.22)

It suffices to prove this for u ∈ C2_0(Rd). Note that

(ρε(u − k)f(u)x, ψ) = ( ( ∫_k^u ρε(s − k)f′(s) ds )_x, ψ ) = −( ∫_k^u ρε(s − k)f′(s) ds, ψx ).

Since ∫_k^{k+ε} ρε(s − k) ds = ε ∫_0^1 ρ(s) ds → 0 as ε → 0, letting ε → 0 we obtain

(sign(u − k)f(u)x, ψ) = −(sign(u − k)(f(u) − f(k)), ψx).

Next,

−(ρε(u − k)∆u, ψ) = (ρ′ε(u − k)ux, ψ ux) + (ρε(u − k)ux, ψx),

where

(ρε(u − k)ux, ψx) = −(Ψε(u − k), ∆ψ) → −(|u − k|, ∆ψ) as ε → 0.

Thus, we obtain

−(ρε(u − k)∆u, ψ) ≥ −(|u − k|, ∆ψ)

and (6.22) follows.
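The entropy inequality above is exactly what monotone difference schemes preserve at the discrete level, their numerical viscosity playing the role of the ε∆u term. The following sketch uses the Lax–Friedrichs scheme for the Burgers flux f(u) = u²/2 on a periodic grid; the flux, grid and CFL constant are illustrative assumptions and the scheme is only a discrete analogue of the vanishing-viscosity construction, not the argument of the text.

import numpy as np

# Sketch: Lax-Friedrichs scheme for u_t + (u^2/2)_x = 0 on a periodic grid.
# The scheme is monotone under the CFL restriction below, so its limits are
# entropy solutions in the sense of the inequality derived above.

N, T = 400, 0.5
x = np.linspace(0.0, 1.0, N, endpoint=False)
dx = x[1] - x[0]
u = np.sin(2.0 * np.pi * x)                 # smooth initial data; a shock forms later

def flux(u):
    return 0.5 * u**2

t = 0.0
while t < T:
    dt = min(0.4 * dx / max(1e-12, np.max(np.abs(u))), T - t)   # CFL condition
    up, um = np.roll(u, -1), np.roll(u, 1)                      # u_{j+1}, u_{j-1}
    u = 0.5 * (up + um) - dt / (2.0 * dx) * (flux(up) - flux(um))
    t += dt

print("total mass (conserved):", dx * u.sum())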

Example (Hamilton–Jacobi equation) Let u be a solution to a scalar conservation law in R1. Then v = ∫^x u dx satisfies the Hamilton–Jacobi equation

vt + f(vx) = 0. (6.23)

Let X = C0(Rd) and

Av = −f(vx), dom(A) = {v ∈ X : f(vx) ∈ X}.

Then, for v1, v2 ∈ C1(Rd)

〈Av1 − Av2, δx0〉 = −(f((v1)x(x0)) − f((v2)x(x0))) = 0,

where x0 ∈ Rd is such that |v1 − v2|X = |(v1 − v2)(x0)|. We prove the range condition

range(λ I − A) = X for λ > 0.

That is, there exists a unique viscosity solution to λv + f(vx) = g: for all φ ∈ C1(Ω), if v − φ attains a local maximum at x0 ∈ Rd, then

λ v(x0) − g(x0) + f(φx(x0)) ≤ 0,

and if v − φ attains a local minimum at x0 ∈ Rd, then

λ v(x0) − g(x0) + f(φx(x0)) ≥ 0.

Thus, A is maximal monotone and (6.23) has an integral solution.

Consider the equation of the form

ω V + H(x, Vx) − ν∆V = ω f.

Assume that H is C1 and there exist c1 > 0, c2 > 0 and c3 ∈ L2(Rd) such that

|H(x, p) − H(x, q)| ≤ c1 |p − q| and H(x, 0) ∈ L2(Rd)

and

|Hx(x, p)| ≤ c3(x) + c2 |p|.


Define the Hilbert space H = H1(Rd) by

H = {φ ∈ L2(Rd) : φx ∈ L2(Rd)}

with inner product

(φ, ψ)H = ∫Rd ( φ(x)ψ(x) + φx(x)·ψx(x) ) dx.

Define the single-valued operator A on H by

AV = −H(x, Vx) + ν∆V

with dom(A) = H3(Rd). We show that A − λ I is m-dissipative for some λ. First, A − λ I with λ = c1²/(2ν) is dissipative since

(Aφ − Aψ, φ − ψ)H = −ν (|φx − ψx|²_2 + |∆(φ − ψ)|²_2) − (H(·, φx) − H(·, ψx), φ − ψ − ∆(φ − ψ))

≤ λ |φ − ψ|²_H − (ν/2)|φx − ψx|²_H,

where we used Lemma 2.1.

Let us define the linear operator T on X by Tφ = (ν/2)∆φ with dom(T) = H3(Rd). Then T is a self-adjoint operator in H. Moreover, if we let X = H2(Rd) then X∗ = L2(Rd), where H = H1(Rd) is the pivot space and H = H∗, and T ∈ L(X, X∗) is hermitian and coercive. Thus, T is maximal monotone. Hence the equation ω V − AV = ω f in L2(Rd) has a unique solution V ∈ H. Note that from (2.7), for φ ∈ H2(Rd)

H(x, φx)x = Hx(x, φx) + Hp(x, φx)φxx ∈ L2(Rd).

Thus, if f ∈ H then the solution satisfies V ∈ dom(A) and thus A is maximal monotone in H.

Step 3: We establish the W1,∞(Rd) estimate of solutions to

ω V + H(x, Vx) − ν∆V = ω f,


when fx, Hx(x, 0) ∈ L∞(Rd) and H satisfies

(2.9) |Hx(x, p) − Hx(x, 0)| ≤ M1 |p| and |Hp(x, p)| ≤ M2 √(1 + |x|²).

Consider the equation of the form

(2.10) ω V + ψ H(x, Vx) − ν∆V = ω f,

where

ψ(x) = c²/(c² + |x|²) for c ≥ 1.

Then ψ(x)H(x, p) satisfies (2.6)–(2.7) and thus there exists a unique solution V ∈ H3(Rd) to (2.10) for sufficiently large ω > 0. Define U = Vx. Then, from (2.10) we have

(2.11) ω (U, φ) + (ψx H(x, U) + ψ Hp(x, U)·Ux, φ) + ν (Ux, φx) = ω (fx, φ)

for φ ∈ H1(Rd)d. Let

|U| = √(U1² + · · · + Ud²) and |U|p = ( ∫Rd |U(x)|^p dx )^{1/p}.

Define the functions Ψ, Φ : R+ → R+ by

Ψ(r) = r^{p/2} for r ≤ R², Ψ(r) = R^{p−2} r for r ≥ R², and Φ(r) = r^{p/2−1} for r ≤ R², Φ(r) = R^{p−2} for r ≥ R².

Setting φ = Φ(|U|²)U ∈ H1(Rd)d in (2.11), we obtain

(2.12) ω |Ψ(|U|²)|1 − ( (2x·U/(c² + |x|²)) Φ(|U|²), ψ H(x, U) ) + (ψ (Hx(x, U) − ω fx), Φ(|U|²)U) + (ψ Hp(x, U), Φ(|U|²)(½|U|²)x) + ν [ 2 (Φ′(|U|²)(½|U|²)x, (½|U|²)x) + (Φ(|U|²)Ux, Ux) ] = 0.

Since from (2.9)

|H(x, p)| ≤ const √(1 + |x|²) (1 + |p|),

there exist constants k1, k2 independent of c ≥ 1 and R > 0 such that

( (2x·U/(c² + |x|²)) Φ(|U|²), ψ H(x, U) ) ≤ k1 (ψ, Ψ(|U|²)) + k2 |Φ(|U|²)U|q,

where q = p/(p − 1). It thus follows from (2.9) and (2.12) that

ω |Ψ(|U|²)|1 + (1/ν)(Φ(|U|²)Ux, Ux) ≤ ω̄ |Ψ(|U|²)|1 + (k2 + |Hx(x, 0)|p + ω |fx|p) |Φ(|U|²)U|q


for some constant ω̄ > 0 independent of c ≥ 1 and R > 0. Since |Ψ(|U|²)|1 ≥ |Φ(|U|²)U|^q_q, it follows that for ω ≥ ω̄

|Ψ(|U|²)|1 + (1/(2ν))(Φ(|U|²)Ux, Ux)

is uniformly bounded in R > 0. Letting R → ∞, it follows from Fatou's lemma that U ∈ Lp(Rd) and from (2.12)

ω (|U|p − |fx|p) + ( ψ (2x·U/(c² + |x|²)) |U|^{p−2}, H(x, U) ) + (ψ Hp(x, U), |U|^{p−2}(½|U|²)x) + (ψ Hx(x, U), |U|^{p−2}U) + ν [ (p − 2)(|U|^{p−4}(½|U|²)x, (½|U|²)x) + (|U|^{p−2}Ux, Ux) ] ≤ 0.

Thus we have

|U|p ≤ |fx|p + (1/ω) ( σp |U|p + k2 + |Hx(x, 0)|p ),

where

σp = k1 + M1 + |ψ Hp(x, U)|²∞ / ((p − 2)ν).

Letting p → ∞, we obtain

(2.13) |U|∞ ≤ |fx|∞ + (1/ω) ( (k1 + M1) |U|∞ + k2 + |Hx(x, 0)|∞ ).

For c ≥ 1 let us denote by V^c the unique solution of (2.10). Let ζ(x) = χ(x/r) ∈ C2(Rd), r ≥ 1, where χ = χ(|x|) ∈ C2(Rd) is a nonincreasing function such that

χ(s) = 1 for 0 ≤ s ≤ 1 and χ(s) = exp(−|s|) for s ≥ 2,

and we assume −∆χ ≤ k3 χ. Then

(2.14) ω (V^c, ζ ξ) + (ψ H(x, U^c), ζ ξ) − ν (∆V^c, ζ ξ) = ω (f, ζ ξ)

for all ξ ∈ L2(Rd), and U^c = V^c_x satisfies

(2.15) ω (U^c, ζ φ) + (ψ H(x, U^c), ∇·(ζ φ)) − ν (tr U^c_x, ∇·(ζ φ)) = ω (fx, ζ φ)

for all φ ∈ H1(Rd)d. Setting φ = U^c in (2.15), we obtain

ω (ζ U^c, U^c) + (ψ H(x, U^c), U^c·ζx + ζ ∇·U^c) + ν (ζ U^c_x, U^c_x) − ½ (∆ζ, |U^c|²) = ω (fx, ζ U^c).

Since |U^c|∞ is uniformly bounded in c ≥ 1, it follows that for any compact set Ω in Rd there exists a constant MΩ independent of c ≥ 1 such that

ω (ζ U^c, U^c) + ν (ζ U^c_x, U^c_x) ≤ MΩ.

Since Ωr = {|x| < r} and H2(Ωr) is compactly embedded into H1(Ωr), there exists a subsequence of V^c which converges strongly in H1(Ωr). By a standard diagonalization process, we can construct


a subsequence V^c which converges to a function V strongly in H1(Ω) and weakly in H2(Ω) for every compact set Ω in Rd. Let U = Vx. Then U^c converges weakly in H1(Ω)d and strongly in L2(Ω). Since an L2(Ω)-convergent sequence has an a.e. pointwise convergent subsequence, without loss of generality we can assume that U^c converges to U a.e. in Rd. Hence, by the Lebesgue dominated convergence theorem,

Hp(·, U^c) → Hp(·, U) and Hx(·, U^c) → Hx(·, U)

strongly in L2(Ω)d. It follows from (2.14)–(2.15) that the limit V satisfies

(2.16) ω (V, ζ ξ) + (H(x, Vx), ζ ξ) − ν (∆V, ζ ξ) = ω (f, ζ ξ)

for all ξ ∈ L2_loc(Rd), and

(2.17) ω (U, ζ φ) + (H(x, U), ∇·(ζ φ)) − ν (tr Ux, (ζ φ)x) = ω (fx, ζ φ)

for all φ ∈ H1_loc(Rd). Setting φ = |U|^{p−2}U in (2.17), we obtain

(2.18) ω (ζ, |U|^p) + (ζ Hp(x, U), |U|^{p−2}(½|U|²)x) + (ζ Hx(x, U), |U|^{p−2}U) − ω (fx, ζ |U|^{p−2}U) + ν [ (p − 2)(ζ |U|^{p−4}(½|U|²)x, (½|U|²)x) + (ζ |U|^{p−2}Ux, Ux) ] − (1/p)(∆ζ, |U|^p) = 0

for all ζ = χ(x/r). Thus,

(2.19) (ζ, |U|^p)^{1/p} ≤ (ζ, |fx|^p)^{1/p} + (1/ω) ( cp (ζ, |U|^p)^{1/p} + (ζ, |Hx(x, 0)|^p)^{1/p} ),

where cp = M1 + k3/p + |ζ Hp(x, U)|²∞/(2ν(p − 2)) and

(2.20) (Hx(x, 0) − Hx(x, p), p) ≤ M1 |p|² for all x ∈ Rd.

Now, letting p → ∞ we obtain from (2.19)

(ω − M1) sup_{|x|≤r} |U| ≤ ω |fx|∞ + |Hx(x, 0)|∞.

Since r ≥ 1 is arbitrary, we have

(2.21) (ω − M1) |U|∞ ≤ ω |fx|∞ + |Hx(x, 0)|∞.

Setting ξ = V in (2.16), we obtain

ω (ζ, |V |2) + (ζ,Hp(x, U))− ω (f, ζ V ) + ν (ζ |V |2)x,−1

2(∆ζ, |U |2) = 0.

Also, from (2.18)

ω (ζ, |U |2) + (ζ Hp(x, U), U · Ux) + (ζ Hx(x, U), U)− ω (fx, ζ U)

+ν (ζ Ux, Ux))− 1

2(∆ζ, |U |2) = 0


Since |U|∞ is bounded, V ∈ H2_loc. In fact,

ω (ζ V, V) + ν (ζ Ux, Ux) ≤ a1 |U|²∞ + a2 ω ( (ζ f, f) + (ζ fx, fx) )

for some constants a1, a2.

Step 4: Next we prove that for ω > max(M1, ωα) the equation

ω V − Aν(t)V = ω f

has a unique solution satisfying (2.21). We define the sequence Vk in H2_loc(Rd) by the successive iteration

(2.22) ω Vk+1 − Aν(t)Vk+1 − (ω − ω0)Vk = ω0 f.

From Step 3, (2.22) has a solution Vk+1 satisfying

(2.23) (ω − M1) |Uk+1|∞ ≤ (ω − ω0) |Uk|∞ + ω0 |fx|∞ + |Hx(x, 0)|∞.

Thus,

|Uk|∞ ≤ (1 − (ω − ω0)/(ω − M1))^{−1} (ω0 |fx|∞ + |Hx(x, 0)|∞)/(ω − M1) = (ω0 |fx|∞ + |Hx(x, 0)|∞)/(ω0 − M1) = α

for all k ≥ 1. Hence Vk is a bounded sequence in W1,∞(Rd) and thus in H2_loc(Rd). Moreover, we have from (2.4)

|Vk+1 − Vk|X ≤ ((ω − ω0)/(ω − ωα)) |Vk − Vk−1|X.

Thus Vk is a Cauchy sequence in X and Vk converges to V in X. Let us define the single-valued operator B on X by Bφ = −H(x, φx). Since BVk is bounded in L∞(Rd), we may assume that BVk converges weakly star to w in L∞(Rd). If we show that w = BV, then V ∈ H2(Rd) solves the desired equation. Since Vk is bounded in H2(Ωr), Vk is strongly precompact in H1(Ωr) for each r > 0. Hence BVk → BV a.e. in Ωr and thus w = BV.

Step 5: Next, we consider the case when H ∈ C1 is without the Lipschitz bound in p but satisfies (2.2). Consider the cut-off HM of H(x, p) defined by

HM(x, p) = Hp(x, p/|p|)·(p − M p/|p|) + H(x, M p/|p|) if |p| ≥ M, and HM(x, p) = H(x, p) if |p| ≤ M.

From Step 3 and (2.20) the equation

ω V + HM(x, Vx) − ν∆V = ω f

has a unique solution V and U = Vx satisfies (2.21) with M1 = βM + a. Let M > 0 be such that (ω − (βM + a))M ≤ ω |fx|∞ + |Hx(x, 0)|∞. Then |U|∞ ≤ M, and thus V ∈ H2_loc(Rd) ∩ W1,∞(Rd) satisfies

(2.24) ω V + H(x, Vx) − ν∆V = ω f

and

(2.25) (ω − (βM + a))M ≤ ω |fx|∞ + |Hx(x, 0)|∞


for ω > M1. If β = 0, then

(ϕ(V) − ϕ(f))/ω ≤ a ϕ(V) + b,

where b = |Hx(x, 0)|∞.

Step 6: We prove that the solution V^ν to (2.24) converges to a viscosity solution V of

(2.26) ω V + H(x, Vx) = ω f

as ν → 0+. First we show that for all φ ∈ C2(Ω), if V^ν − φ attains a local maximum (minimum, respectively) at x0 ∈ Rd, then

(2.27) ω (V^ν(x0) − f(x0)) + H(x0, φx(x0)) − ν (∆φ)(x0) ≤ 0 (≥ 0, respectively).

For φ ∈ C2(Ω) we assume that V^ν − φ attains a local maximum at x0 ∈ Rd. Let Ω = {x ∈ Rd : |x − x0| < 1} and let Γ denote its boundary. Then without loss of generality we can assume that V^ν − φ attains the unique global maximum 1 at x0 and V^ν − φ ≤ 0 on Γ. In fact, we can choose ζ ∈ C∞(Ω) such that V^ν − (φ − ζ) attains the unique global maximum 1 at x0, V^ν − (φ − ζ) ≤ 0 on Γ and ζx(x0) = 0. Let ψ = sup(0, V^ν − φ) ∈ W1,∞_0(Ω). Multiplying (2.24) by ψ^{p−1} and integrating over Ω, we obtain

ω |ψ|^p_p + (η·(V^ν − φ)x, ψ^{p−1}) + ν (p − 1)(ψx, ψ^{p−2}ψx) = −(δ ψ, ψ^{p−1}),

where

η = ∫_0^1 Hp(x, φx + θ(V^ν − φ)x) dθ

and

δ = ω (φ − f) + H(x, φx) − ν∆φ.

Since

|(η·ψx, ψ^{p−1})| ≤ (1/(4ν(p − 1))) |η|²_{L∞(Ω)} |ψ|^p_p + ν(p − 1) |ψ^{p/2−1}ψx|²_2,

it follows that

(ω − |η|²∞/(4ν(p − 1))) |ψ|^p_p ≤ −(δ ψ, ψ^{p−1}).

Letting p → ∞, we can conclude that δ(x0) ≤ 0 since δ ∈ C(Ω). Setting ψ = inf(0, V^ν − φ) and assuming V^ν − φ has the unique global minimum −1 at x0 and V^ν − φ ≥ 0 on Γ, the same argument as above shows the second assertion.

Next, we show that there exists a subsequence of V^ν that converges to a viscosity solution V of

(2.28) ω V + H(x, Vx) = ω f,

i.e., for all φ ∈ C1(Ω), if V − φ attains a local maximum at x0 ∈ Rd, then

(2.29a) ω (V(x0) − f(x0)) + H(x0, φx(x0)) ≤ 0,

and if V − φ attains a local minimum at x0 ∈ Rd, then

(2.29b) ω (V(x0) − f(x0)) + H(x0, φx(x0)) ≥ 0.


It follows from Step 5 that for some γ > 0 independent of ν

|V^ν|W1,∞(Ω) ≤ γ.

Thus there exists a subsequence of V^ν (denoted by the same) that converges weakly star to V in W1,∞(Rd), and the convergence is uniform on Ω. We prove (2.29a) first for φ ∈ C2(Ω). Assume that for φ ∈ C2(Ω), V − φ has a local maximum at x0 ∈ Ω. We can choose ζ ∈ C∞(Ω) such that ζx(x0) = 0 and V − (φ − ζ) has a strict local maximum at x0. For ν > 0 sufficiently small, V^ν − (φ − ζ) has a local maximum at some xν ∈ Ω and xν → x0 as ν → 0+. From (2.27)

ω (V^ν(xν) − f(xν)) + H(xν, (φ − ζ)x(xν)) − ν (∆(φ − ζ))(xν) ≤ 0.

We conclude (2.29a), since V^ν(xν) → V(x0), φx(xν) − ζx(xν) → φx(x0) − ζx(x0) = φx(x0) and ν∆(φ − ζ)(xν) → 0 as ν → 0+. For φ ∈ C1(Ω) exactly the same argument is applied to a sequence φn ∈ C2(Ω) converging to φ in C1(Ω) to prove (2.29a).

Step 7: We show that if V, W ∈ Dα are viscosity solutions to ω(V − f) + H(x, Vx) = 0 and ω(W − g) + H(x, Wx) = 0, respectively, then

(2.30) (ω − ωα) |V − W|X ≤ ω |f − g|X.

For δ > 0 let

ψ(x) = 1/√(1 + |x|^{2+δ}).

If V, W ∈ Dα then

(2.31) lim_{|x|→∞} ψ(x)V(x) = lim_{|x|→∞} ψ(x)W(x) = 0.

We choose a function β ∈ C∞(Rd) satisfying

0 ≤ β ≤ 1, β(0) = 1, β(x) = 0 if |x| > 1.

Let M = max(|V|Xδ, |W|Xδ). Define the function Φ : Rd × Rd → R by

(2.32) Φ(x, y) = ψ(x)V(x) − ψ(y)W(y) + 3M βε(x − y),

where

βε(x) = β(x/ε) for x ∈ Rd.

Off the support of βε(x − y), Φ ≤ 2M, while if |x| + |y| → ∞ on this support, then |x|, |y| → ∞ and thus from (2.31) lim_{|x|+|y|→∞} Φ ≤ 3M. We may assume that V(x) − W(x) > 0 for some x. Then,

Φ(x, x) = ψ(x)(V(x) − W(x)) + 3M βε(0) > 3M.

Hence Φ attains its maximum value at some point (x0, y0) ∈ Rd × Rd. Moreover, |x0 − y0| ≤ ε since βε(x0 − y0) > 0. Now x0 is a maximum point of

x → ψ(x) ( V(x) − (ψ(y0)W(y0) − 3M βε(x − y0) + Φ(x0, y0))/ψ(x) )


and since ψ > 0 the function

x → V(x) − (ψ(y0)W(y0) − 3M βε(x − y0) + Φ(x0, y0))/ψ(x)

attains a maximum 0 at x0. Since

ψ(y0)W(y0) − 3M βε(x0 − y0) + Φ(x0, y0) = ψ(x0)V(x0)

and V is a viscosity solution,

(2.33) ψ(x0)( ω (V(x0) − f(x0)) + H(x0, p) ) ≤ 0,

where

p = ((2 + δ)/2) ψ(x0)V(x0) ψ(x0)|x0|^δ x0 − 3M β′ε(x0 − y0)/ψ(x0),

and we used the fact that (|x|^{2+δ})′ = (2 + δ)|x|^δ x. Moreover, since V ∈ Dα,

(2.34) |p| ≤ α.

Similarly, the function

y → W(y) − (ψ(x0)V(x0) + 3M βε(x0 − y) − Φ(x0, y0))/ψ(y)

attains a minimum 0 at y0, and since W is a viscosity solution,

(2.35) ψ(y0)( ω (W(y0) − g(y0)) + H(y0, q) ) ≥ 0,

where

q = ((2 + δ)/2) ψ(y0)W(y0) ψ(y0)|y0|^δ y0 − 3M β′ε(x0 − y0)/ψ(y0)

and |q| ≤ α. Thus by (2.33) and (2.35) we have

(2.36) ω (ψ(x0)V(x0) − ψ(y0)W(y0)) ≤ ψ(y0)H(y0, q) − ψ(x0)H(x0, p) + ω (ψ(x0)f(x0) − ψ(y0)g(y0)).

Since Φ(x0, y0) ≥ Φ(x, x) we have

ψ(x0)V(x0) − ψ(y0)W(y0) ≥ ψ(x)(V(x) − W(x)) + 3M (1 − βε(x0 − y0))

and thus

(ψ(x0) − ψ(y0))V(x0) + ψ(y0)(V(x0) − W(y0)) ≥ ψ(x)(V(x) − W(x)) + 3M (1 − βε(x0 − y0)).

Since

(2.37) |(ψ(x0) − ψ(y0))V(x0)| = ψ(y0)ψ(x0)|V(x0)| ( √(1 + |y0|^{2+δ}) − √(1 + |x0|^{2+δ}) ) ≤ const |x0 − y0|,

it follows that V(x0) ≥ W(y0) for sufficiently small ε > 0. Note that

ψ(y0)H(y0, q) − ψ(x0)H(x0, p) = (ψ(y0) − ψ(x0))H(x0, p) + ψ(y0)(H(y0, p) − H(x0, p)) + ψ(y0)(H(y0, q) − H(y0, p)).


From (2.36)–(2.37) we have that

(2.38)ω (ψ(x0)V (x0)− ψ(y0)W (y0)− (ψ(x0)f(x0)− ψ(y0)g(y0)))

≤ O(ε) + ψ(y0) (c1(y0), p− q) + c2 |p− q|).

where O(ε)→ 0 as ε→ 0. Now we evaluate p− q, i.e.,

p− q =2 + δ

2(ψ(x0)V (x0)− ψ(y0)W (y0))ψ(y0)|y0|δ y0

+2 + δ

2ψ(x0)V (x0)(ψ(x0) |x0|δx0 − ψ(y0)|y0|δy0)

+3Mβ′ε(x0 − y0) (√

1 + |x0|2+δ −√

1 + |y0|2+δ).

Since

|ψ(x0)V (x0)ψ(x0)|x0|δ x0| ≤ |u|X|x0|δ

√1 + |x0|2

1 + |x0|2+δ|x0| ≤M3

for some M3 > 0, it follows from (2.34) that

3|β′ε(x0 − y0)√

1 + |x0|2+δ| ≤M4

for some M4 > 0. Thus,

|3Mβ′ε(x0 − y0) |√

1 + |x0|2+δ −√

1 + |y0|2+δ| ≤ const (2 + δ)MM4|x0 − y0|.

and therefore

2 + δ

2ψ(x0)V (x0)(ψ(x0)|x0|δx0−ψ(y0)|y0|δy0)+3Mβε(x0−y0) (|x0|2+δ−|y0|2+δ) = O(ε).

In the right-hand side of (2.38) we have

ψ(y0)((c1(y0), p− q) + c2 |p− q|)

≤ O(ε) +2 + δ

2

β |y0|2+δ + c2α |y0|1+δ

1 + |y0|2+δ(ψ(x0)u(x0)− ψ(y0)v(y0)).

Hence from (2.38) we conclude

(2.39) ωy0 (ψ(x0)u(x0)− ψ(y0)v(y0)) ≤ ψ(x0)f(x0)− ψ(y0)g(y0) +O(ε)

where

ωδ = supy0

(ω − 2 + δ

2

β |y0|2+δ + c2α |y0|1+δ

1 + |y0|2+δ).

Assume that ω > λα. For x ∈ Rd we have

ψ(x)(u(x)− v(x)) + 3M = Φ(x, x) ≤ Φ(x0, y0) ≤ ψ(x0)u(x0)− ψ(y0)v(y0) + 3M


and so by (2.39)

ωδ sup_{Rd} ψ(x)(V(x) − W(x))+ ≤ ω (ψ(x0)V(x0) − ψ(y0)W(y0)) ≤ ψ(x0)f(x0) − ψ(y0)g(y0) + O(ε)

≤ sup_{Rd} ψ(f − g)+ + |ψ(x0)g(x0) − ψ(y0)g(y0)| + O(ε)

≤ sup_{Rd} ψ(f − g)+ + ωψg(ε) + O(ε),

where ωψg(·) is the modulus of continuity of ψg. Letting ε → 0, we obtain

(2.40) ωδ sup_{Rd} ψ(x)(V(x) − W(x))+ ≤ sup_{Rd} ψ(f − g)+.

Since |ψ|Xδ → |ψ|X as δ → 0+ for ψ ∈ X, we obtain (2.30) by taking the limit δ → 0+ in (2.40).

Example (Plastic equations) Consider the visco-plastic equation of the form

vt + div(σ) = 0,

where the stress σ = σ(ε) minimizes

h(σ) − ε : σ

and the strain ε is given by

εij = ½ ( ∂vj/∂xi + ∂vi/∂xj ).

That is, σ ∈ ∂h∗(ε), where h∗ is the conjugate of h:

h∗(ε) = sup_{σ∈C} { ε : σ − h(σ) }.

For the case of the linear elastic system σ11

6.1 Evolution equations

In this section we consider the evolution equation of the form

d/dt u(t) ∈ A(t)u(t) (6.24)

in a Banach space X, on a closed set D ⊂ X. We assume the dissipativity condition: there exist a constant ω = ωD and continuous functions f : [0, T ] → X, L : R+ → R+ and K : R+ → R+, independent of t, s ∈ [0, T ], such that

(1 − λω) |x1 − x2| ≤ |x1 − x2 − λ (y1 − y2)| + λ |f(t) − f(s)| L(|x2|) K(|y2|)


for all x1 ∈ D ∩ dom(A(t)), x2 ∈ D ∩ dom(A(s)) and y1 ∈ A(t)x1, y2 ∈ A(s)x2, and that A(t), t ∈ [0, T ] is m-dissipative with

Jλ(t) = (I − λA(t))−1 : D → D ∩ dom(A(t)).

Thus, one constructs the mild solution as

u(t) = lim_{λ→0+} Π_{k=1}^{[t/λ]} Jλ(tk) u0, tk = kλ.
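In the linear, finite-dimensional case the product formula above is just a composition of implicit Euler steps, which makes it easy to test numerically. The sketch below assumes a time-dependent linear dissipative family A(t) = −(1+t)M with M symmetric positive semidefinite; the matrices, step sizes and the use of a fine-step run in place of the limit are all illustrative assumptions.

import numpy as np

# Sketch: the product formula u(t) = lim_{lam->0} prod_k (I - lam*A(t_k))^{-1} u0
# for a time-dependent *linear* dissipative family; the linear case is used only
# because the resolvent is then a plain linear solve.

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
M = B @ B.T                                   # M >= 0, so A(t) below is dissipative
A = lambda t: -(1.0 + t) * M

def product_formula(u0, t, lam):
    u, n = u0.copy(), int(t / lam)
    I = np.eye(len(u0))
    for k in range(1, n + 1):
        u = np.linalg.solve(I - lam * A(k * lam), u)   # u_k = J_lam(t_k) u_{k-1}
    return u

u0 = rng.standard_normal(5)
ref = product_formula(u0, 1.0, 1e-4)          # fine-step reference in place of the limit
for lam in [0.1, 0.05, 0.01]:
    err = np.linalg.norm(product_formula(u0, 1.0, lam) - ref)
    print(f"lam={lam:5.2f}  error vs reference = {err:.2e}")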

Theorem 1.1 Let (A(t), X) satisfy (H.1)–(H.4). Then,

U(t, s)x = lim_{λ→0+} Π_{i=1}^{[(t−s)/λ]} Jλ(s + iλ) x

exists for x ∈ dom(A(0)) and 0 ≤ s ≤ t ≤ T. The family U(t, s), 0 ≤ s ≤ t ≤ T, defines an evolution operator on dom(A(0)) and moreover satisfies

|U(t, s)x − U(t, s)y| ≤ e^{ω(t−s)} |x − y|

for 0 ≤ s ≤ t ≤ T and x, y ∈ dom(A(0)).

Proof: Let hi = λ and ti = s+ iλ in (). Then xi = Jλ(ti)xi−1, where we dropped thesuperscript λ and xm = (Πm

i=1Jλ(ti))x. Thus |xi| ≤ (1 − ωλ)−m ≤ e2ω(T−s) |x| = M1

for 0 < mλ ≤ T −s. Let xj is the approximation solution corresponding to the stepsize

h− j = µ and tj = s+ jµ. Define am,n = |xm − xn|. We first evaluate a0,n, am,0.

am,0 = |xm − x| ≤m∑k=1

|(Πmi=kJλ(ti))x− (Πm

i=k+1Jλ(ti))x|

≤m∑k=1

(1− ωλ)−(m−k+1)λ |||A(ti)x||| ≤ e2ω(T−s)mλM(x).

where M(x) = supt∈[0,T ] |||A(t)x|||. Similarly, we have

a0,n ≤ e2ω(T−s)nµM(x).

Next, we establish the recursive formula for ai,j . For λ ≥ µ > 0

ai,j = |xi − xj | ≤ |Jλ(ti)xi−1 − Jµ(tj)xj−1|

≤ |Jλ(ti)xi−1 − Jµ(ti)xj−1|+ |Jµ(ti)xj−1 − Jµ(tj)xj−1|

From Theorem 1.10

|Jλ(ti)xi−1 − Jµ(ti)xj−1|

= |Jµ(µ

λxi−1 +

λ− µλ

Jλ(ti)xi−1)− Jµ(ti)xj−1|

≤ (1− ω µ)−1(µ

λ|xi−1 − xj−1|+

λ− µλ|xi − xj−1|)


Hence for i, j ≥ 1 we have

ai,j ≤ (1− ωµ)−1(αai−1,j−1 + β ai,j−1) + bi,j

where

α =µ

λ, and β =

λ− µλ

and

bi,j = |Jµ(ti)xj−1 − Jµ(tj)xj−1| ≤ µ |f(ti)− f(tj)|L(|xj |)K(|Aµ(tj)xj−1|).

We show that if yi = Aλ(ti)xi−1, then there exists a constantM2 = M2(x0, T, ||A(s)x0||)such that |yi| ≤ M2 Since xi = Jλ(ti)xi−1 and Aλ(ti)xi−1 ∈ A(ti)xi, from (H.4) and(1.2)–(1.3) we have

|||A(ti)xi||| = |Aλ(ti)xi−1| ≤ |Aλ(ti−1)xi−1|+ |f(ti)− f(ti−1)|L(M1)(1 + |Ah(ti−1)xi−1|))

≤ (1− λ)−1(|||A(ti−1)xi−1|||+ |f(ti)− f(ti−1)|L(M1)(1 + |||A(ti−1)xi−1|||)

If we define ai = |||A(ti)xi|||, then

(1− ω λ) ai ≤ ai−1 + bi(1 + ai−1).

where bi = L(M1)|f(ti)− f(ti−1)|. Thus, it follows from the proof of Lemma 2.4 that|||A(ti)xi||| ≤M2, for some constant M2 = M(x0, T ). Since

(1.∗) |yi| ≤ (1−ωλ)−1(|||A(ti−1)xi−1|||+ |f(ti)− f(ti−1)|L(M1)(1 + |||A(ti−1)xi−1|||),

thus |yi| is uniformly bounded.It follows from Theorem and [] that

am,n ≤ e2ω(T−s)M(x) [((nµ−mλ)2 + nµ(λ− µ))1/2 + ((nµ−mλ)2 +mλ(λ− µ))1/2]

+(1− ωµ)−nn−1∑j=0

min(m−1,j)∑i=0

βj−1αi(ji

)bm−i,n−j ,

where mλ, nµ ≤ T − s. Let ρ be the modulus of continuity of f on [0, T ], i.e.,

ρ(r) = sup{ |f(t) − f(τ)| : 0 ≤ t, τ ≤ T and |t − τ| ≤ r }.

Then ρ is nondecreasing and subadditive, i.e., ρ(r1 + r2) ≤ ρ(r1) + ρ(r2) for r1, r2 ≥ 0. Thus

J =n−1∑j=0

min(m−1,j)∑i=0

βj−1αi(ji

)bm−i,n−j

≤ Cµ

nρ(|nµ−mλ)|) +n−1∑j=0

∑i=0

(m− 1)j βj−1αi(ji

)ρ(|jµ− iλ)|)

.


where we used () with C = L(M1)K(M2), the subadditivity of of ρ and the estimate

min(m−1,j)∑i=0

βj−1αi(ji

)≤ 1.

Next, let δ > 0 be given and write

n−1∑j=0

min(m−1,j)∑i=0

βj−1αi(ji

)ρ(|jµ− iλ)|) = I1 + I2

where I1 is the sum over indices such that |jµ − iλ| < δ, while I2 is the sum over indices satisfying |jµ − iλ| ≥ δ. Clearly I1 ≤ nρ(δ), but

I2 ≤ ρ(T )n−1∑j=0

min(m−1,j)∑i=0

βj−1αi(ji

)|jµ− iλ|2

δ2=ρ(T )

δ2n(n−1)(λµ−µ2) ≤ ρ(T )n2

δ2µ(λ−µ)

Therefore,

J ≤ Cnµ(ρ(|nµ−mλ|+ ρ(δ) +

ρ(T )

δ2nµ(λ− µ)

).

Combining ()–(), we obtain

am,n ≤ e2ω(T−s)M(x) [((nµ−mλ)2 + nµ(λ− µ))1/2 + ((nµ−mλ)2 +mλ(λ− µ))1/2]

+e2ω(T−s)C nµ

(ρ(|nµ−mλ|+ ρ(δ) +

ρ(T )

δ2nµ(λ− µ)

).

Now, we can choose, e.g., δ² = √(λ − µ), and it follows from () that am,n, as a function of m, n and λ, µ, tends to zero as |nµ − mλ| → 0 and n, m → ∞, subject to 0 < nµ, mλ ≤ T − s, and the convergence is uniform in s.


Moreover, we have the following lemma.

Lemma 1.2 If x ∈ D, then there exists a constant L such that

|U(s + r, s)x − U(s̄ + r, s̄)x| ≤ L ρ(|s − s̄|)

for r ≥ 0 and s, s̄, s + r, s̄ + r ≤ T.

Proof: Let ak = |xk − x̄k|, where

xk = Π_{i=1}^k (I − λA(s + iλ))−1 x and x̄k = Π_{i=1}^k (I − λA(s̄ + iλ))−1 x.


Then, for λ = r/n,

ak = |Jλ(s + kλ)xk−1 − Jλ(s̄ + kλ)x̄k−1|

≤ |Jλ(s + kλ)xk−1 − Jλ(s + kλ)x̄k−1| + |Jλ(s + kλ)x̄k−1 − Jλ(s̄ + kλ)x̄k−1|

≤ λ C ρ(|s − s̄|) + (1 − λω)−1 ak−1

for some C. Thus, we obtain

|ak| ≤ ((e^{2ωr} − 1)/(2ω)) ρ(|s − s̄|),

and letting λ → 0 we obtain the desired result.

6.2 Applications

We consider the Cauchy problem of the form

(1.1) d/dt u(t) = A(t, u(t))u(t), u(s) = u0,

where A(t, u), t ∈ [0, T ] is a maximal dissipative linear operator in a Banach space X for each u belonging to D. Define the nonlinear operator A(t) in X by A(t)u = A(t, u)u. We assume that dom(A(t, u)) is independent of u ∈ D and for each α ∈ R there exists an ωα ∈ R such that

(1.2) 〈A(t, u)x1 − A(s, u)x2, x1 − x2〉− ≤ ωα |x1 − x2| + |f(t) − f(s)| L(|x2|) K(|A(s, u)x2|)

for u ∈ Dα and x1 ∈ dom(A(t, u)), x2 ∈ dom(A(s, u)). Moreover, A(t) satisfies (C.2), i.e.,

(1.3) (1 − λωα) |u1 − u2| ≤ |(u1 − u2) − λ(A(t)u1 − A(s)u2)| + λ |f(t) − f(s)| L(|u2|) K(|A(s)u2|)

for u1 ∈ dom(A(t)) ∩ Dα, u2 ∈ dom(A(s)) ∩ Dα. We consider the finite difference approximation of (1.1); for sufficiently small λ > 0 there exists a family uλi in D such that

(1.4) (uλi − uλi−1)/hλi = A(tλi, uλi−1) uλi with uλ0 = u0, and (ϕ(uλi) − ϕ(uλi−1))/hλi ≤ a ϕ(uλi) + b.

Then, it follows from Theorems 2.5–2.7 that if the sequence

(1.5) ελi = A(tλi, uλi)uλi − A(tλi, uλi−1)uλi satisfies ∑_{i=1}^{Nλ} hλi |ελi| → 0 as λ → 0+,

then (1.1) has a unique integral solution. We have the following theorem.

Theorem 5.0 Assume (1.2) holds and for each α ∈ R there exists a cα ≥ 0 such that

(1.6) |A(t, u)u − A(t, v)u| ≤ cα |u − v| for u, v ∈ dom(A(t)) ∩ Dα.


Let f be of bounded variation and u0 ∈ dom(A(s)) ∩ D. Then (1.3) and (1.5) are satisfied, and thus (1.1) has a unique integral solution u and limλ→0+ uλ = u uniformly on [s, T ].

Proof: If yλi = (hλi)−1(uλi − uλi−1), then we have

yλi+1 − yλi = A(tλi+1, uλi)uλi+1 − A(tλi, uλi)uλi + A(tλi, uλi)uλi − A(tλi, uλi−1)uλi

and thus from (1.2) and (1.6)

(1 − λωα) |yλi+1| ≤ (1 + λcα) |yλi| + |f(tλi+1) − f(tλi)| L(|uλi|) K(|yλi|)

with yλ0 = A(s, u0)u0 = A(s)u0. Thus, by the same arguments as in the proof of Lemma 2.4, we obtain |yλi| ≤ M for some constant M and therefore from (1.6)

∑_{i=1}^{Nλ} |ελi| λ ≤ M T λ → 0 as λ → 0+.

We also note that (1.3) follows from (1.2) and (1.6).
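Because A(t, u) is linear in its second slot, each step of the scheme (1.4) is a linear solve with the coefficient frozen at the previous iterate. The sketch below applies this to a 1-D quasilinear advection–diffusion model A(t, u)v = −u vx + ν vxx on a periodic grid; the model problem, grid and parameters are illustrative assumptions rather than anything treated in the text.

import numpy as np

# Sketch of the linearized implicit scheme (1.4): at every step solve
#     (I - h * A(t_i, u_{i-1})) u_i = u_{i-1},
# where A(t, u)v = -u v_x + nu v_xx is linear in v.

N, nu, h, T = 200, 0.01, 0.002, 0.5
x = np.linspace(0.0, 1.0, N, endpoint=False)
dx = x[1] - x[0]
u = np.sin(2.0 * np.pi * x)

e = np.eye(N)
Sp = np.roll(e, 1, axis=1)                   # (Sp v)[j] = v[j+1]  (periodic)
Sm = np.roll(e, -1, axis=1)                  # (Sm v)[j] = v[j-1]
D1 = (Sp - Sm) / (2.0 * dx)                  # central first difference
D2 = (Sp - 2.0 * e + Sm) / dx**2             # second difference

t = 0.0
while t < T:
    A_frozen = -np.diag(u) @ D1 + nu * D2    # A(t_i, u_{i-1}) as a matrix
    u = np.linalg.solve(e - h * A_frozen, u)  # one step of (1.4)
    t += h

print("solution range:", float(u.min()), float(u.max()))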

6.3 Navier Stokes Equation, Revisited

We consider the incompressible Navier-Stokes equations (5.1). We use exactly the samenotation as in Section.. Define the evolution operator A(t, u) by w = A(t, u)v ∈ H,where

(5.1) (w, φ)H + ν σ(u, φ) + b(u, v, φ)− (f(t), φ)H = 0

for φ ∈ V , with dom (A(t)) = dom (A0). We let D = V and define the functional asbelow.

Theorem 5.1 The evolution operator (A(t, u), ϕ,D,H) defined above satisfies theconditions (1.2)–(1.5) with g(r) = b, a suitably chosen positive constant.

Proof: For u1, u2 ∈ dom (A0).

(A(t, v)u1 −A(s, v)u2, u1 − u2)H + ν |u1 − u2|2V = (f(t)− f(s), u1 − u2)

since b(u, v1 − v2, v1 − v2) = 0, which implies (1.2). The existence of uδ ∈ V for theequation: δ−1(uδ − u0) = A(t, u0)uδ for u0 ∈ V , δ > 0 and t ∈ [0, T ] follows from Step2. of Section NS and we have

(5.2)|uδ|2H − |u0|2H

δ+ ν |uδ|2V ≤

1

ν|f(t)|2V ∗ .

We also have the estimate of |uδ|V .

(5.3)1

2δ(|uδ|2V − |u0|2V ) +

ν

2|A0u

δ|2 ≤ 27M41

4ν3|u0|2H |u0|2V |uδ|2V +

1

ν|Pf(t)|2H .

for the two dimensional case. Multiplying the both side of (5.2) by |uδ|2H + |u0|2H , wehave

|uδ|4H − |u0|4Hδ

+ ν (|uδ|2H + |u0|2H)|uδ|2V ≤1

ν(|uδ|2H + |u0|2H) |f(t)|2V ∗


Since s→ log(1 + s) is concave

c0ν4 log(1 + |uδ|2V )− log(1 + |u0|2V )

δ

≤ c0ν4δ−1 |uδ|2V − |u0|2V

1 + |u0|2V≤ 2ν

|u0|2H |u0|2V |uδ|2V + 2ν−1|Pf(t)|2H1 + |u0|2V

where we set c0 =4

27M41

. Thus, if we define

ϕ(u) = 2 |u|2H + c0ν4 log(1 + |u|2V )

then for every δ > 0ϕ(uδ)− ϕ(u0)

δ≤ b

for some constant b ≥ 0, since |uλi |H is uniformly bounded. For (1.5) if yλi = (hλi )−1(uλi −uλi−1), then

(yλi+1 − yλi , yλi+1) + νhλi+1 |yλi+1|2 + hλi b(yλi , u

λi , y

λi+1) + (f(tλi+1) = f(tλi ), yλi+1)

Note that

(5.4)

|b(yλi , uλi , yλi+1)| ≤M1|uλi |V |yλi |12H |yλi |

12V |yλi+1|

12H |yλi+1|

12V

≤ ν

2|yλi+1|2V +

ν

2|yλi+1|2V +

M21 |uλi |2V8ν

|yλi+1|2V +M2

1 |uλi |2V8ν

|yλi+1|2V .

For simplicity of our discussions we assume hλi = h Then we have(5.5)(1−hωα) |yλi+1|2H+hν |yλi+1|2V + ≤ (1+hωα) |yλi+1|2H+ν |yλi+1|2V +|f(tλi+1−f(tλi )|H |yλi+1|H

which implies that |yλi |H ≤M for some constant independent of h > 0.Note that

|A(t, u)u−A(t, v)u|H = sup|phi|H≤1

|b(u, u, φ)− b(v, u, φ)| ≤ |u− v|12V |u− v|

12H |u|

12V |A0u|

12H

and from (5.3)

ν

2

N∑i=1

hλi |A0uλi |2H ≤ C

for some constant C independent of N and hλi . Since |yλi |H ≤M , |uλi − uλi−1| ≤M hλi .Thus (1.5) holds and (1.1) has a unique integrable solution u and limλ→0+ uλ = uuniformly on [s, T ].

Next, we consider the three dimensional problem (d = 3). We show that thereexists a locally defined solution and a global solution exists when the data (u0, f(·))are small.

We have the corresponding estimate of (5.3):

(5.6)1

2δ(|uδ|2V − |u0|2V ) +

ν

2|A0u

δ|2 ≤ 27M42

4ν3|u0|4V |uδ|2V +

1

ν|Pf(t)|2H .


We define the functional ϕ by

ϕ(u) = 1− (1 + |u|2V )−1

Since s → 1 − (1 + s)−1 is concave and |uλi |H is uniformly bounded, it follows from(5.2) and (5.6) that for every δ > 0

(5.6)ϕ(uδ)− ϕ(u0)

δ≤ (

27M42

4ν3|u0|4V |uδ|2V +

1

ν|Pf(t)|2H)(1 + |u0|2)−2.

Thus

ϕ(uλi )− ϕ(uλi−1)

hλi≤ (

27M42

4ν3|u0|4V |uδ|2V +

1

ν|Pf(t)|2H)(1 + |u0|2)−2 ≤ bi

where bi = (27M4

24ν3 |uλi |2V + c1

ν and from (5.3)

N∑i=1

hλi |uλi |2V ≤1

ν(|u0|2H +

c1T

ν).

The estimate (5.4) is replaced by

|b(yλi , uλi , yλi+1)| ≤M1|uλi |V |yλi |14H |yλi |

34V |yλi+1|

14H |yλi+1|

34V

≤ ν

2|yλi+1|2V +

ν

2|yλi+1|2V +

3M21 |uλi |2V32ν

|yλi+1|2V +3M2

1 |uλi |2V32ν

|yλi+1|2V .

Thus, if hλi = h then we obtain (5.5) and thus |yλi |H ≤M for some constant independentof h > 0 and (1.5) holds.

6.4 Approximation theory

In this section we discuss the approximation theory of the mild solution to (6.24). Consider an evolution equation in Xn:

(3.1) d/dt un(t) ∈ An(t)un(t), t > s; un(s) = x,

where Xn is a closed linear subspace of X. We consider the family of approximating sequences (An(t), dom(An(t)), ϕ, D) on Xn satisfying the uniform dissipativity: there exist a constant ω = ωα and continuous functions f : [0, T ] → X and L : R+ → R+, independent of t, s ∈ [0, T ] and n, such that

(3.2) (1 − λωα) |x1 − x2| ≤ |x1 − x2 − λ(y1 − y2)| + λ |f(t) − f(s)| L(|x2|) K(|y2|)

for all x1 ∈ Dα ∩ dom(An(t)), x2 ∈ Dα ∩ dom(An(s)) and y1 ∈ An(t)x1, y2 ∈ An(s)x2, and the consistency:

(3.3) for β > 0, t ∈ [0, T ], and [x, y] ∈ A(t) with x ∈ Dβ, there exists [xn, yn] ∈ An(t) with xn ∈ Dα(β) such that lim |xn − x| + |yn − y| = 0 as n → ∞,


where α(β) ≥ β and α : R+ → R+ is an increasing function.

Lemma 3.1 Let (A(t), X) satisfy (A.1)–(A.2) and either (R)–(C.1) or (R.1)–(C.2) on [0, T ]. For each λ > 0, we assume that (tλi, xλi, yλi, ελi) satisfies

yλi = (xλi − xλi−1)/(tλi − tλi−1) − ελi ∈ A(tλi)xλi

with tλ0 = s and xλ0 = x, and xλi ∈ Dα for 0 ≤ i ≤ Nλ. For the case of (R.1)–(C.2) we assume that x ∈ D ∩ dom(A(s)), f is of bounded variation, and |yλi| is uniformly bounded in 1 ≤ i ≤ Nλ and λ > 0. Then the step function uλ(t; s, x), defined by uλ(t; s, x) = xλi on (tλi−1, tλi], satisfies

|uλ(t; s, x) − u(t; s, x)| ≤ e^{2ω(2t+dλ)} ( 2|x − u| + dλ(|v| + Mρ(T)) ) + M(T − s)( c−1ρ(T)dλ + ρ(δ) + δλ ).

Proof: It follows from Lemmas 2.6–2.7 that there exists a DS-approximation sequence (tµj, xµj, yµj, εµj) as defined in (2.12). For the case of (C.2)–(R.1), from Lemma 2.4 we have |yµj| ≤ K for 1 ≤ j ≤ Nµ uniformly in µ. It thus follows from Theorem 2.5 that there exists a constant M such that

|uλ(t; s, x) − uµ(t; s, x)| ≤ e^{2ω(2t+dλ)} ( 2|x − u| + dλ(|v| + Mρ(T)) ) + M(T − s)( c−1ρ(T)dλ + ρ(δ) + δλ − δµ ).

Theorem 3.2 Let (An(t), dom(An(t)), ϕ, D) be approximating sequences satisfying (R) or (R.1) (resp.) and (3.2)–(3.3), and assume (A(t), dom(A(t)), D, ϕ) satisfies (A.1)–(A.2) and (R) or (R.1) (resp.). Then for every x ∈ D ∩ dom(A(s)) and xn ∈ D ∩ Xn such that lim xn = x as n → ∞ we have

lim |un(t; s, xn) − u(t; s, x)| = 0 as n → ∞

uniformly on [s, T ], where u(t; s, x) and un(t; s, x) are the unique mild solutions to (2.1) and (3.1), respectively.

Proof: Let [xi, yi] ∈ A(ti) and xi ∈ Dα for i = 1, 2. From (3.3) we can choose [xni, yni] ∈ An(ti), i = 1, 2, with xni ∈ Dα′ such that |xni − xi| + |yni − yi| → 0 as n → ∞ for i = 1, 2. Thus, letting n → ∞ in (3.2), we obtain (C.1) or (C.2). Let (tλi, xλi, yλi) be a DS-approximation sequence of (2.1). We assume that there exists a β > 0 such that xλi ∈ Dβ for all λ and 1 ≤ i ≤ Nλ. By the consistency (3.3), for any ε > 0 there exists an integer n = n(ε) such that for n ≥ n(ε)

(3.4) |xλ,ni − xλi| ≤ ε, |yλ,ni − yλi| ≤ ε

and xλ,ni ∈ Dα for 1 ≤ i ≤ Nλ. Moreover,

(3.5) ∑_{i=1}^{Nλ} |xλ,ni − xλ,ni−1 − hλi yλ,ni| ≤ ∑_{i=1}^{Nλ} |xλi − xλi−1 − hλi yλi| + (Nλ + T)ε + |xn − x| = δn,λ,ε.


It follows from Theorems 2.5 and 2.8 that

(3.6) |uλ,n(t; xn) − un(t; xn)| ≤ e^{2ω(2t+dλ)} ( 2|xn − un| + dλ(|vn| + Mρ(T)) ) + M(T − s)( c−1ρ(T)dλ + ρ(δ) + δλ,n,ε )

for all 0 < c < δ < T and [un, vn] ∈ An(s) with un ∈ Dα. From the definition of the functions uλ, uλ,n and (3.4)

|uλ(t; s, x) − uλ,n(t; s, xn)| ≤ ε, t ∈ (s, T ], n ≥ n(ε).

Thus, we have

|un(t; s, xn) − u(t; s, x)| ≤ e^{2ω(2t+dλ)} ( 2|xn − un| + 2|x − u| + dλ(|vn| + |v| + 2Mρ(T)) ) + 2M(T − s)( c−1ρ(T)dλ + ρ(δ) ) + δλ,n,ε + ε

for t ∈ (s, T ], n ≥ n(ε), where [u, v] ∈ A(s) with u ∈ Dβ. From the consistency (3.3) we can take [un, vn] ∈ An(s) such that un → u and vn → v as n → ∞. It thus follows that

lim sup_{n→∞} |un(t; s, xn) − u(t; s, x)| ≤ e^{2ω(2t+dλ)} ( 4|x − u| + 2dλ(|v| + Mρ(T)) ) + 2M(T − s)( c−1ρ(T)dλ + ρ(δ) ) + δλ,ε + ε

for t ∈ [s, T ], where δλ,ε = (Nλ + T)ε. Now, letting ε → 0+ and then λ → 0+, we obtain

lim sup_{n→∞} |un(t; s, xn) − u(t; s, x)| ≤ e^{4ωt} ( 4|x − u| + ρ(δ) ).

Since u ∈ Dβ ∩ dom(A(s)) and δ > 0 are arbitrary, it follows that limn→∞ |un(t; s, xn) − u(t; s, x)| = 0 uniformly on [s, T ].

The following theorems give an equivalent characterization of the consistency condition (3.3).

Theorem 3.2 Let (An, Xn), n ≥ 1, and (A, X) be dissipative operators satisfying the condition

dom(An) ⊂ R(I − λAn) and dom(A) ⊂ R(I − λA),

and set Jnλ = (I − λAn)−1 and Jλ = (I − λA)−1 for λ > 0. Also, let B be the operator that has {[Jλx, λ−1(Jλx − x)] : x ∈ dom(A), 0 < λ < ω−1} as its graph. Then the following statements (i) and (ii) are equivalent.

(i) B ⊂ limn→∞ An (i.e., for all [x, y] ∈ B there exists a sequence (xn, yn) such that [xn, yn] ∈ An and lim |xn − x| + |yn − y| = 0 as n → ∞).

(ii) dom(A) ⊂ limn→∞ dom(An) (equivalently, for all x ∈ dom(A) there exists a sequence xn such that xn ∈ dom(An) and lim |xn − x| = 0 as n → ∞), and for all xn ∈ dom(An) and x ∈ dom(A) with x = limn→∞ xn we have limn→∞ Jnλxn = Jλx for each 0 < λ < ω−1.

In particular, if dom(A) ⊂ dom(An) for all n, then the above are equivalent to

(iii) limn→∞ Jnλx = Jλx for all 0 < λ < ω−1 and x ∈ dom(A).

Proof: (i) → (ii). Assume (i) holds. Then it is easy to prove that dom(B) ⊂ limn→∞ dom(An). Since Jλx → x as λ → 0+ for all x ∈ dom(A), it follows that


dom(B) = dom(A). Thus, the first assertion of (ii) holds. Next, let xn ∈ dom(An), x ∈ dom(A) and limn→∞ xn = x. From (i), we have I − λB ⊂ limn→∞ (I − λAn). Thus, R(I − λB) ⊂ limn→∞ R(I − λAn). Since x = Jλx − λ·λ−1(Jλx − x) ∈ (I − λB)Jλx for x ∈ dom(A), we have dom(A) ⊂ R(I − λB). From (i) we can choose [un, vn] ∈ An such that lim |un − Jλx| + |vn − λ−1(Jλx − x)| = 0 as n → ∞. If zn = Jnλxn, then there exists a [zn, yn] ∈ An such that xn = zn − λyn. It thus follows from the dissipativity of An that

|un − zn| ≤ |un − zn − λ(vn − yn)| = |un − λvn − xn| → |Jλx − (Jλx − x) − x| = 0

as n → ∞. Hence, we obtain

lim Jnλxn = lim zn = lim un = Jλx as n → ∞.

(ii) → (i). If [u, v] ∈ B, then from the definition of B there exist x ∈ dom(A) and 0 < λ < ω−1 such that u = Jλx and v = λ−1(Jλx − x). Since x ∈ dom(A) ⊂ limn→∞ dom(An), we can choose xn ∈ dom(An) such that lim xn = x. Thus, from the assumption we have lim Jnλxn = Jλx = u and lim λ−1(Jnλxn − xn) = λ−1(Jλx − x) = v as n → ∞. Hence we obtain [u, v] ∈ limn→∞ An and thus B ⊂ limn→∞ An.

Finally, if dom(A) ⊂ dom(An), then (ii) → (iii) is obvious. Conversely, if (iii) holds, then

|Jnλxn − Jλx| ≤ |xn − x| + |Jnλx − Jλx| → 0 as n → ∞

when lim xn = x, and thus (ii) holds.

Theorem 3.3 Let (An, Xn), n ≥ 1, and (A, X) be m-dissipative operators, i.e.,

Xn = R(I − λAn) and X = R(I − λA),

and set Jnλ = (I − λAn)−1 and Jλ = (I − λA)−1 for 0 < λ < ω−1. Then the following statements are equivalent.

(i) A = limn→∞ An.
(ii) A ⊂ limn→∞ An.
(iii) For all xn, x ∈ X such that limn→∞ xn = x, we have limn→∞ Jnλxn = Jλx for each 0 < λ < ω−1.
(iv) For all x ∈ X and 0 < λ < ω−1, limn→∞ Jnλx = Jλx.
(v) For some 0 < λ0 < ω−1, limn→∞ Jnλ0 x = Jλ0 x for all x ∈ X.

Proof: (i) → (ii) and (iv) → (v) are obvious. (ii) → (iii) follows from the proof of (i) → (ii) in Theorem 3.2.

(v) → (ii). If [x, y] ∈ A, then from (v), lim Jnλ0(x − λ0 y) = Jλ0(x − λ0 y) = x and lim λ0−1(Jnλ0(x − λ0 y) − (x − λ0 y)) = y as n → ∞. Since λ0−1(Jnλ0(x − λ0 y) − (x − λ0 y)) ∈ An Jnλ0(x − λ0 y), we obtain [x, y] ∈ limn→∞ An and thus (ii) holds.

(ii) → (i). It suffices to show that limn→∞ An ⊂ A. If [x, y] ∈ limn→∞ An then there exists a [xn, yn] ∈ An such that lim |xn − x| + |yn − y| = 0 as n → ∞. Since xn − λyn → x − λy as n → ∞, it follows from (iii) that xn = Jnλ(xn − λyn) → Jλ(x − λy) as n → ∞, and hence x = Jλ(x − λy) ∈ dom(A). But, since

λ−1(x − (x − λy)) = λ−1(Jλ(x − λy) − (x − λy)) ∈ AJλ(x − λy) = Ax,

we have [x, y] ∈ A and thus (i) holds.


The following corollary is an immediate consequence of Theorems 3.1 and 3.3.

Corollary 3.4 Let (An(t), Xn) and (A(t), X) be m-dissipative operators for t ∈ [0, T ] (i.e.,

Xn = R(I − λAn(t)) and X = R(I − λA(t)),

and (C.2) is satisfied), and let Un(t, s), U(t, s) be the nonlinear semigroups generated by An(t), A(t), respectively. We set Jnλ(t) = (I − λAn(t))−1 and Jλ(t) = (I − λA(t))−1 for 0 < λ < ω−1. Then, if

Jnλ0 xn → Jλ0 x as n → ∞

for every sequence xn satisfying xn → x as n → ∞, then

|Un(t, s)xn − U(t, s)x| → 0 as n → ∞,

where the convergence is uniform on arbitrary bounded subintervals.

The next corollary is an extension of the Trotter–Kato theorem on the convergence of linear semigroups to nonlinear evolution operators.

Corollary 3.5 Let (An(t), X) be m-dissipative operators and Un(t, s), t ≥ s ≥ 0, be the semigroups generated by An(t). We set Jnλ(t) = (I − λAn(t))−1. For some 0 < λ0 < ω−1, we assume that lim Jnλ0(t)x exists as n → ∞ for all x ∈ X and t ≥ 0, and denote the limit by Jλ0(t)x. Then we have:

(i) The operator A(t) defined by the graph {[Jλ0(t)x, λ0−1(Jλ0(t)x − x)] : x ∈ X} is an m-dissipative operator. Hence (A(t), X) generates a nonlinear semigroup U(t, s) on X.

(ii) For every x ∈ R(Jλ0(s)) (= dom(A(s))), there exists xn ∈ dom(An(s)) such that lim xn = x and lim Un(t, s)xn = U(t, s)x for t ≥ s ≥ 0 as n → ∞. Moreover, the above convergence holds for every xn ∈ dom(An(s)) = X satisfying lim xn = x, and the convergence is uniform on arbitrary bounded intervals.

Proof: (i) Let [ui, vi] ∈ A(ti), i = 1, 2. Then from the definition of A(t) we have

ui = Jλ0(ti)xi and vi = λ0−1(Jλ0(ti)xi − xi)

for some xi, i = 1, 2. By the assumption,

lim Jnλ0(ti)xi = Jλ0(ti)xi and lim λ0−1(Jnλ0(ti)xi − xi) = λ0−1(Jλ0(ti)xi − xi)

as n → ∞ for i = 1, 2. Since λ0−1(Jnλ0(ti)xi − xi) ∈ An(ti)Jnλ0(ti)xi, it follows from (C.2) that

(1 − λω) |Jnλ0(t1)x1 − Jnλ0(t2)x2|

≤ |Jnλ0(t1)x1 − Jnλ0(t2)x2 − λ ( λ0−1(Jnλ0(t1)x1 − x1) − λ0−1(Jnλ0(t2)x2 − x2) )|

+ λ |f(t1) − f(t2)| L(|Jnλ0(t2)x2|)(1 + |λ0−1(Jnλ0(t2)x2 − x2)|).


Letting n → ∞, we obtain

(1 − λω) |u1 − u2| ≤ |u1 − u2 − λ(v1 − v2)| + λ |f(t1) − f(t2)| L(|u2|)(1 + |v2|).

Hence, (A(t), X) is dissipative. Next, from the definition of A(t), for every x ∈ X we have x = Jλ0(t)x − λ0( λ0−1(Jλ0(t)x − x) ) ∈ (I − λ0A(t))Jλ0(t)x, which implies Jλ0(t)x = (I − λ0A(t))−1x. Hence, we obtain R(I − λ0A(t)) = X and thus A(t) is m-dissipative.

(ii) Since A(t) is an m-dissipative operator, Jλ0(t) = (I − λ0A(t))−1 and dom(A(t)) = R(Jλ0(t)), it follows from Corollary 3.4 that Un(t, s)xn → U(t, s)x as n → ∞ if x ∈ dom(A(s)) and xn → x as n → ∞.
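In the simplest linear, finite-dimensional setting the content of Corollaries 3.4–3.5 is that convergence of the resolvents forces convergence of the semigroups. The sketch below checks this for a fixed matrix A perturbed by O(1/n) terms; the matrices and norms are illustrative assumptions, and the time-independent linear case is used only so that the semigroup is a matrix exponential.

import numpy as np
from scipy.linalg import expm

# Sketch of the Trotter-Kato principle: if the resolvents (I - lam*A_n)^{-1}
# converge, so do the semigroups exp(t*A_n).

A = np.array([[-2.0, 1.0], [0.0, -3.0]])
A_n = lambda n: A + (1.0 / n) * np.array([[0.0, 1.0], [1.0, 0.0]])   # perturbed generators

lam, t, x = 0.5, 1.0, np.array([1.0, -1.0])
I = np.eye(2)
for n in [1, 10, 100, 1000]:
    res_err = np.linalg.norm(np.linalg.solve(I - lam * A_n(n), x)
                             - np.linalg.solve(I - lam * A, x))
    sg_err = np.linalg.norm(expm(t * A_n(n)) @ x - expm(t * A) @ x)
    print(f"n={n:5d}  resolvent error {res_err:.2e}   semigroup error {sg_err:.2e}")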

6.5 Chernoff Theorem

In this section we discuss the Chernoff theorem for the evolution equation (2.1).

Lemma 2.12 Let Tρ(t), t ∈ [0, T ], for ρ > 0 be a family of mappings from D into itself satisfying

Tρ(t) : Dβ → Dψ(ρ,β)

for t ∈ [0, T ] and β ≥ 0,

|Tρ(t)x − Tρ(t)y| ≤ (1 + ωαρ) |x − y|

for x, y ∈ Dα, and

|Aρ(t)x − Aρ(s)x| ≤ L(|x|)(1 + |Aρ(s)x|) |f(t) − f(s)|

for x ∈ D and t, s ∈ [0, T ], where

Aρ(t)x = (1/ρ)(Tρ(t)x − x), x ∈ D.

Then, the evolution operator Aρ(t), t ∈ [0, T ], satisfies (C.2).

Proof: For x1, x2 ∈ Dα,

|x1 − x2 − λ(Aρ(t)x1 − Aρ(s)x2)|

≥ (1 + λ/ρ) |x1 − x2| − (λ/ρ)( |Tρ(t)x1 − Tρ(t)x2| + |Tρ(t)x2 − Tρ(s)x2| )

≥ (1 − λωα) |x1 − x2| − λ L(|x2|)(1 + |Aρ(s)x2|) |f(t) − f(s)|.

Lemma 2.14 Let X0 be a closed convex subset of a Banach space X and α ≥ 1. Suppose that C(t) : X0 → X0, t ∈ [0, T ], is a family of evolution operators satisfying

(2.30) |C(t)x − C(t)y| ≤ α |x − y|

for x, y ∈ X0, and

(2.31) |C(t)x − C(s)x| ≤ L(|x|)(δ + |(C(s) − I)x|) |g(t) − g(s)|,

where δ > 0 and g : [0, T ] → X is continuous. Then, there exists a unique function u = u(t; x) ∈ C1(0, T ; X0) satisfying

(2.32) d/dt u(t) = (C(t) − I)u(t), u(0) = x,


and we have

(2.33) |u(t; x) − u(t; y)| ≤ e^{(α−1)t} |x − y|.

Proof: It suffices to prove that there exists a unique function u ∈ C(0, T ; X0) satisfying

(2.34) u(t) = e^{−t}x + ∫_0^t e^{−(t−s)} C(s)u(s) ds, t ∈ [0, T ].

First, note that if v : [0, T ] → X0 is continuous, then for t ∈ [0, T ]

e^{−t}x + ∫_0^t e^{−(t−s)} C(s)v(s) ds = (1 − λ)x + λ ( ∫_0^t e^{−(t−s)} C(s)v(s) ds / ∫_0^t e^{−(t−s)} ds ) ∈ X0,

where

λ = ∫_0^t e^{−(t−s)} ds,

since X0 is closed and convex. Define a sequence of functions un ∈ C(0, T ; X0) by

un(t) = e^{−t}x + ∫_0^t e^{−(t−s)} C(s)un−1(s) ds

with u0(t) = x ∈ X0. By induction we can show that

|un+1(t) − un(t)| ≤ ((αt)^n / n!) ∫_0^t |C(s)x − x| ds

and thus un is a Cauchy sequence in C(0, T ; X0). It follows that un converges to a limit u ∈ C(0, T ; X0) and u satisfies (2.34). Also, it is easy to show that there exists a unique continuous function satisfying (2.34). Moreover, since for x, y ∈ X0

e^t |u(t; x) − u(t; y)| ≤ |x − y| + ∫_0^t α e^s |u(s; x) − u(s; y)| ds,

by Gronwall's inequality we have (2.33). Since s → C(s)u(s) ∈ X0 is continuous, u ∈ C1(0, T ; X0) satisfies (2.32).
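The fixed-point construction in the proof can be imitated numerically: iterate the map u ↦ e^{−t}x + ∫_0^t e^{−(t−s)}C(s)u(s) ds on a time grid and watch the updates contract. In the sketch below C(t) is a family of rotations (so (2.30) holds with α = 1) and a trapezoidal rule replaces the integral; these choices are illustrative assumptions.

import numpy as np

# Sketch of the Picard iteration from the proof of Lemma 2.14 on a uniform time grid.

alpha = 1.0
C = lambda t: np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])  # |C(t)x - C(t)y| = |x - y|

T, m = 2.0, 200
ts = np.linspace(0.0, T, m + 1)
dt = ts[1] - ts[0]
x = np.array([1.0, 0.5])
u = np.tile(x, (m + 1, 1))                    # u_0(t) = x

for k in range(30):
    Cu = np.array([C(s) @ u[i] for i, s in enumerate(ts)])   # integrand C(s)u_k(s)
    u_new = np.empty_like(u)
    u_new[0] = x
    for i in range(1, m + 1):
        w = np.exp(-(ts[i] - ts[: i + 1]))[:, None] * Cu[: i + 1]
        integral = dt * (0.5 * w[0] + w[1:i].sum(axis=0) + 0.5 * w[i])   # trapezoidal rule
        u_new[i] = np.exp(-ts[i]) * x + integral
    diff = np.max(np.linalg.norm(u_new - u, axis=1))
    u = u_new

print("size of the last Picard update:", diff)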

Theorem 2.15 Let α ≥ 1 and let C(t) : Dβ → Dβ, t ∈ [0, T ], be a family of evolution operators satisfying (2.30)–(2.31). Assume that Dβ is a closed convex subset of X and g is Lipschitz continuous with Lipschitz constant Lg. Then, if we let u(t) = u(t; x) be the solution to

d/dt u(t; x) = (C(t) − I)u(t; x), u(0; x) = x,

there exist constants Mτ such that for t ∈ [0, τ ]

(2.35) |u(t; x) − Π_{i=1}^n Ci x| ≤ α^n e^{(α−1)t} [ (n − αt)² + αt ]^{1/2} ( Cn,τ + |C(0)x − x| )

+ Lg L(Mτ) ∫_0^t e^{(α−1)(t−s)} [ ((n − s) − α(t − s))² + α(t − s) ]^{1/2} K(|C(s)u(s) − u(s)|) ds


where Ci = C(i), i ≥ 0 and Cn,τ = L(|x|)Lg(δ + |C(0)x− x|) max(n, τ).

Proof: Note that

(2.36) u(t;x)− x =

∫ t

0es−t(C(s)u(s;x)− x) ds

Thus, from (2.30)–(2.31)

|u(t;x)− x| ≤∫ t

0es−t(|C(s)u(s;x)− C(0)u(s, x)|+ |C(0)u(s;x)− C(0)x|+ |C(0)x− x|) ds

≤∫ t

0es−t(L(|u(s, x)|)|g(s)− g(0)|K(|C(0)x− x|) + |C(0)x− x|+ α |u(s;x)− x|) ds

where K(s) = δ + s. Or, equivalently

et |u(t;x)−x| ≤∫ t

0es(L(M)|g(s)−g(0)|K(|C(0)x−x|)+|C(0)x−x|+α es|u(s;x)−x|) ds

since |u(t, x)| ≤M on [0, τ ] for some M = Mτ . Hence by Gronwall’s inequality

(2.37) |u(t;x)−x| ≤∫ t

0e(α−1)(t−s)(L(M)|g(s)−g(0)|K(|C(0)x−x|)+ |C(0)x−x|) ds.

It follows from (2.36) that

u(t;x)−Πni=1Cix = e−t(x−Πn

i=1Cix) +

∫ t

0es−t(C(s)u(s;x)−Πn

i=1Cix) ds.

Thus, by assumption

|u(t)−Πni=1Cix| ≤ e−t|x−Πn

i=1Cix|+∫ t

0es−t(|C(s)u(s)− Cnu(s)|+ |Cnu(s)− CnΠn−1

i=1 Cix|) ds

≤ e−tαnδn +

∫ t

0es−t(L(M)|g(s)− g(n)| |C(s)x− x|+ α |u(s)−Πn−1

i=1 Cix| ds

where δn =∑n

i=1 |Cix− x|. If we define

ϕn(t) = α−net|u(t, x)−Πni=1Cix|

then

ϕn(t) ≤ δn +

∫ t

0(Mesα−n|g(s)− g(n)|+ ϕn−1(s)) ds

By induction in n we obtain from (2.36)–(2.37)(2.38)

ϕn(t) ≤n−1∑k=0

δn−ktk

k!+

1

(n− 1)!

∫ t

0(t− s)n−1αseαs ds (L(|x|)LgτK(|C(0)x− x|) + |C(0)x− x|)

+L(M)α−nn∑k=0

∫ t

0esαk

(t− s)k

k!|C(s)u(s)− u(s)| |g(s)− g(n− k)| ds


on t ∈ [0, τ ], where we used ∫ t

0e(α−1)s ds ≤ αte−teαt.

Since ∫ t

0(t− s)n−1sk+1 ds = tk+n+1 (k + 1)!(n− 1)!

(k + n+ 1)!.

we have∫ t

0(t− s)n−1seαs ds =

∞∑k=0

αk

k!

∫ t

0(t− s)n−1sk+1 ds

= (n− 1)!

∞∑k=0

(k + 1)αktk+n+1

(k + n+ 1)!= (n− 1)!

∞∑k=n+1

(k − n)αk−1tk

k!.

Note that from (2.31)

|C(k)x− x| ≤ L(|x|)LgkK(|C(0)x− x|) for 0 ≤ k ≤ n.

Let Cn,τ = L(|x|)LgK(|C(0)x− x|)max(n, τ). Then we have(2.39)n−1∑k=0

δn−ktk

k!+

1

(n− 1)!

∫ t

0(t− s)n−1αseαs ds (L(|x|)LgτK(|C(0)x− x|) + |C(0)x− x|)

( ∞∑k=0

|k − n|αktk

k!

)(Cn,τ + |C(0)x− x|) ≤ Ceαt [(n− αt)2 + αt]

12 (Cn,τ + |C(0)x− x|),

where we used

∞∑k=0

|k − n|αktk

k!≤ αt

2

( ∞∑k=0

|k − n|2αktk

k!

) 12

≤ eαt[(n− αt)2 + αt]12

Moreover, we have

(2.40)

n∑k=0

esαk(t− s)k

k!|g(s)− g(n− k)|

≤ Lgesn∑k=0

αk(t− s)k

k!|s− (n− k)|

≤ Lgeseα(t−s)

2

( ∞∑k=0

|s− (n− k)|2αk(t− s)k

k!

) 12

≤ Lgeseα(t−s) [(n− s)− α(t− s)2 + α(t− s)]12

Hence (2.35) follows from (2.36)–(2.40).


Theorem 2.16 Let Tρ(t), t ∈ [0, T ], for ρ > 0 be a family of mappings from D into itself satisfying

(2.41) Tρ(t) : Dβ → Dψ(ρ,β)

for t ∈ [0, T ] and β ≥ 0,

(2.42) |Tρ(t)x − Tρ(t)y| ≤ (1 + ωαρ) |x − y|

for x, y ∈ Dα, and

(2.43) |Aρ(t)x − Aρ(s)x| ≤ L(|x|)(1 + |Aρ(s)x|) |f(t) − f(s)|

for x ∈ D and t, s ∈ [0, T ], where

Aρ(t)x = (1/ρ)(Tρ(t)x − x), x ∈ D.

Assume that Dβ is a closed convex subset of X for each β ≥ 0 and f is Lipschitz continuous on [0, T ] with Lipschitz constant Lf. Then, if we let uρ(t) = u(t; x) be the solution to

(2.44) d/dt u(t; x) = Aρ(t)u(t; x), u(0; x) = x ∈ Dα,

there exist a constant M and ω = ωα such that

|uρ(t) − Π_{k=1}^{[t/ρ]} Tρ(kρ)x| ≤ e^{ωt} [ (1 + ωt)²ρ + ωtρ + t ]^{1/2} ( e^{ωt}( L(|x|)Lf t(1 + |Aρ(0)x|) + |Aρ(0)x| ) + Lf L(M) ∫_0^t e^{ω(t−σ)}(1 + |Aρ(σ)uρ(σ)|) dσ ) √ρ

for x ∈ dom(A(0)) and t ∈ [0, T ].

Proof: Let α = 1 + ωρ and let C(t) = Tρ(ρt), g(t) = f(ρt) and δ = ρ. Then C(t) satisfies (2.30)–(2.31). Next, note that |uρ(t)| ≤ M. It thus follows from Lemma 2.14 and Theorem 2.15 that

|uρ(t) − Π_{k=1}^{[t/ρ]} Tρ(kρ)x| ≤ e^{2ωt} [ (1 + ωt)²ρ + ωtρ + t ]^{1/2} √ρ ( |Aρ(0)x| + L(|x|)Lf t(1 + |Aρ(0)x|) )

+ Lf L(M) e^{ωt} ∫_0^t e^{ω(t−σ)} [ (1 + ω(t − σ))²ρ + ω(t − σ)ρ + (t − σ) ]^{1/2} √ρ (1 + |Aρ(σ)uρ(σ)|) dσ,

where we set s = σ/ρ, since Lg = Lf ρ.

Theorem 2.17 Assume that (A(t), dom(A(t)), D, ϕ) satisfies (A.1)–(A.2) and either (R)–(C.1) or (2.3)–(C.2), that Dβ is a closed convex subset of X for each β ≥ 0, and that f is Lipschitz continuous on [0, T ]. Let Tρ(t), t ∈ [0, T ], for ρ > 0 be a family of mappings from D into itself satisfying (2.41)–(2.43) and assume the consistency:

for β ≥ 0, t ∈ [0, T ], and [x, y] ∈ A(t) with x ∈ Dβ,

there exists xρ ∈ Dα(β) such that lim |xρ − x| + |Aρ(t)xρ − y| = 0 as ρ → 0.


Then, for 0 ≤ s ≤ t ≤ T and x ∈ D ∩ dom(A(s)),

(2.45) |Π_{k=1}^{[(t−s)/ρ]} Tρ(kρ + s)x − u(t; s, x)| → 0 as ρ → 0+

uniformly on [0, T ], where u(t; s, x) is the mild solution to (2.1) defined in Theorem 2.10.

Proof: Without loss of generality we can assume that s = 0. It follows from Lemma 2.4 and Lemma 2.12 that

(2.46) |Aρ(t)uρ(t)| ≤ e^{(α−1)t + L(M)|f|BV(0,T)} ( |Aρ(0)x| + L(M)|f|BV(0,T) )

on [0, T ]. Note that uρ(t) satisfies

(uρ(iλ) − uρ((i − 1)λ))/λ = Aρ(iλ)uρ(iλ) + ελi,

where

ελi = (1/λ) ∫_{(i−1)λ}^{iλ} ( Aρ(s)uρ(s) − Aρ(iλ)uρ(iλ) ) ds.

Since Aρ : [0, T ] × X → X and t → uρ(t) ∈ X are Lipschitz continuous, it follows that

∑_{i=1}^{Nλ} λ |ελi| → 0 as λ → 0+.

Hence uρ(t) is the integral solution to (2.44). It thus follows from Theorem 3.2 that

|uρ(t) − u(t; s, x)|X → 0 as ρ → 0

for x ∈ D ∩ dom(A(s)). Now, from Theorem 2.16 and (2.46),

|Π_{k=1}^{[(t−s)/ρ]} Tρ(kρ + s)x − uρ(t)| ≤ M √ρ

for x ∈ Dβ ∩ dom(A(s)). Thus, (2.45) holds for x ∈ Dβ ∩ dom(A(s)). By the continuity of the right-hand side of (2.45) with respect to the initial condition x ∈ X, (2.45) holds for all x in the closure of Dβ ∩ dom(A(s)).

The next corollary follows from Theorem 2.17 and is an extension of the Chernoff theorem for autonomous nonlinear semigroups to the evolution case.

Corollary 2.18 (Chernoff Theorem) Let C be a closed convex subset of X. We assume that the evolution operator (A(t), X) satisfies (A.1) with Lipschitz continuous f and

C ⊂ R(I − λA(t)) and (I − λA(t))^{−1} C ⊂ C

for 0 < λ ≤ δ and t ≥ 0. Let T_ρ(t), t ≥ 0, for ρ > 0 be a family of mappings from C into itself satisfying

|T_ρ(t)x − T_ρ(t)y| ≤ (1 + ωρ) |x − y|

and

|A_ρ(t)x − A_ρ(s)x| ≤ L(|x|)(1 + |A_ρ(s)x|) |f(t) − f(s)|

for x, y ∈ C. If A(t) ⊂ lim_{ρ→0^+} A_ρ(t), or equivalently

(I − λ (T_ρ(t) − I)/ρ)^{−1} x → (I − λA(t))^{−1} x


for all x ∈ C and 0 < λ ≤ δ, then for x ∈ C and 0 ≤ s ≤ t

|Π_{k=1}^{[(t−s)/ρ]} T_ρ(s + kρ)x − U(t, s)x| → 0 as ρ → 0^+,

where U(t, s) is the nonlinear semigroup generated by A(t) and the convergence is uniform on arbitrary bounded intervals.

Corollary 2.19 (Chernoff Theorem) Assume that (A(t), X) is m-dissipative and satisfies (A.1) with Lipschitz continuous f, and that dom(A(t)) is independent of t ∈ [0, T] and convex. Let T_ρ(t), t ≥ 0, for ρ > 0 be a family of mappings from X into itself satisfying

|Tρ(t)x− Tρ(t)y| ≤ (1 + ωρ) |x− y|

for x, y ∈ X, and

|Aρ(t)x−Aρ(s)x| ≤ L(|x|)(1 + |Aρ(s)x|) |f(t)− f(s)|

If A(t) ⊂ limρ→0+ Aρ(t), or equivalently

(I − λ_0 (T_ρ(t) − I)/ρ)^{−1} x → (I − λ_0 A(t))^{−1} x

for all x ∈ X, t ≥ 0 and some 0 < λ_0 < ω^{−1}, then for x ∈ X and t ≥ s ≥ 0

|Π_{k=1}^{[(t−s)/ρ]} T_ρ(s + kρ)x − U(t, s)x| → 0 as ρ → 0^+,

where U(t, s) is the nonlinear semigroup generated by A(t) and the convergence is uniform on arbitrary bounded intervals.

Theorem 2.20 Let (A(t), X) be m-dissipative subsets of X × X satisfying (A.1) with Lipschitz continuous f, and assume that dom(A(t)) is independent of t ∈ [0, T] and convex. Let A_λ(t) = λ^{−1}(J_λ(t) − I) for λ > 0 and t ∈ [0, T], and let u(t; s, x) = U(t, s; A_λ)x be the solution to

(d/dt) u(t) = A_λ(t)u(t),  u(s) = x ∈ dom(A(s)).

Then, we have

U(t, s)x = lim_{λ→0^+} Π_{k=1}^{[t/λ]} J_λ(s + kλ)x = lim_{λ→0^+} U(t, s; A_λ)x.

6.6 Operator Splitting

Theorem 3.1 Let X and X^* be uniformly convex and let A_n, n ≥ 1, and A be m-dissipative subsets of X × X. If for all [x, y] ∈ A there exist [x_n, y_n] ∈ A_n such that |x_n − x| + |y_n − y| → 0 as n → ∞, then

S_n(t)x_n → S(t)x as n → ∞

for every sequence x_n ∈ dom(A_n) satisfying x_n → x ∈ dom(A), where the convergence is uniform on arbitrary bounded intervals.


Proof: It follows from Theorems 1.4.2–1.4.3 that for x ∈ dom(A), S(t)x ∈ dom(A^0), (d^+/dt) S(t)x = A^0 S(t)x, and t → A^0 S(t)x ∈ X is continuous except at countably many values t ≥ 0. Hence if we define x_i^λ = S(t_i^λ)x, t_i^λ = iλ, then

(x_i^λ − x_{i−1}^λ)/λ − A^0 x_i^λ = ε_i^λ = λ^{−1} ∫_{t_{i−1}^λ}^{t_i^λ} (A^0 S(t)x − A^0 S(t_i^λ)x) dt,

where

Σ_{i=1}^{N_λ} λ |ε_i^λ| ≤ ∫_0^T |A^0 S(t)x − A^0 S(([t/λ] + 1)λ)x| dt.

Since |A^0 S(t)x − A^0 S(([t/λ] + 1)λ)x| → 0 a.e. t ∈ [0, T] as λ → 0^+, by the Lebesgue dominated convergence theorem Σ_{i=1}^{N_λ} λ |ε_i^λ| → 0 as λ → 0. Hence it follows from the proof of Theorem 2.3.2 that |S_n(t)x_n − S(t)x| → 0 as n → ∞ for all x_n ∈ dom(A_n) satisfying x_n → x ∈ dom(A). The theorem follows from the fact that S(t) and S_n(t) are contractions.

Theorem 3.2 Let X and X^* be uniformly convex and let A and B be two m-dissipative subsets of X × X. Assume that A + B is m-dissipative and let S(t) be the semigroup generated by A + B. Then we have

S(t)x = lim_{ρ→0^+} ((I − ρA)^{−1}(I − ρB)^{−1})^{[t/ρ]} x

for x ∈ dom(A) ∩ dom(B), uniformly on bounded intervals of t.

Proof: Define T_ρ = J_ρ^A J_ρ^B and let x_ρ = x − ρb where b ∈ Bx. Then, since J_ρ^B(x − ρb) = x,

(T_ρ x_ρ − x_ρ)/ρ = (J_ρ^A x − x)/ρ + b.

Hence ρ^{−1}(T_ρ x_ρ − x_ρ) → A^0 x + b as ρ → 0^+. If we choose b ∈ Bx such that A^0 x + b = (A + B)^0 x, then ρ^{−1}(T_ρ x_ρ − x_ρ) → (A + B)^0 x as ρ → 0^+. Thus, the theorem follows from Theorem 3.1 and Corollary 2.3.6.
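A small numerical sketch of the Lie resolvent splitting in Theorem 3.2 (not part of the text; the matrices A and B below are illustrative assumptions, and in this linear case S(t)x = e^{t(A+B)}x):

import numpy as np
from scipy.linalg import expm

A = np.array([[-2.0, 1.0], [0.0, -1.0]])
B = np.array([[-1.0, 0.0], [0.5, -3.0]])
I = np.eye(2)

def lie_resolvent_splitting(x, t, rho):
    # ((I - rho*A)^{-1}(I - rho*B)^{-1})^{[t/rho]} x
    n = int(t / rho)
    for _ in range(n):
        x = np.linalg.solve(I - rho * B, x)   # apply (I - rho*B)^{-1}
        x = np.linalg.solve(I - rho * A, x)   # apply (I - rho*A)^{-1}
    return x

x0, T = np.array([1.0, 2.0]), 1.0
exact = expm((A + B) * T) @ x0                # S(T)x for this linear example
for rho in [0.1, 0.01, 0.001]:
    err = np.linalg.norm(lie_resolvent_splitting(x0, T, rho) - exact)
    print(f"rho = {rho:6.3f}   error = {err:.2e}")   # error decays as rho -> 0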

Theorem 3.3 Let X and X^* be uniformly convex and let A and B be two m-dissipative subsets of X × X. Assume that A + B is m-dissipative and let S(t) be the semigroup generated by A + B. Then we have

S(t)x = lim_{ρ→0^+} ( (2(I − (ρ/2)A)^{−1} − I)(2(I − (ρ/2)B)^{−1} − I) )^{[t/ρ]} x

for x ∈ dom(A) ∩ dom(B), uniformly on bounded intervals of t.

Proof: Define T_{2ρ} = (2J_ρ^A − I)(2J_ρ^B − I) and let x_ρ = x − ρb where b ∈ Bx. Then, since J_ρ^B(x − ρb) = x, it follows that 2J_ρ^B(x_ρ) − x_ρ = x + ρb and

(T_{2ρ} x_ρ − x_ρ)/(2ρ) = (2J_ρ^A(x + ρb) − (x + ρb) − (x − ρb))/(2ρ) = (J_ρ^A(x + ρb) − x)/ρ.

If we define the subset E of X × X by Ex = Ax + b, then E is m-dissipative and J_ρ^A(x + ρb) = J_ρ^E x. Thus, it follows from Theorem 1.6 that (2ρ)^{−1}(T_{2ρ} x_ρ − x_ρ) → (A + B)^0 x as ρ → 0^+, and hence the theorem follows from Theorem 3.1 and Corollary 2.3.6.


Theorem 3.4 Let X and X^* be uniformly convex and let A and B be two m-dissipative subsets of X × X. Assume that A, B are single-valued and A + B is m-dissipative. Let S_A(t), S_B(t) and S(t) be the semigroups generated by A, B and A + B, respectively. Then we have

S(t)x = lim_{ρ→0^+} (S_A(ρ)S_B(ρ))^{[t/ρ]} x

for x ∈ dom(A) ∩ dom(B), uniformly on bounded intervals of t.

Proof: Clearly T_ρ = S_A(ρ)S_B(ρ) is nonexpansive on C = dom(A) ∩ dom(B). We show that lim_{ρ→0^+} ρ^{−1}(T_ρ x − x) = Ax + B^0 x for every x ∈ dom(A) ∩ dom(B). Note that

(T_ρ x − x)/ρ = (S_A(ρ)x − x)/ρ + (S_A(ρ)S_B(ρ)x − S_A(ρ)x)/ρ.

Since A is single-valued, it follows from Theorem 1.4.3 that lim_{ρ→0^+} ρ^{−1}(S_A(ρ)x − x) = Ax. Thus, it suffices to show that

y_ρ = (S_A(ρ)S_B(ρ)x − S_A(ρ)x)/ρ → B^0 x as ρ → 0^+.

Since S_A(ρ) is nonexpansive and B is dissipative, it follows from Theorem 1.4.3 that

(3.1)  |y_ρ| ≤ |(S_B(ρ)x − x)/ρ| ≤ |B^0 x| for all ρ > 0.

On the other hand,

(3.2)  〈(S_A(ρ)u − u)/ρ + (S_B(ρ)x − x)/ρ − (S_A(ρ)x − x)/ρ − y_ρ, F(u − S_B(ρ)x)〉 ≥ 0,

since S_A(ρ) − I is dissipative and F is single-valued. We choose a subsequence y_{ρ_n} that converges weakly to y in X. It thus follows from (3.2), since A is single-valued, that

〈Au + B^0 x − Ax − y, F(u − x)〉 ≥ 0.

Since A is maximal dissipative, this implies that y = B^0 x and thus y_{ρ_n} converges weakly to B^0 x. Since X is uniformly convex, (3.1) implies that y_{ρ_n} converges strongly to B^0 x as ρ_n → 0^+. Now, since C = A + B is m-dissipative and C^0 x = Ax + B^0 x, the theorem follows from Corollary 2.3.6.

7 Monotone Operator Theory and Nonlinear Semigroups

In this section we discuss the nonlinear operator theory that extends the Lax–Milgram theory to nonlinear monotone operators and mappings from a Banach space X to X^*.

Let us denote by F : X → X^* the duality mapping of X, i.e.,

F(x) = {x^* ∈ X^* : 〈x, x^*〉 = |x|^2 = |x^*|^2}.


By the Hahn–Banach theorem, F(x) is non-empty. In general F is multi-valued. When X is a Hilbert space and X^* is identified with X, 〈·, ·〉 coincides with the inner product and F(x) = x.
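For illustration (a sketch added here, assuming X = R^n equipped with the p-norm, 1 < p < ∞), the duality mapping has the explicit form F(x)_i = |x|_p^{2−p} |x_i|^{p−2} x_i, and the defining relations can be checked numerically:

import numpy as np

def duality_map(x, p):
    # duality mapping of (R^n, |.|_p); q is the conjugate exponent
    q = p / (p - 1.0)
    norm = np.linalg.norm(x, ord=p)
    Fx = norm ** (2.0 - p) * np.sign(x) * np.abs(x) ** (p - 1.0)
    return Fx, q

p = 3.0
x = np.array([1.0, -2.0, 0.5])
Fx, q = duality_map(x, p)
print(np.dot(x, Fx))                      # <x, F(x)>
print(np.linalg.norm(x, ord=p) ** 2)      # |x|_p^2
print(np.linalg.norm(Fx, ord=q) ** 2)     # |F(x)|_q^2  (all three values agree)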

Let H be a Hilbert space with scalar product (φ, ψ) and X be a real, reflexive Banach space with X ⊂ H with continuous dense injection. Let X^* denote the strong dual space of X. H is identified with its dual so that X ⊂ H = H^* ⊂ X^*. The dual product 〈φ, ψ〉 on X × X^* is the continuous extension of the scalar product of H restricted to X × H.

The following theorem contains some further important properties of the duality mapping F.

Theorem (Duality Mapping) (a) F(x) is a closed convex subset of X^*.
(b) If X^* is strictly convex (i.e., balls in X^* are strictly convex), then for any x ∈ X, F(x) is single-valued. Moreover, the mapping x → F(x) is demicontinuous, i.e., if x_n → x in X, then F(x_n) converges weakly star to F(x) in X^*.
(c) Assume X is uniformly convex (i.e., for each 0 < ε < 2 there exists δ = δ(ε) > 0 such that if |x| = |y| = 1 and |x − y| > ε, then |x + y| ≤ 2(1 − δ)). If x_n converges weakly to x and lim sup_{n→∞} |x_n| ≤ |x|, then x_n converges strongly to x in X.
(d) If X^* is uniformly convex, then the mapping x → F(x) is uniformly continuous on bounded subsets of X.

Proof: (a) Closedness of F(x) is an easy consequence of the continuity of the duality product. Choose x_1^*, x_2^* ∈ F(x) and α ∈ (0, 1). For arbitrary z ∈ X we have (using |x_1^*| = |x_2^*| = |x|) 〈z, αx_1^* + (1 − α)x_2^*〉 ≤ α|z||x_1^*| + (1 − α)|z||x_2^*| = |z||x|, which shows |αx_1^* + (1 − α)x_2^*| ≤ |x|. Using 〈x, x_1^*〉 = 〈x, x_2^*〉 = |x|^2 we get 〈x, αx_1^* + (1 − α)x_2^*〉 = α〈x, x_1^*〉 + (1 − α)〈x, x_2^*〉 = |x|^2, so that |αx_1^* + (1 − α)x_2^*| = |x|. This proves αx_1^* + (1 − α)x_2^* ∈ F(x).
(b) Choose x_1^*, x_2^* ∈ F(x), α ∈ (0, 1) and assume that |αx_1^* + (1 − α)x_2^*| = |x|. Since X^* is strictly convex, this implies x_1^* = x_2^*. Let x_n be a sequence such that x_n → x ∈ X. From |F(x_n)| = |x_n| and the fact that closed balls in X^* are weakly star compact, we see that there exists a weak star accumulation point x^* of F(x_n). Since the closed ball in X^* is weakly star closed,

〈x, x^*〉 = |x|^2 ≥ |x^*|^2.

Hence 〈x, x^*〉 = |x|^2 = |x^*|^2 and thus x^* = F(x). Since F(x) is single-valued, this implies F(x_n) converges weakly star to F(x).
(c) Since |x| ≤ lim inf |x_n| and lim sup |x_n| ≤ |x|, we have lim_{n→∞} |x_n| = |x|. We set y_n = x_n/|x_n| and y = x/|x|. Then y_n converges weakly to y in X. Suppose y_n does not converge strongly to y in X. Then there exists an ε > 0 such that, for a subsequence, |y_n − y| > ε. Since X is uniformly convex there exists a δ > 0 such that |y_n + y| ≤ 2(1 − δ). Since the norm is weakly lower semicontinuous, letting n → ∞ we obtain |y| ≤ 1 − δ, which is a contradiction.
(d) Assume F is not uniformly continuous on bounded subsets of X. Then there exist constants M > 0, ε > 0 and sequences u_n, v_n in X satisfying

|u_n|, |v_n| ≤ M,  |u_n − v_n| → 0, and |F(u_n) − F(v_n)| ≥ ε.

Without loss of generality we can assume that, in addition, |u_n| ≥ β and |v_n| ≥ β for a constant β > 0.


We set x_n = u_n/|u_n| and y_n = v_n/|v_n|. Then we have

|x_n − y_n| = (1/(|u_n||v_n|)) | |v_n| u_n − |u_n| v_n | ≤ (1/β^2) ( |v_n| |u_n − v_n| + ||v_n| − |u_n|| |v_n| ) ≤ (2M/β^2) |u_n − v_n| → 0 as n → ∞.

Obviously we have 2 ≥ |F(x_n) + F(y_n)| ≥ 〈x_n, F(x_n) + F(y_n)〉 and this together with

〈x_n, F(x_n) + F(y_n)〉 = |x_n|^2 + |y_n|^2 + 〈x_n − y_n, F(y_n)〉 = 2 + 〈x_n − y_n, F(y_n)〉 ≥ 2 − |x_n − y_n|

implies

lim_{n→∞} |F(x_n) + F(y_n)| = 2.

Suppose there exist an ε_0 > 0 and a subsequence n_k such that |F(x_{n_k}) − F(y_{n_k})| ≥ ε_0. Observing |F(x_{n_k})| = |F(y_{n_k})| = 1 and using the uniform convexity of X^*, we conclude that there exists a δ_0 > 0 such that

|F(x_{n_k}) + F(y_{n_k})| ≤ 2(1 − δ_0),

which is a contradiction to the above. Therefore we have lim_{n→∞} |F(x_n) − F(y_n)| = 0. Thus

|F(u_n) − F(v_n)| ≤ |u_n| |F(x_n) − F(y_n)| + ||u_n| − |v_n|| |F(y_n)|,

which implies |F(u_n) − F(v_n)| → 0. This contradiction proves the result.

7.1 Monotone equations and Minty–Browder Theorem

Definition (Monotone Mapping) (a) A mapping A ⊂ X × X^* is called monotone if

〈x1 − x2, y1 − y2〉 ≥ 0 for all [x1, y1], [x2, y2] ∈ A.

(b) A monotone mapping A is called maximal monotone if any monotone extension of A coincides with A, i.e., if [x, y] ∈ X × X^* satisfies 〈x − u, y − v〉 ≥ 0 for all [u, v] ∈ A, then [x, y] ∈ A.
(c) The operator A is called coercive if for all sequences [x_n, y_n] ∈ A with lim_{n→∞} |x_n| = ∞ we have

lim_{n→∞} 〈x_n, y_n〉/|x_n| = ∞.

(d) Assume that A is single-valued with dom(A) = X. The operator A is called hemicontinuous on X if for all x_1, x_2, x ∈ X, the function defined by

t ∈ R→ 〈x,A(x1 + tx2)〉

is continuous on R.


For example, let F be the duality mapping of X. Then F is monotone, coerciveand hemicontinuous. Indeed, for [x1, y1], [x2, y2] ∈ F we have

〈x1 − x2, y1 − y2〉 = |x1|2 − 〈x1, y2〉 − 〈x2, y1〉+ |x2|2 ≥ (|x1| − |x2|)2 ≥ 0, (7.1)

which shows monotonicity of F . Coercivity is obvious and hemicontinuity follows fromthe continuity of the duality product.

Lemma 1 Let X be a finite dimensional Banach space and A be a hemicontinuous monotone operator from X to X^*. Then A is continuous.

Proof: We first show that A is bounded on bounded subsets. In fact, otherwise there exists a sequence x_n in X such that |Ax_n| → ∞ and x_n → x_0 as n → ∞. By monotonicity we have

〈x_n − x, Ax_n/|Ax_n| − Ax/|Ax_n|〉 ≥ 0 for all x ∈ X.

Without loss of generality we can assume that Ax_n/|Ax_n| → y_0 in X^* as n → ∞. Thus

〈x0 − x, y0〉 ≥ 0 for all x ∈ X

and therefore y0 = 0. This is a contradiction and thus A is bounded. Now, assumexn converges to x0 and let y0 be a cluster point of Axn. Again by monotonicityof A

〈x0 − x, y0 −Ax〉 ≥ 0 for all x ∈ X.

Setting x = x0 + t (u− x0), t > 0 for arbitrary u ∈ X, we have

〈x_0 − u, y_0 − A(x_0 + t(u − x_0))〉 ≥ 0 for all u ∈ X.

Letting t → 0^+, by hemicontinuity of A we have

〈x0 − u, y0 −Ax0〉 ≥ 0 for all u ∈ X,

which implies y0 = Ax0.

Lemma 2 Let X be a reflexive Banach space and A : X → X^* be a hemicontinuous monotone operator. Then A is maximal monotone.

Proof: Let [x_0, y_0] ∈ X × X^* be such that

〈x_0 − u, y_0 − Au〉 ≥ 0 for all u ∈ X.

Setting u = x_0 + t(x − x_0), t > 0, and letting t → 0^+, by hemicontinuity of A we have

〈x_0 − x, y_0 − Ax_0〉 ≥ 0 for all x ∈ X.

Hence y_0 = Ax_0 and thus A is maximal monotone.

The next theorem characterizes maximal monotone operators by a range condition.

Minty–Browder Theorem Assume that X, X∗ are reflexive and strictly convex. LetF denote the duality mapping of X and assume that A ⊂ X ×X∗ is monotone. ThenA is maximal monotone if and only if

Range (λF +A) = X∗


for all λ > 0 or, equivalently, for some λ > 0.

Proof: Assume that the range condition is satisfied for some λ > 0 and let [x0, y0] ∈X ×X∗ be such that

〈x0 − u, y0 − v〉 ≥ 0 for all [u, v] ∈ A.

Then there exists an element [x1, y1] ∈ A with

λFx1 + y1 = λFx0 + y0. (7.2)

From these we obtain, setting [u, v] = [x1, y1],

〈x1 − x0, Fx1 − Fx0〉 ≤ 0.

By monotonicity of F we also have the converse inequality, so that

〈x1 − x0, Fx1 − Fx0〉 = 0.

From (7.1) this implies that |x1| = |x0| and 〈x1, Fx0〉 = |x1|2, 〈x0, Fx1〉 = |x0|2. HenceFx0 = Fx1 and

〈x1, Fx0〉 = 〈x0, Fx0〉 = |x0|2 = |Fx0|2.

If we denote by F ∗ the duality mapping of X∗ (which is also single-valued), then the lastequation implies x1 = x0 = F ∗(Fx0). This and (7.2) imply that [x0, y0] = [x1, y1] ∈ A,which proves that A is maximal monotone.

Instead of the detailed proof of the "only if" part of the theorem, we state the following results.

Corollary Let X be reflexive and A be a monotone, everywhere defined, hemicontinuous operator. If A is coercive, then R(A) = X^*.

Proof: Suppose A is coercive. Let y_0 ∈ X^* be arbitrary. By Asplund's renorming theorem, we may assume that X and X^* are strictly convex Banach spaces. It then follows from the theorem that for every λ > 0 the equation

λF x_λ + A x_λ = y_0

has a solution x_λ ∈ X. Applying this to x_λ,

λ |x_λ|^2 + 〈x_λ, Ax_λ〉 = 〈y_0, x_λ〉,

and thus

〈x_λ, Ax_λ〉/|x_λ|_X ≤ |y_0|_{X^*}.

Since A is coercive, this implies that xλ is bounded in X as λ→ 0+. Thus, we mayassume that xλ converges weakly to x0 in X and Axλ converges strongly to y0 in X∗

as λ→ 0+. Since A is monotone

〈xλ − x, y0 − λFxλ −Ax〉 ≥ 0,

and letting λ→ 0+, we have

〈x0 − x, y0 −Ax〉 ≥ 0,


for all x ∈ X. Since A is maximal monotone, this implies y0 = Ax0. Hence, weconclude R(A) = X∗.

Theorem (Galerkin Approximation) Assume X is a reflexive, separable Banach space and A is a bounded, hemicontinuous, coercive monotone operator from X into X^*. Let X_n = span{φ_i}_{i=1}^n satisfy the density condition: for each ψ ∈ X and any ε > 0 there exists a sequence ψ_n ∈ X_n such that |ψ − ψ_n| → 0 as n → ∞. Let x_n be the solution to

〈ψ, Ax_n〉 = 〈ψ, f〉 for all ψ ∈ X_n. (7.3)

Then there exists a subsequence of x_n that converges weakly to a solution of Ax = f.

Proof: Since 〈x, Ax〉/|x|_X → ∞ as |x|_X → ∞, there exists a solution x_n to (7.3) and |x_n|_X is bounded. Since A is bounded, Ax_n is bounded. Thus there exists a subsequence of n (denoted by the same) such that x_n converges weakly to x in X and Ax_n converges weakly in X^*. Since

lim_{n→∞} 〈ψ, Ax_n〉 = lim_{n→∞} ( 〈ψ_n, f〉 + 〈ψ − ψ_n, Ax_n〉 ) = 〈ψ, f〉,

Ax_n converges weakly to f. Since A is monotone,

〈x_n − u, Ax_n − Au〉 ≥ 0 for all u ∈ X.

Note that

lim_{n→∞} 〈x_n, Ax_n〉 = lim_{n→∞} 〈x_n, f〉 = 〈x, f〉.

Thus, taking the limit n → ∞, we obtain

〈x − u, f − Au〉 ≥ 0 for all u ∈ X.

Since A is maximal monotone this implies Ax = f.

The main theorem for monotone operators applies directly to the model probleminvolving the p-Laplace operator

−div(|∇u|p−2∇u) = f on Ω

(with appropriate boundary conditions) and

−Δu + cu = f,  −∂u/∂n ∈ β(u) at ∂Ω,

with β maximal monotone on R. Also, nonlinear problems of non-variational form areapplicable, e.g.,

Lu+ F (u,∇u) = f on Ω

where

L(u) = −div(σ(∇u) − b u)

and we are looking for a solution u ∈ W^{1,p}_0(Ω), 1 < p < ∞. We assume the following conditions:
(i) Monotonicity for the principal part L(u):

(σ(ξ)− σ(η), ξ − η)Rn ≥ 0 for all ξ, η ∈ Rn.


(ii) Monotonicity for F = F (u):

(F (u)− F (v), u− v) ≥ 0 for all u, v ∈ R.

(iii) Coerciveness and Growth condition: for some c, d > 0

(σ(ξ), ξ) ≥ c|ξ|^p,  |σ(ξ)| ≤ d(1 + |ξ|^{p−1})

hold for all ξ ∈ Rn.
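As an illustration of the Galerkin theorem applied to this model problem, the following sketch (not from the text; Ω = (0, 1), p = 4, f = 1 and piecewise linear elements are assumptions made here) solves the one-dimensional p-Laplace problem −(|u′|^{p−2}u′)′ = f with homogeneous Dirichlet conditions:

import numpy as np
from scipy.optimize import fsolve

p, N = 4.0, 50                 # exponent and number of interior nodes (assumed data)
h = 1.0 / (N + 1)

def residual(u):
    U = np.concatenate(([0.0], u, [0.0]))      # homogeneous Dirichlet data
    du = np.diff(U) / h                        # elementwise gradient
    flux = np.abs(du) ** (p - 2) * du          # sigma(du) = |du|^{p-2} du
    # weak form at interior nodes: flux jump minus the load term (f = 1)
    return (flux[:-1] - flux[1:]) - h * 1.0

u = fsolve(residual, np.zeros(N))
print("maximum of the discrete solution:", u.max())

Monotonicity of the p-Laplacian makes the discrete Galerkin system well posed, so a standard nonlinear solver suffices here.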

7.2 Pseudo-Monotone operator

Next, we examine a somewhat more general class of nonlinear operators, called pseudo-monotone operators. In applications, it often occurs that the hypotheses imposed by monotonicity are unnecessarily strong. In particular, a monotonicity assumption on F(u, ∇u) involves both the first-order derivatives and the function itself; in general, compactness takes care of the lower-order terms.

Definition (Pseudo-Monotone Operator) Let X be a reflexive Banach space. An operator T : X → X^* is said to be a pseudo-monotone operator if T is a bounded operator (not necessarily continuous) and if whenever u_n converges weakly to u in X and

lim supn→∞

〈un − u, T (un)〉 ≤ 0,

it follows that, for all v ∈ X,

lim infn→∞

〈un − v, T (un)〉 ≥ 〈u− v, T (u)〉. (7.4)

For example, a monotone and hemi-continuous operator is pseudo-monotone.

Lemma If T is maximal monotone, then T is pseudo-monotone.

Proof: Since T is monotone,

〈un − u, T (un)〉 ≥ 〈un − u, T (u)〉 → 0

together with the assumption lim sup_n 〈u_n − u, T(u_n)〉 ≤ 0 it follows that lim_n 〈u_n − u, T(u_n)〉 = 0. Assume u_n → u and T(u_n) → y weakly in X and X^*, respectively. Thus,

0 ≤ 〈un − v, T (un)− T (v)〉 = 〈un − u+ u− v, T (un)− T (v)〉 → 〈u− v, y − T (v)〉.

Since T is maximal monotone, we have y = T (u). Now,

limn〈un − v, T (un)〉 = lim

n〈un − u+ u− v, T (un)〉 = 〈u− v, T (u)〉.

The class of pseudo-monotone operators thus contains the monotone hemicontinuous operators. The sum of two pseudo-monotone operators is pseudo-monotone. Moreover, the sum of a pseudo-monotone operator and a strongly continuous operator (if x_n converges weakly to x, then Tx_n → Tx strongly) is pseudo-monotone.

Properties of pseudo-monotone operators
(1) Letting v = u in (7.4),

0 ≤ lim inf〈u_n − u, T(u_n)〉 ≤ lim sup〈u_n − u, T(u_n)〉 ≤ 0,

and thus lim_n 〈u_n − u, T(u_n)〉 = 0.
(2) From (1) and (7.4),

〈u − v, T(u)〉 ≤ lim inf〈u_n − v, T(u_n)〉 = lim inf( 〈u_n − u, T(u_n)〉 + 〈u − v, T(u_n)〉 ) = lim inf〈u − v, T(u_n)〉.

(3) T(u_n) → T(u) weakly. In fact, from (2),

〈v, T(u)〉 ≤ lim inf〈v, T(u_n)〉 ≤ lim sup〈v, T(u_n)〉 = lim sup( 〈u_n − u, T(u_n)〉 − 〈u_n − (u + v), T(u_n)〉 ) ≤ −lim inf〈u_n − (u + v), T(u_n)〉 ≤ 〈v, T(u)〉,

where the last step uses (7.4) with v replaced by u + v.
(4) 〈u_n − u, T(u_n) − T(u)〉 → 0. Indeed,

〈u_n − u, T(u_n) − T(u)〉 = 〈u_n − u, T(u_n)〉 − 〈u_n − u, T(u)〉,

where the first term tends to zero by (1) and the second because u_n converges weakly to u.
(5) 〈u_n, T(u_n)〉 → 〈u, T(u)〉. Since

〈u_n, T(u_n)〉 = 〈u_n − u, T(u_n) − T(u)〉 + 〈u, T(u_n)〉 + 〈u_n, T(u)〉 − 〈u, T(u)〉,

the claim follows from (3)–(4).
(6) Conversely, if T(u_n) → T(u) weakly and 〈u_n, T(u_n)〉 → 〈u, T(u)〉, then the hypothesis holds.

Using a proof very similar to that of the Browder–Minty theorem, one can show the following.

Theorem (Brezis) Assume the operator A : X → X^* is pseudo-monotone, bounded and coercive on the real separable and reflexive Banach space X. Then, for each f ∈ X^* the equation A(u) = f has a solution.

7.3 Generalized Pseudo-Monotone Mapping

In this section we discuss pseudo-monotone mappings. We extend the notion of the maximal monotone mapping; in particular, every maximal monotone mapping is a generalized pseudo-monotone mapping.

Definition (Pseudo-monotone Mapping) A mapping T from X into X^* is called pseudo-monotone if:
(a) the set T(u) is nonempty, bounded, closed and convex for all u ∈ X;
(b) T is upper semicontinuous from each finite-dimensional subspace F of X to the weak topology on X^*, i.e., to a given element u_0 ∈ F and a weak neighborhood V of T(u_0) in X^* there exists a neighborhood U of u_0 in F such that T(u) ⊂ V for all u ∈ U;
(c) if u_n is a sequence in X that converges weakly to u ∈ X and if w_n ∈ T(u_n) is such that lim sup〈u_n − u, w_n〉 ≤ 0, then to each element v ∈ X there exists w ∈ T(u) with the property that

lim inf〈u_n − v, w_n〉 ≥ 〈u − v, w〉.


Definition (Generalized Pseudo-monotone Mapping) A mapping T from X into X^* is said to be generalized pseudo-monotone if the following is satisfied. For any sequence u_n in X and a corresponding sequence w_n in X^* with w_n ∈ T(u_n), if u_n converges weakly to u and w_n weakly to w such that

lim sup_{n→∞} 〈u_n − u, w_n〉 ≤ 0,

then w ∈ T (u) and 〈un, wn〉 → 〈u,w〉.

Let X be a reflexive Banach space, T a pseudo-monotone mapping from X intoX∗. Then T is generalized pseudo-monotone. Conversely, suppose T is a boundedgeneralized pseudo-monotone mapping from X into X∗. and assume that for eachu ∈ X, Tu is a nonempty closed convex subset of X∗. Then T is pseudo-monotone.

PROPOSITION 2. A maximal monotone mapping T from the Banach space X intoX∗ is generalized pseudo-monotone.

Proof: Let un be a sequence in X that converges weakly to u ∈ X, wn a se-quence in X∗ with wn ∈ T (un) that converges weakly to w ∈ X∗. Suppose thatlim sup〈un − u,wn〉 ≤ 0 Let [x, y] ∈ T be an arbitrary element of the graph G(T ). Bythe monotonicity of T ,

〈u_n − x, w_n − y〉 ≥ 0.

Since

〈u_n, w_n〉 = 〈u_n − x, w_n − y〉 + 〈x, w_n〉 + 〈u_n, y〉 − 〈x, y〉,

where

〈x, w_n〉 + 〈u_n, y〉 − 〈x, y〉 → 〈x, w〉 + 〈u, y〉 − 〈x, y〉,

we have

〈u, w〉 ≥ lim sup〈u_n, w_n〉 ≥ lim inf〈u_n, w_n〉 ≥ 〈x, w〉 + 〈u, y〉 − 〈x, y〉,

i.e.,

〈u − x, w − y〉 ≥ 0.

Since T is maximal, w ∈ T(u). Consequently,

〈u_n − u, w_n − w〉 ≥ 0.

Thus,

lim inf〈u_n, w_n〉 ≥ lim inf( 〈u_n, w〉 + 〈u, w_n〉 − 〈u, w〉 ) = 〈u, w〉.

It follows that lim〈u_n, w_n〉 = 〈u, w〉.

DEFINITION 5 (1) A mapping T from X into X^* is coercive if there exists a function c : R^+ → R^+ with lim_{r→∞} c(r) = ∞ such that

〈u,w〉 ≥ c(|u|)|u| for all [u,w] ∈ G(T ).

(2) An operator T from X into X^* is called smooth if it is bounded, coercive, maximal monotone, and has effective domain D(T) = X.
(3) Let T be a generalized pseudo-monotone mapping from X into X^*. Then T is said to be regular if R(T + T_2) = X^* for each smooth operator T_2.


(4) T is said to be quasi-bounded if for each M > 0 there exists K(M) > 0 such that whenever [u, w] ∈ G(T) and

〈u, w〉 ≤ M|u|,  |u| ≤ M,

then |w| ≤ K(M). T is said to be strongly quasi-bounded if for each M > 0 there exists K(M) > 0 such that whenever [u, w] ∈ G(T) and

〈u, w〉 ≤ M,  |u| ≤ M,

then |w| ≤ K(M).

Theorem 1 Let T = T_1 + T_0, where T_1 is maximal monotone with 0 ∈ D(T), while T_0 is a regular generalized pseudo-monotone mapping such that, for a given constant k, 〈u, w〉 ≥ −k|u| for all [u, w] ∈ G(T). If either T_0 is quasi-bounded or T_1 is strongly quasi-bounded, then T is regular.

Theorem 2 (Browder). Let X be a reflexive Banach space and T be a generalizedpseudo-monotone mapping from X into X∗. Suppose that R(ε F + T ) = X∗ for ε > 0and T is coercive. Then R(T ) = X∗.

Lemma 1 Let T be a generalized pseudo-monotone mapping from the reflexive Banach space X into X^*, and let C be a bounded weakly closed subset of X. Then T(C) = {w ∈ X^* : w ∈ T(u) for some u ∈ C} is closed in the strong topology of X^*.

Proof: Let w_n be a sequence in T(C) converging strongly to some w ∈ X^*. For each n, there exists u_n ∈ C such that w_n ∈ T(u_n). Since C is bounded and weakly closed, without loss of generality we assume u_n → u ∈ C weakly in X. It follows that lim〈u_n − u, w_n〉 = 0. The generalized pseudo-monotonicity of T implies that w ∈ T(u), i.e., w lies in T(C).

Proof of Theorem 2: Let J denote the normalized Let F denote the duality mappingof X into X∗. It is known that F is a smooth operator. Let w0 be a given element ofX∗. We wish to show that w0 ∈ R(T ). For each ε > 0, it follows from the regularityof T that there exists an element uε ∈ X such that

w0 − ε pε ∈ T (uε) for some pε ∈ F (uε).

Let wε = w0 − ε pε. Since T is coercive and 〈uε, wε〉 = |uε|2, we have for some k

ε |uε|2 ≤ k |uε|+ |uε||w0|.

Henceε |uε| = ε |pε| ≤ |w0|.

and|wε| ≤ |w0|+ ε |pε| ≤ k + 2|w0|

Since T is coercive, there exits M > 0 such that |uε| ≤M and thus

|wε − w0| ≤ ε |pε| = ε |uε| → 0 as ε→ 0+

It thus follows from Lemma 1 that T (B(0,M)) is closed and thus w0 ∈ T (B(0,M)),i.e., w0 ∈ R(T ).


THEOREM 3 Let X be a reflexive Banach space and T a pseudo-monotone mapping from X into X^*. Suppose that T is coercive. Then R(T) = X^*.

PROPOSITION 10 Let F be a finite-dimensional Banach space and T a mapping from F into F^* such that for each u ∈ F, T(u) is a nonempty bounded closed convex subset of F^*. Suppose that T is coercive and upper semicontinuous from F to F^*. Then R(T) = F^*.

Proof: For each w_0 ∈ F^*, the mapping T_{w_0} : F → F^* defined by

Tw0(u) = T (u)− w0

satisfies the same hypotheses as the original mapping T . It thus suffices to prove that0 ∈ R(T ). Suppose that 0 does not lie in R(T ). Then for each u ∈ F , there existsv(u) ∈ F such that

inf_{w ∈ T(u)} 〈v(u), w〉 > 0.

Since T is coercive, there exists a positive constant R such that c(R) > 0, and hencefor each u ∈ F with |u| = R and each w ∈ T (u),

〈u,w〉 ≥ λ > 0

where λ = c(R)R. For such points u ∈ F, we may take v(u) = u. For a given nonzero v_0 in F let

W_{v_0} = {u ∈ F : inf_{w ∈ T(u)} 〈v_0, w〉 > 0}.

The family {W_{v_0} : v_0 ∈ F, v_0 ≠ 0} forms a covering of the space F. By the upper semicontinuity of T from F to F^*, each W_{v_0} is open in F. Hence the family {W_{v_0} : v_0 ∈ F, v_0 ≠ 0} forms an open covering of F. We now choose a finite open covering V_1, ..., V_m of the closed ball B(0, R) in F of radius R about the origin, with the property that for each k there exists an element v_k in F such that V_k ⊂ W_{v_k}, and the additional condition that if V_k intersects the boundary sphere S(0, R), then v_k is a point of V_k ∩ S(0, R) and the diameter of such V_k is less than R/2. We next choose a partition of unity α_1, ..., α_m subordinated to the covering V_1, ..., V_m, where each α_k is a continuous mapping of F into [0, 1] which vanishes outside the corresponding set V_k, and with the property that Σ_k α_k(x) = 1 for each x ∈ B(0, R). Using this partition of unity, we may define a continuous mapping f of B(0, R) into F by setting

f(x) = Σ_k α_k(x) v_k.

We note that for each k for which α_k(x) > 0 and for any w ∈ T(x), we have 〈v_k, w〉 > 0 since x must lie in V_k, which is a subset of W_{v_k}. Hence for each x ∈ B(0, R) and any w ∈ T(x)

〈f(x), w〉 = Σ_k α_k(x) 〈v_k, w〉 > 0.

Consequently, f(x) ≠ 0 for any x ∈ B(0, R). This implies that the degree of the mapping f on B(0, R) with respect to 0 equals 0. For x ∈ S(0, R), on the other hand, f(x) is a convex linear combination of points v_k ∈ S(0, R), each of which is at distance at most R/2 from the point x. We infer that

|f(x) − x| ≤ R/2,  x ∈ S(0, R).


By this inequality, f considered as a mapping from B(0, R) into F is homotopic to the identity mapping. Hence the degree of f on B(0, R) with respect to 0 is 1. This contradiction proves that 0 must lie in T(u) for some u ∈ B(0, R).

Proof of Theorem 3: Let Λ be the family of all finite-dimensional subspaces F of X, ordered by inclusion. For F ∈ Λ, let j_F : F → X denote the inclusion mapping of F into X, and j_F^* : X^* → F^* the dual projection mapping of X^* onto F^*. The operator

T_F = j_F^* T j_F

then maps F into F^*. For each u ∈ F, T_F u is a weakly compact convex subset of X^* and j_F^* is continuous from the weak topology on X^* to the (unique) topology on F^*. Hence T_F u is a nonempty closed convex subset of F^*. Since T is upper semicontinuous from F to X^* with X^* given its weak topology, T_F is upper semicontinuous from F to F^*. By Proposition 10, to each F ∈ Λ there exists an element u_F ∈ F such that j_F^* w_0 ∈ T_F u_F, say j_F^* w_0 = j_F^* w_F with w_F ∈ T(u_F). The coerciveness of T implies that

〈u_F, j_F^* w_0〉 = 〈u_F, w_F〉 ≥ c(|u_F|)|u_F|.

Consequently, the norms |u_F| are uniformly bounded, say by M, for all F ∈ Λ. For F ∈ Λ, let

V_F = ∪_{F′∈Λ, F⊂F′} {u_{F′}}.

Then the set V_F is contained in the closed ball B(0, M) in X. Since B(0, M) is weakly compact, and since the family {weak closure(V_F)} has the finite intersection property, the intersection ∩_{F∈Λ} weak closure(V_F) is not empty. Let u_0 be an element contained in this intersection.

The proof will be complete if we show that 0 ∈ Tw0u0. Let v ∈ X be arbitrarilygiven. We choose F ∈ Λ such that it contains u0, v ∈ F Let uFkdenote a sequencein VF converging weakly to u0. Since j∗FkwFk = 0, we have

〈uFk − u0, wFk〉 = 0 for all k.

Consequently, by the pseudo-monotonicity of T , to given v ∈ X there exists w(v) ∈T (u0) with

lim〈uFk − v, wFk〉 ≥ 〈u0 − v, w〉. (7.5)

Suppose now that 0 ∉ T(u_0). Then 0 can be separated from the nonempty closed convex set T(u_0), i.e., there exists an element u_0 − v ∈ X such that

inf_{z ∈ T(u_0)} 〈u_0 − v, z〉 > 0.

But this is a contradiction to (7.5).

Example (Navier–Stokes equations) Consider the constrained minimization

min F(y) subject to E(y) = 0,

where

E(y) = E_0 y + f(y).


The necessary optimality condition for x = (y, p) is

A(y, p) ∈ ( ∂F(y) − E′(y)^* p,  E(y) ).

Let

A_0(y, p) ∈ ( ∂F(y) − E_0^* p,  E_0 y ),

and

A_1(y, p) = ( −f′(y)^* p,  f(y) ).

7.4 Fixed Point Theorems

In this section we discuss fixed point theorems.

Banach Fixed Point Theorem Let (X, d) be a non-empty complete metric space with a contraction mapping T : X → X. Then T admits a unique fixed point x^* in X (i.e. T(x^*) = x^*). Moreover, x^* is the limit of the fixed point iteration x_n = T(x_{n−1}) with arbitrary initial condition x_0.

Proof: We have

d(x_{n+1}, x_n) = d(T(x_n), T(x_{n−1})) ≤ ρ d(x_n, x_{n−1}).

By induction,

d(x_{n+1}, x_n) ≤ ρ^n d(x_1, x_0)

and

d(x_m, x_n) ≤ (ρ^{n−1}/(1 − ρ)) d(x_1, x_0) → 0

as m ≥ n → ∞. Thus, x_n is a Cauchy sequence and x_n converges to a fixed point x^* of T. Suppose x_1^*, x_2^* are two fixed points. Then,

d(x_1^*, x_2^*) = d(T(x_1^*), T(x_2^*)) ≤ ρ d(x_1^*, x_2^*).

Since ρ < 1, d(x∗1, x∗2) = 0 and thus x∗1 = x∗2.
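A minimal computational sketch of the Banach fixed point iteration (not from the text; the contraction T(x) = cos x on [0, 1] is an illustrative choice):

import math

def fixed_point_iteration(T, x0, tol=1e-12, max_iter=200):
    x = x0
    for n in range(max_iter):
        x_new = T(x)
        if abs(x_new - x) < tol:          # the Cauchy increments control the error
            return x_new, n + 1
        x = x_new
    return x, max_iter

x_star, n_iter = fixed_point_iteration(math.cos, 0.5)
print(x_star, n_iter)                      # unique fixed point of cos, found geometrically
print(abs(math.cos(x_star) - x_star))      # residual close to machine precision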

The Brouwer fixed point theorem is a fundamental result in topology which proves the existence of fixed points for continuous functions defined on compact, convex subsets of Euclidean spaces. Kakutani's theorem extends this to set-valued functions.

Kakutani's theorem states: Let S be a non-empty, compact and convex subset of some Euclidean space R^n. Let T : S → S be a set-valued function on S with a closed graph and the property that T(x) is non-empty and convex for all x ∈ S. Then T has a fixed point.

The Schauder fixed point theorem is an extension of the Brouwer fixed point theorem to topological vector spaces.

Schauder fixed point theorem Let X be a locally convex topological vector space, and let K ⊂ X be a compact, convex set. Then any continuous mapping T : K → K has a fixed point.


Proof: Given ε > 0, the family of open balls {B_ε(x) : x ∈ K} is an open covering of K. Since K is compact, there exists a finite subcover, i.e. there exist finitely many points x_k in K such that

K = ∪_{k=1}^n B_ε(x_k).

Define the functions g_k by g_k(x) = max(0, ε − |x − x_k|). It is clear that each g_k is continuous, g_k(x) ≥ 0 and Σ_k g_k(x) > 0 for all x ∈ K. Thus, if we define a function

on K by

g(x) = Σ_k g_k(x) x_k / Σ_k g_k(x),

then g is a continuous function from K to the convex hull K_0 of the points {x_k}_{k=1}^n. It is easily shown that |g(x) − x| ≤ ε for x ∈ K. Then g ∘ T maps K_0 into K_0. Since K_0 is a compact convex subset of a finite dimensional vector space, we can apply the Brouwer fixed point theorem to assure the existence of z_ε ∈ K_0 such that z_ε = g(T(z_ε)). We have the estimate

|z_ε − T(z_ε)| = |g(T(z_ε)) − T(z_ε)| ≤ ε.

Since K is compact, there exists a subsequence ε_n → 0 such that z_{ε_n} → x_0 ∈ K and thus

|z_{ε_n} − T(z_{ε_n})| ≤ |g(T(z_{ε_n})) − T(z_{ε_n})| ≤ ε_n.

Since T is continuous, T (x0) = x0.

Schaefer’s fixed point theorem: Let T be a continuous and compact mapping of aBanach space X into itself, such that the set

{x ∈ X : x = λTx for some 0 ≤ λ ≤ 1}

is bounded. Then T has a fixed point.

Proof: Let M be such that |x| ≤ M for all x ∈ {x ∈ X : x = λTx for some 0 ≤ λ ≤ 1}. Define

T̃(x) = M T(x) / max(M, |T(x)|).

Note that T̃ : B(0, M) → B(0, M). Let K be the closed convex hull of T̃(B(0, M)). Since T is compact, K is a compact convex subset of X. Since T̃(K) ⊂ K, it follows from Schauder's fixed point theorem that there exists x ∈ K such that x = T̃(x). We now claim that x is a fixed point of T. If not, we would have |T(x)| > M and

x = λT(x) for λ = M/|T(x)| < 1.

But then x belongs to the set above, so |x| ≤ M, while |x| = |T̃(x)| = M; choosing M strictly larger than the bound of that set yields a contradiction.

The advantage of Schaefer’s theorem over Schauder’s for applications is that it isnot necessary to determine an explicit convex, compact set K.

Krasnoselskii theorem The sum of two operators T = A + B has a fixed point in a nonempty closed convex subset C of a real Banach space X under conditions such that T(C) ⊂ C, A is continuous on C, A(C) is a relatively compact subset of X, and B is a strict contraction on C.

Note that the proof of Krasnoselskii's fixed point theorem combines the Banach contraction theorem and Schauder's fixed point theorem: for each y ∈ C the equation Ay + Bx = x has a unique solution x = ψ(y), and Schauder's fixed point theorem is applied to ψ.

Kakutani–Glicksberg–Fan theorem Let S be a non-empty, compact and convex subset of a locally convex topological vector space X. Let T : S → S be a Kakutani map, i.e., it is upper hemicontinuous (for every open set W ⊂ X, the set {x ∈ S : T(x) ⊂ W} is open in S) and T(x) is non-empty, compact and convex for all x ∈ S. Then T has a fixed point.

The corresponding result for single-valued functions is the Tychonoff fixed-pointtheorem.

8 Convex Analysis and Duality

This chapter is organized as follows. We present the basic convex analysis and du-ality theory for the optimization in Banach spaces. The subdifferential operator andmonotone operator equation in Banach spaces are presented.

8.1 Convex Functional and Subdifferential

Definition (Convex Functional) (1) A proper convex functional on a Banach spaceX is a function ϕ from X to (−∞,∞], not identically +∞ such that

ϕ((1− λ)x1 + λx2) ≤ (1− λ)ϕ(x1) + λϕ(x2)

for all x1, x2 ∈ X and 0 ≤ λ ≤ 1.(2) A functional ϕ : X → R is said to be lower-semicontinuous if

ϕ(x) ≤ lim infy→x

ϕ(y) for all x ∈ X.

(3) A functional ϕ : X → R is said to be weakly lower-semicontinuous if

ϕ(x) ≤ lim infn→∞

ϕ(xn)

for all sequences x_n converging weakly to x.
(4) The subset D(ϕ) = {x ∈ X : ϕ(x) < ∞} of X is called the domain of ϕ.
(5) The epigraph of ϕ is defined by epi(ϕ) = {(x, c) ∈ X × R : ϕ(x) ≤ c}.

Lemma 3 A convex functional ϕ is lower-semicontinuous if and only if it is weaklylower-semicontinuous on X.

Proof: The level set {x ∈ X : ϕ(x) ≤ c} is a closed convex subset if ϕ is lower-semicontinuous and convex. Thus, the claim follows from the fact that a convex subset of X is closed if and only if it is weakly closed.

Lemma 4 If ϕ is a proper lower-semicontinuous, convex functional on X, then ϕ is bounded below by an affine functional, i.e., there exist x^* ∈ X^* and β ∈ R such that

ϕ(x) ≥ 〈x∗, x〉+ β, x ∈ X.


Proof: Let x_0 ∈ X and c ∈ R be such that ϕ(x_0) > c. Since ϕ is lower-semicontinuous on X, there exists an open neighborhood V(x_0) of x_0 such that ϕ(x) > c for all x ∈ V(x_0). The epigraph epi(ϕ) is a closed convex subset of the product space X × R. It follows from the separation theorem for convex sets that there exists a closed hyperplane H ⊂ X × R,

H = {(x, r) ∈ X × R : 〈x_0^*, x〉 + r = α} with x_0^* ∈ X^*, α ∈ R,

that separates epi(ϕ) and V(x_0) × (−∞, c). Since {x_0} × (−∞, c) ⊂ {(x, r) ∈ X × R : 〈x_0^*, x〉 + r < α}, it follows that

〈x_0^*, x〉 + r > α for all (x, r) ∈ epi(ϕ),

which yields the desired estimate.

Theorem C.6 If F : X → (−∞,∞] is convex and bounded on an open set U , then Fis continuous on U .

Proof: We choose M ∈ R such that F(x) ≤ M − 1 for all x ∈ U. Let x̄ be any element in U. Since U is open there exists a δ > 0 such that the open ball {x ∈ X : |x − x̄| < δ} is contained in U. For any ε ∈ (0, 1), let θ = ε/(M − F(x̄)). Then for x ∈ X satisfying |x − x̄| < θδ,

|(x − x̄)/θ + x̄ − x̄| = |x − x̄|/θ < δ.

Hence (x − x̄)/θ + x̄ ∈ U. By the convexity of F,

F(x) ≤ (1 − θ)F(x̄) + θ F((x − x̄)/θ + x̄) ≤ (1 − θ)F(x̄) + θM,

and thus

F(x) − F(x̄) ≤ θ(M − F(x̄)) = ε.

Similarly, (x̄ − x)/θ + x̄ ∈ U and

F(x̄) ≤ (θ/(1 + θ)) F((x̄ − x)/θ + x̄) + (1/(1 + θ)) F(x) ≤ θM/(1 + θ) + (1/(1 + θ)) F(x),

which implies

F(x) − F(x̄) ≥ −θ(M − F(x̄)) = −ε.

Therefore |F(x) − F(x̄)| ≤ ε if |x − x̄| < θδ, and F is continuous on U.

Definition (Subdifferential) Given a proper convex functional ϕ on a Banach space X, the subdifferential ∂ϕ(x) is the subset of X^* defined by

∂ϕ(x) = {x^* ∈ X^* : ϕ(y) − ϕ(x) ≥ 〈x^*, y − x〉 for all y ∈ X}.

Since for x∗1 ∈ ∂ϕ(x1) and x∗2 ∈ ∂ϕ(x2),

ϕ(x1)− ϕ(x2) ≤ 〈x∗2, x1 − x2〉

ϕ(x2)− ϕ(x1) ≤ 〈x∗1, x2 − x1〉


it follows that 〈x∗1 − x∗2, x1 − x2〉 ≥ 0. Hence ∂ϕ is a monotone operator from X intoX∗.

Example 1 Let ϕ be Gateaux differentiable at x, i.e., there exists w^* ∈ X^* such that

lim_{t→0^+} (ϕ(x + th) − ϕ(x))/t = 〈w^*, h〉 for all h ∈ X;

w^* is the Gateaux differential of ϕ at x and is denoted by ϕ′(x). If ϕ is convex, then ϕ is subdifferentiable at x and ∂ϕ(x) = ϕ′(x). Indeed, for h = y − x,

(ϕ(x + t(y − x)) − ϕ(x))/t ≤ ϕ(y) − ϕ(x),  0 < t < 1.

Letting t → 0^+ we have

ϕ(y) − ϕ(x) ≥ 〈ϕ′(x), y − x〉 for all y ∈ X,

and thus ϕ′(x) ∈ ∂ϕ(x). On the other hand, if w^* ∈ ∂ϕ(x) we have for y ∈ X and t > 0

(ϕ(x + ty) − ϕ(x))/t ≥ 〈w^*, y〉.

Taking the limit t → 0^+, we obtain

〈ϕ′(x) − w^*, y〉 ≥ 0 for all y ∈ X.

This implies w^* = ϕ′(x).

Example 2 If ϕ(x) = (1/2)|x|^2 then we will show that ∂ϕ(x) = F(x), the duality mapping. In fact, if x^* ∈ F(x), then

〈x^*, x − y〉 = |x|^2 − 〈y, x^*〉 ≥ (1/2)(|x|^2 − |y|^2) for all y ∈ X.

Thus x^* ∈ ∂ϕ(x). Conversely, if x^* ∈ ∂ϕ(x), then

(1/2)(|y|^2 − |x|^2) ≥ 〈x^*, y − x〉 for all y ∈ X. (8.1)

We let y = tx, 0 < t < 1, and obtain

((1 + t)/2)|x|^2 ≤ 〈x, x^*〉,

and thus |x|^2 ≤ 〈x, x^*〉. Similarly, if t > 1, then we conclude |x|^2 ≥ 〈x, x^*〉 and therefore |x|^2 = 〈x, x^*〉 and |x^*| ≥ |x|. On the other hand, letting y = x + λu, λ > 0, in (8.1), we have

λ〈x^*, u〉 ≤ (1/2)(|x + λu|^2 − |x|^2) ≤ λ|u||x| + (λ^2/2)|u|^2,

which implies 〈x^*, u〉 ≤ |u||x|. Hence |x^*| ≤ |x| and we obtain |x|^2 = |x^*|^2 = 〈x^*, x〉.

Example 3 Let K be a closed convex subset of X and I_K be the indicator function of K, i.e.,

I_K(x) = 0 if x ∈ K, and I_K(x) = ∞ otherwise.


Obviously, I_K is convex and lower-semicontinuous on X. By definition we have for x ∈ K

∂I_K(x) = {x^* ∈ X^* : 〈x^*, x − y〉 ≥ 0 for all y ∈ K}.

Thus D(I_K) = D(∂I_K) = K and ∂I_K(x) = {0} for each interior point x of K. Moreover, if x lies on the boundary of K, then ∂I_K(x) coincides with the cone of normals to K at x.

Note that ∂F (x) is closed and convex and may be empty.

Theorem C.10 If a convex function F is continuous at x then ∂F (x) is non empty.

Proof: Since F is continuous at x for any ε > 0 there exists a neighborhood Uε of xsuch that

F (x) ≤ F (x) + ε, x ∈ Uε.Then Uε × (F (x) + ε,∞) is an open set in X × R and is contained in epi F . Hence(epi F )o is non empty. Since F is convex epi F is convex and (epi F )o is convex. Forany neighborhood of O of (x, F (x)) there exists a t < 1 such that (x, tF (x)) ∈ O. But,tF (x) < F (x) and so (x, tF (x)) /∈ epi F . Thus (x, F (x)) /∈ (epi F )o. By the HahnBanach separation theorem, there exists a closed hyperplane S = (x, a) ∈ X × R :〈x∗, x〉+ αa = β for nontrivial (x∗, α) ∈ X∗ ×R and β ∈ R such that

〈x∗, x〉+ αa > β for all (x, a) ∈ (epi F )o

〈x∗, x〉+ αF (x) = β.(8.2)

Since (epi F )o = epi F every neighborhood of (x, a) ∈ epi F contains an element of(epi ϕ)o. Suppose 〈x∗, x〉+ αa < β. Then

(x′, a′) ∈ X ×R : 〈x∗, x′〉+ αa′ < β

is an neighborhood of (x, a) and contains an element of (epi F )o, which contradicts to(8.2). Hence

〈x∗, x〉+ αa ≥ β for all (x, a) ∈ epi F. (8.3)

Suppose α = 0. For any u ∈ Uε there is an a ∈ R such that F (u) ≤ a. Then from (8.3)

〈x∗, u〉 = 〈x∗, u〉+ αa ≥ β

and thus〈x∗, u− x〉 ≥ 0 for all u ∈ Uε.

Choose a δ such that |u − x| ≤ δ implies u ∈ U . For any nonzero element x ∈ X lett = δ

|x| . Then |(tx+ x)− x| = |tx| = δ so that tx+ x ∈ Uε. Hence

〈x∗, x〉 = 〈x∗, (tx+ x)− x〉/t ≥ 0.

Similarly, −t x+ x ∈ Uε and

〈x∗, x〉 = 〈x∗, (−tx+ x)− x〉/(−t) ≤ 0.

Thus, 〈x^*, x〉 = 0 for all x ∈ X and hence x^* = 0, which is a contradiction. Therefore α is nonzero. It now follows from (8.2)–(8.3) that

〈−x^*/α, u − x〉 + F(x) ≤ F(u)

for all u ∈ X and thus −x^*/α ∈ ∂F(x).

Definition (Conjugate function) Given a proper functional ϕ on a Banach space X, the conjugate functional on X^* is defined by

ϕ^*(x^*) = sup_x {〈x^*, x〉 − ϕ(x)}.

Examples (Conjugate functions) (1) If ϕ(x) = 〈a, x〉 − b is a bounded affine functional, then ϕ^*(x^*) = I_{x^* = a} + b.
(2) If ϕ(x) = (1/p)|x|^p, then ϕ^*(x^*) = (1/q)|x^*|^q, where 1/p + 1/q = 1.
(3) If ϕ(x) = |x|, then ϕ^*(x^*) = I_{|x^*| ≤ 1}.
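The conjugate in Example (2) can be checked numerically; the following sketch (added here, with X = R and p = 3 as assumptions) approximates the supremum defining ϕ^* on a grid:

import numpy as np

p = 3.0
q = p / (p - 1.0)
xs = np.linspace(-10.0, 10.0, 200001)        # grid wide enough to contain the maximizer

def conjugate_numeric(s):
    # approximate phi*(s) = sup_x { s*x - |x|^p / p }
    return np.max(s * xs - np.abs(xs) ** p / p)

for s in [0.0, 0.5, 1.0, 2.0]:
    print(s, conjugate_numeric(s), abs(s) ** q / q)   # the last two columns agree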

Theorem 2.1 The conjugate function ϕ∗ is convex, lower semi-continuous and proper.

Proof: For 0 < t < 1 and x_1^*, x_2^* ∈ X^*,

ϕ^*(t x_1^* + (1 − t)x_2^*) = sup_x {〈t x_1^* + (1 − t)x_2^*, x〉 − ϕ(x)}

≤ t sup_x {〈x_1^*, x〉 − ϕ(x)} + (1 − t) sup_x {〈x_2^*, x〉 − ϕ(x)} = t ϕ^*(x_1^*) + (1 − t)ϕ^*(x_2^*).

Thus, ϕ^* is convex. For arbitrary ε > 0 let y ∈ X be such that

ϕ^*(x^*) − ε ≤ 〈x^*, y〉 − ϕ(y).

For x_n^* → x^*,

ϕ^*(x_n^*) ≥ 〈x_n^*, y〉 − ϕ(y),

and

lim inf_{n→∞} ϕ^*(x_n^*) ≥ 〈x^*, y〉 − ϕ(y) ≥ ϕ^*(x^*) − ε.

Since ε > 0 is arbitrary, ϕ^* is lower semi-continuous.

Definition (Bi-Conjugate Function) For any proper functional ϕ on X the bi-conjugate function of ϕ is ϕ∗∗ : X → (−∞,∞] is defined by

ϕ∗∗(x) = supx∗∈X∗

〈x∗, x〉 − ϕ∗(x∗).

Theorem C.3 For any F : X → (−∞,∞] F ∗∗ is convex and l.s.c. and F ∗∗ ≤ F .

Proof: Note thatF ∗∗(x) = sup

x∗∈D(F ∗)〈x∗, x〉 − F ∗(x∗).

Since for x∗ ∈ D(F ∗)〈x∗, x〉 − F ∗(x∗) ≤ F (x)

for all x, F ∗∗(x) ≤ F (x).

Theorem If F is a proper l.s.c. convex function, then F = F ∗∗. If X is reflexive, then(F ∗)∗ = F ∗∗.


Proof: By Theorem if F is a proper l.s.c. convex function, then F is bounded belowby an affine functional 〈x∗, x〉 − a, i.e. a ≥ 〈x∗, x〉 − F (x) for all x ∈ X. Define

a(x∗) = infa ∈ R : a ≥ 〈x∗, x〉 − F (x) for all x ∈ X

Then for a ≥ a(x∗)〈x∗, x〉 − a ≤ 〈x∗, x〉 − a(x∗)

for all x ∈ X. By Theorem again

F (x) = supx∗∈X∗

〈x∗, x〉 − a(x∗).

Sincea(x∗) = sup

x∈X〈x∗, x〉 − F (x) = F ∗(x)

we haveF (x) = sup

x∗∈X∗〈x∗, x〉 − F ∗(x∗).

We have the duality property.

Theorem 2.2 Let F be a proper convex function on X.
(1) For F : X → (−∞,∞], x^* ∈ ∂F(x) if and only if

F(x) + F^*(x^*) = 〈x^*, x〉.

(2) Assume X is reflexive. If x^* ∈ ∂F(x), then x ∈ ∂F^*(x^*). If F is convex and lower semi-continuous, then x^* ∈ ∂F(x) if and only if x ∈ ∂F^*(x^*).

Proof: (1) Note that x^* ∈ ∂F(x) if and only if

〈x^*, y〉 − F(y) ≤ 〈x^*, x〉 − F(x) (8.4)

for all y ∈ X. By the definition of F^* this implies F^*(x^*) = 〈x^*, x〉 − F(x). Conversely, if F^*(x^*) + F(x) = 〈x^*, x〉 then (8.4) holds for all y ∈ X.
(2) Since F^{**} ≤ F by Theorem C.3, it follows from (1)

F ∗∗(x) ≤ 〈x∗, x〉 − F ∗(x∗)

But by the definition of F ∗∗

F ∗∗(x) ≥ 〈x∗, x〉 − F ∗(x∗).

HenceF ∗(x∗) + F ∗∗(x) = 〈x∗, x〉.

Since X is reflexive, F^{**} = (F^*)^*. If we apply (1) to F^*, we have x ∈ ∂F^*(x^*). In addition, if F is convex and l.s.c., it follows from Theorem C.1 that F^{**} = F. Thus if x ∈ ∂F^*(x^*), then

F (x) + F ∗(x∗) = F ∗(x∗) + F ∗∗(x) = 〈x∗, x〉

by applying (1) for F ∗. Therefore x∗ ∈ ∂F (x) again by (1).


Theorem(Rockafellar) Let X be real Banach space. If ϕ is lower-semicontinuousproper convex functional on X, then ∂ϕ is a maximal monotone operator from X intoX∗.

Proof: We prove the theorem when X is reflexive. By Asplund's theorem we can assume that X and X^* are strictly convex. By the Minty–Browder theorem it suffices to prove that R(F + ∂ϕ) = X^*. For x_0^* ∈ X^* we must show that the equation x_0^* ∈ Fx + ∂ϕ(x) has at least one solution x_0. Define the proper convex functional on X by

f(x) = (1/2)|x|_X^2 + ϕ(x) − 〈x_0^*, x〉.

Since f is lower-semicontinuous and f(x) → ∞ as |x| → ∞ there exists x0 ∈ D(f)such that f(x0) ≤ f(x) for all x ∈ X. Since F is monotone

ϕ(x)− ϕ(x0) ≥ 〈x∗0, x− x0, 〉 − 〈x− x0, F (x)〉.

Setting xt = x0 + t (u− x0) and since ϕ is convex, we have

ϕ(u)− ϕ(x0) ≥ 1

t(ϕ(xt)− ϕ(x0)) ≥ 〈x∗0, u− x0, 〉 − 〈F (xt), u− x0〉.

Taking limit t→ 0+, we obtain

ϕ(u)− ϕ(x0) ≥ 〈x∗0, u− x0〉 − 〈F (x0), u− x0〉,

which implies x∗0 − F (x0) ∈ ∂ϕ(x0).

We have the perturbation result.

Theorem Assume that X is a real Hilbert space and that A is a maximal monotoneoperator on X. Let ϕ be a proper, convex and lower semi-continuous functional on Xsatisfying dom(A) ∩ dom(∂ϕ) is not empty and

ϕ((I + λA)−1x) ≤ ϕ(x) + λM, for all λ > 0, x ∈ D(ϕ),

where M is some non-negative constant. Then the operator A+ ∂ϕ is maximal mono-tone.

We use the following lemma.

Lemma Let A and B be m-dissipative operators on X. Then for every y ∈ X the equation

y ∈ x − Ax − B_λ x (8.5)

has a unique solution x = x_λ ∈ dom(A).

Proof: Equation (8.5) is equivalent to y = x_λ − w_λ − B_λ x_λ for some w_λ ∈ A(x_λ). Thus,

x_λ − (λ/(λ+1)) w_λ = (λ/(λ+1)) y + (1/(λ+1)) (x_λ + λ B_λ x_λ) = (λ/(λ+1)) y + (1/(λ+1)) (I − λB)^{−1} x_λ.


Since A is m-dissipative, we conclude that (8.5) is equivalent to x_λ being the fixed point of the operator

F_λ x = (I − (λ/(λ+1)) A)^{−1} ( (λ/(λ+1)) y + (1/(λ+1)) (I − λB)^{−1} x ).

By m-dissipativity of the operators A and B their resolvents are contractions on X and thus

|F_λ x_1 − F_λ x_2| ≤ (1/(λ+1)) |x_1 − x_2| for all λ > 0, x_1, x_2 ∈ X.

Hence, F_λ has a unique fixed point x_λ, and x_λ ∈ dom(A) solves (8.5).

Proof of Theorem: By the Lemma, for y ∈ X there exists x_λ such that

y ∈ x_λ − (−A)_λ x_λ + ∂ϕ(x_λ).

Moreover, one can show that |x_λ| is bounded uniformly. Since

y − x_λ + (−A)_λ x_λ ∈ ∂ϕ(x_λ),

for z ∈ X

ϕ(z) − ϕ(x_λ) ≥ (z − x_λ, y − x_λ + (−A)_λ x_λ).

Letting z = (I + λA)^{−1} x_λ, so that z − x_λ = λ(−A)_λ x_λ, we obtain

(λ(−A)_λ x_λ, y − x_λ + (−A)_λ x_λ) ≤ ϕ((I + λA)^{−1} x_λ) − ϕ(x_λ) ≤ λM,

and thus

|(−A)_λ x_λ|^2 ≤ |(−A)_λ x_λ| |y − x_λ| + M.

Since |x_λ| is bounded, so is |(−A)_λ x_λ|.

8.2 Duality Theory

In this section we discuss the duality theory. We call

(P ) infx∈X

F (x)

is the primal problem, where X is a Banach space and F : X → (−∞,∞] is a properl.s.c. convex function. We have the following result for the existence of minimizer.

Theorem 2 Let X be reflexive and F be a lower semicontinuous proper convex functional defined on X. Suppose that

F(x) → ∞ as |x| → ∞. (8.6)

Then there exists an x ∈ X such that

F (x) = inf F (x); x ∈ X.

Proof: Let η = infF (x);x ∈ X and let xn be a minimizing sequence such thatlim F (xn) = η. The condition (8.6) implies that xn is bounded in X. Since X isreflexive there exists a subsequence that converges weakly to x in X and it follows fromLemma 1 that F (x) = η.


We imbed (P ) into a family of perturbed problem

(Py) infx∈X

Φ(x, y)

where y ∈ Y, a Banach space, is an embedding variable and Φ : X × Y → (−∞,∞] is a proper l.s.c. convex function with Φ(x, 0) = F(x). Thus (P_0) = (P). For example, in terms of (13.1) we let

F(x) = f(x) + ϕ(Λx) (8.7)

and, with Y = H,

Φ(x, y) = f(x) + ϕ(Λx + y). (8.8)

Definition The dual problem of (P) with respect to Φ is

(P^*)  sup_{y^* ∈ Y^*} (−Φ^*(0, y^*)).

The value function of (Py) is defined by

h(y) = infx∈X

Φ(x, y), y ∈ Y.

We assume that h(y) > −∞ for all y ∈ Y . Then we have

Theorem D.0 inf (P ) ≥ sup (P ∗).

Proof. For any (x, y) ∈ X × Y and (x∗, y∗) ∈ X∗ × Y ∗

Φ∗(x∗, y∗) ≥ 〈x∗, x〉+ 〈y∗, y〉 − Φ(x, y).

Thus, letting y = 0 and x∗ = 0

F (x) + Φ∗(0, y∗) ≥ 〈0, x〉+ 〈y∗, 0〉 = 0

for all x ∈ X and y∗ ∈ Y ∗. Therefore

inf (P ) = inf F (x) ≥ supy∗∈Y ∗

(−Φ∗(0, y∗)) = sup (P ∗).

In what follows we establish the equivalence of the primal problem (P ) and thedual problem (P ∗). We start with the properties of h.

Lemma D.1 h is convex.

Proof. The proof is by contradiction. Suppose there exist y1, y2 ∈ Y and θ ∈ (0, 1)such that

h(θ y1 + (1− θ) y2) > θ h(y1) + (1− θ)h(y2).

Then there exist c and ε > 0 such that

h(θ y1 + (1− θ) y2) > c > c− ε > θ h(y1) + (1− θ)h(y2).

Let a_1 = h(y_1) + ε/θ > h(y_1) and

a_2 = (c − θa_1)/(1 − θ) = (c − ε − θ h(y_1))/(1 − θ) > h(y_2).


By definition of h there exist x1, x2 ∈ X such that

h(y1) ≤ Φ(x1, y1) ≤ a1 and h(y2) ≤ Φ(x2, y2) ≤ a2.

Thush(θ y1 + (1− θ) y2) ≤ Φ(θ x1 + (1− θ)x2, θ y1 + (1− θ) y2)

≤ θΦ(x1, y1) + (1− θ) Φ(x2, y2) ≤ θ a1 + (1− θ) a2 = c,

which is a contradiction. Hence h is convex.

Lemma D.2 For all y∗ ∈ Y ∗, h∗(y∗) = Φ∗(0, y∗).

Proof.

h∗(y∗) = supy∈Y (〈y∗, y〉 − h(y)) = supy∈Y (〈y∗, y〉 − infx∈X Φ(x, y))

= supy∈Y

supx∈X

((〈y∗, y〉 − Φ(x, y)) = supy∈Y

supx∈X

(〈0, x〉+ 〈y∗, y〉 − Φ(x, y))

= sup(x,y)∈X×Y (〈(0, y∗), (x, y)〉 − Φ(x, y)) = Φ∗(0, y∗).

Note that h is not necessarily l.s.c..

Theorem D.3 If h is l.s.c. at 0, then inf (P ) = sup (P ∗).

Proof. Since F is proper, h(0) = infx∈X F (x) <∞. Since h is convex by Lemma D.1,it follows from Theorem C.4 that h(0) = h∗∗(0). Thus

sup (P ∗) = supy∗∈Y ∗ (−Φ(0, y∗)) = supy∗∈Y ∗ ((〈y∗, 0〉 − h∗(y∗))

= h∗∗(0) = h(0) = inf (P )

Theorem D.4 If h is subdifferentiable at 0, then inf (P ) = sup (P ∗) and ∂h(0) is theset of solutions of (P ∗).

Proof. By Lemma D.2 y∗ solves (P ∗) if and only if

−h∗(y∗) = −Φ(0, y∗) = supy∗∈Y ∗ (−Φ(0, y∗))

= supy∗ (〈y∗, 0)− h∗(y∗)) = h∗∗(0).

By Theorem 2.2h∗∗(0) + h∗∗∗(y∗) = 〈y∗, 0) = 0

if and only if y∗ ∈ ∂h∗∗(0). Since h∗∗∗ = h∗ by Theorem C.5, y∗ solves (P ∗) if and onlyif y∗ ∈ ∂h∗∗(y∗). Since ∂h(0) is not empty, ∂h(0) = ∂h∗∗(0). Therefore ∂h(0) is theset of all solutions of (P ∗) and (P ∗) has at least one solution.

Let y∗ ∈ ∂h(0). Then〈y∗, x〉+ h(0) ≤ h(x)

for all x ∈ X. Let xn is a sequence in X such that xn → 0.

lim infn→∞

h(xn) ≥ lim〈y∗, xn〉+ h(0) = h(0)

and h is l.s.c. at 0. By Theorem D.3 inf (P ) = sup (P ∗).


Corollary D.6 If there exists an x ∈ X such that Φ(x, ·) is finite and continuous at0, h is continuous on an open neighborhood U of 0 and h = h∗∗. Moreover,

inf (P ) = sup (P ∗)

and ∂h(0) is the set of solutions of (P ∗).

Proof. First show that h is continuous. Clearly, Φ(X, ·) is bounded above on an openneighborhood U of 0. Since for all y ∈ Y

h(y) ≤ Φ(x, y),

his bounded above on U . Since h is convex by Lemma D.1 and h is continuous byTheorem C.6. Hence h = h∗∗ by Theorem C. Now, h is subdifferential at 0 by TheoremC.10. The conclusion follows from Theorem D.4.

Example D.4 Consider the example (8.7)–(8.8), i.e.,

Φ(x, y) = f(x) + ϕ(Λx + y) (8.9)

for (13.1). Let us calculate the conjugate of Φ:

Φ^*(x^*, y^*) = sup_{x∈X} sup_{y∈Y} {〈x^*, x〉 + 〈y^*, y〉 − Φ(x, y)}

= sup_{x∈X} sup_{y∈Y} {〈x^*, x〉 + 〈y^*, y〉 − f(x) − ϕ(Λx + y)}

= sup_{x∈X} { 〈x^*, x〉 − f(x) + sup_{y∈Y} [〈y^*, y〉 − ϕ(Λx + y)] },

where

sup_{y∈Y} [〈y^*, y〉 − ϕ(Λx + y)] = sup_{y∈Y} [〈y^*, Λx + y〉 − ϕ(Λx + y) − 〈y^*, Λx〉]

= sup_{z∈Y} [〈y^*, z〉 − ϕ(z)] − 〈y^*, Λx〉 = ϕ^*(y^*) − 〈y^*, Λx〉.

Thus

Φ^*(x^*, y^*) = sup_{x∈X} {〈x^*, x〉 − 〈y^*, Λx〉 − f(x)} + ϕ^*(y^*)

= sup_{x∈X} {〈x^* − Λ^* y^*, x〉 − f(x)} + ϕ^*(y^*) = f^*(x^* − Λ^* y^*) + ϕ^*(y^*).

Theorem D.7 For any x ∈ X and y∗ ∈ Y ∗, the following statements are equivalent.

(1) x solves (P), y^* solves (P^*), and min (P) = max (P^*).
(2) Φ(x, 0) + Φ^*(0, y^*) = 0.
(3) (0, y^*) ∈ ∂Φ(x, 0).

Proof: Suppose (1) holds; then obviously (2) holds. Suppose (2) holds; then since

Φ(x, 0) = F(x) ≥ inf (P) ≥ sup (P^*) ≥ −Φ^*(0, y^*)

by Theorem D.0, we have

Φ(x, 0) = min (P) = sup (P^*) = −Φ^*(0, y^*).

Therefore (2) implies (1). Since 〈(0, y^*), (x, 0)〉 = 0, (2) and (3) are equivalent by Theorem C.7.

Any solution y^* of (P^*) is called a Lagrange multiplier associated with Φ. In fact, for the example (8.9) the optimality condition implies

0 = Φ(x, 0) + Φ^*(0, y^*) = f(x) + f^*(−Λ^* y^*) + ϕ(Λx) + ϕ^*(y^*)

= [f(x) + f^*(−Λ^* y^*) − 〈−Λ^* y^*, x〉] + [ϕ(Λx) + ϕ^*(y^*) − 〈y^*, Λx〉]. (8.10)

Since each expression of (8.10) in square brackets is nonnegative, it follows that

f(x) + f^*(−Λ^* y^*) − 〈−Λ^* y^*, x〉 = 0,

ϕ(Λx) + ϕ^*(y^*) − 〈y^*, Λx〉 = 0.

By Theorem 2.2,

0 ∈ ∂f(x) + Λ^* y^*,  y^* ∈ ∂ϕ(Λx). (8.11)

The function L : X × Y* → (−∞, ∞] defined by

L(x, y*) = inf_{y∈Y} {Φ(x, y) − ⟨y*, y⟩}   (8.12)

is called the Lagrangian. Note that

Φ*(x*, y*) = sup_{x∈X, y∈Y} {⟨x*, x⟩ + ⟨y*, y⟩ − Φ(x, y)}
           = sup_{x∈X} {⟨x*, x⟩ + sup_{y∈Y} [⟨y*, y⟩ − Φ(x, y)]} = sup_{x∈X} {⟨x*, x⟩ − L(x, y*)}.

Thus,

−Φ*(0, y*) = inf_{x∈X} L(x, y*).   (8.13)

Hence the dual problem (P*) is equivalently written as

sup_{y*∈Y*} inf_{x∈X} L(x, y*).

Similarly, if Φ is a proper convex l.s.c. function, then the bi-conjugate of Φ in y satisfies Φ_x** = Φ(x, ·) and

Φ(x, y) = Φ_x**(y) = sup_{y*∈Y*} {⟨y*, y⟩ − Φ_x*(y*)} = sup_{y*∈Y*} {⟨y*, y⟩ + L(x, y*)}.

Hence

Φ(x, 0) = sup_{y*∈Y*} L(x, y*)   (8.14)

and the primal problem (P) is equivalently written as

inf_{x∈X} sup_{y*∈Y*} L(x, y*).


By introducing the Lagrangian L, the problems (P) and (P*) are formulated as min-max problems, which arise in game theory. By Theorem D.0,

sup_{y*} inf_x L(x, y*) ≤ inf_x sup_{y*} L(x, y*).

Theorem D.8 (Saddle Point) Assume Φ is a proper convex l.s.c. function. Then the following are equivalent.
(1) (x̄, ȳ*) ∈ X × Y* is a saddle point of L, i.e.,

L(x̄, y*) ≤ L(x̄, ȳ*) ≤ L(x, ȳ*) for all x ∈ X, y* ∈ Y*.   (8.15)

(2) x̄ solves (P), ȳ* solves (P*), and min (P) = max (P*).

Proof: Suppose (1) holds. From (8.13) and (8.15),

L(x̄, ȳ*) = inf_{x∈X} L(x, ȳ*) = −Φ*(0, ȳ*),

and from (8.14) and (8.15),

L(x̄, ȳ*) = sup_{y*∈Y*} L(x̄, y*) = Φ(x̄, 0).

Thus Φ(x̄, 0) + Φ*(0, ȳ*) = 0 and (2) follows from Theorem D.7.
Conversely, if (2) holds, then from (8.13) and (8.14)

−Φ*(0, ȳ*) = inf_{x∈X} L(x, ȳ*) ≤ L(x̄, ȳ*)
Φ(x̄, 0) = sup_{y*∈Y*} L(x̄, y*) ≥ L(x̄, ȳ*).

By Theorem D.7, −Φ*(0, ȳ*) = Φ(x̄, 0) and (8.15) holds.

Theorem D.8 implies that the absence of a duality gap between (P) and (P*) is equivalent to the saddle point property of the pair (x̄, ȳ*).

For the example,

L(x, y*) = inf_{y∈Y} {f(x) + ϕ(Λx + y) − ⟨y*, y⟩} = f(x) + ⟨y*, Λx⟩ − ϕ*(y*).   (8.16)

If (x̄, ȳ*) is a saddle point, then from (8.15)

0 ∈ ∂f(x̄) + Λ*ȳ*
0 ∈ Λx̄ − ∂ϕ*(ȳ*).   (8.17)

It follows from Theorem 2.2 that the second equation is equivalent to

ȳ* ∈ ∂ϕ(Λx̄)

and (8.17) is equivalent to (8.11).

8.3 p-Laplacian

Φ(u, p) = ∫ ϕ(p) + (λ, ∆u − p)


9 Optimization Algorithms

In this section we discuss iterative methods for finding a solution to the optimizationproblem. First, consider the unconstrained minimization:

min J(x) over a Hilbert space X.

Assume J : X → R+ is continuously differentiable. If u* ∈ X minimizes J, then the necessary optimality condition is J′(u*) = 0. The gradient method is given by

x_{n+1} = x_n − α J′(x_n)   (9.1)

where α > 0 is a stepsize chosen so that the descent property J(x_{n+1}) < J(x_n) holds. Let g_n = J′(x_n). Since

J(x_n − α g_n) − J(x_n) = −α |J′(x_n)|² + o(α),   (9.2)

there exists an α > 0 satisfying the descent property. A stepsize α > 0 can be determined by

min_{α>0} J(x_n − α g_n)   (9.3)

or by a line search method (e.g., the Armijo rule).
Next, consider the constrained minimization

min J(x) over x ∈ C. (9.4)

where C is a closed convex set in X. It follows from Theorem 7 that if J : X → R isC1 and x∗ ∈ C is a minimizer, then

(J ′(x∗), x− x∗) ≥ 0 for all x ∈ C. (9.5)

Define the orthogonal projection operator PC of X onto C, i.e., z∗ = PCx minimizes

|z − x|2 over z ∈ C.

From Theorem 7, z∗ ∈ C satisfies

(z∗ − x, z − z∗) ≥ 0 for all z ∈ C. (9.6)

Note that (9.5) is equivalent to

(x∗ − (x∗ − αJ ′(x∗)), x− x∗) ≥ 0 for all x ∈ C and α > 0.

Thus, it is follows from (9.6) that (9.5) is equivalent to

x∗ = PC(x∗ − αJ ′(x∗)).

The projected gradient method for (9.4) is defined by

xn+1 = PC(xn − αn J ′(xn)) (9.7)

with a stepsize αn > 0.
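As a small illustration of the projected gradient iteration (9.7), the following Python sketch applies it to a quadratic cost with a box constraint set C, for which the projection P_C is componentwise clipping; the matrix A, the vector b and the bounds are made-up example data, not taken from the text.

import numpy as np

def projected_gradient(A, b, lower, upper, x0, alpha, tol=1e-8, maxit=500):
    # J(x) = 1/2 (Ax, x) - (b, x), so J'(x) = Ax - b; C = {lower <= x <= upper}
    x = x0.copy()
    for _ in range(maxit):
        g = A @ x - b                                 # gradient J'(x_n)
        x_new = np.clip(x - alpha * g, lower, upper)  # P_C(x_n - alpha J'(x_n))
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# made-up test problem
A = np.array([[3.0, 1.0], [1.0, 2.0]]); b = np.array([1.0, 1.0])
x = projected_gradient(A, b, np.zeros(2), 0.2 * np.ones(2), np.zeros(2), alpha=0.2)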


Define the Hessian H = J′′(x_0) ∈ L(X, X) of J at x_0 by

J(x) − J(x_0) − (J′(x_0), x − x_0) = 1/2 (x − x_0, H(x − x_0)) + o(|x − x_0|²).

Minimizing over x the quadratic model

J(x) ∼ J(x_0) + (J′(x_0), x − x_0) + 1/2 (x − x_0, H(x − x_0)),

we obtain the damped Newton method

x_{n+1} = x_n − α H(x_n)^{-1} J′(x_n),   (9.8)

where α > 0 is a stepsize and α = 1 corresponds to the Newton method. In the case of min J(x) over x ∈ C we obtain the projected Newton method

x_{n+1} = Proj_C(x_n − α H(x_n)^{-1} J′(x_n)).   (9.9)

Computing the Hessian can be expensive and its numerical evaluation can be less accurate. The finite-rank secant update (BFGS and DFP) is used as a substitute for H (quasi-Newton method). It is a variable metric method and such updates act as preconditioners for the gradient method.

Consider the (regularized) nonlinear least squares problem

min 1/2 |F(x)|²_Y + α/2 (Px, x)_X   (9.10)

where F : X → Y is C1 and P is a nonnegative self-adjoint operator on a Hilbert space X. The Gauss-Newton method is an iterative method of the form

x_{n+1} = argmin 1/2 |F′(x_n)(x − x_n) + F(x_n)|²_Y + α/2 (Px, x)_X,

i.e.,

x_{n+1} = x_n − (F′(x_n)*F′(x_n) + αP)^{-1}(F′(x_n)*F(x_n) + αP x_n).   (9.11)

9.1 Constrained minimization and Lagrange multiplier method

Consider the constrained minimization

min J(x, u) = F (x) +H(u) subject to E(x, u) = 0 and u ∈ C. (9.12)

We use the implicit function theory for developing algorithms for (9.12).

Implicit Function Theory Let E : X×U → X is C1. Suppose a pair (x, u) satisfiesE(x, u) = 0 and Ex(x, u) is bounded invertible. Then the exists a δ > 0 such thatfor |u − u| < δ equation E(x, u) = 0 has a locally defined unique solution x = Φ(u).Moreover, Φ : U → X is continuously differentiable at u and x = Φ′(x)(d) satisfies

Ex(x, u)x+ Eu(x, u)d = 0.

Theorem (Lagrange Calculus) Assume that E(x, u) = 0 and Ex(x, u) is boundedinvertible. Then, by the implicit function theory x = Φ(u) and for J(u) = J(Φ(u), u)

(J ′(u), d) = (H ′(u), d) + (Eu(x, u)∗λ, d)


where λ ∈ Y satisfiesEx(x, u)∗λ+ F ′(x) = 0.

Proof: From the implicit function theory

(J ′, d) = (F ′(x), x) + (H ′(u), d),

whereEx(x, u)x+ Eu(x, u) = 0.

Thus, the claim follows from

(F ′(x), x) = −(E∗xλ, x) = −(λ,Exx) = (λ,Eud).

Hence we obtain the gradient method for equality constraint problem (9.12);

Gradient method

1. Solve for xnE(xn, un) = 0

2. Solve for λnEx(xn, un)∗λn + F ′(xn) = 0

3. Compute the gradient by

gn = H ′(un) + Eu(xn, un)∗λn

4. Gradient Step: Updateun+1 = PC(un − α gn)

with a line search for α > 0.
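The gradient method above can be sketched in a few lines for a linear state equation; the following Python fragment is only an illustration in which E(x, u) = Ax + Bu − f, F(x) = 1/2|x − x_d|² and H(u) = (α/2)|u|², with A, B, f, x_d, α made-up data (the constraint set C is taken to be the whole space, so P_C is the identity).

import numpy as np

def gradient_step(A, B, f, xd, alpha, u, step):
    # 1. state equation E(x, u) = A x + B u - f = 0
    x = np.linalg.solve(A, f - B @ u)
    # 2. adjoint equation E_x(x, u)* lam + F'(x) = 0
    lam = np.linalg.solve(A.T, -(x - xd))
    # 3. gradient g = H'(u) + E_u(x, u)* lam
    g = alpha * u + B.T @ lam
    # 4. gradient step (P_C = identity here)
    return u - step * g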

9.2 Conjugate gradient method

In this section we develop the conjugate gradient and conjugate residual method forthe saddle point problem.

Consider the quadratic programming problem

J(x) = 1/2 (Ax, x)_X − (b, x)_X

where A is a positive self-adjoint operator on a Hilbert space X. Then the negative gradient of J is given by

r = −J′(x) = b − Ax.

The gradient method is given by

x_{k+1} = x_k + α_k p_k,   p_k = r_k,

where the stepsize α_k is selected by

min J(x_k + α r_k) over α > 0.


That is, the Cauchy step is given by

0 = (J′(x_{k+1}), p_k) = α (Ap_k, p_k) − (p_k, r_k)  ⇒  α_k = (p_k, r_k) / (Ap_k, p_k).   (9.13)

It is known that the gradient method with the Cauchy step searches for the solution dominantly on the subspace spanned by the major and minor axes of A, and exhibits a zig-zag pattern in its iterates. If the condition number

ρ = max σ(A) / min σ(A)

is large, then convergence is very slow. In order to improve the convergence we introduce conjugacy of the search directions p_k, which is achieved by the conjugate gradient method. For example, a new search direction p_k is extracted from the residual r_k = b − Ax_k by

p_k = r_k − β p_{k-1},

and the orthogonality

0 = (Ap_k, p_{k-1}) = (r_k, Ap_{k-1}) − β (Ap_{k-1}, p_{k-1})

implies

β_k = (r_k, Ap_{k-1}) / (Ap_{k-1}, p_{k-1}).   (9.14)

Let P be the preconditioner, a self-adjoint operator chosen so that the condition number of P^{-1/2}AP^{-1/2} is much smaller than the one for A. If we apply (9.13)–(9.14) with the preconditioner P, we have the preconditioned conjugate gradient method:

• Initialize x_0 and set

r_0 = b − Ax_0,  z_0 = P^{-1}r_0,  p_0 = z_0.

• Iterate on k until (r_k, z_k) is sufficiently small:

α_k = (r_k, z_k)/(p_k, Ap_k),  x_{k+1} = x_k + α_k p_k,  r_{k+1} = r_k − α_k Ap_k,
z_{k+1} = P^{-1}r_{k+1},  β_k = (z_{k+1}, r_{k+1})/(z_k, r_k),  p_{k+1} := z_{k+1} + β_k p_k.

Here, z_k is the preconditioned residual and (r_k, z_k) = (r_k, P^{-1}r_k). The above formulation is equivalent to applying the conjugate gradient method to the preconditioned system

P^{-1/2}AP^{-1/2} x̃ = P^{-1/2} b,  where x̃ = P^{1/2}x.

We have the following conjugate gradient theory.
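For concreteness, a minimal Python sketch of the preconditioned conjugate gradient iteration listed above; Pinv is assumed to be a routine applying P^{-1}, and the stopping test uses (r_k, z_k) as in the text.

import numpy as np

def pcg(A, b, Pinv, x0, tol=1e-12, maxit=200):
    x = x0.copy()
    r = b - A @ x
    z = Pinv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)          # alpha_k = (r_k, z_k)/(p_k, A p_k)
        x = x + alpha * p
        r = r - alpha * Ap
        z = Pinv(r)                    # preconditioned residual
        rz_new = r @ z
        if rz_new < tol:
            break
        p = z + (rz_new / rz) * p      # beta_k = (z_{k+1}, r_{k+1})/(z_k, r_k)
        rz = rz_new
    return x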

Theorem (Conjugate gradient method)
(1) We have the conjugacy properties

(r_k, z_j) = 0,  (p_k, Ap_j) = 0  for 0 ≤ j < k.


(2) x_{m+1} minimizes J(x) over x ∈ x_0 + span{p_0, ..., p_m}.

Proof: For k = 1,

(r_1, z_0) = (r_0, z_0) − α_0 (Az_0, z_0) = 0,

and since (r_1, z_1) = (r_0, z_1) − α_0 (r_1, Az_0),

(p_1, Ap_0) = (z_1, Az_0) + β_0 (z_0, Az_0) = −(1/α_0)(r_1, z_1) + β_0 (z_0, Az_0) = 0.

Note that span{p_0, ..., p_k} = span{z_0, ..., z_k}. By induction on k, for j < k,

(r_{k+1}, z_j) = (r_k, z_j) − α_k (Ap_k, p_j) = 0

and

(r_{k+1}, z_k) = (r_k, z_k) − α_k (Ap_k, p_k) = 0.

Also, for j < k,

(p_{k+1}, Ap_j) = (z_{k+1}, Ap_j) + β_k (p_k, Ap_j) = −(1/α_j)(z_{k+1}, r_{j+1} − r_j) = 0

and

(p_{k+1}, Ap_k) = (z_{k+1}, Ap_k) + β_k (p_k, Ap_k) = −(1/α_k)(r_{k+1}, z_k) + β_k (p_k, Ap_k) = 0.

Hence (1) holds. For (2) we note that

(J′(x_{m+1}), p_j) = (r_{m+1}, p_j) = 0  for j ≤ m

and x_{m+1} ∈ x_0 + span{p_0, ..., p_m}. Thus, (2) holds.

Remark Note that

r_k = r_0 − Σ_{j=0}^{k-1} α_j P^{-1}Ap_j.

Thus,

|r_k|² = |r_0|² − 2 (Σ_{j=0}^{k-1} α_j P^{-1}Ap_j, r_0) + |Σ_{j=0}^{k-1} α_j P^{-1}Ap_j|².

9.3 Conjugate residual method

Consider the constrained quadratic programming problem

min 1/2 (Qy, y)_X − (a, y)_X  subject to  Ey = b in Y.

Then the necessary optimality condition is written for x = (y, λ) ∈ X × Y:

A(y, λ) = (a, b),  where  A = [ Q  E* ; E  0 ],

where A is self-adjoint but indefinite.


In general we assume A is self-adjoint and invertible. Consider the least squares problem

1/2 |Ax − b|²_X = 1/2 (AAx, x)_X − (Ab, x)_X + 1/2 |b|².

We consider conjugate directions p_k satisfying (Ap_k, Ap_j) = 0, j < k. The preconditioned conjugate residual method may be derived in the same way as for the conjugate gradient method:

• Initialize x_0 and set

r_0 = P^{-1}(b − Ax_0),  p_0 = r_0,  (Ap)_0 = Ap_0.

• Iterate on k until (r_k, r_k) is sufficiently small:

α_k = (r_k, Ar_k)/((Ap)_k, P^{-1}(Ap)_k),  x_{k+1} = x_k + α_k p_k,  r_{k+1} = r_k − α_k P^{-1}(Ap)_k,
β_k = (r_{k+1}, Ar_{k+1})/(r_k, Ar_k),  p_{k+1} = r_{k+1} + β_k p_k,  (Ap)_{k+1} = Ar_{k+1} + β_k (Ap)_k.

Theorem (Conjugate residual method) Assuming α_k ≠ 0,
(1) We have the conjugacy properties

(r_j, Ar_k) = 0,  (Ap_k, P^{-1}Ap_j) = 0  for 0 ≤ j < k.

(2) x_{m+1} minimizes |Ax − b|² over x ∈ x_0 + span{p_0, ..., p_m}.

Proof: For k = 1,

(Ar_0, r_1) = (Ar_0, r_0) − α_0 (Ap_0, P^{-1}Ap_0) = 0,

and since (Ap_0, P^{-1}Ap_1) = (P^{-1}Ap_0, Ar_1) + β_0 (Ap_0, P^{-1}Ap_0) and (P^{-1}Ap_0, Ar_1) = (1/α_0)(r_0 − r_1, Ar_1) = −(1/α_0)(r_1, Ar_1), we have

(Ap_0, P^{-1}Ap_1) = −(1/α_0)(r_1, Ar_1) + β_0 (Ap_0, P^{-1}Ap_0) = 0.

Note that span{p_0, ..., p_k} = span{r_0, ..., r_k} and r_k ∈ r_0 + P^{-1}A span{p_0, ..., p_{k-1}}. By induction on k, for j < k,

(r_{k+1}, Ar_j) = (r_k, Ar_j) − α_k (P^{-1}Ap_k, Ar_j) = 0

and

(Ar_{k+1}, r_k) = (Ar_k, r_k) − α_k (Ap_k, P^{-1}Ap_k) = 0.

Also, for j < k,

(Ap_{k+1}, P^{-1}Ap_j) = (Ar_{k+1}, P^{-1}Ap_j) + β_k (Ap_k, P^{-1}Ap_j) = −(1/α_j)(Ar_{k+1}, r_{j+1} − r_j) = 0

and

(Ap_{k+1}, P^{-1}Ap_k) = (Ar_{k+1}, P^{-1}Ap_k) + β_k (Ap_k, P^{-1}Ap_k) = −(1/α_k)(r_{k+1}, Ar_{k+1}) + β_k (Ap_k, P^{-1}Ap_k) = 0.

Hence (1) holds. For (2) we note that

(Ax_{m+1} − b, P^{-1}Ap_j) = −(r_{m+1}, P^{-1}Ap_j) = 0  for j ≤ m

and x_{m+1} ∈ x_0 + span{p_0, ..., p_m}. Thus, (2) holds.


9.4 MINRES

The conjugate gradient method can be viewed as a special variant of the Lanczosmethod for positive definite symmetric systems. The minimal residual method (MIN-RES) and symmetric LQ method (SYMMLQ) methods are variants that can be appliedto symmetric indefinite systems.

The vector sequences in the conjugate gradient method correspond to a factorization of a tridiagonal matrix similar to the coefficient matrix. Therefore, a breakdown of the algorithm can occur, corresponding to a zero pivot, if the matrix is indefinite. The CG method is designed for the minimization

min (x, Ax) − (b, x).

Thus, for an indefinite matrix A the minimization property of the conjugate gradient method is no longer well-defined. The MINRES method is a variant of the conjugate gradient method that avoids the LU decomposition and does not suffer from breakdown. MINRES minimizes the residual in the 2-norm:

min |Ax − b|_2.

The convergence behavior of the conjugate gradient and MINRES methods for indefinite systems was analyzed by Paige et al. (1995).

When A is not positive definite, but symmetric, we can still construct an orthogonal basis for the Krylov subspace by three-term recurrence relations. Eliminating the search directions in the equations of the conjugate gradient method gives a recurrence

Ar_i = r_{i+1} t_{i+1,i} + r_i t_{i,i} + r_{i-1} t_{i-1,i},

which can be written in matrix form as

AR_i = R_{i+1} T_i,

where T_i is an (i+1) × i tridiagonal matrix. That is, given a symmetric matrix A and a vector b, the Lanczos process [30] computes vectors r_k and tridiagonal matrices T_k according to r_0 = 0 and β_1 v_1 = b, and then

p_i = Ar_i,  α_i = (r_i, p_i),  β_{i+1} r_{i+1} = p_i − α_i r_i − β_i r_{i-1},

where t_{i+1,i} = β_{i+1}, t_{i,i} = α_i, t_{i-1,i} = β_i. In exact arithmetic, the columns of R_i are orthonormal and the process stops with i = ℓ and β_{ℓ+1} = 0.

In this case we have the problem that (·, ·)_A no longer defines an inner product. However, we can still try to minimize the residual in the 2-norm by seeking

x^{(i)} ∈ Krylov subspace = span{r_0, Ar_0, ..., A^{i-1}r_0},  i.e.  x^{(i)} = R_i y,

that minimizes

|Ax^{(i)} − b|_2 = |AR_i y − b|_2 = |R_{i+1} T_i y − b|_2.

Now we exploit the fact that if

D_{i+1} = diag(|r^{(0)}|_2, |r^{(1)}|_2, ..., |r^{(i)}|_2),

then R_{i+1} D_{i+1}^{-1} is an orthonormal transformation with respect to the current Krylov subspace, and

|Ax^{(i)} − b|_2 = |D_{i+1} T_i y − |r^{(0)}|_2 e^{(1)}|_2.

This final expression can simply be seen as a minimum norm least squares problem. The element in the (i+1, i) position of T_i can be annihilated by a simple Givens rotation, and the resulting upper bidiagonal system (the other sub-diagonal elements having been removed in previous iteration steps) can simply be solved, which leads to the MINRES method (Paige and Saunders 1975).

9.5 DIIS method

DIIS (direct inversion in the iterative subspace or direct inversion of the iterative subspace), also known as Pulay mixing, is an extrapolation technique. DIIS was developed by Peter Pulay in the field of computational quantum chemistry with the intent to accelerate and stabilize the convergence of the Hartree-Fock self-consistent field method. Approximate error vectors r_i = b − Ap_i corresponding to the sequence of trial vectors {p_i}_{i=1}^m are evaluated. With sufficiently large m, we consider a linear combination of {p_i}_{i=1}^m,

p = Σ_{i=1}^m c_i p_i.

Then we have

r = b − Ap = Σ_i c_i (b − Ap_i) = Σ_{i=1}^m c_i r_i

for Σ_i c_i = 1. The DIIS method seeks to minimize the norm of r = b − Ap over {c_i}_{i=1}^m, i.e.

min 1/2 |r|²  subject to  Σ_i c_i = 1.

The necessary optimality condition is given by

Bc = λ 1,  Σ_i c_i = 1,

where the symmetric matrix B is given by B_ij = (r_i, r_j), 1 = (1, ..., 1)^t, and the constant λ is the Lagrange multiplier for the constraint Σ_i c_i = 1.
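A small Python sketch of the DIIS coefficient computation: the necessary optimality condition above is solved as a bordered linear system in (c, λ); the residual vectors r_i are assumed to be given.

import numpy as np

def diis_coefficients(residuals):
    # residuals: list of r_i = b - A p_i
    m = len(residuals)
    B = np.array([[ri @ rj for rj in residuals] for ri in residuals])
    K = np.zeros((m + 1, m + 1))
    K[:m, :m] = B           # B c - lam * 1 = 0
    K[:m, m] = -1.0
    K[m, :m] = 1.0          # sum_i c_i = 1
    rhs = np.zeros(m + 1); rhs[m] = 1.0
    sol = np.linalg.solve(K, rhs)
    return sol[:m], sol[m]  # coefficients c and the multiplier lam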

9.6 Nonlinear Conjugate Residual method

One can extend the conjugate residual method to nonlinear equations. Consider the equality constrained optimization

min J(y)  subject to  E(y) = 0.

The necessary optimality system for x = (y, λ) is

F(y, λ) = ( J′(y) + E′(y)*λ ; E(y) ) = 0.

Multiplication of a vector by the Jacobian F′ can be approximated by the difference of two residual vectors:

(F(x_k + t y) − F(x_k))/t ∼ Ay  with  t|y| ∼ 1.

Thus we have the nonlinear version of the conjugate residual method:

Nonlinear Conjugate Residual method

• Calculate s_k approximating Ar_k by

s_k = (F(x_k + t r_k) − F(x_k))/t,  t = 1/|r_k|.

• Update the direction by

p_k = P^{-1}r_k − β_k p_{k-1},  β_k = (s_k, r_k)/(s_{k-1}, r_{k-1}).

• Calculate

q_k = s_k − β_k q_{k-1}.

• Update the solution by

x_{k+1} = x_k + α_k p_k,  α_k = (s_k, r_k)/(q_k, P^{-1}q_k).

• Calculate the residual

r_{k+1} = F(x_{k+1}).

9.7 Newton method

Next, we develop the Newton method for J ′(u) = 0. Define the Lagrange functional

L(x, u, λ) = F (x) +H(u) + (λ,E(x, u)).

Note thatJ ′(u) = Lu(x(u), u, λ(u))

where (x(u), λ(u)) satisfies

E(x(u), u) = 0, E′(x(u))∗λ(u) + F ′(x(u)) = 0

where we assumed Ex(x, u) is bounded invertible. By the implicit function theory

J ′′(u) = Lux(x(u), u, λ(u))x+ Eu(x(u), u, λ(u))∗λ+ Luu(x(u), u, λ(u)),

where (x, λ) satisfies Lxx(x, u, λ) Ex(x, u)∗

Ex(x, u) 0

x

λ

+

Lxu(x, u, λ)

Eu(x, u)

= 0.


Thus,

x = −(Ex(x, u))−1Eu(x, u), λ = −(Ex(x, u)∗)−1(Lxx(x, u, λ)x+ Lxu(x, u, λ))

andJ ′′(u) = Luxx− Eu(x, u)∗(Ex(x, u)∗)−1(Luxx+ Lxu) + Luu(x, u, λ).

Theorem (Lagrange Calculus II) The Newton step

J′′(u)(u⁺ − u) + J′(u) = 0

is equivalently written as

[ L_xx(x, u, λ)  L_xu(x, u, λ)  E_x(x, u)* ;
  L_ux(x, u, λ)  L_uu(x, u, λ)  E_u(x, u)* ;
  E_x(x, u)      E_u(x, u)      0          ] (∆x, ∆u, ∆λ)^t + (0, E_u(x, u)*λ + H′(u), 0)^t = 0,

where ∆x = x⁺ − x, ∆u = u⁺ − u and ∆λ = λ⁺ − λ.

Proof: Define the linear operator T by

T(x, u) = ( −E_x(x, u)^{-1}E_u(x, u) ; I ).

Then the Newton method is equivalently written as

T(x, u)* L′′(x, u, λ) (∆x; ∆u) + T(x, u)* (0; E_u*λ + H′(u)) = 0.

Since E′(x, u) is surjective, it follows from the closed range theorem that

N(T(x, u)*) = R(T(x, u))^⊥ = N(E′(x, u)*)^⊥ = R(E′(x, u)).

Hence we obtain the claimed update.

Thus, we have the Newton method for the equality constraint problem (9.12).

Newton method

1. Given u0, initialize (x0, λ0) by

E(x0, u0) = 0, Ex(x0, u0)∗λ0 + F ′(x0) = 0

2. Newton Step: Solve for (∆x, ∆u, ∆λ)

[ L′′(x_n, u_n, λ_n)  E′(x_n, u_n)* ; E′(x_n, u_n)  0 ] (∆x, ∆u, ∆λ)^t + (0, E_u(x_n, u_n)*λ_n + H′(u_n), 0)^t = 0

and update u_{n+1} = Proj_C(u_n + α ∆u) with a stepsize α > 0.


3. Feasible Step: Solve for (xn+1, λn+1)

E(xn+1, un+1) = 0, Ex(xn+1, un+1)∗λn+1 + F ′(xn+1) = 0.

4. Stop or set n = n+ 1 and return to step 2.

where

L′′(x_n, u_n, λ_n) = [ F′′(x_n) + (λ_n, E_xx(x_n, u_n))   (λ_n, E_xu(x_n, u_n)) ;
                       (λ_n, E_ux(x_n, u_n))              H′′(u_n) + (λ_n, E_uu(x_n, u_n)) ]

and

E′(x_n, u_n) = (E_x(x_n, u_n), E_u(x_n, u_n)).

Consider the general equality constraint minimization;

min J(y) subject to E(y) = 0

The necessary optimality condition is given by

J ′(y) + E′(y)∗λ = 0, E(y) = 0, (9.15)

where we assume E′(y) is surjective. Problem (9.12) is a specific case of this, i.e.,

y = (x, u), J(y) = F (x) +H(u), E = E(x, u).

The Newton method applied to the necessary optimality (9.15) for (y, λ) is as follows.

One can apply the Newton method to the necessary optimality system

(L_x(x, u, λ), L_u(x, u, λ), E(x, u)) = 0

and results in the SQP (sequential quadratic programing):

Newton method applied to Necessary optimality system (SQP)

1. Newton direction: Solve for (∆y, ∆λ)

[ L′′(y_n, λ_n)  E′(y_n)* ; E′(y_n)  0 ] (∆y, ∆λ)^t + (J′(y_n) + E′(y_n)*λ_n, E(y_n))^t = 0.   (9.16)

2. Update yn+1 = yn + α∆y and λn+1 = λm + α∆λ.

3. Stop or set n = n+ 1 and return to step 1.

where L′′(y, λ) = J′′(y) + E′′(y)*λ.

Both Newton methods solve the same linear system (9.16), only with different right-hand sides. In the first Newton method we iterate on the variable u only, but each step uses the feasible step E(y_n) = E(x_n, u_n) = 0 and L_x(x_n, u_n, λ_n) = 0 for (x_n, λ_n), which are the steps of the gradient method. Then it computes the Newton direction for J′(u) and updates u. In this sense the second Newton method is a relaxed version of the first one.
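A compact Python sketch of one SQP step (9.16): the KKT matrix is assembled from user-supplied callables; the names Jp, Jpp, E, Ep and Hess_lag are placeholders for J′, J″, E, E′ and the Hessian contribution (E″(y))*λ, which would come from the concrete problem.

import numpy as np

def sqp_step(y, lam, Jp, Jpp, E, Ep, Hess_lag, alpha=1.0):
    # Solve [L''  E'* ; E'  0] (dy, dlam) = -(J'(y) + E'(y)* lam, E(y))
    n, m = len(y), len(lam)
    H = Jpp(y) + Hess_lag(y, lam)            # L''(y, lam) = J''(y) + (E''(y))* lam
    A = Ep(y)
    K = np.block([[H, A.T], [A, np.zeros((m, m))]])
    rhs = -np.concatenate([Jp(y) + A.T @ lam, E(y)])
    d = np.linalg.solve(K, rhs)
    return y + alpha * d[:n], lam + alpha * d[n:]   # damped update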


10 Sequential programming

In this section we discuss sequential programming. It is an intermediate method between the gradient method and the Newton method. Consider the constrained optimization

min F(y)  subject to  E(y) = 0,  y ∈ C.

We linearize the equality constraint and consider a sequence of constrained optimizations;

min F(y)  subject to  E′(y_n)(y − y_n) + E(y_n) = 0,  y ∈ C.   (10.1)

The necessary optimality condition (if E′(y_n) is surjective) is given by

(F′(ȳ) + E′(y_n)*λ̄, y − ȳ) ≥ 0  for all y ∈ C
E′(y_n)(ȳ − y_n) + E(y_n) = 0.   (10.2)

Note that the necessary condition for (y*, λ*) is written as

(F′(y*) + E′(y_n)*λ* − (E′(y_n)* − E′(y*)*)λ*, y − y*) ≥ 0  for all y ∈ C
E′(y_n)(y* − y_n) + E(y_n) = E′(y_n)(y* − y_n) + E(y_n) − E(y*).   (10.3)

Suppose we have the Lipschitz continuity of solutions to (10.2) and (10.3) with respect to the perturbations ∆_1 = (E′(y_n)* − E′(y*)*)λ* and ∆_2 = E′(y_n)(y* − y_n) + E(y_n) − E(y*); then

|ȳ − y*| ≤ c_1 |(E′(y_n)* − E′(y*)*)λ*| + c_2 |E′(y_n)(y* − y_n) + E(y_n) − E(y*)|.

Thus, for the (damped) update

y_{n+1} = (1 − α) y_n + α ȳ,  α ∈ (0, 1),   (10.4)

we have the estimate

|y_{n+1} − y*| ≤ (1 − α + αγ)|y_n − y*| + cα |y_n − y*|²,   (10.5)

for some constants γ and c > 0. In fact, for the case C = X let z_n = (y_n, λ_n), z* = (y*, λ*) and define G_n by

G_n = [ F′′(y_n)  E′(y_n)* ; E′(y_n)  0 ].

For the update

z_{n+1} = (1 − α) z_n + α z̄,  α ∈ (0, 1],

we have

z_{n+1} − z* = (1 − α)(z_n − z*) + α G_n^{-1} ( ((E′(y_n)* − E′(y*)*)λ*, 0)^t + δ_n )


with

δ_n = ( F′(y*) − F′(y_{n+1}) − F′′(y_n)(y* − y_{n+1}) ; E(y*) − E′(y_n)(y* − y_n) − E(y_n) ).

Note that if F′′(y_n) = Q, then

G_n^{-1} (∆_1; 0) = ( Q^{-1}(∆_1 − E′(y_n)*e) ; e ),  e = (E′(y_n)Q^{-1}E′(y_n)*)^{-1} E′(y_n)Q^{-1}∆_1.

Thus we have

|G_n^{-1} ((E′(y_n)* − E′(y*)*)λ*; 0)| ≤ γ |y_n − y*|.

Since

|G_n^{-1} δ_n| ≤ c |y_n − y*|²,

we have

|z_{n+1} − z*| ≤ (1 − α + αγ)|z_n − z*| + αc |z_n − z*|²,

which implies (10.5). When either the quadratic variation of E is dominated by the linearization E′ of E, i.e.,

|E′(y*)^†(E′(y_n) − E′(y*))| ≤ β |y_n − y*|

with small β > 0, or the multiplier |λ*| is small, γ > 0 is sufficiently small. The estimate (10.5) implies that if the quadratic term |y_n − y*|² is dominating we use a small α > 0 to damp the iteration, and if |y_n − y*| is dominating we let α → 1.

10.1 Second order version

The second order sequential programming method is given by

min F(y) + ⟨λ_n, E(y) − (E′(y_n)(y − y_n) + E(y_n))⟩
subject to E′(y_n)(y − y_n) + E(y_n) = 0 and y ∈ C.   (10.6)

The necessary optimality condition for (10.6) is given by

(F′(ȳ) + (E′(ȳ)* − E′(y_n)*)λ_n + E′(y_n)*λ̄, y − ȳ) ≥ 0  for all y ∈ C
E′(y_n)(ȳ − y_n) + E(y_n) = 0.   (10.7)

It follows from (10.3) and (10.7) that

|ȳ − y*| + |λ̄ − λ*| ≤ c (|y_n − y*|² + |λ_n − λ*|²).

In summary we have

Sequential Programming I

1. Given yn ∈ C, we solve for

min F (y) subject to E′(yn)(y − yn) + E(yn) = 0, over y ∈ C.


2. Update yn+1 = (1− α) yn + α y, α ∈ (0, 1). Iterate until convergence.

Sequential Programming II

1. Given yn ∈ C, λn and we solve for y ∈ C:

min F (y)+(λn, E(y)−(E′(yn)(y−yn)+E(yn)) subject to E′(yn)(y−yn)+E(yn) = 0, y ∈ C.

2. Update (yn+1, λn+1) = (y, λ). Iterate until convergence.

10.2 Examples

Consider the constrained optimization of the form

min F (x) +H(u) subject to E(x, u) = 0 and u ∈ C. (10.8)

Suppose H(u) is nonsmooth. We linearize the equality constraint and consider a se-quence of the constrained optimization;

min F (x) +H(u) + 〈λn, E(x, u)− (E′(xn, un)(x− xn, u− un) + E(xn, un)〉

subject to E′(xn, un)(x− xn, u− un) + E(xn, un) = 0 and u ∈ C.(10.9)

Note that if (x∗, u∗) is an optimizer of (10.8), then

F ′(x∗) + (E′(x∗, u∗)− E′(xn, un))λn + E′(xn, un)λ∗ = (E′(xn, un)− E′(x∗, u∗))(λ∗ − λn) = ∆1

E′(xn, un)(x∗ − xn, u∗ − un) + E(xn, un) = ∆2, |∆2| ∼M (|xn − x∗|2 + |un − u∗|2).(10.10)

Thus, (x∗, u∗) is the solution to the perturbed problem of (10.9):

min F (x) +H(u) + 〈λn, E(x, u)− (E′(xn, un)(x− xn, u− un) + E(xn, un)〉

subject to E′(xn, un)(x− xn, u− un) + E(xn, un) = ∆ and u ∈ C.(10.11)

Let (xn+1, un+1) be the solution of (10.9). One can assume the Lipschitz continuity ofsolutions to (10.11):

|xn+1 − x∗|+ |un+1 − u∗|+ |λn − λ∗| ≤ C |∆| (10.12)

From (10.10) and assumption (10.12) we have the quadratic convergence of the sequen-tial programing method (10.15):

|xn+1 − x∗|+ |un+1 − u∗|+ |λ− λ∗| ≤M (|xn − x∗|2 + |un − u∗|2 + |λn − λ∗|). (10.13)

Assuming H is convex, the necessary optimality given byF ′(x) + (Ex(x, u)∗ − Ex(xn, un)∗)λn + Ex(xn, un)∗λ = 0

H(v)−H(u) + ((Eu(x, u)∗ − Eu(xn, un)∗)λn + Eu(x, u)∗λ, v − u) ≥ 0 for all v ∈ C

Ex(xn, un)(x− xn) + Eu(xn, un)(u− un) + E(xn, un) = 0.(10.14)


As in the Newton method we use

(Ex(x, u)∗ − Ex(xn, un)∗)λn ∼ E′′(xn, un)(x− xn, u− un)λn

F ′(xn+1) ∼ F ′′(xn)(xn+1 − xn) + F ′(xn)

and obtain

F ′′(xn)(x− xn) + F ′(xn) + λn(Exx(xn, un)(x− xn) + Exu(xn, un)(u− un)) + Ex(xn, un)∗λ = 0

Ex(xn, un)x+ Eu(xn, un)u = Ex(xn, un)xn + Eu(xn, un)un − E(xn, un),

which is a linear system of equations for (x, λ). Thus one has x = x(u), λ = λ(u), a continuous affine function of u:

(x(u), λ(u))^t = [ F′′(x_n) + λ_n E_xx(x_n, u_n)  E_x(x_n, u_n)* ; E_x(x_n, u_n)  0 ]^{-1} RHS   (10.15)

with

RHS = ( F′′(x_n)x_n − F′(x_n) + λ_n E_xx(x_n, u_n)x_n − E_xu(x_n, u_n)(u − u_n)λ_n ; E_x(x_n, u_n)x_n − E_u(x_n, u_n)(u − u_n) − E(x_n, u_n) ),

and then the second inequality of (10.14) becomes a variational inequality for u ∈ C. Consequently, we obtain the following algorithm:

Sequential Programming II

1. Given u ∈ C, solve (10.15) for (x(u), λ(u)).

2. Solve the variational inequality for u ∈ C:

H(v)−H(u) + ((Eu(x(u), u)∗ − Eu(xn, un)∗)λn + Eu(xn, un)∗λ(u), v − u) ≥ 0,(10.16)

for all v ∈ C.3. Set un+1 = u and xn+1 = x(u). Iterate until convergence.

Example (Optimal Control Problem) Let (x, u) ∈ H¹(0, T; Rⁿ) × L²(0, T; Rᵐ). Consider the optimal control problem:

min ∫_0^T (ℓ(x(t)) + h(u(t))) dt

subject to the dynamical constraint

d/dt x(t) = f(x(t)) + Bu(t),  x(0) = x_0,

and the control constraint

u ∈ C = {u : u(t) ∈ U, a.e.},


where U is a closed convex set in Rᵐ. Then the necessary optimality condition for (10.9) is

E′(x_n, u_n)(x − x_n, u − u_n) + E(x_n, u_n) = −d/dt x + f′(x_n)(x − x_n) + Bu = 0

F′(x) + E_x(x_n, u_n)*λ = d/dt λ + f′(x_n)^t λ + ℓ′(x) = 0

u(t) = argmin_{v∈U} {h(v) + (B^t λ(t), v)}.

If h(u) = (α/2)|u|² and U = Rᵐ, then u(t) = −B^t λ(t)/α. Thus, (10.9) is equivalent to solving the two point boundary value problem for (x, λ):

d/dt x = f′(x_n)(x − x_n) − (1/α) BB^t λ,  x(0) = x_0

−d/dt λ = (f′(x) − f′(x_n))*λ_n + f′(x_n)*λ + ℓ′(x),  λ(T) = 0.

Example (Inverse Medium problem) Let (y, v) ∈ H¹(Ω) × H¹(Ω). Consider the inverse medium problem:

min ∫_Γ 1/2 |y − z|² ds_x + ∫_Ω (β |v| + α/2 |∇v|²) dx

subject to

−∆y + vy = 0 in Ω,  ∂y/∂n = g at the boundary Γ,

and

v ∈ C = {0 ≤ v ≤ γ < ∞},

where Ω is a bounded open domain in R², z is a boundary measurement of the voltage y and g ∈ L²(Γ) is a given current. The problem is to determine the potential function v ∈ C from z ∈ L²(Γ) in a class of sparse and clustered media. Then the necessary optimality condition for (10.9) is given by

−∆y + v_n(y − y_n) + v y_n = 0,  ∂y/∂n = g

−∆λ + v_n λ = 0,  ∂λ/∂n = y − z

∫_Ω (α (∇v, ∇φ − ∇v) + β (|φ| − |v|) + (λ y_n, φ − v)) dx ≥ 0  for all φ ∈ C.

10.3 Exact penalty method

Consider the penalty method for the inequality constraint minimization

min F (x), subject to Gx ≤ c (10.17)


where G ∈ L(X, L²(Ω)). If x* is a minimizer of (10.17), the necessary optimality condition is given by

F′(x*) + G*µ = 0,  µ = max(0, µ + Gx* − c)  a.e. in Ω.   (10.18)

For β > 0 the penalty method is defined by

min F(x) + β ψ(Gx − c)   (10.19)

where

ψ(y) = ∫_Ω max(0, y) dω.

The necessary optimality condition is given by

−F′(x) ∈ β G* ∂ψ(Gx − c),   (10.20)

where, pointwise,

∂ψ(s) = {0} for s < 0,  [0, 1] for s = 0,  {1} for s > 0.

Suppose the Lagrange multiplier µ satisfies

sup_{ω∈Ω} |µ(ω)| ≤ β;   (10.21)

then µ ∈ β ∂ψ(Gx* − c) and thus x* satisfies (10.20). Moreover, if (10.19) has a unique minimizer x̄ in a neighborhood of x*, then x̄ = x*.

Due to the singularity and the non-uniqueness of the subdifferential ∂ψ, the direct treatment of the condition (10.20) may not be feasible for numerical computation. We define a regularized functional of max(0, s); for ε > 0,

max_ε(0, s) = ε/2 for s ≤ 0,  (1/2ε)|s|² + ε/2 for 0 ≤ s ≤ ε,  s for s ≥ ε,

and consider the regularized problem of (10.19);

min J_ε(x) = F(x) + β ψ_ε(Gx − c),   (10.22)

where

ψ_ε(y) = ∫_Ω max_ε(0, y) dω.

Theorem (Consistency) Assume F is weakly lower semi-continuous. For an arbitrary β > 0, any weak cluster point of the solutions x_ε, ε > 0, of the regularized problem (10.22) is a solution x̄ of the non-smooth penalized problem (10.19) as ε → 0.

Proof: First we note that 0 ≤ max_ε(0, s) − max(0, s) ≤ ε/2 for all s ∈ R. Let x̄ be a solution to (10.19) and x_ε be a solution of the regularized problem (10.22). Then we have

F(x_ε) + β ψ_ε(x_ε) ≤ F(x̄) + β ψ_ε(x̄)
F(x̄) + β ψ(x̄) ≤ F(x_ε) + β ψ(x_ε).

Adding these inequalities,

ψ_ε(x_ε) − ψ(x_ε) ≤ ψ_ε(x̄) − ψ(x̄).

Thus we have, for all weak cluster points x̂ of x_ε,

F(x̂) + β ψ(x̂) ≤ F(x̄) + β ψ(x̄),

since F is weakly lower semi-continuous and

ψ_ε(x_ε) − ψ(x̄) = ψ_ε(x_ε) − ψ(x_ε) + ψ(x_ε) − ψ(x̄) ≤ ψ_ε(x̄) − ψ(x̄) + ψ(x_ε) − ψ(x̄)

and

lim_{ε→0+} (ψ_ε(x̄) − ψ(x̄) + ψ(x_ε) − ψ(x̄)) ≤ 0.

As a consequence of the theorem, we have

Corollary If (10.19) has a unique minimizer in a neighborhood of x* and (10.21) holds, then x_ε converges to the solution x* of (10.17) as ε → 0.

The necessary optimality condition for (10.22) is given by

F′(x) + β G* ψ′_ε(Gx − c) = 0.   (10.23)

Although the non-uniqueness concerning the subdifferential in the optimality condition (10.20) is now bypassed through regularization, the optimality condition (10.23) is still nonlinear. To address this, we first observe that max′_ε has the alternative representation

max′_ε(s) = χ_ε(s) s,  χ_ε(s) = 0 for s ≤ 0,  1/max(ε, s) for s > 0.   (10.24)

This suggests the following semi-implicit fixed point iteration;

αP(x^{k+1} − x^k) + F′(x^k) + β G*χ_k (Gx^{k+1} − c) = 0,  χ_k = χ_ε(Gx^k − c),   (10.25)

where P is positive, self-adjoint and serves as a preconditioner for F′, and α > 0 serves as a stabilizing and acceleration stepsize (see Theorem 2).

Let d^k = x^{k+1} − x^k. Equation (10.25) for x^{k+1} is equivalent to the equation for d^k

αPd^k + F′(x^k) + β G*χ_k (Gx^k − c + Gd^k) = 0,

which gives us

(αP + β G*χ_k G) d^k = −J′_ε(x^k).   (10.26)

The direction d^k is a descent direction for J_ε(x) at x^k; indeed,

(d^k, J′_ε(x^k)) = −((αP + β G*χ_k G)d^k, d^k) = −α(Pd^k, d^k) − β(χ_k Gd^k, Gd^k) < 0.

Here we used the fact that P is strictly positive definite. So the iteration (10.25) can be seen as a descent method:


Algorithm (Fixed point iteration)

Step 0. Set parameters: β, α, ε, P and 0 < ρ ≤ 1.

Step 1. Compute the direction by (αP + β G*χ_k G) d^k = −J′_ε(x^k).

Step 2. Update x^{k+1} = x^k + ρ d^k.

Step 3. If |J′_ε(x^k)| < TOL, then stop. Otherwise repeat Step 1 - Step 2.
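A minimal Python sketch of the fixed point iteration above for the quadratic case F(x) = 1/2(x, Ax) − (b, x) with the constraint Gx ≤ c and P = I; the data are placeholders and the stopping rule is simplified.

import numpy as np

def exact_penalty_fixed_point(A, b, G, c, beta=1e2, alpha=1.0, eps=1e-8,
                              rho=1.0, maxit=200):
    x = np.linalg.solve(A, b)                       # unconstrained start
    n = len(x)
    for _ in range(maxit):
        s = G @ x - c
        chi = np.where(s > 0, 1.0 / np.maximum(eps, s), 0.0)   # chi_eps(Gx - c)
        H = alpha * np.eye(n) + beta * G.T @ (chi[:, None] * G)
        rhs = -(A @ x - b + beta * G.T @ (chi * s))            # -J'_eps(x^k)
        d = np.linalg.solve(H, rhs)                            # Step 1
        x = x + rho * d                                        # Step 2
        if np.linalg.norm(d) < 1e-10:
            break
    return x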

Let us make some remarks on the Algorithm. In many applications, the structures of F′ and G are sparse block diagonals, and the resulting system (10.26) for the direction d^k then becomes a linear system with a sparse symmetric positive-definite matrix; it can be efficiently solved by the Cholesky decomposition method, for example. If F(x) = 1/2 (x, Ax) − (b, x), then F′(x) = Ax − b. For this case we may use the alternative update

αP(x^{k+1} − x^k) + Ax^{k+1} − b + β G*χ_k (Gx^{k+1} − c) = 0,

assuming that it does not cost much to perform this fully implicit step.
For the bilateral inequality constraint g ≤ Gx ≤ c the update (10.25) becomes

αP(x^{k+1} − x^k) + F′(x^k) + β G*( χ_k(Gx^{k+1} − c) on A⁺_k + χ_k(Gx^{k+1} − g) on A⁻_k ) = 0,   (10.27)

where the active index sets A^±_k and the diagonal matrix χ_k on A_k are defined by

(χ_k)_jj = 1/max(ε, (Gx^k − c)_j) if j ∈ A⁺_k = {j : (Gx^k − c)_j ≥ 0},
(χ_k)_jj = 1/max(ε, (g − Gx^k)_j) if j ∈ A⁻_k = {j : (Gx^k − g)_j ≤ 0}.

The damping parameter 0 < ρ ≤ 1 is used to globalize the convergence of the proposed algorithm, i.e., we select ρ to achieve the descent property for J_ε(x). In practice the algorithm is globally convergent with ρ = 1, and the following results justify this fact.

Theorem 3 Let

R(x, x̄) = −(F(x) − F(x̄) − F′(x̄)(x − x̄)) + α (P(x − x̄), x − x̄) ≥ ω |x − x̄|²

for some ω > 0 and all x, x̄. Then we have

R(x^{k+1}, x^k) + F(x^{k+1}) − F(x^k) + β/2 ( (χ_k Gd^k, Gd^k)
 + Σ_{j′} (χ_k, |Gx^{k+1} − c|² − |Gx^k − c|²) + Σ_{j″} (χ_k, |Gx^{k+1} − g|² − |Gx^k − g|²) ) = 0,

where j′ : (Gx^k − c)_{j′} ≥ 0 and j″ : (Gx^k − g)_{j″} ≤ 0.

Proof: Taking the inner product of (10.27) with d^k = x^{k+1} − x^k,

α (Pd^k, d^k) − (F(x^{k+1}) − F(x^k) − F′(x^k)d^k) + F(x^{k+1}) − F(x^k) + E_k = 0,

where

E_k = β(χ_k(G(j, :)x^{k+1} − c), G(j, :)d^k)
    = β/2 ( (χ_k G(j, :)^t d^k, G(j, :)d^k) + Σ_{j′} (χ_k, |Gx^{k+1} − c|² − |Gx^k − c|²) + Σ_{j″} (χ_k, |Gx^{k+1} − g|² − |Gx^k − g|²) ).

Corollary 2 Suppose all inactive indices (χ_k = 0) remain inactive. Then

ω |x^{k+1} − x^k|² + J_ε(x^{k+1}) − J_ε(x^k) ≤ 0

and x^k is globally convergent.

Proof: Since s → ψ(√|s − c|) on s ≥ c and s → ψ(√|s − g|) on s ≤ g are concave, we have

(χ_k, |(Gx^{k+1} − c)_{j′}|² − |(Gx^k − c)_{j′}|²) ≥ ψ_ε((Gx^{k+1} − c)_{j′}) − ψ_ε((Gx^k − c)_{j′})

and

(χ_k, |(g − Gx^{k+1})_{j″}|² − |(g − Gx^k)_{j″}|²) ≥ ψ_ε((g − Gx^{k+1})_{j″}) − ψ_ε((g − Gx^k)_{j″}).

Thus we obtain

F(x^{k+1}) + β ψ_ε(x^{k+1}) + R(x^{k+1}, x^k) + β/2 (χ_k G(j, :)(x^{k+1} − x^k), G(j, :)(x^{k+1} − x^k)) ≤ F(x^k) + β ψ_ε(x^k).

If we assume R(x, x̄) ≥ ω |x − x̄|² for some ω > 0, then F(x^k) + ψ_ε(Gx^k − c) is monotonically decreasing and

Σ |x^{k+1} − x^k|² < ∞.

Corollary 3 Suppose i ∈ A^c_k = I_k is inactive but ψ_ε(Gx^{k+1}_i − c) > 0. Assume

β/2 (χ_k G(x^{k+1} − x^k), G(x^{k+1} − x^k)) − β Σ_{i∈I_k} ψ_ε(Gx^{k+1}_i − c) ≥ −ω′ |x^{k+1} − x^k|²   (10.28)

with 0 ≤ ω′ < ω; then the algorithm is globally convergent.

The condition (10.28) holds if

n=5000; m=50;

A=rand(n,n); A=A+A’; A=A+70*eye(n); C=rand(m,n);

f=[ones(n,1);.5*ones(m,1)]; u=A\f(1:n);

k=find(C*u(1:n)>=.5*ones(m,1));

AA=[A C(k,:)’;C(k,:) -1e-8*eye(size(k,1))];

uu=u; u=AA\[f(1:n);.5*ones(size(k,1),1)];

norm(u(1:n)-uu(1:n))


11 Sensitivity analysis

In this section the sensitivity analysis is discussed for parameter-dependent optimization. Consider the parametric optimization problem;

min F(x, p) over x ∈ C.   (11.1)

For given p ∈ P, a complete metric space, let x = x(p) ∈ C be a minimizer of (11.1). For p̄ ∈ P and p = p̄ + t p̃ ∈ P with t ∈ R and increment p̃ we have

F(x(p), p) ≤ F(x(p̄), p),  F(x(p̄), p̄) ≤ F(x(p), p̄).

Thus we have

F(x(p), p) − F(x(p̄), p̄) = F(x(p), p) − F(x(p), p̄) + F(x(p), p̄) − F(x(p̄), p̄) ≥ F(x(p), p) − F(x(p), p̄)

F(x(p), p) − F(x(p̄), p̄) = F(x(p), p) − F(x(p̄), p) + F(x(p̄), p) − F(x(p̄), p̄) ≤ F(x(p̄), p) − F(x(p̄), p̄),

and therefore

F(x(p), p) − F(x(p), p̄) ≤ F(x(p), p) − F(x(p̄), p̄) ≤ F(x(p̄), p) − F(x(p̄), p̄).

Hence for t > 0 we have

(F(x(p̄+tp̃), p̄+tp̃) − F(x(p̄+tp̃), p̄))/t ≤ (F(x(p̄+tp̃), p̄+tp̃) − F(x(p̄), p̄))/t ≤ (F(x(p̄), p̄+tp̃) − F(x(p̄), p̄))/t

(F(x(p̄), p̄) − F(x(p̄), p̄−tp̃))/t ≤ (F(x(p̄), p̄) − F(x(p̄−tp̃), p̄−tp̃))/t ≤ (F(x(p̄−tp̃), p̄) − F(x(p̄−tp̃), p̄−tp̃))/t.

Based on these inequalities we have

Theorem (Sensitivity I) Assume
(H1) there exists a continuous graph p → x(p) ∈ C in a neighborhood of (x(p̄), p̄) ∈ C × P, and define the value function

V(p) = F(x(p), p).

(H2) in a neighborhood of (x(p̄), p̄),

(x, p) ∈ C × Q → F_p(x, p) is continuous.

Then the G-derivative of V at p̄ exists and is given by

V′(p̄)p̃ = F_p(x(p̄), p̄)p̃.   (11.2)

Conversely, if V(p) is differentiable at p̄ then (11.2) holds.

Lemma (Sensitivity I) Suppose V(p) = inf_{x∈C} F(x, p) where C is a closed convex set. If p → F(x, p) is concave, then V(p) is concave and F_p(x̄, p̄) ∈ −∂V(p̄).

Proof: Since for 0 ≤ t ≤ 1

F(x, t p₁ + (1 − t) p₂) ≥ t F(x, p₁) + (1 − t) F(x, p₂),

we have

V(t p₁ + (1 − t) p₂) ≥ t V(p₁) + (1 − t) V(p₂)

and V is concave. Since

V(p) − V(p̄) = F(x(p), p) − F(x̄, p) + F(x̄, p) − F(x̄, p̄) ≤ F(x̄, p) − F(x̄, p̄),

we have F_p(x̄, p̄) ∈ −∂V(p̄).

Next, consider the implicit function case. Consider a functional defined by

V(p) = F(x(p), p),  E(x(p), p) = 0,

where we assume that the constraint E is C¹ and E(x, p) = 0 defines a continuous solution graph p → x(p) in a neighborhood of (x(p̄), p̄).

Theorem (Sensitivity II) Assume there exists λ that satisfies the adjoint equation

E_x(x(p̄), p̄)*λ + F_x(x(p̄), p̄) = 0

and assume the Lagrange functional

L(x, p, λ) = F(x, p) + (E(x, p), λ)

satisfies

L(x(p), p, λ) − L(x(p̄), p, λ) − L_x(x(p̄), p, λ)(x(p) − x(p̄)) = o(|p − p̄|).   (11.3)

Then p → V(p) is differentiable at p̄ and

V′(p̄)p̃ = (L_p(x(p̄), p̄, λ), p̃).

Proof: If we let

F(x(p), p̄) − F(x(p̄), p̄) − F_x(x(p̄), p̄)(x(p) − x(p̄)) = ε₁,

we have

V(p) − V(p̄) − F_x(x(p̄), p̄)(x(p) − x(p̄)) = ε₁ + F_p(x(p̄), p̄)(p − p̄) + o(|p − p̄|).

Note that

(E_x(x(p̄), p̄)(x(p) − x(p̄)), λ) + F_x(x(p̄), p̄)(x(p) − x(p̄)) = 0.

Let

ε₂ = (E(x(p), p̄) − E(x(p̄), p̄) − E_x(x(p̄), p̄)(x(p) − x(p̄)), λ),

where

0 = (E(x(p), p) − E(x(p̄), p̄), λ) = ε₂ + (E_x(x(p̄), p̄)(x(p) − x(p̄)), λ) + (E_p(x(p̄), p̄)(p − p̄), λ) + o(|p − p̄|).

Thus, summing these we obtain

V(p) − V(p̄) = L_p(x(p̄), p̄, λ)(p − p̄) + ε₁ + ε₂ + o(|p − p̄|).

Since

L(x(p), p, λ) − L(x(p̄), p, λ) − L_x(x(p̄), p, λ)(x(p) − x(p̄)) = ε₁ + ε₂,

p → V(p) is differentiable at p̄, which gives the desired result.

Lemma (Sensitivity II) If F(x, p) = F(x), p → E(x, p) is linear and x → L(x, p, λ) is convex, then p → V(p) is concave and L_p(x̄, p̄, λ) ∈ −∂V(p̄).

If we assume that E and F are C² in x, then it is sufficient to have the Hölder continuity

|x(p) − x(p̄)|_X ∼ o(|p − p̄|^{1/2}_Q)

for condition (11.3) to hold.

Next, consider the parametric constrained optimization:

min F(x, p)  subject to  E(x, p) ∈ K.

We assume
(H3) λ ∈ Y satisfies the adjoint equation

E_x(x(p̄), p̄)*λ + F_x(x(p̄), p̄) = 0.

(H4) Condition (11.3) holds.
(H5) In a neighborhood of (x(p̄), p̄),

(x, p) ∈ C × P → E_p(x, p) ∈ Y is continuous.

Eigenvalue problem Let A(p) be a linear operator in a Hilbert space X. For x = (µ, y) ∈ R × X consider

F(x) = µ,  subject to  A(p)y = µ y.

That is, (µ, y) is an eigenpair of A(p). The adjoint equation is

A(p)*λ − µ λ = 0,  (y, λ) = 1.

Thus,

µ′(p)p̃ = ((d/dp A(p)p̃) y, λ).
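The eigenvalue sensitivity formula can be checked numerically; the following Python fragment uses a made-up symmetric pencil A(p) = A0 + p A1, for which the adjoint eigenvector coincides with y, and compares the formula with a finite difference quotient.

import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5)); A0 = M + M.T
M = rng.standard_normal((5, 5)); A1 = M + M.T       # dA/dp
p, dp = 0.4, 1e-6

def largest_eig(p):
    w, V = np.linalg.eigh(A0 + p * A1)
    return w[-1], V[:, -1]

mu, y = largest_eig(p)
lam = y / (y @ y)                  # adjoint eigenvector with (y, lam) = 1
print(lam @ (A1 @ y))              # mu'(p) from the sensitivity formula
print((largest_eig(p + dp)[0] - largest_eig(p - dp)[0]) / (2 * dp))  # finite difference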

Next, consider the parametric constrained optimization: given p ∈ Q,

min_{x,u} F(x, u, p)  subject to  E(x, u, p) = 0.   (11.4)

Let (x̄, ū) be a minimizer for the given p̄ ∈ Q. We assume
(H3) λ ∈ Y satisfies the adjoint equation

E_x(x̄, ū, p̄)*λ + F_x(x̄, ū, p̄) = 0.

(H4) Condition (11.3) holds.
(H5) In a neighborhood of (x(p̄), p̄),

(x, p) ∈ C × Q → E_p(x, p) ∈ Y is continuous.

Theorem (Sensitivity III) Under assumptions (H1)–(H5) the necessary optimality condition for (11.4) is given by

E_x*λ + F_x(x, u, p) = 0
E_u*λ + F_u(x, u, p) = 0
E(x, u, p) = 0

and

V′(p̄)p̃ = F_p(x̄, ū, p̄)p̃ + (λ, E_p(x̄, ū, p̄)p̃).

Proof:

F(x(p), p) − F(x(p̄), p̄) = F(x(p), p) − F(x(p̄), p) + F(x(p̄), p) − F(x(p̄), p̄)
 ≥ F_x(x(p̄), p)(x(p) − x(p̄)) + F(x(p̄), p) − F(x(p̄), p̄)

F(x(p), p) − F(x(p̄), p̄) = F(x(p), p) − F(x(p), p̄) + F(x(p), p̄) − F(x(p̄), p̄)
 ≤ F_x(x(p̄), p̄)(x(p) − x(p̄)) + F(x(p), p) − F(x(p), p̄).

Note that

E(x(p), p) − E(x(p̄), p) = E_x(x(p̄), p)(x(p) − x(p̄)) + o(|x(p) − x(p̄)|)

and

(λ, E_x(x̄, p)(x(p) − x(p̄))) + F_x(x̄, p)(x(p) − x(p̄)) = 0.

Combining the above inequalities and letting t → 0, we have

12 Augmented Lagrangian Method

In this section we discuss the augmented Lagrangian method for the constrained minimization;

min F(x)  subject to  E(x) = 0,  G(x) ≤ 0   (12.1)

over x ∈ C. For the equality constraint case we consider the augmented Lagrangian functional

L_c(x, λ) = F(x) + (λ, E(x)) + c/2 |E(x)|²

for c > 0. The augmented Lagrangian method is an iterative method with the update

x_{n+1} = argmin_{x∈C} L_c(x, λ_n)
λ_{n+1} = λ_n + c E(x_n).

It is a combination of the multiplier method and the penalty method. That is, for c = 0 with λ_{n+1} = λ_n + α E(x_n) and stepsize α > 0 it reduces to the multiplier method, and for λ_n = 0 it reduces to the penalty method. The multiplier method converges (locally) if F is (locally) uniformly convex. The penalty method requires c → ∞ and may suffer if c > 0 is too large.


It can be shown that the augmented Lagrangian method is locally convergent provided that L″_c(x, λ) is uniformly positive near the solution pair (x̄, λ̄). Since

L″(x, λ) = F″(x) + (λ, E″(x)) + c E′(x)*E′(x),

the augmented Lagrangian method has an enlarged convergence class compared to the multiplier method. Unlike the penalty method it is not necessary to let c → ∞, and the Lagrange multiplier update speeds up the convergence.

For the inequality constraint case G(x) ≤ 0 we consider the equivalent formulation:

min F(x) + c/2 |G(x) − z|² + (µ, G(x) − z)  over (x, z) ∈ C × Z  subject to  z ≤ 0.

Minimizing this over z ≤ 0, we find that

z* = min(0, (µ + c G(x))/c)

attains the minimum, and thus we obtain

min F(x) + 1/2c (|max(0, µ + c G(x))|² − |µ|²)  over x ∈ C.

For (12.1), given (λ, µ) and c > 0, define the augmented Lagrangian functional

L_c(x, λ, µ) = F(x) + (λ, E(x)) + c/2 |E(x)|² + 1/2c (|max(0, µ + c G(x))|² − |µ|²).   (12.2)

The first order augmented Lagrangian method is a sequential minimization of Lc(x, λ, µ)over x ∈ C;

Algorithm (First order Augmented Lagrangian method)

1. Initialize (λ0, µ0)

2. Let xn be a solution to

min Lc(x, λn, µn) over x ∈ C.

3. Update the Lagrange multipliers

λn+1 = λn + cE(xn), µn+1 = max(0, µn + cG(xn)). (12.3)
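A minimal Python sketch of this first order augmented Lagrangian loop for a small quadratic model problem min 1/2|x|² with E(x) = Ex − b and G(x) = Gx; the linear data and the inexact inner gradient solver are illustrative choices only.

import numpy as np

def grad_Lc(x, lam, mu, E, G, b, c):
    # gradient of L_c(x, lam, mu) for F(x) = 1/2|x|^2, E(x) = Ex - b, G(x) = Gx
    return (x + E.T @ (lam + c * (E @ x - b))
              + G.T @ np.maximum(0.0, mu + c * (G @ x)))

def augmented_lagrangian(E, G, b, c=10.0, outer=30, inner=300, step=2e-2):
    x = np.zeros(E.shape[1])
    lam, mu = np.zeros(E.shape[0]), np.zeros(G.shape[0])
    for _ in range(outer):
        for _ in range(inner):                       # step 2 (inexact minimization)
            x -= step * grad_Lc(x, lam, mu, E, G, b, c)
        lam = lam + c * (E @ x - b)                  # step 3, update (12.3)
        mu = np.maximum(0.0, mu + c * (G @ x))
    return x, lam, mu

E = np.ones((1, 3)); b = np.array([1.0]); G = -np.eye(3)   # sum(x) = 1, x >= 0
print(augmented_lagrangian(E, G, b)[0])                    # approx [1/3, 1/3, 1/3]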

Remark (1) Define the value function

Φ(λ, µ) = min_{x∈C} L_c(x, λ, µ).

Then one can prove that

Φ_λ = E(x),  Φ_µ = max(0, µ + c G(x)).

(2) The Lagrange multiplier update (12.3) is a gradient method for maximizing Φ(λ, µ).


(3) For the equality constraint case

L_c(x, λ) = F(x) + (λ, E(x)) + c/2 |E(x)|²

and

L′_c(x, λ) = L′_0(x, λ + cE(x)).

The necessary optimality condition for

max_λ min_x L_c(x, λ)

is given by

L′_c(x, λ) = L′_0(x, λ + cE(x)) = 0,  E(x) = 0.   (12.4)

If we apply the Newton method to system (12.4), we obtain

[ L″_0(x, λ + cE(x)) + cE′(x)*E′(x)   E′(x)* ; E′(x)  0 ] (x⁺ − x, λ⁺ − λ)^t = −( L′_0(x, λ + cE(x)) ; E(x) ).

The term c > 0 adds coercivity to the Hessian L″_0.

12.1 Primal-Dual Active set method

For the inequality constraint G(x) ≤ 0 (e.g., Gx − c ≤ 0);

L_c(x, µ) = F(x) + 1/2c (|max(0, µ + c G(x))|² − |µ|²)

and the necessary optimality condition is

F′(x) + G′(x)*µ = 0,  µ = max(0, µ + c G(x)),

where c > 0 is arbitrary. Based on the complementarity condition we have the primal-dual active set method.

Primal Dual Active Set method
For the affine case G(x) = Gx − c we have the Primal Dual Active Set method as:

1. Define the active index set and inactive index set by

A = {k : (µ + c (Gx − c))_k > 0},  I = {j : (µ + c (Gx − c))_j ≤ 0}.

2. Let µ⁺ = 0 on I and Gx⁺ = c on A.

3. Solve for (x⁺, µ⁺_A)

F″(x)(x⁺ − x) + F′(x) + G_A*µ⁺ = 0,  G_A x⁺ − c = 0.

4. Stop or set n = n + 1 and return to step 1,


where G_A = {G_k, k ∈ A}. For the nonlinear G,

G′(x)(x⁺ − x) + G(x) = 0  on A = {k : (µ + c G(x))_k > 0}.

The primal dual active set method is a semismooth Newton method. That is, we define a generalized derivative of s → max(0, s) by 0 on (−∞, 0] and 1 on (0, ∞). Note that s → max(0, s) is not differentiable at s = 0 and we define the derivative at s = 0 as the limit of the derivatives for s < 0. So we select the generalized derivative of max(0, s) as 0 for s ≤ 0 and 1 for s > 0, and thus the generalized Newton update reads

(µ⁺ − µ) + µ = 0, i.e. µ⁺ = 0, if µ + c (Gx − c) ≤ 0
c G(x⁺ − x) + c (Gx − c) = c (Gx⁺ − c) = 0 if µ + c (Gx − c) > 0,

which results in the active set strategy

µ⁺_j = 0, j ∈ I,  and  (Gx⁺ − c)_k = 0, k ∈ A.
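A Python sketch of the primal dual active set iteration for the quadratic case F(x) = 1/2(Ax, x) − (b, x) with the affine constraint Gx ≤ c; the data are placeholders and the iteration stops once the active set no longer changes.

import numpy as np

def pdas(A, b, G, c, cpar=1.0, maxit=50):
    n, m = A.shape[0], G.shape[0]
    x, mu = np.linalg.solve(A, b), np.zeros(m)
    for _ in range(maxit):
        act = (mu + cpar * (G @ x - c)) > 0          # active set A
        Ga = G[act]
        k = Ga.shape[0]
        K = np.block([[A, Ga.T], [Ga, np.zeros((k, k))]])
        sol = np.linalg.solve(K, np.concatenate([b, c[act]]))
        x_new = sol[:n]
        mu_new = np.zeros(m); mu_new[act] = sol[n:]
        if np.array_equal((mu_new + cpar * (G @ x_new - c)) > 0, act):
            return x_new, mu_new                     # active set has settled
        x, mu = x_new, mu_new
    return x, mu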

We will introduce the class of semismooth functions and the semismooth Newtonmethod in Chapter 7 in function spaces. Specifically, the pointwise (coordinate) oper-ation s→ max(0, s) defines a semismooth function from Lp(Ω)→ Lq(Ω), p > q.

13 Lagrange multiplier Theory for Nonsmooth Convex Optimization

In this section we present the Lagrange multiplier theory for nonsmooth optimization of the form

min f(x) + ϕ(Λx) over x ∈ C   (13.1)

where X is a Banach space, H is a Hilbert space lattice, f : X → R is C¹, Λ ∈ L(X, H) and C is a closed convex set in X. The nonsmoothness is represented by ϕ, a proper, lower semicontinuous convex functional on H.

This problem class encompasses a wide variety of optimization problems including variational inequalities of the first and second kind [?, ?]. We will specialize the theory to our specific examples. Moreover, the exact penalty formulation of the constrained problem is

min f(x) + c |E(x)|_{L¹} + c |(G(x))⁺|_{L¹}   (13.2)

where (G(x))⁺ = max(0, G(x)).
We describe the Lagrange multiplier theory to deal with the non-smoothness of ϕ. Note that (13.1) is equivalent to

min f(x) + ϕ(Λx − u)
subject to x ∈ C and u = 0 in H.   (13.3)

Treating the equality constraint u = 0 in (13.3) by the augmented Lagrangian method results in the minimization problem

min_{x∈C, u∈H} f(x) + ϕ(Λx − u) + (λ, u)_H + c/2 |u|²_H,   (13.4)


where λ ∈ H is a multiplier and c is a positive scalar penalty parameter. Carrying out the minimization over u ∈ H by (13.6), we have

min_{x∈C} L_c(x, λ) with L_c(x, λ) = f(x) + ϕ_c(Λx, λ),   (13.5)

where

ϕ_c(u, λ) = inf_{v∈H} {ϕ(u − v) + (λ, v)_H + c/2 |v|²_H}
          = inf_{z∈H} {ϕ(z) + (λ, u − z)_H + c/2 |u − z|²_H}   (u − v = z).   (13.6)

13.1 Yosida-Moreau approximations and Monotone Operators

For u, λ ∈ H and c > 0, ϕ_c(u, λ) is called the generalized Yosida-Moreau approximation of ϕ. It can be shown that u → ϕ_c(u, λ) is continuously Frechet differentiable with Lipschitz continuous derivative. Then the augmented Lagrangian functional [?, ?, ?] of (13.1) is given by

L_c(x, λ) = f(x) + ϕ_c(Λx, λ).   (13.7)

Let ϕ* be the convex conjugate of ϕ. Then

ϕ_c(u, λ) = inf_{z∈H} { sup_{y∈H} [(y, z) − ϕ*(y)] + (λ, u − z)_H + c/2 |u − z|²_H }
          = sup_{y∈H} { −1/2c |y − λ|² + (y, u) − ϕ*(y) },   (13.8)

where we used

inf_{z∈H} {(y, z)_H + (λ, u − z)_H + c/2 |u − z|²_H} = (y, u) − 1/2c |y − λ|²_H.

Thus, from Theorem 2.2,

ϕ′_c(u, λ) = y_c(u, λ) = argmax_{y∈H} {−1/2c |y − λ|² + (y, u) − ϕ*(y)} = λ + c v_c(u, λ),   (13.9)

where

v_c(u, λ) = argmin_{v∈H} {ϕ(u − v) + (λ, v) + c/2 |v|²} = ∂/∂λ ϕ_c(u, λ).

Moreover, if

p_c(λ) = argmin_{p∈H} {1/2c |p − λ|² + ϕ*(p)},   (13.10)

then (13.9) is equivalent to

ϕ′_c(u, λ) = p_c(λ + c u).   (13.11)

Theorem (Lipschitz complementarity)
(1) If λ ∈ ∂ϕ(x) for x, λ ∈ H, then λ = ϕ′_c(x, λ) for all c > 0.
(2) Conversely, if λ = ϕ′_c(x, λ) for some c > 0, then λ ∈ ∂ϕ(x).

Proof: If λ ∈ ∂ϕ(x), then from (13.6) and (13.8)

ϕ(x) ≥ ϕ_c(x, λ) ≥ ⟨λ, x⟩ − ϕ*(λ) = ϕ(x).

Thus λ ∈ H attains the supremum of (13.9) and we have λ = ϕ′_c(x, λ). Conversely, if λ ∈ H satisfies λ = ϕ′_c(x, λ) for some c > 0, then v_c(x, λ) = 0 by (13.9). Hence it follows that

ϕ(x) = ϕ_c(x, λ) = (λ, x) − ϕ*(λ),

which implies λ ∈ ∂ϕ(x).

If x_c ∈ C denotes the solution to (13.5), then it satisfies

⟨f′(x_c) + Λ*λ_c, x − x_c⟩_{X*,X} ≥ 0 for all x ∈ C,
λ_c = ϕ′_c(Λx_c, λ).   (13.12)

Under appropriate conditions [?, ?, ?] the pair (x_c, λ_c) ∈ C × H has a (strong-weak) cluster point (x̄, λ̄) as c → ∞ such that x̄ ∈ C is the minimizer of (13.1) and λ̄ ∈ H is a Lagrange multiplier in the sense that

⟨f′(x̄) + Λ*λ̄, x − x̄⟩_{X*,X} ≥ 0 for all x ∈ C,   (13.13)

with the complementarity condition

λ̄ = ϕ′_c(Λx̄, λ̄) for each c > 0.   (13.14)

System (13.13)–(13.14) is for the primal-dual variable (x̄, λ̄). The advantage here is that the frequently employed differential inclusion λ̄ ∈ ∂ϕ(Λx̄) is replaced by the equivalent nonlinear equation (13.14).

The first order augmented Lagrangian method for (13.1) is given by:

First order augmented Lagrangian method

• Select λ_0 and set n = 0.
• Let x_{n+1} = argmin_{x∈C} L_c(x, λ_n).
• Update the Lagrange multiplier by λ_{n+1} = ϕ′_c(Λx_{n+1}, λ_n).
• Stop or set n = n + 1 and return to step 1.

The step min_{x∈C} L_c(x, λ_n) can be split coordinate-wise as follows.

Coordinate splitting method

• Select λ_0 and set n = 0.
• z_{n+1} = argmin_z {ϕ(z) + (λ_n, Λx_n − z)_H + c/2 |Λx_n − z|²_H}.
• x_{n+1} = argmin_{x∈C} {f(x) + (λ_n, Λx − z_{n+1})_H + c/2 |Λx − z_{n+1}|²_H}.
• Update the Lagrange multiplier by λ_{n+1} = p_c(λ_n + c Λx_{n+1}).
• Stop or set n = n + 1 and return to step 1.
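When ϕ is the L¹ norm, the z-step of the coordinate splitting method has a closed form: argmin_z {|z|₁ + (λ_n, Λx_n − z) + c/2 |Λx_n − z|²} is the componentwise soft-thresholding of Λx_n + λ_n/c with threshold 1/c (a standard shrinkage formula). A short Python sketch, where Lx stands for Λx_n:

import numpy as np

def z_update(Lx, lam, c):
    w = Lx + lam / c
    return np.sign(w) * np.maximum(np.abs(w) - 1.0 / c, 0.0)   # soft threshold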

In many applications, the convex conjugate functional ϕ* of ϕ is given by

ϕ*(v) = I_{K*}(v),

where K* is a closed convex set in H and I_S is the indicator function of a set S. From (13.8),

ϕ_c(u, λ) = sup_{y∈K*} {−1/2c |y − λ|² + (y, u)},

and it follows from (13.10)–(13.11) that (13.14) is equivalent to

λ̄ = Proj_{K*}(λ̄ + c Λx̄),   (13.15)

which is the basis of our approach.

13.2 Examples

In this section we discuss applications of the Lagrange multiplier theorem and the complementarity condition.

Example (Inequality) Let H = L²(Ω) and ϕ = I_C with C = {z ∈ L²(Ω) : z ≤ 0 a.e.}, i.e.,

min f(x) over {x : Λx − c ≤ 0}.   (13.16)

In this case

ϕ*(v) = I_{K*}(v) with K* = −C

and the complementarity (15.26) implies that

f′(x) + Λ*λ = 0
λ = max(0, λ + c (Λx − c)).

Example (L¹(Ω) optimization) Let H = L²(Ω) and ϕ(v) = ∫_Ω |v| dx, i.e.

min f(y) + ∫_Ω |Λy|₂ dx.   (13.17)

In this case

ϕ*(v) = I_{K*}(v) with K* = {z ∈ L²(Ω) : |z|₂ ≤ 1 a.e.}

and the complementarity (15.26) implies that

f′(x) + Λ*λ = 0
λ = (λ + cΛx) / max(1, |λ + cΛx|₂)  a.e.

Examples (Constrained optimization) Let X be a Hilbert space and consider

min f(x) subject to Λx ∈ C   (13.18)

where f : X → R is C¹ and C is a closed convex set in X. In this case we let f = J, Λ be the natural injection and ϕ = I_C. Then

ϕ_c(u, λ) = inf_{z∈C} {(λ, u − z)_H + c/2 |u − z|²_H},   (13.19)

and

z* = Proj_C(u + λ/c)   (13.20)

attains the minimum in (13.19) and

ϕ′_c(u, λ) = λ + c (u − z*).   (13.21)

If H = L²(Ω) and C is the bilateral constraint {y ∈ X : φ ≤ Λy ≤ ψ}, the complementarity (13.20)–(13.21) implies that for c > 0

f′(x*) + Λ*µ = 0
µ = max(0, µ + c (Λx* − ψ)) + min(0, µ + c (Λx* − φ))  a.e.   (13.22)

If either φ = −∞ or ψ = ∞, the constraint is unilateral and defines the generalized obstacle problem. The cost functional J(y) can represent the performance index for the optimal control and design problem, the fidelity of data-to-fit for the inverse problem, and the deformation and restoring energy for the variational problem.

13.3 Monotone Operators and Yosida-Moreau approximations

13.4 Lagrange multiplier Theory

In this section we introduce the generalized Lagrange multiplier theory. We establish the conditions for the existence of the Lagrange multiplier λ and derive the optimality condition (1.7)–(1.8). In Section 1.5 it is shown that both the Uzawa and the augmented Lagrangian methods are fixed point methods for the complementarity condition (??) and the convergence analysis is presented. In Section 1.6 we present concrete examples and demonstrate the applicability of the results in Sections 1.4–1.5.

14 Semismooth Newton method

In this section we present the semismooth Newton method for nonsmooth equations in Banach spaces, especially for necessary optimality conditions such as the complementarity condition (13.12):

λ = ϕ′_c(Λx, λ).

Consider the nonlinear equation F(y) = 0 in a Banach space X. The generalized Newton update is given by

y^{k+1} = y^k − V_k^{-1} F(y^k),   (14.1)

where V_k is a generalized derivative of F at y^k. In the finite dimensional space, for a locally Lipschitz continuous function F let D_F denote the set of points at which F is differentiable. For x ∈ X = Rⁿ we define the Bouligand derivative ∂_B F(x) as

∂_B F(x) = { J : J = lim_{x_i→x, x_i∈D_F} ∇F(x_i) }.   (14.2)

Here D_F is dense by Rademacher's theorem, which states that every locally Lipschitz continuous function on a finite dimensional space is differentiable almost everywhere. Thus we take V_k ∈ ∂_B F(y^k).


In infinite dimensional spaces, notions of generalized derivatives for functions which are not C¹ cannot rely on Rademacher's theorem. Here, instead, we shall mainly utilize a concept of generalized derivative that is sufficient to guarantee superlinear convergence of Newton's method [?]. This notion of differentiability is called Newton derivative and is defined below. We refer to [?, ?, ?, ?] for further discussion of these notions and topics. Let X, Z be real Banach spaces and let D ⊂ X be an open set.

Definition (Newton differentiable) (1) F : D ⊂ X → Z is called Newton differentiable at x if there exist an open neighborhood N(x) ⊂ D and mappings G : N(x) → L(X, Z) such that

lim_{|h|→0} |F(x + h) − F(x) − G(x + h)h|_Z / |h|_X = 0.

The family {G(y) : y ∈ N(x)} is called an N-derivative of F at x.
(2) F is called semismooth at y if it is Newton differentiable at y and

lim_{t→0+} G(y + t h)h exists uniformly in |h| = 1.

Semismoothness was originally introduced in [?] for scalar-valued functions. Convex functions and real-valued C¹ functions are examples of such semismooth functions [?, ?] in the finite dimensional space.

For example, if F(y)(s) = ψ(y(s)) pointwise, then G(y)(s) ∈ ∂_B ψ(y(s)) is an N-derivative in L^p(Ω) → L^q(Ω), p > q, under appropriate conditions [?]. We often use ψ(s) = |s| and max(0, s) for the necessary optimality condition.

Suppose F(y*) = 0. Then y* = y^k + (y* − y^k) and

|y^{k+1} − y*| = |V_k^{-1}(F(y*) − F(y^k) − V_k(y* − y^k))| ≤ |V_k^{-1}| o(|y^k − y*|).

Thus the semismooth Newton method is q-superlinearly convergent provided that the Jacobian sequence V_k is uniformly invertible as y^k → y*. That is, if one can select a sequence of quasi-Jacobians V_k that is consistent, i.e., |V_k − G(y*)| → 0 as y^k → y*, and invertible, then (14.1) is still q-superlinearly convergent.
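A small Python illustration of the semismooth Newton iteration (14.1): it solves the nonsmooth equation F(y) = Ay + β max(0, y − ψ) − b = 0 (a penalized obstacle-type model), taking the generalized derivative of max(0, ·) to be 0 for s ≤ 0 and 1 for s > 0; the data A, b, ψ, β are made-up.

import numpy as np

def semismooth_newton(A, b, psi, beta=1e3, maxit=30, tol=1e-12):
    y = np.linalg.solve(A, b)
    for _ in range(maxit):
        F = A @ y + beta * np.maximum(0.0, y - psi) - b
        if np.linalg.norm(F) < tol:
            break
        D = np.diag(beta * (y - psi > 0).astype(float))   # N-derivative of the max term
        y = y - np.linalg.solve(A + D, F)                  # V_k = A + D
    return y

A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
print(semismooth_newton(A, np.ones(3), 0.5 * np.ones(3)))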

14.1 Bilateral constraint

Consider the bilateral constraint problem

min f(x) subject to φ ≤ Λx ≤ ψ. (14.3)

We have the optimality condition (13.22):

f′(x) + Λ*µ = 0,

µ = max(0, µ + c (Λx − ψ)) + min(0, µ + c (Λx − φ))  a.e.

Since ∂_B max(0, s) = {0, 1} and ∂_B min(0, s) = {−1, 0} at s = 0,

the semismooth Newton method for the bilateral constraint (14.3) is of the form of the primal-dual active set method:

Primal-dual active set method


1. Initialize y^0 ∈ X and µ^0 ∈ H. Set k = 0.

2. Set the active set A = A^+ ∪ A^− and the inactive set I by

A^+ = {x ∈ Ω : µ^k + c (Λy^k − ψ) > 0},  A^− = {x ∈ Ω : µ^k + c (Λy^k − φ) < 0},  I = Ω \ A.

3. Solve for (y^{k+1}, µ^{k+1}):

J″(y^k)(y^{k+1} − y^k) + J′(y^k) + Λ*µ^{k+1} = 0,

Λy^{k+1}(x) = ψ(x), x ∈ A^+,  Λy^{k+1}(x) = φ(x), x ∈ A^−,  µ^{k+1}(x) = 0, x ∈ I.

4. Stop, or set k = k + 1 and return to the second step.

Thus the algorithm involves solving a linear system of the form

[ A         Λ*_{A+}   Λ*_{A−} ] [ y       ]   [ f ]
[ Λ_{A+}    0         0       ] [ µ_{A+}  ] = [ ψ ]
[ Λ_{A−}    0         0       ] [ µ_{A−}  ]   [ φ ].

In order to gain stability, the following one-parameter family of regularizations can be used [?, ?]:

µ = α (max(0, µ + c (Λy − ψ)) + min(0, µ + c (Λy − φ))),  α → 1^+.
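The following Python sketch illustrates the primal-dual active set loop above for the simple quadratic model J(y) = (1/2) y^T A y − f^T y with Λ = I; it is a hedged illustration under these assumptions, not the author's implementation, and the stopping test simply checks that the active sets have settled.

import numpy as np

def pdas_bilateral(A, f, phi, psi, c=1.0, max_iter=50):
    n = len(f)
    y, mu = np.zeros(n), np.zeros(n)
    for k in range(max_iter):
        Aplus = mu + c * (y - psi) > 0         # active set A+
        Aminus = mu + c * (y - phi) < 0        # active set A-
        inactive = ~(Aplus | Aminus)
        # Newton system: A y + mu = f, y = psi on A+, y = phi on A-, mu = 0 on I
        y_new = np.empty(n)
        y_new[Aplus], y_new[Aminus] = psi[Aplus], phi[Aminus]
        if inactive.any():
            K = A[np.ix_(inactive, inactive)]
            rhs = f[inactive] - A[np.ix_(inactive, ~inactive)] @ y_new[~inactive]
            y_new[inactive] = np.linalg.solve(K, rhs)
        mu_new = f - A @ y_new
        mu_new[inactive] = 0.0
        if np.array_equal(Aplus, mu_new + c * (y_new - psi) > 0) and \
           np.array_equal(Aminus, mu_new + c * (y_new - phi) < 0):
            return y_new, mu_new               # active sets settled: stop
        y, mu = y_new, mu_new
    return y, mu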

14.2 L1 optimization

As an important example, we discuss the case ϕ(v) = ∫_Ω |v(x)|_2 dx (L1-type optimization). The complementarity condition is reduced to the form

λ max(1 + ε, |λ + c v|) = λ + c v    (14.4)

for ε ≥ 0. It is often convenient (but not essential) to use a very small ε > 0 to avoid the singularity in the implementation of the algorithm. Let

ϕ_ε(s) = { s²/(2ε) + ε/2,  if |s| ≤ ε;   |s|,  if |s| ≥ ε. }    (14.5)

Then, (14.4) corresponds to the regularized version of (4.31):

min J_ε(y) = F(y) + ϕ_ε(Λy).

The semismooth Newton method is given by

F′(y⁺) + Λ*λ⁺ = 0,

λ⁺ = v⁺/ε    if |λ + c v| ≤ 1 + ε,

|λ + c v| λ⁺ + (λ ((λ + c v)/|λ + c v|)^t)(λ⁺ + c v⁺) = λ⁺ + c v⁺ + |λ + c v| λ    if |λ + c v| > 1 + ε.    (14.6)


There is no guarantee that (14.6) is solvable for λ⁺ and stable. In order to obtain a compact and unconditionally stable formula we use the damped and regularized algorithm with β ≤ 1:

λ⁺ = v⁺/ε    if |λ + c v| ≤ 1 + ε,

|λ + c v| λ⁺ + β (λ/max(1, |λ|)) ((λ + c v)/|λ + c v|)^t (λ⁺ + c v⁺) = λ⁺ + c v⁺ + β |λ + c v| λ/max(1, |λ|)    if |λ + c v| > 1 + ε.    (14.7)

Here, the purpose of the regularization λ/max(1, |λ|) is to automatically constrain the dual variable λ into the unit ball. The damping factor β is automatically selected to achieve stability. Let

d = |λ + c v|,  η = d − 1,  a = λ/max(1, |λ|),  b = (λ + c v)/|λ + c v|,  F = a b^t.

Then, (14.7) is equivalent to

λ⁺ = (η I + β F)^{−1}((I − β F)(c v⁺) + β d a),

where by the Sherman–Morrison formula

(η I + β F)^{−1} = (1/η)(I − (β/(η + β a·b)) F).

Then,

(η I + β F)^{−1} β d a = (β d/(η + β a·b)) a.

Since F² = (a·b)F,

(η I + β F)^{−1}(I − β F) = (1/η)(I − (β d/(η + β a·b)) F).

In order to achieve stability, we let

β d/(η + β a·b) = 1,  i.e.,  β = (d − 1)/(d − a·b) ≤ 1.

Consequently, we obtain the compact Newton step

λ⁺ = (1/(d − 1))(I − F)(c v⁺) + λ/max(1, |λ|),    (14.8)

which results in:

Primal-dual active set method (L1-optimization)

1. Initialize: λ^0 = 0 and solve F′(y^0) = 0 for y^0. Set k = 0.

2. Set the active set A_k and the inactive set I_k by

A_k = {|λ^k + cΛy^k| > 1 + ε}  and  I_k = {|λ^k + cΛy^k| ≤ 1 + ε}.


3. Solve for (y^{k+1}, λ^{k+1}) ∈ X × H:

F′(y^{k+1}) + Λ*λ^{k+1} = 0,

λ^{k+1} = (1/(d_k − 1))(I − F^k)(cΛy^{k+1}) + λ^k/max(1, |λ^k|)  in A_k,  and

Λy^{k+1} = ε λ^{k+1}  in I_k.

4. Converged, or set k = k + 1 and return to Step 2.
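The pointwise content of Step 3 can be sketched as follows in Python; the routine evaluates the damping factor β and the compact step (14.8) at a single point for given (λ, v, v⁺), all of which are illustrative inputs, and it omits the coupled solve for y^{k+1}.

import numpy as np

def compact_newton_step(lam, v, v_plus, c=1.0, eps=1e-8):
    mu = lam + c * v
    d = np.linalg.norm(mu)
    if d <= 1.0 + eps:                        # inactive point: |lam + c v| <= 1 + eps
        return v_plus / eps, 1.0              # lam_plus = v_plus/eps, i.e. Lambda y_plus = eps*lam_plus
    a = lam / max(1.0, np.linalg.norm(lam))   # keeps the dual variable in the unit ball
    b = mu / d
    beta = (d - 1.0) / (d - a @ b)            # damping factor, beta <= 1
    F = np.outer(a, b)
    lam_plus = (np.eye(len(v)) - F) @ (c * v_plus) / (d - 1.0) + a   # compact step (14.8)
    return lam_plus, beta

# one pointwise update with illustrative data
lam = np.array([0.4, 0.2]); v = np.array([1.0, -0.5]); v_plus = np.array([0.9, -0.4])
print(compact_newton_step(lam, v, v_plus))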

This algorithm is unconditionally stable and is rapidly convergent for our test examples.

Remark (1) Note that

(λ⁺, v⁺) = (c/(d − 1))(|v⁺|² − (a · v⁺)(b · v⁺)) + a · v⁺,    (14.9)

which implies stability of the algorithm. In fact, since

F′(y^{k+1}) − F′(y^k) + Λ*(λ^{k+1} − λ^k) = 0,

λ^{k+1} − λ^k = (1/(d_k − 1))(I − F^k)(cΛy^{k+1}) + (1/max(1, |λ^k|) − 1) λ^k,

we have

(F′(y^{k+1}) − F′(y^k), y^{k+1} − y^k) + (c/(d_k − 1))((I − F^k)Λy^{k+1}, Λy^{k+1}) + (1/max(1, |λ^k|) − 1)(λ^k, Λy^{k+1}) = 0.

Suppose

F(y) − F(x) − (F′(y), y − x) ≥ ω |y − x|²_X;

then

F(y^k) + Σ_{j=1}^k ( ω |y^j − y^{j−1}|² + (1/max(1, |λ^{j−1}|) − 1)(λ^{j−1}, v^j) ) ≤ F(y^0),

where v^j = Λy^j. Note that (λ^k, v^{k+1}) > 0 implies that (λ^{k+1}, v^{k+1}) > 0, so that the point remains inactive at step k + 1, pointwise. This fact is a key step for proving global convergence of the algorithm.

(2) If a · b → 1, then β → 1. Suppose (λ, v) is a fixed point of (14.8); then

(1 − (1/max(1, |λ|))(1 − (c/(d − 1)) ((λ + c v)/|λ + c v|) · v)) λ = (c/(d − 1)) v.

Thus, the angle between λ and λ + c v is zero and

1 − 1/max(1, |λ|) = (c/(|λ| + c|v| − 1)) (|v|/|λ| − |v|/max(1, |λ|)),

which implies |λ| = 1. It follows that

λ = (λ + c v)/max(|λ + c v|, 1 + ε).


That is, if the algorithm converges, then a · b → 1 and |λ| → 1, so the limit is consistent with (14.4).

(3) Consider the substitution iterate:

F′(y⁺) + Λ*λ⁺ = 0,

λ⁺ = v⁺/max(ε, |v|),  v = Λy.    (14.10)

Note that

(v/|v|, v⁺ − v) = (|v⁺|² − |v|² − |v⁺ − v|²)/(2|v|) = |v⁺| − |v| − (|v⁺ − v|² − (|v⁺| − |v|)²)/(2|v|) ≤ |v⁺| − |v|.

Thus,

J_ε(v⁺) ≤ J_ε(v),

where equality holds only if v⁺ = v. This fact can be used to prove that the iterative method (14.10) is globally convergent [?, ?]. It also suggests the hybrid method: for 0 < µ < 1,

λ⁺ = (µ/max(ε, |v|)) v⁺ + (1 − µ)((1/(d − 1))(I − F)(c v⁺) + λ/max(1, |λ|)),

in order to gain the global convergence property without losing the fast convergence of the Newton method.
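As a small illustration of the substitution iterate (14.10), the following Python sketch treats the model F(y) = (1/2)|y − f|² with Λ = I componentwise; the data, ε, and the tolerance are made up, and the monotone decrease of J_ε noted above is what drives its convergence.

import numpy as np

def substitution_iterate(f, eps=1e-3, max_iter=100, tol=1e-10):
    y = f.copy()
    for k in range(max_iter):
        w = np.maximum(eps, np.abs(y))        # lagged weight max(eps, |v|)
        y_new = f / (1.0 + 1.0 / w)           # solves y_new - f + y_new / w = 0
        if np.max(np.abs(y_new - y)) < tol:
            break
        y = y_new
    return y

print(substitution_iterate(np.array([2.0, 0.3, -1.2, 0.05])))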

15 Non-convex nonsmooth optimization

In this section we consider a general class of non-convex, nonsmooth optimization problems. Let X be a Banach space continuously embedded in H = L²(Ω)^m. Let J : X → R⁺ be C1 and let N be a nonsmooth functional on H given by

N(y) = ∫_Ω h(y(ω)) dω,

where h : R^m → R is lower semi-continuous. Consider the minimization on X:

min J(x) + N(x)    (15.1)

subject to

x ∈ C = {x : x(ω) ∈ U a.e.},

where U is a closed convex set in R^m. For example, we consider h(u) = (α/2)|u|² + |u|_0, where |s|_0 = 1 for s ≠ 0 and |0|_0 = 0. For an integrable function u,

∫_Ω |u(ω)|_0 dω = meas({u ≠ 0}).

Thus, one can formulate the volume control for material design and the time scheduling problem for the optimal control problem as (15.1).


15.1 Existence of Minimizer

In this section we develop an existence method for a general class of minimizations

min G(x)    (15.2)

on a Banach space X, including (15.1). Here, the constrained problem over x ∈ C is formulated as G̃(x) = G(x) + I_C(x) using the indicator function of the constraint set C. Let X be a reflexive space, or the dual X = Y* of a normed space Y. Thus x_n ⇀ x means weak or weak-star convergence in X. To prove the existence of solutions to (15.2) we modify a general existence result given in [?].

Theorem (Existence) Let G be an extended real-valued functional on a Banach space X satisfying the following properties:

(i) G is proper and weakly lower semi-continuous, and G(x) + ε|x|^p is coercive for each ε > 0, for some p.

(ii) for any sequence x_n in X with |x_n| → ∞, x_n/|x_n| ⇀ x̄ weakly in X, and G(x_n) bounded from above, we have x_n/|x_n| → x̄ strongly in X and there exist ρ_n ∈ (0, |x_n|] and n_0 = n_0({x_n}) such that

G(x_n − ρ_n x̄) ≤ G(x_n),    (15.3)

for all n ≥ n_0.

Then min_{x∈X} G(x) admits a minimizer.

Proof: Let ε_n ⊂ (0, 1) be a sequence converging to 0 from above and consider the family of auxiliary problems

min_{x∈X} G(x) + ε_n |x|^p.    (15.4)

By (i), every minimizing sequence for (15.4) is bounded. Extracting a weakly convergent subsequence, the existence of a solution x_n ∈ X of (15.4) can be argued in a standard manner.

Suppose {x_n} is bounded. Then there exist a weakly convergent subsequence, denoted by the same symbols, and x* such that x_n ⇀ x*. Passing to the limit ε_n → 0⁺ in

G(x_n) + ε_n |x_n|^p ≤ G(x) + ε_n |x|^p  for all x ∈ X

and using again (i), we have

G(x*) ≤ G(x)  for all x ∈ X,

and thus x* is a minimizer of G.

We now argue that {x_n} is bounded, and assume to the contrary that lim_{n→∞} |x_n| = ∞. Then a weakly convergent subsequence can be extracted from x_n/|x_n| such that, again dropping indices, x_n/|x_n| ⇀ x̄ for some x̄ ∈ X. Since G(x_n) is bounded, from (ii)

x_n/|x_n| → x̄  strongly in X.    (15.5)

Next, choose ρ_n and n_0 according to (15.3); thus for all n ≥ n_0

G(x_n) + ε_n |x_n|^p ≤ G(x_n − ρ_n x̄) + ε_n |x_n − ρ_n x̄|^p ≤ G(x_n) + ε_n |x_n − ρ_n x̄|^p.


It follows that

|x_n| ≤ |x_n − ρ_n x̄| = |x_n − ρ_n x_n/|x_n| + ρ_n(x_n/|x_n| − x̄)| ≤ |x_n|(1 − ρ_n/|x_n|) + ρ_n |x_n/|x_n| − x̄|.

This implies that

1 ≤ |x_n/|x_n| − x̄|,

which gives a contradiction to (15.5) and concludes the proof.

The first half of condition (ii) is a compactness assumption and (15.3) is a compatibility condition [?].

Remark (1) Suppose G is not bounded below. Define the (sequential) recession functional G_∞ associated with G; it is the extended real-valued functional defined by

G_∞(x) = inf_{x_n ⇀ x, t_n → ∞} liminf_{n→∞} (1/t_n) G(t_n x_n),    (15.6)

and assume G_∞(x) ≥ 0. Then, G(x) + ε|x|² is coercive. If not, there exists a sequence x_n such that G(x_n) + ε|x_n|² is bounded by some M but |x_n| → ∞. Let x̄ = w-lim x_n/|x_n|. Then

(1/|x_n|) G(x_n) + ε|x_n| ≤ M/|x_n| → 0.

Since ε|x_n| → ∞ and G_∞(x̄) ≤ liminf_{n→∞} (1/|x_n|) G(|x_n| x_n/|x_n|), this yields a contradiction.

Moreover, in the proof note that

G_∞(x̄) ≤ liminf_{n→∞} (1/|x_n|) G(x_n) = liminf_{n→∞} (1/|x_n|) G(|x_n| x_n/|x_n|) ≤ 0.

If we assume G_∞ ≥ 0, then G_∞(x̄) = 0. Thus, (15.3) needs to hold only for directions x̄ with G_∞(x̄) = 0. In [?], (15.3) is assumed for all x ∈ X and all x̄ with G_∞(x̄) = 0, i.e.,

G(x − ρ x̄) ≤ G(x).

Thus, our condition is much weaker. In [?] we find a more general version of (15.3): there exist ρ_n and z_n such that ρ_n ∈ (0, |x_n|], z_n ∈ X,

G(x_n − ρ_n z_n) ≤ G(x_n),  |z_n − z| → 0  and  |z − x̄| < 1.

(2) We have also proved that a sequence x_n of minimizers of the regularized problems for ε_n has a subsequence converging weakly to a minimizer of G.

Example 1 If ker(G_∞) = {0}, then x̄ = 0, while the compactness assumption in (ii) implies |x̄| = 1, which is a contradiction. Thus, the theorem applies.

Example 2 (ℓ0 optimization) Let X = ℓ² and consider the ℓ0 minimization:

G(x) = (1/2)|Ax − b|_2² + β |x|_0,    (15.7)

where

|x|_0 = the number of nonzero elements of x ∈ ℓ²,


the counting measure of x. Assume N(A) is finite dimensional and R(A) is closed. Then one can apply the Theorem to prove the existence of solutions to (15.7). That is, suppose |x_n| → ∞, G(x_n) is bounded from above and x_n/|x_n| ⇀ z. First, we show that z ∈ N(A) and x_n/|x_n| → z. Since G(x_n) is bounded from above, there exists M such that

J(x_n) = (1/2)|Ax_n − b|_2² + β |x_n|_0 ≤ M  for all n.    (15.8)

Since R(A) is closed, by the closed range theorem X = R(A*) ⊕ N(A); let x_n = x_n^1 + x_n^2 with x_n^1 ∈ R(A*) and x_n^2 ∈ N(A). Consequently, 0 ≤ |Ax_n^1|² − 2(b, Ax_n^1)_2 + |b|_2² ≤ K and |Ax_n^1| is bounded. Dividing by |x_n|_2²,

0 ≤ |A(x_n^1/|x_n|_2)|_2² − 2(1/|x_n|_2)(A*b, x_n/|x_n|_2)_2 + |b|_2²/|x_n|_2² → 0.    (15.9)

By the closed range theorem this implies that x_n^1/|x_n|_2 → x^1 = 0 in ℓ². Since x_n^2/|x_n|_2 ⇀ x^2 and by assumption dim N(A) < ∞, it follows that x_n/|x_n|_2 → z = x^2 strongly in ℓ².

Next, (15.3) holds. Since

|x_n|_0 = |x_n/|x_n||_0

and thus |z|_0 < ∞, (15.3) is equivalent to showing that

|(x_n^1 + x_n^2 − ρ z)_i|_0 ≤ |(x_n^1 + x_n^2)_i|_0  for all i,

and for all n sufficiently large. Only the coordinates for which (x_n^1 + x_n^2)_i = 0 with z_i ≠ 0 require our attention. Since |z|_0 < ∞ there exists ī such that z_i = 0 for all i > ī. For i ∈ {1, ..., ī} we define I_i = {n : (x_n^1 + x_n^2)_i = 0, z_i ≠ 0}. These sets are finite. In fact, if I_i is infinite for some i ∈ {1, ..., ī}, then lim_{n→∞, n∈I_i} (1/|x_n|_2)(x_n^1 + x_n^2)_i = 0. Since lim_{n→∞} (1/|x_n|_2)(x_n^1)_i = 0, this implies that z_i = 0, which is a contradiction.

Example 3 (Obstacle problem) Consider

min G(u) = ∫_0^1 ((1/2)|u_x|² − fu) dx    (15.10)

subject to u(1/2) ≤ 1. Let X = H¹(0, 1). If |u_n| → ∞, let v_n = u_n/|u_n| ⇀ v. By the assumption in (ii)

(1/2)∫_0^1 |(v_n)_x|² dx ≤ (1/|u_n|)∫_0^1 f v_n dx + G(u_n)/|u_n|² → 0,

and thus (v_n)_x → 0 and v_n → v = c for some constant c. But, since v_n(1/2) ≤ 1/|u_n| → 0, c = v(1/2) ≤ 0. Thus,

G(u_n − ρ v) = G(u_n) + ∫_0^1 ρ c f(x) dx ≤ G(u_n)

if ∫_0^1 f(x) dx ≥ 0. That is, if ∫_0^1 f(x) dx ≥ 0, then (15.10) has a minimizer.


Example 4 (Friction problem)

min ∫_0^1 ((1/2)|u_x|² − fu) dx + |u(1)|    (15.11)

Using exactly the same argument as above, we have (v_n)_x → 0 and v_n → v = c. But we have

G(u_n − ρ_n v) = G(u_n) + ρ_n c ∫_0^1 f(x) dx + |u_n(1) − ρ_n c| − |u_n(1)|,

where

(|u_n(1) − ρ_n c| − |u_n(1)|)/|u_n| = |v_n(1) − (ρ_n/|u_n|)c| − |v_n(1)| ≤ 0

for sufficiently large n, since

(|u_n(1) − ρ_n c| − |u_n(1)|)/|u_n| → |c − ρ̄ c| − |c| = −ρ̄ |c|,

where ρ̄ = lim_{n→∞} ρ_n/|u_n|. Thus, if |∫_0^1 f(x) dx| ≤ 1, then (15.11) has a minimizer.

Example 5 (L∞ Laplacian)

min ∫_0^1 |u − f| dx  subject to |u_x| ≤ 1.

Similarly, we have (v_n)_x → 0 and v_n → v = c, and

(G(u_n − ρ_n c) − G(u_n))/|u_n| = G(v_n − (ρ_n/|u_n|)c − f/|u_n|) − G(v_n − f/|u_n|) ≤ 0

for sufficiently large n.

Example 6 (Elastic contact problem) Let X = H¹(Ω)^d and let ~u be the deformation field. Given the boundary force ~g, consider the elastic contact problem:

min ∫_Ω (1/2) ε : σ dx − ∫_{Γ1} ~g · ~u ds_x + ∫_{Γ2} |τ · ~u| ds_x,  subject to n · ~u ≤ ψ,    (15.12)

where we assume the linear strain

ε(u) = (1/2)(∂u_i/∂x_j + ∂u_j/∂x_i)

and Hooke's law

σ = µ ε + λ tr(ε) I

with positive Lamé constants µ, λ. If |~u_n| → ∞ and G(~u_n) is bounded, we have

ε(v_n) → 0,  ∇ε(v_n) → 0

for ~v_n = ~u_n/|~u_n| ⇀ v. Thus, one can show that ~v_n → v = (v_1, v_2) = (−Ax_2 + C_1, Ax_1 + C_2) for the two dimensional problem. Consider the case when Ω = (0, 1)², Γ1 = {x_2 = 1} and Γ2 = {x_2 = 0}. As shown in the previous examples,

(G(~u_n − ρ v) − G(~u_n))/|~u_n| = ∫_{Γ1} ρ n·~g v_2 ds_x + ∫_{Γ1} ρ τ·~g v_1 ds_x + ∫_{Γ2} (|(v_n)_1 − ρ v_1| − |(v_n)_1|) ds_x.


Since v_2 ∈ C = {Ax_1 + C_2 ≤ 0} and, for v_1 = C_1,

|(v_n)_1 − ρ v_1| − |(v_n)_1| → |v_1 − ρ v_1| − |v_1| = −ρ|v_1|,

if we assume |∫_{Γ1} τ·~g ds_x| ≤ 1 and ∫_{Γ1} n·~g v_2 ds_x ≥ 0 for all v_2 ∈ C, then (15.12) has a minimizer.

Example 7

min ∫_0^1 ((1/2)(a(x)|u_x|² + |u|²) − fu) dx

with a(x) > 0 except a(1/2) = 0.

15.2 Necessary Optimality

Let X be a Banach space continuously embedded in H = L²(Ω)^m. Let J : X → R⁺ be C1 and let N be a nonsmooth functional on H = L²(Ω)^m given by

N(y) = ∫_Ω h(y(ω)) dω,

where h : R^m → R is a lower semi-continuous functional. Consider the minimization on H:

min J(y) + N(y)    (15.13)

subject to

y ∈ C = {y : y(ω) ∈ U a.e.},

where U is a closed convex set in R^m. Given y* ∈ C, define needle perturbations of (the optimal solution) y* by

y = { y* on |ω − s| > δ;  u on |ω − s| ≤ δ }

for δ > 0, s ∈ Ω, and u ∈ U. Assume that

|J(y) − J(y*) − (J′(y*), y − y*)| ∼ o(meas({|ω − s| < δ})).    (15.14)

Let H be the Hamiltonian defined by

H(y, λ) = (λ, y) − h(y),  y, λ ∈ R^m.

Then, we have the pointwise maximum principle.

Theorem (Pointwise Optimality) If an integrable function y* attains the minimum and (15.14) holds, then for a.e. ω ∈ Ω

J′(y*) + λ = 0,

H(u, λ(ω)) ≤ H(y*(ω), λ(ω))  for all u ∈ U.    (15.15)

Proof: Since

0 ≤ J(y) + N(y) − (J(y*) + N(y*))

= J(y) − J(y*) − (J′(y*), y − y*) + ∫_{|ω−s|<δ} ((J′(y*), u − y*) + h(u) − h(y*)) dω.


Since h is lower semi-continuous, it follows from (15.14) that (15.15) holds at a Lebesgue point s = ω of y*.

For λ ∈ R^m define the set-valued function

Φ(λ) = argmax_{y∈U} {(λ, y) − h(y)}.

Then, (15.15) is written as

y(ω) ∈ Φ(−J′(y)(ω)).

The graph of Φ is monotone:

(λ_1 − λ_2, y_1 − y_2) ≥ 0  for all y_1 ∈ Φ(λ_1), y_2 ∈ Φ(λ_2),

but it is not necessarily maximal monotone. Let h* be the conjugate function of h defined by

h*(λ) = max_{u∈U} H(u, λ) = max_{u∈U} {(λ, u) − h(u)}.

Then, h* is necessarily convex. If u ∈ Φ(λ), then for every λ̃

h*(λ̃) ≥ (λ̃, u) − h(u) = (λ̃ − λ, u) + h*(λ),

and thus u ∈ ∂h*(λ). Since h* is convex, ∂h* is maximal monotone, and thus ∂h* is the maximal monotone extension of Φ.

Let h** be the bi-conjugate function of h:

h**(u) = sup_λ {(u, λ) − sup_y H(y, λ)} = sup_λ {(u, λ) − h*(λ)},

whose epigraph is the convex envelope of the epigraph of h, i.e.,

epi(h**) = {λ y_1 + (1 − λ) y_2 : y_1, y_2 ∈ epi(h), λ ∈ [0, 1]}.

Then, h** is necessarily convex and is the convexification of h, and

λ ∈ ∂h**(u)  if and only if  u ∈ ∂h*(λ),

and

h*(λ) + h**(u) = (λ, u).

Consider the relaxed problem:

min J(y) + ∫_Ω h**(y(ω)) dω.    (15.16)

Since h** is convex,

N**(y) = ∫_Ω h**(y) dω

is weakly lower semi-continuous and thus there exists a minimizer ȳ of (15.16). It thus follows from Theorem (Pointwise Optimality) that the necessary optimality for (15.16) is given by

−J′(ȳ)(ω) ∈ ∂h**(ȳ(ω)),

or equivalently

ȳ ∈ ∂h*(−J′(ȳ)).    (15.17)


Lemma 2.2 Assume

J(y) − J(ȳ) − (J′(ȳ), y − ȳ) ≥ 0  for all y ∈ C,

where ȳ(ω) ∈ Φ(−J′(ȳ)(ω)). Then, ȳ minimizes the original cost functional (15.13).

Proof: Since

∫_Ω ((J′(ȳ), y − ȳ) + h(y) − h(ȳ)) dω ≥ 0,

we have

J(y) + ∫_Ω h(y) dω ≥ J(ȳ) + ∫_Ω h(ȳ) dω  for all y ∈ C.

One can construct a C1 realization of h* by

h*_ε(λ) = min_p {(ε/2)|p − λ|² + h*(p)}.

That is, h*_ε is C1, (h*_ε)′ is Lipschitz, and (h*_ε)′ is the Yosida approximation of ∂h*. If we let h_ε = (h*_ε)*, then h_ε is convex and

h_ε(u) = min_y {(1/(2ε))|y − u|² + h**(y)}.

Consider the relaxed problem

min J(y) + ∫_Ω h_ε(y(ω)) dω.    (15.18)

The necessary optimality condition becomes

y(ω) = (h*_ε)′(−J′(y)(ω)).    (15.19)

Define the proximal approximation by

p_ε(λ) = argmin_p {(ε/2)|p − λ|² + h*(p)}.

Then,

(h*_ε)′(λ) = ε (λ − p_ε(λ)).

Thus, one can develop the semismooth Newton method to solve (15.19).

Since h_ε ≤ h**, one can argue that a sequence of minimizers y_ε of (15.18) converges to a minimizer of (15.16). In fact,

J(y_ε) + ∫_Ω h_ε(y_ε) dω ≤ J(y) + ∫_Ω h_ε(y) dω

for all y ∈ C. Suppose |y_ε| is bounded; then y_ε has a subsequence converging weakly to ŷ ∈ C and

J(ŷ) + ∫_Ω h**(ŷ) dω ≤ J(y) + ∫_Ω h**(y) dω

for all y ∈ C.


Example (L0(Ω) optimization) Based on our analysis one can treat the case when h involves the volume control by L0(Ω). Let h on R be given by

h(u) = (α/2)|u|² + β|u|_0,    (15.20)

where

|u|_0 = 0 if u = 0,  |u|_0 = 1 if u ≠ 0.

That is,

∫_Ω |u|_0 dω = meas({u(ω) ≠ 0}).

In this case the Theorem applies with

Φ(q) := argmin_{u∈R} {h(u) − qu} = { q/α for |q| ≥ √(2αβ);  0 for |q| < √(2αβ) }    (15.21)

and the conjugate function h* of h:

−h*(q) = h(Φ(q)) − qΦ(q) = { −|q|²/(2α) + β for |q| ≥ √(2αβ);  0 for |q| < √(2αβ) }.

The bi-conjugate function h**(u) is given by

h**(u) = { (α/2)|u|² + β for |u| > √(2β/α);  √(2αβ)|u| for |u| ≤ √(2β/α) }.

Clearly, Φ : R → R is monotone, but it is not maximal monotone. The maximal monotone extension Φ̄ = (∂h**)^{−1} = ∂h* of Φ is given by

Φ̄(q) = { q/α for |q| > √(2αβ);  0 for |q| < √(2αβ);  [q/α, 0] for q = −√(2αβ);  [0, q/α] for q = √(2αβ) }.

Thus,

(h*_ε)′(λ) = (λ − p_ε(λ))/ε,

where

p_ε(λ) = argmin_p {(1/(2ε))|p − λ|² + h*(p)}.


We have

p_ε(λ) =
  λ,                 if |λ| < √(2αβ),
  √(2αβ),            if √(2αβ) ≤ λ ≤ (1 + ε/α)√(2αβ),
  −√(2αβ),           if √(2αβ) ≤ −λ ≤ (1 + ε/α)√(2αβ),
  λ/(1 + ε/α),       if |λ| ≥ (1 + ε/α)√(2αβ),

(h*_ε)′(λ) =
  0,                       if |λ| < √(2αβ),
  (λ − √(2αβ))/ε,          if √(2αβ) ≤ λ ≤ (1 + ε/α)√(2αβ),
  (λ + √(2αβ))/ε,          if √(2αβ) ≤ −λ ≤ (1 + ε/α)√(2αβ),
  λ/(α + ε),               if |λ| ≥ (1 + ε/α)√(2αβ).    (15.22)
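The maps in (15.22) are simple to evaluate; the following Python sketch (illustrative only, not from the monograph) implements p_ε and (h*_ε)′ for scalar arguments.

import math

def p_eps(lmbda, alpha, beta, eps):
    t = math.sqrt(2.0 * alpha * beta)          # threshold sqrt(2*alpha*beta)
    if abs(lmbda) < t:
        return lmbda
    if t <= lmbda <= (1.0 + eps / alpha) * t:
        return t
    if t <= -lmbda <= (1.0 + eps / alpha) * t:
        return -t
    return lmbda / (1.0 + eps / alpha)

def dh_star_eps(lmbda, alpha, beta, eps):
    # (h*_eps)'(lambda) = (lambda - p_eps(lambda)) / eps
    return (lmbda - p_eps(lmbda, alpha, beta, eps)) / eps

# the relaxed optimality condition (15.19) reads y = dh_star_eps(-J'(y));
# for |lambda| large this returns lambda/(alpha + eps), for small |lambda| it returns 0
print(dh_star_eps(3.0, alpha=1.0, beta=1.0, eps=0.5), dh_star_eps(0.5, 1.0, 1.0, 0.5))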

Example (Authority optimization) We analyze the authority optimization of two unknowns u_1 and u_2, i.e., at least one of u_1 and u_2 is zero. We formulate such a problem by using

h(u) = (α/2)(|u_1|² + |u_2|²) + β|u_1 u_2|_0.    (15.23)

Then, we have

h*(p_1, p_2) =
  (1/(2α))(|p_1|² + |p_2|²) − β,   if |p_1|, |p_2| ≥ √(2αβ),
  (1/(2α))|p_1|²,                  if |p_1| ≥ |p_2|, |p_2| ≤ √(2αβ),
  (1/(2α))|p_2|²,                  if |p_2| ≥ |p_1|, |p_1| ≤ √(2αβ).


Also, we have

p_ε(λ_1, λ_2) =
  (1/(1 + ε/α))(λ_1, λ_2),            if |λ_1|, |λ_2| ≥ (1 + ε/α)√(2αβ),
  ((1/(1 + ε/α))λ_1, ±√(2αβ)),        if |λ_1| ≥ (1 + ε/α)√(2αβ), √(2αβ) ≤ |λ_2| ≤ (1 + ε/α)√(2αβ),
  (±√(2αβ), (1/(1 + ε/α))λ_2),        if |λ_2| ≥ (1 + ε/α)√(2αβ), √(2αβ) ≤ |λ_1| ≤ (1 + ε/α)√(2αβ),
  (λ_2, λ_2),                          if |λ_2| ≤ |λ_1| ≤ (1 + ε/α)√(2αβ), √(2αβ) ≤ |λ_2| ≤ (1 + ε/α)√(2αβ),
  (λ_1, λ_1),                          if |λ_1| ≤ |λ_2| ≤ (1 + ε/α)√(2αβ), √(2αβ) ≤ |λ_1| ≤ (1 + ε/α)√(2αβ),
  (λ_1/(1 + ε/α), λ_2),                if |λ_1| ≥ (1 + ε/α)|λ_2|, |λ_2| ≤ √(2αβ),
  (λ_1, λ_2/(1 + ε/α)),                if |λ_2| ≥ (1 + ε/α)|λ_1|, |λ_1| ≤ √(2αβ),
  (λ_2, λ_2),                          if |λ_2| ≤ |λ_1| ≤ (1 + ε/α)|λ_2|, |λ_2| ≤ √(2αβ),
  (λ_1, λ_1),                          if |λ_1| ≤ |λ_2| ≤ (1 + ε/α)|λ_1|, |λ_1| ≤ √(2αβ).    (15.24)

15.3 Augmented Lagrangian method

Given λ ∈ H and c > 0, consider the augmented Lagrangian functional

L_c(x, y, λ) = J(x) + (λ, x − y)_H + (c/2)|x − y|²_H + N(y).

Since

L_c(x, y, λ) = J(x) + ∫_Ω ((c/2)|x − y|² + (λ, x − y) + h(y)) dω,

it follows from the Theorem that if (x, y) minimizes L_c(x, y, λ), then the necessary optimality condition is given by

−J′(x) = λ + c (x − y),

y ∈ Φ_c(λ + c x),

where

Φ_c(p) = argmax_y {(p, y) − (c/2)|y|² − h(y)}.

Thus, the augmented Lagrangian update is given by

λ^{n+1} = λ^n + c (x^n − y^n),

where (x^n, y^n) solves

−J′(x^n) = λ^n + c (x^n − y^n),

y^n ∈ Φ_c(λ^n + c x^n).
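The update can be organized as a simple loop. The following Python sketch applies it to the scalar quadratic model J(x) = (1/2)|x − f|² with h(u) = (α/2)|u|² + β|u|_0, for which Φ_c is an explicit hard-threshold map; the inner alternation approximating the coupled solve, the data, and all parameter values are illustrative assumptions, not the author's algorithm.

import numpy as np

def Phi_c(p, alpha, beta, c):
    # argmax_y { p*y - c/2 y^2 - h(y) } for h(y) = alpha/2 y^2 + beta |y|_0 (hard threshold)
    thresh = np.sqrt(2.0 * (alpha + c) * beta)
    return np.where(np.abs(p) >= thresh, p / (alpha + c), 0.0)

def augmented_lagrangian(f, alpha=1.0, beta=0.05, c=1.0, n_outer=30, n_inner=20):
    lam = np.zeros_like(f)
    x = f.copy()
    for n in range(n_outer):
        y = Phi_c(lam + c * x, alpha, beta, c)
        for _ in range(n_inner):          # approximate the coupled system -J'(x) = lam + c(x-y), y in Phi_c(lam+cx)
            x = (f - lam + c * y) / (1.0 + c)
            y = Phi_c(lam + c * x, alpha, beta, c)
        lam = lam + c * (x - y)           # multiplier update
    return x, y, lam

x, y, lam = augmented_lagrangian(np.array([1.0, 0.2, -0.8, 0.05]))
print(y)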


Let L*(λ) = inf_{(x,y)} L_c(x, y, λ). Since

L*(λ) = −sup_{(x,y)} {(x, −λ) + (y, λ) − J(x) − h(y) − (c/2)|x − y|²},

L*(λ) is concave and

y − x ∈ ∂L*(λ)  for (x, y) = argmin_{(x,y)} L_c(x, y, λ).

Let λ̄ be a maximizer of L*(λ) and let (x̄, ȳ) be the corresponding minimizer of L_c(·, ·, λ̄). Since

L*(λ) ≤ L*(λ̄) + (λ − λ̄, ȳ − x̄)

and

L*(λ) − L*(λ̄) ≤ 0,

we have ȳ − x̄ = 0, and thus we obtain the necessary optimality

−J′(x̄) = λ̄,

x̄ ∈ Φ_c(λ̄ + c x̄).

Note that

sup_p {−(1/(2c))|p − λ|² + (x, p) − h*(p)}

= sup_p {−sup_y {(y, p) − h(y)} − (1/(2c))|p − λ|² + (x, p)}

= inf_y {h(y) + (λ, x − y) + (c/2)|y − x|²} = ϕ_c(x, λ).

Let

L_c(x, y, λ) = J(x) + (λ, x − y)_H + (c/2)|y − x|²_H + h(y).

Then

L*(λ) = inf_x sup_p {J(x) − (1/(2c))|p − λ|² + (x, p) − h*(p)}.

If we define the proximal approximation by

p_c(λ) = argmin_p {(1/(2c))|p − λ|² + h*(p)},

then

(ϕ_c)′(x, λ) = p_c(λ + c x).    (15.25)


Indeed, since

−(1/(2c))|p − λ|² + (x, p) = −(1/(2c))|p − (λ + c x)|² + (1/(2c))(|λ + c x|² − |λ|²),

the supremum over p is attained at p = p_c(λ + c x).


Also, we have

(ϕ_c)′(λ) = (λ − p_c(λ))/c.    (15.27)

Thus, the maximal monotone extension Φ̄_c of the graph Φ_c is given by (ϕ_c)′(λ), the Yosida approximation of ∂h*(λ), and results in the update

−J′(x^n) = λ^n + c (x^n − y^n),

y^n = (ϕ_c)′(λ^n + c x^n).

In the case c = 0, we have the multiplier method

λ^{n+1} = λ^n + α (x^n − y^n),  α > 0,

where (x^n, y^n) solves

−J′(x^n) = λ^n,

y^n ∈ Φ(λ^n).

In the case λ = 0 we have the proximal iterate of the form

x^{n+1} = (ϕ_c)′(x^n − α J′(x^n)).

15.4 Semismooth Newton method

The necessary optimality condition is written as the semismooth equation

−J′(x) = λ,

λ = h′_c(x, λ) = p_c(λ + c x).

For the case h = (α/2)|x|² + β|x|_0 we have from (15.22)–(15.25)

h′_c(x, λ) = p_c(λ + c x) =
  λ + c x,                if |λ + c x| ≤ √(2αβ),
  √(2αβ),                 if √(2αβ) ≤ λ + c x ≤ (1 + c/α)√(2αβ),
  −√(2αβ),                if √(2αβ) ≤ −(λ + c x) ≤ (1 + c/α)√(2αβ),
  α(λ + c x)/(α + c),     if |λ + c x| ≥ (1 + c/α)√(2αβ).

Thus, the semismooth Newton update is

−J′(x^{n+1}) = λ^{n+1},

x^{n+1} = 0,            if |λ^n + c x^n| ≤ √(2αβ),

λ^{n+1} = √(2αβ),       if √(2αβ) < λ^n + c x^n < (1 + c/α)√(2αβ),

λ^{n+1} = −√(2αβ),      if √(2αβ) < −(λ^n + c x^n) < (1 + c/α)√(2αβ),

x^{n+1} = λ^{n+1}/α,    if |λ^n + c x^n| ≥ (1 + c/α)√(2αβ).
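For the quadratic model J(x) = (1/2)|x − f|² the update can be carried out componentwise; the following Python sketch does so under this assumption (the model, data, and stopping test are illustrative, not the author's implementation).

import numpy as np

def ssn_l0_quadratic(f, alpha, beta, c=1.0, max_iter=50):
    t = np.sqrt(2.0 * alpha * beta)
    x = np.zeros_like(f)
    lam = f - x                                    # lam = -J'(x) = f - x for J = 1/2|x - f|^2
    for n in range(max_iter):
        mu = lam + c * x
        x_new = np.empty_like(x)
        lam_new = np.empty_like(lam)
        zero = np.abs(mu) <= t                          # x^{n+1} = 0
        upper = (t < mu) & (mu < (1 + c / alpha) * t)    # lam^{n+1} = +sqrt(2*alpha*beta)
        lower = (t < -mu) & (-mu < (1 + c / alpha) * t)  # lam^{n+1} = -sqrt(2*alpha*beta)
        free = np.abs(mu) >= (1 + c / alpha) * t         # x^{n+1} = lam^{n+1}/alpha
        x_new[zero] = 0.0
        lam_new[zero] = f[zero]
        lam_new[upper], lam_new[lower] = t, -t
        x_new[upper], x_new[lower] = f[upper] - t, f[lower] + t
        x_new[free] = f[free] / (1.0 + alpha)
        lam_new[free] = alpha * x_new[free]
        if np.allclose(x_new, x) and np.allclose(lam_new, lam):
            break
        x, lam = x_new, lam_new
    return x, lam

x, lam = ssn_l0_quadratic(np.array([2.0, 0.3, -1.5, 0.8]), alpha=1.0, beta=0.2)
print(x)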


For the case h(u_1, u_2) = (α/2)(|u_1|² + |u_2|²) + β|u_1 u_2|_0, let µ^n = λ^n + c u^n. From (15.24) the semismooth Newton update is

u^{n+1} = λ^{n+1}/α,                                   if |µ^n_1|, |µ^n_2| ≥ (1 + c/α)√(2αβ),

u^{n+1}_1 = λ^{n+1}_1/α,  λ^{n+1}_2 = ±√(2αβ),          if |µ^n_1| ≥ (1 + c/α)√(2αβ), √(2αβ) ≤ |µ^n_2| ≤ (1 + c/α)√(2αβ),

λ^{n+1}_1 = ±√(2αβ),  u^{n+1}_2 = λ^{n+1}_2/α,          if |µ^n_2| ≥ (1 + c/α)√(2αβ), √(2αβ) ≤ |µ^n_1| ≤ (1 + c/α)√(2αβ),

λ^{n+1}_1 = λ^{n+1}_2,  u^{n+1}_1 = λ^{n+1}_1/α,         if |µ^n_2| ≤ |µ^n_1| ≤ (1 + c/α)√(2αβ), √(2αβ) ≤ |µ^n_2| ≤ (1 + c/α)√(2αβ),

λ^{n+1}_1 = λ^{n+1}_2,  u^{n+1}_2 = λ^{n+1}_2/α,         if |µ^n_1| ≤ |µ^n_2| ≤ (1 + c/α)√(2αβ), √(2αβ) ≤ |µ^n_1| ≤ (1 + c/α)√(2αβ),

u^{n+1}_1 = λ^{n+1}_1/α,  u^{n+1}_2 = 0,                 if |µ^n_1| ≥ (1 + c/α)|µ^n_2|, |µ^n_2| ≤ √(2αβ),

u^{n+1}_2 = λ^{n+1}_2/α,  u^{n+1}_1 = 0,                 if |µ^n_2| ≥ (1 + c/α)|µ^n_1|, |µ^n_1| ≤ √(2αβ),

λ^{n+1}_1 = λ^{n+1}_2,  u^{n+1}_2 = 0,                   if |µ^n_2| ≤ |µ^n_1| ≤ (1 + c/α)|µ^n_2|, |µ^n_2| ≤ √(2αβ),

λ^{n+1}_1 = λ^{n+1}_2,  u^{n+1}_1 = 0,                   if |µ^n_1| ≤ |µ^n_2| ≤ (1 + c/α)|µ^n_1|, |µ^n_1| ≤ √(2αβ).

15.5 Constrained optimization

Consider the constrained minimization problem of the form

min J(x, u) = ∫_Ω (ℓ(x) + h(u)) dω  over u ∈ U,    (15.28)

subject to the equality constraint

E(x, u) = Ex + f(x) + Bu = 0.    (15.29)

Here U = {u ∈ L²(Ω)^m : u(ω) ∈ U a.e. in Ω}, where U is a closed convex subset of R^m. Let X be a closed subspace of L²(Ω)^n and E : X × U → X* with E ∈ L(X, X*) boundedly invertible, B ∈ L(L²(Ω)^m, L²(Ω)^n) and f : L²(Ω)^n → L²(Ω)^n Lipschitz. We assume E(x, u) = 0 has a unique solution x = x(u) ∈ X for given u ∈ U. We assume there exists a solution (x̄, ū) to problem (15.28). Also, we assume that the adjoint equation

(E + f_x(x̄))*p + ℓ′(x̄) = 0    (15.30)

has a solution in X. It is a special case of (15.1) where J(u) is the implicit functional of u:

J(u) = ∫_Ω ℓ(x(ω)) dω,

where x = x(u) ∈ X for u ∈ U is the unique solution to Ex + f(x) + Bu = 0.

To derive a necessary condition for this class of (nonconvex) problems we use a maximum principle approach. For arbitrary s ∈ Ω, we shall utilize needle perturbations


of the optimal solution ū, defined by

v(ω) = { u on {ω : |ω − s| < δ};  ū(ω) otherwise },    (15.31)

where u ∈ U is constant and δ > 0 is sufficiently small so that {|ω − s| < δ} ⊂ Ω. The following additional properties of the optimal state x̄ and each perturbed state x(v) will be used:

|x(v) − x̄|²_{L²(Ω)} = o(meas({|ω − s| < δ})),

∫_Ω (ℓ(·, x(v)) − ℓ(·, x̄) − ℓ_x(·, x̄)(x(v) − x̄)) dω = O(|x(v) − x̄|²_{L²(Ω)}),

⟨f(·, x(v)) − f(·, x̄) − f_x(·, x̄)(x(v) − x̄), p⟩_{X*,X} = O(|x(v) − x̄|²_X).    (15.32)

Theorem Suppose (x̄, ū) ∈ X × U is optimal for problem (15.28), that p ∈ X satisfies the adjoint equation (15.30), and that (??), (??), and (15.32) hold. Then we have the necessary optimality condition that ū(ω) minimizes h(u) + (B*p, u) pointwise a.e. in Ω.

Proof: By the second property in (15.32) we have

0 ≤ J(v) − J(ū) = ∫_Ω (ℓ(·, x(v)) − ℓ(·, x(ū)) + h(v) − h(ū)) dω

= ∫_Ω (ℓ_x(·, x̄)(x − x̄) + h(v) − h(ū)) dω + O(|x − x̄|²),

where

(ℓ_x(·, x̄), x − x̄) = −⟨(E + f_x(·, x̄))(x − x̄), p⟩

and

0 = ⟨E(x − x̄) + f(·, x) − f(·, x̄) + B(v − ū), p⟩ = ⟨E(x − x̄) + f_x(·, x̄)(x − x̄) + B(v − ū), p⟩ + O(|x − x̄|²).

Thus, we obtain

0 ≤ J(v) − J(ū) ≤ ∫_Ω ((−B*p, v − ū) + h(v) − h(ū)) dω.

Dividing this by meas({|ω − s| < δ}) > 0, letting δ → 0, and using the first property in (15.32), we obtain the desired result at a Lebesgue point s of

ω → (−B*p, v − ū) + h(v) − h(ū),

since h is lower semi-continuous.

Consider the relaxed problem of (15.28):

min J(x, u) = ∫_Ω (ℓ(x) + h**(u)) dω  over u ∈ U.    (15.33)


We obtain the pointwise necessary optimality

Ex + f(x) + Bu = 0,

(E + f′(x))*p + ℓ′(x) = 0,

u ∈ ∂h*(−B*p),

or equivalently

λ = h′_c(u, λ),  λ = B*p,

where h*_c is the Yosida–Moreau approximation of h**.

15.6 Algorithm

Based on the complementarity condition (??) we have the primal-dual active set method for L^p optimization (h(u) = (α/2)|u|² + β|u|^p):

Primal-Dual Active Set method (L^p(Ω)-optimization)

1. Initialize u^0 ∈ X and λ^0 ∈ H. Set n = 0.

2. Solve for (y^{n+1}, u^{n+1}, p^{n+1}):

Ey^{n+1} + Bu^{n+1} = g,  E*p^{n+1} + ℓ′(y^{n+1}) = 0,  λ^{n+1} = B*p^{n+1}

and

u^{n+1} = −λ^{n+1}/(α + βp|u^n|^{p−2})  if ω ∈ {|λ^n + c u^n| > µ_p + c u_p},

λ^{n+1} = µ_p   if ω ∈ {µ_p ≤ λ^n + c u^n ≤ µ_p + c u_p},

λ^{n+1} = −µ_p  if ω ∈ {µ_p ≤ −(λ^n + c u^n) ≤ µ_p + c u_p},

u^{n+1} = 0   if ω ∈ {|λ^n + c u^n| ≤ µ_p}.

3. Stop, or set n = n + 1 and return to the second step.

16 ℓ0 optimization

Consider the ℓ0 minimization in ℓ²:

J(x) = F(x) + β N_0(x),

where

N_0(x) = Σ_i |x_i|_0

counts the number of nonzero elements of x ∈ ℓ².

Necessary optimality: Let x̄ be a minimizer. Then for each coordinate

G_i(x̄_i) + β|x̄_i|_0 ≤ G_i(x_i) + β|x_i|_0  for all x_i ∈ R,


where

G_i(x_i) = F(x̄ + (x_i − x̄_i) e_i).

Suppose x_i → G_i(x_i) is strictly convex and let z_i be the minimizer of G_i, so that

F_{x_i}(z) = 0.

If

G_i(z_i) + β < G_i(0),

then x̄_i = z_i is the minimizer of G_i(x_i) + β|x_i|_0; otherwise x̄_i = 0 is the minimizer. Thus, we obtain the necessary optimality

F′(x̄) + λ = 0,  λ_i x̄_i = 0,

x̄_i = 0 if G_i(0) − G_i(z_i) ≤ β,  λ_i = 0 if G_i(0) − G_i(z_i) > β.

Note that

G_i(0) − G_i(z_i) = ∫_0^{z_i} (z_i − s) G_i″(s) ds ∼ (1/2) G_i″(z_i/2)|z_i|²

and

λ_i = −G_i′(0) = −(G_i′(0) − G_i′(z_i)) = ∫_0^{z_i} G_i″(s) ds ∼ G_i″(z_i/2) z_i.

Thus, we have the approximate complementarity

|λ_i|² < 2β G_i″(z_i/2)  ⟹  x̄_i = 0,

λ_i = 0  ⟹  |x̄_i|² ≥ 2β/G_i″(z_i/2).

Equivalently,

λ_k = 0 if k ∈ {k : |λ_k + Λ_k x_k|² > 2βΛ_k},

x_j = 0 if j ∈ {j : |λ_j + Λ_j x_j|² ≤ 2βΛ_j},    (16.1)

where Λ_i = G_i″(z_i/2).

For the quadratic case F(x) = (1/2)(|Ax − b|² + α(x, Px)) with nonnegative P and α ≥ 0, we have G_i″(z_i/2) = |A_i|² + αP_ii, and thus

A*(Ax − b) + αPx + λ = 0,  λ_i x_i = 0,

together with the complementarity condition (16.1), is the necessary optimality condition.
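A small Python sketch (illustrative only, not the author's code) of the resulting coordinatewise test for the quadratic case: given a candidate x it forms λ = −(A*(Ax − b) + αPx), computes Λ_i = |A_i|² + αP_ii, and checks the complementarity (16.1).

import numpy as np

def check_l0_optimality(A, b, P, x, alpha, beta):
    lam = -(A.T @ (A @ x - b) + alpha * (P @ x))       # lam = -F'(x)
    Lam = np.sum(A * A, axis=0) + alpha * np.diag(P)   # Lam_i = |A_i|^2 + alpha*P_ii
    active = np.abs(lam + Lam * x) ** 2 > 2.0 * beta * Lam   # lam_i should vanish here
    inactive = ~active                                        # x_i should vanish here
    ok = np.allclose(lam[active], 0.0, atol=1e-8) and np.allclose(x[inactive], 0.0, atol=1e-8)
    return ok, active

# illustrative data: report whether x = 0 satisfies (16.1) for this problem
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4)); b = rng.standard_normal(6)
P = np.eye(4); x = np.zeros(4)
print(check_l0_optimality(A, b, P, x, alpha=0.1, beta=0.5))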

17 Exercises

Problem 1 Let A be a symmetric matrix on R^n and b ∈ R^n a vector, and let F(x) = (1/2)x^t A x − b^t x. Show that F′(x) = Ax − b.

Problem 2 Show that the sequences f_n, g_n, h_n are Cauchy in L²(0, 1) but not in C[0, 1].

Problem 3 Show that ||x| − |y|| ≤ |x − y| and thus that the norm is continuous.

Problem 4 Show that all norms are equivalent on a finite dimensional vector space.


Problem 5 Show that for a closed linear operator T, the domain D(T) is a Banach space if it is equipped with the graph norm

|x|_{D(T)} = (|x|²_X + |Tx|²_Y)^{1/2}.

Problem 6 Complete the DC motor problem.

Problem 7 Consider the spline interpolation problem (3.8). Find the Riesz representations F_i and F_j in H²_0(0, 1) and complete the spline interpolation problem (3.8).

Problem 8 Derive the necessary optimality for L1 optimization (3.27) with U = [−1, 1].

Problem 9 Solve the constrained minimization in a Hilbert space X:

min (1/2)|x|² − (a, x)

subject to

(y_i, x) = b_i, 1 ≤ i ≤ m_1,  (y_j, x) ≤ c_j, 1 ≤ j ≤ m_2,

over x ∈ X. We assume {y_i}_{i=1}^{m_1}, {y_j}_{j=1}^{m_2} are linearly independent.

Problem 10 Solve the pointwise obstacle problem:

min ∫_0^1 ((1/2)|du/dx|² − f(x)u(x)) dx

subject to

u(1/3) ≤ c_1,  u(2/3) ≤ c_2,

over the Hilbert space X = H¹_0(0, 1), for f = 1.

Problem 11 Find the differential form of the necessary optimality for

min ∫_0^1 ((1/2)(a(x)|du/dx|² + c(x)|u|²) − f(x)u(x)) dx + (1/2)(α_1|u(0)|² + α_2|u(1)|²) − (g_1, u(0)) − (g_2, u(1))

over the Hilbert space X = H¹(0, 1).

Problem 12 Let B ∈ R^{m×n}. Show that R(B) = R^m if and only if BB^t is positive definite on R^m. Show that x* = B^t(BB^t)^{−1}c defines a minimum norm solution to Bx = c.

Problem 13 Consider the optimal control problem (3.48) (Example Control problem) with U = {u ∈ R^m : |u|_∞ ≤ 1}. Derive the optimality condition (3.49).

Problem 14 Consider the optimal control problem (3.48) (Example Control problem) with U = {u ∈ R^m : |u|_2 ≤ 1}. Find the orthogonal projection P_C, derive the necessary optimality and develop the gradient method.

Problem 15 Let J(x) = ∫_0^1 |x(t)| dt be a functional on C(0, 1). Show that J′(x)d = ∫_0^1 sign(x(t)) d(t) dt.

Problem 16 Develop the Newton method for the optimal control problem (3.48) without the constraint on u (i.e., U = R^m).


Problem 17 (1) Show that the conjugate functional

h*(p) = sup_u {(p, u) − h(u)}

is convex.
(2) Show that the set-valued function Φ,

Φ(p) = argmax_u {(p, u) − h(u)},

is monotone.

Problem 18 Find the graph Φ(p) for

max_{|u|≤γ} {pu − |u|_0}

and

max_{|u|≤γ} {pu − |u|_1}.
