

Further information and subscription

For further information, please contact Andonowati at [email protected] or [email protected]. To subscribe, please send your CV (due date: June 15, 2002) and the name(s) and contact address(es) of one or more persons who can, if asked, provide a recommendation about your academic credibility to

PUSAT PENELITIAN PENGEMBANGAN DAN PENERAPAN MATEMATIKA (P4M) INSTITUT TEKNOLOGI BANDUNG

Gedung Lab. Tek. III, Jl. Ganesha 10 Bandung, 40132 INDONESIA

Phone/Fax: +62 +22 250 8126 E-mail: [email protected]

Notification of acceptance will be sent before June 22.

Confirmation of attendance should be submitted by July 1, 2002.

Target group, Participation The topics are interesting for, and the course is designed for, participants with different backgrounds: mathematics, physics, and engineering. The division of the material into two levels guarantees that students at different levels (S1, S2 and S3) can profit. For successful absorption of the material, participants should have basic knowledge of analysis and of ODEs and PDEs. The course will for the major part be conducted in English, so a sufficient level of English is required.

Subscription and selection Only a LIMITED NUMBER of participants will be accommodated. Selection will be based on the academic background of the candidates and their English proficiency. Please supply us with as much information as possible on these matters.

Fee

No registration fee is required; the course is FREE.

Awards... There will be small awards for outstanding performance as well as for enthusiastic participation.

Variational Methods in Science with emphasis on applications from fluid dynamics and optics

ITB, July 15 – July 20, 2002

A one-week course with:
• integrated lectures and exercises
• extensive lecture notes
• 'Maple' and 'Matlab' illustrations

15/07 Basic Calculus of Variations
16/07 Linear Eigenvalue Problems
17/07 Nonlinear Eigenvalue Problems
18/07 Consistent Variational Approximations
19/07 Variational Numerics
20/07 Summary, epilogue

8:00-10:00 Motivation, examples
10:00-12:00 General theory
13:00-17:00 Class work

organised by P4M – ITB, Andonowati

in collaboration with AAMP – UTwente, E. van Groesen

Funded by KNAW through EPAM Industrial Mathematics Project

The subject

Optimisation properties are quite common in everyday life, but also in the natural and engineering sciences, where many problems are formulated in terms of ordinary or partial differential equations. Well-known examples are the principle of minimal potential energy, Fermat's principle of least time for light rays, and the principle of stationary action for dynamical systems. But solitons in surface waves or optics can also be characterised by the fact that their profiles maximise the momentum at given energy (maximise speed at given cost).

An optimisation principle makes it possible to use special mathematical methods, which can then often be adapted to other problems as well. Moreover, from its origin in the work of scientists like Euler, Lagrange, Newton and Huygens, the theory can deal remarkably well with nonlinear problems, and as such it is still one of the most coherent sets of mathematical methods for the study of nonlinear phenomena.

Since an optimisation property, or more generally a variational structure, is 'special' and leads to specific behaviour of the system, it is clear that when one wants to design simplified models of the system, the model should retain these properties, i.e. inherit the variational structure. In that sense the variational structure is an essential element of, and guides the way to, consistent approximations in mathematical modelling. In particular this holds just as well if one looks for numerical discretizations: the numerical code should respect this basic property. Finite element methods are often directly associated with this, but other methods can also be used in 'variational discretizations'.

The course intends to introduce the basic mathematical methods as well as to teach how to recognise variational structures in problems from the natural and engineering sciences.

Topics

The topics will be extensively motivated and illustrated by various examples, chosen mainly from the natural and engineering sciences, in particular from problems in fluid dynamics (surface waves) and optics.

• Theory of first and second variation.
• Unconstrained problems: the Euler-Lagrange equations.
• Prescribed and natural boundary conditions.
• Weak formulations and interface conditions.
• Direct and inverse problem of the Calculus of Variations.
• Constrained problems: Lagrange's Multiplier Rule, and the multiplier as derivative of the value function.

• Sturm-Liouville eigenvalue problems, more dimensional and nonlinear extensions.

• Dynamical systems: conservative, dissipative and thermodynamic systems

• Direct optimisation methods, steepest descent method

• Low-dimensional modelling retaining the variational structure

• Numerical codes from variational discretizations: FD and FEM-methods

Applications that will be dealt with include:

• Classical mechanics: Lagrangian and Hamiltonian systems.
• Nonlinear oscillators, vibrations of drums and bars.
• Variational structure of the Maxwell equations and of equations for surface waves, like the Korteweg-de Vries equation, the Nonlinear Schrodinger equation, etc.
• Solitons in surface waves and optics.
• Wave propagation through waveguides.
• Finite-mode approximations of dispersive wave equations.
• FD and FEM numerical schemes for simple wave models.

It is possible that some participants have specific questions or problems of their own. These may be discussed, either within or outside the course week.

Variational Methods in Science
with applications in fluid dynamics and optics

E. (Brenny) van Groesen & Andonowati

Applied Analysis & Mathematical Physics (AAMP),
University of Twente, The Netherlands
and
Center of Mathematics (P4M),
Institut Teknologi Bandung, Indonesia

10 July 2002; updated 18 July 2002


Contents

Introduction

1 Basic Calculus of Variations
  1.1 Motivation and Basic Notions
    1.1.1 Extremal problems in finite dimensions
    1.1.2 Generalization to infinite dimensions
    1.1.3 Notation and General Formulation
    1.1.4 Functionals
    1.1.5 Bilinear functionals and quadratic forms
    1.1.6 Admissible variations
  1.2 Theory of first variation
    1.2.1 First variation and variational derivative
    1.2.2 Characteristic cases
    1.2.3 Stationarity condition
    1.2.4 Euler-Lagrange equation
    1.2.5 Natural boundary conditions
    1.2.6 Weak formulation and Interface conditions
  1.3 Principle of Minimal Potential Energy
    Dirichlet's principle
    Bars and plates, strings and membranes
    Theory of bars
    Theory of strings
    2D-elasticity: plates and membranes
  1.4 Dynamical Systems and Evolution Equations
    1.4.1 Classical Mechanics
      Lagrangian systems
      Classical Hamiltonian systems
    1.4.2 Poisson systems
      Canonical Hamiltonian systems
      Complex canonical structure
    1.4.3 Evolution equations (Nonlinear Wave equations)
      Boussinesq Equations
      KdV (Korteweg - de Vries) Equation
      NLS (Non-Linear Schrodinger) Equation
    1.4.4 Gradient systems (Steepest descent)
  1.5 Exercises
  1.6 ** Extensions
    1.6.1 Theory of second variation
    1.6.2 Legendre transformation
    1.6.3 Convexity Theory
    1.6.4 Hamilton-Jacobi equations
    1.6.5 Exercises

2 Constrained Problems
  2.1 Motivation and Introductory Examples
  2.2 Lagrange Multiplier Rule
    2.2.1 Constrained to levelsets
    2.2.2 Formulations of LMR
    2.2.3 Families of constrained problems
    2.2.4 The multiplier as derivative of the value function
    2.2.5 Homogeneous functionals
    2.2.6 ** Constrained minimizers and the Lagrangian functional
  2.3 Applications
    2.3.1 Linear Eigenvalue Problems
      Basic problem from Linear Algebra
      Levelsets of Quadratic Forms
      Eigenvalue problem for linear operators
      General formulation of EVP
      Spectral theorem for differential operators
      Generalized Fourier theory
      Fredholm alternative
      Example: EVP for S-L operator on an interval
      Example: EVP for S-L operator on a spatial domain
      Comparison methods for principal eigenvalues
    2.3.2 Algorithm for Relative Equilibrium (Solutions)
    2.3.3 Thermodynamic systems: constrained steepest descent
  2.4 Exercises
  2.5 ** Extensions
    2.5.1 Theory of Constrained Second Variation
    2.5.2 LEVP: Non-successive characterization of eigenvalues
      Min-max and Max-Min formulations

3 Variational approximations
  3.1 Motivation and Introductory Examples
    Accuracy of the restricted solution
  3.2 Variational Numerical Methods
    3.2.1 General method
    3.2.2 Projection of (variational) equations
      Ritz-Galerkin projection method
  3.3 Consistent modelling by restriction
    3.3.1 Restriction to suitable families of functions
      Nonlinear oscillator: Duffing's equation
      WKB-approximation
    3.3.2 Design of simplified models
  3.4 Direct optimization methods
    3.4.1 Steepest Descent
    3.4.2 Conjugate Gradient Method
      Search directions
      CGM-Algorithm

A Variational Optics
  A.1 Basic equations
    A.1.1 Macroscopic Maxwell Equations
    A.1.2 Restriction to 2 spatial dimensions
    A.1.3 Restriction to 1 spatial dimension
    A.1.4 Bidirectional equation for pulse propagation
    A.1.5 Unidirectional Maxwell equation
    A.1.6 NLS Envelope equation for pulse propagation
    A.1.7 Spatial 2D NLS
  A.2 Optical waveguide modes
    A.2.1 Preliminaries
    A.2.2 Variational formulation for guided modes with Transparent BC's
      Direct formulation on the unbounded domain
      Confined formulation using Transparent Boundary Conditions (TBC)
    A.2.3 Approximations with simple trial profiles
      Confinement at 'partly-optimal' Dirichlet boundary
      Using the confined formulation
    A.2.4 Variational formulation for radiation modes
    A.2.5 FEM-numerics for complicated index variations

B Variational Fluid Dynamics
  B.1 Free Surface Wave Models
    B.1.1 Full surface wave equations
    B.1.2 Variational structure of FSWE
    B.1.3 Linearized SW, dispersion
    B.1.4 Boussinesq type of equations
    B.1.5 KdV type of equations
    B.1.6 NLS-model

C Solitons and wave groups
  C.1 Coherent structures as relative equilibria
  C.2 Solitons of KdV
    C.2.1 Motivation from Travelling Wave Ansatz
      Analysis of solitary wave profiles
    C.2.2 Solitons as Relative Equilibria
      Scaling argument for KdV solitons
  C.3 NLS Wave Groups
      Hamiltonian structure
      First integrals and their flow
    C.3.1 Relative Equilibria: soliton- and periodic wave groups
      Nonlinear harmonic
      Nonlinear modulated harmonic
      Soliton
      Nonlinear bi-harmonic
      Scaling and (non-) existence of NLS-solitons
  C.4 Exercises

Preface

The present Lecture Notes were prepared for a course given at ITB in the summer of 2002. The basic material in Chapters 1, 2 and 3 can be found scattered over several textbooks; the text here is based on previous lecture notes at UTwente (the course 'Applied Analytical Methods' of EvG), but with a restyled presentation to be attractive for a larger audience. The contents of Chapter 3 and the Appendices were written and assembled for this course.

The authors would like to present these Lecture Notes as a gift to the participants of the course, as their wedding present, and hope that the love with which they were prepared and the enthusiasm for the topic show in these notes and in the execution of the course.

We will be grateful to the participants for remarks and criticism.

Bandung, 10 July 2002.

Introduction

Optimality in the natural sciences

“...... je suis convaincu que par tout la nature agit selon quelque principe d’un maximum ou minimum." (Euler, 1746)

[“... I am convinced that everywhere nature acts according to some principle of a maximum or minimum."]

This quotation from one of the greatest scientists who shaped the modern mathematical description and investigation of the natural sciences clearly expresses the underlying expectation. The belief that optimization is important for describing natural phenomena was verified by Euler for various problems, and exploited to present a more thorough investigation of them. More far-reaching conclusions were drawn by some other scientists:

“...... des loix du mouvement ou l’action est toujours employée avec la plus grande economie, demontreront l’existence de l’Etre supreme ... ", (Maupertuis, 1757)

[“... the laws of motion, in which the action is always employed with the greatest economy, will demonstrate the existence of the Supreme Being ..."]

but this point of view belongs to metaphysics, and as such is not very fruitful for a deeper investigation¹.

Actually, optimization problems have been known since ancient times; well known is Dido's problem: to find the plane domain of largest area given the circumference of the domain. Many other problems can also be formulated as geodetic problems, in which one investigates those curves (or surfaces) with the property that a functional measuring the length (or the area) is as small as possible. A major example is the following:

Fermat's principle, 1662
The actual trajectory of a light ray between two points in an inhomogeneous medium has the property that the time (or optical length) required to traverse the curve is as small as possible when compared to the time required for any other curve between the points.

In fact, the investigation of this principle led Fermat to the principal mathematical result that will be formulated in the first chapter as Fermat's algorithm. From Fermat's principle, Snell's law for the refraction of light between two media can be derived. A dual point of view (looking for the evolution of light fronts, the surfaces that can be reached by the light from a point source in a given time) was investigated by Huygens, 1695. Huygens' principle, of vital importance for the basic understanding of light propagation, can be considered a major example of what later became known as duality methods.

These historical remarks² make it clear that practical problems from physics provided the initial motivation for the beautiful mathematical theory that has been developed since then.

¹ It should be noted, however, that modern theoretical physicists who look for “a theory of everything" (Grand Unified Theory) actually search for functionals (Lagrangians) that produce the desired unified field equations upon optimization, just as Einstein's general theory of relativity is based on a minimality principle.

Dynamical systems with a variational structure

Apart from problems that by their very nature have an “obvious" formulation as a minimization problem (minimum length, minimum cost, etc.), there are many problems for which such an extremizing property exists but is not so obvious. Important examples can be found in dynamical systems.

The principle of minimum (potential) energy leads to equilibrium states for which the total energy is minimal (while the kinetic energy vanishes for equilibria). For nontrivial dynamic evolutions in certain systems, a less intuitive quantity, the 'action' (see the quotation of Maupertuis), turns out to be an important functional; actual evolutions correspond to saddle points (not extremizers in general) of this functional. Formulations of such systems were studied by Lagrange, Hamilton and others, and the many results are collected in what is now called Classical Mechanics, a well-structured set of methods and results for studying dynamical systems of collections of mass points, mechanical (rigid) structures, etc. Nowadays, much effort is spent on generalizing these ideas to partial differential equations for continuous systems such as fluid dynamics and field theories like optics.

The systems referred to above, Lagrangian and Hamiltonian systems, and more generally Poisson systems, are roughly speaking 'conservative' (the energy is conserved), and the dynamic motions have a variational nature. In many cases this structure also makes it possible to characterize special (but important) solutions in a variational way. These solutions can be equilibrium solutions or 'steady state' solutions. Often they are called coherent structures and are characteristic for such problems; examples are phenomena like 'solitons' and 'vortices' that appear in fluid dynamics and optics. Using their variational nature, these can be found in a systematic way.

Even when the system is not conservative but (mainly) dissipative, such as in gradient and thermodynamic systems, equilibrium solutions can still be found by exploiting variational structures in the equations.

Modelling and calculating

If a problem has a variational structure, this is a special property. When one wants to make a simpler model, or when a numerical algorithm has to be designed for accurate calculations, it is best to take care that this special structure is retained. One reason is that corresponding properties immediately related to the variational structure are then inherited; another is that in the study of the simplified or discretized model, variational methods can again be used. It is not easy to describe this idea of 'variationally consistent' modelling in a well-structured way, but certain basic approaches can be identified. The key method is to restrict the set on which the original functional is considered to a 'suitable' subset. Probably the best-known application of this idea is the Finite Element Method, a numerical method that is obtained by replacing functions from an infinite dimensional space by finite dimensional approximations. But restrictions to other sets, for instance to functions with a specific profile that are characterized by only a few parameters, can also be very useful for obtaining simplified models that show the main properties of the full problem. Viewed in this way, 'variational' numerical methods are just variants of more general variationally consistent modelling.

² The interested reader may consult references like Goldstein, and Newman vol. 2.

Contents of the Lecture Notes

The lecture notes consist of chapters and a few appendices. Most chapters start with a motivation and introductory examples; then the basic mathematical method is described in a more or less general setting, which is then illustrated with and specified for various application areas. Exercises finish this basic material, which is then followed by more advanced methods that can be skipped on first reading.

The main classical methods of the Calculus of Variations are described in Chapters 1 and 2.

In Chapter 1, problems 'without constraints' are considered, and the main result is the vanishing of the (variational) derivative at a stationary point of a functional. Essentially this is a direct generalization of results for a finite number of variables, but the infinite dimensional setting will lead to equations that are usually (partial or ordinary) differential equations, the Euler-Lagrange equations; the treatment of boundary conditions requires special attention.

In Chapter 2 we consider optimization problems for which the functions to be considered do not form a linear space but satisfy certain 'constraints'; this will lead to the infinite dimensional generalization of Lagrange's Multiplier Rule. The appearance of a 'multiplier' that is not given in advance but has to be determined as part of the solution of the total problem resembles the standard Eigenvalue Problem from Linear Algebra, which is why these problems can be called nonlinear Eigenvalue Problems. Linear eigenvalue problems are a special but important case, which is why they are treated in some detail; it will be shown that in many cases complete sets of eigenfunctions can be expected (Fourier theory is a characteristic example). The linear eigenvalue problem gives the best interpretation of infinite dimensional spaces and shows that 'operators' are generalizations of matrices in finite dimensional spaces.

Chapter 3 deals with variationally consistent modelling. The basic idea is to restrict the functional to a subset of the original set; when the subset is simpler, the Euler-Lagrange equation will become simpler too. For instance, as for numerical purposes, when the subset is finite dimensional, this will bring the partial differential equation to an equation in the finite dimensional set, and we have obtained a 'variationally consistent' discretization of the equation. One of the methods will be the Finite Element Method. Besides, we will also show, in a very condensed but complete way, the most well-known methods to solve the resulting algebraic equations: the method of steepest descent and its more practical implementation, the conjugate gradient method. The latter is nowadays the prime example of a direct optimization method.

In all these chapters, at several places and in many examples and exercises, problems from the natural sciences are not only used to illustrate the methods, but will also show that variational structures appear abundantly. More elaborate applications from optics and surface waves, exploiting the various methods treated in the chapters, are collected in the appendices. Appendix A deals with Optics; the basic equations are given and modern ways to treat waveguide modes are described. Appendix B deals with the basic equations of fluid dynamics. Appendix C describes how solitons and wave groups can be obtained for the equations of optics and fluid mechanics: the similar variational structure of the equations leads to the applicability of the fundamental methods to both problems at the same time.

Chapter 1

Basic Calculus of Variations

1.1 Motivation and Basic Notions

1.1.1 Extremal problems in finite dimensions

In real analysis courses at an introductory level, functions of one or more variables are considered. The definition of differentiation of functions is a vital part of such courses, and a standard result is the following:

Algorithm of Fermat, for 1-D optimization problems¹
If the differentiable scalar function of one variable f : R → R attains a (local) extreme value at the point x, then the derivative at that point vanishes:

f'(x) = 0.

Viewed as a condition for a point to be an extremal element, this condition is necessary but not sufficient; every point that satisfies this property is called a stationary, or critical, point, thus including 'saddle points'.

Knowing the above result for functions of one variable, the generalization to functions of more variables, n-dimensional problems, is remarkably simple: the use of partial derivatives reduces the n-dimensional problem to n 1-D problems, as follows.

For F : Rⁿ → R, recall that at the point x the directional derivative in a direction η is found by differentiating the scalar function obtained by restricting F to the line through x in the direction η, i.e. the function ε ↦ F(x + εη),

(d/dε)|_{ε=0} F(x + εη) ≡ DF(x)η.

Here DF(x) is seen as a map from Rⁿ into R, η ↦ DF(x)η. If x minimizes F on Rⁿ, then this point certainly minimizes, at ε = 0, the restriction to the line, and hence

(d/dε)|_{ε=0} F(x + εη) = DF(x)η = 0.

¹ Fermat did not write down the actual equation; he reasoned that small variations near a minimizer produce a higher-order variation in the function, the fundamental idea that leads to the result and justifies attaching his name to the mathematical algorithm. Fermat did not know the concept of derivative for functions other than polynomials; it was Leibniz who introduced the concept of derivative of arbitrary functions, in 1684.



If x minimizes F on Rⁿ, this should hold for any direction η, and the vanishing of the directional derivative in every direction η can be expressed by writing

DF (x) = 0.

It is common to rewrite this property using the notion of gradient, as follows. For F : Rⁿ → R, let ∇F be the gradient of the function, defined as the column vector

∇F(x) = (∂_{x₁}F(x), ..., ∂_{xₙ}F(x))ᵀ.

The relation with the directional derivative is simply that for each η

DF(x)η ≡ ∇F(x) · η,

where on the right-hand side the usual inner product of vectors in Rⁿ is meant. Then, from DF(x)η = ∇F(x) · η = 0 for all η, the vanishing of the map DF(x) can be expressed as the vanishing of the gradient (vector)

∇F(x) = 0.

This formulation is the direct generalization of Fermat's algorithm to n-dimensional optimization problems.
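The course illustrations use 'Maple' and 'Matlab'; as a hedged sketch of the same idea, the following Python/NumPy fragment (with a function F chosen purely for illustration) checks numerically that at a minimizer the directional derivative vanishes in every direction, and so does the gradient:

```python
import numpy as np

# Illustrative function: F(x) = |x - a|^2 is minimized at x = a.
a = np.array([1.0, -2.0, 0.5])

def F(x):
    return np.sum((x - a) ** 2)

def directional_derivative(F, x, eta, eps=1e-6):
    # Central finite-difference approximation of (d/de) F(x + e*eta) at e = 0.
    return (F(x + eps * eta) - F(x - eps * eta)) / (2.0 * eps)

x_min = a.copy()                     # Fermat: the derivative vanishes here
rng = np.random.default_rng(0)
for _ in range(5):
    eta = rng.standard_normal(3)     # an arbitrary direction
    assert abs(directional_derivative(F, x_min, eta)) < 1e-6

# Equivalently, the gradient grad F(x) = 2(x - a) vanishes at the minimizer.
assert np.allclose(2.0 * (x_min - a), np.zeros(3))
```

The point of the sketch is exactly the equivalence in the text: DF(x)η = 0 for all η is the same statement as ∇F(x) = 0.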

1.1.2 Generalization to infinite dimensions

In this course we tackle problems for which a scalar function is defined on an infinite dimensional space (the function is then usually called a functional). Most times this means that the functional assigns a real number to functions of one or several variables by integrating powers of the function and its derivatives; for instance, the L²-norm is an example of such so-called 'density functionals'. Then the above can be generalized as follows:

• by restricting the functional to one-dimensional lines, the notion of directional derivative can be defined just as easily; it will be called the first variation in that case;

• when dealing with density functionals, a generalization of the gradient can be defined and will lead to the notion of variational derivative. The specific expression is related to the choice of the L² inner product for the functions under consideration;

• the fact, which is trivial in finite dimensions, that from ∇F(x) · η = 0 for all η it follows that ∇F(x) = 0, can be generalized to infinite dimensional function spaces with the L² inner product as a consequence of Lagrange's Lemma; this result will enable us to identify the first variation with the variational derivative (not considering boundary contributions).
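These notions can be explored numerically. The following Python sketch (the functional L(u) = ∫₀¹ ½u² dx and the functions u, η are chosen only for illustration, and the integral is approximated by a simple trapezoidal rule): restricting L to the line ε ↦ L(u + εη) and differentiating at ε = 0 gives the first variation, which for this L equals the L² inner product ⟨u, η⟩, consistent with the variational derivative δL(u) = u.

```python
import numpy as np

# Uniform grid on [0, 1] and a simple trapezoidal quadrature.
n = 2000
x = np.linspace(0.0, 1.0, n + 1)
dx = x[1] - x[0]

def integrate(g):
    return (0.5 * g[0] + g[1:-1].sum() + 0.5 * g[-1]) * dx

def L(u):
    # Density functional L(u) = int_0^1 (1/2) u(x)^2 dx.
    return integrate(0.5 * u ** 2)

u = np.sin(np.pi * x)        # a function under consideration
eta = x * (1.0 - x)          # an admissible variation

# First variation: derivative of the restriction e -> L(u + e*eta) at e = 0.
eps = 1e-6
first_variation = (L(u + eps * eta) - L(u - eps * eta)) / (2.0 * eps)

# For this L the variational derivative is deltaL(u) = u, so the first
# variation equals the L2 inner product <u, eta>.
assert abs(first_variation - integrate(u * eta)) < 1e-8
```

For this quadratic functional the identity is exact; for general density functionals the finite-difference quotient only approximates the first variation.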

The infinite dimensional case also brings characteristic differences:


• the functions to which the functional assigns a certain value are defined on a domain; ‘variations’ of the functions may be restricted in the interior of the domain (for instance if the average of the function is prescribed to vanish) but also because of restrictions on the boundary; boundary conditions may be prescribed or may result for critical points (so-called natural boundary conditions);

• when talking about functions, clearly their smoothness (continuity, differentiability) may also be of importance; it also leads to the most characteristic difference with finite dimensional spaces, namely that in infinite dimensional spaces different (non-equivalent) norms are possible: a function may be square integrable, while the squared derivative may have arbitrarily large integral.

The typical notation to be used in the following for the variational derivative is δL(u), and Fermat’s algorithm generalizes to

δL(u) = 0

as the condition for a minimizing element. This equation is most times a differential equation, replacing the algebraic equation ∇F(x) = 0 that is obtained for a minimizer of a function of a finite number of variables, together with boundary conditions.
Just as in finite dimensions, the second derivative may reflect minimization properties, and in general provide insight into the character of a critical point. In the Calculus of Variations these aspects are dealt with in the theory of first and second variation.

1.1.3 Notation and General Formulation

Generally speaking, for an optimization problem we have the following basic ingredients:

• a set of admissible elements M, usually some subset of an (infinite dimensional) space U;

• a functional L, defined on U (or only on M).

The minimization problem of L on M concerns questions about an element u that minimizes the functional on the set of admissible elements. By definition, u is the element of M for which the functional achieves its lowest value µ:

µ = L(u) and L(u) ≤ L(v) for all v ∈ M.

We will use in the following the notation

u ∈ Min{ L(u) | u ∈ M },   µ = Min{ L(u) | u ∈ M } = min_{u∈M} L(u).

In principle, the questions could deal with the existence, the uniqueness, and the characterization and computation of the minimizer. In this course, we


will mainly deal with the characterization of the minimizer (and more general critical points). We will concentrate on the equation(s) that have to be satisfied by such a critical point: the Euler-Lagrange equation or Lagrange’s multiplier rule, boundary conditions, etc.

1.1.4 Functionals

Now we will first deal with the basic ingredients of a variational problem: the functionals and the set of admissible elements.
The functionals we will encounter are mainly density-functionals. That means that the functional assigns to each function from a set of admissible functions a number that is found by integrating (powers of) the function and its derivatives. So, for functions defined on the interval (0, ℓ), simple examples are

u(x) → ∫₀^ℓ u²(x) dx   or   u(x) → ∫₀^ℓ [ (∂xu)² + u⁴ + sin(u) ] dx,   etc.

The example of a functional that assigns the value of a (continuous) function in one point, such as u(x) → u(ℓ/2), could be called a ‘generalized’ density functional since it can be written using Dirac’s delta function² like u(x) → ∫ u(x)δDir(x − ℓ/2) dx.
The general form of a density functional cannot be given without complicated notation; therefore we just list the main types of functionals that we will encounter in the following, with their characteristic names.

Sturm-Liouville type of functionals
For scalar functions u of one variable x, defined on an interval I, the functional is of the form

L(u) = ∫_I [ p(x)(∂xu)² − q(x)u² − f(x)u ] dx   (1.1)

where p, q and f are given functions. As is common, the arguments (x) are suppressed in the expression under the integral sign. The functional can be defined for piecewise differentiable functions u(x).

Lagrangian functionals from Classical Mechanics
For vector functions q, say q ∈ RN, of one variable t (the time), and with q̇ denoting the time derivative of q, the functional is of the general form

L(q) = ∫ L(q, q̇, t) dt   (1.2)

where the Lagrangian density L is a given, smooth, function of its 2N + 1 (RN × RN × R) arguments.

Dirichlet type of integrals
For scalar functions Φ defined on a domain Ω ⊂ Rn, and with ∇Φ the gradient of Φ, the functional integrates over the spatial domain

L(Φ) = ∫ [ p(x)|∇Φ|² − q(x)Φ² − f(x)Φ ] dx,

the more-dimensional variant of the SL-type of functionals.

² The symbol ‘delta’ δ will appear abundantly in this course: it will be used to denote a kind of differential operation (variational derivative), but will also appear to denote admissible variations. Therefore, to minimize confusion, Dirac’s delta function will be denoted by δDir.


Lagrangian functionals for evolution equations
For functions u depending on spatial variables x and the time t, a characteristic example is

L(u) = ∫∫_Ω [ ρ(x)(∂tu)² − c²(∂xu)² − q(x)u² ] dx dt   (1.3)

and more generally, with a Lagrangian density L:

L(u) = ∫ [ ∫_Ω L(∂tu, ∂xu, u, x, t) dx ] dt.   (1.4)

Sometimes one distinguishes between ‘single-integral’ functionals (when functions depend on only one independent variable, and only integration over that single variable is needed to arrive at a real number) and ‘multiple-integral’ functionals (when the functions depend on more independent variables), but from the general point of view there is no essential difference.

1.1.5 Bilinear functionals and quadratic forms

Here we recall some general notions that will be used repeatedly in the following.

A functional ℓ defined on a linear function space U is linear if for all u, v ∈ U and all λ ∈ R

ℓ(u + v) = ℓ(u) + ℓ(v),   ℓ(λu) = λℓ(u).

A functional b : U × U → R is a bilinear functional if it is linear in each of its arguments, so

U ∋ v ↦ b(u, v) is linear for all u ∈ U,   and   U ∋ u ↦ b(u, v) is linear for all v ∈ U.

A bilinear functional b can have special properties:

symmetry: b(u, v) = b(v, u)

skew-symmetry: b(u, v) = −b(v, u)

non-degenerate: [b(u, v) = 0 for all u] ⇒ v = 0, and [b(u, v) = 0 for all v] ⇒ u = 0

non-negative: b(u, u) ≥ 0

positive: b(u, u) > 0 for all u ≠ 0.

Note the following fact:

b(u, u) = 0 if b is skew-symmetric.

A symmetric bilinear functional is a kind of generalized inner product; when it is positive, it is a true innerproduct. In all cases it defines a quadratic form.


Definition 1 A functional a : U → R is called a quadratic form if there is a symmetric bilinear form b on U so that

a(u) = b(u, u);

when b is positive, a is called a norm, and b is an innerproduct; when b is only non-negative, a is called a semi-norm.

Usually a first check to recognize a functional as a quadratic form is to verify that it is homogeneous of degree 2; this is not sufficient, however.

Proposition 2 Given a quadratic form a, the corresponding symmetric bilinear functional is given by

b(u, v) = (1/4)[ a(u + v) − a(u − v) ];

hence there is a one to one relation between quadratic forms and symmetric bilinear functionals.
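A quick numerical sanity check of the polarization identity (the dot product on R³ is my choice of example, not from the notes):

```python
# Check the polarization identity b(u,v) = [a(u+v) - a(u-v)]/4
# for a(u) = b(u,u), with b the usual dot product on R^3.

def b(u, v):                      # symmetric bilinear form: the dot product
    return sum(ui * vi for ui, vi in zip(u, v))

def a(u):                         # the associated quadratic form
    return b(u, u)

def b_from_a(u, v):               # recover b from a by polarization
    plus  = [ui + vi for ui, vi in zip(u, v)]
    minus = [ui - vi for ui, vi in zip(u, v)]
    return (a(plus) - a(minus)) / 4

u, v = [1.0, 2.0, -3.0], [0.5, -1.0, 2.0]
assert abs(b_from_a(u, v) - b(u, v)) < 1e-12
```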

For the symmetric bilinear functional b and the corresponding quadratic form a, the following relation holds:

a(u + λv) = a(u) + 2λb(u, v) + λ²a(v) for each λ ∈ R.

From this we derive the following important consequences:

Proposition 3 When a is positive semi-definite, the Cauchy-Schwarz inequality holds:

|b(u, v)|² ≤ a(u)a(v).

The proof follows from the expansion above: since a(u + λv) ≥ 0 for all λ ∈ R, the discriminant of the quadratic polynomial λ ↦ a(u) + 2λb(u, v) + λ²a(v) must be non-positive, which is exactly the stated inequality.

Exercise.

1. Often a quadratic density functional has the form

a(u) = ∫ Lu · u

where L is a (differential) operator. Find the corresponding bilinear form.

2. Show that u → √a(u) with

a(u) = ∫_a^b (∂xu)² dx

defines a norm on the set of functions that satisfy u(0) = 0; write down the Cauchy-Schwarz inequality. Show that on the set of all smooth functions it is only a semi-norm.

3. Derive the corresponding bilinear functional for the following quadratic form

a(u) = ∫_a^b [ σ(x)u_x² + u² ] dx.

Compare this bilinear functional with the following one:

c(u, v) = (1/2) ∫_a^b [ (−∂x(σ(x)∂xu) + u)v + (−∂x(σ(x)∂xv) + v)u ] dx.


4. Under suitable conditions on the function σ and the space of functions on which it is defined, the quadratic form ∫_a^b [ σ(x)u_x² + u² ] dx defines a norm; give some examples when it is a norm and then write down the Cauchy-Schwarz inequality.

5. The same questions for the more-dimensional generalization

a(u) = ∫ [ σ(x)|∇u|² + u² ] dx.

1.1.6 Admissible variations

Just as for functions of a finite number of variables, where the minimal value and the minimizer depend heavily on the domain on which the function is considered, for each variational problem a set of admissible elements should be specified: the set on which the functional is defined, and the functions that are allowed (‘admissible’) in the competition of looking for the minimizer.
In general, the set of admissible elements consists of functions u(x) of one or more variables x ∈ Rn defined on a certain domain (bounded or not) Ω ⊂ Rn. Usually these functions are subject to certain conditions, which may be of different character:

smoothness and integrability conditions, at least to assure that the density functional is well defined;

boundary conditions: conditions on the function and/or its derivatives on (parts of) the boundary ∂Ω of the domain Ω on which the functions are defined;

(internal) constraints.

An example of an ‘internal constraint’ is ∫_Ω u(x)dx = 0; it shows that the function over the full domain has to be considered to verify the condition: local variations of the function in the interior can easily make the integral nonzero.
It is essential for the following to describe this more clearly. Therefore we recall that when talking about the ‘derivative’ of a function at a certain point, we compare the function values at neighbouring points. To be able to conclude that the derivative vanishes at the point where the minimal value is achieved, it should be the case that neighbouring points are in the domain of definition for the optimization problem: a function of one variable that attains its minimal value at the boundary of an interval doesn’t need to have vanishing derivative there. Hence, it is important to know in which ‘directions’ the function values can be compared. That brings us to the notion of (admissible) variation.
Take a function u(x) defined on Ω. A variation of that function is a ‘small’ change of that function, possibly over its full domain Ω. To make ‘small’ a bit more precise: for any (finite) function η(x) on Ω and ε sufficiently small, the function εη(x) is small and we can consider the function u(x) + εη(x) to be in the neighbourhood of u(x). Stated differently, and getting close to the same interpretation as a line through a point u in the direction η in finite dimensions,


for given ‘direction’ η(x), the line through the point u(x) in the direction η(x) is the set in function space

ε → u(x) + εη(x);

it is a family of graphs in which the original graph of u(x) is embedded, and which ‘approaches’ this function for vanishing ε.
In the classical Calculus of Variations it is common to write δu(x) for εη(x) with ε small and to call it a variation; then δu(x) is interpreted more like different elements on the line with ε small than as a single function. To avoid this cumbersome interpretation, we will avoid the use of δu, but will write εη(x) and make the dependence on the parameter ε explicit.
The above holds for any function, and any variation. Now we consider a specified set of admissible elements, to which will correspond at each element a set of admissible variations. The idea can be simply illustrated for a function of three variables that is restricted to the set of admissible points that lie in a plane in three-dimensional space. Then, for a given point in the plane, only variations ‘in the plane’ are allowed to be considered, not, for instance, in the direction perpendicular to the plane. In the example of a plane there are two independent directions such that a full line lies in the plane. When the plane is replaced by a curved surface, a manifold, in general no two full lines will belong to the manifold, but lines in so-called tangent directions (that form the tangent plane at the point of consideration) differ close to the point from the manifold only in higher order. These tangent directions will be the admissible variations. Generally speaking, for a set of admissible elements M we will define at a point u ∈ M the set of admissible variations as the functions η such that u + εη belongs to M for small ε ‘up to higher order’. We will specify this more precisely in Chapter 3, but introduce here already the notation TuM for the set of admissible variations at the point u (a common notation in geometry to denote the ‘tangent’ space).
In this chapter we will deal with admissible sets for which each element (function) can be changed locally (in a neighbourhood of each interior point in its domain of definition) in an arbitrary way such that the varied element still remains admissible.
To give meaning to these statements, we first consider such local variations, i.e. we first introduce test functions. These functions will make it possible to change a given function in the interior of its domain of definition, without altering the behaviour at the boundary.

Definition 4 Given a domain Ω ⊂ Rn, the set of test functions on Ω will be denoted by C∞0(Ω) and consists of all functions that are infinitely differentiable (C∞) and that vanish, together with all derivatives, near the boundary ∂Ω (C∞0).

Remark. Such test functions really exist: for any interior point x0 ∈ Ω and r0 such that x ∈ Ω if |x − x0| < r0, an example is

φ(x) = exp( −1/(r0² − |x − x0|²) ) for |x − x0| < r0,   φ(x) = 0 for |x − x0| ≥ r0.
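A one dimensional numerical sketch of this remark (the centre x0 = 0 and radius r0 = 1 are my choices): the bump is positive inside the ball and vanishes, together with a sampled derivative, at the boundary.

```python
import math

X0, R0 = 0.0, 1.0    # centre and radius chosen for this illustration

def phi(x):
    """The standard bump function: smooth, positive for |x - x0| < r0, zero outside."""
    r2 = (x - X0) ** 2
    if r2 >= R0 ** 2:
        return 0.0
    return math.exp(-1.0 / (R0 ** 2 - r2))

assert phi(0.0) == math.exp(-1.0)          # value at the centre
assert phi(1.0) == 0.0 and phi(2.0) == 0.0 # vanishes on and outside the boundary

# Derivatives also vanish at the boundary: a central difference at x = r0 is tiny.
h = 1e-3
assert abs((phi(R0 + h) - phi(R0 - h)) / (2 * h)) < 1e-6
```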

Now we can define two essentially different classes of admissible elements, leading to different variational problems and different methods and results:


Definition 5 An unconstrained variational problem is a problem for which the set of admissible elements M consists of functions defined on a domain Ω such that all test functions belong to the set of admissible variations:

if u ∈ M then u + C∞0(Ω) ⊂ M,

meaning u + εη ∈ M for each η ∈ C∞0(Ω).
When this is not the case we will talk about a constrained variational problem.

For instance, a problem with admissible set M of functions that are required to satisfy ∫_Ω u(x)dx = 0 will be a problem with constraints, since after adding a positive test function the condition will not be satisfied anymore. On the other hand, if M consists of ‘all’ smooth functions on Ω, possibly with restrictions on the boundary (boundary conditions), then this will be a problem without constraints.
Example. For the following given sets M the set of admissible variations TuM is specified (in these examples this set does not depend on the point u); when TuM contains the test functions (or not) the variational problem is without (or with) constraints. Observe that all these examples are ‘planes’ in the function space, affine spaces.

1. For M = { u ∈ C1(0, 1) | u(0) = 2, ∂xu(1) = 0 }, TuM = { η ∈ C1(0, 1) | η(0) = 0, ∂xη(1) = 0 } ⊃ C∞0(0, 1).

2. For M = { u ∈ C0(0, 1) | u(0) = 2, ∫ u dx = 0 }, TuM = { η ∈ C0(0, 1) | η(0) = 0, ∫ η dx = 0 }, which does not contain C∞0(0, 1).

1.2 Theory of first variation

In this section we derive the generalization of Fermat’s algorithm as announced in the introduction. It must be noted that this is in fact a local result: assuming the existence of a minimizer, we derive the anticipated result; no conditions are stated that guarantee the existence of a minimizer.

1.2.1 First variation and variational derivative

The aim is to consider the ‘derivative’ of a functional. As stated already, it is natural to use the idea of directional derivative, since then the problem is reduced to the differentiation of a scalar function of only one variable.
Hence, let u be a given function, and v an (arbitrary) variation. With this variation the original function u is embedded in a class of “varied” functions (a one-parameter family) of the form

ε ↦ u + εv.

Fixing v, and restricting the functional to this line, we get a scalar function of one variable:

ε ↦ L(u + εv).

The derivative of this function is then by definition the directional derivative, the first variation.


Definition 6 First variation
The first variation of a functional L at u in the direction v is denoted by δL(u; v) and defined as

δL(u; v) = (d/dε) L(u + εv) |ε=0.   (1.5)

In most cases, the first variation is linear in v (nonlinear in u in general). When it is linear in v (and continuous with respect to a topology on the space), it is also known as the Gateaux-derivative; it is the direct generalization of the directional derivative of a function on a finite dimensional space.

From the definition of the first variation above, it follows directly that a linear approximation of L(u + εv) is given as

L(u+ εv) = L(u) + εδL(u; v) + o(ε) (1.6)

where, here and in the following, o(ε) means terms that are of higher than first order in ε: o(ε)/ε → 0 for ε → 0.

The definition above applies to all kinds of functionals. For the density functionals that we will consider mostly, it is usually possible to perform a partial integration and to rewrite δL(u; v) as the L2(Ω)-innerproduct of v and some function which will be denoted by³ δL(u) (and which will be the direct generalization of the gradient of a function of a finite number of variables).

This may require the function u to be smooth enough, and usually a contribution consisting of an integration over the boundary appears in addition:

δL(u; v) = ∫_Ω δL(u) · v + ∫_∂Ω b(u; v)   (1.7)

If functions v are considered that vanish on the boundary, the boundary contribution vanishes identically. Therefore, we can in particular use the class of test functions C∞0(Ω) to avoid these boundary contributions. Then we have the following notion.

Definition 7 The function δL(u) on Ω defined by the condition

δL(u; η) = ⟨δL(u), η⟩ ≡ ∫_Ω δL(u) · η dx, for all η ∈ C∞0(Ω)   (1.8)

is called the variational derivative of the functional L at the point u.

³ For notational convenience we will exploit the notation δL(u), although in much of the literature the notation δL/δu is often used: δL(u) ≡ (δL/δu)(u).


It will follow from Lagrange’s Lemma 11 below that when δL(u) is continuous, (1.8) indeed defines the function δL(u) uniquely. We will give various examples in the following to demonstrate the calculation of the variational derivative.

Remark. Sometimes it is more convenient to extend the notation a little bit, and to interpret the variational derivative δL(u) not only as a function, but as a convenient notation for the linear functional δL(u; ·), and so to extend (1.8) to

δL(u; v) = ⟨δL(u), v⟩, for all v ∈ TuM   (1.9)

where then ⟨δL(u), v⟩ is nothing more than a symbolic way of writing the first variation. But note that then boundary terms are included in the interpretation. In the applications the formulae will mostly be exploited for the variational derivative, i.e. by taking for ⟨ , ⟩ the L2-innerproduct.

1.2.2 Characteristic cases

For the characteristic functionals mentioned before, the first variation and the variational derivative will be given here in illustrative notation.

Sturm-Liouville type of functionals
For

L(u) = ∫_I [ (1/2)p(x)(∂xu)² − (1/2)q(x)u² − f(x)u ] dx

we get

δL(u; v) = ∫_I [ p(x)(∂xu)(∂xv) − q(x)uv − f(x)v ] dx,

δL(u) = −∂x(p(x)∂xu) − q(x)u − f(x)   (1.10)
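A numerical illustration of (1.10) (my own sketch, not from the notes): take p = 1, q = 0, f = 0 on I = (0, 1), u(x) = sin(πx), and a variation v vanishing at the endpoints. A difference quotient of ε ↦ L(u + εv) should then match the innerproduct ∫ δL(u)·v dx with δL(u) = −∂x²u = π² sin(πx); both equal 4/π for the choices below.

```python
import math

N = 2000                           # quadrature points (trapezoid rule)
xs = [i / N for i in range(N + 1)]
h = 1.0 / N

def u(x):  return math.sin(math.pi * x)
def up(x): return math.pi * math.cos(math.pi * x)      # u'
def v(x):  return x * (1.0 - x)                        # variation with v(0) = v(1) = 0
def vp(x): return 1.0 - 2.0 * x                        # v'

def trapz(f):
    return h * (sum(f(x) for x in xs) - 0.5 * (f(0.0) + f(1.0)))

def L(eps):                        # L(u + eps*v) = (1/2) ∫ (u' + eps v')^2 dx
    return trapz(lambda x: 0.5 * (up(x) + eps * vp(x)) ** 2)

eps = 1e-5
first_variation = (L(eps) - L(-eps)) / (2 * eps)       # dL/deps at eps = 0
# variational derivative: δL(u) = -u'' = pi^2 sin(pi x)
inner_product = trapz(lambda x: math.pi ** 2 * u(x) * v(x))
assert abs(first_variation - inner_product) < 1e-4
```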

Lagrangian functionals from Classical Mechanics
For

L(q) = ∫ L(q, q̇, t) dt

(using vector notation, and a sloppy, but characteristic, way of writing the derivative of L with respect to the ‘variables’ that collect the vector q)

δL(q; ξ) = ∫ [ (∂L/∂q)·ξ + (∂L/∂q̇)·ξ̇ ] dt,

δL(q) = −(d/dt)[ ∂L/∂q̇ ] + ∂L/∂q.   (1.11)
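A worked instance of (1.11), added here for illustration (the harmonic oscillator with mass m and spring constant k is my choice of example):

```latex
% Harmonic oscillator: L(q,\dot q) = \tfrac12 m\dot q^2 - \tfrac12 k q^2
\delta L(q)
  = -\frac{d}{dt}\!\left[\frac{\partial L}{\partial \dot q}\right]
    + \frac{\partial L}{\partial q}
  = -\frac{d}{dt}\!\left[m\dot q\right] - kq
  = -m\ddot q - kq .
```

Setting δL(q) = 0 then recovers Newton’s equation mq̈ + kq = 0.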

Dirichlet type of integrals
For the more-dimensional variant of the SL-type of functionals

L(Φ) = ∫_Ω [ (1/2)p(x)|∇Φ|² − (1/2)q(x)Φ² − f(x)Φ ] dx


the first variation is given by

δL(Φ; ψ) = ∫_Ω [ p(x)∇Φ · ∇ψ − q(x)Φψ − f(x)ψ ] dx

To find the variational derivative we have to perform a ‘partial integration’ for this multiple integral. This is done by using the following basic elements from differential calculus (often we will write, for a vector a, its divergence as div(a) or as ∇ · a = div(a)). First the basic identity:

for scalar α and vector a: div(αa) = ∇α · a + α div(a),

with which we can write p(x)∇Φ · ∇ψ = div(ψp(x)∇Φ) − ψ div(p(x)∇Φ). Secondly we recall

Gauss’ theorem:

∫_Ω div(a) = ∫_∂Ω a · n

where n is the outward pointing normal to the boundary ∂Ω of Ω. Then we find:

∫_Ω p(x)∇Φ · ∇ψ = −∫_Ω ψ∇ · [p(x)∇Φ] + ∫_∂Ω ψp(x)∇Φ · n

Using the common notation ∂nΦ = ∇Φ · n for this normal derivative, we have found

δL(Φ; ψ) = ∫_Ω [ −∇ · [p(x)∇Φ] − q(x)Φ − f(x) ] ψ dx + ∫_∂Ω ψp(x)∂nΦ   (1.12)

For finding the variational derivative, we restrict to test functions (which vanish at the boundary) and find

δL(Φ) = −∇ · [p(x)∇Φ] − q(x)Φ − f(x)   (1.13)

Finally, we recall the special notation for the Laplace operator:

Laplace operator: ∆ = ∇ · ∇ = ∂x² + ∂y² + ...

so, for instance,

δ ∫ (1/2)|∇Φ|² = −∆Φ

Lagrangian functionals for evolution equations
For the characteristic example

L(u) = ∫∫_Ω [ (1/2)ρ(x)(∂tu)² − (1/2)c²(∂xu)² − (1/2)q(x)u² ] dx dt

we find

δL(u; v) = ∫∫_Ω [ ρ(x)(∂tu)(∂tv) − c²(∂xu)(∂xv) − q(x)uv ] dx dt


and after partial integration (a special case of Gauss, but now using repeated integration by parts may be easier)

δL(u) = −ρ(x)∂t²u + ∂x[ c²(∂xu) ] − q(x)u;

similarly, for the more general functional

L(u) = ∫ [ ∫_Ω L(∂tu, ∂xu, u, x, t) dx ] dt

we find

δL(u) = −(∂/∂t)[ ∂L/∂(∂tu) ] − (∂/∂x)[ ∂L/∂(∂xu) ] + ∂L/∂u.   (1.14)

(Be careful with confusing notation: the ‘partial derivatives’ like ∂/∂t here also differentiate the function u and its derivatives that appear in the arguments.)

Quadratic forms
For any quadratic form a the directional derivative is given by twice the corresponding bilinear functional:

δa(u; v) = 2b(u, v)

and when L is the operator corresponding to a, i.e. a(u) = ⟨Lu, u⟩, then the variational derivative is

δa(u) = 2Lu
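A finite dimensional sketch of this last case (the symmetric 2×2 matrix below is my example): with a(u) = u·(Lu), a difference quotient of ε ↦ a(u + εv) reproduces 2b(u, v) = 2(Lu)·v.

```python
# Check δa(u; v) = 2 b(u, v) = 2 (L u)·v for a(u) = u·(L u), L symmetric.
L = [[2.0, 1.0],
     [1.0, 3.0]]          # symmetric operator (example)

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def a(u):
    return dot(u, matvec(L, u))

u, v = [1.0, -2.0], [0.5, 1.5]
eps = 1e-6
first_variation = (a([ui + eps * vi for ui, vi in zip(u, v)])
                   - a([ui - eps * vi for ui, vi in zip(u, v)])) / (2 * eps)
assert abs(first_variation - 2 * dot(matvec(L, u), v)) < 1e-6   # δa(u) = 2Lu
```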

1.2.3 Stationarity condition

We now consider the basic optimization problem.
Let M be a smooth manifold, and, as before, let TuM denote the set of admissible variations (tangent space to M at u). Considering L(u + εv) for an admissible variation, in general this value will differ from L(u) in first order in ε, as follows from (1.6). A critical point will be defined by the fact that this difference is of higher (most times second) order.

Definition 8 A point u is called a critical point, or stationary point, of the functional L on the set M if the following holds:

δL(u; v) = 0 for all v ∈ TuM. (1.15)

Of course, as in finite dimensions, the notion of ‘critical point’ is a generalization of a local maximum or minimum:

Proposition 9 If L has a local maximal or minimal value at u, then u is a critical point of L.

This is a basic result in the theory of ‘first variation’: (1.15) gives the condition for a point to be a critical point, and this is a necessary condition (usually not sufficient) for a point to be a local maximum or minimum.


1.2.4 Euler-Lagrange equation

It is possible to translate condition (1.15) into an explicit equation for u along the following lines.
Let u be a critical point of the unconstrained variational problem for L on M. From the stationarity condition (1.15) and the fact that all test functions are admissible variations (TuM ⊃ C∞0(Ω)), it follows that certainly it must hold that

δL(u; η) ≡ ⟨δL(u), η⟩ = 0 for all η ∈ C∞0(Ω).   (1.16)

This leads to the equation for u:

Proposition 10 Euler-Lagrange equation
If u is a critical point of the unconstrained variational problem for L on M, then (provided δL(u) is a continuous function) u satisfies

δL(u) = 0. (1.17)

This equation for u is called the Euler-Lagrange equation of the functional L.

The proof of this result is an immediate consequence of the first order condition (1.16) and the following basic Lemma.

Lemma 11 Lagrange’s Lemma
Let f be a continuous function on Ω such that

∫_Ω f(x)η(x)dx = 0 for all η ∈ C∞0(Ω).

Then f vanishes identically on (the interior of) Ω: f(x) = 0 for all x ∈ Ω.

Proof. Suppose that at some interior point x∗ of Ω the function f does not vanish, say has value α > 0. Then, from continuity of f, there is a small neighbourhood of x∗ on which f doesn’t vanish, and in fact f(x) > α/2 for all x with |x − x∗| < r0, for small enough r0. Now take a test function η that is nonnegative (positive) inside, and vanishes outside, this neighbourhood. Then

∫_Ω f(x)η(x)dx > (α/2) ∫_Ω η(x)dx > 0,

contradicting the assumption.

1.2.5 Natural boundary conditions

From the vanishing of the first variation for all test functions, the Euler-Lagrange equation is obtained. For unconstrained problems, there may be more admissible variations than only test functions. In that case, for a critical point it should also hold that the boundary contribution in (1.7) vanishes:

∫_∂Ω b(u; v) = 0, for all v ∈ TuM.   (1.18)

For admissible variations different from test functions this condition will give certain conditions for u on the boundary ∂Ω. If this happens, these conditions are called natural boundary conditions: they appear as additional conditions for a critical point, not by the requirement that u should belong to M, but from


(1.18), which is a consequence of the stationarity condition (1.15).

Example. The vertical deflection u(x), with x ∈ [0, 1], of a string under the influence of an external force f(x) is governed by the principle of minimal potential energy. That is to say: the potential energy, which is given by

L(u) = ∫₀¹ [ (1/2)(∂xu)² − f(x)u ] dx,

attains the minimal possible value at the actual (physical) deflection, minimal when compared to all other virtual deflections. To specify this more, let us assume the string is fixed at the origin, and ‘free’ at the other endpoint: u(1) is arbitrary. Then, for this variational principle, the admissible elements are all possible deflections, collected in the set M:

M = { u ∈ C1(0, 1) | u(0) = 0 }.

The admissible variations are all the functions that vanish at the origin, but are otherwise unrestricted; in particular, they have arbitrary value at x = 1:

TuM = { v ∈ C1(0, 1) | v(0) = 0 };

clearly all test functions are admissible variations, so this is an unconstrained variational problem.
Now assume that u is the minimal element; then the stationarity result leads to

δL(u; v) = ∫₀¹ [ ∂xu · ∂xv − f(x)v ] dx
         = ∫₀¹ [ −∂x²u − f(x) ] v dx + [v ∂xu]_{x=0}^{x=1}
         = 0 for all v ∈ TuM.

Now, first exploit the vanishing of the first variation for test functions: then

∫₀¹ [ −∂x²u − f(x) ] η dx + [η ∂xu]_{x=0}^{x=1} = ∫₀¹ [ −∂x²u − f(x) ] η dx = 0

for all test functions η, and therefore (with Lagrange’s lemma) we find the Euler-Lagrange equation:

−∂x²u − f(x) = 0, for x ∈ (0, 1).

Now use this fact in the vanishing of the first variation for any admissible variation and, using the fact that v(0) = 0, find:

∫₀¹ [ −∂x²u − f(x) ] v dx + [v ∂xu]_{x=0}^{x=1} = 0 + v ∂xu |_{x=1} = v(1)∂xu(1).

Then this expression has to vanish (as a consequence of the vanishing of the first variation) for all admissible variations, and therefore for any value of v(1). This then leads trivially to the requirement that

∂xu(1) = 0.


Hence, the stationarity condition not only produces the Euler-Lagrange equation, but also a boundary condition, here a Neumann condition at the ‘free’ endpoint. This boundary condition was not prescribed in advance; it is an example of a natural boundary condition. (Observe that it would not have appeared if the deflection had also been prescribed in advance at the right endpoint, for instance u(1) = 0.)
From a mathematical point of view, the prescribed boundary condition at x = 0 and the natural boundary condition at x = 1 together make the problem for the Euler-Lagrange equation a well-posed Boundary Value Problem (BVP): a unique solution exists.
From a physical point of view, the natural boundary condition means that the actual deflection will be horizontal at the free endpoint. If the right endpoint had been prescribed, this could not be expected, except for a very special value of the deflection at that point.
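A small numerical sketch of this string example (the load f ≡ 1 is my choice, not from the notes): for f(x) = 1 the boundary value problem −∂x²u = 1, u(0) = 0, ∂xu(1) = 0 has solution u(x) = x − x²/2, and any admissible perturbation raises the potential energy.

```python
N = 1000
h = 1.0 / N
xs = [i * h for i in range(N + 1)]

def energy(w):                     # L(w) = ∫ [ (1/2)(w')^2 - w ] dx for f ≡ 1
    total = 0.0
    for i in range(N):             # cell-wise: forward-difference slope, midpoint value
        wp = (w[i + 1] - w[i]) / h
        wm = 0.5 * (w[i + 1] + w[i])
        total += (0.5 * wp * wp - wm) * h
    return total

u = [x - 0.5 * x * x for x in xs]          # minimizer; note u'(1) = 0 (natural BC)
assert abs((u[N] - u[N - 1]) / h) < 1e-3   # discrete check of ∂x u(1) ≈ 0

E0 = energy(u)
v = [x * x for x in xs]                    # admissible variation: v(0) = 0, v(1) ≠ 0
for eps in (0.1, -0.1, 0.01):
    pert = [ui + eps * vi for ui, vi in zip(u, v)]
    assert energy(pert) > E0               # any admissible perturbation raises the energy
```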

1.2.6 Weak formulation and Interface conditions

We have seen that the variational derivative of a density functional follows from the first variation by applying a partial integration, restricted to test functions to avoid contributions at the boundary:

δL(u; η) ≡ ⟨δL(u), η⟩ for all η ∈ C∞0(Ω).

Without having stated it explicitly, this assumes some additional regularity of the term to be integrated. For instance, looking at

∫ ∂xu ∂xη dx = −∫ η ∂x²u for each η ∈ C∞0   (1.19)

it is seen that the lhs is well defined for (piecewise) differentiable functions u, while a simple interpretation of the rhs requires the function u to be (piecewise) twice differentiable. That implies that when writing down the Euler-Lagrange equation δL(u) = 0, in general some smoothness assumptions had to be made about the extremal function. On the other hand, we have actually seen that the first variation is the basic result from differentiation in a direction, so without partial integration that result is valid. So we could just as well interpret the result after partial integration, e.g. −∂x²u, even in cases when it is not a continuous function, by requiring (1.19) to hold! This turns out to be a very fruitful idea to generalize the notion of differentiability to functions that are not differentiable in the classical way. The result is then often called a generalized derivative, or distributional derivative (in the theory of generalized functions). When we interpret a BVP in this way, we often talk about the variational formulation of the problem, or about the weak formulation. It is also the basis of Finite Element methods, where a given equation is not interpreted pointwise, but integrated against spline functions, so that a weak formulation results.

Example.

1. Consider the Heaviside unit step function:

H(x) = 0 for x < 0,   H(x) = 1 for x > 0.


The derivative at the point x = 0 cannot be defined in the classical sense, but in the distributional sense it is Dirac’s delta function:

(d/dx)H(x) = δDir(x),

because for integration intervals that contain the origin it holds that

∫ H(x)∂xη(x) = −η(0) = −∫ δDir(x)η(x) for each η ∈ C∞0.

In the same way, derivatives of Dirac’s delta function can be uniquely defined.

2. The differential equation

−∂x²u = H(x)

has the corresponding weak formulation

∫ ∂xu ∂xη = ∫ H(x)η(x) for each η ∈ C∞0.

3. Now consider the following problem, characteristic for problems in integrated optics where material properties change abruptly:

∂x²u + (k² + αH(x))u = 0

with k, α constants. At each side of the origin the solutions are simple, but the jump in the coefficient at the origin makes it at first sight unclear how to connect these solutions. It is natural to require the solution to be continuous at x = 0. The jump in the coefficient will in general (when u(0) ≠ 0) lead to a finite jump in the second derivative. And this can only be true if the first derivative is continuous (if it had a jump, the second derivative would be a Dirac delta function). Hence, for a unique interpretation, this differential equation has to be accompanied by so-called interface conditions at the point where the equation has to be interpreted in a generalized sense:

u and ∂xu continuous at x = 0.

In solving the equation explicitly, and in the design of discretization schemes, these interface conditions are essential to match the solutions on the two half-lines together. It should be noted that the weak formulation

∫ [ (∂xu)(∂xη) − (k² + αH(x))uη ] dx = 0 for each η ∈ C∞0

for continuous functions u automatically leads to the continuity of the derivative (split the integral as the sum of integrals over the two half-lines, and observe the contributions at the origin from the two sides). So continuity of the derivative is a direct consequence of the variational formulation; it does not have to be required separately. This can be exploited in the design of numerical schemes that start with the discretization of the corresponding functional

∫ [ (∂xu)² − (k² + αH(x))u² ] dx.
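A small numerical sketch of this matching (not from the text; illustrative values k = 2, α = 3, and left branch u = cos(kx) are assumed): the interface conditions fix the right branch uniquely, and both u and ∂xu are then continuous at the origin.

```python
import numpy as np

# Matched solution of  u'' + (k^2 + alpha*H(x)) u = 0  across x = 0.
# Left of the origin the local wavenumber is k, right of it kappa = sqrt(k^2 + alpha)
# (assuming k^2 + alpha > 0).  Taking u = cos(k x) for x < 0, the interface
# conditions u(0-) = u(0+) = 1 and u'(0-) = u'(0+) = 0 give A = 1, B = 0
# in the right branch A cos(kappa x) + B sin(kappa x).
k, alpha = 2.0, 3.0
kappa = np.sqrt(k**2 + alpha)

def u(x):
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, np.cos(k * x), np.cos(kappa * x))

h = 1e-6
jump_u = u(h) - u(-h)                                   # continuity of u
jump_du = (u(2*h) - u(h)) / h - (u(-h) - u(-2*h)) / h   # continuity of u_x
print(jump_u, jump_du)
```

Note that the second derivative does jump at x = 0 (by αu(0)), exactly as the differential equation requires.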

18 CHAPTER 1. BASIC CALCULUS OF VARIATIONS

1.3 Principle of Minimal Potential Energy

For time-independent problems, or for stationary states of time-dependent problems, the actual physical state may be described by a principle of minimum (potential) energy, which means the following:

• there is a set of admissible, physically acceptable, states M,

• there is a (potential) energy functional E that assigns a value ("energy-like") E(u) to each state u ∈ M,

• the actual physical state is the state u that minimizes E on M.

We present several examples to illustrate the applicability.

Dirichlet’s principle

In a domain Ω ⊂ R³ with an electrostatic field E, the potential energy is ∫Ω (1/2)E². Since rot E = 0, the field is conservative: E = −∇φ for an electromagnetic potential φ. In the presence of a charge distribution ρ in the domain, the total electrostatic energy is given by

E(φ) = ∫Ω [ (1/2)|∇φ|² − ρ(x)φ ] dx.

Dirichlet's principle states that the actual field is such that it minimizes the total energy among all potentials that satisfy certain boundary conditions. Two types of boundary conditions are usually considered. When the boundary consists of two parts ∂Ω = ∂Ω1 ∪ ∂Ω2, they can be described as

• ∂Ω1 is conducting, i.e. E · τ = 0 for each tangent vector τ; this is achieved by requiring φ = 0 on the boundary;

• ∂Ω2 is insulating: E · n = 0 on the boundary. This implies that the normal derivative of φ vanishes on the boundary: ∂nφ = 0.

The minimization problem

φ ∈ Min { E(φ) | φ(x) = 0 for x ∈ ∂Ω1 }

leads to the boundary value problem

−∆φ = ρ(x) in Ω,

φ = 0 on ∂Ω1,

∂nφ = 0 on ∂Ω2.

Observe that the Neumann condition on ∂Ω2 arises as a natural boundary condition! Also note that when ∂Ω1 is empty (only Neumann conditions) a solution can only exist if ∫ρ = 0. Inhomogeneous Dirichlet and Neumann boundary conditions can be obtained also: the Dirichlet conditions by prescribing the potential, the Neumann condition by adding a suitable boundary functional to the energy.

Exercise.


1. Show that a critical point of

Crit { E(φ) − ∫∂Ω2 ψ2 φ | φ(x) = ψ1(x) for x ∈ ∂Ω1 }

satisfies

−∆φ = ρ(x) in Ω,

φ = ψ1 on ∂Ω1,

∂nφ = ψ2 on ∂Ω2.

2. Show that there exists at most one critical point, and that, if it exists, it is in fact a minimizer.

3. When ∂Ω1 is empty, derive the necessary condition between ψ2 and ρ for a solution to exist. How is this condition related to the finiteness of the minimum value, i.e. to the boundedness from below, of the functional?
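Dirichlet's principle can be tried out numerically. The sketch below (an illustration, not from the text) takes the 1D analog on Ω = (0, 1) with ρ(x) = sin(πx) and homogeneous Dirichlet conditions: stationarity of the discretized energy reproduces the finite-difference Poisson equation.

```python
import numpy as np

# 1D sketch of Dirichlet's principle: minimize E(u) = ∫ (1/2) u'^2 − ρ u dx
# over u with u(0) = u(1) = 0, discretized on n interior grid points.
n = 200
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
rho = np.sin(np.pi * x)

# Stiffness matrix of the quadratic part of E: (1/h^2) tridiag(-1, 2, -1).
A = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2

# Stationarity of the discretized energy, A u = rho, is exactly the
# finite-difference Poisson equation −u'' = ρ with Dirichlet conditions.
u = np.linalg.solve(A, rho)

u_exact = np.sin(np.pi * x) / np.pi**2   # solves −u'' = sin(πx), u(0)=u(1)=0
print(np.max(np.abs(u - u_exact)))       # small discretization error
```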

Bars and plates, strings and membranes

An elastic medium is characterized by the fact that deformations from a given rest state require a certain amount of energy. In general, the local energy density will depend on the extension as well as on the curvature of the medium. For simplicity we first restrict to 1D elastic media with a rest state along the x-axis, and deformations in a plane. Two idealizations are

• strings: completely flexible, but extension requires energy,

• bars: fixed length (inextensible), but bending requires energy.

Theory of bars

For a bar with (fixed) length ℓ it is natural to use the arclength as parameter and to describe its position in the plane as

r(s) = (x(s), y(s)), or rs(s) = (cos θ(s), sin θ(s)).

The curvature k(s) at a point s is defined (up to sign) by

k(s)² = |rss|² ≡ |θs(s)|².

The material properties can be described with a local energy density E = E(s, k(s)) (depending on position and curvature); in the presence of an external additional potential energy V = V(r), the total energy is then given by

∫₀^ℓ [ E(s, k(s)) + V(r(s)) ] ds.


Usually, E is an even function of k and minimal at k = 0. Linear elasticity theory (assuming small curvatures) then approximates E like E(s, k) ≈ E(s, 0) + (1/2)σ(s)k², leading to an approximate energy functional

∫₀^ℓ [ (1/2)σ(s)|rss|² + V(r(s)) ] ds,

with Euler-Lagrange equation

∂²s[σ(s)rss] + ∇rV(r) = 0.

When looking for small vertical deformations u from the rest state along the x-axis, x instead of s is used as the independent variable. (Hence a laboratory coordinate x instead of the material coordinate s; note that then, with x ∈ [0, ℓ], the bar slightly extends.) If f is a prescribed vertical force (≈ ∂yV(x, 0)), the resulting energy functional becomes

∫₀^ℓ [ (1/2)σ(x)(uxx)² − f(x)u(x) ] dx.

The Euler-Lagrange equation reads

∂²x[σuxx] = f(x) for x ∈ (0, ℓ).

Concerning boundary conditions, two types can be distinguished:

• supported endpoint: only the position is prescribed, for instance u(0) = 0;

• inclined endpoint: a more restricted condition for which both the position and the angle are prescribed, for instance u(0) = 0, ux(0) = 1.

When a bar is supported at one endpoint, say x = 0, a natural boundary condition arises in addition to the prescribed position:

u(0) = 0, σ(0)uxx(0) = 0;

at an inclined end point, no natural boundary conditions arise.

Theory of strings

In a string, both longitudinal and transverse displacements of particles will occur. If one restricts to small deflections from a state of rest along the x-axis, the change in length is to lowest order given by

√(1 + ux²) − 1 ≈ (1/2)ux².

With σ(x) the tension in the undeformed state, and f an external vertical force, the approximated potential energy is given by

∫₀^ℓ [ (1/2)σux² − f(x)u ] dx,

leading to the Euler-Lagrange equation

−∂x[σ(x)ux] = f(x) for x ∈ (0, ℓ).

When the deflection is not prescribed at an endpoint, a natural boundary condition appears: σux = 0.
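The emergence of the natural boundary condition can be checked numerically. The sketch below (an illustration with assumed constant σ = 1, f = 1, ℓ = 1) minimizes the discretized string energy with only u(0) = 0 imposed; the slope at the free end then comes out small of order h, i.e. σux(ℓ) = 0 emerges in the limit without being prescribed.

```python
import numpy as np

# Natural boundary condition for the string functional ∫ (1/2) σ u_x^2 − f u dx
# with only u(0) = 0 prescribed (free end at x = 1); σ = 1, f = 1 assumed.
n = 400                      # number of intervals
h = 1.0 / n
u = np.zeros(n + 1)          # nodes x_j = j h; u_0 = 0 is fixed

# Discrete energy: Σ (1/2)(u_{j+1}-u_j)^2/h − Σ f u_j (trapezoid weights).
# Setting its gradient w.r.t. u_1..u_n to zero gives a linear system.
A = np.zeros((n, n))
b = np.full(n, h)            # interior load, weight h
for i in range(n):
    A[i, i] = 2.0 / h
    if i > 0:
        A[i, i - 1] = -1.0 / h
    if i + 1 < n:
        A[i, i + 1] = -1.0 / h
A[n - 1, n - 1] = 1.0 / h    # last node touches only one spring
b[n - 1] = h / 2             # trapezoid weight at the free end
u[1:] = np.linalg.solve(A, b)

slope_end = (u[n] - u[n - 1]) / h
print(slope_end)             # O(h) small: u_x(1) = 0 emerges naturally
```

The interior nodes reproduce the Euler-Lagrange equation −u'' = 1, whose solution with these conditions is u = x − x²/2.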


2D-elasticity: plates and membranes

2D elasticity is a direct generalization of 1D elastica; in the linear approximation, the analog of a bar is a plate, with potential energy

∫Ω [ (1/2)σ(∆u)² − f(x)u ] dx;

the analog of a string is a membrane, with approximate potential energy

∫Ω [ (1/2)σ|∇u|² − f(x)u ] dx.

Boundary conditions, prescribed and natural, can be of the same type as in the 1D case.

1.4 Dynamical Systems and Evolution Equations

In this section we consider various dynamical systems with some variational structure. We start with the classical systems from Classical Mechanics, and end up with Poisson systems, the most general structure and the one that seems easiest to apply to partial differential equations. Also dissipative systems will be considered from a variational point of view.

1.4.1 Classical Mechanics

Problems from Classical Mechanics deal with the motion of a finite number of point-masses, usually with interaction between them and in a force field. These systems are described by ordinary differential equations for the position of each of the masses. Roughly speaking, these equations are in the form of Newton's law of force. But, and that is characteristic for Classical Mechanics, the systems are 'conservative' in general: there is no dissipation. In many cases some 'total energy' quantity is conserved. From the conservative nature it follows that the equations of motion themselves have a variational structure: the actual motions are critical points of a certain functional, the 'action functional'. We consider two ways of describing such systems: the Lagrangian and the Hamiltonian way. It may be observed that the integrals are integrals over time, from an initial to a final time. Boundary conditions are NOT considered in general, meaning that at these moments the actual configuration is supposed to be known; note that this (a boundary-value formulation, without, however, specifying the positions) is very different from an initial-value problem.

Lagrangian systems

First the general definition, with the characteristic nomenclature, then simple examples.

Definition 12 A dynamical system with position vector q ∈ RN is a Lagrangian system if a Lagrangian L(q, q̇, t) can be given such that critical points of the action functional (or Lagrangian functional)

L(q) = ∫ L(q, q̇, t) dt


(the corresponding Euler-Lagrange equations) are the dynamic equations of the system. This variational principle is often called the action principle.

Consider the motion of a single mass-point of mass m moving along the x-axis, its position at time t denoted by q(t). Consider the Lagrangian functional

L(q) = ∫ [ (1/2)m q̇² − V(q, t) ] dt,

which assigns to a certain trajectory t → q(t) between initial and final time the value determined by L. The action principle then states that the actual, physical trajectory is the one that is a critical point of the Lagrangian functional; more precisely: given initial position q(ti) = P and final position q(tf) = Q, the admissible trajectories are those that connect these points in the specified time interval, and the admissible variations are deformations of the trajectory that vanish at the initial and final time. The Euler-Lagrange equation is given by

m q̈ + ∂V/∂q = 0.

This is a simple form of Newton's equation for a system of one degree of freedom: q̈ is the acceleration, and −dV/dq is the 'force', which is here the derivative of the so-called potential energy function V(q); such force fields are called 'conservative'. Note that the Lagrangian density is the difference between kinetic energy and potential energy; this is typically the case for systems from Classical Mechanics. Also, consider the total energy E(q, q̇):

E(q, q̇) := (1/2)m q̇² + V(q, t)

and calculate directly that for solutions of the equations it holds that

d/dt E(q, q̇) = m q̇ q̈ + (∂V/∂q) q̇ + ∂V/∂t = [ m q̈ + ∂V/∂q ] q̇ + ∂V/∂t = ∂V/∂t.

Hence, if the Lagrangian density does not explicitly depend on time, i.e. ∂V/∂t = 0, the total energy is conserved:

d/dt E(q, q̇) = 0 if ∂V/∂t = 0.

Again, this is a property that holds in more general systems as well, as we shall see. For such systems of one degree of freedom, energy conservation allows phase-plane analysis: in the phase plane (q, q̇) the motion of the particle is restricted to a level set of the total energy E(q, q̇) = E0, where the value E0 is determined by specifying an initial position and velocity.

Example. Phase-plane analysis for oscillators⁴

We now consider simple potential energy functions, and find the characteristic behaviour of the solutions from the phase-plane analysis. Usually this type of equations is called 'oscillator equations': they describe the motion of a mass point attached to a spring that exerts a force on the particle depending on its deviation from the rest position at the origin. The simplest case of a linear restoring force leads to periodic motions around the origin; nonlinear effects will disturb these motions a little bit near the origin, but may have a large influence for large-amplitude motions; this can be seen from the phase-plane analysis.

⁴We will use phase-plane analysis in the investigation of soliton profiles in the KdV and NLS equations later on.

1. Harmonic oscillator: Show that the trajectories are ellipses for the well-known linear oscillator, of which all solutions can be written down explicitly:

m q̈ + ω² q = 0.

2. Nonlinear oscillator with quadratic nonlinearity:

m q̈ + ω² q + α q² = 0.

3. Nonlinear oscillator with cubic nonlinearity, Duffing's equation:

m q̈ + ω² q + γ q³ = 0.
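As a quick numerical illustration of the phase-plane picture (a sketch with assumed values m = 1, ω = 1, γ = 0.5, not taken from the text), a trajectory of Duffing's equation stays on a level set of the total energy E(q, q̇) = (1/2)q̇² + (1/2)ω²q² + (1/4)γq⁴:

```python
import numpy as np

# Duffing oscillator q'' + ω² q + γ q³ = 0 (m = 1), integrated with classical
# RK4; the phase-plane orbit (q, q') stays on a level set of the energy.
omega, gamma = 1.0, 0.5

def rhs(y):
    q, p = y
    return np.array([p, -omega**2 * q - gamma * q**3])

def energy(y):
    q, p = y
    return 0.5 * p**2 + 0.5 * omega**2 * q**2 + 0.25 * gamma * q**4

y = np.array([1.0, 0.0])     # initial position and velocity fix E0
E0 = energy(y)
dt = 1e-3
for _ in range(20000):       # integrate up to t = 20
    k1 = rhs(y)
    k2 = rhs(y + 0.5 * dt * k1)
    k3 = rhs(y + 0.5 * dt * k2)
    k4 = rhs(y + dt * k3)
    y = y + (dt / 6) * (k1 + 2 * k2 + 2 * k3 + k4)

print(abs(energy(y) - E0))   # tiny: the orbit lies on the level set E = E0
```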

Exercise. Generalization to systems of N degrees of freedom.

The above example deals with a system of one particle moving along a line: one degree of freedom. Now we consider more degrees of freedom, either because there are more particles with a one-dimensional motion, or one particle moving in more dimensions.

1. Consider a Lagrangian density for position vector q = q(t) ∈ RN, given by a smooth function, such that the Lagrangian functional reads

L(q) = ∫ L(q, q̇, t) dt.

Show that the Euler-Lagrange equations are now N equations that in vector form are written as

−d/dt [ ∂L/∂q̇ ] + ∂L/∂q = 0.

Specialize this result for a system of N mass-points with mutual interactions between them:

L = Σ_{k=1}^{N} (1/2) mk q̇k² − V(q1, ..., qN).

2. Consider one particle of mass m that moves in the (x, y) plane: t →(x(t), y(t)), under the influence of a conservative force field from a poten-tial energy function V (x, y).

(a) Formulate this case in the notation of the exercise above.

(b) Take the special case that the potential energy does not depend ontime; show energy conservation.


(c) Suppose the potential energy depends only on the distance of the particle from the origin,

V(x, y) = W(√(x² + y²)),

for which ∇V is a so-called 'central' force field. Write down the equations.

(d) Introduce polar coordinates (r, φ) in the plane, so that the trajectory is described by t → (r(t), φ(t)). Write down the equations of motion in terms of (r(t), φ(t)) by transforming the original equations in Cartesian coordinates. Do the same for the total energy.

(e) In the last case, the equations can also be obtained by performing the transformation in the Lagrangian: find the transformed Lagrangian in terms of (r(t), φ(t)) and (ṙ(t), φ̇(t)). Take for this Lagrangian the action principle and find the Euler-Lagrange equations for (r(t), φ(t)). Conclude that the same equations are found.

(f) Observing the equations of motion in polar coordinates, besides the total energy another constant of the motion is found, the angular momentum; what is the physical interpretation? Note that this is a consequence of the fact that the variable φ does not appear explicitly in the Lagrangian: a missing coordinate in the Lagrangian is called 'cyclic' and automatically gives rise to a corresponding constant of the motion; show this in general.

(g) Take as a comparable case the motion of a spherical pendulum: write down the Lagrangian and find the equations of motion, the energy, and the angular momentum conservation.

3. Define the total energy of a system with Lagrangian L(q, q̇, t) by

E(q, q̇, t) = q̇ · ∂L/∂q̇ − L.

Show that there is energy conservation when L does not depend on t explicitly:

d/dt E(q, q̇, t) = 0 for solutions if ∂L/∂t = 0.

4. Consider, as a specific example of an infinite-dimensional Lagrangian system for functions u(x, t), the Lagrangian density

L(u, ∂tu) = ∫ [ (1/2)(∂tu)² − (c²/2)(∂xu)² − fu ] dx;

note the notation: L(u, ∂tu) is, as far as it concerns the dependence on x, a functional, and therefore it actually depends on two functions, here denoted by u and ∂tu; this makes sense after considering the action functional, which really maps the function u of (x, t) into the reals:

∫ L(u, ∂tu) dt.


The Euler-Lagrange equation is a forced wave equation

∂²t u = c² ∂²x u − f,

which is supplemented by prescribed and/or natural boundary conditions, depending on the conditions on the functions in the spatial variables. The analog with more spatial dimensions is

L(u, ∂tu) = ∫ [ (1/2)(∂tu)² − (c²/2)|∇u|² − fu ] dx,

with Euler-Lagrange equation

∂²t u = c² ∆u − f.

Classical Hamiltonian systems

One way to introduce Hamiltonian systems is as an alternative description of Lagrangian systems, although the correspondence is not one-to-one. The observation is that the Euler-Lagrange equations for a Lagrangian L are typically second order in time. If a first-order-in-time description is preferred (for which there may be good reasons, for instance from a conceptual point of view of the Initial Value Problem, IVP), this can be done by introducing more dependent variables: besides the position vector q ∈ RN one introduces momentum-type variables p ∈ RN and looks for a first-order-in-time system of equations in the pair (q, p) ∈ RN × RN. Starting with a Lagrangian system, this can often be done in a systematic way using the Legendre transformation. Somewhat more generally, we define

Definition 13 A dynamical system is called a classical Hamiltonian system if the dynamical equations can be described with pairs of variables (q, p) ∈ RN × RN in the so-called phase space, and with a Hamiltonian H(t, q, p) : R × RN × RN → R, such that the dynamical equations are found from the canonical action principle, which means as the critical points of the canonical action functional

Ac(q, p) = ∫ [ p(t) · ∂tq(t) − H(t, q(t), p(t)) ] dt,

and hence satisfy the so-called Hamilton equations

∂tq = ∂H/∂p, ∂tp = −∂H/∂q.    (1.20)

In many problems from classical and continuum mechanics, the Hamiltonian is the sum of kinetic and potential energy, i.e. the total energy; this is different from the Lagrangian, which often is the difference between kinetic and potential energy. Almost immediately seen from the equations is that for autonomous Hamiltonians there is energy conservation:

when ∂H/∂t = 0 the Hamiltonian is conserved: d/dt H(q, p) = 0.

Exercise.


1. As a specific example of a finite-dimensional system, verify that Newton's equations, given in Lagrangian form by the system of second-order equations with mass matrix M and potential energy V(q),

M q̈ = −∂qV(q),

are also obtained from the Hamiltonian

H(q, p) = (1/2) p · M⁻¹p + V(q),

since then Hamilton's equations read

∂tq = M⁻¹p, ∂tp = −∂qV(q).

Observe that H is indeed the total energy.

2. For plane fluid motions, when the flow is incompressible (divergence-free), the Eulerian velocity field v(x, y) can be written with a stream function ψ as

v(x, y) = (ψy, −ψx).

Observing that the fluid velocity is the particle velocity,

v(x, y) = (∂tx, ∂ty),

it is clear that the particle dynamics is a Hamiltonian system, with the stream function as Hamiltonian. This is an example for which a description with a Lagrangian is not possible in general.

3. Consider, as a specific example of an infinite-dimensional Hamiltonian system for functions u(x, t), p(x, t), the Hamiltonian

H(u, p) = ∫ [ (1/2)p² + (c²/2)(∂xu)² + fu ] dx.

The Hamiltonian is a functional on the space of spatially dependent pairs of functions (u, p). The canonical action functional now reads

∫ { ∫ p ∂tu dx − H(u, p) } dt

and Hamilton's equations are

∂tu = δpH(u, p) = p, ∂tp = −δuH(u, p) = c² ∂²xu − f,

describing the same forced wave equation as treated earlier in the Lagrangian setting.
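This infinite-dimensional system can be explored by semi-discretizing in x; the sketch below (an illustration, not from the text, with assumed c = 1, f = 0, domain (0, π), and u = 0 at both ends) integrates the resulting canonical system with a leapfrog scheme and checks that the discrete Hamiltonian stays nearly constant:

```python
import numpy as np

# Hamiltonian form of the wave equation, ∂t u = p, ∂t p = c² ∂x² u (f = 0),
# on (0, π) with Dirichlet conditions; finite differences in x, leapfrog
# (Störmer-Verlet) in t.  Grid, step and initial data are illustrative.
n, c, dt = 100, 1.0, 1e-3
h = np.pi / (n + 1)
x = np.linspace(h, np.pi - h, n)

def lap(u):                      # Dirichlet Laplacian (u = 0 outside)
    up = np.concatenate(([0.0], u, [0.0]))
    return (up[2:] - 2 * up[1:-1] + up[:-2]) / h**2

def H(u, p):                     # discrete ∫ (1/2)p² + (c²/2)(∂x u)² dx
    ux = np.diff(np.concatenate(([0.0], u, [0.0]))) / h
    return h * (0.5 * np.sum(p**2) + 0.5 * c**2 * np.sum(ux**2))

u, p = np.sin(x), np.zeros(n)    # standing-wave initial data
E0 = H(u, p)
for _ in range(5000):            # integrate to t = 5 with leapfrog
    p = p + 0.5 * dt * c**2 * lap(u)
    u = u + dt * p
    p = p + 0.5 * dt * c**2 * lap(u)

print(abs(H(u, p) - E0) / E0)    # small relative energy drift
```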


1.4.2 Poisson systems

Another way of writing Hamilton's equations will lead the way to a very fruitful genuine generalization. The idea is to recognize that the classical Hamilton equations can be written as

∂t (q, p)ᵀ = ( 0 IN ; −IN 0 ) (∂qH, ∂pH)ᵀ,

where IN is the identity matrix in RN. Using the notation u = (q, p) and J = ( 0 IN ; −IN 0 ), this can be written in the compact form

∂tu = J∇H(u).

J is called the standard symplectic matrix. It is skew-symmetric, J* = −J, and also invertible: J⁻¹ = −J, J² = −I2N. Observe that the skew-symmetry immediately implies the conservation of H (when not depending explicitly on time):

∂tH(u) = ∇H(u) · ∂tu = ∇H(u) · J∇H(u) = 0

the last equality holding since for a skew-symmetric matrix a · Ja = 0 for any vector a.
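A small numerical sketch of this conservation argument (illustrative random symmetric A and initial state, not from the text): for the quadratic Hamiltonian H(u) = (1/2)u·Au the system ∂tu = J∇H(u) = JAu is linear, and H is conserved along solutions.

```python
import numpy as np

# H(u) = (1/2) u·Au with A symmetric positive definite; ∂t u = J A u.
# Skew-symmetry of J makes H a first integral; we check this with RK4.
N = 2
J = np.block([[np.zeros((N, N)), np.eye(N)],
              [-np.eye(N), np.zeros((N, N))]])
rng = np.random.default_rng(1)
B = rng.standard_normal((2 * N, 2 * N))
A = B @ B.T + np.eye(2 * N)          # symmetric positive definite
H = lambda u: 0.5 * u @ A @ u

f = lambda u: J @ (A @ u)            # right-hand side ∂t u = J ∇H(u)
u = rng.standard_normal(2 * N)
H0 = H(u)
dt = 1e-3
for _ in range(5000):                # classical RK4 up to t = 5
    k1 = f(u)
    k2 = f(u + 0.5 * dt * k1)
    k3 = f(u + 0.5 * dt * k2)
    k4 = f(u + dt * k3)
    u = u + (dt / 6) * (k1 + 2 * k2 + 2 * k3 + k4)

print(abs(H(u) - H0))                # tiny: H is conserved along the flow
```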

This description motivates a generalization of Hamiltonian systems to so-called Poisson systems. These are systems that have a conservative structure and appear quite regularly in systems from classical mechanics, and especially in continuous systems from mathematical physics. The structure of the equation implies that there is an integral, the Hamiltonian of the system (which is most times the total energy). Here we investigate only the simplest properties, and in particular generalize the symplectic matrix J from above to any skew-symmetric operator. However, we do restrict to operators that are 'constant', i.e. do not depend on the variables of the system; this makes the theory much simpler, but also less interesting from a geometric point of view⁵.

⁵In particular, the famous Jacobi identity will be automatically satisfied for constant operators, and the many Lie-algebraic consequences are not considered here, nor the relation between integrals in (Poisson) involution and the commutation property of their flows; see e.g. [13].

The following definition applies equally well for finite- as for infinite-dimensional systems; the wording that we use is for the infinite-dimensional case.

Definition 14 A Poisson system in the so-called state space U is a dynamical system for which the equations of motion are of the form

∂tu = Γ δH(u)    (1.21)

with:

• H : U → R a functional, called the Hamiltonian,

• δH the variational derivative, defined with the inner product with respect to which the operator Γ below is skew-symmetric,


• Γ a linear and skew-symmetric operator (sometimes called the structure map):

⟨ζ, Γη⟩ = −⟨Γζ, η⟩ for each pair ζ, η ∈ U.

Related to the specific operator Γ one defines the Poisson bracket of two functionals as

{F, G} := ⟨δF, ΓδG⟩.

In the cases that we will consider, the inner product is the standard L2 inner product of the function space U, or the standard inner product when U is finite dimensional; in the last case, δH(u) = ∇H(u).

We will now consider the simplest dynamic properties of Poisson systems.

Equilibria

If Γ is not degenerate (i.e. invertible), the only equilibria are elements u with

δH(u) = 0,    (1.22)

i.e. the critical points of H are equilibrium solutions:

u ∈ Crit { H(u) | u ∈ U }.    (1.23)

Remark. Note that these are the same as for the gradient system ∂tu = −δH(u), which equation may be used to construct the special critical points that are the minimizers of H. If Γ is degenerate, elements u such that δH(u) ∈ ker(Γ) will be other equilibria.

Diagnostics

Any functional F evolves according to

∂tF(u) = {F, H}(u).

In particular, the next result holds.

Proposition 15 For the Poisson system ∂tu = ΓδH(u), a functional I is a first integral iff I Poisson-commutes with H, meaning {I, H} = 0:

∂tI(u) = 0 iff {I, H} = 0.

Since {H, H} = 0 from skew-symmetry, the Hamiltonian H itself is a first integral:

∂tH(u) = 0.

Canonical Hamiltonian systems

The standard example of a Poisson system is the classical Hamiltonian system we started with as motivation; it is often called a canonical Hamiltonian system. The canonical Poisson bracket, with J as structure map, is given by

{F, G} = ∇F · J∇G.


Complex canonical structure

Associated to the real canonical structure described above there is a natural complex structure. The essential relation is that the symplectic matrix J in real space corresponds to multiplication by the imaginary unit i in complex space. Briefly, for z ∈ Cn, let z̄ denote the complex conjugate:

if z = q + ip ∈ Cn, (q, p) ∈ Rn × Rn, then z̄ = q − ip.

The inner product of z1 and z2, with zk = qk + ipk, reads

⟨z1, z2⟩C = Re(z1 · z̄2) = q1 · q2 + p1 · p2,

where Re denotes the real part. Real-valued functions F̃ on Cn are related to real-valued functions F on R2n by F̃(z) = F̃(q + ip) = F(q, p), and for the derivative it holds that

dF̃(z) = ∂qF + i ∂pF.

Hence the Poisson bracket

{F̃, G̃}(z) := ⟨dF̃(z), −i dG̃(z)⟩C    (1.24)

is naturally related to the real canonical bracket:

{F̃, G̃} = ∂qF · ∂pG − ∂pF · ∂qG = {F, G}.

The state equation for a Poisson system with Hamiltonian H is

∂tz = −i dH(z).    (1.25)

Example. A system of n uncoupled harmonic oscillators with (real) frequencies ω1, . . . , ωn is described in complex variables, with Hamiltonian

H(z) = Σk (1/2) ωk |zk|², z = (z1, . . . , zn) ∈ Cn,

by

∂tzk = −i ωk zk, 1 ≤ k ≤ n.
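Since these equations decouple, the flow is an explicit rotation zk(t) = e^{−iωk t} zk(0), so each |zk| and hence the Hamiltonian is conserved exactly. A small sketch (with assumed illustrative frequencies and initial state):

```python
import numpy as np

# Complex canonical form of n uncoupled oscillators: ż_k = −i ω_k z_k.
# The exact flow rotates each z_k in the complex plane, preserving |z_k|²
# and therefore the Hamiltonian H = Σ (1/2) ω_k |z_k|².
omega = np.array([1.0, 2.0, np.sqrt(2)])     # illustrative frequencies
z0 = np.array([1 + 1j, 0.5j, 2.0])           # illustrative initial state

def flow(z, t):
    return np.exp(-1j * omega * t) * z       # z_k(t) = e^{−i ω_k t} z_k(0)

H = lambda z: np.sum(0.5 * omega * np.abs(z)**2)
z1 = flow(z0, 7.3)
print(abs(H(z1) - H(z0)))                    # zero up to round-off
```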

1.4.3 Evolution equations (Nonlinear Wave equations)

We now briefly describe the particular Poisson structure that is found for various types of nonlinear wave equations. More details are given in Appendices A and B.

Boussinesq Equations

Equations for surface waves on a layer of fluid, and for optical pulses, that depend on one spatial variable but allow waves running in both directions, often have the form

∂tu = −∂x δηH(u, η), ∂tη = −∂x δuH(u, η),


i.e. written in the characteristic Poisson form:

∂t (u, η)ᵀ = −( 0 ∂x ; ∂x 0 ) (δuH(u, η), δηH(u, η))ᵀ,

where u is a velocity-type variable, and η is the surface elevation. The Hamiltonian usually consists of a part that determines the linear dispersive properties, and a part to account for the nonlinearity in the equations.

KdV (Korteweg - de Vries) Equation

Equations for surface waves on a layer of fluid, and for optical pulses, that depend on one spatial variable and are restricted to waves running mainly in one direction, often have the form

∂tη = −∂x δηH(η),

where η is the surface elevation. Again, the Hamiltonian usually consists of a part that determines the linear dispersive properties, and a part to account for the nonlinearity in the equations.

NLS (Nonlinear Schrödinger) Equation

When modulations of a linear monochromatic wave are studied for KdV-type equations, the complex amplitude A(x, t) satisfies an NLS-type equation that is of the form of a complex infinite-dimensional Hamiltonian system:

∂tA = iδH(A),

with a Hamiltonian that accounts for linear dispersive and nonlinear effects.

1.4.4 Gradient systems (Steepest descent)

Gradient systems usually describe systems with some dissipative character. These systems can also be used in a constructive way to calculate minimizers of a given smooth functional.

Definition 16 A gradient system is a dynamical system in the state space U of the form

∂tu = −δH(u),

where H : U → R is a given functional.

Remark. It is possible to consider, somewhat more generally, equations of the form

∂tu = −S δH(u),

with S a linear and self-adjoint operator, ⟨ζ, Sη⟩ = ⟨Sζ, η⟩, that is positive definite: ⟨ζ, Sζ⟩ > 0 for ζ ≠ 0. Then much of the following is easily generalized to this case. Observe the essential difference with Poisson systems: instead of a skew-symmetric structure map Γ we now have a symmetric operator S. Observe that equilibrium solutions of a gradient system arise from a variational principle:


Proposition 17 For a gradient system ∂tu = −δH(u), the dynamic equilibrium solutions u are precisely the critical points of the functional H:

u ∈ Critu H(u), i.e. δH(u) = 0.

The role that the functional H plays for the dynamics can be understood by studying the evolution of H on trajectories, and explains why such systems are called dissipative: H decreases monotonically outside equilibria:

∂tH(u) = ⟨δH(u), −δH(u)⟩ = −|δH(u)|² ≤ 0, with equality iff δH(u) = 0.

This shows in particular that if u is an isolated local minimizer of H, trajectories starting close by eventually approach the point u; it is said that the point u is an asymptotically stable equilibrium solution; the rate of convergence to the minimizer depends on the geometry of the level sets of H near u.

Proposition 18 Local minimizers of H are asymptotically stable equilibrium solutions of the gradient system.

The decrease of H on solutions also clearly shows that the solutions define trajectories of steepest descent: at each point the trajectory is along the direction of steepest descent of the functional H. They define the most 'efficient' way to reach lower values of H. This observation can be exploited in a constructive way: using a discretization of the time derivative, these equations are often used to find a (possibly local) minimizer of H numerically. The dynamic behaviour near a local minimizer (where H decreases to its lowest possible value) is completely determined by the topological properties of the level sets of H.
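A minimal finite-dimensional sketch of this steepest-descent construction (illustrative quadratic H and step size assumed, not from the text): discretizing ∂tx = −∇H(x) with explicit Euler drives x to the minimizer.

```python
import numpy as np

# Gradient flow for H(x) = (1/2) x·Ax − b·x with A symmetric positive
# definite; the unique minimizer solves A x = b.  Explicit Euler steps
# x_{k+1} = x_k − dt ∇H(x_k) converge for dt small enough.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

x = np.zeros(2)
dt = 0.1
for _ in range(500):
    x = x - dt * (A @ x - b)      # step along −∇H(x) = −(A x − b)

x_star = np.linalg.solve(A, b)    # minimizer, here (0.2, 0.4)
print(np.max(np.abs(x - x_star)))
```

The convergence rate is governed by the eigenvalues of A, i.e. by the shape of the elliptic level sets of H near the minimizer, in line with the remark above.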

Exercise.

1. In Rn consider

∂tx = −∇H(x)

with the behaviour of the function H like H(x) ≈ |x|^α near 0 for some α > 1. Suppose that x = 0 is a (local) minimizer for H. Then determine the rate of convergence to 0 depending on the value of α.

2. The simple linear diffusion equation

∂tu = uxx, u(0) = u(π) = 0

is of the form of a gradient system with H(u) = ∫ (1/2)ux² dx. Determine the rate of convergence to the zero state of any solution; compare this with the exact general solution, which can be written as a Fourier series with respect to the spatial variable.

3. The nonlinear diffusion equation (with α ∈ R)

∂tu = uxx + αu(1 − u), u(0) = u(π) = 0,


is a gradient system too:

∂tu = −δH(u) with H(u) = ∫ [ (1/2)ux² − α((1/2)u² − (1/3)u³) ] dx.

Investigate for which values of α the trivial solution u ≡ 0 is the minimizer, and find the rate of convergence in that case.

1.5 Exercises

1. Calculus for variational derivatives

Since functionals map functions into R, functionals can be added and multiplied. Verify the following rules of calculation, which are well known for functions on finite-dimensional spaces:

linearity: δ(L1 + L2) = δL1 + δL2;

product rule: δ(L1 · L2) = L2 δL1 + L1 δL2;

quotient rule: δ(L1/L2) = (L2 δL1 − L1 δL2) / L2²;

chain rule: for g : R → R, δg(L) = g′(L) δL.

Derive the corresponding expressions for the second variation.

2. Calculate the first variation and the variational derivative of the following functionals. Below we use the notation with subscripts to denote the derivative: ux = ∂xu, uxx = ∂²xu.

(a) L(u) := ∫₀¹ [ x u(x)² + ux(x)² ] dx

(b) L(u) := ∫₀¹ [ sin(x) u(x)² + x³ ux(x)² ] dx

(c) L(u) := ∫ [ u(x)² + ux(x)² ] dx

(d) L(u) := ∫₀¹ [ sin(u(x)) + uxx(x)² ] dx

(e) L(u) := ∫₀¹ [ u(x)⁴ + ux(x)⁷ ] dx

(f) L(u) := ∫₀¹ n(x) √(1 + ux(x)²) dx

(g) L(q) := ∫ [ (1/2)q̇(t)² − (1/2)q(t)² + q(t)³ ] dt

(h) L(u) := ∫ [ (1/2)ux(x)² + x³ sin(u(x)) + u(x)⁵ ] dx

(i) L(u) := ∫ L(x, u, ux) dx, with L a given smooth function of its arguments.

(j) L(u) := ∫ L(x, u, ux, uxx) dx, with L a given smooth function of its arguments.


3. Conservative force fields and calculation of the potential.

A differentiable vector field F : Rn → Rn is called conservative if there exists a scalar function f : Rn → R (the so-called potential) such that F(x) ≡ ∇f(x). Given a conservative field F, finding the potential f is the more-dimensional analog of finding the primitive of a function of one variable. It is often called the 'inverse' problem.

(a) Find the conditions on F that guarantee that it is conservative.

(b) To solve the inverse problem, observe that if f is the potential, then its derivative along a curve ξ(s) can be written as

d/ds f(ξ(s)) = F(ξ(s)) · dξ/ds.

Show from this that the potential is uniquely defined (up to its value at one point, here taken to be the point x*) by F, and can be found by integrating along an arbitrary curve ξ(s) from x* = ξ(0) to x = ξ(1):

f(x) − f(x*) = ∫₀¹ F(ξ(s)) · (dξ/ds) ds.

Since the path is arbitrary, a simple path (the straight line through x* and x) can be taken:

f(x) − f(x*) = ∫₀¹ F(x* + s(x − x*)) · (x − x*) ds.

(c) Consider the following vector fields F on Rn; determine which ones are conservative, and which ones are not. If conservative, write down the potential.

n = 2 : F (x, y) = (2x sin(xy) + x2y cos(xy), x3 cos(xy) + y3)

n = 2 : F (x, y) = (x2 sin(y), x2 cos(y))

n = 3 : F (x, y, z) = (2xy sin(z) + x3, x2 sin(z) + z, x2y cos(z) + y)
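The line-integral formula of part (b) can be checked numerically. The sketch below (an illustration, not part of the exercise) uses an assumed conservative field F = ∇f with f(x, y) = x²y + sin(y) and base point x* = 0, recovering f by midpoint quadrature along the straight segment:

```python
import numpy as np

# Recover the potential via f(x) − f(0) = ∫₀¹ F(s x)·x ds for the
# conservative test field F = ∇f, f(x, y) = x² y + sin(y) (so f(0) = 0).
def F(p):
    x, y = p
    return np.array([2 * x * y, x**2 + np.cos(y)])

def potential(p, n=2000):
    s = (np.arange(n) + 0.5) / n                 # midpoint rule on [0, 1]
    vals = np.array([F(si * p) @ p for si in s])
    return vals.mean()                           # ≈ ∫₀¹ F(s p)·p ds

p = np.array([1.0, 0.5])
print(potential(p), p[0]**2 * p[1] + np.sin(p[1]))   # the two agree
```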

4. ** Inverse problem of the Calculus of Variations

The variational derivative (and Euler-Lagrange equation) of a given functional can be written down. How is it possible to see for a given equation (bvp) whether it is the Euler-Lagrange equation of some functional, and if it is, how can we find the functional? This is the 'inverse problem' of the Calculus of Variations, and a generalization of the 'conservative vector fields' of the previous exercise. For simplicity, we restrict to investigating the equation only, forgetting about boundary values, but these can be included. The operator u ↦ E(u) is called conservative if there exists some functional L, again called the potential, such that

E(u) ≡ δL(u).

Let E′(u) denote the formal derivative at the point u. Prove the following result.


(a) Proposition 19 The operator E is conservative if its derivative defines a symmetric bilinear form, i.e. if

⟨E′(u)ξ, η⟩ = ⟨ξ, E′(u)η⟩

for all functions ξ, η. If that is the case, the potential is given (up to a constant) by

L(u) − L(0) = ∫₀¹ ⟨E(su), u⟩ ds.

5. Find the variational formulation of each of the following boundary value problems:

(a) −uₓₓ = sin(u) + eˣu², u(0) = 0, uₓ(1) = 7. (Make sure the Neumann condition arises as a natural boundary condition by introducing a simple boundary functional to the density functional that produces the correct equation as E-L equation.)

(b) −(1/r)∂ᵣ(r∂ᵣu) = f(r), uᵣ(0) = u(1) = 0. (It may be helpful to interpret r as the radial coordinate in a description with polar coordinates.)

(c) −div [σ(x, y)∇u(x, y)] + u(x, y) = 0, u(x, 0) = u(x, 1) = 0; uₓ(0, y) = 1; uₓ(1, y) = 0. (The Neumann conditions arise as natural boundary conditions.)
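To illustrate the kind of answer (a) asks for (a sketch of one possible choice of ours, with the sign conventions used in these notes), consider the functional

```latex
L(u) \;=\; \int_0^1 \Big( \tfrac12 u_x^2 + \cos u - \tfrac13 e^x u^3 \Big)\, dx \;-\; 7\,u(1) .
```

On the set {u | u(0) = 0} its first variation is ∫₀¹(−uₓₓ − sin u − eˣu²)v dx + (uₓ(1) − 7)v(1), so the equation −uₓₓ = sin(u) + eˣu² appears as E-L equation and uₓ(1) = 7 as natural boundary condition, produced by the boundary functional −7u(1).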

6. Linear two-point boundary value problem
For given f ∈ C⁰([0, 1]) consider

L(u) = ∫₀¹ ( ½uₓ² − f(x)u ) dx.

(a) Prove: u ∈ C² is a solution of the bvp

−uₓₓ = f on (0, 1), u(0) = uₓ(1) = 0

iff u is the only critical point of L on M₀ = { u piecewise differentiable | u(0) = 0 }; in fact it is a minimizer for L on this set. (Concerning additional regularity for a critical point, see also the next Chapter, the Exercise on “Lemma DuBois-Reymond, Integrated Euler-Lagrange equation”.)

(b) Show that for the Neumann problem

−uₓₓ = f, uₓ(0) = uₓ(1) = 0,

there exists a solution iff ∫₀¹ f(x) dx = 0. If it exists, the solution is not unique. Moreover show that

• if ∫₀¹ f(x) dx = 0, u is a solution iff it is a minimizer (not isolated) of L on the set of piecewise differentiable functions (no restrictions on the boundary);

• if ∫₀¹ f(x) dx ≠ 0, L does not have a critical point on the set of piecewise differentiable functions (no restrictions on the boundary); the infimum of this functional is −∞.
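For the last claim, evaluating L on the constant functions u ≡ c already exhibits the unboundedness (a one-line check of our own):

```latex
L(c) \;=\; \int_0^1 \Big( \tfrac12\cdot 0 - f(x)\,c \Big)\, dx \;=\; -\,c \int_0^1 f(x)\, dx \;\longrightarrow\; -\infty
```

as |c| → ∞, with the sign of c chosen according to the sign of ∫₀¹ f dx.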

7. Light rays, Fermat’s principle
According to Fermat, the trajectory of a light ray between two points is such that the required time is as small as possible. The propagation speed of light depends on material properties, which is expressed by c₀/n, where c₀ is the speed in vacuum (which is maximal), and n > 1 is the so-called index of refraction, characteristic for the material.
For trajectories, for simplicity described as graphs of functions x → y(x), the total time between the points is

∫ n(x, y) √(1 + yₓ²) dx.

This is also often called the optical path length. Note that this functional can also be given very different interpretations, depending on the meaning of n (for instance: the cost of a road between two points when the local costs are given by n).

(a) Write down the Euler-Lagrange equation.

(b) Determine the optimal trajectory in case n does not depend on y explicitly. Then use the ‘conservation’ property expressed by the E-L equations to study the trajectories.

(c) Determine the optimal trajectory in case n does not depend on x explicitly. Then use ‘energy conservation’ to study trajectories. [[Alternatively: describe the trajectories as functions x(y) and transform the functional.]]

(d) Consider the special cases n = y and n = 1/y, for which the trajectories can be expressed explicitly.
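As orientation for (c) and (d) (a sketch of our own, not part of the exercise text): when n = n(y), the integrand does not depend on x, so the ‘energy’ yₓ ∂/∂yₓ (n√(1+yₓ²)) − n√(1+yₓ²) is conserved, which works out to

```latex
\frac{n(y)}{\sqrt{1 + y_x^2}} \;=\; c \qquad (\text{constant along the ray}).
```

Solving for yₓ gives yₓ = √(n(y)²/c² − 1); for n = y this integrates to the catenaries y = c cosh((x − x₀)/c), and for n = 1/y to the circular arcs (x − x₀)² + y² = 1/c².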

8. Boussinesq type of equations
Surface waves (in one horizontal direction x) that decay at infinity (|x| → ∞) can be described in terms of the wave height η(x, t) and a velocity u(x, t) in the following form (a Hamiltonian system):

∂t u = −∂x δηH(u, η) , (1.26)

∂t η = −∂x δuH(u, η) . (1.27)

for a suitable functional (the Hamiltonian) H(u, η).

(a) Describe the equations in full detail when the Hamiltonian is given by the following functional

H(u, η) = ∫ ( ½gη² + ½(u² − ⅓uₓ²) ) dx.

(These are the ‘linearized’ equations.)
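As a check (our own sketch; note that the variational derivative of −⅙∫uₓ² dx is +⅓uₓₓ), the equations in (a) should work out to

```latex
\partial_t u \;=\; -\,\partial_x \big( g\,\eta \big), \qquad
\partial_t \eta \;=\; -\,\partial_x \Big( u + \tfrac13\, u_{xx} \Big),
```

which is indeed a linear system.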

36 CHAPTER 1. BASIC CALCULUS OF VARIATIONS

(b) In another case (shallow water, no dispersion, but nonlinear), the equations are of the form

∂t u = −∂x ( gη + ½u² ),
∂t η = −∂x ( u + β η u ),

where β is a constant. Determine the value of β such that this system of equations is a Hamiltonian system of the form (1.26), (1.27) given above.

(c) Show that the equations have the horizontal momentum as constant of the motion:

∫ u(x) η(x) dx.

9. The KdV-eqn in normalized form is given by

∂t u = −∂x [ u + ∂ₓ²u + ½u² ].

(a) Show that it can be written as a generalized Hamiltonian system by determining the Hamiltonian HKdV such that

∂t u(t) = ∂x δHKdV(u).

(b) Show that the following functionals are constants of the motion for KdV:

∫ u(x) dx, ∫ u(x)² dx, HKdV(u).

10. The BBM-eqn in normalized form is given by

∂t u − ∂t∂ₓ²u = −∂x ( u + ½u² ).

(a) Show that it can be written as a Hamiltonian system by determining the Hamiltonian HBBM and an operator L, the inverse of a suitable differential operator, such that the equation is given by

∂t u(t) = L ∂x δHBBM(u).

(b) Show that the following functionals are constants of the motion for BBM:

∫ u(x) dx, ∫ u(x)² dx, HBBM(u).

11. Show that Burgers’ eqn

∂t u + u ∂ₓu = ∂ₓ²u

can be written as a combination of a conservative and a dissipative structure by determining functionals D and H such that it gets the form

∂t u = ∂x δH(u) + δD(u).
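One pair that works (a sketch of our own; the signs follow the convention δ∫½uₓ² dx = −uₓₓ used in these notes) is

```latex
H(u) \;=\; -\int \tfrac16\, u^3\, dx, \qquad
D(u) \;=\; -\int \tfrac12\, u_x^2\, dx,
```

since then ∂x δH(u) = −∂x(½u²) = −u∂ₓu reproduces the convective term, while δD(u) = uₓₓ gives the dissipative term.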

12. Periodic motions and boundary conditions
We have already remarked that in general the dynamic variational principles are not well suited to prove existence; usually dynamic evolutions are saddle points of the action functional. In particular cases existence can be proved with variational methods. The most successful results deal with periodic solutions, the reason being that then the problem can be formulated as a boundary value problem. We will show that in this exercise.
Consider a Lagrangian dynamical system with Lagrangian L. L may depend on t, but if it does, it is in a periodic way, say with period T. Then one may look for motions that are periodic with period T.

(a) Show that the evolution t ↦ q(t) is T-periodic iff it is the periodic continuation of the function defined on [0, T] that satisfies the periodic boundary conditions:

q(0) = q(T), q̇(0) = q̇(T).

(b) Show that (under mild assumptions) these boundary conditions arise partly as natural boundary conditions from the action functional with prescribed boundary condition for q only: q(0) = q(T).

(c) Formulate the periodic boundary conditions for a Hamiltonian system; show that they arise from the canonical action principle when only conditions on q are prescribed as above.

13. ** Stationary states of a nonlinear diffusion equation
Consider the stationary solutions of a nonlinear diffusion equation on a domain Ω for which the diffusion coefficient D may depend on u:

div [D(u)∇u] + f(x, u) = 0 in Ω,
u(x) = ϕ(x) on ∂Ω₁,
D(u) ∂u/∂n = ψ(x) on ∂Ω₂.

(a) When D is constant, derive the variational formulation of this boundary value problem (including the boundary conditions).

(b) Observe that when D depends on u there is no obvious variational formulation.

(c) Suppose that D is positive and a monotone function of u. Consider the transformation of the dependent variable u → v such that

∇v = D(u)∇u.

Show that v can be expressed directly in terms of u.

(d) Derive the governing boundary value problem for v.

(e) Show that the bvp for v has a variational structure; denote the governing functional by L = L(v).

(f) Define uniquely the functional L̃ of u by L̃(u) ≡ L(v). Find the critical points of L̃(u). Verify the transformation between the two formulations of the boundary value problem from the relations between δᵥL(v) and δᵤL̃(u).

38 CHAPTER 1. BASIC CALCULUS OF VARIATIONS

(g) Conclusion?

14. ** Variations of the boundary
Consider (for simplicity, in the plane) a given density function ρ and the total ”mass” in a region Ω:

M(Ω) = ∫Ω ρ(x, y) dx dy.

We want to see how M depends on Ω. (Assume that the regions are ”convex-like” and can be deformed smoothly without introducing intersections.)

(a) First take the special case that Ω is the area between the x-axis and the graph of a function η = η(x):

Ω = { (x, y) | a ≤ x ≤ b, 0 ≤ y ≤ η(x) },

and consider the corresponding functional

L(η) = M(Ω).

Determine the first variation and show that the variational derivative of L, for variations of the domain described by a variation of the function η, is given by

δL(η) = ρ|y=η(x) ≡ ρ(x, η(x)).

(b) Now, more generally, describe a variation of the boundary ∂Ω by a ”normal” displacement σ (defined on the boundary). Determine the first variation of M. Can you find an expression for the variational derivative of M? Verify the formula for the case of a radial deformation of a circular domain (and ρ = 1).

(c) Show that the more general result specializes to the case of changing the graph that determines the boundary. (Relate a variation of η and the normal displacement σ in this case.)
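For the verification asked for in (b), a sketch of the expected bookkeeping (our own computation): with ρ = 1 and a circular domain of radius R deformed radially by a uniform normal displacement σ = dR,

```latex
M = \pi R^2, \qquad dM = 2\pi R\, dR = \int_{\partial\Omega} \sigma\, ds ,
```

consistent with a first variation of the form δM = ∫∂Ω ρσ ds.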

15. ** Jacobi functional in Classical Mechanics
For a Lagrangian system for which the energy is conserved, one may look for solutions of prescribed total energy E.

(a) Consider the Jacobi functional on the set of functions [0, 1] ∋ τ ↦ x(τ) ∈ Rⁿ with x(0) = x(1):

J(x) = ∫₀¹ √(E − V(x)) |x_τ| dτ.

Derive the equation for its critical points, and the boundary conditions.

(b) Show that for a suitable scaling of the parameter τ to t and a related transformation x(τ) ≡ q(t), a standard second order Newton equation for q and potential V results; show that the solution has indeed energy E. What about the boundary conditions?


(c) How can the Jacobi functional be obtained by constraining the action principle to motions that satisfy the energy constraint?

(d) Show that the following modified Jacobi functional can serve the same purposes:

[ ∫₀¹ |x_τ|² dτ ] · [ E − ∫₀¹ V(x) dτ ].

How can this functional be obtained from the action principle and an energy constraint?

1.6 ** Extensions

1.6.1 Theory of second variation

When for fixed v the function ε ↦ L(u + εv) is twice differentiable, its second derivative leads to the following notion.

Definition 20 The second variation of a functional L at u in the direction v is denoted by δ²L(u; v) and is defined as

δ²L(u; v) = (d²/dε²) L(u + εv) |ε=0. (1.28)

Hence we have

L(u + εv) = L(u) + ε δL(u; v) + ½ε² δ²L(u; v) + o(ε²). (1.29)
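A concrete illustration (our own, not from the text): for L(u) = ∫( ½uₓ² + G(u) ) dx, differentiating ε ↦ L(u + εv) twice gives

```latex
\delta^2 L(u;v) \;=\; \int \big( v_x^2 + G''(u)\, v^2 \big)\, dx ,
```

a functional that is quadratic in the direction v.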

From this the following second order condition for an extremal element is obvious.

Proposition 21 If u is a local extremal for L, the second variation is sign-definite for all directions v in the tangent space. Specifically, if L has a (local) minimum at u:

L(u) ≤ L(ū) for all ū ∈ M in a neighbourhood of u, (1.30)

then

δ²L(u; v) ≥ 0 for all v ∈ TᵤM. (1.31)

In most cases, the second variation δ²L(u; v) is quadratic in v. When it is, it can also be obtained by repeated differentiation of the first variation. In fact, a bilinear form can be defined as follows:

Q(u; v, w) := (d/dρ)(d/dε) L(u + εv + ρw) |ε=0, ρ=0. (1.32)

When the order of differentiation can be interchanged, this form is in fact symmetric in v and w:

Q(u; v, w) = Q(u;w, v) (1.33)


and leads to the introduction of a symmetric mapping Q(u) such that

Q(u; v, w) = v · Q(u)w ≡ ⟨v, Q(u)w⟩. (1.34)

This mapping Q(u) is the generalization of the Hessian matrix of functions on Euclidean space. It is referred to as the second variation operator. Its relation to the second variation is explicitly given by

δ²L(u; v) = ⟨v, Q(u)v⟩, (1.35)

and in fact this relation, together with the requirement that Q is symmetric, can serve to define the operator Q.
All these notions can also be translated into statements about the variational derivative δL. This is made more precise in the next lemma, which will be used frequently in the following.

Lemma 22 For a functional L on M, with δL its variational derivative, denote the formal Fréchet derivative of δL by DδL:

DδL(u)ξ := (d/dε) δL(u + εξ) |ε=0. (1.36)

Then DδL(u) : TᵤM → T*ᵤM is a symmetric map, the second variation operator, satisfying

δ²L(u; ξ) = ⟨DδL(u)ξ, ξ⟩.

Proof. For arbitrary ξ and η from TᵤM it holds

⟨DδL(u)ξ, η⟩ = (d/dε) ⟨δL(u + εξ), η⟩ |ε=0.

Using the definition of the variational derivative, this can be rewritten as

⟨DδL(u)ξ, η⟩ = (d/dε)(d/dρ) L(u + εξ + ρη) |ρ=0, ε=0.

Assuming smoothness of the function (ε, ρ) → L(u + εξ + ρη), the order of differentiation at the right hand side can be interchanged and one obtains the symmetry as stated.

1.6.2 Legendre transformation

1.6.3 Convexity Theory

1.6.4 Hamilton Jacobi equations

1.6.5 Exercises

1. Nonlinear two-point boundary value problem
For given f ∈ C¹([0, 1] × R, R) consider the nonlinear bvp

−uₓₓ = f(x, u) on (0, 1), u(0) = u(1) = 0.


(a) Give the variational formulation, i.e. the functional L such that its critical points on M₀ = { u | u(0) = u(1) = 0 } correspond to the solutions of the bvp.

(b) Determine the second variation: η ↦ δ²L(u; η) ≡ Qᵤ(η).

(c) Write down the Euler-Lagrange equation for η ↦ Qᵤ(η).

(d) Compare the result with the linearization of the bvp:

−ηₓₓ = f′(x, u)η on (0, 1), η(0) = η(1) = 0,

where f′ denotes the derivative of f with respect to u.

(e) Show that if the linearized bvp has a nontrivial solution η, then Qᵤ(η) = 0.

(f) Prove the general result:

Proposition 23 The linearization of the Euler-Lagrange equation of a functional L around a solution u is the Euler-Lagrange equation of the second variation δ²L(u; ·).

2. Euler buckling
Reconsider the transversal deflections of a bar, written with the arclength s and angle θ = θ(s). Assume that in the rest state the bar is along the x-axis, has length ℓ, and is fixed at x = 0. The other end point is free. Take for the bending energy (related to the curvature θₛ)

∫₀^ℓ ½θₛ² ds.

If at the free end point a force µ is acting in the direction of the negative x-axis, then µ[ℓ − x(ℓ)] is the work executed by the force. For given force, the deflection is described by the principle of minimal potential energy, i.e. of the functional

L(θ) = ∫₀^ℓ ( ½θₛ² − µ(1 − cos θ) ) ds.

(a) Determine the bvp for a critical point. What is the meaning of the (natural) boundary conditions at s = 0 and s = ℓ?

(b) Relate the equation to the pendulum equation

ẍ = − sin x;

which dynamic solutions correspond to the desired deflection of the bar? Use “energy conservation” to write down the solution implicitly.

(c) To investigate the Euler buckling problem directly, observe that with a solution θ(s), also −θ(s) is a solution, and hence θ ≡ 0 is a solution for all µ.

(d) Determine the second variation around the trivial state, and show that only for specific values µ = µₖ, k ∈ N, the linearized equation has nontrivial solutions, and determine these solutions. Verify that all these solutions correspond to the same physical oscillation of the linearized pendulum equation.


(e) Conclude from the phase plane analysis of the pendulum equation that for Euler buckling there is a bifurcation value µ₁ such that for µ < µ₁ there is no nontrivial buckled state, while for any µ > µ₁ there is precisely one buckled state that is positive.

Chapter 2

Constrained Problems

2.1 Motivation and Introductory Examples

Example. Finite dimensional constrained extremal problems
For a function of one variable, at a minimizer x it holds that f′(x) = 0. Now consider a function of two variables, F(x, y). At a minimizer (x, y) it holds that

∇F(x, y) = 0, i.e. ∂xF(x, y) = ∂yF(x, y) = 0,

with ∂xF(x, y) = 0 meaning that the derivative vanishes for variations in the x-direction, and ∂yF(x, y) = 0 meaning that it vanishes for variations in the y-direction. From this we conclude that the derivative vanishes for variations in any direction, which leads to ∇F(x, y) = 0. Of course, these relations hold when we are allowed to take variations in these directions. In this Chapter we will consider constrained problems, i.e. problems which will lead to restrictions on the admissible variations. In a simple example: if we were not allowed to vary in the y-direction, we could not conclude that at a minimizer ∂yF(x, y) = 0. This would, for instance, be the case for the minimization problem:

Min { F(x, y) | (x, y) ∈ M }, with M = { (x, y) | y = 0 },

i.e. if we constrain the domain of definition. Then at a minimizer (x∗, 0) of this problem we can only conclude ∂xF(x∗, 0) = 0. With τ the tangent direction to M, and n the normal, we can interpret this as ∇F(x∗, 0) · τ = 0, but have no information about ∇F(x∗, 0) · n: this last component is as yet undetermined (just as we cannot determine the y-derivative of the function (x, y) → x² + y when we restrict it to the line y = 0), and we could write

∇F (x∗, 0) = λn

for some number λ, which is usually called a multiplier. This interpretation becomes even more appealing if we consider the minimization of the function on a given curve in the (x, y)-plane, say

Min { F(x, y) | (x, y) ∈ M }, with M = { (x, y) | y = φ(x) }.

Then the constraint can easily be ‘eliminated’, and we find that this is the same as the minimization of x → F(x, φ(x)), which at a minimizer x∗ leads to

(d/dx) F(x, φ(x)) = ∂xF + ∂yF (dφ/dx) = ∇F · τ = 0



where now τ = (1, ∂xφ) is tangent to the curve. And hence, again we conclude that at the minimizer (x∗, y∗) we can write

∇F(x∗, y∗) = λn

for some number λ.
The same conclusions will be obtained if the curve is given implicitly by some relation Φ(x, y) = 0, or when we consider extrema of a function of three variables restricted to a plane or to a 2-dimensional manifold in that space.

Example. Geometric LMR
More generally, consider a function F of, say, n variables, restricted to some set M which at each point has n − p admissible directions, which means that there will be p relations between the n components of each point. Considering the directional derivative at a minimizing point x∗ in various directions η: DF(x∗)η = ∇F(x∗) · η, the minimizing property implies that this is zero if η is a direction that is admissible at x∗, but we have no information for the p directions that are ‘perpendicular’ to the set at that point and which are non-admissible variations. So, ∇F(x∗) is undetermined in the p non-admissible directions, and we can write

∇F(x∗) = λ₁n₁ + λ₂n₂ + ... + λₚnₚ

for undetermined multipliers λ₁, λ₂, ..., λₚ. This is called the Lagrange Multiplier Rule.
Of course, the fact that we know that the minimizing element x∗ is in the set M will actually reduce the number of ‘free’ components of x∗ to n − p, which should be found from the n − p conditions from the vanishing derivative in admissible directions. Stated differently, when considering the problem of finding the minimizer, consider the p multipliers and the n components of x∗ as unknowns; then these should be determined from the p relations between the components of x∗ and the n equations of the LMR: again the same number of equations as the number of unknowns.
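The two-dimensional picture above can be checked numerically. The sketch below (plain Python; the function F and the constraint x + y = 1 are our own illustrative choices, not from the text) eliminates the constraint, locates the minimizer on a grid, and verifies that ∇F there is a multiple of ∇K = (1, 1):

```python
# Minimize F(x, y) = x^2 + 2 y^2 on the line K(x, y) = x + y = 1
# (an illustrative example of our own; exact minimizer is (2/3, 1/3)).
def F(x, y):
    return x**2 + 2*y**2

# Eliminate the constraint: points on the line are (t, 1 - t).
best_t = min((F(t, 1 - t), t) for t in
             (i / 100000 for i in range(100001)))[1]
x_star, y_star = best_t, 1 - best_t

# At the constrained minimizer, grad F must be a multiple of
# grad K = (1, 1): (2 x*, 4 y*) = lambda (1, 1).
grad_F = (2*x_star, 4*y_star)
lam = grad_F[0]
print(round(x_star, 3), round(y_star, 3))   # 0.667 0.333
print(abs(grad_F[0] - grad_F[1]) < 1e-3)    # True: both components equal lambda
```

Here λ ≈ 4/3; the two components of ∇F agree because ∇K = (1, 1), which is exactly the multiplier rule in this simplest setting.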

The above examples already show the way to generalize to infinite dimensions, once we are able to find the ‘tangent space’, or the set of ‘normals’ to it. In finite dimensions, recall that the gradient of a function at a point defines the direction in which this function increases most (it is the direction of steepest ascent), and is perpendicular to the level set of the function through that point. Hence, if the constraint is given as a level set of an explicit function, a normal direction is given by the gradient of the function at that point. More generally, if the constrained set consists of the intersection of level sets of p functions, the p normals are found from the gradients of the p functions. All these directions will be perpendicular to the tangent space. If the p gradients are independent, the tangent space will be (n − p)-dimensional; this is usually the case, and such points are therefore called ‘regular’ points of the manifold. Stated differently: at a regular point of a manifold, the n-dimensional space decomposes into p normal directions and the remaining n − p directions from the tangent space. A manifold with only regular points is then called a manifold of dimension n − p, or a manifold of co-dimension p.

Example. In finite dimensions Rⁿ, n > 1, consider the following examples.


1. For n = 2, the level sets of a function K = K(x, y) are curves in the plane. A regular point u = (x, y) is one for which ∇K(x, y) ≠ 0; the singular points are those for which ∇K(x, y) = 0. At a regular point, the tangent ‘space’ is the one-dimensional straight line through the point in the tangent direction: the direction vector τ such that ∇K(x, y) · τ = 0. Hence ∇K(x, y) is the normal to the tangent line, i.e. normal to the level line.
For instance, for K(x, y) = x² + y², every point on the level set K⁻¹(k) with k > 0 is a regular point; for k = 0, the point (0, 0) (which is the only point on the level set) is singular.

2. In R³, consider the intersection of a sphere with a horizontal plane:

M = { (x, y, z) | x² + y² + z² = R², z = ζ }.

When |ζ| < R, each point is a regular point and the tangent space is one-dimensional; when ζ = R, the point (0, 0, R) is singular, and the tangent space is two-dimensional.

3. Analytic LMR for levelset constraints
Consistent with the above description is the result for a critical point x∗ of

Crit { F(x) | x ∈ M }, with M = { x ∈ Rⁿ | K₁(x) = k₁, . . . , Kₚ(x) = kₚ }.

Supposing it is a regular point of the manifold, i.e. ∇K₁(x∗), ..., ∇Kₚ(x∗) are independent, then it holds that

∇F(x∗) = λ₁∇K₁(x∗) + ... + λₚ∇Kₚ(x∗).

All of this will now be generalized to infinite dimensions for the simplest case that the set of admissible elements is the intersection of a finite number of level sets of specific functionals (hence p will be finite, while n − p is infinite in infinite dimensional function spaces): an infinite dimensional manifold of finite co-dimension p.

2.2 Lagrange Multiplier Rule

2.2.1 Constrained to levelsets

When talking about the set M and the independent admissible variations at a point, we are actually dealing with the tangent space at that point to the ‘set’, which is usually envisaged as a ‘manifold’, and which was already mentioned briefly in Chapter 1. The elements v from the tangent space TᵤM are the admissible variations: they are such that with u, the element u + εv belongs to M up to terms of higher order (usually O(ε²)) for small real ε; in detail:

TᵤM = { v | there is w(v; ε) such that u + εv + w ∈ M and w = o(ε), i.e. w/ε → 0 for ε → 0 }.


In infinite dimensions, for constrained variational problems the functions belonging to the set of admissible elements M have to satisfy certain boundary conditions and certain ‘interior’ conditions. Then, if u belongs to M, for a variation η ∈ C∞₀(Ω), the function u + εη does not in general belong to M (up to second order): the set M is a nonlinear manifold. This was expressed in Chapter 1 by saying that the tangent space does not contain all test functions:

C∞₀(Ω) ⊄ TᵤM.

In most problems that we will deal with, the manifold M is a subset of a function space U for which the functions satisfy (apart from certain boundary conditions) a finite number of nonlinear functional constraints.

First, to deal separately with (linear) inhomogeneous boundary conditions in a simple way, we define a subspace to which the admissible variations will belong. To that end, denote by U the space of functions that satisfy the (linear) boundary conditions, and let U₀ be the tangent space to U, i.e. U₀ consists of elements v such that u + εv ∈ U whenever u ∈ U: the functions v are the functions that satisfy the homogeneous boundary conditions. All the admissible variations related to the set M will certainly belong to this set U₀.

Secondly, we will restrict to the most important sets that will be encountered in the following. These will be sets M of admissible elements which will be defined as the intersection of the level sets of certain (density) functionals K₁, . . . , Kₚ:

M = { u ∈ U | K₁(u) = k₁, . . . , Kₚ(u) = kₚ }, (2.1)

where k₁, . . . , kₚ are given values. In general this set may be empty, so we assume that for the given values of the constraints k₁, . . . , kₚ this set is non-empty.

Now consider first a single level set, say for the functional K the set K(u) = k. Take a smooth curve in this level set through a point u, say a curve parameterized by ε:

ε → u(ε), with u(0) = u, and K(u(ε)) = k.

Then, upon differentiating with respect to ε, we arrive at:

0 = (d/dε) K(u(ε)) = δK(u(ε); ∂εu).

When we write τ = ∂εu(0) for the tangent direction to the curve at u, we see that at ε = 0,

0 = δK(u; τ) = ⟨δK(u), τ⟩.

Thus for any tangent direction τ to the level set at the point u it holds that ⟨δK(u), τ⟩ = 0. Therefore it is consistent¹ to consider n := δK(u) as the

¹Actually we also have to prove the converse: if ⟨δK(u), τ⟩ = 0, then τ is a tangent direction of some curve in the level set. This can be proved by constructing the curve along that direction; it is clear that in general the actual curve will have a deviation in the normal direction. Taking this as Ansatz, we have that the curve ε → u + ετ + α(ε)n belongs to the level set if α(ε) can be chosen such that K(u + ετ + α(ε)n) = k and such that α(ε) is of higher order than linear in ε. Viewing K(u + ετ + αn) = k as a relation between two real variables that defines α implicitly as a function of ε, the use of the implicit function theorem shows the existence of α(ε) and the required higher order dependence on ε, provided that δK(u) ≠ 0.


normal to the tangent space, provided that u is a regular point, i.e. provided that δK(u) ≠ 0. This shows that the finite dimensional picture of tangent space and normal to the level set remains valid in infinite dimensions.
For a set that is the intersection of various level sets

M = { u ∈ U | K₁(u) = k₁, . . . , Kₚ(u) = kₚ }, (2.2)

at the regular points the p constraints will define a tangent space that contains all but p directions, so that near a regular point the set M is well approximated by its tangent space (a hyperplane) of co-dimension p; this is the analog of an (n − p)-dimensional smooth manifold in Rⁿ. Stated differently, at a regular point there are p independent normal directions to the tangent space. In a singular point, some of the normal directions to the tangent space coincide: the elements of the tangent space are restricted by fewer than p conditions. We will make this more precise in the following.
Let u ∈ M be a point of the manifold M at which the linear functionals δK₁(u; ·), . . . , δKₚ(u; ·) are linearly independent; we will call this a regular point. The following result generalizes what has been proved above for one functional constraint.

Lemma 24 Lyusternik
The tangent space to M at a regular point u ∈ M is the set of co-dimension p:

TᵤM := { v ∈ U₀ | δK₁(u; v) = . . . = δKₚ(u; v) = 0 }. (2.3)

This shows that admissible variations v have to satisfy p linear constraints.

2.2.2 Formulations of LMR

Consider the variational problem

Crit { L(u) | u ∈ M }, with M = { u ∈ U | K₁(u) = k₁, . . . , Kₚ(u) = kₚ }.

Recall the general stationarity condition (1.15) for a critical point u of L on M:

δL(u; v) = 0, for all v ∈ TᵤM.

Using Lyusternik’s Lemma for the specific set M under consideration, this condition for a critical point can be reformulated to

δL(u; v) = 0 for all v ∈ U₀ for which δKₖ(u; v) = 0, 1 ≤ k ≤ p. (2.4)

Clearly, (2.4) is satisfied if δL(u; ·) is a linear combination of the δKₖ(u; ·), 1 ≤ k ≤ p. In fact this is also necessary, as expressed in the next proposition.

Proposition 25 Lagrange’s multiplier rule
A regular point u ∈ M is a constrained critical point of L on M, i.e. satisfies (2.4), if and only if there are real numbers, called Lagrange multipliers, λ₁, . . . , λₚ, such that

δL(u; v) = Σₖ λₖ δKₖ(u; v), for all v ∈ U₀. (2.5)
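A standard illustration (our own, not worked out in the text at this point): constrained critical points of the Dirichlet functional on a level set of the L²-norm,

```latex
\mathrm{Crit}\,\Big\{ \int_\Omega \tfrac12 |\nabla u|^2\, dx \;\Big|\; \int_\Omega \tfrac12 u^2\, dx = 1,\; u = 0 \text{ on } \partial\Omega \Big\}
\quad\Longrightarrow\quad -\Delta u \;=\; \lambda\, u ,
```

so the Lagrange multiplier appears as the eigenvalue: a prototype of a linear eigenvalue problem.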


It is possible to formulate this result in a different way; this may be easier to remember, but may also be somewhat misleading.

Proposition 26 LMR with the Lagrangian functional
A regular point u ∈ M is a critical point of L on the constrained set M (2.1) iff for some multipliers λ₁, . . . , λₚ the element u is an unconstrained critical point of the unconstrained functional

U ∋ u ↦ L(u) − Σₘ λₘKₘ(u). (2.6)

This functional is called the Lagrangian functional² of the constrained problem.

Proof. For a critical point of the Lagrangian functional it holds that

δ[L − Σₘ λₘKₘ](u; v) = 0, for all v ∈ U₀.

This is precisely the equation from the multiplier rule. The other way around is obvious.

Remark. The above formulation with the Lagrangian may be misleading in the following respect: it may be possible that u is a constrained minimizer, while it is not a minimizer, but only a saddle point, of the unconstrained Lagrangian; we will consider this in more detail later. On the other hand, this procedure does lead to the correct set of equations, including possible natural boundary conditions.

The results above have obvious consequences for the relation between the variational derivatives, since C∞₀ ⊂ U₀. When no natural boundary conditions appear, these relations are in fact equivalent to the original result. The investigation of natural boundary conditions, in which the multipliers may appear, should be based on a study of (2.5).
Independence of the linear functionals leads to a result for the variational derivatives (however, not in a one-to-one way). Since δK(u; η) = ⟨δK(u), η⟩ for η ∈ C∞₀ ⊂ U₀, the independence of the linear functionals δK₁(u; ·), . . . , δKₚ(u; ·) on U₀ implies the independence of the p variational derivatives (as elements of L²(Ω))

δK₁(u), . . . , δKₚ(u).

For admissible variations that are also test functions η ∈ C∞₀ ⊂ U₀, we get in particular

TᵤM ⊃ { η ∈ C∞₀ ⊂ U₀ | ⟨δK₁(u), η⟩ = . . . = ⟨δKₚ(u), η⟩ = 0 }.

This makes it clear that the test functions from the tangent space satisfy p orthogonality conditions, namely orthogonality to the p normal directions δK₁(u), . . . , δKₚ(u).

²Note that the name “Lagrangian” (functional) appears at various places with a different meaning!


Proposition 27 LMR for variational derivatives
For a constrained critical point u ∈ M it holds that the variational derivatives are dependent:

δL(u) = Σₘ λₘ δKₘ(u).

Equivalently: the variational derivative of the Lagrangian functional vanishes. Possible natural boundary conditions are overlooked in this formulation.

This restriction to test functions may be incomplete if boundary conditions are involved. The following examples motivate that sometimes one has to work with the first variations of the functionals, instead of with the variational derivatives only.

Exercise. Consider the set

M = { u : [0, 1] → R | K(u) = 1, u(0) = 1 }.

1. For K(u) = ∫ ½u² the tangent space is

TᵤM = { v : [0, 1] → R | ∫ uv = 0, v(0) = 0 },

i.e. the functions with v(0) = 0 perpendicular to the variational derivative δK(u) (= u), just as is the case for test functions.

2. For K(u) = ∫ ½uₓ² the tangent space is

TᵤM = { v : [0, 1] → R | ∫ uₓvₓ = 0, v(0) = 0 };

relating this to the variational derivative δK(u) = −uₓₓ, we find that v has to satisfy

⟨δK(u), v⟩ + uₓ(1)v(1) = 0, and v(0) = 0.

So, for test functions η this means ⟨δK(u), η⟩ = 0, but there are more functions in the tangent space when v(1) ≠ 0. These last functions should certainly be considered in the stationarity condition (the multiplier rule) in order to find the correct natural boundary conditions.

3. In more dimensions, an example of a nonlinear boundary condition is

M = { u : Ω → R | u(x) = ϕ(x) on ∂Ω₁, ∫∂Ω₂ u²(x) = 1 }.

The tangent space is given by

TᵤM = { v : Ω → R | v(x) = 0 on ∂Ω₁, ∫∂Ω₂ u(x)v(x) = 0 }

and clearly contains all test functions.

Proof of the multiplier rule.


2.2.3 Families of constrained problems

Although the following can be formulated in a much more general way, we restrict the illustration of the general ideas to a simple example: the constrained optimization of one functional H on level sets of another functional I:

Crit { H(u) | I(u) = γ };

we will use the formulation with variational derivatives in the following, not bothering about possible (natural) boundary conditions.
In the previous sections we studied the problem with a fixed value of the constraint, i.e. given γ. Now we will treat γ as a parameter, and consider the family of constrained optimization problems. This will enable us to give an interpretation of the multiplier and to relate the nature of the constrained critical point to its character as a critical point of the Lagrangian functional.
Suppose therefore that we can find a smooth family

γ ↦ U(γ) ∈ Crit { H(u) | I(u) = γ }

of constrained critical points of H on level sets I⁻¹(γ), for parameter values γ in a neighbourhood of some γ₀. It may happen that, for instance, this family consists only of constrained minimizers, but also that the character of the critical point changes with γ (without violating the smoothness assumption).

A first observation is that the derivative along this family is "normal" to the level sets of I in the following sense. Defining

n(U) := dU/dγ, (2.7)

it is found by differentiating the relation I(U(γ)) = γ with respect to γ, that n(U) is normal to the level set I⁻¹(γ) in the sense that

⟨δI(U), n(U)⟩ = 1. (2.8)

Along this branch, each element satisfies for a multiplier λ = λ(γ) the equation

δH(U(γ)) = λ(γ)δI(U(γ)). (2.9)

The multiplier can be related to the so-called value function.

Definition 28 The value function of the constrained critical point problem on a branch of critical points is defined as:

h(γ) := H(U(γ)) = Crit { H(u) | I(u) = γ }. (2.10)

The value function can be depicted in a so-called integral diagram. With the values of the integrals H and I along the axes, each point represents all states with those values of H and I (a two-dimensional representation of the state space). Assuming that there are branches of constrained critical points parameterized with the value of the constraint functional γ, a schematic representation of these equilibria in the integral diagram may look like Fig. ??.

Both the first and the second derivative of the value function play a particular role in the understanding of the critical point problem; this will be considered in the next two subsections.


2.2.4 The multiplier as derivative of the value function

Proposition 29 For the smooth family γ ↦ U(γ), the multiplier λ(γ) appearing in (2.9) is related to the value function according to

λ(γ) = dh(γ)/dγ. (2.11)

Proof. A direct differentiation with respect to γ leads to the result:

dh(γ)/dγ = ⟨δH(U(γ)), dU(γ)/dγ⟩ = λ⟨δI(U(γ)), n(U)⟩ = λ,

the last equality following from differentiation of I(U(γ)) = γ.

This result clearly shows that only by viewing a single problem for γ0 as being embedded in a family does one obtain an interpretation of the Lagrange multiplier λ(γ0).
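The identity λ(γ) = dh/dγ can be observed numerically in finite dimensions. The functionals H and I below are hypothetical illustrative choices (not from the text); the multiplier is read off from δH = λδI at a computed constrained minimizer and compared to a finite-difference derivative of the value function. A sketch under these assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical smooth functionals (finite-dimensional stand-ins):
def H(x): return np.sum(x**4) + x[0]*x[1]
def dH(x): return 4*x**3 + np.array([x[1], x[0], 0.0])
def I(x): return np.dot(x, x)
def dI(x): return 2*x

def constrained_min(gamma):
    """Minimize H on the level set I(x) = gamma; return (h(gamma), multiplier)."""
    cons = {'type': 'eq', 'fun': lambda x: I(x) - gamma}
    res = minimize(H, x0=np.sqrt(gamma)*np.array([0.3, -0.5, 0.8]),
                   constraints=[cons], tol=1e-12)
    x = res.x
    # at the minimizer dH = lam * dI, so project to extract the multiplier
    lam = np.dot(dH(x), dI(x)) / np.dot(dI(x), dI(x))
    return H(x), lam

g0, dg = 1.0, 1e-5
h_m, _ = constrained_min(g0 - dg)
h_p, _ = constrained_min(g0 + dg)
_, lam0 = constrained_min(g0)
print(lam0, (h_p - h_m)/(2*dg))   # the two numbers agree: lambda(gamma) = dh/dgamma
```

The scaled starting point keeps the optimizer on one smooth branch of minimizers as γ varies.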

2.2.5 Homogeneous functionals

A functional F is called homogeneous of degree α if for each µ

F(µu) = |µ|^α F(u).

A quadratic functional is homogeneous of degree 2, for instance. For a homogeneous functional of degree α it holds for the directional derivative in the 'radial' direction that

δF(u; u) = αF(u).

Proposition 30 If the functional H is homogeneous of degree α, and I is homogeneous of degree β, then the value function is given up to a multiplicative constant by

h(γ) = h(1)γ^(α/β)

and the multiplier by

λ(γ) = λ(1)γ^(α/β − 1),

where λ(1) = h(1)α/β. By defining the following quotient R that is homogeneous of degree zero,

R(u) := H(u)^β / I(u)^α,

the critical points of R are, up to normalization, in a one-to-one relation to the constrained critical points of H on level sets of I.

For quadratic forms, when α = β = 2, the quotient is known as the Rayleigh quotient; then the multiplier is independent of the value of the constraint and is given by the value of R at the critical point.
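The scaling law of Proposition 30 can be checked numerically. Here H(x) = Σ xᵢ⁴ (degree α = 4) and I(x) = |x|² (degree β = 2) are hypothetical test functionals chosen for illustration, so the value function should scale like h(γ) = h(1)γ². A sketch under these assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def H(x): return np.sum(x**4)      # homogeneous of degree alpha = 4
def I(x): return np.dot(x, x)      # homogeneous of degree beta = 2

def h(gamma):
    """Value function: minimal H on the level set I = gamma."""
    cons = {'type': 'eq', 'fun': lambda x: I(x) - gamma}
    res = minimize(H, x0=np.full(3, np.sqrt(gamma/3.0)) + 0.01,
                   constraints=[cons], tol=1e-12)
    return res.fun

h1, h2, h4 = h(1.0), h(2.0), h(4.0)
print(h2/h1, h4/h1)   # ratios ≈ 4 and 16, i.e. h(gamma) = h(1)*gamma**(alpha/beta)
```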


2.2.6 ** Constrained minimizers and the Lagrangian functional

The character of the constrained critical point at γ0 can be related to its character as a critical point of u ↦ H(u) − λ(γ0)I(u). This is determined by the second derivative of the value function.

Proposition 31 The second variation of the constrained problem in the normal direction n(U) defined in (2.7), is precisely the second derivative of the value function:

⟨[D²H(U) − λD²I(U)]n(U), n(U)⟩ = d²h(γ)/dγ². (2.12)

In particular it follows that the sign of the second variation in this direction is determined by the convexity or concavity of the value function in a neighbourhood of the value of γ.

Proof. Differentiating the equation (2.9) with respect to γ, there results

[D²H(U) − λD²I(U)]n(U) = (dλ/dγ) δI(U). (2.13)

Since dλ/dγ = d²h(γ)/dγ², (2.12) results after taking the inner product with n(U).

In the foregoing Propositions, U can also be viewed as a critical point of the Lagrangian functional H − λI on the whole space. The expression (2.12) gives information on how this functional changes in the direction transversal to level sets of I. This may determine the character of U as an unconstrained critical point of H − λI.

Proposition 32 A constrained minimizer of H on a level set of I is an unconstrained (local) minimizer of H − λI (where λ is the multiplier) if the value function is (locally) convex. If the value function is (locally) concave, a constrained minimizer is an unconstrained saddle point of H − λI.

Proof. It would be tempting to use the second variation to provide the proof. Indeed, being constrained minimal, the second variation is non-negative on the tangent space, and the previous result shows the sign in the remaining direction transversal (normal) to the tangent space. This is correct if certain technical conditions are met, viz. non-degeneracy (signs strictly positive or negative), and a property that sign-definiteness of the second variation is sufficient for minimality. However, the following reasoning is elegant and simple, and does not need any such requirements.

Let U0 be a constrained minimizer of H on the level set I = γ0, with λ0 the multiplier. Assuming that the value function is (locally) convex (i.e. d²h/dγ²(γ0) > 0), the result that U0 is in fact an unconstrained (local) minimizer of the functional H − λ0I follows by comparing its value with that of arbitrary functions u, with I(u) = γ, γ in a neighbourhood of γ0:

H(u) − λ0I(u)
≥ h(γ) − λ0γ   by definition of h
≥ h(γ0) − λ0γ0   from convexity of h
= H(U0) − λ0I(U0)   since U0 is a constrained minimizer.


If h is locally concave, H − λ0I is maximal on the family γ ↦ U(γ), and the saddle point character is clear. See Fig. ??.

2.3 Applications

We will now show some main application areas of Constrained Problems. First we investigate in some detail Linear Eigenvalue Problems; these will be formulated as constrained problems with quadratic functionals, and the eigenvalues are the multipliers.

Further we will consider two applications for dynamical systems; one is a deep result in Classical Mechanics, that can nicely be extended to evolution equations: the theory of Relative Equilibria. The other one is the generalization of the steepest descent method subject to constraints, so-called thermodynamic systems.

2.3.1 Linear Eigenvalue Problems

Basic problem from Linear Algebra

The basic problem in Linear Algebra is to solve, for a given m×n-matrix A and vector b ∈ Rᵐ, the vector equation

find x ∈ Rⁿ such that Ax = b.

For square matrices, m = n, the eigenvalue problem is the search for non-trivial eigenvectors ϕ for which there exists an eigenvalue λ such that

Aϕ = λϕ.

The simplest examples show that even for real matrices the eigenvalues and eigenvectors can be complex-valued. Furthermore, a complete set of eigenvectors (the eigenvectors spanning the whole space) cannot be expected to exist. Very different is the situation when the matrix is symmetric:

Definition 33 A square matrix S : Rⁿ → Rⁿ is called symmetric (Hermitian) if

Sx · y = x · Sy for all x, y.

Proposition 34 A symmetric matrix S has the following properties:

• all eigenvalues are real; as a consequence, the eigenvectors can be taken to be real, and mutually orthogonal;

• there exists a complete set of (real) eigenvectors ϕ1, ..., ϕn: Sϕk = λkϕk, k = 1...n; the eigenvalues are expressed in terms of the eigenvectors by the Rayleigh quotient

λk = R(ϕk) ≡ (Sϕk · ϕk)/(ϕk · ϕk);


• with respect to the set of eigenvectors, the matrix S has the form of a diagonal matrix (it is 'similar' to the diagonal matrix)

S ∼ diag[λ1, ..., λn];

• the null-space of S is the span of all eigenvectors corresponding to eigenvalue zero.

Using the eigenvectors, the basic vector equation can easily be solved (or investigated) by taking inner products with respect to the eigenvectors (normalized for simplicity):

Sx = b ⟺ b · ϕk = Sx · ϕk = x · Sϕk = λk x · ϕk for k = 1...n,

and hence

x = Σk (b · ϕk / λk) ϕk,

provided λk ≠ 0 for all k = 1...n. If S is degenerate, there is only a solution provided the rhs satisfies the orthogonality ('solvability') conditions

b · ϕj = 0 for all eigenvectors ϕj with eigenvalue = 0;

i.e. b should be orthogonal to the null-space of S. In that case the solution is not unique: any vector from the null-space can be added. This is a special case (restricted to symmetric matrices) of the so-called Fredholm alternative.
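The eigenvector expansion and the solvability condition can be seen at work on a small degenerate symmetric matrix; the matrix S and right-hand sides below are arbitrary illustrative choices:

```python
import numpy as np

# Degenerate symmetric matrix: the null-space is span{e3}
S = np.array([[2., 1., 0.],
              [1., 2., 0.],
              [0., 0., 0.]])
lam, Phi = np.linalg.eigh(S)          # orthonormal eigenvectors in the columns of Phi

def eig_solve(b):
    """x = sum over nonzero eigenvalues of (b.phi_k / lam_k) phi_k."""
    x = np.zeros_like(b)
    for k in range(len(lam)):
        if abs(lam[k]) > 1e-12:
            x += (b @ Phi[:, k]) / lam[k] * Phi[:, k]
    return x

b_ok = np.array([1., 2., 0.])         # orthogonal to the null-space: solvable
b_bad = np.array([1., 2., 1.])        # violates the solvability condition
print(np.allclose(S @ eig_solve(b_ok), b_ok))     # True
print(np.allclose(S @ eig_solve(b_bad), b_bad))   # False
```

For b_ok the solution is determined only up to addition of a null-space vector, exactly as the Fredholm alternative states.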

Exercise. Consider a matrix S = diag[λ1, λ2]. Sketch the image of the unit circle under the action of S in the following cases:

1. λ1 = λ2, either positive or negative;

2. 0 < λ1 < λ2 < 1,

3. 0 < λ1 < 1 < λ2.

Levelsets of Quadratic Forms

Basically, the characteristic properties of quadratic forms are expressed in the simple examples in R²: ap(x, y) = x² + y² and as(x, y) = x² − y², and more generally in (x, y) ∈ Rᵐ × Rⁿ⁻ᵐ = Rⁿ

a(x, y) = |x|² − |y|².

Level sets are either bounded, in which case they have 'elliptic' shape, or unbounded, and then they have 'hyperbolic' shape. In the first case the origin is a center, and a minimizer (or maximizer) of the form; in the other case it is a hyperbolic point and a saddle point of the form. This is all that can be expected in finite dimensions:


Proposition 35 Any quadratic form in Rⁿ can be written like

a(x) = Sx · x,

where S is a symmetric matrix. With respect to a basis of eigenvectors of S, the matrix has diagonal form

S = diag[λ1, ..., λn]

and then

a(x) = Σk λk xk².

The form is positive [non-negative] iff all eigenvalues are positive [non-negative]. The number of zero-eigenvalues determines the degeneracy of the form; the number of negative eigenvalues is the order of hyperbolicity.

Eigenvalue problem for linear operators

Definition 36 Let L be a linear mapping (operator) from one function space into another one. The eigenvalue problem for L asks for the eigenvalues λ, that are the complex numbers for which there exists a non-trivial eigenfunction u:

Lu = λu.

Clearly the eigenfunctions must be elements that are both in the domain of definition and in the range of the operator. To define the notion of a symmetric operator, we exploit (as in the previous chapters) the L²-inner product ⟨·, ·⟩ for functions on the domain Ω.

Definition 37 Let L be an operator defined on a function space U that contains the test functions. The formal adjoint of L is the (linear) operator denoted by L* such that

⟨Lu, v⟩ = ⟨u, L*v⟩ for all u, v ∈ C0∞(Ω).

The operator L is called (formally) symmetric on U if L = L* and moreover

⟨Lu, v⟩ = ⟨u, Lv⟩ for all u, v ∈ U.

Associated to L there is a bilinear functional

b(u, v) := ⟨Lu, v⟩.

If L is (formally) symmetric on U, this bilinear functional is symmetric and defines the quadratic form Q on U:

Q(u) = ⟨Lu, u⟩.

Note that in that case the operator L is obtained as the variational derivative of Q:

δQ(u) = 2Lu,


since the first variation is given by δQ(u; v) = 2b(u, v) for all u, v ∈ U.

Remark. Above we defined the adjoint of an operator. When (homogeneous) boundary conditions are present, one can define the adjoint of the operator and the adjoint boundary conditions from

⟨Lu, v⟩ = ⟨u, L*v⟩ for all u and v.

We will see examples below; in the following we will mainly deal with differential operators.

Exercise. Differential operators

For a linear differential operator L, the result applied to a (smooth) function u is a function Lu(x) that depends on u and a finite number of derivatives of u at the point x. The order of the highest derivative is called the order of the differential operator.

1. For U functions on the interval [0, 1], and for given functions a, b, c, determine the (formal) adjoint of the second order differential operator

Lu = a(x)uxx + b(x)ux + c(x)u.

2. Determine conditions on the functions a, b, c that guarantee that L is sym-metric.

3. Consider for given function f the following boundary value problem forthe operator L above:

Lu = f, u(0) = 0, ux(1) = 0.

Determine the corresponding adjoint boundary value problem.

4. The generalization to functions of more variables: for functions on the domain Ω ⊂ Rⁿ, and for given scalar functions a, b1, . . . , bn, c, determine the (formal) adjoint of the operator

Lu = a(x)∆u + Σk bk(x)uxk + c(x)u.

Determine the adjoint boundary value problem for this operator withDirichlet boundary conditions on part of the boundary ∂Ω1.
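For item 1 the answer can be checked symbolically: integrating by parts twice suggests the formal adjoint L*v = (av)'' − (bv)' + cv, and ⟨Lu, v⟩ = ⟨u, L*v⟩ then holds whenever the boundary terms vanish. The coefficients and the test-function-like u, v below are arbitrary sample choices (a sketch, not part of the exercise's official solution):

```python
import sympy as sp

x = sp.symbols('x')
a, b, c = 1 + x**2, sp.sin(x), sp.exp(-x)   # sample smooth coefficients

Lop = lambda w: a*sp.diff(w, x, 2) + b*sp.diff(w, x) + c*w
# candidate formal adjoint, obtained by integrating by parts twice:
Ladj = lambda w: sp.diff(a*w, x, 2) - sp.diff(b*w, x) + c*w

u = x**2 * (1 - x)**2      # u and u_x vanish at 0 and 1, so boundary terms drop
v = x**3 * (1 - x)**3

lhs = sp.integrate(Lop(u)*v, (x, 0, 1))     # <Lu, v>
rhs = sp.integrate(u*Ladj(v), (x, 0, 1))    # <u, L*v>
diff = abs(float(lhs) - float(rhs))
print(diff < 1e-12)   # True: the two inner products agree
```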

General formulation of EVP

There is a one-to-one correspondence between symmetric operators and quadratic forms in finite dimensions, as we have seen already; the same is (more or less) true in infinite dimensions: with a symmetric operator L corresponds a quadratic form Q(u) = ⟨Lu, u⟩, and the variational derivative is δQ(u) = 2Lu. A specific example are the Sturm-Liouville operators in one or more dimensions

Lu = −∂x[p(x)∂xu] + q(x)u,
Lu = −div [p(x)∇u] + q(x)u,


with corresponding quadratic forms

Q(u) = ∫ [p(x)ux² + q(x)u²] dx,

Q(u) = ∫ [p(x)|∇u|² + q(x)u²] dx.

We will now formulate the eigenvalue problem (EVP) in a somewhat more general way, by formulating it using two symmetric quadratic forms instead of the operators themselves.

Let N be a quadratic form on L²; we will denote the corresponding operator by N and use the following notation for the corresponding bilinear functional:

N(u) = ⟨Nu, u⟩, N(u, v) ≡ ⟨Nu, v⟩.

In the following we want to use N as a norm, and so we have to assume that N is positive:

N(u) > 0 for u ≠ 0.

Note that for N = Identity, N(u) = ⟨u, u⟩ is just the usual squared L²-norm. The more general formulation includes the case when we use so-called weighted L²-norms. Let Q be another quadratic form on U, with corresponding symmetric operator L:

Q(u) = ⟨Lu, u⟩, Q(u, v) ≡ ⟨Lu, v⟩.

We will study the eigenvalue problem corresponding to these two operators:

Lu = λNu

Note, for N = Identity we recover the standard formulation above.
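In finite dimensions the pair (Q, N) corresponds to a pair of symmetric matrices, and Lu = λNu is then the generalized symmetric eigenvalue problem, which scipy solves directly; the matrices below are arbitrary illustrative choices:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
L = M + M.T                     # symmetric matrix playing the role of the operator L
N = M @ M.T + 4*np.eye(4)       # symmetric positive definite, playing the role of N

lam, Phi = eigh(L, N)           # solves L x = lam * N x
print(np.allclose(Phi.T @ N @ Phi, np.eye(4)))        # True: N-orthonormal eigenvectors
print(np.allclose(L @ Phi, N @ Phi @ np.diag(lam)))   # True: the eigenvalue relation
```

The eigenvectors come out N-orthonormal, which is exactly the "orthogonality in both quadratic forms" of the proposition below.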

The eigenfunctions corresponding to one eigenvalue λ form a linear space, the eigenspace of the eigenvalue, to be denoted by Eλ. These eigenspaces are mutually orthogonal in the following senses.

Proposition 38 All eigenvalues are real valued, and the eigenfunctions can be assumed to be real. Eigenfunctions corresponding to different eigenvalues are "orthogonal" with respect to both quadratic forms:

for ϕ ∈ Eλ, ψ ∈ Eµ, with λ ≠ µ: Q(ϕ, ψ) = 0 and N(ϕ, ψ) = 0.

We can denote this for the eigenspaces as³

Eλ ⊥N Eµ, Eλ ⊥Q Eµ when λ ≠ µ.

³Be careful with this (useful) description: since N is a norm, orthogonality can be understood in the usual sense; however, Q is not necessarily positive; when it is not positive the use of the word 'orthogonal' may be somewhat misleading.


The eigenvalue problem for Q and N can then equivalently be defined as the problem to find eigenfunctions ϕ ≠ 0, such that for some eigenvalue λ:

Q(ϕ, v) = λN(ϕ, v) for all v ∈ U.

Since this can be rewritten as

δQ(u; v) = λδN(u; v),

one interpretation of an eigenfunction with eigenvalue λ is as a critical point of the functional

U ∋ u ↦ Q(u) − λN(u).

However, since λ is not given, but has to be found, this is not a very useful approach. Much more fruitful is to interpret λ as a multiplier appearing from a constrained problem.

Proposition 39 (Normalized) Eigenfunctions ϕ are critical points of:

ϕ ∈ Crit { Q(u) | u ∈ U, N(u) = 1 }.

Equivalently,

ϕ ∈ Crit { R(u) | u ∈ U }, with R(u) = Q(u)/N(u),

where R is the so-called Rayleigh quotient. The corresponding eigenvalues are precisely the critical values R(ϕ).

This formulation will be most useful, as we will see. It will determine the principal eigenvalue (the largest or smallest one) if R attains its maximum or minimum. Other eigenfunctions can then be found in a recursive, or non-recursive way, all based on the constrained variational formulation above.
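Proposition 39 can be verified directly in finite dimensions with N = Identity: the gradient of the Rayleigh quotient vanishes exactly at the eigenvectors, and its critical values are the eigenvalues. The matrix is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
L = A + A.T                      # symmetric; Q(u) = u.Lu, N(u) = |u|^2 (N = Identity)

def R(u):      # Rayleigh quotient
    return (u @ L @ u) / (u @ u)

def gradR(u):  # gradient of R; it vanishes exactly at eigenvectors of L
    n = u @ u
    return 2*(L @ u)/n - 2*(u @ L @ u)/n**2 * u

lam, Phi = np.linalg.eigh(L)
ok_grad = all(np.allclose(gradR(Phi[:, k]), 0) for k in range(4))
ok_vals = np.allclose([R(Phi[:, k]) for k in range(4)], lam)
print(ok_grad, ok_vals)   # True True: critical points of R, critical values = eigenvalues
```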

To exploit the variational characterization of the eigenfunctions with the Rayleigh quotient in a constructive way, one has to find out whether the Rayleigh quotient is bounded from above or from below. (In finite dimensions it is bounded both from below and from above; in infinite dimensions this is not the case except in trivial cases.)

Definition 40 We say that Q is (strongly) coercive (or elliptic) with respect to N if the Rayleigh quotient is bounded from below (but not from above): for some γ ∈ R

R(u) ≥ γ, and for some sequence um, R(um) → ∞.

Exercise.

1. If γ > 0, Q defines a norm itself, and the assumption of ellipticity means that this norm is stronger than (not equivalent to) the N-norm. Show that on U0 = { u | u(0) = u(π) = 0 } the norm Q(u) = ∫ ux² is coercive with respect to N(u) = ∫ u².


2. If R is not definite (then γ < 0), the quadratic form Q is not a norm; by defining

Q̃(u) := Q(u) + 2|γ|N(u)

it follows that Q̃(u) ≥ |γ|N(u), and so Q̃ is a norm that is stronger than N. Since the eigenfunctions of Q and Q̃ are the same, and the eigenvalues just differ by the constant shift 2|γ|, one could just as well study the eigenvalue problem for Q̃ and N. Stated differently, it would be no restriction to assume Q to be positive definite from the start.

3. Show that the Sturm-Liouville operator with

Q(u) = ∫ [p(x)ux² + q(x)u²]

is coercive on U0 with respect to N = ∫ ρu², provided p is non-negative (and non-trivial), q is bounded, and ρ is positive definite.

Spectral theorem for differential operators

Theorem 41 Principal eigenfunction and -value
Suppose that Q is coercive (elliptic) with respect to N: for some γ (> 0), Q(u) ≥ γN(u), and assume that the minimization problem for R has a solution. Then the solution

ϕ1 ∈ Min { Q(u) | u ∈ U, N(u) = 1 } ∼ Min { R(u) | u ∈ U }

is the principal eigenfunction ϕ1, i.e. the eigenfunction corresponding to the smallest eigenvalue, the principal eigenvalue λ1, that is given by

λ1 = R(ϕ1) (≥ γ).

Any other eigenfunction (independent of ϕ1) can be assumed to be orthogonal (both in N- as well as in Q-sense) to ϕ1. In the following formulation this will be exploited in a successive way.

Theorem 42 Successive characterization
The eigenfunctions and eigenvalues can be obtained in a successive way: if ϕ1, . . . , ϕk are the eigenfunctions corresponding to the eigenvalues that are ordered like

(γ ≤) λ1 ≤ λ2 . . . ≤ λk,

the "next" eigenfunction is found as the solution of

ϕk+1 ∈ Min { Q(u) | u ∈ H, N(u) = 1, N(u, ϕj) = 0 for 1 ≤ j ≤ k };

the corresponding eigenvalue λk+1 ≡ R(ϕk+1) "follows" λk in the sense that λk+1 ≥ λk, while, when λk+1 > λk, there are no other eigenvalues in between.
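The successive characterization can be run literally in finite dimensions: at each step the Rayleigh quotient is minimized on the orthogonal complement of the eigenvectors found so far (here by restricting the matrix to a basis of that complement). The matrix is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
L = A + A.T                                  # Q(u) = u.Lu, N = Identity
lam_ref = np.linalg.eigh(L)[0]               # reference spectrum

phis, vals = [], []
for k in range(5):
    if phis:                                 # orthonormal basis B of {u | u ⊥ phi_1..phi_k}
        C = np.array(phis)
        B = np.linalg.svd(C)[2][len(phis):].T
    else:
        B = np.eye(5)
    mu, W = np.linalg.eigh(B.T @ L @ B)      # minimize R on the admissible subspace
    vals.append(mu[0])                       # lambda_{k+1}
    phis.append(B @ W[:, 0])                 # phi_{k+1}

print(np.allclose(vals, lam_ref))            # True: successive minima give the spectrum
```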

Exercise.


1. The orthogonality constraints in the successive characterization are natural constraints: although essential in the definition of the constraint set, there is no effect in the equation for the critical point: the corresponding multiplier vanishes. To verify this, consider the equation for

ψ ∈ Crit { R(u) | u ∈ H, N(u, f) = 0 },

where f is any given function. The governing equation is, for some multipliers µ, σ,

Lψ = µNψ + σf, with µ = R(ψ).

Verify that σ = 0 if f is some eigenfunction, but that in general σ will not vanish.

2. Prove with the methods of this chapter that the EVP from Linear Algebra

Ax = λBx,

with A and B symmetric matrices, B positive definite, has a complete set of eigenvectors.

3. Variational accuracy
Suppose that in a numerical calculation an approximation ϕ̃1 for the first eigenfunction ϕ1 is constructed that is correct up to order ε (in N-norm for instance):

ϕ̃1 − ϕ1 = O(ε).

Show that then the approximate first eigenvalue λ̃1 = R(ϕ̃1) that is constructed is correct up to order ε²:

λ̃1 − λ1 = O(ε²).

Investigate the effect of an error ε in the calculation of ϕ1 for the approximation of λ2 and of ϕ2. Do the same for higher eigenvalues and eigenfunctions.
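The quadratic accuracy claimed in item 3 is easy to observe numerically: perturb an exact eigenvector by ε in a fixed direction and watch the Rayleigh-quotient error shrink like ε². The matrix and the error direction are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))
L = A + A.T
lam, Phi = np.linalg.eigh(L)
phi1, lam1 = Phi[:, 0], lam[0]

def R(u): return (u @ L @ u) / (u @ u)

d = rng.standard_normal(6)
d -= (d @ phi1) * phi1                 # error direction orthogonal to phi1
d /= np.linalg.norm(d)

errs = [abs(R(phi1 + eps*d) - lam1) for eps in (1e-1, 1e-2, 1e-3)]
print(errs)   # each factor-10 reduction of eps reduces the eigenvalue error ~100-fold
```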

Remark. By its nature, the above formulation requires the knowledge of the previous eigenfunctions to find the next eigenvalue: the eigenvalue λk+1 follows by investigating the minimizer of R on the set of functions orthogonal to the eigenfunctions ϕ1, ..., ϕk. When one wants to use this formulation in a numerical procedure, for instance, this may lead to serious error-accumulation: an error in the calculation of ϕ1 influences the constraint set for λ2, and induces an additional error in the calculation of λ2 and of ϕ2, and so on. The conclusion must be that the successive characterization as given above is not very suitable for numerical calculation of the successive eigenvalues and eigenfunctions. In subsection .. a non-successive characterization that is free of error-accumulation is described.

Just as for symmetric matrices, the eigenfunctions for most differential operators form a complete set; this is a very strong result, but requires in infinite dimensions some additional compactness condition. The proof will be based on the successive characterization, but actually only requires the knowledge that the eigenvalues can be ordered and tend to infinity (are not bounded above). This is usually the case for differential operators when Q defines a norm that is 'essentially stronger' than N.

Theorem 43 Completeness of the set of eigenfunctions
If each eigenvalue has finite multiplicity, and if the eigenvalues are unbounded:

(γ ≤) λ1 ≤ λ2 . . . ≤ λk . . . → ∞,

then the set of eigenfunctions is complete, both with respect to the N-norm and with respect to the Q-norm⁴.

Generalized Fourier theory

The completeness result implies that any function in U can be written as a (generalized) Fourier series

u(x) = Σ_{m=1}^∞ um ϕm(x);

using the fact that the eigenfunctions are orthonormal, it follows directly that the Fourier coefficients are given by

um = N(u, ϕm);

the infinite sum converges in the sense that

N(u − Σ_{m=1}^M um ϕm) → 0 for M → ∞,

and also in the stronger norm:

Q(u − Σ_{m=1}^M um ϕm) → 0 for M → ∞.

Fredholm alternative

Another interpretation is that the operator L is in diagonal form with respect to a basis of eigenfunctions, and hence that the inverse of L can be found easily. For simplicity suppose that N = Identity, and consider the inhomogeneous problem

Lu = f, u ∈ U.

Writing f = Σ fn ϕn, with fn the Fourier coefficients of f, the solution is given by

u = Σm (fm/λm) ϕm,

at least when⁴

⁴When Q is not definite, this should be understood with respect to the Q̃-norm, with Q̃ = Q + (|λ1| + ε)N, for any ε > 0.


• either all eigenvalues λm are non-zero (the operator L is invertible),

• or, if there is a zero eigenvalue, with eigenspace Eλ=0 (consisting of the eigenfunctions with eigenvalue 0), there exists a solution only if the inhomogeneous term satisfies the orthogonality conditions

f ⊥ Eλ=0;

in that case the solution is not unique: any element from Eλ=0 can be added.

These results are just a straightforward generalization of the Fredholm alternative for (symmetric) matrices.

Example: EVP for S-L operator on an interval

This example shows that the results for the EVP for specific operators are generalizations of the usual Fourier theory. For given positive functions ρ and p, and a function q on [0, π] (all smooth), the Sturm-Liouville eigenvalue problem (with Dirichlet boundary conditions) reads:

Lϕ = −∂x(p(x)ϕx) + q(x)ϕ = λρ(x)ϕ, ϕ(0) = ϕ(π) = 0,

and is obtained in U0 = { u ∈ L² | u(0) = u(π) = 0 } with the quadratic forms

N(u) = ∫ ρ(x)u², Q(u) = ∫ [p(x)ux² + q(x)u²].

Exercise.

1. The special case ρ ≡ 1, p ≡ 1, q ≡ 0 provides Fourier theory (for functions that are odd on [−π, π]): then the eigenvalues and (normalized) corresponding eigenfunctions are given by

λm = m², ϕm = √(2/π) sin mx, m ≥ 1.

The completeness result in the spectral theorem implies that any function satisfying the boundary conditions can be written as a Fourier-sine series

u(x) = √(2/π) Σ_{m=1}^∞ um sin mx,

for Fourier coefficients given by

um = ⟨u, ϕm⟩ = √(2/π) ∫ u(x) sin mx dx;

the convergence in the N-norm is just the usual L²-convergence:

∫ (u − Σ_{m=1}^M um ϕm(x))² dx → 0, for M → ∞.


The convergence in the Q-norm implies a much stronger statement. To investigate that, exploit the Poincaré inequality: for some constant c1 > 0 it holds that

|u|²∞ ≤ c1 ∫ ux² for all u with u(0) = u(π) = 0.

Then the convergence in the Q-norm implies the pointwise (even uniform) convergence of the Fourier-sine series:

|u − Σ_{m=1}^M um ϕm|∞ → 0, for M → ∞.

2. Changing the boundary conditions to Neumann boundary conditions:

ux(0) = ux(π) = 0

provides Fourier cosine series, since then

λm = m², ϕm = √(2/π) cos mx for m ≥ 1, and λ0 = 0 with constant eigenfunction ϕ0 = √(1/π);

completeness in L²-norm, and pointwise, in the same way as above.

3. Observe that in both cases the eigenvalues are "simple": to each eigenvalue there corresponds precisely one eigenfunction; equivalently: the eigenspaces are one-dimensional. This is characteristic for Sturm-Liouville eigenvalue problems on an interval.

4. For functions u ∈ U0, where U0 are functions from C¹([0, 1]) with u(1) = 0, show that the following Poincaré-Friedrichs inequality holds for a suitable constant c:

∫₀¹ ux² ≥ c ∫₀¹ u².

Can you determine the best possible value of c?
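The convergence statements of item 1 can be watched numerically. The sample function u(x) = x(π − x) is a hypothetical choice satisfying the boundary conditions; both the L²-error and the sup-error of the partial sums decrease with M (a sketch under these assumptions):

```python
import numpy as np

x = np.linspace(0.0, np.pi, 2001)
dx = x[1] - x[0]
u = x * (np.pi - x)                        # satisfies u(0) = u(pi) = 0

def partial_sum(M):
    """Fourier-sine partial sum with coefficients u_m = <u, phi_m> (Riemann sums)."""
    s = np.zeros_like(x)
    for m in range(1, M + 1):
        phi = np.sqrt(2/np.pi) * np.sin(m*x)
        um = np.sum(u * phi) * dx
        s += um * phi
    return s

for M in (1, 4, 16):
    e = u - partial_sum(M)
    print(M, np.sqrt(np.sum(e**2)*dx), np.max(np.abs(e)))
# both the L2-error (N-norm) and the sup-error (controlled by the Q-norm) decrease
```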

Example: EVP for S-L operator on a spatial domain

For a domain Ω ⊂ Rⁿ, with boundary ∂Ω = ∂ΩD ∪ ∂ΩN, and for given functions p(x), q(x) and ρ(x), the quadratic forms

N(u) = ∫Ω ρu², Q(u) = ∫Ω [p(x)|∇u|² + q(x)u²]

on the set

U = { u : Ω → R | u(x) = 0 for x ∈ ∂ΩD }

lead to the EVP

−div (p(x)∇φ) + q(x)φ = λρ(x)φ in Ω,
φ = 0 on ∂ΩD,
p(x)∂nφ = 0 on ∂ΩN.


Sufficient conditions on the functions p, ρ that make it possible to apply the general theory are that they are positive definite:

p(x) ≥ p0 > 0, ρ(x) ≥ ρ0 > 0.

Then existence and completeness follow. (It should be noted that in more dimensions (n ≥ 2) the convergence in Q-norm does not imply pointwise convergence; only for functions of one variable does the Poincaré inequality hold!)

Exercise.

1. For given smooth and bounded functions p(x) and q(x), consider the quadratic forms

Q(u) = ∫₀¹ [p(x)ux² + q(x)u²], N(u) = ∫₀¹ u². (2.14)

(a) Write down the Sturm-Liouville eigenvalue problem corresponding to Q and N on the set U0 of functions from C¹([0, 1]) with u(1) = 0.

(b) Show that if the function p in (2.14) is strictly positive on the entire interval [0, 1], the Rayleigh quotient

R(u) := Q(u)/N(u)

is bounded from below on U0.

2. In a few specific cases, for special domains Ω, the eigenfunctions can be found explicitly. In all these cases the method of separation of variables is used. This can be done for instance for rectangular domains and for balls. Study the EVP for the Laplace operator with Dirichlet, and then with Neumann, boundary conditions for a domain Ω ⊂ R² in case

• Ω is the rectangle [0, a] × [0, b],
• Ω is the unit disc.
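For the rectangle with Dirichlet conditions, separation of variables gives eigenvalues (mπ/a)² + (nπ/b)². A finite-difference check is easy because the discrete 2D spectrum is the sum of the two 1D spectra; the rectangle [0, 1] × [0, 2] and the grid size are arbitrary illustrative choices:

```python
import numpy as np

a, b, h = 1.0, 2.0, 1.0/200

def fd_dirichlet_1d(length):
    """Eigenvalues of -u'' with u = 0 at both ends, second-order finite differences."""
    n = int(round(length/h)) - 1
    T = (2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    return np.linalg.eigvalsh(T)

# separation of variables: 2D eigenvalues are sums of 1D eigenvalues
ev = np.add.outer(fd_dirichlet_1d(a), fd_dirichlet_1d(b))
fd5 = np.sort(ev.ravel())[:5]
exact5 = np.sort([(m*np.pi/a)**2 + (n*np.pi/b)**2
                  for m in range(1, 6) for n in range(1, 6)])[:5]
print(fd5)
print(exact5)   # the discrete spectrum matches the separated eigenvalues closely
```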

Comparison methods for principal eigenvalues

Often we want to compare the eigenvalues of two different eigenvalue problems. When for each problem the eigenvalues are found in a variational way, this may be done in an elegant way. The eigenvalue problems to be compared may differ in three ways (or combinations thereof):

• the operators are different,
• the boundary conditions are different,
• the domain of definition of the functions is different.


Exercise. Consider the vibrations of a linear string governed by

utt = ∂x(σ(x)ux), u(0, t) = u(ℓ, t) = 0,

where ℓ is the length, and σ is a material property. When looking for time-harmonic solutions of the form

u(x, t) = v(x) exp[iωt],

there results the eigenvalue problem for v with ω² = λ:

−∂x(σ(x)vx) = λv, v(0) = v(ℓ) = 0.

The principal eigenvalue λ1 determines the lowest frequency of vibration of the string; in practice it determines the fundamental tone of a piano etc. Of course, its value depends on the length ℓ, on the material properties described by σ, and in fact also on the boundary conditions.

1. For σ constant, the principal eigenvalue is given by

λ1 = σ(π/ℓ)², with eigenfunction ϕ1 = sin(πx/ℓ).

Hence, the principal eigenvalue decreases when the length increases, and/or when the tension σ decreases.

2. For a string with a free endpoint at x = ℓ, the boundary conditions are replaced by v(0) = 0, vx(ℓ) = 0. Then the principal eigenvalue µ1 is given by

µ1 = σ(π/2ℓ)², with eigenfunction ψ1 = sin(πx/2ℓ),

and produces a lower fundamental tone since µ1 < λ1.

3. The same result holds true for the non-fundamental eigenvalues:

λm = σ(mπ/ℓ)², with eigenfunction ϕm = sin(mπx/ℓ),

µm = σ((2m − 1)π/2ℓ)², with eigenfunction ψm = sin((2m − 1)πx/2ℓ);

note that µm < λm for all m ≥ 1.

4. The physical statement of these results is that upon relaxing the constraints, the eigenvalues decrease.
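The fixed-free eigenpairs quoted above can be verified symbolically; the mode number m = 3 is an arbitrary sample, and the ratio µm/λm = ((2m − 1)/2m)² < 1 confirms the comparison of item 3:

```python
import sympy as sp

x, l, sigma = sp.symbols('x l sigma', positive=True)
m = 3                                          # any sample mode number
lam_m = sigma*(m*sp.pi/l)**2                   # fixed-fixed eigenvalue
mu_m = sigma*((2*m - 1)*sp.pi/(2*l))**2        # fixed-free eigenvalue
v = sp.sin((2*m - 1)*sp.pi*x/(2*l))            # fixed-free eigenfunction

residual = sp.simplify(-sp.diff(sigma*sp.diff(v, x), x) - mu_m*v)
print(residual)                                # 0: v solves -d/dx(sigma v_x) = mu v
print(v.subs(x, 0), sp.diff(v, x).subs(x, l))  # 0 0: both boundary conditions hold
print(sp.simplify(mu_m/lam_m))                 # 25/36, i.e. mu_3 < lambda_3
```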

Using the extremal characterization for the principal eigenvalue, comparison between different problems may be relatively easy. For ease of presentation we will mainly deal with differential operators for which the principal eigenvalue minimizes the Rayleigh quotient.

Proposition 44 Let U be the linear space, and R the Rayleigh quotient. Suppose that the principal eigenvalue Λ minimizes R on U; making its dependence explicit, we write

Λ(R, U) = Min { R(u) | u ∈ U }.

Then Λ depends monotonically on R and on U in the following senses:

• Λ(R1, U) ≤ Λ(R2, U) if R1(u) ≤ R2(u) for all u ∈ U;

• Λ(R, U1) ≤ Λ(R, U2) if U1 ⊃ U2.

Proof. The first statement follows from

R1(u) ≤ R2(u) for all u ∈ U
⟹ Min { R1(u) | u ∈ U } ≤ R2(u) for all u ∈ U
⟹ Min { R1(u) | u ∈ U } ≤ Min { R2(u) | u ∈ U }.

The second statement follows from the fact that the minimum decreases (or at least does not increase) if the domain of definition is enlarged ("relaxing the constraints . . . ").

Exercise. Sturm-Liouville comparison results

1. Different operators

(a) Let p(x) = 1 + x², and q(x) = sin x in the scalar S-L operator, and consider Dirichlet conditions. Derive a lower bound and an upper bound for the smallest eigenvalue of the Sturm-Liouville eigenvalue problem.

(b) Let Λ1, Λ2 be the principal eigenvalues of respectively

−div [p1(x)∇u] + q1(x)u = Λ1ρ1u,
−div [p2(x)∇u] + q2(x)u = Λ2ρ2u

on a domain Ω with the same boundary conditions. If

p1 ≤ p2, q1 ≤ q2, ρ1 ≥ ρ2 on Ω,

the Rayleigh quotients satisfy

R1(u) ≡ ∫[p1|∇u|² + q1u²] / ∫ρ1u² ≤ ∫[p2|∇u|² + q2u²] / ∫ρ2u² ≡ R2(u),

and hence Λ1 ≤ Λ2.

2. Different boundary conditions
For the same S-L operator on Ω, consider two different boundary conditions:

u = 0 on ∂Ω1, and ∂nu = 0 on ∂Ω∖∂Ω1;
u = 0 on ∂Ω2, and ∂nu = 0 on ∂Ω∖∂Ω2.

It should be noted now that the Neumann boundary conditions arise as natural boundary conditions; hence the correct boundary conditions are obtained by investigating the Rayleigh quotient on the sets

U1,2 = { u | u = 0 on ∂Ω1,2 }.

When ∂Ω1 ⊂ ∂Ω2 ("relaxing ..."), it holds that U1 ⊃ U2, and hence Λ1 ≤ Λ2.


3. Different domains, Dirichlet boundary conditions
Consider the same S-L operator with Dirichlet boundary condition on two domains Ω2 ⊂ Ω1 (the functions are defined on the largest domain, and so is the Rayleigh quotient). Any function v2 ∈ U2 = { v : Ω2 → R | v = 0 on ∂Ω2 } can be extended to a function v1 on Ω1 by assigning it the value zero for x ∈ Ω1∖Ω2; this defines the space of functions Ũ1 = { v : Ω1 → R | v = 0 for x ∈ Ω1∖Ω2 }. Since extension with zero does not change the value of the Rayleigh quotient, R(v1) = R(v2), and since Ũ1 ⊂ U1 = { u : Ω1 → R | u = 0 on ∂Ω1 }, it follows that

Λ2 = Min { R(v2) | v2 ∈ U2 } = Min { R(v1) | v1 ∈ Ũ1 } ≥ Min { R(u) | u ∈ U1 } = Λ1.
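The domain-monotonicity conclusion can be checked in one dimension with finite differences; the two intervals are arbitrary sample domains (a sketch):

```python
import numpy as np

def principal_dirichlet_ev(length, n=800):
    """Smallest eigenvalue of -u'' = lam*u, u(0) = u(length) = 0, finite differences."""
    h = length / n
    T = (2*np.eye(n-1) - np.eye(n-1, k=1) - np.eye(n-1, k=-1)) / h**2
    return np.linalg.eigvalsh(T)[0]

L1 = principal_dirichlet_ev(1.0)    # Omega_1 = (0, 1)
L2 = principal_dirichlet_ev(0.8)    # Omega_2 = (0, 0.8), a subdomain
print(L1, L2)   # L1 ≈ pi^2 < L2 ≈ (pi/0.8)^2: shrinking the domain raises Lambda_1
```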

Remark. It should be noted that the inequalities derived above for the principal eigenvalue cannot so easily be extended to non-principal eigenvalues. The reason is that, for instance for the second eigenvalue, R has to be investigated on the functions orthogonal to the first eigenfunction. When dealing with two problems, the principal eigenfunctions, say Φ1 and Ψ1, will differ, and so will their orthogonal complements:

{ u | u ∈ U, N(u, Φ1) = 0 } ≠ { u | u ∈ U, N(u, Ψ1) = 0 },

while neither is simply included in the other. Hence, no conclusions can be drawn by considering the minimization problems, not even for the same R: no inequality can be asserted between

Min { R(u) | N(u, Φ1) = 0 } and Min { R(u) | N(u, Ψ1) = 0 },

and hence there are no conclusions for the second eigenvalue. This situation motivates the characterization of the second (and higher) eigenvalues without using the first eigenfunction; in the next section we will describe such a non-successive characterization.

2.3.2 Algorithm for Relative Equilibrium (Solutions)

Many partial differential equations for problems from the physical and technical sciences are nonlinear, and as such are difficult in the sense that usually no explicit solutions can be written down. Occasionally special solutions can be found, and then usually a whole family can be found, a family depending on parameters. These special solutions are often found in an ad-hoc way, using some special Ansatz. But actually, in many cases for dynamical systems, a deep result from Classical Mechanics lies behind this Ansatz. This result can be generalized to Variational Evolution Equations, like wave-like equations and more general continuous Poisson systems. Without proof, it is formulated as an algorithm in the next subsection; from this it also becomes clear that the notion generalizes the idea of 'simple' equilibrium solutions.

Consider a dynamical system with Poisson structure that is written in a general form like

∂tu = ΓδH(u)

68 CHAPTER 2. CONSTRAINED PROBLEMS

Critical points of the Hamiltonian are then Equilibrium solutions:

If δH(u) = 0, then ∂tu = 0.

Assume that the system has one or more 'constants of the motion', meaning function(al)s I that do not change in time on solutions:

∂t I(u) = 0 for solutions of ∂tu = ΓδH(u);

such a functional is often simply called an 'integral' of the dynamical system. Denote the 'flow' of each integral I by Φ^I, meaning that

Φ^I_τ(v0) is the solution of the IVP ∂τv = ΓδI(v), v(τ = 0) = v0.

Then (provided some technical conditions are satisfied, roughly speaking that the integrals are 'independent'), consider the constrained critical point problem:

Crit { H(u) | I(u) = γ } (2.15)

Any constrained critical point is called a relative equilibrium; it satisfies, for a certain multiplier λ,

δH(u) = λ δI(u). (2.16)

Related to this relative equilibrium there is a true solution of the system, called a relative equilibrium solution U(t), for which the time evolution is actually given by

U(t) = Φ^I_{λt}(u); (2.17)

for more integrals this has to be interpreted as a superposition of flows (in arbitrary order⁵):

U(t) = Φ^{I1}_{λ1 t} Φ^{I2}_{λ2 t} ... Φ^{IN}_{λN t}(u)

In many applications these solutions are rather special and often referred to as coherent structures of the dynamical system.

Note that this result means that the relative equilibria can be found from a Nonlinear Eigenvalue Problem of variational form in which the functionals have direct physical relevance for the dynamics. In many cases the multipliers also have a clear physical meaning. We will show in Appendix C that 'solitons' in KdV and NLS are special cases of the principle above.

Exercise. Spherical pendulum

1. Derive the equation for the spherical pendulum using as Lagrangian the difference of kinetic and potential energy, expressed in spherical angle coordinates.

⁵ This is one of the basic results for Poisson systems which is not proved in this course: provided the functionals Im are 'independent' in the sense that they Poisson-commute, i.e. that they satisfy, in Poisson bracket notation, {Ik, Im} = 0, their flows commute: Φ^{Ik} Φ^{Im} = Φ^{Im} Φ^{Ik}.


2. Show that the dynamics is a Hamiltonian system with Hamiltonian

H = (1/2) ( p_θ² + p_φ² / sin²θ ) + ω₀² (1 − cos θ)

3. Observe that there is, apart from H, an additional first integral I, the so-called angular momentum.

4. Reduce the dynamics by prescribing the value of I, and find equilibria of this reduced dynamics.

5. Show that the equilibria of the reduced dynamics are in fact the relative equilibria: constrained minimizers of H at given I.

6. Determine the relative equilibrium solutions and interpret their motion in space. Relate the angular velocity to (the derivative of) the relevant value function.
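The relative equilibrium of this exercise can also be found numerically. The following is a minimal sketch (not part of the notes; the values I = 0.5 and ω0 = 1 are made up): with p_φ = I prescribed, the reduced potential obtained from the Hamiltonian of part 2 is V_red(θ) = I²/(2 sin²θ) + ω₀²(1 − cos θ), and a relative equilibrium θ* solves V_red′(θ*) = 0. The multiplier then equals the angular velocity of the resulting conical motion.

```python
import math

# Relative equilibrium of the spherical pendulum (a numerical sketch; the
# parameter values I = 0.5 and omega0 = 1.0 are made up for illustration).
# With the angular momentum p_phi = I fixed, the reduced potential is
#   V_red(theta) = I**2 / (2 sin(theta)**2) + omega0**2 * (1 - cos(theta)),
# and a relative equilibrium theta* satisfies V_red'(theta*) = 0.

def vred_prime(theta, I, omega0):
    return -I**2 * math.cos(theta) / math.sin(theta)**3 + omega0**2 * math.sin(theta)

def relative_equilibrium(I, omega0):
    """Bisection on (0, pi/2): V_red' goes from -infinity to a positive value."""
    lo, hi = 1e-6, math.pi / 2 - 1e-6
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if vred_prime(mid, I, omega0) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

I, omega0 = 0.5, 1.0
theta_star = relative_equilibrium(I, omega0)
# The multiplier is the angular velocity of the conical motion:
phi_dot = I / math.sin(theta_star)**2      # dH/dp_phi at the equilibrium
print(theta_star, phi_dot)
```

The root-finding step is elementary here; the point is only that the constrained critical point problem reduces, after fixing I, to a finite dimensional one.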

2.3.3 Thermodynamic systems: constrained steepest descent

We have seen that gradient systems can be used for numerical purposes to find the minimizer of a given functional; the dynamic trajectories are in the direction of steepest descent.

When looking for constrained minimizers, this method has to be adapted to take the constraints into account. We briefly describe the modification; for simplicity we restrict ourselves to the case of one functional constraint. These systems are called thermodynamic systems, since there is one integral that is conserved (the "energy") while another one decreases monotonically (the "entropy")⁶.

Definition 46 A thermodynamic system is a dynamical system in the state space U that is of the form

∂t u = −[ δH(u) − λ(u) δI(u) ],  with  λ(u) = ⟨δI(u), δH(u)⟩ / ⟨δI(u), δI(u)⟩,

where H : U → R and I : U → R are functionals.

To see the dynamic properties, observe that any functional F evolves according to

∂t F(u) = −⟨δF(u), δH(u) − λ(u)δI(u)⟩.

Substituting the functional I for F and using the expression for λ(u), it follows that

∂t I(u) = −⟨δI(u), δH(u) − λ(u)δI(u)⟩ = 0.

⁶ From a mathematical point of view, the system can also be considered as a simple example of a dynamical system on a manifold: the level set of the conserved functional serves as the manifold on which a dissipative system is defined.


Hence, for the dynamics the functional I is conserved: a constant of the motion, a first integral:

I(u(t)) = I(u(0)) for all t.

Geometrically this is seen since the vector field δH − λδI is perpendicular to δI, and so tangent to the level sets of I.

On each level set of I, the system behaves like a dissipative system as treated in Chapter 1. In fact, for the evolution of H:

∂t H(u) = −⟨δH(u), δH(u) − λ(u)δI(u)⟩ = −⟨δH(u) − λ(u)δI(u), δH(u) − λ(u)δI(u)⟩,

so

∂t H(u) ≤ 0, with equality iff δH(u) = λ δI(u).

From this it follows that H decreases monotonically, except at the points that are the equilibria of the system. Indeed, an equilibrium solution satisfies the equation

δH(u) = λδI(u)

for some scalar λ. Recalling Lagrange's multiplier rule, this is the equation for constrained critical points of H on a level set of I:

u ∈ Crit { H(u) | I(u) = γ }.

From the above observations it is clear that the trajectories are in the direction of constrained steepest descent; hence the equation can be used to find constrained minimizers of H on level sets of I in a numerical way.

Exercise.

1. The dynamical system above provides a way to prove Lagrange's multiplier rule in an alternative way, different from the proof given before. Give the detailed argumentation.

2. Write down the equation of constrained steepest descent to find the solution of

Min { ∫ u_x² | ∫ u² = 1, u(0) = u(π) = 0 },

and investigate the convergence to the minimal element.
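For this exercise, a possible discretization of the constrained steepest descent equation is sketched below (an illustration with made-up numerical choices, not the notes' own code): finite differences on (0, π), explicit Euler steps, and a renormalization after each step to keep the trajectory exactly on the level set ∫u² = 1. The multiplier λ(u) should converge to the smallest eigenvalue 1, and u to √(2/π) sin x.

```python
import math

# Constrained steepest descent (sketch) for
#   Min { \int_0^pi u_x^2 | \int_0^pi u^2 = 1, u(0) = u(pi) = 0 }.
# Here H(u) = \int u_x^2 with delta H = -2 u_xx and I(u) = \int u^2 with
# delta I = 2u; lambda(u) = <dI, dH> / <dI, dI> as in Definition 46.

N = 60                               # interior grid points
h = math.pi / (N + 1)
x = [(i + 1) * h for i in range(N)]
u = [xi * (math.pi - xi) for xi in x]    # admissible initial guess

def normalize(v):
    """Rescale so that the discrete version of \int u^2 equals 1."""
    s = math.sqrt(h * sum(vi * vi for vi in v))
    return [vi / s for vi in v]

u = normalize(u)
dt = 0.2 * h * h                     # stable explicit step for the diffusion part
for _ in range(10000):
    # delta H = -2 u_xx with Dirichlet values u_0 = u_{N+1} = 0:
    dH = [-2.0 * ((u[i + 1] if i < N - 1 else 0.0) - 2 * u[i]
                  + (u[i - 1] if i > 0 else 0.0)) / h ** 2 for i in range(N)]
    dI = [2.0 * ui for ui in u]
    lam = sum(a * b for a, b in zip(dI, dH)) / sum(a * a for a in dI)
    u = [ui - dt * (gH - lam * gI) for ui, gH, gI in zip(u, dH, dI)]
    u = normalize(u)                 # project back onto the level set of I

exact = [math.sqrt(2 / math.pi) * math.sin(xi) for xi in x]
err = max(abs(abs(ui) - ei) for ui, ei in zip(u, exact))
print(lam, err)
```

The renormalization step compensates for the first-order drift of the explicit Euler scheme off the constraint manifold; the continuous flow itself conserves I exactly.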


2.4 Exercises

1. Lemma DuBois-Reymond, integrated Euler-Lagrange equation
We start with a generalization of Lagrange's Lemma, and then show how it can be used to weaken the regularity assumptions for a critical point that were required in the first chapter.

(a) Use the (linear) multiplier rule to prove the following

Proposition 47 (Lemma DuBois-Reymond) Let f ∈ C0([0, 1]) satisfy

⟨f, ξ⟩ = 0 for all ξ ∈ C∞₀ with ∫₀¹ ξ(x) dx = 0.

Then f is constant on [0, 1].

(b) Give a different proof (based on Lagrange's Lemma) by reformulating the information for f, observing that

{ η_x | η ∈ C∞₀([0, 1]) } ≡ { ξ ∈ C∞₀([0, 1]) | ∫₀¹ ξ(x) dx = 0 }.

Observe that this method requires the assumption that f is differentiable!

(c) Integrated Euler-Lagrange equation for single integral problems
For the single integral functional L(u) ≡ ∫_a^b L(u, u_x) dx, and the expression for the first variation

δL(u; η) = ∫ [ L_u η + L_{u_x} η_x ] dx,

the Euler-Lagrange equation was found in Chapter 1 by partial integration of the last term. This required the assumption that L_{u_x} is differentiable, which usually requires the assumption that u ∈ C². Show that, by partial integration of the first (!) term, this assumption can be avoided when using the result of the Lemma of DuBois-Reymond above. The result is then the integrated form of the Euler-Lagrange equation:

−∫^x [L_u] + L_{u_x} = constant on [a, b].

Inspecting this integrated form, conclude that at a critical point u ∈ C¹ actually L_{u_x} is differentiable. In most cases this implies that actually u ∈ C². This means that for a critical point additional regularity is obtained from the stationarity condition! Illustrate this for the simple case L = (1/2) u_x² − V(u).

2. Consider for γ > 0 the value function of the minimization problem

h(γ) = Min { ∫₀^π [ u_x² + u² ] dx | ∫₀^π u⁴ dx = γ, u(0) = u(π) = 0 }.

Determine, up to a multiplicative factor, the value function by using the homogeneity of the functionals (do not try to calculate the minimizers explicitly).


3. Geometric problems
The following are a few of the classical problems that deal with constraints. (Exploit "energy conservation" to help solve the equations explicitly.)

(a) Dido's problem
Find the surface of largest area, given the value of its perimeter.

(b) Chain line
Find (from minimal potential energy) the form of a chain with prescribed length hanging in the (constant) gravitational field between given points.

(c) Brachistochrone
Determine the shape of a curve of given length in the vertical plane that is such that the time for a particle (starting with zero initial velocity at the highest end point) to reach the other point is as small as possible (friction neglected).

4. Rotating free fluid surface
Consider a standing circular cylinder, with radius R, and with the z-axis pointing in the vertical direction as axis. Assume that the cylinder is partly filled with water (mass density ρ ≡ 1) that rotates with constant angular velocity ω around the z-axis. Assume that friction of the fluid with the cylinder wall and the bottom can be neglected. Taking the bottom at z = 0, the free surface of the fluid can be described by a function z = η(r), where r is the radial distance from the axis.

(a) The volume of the fluid is determined when the free surface is given, i.e. it is a functional of η, to be called V(η). Determine this functional.

(b) The kinetic energy of the rotating fluid is given by

K(η) = ∫₀^R π ω² r³ η(r) dr

and the potential energy (with g the gravitational acceleration) by

P(η) = ∫₀^R π g r η(r)² dr.

i. Write down the equation for a minimizer of the functional K(η) − P(η) on the set of functions η that satisfy the constraint

∫₀^R 2π r η(r) dr = γ,

where γ is a given constant.
ii. Determine the minimizing free surface explicitly (for γ sufficiently large).

5. Free fluid surface in a container
Consider a cylinder with its axis vertical (in the direction of gravitation, the z-axis), partly filled with fluid. Assuming that the bottom of the cylinder is described at z = 0 by the region Ω ⊂ R², the fluid surface will be described by u = u(x, y), so that the fluid occupies the region

{ (x, y, z) | (x, y) ∈ Ω, 0 ≤ z ≤ u(x, y) }.

We are looking for the form of the free surface of the fluid, i.e. the function u, from a minimal energy principle when effects of surface tension and adhesion are taken into account. To that end, let

• S = the area of the free surface,
• S∗ = the area of the wetted part of the cylinder wall,
• V = the volume of the water.

(a) Describe S, S∗, V as functionals of u.

(b) For given V0 > 0 and σ ∈ R with |σ| < 1, consider the minimization problem

Min { S(u) − σS∗(u) | u ∈ C¹(Ω), V(u) = V0 }.

Interpret this optimization problem in physical terms as a minimum energy principle.

(c) Supposing that u ∈ C2, determine the governing boundary valueproblem for a minimizer u.

(d) Write σ = sin β, with β ∈ (−π/2, π/2). Give the meaning of β. Sketch the form of the free surface for the two different cases σ < 0 and σ > 0. (Can you give examples of fluids with these properties?)

(e) Express the multiplier that appears in the equation for u in terms of known quantities (σ, area of Ω, length of ∂Ω).

(f) Approximate the functional S for surfaces for which ∇u is small, by a constant plus a quadratic functional S(u). Write down the governing boundary value problem.

(g) Consider the special case of a circular cylinder: ∂Ω is the circle with radius R. Introduce cylinder coordinates (r, φ, z) and write S, S∗, V as functionals of u = u(r, φ). Express the multiplier in terms of σ and R. Determine explicitly the free surface for given σ and V0 (sufficiently large) in the approximation considered.

(h) Find a lower bound for V0 (given σ and R) for which a C²-solution can be expected. What is the physical interpretation?

6. ** Periodic oscillations of constrained (pseudo-) potential energy
When looking for periodic solutions of a second order system with potential energy V,

−q̈ = V′(q), q(0) = q(T), q̇(0) = q̇(T),


where the period T is not prescribed in advance, one may try to use the constrained critical point problem

Crit { K(x) | V(x) = R, x ∈ X },

with

K(x) = ∫₀¹ (1/2)|ẋ|² dτ,  V(x) = ∫₀¹ V(x) dτ,

and X = { x ∈ C¹([0, 1]) | x(0) = x(1) }.

(a) Give sufficient conditions on the potential energy function V that imply that the multiplier λ in the equation for the constrained critical points,

−ẍ = λV′(x),

is positive.

(b) Show that, when λ > 0, the critical points x(τ) correspond to the desired periodic solutions up to a scaling of the time variable. Give the physical meaning of the functionals K and V expressed in terms of q(t).

(c) The minimization problem Min { K(x) | V(x) = R, x ∈ X } (assuming the constraint set to be non-empty) has a trivial solution, viz. a constant. Therefore we have to look for non-minimal critical points.

(d) One case in which non-trivial critical points can be found is when V is an even function: V(x) = V(−x). Show that in that case periodic solutions can be found on an interval [−1, 1] by odd continuation of a critical point on [0, 1] of

Min { K(x) | V(x) = R, x ∈ X, x(0) = x(1) = 0 }.

(e) Find the solutions when x = (x1, x2) and V(x) = x1² + 3x2².

2.5 ** Extensions

2.5.1 Theory of Constrained Second Variation

For the manifold M given in (2.1) we calculate the second variation at a constrained critical point u that satisfies (2.5). To that end, take v ∈ TuM and investigate for which functions w, depending on ε and v, the curve

ε ↦ u + εv + w

belongs to M, i.e. satisfies the constraints. Assuming w = o(ε) from the start (i.e. w/ε → 0 for ε → 0), it follows from

K(u + εv + w) = K(u) + δK(u; εv + w) + (1/2)ε² δ²K(u; v) + o(ε²)
             = K(u) + δK(u; w) + (1/2)ε² δ²K(u; v) + o(ε²)   (2.18)


that w has to satisfy

δKk(u; w) + (1/2)ε² δ²Kk(u; v) + o(ε²) = 0, (2.19)

for 1 ≤ k ≤ p. Calculating the functional L on such a curve, using equation (2.5) for u, produces

L(u + εv + w) = L(u) + εδL(u; v) + δL(u; w) + (1/2)ε² δ²L(u; v) + o(ε²)   (2.20)
             = L(u) + Σk λk δKk(u; w) + (1/2)ε² δ²L(u; v) + o(ε²).   (2.21)

Inserting the expression for δKk(u; w) from (2.19), there results

L(u + εv + w) = L(u) + (1/2)ε² [ δ²L(u; v) − Σk λk δ²Kk(u; v) ] + o(ε²).   (2.22)

The expression

δ²L(u; v) − Σk λk δ²Kk(u; v)   (2.23)

for v ∈ TuM is called the constrained second variation of L on the manifold M at the critical point u. Note that it is precisely the (unconstrained) second variation of the Lagrangian functional (2.6) that leads to the equation for u, but restricted to variations from the tangent space.

Proposition 48 If u is a local extremal for L on the manifold M given by (2.1) that satisfies the multiplier equation (2.5), the constrained second variation (2.23) is sign-definite in all directions v from the tangent space. Specifically, if L has a (local) minimum at u, then

δ²L(u; v) − Σk λk δ²Kk(u; v) ≥ 0 for all v ∈ TuM.   (2.24)

2.5.2 LEVP: Non-successive characterization of eigenvalues

Recall the two reasons we have encountered until now to look for non-successive characterizations of eigenvalues: the error accumulation when using numerical approximations, and the comparison of non-principal eigenvalues described above.

Min-max and Max-Min formulations


Chapter 3

Variational approximations

3.1 Motivation and Introductory Examples

In the introductory examples in the first Chapter we motivated and illustrated the variational formulations for problems from mathematical physics as generalizations of finite dimensional optimization problems. Now, in a certain sense, we will do the opposite: we will approximate an infinite dimensional variational problem by some simplified, finite dimensional problem. Sometimes the aim is to get an approximation that is as good as possible; sometimes the aim is just to get a simpler formulation that is easier to investigate. The procedure is actually very simple.

Summary of the variational restriction method:

Starting with the variational formulation of a certain problem (the 'exact' formulation), say

u ∈ Crit { L(u) | u ∈ M },

a consistent simplified model is obtained by restricting the set of competing functions:

uS ∈ Crit { L(u) | u ∈ M AND u ∈ S }.

We call this a consistent way of approximating the original problem, because by restriction the variational formulation is inherited, and as a consequence characteristic properties remain present which may otherwise easily be lost. Clearly the solution found will depend on the choice of the restriction S, and may or may not be a good approximation.

When dealing with numerical methods, the infinite dimensional function space is approximated by a (high-dimensional) finite dimensional space, for instance by approximating the competing functions by piecewise linear functions, as in Finite Element Methods.

Different from the direct aim to find explicit or approximate solutions, one can also use a variational structure to obtain a simplified model from a complicated model. The restriction will then be specified by the type of phenomena one wants to consider, thereby ignoring other phenomena that may also be described by the original model. The simplified model will then (approximately) describe the selected type of phenomena, while maybe other phenomena are not at all, or not correctly, described. We will show below, and in Appendix B, how the very complicated full set of equations for free surface waves can be simplified by making further and further restrictions.

We will present some examples.

Example. Trial functions for BVPs

1. Consider the simple BVP

−∂x²u = 1, ∂xu(0) = u(1) = 0,

which has as solution the parabola u(x) = (1/2)(1 − x)(x + 1).

When we ask how to approximate the solution by a linear function, which then has to be of the form

w(x) = a(1 − x)

for some value a, we would not know how to proceed without using the exact solution: inserting such a function in the equation doesn't give any useful result (however, see further the Ritz-Galerkin method for a way out). Now, however, use the exact variational formulation:

Min { ∫₀¹ [ (∂xu)² − 2u ] dx | u(1) = 0 }.

The correct equation for the minimizer is found if all smooth functions u with u(1) = 0 are taken. But the formulation still makes sense when we restrict to the linear functions above:

Min { ∫₀¹ [ (∂xw)² − 2w ] dx | w(x) = a(1 − x) }.

This can easily be calculated and leads to a minimization problem in the single variable a:

Min { ∫₀¹ [ a² − 2a(1 − x) ] dx | a } = Min { a² − a | a },

with solution a = 1/2. This simple example illustrates the method that can and will be used in much more complicated situations also.

An alternative would be to look for a solution in the form of a harmonic function. If we take it of the form w(x) = b cos(πx/2), this trial function satisfies both boundary conditions for each parameter value b. Inserting it in the variational formulation leads to

Min { ∫₀¹ [ (∂xw)² − 2w ] dx | w(x) = b cos(πx/2) }
  = Min { ∫₀¹ [ (bπ/2)² sin²(πx/2) − 2b cos(πx/2) ] dx | b }
  = Min { (1/2)(bπ/2)² − (4/π)b | b },

attained for b = 16/π³. In the plot below the exact solution (bold line) is compared with the two approximations found here.
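The two restricted minimizations can also be reproduced numerically; the sketch below (illustrative only, not part of the notes) approximates the restricted functional by the trapezoid rule and minimizes over the single parameter with a golden-section search.

```python
import math

# Variational restriction for -u'' = 1, u'(0) = u(1) = 0 (a sketch).
# Exact solution: u(x) = (1 - x**2)/2.  We minimize the restricted functional
#   F(w) = \int_0^1 [ (w')**2 - 2 w ] dx
# over the two one-parameter families used in the text.

def functional(w, wx, n=1000):
    """Trapezoid approximation of \int_0^1 [(w')^2 - 2 w] dx."""
    h = 1.0 / n
    total = 0.0
    for i in range(n + 1):
        x = i * h
        val = wx(x) ** 2 - 2 * w(x)
        total += val * (0.5 if i in (0, n) else 1.0)
    return total * h

def minimize_1d(F, lo, hi, iters=80):
    """Golden-section search for the minimizer of a unimodal F on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if F(c) < F(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

# Linear trial functions w(x) = a (1 - x):
F_lin = lambda a: functional(lambda x: a * (1 - x), lambda x: -a)
a_star = minimize_1d(F_lin, 0.0, 2.0)

# Harmonic trial functions w(x) = b cos(pi x / 2):
F_cos = lambda b: functional(lambda x: b * math.cos(math.pi * x / 2),
                             lambda x: -b * math.pi / 2 * math.sin(math.pi * x / 2))
b_star = minimize_1d(F_cos, 0.0, 2.0)

print(a_star, b_star, 16 / math.pi ** 3)
```

Both parameters agree with the values a = 1/2 and b = 16/π³ found analytically above.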

2. Now a slight variant:

−∂x²u = 1, u(−1) = u(1) = 0,

for which an approximation is sought as a tent function obeying the boundary conditions: w(x) = a(1 − |x|), with an obvious result as above by mirror symmetry.

3. Instead of using one 'tent' function, one could use other trial functions, for instance polynomials, harmonics, etc., or combinations of such functions. When taking combinations of many such functions, which preferably form a basis when the number of combinations would not be limited, this becomes the area of numerical methods, which will briefly be described in the next section.

Accuracy of the restricted solution

By restricting the domain of the functional in the variational restriction method, the critical point of the restricted functional will in general not coincide with that of the functional on the original domain of definition. Stated differently, the restricted critical point will not satisfy the original Euler-Lagrange equation. What can we say about the quality of this restricted critical point as an approximation of a solution of the original E-L equation? Compared to the original problem, the restriction method restricts the admissible variations. As a consequence, the directional derivative will vanish only in the restricted directions. Recalling the ideas of the LMR, this means that only the projection of the variational derivative (i.e. of the E-L equation) on the restricted set will vanish.

In general notation, assume that the set is given by a parametrized family of functions:

S = { U(p) | p ∈ P },

where P is the 'parameter space'; for instance, this may be the collection of Fourier coefficients in the procedure to restrict the original functions to finite Fourier truncations. Then the admissible variations are given by

(∂U/∂p) δp,

where δp are the variations in the parameters, and a critical point of the restricted functional L will satisfy

∂p L(U(p)) ≡ δL(U(p); (∂U/∂p) δp) = 0.

This can be written with the variational derivative in a more appealing way as

⟨ δL(U(p)), (∂U/∂p) δp ⟩ = 0.

This shows that, instead of satisfying the full E-L equation δL(u) = 0, the restricted critical point U(p) satisfies the projection in the admissible directions only. For instance, if the parameters are from a linear finite dimensional space (i.e. are 'constants'), δp can be taken out of the inner product and it follows (if the parameters are independent) that

⟨ δL(U(p)), ∂U/∂p ⟩ = 0.

Observe that these are just as many 'equations' as there are parameter values p to be determined in the calculation of the restricted critical point; we will return to this in the next section.

When the 'parameters' are actually functions themselves, δp cannot be taken out of the inner product, and the expression means that the E-L equation is satisfied in a weighted sense, with weight determined by the admissible directions.

3.2 Variational Numerical Methods

3.2.1 General method

Let S be a finite dimensional subset of an infinite dimensional function space, usually taken as the set consisting of linear superpositions of certain base functions φk:

SN = { Σ_{k=1}^N a_k φ_k(x) | a_1, ..., a_N }.


To denote such a combination, we often refer only to the vector composed of the coefficients:

SN ≃ R^N = { a | a = (a_1, ..., a_N) };

the linear combination can then be written, when we think of a as a row vector and collect the base functions in a column vector, as a · φ(x).

For numerical purposes we often want 'completeness' for N → ∞, meaning that each element from the original function space can be arbitrarily well approximated (in some norm) by some element from SN, provided N is taken large enough.

For illustration, for functions of one variable x the following typical choices of base functions are characteristic for so-called global base functions and local base functions respectively:

• Harmonic functions, Fourier truncation, a global method:
Using complex notation, and adjusting the numbering of the index slightly, the base functions are

φk = e^{ikx};

for periodic functions on a given interval of length 2π the index is restricted to the integers, say m ∈ [−N, ..., N]; for arbitrary L²-functions the index is continuous and the summation becomes integration; this is the usual description with a Fourier basis. Functions can then be represented as usual:

u(x) = Σ_{m=−N}^{N} a_m φ_m(x), with a_m = (1/2π) ∫ u(x) e^{−imx} dx.

• FEM, linear elements, a local method:
Consider an interval on which the problem is defined; make a partition of the interval, for simplicity equidistant, with mesh size h and nodes x0, x1, x2, ..., xN; the interval (xm, xm+1) is called the (m+1)-th element. Consider so-called linear splines: continuous piecewise linear functions that are confined to two successive elements (and normalized in amplitude):

φm(x) = 0 for x ∉ (xm−1, xm+1),  φm(x) = 1 − |x − xm|/h for x ∈ (xm−1, xm+1).

For obvious reasons, such functions will be called 'tent functions'. These form a local base (they have only limited extension); any function can thus be uniquely represented by a continuous piecewise linear function, connecting the values of the function at the nodes:

u(x) = Σm um φm(x), with um = u(xm).

Intuitively, by letting N → ∞ the approximation becomes better and better: the basis is complete.

Taking the values of a function in nodes is also done in so-called collocation methods and in Finite Difference methods. There, however, the behaviour of the function in between nodes is not explicitly stated; Taylor expansions are used to derive formulas for derivatives etc. Here, in the FEM method, the behaviour of the function in between nodes is prescribed beforehand, which will be necessary to approximate the integrals that appear in functionals.
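The tent-function representation can be sketched in a few lines (N = 10 equidistant elements on [0, 1]; the sampled function is made up): the combination Σ um φm(x) is exactly the piecewise linear interpolant of the nodal values.

```python
# Linear 'tent' basis functions on an equidistant grid (a sketch of the
# phi_m just defined); u(x) = sum_m u_m phi_m(x) is the piecewise linear
# interpolant of the nodal values u_m = u(x_m).
N, h = 10, 1.0 / 10
nodes = [m * h for m in range(N + 1)]

def phi(m, x):
    # 1 at node x_m, decaying linearly to 0 at the neighbouring nodes
    return max(0.0, 1.0 - abs(x - nodes[m]) / h)

def interp(u_nodes, x):
    return sum(um * phi(m, x) for m, um in enumerate(u_nodes))

u_nodes = [xm * (1 - xm) for xm in nodes]   # sample a smooth function
# At a node, the interpolant reproduces the value exactly:
print(interp(u_nodes, nodes[3]), u_nodes[3])
# Between nodes it is the linear average of the neighbouring values:
mid = 0.5 * (nodes[3] + nodes[4])
print(interp(u_nodes, mid), 0.5 * (u_nodes[3] + u_nodes[4]))
```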


Example. A functional of Sturm-Liouville type:

L(u) = ∫ [ p(x)(∂xu)² + q(x)u² − 2f(x)u ] dx

reduces to a function on R^N by restricting to functions from SN as follows:

L(a) := L(u = Σ_{k=1}^N a_k φ_k) = Σ_{m,k} a_m a_k P_mk + Σ_{m,k} a_m a_k Q_mk − 2 Σ_m a_m F_m,

in which we have introduced the N × N matrices P and Q,

P_mk = ∫ p(x)(∂xφm)(∂xφk) dx,  Q_mk = ∫ q(x) φm φk dx,

and the vector

F_m = ∫ f(x) φm dx.

Then this can be written more elegantly in matrix notation as

L(a) = (P + Q)a · a − 2F · a.

This function L : R^N → R is called the restriction of the functional to the finite dimensional set. The SL-BVP results from variations of the functional:

Min_u L(u):  −∂x p(x)∂x u + qu − f = 0;

the corresponding equation for the vector a follows analogously from variations of the restricted function:

Min_a L(a):  (P + Q)a − F = 0.

In this way we have found a discretization of the differential operator on the set SN:

−∂x p(x)∂x + q ←→ P + Q
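As an illustration (not from the notes), the reduction can be carried out numerically with the sine basis φk = sin(kx) on (0, π); the data p = q = 1 and f = sin x + sin 3x are made up so that the exact solution u = sin(x)/2 + sin(3x)/10 lies in SN and is recovered by solving (P + Q)a = F.

```python
import math

# Restriction of the Sturm-Liouville functional
#   L(u) = \int_0^pi [ p u_x^2 + q u^2 - 2 f u ] dx,  u(0) = u(pi) = 0,
# to S_N = span{ sin(kx) } (a sketch; p = q = 1 and f = sin x + sin 3x are
# chosen so that the exact solution sin(x)/2 + sin(3x)/10 lies in S_N).

Nq, Nb = 4000, 4                           # quadrature points, basis size
h = math.pi / Nq
xs = [(i + 0.5) * h for i in range(Nq)]    # midpoint rule

p = q = lambda x: 1.0
f = lambda x: math.sin(x) + math.sin(3 * x)
phi = lambda k: (lambda x: math.sin(k * x))
dphi = lambda k: (lambda x: k * math.cos(k * x))

P = [[h * sum(p(x) * dphi(m + 1)(x) * dphi(k + 1)(x) for x in xs)
      for k in range(Nb)] for m in range(Nb)]
Q = [[h * sum(q(x) * phi(m + 1)(x) * phi(k + 1)(x) for x in xs)
      for k in range(Nb)] for m in range(Nb)]
F = [h * sum(f(x) * phi(m + 1)(x) for x in xs) for m in range(Nb)]

# Solve (P + Q) a = F by Gaussian elimination (the matrix is SPD here).
A = [[P[m][k] + Q[m][k] for k in range(Nb)] for m in range(Nb)]
b = F[:]
for i in range(Nb):
    for j in range(i + 1, Nb):
        r = A[j][i] / A[i][i]
        A[j] = [aj - r * ai for aj, ai in zip(A[j], A[i])]
        b[j] -= r * b[i]
a = [0.0] * Nb
for i in reversed(range(Nb)):
    a[i] = (b[i] - sum(A[i][j] * a[j] for j in range(i + 1, Nb))) / A[i][i]

print(a)   # expect approximately [0.5, 0, 0.1, 0]
```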

Observe that the matrices are symmetric (from their definition): the symmetric operator has been consistently modelled by a symmetric matrix. The finite dimensional equation for a is called the projection of the original equation into the set SN. Solving the algebraic equation can be done with methods from linear analysis; however, the variational structure can also be used: a direct method to minimize the function L(a).

Example. S-L eigenvalue problem
For the S-L operator above, the constrained formulation for the eigenvalue problem

−∂xp(x)∂xu+ q(x)u = λρ(x)u

reads

Min { ∫ [ p(∂xu)² + qu² ] dx | ∫ ρu² = 1 }


and reduces to

Min { (P + Q)a · a | Ra · a = 1 },

where the (symmetric) matrix R is defined as R_mk = ∫ ρ(x) φk φm dx.

The corresponding eigenvalue problem in R^N,

(P + Q)a = μRa,

will produce approximations for (N) eigenfunctions (forming a complete base in R^N since the eigenvalue problem is symmetric) and for the corresponding (real) eigenvalues.

Example. Of course, the precise form of the matrices depends on the set of base functions. Consider the simple S-L eigenvalue problem with p(x) = ρ(x) = 1, q(x) = 0 on the interval x ∈ [0, π] with Dirichlet boundary values.

• Take φk(x) = sin(kx). Show that the algebraic problem has only diagonal matrices and produces exactly the first N eigenfunctions and eigenvalues; in this way we reproduce the Fourier sine series.

• Using linear elements in FEM leads to tri-diagonal matrices. For instance, for non-boundary matrix elements:

P_mk = ∫ (∂xφm)(∂xφk) dx = { −1/h for k = m ± 1;  2/h for k = m;  0 for k ≠ m, m ± 1 },

R_mk = ∫ φm φk dx = { h/6 for k = m ± 1;  2h/3 for k = m;  0 for k ≠ m, m ± 1 }.

In particular, the SL-operator is replaced by a three-point stencil, which in an internal node reads

[−∂x²u]_{xm} ↔ −(um−1 − 2um + um+1)/h²,

and which is the same discretization as in FD methods when using the central difference scheme for the second derivative. Note, however, that the identity operator in function space is also replaced by a three-point stencil, making a certain average over 3 neighbouring points:

[u]_{xm} ↔ (um−1 + 4um + um+1)/6,

different from standard FD reasoning¹.

¹ In that case it is customary to take [u]_{xm} ↔ um. This would correspond to the gradient of the function Σ um², which can be seen as the 'discretization' of the functional ∫u² dx if the function u is approximated by a piecewise constant function! Hence this is NOT consistent with the reasoning to approximate with piecewise linear functions, which led to the central difference. Stated differently: by using the discretization [u]_{xm} ↔ um, the function Σ um² is not a restriction of the quadratic functional to the set of piecewise linear functions; the functional has been changed.


Exercise. Study the FEM discretization of the eigenvalue problem on the interval (0, 2π) with periodic boundary conditions,

−∂x²u = λu;

write down the discrete eigenvalue problem, find the eigenvectors and eigenvalues, and compare the approximations obtained with the exact results as a function of the number of elements.
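A sketch of the computation (assuming the tridiagonal stencils above, which become circulant under periodic boundary conditions): the discrete eigenvectors are then the Fourier modes exp(ik x_m), so the generalized eigenvalues of P a = μRa can be read off directly from the stencils and compared with the exact eigenvalues k².

```python
import math

# FEM check (a sketch) for -u'' = lambda u on (0, 2pi), periodic boundary
# conditions.  With N linear elements, h = 2pi/N, the stiffness stencil is
# (-1, 2, -1)/h and the mass stencil is (1, 4, 1) h/6; both matrices are
# circulant, so the discrete eigenvectors are the Fourier modes exp(i k x_m)
# and the generalized eigenvalues follow from the stencils.

def fem_eigenvalue(k, N):
    h = 2 * math.pi / N
    stiffness = (2 - 2 * math.cos(k * h)) / h            # mode acted on by P
    mass = (4 * h + 2 * h * math.cos(k * h)) / 6         # mode acted on by R
    return stiffness / mass

N = 64
for k in range(4):
    print(k, k * k, fem_eigenvalue(k, N))    # exact k^2 vs FEM approximation

# Second-order convergence: halving h reduces the error by about a factor 4.
err_64 = abs(fem_eigenvalue(1, 64) - 1.0)
err_128 = abs(fem_eigenvalue(1, 128) - 1.0)
print(err_128 / err_64)
```

Note that the consistent mass matrix makes the FEM eigenvalues slightly overestimate the exact ones, by approximately k⁴h²/12 for small kh.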

3.2.2 Projection of (variational) equations

Contemplation: if we have a certain problem, and some 'approximate solution', how do we judge the quality of that solution? In general the answer is difficult to give. Let us write our problem abstractly as E(u) = 0; usually a (set of) ODEs or PDEs with initial and/or boundary values. If we can find exact solutions (rarely), we are done and can verify that these are truly solutions by simply substituting. However, if we substitute an approximate ('trial') solution, the equation will not be satisfied (precisely) and some 'residue' is left:

E(u_approx) = res.

Intuitively, when the residue is 'small' we will expect the approximation to be good, but it is difficult to make this more precise. (What is small: in which norm, etc.) Using linearization around an exact solution, the error is related to the residue by the linearized equation as follows:

E(u_exact + err) = E(u_exact) + E′(u_exact) err + ... , so, to leading order, E′(u_exact) err = res;

hence the error is small if the residue is small and the linearized operator is boundedly invertible.

When we restrict a functional to find an approximate solution, the situation is more natural. Indeed, now E(u) = δL(u), say for L defined on an infinite dimensional space. Suppose we restrict to some linear subspace S. This then defines a function L_S on S, the restricted function:

L_S(u) = L(u) for u ∈ S.

A critical point of L_S satisfies ∇L_S(uS) = 0, where ∇ denotes the gradient for differentiation in S. This critical point uS ∈ S is then an approximation of the original minimizer. The relation between ∇L_S(uS) = 0 and the original equation δL(u) = 0 is expressed with the first variation as:

∇L_S(uS) = 0 ⟺ δL(uS; ζ) = 0 for all ζ ∈ S.

This means that the directional derivative is zero for all admissible variations within the subspace S. For instance, when S is the span of a set of base functions {φj}_{j=1}^N and u = Σj uj φj, then

L_S(u) = L(Σj uj φj), and so ∂L_S/∂uj = ⟨δL(u), φj⟩.


Vanishing of the gradient then implies that uS satisfies the equation δL(u) = 0 not in the full function space, but precisely in the directions φj, for j = 1...N:

⟨δL(uS), φj⟩ = 0 for j = 1...N.

Summarizing, variational restriction to a subspace gives a solution within the subspace that corresponds to the vanishing of the projection of the equation into the subspace.

This also gives a constructive way to find the solution: there are N equations for the N coefficients uj, j = 1...N, to be determined. Given the base functions, these equations can be written down directly from the equation.

Ritz-Galerkin projection method

The idea above has been generalized to arbitrary equations, not necessarily variational equations. Consider the general equation E(u) = 0, and consider two sets of basis functions: {φ_j}_{j=1}^N and {ψ_j}_{j=1}^N. The so-called Ritz-Galerkin method is to find an approximate solution

u = Σ_j u_j φ_j such that < E(Σ_j u_j φ_j), ψ_m > = 0 for all m = 1, ..., N.

Note that again this gives N equations for the N unknowns u_j, j = 1, ..., N. In the variational restriction method the projection is onto the same subspace as the one in which the solution is sought: ψ_j = φ_j; this special case is called the Ritz method. (See further Section 'Direct Optimization methods'.) A motivation to choose different basis functions for projecting the equation than for representing the solution is that the function spaces may be very different if E is a differential operator; the aim of the choice of the ψ_j is to capture the main 'directions' of the equation.
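As a small illustration of the special case ψ_j = φ_j (the Ritz method), the following sketch, ours and not part of the original notes, minimizes L(u) = ∫₀^π (½u'² − u) dx over the span of the first N sine modes. The critical point solves −u'' = 1 with u(0) = u(π) = 0, whose exact solution is u(x) = x(π − x)/2, so the value at x = π/2 should approach π²/8.

```python
import math

def ritz_coefficients(N, n_quad=2000):
    # Basis phi_j(x) = sin(j x) on (0, pi): these diagonalize the
    # stiffness form  int phi_i' phi_j' dx = (pi/2) j^2 delta_ij,
    # so each Galerkin equation decouples: (pi/2) j^2 u_j = <f, phi_j>.
    h = math.pi / n_quad
    xs = [i * h for i in range(n_quad + 1)]
    coeffs = []
    for j in range(1, N + 1):
        # <f, phi_j> with f = 1, computed by the trapezoidal rule
        vals = [math.sin(j * x) for x in xs]
        load = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
        coeffs.append(load / ((math.pi / 2) * j ** 2))
    return coeffs

def u_approx(x, coeffs):
    return sum(c * math.sin((j + 1) * x) for j, c in enumerate(coeffs))

coeffs = ritz_coefficients(20)
exact = (math.pi / 2) * (math.pi / 2) / 2        # x(pi - x)/2 at x = pi/2
print(abs(u_approx(math.pi / 2, coeffs) - exact))  # small Galerkin error
```

Because the sine modes diagonalize the stiffness form here, the N Galerkin equations decouple; for a general basis one would assemble and solve the N × N linear system.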

3.3 Consistent modelling by restriction

3.3.1 Restriction to suitable families of functions

In the numerical methods treated above the functional was discretized by substituting for the function a linear combination of basis functions. The main reason to do so comes from the completeness argument: better approximations of the function are obtained by taking larger superpositions. Often a more clever Ansatz is possible when some characteristic property of the solution is known in advance. For instance, if it were known that the optimal solution is a confined, quickly decreasing function on the interval, a Fourier decomposition would require many modes to describe such a function, while a profile function with only a few parameters may give an approximation that is just as good in a practical sense. The difference lies in the formulation of the set to which the functional is restricted; in the latter case the choice of this set is important for the quality of the approximation. Besides that, the parameters that describe the functions to be varied usually do not appear linearly: the set is in general not a linear space. We will illustrate the line of reasoning with specific examples.

At several other places in these notes we approximate a 'hump-like' solution by simple 'tent-functions' which have as parameters (to be varied and to be determined) the amplitude and the width; other trial functions, usually more elaborate to deal with but possibly more accurate, could be taken. For instance, Gaussian functions; then the parameters appear in a nonlinear way, such as the parameter σ in the Gaussian A e^{−x²/σ²}.

Nonlinear oscillator: Duffing’s equation

Consider the equation for an oscillator with third-order nonlinearity

ẍ + x + x³ = 0, x(0) = ε, ẋ(0) = 0.

We are interested in (relatively) small amplitude solutions, so ε is small. Phase-plane analysis of this second-order Lagrangian equation shows that all solutions are periodic, with a period that depends on the amplitude of the solution. Straightforward series expansion in the amplitude leads to resonance at third order, which can be prevented by adjusting the frequency in a Poincaré-Lindstedt type of way. We will now deal with two (closely related) variational variants of this method.

The first variant starts with the observation that the equation ẍ + x + x³ = 0 transforms, after the time scaling τ = ωt ∈ [0, 2π] with u(τ) = x(t), into

ω² u'' + u + u³ = 0.

This can be interpreted as a nonlinear eigenvalue problem, with ω² (the squared frequency) to be sought as the Lagrange multiplier. Recalling the corresponding theory, the multiplier is found as the derivative of the value function of a constrained variational problem:

ω² = d/dγ [ Max { ∫ (u² + ½u⁴) | ∫ u'² = γ } ].

An approximation of the optimizer and the optimal value can be obtained by using as trial function the solution of the linear equation, i.e. U = ε cos(τ), with the amplitude related to the given constraint, i.e. ε = √(γ/π). Then (using ∫₀^{2π} cos⁴(τ) dτ = 3π/4) we find

∫ (U² + ½U⁴) = γ + ½ (γ/π)² ∫ cos⁴(τ) dτ = γ + (3/(8π)) γ²,

which leads to

ω² = d/dγ [ γ + (3/(8π)) γ² ] = 1 + (3/(4π)) γ, hence ω ≈ 1 + (3/(8π)) γ = 1 + (3/8) ε².

In the original variables this leads to the first-order solution, asymptotically correct, given by

x(t) = ε cos( [ 1 + (3/8) ε² ] t ) + O(ε³).

The closely related second variant derives this result without transforming the variables, applying the same reasoning directly to the Lagrangian functional for the equation:

∫ [ ẋ² − (x² + ½x⁴) ] dt.

Substituting the trial function x(t) = ε cos(ωt), and taking the interval of time integration to be one period, t ∈ (0, 2π/ω), there results

∫ [ ε²ω² sin²(ωt) − ε² cos²(ωt) − ½ε⁴ cos⁴(ωt) ] dt = ε²ωπ − ε²π/ω − (3π/8) ε⁴/ω.

Taking variations with respect to ε (or ε²), the same result follows:

ω² − 1 − (3/4)ε² = 0, i.e. ω ≈ 1 + (3/8)ε².
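The predicted frequency can be checked numerically. The sketch below, our illustration rather than part of the notes, integrates the Duffing equation with a classical fourth-order Runge-Kutta step and measures the period from the zero crossing of the velocity at the half period; the measured frequency should agree with 1 + (3/8)ε² up to O(ε⁴).

```python
import math

def duffing_frequency(eps, dt=1e-3):
    # Integrate  x'' + x + x^3 = 0,  x(0) = eps, x'(0) = 0  with RK4
    # and detect the half period: v = x' vanishes again at t = T/2.
    def f(x, v):
        return v, -x - x ** 3
    x, v, t = eps, 0.0, 0.0
    while True:
        k1x, k1v = f(x, v)
        k2x, k2v = f(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = f(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = f(x + dt * k3x, v + dt * k3v)
        xn = x + dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        vn = v + dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        if t > 0 and v < 0 <= vn:                 # v crosses zero upward at T/2
            t_half = t + dt * (-v) / (vn - v)     # linear interpolation
            return math.pi / t_half               # omega = 2*pi/T
        x, v, t = xn, vn, t + dt

eps = 0.2
print(duffing_frequency(eps), 1 + 3 * eps ** 2 / 8)
```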

WKB-approximation

The standard second-order, linear, non-autonomous ODE of mathematical physics is the equation for the pendulum with slowly varying frequency or, equivalently, the equation that describes the field of an optical pulse in a slowly varying medium:

∂z² u + k²(z) u = 0.

For an arbitrary function k(z) the solutions of this equation cannot be written down explicitly. When the given inhomogeneity k(z) varies 'slowly'² in z, a good approximation can be found; we will show this now using the variational structure of the equation.

First we remark that it is possible to look for solutions in the form of a phase-amplitude representation

u(z) = A(z) e^{iθ(z)}.

Then the equations for the (real) amplitude and (real) phase can be obtained by substituting this in the ODE. However, it is somewhat simpler to do this from the functional. To that end observe that

∫ [ |∂z u|² − k²|u|² ] dz = ∫ [ (∂z A)² + A² (∂z θ)² − k² A² ] dz.

²This 'slowly varying' can be made more specific by assuming that the function k(z) is actually given by k(z) = K(εz), where K is a given, fixed function; then, taking ε small, the function k will be slowly varying: changes of unit order in k take place over distances z = O(1/ε), which is large for small ε. Equivalently, this can be seen by looking at the derivative: ∂z k(z) = ε ∂ζ K(ζ) for ζ = εz; with ∂ζ K(ζ) = O(1), it follows that ∂z k(z) = O(ε). This explains why the amplitude A(z) in the following is actually a function of εz, and that therefore the expression (∂zA)² in the functional, or ∂z²A in the equation, is of second order and will be neglected.

The correct equations are then found (verify!) by writing down the Euler-Lagrange equations that result from variations in A and θ. When we now assume that the function k(z) varies slowly, it is reasonable to neglect the term ∫ (∂z A)² in the functional. Then the equations become:

variation in A of ∫ [ A²(∂zθ)² − k²A² ] dz : 2A [ (∂zθ)² − k² ] = 0,
variation in θ of ∫ [ A²(∂zθ)² − k²A² ] dz : −2 ∂z [ A² ∂zθ ] = 0.

These can be solved, for θ up to an initial phase constant, and for the amplitude up to a multiplicative constant:

θ(z) = ∫^z k(ξ) dξ, A(z) = c/√k(z).

The corresponding solution

u(z) = ( c/√k(z) ) e^{±i ∫^z k(ξ) dξ}

is called the WKB-approximation (Wentzel, Kramers, Brillouin). It is a remarkably accurate approximation and describes well the main features of the exact solution.
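The accuracy can be checked by inserting the WKB formula back into the ODE: with A = c/√k the first-order terms cancel exactly, leaving the residual u'' + k²u = A'' e^{iθ}, which is of second order in the slowness. The sketch below, an illustration of ours assuming the profile k(z) = 1 + 0.5 tanh(εz), verifies this with finite differences.

```python
import math, cmath

def wkb_residual_ratio(eps, dz=0.01, z_max=40.0):
    # WKB solution u = k^(-1/2) * exp(i * integral of k) for u'' + k(z)^2 u = 0,
    # with the slowly varying profile k(z) = 1 + 0.5*tanh(eps*z) (an assumption).
    n = int(z_max / dz)
    k = [1 + 0.5 * math.tanh(eps * i * dz) for i in range(n + 1)]
    theta = [0.0]
    for i in range(n):
        theta.append(theta[-1] + 0.5 * (k[i] + k[i + 1]) * dz)  # trapezoid rule
    u = [cmath.exp(1j * t) / math.sqrt(kk) for t, kk in zip(theta, k)]
    worst = 0.0
    for i in range(1, n):
        upp = (u[i - 1] - 2 * u[i] + u[i + 1]) / dz ** 2        # central difference
        res = upp + k[i] ** 2 * u[i]
        worst = max(worst, abs(res) / (k[i] ** 2 * abs(u[i])))
    return worst

print(wkb_residual_ratio(0.05))   # relative residual; expected O(eps^2)
```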

3.3.2 Design of simplified models

Different from the direct aim of finding explicit or approximate solutions, one can also use a variational structure to obtain a simplified (usually a restricted, approximate) model. For instance, in Appendix B we show in some detail how the very complicated full set of equations for free surface waves can be simplified by making further and further restrictions:

the full equations are valid for small to large amplitude waves of any wave length;

by restricting to 'rather small, rather long waves' (to be made more precise in order to become operational), the full equations become the much simpler Boussinesq (type of) equations;

while Boussinesq describes waves running in both directions, a further restriction to waves running in only one direction leads to a simpler model of KdV-type;

KdV describes waves with a broad spectrum; by restricting to narrow spectra, an equation for the envelope can be derived, an NLS-type of equation.

All these simplified models can be most easily derived by using the variational structure, that is, by restricting the functional (Hamiltonian) to smaller and smaller classes of wave phenomena. The validity of the model becomes more restricted in doing so, but the simplified model need not be much less accurate, provided it is used for the correct type of phenomena. This variationally consistent way of modelling assures that each more limited model retains a variational structure, inherited from the full equations.

Example. (See Appendix B for a similar reasoning as described here to derive variationally consistent simplified models for free surface waves.) Consider the wave equation

∂t u = −∂x δH(u) with H(u) = ∫ [ ½u² + ½β(∂x u)² + ¼γu⁴ ] dx.

When restricting to small amplitude waves, the quartic term in the functional (cubic in the equation) can be neglected. Further, the resulting linear equation has dispersion, determined by the term with β. Looking at the dispersion relation, for long waves (small wave numbers) this β-term does not contribute much, and we could neglect this term also:

∫ [ ½u² + ½β(∂xu)² + ¼γu⁴ ] dx → small amplitude → ∫ [ ½u² + ½β(∂xu)² ] dx → & long waves → ∫ [ ½u² ] dx.

These restrictions show themselves in a different way in the simplification of the successive equations:

∂t u = −∂x [ u − β∂x²u + γu³ ] → small amplitude → ∂t u = −∂x [ u − β∂x²u ] → & long waves → ∂t u = −∂x [ u ].

The final equation is the simplest one, the translation equation ∂t u = −∂x u. Of course, there is no justification in this process: we decide beforehand to what type of phenomena (type of waves) we want to restrict our attention; then we choose the restricted class accordingly. It should be noted that the simplified model that then results may have, and in general will have, also solutions far outside the restricted set. Indeed, the translation equation has solutions of arbitrary amplitude, and very short, just as well as very long, waves. But of course, in view of the derivation, this equation, and hence its solutions, are only relevant as a simplified model (with solutions that approximate well the solutions of the original equation) when the solutions belong to the restricted class: waves of small amplitude and long wave length.
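The 'long waves' step can be made quantitative through the dispersion relation: a plane wave e^{i(kx−ωt)} in the linearized equation ∂t u = −∂x[u − β∂x²u] travels with ω(k) = k(1 + βk²), while the translation equation gives ω(k) = k, so the relative error committed by the simplification is βk²/(1 + βk²), small precisely for long waves. A minimal numeric check of ours (with an illustrative value of β):

```python
def dispersion_error(k, beta=0.01):
    # omega(k) = k*(1 + beta*k^2) for u_t = -d/dx(u - beta*u_xx),
    # omega(k) = k for the translation equation u_t = -u_x.
    omega_full = k * (1 + beta * k ** 2)
    return abs(omega_full - k) / omega_full

for k in (0.1, 1.0, 10.0):
    print(k, dispersion_error(k))   # grows like beta*k^2 for short waves
```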

3.4 Direct optimization methods

One of the applications of variational methods in numerics is the cleverly designed numerical scheme called the Conjugate Gradient Method (CGM) to solve linear systems, say Ax = b, where A is an n × n matrix while x and b ∈ Rⁿ. To simplify, without losing the idea, we assume here that A is a symmetric positive definite matrix. Such linear systems are often found in the applications of other numerical schemes such as the Boundary Element Method and the Finite Element Method. We will motivate the description of CGM by first examining a scheme called steepest descent.

3.4.1 Steepest Descent

Consider the function

φ(x) = ½ < x, Ax > − < x, b >,

where A is an n × n symmetric positive definite matrix, x, b ∈ Rⁿ, and < ·, · > is the inner product defined by < x, y > = xᵀy. The equilibrium point x_e of the dynamical system

∂x/∂t = −∇φ(x)

satisfies ∇φ(x) = Ax − b = 0, so it is a solution of the linear system Ax = b. Since

∂φ/∂t = < ∇φ, ∂x/∂t > = < ∇φ, −∇φ > = −|∇φ|²,

which is 0 if ∇φ(x) = 0 and negative otherwise, x_e is an isolated local minimizer of φ(x). Furthermore, for a given point x0 ≠ x_e, for which the level set φ(x) = φ(x0) does not cross the equilibrium point x_e, the trajectory of x starting from x0 always points toward x_e.

The steepest descent iterative scheme for solving the linear system Ax = b is designed based on the above observation, namely finding the minimizer of φ(x) starting from a point x0 and taking the direction of steepest descent −∇φ(x). In this fashion, if r_c = b − A x_c = −∇φ(x_c) is the residual at the current point x_c, the next point x_next to be reached from x_c is found by taking the direction r_c, that is,

x_next = x_c + α r_c,

where α is obtained by minimizing φ(x_c + α r_c). It is not difficult to see that α = < r_c, r_c > / < r_c, A r_c >.

The scheme can thus be written briefly as follows:

x0 : initial guess
r0 = b − A x0
k = 0
while r_k ≠ 0
    k = k + 1
    α_k = < r_{k−1}, r_{k−1} > / < r_{k−1}, A r_{k−1} >
    x_k = x_{k−1} + α_k r_{k−1}
    r_k = b − A x_k
end

Although the method is globally convergent, in some cases, when the matrix A has eigenvalues of different orders of magnitude, the convergence is very slow.
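To make the slow convergence concrete, the following sketch, our example rather than one from the notes, runs the scheme on a 2 × 2 diagonal system whose eigenvalues differ by a factor 50; the iterates zig-zag, and the residual contracts only by roughly the factor (κ − 1)/(κ + 1) per step, with κ the condition number.

```python
def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000):
    # A: list of rows of a symmetric positive definite matrix; b, x0: vectors.
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    matvec = lambda M, v: [dot(row, v) for row in M]
    x = list(x0)
    for it in range(max_iter):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # residual = -grad(phi)
        if dot(r, r) ** 0.5 < tol:
            return x, it
        alpha = dot(r, r) / dot(r, matvec(A, r))          # exact line search
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x, max_iter

A = [[1.0, 0.0], [0.0, 50.0]]   # eigenvalues 1 and 50: condition number 50
b = [1.0, 1.0]
x, iters = steepest_descent(A, b, [0.0, 0.0])
print(x, iters)   # exact solution is [1.0, 0.02]; iters is in the hundreds
```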


3.4.2 Conjugate Gradient Method

Search directions

To overcome this weakness of steepest descent, the choice of better search directions to move from one level set of φ to another will be described in the following. As a motivation to understand the concept, first consider a simple two-dimensional linear system written in matrix form Ax = b, where A = (a_ij), i, j = 1, 2, is a 2 × 2 matrix, while x and b ∈ R². Let x0 be the initial guess and let the first direction be p1 = r0, the residual at x = x0. Let x1 = x0 + α1 p1 be the next point, where α1 = < p1, r0 > / < p1, A p1 >, and let r1 be the residual at x = x1; then < p1, r1 > = 0 (show this!). If x_e is the exact solution of the system, then A x_e = b, and

< x_e − x1, A p1 > = < A x_e − A x1, p1 > = < b − A x1, p1 > = < r1, p1 > = 0.

This shows that x_e − x1 is perpendicular to A p1, i.e. x_e − x1 ∈ {A p1}^⊥. To move from the current point x1 to the exact solution x_e, one must take a search direction that lies in {A p1}^⊥.

To generalize the idea to an n-dimensional space, we will employ the following properties.

Proposition 49 Let A be an n × n symmetric positive definite matrix, x, b ∈ Rⁿ, and φ(x) = ½ < x, Ax > − < x, b >. Let x0 be an initial guess and p1, p2, ..., pk be the first k general search directions, that is, x_k = x0 + Σ_{i=1}^{k} α_i p_i for some α_i, i = 1, 2, ..., k. If the α_i, i = 1, 2, ..., k, are chosen such that x_k minimizes φ(x), and r_k = b − A x_k, then < p_i, r_k > = 0, i = 1, 2, ..., k.

Proof. φ(x_k) = φ(x0) + < Σ_{i=1}^{k} α_i p_i, A x0 > + ½ < Σ_{i=1}^{k} α_i p_i, A Σ_{i=1}^{k} α_i p_i > − < Σ_{i=1}^{k} α_i p_i, b >. Then

0 = ∂φ/∂α_i = < p_i, A x0 > + < p_i, A Σ_{j=1}^{k} α_j p_j > − < p_i, b > = < p_i, A(x0 + Σ_{j=1}^{k} α_j p_j) − b > = < p_i, A x_k − b > = − < p_i, r_k >, i = 1, 2, ..., k.

From this property, if x_e is the exact solution of Ax = b, it follows that for i = 1, 2, ..., k

< x_e − x_k, A p_i > = < A x_e − A x_k, p_i > = < b − A x_k, p_i > = < r_k, p_i > = 0,

that is, x_e − x_k is perpendicular to each of the A p_i, i = 1, 2, ..., k, i.e. x_e − x_k ∈ {A p1, A p2, ..., A pk}^⊥. It then makes sense to take the next search direction p_{k+1} ∈ {A p1, A p2, ..., A pk}^⊥ to move from x_k on the level set φ(x) = φ(x_k) to the next point x_{k+1}. Here x_{k+1} = x_k + α_{k+1} p_{k+1}, where α_{k+1} is chosen such that x_{k+1} minimizes φ(x).

The following proposition shows the possibility of designing an iterative scheme for solving Ax = b that converges in at most n steps.


Proposition 50 Let p1, p2, ..., pk be the first k general search directions and p_{k+1} ∈ {A p1, A p2, ..., A pk}^⊥. Let x_{k+1} = x0 + Σ_{i=1}^{k} α_i p_i + α_{k+1} p_{k+1} be a minimizer of φ(x) over α_i, i = 1, 2, ..., k, k+1. Then the α_i, i = 1, 2, ..., k, can be obtained by minimizing φ(x0 + Σ_{i=1}^{k} α_i p_i) independently of α_{k+1}. Having obtained α_i, i = 1, 2, ..., k, and so defining x_k = x0 + Σ_{i=1}^{k} α_i p_i, one has α_{k+1} = < p_{k+1}, r_k > / < p_{k+1}, A p_{k+1} >, where r_k = b − A x_k.

Proof. First observe that

φ(x_{k+1}) = φ(x0 + Σ_{i=1}^{k} α_i p_i + α_{k+1} p_{k+1}) = φ(x0 + Σ_{i=1}^{k} α_i p_i) + α_{k+1} < p_{k+1}, Σ_{i=1}^{k} α_i A p_i > + ½ α_{k+1}² < p_{k+1}, A p_{k+1} > − α_{k+1} < p_{k+1}, r0 >.

Since p_{k+1} ∈ {A p1, A p2, ..., A pk}^⊥, we have < p_{k+1}, Σ_{i=1}^{k} α_i A p_i > = 0. Thus minimizing φ(x_{k+1}) leads to two independent minimizations:

Min φ(x_{k+1}) = Min_{α_i, i=1,2,...,k} φ(x0 + Σ_{i=1}^{k} α_i p_i) + Min_{α_{k+1}} [ ½ α_{k+1}² < p_{k+1}, A p_{k+1} > − α_{k+1} < p_{k+1}, r0 > ].

From the second part we obtain α_{k+1} = < p_{k+1}, r0 > / < p_{k+1}, A p_{k+1} >. We again use the fact that p_{k+1} ∈ {A p1, A p2, ..., A pk}^⊥ to obtain < p_{k+1}, r_k > = < p_{k+1}, b − A x_k > = < p_{k+1}, b − A(x0 + Σ_{i=1}^{k} α_i p_i) > = < p_{k+1}, b − A x0 > = < p_{k+1}, r0 >, giving α_{k+1} = < p_{k+1}, r_k > / < p_{k+1}, A p_{k+1} >.

Given the first k general search directions p1, p2, ..., pk, the next question is how to choose p_{k+1} ∈ {A p1, A p2, ..., A pk}^⊥. The idea is to find the search direction that minimizes the residual, and thus to combine the above scheme with steepest descent. Here p_{k+1} is chosen to be the orthogonal projection of r_k = b − A x_k onto {A p1, A p2, ..., A pk}^⊥. In other words, p_{k+1} is a minimizer of ||p − r_k||², p ∈ {A p1, A p2, ..., A pk}^⊥. It can be shown that such a p_{k+1} can be written as

p_{k+1} = r_k + β_k p_k, where β_k = − < p_k, A r_k > / < p_k, A p_k >.

CGM Algorithm

The above description leads to the following iterative scheme:

x0 : initial guess
r0 = b − A x0
p1 = r0
α1 = < p1, r0 > / < p1, A p1 >
k = 0
while r_k ≠ 0
    k = k + 1
    x_k = x_{k−1} + α_k p_k
    r_k = b − A x_k
    β_k = − < p_k, A r_k > / < p_k, A p_k >
    p_{k+1} = r_k + β_k p_k
    α_{k+1} = < p_{k+1}, r_k > / < p_{k+1}, A p_{k+1} >
end
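A direct transcription of the scheme into code (our sketch; a small SPD test matrix is assumed) shows the contrast with steepest descent: in exact arithmetic the residual vanishes after at most n steps.

```python
def conjugate_gradient(A, b, x0, tol=1e-12, max_iter=50):
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    matvec = lambda M, v: [dot(row, v) for row in M]
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    p = list(r)                                   # p1 = r0
    steps = 0
    while dot(r, r) ** 0.5 > tol and steps < max_iter:
        Ap = matvec(A, p)
        alpha = dot(p, r) / dot(p, Ap)            # alpha_k = <p, r>/<p, A p>
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]   # r_k = b - A x_k
        Ar = matvec(A, r)
        beta = -dot(p, Ar) / dot(p, Ap)           # beta_k = -<p, A r>/<p, A p>
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        steps += 1
    return x, steps

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 50.0]]  # SPD (illustrative)
b = [1.0, 2.0, 3.0]
x, steps = conjugate_gradient(A, b, [0.0, 0.0, 0.0])
print(x, steps)   # terminates in at most n = 3 steps (up to round-off)
```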


Appendix A

Variational Optics

A.1 Basic equations

A.1.1 Macroscopic Maxwell Equations

The Macroscopic Maxwell Equations (MME) in a medium without free charges are given in their standard form by

∂t (D; B) = ( 0, curl; −curl, 0 ) (E; H),

where (D; B) and (E; H) denote column vectors and the 2 × 2 operator matrix has rows (0, curl) and (−curl, 0). Here the basic electromagnetic fields are

E : electric field

H : magnetic field

and the variables

D : dielectric displacement

B : magnetic induction

are expressed in E,H by so-called constitutive relations:

• for propagation in vacuum, D = ε0 E, B = µ0 H with ε0, µ0 constant (ε0 µ0 = 1/c², with c the speed of light in vacuum);

• for propagation in material, polarization effects are present because ofinteraction of fields with molecules and electrons; in these lectures we willassume that the magnetic susceptibility vanishes at the relevant opticalfrequencies, in which case one has

D = ε0E+P(E)

B = µ0H

with polarization P depending on E in a way determined by the materialproperties.



• For lossless materials, to which we will restrict in the following, the constitutive relations can be formulated using constitutive functionals¹. In particular, a functional H of E, H can be found such that

D = δ_E H, B = δ_H H with H = C(E) + ∫ ½ µ0 H · H.

For instance, in vacuum, the constitutive functional C(E) on a domain Ω reads

C(E) = ∫ ½ ε0 E · E.

• As a consequence of the variational structure of the constitutive relations, and the fact that (with suitable boundary conditions) the matrix operator Γ = ( 0, curl; −curl, 0 ) is skew-symmetric (since curl is symmetric), the Maxwell equations can be written down in the following variational form²:

∂t δH = Γ δE with E = ∫ ½ (E · E + H · H).

Monochromatic light

In many cases one is interested in time-harmonic solutions (often called CW, Continuous Waves, in the physics literature), with a frequency ω that may be prescribed or to be found. It is then customary to exploit complex notation and write fields like E = ½ E e^{−iωt} + cc, where here and in the following cc denotes the complex conjugate. Solutions of this type can only be expected to exist provided the polarization of a time-harmonic field is purely harmonic with the same frequency. Then the equations become

−iω (D; B) = ( 0, curl; −curl, 0 ) (E; H),

which, by eliminating the magnetic field, can be written like

ω² µ0 D = curl curl E.

The variational formulation is then retained by using the related constitutive functional:

δ [ ∫ ½ |curl E|² − ω² µ0 C(E) ] = 0.

¹We do not specify here whether these functionals are defined as integrals over the spatial domain or as integrals over time. We will see below, in the case of one spatial dimension, that a time-integration may be most natural.

²As a dynamical system evolving in time, this is of the form of a Poisson system when the functionals involved are given by integrations over the spatial domain. When they are given by integrals over time, the 'dynamic' interpretation is different, but a variational structure is present. This can be seen by formally writing the equations like

∂t⁻¹ Γ (E; H) = δH

and, observing that in space-time the operator ∂t⁻¹ Γ is symmetric, writing down the Lagrangian for this equation.


A.1.2 Restriction to 2 spatial dimensions

In the following we will restrict to two-dimensional (2D) spatial problems (or to 1D). We will think of structures and variables independent of y, and of light propagation in the z-direction. Then the total set of equations for the six field components decouples into two sets of equations for three components only, the so-called splitting into TE-modes (transverse electric) and TM-modes (transverse magnetic):

TE-case : E = (0, Ey, 0), H = (Hx, 0, Hz)

TM-case : E = (Ex, 0, Ez), H = (0, Hy, 0)

Restricting to the TE-case, and assuming that the polarization also has only its y-component non-vanishing, the MME's become

∂t Dy = ∂z Hx − ∂x Hz
µ0 ∂t Hx = ∂z Ey
µ0 ∂t Hz = −∂x Ey.

These equations can be reduced to a scalar equation for E ≡ Ey, with D ≡ Dy, the sME (scalar Maxwell Equation):

sME : µ0 ∂t² D = ∆E ≡ (∂x² + ∂z²) E;

in vacuum this leads to the standard wave equation ∂t² E = c² ∆E. For monochromatic light there results the Helmholtz equation

−ω² µ0 D = ∆E

with variational formulation

δ [ ∫ ½ |∇E|² − ω² µ0 C(E) ] = 0.

A.1.3 Restriction to 1 spatial dimension

With the further restriction of uniformity in the x- and y-directions, a further simplification is obtained: the MME's become

∂t Dy = ∂z Hx, µ0 ∂t Hx = ∂z Ey (A.1)

and hence

sME : µ0 ∂t² D = ∂z² E

Helmholtz : −µ0 ω² D = ∂z² E

A.1.4 Bidirectional equation for pulse propagation

When the Maxwell equations are restricted to depend on one spatial direction only, the z-direction as above, there result equations for the y-component of the E-field and the x-component of the H-field; assuming that the electric polarization also has only its y-component non-vanishing, and restricting to non-magnetic materials, the equations (A.1) can be written as a bidirectional equation like

∂z (E; H) = ( 0, ∂t; ∂t, 0 ) (D; µ0 H) (A.2)


which can also be written as the second-order scalar equation

∂z² E = µ0 ∂t² D.

In the following we consider lossless material with linear dispersion given by ε1(ω) and non-dispersive quadratic and/or cubic nonlinearity³; then the dielectric displacement is given by

D = ε0 E + ε1 ∗ E + χ2 E² + χ3 E³

and can be written as the variational derivative with respect to E of the constitutive functional⁴

C(E) = ∫ [ ½ ( ε0 E² + (ε1 ∗ E) E ) + ⅓ χ2 E³ + ¼ χ3 E⁴ ] dt.

The linear dispersion relation for modes e^{i[kz−ωt]} has two solution branches

k = ±K(ω) with K(ω) ≡ (ω/c) R(ω) ≡ (ω/c) √( 1 + ε1(ω)/ε0 ),

with K(ω) real-valued and odd (K(−ω) = −K(ω)) for real frequencies. Introducing the 'Hamiltonian'

H = C(E) + ∫ ½ µ0 H² dt,

the equations can be written as a Hamiltonian system evolving in z as follows:

∂z (E; H) = ( 0, ∂t; ∂t, 0 ) ( δ_E H; δ_H H ).

Using the analogy with wave propagation in fluid dynamics, this variational structure of the equations can be exploited to derive simplified models that describe the envelope equation for waves propagating in one direction. Without derivation, we simply state the results in the next subsections.

A.1.5 Unidirectional Maxwell equation

For weakly dispersive, nonlinear equations, as considered here, a splitting can be made, in good approximation, between right- and left-travelling waves. Following the unidirectionalization process described in detail in Van Groesen & De Jager [13], the result is the following unidirectional Maxwell equation (uni-ME):

∂z E + (1/c) ∂t [ R(i∂t) E + χ2 E² + χ3 E³ ] = 0 (A.3)

where χ2 and χ3 now denote the rescaled coefficients χ2/(2ε0) and χ3/(2ε0).

³Actually, the following can be generalized in many ways: higher-order dispersion, dispersion in the nonlinear terms, higher-order nonlinearity; only the lossless character is of importance, which implies the existence of a constitutive potential for the dielectric displacement. With the same assumption for the magnetic polarization, magnetic properties can be included as well.

⁴Note that here the constitutive functional is given pointwise in space as an integration over time.


The linear part corresponds to the right-travelling branch of the dispersionrelation, k = K(ω), and can be written like (∂z + iK(i∂t))E = 0. This can beseen easily from the linear bidirectional equation by writing it as

(∂z − iK(i∂t)) (∂z + iK(i∂t))E = 0;this also shows that an exact splitting between right- and left-travelling wavesis possible.The unidirectional equation has inherited the Hamiltonian structure of the

bidirectional equation, as can be seen by writing

∂zE = −1c∂tH with H =

Z ·1

2RE ·E + 1

3χ2E

3 +1

4χ3E

4

¸dt. (A.4)

The corresponding magnetic field is for this unidirectional propagation given byH = −

qε0µ0E.

Remark. In the theory of surface waves on a layer of fluid, a similar equation (with z and t interchanged, and with the long-wave dispersion approximated by R = 1 + ∂t²) is known as the Korteweg-de Vries (KdV) equation; it describes unidirectional surface waves in a remarkably good approximation.

A.1.6 NLS Envelope equation for pulse propagation

We now present the equation for the envelope of a wave group centered at a central frequency ω. The resulting wave group is a modulation of a harmonic mode, represented by a complex-valued amplitude A:

u(z, t) = A(z, t) e^{iθ} + cc, with θ = K(ω) z − ωt.

The equation for the amplitude is an NLS-type of equation. To get it in an attractive form, it is customary to eliminate the first-order term in the dispersion by introducing a frame moving with the group velocity 1/K'(ω), i.e. τ = t − K'(ω) z, ζ = z, to approximate the dispersion relation by a quadratic polynomial at ω,

K(ω + ν) ≈ K(ω) + K'(ω) ν + β ν² with β = ½ K''(ω),

and to make the following scaling (restricting for simplicity to the simplest case of cubic nonlinearity⁵, χ2 = 0 and χ3 ≠ 0):

z* = z/c and u = √χ3 E.

Then the NLS-equation is obtained:

NLS: ∂ζ A + iβ ∂τ² A + iγ |A|² A = 0. (A.5)

Again, this equation has variational structure: it is a (complex) Hamiltonian system (evolving in space):

∂ζ A = i δH(A), with H(A) = ∫ [ ½ β |∂τ A|² − ¼ γ |A|⁴ ] dτ.

⁵For quadratic nonlinearity, the third-order resonance appears through interaction of first- with second-order effects. The interaction coefficient γ is then much more complicated; see the literature for more details.


Remark. The coefficients γ and β depend on ω and on the sign of χ3; a simple scaling transforms this equation to the normalized form

∂ζ A + i ∂τ² A + i sign(βγ) |A|² A = 0.

Different signs lead to equations with essentially different properties:

CNLS: sign(βγ) = +1, Converging NLS

DNLS: sign(βγ) = −1, Diverging NLS

For instance, for CNLS (γ < 0 and anomalous dispersion, i.e. K''(ω) < 0) soliton solutions exist, but this is not the case for DNLS. The NLS-equations are well known in optics and have been studied extensively; see e.g. [1, 15, 20].
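The soliton claim for CNLS can be checked directly on equation (A.5): substituting A = a sech(λτ) e^{iκζ} and using sech'' = sech − 2 sech³ gives κ = −βλ² and a² = 2βλ²/γ, which is solvable precisely when βγ > 0 (our computation, not from the notes). The sketch below, with illustrative parameter values, evaluates the residual of (A.5) numerically at ζ = 0.

```python
import math

beta, gamma, lam = -1.0, -2.0, 1.5      # beta*gamma > 0: the CNLS case
kappa = -beta * lam ** 2
a = math.sqrt(2 * beta * lam ** 2 / gamma)

def profile(tau):
    return a / math.cosh(lam * tau)      # A(zeta=0, tau); the phase factor is 1

worst = 0.0
dtau = 1e-3
for i in range(-2000, 2001):
    tau = i * dtau
    # second tau-derivative by central differences
    d2A = (profile(tau - dtau) - 2 * profile(tau) + profile(tau + dtau)) / dtau ** 2
    # d/dzeta of a*sech(lam*tau)*exp(i*kappa*zeta) at zeta = 0 is i*kappa*A
    residual = 1j * kappa * profile(tau) + 1j * beta * d2A \
               + 1j * gamma * abs(profile(tau)) ** 2 * profile(tau)
    worst = max(worst, abs(residual))
print(worst)   # only the finite-difference error remains
```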

A.1.7 Spatial 2D NLS

Consider the Nonlinear Helmholtz equation in a plane medium with Kerr nonlinearity (third order: χ3):

∆E + ω² [ n² + χ3 |E|² ] E = 0.

It is then quite common to look for a beam propagating in the z-direction, with amplitude variations in the transversal x-direction and 'slowly' varying in z. Substituting, for real propagation constant⁶ β and complex-valued amplitude A, the Ansatz

E = A(x, z) e^{iβz},

there results

∂z² A + 2iβ ∂z A + ∂x² A − β² A + ω² [ n² + χ3 |A|² ] A = 0.

The assumed 'slow' variation in the z-direction is exploited by neglecting the second-order derivative ∂z² A, which leads to the equation

2iβ ∂z A + ∂x² A − β² A + ω² [ n² + χ3 |A|² ] A = 0.

Assuming the index to be constant (for simplicity), taking β² = ω² n² simplifies this to an NLS-equation in the form

2iβ ∂z A + ∂x² A + ω² χ3 |A|² A = 0, (A.6)

which can be further rewritten to normalized form when desired.

A.2 Optical waveguide modes

A.2.1 Preliminaries

Consider a wave guide of width 2w in the x-direction. For simplicity, we will consider an index that is symmetric around the z-axis. Although more general

⁶It is customary to use the notation β for the propagation constant, and we follow this custom. Note, however, that this parameter is different from the parameter β = ½ K''(ω) that we used in the previous subsection on pulse propagation.


index variations can be taken, we will consider in particular the case (that can be analyzed most easily) that the index of refraction is a step function:

n(x) = n0 + (n1 − n0) χ_{(−w,w)} = { n1 > n0 for |x| < w ; n0 for |x| > w }.

For the TE-case, with time-harmonic electric field E = u(x, z) e^{−iωt}, the Helmholtz equation for the space-dependent field u reads

∆u + ω² n(x)² u = 0.

The discontinuity in the index requires one to find a 'weak' solution, i.e. a solution that satisfies the interface conditions

u and ∂x u continuous at x = ±w.

Then Fourier transformation with respect to z leads one to look for solutions with harmonic z-dependence:

u(x, z) = φ(x) e^{iβz}.

The value β is usually called the 'propagation constant'; it is the wave number of the travelling wave in the z-direction of the E-field:

E = φ(x) e^{i(βz−ωt)}.

The problem for the profile function φ then becomes

∂x² φ + ( ω² n(x)² − β² ) φ = 0 (A.7)

with corresponding interface conditions for φ:

φ and ∂x φ continuous at x = ±w.

For a step index the solutions of this problem can be found explicitly. Indeed, both within and outside the wave guide the index is constant, and the ODE for φ is a simple equation for which the solution can be written down explicitly. The interface conditions should then match the interior solution with the outer solution to obtain a valid solution on the whole real line. In more detail the calculation is as follows.

For constant n the solutions of (A.7) depend on the sign of ω² n² − β². If positive, say k² := ω² n² − β² > 0, the solutions are harmonic: φ = A cos(kx) + B sin(kx) = a e^{ikx} + b e^{−ikx}, while for negative values, say ρ² = β² − ω² n² > 0, the solutions are exponentials: φ = C cosh(ρx) + D sinh(ρx) = c e^{ρx} + d e^{−ρx}.

In the following we will restrict to the case that the solutions we are looking for vanish at infinity; these are the so-called guided modes, defined by the requirement

φ(x) → 0 for |x| → ∞.

[Solutions that are periodic in x are called radiation modes, and are referred to as non-guided modes: the light is not 'confined' to, not guided by, the wave guide.]


Collecting these pieces, we conclude that we can expect guided modes onlyif the value of β satisfies

ω2n21 > β2 > ω2n20.

Then the solution will be harmonic in the interior, and exponential outside.Using symmetry, and an (arbitrary) normalization to unity at x = 0, the solutionin the two regions can be written like

φ(x) =

cos

µqω2n21 − β2x

¶for 0 < x < w

A exp

µ−qβ2 − ω2n20(x− w)

¶for x > w

.

The parameter A is arbitrary as yet, but has to be chosen to satisfy the interface

conditions. Requiring continuity, leads to the value A = cosµq

ω2n21 − β2w

¶.

To satisfy the continuity of the first derivative requires:qω2n21 − β2 sin

µqω2n21 − β2w

¶= A

qβ2 − ω2n20,

i.e.qω2n21 − β2 tan

µqω2n21 − β2w

¶=

qβ2 − ω2n20,

which can be written using λ =qω2n21 − β2 like

λ tan(λw) =

qω2 (n21 − n20)− λ2.

Since solutions of this transcendental equation cannot be written down explicitly, we rely on a graphical presentation. Plotting the graphs of the right- and left-hand sides as functions of λ shows that there is at least one solution λ, and hence at least one value of β for which both interface conditions are satisfied, so that a physical mode profile is obtained.

Stated differently, the problem (A.7) for guided modes is an eigenvalue problem, with φ the eigenfunctions to be sought and β^2 the eigenvalues. The possible values of β^2 are discrete, and the number of modes will depend on the width w and the index difference n_1 − n_0. There is always at least one mode, which corresponds to the largest value of β^2; this will be called the principal mode, and it is symmetric and sign-definite (say positive). For simplicity of exposition, we will restrict the presentation in the following to this principal mode.

Exercise. Investigate the transcendental equation to find the number of modes as depending on the width w and the index difference. Study in particular the limiting cases, for instance, at fixed index difference, the cases w → 0 and w → ∞. Make a plot (Maple) of the principal value β as a function of the width.

Exercise. Investigate radiation modes: the continuum of solutions (bounded, but non-vanishing at infinity) that correspond to values of β ∈ (0, ωn_0) (the continuous part of the spectrum). Give also the physical interpretation of these solutions.
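The transcendental equation can also be handled numerically. The following Python sketch (a minimal illustration; the parameter values ω, n1, n0, w are hypothetical, not taken from the text) brackets the roots λ on the branches of the tangent and counts the guided modes:

```python
import numpy as np
from scipy.optimize import brentq

# Hypothetical parameters (chosen for illustration, not from the text).
omega, n1, n0, w = 10.0, 1.5, 1.0, 1.0
V = omega * np.sqrt(n1**2 - n0**2)   # lambda must lie in (0, V)

def f(lam):
    # Difference of the two sides of  lam*tan(lam*w) = sqrt(V^2 - lam^2).
    return lam * np.tan(lam * w) - np.sqrt(V**2 - lam**2)

# On each branch lam*w in (m*pi, m*pi + pi/2) the left-hand side increases
# from 0 towards +infinity, so there is at most one root per branch.
roots, m, eps = [], 0, 1e-9
while m * np.pi / w < V:
    a = m * np.pi / w + eps
    b = min((m + 0.5) * np.pi / w - eps, V - eps)
    if a < b and f(a) < 0 < f(b):
        roots.append(brentq(f, a, b))
    m += 1

# Each root lambda gives a propagation constant beta in (omega*n0, omega*n1).
betas = [np.sqrt((omega * n1)**2 - lam**2) for lam in roots]
print(len(roots), "guided mode(s); beta values:", betas)
```

Since each branch of tan(λw) contributes at most one root, the mode count grows with the width w and with the index difference, in accordance with the first exercise above.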


A.2.2 Variational formulation for guided modes with Transparent BC's

Direct formulation on the unbounded domain

For general index variation, the eigenvalue problem, including the required interface conditions at places where the index has jumps, can be formulated with Rayleigh's quotient or the constrained formulation. When the index profile is assumed to be symmetric (as above), for the normalized, symmetric, principal eigenfunction we can formulate the problem on the positive half-line, with the symmetry boundary condition ∂_x φ = 0 at x = 0. The variational formulation is then as follows (verify!):

    −β^2 = min_φ { ∫_0^∞ [ (∂_x φ)^2 − ω^2 n^2 φ^2 ] dx  |  ∫_0^∞ φ^2 dx = 1, φ → 0 for x → ∞ }    (A.8)

Confined formulation using Transparent Boundary Conditions (TBC)

The problem above is formulated on the unbounded domain. Certainly for numerical methods this is a problem, and one would like a confined formulation. We will now derive such a formulation directly from the general formulation, using the knowledge of the solution of the exterior problem. The argument proceeds as follows for the case of an arbitrary index variation within the waveguide of width w, uniform outside with index n_0.

Take any point outside the waveguide, say x = B ≥ w; we split the interval [0, ∞) into a bounded interior and an unbounded exterior: [0, ∞) = [0, B] ∪ [B, ∞). In the exterior domain we use the fact that we would know the solution if the eigenvalue, say β, and the value Φ of the field at the point x = B were known. Then we match this exterior solution to the as yet unknown solution ψ in the interior, requiring only continuity of the solution at x = B; hence we put Φ = ψ(B). So, trial functions are taken to be of the form

    φ(x) = ψ(x)                                       for x ∈ [0, B],
    φ(x) = ψ(B) exp( −√(β^2 − ω^2 n_0^2) (x − B) )    for x > B.

The part of the integral over the exterior domain, ∫_B^∞ [ (∂_x φ)^2 − ω^2 n^2 φ^2 ] dx, is first reduced by partial integration and then by using the fact that the function satisfies the correct equation there. This leads to

    ∫_B^∞ [ (∂_x φ)^2 − ω^2 n_0^2 φ^2 ] dx = ∫_B^∞ [ −∂_x^2 φ − ω^2 n_0^2 φ ] φ dx − [φ ∂_x φ]_{x=B}
                                            = −β^2 ∫_B^∞ φ^2 dx + ψ(B)^2 √(β^2 − ω^2 n_0^2).

With the normalization 1 = ∫_0^∞ φ^2 = ∫_0^B ψ^2 + ∫_B^∞ φ^2, we arrive for the integral over the total domain at

    ∫_0^∞ [ (∂_x φ)^2 − ω^2 n^2 φ^2 ] dx
        = ∫_0^B [ (∂_x ψ)^2 − ω^2 n^2 ψ^2 ] dx + ψ(B)^2 √(β^2 − ω^2 n_0^2) − β^2 ( 1 − ∫_0^B ψ^2 )
        = ∫_0^B [ (∂_x ψ)^2 + (β^2 − ω^2 n^2) ψ^2 ] dx + ψ(B)^2 √(β^2 − ω^2 n_0^2) − β^2.

With this result we can transform the original constrained eigenvalue formulation into the following unconstrained formulation, where now the 'variables' to be varied are the function ψ on the confined interval and the unknown parameter β:

    min_φ { ∫_0^∞ [ (∂_x φ)^2 − ω^2 n^2 φ^2 ] dx  |  ∫_0^∞ φ^2 = 1, φ → 0 for x → ∞ }
        = min_{ψ,β} { ∫_0^B [ (∂_x ψ)^2 + (β^2 − ω^2 n^2) ψ^2 ] dx + ψ(B)^2 √(β^2 − ω^2 n_0^2) − β^2 }    (A.9)

Observe that the correct Euler-Lagrange equation and the natural boundary condition at x = 0 are found:

    ∂_x^2 ψ + ω^2 n^2 ψ = β^2 ψ    for x ∈ (0, B),    (A.10)
    ∂_x ψ = 0    at x = 0.

Moreover, at x = B, variation of the free end value ψ(B) leads to the natural boundary condition

    ∂_x ψ = −√(β^2 − ω^2 n_0^2) ψ    at x = B−.    (A.11)

This is a boundary condition for the 'interior' solution (indicated by writing B−). For the exterior solution a similar relation holds:

    ∂_x φ = −√(β^2 − ω^2 n_0^2) φ    at x = B+.

Since we required φ(B) = ψ(B), it follows that the derivatives are also the same:

    φ(B) = ψ(B)  and  ∂_x φ(B) = ∂_x ψ(B).

This means that the interior and the exterior solution are matched into one genuine solution on the whole real line, since both interface conditions at x = B are satisfied.

Summary. We have thus reduced the problem (A.8) on the whole real line to the variational formulation (A.9) on a bounded domain.

Remark. Observe that the variational formulation (A.9) still has the character of a minimization problem. The optimal value is −β², where β is the solution of the variational problem; i.e. the solution of the variational problem directly produces the eigenvalue to be found.

Remark. Formulating the results for the differential equation, we observe that in the interior domain we look for a solution and a value β such that (A.10) is satisfied together with the boundary condition (A.11). This boundary condition for the interior problem is a condition of mixed Neumann-Dirichlet type, and could be called a transparent boundary condition. The remarkable fact is that this condition makes it possible to replace the unbounded problem by a BVP on a bounded interval.

Remark. Actually, the position of the point x = B above has been 'arbitrary' outside the waveguide, since for x > B the solution is determined by the exterior index n_0. This makes it clear that there is no objection to taking B = w, just the boundary of the waveguide. For numerical purposes this is the optimal choice.

A.2.3 Approximations with simple trial profiles

Confinement at ‘partly-optimal’ Dirichlet boundary

We start with the formulation on the unbounded domain for the case of the step index. We try a very crude approximation: approximate the principal mode by a confined function identically vanishing for |x| > W, where the width W will become a parameter in the trial function.

Remark. An approximation of this kind, to 'confine' the field, is often made: knowing that the field vanishes exponentially, for large enough W the field is very small, and is there, 'far out', replaced by zero: a Dirichlet boundary condition.

Continuing with this crude approximation, and anticipating the symmetry and positiveness of the solution, let us simply take as trial profile a function that is piecewise linear, a 'tent function':

    φ(x) = a(W − |x|).

The approximate constrained formulation then provides an approximate value of the eigenvalue, and becomes (we additionally use symmetry in x)

    −β^2_app = min_{a,W} { a^2 ∫_0^W [ 1 − ω^2 n^2 (W − x)^2 ] dx  |  a^2 ∫_0^W (W − x)^2 dx = 1 }

The integrals can be evaluated easily, and the 2-parameter constrained optimization problem can be solved.

To appreciate the value of the variational formulation, and the good result that is obtained for the eigenvalue, in the plots below we present the eigenvalue as a function of the width of the waveguide and compare this with the plot of the exact values. Also the value of the 'optimal' Dirichlet width W as a function of the waveguide width is shown.

PLOTS

Exercise. Assume the waveguide is bi-modal, i.e. supports precisely two guided modes. The second mode will then be an odd function of x. Describe a comparable approximation for the eigenvalue of the second mode by using a tent function vanishing at x = 0 and at x = W.
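As a numerical illustration of this 2-parameter problem, the following Python sketch (with hypothetical parameter values; the optimizer and quadrature are implementation choices, not part of the text) eliminates a through the constraint and minimizes over the remaining width parameter W:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# Hypothetical step-index parameters (for illustration only).
omega, n1, n0, w = 4.0, 1.5, 1.0, 1.0

def n2(x):
    # Squared index profile: n1 inside the guide, n0 outside.
    return n1**2 if x < w else n0**2

def minus_beta2(W):
    # Eliminate a through the constraint a^2 * int_0^W (W-x)^2 dx = 1,
    # then evaluate the constrained functional on the tent profile.
    norm = W**3 / 3.0                      # int_0^W (W-x)^2 dx
    val, _ = quad(lambda x: 1.0 - omega**2 * n2(x) * (W - x)**2, 0.0, W,
                  points=[w] if W > w else None)
    return val / norm

res = minimize_scalar(minus_beta2, bounds=(0.5 * w, 10.0 * w), method='bounded')
W_opt, beta_app = res.x, np.sqrt(-res.fun)
print("optimal Dirichlet width W =", W_opt, " approximate beta =", beta_app)
```

By the variational character of the formulation, the approximate value β_app stays below the exact principal value β, and the approximation is already remarkably good for this crude trial function.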


Using the confined formulation

The approximation with Dirichlet conditions above is rather awkward, since no non-identically-vanishing field can be smoothly connected (satisfying the interface conditions) to a zero field in an exterior domain. When using the confined formulation derived above, this problem can be resolved.

Then again, a simple way to find the principal eigenvalue is to approximate the function by a tent-like trial function in the interior x < B. Taking B = w, this is a function of the form

    φ(x) = a(w − x) + b.

The minimization problem then reduces to a 3-parameter minimization problem in the parameters a, b and β. Results of the calculations are plotted below, and compared to the exact value and to the approximation with optimal-Dirichlet boundary conditions derived above.

PLOTS....

Exercise. Observe that for the fundamental mode it is possible to make a much more clever choice for the trial function: a harmonic function of the form

    φ(x) = a cos(px)

with a, p parameters. In fact, in this case we know that for solutions in the interior it should hold that p = √(ω^2 n_1^2 − β^2); using this information, the problem reduces to a minimization problem in only two parameters: a and β. Verify that in this way the exact solution is found.
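For this harmonic trial function the claim can be checked explicitly: substituting ψ(x) = a cos(px) with p = √(ω^2 n_1^2 − β^2) and B = w into (A.9) gives G(a, β) = a^2 [ −p sin(pw) cos(pw) + cos^2(pw) √(β^2 − ω^2 n_0^2) ] − β^2, and stationarity with respect to a (for a ≠ 0) requires the bracket to vanish, which is exactly the transcendental equation p tan(pw) = √(β^2 − ω^2 n_0^2) of the guided-mode analysis. A small Python check (with hypothetical parameter values):

```python
import numpy as np
from scipy.optimize import brentq

# Hypothetical step-index parameters (for illustration only).
omega, n1, n0, w = 4.0, 1.5, 1.0, 1.0

def stationarity(beta):
    # Coefficient of a^2 in the confined functional evaluated on
    # psi(x) = a*cos(p*x) with p = sqrt(omega^2 n1^2 - beta^2);
    # dG/da = 0 (a != 0) requires this coefficient to vanish.
    p = np.sqrt(omega**2 * n1**2 - beta**2)
    s = np.sqrt(beta**2 - omega**2 * n0**2)
    return -p * np.sin(p * w) * np.cos(p * w) + np.cos(p * w)**2 * s

# Principal mode: p*w in (0, pi/2), i.e. beta just below omega*n1.
beta_lo = np.sqrt((omega * n1)**2 - (np.pi / (2 * w))**2)
beta = brentq(stationarity, beta_lo + 1e-9, omega * n1 - 1e-9)

# The zero of the coefficient reproduces the dispersion relation
# p*tan(p*w) = sqrt(beta^2 - omega^2 n0^2) of the guided-mode analysis.
p = np.sqrt(omega**2 * n1**2 - beta**2)
print("beta =", beta)
```

So with this informed trial function the variational formulation yields the exact eigenvalue, not merely an approximation.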

Exercise. Use a tent-like approximation to approximate the eigenvalue of the second mode in a bi-modal waveguide. Then use a harmonic trial function.

A.2.4 Variational formulation for radiation modes

The (non-guided) modes are solutions that do not vanish at infinity. Restricting to bounded solutions, for β^2 < ω^2 n_0^2 the profile functions φ behave for x > w like

    φ(x) = φ(w) exp( ±i √(ω^2 n_0^2 − β^2) (x − w) ).

As stated above, these solutions are called radiation modes; their behaviour in the x-direction is oscillatory. In the z-direction the solution is oscillatory for β^2 > 0 (and hence the electrical field is propagating). [[For β^2 < 0 the solution is evanescent (exponentially decreasing or increasing) in the z-direction.]]

A variational formulation on the whole real line formally does not make sense, since the exterior integrals will diverge. However, the confined variational formulation, now with β prescribed, is still sensible, just as the differential formulation with the transparent boundary condition.


A.2.5 FEM-numerics for complicated index variations

Of course, the confined formulation is ideally suited to be discretized using FEM: in the interior the function can be approximated by elements, for instance linear elements, leaving the value at the end point of the waveguide free. Actually, arbitrary index profiles within the waveguide can then be calculated; the derived formulation remains valid as long as the exterior problem is not changed.

This remark opens the possibility to deal with much more complicated internal structures, for instance consisting of a number of parallel waveguides with arbitrary index profiles.
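A minimal finite-element sketch along these lines (linear elements on [0, w], the transparent boundary condition entering as a Robin term at B = w, and a simple fixed-point iteration in β; the parameter values and the iteration scheme are illustrative assumptions, not prescribed by the text):

```python
import numpy as np
from scipy.linalg import eigh

# Hypothetical step-index parameters (for illustration only).
omega, n1, n0, w = 4.0, 1.5, 1.0, 1.0
N = 400                          # number of linear elements on [0, w]
h = w / N

# Standard P1 stiffness and mass matrices; natural (Neumann) BC at x = 0,
# free end value at x = w.
K = np.zeros((N + 1, N + 1)); M = np.zeros((N + 1, N + 1))
for e in range(N):
    K[e:e+2, e:e+2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    M[e:e+2, e:e+2] += h * np.array([[2.0, 1.0], [1.0, 2.0]]) / 6.0

Mn = n1**2 * M                   # index is n1 throughout the guide (B = w)

# Fixed-point iteration on beta: the transparent-boundary (Robin) term
# sqrt(beta^2 - omega^2 n0^2) * psi(w) v(w) is added to the stiffness matrix.
beta = 0.99 * omega * n1         # initial guess inside the guided range
for _ in range(100):
    A = K - omega**2 * Mn
    A[-1, -1] += np.sqrt(beta**2 - omega**2 * n0**2)
    # smallest eigenvalue of the generalized problem A psi = mu M psi is -beta^2
    mu = eigh(A, M, eigvals_only=True, subset_by_index=[0, 0])[0]
    beta_new = np.sqrt(-mu)
    if abs(beta_new - beta) < 1e-10:
        beta = beta_new
        break
    beta = beta_new

print("principal mode: beta =", beta)
```

For this step-index example the iteration converges in a few steps, and the resulting β agrees with the root of the transcendental equation of the guided-mode analysis up to the O(h^2) discretization error.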

Exercise. Using this idea, but taking only a simple trial function, approximate the eigenvalue of the principal guided mode of a system consisting of two separated, parallel waveguides (with the same, or different, index, larger than outside).


Appendix B

Variational Fluid Dynamics

B.1 Free Surface Wave Models

The evolution of waves on the surface of a layer of fluid (such as water) remains a challenging task. Assuming the fluid to be incompressible and inviscid, and the flow to be irrotational, the full surface wave equations (FSWE) are well known. The combination of dispersive and nonlinear effects presents major problems in the numerical simulation as well as in the theoretical analysis of the resulting interesting phenomena. Although the FSWE are a well-known description of the complete physics, this set of equations is too complicated for a direct investigation. Therefore, to gain insight into interesting characteristic phenomena, simplified models are desired that are amenable to theoretical investigation while, at the same time, being accurate enough to capture the phenomenon of interest.

In this section we present a unified view based on the basic variational structure and describe models of KdV- and NLS-type equations and their relation, for shortness avoiding all variationally consistent modelling steps in between¹.

B.1.1 Full surface wave equations

We consider the motion of a layer of fluid under the following simplifying assumptions:

• the fluid is inviscid and incompressible (density normalized to unity), and there is no surface tension;

• the bottom is flat, at depth z = −h;

• the fluid motion is irrotational, assumed to be uniform in the (horizontal) y-direction and unbounded in the x-direction; if the horizontal and vertical velocities are denoted by U = U(x, z, t) and W = W(x, z, t) respectively, irrotationality allows the introduction of the fluid potential Φ such that (U, W) = ∇Φ, U = Φ_x, W = Φ_z; then incompressibility implies

    ∆Φ = 0;

• the surface elevation is the graph of a function (no overturning waves), η = η(x, t).

¹ For simplicity we will approximate the dispersive properties as they are relevant for long waves (shallow water), leaving out the more precise dispersive properties for short waves (deep water). Formally speaking, the dispersion for long waves leads to the classical KdV equation, and to the defocusing (diverging) NLS equation for wave packets. For waves with small wavelengths, comparable to the depth of the fluid, the full dispersive properties should be dealt with, which leads to a non-local version of the KdV equation and the focusing (convergent) NLS equation.

Then the governing equations are

    ∆Φ ≡ Φ_xx + Φ_zz = 0    for −h < z < η(x, t),    (B.1)
    Φ_z = 0    at z = −h,    (B.2)
    ∂_t η = −η_x Φ_x + Φ_z    at z = η(x, t),    (B.3)
    ∂_t Φ + (1/2)(Φ_x^2 + Φ_z^2) + gη = 0    at z = η(x, t).    (B.4)

Equation (B.3) is a kinematic condition; equation (B.4) is a dynamic condition, resulting from Bernoulli's equation restricted to the free surface. For a correct interpretation of the FSWE, it is most important to observe how the interior Laplace problem is linked to the dynamic equations.

Notation. In the following we will sometimes use normalized variables, which lead to the same equations but with g = h = 1.

B.1.2 Variational structure of FSWE

Another way to interpret the dynamical structure is via a variational description. In fact, it has been observed (independently) by Zakharov (1968), Broer (1974) and Miles (1977) that the FSWE can be described as a Hamiltonian system. Summarizing, this is done by using as variables the fluid potential at the free surface,

    φ(x, t) = Φ(x, η(x, t), t),

and the surface elevation. Then the full surface wave equations can be described as a Hamiltonian system (see [13] for full details):

    ( ∂_t η )   (  0   1 ) ( δ_η H(φ, η) )
    ( ∂_t φ ) = ( −1   0 ) ( δ_φ H(φ, η) )    (B.5)

Here H is the Hamiltonian functional which, just as for Hamiltonian systems from Classical Mechanics, is the sum of kinetic and potential energy:

    H(φ, η) = K(φ, η) + ∫ (1/2) g η^2 dx.

The kinetic energy is given in terms of the solution of the Dirichlet problem for the Laplace equation in the fluid domain:

    K(φ, η) = ∫∫ (1/2) |∇Φ|^2 dx dz = ∫ (1/2) φ [∂_n Φ]_{z=η} dx.

Since this functional cannot be expressed explicitly in terms of φ, η, simplified models are sought by constructing approximations of this functional. Doing so, and taking as governing equations the system (B.5) with the approximated Hamiltonian, leads to a model that retains the basic variational structure, which is not guaranteed in a direct approach.

Remark. The equations can be reformulated by using a velocity-type quantity, the x-derivative of the potential at the free surface:

    u(x, t) = ∂_x φ.

This could be expected, since the potential is determined from physical quantities only up to an arbitrary constant. Then the equations take the form (generalized Hamiltonian, Poisson structure)

    ( ∂_t η )       (  0    ∂_x ) ( δ_η H(u, η) )
    ( ∂_t u )  = −  ( ∂_x    0 ) ( δ_u H(u, η) )    (B.6)

Observe the skew-symmetry of the matrix-differential operator.

Variational-consistent models: linear, Boussinesq, KdV and NLS

Using the variational formulation of the FSWE above, simplified models can be obtained by variationally consistent modelling, i.e. by approximating the Hamiltonian, in particular the kinetic energy functional. We will give the main steps and results below, and refer to [13] for full details.

B.1.3 Linearized SW, dispersion

In linearized theory, infinitesimally small surface elevations are considered. Without surface elevation and bottom variations, the problem is posed on a straight strip and can be solved in closed form by Fourier techniques. For infinitesimally small amplitude waves, the linearized problem leads to the basic (linear) dispersion relation between frequency and wavenumber, given by

    ω = ±Ω(k),  with  Ω(k) = k √( g tanh(hk)/k ).    (B.7)

Introduce the dispersion operator R such that ω = kR(k), so

    R(k) = √( g tanh(hk)/k ),

and observe the limiting behaviour for small wave numbers (long waves):

    R ∼ R_KdV ≡ √(gh) ( 1 − (1/6) h^2 k^2 )    for k → 0.

Note that this shows that this limiting operator is a simple differential operator (k^2 corresponding to −∂_x^2):

    R_KdV = √(gh) ( 1 + (1/6) h^2 ∂_x^2 ).
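The long-wave limit can be checked numerically; a small Python sketch (with g, h normalized to 1) compares R(k) with its KdV approximation and exhibits the fourth-order accuracy of the truncation:

```python
import numpy as np

# Compare the exact dispersion operator R(k) = sqrt(g*tanh(h*k)/k)
# with the long-wave approximation R_KdV(k) = sqrt(g*h)*(1 - h^2 k^2 / 6).
g = h = 1.0

def R(k):
    return np.sqrt(g * np.tanh(h * k) / k)

def R_KdV(k):
    return np.sqrt(g * h) * (1.0 - h**2 * k**2 / 6.0)

k = np.array([0.05, 0.1, 0.2])
err = np.abs(R(k) - R_KdV(k))
print("error of the KdV approximation:", err)
```

The error decreases like k^4, so halving the wavenumber reduces the discrepancy by roughly a factor of sixteen, confirming that the truncation is accurate precisely in the long-wave regime.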

Using this dispersion operator, the kinetic energy is approximated by a single integral functional over the horizontal direction:

    K_lin(u, η) = ∫ (1/(2g)) u R^2 u dx.


The linearized equations are then given by (B.6) with the linearized Hamiltonian:

    H_lin = ∫ [ (1/(2g)) u R^2 u + (1/2) g η^2 ] dx.

Observe that when dispersive effects are neglected (R^2 → gh), this leads to the Hamiltonian for shallow water,

    H_sh = ∫ [ (1/2) h u^2 + (1/2) g η^2 ] dx,

and the simple equations

    ∂_t η = −∂_x [hu],    ∂_t u = −∂_x [gη].

This is also true for a varying bottom, leading to the second-order equation for the wave elevation

    ∂_t^2 η = ∂_x [ g h(x) ∂_x η ].

B.1.4 Boussinesq type of equations

One step further than the linear approximation is to take into account a first-order nonlinear effect for small amplitude solutions. At the same time, assumptions on the characteristic wavelength are commonly made; mostly the restriction is to 'long' waves.

A characteristic, often used approximation is the so-called Boussinesq approximation, which corresponds to a specific relation between the wave amplitude ε and wavelength λ given by 1/λ^2 ∼ ε: the case of 'rather small, rather long waves'. This is the basic assumption to arrive at what are called Boussinesq-type equations. These equations describe waves running both to the right and to the left. The equations are again given by (B.6), now with Hamiltonian

    H_Bous = ∫ [ (1/(2g)) u R_KdV^2 u + (1/2) g η^2 + (1/2) η u^2 ] dx.

B.1.5 KdV type of equations

Further restricting to waves running mainly in one direction leads to KdV-type equations². From the two dependent variables u, η the approximation leads to a single variable, say the surface wave elevation η. The Hamiltonian becomes (with some factors taken out)

    H_KdV = ∫ [ (1/2) η R η + (1/4) η^3 ] dx,

and the equation becomes a first-order-in-time equation of the form

    ∂_t η = −∂_x δ_η H(η),  i.e.  ∂_t η = −∂_x ( Rη + (3/4) η^2 )    (B.8)

² Korteweg-de Vries equation (1895). Korteweg and de Vries derived in 1895 a model equation for the motion of waves on the surface of a layer of fluid above a flat bottom. Restricting to rather low, rather long waves, they derived the equation that now bears their name: (B.8) with R = R_KdV(k). This equation became well known in the sixties, since it turned out that, from a mathematical point of view, it was the first partial differential equation shown to be completely integrable, leading to a huge extension of the theory of nonlinear PDEs. It also became clear that many problems in physics and technology are modelled by this equation. Being an evolution equation, first order in time, the initial value problem requires finding the evolution of the surface profile from a given initial profile. This initial value problem for KdV is not easy to solve; for arbitrary initial profiles, numerical calculations have to be used to find the subsequent wave profiles; the complete integrability makes it possible, in principle, to write down the time-asymptotic profile.

for the normalized surface elevation η. Taking the dispersion operator R_KdV, KdV becomes a partial differential equation:

    ∂_t η = −∂_x ( η + (1/6) ∂_x^2 η + (3/4) η^2 )    (B.9)

Often a moving frame is introduced, together with an additional scaling of the spatial variable, and the equation takes the 'standard' form

    ∂_t η + ∂_x [ ∂_x^2 η + (1/2) η^2 ] = 0.

B.1.6 NLS-model

Considering perturbations of a monochromatic wave centered around a certain wavenumber k_0, one looks for the slow and small evolutions of the (complex-valued) amplitude A in the form

    η(x, t) = A(x, t) e^{i(k_0 x − Ω(k_0) t)} + cc.

For a quadratic nonlinearity as in KdV, the analysis is somewhat more complicated than for cubic nonlinearities as in optics. The result is easily described: the governing equation for A reads

    i [ ∂_t A + V_0 ∂_x A ] + β ∂_x^2 A + γ |A|^2 A = 0    (B.10)

Here V_0 = Ω′(k_0) and β = −(1/2) Ω″(k_0) are the group velocity and the coefficient of the group-velocity dispersion, respectively; γ is a coefficient resulting from mode generation and also depends on k_0.

In a frame moving with the group velocity, the equation reduces to the form of the NLS equation:

    i ∂_τ A + β ∂_x^2 A + γ |A|^2 A = 0.

Performing a simple scaling transforms this equation into the standard form of the NLS equation,

    i ∂_τ A + ∂_x^2 A + sign(βγ) |A|^2 A = 0,    (B.11)

a well-known equation that has been studied extensively (e.g. [1]).

The sign of the coefficients, or better sign(βγ), determines the character of the NLS equation:

• diverging (defocusing) NLS if sign(βγ) < 0, and

• converging (focusing) NLS if sign(βγ) > 0.

The converging NLS has soliton-type solutions and more 'confined' solutions, as we shall see in the next section. In this case the dispersive and the nonlinear effects counterbalance each other, while in the defocusing NLS the waves will spread.

For surface wave equations, β is negative for all wavelengths. For the KdV dispersion, γ is positive for all wavelengths, while when using the full dispersion properties γ changes sign, from positive for k < k_crit to negative for k > k_crit. The critical wave number k_crit is known as the Davey-Stewartson value, approximately k_crit ≈ 1.363. This distinguishes the two cases that can appear in nature.

The variational formulation for (B.11) has the form of a complex Hamiltonian system and reads

    ∂_τ A = −i δH_NLS(A),  with  H_NLS(A) = ∫ [ (1/2) |∂_x A|^2 − (sign(βγ)/4) |A|^4 ] dx.

Appendix C

Solitons and wave groups

C.1 Coherent structures as relative equilibria

We have come across KdV- and NLS-type equations in optics and in surface waves:

in optics: for pulse propagation in dispersive, nonlinear material, the variations of the E-field are described by a KdV-type equation (A.3), and the complex amplitude of wave groups describing modulations of a monochromatic wave by the NLS equation (A.5); in material with Kerr nonlinearity, the amplitude variations of a beam propagating and slowly deforming in one direction are described by the NLS equation (A.6);

in surface waves: the KdV equation for the surface elevation of unidirectional waves (B.9), and, for the complex amplitude of wave groups describing modulations of a monochromatic wave, the NLS equation (B.10).

Depending on the type of application, the parameters have a different meaning, just as the independent variables, which are time- or space-like. Performing a scaling, the parameters can be scaled away, and a 'standard' form of the equations can be written:

for KdV:

    ∂_t u = −∂_x δH(u),  with  H(u) = ∫ [ (1/2)(∂_x u)^2 + (1/6) u^3 ] dx;

for (Converging/Diverging) NLS:

    ∂_t A = −i δH(A),  with  H(A) = ∫ [ (1/2)|∂_x A|^2 − (1/4)|A|^4 ] dx  for CNLS,
                             H(A) = ∫ [ (1/2)|∂_x A|^2 + (1/4)|A|^4 ] dx  for DNLS.

For ease of presentation we will interpret in the following t as time and x as the spatial variable.


Equations like these are nonlinear and therefore difficult, in the sense that usually no explicit solutions can be written down. Occasionally special solutions can be found, even families of solutions depending on parameters. These special solutions are often found in an ad-hoc way, using some special Ansatz, such as a 'travelling wave' Ansatz. In many cases this can be understood in a constructive way using the theory of Relative Equilibria for dynamical systems from Classical Mechanics, generalized to variational evolution equations such as wave equations and more general continuous Poisson systems. We will show in this Appendix how this theory can be applied to find some of the most famous 'coherent structures' in nonlinear wave theory: the soliton solutions of the KdV and NLS equations.

C.2 Solitons of KdV

C.2.1 Motivation from Travelling Wave Ansatz

The motivation for Korteweg and de Vries to study the problem of surface waves was to settle a dispute that had continued throughout the nineteenth century about the existence of travelling waves: is it possible that a wave exists that does not change in time, but is merely translated at a fixed speed?

They showed, by deriving their (KdV) equation and analyzing it, that the answer is affirmative. Moreover, it is possible to write down the wave shapes and speeds explicitly. This is quite unexpected at first sight, since KdV combines nonlinearity (leading to the 'breaking' phenomenon) and dispersion ('spreading' of an initial profile). The remarkable property is that these combined effects make it possible that there exist travelling waves: waves with a specific profile, say f, that will neither break nor spread (an exact balance between the counteracting effects of breaking and spreading), and that travel undisturbed in shape at a specific speed, say V. That is, a solution of the form

    η(x, t) = f(x − V t)

for specific profiles f and specifically related velocities V.

To find the wave profile f and the velocity V, we substitute this form into the KdV equation, in normalized variables:

    ∂_t η + ∂_x [ ∂_x^2 η + (1/2) η^2 ] = 0.

Then the PDE becomes an ODE for the function f, in which V enters as a parameter to be determined together with the profile. We shall see that, in fact, there is a whole family of such waves: the higher the amplitude, the larger the velocity. Writing ξ := x − V t, the equation becomes

    −V ∂_ξ f(ξ) + f(ξ) ∂_ξ f(ξ) + ∂_ξ^3 f(ξ) = 0.

A solution of this equation, for certain V, produces the wave profile f of the wave that travels undisturbed in shape at speed V.

Analysis of solitary wave profiles

To find the solution we have to distinguish two cases:


• space- (and time-) periodic solutions, for which f is a periodic function of ξ, the so-called cnoidal waves (since the profile is expressed with the elliptic cn function), and

• solitary wave solutions: wave profiles of a single hump that decay, together with all derivatives, sufficiently fast at infinity ('almost confined': exponentially small outside a certain interval).

We will concentrate on the solitary wave profiles. Integrating the equation above once, and noticing that the constant of integration has to vanish as a consequence of the decay at infinity, leads to the second-order ODE for the profile:

    −V f(ξ) + (1/2) f(ξ)^2 + ∂_ξ^2 f(ξ) = 0    (C.1)

This equation can be solved in a standard way by observing the mechanical analogue: when ξ is interpreted as the time and f as the position, the equation describes the motion of a particle of unit mass subject to a potential force, according to Newton's law:

    ∂_ξ^2 f(ξ) + dU/df = 0    (C.2)

with potential energy

    U(f) = −(1/2) V f^2 + (1/6) f^3.

The plot of U is qualitatively as shown below, on the left for positive values of V, on the right for negative values.

Looking for a solitary wave profile f that decays to zero for ξ → ±∞, we look for the solution that is nontrivial and connects the origin with itself: a homoclinic orbit. Clearly, this can only be achieved for positive values of V.

In more detail, for the profile equation mechanical-energy conservation holds, and phase-plane analysis can be used, as described in the following. Multiplying (C.2) by ∂_ξ f and integrating the equation again, there results

    (1/2) [∂_ξ f]^2 + U(f(ξ)) = E.

Since E should be zero for a solitary wave profile, the equation becomes

    (1/2) [∂_ξ f(ξ)]^2 − (1/2) V f(ξ)^2 + (1/6) f(ξ)^3 = 0.

This is a first-order equation for the profile function, and its solution can be given explicitly. This solution is a solitary wave profile: for each V > 0 it is given by

    f(ξ, V) = 3V / cosh^2( (1/2) √V ξ ).

Two profiles, for V = 0.2 and V = 1, are shown below. The solution above can be recognized in the phase plane z = f, w = ∂_ξ f in the following way. The curves of constant energy, given by

    (1/2) w^2 + U(z) = E,

are sketched in the phase plane (z, w) below; in this phase portrait the solitary wave corresponds to the homoclinic orbit, which is the level curve through the origin (for which E = 0).

Remark. With V the velocity, the amplitude is proportional to V and the width proportional to 1/√V: the larger the amplitude, the more confined the wave, and the larger its speed. We will find this same result below in a way that does not use the detailed formula.
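That f(ξ, V) indeed satisfies the profile equation (C.1) can be verified symbolically; a short check with sympy (rewriting the hyperbolic functions in exponentials so that the simplification is purely rational):

```python
import sympy as sp

# Symbolic check that f(xi, V) = 3V / cosh((1/2)*sqrt(V)*xi)^2 satisfies the
# profile equation (C.1):  -V f + (1/2) f^2 + f'' = 0.
xi, V = sp.symbols('xi V', positive=True)
f = 3 * V / sp.cosh(sp.sqrt(V) * xi / 2)**2

residual = -V * f + f**2 / 2 + sp.diff(f, xi, 2)
# Rewriting cosh in exponentials makes the residual a rational expression,
# which simplifies to 0 identically.
print(sp.simplify(residual.rewrite(sp.exp)))
```

The same computation with the remaining integration constant kept nonzero shows why decay at infinity forces that constant to vanish.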


C.2.2 Solitons as Relative Equilibria

The KdV equation in the standard form has the dynamic structure

    ∂_t η = ∂_x δH(η),  with  H(η) = ∫ [ (1/2)(∂_x η)^2 − (1/6) η^3 ] dx.

Restricting to waves that decay at infinity, the following functional is a constant of the motion for the KdV equation, as can easily be verified:

    I(η) = ∫ (1/2) η^2 dx.

The corresponding flow Φ^I follows from solving the equation

    ∂_τ η = ∂_x δI(η) = ∂_x η,

which is a simple translation of any initial profile u(x):

    Φ^I_τ(u)(x) = u(x + τ).

This motivates calling I the momentum integral.

The constrained variational problem for Relative Equilibria (RE), using the Hamiltonian and the momentum integral, reads

    Crit { H(η) | I(η) = γ } = Crit { ∫ [ (1/2)(∂_x η)^2 − (1/6) η^3 ] dx  |  ∫ (1/2) η^2 dx = γ },

with the equation from the Lagrange multiplier rule for a critical point f and multiplier λ:

    −∂_x^2 f − (1/2) f^2 = λ f.

This is precisely equation (C.1) for the profile function, with the multiplier being minus the velocity, λ = −V. Then the dynamic solution is given by

    Φ^I_{λt}(f) = f(x + λt),

just as found above.

Remark. Observe that the soliton, now characterized as a Relative Equilibrium (coherent structure) with the constrained formulation, has the property that it minimizes the energy at given momentum, or, reversed, that the momentum is maximized at prescribed energy: nature chooses the soliton profile to transport 'information' with least energy at given momentum, or with maximal momentum at given energy. A nice description and 'discovery' of optimality in nature.

Remark. Actually, the KdV equation is so special that it has infinitely many integrals; therefore many relative equilibria can be constructed by taking more and more integrals as constraints. For instance, such formulations will lead to two-soliton and more general N-soliton interactions. The problem is then that the 'flows' of these other integrals are just as difficult to find as the KdV equation itself; the momentum functional above is special in that respect.


Scaling argument for KdV solitons

The constrained variational RE formulation can also be used to derive, in a simple way, the relation between amplitude, width and velocity of a soliton, by approximating the soliton by a simple confined tent function. Consider the tent function as trial function, with amplitude a and width W as parameters:

    φ(x) = a (W − |x|)/W    for |x| < W,
    φ(x) = 0                for |x| ≥ W.

Then, leaving out all numerical constants that appear, the constrained formulation leads to a simple optimization problem in the parameters:

    Crit { ∫ [ (∂_x u)^2 − u^3 ] dx  |  ∫ u^2 dx = γ }
        ∼ Crit_{W,a} { W [ (a/W)^2 − a^3 ]  |  W a^2 = γ }
        ∼ Crit_W [ γ/W^2 − γ √(γ/W) ]    (taking a > 0),

attained for W ∼ γ^{−1/3}, with value ∼ γ^{5/3}. From this it follows that W ∼ γ^{−1/3}, a ∼ γ^{2/3} and λ ∼ ∂_γ γ^{5/3} ∼ γ^{2/3}. Hence a ∼ λ ∼ W^{−2}, i.e. W ∼ λ^{−1/2}: the same scaling as found above from the exact formula.
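The scaling argument can be confirmed numerically; the following sketch minimizes the reduced expression γ/W^2 − γ√(γ/W) for two values of γ and checks the predicted exponents (up to the numerical constants that were left out):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Minimize g(W) = gamma/W^2 - gamma*sqrt(gamma/W) over W and verify the
# scalings W* ~ gamma^(-1/3) and optimal value ~ gamma^(5/3).
def optimum(gamma):
    g = lambda W: gamma / W**2 - gamma * np.sqrt(gamma / W)
    res = minimize_scalar(g, bounds=(1e-3, 1e3), method='bounded')
    return res.x, res.fun

W1, v1 = optimum(1.0)
W8, v8 = optimum(8.0)
print("W ratio:", W8 / W1, "(expected 8^(-1/3) = 0.5)")
print("value ratio:", v8 / v1, "(expected 8^(5/3) = 32)")
```

Scaling γ by a factor 8 contracts the optimal width by 8^(1/3) = 2 and scales the optimal value by 8^(5/3) = 32, exactly the exponents derived above.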

Exercise. Consider the so-called BBM equation (Benjamin, Bona & Mahony, 1972):

    (1 − ∂_x^2) ∂_t u = −∂_x u − u ∂_x u    (C.3)

This model equation is a variant of the KdV equation (normalized variables).

1. Determine the dispersion relation of the linearized equation. What is the relation with the dispersion relation of the linearized FSWE, and with that of the standard KdV equation?

2. Looking for travelling waves, u(x, t) = f(x − V t), write down the equation for the profile function f; do you recognize this (form of the) equation? Find the solution explicitly.

3. Give the constrained variational formulation for the soliton profile, and explain it as a Relative Equilibrium by recognizing the first integrals.

C.3 NLS Wave Groups

Using the Hamiltonian structure of NLS, we will now directly apply the reasoning for Relative Equilibria, using the two simplest integrals. (Just like KdV, NLS is very special and has infinitely many integrals.) Among several other interesting solutions, we will also find the NLS soliton solution. All of these solutions exist for CNLS, while DNLS has none of them; therefore we restrict to CNLS in the following.


Hamiltonian structure

The CNLS has a Hamiltonian structure of the following form:

$$
\partial_\tau A = i\,\delta H(A), \qquad \text{with } H(A) = \int \left[ \frac{1}{2}\beta |A_x|^2 - \frac{1}{4}\gamma |A|^4 \right] dx. \tag{C.4}
$$

(We retain the parameters $\beta$ and $\gamma$ so that we can recognize at each place which term is caused by dispersion and which by nonlinearity.)

First integrals and their flow

The following two quadratic functionals are both constants of the motion. They have a physical meaning, and their flow can be written down explicitly from the related equation.

The first integral can be interpreted as the wave energy (wave power), and its flow (infinitesimal symmetry) expresses the gauge invariance of NLS:

$$
N(A) = \int \frac{1}{2}|A|^2\, dx, \qquad \text{with flow: } \partial_\tau A = i\,\delta N(A) = iA, \text{ i.e. } A = c\, e^{i\tau}.
$$

Another quadratic functional is called linear momentum, since its flow is translation symmetry:

$$
L(A) = \operatorname{Im} \int \frac{1}{2}\, \bar{A}\, \partial_x A\, dx, \qquad \text{with flow: } \partial_\tau A = i\,\delta L(A) = -\partial_x A, \text{ i.e. } A(x,\tau) = A(x - \tau, 0).
$$

C.3.1 Relative Equilibria: soliton- and periodic wave groups

NLS combines diverging/converging effects of dispersion and of nonlinearity. When signs are correct, for the CNLS, the diverging effect of dispersion and the confining effect of nonlinearity balance each other, and 'confined' solutions, like a soliton, exist. CNLS is most famous for its $1, 2, \ldots, N$-soliton solutions, which can (accidentally) be written down relatively easily. This is related to the complete integrability of NLS, and the related existence of an infinity of conservation laws (first integrals).

Relative equilibria are found as critical points of the Hamiltonian at a given value of one or more other integrals. With the wave energy and linear momentum as constraints, the constrained critical point problem reads

$$
\mathrm{Crit}\,\{\, H(A) \mid N(A) = \text{constant},\ L(A) = \text{constant} \,\},
$$
and a critical point should satisfy for some multipliers the equation
$$
\delta H(A) = \sigma_N\, \delta N(A) + \sigma_L\, \delta L(A), \quad \text{i.e.} \quad
-\beta\,\partial_x^2 A - \gamma |A|^2 A = \sigma_N A + \sigma_L\, i\,\partial_x A.
$$

Actually we can simplify the following a little bit by noticing that the term $\sigma_L\, i\,\partial_x A$ can be 'gauged' away from this equation by a transformation $B = A e^{i\alpha x}$ for a suitable $\alpha$. So, essentially we consider
$$
\mathrm{Crit}\,\{\, H(A) \mid N(A) = \text{constant} \,\}.
$$


with equation (an additional minus sign for the multiplier for convenience)
$$
\delta H(A) = -\mu\, \delta N(A), \quad \text{i.e.} \quad -\beta\,\partial_x^2 A - \gamma |A|^2 A = -\mu A.
$$
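The gauge step can be made explicit. As a sketch of the computation left implicit in the text (the constant $\alpha$ is the only quantity introduced here): substituting $A = B e^{-i\alpha x}$ (so $B = A e^{i\alpha x}$ and $|B| = |A|$) into the multiplier equation and multiplying by $e^{i\alpha x}$ gives

```latex
-\beta\,\partial_x^2 B - \gamma |B|^2 B
  + i\,(2\alpha\beta - \sigma_L)\,\partial_x B
  = \left(\sigma_N + \sigma_L\,\alpha - \beta\,\alpha^2\right) B ,
```

so the choice $\alpha = \sigma_L/(2\beta)$ removes the first-order term; only the value of the remaining multiplier is shifted, to $\sigma_N + \sigma_L^2/(4\beta)$.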

Clearly we can look for real-valued solutions of this RE-equation, which we will denote by $a(x)$. Having found such a real solution (relative equilibrium), the corresponding dynamic Relative Equilibrium solution will be
$$
A(x,t) = a(x)\, e^{-i\mu t},
$$
which is a time-harmonic modulation $e^{-i\mu t}$ of the fixed profile $a(x)$: a 'standing wave'.

Basic in the analysis of the RE-equation is the recognition that it has as mechanical analogue Newton's equation, with $\beta$ the mass of a particle in a conservative force field with potential $P$:

$$
\beta\,\partial_x^2 a = -\frac{\partial}{\partial a} P(a), \qquad \text{with potential } P(a) = -\frac{1}{2}\mu a^2 + \frac{1}{4}\gamma a^4. \tag{C.5}
$$

This problem can again be analyzed with phase plane techniques. The signs of $\mu$ and $\gamma$ will be important for the potential profile.

The sign of $\mu$ determines the stability of the trivial solution $a \equiv 0$: for $\mu < 0$ the trivial solution is stable, and it is unstable for $\mu > 0$. Note that for positive $\mu$, the lowest value of the potential is at $a = \pm\sqrt{\mu/\gamma}$, while the potential is negative for $|a| < \sqrt{2}\,\sqrt{\mu/\gamma}$.
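The phase-plane picture behind equation (C.5) can be explored numerically. A minimal sketch (Python; the parameter values and all names are our own illustrative choices) integrates the Newton analogue with RK4 and checks that the 'energy' $\tfrac12\beta (a')^2 + P(a)$, the first integral underlying the phase-plane analysis, is conserved along an orbit around the minimum of the potential:

```python
import math

beta, gamma_, mu = 1.0, 1.0, 1.0     # illustrative values, mu > 0

def Pprime(a):
    # P(a) = -mu*a^2/2 + gamma*a^4/4  =>  P'(a) = -mu*a + gamma*a^3
    return -mu * a + gamma_ * a**3

def rhs(state):
    a, v = state                      # v = da/dx
    return (v, -Pprime(a) / beta)     # Newton analogue: beta*a'' = -P'(a)

def rk4_step(state, h):
    k1 = rhs(state)
    k2 = rhs((state[0] + 0.5*h*k1[0], state[1] + 0.5*h*k1[1]))
    k3 = rhs((state[0] + 0.5*h*k2[0], state[1] + 0.5*h*k2[1]))
    k4 = rhs((state[0] + h*k3[0], state[1] + h*k3[1]))
    return (state[0] + h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            state[1] + h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

def energy(state):
    a, v = state
    return 0.5*beta*v**2 - 0.5*mu*a**2 + 0.25*gamma_*a**4

# Small orbit around the potential minimum a = sqrt(mu/gamma): stays periodic
state = (math.sqrt(mu/gamma_) + 0.1, 0.0)
E0 = energy(state)
for _ in range(5000):
    state = rk4_step(state, 0.01)
print(abs(energy(state) - E0) < 1e-5)   # True: the first integral is conserved
```

Tracing such orbits for various starting points produces exactly the families of solutions described next.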

We now describe briefly the various solutions that can exist and can be found from phase plane analysis.

Nonlinear harmonic

This is the solution with constant amplitude:
$$
A = q\, e^{-i\gamma q^2 t},
$$
which corresponds to the case that $\mu > 0$, and $q = \sqrt{\mu/\gamma}$ is the point of minimal potential. The real part of the NLS-solution is sketched as a function of $\zeta$ with the constant amplitude indicated.


Nonlinear modulated harmonic

Also for $\mu > 0$, small-amplitude periodic motions around the point of minimal potential energy lead to NLS-solutions that are a modulation of the $t$-harmonic.

Soliton

For $\mu > 0$ there exists the famous soliton solution as homoclinic orbit. The amplitude
$$
a = q\,\operatorname{sech}\!\left( q\sqrt{\frac{\gamma}{2\beta}}\; x \right)
$$
for $\mu = \gamma q^2/2$ modulates the $t$-harmonic and confines its support.
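The stated profile and multiplier can be verified directly. The sketch below (Python; parameter values are illustrative assumptions) checks by central differences that $a = q\,\mathrm{sech}(q\sqrt{\gamma/(2\beta)}\,x)$ satisfies the RE-equation $\beta a'' = \mu a - \gamma a^3$ with $\mu = \gamma q^2/2$:

```python
import math

beta, gamma_, q = 0.5, 2.0, 1.3      # illustrative values
mu = gamma_ * q**2 / 2               # multiplier claimed in the text
k = q * math.sqrt(gamma_ / (2*beta))

def a(x):
    return q / math.cosh(k * x)      # soliton profile q*sech(kx)

# Check the RE-equation  beta*a'' = mu*a - gamma*a^3  by central differences
h = 1e-4
max_res = 0.0
for i in range(-200, 201):
    x = 0.02 * i
    a_xx = (a(x + h) - 2*a(x) + a(x - h)) / h**2
    res = beta * a_xx - (mu * a(x) - gamma_ * a(x)**3)
    max_res = max(max_res, abs(res))
print(max_res < 1e-5)   # True: residual at finite-difference accuracy
```

Analytically the residual vanishes identically, since $\mathrm{sech}'' = \mathrm{sech} - 2\,\mathrm{sech}^3$.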

Nonlinear bi-harmonic

For $\mu < 0$ the potential is convex, and only periodic solutions can exist that cross the origin. This leads to what could be called 'nonlinear bi-harmonic' solutions.


Scaling and (non-) existence of NLS-solitons

Just as for KdV, we can get information from the RE-constrained critical point problem about scaling properties of the soliton. However, it is also possible to see from the following simple argument that for DNLS no solitons can be expected to exist. Consider again the 'tent' function as trial function with amplitude $a$ and width $W$ as parameters:

$$
\varphi(x) = \begin{cases} a\,\dfrac{W - |x|}{W} & \text{for } |x| < W, \\[4pt] 0 & \text{for } |x| \ge W. \end{cases}
$$

Upon substituting in CNLS ($-$ sign), DNLS ($+$ sign) there results (up to positive multiplicative factors):
$$
\mathrm{Crit}\left\{ \int \left[ |\partial_x A|^2 \pm |A|^4 \right] dx \;\Big|\; \int |A|^2\, dx = \gamma \right\}
\;\sim\; \mathrm{Crit}_{W,a}\left\{ W\left[ \left(\frac{a}{W}\right)^2 \pm a^4 \right] \;\Big|\; W a^2 = \gamma \right\}
\;\sim\; \mathrm{Crit}_W \left[ \frac{\gamma}{W^2} \pm \frac{\gamma^2}{W} \right].
$$
Clearly, with the plus sign, i.e. for DNLS, no critical points are found. With the minus sign, for CNLS, the minimal value is achieved for $W \sim 1/\gamma$, hence $W \sim 1/a$, and the multiplier $\mu$ follows from differentiating the minimal value: $\mu \sim \partial_\gamma\left[\gamma^3\right] \sim \gamma^2 \sim a^2$, just the result found above in the exact formula.
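The sign distinction can be seen numerically from the reduced functional $\gamma/W^2 \pm \gamma^2/W$; a small sketch (Python, with our own grid choices):

```python
# Reduced tent-function value after imposing the constraint W*a^2 = gamma:
#   CNLS (minus sign):  gamma/W**2 - gamma**2/W   -> interior minimum
#   DNLS (plus sign):   gamma/W**2 + gamma**2/W   -> monotone, no critical point
def reduced(W, gamma, sign):
    return gamma / W**2 + sign * gamma**2 / W

gamma = 1.0
Ws = [0.1 * j for j in range(1, 400)]
vals_cnls = [reduced(W, gamma, -1) for W in Ws]
vals_dnls = [reduced(W, gamma, +1) for W in Ws]

# CNLS: the minimum sits in the interior, at W = 2/gamma
i_min = min(range(len(Ws)), key=lambda j: vals_cnls[j])
print(abs(Ws[i_min] - 2.0/gamma) < 0.1)   # True

# DNLS: the values only decrease towards the boundary W -> infinity
print(all(vals_dnls[j] > vals_dnls[j+1] for j in range(len(vals_dnls)-1)))  # True
```

This mirrors the argument in the text: for DNLS the infimum is only approached in the limit of infinitely wide, vanishing profiles.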

C.4 Exercises

1. Kink solutions of the Sine-Gordon equation
The Sine-Gordon equation uses an angle variable $u$ to describe the orientation of spins on a continuous line in a magnetic system. The equation reads (with $\kappa$ some material constant)
$$
u_{tt} = u_{xx} + \kappa \sin 2u.
$$

(a) Derive the equation for a travelling wave: u(x, t) = U(x− λt).

(b) Show that there is a kink-solution, a travelling wave with $U(\xi) \to 0$ for $\xi \to -\infty$ and $U(\xi) \to \pi$ for $\xi \to \infty$; use phase plane analysis.

(c) Investigate the variational formulation for the kink solution.

2. Cnoidal waves for KdV
Travelling waves of KdV were investigated on the whole real line before. In this exercise we want to investigate travelling waves that are periodic.

(a) Show that solutions that are periodic with period $2\pi$ on the real line can be found by periodic continuation of functions on $[0, 2\pi]$ that satisfy periodic boundary conditions.


(b) Show that for periodic solutions it is possible to restrict to solutions with zero mass: $\int u = 0$.

(c) Derive the equation for a periodic travelling wave; investigate this equation (phase plane analysis). Derive the solution in an implicit way. Using elliptic functions, the so-called cnoidal function, the solution can be "explicitly" written down; therefore such periodic waves are called cnoidal waves.

(d) Show that the cnoidal wave form is obtained as a relative equilibrium from the constrained minimal energy problem
$$
\mathrm{Min}\left\{ \int \left( \frac{1}{2}u_x^2 - u^3 \right) \;\Big|\; \int \frac{1}{2}u^2 = \gamma,\ \int u = 0,\ u(0) = u(2\pi) \right\},
$$
and that the cnoidal wave is the corresponding relative equilibrium solution.

3. ** KdV-cnoidals, cont'd
In the rest of this exercise we study the above constrained minimization problem; denote a solution by $U$ (suppressing the dependence on $\gamma$, which does not play a particular role in this exercise).

(a) Show that $u \equiv 0$ is a critical point, but not the minimizer.

(b) Conclude that for the minimizer $\int U^3 > 0$, and that $U$ cannot be a constant.

(c) Observe that with $U$, any translate of $U$ is also a minimizer: there is a continuum of minimizers.

(d) Construct the Lagrangian functional; show that this Lagrangian functional is not bounded from below. Hence, $U$ is not the (global) minimizer of the Lagrangian functional.

(e) Now show that $U$ is also not a local minimizer of the Lagrangian functional. To that end, investigate the second variation at $U$. First show that the second variation at $U$ vanishes in the direction $U_x$ (why?). Then show that the second variation at $U$ in the direction $U$ is negative (use the equation for $U$; note that the $U$-direction is not tangent to the level set of the constraint functional!).

(f) Conclude from the previous result that the value function must be a concave function.



Overview Basic Calculus of Variations (Chapter 1)
(columns: Topic | Finite dimensional | Infinite dimensional | Sturm-Liouville example)

Unconstrained minimization:
  $\min F(x),\ x \in \mathbb{R}^n$ | $\min L(u),\ u \in U$

Domain of definition:
  linear space (or affine space) $\mathbb{R}^n \ni x$ | function space + boundary conditions: $U \ni u$ | $U = \{\,u(x) \mid x \in (0,1),\ u(0) = 2\,\}$

Function(al):
  $F : \mathbb{R}^n \to \mathbb{R}$ | $L : U \to \mathbb{R}$ | $L(u) = \int_0^1 \tfrac12\,[\,p\,(\partial_x u)^2 + q\,u^2\,]\,dx$

Local theory, differentiation:
  partial derivatives $\partial_m F = \partial F/\partial x_m,\ m = 1 \ldots n$; directional derivative $DF(x;\eta) := \tfrac{d}{d\varepsilon} F(x+\varepsilon\eta)\big|_{\varepsilon=0}$ | first variation $\delta L(u;v) := \tfrac{d}{d\varepsilon} L(u+\varepsilon v)\big|_{\varepsilon=0}$ | $\delta L(u;v) = \int_0^1 [\,p\,\partial_x u\,\partial_x v + q\,u\,v\,]\,dx$

Linear approximation:
  $F(x+\varepsilon\eta) = F(x) + \varepsilon\,DF(x;\eta) + O(\varepsilon^2)$ | $L(u+\varepsilon v) = L(u) + \varepsilon\,\delta L(u;v) + O(\varepsilon^2)$

Gradient / variational derivative:
  $\nabla F(x) = (\partial_{x_1}F, \ldots, \partial_{x_n}F)$ so that $DF(x;\eta) = \nabla F(x)\bullet\eta$ | variational derivative $\delta L(u)$ so that for all test functions $\eta$: $\delta L(u;\eta) = \int \delta L(u)\bullet\eta$ | $\delta L(u) = -\partial_x(p\,\partial_x u) + q\,u$

Unconstrained critical point:
  $DF(\hat x;\eta) = \nabla F(\hat x)\bullet\eta = 0$ | $\delta L(\hat u;v) = \int \delta L(\hat u)\bullet v = 0$

Admissible variations:
  for all $\eta \in \mathbb{R}^n$ | for all $v$ that vanish at prescribed boundary conditions | $v$ arbitrary except satisfying $v(0) = 0$

Equation:
  Fermat's algorithm $\nabla F(\hat x) = 0$ | Euler-Lagrange equation $\delta L(\hat u) = 0$, ++ boundary conditions, ++ natural BC's | $-\partial_x(p\,\partial_x u) + q\,u = 0,\quad u(0) = 2,\quad p(1)\,\partial_x u(1) = 0$

Overview Constrained Problems (Chapter 2)
(columns: Topic | Finite dimensional | Infinite dimensional | Sturm-Liouville eigenvalue problem)

Constrained minimization:
  $\min F(x),\ x \in C \subset \mathbb{R}^n$ | $\min L(u),\ u \in M \subset U$ | $L(u) = \int_0^1 [\,p\,(\partial_x u)^2 + q\,u^2\,]\,dx$

Admissible elements:
  subset of linear space $C \subset \mathbb{R}^n$, for instance the level set below | subset of function space + BC's: $M \subset U$ | $M \subset U = \{\,u(x) \mid x \in (0,1),\ u(0) = 0\,\}$

For function(al) constraints:
  for given $K : \mathbb{R}^n \to \mathbb{R}$ the level set $C = \{\,x \mid K(x) = \gamma\,\} \subset \mathbb{R}^n$ | for given $K : U \to \mathbb{R}$ the level set $M = \{\,u \mid K(u) = \gamma\,\} \subset U$ | $M = \{\,u \in U \mid \int_0^1 u^2 = 1\,\}$

Constrained critical point:
  $DF(\hat x;v) = \nabla F(\hat x)\bullet v = 0$ | $\delta L(\hat u;v) = 0$

Admissible variations:
  for all $v \in \mathbb{R}^n$ for which $\hat x + \varepsilon v$ satisfies the constraints up to $O(\varepsilon^2)$ (the tangent space); for function(al) constraints: $\{v \mid DK(\hat x;v) = 0\}$, i.e. $\{v \mid \nabla K(\hat x)\bullet v = 0\}$ | for all admissible $v$ for which $\hat u + \varepsilon v$ satisfies the constraints up to $O(\varepsilon^2)$; for function(al) constraints: $\{v \mid \delta K(\hat u;v) = 0\}$ | $v$ satisfying $v(0) = 0$ and $\int_0^1 u\,v\,dx = 0$

Equation, Lagrange's multiplier rule:
  there is a multiplier $\lambda$ such that $\nabla F(\hat x) = \lambda\,\nabla K(\hat x)$ (and $K(\hat x) = \gamma$); equivalently, $\hat x$ is an UNconstrained critical point of $x \to F(x) - \lambda K(x)$ | there is a multiplier $\lambda$ such that $\delta L(\hat u) - \lambda\,\delta K(\hat u) = 0$ (and $K(\hat u) = \gamma$), ++ boundary conditions, ++ natural BC's; equivalently, $\hat u$ is an UNconstrained critical point in $U$ of $U \ni u \to L(u) - \lambda K(u)$ | $-\partial_x(p\,\partial_x u) + q\,u = \lambda u,\quad u(0) = 0,\ p(1)\,\partial_x u(1) = 0$; i.e. UNconstrained critical point in $U$ of $U \ni u \to \int_0^1 \big([\,p\,(\partial_x u)^2 + q\,u^2\,] - \lambda u^2\big)\,dx$

Multiplier & value function:
  value function $V(\gamma) := F(\hat x) = \min\{\,F(x) \mid K(x) = \gamma\,\}$, then $\lambda = \partial_\gamma V(\gamma)$ | $V(\gamma) := L(\hat u) = \min\{\,L(u) \mid K(u) = \gamma\,\}$, then $\lambda = \partial_\gamma V(\gamma)$ | $V(\gamma) = \min\big\{\int_0^1 [\,p\,(\partial_x u)^2 + q\,u^2\,]\,dx \;\big|\; \int_0^1 u^2 = \gamma\big\} = \lambda\gamma$

Rayleigh quotient:
  $\lambda = \min\Big\{\int_0^1 [\,p\,(\partial_x u)^2 + q\,u^2\,]\,dx \Big/ \int_0^1 u^2\,dx \;\Big|\; u \in U\Big\}$
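As a numerical illustration of the Sturm-Liouville column, the Rayleigh-quotient characterization of the lowest eigenvalue can be checked with finite differences and inverse power iteration. This is a sketch under our own simplifying assumptions $p = 1$, $q = 0$ (so the exact lowest eigenvalue of $-u'' = \lambda u$, $u(0)=0$, $u'(1)=0$ is $(\pi/2)^2$):

```python
import math

# Smallest eigenvalue of  -u'' = lambda*u,  u(0) = 0, u'(1) = 0,
# via finite differences and inverse power iteration.
n = 200
h = 1.0 / n
# Unknowns u_1..u_n at x = h..1; u_0 = 0; a one-sided stencil at x = 1
# encodes the natural BC u'(1) = 0 (first-order accurate there).
diag = [2.0 / h**2] * (n - 1) + [1.0 / h**2]
off = [-1.0 / h**2] * (n - 1)

def solve_tridiag(d, e, b):
    # Thomas algorithm for a tridiagonal system with diagonal d, off-diagonal e
    nn = len(d)
    cp, bp = [0.0] * nn, [0.0] * nn
    cp[0], bp[0] = e[0] / d[0], b[0] / d[0]
    for i in range(1, nn):
        m = d[i] - e[i-1] * cp[i-1]
        cp[i] = (e[i] / m) if i < nn - 1 else 0.0
        bp[i] = (b[i] - e[i-1] * bp[i-1]) / m
    x = [0.0] * nn
    x[-1] = bp[-1]
    for i in range(nn - 2, -1, -1):
        x[i] = bp[i] - cp[i] * x[i+1]
    return x

u = [1.0] * n
for _ in range(100):                 # inverse power iteration
    u = solve_tridiag(diag, off, u)
    norm = math.sqrt(sum(v*v for v in u))
    u = [v / norm for v in u]

# Rayleigh quotient of the converged (normalized) vector = smallest eigenvalue
Au_dot_u = sum(u[i] * (diag[i]*u[i]
               + (off[i-1]*u[i-1] if i > 0 else 0.0)
               + (off[i]*u[i+1] if i < n-1 else 0.0)) for i in range(n))
print(abs(Au_dot_u - (math.pi/2)**2) < 0.05)  # True, within the BC accuracy
```

The computed value illustrates the table's last row: the minimal Rayleigh quotient is the lowest eigenvalue, with the minimizer the principal eigenfunction.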

PROJECTS Variational Methods in Science

August 19, 2002

Abstract

These are the 'projects' for the course. Each group will select one project.

REPORTING (for each project):
– Make a 'laboratory' report of your analytical work, and assemble the major plots obtained with Maple or otherwise (add captions that make clear what the contents are, what is on the horizontal/vertical axes, etc.). Don't waste too much time on nice lay-out etc.; better use that time for the contents.
– An oral presentation on Saturday 22 July of not more than 20 minutes.


Title: Surface waves above varying bottom
Date: Bandung, 10 July 2002
Groups: 1, 2
Remark: See in particular Appendix B (with errors improved)

General problem formulation
Consider surface waves in one horizontal direction ($x$-axis), above varying bottom; the depth is denoted by $h(x)$, and we use the notation $c(x) := \sqrt{g\,h(x)}$ in the following.

Essential is that we use linearized theory, so restriction to small-amplitude waves. Then the equations are given by
$$
\partial_t \begin{pmatrix} \eta \\ v \end{pmatrix}
= - \begin{pmatrix} 0 & \partial_x \\ \partial_x & 0 \end{pmatrix}
\begin{pmatrix} \delta_\eta H(v,\eta) \\ \delta_v H(v,\eta) \end{pmatrix},
\qquad \text{with } H = \int \left[ \frac{1}{2} h(x)\, v^2 + \frac{1}{2} g\, \eta^2 \right] dx,
$$
so
$$
\partial_t \eta = -\partial_x \left[ h(x)\, v \right], \quad \partial_t v = -\partial_x \left[ g\,\eta \right],
\quad \text{i.e.} \quad \partial_t^2 \eta = \partial_x \left[ c^2(x)\, \partial_x \eta \right].
$$
In the following we will investigate time-periodic motions,
$$
\eta(x,t) = \hat\eta(x)\, e^{-i\omega t} + cc,
$$
and the equation for $\hat\eta$ becomes (where from now on we omit the hat)
$$
\partial_x \left[ c^2(x)\, \partial_x \eta \right] + \omega^2 \eta = 0. \tag{1}
$$

It is sometimes simpler to apply a transformation and introduce a new function
$$
u(x) = c^2(x)\, \partial_x \eta, \qquad k^2(x) := \omega^2 / c^2(x), \tag{2}
$$
which brings the equation to the 'standard' form
$$
\partial_x^2 u + k^2(x)\, u = 0. \tag{3}
$$
[[ Note the optical analogy: equation (1) is the equation for TM-modes, while (3) is the equation for TE-modes, when $1/c(x)$ is identified as the index of refraction. ]]


1 Project 1: Step-wise variations
When the bottom is not gradually changing, reflections of incoming waves have to be taken into account. For the case of a piecewise constant bottom (with jumps in the profile), the solution can be found analytically. We will consider the case of a prescribed incoming wave (with normalized amplitude 1), and look for transmission and reflection properties.

1. Consider the case of a single depression in the bottom:
$$
h(x) = h_0 + (h_1 - h_0)\, \chi_{[0,L]}.
$$
Find the solution(s); investigate how the transmission depends on $\omega$ and the height difference $h_1 - h_0$. Make plots.

2. Write down a formal variational formulation on the whole real axis, and a confined formulation using the solution in the exterior domains. (This confined formulation can be used for a FEM-code when the interior variations are more complicated, not constant, and the interior solution cannot be written down explicitly.)

3. Take trial functions for the wave elevation above the depression, and find the approximate solution. Compare the results with the analytic solution.

4. Consider the case of two (or several, say $N$) successive depressions in the interval $[0, L]$. Calculate the transmission.
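For item 1, the analytic matching computation can be organized as follows. The sketch below (Python; the closed form for $T$ is derived from the interface conditions stated in the comments, and all parameter values are illustrative) also checks the energy balance $|R|^2 + |T|^2 = 1$ and the resonant full transmission at $k_1 L = \pi$:

```python
import math

g = 9.81   # gravitational acceleration (illustrative)

def transmission(omega, h0, h1, L):
    # Plane-wave matching for a depression of depth h1 on [0, L] in depth h0.
    # In each region eta is a combination of exp(+-ikx) with k = omega/c, and
    # both eta and the flux c^2 * d_x(eta) are continuous at the jumps.
    # Writing r = c0/c1, s = r + 1/r, theta = k1*L, elimination of the interior
    # amplitudes gives  1/T = cos(theta) - (i*s/2)*sin(theta)  and
    # R = T*(cos(theta) - i*r*sin(theta)) - 1.
    c0, c1 = math.sqrt(g*h0), math.sqrt(g*h1)
    theta = (omega / c1) * L
    r = c0 / c1
    s = r + 1.0/r
    T = 1.0 / (math.cos(theta) - 0.5j*s*math.sin(theta))
    R = T * (math.cos(theta) - 1j*r*math.sin(theta)) - 1.0
    return T, R

omega, h0, h1, L = 1.0, 1.0, 4.0, 10.0
T, R = transmission(omega, h0, h1, L)
# Energy balance for equal exterior depths: |R|^2 + |T|^2 = 1
print(abs(abs(R)**2 + abs(T)**2 - 1.0) < 1e-12)   # True

# Full transmission whenever k1*L is a multiple of pi (resonance)
c1 = math.sqrt(g*h1)
T_res, _ = transmission(math.pi * c1 / L, h0, h1, L)
print(abs(abs(T_res) - 1.0) < 1e-12)   # True
```

Sweeping `omega` with this routine produces the transmission plots asked for in item 1; several depressions (item 4) can be handled by composing one such matching step per jump.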

2 Project 2: Sloping bottom
Suppose the bottom is slowly varying over an interval of length $L$, connecting two constant levels from depth $h_0$ to depth $h_1 < h_0$.

1. Write down a confined variational formulation for the case of an incoming wave.

2. Use a WKB-approximation in the interior to calculate transmission properties.


Title: Waveguiding in optics: mode analysis
Date: Bandung, 10 July 2002
Groups: 3, 4
Remark: See in particular Appendix A of Lecture Notes

General problem formulation
In Appendix A the main ideas of wave guiding in optics and various methods have been explained. The text below uses this information in the same notation.

In addition to the 'symmetric' wave guides treated in Appendix A (indices equal at both exterior domains), one can also consider the non-symmetric wave guide (say indices $n_-$ at the left and $n_+$ at the right, $n_1$ in the middle). Of course the symmetric wave guide is recovered for $n_- = n_+ = n_0$, which gives the possibility of checking the results.

The analytical treatment for step-indices leads to a transcendental equation for the propagation constant(s) $\beta$.

1. Make a plot in Maple for the value of the principal β versus width of the waveguide.

2. Determine the number of guided modes as function of the width.

3 Project 3: Trial functions
In each of the following cases, make the plot of the principal $\beta$ versus width, and compare with the 'analytic' plot found above.

1. Use the simple tent-function as trial function to approximate the principal $\beta$ for a symmetric wave guide.

2. Use the confined variational formulation with a tent-like function for the field in the interior waveguide.

3. Use the confined variational formulation with a harmonic trial function within the waveguide ($\cos(\lambda x)$, with the parameter $\lambda$ to be optimised). [[ Note that this contains the exact solution!! ]]

4. For a non-symmetric wave guide, 'invent' a suitable (non-symmetric) 'tent-like' trial function, and calculate the principal $\beta$. See how the results depend on the exterior index difference (at fixed width, plot the value of $\beta$ as a function of $n_+ - n_-$).

4 Project 4: FEM with TBC (Transparent Boundary Conditions)

1. Make a FE numerical scheme based on the confined formulation with TBC, first for the symmetric waveguide.

2. Do some numerical experiments, plot the field (extended to the exterior regions), and for various values of the width calculate $\beta$ and compare it with the theoretical values.

3. Show that the numerical scheme is second order accurate.

4. Calculate the principal field and β for the case of two parallel wave guides.


Title: NLS-soliton wave groups
Date: Bandung, 10 July 2002
Group: 5
Remark: See in particular Appendix C of Lecture Notes

General problem formulation
We start with the constrained formulation for the NLS-soliton as given in Appendix C: for real amplitude $a(x)$:
$$
\min_{a(x)} \left\{ \int \left[ \frac{1}{2}\beta\, (\partial_x a)^2 - \frac{1}{4}\gamma\, a^4 \right] dx \;\Big|\; \int a^2\, dx = n,\ a(x) \to 0 \text{ for } x \to \pm\infty \right\},
$$
where $n$ is a given positive number.

The aim is to approximate the soliton profile with numerical methods, without exploiting the fact that the exact formula can in fact be written down.

5 Project 5: Approximation of soliton

1. Use a tent-function, variable height and width, to find a confined approximation.

2. Investigate the governing equation for a minimizer (soliton) to find the asymptotic behaviour. Use this information to derive an (approximate) confined variational formulation, in much the same way as in Appendix A.2 for guided modes.

3. Use again a tent function to approximate the soliton, but now in the confined formulation.

4. Compare the results with the exact soliton formula.

5. Try to design a FEM code for the confined variational formulation (the quadratic term is somewhat cumbersome). Use a simple iteration scheme to solve the governing equations. Compare the result with the exact soliton formula.
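One possible iteration scheme (not necessarily the FEM scheme intended by the project) is a normalized gradient flow: follow the unconstrained gradient of the energy and rescale to the constraint $\int a^2 = n$ after every step. A crude finite-difference sketch in Python; the grid, step size, and initial guess are our own choices, and the comparison value $q = (n/2)\sqrt{\gamma/(2\beta)}$ follows from $\int q^2\,\mathrm{sech}^2(kx)\,dx = 2q^2/k$ for the exact soliton:

```python
import math

# Normalized gradient flow ("imaginary time") for
#   min  int [ (beta/2) a_x^2 - (gamma/4) a^4 ]   s.t.  int a^2 = n,
# with a -> 0 enforced at the ends of a large interval.
beta, gamma_, n_val = 1.0, 1.0, 4.0
N, Lx = 200, 40.0
h = Lx / N
xs = [-Lx/2 + h*i for i in range(N+1)]

a = [math.exp(-x*x/8.0) for x in xs]       # any confined initial guess

def normalize(a):
    mass = sum(v*v for v in a) * h
    f = math.sqrt(n_val / mass)
    return [v*f for v in a]

a = normalize(a)
dt = 0.2 * h*h / beta                      # stable explicit step
for _ in range(10000):
    new = [0.0]*(N+1)                      # boundary values kept at zero
    for i in range(1, N):
        lap = (a[i-1] - 2*a[i] + a[i+1]) / h**2
        new[i] = a[i] + dt * (beta*lap + gamma_*a[i]**3)
    a = normalize(new)

# Compare the peak with the exact soliton amplitude
q_exact = (n_val/2.0) * math.sqrt(gamma_/(2.0*beta))
print(abs(max(a) - q_exact) / q_exact < 0.05)   # True
```

The rescaling step is the discrete counterpart of staying on the constraint manifold; the converged profile can then be compared pointwise with the exact sech formula (item 4).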


Title: NLS low-dimensional model for bi-harmonic wave groups
Date: Bandung, 10 July 2002
Group: 6
Remark: See Chapter 3 and Appendix C of Lecture Notes

General problem formulation
Consider the NLS-equation
$$
i\,\partial_\tau A - \delta H(A) \equiv i\,\partial_\tau A + \beta\,\partial_x^2 A + \gamma |A|^2 A = 0
$$
with initial value
$$
A(x, 0) = a_0 \cos(\kappa x).
$$
This corresponds to the evolution of a physical bi-harmonic wave group with difference $\kappa$ between (close-by) wavenumbers.

Show that the periodicity in $x$, with period $X = 2\pi/\kappa$, remains for all time $\tau > 0$. Hence we can investigate the equation on an interval of length one period with periodic boundary conditions.

The resulting dynamics is quite difficult. In this project we derive a low-dimensional model for the dynamics by taking only the two most dominant spatial modes into account.

6 Project 6: Low dimensional model

1. Observe that the NLS is the E-L equation from the action principle with action functional $L$:
$$
L(A) = \int d\tau \left[ \int \frac{1}{2}\, \bar{A}\, i\,\partial_\tau A\, dx - H(A) \right]
$$
($x$-integration is over one period).

2. Take as Ansatz for the approximate dynamics

A(x, τ) = a(τ) cos(κx) + b(τ) cos(3κx)

with a, b complex-valued coefficients.

3. Restrict the action functional to the functions from the Ansatz, and derive the resulting functional, say $L(a, b)$, in the coefficient functions $a(\tau), b(\tau)$.

4. Write down the E-L equation for $L(a, b)$ and find in this way the low-dimensional dynamics.

5. Show that the original wave energy functional $N(A) = \int \frac{1}{2}|A|^2\, dx$, when restricted to the low-dimensional dynamics, is still a constant of the motion.

6. Use this last fact to simplify the equations; in fact the system can be reduced to a system with one degree of freedom that can be studied by phase-plane analysis. Try to do this. Find the approximate evolution of the IVP.
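The claims behind this project can be probed numerically before doing the analysis. A rough sketch (Python; periodic finite differences plus RK4 with our own test parameters, not a scheme prescribed by the course) integrates the IVP on one period and checks that $N(A)$ is conserved and that only odd harmonics of $\cos(\kappa x)$ are generated, which is what makes the two-mode Ansatz plausible:

```python
import cmath, math

beta, gamma_, a0, kappa = 1.0, 1.0, 0.5, 1.0   # illustrative test values
N = 64
X = 2*math.pi/kappa
h = X / N
xs = [h*j for j in range(N)]

def rhs(A):
    # i dA/dtau + beta A_xx + gamma |A|^2 A = 0
    #   =>  dA/dtau = i*(beta A_xx + gamma |A|^2 A)
    out = []
    for j in range(N):
        lap = (A[j-1] - 2*A[j] + A[(j+1) % N]) / h**2   # periodic stencil
        out.append(1j * (beta*lap + gamma_*abs(A[j])**2 * A[j]))
    return out

def rk4(A, dt):
    k1 = rhs(A)
    k2 = rhs([A[j] + 0.5*dt*k1[j] for j in range(N)])
    k3 = rhs([A[j] + 0.5*dt*k2[j] for j in range(N)])
    k4 = rhs([A[j] + dt*k3[j] for j in range(N)])
    return [A[j] + dt/6*(k1[j] + 2*k2[j] + 2*k3[j] + k4[j]) for j in range(N)]

A = [complex(a0*math.cos(kappa*x)) for x in xs]
N0 = sum(abs(v)**2 for v in A) * h / 2
for _ in range(2000):
    A = rk4(A, 1e-3)

def mode(A, m):
    # Fourier coefficient of exp(i*m*kappa*x) on one period
    return sum(A[j]*cmath.exp(-1j*m*kappa*xs[j]) for j in range(N)) * h / X

Nt = sum(abs(v)**2 for v in A) * h / 2
print(abs(Nt - N0) / N0 < 1e-6)                    # wave energy N is conserved
print(abs(mode(A, 2)) < 1e-10 * abs(mode(A, 1)))   # even harmonics stay absent
```

The absence of even harmonics follows from the symmetry $A(x + X/2) = -A(x)$ of the initial value, which the cubic equation preserves; the two-mode Ansatz keeps the dominant odd modes $\cos(\kappa x)$ and $\cos(3\kappa x)$.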
