

Notes on Calculus of Variation with applications in

Image Analysis

François Lauze

March 8, 2011


Contents

1 Introduction

2 Finite Dimension
  2.1 Objective functions and optimizers
  2.2 Gradients
  2.3 An Important Gradient Computation
  2.4 Change of Metric and Gradient
  2.5 Cauchy-Schwarz, gradient descent
  2.6 Equality Constraints
  2.7 Inverse and Implicit Function Theorems

3 Optimization of Functionals
  3.1 Inner Products, Boundary Conditions, ...
  3.2 Classical Integration Results
  3.3 Adjunction
  3.4 Change of Variables Theorem

4 Some Applications
  4.1 Tikhonov Regularization
  4.2 Rudin-Osher-Fatemi Model, BV Regularization
  4.3 A Somewhat General Lagrangian
  4.4 Heat Equation and Calculus of Variations
  4.5 Optical Flow a la Horn and Schunck
  4.6 Non Linear Flow Recovery

5 Planar curves
  5.1 Moving frames
  5.2 Length, arc-length parametrization, curvature
  5.3 Geometric integrals
  5.4 Implicit representation of curves
  5.5 Representation by distance functions
  5.6 A very short introduction to variations of curves
  5.7 An extremely short / ultralight introduction to shape derivatives

6 Optimization of curves and regions
  6.1 Snakes
  6.2 Curves of minimal length
  6.3 Geodesic active contours
  6.4 Inner products and distances
  6.5 Chan and Vese's Active Contours
      6.5.1 Chan and Vese cost function

Bibliography


Chapter 1

Introduction

The purpose of these notes is to introduce, in a somewhat informal way, although with a few abstract concepts, some elements of the Calculus of Variations. I will in particular avoid functional analysis aspects, although I will mention a few Hilbert spaces. I will however assume that the reader is familiar with some formulas from Integral Calculus, although I will recall them when needed.

A main point of these notes is the link between the notion of differential of a function and the notion of gradient. Gradients depend on a Euclidean (Hilbertian) inner product on the space where one works, while the differential does not. The adjoint of a linear operator is then naturally introduced.

This document is targeted toward not-too-scared electrical engineers and computer scientists, but not hard-core mathematicians; I have a reputation to defend.


Chapter 2

Finite Dimension

In this chapter, we are interested in the optimization of real functions defined on an open subset of a Euclidean space, generally Rn. I will introduce in a somewhat formal way the notions of Euclidean (and Hilbertian) structure, adjunction and gradients. I will discuss smooth constrained optimization and Lagrange multipliers.

2.1 Objective functions and optimizers

Given a function f : U ⊂ Rn → R, defined on an open subset U of Rn, a standard question is to find, if it exists, an extremum of f, minimum or maximum, and its corresponding location x∗ ∈ U such that

∀x ∈ U, f(x∗) ≤ f(x) (resp. f(x∗) ≥ f(x) ).

When x∗ corresponds to a local (resp. global) minimum f(x∗), x∗ is called a local (resp. global) minimizer of f. When x∗ corresponds to a local (resp. global) maximum f(x∗), x∗ is called a local (resp. global) maximizer of f. In many problems, this function is built as a function that measures “how well” a certain number of criteria are enforced at a given point x ∈ U. Such a function is called an objective function and an extremum x0 represents the location where an optimal trade-off is achieved between the different criteria.

When the optimization is a minimization, f is generally called a cost function or energy function: f(x) measures the “cost” of choosing x, or the “energy used” for that choice, and one searches for a point of minimal cost/energy.


When f is differentiable, and we will from now on always assume that it is, a necessary condition is that the differential dxf of f at x vanishes. Instead of using this differential, one often considers the gradient ∇xf of f at x, which is a vector in Rn, and we are naturally led to solve in x the vector equation ∇xf = 0. This can be done by different methods, more or less “direct”, as the equation to solve may be rather complex. Among these methods is the classical gradient descent: when seeking a minimizer x∗ of f, one attempts to obtain it as a solution of the differential equation

xt = −∇xf.

The underlying idea is that, starting from a point (x0, f(x0)) on the graph Gf of f, one goes down the graph along the curve of steepest slope, which is encoded by −∇xf, and one will hopefully “hit” an equilibrium point, i.e. xt = 0, which means that ∇xf = 0. Under some assumptions on the shape of Gf, this should provide a local minimizer.
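As a concrete illustration, here is a minimal numerical sketch of this gradient descent (an explicit Euler discretization of the equation above), applied to the function f(x, y) = x² + (y − 1)⁴ that reappears in Section 2.6; the step size dt and the stopping threshold are arbitrary choices of mine, not prescriptions from the text.

    import numpy as np

    def gradient_descent(grad_f, x0, dt=0.05, tol=1e-8, max_iter=100000):
        """Explicit Euler discretization of x_t = -grad f(x)."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad_f(x)
            if np.linalg.norm(g) < tol:   # near an equilibrium: gradient vanishes
                break
            x = x - dt * g                # step down the steepest slope
        return x

    # f(x, y) = x^2 + (y - 1)^4, with gradient (2x, 4(y - 1)^3); minimizer (0, 1).
    grad_f = lambda x: np.array([2 * x[0], 4 * (x[1] - 1) ** 3])
    print(gradient_descent(grad_f, [3.0, -2.0]))   # close to [0, 1]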

2.2 Gradients

It thus becomes necessary to make precise the connection between the differential and the gradient of a function. To do so, I will start by recalling some (should-be) well-known notions of differential calculus and Euclidean geometry.

Recall that the differential of f at a point x = (x1, . . . , xn) is defined as the unique linear form (if it exists) Lx : Rn → R satisfying

f(x + h) = f(x) + Lx(h) + o(h)

where h is “sufficiently” small.

The directional derivative of f at x in the direction v is defined as the limit, if it exists, of

dxfv := lim_{t→0} (f(x + tv) − f(x)) / t

A clearly equivalent definition of dxfv is the following (I will also use the term “Gateaux Derivative”, especially in the infinite dimensional case). Let ℓ : R → R be defined as ℓ(t) = f(x + tv). Then

dxfv = ℓ′(0).

When f is differentiable, one has in fact the simple relation (simple but extremely important in the rest of these notes)

dxfv = Lx(v).

Page 9: Notes on Calculus of Variation with applications in Image …image.diku.dk/imagecanon/ExtraMaterial/calcvar_en.pdf ·  · 2011-03-08Notes on Calculus of Variation with applications

2.2. GRADIENTS 9

and I will denote dxf the differential of f at x. Until now, I have not talked of partial derivatives. The partial derivative of f wrt (= with respect to) the i-th variable xi at x is by definition:

∂f/∂xi (x) := dxfei

where ei is the i-th element of the natural basis of Rn. This notion of partial derivative is thus basically the same as the one of directional derivative, but the important point is that it is taken with respect to the choice of a basis of Rn, as follows: if v = Σ_{i=1}^n vi ei, one has the relation

dxfv = Σ_{i=1}^n vi ∂f/∂xi (x).

This relation is in fact an inner product:

Σ_{i=1}^n vi ∂f/∂xi (x) = (v1, . . . , vn)ᵀ · (∂f/∂x1, . . . , ∂f/∂xn)ᵀ    (2.1)

(I threw the x away to keep the formulas legible). The vector (∂f/∂x1, . . . , ∂f/∂xn)ᵀ is the gradient of f, denoted ∇f. It seems that this gradient depends a priori on the choice of a basis, but in fact the dependency is “weaker”: if the standard basis is replaced by another one, also orthonormal, then the value of expression (2.1) does not change. We will use this property to define the gradient in an abstract way.

I will first recall a few concepts of linear algebra dealing with the notions of inner product and orthogonality.

Definition 1 Euclidean Spaces, (real) Hilbert Spaces. A prehilbertian real vector space E (i.e. the scalars are real) is a vector space endowed with an inner product, i.e. a bilinear form (x,y) ↦ π(x,y) ∈ R with the following properties:

1. ∀x ∈ E, π(x,x) ≥ 0 (positive)


Page 10: Notes on Calculus of Variation with applications in Image …image.diku.dk/imagecanon/ExtraMaterial/calcvar_en.pdf ·  · 2011-03-08Notes on Calculus of Variation with applications

10 CHAPTER 2. FINITE DIMENSION

2. π(x,x) = 0 ⇐⇒ x = 0 (definite)

3. π(x,y) = π(y,x) (symmetric).

The bilinear form π is sometimes called a prehilbertian metric. Two vectors x and y are orthogonal if π(x,y) = 0. The prehilbertian norm of a vector x is ‖x‖π = √π(x,x); it depends of course on the inner product, and associated with this norm is the prehilbertian distance d(x,y) := ‖x − y‖π. One generally writes π(x,y) = x · y or 〈x,y〉, or 〈x,y〉π if one wants to emphasize the dependency on π, and this will prove important on several occasions in these notes. When E is finite-dimensional, one talks of a Euclidean space. A prehilbertian space is called Hilbertian, or a Hilbert space, if it is complete wrt the above distance, i.e. every Cauchy sequence for that distance converges. A Euclidean space is always Hilbertian; the complications arise in the infinite dimensional case.

Definition 2 Orthogonality. Let E be a Euclidean (resp. Hilbert) space with its inner product π. When π(x,y) = 0 we say that x is orthogonal (for π, or π-orthogonal) to y and we denote it x⊥y, or x⊥πy if we need / wish to make the inner product explicit. The set {y ∈ E, π(x,y) = 0} is the orthogonal x⊥ of x. When V is a vector subspace of E, then V⊥ = {y ∈ E, π(x,y) = 0, ∀x ∈ V} is the orthogonal of V.

There are plenty of super-classical relations; most of them are assumed to be known. One that we will use explicitly: if V is a subspace of E, then E = V ⊕ V⊥, i.e. any vector h of E can be written uniquely as a sum h = h1 + h2 where h1 ∈ V and h2 ∈ V⊥. I may sometimes call h1 the component of h tangent to V and h2 the component of h normal to V.

What is this stuff useful for? The following result (which should be known, at least in the Euclidean case) is paramount for the sequel.

Theorem 3 Representation of a linear form by a vector. Let (E, π) be a Hilbert space and L : E → R a linear form. Then there exists a unique vector v such that

L(x) = 〈v,x〉π.

Of course v depends on the inner product π.

In a more concrete way, a linear form on Rn is classically given by its coefficients (l1, . . . , ln) and we clearly have

L(x) = (l1, . . . , ln) (x1, . . . , xn)ᵀ = Σ_{i=1}^n li xi


which is the usual inner product of Rn! The corresponding vector v is just

(l1, . . . , ln)ᵀ

where T denotes transposition. Linked to transposition is the adjunction operation:

Definition 4 Adjunction, a simple one to start with. Let E be a Euclidean (or Hilbert) space with inner product 〈x,y〉 and A : E → E a linear map. The adjoint of A, denoted A∗, is the unique linear map E → E which satisfies the following relation (called the adjunction relation):

〈Ax,y〉 = 〈x, A∗y〉.

If A = A∗ one says that A is self-adjoint.

In Rn with canonical basis and inner product, linear maps Rn → Rn are identified with square matrices. The adjoint A∗ of A is just its transpose Aᵀ! Thus a matrix is self-adjoint if and only if it is symmetric.

Definition 5 Adjunction, a bit more general. Let E and F be two Hilbert (or just Euclidean) spaces, 〈−,−〉E and 〈−,−〉F their respective inner products, and A : E → F a linear map. The adjoint A∗ of A is the unique linear map F → E (if it exists; this is always the case in finite dimension, but for the infinite dimensional case, I'm not sure) which satisfies, for all x ∈ E, y ∈ F,

〈Ax,y〉F = 〈x, A∗y〉E .

Once again, in the finite dimensional situation, if E = Rn and F = Rm with standard bases and inner products, A is an m × n matrix and A∗ is just its transpose Aᵀ! In fact, transposition and adjunction are identical in the standard Euclidean setting!

It’s now time to come back to our main subject of interest: gradients.

Definition 6 Let g be a differentiable function from (an open subset of) a Euclidean space E (inner product 〈−,−〉E) with values in R and let g′(x) = dxg be its differential at the point x. It is a linear map from E to R and, by the representation theorem above, there exists a unique vector v ∈ E such that

∀h ∈ E, dxg(h) = 〈v,h〉E

This vector v is, by definition, the gradient of g at x and is denoted ∇xg (or ∇g when there is no ambiguity).

Page 12: Notes on Calculus of Variation with applications in Image …image.diku.dk/imagecanon/ExtraMaterial/calcvar_en.pdf ·  · 2011-03-08Notes on Calculus of Variation with applications

12 CHAPTER 2. FINITE DIMENSION

The important remark here is that the notion of gradient is inner product dependent, while the differential is not.

Important idea. If, for a “generic direction” h, one can write dxgh as 〈v,h〉, that is, such that the second factor is h itself, then v must be the gradient of g at x (for the given inner product). It is this kind of computation one does in order to obtain a more or less analytical form for the gradient.

And what about adjunction here? In fact, in a typical gradient computation, one obtains in general an expression of the form

dxgh = 〈w, Lh〉

where L is some linear map. By adjunction, one gets 〈w, Lh〉 = 〈L∗w,h〉, which thus means that the sought gradient is L∗w.

There is in fact another definition of the gradient which, for a Euclidean/Hilbert space, provides an equivalent characterization, but that only requires a Banach space E. We won't use it, but I add it for the sake of completeness (for more see [10]).

Recall (perhaps) that a vector space E with a norm ‖−‖E is a Banach space if it is complete for that norm. A linear form F on E (i.e. a linear map F : E → R) is continuous if there exists a constant c > 0 such that for each h ∈ E, |F(h)| ≤ c‖h‖E (|F(h)| is just the absolute value of F(h) ∈ R). The infimum of such constants c is called the operator (or, here, dual) norm of F, ‖F‖E∗. It is also characterized as

‖F‖E∗ = sup_{h∈E, h≠0} |F(h)| / ‖h‖E.

Let g be a differentiable function from (an open set of) E to R, x ∈ E, and let g′(x) = dxg be the differential of g at x. This is a linear form on E, which we will assume continuous.

Definition 7 Among all h ∈ E such that ‖h‖E = ‖dxg‖E∗ (they form a sphere of radius ‖dxg‖E∗), there exists a unique one which maximizes dxg(h) ∈ R. This is the gradient ∇xg of g at x.

The uniqueness of h needs to be proved; we will not do it here. This definition is interesting/important as it frees us from the Euclidean/Hilbertian restriction, allowing optimization via geometric approaches in more general spaces. In particular for ℓp and Lp spaces, the operator/dual norm is well known and this leads to interesting gradient computations.


2.3 An Important Gradient Computation

Two differentiable functions f : Rn → R and g : Rm → Rn are given, with the usual bases and inner products. Let us set k(x) := f(g(x)). We want to compute ∇k. To that end, we compute the derivative of the function ℓ : t ↦ ℓ(t) = k(x + th) = f(g(x + th)) at t = 0:

ℓ′(t) = dfg(x+th) dgx+th h

by the chain rule. The map dgx from Rm to Rn is linear. At t = 0 we then get

dkxh = dfg(x) dgx h
     = 〈∇fg(x), dgx h〉    (definition of the gradient of f)
     = 〈dg∗x ∇fg(x), h〉    (adjunction)

We succeeded in writing the directional derivative, for a generic direction h, as an inner product 〈v,h〉 with v = dg∗x ∇fg(x). v is consequently the sought gradient. In matrix notations, dg is the Jacobian matrix

Jg = ( ∂g1/∂x1 · · · ∂g1/∂xm )
     (    ⋮              ⋮   )
     ( ∂gn/∂x1 · · · ∂gn/∂xm )

of g at x. We thus have (with standard bases, blablabla...) written the gradient as ∇k = Jgᵀ ∇f = (df Jg)ᵀ.
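A small numerical sketch of this ∇k = Jgᵀ∇f recipe, with hypothetical choices of my own: f(y) = |y|²/2 (so ∇f = y) and a toy map g : R² → R²; the finite-difference check only confirms the adjunction argument.

    import numpy as np

    def g(x):  return np.array([x[0] * x[1], np.sin(x[0])])
    def Jg(x): return np.array([[x[1], x[0]], [np.cos(x[0]), 0.0]])  # Jacobian of g
    grad_f = lambda y: y                      # gradient of f(y) = |y|^2 / 2

    x = np.array([0.7, -1.3])
    grad_k = Jg(x).T @ grad_f(g(x))           # adjoint (transpose) of dg applied to grad f

    # finite-difference verification of grad k
    k = lambda x: 0.5 * np.dot(g(x), g(x))
    eps, fd = 1e-6, np.zeros(2)
    for i in range(2):
        e = np.zeros(2); e[i] = eps
        fd[i] = (k(x + e) - k(x - e)) / (2 * eps)
    print(grad_k, fd)                         # the two vectors agree to ~1e-9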

This example can appear as a somewhat artificial play with the chain rule; it is, in my opinion, one of the most fundamental ones in order to understand the algebra of the Calculus of Variations. It reflects the series of operations that have to be applied so as to obtain the gradient of a functional and the associated Euler-Lagrange equation. This will be the object of one of the following sections. We will encounter some non linear f, while g will in general be a linear differential operator. It will be necessary to give a meaning to the adjoint of such an operator.

2.4 Change of Metric and Gradient

In Rn, we know that the datum of an inner product is the same as the datum of a symmetric, positive definite matrix A: the form (x,y) ↦ xᵀAy is symmetric and positive definite. The usual inner product corresponds to


A = In, the identity matrix; I will denote it as 〈−,−〉In, 〈x,y〉In = xᵀy. Given a function f : (U ⊂) Rn → R, its standard gradient (i.e. the gradient for the standard inner product) ∇In f is just the column vector

(∂f/∂x1, . . . , ∂f/∂xn)ᵀ.

But what happens if one changes the inner product? We want to compute ∇A f. This is straightforward, but let us do it step by step. ∇A f is defined by the relation

〈∇A f, h〉A = df h.

But

df h = 〈∇A f, h〉A
     = (∇A f)ᵀ A h       (definition of the inner product)
     = 〈∇A f, Ah〉In      (definition of the inner product)
     = 〈A∗∇A f, h〉In     (adjunction)
     = 〈A∇A f, h〉In      (symmetric ⇐⇒ self-adjoint)
     = 〈∇In f, h〉In      (definition of the gradient)

and we have the relation ∇In f = A∇A f. In other words,

∇A f = A⁻¹ ∇In f.
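A quick numerical sketch of this relation (the SPD matrix A below is an arbitrary example of mine): the A-gradient is obtained from the standard one by solving a linear system.

    import numpy as np

    # f(x) = |x|^2 / 2, standard gradient grad f = x.
    x = np.array([1.0, 2.0])
    grad_std = x

    A = np.array([[2.0, 1.0], [1.0, 3.0]])    # SPD matrix defining <u, v>_A = u^T A v
    grad_A = np.linalg.solve(A, grad_std)     # grad_A f = A^{-1} grad f

    # check the defining relation <grad_A f, h>_A = df(h) = <grad f, h> for a random h
    h = np.random.randn(2)
    print(grad_A @ A @ h, grad_std @ h)       # equal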

In order to make the reader more confused, as well as my explanations: the dimension 1 case is somewhat special. Let us consider R: every inner product on R is given by (x, y) ↦ λxy with λ > 0, since symmetric positive definite matrices on R are just strictly positive real numbers. This means that, up to a strictly positive multiplicative constant, gradient and differential (which is nothing else than multiplication by the usual derivative) match. We can just suppose that this constant λ is 1, and we will no longer distinguish between gradient, differential and derivative.

To add more confusion, in the somewhat opposite situation of infinite dimension, A will often be represented by a differential operator. To change the metric and obtain the new gradient from the old one, it will be necessary to solve a linear differential equation. This is what Yezzi et al., on the one hand, and Charpiat et al., on the other, have done; we will see that with the Sobolev active contours.


2.5 Cauchy-Schwarz, gradient descent

The Cauchy-Schwarz formula is also an elementary and classical result, which makes precise how the gradient indicates the direction of largest change. It is obviously important for optimization via gradient descent/ascent, and thus in many variational questions.

Theorem 8 Cauchy-Schwarz. Let x and y be two elements of a Hilbert space E. Then

|〈x,y〉| ≤ ‖x‖‖y‖

with equality if and only if x and y are collinear.

The proof of the theorem is elementary. Its main consequence, at least for this document, is the following. Assume that we have succeeded in writing dfxh = 〈∇xf, h〉. From Cauchy-Schwarz, we thus have

|dfxh| = |〈∇xf, h〉| ≤ ‖∇xf‖‖h‖

and the direction h for which f(x + h) changes the most is by necessity one for which we have equality in Cauchy-Schwarz; therefore it is the gradient direction, and since f(x + h) = f(x) + 〈∇xf, h〉 + o(h), it will be +∇xf for a maximal increase, and −∇xf for a maximal decrease, around x.

The following result will conclude this section.

Theorem 9 Let f : Ω → R be a differentiable function, where Ω is a bounded open set (a domain), having a minimum at x̄ ∈ Ω. Then ∇fx̄ = 0. If this is the only vanishing point for the gradient in Ω, then the gradient descent differential equation

x(0) = x0 ∈ Ω, xt = −∇fx

satisfies lim_{t→∞} x(t) = x̄.

In general the minimum is not attained in finite time. For instance, if f = x², ∇f = 2x, gradient descent becomes

x′(t) = −2x(t)

whose general solution has the form x0 e^{−2t}. If x0 ≠ 0, the minimum of f won't be reached in finite time.


2.6 Equality Constraints

It is of course not the end of the story! In the situations described above, the function f had to be optimized on an open set of a Euclidean space. In many cases, we have to consider situations where f is defined, or has to be optimized, on a subspace, linear, affine, or more general, of a Euclidean space, and more generally on an open set of such a subspace.

Let us start with a very elementary example, which illustrates some of the problems one encounters. Let f : R² → R be the function (x, y) ↦ x² + (y − 1)⁴. Its “standard” gradient is given by

∇f(x,y) = (2x, 4(y − 1)³)ᵀ

which vanishes only at the point (0, 1). Let us assume that we are interested in the values of f on the line D : x + y = 0. The point (0, 1) ∉ D! It cannot be a solution of the problem

min_D f(x, y).

Of course, by a direct substitution y = −x, one can compute the minimum of g(x) = f(x,−x) = x² + (−x − 1)⁴. One has g′(x) = 4x³ + 12x² + 14x + 4; its unique real root is x ≈ −0.4102 and the solution of the minimization problem is (x,−x).
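A two-line numerical check of this substitution computation (a sketch; numpy's generic polynomial root finder stands in for the calculus):

    import numpy as np

    # g'(x) = 4x^3 + 12x^2 + 14x + 4; keep the real root only.
    roots = np.roots([4, 12, 14, 4])
    x = roots[np.isreal(roots)].real[0]
    print(x, (x, -x))    # x ~ -0.4102, minimizer (x, -x) on D: x + y = 0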

Ok, that was easy! But what if D is a more complicated subspace? Let us assume that D is defined implicitly by an equation φ(x, y) = 0 sufficiently complicated so that one cannot perform a substitution. Then let us perform a simple but nevertheless interesting computation.

Let us assume the ambient space is Rn, so I will be able to (re)use some standard notations. We will also assume that φ is differentiable. Set D = {x, φ(x) = 0} and, last but not least, assume that ∇φ does not vanish on D. This condition, in the “modern” language of differential geometry, means that D is a submanifold of Rn, and in our case, a hypersurface of Rn (for n = 2 this is a curve, for n = 3 a surface...). For a given h small enough, a Taylor expansion gives

φ(x + h) = φ(x) + ∇φx · h + o(h).

We deduce from it that the directions h for which φ changes the least in a neighborhood of x, and thus for which φ(x + h) ≈ 0, i.e. x + h remains close to D, must be orthogonal to ∇φx, for, in this case, the inner product ∇φx · h = 0. These directions form the tangent space (line, plane...) to D at x and its usual notation is TxD.


Now Taylor expansion for f gives

f(x + h) = f(x) +∇fx · h + o(h)

for h tangent to D. A few remarks. Let NxD be the subspace orthogonal to TxD; then

∇fx = v1 + v2, v1 ∈ TxD, v2 ∈ NxD

and for h ∈ TxD,

∇fx · h = v1 · h + v2 · h = v1 · h

because v2⊥h by definition of v2: in this case, only the component of ∇fx tangent to D at x plays a role. This is the gradient of f along D at x. We deduce from it that in order for f to vary minimally in a neighborhood of x∗ along D, it is necessary that ∇fx∗ · h = 0 for all such h, and thus that v1 = 0, i.e. ∇fx∗⊥h. But h⊥∇φx∗ and therefore ∇fx∗ must then be parallel to ∇φx∗. In other words: ∇fx∗ must be orthogonal to the tangent space Tx∗D of D at x∗.

Since we assumed that ∇φx∗ ≠ 0, this parallelism condition means that there exists a unique real λ∗ such that

∇fx∗ + λ∗∇φx∗ = 0.

This real λ∗ is called the Lagrange multiplier of f associated with the constraint φ = 0. How to use it? We first define an augmented objective function on Rn+1:

f̃(x, λ) = f(x) + λφ(x)

and we compute its differential at (x, λ):

df̃(x,λ) = (dfx + λdφx, φ(x)).

We need to search for a stationary point (x∗, λ∗) at which this differential vanishes, i.e. at which the associated gradient

∇f̃(x∗,λ∗) = (∇fx∗ + λ∗∇φx∗, φ(x∗))ᵀ = 0.

Thus, at this critical point (x∗, λ∗), the first component x∗ satisfies the constraint. Beware nevertheless that such a critical point is in general not an extremum but a saddle point, and a gradient descent type method will not function as is. Specific resolution strategies may have to be used.
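For the earlier example f(x, y) = x² + (y − 1)⁴ with constraint φ(x, y) = x + y, a sketch of this approach: look for a zero of the gradient of the augmented function with a black-box nonlinear solver (scipy's fsolve here; the starting guess is an arbitrary choice of mine).

    import numpy as np
    from scipy.optimize import fsolve

    # Gradient of f~(x, y, lam) = x^2 + (y-1)^4 + lam*(x + y); a critical
    # point of f~ is a constrained stationary point of f on x + y = 0.
    def grad_aug(z):
        x, y, lam = z
        return [2 * x + lam,              # d/dx
                4 * (y - 1) ** 3 + lam,   # d/dy
                x + y]                    # d/dlam: the constraint itself

    x, y, lam = fsolve(grad_aug, [0.0, 0.0, 1.0])
    print(x, y, lam)   # x ~ -0.4102, y = -x, as found by substitution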


What happens when one has several constraints? Figure 2.1 illustrates the situation where two constraints are given in R³. The first one is given by φ1(x, y, z) = x² + y² + z² − 1 = 0; the corresponding surface is the unit sphere S². The second is a linear one, φ2(x, y, z) = αx + βy + γz = 0; the corresponding surface is a plane, which I have not drawn, and satisfying the two constraints simultaneously means that the (x, y, z) of the problem must lie in the intersection of S² and the plane, which is a great circle of S², denoted C on the figure.

Figure 2.1: Constraints and tangent spaces

For the minimization problem on C we thus must only look at directions h that keep us infinitesimally on C. At a given point P, these form the tangent line to C at P, TPC, and these are precisely the directions which are simultaneously orthogonal to ∇φ1P and ∇φ2P. For a function f(x, y, z) to have a minimum on C at that point, its gradient must be orthogonal to TPC and thus lie in the subspace generated by the two vectors ∇φ1P and ∇φ2P, denoted a and c in the figure. But this means that ∇fP, ∇φ1P and ∇φ2P are linearly dependent, and there exist two real numbers λ1 and λ2 such that

∇fP + λ1∇φ1P + λ2∇φ2P = 0.

We thus have two Lagrange multipliers. The objective function augmentation technique still “works”: we will seek a stationary point of

f̃(x, y, z, λ1, λ2) = f(x, y, z) + λ1φ1(x, y, z) + λ2φ2(x, y, z).


And of course, if one has k constraints of the type φi(x) = 0, i = 1, . . . , k, we can use k Lagrange multipliers (but I can no longer draw a picture).

Important idea to remember. When one constrains an optimization problem to a subspace D, linear or not, one constrains the search directions: at each point P of this subspace, search directions must belong to the tangent space of D at P, TPD, and the gradient of the objective function at an extremum Q is orthogonal to the tangent space at this point, TQD. If D is a linear subspace, it is its own tangent space at each point. If D is affine, its tangent space at every point is the linear subspace parallel to D, or, what is in fact completely equivalent, one can figure the tangent space of D at P as D itself, but with the origin at P: all the vectors start from P. This can be important so as to understand boundary conditions, which act as constraints for problems within the Calculus of Variations.

When the constraint is given as φ1(x) = 0, . . . , φk(x) = 0, the objective function f to be optimized can be augmented by introducing k Lagrange multipliers, and we seek instead a critical point of

f̃(x, λ1, . . . , λk) = f(x) + Σ_{i=1}^k λi φi(x).

Example. Optimizing a function on the circle S¹. Set g(x, y) = xy² + y⁴ − y² + x + 1. We want to find the minimum of g along the unit circle S¹ ⊂ R². Substitution works here: we can replace y² by 1 − x² and solve for

arg min_{x∈[−1,1]} f(x) = x(1 − x²) + (1 − x²)² − (1 − x²) + x + 1.

The derivative of f has a unique real root, located at x ≈ −0.763284, and it can be checked that this is indeed a minimum for f; thus g has two minima, at (x, ±√(1 − x²)).

Now we do it geometrically. If x = (x, y) ∈ S¹, we can make an orthonormal basis

τ = (y, −x)ᵀ and ν = (x, y)ᵀ

with τ tangent to S¹ at x and ν normal; τ is thus a basis of the tangent space TxS¹. Our optimality condition is then

∇xg · τ = 0.


We denote by ∇S¹,x g this quantity, the gradient of g along S¹. A gradient descent approach gives

xt = −∇S¹,x g.

A first attempt to solve it numerically:

(xn+1 − xn)/dt = −∇S¹,x g ⇐⇒ xn+1 = xn − dt ∇S¹,x g.

The problem here is that the new point xn+1 = xn − dt∇S¹,x g will generally not belong to the circle, as illustrated in Figure 2.2, and this because the constraint is non linear and we take a finite step as opposed to an infinitesimal one. We may project back by renormalizing (this is the point x̃n+1 on the figure) or, better, “roll” the update vector along the circle such that the length of the arc from xn to xn+1 is the same as ‖x̃n+1 − xn‖.

Figure 2.2: Optimization along the circle

Because arc lengths and angles are the same for the unit circle, what we need to do is to rotate xn by an angle of size ‖x̃n+1 − xn‖ with the proper sign; this is just h = −dt∇g · τ since τ has norm one. This can be written as

xn+1 = R−h xn,  where  R−h = (  cos h   sin h )  =  exp(  0   h )
                             ( −sin h   cos h )        ( −h   0 ).

The matrix R−h is the rotation of angle −h. The minus sign is due to the choice of orientation for TxS¹: a positive h will generate a clockwise rotation, while the standard choice is counterclockwise. The map that rolls the tangent vector x̃n+1 − xn onto the circle is called the exponential map, and in this case it has something to do with the matrix exponential!

Finally, a numerical algorithm for our problem can be given as


• Let x0 ∈ S¹ be a starting value.

• While not converged, up to a given threshold, do

1. Compute h = −dt (∇gxn · τxn)

2. Set xn+1 = R−h xn

• Output x∞

A numerical implementation in Matlab gives the following:

Starting point x0     Steady state x∞
(1, 1)/√2             (−0.763284, 0.646064)
−(1, 1)/√2            (−0.763284, −0.646064)

The two solutions correspond to what we got by substitution. Note that the minimizer is not unique, and that gradient descent clearly depends on the starting point.
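A sketch of this rolling descent (my own Python transcription of the algorithm above; dt and the stopping rule are arbitrary):

    import numpy as np

    # g(x, y) = x*y^2 + y^4 - y^2 + x + 1 and its standard gradient.
    def grad_g(x, y):
        return np.array([y**2 + 1.0, 2*x*y + 4*y**3 - 2*y])

    def descend(x0, dt=0.05, tol=1e-12, max_iter=100000):
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            tau = np.array([x[1], -x[0]])        # tangent basis vector of T_x S^1
            h = -dt * grad_g(*x) @ tau           # signed arc-length step
            c, s = np.cos(h), np.sin(h)
            x = np.array([[c, s], [-s, c]]) @ x  # exponential map: rotate by -h
            if abs(h) < tol:
                break
        return x

    print(descend(np.array([1.0, 1.0]) / np.sqrt(2)))    # ~ (-0.763284,  0.646064)
    print(descend(np.array([-1.0, -1.0]) / np.sqrt(2)))  # ~ (-0.763284, -0.646064)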

2.7 Inverse and Implicit Function Theorems

In this paragraph I recall two classical theorems, in Rn and more generally for a Banach space (that is: a real, normed vector space which is complete for the distance induced by the norm; Hilbert spaces are Banach spaces), but I will not go beyond Rn and finite dimensional spaces.

Theorem 10 Inverse Function Theorem. Let f : U ⊂ Rn → Rn, with U an open subset of Rn, be a continuously differentiable function on U. Assume that there exists a point x0 of U such that the Jacobian matrix of f at that point, Jfx0, is invertible. Then there exist an open subset V ⊂ U with x0 ∈ V and an open subset W ⊂ Rn with f(x0) ∈ W such that f is a diffeomorphism from V onto W, i.e. there exists g : W → V continuously differentiable with g(f(x)) = x, ∀x ∈ V and f(g(y)) = y, ∀y ∈ W, and Jgf(x) = (Jfx)⁻¹.

The most important consequence of this theorem is the Implicit Function Theorem (these two theorems are in fact logically equivalent). This theorem makes precise when an implicit relation between points can be transformed into a differentiable explicit one, and it provides the differential locally: if points (x1, . . . , xn, y1, . . . , yk) satisfy simultaneously k relations of the form

fi(x1, . . . , xn, y1, . . . , yk) = 0, i = 1 . . . k


when can we have

yi = φi(x1, . . . , xn), i = 1 . . . k.

Theorem 11 Implicit Function Theorem. Let f : U ⊂ Rn × Rk → Rk, f = (f1, . . . , fk), be a continuously differentiable map. Assume that a = (a1, . . . , an) and b = (b1, . . . , bk) satisfy (a,b) ∈ U and f(a,b) = 0. Compute the partial Jacobian matrix of f with respect to the yi at (a,b):

Jyf(a,b) = ( ∂f1/∂y1(a,b) · · · ∂f1/∂yk(a,b) )
           (       ⋮                 ⋮       )
           ( ∂fk/∂y1(a,b) · · · ∂fk/∂yk(a,b) )

this is a k × k matrix. If Jyf(a,b) is invertible, there exist a neighborhood V of a ∈ Rn and a neighborhood W of b ∈ Rk such that V × W ⊂ U, and a unique continuously differentiable map g : V → W, g = (g1, . . . , gk), such that for all x ∈ V one has f(x, g(x)) = 0. Moreover, if Jxf is the partial Jacobian matrix of f with respect to the xj at such a point (x, y = g(x)) and Jgx is the Jacobian matrix of g at x,

Jxf(x,y) = ( ∂f1/∂x1(x,y) · · · ∂f1/∂xn(x,y) )        Jgx = ( ∂g1/∂x1(x) · · · ∂g1/∂xn(x) )
           (       ⋮                 ⋮       )  ,           (      ⋮               ⋮     )
           ( ∂fk/∂x1(x,y) · · · ∂fk/∂xn(x,y) )              ( ∂gk/∂x1(x) · · · ∂gk/∂xn(x) )

which are k × n matrices, and one has

Jgx = −(Jyf(x,y))⁻¹ Jxf(x,y).

A standard example: set f(x, y) = x² + y² − r²; the set {f(x, y) = 0} is the circle of radius r and centre (0, 0). The partial Jacobian matrix with respect to y is simply the partial derivative

fy = 2y.

If (x0, y0) is a point of the circle which does not lie on the horizontal line y = 0, we get of course that fy(x0, y0) = 2y0 ≠ 0. The implicit function theorem then says that there exists a unique local parameterization y = g(x) in a neighborhood of (x0, y0) with f(x, g(x)) = 0, and the derivative of g is given by

g′(x) = −(1/fy(x, y)) fx(x, y) = −(1/2y) 2x = −x/y = −x/g(x)


and g is a solution of the differential equation g′g = −x. But g′g = ½(g²)′ and −x = −½(x²)′, therefore g²(x) = c − x². Since f(x, g(x)) = 0, i.e. x² + g²(x) = r², one must have c = r² and the two solutions are

g(x) = ±√(r² − x²)

a not entirely unexpected result.


Chapter 3

Infinite Dimension and Optimization of Functionals on Rn

I will not try to write rigorous statements in this chapter, only pretend to do so once in a while, and I will allow myself a series of abuses (but still, reasonable abuses). Many, if not all, derivations of formulas will be done purely formally (or somewhat informally, I should say).

3.1 Inner Products, Boundary Conditions,...

The general framework is the following: we are given an open set Ω of Rn (often n = 2) and a space of functions, denoted L²(Ω); these functions are regular enough in the sense that the following integral always exists (it takes a finite value):

f, g ↦ 〈f, g〉 = ∫_Ω f g dx.

This operation defines a bilinear form, symmetric and positive definite on L²(Ω), i.e. an inner product, and one can show that this space is a Hilbert space. “Bingo!”: most of what we said on gradients and adjoints will still hold. We may also restrict this function space to some particular subspaces, generally by imposing the existence of partial derivatives up to a certain order, something one may wish when one is interested in solving Partial Differential Equations. We may also wish to impose special properties in the case where Ω has a boundary ∂Ω. Along this boundary we have a vector field oriented towards the exterior of Ω, of norm 1 at each point (for the usual


inner product in Rn...) and orthogonal to the boundary (i.e. to the tangent line, plane...) along ∂Ω. We will almost always denote this field by n in the sequel, n = (n1, . . . , nn)ᵀ, and we call it the exterior or outer normal along ∂Ω. Among conditions on ∂Ω, called boundary conditions, we will be especially interested in the following:

• impose that a function takes predefined values along ∂Ω (Dirichlet), for example 0;

• impose that the directional derivative of a function in the direction of n along ∂Ω takes prescribed values (Neumann), often 0.

After having obfuscated a bit the notion of gradient in the previous section: in the sequel, the gradient of a good old function from U ⊂ Rn → R will always be the good old one too. However, for a functional, the gradient will be defined via the above inner product or a related one; in the case of vector-valued functions it is easily extended, but I will at some point be led to consider inner products that include partial derivatives of functions, in the framework of Sobolev spaces.

In the case of vector-valued functions, if f and g are functions with values in Rm, one can also define an inner product with the same notations:

〈f, g〉 = ∫_Ω f · g dx.

This is the integral of the inner product of the values of these functions, and functions for which this product is always finite will be denoted by L²(Ω)m.

3.2 Classical Integration Results

Now we turn to a series of classical results of multivariate integral calculus: integration by parts, Green formulas, divergence theorem; they are essentially three formulations/applications of the Stokes formula.

In all the following results, we assume the functions we manipulate to be differentiable enough, so we can forget about complicated functional analysis arguments.

Theorem 12 Integration by parts. Let u and v be differentiable functions defined on Ω ∪ ∂Ω. One has

∫_Ω (∂u/∂xi) v dx = −∫_Ω u (∂v/∂xi) dx + ∫_∂Ω u v ni ds


where ds is the “boundary integration element” and ni the i-th component of the outer normal (I will describe a bit more precisely in the Chan-Vese section what this boundary integral is).

Figure 3.1: Domain Ω, its boundary ∂Ω and outer normals n

In dimension 1, if Ω is the interval (a, b), the outer normal at a is −1 and the outer normal at b is +1. In that case, the previous formula becomes

∫_a^b u′v dx = −∫_a^b uv′ dx + u(b)v(b) − u(a)v(a)

and this is the classical integration by parts formula.

Theorem 13 Green's Formulas. They are equivalent to the previous one (after a bit of work):

1. ∫_Ω ∆u dx = ∫_∂Ω ∂u/∂n ds

where ∂u/∂n is the directional derivative of u in the direction of n (it was denoted duxn or just dun in the previous section).

2. ∫_Ω ∇u · ∇v dx = −∫_Ω u ∆v dx + ∫_∂Ω u ∂v/∂n ds

3. ∫_Ω (u ∆v − v ∆u) dx = ∫_∂Ω (u ∂v/∂n − v ∂u/∂n) ds.


Theorem 14 Divergence Theorem. Let F = (F1, . . . , Fn)ᵀ be a vector field defined on Ω ∪ ∂Ω and divF := Σ_{i=1}^n ∂Fi/∂xi (remark that F has as many components as the dimension of Ω). Then

∫_∂Ω F · n ds = ∫_Ω divF dx.

In dimension 1, this is the “Fundamental Theorem of Calculus”:

F(b) − F(a) = ∫_a^b F′(x) dx.

3.3 Adjunction

Next step: we start by rewriting the integration by parts formula with the help of our inner product of functions:

〈∂u/∂xi, v〉 = 〈u, −∂v/∂xi〉 + ∫_∂Ω u v ni ds.

Assume that we are interested only in functions which vanish along ∂Ω (they still form a Hilbert space); then the formula becomes

〈∂u/∂xi, v〉 = 〈u, −∂v/∂xi〉

and it says that the adjoint of the linear map ∂/∂xi is −∂/∂xi.

Next we look at the gradient linear map, ∇ : u ↦ ∇u, the good old gradient in Rn. Let v = (v1, . . . , vn) be a function with values in Rn (i.e. a function from L²(Ω)n). I will from now on denote (and it was time to) ∂f/∂xi = fxi. Using integration by parts for each of the variables xi, i = 1 . . . n,

〈∇u, v〉 = ∫_Ω ∇u · v dx
        = Σ_{i=1}^n ∫_Ω uxi vi dx
        = −Σ_{i=1}^n ∫_Ω u vixi dx + Σ_{i=1}^n ∫_∂Ω u vi ni ds
        = −∫_Ω u div(v) dx + ∫_∂Ω u v · n ds.


Now, if u vanishes along ∂Ω or if v is orthogonal to n along ∂Ω, the boundary integral is zero and one gets

〈∇u, v〉 = 〈u, −div(v)〉

i.e. in the “good function spaces” (good depends on boundary conditions) the adjoint of the gradient is ∇∗ = −div (note the minus sign).
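A discrete sketch of this adjunction in dimension 1 (forward differences for ∂/∂x, the transposed stencil for its adjoint; the boundary rows of the adjoint carry exactly the boundary term of the integration by parts formula):

    import numpy as np

    n, h = 50, 0.02
    rng = np.random.default_rng(0)
    u = rng.standard_normal(n)
    u[0] = u[-1] = 0.0                    # u vanishes on the boundary
    v = rng.standard_normal(n - 1)

    Du = np.diff(u) / h                   # forward difference of u, length n-1
    DTv = np.empty(n)                     # adjoint of D: minus a backward difference
    DTv[0], DTv[-1] = -v[0] / h, v[-1] / h
    DTv[1:-1] = (v[:-1] - v[1:]) / h      # -(v_i - v_{i-1}) / h

    # <Du, v> = <u, D* v>, i.e. the adjoint of d/dx is -d/dx here
    print(h * np.sum(Du * v), h * np.sum(u * DTv))   # the two numbers agree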

3.4 Change of Variables Theorem

Before ending this chapter, I recall the Change of Variables theorem, in itssimplest version to start with.

Theorem 15 Let φ : [e, f] → [a, b] be such that ∀r ∈ [e, f], φ′(r) > 0. Let g : [a, b] → R be an integrable function. Then

∫_a^b g(x) dx = ∫_e^f g(φ(r)) φ′(r) dr.

The standard multidimensional version:

Theorem 16 Let U and V be open subsets of Rn, φ : U → V a diffeomorphism. We will denote by Jφ its Jacobian matrix and by |Jφ| its Jacobian, i.e. the determinant of that matrix. Let g : V → R be an integrable function; then

∫_V g(x) dx = ∫_U g(φ(y)) |Jφy| dy.


Chapter 4

Applications to the Optimization of Functionals for Domains of Rn

In this chapter, we collect a series of standard examples of functionals used in Image Analysis, and sometimes elsewhere, and the corresponding gradients and Euler-Lagrange equations. Here too the treatment is largely informal, and most questions regarding function spaces and other existence issues are completely forgotten.

4.1 Tikhonov Regularization

We will first look at Tikhonov Regularization [13], or merely a special form of it, used in denoising. It consists in finding the minimizer of the following energy:

E(v) = ∫_Ω ((v − u0)² + λ|∇v|²) dx

where |∇v|² = ∇v · ∇v. A minimizer must be close enough to u0 while being smooth. As in the finite dimensional case, a necessary condition for a function u to minimize E, i.e. for E(u) to be a (local) minimum, is that the gradient of E must vanish at that point: ∇Eu = 0. This is the Euler-Lagrange Equation associated to E.

We will obtain it with the method of directional derivatives. Set ℓ(t) = E(v + th) and compute ℓ′(0). First note that if f : R → R², then

(d/dt)(f · f) = 2 (df/dt) · f


(it is a trivial generalization of the rule (f²)′ = 2ff′). A direct computation then gives (∫ and d/dt can always be interchanged when the functions under consideration are “nice enough”)

ℓ′(t) = ∫_Ω d/dt [(v + th − u0)² + λ|∇(v + th)|²] dx
      = 2 ∫_Ω [(v + th − u0)h + λ∇h · ∇(v + th)] dx

and at t = 0 one gets

ℓ′(0) = 2 ∫_Ω [(v − u0)h + λ∇h · ∇v] dx.

Recall either Green's second formula or the computation we just did for the gradient/divergence adjunction. Assume for instance that we are only interested in the class of functions v such that ∇v · n = ∂v/∂n = 0, the so-called Neumann boundary conditions; we can then use the adjunction property so as to get

ℓ′(0) = 2 ∫_Ω [(v − u0) − λ div(∇v)] h dx.

But div(∇v) = ∆v and one has

ℓ′(0) = 2 ∫_Ω (v − u0 − λ∆v) h dx = 〈2(v − u0 − λ∆v), h〉

which means precisely that the gradient of E at v is 2(v − u0 − λ∆v), and the condition for a function u satisfying the boundary condition ∂u/∂n = 0 on ∂Ω to minimize the Tikhonov regularization functional is

u − u0 − λ∆u = 0,   ∂u/∂n|∂Ω = 0.

Note that I have included the boundary condition ∂u/∂n|∂Ω = 0; it indicates indeed the hypothesis we used in order to have an adjunction. Moreover, a boundary condition such as this one is necessary in order to solve the Euler-Lagrange Equation. A careful look shows that this calculation is, especially in its gradient/divergence part, analogous to the calculation of the gradient made in section 2.3: take f(y) = λ y · y and g(v) = ∇v. In this example g is linear, therefore equal to its differential. f is quadratic, its differential is thus linear, and the resulting equation is linear. In fact, every purely quadratic functional/energy has a linear gradient. This is true in finite dimension, where the gradient of a homogeneous polynomial of degree 2 is a linear expression, and it is still true in infinite dimension.
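Since the Euler-Lagrange equation here is linear, it can be solved directly. A 1-D sketch (my own discretization: a three-point Laplacian with Neumann boundary, a dense direct solve, an arbitrary λ):

    import numpy as np

    def tikhonov_denoise(u0, lam):
        """Solve u - u0 - lam*u'' = 0 with Neumann boundary conditions (1-D)."""
        n = len(u0)
        L = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)  # discrete Laplacian
        L[0, 0] = L[-1, -1] = -1.0        # Neumann: zero derivative at the ends
        return np.linalg.solve(np.eye(n) - lam * L, u0)

    t = np.linspace(0, 1, 200)
    u0 = np.sign(np.sin(2 * np.pi * t)) + 0.3 * np.random.randn(200)  # noisy edges
    u = tikhonov_denoise(u0, lam=5.0)     # denoised, but the edges get smoothed out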


4.2 Rudin-Osher-Fatemi Model, BV Regularization

Tikhonov regularization is generally not well suited for image denoising, as it imposes a too strong constraint on the image gradient ∇u: large gradients are penalized too much via the quadratic term |∇u|², and the resulting image is generally too smooth to be acceptable. BV Regularization (“Bounded Variation”), first proposed by Rudin, Osher and Fatemi in [12], imposes a weaker constraint by penalizing the gradient norm |∇u| instead of its square. The corresponding functional is

F(v) = ∫_Ω ((v − u0)² + λ|∇v|) dx.

Before starting the actual computation of the gradient, a digression seems necessary in order to explain the difference between Tikhonov regularization and BV regularization.

The total variation of a function is defined by

TV(u) = ∫_Ω |∇u| dx

where |∇u| = √(∇u · ∇u). A function u is said to have bounded variation (BV) if TV(u) < ∞. In fact, total variation can be defined for functions which have a “serious” differentiability problem via the concept of Radon measure, but, as I said, I will not dig into Functional Analysis, so I will not continue in that direction. I will propose instead a dimension 1 example that will allow us to clarify the concept as well as show the difference with the term

∫_Ω |∇u|² dx.

We will use a very simple function, a modified Heaviside function defined on R by

H(x) = 0 if x < 0, a otherwise,

with a a real number. Which sense can we give to TV(H), since there is a serious discontinuity problem at 0, and thus H′(0) is not defined? Integration removes, in some sense, the problem! Let us choose h > 0 and write, a bit formally to start with,

TV(H) = ∫_{−∞}^{∞} |H′(x)| dx = ∫_{−∞}^{−h/2} |H′(x)| dx + ∫_{−h/2}^{h/2} |H′(x)| dx + ∫_{h/2}^{∞} |H′(x)| dx


On ]−∞,−h/2] and [h/2,∞[, H is constant, its derivative vanishes and therefore

TV(H) = ∫_{−h/2}^{h/2} |H′(x)| dx.

When h → 0 the remaining term is

TV(H) = lim_{h→0} ∫_{−h/2}^{h/2} |H′(x)| dx.

We replace the pseudo-derivative (it exists in fact as an object called a distribution, but not as a function) by the unsurprising approximation

(H(h/2) − H(−h/2)) / h.

Indeed, if H were differentiable at 0, this approximation would converge, basically by definition, toward H′(0)! With the chosen definition of H, this approximation takes the value a/h. Replacing it into the integral, one gets

TV(H) = lim_{h→0} ∫_{−h/2}^{h/2} |H(h/2) − H(−h/2)|/h dx = lim_{h→0} ∫_{−h/2}^{h/2} |a|/h dx = lim_{h→0} h |a|/h = |a|

and the total variation of H is the jump height at 0. If one redoes the computation for

∫_{−∞}^{∞} |H′(x)|² dx

with the same approximations,

∫_{−∞}^{∞} |H′(x)|² dx = lim_{h→0} h |a|²/h² = lim_{h→0} |a|²/h = ∞.

There is even more to the BV model. Instead of the function H above, let us consider the “nicer” edge function

Hε(x) = 0 if x ≤ −ε;   ax/(2ε) + a/2 if −ε < x < ε;   a if x ≥ ε.

This is illustrated in figure 4.1.


Figure 4.1: Two edge functions Hε for different values of ε.

By direct integration (left to the reader) one gets that

∫_{−∞}^{∞} |H′ε(x)| dx = |a|

and this depends only on the edge height a and not on the edge slope a/(2ε). The same calculation for the L² gradient norm gives

∫_{−∞}^{∞} |H′ε(x)|² dx = a²/(2ε)

which decreases with increasing ε, i.e. with flatter edges. Tikhonov regularization would favor these types of flatter edges, thus blurring contours, while TV regularization disregards slope information.
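A quick numerical confirmation of these two integrals (a sketch; the grid resolution is an arbitrary choice):

    import numpy as np

    a = 2.0
    x = np.linspace(-1, 1, 200001)
    dx = x[1] - x[0]
    for eps in (0.5, 0.1, 0.02):
        H = np.clip(a * x / (2 * eps) + a / 2, 0.0, a)   # the edge function H_eps
        dH = np.diff(H) / dx
        print(eps, np.sum(np.abs(dH)) * dx,              # TV: always ~|a|
                   np.sum(dH**2) * dx)                   # L2 term: ~a^2/(2*eps)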

Conclusion (a bit fast, but it really holds): solutions of the Tikhonov regularization problem cannot have discontinuities, while solutions of BV regularization problems are generally associated with edges. Solutions of Tikhonov regularization prefer blurred edges to crisper ones. Solutions of the BV problem do not a priori distinguish between them, and this can be decided by the data, for instance.

The BV problem is thus much more interesting for contour preservation.

Let us now compute the gradient of F(v). Here too we will use the method of directional derivatives. Define the function ℓ(t) = F(v + th). Then one has, using this time the rule

(f^{1/2})′ = ½ f^{−1/2} f′,


ℓ′(t) = ∫_Ω d/dt [(v + th − u0)² + λ(∇(v + th) · ∇(v + th))^{1/2}] dx
      = ∫_Ω [2(v + th − u0)h + λ d/dt (∇(v + th) · ∇(v + th))^{1/2}] dx
      = ∫_Ω [2(v + th − u0)h + (λ / (2|∇(v + th)|)) d/dt (∇(v + th) · ∇(v + th))] dx
      = ∫_Ω [2(v + th − u0)h + λ(∇(v + th) · ∇(v + th))^{−1/2} ∇h · ∇(v + th)] dx

and for t = 0, one gets

ℓ′(0) = ∫_Ω [2(v − u0)h + λ|∇v|^{−1} ∇h · ∇v] dx
      = ∫_Ω [2(v − u0)h + λ (∇h · ∇v) / |∇v|] dx.

We need to transform the term containing ∇h into a term containing h:

∫_Ω (∇h · ∇v) / |∇v| dx.

In order to get rid of this gradient, we use the gradient/divergence adjunction, with boundary conditions ∂v/∂n = 0 on ∂Ω, so as to obtain

∫_Ω (∇h · ∇v) / |∇v| dx = −∫_Ω h div(∇v / |∇v|) dx

and, gathering the pieces,

ℓ′(0) = ∫_Ω [2(v − u0) − λ div(∇v / |∇v|)] h dx = 〈2(v − u0) − λ div(∇v / |∇v|), h〉

i.e. the gradient of F at v is ∇Fv = 2(v − u0) − λ div(∇v / |∇v|), and a minimizer u of F must satisfy the Euler-Lagrange equation (after dividing by 2 and renaming λ/2 as λ):

u − u0 − λ div(∇u / |∇u|) = 0,   ∂u/∂n|∂Ω = 0.

This computation too corresponds to the one performed in section 2.3: one takes f(y) = λ√(y · y) and g(v) = ∇v. In this example g is linear, therefore equal to its differential. This time f is more complex than in the


Tikhonov regularization case, and so is its differential, and the resulting equation is very nonlinear.

In reality, it is not only nonlinear, it is also nonsmooth, and the optimality condition should be defined in terms of the subdifferential and subgradients. The interested reader may consult [14] or [1].
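A 1-D sketch of gradient descent on this functional; the ε-regularization of |u′| (to avoid dividing by zero where the signal is flat) is a standard smoothing trick of mine, not part of the original model:

    import numpy as np

    def rof_descend(u0, lam=1.0, eps=1e-3, dt=0.05, n_iter=5000):
        """Descend F(u) = sum (u - u0)^2 + lam*|u'| via its smoothed gradient."""
        u = u0.copy()
        for _ in range(n_iter):
            du = np.gradient(u)
            flux = du / np.sqrt(du**2 + eps**2)     # ~ u'/|u'|, regularized
            div = np.gradient(flux)
            u -= dt * (2.0 * (u - u0) - lam * div)  # u <- u - dt * grad F(u)
        return u

    t = np.linspace(0, 1, 200)
    u0 = np.sign(np.sin(2 * np.pi * t)) + 0.3 * np.random.randn(200)
    u = rof_descend(u0)   # edges stay crisp, unlike the Tikhonov solution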

4.3 A Somewhat General Lagrangian

In this paragraph we look at the minimization problem

arg min_v E(v) = ∫_Ω L(x, v, vx1, . . . , vxn) dx

where L : (x ∈ Rn, p ∈ R, q1 ∈ R, . . . , qn ∈ R) ↦ L(x, p, q1, . . . , qn) is called the Lagrangian of the functional E. We will compute its gradient via the method of directional derivatives. As we did before, set ℓ(t) = E(v + th) and let us calculate ℓ′(0). Omitting a few details, one can check that

dEv = ℓ′(0) = ∫_Ω [Lp(x, v, vx1, . . . , vxn) h + Σ_{i=1}^n Lqi(x, v, vx1, . . . , vxn) hxi] dx.   (4.1)

This expression is clearly linear in h, but with partial derivatives. In order to obtain the gradient, we must get rid of them, and we use the integration by parts formula, which here reads (the arguments (x, v, vx1, . . . , vxn) are omitted):

dEv = ∫_Ω [Lp − Σ_{i=1}^n ∂xi Lqi] h dx + ∫_∂Ω h Σ_{i=1}^n Lqi ni ds
    = ∫_Ω [Lp − Σ_{i=1}^n ∂xi Lqi] h dx + ∫_∂Ω h (Lq1, . . . , Lqn)ᵀ · n ds


If we are in a space of functions that vanish along ∂Ω, or if (Lq1, . . . , Lqn)ᵀ · n is zero on ∂Ω, the boundary term is 0 and the sought gradient is

∇Ev = Lp(x, v, vx1, . . . , vxn) − Σ_{i=1}^n ∂xi Lqi(x, v, vx1, . . . , vxn)   (4.2)
    = Lp − Σ_{i=1}^n Lqixi − Σ_{i=1}^n Lqip vxi − Σ_{i=1}^n Σ_{j=1}^n Lqiqj vxixj   (4.3)

where the second line comes from expanding ∂xi Lqi(x, v, vx1, . . . , vxn) by the chain rule.

The Euler-Lagrange equation for a minimizer u is then

Lp(x, u, ux1, . . . , uxn) − Σ_{i=1}^n ∂xi Lqi(x, u, ux1, . . . , uxn) = 0

with corresponding boundary conditions. If one sets

F(x, u, ∇u) = (Lq1(x, u, ux1, . . . , uxn), . . . , Lqn(x, u, ux1, . . . , uxn))ᵀ,

the associated Euler-Lagrange equation can be written as

Lp(x, u, ux1, . . . , uxn) − div(F(x, u, ∇u)) = 0

and one says that it is in conservative form. The conservative form is interesting in that it indeed expresses certain conservation properties (but I will not say which ones!). Important for us is that the writing with the help of a divergence may have consequences for the numerical schemes developed to solve the Euler-Lagrange equation (see my notes “Sur la résolution de certaines EDP en Analyse d'Images”, which will be translated into English soon (?)).
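For concreteness, here is a sketch that lets a computer algebra system produce Euler-Lagrange equations of this type (sympy's euler_equations; the ROF-type Lagrangian below is just an illustrative choice of mine):

    from sympy import Function, symbols, sqrt
    from sympy.calculus.euler import euler_equations

    x, y, lam = symbols('x y lambda')
    v = Function('v')(x, y)
    u0 = Function('u0')(x, y)

    # L(x, y, p, q1, q2) = (p - u0)^2 + lam * sqrt(q1^2 + q2^2), evaluated on v
    L = (v - u0)**2 + lam * sqrt(v.diff(x)**2 + v.diff(y)**2)

    # Produces Lp - sum_i d/dx_i Lqi = 0, the equation derived above
    print(euler_equations(L, v, [x, y]))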

A remark which has its importance. When the differential is expressed via (4.1), v appears in dE with derivatives of the same order as in E, i.e. 1. When the gradient formulation (4.3) is used, v appears with second order derivatives. It really means that solutions of the minimization problems above need not be twice differentiable!


4.4 Heat Equation and Calculus of Variations

In this paragraph, I present two ways of deriving the Heat Equation from variational methods. The first is a gradient descent on an energy term which is precisely the regularization term of Tikhonov regularization. The second, more abstract, is also Tikhonov-like, with derivatives up to order ∞ in the regularization term, and follows the computation of Nielsen et al. in [11].

In the first derivation, we are interested in the minimization of the energy

$$E(v) = \int_\Omega |\nabla v|^2\, dx$$

by gradient descent. By redoing verbatim some of the computations performed in the Tikhonov regularization paragraph, the gradient of this energy is, with proper boundary conditions,
$$\nabla E_v = \nabla^* \nabla v = -\operatorname{div}(\nabla v) = -\Delta v,$$

and the Euler–Lagrange equation associated to it is $-\Delta u = 0$: this is the Laplace equation. Solutions depend deeply on boundary conditions. If one requires that $u|_{\partial\Omega} = 0$, the corresponding solution is $u \equiv 0$. In general, the Laplace equation is associated with a Dirichlet boundary condition $u|_{\partial\Omega} = f$ for some prescribed function f defined along $\partial\Omega$. This is one of the most well studied PDEs, especially because of its usefulness in a large variety of problems. The solutions of the Laplace equation are called harmonic functions and possess many regularity properties; in particular they are analytic, i.e. they admit and are equal to their Taylor series expansion in a neighborhood of every point of $\Omega$ (thus indefinitely differentiable, but harmonicity is more), see [6] for instance. They provide an analytic prolongation of f in $\Omega$.

If general Neumann boundary conditions are imposed instead of Dirichlet ones, i.e.
$$\left.\frac{\partial u}{\partial n}\right|_{\partial\Omega} = f,$$

there may be no solution or multiple ones. Using Green's formula (the second formula of Theorem 13), we find indeed that
$$0 = \int_\Omega \Delta u\, dx = \int_{\partial\Omega} \frac{\partial u}{\partial n}\, ds = \int_{\partial\Omega} f\, ds.$$

Thus the condition $\int_{\partial\Omega} f\, ds = 0$ is necessary for solutions to exist. When a solution exists, then infinitely many do, obtained by adding any real constant. In fact, at least when $\Omega$ has some nice symmetries, the solutions for $f \equiv 0$


are exactly the constant functions. In our imaging problems, this will always be the case.

We now choose these Neumann conditions and consider the gradient descent approach for the Heat Equation. From an initial value $u_0$, a gradient descent builds a “time dependent” function $u : \Omega \times [0, \infty) \to \mathbb{R}$ with $u(x, 0) = u_0(x)$, which satisfies the following equation:
$$u_t = -\nabla E_u = \Delta u, \quad u(-, 0) = u_0, \qquad (4.4)$$

where the Laplacian is computed with respect to the spatial variables. This is exactly the Heat Equation with initial value $u(-, 0) = u_0$. Its value at $t = \infty$ solves $\Delta u = 0$, and is therefore a constant from what I said above. This constant is the mean value

$$\frac{1}{|\Omega|} \int_\Omega u_0\, dx$$

where $|\Omega|$ denotes the area (length, volume...) of $\Omega$. This comes from the following: integrate (4.4) and use the first Green formula (Theorem 13):

$$\frac{1}{|\Omega|} \int_\Omega u_t\, dx = \frac{1}{|\Omega|} \int_\Omega \Delta u\, dx = \frac{1}{|\Omega|} \int_{\partial\Omega} \frac{\partial u}{\partial n}\, ds = 0$$

because of our boundary conditions. But
$$0 = \frac{1}{|\Omega|} \int_\Omega u_t\, dx = \frac{d}{dt} \left( \frac{1}{|\Omega|} \int_\Omega u\, dx \right),$$
which means that the mean value remains constant.

We now consider the second approach. It was proposed in dimension 1 by Nielsen et al. in [11], and a straightforward generalization to dimension 2 was given in [7]. I present the latter in detail. The domain $\Omega$ here is $\mathbb{R}^2$ and we work with $C^\infty$ functions that are regular in the sense that the following energy is always finite:

$$E(u) = \frac{1}{2} \left( \iint (u - u_0)^2\, dxdy + \sum_{k=1}^\infty \frac{\lambda^k}{k!} \iint \sum_{\ell=0}^k \binom{k}{\ell} \left( \frac{\partial^k u}{\partial x^\ell \partial y^{k-\ell}} \right)^2 dxdy \right) \qquad (4.5)$$

where the $\binom{k}{\ell}$ are the binomial coefficients (the space of such functions is non-empty; it is in fact infinite-dimensional, as it contains, for instance, all the smooth functions with compact support).


In computing its gradient, the problem lies of course in the regularizer part. Let us set

$$L_k(u) = \frac{\lambda^k}{k!} \iint \sum_{\ell=0}^k \binom{k}{\ell} \left( \frac{\partial^k u}{\partial x^\ell \partial y^{k-\ell}} \right)^2 dxdy$$

and compute
$$\left. \frac{d}{dt} L_k(u + tv) \right|_{t=0},$$

i.e. we still use the method of directional derivatives. We use the following fact, whose proof is of course left to the reader: in our function space, we have the adjunction result that if
$$T_{m,n} := \frac{\partial^{m+n}}{\partial x^m \partial y^n}, \quad \text{then} \quad T_{m,n}^* = (-1)^{m+n}\, T_{m,n}.$$

Thanks to this formula, one gets
$$\left. \frac{d}{dt} L_k(u + tv) \right|_{t=0} = 2\, \frac{(-\lambda)^k}{k!} \iint \left( \sum_{\ell=0}^k \binom{k}{\ell} T_{2\ell,\, 2(k-\ell)}\, u \right) v\, dxdy.$$

Then the reader can check the identity
$$\sum_{\ell=0}^k \binom{k}{\ell} T_{2\ell,\, 2(k-\ell)}\, u = \underbrace{\Delta \circ \cdots \circ \Delta}_{k \text{ times}}\, u := \Delta^{(k)} u$$

($\Delta^{(k)}$ is the k-th iterated Laplacian). Putting the pieces together (the factor 2 above cancels against the $\frac{1}{2}$ in (4.5)), we get that
$$\left. \frac{d}{dt} E(u + tv) \right|_{t=0} = \Big\langle u - u_0 + \sum_{k=1}^\infty \frac{(-\lambda)^k}{k!} \Delta^{(k)} u,\; v \Big\rangle = \Big\langle -u_0 + \sum_{k=0}^\infty \frac{(-\lambda)^k}{k!} \Delta^{(k)} u,\; v \Big\rangle.$$

We have therefore the sought gradient as the first argument of the above inner product. Recall a few general facts about the exponential of an element tA in a Banach algebra (like square matrices, for instance, where product and composition are the same), t being a real number:

1. Power series: $e^{tA} = \sum_{k=0}^\infty \frac{t^k A^k}{k!}$


2. Inverse: $(e^{tA})^{-1} = e^{-tA}$

3. Derivative: $\frac{d}{dt} e^{tA} = A\, e^{tA}$.

With the first formula, we can rewrite, at least formally,
$$\nabla E_u = e^{-\lambda\Delta} u - u_0.$$

For the gradient to be null, we deduce from the second formula that the solution u, which thus depends on $\lambda$, must satisfy
$$u := u(\lambda) = e^{\lambda\Delta} u_0,$$

from which we get $u(0) = u_0$, and computing the derivative with respect to $\lambda$ using the third formula, we get
$$\frac{\partial u}{\partial \lambda} = \Delta e^{\lambda\Delta} u_0 = \Delta u$$

and we find once again the heat equation! Although apparently purely formal, this derivation is more meaningful than it may seem. It is known indeed that the solution u can be written as a convolution with a Gaussian kernel:
$$u = G_\lambda * u_0, \quad G_\lambda(x, y) = \frac{1}{4\pi\lambda}\, e^{-\frac{x^2 + y^2}{4\lambda}}.$$

However, when discretizing the problem, the Laplace operator becomes a matrix M, and the exponential of tM provides a good approximation of the Gaussian kernel (see my notes “Sur la résolution de certaines EDP en Analyse d’Images”, which will be translated into English soon (?)).
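As an illustration (my own sketch, in one spatial dimension to keep the matrix small, and with periodic boundary conditions), one can compare the matrix exponential of a discrete Laplacian with Gaussian filtering; with the normalization above, $e^{\lambda\Delta}$ corresponds to a Gaussian of standard deviation $\sqrt{2\lambda}$, and the two results agree up to discretization error:

    import numpy as np
    from scipy.linalg import expm
    from scipy.ndimage import gaussian_filter1d

    n, lam = 128, 4.0
    # 1D discrete Laplacian with periodic boundary conditions.
    M = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    M[0, -1] = M[-1, 0] = 1.0

    u0 = np.zeros(n)
    u0[n // 2] = 1.0                          # a discrete Dirac
    u_expm = expm(lam * M) @ u0               # u(lambda) = e^{lambda M} u0
    u_gauss = gaussian_filter1d(u0, sigma=np.sqrt(2 * lam), mode='wrap')
    print(np.abs(u_expm - u_gauss).max())     # small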

To conclude this paragraph, let us add that the computation performed here in dimension 2 extends to higher dimensions in a straightforward way; we just need to replace binomial coefficients and expansions by the more general multinomial ones.

4.5 Optical Flow a la Horn and Schunck

Recovery of apparent motion is a standard problem in image analysis. In its simplest formulation, two images $u_1$ and $u_2$ are given, and we want to determine a best way of deforming $u_1$ into $u_2$, knowing that these images come from the observation of the same dynamic scene at two different time instants.

Intensity preservation during displacement, sometimes referred to as the Lambertian assumption, is the simplest hypothesis. We may look for a displacement field d that encodes this preservation:
$$u_1(x) = u_2(x + d(x)).$$


By setting $u_1(x) = u(x, t)$, $u_2(x) = u(x, t + dt)$, we have $u(x + d(x), t + dt) = u(x, t)$. For small dt, the displacement $d(x)$ can be approximated as $dt\, v(x)$, where $v(x) = v(x, t)$ is the instantaneous velocity of x at time t (the displacement tends to 0 when $dt \to 0$). Performing a Taylor expansion, this gives
$$u(x + dt\, v(x),\, t + dt) - u(x, t) \approx dt\, (\nabla u \cdot v + u_t) \approx 0,$$

where $\nabla u$ is the gradient of u w.r.t. the spatial variables and $u_t$ is the time derivative of u. The equation
$$\nabla u \cdot v + u_t = 0 \qquad (4.6)$$
is called the Optical Flow Constraint equation (OFC or OFCE in the sequel). At each pixel/image location, it provides at most one constraint, and is thus insufficient to determine the full velocity/displacement field. This is the so-called aperture problem.

A first solution was proposed by Horn and Schunck in [8]; it consists in solving the OFCE in a least squares sense while imposing regularity on the velocity field $v = (v^x, v^y)$, $v^x$ and $v^y$ being the x and y components of the velocity vectors. The Horn and Schunck energy is thus

$$E_{HS}(v) = \frac{1}{2} \int_\Omega (\nabla u \cdot v + u_t)^2\, dx + \frac{\lambda}{2} \int_\Omega \left( |\nabla v^x|^2 + |\nabla v^y|^2 \right) dx. \qquad (4.7)$$

We compute its gradient via the usual directional derivative technique, by setting $\ell(s) = E_{HS}(v + sw)$, where w is a vector field of deformations of v. The only “difficulty” comes from the fact that now v and w are vector valued. A straightforward computation gives

$$\ell'(0) = \int_\Omega (\nabla u \cdot v + u_t)\, \nabla u \cdot w\, dx + \lambda \int_\Omega (\nabla v^x \cdot \nabla w^x + \nabla v^y \cdot \nabla w^y)\, dx$$
$$= \int_\Omega \left[ (\nabla u \cdot v + u_t)\, \nabla u - \lambda \begin{pmatrix} \Delta v^x \\ \Delta v^y \end{pmatrix} \right] \cdot w\, dx + \lambda \int_{\partial\Omega} w \cdot \begin{pmatrix} \partial v^x / \partial n \\ \partial v^y / \partial n \end{pmatrix} ds$$

In order to get a gradient, boundary conditions are needed. Choosing variations w that vanish at the boundary means Dirichlet conditions for v; this could be, for instance, null motion, if one knows that the camera is fixed, the background is static, and projected moving objects do not cross the image boundary. We may instead ask that the normal derivatives of the flow components vanish at the boundary; this means some sort of constant


motion, for instance a translational motion of the camera parallel to the imaging plane. Neither of them is completely natural, but that is how it is...

Choose one of them in order to eliminate the boundary term; the Euler–Lagrange equation is then

$$\nabla_v E_{HS} = (\nabla u \cdot v + u_t)\, \nabla u - \lambda \begin{pmatrix} \Delta v^x \\ \Delta v^y \end{pmatrix} = 0.$$

This equation is only valid for smooth enough images or small displacements. When considering large displacements, a standard solution is to use a multiresolution framework. Although widely criticised as a poor quality motion estimator, modern discretizations of the Horn and Schunck approach interestingly provide decent solutions for the optical flow problem. [Francois dit: I should add a reference / link to the Middlebury Optical Flow Evaluation Database].
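For completeness, here is a sketch of the classical iterative scheme one obtains by discretizing the Euler–Lagrange equation above: approximating $\Delta v^x$ by $4(\bar v^x - v^x)$, where $\bar v^x$ is the 4-neighbour average, and solving the resulting $2 \times 2$ system per pixel gives a Jacobi-type iteration. The code below is mine (toy derivatives, periodic handling of the borders via np.roll), not an exact reproduction of [8].

    import numpy as np

    def horn_schunck(u1, u2, lam=10.0, n_iter=200):
        ux = np.gradient(u1, axis=1)          # spatial derivatives of u
        uy = np.gradient(u1, axis=0)
        ut = u2 - u1                          # temporal derivative
        vx = np.zeros_like(u1)
        vy = np.zeros_like(u1)
        for _ in range(n_iter):
            # 4-neighbour averages (Laplacian(v) ~ 4 * (v_bar - v)).
            vxb = 0.25 * (np.roll(vx, 1, 0) + np.roll(vx, -1, 0) +
                          np.roll(vx, 1, 1) + np.roll(vx, -1, 1))
            vyb = 0.25 * (np.roll(vy, 1, 0) + np.roll(vy, -1, 0) +
                          np.roll(vy, 1, 1) + np.roll(vy, -1, 1))
            r = (ux * vxb + uy * vyb + ut) / (4 * lam + ux**2 + uy**2)
            vx = vxb - ux * r                 # exact solution of the per-pixel
            vy = vyb - uy * r                 # 2x2 system, averages being fixed
        return vx, vy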

4.6 Non Linear Flow Recovery

[Francois dit: I will probably have time to write something someday, but not now, sorry.]


Chapter 5

Planar curves

In this chapter, I will recall facts about the geometry of planar curves: length, arc-length parametrization, curvature, Frenet formulas, and the ideas that allow us to define notions of inner product and gradient adapted to the problems of curve optimization.

A parametrized planar curve is a mapping
$$c : [a, b] \to \mathbb{R}^2, \quad p \mapsto c(p) = (x(p), y(p)).$$
The “geometric curve” is the corresponding subset $c([a, b])$ of the plane, together with an orientation. If $A = c(a) \neq c(b) = B$, then c is an open curve with endpoints A and B. If $A = B$, then c is a closed curve. If c is differentiable, we say that it is regular if its velocity vector $c_p$ never vanishes.

From now on we will, unless otherwise explicitly stated, focus on the case where c is continuously differentiable. This is not strictly necessary, but it makes life easier.

5.1 Moving frames

For such a smooth curve, we define a moving frame: to each point $P = c(p)$, we attach a planar frame defined by the tangent and normal vectors to the curve. Whenever $c_p \neq 0$, we denote by

$$\vec\tau(p) = \frac{c_p}{|c_p|} = \frac{1}{\sqrt{x_p^2 + y_p^2}} \begin{pmatrix} x_p \\ y_p \end{pmatrix}$$


the normalized tangent velocity vector, and we define

$$\vec\nu(p) = \frac{1}{\sqrt{x_p^2 + y_p^2}} \begin{pmatrix} -y_p \\ x_p \end{pmatrix}.$$

Figure 5.1: A curve and its moving frame, the osculating circle at a point P, and the radius of curvature r.

5.2 Length, arc-length parametrization, curvature

The length of c is given by

$$\ell(c) = \int_a^b |c_p|\, dp.$$

We can, for example, approximate the curve by line segments as shown in Figure 5.2. A curve is an arc-length parametrization, or for short an a-l parametrization, if $|c_p| \equiv 1$, that is, if $c_p = \vec\tau$. If c is regular, it always admits an arc-length parametrization: writing L for the length of c, the map
$$\sigma : [a, b] \to [0, L], \quad \sigma(p) = \int_a^p |c_p|\, dp$$
is invertible because $\sigma_p = |c_p| > 0$. Denote by $\pi$ its inverse mapping; then $\tilde c : s \mapsto c(\pi(s))$ is an arc-length parametrization, by the differentiation rules


Figure 5.2: Approximating the curve by segments to compute its length.

for composed maps. In general we forget about the $\tilde{\ }$ and distinguish between the two parametrizations by their arguments: we use the argument p for arbitrary parametrizations and reserve the argument s for a-l parametrizations, for which $\vec\tau(s) = c_s(s)$.
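In practice, for a sampled curve, $\sigma$ and its inverse are computed with cumulative sums and interpolation. A small sketch (mine; it assumes consecutive samples are distinct so that $\sigma$ is strictly increasing):

    import numpy as np

    def arclength_resample(c, m=200):
        # c: (n, 2) array sampling a curve. Returns m points approximately
        # equally spaced along the curve (a discrete a-l parametrization).
        seg = np.linalg.norm(np.diff(c, axis=0), axis=1)
        sigma = np.concatenate([[0.0], np.cumsum(seg)])  # sigma(p); L = sigma[-1]
        s = np.linspace(0.0, sigma[-1], m)
        x = np.interp(s, sigma, c[:, 0])                 # inverts sigma
        y = np.interp(s, sigma, c[:, 1])
        return np.stack([x, y], axis=1)

    # An ellipse parametrized by angle has |c_p| far from constant;
    # after resampling, the points are (nearly) uniform in arc length.
    p = np.linspace(0.0, 2 * np.pi, 400)
    c = np.stack([2 * np.cos(p), np.sin(p)], axis=1)
    c_al = arclength_resample(c)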

From $\vec\tau(s) = c_s(s)$ and $|c_s| \equiv 1$, we see that
$$\langle c_s, c_s \rangle \equiv 1 \implies \frac{d}{ds} \langle c_s, c_s \rangle = 2 \langle c_{ss}, c_s \rangle = 0,$$
and thus $c_{ss}$ is orthogonal to $\vec\tau$. Thus there exists a real number $\kappa$ such that $c_{ss} = \kappa\vec\nu$. We call $\kappa = \kappa(s)$ the algebraic curvature at s (or rather at $c(s)$). Note that its inverse $1/\kappa(s)$ is the radius of the osculating circle at s, that is, the largest circle which is tangent to c at $c(s)$ and located in the “concavity” of c at $c(s)$; it is the best approximation of c by a circle at that point. See Figure 5.1.

Since $|\vec\tau| = 1$, only the direction of $\vec\tau$ varies, and the curvature indicates the speed of its angular variation. In the case where c is a circle centered at $(0, 0)$ with radius r, c is its own osculating circle at each point, and we expect the curvature at any point to be $1/r$. We can verify this: the a-l parametrization of c is simply

$$s \in [0, 2\pi r] \mapsto \left( r\cos\frac{s}{r},\; r\sin\frac{s}{r} \right)$$

and

$$c_s = \begin{pmatrix} -\sin\frac{s}{r} \\ \cos\frac{s}{r} \end{pmatrix}, \quad c_{ss} = \frac{1}{r} \begin{pmatrix} -\cos\frac{s}{r} \\ -\sin\frac{s}{r} \end{pmatrix} = \frac{1}{r}\, \vec\nu$$

and the algebraic curvature is exactly 1/r!


Because $\vec\nu$ is obtained by rotating $\vec\tau$ by an angle of $\pi/2$, we obtain the following relations, known as the Frenet relations:

$$\frac{d\vec\tau}{ds}(s) = \kappa(s)\, \vec\nu(s) \qquad (5.1)$$
$$\frac{d\vec\nu}{ds}(s) = -\kappa(s)\, \vec\tau(s). \qquad (5.2)$$

The curvature $\kappa(s)$ is simply given by
$$\kappa(s) = y_{ss} x_s - x_{ss} y_s. \qquad (5.3)$$

For a curve which is not an a-l parametrization, we can use the change of variables theorem to pass to an arc-length parametrization, which gives:

$$\frac{1}{|c_p|} \frac{d}{dp} \vec\tau(p) = \kappa(p)\, \vec\nu(p) \qquad (5.4)$$
$$\frac{1}{|c_p|} \frac{d}{dp} \vec\nu(p) = -\kappa(p)\, \vec\tau(p) \qquad (5.5)$$

The algebraic curvature is then given by
$$\kappa(p) = \frac{y_{pp} x_p - x_{pp} y_p}{|c_p|^3} = \frac{\det(c_p, c_{pp})}{|c_p|^3}.$$

5.3 Geometric integrals

In the following paragraphs, we will pay particular attention to integrals along a regular curve $c : [a, b] \to \mathbb{R}^2$:

$$L_c(g) = \int_a^b g(c(p))\, |c_p(p)|\, dp. \qquad (5.6)$$

We have already seen that when $g \equiv 1$, this is the length of c. Our interest in such integrals is due to their invariance under reparametrization: let $\tilde c : [e, f] \to \mathbb{R}^2$, $q \mapsto \tilde c(q)$ be another parametrization, and let $\varphi : [a, b] \to [e, f]$ be a diffeomorphic change of parametrization (with $\varphi_p > 0$) such that $\tilde c(\varphi(p)) = c(p)$, with inverse diffeomorphism $\psi$. By the chain rule, we get $c_p(p) = \varphi_p(p)\, \tilde c_q(\varphi(p))$,


and by Theorem 15 we get
$$L_{\tilde c}(g) = \int_e^f g(\tilde c(q))\, |\tilde c_q(q)|\, dq = \int_a^b g(\tilde c(\varphi(p)))\, |\tilde c_q(\varphi(p))|\, \varphi_p(p)\, dp = \int_a^b g(c(p))\, |c_p(p)|\, dp = L_c(g).$$

When s is an a-l parametrization of c, we then have
$$L_c(g) = \int_0^L g(c(s))\, ds$$

because $|c_s| = 1$. We then write
$$L_c(g) = \int_c g(s)\, ds = \oint_c g(s)\, ds$$
and we call this the curvilinear integral of g along c. It depends only on the geometric curve c and not on its parametrization; since the integrand is weighted by $|c_p| > 0$, not even the direction of travel along c matters here (in contrast with integrals of tangential vector fields, which change sign with the orientation).

5.4 Implicit representation of curves

In this paragraph I will present a series of standard results concerning the implicit “level set” representation of curves as constant-value level sets of functions. The formulas for length and curvature calculations are particularly important.

The idea is to represent c implicitly as the curve of level 0 of a given function: $c = \{x,\ \varphi(x) = 0\}$ or, in other words, $\varphi(c(p)) = 0$ for all $p \in [a, b]$. By differentiating this identity, we obtain
$$\nabla\varphi \cdot c' = 0 \iff \nabla\varphi \perp c'.$$

This means that the outward normal to c is
$$\vec\nu = \pm \frac{\nabla\varphi}{|\nabla\varphi|}$$
because $\left| \frac{\nabla\varphi}{|\nabla\varphi|} \right| = 1$. The sign depends on the choice of $\varphi$; one generally chooses the +, and if not, one simply replaces $\varphi$ by $-\varphi$.


Now, if c is a family of curves $c = c(p, t)$, where t is the evolution parameter, we can search for a family of functions $\varphi(x, t)$ such that $c(-, t)$ is the 0 level set of $\varphi(-, t)$: $\varphi(c(p, t), t) = 0$. From now on, by the gradient of $\varphi$ we will always mean its spatial gradient; its partial derivative with respect to time will be denoted $\varphi_t$, the derivative of c with respect to p will be written $c_p$, and that with respect to t will be denoted $c_t$.

Differentiating the identity with respect to t, we obtain
$$\nabla\varphi \cdot c_t + \varphi_t = 0.$$

If $c(-, t)$ satisfies the PDE $c_t = F\vec\nu$, where F is a scalar function, we get by substitution:
$$\nabla\varphi \cdot F\vec\nu + \varphi_t = F\, \frac{\nabla\varphi \cdot \nabla\varphi}{|\nabla\varphi|} + \varphi_t = F |\nabla\varphi| + \varphi_t = 0,$$

provided we can extend F at least in a neighborhood of the zero level set of $\varphi$. A remaining problem, central to Chan–Vese (and to many, even most, level set methods), is to express the curvature of c as a function of $\varphi$ and its derivatives. This is the content of the following theorem:

Theorem 17 Let $\varphi : \mathbb{R}^2 \to \mathbb{R}$ be a differentiable map, and let $x_0$ be a point in $\mathbb{R}^2$ at which $\nabla\varphi(x_0)$ is nonzero. If $\varphi(x_0) = \lambda \in \mathbb{R}$, there exists a regular curve $c : ]a, b[ \to \mathbb{R}^2$, $p \mapsto c(p)$, passing through $x_0$, such that $\varphi(c(p)) = \lambda$ for all $p \in ]a, b[$. Moreover, the algebraic curvature of c at p is given by
$$\kappa(p) = \operatorname{div}\left( \frac{\nabla\varphi}{|\nabla\varphi|} \right)(c(p)).$$

The first part of the theorem can be proved using the implicit function theorem. The second part, which is also not very complicated, needs some calculations using the Frenet formulas; note that the overall sign of $\kappa$ depends on the orientation conventions chosen for c and $\vec\nu$.

We shall assume that c is a-l parametrized with $x_0 = c(s_0)$, so that $\vec\tau = c_s$ and $\kappa\vec\nu = c_{ss}$. In addition, we assume that $\vec\nu = \frac{\nabla\varphi}{|\nabla\varphi|}$, and hence, since $\vec\tau$ is obtained by rotating $\vec\nu$ by an angle of $-\pi/2$, we have
$$\vec\tau = \frac{1}{|\nabla\varphi|} \begin{pmatrix} \varphi_y \\ -\varphi_x \end{pmatrix}.$$

From the identity $\varphi(c(s)) = 0$ for $s \in ]s_0 - \varepsilon, s_0 + \varepsilon[$ we obtain $\nabla\varphi \cdot \vec\tau = 0$ by differentiating with respect to s. Differentiating once more, we get
$$(\mathrm{Hess}\,\varphi\; \vec\tau) \cdot \vec\tau + \nabla\varphi \cdot \frac{d\vec\tau}{ds} = 0,$$


where $\mathrm{Hess}\,\varphi$ is the Hessian of $\varphi$. By the results from the first section, $(\mathrm{Hess}\,\varphi\; \vec\tau) \cdot \vec\tau = \vec\tau^{\,t}\, \mathrm{Hess}\,\varphi\; \vec\tau$, and using $\frac{d\vec\tau}{ds} = \kappa\vec\nu$ we have
$$\kappa = -\frac{\vec\tau^{\,t}\, \mathrm{Hess}\,\varphi\; \vec\tau}{\nabla\varphi \cdot \vec\nu}.$$

We replace $\vec\tau$ and $\vec\nu$ by their expressions in terms of $\varphi$, and obtain (up to the orientation sign mentioned above)
$$\kappa = \frac{ \begin{pmatrix} \varphi_y & -\varphi_x \end{pmatrix} \begin{pmatrix} \varphi_{xx} & \varphi_{xy} \\ \varphi_{xy} & \varphi_{yy} \end{pmatrix} \begin{pmatrix} \varphi_y \\ -\varphi_x \end{pmatrix} }{ |\nabla\varphi|^3 } = \frac{ \varphi_{xx}\varphi_y^2 - 2\varphi_{xy}\varphi_x\varphi_y + \varphi_{yy}\varphi_x^2 }{ \left( \sqrt{\varphi_x^2 + \varphi_y^2} \right)^3 }.$$

Now,
$$\operatorname{div}\left( \frac{\nabla\varphi}{|\nabla\varphi|} \right) = \frac{\partial}{\partial x} \frac{\varphi_x}{\sqrt{\varphi_x^2 + \varphi_y^2}} + \frac{\partial}{\partial y} \frac{\varphi_y}{\sqrt{\varphi_x^2 + \varphi_y^2}}$$
develops into the same expression as above!

5.5 Representation by distance functions

Among the functions that implicitly represent curves, signed distance functions are very attractive, both because of their mathematical properties and their computational ones. To start with, a definition and a theorem.

Definition 18 A continuous curve $c : [0, 1] \to \mathbb{R}^2$ is called a Jordan curve if it is closed, $c(0) = c(1)$, and has no self-intersection: if $p \neq p' \in [0, 1)$, then $c(p) \neq c(p')$.

Theorem 19 (The Jordan curve theorem) Let c be a Jordan curve of the plane $\mathbb{R}^2$. Then the image of c separates the plane into two regions: a bounded one, the interior of c, and an unbounded one, the exterior of c.

Interestingly, this very intuitive theorem is quite difficult to prove. Figure 5.3 illustrates the situation.

Given a bounded domain U and a point x in $\mathbb{R}^2$, one can define the distance $d_U(x)$ of x to U as
$$d_U(x) = \inf_{y \in U} d(x, y),$$
where d is the usual Euclidean distance of the plane.


Figure 5.3: A Jordan curve; the interior, bounded region is blue.

If now c is a Jordan curve, call $B_c$ its bounded interior and $U_c$ its unbounded exterior, and define
$$d_c(x) = d_{B_c}(x) - d_{U_c}(x).$$

Then it is easy to show that

1. if $x \in U_c$, then $d_c(x) > 0$ is the distance from x to c,

2. if $x \in B_c$, then $d_c(x) < 0$ and $-d_c(x)$ is the distance from x to c,

3. if $x \in c$, then $d_c(x) = 0$.

This is the signed distance function associated to c; c (or more precisely its image in $\mathbb{R}^2$) is the zero level set of $d_c$. A very important property of $d_c$ is that for most points x, one has $\|\nabla d_c(x)\| = 1$. The set of points where this fails, or where the gradient is not defined, is of huge interest, especially in the study of shapes: it is made of the skeleton of c and the cracks. I won't however say much more about them; see [4, 15].

When c is “sufficiently straight”, one can find a neighborhood along c (a tubular neighborhood T) such that if $x \in T$ there is a unique point y of c with $d_c(x) = d(x, y)$. Within this neighborhood, one has $\|\nabla d_c\| = 1$.

Many formulas from the previous section, including the implicit normal, curvature, etc., are highly simplified in that case: $\vec\nu = \nabla d_c$ and, since $|\nabla d_c| = 1$, the curvature reduces to $\kappa = \operatorname{div}(\nabla d_c) = \Delta d_c$.


Figure 5.4: Deformations of curves. (a) Open curve, deformation changing endpoints. (b) Open curve, deformation fixing endpoints. (c) Deformation of a closed curve.

5.6 A very short introduction to variations of curves

This section introduces basic ideas on how to vary curves and how to compute differentials and gradients of curve functionals. As usual, mathematical rigor is on vacation.

We fix a smooth curve $c : p \in [0, 1] \mapsto c(p) \in \mathbb{R}^2$. The simplest way a curve deformation can be defined is by moving the points of c: a deformation of c is given as $p \mapsto c(p) + h(p)$, where h is a vector field along c in $\mathbb{R}^2$. Figure 5.4 illustrates three different situations: first, the case of an open curve whose endpoints move when the curve deforms; second, the case of an open curve whose deformation does not change the endpoints; third, the case of a deforming closed curve. Other situations can occur; they are normally dictated by the problem at hand.

To give a deformation of c is thus to give the vector field h. We can use it to develop differential calculus on curves: let $\mathcal{F}$ be a functional defined on a space of planar curves; we want to compute its differential and, to start with, its directional derivative in the direction of a given vector field h. We will assume that $\mathcal{F}$ has the following form:
$$\mathcal{F}(c) = \int_0^1 F(c, c_p)\, dp. \qquad (5.7)$$

Its directional derivative in the direction of h, $d_c\mathcal{F}\, h$, is defined as $\ell'(0)$, where
$$\ell(t) = \mathcal{F}(c + th).$$
Assuming that the Lagrangian $(a, b) \mapsto F(a, b)$ of (5.7) is regular enough,


one gets
$$\ell'(0) = \left. \left( \frac{d}{dt} \int_0^1 F(c + th,\, c_p + th_p)\, dp \right) \right|_{t=0} = \int_0^1 \left( F_a(c, c_p) \cdot h + F_b(c, c_p) \cdot h_p \right) dp,$$
where $F_a$ and $F_b$ are respectively the derivatives of the Lagrangian F w.r.t. its first and second arguments ($a, b \in \mathbb{R}^2$). As usual, then, by integration by parts, one gets

$$d_c\mathcal{F}\, h = \int_0^1 \left( F_a(c, c_p) - \frac{dF_b}{dp}(c, c_p) \right) \cdot h\, dp + \left[ F_b(c, c_p) \cdot h \right]_0^1$$

and in the case of fixed endpoints or closed curves, the boundary term above vanishes (for free endpoint problems, conditions can be given on $F_b$ in order to make this term vanish if desired). This then suggests a gradient:

$$\nabla_c\mathcal{F} = F_a(c, c_p) - \frac{d}{dp} F_b(c, c_p). \qquad (5.8)$$

This in turn means that we have used a very simple inner product:

$$h_1, h_2 \mapsto \langle h_1, h_2 \rangle = \int_0^1 h_1 \cdot h_2\, dp. \qquad (5.9)$$

Thus the situation looks completely similar to what has been used in chapters 3 and 4. This type of approach has been used in a series of works, the best known being the snakes of Kass et al. [9].

However, with the inner product above, no geometry of the curve c itself appears. Curves are not really used, only their parametrizations are. In order to be geometric, the deformation field should be considered as acting not on the curve parameter, but on the corresponding point of the curve.

To make it geometric in the sense of paragraph 5.3, it can be replaced by the following. Given two vector fields $h_1$ and $h_2$ defined along the image of the regular curve c, set
$$h_1, h_2 \mapsto \langle h_1, h_2 \rangle_c = \int_0^1 h_1(c(p)) \cdot h_2(c(p))\, |c_p|\, dp. \qquad (5.10)$$

This is geometric: if we change the parametrization of c, (5.10) remains unchanged. Remark now that the inner product is indexed by c: the deformations $h_1$ and $h_2$ only make sense for the geometric curve c. On the


space of regular curves, one does not have one inner product, but a family of inner products, one per geometric curve. This forms a Riemannian metric.

This is not the only geometric structure on the space of regular curves, and this family of inner products, although widely used in many applications (some will be presented in the next section), also has severe shortcomings that will maybe be discussed.

At the same time, the type of functional we have looked at may fail to be geometric in the sense of paragraph 5.3. We now look at specific ones, guaranteed to be geometric:

$$F(a, b) = G(a)|b|, \quad \text{so that} \quad \mathcal{F}(c) = \int_0^1 G(c)\, |c_p|\, dp. \qquad (5.11)$$

So, with such a functional and the inner product above, we will compute new gradients. They will illustrate a rather interesting fact. As usual, we first compute the differential $d_c\mathcal{F}\, h$; details are left to the reader:

$$d_c\mathcal{F}\, h = \left. \left( \int_0^1 \frac{d}{dt} \big( G(c + th)\, |c_p + th_p| \big)\, dp \right) \right|_{t=0} = \underbrace{\int_0^1 |c_p|\, \nabla G(c) \cdot h\, dp}_{I_h} + \underbrace{\int_0^1 \left( G(c)\, \frac{c_p}{|c_p|} \right) \cdot h_p\, dp}_{I_{h_p}}$$

Before going further, remember that $c_p/|c_p|$ is the unit tangent $\vec\tau$ at $c(p)$. In order to write $d_c\mathcal{F}\, h$ as an inner product of the form (5.10), one first needs to get rid of the derivative $h_p$ in the term $I_{h_p}$. By integration by parts, assuming that the boundary term vanishes, and then developing and using the Frenet formula (5.4), $I_{h_p}$ becomes

$$I_{h_p} = -\int_0^1 \frac{d}{dp}\big( G(c)\, \vec\tau \big) \cdot h\, dp = -\int_0^1 \big( (\nabla G(c) \cdot c_p)\, \vec\tau + |c_p|\, G(c)\, \kappa\vec\nu \big) \cdot h\, dp = -\int_0^1 |c_p| \big( (\nabla G(c) \cdot \vec\tau)\, \vec\tau + G(c)\, \kappa\vec\nu \big) \cdot h\, dp,$$

using the fact that $c_p = |c_p|\, \vec\tau$. We thus obtain

$$d_c\mathcal{F}\, h = \int_0^1 |c_p| \big( \nabla G(c) - (\nabla G(c) \cdot \vec\tau)\, \vec\tau - \kappa\, G(c)\, \vec\nu \big) \cdot h\, dp.$$


But the pair $(\vec\tau, \vec\nu)$ is an orthonormal moving frame along c, so that
$$\nabla G(c) = (\nabla G \cdot \vec\tau)\, \vec\tau + (\nabla G \cdot \vec\nu)\, \vec\nu,$$

from which one gets
$$d_c\mathcal{F}\, h = \int_0^1 |c_p| \big( (\nabla G(c) \cdot \vec\nu) - \kappa\, G(c) \big)\, \vec\nu \cdot h\, dp.$$

The corresponding gradient, for the inner product (5.10),
$$\nabla_c\mathcal{F} = (\nabla G(c) \cdot \vec\nu - \kappa\, G(c))\, \vec\nu \qquad (5.12)$$
is a vector field normal to c. In fact, when evolving curves geometrically, only the normal direction “counts”; the tangential direction only changes the parametrization. There is in fact the following result of Epstein and Gage [5, 15].

Theorem 20 Let $c : [0, 1] \times [0, 1] \to \mathbb{R}^2$, $(p, t) \mapsto c(p, t)$, be a time-dependent curve family, i.e., for t fixed, $c^t : p \mapsto c^t(p) = c(p, t)$ is a regular curve in $\mathbb{R}^2$, $C^2$ in space and $C^1$ in time, and assume that it satisfies the differential equation
$$\partial_t c(p, t) = h_1(p, t)\, \vec\tau(c(p, t)) + h_2(p, t)\, \vec\nu(c(p, t)),$$
where $(\vec\tau(-), \vec\nu(-))$ is the orthonormal moving frame at time t. Then there exists a time-dependent reparametrization of c, denoted $\varphi$, such that $\tilde c(p, t) = c(\varphi(p, t), t)$ satisfies the differential equation
$$\partial_t \tilde c = \tilde h_2\, \vec\nu$$
with $\tilde h_2(p, t) = h_2(\varphi(p, t), t)$.

Figure 5.5 illustrates this fact. A deformation of a curve, from a purely geometric point of view, is a normal vector field to that curve. If one is interested in purely geometric aspects, one can thus restrict to such vector fields.

5.7 An extremely short / ultralight introduction to shape derivatives

A rigorous description of shape calculus is far beyond the purpose of these notes; the reader is referred to [4]. We discuss it in an informal manner. The regions we consider will be “reasonable” ones in the following sense.


Figure 5.5: Two deformations, that is, two vector fields on a curve c inducing a deformation toward the curve c′. The first deformation (in green) has tangential components. The second (in blue) is a purely normal vector field to c.

In order to describe how a domain changes, one starts with a time-dependent vector field $v(x, t)$ representing the velocity of a “particle” at position x at time t. At time $t + dt$, this point will move, at first order, to the new position $x + v(x, t)dt$. Now assume that we are given a domain $\Omega(t)$ and a vector field $v(x, t)$ defined on a domain D sufficiently larger than $\Omega(t)$, the “hold-all” domain. We assume also that $v(x, t) = 0$ at the boundary of D.

One deforms $\Omega(t)$ at time $t + dt$ by moving each of its points as done above, and one gets the moved/deformed region $\Omega(t + dt)$. What we need is a reasonable, operational definition of $d\Omega(t)/dt$, the shape derivative. Skipping the rather complicated “details”: if $\partial\Omega(t)$ is the boundary of $\Omega(t)$, i.e. the exterior delimiting curve of $\Omega(t)$, the instantaneous deformation velocity of $\Omega(t)$ is given by a vector field normal to $\partial\Omega(t)$. Each point x of $\partial\Omega(t)$ moves with instantaneous velocity $v(x, t)$, but one only keeps the normal direction, as the tangential one does not modify $\partial\Omega(t)$ geometrically, as I explained in a previous paragraph; see (again) Figure 5.5. Let $\vec\nu$ be the exterior normal vector field to $\partial\Omega(t)$. One has the following fundamental result:

Theorem 21 Let $f(x, t)$ be a differentiable function of $x \in D \subset \mathbb{R}^n$ and $t \in [a, b] \subset \mathbb{R}$ (the “time”), and let $g(t)$ be the function defined by
$$g(t) = \int_{\Omega(t)} f(x, t)\, dx.$$


Assume that the instantaneous deformation of $\partial\Omega(t)$ is given by the vector field $v(x, t)$. Then
$$g'(t) = \int_{\Omega(t)} \frac{\partial f}{\partial t}(x, t)\, dx + \int_{\partial\Omega(t)} f(x, t)\; v \cdot \vec\nu\, ds.$$

In the case where $\partial\Omega(t)$ is a curve, the last integral is the geometric integral along a good parametrization of $\partial\Omega(t)$: if $c : [a, b] \to \mathbb{R}^2$, $p \mapsto c(p)$, is one of them, oriented such that the orthonormal basis $(\frac{c'}{|c'|}, \vec\nu)$ is direct, then
$$\int_{\partial\Omega(t)} f(x, t)\; v \cdot \vec\nu\, ds := \int_a^b |c_p|\, f(c(p), t)\; v(c(p), t) \cdot \vec\nu(p)\, dp.$$

This formula is in fact also valid in $\mathbb{R}^n$ with $\partial\Omega(t)$ a hypersurface of $\mathbb{R}^n$, but instead of the curve velocity $|c'|$, a more complicated expression, the hypersurface volume element, is necessary.
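As a quick sanity check of Theorem 21 (an example I add here), take $\Omega(t)$ to be the disk of radius $r(t)$ centered at the origin and $f \equiv 1$, so that $g(t) = |\Omega(t)| = \pi r(t)^2$. The boundary moves with normal speed $v \cdot \vec\nu = r'(t)$, and the theorem gives
$$g'(t) = \int_{\Omega(t)} 0\, dx + \int_{\partial\Omega(t)} r'(t)\, ds = 2\pi r(t)\, r'(t),$$
which is indeed the derivative of $\pi r(t)^2$.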

An alternate way to look at the previous result is as follows. Let $\chi_t$ be the characteristic function of $\Omega(t)$,
$$\chi_t(x) = \begin{cases} 1 & \text{if } x \in \Omega(t) \\ 0 & \text{if } x \in D \setminus \Omega(t). \end{cases}$$

Then the above function g can be written as
$$g(t) = \int_D f(x, t)\, \chi_t(x)\, dx$$

and the domain of integration D is now fixed. In order to compute $g'(t)$, one differentiates the integrand to obtain:
$$g'(t) = \int_D \frac{\partial f}{\partial t}(x, t)\, \chi_t(x)\, dx + \int_D f(x, t)\, \frac{\partial \chi_t}{\partial t}(x)\, dx.$$

As $\chi_t$ is constant inside $\Omega(t)$ (value 1) and outside it (value 0), its variation $\frac{\partial \chi_t}{\partial t}$ is concentrated on $\partial\Omega(t)$ and is given by the normal component of $v(x, t)$. This requires of course serious theoretical justifications; they can be found in [4].


Chapter 6

Optimization of curves and regions

This chapter collects some classical curve optimization computations, starting with length and the Euclidean shortening flow, and then many problems related to image segmentation, starting from Kass, Witkin and Terzopoulos' snakes and gradually moving to more complicated formulations.

6.1 Snakes

This is where a large part of the story started in image analysis. The idea is to find a curve that delineates an object in an image by following its edges. Because of noise and other problems, the edges might be incomplete and/or noisy. Thus Kass, Witkin and Terzopoulos proposed to find the segmentation by minimizing the energy

$$E(c; I) = \int_0^1 \left( \alpha |c_p|^2 + \beta |c_{pp}|^2 \right) dp + \gamma \int_0^1 g_I(c)\, dp \qquad (6.1)$$

where $\alpha > 0$, $\beta \geq 0$ and $\gamma > 0$, while $g_I$ should be small at edges of I. This could be, for instance,
$$g_I(x, y) = -|\nabla I_\sigma(x, y)|, \qquad g_I(x, y) = \frac{1}{1 + |\nabla I_\sigma(x, y)|},$$
where $I_\sigma$ is a (Gaussian) smoothed version of I, used in order to get rid of some of the noise.


The energy $E(c; I)$ is not geometric, so we use the inner product (5.9) in that case. Let's do the computation step by step.

$$d_cE\, h = \left. \frac{d}{dt} \left( \int_0^1 \left( \alpha |c_p + th_p|^2 + \beta |c_{pp} + th_{pp}|^2 \right) dp + \gamma \int_0^1 g_I(c + th)\, dp \right) \right|_{t=0}$$
$$= \int_0^1 \left( 2\alpha\, c_p \cdot h_p + 2\beta\, c_{pp} \cdot h_{pp} \right) dp + \gamma \int_0^1 \nabla g_I(c) \cdot h\, dp$$
$$= \int_0^1 \left( -2\alpha\, c_{pp} \cdot h + 2\beta\, c_{pppp} \cdot h \right) dp + \gamma \int_0^1 \nabla g_I(c) \cdot h\, dp$$
(one integration by parts to eliminate $h_p$, two for $h_{pp}$; the boundary terms vanish for a closed curve)
$$= \int_0^1 \left( -2\alpha\, c_{pp} + 2\beta\, c_{pppp} + \gamma \nabla g_I(c) \right) \cdot h\, dp$$

and the sought gradient is thus
$$\nabla_c E = -2\alpha\, c_{pp} + 2\beta\, c_{pppp} + \gamma \nabla g_I(c)$$
(the factors 2 can of course be absorbed into $\alpha$ and $\beta$).

6.2 Curves of minimal length

In this second example of calculus of variations with curves, we wish to characterize the curve of minimal length joining two points A and B in the Euclidean plane. It is well known that it is simply the straight line segment joining A to B. But what can we say from the point of view of the calculus of variations?

For a curve $c : [a, b] \to \mathbb{R}^2$, the length functional is given by
$$\ell(c) = \int_a^b |c_p|\, dp,$$

which corresponds precisely to the case $F(a, b) = |b|$, i.e. $G \equiv 1$, in the gradient computations performed in §5.6. We do the computations with the two gradients we have discussed. Using the inner product (5.9) and the corresponding gradient (5.8), one finds

$$\nabla^1_c \ell = -\frac{d}{dp} \nabla_b F(c_p) = -\frac{d}{dp} \frac{c_p}{|c_p|} = -\frac{d}{dp} \vec\tau = -|c_p|\, \kappa\vec\nu,$$

the second equality because $\nabla_b |b| = \frac{b}{|b|}$, and the last by Frenet's formula. Using the geometric inner product (5.10), we find instead
$$\nabla^2_c \ell = -\kappa\vec\nu.$$


Of course, in both cases the optimality condition reduces to the vanishing of the curvature of the curve: $\kappa = 0$. Change to an a-l parametrization $s \in [0, L] \mapsto c(s)$, where L is the length of c; this can be done w.l.o.g. Then $\kappa\vec\nu = c_{ss}$, the second derivative of c, so $c_{ss} = 0$, and it follows immediately that c must have the form
$$c(s) = \left( 1 - \frac{s}{L} \right) A + \frac{s}{L}\, B,$$
thus a straight-line segment.

6.3 Geodesic active contours

We are given an image $u : ]0, 1[ \times ]0, 1[ \to \mathbb{R}$ containing an object that we would like to segment via its contours, the segmentation being given as the interior of a curve c.

How can we find or evolve this curve so that it hugs the contours of the object? I will suppose, at least in this version of these notes, that the reader knows the snake method of Kass, Witkin and Terzopoulos (KWT), because I have good reason to believe that a reader of this document is familiar with it.

Figure 6.1: Active contours

The idea of geodesic active contours is to remedy the parametrization problems of KWT through a geometric formulation: the energy should not change under a change of parametrization of the contour, so that two curves parametrizing the same image in $\mathbb{R}^2$ have the same energy.

For a curve $c : [a, b] \to \mathbb{R}^2$, where we assume that c is closed and smooth, that is $c(a) = c(b)$ and $c'(a) = c'(b)$, this energy is given by

$$J(c) = \int_a^b g(|\nabla u(c(p))|)\, |c_p|\, dp,$$


and the invariance of the energy under reparametrization comes from the term $|c_p|$ in the integrand: let $\bar c : [e, f] \to \mathbb{R}^2$ be another parametrization of c, that is, there exists a function $\varphi : [e, f] \to [a, b]$, $\varphi' > 0$, with $\bar c(r) = c(\varphi(r))$; then we have, since $\bar c'(r) = c'(\varphi(r))\, \varphi'(r)$,

$$J(\bar c) = \int_e^f g(|\nabla u(\bar c(r))|)\, |\bar c'(r)|\, dr = \int_e^f g(|\nabla u(c(\varphi(r)))|)\, |c'(\varphi(r))|\, \varphi'(r)\, dr = \int_a^b g(|\nabla u(c(p))|)\, |c_p|\, dp = J(c)$$

by the change of variables theorem.

Again, we will proceed quickly in calculating the gradient, using the method of directional derivatives in a given direction h. The Frenet relations are crucial for this method, and we will generalize our energy (a little) by concealing the dependence on the gradient of the image u in a function $g : \mathbb{R}^2 \to \mathbb{R}$:

$$J(c) = \int_a^b g(c(p))\, |c_p|\, dp.$$

$$dJ_c\, h = \left. \frac{d}{dt} \int_a^b g(c(p) + th(p))\, |c'(p) + th'(p)|\, dp \right|_{t=0}$$
$$= \int_a^b \left( \nabla g(c) \cdot h\, |c'| + g(c)\, \frac{c' \cdot h'}{|c'|} \right) dp$$
$$= \int_a^b \left( \nabla g(c)\, |c'| - \frac{d}{dp}\left( g(c)\, \frac{c'}{|c'|} \right) \right) \cdot h\, dp$$

Using the Frenet relations (5.4) given in the previous section,
$$\frac{d}{dp}\left( g(c)\, \frac{c'}{|c'|} \right) = \frac{d}{dp}\big( g(c)\, \vec\tau \big) = (\nabla g(c) \cdot c')\, \vec\tau + g(c)\, |c'|\, \kappa\vec\nu,$$
we get

$$dJ_c\, h = \int_a^b |c'| \big( \nabla g(c) - (\nabla g(c) \cdot \vec\tau)\, \vec\tau - g(c)\, \kappa\vec\nu \big) \cdot h\, dp. \qquad (6.2)$$

The factor $|c'|$ appears in the above integral, which means that $dJ_c\, h$ is invariant under reparametrization. We must look for a gradient of the form


$|c'| \big( \nabla g(c) - (\nabla g(c) \cdot \vec\tau)\, \vec\tau - g(c)\, \kappa\vec\nu \big)$, yet this is not the one generally presented, which is
$$(\nabla g(c) \cdot \vec\nu)\, \vec\nu - g(c)\, \kappa\vec\nu.$$

In fact, in terms of evolution of geometric objects, both gradients are very close. When applying a tangential deformation to a curve, the geometric curve in $\mathbb{R}^2$ remains unchanged, although the parametrization does change; purely normal deformations do deform the geometric curve. With this fact, one has $\nabla g = (\nabla g \cdot \vec\tau)\, \vec\tau + (\nabla g \cdot \vec\nu)\, \vec\nu$, and $(\nabla g(c) \cdot \vec\nu)\, \vec\nu$ is precisely the normal component of $\nabla g$. From a purely geometric viewpoint, these two gradients provide the same solution.

By choosing an arc-length parametrization, $|c'| = 1$, there is not much difference between these gradients. In fact, the difference between these two expressions comes from the inner product one works with. The function space inner product does not say much about geometry!

Thus, instead, given a closed and smooth curve $c : [a, b] \to \mathbb{R}^2$, one associates to two normal vector fields along c, $v_1, v_2 : [a, b] \to \mathbb{R}^2$ (i.e. for all $p \in [a, b]$, $v_i(p)$ is parallel to $\vec\nu(p)$, $i = 1, 2$), the inner product

$$\langle v_1, v_2 \rangle_c = \int_a^b (v_1 \cdot v_2)\, |c'|\, dp. \qquad (6.3)$$

A main difference between this one and the previous one is that this product is defined w.r.t. the curve c; we are in fact in a situation somewhat analogous to the case of optimization under smooth equality constraints discussed in section 2.6: at each point x of the constraint space D one has a tangent space $T_xD$. Here the tangent space at a curve c is the set of smooth normal vector fields to c (yes!).

With this scalar product (6.3), what is the corresponding gradient? We got
$$dJ_c\, h = \int_a^b |c'| \big( \nabla g(c) - (\nabla g(c) \cdot \vec\tau)\, \vec\tau - g(c)\, \kappa\vec\nu \big) \cdot h\, dp.$$

We shall restrict ourselves to vector fields h that are normal w.r.t. c, and by a straightforward calculation one gets
$$\big( \nabla g(c) - (\nabla g(c) \cdot \vec\tau)\, \vec\tau - g(c)\, \kappa\vec\nu \big) \cdot h = \big( (\nabla g(c) \cdot \vec\nu)\, \vec\nu - g(c)\, \kappa\vec\nu \big) \cdot h$$

and

$$dJ_c\, h = \int_a^b |c'| \big( (\nabla g(c) \cdot \vec\nu)\, \vec\nu - g(c)\, \kappa\vec\nu \big) \cdot h\, dp,$$


from which it follows that, with the inner product (6.3),
$$\nabla J_c = (\nabla g(c) \cdot \vec\nu)\, \vec\nu - g(c)\, \kappa\vec\nu,$$
which is the expected gradient for the geodesic active contours of [2].

When using a steepest gradient descent to solve the minimization, the steepest descent direction h is given via the Cauchy–Schwarz theorem (Theorem 9): it must be a positive multiple of the opposite of the gradient,
$$h = -\lambda \big( (\nabla g(c) \cdot \vec\nu)\, \vec\nu - g(c)\, \kappa\vec\nu \big).$$
Choosing $\lambda = 1$, one gets
$$c_t = -(\nabla g(c) \cdot \vec\nu)\, \vec\nu + g(c)\, \kappa\vec\nu;$$
another choice for $\lambda$ would only result in a constant global multiplication of this evolution speed and, in an implementation, would be “absorbed” into our choice of dt.

6.4 Inner products and distances

6.5 Chan and Vese’s Active Contours

In the segmentation problem of Chan and Vese [3], regions of a given domain vary, and these regions are used as integration domains. We thus need to understand how to manipulate them in order to optimize them w.r.t. a given criterion. Before describing the exact criterion used by Chan and Vese, we describe how the regions can be manipulated.

6.5.1 Chan and Vese cost function

Let $u : D \to \mathbb{R}$ be a grayscale image. D is the hold-all domain discussed above, usually a square in $\mathbb{R}^2$. We assume that this image is made of an “inner” region $\Omega_i$ and an outer region $\Omega_e$ which differ enough by their contents; typically, the mean gray values of the two regions should be sufficiently far apart, and the gray values well distributed around these means.

The goal of the segmentation is to recover these regions or, equivalently, to recover their common boundary c. Chan and Vese's segmentation has the


following variational formulation: find c minimizing the objective function
$$J(c, a_i, a_e) = \lambda_i \int_{\Omega_i} (u(x, y) - a_i)^2\, dxdy + \lambda_e \int_{\Omega_e} (u(x, y) - a_e)^2\, dxdy + \mu\, \ell(c),$$
where c has $\Omega_i$ as interior and $\Omega_e$ as exterior, $a_i$ and $a_e$ are two reals to be determined, $\lambda_i$, $\lambda_e$ and $\mu$ are three positive constants, and $\ell(c)$ is the length of c (defined in one of the previous sections).

So, what we are looking for is a curve of minimal length separating the two regions as well as possible, in the sense that the quantities $a_i$ and $a_e$ associated to each region are as far away from each other as possible (there is, in the original formulation of Chan and Vese [3], an area term, which in practice is seldom used, so I have dropped it).

We will look at two formulations of the Chan–Vese problem: the first uses optimization of regions/curves, and the second optimizes a level set function. We start with the region approach.

Figure 6.2: Chan-Vese problem.

Based on the inner product (6.3), we redo the calculation of $d\ell_c\, V$ for a vector field V normal to c, where c is given with a parametrization


$c : [a, b] \to \mathbb{R}^2$:
$$d\ell_c\, V = \int_a^b \frac{c' \cdot V'}{|c'|}\, dp = -\int_a^b \frac{d}{dp}\!\left( \frac{c'}{|c'|} \right) \cdot V\, dp = -\int_a^b |c'| \left( \frac{1}{|c'|} \frac{d}{dp}\!\left( \frac{c'}{|c'|} \right) \cdot V \right) dp = -\int_a^b |c'|\, (\kappa\vec\nu \cdot V)\, dp = \langle -\kappa\vec\nu,\, V \rangle_c$$
by integration by parts and the Frenet formula (5.4); the gradient is thus
$$\nabla\ell_c = -\kappa\vec\nu.$$

When c is fixed, and thus $\Omega_i$ and $\Omega_e$ as well, the minimization over $a_e$ and $a_i$ is quite simple. For instance, for $a_i$ one gets

$$\frac{\partial J}{\partial a_i} = -2\lambda_i \int_{\Omega_i} (u(x, y) - a_i)\, dxdy,$$

and setting $\frac{\partial J}{\partial a_i} = 0$ gives
$$\int_{\Omega_i} u(x, y)\, dxdy = \int_{\Omega_i} a_i\, dxdy = a_i \int_{\Omega_i} dxdy \implies a_i = \frac{\int_{\Omega_i} u(x, y)\, dxdy}{\int_{\Omega_i} dxdy},$$

i.e., $a_i$ is the mean value of u in $\Omega_i$. The same result holds of course for $a_e$. Before we proceed to the gradient computation, let us remark that the outward normal to $\Omega_i$ along c is the inward normal to $\Omega_e$ along c: $\Omega_e$ is the “exterior” of $\Omega_i$, and $\Omega_i$ is the “exterior” of $\Omega_e$!

We will use Theorem 21 as follows. Suppose that, for a given t, $\Omega_i(t)$ and $\Omega_e(t)$ deform via the vector field V, normal to c. Since the values of $a_i(t)$ and $a_e(t)$ are known exactly for each t as the mean values of u within the corresponding regions, we fix them and forget their t-dependency¹. Set $J_i(t) = \lambda_i \int_{\Omega_i(t)} (u - a_i)^2\, dxdy$ and $J_e(t) = \lambda_e \int_{\Omega_e(t)} (u - a_e)^2\, dxdy$. Then

$$J_i'(t) = \lambda_i \int_{\Omega_i(t)} \frac{\partial (u(x, y) - a_i)^2}{\partial t}\, dxdy + \lambda_i \int_c (u(x, y) - a_i)^2\; V \cdot \vec\nu\, ds = \lambda_i \int_c (u(x, y) - a_i)^2\; V \cdot \vec\nu\, ds$$

¹This may seem a bit strange, but one uses implicitly the Lagrange multipliers method: one adds extra parameters for $a_i$ and $a_e$ and imposes the mean value constraint. [Francois dit: need to come up with a more decent explanation...]


since the first integral does not depend on t. In the same way, using the remark on the signs of the outward normals,
$$J_e'(t) = -\lambda_e \int_c (u(x, y) - a_e)^2\; V \cdot \vec\nu\, ds.$$

So, setting $J_i(c) = \lambda_i \int_{\Omega_i} (u - a_i)^2\, dxdy$ and $J_e(c) = \lambda_e \int_{\Omega_e} (u - a_e)^2\, dxdy$, we have in fact obtained
$$dJ_{i,c}\, V = \lambda_i \int_c (u(x, y) - a_i)^2\; V \cdot \vec\nu\, ds = \langle \lambda_i (u - a_i)^2\, \vec\nu,\, V \rangle_c$$
$$dJ_{e,c}\, V = -\lambda_e \int_c (u(x, y) - a_e)^2\; V \cdot \vec\nu\, ds = \langle -\lambda_e (u - a_e)^2\, \vec\nu,\, V \rangle_c.$$

Reusing the calculation made for the gradient of the length functional, $d\ell_c\, V = \langle -\kappa\vec\nu, V \rangle_c$, we get
$$dJ_c\, V = \langle \big( \lambda_i (u - a_i)^2 - \lambda_e (u - a_e)^2 \big)\, \vec\nu - \mu\kappa\vec\nu,\; V \rangle_c.$$

If one uses a gradient descent approach, the resulting PDE on c is
$$c_t = -\big( \lambda_i (u - a_i)^2 - \lambda_e (u - a_e)^2 \big)\, \vec\nu + \mu\kappa\vec\nu.$$

[Francois dit: When I have time, I will discuss how to make a Level Set implementation of it and discuss how it will differ from the Chan–Vese direct Level Set approach. Maybe I could say a word about some of the stuff of Charpiat.]


Bibliography

[1] G. Aubert and P. Kornprobst. Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations, second edition, volume 147 of Applied Mathematical Sciences. Springer, 2006.

[2] V. Caselles, R. Kimmel, and G. Sapiro. Geodesic active contours. The International Journal of Computer Vision, 22(1):61–79, 1997.

[3] T. Chan and L. Vese. Active contours without edges. IEEE Transactions on Image Processing, 10(2):266–277, February 2001.

[4] M.C. Delfour and J.-P. Zolésio. Shapes and Geometries. Advances in Design and Control. SIAM, 2001.

[5] C.L. Epstein and M. Gage. The curve shortening flow. In A.J. Chorin, A.J. Majda, and P.D. Lax, editors, Wave Motion: Theory, Modelling and Computation. Springer–Verlag, 1987.

[6] L.C. Evans. Partial Differential Equations, volume 19 of Graduate Studies in Mathematics. American Mathematical Society, 1998.

[7] D. Gustavsson, K. S. Pedersen, F. Lauze, and M. Nielsen. On the Rate of Structural Change in Scale Spaces. In X. C. Tai, M. Lysaker, and K. A. Lie, editors, Proceedings of the 2nd SSVM Conference, LNCS 5567, pages 832–843, Voss, Norway, 2009. Springer.

[8] B.K. Horn and B.G. Schunck. Determining Optical Flow. Artificial Intelligence, 17:185–203, 1981.

[9] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active Contour Models. The International Journal of Computer Vision, 1(4):321–331, January 1988.


[10] J. W. Neuberger. Sobolev Gradients, volume 1670 of Lecture Notes in Mathematics. Springer, 1997.

[11] M. Nielsen, L. Florack, and R. Deriche. Regularization and scale space. Research Report 2352, INRIA Sophia-Antipolis, 1994.

[12] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992.

[13] A.N. Tikhonov and V.Y. Arsenin. Solutions of Ill-Posed Problems. Winston and Sons, Washington, D.C., 1977.

[14] L. Vese. A study in the BV space of a denoising–deblurring variational problem. Applied Mathematics and Optimization, 44(2):131–161, 2001.

[15] L. Younes. Shapes and Diffeomorphisms, volume 171 of Applied Mathematical Sciences. Springer, 2010.