
Inverse Problems

Martin Benning
Updated on: June 6, 2016

Lecture Notes
Michaelmas Term 2015


Contents

1 Introduction to inverse problems
  1.1 Examples
    1.1.1 Matrix inversion
    1.1.2 Differentiation
    1.1.3 Deconvolution
    1.1.4 Tomography
    1.1.5 Magnetic Resonance Imaging (MRI)

2 Linear inverse problems
  2.1 Generalised inverse
  2.2 Compact operators
  2.3 Singular value decomposition of compact operators

3 Regularisation
  3.1 Truncated singular value decomposition
  3.2 Tikhonov regularisation
  3.3 Source-conditions
  3.4 Asymptotic regularisation
  3.5 Landweber iteration
  3.6 Variational regularisation
    3.6.1 Tikhonov-type regularisation
  3.7 Convex Variational Calculus

4 Computational realisation
  4.1 A primal dual hybrid gradient method
    4.1.1 Convergence of PDHGM
    4.1.2 Deconvolution with total variation regularisation


These lecture notes are based on the following books and lecture notes:

1. Engl, Heinz Werner, Martin Hanke, and Andreas Neubauer. Regularization of inverse problems. Vol. 375. Springer Science & Business Media, 1996.

2. Mueller, Jennifer L., and Samuli Siltanen. Linear and nonlinear inverse problems with practical applications. Vol. 10. SIAM, 2012.

3. Martin Burger, Inverse Problems. Lecture notes winter 2007/2008.

http://wwwmath.uni-muenster.de/num/Vorlesungen/IP_WS07/skript.pdf

4. Christian Clason, Inverse Probleme (in German), Lecture notes winter 2014/2015

https://www.uni-due.de/~adf040p/teaching/inverse_14/InverseSkript.pdf

The lecture notes are under constant redevelopment and will likely contain errors and mistakes. I very much appreciate the finding and reporting of those (to [email protected]). Thanks!


Chapter 1

Introduction to inverse problems

Solving an inverse problem is the task of computing an unknown physical quantity that is related to given, indirect measurements via a forward model. Inverse problems appear in a vast majority of applications, including imaging (Computed Tomography (CT), Positron Emission Tomography (PET), Magnetic Resonance Imaging (MRI), Electron Tomography (ET), microscopic imaging, geophysical imaging), signal- and image-processing, computer vision, machine learning and (big) data analysis in general, and many more.

Mathematically, an inverse problem can be described as the solution of the operator equation

Ku = f (1.1)

with given measurement data f for the unknown quantity u. Here, K : U → V denotes an operator mapping from the Banach space U to the Banach space V. Throughout this lecture we are going to restrict ourselves to linear and bounded operators, though.

Inverting a forward model, however, is not straightforward in most relevant applications, for two basic reasons: either a (unique) inverse model simply does not exist, or existing inverse models heavily amplify small measurement errors. In the sense of Hadamard, the problem (1.1) is called well-posed if

• for all input data there exists a solution of the problem, i.e. for all f ∈ V there exists a u ∈ U with Ku = f;

• for all input data this solution is unique, i.e. u ≠ v implies Kv ≠ f;

• the solution of the problem depends continuously on the input datum, i.e. for all {u_k}_{k∈N} with Ku_k → f we have u_k → u.

If any of these conditions is violated, problem (1.1) is called ill-posed. In the following we are going to see that most practically relevant inverse problems are ill-posed or approximately ill-posed.¹

1.1 Examples

In the following we are going to present various examples of inverse problems and highlight the challenges of dealing with them.

¹In fact, the name ill-posed problems may be a more suitable name for this lecture, as the real challenge is to deal with the ill-posedness of the inverse problems. However, the name inverse problems has become more widely accepted for this field of mathematics.


1.1.1 Matrix inversion

One of the simplest (classes of) inverse problems arising from (numerical) linear algebra is the solution of linear systems. These can be written in the form of (1.1) with u ∈ R^n and f ∈ R^n being n-dimensional vectors with real entries and K ∈ R^{n×n} being a matrix with real entries. We further assume K to be a symmetric, positive definite matrix. In that case we know from the spectral theory of symmetric matrices that there exist eigenvalues 0 ≤ λ_1 ≤ λ_2 ≤ . . . ≤ λ_n and corresponding eigenvectors k_j ∈ R^n for j ∈ {1, . . . , n} such that K can be written as

K = ∑_{j=1}^n λ_j k_j k_j^T .   (1.2)

Note that by dividing both sides of (1.1) by λ_n, (1.2) can be rescaled so that λ_n = 1 and λ_1 = κ^{-1} hold. Here, κ denotes the so-called condition number κ = λ_n/λ_1 (in the case of λ_1 ≠ 0) of the matrix K. It is well known from numerical linear algebra that the condition number is a measure of how stably (1.1) can be solved.

If we assume that we observe f^δ instead of f, with ‖f − f^δ‖_2 ≤ δ (here ‖·‖_2 denotes the Euclidean norm), and further denote by u^δ the solution of Ku^δ = f^δ, the difference between u^δ and the solution u of (1.1) reads as

u − u^δ = ∑_{j=1}^n λ_j^{-1} k_j k_j^T (f − f^δ) .

Therefore we can estimate

‖u − u^δ‖_2^2 = ∑_{j=1}^n λ_j^{-2} ‖k_j‖_2^2 |k_j^T (f − f^δ)|^2 ≤ λ_1^{-2} ‖f − f^δ‖_2^2 ,

where ‖k_j‖_2^2 = 1,

due to λ_1 ≤ λ_i for i ≠ 1 and the orthogonality of the eigenvectors. Thus, taking the square root yields the estimate

‖u − u^δ‖_2 ≤ κ ‖f − f^δ‖_2 ≤ κδ .

Hence, we observe that in the worst case an error δ in the data f is amplified by the condition number κ of the matrix K. A matrix with large κ is therefore called ill-conditioned. We want to demonstrate the effect of this error amplification with a small example.

Example 1.1. Let us consider the matrix

K = ( 1        1
      1    1001/1000 )

and the vector f = (1, 1)^T. Then the solution of Ku = f is simply given via u = (1, 0)^T. If we, however, consider the perturbed data f^δ = (99/100, 101/100)^T instead of f, the solution u^δ of Ku^δ = f^δ is u^δ = (−19.01, 20)^T.
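The amplification in Example 1.1 is easy to reproduce numerically. The following is a minimal sketch, assuming NumPy is available; it solves both systems and also reports the condition number of K, which is roughly 4·10³ here:

    import numpy as np

    K = np.array([[1.0, 1.0],
                  [1.0, 1001.0 / 1000.0]])
    f = np.array([1.0, 1.0])
    f_delta = np.array([99.0 / 100.0, 101.0 / 100.0])

    u = np.linalg.solve(K, f)              # exact data: u = (1, 0)
    u_delta = np.linalg.solve(K, f_delta)  # perturbed data: u = (-19.01, 20)

    print(u, u_delta)
    print(np.linalg.cond(K))               # condition number kappa, ~4e3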


1.1.2 Differentiation

Another classic inverse problem is differentiation. Assume we are given a function f ∈ L^∞([0,1]) with f(0) = 0, for which we want to compute u = f′. These conditions are satisfied if and only if u and f satisfy the operator equation

f(y) = ∫_0^y u(x) dx ,

which can be written as the operator equation Ku = f with the linear operator (K·)(y) := ∫_0^y ·(x) dx. As in the previous section, we assume that instead of f we observe a noisy version f^δ,

for which we further assume that the perturbation is additive, i.e. f^δ = f + n^δ with f ∈ C^1([0,1]) and n^δ ∈ L^∞([0,1]). It is obvious that the derivative u^δ only exists if the noise n^δ is differentiable. However, even in case n^δ is differentiable, the error in the derivative can become arbitrarily large. Consider for instance the sequence (δ_j)_{j∈N} with δ_j → 0 and the sequence of corresponding noise functions n^{δ_j} ∈ C^1([0,1]) ⊂ L^∞([0,1]) with

n^{δ_j}(x) := δ_j sin(kx/δ_j) ,

for a fixed but arbitrary number k. On the one hand we observe ‖n^{δ_j}‖_{L^∞([0,1])} = δ_j → 0, but on the other hand we have

u^{δ_j}(x) = f′(x) + k cos(kx/δ_j) ,

and therefore obtain the estimate

‖u − u^{δ_j}‖_{L^∞([0,1])} = ‖(n^{δ_j})′‖_{L^∞([0,1])} = k .

Thus, despite the noise in the data becoming arbitrarily small, the error in the derivative can become arbitrarily large (depending on k).

Note that considering a decreasing error in the norm of the Banach space C^1([0,1]) will yield a different result. If we have a sequence of noise functions with ‖n^{δ_j}‖_{C^1([0,1])} = δ_j → 0 instead, we can conclude

‖u − u^{δ_j}‖_{L^∞([0,1])} ≤ ‖n^{δ_j}‖_{C^1([0,1])} → 0 ,

due to C^1([0,1]) being embedded in L^∞([0,1]). In contrast to the previous example, the sequence of functions n^{δ_j}(x) := δ_j sin(kx) for instance satisfies

‖n^{δ_j}‖_{C^1([0,1])} = sup_{x∈[0,1]} |n^{δ_j}(x)| + sup_{x∈[0,1]} |(n^{δ_j})′(x)| = (1 + k)δ_j → 0 .

However, for a fixed δ the bound on ‖u − u^δ‖_{L^∞([0,1])} can obviously still become fairly large compared to δ, depending on how large k is.
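The first scenario is easy to observe numerically. The following is a minimal sketch, assuming NumPy; the smooth test function, grid size and the noise parameters δ and k are arbitrary choices. The noise is tiny in the supremum norm, yet a finite-difference derivative of the noisy data is off by roughly k:

    import numpy as np

    x = np.linspace(0.0, 1.0, 10001)
    f = np.sin(2 * np.pi * x)              # smooth data f, with u = f'

    delta, k = 1e-3, 5.0
    noise = delta * np.sin(k * x / delta)  # sup-norm of the noise is delta
    f_delta = f + noise

    u = np.gradient(f, x)                  # finite-difference derivative of f
    u_delta = np.gradient(f_delta, x)      # ... and of the noisy data

    print(np.max(np.abs(noise)))           # ~ 1e-3
    print(np.max(np.abs(u_delta - u)))     # roughly k, despite the tiny noise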


1.1.3 Deconvolution

An interesting problem that occurs in many imaging, image- and signal-processing applications is the deblurring or deconvolution of signals from a known, linear degradation. Deconvolution of a signal can be modelled as solving the inverse problem of the convolution, which reads as

f(y) = (Ku)(y) := ∫_{R^n} u(x) g(y − x) dx .   (1.3)

Here f denotes the blurry image, u is the (unknown) true image and g is the function that models the degradation. Due to the Fourier convolution theorem we can rewrite (1.3) as

f = (2π)^{n/2} F^{-1}(F(u)F(g)) ,   (1.4)

with F denoting the Fourier transform

F(u)(ξ) := (2π)^{-n/2} ∫_{R^n} u(x) exp(−ix·ξ) dx   (1.5)

and F^{-1} being the inverse Fourier transform

F^{-1}(f)(x) := (2π)^{-n/2} ∫_{R^n} f(ξ) exp(ix·ξ) dξ .   (1.6)

It is important to note that the inverse Fourier transform is indeed the unique inverse operator of the Fourier transform in the Hilbert space L^2, due to the theorem of Plancherel. If we rearrange (1.4) to solve it for u, we obtain

u = (2π)^{-n/2} F^{-1}(F(f)/F(g)) ,   (1.7)

and hence, we allegedly can recover u by simple division in the Fourier domain. However, we are quickly going to discover that this inverse problem is ill-posed and that the division will lead to heavy amplification of small errors.

Let u denote the image that satisfies (1.3). Further we assume that instead of the blurry image f we observe f^δ = f + n^δ, and that u^δ is the solution of (1.7) with input datum f^δ. Hence, we observe

(2π)^{n/2} |u − u^δ| = |F^{-1}(F(f − f^δ)/F(g))| = |F^{-1}(F(n^δ)/F(g))| .   (1.8)

As the convolution kernel g usually has compact support, F(g) will tend to zero for high frequencies. Hence, the denominator of (1.8) becomes fairly small, whereas the numerator will be non-zero, as the noise is of high frequency. Thus, in the limit the solution does not depend continuously on the data, and the deconvolution problem is therefore ill-posed.
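This blow-up is simple to demonstrate. The following is a minimal 1D sketch, assuming NumPy; the box signal, the Gaussian kernel and its width, and the noise level are arbitrary choices. It blurs a signal via the FFT, adds tiny noise, and applies the naive Fourier division (1.7):

    import numpy as np

    n = 256
    x = np.linspace(0.0, 1.0, n, endpoint=False)
    u = (np.abs(x - 0.5) < 0.2).astype(float)        # true signal

    g = np.exp(-0.5 * ((x - 0.5) / 0.02) ** 2)       # Gaussian blur kernel
    g /= g.sum()

    f = np.real(np.fft.ifft(np.fft.fft(u) * np.fft.fft(g)))  # blurred data
    f_delta = f + 1e-6 * np.random.default_rng(0).standard_normal(n)

    # naive deconvolution: division in the Fourier domain, cf. (1.7)
    u_delta = np.real(np.fft.ifft(np.fft.fft(f_delta) / np.fft.fft(g)))

    print(np.max(np.abs(f - f_delta)))   # tiny data error
    print(np.max(np.abs(u_delta - u)))   # enormous reconstruction error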

1.1.4 Tomography

In almost any tomography application the underlying inverse problem is either the inversion of the Radon transform or of the X-ray transform in dimensions higher than two. For u ∈ C_0^∞(R^n),


s ∈ R and θ ∈ S^{n−1}, the Radon transform² R : C_0^∞(R^n) → C^∞(S^{n−1} × R) can be defined as the line integral operator

f(θ, s) = (Ru)(θ, s) = ∫_{x·θ=s} u(x) dx   (1.9)

                     = ∫_R u(sθ + tθ⊥) dt .   (1.10)

Here θ⊥ denotes a vector orthogonal to θ.

Example 1.2. Let n = 2. Then S^{n−1} is simply the unit sphere S^1 = {θ ∈ R^2 | ‖θ‖_2 = 1}. We can choose for instance θ = (cos(ϕ), sin(ϕ))^T, ϕ ∈ [0, 2π[, and parametrise the Radon transform in terms of ϕ and s, i.e.

f(ϕ, s) = (Ru)(ϕ, s) = ∫_R u(s cos(ϕ) − t sin(ϕ), s sin(ϕ) + t cos(ϕ)) dt .

Note that, with respect to the origin of the reference coordinate system, ϕ determines the angle of the line along which one wants to integrate, while s is the offset of that line from the centre of the coordinate system.
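The parametrised line integral of Example 1.2 can be discretised by simple quadrature. The following is a minimal sketch, assuming NumPy; the disc test image, the truncation radius R and the number of quadrature points are arbitrary choices. For the indicator function of a disc the Radon transform is known in closed form (the chord length 2√(r² − s²)), which serves as a check:

    import numpy as np

    def radon_value(u, phi, s, R=1.0, n_t=2000):
        # quadrature of (Ru)(phi, s), with t truncated to [-R, R]
        t = np.linspace(-R, R, n_t)
        vals = u(s * np.cos(phi) - t * np.sin(phi),
                 s * np.sin(phi) + t * np.cos(phi))
        return vals.sum() * (t[1] - t[0])   # simple Riemann sum

    # indicator function of a disc with radius 1/2 around the origin
    u = lambda x, y: (x**2 + y**2 < 0.25).astype(float)

    print(radon_value(u, phi=0.3, s=0.1))   # numerical line integral
    print(2 * np.sqrt(0.25 - 0.1**2))       # exact chord length, ~0.9798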

X-ray Computed Tomography (CT)

In X-ray computed tomography (CT), the unknown quantity u represents a spatially varying density that is exposed to X-radiation from different angles, and that absorbs the radiation according to its material or biological properties.

The basic modelling assumption for the intensity decay of an X-ray beam is that over a small distance ∆t it is proportional to the intensity itself, the density and the distance, i.e.

(I(sθ + (t + ∆t)θ⊥) − I(sθ + tθ⊥)) / ∆t = −I(sθ + tθ⊥) u(sθ + tθ⊥) .

By taking the limit ∆t → 0 we end up with the ordinary differential equation

(d/dt) I(sθ + tθ⊥) = −I(sθ + tθ⊥) u(sθ + tθ⊥) .   (1.11)

We now integrate (1.11) from t = −R/2, the position of the emitter, to t = R/2, the position of the detector, to obtain

∫_{−R/2}^{R/2} [(d/dt) I(sθ + tθ⊥)] / I(sθ + tθ⊥) dt = − ∫_{−R/2}^{R/2} u(sθ + tθ⊥) dt .

Note that due to (d/dx) log(f(x)) = f′(x)/f(x), the left-hand side in the above equation simplifies to

∫_{−R/2}^{R/2} [(d/dt) I(sθ + tθ⊥)] / I(sθ + tθ⊥) dt = log(I(sθ + (R/2)θ⊥)) − log(I(sθ − (R/2)θ⊥)) .

As we know the radiation intensity at both the emitter and the detector, we therefore know f(θ, s) := log(I(sθ − (R/2)θ⊥)) − log(I(sθ + (R/2)θ⊥)), and we can write the estimation of the unknown density u as the inverse problem of the Radon transform (1.10) (if we further assume that u can be continuously extended by zero outside of the circle of radius R).

²Named after the Austrian mathematician Johann Karl August Radon (16 December 1887 – 25 May 1956).


Positron Emission Tomography (PET)

In Positron Emission Tomography (PET) a so-called radioactive tracer (a positron-emitting radionuclide attached to a biologically active molecule) is injected into a patient (or subject). The emitted positrons of the tracer will interact with the subject's electrons after travelling a short distance (usually less than 1 mm), causing the annihilation of both the positron and the electron, which results in a pair of gamma photons moving in (approximately) opposite directions. This pair of photons is detected by the scanner detectors, and an intensity f(ϕ, s) can be associated with the number of annihilations detected at the detector pair that forms the line with offset s and angle ϕ (with respect to the reference coordinate system). Thus, we can consider the problem of recovering the unknown tracer density u as a solution of the inverse problem (1.10) again. The line of integration is determined by the position of the detector pairs and the geometry of the scanner.

1.1.5 Magnetic Resonance Imaging (MRI)

Magnetic resonance imaging (MRI) is an imaging technique that allows one to visualise the chemical composition of patients or materials. MRI scanners use strong magnetic fields and radio waves to excite subatomic particles (like protons) that subsequently emit radio frequency signals, which can be measured by the radio frequency coils. In the following we want to briefly outline the mathematics of the acquisition process. Subsequently we are going to see that finding the unknown spin proton density basically leads to solving the inverse problem of the Fourier transform (1.5). The magnetisation of a so-called spin isochromat can be described by the Bloch equations³

d/dt ( M_x(t) )   ( −1/T₂      γB_z(t)   −γB_y(t) ) ( M_x(t) )   (    0    )
     ( M_y(t) ) = ( −γB_z(t)   −1/T₂      γB_x(t) ) ( M_y(t) ) + (    0    ) .   (1.12)
     ( M_z(t) )   (  γB_y(t)  −γB_x(t)   −1/T₁    ) ( M_z(t) )   ( M₀/T₁  )

Here M(t) = (M_x(t), M_y(t), M_z(t)) is the nuclear magnetisation (of the spin isochromat), γ is the gyromagnetic ratio, B(t) = (B_x(t), B_y(t), B_z(t)) denotes the magnetic field experienced by the nuclei, T₁ is the longitudinal and T₂ the transverse relaxation time, and M₀ is the magnetisation in thermal equilibrium. If we define M_xy(t) = M_x(t) + iM_y(t) and B_xy(t) = B_x(t) + iB_y(t), we can rewrite (1.12) as

d/dt M_xy(t) = −iγ (M_xy(t)B_z(t) − M_z(t)B_xy(t)) − M_xy(t)/T₂   (1.13a)

d/dt M_z(t) = i(γ/2) (M_xy(t)B̄_xy(t) − M̄_xy(t)B_xy(t)) − (M_z(t) − M₀)/T₁   (1.13b)

with z̄ denoting the complex conjugate of z. If we assume for instance that B = (0, 0, B₀) is just a constant magnetic field in z-direction,

(1.13) reduces to the decoupled equations

d/dt M_xy(t) = −iγB₀ M_xy(t) − M_xy(t)/T₂ ,   (1.14a)

d/dt M_z(t) = −(M_z(t) − M₀)/T₁ .   (1.14b)

³Named after the Swiss-born American physicist Felix Bloch (23 October 1905 – 10 September 1983).


It is easy to see that this system of equations (1.14) has the unique solution

M_xy(t) = e^{−t(iω₀ + 1/T₂)} M_xy(0)   (1.15a)

M_z(t) = M_z(0) e^{−t/T₁} + M₀ (1 − e^{−t/T₁})   (1.15b)

with ω₀ := γB₀ denoting the Larmor frequency, and M_xy(0), M_z(0) being the initial magnetisations at time t = 0.

Rotating frame

Thus, for a constant magnetic background field B₀ in z-direction, M_xy basically rotates around the z-axis in clockwise direction with frequency ω₀ (if we ignore the T₂ decay for a moment). Rotating the x- and y-coordinate axes with the same frequency yields the representation of the Bloch equations in the so-called rotating frame. If we substitute M^r_xy(t) := e^{iω₀t} M_xy(t), B^r_xy(t) := B_xy(t) e^{iω₀t}, M^r_z(t) := M_z(t) and B^r_z(t) := B_z(t), we obtain

d/dt M^r_xy(t) = −iγ (M^r_xy(t)(B^r_z(t) − B₀) − M^r_z(t)B^r_xy(t)) − M^r_xy(t)/T₂   (1.16a)

d/dt M^r_z(t) = i(γ/2) (M^r_xy(t)B̄^r_xy(t) − M̄^r_xy(t)B^r_xy(t)) − (M^r_z(t) − M₀)/T₁   (1.16b)

instead of (1.13).

Thus, if we assume the magnetic field to be constant with magnitude B₀ in z-direction within the rotating frame, i.e. B^r(t) = (B^r_x(t), B^r_y(t), B₀), then (1.16a) simplifies to

d/dt M^r_xy(t) = iγ M^r_z(t) B^r_xy(t) − M^r_xy(t)/T₂ .   (1.17)

90° pulse

Now we assume that B^r_x(t) = c, with c constant, and B^r_y(t) = 0 for t ∈ [0, τ], where τ ≪ T₁ and τ ≪ T₂. Then we can basically ignore the effects of M^r_xy(t)/T₂ and (M^r_z(t) − M₀)/T₁, and the Bloch equations in the rotating frame simplify to

d/dt ( M^r_x(t) )   ( 0    0   0 ) ( M^r_x(t) )
     ( M^r_y(t) ) = ( 0    0   ω ) ( M^r_y(t) )   (1.18)
     ( M^r_z(t) )   ( 0   −ω   0 ) ( M^r_z(t) )

with ω := γc, written in matrix form for the separate components. Assuming the initial magnetisation in the rotating frame to be zero in the x-y-plane, i.e. M^r_x(0) = 0 and M^r_y(0) = 0, and constant in z-direction with value M^r_z(0), the solution of (1.18) can be written as

( M^r_x(t) )   ( 1      0         0      ) (    0      )
( M^r_y(t) ) = ( 0   cos(ωt)   sin(ωt)   ) (    0      ) .   (1.19)
( M^r_z(t) )   ( 0  −sin(ωt)   cos(ωt)   ) ( M^r_z(0)  )

Thus, equation (1.19) rotates the initial z-magnetisation around the x-axis by the angle θ := ωt. Note that if c and τ are chosen such that θ = π/2, all magnetisation is rotated from the z-axis to the y-axis, i.e. M^r_y(τ) = M^r_z(0). In analogy, choosing B_x(t) = 0 and B_y(t) = c, all magnetisation can be shifted from the z- to the x-axis.


Signal acquisition

If the radio-frequency (RF) pulse is turned off, and thus B^r_x(t) = 0 and B^r_y(t) = 0 for t > τ, the same coils that have been used to induce the RF pulse can be used to measure the x-y-magnetisation. Since we measure a volume of the whole x-y net-magnetisation, the acquired signal equals

y(t) = ∫_{R^3} M(x, t) dx = ∫_{R^3} e^{−iω₀(x)t} M^r(x, t) dx   (1.20)

with M(x, t) denoting M_xy(t) for a specific spatial coordinate x ∈ R^3 (and M^r(x, t) respectively). Using (1.15a) and assuming τ < t ≪ T₂, this yields

y(t) = ∫_{R^3} M_τ(x) e^{−iω₀(x)t} dx ,   (1.21)

with M_τ denoting the x-y-magnetisation at spatial location x ∈ R^3 and time t = τ. Note that M_τ = 0 without any RF pulse applied in advance.

Signal recovery

The basic clue that allows for spatially resolving nuclear magnetic resonance spectrometry is to add to the constant magnetic field B₀ in z-direction a magnetic field B(t) that varies spatially over time. Then, (1.14a) changes to

d/dt M_xy(t) = −iγ(B₀ + B(t)) M_xy(t) − M_xy(t)/T₂ ,

which, for initial value Mxy(0), has the unique solution

M_xy(t) = e^{−iγ(B₀t + ∫_0^t B(τ) dτ)} e^{−t/T₂} M_xy(0)   (1.22)

if we ensure B(0) = 0. If now x(t) denotes the spatial location of a considered spin isochromat at time t, we can write B(t) as B(t) = x(t)·G(t), with a function G that describes the influence of the magnetic field gradient over time. Based on the considerations that led to (1.21), we therefore measure

y(t) = ∫_{R^3} M_τ(x) e^{−iγ(B₀(x)t + ∫_0^t x(τ)·G(τ) dτ)} dx

in an NMR experiment. Further assuming that B₀ is also constant in space, we can eliminate this term and write the signal acquisition as

e^{iγB₀t} y(t) = ∫_{R^3} M_τ(x) e^{−iγ ∫_0^t x(τ)·G(τ) dτ} dx .   (1.23)

In the following we assume that x(t) can be approximated reasonably well via its zero-order Taylor approximation around t₀ = 0, i.e.

∫_0^t x(τ)·G(τ) dτ ≈ x(0) · ∫_0^t G(τ) dτ ,   (1.24)

and hence, the inverse problem of finding the unknown spin proton density M_τ for given measurements y is equivalent to solving the inverse problem of the Fourier transform

f(t) = (KM_τ)(t) = ∫_{R^3} M_τ(x) e^{−ix·ξ(t)} dx ,   (1.25)

with f(t) := e^{iγB₀t} y(t) and ξ(t) := γ ∫_0^t G(τ) dτ.
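In the idealised Cartesian case, where the sampled frequencies ξ(t) fill a regular grid, this Fourier inverse problem can be inverted with an inverse FFT. The following is a minimal 1D sketch of that idea, assuming NumPy; the toy density and the sampling grid are arbitrary choices:

    import numpy as np

    n = 128
    x = np.linspace(0.0, 1.0, n, endpoint=False)
    m_tau = np.exp(-((x - 0.5) / 0.1) ** 2)   # toy spin proton density

    # idealised measurement: the FFT plays the role of sampling f(t)
    # at frequencies xi(t) on a regular Cartesian grid
    f = np.fft.fft(m_tau)

    # reconstruction: inverse Fourier transform of the measured samples
    m_rec = np.real(np.fft.ifft(f))

    print(np.max(np.abs(m_rec - m_tau)))      # ~ machine precision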


Chapter 2

Linear inverse problems

Throughout this lecture we deal with functional analytic operators. For the sake of brevity, we cannot recall all basic concepts of functional analysis, but refer to popular textbooks that deal with this subject, like [10]. Nevertheless, we want to recall a few properties that are going to be important for the further course of this lecture. In particular, we are going to focus on inverse problems with bounded, linear operators K only, i.e. K ∈ L(U,V) with

‖K‖_{L(U,V)} := sup_{u∈U\{0}} ‖Ku‖_V / ‖u‖_U = sup_{‖u‖_U ≤ 1} ‖Ku‖_V < ∞ .

For X ⊂ U and K : X → V we further want to denote with

1. D(K) := U the domain

2. N(K) := {u ∈ U | Ku = 0} the kernel

3. R(K) := {Ku ∈ V | u ∈ U} the range

of K. We say that K is continuous in u ∈ U if for all ε > 0 there exists a δ > 0 with

‖Ku−Kv‖V ≤ ε for all v ∈ U with ‖u− v‖U ≤ δ.

For linear K (as assumed throughout this lecture) it can be shown that continuity is equivalent to the existence of a positive constant C such that

‖Ku‖V ≤ C‖u‖U

for all u ∈ X. Note that this constant C actually equals the operator norm ‖K‖_{L(U,V)}.

For the first part of the lecture we only consider K ∈ L(U,V) with U and V being Hilbert spaces. From functional analysis we know that every Hilbert space is equipped with a scalar product, which we are going to denote by 〈·,·〉_U (if U denotes the corresponding Hilbert space). In analogy to the transpose of a matrix, this scalar product structure, together with the theorem of Fréchet-Riesz [10, Section 2.10, Theorem 2.E], allows us to define the (unique) adjoint operator of K, denoted by K*, as follows:

〈Ku, v〉V = 〈u,K∗v〉U .


In addition to that, a scalar product allows us to have a notion of orthogonality. Two functions u, v ∈ U are said to be orthogonal if 〈u, v〉_U = 0. For a subset X ⊂ U the orthogonal complement of X in U is defined as

X⊥ := {u ∈ U | 〈u, x〉_U = 0 for all x ∈ X} .

From this definition we immediately observe that X⊥ is a closed subspace. Further we have U⊥ = {0} and {0}⊥ = U. Moreover, we have X ⊂ (X⊥)⊥; if X is a closed subspace we even have X = (X⊥)⊥. In this case there exists the orthogonal decomposition

U = X ⊕ X⊥ ,

which means that every element u ∈ U can uniquely be represented as

u = x + x⊥ with x ∈ X, x⊥ ∈ X⊥ ,

see for instance [10, Section 2.9, Corollary 1]. The mapping u ↦ x defines a linear operator P_X ∈ L(U,X) that is called the orthogonal projection onto X. The orthogonal projection satisfies the following conditions (cf. [6, Section 5.16]):

1. P_X is self-adjoint, i.e. P_X = P_X*;

2. ‖P_X‖_{L(U,X)} = 1 (if X ≠ {0});

3. I − P_X = P_{X⊥};

4. ‖v − P_X v‖_U = min_{u∈X} ‖v − u‖_U;

5. x = P_X u if and only if x ∈ X and u − x ∈ X⊥.

Note that for a non-closed subspace X we only have (X⊥)⊥ = cl X, where cl X denotes the closure of X. For K ∈ L(U,V) we therefore have

• R(K)⊥ = N(K*) and thus N(K*)⊥ = cl R(K);

• R(K*)⊥ = N(K) and thus N(K)⊥ = cl R(K*).

Hence, we can conclude the orthogonal decompositions

U = N(K) ⊕ cl R(K*) and V = N(K*) ⊕ cl R(K) .

In the following we want to investigate the concept of generalised inverses of bounded linear operators, before we identify compactness of operators as the major source of ill-posedness. Subsequently we are going to discuss this in more detail by analysing compact operators in terms of their singular value decomposition.

2.1 Generalised inverse

In order to overcome the issues of non-existence or non-uniqueness, we want to generalise the concept of least-squares solutions to linear operators in Hilbert spaces.

If we consider the generic inverse problem (1.1) again, we know that there does not exist a solution of the inverse problem if f ∉ R(K). In that case it seems reasonable to find an element u ∈ U for which ‖Ku − f‖_V becomes minimal instead.

However, for N(K) ≠ {0} there are infinitely many solutions that minimise ‖Ku − f‖_V, of which we have to pick one. Picking the one with minimal norm ‖u‖_U brings us to the definition of the minimal norm solution.


Definition 2.1. An element u ∈ U is called

• least-squares solution of (1.1), if

‖Ku − f‖_V = min_{w∈U} ‖Kw − f‖_V ;   (2.1)

• minimal norm solution of (1.1), if

‖u‖_U = min{ ‖w‖_U | w satisfies (2.1) } .   (2.2)

Note that for arbitrary f a minimum norm solution does not need to exist if R(K) is not closed. If, however, a minimum norm solution exists, it is unique and can (in theory) be computed via the Moore-Penrose generalised inverse.

Definition 2.2. Let K̃ := K|_{N(K)⊥} : N(K)⊥ → R(K) denote the restriction of K ∈ L(U,V). Then the Moore-Penrose inverse K† is defined as the unique linear extension of K̃^{-1} with

D(K†) := R(K) ⊕ R(K)⊥ ,

N(K†) := R(K)⊥ .

Note that K̃ is injective due to the restriction to N(K)⊥, and surjective due to the restriction to R(K). Hence, K̃^{-1} exists and, as a consequence, K† is well-defined on R(K). Due to the orthogonal decomposition D(K†) = R(K) ⊕ R(K)⊥, for arbitrary f ∈ D(K†) there exist f_1 ∈ R(K) and f_2 ∈ R(K)⊥ with f = f_1 + f_2. Hence, we have

K†f = K†f_1 + K†f_2 = K†f_1 = K̃^{-1}f_1 ,   (2.3)

due to R(K)⊥ = N(K†), and thus K† is well-defined on the whole of D(K†). It can be shown that K† can be characterised by the Moore-Penrose equations.

Lemma 2.1. The Moore-Penrose inverse K† satisfies R(K†) = N(K)⊥ and the Moore-Penrose equations

1. KK†K = K,

2. K†KK† = K†,

3. K†K = I − P_{N(K)},

4. KK† = P_{cl R(K)}|_{D(K†)},

where P_{N(K)} : U → N(K) and P_{cl R(K)} : V → cl R(K) denote the orthogonal projections onto N(K) and cl R(K), respectively.

Proof. First of all we are going to prove R(K†) = N(K)⊥. According to (2.3) and the definition of the Moore-Penrose inverse, we observe for f ∈ D(K†) that

K†f = K̃^{-1} P_{cl R(K)} f = K† P_{cl R(K)} f ,   (2.4)


as we not only have P_{cl R(K)} f ∈ cl R(K) but even P_{cl R(K)} f ∈ R(K). Hence, K†f ∈ R(K̃^{-1}) = N(K)⊥ and therefore R(K†) ⊂ N(K)⊥. However, we also observe N(K)⊥ ⊂ R(K†), as K†Ku = K̃^{-1}Ku = u for u ∈ N(K)⊥ already implies u ∈ R(K†).

It remains to prove the Moore-Penrose equations. We begin with 4: for f ∈ D(K†) it follows from R(K†) = N(K)⊥ and (2.4) that

KK†f = K K̃^{-1} P_{cl R(K)} f = K̃ K̃^{-1} P_{cl R(K)} f = P_{cl R(K)} f ,

due to K̃^{-1} P_{cl R(K)} f ∈ N(K)⊥ and K = K̃ on N(K)⊥.

3: According to the definition of K† we have K†Ku = K̃^{-1}Ku for all u ∈ U, and thus

K†Ku = K̃^{-1}K P_{N(K)} u + K̃^{-1}K (I − P_{N(K)}) u = K̃^{-1}K (I − P_{N(K)}) u = (I − P_{N(K)}) u ,

since KP_{N(K)}u = 0 and I − P_{N(K)} = P_{N(K)⊥}.

2: Inserting 4 into (2.4) yields

K†f = K† P_{cl R(K)} f = K†KK†f

for all f ∈ D(K†).

1: With 3 we have

KK†K = K(I − P_{N(K)}) = K − KP_{N(K)} = K .

The following theorem states that minimal norm solutions can be computed via the generalised inverse.

Theorem 2.1. For each f ∈ D(K†) the minimal norm solution u† of (1.1) is given via

u† = K†f .

The set of all least squares solutions is given via u† + N(K).

Proof. See [2, Theorem 2.5].

In numerical linear algebra it is a well-known fact that the normal equations can be considered to compute least squares solutions. The same is true in the continuous case.

Theorem 2.2. For given f ∈ D(K†), the function u ∈ U is a least squares solution of (1.1) if and only if it satisfies the normal equations

K*Ku = K*f .   (2.5)

We further have u = u† in case of u ∈ N(K)⊥.

Proof. An element u ∈ U is a least squares solution if and only if Ku is the projection of f onto cl R(K), i.e. Ku = P_{cl R(K)}(f). The latter is equivalent to Ku ∈ R(K) and Ku − f ∈ R(K)⊥ = N(K*). Thus, we conclude K*(Ku − f) = 0. Further, a least squares solution has minimal norm if and only if u ∈ N(K)⊥.


As a direct consequence from Theorem 2.1 and Theorem 2.2 we obtain

K†f = (K∗K)†K∗f ,

and hence, in order to approximate K†f we may also compute an approximation via (2.5) instead.

At the end of this section we further want to analyse the domain of the generalised inverse in more detail. Due to the construction of the Moore-Penrose inverse we have D(K†) = R(K) ⊕ R(K)⊥. As orthogonal complements are always closed, we can conclude

cl D(K†) = cl(R(K) ⊕ R(K)⊥) = cl R(K) ⊕ N(K*) = V ,

and hence, D(K†) is dense in V. Thus, if R(K) is closed it follows that D(K†) = V, and vice versa, D(K†) = V implies that R(K) is closed. Moreover, for f ∈ R(K)⊥ = N(K†) the minimal norm solution is u† = 0. Therefore, for given f ∈ cl R(K), the important question to address is whether f also satisfies f ∈ R(K). If this is the case, K† has to be continuous. However, the existence of a single element f ∈ cl R(K) \ R(K) is already enough to prove that K† is discontinuous.

Theorem 2.3. Let K ∈ L(U ,V). Then K† ∈ L(D(K†),U) if and only if R(K) is closed.

Proof. See [2, Proposition 2.4].

In the next section we are going to discover that the class of compact operators is a class for which the Moore-Penrose inverses are discontinuous.

2.2 Compact operators

Compact operators are very common in inverse problems; in fact, almost all (linear) inverse problems involve the inversion of compact operators. Compact operators are defined as follows.

Definition 2.3. Let K ∈ L(U,V). Then K is said to be compact if the image of any bounded sequence {u_n}_{n∈N} ⊂ U contains a convergent subsequence {Ku_{n_k}}_{k∈N} ⊂ V. We denote the space of compact operators by K(U,V).

Compact operators can be seen as the infinite-dimensional analogue of ill-conditioned matrices. Indeed, compactness is a main source of ill-posedness, as confirmed by the following result.

Theorem 2.4. Let K ∈ K(U,V) between infinite-dimensional Banach spaces U and V, such that the dimension of R(K) is infinite. Then problem (1.1) is ill-posed, i.e. the corresponding Moore-Penrose inverse K† is discontinuous.

Proof. Since U and R(K) are infinite dimensional, we can conclude that N(K)⊥ is also infinite dimensional. We can therefore find a sequence {u_n}_{n∈N} with u_n ∈ N(K)⊥, ‖u_n‖ = 1 and 〈u_n, u_k〉 = 0 for n ≠ k. Since K is a compact operator, the sequence f_n = Ku_n contains a convergent subsequence. Hence, for each δ > 0 we can find indices k ≠ l such that ‖Ku_l − Ku_k‖ < δ. However, we also obtain

‖K†Ku_l − K†Ku_k‖² = ‖u_l − u_k‖² = ‖u_l‖² − 2〈u_l, u_k〉 + ‖u_k‖² = 2 .

Hence, K† is discontinuous.

To get a better understanding of when f ∈ cl R(K) \ R(K) occurs for compact operators K, we want to consider the singular value decomposition of compact operators.


2.3 Singular value decomposition of compact operators

We want to characterise the Moore-Penrose inverse of compact operators in terms of a spectral decomposition. As in the finite-dimensional case of matrices, we can only expect a spectral decomposition to exist for self-adjoint operators. Due to Theorem 2.2 we can consider K*K instead of K, which brings us to the singular value decomposition of linear, compact operators.

Theorem 2.5. Let K ∈ K(U,V). Then there exist

• a null sequence {σ_j}_{j∈N} with σ_1 ≥ σ_2 ≥ . . . > 0,

• an orthonormal basis {u_j}_{j∈N} ⊂ U of cl R(K*),

• an orthonormal basis {v_j}_{j∈N} ⊂ V of cl R(K),

with

Ku_j = σ_j v_j ,   K*v_j = σ_j u_j   for all j ∈ N   (2.6)

and

Kw = ∑_{j=1}^∞ σ_j 〈w, u_j〉_U v_j   for all w ∈ U.   (2.7)

A sequence (σ_j, v_j, u_j) for which (2.6) and (2.7) are satisfied is called a singular system.

Proof. Exercise 3 on Exercise Sheet 2.

Remark 2.1. Since eigenvalues of K*K with eigenvectors u_j are also eigenvalues of KK* with eigenvectors v_j, we further obtain a singular value decomposition of K* due to (2.6), i.e.

K*z = ∑_{j=1}^∞ σ_j 〈z, v_j〉_V u_j   for all z ∈ V.

We now want to derive a representation of the Moore-Penrose inverse in terms of the singular value decomposition. Remember that we have concluded K†f = (K*K)†K*f from Theorem 2.1 and Theorem 2.2. Hence, for u† = K†f we obtain

∑_{j=1}^∞ σ_j² 〈u†, u_j〉_U u_j = K*Ku† = K*f = ∑_{j=1}^∞ σ_j 〈f, v_j〉_V u_j

and by comparison of the respective linear components

〈u†, u_j〉_U = (1/σ_j) 〈f, v_j〉_V .

As a direct consequence, the singular value decomposition of the generalised inverse reads as

u† = K†f = ∑_{j=1}^∞ (1/σ_j) 〈f, v_j〉_V u_j .   (2.8)

As for the singular value decompositions of K and K*, we have to verify whether the sum converges. In contrast to the singular value decompositions of K and K*, this is not always the case. Precisely, the so-called Picard criterion has to be met for (2.8) to converge.
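For matrices, (2.8) is exactly the singular-value representation of the Moore-Penrose pseudoinverse. The following is a minimal sketch, assuming NumPy; note that, in the notation of (2.8), numpy's svd returns the data-space vectors v_j as columns of its first factor and the solution-space vectors u_j as rows of its last factor:

    import numpy as np

    rng = np.random.default_rng(0)
    K = rng.standard_normal((5, 3))
    f = rng.standard_normal(5)

    V_mat, s, Ut = np.linalg.svd(K, full_matrices=False)

    # u_dagger = sum_j sigma_j^{-1} <f, v_j> u_j, cf. (2.8)
    u_dagger = sum((V_mat[:, j] @ f) / s[j] * Ut[j, :] for j in range(len(s)))

    print(np.allclose(u_dagger, np.linalg.pinv(K) @ f))   # True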


Theorem 2.6. Let K ∈ K(U,V) with singular system (σ_j, v_j, u_j), and f ∈ cl R(K). Then f ∈ R(K) and (2.8) converges if and only if the Picard criterion

‖K†f‖²_U = ∑_{j=1}^∞ |〈f, v_j〉_V|² / σ_j² < ∞   (2.9)

is met.

Proof. See [2, Page 38, Theorem 2.8]

Hence, the Picard criterion tells us how quickly the coefficients 〈f, v_j〉_V have to decay compared to the singular values σ_j. From the representation (2.8) we can further deduce what happens in the case of noisy measurements. Assume we have f^δ = f + δv_j instead of f. We then observe

‖K†f − K†f^δ‖_U = δ‖K†v_j‖_U = δ/σ_j → ∞   for j → ∞ .

For fixed j we see that the amplification of the error δ depends on how small σ_j is. Hence, the faster the singular values decay, the stronger the amplification of errors. For that reason, one distinguishes between three classes of ill-posed problems:

• Mildly ill-posed inverse problems are problems of the form (1.1) for which c > 0 and r > 0 exist with σ_j ≥ c j^{−r} for all j ∈ N.

• Severely ill-posed inverse problems are problems of the form (1.1) for which a sufficiently large index j₀ ∈ N exists such that σ_j ≤ c j^{−r} holds for all j ≥ j₀ and some c, r > 0.

• Exponentially ill-posed problems are problems of the form (1.1) for which a sufficiently large index j₀ ∈ N exists such that σ_j ≤ c e^{−j^r} holds for all j ≥ j₀ and some c, r > 0.

Example 2.1. Let us consider the example of differentiation again, as introduced in Section 1.1.2. Let U = V = L²([0,1]); then the operator K of the inverse problem (1.1) of differentiation is given as

(Ku)(y) = ∫_0^y u(x) dx = ∫_0^1 k(x, y) u(x) dx ,

for

k(x, y) = { 1   x ≤ y
          { 0   else .

From the theory of integral operators it follows that K is compact, due to its kernel k(x, y) being square integrable (cf. [1]). In order to compute the singular value decomposition we have to compute K* first, which is characterised via

〈Ku, v〉_{L²([0,1])} = 〈u, K*v〉_{L²([0,1])} .

Hence, we obtain

〈Ku, v〉_{L²([0,1])} = ∫_0^1 ∫_0^1 k(x, y) u(x) dx v(y) dy = ∫_0^1 u(x) ∫_0^1 k(x, y) v(y) dy dx .


Hence, the adjoint operator K∗ is given via

(K*v)(x) = ∫_0^1 k(x, y) v(y) dy = ∫_x^1 v(y) dy .

Now we want to compute the eigenvalues and eigenvectors of K*K, i.e. we look for λ > 0 and u ∈ L²([0,1]) with

λu(x) = (K*Ku)(x) = ∫_x^1 ∫_0^y u(z) dz dy .

We immediately observe u(1) = 0 and further

λu′(x) = d/dx ∫_x^1 ∫_0^y u(z) dz dy = − ∫_0^x u(z) dz ,

from which we conclude u′(0) = 0. Taking the derivative another time thus yields the ordinary differential equation

λu′′(x) + u(x) = 0 ,

for which solutions are of the form

u(x) = c_1 sin(σ^{-1}x) + c_2 cos(σ^{-1}x) ,

with σ := √λ and constants c_1, c_2. In order to satisfy the boundary conditions u(1) = 0 and u′(0) = 0, the constants have to be chosen such that c_1 = 0 and c_2 cos(σ^{-1}) = 0. Choosing c_2 = 0 would result in the trivial zero solution u ≡ 0, which is why we choose σ such that cos(σ^{-1}) = 0 instead. Hence, we have

σ_j = 2 / ((2j − 1)π)   for j ∈ N ,

and by choosing c_2 = √2 we obtain the following normalised representation of u_j:

u_j(x) = √2 cos((j − 1/2)πx) .

According to (2.6) we further obtain

v_j(x) = σ_j^{-1}(Ku_j)(x) = (j − 1/2)π ∫_0^x √2 cos((j − 1/2)πy) dy = √2 sin((j − 1/2)πx) ,

and hence, for f ∈ L2([0, 1]) the Picard criterion becomes

∑_{j=1}^∞ ((2j − 1)²π²/2) |∫_0^1 f(x) sin((j − 1/2)πx) dx|² < ∞ .

Thus, the Picard criterion in this case is equivalent to the condition that the Fourier series of f is termwise differentiable (and hence, that f is differentiable).
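The singular system of this integration operator is easy to check numerically. The following is a minimal sketch, assuming NumPy; the grid size and the simple lower-triangular discretisation of K are arbitrary choices. It compares the singular values of the discretised K with the predicted σ_j = 2/((2j − 1)π):

    import numpy as np

    n = 500
    # crude discretisation of (Ku)(y) = int_0^y u(x) dx on a uniform grid
    K = (1.0 / n) * np.tril(np.ones((n, n)))

    sigma = np.linalg.svd(K, compute_uv=False)
    j = np.arange(1, 6)

    print(sigma[:5])                      # numerical singular values
    print(2.0 / ((2 * j - 1) * np.pi))    # predicted 2/((2j-1)*pi)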


Chapter 3

Regularisation

We have seen in the previous section that the major source of ill-posedness of inverse problems of the type (1.1) is a fast decay of the singular values of K. An idea to overcome this issue is to define approximations of K† in the following fashion. Consider the family of operators

R_α f := ∑_{j=1}^∞ g_α(σ_j) 〈f, v_j〉_V u_j ,   (3.1)

with functions g_α : R_{>0} → R_{≥0} that converge pointwise to 1/σ as α converges to zero. We are going to see that such an operator R_α is what is called a regularisation (of K†) if g_α is bounded, i.e.

g_α(σ) ≤ C_α   for all σ ∈ R_{>0}.   (3.2)

In case (3.2) holds true, we immediately observe

‖R_α f‖²_U = ∑_{j=1}^∞ g_α(σ_j)² |〈f, v_j〉_V|² ≤ C_α² ∑_{j=1}^∞ |〈f, v_j〉_V|² ≤ C_α² ‖f‖²_V ,

which means that C_α is a bound for the norm of R_α.

Before we examine concrete examples of regularisations of the form (3.1), we want to define what a regularisation actually is, and what properties come along with it.

Definition 3.1. Let K ∈ L(U,V) be a bounded operator. A family {R_α}_{α>0} of linear operators is called a regularisation (or regularisation operator) of K† if

• Rα ∈ L(V,U) for all α > 0;

• R_α f → K†f for all f ∈ D(K†) as α → 0.

Hence, a regularisation is a pointwise approximation of the Moore-Penrose inverse with continuous operators. Note, however, that we usually cannot expect f ∈ D(K†) in most applications, due to measurement and modelling errors. If we however assume that there exists f ∈ D(K†) such that we have

‖f − f^δ‖_V ≤ δ


for the measured data f^δ ∈ V, we can expect that α has to be chosen in accordance with this error δ. Let us therefore consider the overall error ‖R_α f^δ − K†f‖_U, which we can split up as follows:

‖R_α f^δ − K†f‖_U ≤ ‖R_α f^δ − R_α f‖_U + ‖R_α f − K†f‖_U
                  ≤ δ‖R_α‖_{L(V,U)} + ‖R_α f − K†f‖_U .   (3.3)

The first term of (3.3) is the data error; this term unfortunately does not stay bounded for α → 0, which can be concluded from the uniform boundedness principle (also known as the Banach-Steinhaus theorem), cf. [5, Section 2.2, Theorem 2.2]. The second term, however, vanishes for α → 0, due to the pointwise convergence of R_α to K†. Hence it becomes evident from (3.3) that the choice of α depends on δ.

Definition 3.2. A function α : R_{>0} × V → R_{>0}, (δ, f^δ) ↦ α(δ, f^δ), is called a parameter choice rule. We distinguish between

1. a-priori parameter choice rules, if they depend on δ only;

2. a-posteriori parameter choice rules, if they depend on δ and f^δ;

3. heuristic parameter choice rules, if they depend on f^δ only.

If {R_α}_{α>0} is a regularisation of K† and α is a parameter choice rule, then the pair (R_α, α) is called a convergent regularisation, if for all f ∈ D(K†) there exists a parameter choice rule α : R_{>0} × V → R_{>0} such that

lim sup_{δ→0} { ‖R_{α(δ,f^δ)} f^δ − K†f‖_U | f^δ ∈ V, ‖f − f^δ‖_V ≤ δ } = 0   (3.4)

and

lim sup_{δ→0} { α(δ, f^δ) | f^δ ∈ V, ‖f − f^δ‖_V ≤ δ } = 0

are guaranteed.

In case of 1 or 2 we simply write α(δ), respectively α(f^δ), instead of α(δ, f^δ).

For the sake of brevity we are only discussing a-priori parameter choice rules throughout this lecture. In fact, it can be shown that for every regularisation an a-priori parameter choice rule, and thus a convergent regularisation, exists.

Theorem 3.1. Let {R_α}_{α>0} be a regularisation of K†, for K ∈ L(U,V). Then there exists an a-priori parameter choice rule α such that (R_α, α) is a convergent regularisation.

Proof. Let f ∈ D(K†) be arbitrary but fixed. We can find a monotone increasing function σ : R_{>0} → R_{>0} such that for every ε > 0 we have

‖R_{σ(ε)} f − K†f‖_U ≤ ε/2 ,

due to the pointwise convergence R_α → K†. As the operator R_{σ(ε)} is continuous for fixed ε, there exists ρ(ε) > 0 with

‖R_{σ(ε)} g − R_{σ(ε)} f‖_U ≤ ε/2   for all g ∈ V with ‖g − f‖_V ≤ ρ(ε) .


Without loss of generality we can assume ρ to be a continuous, strictly monotone increasing function with lim_{ε→0} ρ(ε) = 0. Then, due to the inverse function theorem, there exists a strictly monotone and continuous function ρ^{-1} on R(ρ) with lim_{δ→0} ρ^{-1}(δ) = 0. We continuously extend ρ^{-1} to R_{>0} and define our a-priori strategy as

α : R_{>0} → R_{>0},   δ ↦ σ(ρ^{-1}(δ)) .

Then lim_{δ→0} α(δ) = 0 follows. Furthermore, for every ε > 0 there exists δ := ρ(ε) such that with α(δ) = σ(ε)

‖R_{α(δ)} f^δ − K†f‖_U ≤ ‖R_{σ(ε)} f^δ − R_{σ(ε)} f‖_U + ‖R_{σ(ε)} f − K†f‖_U ≤ ε

follows for all f^δ ∈ V with ‖f − f^δ‖_V ≤ δ. Thus, (R_α, α) is a convergent regularisation method.

Now we want to turn our attention to the case f ∉ D(K†). As the generalised inverse is not defined for those functions, we cannot expect a convergent regularisation to stay bounded as α → 0. This is confirmed by the following result.

Theorem 3.2. Let K ∈ L(U,V) and let {R_α}_{α>0} be a regularisation of K†. Then u_α := R_α f converges to K†f as α → 0, for f ∈ D(K†). If

sup_{α>0} ‖KR_α‖_{L(V,V)} < ∞ ,

then ‖u_α‖_U → ∞ for f ∉ D(K†).

Proof. The convergence in case of f ∈ D(K†) follows from Theorem 3.1. We therefore only need to consider the case f ∉ D(K†). Assume that there exists a sequence α_k → 0 such that ‖u_{α_k}‖_U is uniformly bounded. Then there exists a weakly convergent subsequence {u_{α_{k_l}}} with some limit u ∈ U, cf. [3, Section 2.2, Theorem 2.1]. As continuous linear operators are also weakly continuous, we further have Ku_{α_{k_l}} ⇀ Ku. However, as the KR_α are uniformly bounded operators, we also conclude Ku_{α_{k_l}} = KR_{α_{k_l}} f ⇀ P_{cl R(K)} f (according to Lemma 2.1). Hence, we have f ∈ D(K†), in contradiction to the assumption f ∉ D(K†).

We can finally characterise a-priori parameter choice strategies that lead to convergent regularisation methods via the following theorem.

Theorem 3.3. Let {R_α}_{α>0} be a regularisation, and α : R_{>0} → R_{>0} an a-priori parameter choice rule. Then (R_α, α) is a convergent regularisation method if and only if

1. lim_{δ→0} α(δ) = 0 ;

2. lim_{δ→0} δ‖R_{α(δ)}‖_{L(V,U)} = 0 .

Proof. ⇐: Let conditions 1 and 2 be fulfilled. From (3.3) we then observe

‖R_{α(δ)} f^δ − K†f‖_U → 0   for δ → 0 .

Hence, (R_α, α) is a convergent regularisation method.

⇒: Now let (R_α, α) be a convergent regularisation method. We prove that conditions 1 and 2 have to follow from this by showing that violation of either one of them leads to a contradiction


to (R_α, α) being a convergent regularisation method. If condition 1 is violated, R_{α(δ)} f does not converge pointwise to K†f, and hence (3.4) is violated for f^δ = f as δ → 0. Hence, (R_α, α) is not a convergent regularisation method. If condition 1 is fulfilled but condition 2 is violated, there exists a null sequence {δ_k}_{k∈N} with δ_k‖R_{α(δ_k)}‖_{L(V,U)} ≥ C > 0, and hence, we can find a sequence {g_k}_{k∈N} ⊂ V with ‖g_k‖_V = 1 and δ_k‖R_{α(δ_k)} g_k‖_U ≥ C. Let f ∈ D(K†) be arbitrary and define f_k := f + δ_k g_k. Then we have on the one hand ‖f − f_k‖_V ≤ δ_k, but on the other hand the norm of

R_{α(δ_k)} f_k − K†f = R_{α(δ_k)} f − K†f + δ_k R_{α(δ_k)} g_k

cannot converge to zero, as the second term δ_k R_{α(δ_k)} g_k is bounded from below by construction, while the first term converges to zero. Hence, (3.4) is violated for f^δ = f_k, and thus (R_α, α) is not a convergent regularisation method.

In the following we want to revisit (3.1) and prove that these methods are indeed regularisation methods for piecewise continuous functions g_α satisfying (3.2).

Theorem 3.4. Let g_α : R_{>0} → R be a piecewise continuous function satisfying (3.2), lim_{α→0} g_α(σ) = 1/σ and

sup_{α,σ} σ g_α(σ) ≤ γ ,   (3.5)

for some constant γ > 0. If R_α is defined as in (3.1), we have

R_α f → K†f   as α → 0,

for all f ∈ D(K†).

Proof. From the singular value decomposition of K† and the definition of R_α we obtain

R_α f − K†f = ∑_{j=1}^∞ (g_α(σ_j) − 1/σ_j) 〈f, v_j〉_V u_j = ∑_{j=1}^∞ (σ_j g_α(σ_j) − 1) 〈u†, u_j〉_U u_j .

From (3.5) we can conclude

|(σ_j g_α(σ_j) − 1) 〈u†, u_j〉_U| ≤ (γ + 1)‖u†‖_U ,

and hence, each element of the sum stays bounded. Thus, we can estimate

lim sup_{α→0} ‖R_α f − K†f‖²_U ≤ lim sup_{α→0} ∑_{j=1}^∞ |σ_j g_α(σ_j) − 1|² |〈u†, u_j〉_U|²
                              ≤ ∑_{j=1}^∞ |lim_{α→0} σ_j g_α(σ_j) − 1|² |〈u†, u_j〉_U|² ,

and due to the pointwise convergence of g_α(σ_j) to 1/σ_j we deduce lim_{α→0} σ_j g_α(σ_j) − 1 = 0. Hence, we have ‖R_α f − K†f‖_U → 0 for all f ∈ D(K†).

Proposition 3.1. Let the same assumptions hold as in Theorem 3.4, and let (3.2) be satisfied. Then (R_α, α) is a convergent regularisation method if

lim_{δ→0} δ C_{α(δ)} = 0

is guaranteed.


Finally, we want to consider the propagation of the data error for the regularisation method (3.1).

Theorem 3.5. Let the same assumptions hold for g_α as in Proposition 3.1. If we define u_α := R_α f and u_α^δ := R_α f^δ, with f ∈ D(K†), f^δ ∈ V and ‖f − f^δ‖_V ≤ δ, then

‖Ku_α − Ku_α^δ‖_V ≤ γδ ,   (3.6)

and

‖u_α − u_α^δ‖_U ≤ C_α δ   (3.7)

hold true.

Proof. From the singular value decomposition we can estimate

‖Ku_α − Ku_α^δ‖²_V ≤ ∑_{j=1}^∞ σ_j² g_α(σ_j)² |〈f − f^δ, v_j〉_V|² ≤ γ² ∑_{j=1}^∞ |〈f − f^δ, v_j〉_V|² ≤ γ² ‖f − f^δ‖²_V ≤ γ²δ² ,

which yields (3.6). In the same fashion we can estimate

‖u_α − u_α^δ‖²_U ≤ ∑_{j=1}^∞ g_α(σ_j)² |〈f − f^δ, v_j〉_V|² ≤ C_α² ∑_{j=1}^∞ |〈f − f^δ, v_j〉_V|² ≤ C_α² ‖f − f^δ‖²_V ≤ C_α²δ² ,

to obtain (3.7).

Combining the assertions of Theorem 3.4, Proposition 3.1 and Theorem 3.5, we obtain the following convergence result for the regularised solutions.

Proposition 3.2. Let the assumptions of Theorem 3.4, Proposition 3.1 and Theorem 3.5 hold true. Then,

u^δ_{α(δ,f^δ)} → u†

is guaranteed as δ → 0.

3.1 Truncated singular value decomposition

As a first example of a spectral regularisation of the form (3.1) we want to consider the so-called truncated singular value decomposition. As the name suggests, the idea is to discard all singular values below a certain threshold. If we identify this threshold with the regularisation parameter α, we can realise this truncation via

g_α(σ) = { 1/σ   σ ≥ α
         { 0     σ < α .   (3.8)


Note that we naturally obtain lim_{α→0} g_α(σ) = 1/σ for σ > 0. Equation (3.1) then reads as

u_α = R_α f = ∑_{σ_j ≥ α} (1/σ_j) 〈f, v_j〉_V u_j ,   (3.9)

for all f ∈ V. Note that the sum in (3.9) is always finite for α > 0, as zero is the only accumulation point of the singular values of compact operators.

From (3.8) we immediately observe g_α(σ) ≤ C_α = 1/α. Thus, according to Proposition 3.1, the truncated singular value decomposition, together with an a-priori parameter choice strategy satisfying lim_{δ→0} α(δ) = 0, is a convergent regularisation method if lim_{δ→0} δ/α(δ) = 0.

Moreover, we observe sup_{σ,α} σ g_α(σ) = γ = 1 and hence, we obtain the error estimates ‖Ku_α − Ku_α^δ‖_V ≤ δ and ‖u_α − u_α^δ‖_U ≤ δ/α as a consequence of Theorem 3.5.
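The following is a minimal matrix sketch of truncated SVD regularisation, assuming NumPy; the discretised integration operator from Example 2.1, the test solution and the noise level are arbitrary choices. It illustrates how the reconstruction error depends on the threshold α:

    import numpy as np

    def tsvd(K, f, alpha):
        # keep only singular values sigma_j >= alpha, cf. (3.8)/(3.9)
        U, s, Vt = np.linalg.svd(K, full_matrices=False)
        keep = s >= alpha
        return Vt[keep].T @ ((U[:, keep].T @ f) / s[keep])

    n = 200
    K = (1.0 / n) * np.tril(np.ones((n, n)))    # integration operator
    x = np.linspace(0.0, 1.0, n)
    u_true = np.cos(np.pi * x)
    f_delta = K @ u_true + 1e-4 * np.random.default_rng(1).standard_normal(n)

    for alpha in [1e-1, 1e-2, 1e-3]:
        print(alpha, np.max(np.abs(tsvd(K, f_delta, alpha) - u_true)))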

3.2 Tikhonov regularisation

The main idea behind Tikhonov regularisation¹ is to shift the singular values of K*K by a constant factor, which will be associated with the regularisation parameter α. This shift can be realised via the function

g_α(σ) = σ / (σ² + α) .

Again, we immediately observe lim_{α→0} g_α(σ) = 1/σ for σ > 0. Further, we can estimate g_α(σ) ≤ 1/(2√α), due to σ² + α ≥ 2√α σ. Moreover, we observe σ g_α(σ) = σ²/(σ² + α) < 1 =: γ for α > 0. The corresponding Tikhonov regularisation (3.1) reads as

u_α = R_α f = ∑_{j=1}^∞ (σ_j / (σ_j² + α)) 〈f, v_j〉_V u_j .   (3.10)

Note that Tikhonov regularisation can be computed without knowledge of the singular system. Considering the equation (K*K + αI)u_α in terms of the singular value decomposition, we observe

(K*K + αI)u_α = ∑_{j=1}^∞ (σ_j / (σ_j² + α)) 〈f, v_j〉_V K*Ku_j + ∑_{j=1}^∞ (ασ_j / (σ_j² + α)) 〈f, v_j〉_V u_j

              = ∑_{j=1}^∞ (σ_j(σ_j² + α) / (σ_j² + α)) 〈f, v_j〉_V u_j = ∑_{j=1}^∞ σ_j 〈f, v_j〉_V u_j = K*f ,

where we have used K*Ku_j = σ_j K*v_j = σ_j² u_j.

Hence, the Tikhonov-regularised solution u_α can be obtained by solving

(K*K + αI)u_α = K*f   (3.11)

for u_α. The advantage of computing u_α via (3.11) is that its computation does not require the singular value decomposition of K, but only involves the inversion of a linear, well-posed operator equation with a symmetric, positive definite operator.
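In the discrete setting, (3.11) is just a well-conditioned linear system. The following is a minimal sketch, assuming NumPy; the operator, test solution and parameter values match the earlier sketches and are arbitrary choices:

    import numpy as np

    def tikhonov(K, f, alpha):
        # solve the regularised normal equations (K^T K + alpha I) u = K^T f,
        # cf. (3.11)
        return np.linalg.solve(K.T @ K + alpha * np.eye(K.shape[1]), K.T @ f)

    n = 200
    K = (1.0 / n) * np.tril(np.ones((n, n)))    # integration operator
    x = np.linspace(0.0, 1.0, n)
    u_true = np.cos(np.pi * x)
    f_delta = K @ u_true + 1e-4 * np.random.default_rng(2).standard_normal(n)

    for alpha in [1e-2, 1e-4, 1e-6]:
        print(alpha, np.max(np.abs(tikhonov(K, f_delta, alpha) - u_true)))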

¹Named after the Russian mathematician Andrey Nikolayevich Tikhonov (30 October 1906 – 7 October 1993).


3.3 Source-conditions

Before we continue to investigate other examples of regularisation, we want to briefly address the question of the convergence speed of a regularisation method. From Theorem 3.5 we have already obtained a convergence rate result; however, with additional regularity assumptions on the (unknown) minimal norm solution we are able to improve it. The regularity assumptions that we want to consider are known as source conditions, and are of the form

∃ w ∈ U :   u† = (K*K)^μ w .   (3.12)

The power μ > 0 of the operator is understood in the sense of taking the μ-th power of the singular values of the operator K*K, i.e.

(K*K)^μ w = ∑_{j=1}^∞ σ_j^{2μ} 〈w, u_j〉_U u_j .

The rate of convergence of a regularisation scheme to the minimal norm solution now depends on the specific choice of g_α. We assume that g_α satisfies

σ^{2μ} |σ g_α(σ) − 1| ≤ ω_μ(α) ,

for all σ > 0. In case of the truncated singular value decomposition we would for instance have ω_μ(α) = α^{2μ} (cf. Example 3.1). With this additional assumption, we can improve the estimate in Theorem 3.4 as follows:

‖R_α f − K†f‖²_U ≤ ∑_{j=1}^∞ |σ_j g_α(σ_j) − 1|² |〈u†, u_j〉_U|²

                = ∑_{j=1}^∞ |σ_j g_α(σ_j) − 1|² σ_j^{4μ} |〈w, u_j〉_U|²

                ≤ ω_μ(α)² ‖w‖²_U .

Hence, we have obtained the estimate

‖u_α − u†‖_U ≤ ω_μ(α)‖w‖_U .

Together with (3.3) we can further estimate

‖u^δ_{α(δ)} − u†‖_U ≤ ω_μ(α)‖w‖_U + C_α δ .   (3.13)

Example 3.1. In case of the truncated singular value decomposition we know from Section 3.1 that C_α = 1/α, and we can further conclude ω_μ(α) = α^{2μ}. Hence, (3.13) simplifies to

‖u^δ_{α(δ)} − u†‖_U ≤ α^{2μ}‖w‖_U + δα^{-1}   (3.14)

in this case. In order to make the right-hand side of (3.14) as small as possible, we have to choose α such that

α = (δ / (2μ‖w‖_U))^{1/(2μ+1)} .


With this choice of α we estimate

‖u^δ_{α(δ)} − u†‖_U ≤ 2^{(1−2μ)/(1+2μ)} μ^{(1−2μ)/(1+2μ)} δ^{2μ/(2μ+1)} ‖w‖_U^{1/(2μ+1)}

                    ≤ 2 δ^{2μ/(2μ+1)} ‖w‖_U^{1/(2μ+1)} ,

using 2^{(1−2μ)/(1+2μ)} ≤ 2 and μ^{(1−2μ)/(1+2μ)} ≤ 1.

It is important to note that, no matter how large μ is, the rate of convergence δ^{2μ/(2μ+1)} will always be slower than δ, due to the ill-posedness of the inversion of K.

3.4 Asymptotic regularisation

Another form of regularisation is asymptotic regularisation of the form

∂_t u(t) = K*(f − Ku(t)) ,   u(0) = 0 .   (3.15)

As the linear operator K does not change with respect to the time t, we can make the ansatz of writing u(t) in terms of the singular value decomposition of K as

u(t) = ∑_{j=1}^∞ γ_j(t) u_j ,   (3.16)

for functions γ_j : R → R. From the initial condition we immediately observe γ_j(0) = 0. From the singular value decomposition and (3.15) we further see

∑_{j=1}^∞ γ′_j(t) u_j = ∑_{j=1}^∞ σ_j (〈f, v_j〉_V − σ_j γ_j(t)) u_j ,

where we have used 〈u_j, u_j〉_U = ‖u_j‖²_U = 1.

Hence, by equating the coefficients we get

γ′_j(t) = σ_j 〈f, v_j〉_V − σ_j² γ_j(t) ,

and together with γ_j(0) = 0 we obtain

γ_j(t) = (1 − e^{−σ_j² t}) (1/σ_j) 〈f, v_j〉_V

as a solution for all j, and hence (3.16) reads as

u(t) = ∑_{j=1}^∞ (1 − e^{−σ_j² t}) (1/σ_j) 〈f, v_j〉_V u_j .

If we substitute t = 1/α, we obtain the regularisation

u_α = ∑_{j=1}^∞ (1 − e^{−σ_j²/α}) (1/σ_j) 〈f, v_j〉_V u_j

with g_α(σ) = (1 − e^{−σ²/α}) (1/σ). We immediately see that σ g_α(σ) ≤ 1 =: γ, and due to e^x ≥ 1 + x we further observe 1 − e^{−σ²/α} ≤ σ²/α, and therefore (1 − e^{−σ²/α})/σ ≤ max_j σ_j/α = σ_1/α = ‖K‖_{L(U,V)}/α =: C_α.


3.5 Landweber iteration

If we approximate (3.15) via a forward finite-difference discretisation, we end up with the iterative procedure

uk+1 − uk

τ= K∗

(f −Kuk

), (3.17)

⇔ uk+1 = uk + τK∗(f −Kuk

),

⇔ uk+1 = (I − τK∗K)uk + τK∗f ,

for some τ > 0 and u⁰ ≡ 0. Iteration (3.17) is known as the Landweber iteration. We first assume f ∈ D(K†); with the singular value decomposition of K and K∗ we obtain

∑_{j=1}^∞ 〈u^{k+1}, u_j〉_U u_j = ∑_{j=1}^∞ ( (1 − τσ_j²) 〈u^k, u_j〉_U + τσ_j 〈f, v_j〉_V ) u_j ,   (3.18)

and hence, by equating the individual summands,

〈u^{k+1}, u_j〉_U = (1 − τσ_j²) 〈u^k, u_j〉_U + τσ_j 〈f, v_j〉_V .   (3.19)

Assuming u⁰ ≡ 0, summing up equation (3.19) yields

〈u^k, u_j〉_U = τσ_j 〈f, v_j〉_V ∑_{i=1}^k (1 − τσ_j²)^{k−i} .   (3.20)

The following lemma will help us simplify (3.20).

Lemma 3.1. For k ∈ N \ {1} we have

∑_{i=1}^k (1 − τσ²)^{k−i} = ( 1 − (1 − τσ²)^k ) / (τσ²) .   (3.21)

Proof. Equation (3.21) can simply be verified via induction. We immediately see that

∑_{i=1}^2 (1 − τσ²)^{2−i} = 1 + (1 − τσ²) = ( 1 − (1 − 2τσ² + τ²σ⁴) ) / (τσ²) = ( 1 − (1 − τσ²)² ) / (τσ²)

serves as our induction base. Considering k → k + 1, we observe

∑_{i=1}^{k+1} (1 − τσ²)^{k+1−i} = 1 + ∑_{i=1}^k (1 − τσ²)^{k+1−i}
= 1 + (1 − τσ²) ∑_{i=1}^k (1 − τσ²)^{k−i}
= 1 + (1 − τσ²) ( 1 − (1 − τσ²)^k ) / (τσ²)
= ( 1 − (1 − τσ²)^{k+1} ) / (τσ²) ,

and we are done.


If we now insert (3.21) into (3.20), we therefore obtain

〈u^k, u_j〉_U = ( 1 − (1 − τσ_j²)^k ) (1/σ_j) 〈f, v_j〉_V .   (3.22)

The important consequence of Equation (3.22) is that we now immediately see that 〈u^k, u_j〉_U → 〈u†, u_j〉_U if we ensure (1 − τσ_j²)^k → 0. In other words, we need to choose τ such that |1 − τσ_j²| < 1 (respectively 0 < τσ_j² < 2) for all j. As in the case of asymptotic regularisation we exploit that σ₁ = ‖K‖_{L(U,V)} ≥ σ_j for all j, and select τ such that

0 < τ < 2 / ‖K‖²_{L(U,V)}

is satisfied. If we interpret the iteration number as the regularisation parameter α := 1/k, we obtain the regularisation method

uα = Rα f = ∑_{j=1}^∞ ( 1 − (1 − τσ_j²)^{1/α} ) (1/σ_j) 〈f, v_j〉_V u_j

with gα(σ) = ( 1 − (1 − τσ²)^{1/α} ) / σ.
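A minimal NumPy sketch of the Landweber iteration (3.17) with the step-size restriction just derived (the test operator and data are illustrative stand-ins; early stopping, i.e. small k and hence large α = 1/k, acts as the regularisation):

```python
import numpy as np

def landweber(K, f, n_iter, tau=None):
    if tau is None:
        tau = 1.0 / np.linalg.norm(K, 2) ** 2   # safely inside (0, 2/||K||^2)
    u = np.zeros(K.shape[1])                    # u^0 = 0
    for _ in range(n_iter):
        u = u + tau * K.T @ (f - K @ u)         # u^{k+1} = u^k + tau K*(f - K u^k)
    return u

rng = np.random.default_rng(0)
K = rng.standard_normal((40, 30))
u_true = rng.standard_normal(30)
f = K @ u_true
# The error decreases as the number of iterations k grows (alpha = 1/k shrinks).
for k in [10, 100, 1000]:
    print(k, np.linalg.norm(landweber(K, f, k) - u_true))
```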

3.6 Variational regularisation

Variational regularisation aims at finding approximations to (1.1) by minimising appropriate functionals of the form

Eα(u) := H(Ku, f) + α J(u) .   (3.23)

Here H : V × V → R ∪ {+∞} and J : U → R ∪ {+∞} represent two functionals over Banach spaces U and V, K ∈ L(U, V) is a linear and continuous operator, and α > 0 is a real, positive constant. The term H is usually named the fidelity or data term, as it measures the deviation between the measured data f and the forward model Ku. The functional J is the regularisation term, as it will impose certain regularity conditions on the unknown u. The regularisation parameter α balances between both terms. It is beyond the scope of this lecture to discuss all the functional-analytic concepts needed to study the minimisation of (3.23) for general functionals H and J. However, we briefly want to discuss three properties that will guarantee existence of a minimiser of (3.23). The first of them is a rather trivial one, assuring that functionals have some effective domain.

Definition 3.3. A functional E : X → R ∪ {+∞} is called proper, if the effective domain

dom(E) := { x ∈ X | E(x) < ∞ }

is not empty.

More importantly, we are going to need that functionals are lower semi-continuous. Roughly speaking this means that the functional values for arguments near an argument x are either close to E(x) or greater than E(x).

Definition 3.4. Let X be a Banach space with topology τ. The functional E : X → R ∪ {+∞} is said to be lower semi-continuous at x ∈ X if

E(x) ≤ lim inf_{k→∞} E(x_k)

for all sequences x_k → x in the topology τ of X.

Page 31: Inverse Problems - University of Cambridge Problems.pdfIn the following we are going to present various examples of inverse problems and highlight the challengesofdealingwiththem

CHAPTER 3. REGULARISATION 31

Together with compactness of the sublevel sets, this leads to the fundamental theorem of optimisation.

Theorem 3.6. Let X be a Banach space with topology τ and let E : X → R ∪ {+∞} be lower semi-continuous. Furthermore, let the level set

{ x ∈ X | E(x) ≤ c }

be non-empty and compact in the topology τ for some c ∈ R. Then there exists a global minimiser x̄ of E, i.e. E(x̄) ≤ E(x) for all x ∈ X.

Proof. Let Ē := inf_{x∈X} E(x). Then a sequence {x_k}_{k∈N} exists with E(x_k) → Ē for k → ∞. For k sufficiently large, E(x_k) ≤ c holds and hence, {x_k}_{k∈N} is contained in a compact set. As a consequence, a subsequence {x_{k_l}}_{l∈N} exists with x_{k_l} → x̄, for l → ∞, for some x̄ ∈ X. From the lower semi-continuity of E we obtain

Ē ≤ E(x̄) ≤ lim inf_{l→∞} E(x_{k_l}) ≤ Ē ,

consequently x̄ is a global minimiser.

For the sake of brevity, we are not going to prove lower semi-continuity and compactness of the sublevel sets for any of the functionals that we are going to consider. Nevertheless we want to emphasise (without proof) that all functionals considered throughout this lecture are lower semi-continuous and have compact sublevel sets in the weak-* topology of the corresponding Banach space. According to the Theorem of Banach-Alaoglu (cf. [9, Theorem 1.9.13]), the set { v ∈ X∗ | ‖v‖_{X∗} ≤ c }, for c > 0, is compact in the weak-* topology. Hence, we could conclude existence of a global minimiser for a given infinite-dimensional optimisation problem if we were able to prove lower semi-continuity in the weak-* topology. Unfortunately this is not as easy as it sounds and certainly beyond the scope of this lecture.

Before we proceed to our first example of a variational regularisation model, we further want to recall the concepts of directional and Fréchet-derivatives of functionals.

Definition 3.5. Let E : X → R ∪ {+∞} be a functional or operator. The directional derivative (also called first variation) at position x ∈ X in direction y ∈ X is defined as

d_y E(x) := lim_{t↓0} ( E(x + ty) − E(x) ) / t ,   (3.24)

if that limit exists.

Note that if we define the function ϕ_y(τ) := E(x + τy), the directional derivative of E is equivalent to ϕ′_y(τ)|_{τ=0}.

Definition 3.6. Let E : X → R ∪ {+∞} be a functional mapping from the Banach space X, and let d_y E(x) exist for all y ∈ X. If a continuous linear functional E′(x) ∈ X∗ exists such that

(E′(x))(y) = d_y E(x)   ∀y ∈ X ,

and

|E(x + y) − E(x) − (E′(x))(y)| / ‖y‖_X → 0   for ‖y‖_X → 0

holds true, then E is called Fréchet-differentiable in x and E′(x) the Fréchet-derivative in x. If the Fréchet-derivative exists for all x ∈ X, the operator E′ : X → X∗ is called the Fréchet-derivative of E.


3.6.1 Tikhonov-type regularisation

First of all we want to show that the solution of the Tikhonov regularisation (3.11) can be interpreted as the minimiser of a functional of the form (3.23).

Theorem 3.7. For f ∈ V the Tikhonov-regularised solution uα = Rα f, with Rα as defined in (3.10), is uniquely determined as the global minimiser of the Tikhonov functional

Tα(u) := (1/2)‖Ku − f‖²_V + (α/2)‖u‖²_U .   (3.25)

Proof. ⇒: A global minimiser ū ∈ U of Tα is characterised via Tα(ū) ≤ Tα(u) for all u ∈ U. Hence, it follows from

Tα(u) − Tα(uα) = (1/2)‖Ku − f‖²_V + (α/2)‖u‖²_U − (1/2)‖Kuα − f‖²_V − (α/2)‖uα‖²_U
= (1/2)‖Ku‖²_V − 〈Ku, f〉 + (α/2)‖u‖²_U − (1/2)‖Kuα‖²_V + 〈Kuα, f〉 − (α/2)‖uα‖²_U + 〈(K∗K + αI)uα − K∗f, uα − u〉   (the last dual product vanishes due to (3.11))
= (1/2)‖Ku − Kuα‖²_V + (α/2)‖u − uα‖²_U
≥ 0

that uα is a global minimiser of Tα.
⇐: If we have Tα(ū) ≤ Tα(u) for all u ∈ U, it follows with u = ū + τv, for arbitrary τ > 0 and fixed v ∈ U, that

0 ≤ Tα(ū + τv) − Tα(ū) = (τ²/2)‖Kv‖²_V + (τ²α/2)‖v‖²_U + τ〈(K∗K + αI)ū − K∗f, v〉_U

holds true. Dividing by τ and subsequent consideration of the limit τ ↓ 0 thus yields

〈(K∗K + αI)ū − K∗f, v〉_U ≥ 0 ,   for all v ∈ U .

Since v ∈ U is arbitrary, we may replace v by −v, which forces (K∗K + αI)ū = K∗f. Hence, we can conclude ū = uα, and that uα is the unique minimiser of (3.25).

We therefore see that uα satisfying (3.10) (resp. (3.11)) is also the solution of a variational regularisation problem of the form (3.23), with H(Ku, f) = (1/2)‖Ku − f‖²_V and J(u) = (1/2)‖u‖²_U. A nice property of this representation as the minimiser of a functional is that we can generalise Tikhonov regularisation by choosing different regularisation functionals for J, i.e.

Tα(u) := (1/2)‖Ku − f‖²_V + α J(u) ,   (3.26)

for general J : U → R ∪ {+∞}. We want to denote this regularisation as Tikhonov-type regularisation and present a few examples in the following. Note that Tikhonov-type regularisation for general J is not necessarily a regularisation in the sense of Definition 3.1, as we cannot expect uα := arg min_{u∈U} Tα(u) → u† for α → 0. However, we can generalise Definition 2.1 to justify calling the minimisation of (3.26) a regularisation.

Definition 3.7. An element u ∈ U is called a J-minimising solution of (1.1), if

J(u) = min { J(w) | w satisfies (2.1) } .   (3.27)


Clearly, for f^δ with ‖f − f^δ‖_V ≤ δ and u† satisfying (3.27), we can expect uα → u† for α → 0. However, it is beyond the scope of this lecture to verify this or any other generalised concepts of regularisation, and we leave it to the interested reader to explore more about possible generalisations of regularisation.

Before we restrict ourselves to convex functionals and address key properties of them and their minimisation, we want to give a few examples of regularisation functionals used in the context of Tikhonov-type regularisation.

Example 3.2 (Squared norm with linear operator). The easiest way to extend classical Tikhonov regularisation to a more general regularisation method is to replace ‖u‖²_U/2 with

J(u) := (1/2)‖Du‖²_W ,

for an operator D ∈ L(U, W). An example of an operator D that is commonly used in this framework is the gradient operator ∇ ∈ L(L²(Ω), L²(Ω; Rⁿ)). In this context it is also common to use the H¹-norm as a regulariser, i.e.

J(u) = (1/2)‖u‖²_{H¹(Ω)} := (1/2)‖u‖²_{L²(Ω)} + (1/2)‖∇u‖²_{L²(Ω;Rⁿ)} .
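A small sketch of this construction (assuming a simple 1-D discretisation; the integration operator K, the noise level and α are illustrative choices): with J(u) = ‖Du‖²/2, the minimiser of (3.26) solves the normal equations (K∗K + αD∗D)u = K∗f.

```python
import numpy as np

def generalised_tikhonov(K, f, alpha):
    n = K.shape[1]
    D = np.diff(np.eye(n), axis=0)              # (n-1) x n forward differences
    # Normal equations (K*K + alpha D*D) u = K* f of the quadratic functional.
    return np.linalg.solve(K.T @ K + alpha * D.T @ D, K.T @ f)

rng = np.random.default_rng(0)
n = 100
x = np.linspace(0, 1, n)
u_true = np.sin(2 * np.pi * x)                  # smooth ground truth
K = np.tril(np.ones((n, n))) / n                # discrete integration operator
f = K @ u_true + 1e-3 * rng.standard_normal(n)  # noisy data
u_alpha = generalised_tikhonov(K, f, alpha=1e-4)
print("relative error:", np.linalg.norm(u_alpha - u_true) / np.linalg.norm(u_true))
```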

Example 3.3 (Maximum-entropy regularisation). Maximum-entropy regularisation is of particular interest if solutions of the inverse problem are assumed to be probability density functions, i.e. functions in the space

PDF(Ω) := { u ∈ L¹(Ω) | ∫_Ω u(x) dx = 1, u ≥ 0 } .

The (negative) entropy used in physics and information theory is defined as the functional J : PDF(Ω) → R_{≥−1} with

J(u) := ∫_Ω u(x) log(u(x)) − u(x) dx ,

and the convention 0 log(0) := 0. The corresponding Tikhonov-type regularisation reads as

uα = arg min_{u ∈ PDF(Ω)} { (1/2)‖Ku − f‖²_V + α ∫_Ω u(x) log(u(x)) − u(x) dx } ,   (3.28)

for operators K ∈ L(PDF(Ω), V).

Example 3.4 (ℓ¹ regularisation). When it comes to non-injective operators K ∈ L(ℓ², ℓ²) between sequence spaces, the ℓ¹-norm

J(u) = ‖u‖_{ℓ¹} := ∑_{j=1}^∞ |u_j|

is often used as a regulariser, in order to enforce sparse solutions. The corresponding Tikhonov-type regularisation reads as

uα ∈ arg min_{u ∈ ℓ¹} { (1/2)‖Ku − f‖²_{ℓ²} + α ∑_{j=1}^∞ |u_j| } .

Note that the ℓᵖ spaces are increasing in p, which implies ‖u‖_{ℓ²} ≤ ‖u‖_{ℓ¹} and hence u ∈ ℓ² if u ∈ ℓ¹.
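One standard way (among several) to minimise this functional numerically is iterative soft-thresholding (ISTA), a forward-backward scheme combining a Landweber step with the proximal map of the ℓ¹-norm; the sketch below uses an arbitrary random operator and a sparse test signal:

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal map of t * ||.||_1 applied componentwise.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(K, f, alpha, n_iter=500):
    tau = 1.0 / np.linalg.norm(K, 2) ** 2       # step size <= 1/||K||^2
    u = np.zeros(K.shape[1])
    for _ in range(n_iter):
        u = soft_threshold(u + tau * K.T @ (f - K @ u), tau * alpha)
    return u

rng = np.random.default_rng(0)
K = rng.standard_normal((50, 200))
u_true = np.zeros(200)
u_true[[5, 42, 137]] = [1.0, -2.0, 0.5]         # sparse ground truth
f = K @ u_true
u_alpha = ista(K, f, alpha=0.1)
print("recovered support:", np.nonzero(np.abs(u_alpha) > 1e-3)[0])
```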


Example 3.5 (Total variation regularisation). Total variation as a regulariser has originally been introduced for image-denoising and -restoration applications, with the goal of preserving edges in images, respectively discontinuities in signals [8]. For smooth signals u ∈ W^{1,1}(Ω) the total variation is simply defined as

J(u) := ∫_Ω ‖(∇u)(x)‖₂ dx ;

a more rigorous definition that also includes functions with discontinuities is given via the dual formulation

J(u) = TV(u) := sup_{ϕ ∈ C₀^∞(Ω;Rⁿ), ‖‖ϕ‖₂‖_∞ ≤ 1} ∫_Ω u(x) (div ϕ)(x) dx .

Note that the choice of the vector norm is somewhat arbitrary and could be replaced by other p-vector norms. The corresponding Tikhonov-type regularisation reads as

uα ∈ arg min_{u ∈ BV(Ω)} { (1/2)‖Ku − f‖²_V + α TV(u) } ,   (3.29)

for K ∈ L(BV(Ω), V), and where BV(Ω) denotes the space of functions of bounded variation, defined as

BV(Ω) := { u ∈ L¹(Ω) | TV(u) < ∞ } .

3.7 Convex Variational Calculus

Definition 3.8 (Convex Set). Let X be a Banach space. A subset C ⊆ X is called convex, if

λx + (1 − λ)y ∈ C

for all λ ∈ [0, 1] and all x, y ∈ C.

In analogy we can define (strictly) convex functionals on convex sets.

Definition 3.9 (Convex Functional). Let C be a convex set. A functional E : C → R ∪ {+∞} is called convex, if

E(λx + (1 − λ)y) ≤ λE(x) + (1 − λ)E(y)   (3.30)

for all λ ∈ [0, 1] and all x, y ∈ C. The functional E is called strictly convex, if equality in (3.30) only holds for x = y or λ ∈ {0, 1}.

Example 3.6 (Absolute Value Function). For C = R and E : R → R≥0 the absolute value function E(x) = |x| is convex, since the absolute value is a norm on R, and the triangle inequality together with homogeneity yields |λx + (1 − λ)y| ≤ λ|x| + (1 − λ)|y|. Obviously, the absolute value function is not strictly convex.

Example 3.7 (Characteristic Function). The characteristic function χ_C : X → R ∪ {+∞} with

χ_C(x) := { 0 if x ∈ C , +∞ if x ∉ C }   (3.31)

is convex if C ⊆ X is a convex subset.


We want to generalise the notion of differentiability to non-differentiable but convex functionals.

Definition 3.10 (Subdifferential). Let X be a Banach space with dual space X∗, and let the functional E : X → R ∪ {+∞} be convex. Then, E is called subdifferentiable at x ∈ X, if there exists an element p ∈ X∗ such that

E(y) − E(x) − 〈p, y − x〉_X ≥ 0

holds for all y ∈ X. Furthermore, we call p a subgradient at position x. The collection of all subgradients at position x, i.e.

∂E(x) := { p ∈ X∗ | E(y) − E(x) − 〈p, y − x〉_X ≥ 0 , ∀y ∈ X } ⊂ X∗ ,

is called the subdifferential of E at x.

For non-differentiable functionals the subdifferential can be multivalued; we want to consider the subdifferential of the absolute value function as an illustrative example.

Example 3.8. Let X = R, and let E : R → R≥0 be the absolute value function E(x) = |x|. Then, the subdifferential of E at x is given by

∂E(x) = sign(x) := { {1} for x > 0 ;  [−1, 1] for x = 0 ;  {−1} for x < 0 } .

This can easily be verified by case differentiation. For x = 0 the inequality |y| ≥ py is obviously fulfilled for all y ∈ R iff p ∈ [−1, 1]. For x > 0 the inequality |y| ≥ py + (1 − p)x is fulfilled for every y ∈ R iff p = 1. In analogy, p = −1 has to hold for x < 0 in order to guarantee |y| ≥ py − (1 + p)x for each y ∈ R.

Throughout the remainder of this lecture we are particularly interested in convex, non-differentiable and absolutely one-homogeneous functionals (like the ℓ¹-norm or the total variation), i.e. we want to consider functionals E for which E(cx) = |c|E(x) is satisfied for every c ∈ R. We therefore characterise the subdifferential of (absolutely) one-homogeneous functionals with the following lemma.

Lemma 3.2 (Subdifferential for One-Homogeneous Functionals). Let E : X → R ∪ {+∞} be a convex and one-homogeneous functional. Then, the subdifferential of E at x can equivalently be written as

∂E(x) = { p ∈ X∗ | 〈p, x〉 = E(x) , 〈p, y〉_X ≤ E(y) ∀y ∈ X } .

Proof. Let p ∈ ∂E(x). If we consider y = 0 ∈ X in the definition of the subdifferential, we immediately see

〈p, x〉 ≥ E(x) ,

while for y = 2x ∈ X we obtain

〈p, x〉 ≤ E(2x) − E(x) = 2E(x) − E(x) = E(x) ,

due to the one-homogeneity of E. Thus, 〈p, x〉 = E(x) follows. Moreover, if we insert 〈p, x〉 = E(x) into the definition of the subdifferential, we get

〈p, y〉 ≤ E(y)

for all y ∈ X. Conversely, any p ∈ X∗ with 〈p, x〉 = E(x) and 〈p, y〉 ≤ E(y) for all y ∈ X satisfies E(y) − E(x) − 〈p, y − x〉 = E(y) − 〈p, y〉 ≥ 0, and is therefore a subgradient of E at x.

Page 36: Inverse Problems - University of Cambridge Problems.pdfIn the following we are going to present various examples of inverse problems and highlight the challengesofdealingwiththem

36 3.7. CONVEX VARIATIONAL CALCULUS

Example 3.9. If we consider the one-homogeneous absolute value function E : R → R≥0 with E(x) = |x| again, we see that xp = |x| implies p = sign(x) for x ≠ 0. Moreover, we discover that for arguments y with the same sign as x we obtain yp = |y|, while for arguments y having the opposite sign as x, we get yp = −|y| < |y|.

Considering (3.23) again, we now and for the rest of the lecture assume H and J to be convex (in addition to the assumption of being proper and lower semi-continuous). Further, we assume H to be Fréchet-differentiable with respect to u. For this setup we can state the following useful characterisation of the subdifferential (without proof).

Theorem 3.8. Let H : V × V → R ∪ {+∞} and J : U → R ∪ {+∞} be proper, convex and lower semi-continuous functionals, for Banach spaces U and V. Let furthermore K ∈ L(U, V), and let H be Fréchet-differentiable with respect to u. Then, for α > 0, the subdifferential of the functional Eα as defined in (3.23) is given via

∂(H(K·, f) + αJ)(u) = K∗H′(Ku, f) + α ∂J(u) .

Proof. See [7] or [4, Proposition 5.6].

Finally we want to characterise global minimisers of convex functionals.

Theorem 3.9 (Generalised Fermat). Let E : X → R ∪ {+∞} be a proper, convex functional. An element x ∈ X is a (global) minimiser of E iff 0 ∈ ∂E(x).

Proof. For 0 ∈ ∂E(x) we have

0 = 〈0, y − x〉 ≤ E(y) − E(x)

for all y ∈ X by definition of the subdifferential. Thus, x is a global minimiser of E. If 0 ∉ ∂E(x) holds, then there exists at least one y ∈ X such that

E(y) − E(x) < 〈0, y − x〉 = 0 ,

and hence, x cannot be a minimiser of E.

It is worth mentioning that all the definitions and theorems derived in this chapter can be transferred to (strictly) concave functionals and (global) maximisation problems. In particular, if E is a (strictly) concave functional then −E is (strictly) convex, and the characterisation of a global maximum of E is identical to the characterisation of a global minimum of −E. We will make use of this fact throughout the next chapter when it comes to saddle-point problems.

To conclude this chapter, we want to introduce the useful notion of the generalised Bregman distance.

Definition 3.11. For a proper, lower semi-continuous and convex functional E : X → R ∪ {+∞} with non-empty subdifferential ∂E, the generalised Bregman distance is defined as

D_E^p(x, y) := E(x) − E(y) − 〈p, x − y〉 ,   p ∈ ∂E(y) .   (3.32)

The generalised Bregman distance is not a distance in the classical sense. It does satisfy D_E^p(x, y) ≥ 0 for all x, y ∈ X; however, it is neither symmetric nor does it satisfy a triangle inequality in general. Nevertheless, we can symmetrise the Bregman distance by simply adding the two Bregman distances with interchanged arguments.


Definition 3.12. Let the same assumptions hold as in Definition 3.11. The generalised symmetric Bregman distance is defined as

D_E^symm(x₁, x₂) := D_E^{p₁}(x₂, x₁) + D_E^{p₂}(x₁, x₂) = 〈x₁ − x₂, p₁ − p₂〉 ,

for p₁ ∈ ∂E(x₁) and p₂ ∈ ∂E(x₂).

Note that we have omitted p₁ and p₂ in the notation of D_E^symm only for the sake of brevity. The generalised symmetric Bregman distance will be a helpful tool throughout the next chapter for the convergence analysis of the so-called primal-dual hybrid gradient method for the iterative solution of convex variational regularisation problems.
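As a small illustration (with E = ‖·‖₁ and the subgradient selection p = sign(x), which is valid since sign(0) = 0 ∈ [−1, 1]; the test points are arbitrary), the symmetric Bregman distance reduces to the dual pairing above:

```python
import numpy as np

def symmetric_bregman_l1(x1, x2):
    # One subgradient selection per point; D^symm = <x1 - x2, p1 - p2>.
    p1, p2 = np.sign(x1), np.sign(x2)
    return np.dot(x1 - x2, p1 - p2)

x1 = np.array([1.0, -2.0, 3.0])
x2 = np.array([0.5, 1.0, 3.0])
print(symmetric_bregman_l1(x1, x2))   # nonnegative; zero where signs agree
```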


Chapter 4

Computational realisation

In this final chapter of the lecture we want to focus on how to implement variational regularisation methods numerically. We focus on variational regularisation methods as they appear to be the most general class of regularisation methods that we have studied throughout this lecture. However, we will only consider convex variational regularisation methods, as we want to guarantee that the iterative methods that we are going to consider converge to global minimisers.

4.1 A primal dual hybrid gradient method

Many Tikhonov-type regularisation problems of the form (3.26) can be formulated as members of the following class of saddle-point problems:

inf_u sup_v { F(u) − G(v) + 〈Du, v〉 } .   (4.1)

Here both F and G are proper, lower semi-continuous and convex functionals, and D ∈ L(U, W). Before we continue to study (4.1) and potential numerical solutions, we want to give a quick example.

Example 4.1 (Total variation regularisation). Note that we can reformulate (3.29) as

uα ∈ arg min_{u ∈ BV(Ω)} { (1/2)‖Ku − f‖²_V + sup_v { 〈u, α div v〉 − χ_C(v) } } ,

with χ_C being a characteristic function as defined in (3.31) and

C := { v ∈ C₀^∞(Ω; Rⁿ) | ‖‖v‖₂‖_{L∞(Ω)} ≤ 1 } .

In order to discuss how to solve (4.1) numerically, we want to consider the optimality conditions of the saddle-point problem first. For any subgradients p ∈ ∂F(u) and q ∈ ∂G(v) that satisfy

p + D∗v = 0 ,
q − Du = 0 ,   (4.2)

the point (u, v) is a saddle point of (4.1), due to the convexity of F and G. We therefore want to approach finding a saddle point of (4.1) via the iterative scheme

( p^{k+1} + D∗v^{k+1} ; q^{k+1} − Du^{k+1} ) + M ( u^{k+1} − u^k ; v^{k+1} − v^k ) = ( 0 ; 0 ) ,   (4.3)


for p^{k+1} ∈ ∂F(u^{k+1}) and q^{k+1} ∈ ∂G(v^{k+1}), and a 2 × 2 operator matrix M. We immediately observe that (4.3) should approach (4.2) in the limit (for properly chosen M). Hence, the important question is how to choose M such that (4.3) results in a convergent algorithm with relatively simple update steps for u^{k+1} and v^{k+1}. From our intuitive knowledge as numerical analysts we would guess that M should be positive definite and maybe even self-adjoint, to guarantee convergence of (4.3). But beyond that? A naïve choice for M could simply be the identity; however, in that case each update step is implicit in both u^{k+1} and v^{k+1}, which may not simplify solving our saddle-point problem at all. Alternatively, we propose to use

M := ( (1/τ) I   −D∗
       −D        (1/σ) I )   (4.4)

instead, with τ and σ being positive scalars. For this choice, (4.3) simplifies to the equations

u^{k+1} − (u^k − τ D∗v^k) + τ p^{k+1} = 0 ,   (4.5)
v^{k+1} − (v^k + σ D(2u^{k+1} − u^k)) + σ q^{k+1} = 0 .   (4.6)

Due to p^{k+1} ∈ ∂F(u^{k+1}) and q^{k+1} ∈ ∂G(v^{k+1}), Equations (4.5) and (4.6) can be rewritten as

u^{k+1} = (I + τ ∂F)^{−1}(u^k − τ D∗v^k) ,   (4.7)
v^{k+1} = (I + σ ∂G)^{−1}(v^k + σ D(2u^{k+1} − u^k)) ,   (4.8)

where (I + α∂E)^{−1} is short for

(I + α∂E)^{−1}(f) := arg min_u { (1/2)‖u − f‖₂² + α E(u) } .   (4.9)

Operations of the form (4.9) are known as the proximal or resolvent operator with respect to the convex functional E.
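Two resolvent operators with well-known closed forms, checked in one dimension against a brute-force grid minimisation of (4.9) (an illustrative sanity check, not part of the notes):

```python
import numpy as np

def prox_sq_norm(f, a):          # E(u) = u^2/2  ->  prox(f) = f / (1 + a)
    return f / (1.0 + a)

def prox_abs(f, a):              # E(u) = |u|    ->  soft-thresholding
    return np.sign(f) * max(abs(f) - a, 0.0)

grid = np.linspace(-5, 5, 200001)
for f, a in [(2.3, 0.7), (-0.4, 1.1)]:
    for prox, E in [(prox_sq_norm, lambda u: u**2 / 2), (prox_abs, np.abs)]:
        # Minimise (4.9) by brute force and compare with the closed form.
        brute = grid[np.argmin(0.5 * (grid - f) ** 2 + a * E(grid))]
        assert abs(prox(f, a) - brute) < 1e-3
print("closed-form resolvents match brute-force minimisers")
```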

The iterates (4.7) and (4.8) are known as the primal-dual hybrid gradient method (PDHGM). Before we prove actual convergence of those iterates to a saddle point of (4.1), we want to highlight what makes PDHGM so useful. First of all, the particular choice of M in (4.4) decouples u^{k+1} and v^{k+1} in the update for u^{k+1}, which makes updating u^{k+1} and v^{k+1} in an alternating fashion possible. Secondly, PDHGM now only requires basic arithmetic operations, operator adjoints and multiplications, and the evaluation of the resolvent operations with respect to F and G. If these are simple, the overall PDHGM is simple. We want to take a look at the simple total variation problem to see that in that case the resolvent operations are particularly easy.

Example 4.2. Note that the resolvent operators in case of Example 4.1 are given as

u = (I + τ ∂F)^{−1}(w) = arg min_u { (1/2)‖u − w‖₂² + (τ/2)‖Ku − f‖₂² }   (4.10)

and

v = (I + σ ∂G)^{−1}(z) = arg min_v { (1/2)‖v − z‖_F² + σ χ_C(v) } .   (4.11)

The resolvent operator (4.10) simply resembles the variational form of (shifted) Tikhonov regularisation. Hence, we can solve (4.10) via

u = (I + τK∗K)^{−1}(w + τK∗f) ,


according to Equation (3.11). Further, we can figure out via case differentiation (left as an exercise) that the resolvent operator (4.11) can be computed via

v_j = z_j / max(1, ‖z‖₂) ,

where z_j denotes the j-th vectorial component of the vector field z, for j ∈ {1, . . . , n}. Hence, PDHGM reads as

u^{k+1} = (I + τK∗K)^{−1}( u^k + τ(div v^k + K∗f) ) ,   (4.12)
v^{k+1}_j = ( v^k + σ∇(2u^{k+1} − u^k) )_j / max(1, ‖v^k + σ∇(2u^{k+1} − u^k)‖₂)   for all j ∈ {1, . . . , n} ,   (4.13)

in case of total variation regularisation, due to ∇∗ = −div. Note that in case of ROF-denoising as proposed in [8], Equation (4.12) simplifies to

u^{k+1} = ( u^k + τ(div v^k + f) ) / (1 + τ) .

Note that for real-world applications one obviously has to find appropriate discretisations of K and ∇.
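The following sketch implements (4.12)–(4.13) for ROF denoising (K = I), keeping the factor α explicit in D = α∇; the forward-difference gradient with div = −∇∗ is one standard discretisation, the image and parameters are illustrative, and τ and σ are chosen such that τσ‖α∇‖² < 1 (with ‖∇‖² ≤ 8 for this discretisation):

```python
import numpy as np

def grad(u):
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]           # forward differences, rows
    gy[:, :-1] = u[:, 1:] - u[:, :-1]           # forward differences, columns
    return np.stack((gx, gy))

def div(v):                                     # discrete divergence, div = -grad^*
    d = np.zeros_like(v[0])
    d[:-1, :] += v[0][:-1, :]; d[1:, :] -= v[0][:-1, :]
    d[:, :-1] += v[1][:, :-1]; d[:, 1:] -= v[1][:, :-1]
    return d

def rof_pdhgm(f, alpha, n_iter=200, tau=0.25, sigma=0.25):
    u = np.zeros_like(f); v = np.zeros((2,) + f.shape)
    for _ in range(n_iter):
        # (4.12) with K = I: resolvent of the quadratic data term.
        u_new = (u + tau * (div(alpha * v) + f)) / (1.0 + tau)
        # (4.13): dual ascent step followed by pointwise projection onto C.
        v = v + sigma * alpha * grad(2 * u_new - u)
        v = v / np.maximum(1.0, np.sqrt(v[0] ** 2 + v[1] ** 2))
        u = u_new
    return u

rng = np.random.default_rng(0)
f = np.zeros((64, 64)); f[16:48, 16:48] = 1.0
f += 0.1 * rng.standard_normal(f.shape)         # noisy piecewise constant image
u = rof_pdhgm(f, alpha=0.2)
print("noise std before/after:", f[:8, :8].std(), u[:8, :8].std())
```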

4.1.1 Convergence of PDHGM

Before we prove convergence of the iterates under suitable conditions on τ and σ, we want to prove that M as defined in (4.4) is positive definite (under suitable conditions on τ and σ).

Lemma 4.1. Consider the self-adjoint matrix M as defined in (4.4). Then, M is positive definite if 0 < τσ‖D‖² < 1.

Proof. In order to prove positive definiteness of M we need to show that

〈 M (u ; v)ᵀ , (u ; v)ᵀ 〉 > 0   (4.14)

holds true for all (u, v)ᵀ ≠ (0, 0)ᵀ. Evaluating the dual product in (4.14) yields the condition

(1/τ)‖u‖² + (1/σ)‖v‖² − 2〈Du, v〉 > 0 .   (4.15)

With the help of Young's inequality, we can estimate −2〈Du, v〉 from below via

−2〈Du, v〉 ≥ − ( τσ‖D‖² (c/τ) ‖u‖² + (1/(cσ)) ‖v‖² ) ,

for some constant c > 0. Due to the condition τσ‖D‖² < 1 it is clear that we can find an ε > 0 with (1 + ε)² τσ‖D‖² = 1. This further implies 1/((1 + ε)τσ‖D‖²) = 1 + ε and hence, we can choose c = 1 + ε = 1/((1 + ε)τσ‖D‖²) to obtain

−2〈Du, v〉 ≥ − ( 1/(1 + ε) ) ( (1/τ)‖u‖² + (1/σ)‖v‖² ) .

Page 42: Inverse Problems - University of Cambridge Problems.pdfIn the following we are going to present various examples of inverse problems and highlight the challengesofdealingwiththem

42 4.1. A PRIMAL DUAL HYBRID GRADIENT METHOD

Inserting this estimate into (4.15) therefore yields

( ε/(1 + ε) ) ( (1/τ)‖u‖² + (1/σ)‖v‖² ) > 0

for (u, v)ᵀ ≠ (0, 0)ᵀ, since ε > 0.

With the help of Lemma 4.1 we are able to prove convergence of the iterates (4.7) and (4.8) to a saddle point of (4.1).

Theorem 4.1. Let 0 < τσ‖D‖² < 1. Then the sequence (u^k, v^k) defined via (4.7) and (4.8) satisfies the following properties (for ‖u⁰‖ < ∞ and ‖v⁰‖ < ∞):

• lim_{k→∞} D_F^symm(u^k, u) = 0 ,
• lim_{k→∞} D_G^symm(v^k, v) = 0 ,
• all limit points of (u^k, v^k) are saddle points of (4.1).

Here (u, v) denotes a saddle point satisfying (4.2).

Proof. Note that we can combine (4.2) and (4.3) with M as defined in (4.4) to

( p^{k+1} − p + D∗(v^{k+1} − v) ; q^{k+1} − q − D(u^{k+1} − u) ) + M ( u^{k+1} − u^k ; v^{k+1} − v^k ) = ( 0 ; 0 ) .   (4.16)

Taking the dual product of ( p^{k+1} − p + D∗(v^{k+1} − v) ; q^{k+1} − q − D(u^{k+1} − u) ) with ( u^{k+1} − u ; v^{k+1} − v ) yields

〈 ( p^{k+1} − p + D∗(v^{k+1} − v) ; q^{k+1} − q − D(u^{k+1} − u) ) , ( u^{k+1} − u ; v^{k+1} − v ) 〉 = D_F^symm(u^{k+1}, u) + D_G^symm(v^{k+1}, v) ≥ 0 ,

since the terms involving D and D∗ cancel.

Hence, we can conclude

0 ≥ 〈 M ( u^{k+1} − u^k ; v^{k+1} − v^k ) , ( u^{k+1} − u ; v^{k+1} − v ) 〉 = 〈 ( u^{k+1} − u^k ; v^{k+1} − v^k ) , M ( u^{k+1} − u ; v^{k+1} − v ) 〉

due to M being self-adjoint. By using the notation w^k := (u^k, v^k)ᵀ and w := (u, v)ᵀ, and ‖w‖_M := √〈Mw, w〉, we observe

0 ≥ ‖w^{k+1} − w^k‖²_M + 〈 w^k − w , M(w^{k+1} − w^k) 〉
= ‖w^{k+1} − w^k‖²_M + (1/2) ( ‖w^{k+1} − w‖²_M − ‖w^k − w‖²_M − ‖w^{k+1} − w^k‖²_M )
= (1/2) ( ‖w^{k+1} − w^k‖²_M − ‖w^k − w‖²_M + ‖w^{k+1} − w‖²_M ) ,

using the polarisation identity 2〈Ma, b〉 = ‖a + b‖²_M − ‖a‖²_M − ‖b‖²_M, and thus we conclude

‖w^k − w‖²_M ≥ ‖w^{k+1} − w‖²_M + ‖w^{k+1} − w^k‖²_M .


As ‖·‖_M is a norm, due to M being self-adjoint and positive definite, this implies boundedness of the sequence w^k. Summing up the inner product of (4.16) with w^{k+1} − w over all k therefore yields

∑_{k=0}^∞ [ 2 ( D_F^symm(u^{k+1}, u) + D_G^symm(v^{k+1}, v) ) + ‖w^{k+1} − w^k‖²_M ]
= ∑_{k=0}^∞ ( ‖w^k − w‖²_M − ‖w^{k+1} − w‖²_M ) ≤ ‖w⁰ − w‖²_M < ∞ .

This implies ‖w^{k+1} − w^k‖_M → 0, D_F^symm(u^{k+1}, u) → 0 and D_G^symm(v^{k+1}, v) → 0 for k → ∞. Due to the boundedness of w^k there exists a subsequence w^{k_l} that converges to a limit point w^∞ := lim_{l→∞} w^{k_l}. We have to verify that w^∞ is a saddle point, which, for s^∞ := lim_{l→∞} s^{k_l} with s^k := (p^k, q^k)ᵀ, follows immediately from (4.16).

4.1.2 Deconvolution with total variation regularisation

An example of total variation regularisation for the inverse problem of image convolution is given as Exercise 2 on Exercise Sheet 3.


Bibliography

[1] Heinz Werner Engl. Integralgleichungen. Springer-Verlag, 1997.

[2] Heinz Werner Engl, Martin Hanke, and Andreas Neubauer. Regularization of Inverse Problems, volume 375. Springer Science & Business Media, 1996.

[3] Charles W. Groetsch. Stable Approximate Evaluation of Unbounded Operators. Springer, 2006.

[4] Jan Lellmann. Convex Optimization with Applications to Image Processing. Lecture notes, 2013.

[5] Terry J. Morrison. Functional Analysis: An Introduction to Banach Space Theory, volume 43. John Wiley & Sons, 2011.

[6] Arch W. Naylor and George R. Sell. Linear Operator Theory in Engineering and Science. Springer Science & Business Media, 2000.

[7] R. Tyrrell Rockafellar. Convex Analysis. Princeton University Press, 1970.

[8] Leonid I. Rudin, Stanley Osher, and Emad Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1):259–268, 1992.

[9] Terence Tao. An Epsilon of Room, I: Real Analysis, volume 1. American Mathematical Society, 2010.

[10] Eberhard Zeidler. Applied Functional Analysis: Applications to Mathematical Physics. Applied Mathematical Sciences, 108, 1995.
