Numerical Methods for Separable Nonlinear Inverse Problems with
Constraint and Low Rank
Taewon Cho
Thesis submitted to the Faculty of the
Virginia Polytechnic Institute and State University
in partial fulfillment of the requirements for the degree of
Master
in
Mathematics
Julianne Chung, Chair
Matthias Chung
Mark Embree
Nov 20, 2017
Blacksburg, Virginia
Keywords: Nonlinear Inverse Problem, Image Deblurring, Gauss-Newton method, Variable
Projection, Alternating Optimization
Copyright 2017, Taewon Cho
Numerical Methods for Separable Nonlinear Inverse Problems with
Constraint and Low Rank
Taewon Cho
(ABSTRACT)
Inverse problems arise in many application areas, ranging from astronomy to geoscience. For example, image reconstruction and deblurring require methods for solving inverse problems. Because these problems are affected by many factors and by noise, general inversion methods cannot simply be applied. Furthermore, in the problems of interest the number of unknown variables is huge, and some unknowns may depend nonlinearly on the data, so nonlinear problems must be solved. Nonlinear inverse problems are quite different from, and significantly more challenging than, linear inverse problems, and more sophisticated methods are needed to solve them.
Numerical Methods for Separable Nonlinear Inverse Problems with
Constraint and Low Rank
Taewon Cho
(GENERAL AUDIENCE ABSTRACT)
In various research areas, many desired measurements cannot be observed directly for physical or economic reasons. Instead, these unknown quantities can be recovered from measurements that are available. This recovery process can be modeled and solved mathematically.
Contents

1 Introduction
2 Background
  2.1 Point Spread Function (PSF)
    2.1.1 One-dimensional
    2.1.2 Two-dimensional
    2.1.3 Low-Rank PSF problem
  2.2 Regularization for the Linear Problem
    2.2.1 Picard Condition
    2.2.2 Spectral Filtering Methods
    2.2.3 Choosing the Regularization Parameter
  2.3 Gauss-Newton Method for Nonlinear Least Squares
  2.4 Variable Projection for Separable Nonlinear Least-Squares Problems
3 Exploiting a Low Rank PSF in Solving Nonlinear Inverse Problems
  3.1 Symmetric PSF
  3.2 Non-symmetric PSF
  3.3 Reformulation
4 Numerical Results
  4.1 Variable projection with low rank PSF
  4.2 x_true - Alternating Optimization
  4.3 Alternating Optimization 3 ways
5 Conclusions and Discussion
6 References

List of Figures

1.1 Forward Problem
1.2 Example of image blurring
2.1 Blurring by PSF
2.2 Discrete Picard Conditions from [2]
4.1 True parameters y_true, z_true
4.2 Comparing λ_nl = 0.025, 0.028, 0.03, 0.032, 0.035
4.3 Error comparison for the non-symmetric PSF with λ_nl = 0.03
4.4 A comparison of the true, blurred, and deblurred images from the reduced Gauss-Newton method with the non-symmetric PSF and λ_nl = 0.03
4.5 A comparison of the true and reconstructed PSFs with λ_nl = 0.03
4.6 Error comparison for Alternating Optimization
4.7 Error comparison for Alternating Optimization 3 ways
4.8 Rotated true and blurred images with 0°, 90°, 180°, 270°
4.9 A comparison of the errors of y and z by Alternating Optimization 3 ways with rotation
4.10 Relative errors by Alternating Optimization 3 ways with rotation
4.11 Computed images by Alternating Optimization 3 ways with rotation (64 × 64 size)
4.12 Computed PSFs by Alternating Optimization 3 ways with rotation (64 × 64 size)
4.13 Computed images by Alternating Optimization 3 ways with rotation (256 × 256 size)
4.14 Computed PSFs by Alternating Optimization 3 ways with rotation (256 × 256 size)

List of Tables

4.1 Table of relative norm errors for the non-symmetric PSF with λ_nl = 0.03
4.2 Table of relative norm errors for Alternating Optimization
Chapter 1
Introduction
Figure 1.1: Forward Problem. Input x, forward operation A, output b.
We are usually interested in forward problems, where we simply compute data from given parameters. In other words, if we have input data and a forward procedure, we observe output data. In this case the input data and the forward procedure are known quantities, and the output is unknown. For the problems of interest here, evaluating the forward procedure is usually not expensive. But consider the problem where we know the output data and do not know one or more of the input parameters or the forward procedure. Then inversion becomes a much more complicated task, because we need to modify the system to recover the unknown data. Problems where the goal is to compute unknown input data and the forward procedure from output data are called inverse problems [2]. In simulated problems, we get observed data from the forward system, and the goal is to recover the original data; this is an inverse problem. Other examples arise when we take blurred images in astronomy, medicine, or geoscience: we want to obtain true images from blurred images (Figure 1.2), or we want to infer characteristics of the Earth's interior from surface measurements.

(a) True Image (b) Blur (c) Blurred Image
Figure 1.2: Example of image blurring
In order to understand the inverse problem, we need tools from basic linear algebra and matrix computations. Let A ∈ R^{n×n} be a linear operator and let x ∈ R^{n×1}, b ∈ R^{n×1} be vectors. Then the forward model is given by b = Ax. That is, in the forward model we know A and x, and we compute b by matrix-vector multiplication. On the other hand, if we only know A and b, then finding the exact x becomes a complicated problem. This type of problem is known as an inverse problem, and in well-posed situations we could obtain x using the inverse (or pseudoinverse) of A. But in many real applications A is likely to be ill-conditioned or singular. When A is singular, A^{-1} does not exist; and even when A is well-conditioned but large, explicitly computing A^{-1} and solving x = A^{-1}b requires an enormous amount of time and cost, even on high performance computers. Moreover, if A is not square but an m-by-n rectangular matrix with m > n, standard inversion does not apply, and x may not be unique. So we instead solve the least squares problem min_x ‖b − Ax‖, using techniques such as the normal equations, the QR decomposition, and the Singular Value Decomposition (SVD), A = UΣV^T, where U and V are orthogonal matrices and Σ is a diagonal matrix with nonnegative real entries.
Regularization is one approach to impose prior knowledge in order to solve the inverse problem more reliably; by solving reliably we mean obtaining a regularized solution x_reg that is close to x_true. To analyze what happens when solving for x, we use the SVD, in terms of which x_reg can be expressed as
$$\mathbf{x}_{\mathrm{reg}} = \sum_{i=1}^{n} \phi_i \, \frac{\mathbf{u}_i^T \mathbf{b}}{\sigma_i}\, \mathbf{v}_i , \qquad (1.1)$$
where the σ_i are the diagonal elements of Σ, the u_i are the columns of U, and the v_i are the columns of V. The φ_i are filter factors, which play a central role in regularization. We will discuss regularization further in Chapter 2.

But even if we use regularization to stabilize the inversion process, a further challenge remains: the forward operator A may depend on some unknown parameters. In many real cases, A may not be known exactly. Thus when we build A, we need to incorporate a new variable y, and consider the form
$$\mathbf{A} = \mathbf{A}(\mathbf{y}),$$
where y ∈ R^{n×1}. Then we have
$$\mathbf{b} = \mathbf{A}(\mathbf{y})\,\mathbf{x},$$
and we need to solve a nonlinear least squares problem with both linear parameters x and nonlinear parameters y,
$$\min_{\mathbf{x},\,\mathbf{y}} \ \|\mathbf{b} - \mathbf{A}(\mathbf{y})\,\mathbf{x}\|_2^2 . \qquad (1.2)$$
In many scenarios, it may be desirable to include additional solution constraints. For example, since images often represent light intensities or densities, the pixel values should be nonnegative, so we could constrain x such that x ≥ 0. Various methods for this have been investigated, such as the Active Set method [11]. However, in our numerical experience we did not observe significant improvements from the Active Set method, so we do not consider it here. We do, however, enforce y ≥ 0.

Therefore our goal in this thesis is to solve the following constrained nonlinear least squares problem
$$\min_{\substack{\mathbf{x},\,\mathbf{y} \\ \text{s.t. } \mathbf{y}\ge 0}} \ \|\mathbf{A}(\mathbf{y})\,\mathbf{x} - \mathbf{b}\|_2^2 \qquad (1.3)$$
where A is the forward operator matrix depending on the unknown parameters y, b is the observed data vector, and x is the true parameter vector. To solve this problem we will use the Gauss-Newton method, but computing the Jacobian at every iteration is very expensive. Thus we discuss how to use variable projection methods to reduce the computational cost and exploit the separable model. We also consider alternating optimization methods that can exploit problem structure.
In this thesis, we will start by introducing how to construct the PSF and how to mathematically describe image deblurring as a linear model. We will then look at how to regularize the linear problem and investigate methods for nonlinear least squares problems, including Gauss-Newton, variable projection, and alternating optimization. Finally, we will apply these numerical methods to blind image deblurring problems and analyze the numerical results.
Chapter 2
Background
2.1 Point Spread Function (PSF)
First we describe what the point spread function (PSF) is. The PSF is very important in
image processing because it can be used to describe a blur and define the forward operation.
There are different reasons for blurs. Mainly we can split them into physical and mechanical
processes. For example in taking a photo, moving the camera or imaging through the
atmosphere physically cause blurs. A deformed or broken lens is one mechanical reason for
blur [3, 12].
The PSF can be used to construct mathematical models. It is based on the assumption that each pixel is blurred by its neighboring pixels. For example, in two dimensions, if the only intensity in the image is at the center of the matrix, then blurring with the PSF array of Figure 2.1, whose entries are 1/10 and 2/10, spreads that intensity over the neighboring pixels.
In Figure 2.1, each cell of the blurred matrix is calculated as a sum of component-wise multiplications between the original matrix and the PSF array, under the assumption that components outside the original matrix are zero.

Origin:
$$\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
PSF array:
$$\begin{bmatrix} \tfrac{1}{10} & \tfrac{1}{10} & \tfrac{1}{10} \\[2pt] \tfrac{1}{10} & \tfrac{2}{10} & \tfrac{1}{10} \\[2pt] \tfrac{1}{10} & \tfrac{1}{10} & \tfrac{1}{10} \end{bmatrix}$$
Blurred:
$$\begin{bmatrix} \tfrac{1}{10} & \tfrac{1}{10} & \tfrac{1}{10} \\[2pt] \tfrac{1}{10} & \tfrac{2}{10} & \tfrac{1}{10} \\[2pt] \tfrac{1}{10} & \tfrac{1}{10} & \tfrac{1}{10} \end{bmatrix}$$

Figure 2.1: Blurring by PSF

By matching the center of the PSF array with each chosen node, we can compute the blurred matrix as below.
· At center node (1,1): the PSF array is centered on entry (1,1) of the zero-padded origin matrix. The only nonzero pixel of the origin, the 1 at its center, lines up with a corner entry 1/10 of the PSF array, and every other product is zero, so the sum of all component-wise products gives the blurred value 1/10.

· At center node (2,2): the PSF array is centered on the origin's center pixel, which lines up with the PSF center entry 2/10, so the blurred value is 1 · 2/10 = 2/10.

Repeating the same overlay-multiply-sum computation at all nine center nodes, (1,1) through (3,3), the single unit intensity picks out exactly one PSF entry each time: the value 2/10 at the center node (2,2) and 1/10 at every other node. This yields the blurred matrix shown in Figure 2.1.
Regardless of the location of the pixel, the blur is the same; this is called a spatially invariant blur. This is just one example of a blur PSF, and various boundary conditions can be used for the image. In the blurring process, regardless of the PSF, each pixel is influenced by its neighboring pixels.
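As a concrete check of the spatially invariant blur just described, the following minimal sketch (assuming MATLAB/Octave; conv2 performs two-dimensional convolution with zero boundaries) reproduces the Figure 2.1 example: blurring a single point source with the 3 × 3 PSF array returns the PSF array itself.

```matlab
% Sketch: spatially invariant blur of a point source (Figure 2.1 example).
X = zeros(3);  X(2,2) = 1;            % origin image: a single unit intensity
P = [1 1 1; 1 2 1; 1 1 1] / 10;       % PSF array, entries sum to 1
B = conv2(X, P, 'same');              % convolution with zero boundary condition
disp(B)                               % equals P: a point source reproduces the PSF
```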
2.1.1 One-dimensional
First let's see how to construct the blur matrix in one dimension. We need to define the forward operation that describes how each pixel affects the others, with weights given by the PSF. If p(s) and x(s) are continuous functions, then the convolution of p and x is the function b(s), given by a Fredholm integral equation of the first kind,
$$b(s) = \int_{-\infty}^{\infty} p(s-t)\, x(t)\, dt . \qquad (2.1)$$
For each s, b(s) is obtained by integrating x(t) against a weight from the function p; that is, we flip the function p and shift it to get p(s − t). For the discrete version of convolution, we consider the vectors x, p, and b as the true image, the PSF array, and the blurred image, respectively. In one dimension, for example, the true image and the PSF in R^{3×1} are denoted by
$$\begin{bmatrix} w_1 \\ x_1 \\ x_2 \\ x_3 \\ y_1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix},$$
where w_1 and y_1 are pixels outside of the original image, called the boundary. To get a pixel value in the blurred image we flip and shift the PSF array, obtaining
$$\begin{aligned}
b_1 &= p_3 w_1 + p_2 x_1 + p_1 x_2,\\
b_2 &= p_3 x_1 + p_2 x_2 + p_1 x_3,\\
b_3 &= p_3 x_2 + p_2 x_3 + p_1 y_1.
\end{aligned}$$
Then we can write this convolution as
$$\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} p_3 & p_2 & p_1 & & \\ & p_3 & p_2 & p_1 & \\ & & p_3 & p_2 & p_1 \end{bmatrix} \begin{bmatrix} w_1 \\ x_1 \\ x_2 \\ x_3 \\ y_1 \end{bmatrix}.$$
Now depending on how w1 and y1 are defined, we could define different boundary conditions
or assumptions.
• Zero boundary condition: Setting boundary pixels to zero. In this case, w1 = 0 and
y1 = 0. And the blur matrix is a Toeplitz matrix.
$$\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} p_3 & p_2 & p_1 & & \\ & p_3 & p_2 & p_1 & \\ & & p_3 & p_2 & p_1 \end{bmatrix} \begin{bmatrix} 0 \\ x_1 \\ x_2 \\ x_3 \\ 0 \end{bmatrix} = \begin{bmatrix} p_2 & p_1 & \\ p_3 & p_2 & p_1 \\ & p_3 & p_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$
• Periodic boundary condition: Setting boundary pixels to be periodic with respect to
inside image pixels. In this case, w1 = x3 and y1 = x1. And the matrix is a circulant
matrix.
$$\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} p_3 & p_2 & p_1 & & \\ & p_3 & p_2 & p_1 & \\ & & p_3 & p_2 & p_1 \end{bmatrix} \begin{bmatrix} x_3 \\ x_1 \\ x_2 \\ x_3 \\ x_1 \end{bmatrix} = \begin{bmatrix} p_2 & p_1 & p_3 \\ p_3 & p_2 & p_1 \\ p_1 & p_3 & p_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$
• Reflexive boundary condition: Setting boundary pixels to reflect inside image pixels.
In this case, w1 = x1 and y1 = x3. And the matrix is a Toeplitz plus Hankel matrix.
$$\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} p_3 & p_2 & p_1 & & \\ & p_3 & p_2 & p_1 & \\ & & p_3 & p_2 & p_1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_1 \\ x_2 \\ x_3 \\ x_3 \end{bmatrix} = \begin{bmatrix} p_3 + p_2 & p_1 & \\ p_3 & p_2 & p_1 \\ & p_3 & p_2 + p_1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$
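The three boundary conditions above lead to three different small blur matrices. The following minimal sketch (assuming MATLAB/Octave, with hypothetical PSF weights p1, p2, p3) builds them explicitly for the 3-pixel example.

```matlab
% Sketch: 1-D blur matrices for zero, periodic, and reflexive boundary conditions.
p = [0.2 0.5 0.3];                                            % hypothetical PSF [p1 p2 p3]
Azero = [p(2) p(1) 0;    p(3) p(2) p(1); 0    p(3) p(2)];     % Toeplitz
Aper  = [p(2) p(1) p(3); p(3) p(2) p(1); p(1) p(3) p(2)];     % circulant
Aref  = [p(3)+p(2) p(1) 0; p(3) p(2) p(1); 0 p(3) p(2)+p(1)]; % Toeplitz plus Hankel
x = [1; 2; 3];                                                % small test image
b = Aper * x;                                                 % blurred image, periodic BC
```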
2.1.2 Two-dimensional
Now let's check the two-dimensional case in R^{3×3}. Consider the matrices
$$\mathbf{X} = \begin{bmatrix} w_1 & w_2 & w_3 & w_4 & w_5 \\ w_6 & x_{11} & x_{12} & x_{13} & w_7 \\ w_8 & x_{21} & x_{22} & x_{23} & w_9 \\ w_{10} & x_{31} & x_{32} & x_{33} & w_{11} \\ w_{12} & w_{13} & w_{14} & w_{15} & w_{16} \end{bmatrix}, \quad \mathbf{P} = \begin{bmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{bmatrix}, \quad \mathbf{B} = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix},$$
where wi’s are outside of the original image. To get a blurred image by the convolution
operation in two-dimensions, we flip the matrix P vertically and horizontally and shift it.
Then, for instance, we get the elements of B such that
$$\begin{aligned}
b_{11} &= p_{33}w_1 + p_{32}w_2 + p_{31}w_3 + p_{23}w_6 + p_{22}x_{11} + p_{21}x_{12} + p_{13}w_8 + p_{12}x_{21} + p_{11}x_{22},\\
b_{21} &= p_{33}w_6 + p_{32}x_{11} + p_{31}x_{12} + p_{23}w_8 + p_{22}x_{21} + p_{21}x_{22} + p_{13}w_{10} + p_{12}x_{31} + p_{11}x_{32},\\
b_{31} &= p_{33}w_8 + p_{32}x_{21} + p_{31}x_{22} + p_{23}w_{10} + p_{22}x_{31} + p_{21}x_{32} + p_{13}w_{12} + p_{12}w_{13} + p_{11}w_{14},\\
b_{12} &= p_{33}w_2 + p_{32}w_3 + p_{31}w_4 + p_{23}x_{11} + p_{22}x_{12} + p_{21}x_{13} + p_{13}x_{21} + p_{12}x_{22} + p_{11}x_{23},\\
b_{22} &= p_{33}x_{11} + p_{32}x_{12} + p_{31}x_{13} + p_{23}x_{21} + p_{22}x_{22} + p_{21}x_{23} + p_{13}x_{31} + p_{12}x_{32} + p_{11}x_{33},\\
b_{32} &= p_{33}x_{21} + p_{32}x_{22} + p_{31}x_{23} + p_{23}x_{31} + p_{22}x_{32} + p_{21}x_{33} + p_{13}w_{13} + p_{12}w_{14} + p_{11}w_{15},\\
b_{13} &= p_{33}w_3 + p_{32}w_4 + p_{31}w_5 + p_{23}x_{12} + p_{22}x_{13} + p_{21}w_7 + p_{13}x_{22} + p_{12}x_{23} + p_{11}w_9,\\
b_{23} &= p_{33}x_{12} + p_{32}x_{13} + p_{31}w_7 + p_{23}x_{22} + p_{22}x_{23} + p_{21}w_9 + p_{13}x_{32} + p_{12}x_{33} + p_{11}w_{11},\\
b_{33} &= p_{33}x_{22} + p_{32}x_{23} + p_{31}w_9 + p_{23}x_{32} + p_{22}x_{33} + p_{21}w_{11} + p_{13}w_{14} + p_{12}w_{15} + p_{11}w_{16}.
\end{aligned}$$
For the elements, we need to take into account boundary conditions such as zero, peri-
odic, or reflexive. Then, for example, we could describe the relations between b = vec(B)
and x = vec(X) with zero boundary condition as
$$\mathbf{b} = \begin{bmatrix} b_{11}\\ b_{21}\\ b_{31}\\ b_{12}\\ b_{22}\\ b_{32}\\ b_{13}\\ b_{23}\\ b_{33} \end{bmatrix} = \begin{bmatrix}
p_{22} & p_{12} & & p_{21} & p_{11} & & & & \\
p_{32} & p_{22} & p_{12} & p_{31} & p_{21} & p_{11} & & & \\
 & p_{32} & p_{22} & & p_{31} & p_{21} & & & \\
p_{23} & p_{13} & & p_{22} & p_{12} & & p_{21} & p_{11} & \\
p_{33} & p_{23} & p_{13} & p_{32} & p_{22} & p_{12} & p_{31} & p_{21} & p_{11}\\
 & p_{33} & p_{23} & & p_{32} & p_{22} & & p_{31} & p_{21}\\
 & & & p_{23} & p_{13} & & p_{22} & p_{12} & \\
 & & & p_{33} & p_{23} & p_{13} & p_{32} & p_{22} & p_{12}\\
 & & & & p_{33} & p_{23} & & p_{32} & p_{22}
\end{bmatrix}
\begin{bmatrix} x_{11}\\ x_{21}\\ x_{31}\\ x_{12}\\ x_{22}\\ x_{32}\\ x_{13}\\ x_{23}\\ x_{33} \end{bmatrix} = \mathbf{A}(\mathbf{P})\,\mathbf{x},$$
where A(P) is a block Toeplitz with Toeplitz blocks (BTTB) matrix. Notice that, by a change of variables in (2.1), we can rewrite
$$\mathbf{b} = \begin{bmatrix} b_{11}\\ b_{21}\\ b_{31}\\ b_{12}\\ b_{22}\\ b_{32}\\ b_{13}\\ b_{23}\\ b_{33} \end{bmatrix} = \begin{bmatrix}
x_{22} & x_{12} & & x_{21} & x_{11} & & & & \\
x_{32} & x_{22} & x_{12} & x_{31} & x_{21} & x_{11} & & & \\
 & x_{32} & x_{22} & & x_{31} & x_{21} & & & \\
x_{23} & x_{13} & & x_{22} & x_{12} & & x_{21} & x_{11} & \\
x_{33} & x_{23} & x_{13} & x_{32} & x_{22} & x_{12} & x_{31} & x_{21} & x_{11}\\
 & x_{33} & x_{23} & & x_{32} & x_{22} & & x_{31} & x_{21}\\
 & & & x_{23} & x_{13} & & x_{22} & x_{12} & \\
 & & & x_{33} & x_{23} & x_{13} & x_{32} & x_{22} & x_{12}\\
 & & & & x_{33} & x_{23} & & x_{32} & x_{22}
\end{bmatrix}
\begin{bmatrix} p_{11}\\ p_{21}\\ p_{31}\\ p_{12}\\ p_{22}\\ p_{32}\\ p_{13}\\ p_{23}\\ p_{33} \end{bmatrix} = \mathbf{A}(\mathbf{X})\,\mathrm{vec}(\mathbf{P}).$$
Thus for invariant PSFs, we have the following property
A(P)x = A(X)vec(P) (2.2)
where x = vec(X). We will exploit this property in our algorithms.
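Property (2.2) can be verified numerically. The following minimal sketch (assuming MATLAB/Octave and zero boundary conditions; the test image and PSF are hypothetical) checks that blurring the image X with the PSF P gives the same result as exchanging the roles of image and PSF, which is exactly A(P)x = A(X)vec(P).

```matlab
% Sketch: numerical check of property (2.2) for a 3x3 image and 3x3 PSF.
X = magic(3);                         % hypothetical test image
P = rand(3);  P = P / sum(P(:));      % hypothetical PSF, normalized to sum to 1
b1 = conv2(X, P, 'same');             % corresponds to A(P) * vec(X)
b2 = conv2(P, X, 'same');             % corresponds to A(X) * vec(P)
norm(b1(:) - b2(:))                   % zero up to rounding error
```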
2.1.3 Low-Rank PSF problem
The reason we are interested in low-rank PSFs is that the corresponding matrix A can be described by a Kronecker product. For instance, assume that the PSF is n × n and can be written as
$$\mathbf{P} = \mathbf{y}\mathbf{z}^T = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \begin{bmatrix} z_1 & \cdots & z_n \end{bmatrix}. \qquad (2.3)$$
Then we can write
$$\mathbf{A} = \mathbf{A}(\mathbf{P}) = \mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z})) = \mathbf{A}_z \otimes \mathbf{A}_y \qquad (2.4)$$
where
$$\mathbf{A}_y = \begin{bmatrix}
y_{1+\lceil \frac{n}{2}\rceil} & \cdots & y_1 & & \\
\vdots & \ddots & & \ddots & \\
y_n & \cdots & y_{1+\lceil \frac{n}{2}\rceil} & \cdots & y_1 \\
 & \ddots & & \ddots & \vdots \\
 & & y_n & \cdots & y_{1+\lceil \frac{n}{2}\rceil}
\end{bmatrix}
\quad\text{and}\quad
\mathbf{A}_z = \begin{bmatrix}
z_{1+\lceil \frac{n}{2}\rceil} & \cdots & z_1 & & \\
\vdots & \ddots & & \ddots & \\
z_n & \cdots & z_{1+\lceil \frac{n}{2}\rceil} & \cdots & z_1 \\
 & \ddots & & \ddots & \vdots \\
 & & z_n & \cdots & z_{1+\lceil \frac{n}{2}\rceil}
\end{bmatrix}.$$
If n is an odd number, then (1 + ⌈n/2⌉, 1 + ⌈n/2⌉) will be the center of A_y and A_z.

The reason we focus on the Kronecker product is that we can use the properties of Kronecker products for efficient computation [3, 16]. If A is a Kronecker product, then we can write its SVD as
$$(\mathbf{U}_y\mathbf{\Sigma}_y\mathbf{V}_y^T)\otimes(\mathbf{U}_z\mathbf{\Sigma}_z\mathbf{V}_z^T) = (\mathbf{U}_y\otimes\mathbf{U}_z)(\mathbf{\Sigma}_y\otimes\mathbf{\Sigma}_z)(\mathbf{V}_y\otimes\mathbf{V}_z)^T. \qquad (2.5)$$
And if P is a sum of rank-1 matrices, such as
$$\mathbf{P} = \sum_{j=1}^{n} \mathbf{y}_j \mathbf{z}_j^T, \qquad (2.6)$$
then A can be expressed as a sum of Kronecker products (see Kamm and Nagy [13]),
$$\mathbf{A} = \sum_{j=1}^{n} \mathbf{A}_y^{(j)} \otimes \mathbf{A}_z^{(j)}. \qquad (2.7)$$
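The Kronecker SVD property (2.5) is easy to check numerically. The following minimal sketch (assuming MATLAB/Octave, with small hypothetical factors A_y and A_z) verifies that the SVD factors of a Kronecker product are Kronecker products of the individual SVD factors, up to a reordering of the singular values.

```matlab
% Sketch: numerical check of the Kronecker SVD property (2.5).
Ay = rand(4);  Az = rand(4);                      % hypothetical small factors
[Uy,Sy,Vy] = svd(Ay);  [Uz,Sz,Vz] = svd(Az);
A = kron(Az, Ay);                                 % A = Az ⊗ Ay, as in (2.4)
E = A - kron(Uz,Uy) * kron(Sz,Sy) * kron(Vz,Vy)'; % Kronecker products of the SVD factors
norm(E)                                           % zero up to rounding error
```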
2.2 Regularization for the Linear Problem
First we will consider the linear problem and investigate why regularization is needed to compute a solution. The objective of regularization is to suppress unwanted components and to reconstruct more stable solutions that are close to the exact one. Let us begin with classical perturbation theory. For some perturbation e, consider the two solutions x and x_exact such that
$$\mathbf{A}\mathbf{x}_{\mathrm{exact}} = \mathbf{b}_{\mathrm{exact}} \qquad\text{and}\qquad \mathbf{A}\mathbf{x} = \mathbf{b}_{\mathrm{exact}} + \mathbf{e},$$
where A is a nonsingular square matrix. Then we have the bound
$$\frac{\|\mathbf{x}-\mathbf{x}_{\mathrm{exact}}\|}{\|\mathbf{x}_{\mathrm{exact}}\|} \;\le\; \mathrm{cond}(\mathbf{A})\,\frac{\|\mathbf{e}\|}{\|\mathbf{b}_{\mathrm{exact}}\|}, \qquad (2.8)$$
where cond(A) = ‖A^{-1}‖ · ‖A‖ [2, 14]. When A is ill-conditioned, cond(A) is very large, and x may be far from x_exact. Although (2.8) is only an upper bound, empirically the error between x and x_exact tends to follow it. So regularization is needed to make the solution x close to x_exact [1].
In this thesis we will use the SVD formulation (1.1) to express and describe the regularization methods, where the φ_i are the filter factors determined by the regularization method. Regularization for inverse problems is a well-studied field with many excellent textbooks and papers, e.g., [1, 2, 3].
2.2.1 Picard Condition
For the linear problem Ax = b with A a nonsingular square matrix, we could describe the
inverse solution x from the SVD,
$$\mathbf{V}^T\mathbf{x} = \mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{b}, \qquad
\mathbf{x} = \mathbf{V}\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{b} = \sum_{i=1}^{n} \frac{\mathbf{u}_i^T\mathbf{b}}{\sigma_i}\,\mathbf{v}_i. \qquad (2.9)$$
The discrete Picard condition was proposed and studied in [2, 15].

- The Discrete Picard Condition: Let τ be the level at which the computed singular values σ_i level off because of rounding errors. The discrete Picard condition requires that, for all singular values larger than τ, the corresponding |u_i^T b| decay faster than the σ_i.

Without noise, if the problem satisfies the discrete Picard condition, |u_i^T b| decays faster than σ_i until the round-off level is reached. With noise, however, the condition is eventually violated. If we examine the Picard plot (Figure 2.2) of |u_i^T b/σ_i|, |u_i^T b|, and σ_i, these quantities should all be decreasing, with |u_i^T b| decaying faster than σ_i, so that |u_i^T b/σ_i| also decreases. However, when noise e is added to b, the decay of |u_i^T b| eventually becomes slower than that of σ_i. Hence |u_i^T b/σ_i| starts to increase without bound at some point, which drives the computed solution x far away from the true x. From this observation, if we can suppress the noise components beyond that point, we can obtain a more stable approximate solution. To avoid these noise components, we can use the Truncated SVD (TSVD) or Tikhonov regularization.
Figure 2.2: Discrete Picard Conditions from [2]. Picard plots of σ_i, |u_i^T b|, and |u_i^T b|/σ_i: (a) rounding error only; (b) noise of level 10^{-6}.
2.2.2 Spectral Filtering Methods
For problems where the SVD of A can be computed, we use the SVD of A as in (2.9) to compute x. The noise components start to disturb the solution when σ_i decays faster than |u_i^T b|. To obtain a stable approximate solution, the solution must be kept free of these noise components, so let's look for a way to avoid them.

First, we could simply cut the noisy terms out of the sum of (u_i^T b/σ_i) v_i; by truncating them, we keep x from becoming unbounded. This method is called the Truncated SVD (TSVD). We can split the sum at some index k,
$$\mathbf{x} = \sum_{i=1}^{k} \frac{\mathbf{u}_i^T\mathbf{b}}{\sigma_i}\,\mathbf{v}_i + \sum_{i=k+1}^{n} \frac{\mathbf{u}_i^T\mathbf{b}}{\sigma_i}\,\mathbf{v}_i,$$
where k is the index at which the decay of |u_i^T b| becomes slower than that of σ_i. TSVD cuts off all of the noise components, giving the regularized solution
$$\mathbf{x} = \sum_{i=1}^{k} \frac{\mathbf{u}_i^T\mathbf{b}}{\sigma_i}\,\mathbf{v}_i.$$
But we still need to determine the truncation index k.

Another option is to damp the noise components using filters. Rather than cutting off at some index k, we define filter factors φ_i for each i = 1, ..., n, with a given parameter λ > 0,
$$\phi_i = \frac{\sigma_i^2}{\lambda^2 + \sigma_i^2},$$
and multiply each term of the sum by the corresponding filter factor, giving the regularized solution
$$\mathbf{x} = \sum_{i=1}^{n} \phi_i\,\frac{\mathbf{u}_i^T\mathbf{b}}{\sigma_i}\,\mathbf{v}_i.$$
If σ_i is much bigger than the given λ, then φ_i is close to 1; if σ_i is much smaller than λ, then φ_i is close to 0. In this way we can minimize the disturbance from noise. This method is called Tikhonov regularization. It can be shown that this form is equivalent to the optimization problem
$$\min_{\mathbf{x}} \|\mathbf{A}\mathbf{x}-\mathbf{b}\|_2^2 + \lambda^2\|\mathbf{x}\|_2^2. \qquad (2.10)$$
Notice that
$$\min_{\mathbf{x}} \|\mathbf{A}\mathbf{x}-\mathbf{b}\|_2^2 + \lambda^2\|\mathbf{x}\|_2^2 = \min_{\mathbf{x}}\left\|\begin{bmatrix}\mathbf{A}\\ \lambda\mathbf{I}\end{bmatrix}\mathbf{x} - \begin{bmatrix}\mathbf{b}\\ \mathbf{0}\end{bmatrix}\right\|_2^2.$$
Then the solution x can be obtained from the normal equations,
$$\begin{aligned}
\mathbf{x} &= (\mathbf{A}^T\mathbf{A} + \lambda^2\mathbf{I})^{-1}\mathbf{A}^T\mathbf{b}\\
&= (\mathbf{V}\mathbf{\Sigma}\mathbf{U}^T\mathbf{U}\mathbf{\Sigma}\mathbf{V}^T + \lambda^2\mathbf{V}\mathbf{V}^T)^{-1}\mathbf{V}\mathbf{\Sigma}\mathbf{U}^T\mathbf{b}\\
&= (\mathbf{V}(\mathbf{\Sigma}^2 + \lambda^2\mathbf{I})\mathbf{V}^T)^{-1}\mathbf{V}\mathbf{\Sigma}\mathbf{U}^T\mathbf{b}\\
&= \mathbf{V}(\mathbf{\Sigma}^2 + \lambda^2\mathbf{I})^{-1}\mathbf{V}^T\mathbf{V}\mathbf{\Sigma}\mathbf{U}^T\mathbf{b}\\
&= \mathbf{V}(\mathbf{\Sigma}^2 + \lambda^2\mathbf{I})^{-1}\mathbf{\Sigma}\mathbf{U}^T\mathbf{b}\\
&= \sum_{i=1}^{n} \phi_i\,\frac{\mathbf{u}_i^T\mathbf{b}}{\sigma_i}\,\mathbf{v}_i. \qquad (2.11)
\end{aligned}$$
Similar to TSVD, we need to select a regularization parameter λ that gives us a stable
approximated solution.
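The filtered form (2.11) and the stacked least squares formulation give the same solution, which can be checked directly. The following minimal sketch (assuming MATLAB/Octave; the Hilbert test matrix, x_true, and the noise level are hypothetical choices) compares the SVD-filtered Tikhonov solution with the normal-equations solution.

```matlab
% Sketch: Tikhonov regularization via SVD filtering (2.11) vs. normal equations.
n = 12;  A = hilb(n);                          % small, severely ill-conditioned test matrix
xtrue = ones(n,1);  b = A*xtrue + 1e-6*randn(n,1);
[U,S,V] = svd(A);  s = diag(S);
lambda = 1e-4;
phi  = s.^2 ./ (s.^2 + lambda^2);              % Tikhonov filter factors
xreg = V * (phi .* (U'*b) ./ s);               % filtered SVD solution
xne  = (A'*A + lambda^2*eye(n)) \ (A'*b);      % normal-equations solution
norm(xreg - xne)                               % the two agree up to rounding error
```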
2.2.3 Choosing the Regularization Parameter
In this section, we describe various methods to select k or λ. Define the diagonal filter matrix
$$\boldsymbol{\phi}(\lambda) = \mathrm{diag}(\phi_1, \phi_2, \ldots, \phi_n),$$
and consider the error of the Tikhonov-regularized solution,
$$\begin{aligned}
\mathbf{x}_{\mathrm{exact}} - \mathbf{x} &= \mathbf{x}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{b}\\
&= \mathbf{x}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T(\mathbf{b}_{\mathrm{exact}} + \mathbf{e})\\
&= \mathbf{x}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{b}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{e}\\
&= \mathbf{x}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{A}\mathbf{x}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{e}\\
&= (\mathbf{I} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{A})\mathbf{x}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{e}\\
&= (\mathbf{V}\mathbf{V}^T - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{U}\mathbf{\Sigma}\mathbf{V}^T)\mathbf{x}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{e}\\
&= \mathbf{V}(\mathbf{I} - \boldsymbol{\phi}(\lambda))\mathbf{V}^T\mathbf{x}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{e}. \qquad (2.12)
\end{aligned}$$
In this way, we can view the error of the solution as consisting of two parts: the first term is called the regularization error and the second term the perturbation error. When λ is close to zero, the regularization error is very small, because φ(λ) → I as λ → 0, but the perturbation error can be large. Conversely, as λ increases the perturbation error decreases but the regularization error grows. So we need to choose an appropriate value of λ that balances the regularization and perturbation errors and thereby minimizes the total error. A similar form of the filter factors holds for TSVD, where φ_i = 1 or φ_i = 0.

We will explore three methods for choosing the parameter λ for Tikhonov regularization, or the truncation index k for TSVD: the Discrepancy Principle (DP), the L-curve, and the Generalized Cross Validation (GCV) method, which is a statistical method. These methods work with the norms
$$\|\mathbf{x}_\lambda\|_2^2 \ \text{and}\ \|\mathbf{A}\mathbf{x}_\lambda - \mathbf{b}\|_2^2 \qquad\text{or}\qquad \|\mathbf{x}_k\|_2^2 \ \text{and}\ \|\mathbf{A}\mathbf{x}_k - \mathbf{b}\|_2^2,$$
which can be monitored as the regularized solution approaches x_exact.
First, the Discrepancy Principle is one of the simplest approaches. If the noise norm ‖e‖_2 is known, then we choose k_dp or λ_dp such that
$$\|\mathbf{A}\mathbf{x}_k - \mathbf{b}\|_2 \ \ge\ \nu_{\mathrm{dp}}\|\mathbf{e}\|_2 \ \ge\ \|\mathbf{A}\mathbf{x}_{k+1} - \mathbf{b}\|_2, \qquad
\|\mathbf{A}\mathbf{x}_\lambda - \mathbf{b}\|_2 = \nu_{\mathrm{dp}}\|\mathbf{e}\|_2,$$
respectively, where ν_dp > 1 is a safety factor [2]. So if we know the size of the noise, we can choose the parameter by making the residual norm equal to the noise norm times a safety factor. The critical disadvantage is that we usually do not know ‖e‖_2 exactly; but when it is known, the Discrepancy Principle is very simple to apply.
Second, the L-curve method uses the curvature of the curve (log ‖Ax_λ − b‖_2, log ‖x_λ‖_2) and seeks the point where the curve transitions between its horizontal and vertical parts. It uses the quantities
$$\xi = \|\mathbf{x}_\lambda\|_2^2 \qquad\text{and}\qquad \rho = \|\mathbf{A}\mathbf{x}_\lambda - \mathbf{b}\|_2^2,$$
and chooses λ to maximize the curvature
$$c_\lambda = 2\,\frac{\xi\rho}{\xi'}\;\frac{\lambda^2\xi'\rho + 2\lambda\xi\rho + \lambda^4\xi\xi'}{(\lambda^2\xi^2 + \rho^2)^{3/2}}.$$
For TSVD, we choose k at the corner of the L-curve. Unfortunately, the L-curve method fails when v_i^T x_exact decays to zero quickly, or when the change in the norms of the residual and the solution is small between two consecutive values of k.
Last, GCV is a very common and useful method. Let's examine the difference between b_exact and Ax_k for the rank-k TSVD solution. Then
$$\begin{aligned}
\mathbf{A}\mathbf{x}_k - \mathbf{b}_{\mathrm{exact}} &= \mathbf{A}\mathbf{V}\boldsymbol{\phi}^{(k)}\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{b} - \mathbf{b}_{\mathrm{exact}}\\
&= \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T\mathbf{V}\boldsymbol{\phi}^{(k)}\mathbf{\Sigma}^{-1}\mathbf{U}^T(\mathbf{b}_{\mathrm{exact}} + \mathbf{e}) - \mathbf{U}\mathbf{U}^T\mathbf{b}_{\mathrm{exact}}\\
&= \mathbf{U}\begin{bmatrix}\mathbf{I}_k & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}\mathbf{U}^T\mathbf{b}_{\mathrm{exact}} + \mathbf{U}\begin{bmatrix}\mathbf{I}_k & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}\mathbf{U}^T\mathbf{e} - \mathbf{U}\mathbf{U}^T\mathbf{b}_{\mathrm{exact}}\\
&= \mathbf{U}\begin{bmatrix}\mathbf{I}_k & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}\mathbf{U}^T\mathbf{e} - \mathbf{U}\begin{bmatrix}\mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{I}_{n-k}\end{bmatrix}\mathbf{U}^T\mathbf{b}_{\mathrm{exact}},
\end{aligned}$$
where φ^{(k)} = diag(1, ..., 1, 0, ..., 0). Thus, the error norm becomes
$$\|\mathbf{A}\mathbf{x}_k - \mathbf{b}_{\mathrm{exact}}\|_2^2 = \sum_{i=1}^{k} (\mathbf{u}_i^T\mathbf{e})^2 + \sum_{i=k+1}^{n} (\mathbf{u}_i^T\mathbf{b}_{\mathrm{exact}})^2.$$
If we knew b_exact and the noise e, we would be able to find the index k that minimizes this error. But the noise is usually not known and b_exact is not available. Since we do not know b_exact, we estimate each of its elements by using the other elements.
Consider the Tikhonov case. Remove the i-th row of A and of b, and call the results A^{(i)} and b^{(i)} respectively. Solve the Tikhonov problem
$$\mathbf{x}_\lambda^{(i)} = \big((\mathbf{A}^{(i)})^T\mathbf{A}^{(i)} + \lambda^2\mathbf{I}\big)^{-1}(\mathbf{A}^{(i)})^T\mathbf{b}^{(i)},$$
and then use x_λ^{(i)} to estimate the element b_i by computing A(i,:) x_λ^{(i)}. Hence our goal is to minimize the average of these prediction errors,
$$\min_{\lambda}\ \frac{1}{n}\sum_{i=1}^{n}\big(\mathbf{A}(i,:)\,\mathbf{x}_\lambda^{(i)} - b_i\big)^2.$$
With some technical computation this can be written as
$$\min_{\lambda}\ \frac{1}{n}\sum_{i=1}^{n}\left(\frac{\mathbf{A}(i,:)\,\mathbf{x}_\lambda - b_i}{1 - h_{ii}}\right)^2,$$
where the h_{ii} are the diagonal elements of the matrix A(A^T A + λ²I)^{-1}A^T and x_λ is the Tikhonov solution. This formulation still has an issue: the result depends on the ordering of the data through the h_{ii}. To remedy this, we replace each h_{ii} by their average. The resulting method is called generalized cross validation (GCV), and it has the minimization form
$$\min_{\lambda}\ \frac{1}{n}\sum_{i=1}^{n}\left(\frac{\mathbf{A}(i,:)\,\mathbf{x}_\lambda - b_i}{1 - \mathrm{trace}\big(\mathbf{A}(\mathbf{A}^T\mathbf{A} + \lambda^2\mathbf{I})^{-1}\mathbf{A}^T\big)/n}\right)^2.$$
Using the SVD of A,
$$\begin{aligned}
\mathrm{trace}\big(\mathbf{A}(\mathbf{A}^T\mathbf{A} + \lambda^2\mathbf{I})^{-1}\mathbf{A}^T\big)
&= \mathrm{trace}\big(\mathbf{U}\mathbf{\Sigma}\mathbf{V}^T\mathbf{V}(\mathbf{\Sigma}^2 + \lambda^2\mathbf{I})^{-1}\mathbf{V}^T\mathbf{V}\mathbf{\Sigma}\mathbf{U}^T\big)\\
&= \mathrm{trace}\big(\mathbf{U}\mathbf{\Sigma}(\mathbf{\Sigma}^2 + \lambda^2\mathbf{I})^{-1}\mathbf{\Sigma}\mathbf{U}^T\big)\\
&= \mathrm{trace}\big(\mathbf{U}\boldsymbol{\phi}(\lambda)\mathbf{U}^T\big)\\
&= \mathrm{trace}\big(\boldsymbol{\phi}(\lambda)\big)\\
&= \sum_{i=1}^{n}\phi_i(\lambda).
\end{aligned}$$
Hence GCV chooses λ such that
$$\lambda_{\mathrm{GCV}} \ \text{minimizes}\ \ \frac{\|\mathbf{A}\mathbf{x}_\lambda - \mathbf{b}\|_2^2}{\big(n - \sum_{i=1}^{n}\phi_i(\lambda)\big)^2}. \qquad (2.13)$$
For TSVD, since φ^{(k)} = diag(1, ..., 1, 0, ..., 0),
$$k_{\mathrm{GCV}} \ \text{minimizes}\ \ \frac{\|\mathbf{A}\mathbf{x}_k - \mathbf{b}\|_2^2}{(n - k)^2}. \qquad (2.14)$$
In summary, we have seen that regularization is needed for a linear inverse problem in order to obtain a stable solution that is close to the exact one. Through the Picard condition and the Picard plots, we observed that only the leading terms of the SVD expansion (2.9) should be retained in order to avoid amplifying unwanted errors. Standard methods for choosing the regularization parameter include DP, the L-curve, and GCV.
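As an illustration of how the GCV criterion (2.13) is evaluated in practice, the following minimal sketch (assuming MATLAB/Octave and a square matrix A with a computable SVD; the test problem is hypothetical) evaluates the GCV function on a grid of λ values and picks its minimizer.

```matlab
% Sketch: choosing the Tikhonov parameter by minimizing the GCV function (2.13).
n = 12;  A = hilb(n);  xtrue = ones(n,1);  b = A*xtrue + 1e-6*randn(n,1);
[U,S,~] = svd(A);  s = diag(S);  beta = U'*b;
lambdas = logspace(-8, 0, 200);  G = zeros(size(lambdas));
for j = 1:numel(lambdas)
    phi  = s.^2 ./ (s.^2 + lambdas(j)^2);     % Tikhonov filter factors
    res2 = sum(((1 - phi) .* beta).^2);       % ||A*x_lambda - b||^2 for square A
    G(j) = res2 / (n - sum(phi))^2;           % GCV function (2.13)
end
[~, jmin] = min(G);  lambda_gcv = lambdas(jmin);
```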
2.3 Gauss-Newton Method for Nonlinear Least Squares
Next we review optimization methods for solving nonlinear least squares problems. Our goal is to minimize
$$f(\mathbf{x}) = \frac{1}{2}\sum_{j=1}^{n} r_j^2(\mathbf{x}),$$
where each r_j is a smooth function from R^n to R. We call r the residual vector from R^n to R^n, with
$$\mathbf{r}(\mathbf{x}) = \big(r_1(\mathbf{x}), r_2(\mathbf{x}), \ldots, r_n(\mathbf{x})\big)^T,$$
so that
$$f(\mathbf{x}) = \frac{1}{2}\|\mathbf{r}(\mathbf{x})\|_2^2.$$
Now we can express the derivatives of f(x) in terms of the Jacobian J(x) ∈ R^{n×n}, where
$$\mathbf{J}(\mathbf{x}) = \begin{bmatrix}\nabla r_1(\mathbf{x})^T\\ \nabla r_2(\mathbf{x})^T\\ \vdots\\ \nabla r_n(\mathbf{x})^T\end{bmatrix},$$
such that
$$\nabla f = \mathbf{J}(\mathbf{x})^T\mathbf{r}(\mathbf{x}), \qquad
\nabla^2 f = \mathbf{J}(\mathbf{x})^T\mathbf{J}(\mathbf{x}) + \sum_{j=1}^{n} r_j(\mathbf{x})\,\nabla^2 r_j(\mathbf{x}),$$
where ∇f and ∇²f are the gradient vector and the Hessian matrix, respectively [4]. For linear problems where r(x) = Ax − b, we have J(x) = A and ∇f = A^T(Ax − b).
The standard Newton method for minimizing f(x) is an iterative method of the form x_{k+1} = x_k + α_k p_k, where α_k is the step size and the descent direction p_k is computed from
$$\nabla^2 f(\mathbf{x}_k)\,\mathbf{p}_k = -\nabla f(\mathbf{x}_k)$$
at each step k. The Gauss-Newton method uses an approximation of the Hessian,
$$\nabla^2 f(\mathbf{x}_k) \approx \mathbf{J}(\mathbf{x}_k)^T\mathbf{J}(\mathbf{x}_k).$$
Then, together with the gradient
$$\nabla f(\mathbf{x}_k) = \mathbf{J}(\mathbf{x}_k)^T\mathbf{r}(\mathbf{x}_k),$$
we get the Gauss-Newton step p_k^{GN} from
$$\mathbf{J}(\mathbf{x}_k)^T\mathbf{J}(\mathbf{x}_k)\,\mathbf{p}_k^{GN} = -\mathbf{J}(\mathbf{x}_k)^T\mathbf{r}(\mathbf{x}_k). \qquad (2.15)$$
The reason we use this approximation is to avoid calculating the full Hessian matrix ∇²f; furthermore, J(x)^T J(x) usually dominates the second term of ∇²f(x) in the Taylor series.
Now let's return to the nonlinear problem
$$\min_{\mathbf{x},\mathbf{y}}\ \|\mathbf{b} - \mathbf{A}(\mathbf{y})\mathbf{x}\|_2^2.$$
Following Chung and Nagy [7], we can define the coupled least squares problem
$$\min_{\mathbf{w}}\ \psi(\mathbf{w}) = \min_{\mathbf{w}}\ \frac{1}{2}\|\mathbf{f}(\mathbf{w})\|_2^2, \qquad (2.16)$$
where
$$\mathbf{f}(\mathbf{w}) = \mathbf{f}(\mathbf{x},\mathbf{y}) = \begin{bmatrix}\mathbf{A}(\mathbf{y})\\ \lambda\mathbf{I}\end{bmatrix}\mathbf{x} - \begin{bmatrix}\mathbf{b}\\ \mathbf{0}\end{bmatrix}
\qquad\text{and}\qquad
\mathbf{w} = \begin{bmatrix}\mathbf{x}\\ \mathbf{y}\end{bmatrix}.$$
The problem min_w ψ(w) can now be solved by the Gauss-Newton method, whose iterates are given by
$$\mathbf{w}_{\ell+1} = \mathbf{w}_\ell + \mathbf{d}_\ell, \qquad \ell = 0, 1, 2, \ldots,$$
where w_0 is an initial guess and d_ℓ is computed by solving
$$\mathbf{J}_\psi(\mathbf{w}_\ell)^T\mathbf{J}_\psi(\mathbf{w}_\ell)\,\mathbf{d}_\ell = -\mathbf{J}_\psi(\mathbf{w}_\ell)^T\mathbf{f}(\mathbf{w}_\ell).$$
If we define r = −f, then
$$\mathbf{J}_\psi(\mathbf{w}_\ell)^T\mathbf{J}_\psi(\mathbf{w}_\ell)\,\mathbf{d}_\ell = -\mathbf{J}_\psi(\mathbf{w}_\ell)^T\mathbf{f}(\mathbf{w}_\ell) = \mathbf{J}_\psi(\mathbf{w}_\ell)^T\mathbf{r}(\mathbf{w}_\ell).$$
Finding the search direction d_ℓ is equivalent to
$$\min_{\mathbf{d}}\|\mathbf{J}_\psi(\mathbf{w}_\ell)^T\mathbf{J}_\psi(\mathbf{w}_\ell)\,\mathbf{d} - \mathbf{J}_\psi(\mathbf{w}_\ell)^T\mathbf{r}(\mathbf{w}_\ell)\|_2^2
= \min_{\mathbf{d}}\|\mathbf{J}_\psi(\mathbf{w}_\ell)^T(\mathbf{J}_\psi(\mathbf{w}_\ell)\,\mathbf{d} - \mathbf{r}(\mathbf{w}_\ell))\|_2^2
= \min_{\mathbf{d}}\|\mathbf{J}_\psi(\mathbf{w}_\ell)\,\mathbf{d} - \mathbf{r}(\mathbf{w}_\ell)\|_2^2 .$$
The Jacobian matrix J_ψ can be written as
$$\mathbf{J}_\psi = \begin{bmatrix}\mathbf{f}_{\mathbf{x}} & \mathbf{f}_{\mathbf{y}}\end{bmatrix} = \begin{bmatrix}\dfrac{\partial \mathbf{f}(\mathbf{x},\mathbf{y})}{\partial\mathbf{x}} & \dfrac{\partial \mathbf{f}(\mathbf{x},\mathbf{y})}{\partial\mathbf{y}}\end{bmatrix}. \qquad (2.17)$$
In summary, the Gauss-Newton method for min_w ψ(w) has the following general form.

1. Choose initial w_0 = [x_0; y_0]
2. for ℓ = 0, 1, 2, ...
   2.1 r_ℓ = [b; 0] − [A(y_ℓ); λI] x_ℓ
   2.2 d_ℓ = argmin_d ‖J_ψ d − r_ℓ‖_2
   2.3 w_{ℓ+1} = w_ℓ + d_ℓ
3. end
But forming J_ψ and solving with it at each step can be expensive and time-consuming when x and y are large. Alternative approaches include variable projection [6, 10] and alternating optimization [24].
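To make the Gauss-Newton iteration concrete, the following minimal sketch (assuming MATLAB/Octave; the exponential model, data, and starting guess are hypothetical) applies step (2.15) to a small nonlinear least squares fit.

```matlab
% Sketch: basic Gauss-Newton iteration (2.15) for a small nonlinear fit.
t = linspace(0, 1, 50)';  ptrue = [2; 5];
d = ptrue(1) * exp(-ptrue(2) * t) + 1e-3 * randn(size(t));   % observed data
r = @(p) p(1) * exp(-p(2) * t) - d;                          % residual vector r(p)
J = @(p) [exp(-p(2) * t), -p(1) * t .* exp(-p(2) * t)];      % Jacobian of r(p)
p = [1; 1];                                                  % initial guess
for k = 1:20
    dk = -(J(p) \ r(p));        % Gauss-Newton step: solves min_d ||J d + r||_2
    p  = p + dk;                % update with unit step length
end
```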
2.4 Variable Projection for Separable Nonlinear Least-
Squares Problems
As described in Golub and Pereyra [6] and O'Leary and Rust [10], for a given observation b = [b_1 ⋯ b_n]^T, a separable nonlinear least-squares problem consists of a linear combination of nonlinear functions that depend on multiple parameters. The residual vector r is defined by
$$r_i(\mathbf{x},\mathbf{y}) = b_i - \sum_{j=1}^{n} x_j\,\phi_j(\mathbf{y};\, t_i),$$
where

- the t_i are independent variables associated with the observations b_i;
- the nonlinear functions φ_j(y; t_i), evaluated at all t_i, form the columns of A(y);
- the x_j and the n-dimensional vector y are obtained by minimizing ‖r(x,y)‖²₂.

Thus we have
$$\|\mathbf{r}(\mathbf{x},\mathbf{y})\|_2^2 = \|\mathbf{b} - \mathbf{A}(\mathbf{y})\mathbf{x}\|_2^2.$$
If we assume the nonlinear parameters y are known, then the linear parameters x can be computed as
$$\mathbf{x} = \mathbf{A}(\mathbf{y})^{+}\mathbf{b},$$
where A(y)^+ is the pseudoinverse of A(y) and x is the solution of the linear least squares problem for fixed y. Substituting this into the nonlinear problem, the optimization problem takes the form
$$\min_{\mathbf{y}}\ \frac{1}{2}\big\|\big(\mathbf{I} - \mathbf{A}(\mathbf{y})\mathbf{A}(\mathbf{y})^{+}\big)\mathbf{b}\big\|_2^2,$$
in which the linear parameters have been eliminated. The vector (I − A(y)A(y)^+)b is called the variable projection of b, and I − A(y)A(y)^+ is the projector onto the orthogonal complement of the column space of A(y).

The variable projection method is an iterative nonlinear algorithm that solves this minimization problem in the reduced space. In general, it tends to converge in fewer iterations than methods applied to the original minimization problem; however, convergence of the method is not guaranteed.
By eliminating x implicitly, we reduce the cost so that it depends only on y. This approach is therefore most appropriate when y has relatively few parameters compared to x. Using the fact that ψ(x,y) from (2.16) is linear in x, we can consider
$$\rho(\mathbf{y}) \equiv \psi(\mathbf{x}(\mathbf{y}),\mathbf{y}) = \frac{1}{2}\left\|\begin{bmatrix}\mathbf{A}(\mathbf{y})\\ \lambda\mathbf{I}\end{bmatrix}\mathbf{x}(\mathbf{y}) - \begin{bmatrix}\mathbf{b}\\ \mathbf{0}\end{bmatrix}\right\|_2^2 \qquad (2.18)$$
where x(y) is the solution of min_x ψ(x,y). Now, in order to apply the Gauss-Newton algorithm to ρ(y), we need to compute
$$\rho'(\mathbf{y}) = \psi_{\mathbf{x}}\cdot\frac{d\mathbf{x}}{d\mathbf{y}} + \psi_{\mathbf{y}}\cdot\frac{d\mathbf{y}}{d\mathbf{y}}.$$
Since x is the solution of min_x ψ(x,y), we have
$$\psi_{\mathbf{x}} = \begin{bmatrix}\mathbf{A}(\mathbf{y})^T & \lambda\mathbf{I}\end{bmatrix}\left(\begin{bmatrix}\mathbf{A}(\mathbf{y})\\ \lambda\mathbf{I}\end{bmatrix}\mathbf{x} - \begin{bmatrix}\mathbf{b}\\ \mathbf{0}\end{bmatrix}\right) = \mathbf{0}.$$
Thus
$$\rho'(\mathbf{y}) = \mathbf{0}\cdot\frac{d\mathbf{x}}{d\mathbf{y}} + \psi_{\mathbf{y}}\cdot 1 = \psi_{\mathbf{y}},$$
and with f = [f_1 f_2 ⋯ f_{2n}]^T from
$$\mathbf{f} = \begin{bmatrix}\mathbf{A}(\mathbf{y})\\ \lambda\mathbf{I}\end{bmatrix}\mathbf{x} - \begin{bmatrix}\mathbf{b}\\ \mathbf{0}\end{bmatrix},$$
we get
$$\psi_{\mathbf{y}} = \frac{1}{2}\begin{bmatrix}f_{1\mathbf{y}} & f_{2\mathbf{y}} & \cdots & f_{2n\,\mathbf{y}}\end{bmatrix}\begin{bmatrix}f_1\\ f_2\\ \vdots\\ f_{2n}\end{bmatrix} + \frac{1}{2}\begin{bmatrix}f_1 & f_2 & \cdots & f_{2n}\end{bmatrix}\begin{bmatrix}f_{1\mathbf{y}}\\ f_{2\mathbf{y}}\\ \vdots\\ f_{2n\,\mathbf{y}}\end{bmatrix} = \mathbf{f}_{\mathbf{y}}^T\mathbf{f},$$
and so ρ'(y) = f_y^T f. From this calculation we see that
$$\mathbf{J}_\rho = \mathbf{f}_{\mathbf{y}} = \frac{\partial}{\partial\mathbf{y}}\begin{bmatrix}\mathbf{A}(\mathbf{y})\mathbf{x} - \mathbf{b}\\ \lambda\mathbf{x}\end{bmatrix} = \begin{bmatrix}\dfrac{\partial[\mathbf{A}(\mathbf{y})\mathbf{x}]}{\partial\mathbf{y}}\\[4pt] \mathbf{0}\end{bmatrix}. \qquad (2.19)$$
Let $\bar{\mathbf{J}}_\rho = \dfrac{\partial[\mathbf{A}(\mathbf{y})\mathbf{x}]}{\partial\mathbf{y}}$ and r = b − A(y)x. Since
$$\mathbf{J}_\rho^T\mathbf{J}_\rho = \begin{bmatrix}\bar{\mathbf{J}}_\rho^T & \mathbf{0}\end{bmatrix}\begin{bmatrix}\bar{\mathbf{J}}_\rho\\ \mathbf{0}\end{bmatrix} = \bar{\mathbf{J}}_\rho^T\bar{\mathbf{J}}_\rho,
\qquad
-\mathbf{J}_\rho^T\mathbf{f} = -\begin{bmatrix}\bar{\mathbf{J}}_\rho^T & \mathbf{0}\end{bmatrix}\left(\begin{bmatrix}\mathbf{A}(\mathbf{y})\\ \lambda\mathbf{I}\end{bmatrix}\mathbf{x} - \begin{bmatrix}\mathbf{b}\\ \mathbf{0}\end{bmatrix}\right) = -\bar{\mathbf{J}}_\rho^T\big(\mathbf{A}(\mathbf{y})\mathbf{x} - \mathbf{b}\big) = \bar{\mathbf{J}}_\rho^T\mathbf{r},$$
we can obtain the search direction d_ℓ for the Gauss-Newton algorithm by solving
$$\bar{\mathbf{J}}_\rho^T\bar{\mathbf{J}}_\rho\,\mathbf{d}_\ell = \bar{\mathbf{J}}_\rho^T\mathbf{r}.$$
Finding d_ℓ is equivalent to
$$\min_{\mathbf{d}}\|\bar{\mathbf{J}}_\rho^T\bar{\mathbf{J}}_\rho\,\mathbf{d} - \bar{\mathbf{J}}_\rho^T\mathbf{r}\|_2^2
= \min_{\mathbf{d}}\|\bar{\mathbf{J}}_\rho^T(\bar{\mathbf{J}}_\rho\,\mathbf{d} - \mathbf{r})\|_2^2
= \min_{\mathbf{d}}\|\bar{\mathbf{J}}_\rho\,\mathbf{d} - \mathbf{r}\|_2^2 .$$
This reduced Gauss-Newton method for min_y ρ(y) has the general form:

1. Choose initial y_0
2. for ℓ = 0, 1, 2, ...
   2.1 x_ℓ = argmin_x ‖[A(y_ℓ); λ_ℓ I] x − [b; 0]‖_2
   2.2 r_ℓ = b − A(y_ℓ) x_ℓ
   2.3 d_ℓ = argmin_d ‖J̄_ρ d − r_ℓ‖_2
   2.4 y_{ℓ+1} = y_ℓ + d_ℓ
3. end
Besides computing the search direction d_ℓ at each step, we also need to choose the step length. Line-search rules such as the Armijo rule [20] select a step size α_ℓ for the update
$$\mathbf{y}_{\ell+1} = \mathbf{y}_\ell + \alpha_\ell\,\mathbf{d}_\ell.$$
In our numerical examples, however, the step size did not significantly affect the results, so we set α_ℓ = 1.
To choose the regularization parameter for the linear least squares subproblem, GCV (2.13) is applied to the Tikhonov problem
$$\mathbf{x}_\ell = \operatorname*{argmin}_{\mathbf{x}}\ \left\|\begin{bmatrix}\mathbf{A}(\mathbf{y}_\ell)\\ \lambda_\ell\mathbf{I}\end{bmatrix}\mathbf{x} - \begin{bmatrix}\mathbf{b}\\ \mathbf{0}\end{bmatrix}\right\|_2 .$$
To obtain the nonlinear parameter direction
$$\mathbf{d}_\ell = \operatorname*{argmin}_{\mathbf{d}}\ \|\bar{\mathbf{J}}_\rho\,\mathbf{d} - \mathbf{r}_\ell\|_2,$$
we solve the normal equations, d_ℓ = (J̄_ρ^T J̄_ρ)^{-1} J̄_ρ^T r_ℓ. To enforce the constraint y ≥ 0, we use the built-in Matlab function lsqnonlin.m, which is based on [8, 9].
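A compact way to see the variable projection idea in code: for every candidate y, the linear parameters are eliminated by an inner Tikhonov solve, and only the reduced residual in y is handed to the outer solver. The following is a minimal sketch (assumptions: MATLAB with the Optimization Toolbox for lsqnonlin; Afun is a hypothetical function handle that builds A(y); the λx part of (2.18) is omitted from the reduced residual for simplicity), not the exact implementation used in this thesis.

```matlab
% Sketch: variable projection with an inner Tikhonov solve and an outer
% bound-constrained solver enforcing y >= 0.
function y = varpro_sketch(Afun, b, y0, lambda)
    m    = numel(b);
    xofy = @(y) [Afun(y); lambda*eye(m)] \ [b; zeros(m,1)];   % x(y): Tikhonov solve
    res  = @(y) Afun(y) * xofy(y) - b;                        % reduced residual in y only
    y    = lsqnonlin(res, y0, zeros(size(y0)), []);           % y >= 0 via lower bounds
end
```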
Chapter 3
Exploiting a Low Rank PSF in
Solving Nonlinear Inverse Problems
The general separable nonlinear least squares problem can be written as
$$\min_{\substack{\mathbf{x},\,\mathbf{y}\\ \text{s.t. } \mathbf{y}\ge 0}}\ \|\mathbf{A}(\mathbf{y})\mathbf{x} - \mathbf{b}\|_2^2, \qquad (3.1)$$
where x ∈ R^{n×1}, y ∈ R^{n×1} with y ≥ 0, A(y) ∈ R^{n×n}, and b ∈ R^{n×1}. For our problem, we consider point spread functions P that have rank 1.

First we consider the symmetric case, where y ∈ R^{n×1} and
$$\mathbf{P} = \frac{1}{s(\mathbf{y})}\,\mathbf{y}\mathbf{y}^T \qquad\text{with}\qquad s(\mathbf{y}) = \sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j.$$
We also consider the non-symmetric case, where y, z ∈ R^{n×1} and
$$\mathbf{P}(\mathbf{y},\mathbf{z}) = \frac{1}{s(\mathbf{y},\mathbf{z})}\,\mathbf{y}\mathbf{z}^T \qquad\text{with}\qquad s(\mathbf{y},\mathbf{z}) = \sum_{i=1}^{n}\sum_{j=1}^{n} y_i z_j.$$
In both cases P has rank 1.

In order to apply the Gauss-Newton approach to the reduced problem, we need to derive the Jacobian J_ρ. We assume throughout that the entries of the PSF are nonnegative and that the sum of the entries of the PSF is 1.
3.1 Symmetric PSF
Let the PSF be P = (1/s(y)) yy^T, where s(y) = Σ_{i=1}^n Σ_{j=1}^n y_i y_j is the sum of the entries of yy^T. The Jacobian J_ρ ∈ R^{n²×n} contains all partial derivatives of vec(P(y)) with respect to y, and has the form
$$\mathbf{J}_\rho = \frac{\partial\,\mathrm{vec}(\mathbf{P}(\mathbf{y}))}{\partial\mathbf{y}} = \frac{\partial}{\partial\mathbf{y}}\left[\frac{1}{s(\mathbf{y})}\,\mathbf{y}\otimes\mathbf{y}\right], \qquad (3.2)$$
because, for y ∈ R^{n×1},
$$\mathbf{P}(\mathbf{y}) = \frac{1}{s(\mathbf{y})}\,\mathbf{y}\mathbf{y}^T = \frac{1}{s(\mathbf{y})}\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{bmatrix}\begin{bmatrix}y_1 & y_2 & \cdots & y_n\end{bmatrix} = \frac{1}{s(\mathbf{y})}\begin{bmatrix}y_1y_1 & y_1y_2 & \cdots & y_1y_n\\ y_2y_1 & y_2y_2 & \cdots & y_2y_n\\ \vdots & \vdots & \ddots & \vdots\\ y_ny_1 & y_ny_2 & \cdots & y_ny_n\end{bmatrix},$$
$$\mathrm{vec}(\mathbf{P}(\mathbf{y})) = \frac{1}{s(\mathbf{y})}\begin{bmatrix}y_1^2 & y_2y_1 & \cdots & y_ny_1 & \cdots & y_1y_n & y_2y_n & \cdots & y_n^2\end{bmatrix}^T = \frac{1}{s(\mathbf{y})}\,\mathbf{y}\otimes\mathbf{y}.$$
For all k = 1, ..., n,
$$\frac{\partial s(\mathbf{y})}{\partial y_k} = \frac{\partial}{\partial y_k}\Big(\sum_{i=1}^{n}\sum_{j=1}^{n}y_iy_j\Big) = 2\sum_{i=1}^{n}y_i = 2\|\mathbf{y}\|_1,$$
using the nonnegativity of y.
Therefore, applying the quotient rule entrywise, for i, j, k = 1, ..., n,
$$\frac{\partial}{\partial y_k}\!\left(\frac{y_i y_j}{s(\mathbf{y})}\right) = \frac{\delta_{ik}\,y_j + \delta_{jk}\,y_i}{s(\mathbf{y})} - \frac{2\|\mathbf{y}\|_1\, y_i y_j}{s(\mathbf{y})^2},$$
where δ denotes the Kronecker delta. Collecting these n² × n partial derivatives, and writing 1^T = [1 1 ⋯ 1] ∈ R^{1×n}, we obtain
$$\mathbf{J}_\rho = \frac{\partial\,\mathrm{vec}(\mathbf{P}(\mathbf{y}))}{\partial\mathbf{y}} = \frac{1}{s(\mathbf{y})}\big(\mathbf{y}\otimes\mathbf{I}_n + \mathbf{I}_n\otimes\mathbf{y}\big) - \frac{2\|\mathbf{y}\|_1}{s(\mathbf{y})^2}\big(\mathbf{y}\otimes\mathbf{y}\mathbf{1}^T\big), \qquad (3.3)$$
with J_ρ ∈ R^{n²×n}.
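The Jacobian formula (3.3) can be validated against finite differences. The following minimal sketch (assuming MATLAB/Octave, with a small random nonnegative y) compares the analytic Jacobian of vec(P(y)) with a central-difference approximation.

```matlab
% Sketch: finite-difference check of the symmetric-PSF Jacobian (3.3).
n = 5;  y = rand(n,1);  one = ones(1,n);
s    = @(y) sum(y)^2;                                    % s(y) = sum_i sum_j y_i*y_j
vecP = @(y) kron(y, y) / s(y);                           % vec(P(y)) = (1/s) y ⊗ y
Jrho = (kron(y, eye(n)) + kron(eye(n), y)) / s(y) ...
     - 2*norm(y,1) / s(y)^2 * kron(y, y*one);            % analytic Jacobian (3.3)
Jfd = zeros(n^2, n);  h = 1e-6;
for k = 1:n
    e = zeros(n,1);  e(k) = h;
    Jfd(:,k) = (vecP(y+e) - vecP(y-e)) / (2*h);          % central differences
end
norm(Jrho - Jfd)                                         % small: the formulas agree
```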
3.2 Non-symmetric PSF
Now consider the scaled PSF built from two different nonnegative vectors y ∈ R^{n×1} and z ∈ R^{n×1},
$$\mathbf{P}(\mathbf{y},\mathbf{z}) = \frac{1}{s(\mathbf{y},\mathbf{z})}\,\mathbf{y}\mathbf{z}^T \qquad\text{where}\qquad s(\mathbf{y},\mathbf{z}) = \sum_{i=1}^{n}\sum_{j=1}^{n} y_i z_j$$
is the sum of the entries of yz^T. The Jacobians can be derived as
$$\mathbf{J}_{\mathbf{y}} = \frac{\partial\,\mathrm{vec}(\mathbf{P}(\mathbf{y},\mathbf{z}))}{\partial\mathbf{y}} = \frac{\partial}{\partial\mathbf{y}}\left[\frac{1}{s(\mathbf{y},\mathbf{z})}\,(\mathbf{z}\otimes\mathbf{y})\right],
\qquad
\mathbf{J}_{\mathbf{z}} = \frac{\partial\,\mathrm{vec}(\mathbf{P}(\mathbf{y},\mathbf{z}))}{\partial\mathbf{z}} = \frac{\partial}{\partial\mathbf{z}}\left[\frac{1}{s(\mathbf{y},\mathbf{z})}\,(\mathbf{z}\otimes\mathbf{y})\right],$$
because, for y ∈ R^{n×1} and z ∈ R^{n×1},
$$\mathbf{P}(\mathbf{y},\mathbf{z}) = \frac{1}{s(\mathbf{y},\mathbf{z})}\,\mathbf{y}\mathbf{z}^T = \frac{1}{s(\mathbf{y},\mathbf{z})}\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{bmatrix}\begin{bmatrix}z_1 & z_2 & \cdots & z_n\end{bmatrix} = \frac{1}{s(\mathbf{y},\mathbf{z})}\begin{bmatrix}y_1z_1 & y_1z_2 & \cdots & y_1z_n\\ y_2z_1 & y_2z_2 & \cdots & y_2z_n\\ \vdots & \vdots & \ddots & \vdots\\ y_nz_1 & y_nz_2 & \cdots & y_nz_n\end{bmatrix},$$
$$\mathrm{vec}(\mathbf{P}(\mathbf{y},\mathbf{z})) = \frac{1}{s(\mathbf{y},\mathbf{z})}\begin{bmatrix}y_1z_1 & y_2z_1 & \cdots & y_nz_1 & \cdots & y_1z_n & y_2z_n & \cdots & y_nz_n\end{bmatrix}^T = \frac{1}{s(\mathbf{y},\mathbf{z})}\,\mathbf{z}\otimes\mathbf{y}.$$
For all k = 1, ..., n,
$$\frac{\partial s(\mathbf{y},\mathbf{z})}{\partial y_k} = \frac{\partial}{\partial y_k}\Big(\sum_{i=1}^{n}\sum_{j=1}^{n}y_iz_j\Big) = \sum_{i=1}^{n}z_i = \|\mathbf{z}\|_1,
\qquad
\frac{\partial s(\mathbf{y},\mathbf{z})}{\partial z_k} = \frac{\partial}{\partial z_k}\Big(\sum_{i=1}^{n}\sum_{j=1}^{n}y_iz_j\Big) = \sum_{i=1}^{n}y_i = \|\mathbf{y}\|_1.$$
Therefore, applying the quotient rule entrywise, for i, j, k = 1, ..., n,
$$\frac{\partial}{\partial y_k}\!\left(\frac{y_i z_j}{s(\mathbf{y},\mathbf{z})}\right) = \frac{\delta_{ik}\,z_j}{s(\mathbf{y},\mathbf{z})} - \frac{\|\mathbf{z}\|_1\, y_i z_j}{s(\mathbf{y},\mathbf{z})^2},$$
where δ denotes the Kronecker delta. Collecting these partial derivatives, with 1^T = [1 1 ⋯ 1] ∈ R^{1×n},
$$\mathbf{J}_{\mathbf{y}} = \frac{\partial\,\mathrm{vec}(\mathbf{P}(\mathbf{y},\mathbf{z}))}{\partial\mathbf{y}} = \frac{1}{s(\mathbf{y},\mathbf{z})}\,(\mathbf{z}\otimes\mathbf{I}_n) - \frac{\|\mathbf{z}\|_1}{s(\mathbf{y},\mathbf{z})^2}\,(\mathbf{z}\otimes\mathbf{y}\mathbf{1}^T). \qquad (3.4)$$
Similarly,
$$\mathbf{J}_{\mathbf{z}} = \frac{\partial\,\mathrm{vec}(\mathbf{P}(\mathbf{y},\mathbf{z}))}{\partial\mathbf{z}} = \frac{1}{s(\mathbf{y},\mathbf{z})}\,(\mathbf{I}_n\otimes\mathbf{y}) - \frac{\|\mathbf{y}\|_1}{s(\mathbf{y},\mathbf{z})^2}\,(\mathbf{z}\otimes\mathbf{y}\mathbf{1}^T). \qquad (3.5)$$
Note that both J_y and J_z are in R^{n²×n}. Therefore, we finally obtain the Jacobian
$$\mathbf{J}_\rho = \begin{bmatrix}\mathbf{J}_{\mathbf{y}} & \mathbf{J}_{\mathbf{z}}\end{bmatrix}
= \begin{bmatrix}\dfrac{1}{s(\mathbf{y},\mathbf{z})}(\mathbf{z}\otimes\mathbf{I}_n) - \dfrac{\|\mathbf{z}\|_1}{s(\mathbf{y},\mathbf{z})^2}(\mathbf{z}\otimes\mathbf{y}\mathbf{1}^T) & \ \dfrac{1}{s(\mathbf{y},\mathbf{z})}(\mathbf{I}_n\otimes\mathbf{y}) - \dfrac{\|\mathbf{y}\|_1}{s(\mathbf{y},\mathbf{z})^2}(\mathbf{z}\otimes\mathbf{y}\mathbf{1}^T)\end{bmatrix},$$
with J_ρ ∈ R^{n²×2n}. We will use this Jacobian in Section 4.1 for the reduced Gauss-Newton method.
3.3 Reformulation
We also observe that a reformulation can be used to simplify the problem. For the non-symmetric PSF, P = yz^T, if we fix x_0 and y_0, then the problem is linear in z. Let vec(X_0) = x_0. Then
$$\min_{\mathbf{z}}\ \|\mathbf{A}(\mathbf{y}_0\mathbf{z}^T)\,\mathbf{x}_0 - \mathbf{b}\|_2^2 = \min_{\mathbf{z}}\ \|\mathbf{A}(\mathbf{X}_0)(\mathbf{z}\otimes\mathbf{y}_0) - \mathbf{b}\|_2^2.$$
We can also rewrite
$$\mathbf{z}\otimes\mathbf{y}_0 = \begin{bmatrix}z_1\mathbf{y}_0\\ z_2\mathbf{y}_0\\ \vdots\\ z_n\mathbf{y}_0\end{bmatrix} = \begin{bmatrix}\mathbf{y}_0 & & &\\ & \mathbf{y}_0 & &\\ & & \ddots &\\ & & & \mathbf{y}_0\end{bmatrix}\begin{bmatrix}z_1\\ z_2\\ \vdots\\ z_n\end{bmatrix} = (\mathbf{I}_n\otimes\mathbf{y}_0)\,\mathbf{z}. \qquad (3.6)$$
Thus we have
$$\min_{\mathbf{z}}\ \|\mathbf{A}(\mathbf{y}_0\mathbf{z}^T)\,\mathbf{x}_0 - \mathbf{b}\|_2^2 = \min_{\mathbf{z}}\ \|\mathbf{A}(\mathbf{X}_0)(\mathbf{I}_n\otimes\mathbf{y}_0)\,\mathbf{z} - \mathbf{b}\|_2^2. \qquad (3.7)$$
Let Y = A(X_0)(I_n ⊗ y_0). Then we have min_z ‖Yz − b‖²₂, a linear least squares problem. In the same way, if we fix z_0 and x_0, then we can reformulate the nonlinear problem as a linear problem in y:
$$\min_{\mathbf{y}}\ \|\mathbf{A}(\mathbf{y}\mathbf{z}_0^T)\,\mathbf{x}_0 - \mathbf{b}\|_2^2 = \min_{\mathbf{y}}\ \|\mathbf{A}(\mathbf{X}_0)(\mathbf{z}_0\otimes\mathbf{y}) - \mathbf{b}\|_2^2.$$
We can rewrite
$$\mathbf{z}_0\otimes\mathbf{y} = \begin{bmatrix}z_{0,1}\mathbf{y}\\ z_{0,2}\mathbf{y}\\ \vdots\\ z_{0,n}\mathbf{y}\end{bmatrix} = \begin{bmatrix}z_{0,1}\mathbf{I}_n\\ z_{0,2}\mathbf{I}_n\\ \vdots\\ z_{0,n}\mathbf{I}_n\end{bmatrix}\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{bmatrix} = (\mathbf{z}_0\otimes\mathbf{I}_n)\,\mathbf{y}. \qquad (3.8)$$
Thus we have
$$\min_{\mathbf{y}}\ \|\mathbf{A}(\mathbf{y}\mathbf{z}_0^T)\,\mathbf{x}_0 - \mathbf{b}\|_2^2 = \min_{\mathbf{y}}\ \|\mathbf{A}(\mathbf{X}_0)(\mathbf{z}_0\otimes\mathbf{I}_n)\,\mathbf{y} - \mathbf{b}\|_2^2. \qquad (3.9)$$
Let Z = A(X_0)(z_0 ⊗ I_n). Then we have min_y ‖Zy − b‖²₂, again a linear least squares problem. With these reformulations, the alternating optimization method can be carried out efficiently; see Section 4.2.
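The reformulation (3.7) reduces one sweep of the alternating scheme to an ordinary linear least squares solve. The following minimal sketch (assuming MATLAB/Octave; AX0 is a hypothetical precomputed matrix representing A(X_0), and y0 is the current fixed factor) shows this solve for z; the solve for y via (3.9) is analogous.

```matlab
% Sketch: with x0 and y0 fixed, z solves a linear least squares problem (3.7).
n = numel(y0);
Y = AX0 * kron(eye(n), y0);    % Y = A(X_0)(I_n ⊗ y_0), as in (3.6)
z = Y \ b;                     % unconstrained linear least squares solve for z
```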
Chapter 4
Numerical Results
4.1 Variable projection with low rank PSF
In this chapter, we solve a nonlinear inverse problem with a separable low-rank PSF on an image deblurring example. This type of experiment is called blind deconvolution. The forward process is given by (1.3). In real applications, we do not know how the images were blurred; many types of blur have been investigated, e.g., Gaussian blur [7], but our framework allows more general and realistic blurs through low-rank PSFs.

We assume that the PSF is a rank-1 matrix. Thus we consider either the symmetric case,
$$\mathbf{P} = \mathbf{y}\mathbf{y}^T,$$
where y ∈ R^{n×1} and y ≥ 0, or the non-symmetric case,
$$\mathbf{P} = \mathbf{y}\mathbf{z}^T,$$
where y, z ∈ R^{n×1} and y ≥ 0, z ≥ 0. We apply the PSF to the Grain image from Chung and Nagy [7]; the image size is 256 × 256.
Figure 4.1: True parameters y_true and z_true, together with the initial guesses y_0 and z_0. (a) y_true, y_0; (b) z_true, z_0.
This experiment uses a non-symmetric PSF, so we set the true parameters y_true and z_true shown in Figure 4.1. After constructing the PSF, the matrix A is formed, and with the true image we compute b. To simulate noisy observations, 1% Gaussian white noise is added to b. We force y and z to satisfy that the sum of the entries of yz^T is 1 by dividing by the sum of the elements of yz^T. We set the initial guess y_0 by convolving y_true with a Gaussian kernel [18, 19], and z_0 similarly.
We use J_ρ from the result of Chapter 3 for the non-symmetric case,
$$\mathbf{J}_\rho = \begin{bmatrix}\mathbf{J}_{\mathbf{y}} & \mathbf{J}_{\mathbf{z}}\end{bmatrix}
= \begin{bmatrix}\dfrac{1}{s(\mathbf{y},\mathbf{z})}(\mathbf{z}\otimes\mathbf{I}_n) - \dfrac{\|\mathbf{z}\|_1}{s(\mathbf{y},\mathbf{z})^2}(\mathbf{z}\otimes\mathbf{y}\mathbf{1}^T) & \ \dfrac{1}{s(\mathbf{y},\mathbf{z})}(\mathbf{I}_n\otimes\mathbf{y}) - \dfrac{\|\mathbf{y}\|_1}{s(\mathbf{y},\mathbf{z})^2}(\mathbf{z}\otimes\mathbf{y}\mathbf{1}^T)\end{bmatrix},$$
and apply the Gauss-Newton method. We need to solve
$$\min_{\substack{\mathbf{x},\mathbf{y},\mathbf{z}\\ \text{s.t. } \mathbf{y},\mathbf{z}\ge 0}}\ \|\mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\,\mathbf{x} - \mathbf{b}\|_2^2, \qquad (4.1)$$
but this problem is ill-posed. Thus, regularization parameters λ_l for the linear parameters and λ_nl for the nonlinear parameters are added to give the restricted problem
$$\min_{\substack{\mathbf{x},\mathbf{y},\mathbf{z}\\ \text{s.t. } \mathbf{y},\mathbf{z}\ge 0}}\ \|\mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\,\mathbf{x} - \mathbf{b}\|_2^2 + \lambda_l\|\mathbf{x}\|_2^2 + \lambda_{nl}\left\|\begin{bmatrix}\mathbf{y}\\ \mathbf{z}\end{bmatrix}\right\|_2^2. \qquad (4.2)$$

Figure 4.2: Comparing λ_nl = 0.025, 0.028, 0.03, 0.032, 0.035. (a) relative error of x; (b) relative error of y; (c) relative error of z; (d) relative error of x, y, and z.
The linear regularization parameter λ_l is chosen by GCV (2.13). Without the nonlinear regularization parameter λ_nl, the iterates y and z are unstable, because the condition number of J_ρ is very large (near 10^{14}) even though we assumed Σ_i Σ_j y_i z_j = 1 in Chapter 3. To suppress y and z, λ_nl is chosen experimentally; see Figure 4.2. By varying λ_nl between 0.025 and 0.035, we can observe how the relative errors of x, y, and z change. Among these values, λ_nl = 0.035 gives the smallest relative error in x, while the relative errors of y and z increase drastically. In terms of the total relative error, λ_nl = 0.03 is the best nonlinear regularization parameter in this experiment. Choosing λ_nl = 0.03, we can see in Figure 4.3 that y and z move slightly closer to y_true and z_true, and the relative error of x decreases steadily as the iteration number increases. After iteration ℓ = 15, however, there is no significant change in the errors. The relative errors are listed in Table 4.1.

Figure 4.3: Error comparison for the non-symmetric PSF with λ_nl = 0.03. (a) y_true, y_initial, y_computed; (b) z_true, z_initial, z_computed; (c) relative errors of x, y, and z.
Table 4.1: Relative norm errors for the non-symmetric PSF with λ_nl = 0.03

 ℓ    ‖x_true−x_ℓ‖₂/‖x_true‖₂   ‖y_true−y_ℓ‖₂/‖y_true‖₂   ‖z_true−z_ℓ‖₂/‖z_true‖₂
 1            0.2810                    0.2560                    0.2671
 2            0.2528                    0.2539                    0.2652
 3            0.2176                    0.2504                    0.2616
 4            0.1962                    0.2470                    0.2581
 5            0.1824                    0.2439                    0.2547
 6            0.1720                    0.2410                    0.2516
 7            0.1647                    0.2384                    0.2488
 8            0.1580                    0.2360                    0.2461
 9            0.1518                    0.2338                    0.2435
 10           0.1461                    0.2317                    0.2411
 11           0.1407                    0.2299                    0.2389
 12           0.1363                    0.2283                    0.2368
 13           0.1321                    0.2268                    0.2348
 14           0.1284                    0.2255                    0.2330
 15           0.1245                    0.2244                    0.2313
 16           0.1245                    0.2244                    0.2313
In Figure 4.4, we see how the Grain image changes as the reduced Gauss-Newton method proceeds. The final image is still far from the true one, but the images become closer to the true image as ℓ increases. We also obtained unexpected results when the same experiment was tried with a symmetric blur: the relative errors of x and y do not decrease monotonically and eventually start to increase. To investigate this issue, in the next section we fix x_true and alternate between solving the linear problems for y and z.

In Figure 4.5, we see how the PSF changes as the reduced Gauss-Newton method proceeds. The true PSF contains oscillations in both y and z, while the initial guesses y_0 and z_0 have no oscillation, so the initial PSF corresponds to a smooth blur. As ℓ increases, the final PSF becomes closer to the true PSF than the initial PSF.
Figure 4.4: A comparison of the true, blurred, and deblurred images from the reduced Gauss-Newton method with the non-symmetric PSF and λ_nl = 0.03.

Figure 4.5: A comparison of the true and reconstructed PSFs with λ_nl = 0.03.
4.2 x_true - Alternating Optimization
In Section 3.3, we converted the nonlinear problem into a sequence of linear problems by fixing x_true and y_0, or x_true and z_0. For these linear problems we use the Matlab function lsqlin.m to enforce nonnegativity [8] and linear equality constraints [17] whenever a constrained least squares problem has to be solved. First, for fixed x_true, we have the problem
$$\min_{\substack{\mathbf{y},\mathbf{z}\ \text{s.t. } \mathbf{y},\mathbf{z}\ge 0\\ \|\mathbf{y}\|_1 = \|\mathbf{z}\|_1 = 1}}\ \|\mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\,\mathbf{x}_{\mathrm{true}} - \mathbf{b}\|_2^2, \qquad (4.3)$$
where P(y,z) = (1/s(y,z)) yz^T with s(y,z) = Σ_{i=1}^n Σ_{j=1}^n y_i z_j. Then the alternating optimization has the general form:
1. Choose y_0.
2. For ℓ = 0, 1, 2, ...,
   - Set z_initial s.t. min_z ‖A(X_true)(I_n ⊗ y_ℓ) z − b‖_2.
   - Modify z_initial to be nonnegative with ‖z‖_1 = 1.
   2.1 Find z_{ℓ+1} s.t. min_{z ≥ 0, ‖z‖_1 = 1} ‖A(X_true)(I_n ⊗ y_ℓ) z − b‖_2, starting from z_initial.
   - Set y_initial s.t. min_y ‖A(X_true)(z_{ℓ+1} ⊗ I_n) y − b‖_2.
   - Modify y_initial to be nonnegative with ‖y‖_1 = 1.
   2.2 Find y_{ℓ+1} s.t. min_{y ≥ 0, ‖y‖_1 = 1} ‖A(X_true)(z_{ℓ+1} ⊗ I_n) y − b‖_2, starting from y_initial.
3. end
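One inner step of the algorithm above is a linearly constrained least squares problem, which lsqlin can handle directly. The following minimal sketch (assumptions: MATLAB with the Optimization Toolbox; AXtrue is a hypothetical precomputed matrix representing A(X_true), and y is the current iterate) shows step 2.1, the constrained solve for z.

```matlab
% Sketch: step 2.1 of the alternating scheme, solving for z with z >= 0, sum(z) = 1.
n  = numel(y);
C  = AXtrue * kron(eye(n), y);             % coefficient matrix A(X_true)(I_n ⊗ y)
z0 = max(C \ b, 0);  z0 = z0 / sum(z0);    % unconstrained start, modified to be feasible
z  = lsqlin(C, b, [], [], ones(1,n), 1, zeros(n,1), [], z0);   % nonneg. + sum-to-one
```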
Figure 4.6: Error comparison for Alternating Optimization. (a) y_true, y_initial, y_computed; (b) z_true, z_initial, z_computed; (c) relative errors of y and z.
To start this algorithm, y_0 is chosen by convolving y_true with a Gaussian kernel [18, 19]. For setting the PSF P(y,z), we make the sum of all elements of P(y,z) equal to 1 by dividing yz^T by Σ_{i=1}^n Σ_{j=1}^n y_i z_j, as in Chapter 3. However, by forcing ‖y‖_1 = ‖z‖_1 = 1, the sum of all elements of yz^T is automatically equal to 1, because
$$\sum_{i=1}^{n}\sum_{j=1}^{n} y_i z_j = \Big(\sum_{i=1}^{n} y_i\Big)\Big(\sum_{j=1}^{n} z_j\Big) = \Big(\sum_{i=1}^{n} |y_i|\Big)\Big(\sum_{j=1}^{n} |z_j|\Big) = \|\mathbf{y}\|_1\|\mathbf{z}\|_1 = 1,$$
where the second equality uses y, z ≥ 0 and the last uses ‖y‖_1 = ‖z‖_1 = 1.
Table 4.2: Relative norm errors for Alternating Optimization

 ℓ    ‖y_true−y_ℓ‖₂/‖y_true‖₂   ‖z_true−z_ℓ‖₂/‖z_true‖₂
 0          0.25847714                0.04590491
 1          0.04462480                0.04392798
 2          0.04437537                0.04393802
 3          0.04434619                0.04394230
 4          0.04434112                0.04392070
 5          0.04433983                0.04392104
 6          0.04433946                0.04392113
 7          0.04433936                0.04392115
 8          0.04433934                0.04392116
 9          0.04433933                0.04392116
 10         0.04433933                0.04392116
For the linear least squares problems with y, z ≥ 0 and ‖y‖_1 = ‖z‖_1 = 1, the Matlab function lsqlin.m is used with the initial vectors y_initial and z_initial, respectively. When z_initial and y_initial are first computed, they are neither nonnegative nor normalized, so before passing them to lsqlin.m we apply a small modification: negative entries are set to zero and the vector is rescaled to have unit 1-norm. Once the alternating optimization starts, the relative error of y drops quickly to a small value; the error of z starts small and shows little change as the iterations continue (see Figure 4.6). The computed y and z are very close to the true y and z. Strictly speaking, we would need to prove convergence of the alternating optimization method to put it on a solid theoretical footing; without such a proof, it cannot be expected in general that y_ℓ and z_ℓ converge to the true y and z. Chan and Wong [24] note that a non-convex model can admit multiple solutions; although they use a PSF with only one parameter, they establish global convergence, with the limit depending on the initial guess. In this thesis, we assume that the PSF is a rank-1 matrix, giving n² nonlinear PSF entries, which means we must consider results with many more parameters than in the model of Chan and Wong. Therefore we only give numerical experiments without convergence results; establishing theoretical convergence remains future work.
4.3 Alternating Optimization 3 ways
This section is an extension of alternating optimization to solve for x, y, and z simultaneously, which makes the problem considerably harder. After initializing y_0 and z_0, we compute the initial x_0 by solving
$$\min_{\mathbf{x}}\ \|\mathbf{A}(\mathbf{P}(\mathbf{y}_0,\mathbf{z}_0))\,\mathbf{x} - \mathbf{b}\|_2 \qquad (4.4)$$
with a regularization method, namely the weighted-GCV method for Lanczos-hybrid regularization [23] implemented in the Matlab code HyBR.m [21, 22, 23]. As before, we enforce y, z ≥ 0 and ‖y‖_1 = ‖z‖_1 = 1. Then the alternating optimization 3 ways has the general form:
1. Choose y_0, z_0.
2. For ℓ = 0, 1, 2, ...,
   2.1 Compute x_ℓ s.t. min_x ‖A(P(y_ℓ, z_ℓ)) x − b‖_2, and set vec(X_ℓ) = x_ℓ.
   - Set z_initial s.t. min_z ‖A(X_ℓ)(I_n ⊗ y_ℓ) z − b‖_2.
   - Modify z_initial to be nonnegative with ‖z‖_1 = 1.
   2.2 Find z_{ℓ+1} s.t. min_{z ≥ 0, ‖z‖_1 = 1} ‖A(X_ℓ)(I_n ⊗ y_ℓ) z − b‖_2, starting from z_initial.
   - Set y_initial s.t. min_y ‖A(X_ℓ)(z_{ℓ+1} ⊗ I_n) y − b‖_2.
   - Modify y_initial to be nonnegative with ‖y‖_1 = 1.
   2.3 Find y_{ℓ+1} s.t. min_{y ≥ 0, ‖y‖_1 = 1} ‖A(X_ℓ)(z_{ℓ+1} ⊗ I_n) y − b‖_2, starting from y_initial.
3. end
For this experiment, we consider a smaller example: a 64 × 64 image obtained by cutting out the middle part of the Grain image, so that A ∈ R^{64²×64²} and b, x ∈ R^{64²×1}. When we simply try to find x, y, and z through the alternating optimization 3 ways, the results are not satisfactory, as shown in Figure 4.7. Since we start with x_0, which is derived from y_0 and z_0 using HyBR.m, x_0 may be far from x_true. To compensate for this weakness, we add more information by using rotated images, under the assumption that the PSF is invariant. In this case, A and b have to be modified accordingly. Since interpolation would be needed (and some parts of the original image lost) for rotations by angles other than 90°, 180°, and 270°, we consider only these three rotations; see Figure 4.8.

Figure 4.7: Error comparison for Alternating Optimization 3 ways: relative errors of x, y, and z.

For the image vector x, let R_k x denote the rotated image, where R_k ∈ R^{n²×n²} is a rotation matrix with k = 0, 1, 2, 3; R_k x rotates the image x counterclockwise by k · 90°. The corresponding observed image is b_k = A(P(y, z)) R_k x.
Figure 4.8: Rotated true and blurred images with 0°, 90°, 180°, and 270°.
Thus, with k = 0, 1, 2, 3, we can restate our problem as
$$\min_{\substack{\mathbf{x},\mathbf{y},\mathbf{z}\ \text{s.t. } \mathbf{y},\mathbf{z}\ge 0\\ \|\mathbf{y}\|_1 = \|\mathbf{z}\|_1 = 1}}\ \left\|\begin{bmatrix}\mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\mathbf{R}_0\\ \mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\mathbf{R}_1\\ \vdots\\ \mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\mathbf{R}_k\end{bmatrix}\mathbf{x} - \begin{bmatrix}\mathbf{b}_0\\ \mathbf{b}_1\\ \vdots\\ \mathbf{b}_k\end{bmatrix}\right\|_2^2, \qquad (4.5)$$
and define
$$\mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z})) = \begin{bmatrix}\mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\mathbf{R}_0\\ \mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\mathbf{R}_1\\ \vdots\\ \mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\mathbf{R}_k\end{bmatrix}
\qquad\text{and}\qquad
\mathbf{b} = \begin{bmatrix}\mathbf{b}_0\\ \mathbf{b}_1\\ \vdots\\ \mathbf{b}_k\end{bmatrix}. \qquad (4.6)$$
Now this problem can be treated as minimizing ‖A(P(y,z))x − b‖²₂, where A(P(y,z)) ∈ R^{kn²×n²} and b ∈ R^{kn²}. The relative errors are shown in Figure 4.10. We can expect that more rotations give better reconstructions.
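Forming the rotated observations in (4.6) only requires rotating the image before blurring. The following minimal sketch (assuming MATLAB/Octave with zero boundary conditions; the image and PSF are hypothetical) builds the stacked right-hand side b = [b_0; b_1; b_2; b_3].

```matlab
% Sketch: building the rotated, blurred observations b_k = A(P) R_k x of (4.5)-(4.6).
X = rand(64);                              % hypothetical 64 x 64 image
P = rand(5);  P = P / sum(P(:));           % hypothetical PSF, normalized
bstack = cell(4,1);
for k = 0:3
    Bk = conv2(rot90(X, k), P, 'same');    % blur the image rotated by k*90 degrees
    bstack{k+1} = Bk(:);                   % vectorize and collect
end
b = vertcat(bstack{:});                    % stacked right-hand side of (4.5)
```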
Figure 4.9: A comparison of the errors of y and z for Alternating Optimization 3 ways with rotation. (a) error between true and computed y; (b) error between true and computed z, for 0, 1, 2, and 3 rotations.
From Figure 4.10, we can check that the relative errors of x, y, z, and PSF decrease as the
number of rotations grow. Using three rotations, we get the smallest errors overall. From
this we can expect that we can get better computed x, y, and z if we have more rotation
images. In Figure 4.9, as we have checked the relative errors in Figure 4.10, the alternating
optimization 3 ways with three times rotations gives us a meaningful error between the
true and computed y, z. The errors are just absolute difference values for each of the
corresponding yi, zi, i = 1, . . . , 64. While the other cases go away from the true y, z, three
times rotation gives the final computed y, z which is close to the true y, z respectively.
A potential drawback of the alternating optimization 3 ways with rotation is that it takes more time for larger n. As n (the image size) and k (the number of rotations) grow, the algorithm needs more computation time. Figures 4.11 and 4.12 show that the computed image and PSF improve as more rotations are used. In Figure 4.13, we repeat the same experiment with the original Grain image, whose size is 256 × 256. Without rotation, the deblurred image gets worse, and the initial image is still better than the result after 16 iterations of Alternating Optimization 3 ways.
Figure 4.10: Relative errors by Alternating Optimization 3 ways with rotation. (a) Relative errors of x; (b) relative errors of y; (c) relative errors of z; (d) relative errors of the PSF, each versus iteration for 0, 1, 2, and 3 rotations.
However, with two or three rotations, the computed images are much better than with no rotation or just one rotation. In Figure 4.14, for the 256 × 256 PSF, two or three rotations again outperform zero or one rotation, although the reconstructed PSFs are still poor approximations of the true PSF. For the larger Grain image, the results are not as good as in the 64 × 64 case, but with more rotations better reconstructions are likely.
Figure 4.11: Computed images by Alternating Optimization 3 ways with rotation (64 × 64 size). Panels (a)-(d): true, initial, and final images with no, one, two, and three rotations.
Figure 4.12: Computed PSFs by Alternating Optimization 3 ways with rotation (64 × 64 size). Panels (a)-(d): true, initial, and final PSFs with no, one, two, and three rotations.
Figure 4.13: Computed images by Alternating Optimization 3 ways with rotation (256 × 256 size). Panels (a)-(d): true, initial, and final images with no, one, two, and three rotations.
Figure 4.14: Computed PSFs by Alternating Optimization 3 ways with rotation (256 × 256 size). Panels (a)-(d): true, initial, and final PSFs with no, one, two, and three rotations.
Chapter 5
Conclusions and Discussion
We have experimented with different image deblurring cases using different methods. Solving nonlinear inverse problems is very challenging, but it is also more realistic, because in practice we do not know the true image or how the true image was blurred. In this thesis we assumed that the blurring process is based on a rank-1 PSF. This may look restrictive, but it applies to cases where the blur acts only vertically and horizontally. Moreover, rank-1 PSFs have the special property that the blur matrix can be expressed as a Kronecker product.
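That Kronecker property can be checked numerically with a few MATLAB lines. The sketch below is only an illustration of the structure, not the thesis implementation: it uses zero boundary conditions (the experiments in this thesis use reflexive ones), an arbitrarily chosen PSF center, and the ordering convention B = Ay*X*Az', i.e. vec(B) = kron(Az, Ay)*vec(X).

% Sketch: a rank-1 PSF P = y*z' gives a blur matrix with Kronecker structure.
n  = 8;
y  = rand(n,1); y = y/sum(y);          % 1-D PSF components (nonnegative, sum 1)
z  = rand(n,1); z = z/sum(z);
P  = y*z';                             % rank-1 PSF
c0 = 4;                                % center index of the 1-D PSFs (illustrative choice)
Ay = toeplitz([y(c0:end); zeros(c0-1,1)], [y(c0:-1:1); zeros(n-c0,1)]);
Az = toeplitz([z(c0:end); zeros(c0-1,1)], [z(c0:-1:1); zeros(n-c0,1)]);
X  = rand(n);                          % arbitrary test image
% vec(Ay*X*Az') equals kron(Az,Ay)*vec(X); the difference is ~ machine precision.
norm(kron(Az, Ay)*X(:) - reshape(Ay*X*Az', [], 1))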
For the reduced Gauss-Newton method, we constructed the Jacobian matrix using Kronecker products, and we observed that the blurred image was deblurred as well as we expected. The method takes considerable time, but the running time could be improved by refining the Matlab implementation: forming the Jacobian matrices and multiplying with them dominate the cost, so that part offers the most room for improvement.
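One standard way to reduce that cost, sketched here under the same assumptions as the previous snippet (Ay, Az, and n come from it; this is not the thesis code), is never to form kron(Az, Ay) explicitly and instead to apply it through small matrix products.

% Sketch: apply kron(Az, Ay) and its transpose without building the large matrix,
% using kron(Az,Ay)*vec(X) = vec(Ay*X*Az'). This replaces an O(n^4) matrix-vector
% product with two n-by-n matrix products.
applyBlur  = @(v, Ay, Az, n) reshape(Ay  * reshape(v, n, n) * Az', [], 1);
applyBlurT = @(v, Ay, Az, n) reshape(Ay' * reshape(v, n, n) * Az , [], 1);
v  = rand(n^2, 1);
r1 = applyBlur(v, Ay, Az, n);
r2 = kron(Az, Ay) * v;
norm(r1 - r2)                          % ~ machine precision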
While seeking more stable solutions close to the exact one, we found that this nonlinear inverse problem with a rank-1 PSF can be rewritten as a sequence of
linear inverse problems by fixing some of the linear and nonlinear parameters. As discussed in the introduction, efficient methods already exist for solving linear problems, so we can apply them within nonlinear solvers. However, when we started with only the initial y0 and z0, the iterates moved away from the true solution as the algorithm was repeated. To overcome this and provide more information to the linear problem, we used rotated images, which gave improved results as the number of iterations increased.
We applied the alternating optimization 3 ways with rotation to a small image; if the rotated images can be generated, or the solutions computed, more efficiently, this approach could be applied to larger images in a reasonable amount of time.
In future research, we would like to extend the approach to higher-rank PSFs and to different types of images. Since these experiments used the Grain image with reflexive boundary conditions, we may also want to consider sparse images, such as a satellite image or a bone in a medical image, with different PSF boundary conditions.
As mentioned in Section 4.2, proving convergence of the alternating optimization is also future work. For that analysis we may need to consider PSFs with fewer parameters, that is, fewer than n^2.
Chapter 6
References
[1] D. Calvetti and L. Reichel. Tikhonov regularization with a solution constraint. SIAM J.
Sci. Comput., 26:224–239, 2004.
[2] Per Christian Hansen. Discrete Inverse Problems: Insight and Algorithms. SIAM, 2010.
[3] Per Christian Hansen, James G. Nagy, Dianne P. O'Leary. Deblurring Images: Matrices, Spectra, and Filtering. SIAM, 2006.
[4] Jorge Nocedal, Stephen J. Wright. Numerical Optimization, 2nd edition. Springer, 2006.
[5] Per Christian Hansen. Rank-deficient and Discrete Ill-posed Problems: Numerical As-
pects of Linear Inversion. SIAM, 1998.
[6] Gene Golub, Victor Pereyra. Separable nonlinear least squares: the variable projection method and its applications. Inverse Problems, 19, R1-R26, 2003.
[7] Julianne Chung, James G. Nagy. An efficient iterative approach for large-scale separable
nonlinear inverse problems. SIAM J. Sci. Comput. Vol. 31, No. 6, pp. 4654-4674, 2010.
[8] Thomas F. Coleman, Yuying Li. An interior trust region approach for nonlinear minimization subject to bounds. SIAM Journal on Optimization, Vol. 6, pp. 418-445, 1996.
[9] Thomas F. Coleman, Yuying Li. On the convergence of interior-reflective Newton methods for nonlinear minimization subject to bounds. Mathematical Programming, 67, pp. 189-224, 1994.
[10] Dianne P. O'Leary, Bert W. Rust. Variable projection for nonlinear least squares problems. Computational Optimization and Applications, 54, pp. 579-593, 2013.
[11] Anastasia Cornelio, Elena Loli Piccolomini, James G. Nagy. Constrained numerical optimization methods for blind deconvolution. Numerical Algorithms, Vol. 65, pp. 23-42, 2014.
[12] Rafael C. Gonzalez, Richard E. Woods. Digital Image Processing, 3rd edition. Prentice
Hall, 2008
[13] Julie Kamm, James G. Nagy. Kronecker product and SVD approximations in image restoration. Linear Algebra and its Applications, 284, pp. 177-192, 1998.
[14] James W. Demmel. Applied Numerical Linear Algebra. SIAM, 1997
[15] Per Christian Hansen. The discrete Picard condition for discrete ill-posed problems.
BIT, 30, pp. 503-518, 1990
[16] C. F. Van Loan. Computational Frameworks for the Fast Fourier Transform. SIAM,
Philadelphia, pp. 39, 42, 47, 1992
[17] Gill, P. E., W. Murray, and M. H. Wright. Practical Optimization. Academic Press,
London, UK, 1981
[18] Harris, Fredric J. On the Use of Windows for Harmonic Analysis with the Discrete
Fourier Transform. Proceedings of the IEEE. Vol. 66, pp. 51-83., January 1978
[19] Roberts, Richard A., and C. T. Mullis. Digital Signal Processing. Reading, MA:
Addison-Wesley, pp. 135-136., 1987
[20] C. T. Kelley. Iterative Methods for Optimization. SIAM, Philadelphia, PA., 1999
[21] C. C. Paige, M. A. Saunders. LSQR: an algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software, 8, pp. 43-71, 1982.
[22] Bjorck, Grimme and Van Dooren. An implicit shift bidiagonalization algorithm for
ill-posed systems. BIT 34, pp. 520-534., 1994
[23] Chung, Nagy and O’Leary. A Weighted-GCV Method for Lanczos-Hybrid Regulariza-
tion. ETNA 28, pp. 149-167, 2008
[24] Tony F. Chan, C. K. Wong. Convergence of the alternating minimization algorithm for blind deconvolution. Linear Algebra and its Applications, 316, pp. 259-285, 2000.