Numerical Methods for Separable Nonlinear Inverse Problems with
Constraint and Low Rank
Taewon Cho
Thesis submitted to the Faculty of the
Virginia Polytechnic Institute and State University
in partial fulfillment of the requirements for the degree of
Master
in
Mathematics
Julianne Chung, Chair
Matthias Chung
Mark Embree
Nov 20, 2017
Blacksburg, Virginia
Keywords: Nonlinear Inverse Problem, Image Deblurring, Gauss-Newton method, Variable
Projection, Alternating Optimization
Copyright 2017, Taewon Cho
Numerical Methods for Separable Nonlinear Inverse Problems with
Constraint and Low Rank
Taewon Cho
(ABSTRACT)
Inverse problems arise in many application areas, ranging from astronomy to geoscience. For example, image reconstruction and deblurring require methods for solving inverse problems. Because these problems are affected by many factors and by noise, general inversion methods cannot simply be applied. Furthermore, in the problems of interest the number of unknown variables is huge, and some unknowns may depend nonlinearly on the data, so nonlinear problems must be solved. Nonlinear inverse problems are quite different from, and significantly more challenging than, linear inverse problems, and more sophisticated methods are needed to solve them.
Numerical Methods for Separable Nonlinear Inverse Problems with
Constraint and Low Rank
Taewon Cho
(GENERAL AUDIENCE ABSTRACT)
In various research areas, many desired measurements cannot be observed directly for physical or economic reasons. Instead, these unknown quantities can be recovered from measurements that are available. This recovery process can be modeled and solved mathematically.
Contents

1 Introduction
2 Background
  2.1 Point Spread Function (PSF)
    2.1.1 One-dimensional
    2.1.2 Two-dimensional
    2.1.3 Low-Rank PSF problem
  2.2 Regularization for the Linear Problem
    2.2.1 Picard Condition
    2.2.2 Spectral Filtering Methods
    2.2.3 Choosing the Regularization Parameter
  2.3 Gauss-Newton Method for Nonlinear Least Squares
  2.4 Variable Projection for Separable Nonlinear Least-Squares Problems
3 Exploiting a Low Rank PSF in Solving Nonlinear Inverse Problems
  3.1 Symmetric PSF
  3.2 Non-symmetric PSF
  3.3 Reformulation
4 Numerical Results
  4.1 Variable projection with low rank PSF
  4.2 x_true - Alternating Optimization
  4.3 Alternating Optimization 3 ways
5 Conclusions and Discussion
6 References

List of Figures

1.1 Forward Problem
1.2 Example of image blurring
2.1 Blurring by PSF
2.2 Discrete Picard Conditions from [2]
4.1 True parameters y_true, z_true
4.2 Comparing λ_nl = 0.025, 0.028, 0.03, 0.032, 0.035
4.3 Error comparison for the non-symmetric PSF with λ_nl = 0.03
4.4 A comparison of the true, blurred, and deblurred images from the reduced Gauss-Newton method with the non-symmetric PSF and λ_nl = 0.03
4.5 A comparison of the true and reconstructed PSFs with λ_nl = 0.03
4.6 Error comparison for Alternating Optimization
4.7 Error comparison for Alternating Optimization 3 ways
4.8 Rotated true and blurred images with 0°, 90°, 180°, 270°
4.9 A comparison of the errors of y and z by Alternating Optimization 3 ways with rotation
4.10 Relative errors by Alternating Optimization 3 ways with rotation
4.11 Computed images by Alternating Optimization 3 ways with rotation (64 × 64 size)
4.12 Computed PSFs by Alternating Optimization 3 ways with rotation (64 × 64 size)
4.13 Computed images by Alternating Optimization 3 ways with rotation (256 × 256 size)
4.14 Computed PSFs by Alternating Optimization 3 ways with rotation (256 × 256 size)

List of Tables

4.1 Table of relative norm errors for the non-symmetric PSF with λ_nl = 0.03
4.2 Table of relative norm errors for Alternating Optimization
Chapter 1
Introduction
Figure 1.1: Forward Problem. Input x, forward operation A, output b.
We are usually interested in forward problems, where we simply compute data from given parameters. In other words, if we have input data and a forward procedure, we observe output data. In this case the input data and the forward procedure are known quantities, and the output is unknown. For the problems of interest here, evaluating the forward procedure is usually not expensive. But consider the problem where we know the output data and do not know one or more of the input parameters or the forward procedure. Then inversion becomes a much more complicated task, because we need to modify the system to recover the unknown data. Problems where the goal is to compute unknown input data and the forward procedure from output data are called inverse problems [2]. In simulated problems, we get observed data from the forward system, and the goal is to recover the original data; this is an inverse problem. Other examples arise when we take blurred images in astronomy, medicine, or geoscience: we want to obtain true images from blurred images (Figure 1.2), or we want to infer characteristics of the Earth's interior from surface measurements.

(a) True Image (b) Blur (c) Blurred Image
Figure 1.2: Example of image blurring
In order to understand the inverse problem, we need tools from basic linear algebra and matrix computations. Let A ∈ R^{n×n} be a linear operator and let x ∈ R^{n×1}, b ∈ R^{n×1} be vectors. Then the forward model is given by b = Ax. That is, in the forward model we know A and x, and we compute b by matrix-vector multiplication. On the other hand, if we only know A and b, then finding the exact x becomes a complicated problem. This type of problem is known as an inverse problem, and in well-posed situations we could obtain x using the inverse (or pseudoinverse) of A. But in many real applications A is likely to be ill-conditioned or singular. When A is singular, A^{-1} does not exist; and even when A is well-conditioned but large, explicitly computing A^{-1} and solving x = A^{-1}b requires an enormous amount of time and cost, even on high performance computers. Moreover, if A is not square but an m-by-n rectangular matrix with m > n, standard inversion does not apply, and x may not be unique. So we instead solve the least squares problem min_x ‖b − Ax‖, using techniques such as the normal equations, the QR decomposition, and the Singular Value Decomposition (SVD), A = UΣV^T, where U and V are orthogonal matrices and Σ is a diagonal matrix with nonnegative real entries.
Regularization is one approach to impose prior knowledge in order to solve the inverse problem more reliably; by solving reliably we mean obtaining a regularized solution x_reg that is close to x_true. To analyze what happens when solving for x, we use the SVD, in terms of which x_reg can be expressed as
$$\mathbf{x}_{\mathrm{reg}} = \sum_{i=1}^{n} \phi_i \, \frac{\mathbf{u}_i^T \mathbf{b}}{\sigma_i}\, \mathbf{v}_i , \qquad (1.1)$$
where the σ_i are the diagonal elements of Σ, the u_i are the columns of U, and the v_i are the columns of V. The φ_i are filter factors, which play a central role in regularization. We will discuss regularization further in Chapter 2.

But even if we use regularization to stabilize the inversion process, a further challenge remains: the forward operator A may depend on some unknown parameters. In many real cases, A may not be known exactly. Thus when we build A, we need to incorporate a new variable y, and consider the form
$$\mathbf{A} = \mathbf{A}(\mathbf{y}),$$
where y ∈ R^{n×1}. Then we have
$$\mathbf{b} = \mathbf{A}(\mathbf{y})\,\mathbf{x},$$
and we need to solve a nonlinear least squares problem with both linear parameters x and nonlinear parameters y,
$$\min_{\mathbf{x},\,\mathbf{y}} \ \|\mathbf{b} - \mathbf{A}(\mathbf{y})\,\mathbf{x}\|_2^2 . \qquad (1.2)$$
In many scenarios, it may be desirable to include additional solution constraints. For example, since images often represent light intensities or densities, the pixel values should be nonnegative, so we could constrain x such that x ≥ 0. Various methods for this have been investigated, such as the Active Set method [11]. However, in our numerical experience we did not observe significant improvements from the Active Set method, so we do not consider it here. We do, however, enforce y ≥ 0.

Therefore our goal in this thesis is to solve the following constrained nonlinear least squares problem
$$\min_{\substack{\mathbf{x},\,\mathbf{y} \\ \text{s.t. } \mathbf{y}\ge 0}} \ \|\mathbf{A}(\mathbf{y})\,\mathbf{x} - \mathbf{b}\|_2^2 \qquad (1.3)$$
where A is the forward operator matrix depending on the unknown parameters y, b is the observed data vector, and x is the true parameter vector. To solve this problem we will use the Gauss-Newton method, but computing the Jacobian at every iteration is very expensive. Thus we discuss how to use variable projection methods to reduce the computational cost and exploit the separable model. We also consider alternating optimization methods that can exploit problem structure.
In this thesis, we will start by introducing how to construct the PSF and how to mathematically describe image deblurring as a linear model. We will then look at how to regularize the linear problem and investigate methods for nonlinear least squares problems, including Gauss-Newton, variable projection, and alternating optimization. Finally, we will apply these numerical methods to blind image deblurring problems and analyze the numerical results.
Chapter 2
Background
2.1 Point Spread Function (PSF)
First we describe what the point spread function (PSF) is. The PSF is very important in
image processing because it can be used to describe a blur and define the forward operation.
There are different reasons for blurs. Mainly we can split them into physical and mechanical
processes. For example in taking a photo, moving the camera or imaging through the
atmosphere physically cause blurs. A deformed or broken lens is one mechanical reason for
blur [3, 12].
The PSF can be used to construct mathematical models. It is based on the assumption that each pixel is blurred by its neighboring pixels. For example, in two dimensions, if the only intensity in the image is at the center of the matrix, then blurring with the PSF array of Figure 2.1, whose entries are 1/10 and 2/10, spreads that intensity over the neighboring pixels.
In Figure 2.1, each cell of the blurred matrix is calculated as a sum of component-wise multiplications between the original matrix and the PSF array, under the assumption that components outside the original matrix are zero.

Origin:
$$\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
PSF array:
$$\begin{bmatrix} \tfrac{1}{10} & \tfrac{1}{10} & \tfrac{1}{10} \\[2pt] \tfrac{1}{10} & \tfrac{2}{10} & \tfrac{1}{10} \\[2pt] \tfrac{1}{10} & \tfrac{1}{10} & \tfrac{1}{10} \end{bmatrix}$$
Blurred:
$$\begin{bmatrix} \tfrac{1}{10} & \tfrac{1}{10} & \tfrac{1}{10} \\[2pt] \tfrac{1}{10} & \tfrac{2}{10} & \tfrac{1}{10} \\[2pt] \tfrac{1}{10} & \tfrac{1}{10} & \tfrac{1}{10} \end{bmatrix}$$

Figure 2.1: Blurring by PSF

By matching the center of the PSF array with each chosen node, we can compute the blurred matrix as below.
· At center node (1,1): the PSF array is centered on entry (1,1) of the zero-padded origin matrix. The only nonzero pixel of the origin, the 1 at its center, lines up with a corner entry 1/10 of the PSF array, and every other product is zero, so the sum of all component-wise products gives the blurred value 1/10.

· At center node (2,2): the PSF array is centered on the origin's center pixel, which lines up with the PSF center entry 2/10, so the blurred value is 1 · 2/10 = 2/10.

Repeating the same overlay-multiply-sum computation at all nine center nodes, (1,1) through (3,3), the single unit intensity picks out exactly one PSF entry each time: the value 2/10 at the center node (2,2) and 1/10 at every other node. This yields the blurred matrix shown in Figure 2.1.
Regardless of the location of the pixel, the blur is the same; this is called a spatially invariant blur. This is just one example of a blur PSF, and various boundary conditions can be used for the image. In the blurring process, regardless of the PSF, each pixel is influenced by its neighboring pixels.
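As a concrete check of the spatially invariant blur just described, the following minimal sketch (assuming MATLAB/Octave; conv2 performs two-dimensional convolution with zero boundaries) reproduces the Figure 2.1 example: blurring a single point source with the 3 × 3 PSF array returns the PSF array itself.

```matlab
% Sketch: spatially invariant blur of a point source (Figure 2.1 example).
X = zeros(3);  X(2,2) = 1;            % origin image: a single unit intensity
P = [1 1 1; 1 2 1; 1 1 1] / 10;       % PSF array, entries sum to 1
B = conv2(X, P, 'same');              % convolution with zero boundary condition
disp(B)                               % equals P: a point source reproduces the PSF
```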
2.1.1 One-dimensional
First let's see how to construct the blur matrix in one dimension. We need to define the forward operation that describes how each pixel affects the others, with weights given by the PSF. If p(s) and x(s) are continuous functions, then the convolution of p and x is the function b(s), given by a Fredholm integral equation of the first kind,
$$b(s) = \int_{-\infty}^{\infty} p(s-t)\, x(t)\, dt . \qquad (2.1)$$
For each s, b(s) is obtained by integrating x(t) against a weight from the function p; that is, we flip the function p and shift it to get p(s − t). For the discrete version of convolution, we consider the vectors x, p, and b as the true image, the PSF array, and the blurred image, respectively. In one dimension, for example, the true image and the PSF in R^{3×1} are denoted by
$$\begin{bmatrix} w_1 \\ x_1 \\ x_2 \\ x_3 \\ y_1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix},$$
where w_1 and y_1 are pixels outside of the original image, called the boundary. To get a pixel value in the blurred image we flip and shift the PSF array, obtaining
$$\begin{aligned}
b_1 &= p_3 w_1 + p_2 x_1 + p_1 x_2,\\
b_2 &= p_3 x_1 + p_2 x_2 + p_1 x_3,\\
b_3 &= p_3 x_2 + p_2 x_3 + p_1 y_1.
\end{aligned}$$
Then we can write this convolution as
$$\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} p_3 & p_2 & p_1 & & \\ & p_3 & p_2 & p_1 & \\ & & p_3 & p_2 & p_1 \end{bmatrix} \begin{bmatrix} w_1 \\ x_1 \\ x_2 \\ x_3 \\ y_1 \end{bmatrix}.$$
Now depending on how w1 and y1 are defined, we could define different boundary conditions
or assumptions.
• Zero boundary condition: Setting boundary pixels to zero. In this case, w1 = 0 and
y1 = 0. And the blur matrix is a Toeplitz matrix.
$$\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} p_3 & p_2 & p_1 & & \\ & p_3 & p_2 & p_1 & \\ & & p_3 & p_2 & p_1 \end{bmatrix} \begin{bmatrix} 0 \\ x_1 \\ x_2 \\ x_3 \\ 0 \end{bmatrix} = \begin{bmatrix} p_2 & p_1 & \\ p_3 & p_2 & p_1 \\ & p_3 & p_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$
• Periodic boundary condition: Setting boundary pixels to be periodic with respect to
inside image pixels. In this case, w1 = x3 and y1 = x1. And the matrix is a circulant
matrix.
$$\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} p_3 & p_2 & p_1 & & \\ & p_3 & p_2 & p_1 & \\ & & p_3 & p_2 & p_1 \end{bmatrix} \begin{bmatrix} x_3 \\ x_1 \\ x_2 \\ x_3 \\ x_1 \end{bmatrix} = \begin{bmatrix} p_2 & p_1 & p_3 \\ p_3 & p_2 & p_1 \\ p_1 & p_3 & p_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$
• Reflexive boundary condition: Setting boundary pixels to reflect inside image pixels.
In this case, w1 = x1 and y1 = x3. And the matrix is a Toeplitz plus Hankel matrix.
$$\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} p_3 & p_2 & p_1 & & \\ & p_3 & p_2 & p_1 & \\ & & p_3 & p_2 & p_1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_1 \\ x_2 \\ x_3 \\ x_3 \end{bmatrix} = \begin{bmatrix} p_3 + p_2 & p_1 & \\ p_3 & p_2 & p_1 \\ & p_3 & p_2 + p_1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$
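The three boundary conditions above lead to three different small blur matrices. The following minimal sketch (assuming MATLAB/Octave, with hypothetical PSF weights p1, p2, p3) builds them explicitly for the 3-pixel example.

```matlab
% Sketch: 1-D blur matrices for zero, periodic, and reflexive boundary conditions.
p = [0.2 0.5 0.3];                                            % hypothetical PSF [p1 p2 p3]
Azero = [p(2) p(1) 0;    p(3) p(2) p(1); 0    p(3) p(2)];     % Toeplitz
Aper  = [p(2) p(1) p(3); p(3) p(2) p(1); p(1) p(3) p(2)];     % circulant
Aref  = [p(3)+p(2) p(1) 0; p(3) p(2) p(1); 0 p(3) p(2)+p(1)]; % Toeplitz plus Hankel
x = [1; 2; 3];                                                % small test image
b = Aper * x;                                                 % blurred image, periodic BC
```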
2.1.2 Two-dimensional
Now let's check the two-dimensional case in R^{3×3}. Consider the matrices
$$\mathbf{X} = \begin{bmatrix} w_1 & w_2 & w_3 & w_4 & w_5 \\ w_6 & x_{11} & x_{12} & x_{13} & w_7 \\ w_8 & x_{21} & x_{22} & x_{23} & w_9 \\ w_{10} & x_{31} & x_{32} & x_{33} & w_{11} \\ w_{12} & w_{13} & w_{14} & w_{15} & w_{16} \end{bmatrix}, \quad \mathbf{P} = \begin{bmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{bmatrix}, \quad \mathbf{B} = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix},$$
where wi’s are outside of the original image. To get a blurred image by the convolution
operation in two-dimensions, we flip the matrix P vertically and horizontally and shift it.
Then, for instance, we get the elements of B such that
$$\begin{aligned}
b_{11} &= p_{33}w_1 + p_{32}w_2 + p_{31}w_3 + p_{23}w_6 + p_{22}x_{11} + p_{21}x_{12} + p_{13}w_8 + p_{12}x_{21} + p_{11}x_{22},\\
b_{21} &= p_{33}w_6 + p_{32}x_{11} + p_{31}x_{12} + p_{23}w_8 + p_{22}x_{21} + p_{21}x_{22} + p_{13}w_{10} + p_{12}x_{31} + p_{11}x_{32},\\
b_{31} &= p_{33}w_8 + p_{32}x_{21} + p_{31}x_{22} + p_{23}w_{10} + p_{22}x_{31} + p_{21}x_{32} + p_{13}w_{12} + p_{12}w_{13} + p_{11}w_{14},\\
b_{12} &= p_{33}w_2 + p_{32}w_3 + p_{31}w_4 + p_{23}x_{11} + p_{22}x_{12} + p_{21}x_{13} + p_{13}x_{21} + p_{12}x_{22} + p_{11}x_{23},\\
b_{22} &= p_{33}x_{11} + p_{32}x_{12} + p_{31}x_{13} + p_{23}x_{21} + p_{22}x_{22} + p_{21}x_{23} + p_{13}x_{31} + p_{12}x_{32} + p_{11}x_{33},\\
b_{32} &= p_{33}x_{21} + p_{32}x_{22} + p_{31}x_{23} + p_{23}x_{31} + p_{22}x_{32} + p_{21}x_{33} + p_{13}w_{13} + p_{12}w_{14} + p_{11}w_{15},\\
b_{13} &= p_{33}w_3 + p_{32}w_4 + p_{31}w_5 + p_{23}x_{12} + p_{22}x_{13} + p_{21}w_7 + p_{13}x_{22} + p_{12}x_{23} + p_{11}w_9,\\
b_{23} &= p_{33}x_{12} + p_{32}x_{13} + p_{31}w_7 + p_{23}x_{22} + p_{22}x_{23} + p_{21}w_9 + p_{13}x_{32} + p_{12}x_{33} + p_{11}w_{11},\\
b_{33} &= p_{33}x_{22} + p_{32}x_{23} + p_{31}w_9 + p_{23}x_{32} + p_{22}x_{33} + p_{21}w_{11} + p_{13}w_{14} + p_{12}w_{15} + p_{11}w_{16}.
\end{aligned}$$
For the elements, we need to take into account boundary conditions such as zero, peri-
odic, or reflexive. Then, for example, we could describe the relations between b = vec(B)
and x = vec(X) with zero boundary condition as
$$\mathbf{b} = \begin{bmatrix} b_{11}\\ b_{21}\\ b_{31}\\ b_{12}\\ b_{22}\\ b_{32}\\ b_{13}\\ b_{23}\\ b_{33} \end{bmatrix} = \begin{bmatrix}
p_{22} & p_{12} & & p_{21} & p_{11} & & & & \\
p_{32} & p_{22} & p_{12} & p_{31} & p_{21} & p_{11} & & & \\
 & p_{32} & p_{22} & & p_{31} & p_{21} & & & \\
p_{23} & p_{13} & & p_{22} & p_{12} & & p_{21} & p_{11} & \\
p_{33} & p_{23} & p_{13} & p_{32} & p_{22} & p_{12} & p_{31} & p_{21} & p_{11}\\
 & p_{33} & p_{23} & & p_{32} & p_{22} & & p_{31} & p_{21}\\
 & & & p_{23} & p_{13} & & p_{22} & p_{12} & \\
 & & & p_{33} & p_{23} & p_{13} & p_{32} & p_{22} & p_{12}\\
 & & & & p_{33} & p_{23} & & p_{32} & p_{22}
\end{bmatrix}
\begin{bmatrix} x_{11}\\ x_{21}\\ x_{31}\\ x_{12}\\ x_{22}\\ x_{32}\\ x_{13}\\ x_{23}\\ x_{33} \end{bmatrix} = \mathbf{A}(\mathbf{P})\,\mathbf{x},$$
where A(P) is a block Toeplitz with Toeplitz blocks (BTTB) matrix. Notice that, by a change of variables in (2.1), we can rewrite
$$\mathbf{b} = \begin{bmatrix} b_{11}\\ b_{21}\\ b_{31}\\ b_{12}\\ b_{22}\\ b_{32}\\ b_{13}\\ b_{23}\\ b_{33} \end{bmatrix} = \begin{bmatrix}
x_{22} & x_{12} & & x_{21} & x_{11} & & & & \\
x_{32} & x_{22} & x_{12} & x_{31} & x_{21} & x_{11} & & & \\
 & x_{32} & x_{22} & & x_{31} & x_{21} & & & \\
x_{23} & x_{13} & & x_{22} & x_{12} & & x_{21} & x_{11} & \\
x_{33} & x_{23} & x_{13} & x_{32} & x_{22} & x_{12} & x_{31} & x_{21} & x_{11}\\
 & x_{33} & x_{23} & & x_{32} & x_{22} & & x_{31} & x_{21}\\
 & & & x_{23} & x_{13} & & x_{22} & x_{12} & \\
 & & & x_{33} & x_{23} & x_{13} & x_{32} & x_{22} & x_{12}\\
 & & & & x_{33} & x_{23} & & x_{32} & x_{22}
\end{bmatrix}
\begin{bmatrix} p_{11}\\ p_{21}\\ p_{31}\\ p_{12}\\ p_{22}\\ p_{32}\\ p_{13}\\ p_{23}\\ p_{33} \end{bmatrix} = \mathbf{A}(\mathbf{X})\,\mathrm{vec}(\mathbf{P}).$$
Thus for invariant PSFs, we have the following property
A(P)x = A(X)vec(P) (2.2)
where x = vec(X). We will exploit this property in our algorithms.
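Property (2.2) can be verified numerically. The following minimal sketch (assuming MATLAB/Octave and zero boundary conditions; the test image and PSF are hypothetical) checks that blurring the image X with the PSF P gives the same result as exchanging the roles of image and PSF, which is exactly A(P)x = A(X)vec(P).

```matlab
% Sketch: numerical check of property (2.2) for a 3x3 image and 3x3 PSF.
X = magic(3);                         % hypothetical test image
P = rand(3);  P = P / sum(P(:));      % hypothetical PSF, normalized to sum to 1
b1 = conv2(X, P, 'same');             % corresponds to A(P) * vec(X)
b2 = conv2(P, X, 'same');             % corresponds to A(X) * vec(P)
norm(b1(:) - b2(:))                   % zero up to rounding error
```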
2.1.3 Low-Rank PSF problem
The reason we are interested in low-rank PSFs is that the corresponding matrix A can be described by a Kronecker product. For instance, assume that the PSF is n × n and can be written as
$$\mathbf{P} = \mathbf{y}\mathbf{z}^T = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \begin{bmatrix} z_1 & \cdots & z_n \end{bmatrix}. \qquad (2.3)$$
Then we can write
$$\mathbf{A} = \mathbf{A}(\mathbf{P}) = \mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z})) = \mathbf{A}_z \otimes \mathbf{A}_y \qquad (2.4)$$
where
$$\mathbf{A}_y = \begin{bmatrix}
y_{1+\lceil \frac{n}{2}\rceil} & \cdots & y_1 & & \\
\vdots & \ddots & & \ddots & \\
y_n & \cdots & y_{1+\lceil \frac{n}{2}\rceil} & \cdots & y_1 \\
 & \ddots & & \ddots & \vdots \\
 & & y_n & \cdots & y_{1+\lceil \frac{n}{2}\rceil}
\end{bmatrix}
\quad\text{and}\quad
\mathbf{A}_z = \begin{bmatrix}
z_{1+\lceil \frac{n}{2}\rceil} & \cdots & z_1 & & \\
\vdots & \ddots & & \ddots & \\
z_n & \cdots & z_{1+\lceil \frac{n}{2}\rceil} & \cdots & z_1 \\
 & \ddots & & \ddots & \vdots \\
 & & z_n & \cdots & z_{1+\lceil \frac{n}{2}\rceil}
\end{bmatrix}.$$
If n is an odd number, then (1 + ⌈n/2⌉, 1 + ⌈n/2⌉) will be the center of A_y and A_z.

The reason we focus on the Kronecker product is that we can use the properties of Kronecker products for efficient computation [3, 16]. If A is a Kronecker product, then we can write its SVD as
$$(\mathbf{U}_y\mathbf{\Sigma}_y\mathbf{V}_y^T)\otimes(\mathbf{U}_z\mathbf{\Sigma}_z\mathbf{V}_z^T) = (\mathbf{U}_y\otimes\mathbf{U}_z)(\mathbf{\Sigma}_y\otimes\mathbf{\Sigma}_z)(\mathbf{V}_y\otimes\mathbf{V}_z)^T. \qquad (2.5)$$
And if P is a sum of rank-1 matrices, such as
$$\mathbf{P} = \sum_{j=1}^{n} \mathbf{y}_j \mathbf{z}_j^T, \qquad (2.6)$$
then A can be expressed as a sum of Kronecker products (see Kamm and Nagy [13]),
$$\mathbf{A} = \sum_{j=1}^{n} \mathbf{A}_y^{(j)} \otimes \mathbf{A}_z^{(j)}. \qquad (2.7)$$
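The Kronecker SVD property (2.5) is easy to check numerically. The following minimal sketch (assuming MATLAB/Octave, with small hypothetical factors A_y and A_z) verifies that the SVD factors of a Kronecker product are Kronecker products of the individual SVD factors, up to a reordering of the singular values.

```matlab
% Sketch: numerical check of the Kronecker SVD property (2.5).
Ay = rand(4);  Az = rand(4);                      % hypothetical small factors
[Uy,Sy,Vy] = svd(Ay);  [Uz,Sz,Vz] = svd(Az);
A = kron(Az, Ay);                                 % A = Az ⊗ Ay, as in (2.4)
E = A - kron(Uz,Uy) * kron(Sz,Sy) * kron(Vz,Vy)'; % Kronecker products of the SVD factors
norm(E)                                           % zero up to rounding error
```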
2.2 Regularization for the Linear Problem
First we will consider the linear problem and investigate why regularization is needed to compute a solution. The objective of regularization is to suppress unwanted components and to reconstruct more stable solutions that are close to the exact one. Let us begin with classical perturbation theory. For some perturbation e, consider the two solutions x and x_exact such that
$$\mathbf{A}\mathbf{x}_{\mathrm{exact}} = \mathbf{b}_{\mathrm{exact}} \qquad\text{and}\qquad \mathbf{A}\mathbf{x} = \mathbf{b}_{\mathrm{exact}} + \mathbf{e},$$
where A is a nonsingular square matrix. Then we have the bound
$$\frac{\|\mathbf{x}-\mathbf{x}_{\mathrm{exact}}\|}{\|\mathbf{x}_{\mathrm{exact}}\|} \;\le\; \mathrm{cond}(\mathbf{A})\,\frac{\|\mathbf{e}\|}{\|\mathbf{b}_{\mathrm{exact}}\|}, \qquad (2.8)$$
where cond(A) = ‖A^{-1}‖ · ‖A‖ [2, 14]. When A is ill-conditioned, cond(A) is very large, and x may be far from x_exact. Although (2.8) is only an upper bound, empirically the error between x and x_exact tends to follow it. So regularization is needed to make the solution x close to x_exact [1].
In this thesis we will use the SVD formulation (1.1) to express and describe the regularization methods, where the φ_i are the filter factors determined by the regularization method. Regularization for inverse problems is a well-studied field with many excellent textbooks and papers, e.g., [1, 2, 3].
2.2.1 Picard Condition
For the linear problem Ax = b with A a nonsingular square matrix, we could describe the
inverse solution x from the SVD,
$$\mathbf{V}^T\mathbf{x} = \mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{b}, \qquad
\mathbf{x} = \mathbf{V}\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{b} = \sum_{i=1}^{n} \frac{\mathbf{u}_i^T\mathbf{b}}{\sigma_i}\,\mathbf{v}_i. \qquad (2.9)$$
The discrete Picard condition was proposed and studied in [2, 15].

- The Discrete Picard Condition: Let τ be the level at which the computed singular values σ_i level off because of rounding errors. The discrete Picard condition requires that, for all singular values larger than τ, the corresponding |u_i^T b| decay faster than the σ_i.

Without noise, if the problem satisfies the discrete Picard condition, |u_i^T b| decays faster than σ_i until the round-off level is reached. With noise, however, the condition is eventually violated. If we examine the Picard plot (Figure 2.2) of |u_i^T b/σ_i|, |u_i^T b|, and σ_i, these quantities should all be decreasing, with |u_i^T b| decaying faster than σ_i, so that |u_i^T b/σ_i| also decreases. However, when noise e is added to b, the decay of |u_i^T b| eventually becomes slower than that of σ_i. Hence |u_i^T b/σ_i| starts to increase without bound at some point, which drives the computed solution x far away from the true x. From this observation, if we can suppress the noise components beyond that point, we can obtain a more stable approximate solution. To avoid these noise components, we can use the Truncated SVD (TSVD) or Tikhonov regularization.
Figure 2.2: Discrete Picard Conditions from [2]. Picard plots of σ_i, |u_i^T b|, and |u_i^T b|/σ_i: (a) rounding error only; (b) noise of level 10^{-6}.
2.2.2 Spectral Filtering Methods
For problems where the SVD of A can be computed, we use the SVD of A as in (2.9) to compute x. The noise components start to disturb the solution when σ_i decays faster than |u_i^T b|. To obtain a stable approximate solution, the solution must be kept free of these noise components, so let's look for a way to avoid them.

First, we could simply cut the noisy terms out of the sum of (u_i^T b/σ_i) v_i; by truncating them, we keep x from becoming unbounded. This method is called the Truncated SVD (TSVD). We can split the sum at some index k,
$$\mathbf{x} = \sum_{i=1}^{k} \frac{\mathbf{u}_i^T\mathbf{b}}{\sigma_i}\,\mathbf{v}_i + \sum_{i=k+1}^{n} \frac{\mathbf{u}_i^T\mathbf{b}}{\sigma_i}\,\mathbf{v}_i,$$
where k is the index at which the decay of |u_i^T b| becomes slower than that of σ_i. TSVD cuts off all of the noise components, giving the regularized solution
$$\mathbf{x} = \sum_{i=1}^{k} \frac{\mathbf{u}_i^T\mathbf{b}}{\sigma_i}\,\mathbf{v}_i.$$
But we still need to determine the truncation index k.

Another option is to damp the noise components using filters. Rather than cutting off at some index k, we define filter factors φ_i for each i = 1, ..., n, with a given parameter λ > 0,
$$\phi_i = \frac{\sigma_i^2}{\lambda^2 + \sigma_i^2},$$
and multiply each term of the sum by the corresponding filter factor, giving the regularized solution
$$\mathbf{x} = \sum_{i=1}^{n} \phi_i\,\frac{\mathbf{u}_i^T\mathbf{b}}{\sigma_i}\,\mathbf{v}_i.$$
If σ_i is much bigger than the given λ, then φ_i is close to 1; if σ_i is much smaller than λ, then φ_i is close to 0. In this way we can minimize the disturbance from noise. This method is called Tikhonov regularization. It can be shown that this form is equivalent to the optimization problem
$$\min_{\mathbf{x}} \|\mathbf{A}\mathbf{x}-\mathbf{b}\|_2^2 + \lambda^2\|\mathbf{x}\|_2^2. \qquad (2.10)$$
Notice that
$$\min_{\mathbf{x}} \|\mathbf{A}\mathbf{x}-\mathbf{b}\|_2^2 + \lambda^2\|\mathbf{x}\|_2^2 = \min_{\mathbf{x}}\left\|\begin{bmatrix}\mathbf{A}\\ \lambda\mathbf{I}\end{bmatrix}\mathbf{x} - \begin{bmatrix}\mathbf{b}\\ \mathbf{0}\end{bmatrix}\right\|_2^2.$$
Then the solution x can be obtained from the normal equations,
$$\begin{aligned}
\mathbf{x} &= (\mathbf{A}^T\mathbf{A} + \lambda^2\mathbf{I})^{-1}\mathbf{A}^T\mathbf{b}\\
&= (\mathbf{V}\mathbf{\Sigma}\mathbf{U}^T\mathbf{U}\mathbf{\Sigma}\mathbf{V}^T + \lambda^2\mathbf{V}\mathbf{V}^T)^{-1}\mathbf{V}\mathbf{\Sigma}\mathbf{U}^T\mathbf{b}\\
&= (\mathbf{V}(\mathbf{\Sigma}^2 + \lambda^2\mathbf{I})\mathbf{V}^T)^{-1}\mathbf{V}\mathbf{\Sigma}\mathbf{U}^T\mathbf{b}\\
&= \mathbf{V}(\mathbf{\Sigma}^2 + \lambda^2\mathbf{I})^{-1}\mathbf{V}^T\mathbf{V}\mathbf{\Sigma}\mathbf{U}^T\mathbf{b}\\
&= \mathbf{V}(\mathbf{\Sigma}^2 + \lambda^2\mathbf{I})^{-1}\mathbf{\Sigma}\mathbf{U}^T\mathbf{b}\\
&= \sum_{i=1}^{n} \phi_i\,\frac{\mathbf{u}_i^T\mathbf{b}}{\sigma_i}\,\mathbf{v}_i. \qquad (2.11)
\end{aligned}$$
Similar to TSVD, we need to select a regularization parameter λ that gives us a stable
approximated solution.
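The filtered form (2.11) and the stacked least squares formulation give the same solution, which can be checked directly. The following minimal sketch (assuming MATLAB/Octave; the Hilbert test matrix, x_true, and the noise level are hypothetical choices) compares the SVD-filtered Tikhonov solution with the normal-equations solution.

```matlab
% Sketch: Tikhonov regularization via SVD filtering (2.11) vs. normal equations.
n = 12;  A = hilb(n);                          % small, severely ill-conditioned test matrix
xtrue = ones(n,1);  b = A*xtrue + 1e-6*randn(n,1);
[U,S,V] = svd(A);  s = diag(S);
lambda = 1e-4;
phi  = s.^2 ./ (s.^2 + lambda^2);              % Tikhonov filter factors
xreg = V * (phi .* (U'*b) ./ s);               % filtered SVD solution
xne  = (A'*A + lambda^2*eye(n)) \ (A'*b);      % normal-equations solution
norm(xreg - xne)                               % the two agree up to rounding error
```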
2.2.3 Choosing the Regularization Parameter
In this section, we describe various methods to select k or λ. Define the diagonal filter matrix
$$\boldsymbol{\phi}(\lambda) = \mathrm{diag}(\phi_1, \phi_2, \ldots, \phi_n),$$
and consider the error of the Tikhonov-regularized solution,
$$\begin{aligned}
\mathbf{x}_{\mathrm{exact}} - \mathbf{x} &= \mathbf{x}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{b}\\
&= \mathbf{x}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T(\mathbf{b}_{\mathrm{exact}} + \mathbf{e})\\
&= \mathbf{x}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{b}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{e}\\
&= \mathbf{x}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{A}\mathbf{x}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{e}\\
&= (\mathbf{I} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{A})\mathbf{x}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{e}\\
&= (\mathbf{V}\mathbf{V}^T - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{U}\mathbf{\Sigma}\mathbf{V}^T)\mathbf{x}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{e}\\
&= \mathbf{V}(\mathbf{I} - \boldsymbol{\phi}(\lambda))\mathbf{V}^T\mathbf{x}_{\mathrm{exact}} - \mathbf{V}\boldsymbol{\phi}(\lambda)\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{e}. \qquad (2.12)
\end{aligned}$$
In this way, we can view the error of the solution as consisting of two parts: the first term is called the regularization error and the second term the perturbation error. When λ is close to zero, the regularization error is very small, because φ(λ) → I as λ → 0, but the perturbation error can be large. Conversely, as λ increases the perturbation error decreases but the regularization error grows. So we need to choose an appropriate value of λ that balances the regularization and perturbation errors and thereby minimizes the total error. A similar form of the filter factors holds for TSVD, where φ_i = 1 or φ_i = 0.

We will explore three methods for choosing the parameter λ for Tikhonov regularization, or the truncation index k for TSVD: the Discrepancy Principle (DP), the L-curve, and the Generalized Cross Validation (GCV) method, which is a statistical method. These methods work with the norms
$$\|\mathbf{x}_\lambda\|_2^2 \ \text{and}\ \|\mathbf{A}\mathbf{x}_\lambda - \mathbf{b}\|_2^2 \qquad\text{or}\qquad \|\mathbf{x}_k\|_2^2 \ \text{and}\ \|\mathbf{A}\mathbf{x}_k - \mathbf{b}\|_2^2,$$
which can be monitored as the regularized solution approaches x_exact.
First, the Discrepancy Principle is one of the simplest approaches. If the noise norm ‖e‖_2 is known, then we choose k_dp or λ_dp such that
$$\|\mathbf{A}\mathbf{x}_k - \mathbf{b}\|_2 \ \ge\ \nu_{\mathrm{dp}}\|\mathbf{e}\|_2 \ \ge\ \|\mathbf{A}\mathbf{x}_{k+1} - \mathbf{b}\|_2, \qquad
\|\mathbf{A}\mathbf{x}_\lambda - \mathbf{b}\|_2 = \nu_{\mathrm{dp}}\|\mathbf{e}\|_2,$$
respectively, where ν_dp > 1 is a safety factor [2]. So if we know the size of the noise, we can choose the parameter by making the residual norm equal to the noise norm times a safety factor. The critical disadvantage is that we usually do not know ‖e‖_2 exactly; but when it is known, the Discrepancy Principle is very simple to apply.
Second, the L-curve method uses the curvature of the curve (log ‖Ax_λ − b‖_2, log ‖x_λ‖_2) and seeks the point where the curve transitions between its horizontal and vertical parts. It uses the quantities
$$\xi = \|\mathbf{x}_\lambda\|_2^2 \qquad\text{and}\qquad \rho = \|\mathbf{A}\mathbf{x}_\lambda - \mathbf{b}\|_2^2,$$
and chooses λ to maximize the curvature
$$c_\lambda = 2\,\frac{\xi\rho}{\xi'}\;\frac{\lambda^2\xi'\rho + 2\lambda\xi\rho + \lambda^4\xi\xi'}{(\lambda^2\xi^2 + \rho^2)^{3/2}}.$$
For TSVD, we choose k at the corner of the L-curve. Unfortunately, the L-curve method fails when v_i^T x_exact decays to zero quickly, or when the change in the norms of the residual and the solution is small between two consecutive values of k.
Last, GCV is a very common and useful method. Let's examine the difference between b_exact and Ax_k for the rank-k TSVD solution. Then
$$\begin{aligned}
\mathbf{A}\mathbf{x}_k - \mathbf{b}_{\mathrm{exact}} &= \mathbf{A}\mathbf{V}\boldsymbol{\phi}^{(k)}\mathbf{\Sigma}^{-1}\mathbf{U}^T\mathbf{b} - \mathbf{b}_{\mathrm{exact}}\\
&= \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T\mathbf{V}\boldsymbol{\phi}^{(k)}\mathbf{\Sigma}^{-1}\mathbf{U}^T(\mathbf{b}_{\mathrm{exact}} + \mathbf{e}) - \mathbf{U}\mathbf{U}^T\mathbf{b}_{\mathrm{exact}}\\
&= \mathbf{U}\begin{bmatrix}\mathbf{I}_k & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}\mathbf{U}^T\mathbf{b}_{\mathrm{exact}} + \mathbf{U}\begin{bmatrix}\mathbf{I}_k & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}\mathbf{U}^T\mathbf{e} - \mathbf{U}\mathbf{U}^T\mathbf{b}_{\mathrm{exact}}\\
&= \mathbf{U}\begin{bmatrix}\mathbf{I}_k & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}\mathbf{U}^T\mathbf{e} - \mathbf{U}\begin{bmatrix}\mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{I}_{n-k}\end{bmatrix}\mathbf{U}^T\mathbf{b}_{\mathrm{exact}},
\end{aligned}$$
where φ^{(k)} = diag(1, ..., 1, 0, ..., 0). Thus, the error norm becomes
$$\|\mathbf{A}\mathbf{x}_k - \mathbf{b}_{\mathrm{exact}}\|_2^2 = \sum_{i=1}^{k} (\mathbf{u}_i^T\mathbf{e})^2 + \sum_{i=k+1}^{n} (\mathbf{u}_i^T\mathbf{b}_{\mathrm{exact}})^2.$$
If we knew b_exact and the noise e, we would be able to find the index k that minimizes this error. But the noise is usually not known and b_exact is not available. Since we do not know b_exact, we estimate each of its elements by using the other elements.
Consider the Tikhonov case. Remove the i-th row of A and of b, and call the results A^{(i)} and b^{(i)} respectively. Solve the Tikhonov problem
$$\mathbf{x}_\lambda^{(i)} = \big((\mathbf{A}^{(i)})^T\mathbf{A}^{(i)} + \lambda^2\mathbf{I}\big)^{-1}(\mathbf{A}^{(i)})^T\mathbf{b}^{(i)},$$
and then use x_λ^{(i)} to estimate the element b_i by computing A(i,:) x_λ^{(i)}. Hence our goal is to minimize the average of these prediction errors,
$$\min_{\lambda}\ \frac{1}{n}\sum_{i=1}^{n}\big(\mathbf{A}(i,:)\,\mathbf{x}_\lambda^{(i)} - b_i\big)^2.$$
With some technical computation this can be written as
$$\min_{\lambda}\ \frac{1}{n}\sum_{i=1}^{n}\left(\frac{\mathbf{A}(i,:)\,\mathbf{x}_\lambda - b_i}{1 - h_{ii}}\right)^2,$$
where the h_{ii} are the diagonal elements of the matrix A(A^T A + λ²I)^{-1}A^T and x_λ is the Tikhonov solution. This formulation still has an issue: the result depends on the ordering of the data through the h_{ii}. To remedy this, we replace each h_{ii} by their average. The resulting method is called generalized cross validation (GCV), and it has the minimization form
$$\min_{\lambda}\ \frac{1}{n}\sum_{i=1}^{n}\left(\frac{\mathbf{A}(i,:)\,\mathbf{x}_\lambda - b_i}{1 - \mathrm{trace}\big(\mathbf{A}(\mathbf{A}^T\mathbf{A} + \lambda^2\mathbf{I})^{-1}\mathbf{A}^T\big)/n}\right)^2.$$
Using the SVD of A,
$$\begin{aligned}
\mathrm{trace}\big(\mathbf{A}(\mathbf{A}^T\mathbf{A} + \lambda^2\mathbf{I})^{-1}\mathbf{A}^T\big)
&= \mathrm{trace}\big(\mathbf{U}\mathbf{\Sigma}\mathbf{V}^T\mathbf{V}(\mathbf{\Sigma}^2 + \lambda^2\mathbf{I})^{-1}\mathbf{V}^T\mathbf{V}\mathbf{\Sigma}\mathbf{U}^T\big)\\
&= \mathrm{trace}\big(\mathbf{U}\mathbf{\Sigma}(\mathbf{\Sigma}^2 + \lambda^2\mathbf{I})^{-1}\mathbf{\Sigma}\mathbf{U}^T\big)\\
&= \mathrm{trace}\big(\mathbf{U}\boldsymbol{\phi}(\lambda)\mathbf{U}^T\big)\\
&= \mathrm{trace}\big(\boldsymbol{\phi}(\lambda)\big)\\
&= \sum_{i=1}^{n}\phi_i(\lambda).
\end{aligned}$$
Hence GCV chooses λ such that
$$\lambda_{\mathrm{GCV}} \ \text{minimizes}\ \ \frac{\|\mathbf{A}\mathbf{x}_\lambda - \mathbf{b}\|_2^2}{\big(n - \sum_{i=1}^{n}\phi_i(\lambda)\big)^2}. \qquad (2.13)$$
For TSVD, since φ^{(k)} = diag(1, ..., 1, 0, ..., 0),
$$k_{\mathrm{GCV}} \ \text{minimizes}\ \ \frac{\|\mathbf{A}\mathbf{x}_k - \mathbf{b}\|_2^2}{(n - k)^2}. \qquad (2.14)$$
In summary, we have seen that regularization is needed for a linear inverse problem in order to obtain a stable solution that is close to the exact one. Through the Picard condition and the Picard plots, we observed that only the leading terms of the SVD expansion (2.9) should be retained in order to avoid amplifying unwanted errors. Standard methods for choosing the regularization parameter include DP, the L-curve, and GCV.
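As an illustration of how the GCV criterion (2.13) is evaluated in practice, the following minimal sketch (assuming MATLAB/Octave and a square matrix A with a computable SVD; the test problem is hypothetical) evaluates the GCV function on a grid of λ values and picks its minimizer.

```matlab
% Sketch: choosing the Tikhonov parameter by minimizing the GCV function (2.13).
n = 12;  A = hilb(n);  xtrue = ones(n,1);  b = A*xtrue + 1e-6*randn(n,1);
[U,S,~] = svd(A);  s = diag(S);  beta = U'*b;
lambdas = logspace(-8, 0, 200);  G = zeros(size(lambdas));
for j = 1:numel(lambdas)
    phi  = s.^2 ./ (s.^2 + lambdas(j)^2);     % Tikhonov filter factors
    res2 = sum(((1 - phi) .* beta).^2);       % ||A*x_lambda - b||^2 for square A
    G(j) = res2 / (n - sum(phi))^2;           % GCV function (2.13)
end
[~, jmin] = min(G);  lambda_gcv = lambdas(jmin);
```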
2.3 Gauss-Newton Method for Nonlinear Least Squares
Next we review optimization methods for solving nonlinear least squares problems. Our goal is to minimize
$$f(\mathbf{x}) = \frac{1}{2}\sum_{j=1}^{n} r_j^2(\mathbf{x}),$$
where each r_j is a smooth function from R^n to R. We call r the residual vector from R^n to R^n, with
$$\mathbf{r}(\mathbf{x}) = \big(r_1(\mathbf{x}), r_2(\mathbf{x}), \ldots, r_n(\mathbf{x})\big)^T,$$
so that
$$f(\mathbf{x}) = \frac{1}{2}\|\mathbf{r}(\mathbf{x})\|_2^2.$$
Now we can express the derivatives of f(x) in terms of the Jacobian J(x) ∈ R^{n×n}, where
$$\mathbf{J}(\mathbf{x}) = \begin{bmatrix}\nabla r_1(\mathbf{x})^T\\ \nabla r_2(\mathbf{x})^T\\ \vdots\\ \nabla r_n(\mathbf{x})^T\end{bmatrix},$$
such that
$$\nabla f = \mathbf{J}(\mathbf{x})^T\mathbf{r}(\mathbf{x}), \qquad
\nabla^2 f = \mathbf{J}(\mathbf{x})^T\mathbf{J}(\mathbf{x}) + \sum_{j=1}^{n} r_j(\mathbf{x})\,\nabla^2 r_j(\mathbf{x}),$$
where ∇f and ∇²f are the gradient vector and the Hessian matrix, respectively [4]. For linear problems where r(x) = Ax − b, we have J(x) = A and ∇f = A^T(Ax − b).
The standard Newton method for minimizing f(x) is an iterative method of the form x_{k+1} = x_k + α_k p_k, where α_k is the step size and the descent direction p_k is computed from
$$\nabla^2 f(\mathbf{x}_k)\,\mathbf{p}_k = -\nabla f(\mathbf{x}_k)$$
at each step k. The Gauss-Newton method uses an approximation of the Hessian,
$$\nabla^2 f(\mathbf{x}_k) \approx \mathbf{J}(\mathbf{x}_k)^T\mathbf{J}(\mathbf{x}_k).$$
Then, together with the gradient
$$\nabla f(\mathbf{x}_k) = \mathbf{J}(\mathbf{x}_k)^T\mathbf{r}(\mathbf{x}_k),$$
we get the Gauss-Newton step p_k^{GN} from
$$\mathbf{J}(\mathbf{x}_k)^T\mathbf{J}(\mathbf{x}_k)\,\mathbf{p}_k^{GN} = -\mathbf{J}(\mathbf{x}_k)^T\mathbf{r}(\mathbf{x}_k). \qquad (2.15)$$
The reason we use this approximation is to avoid calculating the full Hessian matrix ∇²f; furthermore, J(x)^T J(x) usually dominates the second term of ∇²f(x) in the Taylor series.
Now let's return to the nonlinear problem
$$\min_{\mathbf{x},\mathbf{y}}\ \|\mathbf{b} - \mathbf{A}(\mathbf{y})\mathbf{x}\|_2^2.$$
Following Chung and Nagy [7], we can define the coupled least squares problem
$$\min_{\mathbf{w}}\ \psi(\mathbf{w}) = \min_{\mathbf{w}}\ \frac{1}{2}\|\mathbf{f}(\mathbf{w})\|_2^2, \qquad (2.16)$$
where
$$\mathbf{f}(\mathbf{w}) = \mathbf{f}(\mathbf{x},\mathbf{y}) = \begin{bmatrix}\mathbf{A}(\mathbf{y})\\ \lambda\mathbf{I}\end{bmatrix}\mathbf{x} - \begin{bmatrix}\mathbf{b}\\ \mathbf{0}\end{bmatrix}
\qquad\text{and}\qquad
\mathbf{w} = \begin{bmatrix}\mathbf{x}\\ \mathbf{y}\end{bmatrix}.$$
The problem min_w ψ(w) can now be solved by the Gauss-Newton method, whose iterates are given by
$$\mathbf{w}_{\ell+1} = \mathbf{w}_\ell + \mathbf{d}_\ell, \qquad \ell = 0, 1, 2, \ldots,$$
where w_0 is an initial guess and d_ℓ is computed by solving
$$\mathbf{J}_\psi(\mathbf{w}_\ell)^T\mathbf{J}_\psi(\mathbf{w}_\ell)\,\mathbf{d}_\ell = -\mathbf{J}_\psi(\mathbf{w}_\ell)^T\mathbf{f}(\mathbf{w}_\ell).$$
If we define r = −f, then
$$\mathbf{J}_\psi(\mathbf{w}_\ell)^T\mathbf{J}_\psi(\mathbf{w}_\ell)\,\mathbf{d}_\ell = -\mathbf{J}_\psi(\mathbf{w}_\ell)^T\mathbf{f}(\mathbf{w}_\ell) = \mathbf{J}_\psi(\mathbf{w}_\ell)^T\mathbf{r}(\mathbf{w}_\ell).$$
Finding the search direction d_ℓ is equivalent to
$$\min_{\mathbf{d}}\|\mathbf{J}_\psi(\mathbf{w}_\ell)^T\mathbf{J}_\psi(\mathbf{w}_\ell)\,\mathbf{d} - \mathbf{J}_\psi(\mathbf{w}_\ell)^T\mathbf{r}(\mathbf{w}_\ell)\|_2^2
= \min_{\mathbf{d}}\|\mathbf{J}_\psi(\mathbf{w}_\ell)^T(\mathbf{J}_\psi(\mathbf{w}_\ell)\,\mathbf{d} - \mathbf{r}(\mathbf{w}_\ell))\|_2^2
= \min_{\mathbf{d}}\|\mathbf{J}_\psi(\mathbf{w}_\ell)\,\mathbf{d} - \mathbf{r}(\mathbf{w}_\ell)\|_2^2 .$$
The Jacobian matrix J_ψ can be written as
$$\mathbf{J}_\psi = \begin{bmatrix}\mathbf{f}_{\mathbf{x}} & \mathbf{f}_{\mathbf{y}}\end{bmatrix} = \begin{bmatrix}\dfrac{\partial \mathbf{f}(\mathbf{x},\mathbf{y})}{\partial\mathbf{x}} & \dfrac{\partial \mathbf{f}(\mathbf{x},\mathbf{y})}{\partial\mathbf{y}}\end{bmatrix}. \qquad (2.17)$$
In summary, the Gauss-Newton method for min_w ψ(w) has the following general form.

1. Choose initial w_0 = [x_0; y_0]
2. for ℓ = 0, 1, 2, ...
   2.1 r_ℓ = [b; 0] − [A(y_ℓ); λI] x_ℓ
   2.2 d_ℓ = argmin_d ‖J_ψ d − r_ℓ‖_2
   2.3 w_{ℓ+1} = w_ℓ + d_ℓ
3. end
But forming J_ψ and solving with it at each step can be expensive and time-consuming when x and y are large. Alternative approaches include variable projection [6, 10] and alternating optimization [24].
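To make the Gauss-Newton iteration concrete, the following minimal sketch (assuming MATLAB/Octave; the exponential model, data, and starting guess are hypothetical) applies step (2.15) to a small nonlinear least squares fit.

```matlab
% Sketch: basic Gauss-Newton iteration (2.15) for a small nonlinear fit.
t = linspace(0, 1, 50)';  ptrue = [2; 5];
d = ptrue(1) * exp(-ptrue(2) * t) + 1e-3 * randn(size(t));   % observed data
r = @(p) p(1) * exp(-p(2) * t) - d;                          % residual vector r(p)
J = @(p) [exp(-p(2) * t), -p(1) * t .* exp(-p(2) * t)];      % Jacobian of r(p)
p = [1; 1];                                                  % initial guess
for k = 1:20
    dk = -(J(p) \ r(p));        % Gauss-Newton step: solves min_d ||J d + r||_2
    p  = p + dk;                % update with unit step length
end
```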
2.4 Variable Projection for Separable Nonlinear Least-
Squares Problems
As described in Golub and Pereyra [6] and O'Leary and Rust [10], for a given observation b = [b_1 ⋯ b_n]^T, a separable nonlinear least-squares problem consists of a linear combination of nonlinear functions that depend on multiple parameters. The residual vector r is defined by
$$r_i(\mathbf{x},\mathbf{y}) = b_i - \sum_{j=1}^{n} x_j\,\phi_j(\mathbf{y};\, t_i),$$
where

- the t_i are independent variables associated with the observations b_i;
- the nonlinear functions φ_j(y; t_i), evaluated at all t_i, form the columns of A(y);
- the x_j and the n-dimensional vector y are obtained by minimizing ‖r(x,y)‖²₂.

Thus we have
$$\|\mathbf{r}(\mathbf{x},\mathbf{y})\|_2^2 = \|\mathbf{b} - \mathbf{A}(\mathbf{y})\mathbf{x}\|_2^2.$$
If we assume the nonlinear parameters y are known, then the linear parameters x can be computed as
$$\mathbf{x} = \mathbf{A}(\mathbf{y})^{+}\mathbf{b},$$
where A(y)^+ is the pseudoinverse of A(y) and x is the solution of the linear least squares problem for fixed y. Substituting this into the nonlinear problem, the optimization problem takes the form
$$\min_{\mathbf{y}}\ \frac{1}{2}\big\|\big(\mathbf{I} - \mathbf{A}(\mathbf{y})\mathbf{A}(\mathbf{y})^{+}\big)\mathbf{b}\big\|_2^2,$$
in which the linear parameters have been eliminated. The vector (I − A(y)A(y)^+)b is called the variable projection of b, and I − A(y)A(y)^+ is the projector onto the orthogonal complement of the column space of A(y).

The variable projection method is an iterative nonlinear algorithm that solves this minimization problem in the reduced space. In general, it tends to converge in fewer iterations than methods applied to the original minimization problem; however, convergence of the method is not guaranteed.
By eliminating x implicitly, we reduce the cost so that it depends only on y. This approach is therefore most appropriate when y has relatively few parameters compared to x. Using the fact that ψ(x,y) from (2.16) is linear in x, we can consider
$$\rho(\mathbf{y}) \equiv \psi(\mathbf{x}(\mathbf{y}),\mathbf{y}) = \frac{1}{2}\left\|\begin{bmatrix}\mathbf{A}(\mathbf{y})\\ \lambda\mathbf{I}\end{bmatrix}\mathbf{x}(\mathbf{y}) - \begin{bmatrix}\mathbf{b}\\ \mathbf{0}\end{bmatrix}\right\|_2^2 \qquad (2.18)$$
where x(y) is the solution of min_x ψ(x,y). Now, in order to apply the Gauss-Newton algorithm to ρ(y), we need to compute
$$\rho'(\mathbf{y}) = \psi_{\mathbf{x}}\cdot\frac{d\mathbf{x}}{d\mathbf{y}} + \psi_{\mathbf{y}}\cdot\frac{d\mathbf{y}}{d\mathbf{y}}.$$
Since x is the solution of min_x ψ(x,y), we have
$$\psi_{\mathbf{x}} = \begin{bmatrix}\mathbf{A}(\mathbf{y})^T & \lambda\mathbf{I}\end{bmatrix}\left(\begin{bmatrix}\mathbf{A}(\mathbf{y})\\ \lambda\mathbf{I}\end{bmatrix}\mathbf{x} - \begin{bmatrix}\mathbf{b}\\ \mathbf{0}\end{bmatrix}\right) = \mathbf{0}.$$
Thus
$$\rho'(\mathbf{y}) = \mathbf{0}\cdot\frac{d\mathbf{x}}{d\mathbf{y}} + \psi_{\mathbf{y}}\cdot 1 = \psi_{\mathbf{y}},$$
and with f = [f_1 f_2 ⋯ f_{2n}]^T from
$$\mathbf{f} = \begin{bmatrix}\mathbf{A}(\mathbf{y})\\ \lambda\mathbf{I}\end{bmatrix}\mathbf{x} - \begin{bmatrix}\mathbf{b}\\ \mathbf{0}\end{bmatrix},$$
we get
$$\psi_{\mathbf{y}} = \frac{1}{2}\begin{bmatrix}f_{1\mathbf{y}} & f_{2\mathbf{y}} & \cdots & f_{2n\,\mathbf{y}}\end{bmatrix}\begin{bmatrix}f_1\\ f_2\\ \vdots\\ f_{2n}\end{bmatrix} + \frac{1}{2}\begin{bmatrix}f_1 & f_2 & \cdots & f_{2n}\end{bmatrix}\begin{bmatrix}f_{1\mathbf{y}}\\ f_{2\mathbf{y}}\\ \vdots\\ f_{2n\,\mathbf{y}}\end{bmatrix} = \mathbf{f}_{\mathbf{y}}^T\mathbf{f},$$
and so ρ'(y) = f_y^T f. From this calculation we see that
$$\mathbf{J}_\rho = \mathbf{f}_{\mathbf{y}} = \frac{\partial}{\partial\mathbf{y}}\begin{bmatrix}\mathbf{A}(\mathbf{y})\mathbf{x} - \mathbf{b}\\ \lambda\mathbf{x}\end{bmatrix} = \begin{bmatrix}\dfrac{\partial[\mathbf{A}(\mathbf{y})\mathbf{x}]}{\partial\mathbf{y}}\\[4pt] \mathbf{0}\end{bmatrix}. \qquad (2.19)$$
Let $\bar{\mathbf{J}}_\rho = \dfrac{\partial[\mathbf{A}(\mathbf{y})\mathbf{x}]}{\partial\mathbf{y}}$ and r = b − A(y)x. Since
$$\mathbf{J}_\rho^T\mathbf{J}_\rho = \begin{bmatrix}\bar{\mathbf{J}}_\rho^T & \mathbf{0}\end{bmatrix}\begin{bmatrix}\bar{\mathbf{J}}_\rho\\ \mathbf{0}\end{bmatrix} = \bar{\mathbf{J}}_\rho^T\bar{\mathbf{J}}_\rho,
\qquad
-\mathbf{J}_\rho^T\mathbf{f} = -\begin{bmatrix}\bar{\mathbf{J}}_\rho^T & \mathbf{0}\end{bmatrix}\left(\begin{bmatrix}\mathbf{A}(\mathbf{y})\\ \lambda\mathbf{I}\end{bmatrix}\mathbf{x} - \begin{bmatrix}\mathbf{b}\\ \mathbf{0}\end{bmatrix}\right) = -\bar{\mathbf{J}}_\rho^T\big(\mathbf{A}(\mathbf{y})\mathbf{x} - \mathbf{b}\big) = \bar{\mathbf{J}}_\rho^T\mathbf{r},$$
we can obtain the search direction d_ℓ for the Gauss-Newton algorithm by solving
$$\bar{\mathbf{J}}_\rho^T\bar{\mathbf{J}}_\rho\,\mathbf{d}_\ell = \bar{\mathbf{J}}_\rho^T\mathbf{r}.$$
Finding d_ℓ is equivalent to
$$\min_{\mathbf{d}}\|\bar{\mathbf{J}}_\rho^T\bar{\mathbf{J}}_\rho\,\mathbf{d} - \bar{\mathbf{J}}_\rho^T\mathbf{r}\|_2^2
= \min_{\mathbf{d}}\|\bar{\mathbf{J}}_\rho^T(\bar{\mathbf{J}}_\rho\,\mathbf{d} - \mathbf{r})\|_2^2
= \min_{\mathbf{d}}\|\bar{\mathbf{J}}_\rho\,\mathbf{d} - \mathbf{r}\|_2^2 .$$
This reduced Gauss-Newton method for min_y ρ(y) has the general form:

1. Choose initial y_0
2. for ℓ = 0, 1, 2, ...
   2.1 x_ℓ = argmin_x ‖[A(y_ℓ); λ_ℓ I] x − [b; 0]‖_2
   2.2 r_ℓ = b − A(y_ℓ) x_ℓ
   2.3 d_ℓ = argmin_d ‖J̄_ρ d − r_ℓ‖_2
   2.4 y_{ℓ+1} = y_ℓ + d_ℓ
3. end
Besides computing the search direction d_ℓ at each step, we also need to choose the step length. Line-search rules such as the Armijo rule [20] select a step size α_ℓ for the update
$$\mathbf{y}_{\ell+1} = \mathbf{y}_\ell + \alpha_\ell\,\mathbf{d}_\ell.$$
In our numerical examples, however, the step size did not significantly affect the results, so we set α_ℓ = 1.
To choose the regularization parameter for the linear least squares subproblem, GCV (2.13) is applied to the Tikhonov problem
$$\mathbf{x}_\ell = \operatorname*{argmin}_{\mathbf{x}}\ \left\|\begin{bmatrix}\mathbf{A}(\mathbf{y}_\ell)\\ \lambda_\ell\mathbf{I}\end{bmatrix}\mathbf{x} - \begin{bmatrix}\mathbf{b}\\ \mathbf{0}\end{bmatrix}\right\|_2 .$$
To obtain the nonlinear parameter direction
$$\mathbf{d}_\ell = \operatorname*{argmin}_{\mathbf{d}}\ \|\bar{\mathbf{J}}_\rho\,\mathbf{d} - \mathbf{r}_\ell\|_2,$$
we solve the normal equations, d_ℓ = (J̄_ρ^T J̄_ρ)^{-1} J̄_ρ^T r_ℓ. To enforce the constraint y ≥ 0, we use the built-in Matlab function lsqnonlin.m, which is based on [8, 9].
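A compact way to see the variable projection idea in code: for every candidate y, the linear parameters are eliminated by an inner Tikhonov solve, and only the reduced residual in y is handed to the outer solver. The following is a minimal sketch (assumptions: MATLAB with the Optimization Toolbox for lsqnonlin; Afun is a hypothetical function handle that builds A(y); the λx part of (2.18) is omitted from the reduced residual for simplicity), not the exact implementation used in this thesis.

```matlab
% Sketch: variable projection with an inner Tikhonov solve and an outer
% bound-constrained solver enforcing y >= 0.
function y = varpro_sketch(Afun, b, y0, lambda)
    m    = numel(b);
    xofy = @(y) [Afun(y); lambda*eye(m)] \ [b; zeros(m,1)];   % x(y): Tikhonov solve
    res  = @(y) Afun(y) * xofy(y) - b;                        % reduced residual in y only
    y    = lsqnonlin(res, y0, zeros(size(y0)), []);           % y >= 0 via lower bounds
end
```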
Chapter 3
Exploiting a Low Rank PSF in
Solving Nonlinear Inverse Problems
The general separable nonlinear least squares problem can be written as
$$\min_{\substack{\mathbf{x},\,\mathbf{y}\\ \text{s.t. } \mathbf{y}\ge 0}}\ \|\mathbf{A}(\mathbf{y})\mathbf{x} - \mathbf{b}\|_2^2, \qquad (3.1)$$
where x ∈ R^{n×1}, y ∈ R^{n×1} with y ≥ 0, A(y) ∈ R^{n×n}, and b ∈ R^{n×1}. For our problem, we consider point spread functions P that have rank 1.

First we consider the symmetric case, where y ∈ R^{n×1} and
$$\mathbf{P} = \frac{1}{s(\mathbf{y})}\,\mathbf{y}\mathbf{y}^T \qquad\text{with}\qquad s(\mathbf{y}) = \sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j.$$
We also consider the non-symmetric case, where y, z ∈ R^{n×1} and
$$\mathbf{P}(\mathbf{y},\mathbf{z}) = \frac{1}{s(\mathbf{y},\mathbf{z})}\,\mathbf{y}\mathbf{z}^T \qquad\text{with}\qquad s(\mathbf{y},\mathbf{z}) = \sum_{i=1}^{n}\sum_{j=1}^{n} y_i z_j.$$
In both cases P has rank 1.

In order to apply the Gauss-Newton approach to the reduced problem, we need to derive the Jacobian J_ρ. We assume throughout that the entries of the PSF are nonnegative and that the sum of the entries of the PSF is 1.
3.1 Symmetric PSF
Let the PSF be P = (1/s(y)) yy^T, where s(y) = Σ_{i=1}^n Σ_{j=1}^n y_i y_j is the sum of the entries of yy^T. The Jacobian J_ρ ∈ R^{n²×n} contains all partial derivatives of vec(P(y)) with respect to y, and has the form
$$\mathbf{J}_\rho = \frac{\partial\,\mathrm{vec}(\mathbf{P}(\mathbf{y}))}{\partial\mathbf{y}} = \frac{\partial}{\partial\mathbf{y}}\left[\frac{1}{s(\mathbf{y})}\,\mathbf{y}\otimes\mathbf{y}\right], \qquad (3.2)$$
because, for y ∈ R^{n×1},
$$\mathbf{P}(\mathbf{y}) = \frac{1}{s(\mathbf{y})}\,\mathbf{y}\mathbf{y}^T = \frac{1}{s(\mathbf{y})}\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{bmatrix}\begin{bmatrix}y_1 & y_2 & \cdots & y_n\end{bmatrix} = \frac{1}{s(\mathbf{y})}\begin{bmatrix}y_1y_1 & y_1y_2 & \cdots & y_1y_n\\ y_2y_1 & y_2y_2 & \cdots & y_2y_n\\ \vdots & \vdots & \ddots & \vdots\\ y_ny_1 & y_ny_2 & \cdots & y_ny_n\end{bmatrix},$$
$$\mathrm{vec}(\mathbf{P}(\mathbf{y})) = \frac{1}{s(\mathbf{y})}\begin{bmatrix}y_1^2 & y_2y_1 & \cdots & y_ny_1 & \cdots & y_1y_n & y_2y_n & \cdots & y_n^2\end{bmatrix}^T = \frac{1}{s(\mathbf{y})}\,\mathbf{y}\otimes\mathbf{y}.$$
For all k = 1, ..., n,
$$\frac{\partial s(\mathbf{y})}{\partial y_k} = \frac{\partial}{\partial y_k}\Big(\sum_{i=1}^{n}\sum_{j=1}^{n}y_iy_j\Big) = 2\sum_{i=1}^{n}y_i = 2\|\mathbf{y}\|_1,$$
using the nonnegativity of y.
Therefore, applying the quotient rule entrywise, for i, j, k = 1, ..., n,
$$\frac{\partial}{\partial y_k}\!\left(\frac{y_i y_j}{s(\mathbf{y})}\right) = \frac{\delta_{ik}\,y_j + \delta_{jk}\,y_i}{s(\mathbf{y})} - \frac{2\|\mathbf{y}\|_1\, y_i y_j}{s(\mathbf{y})^2},$$
where δ denotes the Kronecker delta. Collecting these n² × n partial derivatives, and writing 1^T = [1 1 ⋯ 1] ∈ R^{1×n}, we obtain
$$\mathbf{J}_\rho = \frac{\partial\,\mathrm{vec}(\mathbf{P}(\mathbf{y}))}{\partial\mathbf{y}} = \frac{1}{s(\mathbf{y})}\big(\mathbf{y}\otimes\mathbf{I}_n + \mathbf{I}_n\otimes\mathbf{y}\big) - \frac{2\|\mathbf{y}\|_1}{s(\mathbf{y})^2}\big(\mathbf{y}\otimes\mathbf{y}\mathbf{1}^T\big), \qquad (3.3)$$
with J_ρ ∈ R^{n²×n}.
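The Jacobian formula (3.3) can be validated against finite differences. The following minimal sketch (assuming MATLAB/Octave, with a small random nonnegative y) compares the analytic Jacobian of vec(P(y)) with a central-difference approximation.

```matlab
% Sketch: finite-difference check of the symmetric-PSF Jacobian (3.3).
n = 5;  y = rand(n,1);  one = ones(1,n);
s    = @(y) sum(y)^2;                                    % s(y) = sum_i sum_j y_i*y_j
vecP = @(y) kron(y, y) / s(y);                           % vec(P(y)) = (1/s) y ⊗ y
Jrho = (kron(y, eye(n)) + kron(eye(n), y)) / s(y) ...
     - 2*norm(y,1) / s(y)^2 * kron(y, y*one);            % analytic Jacobian (3.3)
Jfd = zeros(n^2, n);  h = 1e-6;
for k = 1:n
    e = zeros(n,1);  e(k) = h;
    Jfd(:,k) = (vecP(y+e) - vecP(y-e)) / (2*h);          % central differences
end
norm(Jrho - Jfd)                                         % small: the formulas agree
```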
3.2 Non-symmetric PSF
Now consider the scaled PSF built from two different nonnegative vectors y ∈ R^{n×1} and z ∈ R^{n×1},
$$\mathbf{P}(\mathbf{y},\mathbf{z}) = \frac{1}{s(\mathbf{y},\mathbf{z})}\,\mathbf{y}\mathbf{z}^T \qquad\text{where}\qquad s(\mathbf{y},\mathbf{z}) = \sum_{i=1}^{n}\sum_{j=1}^{n} y_i z_j$$
is the sum of the entries of yz^T. The Jacobians can be derived as
$$\mathbf{J}_{\mathbf{y}} = \frac{\partial\,\mathrm{vec}(\mathbf{P}(\mathbf{y},\mathbf{z}))}{\partial\mathbf{y}} = \frac{\partial}{\partial\mathbf{y}}\left[\frac{1}{s(\mathbf{y},\mathbf{z})}\,(\mathbf{z}\otimes\mathbf{y})\right],
\qquad
\mathbf{J}_{\mathbf{z}} = \frac{\partial\,\mathrm{vec}(\mathbf{P}(\mathbf{y},\mathbf{z}))}{\partial\mathbf{z}} = \frac{\partial}{\partial\mathbf{z}}\left[\frac{1}{s(\mathbf{y},\mathbf{z})}\,(\mathbf{z}\otimes\mathbf{y})\right],$$
because, for y ∈ R^{n×1} and z ∈ R^{n×1},
$$\mathbf{P}(\mathbf{y},\mathbf{z}) = \frac{1}{s(\mathbf{y},\mathbf{z})}\,\mathbf{y}\mathbf{z}^T = \frac{1}{s(\mathbf{y},\mathbf{z})}\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{bmatrix}\begin{bmatrix}z_1 & z_2 & \cdots & z_n\end{bmatrix} = \frac{1}{s(\mathbf{y},\mathbf{z})}\begin{bmatrix}y_1z_1 & y_1z_2 & \cdots & y_1z_n\\ y_2z_1 & y_2z_2 & \cdots & y_2z_n\\ \vdots & \vdots & \ddots & \vdots\\ y_nz_1 & y_nz_2 & \cdots & y_nz_n\end{bmatrix},$$
$$\mathrm{vec}(\mathbf{P}(\mathbf{y},\mathbf{z})) = \frac{1}{s(\mathbf{y},\mathbf{z})}\begin{bmatrix}y_1z_1 & y_2z_1 & \cdots & y_nz_1 & \cdots & y_1z_n & y_2z_n & \cdots & y_nz_n\end{bmatrix}^T = \frac{1}{s(\mathbf{y},\mathbf{z})}\,\mathbf{z}\otimes\mathbf{y}.$$
For all k = 1, ..., n,
$$\frac{\partial s(\mathbf{y},\mathbf{z})}{\partial y_k} = \frac{\partial}{\partial y_k}\Big(\sum_{i=1}^{n}\sum_{j=1}^{n}y_iz_j\Big) = \sum_{i=1}^{n}z_i = \|\mathbf{z}\|_1,
\qquad
\frac{\partial s(\mathbf{y},\mathbf{z})}{\partial z_k} = \frac{\partial}{\partial z_k}\Big(\sum_{i=1}^{n}\sum_{j=1}^{n}y_iz_j\Big) = \sum_{i=1}^{n}y_i = \|\mathbf{y}\|_1.$$
Therefore, applying the quotient rule entrywise, for i, j, k = 1, ..., n,
$$\frac{\partial}{\partial y_k}\!\left(\frac{y_i z_j}{s(\mathbf{y},\mathbf{z})}\right) = \frac{\delta_{ik}\,z_j}{s(\mathbf{y},\mathbf{z})} - \frac{\|\mathbf{z}\|_1\, y_i z_j}{s(\mathbf{y},\mathbf{z})^2},$$
where δ denotes the Kronecker delta. Collecting these partial derivatives, with 1^T = [1 1 ⋯ 1] ∈ R^{1×n},
$$\mathbf{J}_{\mathbf{y}} = \frac{\partial\,\mathrm{vec}(\mathbf{P}(\mathbf{y},\mathbf{z}))}{\partial\mathbf{y}} = \frac{1}{s(\mathbf{y},\mathbf{z})}\,(\mathbf{z}\otimes\mathbf{I}_n) - \frac{\|\mathbf{z}\|_1}{s(\mathbf{y},\mathbf{z})^2}\,(\mathbf{z}\otimes\mathbf{y}\mathbf{1}^T). \qquad (3.4)$$
Similarly,
$$\mathbf{J}_{\mathbf{z}} = \frac{\partial\,\mathrm{vec}(\mathbf{P}(\mathbf{y},\mathbf{z}))}{\partial\mathbf{z}} = \frac{1}{s(\mathbf{y},\mathbf{z})}\,(\mathbf{I}_n\otimes\mathbf{y}) - \frac{\|\mathbf{y}\|_1}{s(\mathbf{y},\mathbf{z})^2}\,(\mathbf{z}\otimes\mathbf{y}\mathbf{1}^T). \qquad (3.5)$$
Note that both J_y and J_z are in R^{n²×n}. Therefore, we finally obtain the Jacobian
$$\mathbf{J}_\rho = \begin{bmatrix}\mathbf{J}_{\mathbf{y}} & \mathbf{J}_{\mathbf{z}}\end{bmatrix}
= \begin{bmatrix}\dfrac{1}{s(\mathbf{y},\mathbf{z})}(\mathbf{z}\otimes\mathbf{I}_n) - \dfrac{\|\mathbf{z}\|_1}{s(\mathbf{y},\mathbf{z})^2}(\mathbf{z}\otimes\mathbf{y}\mathbf{1}^T) & \ \dfrac{1}{s(\mathbf{y},\mathbf{z})}(\mathbf{I}_n\otimes\mathbf{y}) - \dfrac{\|\mathbf{y}\|_1}{s(\mathbf{y},\mathbf{z})^2}(\mathbf{z}\otimes\mathbf{y}\mathbf{1}^T)\end{bmatrix},$$
with J_ρ ∈ R^{n²×2n}. We will use this Jacobian in Section 4.1 for the reduced Gauss-Newton method.
3.3 Reformulation
We also observe that a reformulation can be used to simplify the problem. For the non-symmetric PSF, P = yz^T, if we fix x_0 and y_0, then the problem is linear in z. Let vec(X_0) = x_0. Then
$$\min_{\mathbf{z}}\ \|\mathbf{A}(\mathbf{y}_0\mathbf{z}^T)\,\mathbf{x}_0 - \mathbf{b}\|_2^2 = \min_{\mathbf{z}}\ \|\mathbf{A}(\mathbf{X}_0)(\mathbf{z}\otimes\mathbf{y}_0) - \mathbf{b}\|_2^2.$$
We can also rewrite
$$\mathbf{z}\otimes\mathbf{y}_0 = \begin{bmatrix}z_1\mathbf{y}_0\\ z_2\mathbf{y}_0\\ \vdots\\ z_n\mathbf{y}_0\end{bmatrix} = \begin{bmatrix}\mathbf{y}_0 & & &\\ & \mathbf{y}_0 & &\\ & & \ddots &\\ & & & \mathbf{y}_0\end{bmatrix}\begin{bmatrix}z_1\\ z_2\\ \vdots\\ z_n\end{bmatrix} = (\mathbf{I}_n\otimes\mathbf{y}_0)\,\mathbf{z}. \qquad (3.6)$$
Thus we have
$$\min_{\mathbf{z}}\ \|\mathbf{A}(\mathbf{y}_0\mathbf{z}^T)\,\mathbf{x}_0 - \mathbf{b}\|_2^2 = \min_{\mathbf{z}}\ \|\mathbf{A}(\mathbf{X}_0)(\mathbf{I}_n\otimes\mathbf{y}_0)\,\mathbf{z} - \mathbf{b}\|_2^2. \qquad (3.7)$$
Let Y = A(X_0)(I_n ⊗ y_0). Then we have min_z ‖Yz − b‖²₂, a linear least squares problem. In the same way, if we fix z_0 and x_0, then we can reformulate the nonlinear problem as a linear problem in y:
$$\min_{\mathbf{y}}\ \|\mathbf{A}(\mathbf{y}\mathbf{z}_0^T)\,\mathbf{x}_0 - \mathbf{b}\|_2^2 = \min_{\mathbf{y}}\ \|\mathbf{A}(\mathbf{X}_0)(\mathbf{z}_0\otimes\mathbf{y}) - \mathbf{b}\|_2^2.$$
We can rewrite
$$\mathbf{z}_0\otimes\mathbf{y} = \begin{bmatrix}z_{0,1}\mathbf{y}\\ z_{0,2}\mathbf{y}\\ \vdots\\ z_{0,n}\mathbf{y}\end{bmatrix} = \begin{bmatrix}z_{0,1}\mathbf{I}_n\\ z_{0,2}\mathbf{I}_n\\ \vdots\\ z_{0,n}\mathbf{I}_n\end{bmatrix}\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{bmatrix} = (\mathbf{z}_0\otimes\mathbf{I}_n)\,\mathbf{y}. \qquad (3.8)$$
Thus we have
$$\min_{\mathbf{y}}\ \|\mathbf{A}(\mathbf{y}\mathbf{z}_0^T)\,\mathbf{x}_0 - \mathbf{b}\|_2^2 = \min_{\mathbf{y}}\ \|\mathbf{A}(\mathbf{X}_0)(\mathbf{z}_0\otimes\mathbf{I}_n)\,\mathbf{y} - \mathbf{b}\|_2^2. \qquad (3.9)$$
Let Z = A(X_0)(z_0 ⊗ I_n). Then we have min_y ‖Zy − b‖²₂, again a linear least squares problem. With these reformulations, the alternating optimization method can be carried out efficiently; see Section 4.2.
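The reformulation (3.7) reduces one sweep of the alternating scheme to an ordinary linear least squares solve. The following minimal sketch (assuming MATLAB/Octave; AX0 is a hypothetical precomputed matrix representing A(X_0), and y0 is the current fixed factor) shows this solve for z; the solve for y via (3.9) is analogous.

```matlab
% Sketch: with x0 and y0 fixed, z solves a linear least squares problem (3.7).
n = numel(y0);
Y = AX0 * kron(eye(n), y0);    % Y = A(X_0)(I_n ⊗ y_0), as in (3.6)
z = Y \ b;                     % unconstrained linear least squares solve for z
```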
Chapter 4
Numerical Results
4.1 Variable projection with low rank PSF
In this chapter, we solve a nonlinear inverse problem with a separable low-rank PSF on an image deblurring example. This type of experiment is called blind deconvolution. The forward process is given by (1.3). In real applications, we do not know how the images were blurred; many types of blur have been investigated, e.g., Gaussian blur [7], but our framework allows more general and realistic blurs through low-rank PSFs.

We assume that the PSF is a rank-1 matrix. Thus we consider either the symmetric case,
$$\mathbf{P} = \mathbf{y}\mathbf{y}^T,$$
where y ∈ R^{n×1} and y ≥ 0, or the non-symmetric case,
$$\mathbf{P} = \mathbf{y}\mathbf{z}^T,$$
where y, z ∈ R^{n×1} and y ≥ 0, z ≥ 0. We apply the PSF to the Grain image from Chung and Nagy [7]; the image size is 256 × 256.
Figure 4.1: True parameters y_true and z_true, together with the initial guesses y_0 and z_0. (a) y_true, y_0; (b) z_true, z_0.
This experiment uses a non-symmetric PSF, so we set the true parameters y_true and z_true shown in Figure 4.1. After constructing the PSF, the matrix A is formed, and with the true image we compute b. To simulate noisy observations, 1% Gaussian white noise is added to b. We force y and z to satisfy that the sum of the entries of yz^T is 1 by dividing by the sum of the elements of yz^T. We set the initial guess y_0 by convolving y_true with a Gaussian kernel [18, 19], and z_0 similarly.
We use J_ρ from the result of Chapter 3 for the non-symmetric case,
$$\mathbf{J}_\rho = \begin{bmatrix}\mathbf{J}_{\mathbf{y}} & \mathbf{J}_{\mathbf{z}}\end{bmatrix}
= \begin{bmatrix}\dfrac{1}{s(\mathbf{y},\mathbf{z})}(\mathbf{z}\otimes\mathbf{I}_n) - \dfrac{\|\mathbf{z}\|_1}{s(\mathbf{y},\mathbf{z})^2}(\mathbf{z}\otimes\mathbf{y}\mathbf{1}^T) & \ \dfrac{1}{s(\mathbf{y},\mathbf{z})}(\mathbf{I}_n\otimes\mathbf{y}) - \dfrac{\|\mathbf{y}\|_1}{s(\mathbf{y},\mathbf{z})^2}(\mathbf{z}\otimes\mathbf{y}\mathbf{1}^T)\end{bmatrix},$$
and apply the Gauss-Newton method. We need to solve
$$\min_{\substack{\mathbf{x},\mathbf{y},\mathbf{z}\\ \text{s.t. } \mathbf{y},\mathbf{z}\ge 0}}\ \|\mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\,\mathbf{x} - \mathbf{b}\|_2^2, \qquad (4.1)$$
but this problem is ill-posed. Thus, regularization parameters λ_l for the linear parameters and λ_nl for the nonlinear parameters are added to give the restricted problem
$$\min_{\substack{\mathbf{x},\mathbf{y},\mathbf{z}\\ \text{s.t. } \mathbf{y},\mathbf{z}\ge 0}}\ \|\mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\,\mathbf{x} - \mathbf{b}\|_2^2 + \lambda_l\|\mathbf{x}\|_2^2 + \lambda_{nl}\left\|\begin{bmatrix}\mathbf{y}\\ \mathbf{z}\end{bmatrix}\right\|_2^2. \qquad (4.2)$$

Figure 4.2: Comparing λ_nl = 0.025, 0.028, 0.03, 0.032, 0.035. (a) relative error of x; (b) relative error of y; (c) relative error of z; (d) relative error of x, y, and z.
The linear regularization parameter λ_l is chosen by GCV (2.13). Without the nonlinear regularization parameter λ_nl, the iterates y and z are unstable, because the condition number of J_ρ is very large (near 10^{14}) even though we assumed Σ_i Σ_j y_i z_j = 1 in Chapter 3. To suppress y and z, λ_nl is chosen experimentally; see Figure 4.2. By varying λ_nl between 0.025 and 0.035, we can observe how the relative errors of x, y, and z change. Among these values, λ_nl = 0.035 gives the smallest relative error in x, while the relative errors of y and z increase drastically. In terms of the total relative error, λ_nl = 0.03 is the best nonlinear regularization parameter in this experiment. Choosing λ_nl = 0.03, we can see in Figure 4.3 that y and z move slightly closer to y_true and z_true, and the relative error of x decreases steadily as the iteration number increases. After iteration ℓ = 15, however, there is no significant change in the errors. The relative errors are listed in Table 4.1.

Figure 4.3: Error comparison for the non-symmetric PSF with λ_nl = 0.03. (a) y_true, y_initial, y_computed; (b) z_true, z_initial, z_computed; (c) relative errors of x, y, and z.
Table 4.1: Relative norm errors for the non-symmetric PSF with λ_nl = 0.03

 ℓ    ‖x_true−x_ℓ‖₂/‖x_true‖₂   ‖y_true−y_ℓ‖₂/‖y_true‖₂   ‖z_true−z_ℓ‖₂/‖z_true‖₂
 1            0.2810                    0.2560                    0.2671
 2            0.2528                    0.2539                    0.2652
 3            0.2176                    0.2504                    0.2616
 4            0.1962                    0.2470                    0.2581
 5            0.1824                    0.2439                    0.2547
 6            0.1720                    0.2410                    0.2516
 7            0.1647                    0.2384                    0.2488
 8            0.1580                    0.2360                    0.2461
 9            0.1518                    0.2338                    0.2435
 10           0.1461                    0.2317                    0.2411
 11           0.1407                    0.2299                    0.2389
 12           0.1363                    0.2283                    0.2368
 13           0.1321                    0.2268                    0.2348
 14           0.1284                    0.2255                    0.2330
 15           0.1245                    0.2244                    0.2313
 16           0.1245                    0.2244                    0.2313
In Figure 4.4, we see how the Grain image changes as the reduced Gauss-Newton method proceeds. The final image is still far from the true one, but the images become closer to the true image as ℓ increases. We also obtained unexpected results when the same experiment was tried with a symmetric blur: the relative errors of x and y do not decrease monotonically and eventually start to increase. To investigate this issue, in the next section we fix x_true and alternate between solving the linear problems for y and z.

In Figure 4.5, we see how the PSF changes as the reduced Gauss-Newton method proceeds. The true PSF contains oscillations in both y and z, while the initial guesses y_0 and z_0 have no oscillation, so the initial PSF corresponds to a smooth blur. As ℓ increases, the final PSF becomes closer to the true PSF than the initial PSF.
Figure 4.4: A comparison of the true, blurred, and deblurred images from the reduced Gauss-Newton method with the non-symmetric PSF and λ_nl = 0.03.

Figure 4.5: A comparison of the true and reconstructed PSFs with λ_nl = 0.03.
4.2 x_true - Alternating Optimization
In Section 3.3, we converted the nonlinear problem into a sequence of linear problems by fixing x_true and y_0, or x_true and z_0. For these linear problems we use the Matlab function lsqlin.m to enforce nonnegativity [8] and linear equality constraints [17] whenever a constrained least squares problem has to be solved. First, for fixed x_true, we have the problem
$$\min_{\substack{\mathbf{y},\mathbf{z}\ \text{s.t. } \mathbf{y},\mathbf{z}\ge 0\\ \|\mathbf{y}\|_1 = \|\mathbf{z}\|_1 = 1}}\ \|\mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\,\mathbf{x}_{\mathrm{true}} - \mathbf{b}\|_2^2, \qquad (4.3)$$
where P(y,z) = (1/s(y,z)) yz^T with s(y,z) = Σ_{i=1}^n Σ_{j=1}^n y_i z_j. Then the alternating optimization has the general form:
1. Choose y_0.
2. For ℓ = 0, 1, 2, ...,
   - Set z_initial s.t. min_z ‖A(X_true)(I_n ⊗ y_ℓ) z − b‖_2.
   - Modify z_initial to be nonnegative with ‖z‖_1 = 1.
   2.1 Find z_{ℓ+1} s.t. min_{z ≥ 0, ‖z‖_1 = 1} ‖A(X_true)(I_n ⊗ y_ℓ) z − b‖_2, starting from z_initial.
   - Set y_initial s.t. min_y ‖A(X_true)(z_{ℓ+1} ⊗ I_n) y − b‖_2.
   - Modify y_initial to be nonnegative with ‖y‖_1 = 1.
   2.2 Find y_{ℓ+1} s.t. min_{y ≥ 0, ‖y‖_1 = 1} ‖A(X_true)(z_{ℓ+1} ⊗ I_n) y − b‖_2, starting from y_initial.
3. end
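One inner step of the algorithm above is a linearly constrained least squares problem, which lsqlin can handle directly. The following minimal sketch (assumptions: MATLAB with the Optimization Toolbox; AXtrue is a hypothetical precomputed matrix representing A(X_true), and y is the current iterate) shows step 2.1, the constrained solve for z.

```matlab
% Sketch: step 2.1 of the alternating scheme, solving for z with z >= 0, sum(z) = 1.
n  = numel(y);
C  = AXtrue * kron(eye(n), y);             % coefficient matrix A(X_true)(I_n ⊗ y)
z0 = max(C \ b, 0);  z0 = z0 / sum(z0);    % unconstrained start, modified to be feasible
z  = lsqlin(C, b, [], [], ones(1,n), 1, zeros(n,1), [], z0);   % nonneg. + sum-to-one
```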
Figure 4.6: Error comparison for Alternating Optimization. (a) y_true, y_initial, y_computed; (b) z_true, z_initial, z_computed; (c) relative errors of y and z.
To start this algorithm, y_0 is chosen by convolving y_true with a Gaussian kernel [18, 19]. For setting the PSF P(y,z), we make the sum of all elements of P(y,z) equal to 1 by dividing yz^T by Σ_{i=1}^n Σ_{j=1}^n y_i z_j, as in Chapter 3. However, by forcing ‖y‖_1 = ‖z‖_1 = 1, the sum of all elements of yz^T is automatically equal to 1, because
$$\sum_{i=1}^{n}\sum_{j=1}^{n} y_i z_j = \Big(\sum_{i=1}^{n} y_i\Big)\Big(\sum_{j=1}^{n} z_j\Big) = \Big(\sum_{i=1}^{n} |y_i|\Big)\Big(\sum_{j=1}^{n} |z_j|\Big) = \|\mathbf{y}\|_1\|\mathbf{z}\|_1 = 1,$$
where the second equality uses y, z ≥ 0 and the last uses ‖y‖_1 = ‖z‖_1 = 1.
Table 4.2: Relative norm errors for Alternating Optimization

 ℓ    ‖y_true−y_ℓ‖₂/‖y_true‖₂   ‖z_true−z_ℓ‖₂/‖z_true‖₂
 0          0.25847714                0.04590491
 1          0.04462480                0.04392798
 2          0.04437537                0.04393802
 3          0.04434619                0.04394230
 4          0.04434112                0.04392070
 5          0.04433983                0.04392104
 6          0.04433946                0.04392113
 7          0.04433936                0.04392115
 8          0.04433934                0.04392116
 9          0.04433933                0.04392116
 10         0.04433933                0.04392116
For the linear least squares problems with y, z ≥ 0 and ‖y‖_1 = ‖z‖_1 = 1, the Matlab function lsqlin.m is used with the initial vectors y_initial and z_initial, respectively. When z_initial and y_initial are first computed, they are neither nonnegative nor normalized, so before passing them to lsqlin.m we apply a small modification: negative entries are set to zero and the vector is rescaled to have unit 1-norm. Once the alternating optimization starts, the relative error of y drops quickly to a small value; the error of z starts small and shows little change as the iterations continue (see Figure 4.6). The computed y and z are very close to the true y and z. Strictly speaking, we would need to prove convergence of the alternating optimization method to put it on a solid theoretical footing; without such a proof, it cannot be expected in general that y_ℓ and z_ℓ converge to the true y and z. Chan and Wong [24] note that a non-convex model can admit multiple solutions; although they use a PSF with only one parameter, they establish global convergence, with the limit depending on the initial guess. In this thesis, we assume that the PSF is a rank-1 matrix, giving n² nonlinear PSF entries, which means we must consider results with many more parameters than in the model of Chan and Wong. Therefore we only give numerical experiments without convergence results; establishing theoretical convergence remains future work.
4.3 Alternating Optimization 3 ways
This section is an extension of alternating optimization to solve for x, y, and z simultaneously, which makes the problem considerably harder. After initializing y_0 and z_0, we compute the initial x_0 by solving
$$\min_{\mathbf{x}}\ \|\mathbf{A}(\mathbf{P}(\mathbf{y}_0,\mathbf{z}_0))\,\mathbf{x} - \mathbf{b}\|_2 \qquad (4.4)$$
with a regularization method, namely the weighted-GCV method for Lanczos-hybrid regularization [23] implemented in the Matlab code HyBR.m [21, 22, 23]. As before, we enforce y, z ≥ 0 and ‖y‖_1 = ‖z‖_1 = 1. Then the alternating optimization 3 ways has the general form:
1. Choose y_0, z_0.
2. For ℓ = 0, 1, 2, ...,
   2.1 Compute x_ℓ s.t. min_x ‖A(P(y_ℓ, z_ℓ)) x − b‖_2, and set vec(X_ℓ) = x_ℓ.
   - Set z_initial s.t. min_z ‖A(X_ℓ)(I_n ⊗ y_ℓ) z − b‖_2.
   - Modify z_initial to be nonnegative with ‖z‖_1 = 1.
   2.2 Find z_{ℓ+1} s.t. min_{z ≥ 0, ‖z‖_1 = 1} ‖A(X_ℓ)(I_n ⊗ y_ℓ) z − b‖_2, starting from z_initial.
   - Set y_initial s.t. min_y ‖A(X_ℓ)(z_{ℓ+1} ⊗ I_n) y − b‖_2.
   - Modify y_initial to be nonnegative with ‖y‖_1 = 1.
   2.3 Find y_{ℓ+1} s.t. min_{y ≥ 0, ‖y‖_1 = 1} ‖A(X_ℓ)(z_{ℓ+1} ⊗ I_n) y − b‖_2, starting from y_initial.
3. end
For this experiment, we consider a smaller example: a 64 × 64 image obtained by cutting out the middle part of the Grain image, so that A ∈ R^{64²×64²} and b, x ∈ R^{64²×1}. When we simply try to find x, y, and z through the alternating optimization 3 ways, the results are not satisfactory, as shown in Figure 4.7. Since we start with x_0, which is derived from y_0 and z_0 using HyBR.m, x_0 may be far from x_true. To compensate for this weakness, we add more information by using rotated images, under the assumption that the PSF is invariant. In this case, A and b have to be modified accordingly. Since interpolation would be needed (and some parts of the original image lost) for rotations by angles other than 90°, 180°, and 270°, we consider only these three rotations; see Figure 4.8.

Figure 4.7: Error comparison for Alternating Optimization 3 ways: relative errors of x, y, and z.

For the image vector x, let R_k x denote the rotated image, where R_k ∈ R^{n²×n²} is a rotation matrix with k = 0, 1, 2, 3; R_k x rotates the image x counterclockwise by k · 90°. The corresponding observed image is b_k = A(P(y, z)) R_k x.
Figure 4.8: Rotated true and blurred images with 0°, 90°, 180°, and 270°.
Thus, with k = 0, 1, 2, 3, we can restate our problem as
$$\min_{\substack{\mathbf{x},\mathbf{y},\mathbf{z}\ \text{s.t. } \mathbf{y},\mathbf{z}\ge 0\\ \|\mathbf{y}\|_1 = \|\mathbf{z}\|_1 = 1}}\ \left\|\begin{bmatrix}\mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\mathbf{R}_0\\ \mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\mathbf{R}_1\\ \vdots\\ \mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\mathbf{R}_k\end{bmatrix}\mathbf{x} - \begin{bmatrix}\mathbf{b}_0\\ \mathbf{b}_1\\ \vdots\\ \mathbf{b}_k\end{bmatrix}\right\|_2^2, \qquad (4.5)$$
and define
$$\mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z})) = \begin{bmatrix}\mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\mathbf{R}_0\\ \mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\mathbf{R}_1\\ \vdots\\ \mathbf{A}(\mathbf{P}(\mathbf{y},\mathbf{z}))\mathbf{R}_k\end{bmatrix}
\qquad\text{and}\qquad
\mathbf{b} = \begin{bmatrix}\mathbf{b}_0\\ \mathbf{b}_1\\ \vdots\\ \mathbf{b}_k\end{bmatrix}. \qquad (4.6)$$
Now this problem can be treated as minimizing ‖A(P(y,z))x − b‖²₂, where A(P(y,z)) ∈ R^{kn²×n²} and b ∈ R^{kn²}. The relative errors are shown in Figure 4.10. We can expect that more rotations give better reconstructions.
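Forming the rotated observations in (4.6) only requires rotating the image before blurring. The following minimal sketch (assuming MATLAB/Octave with zero boundary conditions; the image and PSF are hypothetical) builds the stacked right-hand side b = [b_0; b_1; b_2; b_3].

```matlab
% Sketch: building the rotated, blurred observations b_k = A(P) R_k x of (4.5)-(4.6).
X = rand(64);                              % hypothetical 64 x 64 image
P = rand(5);  P = P / sum(P(:));           % hypothetical PSF, normalized
bstack = cell(4,1);
for k = 0:3
    Bk = conv2(rot90(X, k), P, 'same');    % blur the image rotated by k*90 degrees
    bstack{k+1} = Bk(:);                   % vectorize and collect
end
b = vertcat(bstack{:});                    % stacked right-hand side of (4.5)
```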
Figure 4.9: A comparison of the errors of y and z for Alternating Optimization 3 ways with rotation. (a) error between true and computed y; (b) error between true and computed z, for 0, 1, 2, and 3 rotations.
From Figure 4.10, we can check that the relative errors of x, y, z, and PSF decrease as the
number of rotations grow. Using three rotations, we get the smallest errors overall. From
this we can expect that we can get better computed x, y, and z if we have more rotation
images. In Figure 4.9, as we have checked the relative errors in Figure 4.10, the alternating
optimization 3 ways with three times rotations gives us a meaningful error between the
true and computed y, z. The errors are just absolute difference values for each of the
corresponding yi, zi, i = 1, . . . , 64. While the other cases go away from the true y, z, three
times rotation gives the final computed y, z which is close to the true y, z respectively.
A potential drawback of the alternating optimization 3 ways with rotation is that it takes more time for larger n. As n (the image size) and k (the number of rotations) grow, the algorithm needs more computation time. Figures 4.11 and 4.12 show that the computed image and PSF improve as more rotations are used. In Figure 4.13, we repeat the same experiment with the original Grain image, whose size is 256 × 256. Without rotation, the deblurred image gets worse, and the initial image is still better than the result after 16 iterations of Alternating Optimization 3 ways.
Figure 4.10: Relative errors by Alternating Optimization 3 ways with rotation. (a) Relative errors of x; (b) relative errors of y; (c) relative errors of z; (d) relative errors of the PSF, each versus iteration for 0, 1, 2, and 3 rotations.
However, with two or three rotations, the computed images are much better than with no rotation or just one rotation. In Figure 4.14, for the 256 × 256 PSF, two or three rotations again outperform zero or one rotation, although the reconstructed PSFs are still poor approximations of the true PSF. For the larger Grain image, the results are not as good as in the 64 × 64 case, but with more rotations better reconstructions are likely.
Figure 4.11: Computed images by Alternating Optimization 3 ways with rotation (64 × 64 size). Panels (a)-(d): true, initial, and final images with no, one, two, and three rotations.
Figure 4.12: Computed PSFs by Alternating Optimization 3 ways with rotation (64 × 64 size). Panels (a)-(d): true, initial, and final PSFs with no, one, two, and three rotations.
Figure 4.13: Computed images by Alternating Optimization 3 ways with rotation (256 × 256 size). Panels (a)-(d): true, initial, and final images with no, one, two, and three rotations.
Figure 4.14: Computed PSFs by Alternating Optimization 3 ways with rotation (256 × 256 size). Panels (a)-(d): true, initial, and final PSFs with no, one, two, and three rotations.
Chapter 5
Conclusions and Discussion
We have experimented with different image deblurring cases using different methods. Solving nonlinear inverse problems is very challenging, but it is also more realistic, because in practice we do not know the true image or how the true image was blurred. In this thesis we assumed that the blurring process is based on a rank-1 PSF. This may look restrictive, but it applies to cases where the blur acts only vertically and horizontally. Moreover, rank-1 PSFs have the special property that the blur matrix can be expressed as a Kronecker product.
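That Kronecker property can be checked numerically with a few MATLAB lines. The sketch below is only an illustration of the structure, not the thesis implementation: it uses zero boundary conditions (the experiments in this thesis use reflexive ones), an arbitrarily chosen PSF center, and the ordering convention B = Ay*X*Az', i.e. vec(B) = kron(Az, Ay)*vec(X).

% Sketch: a rank-1 PSF P = y*z' gives a blur matrix with Kronecker structure.
n  = 8;
y  = rand(n,1); y = y/sum(y);          % 1-D PSF components (nonnegative, sum 1)
z  = rand(n,1); z = z/sum(z);
P  = y*z';                             % rank-1 PSF
c0 = 4;                                % center index of the 1-D PSFs (illustrative choice)
Ay = toeplitz([y(c0:end); zeros(c0-1,1)], [y(c0:-1:1); zeros(n-c0,1)]);
Az = toeplitz([z(c0:end); zeros(c0-1,1)], [z(c0:-1:1); zeros(n-c0,1)]);
X  = rand(n);                          % arbitrary test image
% vec(Ay*X*Az') equals kron(Az,Ay)*vec(X); the difference is ~ machine precision.
norm(kron(Az, Ay)*X(:) - reshape(Ay*X*Az', [], 1))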
For the reduced Gauss-Newton method, we constructed the Jacobian matrix using Kronecker products, and we observed that the blurred image was deblurred as well as we expected. The method takes considerable time, but the running time could be improved by refining the Matlab implementation: forming the Jacobian matrices and multiplying with them dominate the cost, so that part offers the most room for improvement.
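One standard way to reduce that cost, sketched here under the same assumptions as the previous snippet (Ay, Az, and n come from it; this is not the thesis code), is never to form kron(Az, Ay) explicitly and instead to apply it through small matrix products.

% Sketch: apply kron(Az, Ay) and its transpose without building the large matrix,
% using kron(Az,Ay)*vec(X) = vec(Ay*X*Az'). This replaces an O(n^4) matrix-vector
% product with two n-by-n matrix products.
applyBlur  = @(v, Ay, Az, n) reshape(Ay  * reshape(v, n, n) * Az', [], 1);
applyBlurT = @(v, Ay, Az, n) reshape(Ay' * reshape(v, n, n) * Az , [], 1);
v  = rand(n^2, 1);
r1 = applyBlur(v, Ay, Az, n);
r2 = kron(Az, Ay) * v;
norm(r1 - r2)                          % ~ machine precision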
While seeking more stable solutions close to the exact one, we found that this nonlinear inverse problem with a rank-1 PSF can be rewritten as a sequence of
linear inverse problems by fixing some of the linear and nonlinear parameters. As discussed in the introduction, efficient methods already exist for solving linear problems, so we can apply them within nonlinear solvers. However, when we started with only the initial y0 and z0, the iterates moved away from the true solution as the algorithm was repeated. To overcome this and provide more information to the linear problem, we used rotated images, which gave improved results as the number of iterations increased.
We applied the alternating optimization 3 ways with rotation to a small image; if the rotated images can be generated, or the solutions computed, more efficiently, this approach could be applied to larger images in a reasonable amount of time.
In future research, we would like to extend the approach to higher-rank PSFs and to different types of images. Since these experiments used the Grain image with reflexive boundary conditions, we may also want to consider sparse images, such as a satellite image or a bone in a medical image, with different PSF boundary conditions.
As mentioned in Section 4.2, proving convergence of the alternating optimization is also future work. For that analysis we may need to consider PSFs with fewer parameters, that is, fewer than n^2.
Chapter 6
References
[1] D. Calvetti and L. Reichel. Tikhonov regularization with a solution constraint. SIAM J.
Sci. Comput., 26:224–239, 2004.
[2] Per Christian Hansen. Discrete Inverse Problems: Insight and Algorithms. SIAM, 2010.
[3] Per Christian Hansen, James G. Nagy, Dianne P. O'Leary. Deblurring Images: Matrices, Spectra, and Filtering. SIAM, 2006.
[4] Jorge Nocedal, Stephen J. Wright. Numerical Optimization, 2nd edition. Springer, 2006.
[5] Per Christian Hansen. Rank-deficient and Discrete Ill-posed Problems: Numerical As-
pects of Linear Inversion. SIAM, 1998.
[6] Gene Golub, Victor Pereyra. Separable nonlinear least squares: the variable projection method and its applications. Inverse Problems, 19, R1-R26, 2003.
[7] Julianne Chung, James G. Nagy. An efficient iterative approach for large-scale separable
nonlinear inverse problems. SIAM J. Sci. Comput. Vol. 31, No. 6, pp. 4654-4674, 2010.
[8] Thomas F. Coleman, Yuying Li. An interior trust region approach for nonlinear minimization subject to bounds. SIAM Journal on Optimization, Vol. 6, pp. 418-445, 1996.
[9] Thomas F. Coleman, Yuying Li. On the convergence of interior-reflective Newton methods for nonlinear minimization subject to bounds. Mathematical Programming, 67, pp. 189-224, 1994.
[10] Dianne P. O'Leary, Bert W. Rust. Variable projection for nonlinear least squares problems. Computational Optimization and Applications, 54, pp. 579-593, 2013.
[11] Anastasia Cornelio, Elena Loli Piccolomini, James G. Nagy. Constrained numerical optimization methods for blind deconvolution. Numerical Algorithms, Vol. 65, pp. 23-42, 2014.
[12] Rafael C. Gonzalez, Richard E. Woods. Digital Image Processing, 3rd edition. Prentice
Hall, 2008
[13] Julie Kamm, James G. Nagy. Kronecker product and SVD approximations in image restoration. Linear Algebra and its Applications, 284, pp. 177-192, 1998.
[14] James W. Demmel. Applied Numerical Linear Algebra. SIAM, 1997
[15] Per Christian Hansen. The discrete Picard condition for discrete ill-posed problems.
BIT, 30, pp. 503-518, 1990
[16] C. F. Van Loan. Computational Frameworks for the Fast Fourier Transform. SIAM,
Philadelphia, pp. 39, 42, 47, 1992
[17] Gill, P. E., W. Murray, and M. H. Wright. Practical Optimization. Academic Press,
London, UK, 1981
[18] Harris, Fredric J. On the Use of Windows for Harmonic Analysis with the Discrete
Fourier Transform. Proceedings of the IEEE. Vol. 66, pp. 51-83., January 1978
[19] Roberts, Richard A., and C. T. Mullis. Digital Signal Processing. Reading, MA:
Addison-Wesley, pp. 135-136., 1987
[20] C. T. Kelley. Iterative Methods for Optimization. SIAM, Philadelphia, PA., 1999
[21] C. C. Paige, M. A. Saunders. LSQR: an algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software, 8, pp. 43-71, 1982.
[22] Bjorck, Grimme and Van Dooren. An implicit shift bidiagonalization algorithm for
ill-posed systems. BIT 34, pp. 520-534., 1994
[23] Chung, Nagy and O’Leary. A Weighted-GCV Method for Lanczos-Hybrid Regulariza-
tion. ETNA 28, pp. 149-167, 2008
[24] Tony F. Chan, C. K. Wong. Convergence of the alternating minimization algorithm for blind deconvolution. Linear Algebra and its Applications, 316, pp. 259-285, 2000.