

Solving Linear Systems via Iterative Methods

by

CHEN Yuning

17251265

A thesis submitted in partial fulfillment of the requirements for the degree of

Bachelor of Science (Honours) in Mathematics and Statistics

at

Hong Kong Baptist University

January 21, 2021


Acknowledgements

I would like to thank my supervisor Dr. Hon and the Department of Mathematics of Hong Kong Baptist University. In my four years of study, I not only gained knowledge but also learned the spirit of overcoming difficulties. From the beginning to the completion of the final year project, Dr. Hon always gave me careful guidance and unremitting support. Especially when I met difficulties and wanted to retreat, it was Dr. Hon who gave me technical support and spiritual encouragement, so that I was able to complete the graduation project successfully. When I was about to finish my thesis, holding the results of my efforts of recent months, I was very excited; at that moment I felt that all the effort was worth it, which fulfilled the old saying "no pains, no gains". Last but not least, I would like to express my heartfelt appreciation to all the experts and professors who have been busy reviewing the final year project and taking part in the oral defense, and to all the teachers I have met in my 16 years of study. Without your efforts, I would not be who I am today.

Signature of Student

Student Name

Department of Mathematics

Hong Kong Baptist University

Date


Abstract

Given an n-by-n Hermitian positive definite matrix A, this project studies solving the linear system Ax = b via iterative methods, including the steepest descent (SD) method and the conjugate gradient (CG) method. The application of iterative methods to one specific and important class of linear systems, Toeplitz systems, is the main focus. To speed up the CG method for Toeplitz systems, this paper presents the preconditioned conjugate gradient (PCG) method with circulant preconditioners, including Strang's preconditioner and T. Chan's preconditioner. The cost and convergence rate of the PCG method are examined via numerical experiments, whose results show the superiority of PCG over CG. Finally, an application of iterative methods for solving linear systems in the multilayer perceptron (MLP) is introduced.


1 Introduction

With the development of computer science, more and more linear systems can be solved by computers. Particularly when the matrix is very large, iterative methods are widely used to make the computation faster and more accurate. In the beginning I wanted to apply Toeplitz systems to neural networks, but due to the limited time and the extensive existing research in numerical analysis on solving linear problems, this project mainly combines the gradient-descent idea from machine learning with numerical analysis to solve linear problems. In this section, two iterative methods, the steepest descent and the conjugate gradient, are introduced and their convergence rates are discussed. Besides, I present an important class of systems: Toeplitz systems. These systems have strong practicability in many scientific fields. For instance, the solution of Toeplitz systems can be utilized for solving numerical partial and ordinary differential equations [6], obtaining stationary autoregressive time series in statistics [7], and performing image restoration in image processing [8]. Therefore, how to solve these systems is an important subject, and the two iterative methods mentioned above are applied to them in this project. Although this project discusses large-scale linear systems, the case of infinite n will not be introduced, because it depends on the control theory of infinite-dimensional linear systems, which is beyond my current research.

1.1 Introduction to the algorithms

Given an n-by-n Hermitian positive definite matrix A, consider the linear system

(1) Ax = b

which has wide applicability in mathematics and computing, such as image and signal processing, partial differential equations, and queueing networks. To solve this problem we can use Gaussian elimination, but when n is large the procedure is very expensive. For computers, repeating the same simple operation can greatly increase the speed of the computation; for this reason, iterative methods are necessary. In this section, the SD and the CG algorithms are derived. These algorithms construct an optimal approximation from the Krylov space, and the problem can be viewed as minimizing the functional x^H A x − 2 b^H x.

1.1.1 Steepest Descent (SD)

The problem solved by the steepest descent method is an unconstrained optimization problem. First we take an initial guess x_0 for the solution and introduce a step size a_k to approximate the real solution. Let the residual be r_k = b − A x_k and the error be e_k = x − x_k; the iteration can be expressed by the following equation

(2) x_{k+1} = x_k + a_k (b - A x_k) = x_k + a_k r_k = x_k + a_k p_k

and the residual satisfies r_{k+1} = r_k − a_k A r_k. To minimize the 2-norm of r_{k+1}, we take

(3) a_k = \frac{\langle r_k, r_k \rangle}{\langle r_k, A r_k \rangle}

Let p_k be the direction vector, which equals r_k. Taking the inner product of r_{k+1} with itself, we have

(4) \langle r_{k+1}, r_{k+1} \rangle = \langle r_k, r_k \rangle - \frac{|\langle r_k, A r_k \rangle|^2}{\langle A r_k, A r_k \rangle},


and this equation can be written as

(5) \|r_{k+1}\|^2 = \|r_k\|^2 \left[\, 1 - \left| \frac{r_k^H A^H r_k}{r_k^H r_k} \right|^2 \cdot \left( \frac{\|r_k\|}{\|A r_k\|} \right)^2 \right]

In the next theorem, we use an inequality to simplify this bound.

Theorem 1.1. Define d to be the distance from the field of values F(A^H) to the origin. For every initial vector r_0, the iteration (2) with coefficient formula (3) converges to the solution A^{-1} b if and only if 0 ∉ F(A^H), and the 2-norm of the residual satisfies

(6) \|r_{k+1}\| \le \sqrt{1 - d^2/\|A\|^2}\; \|r_k\|

We can also adjust a_k to obtain a stronger bound

(7) \|r_{k+1}\| \le \|I - \alpha A\| \cdot \|r_k\|

where α is any coefficient. Under the condition that A is Hermitian positive definite, let α = 2/(λ_1 + λ_n), where λ_1 and λ_n are the smallest and largest eigenvalues of A; then (7) can be written as

(8) \|r_{k+1}\| \le \max_{i=1,\dots,n} \left| 1 - \frac{2\lambda_i}{\lambda_n + \lambda_1} \right| \cdot \|r_k\| \le \left( \frac{\kappa - 1}{\kappa + 1} \right) \|r_k\|

where κ = λ_n/λ_1 is the condition number of A. However, r_{k+1} is not orthogonal to A p_{k-1}, A p_{k-2}, ..., so the direction vectors are not optimal and the number of iterations may be large. This inequality can also be proved by the Kantorovich inequality.
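To make the method concrete, here is a minimal Python/NumPy sketch of iteration (2) with step size (3). (The thesis's experiments are in Matlab; this function, its names and its stopping rule are purely illustrative.)

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-7, max_iter=10000):
    """Sketch of SD for Hermitian positive definite A:
    x_{k+1} = x_k + a_k r_k with a_k = <r_k,r_k>/<r_k,A r_k>."""
    x = x0.astype(complex)
    r = b - A @ x                           # r_0 = b - A x_0
    r0_norm = np.linalg.norm(r)
    for k in range(max_iter):
        Ar = A @ r
        a = np.vdot(r, r) / np.vdot(r, Ar)  # step size (3)
        x = x + a * r                       # direction p_k = r_k
        r = r - a * Ar                      # r_{k+1} = r_k - a_k A r_k
        if np.linalg.norm(r) / r0_norm < tol:
            break
    return x, k + 1
```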

1.1.2 Conjugate Gradient (CG)

The CG method is an improvement of the SD method; Hestenes and Stiefel [5] first proposed it in 1952. In numerical analysis, this algorithm is applied to problems with large sparse matrices. Consider the Krylov subspace created by A and b, K_k(A, b) = span{b, Ab, A^2 b, ..., A^{k-1} b}, and let the real solution of Ax = b be x* = A^{-1} b. The CG algorithm finds x_k ∈ K_k(A, b) as an optimal approximation of x*.

Definition 1.2. For a symmetric positive definite matrix A, we define the vector norm

(9) \|v\|_A = \sqrt{v^T A v}

Theorem 1.3. For x_k ∈ K_k(A, b), the following two conditions are equivalent: \|x_k - x^*\|_A = \min\{ \|x - x^*\|_A : x \in K_k(A, b) \}, and x^T (b - A x_k) = 0 for all x ∈ K_k(A, b).

It is easy to see that with r_0 = b − A x_0 the search space becomes x_0 + K_k(A, r_0), by induction, so we have the following theorem.

Theorem 1.4. According to Theorem 1.3, solving x_k = \arg\min_{x \in x_0 + K_k(A, r_0)} \|x - x^*\|_A amounts to finding x_k \in x_0 + K_k(A, r_0) such that b - A x_k \perp K_k(A, r_0).

Under the condition that A is Hermitian positive definite, the procedure of the CG method can be listed as follows.


Algorithm. Conjugate Gradient Method

Choose an arbitrary x_0, set r_0 = b − A x_0 and p_0 = r_0.

For k = 1, 2, ... until convergence:

    Compute A p_{k-1}

    a_{k-1} = ⟨r_{k-1}, r_{k-1}⟩ / ⟨p_{k-1}, A p_{k-1}⟩

    x_k = x_{k-1} + a_{k-1} p_{k-1}

    r_k = r_{k-1} − a_{k-1} A p_{k-1}

    β_{k-1} = ⟨r_k, r_k⟩ / ⟨r_{k-1}, r_{k-1}⟩

    p_k = r_k + β_{k-1} p_{k-1}
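The same loop in Python/NumPy, as an illustrative sketch (not the thesis's own code; the stopping rule is an assumption of this sketch):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-7, max_iter=None):
    """Sketch of the CG algorithm above for Hermitian positive
    definite A; variable names follow the text."""
    max_iter = max_iter if max_iter is not None else b.shape[0]
    x = x0.astype(complex)
    r = b - A @ x                    # r_0 = b - A x_0
    p = r.copy()                     # p_0 = r_0
    rr = np.vdot(r, r)
    r0_norm = np.sqrt(rr.real)
    for k in range(max_iter):
        Ap = A @ p
        a = rr / np.vdot(p, Ap)      # a_{k-1} = <r,r>/<p,Ap>
        x = x + a * p
        r = r - a * Ap
        rr_new = np.vdot(r, r)
        if np.sqrt(rr_new.real) / r0_norm < tol:
            break
        p = r + (rr_new / rr) * p    # beta = <r_k,r_k>/<r_{k-1},r_{k-1}>
        rr = rr_new
    return x, k + 1
```

In exact arithmetic this loop produces the same iterates as the algorithm listed above; in floating point these recurrences are the standard practical form.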

The residual r_{k+1} lies in the space r_0 + span{A p_0, ..., A p_k}. Besides, we have ⟨r_{k+1}, A p_j⟩ = 0 for j < k and ⟨p_{k+1}, A p_j⟩ = 0 for j ≤ k. Denote the error of x_k in the CG algorithm by e_k = A^{-1} b − x_k. By induction we find that r_{k+1} lies in the space r_0 + span{A r_0, A^2 r_0, ..., A^{k+1} r_0} and e_{k+1} lies in the space e_0 + span{A e_0, A^2 e_0, ..., A^{k+1} e_0}. To show the convergence rate of the CG method, here is a lemma.

Lemma 1.5. For the CG algorithm, let p_k range over the polynomials of degree k with value 1 at the origin, and let λ_i, i = 1, 2, ..., n, be the eigenvalues of A. Then

(10) \frac{\|e_k\|_A}{\|e_0\|_A} \le \min_{p_k} \max_{i=1,2,\dots,n} |p_k(\lambda_i)|

Therefore, if A has only m distinct eigenvalues, the number of iterations of the CG method is at most m, because we can construct the polynomial P(x) = \prod_{i=1}^{m} (1 - x/\lambda_i), which equals 0 at every eigenvalue λ_i. When the largest and smallest eigenvalues are known, we can get the following bound.

(11) \frac{\|e_k\|_A}{\|e_0\|_A} \le 2\left[\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k + \left(\frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}\right)^k\right]^{-1} \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k

When \kappa \to \infty, \frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1} \sim 1 - \frac{2}{\sqrt{\kappa}}. This indicates that the number of iterations required to reach a fixed accuracy is about O(\sqrt{\kappa}).

Based on this bound, we use the eigenvalues to express the convergence rate.

Theorem 1.6. Let the eigenvalues of A satisfy 0 < \lambda_1 \le \dots \le \lambda_p \le b_1 \le \lambda_{p+1} \le \dots \le \lambda_{n-q} \le b_2 \le \lambda_{n-q+1} \le \dots \le \lambda_n, where b_1 and b_2 are constants. Then

(12) \frac{\|e_k\|_A}{\|e_0\|_A} \le 2\left(\frac{a-1}{a+1}\right)^{k-p-q} \cdot \max_{\lambda \in [b_1, b_2]} \prod_{i=1}^{p} \frac{\lambda - \lambda_i}{\lambda_i},

where a = \sqrt{b_2/b_1} \ge 1.

We find that when the eigenvalues of A are concentrated, or the condition number κ is small, the convergence of the CG method is fast. Moreover, the convergence rate is linear, since

(13) 0 < \frac{\|e_{k+1}\|_A}{\|e_k\|_A} \le \frac{a-1}{a+1} = c < 1,

which can be proved by induction.


1.2 Toeplitz systems

A Toeplitz matrix is an n×n matrix that can be written as

(14) T_n = \begin{pmatrix} t_0 & t_{-1} & \cdots & t_{1-n} \\ t_1 & t_0 & \cdots & t_{2-n} \\ \vdots & \ddots & \ddots & \vdots \\ t_{n-1} & t_{n-2} & \cdots & t_0 \end{pmatrix}

i.e., t_{ij} = t_{i-j}, so T_n is constant along its diagonals. The great majority of early research on Toeplitz algorithms concerned direct methods. One traditional direct algorithm for solving T_n x = b is Gaussian elimination, which requires O(n^3) operations but is too inefficient for large n. Fortunately, a number of specialized fast direct methods were invented which decrease the complexity to O(n^2) operations, such as the Trench algorithm [12] and the Levinson algorithm [13]. In the following report, the fast Fourier transform is used to reduce the cost of operations on Toeplitz systems.


2 Background

2.1 Toeplitz matrix

Toeplitz systems have been applied in numerous fields such as signal processing and image processing, and methods for solving T_n x = b have been widely researched. First we discuss a construction of the Toeplitz matrix T_n. Let C_{2π} be the set of all 2π-periodic continuous real-valued functions defined on [−π, π]. For f(x) ∈ C_{2π}, let

(15) t_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, e^{-ikx}\, dx, \quad k = 0, \pm 1, \pm 2, \dots

where i^2 = −1. The entries of T_n are the Fourier coefficients of f, and we call f the generating function of T_n. If f is real-valued, T_n is Hermitian. We define the Toeplitz matrix by this formula because in general we know the function f first and then calculate the elements of T_n. If f is an even function, T_n is real symmetric. The following theorem shows the connection between the generating function f and the eigenvalues of T_n.
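As an illustration of (15), the following Python sketch approximates the Fourier coefficients t_k with the trapezoidal rule on a uniform grid over one period and assembles T_n; the grid size n_grid is an arbitrary choice of this sketch, and t_{-k} = \overline{t_k} is used, which holds for real-valued f:

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_from_symbol(f, n, n_grid=8192):
    """Assemble T_n from a generating function f via (15).
    The mean over a uniform periodic grid approximates
    (1/2pi) * integral of f(x) e^{-ikx} dx."""
    x = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    fx = f(x)
    t = np.array([np.mean(fx * np.exp(-1j * k * x)) for k in range(n)])
    # first column (t_0,...,t_{n-1}); first row (t_0, t_{-1},...,t_{1-n})
    return toeplitz(t, np.conj(t))
```

For example, `toeplitz_from_symbol(lambda x: x**4 + 1, 64)` yields a (numerically) real symmetric positive definite T_64, since x^4 + 1 is even and positive.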

Theorem 2.1. Let f_min and f_max be the minimum and maximum of the generating function f, and let λ_min and λ_max be the smallest and largest eigenvalues of T_n. Then

(16) f_{\min} \le \lambda_{\min} \le \lambda_{\max} \le f_{\max}

Moreover, when fmin > 0, Tn is positive definite.

Proof. Let u = (u_0, u_1, \dots, u_{n-1})^T \in \mathbb{C}^n. We have

(17) u^H T_n u = \sum_{p=0}^{n-1} \sum_{q=0}^{n-1} t_{p-q}\, u_p \overline{u_q} = \sum_{p=0}^{n-1} \sum_{q=0}^{n-1} \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, e^{-i(p-q)x}\, dx \right] u_p \overline{u_q} = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| \sum_{k=0}^{n-1} u_k e^{-ikx} \right|^2 f(x)\, dx

Since f_min ≤ f(x) ≤ f_max for all x, if u satisfies

(18) u^H u = \|u\|_2^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| \sum_{k=0}^{n-1} u_k e^{-ikx} \right|^2 dx = 1,

then we have

(19) f_{\min} \le u^H T_n u \le f_{\max}

According to the Courant–Fischer minimax theorem, (16) follows.

Definition 2.2. If for every ε > 0 there exist positive integers n_1 and n_2 such that for all n > n_1, at most n_2 eigenvalues of the matrix T_n − I_n have absolute value larger than ε, then the sequence of matrices T_n is said to have clustered spectra around 1. Figure 1 [14] shows the eigenvalues of T_n when T_n has clustered spectra around 1.


Figure 1: Clustered spectra around 1

The CG algorithm can be used to solve this system, but according to equation (11), if the condition number of T_n is too large the convergence may be slow, so we need to know whether the spectra of the sequence of matrices T_n are clustered. In the case where the spectra are clustered around 1, similarly to Theorem 1.6, with eigenvalues 0 < δ < λ_1 ≤ ··· ≤ λ_p ≤ 1 − ε ≤ λ_{p+1} ≤ ··· ≤ λ_{n−q} ≤ 1 + ε ≤ λ_{n−q+1} ≤ ··· ≤ λ_n, we have

(20) \frac{\|e_k\|_A}{\|e_0\|_A} \le 2 \left( \frac{1+\varepsilon}{\delta} \right)^p \varepsilon^{\,k-p-q}

where k − p − q > 0. By induction, we know that

(21) \frac{\|e_{k+1}\|_A}{\|e_k\|_A} = \varepsilon

When n → ∞, k ≤ n and ε → 0, we have

(22) \lim_{k \to +\infty} \frac{\|e_{k+1}\|_A}{\|e_k\|_A} = 0

Therefore, the convergence rate is superlinear.

2.2 Circulant matrices

Before explaining PCG, circulant matrices should be introduced. Strang [10] and Olkin [11] showed that for any vector v, the product C_n^{-1} T_n v of the inverse of a circulant matrix, a Toeplitz matrix, and v can be computed efficiently in O(n log n) operations. A circulant matrix is an n×n matrix which can be denoted by

(23) C_n = \begin{pmatrix} c_0 & c_{-1} & \cdots & c_{1-n} \\ c_1 & c_0 & \cdots & c_{2-n} \\ \vdots & \ddots & \ddots & \vdots \\ c_{n-1} & c_{n-2} & \cdots & c_0 \end{pmatrix}

where c_{-i} = c_{n-i} for i = 1, 2, ..., n−1. By the diagonalization property of circulant matrices, C_n can be diagonalized by the Fourier matrix F_n:

(24) C_n = F_n^H \Lambda_n F_n

where the entries of F_n are

(25) [F_n]_{j,k} = \frac{1}{\sqrt{n}}\, e^{2\pi i jk/n}


for 0 ≤ j, k ≤ n−1, and Λ_n is a diagonal matrix having the same eigenvalues as C_n. The diagonal entries λ_k of Λ_n are

(26) \lambda_k = \sum_{j=0}^{n-1} c_j\, e^{2\pi i jk/n}

Therefore, we can use the fast Fourier transform (FFT) to obtain Λ_n in O(n log n) operations. Moreover, for any vector v, both C_n v and C_n^{-1} v can be computed in O(n log n) operations. Besides, by the properties of circulant matrices, we can show that the product of a Toeplitz matrix T_n and v also requires only O(n log n) operations using the FFT. First we embed the matrix T_n into a 2n-by-2n circulant matrix

(27) M_{2n} = \begin{pmatrix} T_n & B_n \\ B_n & T_n \end{pmatrix},

the 2n-by-2n circulant matrix whose first column is (t_0, t_1, \dots, t_{n-1}, 0, t_{1-n}, \dots, t_{-1})^T; here B_n is the Toeplitz matrix with first column (0, t_{1-n}, t_{2-n}, \dots, t_{-1})^T and first row (0, t_{n-1}, t_{n-2}, \dots, t_1).

Then we have

(28) M_{2n} \begin{pmatrix} v \\ 0 \end{pmatrix} = \begin{pmatrix} T_n v \\ \times \end{pmatrix}

so we can multiply M_{2n} by (v, 0)^T to obtain T_n v. This circulant matrix–vector multiplication requires O(2n log(2n)) operations, and hence T_n v can be computed in O(n log n) operations.

2.3 Preconditioned conjugate gradient (PCG)

If the spectrum of T_n is not clustered and the condition number is large, we can use a preconditioner P_n to reduce the condition number of the system. Instead of solving the original problem, we solve

(29) P_n^{-1} T_n x = P_n^{-1} b

This algorithm is called the preconditioned conjugate gradient (PCG) method, and its procedure is as follows.

Algorithm. Preconditioned Conjugate Gradient

Choose an arbitrary x_0, set r_0 = b − T_n x_0, solve P_n z_0 = r_0, and set p_0 = z_0.

For k = 0, 1, 2, ... until convergence:

    a_k = ⟨z_k, r_k⟩ / ⟨p_k, T_n p_k⟩

    x_{k+1} = x_k + a_k p_k

    r_{k+1} = r_k − a_k T_n p_k

    Solve P_n z_{k+1} = r_{k+1}

    β_k = ⟨z_{k+1}, r_{k+1}⟩ / ⟨z_k, r_k⟩

    p_{k+1} = z_{k+1} + β_k p_k
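A Python sketch of this loop with a circulant preconditioner applied through the FFT; `matvec` (computing T_n v, e.g. by the embedding sketched above) and `c_eig` (the FFT of the first column of P_n) are assumptions of this illustration:

```python
import numpy as np

def pcg_circulant(matvec, c_eig, b, x0, tol=1e-7, max_iter=None):
    """PCG as listed above; solving P_n z = r costs one FFT pair."""
    n = b.shape[0]
    max_iter = max_iter if max_iter is not None else n
    solve_P = lambda r: np.fft.ifft(np.fft.fft(r) / c_eig)
    x = x0.astype(complex)
    r = b - matvec(x)
    z = solve_P(r)                    # P_n z_0 = r_0
    p = z.copy()
    zr = np.vdot(z, r)
    r0_norm = np.linalg.norm(r)
    for k in range(max_iter):
        Tp = matvec(p)
        a = zr / np.vdot(p, Tp)       # a_k = <z_k,r_k>/<p_k,T_n p_k>
        x = x + a * p
        r = r - a * Tp
        if np.linalg.norm(r) / r0_norm < tol:
            break
        z = solve_P(r)                # P_n z_{k+1} = r_{k+1}
        zr_new = np.vdot(z, r)
        p = z + (zr_new / zr) * p     # beta_k = <z_{k+1},r_{k+1}>/<z_k,r_k>
        zr = zr_new
    return x, k + 1
```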


The preconditioner P_n should be constructible within O(n log n) operations, and solving P_n z_{k+1} = r_{k+1} should need only O(n log n) operations. Besides, the spectrum of P_n^{-1} T_n should be clustered. According to the properties of circulant matrices, a circulant C_n is a good kind of preconditioner. Table 1 shows that the FFT can significantly improve the operation speed when n is very large.

Table 1: Complexity with different N


3 Circulant preconditioners

Strang [10] and Olkin [11] first applied the PCG method with circulant preconditioners C_n to Toeplitz systems. We can solve T_n x = b by solving C_n^{-1} T_n x = C_n^{-1} b. The complexity per iteration is dominated by matrix–vector products such as C_n^{-1} T_n v, which the fast Fourier transform carries out in O(n log n) operations. In this section, two different preconditioners are discussed and we analyse the convergence rate of PCG with each of them.

3.1 Spectrum analysis

To begin with, we introduce a method for judging whether the spectrum of a matrix is clustered around 1. This method is important for analysing the convergence rate of PCG. Weyl's lemma is stated below.

Lemma 3.1 (Weyl). If A and E are n-by-n Hermitian matrices and the eigenvalues λ_j(A), λ_j(E) and λ_j(A+E) are arranged in ascending order for j = 1, 2, ..., n, then

(30) \lambda_j(A) + \lambda_1(E) \le \lambda_j(A+E) \le \lambda_j(A) + \lambda_n(E)

Based on this lemma, we can use the matrix decomposition in the following theorem.

Theorem 3.2 (R. Chan). Let ε > 0 be arbitrary. If there exist M, N > 0 such that for every n > N the matrix A_n can be decomposed as

(31) A_n = K_n + I_n + L_n

where K_n and L_n are Hermitian, ‖K_n‖_2 < ε and rank(L_n) < M, then the spectrum of A_n is clustered around 1.

Proof. The spectrum of K_n + I_n belongs to the interval (1 − ε, 1 + ε). Besides, L_n has at most M nonzero eigenvalues. According to Weyl's lemma, all eigenvalues of A_n except at most M of them lie in the interval (1 − ε, 1 + ε). In particular, when λ_min(A_n) is bounded away from 0, the convergence rate of CG is superlinear. Two specific preconditioners which make the convergence rate superlinear are given next.

3.2 Strang's preconditioner

First we introduce Strang's preconditioner, proposed by Strang [10] in 1986. It uses only half of the elements of T_n, and constructing it does not take much computation. We say a generating function f belongs to the Wiener class if

(32) f(x) = \sum_{k=-\infty}^{\infty} t_k e^{ikx}, \qquad \sum_{k=-\infty}^{\infty} |t_k| < \infty

Recall equation (15): when f is in the Wiener class and real-valued, the Toeplitz matrix T_n is Hermitian. Let the circulant matrix s(T_n) be Strang's preconditioner. When n = 2m + 1, its diagonals s_k are

(33) s_k = \begin{cases} t_k, & 0 \le k \le m \\ t_{k-n}, & m < k \le n-1 \\ \overline{s_{-k}}, & 0 < -k \le n-1 \end{cases}


and when n = 2m,

(34) s_k = \begin{cases} t_k, & 0 \le k \le m-1 \\ 0 \text{ or } (t_m + t_{-m})/2, & k = m \\ t_{k-n}, & m < k \le n-1 \\ \overline{s_{-k}}, & 0 < -k \le n-1 \end{cases}
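A Python sketch of this construction, returning the first column s_0, ..., s_{n-1} of s(T_n); here `t` is an assumed callable k ↦ t_k, and for even n the averaged choice of s_m is taken:

```python
import numpy as np

def strang_first_column(t, n):
    """First column of Strang's preconditioner s(T_n) per (33)/(34).
    Its eigenvalues are the DFT values np.fft.fft(s), which agree
    with the formula below up to a reordering of the indices j."""
    m = n // 2
    s = np.empty(n, dtype=complex)
    for k in range(n):
        if k < m or (n % 2 == 1 and k == m):
            s[k] = t(k)                   # central diagonals copied
        elif n % 2 == 0 and k == m:
            s[k] = (t(m) + t(-m)) / 2     # averaged middle entry
        else:
            s[k] = t(k - n)               # wrap-around diagonals
    return s
```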

It is easy to see that the jth eigenvalue of s(T_n) is

(35) \lambda_j(s(T_n)) = \sum_{k=-m}^{m} t_k\, e^{2\pi i jk/n}

for j = 0, 1, ..., n−1. As we know from Theorem 1.6, if the eigenvalues of [s(T_n)]^{-1} T_n are close to each other, the convergence rate is fast. First of all, we prove a lemma.

to each other, the convergence rate is fast. First of all, we prove a lemma.

Lemma 3.3. Suppose the function f is in the Wiener class and is a positive real-valued function. Then the matrices s(T_n) and [s(T_n)]^{-1} are uniformly bounded in the 2-norm.

Proof. The series \sum_{k=-\infty}^{\infty} t_k e^{ikx} is absolutely convergent for all x ∈ [−π, π], so for every ε > 0 there exists N such that for all m > (N−1)/2,

(36) \left| \sum_{|k|>m} t_k e^{ikx} \right| < \varepsilon

So we have

(37) \lambda_j(s(T_n)) = \sum_{k=-m}^{m} t_k e^{2\pi i jk/n} - f\left(\frac{2\pi j}{n}\right) + f\left(\frac{2\pi j}{n}\right) = \sum_{k=-m}^{m} t_k e^{2\pi i jk/n} - \sum_{k=-\infty}^{\infty} t_k e^{2\pi i jk/n} + f\left(\frac{2\pi j}{n}\right) = f\left(\frac{2\pi j}{n}\right) - \sum_{|k|>m} t_k e^{2\pi i jk/n}

and therefore

f_{\min} - \varepsilon \le f_{\min} - \left| \sum_{|k|>m} t_k e^{2\pi i jk/n} \right| \le \lambda_j(s(T_n)) \le f_{\max} + \left| \sum_{|k|>m} t_k e^{2\pi i jk/n} \right| \le f_{\max} + \varepsilon

Since f is positive and bounded on [−π, π], we have

(38) 0 < f_{\min} \le f_{\max} < \infty

and we may take ε < f_min above.

Therefore,

(39) \|s(T_n)\|_2 = \max_{0 \le j \le n-1} |\lambda_j(s(T_n))| \le f_{\max} + \varepsilon < \infty

and

(40) \|s(T_n)^{-1}\|_2 = \max_{0 \le j \le n-1} |\lambda_j(s(T_n)^{-1})| = \frac{1}{\min_{0 \le j \le n-1} |\lambda_j(s(T_n))|} \le \frac{1}{f_{\min} - \varepsilon} < \infty


With this lemma, the following theorem proves that the spectrum of [s(Tn)]−1Tn is clustered.

Theorem 3.4. Suppose f is in the Wiener class and is a positive function. For every ε > 0 there exist positive integers M and N such that for all n > N, at most M eigenvalues of [s(T_n)]^{-1} T_n − I_n have absolute values larger than ε.

This theorem can be proved by first showing that at most a bounded number of eigenvalues of T_n − s(T_n) exceed ε in absolute value. Let H_n = T_n − s(T_n); H_n is Hermitian. Because f is in the Wiener class, for any ε > 0 there exists N such that \sum_{k=N+1}^{\infty} |t_k| < \varepsilon/2. Let R_n(N) be the n-by-n matrix constructed by replacing the (n−N)-by-(n−N) leading principal submatrix of H_n by the zero matrix. It is obvious that

(41) \operatorname{rank}(R_n(N)) \le 2N

Then we define the matrix V_n(N) = H_n − R_n(N). Among all columns of V_n(N), the first column has the maximum absolute column sum, and V_n(N) is a Hermitian Toeplitz matrix, so

(42) \|V_n(N)\|_1 = \sum_{k=m+1}^{n-N-1} |h_k| = \sum_{k=m+1}^{n-N-1} |t_k - t_{k-n}| \le 2 \sum_{k=N+1}^{n-N-1} |t_k| < \varepsilon

and

(43) \|V_n(N)\|_2 \le \|V_n(N)\|_1 < \varepsilon

Hence the spectrum of V_n(N) lies in (−ε, ε). According to Lemma 3.1, at most 2N eigenvalues of H_n exceed ε in absolute value.

We then decompose [s(T_n)]^{-1} T_n as

(44) [s(T_n)]^{-1} T_n = I_n + [s(T_n)]^{-1}[T_n - s(T_n)] = I_n + [s(T_n)]^{-1}(V_n(N) + R_n(N)) = I_n + K_n + L_n

where K_n = [s(T_n)]^{-1} V_n(N) has small 2-norm by Lemma 3.3 and (43), and rank(L_n) ≤ 2N. According to Theorem 3.2, the spectrum of [s(T_n)]^{-1} T_n is clustered around 1. Besides, R. Chan and M. Yeung have proved via the following inequality that λ_min([s(T_n)]^{-1} T_n) is bounded away from 0, so the convergence rate is superlinear:

(45) 0 < \delta \le \frac{f_{\min}}{f_{\max} + \varepsilon} \le \min_{x \ne 0} \frac{x^H T_n x}{x^H s(T_n) x} = \lambda_{\min}([s(T_n)]^{-1} T_n)

Here is a theorem proposed by R. Chan for reference.

Theorem 3.5. Let |||e_k|||^2 = e_k^H [s(T_n)]^{-1/2} T_n [s(T_n)]^{-1/2} e_k, and let f be a w-times differentiable function with f^{(w)} \in L_1[-\pi, \pi] and w > 1 (which means it satisfies |t_j| \le c_0/|j|^{w+1} for some constant c_0). For large n there is a constant c, depending only on w and f, such that

(46) \frac{|||e_{2q}|||}{|||e_0|||} \le \frac{c^q}{((q-1)!)^{2w-2}}


3.3 T. Chan's preconditioner

T. Chan proposed another circulant preconditioner for Toeplitz systems in 1988. First we define a set of matrices.

Definition 3.6. Given a unitary matrix U \in \mathbb{C}^{n\times n}, we define the set of matrices

(47) M_U = \{ U^H \Lambda_n U \mid \Lambda_n \text{ is any } n\text{-by-}n \text{ diagonal matrix} \}

If U is the Fourier matrix, M_U is the set of circulant matrices. For a matrix A_n, T. Chan's preconditioner c_U(A_n) is the solution of

(48) \min_{W_n \in M_U} \|A_n - W_n\|_F

In the case of n-by-n Toeplitz systems, T. Chan proved that c_F(T_n) is an optimal circulant preconditioner, where F is the Fourier matrix. The diagonals c_k of c_F(T_n) are given by

(49) c_k = \begin{cases} \dfrac{(n-k)\,t_k + k\,t_{k-n}}{n}, & 0 \le k \le n-1 \\ c_{n+k}, & 0 < -k \le n-1 \end{cases}
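A Python sketch of (49) (`t` is again an assumed callable k ↦ t_k; the k = 0 term is written separately so that t_{-n} is never referenced):

```python
import numpy as np

def tchan_first_column(t, n):
    """First column of T. Chan's circulant preconditioner c_F(T_n):
    c_k = ((n-k) t_k + k t_{k-n}) / n, an O(n) construction."""
    c = np.empty(n, dtype=complex)
    c[0] = t(0)
    for k in range(1, n):
        c[k] = ((n - k) * t(k) + k * t(k - n)) / n
    return c
```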

It takes O(n) operations to construct c_F(T_n). We now review some important properties through the following lemma.

Lemma 3.7. (1) The preconditioner c_F(T_n) is uniquely determined by T_n and is given by

(50) c_F(T_n) = F^H \operatorname{diag}(F T_n F^H)\, F

(2) If T_n is a Hermitian matrix, then c_F(T_n) is also a Hermitian matrix; besides, it satisfies

(51) \lambda_{\min}(T_n) \le \lambda_{\min}(c_F(T_n)) \le \lambda_{\max}(c_F(T_n)) \le \lambda_{\max}(T_n)

Now we consider the convergence rate of PCG with T. Chan's preconditioner, that is, the eigenvalue distribution of [c_F(T_n)]^{-1} T_n. There is a relation between Strang's preconditioner and T. Chan's preconditioner.

Lemma 3.8. Let f be a function in the Wiener class. Then

(52) \lim_{n \to \infty} \|s(T_n) - c_F(T_n)\|_2 = 0

To prove this lemma, suppose n = 2m and construct the matrix B_n = s(T_n) − c_F(T_n). Then B_n is a circulant matrix; denote its diagonals by b_k. The jth eigenvalue of B_n then satisfies

(53) |\lambda_j(B_n)| = \left| \sum_{k=0}^{n-1} b_k\, e^{2\pi i jk/n} \right| \le 2 \sum_{k=1}^{m} \frac{k}{n} \left( |t_k| + |t_{k-n}| \right)

and we have

(54) \|B_n\|_2 \le 2 \sum_{k=1}^{m} \frac{k}{n} |t_k| + \sum_{k=m}^{n-1} |t_k|

Then we can use the absolute summability of the coefficients t_k to show that both sums tend to 0 as n → ∞, completing the proof.


Lemma 3.9. Let f be a positive real-valued function in the Wiener class. Then the matrices c_F(T_n) and [c_F(T_n)]^{-1} are uniformly bounded in the 2-norm.

We can prove this lemma by Lemma 3.7 (2): we have

(55) \|c_F(T_n)\|_2 = \lambda_{\max}(c_F(T_n)) \le \lambda_{\max}(T_n) \le f_{\max}

and

(56) \|[c_F(T_n)]^{-1}\|_2 = \frac{1}{\lambda_{\min}(c_F(T_n))} \le \frac{1}{\lambda_{\min}(T_n)} \le \frac{1}{f_{\min}}

Similarly to the case of Strang's preconditioner, we split [c_F(T_n)]^{-1} T_n into three parts:

(57) [c_F(T_n)]^{-1} T_n = I_n + [c_F(T_n)]^{-1}[T_n - s(T_n)] + [c_F(T_n)]^{-1}[s(T_n) - c_F(T_n)]

When n is large, according to Lemmas 3.8 and 3.9, for any ε > 0,

(58) \|[c_F(T_n)]^{-1}[s(T_n) - c_F(T_n)]\|_2 \le \|[c_F(T_n)]^{-1}\|_2\, \|s(T_n) - c_F(T_n)\|_2 < \varepsilon

By the proof of Theorem 3.4, there exist M, N > 0 such that

(59) [c_F(T_n)]^{-1}[T_n - s(T_n)] = K_n + L_n

where, for n > N, ‖K_n‖_2 < ε and rank(L_n) ≤ M. So we can prove that at most M eigenvalues of [c_F(T_n)]^{-1} T_n − I_n have absolute values larger than ε. As for Strang's preconditioner, λ_min([c_F(T_n)]^{-1} T_n) is bounded away from 0, so the convergence rate of PCG with T. Chan's preconditioner is superlinear.


4 Numerical results

In this part, we conduct numerical experiments to show the effectiveness of the preconditioners described above. In all experiments, the implementations are written in Matlab R2020b.

Example 4.1. We apply PCG with the preconditioners s(T_n) and c_F(T_n) to the Toeplitz systems T_n x = b, where b is the vector whose elements are all one. Consider the generating functions

(60) f_1(x) = x^2, \quad f_2(x) = x^4 + 1, \quad f_3(x) = 2\sum_{m=0}^{\infty} \frac{\sin(mx) + \cos(mx)}{(1+m)^{1.1}}, \quad f_4(x) = |x|^3 + 0.01

To construct the Toeplitz matrix T_n, we only need to calculate its first column. We take the initial guess x_0 = [0, 0, \dots, 0]^T and stop the iteration once the kth residual satisfies

(61) \frac{\|r_k\|_2}{\|r_0\|_2} < 10^{-7}

We take n = 32, 64, 128, 512, 1024 and compare the iteration counts of the two preconditioners and of unpreconditioned CG for each generating function f; a sketch of how such a run can be assembled is given below.
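For reference, an experiment of this shape could be scripted as follows, chaining the hypothetical helpers sketched in earlier sections (`toeplitz_matvec`, `strang_first_column`, `tchan_first_column`, `pcg_circulant`); this illustrates the setup only and is not the thesis's Matlab code:

```python
import numpy as np

f2 = lambda x: x**4 + 1.0                      # generating function f_2

for n in (32, 64, 128, 512, 1024):
    # Fourier coefficients t_k of f_2 for k = -(n-1), ..., n-1
    x = np.linspace(-np.pi, np.pi, 8192, endpoint=False)
    t = {k: np.mean(f2(x) * np.exp(-1j * k * x))
         for k in range(-(n - 1), n)}
    t_col = np.array([t[k] for k in range(n)])
    t_row = np.array([t[-k] for k in range(n)])
    matvec = lambda v: toeplitz_matvec(t_col, t_row, v)
    b = np.ones(n, dtype=complex)              # right-hand side
    x0 = np.zeros(n, dtype=complex)            # initial guess
    for name, col in (("Strang", strang_first_column(t.get, n)),
                      ("T. Chan", tchan_first_column(t.get, n))):
        _, iters = pcg_circulant(matvec, np.fft.fft(col), b, x0)
        print(n, name, iters)
```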

Table 2: Number of iterations for (a) f_1, (b) f_2, (c) f_3, (d) f_4

From Table 2 we observe that the preconditioned conjugate gradient method is much more efficient than the conjugate gradient method, especially when n is greater than 512. In general, the iteration counts of the two preconditioners are very close, and they do not increase as n increases.

Example 4.2. In this part, we illustrate that the convergence rates of PCG with Strang's and with T. Chan's preconditioner are similar. I take the generating function f(x) = x^4 + 1, which is an even function. For n = 64, I analysed the eigenvalue distribution of P^{-1} T_n for the different preconditioners; the result is shown in Figure 2, and the relative residual at each iteration is shown in Figure 3. The faster the residual decreases, the faster the convergence. For the case where f is not an even function, I take f = f_3 and n = 64; the analogous Figures 4 and 5 are shown below.


Figure 2: Eigenvalue distributions of P^{-1} T_n for the different preconditioners (f(x) = x^4 + 1).

Figure 3: Residual of PCG with different preconditioners.


Figure 4: Eigenvalue distributions of P^{-1} T_n for the different preconditioners (f = f_3).

Figure 5: Residual of PCG with different preconditioners.


We observe that whether or not f is an even function, the spectra of [s(T_n)]^{-1} T_n and [c_F(T_n)]^{-1} T_n are clustered around 1, so the convergence is fast. In the unpreconditioned case, by contrast, the spectrum is spread out and the convergence is much slower than for PCG. Moreover, the spectra of [s(T_n)]^{-1} T_n and [c_F(T_n)]^{-1} T_n are distributed similarly, so their convergence rates are close. Besides, according to Figure 3, at iteration 8 the residuals of PCG with either preconditioner are almost 0, whereas the residual of CG is still greater than 10^{-4}, and CG does not stop until iteration 36. In Figure 5, the PCG algorithms stop at iteration 7, but CG does not stop until iteration 17. Therefore, PCG is very efficient.


5 Conclusion

Iterative methods are accurate and fast for solving large linear systems, and from the steepest descent method to the preconditioned conjugate gradient method the algorithms are successively improved. For Toeplitz systems, which have a wide range of applications in many scientific fields, preconditioning is a very effective way to speed up convergence, and we discussed two kinds of circulant matrices as preconditioners. The convergence rate of CG is linear, while that of PCG is superlinear. For the generating functions f_1, f_2 and f_4, when n is large (e.g. n = 1024) the number of PCG iterations is less than one tenth of that of CG. In general, the iteration counts of the two preconditioners are very close and do not increase with n; the efficiencies of T. Chan's and Strang's preconditioners are similar.


6 Application in feedforward deep networks

Linear systems are common in machine learning, and the idea of gradient descent can also be used to estimate the parameters of the linear model inside a neural network. Neural networks are widely used in machine learning, for example in image recognition and speech recognition. In this section, we discuss the multilayer perceptron (MLP), one of the most basic and important neural networks, where iterations of the kind used for solving linear systems can also be applied effectively. Let the activation function be the sigmoid function

(62) \sigma(x) = \frac{1}{1 + e^{-x}}

and the neuron pre-activation be

(63) a(x) = b + \sum_i w_i x_i

Gradient descent, as an iterative method, is applied to reduce the loss function.

6.1 Background

A multilayer perceptron consists of an input layer, hidden layers and an output layer. As long as the number of nodes is large enough, even a three-layer neural network can approximate very complex functions; for the same problem, compared with a shallow network, a deep network needs fewer nodes, so the complexity of the model is lower. In general, we take input and output values as the training set and use backpropagation with stochastic gradient descent to estimate the weights w. Figure 6 shows the structure of a multilayer perceptron.

Figure 6: Multilayer perceptron


6.2 Backpropagation algorithm

The backpropagation (BP) algorithm was first proposed by Werbos in 1974. We use BP to adjust the weights w in every iteration. The loss function is defined as

(64) E(w) = \frac{1}{2} \sum_{d \in D} \sum_{k \in \text{outputs}} (t_{kd} - o_{kd})^2

and, for a single sample,

(65) E_d(w) = \frac{1}{2} \sum_{k \in \text{outputs}} (t_k - o_k)^2

x_{ji} is the ith input of the jth neuron;
w_{ji} is the weight of x_{ji};
net_j = \sum_i w_{ji} x_{ji} is the weighted sum of the inputs of the jth neuron;
o_j is the output of the jth neuron;
t_j is the target output of the jth neuron;
σ is the sigmoid function;
Downstream(j) is the set of neurons whose inputs include the output of the jth neuron.

The update is taken along the negative gradient:

(66) \Delta w_{ji} = -\eta\, \frac{\partial E_d}{\partial w_{ji}}

According to the chain rule,

(67) \frac{\partial E_d}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j} \frac{\partial net_j}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j}\, x_{ji}

For the output layer, the procedure is the following.

Step 1: \frac{\partial E_d}{\partial net_j} = \frac{\partial E_d}{\partial o_j} \frac{\partial o_j}{\partial net_j}

Step 2: \frac{\partial E_d}{\partial o_j} = \frac{\partial}{\partial o_j} \frac{1}{2} \sum_{k \in \text{outputs}} (t_k - o_k)^2 = -(t_j - o_j)

Step 3: \frac{\partial o_j}{\partial net_j} = \frac{\partial \sigma(net_j)}{\partial net_j} = o_j (1 - o_j)

so that \Delta w_{ji} = -\eta \frac{\partial E_d}{\partial w_{ji}} = \eta (t_j - o_j)\, o_j (1 - o_j)\, x_{ji}. For a hidden layer, the procedure is as follows.

(68) \frac{\partial E_d}{\partial net_j} = \sum_{k \in Downstream(j)} \frac{\partial E_d}{\partial net_k} \frac{\partial net_k}{\partial o_j} \frac{\partial o_j}{\partial net_j} = \sum_{k \in Downstream(j)} \delta_k \frac{\partial net_k}{\partial o_j} \frac{\partial o_j}{\partial net_j} = \sum_{k \in Downstream(j)} \delta_k w_{kj} \frac{\partial o_j}{\partial net_j} = \sum_{k \in Downstream(j)} \delta_k w_{kj}\, o_j (1 - o_j)


So \Delta w_{ji} = \eta\, \delta_j x_{ji} with \delta_j = o_j(1 - o_j) \sum_{k \in Downstream(j)} \delta_k w_{kj}, and we update the weights as w_{ji} \leftarrow w_{ji} + \Delta w_{ji}. The BP algorithm has a high degree of self-learning and adaptivity, but it can only guarantee convergence to a local minimum, not a global one; and, similarly to the steepest descent method, its convergence rate is slow.
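As an illustration of these update rules, here is a minimal Python/NumPy sketch of one stochastic-gradient BP step for a network with a single hidden layer (the shapes and the omission of bias terms are simplifications of this sketch):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_step(W1, W2, x, t, eta):
    """One BP step: forward pass, deltas per (67)-(68), then
    w_ji <- w_ji + eta * delta_j * x_ji."""
    o1 = sigmoid(W1 @ x)              # hidden outputs o_j
    o2 = sigmoid(W2 @ o1)             # network outputs o_k
    d2 = (t - o2) * o2 * (1 - o2)     # output deltas
    d1 = o1 * (1 - o1) * (W2.T @ d2)  # hidden deltas (downstream sum)
    W2 += eta * np.outer(d2, o1)      # Delta w_kj = eta * d_k * o_j
    W1 += eta * np.outer(d1, x)       # Delta w_ji = eta * d_j * x_i
    return W1, W2
```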


References

[1] R. Chan and M. Ng, Conjugate Gradient Methods for Toeplitz Systems, SIAM Rev., Vol. 38 (1996), pp. 427–482.

[2] A. Greenbaum, Iterative Methods for Solving Linear Systems, Frontiers in Applied Mathematics, SIAM, Philadelphia, 1997.

[3] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning – Grundlagen, aktuelle Verfahren und Algorithmen, neue Forschungsansätze, 1st ed., Mitp Verlag, 2018.

[4] A. Greenbaum, Iterative Methods for Solving Linear Systems, Frontiers in Applied Mathematics, SIAM, Philadelphia, 1997.

[5] R. H. Chan and T. F. Chan, Circulant Preconditioners for Elliptic Problems, Numer. Linear Algebra Appl., Vol. 1 (1992), pp. 77–101.

[6] N. Levinson, The Wiener RMS (Root Mean Square) Error Criterion in Filter Design and Prediction, J. Math. Phys., Vol. 25 (1946), pp. 261–278.

[7] J. R. Bunch, Stability of Methods for Solving Toeplitz Systems of Equations, SIAM J. Sci. Stat. Comput., Vol. 6 (1985), pp. 349–364.

[8] W.-K. Ching, Iterative Methods for Queueing and Manufacturing Systems, Springer-Verlag, London, 2001.

[9] X.-Q. Jin, Developments and Applications of Block Toeplitz Iterative Solvers, Kluwer Academic Publishers, Dordrecht, The Netherlands, and Science Press, Beijing, China, 2002.

[10] G. Strang, A Proposal for Toeplitz Matrix Calculations, Stud. Appl. Math., Vol. 74 (1986), pp. 171–176.

[11] J. Olkin, Linear and Nonlinear Deconvolution Problems, Ph.D. thesis, Rice University, Houston, TX, 1986.

[12] W. F. Trench, An Algorithm for the Inversion of Finite Toeplitz Matrices, SIAM J. Appl. Math., Vol. 12 (1964), pp. 515–522.

[13] N. Levinson, The Wiener RMS (Root Mean Square) Error Criterion in Filter Design and Prediction, J. Math. Phys., Vol. 25 (1946), pp. 261–278.

[14] R. H. Chan, The Spectrum of a Family of Circulant Preconditioned Toeplitz Systems, SIAM J. Numer. Anal., Vol. 26 (1989), pp. 503–506.
