The numerical solution of banded linear systems by generalized factorization procedures

This item was submitted to Loughborough University's Institutional Repository by the/an author.

Additional Information:
• A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy of Loughborough University.

Metadata Record: https://dspace.lboro.ac.uk/2134/10378
This item is held in the Institutional Repository (https://dspace.lboro.ac.uk/) under the following Creative Commons Licence conditions.
For the full text of this licence, please go to: http://creativecommons.org/licenses/by-nc-nd/2.5/
LOUGHBOROUGH UNIVERSITY OF TECHNOLOGY
THE NUMERICAL SOLUTION OF BANDED LINEAR SYSTEMS
BY GENERALIZED FACTORIZATION PROCEDURES

A Doctoral Thesis
of the Loughborough University of Technology
November, 1981

Department of Computer Studies
© Shaker Elias Audish, 1981.
I declare that the following thesis is a record of
research work carried out by me, and that the thesis is
of my own composition. I also certify that neither this
thesis nor the original work contained therein has been
submitted to this or any other institution for a degree.
S.E. AUDISH.
I would like to express my sincere appreciation to Professor
D.J. Evans for his continuous encouragement, capable guidance and
supervision, and for helping me to alleviate the non-academic
difficulties I experienced throughout the period of this research.
I am very grateful to all the staff of the Computer Centre
and the Department of Computer Studies, and to friends at Loughborough,
in particular Mr. M. Shuker, for their help during the difficult stages
of this work.
My deep thanks go to Miss Judith M. Briers for the excellent
and careful typing of the thesis.
Last, but by no means least, I am warmly indebted to my parents,
sisters, brothers and other relatives for their patience, constant
encouragement and financial support, especially to my brother,
Abu-Ghazwan, whose personal efforts were undoubtedly instrumental in
enabling me to complete this thesis.
CONTENTS

2.1 Basic Concepts of Matrix Algebra
2.2 Direct and Iterative Methods for Solving Linear Systems of Equations
2.3 Contraction Mapping Theorem, Newton's Method
2.4 Eigenvalue Problem
2.5 Evaluation of the Square Root of a Square Matrix
2.6 Main Properties of Continued Fractions

Chapter 3: NUMERICAL SOLUTION OF BOUNDARY VALUE PROBLEMS
3.1 Different Numerical Approaches for Solving Boundary Value Problems
3.2 Finite-Difference Methods
3.3 Low-Order Discretization
3.4 High-Order Discretization
3.5 Finite Difference Methods for Partial Differential Equations

4.2.2 Iterative Method of Solution (GITRM)
4.2.3 Solution of the System QZ=~
4.2.4 A Polynomial Scheme for the Solution of the Modified Non-Linear System
4.2.5 Stability of the Method
4.2.6 Convergence of the Non-Linear System
4.2.7 Error Analysis for the Linear Systems Involved in the Algorithm FICM1
4.3.1 Algorithm FICM2
4.3.3 Determination of the Elements of the Matrix Factors
4.3.4 Solution of Symmetric Linear Systems
4.3.5 Rounding Error Analysis
4.3.6 Convergence Analysis of the Iterative Procedure Applied in the Algorithm FICM2
4.4.1 Algorithm FIRM1
4.4.2 Algorithmic Solution of a Coupled System
4.4.3 Determination of the Elements of the Rectangular Matrices U and L
4.5 Algorithm FICM5
4.6 Algorithm FIRM4

Chapter 5: NEW ALGORITHMIC METHODS FOR THE SOLUTION OF BLOCK MATRIX EQUATIONS
5.1 Algorithm FICM3
5.2 Algorithm FIRM2
5.5 Algorithms FICM6 and FIRM5

Chapter 6: APPLICATIONS TO ORDINARY DIFFERENTIAL EQUATIONS
6.1 Introduction
6.3 Non-Linear Equations Involved in FICM2 and FIRM1
6.5 Application of FIRM1 on Eigenproblems

Chapter 7: APPLICATIONS TO PARTIAL DIFFERENTIAL EQUATIONS
7.1 Introduction
7.3 Numerical Examples

Chapter 8: CONCLUSIVE REMARKS AND FURTHER INVESTIGATIONS

REFERENCES
ABSTRACT

The work of this thesis mainly presents new direct computational algorithmic solvers for real linear systems of equations (of wide banded matrices) derived from the application of well-known finite-difference techniques to boundary value problems involving ordinary and partial differential equations. These algorithms are not restricted to differential equations with specific boundary conditions or to two-point boundary value problems; a wider class of differential equations can also be treated. They are applicable to partial differential equations where a banded matrix is obtained by using a high-order approximation, such as a 9-point formula for the Laplace or Poisson equation, and the application extends to higher order equations such as the Biharmonic equation. Whilst one type of algorithm is suggested only for treating block linear systems, the other type is applicable to these as well as to the point form applications for which it was basically proposed. The two types are respectively named in the last chapters of this thesis as BLOCKSOLVERs and BANDSOLVERs.
The two SOLVERs are categorised to suit two common kinds of problems: (i) those subjected to periodic boundary conditions, and (ii) those subjected to non-periodic (more commonly known as Dirichlet, Neumann and Robin) conditions. Subsequently the factorisation procedure of the coefficient matrix takes place according to the type of condition to which the considered problem is subjected. Precisely, for a given matrix of order N with bandwidth 2r+1, r≥1 (N≥2r+1), with type (i) the matrix is factorised into two invertible, cyclic (or periodic) upper and lower matrices of semi-bandwidth r+1, whilst with type (ii) the obtained factor matrices are rectangular upper and lower of size (N×(N+r)) and ((N+r)×N) respectively, and of semi-bandwidth r+1.
As an alternative approach to the conventional methods (such as an LU-decomposition), the elements of the factor matrices are obtained by adopting some iterative schemes whose convergence properties are investigated. This is applicable to the BANDSOLVERs, whilst in the BLOCKSOLVERs the factorisation procedure involves computing a matrix square root.

However, consistent with the demands of the new era of technology, where high-speed computers are introduced and the revolution of micro-chips has begun, the investigation of reliable computational methods is extensively broadening. Moreover, the emergence of parallel processing machines has so far shown remarkable results in reducing the execution time for some particular numerical algorithms, although some reservations on storage demands still exist. There remain, nevertheless, problem fields which are still encountered by Numerical Analysts and other specialists for which no satisfactory solution procedures have been reached, nor are likely to be for the foreseeable future.
Basically, the development of computational methods takes place in one of two directions: to obtain the solution iteratively or directly, and consequently it has become customary in the literature to classify the conventional and new methods into these appropriate directions. It is known that no method has the merit of generality; rather, methods are valued or preferred for certain problems according to many vital factors associated with the use of the computer, such as the amount of storage required, computing time, levels of obtainable accuracy, etc., and the advantages and disadvantages of either approach may accordingly be recognised. The conventional types of both methods are discussed in Chapter 2. Here we present a brief indication of a few methods of both types.
Iterative methods have witnessed considerable advances in the last three decades or so; in particular we refer to the contributions of Frankel, Young and others in the 50's in generalising the successive over-relaxation procedure (point form), and, for the block case, as given by Varga (1962), who also contributed earlier a method of normalisation of block systems implying a considerable reduction in arithmetic operations (Cuthill and Varga (1959)). Other methods for sparse matrices may be found in Evans (1974). For certain cases, when the coefficient matrix of the considered linear system possesses special properties, some recent methods are suggested.
For example, when the matrix is symmetric and positive definite, Gustafsson (1979) presents the so-called Modified Incomplete Choleski method; prior to that, the "Incomplete LU-Decomposition" for a symmetric M-matrix was proposed by Meijerink and van der Vorst (1977). Both methods are based on the idea of splitting the matrix, and in the former the search for a suitable parameter to accelerate the iteration process is significant and important. Another approach deals with non-negative types of matrices, as in Neumann and Plemmons (1978), whose work includes a study of linear stationary iterative methods with non-negative matrices for solving singular and consistent linear systems.
In direct methods too, the development over a similar period has progressed extensively, on both the theoretical and practical sides. In the former, for instance, the error analysis for direct methods contributed by Wilkinson has enabled 'users' to predict or recognise the behaviour of a method, its stability and the bounds of the accuracy in the obtainable solution. On the other hand, fast methods have been suggested, such as that of Hockney (1965) involving Fast Fourier transforms, sparse factorisation by Evans (1971) and his work in recent years, other methods involving cyclic reduction as in Sweet (1974, 1977), and the spectral resolution methods introduced by Buzbee et al (1970). A comparison between point and block elimination schemes for solving block-tridiagonal systems, and the stability of the latter scheme, are given in Varah (1972); for the considered block matrix being symmetric and positive definite, it is indicated in the same reference that Gene Golub has used the Choleski decomposition for this particular case. A fast numerical solution of linear systems of equations leading to a block quindiagonal coefficient matrix, using a factoring and block elimination process, was proposed by Bauer and Reiss (1972). Another type of method, which deals with rather sparse matrices, is suggested by Henderson and Wassyng (1978), in which the method exploits the zero elements below the diagonal of the given coefficient matrix; however, the method shows superiority over Gaussian elimination only when the matrix is sparse strictly in the lower triangular part.
The presentation in this thesis is partitioned into seven chapters (excluding the current one) and may be outlined as follows.

In Chapter 2, the general mathematical background is included, covering the basic concepts, definitions and theorems, in addition to some conventional theoretical work such as direct and iterative methods, the contraction mapping theorem and Newton's method with a few of its variants. The chapter also covers some other topics which to a certain extent are directly related to the procedure of the new algorithms, such as the theory of periodic continued fractions, the computation of a matrix square root by Newton's method, the eigenvalue problem, etc.
As a matter of interest, for the field of application of some of the algorithms, the 2-point boundary value problem concerning non-linear (or linear) ordinary differential equations is chosen. In relation to this problem, the so-called iterative-deferred-correction technique is adopted; this technique is therefore covered considerably in Chapter 3. Also in this chapter we extend the idea of using symmetric finite-difference formulae of high order (called, in the appropriate chapter, high-order approximations) to the non-linear case; notably, the work carried out by Shoosmith (1973) on the linear case is referred to. In fact, the motivation for considering such techniques is to provide the generality of the new algorithms indicated earlier, that is, to deal with matrices of any bandwidth. Apart from a brief indication of the concepts involved in partial differential equations, the description of the discretisation schemes for specific continuous problems, via finite-difference approximations involving different computational molecules, is included in Chapter 3. In addition, because the chapter is devoted to the numerical solution of boundary value problems, an abbreviated description of some of the numerical approaches is given at the beginning, in particular finite element methods, followed by our main approach of interest in this work, the finite-difference method.
The newly suggested algorithms are presented in two chapters, 4 and 5. Chapter 4 includes the algorithms which are proposed for the pointwise problems (BANDSOLVERs). One of them is designed for the special case when the coefficient matrix of the considered linear system is periodic and possesses constant elements, while the remaining BANDSOLVERs deal with matrices of non-constant (generally non-symmetric) elements for both periodic and non-periodic cases. The extension of these algorithms to certain skew-type matrices is also included. Chapter 5 presents the BLOCKSOLVERs, which in fact are considered as an extension of the BANDSOLVERs for special cases only.
The results of the numerical experimental work corresponding to the algorithms of the last two chapters are given in Chapters 6 and 7 respectively. In these chapters some model problems for both ordinary and partial differential equations are introduced; in addition, there is a considerable discussion of the factorisation procedures applied to various common types of matrices, in which some related aspects are included, such as the rate of convergence of the involved iteration processes, etc. Eigenproblems are discussed in Sections 6.5 and 7.4. The tested examples as a whole may reflect for which types of matrices the new algorithms are both practical and applicable.

Finally, the main remarks in the light of this work are concluded in Chapter 8, with some recommendations for pursuing further investigations and extensions.
CHAPTER 2
MATHEMATICAL BACKGROUND
Numerical approaches such as finite-difference and finite element methods (see Chapter 3) are generally based on matrix algebra, by means of whose concepts the analysis of these methods or the solution process can be expressed in a suitable manner. In addition, in practice, the use of electronic computers enables matrix algebra to be an important tool in the application fields. In this presentation, we will emphasise the concepts which are (generally) associated with the subjects throughout this thesis.
2.1 BASIC CONCEPTS OF MATRIX ALGEBRA

The most important and well-known elementary concept is the matrix, which is defined to be a rectangular array of ordered numbers and is customarily denoted by a capital letter (our consideration is merely of real matrices). A matrix A is of size (m×n) if it has m rows and n columns, i.e.
$$A = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n}\\ a_{2,1} & a_{2,2} & \cdots & a_{2,n}\\ \vdots & & & \vdots\\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{pmatrix}$$
FIGURE 2.1.1: A is an m×n matrix
The matrix A is said to be square (or quadratic) when m=n, and hence A is of order n (or m). When m=1 we have a row vector, and for n=1 a column vector, usually denoted by small underlined letters. The transpose of a matrix $A=[a_{i,j}]$ is written as $A^T$ and is obtained by interchanging the rows and columns of A, i.e. the element $a_{i,j}$ of A becomes $a_{j,i}$ of $A^T$. If $A=A^T$, then A is said to be symmetric, and anti-symmetric if $A=-A^T$ (obviously the two concepts are applicable to square matrices only), i.e. $a_{i,j}=a_{j,i}$ and $a_{i,j}=-a_{j,i}$ respectively. (From now on any mentioned matrix is assumed square unless otherwise stated.) A matrix A which possesses an inverse, denoted by $A^{-1}$, is called a non-singular or invertible matrix (this property is equivalent to saying that A has linearly independent columns or rows); otherwise A is singular. On the other hand, if the determinant of A, which will be denoted by det(A), is zero, then A is singular, otherwise (i.e. det(A)≠0) $A^{-1}$ does exist, and hence we have
$$AA^{-1} = A^{-1}A = I .$$

Definition 2.1.1: (Pseudo-inverse, (Strang (1976)))
Given a rectangular (m×n) matrix A which may not be invertible, its "inverse", denoted by $A^+$, is expressed in the form
$$A^+ = (A^TA)^{-1}A^T ,$$
where $A^TA$ is a square matrix of order n which can be inverted unless it is singular.
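As a brief illustration of Definition 2.1.1, the following sketch (an addition here, not part of the thesis) forms the pseudo-inverse of a small rectangular matrix via the formula $A^+=(A^TA)^{-1}A^T$ and checks that $A^+A=I$; the matrix shown is an arbitrary example, and NumPy is assumed to be available.

```python
import numpy as np

# An arbitrary 4x2 example matrix with linearly independent columns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])

# Pseudo-inverse by the formula of Definition 2.1.1: A+ = (A^T A)^{-1} A^T.
A_plus = np.linalg.inv(A.T @ A) @ A.T

# For full column rank, A+ A reproduces the (2x2) identity matrix.
print(np.allclose(A_plus @ A, np.eye(2)))   # True
```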
In this thesis we shall be mainly concerned with banded matrices. Bandedness means that all elements beyond the bandwidth of the matrix are zero, i.e. for a banded matrix $A=[a_{i,j}]$ we have
$$a_{i,j} = 0 \quad \text{for } |i-j| > r ,$$
where $2r+1$ is the bandwidth of A.

If A has a large number of zeros, then it is said to be a sparse banded matrix. In this chapter we may illustrate some examples of matrices such that the zero elements will be represented by a single zero notation, "0", and the non-zero elements will be denoted by "X". Two types of bandwidth for matrix A are shown in Figure 2.1.2.
FIGURE 2.1.2: Banded matrices - pentadiagonal (r=2) and septadiagonal (r=3)
If one half-bandwidth of a matrix is entirely zero, then we have either an upper or lower triangular banded matrix. For example, $U=[u_{i,j}]$ is upper triangular if $u_{i,j}=0$ for $i>j$, and $L=[\ell_{i,j}]$ is lower triangular if $\ell_{i,j}=0$ for $i<j$; also we have a diagonal matrix $D=[d_{i,j}]$ if $d_{i,j}=0$ for all $i\ne j$ and non-zero for $d_{i,i}$ (Fig. 2.1.3).

FIGURE 2.1.3: Upper triangular, lower triangular and diagonal matrices
It may be important to indicate that we shall also consider banded matrices as presented in Fig. 2.1.4, consisting of bandwidth 2r+1 plus r(r+1)/2 extra elements in each of the upper right hand and lower left hand corners.

FIGURE 2.1.4: Banded matrices - Periodic type

Also we shall consider rectangular upper and lower banded matrices of semi-bandwidth r+1, as in Fig. 2.1.5,
FIGURE 2.1.5: Rectangular upper and lower banded matrices; U is of size n×(n+r) and L is (n+r)×n
We may classify the type of matrices shown in Fig. 2.1.4 as of periodic type and those in Fig. 2.1.5 as of non-periodic type (see later chapters).

Definition 2.1.2: (Augmented matrix)
Given a system of linear equations $A\underline{x}=\underline{z}$ of order n, the augmented matrix is $(A,\underline{z})$, which has the form
$$(A,\underline{z}) = \begin{pmatrix} a_{1,1} & \cdots & a_{1,n} & z_1\\ \vdots & & \vdots & \vdots\\ a_{n,1} & \cdots & a_{n,n} & z_n \end{pmatrix} .$$
FIGURE 2.1.6: The augmented matrix
In iterative solution procedures a measurement of convergence is usually required, as it is also for direct solution procedures where the effect of rounding errors is considered. In this respect it is customary to measure the 'size' or magnitude of vectors and matrices by norms.
Definition 2.1.3:
The norm of an n-dimensional vector $\underline{x}$, written as $\|\underline{x}\|$, is a scalar (or number) satisfying the following three axioms:
(1) $\|\underline{x}\| \ge 0$, and $\|\underline{x}\| = 0$ if and only if $\underline{x}$ is a null vector,
(2) $\|\beta\underline{x}\| = |\beta|\,\|\underline{x}\|$ for any scalar $\beta$ (homogeneity condition),
(3) $\|\underline{x}+\underline{y}\| \le \|\underline{x}\| + \|\underline{y}\|$ for vectors $\underline{x}$ and $\underline{y}$ (triangle inequality).

Definition 2.1.4:
If $\underline{x}=[x_i]$, i=1,2,...,n, then we have
(a) the infinity-norm $\|\underline{x}\|_\infty = \max_i |x_i|$ (uniform or Chebyshev norm),  (2.1.1)
(b) the one-norm $\|\underline{x}\|_1 = \sum_{i=1}^{n} |x_i|$,  (2.1.2)
(c) the two-norm $\|\underline{x}\|_2 = \big(\sum_{i=1}^{n} |x_i|^2\big)^{1/2}$ (Euclidean norm).  (2.1.3)

In fact, these norms are special cases of the general p-norm (or Hölder norm) given by
$$\|\underline{x}\|_p = \Big(\sum_{i=1}^{n} |x_i|^p\Big)^{1/p} ,  \qquad (2.1.4)$$
where setting p equal to ∞, 1 and 2 in (2.1.4) yields the norms (2.1.1) to (2.1.3) respectively.
Analogous to Definition 2.1.3, we proceed to present the definition of a matrix norm as well.

Definition 2.1.5:
A norm of a matrix A of order n, written as $\|A\|$, is a scalar such that the following four conditions are fulfilled:
(a) $\|A\| \ge 0$, and $\|A\| = 0$ if and only if A=0 (the null matrix),
(b) $\|\beta A\| = |\beta|\,\|A\|$ for any scalar $\beta$ (homogeneity condition),
(c) $\|A+B\| \le \|A\| + \|B\|$ for matrices A and B (triangle inequality), and
(d) $\|AB\| \le \|A\|\,\|B\|$ for matrices A and B (multiplicative triangular inequality). The postulation of (d) in a matrix norm arises from the occurrence of matrix products.

Below are frequently used matrix norms:
$$\|A\|_\infty = \max_i \sum_{j=1}^{n} |a_{i,j}| \quad (\text{the } \infty\text{-norm or maximum absolute row sum}),  \qquad (2.1.5)$$
$$\|A\|_1 = \max_j \sum_{i=1}^{n} |a_{i,j}| \quad (\text{the 1-norm or maximum absolute column sum}),  \qquad (2.1.6)$$
$$\|A\|_2 = \{\text{maximum eigenvalue of the product } A^TA\}^{1/2} \quad (\text{spectral or Hilbert norm}).  \qquad (2.1.7)$$

Another type of norm which is used is the Frobenius norm, denoted by $\|A\|_F$ and defined as follows:
$$\|A\|_F = \Big(\sum_{i,j} |a_{i,j}|^2\Big)^{1/2} .  \qquad (2.1.8)$$
In practice matrices frequently multiply vectors; it is therefore useful to apply the multiplicative triangular inequality (Definition 2.1.5) to the product of a matrix and a vector. Thus, for a product $A\underline{x}$ we have
$$\|A\underline{x}\| \le \|A\|\,\|\underline{x}\| .  \qquad (2.1.9)$$

Definition 2.1.6:
If matrix A and vector $\underline{x}$ have the norms $\|A\|$ and $\|\underline{x}\|$ respectively, then these two norms are said to be compatible provided that (2.1.9) is fulfilled.

Definition 2.1.7:
A subordinate or induced matrix norm $\|A\|$ is defined as follows:
$$\|A\| = \sup_{\underline{x}\ne 0} \frac{\|A\underline{x}\|}{\|\underline{x}\|} .  \qquad (2.1.10)$$
Sometimes (2.1.10) is written in an equivalent form, i.e.
$$\|A\| = \sup_{\|\underline{x}\|=1} \|A\underline{x}\| .$$

It can be shown that the matrix norms (2.1.5) to (2.1.7) are subordinate (i.e. they satisfy (2.1.10), and hence (2.1.9)) to the corresponding vector norms (2.1.1) to (2.1.3), whilst the Frobenius norm (2.1.8) is not subordinate to any vector norm (see Froberg (1974), Noble (1969), Conte and de Boor (1972), Broyden (1975)).
Definition 2.1.8:
A vector is said to be normalised if it is multiplied by a scalar in order to reduce the size of its components to numbers of value less than or equal to 1 without changing the direction of the vector.

Two common ways of normalising a vector $\underline{x}=[x_i]$, i=1,2,...,n, are by selecting a scalar $\beta$ such that either
(i) $\beta = \big(\sum_{i=1}^{n} x_i^2\big)^{-1/2}$, or
(ii) $\beta = 1/\max_i |x_i|$,
to obtain the normalised vector $\hat{\underline{x}} = \beta\underline{x}$. Notice that for (i) the relation $\hat{\underline{x}}^T\hat{\underline{x}} = 1$ holds.
Definition 2.1.9: (Permutation matrix)
A square matrix is called a permutation matrix if each of its rows contains only one non-zero element (which is unity), for example
$$P = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix} .$$
It can be shown that any permutation matrix P (say) is orthogonal (i.e. $P^T = P^{-1}$). Also, for any matrix A, the operations of pre-multiplication, i.e. PA, and of post-multiplication, AP, result in changing the order of the rows and columns respectively.
A matrix $A=[a_{i,j}]$ of order n is said to be diagonally dominant if
$$\sum_{\substack{j=1\\ j\ne i}}^{n} |a_{i,j}| \le |a_{i,i}| , \quad i=1,2,\ldots,n ,$$
and strictly diagonally dominant if
$$\sum_{\substack{j=1\\ j\ne i}}^{n} |a_{i,j}| < |a_{i,i}| , \quad i=1,2,\ldots,n .$$
A sequence of matrices $A^{(r)}$, r=1,2,..., of the same dimension converges to a finite limit A (say) if the following necessary and sufficient condition is fulfilled:
$$\lim_{r\to\infty} \|A^{(r)} - A\| = 0 .  \qquad (2.1.11)$$
In fact, the limit in (2.1.11) does exist if Cauchy's criterion holds, i.e. for any $\varepsilon>0$ there must be an integer N such that
$$\|A^{(r+s)} - A^{(r)}\| < \varepsilon \quad \text{for all } r>N \text{ and } s>0 .  \qquad (2.1.12)$$
Obviously (2.1.11) or (2.1.12) can be applied to vectors as well (see Demidovich and Maron (1976), Kolmogorov and Fomin (1970)).
Definition 2.1.11:
In general, if the sequence of matrix powers $\{A^{s}\}$, s=1,2,..., converges to a limit, then the matrix A is said to be a convergent matrix. Moreover, if $\lim_{s\to\infty} A^{s}$ is the zero matrix (null matrix), then A is said to be a zero-convergent matrix (Neumann and Plemmons (1978)).
Definition 2.1.12:
The convergence of a sequence of vectors $\{\underline{x}^{(s)}\}$ to a limit $\underline{x}^*$ (say) is said to be of order p if
$$\lim_{s\to\infty} \frac{\|\underline{x}^{(s+1)} - \underline{x}^*\|}{\|\underline{x}^{(s)} - \underline{x}^*\|^{p}} = k , \quad \text{where } k \text{ is a non-negative constant.}$$
Thus, for p=2 we have quadratic convergence, and for p=1 we have (i) linear convergence iff 0<k<1, and (ii) superlinear convergence iff k=0.
Remark 2.1.1:
If a non-singular matrix is upper (or unit upper) triangular, lower (or unit lower) triangular, Hermitian, or positive definite, then so is its inverse (Broyden (1975), page 39).
2.2 DIRECT AND ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS OF EQUATIONS

The task of solving a linear system of equations, which is usually expressed in matrix form, i.e.
$$\begin{pmatrix} a_{1,1} & \cdots & a_{1,n}\\ \vdots & & \vdots\\ a_{n,1} & \cdots & a_{n,n} \end{pmatrix} \begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix} = \begin{pmatrix} z_1\\ \vdots\\ z_n \end{pmatrix} ,  \qquad (2.2.1)$$
or in abbreviated form
$$A\underline{x} = \underline{z} ,  \qquad (2.2.2)$$
is still a major challenge in the solution of scientific problems. The system (2.2.1) is derived basically from linear problems, and also from non-linear problems, which are usually broken down into a sequence of steps involving linear equations; this is sometimes termed a linearization process and forms the basis of many numerical methods (e.g., see Chapter 3, or Section 2.3). As Scarborough (1955) points out, there is no single method which is best for any and all systems of equations that may arise. In other words, a certain method may achieve quite a satisfactory solution for (2.2.1) if it has a sparse matrix (with few non-zero elements), as in problems which arise from large order differential equations, but be unsatisfactory if it has a dense matrix (with few zero elements), as in statistical problems where the dimension is small.
The available approaches for solving (2.2.1) usually lie in the following categories.

Direct methods (e.g. Cramer's method, Gaussian elimination, the method of square root, etc.) are basically designed to achieve an exact solution for (2.2.1) after a fixed number of arithmetical steps. This is true theoretically, but unattainable in practice due to the limitation of computers (i.e. their mantissa has a limited number of digits), which eventually allows rounding errors to appear in the calculation; for example, the rational number 2/3 has to be represented in a terminated form (e.g. 0.66666 for five significant digits). This is actually one of the main drawbacks of direct methods. The accumulation of rounding errors has to be carefully considered in these methods because of the alteration of the matrix A in (2.2.2), which may destroy the initial property of the matrix (i.e. sparseness) and ultimately have a considerable effect on the solution. Nevertheless, most of the computer routines for solving (2.2.2) involve direct methods, since the total amount of computational labour can be determined in advance. For a given length of mantissa (i.e. number of digits) one may be able to predict the bounds of the rounding error and hence determine the range of reliability of the method. If A in (2.2.2) is dense, then the elimination methods are preferable (Jennings (1964)).
Iterative methods (such as the Jacobi, Gauss-Seidel and Successive Over-relaxation methods, etc.) are essentially based on generating a sequence of approximate solutions $\{\underline{x}^{(s)}\}$, s=0,1,..., for (2.2.2) in the hope that this sequence approaches the solution $A^{-1}\underline{z}$, provided that the inverse exists. Generally speaking, iterative methods are considered to be reliable approaches provided that convergence is assured; this is because (i) there is no inherent inaccuracy, (ii) they are self-correcting, (iii) the method is applicable to systems of any number of unknowns (Scarborough (1955)), and (iv) the matrix remains unaltered. The criticism of these methods is mainly based upon: (i) for certain systems of equations, i.e. ill-conditioned ones, one cannot predict how many steps the iteration process will require to satisfy the required tolerance, and (ii) unless the sufficient and necessary condition is satisfied, convergence cannot be guaranteed. Thus, when using iterative methods it is advisable (i) to reduce the error at each step of the iteration if possible, or to determine an asymptotic factor of error reduction that is less than one, and (ii) to provide an error bound on the solution vector after a finite number of iterations (Lieberstein (1968)).
We may demonstrate some of the conventional methods of both types.

(A) Direct Methods: these are frequently classified into three groups (Lieberstein (1968)):

(1) Use of Determinants (Cramer's Rule): this involves extreme computation. For example, to solve (2.2.2) with order 10 requires some 70 million multiplications (Kunz (1957)), and with order 50 the method requires about $10^{64}$ operations. The number of operations involved in this method is of order (n!) if the system is of order n (Froberg (1974)). What would be the case for a system consisting of several thousands of equations? No computer so far can provide enough storage and perform this large number of operations.
(2) Inversion of Matrices: This strategy involves computing the inverse of the matrix A in (2.2.2) explicitly, which necessitates the solution of n systems of linear equations, and hence the number of operations is proportional to $n^4$.
(3) Systematic Eliminations: These methods are superior to the previous methods. The most widely used method is Gaussian elimination, which involves a finite number of transformations (precisely one less than the size of the given system) that eliminate all coefficients of the matrix below the diagonal, so that we end up with an upper triangular matrix. Thus, for the system (2.2.1) we have after n-1 transformations (Ralston (1965)):
$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & \cdots & a_{1,n}\\ & a_{2,2}^{(1)} & a_{2,3}^{(1)} & \cdots & a_{2,n}^{(1)}\\ & & a_{3,3}^{(2)} & \cdots & a_{3,n}^{(2)}\\ & & & \ddots & \vdots\\ & & & & a_{n,n}^{(n-1)} \end{pmatrix} \begin{pmatrix} x_1\\ x_2\\ x_3\\ \vdots\\ x_n \end{pmatrix} = \begin{pmatrix} z_1\\ z_2^{(1)}\\ z_3^{(2)}\\ \vdots\\ z_n^{(n-1)} \end{pmatrix} ,  \qquad (2.2.3)$$
where
$$a_{i,j}^{(k)} = a_{i,j}^{(k-1)} - \frac{a_{i,k}^{(k-1)}}{a_{k,k}^{(k-1)}}\, a_{k,j}^{(k-1)} , \qquad z_i^{(k)} = z_i^{(k-1)} - \frac{a_{i,k}^{(k-1)}}{a_{k,k}^{(k-1)}}\, z_k^{(k-1)} , \qquad i,j = k+1,\ldots,n ,  \qquad (2.2.4)$$
with $a_{i,j}^{(0)} = a_{i,j}$ and $z_i^{(0)} = z_i$. The solution then follows by back substitution:
$$x_n = z_n^{(n-1)}/a_{n,n}^{(n-1)} , \qquad x_i = \Big[ z_i^{(i-1)} - \sum_{j=i+1}^{n} a_{i,j}^{(i-1)} x_j \Big] \Big/ a_{i,i}^{(i-1)} , \quad i = n-1, n-2, \ldots, 1 .$$
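To make the elimination formulas (2.2.3)-(2.2.4) concrete, here is a small illustrative sketch (an addition, not from the thesis) of Gaussian elimination without pivoting followed by back substitution, written in Python with NumPy assumed; it mirrors the updates of the a's and z's above and breaks down if a zero pivot is met (the pivoting refinements discussed later in this section would then be needed).

```python
import numpy as np

def gauss_solve(A, z):
    """Solve A x = z by Gaussian elimination (no pivoting) and back substitution."""
    A = A.astype(float).copy()
    z = z.astype(float).copy()
    n = len(z)
    # Forward elimination: the k-th step removes x_k from rows k+1..n (eq. 2.2.4).
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]          # multiplier m_{i,k}
            A[i, k:] -= m * A[k, k:]
            z[i] -= m * z[k]
    # Back substitution on the upper triangular system (2.2.3).
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[4.0, 3.0], [3.0, -2.0]])
z = np.array([10.0, 12.0])
print(gauss_solve(A, z))   # agrees with np.linalg.solve(A, z)
```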
A related scheme, the Gauss-Jordan method, eliminates not only the lower off-diagonal elements but also the upper off-diagonal elements as well. Therefore, the final stage of the transformation produces a matrix of non-zero elements solely on the diagonal, and hence the solution is obtained straightforwardly by dividing the components of the right hand side vector by the corresponding diagonal elements. In other words, there is no need for the back substitution stage as in the Gaussian elimination. Furthermore, Gaussian elimination can be shown to be superior to Gauss-Jordan, since the numbers of operations are proportional to $n^3/3$ and $n^3/2$ respectively, and for large n the latter requires 50 percent more operations than the former (Ralston (1965)).
LU-Decomposition (Triangular Factorisation)

Let the (n×n) matrices $M_k$, k=1,...,n-1, be defined as follows (see Ralston (1965), Goult et al (1974)):
$$M_k = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & & -m_{k+1,k} & \ddots & \\ & & -m_{n,k} & & 1 \end{pmatrix} , \qquad m_{i,k} = \frac{a_{i,k}^{(k-1)}}{a_{k,k}^{(k-1)}} , \quad i = k+1, \ldots, n ,$$
where the values $a_{i,k}^{(k-1)}$ and $a_{k,k}^{(k-1)}$ are obtained at the (k-1)th step of the transformation, as illustrated in (2.2.4). In fact, the $m_{i,k}$ are the multipliers of the kth step of the transformation for the Gaussian elimination method. Thus, the triangular matrix form (2.2.3) is equivalent to
$$MA\underline{x} = M\underline{z} ,  \qquad (2.2.5)$$
where
$$M = M_{n-1}M_{n-2}\cdots M_1 .  \qquad (2.2.6)$$
If we define U such that
$$U = MA ,  \qquad (2.2.7)$$
then (2.2.5) becomes
$$U\underline{x} = M\underline{z} .  \qquad (2.2.8)$$
Since the inverse of a lower triangular matrix is another lower triangular matrix (see Remark 2.1.1), we can write
$$M^{-1} = M_1^{-1}M_2^{-1}\cdots M_{n-1}^{-1} = L ,$$
and hence
$$A = M^{-1}U = LU .  \qquad (2.2.9)$$
The form (2.2.9) is termed the triangular or 'LU' decomposition. Subsequently, the solution of (2.2.2) by this algorithm follows from
$$LU\underline{x} = \underline{z}  \qquad (2.2.10)$$
via the introduction of an auxiliary vector $\underline{y}$ (say), such that the system (2.2.10) is split into two systems, i.e.
$$L\underline{y} = \underline{z} ,  \qquad (2.2.11a)$$
$$U\underline{x} = \underline{y} ,  \qquad (2.2.11b)$$
where L is a unit lower triangular matrix and U is an upper triangular matrix. It turns out that the solution vector $\underline{x}$ can be obtained from (2.2.11) through forward and backward substitutions (i.e. by (2.2.11a) and (2.2.11b) respectively), i.e.
$$y_1 = z_1 , \qquad y_i = z_i - \sum_{k=1}^{i-1} \ell_{i,k}\, y_k , \quad i = 2, \ldots, n ,$$
and
$$x_i = \Big( y_i - \sum_{j=i+1}^{n} u_{i,j}\, x_j \Big) \Big/ u_{i,i} , \quad i = n(-1)1 .$$
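The following short sketch (an illustration added here, not the thesis's own routine) carries out the forward and backward substitutions (2.2.11a)-(2.2.11b) for given triangular factors L and U, assuming L is unit lower triangular as in Doolittle's method; NumPy is assumed.

```python
import numpy as np

def lu_solve(L, U, z):
    """Solve L y = z then U x = y, with L unit lower triangular."""
    n = len(z)
    y = np.zeros(n)
    x = np.zeros(n)
    # Forward substitution (2.2.11a): y_i = z_i - sum_{k<i} l_ik * y_k.
    for i in range(n):
        y[i] = z[i] - L[i, :i] @ y[:i]
    # Backward substitution (2.2.11b): x_i = (y_i - sum_{j>i} u_ij * x_j) / u_ii.
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# Example: factors of A = [[4, 3], [3, -2]] with unit lower triangular L.
L = np.array([[1.0, 0.0], [0.75, 1.0]])
U = np.array([[4.0, 3.0], [0.0, -4.25]])
print(lu_solve(L, U, np.array([10.0, 12.0])))
```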
For sparse matrices of special form (tridiagonal, pentadiagonal, etc.) the factorisation (2.2.9) may be achieved by equating both sides, so that a 'general' recurrence relation can be formulated, mainly for programming purposes. Further, the intermediate vector $\underline{y}$ in (2.2.11) does not need to be computed explicitly; for example, the solution of the tridiagonal system
$$\begin{pmatrix} d_1 & a_1 & & & \\ c_2 & d_2 & a_2 & & \\ & \ddots & \ddots & \ddots & \\ & & c_{n-1} & d_{n-1} & a_{n-1}\\ & & & c_n & d_n \end{pmatrix} \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_{n-1}\\ x_n \end{pmatrix} = \begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_{n-1}\\ b_n \end{pmatrix}$$
can be expressed by the following recurrence relations (Varga (1962)):
$$\beta_1 = \frac{a_1}{d_1} , \qquad \beta_i = \frac{a_i}{d_i - c_i\beta_{i-1}} , \quad i = 2, 3, \ldots, n-1 ,  \qquad (2.2.12)$$
$$g_1 = \frac{b_1}{d_1} , \qquad g_i = \frac{b_i - c_i\, g_{i-1}}{d_i - c_i\beta_{i-1}} , \quad i = 2, 3, \ldots, n ,$$
and then
$$x_n = g_n , \qquad x_i = g_i - \beta_i\, x_{i+1} , \quad i = n-1, n-2, \ldots, 1 .$$
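As an illustration of (2.2.12), this sketch (added for clarity, not part of the thesis) solves a tridiagonal system with the recurrences above, storing the three diagonals as vectors c (sub), d (main) and a (super); NumPy is assumed, no pivoting is performed, and the sweep relies on the denominators $d_i - c_i\beta_{i-1}$ staying away from zero (guaranteed, for instance, for diagonally dominant systems).

```python
import numpy as np

def tridiag_solve(c, d, a, b):
    """Solve the tridiagonal system with sub-diagonal c (c[0] unused),
    diagonal d, super-diagonal a (a[n-1] unused) and right hand side b."""
    n = len(d)
    beta = np.zeros(n)
    g = np.zeros(n)
    beta[0] = a[0] / d[0]
    g[0] = b[0] / d[0]
    for i in range(1, n):
        denom = d[i] - c[i] * beta[i - 1]
        if i < n - 1:
            beta[i] = a[i] / denom
        g[i] = (b[i] - c[i] * g[i - 1]) / denom
    x = np.zeros(n)
    x[n - 1] = g[n - 1]
    for i in range(n - 2, -1, -1):       # back substitution x_i = g_i - beta_i x_{i+1}
        x[i] = g[i] - beta[i] * x[i + 1]
    return x

# Example: -x_{i-1} + 2 x_i - x_{i+1} = 1, a standard second-difference system.
n = 5
print(tridiag_solve(np.full(n, -1.0), np.full(n, 2.0), np.full(n, -1.0), np.ones(n)))
```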
In fact, (2.2.12) is an equivalent (nested) form of the Gaussian elimination process, for which the following theorem is valid.
Theorem 2.2.1:
A non-singular matrix A may be decomposed into the product LU (where L and U are lower and upper triangular matrices) if and only if every leading principal submatrix of A is non-singular.

Corollary 2.2.1:
If L is unit lower triangular then the decomposition is unique.

Proof:
Both Theorem 2.2.1 and Corollary 2.2.1 are given in Broyden (1975); see also Faddeeva (1959).

Corollary 2.2.2:
If U is a unit upper triangular matrix then the decomposition is unique.

Proof: Similar to Corollary 2.2.1.

The LU decomposition where Corollary 2.2.1 is valid is often called Doolittle's method, whilst if Corollary 2.2.2 is valid, it is called Crout's method (Goult et al (1974)).
If the matrix A in (2.2.2) is symmetric, then the decomposition (2.2.9) has a modified variant, an economised procedure as far as the computational work is concerned, called Choleski's method (or the square-root method), which can be outlined as follows.

Since A is a symmetric matrix, U can be replaced by $L^T$ and hence we have
$$A = LL^T ,  \qquad (2.2.13)$$
where $\ell_{i,j} = 0$ for $i<j$; then
$$\ell_{j,j} = \Big( a_{j,j} - \sum_{k=1}^{j-1} \ell_{j,k}^2 \Big)^{1/2} , \qquad \ell_{i,j} = \Big( a_{i,j} - \sum_{k=1}^{j-1} \ell_{i,k}\,\ell_{j,k} \Big) \Big/ \ell_{j,j} , \quad i = j+1, \ldots, n ,  \qquad (2.2.14)$$
for j=1,2,...,n, provided $\ell_{j,j} \ne 0$.

It is worthwhile to point out that if only the positive square roots in (2.2.14) are chosen, then (2.2.13) is a unique factorisation provided that the matrix A is real symmetric and positive definite. In actual fact, this latter property may place the Choleski scheme superior to other variants of the elimination methods (such as those mentioned above), in particular if double-precision arithmetic is used so that the square roots of (2.2.14) are evaluated as accurately as possible. The calculation of the square roots remains one of the main disadvantages of the Choleski method, but this may be alleviated by the decomposition $A = LDL^T$, where D is a diagonal matrix (Broyden (1975)).
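A compact sketch of the Choleski factorisation (2.2.13)-(2.2.14) follows; it is an added illustration rather than the thesis's own routine, it assumes NumPy and a real symmetric positive definite input, and it takes the positive square root at each diagonal step so that the factor is unique.

```python
import numpy as np

def choleski(A):
    """Return lower triangular L with A = L L^T (A symmetric positive definite)."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        # Diagonal entry: l_jj = sqrt(a_jj - sum_k l_jk^2), positive root chosen.
        L[j, j] = np.sqrt(A[j, j] - L[j, :j] @ L[j, :j])
        # Entries below the diagonal in column j (eq. 2.2.14).
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    return L

A = np.array([[4.0, 2.0, 0.0],
              [2.0, 5.0, 2.0],
              [0.0, 2.0, 5.0]])
L = choleski(A)
print(np.allclose(L @ L.T, A))   # True
```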
Practical Refinement of the Gaussian Elimination Process

If any of the diagonal elements of the matrix in (2.2.1) becomes zero during the elimination process, then the final upper triangular form will be unattainable, and hence the process will break down. Nevertheless, to overcome such a difficulty and to ensure the continuation of the elimination process we may apply one of two basic, well known pivoting schemes.

Definition 2.2.1:
Any of the diagonal elements in (2.2.3), i.e. $a_{k,k}^{(k-1)}$, k=1,...,n (where $a_{1,1}^{(0)}=a_{1,1}$), is termed the kth pivot. If it is zero, then it is called a zero pivot.

The two strategies of pivoting are mainly concerned with avoiding a zero pivot which may arise during the elimination process.
(1) Partial Pivoting

This strategy is based on selecting an element of largest value in modulus from the column of the reduced matrix as the pivotal element. Eventually, the appropriate rows of the augmented matrix $(A^{(k)}, \underline{z}^{(k)})$ must be interchanged.

The following example shows that the partial pivoting scheme can be inadequate (Williams (1973)):
$$4x + 3y = 10 , \qquad 3x - 2y = 12 .$$
Any row of the above equations can be multiplied by an arbitrary constant and hence change the pivotal row. This can be overcome by normalizing the rows and thereby making them comparable in one of the two following ways (see Def. 2.1.8):
(i) divide each row by its largest element in modulus, or
(ii) divide each row by the Euclidean norm of the row.
(2) Full (or Complete) Pivoting

The pivotal element is chosen to be the element of largest magnitude amongst the elements of the reduced matrix, regardless of the position of the element in this matrix.

Both ways of pivoting can be easily illustrated in Fig. 2.2.1, assuming the system (2.2.2) is of order 5.

FIGURE 2.2.1: The two pivoting strategies (X and the numbered entries denote non-zero elements of the reduced matrix)
(i) for partial pivoting, any of the elements in the box (the pivotal column of the reduced matrix) can be taken as the pivot. If '7' is of largest magnitude, then the 3rd and 5th rows of $(A^{(2)}, \underline{z}^{(2)})$ have to be interchanged.

(ii) for full pivoting, any of the 9 numbered elements can be taken as the pivot. If '5' is the element of largest magnitude, then the interchanging involves (1) the 3rd and 4th rows of $(A^{(2)}, \underline{z}^{(2)})$, followed by (2) the 3rd and 4th columns and the corresponding unknowns as well.
Full pivoting is considered to be a satisfactory strategy, but in practice it is time-consuming in execution. In addition, since the columns are included in the interchanging process, it may be difficult to preserve the triangular form of the matrix up to the final step. Also, searching for the pivot element may take a long time, especially for large systems of equations. Thus, partial pivoting is, generally, preferred in practice and for most problems, including the iterative improvement (or residual correction) procedure (see Goult et al (1974), Broyden (1975)).

The pivoting approach can also be applied to the LU decomposition. However, it can be shown to be unnecessary for positive definite matrices.
(B) Iterative Methods

These methods may be considered as an alternative to direct methods for solving linear systems with special properties, notably when the matrix is sparse (elimination methods may fill in the zero elements with non-zero values) and/or of large dimension.

Iterative methods for solving the linear system (2.2.2) are, generally, based on generating a sequence of approximate solution vectors $\{\underline{x}^{(s)}\}$, s=0,1,2,..., such that the approximate solution $\underline{x}^{(s+1)}$ is a linear function of $\underline{x}^{(s)}$. If this sequence does converge, then the iteration process can be interrupted whenever the desired accuracy in the solution is attained or an optimal number of significant figures is reached, depending on the word length of the computer. Furthermore, in contrast to direct methods, iterative methods do not suffer from inherent inaccuracy in the calculation, since they are always self-correcting procedures: rounding errors in the solution at the sth step do not affect the final solution, since the sth iterate can simply be regarded as an initial solution for the (s+1)th iteration. On the other hand, if we are seeking a solution of N-digit accuracy, and the generated sequence of solutions is carried out retaining M digits of accuracy (M>N), then for the computation to be worthwhile the loss of accuracy should not exceed (M-N) digits.
Three well-known iterative procedures are presented to solve (2.2.1):

(i) Jacobi (or Simultaneous Displacements) Method

In this method, the sequence of approximate solutions can be generated successively from the formula
$$x_i^{(s+1)} = \frac{1}{a_{i,i}} \Big( z_i - \sum_{\substack{j=1\\ j\ne i}}^{n} a_{i,j}\, x_j^{(s)} \Big) , \quad i = 1, 2, \ldots, n .  \qquad (2.2.15)$$

(ii) Gauss-Seidel (or Successive Displacements) Method

The iteration process described by this method has the form
$$x_i^{(s+1)} = \frac{1}{a_{i,i}} \Big( z_i - \sum_{j=1}^{i-1} a_{i,j}\, x_j^{(s+1)} - \sum_{j=i+1}^{n} a_{i,j}\, x_j^{(s)} \Big) , \quad i = 1, 2, \ldots, n .  \qquad (2.2.16)$$

(iii) Successive Over-Relaxation (S.O.R.) Method

This method, basically, accelerates the convergence of (2.2.16) by inserting an over-relaxation factor $\omega$ whose optimum value lies between 1 and 2 (sometimes the method is performed with under-relaxation for $0<\omega<1$). The computational form of this method (which was suggested by D.M. Young (1954)) is
$$x_i^{(s+1)} = (1-\omega)\, x_i^{(s)} + \frac{\omega}{a_{i,i}} \Big( z_i - \sum_{j=1}^{i-1} a_{i,j}\, x_j^{(s+1)} - \sum_{j=i+1}^{n} a_{i,j}\, x_j^{(s)} \Big) , \quad i = 1, 2, \ldots, n .  \qquad (2.2.17)$$
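For illustration (this sketch is an addition, not the thesis's code), the point iterations (2.2.15)-(2.2.17) can be written compactly as below; NumPy is assumed, the loop is stopped by a simple residual test, and convergence is only guaranteed under conditions such as those discussed later in this section (e.g. diagonal dominance, or spectral radius of the iteration matrix less than 1).

```python
import numpy as np

def sor(A, z, omega=1.0, tol=1e-10, max_iter=500):
    """Point S.O.R. iteration (2.2.17); omega = 1 reduces to Gauss-Seidel (2.2.16)."""
    n = len(z)
    x = np.zeros(n)
    for _ in range(max_iter):
        for i in range(n):
            sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (1 - omega) * x[i] + omega * (z[i] - sigma) / A[i, i]
        if np.linalg.norm(z - A @ x, np.inf) < tol:
            break
    return x

def jacobi(A, z, tol=1e-10, max_iter=500):
    """Point Jacobi iteration (2.2.15): all components updated simultaneously."""
    D = np.diag(A)
    x = np.zeros(len(z))
    for _ in range(max_iter):
        x_new = (z - (A @ x - D * x)) / D
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])  # diagonally dominant
z = np.array([2.0, 4.0, 10.0])
print(jacobi(A, z), sor(A, z, omega=1.2))
```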
A general matrix form for all three schemes can be given by
$$R\,\underline{x}^{(s+1)} = T\,\underline{x}^{(s)} + \underline{z} ,  \qquad (2.2.18)$$
where the matrix A of (2.2.2) has been split into matrices R and T such that A=R-T, and R is a non-singular matrix. Subsequently, if A is split into three component matrices L, D and U, i.e. A=-L+D-U, where D is diagonal, and L and U are lower and upper triangular matrices respectively, then on substitution of R and T in (2.2.18) as follows:
(a) R=D, T=L+U,
(b) R=D-L, T=U, and
(c) $R=\omega^{-1}D-L$, $T=U+(\omega^{-1}-1)D$,
we will obtain the equivalent matrix forms of the above mentioned iterative schemes (i), (ii) and (iii) respectively, i.e.
$$\underline{x}^{(s+1)} = D^{-1}(L+U)\,\underline{x}^{(s)} + D^{-1}\underline{z} \quad (\text{Simultaneous Displacement Method}) ,  \qquad (2.2.19)$$
$$\underline{x}^{(s+1)} = D^{-1}L\,\underline{x}^{(s+1)} + D^{-1}U\,\underline{x}^{(s)} + D^{-1}\underline{z} \quad (\text{Successive Displacement Method}) ,  \qquad (2.2.20)$$
$$\underline{x}^{(s+1)} = \omega D^{-1}L\,\underline{x}^{(s+1)} + \big[\omega D^{-1}U + (1-\omega)I\big]\underline{x}^{(s)} + \omega D^{-1}\underline{z} \quad (\text{S.O.R. Method}) .  \qquad (2.2.21)$$
All of these can be written in the general form
$$\underline{x}^{(s+1)} = M\,\underline{x}^{(s)} + \underline{d} ,  \qquad (2.2.22)$$
where the iteration (or correction) matrix $M=R^{-1}T$ and $\underline{d}=R^{-1}\underline{z}$ (with R, T and $\underline{z}$ as defined in (2.2.18)). In fact, the iterative scheme (2.2.22) represents the general form of a stationary iterative process, where the matrix M remains unchanged throughout the iteration (if the relaxation factor $\omega$ in (2.2.21) depends upon s, then (2.2.21) becomes a non-stationary iterative process).
The iteration process (2.2.22) converges to a fixed point $\underline{\alpha}$ ($\underline{\alpha}=A^{-1}\underline{z}$, the solution of (2.2.2)) for any initial solution $\underline{x}^{(0)}$ if the matrix M is zero-convergent. More precisely, since any matrix is zero-convergent if and only if its spectral radius is less than unity, i.e. $\rho(M)<1$ (Neumann and Plemmons (1978)), a sufficient and necessary condition of convergence for (2.2.22) can be given by the following theorem.

Theorem 2.2.2:
A necessary and sufficient condition for the iteration process (2.2.22) to converge for any initial vector $\underline{x}^{(0)}$ is that all the eigenvalues of M should be less than 1 in modulus.

Proof: (see Goult et al (1974))

A sufficient condition for convergence of (2.2.22) is merely that $\|M\|<1$, since $|\lambda| \le \|M\|$ (see Section 2.4), where $\lambda$ refers to the largest eigenvalue of the matrix M. This means that it may happen in some cases that $\|M\|>1$ but $|\lambda|<1$, which still guarantees the convergence of the iteration process according to the above stated Theorem 2.2.2. On the other hand, as confirmed by Theorem 2.2.2, the convergence of (2.2.22) is totally independent of the choice of the initial vector $\underline{x}^{(0)}$ as long as the matrix A in (2.2.2) is non-singular, whilst it is dependent on $\underline{x}^{(0)}$ if A is singular (Meyer and Plemmons (1977)).
The asymptotic rate of convergence of (2.2.22) is given by the value $-\log|\lambda|$ (Froberg (1974)), or the average rate of convergence for s iterations may be given by $-\frac{\ln\|M^{s}\|}{s}$ (Varga (1962)). So, for a given non-singular linear system we can determine the rate of convergence of the iterative algorithms (2.2.19), (2.2.20) and (2.2.21). Generally, the Gauss-Seidel scheme yields a better rate of convergence than the Jacobi. Moreover, it sometimes happens that the former might converge and the latter diverge, and vice versa (illustrated in Fox (1964), Faddeeva (1959)). For a linear system which possesses a diagonally dominant matrix both schemes may converge, since the sufficient condition (as given above) is fulfilled. Furthermore, the superiority of the Gauss-Seidel method over the Jacobi method is given by the following theorem:

Theorem 2.2.3:
If A in (2.2.2) is symmetric positive-definite, then the Gauss-Seidel method always converges, since all the eigenvalues of the iteration matrix (i.e. $M=(D-L)^{-1}U$) are less than 1 in modulus.
Proof: (see Lieberstein (1968), Fox (1964))

In the former reference (see page 62) there is given a counter-example which verifies the invalidity of Theorem 2.2.3 for the Jacobi scheme, i.e. although the matrix A is symmetric and positive definite, the iteration matrix $D^{-1}(L+U)$ may have eigenvalue(s) greater than 1 in modulus.

The convergence of the S.O.R. method depends upon the choice of the over-relaxation factor $\omega$, so as to ensure that the eigenvalues of the iteration matrix M are made as small as possible and are <1 in modulus. Unfortunately, there is no general method available to locate the optimum value of $\omega$ to satisfy this requirement. This is discussed in Varga (1962), Goult et al (1974), Froberg (1975) and Smith (1978), etc.
In general, the amount of computational work involved in any iterative method cannot always be easily determined in advance. However, it can be shown that an iterative process requires approximately $O(n^2)$ operations (multiplications and additions) per step for an (n×n) full/dense matrix. Thus, an iterative method would be superior to the conventional elimination methods if $s < n/3$ (where s refers to the number of steps after which the iterative process is interrupted). Obviously, for large sparse linear systems, the number of operations per step may be considerably less than $n^2$. Conrad and Wallach (1979) proved that the number of operations can be reduced considerably (25% or 50% for some iterative algorithms) by a so-called alternating technique. This involves the combination of any two explicit iterative procedures, such as (2.2.19) to (2.2.21), in an alternating fashion, i.e. each step of (2.2.18) being replaced by two 'half' iterations of the form
$$R_1\,\underline{x}^{(s+\frac{1}{2})} = \underline{z} + T_1\,\underline{x}^{(s)} , \qquad R_2\,\underline{x}^{(s+1)} = \underline{z} + T_2\,\underline{x}^{(s+\frac{1}{2})} , \qquad s = 0, 1, 2, \ldots ,$$
where $A = R_1 - T_1 = R_2 - T_2$ are the splittings corresponding to the two chosen procedures.
Finally, we outline the residual correction procedure, which aims to improve an unacceptable solution of (2.2.2). The residual vector $\underline{r}$ (say), which is $\underline{0}$ for the exact solution, satisfies
$$\underline{r}^{(i)} = \underline{z} - A\,\underline{x}^{(i)} , \quad i = 0, 1, 2, \ldots ,  \qquad (2.2.23)$$
where $\underline{x}^{(0)}$ is the initial solution vector and $\underline{r}^{(i)}$ is the residual vector at the ith iteration.

If the solution $\underline{x}^{(i)}$, $i\ge 0$, is not sufficiently accurate then one should proceed to compute the residual vector from (2.2.23) in double precision, and consequently solve the system (using single precision computation)
$$A\,\underline{\delta}^{(i)} = \underline{r}^{(i)}  \qquad (2.2.24)$$
for the correction vector $\underline{\delta}^{(i)}$, which can be added to $\underline{x}^{(i)}$ to produce the 'improved' solution $(\underline{x}^{(i)}+\underline{\delta}^{(i)})$. Further, if the factorization LU of A is computed initially and retained, then the work to carry out the iteration (2.2.23) is considerably reduced for i=1,2,..., via the process of solving $L\underline{y}^{(i)}=\underline{r}^{(i)}$ and $U\underline{\delta}^{(i)}=\underline{y}^{(i)}$. The iterative process can be terminated at a stage where no further improvement in the solution is obtained. Meanwhile, it is important to point out that the residual $\underline{r}^{(0)}=\underline{z}-A\underline{x}^{(0)}$ may be 'misleading', i.e. even if it has small components it does not necessarily indicate that the solution $\underline{x}^{(0)}$ is acceptable (Fox and Mayers (1977)), as for instance with ill-conditioned equations or cases where the exact solution $\underline{x}$ is small.
Thus, $\underline{r}^{(0)}$ and the remainder of the residuals $\underline{r}^{(1)}, \underline{r}^{(2)}, \ldots, \underline{r}^{(s)}$ must be calculated with double precision computation (Goult et al (1974)). Thus, the residual correction scheme is a reliable procedure which reduces the error in the approximate solution, in particular whenever $\underline{x}^{(0)}$ is reasonably close to $A^{-1}\underline{z}$.
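As an added sketch of the residual correction idea (not the thesis's own routine): the residual is accumulated in higher precision and the correction is obtained by re-using a retained factorization; here NumPy and SciPy are assumed, single precision is simulated with float32 for the factorization, and the residual is formed in float64.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def refine(A, z, sweeps=3):
    """Residual correction (iterative refinement): retained LU factors are re-used,
    and the residual (2.2.23) is formed in higher precision."""
    lu_piv = lu_factor(A.astype(np.float32))            # 'single precision' factors
    x = lu_solve(lu_piv, z.astype(np.float32)).astype(np.float64)
    for _ in range(sweeps):
        r = z - A @ x                                    # residual in double precision
        delta = lu_solve(lu_piv, r.astype(np.float32))   # correction from A delta = r (2.2.24)
        x = x + delta
    return x

rng = np.random.default_rng(0)
A = rng.random((6, 6)) + 6 * np.eye(6)                   # comfortably well conditioned example
z = rng.random(6)
print(np.linalg.norm(z - A @ refine(A, z)))
```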
2.3 CONTRACTION MAPPING THEOREM, NEWTON'S METHOD

Let there be given a non-linear system of n (≥1) equations, i.e.
$$x_i = \phi_i(x_1, x_2, \ldots, x_n) , \quad i = 1, 2, \ldots, n ,  \qquad (2.3.1)$$
where the functions $\phi_1, \phi_2, \ldots, \phi_n$ are defined and continuous in a given domain G, where $G \subset \mathbb{R}^n$ (the real n-dimensional space). If the values $x_1, x_2, \ldots, x_n \in G$, then the functions $\phi_i$, i=1,2,...,n, form a mapping of G onto itself. Moreover, we may rewrite (2.3.1) in the compact form
$$\underline{x} = \underline{\phi}(\underline{x}) ,  \qquad (2.3.2)$$
where $\underline{\phi} = (\phi_1, \phi_2, \ldots, \phi_n)^T$.

Definition 2.3.1:
The mapping $\underline{\phi}$ in (2.3.2) is termed a contraction mapping in the domain G if there exists a proper fraction L such that for any two vectors $\underline{x}_1, \underline{x}_2 \in G$ their images $\underline{\phi}(\underline{x}_1)$ and $\underline{\phi}(\underline{x}_2)$ fulfil the following condition:
$$\|\underline{\phi}(\underline{x}_1) - \underline{\phi}(\underline{x}_2)\| \le L\,\|\underline{x}_1 - \underline{x}_2\| , \quad 0 \le L < 1 ,  \qquad (2.3.3)$$
where L is independent of $\underline{x}_1$ and $\underline{x}_2$ and is commonly termed a Lipschitz constant. The inequality (2.3.3) is known as the Lipschitz (contraction) condition. It leads to an important theorem which is stated below.
Theorem 2.3.1:
Given a closed domain G, a constant L<1 and a function $\underline{\phi}$ which is a contraction mapping in G satisfying the Lipschitz condition (2.3.3), then the following statements hold true:
(i) for any choice of the initial solution $\underline{x}^{(0)} \in G$, the sequence of successive solutions $\{\underline{x}^{(r)}\}$, r≥0 and $\underline{x}^{(r)} \in G$, will converge to a limit $\underline{x}^*$ (say), and $\underline{x}^* \in G$ is the root of (2.3.2);
(ii) the non-linear vector equation (2.3.2) has a unique solution, i.e. $\underline{x}^*$ is a sole one;
(iii) the error bound
$$\|\underline{x}^* - \underline{x}^{(r)}\| \le \frac{L^r}{1-L}\, \|\underline{x}^{(1)} - \underline{x}^{(0)}\|  \qquad (2.3.4)$$
holds.
Proof:
Let s>r and write
$$\underline{x}^{(s)} - \underline{x}^{(r)} = (\underline{x}^{(r+1)} - \underline{x}^{(r)}) + (\underline{x}^{(r+2)} - \underline{x}^{(r+1)}) + \cdots + (\underline{x}^{(s)} - \underline{x}^{(s-1)}) .  \qquad (2.3.5)$$
We obtain the following by applying the triangle inequality given earlier in this chapter:
$$\|\underline{x}^{(s)} - \underline{x}^{(r)}\| \le \|\underline{x}^{(r+1)} - \underline{x}^{(r)}\| + \|\underline{x}^{(r+2)} - \underline{x}^{(r+1)}\| + \cdots + \|\underline{x}^{(s)} - \underline{x}^{(s-1)}\| .  \qquad (2.3.6)$$
Now, by virtue of the Lipschitz condition (2.3.3) we have
$$\|\underline{\phi}(\underline{x}^{(m)}) - \underline{\phi}(\underline{x}^{(m-1)})\| \le L\,\|\underline{x}^{(m)} - \underline{x}^{(m-1)}\| \le L^2\,\|\underline{x}^{(m-1)} - \underline{x}^{(m-2)}\| \le \cdots \le L^m\,\|\underline{x}^{(1)} - \underline{x}^{(0)}\| .  \qquad (2.3.7)$$
Applying the result (2.3.7) to (2.3.6), we obtain
$$\|\underline{x}^{(s)} - \underline{x}^{(r)}\| \le (L^r + L^{r+1} + \cdots + L^{s-1})\,\|\underline{x}^{(1)} - \underline{x}^{(0)}\| = \frac{L^r - L^s}{1-L}\,\|\underline{x}^{(1)} - \underline{x}^{(0)}\|  \qquad (2.3.8)$$
(by using the sum formula for a geometric series). Since L<1, then $L^r \to 0$ as $r\to\infty$. Thus, for any $\varepsilon$ the Cauchy criterion (2.1.12) can be applied to (2.3.8), and hence the sequence $\{\underline{x}^{(r)}\}$ has a limit (cf. (2.1.11)), i.e.
$$\underline{x}^* = \lim_{r\to\infty} \underline{x}^{(r)} ,$$
and $\underline{x}^* \in G$, which completes the proof of part (i).

To prove part (ii) we proceed as follows. Assume that $\underline{x}^{**} \in G$ is another solution of (2.3.2) different from $\underline{x}^*$; then we have
$$\|\underline{x}^* - \underline{x}^{**}\| = \|\underline{\phi}(\underline{x}^*) - \underline{\phi}(\underline{x}^{**})\| \le L\,\|\underline{x}^* - \underline{x}^{**}\| , \quad \text{i.e.} \quad (1-L)\,\|\underline{x}^* - \underline{x}^{**}\| \le 0 ;  \qquad (2.3.9)$$
since (1-L)>0, (2.3.9) cannot hold unless $\underline{x}^* = \underline{x}^{**}$.

By letting $s\to\infty$ in (2.3.8) we obtain the bound (2.3.4), and hence point (iii) of the theorem is complete.

(See Ortega and Rheinboldt (1970), Demidovich and Maron (1976), Henrici (1964).)
Theorem 2.3.1 implies that the Picard iteration process for (2.3.2), i.e.
$$\underline{x}^{(k+1)} = \underline{\phi}(\underline{x}^{(k)}) , \quad k = 0, 1, \ldots ,  \qquad (2.3.10)$$
converges to a unique fixed point $\underline{x}^* \in G \subseteq \mathbb{R}^n$ for any $\underline{x}^{(0)} \in G$. Furthermore, if $G = \mathbb{R}^n$, then we have global convergence for (2.3.10); meanwhile, Theorem 2.3.1, in this case, may be termed the global convergence theorem (Ortega and Rheinboldt (1970), page 385).
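By way of an added illustration of the Picard iteration (2.3.10) (the example system here is invented, not taken from the thesis), the mapping φ(x, y) = (0.5 cos y, 0.5 sin x) is a contraction on the whole plane, so the iteration converges to its unique fixed point from any starting vector; NumPy is assumed.

```python
import numpy as np

def picard(phi, x0, tol=1e-12, max_iter=200):
    """Fixed-point (Picard) iteration x^{k+1} = phi(x^k), eq. (2.3.10)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        x_new = phi(x)
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new, k + 1
        x = x_new
    return x, max_iter

# A contraction mapping on R^2: its Jacobian norm is at most 0.5 < 1 everywhere.
phi = lambda v: np.array([0.5 * np.cos(v[1]), 0.5 * np.sin(v[0])])

x_star, iters = picard(phi, [1.0, 1.0])
print(x_star, iters, np.allclose(x_star, phi(x_star)))   # fixed point reached
```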
We may introduce another theorem which is associated with the preceding theorem, concerning the convergence of the iteration (2.3.10) (see Dahlquist and Bjorck (1969), Demidovich and Maron (1976), Szidarovszky and Yakowitz (1978)):

Let the vector function $\underline{\phi}(\underline{x})$ be continuous, together with its derivative $\underline{\phi}'(\underline{x})$, in a bounded convex closed domain G and satisfy
$$\|\underline{\phi}'(\underline{x})\|_1 \le \mu < 1 , \quad \underline{x} \in G ,  \qquad (2.3.11)$$
where $\mu$ is a constant and
$$\|\underline{\phi}'(\underline{x})\|_1 = \max_{\underline{x}\in G} \Big( \max_j \sum_{i=1}^{n} \Big| \frac{\partial\phi_i(\underline{x})}{\partial x_j} \Big| \Big) .  \qquad (2.3.12)$$
If $\underline{x}^{(0)} \in G$ and all successive approximations $\underline{x}^{(1)}, \underline{x}^{(2)}, \ldots$ also lie in G, then the iteration process (2.3.10) converges to a unique solution of the equation (2.3.2).

(N.B. this theorem is also valid for $\|\cdot\|_\infty$ or $\|\cdot\|_F$ in addition to $\|\cdot\|_1$ as in (2.3.11) and (2.3.12), but not necessarily all of them at the same time.)
Corollary 2.3.1:
The Picard iteration process (2.3.10) converges to the unique solution of equation (2.3.2) if the inequalities
$$\sum_{i=1}^{n} \Big| \frac{\partial\phi_i(\underline{x})}{\partial x_j} \Big| \le \mu_j < 1 , \quad j = 1, 2, \ldots, n ,  \qquad (2.3.13)$$
hold in G.
We now consider a non-linear system of equations
$$f_i(x_1, x_2, \ldots, x_n) = 0 , \quad i = 1, 2, \ldots, n ,  \qquad (2.3.14)$$
or, compactly, (2.3.14) can be written in the conventional vector form
$$\underline{F}(\underline{x}) = \underline{0} ,  \qquad (2.3.15)$$
where $\underline{F} = (f_1, f_2, \ldots, f_n)^T$ and $\underline{0}$ is the null vector of the n-tuple.

Suppose that (2.3.15) has the exact solution $\underline{\alpha}$. By solving (2.3.15) iteratively (using the preceding iterative procedure) we may obtain an approximate solution $\underline{x}^{(s)}$ after s iterations; thus eventually we may write
$$\underline{x}^{(s)} = \underline{\alpha} + \underline{\varepsilon}^{(s)} ,  \qquad (2.3.16)$$
where $\underline{\varepsilon}^{(s)}$ represents the error vector of the root.

Since $\underline{\alpha}$ is the exact solution, it is trivial to write
$$\underline{F}(\underline{\alpha}) = \underline{0} .  \qquad (2.3.17)$$
By Taylor's expansion, (2.3.17) yields the following result:
$$\underline{0} = \underline{F}(\underline{x}^{(s)} - \underline{\varepsilon}^{(s)}) = \underline{F}(\underline{x}^{(s)}) - \Big[ \frac{\partial\underline{F}}{\partial x_1}(\underline{x}^{(s)}), \frac{\partial\underline{F}}{\partial x_2}(\underline{x}^{(s)}), \ldots, \frac{\partial\underline{F}}{\partial x_n}(\underline{x}^{(s)}) \Big]\, \underline{\varepsilon}^{(s)} + O(\varepsilon_1, \ldots, \varepsilon_n) ,  \qquad (2.3.18)$$
where $O(\varepsilon_1, \ldots, \varepsilon_n)$ represents the higher order terms in the error values $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ (of order greater than 1). By suppressing this term in (2.3.18) we obtain
$$\underline{F}(\underline{x}^{(s)}) = J(\underline{x}^{(s)})\, \underline{\varepsilon}^{(s)} ,  \qquad (2.3.19)$$
where $J(\underline{x})$ is the Jacobian matrix which involves the derivatives of $f_1, f_2, \ldots, f_n$ with respect to $x_1, x_2, \ldots, x_n$, i.e.
$$J(\underline{x}) = \Big[ \frac{\partial f_i}{\partial x_j} \Big] , \quad i, j = 1, 2, \ldots, n .$$
Assume $J(\underline{x})$ is a non-singular matrix; thus we have from (2.3.19)
$$\underline{\varepsilon}^{(s)} = [J(\underline{x}^{(s)})]^{-1}\, \underline{F}(\underline{x}^{(s)}) .  \qquad (2.3.20)$$
Taking $\varepsilon_i^{(s)} = -(x_i^{(s+1)} - x_i^{(s)})$, i=1,2,...,n, and substituting in (2.3.20), we obtain the so-called generalized Newton method, i.e.
$$\underline{x}^{(s+1)} = \underline{x}^{(s)} - [J(\underline{x}^{(s)})]^{-1}\, \underline{F}(\underline{x}^{(s)}) , \quad s = 0, 1, \ldots ,  \qquad (2.3.21)$$
where $\underline{x}^{(0)}$ refers to the initial solution, which is often recommended to be taken as close as possible to the desired exact solution.

It is known that (2.3.21) is impractical for implementation purposes; therefore it is usually converted to the equivalent form
$$J(\underline{x}^{(s)})\, \Delta\underline{x}^{(s)} = -\underline{F}(\underline{x}^{(s)}) ,  \qquad (2.3.22)$$
which can be solved for the correction $\Delta\underline{x}^{(s)}$, to be added to $\underline{x}^{(s)}$ to produce the new approximation $\underline{x}^{(s+1)}$.
The modified form of the Newton process is to approximate $J(\underline{x}^{(s)})$ by $J(\underline{x}^{(0)})$; then (2.3.21) becomes
$$\underline{x}^{(s+1)} = \underline{x}^{(s)} - [J(\underline{x}^{(0)})]^{-1}\, \underline{F}(\underline{x}^{(s)}) , \quad s = 0, 1, \ldots .  \qquad (2.3.23)$$
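The following sketch (an added illustration, assuming the Jacobian is supplied analytically; the example system is invented here) implements Newton's method in the practical form (2.3.22), solving a linear system for the correction instead of forming the inverse; NumPy is assumed.

```python
import numpy as np

def newton(F, J, x0, tol=1e-12, max_iter=50):
    """Generalized Newton method: solve J(x) dx = -F(x), eq. (2.3.22), then update."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(J(x), -F(x))   # correction from the linear system (2.3.22)
        x = x + dx
        if np.linalg.norm(dx, np.inf) < tol:
            break
    return x

# Example system: x^2 + y^2 - 4 = 0,  x*y - 1 = 0.
F = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0]*v[1] - 1.0])
J = lambda v: np.array([[2*v[0], 2*v[1]], [v[1], v[0]]])

root = newton(F, J, [2.0, 0.5])
print(root, F(root))   # residual is (numerically) zero
```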
A geometric illustration of both processes for a single variable is given in Fig. 2.3.1.

FIGURE 2.3.1: (a) Newton's method; (b) Modified (simplified) Newton method
We may deduce from (2.3.21) that each step of the iteration process requires the evaluation of:
(i) the n components of $\underline{F}(\underline{x})$, i.e. $f_k(x_1, \ldots, x_n)$, k=1,2,...,n,
(ii) the $n^2$ elements of the Jacobian matrix, i.e. $\frac{\partial f_i}{\partial x_j}(x_1, \ldots, x_n)$, i,j=1,2,...,n, and
(iii) the solution of the linear system (2.3.22) by a suitable method (see the previous section).

One of the procedures to economise on the amount of work is to avoid computing the Jacobian at every step; instead we either (1) use the modified Newton process (2.3.23), or (2) evaluate the Jacobian once every several steps. Both cases, however, may depend upon the initial guess of the solution vector.
Generally speaking, Newton's method is still an attractive method from the theoretical viewpoint; this is mainly due to its quadratic convergence property, where the error vectors in two successive steps of the iteration are associated by the relation
$$\|\underline{e}^{(s+1)}\| \le K\,\|\underline{e}^{(s)}\|^{2} , \quad K \text{ a constant} ,  \qquad (2.3.24)$$
where $\underline{e}^{(j)} = \underline{x}^{(j)} - \underline{\alpha}$ and $\underline{\alpha}$ is the exact solution. Relation (2.3.24) is judged to be valid as long as the initial solution vector is sufficiently close to the exact solution.

Convergence of the Newton process and its sufficiency conditions have been studied and formulated by Kantorovich (see Henrici (1962), Demidovich and Maron (1976), Brown (1962)). It has also been discussed by Ortega and Rheinboldt (1970) and Ostrowski (1966).
In practice, Newton's method, unfortunately, may not be considered an efficient and attractive computational procedure, in particular for large systems of non-linear equations where the order may exceed several thousands (as in non-linear partial differential equations). The main concern in this respect is the loss of accuracy during the solution of the linear system (step (iii) above) by direct methods, and the loss of both practical and theoretical efficiency in solving the linear system by iteration (Lieberstein (1968)). In addition, the amount of computational effort required by steps (i) and (ii) above is expensive and may be too difficult (unless the desired derivatives are in a simple form).

However, due to the extensive investigations which have been reported in this respect, some modifications of the Newton process have been proposed (see Ortega and Rheinboldt (1970)). Three variants will now be introduced.
(1) Discretized Newton Iteration

In this method (2.3.21) is replaced by an iteration in which the derivatives are approximated by differences. By way of a simple illustration we choose a single variable example,
$$x^{(s+1)} = x^{(s)} - \frac{f(x^{(s)})}{\big[f(x^{(s)}+\Delta x)-f(x^{(s)})\big]/\Delta x} ,  \qquad (2.3.25)$$
where the derivative $\frac{df}{dx}$ is replaced by its approximation, i.e.
$$\frac{df}{dx} \approx \frac{f(x+\Delta x) - f(x)}{\Delta x} .$$

(2) By inserting a damping factor $\omega$, such that the iteration process has the form
$$\underline{x}^{(s+1)} = \underline{x}^{(s)} - \omega\,[J(\underline{x}^{(s)})]^{-1}\,\underline{F}(\underline{x}^{(s)}) ,  \qquad (2.3.26)$$
with the condition
$$\|\underline{F}(\underline{x}^{(s+1)})\| < \|\underline{F}(\underline{x}^{(s)})\|  \qquad (2.3.27)$$
to be fulfilled at each step. Usually $\omega$ is less than or equal to unity (Hall and Watt (1976)).

(3) By shifting the origin of the Jacobian matrix. This method involves adding the diagonal matrix $\lambda I$ to the matrix J; thus (2.3.21) now becomes
$$\underline{x}^{(s+1)} = \underline{x}^{(s)} - [J(\underline{x}^{(s)}) + \lambda I]^{-1}\,\underline{F}(\underline{x}^{(s)}) ,  \qquad (2.3.28)$$
where the factor $\lambda$ can be chosen to ensure the validity of (2.3.27).

The modifications (2.3.25), (2.3.26) and (2.3.28) may have the property of superlinear convergence under certain conditions, or higher-order convergence under others (Ortega and Rheinboldt (1970)).
2.4 EIGENVALUE PROBLEM
Solving a linear system of equations, such as (2.2.2) , has already
been discussed in Section 2.2. The investigation of the dynamic behaviour ,
(i.e. the stability) of such linear systems (which arise in many physical
problems, e.g. in electrical or mechanical oscillations) can be based on
scalar values called the eigenvaZues. For example, for a vibration
problem the eigenva1ues give the natural frequencies of the system.
These are especially important because, if external loads are applied at
or near these frequencies, resonance will cause an amplification of motion
making failure more likely.
For a square matrix A of order n, the eigenvalues are defined by the equation
   Ax = λx ,                                                                     (2.4.1)
where λ is known as the eigenvalue (latent root, characteristic number or proper number) of A and x its corresponding eigenvector. The n values of λ represent the roots of the polynomial which can be expanded from the determinantal equation,
   P(λ) = det(A - λI) = 0 .                                                      (2.4.2)
In fact the matrix A itself also satisfies (2.4.2), i.e. P(A) = 0.
This is given by the following theorem:
Theorem 2.4.1: (Cay1ey-Hamilton theorem)
Any square matrix A is a root of its characteristic equation. If
   P(λ) = λⁿ + c_1 λ^(n-1) + ... + c_n = det(λI - A) ,
then
   P(A) = Aⁿ + c_1 A^(n-1) + ... + c_n I = 0 .
Proof: (see Faddeeva (1959), Demidovich and Maron (1976)).
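A quick numerical check of the theorem (not part of the thesis; the 3×3 test matrix is arbitrary) can be made in Python with numpy, which returns the coefficients of the characteristic polynomial of a matrix via np.poly:

```python
import numpy as np

# Hypothetical 3x3 test matrix
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
c = np.poly(A)                      # coefficients [1, c1, c2, c3] of det(lambda*I - A)
# P(A) = A^n + c1*A^(n-1) + ... + cn*I
P_A = sum(ck * np.linalg.matrix_power(A, len(c) - 1 - k) for k, ck in enumerate(c))
print(np.allclose(P_A, 0))          # True: A satisfies its own characteristic equation
```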
The problem in (2.4.1) is called a standard eigenproblem; it is an eigenvalue problem if only the eigenvalues are required to be determined, and an eigenvector problem if the corresponding eigenvectors are required as well. These may be obtained from the homogeneous equation
   (A - λI)x = 0 .
Whenever the characteristic equation (2.4.2) has simple zeros, i.e. the matrix A has distinct eigenvalues, each of them possessing a unique corresponding eigenvector, and consequently those eigenvectors are linearly independent, the matrix is then called non-defective (Goult et al (1974), page 9, Ralston (1965), page 470). Otherwise, if there exists λ_1 = λ_2 = ... = λ_k ≠ λ_j, 1 ≤ k < j ≤ n, then the number of the corresponding eigenvectors will be less than or equal to k and hence the whole set of eigenvectors of A may fail to form a basis of the space, since their number is less than the order of the matrix (in this case the matrix is called a defective matrix).
Practically, (2.4.2) is not used to determine the eigenvalue(s) of a matrix unless it is of very low order. Before referring to an alternative strategy we introduce the main definitions and theorems that are relevant to this thesis.
Definition 2.4.1:
A real matrix A of order n is said to be:
(1) positive definite if x^T A x > 0,
(2) positive semi-definite if x^T A x ≥ 0,
for all non-null, real vectors x.
For a rectangular matrix A of order m×n (m ≥ n) with linearly independent columns, the product A^T A is symmetric and positive definite (Broyden (1975), page 34).
Moreover, it can be shown that a real matrix A is positive definite if and only if it is symmetric and all its eigenvalues are positive, positive semi-definite if they are greater than or equal to zero, and indefinite if its eigenvalues are of mixed sign (see Noble (1969)).
Definition 2.4.2:
The nth order matrices A and B are said to be similar if there is a non-singular matrix P such that P^(-1) A P = B. Matrix B is said to be obtained from matrix A by a similarity transformation, or by an orthogonal transformation if P is an orthogonal matrix (i.e. if P^T = P^(-1)).
Then both the matrices A and B have the same eigenvalues and their eigenvectors are associated by the relation Py = x, where x and y refer to the eigenvectors of A and B respectively.
The last definition is often exploited whenever the standard eigenproblem (2.4.1) is difficult to deal with; thus, by use of a similarity transformation, the standard problem can be related to the so-called generalised eigenproblem, i.e.
   Ax = λBx ,  or  B^(-1) A x = λx .
Theorem 2.4.2: (Gerschgorin or Brauer's theorem)
If A = [a_ij] is any matrix of order n, then all the eigenvalues of A lie within the union of the circles
   |λ - a_ii| ≤ Σ_{j=1, j≠i}^n |a_ij| ,  i = 1(1)n.                              (2.4.3)
Since the transposed matrix A^T has the same eigenvalues as A, the result of the above theorem applied to A^T yields (Froberg (1974))
   |λ - a_jj| ≤ Σ_{i=1, i≠j}^n |a_ij| ,  j = 1(1)n.                              (2.4.4)
Using result (2.1.1) the inequalities (2.4.3) and (2.4.4) can be written as
   |λ| ≤ Σ_{j=1}^n |a_ij| ,  i = 1(1)n,
   |λ| ≤ Σ_{i=1}^n |a_ij| ,  j = 1(1)n.
Hence an estimate of λ can be given by the results
   |λ| ≤ max_i Σ_{j=1}^n |a_ij| = ||A||_∞ ,                                      (2.4.5)
   |λ| ≤ max_j Σ_{i=1}^n |a_ij| = ||A||_1 .                                      (2.4.6)
If ρ(A) is defined such that ρ(A) = max_i |λ_i|, then the estimate of the spectral radius of A is bounded by the ∞-norm or the 1-norm of A. In fact, although both norms can be computed easily in practice, theoretically it can be shown that ρ(A) is bounded by any norm of A, i.e.
   ρ(A) ≤ ||A|| .                                                                (2.4.7)
This result follows from (2.4.1), i.e. |λ|·||x|| = ||λx|| = ||Ax|| ≤ ||A||·||x||, or |λ| ≤ ||A||, provided x is a non-null vector.
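The bounds (2.4.5) and (2.4.6) are trivially cheap to compute; a minimal sketch (the function name and the tridiagonal test matrix are illustrative choices only) is:

```python
import numpy as np

def spectral_radius_bound(A):
    """Bound on rho(A) from (2.4.5)-(2.4.7): the smaller of the infinity- and 1-norms."""
    B = np.abs(np.asarray(A, dtype=float))
    inf_norm = B.sum(axis=1).max()   # max row sum    = ||A||_inf
    one_norm = B.sum(axis=0).max()   # max column sum = ||A||_1
    return min(inf_norm, one_norm)

# Hypothetical tridiagonal test matrix
A = np.diag([4.0]*5) + np.diag([-1.0]*4, 1) + np.diag([-1.0]*4, -1)
print(spectral_radius_bound(A))              # 6.0
print(max(abs(np.linalg.eigvals(A))))        # the true rho(A), which is indeed below 6
```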
Determination of the Eigenvalues
In this respect two fundamental approaches are normally adopted: (i) if there exist two (unequal) eigenvalues whose ratio is less than unity in modulus, then this ratio may be made small if it is raised to a suitably high power; methods based on this approach are often used to calculate one eigenvalue of the matrix, examples of such strategies being the Power method, inverse iteration, etc.; (ii) to perform a similarity transformation (which is often an orthogonal transformation) so that the matrix can be reduced to either diagonal, tridiagonal or triangular form, where the eigenvalues appear on the principal diagonal or as a recursive Sturm sequence. Methods based on this technique give all the eigenvalues; such methods are Jacobi, Givens, Householder, the QR method, etc. However, we are interested only in methods of the first type, thus we briefly demonstrate the following methods.
(a) The Power Method
Let the matrix A of order n possess the eigenvalues λ_1, λ_2, ..., λ_n such that there exists one of them which has the largest value in modulus, i.e.
   |λ_1| > |λ_2| ≥ |λ_3| ≥ ... ≥ |λ_n| .
Let x_1, x_2, ..., x_n be the corresponding eigenvectors of the eigenvalues λ_i such that their linear combination can be expressed as a vector y, i.e.
   y = Σ_{i=1}^n c_i x_i ,                                                       (2.4.8)
where c_i, i = 1(1)n, are constant coefficients.
For any eigenvalue λ_i we have from (2.4.1)
   A x_i = λ_i x_i ,  1 ≤ i ≤ n .                                                (2.4.9)
Now, operating on y in (2.4.8) by A we obtain
   Ay = Σ_{i=1}^n c_i A x_i = Σ_{i=1}^n c_i λ_i x_i = λ_1 {c_1 x_1 + Σ_{i=2}^n c_i (λ_i/λ_1) x_i} ,   (2.4.10)
or, in iterative form after s steps, (2.4.10) may be written as
   y^(s) = A^s y = λ_1^s {c_1 x_1 + Σ_{i=2}^n c_i (λ_i/λ_1)^s x_i} .             (2.4.11)
Since |λ_i/λ_1| < 1, i = 2,...,n, by the initial assumption, the second term in the braces of (2.4.11) tends to zero for sufficiently large s. Subsequently, the vector y^(s) becomes a scalar multiple of x_1 and the ratio between the kth components, 1 ≤ k ≤ n, of y^(s+1) and y^(s) tends to λ_1, i.e.
   lim_{s→∞} y_k^(s+1) / y_k^(s) = λ_1 .
The practical features of the algorithm can be summarised as follows. Given a vector y^(s), the iteration process involves:
Step 1:  form v^(s+1) = A y^(s);
Step 2:  choose α^(s+1) = the element of largest modulus amongst the components of v^(s+1);
Step 3:  normalise, y^(s+1) = v^(s+1) / α^(s+1);
Step 4:  if y^(s+1) and y^(s) are sufficiently close, then halt the procedure (α^(s+1) is the estimate of λ_1), otherwise repeat from Step 1.
The rate of convergence depends upon the ratio |λ_2/λ_1| (where λ_2 is assumed to be the sub-dominant eigenvalue, i.e. |λ_2| = max_{2≤i≤n} |λ_i|) being very small. Obviously, the smaller the value of this ratio, the faster the convergence.
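A minimal implementation of Steps 1-4, written here in Python with an arbitrary symmetric test matrix (the function name and tolerances are illustrative choices, not part of the thesis), might read:

```python
import numpy as np

def power_method(A, y0, tol=1e-10, max_steps=500):
    """Power method (Steps 1-4 above): estimates the dominant eigenvalue
    lambda_1 and its eigenvector."""
    y = np.asarray(y0, dtype=float)
    alpha = 0.0
    for _ in range(max_steps):
        v = A @ y                              # Step 1
        alpha = v[np.argmax(np.abs(v))]        # Step 2: element of largest modulus
        y_new = v / alpha                      # Step 3: normalise
        if np.linalg.norm(y_new - y) < tol:    # Step 4: test for convergence
            return alpha, y_new
        y = y_new
    return alpha, y

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, x = power_method(A, np.ones(3))
print(lam)   # approximates the dominant eigenvalue lambda_1
```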
(b) The Inverse Power Method
Any non-singular matrix A and its inverse A^(-1) have the same eigenvectors but reciprocal eigenvalues, as can be noticed from (2.4.1) and the equation
   A^(-1) x = (1/λ) x .                                                          (2.4.12)
Therefore, the smallest eigenvalue of A can be determined by obtaining the largest eigenvalue of A^(-1). Furthermore, it is unnecessary to compute A^(-1) explicitly, since the iteration procedure can be carried out as follows.
At iteration s, we compute
Step 1:  v^(s+1) = A^(-1) y^(s), which can be written as
   A v^(s+1) = y^(s) .                                                           (2.4.13)
The system (2.4.13) can be solved by a suitable method (such as those discussed earlier or the ones proposed in this thesis). For example, if the LU decomposition process is used initially, then (2.4.13) will be solved cheaply in each successive iteration.
Further, it can be shown that, for any number p, the eigenvectors of the matrix A - pI coincide with those of A, but its eigenvalues are λ_i - p, i = 1(1)n. This is known as shifting the origin of the matrix A by the amount p. The shifting strategy is basically introduced to speed up the convergence. For example, if the ratio |λ_2/λ_1| is not small enough (i.e., very close to 1), then p can be chosen such that
   max_{i≥2} |λ_i - p| / |λ_1 - p| < |λ_2| / |λ_1| ,
which eventually accelerates the convergence. Likewise, adopting the shifting strategy for the inverse power method leads us to solve
   (A - pI) v^(s+1) = y^(s) ,                                                    (2.4.14)
instead of (2.4.13), and hence the dominant eigenvalue of (A - pI)^(-1) is given by 1/(λ - p), where λ is the eigenvalue of A closest to the shift p.
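As a sketch of the shifted inverse iteration (2.4.14), assuming the LU factorisation of A - pI is formed once and reused (here via scipy's lu_factor/lu_solve; the shift, tolerance and test matrix are illustrative choices only, not prescribed by the thesis):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def shifted_inverse_iteration(A, p, y0, tol=1e-10, max_steps=200):
    """Inverse power method with origin shift p, solving (A - pI)v = y at each
    step (cf. (2.4.14)); the LU factors are computed once and reused."""
    n = A.shape[0]
    lu, piv = lu_factor(A - p*np.eye(n))       # factorise once
    y = np.asarray(y0, dtype=float)
    mu = 0.0
    for _ in range(max_steps):
        v = lu_solve((lu, piv), y)             # cheap solve per iteration
        mu = v[np.argmax(np.abs(v))]           # dominant eigenvalue of (A - pI)^(-1)
        y_new = v / mu
        if np.linalg.norm(y_new - y) < tol:
            break
        y = y_new
    return p + 1.0/mu, y                       # eigenvalue of A closest to the shift p

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(shifted_inverse_iteration(A, 1.2, np.ones(3))[0])  # eigenvalue of A nearest 1.2
```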
Apart from the scheme of shifting the origin, which is referred to as Wilkinson's method (1955), there are other techniques for accelerating the convergence of the Power method such as the δ²-process, the Rayleigh quotient, etc. (see Ralston (1965), Faddeeva (1959)).
2.5 EVALUATION OF THE SQUARE ROOT OF A SQUARE MATRIX
Let a matrix A of order n possess the eigenvalues λ_1, λ_2, ..., λ_n. The characteristic polynomial, which is derived from det(A - λI), is of order n and may be expressed in the form
   P(λ) = (λ_1 - λ)(λ_2 - λ) ... (λ_n - λ) .
By the Cayley-Hamilton theorem 2.4.1, matrix A is a root of its own characteristic equation, i.e. P(A) = 0, thus we have
   P(A) = (λ_1 I - A)(λ_2 I - A) ... (λ_n I - A) = 0 .                           (2.5.1)
Therefore the matrices λ_1 I, λ_2 I, ..., λ_n I are solutions of the matrix equation P(X) = 0. Furthermore, the products of matrices in (2.5.1) may be zero even though no factor is zero (Hohn (1973), page 31); thus P(X) = 0 may also have other solutions apart from λ_i I, 1 ≤ i ≤ n (see Jennings (1964), Hohn (1973)).
We should point out that in this thesis our interest is in the square root, denoted by A^½, of a positive (or semi-positive) definite matrix A, which satisfies the following theorem.
Theorem 2.5.1:
The matrix A of order n is a definite (i.e. positive or non-negative) matrix of rank r (r ≤ n) iff there is a definite matrix A^½ of rank r such that (A^½)² = A.
Proof: (see Lancaster (1969), p.95).
In his paper, Laasonen (1958) recommended the use of Newton's method for computing the square root of a matrix possessing the properties stated in the following theorem:
Theorem 2.5.2:
Let A denote a real square matrix with real, positive eigenvalues. Then the matrix iterative algorithm
   X^(0) = kI ,
   X^(i+1) = ½ X^(i) + ½ (A X^(i))^(-1) ,                                        (2.5.2)
where k is a non-zero constant, generates a sequence of matrices which converges to the solution of
   A X² - I = 0 ,                                                                (2.5.3)
which has positive eigenvalues. Moreover, the rate of convergence is quadratic.
Laasonen also suggested that if the matrix A is non-negative definite, then A^½ = X can be obtained from the algorithm
   X^(i+1) = ½ X^(i) + ½ A [X^(i)]^(-1) ,                                        (2.5.4)
where the initial matrix X^(0) is as given in (2.5.2). Therefore, the iterative process (2.5.4) will produce an approximate solution to the equation
   X² - A = 0 .                                                                  (2.5.5)
According to Theorem 2.5.1, the solution of (2.5.3) and (2.5.5) by the algorithms (2.5.2) and (2.5.4) respectively preserves the property of the original matrix, i.e. the matrices A^(-½) (and A^½) remain positive (and non-negative) definite if A is also.
Each iteration of both the processes (2.5.2) and (2.5.4) involves the solution of n² linear equations. It is recommended that either of the above iterative procedures should be terminated as soon as the difference between two successive solutions X^(i) and X^(i+1) no longer decreases; otherwise the influence of the round-off errors on the obtained solution may become significant. Laasonen pointed out that in most cases the influence of round-off errors does not become serious, due to the quadratic rate of convergence of the process.
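A possible realisation of the iteration (2.5.4) with the stopping rule just described is sketched below (the test matrix, the starting constant k and the use of an explicit inverse are illustrative choices of ours, not the thesis's implementation):

```python
import numpy as np

def matrix_sqrt_newton(A, k=1.0, max_steps=50):
    """Iteration (2.5.4): X_{i+1} = 0.5*X_i + 0.5*A*inv(X_i), started from X_0 = k*I.
    Terminates once the change between successive iterates stops decreasing."""
    n = A.shape[0]
    X = k * np.eye(n)
    prev_diff = np.inf
    for _ in range(max_steps):
        X_new = 0.5*X + 0.5*(A @ np.linalg.inv(X))
        diff = np.linalg.norm(X_new - X)
        if diff >= prev_diff:          # stop when the difference no longer decreases
            return X_new
        prev_diff = diff
        X = X_new
    return X

A = np.array([[5.0, 2.0],
              [2.0, 5.0]])             # symmetric positive definite test matrix
S = matrix_sqrt_newton(A)
print(np.allclose(S @ S, A))           # True: S is an approximate square root of A
```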
2.6 MAIN PROPERTIES OF CONTINUED FRACTIONS
We consider in this section the basic theory of continued fractions and their application which is relevant to the algorithms presented in Chapter 4. A comprehensive study of continued fractions (in particular the convergence theory) is due to H.S. Wall (1948). Others, such as Frank (1962), Blanch (1964), etc., have contributed to the development of the theory and the application of continued fractions.
Definition 2.6.1:
Consider the two variables t and w associated by the relation
   t_j(w) = a_j / (b_j + w) ,  j = 1,2,... ,                                     (2.6.1)
where the a's and b's are real or (generally) complex numbers, and the linear transformation of w into t is expressed by the compositions
   t_0 t_1(w) = t_0[t_1(w)] ,  t_0 t_1 t_2(w) = t_0[t_1(t_2(w))] ,  ... ,
or
   T_∞(w) = b_0 + a_1/(b_1 + a_2/(b_2 + a_3/(b_3 + ... ))) ,                     (2.6.2)
which is called an infinite continued fraction. The abbreviated notation for (2.6.2) will be used and is
   T_∞(w) = b_0 + a_1|/|b_1 + a_2|/|b_2 + a_3|/|b_3 + ... .                      (2.6.3)
The fractions b_0/1 and a_j/b_j, j = 1,2,..., are called the components or the partial quotients of the continued fraction (2.6.2) (N.B. the partial quotients a_j/b_j cannot be reduced), and a_j, b_j, j = 1,2,..., are called the partial numerators and denominators respectively. For the case T_n(w), n < ∞, i.e. n a finite number, the continued fraction is said to be finite.
If the partial numerators are equal to 1, i.e. a_i = 1, i = 1,2,..., then (2.6.3) is said to be a simple or standard continued fraction, i.e.
   T_∞(w) = b_0 + 1|/|b_1 + 1|/|b_2 + 1|/|b_3 + ... .                            (2.6.4)
Furthermore, the continued fraction (2.6.2) is said to converge if there exists a limit (or it has the value) v such that
   lim_{n→∞} ∏_{i=1}^n t_i(w̄) = lim_{n→∞} T_n(w̄) = v .
This means that, at a fixed point w̄, under the transformations t_i, i = 1,2,..., as defined earlier, the value of the continued fraction is the limit of an infinite sequence of images. Similarly, at the fixed point w = 0, the limit lim_{n→∞} T_n(0) = lim_{n→∞} ∏_{i=1}^n t_i(0) is defined. The quantity T_n(0) is termed the nth approximant or convergent.
It is shown by mathematical induction (Wall (1948)) that
   T_n(w) = ∏_{i=1}^n t_i(w) = (A_{n-1} w + A_n) / (B_{n-1} w + B_n) ,  n = 0,1,2,... ,   (2.6.5)
where the quantities A_{n-1}, A_n, B_{n-1}, B_n are independent of w and can be evaluated by the following fundamental recurrence formulae,
   A_{j+1} = b_{j+1} A_j + a_{j+1} A_{j-1} ,
   B_{j+1} = b_{j+1} B_j + a_{j+1} B_{j-1} ,  j = 0,1,2,... ,                    (2.6.6a)
and the initial values,
   A_{-1} = 1 ,  A_0 = b_0 ,  B_{-1} = 0 ,  B_0 = 1 .                            (2.6.6b)
Thus, the nth approximant, i.e. T_n(0), can be easily obtained from (2.6.5), i.e.
   T_n(0) = A_n / B_n ,
where A_n and B_n are called the nth numerator and denominator respectively.
Moreover, for a simple continued fraction, the recurrence relation (2.6.6a) becomes
   A_{j+1} = b_{j+1} A_j + A_{j-1} ,
   B_{j+1} = b_{j+1} B_j + B_{j-1} ,  j = 0,1,2,... .                            (2.6.7)
Finally, the value of the continued fraction (2.6.2) does exist if the following conditions are fulfilled (Blanch (1964)):
(i) At most a finite number of the denominators B_k vanish.
(ii) Given a positive quantity ε, there exists an N such that, for n ≥ N,
   | A_n/B_n - A_{n+k}/B_{n+k} | < ε  for all positive k.                        (2.6.8)
The validity of (2.6.8) ensures the existence of a limit quantity T such that
   lim_{n→∞} A_n/B_n = T ,
whereas the failure of (2.6.8) means the continued fraction is said to diverge, or to be divergent, and its value cannot be assigned.
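The recurrences (2.6.6a)-(2.6.6b) together with the test (2.6.8) translate directly into a short routine; the sketch below (the function name, tolerance and the golden-ratio example are ours, not the thesis's) evaluates a continued fraction from callables supplying a_j and b_j:

```python
def continued_fraction_value(a, b, b0=0.0, tol=1e-14, max_terms=10**6):
    """Evaluate b0 + a1/(b1 + a2/(b2 + ...)) by the forward recurrences (2.6.6a),
    stopping when successive approximants A_n/B_n agree to within tol (cf. (2.6.8)).
    a(j), b(j) return the j-th partial numerator/denominator, j = 1, 2, ..."""
    A_prev, A = 1.0, b0          # A_{-1}, A_0 from (2.6.6b)
    B_prev, B = 0.0, 1.0         # B_{-1}, B_0
    value = b0
    for j in range(1, max_terms + 1):
        A_prev, A = A, b(j)*A + a(j)*A_prev
        B_prev, B = B, b(j)*B + a(j)*B_prev
        new_value = A / B
        if abs(new_value - value) < tol:
            return new_value
        value = new_value
    return value

# Example: the simple continued fraction 1 + 1/(1 + 1/(1 + ...)) = golden ratio
print(continued_fraction_value(lambda j: 1.0, lambda j: 1.0, b0=1.0))  # ~1.6180339...
```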
Consider now the continued fraction whose partial quotients repeat with period n,
   T = a_1|/|b_1 + a_2|/|b_2 + ... + a_n|/|b_n + a_1|/|b_1 + a_2|/|b_2 + ... ,  n = 1,2,... .   (2.6.9)
The essential property of the continued fraction (2.6.9) is that its partial numerators and denominators are periodically repeated after n divisions, or the partial quotient a_j/b_j, j = 1,2,...,n, is repeated after a period of 'length' or cycle n since its previous occurrence. Thus, equation (2.6.9) is termed an infinite periodic continued fraction, and its linear fractional transformation can be expressed by
   T(w) = a_1|/|b_1 + a_2|/|b_2 + ... + a_n|/|(b_n + w) .                        (2.6.10)
Consequently, as in (2.6.5), we may express (2.6.10) in the form
   T(w) = (A_{n-1} w + A_n) / (B_{n-1} w + B_n) ,
where A_n, B_n refer to the nth numerator and denominator of the continued fraction, and their values are given by the recurrence formulae (2.6.6a) with initial values
   A_{-1} = 1 ,  B_{-1} = 0 ,  A_0 = 0 ,  B_0 = 1 .
We now define the fixed point of the continued fraction.
Definition 2.6.3:
A value x is said to be a fixed point of the transformation (2.6.10) if the relation
   x = (A_{n-1} x + A_n) / (B_{n-1} x + B_n)                                     (2.6.11)
holds true. Then there are two values of x, which can be obtained by solving the quadratic equation
   B_{n-1} x² + (B_n - A_{n-1}) x - A_n = 0 ,
and which, for x_1, x_2 (say), are termed the fixed points of the transformation (2.6.10).
Some of the algorithms adopted in Chapter 4 are associated with
the numerical evaluation of periodic continued fractions. This is
basically formulated by the following theorem.
Theorem 2.6.1:
Let x_1 and x_2 be the fixed points of the transformation (2.6.10), where a_i, b_i, i = 1,2,...,n, are any complex numbers with a_i ≠ 0, and let A_m/B_m be the mth approximant of the periodic continued fraction (2.6.9). Then (2.6.9) converges iff x_1 and x_2 are finite numbers satisfying one of the following two conditions: (i) |x_1| < |x_2| and A_j/B_j ≠ x_2, j = 0,1,2,...,n-1, or (ii) x_1 = x_2.
If the continued fraction converges, its value is x_1.
Proof: see Wall (1948), page 37.
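For a periodic continued fraction, the fixed points of (2.6.10) can be computed numerically by running the recurrences (2.6.6a) over one period (with the initial values A_{-1} = 1, B_{-1} = 0, A_0 = 0, B_0 = 1 given above) and solving the resulting quadratic. The following sketch, with an arbitrary period-1 example, is ours and not part of the thesis:

```python
import numpy as np

def period_numerator_denominator(alpha, beta):
    """A_{n-1}, A_n, B_{n-1}, B_n for one period of (2.6.10), by the recurrences
    (2.6.6a) with the initial values A_{-1}=1, A_0=0, B_{-1}=0, B_0=1."""
    A_prev, A = 1.0, 0.0
    B_prev, B = 0.0, 1.0
    for aj, bj in zip(alpha, beta):
        A_prev, A = A, bj*A + aj*A_prev
        B_prev, B = B, bj*B + aj*B_prev
    return A_prev, A, B_prev, B

def fixed_points(alpha, beta):
    """Fixed points x of x = (A_{n-1} x + A_n)/(B_{n-1} x + B_n), i.e. the roots of
    B_{n-1} x^2 + (B_n - A_{n-1}) x - A_n = 0 (cf. Definition 2.6.3)."""
    An1, An, Bn1, Bn = period_numerator_denominator(alpha, beta)
    roots = np.roots([Bn1, Bn - An1, -An])
    return sorted(roots, key=abs)      # smaller-modulus fixed point first

# Period-1 example: T = 1/(2 + 1/(2 + ...)) has the value sqrt(2) - 1
print(fixed_points([1.0], [2.0])[0])   # ~0.41421356
```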
Theorem 2.6.2: (Equivalence theorem)
A continued fraction is unchanged in value if some partial numerator
and partial denominator, along with the immediately succeeding partial
numerator, are multiplied by the same non-zero constant (see Blanch (1964)). Such a transformation has been termed by Wall (1948) an equivalence transformation.
Consider now the periodic continued fraction written with partial numerators α_i and partial denominators β_i,
   T = α_1|/|β_1 + α_2|/|β_2 + ... + α_n|/|β_n + α_1|/|β_1 + ... .               (2.6.12)
By virtue of Theorem 2.6.2, applied as successive equivalence transformations, the periodic continued fraction (2.6.12) may be expressed in a form with unitary partial denominators, i.e.
   T̂ = γ_1|/|1 + γ_2|/|1 + ... + γ_n|/|1 + γ_1|/|1 + ... ,                       (2.6.13)
where
   γ_1 = α_1/β_1 ,
   γ_i = α_i/(β_{i-1} β_i) ,  β_i, β_{i-1} ≠ 0 ,  i = 2,3,...,n.
It is proved by Blanch (1964) that T̂ in (2.6.13) will converge to a positive value less than or equal to ½, provided that every partial numerator is positive and does not exceed ¼, i.e.,
   if 0 < γ_i ≤ ¼ , then T̂ converges and 0 < T̂ ≤ ½ .                             (2.6.14)
Okolie (1978) (or Evans and Okolie (1979)) pointed out that the condition (2.6.14) for the convergence of (2.6.12) can be exploited to introduce a cyclic factorisation of a periodic tridiagonal matrix, i.e. if α_i and β_i are given by the relations,
   α_1 = a_1 c_n ,  β_1 = b_n ,
   α_i = c_k a_{k+1} ,  β_i = b_k ,  k = (n-i+1) mod n ,  i = 2,3,...,n,
where a_i, b_i, c_i, i = 1(1)n, are the coefficients of the periodic tridiagonal matrix (2.6.15), then a periodic continued fraction of the form (2.6.12) converges provided the matrix (2.6.15) is diagonally dominant, in the sense that the inequalities
   |b_i| ≥ |a_i| + |c_i| ,  i = 1(1)n,
hold true.
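As a hedged sketch of how the relations above might be used in practice (the index convention is our reading of the formulae and should be checked against Okolie (1978); the function names and test coefficients are arbitrary), one can form the γ_i of (2.6.13) from the coefficients of a periodic tridiagonal matrix and test condition (2.6.14):

```python
import numpy as np

def periodic_cf_coefficients(a, b, c):
    """Form the partial numerators/denominators (alpha_i, beta_i) of the periodic
    continued fraction (2.6.12) from the coefficients a_i, b_i, c_i of a periodic
    tridiagonal matrix, following the relations quoted above (assumed indexing)."""
    n = len(b)
    alpha = [a[0] * c[n-1]]            # alpha_1 = a_1 * c_n
    beta = [b[n-1]]                    # beta_1  = b_n
    for i in range(2, n + 1):
        k = (n - i + 1) % n            # k = (n - i + 1) mod n, 1-based in the text
        alpha.append(c[k-1] * a[k])    # alpha_i = c_k * a_{k+1}
        beta.append(b[k-1])            # beta_i  = b_k
    return np.array(alpha), np.array(beta)

def gammas(alpha, beta):
    """Unit-denominator form (2.6.13): gamma_1 = alpha_1/beta_1,
    gamma_i = alpha_i/(beta_{i-1}*beta_i)."""
    g = [alpha[0] / beta[0]]
    g += [alpha[i] / (beta[i-1] * beta[i]) for i in range(1, len(alpha))]
    return np.array(g)

# Diagonally dominant periodic tridiagonal coefficients (hypothetical example)
a = np.array([-1.0, -1.0, -1.0, -1.0, -1.0])   # sub-diagonal / corner coefficients
b = np.array([ 4.0,  4.0,  4.0,  4.0,  4.0])   # diagonal coefficients
c = np.array([-1.0, -1.0, -1.0, -1.0, -1.0])   # super-diagonal / corner coefficients
alpha, beta = periodic_cf_coefficients(a, b, c)
g = gammas(alpha, beta)
print(np.all((g > 0) & (g <= 0.25)))           # condition (2.6.14) satisfied -> convergence
```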
Likewise, we will consider the equivalence theorem and the condition (2.6.14) to introduce the method in Chapter 4 which involves the cyclic factorization of a periodic general matrix of bandwidth 2r+1, r ≥ 1 (see Section 2.1).
CHAPTER 3
3.1 DIFFERENT NUMERICAL APPROACHES FOR SOLVING BOUNDARY VALUE PROBLEMS
To select a suitable approach for obtaining the solution of certain boundary value problems (b.v.p.) there arise many points which should be taken into account, i.e. the boundary condition(s) to which the problem is subject, the existence and uniqueness of the solution, the stability of the adopted approach, the level of accuracy in the solution which can be attained, etc. For example, techniques such as the factorisation of the operators and the use of projection operators are suitable for linear boundary value problems, while for non-linear boundary value problems the non-iterative schemes which are based on continuous transformation are used (Meyer (1973)).
Broadly speaking, numerical techniques have had advantages and disadvantages in practice. The Shooting (or Driving) method, for instance, is a well-known approach based on initial value problems; Keller (1975) in his survey indicated that this method accounts for nearly one third of the work concerned with the numerical investigation of differential equations. On the other hand, the shooting method has many drawbacks due to the difficulties which are encountered in practice, such as (1) the starting solution might not be good enough to ensure the convergence of the Newton-Raphson iteration, and/or (2) the method becomes unstable due to its sensitivity to any perturbation in the initial conditions (which accounts for the growth of round-off error), although the numerical method itself is stable (however, Multiple or Parallel shooting procedures have been proposed to tackle such difficulties) (Hall and Watt (1976), Keller (1968), Osborne (1969)).
Alternatively, finite-difference and finite element methods are used for boundary value problems. An important exposition of the recent theoretical advances which have been made on the methods for initial and boundary value problems is collected in Hall and Watt (1976).
Our sole interest is in the finite-difference methods, which will be discussed in the next section, whilst for the finite element method we briefly outline the following.
Finite Element Method
The finite element method is a recent new method which has been used
widely during the last three decades. During this time the electronic
digital computer has progressed to the stage where it can accomplish
considerable amounts of computational work in a short time. The method is
commonly used in engineering problems. in particular civil. aeronautical
and mechanical engineering. especially for the analysis of stress in solid
components. Furthermore. it has been applied even to three-dimensional
problems. such as the time-dependent problems involving fluid flow. heat
transfer. magnetic field analysis •••• etc. (Fenner (1975). Bathe and
Wi1son (1976), Martin and Carey (1973».
The finite element method is based on the idea of partitioning the physical system, such as structures, solid or fluid continua, into small non-overlapping subregions or elements. Each element is a basic unit which has to be considered. Within these elements an approximating function (in the form of polynomials or rational functions, etc.) is defined, whose parameters can be adjusted to ensure the continuity of the functions in adjacent elements (Mitchell and Wait (1977)). Moreover, an approximating function, generally, can