
THE THEORY OF MATRICES

F. R. GANTMACHER

VOLUME ONE

AMS CHELSEA PUBLISHING
American Mathematical Society, Providence, Rhode Island


The present work, published in two volumes, is an English translation by K. A. Hirsch of the Russian-language book TEORIYA MATRITS by F. R. Gantmacher (Гантмахер).

2000 Mathematics Subject Classification. Primary 15-02.

Library of Congress Catalog Card Number 59-11779
International Standard Book Number 0-8218-1376-5 (Vol. I)

Copyright © 1959, 1960, 1977 by Chelsea Publishing Company. Printed in the United States of America.

Reprinted by the American Mathematical Society, 2000. The American Mathematical Society retains all rights except those granted to the United States Government.

The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability.

Visit the AMS home page at URL: http://www.ams.org/



PREFACE

THE MATRIX CALCULUS is widely applied nowadays in various branches of mathematics, mechanics, theoretical physics, theoretical electrical engineering, etc. However, neither in the Soviet nor the foreign literature is there a book that gives a sufficiently complete account of the problems of matrix theory and of its diverse applications. The present book is an attempt to fill this gap in the mathematical literature.

The book is based on lecture courses on the theory of matrices and its applications that the author has given several times in the course of the last seventeen years at the Universities of Moscow and Tiflis and at the Moscow Institute of Physical Technology.

The book is meant not only for mathematicians (undergraduates and research students) but also for specialists in allied fields (physics, engineering) who are interested in mathematics and its applications. Therefore the author has endeavoured to make his account of the material as accessible as possible, assuming only that the reader is acquainted with the theory of determinants and with the usual course of higher mathematics within the programme of higher technical education. Only a few isolated sections in the last chapters of the book require additional mathematical knowledge on the part of the reader. Moreover, the author has tried to keep the individual chapters as far as possible independent of each other. For example, Chapter V, Functions of Matrices, does not depend on the material contained in Chapters II and III. At those places of Chapter V where fundamental concepts introduced in Chapter IV are being used for the first time, the corresponding references are given. Thus, a reader who is acquainted with the rudiments of the theory of matrices can immediately begin with reading the chapters that interest him.

The book consists of two parts, containing fifteen chapters.

In Chapters I and III, information about matrices and linear operators is developed ab initio and the connection between operators and matrices is introduced.

Chapter II expounds the theoretical basis of Gauss's elimination method and certain associated effective methods of solving a system of n linear equations, for large n. In this chapter the reader also becomes acquainted with the technique of operating with matrices that are divided into rectangular 'blocks.'



In Chapter IV we introduce the extremely important 'characteristic' and 'minimal' polynomials of a square matrix, and the 'adjoint' and 'reduced adjoint' matrices.

In Chapter V, which is devoted to functions of matrices, we give the general definition of f(A) as well as concrete methods of computing it, where f(λ) is a function of a scalar argument λ and A is a square matrix. The concept of a function of a matrix is used in §§ 5 and 6 of this chapter for a complete investigation of the solutions of a system of linear differential equations of the first order with constant coefficients. Both the concept of a function of a matrix and this latter investigation of differential equations are based entirely on the concept of the minimal polynomial of a matrix and, in contrast to the usual exposition, do not use the so-called theory of elementary divisors, which is treated in Chapters VI and VII.

These five chapters constitute a first course on matrices and their applications. Very important problems in the theory of matrices arise in connection with the reduction of matrices to a normal form. This reduction is carried out on the basis of Weierstrass' theory of elementary divisors. In view of the importance of this theory we give two expositions in this book: an analytic one in Chapter VI and a geometric one in Chapter VII. We draw the reader's attention to §§ 7 and 8 of Chapter VI, where we study effective methods of finding a matrix that transforms a given matrix to normal form. In § 8 of Chapter VII we investigate in detail the method of A. N. Krylov for the practical computation of the coefficients of the characteristic polynomial.

In Chapter VIII certain types of matrix equations are solved. We also consider here the problem of determining all the matrices that are permutable with a given matrix, and we study in detail the many-valued functions of matrices √A and ln A.

Chapters IX and X deal with the theory of linear operators in a unitary space and the theory of quadratic and hermitian forms. These chapters do not depend on Weierstrass' theory of elementary divisors and use, of the preceding material, only the basic information on matrices and linear operators contained in the first three chapters of the book. In § 9 of Chapter X we apply the theory of forms to the study of the principal oscillations of a system with n degrees of freedom. In § 11 of this chapter we give an account of Frobenius' deep results on the theory of Hankel forms. These results are used later, in Chapter XV, to study special cases of the Routh-Hurwitz problem.

The last five chapters form the second part of the book [the second volume, in the present English translation]. In Chapter XI we determine normal forms for complex symmetric, skew-symmetric, and orthogonal matrices and establish interesting connections of these matrices with real matrices of the same classes and with unitary matrices.

In Chapter XII we expound the general theory of pencils of matrices of the form A + λB, where A and B are arbitrary rectangular matrices of the same dimensions. Just as the study of regular pencils of matrices A + λB is based on Weierstrass' theory of elementary divisors, so the study of singular pencils is built upon Kronecker's theory of minimal indices, which is, as it were, a further development of Weierstrass's theory. By means of Kronecker's theory (the author believes that he has succeeded in simplifying the exposition of this theory) we establish in Chapter XII canonical forms of the pencil of matrices A + λB in the most general case. The results obtained there are applied to the study of systems of linear differential equations with constant coefficients.

In Chapter XIII we explain the remarkable spectral properties of matrices with non-negative elements and consider two important applications of matrices of this class: 1) homogeneous Markov chains in the theory of probability and 2) oscillatory properties of elastic vibrations in mechanics. The matrix method of studying homogeneous Markov chains was developed in the book [25] by V. I. Romanovskii and is based on the fact that the matrix of transition probabilities in a homogeneous Markov chain with a finite number of states is a matrix with non-negative elements of a special type (a 'stochastic' matrix).

The oscillatory properties of elastic vibrations are connected with another important class of non-negative matrices, the 'oscillation matrices.' These matrices and their applications were studied by M. G. Krein jointly with the author of this book. In Chapter XIII, only certain basic results in this domain are presented. The reader can find a detailed account of the whole material in the monograph [7].

In Chapter XIV we compile the applications of the theory of matrices to systems of differential equations with variable coefficients. The central place (§§ 5-9) in this chapter belongs to the theory of the multiplicative integral (Produktintegral) and its connection with Volterra's infinitesimal calculus. These problems are almost entirely unknown in Soviet mathematical literature. In the first sections and in § 11, we study reducible systems (in the sense of Lyapunov) in connection with the problem of stability of motion; we also give certain results of N. P. Erugin. Sections 9-11 refer to the analytic theory of systems of differential equations. Here we clarify an inaccuracy in Birkhoff's fundamental theorem, which is usually applied to the investigation of the solution of a system of differential equations in the neighborhood of a singular point, and we establish a canonical form of the solution in the case of a regular singular point.


In § 12 of Chapter XIV we give a brief survey of some results of the fundamental investigations of I. A. Lappo-Danilevskii on analytic functions of several matrices and their applications to differential systems.

The last chapter, Chapter XV, deals with the applications of the theory of quadratic forms (in particular, of Hankel forms) to the Routh-Hurwitz problem of determining the number of roots of a polynomial in the right half-plane (Re z > 0). The first sections of the chapter contain the classical treatment of the problem. In § 5 we give the theorem of A. M. Lyapunov in which a stability criterion is set up which is equivalent to the Routh-Hurwitz criterion. Together with the stability criterion of Routh-Hurwitz we give, in § 11 of this chapter, the comparatively little known criterion of Liénard and Chipart, in which the number of determinant inequalities is only about half of that in the Routh-Hurwitz criterion.

At the end of Chapter XV we exhibit the close connection between stability problems and two remarkable theorems of A. A. Markov and P. L. Chebyshev, which were obtained by these celebrated authors on the basis of the expansion of certain continued fractions of special types in series of decreasing powers of the argument. Here we give a matrix proof of these theorems.

This, then, is a brief summary of the contents of this book.

F. R. Gantmacher

PUBLISHERS' PREFACE

THE PUBLISHERS WISH to thank Professor Gantmacher for his kindness in communicating to the translator new versions of several paragraphs of the original Russian-language book.

The Publishers also take pleasure in thanking the VEB Deutscher Verlag der Wissenschaften, whose many published translations of Russian scientific books into the German language include a counterpart of the present work, for their kind spirit of cooperation in agreeing to the use of their formulas in the preparation of the present work.

No material changes have been made in the text in translating the present work from the Russian except for the replacement of several paragraphs by the new versions supplied by Professor Gantmacher. Some changes in the references and in the Bibliography have been made for the benefit of the English-language reader.


CONTENTS

PREFACE

PUBLISHERS' PREFACE

I. MATRICES AND OPERATIONS ON MATRICES
§ 1. Matrices. Basic notation
§ 2. Addition and multiplication of rectangular matrices
§ 3. Square matrices
§ 4. Compound matrices. Minors of the inverse matrix

II. THE ALGORITHM OF GAUSS AND SOME OF ITS APPLICATIONS
§ 1. Gauss's elimination method
§ 2. Mechanical interpretation of Gauss's algorithm
§ 3. Sylvester's determinant identity
§ 4. The decomposition of a square matrix into triangular factors
§ 5. The partition of a matrix into blocks. The technique of operating with partitioned matrices. The generalized algorithm of Gauss

III. LINEAR OPERATORS IN AN n-DIMENSIONAL VECTOR SPACE
§ 1. Vector spaces
§ 2. A linear operator mapping an n-dimensional space into an m-dimensional space
§ 3. Addition and multiplication of linear operators
§ 4. Transformation of coordinates
§ 5. Equivalent matrices. The rank of an operator. Sylvester's inequality
§ 6. Linear operators mapping an n-dimensional space into itself
§ 7. Characteristic values and characteristic vectors of a linear operator
§ 8. Linear operators of simple structure

IV. THE CHARACTERISTIC POLYNOMIAL AND THE MINIMAL POLYNOMIAL OF A MATRIX
§ 1. Addition and multiplication of matrix polynomials
§ 2. Right and left division of matrix polynomials
§ 3. The generalized Bezout theorem
§ 4. The characteristic polynomial of a matrix. The adjoint matrix
§ 5. The method of Faddeev for the simultaneous computation of the coefficients of the characteristic polynomial and of the adjoint matrix
§ 6. The minimal polynomial of a matrix

V. FUNCTIONS OF MATRICES
§ 1. Definition of a function of a matrix
§ 2. The Lagrange-Sylvester interpolation polynomial
§ 3. Other forms of the definition of f(A). The components of the matrix A
§ 4. Representation of functions of matrices by means of series
§ 5. Application of a function of a matrix to the integration of a system of linear differential equations with constant coefficients
§ 6. Stability of motion in the case of a linear system

VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES. ANALYTIC THEORY OF ELEMENTARY DIVISORS
§ 1. Elementary transformations of a polynomial matrix
§ 2. Canonical form of a λ-matrix
§ 3. Invariant polynomials and elementary divisors of a polynomial matrix
§ 4. Equivalence of linear binomials
§ 5. A criterion for similarity of matrices
§ 6. The normal forms of a matrix
§ 7. The elementary divisors of the matrix f(A)
§ 8. A general method of constructing the transforming matrix
§ 9. Another method of constructing a transforming matrix

VII. THE STRUCTURE OF A LINEAR OPERATOR IN AN n-DIMENSIONAL SPACE
§ 1. The minimal polynomial of a vector and a space (with respect to a given linear operator)
§ 2. Decomposition into invariant subspaces with co-prime minimal polynomials
§ 3. Congruence. Factor space
§ 4. Decomposition of a space into cyclic invariant subspaces
§ 5. The normal form of a matrix
§ 6. Invariant polynomials. Elementary divisors
§ 7. The Jordan normal form of a matrix
§ 8. Krylov's method of transforming the secular equation

VIII. MATRIX EQUATIONS
§ 1. The equation AX = XB
§ 2. The special case A = B. Commuting matrices
§ 3. The equation AX − XB = C
§ 4. The scalar equation f(X) = 0
§ 5. Matrix polynomial equations
§ 6. The extraction of m-th roots of a non-singular matrix
§ 7. The extraction of m-th roots of a singular matrix
§ 8. The logarithm of a matrix

IX. LINEAR OPERATORS IN A UNITARY SPACE
§ 1. General considerations
§ 2. Metrization of a space
§ 3. Gram's criterion for linear dependence of vectors
§ 4. Orthogonal projection
§ 5. The geometrical meaning of the Gramian and some inequalities
§ 6. Orthogonalization of a sequence of vectors
§ 7. Orthonormal bases
§ 8. The adjoint operator
§ 9. Normal operators in a unitary space
§ 10. The spectra of normal, hermitian, and unitary operators
§ 11. Positive-semidefinite and positive-definite hermitian operators
§ 12. Polar decomposition of a linear operator in a unitary space. Cayley's formulas
§ 13. Linear operators in a euclidean space
§ 14. Polar decomposition of an operator and the Cayley formulas in a euclidean space
§ 15. Commuting normal operators

X. QUADRATIC AND HERMITIAN FORMS
§ 1. Transformation of the variables in a quadratic form
§ 2. Reduction of a quadratic form to a sum of squares. The law of inertia
§ 3. The methods of Lagrange and Jacobi of reducing a quadratic form to a sum of squares
§ 4. Positive quadratic forms
§ 5. Reduction of a quadratic form to principal axes
§ 6. Pencils of quadratic forms
§ 7. Extremal properties of the characteristic values of a regular pencil of forms
§ 8. Small oscillations of a system with n degrees of freedom
§ 9. Hermitian forms
§ 10. Hankel forms

BIBLIOGRAPHY

INDEX


CHAPTER I

MATRICES AND OPERATIONS ON MATRICES

§ 1. Matrices. Basic Notation

1. Let F be a given number field.¹

DEFINITION 1: A rectangular array of numbers of the field F,

$$\begin{Vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{Vmatrix}, \qquad (1)$$

is called a matrix. When m = n, the matrix is called square and the number m, equal to n, is called its order. In the general case the matrix is called rectangular (of dimension m × n). The numbers that constitute the matrix are called its elements.

NOTATION: In the double-subscript notation for the elements, the first subscript always denotes the row and the second subscript the column containing the given element.

As an alternative to the notation (1) for a matrix we shall also use the abbreviation

$$\|a_{ik}\| \qquad (i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n). \qquad (2)$$

Often the matrix (1) will also be denoted by a single letter, for example A. If A is a square matrix of order n, then we shall write $A = \|a_{ik}\|_1^n$. The determinant of a square matrix $A = \|a_{ik}\|_1^n$ will be denoted by $|a_{ik}|_1^n$ or by |A|.

¹ A number field is defined as an arbitrary collection of numbers within which the four operations of addition, subtraction, multiplication, and division by a non-zero number can always be carried out.

Examples of number fields are: the set of all rational numbers, the set of all real numbers, and the set of all complex numbers.

All the numbers that will occur in the sequel are assumed to belong to the number field given initially.



We introduce a concise notation for determinants formed from elements of the given matrix:

$$A\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} = \begin{vmatrix} a_{i_1k_1} & a_{i_1k_2} & \cdots & a_{i_1k_p} \\ a_{i_2k_1} & a_{i_2k_2} & \cdots & a_{i_2k_p} \\ \vdots & \vdots & & \vdots \\ a_{i_pk_1} & a_{i_pk_2} & \cdots & a_{i_pk_p} \end{vmatrix}. \qquad (3)$$

The determinant (3) is called a minor of A of order p, provided $1 \le i_1 < i_2 < \cdots < i_p \le m$ and $1 \le k_1 < k_2 < \cdots < k_p \le n$. A rectangular matrix $A = \|a_{ik}\|$ (i = 1, 2, …, m; k = 1, 2, …, n) has $\binom{m}{p}\binom{n}{p}$ minors of order p:

$$A\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} \qquad \left(1 \le i_1 < i_2 < \cdots < i_p \le m;\ 1 \le k_1 < k_2 < \cdots < k_p \le n\right). \qquad (3')$$

The minors (3') in which $i_1 = k_1,\ i_2 = k_2,\ \ldots,\ i_p = k_p$ are called principal minors.

In the notation (3) the determinant of a square matrix $A = \|a_{ik}\|_1^n$ can be written as follows:

$$|A| = A\begin{pmatrix} 1 & 2 & \cdots & n \\ 1 & 2 & \cdots & n \end{pmatrix}.$$

The largest among the orders of the non-zero minors generated by a matrix is called the rank of the matrix. If r is the rank of a rectangular matrix A of dimension m × n, then obviously r ≤ min(m, n).
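A minimal computational sketch of these notions, assuming Python with numpy (the function names are illustrative, not from the book): it enumerates the minors of order p in the sense of (3) and checks that the rank is the largest order for which a non-zero minor exists.

    # Illustrative sketch (not from the book): minors of order p and rank.
    from itertools import combinations

    import numpy as np

    def minor(A, rows, cols):
        """Determinant of the submatrix of A with the given rows and columns."""
        return np.linalg.det(A[np.ix_(rows, cols)])

    def has_nonzero_minor(A, p, tol=1e-12):
        m, n = A.shape
        return any(abs(minor(A, r, c)) > tol
                   for r in combinations(range(m), p)
                   for c in combinations(range(n), p))

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])       # second row is twice the first
    assert has_nonzero_minor(A, 1) and not has_nonzero_minor(A, 2)
    assert np.linalg.matrix_rank(A) == 1  # rank = largest order of a non-zero minor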

A rectangular matrix consisting of a single column,

$$\begin{Vmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{Vmatrix},$$

is called a column matrix and will be denoted by $(x_1, x_2, \ldots, x_n)$. A rectangular matrix consisting of a single row,

$$\| z_1 \; z_2 \; \cdots \; z_n \|,$$

is called a row matrix and will be denoted by $[z_1, z_2, \ldots, z_n]$.

A square matrix in which all the elements outside the main diagonal are zero,

$$\begin{Vmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{Vmatrix},$$

is called a diagonal matrix and is denoted by² $\|d_i\delta_{ik}\|_1^n$ or by $\{d_1, d_2, \ldots, d_n\}$.

² Here $\delta_{ik}$ is the Kronecker symbol: $\delta_{ik} = 1$ (i = k), $\delta_{ik} = 0$ (i ≠ k).

Suppose that m quantities $y_1, y_2, \ldots, y_m$ have linear and homogeneous expressions in terms of n other quantities $x_1, x_2, \ldots, x_n$:

$$\begin{aligned} y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n, \\ y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n, \\ &\;\vdots \\ y_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n, \end{aligned} \qquad (4)$$

or, more concisely,

$$y_i = \sum_{k=1}^{n} a_{ik}x_k \qquad (i = 1, 2, \ldots, m). \qquad (4')$$

The transformation of the quantities $x_1, x_2, \ldots, x_n$ into the quantities $y_1, y_2, \ldots, y_m$ by means of the formulas (4) is called a linear transformation. The coefficients of this transformation form a rectangular matrix (1) of dimension m × n.

The linear transformation (4) determines the matrix (1) uniquely, and vice versa.

In the next section we shall define the basic operations on rectangular matrices, using the properties of the linear transformations (4) as our starting point.

§ 2. Addition and Multiplication of Rectangular Matrices

We shall define the basic operations on matrices: addition of matrices, multiplication of a matrix by a number, and multiplication of matrices.

1. Suppose that the quantities $y_1, y_2, \ldots, y_m$ are expressed in terms of the quantities $x_1, x_2, \ldots, x_n$ by means of the linear transformation

$$y_i = \sum_{k=1}^{n} a_{ik}x_k \qquad (i = 1, 2, \ldots, m) \qquad (5)$$

and the quantities $z_1, z_2, \ldots, z_m$ in terms of the same quantities $x_1, x_2, \ldots, x_n$ by means of the transformation

$$z_i = \sum_{k=1}^{n} b_{ik}x_k \qquad (i = 1, 2, \ldots, m). \qquad (6)$$

Then

$$y_i + z_i = \sum_{k=1}^{n} (a_{ik} + b_{ik})x_k \qquad (i = 1, 2, \ldots, m). \qquad (7)$$

In accordance with this, we formulate the following definition.

DEFINITION 2: The sum of two rectangular matrices $A = \|a_{ik}\|$ and $B = \|b_{ik}\|$, both of dimension m × n, is the matrix $C = \|c_{ik}\|$ of the same dimension whose elements are the sums of the corresponding elements of the given matrices:

$$C = A + B, \quad\text{where}\quad c_{ik} = a_{ik} + b_{ik} \qquad (i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n).$$

The operation of forming the sum of given matrices is called addition.

Example.

$$\begin{Vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{Vmatrix} + \begin{Vmatrix} c_1 & c_2 & c_3 \\ d_1 & d_2 & d_3 \end{Vmatrix} = \begin{Vmatrix} a_1 + c_1 & a_2 + c_2 & a_3 + c_3 \\ b_1 + d_1 & b_2 + d_2 & b_3 + d_3 \end{Vmatrix}.$$

According to Definition 2, only rectangular matrices of equal dimension can be added.

By virtue of the same definition, the coefficient matrix of the transformation (7) is the sum of the coefficient matrices of the transformations (5) and (6).

From the definition of matrix addition it follows immediately that this operation has the properties of commutativity and associativity:

1. A + B = B + A;
2. (A + B) + C = A + (B + C).

Here A, B, and C are arbitrary rectangular matrices, all of equal dimension.

The operation of addition of matrices extends in a natural way to the case of an arbitrary finite number of summands.

2. Let us multiply the quantities $y_1, y_2, \ldots, y_m$ in the transformation (5) by some number α of F. Then

$$\alpha y_i = \sum_{k=1}^{n} (\alpha a_{ik})x_k \qquad (i = 1, 2, \ldots, m).$$

In accordance with this, we formulate the following definition.


DEFINITION 3: The product of a matrix $A = \|a_{ik}\|$ (i = 1, 2, …, m; k = 1, 2, …, n) by a number α of F is the matrix $C = \|c_{ik}\|$ (i = 1, 2, …, m; k = 1, 2, …, n) whose elements are obtained from the corresponding elements of A by multiplication by α:

$$C = \alpha A, \quad\text{where}\quad c_{ik} = \alpha a_{ik} \qquad (i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n).$$

The operation of forming the product of a matrix by a number is called multiplication of the matrix by the number.

Example.

$$\alpha\begin{Vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{Vmatrix} = \begin{Vmatrix} \alpha a_1 & \alpha a_2 & \alpha a_3 \\ \alpha b_1 & \alpha b_2 & \alpha b_3 \end{Vmatrix}.$$

It is easy to see that

1. α(A + B) = αA + αB,
2. (α + β)A = αA + βA,
3. (αβ)A = α(βA).

Here A and B are rectangular matrices of equal dimension and α and β are numbers of F.

The difference A − B of two rectangular matrices of equal dimension is defined by

$$A - B = A + (-1)B.$$

If A is a square matrix of order n and α a number of F, then³

$$|\alpha A| = \alpha^n|A|.$$

3. Suppose that the quantities $z_1, z_2, \ldots, z_m$ are expressed in terms of the quantities $y_1, y_2, \ldots, y_n$ by the transformation

$$z_i = \sum_{k=1}^{n} a_{ik}y_k \qquad (i = 1, 2, \ldots, m) \qquad (8)$$

and that the quantities $y_1, y_2, \ldots, y_n$ are expressed in terms of the quantities $x_1, x_2, \ldots, x_q$ by the formulas

$$y_k = \sum_{j=1}^{q} b_{kj}x_j \qquad (k = 1, 2, \ldots, n). \qquad (9)$$

Then on substituting these expressions for the $y_k$ (k = 1, 2, …, n) in (8) we can express $z_1, z_2, \ldots, z_m$ in terms of $x_1, x_2, \ldots, x_q$ by means of the composite transformation:

³ Here the symbols |A| and |αA| denote the determinants of the matrices A and αA (see p. 1).


$$z_i = \sum_{k=1}^{n} a_{ik}\sum_{j=1}^{q} b_{kj}x_j = \sum_{j=1}^{q}\Bigl(\sum_{k=1}^{n} a_{ik}b_{kj}\Bigr)x_j \qquad (i = 1, 2, \ldots, m). \qquad (10)$$

In accordance with this we formulate the following definition.

DEFINITION 4: The product of two rectangular matrices

$$A = \begin{Vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{Vmatrix}, \qquad B = \begin{Vmatrix} b_{11} & b_{12} & \cdots & b_{1q} \\ b_{21} & b_{22} & \cdots & b_{2q} \\ \vdots & & & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nq} \end{Vmatrix}$$

is the matrix

$$C = \begin{Vmatrix} c_{11} & c_{12} & \cdots & c_{1q} \\ c_{21} & c_{22} & \cdots & c_{2q} \\ \vdots & & & \vdots \\ c_{m1} & c_{m2} & \cdots & c_{mq} \end{Vmatrix}$$

in which the element $c_{ij}$ at the intersection of the i-th row and the j-th column is the 'product'⁴ of the i-th row of the first matrix A into the j-th column of the second matrix B:

$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} \qquad (i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, q). \qquad (11)$$

The operation of forming the product of given matrices is called matrix multiplication.

Example.

$$\begin{Vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{Vmatrix}\begin{Vmatrix} c_1 & d_1 & e_1 & f_1 \\ c_2 & d_2 & e_2 & f_2 \\ c_3 & d_3 & e_3 & f_3 \end{Vmatrix} = \begin{Vmatrix} a_1c_1 + a_2c_2 + a_3c_3 & a_1d_1 + a_2d_2 + a_3d_3 & a_1e_1 + a_2e_2 + a_3e_3 & a_1f_1 + a_2f_2 + a_3f_3 \\ b_1c_1 + b_2c_2 + b_3c_3 & b_1d_1 + b_2d_2 + b_3d_3 & b_1e_1 + b_2e_2 + b_3e_3 & b_1f_1 + b_2f_2 + b_3f_3 \end{Vmatrix}.$$

By Definition 4 the coefficient matrix of the transformation (10) is the product of the coefficient matrices of (8) and (9).

Note that the operation of multiplication of two rectangular matrices can only be carried out when the number of columns of the first factor is equal to the number of rows of the second. In particular, multiplication is always possible when both factors are square matrices of one and the same order.

⁴ The product of two sequences of numbers $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ is defined as the sum of the products of the corresponding numbers: $\sum_{i=1}^{n} a_ib_i$.


The reader should observe that even in this special case the multiplication of matrices does not have the property of commutativity. For example,

$$\begin{Vmatrix} 1 & 2 \\ 3 & 4 \end{Vmatrix}\begin{Vmatrix} 2 & 0 \\ 3 & -1 \end{Vmatrix} = \begin{Vmatrix} 8 & -2 \\ 18 & -4 \end{Vmatrix}, \quad\text{but}\quad \begin{Vmatrix} 2 & 0 \\ 3 & -1 \end{Vmatrix}\begin{Vmatrix} 1 & 2 \\ 3 & 4 \end{Vmatrix} = \begin{Vmatrix} 2 & 4 \\ 0 & 2 \end{Vmatrix}.$$

If AB = BA, then the matrices A and B are called permutable or commuting; two powers of one and the same matrix, for example, are always permutable.

It is very easy to verify the associative property of matrix multiplication and also the distributive property of multiplication with respect to addition:

1. (AB)C = A(BC),
2. (A + B)C = AC + BC,
3. A(B + C) = AB + AC.

The definition of matrix multiplication extends in a natural way to the case of several factors.

When we make use of the multiplication of rectangular matrices, we can write the linear transformation

$$\begin{aligned} y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n, \\ y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n, \\ &\;\vdots \\ y_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{aligned}$$

as a single matrix equation

$$\begin{Vmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{Vmatrix} = \begin{Vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{Vmatrix}\begin{Vmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{Vmatrix},$$

or, in abbreviated form,

$$y = Ax.$$


Here $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_m)$ are column matrices and $A = \|a_{ik}\|$ is a rectangular matrix of dimension m × n.

Let us treat the special case when in the product C = AB the second factor is a square diagonal matrix $B = \{d_1, d_2, \ldots, d_n\}$. Then it follows from (11) that

$$c_{ij} = a_{ij}d_j \qquad (i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n),$$

i.e.,

$$\begin{Vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{Vmatrix}\begin{Vmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{Vmatrix} = \begin{Vmatrix} a_{11}d_1 & a_{12}d_2 & \cdots & a_{1n}d_n \\ \vdots & & & \vdots \\ a_{m1}d_1 & a_{m2}d_2 & \cdots & a_{mn}d_n \end{Vmatrix}.$$

Similarly,

$$\begin{Vmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & d_m \end{Vmatrix}\begin{Vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{Vmatrix} = \begin{Vmatrix} d_1a_{11} & d_1a_{12} & \cdots & d_1a_{1n} \\ d_2a_{21} & d_2a_{22} & \cdots & d_2a_{2n} \\ \vdots & & & \vdots \\ d_ma_{m1} & d_ma_{m2} & \cdots & d_ma_{mn} \end{Vmatrix}.$$

Hence: When a rectangular matrix A is multiplied on the right (left) by a diagonal matrix $\{d_1, d_2, \ldots\}$, then the columns (rows) of A are multiplied by $d_1, d_2, \ldots$, respectively.
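This rule is easy to confirm numerically; a quick sketch, assuming Python with numpy (not part of the original text):

    # Illustrative check: A @ diag(d) scales columns, diag(d) @ A scales rows.
    import numpy as np

    A = np.arange(6.0).reshape(2, 3)
    assert np.allclose(A @ np.diag([10.0, 20.0, 30.0]),
                       A * np.array([10.0, 20.0, 30.0]))     # columns scaled
    assert np.allclose(np.diag([2.0, 3.0]) @ A,
                       A * np.array([[2.0], [3.0]]))         # rows scaled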

4. Suppose that a square matrix $C = \|c_{ij}\|_1^m$ is the product of two rectangular matrices $A = \|a_{ik}\|$ and $B = \|b_{kj}\|$ of dimension m × n and n × m, respectively:

$$\begin{Vmatrix} c_{11} & \cdots & c_{1m} \\ \vdots & & \vdots \\ c_{m1} & \cdots & c_{mm} \end{Vmatrix} = \begin{Vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{Vmatrix}\begin{Vmatrix} b_{11} & \cdots & b_{1m} \\ b_{21} & \cdots & b_{2m} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{nm} \end{Vmatrix}, \qquad (12)$$

i.e.,

$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} \qquad (i, j = 1, 2, \ldots, m). \qquad (13)$$


We shall establish the important Binet-Cauchy formula, which expresses the determinant |C| in terms of the minors of A and B:⁵

$$\begin{vmatrix} c_{11} & \cdots & c_{1m} \\ \vdots & & \vdots \\ c_{m1} & \cdots & c_{mm} \end{vmatrix} = \sum_{1 \le k_1 < k_2 < \cdots < k_m \le n} \begin{vmatrix} a_{1k_1} & \cdots & a_{1k_m} \\ \vdots & & \vdots \\ a_{mk_1} & \cdots & a_{mk_m} \end{vmatrix}\begin{vmatrix} b_{k_11} & \cdots & b_{k_1m} \\ \vdots & & \vdots \\ b_{k_m1} & \cdots & b_{k_mm} \end{vmatrix}, \qquad (14)$$

or, in the notation of page 2,

$$C\begin{pmatrix} 1 & 2 & \cdots & m \\ 1 & 2 & \cdots & m \end{pmatrix} = \sum_{1 \le k_1 < k_2 < \cdots < k_m \le n} A\begin{pmatrix} 1 & 2 & \cdots & m \\ k_1 & k_2 & \cdots & k_m \end{pmatrix} B\begin{pmatrix} k_1 & k_2 & \cdots & k_m \\ 1 & 2 & \cdots & m \end{pmatrix}. \qquad (14')$$

According to this formula the determinant of C is the sum of the products of all possible minors of the maximal (m-th) order of A into the corresponding minors of the same order of B.
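The formula lends itself to direct numerical verification. A short sketch, assuming Python with numpy (an illustration, not the book's text), checks (14') on a random 2 × 4 example:

    # Numeric check of Binet-Cauchy (14'): det(AB) = sum over k1 < ... < km
    # of A(1..m; k) * B(k; 1..m).
    from itertools import combinations

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 2, 4
    A = rng.random((m, n))
    B = rng.random((n, m))

    lhs = np.linalg.det(A @ B)
    rhs = sum(np.linalg.det(A[:, list(k)]) * np.linalg.det(B[list(k), :])
              for k in combinations(range(n), m))
    assert np.isclose(lhs, rhs)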

Derivation of the Binet-Cauchy formula. By (13) the determinant of C can be represented in the form

$$\begin{vmatrix} c_{11} & \cdots & c_{1m} \\ \vdots & & \vdots \\ c_{m1} & \cdots & c_{mm} \end{vmatrix} = \begin{vmatrix} \sum_{\alpha_1=1}^{n} a_{1\alpha_1}b_{\alpha_11} & \cdots & \sum_{\alpha_m=1}^{n} a_{1\alpha_m}b_{\alpha_mm} \\ \vdots & & \vdots \\ \sum_{\alpha_1=1}^{n} a_{m\alpha_1}b_{\alpha_11} & \cdots & \sum_{\alpha_m=1}^{n} a_{m\alpha_m}b_{\alpha_mm} \end{vmatrix} = \sum_{\alpha_1, \ldots, \alpha_m = 1}^{n} \begin{vmatrix} a_{1\alpha_1}b_{\alpha_11} & \cdots & a_{1\alpha_m}b_{\alpha_mm} \\ \vdots & & \vdots \\ a_{m\alpha_1}b_{\alpha_11} & \cdots & a_{m\alpha_m}b_{\alpha_mm} \end{vmatrix} = \sum_{\alpha_1, \ldots, \alpha_m = 1}^{n} A\begin{pmatrix} 1 & 2 & \cdots & m \\ \alpha_1 & \alpha_2 & \cdots & \alpha_m \end{pmatrix} b_{\alpha_11}b_{\alpha_22}\cdots b_{\alpha_mm}. \qquad (15)$$

If m > n, then among the numbers $\alpha_1, \alpha_2, \ldots, \alpha_m$ there are always at least two that are equal, so that every summand on the right-hand side of (15) is zero. Hence in this case |C| = 0.

Now let m ≤ n. Then in the sum on the right-hand side of (15) all those summands will be zero in which at least two of the subscripts $\alpha_1, \alpha_2, \ldots, \alpha_m$ are equal. All the remaining summands of (15) can be split into groups of m! terms each by combining into one group those summands that differ from each other only in the order of the subscripts $\alpha_1, \alpha_2, \ldots, \alpha_m$

⁵ When m > n, the matrices A and B do not have minors of order m. In that case the right-hand sides of (14) and (14') are to be replaced by zero.


(so that within each such group the subscripts $\alpha_1, \alpha_2, \ldots, \alpha_m$ have one and the same set of values). Now within one such group the sum of the corresponding terms is⁶

$$\sum \varepsilon(\alpha_1, \alpha_2, \ldots, \alpha_m)\, A\begin{pmatrix} 1 & 2 & \cdots & m \\ k_1 & k_2 & \cdots & k_m \end{pmatrix} b_{\alpha_11}b_{\alpha_22}\cdots b_{\alpha_mm} = A\begin{pmatrix} 1 & 2 & \cdots & m \\ k_1 & k_2 & \cdots & k_m \end{pmatrix}\sum \varepsilon(\alpha_1, \alpha_2, \ldots, \alpha_m)\, b_{\alpha_11}b_{\alpha_22}\cdots b_{\alpha_mm} = A\begin{pmatrix} 1 & 2 & \cdots & m \\ k_1 & k_2 & \cdots & k_m \end{pmatrix} B\begin{pmatrix} k_1 & k_2 & \cdots & k_m \\ 1 & 2 & \cdots & m \end{pmatrix}.$$

Hence from (15) we obtain (14').

laa

c1+ascs.+...+ancn aide+azda.+... +and

b1c, + bscs + ... + bncn b1d1 + b,d, + ... + bndn

Cl

c=

en

d1

d,

Therefore formula (14) yields the so-called Cauchy identity

a,cl + a, y + ... + a,,cn ald1 + a02 + ... + and, at a* cc dt (16)b1e1 + b et + ... + bncn bldg + b + ... + bndn . I

l;gi<ks bi b1 c* 4

Setting $a_i = c_i$, $b_i = d_i$ (i = 1, 2, …, n) in this identity, we obtain:

$$\begin{vmatrix} \sum_{i=1}^{n} a_i^2 & \sum_{i=1}^{n} a_ib_i \\ \sum_{i=1}^{n} a_ib_i & \sum_{i=1}^{n} b_i^2 \end{vmatrix} = \sum_{1 \le i < k \le n} \begin{vmatrix} a_i & a_k \\ b_i & b_k \end{vmatrix}^2.$$

If the $a_i$ and $b_i$ (i = 1, 2, …, n) are real numbers, we deduce the well-known inequality

$$(a_1b_1 + a_2b_2 + \cdots + a_nb_n)^2 \le (a_1^2 + a_2^2 + \cdots + a_n^2)(b_1^2 + b_2^2 + \cdots + b_n^2). \qquad (17)$$

Here the equality sign holds if and only if all the numbers $a_i$ are proportional to the corresponding numbers $b_i$ (i = 1, 2, …, n).

Example 2.

$$\begin{Vmatrix} a_1c_1 + b_1d_1 & \cdots & a_1c_n + b_1d_n \\ \vdots & & \vdots \\ a_nc_1 + b_nd_1 & \cdots & a_nc_n + b_nd_n \end{Vmatrix} = \begin{Vmatrix} a_1 & b_1 \\ \vdots & \vdots \\ a_n & b_n \end{Vmatrix}\begin{Vmatrix} c_1 & \cdots & c_n \\ d_1 & \cdots & d_n \end{Vmatrix}.$$

Therefore, for n > 2,

$$\begin{vmatrix} a_1c_1 + b_1d_1 & \cdots & a_1c_n + b_1d_n \\ \vdots & & \vdots \\ a_nc_1 + b_nd_1 & \cdots & a_nc_n + b_nd_n \end{vmatrix} = 0.$$

⁶ Here $k_1 < k_2 < \cdots < k_m$ is the normal order of the subscripts $\alpha_1, \alpha_2, \ldots, \alpha_m$, and $\varepsilon(\alpha_1, \alpha_2, \ldots, \alpha_m) = (-1)^N$, where N is the number of transpositions of the indices needed to put the permutation $\alpha_1, \alpha_2, \ldots, \alpha_m$ into normal order.

Let us consider the special case where A and B are square matrices of one and the same order n. When we set m = n in (14'), we arrive at the well-known multiplication theorem for determinants:

$$C\begin{pmatrix} 1 & 2 & \cdots & n \\ 1 & 2 & \cdots & n \end{pmatrix} = A\begin{pmatrix} 1 & 2 & \cdots & n \\ 1 & 2 & \cdots & n \end{pmatrix} B\begin{pmatrix} 1 & 2 & \cdots & n \\ 1 & 2 & \cdots & n \end{pmatrix},$$

or, in another notation,

$$|C| = |A| \cdot |B|. \qquad (18)$$

Thus, the determinant of the product of two square matrices is equal to the product of the determinants of the factors.

5. The Binet-Cauchy formula enables us, in the general case also, to express the minors of the product of two rectangular matrices in terms of the minors of the factors. Let

$$A = \|a_{ik}\|, \quad B = \|b_{kj}\|, \quad C = \|c_{ij}\| \qquad (i = 1, \ldots, m;\ k = 1, \ldots, n;\ j = 1, \ldots, q)$$

and C = AB. We consider an arbitrary minor of C:

$$C\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ j_1 & j_2 & \cdots & j_p \end{pmatrix} \qquad \left(1 \le i_1 < i_2 < \cdots < i_p \le m;\ 1 \le j_1 < j_2 < \cdots < j_p \le q;\ p \le m,\ p \le q\right).$$

The matrix formed from the elements of this minor is the product of two rectangular matrices,

$$\begin{Vmatrix} a_{i_11} & a_{i_12} & \cdots & a_{i_1n} \\ \vdots & & & \vdots \\ a_{i_p1} & a_{i_p2} & \cdots & a_{i_pn} \end{Vmatrix} \quad\text{and}\quad \begin{Vmatrix} b_{1j_1} & \cdots & b_{1j_p} \\ b_{2j_1} & \cdots & b_{2j_p} \\ \vdots & & \vdots \\ b_{nj_1} & \cdots & b_{nj_p} \end{Vmatrix}.$$

Page 23: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

12 1. MATRICES AND MATRIX OPERATIONS

Therefore, by applying the Binet-Cauchy formula, we obtain:⁷

$$C\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ j_1 & j_2 & \cdots & j_p \end{pmatrix} = \sum_{1 \le k_1 < k_2 < \cdots < k_p \le n} A\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} B\begin{pmatrix} k_1 & k_2 & \cdots & k_p \\ j_1 & j_2 & \cdots & j_p \end{pmatrix}. \qquad (19)$$

For p = 1, formula (19) goes over into (11). For p > 1, formula (19) is a natural generalization of (11).

We mention another consequence of (19):

The rank of the product of two rectangular matrices does not exceed the rank of either factor. If C = AB and $r_A$, $r_B$, $r_C$ are the ranks of A, B, C, then

$$r_C \le \min(r_A, r_B).$$
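A quick numerical check of this consequence, assuming Python with numpy (an illustration, not the book's text):

    # Check of the rank inequality r_C <= min(r_A, r_B) for C = AB.
    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.random((5, 3)) @ rng.random((3, 6))   # rank 3 by construction
    B = rng.random((6, 2)) @ rng.random((2, 4))   # rank 2 by construction
    rA = np.linalg.matrix_rank(A)
    rB = np.linalg.matrix_rank(B)
    rC = np.linalg.matrix_rank(A @ B)
    assert rC <= min(rA, rB)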

§ 3. Square Matrices

1. The square matrix of order n in which the main diagonal consists entirely of units and all the other elements are zero is called the unit matrix and is denoted by $E^{(n)}$ or simply by E. The name 'unit matrix' is connected with the following property of E: for every rectangular matrix $A = \|a_{ik}\|$ (i = 1, 2, …, m; k = 1, 2, …, n) we have

$$E^{(m)}A = AE^{(n)} = A.$$

Clearly,

$$E^{(n)} = \|\delta_{ik}\|_1^n.$$

Let $A = \|a_{ik}\|_1^n$ be a square matrix. Then the powers of the matrix are defined in the usual way:

$$A^p = \underbrace{AA \cdots A}_{p\ \text{times}} \quad (p = 1, 2, \ldots); \qquad A^0 = E.$$

From the associative property of matrix multiplication it follows that

$$A^pA^q = A^{p+q}.$$

Here p and q are arbitrary non-negative integers.

⁷ It follows from the Binet-Cauchy formula that the minors of order p of C, for p > n (if minors of such orders exist), are all zero. In that case the right-hand side of (19) is to be replaced by zero. See footnote 5, p. 9.


We consider a polynomial (integral rational function) with coefficients in the field F:

$$f(t) = a_0t^m + a_1t^{m-1} + \cdots + a_m.$$

Then by f(A) we shall mean the matrix

$$f(A) = a_0A^m + a_1A^{m-1} + \cdots + a_mE.$$

We define in this way a polynomial in a matrix.

Suppose that f(t) is the product of two polynomials g(t) and h(t):

$$f(t) = g(t)h(t). \qquad (21)$$

The polynomial f(t) is obtained from g(t) and h(t) by multiplication term by term and collection of similar terms. In this we make use of the multiplication rule for powers: $t^p \cdot t^q = t^{p+q}$. Since all these operations remain valid when the scalar t is replaced by the matrix A, it follows from (21) that

$$f(A) = g(A)h(A).$$

Hence, in particular,⁸

$$g(A)h(A) = h(A)g(A); \qquad (22)$$

i.e., two polynomials in one and the same matrix are always permutable.

⁸ Since each of these products is equal to one and the same f(A), by virtue of the fact that h(t)g(t) = f(t). It is worth mentioning that the substitution of matrices in an algebraic identity in several variables is not valid. The substitution of matrices that commute with one another, however, is allowable in this case.
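A minimal sketch of a polynomial in a matrix, assuming Python with numpy (the function name poly_in_matrix is ours); it evaluates f(A) by Horner's scheme and confirms the commutation property (22):

    # Illustrative sketch (not from the book): f(A) by Horner's scheme.
    import numpy as np

    def poly_in_matrix(coeffs, A):
        """coeffs = [a0, a1, ..., am] in decreasing powers; returns f(A)."""
        n = A.shape[0]
        result = np.zeros((n, n))
        for a in coeffs:
            result = result @ A + a * np.eye(n)
        return result

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    g = poly_in_matrix([1.0, -2.0], A)        # g(A) = A - 2E
    h = poly_in_matrix([1.0, 0.0, 5.0], A)    # h(A) = A^2 + 5E
    assert np.allclose(g @ h, h @ g)          # polynomials in A commute, cf. (22)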

Examples. Let the sequence of elements $a_{ik}$ for which k − i = p (respectively, i − k = p) in a rectangular matrix $A = \|a_{ik}\|$ be called the p-th superdiagonal (subdiagonal) of the matrix. We denote by $H^{(n)}$ the square matrix of order n in which all the elements of the first superdiagonal are units and all the other elements are zero. The matrix $H^{(n)}$ will also be denoted simply by H. Then

$$H = H^{(n)} = \begin{Vmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{Vmatrix};$$

the power $H^p$ has units on the p-th superdiagonal and zeros elsewhere, and

$$H^p = 0 \qquad (p \ge n).$$


By these equations, if

$$f(t) = a_0 + a_1t + a_2t^2 + \cdots$$

is a polynomial in t, then

$$f(H) = a_0E + a_1H + a_2H^2 + \cdots = \begin{Vmatrix} a_0 & a_1 & a_2 & \cdots & a_{n-1} \\ 0 & a_0 & a_1 & \cdots & a_{n-2} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_0 \end{Vmatrix}.$$

Similarly, if F is the square matrix of order n in which all the elements of the first subdiagonal are units and all others are zero, then

$$f(F) = a_0E + a_1F + a_2F^2 + \cdots = \begin{Vmatrix} a_0 & 0 & \cdots & 0 \\ a_1 & a_0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ a_{n-1} & \cdots & a_1 & a_0 \end{Vmatrix}.$$

We leave it to the reader to verify the following properties of the matrices H and F:

1. When an arbitrary rectangular matrix A of dimension m × n is multiplied on the left by the matrix H (or F) of order m, then all the rows of A are shifted upward (downward) by one place, the first (last) row of A disappears, and the last (first) row of the product is filled with zeros. For example,

$$\begin{Vmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{Vmatrix}\begin{Vmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \end{Vmatrix} = \begin{Vmatrix} b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \\ 0 & 0 & 0 & 0 \end{Vmatrix}, \qquad \begin{Vmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{Vmatrix}\begin{Vmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \end{Vmatrix} = \begin{Vmatrix} 0 & 0 & 0 & 0 \\ a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \end{Vmatrix}.$$

2. When an arbitrary rectangular matrix A of dimension m × n is multiplied on the right by the matrix H (or F) of order n, then all the columns of A are shifted to the right (left) by one place, the last (first) column of A disappears, and the first (last) column of the product is filled with zeros. For example,


$$\begin{Vmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \end{Vmatrix}\begin{Vmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{Vmatrix} = \begin{Vmatrix} 0 & a_1 & a_2 & a_3 \\ 0 & b_1 & b_2 & b_3 \\ 0 & c_1 & c_2 & c_3 \end{Vmatrix}, \qquad \begin{Vmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \end{Vmatrix}\begin{Vmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{Vmatrix} = \begin{Vmatrix} a_2 & a_3 & a_4 & 0 \\ b_2 & b_3 & b_4 & 0 \\ c_2 & c_3 & c_4 & 0 \end{Vmatrix}.$$
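Both shift properties are easy to confirm numerically; a short sketch, assuming Python with numpy (an illustration, not the book's text):

    # Illustrative check: H @ A shifts rows upward, A @ H shifts columns right.
    import numpy as np

    def H(n):
        return np.eye(n, k=1)    # units on the first superdiagonal

    A = np.arange(12.0).reshape(3, 4)

    up = H(3) @ A                # rows shifted up, last row filled with zeros
    assert np.allclose(up[:2], A[1:]) and np.allclose(up[2], 0.0)

    right = A @ H(4)             # columns shifted right, first column zeros
    assert np.allclose(right[:, 1:], A[:, :3]) and np.allclose(right[:, 0], 0.0)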

2. A square matrix A is called singular if |A| = 0. Otherwise A is called non-singular.

Let $A = \|a_{ik}\|_1^n$ be a non-singular matrix (|A| ≠ 0). Let us consider the linear transformation with coefficient matrix A:

$$y_i = \sum_{k=1}^{n} a_{ik}x_k \qquad (i = 1, 2, \ldots, n). \qquad (23)$$

When we regard (23) as equations for $x_1, x_2, \ldots, x_n$ and observe that the determinant of the system of equations (23) is, by assumption, different from zero, we can express $x_1, x_2, \ldots, x_n$ in terms of $y_1, y_2, \ldots, y_n$ by means of the well-known formulas:

$$x_i = \frac{1}{|A|}\begin{vmatrix} a_{11} & \cdots & a_{1,i-1} & y_1 & a_{1,i+1} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2,i-1} & y_2 & a_{2,i+1} & \cdots & a_{2n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & \cdots & a_{n,i-1} & y_n & a_{n,i+1} & \cdots & a_{nn} \end{vmatrix} = \sum_{k=1}^{n} a_{ik}^{(-1)}y_k \qquad (i = 1, 2, \ldots, n). \qquad (24)$$

We have thus obtained the 'inverse' transformation of the transformation (23). The coefficient matrix of this transformation,

$$A^{-1} = \|a_{ik}^{(-1)}\|_1^n,$$

will be called the inverse matrix of A. From (24) it is easy to see that

$$a_{ik}^{(-1)} = \frac{A_{ki}}{|A|} \qquad (i, k = 1, 2, \ldots, n), \qquad (25)$$

where $A_{ki}$ is the algebraic complement (the cofactor) of the element $a_{ki}$ in the determinant |A| (i, k = 1, 2, …, n).


For example, if

$$A = \begin{Vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{Vmatrix} \quad\text{and}\quad |A| \ne 0,$$

then

$$A^{-1} = \frac{1}{|A|}\begin{Vmatrix} b_2c_3 - b_3c_2 & a_3c_2 - a_2c_3 & a_2b_3 - a_3b_2 \\ b_3c_1 - b_1c_3 & a_1c_3 - a_3c_1 & a_3b_1 - a_1b_3 \\ b_1c_2 - b_2c_1 & a_2c_1 - a_1c_2 & a_1b_2 - a_2b_1 \end{Vmatrix}.$$
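Formula (25) can also be coded directly; a minimal sketch, assuming Python with numpy (the helper names cofactor and inverse are ours):

    # Illustrative sketch (not from the book): inverse via cofactors, cf. (25).
    import numpy as np

    def cofactor(A, i, k):
        """(-1)^(i+k) times the minor with row i and column k deleted."""
        M = np.delete(np.delete(A, i, axis=0), k, axis=1)
        return (-1) ** (i + k) * np.linalg.det(M)

    def inverse(A):
        d = np.linalg.det(A)
        assert abs(d) > 1e-12, "A must be non-singular"
        n = A.shape[0]
        return np.array([[cofactor(A, k, i) / d for k in range(n)]
                         for i in range(n)])

    A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
    assert np.allclose(inverse(A) @ A, np.eye(3))   # cf. (26)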

By forming the composite transformation of the given transformation (23) and the inverse (24), in either order, we obtain in both cases the identity transformation (with the unit matrix as coefficient matrix); therefore

$$AA^{-1} = A^{-1}A = E. \qquad (26)$$

The validity of equation (26) can also be established by direct multiplication of the matrices A and A⁻¹. In fact, by (25) we have⁹

$$[AA^{-1}]_{ij} = \sum_{k=1}^{n} a_{ik}a_{kj}^{(-1)} = \frac{1}{|A|}\sum_{k=1}^{n} a_{ik}A_{jk} = \delta_{ij} \qquad (i, j = 1, 2, \ldots, n).$$

Similarly,

$$[A^{-1}A]_{ij} = \sum_{k=1}^{n} a_{ik}^{(-1)}a_{kj} = \frac{1}{|A|}\sum_{k=1}^{n} A_{ki}a_{kj} = \delta_{ij} \qquad (i, j = 1, 2, \ldots, n).$$

It is easy to see that the matrix equations

$$AX = E \quad\text{and}\quad XA = E \qquad (|A| \ne 0) \qquad (27)$$

have no solutions other than X = A⁻¹. For by multiplying both sides of the first (second) equation on the left (right) by A⁻¹ and using the associative property of matrix multiplication, we obtain from (26) in both cases:¹⁰

$$X = A^{-1}.$$

⁹ Here we make use of the well-known property of determinants that the sum of the products of the elements of an arbitrary column into the cofactors of the elements of that column is equal to the value of the determinant, and the sum of the products of the elements of a column into the cofactors of the corresponding elements of another column is zero.

¹⁰ If A is a singular matrix, then the equations (27) have no solution. For if one of these equations had a solution $X = \|x_{ik}\|$, then we would have by the multiplication theorem of determinants (see formula (18)) that $|A| \cdot |X| = |E| = 1$, and this is impossible when |A| = 0.


In the same way it can be shown that each of the matrix equations

$$AX = B, \qquad XA = B \qquad (|A| \ne 0), \qquad (28)$$

where X and B are rectangular matrices of equal dimensions and A is a square matrix of appropriate order, has one and only one solution,

$$X = A^{-1}B \quad\text{and}\quad X = BA^{-1}, \qquad (29)$$

respectively. The matrices (29) are the 'left' and the 'right' quotients on 'dividing' B by A. From (28) and (29) we deduce (see p. 12) that $r_B \le r_X$ and $r_X \le r_B$, so that $r_X = r_B$. On comparing this with (28), we have:

When a rectangular matrix is multiplied on the left or on the right by a non-singular matrix, the rank of the original matrix remains unchanged.

Note that (26) implies $|A| \cdot |A^{-1}| = 1$, i.e.,

$$|A^{-1}| = \frac{1}{|A|}.$$

For any two non-singular matrices we have

$$(AB)^{-1} = B^{-1}A^{-1}. \qquad (30)$$

3. All the matrices of order n form a ring¹¹ with unit element $E^{(n)}$. Since in this ring the operation of multiplication by a number of F is defined, and since there exists a basis of n² linearly independent matrices in terms of which all the matrices of order n can be expressed linearly,¹² the ring of matrices of order n is an algebra.¹³

¹¹ A ring is a collection of elements in which two operations are defined and can always be carried out uniquely: the 'addition' of two elements (with the commutative and associative properties) and the 'multiplication' of two elements (with the associative and distributive properties with respect to addition); moreover, the addition is reversible. See, for example, van der Waerden, Modern Algebra, § 14.

¹² For an arbitrary matrix $A = \|a_{ik}\|_1^n$ with elements in F can be represented in the form $A = \sum_{i,k=1}^{n} a_{ik}E_{ik}$, where $E_{ik}$ is the matrix of order n in which there is a 1 at the intersection of the i-th row and the k-th column and all the other elements are zeros.

¹³ See, for example, van der Waerden, Modern Algebra, § 17.


All the square matrices of order n form a commutative group with respect to the operation of addition.¹⁴ All the non-singular matrices of order n form a (non-commutative) group with respect to the operation of multiplication.

A square matrix $A = \|a_{ik}\|_1^n$ is called upper triangular (lower triangular) if all the elements below (above) the main diagonal are zero:

$$A = \begin{Vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{Vmatrix}, \qquad A = \begin{Vmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{Vmatrix}.$$

A diagonal matrix is a special case both of an upper triangular matrix and of a lower triangular matrix.

Since the determinant of a triangular matrix is equal to the product of its diagonal elements, a triangular (and, in particular, a diagonal) matrix is non-singular if and only if all its diagonal elements are different from zero.

It is easy to verify that the sum and the product of two diagonal (upper triangular, lower triangular) matrices is a diagonal (upper triangular, lower triangular) matrix and that the inverse of a non-singular diagonal (upper triangular, lower triangular) matrix is a matrix of the same type. Therefore:

1. All the diagonal matrices of order n form a commutative group under the operation of addition, as do all the upper triangular matrices or all the lower triangular matrices.

2. All the non-singular diagonal matrices form a commutative group under multiplication.

3. All the non-singular upper (lower) triangular matrices form a (non-commutative) group under multiplication.

4. We conclude this section with a further important operation on matrices: transposition.

¹⁴ A group is a set of objects in which an operation is defined which associates with any two elements a and b of the set a well-defined third element a ∗ b of the same set, provided that

1) the operation has the associative property ((a ∗ b) ∗ c = a ∗ (b ∗ c)),
2) there exists a unit element e in the set (a ∗ e = e ∗ a = a), and
3) for every element a of the set there exists an inverse element a⁻¹ (a ∗ a⁻¹ = a⁻¹ ∗ a = e).

A group is called commutative, or abelian, if the group operation has the commutative property. Concerning the group concept see, for example, [53], pp. 245 ff.


If $A = \|a_{ik}\|$ (i = 1, 2, …, m; k = 1, 2, …, n), then the transpose $A^T$ is defined as $A^T = \|a_{ik}^T\|$, where $a_{ik}^T = a_{ki}$. If A is of dimension m × n, then $A^T$ is of dimension n × m.

It is easy to verify the following properties:¹⁵

1. $(A + B)^T = A^T + B^T$,
2. $(\alpha A)^T = \alpha A^T$,
3. $(AB)^T = B^TA^T$,
4. $(A^{-1})^T = (A^T)^{-1}$.

If a square matrix $S = \|s_{ik}\|_1^n$ coincides with its transpose ($S^T = S$), then it is called symmetric. In a symmetric matrix, elements that are symmetrically placed with respect to the main diagonal are equal. Note that the product of two symmetric matrices is not, in general, symmetric. By 3., this holds if and only if the two given symmetric matrices are permutable.

If a square matrix $K = \|k_{ik}\|_1^n$ differs from its transpose by a factor −1 ($K^T = -K$), then it is called skew-symmetric. In a skew-symmetric matrix any two elements that are symmetrical to the main diagonal differ from each other by a factor −1, and the diagonal elements are zero. From 3. it follows that the product of two permutable skew-symmetric matrices is a symmetric matrix.¹⁶

§ 4. Compound Matrices. Minors of the Inverse Matrix

1. Let $A = \|a_{ik}\|_1^n$ be a given matrix. We consider all possible minors of A of order p (1 ≤ p ≤ n):

$$A\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} \qquad \left(1 \le i_1 < i_2 < \cdots < i_p \le n;\ 1 \le k_1 < k_2 < \cdots < k_p \le n\right). \qquad (31)$$

The number of these minors is N², where $N = \binom{n}{p}$ is the number of combinations of n objects taken p at a time. In order to arrange the minors (31) in a square array, we enumerate in some definite order (lexicographic order, for example) all the N combinations of p indices selected from among the indices 1, 2, …, n.

¹⁵ In formulas 1., 2., 3., A and B are arbitrary rectangular matrices for which the corresponding operations are feasible. In 4., A is an arbitrary square non-singular matrix.

¹⁶ As regards the representation of a square matrix A in the form of a product of two symmetric matrices ($A = S_1S_2$) or of two skew-symmetric matrices ($A = K_1K_2$), see [357].


If the combinations of indices $i_1 < i_2 < \cdots < i_p$ and $k_1 < k_2 < \cdots < k_p$ have the numbers α and β, then the minors (31) will also be denoted as follows:

$$a_{\alpha\beta} = A\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix}.$$

By giving α and β independently all the values from 1 to N, we obtain all the minors of $A = \|a_{ik}\|_1^n$ of order p.

The square matrix of order N,

$$\mathfrak{A}_p = \|a_{\alpha\beta}\|_1^N,$$

is called the p-th compound matrix of $A = \|a_{ik}\|_1^n$; p can take the values 1, 2, …, n. Here $\mathfrak{A}_1 = A$, and $\mathfrak{A}_n$ consists of the single element |A|.

Note. The order of enumeration of the combinations of indices is fixed once and for all and does not depend on the choice of A.

Example. Let

$$A = \begin{Vmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{Vmatrix}.$$

We enumerate all combinations of the indices 1, 2, 3, 4 taken two at a time by arranging them in the following order:

(12), (13), (14), (23), (24), (34).

Then, writing $A(i_1i_2;\,k_1k_2)$ for the minor $A\begin{pmatrix} i_1 & i_2 \\ k_1 & k_2 \end{pmatrix}$,

$$\mathfrak{A}_2 = \begin{Vmatrix}
A(12;12) & A(12;13) & A(12;14) & A(12;23) & A(12;24) & A(12;34) \\
A(13;12) & A(13;13) & A(13;14) & A(13;23) & A(13;24) & A(13;34) \\
A(14;12) & A(14;13) & A(14;14) & A(14;23) & A(14;24) & A(14;34) \\
A(23;12) & A(23;13) & A(23;14) & A(23;23) & A(23;24) & A(23;34) \\
A(24;12) & A(24;13) & A(24;14) & A(24;23) & A(24;24) & A(24;34) \\
A(34;12) & A(34;13) & A(34;14) & A(34;23) & A(34;24) & A(34;34)
\end{Vmatrix}.$$


We mention some properties of compound matrices:

1. From C = AB it follows that $\mathfrak{C}_p = \mathfrak{A}_p\mathfrak{B}_p$ (p = 1, 2, …, n).

For when we express the minors of order p (1 ≤ p ≤ n) of the matrix product C, by formula (19), in terms of the minors of the same order of the factors, we have:

$$C\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} = \sum_{1 \le l_1 < l_2 < \cdots < l_p \le n} A\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ l_1 & l_2 & \cdots & l_p \end{pmatrix} B\begin{pmatrix} l_1 & l_2 & \cdots & l_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} \qquad \left(1 \le i_1 < \cdots < i_p \le n;\ 1 \le k_1 < \cdots < k_p \le n\right). \qquad (32)$$

Obviously, in the notation of this section, equation (32) can be written as follows:

$$c_{\alpha\beta} = \sum_{\lambda=1}^{N} a_{\alpha\lambda}b_{\lambda\beta} \qquad (\alpha, \beta = 1, 2, \ldots, N)$$

(here α, β, and λ are the numbers of the combinations of indices $i_1 < i_2 < \cdots < i_p$; $k_1 < k_2 < \cdots < k_p$; $l_1 < l_2 < \cdots < l_p$). Hence

$$\mathfrak{C}_p = \mathfrak{A}_p\mathfrak{B}_p \qquad (p = 1, 2, \ldots, n).$$
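Property 1 can be confirmed numerically. A sketch, assuming Python with numpy (the function compound is ours, with combinations taken in a fixed lexicographic order):

    # Numeric check: the p-th compound of a product is the product of the
    # p-th compound matrices.
    from itertools import combinations

    import numpy as np

    def compound(A, p):
        idx = list(combinations(range(A.shape[0]), p))
        return np.array([[np.linalg.det(A[np.ix_(r, c)]) for c in idx]
                         for r in idx])

    rng = np.random.default_rng(2)
    A, B = rng.random((4, 4)), rng.random((4, 4))
    assert np.allclose(compound(A @ B, 2), compound(A, 2) @ compound(B, 2))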

2. From B = A⁻¹ it follows that $\mathfrak{B}_p = \mathfrak{A}_p^{-1}$ (p = 1, 2, …, n).

This result follows immediately from the preceding one when we set C = E and bear in mind that $\mathfrak{E}_p$ is the unit matrix of order $N = \binom{n}{p}$.

From 2. there follows an important formula that expresses the minors of the inverse matrix in terms of the minors of the given matrix:

If B = A⁻¹, then for arbitrary indices $\left(1 \le i_1 < i_2 < \cdots < i_p \le n;\ 1 \le k_1 < k_2 < \cdots < k_p \le n\right)$

$$B\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} = (-1)^{\sum_{\nu=1}^{p} i_\nu + \sum_{\nu=1}^{p} k_\nu}\,\frac{A\begin{pmatrix} k_1' & k_2' & \cdots & k_{n-p}' \\ i_1' & i_2' & \cdots & i_{n-p}' \end{pmatrix}}{A\begin{pmatrix} 1 & 2 & \cdots & n \\ 1 & 2 & \cdots & n \end{pmatrix}}, \qquad (33)$$

where $i_1 < i_2 < \cdots < i_p$ and $i_1' < i_2' < \cdots < i_{n-p}'$ form a complete system of indices 1, 2, …, n, as do $k_1 < k_2 < \cdots < k_p$ and $k_1' < k_2' < \cdots < k_{n-p}'$.

For it follows from AB = E that

$$\mathfrak{A}_p\mathfrak{B}_p = \mathfrak{E}_p,$$


or, in more explicit form:

$$\sum_{\lambda=1}^{N} a_{\gamma\lambda}b_{\lambda\delta} = \begin{cases} 1 & (\gamma = \delta), \\ 0 & (\gamma \ne \delta). \end{cases} \qquad (34)$$

Equations (34) can also be written as follows:

$$\sum_{1 \le k_1 < k_2 < \cdots < k_p \le n} A\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} B\begin{pmatrix} k_1 & k_2 & \cdots & k_p \\ j_1 & j_2 & \cdots & j_p \end{pmatrix} = \begin{cases} 1, & \text{if } \sum_{\nu=1}^{p}(j_\nu - i_\nu)^2 = 0, \\ 0, & \text{if } \sum_{\nu=1}^{p}(j_\nu - i_\nu)^2 > 0 \end{cases} \qquad \left(1 \le i_1 < i_2 < \cdots < i_p \le n;\ 1 \le j_1 < j_2 < \cdots < j_p \le n\right). \qquad (34')$$

On the other hand, when we apply the well-known Laplace expansion to the determinant |A|, we obtain

$$\sum_{1 \le k_1 < k_2 < \cdots < k_p \le n} (-1)^{\sum_{\nu=1}^{p} k_\nu + \sum_{\nu=1}^{p} j_\nu}\, A\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix} A\begin{pmatrix} j_1' & j_2' & \cdots & j_{n-p}' \\ k_1' & k_2' & \cdots & k_{n-p}' \end{pmatrix} = \begin{cases} |A|, & \text{if } \sum_{\nu=1}^{p}(j_\nu - i_\nu)^2 = 0, \\ 0, & \text{if } \sum_{\nu=1}^{p}(j_\nu - i_\nu)^2 > 0, \end{cases} \qquad (35)$$

where $i_1 < i_2 < \cdots < i_p$ and $i_1' < i_2' < \cdots < i_{n-p}'$ form a complete system of indices 1, 2, …, n, as do $k_1 < k_2 < \cdots < k_p$ and $k_1' < k_2' < \cdots < k_{n-p}'$. Comparison of (35) with (34') and (34) shows that the equations (34) are satisfied if we take for the element $B\begin{pmatrix} k_1 & \cdots & k_p \\ i_1 & \cdots & i_p \end{pmatrix}$ the quantity

$$(-1)^{\sum_{\nu=1}^{p} i_\nu + \sum_{\nu=1}^{p} k_\nu}\,\frac{A\begin{pmatrix} k_1' & \cdots & k_{n-p}' \\ i_1' & \cdots & i_{n-p}' \end{pmatrix}}{A\begin{pmatrix} 1 & 2 & \cdots & n \\ 1 & 2 & \cdots & n \end{pmatrix}}.$$

Since the elements $b_{\lambda\beta}$ of the inverse matrix of $\mathfrak{A}_p$ are uniquely determined by (34), equation (33) must hold.


CHAPTER II

THE ALGORITHM OF GAUSS AND SOME OF ITS APPLICATIONS

§ 1. Gauss's Elimination Method

1. Let

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= y_1, \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= y_2, \\ &\;\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= y_n \end{aligned} \qquad (1)$$

be a system of n linear equations in n unknowns $x_1, x_2, \ldots, x_n$ with right-hand sides $y_1, y_2, \ldots, y_n$.

In matrix form this system may be written as

$$Ax = y. \qquad (1')$$

Here $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ are columns and $A = \|a_{ik}\|_1^n$ is the square coefficient matrix.

If A is non-singular, then we can rewrite this as

$$x = A^{-1}y, \qquad (2)$$

or, in explicit form:

$$x_i = \sum_{k=1}^{n} a_{ik}^{(-1)}y_k \qquad (i = 1, 2, \ldots, n). \qquad (2')$$

Thus, the task of computing the elements of the inverse matrix $A^{-1} = \|a_{ik}^{(-1)}\|_1^n$ is equivalent to the task of solving the system of equations (1) for arbitrary right-hand sides $y_1, y_2, \ldots, y_n$. The elements of the inverse matrix are determined by the formulas (25) of Chapter I. However, the actual computation of the elements of A⁻¹ by these formulas is very tedious for large n. Therefore, effective methods of computing the elements of an inverse matrix, and hence of solving a system of linear equations, are of great practical value.¹

¹ For a detailed account of these methods, we refer the reader to the book by Faddeev [15] and the group of papers that appeared in Uspehi Mat. Nauk, Vol. 5, 3 (1950).


In the present chapter we expound the theoretical basis of some of these methods; they are variants of Gauss's elimination method, whose acquaintance the reader first made in his algebra course at school.

2. Suppose that in the system of equations (1) we have $a_{11} \ne 0$. We eliminate $x_1$ from all the equations beginning with the second by adding to the second equation the first multiplied by $-\frac{a_{21}}{a_{11}}$, to the third the first multiplied by $-\frac{a_{31}}{a_{11}}$, and so on. The system (1) has now been replaced by the equivalent system

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= y_1, \\ a_{22}^{(1)}x_2 + \cdots + a_{2n}^{(1)}x_n &= y_2^{(1)}, \\ &\;\vdots \\ a_{n2}^{(1)}x_2 + \cdots + a_{nn}^{(1)}x_n &= y_n^{(1)}. \end{aligned} \qquad (3)$$

The coefficients of the unknowns and the constant terms of the last n − 1 equations are given by the formulas

$$a_{ij}^{(1)} = a_{ij} - \frac{a_{i1}}{a_{11}}a_{1j}, \qquad y_i^{(1)} = y_i - \frac{a_{i1}}{a_{11}}y_1 \qquad (i, j = 2, \ldots, n). \qquad (3')$$

Suppose that $a_{22}^{(1)} \ne 0$. Then we eliminate $x_2$ in the same way from the last n − 2 equations of the system (3) and obtain the system

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= y_1, \\ a_{22}^{(1)}x_2 + a_{23}^{(1)}x_3 + \cdots + a_{2n}^{(1)}x_n &= y_2^{(1)}, \\ a_{33}^{(2)}x_3 + \cdots + a_{3n}^{(2)}x_n &= y_3^{(2)}, \\ &\;\vdots \\ a_{n3}^{(2)}x_3 + \cdots + a_{nn}^{(2)}x_n &= y_n^{(2)}. \end{aligned} \qquad (4)$$

The new coefficients and the new right-hand sides are connected with the preceding ones by the formulas

$$a_{ij}^{(2)} = a_{ij}^{(1)} - \frac{a_{i2}^{(1)}}{a_{22}^{(1)}}a_{2j}^{(1)}, \qquad y_i^{(2)} = y_i^{(1)} - \frac{a_{i2}^{(1)}}{a_{22}^{(1)}}y_2^{(1)} \qquad (i, j = 3, \ldots, n). \qquad (5)$$

Continuing the algorithm, we go in n − 1 steps from the original system (1) to the triangular recurrent system

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= y_1, \\ a_{22}^{(1)}x_2 + a_{23}^{(1)}x_3 + \cdots + a_{2n}^{(1)}x_n &= y_2^{(1)}, \\ &\;\vdots \\ a_{nn}^{(n-1)}x_n &= y_n^{(n-1)}. \end{aligned} \qquad (6)$$


This reduction can be carried out if and only if in the process all the numbers $a_{11}, a_{22}^{(1)}, a_{33}^{(2)}, \ldots, a_{n-1,n-1}^{(n-2)}$ turn out to be different from zero.

This algorithm of Gauss consists of operations of a simple type such as can easily be carried out by present-day computing machines.
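The forward elimination just described is short to code. A bare-bones sketch, assuming Python with numpy (the function name gauss_forward is ours, not the book's):

    # Illustrative sketch: n - 1 elimination steps bring Ax = y to the
    # triangular form (6), provided every pivot encountered is non-zero.
    import numpy as np

    def gauss_forward(A, y):
        A, y = A.astype(float).copy(), y.astype(float).copy()
        n = A.shape[0]
        for p in range(n - 1):
            assert abs(A[p, p]) > 1e-12, "zero pivot: the reduction breaks down"
            for i in range(p + 1, n):
                m = A[i, p] / A[p, p]     # the first row is added times -m
                A[i, p:] -= m * A[p, p:]
                y[i] -= m * y[p]
        return A, y

    A = np.array([[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]])
    U, z = gauss_forward(A, np.array([1.0, 2.0, 5.0]))
    x = np.linalg.solve(np.triu(U), z)    # back-substitution on the triangle
    assert np.allclose(A @ x, [1.0, 2.0, 5.0])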

3. Let us express the coefficients and the right-hand sides of the reduced system in terms of the coefficients and the right-hand sides of the original system (1). We shall not assume here that in the reduction process all the numbers $a_{11}, a_{22}^{(1)}, a_{33}^{(2)}, \ldots, a_{n-1,n-1}^{(n-2)}$ turn out to be different from zero; we consider the general case, in which the first p of these numbers are different from zero:

$$a_{11} \ne 0, \quad a_{22}^{(1)} \ne 0, \quad \ldots, \quad a_{pp}^{(p-1)} \ne 0 \qquad (p \le n - 1). \qquad (7)$$

This enables us (at the p-th step of the reduction) to put the original system of equations into the form

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots\cdots + a_{1n}x_n &= y_1, \\ a_{22}^{(1)}x_2 + \cdots\cdots + a_{2n}^{(1)}x_n &= y_2^{(1)}, \\ &\;\vdots \\ a_{pp}^{(p-1)}x_p + \cdots + a_{pn}^{(p-1)}x_n &= y_p^{(p-1)}, \\ a_{p+1,p+1}^{(p)}x_{p+1} + \cdots + a_{p+1,n}^{(p)}x_n &= y_{p+1}^{(p)}, \\ &\;\vdots \\ a_{n,p+1}^{(p)}x_{p+1} + \cdots + a_{nn}^{(p)}x_n &= y_n^{(p)}. \end{aligned} \qquad (8)$$

We denote the coefficient matrix of this system of equations by $G_p$:

$$G_p = \begin{Vmatrix}
a_{11} & a_{12} & \cdots & a_{1p} & a_{1,p+1} & \cdots & a_{1n} \\
0 & a_{22}^{(1)} & \cdots & a_{2p}^{(1)} & a_{2,p+1}^{(1)} & \cdots & a_{2n}^{(1)} \\
\vdots & & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & a_{pp}^{(p-1)} & a_{p,p+1}^{(p-1)} & \cdots & a_{pn}^{(p-1)} \\
0 & 0 & \cdots & 0 & a_{p+1,p+1}^{(p)} & \cdots & a_{p+1,n}^{(p)} \\
\vdots & & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & a_{n,p+1}^{(p)} & \cdots & a_{nn}^{(p)}
\end{Vmatrix}. \qquad (9)$$

The transition from A to $G_p$ is effected as follows: to every row of A in succession from the second to the n-th there are added some preceding rows (from the first p) multiplied by certain factors. Therefore all the minors of order h contained in the first h rows of A and of $G_p$ are equal:

$$A\begin{pmatrix} 1 & 2 & \cdots & h \\ k_1 & k_2 & \cdots & k_h \end{pmatrix} = G_p\begin{pmatrix} 1 & 2 & \cdots & h \\ k_1 & k_2 & \cdots & k_h \end{pmatrix} \qquad (1 \le k_1 < k_2 < \cdots < k_h \le n;\ h = 1, 2, \ldots, n). \qquad (10)$$


From these formulas we find, by taking into account the structure (9) of $G_p$:

$$A\begin{pmatrix} 1 & 2 & \cdots & p \\ 1 & 2 & \cdots & p \end{pmatrix} = a_{11}a_{22}^{(1)}\cdots a_{pp}^{(p-1)}, \qquad (11)$$

$$A\begin{pmatrix} 1 & 2 & \cdots & p & i \\ 1 & 2 & \cdots & p & k \end{pmatrix} = a_{11}a_{22}^{(1)}\cdots a_{pp}^{(p-1)}a_{ik}^{(p)} \qquad (i, k = p + 1, \ldots, n). \qquad (12)$$

When we divide the second of these equations by the first, we obtain the fundamental formulas²

$$a_{ik}^{(p)} = \frac{A\begin{pmatrix} 1 & 2 & \cdots & p & i \\ 1 & 2 & \cdots & p & k \end{pmatrix}}{A\begin{pmatrix} 1 & 2 & \cdots & p \\ 1 & 2 & \cdots & p \end{pmatrix}} \qquad (i, k = p + 1, \ldots, n). \qquad (13)$$

If the conditions (7) hold for a given value of p, then they also hold for every smaller value of p. Therefore the formulas (13) are valid not only for the given value of p but also for all smaller values of p. The same holds true of (11). Hence instead of this formula we can write the equations

$$A\begin{pmatrix} 1 \\ 1 \end{pmatrix} = a_{11}, \quad A\begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix} = a_{11}a_{22}^{(1)}, \quad A\begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix} = a_{11}a_{22}^{(1)}a_{33}^{(2)}, \quad \ldots \qquad (14)$$

Thus, the conditions (7), i.e., the necessary and sufficient conditions for the feasibility of the first p steps in Gauss's algorithm, can be written in the form of the following inequalities:

$$A\begin{pmatrix} 1 \\ 1 \end{pmatrix} \ne 0, \quad A\begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix} \ne 0, \quad \ldots, \quad A\begin{pmatrix} 1 & 2 & \cdots & p \\ 1 & 2 & \cdots & p \end{pmatrix} \ne 0. \qquad (15)$$

From (14) we then find:

$$a_{11} = A\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad a_{22}^{(1)} = \frac{A\begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}}{A\begin{pmatrix} 1 \\ 1 \end{pmatrix}}, \quad a_{33}^{(2)} = \frac{A\begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}}{A\begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}}, \quad \ldots, \quad a_{pp}^{(p-1)} = \frac{A\begin{pmatrix} 1 & 2 & \cdots & p \\ 1 & 2 & \cdots & p \end{pmatrix}}{A\begin{pmatrix} 1 & 2 & \cdots & p-1 \\ 1 & 2 & \cdots & p-1 \end{pmatrix}}. \qquad (16)$$

In order to eliminate $x_1, x_2, \dots, x_p$ consecutively by Gauss's algorithm it is necessary that all the values (16) should be different from zero, i.e., that the inequalities (15) should hold. However, the formulas for $a_{ik}^{(p)}$ make sense if only the last of the conditions (15) holds.

2 See [181], p. 89.
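The fundamental formulas (13) are easy to check numerically. The following sketch (the test matrix and the step count p are chosen for illustration) runs p steps of the elimination and compares each $a_{ik}^{(p)}$ with the corresponding ratio of bordered minors:

```python
import numpy as np

def minor(A, rows, cols):
    """Minor A(rows | cols); indices are 0-based here, 1-based in the text."""
    return np.linalg.det(A[np.ix_(rows, cols)])

A = np.array([[2., 1, 1, 0],
              [1, 3, 2, 1],
              [1, 2, 4, 2],
              [0, 1, 2, 5]])   # its leading principal minors are all non-zero
n, p = 4, 2

G = A.copy()                   # run p steps of Gauss's algorithm
for k in range(p):
    for i in range(k + 1, n):
        G[i, k:] -= G[i, k] / G[k, k] * G[k, k:]

head = list(range(p))          # the rows/columns 1, 2, ..., p of the text
den = minor(A, head, head)
for i in range(p, n):
    for k in range(p, n):      # formula (13)
        assert abs(G[i, k] - minor(A, head + [i], head + [k]) / den) < 1e-9
```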


4. Suppose the coefficient matrix of the system of equations (1) to be of rank r. Then, by a suitable permutation of the equations and a renumbering of the unknowns, we can arrange that the following inequalities hold:

$$A\begin{pmatrix}1 & 2 & \cdots & j\\ 1 & 2 & \cdots & j\end{pmatrix} \neq 0 \qquad (j = 1, 2, \dots, r). \qquad(17)$$

This enables us to eliminate $x_1, x_2, \dots, x_r$ consecutively and to obtain the system of equations

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots\cdots + a_{1n}x_n &= y_1,\\
a_{22}^{(1)}x_2 + \cdots\cdots + a_{2n}^{(1)}x_n &= y_2^{(1)},\\
&\;\vdots\\
a_{rr}^{(r-1)}x_r + \cdots + a_{rn}^{(r-1)}x_n &= y_r^{(r-1)},\\
a_{r+1,r+1}^{(r)}x_{r+1} + \cdots + a_{r+1,n}^{(r)}x_n &= y_{r+1}^{(r)},\\
&\;\vdots\\
a_{n,r+1}^{(r)}x_{r+1} + \cdots + a_{nn}^{(r)}x_n &= y_n^{(r)}.
\end{aligned} \qquad(18)$$

Here the coefficients are determined by the formulas (13). From these formulas it follows, because the rank of the matrix $A = \|a_{ik}\|_1^n$ is equal to r, that

$$a_{ik}^{(r)} = 0 \qquad (i, k = r+1, \dots, n). \qquad(19)$$

Therefore the last $n - r$ equations of (18) reduce to the consistency conditions

$$y_i^{(r)} = 0 \qquad (i = r+1, \dots, n). \qquad(20)$$

Note that in the elimination algorithm the column of constant terms is subjected to the same transformations as the other columns of coefficients. Therefore, by supplementing the matrix $A = \|a_{ik}\|_1^n$ with an (n + 1)-th column of the constant terms we obtain:

$$y_i^{(p)} = \frac{A\begin{pmatrix}1 & \cdots & p & i\\ 1 & \cdots & p & n+1\end{pmatrix}}{A\begin{pmatrix}1 & \cdots & p\\ 1 & \cdots & p\end{pmatrix}} \qquad (i = p+1, \dots, n;\ p = 1, 2, \dots, r). \qquad(21)$$

In particular, the consistency conditions (20) reduce to the well-known equations

$$A\begin{pmatrix}1 & \cdots & r & i\\ 1 & \cdots & r & n+1\end{pmatrix} = 0 \qquad (i = r+1, \dots, n). \qquad(22)$$


If r = n, i.e. if the matrix $A = \|a_{ik}\|_1^n$ is non-singular, and

$$A\begin{pmatrix}1 & 2 & \cdots & j\\ 1 & 2 & \cdots & j\end{pmatrix} \neq 0 \qquad (j = 1, 2, \dots, n),$$

then we can eliminate $x_1, x_2, \dots, x_{n-1}$ in succession by means of Gauss's algorithm and reduce the system of equations to the form (6).

§ 2. Mechanical Interpretation of Gauss's Algorithm

1. We consider an arbitrary elastic statical system S supported at the edges (for example, a string, a rod, a multispan rod, a membrane, a lamina, or a discrete system) and choose n points (1), (2), ..., (n) on it. We shall consider the displacements (sags) $y_1, y_2, \dots, y_n$ of the points (1), (2), ..., (n) of S under the action of forces $F_1, F_2, \dots, F_n$ applied at these points.

Fig. 1

Fig. 2

We assume that the forces and the displacements are parallel to one and the same direction and are determined, therefore, by their algebraic magnitudes (Fig. 1). Moreover, we assume the principle of linear superposition of forces:

1. Under the combined action of two systems of forces the corresponding displacements are added together.

2. When the magnitudes of all the forces are multiplied by one and the same real number, then all the displacements are multiplied by the same number.


We denote by $a_{ik}$ the coefficient of influence of the point (k) on the point (i), i.e., the displacement of (i) under the action of a unit force applied at (k) (i, k = 1, 2, ..., n) (Fig. 2). Then under the combined action of the forces $F_1, F_2, \dots, F_n$ the displacements $y_1, y_2, \dots, y_n$ are determined by the formulas

$$\sum_{k=1}^{n} a_{ik}F_k = y_i \qquad (i = 1, 2, \dots, n). \qquad(23)$$

Comparing (23) with the original system (1), we can interpret the task of solving the system of equations (1) as follows:

The displacements $y_1, y_2, \dots, y_n$ being given, we are required to find the corresponding forces $F_1, F_2, \dots, F_n$.

We denote by $S_p$ the statical system that is obtained from S by introducing p fixed hinged supports at the points (1), (2), ..., (p) (p < n). We denote the coefficients of influence for the remaining movable points (p + 1), ..., (n) of the system $S_p$ by

$$a_{ik}^{(p)} \qquad (i, k = p+1, \dots, n)$$

(see Fig. 3 for p = 1).

Fig. 3

The coefficient $a_{ik}^{(p)}$ can be regarded as the displacement at the point (i) of S under the action of a unit force at (k) and of the reactions $R_1, R_2, \dots, R_p$ at the fixed points (1), (2), ..., (p). Therefore

$$a_{ik}^{(p)} = R_1 a_{i1} + \cdots + R_p a_{ip} + a_{ik}. \qquad(24)$$

On the other hand, under the same forces the displacements of the system S at the points (1), (2), ..., (p) are zero:

$$\begin{aligned}
R_1 a_{11} + \cdots + R_p a_{1p} + a_{1k} &= 0,\\
&\;\vdots\\
R_1 a_{p1} + \cdots + R_p a_{pp} + a_{pk} &= 0.
\end{aligned} \qquad(25)$$


If

$$A\begin{pmatrix}1 & 2 & \cdots & p\\ 1 & 2 & \cdots & p\end{pmatrix} \neq 0,$$

then we can determine $R_1, R_2, \dots, R_p$ from (25) and substitute the expressions so obtained in (24). This elimination of $R_1, R_2, \dots, R_p$ can be carried out as follows. To the system of equations (25) we adjoin (24) written in the form

$$R_1 a_{i1} + \cdots + R_p a_{ip} + a_{ik} - a_{ik}^{(p)} = 0. \qquad(24')$$

Regarding (25) and (24') as a system of p + 1 homogeneous equations with the non-zero solution $R_1, R_2, \dots, R_p, R_{p+1} = 1$, we see that the determinant of the system must be zero:

$$\begin{vmatrix}
a_{11} & \cdots & a_{1p} & a_{1k}\\
\vdots & & \vdots & \vdots\\
a_{p1} & \cdots & a_{pp} & a_{pk}\\
a_{i1} & \cdots & a_{ip} & a_{ik} - a_{ik}^{(p)}
\end{vmatrix} = 0.$$

Hence

$$a_{ik}^{(p)} = \frac{A\begin{pmatrix}1 & 2 & \cdots & p & i\\ 1 & 2 & \cdots & p & k\end{pmatrix}}{A\begin{pmatrix}1 & 2 & \cdots & p\\ 1 & 2 & \cdots & p\end{pmatrix}} \qquad (i, k = p+1, \dots, n). \qquad(26)$$

These formulas express the coefficients of influence of the 'support' system $S_p$ in terms of those of the original system S.

But formulas (26) coincide with formulas (13) of the preceding section. Therefore for every p (< n - 1) the coefficients $a_{ik}^{(p)}$ (i, k = p + 1, ..., n) in the algorithm of Gauss are the coefficients of influence of the support system $S_p$.

The truth of this fundamental proposition can also be ascertained by purely mechanical considerations without recourse to the algebraic derivation of formulas (13). For this purpose we consider, to begin with, the special case of a single support: p = 1 (Fig. 3). In this case, the coefficients of influence of the system $S_1$ are given by the formulas (we put p = 1 in (26)):

$$a_{ik}^{(1)} = \frac{A\begin{pmatrix}1 & i\\ 1 & k\end{pmatrix}}{A\begin{pmatrix}1\\1\end{pmatrix}} = \frac{a_{11}a_{ik} - a_{i1}a_{1k}}{a_{11}} = a_{ik} - \frac{a_{i1}}{a_{11}}\,a_{1k} \qquad (i, k = 2, \dots, n).$$

These formulas coincide with the formulas (2).


Thus, if the coefficients $a_{ik}$ (i, k = 1, 2, ..., n) in the system of equations (1) are the coefficients of influence of the statical system S, then the coefficients $a_{ik}^{(1)}$ (i, k = 2, ..., n) in Gauss's algorithm are the coefficients of influence of the system $S_1$. Applying the same reasoning to the system $S_1$ and introducing a second support at the point (2) in this system, we see that the coefficients $a_{ik}^{(2)}$ (i, k = 3, ..., n) in the system of equations (4) are the coefficients of influence of the support system $S_2$ and, in general, for every p (< n - 1) the coefficients $a_{ik}^{(p)}$ (i, k = p + 1, ..., n) in Gauss's algorithm are the coefficients of influence of the support system $S_p$.

From mechanical considerations it is clear that the successive introduction of p supports is equivalent to the simultaneous introduction of these supports.

Note. We wish to point out that in the mechanical interpretation of the elimination algorithm it was not necessary to assume that the points at which the displacements are investigated coincide with the points at which the forces $F_1, F_2, \dots, F_n$ are applied. We can assume that $y_1, y_2, \dots, y_n$ are the displacements of the points (1), (2), ..., (n) and that the forces $F_1, F_2, \dots, F_n$ are applied at the points (1'), (2'), ..., (n'). Then $a_{ik}$ is the coefficient of influence of the point (k') on the point (i). In that case we must consider instead of the support at the point (j) a generalized support at the points (j), (j') under which the displacement at the point (j) is maintained all the time equal to zero at the expense of a suitably chosen auxiliary force $R_j$ at the point (j'). The conditions that allow us to introduce p generalized supports at the points (1), (1'); (2), (2'); ...; (p), (p'), i.e., that allow us to satisfy the conditions $y_1 = 0, y_2 = 0, \dots, y_p = 0$ for arbitrary $F_{p+1}, \dots, F_n$ at the expense of suitable $R_1 = F_1, \dots, R_p = F_p$, can be expressed by the inequality

$$A\begin{pmatrix}1 & 2 & \cdots & p\\ 1 & 2 & \cdots & p\end{pmatrix} \neq 0.$$

§ 3. Sylvester's Determinant Identity

1. In § 1, a comparison of the matrices A and $G_p$ led to equations (10) and (11).

These equations enable us to give an easy proof of the important determinant identity of Sylvester. For from (10) and (11) we find:

$$|A| = A\begin{pmatrix}1 & 2 & \cdots & n\\ 1 & 2 & \cdots & n\end{pmatrix} = A\begin{pmatrix}1 & 2 & \cdots & p\\ 1 & 2 & \cdots & p\end{pmatrix}\begin{vmatrix}
a_{p+1,p+1}^{(p)} & \cdots & a_{p+1,n}^{(p)}\\
\vdots & & \vdots\\
a_{n,p+1}^{(p)} & \cdots & a_{nn}^{(p)}
\end{vmatrix}. \qquad(27)$$


We introduce borderings of the minor $A\begin{pmatrix}1 & 2 & \cdots & p\\ 1 & 2 & \cdots & p\end{pmatrix}$ by the determinants

$$b_{ik} = A\begin{pmatrix}1 & 2 & \cdots & p & i\\ 1 & 2 & \cdots & p & k\end{pmatrix} \qquad (i, k = p+1, \dots, n).$$

The matrix formed from these determinants will be denoted by

$$B = \|b_{ik}\|_{p+1}^{n}.$$

Then by formulas (13)

$$\begin{vmatrix}
b_{p+1,p+1} & \cdots & b_{p+1,n}\\
\vdots & & \vdots\\
b_{n,p+1} & \cdots & b_{nn}
\end{vmatrix} = \left[A\begin{pmatrix}1 & 2 & \cdots & p\\ 1 & 2 & \cdots & p\end{pmatrix}\right]^{\,n-p}\begin{vmatrix}
a_{p+1,p+1}^{(p)} & \cdots & a_{p+1,n}^{(p)}\\
\vdots & & \vdots\\
a_{n,p+1}^{(p)} & \cdots & a_{nn}^{(p)}
\end{vmatrix}.$$

Therefore equation (27) can be rewritten as follows:

$$|B| = \left[A\begin{pmatrix}1 & 2 & \cdots & p\\ 1 & 2 & \cdots & p\end{pmatrix}\right]^{\,n-p-1}|A|. \qquad(28)$$

This is Sylvester's determinant identity. It expresses the determinant |B| formed from the bordered determinants in terms of the original determinant and the bordered minor.

We have established equation (28) for a matrix $A = \|a_{ik}\|_1^n$ whose elements satisfy the inequalities

$$A\begin{pmatrix}1 & 2 & \cdots & j\\ 1 & 2 & \cdots & j\end{pmatrix} \neq 0 \qquad (j = 1, 2, \dots, p). \qquad(29)$$

However, we can show by a 'continuity argument' that this restriction may be removed and that Sylvester's identity holds for an arbitrary matrix $A = \|a_{ik}\|_1^n$. For suppose that the inequalities (29) do not hold. We introduce the matrix

$$A_\varepsilon = A + \varepsilon E.$$

Obviously $\lim_{\varepsilon \to 0} A_\varepsilon = A$. On the other hand, the minors

$$A_\varepsilon\begin{pmatrix}1 & 2 & \cdots & j\\ 1 & 2 & \cdots & j\end{pmatrix} = \varepsilon^j + \cdots \qquad (j = 1, 2, \dots, p)$$


are p polynomials in $\varepsilon$ that do not vanish identically. Therefore we can choose a sequence $\varepsilon_m \to 0$ such that

$$A_{\varepsilon_m}\begin{pmatrix}1 & 2 & \cdots & j\\ 1 & 2 & \cdots & j\end{pmatrix} \neq 0 \qquad (j = 1, 2, \dots, p;\ m = 1, 2, \dots).$$

We can write down the identity (28) for the matrices $A_{\varepsilon_m}$. Taking the limit $m \to \infty$ on both sides of this identity, we obtain Sylvester's identity for the limit matrix3 $A = \lim_{m\to\infty} A_{\varepsilon_m}$.

If we apply the identity (28) to the determinant

$$A\begin{pmatrix}1 & 2 & \cdots & p & i_1 & i_2 & \cdots & i_q\\ 1 & 2 & \cdots & p & k_1 & k_2 & \cdots & k_q\end{pmatrix} \qquad \left(\begin{matrix} p < i_1 < i_2 < \cdots < i_q \leq n\\ p < k_1 < k_2 < \cdots < k_q \leq n\end{matrix}\right),$$

then we obtain a form of Sylvester's identity particularly convenient for applications:

$$B\begin{pmatrix}i_1 & i_2 & \cdots & i_q\\ k_1 & k_2 & \cdots & k_q\end{pmatrix} = \left[A\begin{pmatrix}1 & 2 & \cdots & p\\ 1 & 2 & \cdots & p\end{pmatrix}\right]^{\,q-1} A\begin{pmatrix}1 & 2 & \cdots & p & i_1 & i_2 & \cdots & i_q\\ 1 & 2 & \cdots & p & k_1 & k_2 & \cdots & k_q\end{pmatrix}. \qquad(30)$$

§ 4. The Decomposition of a Square Matrix into Triangular Factors

1. Let $A = \|a_{ik}\|_1^n$ be a given matrix of rank r. We introduce the following notation for the successive principal minors of the matrix:

$$D_k = A\begin{pmatrix}1 & 2 & \cdots & k\\ 1 & 2 & \cdots & k\end{pmatrix} \qquad (k = 1, 2, \dots, n).$$

Let us assume that the conditions for the feasibility of Gauss's algorithm are satisfied:

$$D_k \neq 0 \qquad (k = 1, 2, \dots, r).$$

We denote by G the coefficient matrix of the system of equations (18) to which the system

$$\sum_{k=1}^{n} a_{ik}x_k = y_i \qquad (i = 1, 2, \dots, n)$$

3 By the limit (for $p \to \infty$) of a sequence of matrices $X_p = \|x_{ik}^{(p)}\|_1^n$ we mean the matrix $X = \|x_{ik}\|_1^n$, where $x_{ik} = \lim_{p\to\infty} x_{ik}^{(p)}$ (i, k = 1, 2, ..., n).


has been reduced by the elimination method of Gauss. The matrix G is of upper triangular form and the elements of its first r rows are determined by the formulas (13), while the elements of the last n - r rows are all equal to zero:4

$$G = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1r} & a_{1,r+1} & \cdots & a_{1n}\\
0 & a_{22}^{(1)} & \cdots & a_{2r}^{(1)} & a_{2,r+1}^{(1)} & \cdots & a_{2n}^{(1)}\\
\vdots & & \ddots & & & & \vdots\\
0 & 0 & \cdots & a_{rr}^{(r-1)} & a_{r,r+1}^{(r-1)} & \cdots & a_{rn}^{(r-1)}\\
0 & 0 & \cdots & 0 & 0 & \cdots & 0\\
\vdots & & & & & & \vdots\\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}.$$

The transition from A to G is effected by a certain number N of operations of the following type: to the i-th row of the matrix we add the j-th row (j < i), after a preliminary multiplication by some number $\alpha$. Such an operation is equivalent to the multiplication on the left of the matrix to be transformed by the matrix

$$\begin{pmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0\\
\vdots & & \vdots & & \vdots & & \vdots\\
0 & \cdots & 1 & \cdots & 0 & \cdots & 0\\
\vdots & & \vdots & & \vdots & & \vdots\\
0 & \cdots & \alpha & \cdots & 1 & \cdots & 0\\
\vdots & & \vdots & & \vdots & & \vdots\\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{pmatrix} \qquad(31)$$

(the element $\alpha$ stands in the i-th row and j-th column).

In this matrix the main diagonal consists entirely of units, and all the remaining elements, except $\alpha$, are zero.

Thus,

$$G = W_N \cdots W_2 W_1 A,$$

where each matrix $W_1, W_2, \dots, W_N$ is of the form (31) and is therefore a lower triangular matrix with diagonal elements equal to 1.

4 See formulas (19). G coincides with the matrix $G_p$ (p. 25) for p = r.


Let

$$W = W_N \cdots W_2 W_1. \qquad(32)$$

Then

$$G = WA. \qquad(33)$$

We shall call W the transforming matrix for A in Gauss's elimination method. Both matrices G and W are uniquely determined by A. From (32) it follows that W is lower triangular with diagonal elements equal to 1.

Since W is non-singular, we obtain from (33):

$$A = W^{-1}G. \qquad(33')$$

We have thus represented A in the form of a product of a lower triangular matrix $W^{-1}$ and an upper triangular matrix G. The problem of decomposing a matrix A into factors of this type is completely answered by the following theorem:

THEOREM 1: Every matrix $A = \|a_{ik}\|_1^n$ of rank r in which the first r successive principal minors are different from zero,

$$D_k = A\begin{pmatrix}1 & 2 & \cdots & k\\ 1 & 2 & \cdots & k\end{pmatrix} \neq 0 \quad\text{for } k = 1, 2, \dots, r, \qquad(34)$$

can be represented in the form of a product of a lower triangular matrix B and an upper triangular matrix C:

$$A = BC = \begin{pmatrix}
b_{11} & 0 & \cdots & 0\\
b_{21} & b_{22} & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
b_{n1} & b_{n2} & \cdots & b_{nn}
\end{pmatrix}\begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1n}\\
0 & c_{22} & \cdots & c_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & c_{nn}
\end{pmatrix}. \qquad(35)$$

Here

$$b_{11}c_{11} = D_1,\quad b_{22}c_{22} = \frac{D_2}{D_1},\quad \dots,\quad b_{rr}c_{rr} = \frac{D_r}{D_{r-1}}. \qquad(36)$$

The values of the first r diagonal elements of B and C can be chosen arbitrarily subject to the conditions (36).

When the first r diagonal elements of B and C are given, then the elements of the first r columns of B and of the first r rows of C are uniquely determined, and are given by the following formulas:

$$b_{gk} = b_{kk}\,\frac{A\begin{pmatrix}1 & 2 & \cdots & k-1 & g\\ 1 & 2 & \cdots & k-1 & k\end{pmatrix}}{A\begin{pmatrix}1 & 2 & \cdots & k\\ 1 & 2 & \cdots & k\end{pmatrix}},\qquad c_{kg} = c_{kk}\,\frac{A\begin{pmatrix}1 & 2 & \cdots & k-1 & k\\ 1 & 2 & \cdots & k-1 & g\end{pmatrix}}{A\begin{pmatrix}1 & 2 & \cdots & k\\ 1 & 2 & \cdots & k\end{pmatrix}} \qquad(37)$$

$$(g = k, k+1, \dots, n;\ k = 1, 2, \dots, r).$$


If r < n (|A| = 0), then all the elements in the last n - r columns of B can be put equal to zero and all the elements of the last n - r rows of C can be chosen arbitrarily; or, conversely, the last n - r rows of C can be filled with zeros and the last n - r columns of B can be chosen arbitrarily.

Proof. That a representation of a matrix satisfying conditions (34) can be given in the form of a product (35) has been proved above (see (33')).

Now let B and C be arbitrary lower and upper triangular matrices whose product is A. Making use of the formulas for the minors of the product of two matrices we find:

$$A\begin{pmatrix}1 & 2 & \cdots & k-1 & g\\ 1 & 2 & \cdots & k-1 & k\end{pmatrix} = \sum_{\alpha_1 < \alpha_2 < \cdots < \alpha_k} B\begin{pmatrix}1 & 2 & \cdots & k-1 & g\\ \alpha_1 & \alpha_2 & \cdots & \alpha_{k-1} & \alpha_k\end{pmatrix} C\begin{pmatrix}\alpha_1 & \alpha_2 & \cdots & \alpha_k\\ 1 & 2 & \cdots & k\end{pmatrix} \qquad(38)$$

$$(g = k, k+1, \dots, n;\ k = 1, 2, \dots, r).$$

Since C is an upper triangular matrix, the first k columns of C contain only one non-vanishing minor of order k, namely $C\begin{pmatrix}1 & 2 & \cdots & k\\ 1 & 2 & \cdots & k\end{pmatrix}$. Therefore, equation (38) can be written as follows:

$$A\begin{pmatrix}1 & 2 & \cdots & k-1 & g\\ 1 & 2 & \cdots & k-1 & k\end{pmatrix} = B\begin{pmatrix}1 & 2 & \cdots & k-1 & g\\ 1 & 2 & \cdots & k-1 & k\end{pmatrix} C\begin{pmatrix}1 & 2 & \cdots & k\\ 1 & 2 & \cdots & k\end{pmatrix} = b_{11}b_{22} \cdots b_{k-1,k-1}b_{gk}c_{11}c_{22} \cdots c_{kk} \qquad(39)$$

$$(g = k, k+1, \dots, n;\ k = 1, 2, \dots, r).$$

We put g = k in this equation, obtaining

$$b_{11}b_{22} \cdots b_{kk}c_{11}c_{22} \cdots c_{kk} = D_k \qquad (k = 1, 2, \dots, r), \qquad(40)$$

and relations (36) follow.

Without violating equation (35) we may multiply the matrix B in that equation on the right by an arbitrary non-singular diagonal matrix $M = \|\mu_i\delta_{ik}\|_1^n$, while multiplying C at the same time on the left by $M^{-1} = \|\mu_i^{-1}\delta_{ik}\|_1^n$. But this is equivalent to multiplying the columns of B by $\mu_1, \mu_2, \dots, \mu_n$, respectively, and the rows of C by $\mu_1^{-1}, \mu_2^{-1}, \dots, \mu_n^{-1}$. We may therefore give arbitrary values to the diagonal elements $b_{11}, b_{22}, \dots, b_{rr}$ and $c_{11}, c_{22}, \dots, c_{rr}$, provided they satisfy (36).

Further, from (39) and (40) we find:

$$b_{gk} = b_{kk}\,\frac{A\begin{pmatrix}1 & 2 & \cdots & k-1 & g\\ 1 & 2 & \cdots & k-1 & k\end{pmatrix}}{A\begin{pmatrix}1 & 2 & \cdots & k\\ 1 & 2 & \cdots & k\end{pmatrix}} \qquad (g = k, k+1, \dots, n;\ k = 1, 2, \dots, r),$$

i.e., the first formulas in (37). The second formulas in (37), for the elements of C, are established similarly.


We observe that in the multiplication of B and C the elements $b_{gk}$ of the last n - r columns of B and the elements $c_{gk}$ of the last n - r rows of C are multiplied only among each other. We have seen that all the elements of the last n - r rows of C may be chosen to be zero.5 But as a consequence, the elements of the last n - r columns of B may be chosen arbitrarily. Clearly the product of B and C does not change if we choose the last n - r columns of B to be zeros and choose the elements of the last n - r rows of

C arbitrarily.

This completes the proof of the theorem.

From this theorem there follow a number of interesting corollaries.

COROLLARY 1: The elements of the first r columns of B and the first r rows of C are connected with the elements of A by the recurrence relations

$$b_{ik} = \frac{a_{ik} - \sum_{t=1}^{k-1} b_{it}c_{tk}}{c_{kk}} \qquad (i \geq k;\ i = 1, 2, \dots, n;\ k = 1, 2, \dots, r),$$

$$c_{ik} = \frac{a_{ik} - \sum_{t=1}^{i-1} b_{it}c_{tk}}{b_{ii}} \qquad (i \leq k;\ i = 1, 2, \dots, r;\ k = 1, 2, \dots, n). \qquad(41)$$

The relations (41) follow immediately from the matrix equation (35); they can be used to advantage in the actual computation of the elements of B and C.
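For instance, here is a minimal sketch of this computation via the recurrences (41), with the normalization $c_{kk} = 1$ (one of the admissible choices under (36); the helper name is ours):

```python
import numpy as np

def lu_by_recurrence(A):
    """Compute B (lower) and C (upper) with A = BC via the relations (41).

    The diagonal of C is normalized to c_kk = 1; by (36) this forces
    b_kk = D_k / D_{k-1}.  Any other choice satisfying (36) would do.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    B = np.zeros((n, n))
    C = np.eye(n)
    for k in range(n):
        for i in range(k, n):        # first formula of (41): k-th column of B
            B[i, k] = A[i, k] - B[i, :k] @ C[:k, k]
        for g in range(k + 1, n):    # second formula of (41): k-th row of C
            C[k, g] = (A[k, g] - B[k, :k] @ C[:k, g]) / B[k, k]
    return B, C

A = [[2., 1, 1], [1, 0, 2], [3, 1, 2]]   # leading minors 2, -1, 1: all non-zero
B, C = lu_by_recurrence(A)
assert np.allclose(B @ C, A)
```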

COROLLARY 2: If $A = \|a_{ik}\|_1^n$ is a non-singular matrix (r = n) satisfying (34), then the matrices B and C in the representation (35) are uniquely determined as soon as the diagonal elements of these matrices are chosen in accordance with (36).

COROLLARY 3: If $S = \|s_{ik}\|_1^n$ is a symmetric matrix of rank r and

$$D_k = S\begin{pmatrix}1 & 2 & \cdots & k\\ 1 & 2 & \cdots & k\end{pmatrix} \neq 0 \qquad (k = 1, 2, \dots, r),$$

then

$$S = BB',$$

where $B = \|b_{ik}\|_1^n$ is a lower triangular matrix in which

5 This follows from the representation (33'). Here, as we have shown already, arbitrary values may be given to the diagonal elements $b_{11}, \dots, b_{rr}, c_{11}, \dots, c_{rr}$, provided (36) is satisfied, by the introduction of suitable factors $\mu_1, \mu_2, \dots, \mu_n$.


$$b_{gk} = \frac{1}{\sqrt{D_{k-1}D_k}}\,S\begin{pmatrix}1 & 2 & \cdots & k-1 & g\\ 1 & 2 & \cdots & k-1 & k\end{pmatrix} \quad (g = k, k+1, \dots, n;\ k = 1, 2, \dots, r), \qquad b_{gk} = 0 \quad (g = k, k+1, \dots, n;\ k = r+1, \dots, n). \qquad(42)$$

2. In the representation (35) let the elements of the last n - r columns of B and of the last n - r rows of C be zero. Then we may set

$$B = F\,\operatorname{diag}\,(b_{11}, \dots, b_{rr}, 0, \dots, 0),\qquad C = \operatorname{diag}\,(c_{11}, \dots, c_{rr}, 0, \dots, 0)\,L, \qquad(43)$$

where F and L are lower and upper triangular matrices respectively; the first r diagonal elements of F and L are 1, and the elements of the last n - r columns of F and the last n - r rows of L can be chosen completely arbitrarily. Substituting (43) for B and C in (35) and using (36), we obtain the following theorem:

THEOREM 2: Every matrix $A = \|a_{ik}\|_1^n$ of rank r in which

$$D_k = A\begin{pmatrix}1 & 2 & \cdots & k\\ 1 & 2 & \cdots & k\end{pmatrix} \neq 0 \quad\text{for } k = 1, 2, \dots, r$$

can be represented in the form of a product of a lower triangular matrix F, a diagonal matrix D, and an upper triangular matrix L:

$$A = FDL = \begin{pmatrix}
1 & 0 & \cdots & 0\\
f_{21} & 1 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
f_{n1} & f_{n2} & \cdots & 1
\end{pmatrix}\operatorname{diag}\Bigl(D_1,\ \frac{D_2}{D_1},\ \dots,\ \frac{D_r}{D_{r-1}},\ 0,\ \dots,\ 0\Bigr)\begin{pmatrix}
1 & l_{12} & \cdots & l_{1n}\\
0 & 1 & \cdots & l_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{pmatrix}, \qquad(44)$$

where

$$f_{gk} = \frac{A\begin{pmatrix}1 & 2 & \cdots & k-1 & g\\ 1 & 2 & \cdots & k-1 & k\end{pmatrix}}{A\begin{pmatrix}1 & 2 & \cdots & k\\ 1 & 2 & \cdots & k\end{pmatrix}},\qquad l_{kg} = \frac{A\begin{pmatrix}1 & 2 & \cdots & k-1 & k\\ 1 & 2 & \cdots & k-1 & g\end{pmatrix}}{A\begin{pmatrix}1 & 2 & \cdots & k\\ 1 & 2 & \cdots & k\end{pmatrix}} \qquad(45)$$

$$(g = k+1, \dots, n;\ k = 1, 2, \dots, r),$$

and $f_{gk}$ and $l_{kg}$ are arbitrary for g = k + 1, ..., n; k = r + 1, ..., n.


3. The elimination method of Gauss, when applied to a matrix $A = \|a_{ik}\|_1^n$ of rank r for which $D_k \neq 0$ (k = 1, 2, ..., r), yields two matrices: a lower triangular matrix W with diagonal elements 1 and an upper triangular matrix G in which the first r diagonal elements are $D_1, \frac{D_2}{D_1}, \dots, \frac{D_r}{D_{r-1}}$ and the last n - r rows consist entirely of zeros. G is the Gaussian form of the matrix A; W is the transforming matrix.

For the actual computation of the elements of W we recommend the following device.

We obtain the matrix W when we apply to the unit matrix E all the transformations (given by $W_1, W_2, \dots, W_N$) that we have performed on A in the algorithm of Gauss (in this case we shall have, instead of the product WA, equal to G, the product WE, equal to W). Let us, therefore, write the unit matrix E on the right of A:

$$\begin{pmatrix}
a_{11} & \cdots & a_{1n} & 1 & \cdots & 0\\
\vdots & & \vdots & \vdots & \ddots & \vdots\\
a_{n1} & \cdots & a_{nn} & 0 & \cdots & 1
\end{pmatrix}. \qquad(46)$$

By applying all the transformations of the algorithm of Gauss to this rectangular matrix we obtain a rectangular matrix consisting of the two square matrices G and W:

$$(G,\ W).$$

Thus, the application of Gauss's algorithm to the matrix (46) gives the matrices G and W simultaneously.

If A is non-singular, so that $|A| \neq 0$, then $|G| \neq 0$ as well. In this case, (33) implies that $A^{-1} = G^{-1}W$. Since G and W are determined by means of the algorithm of Gauss, the task of finding the inverse matrix $A^{-1}$ reduces to determining $G^{-1}$ and multiplying $G^{-1}$ by W.

Although there is no difficulty in finding the inverse matrix $G^{-1}$ once the matrix G has been determined, because G is triangular, the operations involved can nevertheless be avoided. For this purpose we introduce, together with the matrices G and W, similar matrices $G_1$ and $W_1$ for the transposed matrix $A'$. Then $A' = W_1^{-1}G_1$, i.e.,

$$A = G_1'W_1'^{-1}. \qquad(47)$$

Let us compare (33') with (44):

$$A = W^{-1}G,\qquad A = FDL.$$


These equations may be regarded as two distinct decompositions of the form (35); here we take the product DL as the second factor C. Since the first r diagonal elements of the first factors are the same (they are equal to 1), their first r columns coincide. But then, since the last n - r columns of F may be chosen arbitrarily, we choose them such that

$$F = W^{-1}. \qquad(48)$$

On the other hand, a comparison of (47) with (44),

$$A = G_1'W_1'^{-1},\qquad A = FDL,$$

shows that we may also select the arbitrary elements of L in such a way that

$$L = W_1'^{-1}. \qquad(49)$$

Replacing F and L in (44) by their expressions (48) and (49), we obtain

$$A = W^{-1}DW_1'^{-1}. \qquad(50)$$

Comparing this equation with (33') and (47) we find:

$$G = DW_1'^{-1},\qquad G_1' = W^{-1}D. \qquad(51)$$

We now introduce the diagonal matrix

$$\hat D = \operatorname{diag}\Bigl(\frac{1}{D_1},\ \frac{D_1}{D_2},\ \dots,\ \frac{D_{r-1}}{D_r},\ 0,\ \dots,\ 0\Bigr); \qquad(52)$$

it follows from (50) and (51) that

$$A = G_1'\hat D G. \qquad(53)$$

Formula (53) shows that the decomposition of A into triangular factors can be obtained by applying the algorithm of Gauss to the matrices A and $A'$.

Now let A be non-singular (r = n). Then $|D| \neq 0$ and $\hat D = D^{-1}$. Therefore it follows from (50) that

$$A^{-1} = W_1'\hat D W. \qquad(54)$$

This formula yields an effective computation of the inverse matrix $A^{-1}$ by the application of Gauss's algorithm to the rectangular matrices

$$(A,\ E),\qquad (A',\ E).$$


If, in particular, we take as our A a symmetric matrix S, then $G_1$ coincides with G and $W_1$ with W, and therefore formulas (53) and (54) assume the form

$$S = G'\hat D G, \qquad(55)$$

$$S^{-1} = W'\hat D W. \qquad(56)$$

§ 5. The Partition of a Matrix into Blocks. The Technique of Operating with Partitioned Matrices. The Generalized Algorithm of Gauss

It often becomes necessary to use matrices that are partitioned into rectangular parts, 'cells' or 'blocks.' In the present section we deal with such partitioned matrices.

1. Let a rectangular matrix

$$A = \|a_{ik}\| \qquad (i = 1, 2, \dots, m;\ k = 1, 2, \dots, n) \qquad(57)$$

be given. By means of horizontal and vertical lines we dissect A into rectangular blocks:

$$A = \begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1t}\\
A_{21} & A_{22} & \cdots & A_{2t}\\
\vdots & \vdots & & \vdots\\
A_{s1} & A_{s2} & \cdots & A_{st}
\end{pmatrix}, \qquad(58)$$

where the block row $\alpha$ has height $m_\alpha$ and the block column $\beta$ has width $n_\beta$.

We shall say of matrix (58) that it is partitioned into st blocks, or cells, $A_{\alpha\beta}$ of dimensions $m_\alpha \times n_\beta$ ($\alpha = 1, 2, \dots, s$; $\beta = 1, 2, \dots, t$), or that it is represented in the form of a partitioned, or blocked, matrix. Instead of (58) we shall simply write

$$A = (A_{\alpha\beta}) \qquad (\alpha = 1, 2, \dots, s;\ \beta = 1, 2, \dots, t). \qquad(59)$$

In the case s = t we shall use the following notation:

$$A = (A_{\alpha\beta})_1^s. \qquad(60)$$


Operations on partitioned matrices are performed according to the same formal rules as in the case in which we have numerical elements instead of blocks. For example, let A and B be two rectangular matrices of equal dimensions partitioned into blocks in exactly the same way:

$$A = (A_{\alpha\beta}),\quad B = (B_{\alpha\beta}) \qquad (\alpha = 1, 2, \dots, s;\ \beta = 1, 2, \dots, t). \qquad(61)$$

It is easy to verify that

$$A + B = (A_{\alpha\beta} + B_{\alpha\beta}) \qquad (\alpha = 1, 2, \dots, s;\ \beta = 1, 2, \dots, t). \qquad(62)$$

We have to consider multiplication of partitioned matrices in more detail. We know (see Chapter I, p. 6) that for the multiplication of two rectangular matrices A and B the length of the rows of the first factor A must be the same as the height of the columns of the second factor B. For 'block' multiplication of these matrices we require, in addition, that the partitioning into blocks be such that the horizontal dimensions in the first factor are the same as the corresponding vertical dimensions in the second:

$$A = \begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1t}\\
\vdots & \vdots & & \vdots\\
A_{s1} & A_{s2} & \cdots & A_{st}
\end{pmatrix},\qquad B = \begin{pmatrix}
B_{11} & B_{12} & \cdots & B_{1u}\\
\vdots & \vdots & & \vdots\\
B_{t1} & B_{t2} & \cdots & B_{tu}
\end{pmatrix}, \qquad(63)$$

where $A_{\alpha\beta}$ is of dimension $m_\alpha \times n_\beta$ and $B_{\beta\gamma}$ of dimension $n_\beta \times p_\gamma$.

Then it is easy to verify that

$$AB = C = (C_{\alpha\gamma}),\qquad\text{where } C_{\alpha\gamma} = \sum_{\beta=1}^{t} A_{\alpha\beta}B_{\beta\gamma} \qquad (\alpha = 1, 2, \dots, s;\ \gamma = 1, 2, \dots, u). \qquad(64)$$
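A small numeric sketch of formula (64); the block sizes here are arbitrary choices of ours:

```python
import numpy as np

m, n, p = [2, 3], [2, 2], [1, 4]          # heights m_a, shared sizes n_b, widths p_g
rng = np.random.default_rng(1)
A = rng.standard_normal((sum(m), sum(n)))
B = rng.standard_normal((sum(n), sum(p)))

def blocks(M, heights, widths):
    """Cut M into a 2-D list of blocks with the given row/column sizes."""
    r = np.cumsum([0] + heights)
    c = np.cumsum([0] + widths)
    return [[M[r[a]:r[a+1], c[b]:c[b+1]] for b in range(len(widths))]
            for a in range(len(heights))]

Ab, Bb = blocks(A, m, n), blocks(B, n, p)
# formula (64), assembled back into an ordinary matrix
C = np.block([[sum(Ab[a][b] @ Bb[b][g] for b in range(len(n)))
               for g in range(len(p))] for a in range(len(m))])
assert np.allclose(C, A @ B)
```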

We mention separately the special case in which one of the factors is a quasi-diagonal matrix. Let A be quasi-diagonal, i.e., let s = t and $A_{\alpha\beta} = O$ for $\alpha \neq \beta$. In this case formula (64) gives

$$C_{\alpha\beta} = A_{\alpha\alpha}B_{\alpha\beta} \qquad (\alpha = 1, 2, \dots, s;\ \beta = 1, 2, \dots, u). \qquad(65)$$

When a partitioned matrix is multiplied on the left by a quasi-diagonal matrix, then the rows of the matrix are multiplied on the left by the corresponding diagonal blocks of the quasi-diagonal matrix.

Now let B be a quasi-diagonal matrix, i.e., let t = u and $B_{\alpha\beta} = O$ for $\alpha \neq \beta$. Then we obtain from (64):

$$C_{\alpha\beta} = A_{\alpha\beta}B_{\beta\beta} \qquad (\alpha = 1, 2, \dots, s;\ \beta = 1, 2, \dots, u). \qquad(66)$$


When a partitioned matrix is multiplied on the right by a quasi-diagonal matrix, then all the columns of the partitioned matrix are multiplied on the right by the corresponding diagonal cells of the quasi-diagonal matrix.

Note that the multiplication of square partitioned matrices of one and the same order is always feasible if the factors are split into equal quadratic schemes of blocks and there are square matrices in the diagonal places in each factor.

The partitioned matrix (58) is called upper (lower) quasi-triangular if s = t and all $A_{\alpha\beta} = O$ for $\alpha > \beta$ ($\alpha < \beta$). A quasi-diagonal matrix is a special case of a quasi-triangular matrix.

From the formulas (64) it is easy to see that:

The product of two upper (lower) quasi-triangular matrices is itself an upper (lower) quasi-triangular matrix;6 the diagonal cells of the product are obtained by multiplying the corresponding diagonal cells of the factors.

For when we set s = t in (64) and

$$A_{\alpha\beta} = O,\quad B_{\alpha\beta} = O \quad\text{for } \alpha > \beta,$$

we find

$$C_{\alpha\beta} = O \quad\text{for } \alpha > \beta,\qquad C_{\alpha\alpha} = A_{\alpha\alpha}B_{\alpha\alpha} \quad (\alpha = 1, 2, \dots, s). \qquad(67)$$

The case of lower quasi-triangular matrices is treated similarly.

We mention a rule for the calculation of the determinant of a quasi-triangular matrix. This rule can be obtained from the Laplace expansion.

If A is a quasi-triangular matrix (in particular, a quasi-diagonal matrix), then the determinant of the matrix is equal to the product of the determinants of the diagonal cells:

$$|A| = |A_{11}||A_{22}| \cdots |A_{ss}|.$$

2. Let a partitioned matrix

$$A = \begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1t}\\
A_{21} & A_{22} & \cdots & A_{2t}\\
\vdots & \vdots & & \vdots\\
A_{s1} & A_{s2} & \cdots & A_{st}
\end{pmatrix} \qquad(68)$$

6 It is assumed here that the block multiplication is feasible.


be given. To the $\alpha$-th row of submatrices we add the $\beta$-th row, multiplied on the left by a rectangular matrix X of dimension $m_\alpha \times m_\beta$. We obtain a partitioned matrix

$$B = \begin{pmatrix}
A_{11} & \cdots & A_{1t}\\
\vdots & & \vdots\\
A_{\alpha 1} + XA_{\beta 1} & \cdots & A_{\alpha t} + XA_{\beta t}\\
\vdots & & \vdots\\
A_{\beta 1} & \cdots & A_{\beta t}\\
\vdots & & \vdots\\
A_{s1} & \cdots & A_{st}
\end{pmatrix}. \qquad(69)$$

We introduce an auxiliary square matrix V, which we give in the form of a square scheme of blocks:

$$V = \begin{pmatrix}
E_1 & \cdots & O & \cdots & O & \cdots & O\\
\vdots & & \vdots & & \vdots & & \vdots\\
O & \cdots & E_\alpha & \cdots & X & \cdots & O\\
\vdots & & \vdots & & \vdots & & \vdots\\
O & \cdots & O & \cdots & E_\beta & \cdots & O\\
\vdots & & \vdots & & \vdots & & \vdots\\
O & \cdots & O & \cdots & O & \cdots & E_s
\end{pmatrix}. \qquad(70)$$

In the diagonal blocks of V there are unit matrices of order $m_1, m_2, \dots, m_s$, respectively; all the non-diagonal blocks of V are equal to zero except the block X that lies at the intersection of the $\alpha$-th row and $\beta$-th column.

It is easy to see that

$$VA = B. \qquad(71)$$

As V is non-singular, we have7 for the ranks of A and B:

$$r_A = r_B. \qquad(72)$$

In the special case where A is a square matrix, we have from (71):

$$|V||A| = |B|. \qquad(73)$$

But the determinant of the quasi-triangular matrix V is 1:

$$|V| = 1. \qquad(74)$$

Hence

$$|A| = |B|. \qquad(75)$$

7 See p. 12.


The same conclusion holds when we add to an arbitrary column of (68) another column multiplied on the right by a rectangular matrix X of suitable dimensions.

The results obtained can be formulated as the following theorem.

THEOREM 3: If to the $\alpha$-th row (column) of the blocks of the partitioned matrix A we add the $\beta$-th row (column) multiplied on the left (right) by a rectangular matrix X of the corresponding dimensions, then the rank of A remains unchanged under this transformation and, if A is a square matrix, the determinant of A is also unchanged.

3. We now consider the special case in which the diagonal block $A_{11}$ in A is square and non-singular ($|A_{11}| \neq 0$).

To the $\alpha$-th row of A we add the first row multiplied on the left by $-A_{\alpha 1}A_{11}^{-1}$ ($\alpha = 2, \dots, s$). We thus obtain the matrix

$$B_1 = \begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1t}\\
O & A_{22}^{(1)} & \cdots & A_{2t}^{(1)}\\
\vdots & \vdots & & \vdots\\
O & A_{s2}^{(1)} & \cdots & A_{st}^{(1)}
\end{pmatrix}, \qquad(76)$$

where

$$A_{\alpha\beta}^{(1)} = A_{\alpha\beta} - A_{\alpha 1}A_{11}^{-1}A_{1\beta} \qquad (\alpha = 2, \dots, s;\ \beta = 2, \dots, t). \qquad(77)$$

If the matrix $A_{22}^{(1)}$ is square and non-singular, then the process can be continued. In this way we arrive at the generalized algorithm of Gauss.

Let A be a square matrix. Then

$$|A| = |B_1| = |A_{11}|\,\begin{vmatrix}
A_{22}^{(1)} & \cdots & A_{2s}^{(1)}\\
\vdots & & \vdots\\
A_{s2}^{(1)} & \cdots & A_{ss}^{(1)}
\end{vmatrix}. \qquad(78)$$

Formula (78) reduces the computation of the determinant |A|, consisting of st blocks, to the computation of a determinant of lower order consisting of (s - 1)(t - 1) blocks.8

Let us consider a determinant $\Delta$ partitioned into four blocks:

$$\Delta = \begin{vmatrix}
A & B\\
C & D
\end{vmatrix}, \qquad(79)$$

where A and D are square matrices.

Suppose $|A| \neq 0$. Then from the second row we subtract the first, multiplied on the left by $CA^{-1}$. We obtain

8 If $A_{22}^{(1)}$ is a square matrix and $|A_{22}^{(1)}| \neq 0$, then this determinant of (s - 1)(t - 1) blocks can again be subjected to such a transformation, etc.


$$\Delta = \begin{vmatrix}
A & B\\
O & D - CA^{-1}B
\end{vmatrix} = |A|\,|D - CA^{-1}B|. \qquad\text{(I)}$$

Similarly, if $|D| \neq 0$, we subtract from the first row in $\Delta$ the second, multiplied on the left by $BD^{-1}$, obtaining

$$\Delta = \begin{vmatrix}
A - BD^{-1}C & O\\
C & D
\end{vmatrix} = |A - BD^{-1}C|\,|D|. \qquad\text{(II)}$$

In the special case in which all four matrices A, B, C, D are square (of one and the same order n), we deduce from (I) and (II) the formulas of Schur, which reduce the computation of a determinant of order 2n to the computation of a determinant of order n:

$$\Delta = |AD - ACA^{-1}B| \quad (|A| \neq 0), \qquad\text{(Ia)}$$

$$\Delta = |AD - BD^{-1}CD| \quad (|D| \neq 0). \qquad\text{(IIa)}$$

If the matrices A and C are permutable, then it follows from (Ia) that

$$\Delta = |AD - CB| \qquad \text{(provided } AC = CA\text{)}. \qquad\text{(Ib)}$$

Similarly, if C and D are permutable, then

$$\Delta = |AD - BC| \qquad \text{(provided } CD = DC\text{)}. \qquad\text{(IIb)}$$

Formula (Ib) was obtained under the assumption $|A| \neq 0$, and (IIb) under the assumption $|D| \neq 0$. However, these restrictions can be removed by continuity arguments.

From formulas (I)-(IIb) we can obtain another six formulas by replacing A and D on the right-hand sides simultaneously by B and C.

Example.

$$\Delta = \begin{vmatrix}
1 & 0 & b_1 & b_2\\
0 & 1 & b_3 & b_4\\
c_1 & c_2 & d_1 & d_2\\
c_3 & c_4 & d_3 & d_4
\end{vmatrix}.$$

By formula (Ib),

$$\Delta = \begin{vmatrix}
d_1 - c_1b_1 - c_2b_3 & d_2 - c_1b_2 - c_2b_4\\
d_3 - c_3b_1 - c_4b_3 & d_4 - c_3b_2 - c_4b_4
\end{vmatrix}.$$
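The example is easy to confirm numerically: here A = E, which is permutable with every C, so (Ib) applies. A small sketch (the entries are illustrative choices of ours):

```python
import numpy as np

b = np.array([[1., 2], [3, 4]])    # b1, b2, b3, b4
c = np.array([[5., 6], [7, 8]])    # c1, c2, c3, c4
d = np.array([[9., 1], [2, 3]])    # d1, d2, d3, d4
E = np.eye(2)                      # A = E commutes with every C

Delta = np.linalg.det(np.block([[E, b], [c, d]]))
assert np.isclose(Delta, np.linalg.det(E @ d - c @ b))   # formula (Ib)
```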


4. From Theorem 3 there follows also

THEOREM 4: If a rectangular matrix R is represented in partitioned form

$$R = \begin{pmatrix}
A & B\\
C & D
\end{pmatrix}, \qquad(80)$$

where A is a square non-singular matrix of order n ($|A| \neq 0$), then the rank of R is equal to n if and only if

$$D = CA^{-1}B. \qquad(81)$$

Proof. We subtract from the second row of blocks of R the first, multiplied on the left by $CA^{-1}$. Then we obtain the matrix

$$T = \begin{pmatrix}
A & B\\
O & D - CA^{-1}B
\end{pmatrix}. \qquad(82)$$

By Theorem 3, the matrices R and T have the same rank. But the rank of T coincides with the rank of A (namely, n) if and only if $D - CA^{-1}B = O$, i.e., when (81) holds. This proves the theorem.

From Theorem 4 there follows an algorithm9 for the construction of the inverse matrix $A^{-1}$ and, more generally, the product $CA^{-1}B$, where B and C are rectangular matrices of dimensions $n \times p$ and $q \times n$.

By means of Gauss's algorithm,10 we reduce the matrix

$$\begin{pmatrix}
A & B\\
-C & O
\end{pmatrix} \qquad (|A| \neq 0) \qquad(83)$$

to the form

$$\begin{pmatrix}
G & B_1\\
O & X
\end{pmatrix}. \qquad(84)$$

We will show that

$$X = CA^{-1}B. \qquad(85)$$

For, the same transformation that was applied to the matrix (83) reduces the matrix

9 See [181].

10 We do not apply here the entire algorithm of Gauss to the matrix (83), but only the first n steps of the algorithm, where n is the order of the matrix A. This can be done if the conditions (15) hold for p = n. But if these conditions do not hold, then, since $|A| \neq 0$, we may renumber the first n rows (or the first n columns) of the matrix (83) so that the n steps of Gauss's algorithm turn out to be feasible. Such a modified Gaussian algorithm is sometimes applied even when the conditions (15), with p = n, are satisfied.


$$\begin{pmatrix}
A & B\\
-C & -CA^{-1}B
\end{pmatrix} \qquad(86)$$

to the form

$$\begin{pmatrix}
G & B_1\\
O & X - CA^{-1}B
\end{pmatrix}. \qquad(87)$$

By Theorem 4, the matrix (86) is of rank n (n is the order of A). But then (87) must also be of rank n. Hence $X - CA^{-1}B = O$, i.e., (85) holds.

In particular, if B = y, where y is a column matrix, and C = E, then

$$X = A^{-1}y.$$

Therefore, when we apply Gauss's algorithm to the matrix

$$\begin{pmatrix}
A & y\\
-E & O
\end{pmatrix},$$

we obtain the solution of the system of equations

$$Ax = y.$$

Further, if in (83) we set B = C = E, then by applying the algorithm of Gauss to the matrix

$$\begin{pmatrix}
A & E\\
-E & O
\end{pmatrix},$$

we obtain

$$\begin{pmatrix}
G & B_1\\
O & X
\end{pmatrix},$$

where

$$X = A^{-1}.$$

Let us illustrate this method by finding $A^{-1}$ in the following example.

Example. Let

$$A = \begin{pmatrix}
2 & 1 & 1\\
1 & 0 & 2\\
3 & 1 & 2
\end{pmatrix}.$$

It is required to compute $A^{-1}$. We apply a somewhat modified elimination method11 to the matrix

11 See the preceding footnote.


$$\begin{pmatrix}
2 & 1 & 1 & 1 & 0 & 0\\
1 & 0 & 2 & 0 & 1 & 0\\
3 & 1 & 2 & 0 & 0 & 1\\
-1 & 0 & 0 & 0 & 0 & 0\\
0 & -1 & 0 & 0 & 0 & 0\\
0 & 0 & -1 & 0 & 0 & 0
\end{pmatrix}.$$

To all the rows we add certain multiples of the second row and we arrange that all the elements of the first column, except the second, become zero. Then we add to all the rows, except the second, the third row multiplied by certain factors and see to it that in the second column all the elements, except the second and third, become zero. Then we add to the last three rows the first row with suitable factors and obtain a matrix of the form

$$\begin{pmatrix}
* & * & * & * & * & *\\
* & * & * & * & * & *\\
* & * & * & * & * & *\\
0 & 0 & 0 & -2 & -1 & 2\\
0 & 0 & 0 & 4 & 1 & -3\\
0 & 0 & 0 & 1 & 1 & -1
\end{pmatrix}.$$

Therefore

$$A^{-1} = \begin{pmatrix}
-2 & -1 & 2\\
4 & 1 & -3\\
1 & 1 & -1
\end{pmatrix}.$$
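This bordered-matrix computation is easy to check mechanically; the sketch below reproduces the example, with the pivot order 2, 3, 1 mirroring the modified elimination just described:

```python
import numpy as np

A = np.array([[2., 1, 1], [1, 0, 2], [3, 1, 2]])

# the bordered matrix (A  E; -E  O) with B = C = E
M = np.block([[A, np.eye(3)], [-np.eye(3), np.zeros((3, 3))]])

# annihilate column `col` everywhere except in the pivot row `piv`
for piv, col in [(1, 0), (2, 1), (0, 2)]:      # rows 0-based: order 2, 3, 1
    for i in range(6):
        if i != piv and M[i, col] != 0:
            M[i] -= M[i, col] / M[piv, col] * M[piv]

X = M[3:, 3:]                                  # lower-right block = A^{-1}
assert np.allclose(X, np.linalg.inv(A))
assert np.allclose(X, [[-2, -1, 2], [4, 1, -3], [1, 1, -1]])
```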


CHAPTER III

LINEAR OPERATORS IN AN n-DIMENSIONAL VECTOR SPACE

Matrices constitute the fundamental analytic apparatus for the study of linear operators in an n-dimensional space. The study of these operators, in turn, enables us to divide all matrices into classes and to exhibit the significant properties that all matrices of one and the same class have in common.

In the present chapter we shall expound the simpler properties of linear operators in an n-dimensional space. The investigation will be continued in Chapters VII and IX.

§ 1. Vector Spaces

1. Let R be a set of arbitrary elements x, y, z, ... in which two operations are defined:1 the operation of 'addition' and the operation of 'multiplication by a number of the field F.' We postulate that these operations can always be performed uniquely in R and that the following rules hold for arbitrary elements x, y, z of R and numbers $\alpha, \beta$ of F:

1. x + y = y + x.
2. (x + y) + z = x + (y + z).
3. There exists an element o in R such that the product of the number 0 with any element x of R is equal to o:

$$0 \cdot x = o.$$

4. 1 · x = x.
5. $\alpha(\beta x) = (\alpha\beta)x$.
6. $(\alpha + \beta)x = \alpha x + \beta x$.
7. $\alpha(x + y) = \alpha x + \alpha y$.

1 These operations will be denoted by the usual signs '+' and '·'; the latter sign will sometimes be omitted.


DEFINITION 1: A set R of elements in which two operations, 'addition' of elements and 'multiplication of elements of R by a number of F,' can always be performed uniquely and for which postulates 1.-7. hold is called a vector space (over the field F), and the elements are called vectors.2

DEFINITION 2. The vectors x, y, ..., u of R are called linearly dependent if there exist numbers $\alpha, \beta, \dots, \delta$ in F, not all zero, such that

$$\alpha x + \beta y + \cdots + \delta u = o. \qquad(1)$$

If such a linear dependence does not hold, then the vectors x, y, ..., u are called linearly independent.

If the vectors x, y, ..., u are linearly dependent, then one of the vectors can be represented as a linear combination, with coefficients in F, of the remaining ones. For example, if $\alpha \neq 0$ in (1), then

$$x = -\frac{\beta}{\alpha}y - \cdots - \frac{\delta}{\alpha}u.$$

DEFINITION 3. The space R is called finite-dimensional and the number n is called the dimension of the space if there exist n linearly independent vectors in R, while any n + 1 vectors in R are linearly dependent. If the space contains linearly independent systems of an arbitrary number of vectors, then it is called infinite-dimensional.

In this book we shall study mainly finite-dimensional spaces.

DEFINITION 4. A system of n linearly independent vectors $e_1, e_2, \dots, e_n$ of an n-dimensional space, given in a definite order, is called a basis of the space.

2. Example 1. The set of all ordinary vectors (directed geometrical segments) is a three-dimensional vector space. The part of this space that consists of the vectors parallel to some plane is a two-dimensional space, and all the vectors parallel to a given line form a one-dimensional vector space.

Example 2. Let us call a column $x = (x_1, x_2, \dots, x_n)$ of n numbers of F a vector (where n is a fixed number). We define the basic operations as operations on column matrices:

2 It is easy to see that all the usual properties of the operations of addition and of multiplication by a number follow from properties 1.-7. For example, for arbitrary x of R we have: x + o = x [since x + o = 1·x + 0·x = (1 + 0)x = x]; x + (−x) = o, where −x = (−1)·x; etc.


$$(x_1, x_2, \dots, x_n) + (y_1, y_2, \dots, y_n) = (x_1 + y_1,\ x_2 + y_2,\ \dots,\ x_n + y_n),\qquad \alpha(x_1, x_2, \dots, x_n) = (\alpha x_1,\ \alpha x_2,\ \dots,\ \alpha x_n).$$

The null vector is the column (0, 0, ..., 0). It is easy to verify that all the postulates 1.-7. are satisfied. The vectors form an n-dimensional space. As a basis of the space we can take, for example, the columns of the unit matrix of order n:

$$(1, 0, \dots, 0),\quad (0, 1, \dots, 0),\quad \dots,\quad (0, 0, \dots, 1).$$

The space thus defined is often called the n-dimensional number space.

Example 3. The set of all infinite sequences $(x_1, x_2, \dots, x_n, \dots)$, in which the operations are defined in a natural way, i.e.,

$$(x_1, x_2, \dots, x_n, \dots) + (y_1, y_2, \dots, y_n, \dots) = (x_1 + y_1,\ x_2 + y_2,\ \dots,\ x_n + y_n,\ \dots),$$
$$\alpha(x_1, x_2, \dots, x_n, \dots) = (\alpha x_1,\ \alpha x_2,\ \dots,\ \alpha x_n,\ \dots),$$

is an infinite-dimensional space.

Example 4. The set of polynomials $a_0 + a_1 t + \cdots + a_{n-1}t^{n-1}$ of degree < n with coefficients in F is an n-dimensional vector space.3 As a basis of this space we can take, say, the system of powers $t^0, t^1, \dots, t^{n-1}$.

The set of all such polynomials (without a bound on the degree) forms an infinite-dimensional space.

Example 5. The set of all functions defined on a closed interval [a, b] forms an infinite-dimensional space.

3. Let the vectors $e_1, e_2, \dots, e_n$ form a basis of an n-dimensional vector space R and let x be an arbitrary vector of the space. Then the vectors $x, e_1, e_2, \dots, e_n$ are linearly dependent (because there are n + 1 of them):

$$a_0 x + a_1 e_1 + a_2 e_2 + \cdots + a_n e_n = o,$$

where at least one of the numbers $a_0, a_1, \dots, a_n$ is different from zero. But in this case we must have $a_0 \neq 0$, since the vectors $e_1, e_2, \dots, e_n$ cannot be linearly dependent. Therefore

$$x = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n, \qquad(2)$$

where $x_i = -a_i/a_0$ (i = 1, 2, ..., n).

Note that the numbers $x_1, x_2, \dots, x_n$ are uniquely determined when the vector x and the basis $e_1, e_2, \dots, e_n$ are given. For if there is another decomposition of x besides (2),

$$x = x_1' e_1 + x_2' e_2 + \cdots + x_n' e_n, \qquad(3)$$

3 The basic operations are taken to be ordinary addition of polynomials and multiplication of a polynomial by a number.


then, by subtracting (2) from (3), we obtain

$$(x_1 - x_1')e_1 + (x_2 - x_2')e_2 + \cdots + (x_n - x_n')e_n = o,$$

and since the vectors of a basis are linearly independent, it follows that

$$x_1 - x_1' = x_2 - x_2' = \cdots = x_n - x_n' = 0,$$

i.e.,

$$x_1 = x_1',\quad x_2 = x_2',\quad \dots,\quad x_n = x_n'. \qquad(4)$$

The numbers $x_1, x_2, \dots, x_n$ are called the coordinates of x in the basis $e_1, e_2, \dots, e_n$.

If

$$x = \sum_{i=1}^{n} x_i e_i \quad\text{and}\quad y = \sum_{i=1}^{n} y_i e_i,$$

then

$$x + y = \sum_{i=1}^{n} (x_i + y_i)e_i \quad\text{and}\quad \alpha x = \sum_{i=1}^{n} \alpha x_i e_i,$$

i.e., the coordinates of a sum of vectors are obtained by addition of the corresponding coordinates of the summands, and the product of a vector by a number $\alpha$ is obtained by multiplying all the coordinates of the vector by $\alpha$.

4. Let the vectors

$$x_j = \sum_{i=1}^{n} x_{ij} e_i \qquad (j = 1, 2, \dots, m)$$

be linearly dependent, i.e.,

$$\sum_{j=1}^{m} c_j x_j = o, \qquad(5)$$

where at least one of the numbers $c_1, c_2, \dots, c_m$ is not equal to zero.

where at least one of the numbers c1i c2, ... , C. is not equal to zero.If a vector is the null vector, then all its components are zero. Hence

the vector equation (5) is equivalent to the following system of scalarequations :

C1xu + C2x11 + ... + Cmxi,n = 0

ctxa1 + cax2a + ... + cmxa,n = 0..............Cixn1 + C2--.a + ... + CmX = 0

(6)

As is well known, this system of homogeneous linear equations for $c_1, c_2, \dots, c_m$ has a non-zero solution if and only if the rank of the coefficient matrix is less than the number of unknowns, i.e., less than m. A necessary and sufficient condition for the linear independence of the vectors $x_1, x_2, \dots, x_m$ is, therefore, that this rank should be m.


Thus, the following theorem holds:

THEOREM 1: In order that the vectors $x_1, x_2, \dots, x_m$ be linearly independent it is necessary and sufficient that the rank r of the matrix formed from the coordinates of these vectors in an arbitrary basis,

$$\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1m}\\
x_{21} & x_{22} & \cdots & x_{2m}\\
\vdots & \vdots & & \vdots\\
x_{n1} & x_{n2} & \cdots & x_{nm}
\end{pmatrix}, \qquad(7)$$

be equal to m, i.e., to the number of vectors.

Note. The linear independence of the vectors $x_1, x_2, \dots, x_m$ means that the columns of the matrix (7) are linearly independent, since the k-th column consists of the coordinates of $x_k$ (k = 1, 2, ..., m). By the theorem, therefore, if the columns of a matrix are linearly independent, then the rank of the matrix is equal to the number of columns. Hence it follows that in an arbitrary rectangular matrix the maximal number of linearly independent columns is equal to the rank of the matrix. Moreover, if we transpose the matrix, i.e., change the rows into columns and the columns into rows, then the rank obviously remains unchanged. Hence in a rectangular matrix the number of linearly independent columns is always equal to the number of linearly independent rows and equal to the rank of the matrix.4

5. If in an n-dimensional space a basis e1, e2, ... , e,;has been chosen, thento every vector x there corresponds uniquely the column x = (x1, x2, ... , xn),where x1i x2, . . . , xn are the coordinates of x in the given basis. Thus, thechoosing of a basis establishes a one-to-one correspondence between the vec-tors of an arbitrary n-dimensional vector space R and the vectors of then-dimensional number space R' considered in Example 2. Here the sumof vectors in R corresponds to the sum of the corresponding vectors of R'.The analogous correspondence holds for the product of a vector by a numbera of F. In other words, an arbitrary n-dimensional vector space is isomorphicto the n-dimensional number space, and therefore all vector spaces of thesame number n of dimensions over the same number field F are isomorphic.This means that to within isomorphism there exists only one n-dimensionalvector space for a given number field.

4 This proposition follows from Theorem 1, in the proof of which we have startedfrom the well-known property of a system of linear homogeneous equations: a non-zerosolution exists only when the rank of the coefficient matrix is less than the number ofunknowns. For a proof of Theorem 1 independent of this property, see § 5.
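In numerical work the rank test of Theorem 1 can be carried out directly; a small sketch, with vectors chosen for illustration:

```python
import numpy as np

# Columns of (7) are the coordinate columns of x_1, ..., x_m (here n = 4, m = 3).
X = np.array([[1., 0, 1],
              [0, 1, 1],
              [2, 3, 5],
              [1, 1, 2]])
m = X.shape[1]
print(np.linalg.matrix_rank(X) == m)   # False here: x_3 = x_1 + x_2, so dependent
```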


The reader may ask why we have introduced an 'abstract' n-dimensional space if it coincides to within isomorphism with the n-dimensional number space. Indeed, we could have defined a vector as a system of n numbers given in a definite order and could have introduced the operations on these vectors in the very way it was done in Example 2. But we would then have mixed up properties of vectors that do not depend on the choice of a basis with properties of a particular basis. For example, the fact that all the coordinates of a vector are zero is a property of the vector itself; it does not depend on the choice of basis. But the equality of all its coordinates is not a property of the vector itself, because it disappears under a change of basis. The axiomatic definition of a vector space immediately singles out the properties of vectors that do not depend on the choice of a basis.

§ 2. A Linear Operator Mapping an n-Dimensional Space into an m-Dimensional Space

1. We consider a linear transformation

$$\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n,\\
y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n,\\
&\;\vdots\\
y_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n,
\end{aligned} \qquad(8)$$

whose coefficients belong to the number field F, as well as two vector spaces over F: an n-dimensional space R and an m-dimensional space S. We choose a basis $e_1, e_2, \dots, e_n$ in R and a basis $g_1, g_2, \dots, g_m$ in S. Then the transformation (8) associates with every vector $x = \sum_{k=1}^{n} x_k e_k$ of R a certain vector

$$y = \sum_{i=1}^{m} y_i g_i$$

of S, i.e., the transformation (8) determines a certain operator A that sets up a correspondence between the vector x and the vector y: y = Ax. It is easy to see that this operator A has the property of linearity, which we formulate as follows:

DEFINITION 5: An operator A mapping R into S, i.e., associating with every vector x of R a certain vector y = Ax of S, is called linear if for arbitrary $x_1, x_2$ of R and $\alpha$ of F

$$A(x_1 + x_2) = Ax_1 + Ax_2,\qquad A(\alpha x_1) = \alpha Ax_1. \qquad(9)$$

Thus, the transformation (8), for a given basis in R and a given basis in S, determines a linear operator mapping R into S.


We shall now show the converse, i.e., that for an arbitrary linear operator A mapping R into S and arbitrary bases $e_1, e_2, \dots, e_n$ in R and $g_1, g_2, \dots, g_m$ in S, there exists a rectangular matrix with elements in F,

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}, \qquad(10)$$

such that the linear transformation (8) formed by means of this matrix expresses the coordinates of the transformed vector y = Ax in terms of the coordinates of the original vector x.

such that the linear transformation (8) formed by means of this matrixexpresses the coordinates of the transformed vector y = Ax in terms of thecoordinates of the original vector x.

Let us, in fact, apply the operator A to the basis vector ek and let thecoordinates in the basis gl, g ... , g. of the vector Aek thus obtained bedenoted byalk,a2k,...,a,, (k = 1,2,...,n)

MAek = aikgi (k =1, 2, ... , n). (11)

i.lMultiplying both sides of (11) by Xk and summing from 1 to n, we obtain

J' xkAek = (.Y aikxk) gik-1 i-1 k-1

hence

where

n n my=Ax= =,Py&,

k-1 k-1 i-1

ayi =.E aikxk

k-1(i=1,2,...,m),

and this is what we had to show.

Thus, for given bases of R and S: to every linear operator A mapping R into S there corresponds a rectangular matrix of dimension $m \times n$ and, conversely, to every such matrix there corresponds a linear operator mapping R into S.

Here, in the matrix A corresponding to the operator A, the k-th column consists of the coordinates of the vector $Ae_k$ (k = 1, 2, ..., n).

We denote by $x = (x_1, x_2, \dots, x_n)$ and $y = (y_1, y_2, \dots, y_m)$ the coordinate columns of the vectors x and y. Then the vector equation

$$y = Ax$$

corresponds to the matrix equation

$$y = Ax,$$


which is the matrix form of the transformation (8).

Example. We consider the set of all polynomials in t of degree $\leq n-1$ with coefficients in F. This set forms an n-dimensional vector space $R_n$ (see Example 4, p. 52). Similarly, the polynomials in t of degree $\leq n-2$ with coefficients in F form a space $R_{n-1}$. The differentiation operator $\frac{d}{dt}$ associates with every polynomial of $R_n$ a certain polynomial in $R_{n-1}$. Thus, this operator maps $R_n$ into $R_{n-1}$. The differentiation operator is linear, since

$$\frac{d}{dt}[\varphi(t) + \psi(t)] = \frac{d\varphi(t)}{dt} + \frac{d\psi(t)}{dt},\qquad \frac{d}{dt}[\alpha\varphi(t)] = \alpha\frac{d\varphi(t)}{dt}.$$

In $R_n$ and $R_{n-1}$ we choose bases consisting of powers of t:

$$t^0 = 1,\ t,\ \dots,\ t^{n-1} \quad\text{and}\quad t^0 = 1,\ t,\ \dots,\ t^{n-2}.$$

Using formulas (11), we construct the rectangular matrix of dimension $(n-1) \times n$ corresponding to the differentiation operator $\frac{d}{dt}$ in these bases:

$$\begin{pmatrix}
0 & 1 & 0 & \cdots & 0\\
0 & 0 & 2 & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & n-1
\end{pmatrix}.$$
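A short sketch constructing this matrix and applying it to a polynomial (the helper name is ours):

```python
import numpy as np

def diff_matrix(n):
    """Matrix of d/dt : R_n -> R_{n-1} in the power bases 1, t, ..., t^(n-1)."""
    D = np.zeros((n - 1, n))
    for k in range(1, n):
        D[k - 1, k] = k          # d/dt t^k = k t^(k-1), cf. formulas (11)
    return D

# e.g. p(t) = 1 + 2t + 3t^2 has coordinates (1, 2, 3); p'(t) = 2 + 6t
print(diff_matrix(3) @ np.array([1, 2, 3]))   # -> [2. 6.]
```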

§ 3. Addition and Multiplication of Linear Operators

1. Let A and B be two linear operators mapping R into S and let the corresponding matrices be

$$A = \|a_{ik}\|,\quad B = \|b_{ik}\| \qquad (i = 1, 2, \dots, m;\ k = 1, 2, \dots, n).$$

DEFINITION 6: The sum of the operators A and B is the operator C defined by the equation

$$Cx = Ax + Bx \qquad (x \in R).5 \qquad(12)$$

On the basis of this definition it is easy to verify that the sum C = A + B of the linear operators A and B is itself a linear operator. Furthermore,

$$Ce_k = Ae_k + Be_k = \sum_{i=1}^{m} (a_{ik} + b_{ik})g_i \qquad (k = 1, 2, \dots, n).$$

5 $x \in R$ means that the element x belongs to the set R. It is assumed that (12) holds for arbitrary x in R.


Hence it follows that the operator C corresponds to the matrix $C = \|c_{ik}\|$, where $c_{ik} = a_{ik} + b_{ik}$ (i = 1, 2, ..., m; k = 1, 2, ..., n), i.e., the operator C corresponds to the matrix

$$C = A + B. \qquad(13)$$

We would come to the same conclusion starting from the matrix equation

$$Cx = Ax + Bx \qquad(14)$$

(x is the coordinate column of the vector x) corresponding to the vector equation (12). Since x is an arbitrary column, (13) follows from (14).

2. Let R, S, and T be three vector spaces of dimension q, n, and m, and let A and B be two linear operators, of which B maps R into S and A maps S into T; in symbols:

$$R \xrightarrow{\;B\;} S \xrightarrow{\;A\;} T.$$

DEFINITION 7. The product of the operators A and B is the operator C for which

$$Cx = A(Bx) \qquad (x \in R) \qquad(15)$$

holds for every x of R.

The operator C maps R into T:

$$R \xrightarrow{\;C = AB\;} T.$$

From the linearity of the operators A and B follows the linearity of C. We choose arbitrary bases in R, S, and T and denote by A, B, and C the matrices corresponding, in this choice of bases, to the operators A, B, and C. Then the vector equations

$$z = Ay,\quad y = Bx,\quad z = Cx \qquad(16)$$

correspond to the matrix equations

$$z = Ay,\quad y = Bx,\quad z = Cx,$$

where x, y, z are the coordinate columns of the vectors x, y, z. Hence

$$Cx = A(Bx) = (AB)x,$$

and as the column x is arbitrary,

$$C = AB. \qquad(17)$$

Thus, the product C = AB of the operators A and B corresponds to the matrix $C = \|c_{ij}\|$ (i = 1, 2, ..., m; j = 1, 2, ..., q), which is the product of the matrices A and B.


We leave it to the reader to show that the operator6

$$C = \alpha A \qquad (\alpha \in F)$$

corresponds to the matrix

$$C = \alpha A.$$

Thus we see that in Chapter I the operations on matrices were so defined that the sum A + B, the product AB, and the product $\alpha A$ of operators correspond to the matrices A + B, AB, and $\alpha A$, respectively, where A and B are the matrices corresponding to the operators A and B, and $\alpha$ is a number of F.

§ 4. Transformation of Coordinates

1. In an n-dimensional vector space we consider two bases: $e_1, e_2, \dots, e_n$ (the 'old' basis) and $e_1^*, e_2^*, \dots, e_n^*$ (the 'new' basis).

The mutual disposition of the basis vectors is determined if the coordinates of the vectors of one basis are given relative to the other basis.

We set

$$\begin{aligned}
e_1^* &= t_{11}e_1 + t_{21}e_2 + \cdots + t_{n1}e_n,\\
e_2^* &= t_{12}e_1 + t_{22}e_2 + \cdots + t_{n2}e_n,\\
&\;\vdots\\
e_n^* &= t_{1n}e_1 + t_{2n}e_2 + \cdots + t_{nn}e_n,
\end{aligned} \qquad(18)$$

or in abbreviated form,

$$e_k^* = \sum_{i=1}^{n} t_{ik}e_i \qquad (k = 1, 2, \dots, n). \qquad(18')$$

We shall now establish the connection between the coordinates of one and the same vector in the two different bases.

Let $x_1, x_2, \dots, x_n$ and $x_1^*, x_2^*, \dots, x_n^*$ be the coordinates of the vector x relative to the 'old' and the 'new' bases, respectively:

$$x = \sum_{i=1}^{n} x_i e_i = \sum_{k=1}^{n} x_k^* e_k^*. \qquad(19)$$

In (19) we substitute for the vectors $e_k^*$ the expressions given for them in (18). We obtain:

6 I.e., the operator for which $Cx = \alpha Ax$ ($x \in R$).


$$x = \sum_{k=1}^{n} x_k^*\Bigl(\sum_{i=1}^{n} t_{ik}e_i\Bigr) = \sum_{i=1}^{n}\Bigl(\sum_{k=1}^{n} t_{ik}x_k^*\Bigr)e_i.$$

Comparing this with (19) and bearing in mind that the coordinates of a vector are uniquely determined when the vector and the basis are given, we find:

$$x_i = \sum_{k=1}^{n} t_{ik}x_k^* \qquad (i = 1, 2, \dots, n), \qquad(20)$$

or in explicit form:

$$\begin{aligned}
x_1 &= t_{11}x_1^* + t_{12}x_2^* + \cdots + t_{1n}x_n^*,\\
x_2 &= t_{21}x_1^* + t_{22}x_2^* + \cdots + t_{2n}x_n^*,\\
&\;\vdots\\
x_n &= t_{n1}x_1^* + t_{n2}x_2^* + \cdots + t_{nn}x_n^*.
\end{aligned} \qquad(21)$$

Formulas (21) determine the transformation of the coordinates of a vector on transition from one basis to another. They express the 'old' coordinates in terms of the 'new' ones. The matrix

$$T = \|t_{ik}\|_1^n \qquad(22)$$

is called the matrix of the coordinate transformation, or the transforming matrix. Its k-th column consists of the 'old' coordinates of the k-th 'new' basis vector. This follows from formulas (18), or immediately from (21) if we set in the latter $x_k^* = 1$, $x_i^* = 0$ for $i \neq k$.

Note that the matrix T is non-singular, i.e.,

$$|T| \neq 0. \qquad(23)$$

For when we set in (21) $x_1 = x_2 = \cdots = x_n = 0$, we obtain a system of n linear homogeneous equations in the n unknowns $x_1^*, x_2^*, \dots, x_n^*$ with determinant |T|. This system can only have the zero solution $x_1^* = 0, x_2^* = 0, \dots, x_n^* = 0$, since otherwise (19) would imply a linear dependence among the vectors $e_1^*, e_2^*, \dots, e_n^*$. Therefore $|T| \neq 0$.7

We now introduce the column matrices $x = (x_1, x_2, \dots, x_n)$ and $x^* = (x_1^*, x_2^*, \dots, x_n^*)$. Then the formulas (21) for the coordinate transformation can be written in the form of the following matrix equation:

$$x = Tx^*. \qquad(24)$$

Multiplying both sides of this equation by $T^{-1}$, we obtain the expression for the inverse transformation:

$$x^* = T^{-1}x. \qquad(25)$$

7 The inequality (23) also follows from Theorem 1 (p. 54), because the elements of T are the 'old' coordinates of the linearly independent vectors $e_1^*, e_2^*, \dots, e_n^*$.
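A minimal numeric sketch of (24) and (25); the basis data are illustrative choices of ours:

```python
import numpy as np

# Columns of T hold the 'old' coordinates of the 'new' basis vectors:
T = np.array([[1., 1], [0, 1]])    # e1* = e1,  e2* = e1 + e2
x_new = np.array([2., 3])          # coordinates in the 'new' basis
x_old = T @ x_new                  # formula (24): x = T x*
assert np.allclose(np.linalg.solve(T, x_old), x_new)   # (25): x* = T^{-1} x
```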


§ 5. Equivalent Matrices. The Rank of an Operator. Sylvester's Inequality

1. Let R and S be two vector spaces of dimension n and m, respectively, over the number field F, and let A be a linear operator mapping R into S. In the present section we shall make clear how the matrix A corresponding to the given linear operator A changes when the bases in R and S are changed.

We choose arbitrary bases $e_1, e_2, \dots, e_n$ in R and $g_1, g_2, \dots, g_m$ in S. In these bases the operator A corresponds to a matrix $A = \|a_{ik}\|$ (i = 1, 2, ..., m; k = 1, 2, ..., n). To the vector equation

$$y = Ax \qquad(26)$$

there corresponds the matrix equation

$$y = Ax, \qquad(27)$$

where x and y are the coordinate columns of the vectors x and y in the bases $e_1, e_2, \dots, e_n$ and $g_1, g_2, \dots, g_m$.

We now choose other bases $e_1^*, e_2^*, \dots, e_n^*$ and $g_1^*, g_2^*, \dots, g_m^*$ in R and S. In the new bases we shall have $x^*, y^*, A^*$ instead of x, y, A. Here

$$y^* = A^*x^*. \qquad(28)$$

Let us denote by Q and N the non-singular square matrices of order n and m, respectively, that realize the coordinate transformations in the spaces R and S on transition from the old bases to the new ones (see § 4):

$$x = Qx^*,\qquad y = Ny^*. \qquad(29)$$

Then we obtain from (27) and (29):

$$y^* = N^{-1}y = N^{-1}Ax = N^{-1}AQx^*. \qquad(30)$$

Setting $P = N^{-1}$, we find from (28) and (30):

$$A^* = PAQ. \qquad(31)$$

DEFINITION 8: Two rectangular matrices A and B of the same dimension are called equivalent if there exist two non-singular matrices P and Q such that8

$$B = PAQ. \qquad(32)$$

8 If the matrices A and B are of dimension $m \times n$, then in (32) the square matrix P is of order m, and Q of order n. If the elements of the equivalent matrices A and B belong to some number field, then P and Q may be chosen such that their elements belong to the same number field.


From (31) it follows that two matrices corresponding to one and the same linear operator A for different choices of bases in R and S are always equivalent. It is easy to see that, conversely, if a matrix A corresponds to the operator A for certain bases in R and S, and if a matrix B is equivalent to A, then it corresponds to the same linear operator for certain other bases in R and S.

Thus, to every linear operator mapping R into S there corresponds a class of equivalent matrices with elements in F.

2. The following theorem establishes a criterion for the equivalence of two matrices:

THEOREM 2: Two rectangular matrices of the same dimension are equi-valent if and only if they have the same rank.

Proof. The condition is necessary. When a rectangular matrix is multi-plied by an arbitrary non-singular square matrix (on the right or left),then its rank does not change (see Chapter I, p. 17). Therefore it followsfrom (32) that

rd=fB.

The condition is sufficient. Let A be a rectangular matrix of dimensionm X n. It determines a linear operator A mapping the space R with thebasis e1, e2, ... , e into the space S with the basis g1, g2,. . ., g,,, . Let rdenote the number of linearly independent vectors among the vectorsAe1, Aes, ..., Ae,,. Without loss of generality we may assume that thevectors Ael, Ae .... Ae, are linearly independent9 and that the remain-ing Ae,+1, Ae,+2, ..., Ae are expressed linearly in terms of them :

... , n) .Ae0 Y c,,Ae! (k=r+1,J-1

We define a new basis in R as follows :

e{ (i =1, 2, ... , r),e{-,'c{ref (i=r+ 1, ... , n).

1-1

(33)

(34)

Then by (33),

Next, we setAe, =o (k=r+1, ...,n). (35)

Aef =gj (j =1, 2, ... , r). (36)

9 This can be achieved by a suitable numbering of the basis vectors e,, e,, . . . , e,,.

Page 74: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 5. EQUIVALENT MATRICES. RANK OF OPERATOR. SYLVESTER'S INEQUALITY 63

The vectors gi, gQ, ..., g, are linearly independent. We supplement themto obtain a basis g;, g;, . . ,with suitable vectors gr+1, 9;+ 21 > gm gm

of S.The matrix corresponding to the same operator A in the new bases

e' e' e' ' g' g; has now, by (35) and (36), the form

r

1 0 ... 0 0 ... 00 1 ... 0 0 ... 0

0 0 ... 1 0 ... 00 0 ... 0 0 ... 0

0 0...0 0...0

(37)

Along the main diagonal of If, starting at the top, there are r units ; all theremaining elements of If are zeros. Since the matrices A and Ir correspondto one and the same operator A, they are equivalent. As we have proved,equivalent matrices have the same rank. Hence the rank of the originalmatrix A is r.

We have shown that an arbitrary rectangular matrix of rank r is equiva-lent to the `canonical' matrix Jr. But Jr is completely determined by speci-fying its dimensions m X n and the number r. Therefore all rectangularmatrices of given dimension m X n and of given rank r are equivalent to oneand the same matrix Ir and consequently to each other. This completes theproof of the theorem.

3. Let A be a linear operator mapping an n-dimensional space R into ann-dimensional space S. The set of all vectors of the form Ax, where x e It,forms a vector space.10 This space will be denoted by AR; it is part of thespace S or, as we shall say, is a subspace of S.

Together with the subspace AR of S we consider the set of all vectorsx e R that satisfy the equation

Ax=o (38)

These vectors also form a subspace of R, which we shall denote by NA.

20 The set of vectors of the form Ax (x a R) satisfies the postulates 1.-7. of § 1,because the sum of two such vectors and the product of such a vector by a number are alsovectors of this form.

Page 75: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

64 III. LINEAR OPERATORS IN AN n-DIMENSIONAL VECTOR SPACE

DEFINITION 9: If a linear operator A maps R into S, then the dimension

r of the space AR is called the rank of A,11 and the dimension d of the spaceNA consisting of all vectors x e R that satisfy the condition (38) is calledthe defect, or nullity, of A.

Among all the equivalent rectangular matrices that describe a givenoperator A in distinct bases there occurs the canonical matrix I,. (see (37) ).We denote the corresponding bases of R and S by ei, eQ, ..., eA andgi, g$, .. , gm . Then

Aei=gi, ..., Ae,=gr, Ae,+1=...=AeA=o.

From the definition of AR and NA it follows that the vectors gi, gE, ... , g,form a basis of AR and that the vectors e;+1, er+s, ... , ew form a basis ofNA . Hence it follows that r is the rank of the operator A and that

d= n - r. (39)

If A is an arbitrary matrix corresponding to A, then it is equivalent toI, and therefore has the same rank r. Thus, the rank of an operator A coin-cides with the rank of the rectangular matrix A

A=all a12 ... a1.a21 an ... a2,

amt ams ... a,.

determined by A in arbitrary bases e,, e,, ... , e. e R and 91, 93, ... , g,,, a S.The columns of A are formed by the coordinate vectors Ale,, ... , Ae,,.

Since it follows from x = x{et that Ax = xAe,, the rank of A, i.e.,

the dimension of RA, is equal to the maximal number of linearly independ-ent vectors among Ael, Ae2, ... , Ae . Thus :

The rank of a matrix coincides with the number of linearly independentcolumns of the matrix.

Since under transposition the rows of a matrix become its columns andthe rank remains unchanged :

11 The dimension of the space AR never exceeds the dimension of R, so that r < n.

This follows from the fact that the equation x = xiei (where1.1

wbasis of R) implies the equation Ax x;Aei.

e1, e2, ... , e is a

i-1

Page 76: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 5. EQUIVALENT MATRICES. RANK OF OPERATOR. SYLVESTER'S INEQUALITY 65

The number of linearly independent rows of a matrix is also equal tothe rank o f the matrix.12

4. Let A and B be two linear operators and let C = AB be their product.Suppose that the operator B maps R into S and that the operator A maps Sinto T. Then the operator C maps R into T:

R 86.S__ A +T, R-Cw T.

We introduce the matrices A, B, C corresponding to A, B, C in somechoice of bases in R, S, and T. Then the matrix equation C = AB will cor-respond to the operator equation C =AB.

We denote by rd, rB, ro the ranks of the operators A, B, C or, what is thesame, of the matrices A, B, C. These numbers determine the dimensions ofthe subspaces AS, BR, A(BR). Since BR c S, we have A(BR) c AS.13Moreover, the dimension of A (BR) cannot exceed the dimension of BR.14

ThereforeroSrA, roSrB.

These inequalities were obtained in Chapter I, § 2 from the formula for theminors of a product of two matrices.

Let us regard A as an operator mapping BR into T. Then the rank ofthis operator is equal to the dimension of the space A(BR), i.e., to rC. There-fore, by applying (39) we obtain

rc= rB-di, (40)

where d1 is the maximal number of linearly independent vectors of BR thatsatisfy the equation

Ax = o . (41)

But all the solutions of this equation that belong to S form a subspace ofdimension d, where

d=n--rA

is the defect of the operator A mapping S into T. Since BR C S,

di-5 d.

From (40), (42), and (43) we find:

ra+rB-nSro.

(42)

(43)

12 In § 1 we reached these conclusions on the basis of other arguments (see p. 54).13 R C S means that the set R forms part of the set S.14 See Footnote 11.

Page 77: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

66 III. LINEAR OPERATORS IN AN n-DIMENSIONAL VECTOR SPACE

Thus we have obtained Sylvester's inequality for the rank of the productof two rectangular matrices A and B of dimensions m X n and n X q:

rA+rB-n5rABSmin(ra, rB). (44)

§ 6. Linear Operators Mapping an n-Dimensional Space into Itself

1. A linear operator mapping the n-dimensional vector space R into itself(here R = S and n = m) will be referred to simply as a linear operator in R.

The sum of two linear operators in R and the product of such an operatorby a number are also linear operators in R. Multiplication of two suchoperators is always feasible, and this product is also a linear operator in R.Hence the linear operators in R form a ring.'s This ring has an identityoperator, namely the operator E for which

Ex=x (xcR).

For every operator A in R we have

EA=AE=A.

(45)

If A is a linear operator in R, then the powers A2=AA, A3=AAA,and in general A' = AA...A have a meaning. We set A*= E. Then it is

m eeasy to see that for all non-negative integers p and q we have

APA4 =AP+t.

Let 1(t)= aer + alt" + + a,,,-it + a,, be a polynomial in a scalarargument t with coefficients in the field F. Then we set :

f(A)=a0AA1+alAm-1+...+asr1A+aE. (46)

Here I(A)g(A) = g(A)t(A) for any two polynomials f (t) and g (t).Let

y=Ax (x,yER).

We denote by x1i x2, ... , X. the coordinates of the vector x in an arbitrarybasis e1, e2,. .., e,, and by yl, Y2, ... , y, the coordinates of y in the samebasis. Then

rt

be =2 attxt (i =1, 2, ... , n)*x-i

(47)

15 This ring is in fact an algebra. Bee Chapter I, p. 17.

Page 78: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. MAPPING n-DIMENSIONAL SPACE INTO ITSELF 67

In the basis e1, es, ..., e the linear operator A corresponds to a square

matrix A= 11 aik 1101. ' We remind the reader (see § 2.) that in the k-thcolumn of this matrix are to be found the coordinates of the vector Aek(k =1, 2, ... , n). Introducing the coordinate columns x= (x1, x2, ... ,and y = (y1, y2, , y,,), we can write the transformation (47) in matrixform

y=Ax. (48)

The sum and product of two operators A and B correspond to the sumand product of the corresponding square matrices A = II a(5 li, andB bik Il .. The product aA corresponds to the matrix aA. The identityoperator E corresponds to the square unit matrix E= II 8a+i. Thus, thechoice of a basis establishes an isomorphism between the ring of linear opera-tors in R and the ring of square matrices of order n with elements in F. Inthis isomorphism the polynomial f (A) corresponds to the matrix f (A).

Let us consider, apart from the basis el, e2, . . . , e,,, another basisei,e2, . . . ,e* of R. Then, in analogy with (48), we have

y* ; A*x* (49)

where x*, y* are the column matrices formed from the coordinates of thevectors x, y in the basis ei, e8, ... , e.* and A*='I ask 11 is the square matrixcorresponding to the operator A in this basis. We rewrite in matrix formthe formulas for the transformation of coordinates

x=Tx*, y=Ty*. (50)

Then from (48) and (50) we find:

y* = T-IATx*;

and a comparison with (49) gives:

A*=T-IAT. (51)

Formula (51) is a special case of (31) on p. 61 (namely, P = T-1 andQ=T).

DEFINITION 10: Two matrices A and B connected by the relation

B = T-IA T T. (51')

where T is a non-singular matrix, are called similar."'

is See § 2 of this chapter. In this case the spaces R and S coincide; in the same way,the bases e,, e,, ... , e and gi, ga, ... , g- of these spaces are identified.

Page 79: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

68 III. LINEAR OPERATORS IN AN n-DIMENSIONAL VECTOR SPACE

Thus, we have shown that two matrices corresponding to one and thesame linear operator in R for distinct bases are similar and the matrix Tlinking these matrices coincides with the matrix of the coordinate trans-formation in the transition from the first basis to the second (see (50) ).

In other words, to a linear operator in R there corresponds a whole classof similar matrices ; they represent the given operator in various bases.

In studying properties of a linear operator in R, we are at the same timestudying the matrix properties that are common to the whole class of similarmatrices, that is, that remain unchanged, or invariant, under transition froma given matrix to a similar one.

We note at once that two similar matrices always have the same determi-nant. For it follows from (51') that

B=DTI-11 AII I is a necessary, but not a sufficient condition

for the similarity of the matrices A and B.In Chapter VI we shall establish a criterion for the similarity of two

matrices, i.e., we shall give necessary and sufficient conditions for twosquare matrices of order n to be similar.

In accordance with (52) we may define the determinant I A I of a linearoperator A in R as the determinant of an arbitrary matrix correspondingto the given operator.

If I A 1= 0 ( 0), then the operator A is called singular (non-singular).In accordance with this definition a singular (non-singular) operator cor-responds to a singular (non-singular) matrix in any basis. For a singularoperator:

1) There always exists a vector x o such that Ax = o;2) AR is a proper part of R.

For a non-singular operator :

1) Ax = o implies that x = o ;2) AR - R, i.e., the vectors of the form Ax (x a R) fill out the whole

space R.

In other words, a linear operator in R is singular or non-singular dependingon whether its defect is positive or zero.

17 The matrix T can always be chosen such that its elements belong to the same basicnumber field r as those of A and B. It is easy to verify the three properties of similarmatrices :

Reflexivity (a matrix A is always similar to itself)Symmetry (if A is similar to B, then B is similar to A) ; andTransitivity (if A is similar to B, and B to C, then A is similar to C).

Page 80: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 7. CHARACTERISTIC VALUES AND CHARACTERISTIC VECTORS 69

§ 7. Characteristic Values and Characteristic Vectorsof a Linear Operator

1. An important role in the study of the structure of a linear operator Ain R is played by the vectors x for which

Ax=Ax (lei, xro) (53)

Such vectors are called characteristic vectors and the numbers A corres-ponding to them are called characteristic values or characteristic roots ofthe operator A (or of the matrix A).t

In order to find the characteristic values and characteristic vectors of an{operator A we choose an arbitrary basis e1, e2, ... , e in R. Let x x{e3

and let A= 11 ack 11 71 be the matrix corresponding to A in the basise1, e2, ... , e.. Then if we equate the corresponding coordinates of the vec-tors on the left-hand and right-hand sides of (53), we obtain a system ofscalar equations

a11x1+

a12x2+ ... + alnxr.

= Ax1

a21x1+ a22x2 + .. - + a2nxn = Ax2

....................anlx1 + an2x2 + ... + annxn = .lxn,

which can also be written as(all - A) x1 + a12x2 + ... + alnx = O

a21x1 + (a22 _ 2)x2+ ... + a2nxn = 0

anlx1 + an2x2 + ... + (ann - 2) x a = 0

(54)

(55)

Since the required vector must not be the null vector, at least one of itscoordinates x,, x2, ... , x must be different from zero.

In order that the system of linear homogeneous equations (55) shouldhave a non-zero solution it is necessary and sufficient that the determinantof the system be zero :

all - A a12 ... alna21 a22 - I ... a2n..................

and ant ... ann - A

t Other terms in use for the former are: proper vector, latent vector, eigenvector.Other terms for the latter are: proper value, latent value, latent root, latent number,

characteristic number, eigenvalue, etc.

Page 81: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

70 III. LINEAR OPERATORS IN AN n-DIMENSIONAL VECTOR SPACE

The equation (56) is an algebraic equation of degree n in A. Its coeffi-

cients belong to the same number field F as the elements of the matrix

A = 11 a{k lift, .

Equation (56) occurs in various problems of geometry, mechanics,astronomy, and physics and is known as the characteristic equation or thesecular equation's of the matrix A = II ask I11 (the left-hand side is calledthe characteristic polynomial).

Thus, every characteristic value A of a linear operator A is a root of thecharacteristic equation (56). And conversely, if a number I is a root of(56), then for this value A the system (55) and hence (54) has a non-zerosolution x1i X2.... , x,,, i.e., to this number I there corresponds acharacteristicvector x = I xiei of the operator A.

From what we have shown, it follows that every linear operator A in Rhas not more than n distinct characteristic values.

If r is the field of complex numbers, then every linear operator in Ralways has at least one characteristic vector in R corresponding to a charac-teristic value A.19 This follows from the fundamental theorem of algebra,according to which an algebraic equation (56) in the field of complexnumbers always has at least one root.

Let us write (56) in explicit form

(57)

It is easy to see that here

(i kSi = atr Ss =A (58)

and, in general, S, is the sum of the principal minors of order p of the matrixA = Ij afk ,1 i (p1, 2, ... , n) .20 In particular, S. _ I A 1.

We denote by A the matrix corresponding to the same operator A inanother basis. A is similar to A :

18 The name is due to the fact that this equation occurs in the study of secular per-turbations of the planets.

19 This proposition is valid even in the more general case in which r is an arbitraryalgebraically closed field, i.e., a field that contains the roots of all algebraic equationswith coefficients in the field.

20 The power (-1)4-P occurs only in those terms of the characteristic determinant (56)that contain precisely n - p of the diagonal elements, say,

affil - A, off1-A, ..., sin_Pf._P-A.

The product of these diagonal elements occurs in the expansion of the determinant (56)

Page 82: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

Hence

§ 7. CHARACTERISTIC VALUES AND CHARACTERISTIC VECTORS 71

A= T-1 AT.

lE=T-1(A-AE)Tand therefore

1A-AEI =JA - AEI. (59)

Thus, similar matrices A and j have the same characteristic polynomial.This polynomial is sometimes called the characteristic polynomial of theoperator A and is denoted by I A - AE

If x, y, z, ... are linearly independent characteristic vectors of anoperator A corresponding to one and the same characteristic A, and a, P, y, .. .are arbitrary numbers of F, then the vector ax + fly + ya + . is either equalto zero or is also a characteristic vector of A corresponding to the same A.

For fromAx=Ax, Ay =Ay, Ax=Ax, ...

it follows that

A(ax+PY+Yz+...)=,1(ax+fY+Yz+ ..).

In other words, linearly independent characteristic vectors correspondingto one and the same characteristic value A form a basis of a `characteristic'subspace each vector of which is a characteristic vector for the same A. Inparticular, each characteristic vector generates a one-dimensional subspace,a `characteristic' direction.

However, if characteristic vectors of a linear operator A correspond todistinct characteristic values, then a linear combination of these character-istic vectors is not, in general, a characteristic vector of A.

The significance of the characteristic vectors and characteristic numbersfor the study of linear operators will be illustrated in the next section by theexample of operators of simple structure.

with a factor in which the term free of A is the principal minor

A1 iy ... ip

it t= ... ip)

where i,, i,, ... , i, together with j,, j,, ... , forms a complete set of indices 1, 2, ... , n ;hence in the development of (56) we have

flA -AEI _ (aim -A) (aj,i, - A) ... (ai

is ...-A) `4

ip+i1 i2 ... ip

When we take all possible combinations j,, j,, ... , jn_p of n - p of the indices 1, 2, . . . , n,we obtain for the coefficient .S, of (-A)"-p the sum of all principal minors of order p in A.

Page 83: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

72 III. LINEAR OPERATORS IN AN n-DIMENSIONAL VECTOR SPACE

§ 8. Linear Operators of Simple Structure

1. We begin with the following lemma.

LEMMA: Characteristic vectors belonging to pairwise distinct charac-

teristic values are always linearly independent.

Proof. Let

Axi=,lixi (xi o; A, At for i; k; i,k=1,2,...,m) (60)

and

cixi=o. (61)i-1

Applying the operator A to both sides we obtain :

XCAX,= o.i-1

(62)

We multiply both sides of (61) by At and subtract (61) from (62) term byterm. Then we obtain

in

Eci(li-11)x;=o. (63)i_2

We can say that (63) is obtained from (61) by termwise applicationof the operator A-21E. If we apply the operators A-22E,to (63) term by term, we are led to the following equation:

Cfa 2m-1) (Am- 4-2) ... (A.- A1) xm= 0

so that c, = 0. Since any of the summands in (61) can be put last, we havein (61)

c1ca=...=cam,=0,

i.e., there is no linear dependence among the vectors x1, x2, ... , x n . Thisproves the lemma.

If the characteristic equation of an operator has n distinct roots and theseroots belong to F, then by the lemma the characteristic vectors belonging tothese roots are linearly independent.

DEFINITION 11: A linear operator A in R is said to be an operator ofsimple structure if A has n linearly independent characteristic vectors in R,where n is the dimension of R.

Thus, a linear operator in R has simple structure if all the roots of thecharacteristic equation are distinct and belong to F. However, these condi-

Page 84: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 8. LINEAR OPERATORS OF SIMPLE STRUCTURE 73

tions are not necessary. There exist linear operators of simple structurewhose characteristic polynomial has multiple roots.

Let us consider an arbitrary linear operator A of simple structure. We

denote by g1, g2, ... , g a basis of R consisting of characteristic vectors of

the operator, i.e.,

If

x =,' xkgkk-1

thenn n

Ax = E xxAgk =.E Akxkgkk-1 k-1

n

The effect of the operator A of simple structure on the vector x = I xkgkk-1

may be put into words as follows :

In the n-dimensional space R there exist n linearly independent 'direc-tions' along which the operator A of simple structure realizes a 'dilatation'with coefficients A,, 22i . . . , A,,. An arbitrary vector x may be decomposedinto components along these characteristic directions. These componentsare subject to the corresponding 'dilatations' and their sum then gives thevector Ax.

It is easy to see that to the operator A in a `characteristic' basisg1, g2, ... , g, there corresponds the diagonal matrix

A = I A:Bck 117-

If we denote by A the matrix corresponding to A in an arbitrary basise1, e., ... , e,, then

A =I II 2j8& II i 7-'. (64)

A matrix that is similar (p. 68) to a diagonal matrix is called a matrix ofsimple structure. Thus, to an operator of simple structure there correspondsin any basis a matrix of simple structure, and vice versa.

2. The matrix T in (64) realizes the transition from the basis e1, e2, ... , e#to the basis g1, g2,... , g,. The k-th column of T contains the coordinates ofa characteristic vector gk (with respect to e1, e2, ... , that correspondsto the characteristic value 2k of A (k = 1, 2, . . . , n). The matrix T is calledthe fundamental matrix for A.

Page 85: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

74 III. LINEAR OPERATORS IN AN n-DIMENSIONAL VECTOR SPACE

We rewrite (64) as follows:

A =TLT-1 (L = (Al,1e, ... , A")) . (64')

On going over to the p-th compound matrices (1 < p < n), we obtain(see Chapter I, § 4) :

2p is a diagonal matrix of order N (N=(P")) along whose main diagonal

are all the possible products of Al, A2,... , A. taken p at a time. A comparisonof (65) with (64') yields the following theorem:

THEOREM 3: If a matrix A = II aik II i has simple structure, then forevery p < n the compound matrix 91p also has simple structure; moreover,the characteristic values of vtp are all the possible products 2 A Atp

(1 C i, < i2 < ... < ip < n) of p of the characteristic values A,, A2, ... , A.of A, and the fundamental matrix of stp is the compound Zp of the funda-mental matrix T of A.

COROLLARY: If a characteristic value Ak of a matrix of simple structureA = 11 aik II, corresponds to a characteristic vector with the coordinatestlk, t2k, ... , t"k (k = 1, 2, . . . , n) and if T = II t{k II, , then the characteristicvalue 4142 Ak,, (1:5 k, < k2 < ... < kp < n) of S1p corresponds to thecharacteristic vector with coordinates

T('1 ti$ ip) (1Si1<i2<...<ipSn). (66)kl ke ... kp

An arbitrary matrix A = 11 aik I+i may be represented in the form of asequence of matrices A. (m -4 oo) each of which does not have multiplecharacteristic values and, therefore, has simple structure. The characteristicvalues Ar'">, A$"'), of the matrix A. converge for m oo to the char-acteristic values A,, A2, ... , A. of A,

lim Ak'" = Ak (k = 1, 2, ... , n).my00

Hence

limAk7tA A =Ak,Ak1 A1,,, (1Sk1<ks<...<kp5n).111 -100

Moreover, since lim 1p = q(p , we deduce from Theorem 3:.-#00

Page 86: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ S. LINEAR OPERATORS OF SIMPLE STRUCTURE 75

THEOREM 4 (Kronecker) : If Al, 22, ... , A. is a complete system of char-acteristic values of an arbitrary matrix A, then a complete systent of charac-teristic values of the compound matrix Wp consists of all possible products ofthe numbers A,, 22i ... , )., taken p at a time (p =1, 2, ... , n).

In the present section we have investigated operators and matrices ofsimple structure. The study of the structure of operators and matrices ofgeneral type will be resumed in Chapters VI and VII.

Page 87: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

CHAPTER IV

THE CHARACTERISTIC POLYNOMIAL AND THEMINIMAL POLYNOMIAL OF A MATRIX

Two polynomials are associated with every square matrix: the characteristicpolynomial and the minimal polynomial. These polynomials play an impor-tant role in various problems of the theory of matrices. For example, theconcept of a function of a matrix, which we shall introduce in the nextchapter, will be based entirely on the concept of the minimal polynomial.In the present chapter, the properties of the characteristic polynomial andthe minimal polynomial are studied. A prerequisite to this investigationis some basic information about polynomials with matrix coefficients andoperations on them.

§ I. Addition and Multiplication of Matrix Polynomials

1. We consider a square polynomial matrix A(A), i.e., a square matrixwhose elements are polynomials in A (with coefficients in the given numberfield F)

A ('1)=11 ait(2) Ili=11a;x)fm+aik)2m-1 + ... (1)

The matrix A(A) can be represented in the form of a polynomial withmatrix coefficients arranged with respect to the powers of A :

whereA(I.)=A02m+Allm-1+ ... +Am, (2)

Ai = Il air '{i (9 =0, 1, ... , m) . (3)

The number m is called the degree of the polynomial, provided A. 0.The number n is called the order of the polynomial. The polynomial (1)is called regular if i Ao 1 0.

A polynomial with matrix coefficients will sometimes be called a matrixpolynomial. In contrast to a matrix polynomial an ordinary polynomialwith scalar coefficients will be called a scalar polynomial.

76

Page 88: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 2. RIGHT AND LEFT DIVISION OF MATRIX POLYNOMIALS 77

We shall now consider the fundamental operations on matrix poly-nomials. Let two matrix polynomials A (A) and B (A) of the same order begiven. We denote by m the larger of their degrees. These polynomials canbe written in the form

A(A)=A0Am+Al2m-1+...+Am,

B(A)=BO,lm+Bllm-1+... +Bm.

ThenA(A)±B(1)=(Ao±BO)Am+(A1+B1)2m-1+...+(Am±Bm),

i.e.: The sum (difference) of two matrix polynomials of the same order canbe represented in the form of a polynomial whose degree does not exceedthe larger of the degrees of the given polynomials.

Let A (A) and B (,l) be two matrix polynomials of the same order n andof respective degrees m and p :

A(2) = A02m + A1Rm-I + ... + A. (Ao O) ,B(A)=BOAp+B1ZP-1 +...+Bp (Bo 0).

ThenA (A) B (A) =AOB01m+P + (AOB1 + A1B0) Am+P-1 + ... + ArBp. (4)

If we multiply B(A) by A(2) (i.e., interchange the order of the factors),then we obtain, in general, a different polynomial.

2. The multiplication of matrix polynomials has a specific property. Incontrast to the product of scalar polynomials, the product (4) of matrixpolynomials may have a degree less than m + p, i.e., less than the sum ofthe degrees of the factors. For, in (4) the product AoBo may be the nullmatrix even though A0 0, Bo 0. However, if at least one of the matricesAo and Bo is non-singular, then it follows from Ao 0 and B. 0 thatAoBo 0. Thus: The product of two matrix polynomials is a polynomialwhose degree is less than or equal to the sum of the degrees of the factors.If at least one of the two factors is regular, then the degree of the product isalways equal to the sum of the degrees of the factors.

§ 2. Right and Left Division of Matrix Polynomials

1. Let A (A) and B (A) be two matrix polynomials of the same order n, andlet B(A) be regular:

A(A)=ADAm+AlAm-1+...+Am (A0 O),B(A) = BOAP + B1AP -1 + ... + Bp (I BO 10) .

Page 89: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

78 IV. CHARACTERISTIC AND MINIMAL POLYNOMIAL OF A MATRIX

We shall say that the matrix polynomials Q(1) and R(1) are the rightquotient and the right remainder, respectively, of A(1) on division byB(1) if

A (A)= Q (1) B (A) + B (A) (5)

and if the degree of R(1.) is less than that of B(1).Similarly, we shall call the polynomials. Q(1) and R(1) the left quotient

and the left remainder of A (1) on division by B (1) if

A (A) = B (A) Q (1) + R (A) (6)

and if the degree of R (1) is less than that of B (1).The reader should note that in the `right' division (i.e., when the right

quotient and the right remainder are to be found) in (5) the quotient Q(1)is multiplied by the `divisor' B(1) on the right, and in the `left' division in

(6) the quotient Q(1) is multiplied by the divisor B(1) on the left. Thepolynomials Q(1) and R(1) do not, in general, coincide with Q(1) and R(1).

2. We shall now show that both right and left division of matrix polynomialsof the same order are always possible and unique, provided the divisor is aregular polynomial.

Let us consider the right division of A(1) by B(A). If m < p, we can setQ (1) = 0, R (1) = A (A). If m ? p, we apply the usual scheme for the divi-sion of a polynomial by a polynomial in order to find the quotient Q(1) andthe remainder R (A). We `divide' the highest term of the dividend Ao2m bythe highest term of the divisor Bo2P. We obtain the highest term AOBo11of the required quotient. We multiply this term on the right by the divisorB(1) and subtract the product so obtained from A(1). Thus we find the`first remainder' AM(2):

A(A) = AOBB'Am -P B(A) + A(1)(A).

The degree m(l) of A(1)(1) is less than m :

A(')(A) _ A () Am( + ... (Aal) 0, mil) < m) .

If m(')zP, then we repeat the process and obtain :

A(l)(2) =A(') B;1 Am(1)-P B(A) + A(2)(A) ,A(2)(A) = A(02) lm(2) + ... (m(2) < m(l)),

(7)

(8)

(9)

etc.

Page 90: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 2. RIGHT AND LEFT DIVISION OF MATRIX POLYNOMIALS 79

Since the degrees of A(A), A(')(A), A(2)(A), ... decrease, at some stage we

arrive at a remainder R(A) whose degree is less than p. Then it follows

from (7) and (9) thatA (A) = Q (A)B (A) + .R(A),

whereQ(A) =A0Bo1 A--P + A(1)B0' A'(')-P + ... , (10)

We shall now prove the uniqueness of the right division. Suppose wehave simultaneously

andA(A) = Q(A) B (A) + R(A) (11)

A(A) = Q*(A) B(A) + R*(A). (12)

where the degrees of R(A) and R*(A) are less than that of B(A), i.e., lessthan p. Subtracting (11) from (12) term by term we obtain

[Q (A)-Q*(A)]B(A)=R*(A)-B(A). (13)

If we had Q (A) - Q * (A) * 0, then the degree on the left-hand side of (13)would be the sum of the degrees of B (2) and Q (A) - Q *(A), becauseBo 10, and would therefore be at least equal to p. This is impossible,

since the degree of the polynomial on the right-hand side of (13) is lessthan p. Thus, Q (A) - Q * (A) = 0, and then it follows from (13) thatR(A) -R*(A) = 0, i.e.,

Q (A) = Q* (A), R (A) = B* (A)

The existence and uniqueness of the left quotient and left remainder isestablished similarly.'

1 Note that the possibility and uniqueness of the left division of A (1) by B(A) followsfrom that of the right division of the transposed matrices AT(A) and BT(A). (The regu-larity of B (A) implies that of BT (A).) For from

AT(A) =Q1(1) BT(A) + RI(A)

it follows (see Chapter I, p. 19) that

A(A) = B (A) QT(A) + RI(A) (61)

By the same reasoning, the left division of A(A) by B(A) is unique; for if it were not,then the right division of AT(A) by BT(A) would not be unique.

Comparison of (6) and (6') givesn

Q(A)=QT(A), R(A)=RT (A)

Page 91: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

80 IV. CHARACTERISTIC AND MINIMAL POLYNOMIAL OF A MATRIX

Example.SA(A)-II-A3-2A2+ 1

Ao

2A' +'1'3A' + A

= 11-1

jI.134lf-21111'+111°lRa+11' oil.

B (A) =

Bof=1,Bo1'11 1

21'+3-A'-1 A'+ 211- 1i-l 11 I_+ 11-1 211'

2 1I, AoBl-112511.

AoB1B(A)-11-A=+1 312+121i'

1(1'(1) - IS + A

II -A'-2A'+ 1

II

-3A-211-A+ 1

A(1) (1) = II - 0

2A3 + 12+ A311+1 II-II-A'+A

A3 - 13A-111 11'

4111°+ 11-1-11111 + 11 1 0

21(o1)x18(1)--11-21 -2

B (A) = A(') (1) - A(O1)Bo-'B (A)

223 + 131II313 +12A

1231 1'+2!1=11-21'--4

At -65111

_3AI

5 1--1 -13A-511-212 --1+1 13-11111-11-213-4

A2--}632 +5 -112+6

e(A)=AoBolA+AV>Bo1=112 5111+11-2-2211=112111+1

As an exercise, the reader should verify that

A(A) =Q(A)B(A) + R(2).

51+261-211-

11

§ 3. The Generalized Bezout Theorem

1. We consider an arbitrary matrix polynomial of order n

F(1)=F0A'"+FFR'm-1+....+F. (Fo,O). (14)

B,

Page 92: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 3. THE GENERALIZED BEZOUT THEOREM 81

This polynomial can also be written as follows:

F(A)=2mF0+2m-1F1+...+pal. (15)

For a scalar A, both ways of writing give the same result. However, ifwe substitute for the scalar argument A a square matrix A of order n, thenthe results of the substitution in (14) and (15) will, in general, be distinct,since the powers of A need not be permutable with the matrix coefficientsF0,F1i...,Fm.

We set

andF(A)=FAm+FiAm-1+...+Fm (16)

F(A)=ArFO+Am-1F1+...+Fm, (17)

and call F(A) the right value and F(A) the left value of F(2) on substi-tution of A for ,1.2

We divide F (,l) by the binomial AE- A. In this case the right re-mainder R(A) and left remainder R(A) will not depend on A. To determinethe right remainder we use the usual division scheme :

P(A) =FO2m+F12m-1 + ... +Fm

=F0Am-1(AE -A) + (FOA + F1) Am-1 +F22m-2 + . .[Fo2m-1+ (FoA+F1) 2m-21 (AE-A)+(FOA2+F1A+F2)dm-2+Fe2m-$+...

_ [FOAM-1 + (FOA + F1) Am-2 + .. .

+ FOAm-1 + F1Am-2 + ... + F,n-11 PE - A)+FOAm+F1Am-1+ ... +Fm.

Thus we have found that

SimilarlyR`FOAm+F1Am-1+ ... +Fm=F(A), (18)

This provesR=F(A). (19)

THEOREM 1 (The Generalized Bezout Theorem) : When the matrix poly-nomial F(R) is divided on the right by the binomial AE - A, the remainderis F(A); when it is divided on the left, the remainder is I`(A).

2In the 'right' value F(A) the powers of A are at the right of the coefficients; inthe `left' value F(A), at the left.

Page 93: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

82 IV. CHARACTERISTIC AND MINIMAL POLYNOMIAL OF A MATRIX

2. From this theorem it follows that:A polynomial F(1) is divisible by the binomial AE - A on the right (left)

without remainder if and only if F(A) = 0 (F(A) = 0).Example. Let A= II a;k I'; and let f (1) be a polynomial in A. Then

F(A) = f(1)E-f(A)

is divisible by AE - A (both on the right and on the left) without remainder.This follows immediately from the generalized Bezout Theorem, because in

this case P(A) =F(A) =0.

§ 4. The Characteristic Polynomial of a Matrix. The Adjoint Matrix1. We consider a matrix A = II aik 11 77 . The characteristic matrix of A is1E - A. The determinant of the characteristic matrix

A(1)= I2E-A I =I18a--aoli

is a scalar polynomial in A and is called the characteristic polynomial of A(see Chapter III, § 7).11

The matrix B(1) = 11 ba (1) 11*,, where bik (A) is the algebraic complementof the element 18ik - aik in the determinant d (A) is called the ad joint matrixof A.

By way of example, for the matrix

A=

we have :

1E-A=-a. - alsI!

1-a23 -ate- ass ' - a83

A(1)=I2E-A I=13-(a3i+a82+a33)12+ ...,

13 -- (a23 + a.)1 + a2saa9 - a23ass

B(1) =

all a12 a13

a., a22 a23

a81 a33 as9

a,12 + a2aS1 - a2iasa

a,12 + a21a32 - a22a3t

8 This polynomial differs by the factor (-1)" from the polynomial A(k) introducedin Chapter III, § 7.

Page 94: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 4. CHARACTERISTIC POLYNOMIAL OF A MATRIX. ADJOINT MATRIX 83

These definitions imply the following identities in A :

(AE -- A) B (A) =,J (A) E, (20)

B (1) (AE -A) =A (A) E. (20')

The right-hand sides of these equations can be regarded as polynomials withmatrix coefficients (each of these coefficients is the product of a scalar andthe unit matrix E). The polynomial matrix B(A) can also be representedin the form of a polynomial arranged with respect to the powers of A. Equa-tions (20) and (20') show that A (A)E is divisible on the right and on theleft by AE -A without remainder. By the Generalized Be'zout Theorem,this is only possible when the remainder A (A) E = A (A) is the null matrix.Thus we have proved :

THEOREM 2 (Hamilton-Cayley) : Every square matrix A satisfies itscharacteristic equation, i.e.

Example.

A('1)=

A(A)=A -5A+72=

A(A)=0. (21)

2 -11311 '

A-2 -1 _1 ,1_3(==A -5A+7,

3 51

-5 8I-5 I-1 3'I+7I

2. We denote by Al, A2, ... , A. all the characteristic values of A, i.e., all theroots of the characteristic polynomial A (A) (each A{ is repeated as often asits multiplicity as a root of A(A) requires). Then

A(1)=IAR-AI =(A--A,)(A-A=) ... (A-A»). (22)

Let g(µ) be an arbitrary scalar polynomial. We wish to find the charac-teristic values of g (A) . For this purpose we split g (y) into linear factors

g (µ) = ao (p -- ph) (# -- pas) ... (p - (23)

On both sides of this identity we substitute the matrix A for ju :

g (A) - ao (A - phE) (A -.nag) ... (A - 4u E) . (24)

Passing to determinants on both sides of (24) and using (22) and (23)we find

Page 95: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

84 IV. CHARACTERISTIC AND MINIMAL POLYNOMIAL OF A MATRIX

g(A)1=aolA-y1EI IA-# 9I...IA-µe-91_ (-1)"'a"od (Pi) A (4u2) ... A (u )

_ {-1)"t4rl j1 (µc - At) = g (Ai) g (As) ... g(-i k-1

If in the equation19 (A) I = 9 (Al) 9 (As) ... 90.) (25)

we replace the polynomial g (u) by A - g (,u), where A is some parameter,we find :

SAE-g(A)I=[A-g(Aj)] [A-g(A$)] ... (26)

This leads to the following theorem.

THEOREM 3: If Al, A2.... , A" are all the characteristic values (with theproper multiplicities) of a matrix A and if g(µ) is a scalar polynomial, theng(Al), g(4), ... , 9(A,,) are the characteristic values of g(A).

In particular, if A has the characteristic values A,, A2, ... , A., then Ak hasthe characteristic values Ak, AQ, ..., A." (k = 0, 1, 2, ...)

3. We shall now derive an effective formula expressing the adjoint matrixB (A) in terms of the characteristic polynomial d(A).

LetA (A)= I" - pp2"-1 - p21"-2 - ... _P.. (27)

The difference d (A) - d (µ) is divisible by A - µ without remainderTherefore

e (A ) =d 28, µ ( )

is a polynomial in A and y.The identity

A(A)-d(fu)=6(A, fu)(A-/,&) (29)

will still hold if we replace A and jz by the permutable matrices AE and A.Since by the Hamilton-Cayley Theorem A (A) = 0,

d (A) E= a (AE, A) (AE - A). (30)

Comparing (20') with (30), we obtain by virtue of the uniqueness of thequotient the required formula

B(A)=6(A.E, A). (31)

Page 96: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 4. CHARACTERISTIC POLYNOMIAL OF A MATRIX. ADJOINT MATRIX 85

Hence by (28)B (A) = E;r-' + B112 + B,r-s + ... + B.-.,, (32)

whereB1= A - p1E, B2 = Aa - p1A - p2E, .. .

and, in general,

Br= Ak - p1Ak-I - p2Ax-2 , ... - p ' (lc=1,2, ... , n -1) . (33)

The matrices B1, B2, . . . , B,_1 can be computed in succession, startingfrom the recurrence relation

Bt = A Bl-1- p +. (k= 1, 2, ... , n -1; B0 = E) . (34)

Moreover,'(35)

The relations (34) and (35) follow immediately from (20) if we equatethe coefficients of equal powers of A on both sides.

If A is non-singular, then

V.= (- 1)11-1 JA 1:pA-0,

and it follows from (35) that

A-' = 1 B.-I. (36)Pa

Let Ao be a characteristic value of A, so that A (Ao) = 0. Substituting thevalue A,, in (20), we find :

(A E-A)B(A)=0. (37)

Let us assume that B (Ao) 0 and denote by b an arbitrary non-zerocolumn of this matrix. Then from (37) we have (A0E - A) b = 0 or

Ab = lob. (38)

Therefore every non-zero column of B (Ao) determines a characteristic vectorcorresponding to the characteristic value Ao.s

Thus :

' From (34) follows (33). If we substitute in (35) the expression for B,._, given in(33), we obtain A(A) = 0. This approach to the Theorem does notrequire the Generalized Theorem explicitly, but contains this theorem implicitly.

5 See Chapter III, § 7. If to the characteristic value X. there correspond d, linearlyindependent characteristic vectors (n - do is the rank of a.E - A ), then the rank ofB (X.) does not exceed do. In particular, if only one characteristic direction correspondsto X,, then in B(X,) the elements of any two columns are proportional.

Page 97: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

86 IV. CHARACTERISTIC AND MINIMAL POLYNOMIAL OF A MATRIX

If the coefficients of the characteristic polynomial are known, then theadjoint matrix can be found by formula (31). If the given matrix A isnon-singular, then the inverse matrix A-' can be found by formula (36).If 1o is a characteristic value of A, then the non-zero columns of B (10) arecharacteristic vectors of A for 1=10.

Example.

A=

-2 -1 1

0 -3 1

-1 1 -3

JA-2 1 -10 2-1 -11 -1 A-1

.

=A'-42 +5A-2,

a(1,p)=4()-µ.(p)=A'+A 4 0-B(A)=8(AB,A)=A'B+A(A-4E)+A'-4A+ BE.

B1 B,

But

B1=A-4B=

B (A) _

2 -1 1

0 1 1

-1 1 1

B,=AB1+5E=

1'-21 -A+2 A-2-1 A;-3A+3 A-2-A+1 A-1 As--3A+2

Furthermore,

JA 1=2, A-'= 2 B:=

0 2 -2-1 3 -2

1 -1 2

1 0 1 -11 3 -121 _ 1 1

2 2

4(1)=(1-1)2 (1-2).

The first column of the matrix B (+l) gives the characteristic vector(+1, +1, 0) for the characteristic value A = 1.

The first column of the matrix B(+2) gives the characteristic vector(0, +1, +1) corresponding to the characteristic value A= 2.

Page 98: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 5. THE METHOD OF FADDEEV 87

§ 5. The Method of Faddeev for the Simultaneous Computation of theCoefficients of the Characteristic Polynomial and of the Adjoint Matrix

1. D. K. Faddeeve has suggested a method for the simultaneous determina-tion of the scalar coefficients pl, p2, . . . , p" of the characteristic polynomial

A (A) = An - p1A"-1 - P2A -2 - ... - pA (39)

and of the matrix coefficients B1, B2, ... , of the ad joint matrix B(A).In order to explain the method of Faddeev' we introduce the concept of

the trace (or spur) of a matrix.By the trace tr A of a matrix A = !I as II1 we mean the sum of the diago-

nal elements of the matrix :N

tr A =Za{{. (40)i-1

It is easy to see that

trA=p1= (41)i-1

if Al, A2, ... , A. are the characteristic values of A, i.e., if

A(A)=(A-Al)(A- '2) ... (2-A"). (42)

Since by Theorem 3 Ak has the characteristic values A1, A2, ..., An(k=0, 1,2,...,),wehave

"tr Ak= 8k =. A{ (k= 0, 1, 2, ...). (43)

+-1

The sums sk (k =1, 2, ... , n) of powers of the roots of the polynomial(39) are connected with the coefficients by Newton's formulas8

Jcpk= 8 k - p18k_1 - ... - pk_181 (k 1, 2, ... , n). (44)

If the traces sl, 82, . . . , s" of the matrices A, A2, ... , All are computed, thenthe coefficients p1, p2, - , p, can be determined from (44). This is themethod of Leverrier for the determination of the coefficients of the charac-teristic polynomial from the traces of the powers of the matrix.2. Faddeev has proposed to compute successively, instead of the traces ofthe powers A, A-2,. . ., All, the traces of certain other matrices A1, A2, ... , A.

6 See [14], p. 160.

7 In Chapter VII, § 8, we shall discuss another effective method, due to A. N. Krylov,of computing the coefficients of the characteristic polynomial.

8 See, for example, G. Chrystal, Textbook of Algebra, Vol. I, pp. 436ff.

Page 99: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

88 IV. CHARACTERISTIC AND MINIMAL POLYNOMIAL OF A MATRIX

and so to determine pi, p2, ... , p and B1, B2, ... , B. by the followingformulas :

A1= A, p1= tr A1, B1= A1- p1E,

- tr AA$ = AB1, p8 - - z, B2 = A$ - pzI`',

An_1= ABn-2, Pn_1 = n 1 1 tr An-1, Bn_1= An_1- pn-iE,

An=ABn_1, p, = n trA., Bn=An-pnE=0-

(45)

The last equation B. = A. - 0 may be used to check the computation.In order to convince ourselves that the numbers p1, p2, . . . , pn and the

matrices B1, B2, . . . , that are determined successively by (45) are, infact, the coefficients of A(2) and B(A), we note that the following formulasfor Ak and Bk (k = 1, 2, ... , n) follow from (45) :

Ak = Ak - p1Ak-1- ... - pk_lA, Bk =Ak - p1Ak-1- ... - pk-1A -prE (46)

Equating the traces on the left-hand and right-hand sides of the first of theseformulas, we obtain

kpk= 8k._ p18k-1- ... - Pk-i8i

But these formulas coincide with Newton's formulas (44) by which thecoefficients of the characteristic polynomial d(al) are determined succes-sively. Therefore the numbers pl, p2, ... , pn determined by (45) are alsothe coefficients of A(A). But then the second of formulas (46) coincidewith formulas (33) by which the matrix coefficients B1, B2, ... , of theadjoint matrix B(A) are determined. Therefore, formulas (45) also deter-mine the coefficients B1, B2, ... , of the matrix polynomial B(A).

Exomple.9

2-1 1 2 -2 -1 1 20 1 1 0 0 -3 1 0A= -1 1 1 1

, p1= trA=4, B1= A-4E= -1 1 -3 1

1 1 1 0 1 1 1 -42 2 4 3

9 As a check on the computation, we write under each matrix A,, As, As a row whoseelements are the sums of the elements above it. The product of this row of 'column-sums'of the first factor into the columns of the second factor must give the elements of thecolumn-sum of the product.

Page 100: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. MINIMAL POLYNOMIAL OF MATRIX 89

A,=AB1 =

A,=AB,=

-3 4 0-3-1-2-2 1

2 0-2-5-3-3-1 3

-5-1--5-4-5 2 0-2

1 0 -2 -4-1-7-3 40 4-2-7

-5 -1-7 -9

p,=2trA,=-2, B2=A2+2E=

p,=3 trA,=-5, Bs=A,+5E=

-2 0 0 00 -2 0 00 0 -2 00 0 0 -2

, p,=-2.A4=ABs=

J (1)=1'-41'+211+51+2,

J A;_-2, A'I=1 B,p4

0 -11 5

2

1 7

2 2

0 -2-1 -2

1 1

-1 4 0-3-1 0-2 1

2 0 0-5-3--3-1 3

0 2 0 -21 5 -2 4

-1-7 2 40 4-2-2

Note. If we wish to determine p1, p2, p3, p4 and only the first columns ofB1, B2, B3, it is sufficient to compute in A2 the elements of the first columnand only the diagonal elements of the remaining columns, in A3 only theelements of the first column, and in A4 only the first two elements of thefirst column.

§ 6. The Minimal Polynomial of a Matrix

1. DEFINITION 1: A scalar polynomial f (1) is called an annihilating poly-nomial of the square matrix A if

f(A) =0.An annihilating polynomial tp(A) of least degree with highest coefficient

1 is called a minimal polynomial of A.By the Hamilton-Cayley Theorem the characteristic polynomial d (A)

is an annihilating polynomial of A. However, as we shall show below, itis not, in general, a minimal polynomial.

Let us divide an arbitrary annihilating polynomial f (2) by a minimalpolynomial

f(1)=y,(1)q(A)+r(1),

Page 101: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

90 IV. CHARACTERISTIC AND MINIMAL POLYNOMIAL OF A MATRIX

where the degree of r(A) is less than that of zp(A). Hence we have:

f (A) = +p (A) q (A) + r (A).

Since f (A) = 0 and ip(A) = 0, it follows that r(A) = 0. But the degreeof r(A) is less than that of the minimal polynomial ip(A). Thereforer(A) =0.10 Hence: Every annihilating polynomial of a matrix is divisiblewithout remainder by the minimal polynomial.

Let lpl (A) and lp2 (A) be two minimal polynomials of one and the samematrix. Then each is divisible without remainder by the other, i.e., thepolynomials differ by a constant factor. This constant factor must be 1,because the highest coefficients in VYI(A) and tp2(A) are 1. Thus we haveproved the uniqueness of the minimal polynomial of a given matrix A.

2. We shall now derive a formula connecting the minimal polynomial withthe characteristic polynomial.

We denote by Ds-1 (A) the greatest common divisor of all the minors oforder n - 1 of the characteristic matrix AE - A, i.e., of all the elementsof the matrix B (A) = I I b{k (A) I I 1 (see the preceding section). Then

B (A) = D.-IL (A) C (A), (47)

where C (A) is a certain polynomial matrix, the `reduced' ad joint matrixof 1E - A. From (20) and (47) we have :

A (A) E = (AE - A) C (A) D _1(A) . (48)

Hence it follows that A (A) is divisible without remainder by (A):"

D.-IL(1) = Y' (A), (49)

where V(A) is some polynomial. The factor D,-,(A) in (48) may be can-celled on both sides :12

,p(A)E=(AE-A)C(A). (50)

10 Otherwise there would exist an annihilating polynomial of degree less than that ofthe minimal polynomial.

11 We could also verify this immediately by expanding the characteristic determinantd (,l) with respect to the elements of an arbitrary row.

12 In this case we have, apart from (50), also the identity (see (201))

y(d)E=C(,t)(lE-A),

i.e., C(A) is at one and the same time the left quotient and right quotient of lp(A)E ondivision by AR - A.

Page 102: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. MINIMAL POLYNOMIAL OF MATRIX 91

Since tp (A) E is divisible on the left without remainder by AE - A, itfollows by the Generalized Be'zout Theorem that

V(A) =0.

Thus, the polynomial V(A) defined by (49) is an annihilating polynomialof A. Let us show that it is the minimal polynomial.

We denote the minimal polynomial by 1V*(A). Then y, (A) is divisible by,V* (A) without remainder :

V (1) = Y,* (A) X (A) (51)

Since 1V* (A) = 0, by the Generalized Bezout Theorem the matrix polynomialp (A) E is divisible on the left by AE - A without remainder :

V* (1) E _ (1E -A) C* (1). (52)

From (51) and (52) it follows that

+V (A) E = (1E - A) C* (1) X (A). (53)

The identities (50) and (53) show that C(A) as well as C*(1)x(1) are leftquotients of ip(A) E on division by 1E - A. By the uniqueness of division

C (1) = C* (A) X (A).

Hence it follows that x (2) is a common divisor of all the elements of thepolynomial matrix C(A). But, on the other hand, the greatest commondivisor of all the elements of the reduced adjoint matrix C (A) is equal to 1,because the matrix was obtained from B (A) by division by D1(A) . There-fore x(1) =const. Since the highest coefficients of TV(A) and iV*(A) areequal, we have in (51) x(A) = 1, i.e., ip(A) =y,*(A), and this is what we hadto prove.

We have established the following formula for the minimal polynomial :

(54)

3. For the reduced adjoint matrix C(A) we have a formula analogous to(31) (p. 84) :

C (A) _ 7(AE, A); (55)

where the polynomial W(A, p) is defined by the equation"

13 Formula (55) can be deduced in the same way as (31). On both sides of theidentity y,(A) -y,(µ) _ (A p) (2, e) we substitute for A and p the matrices AE and Aand compare the matrix equation so obtained with (50).

Page 103: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

92 IV. CHARACTERISTIC AND MINIMAL POLYNOMIAL OF A MATRIX

(56)-kMoreover,

(AE-A)C(A)= o(A)E. (57)

Going over to determinants on both sides of (57), we obtain

A (A) I C (A) I = [+v (A)]". (58)

Thus, 4(A) is divisible without remainder by W(A) and some power of W(A)is divisible without remainder by d (A), i.e., the sets of all the distinct rootsof the polynomials d (A) and 9) (A) are equal. In other words : All the distinctcharacteristic values of A are roots of W(A).

If

then

where

A (A) _ (A-A1)"' (A-AE)", ... (A- AX'(A, Al for i j; n,>0, i, j=1, 2, ... , 8),

, (A) = (A-A1)+". (A--2$)"a ... (A,-A.)m,

(59)

(60)

0<mk:nk (k = 1, 2, . .. , a). (61)

4. We mention one further property of the matrix C(A). Let Ao be anarbitrary characteristic value of A = II a{k !I; . Then W(Ao) = 0 and there-fore, by (57),

(20E-A) C (Ao) = 0. (62)

Note that C (Ao) ; 0 always holds, for otherwise alt the elements of thereduced adjoint matrix C (A) would be divisible without remainder by A - Ao,and this is impossible.

We denote by c an arbitrary non-zero column of C(A,). Then from (62)

(4E -A) c =o,

Ac = Aoc. (63)

In other words, every non-zero column of C(A0) (and such a column alwaysexists) determines a characteristic vector for A = Ao.

Example.

A=3 -3 2

-1 5 -2-1 3 0

2-3 3 -2(A) 1 2-5

2

Page 104: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. MINIMAL POLYNOMIAL OF MATRIX 93

) =d(N)-d(2) _ 2+ (2-8)+2'-82+208 ,k.µ(

B(2) =A'+(2-8)A+(2'-82+20)E

10 - 18 12 3 -3 21 1 0 0-6 22 - 12 + (1-8) -1 b

-2+ (2 '-8.+20) 0 1 0

-6 18 - 8 -1 3 0 1 0 1

11 22-52 + 6 -3A+6 2A-41'

Il

2+2 2'-32+22+2 32-6 2'-82+2

All the elements of the matrix B (A) are divisible by D2 (A) = A - 2.ling this factor, we have :

C(2)=

and

i 2- 3 -3 2 1!-1 2-1 2-1 3 2-6I'

d(2) -1-2

Cancel-

In C(A) we substitute for A the value Ao = 2:

-1 -3 2C(2)= 1-1 1 -2

i-1 3

-4II.

The first column gives us the characteristic vector (1, 1, 1,) for A,, = 2. Thesecond column gives us the characteristic vector (- 3, 1, 3) for the samecharacteristic value Ao = 2. The third column is a linear combination of thefirst two.

Similarly, setting Ao = 4, we find from the first column of the matrixC (4) the characteristic vector (1, - 1, - 1) corresponding to the charac-teristic value Ao = 4.

The reader should note that y.,(A) and C(A) could have been determinedby a different method.

To begin with, let us find D2(A). D2(A) can only have 2 and 4 as itsroots. For A = 4 the second order minor

11-5I1 -31 =-2+2

of d (A) does not vanish. Therefore D2(4) r 0. For A= 2 the columns ofA(A) become proportional. Therefore all the minors of order two in A(A)

Page 105: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

94 IV. CHARACTERISTIC AND MINIMAL POLYNOMIAL OF A MATRIX

vanish for A = 2 : D2(2) = 0. Since the minor to be computed is of thefirst degree, D2(A) cannot be divisible by (,l - 2)2. Therefore

D2(2)=1-2.Hence

2('1-2)(A-4)=.% -62+8,

1-3 -3 2C('1)=V(AE,A)=A+(x-6)E=

11

--1 1-1 -2-1 3 2-6

Page 106: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

CHAPTER V

§ 1. Definition of a Function of a Matrix

1. Let A= 11 alk 11 11' be a square matrix and f (A) a function of a scalarargument A. We wish to define what is to be meant by f (A), i.e., we wishto extend the function f (A) to a matrix value of the argument.

We already know the solution of this problem in the simplest special casewhere f (A) = y, A'+ yl 11-1 + + yt is a polynomial in A. In this case,/(A) = yo A' + yl A'-1 + . + y,E. Starting from this special case, we shallobtain a definition of f (A) in the general case.

We denote by

1'(A) (A-A,)"'' (1)

FUNCTIONS OF MATRICES

the minimal polynomial' of A (where A,, A2, . . . , A. are all the distinct charac-

teristic values of A). The degree of this polynomial is m = Mk-k_1

Let g(A) and h(A) be two polynomials such that

g (A)= h (A). (2)

Then the difference d(A) =g(A) -h(A), as an annihilating polynomial forA, is divisible by p(A) without remainder ; we shall write this as follows :

Hence bf (1)g(A)-h(A) (mod o(A)). (3)

d(Ak)=0, d'(Ak)=0, ..., d(nk-1)(Ar)=0 (k =1, 2, ..., 8),i.e.,

9 (Ar) = h (Ak), 91

(Ak) = h' (1k), (Ar) = h0 k-1)(Ar)

(k=1, 2, ..., 8).

i See Chapter IV, § 6.

(4)

95

Page 107: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

96 V. FUNCTIONS OF MATRICES

The m numbers

f (1k), f, (Ak) f(mk- l) (Ak) (k= 1,2,...,8) (5)

will be called the values of the function f (A) on the spectrum of the matrix Aand the set of all these values will be denoted symbolically by I (AA)- If fora function f (A) the values (5) exist (i.e., have meaning), then we shall saythat the function f (A) is defined on the spectrum of the matrix A.

Equation (4) shows that the polynomials g(A) and h(2) have the samevalues on the spectrum of A. In symbols :

g (Ad) =A (Ad)

Our argument is reversible : from (4) follows (3) and therefore (2).Thus, given a matrix A, the values of the polynomial g(2) on the spec-

trum of A determine the matrix g (A) completely, i.e., all polynomials g (A)that assume the same values on the spectrum of A have one and the samematrix value g (A) .

We postulate that the definition of f (A) in the general case be subjectto the same principle: The values of the function f(A) on the spectrum ofthe matrix A must determine f (A) completely, i.e., all functions f (A) havingthe same values on the spectrum of A must have the same matrix value f (A).

But then it is obvious that for the general definition of f (A) it is suffi-cient to look for a polynomial2 g(A) that assumes the same values on thespectrum of A as f (A) does and to set :

f(A)=g(A).

We are thus led to the following definition :

DEFINITION 1: If the function f(A) is defined on the spectrum of thematrix A, then

f (Ad) = g (Ad)

where g(A) is an arbitrary polynomial that assumes on the spectrum of Athe same values as does f (A) :

f(A)=g(A).Among all the polynomials with complex coefficients that assume on the

spectrum of A the same values as f (A) there is one and only one polynomial

2 It will be proved in § 2 that such an interpolation polynomial always exists and analgorithm for the computation of the coefficients of the interpolation polynomial of leastdegree will be given.

Page 108: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 1. DEFINITION OF FUNCTION OF MATRIX 97

r(A) that is of degree less than m.3 This polynomial r(A) is uniquely deter-mined by the interpolation conditions :

9 (Ak) = f (Ak), r' (2k)= f" (2k), ... , r(mk-1) (Ak) = j(mk-1)(Ak) (6)

(k=1,2,...,s).

The polynomial r(A) is called the Lagrange-Sylvester interpolation poly-nomial for f (A) on the spectrum of A. Definition 1 can also be formulatedas follows :

DEFINITION 1': Let f(A) be a function defined on the spectrum of amatrix A and r(A) the corresponding Lagrange-Sylvester interpolation poly-

nomial. Then

/(A) = r(A).

Note. If the minimal polynomial Ip(A) of a matrix A has no multipleroots4 (in (1) m1=m2=...=m,=1; s = m), then for f(A) to have ameaning it is sufficient that f (A) be defined at the characteristic valuesAl, A2, . . . , Am. But if Ip(A) has multiple roots, then for some characteristicvalues the derivatives of f (A) up to a certain order (see (6)) must be definedas well.

Example 1: Let us consider the matrix5

H=

n0 1 0...00 0 1 ...0

0 0 0... I0 0 0...0

Its minimal polynomial is An. Therefore the values of f (A) on the spec-trum of H are the numbers f (0), f'(0), ... , f(0), and the polynomialr (A) is of the form

A"-1

Therefore

3 This polynomial is obtained from any other polynomial having the same spectralvalues by taking the remainder on division by p(r) of that polynomial.

4 In Chapter VI it will be shown that A is a matrix of simple structure (see ChapterIII, § 8) in this case, and this case only.

5 The properties of the matrix H were worked out in the example on pp. 13-14.

Page 109: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

98 V. FUNCTIONS OF MATRICES

1(H)=f(0)E+ 1! H+ ... + H _(n-1)!

Example 2: Let us consider the matrixn

120 1 .00 1, 1. . .0

J=0 0 0. . .10 0 0. . .A0I

/' (0)1(n-1)

(0)

f (0)

f' (0)1!

0 . . : f (0)

Note that J = 20E + H, so that J -10E = H. The minimal polynomialof J is clearly (A - 2o)n. The interpolation polynomial r(A) of f (A) is givenby the equation

n 1!r (A) = / ('1o) +

f'1 !

1)(1-'t0) + ... + (n -1)1) ('1-

A0)n-1

Therefore

1(J) = r (J) = / (A0) E + f' i) H + ... + (nt> (1) Hn- t

A10)t , 00

0 1('10)

(A0)11

0 0 . . . /(Io)

2. We mention two properties of functions of matrices.1. If two matrices A and B are similar and T transforms A into B,

B=T-'AT,

then the matrices f (A) and f (B) are also similar and T transforms f (A)into f (B),

f (B) = T-' f (A) T.

Page 110: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

1. DEFINITION OF FUNCTION OF MATRIX 99

For two similar matrices have equal minimal polynomials,6 so that f (A)assumes the same values on the spectrum of A and of B. Therefore thereexists an interpolation polynomial r(A) such that f (A) = r(A) andf(B) = r(B). But then it follows6 from the equation r(B) =T-1r(A)Tthat

f(B) =T-1f(A)T.

2. If A is a quasi-diagonal matrix

A = {A1, A2, ... , As) ,

then

/(A) = {f (A1), f (A2), - .. , f (A.)).

Let us denote by r(A) the Lagrange-Sylvester interpolation polynomialof f (A) on the spectrum of A. Then it is easy to see that

f (A) .= r (A) = {r (A1), r (A4), . . ., r (A.)} . (7)

On the other hand, the minimal polynomial y,(i.) of A is an annihilatingpolynomial for each of the matrices A,, A2, ... , A. Therefore it followsfrom the equation

that

Therefore

f (Ad) = r (Ad)

f(A41)= r(AA), ..., f(A4,) =r(AA.)

I(A1) = r (AI), ... , f (Au) = r

and equation (7) can be written as follows:

f (A) = (t (A1), f (A2), ... , f (A6)) . (8)

Example 1: If the matrix A is of simple structure

A=T{21, 12, ..., T-1,then

f(A)= T (f (A,), f(2Y), ..., f(A,))T-'.f (A) has meaning if the function f (,l) is defined at A1, 22, ... , 1,,.

6 From B = T-'AT it follows that Bk = T-1AkT (k. 0, 1, 2, ...). Hence for everypolynomial g(X) we have g (B) = T-lg(A)T . Therefore it follows from g(A) =0 that9(B) =0, and vice versa.

Page 111: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

100 V. FTJNCTIONS OF MATRICES

Example 2: Let. J be a matrix of the following quasi-diagonal form

t

J='Y

1Y 1 0.. .00 A. 1 . . . 0

o o 0... 1Y 1

o o 0...0 1Y

All the elements in the non-diagonal blocks are zero. By (8) (see also theexample on pp. 12-13),

fal - u(11)II f (1)

1(11) . .1

it 01- 1),

0 f(11)

f' (11)1! -

0 0 . . f (11)

/(J)=

f (1Y)f'

(f 1Y)

0 0 .

/(`u4) (1Y)(vU -1)!

iif (1Y)

Here, as in the matrix J, all the elements in the non-diagonal blocks are alsozero.'

11 1 0. . .00 11 1. . .0

0 0 0. . .10 0 0. . .11

7 It will be established later (Chapter VI, § 6 or Chapter VII, § 7) that an arbitrarymatrix A = 11 au II;` is always similar to some matrix of the form J : A = TJT-1. There-fore (see 1, on p. 98) we always have f (A) = Tf (J) T-1.

Page 112: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 2. LAGRANGE-SYLVESTER INTERPOLATION POLYNOMIAL 101

§ 2. The Lagrange-Sylvester Interpolation Polynomial

1. To begin with, we consider the case in which the characteristic equationI AE - A I = 0 has no multiple roots. The roots of this equation-the char-acteristic values of the matrix A-will be denoted by A1i A2, . . . , An. Then

tP(A)=IAE-AI=(A-A1)(A-A2)...(A-AA),

and condition (6) can be written as follows:

r(Ak)=/(Ak) (k=1, 2, ..., n).

In this case, r(A) is the ordinary Lagrange interpolation polynomial forthe function f (A) at the points Al, A2i ... , An :

n (A ` -Ak-t) (A -1k±1) ... (A - 1n)r(A)_,(,tk-,11)...(Ak-Zk-1)(Ak-Ak+1)...(A;-An)f(2k)

By Definition 1'

f (A) = r (A)n (A - A,E) ... (A - Ak-1E) (A - Ak+ lE) ... (A -

A- ( k)(A*`11)-"(Ak-Ak-1)(Ak`Ak+1) (Ak`in)k-1 -

2. Let us assume now that the characteristic polynomial has multiple roots,but that the minimal polynomial, which is a divisor of the characteristicpolynomial, has only simple roots :8

In this case (as in the preceding one) all the exponents Mk in (1) areequal to 1, and the equation (6) takes the form

r (Ak) = f (Ak) (k =1, 2, ..., m).

r(A) is again the ordinary Lagrange interpolation polynomial and

f (A) - (A - AIE) ... (.4_- Ak-1E) (A - Ak+1E) ... (A _ AmE)(Ak - Al) ... (Ak -^ Ak-1) (Ak - Ak+ 1) ... (Ak - Am)k-1

3. We now consider the general case :

V (A) = (A - A1)m. (A - A,)" ... (A - A,)m (m2 + m2 -}- ... + m, = m) .

We represent the rational function. (') , where the degree of r(A) is lessthan the degree of y'(A), as a sum of partial fractions:

8 See footnote 4.

Page 113: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

102 V. FUNCTIONS OF MATRICES

tV(1)-k-.11(2 aAk)*+k (aAk)mk1+...+Ar41 (9)

where ak, ... , mk; k -=1. 2, ... , s) are certain constants.In order to determine the numerators aki of the partial fractions we

multiply both sides of (9) by (2 - 'k )-k and denote by Vk(A) the polynomial

Then we obtain :(x-1k)mk

r1Vk((1)

ak1+ak2(A-2k)+...+akmk(A-2k)"k-1+

+(A-Ak)mkek(A) (k=1, 2, ..., 8), (10)

where ek(A) is a rational function, regular for A= Ak.°Hence

Lirk (1)1aY1 (1)k_,tk

ak 2 V k (1) i a - ak= r (2k) f

Ljk (1) J d - tk +r' (2k) _

Vk(Ak)'

(11)...(k=1,2,...s).

Formulas (11) show that the numerators ak; on the right-hand side of(9) are expressible in terms of the values of the polynomial r(A) on thespectrum of A, and these values are known : they are equal to the correspond-ing values of the function f (A) and its derivatives. Therefore

- 1(1k)_ 1 1

Ilk I Pk(1k) tVk(1) A-AIt(k =1, 2, ..., 8).

(12)

Formulas (12) may be abbreviated as follows :

_ 1 F _j (1) ll0-1>

Iakf(9-1)! lYk(1)!A-x

(7=, 2, ..., mk; k = 1, 2, ..., s).k

(13)

When all the akt have been found, we can determine r(A) from the follow-ing formula, which is obtained from (9) by multiplying both sides by tp(A) :

fr (2) = E (ak i + ak 2 (2 -1k) + ... + ak mk (I. - 2k)mk- 1) Vk (A) (14)

k_1

In this formula the expression in brackets that multiplies 1vk(A) is, by(13), equal to the sum of the first Mk terms of the Taylor expansion of f (A)in powers of (A - Ak).

o I.e., that does not become infinite for 1=At.

Page 114: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 2. LAGRANGE-SYLVESTER INTERPOLATION POLYNOMIAL 103

Note. The Lagrange-Sylvester interpolation polynomial can be obtainedby a limiting process from the Lagrange interpolation polynomial.

Let

v' (A) _ (A - (A - A2)""' ... (A - A,)m (m = .G mx).t=1

We denote the Lagrange interpolation polynomial constructed for the mpoints

by

A(t) A(2)A(m,) A(1)A(e)

A(m,). A(I) A(2) Acma1 , 1 , .. ., 1 , 2 , 2 , .., 2 , ... , , a , ..., ,

(A11)), ..., f (Aims))..., ft (Aa1)), !

(lima))

A

Then it is not difficult to show that the required Lagrange-Sylvesterpolynomial is determined b 7 the formula

r (A) = lim L (A).41),..., 4'"i)-. Al

(1), .... 1(m,) - A

Example :

Then1P (A) _ (A - A1)' (A -1s)' (m = 5).

r(A) - a Y is eAs,

Hence

r (A) _ [« + Q (A - A1)] (A - As)s + [Y +6 (A - As) + e (A - As)'] (A - A1)'

and therefore

r(A)== LaE + P (A - A1E)3 (A - A,E)a + (YE + d (A - A=E) + E (A - AaE)'] (A - AIE)'.

a, ,B, y, 6, and s can be found from the following formulas :

_ A(Al

As)' - (A,3

As)' 1(A1) +(A1 1 As)'

1' (A1),

Y=

6 =

A

(13 A1)'

a=_(As

2

A1)'1(As) + (As?

At

1I (As)

(As

3

A1) 1(As) _ (- A2 2 A1)31' (As) + 2

(A$1 A1)2 1" (As).

Page 115: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

104 V. FUNCTIONS OF MATRICES

§ 3. Other Forms of the Definition of I(A).The Components of the Matrix A

1. Let us return to the formula (14) for r(A). When we substitute in (14)the expressions (12) for the coefficients a and combine the terms that con-tain one and the same value of the function f (A) or of one of its derivatives,we represent r(2) in the form

r (A) _ f f (fir) (A) + f' (2k) 9'k2 (A) + ... + f(mk-1) (Ak) mk(2), (15)

Here qk, (A) =1, 2, ... , mk ; k = 1, 2, ... , s) are easily computable poly-nomials in A of degree less than m. These polynomials are completely deter-mined when p(A) is given and do not depend on the choice of the functionf (A). The number of these polynomials is equal to the number of values ofthe function f (A) on the spectrum of A, i.e., equal to m (m is the degree ofthe minimal polynomial y'(A) ). The functions ggkj(A) represent theLagrange-Sylvester interpolation polynomial for the function whose valueson the spectrum of A are all equal to zero with the exception of /(F1) (Ak),which is equal to 1.

All the polynomials qk j (A) (j 1, 2, ... , mk ; k =1, 2, ... , s) are linearlyindependent. For suppose that

ft

0.k-1 j-1

Let us determine the interpolation polynomial r(A) from the mconditions :

r(j-1)(Ak) = ckj (j =1, 2, ..., mk; k =1, 2, ..., 8). (16)

Then by (15) and (16)t ink

r (A) _ X (A) = 0k-1 j-1

and, therefore, by (16)

ckj =0 (j= 1, 2, ..., mk; k =1, 2, ..., 8).

From (15) we deduce the fundamental formula for f(A) :

(A)k-1

ff(Ak)Zkl+I'(Ak)Zk2+...+,(M.' -I)(Ak)Z kl, (17)

where

Zkj =Tkj (A) (j = 1, 2, ..., mk; k =1, 2, ..., s). (18)

Page 116: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 3. OTHER FORMS OF DEFINITION OR f (A). COMPONENTS 105

The matrices Zkf are completely determined when A is given and do notdepend on the choice of the function f (1). On the right-hand side of (17)the function f(A) is represented only by its values on the spectrum of A.

The matrices Zkf (j =1, 2, ... , Mk; k =1, 2, . . . , s) will be called theconstituent matrices or components of the given matrix A.

The components Zkj are linearly independent.For suppose that

i mk

', ckiZkj = O .k-1 f-1

Then by (18)X (A) =0,

where8 Mk

(19)

X (A) =,T G Ck/9'kf (2) (20)

Since by (20) the degree of x(2) is less than m, the degree of the minimalpolynomial W (2) , it follows from (19) that

x(a)=0.

But then, since the m functions Tkf(A) are linearly independent, (20) impliesthat

ckf=0 (j= 1, 2, ...,mk; k=1, 2, ...,8),

and this is what we had to prove.

2. From the linear independence of the constituent matrices Zkf it follows,among other things, that none of these matrices can be zero. Let us alsonote that any two components Zkj are permutable among each other andwith A, because they are all scalar polynomials in A.

The formula (17) for f (A) is particularly convenient to use when it isnecessary to deal with several functions of one and the same matrix A, orwhen the function f (2) depends not only on A, but also on some parameter t.In the latter case, the components Zkf on the right-hand side of (17) do notdepend on t, and the parameter t enters only into the scalar coefficients ofthe matrices.

In the example at the end of § 2, where p(2) = (2 - A1)2(2 - 22)3, wemay represent r(A) in the form

r(A)=1(11)9'11(,1)+I'(11)c'12(A)+1(12)9'21(1)+1'(x:) 91t: (1)+1"(11) q,,(2),

where

Page 117: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

106 V. FUNCTIONS OF MATRICES

2- A2 )a ( 3 (A -At) (A ` A0(1 - As)aA2 i 1- A,-AS ]' 9'12 (Z) = (At - Aa)a

A-At 12 f2(A-12) 3(A-1 2)2

1

L1 A2-A, + (As-At)hj

9)22 (A) _ (A -At)s

(AA2)

I1-2 (A-A2)1

(A2 - At)s ( As - At J '

9723- At)2(A - As)'(A) _ (A

2 (A2 - ADS

Therefore

f (A) f (At) Zit + /' (At) Zts + f (A2) Z21 + f' (A2) Zzs + f (A2) Z23where

Z,t =9'11 (A) = (A-12E)a {E -

At

3

(At Az)a(A - AE)]

... .Z12 = 9?12 (A) = 1- (A-A,E) (A-A2E)a,a(At As)

3. When the matrix A is given and its components have actually to befound, we can set in the fundamental formula (17) /(u)= 1 where A

is a parameter. Then we obtain

1 C(A) = Zkt 1!Zk2 (mkZkk(AE -A) =1D(A) kI A-Ak+ (A-Ak)2 +... + (A-Ak)-k (21)

where C(1) is the reduced adjoint matrix of 2E-A (Chapter IV, § 6).10The matrices (j -1) ! Zki are the numerators of the partial fractions in

the decomposition (21), and by analogy with (9) they may be expressed bythe values of C(1) on the spectrum of A by formulas similar to (11) :

.D(Ak)(mk-1)!Zkmk= k'(A) ,

Hence

(mk - 2) ! Zk mk-1 - (A) x - Ak

_ 1 (C (A) l(mk-f)

Zkf (9-1)!(mk-1)!`wk(A) JA_xk(9=1, 2, ..., mk; k=1, 2, ..., 8). (22)

When we replace the constituent matrices in (17) by their expressions (22),we can represent the fundamental formula (17) in the form

10 For 1(p) A 11A

we have f(A) =(AE- A)-- . For f (A) = r(A), where r(EL) is

the Lagrange-Sylvester interpolation polynomial. From the fact that f(p) and r(p)coincide on the spectrum of A it follows that (A- u) r(p) and (A- p) / (p) =1 coincideon this spectrum. Hence (AE- A) r (A) = (AE- A) / (A) = E.

Page 118: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 3. OTHER FORMS OF DEFINITION OR f (A). COMPONENTS

1 r 0(A) (M -1)

f(-4) _1) 4

107

(23)

Example 1:"

2 -1 1 12 1-2 1 -1A= 0 1 1 2 , AE- A 0 1-1 -1

-1 1 1 1 1 -1 1-1

In this case A (A) _ AE - A j = (A -1)2 (A - 2). Since the minor ofthe element in the first row and second column of AE - A is equal to 1, wehave D2(A) =1 and, therefore,

V (1)=v(1)=(1-1)2(1-2)=As-412+51-2,

'i(1,µ)='V (/4)-tp{d) =1A2+(1-4)'4+1-41+5

and

C(1)='P(1E,A)=A2+ (1-4)A+(12-41+ 5)E

3 -2 2-1 2 2

-3 3 1

3

3 +(1-4)1

2 -1 10 1 1

-1 1 1

+(12-41+5)1 0 0

0 1 0

0 0 1

The fundamental formula has in this case the form

f(A)=f(1)Z11+/'(1)Z12+f(2)Z21. (24)

Setting f (µ) =1 1we find:

(1E-A)-1= C(1) - Z11 + Z12 $ + Zsi,p(1) 1-1 (A-1) 1-2

hence

Z31=-C(1)-C'(1), Z12=-C(1), Z21 =C(2).

We now use the above expression for C(A), compute Z11, Z12, Z21, and substi-tute the results obtained in (24) :

11 The elements of the sum column are printed in italics and are used for checkingthe computation. When we multiply the rows of A into the sum column of B we obtainthe sum column of AB.

Page 119: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

108 V. FUNCTIONS OF MATRICES

f(A)=l(1)1 0 0 1 -1 111 1 0 0 0!I

1 0 0+/'(I) 1 -1 1+/(2) -1 1011

I1!0 0 0! I-1 1 01 -1 1

I

/ (1) -F /' (1) -/' (1) /' (1) 11

1/(1)+/'(1)-/(2) -f'(1).+/(2) f'(I)I f(1)-f(2) -/ (1)+/(2) f (1)I

(25)

Example 2: Let us show that we can determine f (A) starting only fromthe fundamental formula. Again let

2 -1 1

A 0 1 1

1 1

Then

. W('1)=(1-1)2(.t-2).

f(A)=f(1)Z1+f'(1)Z2+f(2)Z).

In (24') we substitute for f (A) in succession 1, ,i -1, (A -1) 2:

111 0 0z1+Z3=E = 01oil,

0 0 1

Z,+Z,=A-E=1 -1 1 1

0 0 1 1,-1 1 0 0

0 0 0 0Za=(A-E)1= II-1 1 0 0.

-1 1 0 0

(24')

Computing the third equation from the first two term by term, we candetermine all the Z. Substituting in (24'), we obtain the expression forf(A).4. The examples we have analyzed illustrate three methods of practicalcomputation of f (A). In the first method, we found the interpolation poly-nomial r (1) and put f (A) = r (A). In the second method, we made use ofthe decomposition (21) and expressed the components Z,,, in (17) by thevalues of the reduced adjoint matrix C(2) on the spectrum of A. In thethird method, we started from the fundamental formula (17) and substitutedin succession certain simple polynomials for f (1) ; from the linear equationsso obtained we determined the constituent matrices Zk,.

Page 120: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 3. OTHER FORMS OF DEFINITION OR f (A). COMPONENTS 109

The third method is perhaps the most convenient for practical purposes.In the general case it can be stated as follows :

In (17) we substitute for f(A) successively certain polynomials gl(A),92(x), ... , 9.(A):

g1 (A) _ [9{ (Ak) Zk1 + 9i (Ak) Zk2 + ... +g{mk-1)

(2k) Zk mrlk-l

(i= 1, 2, ..., m). (26)

From the m equations (26) we determine the matrices Zk3 and substitute theexpressions so obtained in (17).

The result of eliminating Zk, from the (m + 1) equations (26) and (17)can be written in the form

f (A) / (2k) ...f(ml-') (Al) ... t (A,) ... f(ma-1) (A,)

gg (A) gi (A,) ... g(I'n`-') (2k) ... g1 (A,) ... (A,)

=0.

9m (A) 9m (Al) ...g(mi-1) (2k) , .. 9m (A,) ... 9(,m'-1)

(2a)

Expanding this determinant with respect to the elements of the first column,we obtain the required expression for f (A). As the factor of f (A) we havehere the determinant d = I AY) (2k) 1 (in the i-th row of A there are foundthe values of the polynomial g; (A) on the spectrum of A ; i =1, 2, . . . , m).In order to determine f (A) we must have A 0. This will be so if no linearcombination 12 of the polynomials vanishes completely on the spectrum of A,i.e., is divisible by W (A) .

The condition A 0 is always satisfied when the degrees of the poly-nomial gl (A), g2(A), ... , g. (A) are 0, 1, ... , m -1, respectively.13

5. In conclusion, we mention that high powers of a matrix An can be con-veniently computed by formula (17) by setting f (A) equal to A .14

Example : Given the matrix A = 11 4 - 3 II it is required to compute the

elements of A100. The minimal polynomial of the matrix is yp(A) = (A - 1)2.

12 With coefficients not all equal to zero.is In the last example, m = 3, gl(2) = 1, gs(1) = A- 1, 9s(A) = (1- 1)2.14 Formula (17) may also be used to compute the inverse matrix A-1, by setting

J(x) = A or, what is the same, by setting X= 0 in (21).

Page 121: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

110 V. FUNCTIONS OF MATRICES

The fundamental formula is

f(A)(1)ZI+f'(1)Z,.

Replacing f (1) successively by 1 and A - 1, we obtain :

Z,=E, Z2 =A-E.Therefore

f(A)=f(1)E+f'(1)(A-E).

Setting f (A) =A"', we find

Aioo=E + 100(A-E) =111 1011+100114 -4411 =11401400

-.g 11.

§ 4. Representation of Functions of Matrices by means of Series

1. Let A= 11 a!k II i be a matrix with the minimal polynomial (1) :

vY (R) (m =-1 mk)k-IFurthermore, let f (A) be a function and let fI(A), f2(A), ... , fp(A), ... be asequence of functions defined on the spectrum of A.

We shall say that the sequence of functions fp(A) converges for p -+ 00to some limit on the spectrum of A if the limits

(fir) (k =1, 2, ... , 8)urn fp (Ak), lim fp (As), ... , "Mp-+oo p.4oo p-.o0

exist.We shall say that the sequence of functions f,, (A) converges for p-). o0

to the function f (A) on the spectrum of A, and we shall write

lim fp (AA)=f (Ad)

ifp.4oo

Jim fp (Ak) = f (nk), lim fn (1k) = f'(1k), ... ,P.00

lim f P k I (ilk) =p-poo

(k=1,2,...,3).The fundamental formula

i

f (A) f ofthe matrix as a vector in a space of dimension n2, then it follows

from the fundamental formula, by the linear independence of the matricesZkf, that all the f (A) (for given A) form an subspace of RR'

Page 122: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 4. REPRESENTATION OF FUNCTIONS OF MATRICES BY SERIES 111

with basis Zk; (j =1, 2, ... , Mk; k =1, 2.... , s). In this basis the `vector'f (A) has as its coordinates the m values of the function f(A) on the spec-trum of A.

These considerations make the following theorem perfectly obvious :

THEOREM 1: A sequence of matrices fp (A) converges for p -- oo to somelimit if and only if the sequence fp(A) converges for p -* oo on the spectrumof A to a limit, i.e., the limits

lim fp (A) and lim fp (AA)p-+oo

always exist simultaneously. Moreover, the equation

lim fp (AA) = f (AA) (27)

implies thatP. 00

lim fp (A) = f (A) (28)F-Co

and conversely.Proof. 1) If the values of f(1) converge on the spectrum of A for

p ---> oo to limit values, then from the formulas

fp (A) =G [fp (Ak) Zk1 + fp (4) fpmkM (At) (29)

there follows the existence of the limit lim fp(A). On the basis of thisV-+00

formula and of (17) we deduce (28) from (27).2) Suppose, conversely, that lim fp(A) exists. Since the m constituent

p-. 00

matrices Z are linearly independent, we can express, by (29), the m valuesof fp(A) on the spectrum of A (as a linear form) by the m elements of thematrix fp(A). Hence the existence of the limit lim /P(AA) follows, and (27)holds in the presence of (28).

P.. co

According to this theorem, if a sequence of polynomials gp(A) (p =1, 2,3, ...) converges to the function f (A) on the spectrum of A, then

lira gp (A) = f (A).

2. This formula underlines the naturalness and generality of our definitionof f (A). f (A) is always obtained from the gp (A) by passing to the limitp -+ oo, provided only that the sequence of polynomials g, (A) converges tof (A) on the spectrum of A. The latter condition is necessary for the exist-ence of the limit lim gp(A).

p-ao

Page 123: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

112 V. FUNCTIONS OF MATRICES

00

We shall say that the series Y u,(A) converges on the spectrum of AP-0

to the function f (2) and we shall write0

f (AA) -4.'uP(AA),P=0

(30)

if all the functions occurring here are defined on the spectrum of A and thefollowing equations hold :

f (2k) -Lr U11 (Ad), up (Ak) . ..., f(mk-l) (Ak) _, uymk 1)P-0 P=0 P=0

(k =1, 2, ... , s),

where the series on the right-hand sides of these equations converge. Inother words, if we set

sP uy (1)y=0

(p = 0, 1, 2, ...),

then (30) is equivalent to

f (AA) -1im sP (AA). (31)

It is obvious that the theorem just proved can be stated in the followingequivalent form :

0*THEOREM V: The series X u ,,(A) converges to a matrix if and only if00 P=0

the series , ' u,(2) converges on the spectrum of A. Moreover, the equationp=0

ero

f (A.!) - uP (AA)P-0

implies that0

f (A) =E uP (A),P-0

and conversely.

3. Suppose a power series is given with the circle of convergence I A -20 I < Rand the sum f (A) :

00

f (A)=Z a,(A-Ao)' (I A-Aol < R). (32)

Page 124: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 4. REPRESENTATION OF FUNCTIONS OF MATRICES BY SERIES 113

Since a power series may be differentiated term by term any number oftimes within the circle of convergence, (32) converges on the spectrum ofany matrix whose characteristic values lie within the circle of convergence.

Thus we have :

THEOREM 2: If the function f (2) can be expanded in a power series inthe circle I 2 - Ao I < r,

00

f (A) = Z ap (A - Ao)p, (33)

then this expansion remains valid when the scalar argument A is replacedby a matrix A whose characteristic values lie within the circle of convergence.

Note. In this theorem we may allow a characteristic value 2k of A tofall on the circumference of the circle of convergence; but we must thenpostulate in addition that the series (33), differentiated m7, -I times termby term, should converge at the point I = Ak. It is well known that thisalready implies the convergence of the j times differentiated series (33)at the point 2k to f°)(2k) for j = 0, 1, ... , mk-1.

The theorem just proved leads, for example, to the following expansions :'S

ed =00

AP'- A= °° (_ 1)1' A2pc EA2p+1

A p,

p-oP1 os

P_() (2p) l , sin =G (-1) 2p + ill,XP-00o A2p

cosh A Z (2p) t'00

(E-A)-1=IA'p-0

hi A =.X (A -E) pP-1 p

°0 A2p+1sinhA = o (2p+1)t'

(11,tI<1;k=1,2,...,8),

(IAk-1I<1; k=1,2, ...,F)

(by In A we mean here the so-called principal value of the many-valuedfunction Ln ,I, i.e., that branch for which Ln 1= 0).

Let G (u1, u2i . . . , ui) be a polynomial in u1, u2i ... , uI ; let f1(2), /2(2),f i (2) be functions of A defined on the spectrum of the matrix A, and let

g (A) = o U, (A), fe (A), ..., fl (2)].Then from

g (Ad) =0 (34)the re follows :

GUi(A),f2(A), ...,f:(A)]=o. (35)

15 The expansions in the first two rows hold for an arbitrary matrix A.

Page 125: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

114 V. FUNCTIONS OF MATRICES

For let us denote by /I (A), f 2 (A), ... , f j (A) the Lagrange-Sylvester inter-polation polynomials r2(A), ... ,rs(1), and let us set:

Q (f, (A), fz (A), ..., f: (A)l =G Lr1(A), r2 (A), ..., rl (A)l = h (A) = 0,

Then (34) impliesh (A) = G [r1 (A), ?'s (A), ..., r! (A)],

Hence it follows thath(AA)=O. (36)

and this is what we had to show.This result allows us to extend identities between functions of a scalar

variable to matrix values of the argument.For example, from

cos2A+sin2 A=1

we obtain for an arbitrary matrix Atoss A + sin2A =B

(in this case 0 (u1, u2) = u= + u' - 1, f1 (A) = cos A, and f, (A) = sin A ).Similarly, for every matrix A

eAe7-' = E ,i.e.,

eA = (ea)-'

Further, for every matrix Ae`'=cosA+isinA

Let A be a non-singular matrix (I A J 0). We denote by ft the single-valued branch of the many-valued function VA that is defined in a domainnot containing the origin and containing all the characteristic values of A.Then }"A has a meaning. From A = 0 it now follows that

(FA A.

Let f (A) =x

and let A = II aik II1 be a non-singular matrix. Then f (A)

is defined as the spectrum of A, and in the equationAf(A)=1

we can therefore replace A by A :

i.e.,16 A f (A) =E,f(A)=A-1.

Denoting by r(2) the interpolation polynomial for the function ill wemay represent the inverse matrix A-1 in the form of a polynomial in A :

Is We have already made use of this on p. 109. See footnote 10.

Page 126: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 4. REPRESENTATION OF FUNCTIONS OF MATRICES BY SERIES 115

A-1=r(A).

Let us consider a rational function o(2)= where g(A) and h(A) are

co-prime polynomials in A. This function is defined on the spectrum of Aif and only if the characteristic values of A are not roots of h(A), i.e.,'7 ifh(A) 0. Under this assumption we may replace A by A in the identity

obtaining:

Hence

e(1)h(.)=g(A),

(A)h(A)=g(A)-

e(A)=g(A) [h(A)1-1=[h(A)]-1g(A). (37)

Notes. 1) If A is a linear operator in an n-dimensional space R, thenf (A) is defined exactly like f (A) :

f (A) = r(A),

where r (A) is the Lagrange-Sylvester interpolation polynomial for f (A) onthe spectrum of the operator A (the spectrum of A is determined by theminimal annihilating polynomial ip (A) of A).

According to this definition, if the matrix A= II ask 111 corresponds tothe operator A in some basis of the space, then in the same basis the matrixf (A) corresponds to the operator f (A). All the statements of this chapterin which there occurs a matrix A remain valid after replacement of thematrix A by the operator A.

2) We can also define18 a function of a matrix f (A) starting from thecharacteristic polynomial

d (A)- n (A _ 2k)Ilkke1

instead of the minimal polynomial

(A) = U (I - 2k)mkk-1

17 Bee (25) on p. 84.

18 See, for example, MacMillan, W. D., Dynamics of Rigid Bodies (New York, 1936).

Page 127: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

116 V. FUNCTIONS OF MATRICES

We have then to set f (A) = g (A), where g (A) is an interpolation polynomialof degree less than n modulo d (A) of the function f (A)."' The formulas (17),(21), and (23) are to be replaced by the following20

f (A) = L1 [f (Ak) Zk1 + f'(4) Z. + ... i ,Q'k 1) (1k) Zknk]k.=1

(17')

B(2) 4.1 1!Zk2 (nk 1)12knk(1E-A)-1 = + +... - -- nk-1

JII. (21')

4(2) k-1 2-Zk (Z-jk)2

where

1B (A)

htk-1)

dk (1) = (Ad

(Ak)"k

(k1,2, ... , s).

(23')

However, in (17') the values f (mk) (,lk), (/'(k), .. , f(ak-1) (At) occuronly fictitiously, because a comparison of (21) with (21') yields:

41=Zkl, ..., Zkmk, 4mk+1=... = Zknk=0.

§ 5. Application of a Function of a Matrix to the Integration of a Systemof Linear Differential Equations with Constant Coefficients

1. We begin by considering a system of homogeneous linear differentialequations of the first order with constant coefficients :

-allxl + a12x2 + ... + alnxndt

ddt2 = a21x1 + a22x2 + ... + a2nxn

...............

it -anlxl F- an2x2 + ... + a ,,xn,

(38)

where t is the independent variable, x1i x2, ... , xn are unknown functions oft, and a{k (i, k = 1, 2, ... , n) are complex numbers.

We introduce the square matrix A= aik 111 of the coefficients and thecolumn matrix X = (x1, x2, ... , xn) . Then the system (38) can be writtenin the form of a single matrix differential equation

19 The polynomial g(1) is not uniquely determined by the equation f (A) =g(A) andthe condition `degree less than n.'

20 The special case of (23') in which J(2)= in is sometimes called Perron's formula(see [40], pp. 25-27).

Page 128: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 5. APPLICATIONS TO SYSTEM OF LINEAR DIFFERENTIAL EQUATIONS 117

dxdt

=Ax. (39)

Here, and in what follows, we mean by the derivative of a matrix thatmatrix which is obtained from the given one by replacing all its elements

by their derivatives. Therefore di is the column matrix with the elementsdxl dx, dxW) at' ...' dt

We shall seek a solution of the system of differential equations satisfyingthe following initial conditions:

x10=x10, X21$-O= xe0 , -1 xn lt_0 =x+.oor, briefly,

x It-O = x0 . (40)

Let us expand the unknown column x into a MacLaurin series in powersof t:

x=x0+ t0E+±0 1 + ...(;to=

dx

.1t-0 $z0=s= , ...). (41)

2! dt two dt :so

Then by successive differentiations we find from (39) :

dsxA

dx =A 8x,d3is

=A at; =A 3x

Substituting the value t = 0 in (39) and (42), we obtain :

x0=Ax0, 20=A2x0, ... .

(42)

Now the series (41) can be written as follows :

ax=x0+tAxo+21A$x0+... dtx0 (43)

By direct substitution in (39) we see21 that (43) is a solution of thedifferential equation (39). Setting t=0 in (43), we find:

xIt_o=xo.

Thus, the formula (43) gives us the solution of the given system of differ-ential equations satisfying the initial conditions (40).

Let us set f (2) = ext in (17). Then

eet= I I qik (t) I I i =,E (Zki + Zk2t + ... + Zk.k rk-') elk' (44)

21 (eat)_d$(E+At+221

+...)=A+A21+ 214 .....-AeAt.

Page 129: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

118 V. FUNCTIONS OF MATRICES

The solution (43) may then be written in the following form:

x1= q11 (t) x10 + q12 (t) x20 + ... + q1, (t) x,ax2 = q21 (t) x1o + q22 (t) x20 + ... + q2n (t) X.0.......................xA=qn1 (t) x10+ga2 (t) x20 +...+q.(t) x,a

(45)

where x10, x20, ... , x,,,, are constants equal to the initial values of the unknownfunctions x1, x2, ... , x,,.

Thus, the integration of the given system of differential equations reducesto the computation of the elements of the matrix eAt.

If t = t0 is taken as the initial value of the argument, then (43) is to bereplaced by the formula

Example.

The coefficient matrix is

x=eA(`-40)xa.

ddt1

3x1-- x2 + x3 ,

dXa2xi -xdx,dt

x1 - x2 + 2 x*.

A =

We form the characteristic determinant

3-2 1 1

2 -A I1 -1 2-1

I

=(I--1)(.i-2)'.

(46)

The greatest common divisor of the minors of order 2 is D2(A) =1. Therefore

v (A) =d (A) _ ('1- I) (1- 21$.

The fundamental formula is

/(A)=/(1)Z1+/(2)ZE+/'(2)Z;.

For f (2) we choose in succession 1, A - 2, (2 - 2) 2. We obtain :

Page 130: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§5.

1 0 0

Z1+Z,=E= 0 1 0

0 0 1

1 -1 1 r

-Z1+Z,=A-2E= 2 -2 1 r,

1 -1 0 00 0 0 0

Z1= (A -29)1 - -1 1 0 0.

-1 1 0 0

Hence we determine Z1j Z2i Z, and substitute in the fundamental formula

0 0 0 1 0 0 1 -1 1I(A)=I(I) -1 1 0 +f(2) 1 0 0 + f'(2) 1 -1 1

-1 1 0 1 -1 1 0 0 0

If we now replace f (1) by eat, we obtain :

eAt= et

Thus

where

APPLICATIONS TO SYSTEM OF LINEAR DIFFERENTIAL EQUATIONS 119

000-110--1 10

+ e2t1 001

I-0011 1 1

+ te2t(1 + t) e2t -010

- et + (1 + t) e2t et -1e2t te2t-et+e2t Pt-e2t

e2t

x1=C1(1 +t)e21-C2te2t+Csk2t

x2 = C1 [- at + (1 + t) e2t) + Cs (et - te2t) + C3te't ,

xs = C1(- et + e2t) + C, (et - 621) + C}e2t

Ct=x1o, C2=x1q, Cs=x,o-

2. We now consider a system of inhomogeneous )}clear differential equationswith constant coefficients :

dx1Wt = a11x1 + a12x2 + ... + a1 z , , + / (t)

dxi = a21x1 + a22x2 + ... + a2Ax>< + f2 (t)

dxdt

(47)

Page 131: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

120 V. FUNCTIONS OF MATRICES

where f,(t) (i=1, 2, ... , n) are continuous functions in the interval to <t < t,. Denoting by f ( t) the Column matrix with the elements f, (t), f2 (t),

f,, (t) and again setting A att; !I", we write the system (47) asfollows :

=Ax+f(t). (48)

We replace x by a new column z of unknown functions, connected withx by the relation

x= a dtz (49)

Differentiating (49) term by term and substituting the expression for

dt in (48) we find-"

Atdt =t (t) (50)

Hence2sI

z(t)=c+ fe-ATf (r)dr (51)

and so by (49)

to

8 9

x =edt [c + f e-AT/ (r) dr] = eAtc + f ea (t-T) f (r) dr ; (52)tp tp

where c is a column with arbitrary constant elements.When we give to the argument tin (52) the value t, we find c=e-,Itoxo;

so that (52) can be written in the following form :I

x = e' (t-to)xa+ f e4(t-T)/(r) dr.t

(53)

22 See footnote 21.23 If a matrix function of a scalar argument is given, B(t) b{k(t) =1, 2, ...

t,

in; k = 1, 2, ... , n ; t, S r 5 t,), then the integral j B (t) dr is defined in the naturalway:

f B(t)dt= f btk(r)dt (i=1,2,...,m.; k=1,2,...,n).t, r,

f

Page 132: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 5. APPLICATIONS TO SYSTEM OF LINEAR DIFFERENTIAL EQUATIONS 121

Setting eA° = 11 q(t) 117, we can write the solution (53) in expandedform :

x1 =q11 (t - to) x1o + ... + q1 (t ._ to) xn0 +jc

[g11 (t - r) /1(T) ±.... + q1n (t - z) fn (T)] dTto

(54)

xn = qn1 (t - t0) x10 + ... + qnn (t - to) xn0 +I

+ f [goil (t - r) f 1 (r) + ... + qnn (t - r) fn (v)] dt .

f0

3. As an example we consider the motion of a heavy material point in avacuum near the surface of the earth, taking the motion of the earth. intoaccount. It is known24 that in this case the acceleration of the point relativeto the earth is determined by the constant force of gravity mg and the inertialCoriolis force - 2mco X v (v is the velocity of the point relative to the earth,co the constant angular velocity of the earth). Therefore the differentialequation of motion of the point has the form2'

=g-2w x v.

We define a linear operator A in three-dimensional euclidean space bythe equation

and write instead of (55)Ax=-2w x x (56)

dl =Av+g.

Comparing (57) with (48), we easily find from (53) :

t

v = eArvo + f eAAT dr . g0

(vo = vLo )

(57)

Integrating term by term, we determine the radius vector of the motionof the point :

where

! t T

r = ro + f eAT drvo + f f CA' da dr g, (58)00

ro = r'e_o and vo = vl,_o.

24 See A. Sommerfeld, Lectures on Theoretical Physics, Vol. I (Mechanics), § 30.25 Here the symbol x denotes the vector product.

Page 133: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

122 V. FUNCTIONS OF MATRICES

Substituting for eAr the series

E+A _1f+ A22 ..f....

and replacing A by its expression from (56), we have:

r = ro + vot -}- gt1- w X I vote + 3 gta) + w X I W X (3vota + 8 gtl)J

-{- ... .

Considering that the angular velocity co is small (for the earth,cv M 7.3 X 10-5 see-'), we neglect the terms containing the second andhigher powers of co; for the additional displacement of the point due to themotion of the earth we then obtain the approximate formula

d=-wx(vott+3gts).

Returning to the exact solution (58). let us compute eA°. As a prelimi-nary we establish that the minimal polynomial of the operator A has theform

V (1) = 1(As + 4m').For we find from (56)

Atx = 4w x (w X x) = 4 (wx) w - 4wtx,Atx = - 2w x A'x = 8wt (w x x).

Hence and from (56) it follows that the operators E, A, A2 are linearlyindependent and that

As+4w'A=O.

The minimal polynomial V(A) has the simple roots 0, 2coi, - 2coi. TheLagrange interpolation formula for e4 has the form

1 + sin 2wt I + 1- cos 2wt It2w 4ws

Then

eA' = E + sin 2wtA + 1-- coo 2wt

At2w 40

Substituting this expression for eA= in (58) and replacing the operator Aby its expression from (56), we find

r-ro + vot + gt'

- w X I - cos 2wt v0 +2wt - sin 2wt2 ( 2w' 4w' g) +

+ w X Iw X(2wt - sin 2wt -1 + 2w't'+ cos 2wt

g )J .s V0 +`

(59)J

Page 134: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 5. APPLICATIONS TO SYSTEM OF LINEAR DIFFERENTIAL EQUATIONS 123

Let its consider the special case v = o. When we expand the triplevector product we obtain:

t2 2wt - sin 2wt cos 2wt -1 + 2w'ts (g sin q co - wg) ,r = r0 + g 2 + 40 °(g X w) + - 40

where q) is the geographical latitude of the point whose motion we are con-sidering. The term

2wt - sin 2wt4w3 (g X w)

represents the eastward displacement perpendicular to the plane of themeridian, and the last term on the right-hand side of the last formula givesthe displacement in the meridian plane perpendicular to, and away from, theearth's axis.4. Suppose now that the following system of linear differential equationsof the second order is given :

d921 +aiixi+a18x2+...+al8xx=0

3tj2 + a2ixi + a22x9 + ... + a2nxx= 0(60)

.....................d'xx

+ axixi + ax2x2 + ... + axxxx= 0,sxdit

where the ask (i, k =1, 2, ... , n) are constant coefficients. Introducingagain the column x = (x1, X2 )- .. , and the square matrix A = II air, II' ,

we rewrite (60) in matrix formd2Xi +Ax=0. (60')

We consider, to begin with, the case in which I A , 0. If n = 1, i.e., ifz and A are scalars and A # 0, the general solution of the equation (60) canbe written in the form

x = cos (}/A t) xo + sin (VA t) xa (61)

where xa= x 1,dx 1

_o and zo = dt e_o'By direct verification we see that (61) is a solution of (60) for arbitrary

n, where x is a column and A a non-singular square matrix.26 Here we usethe formulas

26 By vA we mean a matrix whose square is equal to A. CA , we know, exists whenCAI#0 (see p. 114).

Page 135: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

124 V. FUNCTIONS OF MATRICES

cos ()"A t) = E - 2 Ate + t A2t4 - .. ,lJ (62)

(VA)`Isin(y'At)=Et- 1 At3+3!

Formula (61) comprises all solutions of the system (60) or (60'), as theinitial values ,r,, and r may be chosen arbitrarily.

The right-hand sides of the formulas (62) have a meaning even whenA 0. Therefore (61) is the general solution of the given system of

differential equations also when I A J = 0. provided only that the functionscos (rt) and (IA)_l sin (}/At), which are part of this expression, are inter-preted as the right-hand sides of the formulas (62).

We leave it to the reader to verify that the general solution of the in-honmogeneous system

dzas

+ Ax = / (t) (63)

satisfying the initial conditions x Jt_o = x0 and dt It,o xo can be written

in the form

x= cos (4'A t) xo + sin (j!A t) xo +

+(}'A)-n sin [}'A (t-z)]/ (r)dT. (64)d

If t = t0 is taken as the initial time, then in (61) and (64) cos (3/At) and

sin (3/At) must be replaced by cos (3/A (t - to)) and sin (3/A (t - to)), and f by f.0 to

In the special case

/ (t) = h sin (pt + a)

(h is a constant column, and p and a are numbers), (64) can be replaced by :

x=cos(VAt)c+(V4)-I sin(jAt)d+(A-p2E)-lhsin(pt+a),

where c and d are columns with arbitrary constant elements. This formulahas meaning when p2 is not a characteristic value of the matrix A(I A- p2E I 0).

Page 136: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. STABILITY OF MOTION IN THE CASE OF LINEAR SYSTEM 125

§ 6. Stability of Motion in the Case of a Linear System

1. Let x1, x2, ... , x be parameters that characterize the displacement of`perturbed' motion of a given mechanical system from an original motion,27

and suppose that these parameters satisfy a system of differential equationsof the first order:

dx; _(XI I x x t) (i 1 2 n) ; (65)

the independent variable t in these equations is the time, and the right-handsides f{ (x1, x2i ... , xn, t) are continuous functions of the variables x1, ... , xnin some domain containing the point x, = 0, x2 = 0, ... , X. = 0) for all t to

(to is the initial time).We now introduce the definition of stability of motion according to

Lyapunov.2sThe motion to be investigated is called stable if for every e > 0 we can

find a 6 > 0 such that for arbitrary initial values of the parametersx10, x20, ... , xno (for t = to) with moduli less than 6 the parameters x1, x2i.... x remain of moduli less than a for the whole time of the motion (t ?i.e., if for every e > 0 we can find a 6 > 0 such that from

xo I < a (i =1, 2, ..., n) (66)

it follows thatI x; (t) I < e (t ? t0) . (67)

If, in addition, for some 6 > 0 we always have lim xi (t)=0 (i = 1, 2, ... , n )

as long as I xto I < 6 (i = 1, 2.... , n), then the motion is called asymptot-ically stable.

We now consider a linear system, i.e., that special case when (65) is asystem of linear homogeneous differential equations

d x{ nde =y p;r (t) xt,

where the pfk (t) are continuous functions for t ? to (i, k = 1, 2, ... , n).In matrix form the system (68) can be written as follows :

(68)

27 In these parameters, the motion to be studied is characterized by constant zerovalues x, = 0, x, = 0, ... , x = 0. Therefore in the mathematical treatment of the prob-lem we speak of the 'stability' of the zero solution of the system (65) of differentialequations.

28 See [14], p. 13; [9], pp. 10-11; or [36], pp. 11-121. See also [3].

Page 137: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

126 V. FUNCTIONS OF MATRICES

P (t) x, (68')

where x is the column matrix with the elements x1, x2, ... , x and P(t) _II pvv(t) 11 71 is the coefficient matrix.

We denote by

q15 (t), g21 (t) , ... , qn5 (t) (9 1, 2, ... , n) (60)

n linearly independent solutions of (68).29 The matrix Q(t) = 1140 litwhose columns are these solutions is called an integral matrix of the Aya=tem (68).

Every solution of the system of linear homogeneous differential eqi a-tions is obtained as a linear combination of n linearly independent solui:ioiiswith constant coefficients :

ri

x{ c?4r1(t) (t =1, 2, ... , n) ,1-1

or in matrix form,x=Q(t)c, (70)

where c is the column matrix whose elements are arbitrary constants ci, c2i, CO.

We now choose the special integral matrix for which

Q (to) = E; (71)

in other words, in the choice of n linearly independent solutions of (69)we shall start from the following special initial conditions: 30

g4(to)=8{5= { 1(i,?=1,2, ..., n).

Then setting t = tp in (70), we find from (71) :

x0=c,

and therefore formula (70) assumes the form

x = Q (t) xoor, in expanded form,

(72)

x,q{5 (t) xio (i= 1, 2, ..., n). (72')f- 1

29 Here the second subscript j denotes the number of the solution.30 Arbitrary initial conditions determine uniquely a certain solution of a given system.

Page 138: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. STABILITY OF MOTION IN THE CASE OF LINEAR SYSTEM 127

We consider three cases :

1. Q (t) is a bounded matrix in the interval (to, + oo), i.e., there exists anumber M such that

Igt,(t)I SM (t? to; i, 9= 1, 2, ... , n).

In this case it follows from (72') that

Ixj(t)1SnMmaxlxfo

The condition of stability is satisfied. (It is sufficient to take 6 G nMin (66) and (67).) The motion characterized by the zero solution x, = 0,x2=0, ... , is stable.

2. lim Q(t) =O. In this case the matrix Q(t) is bounded in the intervalt -P. +00

(to, + oo) and therefore, as we have already explained, the motion is stable.Moreover, it follows from (72) that

lim x (t)= 0.t- +Do

for every xo. The motion is asymptotically stable.3. Q(t) is an unbounded matrix in the interval (to, + oo). This means

that at least one of the functions qty (t) , say qhk (t) , is not bounded in theinterval. We take the initiaLeonditions x,o = 0, x,0 = 0, ... , xk_l,o = 0,xko # 0, xk+i,o = O , .... xno = 0. Then

xh (t) = qhk (t) xko X.

However small in modulus xk may be, the function x, (t) is unbounded. Thecondition (67) is not satisfied for any 6. The motion is unstable.

2. We now consider the special case where the coefficients in the system(68) are constants :

P (t) = P = cont. (73)

We have then (see § 5)x =eptt-t.) xo. (74)

Comparing (74) with (72), we find that in this case

Q (t) = ep(t-t.) . (75)

We denote by

lp(A)_(A-Alyn.(A-R2)m,... (A-A,).,

the minimal polynomial of the coefficient matrix P.

Page 139: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

128 V. FUNCTIONS OF MATRICES

For the investigation of the integral matrix (75) we apply formula (17)

on p. 104. In this case I(A)= e' (t-t0) (t is regarded as a parameter),/0) (ilk) = (t - to)f eak(t-t,) . Formula (17) yields

eP (t -t0) _ ,Y [Zk1 + Z, (t - to) + ... + 7x-,t (t - to'"r-l eak(t-to) , (76)

k-1

We consider three cases :

1. Re Zk < 0 (k =1, 2, ... , s) ; and moreover, for all Ak with Re Ak = 0the corresponding mk = 1 (i.e., pure imaginary characteristic values aresimple roots of the minimal polynomial).

2. ReAk<0 (k=1,2,...,s).3. For some k we have Re Ak > 0; or Re AI. = 0, but Mk > 1.

From the formula (76) it follows that in the first case the matrix Q(t)eP(t-to) is bounded in the interval (ta, + oo), in the second case lim eP(t-to) = 0,

t-.+00

and in the third ease the matrix eP(t-t0) is not bounded in the interval(to, + oo).31

Therefore in the first case the motion (x1 = 0, x2 = 0.... , x,, = 0) isstable, in the second case it is asymptotically stable, and in the third caseit is unstable.

91 Special consideration is only required in the case when in (76) for eP(t-to) thereoccur several terms of maximal growth (for t --* + oo ), i.e., with maximal Re ak = ao and(for the given Re Xk=a..) maximal value m0=m... The expression (76) can be repre-sented in the form

rI.

ep(t...to) = e(to(t-to) (t - t0)mo-I L Z_?n etdy(t-to) + (*)]ofal

where #1, #z, ... , f, are distinct real numbers and (*) denotes a matrix that tends tozero as t - + oo. From this representation it follows that the matrix eP(t-to) is not

rbounded for ao + w ro -1 > 0, because the matrix .. Zkfm e{Af(tt0) cannot converge for

f.R1t -* + oo. We can see this by showing that

ICIwhere cj are complex numbers and f3, real and distinct numbers, can converge to zero fort -* + oo only when f (t) = 0. But, in fact, it follows from lim f (t) = 0 that

rfT1cf12 = lim 1J

I T.++oo T0

and therefore

C1= C2 ' Ctt=O.

Page 140: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. STABILITY OF MOTION IN THE CASE OF LINEAR SYSTEM 129

The results of the investigation may be formulated in the form of thefollowing theorem :32

THEOREM 3: The zero solution of the linear system (68) for P = const.is stable in the sense of Lyapunov if

1) the real parts of all the characteristic values of P are negative or zero,2) those characteristic values whose real part is zero, i.e., the pure imagi-

nary characteristic values (if any such exist), are simple roots of the minimalpolynomial of P;

and it is unstable if at least one of the conditions 1), 2) is violated.

The zero solution of the linear system (68) is asymptotically stable if andonly if all the characteristic values of P have negative real parts.

The considerations above enable us to make a statement about the natureof the integral matrix eP('-'ol in the general case of arbitrary characteristicvalues of the constant matrix P.

THEOREM 4: The integral matrix 6P(t-19) of the linear system (68) forP = const. is always representable in the form

ep (t _to) = Z_ (t) + Zo + Z+ (t),

where

1) lim Z_(t) =0,a.+oo

2) Z, is either constant or is a bounded matrix in the interval (t,,, + oo )

that does not have a limit for t-->+oo,3) Z+ (t) = 0 or Z+ (t) is an unbounded matrix in the interval (t,, + oo).Proof. On the right-hand side of (76) we divide all the summands into

three groups. We denote by Z (t) the sum of all the terms containing thefactors exk(to), with Re Ak < 0. We denote by Zo the sum of all those matricesZk, for which Re 2k = 0. We denote by Z+ (t) the sum of all the remainingterms. It is easy to see that Z._ (t), Z0(t), and Z+ (t) have the properties1), 2), 3) of the theorem.

32 On the question of sharpening the criteria of stability and instability for quasi-linear systems (i.e., of non-linear systems that become linear after neglecting the non-linear terms), see further Chapter XIV, § 3.

Page 141: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

CHAPTER VI

EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES.ANALYTIC THEORY OF ELEMENTARY DIVISORS

The first three sections of this chapter deal with the theory of equivalentpolynomial matrices. On the basis of this, we shall develop, in the nextthree sections, the analytical theory of elementary divisors, i.e., the theoryof the reduction of a constant (non-polynomial) square matrix A to a normalform l (A = TAT-'). In the last two sections of the chapter two methodsfor the construction of the transforming matrix T will be given.

§ 1. Elementary Transformations of a Polynomial Matrix

1. DEFINITION 1: A polynomial matrix, or 2-matrix, is a rectangularmatrix A(A) whose elements are polynomials in A:

A (A)=Jjaa(2)II=I!a°2i+ a{t)X-1+...+a;j (i=1, 2, ..., m;k 1, 2,

here l is the largest of the degrees of the polynomials aik(A).Setting

Ai=JIa{x II (i=1,2,...,m; k=1,2,...,n; j=0,1,...,l),

we may represent the polynomial matrix A(A) in the form of a matrixpolynomial in A, i.e., in the form of a polynomial in 1 with matrix coefficients :

A (d) =A0,V + A1A -1 + ... + Al-1A + A1.

We introduce the following elementary operations on a polynomial mat-rix A(1):

1. Multiplication of any row, for example the i-th, by a number c 9& 0.

130

Page 142: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

1. ELEMENTARY TRANSFORMATIONS OF A POLYNOMIAL MATRIX 131

2. Addition to any row, for example the i-th, of any other row, forexample the j-th, multiplied by any arbitrary polynomial b (A).

3. Interchange of any two rows, for example the i-th and the j-th.

We leave it to the reader to verify that the operations 1., 2., 3. are equi-

valent to a multiplication of the polynomial matrix A(1) on the left by thefollowing square matrices of order m, respectively:'

(i) (i) (7) 1

1.........0 1.........oll1 . .b(2). . .

S. = .e

0.........1 Ho . . . . . . . . . 1

(i) (7)

1 0

S"i

S"=

1...0...

0 1

(1)

in other words, as the result of applying the operations 1., 2., 3. the matrixA (I) is transformed into S'- A (A), S" A (A) , and S"' A (A), respectively.The operations of type 1., 2., 3. are therefore called left elementary opera-tions.

In the same way we define the right elementary operations on a poly-nomial matrix (these are performed not on the rows, but on the columns) ;2the matrices (of order n) corresponding to them are :

1 In the matrices (1) all the elements that are not shown are 1 on the main diagonaland 0 elsewhere.

2 See footnote 1.

Page 143: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

132 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

T'= (z) T" =

0.........1

I . . . . . . . . . 0

(i)

1 .0

0.........I

(9)

(i)

(?)

The result of applying a right elementary operation is equivalent tomultiplying the matrix A(2) on the right by the corresponding matrix T.

Note that T' and T... coincide with S' and S"' and that T" coincides withS" when the indices i and j are interchanged in these matrices. The matricesof type S', S", S"' (or, what is the same, T', T", T"') will be called elementarymatrices.

.The determinant of every elementary matrix does not depend on A andis different from zero. Therefore each left (right) elementary operationhas an inverse operation which is also a left (right) elementary operation.3

DEFINITION 2: Two polynomial matrices A(A) and B(A) are called1) left-equivalent, 2) right-equivalent, 3) equivalent if one of them can beobtained from the other by means of 1) left-elementary, 2) right elementary,3) left and right elementary operations, respectively.4

3 It follows from this that if a matrix B(X) is obtained from A(a) by means of left(right; left and right) elementary operations, then A(X) can, conversely, be obtainedfrom B(X) by means of elementary operations of the same type. The left elementaryoperations form a group, as do the right elementary operations.

4 From the definition it follows that only matrices of the same dimensions can be left-equivalent, right-equivalent, or simply equivalent.

Page 144: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 1. ELEMENTARY TRANSFORMATIONS OF A POLYNOMIAL MATRIX 133

Let B (A) be obtained from A(A) by means of the left elementary opera-tions corresponding to S,, S2, ... , Si,. Then

B (A) = SpSp_1 ... S1A (A). (2)

Denoting the product S,Sp_1 . Sl by P(A), we write (2) in the form

B (A) = P (A) A (A), (3)where P(A), like each of the matrices S,, 82i ... , S. has a constants non-zerodeterminant.

In the next section we shall prove that every square A-matrix P(A) witha constant non-zero determinant can be represented in the form of a productof elementary matrices. Therefore (3) is equivalent to (2) and signifiesleft equivalence of the matrices A (A) and B (A).

In the case of right equivalence of the polynomial matrices A(A) andB (A) we shall have instead of (3) the equation

B (A) = A (A) Q (A) (3')

and in the case of (two-sided) equivalence the equation

B(A)=P(A)A(A)Q(A). (3")

Here again, P(A) and Q(A) are matrices with non-zero determinants, inde-pendent of A.

Thus, Definition 2 can be replaced by an equivalent definition.

DEFINITION 2': Two rectangular A-matrices A (A) and B(A) are called1) left-equivalent, 2) right-equivalent, 3) equivalent if

1) B(A) =P(A)A(A),2) B(A) =A(A)Q(A),3) B(A) =P(A)A(A)Q(A),

respectively, where P(A) and Q (A) are polynomial square matrices with con-stant non-zero determinants.

2. All the concepts introduced above are illustrated in the following im-portant example.

We consider a system of m linear homogeneous differential equations oforder l with constant coefficients, where x1, x2, ... , x are n unknown func-tions of the independent variable t :

all (D) x1 + a12 (D) x2 + ... + aln (D) x = 0

a21 (D) x1 + a22 (D) x2 + ... + a2R. (D) x,, = 0 (4)

aml (D) x1 +am2(D)x2+... +amn(D) x.=0;

5 I.e., independent of X.

Page 145: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

134 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

here

a4x (D) = aik)D1 + a ;) (i =1, 2, ..., m; k =1, 2, ..., n)

is a polynomial in D with constant coefficients;D= d is the differential

operator.The matrix of operator coefficients

A (D) = II a,, (D) II (i=1, 2, ..., m; k =1, 2, ..., n

is a polynomial matrix, or D-matrix.Clearly, the left elementary operation 1. on the matrix A(D) signifies

term-by-term multiplication of the i-th differential equation of the systemby the number c 0. The left elementary operation 2. signifies the term-by-term addition to the i-th equation of the j-th equation which has pre-viously been subjected to the differential operator b (D). The left ele-mentary operation 3. signifies an interchange of the i-th and j-th equation.

Thus, if we replace in (4) the matrix A(D) of operator coefficients by aleft-equivalent matrix B(D), we obtain a deduced system of equations.Since, conversely, by the same reasoning, the original system is a conse-quence of the new system, the two systems of equations are equivalent .6

It is not difficult in this example to interpret the right elementary opera-tions as well. The first of them signifies the introduction of a new unknown

function x{ = x4 for the unknown function x4i the second signifies theintroduction of a new unknown function xi = xj + b (D) x{ (instead of xj) ;the third signifies the interchange of the terms in the equations that containx4 and x, (i.e., x{ = xi, xj = x4).

§ 2. Canonical Form of a 1-Matrix

1. To begin with, we shall examine what comparatively simple form wecan obtain for a rectangular polynomial matrix A (A) by means of leftelementary operations only.

Let us assume that the first column of A(A) contains elements not iden-tically equal to zero. Among them we choose a polynomial of least degreeand by a permutation of the rows we make it into the element a (1). Thenwe divide at, (A) by all (,l) ; we denote quotient and remainder by Q4, (A) andr4i(A) (i =2,...,m):

6 Here it is assumed that the unknown functions x,, x,, . . . , x are such that their deriva-tives of all orders, as far as they occur in the transformations, exist. With this restriction,two systems of equations with left-equivalent matrices B(D) and B(D) have the samesolutions.

Page 146: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 2. CANONICAL FORM OF A A-MATRIX 135

ail (A) = a11(A) qti (A) + r1 (A) (i=2, ..., m).

Now we subtract from the i-th row the first row multiplied by qil (A)(i = 2, ... , m). If not all the remainders ri, (A) are identically equal tozero, then we choose one of them that is not equal to zero and is of leastdegree and put it into the place of all (2) by a permutation of the rows.As the result of all these operations, the degree of the polynomial a (A)is reduced.

Now we repeat this process. Since the degree of the polynomial all (A)is finite, this must come to an end at some stage-i.e., at this stage all theelements a2, (A), a3, (A), . . . , a., (1) turn out to be identically equal to zero.

Next we take the element a22 (A) and apply the same procedure to the rowsnumbered 2, 3, . . . , m, achieving a32 (A) _ ... =a.2(2)=O- Continuing stillfurther, we finally reduce the matrix A (A) to the following form :

b11 (A) b12 (A) ... bt (A)

b11 (A) b12 (A) ... blm (A) ... bl,, (A)0 b22 (A) ... b2n (A)

0 bn(A)...b2m(A)...b2(A) . . . . . . . . . ..0 bm (A)

(5)0

0 0 ... 0o ... b,,,,,, (A) ... b,,,, (A)

(mSn)0 0 ... 0 II

(M ?n)

If the polynomial b22(A) is not identically equal to zero, then by applyinga left elementary operation of the second type we can make the degree of theelement b12(A) less than the degree of b22(A) (if b22(A) is of degree zero,then b12(1) becomes identically equal to zero). In the same way, ifbss (A) ; 0, then by left elementary operations of the second type we makethe degrees of the elements b, 3 (A), b23 (,l) less than the degree of b (A) with-out changing the elements b12(A), etc.

We have established the following theorem :

THEOREM 1: An arbitrary rectangular polynomial matrix of dimensionm X n can always be brought into the form (5) by means of left elementaryoperations, where the polynomials blk(A), b,k(A), ... , bk_1.k (d) are of degreeless than that of bkk(2), provided bkk(A) 0, and are all identically equalto zero if bkk(A) = eonst. 0 (k = 2, 3, ... , min (m, n) ).

Similarly, we prove

THEOREM 2: An arbitrary rectangular polynomial matrix of dimensionm X n can always be brought into the form

Page 147: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

136 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

ell 0 ... 0

C31 0 ... 0 0_0C21 C22 (A) ...

0

C21 (A) C22 (A) ... 0 0 ... 0 . . . . . . . . . . . . . . . .

Cnl (A) Cn2 (A) ... can (A) (6)

Cml (A) Cm2 (A) ... Cmm (A) 0 ... 0 . . . . . . . . . . . . . .

(m <_ n) Cml (A) Cm2 (A) ... C. (A)

(m ? n)

by means of right elementary operations, where the polynomials Ckl (A),

Ck2(A), ... , ce,k_1(2) are of degree less than that of Ckk(M), providedCkk(A) 0, and all are identically equal to zero if ck.;(,l) =eonst. 0

(k=2,3,...,min (m,n)).2. From Theorems 1 and 2 we deduce the corollary :

COROLLARY : If the determinant of a square polynomial matrix P(A)does not depend on 1, and is different from zero, then the matrix can berepresented in the form of a product of a finite number of elementarymatrices.

For by Theorem 1 the matrix P (A) can be brought into the form

b11 (A) b12 (A) ... bin (A)

0 b_2 (A) ... b2n (A)

0 0 ... bnn (A),

(7)

by left elementary operations, where n is the order of P(A). Since in theapplication of elementary operations to a square polynomial matrix thedeterminant of the matrix is only multiplied by constant non-zero factors,the determinant of the matrix (7), like that of P(1), does not depend on Aand is different from 0, i.e.,

b11(1) b22 (A) ... b,+n W== const. # 0.Hence

bkk (A) = const. 0 (k =1, 2, ... , n) .

But then, also by Theorem 1, the matrix (7) has the diagonal form l bAk Iiiand can therefore be reduced to the unit matrix E by means of left ele-mentary operations of type 1. But then, conversely, the unit matrix E canbe transformed into P(2) by means of the left elementary operations whosematrices are S1, S2, ... , So. Therefore

P (A) = SPSP-1 ... S1E = SPSP_1 ... Sl

Page 148: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 2. CANONICAL FORM OF A A-MATRIX 137

As we pointed out on p. 133, from this corollary there follows the equiva-lence of the two Definitions 2 and 2' of equivalence of polynomial matrices.

3. Let us return to our example of the system of differential equations (4).We apply Theorem 1 to the matrix 11 aik(D) 11 of operator coefficients. Aswe have shown on p. 135, the system (4) is then replaced by an equivalentsystem

bu (D) x1 + b12 (D) x2 + ... + b1, (D) x, _ - b1,,+1(D) x.+1- ... _ bin (D) xu ,b(D) x2 + ... + bu (D) x, = - b2..+1 (D) x,+1- ... - b2n (D) xu , (4')

b..(D) x. =- b,.,+1 (D) x,+1 - ... - b., (D) xu

where s =min (m, n). In this system we may choose the functions x,+l ,

... , x arbitrarily, after which the functions x,, x,_-1, ... , x1 can be deter-mined successively ; however, at each stage of this process only one differen-tial equation with one unknown function has to be integrated.

4. We now pass on to establishing the `canonical' form into which a rec-tangular matrix A (A) can be brought by applying to it both left and rightelementary operations.

Among all the elements aik(A) of A(A) that are not identically equal tozero we choose one which has the least degree in A and by suitable permuta-tions of the rows and columns we make this element into a (A) . Then wefind the quotients and remainders of the polynomials all (A) and a1 k (A) ondivision by a11(A) :

a1 (A) =a11(A) q:1(A) + r:1(A) , a1k (A) =a11(A) q1k (2) + rlk (2)(i=2,3,...,m; k=2,3,...,n).

If at least one of the remainders rs1(A), r1k(A) (i2,.. . , m; k = 2, ... ,n), for example r1k(A), is not identically equal to zero, then by subtractingfrom the k-th column the first column multiplied by qlk(A), we replaceaak(A) by the remainder r1k(A), which is of smaller degree than a11(A).Then we can again reduce the degree of the element in the top left cornerof the matrix by putting in its place an element of smaller degree in 2.

But if all the remainders r21 (A), ... , rfnI(2) ; r12 (A), ...., r1,,(A) are iden-tically equal to zero, then by subtracting from the i-th row the first multi-plied by g11(A) (i = 2, ... , m) and from the k-th column the first multipliedby g1k(A) (k = 2, ... , n), we reduce our polynomial matrix to the form

II

a11(A) 0 ... 0

0 a22 (A) ... a2u (A)

0

...amt

(A) ... a,,,,, (1) I .

Page 149: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

138 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

If at least one of the elements aak(A) (i = 2, ... , m; k = 2, ... , n) is notdivisible without remainder by a,I (A), then by adding to the first columnthat column which contains such an element we arrive at the preceding caseand can therefore again replace the element a,I (A) by a polynomial of smallerdegree.

Since the original element aI I (A) had a definite degree and since theprocess of reducing this degree cannot be continued indefinitely, we must,after a finite number of elementary operations, obtain a matrix of the form

11 ai (A) 0 ... 011

0 b22 (1) ... b2+. (1)

0 bm2 (1) . .

.

b. (A)

(8)

in which all the elements bsk (A) are divisible without remainder by aI (A).If among these elements b{k(A) there is one not identically equal to zero, thencontinuing the same reduction process on the rows numbered 2, ... , m andthe columns 2, ..., n, we reduce the matrix (8) to the form

a, (A) 0 0 ... 00 a2 (A) 0 ... 00 0 033 (A) ... Co. (A)

0 0 c,,.3(2) ... e. (1)

where a2 (A) is divisible without remainder by al (A) and all the polynomialsc{k(A) are divisible without remainder by a2(A). Continuing the processfurther, we finally arrive at a matrix of the form

ai (.t) 0 ... 0 0 ... 00 a2 (A) ... 0 0 ... 0

0 0 ... a, (A) 0 ... 00 0 ... 0 0...0

II 0 0 ... 0 0...011where the polynomials aI (A), a2 (A), ... , a.(A) (s < min (m, n) ) are notidentically equal to zero and each is divisible by the preceding one.

By multiplying the first s rows by suitable non-zero numerical factors,we can arrange that the highest coefficients of the polynomials aI (A),a2(2), ... , a. (A) are equal to 1.

Page 150: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 3. INVARIANT POLYNOMIALS AND ELEMENTARY DIVISORS 139

DEFINITION 3: A rectangular polynomial matrix is called a canonicaldiagonal matrix if it is of the form (9), where 1) the polynomials ai(l),a2(A), ..., a,(A) are not identically equal to zero and 2) each of the poly-nomials a2(A), ..., a,(A) is divisible by the preceding. Moreover, it is as-sumed that the highest coefficients of all the polynomials al(l), a2(1), ... ,a,(.) are equal to 1.

Thus, we have proved that : An arbitrary rectangular polynomial matrixA(A) is equivalent to a canonical diagonal matrix. In the next section weshall prove that: The polynomials al(l), a2(1), ..., a,().) are uniquelydetermined by the given matrix A(A) ; and we shall set up formulas thatconnect these polynomials with the elements of A (A) .

§ 3. Invariant Polynomials and Elementary Divisorsof a Polynomial Matrix

1. We introduce the concept of invariant polynomials of a 1-matrix A(A).Let A(A) be a polynomial matrix of rank r, i.e., the matrix has minors

of order r not identically equal to zero, but all the minors of order greaterthan r are identically equal to zero in A. We denote by DM(A) the greatestcommon divisor of all the minors of order j in A (A) (j = 1, 2, ... , r).' Thenit is easy to see that in the series

Dr (A), Dr_j (A), ... , D1(2), D0(A)=1

each polynomial is divisible by the preceding ones The corresponding quo-tients will be denoted by it (A), i2(A), ... , d, (A) :

$l(A)=Drr1(2)' i2(A) Dr_1(2)' ..., ir(A)Do(A) =D1(i1). (10)

DEFINITION 4: The polynomials i, (A), i2 (2), ... , i,(A) defined by (10)are called the invariant polynomials of the rectangular matrix A(A).

The term `invariant polynomial' is explained by the following argu-ments. Let A(A) and B(A) be two equivalent polynomial matrices. Thenthey are obtained from one another by means of elementary operations. Butan easy verification shows immediately that the elementary operations

We take the highest coefficient in D, (a) to be I (j =1, 2,. .., r).s If we apply the B4zout decomposition with respect to the elements of any row to an

arbitrary minor of order j, then every term in the decomposition is divisible by D,_,(a) ;therefore every minor of order j, and hence D, (a), is divisible by D,_, (a) (j = 2, 3, . . . , r).

Page 151: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

140 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

change neither the rank of A(A) nor the polynomials D1(2), D2(A), ...,D,(A). For when we apply to the identity (3") the formula that expresses aminor of a product of matrices by the minors of the factors (see p. 12), weobtain for an arbitrary minor of B (A) the expression

B 71 72... jrk1 k2 ... kp

i1 92 ... ip a1 a2 ... ap #1 Y2 ... flP lAfp'

Q1 2 p1 as < «2< ... < ap a (al a2 ... ap

Q

f l P2 . k k k1; P, <d, <... <ppsm

(p=1, 2, ..., min (m, n)).

Hence it follows that all the minors of order r or greater of the matrix B(A)are zero, so that we have for the rank r* of B (A) :

r* < r.

Moreover, it follows from the same formula that D;(2), the greatest commondivisor of all the minors of order p of B (A), is divisible by Dp (A) (p = 1, 2,... , min (m, n) ). But the matrices A(A) and B(A) can exchange roles.Therefore r:5 r* and D9(A) is divisible by D P (A) (p = 1, 2, ... , min (m, n) ).Hence9

r = r', Di (A) =D1(A), D2 (A) =D2 (A), . , Dr (A) =Dr (A) .

Since elementary operations do not change the polynomials D1 (A), D2 (A),.. , Dr(A), they also leave the polynomials it (A), i2(A), ... , i,.(A) defined by

(10) unchanged.Thus, the polynomials i1(A), i2(A), ..., ir(A) remain invariant on transi-

tion from one matrix to another equivalent one.If the polynomial matrix has the canonical diagonal form (9), then it is

easy to see that for this matrix

D1(A) = at (A), .D2(2)-a1(2)a2(2), ..., Dr(j)=a1(A)a2(A) ... ar (A) .

But then, by (10), the diagonal polynomials in (9) a1(A), a2(A), ... , ar(A)coincide with the invariant polynomials

i1 (A) =a, (A), is (A) = ar_1(A), ..., it (A) = a1(A) . (11)

Here ii (A), i2(A), . . . , ir(A) are at the same time the invariant polynomialsof the original matrix A (A), because it is equivalent to (9).

The results obtained can be stated in the form of the following theorem.

9 The highest coefficients in D, (2) and Dp (2) (p = 1, 2, ... , r) are 1.

Page 152: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 3. INVARIANT POLYNOMIALS AND ELEMENTARY DIVISORS 141

THEOREM 3: The rectangular polynomial matrix A (A) is always equiva-

lent to a canonical diagonal matrix

i,(2) 0 ... 0 0...00 'e,_i (A) ... 0 0 ...0

0 0 ... ii (2) 0 ... 00 0 0 0 ... 0..............0 0 0 0...0

(12)

Moreover, r must here be the rank of A(A) and ii(1), i2(A), ... , iT(A) theinvariant polynomials of A(A) defined by (10).

COROLLARY 1: Two rectangular matrices of the same dimension A(A)and B(1) are equivalent if and only if they have the same invariant poly-nomials.

The sufficiency of the condition was explained above. The necessityfollows from the fact that two polynomial matrices having the same invariantpolynomials are equivalent to one and the same canonical diagonal matrixand, therefore, to each other. Thus: The invariant polynomials form acomplete system of invariants of a A-matrix.

COROLLARY 2: In the sequence of invariant polynomials

ii (R)= D,ri (A) D,_2 ),Du (A)

(Do (2)=1) (13)

every polynomial from the second onwards divides the preceding one.This statement does not follow immediately from (13). It does follow

from the fact that the Dolynomials it (A), i2(A), ... , i,.(2) coincide with thepolynomials a, (1), a,_1(1), ... , ai (A) of the canonical diagonal matrix (9).

2. We now indicate a method of computing the invariant polynomials of aquasi-diagonal A-matrix if the invariant polynomials of the matrices in thediagonal blocks are known.

THEOREM 4: If in a quasi-diagonal rectangular matrix

A (2) 0

0 B(2)every invariant polynomial of A (A) divides every invariant polynomial ofB(1), then the set of invariant polynomials of C(1) is the union of theinvariant polynomials of A (A) and B(A).

Page 153: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

142 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

Proof. We denote by ii(A), i2(2), ..., ir(A) and i1(A), i2 (A), ..., iQ (A),

respectively, the invariant polynomials of the A-matrices A(A) and B(A).

Then1°

A (A) - {i; (A), ..., ii (A), 0, ..., 0), B (A) (A), 0, ..., 0)

and thereforeC (A) (i', (A), ..., ill (A), iQ (A), ..., ii (A), 0, ..., 0). (14)

The A-matrix on the right-hand side of this relation is of canonical diago-nal form. By Theorem 3 the diagonal elements of this matrix that are notidentically equal to zero then form a complete system of invariants of thepolynomial matrix C(A). This proves the theorem.

In order to determine the invariant polynomials of C(A) in the generalcase of arbitrary invariant polynomials of A(A) and B(A) we make useof the important concept of elementary divisors.

We decompose the invariant polynomials it (A), i2 (A) , . . . , i,. (A) into irre-ducible factors over the given number field F :11

it Ca) = [qIL (A)]`' [97s (A)]`' ... IT, (A)]`',is (A) _ [971(A)Id' 1972 (A) P .. IT, (A)]d,, (ct dF? ... z lk

0'(16)

k=1,2, .,sif (A) = [q'1 (A)]" IT2 W)h ... [4r, (A)]`'.

Here T, (A), 972(A), . . ., 97,(1) are all the distinct factors irreducible over F(and with highest coefficient 1) that occur in il(A), i2(A), . . . , i,(A).

DEFINITION 5 : All the powers among IT, Q)]11, ... , [q',(A)]i in (15), asfar as they are distinct from 1, are called the elementary divisors of thematrix A (A) in the field p.12

THEOREM 5: The set of elementary divisors of the rectangular quasi-diagonal matrix

A (A) 0

0 B(2)

is always obtained by combining the elementary divisors of A(A) with thoseof B(A).

10 The symbol ~ denotes here the equivalence of matrices; and braces ( ), a diagonalrectangular matrix of the form (12).

11 Some of the exponents cs, dR, ..., lR (k= 1, 2, ... , s) may be equal to zero.12 The formulas (15) enable us to define not only the elementary divisors of .4(A)

in the field r in terms of the invariant polynomials but also, conversely, the invariantpolynomials in terms of the elementary divisors.

Page 154: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 3. INVARIANT POLYNOMIALS AND ELEMENTARY DIVISORS 143

Proof. We decompose the invariant polynomials of A(A) and B(A) intoirreducible factors over p :18

ii (A) [q'1(A)fl [9's (A)]`' ... [9'. (A)]`', i (A) = [PI WP [9's (A)]; ... [p, (A)]`',i2 (A) _ [9'1(A)]d' [4's W14 ... [9'. (A)],, i2 (A) = [q'l (A) "[9's (A)] ... IT, (20,.......................................i; (A) == [ml (A)] A' Cq2 (A)]k' ... [9'. (2)]k', i9 (A) _ [p1(A))9i [9's (2)]h ... [9', (A)]s'

We denote by(16)

all the non-zero numbers among ci, di, ... , hi, ci, di, . - . , 9iThen the matrix C(A) is equivalent to the matrix (14), and by a permuta-

tion of rows and of columns the latter can be brought into `diagonal' form

{[9'1(A)]`' (*), [p, (A)14' - (*), - [p, (A)]" (*), (**), . . . , (**)) (17)

where we have denoted by (*) polynomials that are prime to pl (A) and by(**) polynomials that are either prime to g71(A) or identically equal tozero. From the form of the matrix (17) we deduce immediately the follow-ing decomposition of the polynomials D,(2), D,_1(2), ... and i1 (A), i2(A), .. .of the matrix C (A) :

D, (A) _`,+d1+ ... +t, , (*),

D.._1 (A)= [9'i (A)]d'+... +i, , (*), ... ,

it (A) = [9'1(A)]", W, is (A) = [ml (A)]d' (*), ... .

Hence it follows that [9'1(A)p, [gel (A)]d',- - , [c1(A)]`', i.e., all the powers

[q'1(A)] ... , [q'1(2)]1', (A)]`', ... , [q'1(2)]°'

as far as they are distinct from 1, are elementary divisors of C(A).The elementary divisors of C(A) that are powers of ,2(A) are determined

similarly, etc. This completes the proof of the theorem.Note. The theory of equivalence for integral matrices (i.e., matrices

whose elements are integers) can be constructed along similar lines. Herein 1., 2. (see pp. 130-31) c = ± 1, b(A) is to be replaced by an integer, andin (3), (3'), (3"), in place of P(A) and Q(A) there are integral matriceswith determinants equal to - 1.

1s If any irreducible polynomial 9;k(A) occurs as a factor in some invariant polynomials,but not in others, then in the latter we write qk(A) with a zero exponent.

Page 155: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

144 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

3. Suppose now that A= II ask III is a matrix with elements in the field F.We form its characteristic matrix

AE - A =

-all -a12 ..... .- a21 2-a22

-a,,-a2

(18)

II -a,1 -aii2 ... A-a. 11

The characteristic matrix is a A-matrix of rank n. Its invariant polynomials

itDnn i2 (a) = Dn-2 (A) Do (2) (D o W=1) , (19)

are called the invariant polynomials of the matrix A and the correspondingelementary divisors in F are called the elementary divisors of the matrix Ain the field F. A knowledge of the invariant polynomials (and, hence, ofthe elementary divisors) of A enables us to investigate its structure. There-fore practical methods of computing the invariant polynomials of a matrixare of interest. The formulas (19) give an algorithm for computing thesepolynomials, but for large n this algorithm is very cumbrous.

Theorem 3 gives another method of computing invariant polynomials,based on the reduction of the characteristic matrix (18) to canonical diago-nal form by means of elementary operations.

Example :

3 1 0 0 A-3 -1 0 0

A= - 4 -1 0 0 AE-A = 4 A+1 0 06 1 2 1 -6 -1 A-2 -1

-14 -6 -1 0 14 5 1 A

In the characteristic matrix AE - A we add to the fourth row the thirdmultiplied by A :

A-3 -1 0 0

4 A+1 0 0

-6 -1 A-2 -114-8A 5-A A=-2A+1 0

Now adding to the first three columns the fourth, multiplied by - 6, - 1,and A - 2, respectively, we obtain

A-3 -1 0 04 A+1 0 0

0 0 0 -114-61 5-1 A'-21+1 0

We add to the first column the second multiplied by A - 3:

Page 156: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 4. EQUIVALENCE OF LINEAR BINOMIALS 145

0 -1 0 01

12-2A+1 A+1 0 01

0 0 0 -1M6-A 1'-2A } 1 Oi'

To the second and fourth rows we add the first multiplied by A + 1 and 5 - A,respectively ; we obtain

0 -1 0 0

A2-2A + 1 0 0 00 0 0 -1

-A$+2A-1 0 11-21+1 0

To the second row we add the fourth; then we multiply the first and thirdrows by -1. After permuting some rows and columns we obtain :

1 0 0 00 1 0 00 0 (A-1)' 00 0 0 (A -1)2

The matrix has two elementary divisors (A_ 1)2 and (A -1)2.

§ 4. Equivalence of Linear Binomials

1. In the preceding sections we have considered rectangular A-matrices. Inthe present section we consider two square A-matrices A (A) and B (A) oforder n in which all the elements are of degree not higher than 1 in A. Thesepolynomial matrices may be represented in the form of matrix binomials :

A (A) = A OA + A 1, B (A) =BOA + B1.

We shall assume that these binomials are of degree 1 and regular, i.e.,that I Ao 1 O, 1 B01 =A 0 (see p. 76).

The following theorem gives a criterion for the equivalence of suchbinomials :

THEOREM 6: If two regular binomials of the first degree Ao2 + Al andBoil + B1 are equivalent, then they are strictly equivalent, i.e., in the identity

BOA + B1= P (A) (A0A + A1) Q (A) (20)

the matrices P(1) and Q(A)--uith constant non-zero determinants-can bereplaced by constant non-singular matrices P and Q :14

BOA + B1= P (AOA + A1) Q. (21)

14 The identity (21) is equivalent to the two matrix equations: Bo = PA0Q andB, = PA,Q.

Page 157: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

146 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

Proof. Since the determinant of P(A) does not depend on A and is differ-

ent from zero,"' the inverse matrix M(A) =P-1(A) is also a polynomial mat-rix. With the help of this matrix we write (20) in the form

M (A) (BOA + B1) = (AOA + Al) Q (A) . (22)

Regarding M(A) and Q(A) as matrix polynomials, we divide M(A) onthe left by AOA + A, and Q (A) on the right by BOA + B, :

M (A) = (AoA + A1) B (A) + N, (23)

Q (A) = T (A) (BOA + B1) + Q ; (24)

here M and Q are constant square matrices (independent of A) of order n.We substitute these expressions for M(A) and Q (A) in (22). After a fewsmall transformations, we obtain

(AOA + Al) [T (A) - S (A)] (BOA + B1) = M (BoA + B1) - (AOA + A1) Q . (25)

The difference in the brackets must be identically equal to zero; for other-wise the product on the left-hand side of (25) would be of degree ? 2, whilethe polynomial on the right-hand side of the equation is of degree not higherthan 1. Therefore

S (A) = T (A) ; (26)

But then we obtain from (25) :M(BOA + B1) = (AOA + A1) Q. (27)

We shall now show that M is a non-singular matrix. For this purposewe divide P(A) on the left by BOA + B, :

P (h) _ (BOA + Bl) U (A) + P. (28)

From (22), (23), and (28) we deduce:E =M(A)P (A) = M(A) (BOA + Bl) U(A) + M(A) P

= (AOA + Al) Q (A) U(A) + (AOA + Al) S (A) P + MP= (AOA + A1) [Q (A) U(A) +S (A) P] + MP. (29)

"a The equivalence of the binomials Ac A+ A, and BoA + B, means that an identity(20) exists in which I P (A) I = const. 0 and I Q (A) I = cont. 0. However, in thiscase the last relations follow from (20) itself. For the determinants of regular binomialsof the first degree are of degree n:

IAOA+A,1-IA,IA' +...,IBoA+Bll= IB,IAn+...;IAoI o.

Therefore it follows from

IB01+B, I=IP(A)IIA.)+A,IIQ(A)Ithat

I P (A) const. # 0, 1 Q (A) I = cont. 0.

Page 158: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 5. A CRITERION FOR SIMILARITY OF MATRICES 147

Since the last term of this chain of equations must be of degree zero in A(because it is equal to E), the expression in brackets must be identicallyequal to zero. But then from (29)

MP=E' (30)

so that 0andM-1=P.Multiplying both sides of (27) on the left by P, we obtain :

B,A+B1=P(A02+A1)Q-

The fact that P is non-singular follows from (30). That P and Q are non-singular also follows directly from (21), since this identity implies

B0 = PAOQand therefore

IPIIA0IIQI=IB0I0.

This completes the proof of the theorem.Note. From the proof it follows (see (24) and (28)) that the constant

matrices P and Q by which we have replaced the A-matrices P(A) and Q(A)in (20) can be taken as the left and right remainders, respectively, of P(A)and Q (A) on division by B02 + B2.

§ S. A Criterion for Similarity of Matrices

1. Let A II atk 1Ii be a matrix with numerical elements from the field F.Its characteristic matrix AE - A is a A-matrix of rank n and therefore has ninvariant polynomials (see § 3)

ii (A) I ig (A)

The following theorem shows that these invariant polynomials deter-mine the original matrix A to within similarity transformations.

THEOREM 7: Two matrices A= II a4k 111 and B = II b{k II; are similar(B = T-1AT) if and only if they have the same invariant polynomials or,what is the same, the same elementary divisors in the field F.

Proof. The condition is necessary. For if the matrices A and B aresimilar, then there exists a non-singular matrix T such that

HenceB= T-1 AT.

)4E-B=T-1(i1E-A)T.This equation shows that the characteristic matrices AE - A and 2E - Bare equivalent and therefore have the same invariant polynomials.

Page 159: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

148 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

The condition is sufficient. Suppose that the characteristic matricesAE - A and 1E - B have the same invariant polynomials. Then theseA-matrices are equivalent (see Corollary 1 to Theorem 3) and there exist, inconsequence, two polynomial matrices P(A) and Q(A) such that

AE-B=P(A)(AE-A)Q(A). (31)

Applying Theorem 6 to the matrix binomials AE - A and AE - B, wemay replace in (31) the A-matrices P (A) and Q (A) by constant matricesPand Q:

AE-B=P(AE-A)Q; (32)

moreover, P and Q may be taken (see the Note on p. 147) as the leftremainder and the right remainder, respectively, of P(1) and Q(A) ondivision by AE - B, i.e., by the Generalized Bezout Theorem, we may set :'s

P = P (B), Q =Q (B)

Equating coefficients of the powers of A on both sides of (32), we obtain:

B = PAQ, E = PQ,i.e.,

where

This proves the theorem.

B=T-'AT,

T=Q=P-1

(33)

2. Note. We have incidentally established the following result, which westate separately :

SUPPLEMENT TO THEOREM 7. If A = II a;k II 1 and B = II bik II i are twosimilar matrices,

B = T-1 AT, (34)

then we can choose as the transforming matrix T the matrix

T = Q (B) _ [P (B)]-1 , (35)

where P(A) and Q(A) are polynomial matrices in the identity

AE - B = P(A) (AE - A) Q (A)which connects the equivalent characteristic matrices 2E - A and AE - B;in (35) Q(B) denotes the right value of the matrix polynomial Q(A), andP(B) the left value of P(2), when the argument is replaced by B.

16 We recall that P(B) is the left value of the polynomial P(X) and Q(B) the rightvalue of Q(a), when a is replaced by B (see p. 81).

Page 160: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. THE NORMAL FORMS OF A MATRIX 149

§ 6. The Normal Forms of a Matrix

1. Letg(A)=Am+a1A",-1+...+. am_11+am

be a polynomial with coefficients in F.We consider the square matrix of order m

L=

0 0 . . . 0 _- am

1 0 . . . 0 a",-'0 1 . . . 0 al-2 (36)

1100. . . 1 -al

11

It is not difficult to verify that g (I) is the characteristic polynomial of L :

A 0 0. . .0 am M

-1 A 0 . . . 0 am-1JAE-L j = 0 -1 A . . . 0 am_2 g

0 0 0...-1 al-FAOn the other hand, the minor of the element a,, in the characteristic

Dm (A)determinant is equal to ± 1. Therefore DD_1(2) = 1 and it (A) = Dm_1(2) =DD(1) =9(A), i2 (A) =...=im(A) =1.

Thus, L has a single invariant polynomial different from 1, namelyg(A).

We shall call L the companion matrix of the polynomial g(,1).Let A= II a a, 11 71 be a matrix with the invariant polynomials

( 1 ) ( ' X ) ... , it (1) , it+' (A) =1, ... , i. ( 1) =1. (37)

Here the polynomials i1 (1), i2 (A), ... , it (A) have positive degrees and, fromthe second onwards, each divides the preceding one. We denote the com-panion matrices of these polynomials by LI, L2, ... , Lt.

Then the quasi-diagonal matrix of order n

LI=(L2,L2,...,Lt) (38)

has the polynomials (37) as its invariant polynomials (see Theorem 4 onp. 141). Since the matrices A and LI have the same invariant polynomials,they are similar, i.e., there always exists a non-singular matrix U (I U 0)such that

Page 161: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

150 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

A = ULIU-1. (I)

The matrix LI is called the first natural normal form of the matrix A. Thisnormal form is characterized by : 1) the quasi-diagonal form (38), 2) thespecial structure of the diagonal blocks (36), and 3) the additional condi-tion : in the sequence of characteristic polynomials of the diagonal blocksevery polynomial from the second onwards divides the preceding one.17

2. We now denote byXi (A) , Xa (A) , - - , XU (A) (39)

the elementary divisors of A = 11 act Ili in the number field F. The corres-ponding companion matrices will be denoted by

La1, L(s), ... , L(").

Since Xt(1) is the only elementary divisor of L('' (j = 1, 2, ... , u),18 thequasi-diagonal matrix

LII = (DI), L(2), ..., L(a)) (40)

has, by Theorem 5, the polynomials (39) as its elementary divisors.The matrices A and LII have the same elementary divisors in F. There-

fore the matrices are similar, i.e., there always exists a non-singular matrixV (f Vf9& 0) such that

A=VLIIV-1. (II)

The matrix LII is called the second natural normal form of the matrix A.This normal form is characterized by: 1) the quasi-diagonal form (40),2) the special structure of the diagonal blocks (36), and 3) the additionalcondition : the eharacterlstle polynomial of each diagonal block is a powerof an irreducible polynomial over F.

Note. The elementary divisors of a matrix A, in contrast to the invariantpolynomials, are essentially connected with the given number field F. If wechoose instead of the original field F another number field (which also con-tains the elements of the given matrix A), then the elementary divisors maychange. Together with the elementary divisors, the second natural normalform of a matrix also changes.

17 From the conditions 1), 2), 3) it follows automatically that the characteristic poly-nomials of the diagonal blocks in LI are the invariant polynomials of the matrix LI and,hence, of A.

78 Xf(1) is the only invariant polynomial of L(i) and is at the same time a power of apolynomial irreducible over F.

Page 162: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. Tim NORMAL FORMS OF A MATRIX 151

Suppose, for example, that A= it as IIi is a matrix with real elements.The characteristic polynomial of the matrix then has real coefficients. Butthis polynomial may have complex roots. If F is the field of real numbers,then among the elementary divisors there may also be powers of irreduciblequadratic trinomials with real coefficients. If P is the field of complexnumbers, then every elementary divisor has the form (I - I4)P.

3. Let us assume now that the number field F contains not only the elementsof A, but also the characteristic values of the matrix.19 Then the elementarydivisors of A have the form20

(A-A1)p', (A-A2)p', ..., (A-2,/)Pu (p1+p2+...+pu=n). (41)

We consider one of these elementary divisors :

(A-Ao)P

and associate with it the following matrix of order p :

Ao 1 0 . . . 00 Ao 1 . . 0

. . . . . . . . = A0E(P) + H(P). (42)

0 0 0 . . . 1

0 0 0...AoIt is easy to verify that this matrix has only the one elementary divisor

(I - ),)P. The matrix (42) will be called the Jordan block corresponding tothe elementary divisor (I - I0)P.

The Jordan blocks corresponding to the elementary divisors (41) willbe denoted by

J1,J2,...,J,Then the quasi-diagonal matrix

J=(J1, J2, ..., Jx)has the powers (41) as its elementary divisors.

The matrix J can also be written in the form

J=( AAE1+H1, 22E2+H2, ..., AuE9+H.)where

Ek=E(P4), Hk=H(Ps) (k=1, 2, ..., u).

10 This always holds for an arbitrary matrix A if r is the field of complex numbers.20 Among the numbers 11, 11, ... , 1 there may be some that are equal.

Page 163: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

152 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

Since the matrices A and J have the same elementary divisors, they aresimilar, i.e., there exists a non-singular matrix T(I T 1 0) such that

A=TJT-'=T(AIR, +H1, A2R2+H$, ..., (UI)

The matrix J is called the Jordan normal form or simply Jordan formof A. The Jordan normal form is characterized by its quasi-diagonal formand by the special structure (42) of the diagonal blocks.

The following scheme describes the Jordan matrix J for the elementarydivisors (A - AI)2, (A - A2)3, A -- A8 , (A - 24)2:

Al 1 0 0 0 0 0 0

0 0 00 A 0 0 0I..............................0 0 A$ 1 0 0 0 0

0 0 0 A2 1 0 0 0 (43)0 0 0 0 A2 0 0 0

0 0 0 0 0 Aa 0 0

0 0 0 0 0 0 A,4 1

0 0 0 0 0 0 0 A4If (and only if) all the elementary divisors of a matrix A are of the first

degree, the Jordan form is a diagonal matrix, and in this case we have :

A =T (A1, Aa, ... ,

A matrix A has simple structure (see Chapter III, § 8) if and onlyif all its elementary divisors are of the first degree.21

Instead of the Jordan block (42) sometimes the `lower' Jordan block oforder p is used :

1o 0 . . . 0 0

1 Ao . . . 0 00

= A I,?tpl + kv)

Ao 0

110 . . . 0 1 AO

This matrix also has the single elementary divisor (A - Ao)¢ only. To theelementary divisors (41) there corresponds the lower Jordan matrix.22

21 The elementary divisors of degree I are often called `linear' or 'simple' elementarydivisors.

22 The matrix J is often called the upper Jordan matrix, in contrast to the lowerJordan matrix J(1) .

Page 164: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 7. THE ELEMENTARY DIVISORS OF A MATRIX 153

J(,)= { A1E1 + F1, A2E2 + F2,, ..., A..E,, + F.)

(Ek =E(" , Fk =F(P1) ; k = 1, 2, ..., u).

An arbitrary matrix A having the elementary divisors (41) is alwayssimilar to J(1), i.e., there exists a non-singular matrix T, (I T, I 0) suchthat

A=TIJ(,)Ti1=T,{A1E1+F1, A2E2+F2, ..., (IV)

We also note that if Ao 0, each of the two matrices

Ao(Ecr) + H(n)) ,

A0(E('° + Fcr))

has only the single elementary divisor (A - Therefore for a non-singular matrix A having the elementary divisors (41) we have, apart from(III) and (IV), the representations

A=T2{AI(EI+Hi),As(L's+Hs),...,Au(E.+H.)}Tsl, (V)

A=T3(A1(EI+FI), As(Es+Fs), ..., Au (E#+Fu)) Ti'. (VI)

§ 7. The Elementary Divisors of the Matrix I(A)

1. In this section we consider the following problem :

Given the elementary divisors (in the field of complex numbers) of amatrix A = II a4k I11 and given a function f (A) defined on the spectrum of A,to determine the elementary divisors (in the field of complex numbers) ofthe matrix f (A).

The matrix f (A) does not alter if we replace the function f (A) by a poly-nomial that assumes on the spectrum of A the same values as f (A) (seeChapter V, § 1). Without loss of generality we may therefore assume inwhat follows that f (A) is a polynomial.

We denote by

(A -- Al)r' , (2 --- A2)r' , ..., (A - AX"

the elementary divisors of A.23 Thus A is similar to the Jordan matrix

A=TJT-1,and so

f (A)= T f (J) T-1.

23 Among the A. there may be some that are equal.

Page 165: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

154 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

Moreover,

J={J1, J2, ..., J,=AcE(PC)+H(PC) (i=1, 2, ..., u)and

f (J) = (f (J1) , f A)' ... , W-01,

where (see Example 2 on p. 100)

f (Ac) 1! (ps-lj.

(45)

(46)

Since the similar matrices f (A) and f (J) have the same elementarydivisors, we shall from now on consider f (J) instead of f (A).

2. Let us determine the defect24 d of f (A) or, what is the same, of f (J).The defect of a quasi-diagonal matrix is equal to the sum of the defects ofthe various diagonal blocks and the defect of f (J4) (see 46)) is equal to thesmaller of the numbers k{ and p4, where k4 is the multiplicity of A as a rootof f (A),25 so that

f(A4)=f'(A4)=...=ICx:)(2),0 (i= 1, 2, ..., u).

We have thus arrived at the following theorem :

THEoRtEM 8: The defect of the matrix f (A), where A has the elementarydivisors

(A - 21)P' , (A - W' (47)

is given by the formula4

d =' min (k4 , ps) ; (48)

24 d = n - r, where r is the rank of f (A). If the elementary divisors of a matrix areknown, then the defect of the matrix is determined as the number of elementary divisorscorresponding to the characteristic value 0, i.e., as the number of elementary divisors ofthe form AO.

25 ki may be equal to zero; in that case f ()) 0.

Page 166: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 7. THE ELEMENTARY DIVISORS OF A MATRIX 155

here ki i s the multiplicity of Ai as root of f (2) (i =1, 2, ... , u).28

As an application of this theorem we shall determine all the elementarydivisors of an arbitrary matrix A = II aik Iii that corresponds to a charac-teristic value AO:

91 92 9m

where gs ? 0 (i = 1, 2, ... , m - 1), gm > 0, provided the defects

of the matrices d1,d2,

. . . , dm

A -.10E , (A - 20E)2 , ... , (A - AOE)mare given.

For this purpose we note that (A-A0E)'= f;(A), where ff(A)(j = 1, 2, ... , m). In order to determine the defect of (A - A.9)1 we have,therefore, to set k{ = j in (48) for the elementary divisors corresponding tothe characteristic value Ao and ks = 0 for all the other terms (j 1, 2, . . . , m).Thus we obtain the formulas

91+ 92+ 9s+...+ 9m=d1,91+292+293+...+ 29m=d2,

91+292+3g3+...+ 3gm=d3, (49)

Hence27

gg = 2di - dd_1 -- di+I (j = 1, 2, . . ., m; do = 0, =dm) . (50)

3. Let us return to the basic problem of determining the elementary divisorsof the matrix f (A). As we have mentioned above, the elementary divisorsof f (A) coincide with those of f (J) and the elementary divisors of a quasi-diagonal matrix coincide with those of the diagonal blocks (see Theorem 5).Therefore the problem reduces to finding the elementary divisors of a matrixC of regular triangular form :

26 In the general case, where f (X) is not a polynomial, then min (kt, pi) in (48) has tobe interpreted as the number pt if

f (\i) = f,00 _ ... = f00 (Xi) = 0and as the number k, : pi if

f(k{)(a4) -A 0(i=1,2,...,u).

27 The number m is characterized by the fact that dm_I< dm= dm fi (j -1, 2, ...).

Page 167: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

156 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

P-1

C=,ra,.H1' _k-0

:a0 a1 . . . ap

ao:(51)

a1 !!

0 0 . . . ao il

We consider separately two cases :

1. a1 O. The characteristic polynomial of C is obviously equal to

Dp(A) _(A-ao)'.

Since Dp_1(A) divides D,(2) without remainder, we have

Dp_1(A) =(A-a0)9 (9:!9p)

Here Dp_1(A) denotes the greatest common divisor of the minors of orderp -1 in the characteristic matrix

A-a0 -a1. . . . -a,-10 A-a0

AE-C =-a1

0 0 . . . A-ao

It is easy to see that when the minor of the zero element marked by `+'is expanded, every term contains at least one factor A - ao, except the prod-uct of the elements on the main diagonal, which is (- a1)p-1 and is there-fore in our case different from zero. But since D,-,, (A) must be a powerof A - a0, we see that g = 0. But then it follows from

Dp (A) = (A -a0)p, D,-,(A) =l

that C has only the one elementary divisor (A - ao)¢.2. a1= ... = ak_1= 0, ak ; 0. In this case,

C = a0E + a + ....i.. ap-1Hp-1

Therefore for the positive integer j the defect of the matrix

(C - a0E)f = a4H1 i + .. .is given by

Page 168: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 7. THE ELEMENTARY DIVISORS OF A MATRIX 157

dy _ jkj, when

p, whenWe set

Then28

kjp,kj>p.

p=qk+h (0Sh<k). (52)

d1=k, d2= 2k, ..., de =qk, dQ+l=p (53)

Therefore we have by (50)

91= ... = ge-1= 0, 9Q = k --- h, 9y+1= h.

Thus, the matrix C has the elementary divisors

(A-a0)2+1, ..., (A-a0)9+I, (A-ao)4, ..., (A-ao)Q

h k-hwhere the integers q > 0 and h ? 0 are determined by (52).

4. Now we are in a position to ascertain what elementary divisors the matrixf (J) has (see (45) and (46) ). To each elementary divisor of A

(A - Ao)r

there corresponds in f (J) the diagonal cell

p-1 0(a)(AOE+H)=Z tiH;

i-O(55)

Clearly the problem reduces to finding the elementary divisors of a cellof the form (55). But the matrix (55) is of the regular triangular form(51), where

Thus we arrive at the theorem :

f(Ao)f, (AO)

,(p-1)(10)

(P' )0 1 (AO) . '

29 In this case the number q + I Plays the role of at in (49) and (50).

Page 169: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

158 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

THEOREM 9: The elementary divisors of the matrix f (A) are obtained

from those of A in the following way : To an elementary divisor

(A - A0)P (56)

of A for p =1 or for p > 1 and f' (AO) 7& 0 there corresponds a single ele-

mentary divisor(A - f

(A0))P (57)

of f(A) ; for p > 1 and f (AO) = ... = f0_1)(AO) = 0, /(k)(20)* 0 (k < p) to theelementary divisor (56) of A there correspond the following elementarydivisors of f (A) :

(A - f(A0))4+1, _ .. , (A - f

(AO))9+',

(A - f (A0))9, ... , (A - f (Ao))°, (58)

h k-hwhere

p=qk+h, 0sq, 0Sh<k;finally, for p > 1, f'(AO) = ... =/(P 1)(20) =0, to the elementary divisor (56)there correspond p elementary divisors of the first degree of f (A) :29

A-f(A0), ..., A-f(Ao) (59)

We note the following special cases of this theorem.

1. I f Al, A2, ... , A. are the characteristic values of A, then f (A,), R AO,, f (A,,) are the characteristic values of f (A). (In both sequences each

characteristic value is repeated as often as its multiplicity as a root of thecharacteristic equation indicates.) $0

2. If the derivative f'(A) is not zero on the spectrum of A,91 then ingoing from A to f (A) the elementary divisors are not 'split up' i.e., if Ahas the elementary divisors

(A - Ad", (A -12)P', ... , (A - An)P" ,

then f (A) has the elementary divisors

(A __ f (A1))Pf, (A - f (AZ))P', ... , (A - f (A"))P-

29 (57) is obtained from (58) by setting k = 1; (59) is obtained from (58) by settingk=p ork>p.

so Statement 1. was established separately in Chapter IV, p. 84.a' I.e., f (X,) , 0 for those X that are multiple roots of the minimal polynomial.

Page 170: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 8. GENERAL METHOD OF CONSTRUCTING TRANSFORMING MATRIX 159

§ 8. A General Method of Constructing the Transforming Matrix

In many problems in the theory of matrices and its applications it issufficient to know the normal form into which a given matrix A = 11 aik 11T

can be carried by similarity transformations. The normal form is com-pletely determined by the invariant polynomials of the characteristic matrix2E- A. To find the latter, we can use the defining formulas (see (10) onp. 139) or the reduction of the characteristic matrix AE -A to canonicaldiagonal form by elementary transformations.

In some problems, however, it is necessary to know not only the normalform .A of the given matrix A, but also a non-singular transforming matrix T.

1. An immediate method of determining T consists in the following. Theequation

A=TAT-'

can be written as :

AT-TA=O.

This matrix equation in T is equivalent to a system of n2 linear homogeneousequations in the n2 unknown coefficients of T. The determination of atransforming matrix reduces to the solution of this system of n2 equations.Moreover, we have to choose from the set of all solutions one for whicht T I 0. The existence of such a solution is certain, since A and A havethe same invariant polynomials.32

Note that whereas the normal form is uniquely determined by the matrixA,38 for the transforming matrix T we always have an innumerable set ofvalues that are given by

T = UT1i (60)

where T1 is one of the transforming matrices and U is an arbitrary matrixthat is permutable with A.84

32 From this fact follows the similarity of A and A.33 This statement is unconditionally true as regards the first natural normal form.

As far as the second normal form or the Jordan normal form is concerned, they areuniquely determined to within the order of the diagonal blocks.

34 The formula (60) may be replaced by

T = T,V,,

where V is an arbitrary matrix permutable with A.

Page 171: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

160 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

The method proposed above for determining a transforming matrix T issimple enough in concept but of little use in practice, since it requires a greatmany computations (even for n = 4 we have to solve 16 linear equations).

2. We proceed to explain a more efficient method of constructing thetransforming matrix T. This method is based on the Supplement to Theorem7 (p. 148). According to this, we can choose as the transforming matrix

T= Q(A), (61)

provided

AE-A=P(A) (AE-A)Q(A)

The latter equation expresses the equivalence of the characteristic matricesAE - A and AE - A. Here P (A) and Q (A) are polynomial matrices withconstant non-zero determinants.

For the actual process of finding Q(A) we reduce the two A-matricesAE - A and AE-1 to canonical form by means of the correspondingelementary transformations

in-,(A), ... , i,(A)}=P,(A)(AE-A)Q,(A) (62)('-(A), i,(A)}=P2(A) (AE-A)Q2(A) (63)

whereQ1 (A)=T,T2i...,T,, Q2(A)=TITI...Tpy, (64)

and where T1, ... , Trt, T1, .... T,, are the elementary matrices correspond-ing to the elementary operations on the columns of the A-matrices AE -Aand AE -.X. From (62), (63), and (64) it follows that

AE--A=P(1) (AE-A)Q(A),where

1 (65)Q (A)= Qi (A) Q-1 (A) = T,T2 ... TP,Tr.1Tp. i ...T*-

We can compute the matrix Q (A) by applying successively to the col-umns of the unit matrix E the elementary operations with the matricesT1, ..., Tp T ', ..., Ti -1. After this (in accordance with (61)) wereplace the argument A in Q(A) by the matrix A.

Example.1 -1 1 -1

-3 3 -5 4A=8 -4 3 -415 - 10 11 - 11

Page 172: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 8. GENERAL METHOD OF CONSTRUCTING TRANSFORMING MATRIX 161

Let us introduce a symbolic notation for the right elementary operationsand the corresponding matrices (see pp. 130-131) :

T" [(c)il, T"=[i+(b(A))1]. T", =[i9]

In transforming the characteristic matrix AE - A into normal diagonalform we shall at the same time keep a record of the elementary right opera-tions to be performed, i.e., the operations on the columns :

JJA-1 1 -1 1 1 0 0 0 1

2E-A=

1 0 0 0-4 2+1 1 42-I

4 0 2+1 -42--42+11 -2-1 2 -22-102-4

32-3 5 -4 42-1 2+1 1 -4- 8 4 A-3 4 -42-4 0 2+1 4-15 10 -11 2+I1 -22-I02-4 -2-1 2 2+11

1 0 0 00 2+1 1 42-10 0 2+1 -42-40 -2-1 2 -22-102-4

.

1 0 00 0 1

0 -2'-22-1 2+10 -2'-22-1 2

1 0 0

0 1 0

0 2+1 -22-22-10 A -22-22-1

1 0 0 00 1 0 0

0 0 1'+22+1 412+72+30 0 2'+22+1 522+92+4 ii

1 0 00 1 00 O -AS-21-10 0 A'+22+1

1 0 00 1 0

0 0 A+10 0 -22-32-2

00

-22-22-122+22+1

1 0 00 1 00 0 A+l0 0 0

00

-42'-72-3-522-92-4

1

00

00

-4A2-72-3-522-92-4

0 0 01 0 00 22+21+1 -22-32-20 22+22+1 -2-10

0

0

A+1_22-32-21 0 0 0

0 1 0 0

0 0 A+1 00 0-A'-32-2 -(2+1)'

00 II

.

.

.

.

Here

Ql (2) = [1 + (1 - 2) 4] [2 - 4] [3 + 4] [14] [2 - (2 + 1) 3] [4 + (1- 42) 3] [23] xx [4 - (5) 3] [43] [4 + (A + 1) 3]

Page 173: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

162 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

We have found the invariant polynomials (A + 1)3, (A + 1), 1, and 1of A. The matrix has two elementary divisors, (A + 1) 3 and (A + 1). There-fore the Jordan normal form is

-1 1 0 0

0 -1 1 0J=

0 0 -1 0

0 0 0 -1By elementary operations we bring the matrix AE - J into normal diago-

nal form

AE-J=A+1 -1 0 0

0 A+1 -1 0

0 0 A+1 0

0 0 0 A+10 -10 0

(A + 1)3 (A + 1)2

+1 -1 0 00 0 -1 00 (A+1)2 A+1 00 0 0 A + I

Here

Therefore

0 0 A+10 0 0

0 0

0 1 0 00 0 1 0

(2+1)20 0 00 0 0 1+ 1

1 0 00 1 0

I

1 0 0 00 1 0 0o o 0 (A + 1)a0 0 A+1 0

0

00 {A+ 1)a

Q2(A)=[2+(A+1)3] [1+(A+1)2] [12] [23] [34].

Q(A) =Q1(A) QY1 (A)=(l+(1-1)4] [2-4] [3+4] (14] [2-(A+1)3] [4+(I-4A)3] [23](4-(5)3] x

x [43] [4 + (A + 1) 3] [34] [23] (12) [1- (A + 1) 2] [2 - (A + 1) 3].

We apply these elementary operations successively to the unit matrix E :

1 0 0 0 1 0 0 00 1 0 0 0 1 0 00 0 1 0 0 0 1 00 0 0 1 1-A -1 1 1

0 0 0 1 0 0 0 1

0 1 0 0 0 1 0 00 0 1 0 0 -A-1 1 01 -1 1 1-A 1 -A--2 1 1-A

Page 174: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 8. GENERAL METHOD OF CONSTRUCTING TRANSFORMING MATRIX 163

00 1

00 -5

0 1 ` II 00 0 1

0 0 11! 00 1 01 1-41 1' 0 1 -A-1 1-411 2-5A Il 1 1 1 -A-2 2-5A

01

00 1 A+1I o o -5 -51-401 A+6 A2+61+511 12 11A.4-10

01A+6 -A-11 1 12 -1-2

Thus

Q (A) _

0 0 0 00 0 0 0

1 0 0 00 0 0 0

Observing that

we have

T=Q(J)-

1 -2 1 00 1 -2 00 0 1 00 0 0 1

A+ 1 0 0 1 I'

-51-4 0 0 -5As+61+5 -A-1 1 A+610A + 9 - A 1 12

A+1 0 0 1

-51-4 0 0 -512+6A+5 -A-1 1 A+6

101+9 -A 1 12

As +

1 00 0-5 00 0

6-10 110-10 0

A +

1 00 1

-4 00 -55-11 69 01 12

J2 =

0 0 0 00 0 0 01 0 0 00 0 0 0

1 0 0 0-5 0 0 0

6 -1 0 1

12 -1 0 00 1 0 1

1 -5 0 -50 4 1 5-1 11 0 12

1 -2 1 00 1 -2 00 0 1 00 0 0 1

+

I 00 0 1

00 1 -501 -2-1 A+61 1 -1----2 12

f

1+1 00 1

-51-4 0 0 -5A'+6A+5 0 1 1+6

111 + 10 1 1 12

I

I

-1 1 0 00-1 1 00 0 -1 00 0 0-1

+1 0 0 1-4 0 0-55-11 69 0 1 12

i

f

Page 175: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

164 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

Check :

0 - 1 1 - 1 0 - 1 1 - 1AT= 1 6 -5 5 TJ = -1 6 -5 5

0 - 4 3 - 5 0 - 4 3 - 51 1 - 12 11 - 12 1 - 12 11 - 12

i.e., AT = TJ.0 1 0 1

I -5 0-5I TI = =-1 0.0 4 1 5

Therefore-1 11 0 12

A = TJT-1.

§ 9. Another Method of Constructing a Transforming Matrix

1. We shall now explain another method of constructing a transformingmatrix which often leads to fewer computations than the method of thepreceding section. However, we shall apply this second method only whenthe Jordan normal form and the elementary divisors

(A -11)Pl, (1- A=)A, ... (66)

of the given matrix A are known.Let A=TJT-1, where

P1

J=(11E" +, g(Pl),

At 1 ... 0

0 . P2

1y 1 0

10 . . . 1E

Then denoting the k-th column of T by tk (k =1, 2, ... , n), we replacethe matrix equation

AT =TJ

Page 176: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 9. ANOTHER METHOD OF CONSTRUCTING TRANSFORMING MATRIX 165

by the equivalent system of equations

At1= lltl, Atz =)1t2 + tl, - , A tP,= l1tP, + tP,_1 (67)

Atp,+1 = 12tP,+1 , AtP1+2 = 12tp,+2 + tp,+1, ... , Atp,+P:= 12tp,+p, + tp,+pa-1 (68)

which we rewrite as follows :

(A -11E) t1= 0, (A -11E) t2 = t1, ... , (A -11E) tP, = tP,-1(A - AR) tP,+1 = 0, (A - 22E) tP,+2 = tP,+1, ... , (A -12E) tP,+P, = tp,+P.-1

.................................

(67')

(68')

Thus, all the columns of T are split into `Jordan chains' of columns :[t1, t2, .. , 4,1, [tp,+l, tp1+1, ... , tP,+P, 1, ... .

To every Jordan block of J (or, what is the same, to every elementarydivisor (66)) there corresponds its Jordan chain of columns. Each Jordanchain of columns is characterized by a system of equations of type (67),(68), etc.

The task of finding a transforming matrix T reduces to that of findingthe Jordan chains that would give in all n linearly independent columns.

We shall show that these Jordan chains of columns can be determined bymeans of the reduced adjoint matrix C(2) (see Chapter IV, § 6).

For the matrix C(pl) we have the identity

(1E - A) C (1) = y, (1) R. (69)

where W(2) is the minimal polynomial of A.Let

10(1)=(1-1oYnx(1) Q(10) 0)-

We differentiate the identity (69) term by term m - 1 times :

(1E - A) 0'(A) + C (1) =1P' (1) E(1E-A)C"(1)+2C'(1)=+p"(1)E

(1E - A) 0 m-1) (1) + (m - 1) &A-2) (1).=,(m-1) (1) E.

(70)

Substituting 20 for 1 in (69) and (70) and observing that the right-handsides are zero, we obtain

(A-1oE)C=O, (A-1oE)D=C, (A-1oE)F=D,..., (A-1oE)K=Q; (71)

where

C =C (10), D = 1+

C' (10) , F = 2 j C" (10) , ... , a = (in 2 ! C(m-2) (10)

172)

K=(m 1 I

Page 177: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

166 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

In (71) we replace the matrices (72) by their k-th columns (k = 1, 2, ...n). We obtain :

(A-AOE)Ck=o, (A-AOE)Dk=Ck,..., (A - AaE)Kk =0k (73)

(k=1,2, ..,n).Since C = C(A() 0,85 we can find a k (c n) such that

Ck 760- (74)

Then the m columnsCk,Dk,Fk,...,0k,Ak (75)

are linearly independent. For letyCk + .Dk + ... + xKk =a. (76)

Multiplying both sides of (76) successively by A - AaE, ... , (A - AaE)we obtain

8Ck+...+xGt=o, ...,xCk=o. (77)

From (76) and (77) we find by (74) :

y=S=... =x-0.Since the linearly independent columns (75) satisfy the system of equa-

tions (73), they form a Jordan chain of vectors corresponding to the ele-mentary divisor (A - Aa) m (compare (73) with (67') ).

If Ck = o for some k, but Dk :Pk o, then the columns Dk, ... , Gk, Kk forma Jordan chain of m -1 vectors, etc.2. We shall now show first of all how to construct a transforming matrix Tin the case where the elementary divisors of A are pairwise co-prime :

With the elementary divisor (A - Af)mf we associate the Jordan chain ofcolumns

CU), DU), ... , 0(i), KU),

constructed as indicated above. Then

(A - AtE) C(l) = o, (A .- A;E) D(f) =C(f), ..., (A -AfE) K(f) = G(9. (78)

When we give to j the values 1, 2, ... , s, we obtain s Jordan chains containingn columns in all. These columns are linearly independent.

35 From C(Xo) = 0 it would follow that all the elements of C(X) have a commondivisor of positive degree, in contradiction to the definition of C(X).

Page 178: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 9. ANOTHER METHOD OF CONSTRUCTING TRANSFORMING MATRIX 167

For, suppose that

+ atD(i) + ... + xiV) ] - o . (79)

We multiply both sides of (79) on the left by

(A - A1E)m' ... (A - Ai-1E)-i_a (A -,Z E), mi-1 (A - Af+1E)mi+.... (A - A,E)" (80)

and obtainxi = 0.

Replacing mi -1 successively by m; - 2, m; - 3, ... in (80), we find :

r,=8,=...=xi=0and this is what we had to prove.

We define the matrix T by the formula

T =(C(1), DO), ..., K(1) ($2), D(2), ..., K(2); ...; C("), D(d), ..., K(i)). (81)

Example.

8 3 -10 -33 -1 -4 2 i.2 3 -2 - 4 elementary divisors : (A -13) , (A + 1)2,A _ 2 -1 -3 2 '

2AA''' 'V( ) i (A)1 2 -1 -3 I I -- ,--2)#-1--(A+Afstp(A,µ)= =µ4-3 2 2 1

1 4 0 2

0 (A) = P (AE, A) = As + AA' + (A' -2) A + (A' - 21) E .

We make up the first column C, (A) :

C,(A)=[A'],+2[A2]1+(A'-2) A, + (A' - 2A) E,.

For the computation of the first column of A2 we multiply all the rowsof A into the first column of A. We obtain:36 [A2]1=(1, 4, 0, 2). Multiply-ing all the rows of A into this column, we find : [As ] 1= (3, 6, 2, 3).

Therefore3 1 3 1 ! IA'+3A'-A-3"6 4 ' 2 ' 0 2A1+4A+2

C,(1)= 2 +A 0-2)+(A 2

+(A -2A) 0 2A'-2i3 2 1 1 0 ll+ 2A-}- 1

C16 The columns into which we multiply the rows are written underneath the rows of A.The elements of the row of column-sums are set up in italicq, for checking.

Page 179: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

168 VT. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

Hence CI (1) = (0, 8, 0, 4) and C1'(1)=(8,8,4,4). As C1(-1)=(0, 0, 0, 0), we pass on to the second column and, proceeding as before, wefind : C2(- 1) = (-4,0,-4, 0) and C2(- 1) = (4,-4,4,-4). We setup the matrix:

0 8 -4 4

'8 8 0 -4

S(-I))=(C1(1), Ci (1) ; C:(-1), C 0 4 -4 44 4 0 -4

We cancel37 4 in the first two columns and -4 in the last two columns.

T=0 2 1 -12 2 0 1

0 1 1 -11 1 0 1

We leave it to the reader to verify that

AT=Ti 1 0 00 1 0 00 0 -1 1

0 0 0 -l3. Coming now to the general case, we shall investigate the Jordan chainsof vectors corresponding to a characteristic value Aa for which there are pelementary divisors (A - Aa )'", q elementary divisors (A - Ao) "'-', r ele-mentary divisors (A - A,,)"'-2, etc.

As a preliminary to this, we establish some properties of the matrices

C =C (A0), D = C' (Ao), F = 21 C" (,lo), ..., K =(m

1 &-')&-') (Ao). (82)

1. The matrices (82) can be represented in the form of polynomials in A:

whereC=h1(A), D=h,(A), ..., K=h,,,(4), (83)

ht (A) _ (AV (

AO),(i=1,2, ..., m). (84)

For

whereC(A)=Y'(AE,A),

+F(A,p)= JP (1A)tV(A)p-A

37 A Jordan chain remains a Jordan chain when all its columns are multiplied by anumber c f 0.

Page 180: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 9. ANOTHER METHOD OF CONSTRUCTING TRANSFORMING MATRIX 169

Therefore

where

I C(k) (A°-)- 1111(1) (A° E, A),k! k!

IPA LAO

_ I fOk Yl

1P (0)= ht+l (E+) (86)_ kt [0. -A. (1--2°)k+f

(83) follows from (82), (85), and (86).

2. The matrices (82) have the ranks

p, 2p +q, 3p + 2q -{- r.....

This property of the matrices (82) follows immediately from 1. andTheorem 8 (§ 7), if we equate the rank to n-d and use formula (48)for the defect of a function on A (p. 154).

3. In the sequence of matrices (82) every column of each matrix is alinear combination of the columns of every following matrix.

Let us take two matrices h4(A) and hk(A) in (82) (see 1.). Supposethat i < k. Then it follows from (84) that:

hi (A) = hk (A) (A - 2° E)k-;.

Hence the j-th column y1 (j =1, 2, ... , n) of hi (A) is expressed linearly bythe columns z1, z2, . . . , z of ht (A)

n

yi=''ar9zv.0-1

where a1i a2, ... , a, are the elements of the j-th column of (A

4. Without changing the basic formulas (71) we may replace any col-umn in C by an arbitrary linear combination of all the columns, providedwe make the corresponding replacements in D, . . . , K.

We now proceed to the construction of the Jordan chains of columns forthe elementary divisors

(,1-- °)'"; (h-ho)°,-1

P q

Using the properties 2. and 4., we transform the matrix C into the form

0 =(Cl, C ..., C,; o, o, ..., o); (87)

Page 181: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

170 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

where the columns C,, C2, .. . , C9 are linearly independent. Now

D (D1, D,, ..., Dp; Dp+1, ..., D,).

By 3., for every i (1 < i!5; p) Ci is a linear combination of the columns

D1, D2,...,D,,:Ci = a,D1 + ....+ apDp + ap+1 Dp+1 + ... + anDn. (88)

We multiply both sides of this equation by A -10E. Observing (see (73) )that

(A-20E)Ci=o (i=1,2,...,p), (A-20E)D/=C1

we obtain by (87)o = alCI + a9C2 + ... + apCA;

hence in (88) a1=... =ap-0.

Therefore the columns C,, e2, ... , Cp are linearly independent combinationsof the columns D,,+,, ... , D,a. Therefore by 4. and 2., we can, without chang-ing the matrix C, take the columns C1, ... , C. instead of Dp+ 1, D2,' andzeros instead of Den+q+1, ... , Dn.

Then the matrix D assumes the form

D=(DI, ..., Dp; C1, C ..., Cp; D2p+1, ..., D2p+q; O, o...., o). (89)

In the same way, preserving the forms (87) and (89) of the matrices C andD, we can represent the next matrix F in the form

F =(r119 ..., Fp; D1, ..., Dp; F2p+1, F2p+4I C10 ..., Cp; t (90)

D2p+1, .. ., D2p+g, F3p+2q.1, ..., F$pf?q+r, O, ..., o), J

etc.

Formulas (73) gives us the Jordan chains

m rn

(Cl, Dl, ..., K1), ..., (Cp, Dp, ..., Ap);

pm-1 m-1

(D2p+11 F2p+1...., K2p+11) ..., (D2p+q, F2p+q, ..., K2p+q); ...

q

(91)

These Jordan chains are linearly independent. For all the columns Ciin (91) are linearly independent, because they form p linearly independentcolumns of C. All the columns Ci, D, in (91) are independent, because theyform 2p + q independent columns in D, etc. ; finally, all the columns in (91)

Page 182: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 9. ANOTHER METHOD OF CONSTRUCTING TRANSFORMING MATRIX 171

are independent, because they form no = mp + (m -1) q + ... independentcolumns in K. The number of columns in (91) is equal to the sum of theexponents of the elementary divisors corresponding to the given character-istic value A,,.

Suppose that the matrix A = II ack II z has s distinct characteristic values

=1, 2,. .,s;Ai (jd (.) _ (A-Al)n' (A -Is)", ... (A-A,)",

(A) _ (A - A,)" (2A2) % ... (A - A')M,

For each characteristic value Ai we form its system of independent Jordanchains (91) ; the number of columns in this system is equal to n; (j =1, 2,3,.. . , s). All the chains so obtained contain n = nl + n2 + ... + n, columns.

These n columns are linearly independent and form one of the requiredtransforming matrices T.

The proof of the linear independence of these n columns proceeds asfollows.

Every linear combination of these n columns can be represented in theform

(92)

where Hi is a linear combination of columns in the Jordan chains (91)corresponding to the characteristic value A, (j = 1, 2, . . . , s). But every col-umn in the Jordan chain corresponding to the characteristic value Ai satisfiesthe equation

Therefore

(A-A,E)'Ix=o.

(A - A 5E)"'i Hi = o. (93)

We take a fixed number j (1 < j < s) and construct the Lagrange-Sylvester interpolation polynomial r(A) (See Chapter V, §§ 1, 2) with thefollowing values on the spectrum of the matrix :

r (Ac)= r' (A:) _... = rlm'`I (At)= 0 for ijand

r (A!) = 1, r' (Ai) = ... = r("'t-1) (Ai) = 0.

Then, for every i r (A) is divisible by (A - Al )"'i without remainder ;therefore by (93),

r (A) Hi = o ( i : 7 4 - - j) . (94)

Page 183: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

172 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

In exactly the same way, the difference r(1) -1 is divisible by (A-A,)m'without remainder; therefore

r (A) H, = H1. (95)

Multiplying both sides of (92) by r(A), we obtain from (94) and (95)

Hi=o.

This is valid for every j =1, 2, ... , s. But Hi is a linear combination ofindependent columns corresponding to one and the same characteristic valueA, (j = 1, 2, ... , s). Therefore all the coefficients in the linear combinationHf (j =1, 2, ... , s), and hence all the coefficients in (92), are equal to zero.

Note. Let us point out some transformations on the columns of thematrix T under which it is transformed into the same Jordan form (with thesame arrangement of the Jordan diagonal blocks) :

1. Multiplication of all the columns of an arbitrary Jordan chain by anon-zero number.

H. Addition to each column (beginning with the second) of a Jordanchain of the preceding column of the same chain, multiplied by one and thesame arbitrary number.

III. Addition to all the columns of a Jordan chain of the correspondingcolumns of another chain containing the same or a larger number of columnsand corresponding to the same characteristic value.

Example 1.

A=

1 0 0 1 -10 1 -2 3 -30 0 -I 2 -21 -1 1 0 1

1 -1 1 -1 2

A (1)=(A-1)4(2+ 1),1)=1'-A'-1+ 1.

Elementary divisors of the matrix A(2 -1)', (A - 1)', 1 + 1.

1 1 0 0 00 1 0 0 0

J= 0 0 1 1 00 0 0 1 00 0 0 0 -1

V (µ-R (1) =i"+ (2-I), +2:-A-1C(1)P(AE,A)=A'+(I-1)A+(x'-1-1)E.

Let us compute successively the column of A2 and the correspondingcolumns of C(2), C(1), C'(A), C'(1), C(- 1). We must obtain two linearlyindependent columns of C(1) and one non-zero column of C(- 1).

Page 184: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 9.

C(1)=

C(+ 1) =

ANOTHER METHOD OF CONSTRUCTING TRANSFORMING MATRIX 173

1 0 0 2*'0 1 0 2*0 0 1 0*44+(A-1)2 -2 2 -1*2-2 2-2*0 0 0 2*0 0 0 2*0 0 0 0*,C,(A)2-2 2-2*2-2 2-2*

2 -x- * 10 * * 3C'(+1)= 0 * E 2

1 * * 1

1 0 0 1*0 1 -2 3#0 0 -1 2*1 -1 1 0*1 -1 1 -1*1 0 0 1*0 1 -2 3*0 0 -1 2*1 -1 1 0--1 -1 1 -1*

1000001000+1) 0 0 1000001000001

1000001000

+(21-1) 0 0 1 0 0,0001000001

0 0 0 - t-0 0 40 0 40 0 0 E0 0 0

Therefore380 2 2 1 00 0 2 3 4

T= (Cl (+1),C'(+1),C,(+1),C' (+1), Cs(-1))= 0 0 0 2 42 1 -2 1 02 1 -2 -1 0

The matrix T can be simplified a little. We1) Divide the fifth column by 4;2) Add the first column to the third and the second to the fourth;3) Subtract the third column from the fourth;4) Divide the first and second columns by 2;5) Subtract the first column, multiplied by 1/2, from the second.

Then we obtain the matrix.0 1 2 1 0

0 0 2 1 1

0 0 0 2 1

1 0 0 2 01 0 0 0 0

We leave it to the reader to verify that AT1= T1J and I T1 0.

Example 2.

A=1 -1 1 -1

-3 3 -5 48 -4 3 -416 - 10 11 - 11

V

A(A)=(2+1)`,,(2)=(1+1)3.

Elementary divisors : (A + 1)9, A + 1.

38 Here the subscript denotes the number of the column; for example, C3(-1) denotesthe third column of C(-l).

Page 185: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

174 VI. EQUIVALENT TRANSFORMATIONS OF POLYNOMIAL MATRICES

!-1 1 0 011

0 -1 1 0'o 0 -1 0I0 0 0

-1

We form the polynomials

hi(2)= +i =(1+1)', h2(A)_{ h3(A)

and the matrices39

C = h,(A)= (A+E)2, D=h,(A)=A+E, F=E:

0 0 0 0 2 -1 1 -1 1 0 0 0

C= 2 -1 1 -1 D= -3 4 -5 4 F= 0 1 0 00 0 0 0

, 8 -4 4 -4I ,

0 0 1 0

-2 1 -1 1 15 - 10 11 - 10 0 0 0 1

For the first three columns of T we take the third column of these mat-rices : T = (C3f D3, F3, *) . In the matrices C, D, F, we subtract twice thethird column from the first and we add the third column to the second andto the fourth. We obtain

0 0 0 0 0 0 1 0 1 0 0 0

c= 0 0 1 0 D= 7 -1 -5 -1F=

0 1 0 00 0 0 0 0 0 4 0 , -2 1 1 1

0 0 -1 0 -7 1 11 1 0 0 0 1

In the matrices D, F, we add the fourth column, multiplied by 7, to the firstand subtract the fourth column from the second. We obtain

0 0 0 0 0 0 1 0 1 0 0 0C= 0 0 1 0 D 0 0 -5 -1 0 1 0 0

0 0 0 0 0 0 4 0 5 0 1 1

0 0 -1 0 0 0 11 1 7 -1 0 1

For the last column of T we take the first column of F. Then we have

0 1 0 1

1 -5 0 0T=(C3,D3, F3, F1)=0 4 1 5

-1 11 0 7

As a check, we can verify that AT = TJ and that I T 0.

Page 186: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

CHAPTER VIITHE STRUCTURE OF A LINEAR OPERATOR

IN AN n-DIMENSIONAL SPACE

(Geometrical Theory of Elementary Divisors)

The analytic theory of elementary divisors expounded in the preceding chap-ter has enabled us to determine for every square matrix a similar matrixhaving `normal' or `canonical' form. On the other hand, we have seen inChapter III that the behaviour of a linear operator in an n-dimensionalspace with respect to various bases is given by means of a class of similarmatrices. The existence of a matrix of normal form in such a class is closelyconnected with important and deep properties of a linear operator in ann-dimensional space. The study of these properties is the object of thepresent chapter. The investigation of the structure of a linear operator willlead us, independently of the contents of the preceding chapter, to thetheory of transformations of a matrix to a normal form. Therefore thecontents of this chapter may be called the geometrical theory of elementarydivisors."

§ 1. The Minimal Polynomial of a Vector and a Space(with Respect to a Given Linear Operator)

1. We consider an n-dimensional vector space R over the field r and alinear operator A in this space.

Let x be an arbitrary vector of R. We form the sequence of vectors

x, Ax, Asx, .... (1)

Since the space is finite-dimensional, there is an integer p (0< p:!5 n)such that the vectors x, Ax, ..., AP''x are linearly independent, while Apxis a linear combination of these vectors with coefficients in F :

" The account of the geometric theory of elementary divisors to be given here is basedon our paper [167]. For other geometrical constructions of the theory of elementarydivisors, see [22], §§ 96-99 and also [53].

175

Page 187: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

176 VII. STRUCTURE OF LINEAR OPERATOR IN n-DIMENSIONAL SPACE

APx =-Y1Ap-lx- y2Ap-2x - ... - ypx. (2)

We form the monic polynomial p(A) = AP + yl AP-1 + . + Yp_lA + yp(A monic polynomial is a polynomial in which the coefficient of the highestpower of the variable is unity.) Then (2) can be written:

q,(A)x=o. (3)

Every polynomial q: (A) for which (3) holds will be called an annihilatingpolynomial for the vector x.2 But it is easy to see that of all the monicannihilating polynomials of x the one we have constructed is of least degree.This polynomial will be called the minimal annihilating polynomial of x orsimply the minimal polynomial of x.

Note that every annihilating polynomial (A) of x is divisible by theminimal polynomial T(A).

For let(A) = q' (2) x (A) + e (1),

where x(A), o(A) are quotient and remainder on dividing (A) byThen

4p(A)x=x(A) p(A)x+e(A)x= e(A)x

and theref ore p (A) x = o. But the degree of o (A) is less than that of theminimal polynomial p(A). Hence e(A) -0.

From what we have proved it follows, in particular, that every vectorx has only one minimal polynomial.

2. We choose a basis el, es, ... , e, in R. We denote by q1(2), p2 (A), ... ,97*(A) the minimal polynomials of the basis vectors e1, e$, . . . , e and byV (A) the least common multiple of these polynomials (v' (A) is taken withhighest coefficient 1). Then p(A) is an annihilating polynomial for all thebasis vectors e1, e$, . . . , eA. Since every vector x e R is representable inthe form x = xl el + xs e2 + + x, e, , we have

v(A)x=xiV (A)e1+x2'V (A)e2+...+x,yi(A)e.=osi.e.,

ip (A) = O. (4)

The polynomial v(A) is called an annihilating polynomial for the wholespace R. Let W (2) be an arbitrary annihilating polynomial for the wholespace R. Then w (A) is an annihilating polynomial for the basis vectors

2 Of course, the phrase `with respect to the given operator A' is tacitly understood.For the sake of brevity, this circumstance is not mentioned in the definition, becausethroughout this entire chapter we shall deal with a single operator A.

Page 188: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 2. DECOMPOSITION INTO INVARIANT SUBSPACES 177

e1, e2, .... e . Therefore Y (A) must be a common multiple of the minimalpolynomials q'r (A), gis(l), ... , 99-(A) of these vectors and must therefore bedivisible without remainder by their least common multiple V(1). Henceit follows that, of all the annihilating polynomials for the whole space R,the one we have constructed, yp(A), has the least degree and it is monic.This polynomial is uniquely determined by the space R and the operator Aand is called the minimal polynomial of the space R.3 The uniqueness ofthe minimal polynomial of the space R follows from the statement provedabove: every annihilating polynomial +p(A) of the space R is divisible by theminimal polynomial y'(A). Although the construction of the minimal poly-nomial y(A) was associated with a definite basis e1, e2, ... , en , the poly-nomial pp(A) itself does not depend on the choice of this basis (this followsfrom the uniqueness of the minimal polynomial for the space R).

Finally we mention that the minimal polynomial of the space R anni-hilates every vector x of R so that the minimal polynomial of the space isdivisible by the minimal polynomial of every vector in the space.

§ 2. Decomposition into Invariant Subspaceswith Co-Prime Minimal Polynomials

1. If some collection of vectors R' forming part of R has the property thatthe sum of any two vectors of R' and the product of any vector of R' by anumber a e F always belongs to R', then that manifold R' is itself a vectorspace, a subspace of R.

If two subspaces R' and R" of R are given and if it is known that1. R' and R" have no vector in common except the null vector, and2. every vector x of R can be represented in the form of a sum

x=x'+x" (x'ER',x'"ER"), (5)

then we shall say that the space R is decomposed into the two subspaces R'and R" and shall write:

R=R'+R" (6)

Note that the condition 1. implies the uniqueness of the representation(5). For if for a certain vector x we had two distinct representations in theform of a sum of terms from R' and R", (5) and

x = u' + ac" (ac'E R' z" a R") (7)

then, subtracting (7) from (5) term by term, we would obtain:

3 If in some basis e,, eR, . . ., e a matrix A = II ask II? then the annihilating or minimalpolynomial of the space R (with respect to .4) is the annihilating or minimal polynomialof the matrix A, and vice versa. Compare with Chapter IV, § 6.

Page 189: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

178 VII. STRUCTURE OF LINEAR OPERATOR IN n-DIMENSIONAL SPACE

x'-z'=z"-x"i.e., equality of the non-null vectors x'-z a R' and z"-x" a R", which,by 1., is impossible.

Thus, condition 1. may be replaced by the requirement that the repre-sentation (5) be unique. In this form, the definition of decompositionimmediately extends to an arbitrary number of subspaces.

LetR=R'+R"

and let e' e' e', and e" e" e,"', be bases of R' and R", respec-t, 2, ...,a

I 2 r .. f

tively. Then the reader can easily prove that all these n' + n" vectors arelinearly independent and form a basis of R, so that a basis of the whole spaceis formed from bases of the subspaces. It follows, in particular, thatn=n'+n".

Example 1. Suppose that in a three-dimensional space three directions,not parallel to one and the same plane, are given. Since every vector in thespace can be split, uniquely, into components in these three directions, wehave

R=R'+R"+R"',

where R is the set of all the vectors of one space, R' the set of all vectorsparallel to the first direction, R" to the second, and R"' to the third. Inthis case, n=3 and n'=n"=n'1'=1.

Example 2. Suppose that in a three-dimensional space a plane and aline intersecting the plane are given. Then

R=R'+R",

where R is the set of all vectors of our space, R' the set of all vectors parallelto the given plane, and R" the set of all vectors parallel to the given line.In this example, n = 3, n' = 2, n" =1.

2. A subspace R'CR is called invariant with respect to the operator A ifAR' C R', i.e. if x e R' implies Ax a R'. In other words, the operator A carriesa vector of an invariant subspace into a vector of the same subspace.

In what follows we shall carry out a decomposition of the whole spaceinto subspaces invariant with respect to A. The decomposition reduces thestudy of the behavior of an operator in the whole space to the study of itsbehavior in the various component subspaces.

We shall now prove the following theorem :

Page 190: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 2. DECOMPOSITION INTO INVARIANT SUBSPACES 179

THEOREM 1 (First Theorem on the Decomposition of a Space into Invari-ant Subspaces) : If for a given operator A the minimal polynomial y'(1)of the space is represented over F in the form of a product of two co-primepolynomials y'1(A) and "(A) (with highest coefficients 1)

1V (1) =1V1(A) tV2 00 , (8)

then the whole space R splits into two invariant subspaces 11 and 12

R=11+12, (9)

whose minimal polynomials are V, (A) and tp2(1), respectively.Proof.We denote by l1 the set of all vectors xeR satisfying the equation

1p1 (A) x = o. 12 is similarly defined by the equation V2 (A) x = o. 11 and12 so defined are subspaces of R.

Since y,1 (A) and Y2 (A) are co-prime, it follows that there exist poly-nomials X1 (A) and X2(1) (with coefficients in F) such that

l=W1(A)X1(A)+%02(A)X2(A). (10)

Now let x be an arbitrary vector of it In (10) we replace 1 by A andwe apply both sides of the operator equation so obtained to the vector x :

x-'V1(A)X1(A)x+TVs(A)X2(A)x, (11)

i.e.,x = x' -}- (12)

where

x' (A) (A)x=' (A (Ax"=,V2 X2 V1 )x)X1 (13)

Furthermore,

+Vi(A)x'=+p (A)X,(A)x=o, 1V2(A)x"=lo(A)Xi(A)x=o,

x'e 11,andx"e 12.

11 and 12 have only the null vector in common. For if xo a 11 and x, a l2,i.e., v,, (A) x, = o and v2 (A) x,= o, then by (11)

xo=Xi(A)%V1(A)xo+X2(A)+V2(A)xo=o.

Thus we have proved that R =11 + 12.

Page 191: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

190 VII. STRUCTURE OF LINEAR OPERATOR IN n-DIMENSIONAL SPACE

Now suppose that x e 11. Then V, (A) x = o. Multiplying both sided ofthis equation by A and reversing the order of A and 9,1(A), we obtainy,, (A) A x = o, i.e., Axe 11. This proves that the subspace I, is invariantwith respect to A. The invariance of the subspace 12 is proved similarly.

We shall now show that -y, (A) is the minimal polynomial of 1,. Let(A) be an arbitrary annihilating polynomial for I,, and x an arbitrary

vector of R. Using the decomposition (12) already established, we write :

Since x is an arbitrary vector of R. it follows that the product 1(2)2(2)is an annihilating polynomial for R and is therefore divisible by V(A)V,(A)y'2(A) without remainder; in other words, y,l(A) is divisible by ip', (A).But gPj(A) is an arbitrary annihilating polynomial for I, and V, (A) is aparticular one of the annihilating polynomials (by the definition of 11).Hence y,, (A) is the minimal polynomial of I. In exactly the same way it isshown that y,2(A) is the minimal polynomial for the invariant subspace 12.

This completes the proof of the theorem.Let us decompose lp(A) into irreducible factors over F:

V (1) = 197, M]`' [972 (A)]`' ... [T. (^)]`' (14)

(here p, (A), rp2(2), ..., ggs(A) are distinct irreducible polynomials over F withhighest coefficient 1). Then by the theorem we have

R=11+12+...+ln, (15)

where lk is an invariant subspace with the minimal polynomial [pk(Affk(k=1,2,...,s).

Thus, the theorem reduces the study of the behaviour of a linear operatorin an arbitrary space to the study of the behaviour of this operator in aspace where the minimal polynomial is a power of an irreducible polynomialover F. We shall take advantage of this to prove the following importanttheorem :

THEOREM 2: In a vector space there always exists a vector whose minimalpolynomial coincides with the minimal polynomial of the whole space.

We consider first the special case where the minimal polynomial of thespace R is a power of an irreducible polynomial qq(A) :

V (A) _ [9) MY

Page 192: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 3. CONGRUENCE. FACTOR SPACE 181

In R we choose a basis e1, e2, ... , e,,. The minimal polynomial of e1

is a divisor of W(A) and is therefore representable in the form [p(A)]1d, where

l{< t (i=1,2,...,n).But the minimal polynomial of the space is the least common multiple of

the minimal polynomials of the basis vectors, so that ip(A) is the largest of the

powers [p(A)]" (i=1, 2, . . . , n). In other words, y, (A) coincides with theminimal polynomial of one of the basis vectors e1, e2, ... , e,,.

Turning now to the general case, we prove the following preliminary

lemma:LEMMA: If the minimal polynomials of the vectors a and e' are co-

prime, then the minimal polynomial of the sum vector e' + e" is equal to theproduct of the minimal polynomials of the constituent vectors. .

Proof. Let xl (A) and X2 (A) be the minimal polynomials of the vectorse' and e". By assumption, XI (A) and X2 (A) are co-prime. Let X0) be anarbitrary annihilating polynomial of the vector e = e' + e". Then

X$(A)X(A)e'=X2(A)X(A)e-X(A)xs(A)e..= o,

i.e., X2(A)X(A) is an annihilating polynomial of e'. Therefore X2(A)XP)is divisible by Xi (A), and since X, (A) and X2 (A) are co-prime, x (A) is divisibleby xl (A). It is proved similarly that X (A) is divisible by x2 0). But xl (A)and X2 (A) are co-prime. Therefore X (A) is divisible by the productX, (A) X2 (A). Thus, every annihilating polynomial of the vector e is divisibleby xl (A) Xi (A). Therefore xl (A) X2 (2) is the minimal polynomial of the vectore=e'+e".

We now return to Theorem 2. For the proof in the general case we usethe decomposition (15). Since the minimal polynomials of the subspaces11, 12, ... , 1, are powers of irreducible polynomials, our assertion is alreadyproved for these subspaces. Therefore there exist vectors e' a 11, e" a 12, ... ,e0a I, whose minimal polynomials are [q'1 (2)]c , ... , (2)]c , re-spectively. By the lemma, the minimal polynomial of the vectore = e.' + e" + + e(') is equal to the product

[T1 (A)]`' lips (A)IC, ... [p. (A)]`"

i.e., to the minimal polynomial of the space R.

§ 3. Congruence. Factor Space

L Suppose given a subspace ICR. We shall say that two vectors x, y of Rare congruent modulo 1 and shall write x = y (mod I) if and only if y - x e 1.It is easy to verify that the concept of congruence so introduced has thefollowing properties :

Page 193: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

182 VII. STRUCTURE OF LINEAR OPERATOR IN 9t-DIMENSIONAL SPACE

For all x, y, aeR1. x = x (mod I) (reflexivity of congruence).2. From x = y (mod 1) it follows that y = x (mod I) (symmetry of

congruence).3. From x - y (mod 1) and y = z (mod I) it follows that x = z (mod 1)

(transitivity of congruence).

The presence of these three properties enables us to make use of congru-ence to divide all the vectors of the space into classes, by assigning vectorsthat are pairwise congruent (mod I) to the same class (vectors of distinctclasses are incongruent (mod 1)). The class containing the vector x will bedenoted by t.' The subspace I is one of these classes, namely o. Note thatto every congruence x = y (mod 1) there corresponds the equalitye of theassociated classes : x =:Y.

It is elementary to prove that congruences may be added term by termand multiplied by a number of F :

1. From

it follows that

2. From

it follows that

x - x' and y - y' (mod l)

x+y=x'+y' (mod1).

xx' (mod l)

ax - ax' (mod 1) (a a F).

These properties of congruence show that the operations of addition andmultiplication by a number of F do not 'breakup' the classes. If we take twoclasses x and F and add elements x, x', ... of the first class to arbitrary ele-ments y, y', . . . of the second class, then all the sums so obtained belong to oneand the same class, which we call the sum of the classes x and y and denoteby x + y`. Similarly, if all the vectors x, x', . . . of the class x are multipliedby a number a e F, then the products belong to one class, which we denoteby ax.

Thus, in the manifold R of all classes x', y, ... two operations are intro-duced: `addition' and `multiplication by a number of F.' It is easy toverify that these operations have the properties set forth in the definitionof a vector space (Chapter III, § 1). Therefore R, as well as R, is a vector

6 Since each class contains an infinite set of vectors, there is, by this condition, aninfinite number of ways of designating the class.

6 That is, identity.

Page 194: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 3. CONGRUENCE. FACTOR SPACE 183

space over the field F. We shall say that R is a factor space of R. If n, m, nare the dimensions of the spaces R, 1, R, respectively, then n = n - m.

2. All the concepts introduced in this section can be illustrated very well by

the following example.Example. Let R be the set of all vectors

1 x y x+ y of a three-dimensional space and r the fieldof real numbers. For greater clarity, weshall represent vectors in the form of directedsegments beginning at a point 0. Let I be astraight line passing through 0 (more accu-rately : the set of vectors that lie along someline passing through 0; Fig. 4.).

The congruence x = x' (mod I) signifiesthat the vectors x and x' differ by a vector ofI, i.e., the segment containing the end-pointsof x and x' is parallel to I. Therefore theclass x is represented by the line passingthrough the end-point of x and parallel to 1(more accurately: by the `bundle' of vectorsstarting from 0 whose end-points lie on thatline). `Bundles' may be added and multi-plied by a real number (by adding and multi-plying the vectors that occur in the bundles).These `bundles' are also the elements of thefactor space R. In this example, n = 3,m=1, n=2.

We obtain another example by taking forFig. 4 1 a plane passing through 0. In this example,

n=3, m=2, n=1.Now let A be a linear operator in R. Let us assume that I is an invariant

subspace with respect to A. The reader will easily prove that from x = x'(mod 1) it follows that Ax - Ax' (mod 1), so that the operator A can beapplied to both sides of a congruence. In other words, if the operator A isapplied to all vectors x, x', ... of a class x, then the vectors Ax, Ax', ... alsobelong to one class, which we denote by A. The linear operator A carriesclasses into classes and is, thus, a linear operator in R.

We shall say that the vectors xl, xs, ..., xp are linearly dependentmodulo I if there exist numbers al, a2,... , ap in F, not all equal to zero,such that

aixi + a2x2 + .. + apxp ; o (mod 1). (16)

Page 195: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

184 VII. STRUCTURE OF LINEAR OPERATOR IN 9I-DIMENSIONAL SPACE

Note that not only the concept of linear dependence of vectors, but alsoall the concepts, statements, and reasonings, in the preceding sections of thischapter can be repeated word for word with the symbol `_' replaced through-out by the symbol `- (mod 1),' where I is some fixed subspace invariantwith respect to A.

Thus, we can introduce the concepts of an annihilating polynomial andof the minimal polynomial of a vector or a space (mod I). All these con-cepts will be called `relative,' in contrast to the `absolute' concepts that wereintroduced earlier (and that hold for the symbol '=').

The reader should observe that the relative minimal polynomial (of a vec-tor or a space) is a divisor of the absolute one. For example, let of (A) be therelative minimal polynomial of a vector x and o(A) the corresponding abso-lute minimal polynomial.

Theno(A)x=o,

and hence it follows that also

a(A)x-o (mod l).

Therefore o(A) is a relative annihilating polynomial of x and as suchis divisible by the relative minimal polynomial ol(A).

Side by side with the `absolute' statements of the preceding sections wehave `relative' statements. For example, we have the statement: `In everyspace there always exists a vector whose relative minimal polynomial coin-cides with the relative minimal polynomial of the whole space.'

The truth of all `relative' statements depends on the fact that by operat-ing with congruences modulo I we deal essentially with equalities-howevernot in the space R, but in the space R.

§ 4. Decomposition of a Space into Cyclic Invariant Subspaces

1. Let a (,t) = 2'+ al Ap-' + -i- ap_12 -I- ap be the minimal polynomial ofa vector e. Then the vectors

e, Ae, ..., A''le

are linearly independent, and

Ape = - ape - ap_1Ae - alAp-'e .

(17)

(18)

Page 196: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 4. DECOMPOSITION OF SPACE INTO CYCLIC INVARIANT SUBSPACES 185

The vectors (17) form a basis of a p-dimensional subspace 1. We shallcall this subspace cyclic in view of the special character of the basis (17)and of (18) .7 The operator A carries the first vector of (17) into the second,the second into the third, etc. The last basis vector is carried by A into alinear combination of the basis vectors in accordance with (18). Thus, Acarries every basis vector into a vector of I and hence an arbitrary vector of Iinto another vector of I. In other words, a cyclic subspace is always invariantwith respect to A.

Every vector x e I is representable in the form of a linear combination ofthe basis vectors (17), i.e., in the form

x-x(A)e, (19)

where x (A) is a polynomial in A of degree < p - 1 with coefficients in F.By forming all possible polynomials x (A) of degree < p -1 with coeffi-cients in F we obtain all the vectors of I, each once only, i.e., for only onepolynomial x (A) . In view of the basis (17) or the formula (19) we shallsay that the vector e generates the subspace.

Note that the minimal polynomial of the generating vector e is also theminimal polynomial of the whole subspace .

2. We are now ready to establish the fundamental proposition of the wholetheory, according to which the space R splits into cyclic subspaces.

Let V, (A) = v(A) = Am + a,1 -1 + .. + a,,, be the minimal polynomial ofthe space R. Then there exists a vector e in the space for which this poly-nomial is minimal (Theorem 2, p. 180). Let Il denote the cyclic subspacewith the basis

e, Ae, ..., Am_1e.(20)

If n = m, then R = I. Suppose that n > m and that the polynomial

jV2(.I)=1p+ f1Ap-1+... +Ap

is the minimal polynomial of R (mod 11). By the remark at the end of § 3,V2(A) is a divisor of tp, (A), i.e., there exists a polynomial x(A) such that

V, (A) = w2 (A) X (A) (21)

7It would be more accurate to call this subspace: cyclic with respect to the linearoperator A. But since the whole theory is built up with reference to a single operator A,the words 'with respect to the linear operator A' are omitted for the sake of brevity (seethe similar remark in footnote 2, p. 176).

Page 197: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

186 VII. STRUCTURE OF LINEAR OPERATOR IN n-l)IMENSIONAI, SPACE

Moreover, in R there exists a vector g* whose relative minimal polynomialis ip2(1). Then

V, (A) g* -o (mod 11),

i.e., there exists a polynomial x (1) of degree < m - 1 such that

tp2(A)g*=x(A)e.

(22)

(23)

We apply the operator x(A) to both sides of the equation. Then by (21)we obtain on the left Wl (A)g*, i.e. zero, because ypl (1) is the absolute minimalpolynomial of the space; therefore

x(A)x(A)e=o.

This equation shows that the product x (1) X (1) is an annihilating poly-nomial of the vector e and is therefore divisible by the minimal polynomialVJL (1) = x (1) Ws (1), so that x (1) is divisible by V2 (1) :

X (A) X, (1) V2 (1), (24)

where x1(1) is a polynomial. Using this decomposition of x (A), we mayrewrite (23) as follows:

Y's (A) [g* - xl (A) e] = o.

We now introduce the vector

g=g*-xl(A)e.Then (25) can be written as follows:

1VS(A)g=o.

(25)

(26)

(27)

The last equation shows that 1p2(A) is an absolute annihilating polynomialof the vector g and is therefore divisible by the absolute minimal polynomialof g. On the other hand, we have from (26) :

g=g* (modll). (28)

Hence V2 (A), being the relative minimal polynomial of g*, is the same forg as well. Comparing the last two statements, we deduce that W2 (1) is simul-taneously the relative and the absolute minimal polynomial of g.

From the fact that V2(1) is the absolute minimal polynomial of g itfollows that the subspace l2 with the basis

g, Ag, ..., A" 'g (29)is cyclic.

Page 198: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 4. DECOMPOSITION OF SPACE INTO CYCLIC INVARIANT SUBSPACES 187

From the fact that "(A) is the relative minimal polynomial of g (mod 11)

it follows that the vectors (29) are linearly independent (mod 11), i.e., nolinear combination with coefficients not all zero can be equal to a linearcombination of the vectors (20). Since the latter are themselves linearlyindependent, our last statement asserts the linear independence of the m + pvectors

e, Ae, ..., A'-'e; g, Ag, ..., A"g. (30)

The vectors (30) form a basis of the invariant subspace 11 + 12 of dimen-sion m + p.

If n = m + p, then R=11 + 12. If n> m + p, we consider R (mod11 + 12) and continue our process of separating cyclic subspaces. Since thewhole space R is of finite dimension n, this process must come to an endwith some subspace It, where t < n.

We have arrived at the following theorem :

THEOREM 3 (Second Theorem on the Decomposition of a Space intoInvariant Subspaces) : Relative to a given linear operator A the space canalways be split into cyclic subspaces 11, 12, ..., It with minimal polynomialsy'1(2),''2 (2), ... , tVt (2)

R=11+12+. ..+It (31)

such that ip1(A) coincides with the minimal polynomial W(A) of the wholespace and that each t{(,l) i s divisible by ip1_(A) (i= 2, 3, ... , t).3. We now mention some properties of cyclic spaces. Let R be a cyclicn-dimensional space and y' (1) =A-+... its minimal polynomial. Then itfollows from the definition of a cyclic space that m = n. Conversely, sup-pose that R is an arbitrary space and that it is known that m = n. Applyingthe proof of the decomposition theorem, we represent R in the form (31).But the dimension of the cyclic subspace 11 is m, because its minimal poly-nomial coincides with the minimal polynomial of the whole space. Sincem = n by assumption, we have R =11j i.e., R is a cyclic space.

Thus we have established the following criterion for cyclicity of a space:

THEOREM 4: A space is cyclic if and only if its dimension is equal to thedegree of its minimal polynomial.

Next, suppose that we have a decomposition of a cyclic space R into twoinvariant subspaces 11 and 12:

R=11+12. (32)

Page 199: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

188 VII. STRUCTURE OF LINEAR OPERATOR IN n-DIMENSIONAL SPACE

We denote the dimensions of R, 11, and 12 by n, nl, and n2, their minimalpolynomials by W(A), W1(A), and W2(A), and the degrees of these minimalpolynomials by m, m1i and m2, respectively. Then

m1Sn1, m2Sn2. (33)

We add these inequalities term by term :

m1 + m2 S n1 + n2 (34)

Since V,(A) is the least common multiple of W, (A) and yt(A), we have

mSm1+m2. (35)

Moreover, it follows from (32) that

n=n1+n2. (36)

(34), (35), and (36) give us a chain of relations

mSm1+m25n1+n2=n. (37)

But since the space R is cyclic, the extreme numbers of this chain, m andn, are equal. Therefore we have equality in the middle terms, i.e.,

m=m1+m2=n1+ n2.

From the fact that m = m1 + m2 we deduce that Vil (1) and V2(A) areco-prime.

Bearing (33) in mind, we find from ml + m2= n1 + n2 that

m1= n1 and m2 = n2. (38)

These equations mean that the subspaces 11 and 12 are cyclic.Thus we have arrived at the following proposition :

THEOREM 5: A cyclic space can only split into invariant subspaces that1. are also cyclic and 2. have co-prime minimal polynomials.

The same arguments (in the opposite order) show that Theorem 5 has aconverse :

THEOREM 6: If a space is split into invariant subspaces that 1. are cyclicand 2. have co-prime minimal polynomials, then the space itself is cyclic.

Suppose now that R is a cyclic space and that its minimal polynomial isa power of an irreducible polynomial over F : y, (1) = [p (,l)]°. In this case,the minimal polynomial of every invariant subspace of R must also be apower of this irreducible polynomial q(A). Therefore the minimal poly-nomials of any two invariant subspaces cannot be co-prime. But then, bywhat we have proved, R cannot split into invariant subspaces.

Page 200: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 4. DECOMPOSITION OF SPACE INTO CYCLIC INVARIANT SUBSPACES 189

Suppose, conversely, that some space R is known not to split into invari-ant subspaces. Then R is a cyclic space, for otherwise, by the second decom-position theorem, it could be split into cyclic subspaces ; moreover, the mini-mal polynomial of R must be a power of an irreducible polynomial, becauseotherwise R could be split into invariant subspaces, by the first decomposi-tion theorem.

Thus we have reached the following conclusion :

THEOREM 7: A space does not split into invariant subspaces if and onlyif 1. it is cyclic and 2. its minimal polynomial is a power of an irreduciblepolynomial over F.

We now return to the decomposition (31) and split the minimal poly-nomials W,(A),'V2(A), ..., tVt(2) of the cyclic subspaces I,, l2i ..., It intoirreducible factors over F :

Vi (1) = [9'i (A) ]`' [9'2 (A) IC, ... [9', (2)]`' ,

%V2(A)=[9,(1)]a'[9's(A)]a2 .-.[9's(A)J' ,

Y'e (ia) = [9'i (A)]l' [9's (A)]`' ... [9' (7)]t'(ek? dk> > lk> 0;8 k=1 2, .. ., s).

(39)

To It we apply the first decomposition theorem. Then we obtain

11 =1,'+ I,"+ ... + IP);where I,', 11", ... , 1,(') are cyclic subspaces with the minimal polynomials[91(1)]°'" [9's (a)]`', . , [9'. Similarly we decompose the spaces 12, ... ,It. In this way we obtain a decomposition of the whole space R into cyclicsubspaces with the minimal polynomials [Vk (2)]`k, [9'k (2)]dk, ..., [9'k (2)]rk(k =1, 2, ... , s) . (Here we neglect the powers whose exponents are zero.)From Theorem 7 it follows that these cyclic subspaces are indecomposable(into invariant subspaces). We have thus arrived at the following theorem :

THEOREM 8 (Third Theorem on the Decomposition of a Space into Invari-ant Subspaces) : A space can always be split into cyclic invariant subspaces

R=1'+1"+ ...+100. (40)

such that the minimal polynomial of each of these cyclic subspaces is a powerof an irreducible polynomial.

This theorem gives the decomposition of a space into indecomposableinvariant subspaces.

8 Some of the exponents dk,... , ik for k > 1 may be equal to zero.

Page 201: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

190 VII. STRUCTURE OF LINEAR OPERATOR IN n-DIMENSIONAL SPACE

Note. Theorem 8 (the third decomposition theorem) has been provedby applying the first two decomposition theorems. But it can also beobtained by other means, namely, as an immediate (and almost trivial)corollary of Theorem 7.

For if the space R splits at all, then it can always be split into indecom-posable invariant subspaces :

R=I'+F'+...+IW. (40)

By Theorem 7, each of the constituent subspaces is cyclic and has as itsminimal polynomial a power of an irreducible polynomial over F.

§ S. The Normal Form of a Matrix

1. Let 11 be an m-dimensional invariant subspace of R. In It we take anarbitrary basis e1, e2, ... , e,,, and complement it to a basis

el, es, ... , em, em+1, ..., e.

of R. Let us see what the matrix A of the operator A looks like inthis basis.We remind the reader that the k-th column of A consists of the coordinatesof the vector Aek (k = 1, 2, . . . , n). For k < m the vector Aek a 11 (by theinvariance of I1) and the last n - m coordinates of Aek are zero. ThereforeA has the following form

m n-mA

Al As={ 0 As u

(41)}m)n-In '

where Al and A2 are square matrices of orders m and n - m, respectively,and As is a rectangular matrix. The fact that the fourth `block' is zeroexpresses the invariance of the subspace It. The matrix Al gives theoperator A in 11 (with respect to the basis e1, e,, ... , em).

Let us assume now that e,,,.1i . . . , e, is the basis of some invariant sub-space 12i so that R =11 + 12 and a basis of the whole space is formed fromthe two parts that are the bases of the invariant subspaces Il and 12. Thenobviously the block A3 in (41) is also equal to zero and the matrix A has thequasi-diagonal form

Page 202: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 5. NORMAL FORM OF A MATRIX 191

A- II of A II-(Al, A,), (42)

where Al and A2 are, respectively, square matrices of orders m and n - mwhich give the operator in the subspaces 11 and 12 (with respect to the bases

e1, e2, .... e,,, and em+1 , . . . , e"). It is not difficult to see that, conversely,to a quasi-diagonal form of the matrix there always corresponds a decomposi-tion of the space into invariant subspaces (and the basis of the whole spaceis formed from the bases of these subspaces).

2. By the second decomposition theorem, we can split the whole space Rinto cyclic subspaces I1, 12, .. ., It :

(43)

In the sequence of minimal polynomials of these subspaces lpl(2),V,(1) each factor is a divisor of the proceeding one (from which it followsautomatically that the first polynomial is the minimal polynomial of thewhole space).

Let

P1 (A) _ Am + (Xgl)m-1 + ... + a.,IP! (1) =,l,P + PP,

(m ? P;>= ... 9 V) . (44)

Ytt (A) = A° + E1Ao-1 + ... + E, .

We denote by e, g, ... , l generating vectors of the subspaces 11, I2, . . . , itand we form a basis of the whole space from the following bases of the cyclicsubspaces :

e, Ae, ..., A'n-1 e ; g, Ag, ..., AP-1g; ... ;1, Al, ..., A9-11. (45)

Let us see what the matrix L, corresponding to A in this basis looks like.As we have explained at the beginning of this section, the matrix L, must

have quasi-diagonal form

1Ll 0... OLl =

O Ls ..(46)

O 0 ... Lt

The matrix L1 corresponds to the operator A in 11 with respect to the basise1= e, es = Ae, ... , em = A'"-le . By applying the rule for the formation

Page 203: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

192 VII. STRUCTURE OF LINEAR OPERATOR IN n-DIMENSIONAI, SPACE

of the matrix for a given operator in a given basis (Chapter III, p. 67), wefind

0 0...

L, _

Similarly

L2=

0 - am1 0 . . . 0 - am_10 1

0 - ,1 -act0 0 . . .

0 0... 0 -py1 0 . .. 0 - PP-10 1

(47)

(48)

0 -Pp,0 0 ... *1 -p1 il

Computing the characteristic polynomials of the matrices L1, L2, ... , Lt,we find :

IAE-L1j=v1(A), JAE-L2I= 2(A), ..., IAE-L:I=1Vt(A)

(for cyclic subspaces the characteristic polynomial of an operator A coin-cides with the minimal polynomial of the subspace relative to this operator).

The matrix L1 corresponds to the operator A in the `canonical' basis (45).If A is the matrix corresponding to A in an arbitrary basis, then A is similarto L1, i.e., there exists a non-singular matrix T such that

A= TL1T-1. (49)

Of the matrix L1 we shall say that it has the first natural normal form.This form is characterized by

1) The quasi-diagonal form ;2) The special structure of the diagonal blocks (47), (48), etc.3) The additional condition : the characteristic polynomial of each diago-

nal block is divisible by the characteristic polynomial of the followingblock.

If we start not from the second, but from the third decomposition theorem,then in exactly the same way we would obtain a matrix L11 corresponding tothe operator A in the appropriate basis-a matrix having the second naturalnormal form, which is characterized by

1) The quasi-diagonal form

LII = (VI), L(2), L("1)

Page 204: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. INVARIANT POLYNOMIALS. ELEMENTARY DIVISORS 193

2) The special structure of the diagonal blocks (47), (48), etc.;3) The additional condition : the characteristic polynomial of each block

is a power of an irreducible polynomial over F.

3. In the following section we shall show that in the class of similar matricescorresponding to one and the same operator there is one and only one matrixhaving the first normal form,9 and one and only one 10 having the secondnormal form. Moreover, we shall give an algorithm for the computationof the polynomials V, (d), y,2 (A), ... , t, (1) from the elements of the matrix A.Knowledge of these polynomials enables us to write out all the elements of thematrices LI and L11 similar to A and having the first and second naturalnormal forms, respectively.

§ 6. Invariant Polynomials. Elementary Divisors

1. Well denote by D,(2) the greatest common divisor of all the minors oforder p of the characteristic matrix A,, = AE - A (p = 1, 2, ... , n) .12 Sincein the sequence

D. (A), D"-,(,l), ..., D1(A).

each polynomial is divisible by the following, the formulas

i A = D,_1( , i2 (2) = Dn-2 (a)' ... , in (A) =Do ) (D0 (1) - 1) (50)

define n polynomials whose product is equal to the characteristic polynomial

d (1) = I AE - A I = D. (1) = it (2) i2 (2) ... i (A). (51)

We split the polynomials ip(1) (p = 1, 2, ... , n) into irreducible factorsover F :

(p=1,2, ...,n); (52)

where T, (1), 972 (2), ... are distinct irreducible polynomials over F.

9 This does not mean that there exists only one canonical basis of the form (45).There may be many canonical bases, but to all of them there corresponds one and thesame matrix L1.

10 To within the order of the diagonal blocks.11 In subsection 1. of the present section we repeat the basic concepts of Chapter VI,

§ 3 for the characteristic matrix that were there established for an arbitrary polynomialmatrix.

12 We always take the highest coefficient of the greatest common divisor as 1.

Page 205: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

194 VII. STRUCTURE OF LINEAR OPERATOR IN n-DIMENSIONAL SPACE

The polynomials iI (A), i2 (A), . . . , i (A) are called the invariant poly-nomials, and all the non-constant powers among [971(A)]Yv, [g72(A)]ap, ... arecalled the elementary divisors, of the characteristic matrix A,, = AE - A or,simply, of A.

The product of all the elementary divisors, like the product of all the_invariant polynomials, is equal to the characteristic polynomial A (A)

AE-A1.The name `invariant polynomial' is justified by the fact that two similar

matrices A and A,

A=T-1AT, (53)

always have identical invariant polynomials

ip (1) = iv (1) (p =1, 2, ... , n) . (54)

For it follows from (53) that

Ax = AE --A = T-1 (AR -A) T = T-'A1T. (55)

Hence (see Chapter I, § 2) we obtain a relation between the minors of thesimilar matrices A2 and A.,:

iLAx

ki k2 ... kp/

T-1 ii i2 ... ip) A210.1 oars ... ap) T !Yl F':. .

cap (al a2 ...XP

1U_fi !'a - - Np 1\k1 ks

(p=1, 2,...,n).

.Pp1

. kp

This equation shows that every common divisor of all the minors of orderp of AX is a common divisor of all the minors of order p of Ax, and vice versa(since A and A can interchange places). Hence it follows that Dp(A) =Dp(A)(p = 1, 2, ... , n) and that (54) holds.

Since all the matrices representing a given operator A in various basesare similar and therefore have the same invariant polynomials and the sameelementary divisors, we can speak of the invariant polynomials and theelementary divisors of an operator A.

2. We choose now for l the matrix LI having the first natural normal formand we compute the invariant polynomials of A starting from the form ofthe matrix it = 2E - A (in (57) this matrix is written out for the casem=5, p=4, q=4, r=3) :

Page 206: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. INVARIANT POLYNOMIALS. ELEMENTARY DIVISORS 195

1 0 0 0 a6 0 0 0 0 0 0 0 0 0 0 0

-1 A 0 0 0 0 0 0 0 0 0 0 0 0 0

0-1 A 0 0 0 0 0 0 0 0 0 0 0 0

0 0-1 A 0 0 0 0 0 0 0 0 0 0 0

0 0 0 -1 a1-}-A 0 0 0 0 0 0 0 0 0 0 0....................

0 0 0 0 0 1 0 0 Y4 0 0 0 0 0 0 0

0 0 0 0 0 -1 A 0 p, 0 0 0 0 0 0 0

0 0 0 0 0 0-1 A P, 0 0 0 0 0 0 0

0 0 0 0 0 0 0 -1 #1-}-2 0 0 0 0 0 0 0................................................

0 0 0 0 0 0 0 0 0 A 0 0 y. 0 0 0

0 0 0 0 0 0 0 0 0 -1 1 0 y, 0 0 0

0 0 0 0 0 0 0 0 0 0-1 A y, 0 0 0

0 0 0 0 0 0 0 0 0 0 0-1 y,-3. 0 0 0.............

. (57)

0 0 0 0 0 0 0 0 0 0 0 0 0 a 0 e,

0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 e1+4

Using Laplace's Theorem, we find

D*(A)=I2E-Al=1AE-LEI IAK-L2I...119-L81=V:L(A)w2(A)...ve(1). (58)

Now let us find D.-,(A). We consider the minor of the element a..This minor, apart from a factor ± 1, is equal to

IAB-LS I ... IAB-L$1=V2(A) ... (59)

We shall show that this minor of order n - 1 is a divisor of all the otherminors of order n - 1, so that

DA-1(1) = 2 (1) Ve (1) (60)

For this purpose we first take the minor of an element outside the diago-nal blocks and show that it vanishes. To obtain this minor we have tosuppress one row and one column in the matrix (57). The lines crossed outin this case intersect two distinct diagonal blocks, so that in each of theseblocks only one line is crossed out. Suppose, for example, that in the j-thdiagonal block one of the rows is crossed out. In the minor we take thatvertical strip which contains this diagonal block. In this strip, which has scolumns, all the rows except s -1 rows consist entirely of zeros (we havedenoted the order of A5 by s). Expanding the determinant of order n - 1by Laplace's Theorem with respect to the minors of order s in this strip, wesee that it is equal to zero.

Page 207: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

196 VII. STRUCTURE OF LINEAR OPERATOR IN n-DIMENSIONAL SPACE

Now we take the minor of an element inside one of the diagonal blocks.In this case the lines crossed out `mutilate' only one of the diagonal blocks,say the j-th, and the matrix of the minor is again quasi-diagonal. Thereforethe minor is equal to

vi (A) ... ,V,_1(1) ip;+1(A) ... w, (A) X (A) , (61)

where x(A) is the determinant of the `mutilated' j-th diagonal block. Sincev' (2) is divisible by V/'{+1(A) (i= 1, 2, ... , t -1). the product (61) is divis-ible by (59). Thus, equation (60) can be regarded as proved. By similararguments we obtain :

D _ 2 ) = 3 ( 2 ) ... V, (A)

DA-:+1 (A) = v ( 2 ) .

D,,-t(A)-. -- = D,(A)=1

From (58), (60), and (62) we find:

V'i (A) = D (2) = i1(A) , 1V2 (A) = D _2 (A)=-4W,

i14.1 (A) _ ... =in (A) =1 .

(62)

(63)

The formulas (63) show that the polynomials v1(2), 312(2), ... ,1Vt(2) coin-cide with the invariant polynomials, other than 1, of the operator A (or thecorresponding matrix A).

Let us give three equivalent formulations of the results obtained :

THEOREM 9 (More precise form of the Second Decomposition Theorem) :If A is a linear operator in R, then the space R can be decomposed into cyclicsubspaces

such that in the sequence of minimal polynomials V1(2), '112(2), ... , V, (A) Ofthe subspaces 11, I2, ... , It, each is divisible by the following. The poly-nomials y11 (A), V2(2), ... , v,(1) are uniquely determined: they coincide withthe invariant polynomials, other than 1, of the operator A.

THEOREM 9': For every linear operator A in R there exists a basis inwhich the matrix L1 that gives the operator is of the first natural normalform. This matrix is uniquely determined when the operator A is given:the characteristic polynomials of the diagonal blocks of L1 are the invariantpolynomials of A.

Page 208: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. INVARIANT POLYNOMIALS. ELEMENTARY DIVISORS 197

THEOREM 9": In every class of similar matrices (with elements in F) there

exists one and only one matrix Ll having the first natural normal form. Thecharacteristic polynomials of the diagonal blocks of L1 coincide with theinvariant polynomials (other than 1) of every matrix of that class.

On p. 194 we established that two similar matrices have the same invariantpolynomials. Now suppose, conversely, that two matrices A and B withelements in F are known to have the same invariant polynomials. Since thematrix L1 is uniquely determined when these polynomials are given, the twomatrices A and B are similar to one and the same matrix L3 and, therefore,to each other. We thus arrive at the following proposition :

THEOREM 10: Two matrices with elements in F are similar if and only ifthey have the same invariant polynomials.13

3. The characteristic polynomial 1(A) of the operator A coincides withD(A), and hence with the product of all invariant polynomials :

d (A) _ V1 (A) V2 (A) ... 1V: (A) (64)

But ip1(A) is the minimal polynomial of the whole space with respect toA; hence ip1(A) = 0 and by (64)

A (A) =0. (65)

Thus we have incidentally obtained the Hamilton-Cayley Theorem (seeChapter. IV, § 4) :

Every linear operator (every square matrix) satisfies its characteristicequation.

In § 4 by splitting the polynomials '1(A),'a(A), .... y,, (A) into irreduciblefactors over F :

V'1 (A) = IT1(,)]CI [q'2 (A)]`' ... IP. (A)le' ,

wz (') _ (A)]d' (c Z dx Z . ? lx,166l`,. k=1,2,.

'P, (A) = [q'1 (..)]l' [w2W]I' ... IT,(A)I's

we were led to the third decomposition theorem. To each power with non-zero exponent on the right-hand sides of (66) there corresponds an invariantsubspace in this decomposition.

By (63) all the powers, other than 1, among [p*(A)]ck, ... , [gq,r(A)]ix (k =1, 2, ... , s) are the elementary divisors of A (or A) in the field F (see p. 194).

Thus we arrive at the following more precise statement of the thirddecomposition theorem :

23 Or (what is the same) the same elementary divisors in the field F.

Page 209: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

198 VII. STRUCTURE OF LINEAR OPERATOR IN n-DIMENSIONAL SPACE

THEOREM 11: If A is a linear operator in a vector space R over a field F,

then R can be split into cyclic subspaces whose minimal polynomials are theelementary divisors of A in F.

LetR=1'+I"+...+I"' (67)

be such a decomposition. We denote by e', e", ... , e0") generating vectorsof the subspaces I', I", ... , I' sand from the `cyclic' bases of these subspaceswe form a basis of the whole space

e', Ae', ...; e", Ae", ...; e("), Ae("), .... (68)

It is easy to see that the matrix L11 corresponding to the operator A in thebasis (68) has quasi-diagonal form, like LI:

LII =(L1, La, ..., L"). (69)

The diagonal blocks LI, L2, ... , L. are of the same structure as the blocks(47) and (48) of LI. However, the characteristic polynomials of thesediagonal blocks are not the invariant polynomials, but the elementary divisorsof A. The matrix LII has the second natural normal form (see § 5).

We have arrived at another formulation of Theorem 11:THEOREM 11': For every linear operator A in R (over the field F) there

exists a basis in which the matrix LII giving the operator is of the secondnatural normal form; the characteristic polynomials of the diagonal blocksare the elementary divisors of A in F.

This theorem also admits a formulation in terms of matrices :THEOREM 11": A matrix A with elements in the field F is always similar

to a matrix LII having the second natural normal form in which the charac-teristic polynomials of the diagonal blocks are the elementary divisors of A.

Theorem 11 and the associated Theorems 11' and 11" have, in a certainsense, a converse.

LetR=11 +I"+...+11"'

be an arbitrary decomposition of a space R into indecomposable invariantsubspaces. Then by Theorem 7 the subspaces 1', 1", ..., It")are cyclic andtheir minimal polynomials are powers of irreducible polynomials over F.We may write these powers, after adding powers with zero exponent if nec-essary, in the form14

14 At least one of the numbers l,,12, . . . , 1. is positive.

Page 210: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. INVARIANT POLYNOMIALS. ELEMENTARY DIVISORS 199

[972(x)1"' ...,(ck dt z ... z ll U,l (70)

[ (2)]a', [9'2 ..,z

... . t k=1,2,. .,a J

[q'1(2)}", 19'2 (2)]1', , IT- (2)11'

We denote the sum of the subspaces whose minimal polynomials are in

the first row by 11. Similarly, we introduce 12i ... , It (t is the number ofrows in (70)) . By Theorem 6, the subspaces 11, 12, ... , It are cyclic and theirminimal polynomials v1(,l), V'9(2), ... , V, (A) are determined by the formulas(66). Here in the sequence V, (A), +p2(A), . . , y,,(2) each polynomial is divis-ible by the following. But then Theorem 9 is immediately applicable to the

decompositionR=11+12+...+It.

By this theoremVP (1) = ip (A) (p =1, 21 ... , n),

and therefore, by (66), all the powers (70) with non-zero exponent are theelementary divisors of A in the field F. Thus we have the following theorem :

THEOREM 12: If the vector space R (over the field F) is split in any wayinto decomposable invariant subspaces (with respect to an operator A), thenthe minimal polynomials of these subspaces are all the elementary divisorsof A in F.

There is an equivalent formulation in terms of matrices :THEOREM 12': In each class of similar matrices (with elements in F)

there exists only one matrix (to within the order of the diagonal blocks)having the second normal form L11; the characteristic polynomials of itsdiagonal blocks are the elementary divisors of every matrix of the given class.

Suppose that the space R is split into two invariant subspaces (withrespect to an operator A)

R=11 +12.When we split 11 and 12 into indecomposable subspaces, we obtain at the sametime a decomposition of the whole space R into indecomposable subspaces.Hence, bearing Theorem 12 in mind, we obtain :

THEOREM 13: If the space R is split into invariant subspaces with respectto an operator A, then the elementary divisors of A in each of these invariantsubspaces, taken in their totality, form a complete system of elementarydivisors of A in R.

This theorem has the following matrix form :

Page 211: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

200 VII. STRUCTURE OF LINEAR OPERATOR IN 9t-DIMENSIONAL SPACE

THEOREM 13': A complete system of elementary divisors in r of a quasi-diagonal matrix is obtained as the union of the elementary divisors of thediagonal blocks.

Theorem 13' is often used for the actual process of finding the elementarydivisors of a matrix.

§ 7. The Jordan Normal Form of a Matrix

1. Suppose that all the roots of the characteristic polynomial 4(A) of anoperator A belong to the field F. This will hold true, in particular, if F is thefield of all complex numbers.

In this case, the decomposition of the invariant polynomials into ele-mentary divisors in F will look as follows :

i1 (A)=(A-AI)`' (A-A2)`' ... (A - ))`a,

i2(A)=(A-AI)d, (A-A2)d' . . . (A-A3)sa,..................ii (A) = (A - A1)" (A - A2)`' ... (A - A,)t'.

(q4 ...?1kz0'1ck>0; k=1, 2, ..., s

(71)

Since the product of all the invariant polynomials is equal to the character-istic polynomial A (A) , Al, A2, . .. , A. in (71) are all the distinct roots of A (A).

We take an arbitrary elementary divisor

(A -- A0)¢; (72)

here Ao is one of the numbers AI, A2, .... AR and p is one of the (non-zero)exponents ck, dk, ... , lk (k = 1, 2, ... , s).

To this elementary divisor there corresponds in (67) a definite cyclicsubspace , generated by a vector which we denote by e. For this vector(A - A(,) P' is the minimal polynomial.

We consider the vectors

e1=(A-AOE)P-Ie, e2=(A-A,E)r-2e, ..., ep=e. (73)

The vectors el, e2, ... , ep are linearly independent, since otherwise therewould be an annihilating polynomial for e of degree less than p, which isimpossible. Now we note that

or

(A-AOE)el=o, (A-AOE)e2=e1, ..., (A-AoE)ep-e.-I (74)

Ae1= A0e1, Ae2 = Aoe2 + el , ..., AeP = A0ep + ep_1. (75)

Page 212: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 7. JORDAN NORMAL FORM 201

With the help of (75) we can easily write down the matrix correspondingto A in I for the basis (73). This matrix looks as follows :

A0 1 0 ... 00 A0 1 ... 0

= A0E(P) + H(P) , (76)

1

0 0 0 . . . A0

where E(P) is the unit matrix of order p and H(P) the matrix of order p whichhas l's along the first superdiagonal and 0's everywhere else.

Linearly independent vectors e1, e2, ... , eP for which (75) holds form aso-called Jordan chain of vectors in . From Jordan chains connected witheach subspace I',1", ... , I(") we form a Jordan basis of R. If we now denotethe minimal polynomials of these subspaces, i.e., the elementary divisors ofA, by

(A - Al)P' , (A - 22)P', ... , (A - Au)r" (77)

(the numbers A1i A2, . . . , Au need not all be distinct), then the matrix J corre-sponding to A in a Jordan basis has the following quasi-diagonal form :

J = { A1E(P,) + H(P,), A2E(P') + H(PQ) , ... , AuE(Pu) + H(Pu) ). (78)

We shall say of the matrix J that it is of Jordan normal form or simplyJordan form. The matrix J can be written down at once when the elemen-tary divisors of A in the field F containing all the characteristic roots of theequation 1(A) = 0 are known.

Every matrix A is similar to a matrix J of Jordan normal form, i.e., foran arbitrary matrix A there always exists a non-singular matrix T (I T 0)such that

A = TJT-1.

If all the elementary divisors of A are of the first degree (and in thatcase only), the Jordan form is a diagonal matrix and we have :

A=T (Al, A2, ..., Au)T-'.

Thus: A linear operator A has simple structure (see Chapter III, § 8)if and only if all the elementary divisors of A are linear.

Let us number the vectors e1, e2, ... , e0 defined by (70) in the reverseorder:g1=ep= e, g2= eP-1=(A-A0E) e, ... , gP=el =(A-A0E)P-1 e. (79)

Page 213: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

202 VII. STRUCTURE OF LINEAR OPERATOR IN n-DIMENSIONAL SPACE

Then

(A-A0E)g1=g2, (A-4E)g2=g8, ..., (A-AoE)gP=o;hence

Ag1= 20g1 + g2, Age = 20g2 + gs, ..., Ag, = Aog, .

The vectors (79) form a basis in the cyclic invariant subspace I thatcorresponds in (67) to the elementary divisor (A - 2) P.

In this basis, as is easy to see, to the operator A there corresponds thematrix

Ao 0 0 ... 01 A0 0 ... 00 1 A0

0 0 ... 1 Aa

= AOE(P) + F(P).

We shall say of the vectors (79) that they form a lower Jordan chain ofvectors. If we take a lower Jordan chain of vectors in each subspace I', I",.... I(")of (67), we can form from these chains a lower Jordan basis in whichto the operator A there corresponds the quasi-diagonal matrix

Jl = { AIE(P,) + F(Pi), 4E(PA) + FcP.), ... , AUE(P') + F(ft)). (80)

We shall say of the matrix J1 that it is of lower Jordan form. In contrastto (80), we shall sometimes call (78) an upper Jordan matrix.

Thus : Every matrix A is similar to an upper and to a lower Jordanmatrix.

§ 8. Krylov's Method of Transforming the Secular Equation

1. When a matrix A = 11 aik 11 71 is given, then its characteristic (secular)equation can be written in the form

all _A a12 ... al,a21 a, --- A ... a2s

al ap2 ... a. -1On the left-hand side of this equation is the characteristic polynomial

J (A) of degree n. For the direct computation of the coefficients of thispolynomial it is necessary to expand the characteristic determinant

Page 214: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ S. KRYLOV'S METHOD OF TRANSFORMING SECULAR EQUATION 203

A - AE I; and for large n this involves very cumbersome computationalwork, because A occurs in the diagonal elements of the determinant.15

In 1937, A. N. Krylov [251] proposed a transformation of the character-istic determinant as a result of which A occurs only in the elements of onecolumn (or row).

Krylov's transformation simplifies the computation of the coefficients

of the characteristic equation considerably."In this section we shall give an algebraic method of transforming the

characteristic equation which differs somewhat from Krylov's own method.17We consider an n-dimensional vector space R with basis e1, e2, ... , e and

the linear operator A in R determined by a given matrix A = II atk II1 in thisbasis. We take an arbitrary vector x ,6 o in R and form the sequence ofvectors

x, Ax, A2x, ... . (82)

Suppose that the first p vectors x, Ax, ... , AP-1x of this sequence arelinearly independent and that the (p + 1)-st vector APx is a linear combina-tion of these p vectors :

APx=-aPx-a,-i Ax- -a1AP-ix (83)

or

p (A)x=o, (84)

where

q'(1)=AP +a1AP-1+...+aP, (85)

All the further vectors in (82) can also be expressed linearly by the firstp vectors of the sequence.'' Thus, in (82) there are p linearly independent

15 We recall that the coefficient of Xk in 4(X) is equal (apart from the sign) to thesum of all the principal minors of order n - k in A (k = 1, 2, ... , n). Thus, even forn = 6, the direct determination of the coefficient of X in 4(X) would require the computa-tion of six determinants of order 5; that of X2 would require fifteen determinants oforder 4; etc.

16 The algebraic analysis of Krylov's method of transforming the secular equationis contained in a number of papers [268], [269], [2111, [168), and [149).

17 Krylov arrived at his method of transformation by starting from a system of nlinear differential equations with constant coefficients. Krylov's approach in algebraicform can be found, for example, in [2681 and [168] and in § 21 of the book [25).

is When we apply the operator A to both sides of (83) we express AP+lx linearly interms of Ax, ... , AP-1x, APx . But Apx, by (83), is expressed linearly in terms ofx, Ax,.. ., AP-1x. Hence we obtain a similar expression for AP+'x. By applying theoperator A to the expression thus obtained for AP+lx, we express AP+*x in terms ofx, Ax, ... , AP+ix, etc.

Page 215: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

204 VII. STRUCTURE OF LINEAR OPERATOR IN n-DIMENSIONAL, SPACE

vectors and this maximal number of linearly independent vectors in (82)is always realized by the first p vectors.

The polynomial q (A) is the minimal (annihilating) polynomial of thevector x with respect to the operator A (see § 1). The method of Krylovconsists in an, effective determination of the minimal polynomial q:(1) of x.

We consider separately two cases: the regular ease, where p = n ; andthe singular case, where p < n.

The polynomial q°(l) is a divisor of the minimal polynomial VY(2) of thewhole space R,19 and y>(2) in turn is a divisor of the characteristic poly-nomial A(d). Therefore q, (A) is always a divisor of A(1).

In the regular case. q-(A) and 4(1) are of the same degree and, sincetheir highest coefficients are equal, they coincide. Thus, in the regular case

4 (A)= W=97 (A),and therefore in the regular case Krylov's method is a method of computingthe coefficients of the characteristic polynomial A(A).

In the singular case, as we shall see later, Krylov's method does not enableus to determine J(A), and in this case it only determines the divisor q(A)of A(1).

In explaining Krylov's transformation, we shall denote the coordinatesof x in the given basis e1, e2i .... e by a, b, ... , 1, and the coordinates of thevector Akx by ak, bk,..., lk (k=1,2, ...,n).2. Regular case : p =r. In this case, the vectors x, Ax, .... A"-lx are lin-early independent and the equations (83), (84), and (85) assume the form

A"x=-a"x-an_l Ax- -a1A"-lx (86)or

4(A)x=o, (87)

whereA(,1)=An (88)

The condition of linear independence of the vectors x, Ax.... , A"-lx maybe written analytically as follows (see Chapter III, § 1) :

M=a b ... I

a1 b1 ... 4 0. (89)

an-1 11-1

We consider the matrix formed from the coordinate vectors x, Ax, ... ,A"x:

19 y, (X) is the minimal polynomial of A.

Page 216: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 8. KRYLOV'S METHOD OF TRANSFORMING SECULAR EQUATION 205

a b ... I

al bi ... 11

(90)

an-1 b,i_1 ... ln_1

an bn In

In the regular case the rank of this matrix is n. The first n rows of thematrix are linearly independent, and the last, (n + 1)-st, row is a linearcombination of the preceding n.

We obtain the dependence between the rows of (90) when we replacethe vector equation (86) by the equivalent system of n scalar equations

- ana - an_lal - 01lan_1 = an

- anb - an_1b1 -- . - alb._1- bn (91)

- and - an_lli - - ail.-1 = In .From this system of n linear equations we may determine the unknown

coefficients al, a2, ... , an uniquely,20 and substitute their values in (88).This elimination of al, a2, ... , an from (88) and (91) can be performedsymmetrically. For this purpose we rewrite (88) and (91) as follows:

aan+alan_i+...+an_,a1+ana. =0ban + bian_i + ... + bn-1ai + bnao = 0

.......................... (010 =1).lan + llan-1 + ... + ln_1 ai + lnao = 0lan+A0tn_1 1+[An-d(2)]ao=0

Since this system of n + 1 equations in the n + 1 unknown a0, a2i ... , an hasa non-zero solution (a0 =1) , its determinant must vanish :

a al ... an-1 an

b bi ... bn-1 bn

l li ... ln_I In

An-1 An (A)

Hence we determine d(A) after a preliminary transposition of the determi-nant (92) with respect to the main diagonal:

20 By (89), the determinant of this system is different from zero .

Page 217: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

206 VII. STRUCTURE OF LINEAR OPERATOR IN n-DIMENSIONAL SPACE

a b ... I 1

ai b1 ... 11 A

MA(A)_ (93)

a b ... l A^

where the constant factor M is determined by (89) and differs from zero.The identity (93) represents Krylov's transformation. In Krylov's

determinant on the right-hand side of the identity, A occurs only in theelements of the last column ; the remaining elements of the determinant donot depend on A.

Note. In the regular case, the whole space R is cyclic (with respect to A).If we choose the vectors x, Ax, ..., A'-ix as a basis, then in this basis theoperator A corresponds to a matrix A having the natural normal form

0 0...0 -a1 0 ... 0 - an-1

A= (94)

0 ... 1 -aiThe transition from the original basis e1, e3, ... , e to the basis x, Ax,, A1x is accomplished by means of the non-singular transforming matrix

T=

and then

a a, ... a,.-1b b....b_,

I

(95)

li ... 1, _

A = TAT-1. (96)

3. Singular case : p < n. In this case, the vectors x, Ax, ... , A"-ix arelinearly dependent, so that

M=a b ... I

ai bi ... li

a,.-1 b,...1 ... l,ri

= 0.

Page 218: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 8. KRYLOV'S METHOD OF TRANSFORMING SECULAR EQUATION 207

Now (93) had been deduced under the assumption M 0. But bothsides of this equation are rational integral functions of A and of the para-meters a, b, ... , 1.2' Therefore it follows by a `continuity' argument that(93) also holds for M = 0. But then, when Krylov's determinant is ex-panded, all the coefficients turn out to be zero. Thus in the singular case(p < n) the formula (93) goes over into the trivial identity 0 = 0.

Let us consider the matrix formed from the coordinates of the vectorsx, Ax, ... , APx

(97)

This matrix is of rank p and the first p rows are linearly independent, butthe last, (p + 1) -st, row is a linear combination of the first p rows with thecoefficients -aP,-cep , , ... , -a1 (see (83) ). From the n coordinates a, b,... , l we can choose p coordinates c, f, ... , h such that the determinantformed from the coordinates of the vectors x, Ax, ... , AP-1x is differentfrom zero :

M* =

C f .., hc1 f1 ... hl

(98)

CP-1 fP-1 ... &P-1

Furthermore, it follows from (83) that:

- apc - aP-1c1- ... - a1Cp-1 = cPOl f - aP-1f1- ... - 214-2 = t

P

- 01p4 - ap-1h, - - - - - a1h,-1=hP.

(99)

From this system of equations the coefficients a,' a2, ... , aP of the poly-nomial p(A) (the minimal polynomial of x) are uniquely determined. Inexact analogy with the regular case (however, with the value n replaced byp and the letters a, b, ... , 1 by c) f, ... , h), we may eliminate a1, a2, ... , aPfrom (85) and (99) and obtain the following formula for T(2) :

21 at=a(')a+a(t)b+...+0)1, b =a(4)a + (h b (i11 12 In 21 a28 + ... } an l, etc. (i -

1,2,...,n),where a0) (j,k=1,2,...,n) are the elements ofAi (i`1,2,...,n).

Page 219: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

208 VII. STRUCTURE OF LINEAR OPERATOR IN n-DIMENSIONAL SPACE

C f ... h 1

C1 /1 h1 A

CP_1 t p-1 ... hP_1 AP-1PCP fP ... hp

(100)

4. Let us now clarify the problem : for what matrices A = II a{k II, and forwhat choice of the original vector x or, what is the same, of the initial para-meters a, b, . . . , 1 the regular case holds.

We have seen that in the regular case

A(A)°v (A)

The fact that the characteristic polynomial A(A) coincides with theminimal polynomial p(A) means that in the matrix A = II ask Ili there areno two elementary divisors with one and the same characteristic value, i.e.,all the elementary divisors are co-prime in pairs. In the case where A is amatrix of simple structure, this requirement is equivalent to the conditionthat the characteristic equation of A. have no multiple roots.

The fact that the polynomials yp(A) and T(A) coincide means that for xwe have chosen a vector that generates (by means of A) the whole space R.Such a vector always exists, by Theorem 2 of § 2.

But if the condition d (A) = ,p(A) is not satisfied, then however we choosethe vector x o, we do not obtain A (A), since the polynomial T(A) obtainedby Krylov's method is a divisor of y) (A) which in this case does not coincidewith A (A) but is only a factor of it. By varying the vector x we may obtainfor T, (A) every divisor of y)(A).22

The results we have reached can be stated in the form of the followingtheorem :

THEOREM 14: Krylov's transformation gives an expression for the char-acteristic polynomial A(A) of the matrix A= II ack Ili in the form of thedeterminant (93) if and only if two conditions are satisfied :

1. The elementary divisors of A are co-prime in pairs.2. The initial parameters b, ... , l are the coordinates of a vector x

that generates the whole n-dimensional space (by means of the operator Acorresponding to the matrix A).2'

22 See, for example, (168), p. 48.23 In analytical form, this condition means that the columns x, Ax, ... , As-1x. are

linearly independent, where x = (a, b, . . . , 1).

Page 220: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 8. KRYLOV'S METHOD OF TRANSFORMING SECULAR EQUATION 209

In general, the Krylov transformation leads to some divisor T(A) of thecharacteristic polynomial d (1). This divisor 92(1) is the minimal polynomialof the vector x with the coordinates a, b, ... , l (where a, b, ... , 1 are theinitial parameters in the Krylov transformation).

5. Let us show how to find the coordinates of a characteristic vector y foran arbitrary characteristic value Ao which is a root of the polynomial T().)obtained by Krylov's method.24

We shall seek a vector y r o in the form

y=E1x+

Substituting this expression for y in the vector equation

Ay-20y

(101)

and using (83), we obtain

41Ax + E2A2x + - - - + AP-1x + p (- apx - ap_1Ax - .. -.aAP-lx)$Ax+...+. PAP-'x). (102)

Hence, among other things, it follows that Sp ; 0, because the equation5p = 0 would yield by (102) a linear dependence among the vectors x, Ax,.... AP-1x. In what follows we set Sp = 1. Then we obtain from (102) :

Sp =1 , p_1 al , 6p_2 = M P-1 + a2, ... , $1 = 20E2 + aP-1 , (103)0 = ap.

The first of these equations determine for us in succession the values.... L1 (the coordinates of y in the `new' basis x, Ax,..., AP-ix) ; the lastequation is a consequence of the preceding ones and of the relationAa+a1Ao 1+...+ap=0.

The coordinates a', b', ... , 1' of the vector y in the original basis may befound from the following formulas, which follow from (101) :

a'= via + 2ai + ... + $pap-1

V= alb + 2b1 + ... + Pbp-1

11=$11+ $2'1 + ... + bplp_i .

(104)

Example 1.We recommend to the reader the following scheme of computations.

24 The following arguments hold both in the regular case p = n and the singularcase p < n.

Page 221: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

210 VII. STRUCTURE OF LINEAR OPERATOR IN 12-DIMENSIONAL SPACE

Under the given matrix A we write the row of the coordinates of x : a, b,, 1. These numbers are given arbitrarily (with only one condition : at

least one is different from zero). Under the row a, b, . . ., l we write therow a1, b1, ... , l1, i.e., the coordinates of the vector Ax. The numbers a,, b1,... , l1 are obtained by multiplying the row a, b, ... , 1 successively into therows of the given matrix A. For example, a1= ala + a12b + ... +b, = a2,a + a22b + ... + etc. Under the row at, b1, ..., 11 we write therow a2, b2, ... , 12, etc. Each of the rows, beginning with the second, is deter-mined by multiplying the preceding row successively into the rows of thegiven matrix.

Above the given matrix we write the sum row as a check.

8 3 -10 -33 -1 - 4 2A= 2 3 - 2 -4

2 -1 - 3 21 2 - 1 -3............................................................................

x=e1 +e2 1 1 0 0 - 1 IAx 2 5 1 3- 1:-1A'x 3 5 2 2 I -1A'x 0 9- 1 5 1

A'x 5 9 4 4j 0 8 0 4

Y, 0 2 0 1

4 0 - 4 0X { 1 0 1 0

The given case is regular, because

hf =

1 1 0 02 b 1 33 b 2 20 9 -1 5

= -16 0.

Krylov's determinant has the form

-164(1)=

1 1 0 0 1

2 5 1 3 1

3 5 2 2 is0 9 -1 5 .l

5 9 4 4 1'

Expanding this determinant and cancelling - 16 we find :

A (1) =1' - 2a° + 1= (1-1)' (1 + 1)1.

Page 222: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 8. KRYLOV'S METHOD OF TRANSFORMING SECULAR EQUATION 211

We denote by

y = ,x + &2Ax + 3A2x + fgA3x

a characteristic vector of A corresponding to the characteristic value to =1.We find the numbers $1, E2, 3, by the formulas (103) :

,=1, a=1.10+0=1, z1.,,-2=-1, C,=-1.20+0=-1.The control equation -1.10 + 1= 0 is, of course, satisfied.

We place the numbers 4,, ,j S3, $, in a vertical column parallel to thecolumns of x, Ax, Azx, A3x. Multiplying the column ,, $,j S,, 4, into thecolumns a,, a2, a3i a,, we obtain the first coordinate a of the vector y in theoriginal basis e,, e2, e3, e.; similarly we obtain b', c', d'. As coordinates of ywe find (after cancelling by 4) : 0, 2, 0, 1. Similarly, we determine thecoordinates 1, 0, 1, 0 of a characteristic vector z for the characteristic value,10=-1.

Furthermore, by (94) and (95),

A = TAT-1where

0 0 0 -1 1 2 3 0

d= 1 0 0 0 T= 1 5 5 90 1 0 2 0 1 2 -10 0 1 0 0 3 2 5

Example 2. We consider the same matrix A, but as initial parameterswe take the numbers a =1, b = 0, c = 0, d = 0.

8 3 -10 -33 -1 - 4 22 3 - 2 -4A=2 -1 - 3 21 2 - 1 -3

X= OIL 1 0 0 0Ax 3 2 2 1

A2x 1 4 0 2A$x 3 6 2 3

But in this case

M=1 0 0 03 2 2 1

1 4 0 23 6 2 3

=0

and p = 3. We have a singular case to deal with.

Page 223: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

212 VII. STRUCTURE OF LINEAR OPERATOR IN n-DIMENSIONAL SPACE

Taking the first three coordinates of the vectors x, Ax, A2x, A3x, we writethe Krylov determinant in the form

1 0 0 1

3 2 2 A

1 4 0 12

3 6 2 As

Expanding this determinant and cancelling - 8, we obtain :

P (A)= AS -A2- A+ 1 =(I -1)2 (A + 1).

Hence we find three characteristic values : A, = 1, A2 = 1, A3 = -1. Thefourth characteristic value can be obtained from the condition that the sumof all the characteristic values must be equal to the trace of the matrix. Buttr A = 0. Hence A, = -1.

These examples show that in applying Krylov's method, when we writedown successively the rows of the matrix

a b ... lal bi ... 11

a2 b2 . . . Z$ (105)

it is necessary to watch the rank of the matrix obtained so that we stop afterthe first row (the (p + 1)-st from above) that is a linear combination of thepreceding ones. The determination of the rank is connected with the com-putation of certain determinants. Moreover, after obtaining Krylov's de-terminant in the form (93) or (100), in order to expand it with respect tothe elements of the last column we have to compute a certain number ofdeterminants of order p - 1 (in the regular case, of order n - 1).

Instead of expanding Krylov's determinant we can determine the coeffi-cients a,, a2, ... directly from the system of equations (91) (or (99)) byapplying any efficient method of solution to the system-for example, theelimination method. This method can be applied immediately to the matrix

a b ... l 1

ai b1 ... A

asb2 ...l,Aa (108)

by using it in parallel with the computation of the corresponding rows byKrylov's method. We shall then discover at once a row of the matrix (105)

Page 224: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 8. KRYLOV'S METHOD OF TRANSFORMING SECULAR EQUATION 213

that depends on the preceding ones, without computing any determinant.Let us explain this in some detail. In the first row of (106) we take an

arbitrary element c 0 and we use it to make the element cl under it into

zero, by subtracting from the second row the first row multiplied by cl/c.Next we take an element f 1* 0 in the second row and by means of c andf 1* we make the elements c2 and f2 into zero, etc.2S As a result of such atransformation, the element in the last column of (106) is replaced by apolynomial of degree k, gk (A) = Ak + (k=0,1,2 .... )

Since under our transformation the rank of the matrix formed from thefirst k rows for any k and the first n columns of (106) does not change, the(p + 1)-st row of the matrix must, after the transformation, have the form

0, 0,...,0,gp(1).

Our transformation does not change the value of the Krylov determinant

c f ... h 1

hi A

CP_1 f,1 ... hp-1 AP-1

CP fP ...P

AP

Therefore

M*,r W.

M*'T (A) = cfl... gq(2), (107)

i.e.,26 g,(A) is the required polynomial 97(A) : gp(1) -9)(A).We recommend the following simplification. After obtaining the k-th

transformed row of (106)

ak1, bk1, ... , 'k-1) 9k-1(") , (108)

one should obtain the following (k + 1)-st row by multiplying at_,, bk_1,...,lk_1 (and not the original ak_1, bk_1, ... , lx-1 ) into the rows of the givenmatrix.27 Then we find the (k + 1)-st row in the form

at , bk , .... lk , Agk-1 (1)

and after subtracting the preceding rows, we obtain :

25 The elements c, f,`, ... must not belong to the last column containing the powers of X.28 We recall that the highest coefficients of p(X) and g,(X) are 1.27 The simplification consists in the fact that in the row of (108) to be transformed

k - 1 elements are equal to zero. Therefore it is simple to multiply such a row into therows of A.

Page 225: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

214 VII. STRUCTURE OF LINEAR OPERATOR IN n-DIMENSIONAL SPACE

i' Aa. 9k

The slight modification of Krylov's method that we have recommended(its combination with the elimination method) enables us to find at oncethe polynomial cp(2) that we are interested in (in the regular case, d (A) )without computing any determinants or solving any auxiliary system ofequations.28

Example.

A=

4 4 1 5 0

1 1 -1 1 0

1 2 -1 0 1

-1 2 3 -1 0

1 -2 1 2 -12 1 -1 3 0

0 0 0 0 1 1

0 1 0 -1 0:10 2 3 -4 -2 22 [2-41]0- 2 3 0 0 2'-41+2

- 5 - 7 5 7 - 5 18 - 41' + 21 [5 + 71]

- 5 0 5 0 0 18-41'+91+5-10 -10 20 0 -15 14-418+912+51 [15-5(19-41+2)

-2(18-41'+92+5)]0 0- 5 0 0 1'-618+ 1228+71-55 5 -15 -5 5 26-624+1223+718-52 [-5-52+(28

-428-X92+5)-2(14-623+121'+72-5)]0 0 0 0 0 28 - 82' + 2518 - 211' - 152 + 10

A (1)

23 A.,..4- o_..... U_ _e ,

ak a bk, - lk, A9k-1 (A)

and after subtracting the preceding rows, we obtain :

25 The elements c, fl*.... must not belong to the last column containing the powers of X.26 We recall that the highest coefficients of T(X) and g,(X) are 1.27 The simplification consists in the fact that in the row of (108) to be transformed

k - 1 elements are equal to zero. Therefore it is simple to multiply such a row into therows of A.

Page 226: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

CHAPTER VIII

MATRIX EQUATIONS

In this chapter we consider certain types of matrix equations that occur invarious problems in the theory of matrices and its applications.

§ 1. The Equation AX = XB

1. Suppose that the equation

AX = XB (1)

is given, where A and B are square matrices (in general of different orders)

A= Ilaf lli, B=11bk Illand where X is an unknown rectangular matrix of dimension m X n :

X=1lxtxll (9=1,2, ...,m; k=1,2, ...,n).We write down the elementary divisors of A and B (in the field of

complex numbers) :

(A) (pi + p2 + ... + pu = m)(B) 1292)°i , ... (q, + q2 + . + q,- n) .

In accordance with these elementary divisors we reduce A and B toJordan normal form

A=UAU-1, B=VBV-', (2)

where U and V are square non-singular matrices of orders m and n, respec-tively, and A and B are the Jordan matrices :

A = I' 21E(P,) + H(p),

{ ju E(ql) + H(Q.),

12E(r,) + H(r,), ... , 2,,E(rx) + H(pk) }

Iu2E(") + H(4:) , ... , 1 vE(9°) + H(9v) } (3 )

215

Page 227: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

216 VIII. MATRIX EQUATIONS

Replacing A and B in (1) by their expressions given in (2), we obtain:

UAU-1X=XVBV-1.

We multiply both sides of this equation on the left by U-I and on the rightby V:

AU-1XV = U-1XVB. (4)

When we introduce in place of X a new unknown matrix X (of the samedimension m X n) 1X=U-gV, (5)

we can write equation (4) as follows:

A% =±h. (6)

We have thus replaced the matrix equation (1) by the equation (6), ofthe same form, in which the given matrices have Jordan normal form.

We partition k into blocks corresponding to the quasi-diagonal form ofthe matrices A and B :

X (gap) (a =1, 2, ..., u; =1, 2, ..., v)

(here Xas is a rectangular matrix of dimension pa X qp (a =1, 2, ... , u ;6=1,2,...,v)).

Using the rule for multiplying a partitioned matrix by a quasi-diagonalone (see p. 42), we carry out the multiplication of the matrices on the left-hand and right-hand sides of (6). Then this equation breaks up into uvmatrix equations

pakPa)+ H°Pal gad = gap [ gyp" (4P) + H(4S)]

(a=1, 2, ..., u; #=1, 2, ..., v),

which we rewrite as follows :

(1Ap_';a)Xa0 =Hags,-gap0p (a=1, 2, ..., u; f 1, 2, ..., v); (7)

we have used here the abbreviations

Ha= H(Pa) Go = HcQf) (a = I, 2, ... , u; fl =1, 2, ... , v). (8)

Let us take one of the equations (7). Two cases can occur:

1. Za ,µp. We iterate equation (7) r - 1 times:'

I We multiply both sides of (7) by lip- Am and in each term of the right-hand sidewe replace (µp- 2a) Sap by Flalap-- ZapOp. This process is repeated r- 1 times.

Page 228: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 1. THE EQUATION AX = XB 217

(gyp - a)rXap 1)z (t) H:XafG;.a+T-r

Note that, by (8),

(9)

Haa=Gfp=0. (10)

If in (9) we take r ? pa + qp -1, then in each term of the sum on theright-hand side of (9) at least one of the relations

v pa, t:>- qp

is satisfied, so that by (10) either Ha=0 or GG=0. Moreover, since in this

case A. µp, we find from (9)

Xap =0.

2. ,la= pup . In this case equation (7) assumes the form

HaXap=XapGp. (12)

In the matrices H. and Go the elements of the first superdiagonal areequal to 1, and all the remaining elements are zero. Taking this specificstructure of H. and Gp into account and setting

11(i=1, 2, ..., pa; k=1, 2, ..., 4p),

we replace the matrix equation (12) by the following equivalent systemof scalar equations :2

tst+1.k-Et.k-1(4W-tPa+1,k=0; i=1, 2, ..., pa; k=1, 2, ..., qp). (13)

The equations (13) have this meaning :1) In the matrix Xap the elements of every line parallel to the main

diagonal are equal ;tt2) =4.2= ...=SPa.4p-1=0

Let pa = qp. Then Xap is a square matrix. From 1) and 2) it followsthat in Xap all the elements below the main diagonal are zero, all the elementsin the main diagonal are equal to a certain number cp, all the elements ofthe first superdiagonal are equal to a number cap, etc. ; i.e.,

2 From the structure of the matrices Ha and Gp it follows that the product HaXap isobtained from Xgp by shifting all the rows one place upwards and filling the last rowwith zeros; similarly, XgfGp is obtained from Xap by shifting all the columns one placeto the right and filling the first column with zeros (see Chapter I, p. 14). To simplifythe notation we do not write the additional indices a, fl in ,k.

Page 229: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

218 VIII. MATRIX EQUATIONS

Cap Cap CMPa-1)

0 Cap

Xap = =TPa ;(14)

0 . . . 0 Cap

(Pa=q )

here cap, cap are arbitrary parameters (the equations (12) donot impose any restrictions on the values of these parameters).

It is easy to see that for pa < qp

9p-Pa

Xap = ( 0 , T,m)and for pa > qp

(15)

Xap-(Tzp))Pa-sp(16)0

We shall say of the matrices (14), (15), and (16) that they have regularupper triangular form. The number of arbitrary parameters in Xap isequal to the smaller of the numbers pa and qp. The scheme below shows thestructure of the matrices Xap for xa = Kp (the arbitrary parameters are heredenoted by a, b, c, and d) :

a b c d0 0 a b c

a0

ba

c i

b0 a b cgap _ , Sap= 0 0 0 a b , Xap = 0 0 a0 0 a b

0 0 0 0 a 0 0 00 0 0 a

0 0 0

(pa=qp=4) (pa=3, qp=5) (pa-5, qp=3)In order to subsume case 1 also in the count of arbitrary parameters in X,

we denote by dap (1) the greatest common divisor of the elementary divisors(I-- 2a)P" and (A- ,up)4p and by Sap the degree of the polynomial dap (A)(a=1,2,.. ., u ; fl=1,2,. . ., v) . In case 1, we have 60=0 ; in case 2,aap= min (pa, qp). Thus, in both cases the number of arbitrary parametersin Xap is equal to Sap. The number of arbitrary parameters in X is deter-mined by the formula.

uN = , .

e

a,,$.U-1 p-1

In what follows it will be convenient to denote the general solution of(6) by XAh (so far we have denoted it by X).

Page 230: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 1. THE EQUATION AX = XB 219

The results obtained in this section can be stated in the form of the follow-ing theorem :

THEOREM 1: The general solution of the matrix equation

whereAX =XB

A = I I ai,t l !"` = UA U-1= U {21E(P1) + H(r,), ... , A,,E(r") + H(PU)) U-i

1= Il b;k I Iu = V BV-1 = V {,u1E(Ql) + H(q,), . , y A;(qo) + H(qn)) V-1

is given by the formula

X = UXAB V-1 , (17)

Here Xjj is the general solution of the equation

AX =XBand has the following structure :

X;B is decomposed into blocksq'8

X: B = (Xa,) } P. (a=1,2, ..., U; f=1,2, ..., v);if 2u pup, then the null matrix stands in the place Xap, but if A. = pp, thenan arbitrary regular upper triangular matrix stands in the place X.

XaB, and therefore also X, depends linearly on N arbitrary parametersc1, c2,...,cN

N

X = G CtX f,j-1

where N is determined by the formula

(18)

N= 8ap (19)

(here Bap denotes the degree of the greatest common divisor of (A-- 2,)P' and(A - µ'8)4p).

Note that the matrices X1i X2, ... , XN that occur in (18) are solutionsof the original equation (1) (X; is obtained from X by giving to the para-meter c, the value 1 and to the remaining parameters the value 0; j = 1, ,2,.... N). These solutions are linearly independent, since otherwise for cer-tain values of the parameters c1i c2, ..., cv, not all zero, the matrix X, andtherefore Xd8 , would be the null matrix, which is impossible. Thus (18)shows that every solution of the original equation is a linear combinationof N linearly independent solutions.

Page 231: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

220 VIII. MATRIX EQUATIONS

If the matrices A and B do not have common characteristic values (if the

characteristic polynomials I AE - A I and I AE - B I are co-prime), thenu o

N= X 2;6.0= 0, and so X= 0, i.e., in this case the equation (1) has onlya_la_1

the trivial solution X = 0.

Note. Suppose that the elements of A and B belong to some number

field F. Then we cannot say that the elements of U, V, and Xdk that occurin (17) also belong to F. The elements of these matrices may be taken in anextension field F1 which is obtained from F by adjoining the roots of thecharacteristic equations I AE - A I = 0 and 11E - B I = 0. We always haveto deal with such an extension of the ground field when we use the reductionof given matrices to Jordan normal form.

However, the matrix equation (1) is equivalent to a system of mn linearhomogeneous equations, where the unknown are the elements Xjk (j =1, 2,3, ... , tin ; k =1, 2, ... , n) of the required matrix X :

in n

.E aijxik =.Z xtnbhk (i=1, 2, ..., m; k =1, 2, ..., n). (20)J_1 A_1

What we have shown is that this system has N linearly independent solu-tions, where N is determined by (19). But it is well known that fundamentallinearly independent solutions can be chosen in the ground field F to whichthe coefficients of (20) belong. Thus, in (18) the matrices X1i X2, ... , Xrcan be so chosen that their elements lie in F. If we then give to the arbitraryparameters in (18) all possible values in F, we obtain all the matrices Xwith elements in F that satisfy the equation (1).3

§ 2. The Special Case A = B. Commuting Matrices

1. Let us consider the special ease of the equation (1)

AX =XA, (21)

where A= II ask II TI is a given matrix and X = 11 x{k II; an unknown matrix.We have come to a problem of Frobenius: to determine all the matrices Xthat commute with a given matrix A.

We reduce A to Jordan normal form :

A = UAU'1= U {21E('") + H(P,>, ..., AUE,(ru> + Hcp">} u_i. (22)

3 The matrices a = 44 ail III' and B = II bk:I1 determine a linear operator F(X) _AX-XB in the space of rectangular matrices X of dimension m X n. A treatment ofoperators of this type is contained in the paper [1791.

Page 232: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 2. THE SPECIAL CASE A = B. COMMUTING MATRICES 221

Then when we set in (17) V = U, B = A and denote X ja simply by Xa, weobtain all solutions of (21), i.e., all matrices that commute with A, in thefollowing form :

X = UXd U-1, (23)

where Xa denotes an arbitrary matrix permutable with A. As we haveexplained in the preceding section, X2 is split into u2 blocks

X2 = (X"")1

corresponding to the splitting of the Jordan matrix A into blocks ; XQp iseither the null matrix or an arbitrary regular upper triangular matrix,depending on whether 2 Ap or Aa =A, .

As an example, we write down the elements of X2 in the case where Ahas the following elementary divisors:

(2-).), (A-11)3, (A-,12)2, A -A2 (A17` A2).

In this case X2 has the following form :

a b c d:e f 8 0 0 00 a b c:0 e f 0 0 00 0 a b 0 0 e0 000 0 0 a 0 0 00 00

.................0 h k l am p q0 000 0 h k0 m p0 000 0 0 h 0 0 M10 0:00 0 0 00 0 0r s t0 0 0 0 0 0 00 r 0

0 0 0 0 : 0 0 0 0 wz

(a, b, ... , z are arbitraryparameters).

The number of parameters in Xa is equal to N, where N = 8"p ;

here a denotes the degree of the greatest common divisor of the polynomials(A - A,f" and (I -- Ap)1''.

Let us bring the invariant polynomials of A into the discussion: il(A),i2(2), ... , is (A); is+1(A) _ ...= 1. We denote the degrees of thesepolynomials by n1 > n2;2! ... > n, > nj+1= ... = 0. Since each invariantpolynomial is a product of certain co-prime elementary divisors, the formulafor N can be written as follows:

Page 233: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

222 VIII. MATRIX EQUATIONS

9

N = E mep (24)94-1

where xf is the degree of the greatest common divisor of i9(A) and i;(,l)(g, j =1, 2, ... , t). But the greatest common divisor of iD (A) and i; (A) isone of these polynomials and therefore xj = min (n,, n,). Hence we obtain:

N is the number of linearly independent matrices that commute with A(we may assume that the elements of these matrices belong to the groundfield F containing the elements of A ; see the remark at the end of the preced-ing section). We have arrived at the following theorem:

THEOREM 2: The number of linearly independent matrices that commutewith the matrix A= OR. II TI is given by the formula

N=n1+3n2+...+ (2t-1)nt. (25)

where n1, 712, ... , it, are the degrees of the non-constant invariant polynomialsit (A), i2(A), ... , i,(A) of A.

Note that

n=n,+n2+...+n,. (26)

From (25) and (26) it follows that

N ? P, (27)

where the equality sign holds if and only if t =1, i.e., if all the elementarydivisors of A are co-prime in pairs.

2. Let g(el) be an arbitrary polynomial in A. Then g(A) is permutable withA. There arises the converse question : when can every matrix that is per-mutable with A be expressed as a polynomial in A? Every matrix that com-mutes with A would then be a linear combination of the linearly independentmatrices

E, A, A2, . . ., A"--1.

Hence N = it, < n ; on comparing this with (27), we obtain : N = n1= n.COROLLARY 1 TO THEOREM 2: All the matrices that are permutable with

A can be expressed as polynomials in A if and only if it, = n, i.e., if all theelementary divisors of A are co-prime in pairs.

Page 234: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 2. THE SPECIAL CASE A = B. COMMUTING MATRICES 223

3. The polynomials in q matrix that commutes with A also commute with A.We raise the question : when can all the matrices that commute with A beexpressed in the form of polynomials in one and the same matrix C4 Let usconsider the case in which they can be so expressed. Then since by theHamilton-Cayley Theorem the matrix C satisfies its characteristic equation,every matrix that commutes with C must be expressible linearly by thematrices

E, C, C2, . . ., C"-1.

Therefore in this case N:5 n. Comparing this with (27), we find thatN = n. Hence from (25) and (26) we also have n1= n.

COROLLARY 2 TO THEOREM 2: All the matrices that are permutable withA can be expressed in the form of polynomials in one and the same matrixC if and only if n, = n, i.e. if and only if all the elementary divisors ofAE - A are co-prime. In this case all the matrices that are permutable withA can be represented in the form of polynomials in A.

4. We mention a very important property of permutable matrices.THEOREM 3: If two matrices A = II a{k III and B = I I b<k II i are per-

mutable and if one of them, say A, has quasi-diagonal form:, Si

A = (A1, A2) , (28)

where the matrices A, and A2 do not have characteristic values in common,then the other matrix also has the same quasi-diagonal form

dl $j

B = { B17B2). (29)

Proof. We split B into blocks corresponding to the quasi-diagonal form(28) :

d, a,

B=( YI B2).

From the relation AB = BA we obtain four matrix equations:

1. AIB1 =B,A1, 2. A1X =XA2, 3. A2Y=YA1. 4. A2B2=B2A2. (30)

As we explained in § 1 (p. 220), the second and third of the equations in (30)only have the solutions X = 0, Y = 0, since A, and A2 have no characteristicvalues in common. This proves our statement. The first and fourth of theequations in (30) express the permutability of A, and B, and of A2 and B2.

Page 235: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

224 VIII. MATRIX EQUATIONS

In geometrical language, this theorem runs as follows :

THEOREM 3': IfR =I, + 12

is a decomposition of the whole space R into invariant subspaces 11 and 12

with respect to an operator A and if the minimal polynomials of these sub-spaces (with respect to A) are co-prime, then 11 and 12 are invariant withrespect to any linear operator B that commutes with A.

Let us also give a geometrical proof of this statement. We denote byip1(1) and V'2(A) the minimal polynomials of 11 and 12 with respect to A.From the fact that they are co-prime it follows that all the vectors of R thatsatisfy the equation 1p1 (A) x = o belong to 11 and all the vectors that satisfyVJ2(A)x = o belong to 12.' Let x1 a I1. Then V,1(A)x, = o. The perrnutabil-ity of A and B implies that of Vp1(A) and B, so that

V , B V I L

i.e., Bx1 a 11. The invariance of 12 with respect to B is proved similarly.This theorem leads to a number of corollaries :COROLLARY 1: If the linear operators A, B, ..., L are pairwise permu-

table, then the whole space R can be split into subspaces invariant withrespect to all the operators A, B, . . . , L

R=11+12+...+1,nsuch that the minimal polynomial of each of these subspaces with respect toany one of the operators A, B, . . . , L is a power of an irreducible polynomial.

As a special case of this we obtain:COROLLARY 2: If the linear operators A, B, ... , L are pairwise permu-

table and all the characteristic values of these operators belong to the groundfield, then the whole space R can be split into subspaces ,1, invari-ant with respect to all the operators such that each operator A, B, .... L hasequal characteristic values in each of them.

Finally, we mention a further special case of this statement :COROLLARY 3 : If A, B, ... , L are pairwise permutable operators of simple

structure (see Chapter III, § 8), then a basis of the space can be formedfrom common characteristic vectors of these operators.

We also give the matrix form of the last statement :Permutable matrices of simple structure can be brought into diagonal

form simultaneously by a similarity transformation.

4 See Theorem 1 of Chapter VII (p. 179).

Page 236: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 4. THE SCALAR EQUATION f (X) = 0 225

§ 3. The Equation AX - XB = C

1. Suppose that the matrix equation

AX-XB=C (31)

is given, where A= 11 ay I1T and B = 11 bkc I'; are given square matrices oforder m and n and where C = I c, II and X = I Xjk 11 are a given and an un-known rectangular matrix, respectively, of dimension m X n. The equation(31) is equivalent to a system of mn scalar equations in the elements of X :

m n

a,,xttxubtr=cik (i =1,2, ..., m; k=I,2, ..., n).f-i

The corresponding homogeneous system of equations

m n

(i_=1,2, ..., m; k=1,2, ..., n),

can be written in matrix form as follows :

(31')

AX-XB=O. (32)

Thus, if (32) only has the trivial solution X = 0, then (31) has a uniquesolution. But we have established in § 1 that the only solution of (32) isthe trivial one if and only if A and B do not have common characteristicvalues. Therefore, if the matrices A and B do not have characteristic valuesin common, then (31) has a unique solution; but if the matrices A and Bhave characteristic values in common, then two cases may arise dependingon the `constant' term C: either the equation (31) is contradictory, or it hasan infinite number of solutions given by the formula

X=%+%,where X. is a fixed particular solution of (31) and X, the general solutionof the homogeneous equation (32) (the structure of X, was described in § 1).

§ 4. The Scalar Equation J (X) = 0

1. To begin with, let us consider the equation

9(X) =0,where

(33)

9 (A)= (A -Al)°1 (A -A$)°+ ... (A -Ah).h

Page 237: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

226 VIII. MATRIX EQUATIONS

is a given polynomial in the variable A and X is an unknown square matrixof order n. Since the minimal polynomial of X, i.e., the first invariantpolynomial i, (A), must be a divisor of g(A), the elementary divisors of Xmust have the following form :

'I j2,...,j,=1,2,...,1x,

S ai,, pig S aj,, ..., pl, S al, ,i,+P, +...+P,,=n

(among the indices jj, j2, ... , j, there may be some that are equal; n is thegiven order of the unknown matrix X).

We represent X in the form

X = T {Aj,E(P" + H(Pj=), ... , Aj B(pj,) + HOPQ) T-1, (34)

where T is an arbitrary non-singular matrix of order n. The set of solutionsof the equation (33) with a given order of the unknown matrix splits, byformula (34), into a finite number of classes of similar matrices.

Example 1. Let the equation

X'n =0 (35)be given.

If a certain power of a matrix is the null matrix, then the matrix iscalled nilpotent. The least exponent for which the power of the matrix isthe null matrix is called the index of nilpotency.

Obviously, the solutions of (35) are all the nilpotent matrices with anindex of nilpotency p < m. The formula that comprises all the solutions ofa given order n looks as follows (T is an arbitrary non-singular matrix):

X = T (H(P,), H(Pt), ... , H(Iv)) T-1

Example 2. Let the equation

(Pil P21 ..., py _ in,

i+p2+...+p,.=np(36)

X2 = X (37)be given.

A matrix satisfying this equation is called idempotent. The elementarydivisors of an idempotent matrix can only be A or A - 1. Therefore anidempotent matrix can be described as a matrix of simple structure (i.e.,reducible to diagonal form) with characteristic values 0 or 1. The formulacomprising all the idempotent matrices of a given order n has the form

X=T(1,1,..1,0,...,0)T-', (38)

n

where T is an arbitrary non-singular matrix of order n.

Page 238: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 5. MATRIX POLYNOMIAL EQUATIONS 227

2. Let us now consider the more general equation

f (X) = 0, (39)

where f (A) is a regular function of A in some domain 0 of the complex plane.We shall require of the unknown solution X = II xik II i that its character-istic values belong to G and that their multiplicities be as follows :

Zeros : Al, A2, ...Multiplicities: al, a2, ... .

As in the preceding case, every elementary divisor of X must have theform

(A-2)7' (P,Sa{),

and therefore

X = T {Ai,E°'Q + H(Pid, . , A,,E°'Q + H(PQ) T_,

(j1, js, ... , j,, =1, 2, ... ; pit S ait, p, S air, ... , pi, S ai,,;

(T is an arbitrary non-singular matrix).

(40)

§ 5. Matrix Polynomial Equations

1. Let us consider the equations

A0Xm+ AIX'"-i + ... + A. =0, (41)

YmA0+ Y'"-'Al+...+A,,,=0, (42)

where Ao, A,, ..., A. are given square matrices of order n and X, Y areunknown square matrices of the same order. The equation (33) investigatedin the preceding section is a very special-one could almost say, trivial-case of (41) and (42) and is obtained by setting A4 = aLE, where aj is anumber and i =1, 2, ... , m.

The following theorem establishes a connection between (41), (42), and(33).

Page 239: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

228 VIII. MATRIX EQUATIONS

THEOREM 4: Every solution of the matrix equation

A0Xm + AiXm-1 + ... + A. =O

satisfies the scalar equation

whereg (X) = 0, (43)

9(A)=JAoAm+AlAm-1+ ...+Aml- (44)

The same scalar equation is satisfied by every solution Y of the matrixequation

YmA0+Ym-'A,+...+ Am= 0.

Proof. We denote by F(A) the matrix polynomial,(A) A,Am+AlAm-'+...+Am.

Then the equations (41) and (42) can be written as follows (see p. 81)

F(X)=O, F(Y)=O.By the generalized Bezont Theorem (Chapter IV, § 3), if X and Y are

solutions of these equations, the matrix polynomial F(A) is divisible on theright by AE - X and on the left by AE - Y :

F (A) = Q (A) (AE--X) = (AE- Y) Q, (A).Hence

g (A) =IF(A) I= IQ (A) I A (A) = I Q , (A) I Al (A) (45)

where d (A) = I AE - X I and z1 1(A) = I AE - Y I are the characteristic poly-nomials of X and Y. By the Hamilton-Cayley Theorem (Chapter IV, § 4),

A(X)=0, A(Y)=0.Therefore (45) implies that

g(X)=9(1')=0,and the theorem is proved.

Note that the Hamilton-Cayley Theorem is a special case of this theorem.For every square matrix A, when substituted for A, satisfies the equation

AE-A=0.Therefore, by the theorem just proved,

A(A) =0,where A(A)=IAE-AI.

Page 240: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 5. MATRIX POLYNOMIAL EQUATIONS 229

2. Theorem 4 can be generalized as follows :

THEOREM 5:5 If %o, %1, ... , X. are pairwise permutable square mat-rices of order n that satisfy the matrix equation

,A0X0 + AiIj + ... + A. gm= O (46)

(Ao, A1, ..., A. are given square matrices of order n), then the same mat-rices %o, %1, ... , X. satisfy the scalar equation

g (X0, %1, ... , %) = 0, (47)where

9($0, 51, ..., Em)=JAoEo+A,$,+...+AmEmJ, (48)

Proof. We setaF(E0, e1, ..., Em)=JJfik(EO, $1, ...,

L1, ... , Em are scalar variables.We denote by F (Eo, E1, ..., Em) = JJ frk (Eo, 1, , Em) JJ, the adjoint

matrix of F (f{k is the algebraic complement of fkj in the determinant

JF( o, 1, Then every elementf{k(i,k=1, 2, ... , n) of F is a homogeneous polynomial in o,1, ... ,m of degreem - 1, so that F can be represented in the form

F= ' Ff.fi...im o 1 in ,h+f.+ +f,,,-.n-1

where are certain constant matrices of order n.From the definition of F there follows the identity

...,m)E.We write this in the following form :

F1.f....fm(AoEo+A1s1+...+AmEm)E Ei.,.fm

= 9 (Eo, E1, ..., Em) E . (49)

The transition from the left-hand side of (49) to the right-hand side isaccomplished by removing the parentheses and collecting similar terms.In this process we have to permute the variables Eo, S1, ..., 6m among eachother, but we do not have to permute the variables $o, E1, ... , S. with thematrix coefficients A, and fm. Therefore the equation (49) is notviolated when we substitute for the variables the pairwisepermutable matrices %o, %1f . . . , X. :

5 See [318).

6 The / i k E1, , . m) are linear forms in to, $1 , ... , fi n , (i, k = 1, 2, ... , n).

Page 241: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

230 VIII. MATRIx EQUATIONS

2 (AOX0 + AiXi + ... + Amgm) X X ... gfmfo+f,+...+fm°n_1

=g(%o, X1, ..., gm) (50)

But, by assumption,AoXo+AIX1 +... +A,,,%m= 0.

Therefore we find from (50)

9 (10, X1, ... , %m) = 0,

and this is what we had to prove.Note 1. Theorem 5 remains valid if (46) is replaced by

%0Ao+X1A1+...+gmAm=O. (51)

For we can apply Theorem 5 to the equation

A;Xo+AiXi+...+AX. =0

and then go over term by term to the transposed matrices.Note 2. Theorem 4 is obtained as a special case of Theorem 5, when we

take for Xo, X1, ... , X.

%m,%'"-1,..., X, E.

3. We have shown that every solution of (41) satisfies the scalar equation(of degree < mn )

g(A) =0.

But the set of matrix solutions of this equation with a given order it splitsinto a finite number of classes of similar matrices (see § 4). Therefore allthe solutions of (41) have to be looked for among the matrices of the form

TTDjT1 1 (52)

(here D{ are well-defined matrices; if we wish, we may assume that the D1have Jordan normal form. T; are arbitrary non-singular matrices of ordern ; i =1, 2, ... , n). In (41) we substitute for X the matrix (52) and chooseT{ such that the equation (41) is satisfied. For each Tt we obtain a linearequation

A0T{Dr + A1T1DD -1 + + A,,,TT = 0 (i =1, 2, ..., n). (53)

A natural method of finding solutions T, of (53) is to replace the matrixequation by a system of linear homogeneous scalar equations in the elements

Page 242: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. EXTRACTION OF 1Yl-TH ROOTS OF NON-SINGULAR MATRIX 231

of the required matrix T. Each non-singular solution TL of (53), whensubstituted in (52), yields a solution of the given equation (41). Similararguments may be applied to the equation (42).

In the following two sections we shall consider special cases of (41)connected with the extraction of m-th roots of a matrix.

§ 6. The Extraction of m-th Roots of a Non-Singular Matrix

1. In this section and the following, we deal with the equation

Xm=A, (54)

where A is a given matrix and X an unknown matrix (both of order n) andm is a given positive integer.

In this section we consider the case J A J 0 (A is non-singular). Allthe characteristic values of A are different from zero in this case (since

A J is the product of these characteristic values).We denote by

(1-11)p! , (-2)', ... , (A - Ax)ru (55)

the elementary divisors of A and reduce A to Jordan normal form :7

A= UAU-1= U {11E1 + H1, ..., I.E, + Hj U-1. (56)

Since the characteristic values of the unknown matrix X, when raised tothe m-th power, give the characteristic values of A, all the characteristicvalues of X are also different from zero. Therefore the derivative off (A) _ A- does not vanish on these characteristic values. But then (seeChapter VI, p. 158) the elementary divisors of X do not `decompose' whenX is raised to the m-th power. From what we have said, it follows that theelementary divisors of X are :

(2- $)P=, ..., (R- u) (57)

where _ AP i.e., j is one of the m-th roots of Aj Q, j=1, 2,...,u).We now

determine'

AjEj+Hj in the following way. In the A-plane wetake a circle, with center Aj, not containing the origin. In this circle we havem distinct branches of the function 'VA- . These branches can be distinguishedfrom one another by the value they assume at the center 2j of the circle. Wedenote by 'PA that branch whose value at Aj coincides with the characteristicvalue j of the unknown matrix X, and starting from this branch we definethe matrix function

m2j jHj by means of the series

Here Ej=E(P1) and Hi=HSr1) (jaI,°,,)

Page 243: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

232 VIII. MATRIX EQUATIONS

1 1 iinAiEf+Hf=AiEf+Af Hf+,»a(m-1}f Hi+..., (58)

which breaks off.Since the derivative of the function mil at t is not zero, the matrix (58 )

has only one elementary divisor (A - f) where f =mV! (here j =1, 2,3, ... , u) . Hence it follows that the quasi-diagonal matrix

{'nfAIEI + Hl , ml 22E2 H2 , ... , Hes }

has the elementary divisors (57), i.e., the same elementary divisors as theunknown matrix X. Therefore there exists a non-singular matrix T(I T J 0) such that

X = T{MVAlEI

+ HI , H2 , ... , j' AuEu + Hu } P-1. (59)

In order to determine T, we note that if on both sides of the identity

(m )m=A

we substitute the matrix A,E, + Ht (j =1, 2, ... , u) in place of A, we obtain :

(mVAtEi+H,)m=AtE1+Ht (j= 1, 2, ..., u).

Now from (54) and (59) it follows that

A = T {A1E1 + H1, A2E2 + H2, ..., AuEu + Hu} T-1. (60)

Comparing (56) and (60) we find:

T=UXZ, (61)

where Xa is an arbitrary non-singular matrix permutable with A (the struc-ture of Xa is described in detail in § 2).

When we substitute in (59) for T the expression UXd we obtain a formulathat comprises all the solutions of the equation (54)

X = UXa AIE + Hl, H2 , ... ,m'AEu

+ Hu } X A U. (62)

The multivalence of the right-hand side of this formula has a discrete aswell as a continuous character: the discrete (in this case finite) characterarises from the choice of the distinct branches of the function ya in thevarious blocks of the quasi-diagonal matrix (for ,l; = Ak the branches of m}"Ain the j-th and k-th diagonal blocks may even be distinct) ; the continuouscharacter arises from the arbitrary parameters contained in Xd .


All solutions of (54) will be called m-th roots of A and will be denoted by the many-valued symbol $\sqrt[m]{A}$. We point out that $\sqrt[m]{A}$ is, in general, not a function of the matrix A (i.e., is not representable in the form of a polynomial in A).

Note. If all the elementary divisors of A are co-prime in pairs, i.e., if the numbers $\lambda_1, \lambda_2, \ldots, \lambda_u$ are all distinct, then the matrix $X_{\tilde{A}}$ has quasi-diagonal form

$$X_{\tilde{A}} = \{X_1, X_2, \ldots, X_u\},$$

where $X_j$ is permutable with $\lambda_j E_j + H_j$ and therefore permutable with every function of $\lambda_j E_j + H_j$ and, in particular, with

$$\sqrt[m]{\lambda_j E_j + H_j} \qquad (j = 1, 2, \ldots, u).$$

Therefore in this case (62) assumes the form

$$\sqrt[m]{A} = U\{\sqrt[m]{\lambda_1 E_1 + H_1},\ \ldots,\ \sqrt[m]{\lambda_u E_u + H_u}\}U^{-1}.$$

Thus, if the elementary divisors of A are co-prime in pairs, then in the formula for $X = \sqrt[m]{A}$ only a discrete multivalence occurs. In this case every value of $\sqrt[m]{A}$ can be represented as a polynomial in A.

2. Example. Suppose it is required to find all square roots of

$$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

i.e., all solutions of the equation

$$X^2 = A.$$

In this case A already has the Jordan normal form. Therefore in (62) we can set $\tilde{A} = A$, $U = E$. The matrix $X_{\tilde{A}}$ in this case looks as follows:

$$X_{\tilde{A}} = \begin{pmatrix} a & b & c \\ 0 & a & 0 \\ 0 & d & e \end{pmatrix},$$

where a, b, c, d, and e are arbitrary parameters. The formula (62), which gives all the required solutions X, now assumes

the following form:

$$X = \begin{pmatrix} a & b & c \\ 0 & a & 0 \\ 0 & d & e \end{pmatrix} \begin{pmatrix} \varepsilon & \tfrac{\varepsilon}{2} & 0 \\ 0 & \varepsilon & 0 \\ 0 & 0 & \eta \end{pmatrix} \begin{pmatrix} a & b & c \\ 0 & a & 0 \\ 0 & d & e \end{pmatrix}^{-1} \qquad (\varepsilon^2 = \eta^2 = 1). \tag{63}$$


Without changing X we may multiply $X_{\tilde{A}}$ in (62) by a scalar so that $|X_{\tilde{A}}| = 1$. This leads to the equation $a^2e = 1$, and hence $e = a^{-2}$.

Let us compute the elements of $X_{\tilde{A}}^{-1}$. For this purpose we write down the linear transformation whose coefficient matrix is $X_{\tilde{A}}$:

$$y_1 = ax_1 + bx_2 + cx_3, \qquad y_2 = ax_2, \qquad y_3 = dx_2 + a^{-2}x_3.$$

We solve this system of equations with respect to $x_1, x_2, x_3$ and obtain the transformation with the inverse matrix $X_{\tilde{A}}^{-1}$:

$$x_1 = a^{-1}y_1 - (a^{-2}b - cd)y_2 - acy_3, \qquad x_2 = a^{-1}y_2, \qquad x_3 = -ady_2 + a^2y_3.$$

Hence we find:

$$X_{\tilde{A}}^{-1} = \begin{pmatrix} a & b & c \\ 0 & a & 0 \\ 0 & d & a^{-2} \end{pmatrix}^{-1} = \begin{pmatrix} a^{-1} & cd - a^{-2}b & -ac \\ 0 & a^{-1} & 0 \\ 0 & -ad & a^2 \end{pmatrix}.$$

The formula (63) yields:

$$X = \begin{pmatrix} \varepsilon & (\varepsilon-\eta)vw + \tfrac{\varepsilon}{2} & (\eta-\varepsilon)v \\ 0 & \varepsilon & 0 \\ 0 & (\varepsilon-\eta)w & \eta \end{pmatrix} \qquad (v = a^2c,\ w = a^{-1}d).$$

The solution X depends on two arbitrary parameters v and w and two arbitrary signs ε and η.
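This family is easy to verify numerically. The following sketch is an addition (assuming numpy; the helper name is illustrative): every choice of the parameters v, w and the signs ε, η yields a genuine square root of A.

```python
import numpy as np

A = np.array([[1., 1., 0.],
              [0., 1., 0.],
              [0., 0., 1.]])

def square_root(v, w, eps, eta):
    # the closed form obtained from (63) above
    return np.array([[eps, (eps - eta) * v * w + eps / 2, (eta - eps) * v],
                     [0.,  eps,                           0.],
                     [0.,  (eps - eta) * w,               eta]])

for eps in (1, -1):
    for eta in (1, -1):
        X = square_root(v=2.0, w=-3.0, eps=eps, eta=eta)
        assert np.allclose(X @ X, A)   # every member of the family satisfies X^2 = A
```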

§ 7. The Extraction of m-th Roots of a Singular Matrix

1. We pass on to the discussion of the case where $|A| = 0$ (A is a singular matrix).

As in the first case, we reduce A to the Jordan normal form:

$$A = U\{\lambda_1 E^{(p_1)} + H^{(p_1)},\ \ldots,\ \lambda_u E^{(p_u)} + H^{(p_u)};\ H^{(q_1)},\ H^{(q_2)},\ \ldots,\ H^{(q_t)}\}U^{-1}; \tag{65}$$

here we have denoted by $(\lambda-\lambda_1)^{p_1}, \ldots, (\lambda-\lambda_u)^{p_u}$ the elementary divisors of A that correspond to non-zero characteristic values, and by $\lambda^{q_1}, \lambda^{q_2}, \ldots, \lambda^{q_t}$ the elementary divisors with characteristic value zero.


Then

$$A = U\{A_1, A_2\}U^{-1}, \tag{66}$$

where

$$A_1 = \{\lambda_1 E^{(p_1)} + H^{(p_1)},\ \ldots,\ \lambda_u E^{(p_u)} + H^{(p_u)}\}, \qquad A_2 = \{H^{(q_1)}, H^{(q_2)}, \ldots, H^{(q_t)}\}. \tag{67}$$

Note that $A_1$ is a non-singular matrix ($|A_1| \neq 0$) and $A_2$ a nilpotent matrix with index of nilpotency $\mu = \max(q_1, q_2, \ldots, q_t)$ ($A_2^{\mu} = 0$).

The original equation (54) implies that A commutes with the unknown matrix X, and therefore the similar matrices

$$U^{-1}AU = \{A_1, A_2\} \quad \text{and} \quad U^{-1}XU \tag{68}$$

also commute. As we have shown in § 2 (Theorem 3), from the permutability of the matrices (68) and the fact that $A_1$ and $A_2$ do not have characteristic values in common, it follows that the second matrix in (68) has a corresponding quasi-diagonal form

$$U^{-1}XU = \{X_1, X_2\}. \tag{69}$$

When we replace the matrices A and X in (54) by the similar matrices $\{A_1, A_2\}$ and $\{X_1, X_2\}$, we replace (54) by two equations:

$$X_1^m = A_1, \tag{70}$$
$$X_2^m = A_2. \tag{71}$$

Since $|A_1| \neq 0$, the results of the preceding section are applicable to (70). Therefore we find $X_1$ by the formula (62):

$$X_1 = X_{\tilde{A}_1}\{\sqrt[m]{\lambda_1 E^{(p_1)} + H^{(p_1)}},\ \ldots,\ \sqrt[m]{\lambda_u E^{(p_u)} + H^{(p_u)}}\}X_{\tilde{A}_1}^{-1}. \tag{72}$$

Thus it remains to consider the equation (71), i.e., to find all m-th roots of the nilpotent matrix $A_2$, which already has the Jordan normal form

$$A_2 = \{H^{(q_1)}, H^{(q_2)}, \ldots, H^{(q_t)}\}; \tag{73}$$

$\mu = \max(q_1, q_2, \ldots, q_t)$ is the index of nilpotency of $A_2$. From $A_2^{\mu} = 0$ and (71) we find

$$X_2^{m\mu} = 0.$$

The last equation shows that the required matrix $X_2$ is also nilpotent, with an index of nilpotency ν, where $m(\mu-1) < \nu \leq m\mu$. We reduce $X_2$ to the Jordan form:


$$X_2 = T\{H^{(v_1)}, H^{(v_2)}, \ldots, H^{(v_s)}\}T^{-1} \qquad (v_1, v_2, \ldots, v_s \leq \nu). \tag{74}$$

Now we raise both sides of (74) to the m-th power. We obtain:

$$A_2 = X_2^m = T\{[H^{(v_1)}]^m, [H^{(v_2)}]^m, \ldots, [H^{(v_s)}]^m\}T^{-1}. \tag{75}$$

2. Let us now clarify the question of what elementary divisors the matrix $[H^{(v)}]^m$ has.⁸ We denote by H the linear operator given by $H^{(v)}$ in a v-dimensional vector space with the basis $e_1, e_2, \ldots, e_v$. Then from the form of the matrix $H^{(v)}$ (in $H^{(v)}$ all the elements of the first superdiagonal are equal to 1 and all the remaining elements are 0) it follows that

$$He_1 = o,\ He_2 = e_1,\ \ldots,\ He_v = e_{v-1}. \tag{76}$$

These equations show that the vectors $e_1, e_2, \ldots, e_v$ form a Jordan chain for H, corresponding to the elementary divisor $\lambda^v$.

We write (76) as follows:

$$He_j = e_{j-1} \qquad (j = 1, 2, \ldots, v;\ e_0 = o).$$

Obviously,

$$H^m e_j = e_{j-m} \qquad (j = 1, 2, \ldots, v;\ e_0 = e_{-1} = \cdots = e_{-m+1} = o). \tag{77}$$

We express v in the form

$$v = km + r \qquad (0 \leq r < m),$$

where k and r are non-negative integers. We arrange the basis vectors $e_1, e_2, \ldots, e_v$ in the following way:

$$\begin{matrix} e_1, & e_2, & \ldots, & e_m, \\ e_{m+1}, & e_{m+2}, & \ldots, & e_{2m}, \\ & & \vdots & \\ e_{(k-1)m+1}, & e_{(k-1)m+2}, & \ldots, & e_{km}, \\ e_{km+1}, & \ldots, & e_{km+r}. & \end{matrix} \tag{78}$$

This table has m columns: the first r columns contain k + 1 vectors each, the remaining ones k vectors. The equation (77) shows that the vectors of each column form a Jordan chain with respect to the operator $H^m$. If instead

⁸ This question is answered by Theorem 9 of Chapter VI (p. 158). Here we are compelled to use another method of investigating the problem, because we have to find not only the elementary divisors of the matrix $[H^{(v)}]^m$, but also a matrix $P_{v,m}$ transforming $[H^{(v)}]^m$ into Jordan form.


of numbering the vectors (78) by rows we number them by columns, we obtain a new basis in which the matrix of the operator $H^m$ has the following Jordan normal form:⁹

$$\{\underbrace{H^{(k+1)}, \ldots, H^{(k+1)}}_{r},\ \underbrace{H^{(k)}, \ldots, H^{(k)}}_{m-r}\};$$

and therefore

$$[H^{(v)}]^m = P_{v,m}\{\underbrace{H^{(k+1)}, \ldots, H^{(k+1)}}_{r},\ \underbrace{H^{(k)}, \ldots, H^{(k)}}_{m-r}\}P_{v,m}^{-1}, \tag{79}$$

where the matrix $P_{v,m}$ (describing the transition from the one basis to the other) is a permutation matrix: each of its columns consists of zeros and a single 1, placed according to the renumbering of the basis vectors (78) by columns instead of by rows (see Chapter III, § 4). (80)

The matrix $H^{(v)}$ has the single elementary divisor $\lambda^v$. When $H^{(v)}$ is raised to the m-th power, this elementary divisor 'falls apart.' As (79) shows, $[H^{(v)}]^m$ has the elementary divisors:

$$\underbrace{\lambda^{k+1}, \ldots, \lambda^{k+1}}_{r},\ \underbrace{\lambda^{k}, \ldots, \lambda^{k}}_{m-r}.$$
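The splitting just described can be observed numerically. The sketch below is an addition (assuming numpy; the helper name is made up): it recovers the Jordan block sizes of $[H^{(v)}]^m$ from the ranks of its powers.

```python
import numpy as np

def jordan_structure_of_power(v, m):
    """Jordan block sizes of H^m, where H is the nilpotent Jordan block of order v.
    The number of blocks of size exactly j is rank(N^(j-1)) - 2 rank(N^j) + rank(N^(j+1))."""
    H = np.eye(v, k=1)
    N = np.linalg.matrix_power(H, m)
    ranks = [v] + [np.linalg.matrix_rank(np.linalg.matrix_power(N, j))
                   for j in range(1, v + 2)]
    sizes = []
    for j in range(1, v + 1):
        count = ranks[j - 1] - 2 * ranks[j] + ranks[j + 1]
        sizes += [j] * count
    return sorted(sizes, reverse=True)

# v = km + r: expect r blocks of size k+1 and m-r blocks of size k
print(jordan_structure_of_power(7, 3))   # 7 = 2*3 + 1  ->  [3, 2, 2]
```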

Turning now to (75), we set:

$$v_i = k_im + r_i \qquad (0 \leq r_i < m,\ k_i \geq 0;\ i = 1, 2, \ldots, s). \tag{81}$$

Then, by (79), equation (75) can be written as follows:

$$A_2 = X_2^m = TP\{\underbrace{H^{(k_1+1)}, \ldots, H^{(k_1+1)}}_{r_1}, H^{(k_1)}, \ldots, H^{(k_1)},\ \underbrace{H^{(k_2+1)}, \ldots, H^{(k_2+1)}}_{r_2}, H^{(k_2)}, \ldots\}P^{-1}T^{-1}, \tag{82}$$

where $P = \{P_{v_1,m}, P_{v_2,m}, \ldots, P_{v_s,m}\}$.

⁹ In the case $k = 0$, the blocks $H^{(k)}, \ldots, H^{(k)}$ are absent, and the matrix has the form $\{\underbrace{H^{(k+1)}, \ldots, H^{(k+1)}}_{r}\}$.


Comparing (82) with (73), we see that the blocks

$$H^{(k_1+1)}, \ldots, H^{(k_1+1)}, H^{(k_1)}, \ldots, H^{(k_1)}, H^{(k_2+1)}, \ldots, H^{(k_2+1)}, \ldots \tag{83}$$

must coincide, apart from the order, with the blocks

$$H^{(q_1)}, H^{(q_2)}, \ldots, H^{(q_t)}. \tag{84}$$

3. Let us call a system of elementary divisors $\lambda^{v_1}, \lambda^{v_2}, \ldots, \lambda^{v_s}$ admissible for $X_2$ if, after the raising of the matrix to the m-th power, these elementary divisors split and generate the given system of elementary divisors of $A_2$: $\lambda^{q_1}, \lambda^{q_2}, \ldots, \lambda^{q_t}$. The number of admissible systems of elementary divisors is always finite, because

$$\max(v_1, v_2, \ldots, v_s) \leq m\mu, \qquad v_1 + v_2 + \cdots + v_s = n_2$$

($n_2$ is the order of $A_2$). In every concrete case the admissible systems of elementary divisors for $X_2$ can easily be determined by a finite number of trials.

Let us show that for each admissible system of elementary divisors $\lambda^{v_1}, \lambda^{v_2}, \ldots, \lambda^{v_s}$ there exists a corresponding solution of (71), and let us determine all these solutions. In this case there exists a transforming matrix Q such that

$$\{H^{(k_1+1)}, \ldots, H^{(k_1+1)}, H^{(k_1)}, \ldots, H^{(k_1)}, H^{(k_2+1)}, \ldots\} = Q^{-1}A_2Q. \tag{86}$$

The matrix Q describes the permutation of the blocks in the quasi-diagonal matrix that brings about the proper renumbering of the basis vectors. Therefore Q can be regarded as known. Using (86), we obtain from (82)

$$A_2 = TPQ^{-1}A_2QP^{-1}T^{-1}.$$

Hence

$$TPQ^{-1} = X_{A_2}, \quad \text{or} \quad T = X_{A_2}QP^{-1}, \tag{87}$$

where $X_{A_2}$ is an arbitrary non-singular matrix that commutes with $A_2$. Substituting (87) for T in (74), we have

$$X_2 = X_{A_2}QP^{-1}\{H^{(v_1)}, H^{(v_2)}, \ldots, H^{(v_s)}\}PQ^{-1}X_{A_2}^{-1}. \tag{88}$$

From (69), (72), and (88) we obtain a general formula which comprises all the solutions:

$$X = U\{X_{\tilde{A}_1},\ X_{A_2}QP^{-1}\}\{\sqrt[m]{\lambda_1 E^{(p_1)} + H^{(p_1)}},\ \ldots,\ \sqrt[m]{\lambda_u E^{(p_u)} + H^{(p_u)}},\ H^{(v_1)}, \ldots, H^{(v_s)}\}\{X_{\tilde{A}_1}^{-1},\ PQ^{-1}X_{A_2}^{-1}\}U^{-1}. \tag{89}$$


We draw the reader's attention to the fact that the m-th root of a singular matrix does not always exist. Its existence is bound up with the existence of a system of admissible elementary divisors for $X_2$.

It is easy to see, for example, that the equation

$$X^m = H^{(p)}$$

has no solution for $m > 1$, $p > 1$.

Example. Suppose it is required to extract the square root of

$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

i.e., to find all the solutions of the equation

$$X^2 = A.$$

In this case, $A = A_2$, $\mu = 2$, $m = 2$, $t = 2$, $q_1 = 2$, and $q_2 = 1$. The matrix X can only have the one elementary divisor $\lambda^3$. Therefore $s = 1$, $v_1 = 3$, $k_1 = 1$, $r_1 = 1$, and (see (80))

$$P = P_{3,2} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} = P^{-1}, \qquad Q = E.$$

Moreover, as in the example on page 233, in (88) we may set:

$$X_{A_2} = \begin{pmatrix} a & b & c \\ 0 & a & 0 \\ 0 & d & a^{-2} \end{pmatrix}, \qquad X_{A_2}^{-1} = \begin{pmatrix} a^{-1} & cd - a^{-2}b & -ac \\ 0 & a^{-1} & 0 \\ 0 & -ad & a^2 \end{pmatrix}.$$

From this formula we obtain

$$X_2 = X = X_{A_2}P^{-1}H^{(3)}PX_{A_2}^{-1} = \begin{pmatrix} 0 & \alpha & \beta \\ 0 & 0 & 0 \\ 0 & \beta^{-1} & 0 \end{pmatrix},$$

where $\alpha = ca^{-1} - a^2d$ and $\beta = a^3$ are arbitrary parameters.
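A quick numerical check of this two-parameter family (an added sketch, assuming numpy; the helper name is illustrative):

```python
import numpy as np

A = np.array([[0., 1., 0.],
              [0., 0., 0.],
              [0., 0., 0.]])

def nilpotent_root(alpha, beta):   # beta must be non-zero
    return np.array([[0., alpha,      beta],
                     [0., 0.,         0.],
                     [0., 1.0 / beta, 0.]])

X = nilpotent_root(alpha=5.0, beta=-2.0)
assert np.allclose(X @ X, A)       # X is a square root of the singular matrix A
```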

§ 8. The Logarithm of a Matrix

1. We consider the matrix equation

$$e^X = A. \tag{90}$$

All the solutions of this equation are called (natural) logarithms of A and are denoted by ln A.


The characteristic values $\lambda_j$ of A are connected with the characteristic values $\xi_j$ of X by the formula $\lambda_j = e^{\xi_j}$; therefore, if the equation (90) has a solution, then all the characteristic values of A are different from zero, and A is non-singular ($|A| \neq 0$). Thus, the condition $|A| \neq 0$ is necessary for the existence of solutions of the equation (90). Below, we shall see that this condition is also sufficient.

Suppose, then, that $|A| \neq 0$. We write down the elementary divisors of A:

$$(\lambda-\lambda_1)^{p_1},\ (\lambda-\lambda_2)^{p_2},\ \ldots,\ (\lambda-\lambda_u)^{p_u} \qquad (\lambda_1\lambda_2\cdots\lambda_u \neq 0,\ p_1 + p_2 + \cdots + p_u = n). \tag{91}$$

Corresponding to these elementary divisors we reduce A to the Jordan normal form:

$$A = U\tilde{A}U^{-1} = U\{\lambda_1 E^{(p_1)} + H^{(p_1)},\ \lambda_2 E^{(p_2)} + H^{(p_2)},\ \ldots,\ \lambda_u E^{(p_u)} + H^{(p_u)}\}U^{-1}. \tag{92}$$

Since the derivative of the function $e^{\xi}$ is different from zero for all values of ξ, we know (see Chapter VI, p. 158) that in the transition from X to $A = e^X$ the elementary divisors do not split, so that X has the elementary divisors

$$(\lambda-\xi_1)^{p_1},\ (\lambda-\xi_2)^{p_2},\ \ldots,\ (\lambda-\xi_u)^{p_u}, \tag{93}$$

where $e^{\xi_j} = \lambda_j$ $(j = 1, 2, \ldots, u)$, i.e., $\xi_j$ is one of the values of $\ln \lambda_j$ $(j = 1, 2, \ldots, u)$.

In the plane of the complex variable λ we draw a circle with center at $\lambda_j$ and with radius less than $|\lambda_j|$, and we denote by $f_j(\lambda) = \ln \lambda$ that branch of the function ln λ in this circle which at $\lambda_j$ assumes the value equal to the characteristic value $\xi_j$ of X $(j = 1, 2, \ldots, u)$. After this, we set:

$$\ln(\lambda_j E^{(p_j)} + H^{(p_j)}) = f_j(\lambda_j E^{(p_j)} + H^{(p_j)}) = \ln \lambda_j\,E^{(p_j)} + \lambda_j^{-1}H^{(p_j)} - \tfrac{1}{2}\lambda_j^{-2}[H^{(p_j)}]^2 + \cdots. \tag{94}$$

Since the derivative of ln λ vanishes nowhere (in the finite part of the λ-plane), the matrix (94) has only the one elementary divisor $(\lambda - \xi_j)^{p_j}$. Therefore the quasi-diagonal matrix

$$\{\ln(\lambda_1 E^{(p_1)} + H^{(p_1)}),\ \ln(\lambda_2 E^{(p_2)} + H^{(p_2)}),\ \ldots,\ \ln(\lambda_u E^{(p_u)} + H^{(p_u)})\} \tag{95}$$

has the same elementary divisors as the unknown matrix X. Therefore there exists a matrix T ($|T| \neq 0$) such that


$$X = T\{\ln(\lambda_1 E^{(p_1)} + H^{(p_1)}),\ \ldots,\ \ln(\lambda_u E^{(p_u)} + H^{(p_u)})\}T^{-1}. \tag{96}$$

In order to determine T, we note that

$$A = e^X = T\{\lambda_1 E^{(p_1)} + H^{(p_1)},\ \ldots,\ \lambda_u E^{(p_u)} + H^{(p_u)}\}T^{-1}. \tag{97}$$

Comparing (97) and (92), we find:

$$T = UX_{\tilde{A}}, \tag{98}$$

where $X_{\tilde{A}}$ is an arbitrary non-singular matrix that commutes with $\tilde{A}$. Substituting the expression for T from (98) into (96), we obtain a general formula that comprises all the logarithms of the matrix A:

$$X = UX_{\tilde{A}}\{\ln(\lambda_1 E^{(p_1)} + H^{(p_1)}),\ \ln(\lambda_2 E^{(p_2)} + H^{(p_2)}),\ \ldots,\ \ln(\lambda_u E^{(p_u)} + H^{(p_u)})\}X_{\tilde{A}}^{-1}U^{-1}. \tag{99}$$

Note. If all the elementary divisors of A are co-prime in pairs, then on the right-hand side of (99) the factors $X_{\tilde{A}}$ and $X_{\tilde{A}}^{-1}$ can be omitted (see a similar remark on p. 233).
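In modern numerical libraries the principal branch of a matrix logarithm is available directly. The following added sketch (assuming scipy; not part of the original text) computes one logarithm and illustrates the discrete multivalence of (99) by shifting every characteristic value by $2\pi i$:

```python
import numpy as np
from scipy.linalg import expm, logm

A = np.array([[2., 1.],
              [0., 2.]])
X0 = logm(A)                      # principal branch
assert np.allclose(expm(X0), A)

X1 = X0 + 2j * np.pi * np.eye(2)  # another logarithm of the same matrix
assert np.allclose(expm(X1), A)   # since exp(2*pi*i) = 1 and the shift commutes with X0
```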


CHAPTER IX

LINEAR OPERATORS IN A UNITARY SPACE

§ 1. General Considerations

In Chapters III and VII we studied linear operators in an arbitrary n-dimensional vector space. All the bases of such a space are of equal standing. To a given linear operator there corresponds in each basis a certain matrix. The matrices corresponding to one and the same operator in the various bases are similar. Thus, the study of linear operators in an n-dimensional vector space enables us to bring out those properties of matrices that are inherent in an entire class of similar matrices.

At the beginning of this chapter we shall introduce a metric into an n-dimensional space by assigning in a special way to each pair of vectors a certain number, the 'scalar product' of the two vectors. By means of the scalar product we shall define the 'length' of a vector and the cosine of the 'angle' between two vectors. This metrization leads to a unitary space if the ground field F is the field of all complex numbers and to a euclidean space if F is the field of all real numbers.

In the present chapter we shall study the properties of linear operators that are connected with the metric of the space. All the bases of the space are by no means of equal standing with respect to the metric. However, this does hold true of all orthonormal bases. The transition from one orthonormal basis to another in a unitary space is brought about by means of a special, namely unitary, transformation (in a euclidean space, an orthogonal transformation). Therefore all the matrices that correspond to one and the same linear operator in two distinct bases of a unitary (euclidean) space are unitarily (orthogonally) similar. Thus, by studying linear operators in an n-dimensional metrized space we study the properties of matrices that remain invariant under transition from a given matrix to a unitarily- or orthogonally-similar one. This will lead in a natural way to the investigation of properties of special classes of matrices (normal, hermitian, unitary, symmetric, skew-symmetric, orthogonal matrices).


§ 2. Metrization of a Space

1. We consider a vector space R over the field of complex numbers. To every pair of vectors x and y of R given in a definite order let a certain complex number be assigned, the so-called scalar product, or inner product, of the vectors, denoted by (xy) or (x, y). Suppose further that the 'scalar multiplication' has the following properties:

For arbitrary vectors x, y, z of R and an arbitrary complex number α, let¹

1. $(xy) = \overline{(yx)}$;
2. $(\alpha x, y) = \alpha(xy)$;   (1)
3. $(x + y, z) = (xz) + (yz)$.

Then we shall say that a hermitian metric is introduced in R. Note that 1., 2., and 3. have the following consequences for arbitrary x, y, z in R:

2'. $(x, \alpha y) = \bar{\alpha}(xy)$;
3'. $(x, y + z) = (xy) + (xz)$.

From 1. we deduce that for every vector x the scalar product (xx) is a real number. This number is called the norm of x and is denoted by Nx:

$$Nx = (x, x).$$

If for every vector x of R

4. $Nx = (xx) \geq 0$,  (2)

then the hermitian metric is called positive semi-definite. And if, moreover,

5. $Nx = (xx) > 0$ for $x \neq o$,  (3)

then the hermitian metric is called positive definite.

DEFINITION 1: A vector space R with a positive-definite hermitian metric will be called a unitary space.²

In this chapter we shall consider finite-dimensional unitary spaces.³

By the length of the vector x we mean⁴ $+\sqrt{Nx} = +\sqrt{(xx)} = |x|$. From 2. and 5. it follows that every vector other than the null vector has a positive

¹ A number with a bar over it denotes the complex conjugate of the number.
² The study of n-dimensional vector spaces with an arbitrary (not positive-definite) metric is taken up in the paper [319].
³ In §§ 2-7 of this chapter, wherever it is not expressly stated that the space is finite-dimensional, all the arguments remain valid for infinite-dimensional spaces.
⁴ The symbol $+\sqrt{\ }$ denotes the non-negative (arithmetical) value of the root.


length and that the null vector has length 0. A vector x is called normalized (or is said to be a unit vector) if $|x| = 1$. To normalize an arbitrary vector $x \neq o$ it is sufficient to multiply it by any complex number λ for which $|\lambda| = 1/|x|$.

By analogy with the ordinary three-dimensional vector spaces, two vectors x and y are called orthogonal (in symbols: $x \perp y$) if $(xy) = 0$. In this case

it follows from 1., 3., and 3'. that

$$N(x+y) = (x+y,\ x+y) = (xx) + (yy) = Nx + Ny,$$

i.e. (the theorem of Pythagoras!),

$$|x+y|^2 = |x|^2 + |y|^2 \qquad (x \perp y).$$

Let R be a unitary space of finite dimension n. We consider an arbitrary

basis $e_1, e_2, \ldots, e_n$ of R. Let us denote by $x_i$ and $y_i$ $(i = 1, 2, \ldots, n)$ the coordinates of the vectors x and y in this basis:

$$x = \sum_{i=1}^{n} x_ie_i, \qquad y = \sum_{k=1}^{n} y_ke_k.$$

Then by 2., 3., 2'., and 3'.,

$$(xy) = \sum_{i,k=1}^{n} h_{ik}x_i\bar{y}_k, \tag{4}$$

where

$$h_{ik} = (e_ie_k) \qquad (i, k = 1, 2, \ldots, n). \tag{5}$$

In particular,

$$Nx = (xx) = \sum_{i,k=1}^{n} h_{ik}x_i\bar{x}_k. \tag{6}$$

From 1. and (5) we deduce

$$h_{ki} = \bar{h}_{ik} \qquad (i, k = 1, 2, \ldots, n). \tag{7}$$

2. A form $\sum_{i,k=1}^{n} h_{ik}x_i\bar{x}_k$, where $h_{ki} = \bar{h}_{ik}$ $(i, k = 1, 2, \ldots, n)$, is called hermitian.⁵ Thus, the norm of a vector, i.e., the square of its length, is a hermitian form in its coordinates. Hence the name 'hermitian metric.' The form on the right-hand side of (6) is, by 4., non-negative:

$$\sum_{i,k=1}^{n} h_{ik}x_i\bar{x}_k \geq 0 \tag{8}$$

for all values of the variables $x_1, x_2, \ldots, x_n$. By the additional condition 5., the form is in fact positive definite, i.e., the equality sign in (8) only holds when all the $x_i$ are zero $(i = 1, 2, \ldots, n)$.

⁵ In accordance with this, the expression on the right-hand side of (4) is called a hermitian bilinear form (in $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$).
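As an added illustration (not from the text), a scalar product of the kind (4) can be realized numerically by a hermitian positive-definite coefficient matrix; the concrete matrix H below is an arbitrary example, and the helper name is made up.

```python
import numpy as np

H = np.array([[2.0, 1j],
              [-1j, 2.0]])           # hermitian, eigenvalues 1 and 3 > 0

def scalar(x, y):
    """Scalar product (x, y) = y^H H x determined by the hermitian matrix H."""
    return y.conj() @ H @ x

x = np.array([1.0, 1j])
y = np.array([0.5, -1.0 + 0j])
assert np.isclose(scalar(x, y), np.conj(scalar(y, x)))                # property 1.
assert scalar(x, x).real > 0 and np.isclose(scalar(x, x).imag, 0.0)   # norm real, positive
```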


DEFINITION 2: A system of vectors $e_1, e_2, \ldots, e_m$ is called orthonormal if

$$(e_ie_k) = \delta_{ik} = \begin{cases} 0 & \text{for } i \neq k, \\ 1 & \text{for } i = k \end{cases} \qquad (i, k = 1, 2, \ldots, m). \tag{9}$$

When m = n, where n is the dimension of the space, we obtain an orthonormal basis of the space. In § 7 we shall prove that every n-dimensional space has an orthonormal basis.

Let $x_i$ and $y_i$ $(i = 1, 2, \ldots, n)$ be the coordinates of x and y in an orthonormal basis. Then by (4), (5), and (9)

$$(xy) = \sum_{i=1}^{n} x_i\bar{y}_i, \tag{10}$$

$$Nx = (xx) = \sum_{i=1}^{n} |x_i|^2.$$

Let us take an arbitrary fixed basis in an n-dimensional space R. In this basis every metrization of the space is connected with a certain positive-definite hermitian form $\sum_{i,k=1}^{n} h_{ik}x_i\bar{x}_k$; and conversely, by (4), every such form determines a certain positive-definite hermitian metric in R. However, these metrics do not all give essentially different unitary n-dimensional spaces. For let us take two such metrics with the respective scalar products (xy) and (xy)'. We determine orthonormal bases in R with respect to these metrics: $e_i$ and $e_i'$ $(i = 1, 2, \ldots, n)$. Let the vector x in R be mapped onto the vector x' in R, where x' is the vector whose coordinates in the basis $e_i'$ are the same as the coordinates of x in the basis $e_i$ $(i = 1, 2, \ldots, n)$ $(x \to x')$. This mapping is affine.⁶ Moreover, by (10),

$$(xy) = (x'y')'.$$

Therefore: To within an affine transformation of the space all positive-definite hermitian metrizations of an n-dimensional vector space coincide.

If the field F is the field of real numbers, then a metric satisfying the postulates 1., 2., 3., 4., and 5. is called euclidean.

DEFINITION 3: A vector space R over the field of real numbers with a positive-definite euclidean metric is called a euclidean space.

If $x_i$ and $y_i$ $(i = 1, 2, \ldots, n)$ are the coordinates of the vectors x and y in some basis $e_1, e_2, \ldots, e_n$ of an n-dimensional euclidean space, then

⁶ I.e., the operator A that maps the vector x of R onto the vector x' of R' is linear and non-singular.


$$(xy) = \sum_{i,k=1}^{n} s_{ik}x_iy_k, \qquad Nx = |x|^2 = \sum_{i,k=1}^{n} s_{ik}x_ix_k.$$

Here $s_{ik} = s_{ki}$ $(i, k = 1, 2, \ldots, n)$ are real numbers.⁷ The expression $\sum_{i,k=1}^{n} s_{ik}x_ix_k$ is called a quadratic form in $x_1, x_2, \ldots, x_n$. From the fact that the metric is positive definite it follows that the quadratic form $\sum_{i,k=1}^{n} s_{ik}x_ix_k$, which gives this metric analytically, is positive definite, i.e., $\sum_{i,k=1}^{n} s_{ik}x_ix_k > 0$ if $\sum_{i=1}^{n} x_i^2 > 0$.

In an orthonormal basis

$$(xy) = \sum_{i=1}^{n} x_iy_i, \qquad Nx = |x|^2 = \sum_{i=1}^{n} x_i^2. \tag{11}$$

For n = 3 we obtain the well-known formulas for the scalar product of two vectors and for the square of the length of a vector in a three-dimensional euclidean space.

§ 3. Gram's Criterion for Linear Dependence of Vectors

1. Suppose that the vectors $x_1, x_2, \ldots, x_m$ of a unitary or of a euclidean space R are linearly dependent, i.e., that there exist numbers⁸ $c_1, c_2, \ldots, c_m$, not all zero, such that

$$c_1x_1 + c_2x_2 + \cdots + c_mx_m = o. \tag{12}$$

When we perform the scalar multiplication by $x_1, x_2, \ldots, x_m$ in succession on both sides of this equation, we obtain

$$\begin{matrix} (x_1x_1)c_1 + (x_1x_2)c_2 + \cdots + (x_1x_m)c_m = 0, \\ (x_2x_1)c_1 + (x_2x_2)c_2 + \cdots + (x_2x_m)c_m = 0, \\ \cdots\cdots\cdots\cdots\cdots \\ (x_mx_1)c_1 + (x_mx_2)c_2 + \cdots + (x_mx_m)c_m = 0. \end{matrix} \tag{13}$$

Regarding $c_1, c_2, \ldots, c_m$ as a non-zero solution of the system (13) of linear homogeneous equations with the determinant

⁷ $s_{ik} = (e_ie_k)$ $(i, k = 1, 2, \ldots, n)$.
⁸ In the case of a euclidean space, $c_1, c_2, \ldots, c_m$ are real numbers.


$$G(x_1, x_2, \ldots, x_m) = \begin{vmatrix} (x_1x_1) & (x_1x_2) & \cdots & (x_1x_m) \\ (x_2x_1) & (x_2x_2) & \cdots & (x_2x_m) \\ \vdots & & & \vdots \\ (x_mx_1) & (x_mx_2) & \cdots & (x_mx_m) \end{vmatrix}, \tag{14}$$

we conclude that this determinant must vanish:

$$G(x_1, x_2, \ldots, x_m) = 0.$$

$G(x_1, x_2, \ldots, x_m)$ is called the Gramian of the vectors $x_1, x_2, \ldots, x_m$.

Suppose, conversely, that the Gramian (14) is zero. Then the system of equations (13) has a non-zero solution $c_1, c_2, \ldots, c_m$. Equations (13) can be written as follows:

$$\begin{matrix} (x_1,\ c_1x_1 + c_2x_2 + \cdots + c_mx_m) = 0, \\ (x_2,\ c_1x_1 + c_2x_2 + \cdots + c_mx_m) = 0, \\ \cdots\cdots\cdots\cdots\cdots \\ (x_m,\ c_1x_1 + c_2x_2 + \cdots + c_mx_m) = 0. \end{matrix} \tag{13'}$$

Multiplying these equations by $c_1, c_2, \ldots, c_m$ respectively, and then adding, we obtain:

$$N(c_1x_1 + c_2x_2 + \cdots + c_mx_m) = 0;$$

and since the metric is positive definite,

$$c_1x_1 + c_2x_2 + \cdots + c_mx_m = o,$$

i.e., the vectors $x_1, x_2, \ldots, x_m$ are linearly dependent. Thus we have proved:

THEOREM 1: The vectors $x_1, x_2, \ldots, x_m$ are linearly independent if and only if their Gramian is not equal to zero.

We note the following property of the Gramian: If any principal minor of the Gramian is zero, then the Gramian is zero. For a principal minor is the Gramian of part of the vectors. When this principal minor vanishes, it follows that these vectors are linearly dependent, and then the whole system of vectors is dependent.
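Theorem 1 translates directly into a numerical test. The sketch below is an addition, assuming numpy and the standard metric of $C^n$; the helper name `gramian` is made up.

```python
import numpy as np

def gramian(*vectors):
    """Gramian G(x1, ..., xm) with entries (x_i, x_k), standard metric of C^n."""
    X = np.column_stack(vectors)
    return np.linalg.det(X.conj().T @ X).real

x1 = np.array([1.0, 0.0, 2.0])
x2 = np.array([0.0, 1.0, 1.0])
print(gramian(x1, x2))            # > 0: linearly independent
print(gramian(x1, x2, x1 + x2))   # = 0 (up to rounding): dependent
```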

2. Example. Let $f_1(t), f_2(t), \ldots, f_n(t)$ be n complex functions of a real argument t, sectionally continuous in the closed interval $[\alpha, \beta]$. It is required to determine conditions under which they are linearly dependent. For this purpose, we introduce a positive-definite metric into the space of functions sectionally continuous in $[\alpha, \beta]$ by setting


$$(f, g) = \int_{\alpha}^{\beta} f(t)\overline{g(t)}\,dt.$$

Then Gram's criterion (Theorem 1) applied to the given functions yields the required condition:

$$\begin{vmatrix} \int_{\alpha}^{\beta} f_1(t)\overline{f_1(t)}\,dt & \cdots & \int_{\alpha}^{\beta} f_1(t)\overline{f_n(t)}\,dt \\ \vdots & & \vdots \\ \int_{\alpha}^{\beta} f_n(t)\overline{f_1(t)}\,dt & \cdots & \int_{\alpha}^{\beta} f_n(t)\overline{f_n(t)}\,dt \end{vmatrix} = 0.$$

§ 4. Orthogonal Projection

1. Let x be an arbitrary vector in a unitary or euclidean space R and S an m-dimensional subspace with a basis $x_1, x_2, \ldots, x_m$. We shall show that x can be represented (and moreover, represented uniquely) in the form

$$x = x_S + x_N, \tag{15}$$

where

$$x_S \in S \quad \text{and} \quad x_N \perp S$$

(the symbol ⊥ denotes orthogonality of vectors; orthogonality to a subspace means orthogonality to every vector of the subspace); $x_S$ is the orthogonal projection of x onto S, $x_N$ the projecting vector.

[Fig. 5]

Example. Let R be a three-dimensional euclidean vector space and m = 2. Let all vectors originate at a fixed point O. Then S is a plane passing through O; $x_S$ is the orthogonal projection of x onto the plane S; $x_N$ is the perpendicular dropped from the endpoint of x onto the plane S (Fig. 5); and $h = |x_N|$ is the distance of the endpoint of x from S.

To establish the decomposition (15), we represent the required $x_S$ in the form

$$x_S = c_1x_1 + c_2x_2 + \cdots + c_mx_m, \tag{16}$$

where $c_1, c_2, \ldots, c_m$ are complex numbers.⁹

⁹ In the case of a euclidean space, $c_1, c_2, \ldots, c_m$ are real numbers.


To determine these numbers we shall start from the relations

$$(x - x_S,\ x_k) = 0 \qquad (k = 1, 2, \ldots, m). \tag{17}$$

When we substitute in (17) for $x_S$ its expression (16), we obtain:

$$\begin{matrix} (x_1x_1)c_1 + \cdots + (x_mx_1)c_m + (xx_1)(-1) = 0, \\ \cdots\cdots\cdots\cdots\cdots \\ (x_1x_m)c_1 + \cdots + (x_mx_m)c_m + (xx_m)(-1) = 0. \end{matrix} \tag{18}$$

Regarding this as a system of linear homogeneous equations with the non-zero solution $c_1, c_2, \ldots, c_m, -1$, we equate the determinant of the system to zero and obtain (after transposition with respect to the main diagonal):¹⁰

$$\begin{vmatrix} (x_1x_1) & \cdots & (x_1x_m) & x_1 \\ \vdots & & \vdots & \vdots \\ (x_mx_1) & \cdots & (x_mx_m) & x_m \\ (xx_1) & \cdots & (xx_m) & x_S \end{vmatrix} = 0. \tag{19}$$

When we separate from this determinant the term containing $x_S$, we obtain (in a readily understandable notation):

$$x_S = -\frac{1}{G}\begin{vmatrix} (x_1x_1) & \cdots & (x_1x_m) & x_1 \\ \vdots & & \vdots & \vdots \\ (x_mx_1) & \cdots & (x_mx_m) & x_m \\ (xx_1) & \cdots & (xx_m) & 0 \end{vmatrix}, \tag{20}$$

where $G = G(x_1, x_2, \ldots, x_m)$ is the Gramian of the vectors $x_1, x_2, \ldots, x_m$ (in virtue of the linear independence of these vectors, $G \neq 0$). From (15) and (20), we find:

$$x_N = x - x_S = \frac{1}{G}\begin{vmatrix} (x_1x_1) & \cdots & (x_1x_m) & x_1 \\ \vdots & & \vdots & \vdots \\ (x_mx_1) & \cdots & (x_mx_m) & x_m \\ (xx_1) & \cdots & (xx_m) & x \end{vmatrix}. \tag{21}$$

¹⁰ The determinant on the left-hand side of (19) is a vector whose i-th coordinate is obtained by replacing all the vectors $x_1, \ldots, x_m, x_S$ in the last column by their i-th coordinates $(i = 1, \ldots, n)$; the coordinates are taken in an arbitrary basis. To justify the transition from (18) to (19), it is sufficient to replace the vectors $x_1, \ldots, x_m, x_S$ by their i-th coordinates.


The formulas (20) and (21) express the projection $x_S$ of x onto the subspace S and the projecting vector $x_N$ in terms of the given vector x and the basis of S.

2. We draw attention to another important formula. We denote by h the length of the vector $x_N$. Then, by (15) and (21),

$$h^2 = (x_Nx_N) = (x_Nx) = \frac{1}{G}\begin{vmatrix} (x_1x_1) & \cdots & (x_1x_m) & (x_1x) \\ \vdots & & \vdots & \vdots \\ (x_mx_1) & \cdots & (x_mx_m) & (x_mx) \\ (xx_1) & \cdots & (xx_m) & (xx) \end{vmatrix},$$

that is,

$$h^2 = \frac{G(x_1, x_2, \ldots, x_m, x)}{G(x_1, x_2, \ldots, x_m)}. \tag{22}$$

The quantity h can also be interpreted in the following way: Let the vectors $x_1, x_2, \ldots, x_m, x$ issue from a single point and construct on these vectors as edges an (m+1)-dimensional parallelepiped. Then h is the height of this parallelepiped measured from the end of the edge x to the base S that passes through the edges $x_1, x_2, \ldots, x_m$.

Let y be an arbitrary vector of S and x an arbitrary vector of R. If all vectors start from the origin of coordinates of an n-dimensional point space, then $|x - y|$ and $|x - x_S|$ are equal to the value of the slant height and the height respectively from the endpoint of x to the hyperplane S.¹¹ Therefore, when we set down that the height is shorter than the slant height, we have:¹²

$$h = |x - x_S| \leq |x - y|$$

(with equality only for $y = x_S$). Thus, among all vectors $y \in S$ the vector $x_S$ deviates the least from the given vector $x \in R$. The quantity $h = \sqrt{N(x - x_S)}$ is the mean-square error in the approximation $x \approx x_S$.¹³
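An added numerical sketch of the decomposition (15): instead of the determinant formula (20), the coefficients $c_1, \ldots, c_m$ are obtained by solving the normal equations (18) directly (assuming numpy; the helper name is illustrative).

```python
import numpy as np

def project(x, *basis):
    """Orthogonal projection x_S of x onto S = [x1, ..., xm]."""
    B = np.column_stack(basis)
    c = np.linalg.solve(B.conj().T @ B, B.conj().T @ x)   # the Gram matrix is invertible
    return B @ c

x  = np.array([1.0, 2.0, 3.0])
x1 = np.array([1.0, 0.0, 1.0])
x2 = np.array([0.0, 1.0, 1.0])
xS = project(x, x1, x2)
xN = x - xS                       # the projecting vector; h^2 = (xN, xN) as in (22)
assert np.allclose([xN @ x1, xN @ x2], 0.0)
```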

§ 5. The Geometrical Meaning of the Gramian and Some Inequalities

1. We consider arbitrary vectors $x_1, x_2, \ldots, x_m$. Let us assume, to begin with, that they are linearly independent. In this case the Gramian formed from any of these vectors is different from zero. Then, when we set, in accordance with (22),

¹¹ See the example on p. 248.
¹² $N(x-y) = N(x_N + x_S - y) = Nx_N + N(x_S - y) \geq Nx_N = h^2$.
¹³ As regards the application of metrized functional spaces to problems of approximation of functions, see [1].


$$\frac{G(x_1, x_2, \ldots, x_{p+1})}{G(x_1, x_2, \ldots, x_p)} = h_p^2 > 0 \qquad (p = 1, 2, \ldots, m-1) \tag{23}$$

and multiply these inequalities and the inequality

$$G(x_1) = (x_1x_1) > 0, \tag{24}$$

we obtain

$$G(x_1, x_2, \ldots, x_m) > 0.$$

Thus: The Gramian of linearly independent vectors is positive; that of linearly dependent vectors is zero. Negative Gramians do not exist.

Let us use the abbreviation $G_p = G(x_1, x_2, \ldots, x_p)$ $(p = 1, 2, \ldots, m)$. Then, from (23) and (24), we have

$$\sqrt{G_1} = |x_1| = V_1, \qquad \sqrt{G_2} = V_1h_1 = V_2,$$

where $V_2$ is the area of the parallelogram spanned by $x_1$ and $x_2$. Further,

$$\sqrt{G_3} = V_2h_2 = V_3,$$

where $V_3$ is the volume of the parallelepiped spanned by $x_1, x_2, x_3$. Continuing further, we find:

$$\sqrt{G_4} = V_3h_3 = V_4,$$

and, in general,

$$\sqrt{G_m} = V_{m-1}h_{m-1} = V_m. \tag{25}$$

It is natural to call $V_m$ the volume of the m-dimensional parallelepiped spanned by the vectors $x_1, x_2, \ldots, x_m$.¹⁴

We denote by $x_{1k}, x_{2k}, \ldots, x_{nk}$ the coordinates of $x_k$ $(k = 1, 2, \ldots, m)$ in an orthonormal basis of R and set

$$X = \|x_{ik}\| \qquad (i = 1, 2, \ldots, n;\ k = 1, 2, \ldots, m).$$

Then, in consequence of (10),

$$G_m = |X^{\mathsf{T}}\bar{X}|,$$

and therefore (see formula (25)),

$$V_m^2 = G_m = \sum_{1 \leq i_1 < i_2 < \cdots < i_m \leq n} \operatorname{mod}^2\begin{vmatrix} x_{i_11} & x_{i_12} & \cdots & x_{i_1m} \\ x_{i_21} & x_{i_22} & \cdots & x_{i_2m} \\ \vdots & & & \vdots \\ x_{i_m1} & x_{i_m2} & \cdots & x_{i_mm} \end{vmatrix}. \tag{26}$$

¹⁴ Formula (25) gives an inductive definition of the volume of an m-dimensional parallelepiped.


This equation has the following geometric meaning: The square of the volume of a parallelepiped is equal to the sum of the squares of the volumes of its projections on all the m-dimensional coordinate subspaces. In particular, for m = n, it follows from (26) that

$$V_n = \operatorname{mod}\begin{vmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nn} \end{vmatrix}. \tag{27}$$

The formulas (20), (21), (22), (26), and (27) solve a number of fundamental metrical problems of n-dimensional unitary and n-dimensional euclidean analytical geometry.

2. Let us return to the decomposition (15). This has the immediate consequence:

$$(xx) = (x_S + x_N,\ x_S + x_N) = (x_Sx_S) + (x_Nx_N) \geq (x_Nx_N) = h^2,$$

which, in conjunction with (22), gives an inequality (for arbitrary vectors $x_1, x_2, \ldots, x_m, x$):

$$G(x_1, x_2, \ldots, x_m, x) \leq G(x_1, x_2, \ldots, x_m)\,G(x); \tag{28}$$

the equality sign holds if and only if x is orthogonal to $x_1, x_2, \ldots, x_m$. From this we easily obtain the so-called Hadamard inequality

$$G(x_1, x_2, \ldots, x_m) \leq G(x_1)\,G(x_2)\cdots G(x_m), \tag{29}$$

where the equality sign holds if and only if the vectors $x_1, x_2, \ldots, x_m$ are pairwise orthogonal. The inequality (29) expresses the following fact, which is geometrically obvious: The volume of a parallelepiped does not exceed the product of the lengths of its edges and is equal to it only when the parallelepiped is rectangular.

Hadamard's inequality can be put into its usual form by setting m = n in (29) and introducing the determinant d formed from the coordinates $x_{1k}, x_{2k}, \ldots, x_{nk}$ of the vectors $x_k$ $(k = 1, 2, \ldots, n)$ in some orthonormal basis:

$$d = \begin{vmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nn} \end{vmatrix}.$$


Then it follows from (27) and (29) that

$$|d|^2 \leq \prod_{k=1}^{n}\sum_{i=1}^{n}|x_{ik}|^2.$$
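An added numerical spot-check of this form of Hadamard's inequality (assuming numpy; the random test matrix is an arbitrary choice):

```python
import numpy as np

# |det A|^2 <= product over columns k of (sum over i of |a_ik|^2)
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
lhs = np.linalg.det(A) ** 2
rhs = np.prod(np.sum(A * A, axis=0))   # product of squared column lengths
assert lhs <= rhs
```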

3.¹⁵ We now turn to the inequality

$$G(x_{1S}, x_{2S}, \ldots, x_{mS}) \leq G(x_1, x_2, \ldots, x_m). \tag{30}$$

If $G(x_1, x_2, \ldots, x_m) \neq 0$, then the equality sign holds in (30) if and only if $x_{iN} = o$ $(i = 1, 2, \ldots, m)$. If $G(x_1, x_2, \ldots, x_m) = 0$, then (30) implies, of course, that $G(x_{1S}, x_{2S}, \ldots, x_{mS}) = 0$.

In virtue of (25), the inequality (30) expresses the following geometric fact: The volume of the orthogonal projection of a parallelepiped onto a subspace S does not exceed the volume of the given parallelepiped; these volumes are equal if and only if the projected parallelepiped lies in S or has zero volume.

We prove (30) by induction on m. The first step (m = 1) is trivial and yields the inequality

$$G(x_{1S}) \leq G(x_1),$$

i.e., $|x_{1S}| \leq |x_1|$ (see Fig. 5 on page 248).

We write the volume $\sqrt{G(x_1, x_2, \ldots, x_m)}$ of our parallelepiped as the product of the 'base' $\sqrt{G(x_1, x_2, \ldots, x_{m-1})}$ by the distance h of the vertex of $x_m$ from the base:

$$\sqrt{G(x_1, x_2, \ldots, x_{m-1})}\cdot h = \sqrt{G(x_1, x_2, \ldots, x_m)}. \tag{31}$$

If we now go over on the left-hand side of (31) from the vectors $x_i$ to their projections $x_{iS}$ $(i = 1, 2, \ldots, m)$, then the first factor cannot increase, by the induction hypothesis, nor the second, by a simple geometric argument. But the product so obtained is the volume $\sqrt{G(x_{1S}, x_{2S}, \ldots, x_{mS})}$ of the parallelepiped projected onto the subspace S. Hence

$$\sqrt{G(x_{1S}, x_{2S}, \ldots, x_{mS})} \leq \sqrt{G(x_1, x_2, \ldots, x_m)},$$

and by squaring both sides, we obtain (30). Our condition for the equality sign to hold follows immediately from the proof.

¹⁵ Subsections 3 and 4 have been modified in accordance with a correction published by the author in 1954 (Uspehi Mat. Nauk, vol. 9, no. 3).


4. Now we shall establish a generalization of Hadamard's inequality which comprises both the inequalities (28) and (29):

$$G(x_1, x_2, \ldots, x_m) \leq G(x_1, \ldots, x_p)\,G(x_{p+1}, \ldots, x_m), \tag{32}$$

where the equality sign holds if and only if each of the vectors $x_1, x_2, \ldots, x_p$ is orthogonal to each of the vectors $x_{p+1}, \ldots, x_m$ or one of the determinants $G(x_1, \ldots, x_p)$, $G(x_{p+1}, \ldots, x_m)$ vanishes.

The inequality (32) has the following geometric meaning: The volume of a parallelepiped does not exceed the product of the volumes of two complementary 'faces' and is equal to this product if and only if these faces are orthogonal or at least one of them has volume zero.

Let us prove the inequality (32). Let p < m. If $G(x_1, x_2, \ldots, x_p) = 0$, then (32) holds with the equality sign. Let $G(x_1, x_2, \ldots, x_p) \neq 0$. Then the p vectors $x_1, x_2, \ldots, x_p$ are linearly independent and form a basis of a p-dimensional subspace T of R. The set of all vectors y of R that are orthogonal to T is easily seen also to form a subspace of R (the so-called orthogonal complement of T; for details, see § 8 of this chapter). We denote it by S, and then R = T + S.

Since every vector of S is orthogonal to every vector of T, we can go over, in the Gramian $G(x_1, x_2, \ldots, x_m)$, which represents the square of a certain volume, from the vectors $x_{p+1}, \ldots, x_m$ to their projections $x_{p+1S}, \ldots, x_{mS}$ onto the subspace S:

$$G(x_1, \ldots, x_p, x_{p+1}, \ldots, x_m) = G(x_1, \ldots, x_p, x_{p+1S}, \ldots, x_{mS}).$$

The same arguments show that the Gramian on the right-hand side of this equation can be split:

$$G(x_1, \ldots, x_p, x_{p+1S}, \ldots, x_{mS}) = G(x_1, \ldots, x_p)\,G(x_{p+1S}, \ldots, x_{mS}).$$

If we now go back from the projections to the original vectors and use (30), then we obtain

$$G(x_1, \ldots, x_p)\,G(x_{p+1S}, \ldots, x_{mS}) \leq G(x_1, \ldots, x_p)\,G(x_{p+1}, \ldots, x_m).$$

The equality sign holds in two cases: 1. when $G(x_{p+1}, \ldots, x_m) = 0$, for then it is obvious that $G(x_{p+1S}, \ldots, x_{mS}) = 0$; and 2. when $x_{iS} = x_i$ $(i = p+1, \ldots, m)$, i.e., when the vectors $x_{p+1}, \ldots, x_m$ belong to S or, what is the same, each of the vectors $x_{p+1}, \ldots, x_m$ is orthogonal to every vector $x_1, x_2, \ldots, x_p$ (the case $G(x_1, x_2, \ldots, x_p) = 0$ has been considered at the beginning of the proof). By combining the last three relations we obtain the generalized Hadamard inequality (32) and the conditions for the equality sign to hold. This completes the proof.


5. The generalized Hadamard inequality (32) can also be put into analytic form.

Let $\sum_{i,k=1}^{n} h_{ik}x_i\bar{x}_k$ be an arbitrary positive-definite hermitian form. By regarding $x_1, x_2, \ldots, x_n$ as the coordinates, in a basis $e_1, e_2, \ldots, e_n$, of a vector x in an n-dimensional space R, we take $\sum_{i,k=1}^{n} h_{ik}x_i\bar{x}_k$ as the fundamental metric form of R (see p. 244). Then R becomes a unitary space. We apply the generalized Hadamard inequality to the basis vectors $e_1, e_2, \ldots, e_n$:

$$G(e_1, e_2, \ldots, e_n) \leq G(e_1, \ldots, e_p)\,G(e_{p+1}, \ldots, e_n).$$

Setting $H = \|h_{ik}\|_1^n$ and noting that $(e_ie_k) = h_{ik}$ $(i, k = 1, 2, \ldots, n)$, we can rewrite the latter inequality as follows:

$$H\begin{pmatrix} 1 & 2 & \cdots & n \\ 1 & 2 & \cdots & n \end{pmatrix} \leq H\begin{pmatrix} 1 & 2 & \cdots & p \\ 1 & 2 & \cdots & p \end{pmatrix} H\begin{pmatrix} p+1 & \cdots & n \\ p+1 & \cdots & n \end{pmatrix} \qquad (p < n). \tag{33}$$

Here the equality sign holds if and only if $h_{ik} = h_{ki} = 0$ $(i = 1, 2, \ldots, p;\ k = p+1, \ldots, n)$.

The inequality (33) holds for the coefficient matrix $H = \|h_{ik}\|_1^n$ of an arbitrary positive-definite hermitian form. In particular, (33) holds if H is the real coefficient matrix of a positive-definite quadratic form $\sum_{i,k=1}^{n} h_{ik}x_ix_k$.¹⁶

6. We remind the reader of Schwarz's inequality:†

For arbitrary vectors $x, y \in R$

$$|(xy)|^2 \leq Nx\,Ny, \tag{34}$$

and the equality sign holds only if the vectors x and y differ only by a scalar factor.

The validity of Schwarz's inequality follows easily from the inequality established above:

$$G(x, y) = \begin{vmatrix} (xx) & (xy) \\ (yx) & (yy) \end{vmatrix} \geq 0.$$

By analogy with the scalar product of vectors in a three-dimensional euclidean space, we can introduce in an n-dimensional unitary space the

¹⁶ An analytical approach to the generalized Hadamard inequality can be found in the book [17], § 8.
† In the Russian literature, this is known as Bunyakovskii's inequality.


'angle' θ between the vectors x and y by defining¹⁷

$$\cos^2\theta = \frac{(xy)(yx)}{Nx\,Ny}.$$

From Schwarz's inequality it follows that θ is real.

§ 6. Orthogonalization of a Sequence of Vectors

1. The smallest subspace containing the vectors $x_1, x_2, \ldots, x_p$ will be denoted by $[x_1, x_2, \ldots, x_p]$. This subspace consists of all possible linear combinations $c_1x_1 + c_2x_2 + \cdots + c_px_p$ of the vectors $x_1, x_2, \ldots, x_p$ ($c_1, c_2, \ldots, c_p$ are complex numbers).¹⁸ If $x_1, x_2, \ldots, x_p$ are linearly independent, then they form a basis of $[x_1, x_2, \ldots, x_p]$. In that case, the subspace is p-dimensional.

Two sequences of vectors

$$X:\ x_1, x_2, \ldots \qquad \text{and} \qquad Y:\ y_1, y_2, \ldots,$$

containing an equal number of vectors, finite or infinite, will be called equivalent if for all p

$$[x_1, x_2, \ldots, x_p] = [y_1, y_2, \ldots, y_p] \qquad (p = 1, 2, \ldots).$$

A sequence of vectors

$$X:\ x_1, x_2, \ldots$$

will be called non-degenerate if for every p the vectors $x_1, x_2, \ldots, x_p$ are linearly independent.

A sequence of vectors is called orthogonal if any two vectors of the sequence are orthogonal. By orthogonalization of a sequence of vectors we mean a process of replacing the sequence by an equivalent orthogonal sequence.

THEOREM 2: Every non-degenerate sequence of vectors can be orthogonalized. The orthogonalizing process leads to vectors that are uniquely determined to within scalar multiples.

¹⁷ In the case of a euclidean space, the angle θ between the vectors x and y is defined by the formula

$$\cos\theta = \frac{(xy)}{|x|\,|y|}.$$

¹⁸ In the case of a euclidean space, these numbers are real.


Proof. 1) Let us prove the second part of the theorem first. Suppose that two orthogonal sequences $y_1, y_2, \ldots$ (Y) and $z_1, z_2, \ldots$ (Z) are equivalent to one and the same non-degenerate sequence $x_1, x_2, \ldots$ (X). Then Y and Z are equivalent to each other. Therefore for every p there exist numbers $c_{p1}, c_{p2}, \ldots, c_{pp}$ such that

$$z_p = c_{p1}y_1 + c_{p2}y_2 + \cdots + c_{p,p-1}y_{p-1} + c_{pp}y_p \qquad (p = 1, 2, \ldots).$$

When we form the scalar products of both sides of this equation by $y_1, y_2, \ldots, y_{p-1}$ and take account of the orthogonality of Y and of the relation

$$z_p \perp [z_1, z_2, \ldots, z_{p-1}] = [y_1, y_2, \ldots, y_{p-1}],$$

we obtain $c_{p1} = c_{p2} = \cdots = c_{p,p-1} = 0$, and therefore

$$z_p = c_{pp}y_p \qquad (p = 1, 2, \ldots).$$

2) A concrete form of the orthogonalizing process for an arbitrary non-degenerate sequence of vectors $x_1, x_2, \ldots$ (X) is given by the following construction.

Let

$$S_p = [x_1, x_2, \ldots, x_p], \qquad G_p = G(x_1, x_2, \ldots, x_p) \qquad (p = 1, 2, \ldots).$$

We project the vector $x_p$ orthogonally onto the subspace $S_{p-1}$ $(p = 1, 2, \ldots)$:¹⁹

$$x_p = x_{pS_{p-1}} + x_{pN}, \qquad x_{pS_{p-1}} \in S_{p-1}, \quad x_{pN} \perp S_{p-1} \qquad (p = 1, 2, \ldots).$$

We set

$$y_p = \lambda_px_{pN} \qquad (p = 1, 2, \ldots;\ x_{1N} = x_1),$$

where the $\lambda_p$ $(p = 1, 2, \ldots)$ are arbitrary non-zero numbers. Then it is easily seen that

$$Y:\ y_1, y_2, \ldots$$

is an orthogonal sequence equivalent to X. This proves Theorem 2.

By (21),

$$x_{pN} = \frac{1}{G_{p-1}}\begin{vmatrix} (x_1x_1) & \cdots & (x_1x_{p-1}) & x_1 \\ \vdots & & \vdots & \vdots \\ (x_{p-1}x_1) & \cdots & (x_{p-1}x_{p-1}) & x_{p-1} \\ (x_px_1) & \cdots & (x_px_{p-1}) & x_p \end{vmatrix} \qquad (p = 1, 2, \ldots;\ G_0 = 1).$$

¹⁹ For p = 1 we set $x_{1S_0} = o$ and $x_{1N} = x_1$.


Setting $\lambda_p = G_{p-1}$ $(p = 1, 2, \ldots;\ G_0 = 1)$, we obtain the following formulas for the vectors of the orthogonalized sequence:

$$y_1 = x_1, \qquad y_p = \begin{vmatrix} (x_1x_1) & \cdots & (x_1x_{p-1}) & x_1 \\ \vdots & & \vdots & \vdots \\ (x_{p-1}x_1) & \cdots & (x_{p-1}x_{p-1}) & x_{p-1} \\ (x_px_1) & \cdots & (x_px_{p-1}) & x_p \end{vmatrix} \qquad (p = 2, 3, \ldots). \tag{35}$$

By (22),

$$Ny_p = G_{p-1}^2\,Nx_{pN} = G_{p-1}^2\,\frac{G_p}{G_{p-1}} = G_{p-1}G_p \qquad (p = 1, 2, \ldots;\ G_0 = 1). \tag{36}$$

Therefore, setting

$$z_p = \frac{y_p}{\sqrt{G_{p-1}G_p}} \qquad (p = 1, 2, \ldots), \tag{37}$$

we obtain an orthonormal sequence Z equivalent to the given sequence X.

we obtain an orthogonal sequence Z equivalent to the given sequence X.Example. In the space of real functions that are sectionally continuous

in the interval (-1, + 11, we define the scalar product+1

(f, g) = ff(x)g(x)dx.-1

We consider the non-degenerate sequence of `vectors'

1, x, x2,x°,....

We orthogonalize this sequence by the formulas (35)

1 01

06

0...1

yo=1, yr=0

1

3

(m =1, 2, ..).

...................

................. x'"

30

5

0 ... x

0 0 ... 0

These orthogonal polynomials coincide, apart from constant factors, withthe well-known Legendre polynomials :20

I do s_ 1)MPo(x) =1, P. (x) = 2"'m! dm'" (m 1, 2, ...).

The same sequence of powers $1, x, x^2, \ldots$ in a different metric

²⁰ See [12], p. 77ff.


$$(f, g) = \int_{a}^{b} f(x)g(x)\tau(x)\,dx$$

(where $\tau(x) \geq 0$ for $a \leq x \leq b$) gives another sequence of orthogonal polynomials.

For example, if $a = -1$, $b = 1$, and $\tau(x) = \dfrac{1}{\sqrt{1-x^2}}$, then we obtain the Tchebyshev (Chebyshev) polynomials:

$$\frac{1}{2^{n-1}}\cos(n\arccos x).$$

For $a = -\infty$, $b = +\infty$, and $\tau(x) = e^{-x^2}$ we obtain the Hermite polynomials, etc.²¹

2. We shall now take note of the so-called Bessel inequality for an orthonormal sequence of vectors $z_1, z_2, \ldots$ (Z). Let x be an arbitrary vector. We denote by $\xi_p$ the projection of x onto $z_p$:

$$\xi_p = (xz_p) \qquad (p = 1, 2, \ldots).$$

Then the projection of x onto the subspace $S_p = [z_1, z_2, \ldots, z_p]$ can be represented in the form (see (20))

$$x_{S_p} = \xi_1z_1 + \xi_2z_2 + \cdots + \xi_pz_p \qquad (p = 1, 2, \ldots).$$

But $Nx_{S_p} = |\xi_1|^2 + |\xi_2|^2 + \cdots + |\xi_p|^2 \leq Nx$. Therefore, for every p,

$$|\xi_1|^2 + |\xi_2|^2 + \cdots + |\xi_p|^2 \leq Nx. \tag{38}$$

This is Bessel's inequality. In the case of a space of finite dimension n, this inequality has a completely obvious geometrical meaning. For p = n it goes over into the theorem of Pythagoras:

$$|\xi_1|^2 + \cdots + |\xi_n|^2 = |x|^2.$$

In the case of an infinite-dimensional space and an infinite sequence Z, it follows from (38) that the series $\sum_{k=1}^{\infty} |\xi_k|^2$ converges and that

$$\sum_{k=1}^{\infty} |\xi_k|^2 \leq Nx = |x|^2.$$

Let us form the series

²¹ For further details see [12], Chapter II, § 9.


00

. kZkk..1

For every p the p-th partial sum of this series,

-11z1.+ 2x2+...+Spk'p,

is the projection x,s., of x onto the subspace

Sv - ff Lz1, z2, . . . , zo]and is therefore the best approximation to the vector x in this subspace :

PN(x-±Skzk) SN(x- ECkzk).

krl k-1

where c1, c2i ... , cp are arbitrary complex numbers. Let us calculate thecorresponding mean-square-deviation b.:

&p = N (x - X kzk) = (x - Skzk , x - G kzk) = Nx -GPI k 12 .k-i k.el k-1 k-iHence

If

00

lim 8P = Nx - i' I k L2p-.oo k-i

lim 6P= 0,P+Go

00

then we say that the series ' k xk converges in the mean (or converges withk_1

respect to the norm) to the vector x.In this case we have an equality for the vector x in R ( the theorem of

Pythagoras in an infinite-dimensional space!):

co

Nx=I X12=I1412-k-1

(39)

00

If for every vector x of R the series F k xk converges in the mean to x,k+1

then the orthonormal sequence of vectors al, Z2, ... is called complete. Inthis case, when we replace x in (39) by x + y and use (39) three times, forN (x + y), Nx, and Ny, then we easily obtain :

00

(xY) = E 4 k r l k [ k = (xxk), 2k = (yzk); k =1, 2, .-1 . (40)k-i

Page 272: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. ORTHOGONALIZATION OF SEQUENCE OF VECTORS 261

Example. We consider the space of all complex functions f(t) (t is areal variable) that are sectionally continuous in the closed interval [0, 2a].Let us define the norm of f (t) by

2n

Nf= f If (J) 12dt.0

Correspondingly, we have the formula2n

(f, g) = f f (t) 9 (t) dt0

for the scalar product of two functions f (t) and g(t).We take the infinite sequence of functions

Iekt (k-0, f1, f2, ...).

ffx-

These functions form an orthogonal sequence, because271 2,1 0, forf ei14te-'.t dt = ( ee

0 0

The series

00fketkt

k--.(/k=2hj?I(tetdt;) {k= 0> f 1, f 2, .. .

0

converges in the mean to f (t) in the interval [0, 2n]. This series is calledthe Fourier series of f (t) and the coefficients fk (k = 0, -- 1, ± 2, ...) arecalled the Fourier coefficients of f (t).

In the theory of Fourier series it is proved that the system of functionse'kt (k = 0, ± 1, ± 2, ...) is complete.22

The condition of completeness gives Parseval's equality (see (40) )

2n

0

$$\int_0^{2\pi} f(t)\overline{g(t)}\,dt = \frac{1}{2\pi}\sum_{k=-\infty}^{+\infty} \int_0^{2\pi} f(t)e^{-ikt}\,dt \int_0^{2\pi} \overline{g(t)}\,e^{ikt}\,dt.$$

If f(t) is a real function, then $f_0$ is real, and $f_k$ and $f_{-k}$ are conjugate complex numbers. Setting

$$f_k = \frac{1}{2\pi}\int_0^{2\pi} f(t)e^{-ikt}\,dt = \frac{1}{2}(a_k - ib_k),$$

²² See, for example, [12], Chapter II.


where

$$a_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\cos kt\,dt, \qquad b_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\sin kt\,dt \qquad (k = 0, 1, 2, \ldots),$$

we have

$$f_ke^{ikt} + f_{-k}e^{-ikt} = a_k\cos kt + b_k\sin kt \qquad (k = 1, 2, \ldots).$$

Therefore, for a real function f(t) the Fourier series assumes the form

$$\frac{a_0}{2} + \sum_{k=1}^{\infty}(a_k\cos kt + b_k\sin kt) \qquad \Bigl(a_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\cos kt\,dt,\quad b_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\sin kt\,dt\Bigr).$$
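An added numerical sketch of these formulas (assuming numpy; the grid size and the square-wave test function are arbitrary choices, and the helper name is made up):

```python
import numpy as np

n = 4000
t = (np.arange(n) + 0.5) * (2.0 * np.pi / n)   # midpoint grid on [0, 2*pi]
f = np.sign(np.sin(t))                          # a square wave

def fourier_coeff(k):
    """f_k = (1/2pi) * integral of f(t) e^{-ikt} dt, by the midpoint rule."""
    return np.mean(f * np.exp(-1j * k * t))

# for this real odd function a_k = 0 and b_k = 4/(pi*k) for odd k
b3 = -2.0 * fourier_coeff(3).imag               # b_k = -2 Im f_k, since f_k = (a_k - i b_k)/2
print(b3, 4.0 / (3.0 * np.pi))                  # approximately equal
```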

§ 7. Orthonormal Bases

1. A basis of any finite-dimensional subspace S in a unitary or a euclidean space R is a non-degenerate sequence of vectors and therefore, by Theorem 2 of the preceding section, can be orthogonalized and normalized. Thus: Every finite-dimensional subspace S (and, in particular, the whole space R if it is finite-dimensional) has an orthonormal basis.

Let $e_1, e_2, \ldots, e_n$ be an orthonormal basis of R. We denote by $x_1, x_2, \ldots, x_n$ the coordinates of an arbitrary vector x in this basis:

$$x = \sum_{k=1}^{n} x_ke_k.$$

Multiplying both sides of this equation on the right by $e_k$ and taking into account that the basis is orthonormal, we easily find:

$$x_k = (xe_k) \qquad (k = 1, 2, \ldots, n);$$

i.e., in an orthonormal basis the coordinates of a vector are equal to its projections onto the corresponding basis vectors:

$$x = \sum_{k=1}^{n} (xe_k)e_k. \tag{41}$$

Let $x_1, x_2, \ldots, x_n$ and $x_1', x_2', \ldots, x_n'$ be the coordinates of one and the same vector x in two different orthonormal bases $e_1, e_2, \ldots, e_n$ and $e_1', e_2', \ldots, e_n'$ of a unitary space R. The formulas for the coordinate transformation have the form


$$x_i = \sum_{k=1}^{n} u_{ik}x_k' \qquad (i = 1, 2, \ldots, n). \tag{42}$$

Here the coefficients $u_{1k}, u_{2k}, \ldots, u_{nk}$ that form the k-th column of the matrix $U = \|u_{ik}\|_1^n$ are easily seen to be the coordinates of the vector $e_k'$ in the basis $e_1, e_2, \ldots, e_n$. Therefore, when we write down the condition for the basis $e_1', e_2', \ldots, e_n'$ to be orthonormal in terms of coordinates (see (10)), we obtain the relations

$$\sum_{i=1}^{n} u_{ik}\bar{u}_{il} = \delta_{kl} = \begin{cases} 1 & \text{for } k = l, \\ 0 & \text{for } k \neq l. \end{cases} \tag{43}$$

A transformation (42) in which the coefficients satisfy the conditions (43) is called unitary, and the corresponding matrix U is called a unitary matrix. Thus: In an n-dimensional unitary space the transition from one orthonormal basis to another is effected by a unitary coordinate transformation.

Let R be an n-dimensional euclidean space. The transition from one orthonormal basis of R to another is effected by a coordinate transformation

$$x_i = \sum_{k=1}^{n} v_{ik}x_k' \qquad (i = 1, 2, \ldots, n) \tag{44}$$

whose coefficients are connected by the relation

$$\sum_{i=1}^{n} v_{ik}v_{il} = \delta_{kl} \qquad (k, l = 1, 2, \ldots, n). \tag{45}$$

Such a coordinate transformation is called orthogonal, and the corresponding matrix V is called an orthogonal matrix.

2. We note an interesting matrix method of writing the orthogonalizing process. Let $A = \|a_{ik}\|_1^n$ be an arbitrary non-singular matrix ($|A| \neq 0$) with complex elements. We consider a unitary space R with an orthonormal basis $e_1, e_2, \ldots, e_n$ and define the linearly independent vectors $a_1, a_2, \ldots, a_n$ by the equations

$$a_k = \sum_{i=1}^{n} a_{ik}e_i \qquad (k = 1, 2, \ldots, n).$$

Let us perform the orthogonalizing process on the vectors $a_1, a_2, \ldots, a_n$. The orthonormal basis of R so obtained we shall denote by $u_1, u_2, \ldots, u_n$. Suppose we have

$$u_k = \sum_{i=1}^{n} u_{ik}e_i \qquad (k = 1, 2, \ldots, n).$$


Then

$$[a_1, a_2, \ldots, a_p] = [u_1, u_2, \ldots, u_p] \qquad (p = 1, 2, \ldots, n),$$

i.e.,

$$a_1 = c_{11}u_1, \qquad a_2 = c_{12}u_1 + c_{22}u_2, \qquad \ldots, \qquad a_n = c_{1n}u_1 + c_{2n}u_2 + \cdots + c_{nn}u_n,$$

where the $c_{ik}$ $(i, k = 1, 2, \ldots, n;\ i \leq k)$ are certain complex numbers. Setting $c_{ik} = 0$ for $i > k$, we have:

$$a_k = \sum_{p=1}^{n} c_{pk}u_p \qquad (k = 1, 2, \ldots, n).$$

When we go over to coordinates and introduce the upper triangular matrix $C = \|c_{ik}\|_1^n$ and the unitary matrix $U = \|u_{ik}\|_1^n$, we obtain

$$a_{ik} = \sum_{p=1}^{n} u_{ip}c_{pk} \qquad (i, k = 1, 2, \ldots, n),$$

or

$$A = UC. \tag{*}$$

According to this formula: Every non-singular matrix $A = \|a_{ik}\|_1^n$ can be represented in the form of a product of a unitary matrix U and an upper triangular matrix C.

Since the orthogonalizing process determines the vectors $u_1, u_2, \ldots, u_n$ uniquely, apart from scalar multipliers $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ ($|\varepsilon_i| = 1$; $i = 1, 2, \ldots, n$), the factors U and C in (*) are uniquely determined apart from a diagonal factor $M = \{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n\}$:

$$U = U_1M, \qquad C = M^{-1}C_1.$$

This can also be shown directly.
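In modern terms, (*) is the QR factorization. An added sketch using numpy's built-in routine (numpy returns the factors under the names Q and R, playing the roles of U and C here):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, R = np.linalg.qr(A)
assert np.allclose(Q @ Q.conj().T, np.eye(4))   # Q is unitary
assert np.allclose(np.triu(R), R)               # R is upper triangular
assert np.allclose(Q @ R, A)                    # A = UC as in (*)
```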

Note 1. If A is a real matrix, the factors U and C in (*) can be chosen to be real. In this case, U is an orthogonal matrix.

Note 2. The formula (*) also remains valid for a singular matrix A ($|A| = 0$). This can be seen by setting $A = \lim_{m\to\infty} A_m$, where $|A_m| \neq 0$ $(m = 1, 2, \ldots)$.

Then $A_m = U_mC_m$ $(m = 1, 2, \ldots)$. When we select from the sequence $\{U_m\}$ a convergent subsequence $U_{m_p}$ ($\lim_{p\to\infty} U_{m_p} = U$) and proceed to the limit, then we obtain from the equation $A_{m_p} = U_{m_p}C_{m_p}$ for $p \to \infty$ the required decomposition $A = UC$. However, in the case $|A| = 0$ the factors U and C are no longer uniquely determined to within a diagonal factor M.


Note 3. Instead of (*) we can also obtain a formula

$$A = DW, \tag{**}$$

where D is a lower triangular matrix and W a unitary matrix. For when we apply the formula (*) that was established above to the transposed matrix $A^{\mathsf{T}}$,

$$A^{\mathsf{T}} = UC,$$

and then set $W = U^{\mathsf{T}}$, $D = C^{\mathsf{T}}$, we obtain (**).²³

§ 8. The Adjoint Operator

1. Let A be a linear operator in an n-dimensional unitary space.

DEFINITION 4: A linear operator A* is called adjoint to the operator A if and only if for any two vectors x, y of R

$$(Ax, y) = (x, A^*y). \tag{46}$$

We shall show that for every linear operator A there exists one and only one adjoint operator A*. To prove this, we take an orthonormal basis $e_1, e_2, \ldots, e_n$ in R. Then (see (41)) the required operator A* and an arbitrary vector y of R must satisfy the equation

$$A^*y = \sum_{k=1}^{n} (A^*y, e_k)e_k.$$

By (46) this can be rewritten as follows:

$$A^*y = \sum_{k=1}^{n} (y, Ae_k)e_k. \tag{47}$$

We now take (47) as the definition of an operator A*. It is easy to verify that the operator A* so defined is linear and satisfies (46) for arbitrary vectors x and y of R. Moreover, (47) determines the operator A* uniquely. Thus the existence and uniqueness of the adjoint operator A* is established.

Let A be a linear operator in a unitary space and let $A = \|a_{ik}\|_1^n$ be the corresponding matrix in an orthonormal basis $e_1, e_2, \ldots, e_n$. Then, by applying the formula (41) to the vector $Ae_k = \sum_{i=1}^{n} a_{ik}e_i$, we obtain

$$a_{ik} = (Ae_k, e_i) \qquad (i, k = 1, 2, \ldots, n). \tag{48}$$

²³ From the fact that U is unitary it follows that $U^{\mathsf{T}}$ is unitary, since the condition (43), written in matrix form $\bar{U}^{\mathsf{T}}U = E$, implies that $U\bar{U}^{\mathsf{T}} = E$.


Now let $A^* = \|a_{ik}^*\|_1^n$ be the matrix corresponding to A* in the same basis. Then, by (48),

$$a_{ik}^* = (A^*e_k, e_i) \qquad (i, k = 1, 2, \ldots, n). \tag{49}$$

From (48) and (49) it follows by (46) that

$$a_{ik}^* = \bar{a}_{ki} \qquad (i, k = 1, 2, \ldots, n),$$

i.e.,

$$A^* = \bar{A}^{\mathsf{T}}.$$

The matrix A* is the complex conjugate of the transpose of A. This matrix will be called the adjoint of A. (This is not to be confused with the adjoint of a matrix as defined on p. 82.)

Thus: In an orthonormal basis adjoint matrices correspond to adjoint operators.

The following properties of the adjoint operator follow from its definition:

1. $(A^*)^* = A$;
2. $(A + B)^* = A^* + B^*$;
3. $(\alpha A)^* = \bar{\alpha}A^*$ (α a scalar);
4. $(AB)^* = B^*A^*$.
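An added numerical check of the defining identity (46) in an orthonormal basis (assuming numpy; the helper name is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

def scalar(u, v):
    return u @ v.conj()          # (u, v) = sum of u_i * conj(v_i), formula (10)

A_star = A.conj().T              # the adjoint matrix
assert np.isclose(scalar(A @ x, y), scalar(x, A_star @ y))   # identity (46)
```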

2. We shall now introduce an important concept. Let S be an arbitrary subspace of R. We denote by T the set of all vectors y of R that are orthogonal to S. It is easy to see that T is a subspace of R and that every vector x of R can be represented uniquely in the form of a sum $x = x_S + x_T$, where $x_S \in S$, $x_T \in T$, so that we have the resolution

$$R = S + T, \qquad S \perp T.$$

We obtain this resolution by applying the decomposition (15) to the arbitrary vector x of R. T is called the orthogonal complement of S. Obviously, S is the orthogonal complement of T. We write $S \perp T$, meaning by this that each vector of S is orthogonal to every vector of T.

Now we can formulate the fundamental property of the adjoint operator:

5. If a subspace S is invariant with respect to A, then the orthogonal complement T of the subspace is invariant with respect to A*.


For let $x \in S$, $y \in T$. Then it follows from $Ax \in S$ that $(Ax, y) = 0$ and hence by (46) that $(x, A^*y) = 0$. Since x is an arbitrary vector of S, $A^*y \in T$, and this is what we had to prove.

We introduce the following definition:

DEFINITION 5: Two systems of vectors $x_1, x_2, \ldots, x_m$ and $y_1, y_2, \ldots, y_m$ are called bi-orthogonal if

$$(x_iy_k) = \delta_{ik} \qquad (i, k = 1, 2, \ldots, m), \tag{50}$$

where $\delta_{ik}$ is the Kronecker symbol.

Now we shall prove the following proposition:

6. If A is a linear operator of simple structure, then the adjoint operator A* is also of simple structure, and complete systems of characteristic vectors $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ of A and A* can be chosen such that they are bi-orthogonal:

$$Ax_i = \lambda_ix_i, \qquad A^*y_i = \mu_iy_i, \qquad (x_iy_k) = \delta_{ik} \qquad (i, k = 1, 2, \ldots, n).$$

For let $x_1, x_2, \ldots, x_n$ be a complete system of characteristic vectors of A. We use the notation

$$S_k = [x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n] \qquad (k = 1, 2, \ldots, n).$$

Consider the one-dimensional orthogonal complement $T_k = [y_k]$ to the (n-1)-dimensional subspace $S_k$ $(k = 1, 2, \ldots, n)$. Then $T_k$ is invariant with respect to A*:

$$A^*y_k = \mu_ky_k, \qquad y_k \neq o \qquad (k = 1, 2, \ldots, n).$$

From $S_k \perp y_k$ it follows that $(x_ky_k) \neq 0$, because otherwise the vector $y_k$ would have to be the null vector. Multiplying $x_k$, $y_k$ $(k = 1, 2, \ldots, n)$ by suitable numerical factors, we obtain

$$(x_iy_k) = \delta_{ik} \qquad (i, k = 1, 2, \ldots, n).$$

From the bi-orthogonality of the systems $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ it follows that the vectors of each system are linearly independent.

We mention one further proposition:

7. If the operators A and A* have a common characteristic vector, then the corresponding characteristic values are complex conjugates.

For let $Ax = \lambda x$ and $A^*x = \mu x$ $(x \neq o)$. Then, setting y = x in (46), we have $\lambda(x, x) = \bar{\mu}(x, x)$ and hence $\lambda = \bar{\mu}$.


§ 9. Normal Operators in a Unitary Space

1. DEFINITION 6: A linear operator A is called normal if it commutes with its adjoint:

$$AA^* = A^*A. \tag{51}$$

DEFINITION 7: A linear operator H is called hermitian if it is equal to its adjoint:

$$H^* = H. \tag{52}$$

DEFINITION 8: A linear operator U is called unitary if it is inverse to its adjoint:

$$UU^* = E. \tag{53}$$

Note that a unitary operator can be regarded as an isometric operator in a hermitian space, i.e., as an operator preserving the metric. For suppose that for arbitrary vectors x and y of R

$$(Ux, Uy) = (x, y). \tag{54}$$

Then by (46)

$$(U^*Ux, y) = (x, y),$$

and therefore, since y is arbitrary,

$$U^*Ux = x,$$

i.e., $U^*U = E$, or $U^* = U^{-1}$. Conversely, (53) implies (54).

From (53) and (54) it follows that 1. the product of two unitary operators is itself a unitary operator, 2. the unit operator E is unitary, and 3. the inverse of a unitary operator is also unitary. Therefore the set of all unitary operators is a group.²⁴ This is called the unitary group.

Hermitian operators and unitary operators are special cases of a normal operator.

2. We haveTuEon 3: Every linear operator A can be represented in the form

A=H1+iH2, (55)

where H1 and H2 are hermitian operators (the' hermitian components' of A).The hermitian components are uniquely determined by A. The operator Ais normal if and only if its hermitian components Hl and H2 are permutable.

24 See footnote 13 on p. 18.


Proof. Suppose that (55) holds. Then

A* = H1 − iH2. (56)

From (55) and (56) we have:

H1 = ½(A + A*), H2 = 1/(2i) (A − A*). (57)

Conversely, the formulas (57) define hermitian operators H1 and H2 connected with A by (55).

Now let A be a normal operator: AA* = A*A. Then it follows from (57) that H1H2 = H2H1. Conversely, from H1H2 = H2H1 it follows by (55) and (56) that AA* = A*A. This completes the proof.

The representation of an arbitrary linear operator A in the form (55) is an analogue of the representation of a complex number z in the form x1 + ix2, where x1 and x2 are real.
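A short numerical sketch of Theorem 3 (ours; the matrix A below is an arbitrary example and numpy is assumed):

```python
# Hermitian components of A, formula (57), and the normality criterion.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

H1 = (A + A.conj().T) / 2          # hermitian component, formula (57)
H2 = (A - A.conj().T) / (2j)       # hermitian component, formula (57)

assert np.allclose(H1, H1.conj().T) and np.allclose(H2, H2.conj().T)
assert np.allclose(A, H1 + 1j * H2)                       # formula (55)

# A is normal exactly when H1 and H2 commute:
is_normal = np.allclose(A @ A.conj().T, A.conj().T @ A)
components_commute = np.allclose(H1 @ H2, H2 @ H1)
assert is_normal == components_commute
```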

Suppose that in some orthonormal basis the operators A, H, and U correspond to the matrices A, H, and U. Then the operator equations

AA* = A*A, H* = H, UU* = E (58)

correspond to the matrix equations

AA* = A*A, H* = H, UU* = E. (59)

Therefore we define a matrix as normal if it commutes with its adjoint, as hermitian if it is equal to its adjoint, and finally as unitary if it is inverse to its adjoint.

Then: In an orthonormal basis a normal (hermitian, unitary) operator corresponds to a normal (hermitian, unitary) matrix.

A hermitian matrix H = ‖hik‖₁ⁿ is, by (59), characterized by the following relation among its elements:

hik = h̄ki (i, k = 1, 2, ..., n),

i.e., a hermitian matrix is always the coefficient matrix of some hermitian form (see § 1).

A unitary matrix U = ‖uik‖₁ⁿ is, by (59), characterized by the following relations among its elements:

 n
 Σ uij ūkj = δik (i, k = 1, 2, ..., n). (60)
j=1


Since UU* = E implies that U*U = E, from (60) there follow the equivalent relations:

 n
 Σ uji ūjk = δik (i, k = 1, 2, ..., n). (61)
j=1

Equation (60) expresses the 'orthonormality' of the rows and equation (61) that of the columns of the matrix U = ‖uik‖₁ⁿ.25

A unitary matrix is the coefficient matrix of some unitary transformation (see § 7).

§ 10. The Spectra of Normal, Hermitian, and Unitary Operators

1. As a preliminary, we establish a property of permutable operators in the form of a lemma.

LEMMA 1: Permutable operators A and B (AB = BA) always have a common characteristic vector.

Proof. Let x be a characteristic vector of A: Ax = λx, x ≠ o. Then, since A and B are permutable,

ABᵏx = λBᵏx (k = 0, 1, 2, ...).

Suppose that in the sequence of vectors

x, Bx, B²x, ... (62)

the first p are linearly independent, while the (p + 1)-th vector Bᵖx is a linear combination of the preceding ones. Then S = [x, Bx, ..., Bᵖ⁻¹x] is a subspace invariant with respect to B, so that in this subspace S there exists a characteristic vector y of B: By = μy, y ≠ o. On the other hand, (62) shows that the vectors x, Bx, ..., Bᵖ⁻¹x are characteristic vectors of A corresponding to one and the same characteristic value λ. Therefore every linear combination of these vectors, and in particular y, is a characteristic vector of A corresponding to λ. Thus we have proved the existence of a common characteristic vector of the operators A and B.

Let A be an arbitrary normal operator in an n-dimensional hermitian space R. In that case A and A* are permutable and therefore have a common characteristic vector x1. Then (see § 8, 7.)

25 Thus, orthonormality of the columns of the matrix U is a consequence of the orthonormality of the rows, and vice versa.


Ax1 = λ1x1, A*x1 = λ̄1x1 (x1 ≠ o).

We denote by S1 the one-dimensional subspace containing the vector x1 (S1 = [x1]) and by T1 the orthogonal complement of S1 in R:

R = S1 + T1, S1 ⊥ T1.

Since S1 is invariant with respect to A and A*, T1 is also invariant with respect to these operators (see § 8, 5.). Therefore, by Lemma 1, the permutable operators A and A* have a common characteristic vector x2 in T1:

Ax2 = λ2x2, A*x2 = λ̄2x2 (x2 ≠ o).

Obviously, x1 ⊥ x2. Setting S2 = [x1, x2] and

R = S2 + T2, S2 ⊥ T2,

we establish in a similar way the existence of a common characteristic vector x3 of A and A* in T2. Obviously x1 ⊥ x3 and x2 ⊥ x3. Continuing this process, we obtain n pairwise orthogonal common characteristic vectors x1, x2, ..., xn of A and A*:

Axk = λkxk, A*xk = λ̄kxk (xk ≠ o), (xi, xk) = 0 for i ≠ k (i, k = 1, 2, ..., n). (63)

The vectors x1, x2, ..., xn can be normalized without violating (63). Thus we have proved that a normal operator always has a complete orthonormal system of characteristic vectors.26

Since λk = λl always implies λ̄k = λ̄l, it follows from (63) that:

1. If A is a normal operator, every characteristic vector of A is a characteristic vector of the adjoint operator A*, i.e., if A is a normal operator, then A and A* have the same characteristic vectors.

Suppose now, conversely, that a linear operator A has a complete orthonormal system of characteristic vectors:

Axk = λkxk, (xi, xk) = δik (i, k = 1, 2, ..., n).

We shall show that A is then a normal operator. For let us set:

yi = A*xi − λ̄ixi.

Then

(xk, yi) = (xk, A*xi) − λi(xk, xi) = (Axk, xi) − λi(xk, xi) = (λk − λi)δki = 0 (k, i = 1, 2, ..., n).

Hence it follows that

26 Here, and in what follows, we mean by a complete orthonormal system of vectors an orthonormal system of n vectors, where n is the dimension of the space.


yi = A*xi − λ̄ixi = o (i = 1, 2, ..., n),

i.e., that (63) holds. But then

AA*xk = λkλ̄kxk and A*Axk = λ̄kλkxk (k = 1, 2, ..., n),

or

AA* = A*A.

Thus we have obtained the following 'internal' (spectral) characterization of a normal operator A (apart from the 'external' one: AA* = A*A):

THEOREM 4: A linear operator is normal if and only if it has a complete orthonormal system of characteristic vectors.

In particular, we have shown that a normal operator is always of simple structure.

Let A be a normal operator with the characteristic values λ1, λ2, ..., λn. Using the Lagrange interpolation formula, we define two polynomials p(λ) and q(λ) by the conditions

p(λk) = λ̄k, q(λ̄k) = λk (k = 1, 2, ..., n).

Then by (63)

A* = p(A), A = q(A*); (64)

i.e.:

2. If A is a normal operator, then each of the operators A and A* can be represented as a polynomial in the other; these two polynomials are determined by the characteristic values of A.
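Proposition 2 can be illustrated numerically. The sketch below (ours; it assumes numpy, a normal test matrix with distinct characteristic values, and builds p by Lagrange interpolation) checks that p(A) = A*:

```python
# For a normal A with distinct characteristic values lambda_k, a polynomial p
# with p(lambda_k) = conj(lambda_k) satisfies p(A) = A*.
import numpy as np

rng = np.random.default_rng(2)
lam = np.array([1 + 1j, 2 - 1j, 3j])         # chosen distinct
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
A = Q @ np.diag(lam) @ Q.conj().T            # normal: A = U diag(lam) U*

def p(M):
    """Lagrange interpolation polynomial through (lambda_k, conj(lambda_k))."""
    result = np.zeros_like(M)
    for k, lk in enumerate(lam):
        term = lam[k].conj() * np.eye(3, dtype=complex)
        for j, lj in enumerate(lam):
            if j != k:
                term = term @ (M - lj * np.eye(3)) / (lk - lj)
        result = result + term
    return result

assert np.allclose(p(A), A.conj().T)         # A* = p(A), formula (64)
```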

Let S be an invariant subspace of R for a normal operator A and let R = S + T, S ⊥ T. Then by § 8, 5., the subspace T is invariant with respect to A*. But A = q(A*), where q(λ) is a polynomial. Therefore T is also invariant with respect to A. Thus:

3. If S is an invariant subspace with respect to a normal operator A and T is the orthogonal complement of S, then T is also an invariant subspace for A.

2. Let us now discuss the spectrum of a hermitian operator. Since a hermitian operator H is a special form of a normal operator, by what we have proved it has a complete orthonormal system of characteristic vectors:

Hxk = λkxk, (xk, xl) = δkl (k, l = 1, 2, ..., n). (65)

From H* = H it follows that

λk = λ̄k (k = 1, 2, ..., n), (66)


i.e., all the characteristic values of a hermitian operator H are real.

It is not difficult to see that, conversely, a normal operator with real characteristic values is always hermitian. For from (65), (66), and

H*xk = λ̄kxk (k = 1, 2, ..., n)

it follows that

H*xk = Hxk (k = 1, 2, ..., n),

i.e.,

H* = H.

We have obtained the following 'internal' characterization of a hermitian operator (apart from the 'external' one: H* = H):

THEOREM 5: A linear operator H is hermitian if and only if it has a complete orthonormal system of characteristic vectors with real characteristic values.

Let us now discuss the spectrum of a unitary operator. Since a unitary operator U is normal, it has a complete orthonormal system of characteristic vectors

Uxk = λkxk, (xk, xl) = δkl (k, l = 1, 2, ..., n), (67)

where

U*xk = λ̄kxk (k = 1, 2, ..., n). (68)

From UU* = E we find:

λkλ̄k = 1. (69)

Conversely, from (67), (68), and (69) it follows that UU* = E. Thus, among the normal operators a unitary operator is distinguished by the fact that all its characteristic values have modulus 1.

We have thus obtained the following 'internal' characterization of a unitary operator (apart from the 'external' one: UU* = E):

THEOREM 6: A linear operator is unitary if and only if it has a complete orthonormal system of characteristic vectors with characteristic values of modulus 1.

Since in an orthonormal basis a normal (hermitian, unitary) matrix corresponds to a normal (hermitian, unitary) operator, we obtain the following propositions:

THEOREM 4': A matrix A is normal if and only if it is unitarily similar to a diagonal matrix:

A = U ‖λiδik‖₁ⁿ U⁻¹ (U* = U⁻¹). (70)


THEOREM 5': A matrix H is hermitian if and only if it is unitarily similar to a diagonal matrix with real diagonal elements:

H = U ‖λiδik‖₁ⁿ U⁻¹ (U* = U⁻¹; λ̄i = λi; i = 1, 2, ..., n). (71)

THEOREM 6': A matrix U is unitary if and only if it is unitarily similar to a diagonal matrix with diagonal elements of modulus 1:

U = U1 ‖λiδik‖₁ⁿ U1⁻¹ (U1* = U1⁻¹; |λi| = 1; i = 1, 2, ..., n). (72)
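Theorem 5' is exactly what a numerical eigensolver for hermitian matrices produces. A small sketch (ours; numpy assumed, with an arbitrary test matrix):

```python
# numpy's eigh realizes the unitary diagonalization (71) of a hermitian matrix.
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = (B + B.conj().T) / 2                 # a hermitian test matrix

lam, U = np.linalg.eigh(H)               # H = U diag(lam) U^{-1}
assert np.allclose(U @ U.conj().T, np.eye(4))        # U* = U^{-1}
assert np.allclose(lam.imag, 0)                      # real characteristic values
assert np.allclose(U @ np.diag(lam) @ U.conj().T, H) # formula (71)
```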

§ 11. Positive-Semidefinite and Positive-Definite Hermitian Operators

1. We introduce the following definition:

DEFINITION 9: A hermitian operator H is called positive semidefinite if for every vector x of R

(Hx, x) ≥ 0,

and positive definite if for every vector x ≠ o of R

(Hx, x) > 0.

If a vector x is given by its coordinates x1, x2, ..., xn in an arbitrary orthonormal basis, then (Hx, x), as is easy to see, is a hermitian form in the variables x1, x2, ..., xn; and to a positive-semidefinite (positive-definite) operator there corresponds a positive-semidefinite (positive-definite) hermitian form (see § 1).

We choose an orthonormal basis x1, x2, ..., xn of characteristic vectors of H:

Hxk = λkxk, (xk, xl) = δkl (k, l = 1, 2, ..., n). (73)

Then, setting x = ξ1x1 + ξ2x2 + ⋯ + ξnxn, we have

(Hx, x) = λ1|ξ1|² + λ2|ξ2|² + ⋯ + λn|ξn|².

Hence we easily deduce the 'internal' characterizations of positive-semidefinite and positive-definite operators:

THEOREM 7: A hermitian operator is positive semidefinite (positive definite) if and only if all its characteristic values are non-negative (positive).


From what we have shown, it follows that a positive-definite hermitian operator is non-singular and positive semidefinite.

Let H be a positive-semidefinite hermitian operator. The equation (73) holds for H with λk ≥ 0 (k = 1, 2, ..., n). We set ρk = √λk ≥ 0 (k = 1, 2, ..., n) and define a linear operator F by the equation

Fxk = ρkxk (k = 1, 2, ..., n). (74)

Then F is also a positive-semidefinite operator and

F² = H. (75)

We shall call the positive-semidefinite hermitian operator F connected with H by (75) the arithmetical square root of H and shall denote it by

F = √H.

If H is positive definite, then F is also positive definite.

We define the Lagrange interpolation polynomial g(λ) by the equations

g(λk) = ρk (= √λk) (k = 1, 2, ..., n). (76)

Then from (73), (74), and (76) it follows that:

F = g(H). (77)

The latter equation shows that √H is a polynomial in H and is uniquely determined when the positive-semidefinite hermitian operator H is given (the coefficients of g(λ) depend on the characteristic values of H).

2. Examples of positive-semidefinite hermitian operators are AA* and A*A, where A is an arbitrary linear operator in the given space. Indeed, for an arbitrary vector x,

(AA*x, x) = (A*x, A*x) ≥ 0, (A*Ax, x) = (Ax, Ax) ≥ 0.

If A is non-singular, then AA* and A*A are positive-definite hermitian operators.

The operators AA* and A*A are sometimes called the left norm and right norm of A. √(AA*) and √(A*A) are called the left modulus and right modulus of A.

For a normal operator the left and right norms, and hence the left and right moduli, are equal.27

27 For a detailed study of normal operators, see [168]. In this paper necessary and sufficient conditions for the product of two normal operators to be normal are established.
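The arithmetical square root and the left modulus can be computed directly from the spectral decomposition (73)-(74). A sketch (ours; numpy assumed, with an arbitrary test matrix):

```python
# sqrt(H) of a positive-semidefinite H, and the left modulus sqrt(AA*) of A.
import numpy as np

def arithmetical_sqrt(H):
    """Positive-semidefinite square root F with F @ F = H."""
    lam, U = np.linalg.eigh(H)               # lam >= 0 for semidefinite H
    return U @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ U.conj().T

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

left_norm = A @ A.conj().T                   # positive definite if A non-singular
F = arithmetical_sqrt(left_norm)             # left modulus of A
assert np.allclose(F @ F, left_norm)
assert np.allclose(F, F.conj().T)
```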


§ 12. Polar Decomposition of a Linear Operator in a Unitary Space. Cayley's Formulas

1. We shall prove the following theorem:28

THEOREM 8: Every linear operator A in a unitary space can be represented in the forms

A = HU, (78)
A = U1H1, (79)

where H, H1 are positive-semidefinite hermitian operators and U, U1 are unitary operators. A is normal if and only if in (78) (or (79)) the factors H and U (H1 and U1) are permutable.

Proof. From (78) and (79) it follows that H and H1 are the left and right moduli, respectively, of A. For

AA* = HUU*H = H², A*A = H1U1*U1H1 = H1².

Note that it is sufficient to establish (78), since by applying this decomposition to A* we obtain A* = HU and hence

A = U⁻¹H,

i.e., the decomposition (79) for A.

We begin by establishing (78) in the special case where A is non-singular (|A| ≠ 0). We set:

H = √(AA*) (here |H| ≠ 0, since |H|² = |AA*| ≠ 0), U = H⁻¹A

and verify that U is unitary:

UU* = H⁻¹AA*H⁻¹ = H⁻¹H²H⁻¹ = E.

Note that in this case not only the first factor H in (78), but also the second factor U is uniquely determined by the non-singular operator A.

We now consider the general case where A may be singular.

First of all we observe that a complete orthonormal system of characteristic vectors of the right norm of A is always transformed by A into an orthogonal system of vectors. For let

A*Axk = ρk²xk [(xk, xl) = δkl, ρk ≥ 0; k, l = 1, 2, ..., n].

Then

(Axk, Axl) = (A*Axk, xl) = ρk²(xk, xl) = 0 (k ≠ l).

28 See [168], p. 77.


Here

|Axk|² = (Axk, Axk) = ρk² (k = 1, 2, ..., n).

Therefore there exists an orthonormal system of vectors z1, z2, ..., zn such that

Axk = ρkzk [(zk, zl) = δkl; k, l = 1, 2, ..., n]. (80)

We define linear operators H and U by the equations

Uxk = zk, Hzk = ρkzk. (81)

From (80) and (81) we find:

A = HU.

Here H is, by (81), a positive-semidefinite hermitian operator, because it has a complete orthonormal system of characteristic vectors z1, z2, ..., zn with non-negative characteristic values ρ1, ρ2, ..., ρn; and U is a unitary operator, because it carries the orthonormal system of vectors x1, x2, ..., xn into the orthonormal system z1, z2, ..., zn.

Thus we can take it as proved that an arbitrary linear operator A has decompositions (78) and (79), that the hermitian factors H and H1 are always uniquely determined by A (they are the left and right moduli of A, respectively) and that the unitary factors U and U1 are uniquely determined only when A is non-singular.
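Numerically, the polar decomposition (78) is conveniently obtained from the singular value decomposition, which encodes the same data. A sketch (ours; numpy assumed):

```python
# Polar decomposition A = HU built from the SVD A = W diag(s) Vh.
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

W, s, Vh = np.linalg.svd(A)
H = W @ np.diag(s) @ W.conj().T      # left modulus sqrt(AA*)
U = W @ Vh                           # unitary factor

assert np.allclose(A, H @ U)                         # A = HU, formula (78)
assert np.allclose(U @ U.conj().T, np.eye(3))
assert np.allclose(H @ H, A @ A.conj().T)            # H^2 = AA*, formula (82)
```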

From (78) we find easily:

AA* = H², A*A = U⁻¹H²U. (82)

If A is a normal operator (AA* = A*A), then it follows from (82) that

H²U = UH². (83)

Since H = √(H²) = g(H²) (see § 11), (83) shows that U and H commute. Conversely, if H and U commute, then it follows from (82) that A is normal. This completes the proof of the theorem.29

29 If the characteristic values λ1, λ2, ..., λn and ρ1, ρ2, ..., ρn of the linear operator A and its left modulus H = √(AA*) (by (82), ρ1, ρ2, ..., ρn are also the characteristic values of the right modulus H1 = √(A*A)) are so numbered that

|λ1| ≥ |λ2| ≥ ⋯ ≥ |λn|, ρ1 ≥ ρ2 ≥ ⋯ ≥ ρn,

then (see [379], or [153] and [296]) the following inequality of Weyl holds:

|λ1| ≤ ρ1, |λ1| + |λ2| ≤ ρ1 + ρ2, ..., |λ1| + ⋯ + |λn| ≤ ρ1 + ⋯ + ρn.


It is hardly necessary to mention that together with the operator equations (78) and (79) the corresponding matrix equations hold.

The decompositions (78) and (79) are analogues of the representation of a complex number z in the form z = ru, where r = |z| and |u| = 1.

2. Now let x1, x2, ..., xn be a complete orthonormal system of characteristic vectors of the arbitrary unitary operator U. Then

Uxk = e^{ifk}xk, (xk, xl) = δkl (k, l = 1, 2, ..., n), (84)

where the fk (k = 1, 2, ..., n) are real numbers. We define a hermitian operator F by the equations

Fxk = fkxk (k = 1, 2, ..., n). (85)

From (84) and (85) it follows that:30

U = e^{iF}. (86)

Thus, a unitary operator U is always representable in the form (86), where F is a hermitian operator. Conversely, if F is a hermitian operator, then U = e^{iF} is unitary.
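Formula (86) is easy to check numerically by exponentiating a hermitian F through its spectral decomposition. A sketch (ours; numpy assumed, arbitrary test matrix):

```python
# Exponentiating a hermitian F spectrally yields a unitary U = e^{iF}.
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
F = (B + B.conj().T) / 2                     # hermitian

f, V = np.linalg.eigh(F)                     # F = V diag(f) V*, f real
U = V @ np.diag(np.exp(1j * f)) @ V.conj().T # U = e^{iF}

assert np.allclose(U @ U.conj().T, np.eye(3))          # U is unitary
assert np.allclose(np.abs(np.linalg.eigvals(U)), 1)    # modulus-1 spectrum
```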

The decompositions (78) and (79) together with (86) give the following equations:

A = He^{iF}, (87)

A = e^{iF1}H1, (88)

where H, F, H1, and F1 are hermitian operators, with H and H1 positive semidefinite.

The decompositions (87) and (88) are analogues of the representation of a complex number z in the form z = re^{iφ}, where r ≥ 0 and φ are real numbers.

Note. In (86), the operator F is not uniquely determined by U. For F is defined by means of the numbers fk (k = 1, 2, ..., n) and we can add to each of these numbers an arbitrary multiple of 2π without changing the original equations (84). By choosing these multiples of 2π suitably we can assume that e^{ifk} = e^{ifl} always implies that fk = fl (1 ≤ k, l ≤ n). Then we can determine the interpolation polynomial g(λ) by the equations

g(e^{ifk}) = fk (k = 1, 2, ..., n). (89)

30 e^{iF} = r(F), where r(λ) is the Lagrange interpolation polynomial for the function e^{iλ} at the places f1, f2, ..., fn.


From (84), (85), and (89) it follows that

F = g(U) = g(e^{iF}). (90)

Similarly we can normalize the choice of F1 so that

F1 = h(U1) = h(e^{iF1}), (91)

where h(λ) is a polynomial.

By (90) and (91), the permutability of H and U (H1 and U1) implies that of H and F (H1 and F1), and vice versa. Therefore, by Theorem 8, A is normal if and only if in (87) H and F (or, in (88), H1 and F1) are permutable, provided the characteristic values of F (or F1) are suitably normalized.

The formula (86) is based on the fact that the functional dependence

μ = e^{if} (92)

carries n arbitrary numbers f1, f2, ..., fn on the real axis into certain numbers μ1, μ2, ..., μn on the unit circle |μ| = 1, and vice versa.

The transcendental dependence (92) can be replaced by the rational dependence

μ = (1 + if)/(1 − if), (93)

which carries the real axis f = f̄ into the circle |μ| = 1; here the point at infinity on the real axis goes over into the point μ = −1. From (93) we find:

f = i(1 − μ)/(1 + μ). (94)

Repeating the arguments which have led us to the formula (86), we obtain from (93) and (94) the pair of inverse formulas:

U = (E + iF)(E − iF)⁻¹,
F = i(E − U)(E + U)⁻¹. (95)

We have thus obtained Cayley's formulas. These formulas establish a one-to-one correspondence between arbitrary hermitian operators F and those unitary operators U that do not have the characteristic value −1.31

31 The exceptional value −1 can be replaced by any number μ0 (|μ0| = 1). For this purpose, we have to take instead of (93) a fractional-linear function mapping the real axis f = f̄ onto the circle |μ| = 1 and carrying the point f = ∞ into μ = μ0. The formulas (94) and (95) can be modified correspondingly.


The formulas (86), (87), (88), and (95) are obviously valid when we replace all the operators by the corresponding matrices.
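A sketch of Cayley's formulas (95) for matrices (ours; numpy assumed, arbitrary hermitian test matrix):

```python
# A hermitian F maps to a unitary U without the characteristic value -1, and back.
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
F = (B + B.conj().T) / 2
E = np.eye(3)

U = (E + 1j * F) @ np.linalg.inv(E - 1j * F)
assert np.allclose(U @ U.conj().T, E)                 # U is unitary

F_back = 1j * (E - U) @ np.linalg.inv(E + U)
assert np.allclose(F_back, F)                         # (95) inverts itself
```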

§ 13. Linear Operators in a Euclidean Space

1. We consider an n-dimensional euclidean space R. Let A be a linear operator in R.

DEFINITION 10: The linear operator Aᵀ is called the transposed operator of A (or the transpose of A) if for any two vectors x and y of R:

(Ax, y) = (x, Aᵀy). (96)

The existence and uniqueness of the transposed operator is established in exactly the same way as was done in § 8 for the adjoint operator in a unitary space.

The transposed operator has the following properties:

1. (Aᵀ)ᵀ = A,
2. (A + B)ᵀ = Aᵀ + Bᵀ,
3. (αA)ᵀ = αAᵀ (α a real number),
4. (AB)ᵀ = BᵀAᵀ.

We introduce a number of definitions.

DEFINITION 11: A linear operator A is called normal if

AAᵀ = AᵀA.

DEFINITION 12: A linear operator S is called symmetric if

Sᵀ = S.

DEFINITION 13: A symmetric operator S is called positive semidefinite if for every vector x of R

(Sx, x) ≥ 0.

DEFINITION 14: A symmetric operator S is called positive definite if for every vector x ≠ o of R

(Sx, x) > 0.

DEFINITION 15: A linear operator K is called skew-symmetric if

Kᵀ = −K.


An arbitrary linear operator A can always be represented uniquely in the form

A = S + K, (97)

where S is symmetric and K is skew-symmetric. For it follows from (97) that

Aᵀ = S − K. (98)

From (97) and (98) we have:

S = ½(A + Aᵀ), K = ½(A − Aᵀ). (99)

Conversely, (99) defines a symmetric operator S and a skew-symmetric operator K for which (97) holds.

S and K are called respectively the symmetric component and the skew-symmetric component of A.
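A minimal numerical sketch of formula (99) (ours; numpy assumed):

```python
# Splitting a real matrix into symmetric and skew-symmetric components.
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4))

S = (A + A.T) / 2                    # symmetric component
K = (A - A.T) / 2                    # skew-symmetric component

assert np.allclose(S, S.T) and np.allclose(K, -K.T)
assert np.allclose(A, S + K)         # formula (97)
```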

DEFINITION 16: An operator Q is called orthogonal if it preserves the metric of the space, i.e., if for any two vectors x, y of R

(Qx, Qy) = (x, y). (100)

By (96), equation (100) can be written as: (x, QᵀQy) = (x, y). Hence

QᵀQ = E. (101)

Conversely, (101) implies (100) (for arbitrary vectors x, y).32 From (101) it follows that |Q|² = 1, i.e.,

|Q| = ±1.

We shall call Q an orthogonal operator of the first kind (or proper) if |Q| = 1 and of the second kind (or improper) if |Q| = −1.

Symmetric, skew-symmetric, and orthogonal operators are special forms of a normal operator.

We consider an arbitrary orthonormal basis in the given euclidean space. Suppose that in this basis A corresponds to the matrix A = ‖aik‖₁ⁿ (here all the aik are real numbers). The reader will have no difficulty in showing that the transposed operator Aᵀ corresponds in this basis to the transposed matrix Aᵀ = ‖aᵀik‖₁ⁿ, where aᵀik = aki (i, k = 1, 2, ..., n). Hence it follows that in an orthonormal basis a normal operator A corresponds to a normal

32 The orthogonal operators in a euclidean space form a group, the so-called orthogonal group.


matrix A (AAᵀ = AᵀA), a symmetric operator S to a symmetric matrix S = ‖sik‖₁ⁿ (Sᵀ = S), a skew-symmetric operator K to a skew-symmetric matrix K = ‖kik‖₁ⁿ (Kᵀ = −K) and, finally, an orthogonal operator Q to an orthogonal matrix Q = ‖qik‖₁ⁿ (QQᵀ = E).33

Just as was done in § 8 for the adjoint operator, we can here make the following statement for the transposed operator:

If a subspace S of R is invariant with respect to a linear operator A, then the orthogonal complement T of S in R is invariant with respect to Aᵀ.

2. For the study of linear operators in a euclidean space R, we extend R to a unitary space R̃. This extension is made in the following way:

1. The vectors of R are called 'real' vectors.

2. We introduce 'complex' vectors u = x + iy, where x and y are real, i.e., x ∈ R, y ∈ R.

3. The operations of addition of complex vectors and of multiplication by a complex number are defined in the natural way. Then the set of all complex vectors forms an n-dimensional vector space R̃ over the field of complex numbers which contains R as a subspace.

4. In R̃ we introduce a hermitian metric such that in R it coincides with the existing euclidean metric. The reader can easily verify that the required hermitian metric is given in the following way:

If z = x + iy, w = u + iv (x, y, u, v ∈ R), then

(z, w) = (x, u) + (y, v) + i[(y, u) − (x, v)].

Setting z̄ = x − iy and w̄ = u − iv, we have

(z̄, w̄) = \overline{(z, w)}.

If we choose a real basis, i.e., a basis of R, then R̃ will be the set of all vectors with complex coordinates and R the set of all vectors with real coordinates in this basis.

Every linear operator A in R extends uniquely to a linear operator in R̃:

A(x + iy) = Ax + iAy.

3. Among all the linear operators of R̃ those that are obtainable as the result of such an extension of operators of R can be characterized by the fact that they carry R into R (AR ⊆ R). These operators are called real.

33 The papers [138], [262a], [170b] are devoted to the study of the structure of orthogonal matrices. Orthogonal matrices, like orthogonal operators, are called proper and improper according as |Q| = +1 or |Q| = −1.


In a real basis real operators are determined by real matrices, i.e., matrices with real elements.

A real operator A carries conjugate complex vectors z = x + iy, z̄ = x − iy (x, y ∈ R) into conjugate complex vectors:

Az = Ax + iAy, Az̄ = Ax − iAy (Ax, Ay ∈ R).

The secular equation of a real operator has real coefficients, so that when it has a root λ of multiplicity p it also has the root λ̄ with the multiplicity p. From Az = λz it follows that Az̄ = λ̄z̄, i.e., to conjugate characteristic values there correspond conjugate characteristic vectors.34

The two-dimensional space [z, z̄] has a real basis:

x = ½(z + z̄), y = 1/(2i) (z − z̄).

We shall call the plane in R spanned by this basis an invariant plane of A corresponding to the pair of characteristic values λ, λ̄.

Let λ = μ + iν. Then it is easy to see that

Ax = μx − νy,
Ay = νx + μy.

We consider a real operator A of simple structure with the characteristic values:

λ2k−1 = μk + iνk, λ2k = μk − iνk, λl = μl (k = 1, 2, ..., q; l = 2q + 1, ..., n),

where μk, νk, μl are real and νk ≠ 0 (k = 1, 2, ..., q).

Then the characteristic vectors z1, z2, ..., zn corresponding to these characteristic values can be chosen such that

z2k−1 = xk + iyk, z2k = xk − iyk, zl = xl (k = 1, 2, ..., q; l = 2q + 1, ..., n). (102)

The vectors

x1, y1, x2, y2, ..., xq, yq, x2q+1, ..., xn (103)

form a basis of the euclidean space R. Here

34 If to the characteristic value λ of the real operator A there correspond the linearly independent characteristic vectors z1, z2, ..., zp, then to the characteristic value λ̄ there correspond the linearly independent characteristic vectors z̄1, z̄2, ..., z̄p.


Axk = μkxk − νkyk, Ayk = νkxk + μkyk (k = 1, 2, ..., q),
Axl = μlxl (l = 2q + 1, ..., n). (104)

In the basis (103) there corresponds to the operator A the real quasi-diagonal matrix

{ (μ1 ν1; −ν1 μ1), ..., (μq νq; −νq μq), μ2q+1, ..., μn } (105)

(here and below (a b; c d) denotes the two-rowed block with rows (a, b) and (c, d)).

Thus: For every operator A of simple structure in a euclidean space there exists a basis in which A corresponds to a matrix of the form (105). Hence it follows that: A real matrix of simple structure is real-similar to a canonical matrix of the form (105):

A = T { (μ1 ν1; −ν1 μ1), ..., (μq νq; −νq μq), μ2q+1, ..., μn } T⁻¹ (T̄ = T). (106)

The transposed operator Aᵀ of A in R upon extension becomes the adjoint operator A* of A in R̃. Therefore: Normal, symmetric, skew-symmetric, and orthogonal operators in R after the extension become normal, hermitian, hermitian multiplied by i, and unitary real operators in R̃.

It is easy to show that for a normal operator A in a euclidean space a canonical basis (103) for which (104) holds can be chosen orthonormal.35 Therefore a real normal matrix is always real-similar and orthogonally-similar to a matrix of the form (105):

A = Q { (μ1 ν1; −ν1 μ1), ..., (μq νq; −νq μq), μ2q+1, ..., μn } Q⁻¹ (Q̄ = Q = (Qᵀ)⁻¹). (107)

All the characteristic values of a symmetric operator S in a euclidean space are real, since after the extension the operator becomes hermitian. For a symmetric operator S we must set q = 0 in (104). Then we obtain:

Sxk = μkxk [(xk, xl) = δkl; k, l = 1, 2, ..., n]. (108)

A symmetric operator S in a euclidean space always has an orthonormal system of characteristic vectors with real characteristic values.36 Therefore:

35 The orthonormality of the basis (102) in the hermitian metric implies the orthonormality of the basis (103) in the corresponding euclidean metric.

36 The symmetric operator S is positive semidefinite if in (108) all μk ≥ 0 and positive definite if all μk > 0.


A real symmetric matrix is always real-similar and orthogonally-similar to a diagonal matrix:

S = Q { μ1, μ2, ..., μn } Q⁻¹ (Q̄ = Q = (Qᵀ)⁻¹). (109)

All the characteristic values of a skew-symmetric operator K in a euclidean space are pure imaginary (after the extension the operator is i times a hermitian operator). For a skew-symmetric operator we must set in (104):

μ1 = μ2 = ⋯ = μq = μ2q+1 = ⋯ = μn = 0;

then the formulas assume the form

Kxk = −νkyk, Kyk = νkxk (k = 1, 2, ..., q),
Kxl = o (l = 2q + 1, ..., n). (110)

Since K is a normal operator, the basis (103) can be assumed to be orthonormal. Thus: Every real skew-symmetric matrix is real-similar and orthogonally-similar to a canonical skew-symmetric matrix:

K = Q { (0 ν1; −ν1 0), ..., (0 νq; −νq 0), 0, ..., 0 } Q⁻¹ (Q̄ = Q = (Qᵀ)⁻¹). (111)

All the characteristic values of an orthogonal operator Q in a euclidean space are of modulus 1 (upon extension the operator becomes unitary). Therefore in the case of an orthogonal operator we must set in (104):

μk² + νk² = 1, μl = ±1 (k = 1, 2, ..., q; l = 2q + 1, ..., n).

For this basis (103) can be assumed to be orthonormal. The formulas (104) can be represented in the form

Qxk = xk cos φk − yk sin φk, Qyk = xk sin φk + yk cos φk (k = 1, 2, ..., q),
Qxl = ±xl (l = 2q + 1, ..., n). (112)

From what we have shown, it follows that: Every real orthogonal matrix is real-similar and orthogonally-similar to a canonical orthogonal matrix:

Q = Q1 { (cos φ1 sin φ1; −sin φ1 cos φ1), ..., (cos φq sin φq; −sin φq cos φq), ±1, ..., ±1 } Q1⁻¹ (Q̄1 = Q1 = (Q1ᵀ)⁻¹). (113)


Example. We consider an arbitrary finite rotation around the point O in a three-dimensional space. It carries a directed segment OA into a directed segment OB and can therefore be regarded as an operator Q in a three-dimensional vector space (formed by all possible segments OA). This operator is linear and orthogonal. Its determinant is +1, since Q does not change the orientation of the space.

Thus, Q is a proper orthogonal operator. For this operator the formulas (112) look as follows:

Qx1 = x1 cos φ − y1 sin φ,
Qy1 = x1 sin φ + y1 cos φ,
Qx2 = ±x2.

From the equation |Q| = 1 it follows that Qx2 = x2. This means that all the points on the line through O in the direction of x2 remain fixed. Thus we have obtained the Theorem of Euler-D'Alembert:

Every finite rotation of a rigid body around a fixed point can be obtained as a finite rotation by an angle φ around some fixed axis passing through that point.
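The fixed axis can be found numerically as the characteristic vector with characteristic value 1. A sketch (ours; numpy assumed, arbitrary example angle and random orientation):

```python
# For a proper orthogonal 3x3 Q, the rotation axis satisfies Q axis = axis.
import numpy as np

phi = 0.7                                 # an arbitrary example angle
c, s = np.cos(phi), np.sin(phi)
Rz = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])
# conjugate by a random orthogonal matrix so the axis is not a coordinate axis
G, _ = np.linalg.qr(np.random.default_rng(9).standard_normal((3, 3)))
Q = G @ Rz @ G.T

assert np.isclose(np.linalg.det(Q), 1)   # proper orthogonal
lam, V = np.linalg.eig(Q)
axis = np.real(V[:, np.argmin(np.abs(lam - 1))])
assert np.allclose(Q @ axis, axis)       # the axis stays fixed
```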

§ 14. Polar Decomposition of an Operator and the Cayley Formulas in a Euclidean Space

1. In § 12 we established the polar decomposition of a linear operator in a unitary space. In exactly the same way we obtain the polar decomposition of a linear operator in a euclidean space.

THEOREM 9: Every linear operator A is representable in the form of a product37

A = SQ, (114)
A = Q1S1, (115)

where S, S1 are positive-semidefinite symmetric and Q, Q1 are orthogonal operators; here S = √(AAᵀ) = g(AAᵀ), S1 = √(AᵀA) = h(AᵀA), where g(λ) and h(λ) are real polynomials.

A is a normal operator if and only if S and Q (S1 and Q1) are permutable. Similar statements hold for matrices.

37 As in Theorem 8, the operators S and S1 are uniquely determined by A. If A is non-singular, then the orthogonal factors Q and Q1 are also uniquely determined.


Let us point out the geometrical content of the formulas (114) and (115). We let the vectors of an n-dimensional euclidean point space issue from the origin of the coordinate system. Then every vector is the radius vector of some point of the space. The orthogonal transformation realized by the operator Q (or Q1) is a 'rotation' in this space, because it preserves the euclidean metric and leaves the origin of the coordinate system fixed.38 The symmetric operator S (or S1) represents a 'dilatation' of the n-dimensional space, i.e., a 'stretching' along n mutually perpendicular directions with stretching factors ρ1, ρ2, ..., ρn that are, in general, distinct arbitrary non-negative numbers. According to the formulas (114) and (115), every linear homogeneous transformation of an n-dimensional euclidean space can be obtained by carrying out in succession some rotation and some dilatation (in any order).

2. Just as was done in the preceding section for a unitary operator, we now consider some representations of an orthogonal operator in a euclidean space R.

Let K be an arbitrary skew-symmetric operator (Kᵀ = −K) and let

Q = e^K. (116)

Then Q is a proper orthogonal operator. For

Qᵀ = e^{Kᵀ} = e^{−K} = Q⁻¹

and

|Q| = 1.39

Let us show that every proper orthogonal operator is representable in the form (116). For this purpose we take the corresponding orthogonal matrix Q. Since |Q| = 1, we have, by (113),40

38 For |Q| = 1 this is a proper rotation; but for |Q| = −1 it is a combination of a rotation and a reflection in a coordinate plane.

39 If k1, k2, ..., kn are the characteristic values of K, then μ1 = e^{k1}, μ2 = e^{k2}, ..., μn = e^{kn} are the characteristic values of Q = e^K; moreover

|Q| = μ1μ2 ⋯ μn = e^{k1 + k2 + ⋯ + kn} = 1,

since

k1 + k2 + ⋯ + kn = 0.

40 Among the characteristic values of a proper orthogonal matrix Q there is an even number equal to −1. The diagonal matrix {−1, −1} can be written in the form (cos φ sin φ; −sin φ cos φ) for φ = π.


Q = Q1 { (cos φ1 sin φ1; −sin φ1 cos φ1), ..., (cos φp sin φp; −sin φp cos φp), +1, ..., +1 } Q1⁻¹ (117)

(Q̄1 = Q1 = (Q1ᵀ)⁻¹). We define the skew-symmetric matrix K by the equation

K = Q1 { (0 φ1; −φ1 0), ..., (0 φp; −φp 0), 0, ..., 0 } Q1⁻¹. (118)

Since

e^{(0 φ; −φ 0)} = (cos φ sin φ; −sin φ cos φ),

it follows from (117) and (118) that

Q = e^K. (119)

The matrix equation (119) implies the operator equation (116).

In order to represent an improper orthogonal operator we introduce a special operator W which is defined in an orthonormal basis e1, e2, ..., en by the equations

We1 = e1, ..., Wen−1 = en−1, Wen = −en. (120)

W is an improper orthogonal operator. If Q is an arbitrary improper orthogonal operator, then W⁻¹Q and QW⁻¹ are proper and therefore representable in the form e^K and e^{K1}, where K and K1 are skew-symmetric operators. Hence we obtain the formulas for an improper orthogonal operator

Q = We^K = e^{K1}W. (121)

The basis e1, e2, ..., en in (120) can be chosen such that it coincides with the basis xk, yk, xl (k = 1, 2, ..., q; l = 2q + 1, ..., n) in (110) and (112). The operator W so defined is permutable with K; therefore the two formulas (121) merge into one:

Q = We^K (W = Wᵀ = W⁻¹; Kᵀ = −K, WK = KW). (122)
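Formula (116) can be checked numerically by exponentiating a skew-symmetric matrix spectrally, using the fact that iK is hermitian. A sketch (ours; numpy assumed):

```python
# The exponential of a real skew-symmetric K is a proper orthogonal Q.
import numpy as np

rng = np.random.default_rng(10)
B = rng.standard_normal((4, 4))
K = (B - B.T) / 2                             # skew-symmetric: K^T = -K

h, V = np.linalg.eigh(1j * K)                 # iK is hermitian
Q = (V @ np.diag(np.exp(-1j * h)) @ V.conj().T).real   # Q = e^K

assert np.allclose(Q @ Q.T, np.eye(4))        # orthogonal
assert np.isclose(np.linalg.det(Q), 1)        # proper: |Q| = 1
```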

Let us now turn to the Cayley formulas, which establish a connection between orthogonal and skew-symmetric operators in a euclidean space. The formula


Q = (E − K)(E + K)⁻¹, (123)

as is easily verified, carries the skew-symmetric operator K into the orthogonal operator Q. (123) enables us to express K in terms of Q:

K = (E − Q)(E + Q)⁻¹. (124)

The formulas (123) and (124) establish a one-to-one correspondence between the skew-symmetric operators and those orthogonal operators that do not have the characteristic value −1. Instead of (123) and (124) we can take the formulas

Q = −(E − K)(E + K)⁻¹, (125)
K = (E + Q)(E − Q)⁻¹. (126)

In this case the number +1 plays the role of the exceptional value.

3. The polar decomposition of a real matrix in accordance with Theorem 9 enables us to obtain the fundamental formulas (107), (109), (111), and (113) without embedding the euclidean space in a unitary space, as was done above. This second approach to the fundamental formulas is based on the following theorem:

THEOREM 10: If two real normal matrices are similar,

B = T⁻¹AT (AAᵀ = AᵀA, BBᵀ = BᵀB, Ā = A, B̄ = B), (127)

then they are real-similar and orthogonally-similar:

B = Q⁻¹AQ (Q̄ = Q = (Qᵀ)⁻¹). (128)

Proof: Since the normal matrices A and B have the same characteristic values, there exists a polynomial g(λ) (see 2. on p. 272) such that

Aᵀ = g(A), Bᵀ = g(B).

Therefore the equation

g(B) = T⁻¹g(A)T,

which is a consequence of (127), can be written as follows:

Bᵀ = T⁻¹AᵀT. (129)

When we go over to the transposed matrices in this equation, we obtain:

B = TᵀA(Tᵀ)⁻¹. (130)

A comparison of (127) with (130) shows that

TTᵀA = ATTᵀ. (131)


Now we make use of the polar decomposition of T:

T = SQ, (132)

where S = √(TTᵀ) = h(TTᵀ) (h(λ) a polynomial) is symmetric and Q is real and orthogonal. Since A, by (131), is permutable with TTᵀ, it is also permutable with S = h(TTᵀ). Therefore, when we substitute the expression for T from (132) in (127), we have:

B = Q⁻¹S⁻¹ASQ = Q⁻¹AQ.

This completes the proof.

Let us consider the real canonical matrix

{ (μ1 ν1; −ν1 μ1), ..., (μq νq; −νq μq), μ2q+1, ..., μn }. (133)

The matrix (133) is normal and has the characteristic values μ1 ± iν1, ..., μq ± iνq, μ2q+1, ..., μn. Since normal matrices are of simple structure, every normal matrix having the same characteristic values is similar (and by Theorem 10 real-similar and orthogonally-similar) to the matrix (133). Thus we arrive at the formula (107).

The formulas (109), (111), and (113) are obtained in exactly the same way.

§ 15. Commuting Normal Operators

In § 10 we have shown that two commuting operators A and B in an n-dimensional unitary space R always have a common characteristic vector. By mathematical induction we can show that this statement is true not only for two, but for any finite number, of commuting operators. For given m pairwise commuting operators A1, A2, ..., Am the first m − 1 of which have a common characteristic vector x, by repeating verbatim the argument of Lemma 1 (p. 270) (for A we take any Ai (i = 1, 2, ..., m − 1) and for B we take Am), we obtain a vector y which is a common characteristic vector of A1, A2, ..., Am.

This statement is even true for an infinite set of commuting operators, because such a set can only contain a finite number (≤ n²) of linearly independent operators, and a common characteristic vector of the latter is a common characteristic vector of all the operators of the given set.

2. Now suppose that an arbitrary finite or infinite set of pairwise commuting normal operators A, B, C, ... is given. They all have a common characteristic vector x1. We denote by T1 the (n − 1)-dimensional subspace consisting of all vectors of R that are orthogonal to x1. By § 10, 3. (p. 272), the subspace T1 is invariant with respect to A, B, C, ... . Therefore all these operators have a common characteristic vector x2 in T1. We consider the orthogonal complement T2 of the plane [x1, x2] and select in it a vector x3, etc. Thus we obtain an orthogonal system x1, x2, ..., xn of common characteristic vectors of A, B, C, ... . These vectors can be normalized. Hence we have proved:

THEOREM 11: If a finite or infinite set of pairwise commuting normal operators A, B, C, ... in a unitary space R is given, then all these operators have a complete orthonormal system of common characteristic vectors z1, z2, ..., zn:

Azi = λizi, Bzi = λ′izi, Czi = λ″izi, ... [(zi, zk) = δik; i, k = 1, 2, ..., n]. (134)

In matrix form, this theorem reads as follows:

THEOREM 11′: If a finite or infinite set of pairwise commuting normal matrices A, B, C, ... is given, then all these matrices can be carried by one and the same unitary transformation into diagonal form, i.e., there exists a unitary matrix U such that

A = U { λ1, ..., λn } U⁻¹, B = U { λ′1, ..., λ′n } U⁻¹, C = U { λ″1, ..., λ″n } U⁻¹, ... (U* = U⁻¹). (135)
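Theorem 11′ can be illustrated numerically for two commuting normal matrices. The sketch below (ours; numpy assumed) uses the standard trick of diagonalizing a generic hermitian combination, which presupposes that the combination has a simple spectrum:

```python
# A common unitary U diagonalizes two commuting normal matrices A, B.
import numpy as np

rng = np.random.default_rng(11)
# two commuting normal matrices with a shared characteristic-vector basis
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
A = Q @ np.diag(rng.standard_normal(4) + 1j * rng.standard_normal(4)) @ Q.conj().T
B = Q @ np.diag(rng.standard_normal(4) + 1j * rng.standard_normal(4)) @ Q.conj().T
assert np.allclose(A @ B, B @ A)

# diagonalize a generic hermitian combination of A, A*, B, B*
M = A + A.conj().T + np.pi * (B + B.conj().T)   # hermitian, generically simple
_, U = np.linalg.eigh(M)
for X in (A, B):
    D = U.conj().T @ X @ U
    assert np.allclose(D, np.diag(np.diag(D)))  # both become diagonal
```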

Now suppose that commuting normal operators in a euclidean space R are given. We denote by A, B, C, ... the linearly independent ones among them (their number is finite). We embed R (under preservation of the metric) in a unitary space R̃, as was done in § 13. Then by Theorem 11, the operators A, B, C, ... have a complete orthonormal system of common characteristic vectors z1, z2, ..., zn in R̃, i.e., (134) is satisfied.

We consider an arbitrary linear combination of A, B, C, ...:

P = αA + βB + γC + ⋯.

For arbitrary real values α, β, γ, ..., P is a real (PR ⊆ R) normal operator in R̃ and

Pzj = Λjzj, Λj = αλj + βλ′j + γλ″j + ⋯ [(zj, zk) = δjk; j, k = 1, 2, ..., n]. (136)

The characteristic values Λj (j = 1, 2, ..., n) of P are linear forms in α, β, γ, ... . Since P is real, these forms can be split into pairs of complex conjugates and real ones; with a suitable numbering of the characteristic vectors, we have


Λ2k−1 = Mk + iNk, Λ2k = Mk − iNk, Λl = Ml (137)

(k = 1, 2, ..., q; l = 2q + 1, ..., n),

where Mk, Nk, and Ml are real linear forms in α, β, γ, ... .

We may assume that in (136) the corresponding vectors z2k−1 and z2k are complex conjugates, and the zl real:

z2k−1 = xk + iyk, z2k = xk − iyk, zl = xl (138)

(k = 1, 2, ..., q; l = 2q + 1, ..., n).

But then, as is easy to see, the real vectors

xk, yk, xl (k = 1, 2, ..., q; l = 2q + 1, ..., n) (139)

form an orthonormal basis of R. In this canonical basis we have:41

Pxk = Mkxk − Nkyk, Pyk = Nkxk + Mkyk (k = 1, 2, ..., q),
Pxl = Mlxl (l = 2q + 1, ..., n). (140)

Since all the operators of the given set are obtained from P for special values of α, β, γ, ..., the basis (139), which does not depend on these parameters, is a common canonical basis for all the operators. Thus we have proved:

THEOREM 12: If an arbitrary set of commuting normal linear operators in a euclidean space R is given, then all these operators have a common orthonormal canonical basis xk, yk, xl:

Axk = μkxk − νkyk, Bxk = μ′kxk − ν′kyk, ...,
Ayk = νkxk + μkyk, Byk = ν′kxk + μ′kyk, ..., (141)
Axl = μlxl, Bxl = μ′lxl, ... .

We give the matrix form of Theorem 12:

THEOREM 12′: Every set of commuting normal real matrices A, B, C, ... can be carried by one and the same real orthogonal transformation Q into canonical form:

A = Q { (μ1 ν1; −ν1 μ1), ..., (μq νq; −νq μq), μ2q+1, ..., μn } Q⁻¹,
B = Q { (μ′1 ν′1; −ν′1 μ′1), ..., (μ′q ν′q; −ν′q μ′q), μ′2q+1, ..., μ′n } Q⁻¹, ... . (142)

41 The equation (140) follows from (136), (137), and (138).


Note. If one of the operators A, B, C, ... (matrices A, B, C, ...), say A (A), is symmetric, then in the corresponding formulas (141) ((142)) all the ν are zero. In the case of skew-symmetry, all the μ are zero. In the case where A is an orthogonal operator (A an orthogonal matrix), we have μk = cos φk, νk = sin φk, μl = ±1 (k = 1, 2, ..., q; l = 2q + 1, ..., n).

CHAPTER X

QUADRATIC AND HERMITIAN FORMS

§ 1. Transformation of the Variables in a Quadratic Form

1. A quadratic form is a homogeneous polynomial of the second degree in n variables x1, x2, ..., xn. A quadratic form always has a representation

  n
  Σ   aikxixk (aik = aki; i, k = 1, 2, ..., n),
i,k=1

where A = ‖aik‖₁ⁿ is a symmetric matrix.

If we denote the column matrix (x1, x2, ..., xn) by x and denote the quadratic form by

            n
A(x, x) =   Σ   aikxixk, (1)
          i,k=1

then we can write:1

A(x, x) = xᵀAx. (2)

If A = ‖aik‖₁ⁿ is a real symmetric matrix, then the form (1) is called real. In this chapter we shall mainly be concerned with real quadratic forms.

The determinant |A| = |aik|₁ⁿ is called the discriminant of the quadratic form A(x, x). The form is called singular if its discriminant is zero.

To every quadratic form there corresponds a bilinear form

            n
A(x, y) =   Σ   aikxiyk, (3)
          i,k=1

or

A(x, y) = xᵀAy (x = (x1, ..., xn), y = (y1, ..., yn)). (4)

If x1, x2, ..., xm, y1, y2, ..., ym are column matrices and c1, c2, ..., cm, d1, d2, ..., dm are scalars, then by the bilinearity of A(x, y) (see (4)),

1 The sign ᵀ denotes transposition. In (2) the quadratic form is represented as a product of three matrices: the row xᵀ, the square matrix A, and the column x.


A(c1x1 + ⋯ + cmxm, d1y1 + ⋯ + dmym) = Σ cidkA(xi, yk), (5)

the sum on the right extending over i, k = 1, 2, ..., m.

If A is an operator in an n-dimensional euclidean space and if in some orthonormal basis e1, e2, ..., en this symmetric operator corresponds to the matrix A = ‖aik‖₁ⁿ, then for arbitrary vectors

x = x1e1 + ⋯ + xnen, y = y1e1 + ⋯ + ynen

we have the identity2

A(x, y) = (Ax, y) = (x, Ay).

In particular,

A(x, x) = (Ax, x) = (x, Ax),

where

aik = (Aei, ek) (i, k = 1, 2, ..., n).

2. Let us see how the coefficient matrix of the form changes under a transformation of the variables:

xi = ti1ξ1 + ti2ξ2 + ⋯ + tinξn (i = 1, 2, ..., n). (6)

In matrix notation, this transformation looks as follows:

x = Tξ. (6′)

Here x, ξ are column matrices: x = (x1, x2, ..., xn) and ξ = (ξ1, ξ2, ..., ξn), and T is the transforming matrix: T = ‖tik‖₁ⁿ.

Substituting the expression for x in (2), we obtain from (6′):

A(x, x) = Ã(ξ, ξ),

where

Ã = TᵀAT. (7)

The formula (7) expresses the coefficient matrix Ã = ‖ãik‖₁ⁿ of the transformed form Ã(ξ, ξ) = Σ ãikξiξk (i, k = 1, 2, ..., n) in terms of the coefficient matrix of the original form A = ‖aik‖₁ⁿ and the transformation matrix T = ‖tik‖₁ⁿ.

It follows from (7) that under a transformation the discriminant of the form is multiplied by the square of the determinant of the transformation:

2 In A(x, y), the parentheses form part of the notation; in (Ax, y) and (x, Ay), they denote the scalar product.

s In A (x, y), the parentheses form part of the notation; in (Ax, y) and (x, Ay), theydenote the scalar product.


|Ã| = |A| |T|². (8)

In what follows we shall make use exclusively of non-singular transformations of the variables (|T| ≠ 0). Under such transformations, as is clear from (7), the rank of the coefficient matrix remains unchanged (the rank of Ã is the same as that of A).3 The rank of the coefficient matrix is usually called the rank of the quadratic form.

DEFINITION 1: Two symmetric matrices A and Ã connected as in formula (7), with |T| ≠ 0, are called congruent.

Thus, a whole class of congruent symmetric matrices is associated with every quadratic form. As mentioned above, all these matrices have one and the same rank, the rank of the form. The rank is an invariant for the given class of matrices. In the real case, a second invariant is the so-called 'signature' of the quadratic form. We shall now proceed to introduce this concept.
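Formulas (7) and (8) are easy to verify numerically. A sketch (ours; numpy assumed, with arbitrary test matrices and |T| ≠ 0 assumed):

```python
# A change of variables x = T xi replaces A by the congruent matrix T^T A T.
import numpy as np

rng = np.random.default_rng(12)
A = rng.standard_normal((3, 3)); A = (A + A.T) / 2   # symmetric coefficient matrix
T = rng.standard_normal((3, 3))                      # assumed non-singular

A_tilde = T.T @ A @ T                                # formula (7)
assert np.isclose(np.linalg.det(A_tilde),
                  np.linalg.det(A) * np.linalg.det(T) ** 2)   # formula (8)

xi = rng.standard_normal(3)
x = T @ xi
assert np.isclose(x @ A @ x, xi @ A_tilde @ xi)      # the form is unchanged
```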

§ 2. Reduction of a Quadratic Form to a Sum of Squares. The Law of Inertia

1. A real quadratic form A(x, x) can be represented in an infinite number of ways in the form

A(x, x) = a1X1² + a2X2² + ⋯ + arXr², (9)

where ai ≠ 0 (i = 1, 2, ..., r) and

Xi = αi1x1 + αi2x2 + ⋯ + αinxn (i = 1, 2, ..., r)

are linearly independent real linear forms in the variables x1, x2, ..., xn (so that r ≤ n).

Let us consider a non-singular transformation of the variables under which the first r of the new variables ξ1, ξ2, ..., ξn are connected with x1, x2, ..., xn by the formulas4

ξi = Xi (i = 1, 2, ..., r).

Then, in the new variables,

3 See p. 17.

4 We obtain the necessary transformation by adjoining to the system of linear forms X1, ..., Xr such linear forms Xr+1, ..., Xn that the forms Xj (j = 1, 2, ..., n) are linearly independent and then setting ξj = Xj (j = 1, 2, ..., n).


A(x, x) = Ã(ξ, ξ) = a1ξ1² + a2ξ2² + ⋯ + arξr²

and therefore Ã = {a1, a2, ..., ar, 0, ..., 0}. But the rank of Ã is r. Hence: The number of squares in the representation (9) is always equal to the rank of the form.

2. We shall show that not only is the total number of squares invariant in the various representations of A(x, x) in the form (9), but also so is the number of positive (and, hence, the number of negative) squares.

THEOREM 1 (The Law of Inertia for Quadratic Forms): In a representation of a real quadratic form A(x, x) as a sum of independent squares6,

A(x, x) = a1X1² + a2X2² + ⋯ + arXr², (9)

the number of positive and the number of negative squares5 are independent of the choice of the representation.

Proof. Let us assume that we have, in addition to (9), another representation of A(x, x) in the form of a sum of independent squares

A(x, x) = b1Y1² + b2Y2² + ⋯ + brYr²

and that

a1 > 0, a2 > 0, ..., ag > 0, ag+1 < 0, ..., ar < 0,
b1 > 0, b2 > 0, ..., bh > 0, bh+1 < 0, ..., br < 0.

Suppose that g ≠ h, say g < h. Then in the identity

a1X1² + ⋯ + arXr² = b1Y1² + ⋯ + brYr² (10)

we give to the variables x1, x2, ..., xn values that satisfy the system of r − (h − g) equations

X1 = 0, X2 = 0, ..., Xg = 0, Yh+1 = 0, ..., Yr = 0, (11)

5 By the number of positive (negative) squares in (9) we mean the number of positive (negative) ai.

6 By a sum of independent squares we mean a sum of the form (9) in which all ai ≠ 0 and the forms X1, X2, ..., Xr are linearly independent.


and for which at least one of the forms Xg+1, ..., Xr does not vanish.7 For these values of the variables the left-hand side of the identity is

ag+1Xg+1² + ⋯ + arXr² < 0,

and the right-hand side is

b1Y1² + ⋯ + bhYh² ≥ 0.

Thus, the assumption g ≠ h has led to a contradiction, and the theorem is proved.

DEFINITION 2: The difference σ between the number π of positive squares and the number ν of negative squares in the representation of A(x, x) is called the signature of the form A(x, x). (Notation: σ = σ[A(x, x)].)

The rank r and the signature σ determine the numbers π and ν uniquely, since

r = π + ν, σ = π − ν.

Note that in (9) the positive factor |ai| can be absorbed into the form Xi (i = 1, 2, ..., r). Then (9) assumes the form

A(x, x) = X1² + X2² + ⋯ + Xπ² − Xπ+1² − ⋯ − Xr². (12)

Setting8 ξi = Xi (i = 1, 2, ..., r), we reduce A(x, x) to the canonical form

A(x, x) = ξ1² + ⋯ + ξπ² − ξπ+1² − ⋯ − ξr². (13)

Hence we deduce from Theorem 1 that: Every real symmetric matrix A is congruent to a diagonal matrix in which the diagonal elements are +1, −1, or 0:

A = Tᵀ { +1, ..., +1, −1, ..., −1, 0, ..., 0 } T. (14)

In the next section we shall give a rule for determining the signature from the coefficients of the quadratic form.

7 Such values exist, since otherwise the equations Xg+1 = 0, ..., Xr = 0 and hence all the equations X1 = 0, X2 = 0, ..., Xr = 0 would be consequences of the r − (h − g) equations (11). This is impossible, because the linear forms X1, X2, ..., Xr are independent.

8 See footnote 4.
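Formula (14) can be realized numerically from an eigendecomposition. A sketch (ours; numpy assumed; the matrix T constructed here is one admissible choice, not the only one):

```python
# Every real symmetric A is congruent to a diagonal matrix of +1, -1, 0 entries.
import numpy as np

A = np.array([[0., 2., 0.],
              [2., 0., 0.],
              [0., 0., 0.]])                 # an arbitrary symmetric example
lam, V = np.linalg.eigh(A)                   # A = V diag(lam) V^T

D = np.diag(np.sign(np.round(lam, 12)))      # diagonal of +1, -1, 0
s = np.where(np.abs(lam) > 1e-12, np.sqrt(np.abs(lam)), 1.0)
T = np.diag(s) @ V.T                         # non-singular: zero eigenvalues get 1

assert np.allclose(T.T @ D @ T, A)           # formula (14)
```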


§ 3. The Methods of Lagrange and Jacobi of Reducing a Quadratic Form to a Sum of Squares

It follows from the preceding section that in order to determine the rank and the signature of a form it is sufficient to reduce it in any way to a sum of independent squares.

We shall describe here two reduction methods: that of Lagrange and that of Jacobi.

1. Lagrange's Method. Let a quadratic form

            n
A(x, x) =   Σ   aikxixk
          i,k=1

be given. We consider two cases:

1) For some g (1 ≤ g ≤ n) the diagonal coefficient agg is not equal to zero. Then we set

A(x, x) = (1/agg)(ag1x1 + ag2x2 + ⋯ + agnxn)² + A1(x, x) (15)

and convince ourselves by direct verification that the quadratic form A1(x, x) does not contain the variable xg. This method of separating out a square in a quadratic form is always applicable when there is a non-zero diagonal element in the matrix A = ‖aik‖₁ⁿ.

2) agg = 0 and ahh = 0, but agh ≠ 0. Then we set:

A(x, x) = (1/2agh)[(ag1 + ah1)x1 + ⋯ + (agn + ahn)xn]² − (1/2agh)[(ag1 − ah1)x1 + ⋯ + (agn − ahn)xn]² + A2(x, x). (16)

The forms

ag1x1 + ⋯ + agnxn, ah1x1 + ⋯ + ahnxn (17)

are linearly independent, since the first contains xh but not xg, and the second contains xg but not xh. Therefore, in (16), the forms within the brackets are linearly independent (as sum and difference, respectively, of the independent linear forms (17)).

Therefore we have separated out two independent squares in A(x, x). Each of these squares contains xg and xh, whereas A2(x, x) does not contain these variables, as is easy to verify.


By successive application of a combination of the methods 1) and 2), we can always reduce the form A(x, x) by means of rational operations to a sum of squares. Moreover, the squares so obtained are linearly independent, since at each stage the square that is separated out contains an unknown that does not occur in the subsequent squares.

Note that the basic formulas (15) and (16) can be written as follows:

A(x, x) = (1/4agg)(∂A/∂xg)² + A1(x, x), (15′)

A(x, x) = (1/8agh)(∂A/∂xg + ∂A/∂xh)² − (1/8agh)(∂A/∂xg − ∂A/∂xh)² + A2(x, x). (16′)
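Before turning to the worked example, here is a sketch of case 1) of Lagrange's method in matrix form (ours; numpy assumed; case 2) is deliberately not handled, so a non-zero diagonal pivot is assumed to be available at every step):

```python
# Repeatedly split off the square (1/a_gg)(a_g1 x1 + ... + a_gn xn)^2, as in (15).
import numpy as np

def lagrange_reduction(A, tol=1e-12):
    """Return [(c_i, l_i)] with x^T A x = sum_i c_i * (l_i . x)^2."""
    A = np.array(A, dtype=float)
    squares = []
    while np.abs(A).max() > tol:
        g = int(np.argmax(np.abs(np.diag(A))))
        if abs(A[g, g]) < tol:
            raise NotImplementedError("case 2) pivot needed")
        c, row = 1 / A[g, g], A[g].copy()
        squares.append((c, row))             # the square c * (row . x)^2
        A = A - c * np.outer(row, row)       # A1 no longer contains x_g
    return squares

A = np.array([[1., 2, 0], [2, 3, 1], [0, 1, 5]])   # arbitrary example
squares = lagrange_reduction(A)
x = np.array([1., 2, -1])                          # spot-check the identity
assert np.isclose(x @ A @ x, sum(c * (row @ x) ** 2 for c, row in squares))
```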

Example.

A(x, x) = 4x1² + x2² + x3² + x4² − 4x1x2 − 4x1x3 + 4x1x4 + 4x2x3 − 4x2x4.

We apply formula (15′) with g = 1:

A(x, x) = (1/16)(8x1 − 4x2 − 4x3 + 4x4)² + A1(x, x) = (2x1 − x2 − x3 + x4)² + A1(x, x),

where

A1(x, x) = 2x2x3 − 2x2x4 + 2x3x4.

We apply formula (16′) with g = 2 and h = 3:

A1(x, x) = (1/8)(2x2 + 2x3)² − (1/8)(2x3 − 2x2 − 4x4)² + A2(x, x) = ½(x2 + x3)² − ½(x3 − x2 − 2x4)² + A2(x, x),

where

A2(x, x) = 2x4².

Finally,

A(x, x) = (2x1 − x2 − x3 + x4)² + ½(x2 + x3)² − ½(x3 − x2 − 2x4)² + 2x4²,

r = 4, σ = 2.
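A numerical cross-check of this example (ours; numpy assumed): by the law of inertia, the eigenvalue signs of the coefficient matrix must reproduce r = 4, σ = 2.

```python
import numpy as np

A = np.array([[4., -2, -2, 2],
              [-2, 1, 2, -2],
              [-2, 2, 1, 0],
              [2, -2, 0, 1]])                # matrix of the form above
lam = np.linalg.eigvalsh(A)
r = int(np.sum(np.abs(lam) > 1e-10))
sigma = int(np.sum(lam > 1e-10) - np.sum(lam < -1e-10))
print(r, sigma)                              # expected: 4 2
```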

2. Jacobi's Method. We denote the rank of A(x, x) = Σ aikxixk (i, k = 1, 2, ..., n) by r and assume that

Dk = A(1 2 ... k; 1 2 ... k) ≠ 0 (k = 1, 2, ..., r).

Then the symmetric matrix A = ‖aik‖₁ⁿ can be reduced to the form


    ‖ g11 g12 . . . . . . g1n ‖
    ‖  0  g22 . . . . . . g2n ‖
    ‖  . . . . . . . . . . .  ‖
G = ‖  0   0  . . grr . . grn ‖ (18)
    ‖  0   0  . .  0  . .  0  ‖
    ‖  . . . . . . . . . . .  ‖
    ‖  0   0  . .  0  . .  0  ‖

by Gauss's elimination algorithm (see Chapter II, § 1).

The elements of G are expressed in terms of the elements of A by the well-known formulas9

gpq = A(1 2 ... p−1 p; 1 2 ... p−1 q) / A(1 2 ... p−1; 1 2 ... p−1) (q = p, p+1, ..., n; p = 1, 2, ..., r). (19)

In particular,

gpp = Dp/Dp−1 (p = 1, 2, ..., r; D0 = 1). (20)

In Chapter II, § 4 (formula (55) on page 41) we have shown that

A = GᵀDG, (21)

where D is the diagonal matrix:

D = { 1/g11, 1/g22, ..., 1/grr, 0, ..., 0 } = { D0/D1, D1/D2, ..., Dr−1/Dr, 0, ..., 0 }. (22)

Without infringing (21) we may replace some of the zeros in the last n − r rows of G by arbitrary elements. By such a replacement we can make G into a non-singular upper triangular matrix

    ‖ g11 g12 . . . . . . g1n ‖
    ‖  0  g22 . . . . . . g2n ‖
T = ‖  0   0  . . grr . . grn ‖ (|T| ≠ 0). (23)
    ‖  0   0  . .  0   * . *  ‖
    ‖  0   0  . . . . . .  *  ‖

9 See Chapter II, § 2.


The equation (21) can then be rewritten:

A = TᵀDT.

From this equation it follows that the quadratic form10

D(ξ, ξ) = (1/g11)ξ1² + ⋯ + (1/grr)ξr² = (D0/D1)ξ1² + (D1/D2)ξ2² + ⋯ + (Dr−1/Dr)ξr² (D0 = 1)

goes over into the form A(x, x) under the transformation

ξ = Tx. (24)

Since

ξk = Xk, Xk = gkkxk + gk,k+1xk+1 + ⋯ + gknxn (k = 1, ..., r), (25)

we have Jacobi's formula11

A(x, x) = (D0/D1)X1² + (D1/D2)X2² + ⋯ + (Dr−1/Dr)Xr² (D0 = 1). (26)

This formula gives a representation of A(x, x) in the form of a sum of r independent squares.12

Jacobi's formula is often given in another form. Instead of Xk (k = 1, 2, ..., r), the linearly independent forms

Yk = Dk−1Xk (k = 1, 2, ..., r; D0 = 1) (27)

are introduced. Then Jacobi's formula (26) can be written as:

A(x, x) = Y1²/(D0D1) + Y2²/(D1D2) + ⋯ + Yr²/(Dr−1Dr). (28)

Here

Yk = ckkxk + ck,k+1xk+1 + ⋯ + cknxn (k = 1, 2, ..., r), (29)

where

10 We regard D(ξ, ξ) as a quadratic form in the n variables ξ1, ξ2, ..., ξn.

11 Another approach to Jacobi's formula, which does not depend on (21), can be found, for example, in [17], pp. 43-44.

12 The independence of the squares in Jacobi's formula follows from the fact that the form A(x, x) is of rank r. But we can also convince ourselves directly of the independence of the forms X1, X2, ..., Xr. For, according to (20), gkk = Dk/Dk−1 ≠ 0, and therefore Xk contains the variable xk, which does not occur in the forms Xk+1, ..., Xr (k = 1, 2, ..., r). Hence X1, X2, ..., Xr are linearly independent forms.

Page 314: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 3. METHODS OF LAGRANGE AND JACOBI 303

1 2 ... k -1 kekg= A 1 2...k-1 q

Example.

A (x, x) = x,2 + 3 x= - 3 xZ - 4 xlx2 + 2 xlxs -2 x1x4 - 6 x2xs + 8 xx4 + 2 xsx4 .

We reduce the matrix

A=

to the Gaussian form

a=

1 -2 1 -1-2 3 -3 4

1 -3 0 1

-1 4 1 -3

0 -1 -1 2

o 0 0 0

o 0 0 0

Hence r = 2, 911 = 1, 922 = -1.Jacobi's formula (26) yields:

A(x, x)=(xl-2x2+xs-x4)2-(-xs-xs+2x4)s.

Jacobi's formula (28) yields the following theorem:

THEOREM 2 (Jacobi). If for the quadratic form

n

A (x, x) _ F aixxxxc. k .1

of rank r the inequality

Dk=A(12 ... k)' 0

( k = (31)

holds, then the number n of positive squares and the number v of negativesquares of A (x, x) coincide, respectively, with the number P of permanencesof sign and the number V of variations of sign in the sequence

1, D1, D2, ... , Dr, (32)

i.e., n = P(1, D1i D2, ... , D,r), v = V (1, D1, D2t ... , Dr), and the signature

v=r-2V(1,D1,D2,...,Dr). (33)

k =1, 2, ..., r) . (30)

Page 315: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

304 X. QUADRATIC AND HERMITIAN FORMS

Note 1. If in the sequence 1, D1, ... , Dr 0 there are zeros, but not threein succession, then the signature can be determined by the use of the formula

a=r- 2V(1,D1,D2,...,Dr)

omitting the zero Dk provided Dk_1Dk+1 0, and setting

1, when Dk+i <0,V (DA;-,, Dk, Dk+l, Dk+2) = (34)

2, when k±! > 0

if Dk=Dk+1=0.We state this rule without proof."t

Note 2. When three consecutive zeros occur in D1, D2, ... , Dy_,, then

the signature of the quadratic form cannot be immediately determined byJacobi's Theorem. In this case, the signs of the non-zero Dk do not determinethe signature of the form. This is shown by the following example:

A (x, x) = 2 a, xlx4 + a2x2 + asx2 (a1a3as 0) .

Here

ButD,=D2=D,=O, D4 = - aia2a3 0

v=J 1, when a2> 0, a3 > 01

13, when a2<0,a,<0.in both cases. D4 < 0.

Note 3. If D, 0,....D,._1 0, but D.=O, then the signs ofD1iD2.Dr_, do not determine the signature of the form. As a corroborating

example, we can take the form17ax + axe + bx2 + 2 axlx2 + 2 ax2xe + 2 axlxg = a (x1 + x2 + x3)2 + (b - a) x2 .

§ 4. Positive Quadratic Forms

1. In this section we deal with the special, but important, class of positivequadratic forms.

nDEFINITION 3: A real quadratic form A (x, x) Z aikxixk is called

i,km1

positive (negative) semidefinite if for arbitrary real values of the variables :

A.(x,x)?0 (<0). (35)

13 This rule was found in the case of a single zero Dk by Gundenfinger and for twosuccessive zeros Dk by Frobenius [1621.

Page 316: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 4. POSITIVE QUADRATIC FORMS 305

DEFINITION 4: A real quadratic form A(x, x) _ aikxixk is calledi,ks l

positive (negative) definite if for arbitrary values of the variables, not allzero, (x r0)

A(x, x) > 0 (< 0). (36)

The class of positive (negative) definite forms is part of the class ofpositive (negative) semidefinite forms.

Let A(x, x) be a positive-semidefinite form. We represent it in theform of a sum of linearly independent squares :

A (x, x) a1X? ,f_1

(37)

In this representation, all the squares must be positive :

ai > 0 (i = 1, 2, . .., r). (38)

For if any ai were negative, then we could select values of x1i x2, ... , xfor which

Xl_...-xi-1=xi+1=...=X,=0, X; 0.

But then A(x, x) would have a negative value for these values of the vari-ables, and by assumption this is impossible. It is clear that, conversely, itfollows from (37) and (38) that the form A(x, x) is positive semidcfinite.

Thus, a positive semidefinite quadratic form is characterized by the equa-tions a=r (a=r, v=0).

Now let A(x, x) be a positive-definite form. Then A(x, x) is also posi-tive semidefinite. Therefore it is representable in the form (37), where allthe ai (i = 1, 2, ... , r) are positive. From the positive definiteness it followsthat r = n. For if r < n, we could find values of x1, x2, ... , x., not all zero,such that all the %i would be zero. But then by (37) A (x, x) = 0 for x ; o,and this contradicts (36).

It is easy to see that, conversely, if in (37) r =n and all the a1, a2i ... , anare positive, then A(x, x) is a positive-definite form.

In other words: A positive-semidefinite form is positive definite if andonly if it is not singular.

2. The following theorem gives a criterion for positive definiteness in theform of inequalities which the coefficients of the form must satisfy. Weshall use the notation of the preceding section for the sequence of the prin-cipal minors of A :

Page 317: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

306 X. QUADRATIC AND HERMITIAN FORMS

D1=a11, D2= all a12

a21 a22

I all a12 ... a,,

a21 a22 ... a2n

I a 1 a". I

THEOREM 3: A quadratic form is positive definite if and only if

D1>0,D2>0,...,D.>0. (39)

Proof. The sufficiency of the conditions (39) follows immediately from

Jacobi's formula (28). The necessity of (39) is established as follows.

From the fact that A(x, x) aikxixk is positive definite, it follows thati.kel

the `restricted' forms14p

AP (x, x) _Y akx{xk (p =1, 2, ..., n)i.k-1

are also positive definite. But then all these forms must be non singular, i.e.,

Dn= I A, 1 0 (p=1,2,...,n).We are now in a position to apply Jacobi's formula (28) (for r= n).

Since all the squares on the right-hand side of the formula must be positive,

we haveDi>0, D1D2>0, D2D$>0, ...,

Hence the inequality (39) follows, and the theorem is proved.Since every principal minor of A can be brought into the top left corner

by a suitable numbering of the variables, we have the

COROLLARY : In a positive-definite quadratic form A (x, x) = ,Eaikxixk,t, k-l

all the principal minors of the coefficient matrix are positive:"

A(::::;)>0ilii(1Si1<i2<...<iPSn;p=1,2, ...,n).ii iNote.

If the successive principal minors are non-negative,

D1 _-0,D2?0,...,Da?0, (40)

14 The form A, (x, x) is obtained from A(x, x) if we set in the latterxP+,=... (p=1,2,...,n).

15 Thus, when the successive principal minors of a real symmetric matrix are positive,all the remaining principal minors are then also positive.

Page 318: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 4. POSITIVE QUADRATIC FORMS

it does not follow that A(x, x) is positive semidefinite. For, the form

a11xQ s1 + 2a12x1x2 + a22x2

307

in which a11= a12 = 0, a22 < 0 satisfies (40), but is not positive semidefinite.However, we have the following theorem.

nTHEOREM 4: A quadratic form A(x, x) _ .i alkxixk is positive semi-

ti.k- Idefinite if and only if all the principal minors of its coefficient matrix arenon-negative :

A(sly2..'p)?0 (15i1<i2<...<i n;p=1,2,...,n). (41)

\ it i2 ' ip /

Proof. We introduce the auxiliary form

nA, (x, (e<0).

i-1

Obviously lim A, (x, x) =A (x, x).s-.0

The fact that A (x, x) is positive semidefinite implies that A. (x, x) ispositive definite, so that we have the inequality (cf. Corollary to Theorem 3) :

As(iltill2 iP)>0 (1 Si1<i2<... <ipSn;p=1, 2, ..., n).i2 ip

Proceeding to the limit for e--+0, we obtain (41).Suppose, conversely, that (41) holds. Then we have

As(ili2... is, )=Ep+ ?ep>0 ...,n).Sl i2 ... ip

But then (by Theorem 3), A. (x, x) is positive definite

A. (x, x) > 0 (x # o) .

Proceeding to the limit for e -+ 0 we obtain :

A (x, x) ?0.This completes the proof.

The conditions for a form to be negative semidefinite and negative def i-nite are obtained from (39) and (41), respectively, when these inequalitiesare applied to - A(x, x).

Page 319: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

308 X. QUADRATIC AND HERMITIAN FORMS

THEOREM 5: A quadratic form A(x, x) is negative definite if and onlyif the following inequalities hold:

Dl<0,D2>0,D3<0,...,(-1)"Dn>0. (42)

THEOREM 6: A quadratic form A (x, x) is negative semidefinite if andonly if the following inequalities hold:

(-- 1)rA (i1 2 ... iP I ? 0 (1 S i1 < i2 < ... < a n ; p =1, 2, ... , n) . (43)11 22 ip JJ

§ S. Reduction of a Quadratic Form to Principal Axes

1. We consider an arbitrary real quadratic form

A (x, x) aikxixk .i, k_1

Its coefficient matrix A= 11 aik I!i is real and symmetric. Therefore(see Chapter IX, § 13) it is orthogonally similar to a real diagonal matrix A,i.e., there exists a real orthogonal matrix Q such that

A =Q-IAQ (A = 11 Ai8ik jji, QQT =E) . (44)

Here Ai, £2, ... , A. are the characteristic values of A.Since for an orthogonal matrix Q-I = QT, it follows from (43) that

under the orthogonal transformation of the variables

x = Q (QQT = E) (45)

or, in greater detail,n

xi = gik k 4 gigk; = 8a ; i, k =1, 2, ... , (45 )k_1

the form A(x, x) goes over into

n

(46)

Page 320: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 5. REDUCTION TO PRINCIPAL AXES 309

THEOREM 7: Every real quadratic form A(x, x) atkxixk can be;,k-l

reduced to the canonical form (46) by an orthogonal transformation, whereA1, A2, ... , A. are the characteristic values of A= II ak

The reduction of the quadratic form A(x, x) to the canonical form (46)is called reduction to principal axes. The reason for this name is that theequation of a central hypersurface of the second order

n

Z aikx;xk = c (c = const. 0) (47)i, k-1

under the orthogonal transformation (45') of the variables assumes thecanonical form

fi (=-; Ai e,= l; i=1, 2, ...,n (48)a{ c

If we regard x1, x2, ... , x,, as coordinates in an orthonormal basis in ann-dimensional euclidean space, then S1, 2i ..., ,, are the coordinates in anew orthonormal basis of the same space, and the `rotation"' of the axesis brought about by the orthogonal transformation (45). The new coordi-nate axes are axes of symmetry of the central surface (47) and are usuallycalled its principal axes.

2. It follows from (46) that the rank r of A(x, x) is equal to the number ofnon-zero characteristic values of A and the signature o is equal to the differ-ence between the number of positive and the number of negative character-istic values of A.

Hence, in particular, we have the following proposition :If under a continuous change of the coefficients of a quadratic form the

rank remains unchanged, then the signature also remains unchanged.

Here we have started from the fact that a continuous change of thecoefficients produces a continuous change of the characteristic values. Thesignature can only change when some characteristic value changes sign.But then at some intermediate stage this characteristic value must passthrough zero, and this results in a change of the rank of the form.

to If IQ I=-1, then (45) is a combination of a rotation with a reflection (seep. 287). However, the reduction to principal axes can always be effected by a properorthogonal matrix (I Q I = 1). This follows from the fact that, without changing thecanonical form, we can perform the additional transformation

{=Ei (i=1, 2, ..., n-1), 1R=-$,n.

Page 321: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

310 X. QUADRATIC AND HERMITIAN FORMS

§ 6. Pencils of Quadratic Forms

1. In the theory of small oscillations it is necessary to consider simul-taneously two quadratic forms one of which gives the potential, and theother the kinetic energy, of the system. The second form is always positivedefinite.

The study of a system of two such forms is the object of this section.Two real quadratic forms

nA (x, x) afkxixk and B (x, x) = ' b;kxixk

i, ko.1 i, k+.1

determine the pencil of forms A (x, x) - AB(x, x) (A is a parameter).If the form B (x, x) is positive definite, the pencil A(x, x) - 2B (x, x)

is then called regular.The equation

I A-AB I =0

is called the characteristic equation of the pencil of forms A(x,x)-AB(x,x).We denote by A some root of this equation. Since the matrix A -

is singular, there exists a column z = (z1, z2i ... , o such that

Az=A0Bz (z:7-1

The number A will be called a characteristic value of the pencilA(x, x) - AB(x, x) and z a corresponding principal column or `principalvector' of the pencil. The following theorem holds:

THEOREM 8: The characteristic equation

I A-AB I =0

of a regular pencil of forms A (x, x) - AB(x, x) always has n real rootsAk with the corresponding principal vectors zk = (z,k, z2k, ..., znk)(k=l,2,...,n):

Azk=AkBzk (k=1,2,...,n). (49)

These principal vectors zk can be chosen such that the relations

B(zi,zk)=bik (i,k=1,2,...,n) (50)are satisfied.

Page 322: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. PENCILS OF QUADRATIC FORMS 311

Proof. We observe that (49) can be written as :

B-tAzk=Akzk (k=1, 2, ..., n). (51)

Thus, our theorem states that the matrix

D = B-1A (52)

1. has simple structure, 2. has real characteristic values, and 3. has charac-teristic columns (vectors) z', z2, ... , z" corresponding to these characteristicvalues and satisfying the relations (50)."

In order to prove these three statements, we introduce an n-dimensionalvector space R over the field of real numbers. In this space we fix a basisel, e2, ... , e,, and introduce a scalar product of two arbitrary vectors

x = 2' x{et, y = , ' yieii-1 i-1

by means of the positive-definite bilinear form B(x, y) :

n

(xy) = B (x, y) _ bjtxcyk = xT By (53)

i,k-1

and hence the square of the length of a vector x by means of the form B (x, x) :

(xx) = B (x, x) = xT Bx , (53')

where x and y are columns x = (x,, x2, ... , x"), y = (y1, y2, ... ) y").It is easy to verify that the metric so introduced satisfies the postulates

1.-5. (p. 243) and is, therefore, euclidean.We have obtained an n-dimensional euclidean space R, but the original

basis el, e2, ... , e" is, in general, not orthonormal. To the matrices A, B, andD = B-1A there correspond in this basis linear operators in R : A, B, andD=B-1A.18

17 If D were a symmetric matrix, then the properties 1. and 2. would follow immediatelyfrom properties of a symmetric operator (Chapter IX, p. 284). However, D, as a productof two symmetric matrices, is not necessarily itself symmetric, since D = B-IA andDT=AB-1.

18 Since the basis e1, e_, ... , e is not orthonormal, the operators A and B to which,in this basis, the symmetric matrices A and B correspond, are not necessarily symmetricthemselves.

Page 323: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

312 X. QUADRATIC AND HERMITIAN FORMS

We shall show that D is a symmetric operator in R (see Chapter IX,§ 13).'s Indeed, for arbitrary vectors x and y with the coordinate columnsx= (x1, x2, ..., xn) and y= (yl, y2, ..., yn) we have, by (52) and (53),

(Dx, y) = (Dx)TBy= xTDTBy= xTAB-1By= xTAy

and

(Dx, y) = (x, Dy) .

The symmetric operator D = B- 'A has real characteristic values Al, A2,A3f ... , A. and a complete orthonormal system of characteristic vectors zl, %2,as3, ... , an (seep. 284, Chapter IX) :

(x, Dy) = xTBDy = xTBB-'Ay = xTAy,

B-lAzk = Akzk (k = 1, 2, ..., n), (54)

(zrzk) = ark (1, k= 1, 2, ..., n). (54')

Let zk = (zlk, z2k, ... , znk) be the coordinate column of sk (k = 1, 2, ... , n)in the basis e1, e2, ... , en. Then the equations (54) can be written in theform (51) or (49) and the relations (54'), by (53), yield the equation (50).

This completes the proof.Note that it follows from (50) that the columns z1, z2, ... , z" are linearly

independent. For suppose thatn

'r C,-* = 0.k-1

Then foreveryi (15i<n),by (50),

(55)

n

cxzk

n

0=B(z', cB(zr, zk)=cc.-1k-1

Then all the cr (i = 1, 2, ... , n) in (55) are zero and there is no linear depend-ence among the columns z1, z2, ... , z".

A square matrix formed from principal columns z1, z2, ... , z" satisfyingthe relations (50)

Z=(z1, z2, ..., z")=IIzrk1fi

will be called a principal matrix for the pencil of forms A (x, x) - AB (x, x).

19 Hence D is similar to some symmetric matrix.

Page 324: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. PENCILS OF QUADRATIC FORMS 313

The principal matrix Z is non-singular (I Z 1 =,4 0), because its columns arelinearly independent.

The equation (50) can be written as follows :

z{TBzk = 6, (i, k = 1, 2, ... , n) . (56)

Moreover, when we multiply both sides of (49) on the left by the row matrixT

Z' , we obtain :

z'TAzk = IkziTBzk = Akbik (1, k =1, 2, ... , n) . (57)

By introducing the principal matrix Z = (zi, z2, ... , z"), we can repre-sent (56) and (57) in the form

I L I ZTBZ = C . (58)

The formulas (58) show that the non-singular transformation

x=Z (59)

reduces the quadratic forms A(x, x) and B(x, x) simultaneously to sums ofsquares:

n nandkk.i k.=1(60)

This property of (59) characterizes a principal matrix Z. For supposethat the transformation (59) reduces the forms A(x, x) and B(x, x) simul-taneously to the canonical forms (60). Then (58) holds, and hence (56)and (57) holds for Z. (58) implies that Z is non-singular (I Z ,' 0). Werewrite (57) as follows :

ziT(Azk - AkBzk) = o (i = 1, 2, ... , n), (61)

where k has an arbitrary fixed value (1 < k < n). The system of equations(61) can be contracted into the single equation

ZT (Azk - ABzk) = 0 ;

hence, since ZT is non-singular,

Azk-AkBzk=O;

i.e., for every k (49) holds. Therefore Z is a principal matrix. Thus wehave proved the following theorem :

Page 325: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

314 X. QUADRATIC AND HERMITIAN FORMS

THEOREM 9: If Z = Ii zik 11,11 is a principal matrix of a regular pencilof forms A(x,x) -AB(x,x), then the transformation

x=ZE (62)

reduces the forms A(x,x) and B(x,x) simultaneously to sums of squares

L-1 km1

(63)

where Al, A2, ... , A. are the characteristic values of the pencilA(x, x)-AB(x, x) corresponding to the columns z1, z2, ... , z" of Z.

Conversely, if some transformation (62) simultaneously reduces A (x, x)and B(x, x) to the form (63), then Z='I zfk 11 is a principal matrix of theregular pencil of forms A(x, x)- AB(x, x).

Sometimes the characteristic property of the transformation (62) for-mulated in Theorem 9 is used for the construction of a principal matrixand the proof of Theorem 8.20 For this purpose, we first of all carry outa transformation of variables x = Ty that reduces the form B(x, x) to

the `unit' sum of squares yk (which is always possible, since B (x, x) isk - 1

positive definite). Then A (x, x) is carried into a certain form A1(y, y).

Now the form Al (y, y) is reduced to the form 27 Ak E by an orthogonal trans-k-1

formation y = QE (reduction to principal axes!). Then, obviously,21n

27yk Thus the transformation x = ZE, where Z = TQ, reduces thek_1 k.ltwo given forms to (63). Afterwards it turns out (as we have shown onp. 313) that the columns z1, z2, ... , zn of Z satisfy the relations (49) and (50).

In the special case where B (x, x) is the unit form, i.e., R (x, x) _ xl ,k-1so that B = E, the characteristic equation of the pencil A(x, x) - AB(x, x)coincides with the characteristic equation of A, and the principal vectorsof the pencil are characteristic vectors of A. In this case the relations (50)can be written as follows :

z`TZk=h1k (i, k = 1, 2, ... , n)

and they express the orthonormality of the columns z1, z2, ... , zn.

20 See [ 171, pp. 56.57.

21 An orthogonal transformation does not alter a sum of squares of the variables,because (Qx)TQ5=xTx.

Page 326: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 6. PENCILS OF QUADRATIC FORMS 315

2. Theorems 8 and 9 admit of an intuitive geometric interpretation. We

introduce a euclidean space R with the basis e1, e2, ... , e and the funda-mental metric form B(x, x) just as was done for the proof of Theorem 8.In R we consider a central hypersurface of the second order whose equation

is n

A (x, x) askxixk = c . (64)

After the coordinate transformation x = Z = 11 zik 11 71 is a prin-cipal matrix of the pencil A(x, x) - AB(x, x.), the new basis vectors are thevectors z1, z2, ..., z" whose coordinates in the old basis form the columnsof Z, i.e., the principal vectors of the pencil. These vectors form an ortho-normal basis in which the equation of the hypersurface (64) has the form

n

,' Ak6,'k=c. (65)k-1

Therefore the principal vectors z1, z2, ... , z" of the pencil coincide in direc-tion with the principal axes of the hypersurface (64), and the characteristicvalues A,, A2, ... , A. of the pencil determine the lengths of the semi-axes :

Ak = f ; (k= 1, 2, ... , n).k

Thus, the task of determining the characteristic values and the principalvectors of a regular pencil of forms A(x, x) - AB (x, x) is equivalent to thetask of reducing the equation (64) of a central hypersurface of the secondorder to principal axes, provided the equation of the hypersurface is givenin a general skew coordinate system22 in which the `unit sphere' has theequation B (x, x) =1.

Example. Given the equation of a surface of the second order

2x2-2y2-3z2-10yz+2xz-4=0 (66)

in a general skew coordinate system in which the equation of the unit sphereis

2x2+3y2+2z2+2xz=1, (67)

it is required to reduce equation (66) to principal axes.In this case

2 0 1

0 -2 -51 -5 -3

BHf

2 0 1

0 3 01 0 2

22 I.e., a skew coordinate system with distinct units of lengths.along the axes.

Page 327: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

316 X. QUADRATIC AND HERMITIAN FORMS

The characteristic equation of the pencil I A - AB = 0 has the form

2-21 0 1-A0 -2-32 -5

1-2 -5 -3-22

This equation has three roots : Al =1, 22 =1, A, _ - 4.We denote the coordinates of a principal vector corresponding to the

characteristic value 1 by n, v, w. The values of u, v, w are determined fromthe system of homogeneous equations whose coefficients are the elements ofthe determinant (68) for A =1:

-5w-5w

In fact we have only one relation

v+w=0.

To the characteristic value A = 1 there must correspond two orthonormalprincipal vectors. The coordinates of the first can be chosen arbitrarily,provided they satisfy the relation v + w = 0.

We setU=0, v, w=-v.

We take the coordinates of the second principal vector in the form

u' v' w' = - v'

and write down the condition for orthogonality (B(z', z2) = 0) :

2uzt' + 3vv' + 2ww' + uw' + zn'w = 0.

Hence we find : u' = W. Thus, the coordinates of the second principalvector are

at.' = 5v', v', w' = - v'.

Similarly, by setting A _ - 4 in the characteristic determinant, we findfor the corresponding principal vector :

u" v" =-u" w"-=-2u".

Page 328: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 7. EXTREMAL PROPERTIES OF CHARACTERISTIC VALUES 317

The values of v, v', and u" are determined from the condition that thecoordinates of a principal vector must satisfy the equation of the unit sphere(B (x, x) =1), i.e., (67). Hence we find :

1 , 1v= v = 3 .

Therefore the principal matrix has the form

00 1

3 3

1 1 1Z= 5 3V6 3

1 1 2

and the corresponding coordinate transformation (x = Z$) reduces theequations (66) and (67) to the canonical form

i+$i-4H-4=0, i+Cs+Hs=1The first equation can also be written as follows :

4!+ 4'- 1l=1.

This is the equation of a one-sheet hyperboloid of rotation with real semi-axes equal to 2, and an imaginary one equal to 1. The coordinates of theendpoint of the axis of rotation is determined by the third column of Z,i.e., -1/3,1/3, 2/3. The coordinates of the endpoints of the other two ortho-gonal axes are given by the first and second columns.

§ 7. Extremal Properties of the Characteristic Values of aRegular Pencil of Forms23

1. Suppose that two quadratic forms are given

nA (x, x) aikxixk and B (x, x) _ Y bikxixk,

i.k=1 i,k-1

of which B(x,x) is positive definite. We number the characteristic valuesof the regular pencil of forms A(x, x) -AB(x,x) in non-descending order:

A, 5 A$ S ... S A. - (69)

23 In the exposition of this section, we follow the book 1171, § 10.

Page 329: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

318 X. QUADRATIC AND HERMITIAN FORMS

The principal vectors24 corresponding to these characteristic values aredenoted, as before, by zl, z2, ... , zn :

zk = (zlk, z21, ... , zriz) (k =1, 2,- - , n).

Let us determine the least value (minimum) of the ratio of the formsA(x, x) considering all possible values of the variables, not all equal to zero

(x, xl(x o). For this purpose it is convenient to go over to new variabesl, 2, ... , 6. by means of the transformation

nx=Z (x{= i = 1, 2, ..., n)

k-1

where Z = II Ztk 11I is a principal matrix of the pencil A(x, x) - AB(x, x).In the new variables the ratio of the forms is represented (see (63)) by

A(x, x) A,l t + A2 + ... + (70)

On the real axis we take the n points A,, A2, ... , L. We ascribe to thesepoints non-negative masses m1= m2 = 2, ... , respectively.

Then, by (70), the quotient `4(x'x) is the coordinate of the center of theseB(x, x)

masses. Therefore

A SA(x'x)SA .B(x,x)-Let us, for the time being, ignore the second part of the inequality and

investigate when the equality sign holds in the first part. For this purpose,we group together the equal characteristic values in (69)

A1= ... = Apt < Ap1+1= ... = API+p, < .. (71)

The center of mass can coincide with the least value Al only if all themasses are zero except at this point, i.e., when

41+1=..._ n=0.In this case the corresponding x is a linear combination of the principalcolumns z1, z2, ... , zQ,.25 Therefore all these columns correspond to thecharacteristic value A,, so that x is also a principal column (vector) for A = A,.

24 Here we use the term 'principal vector' in the sense of a principal column of thepencil (see p. 310). Throughout this section, having the geometric interpretation in mind,we often call a column, a vector.

n25 Froni x.= ZE it follows that x='tkzk.

k-1

Page 330: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 7. EXTREMAL PROPERTIES OF CHARACTERISTIC VALUES 319

We have proved :

THEOREM 10: The smallest characteristic value of the regular pencil

A(x, x) - AB(x, x) is the minimum of the ratio of the forms A(x, x) mend

B(x, X)

I = min A (x' x) (72)B(x,x)'

and this minimum is only assumed for principal vectors of the characteristicvalue A,.

2. In order to give an analogous `minimal' characteristic for the next chAr-acteristic value A2, we restrict ourselves to all the vectors orthogonal to z',i.e., to those that satisfy the equation26

B(z',x)=0.For these vectors,

and therefore

A(x,x)_A2Ej+...+Ma'

B(x,x) Ei+...+El

minA(x,x)__A2 (B(zl,x)=0).B(x,x)

Here the equality sign holds only for those vectors orthogonal to z' thatare principal vectors for the characteristic value A2.

Proceeding to the subsequent characteristic values, we eventually obtainthe following theorem :

THEOREM 11: For every p (1 p < n) the p-th characteristic value A,in (69) is the minimum of the ratio of the forms

= min A (x, x) (73)AP B(x,x)'

provided that the variable vector x is orthogonal to the first p -1 ortho-normal principal vectors z', z2, ... , zp-' :

26 Here, and iu what follows, we shall mean by the orthogonality of two vectors(columns) x, y that the equation B(x, y) = 0 holds. This is in complete agreementwith the geometric interpretation given in the preceding section. We shall regard thequantities .r,, x. as the coordinates of a vector x in some basis of a euclidean spacein which the square of the length (the norm) is given by the positive-definite form

B(.r., x) b;kx;xk . In this metric the vectors z', z, ... , z'1 form an orthonormalt.k-I

X

basis. Therefore, if the vector x= 1 Ekzk is orthogonal to one of the Zk, then the cor-k-1

responding Ek = 0.

Page 331: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

320 X. QUADRATIC AND HERMITIAN FORMS

B (zl, x) = 0, ... , B (0-1, x) =0 . (74)

Moreover, the minimum is assumed only for those vectors that satisfy thecondition (74) and are at the same time principal vectors for the charac-teristic value A,,.

3. The characterization of A given in Theorem 11 has the disadvantagethat it is connected with the preceding principal vectors z', z2, ... , zQ-1 andcan therefore be used only when these vectors are known. Moreover, thereis a certain arbitrariness in the choice of these vectors.

In order to give a characterization of lp (p = 1, 2, ... , n) free from thesedefects, we introduce the concept of constraint imposed on the variablesx1, x2, - - , xn-

Suppose that linear forms in the variables x1i x2, ... , xn are given :

Lk(x) = llkxl +12kx2+---+l,,kx" (lc= 1, 2, ..., h). (74')

We shall say that the variables x1, x2, ... , x" or (what is the same) thevector x is subject to h constraints L1, L2, ... , L,, if only such values of thevariables are considered that satisfy the system of equations

Lk(x)=O (k=1,2,...,h). (74")

Preserving the notation (74') for arbitrary linear forms we introducea specialized notation for the `scalar product' of x with the principal vectorsz1, z2, ... , Z":

Lk(x) = B(zk, x) (k =1, 2, ... , n).2' (75)

Furthermore, when the variable vector is subject to the constraints (74")we shall denote min A (x, x) as follows:B(x, x)

Afi (B; L1, L2, ... , Lk) .

In this notation, (73) is written as follows :

dp=µ (B;.ZV L2, ..., Lp_1) (p=1, 2, ..., n). (76)

We consider the constraints

and

Ll(x)=0, ..., Lp_1(x)=0 (77)

Lr+1 (x) = 0, ... , L. (x) = 0. (78)

27 Lk (x) = zkT Bx = dkx, + 12kx2 + - - + where llk, l1k, ... , lnk are the ele-ments of the row matrix zkTB (k = 1, 2, ... , n).

Page 332: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 7. EXTREMAL PROPERTIES OF CHARACTERISTIC VALUES 321

Since the number of constraints (77) and (78) is less than n, there existsa vector x11 I o satisfying all these constraints. Since the constraints (78)express the orthogonality of x to the principal vectors zvfl, ... , z", the corre-sponding coordinates of x(l) are 4+1 = . = " = 0. Therefore, by (70),

B (x(1), x(1)) ^ 4i ... +g P

But thenA \ A (x(1), x(1))(B; L1, L2, ... , LF_1) S

B (x(1), x(1y)< p

This inequality in conjunction with (76) shows that for variable con-straints L1, L2, ... , L,_1 the value of ,u remains less than or equal to A andbecomes A. if the specialized constraints L1, L2f ... , Lp_1 are taken.

Thus we have proved :

THEOREM 12: If we consider the minimum of the ratio of the two formsA(z,x)B(x, x) for p - 1 arbitrary, but variable, constraints L1, L2, ... , L9thenthe maximum of these minima is equal to A,:

AAV = max µ B; Ll, L2, ... , Lr-1) (p =1, ... , n). (79)

Theorem 12 gives a 'maximal-minimal' characterization of Al, 22, ... , A.in contrast to the `minimal' characterization which we discussed in Theo-rem 11.

4. Note that when in the pencil A (x, x) - AB (x, x) the form A (x, x) isreplaced by - A(x, x), all the characteristic values of the pencil changesign, but the corresponding principal vectors remain unchanged. Thus, thecharacteristic values of the pencil - A(x, x) -AB(x, x) are

-A"S-A"-1<...5Moreover, by using the notation

e (B; Ll, £2, ..., Lh) = max B(z,z) (80)

when the variable vector is subject to the constraints L1, L2, ... , L4, we canwrite :

andfu (- B; £1, Lz, ..., Lh) = -v (B; L1, L8, ..., Lh)

max s (_- B; L1, Ls, ... , Lh) _ - mine (B; Ll, L$, ... , L-)'Therefore, by applying Theorems 10, 11, and 12 to the ratio - A (x, x)

we obtain instead of (72), x)'), (76), and (79) the formulas

Page 333: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

322 X. QUADRATIC AND HERMITIAN FORMS

=maxA(x,x)

* B(x,x)

Av (g; L*, L*_1, ... , L*-r+2)

(p=2,.. ,n).-r+I =min v (B; L1, L2, ... , LVA,

These formulas establish the `maximal' and the 'minimal-maximal' prop-erties, respectively, of A,, 22, ..., A,,, which we formulate in the followingtheorem :

THEOREM 13: Suppose that to the characteristic values

a, 5 A2 5 A.

of the regular pencil of forms A(x, x) -AB(x, x) there correspond the line-arly independent principal vectors of the pencil z', z2, ... , z". Then :

1) The largest characteristic value A. is the maximum of the ratio of theforms A (x, x) .

B (z, x) '

%*=max(x'x) (81)B(x,z)'

and this maximum is assumed only for principal vectors of the pencil corre-sponding to the characteristic value A,,.

2) The characteristic value p-th from the end 2"_n+i (2:5 p:5 n) is themaximum of the same ratio of the forms

* = max A x, x) (82)A -P+ B(x,x)

provided that the variable vector x is subject to the constraints :28

B (z*, x) = 0, B (z*-1, x) = 0, ... , B (z*P+2, x) = 0, (83)

A )(84)

this maximum is assumed only for principal vectors of the pencil correspond-ing to the characteristic value and satisfying the constraints (83).

28 In a euclidean apace with a metric form B(x, x), the condition (83) expresses thefact that the vector x is orthogonal to the principal vectors z"-P+2 , ... , X". Bee foot-note 26.

Page 334: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 7. EXTREMAL PROPERTIES OF CHARACTERISTIC VALUES 323

3) If in the maximum of the ratio of the forms 4!!L!) with the constraintsB(x,x)

L1 (x) = 0, ... , L,,_1 (x) = 0 (2:!-. p S n)

(2 < p < n) the constraints are varied, then the least value (minimum) ofthis maximum is equal to

An-p+1= min v ; L1, L2, ... , Lr_1 . (85)0

Lo (x) = 0, L2 (x) = 0, ... , Lh W= 0 . (86)

be h independent constraints.29 Then we can express h of the variablesx1, x2, ... , x by the remaining variables, which we denote by v1f v2,. .. ,Therefore, when the constraints (86) are imposed, the regular pencil offorms A (x, x) - AB (x, x) goes over into the pencil A° (v, v) - 2B° (v, v),where B° (v, v) is again a positive -definite form (only in n - h variables).The regular pencil so obtained has n - h real characteristic values

Ai A2 S ... S Ao (87)

Subject to the constraints (86) we can express all the variables in termsof n - h independent ones v1, v2, ... , vri_k in various ways. However, thecharacteristic values (87) are independent of this arbitrariness and havecompletely definite values. This follows, for example, from the maximal-minimal property of the characteristic values

A1= Mill B° (v, v) = µ (B ; L1, L2, .... LA) (88)

and, in general,

0AP = max ,u (B° ; L1, L9, ..., LP-1)

= max is (B ; Ll, ... , LA, L1, ... , Lp_1) , (89)

where in (89) only the constraints L1, L2, ..., L,_1 are allowed to vary.

29 The constraints (86) are independent when the linear forms Lo (x), L2 (x), ... ,LA (x) on the left-hand sides of (86) are independent.

Page 335: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

324 X. QUADRATIC AND HERMITIAN FORMS

The following theorem holds :

THEOREM 14: If Al < 2 ... <1n are the characteristic values of theregular pencil of forms A(x, x) - AB(x, x) and 11 0 S AQ S ... S ,1 arethe characteristic values of the same pencil subject to h independent con-straints, then

AP S AP s Ap+h (p=1,2, ... , n-h). (90)

Proof. The inequality 1p S 1p (p = 1, 2, ... , n - h) follows easily from(79) and (89). For when new constraints are added, the value of theminimum µ (B ; LI, ... , Lp_i increases or remains the same. Therefore

(B LI, ..., L,_I) (B ; L1, ..., Lh ; LI, ... , Lp_i)

Hence

= max µ (AB ; LI, ... , L S A0 ° = max (AB; L° L° LI L _P v-1 - 1 ..., h, , ..., p I

The second part of the inequality (90) holds in view of the relations

Apmaxµ(B; L°, ..., L,°,; L1, ..., Lp-I)S max µ (A ; LI, ..., LP-1) Lo

Here not only are L1, ... , La-1 varied, on the right-hand side, but Lp, ... ,Lp+,,_1 also; on the left-hand side the latter are replaced by the fixed con-straints Li, L°, . . . , LA .

This completes the proof.

6. Suppose that two regular pencils of forms

A (x, x) - AB (x, x), A (x, x) - AB (x, x) (91)

are given and that for every x o,

A(x,x) SA(x,x)B(x,x) - B(x,x)

Then obviously,

Page 336: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 7. EXTREMAL PROPERTIES OF CHARACTERISTIC VALUES 325

max,u (B ; L1, L2, ... , LPI-1) S max,u(=J ; L1, L2, ... , Lr-11

(p=1,2, ..., n).

Therefore, if we denote by Al < A2 < ... A. and A, < ;2:5:- ...:5 2,,, re-spectively, the characteristic values of the pencils (91), then we have :

ASAP (p=1,2, ..., n).

Thus, we have proved the following theorem :

THEOREM 15: If two regular pencils of forms A(x, x) - AB(x, x) andA(x, x) - AB(x, x) with the characteristic values A, < A2 < ... < A,, and11 < 22 < ... < 2,, are given, then the identical relation

A (x,x) A(x,z)(92)

B(x,x) B(x,z)

implies that ArSAp (p=1,2, ..., n). (93)

Let us consider the special case where, in (92), B(x, x) = B (x, x). Inthis case, the difference A(x, x) - A(x, x) is a positive-semidefinite quad-ratic form and can therefore be expressed as a sum of independent positivesquares:

A(x,x)=A(x,x)+2'[X (x)]2.i-1

Then, when the r independent constraints

%1 (x)=0, %2(x)=0, ... , %,(x)=0

are imposed, the forms A(x,x) and A (x, x) coincide, and the pencilsA('x, x) - AB(x, x) and A(x, x) - AB(x, x) have the same characteristicvalues

Applying Theorem 14 to both pencils A (x, x) - AB (x, x) and(x, x) -AB(x,x), we have:

APSAp5AP.F., (p=1,2,...,n-r).In conjunction with the inequality (93), this leads to the following theorem :

Page 337: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

326 X. QUADRATIC AND HERMITIAN FORMS

THEOREM 16: If Al < A2 < ... < A. and 1, !5; 12 < ... < 1. are the char-acteristic values of two regular pencils of forms A(x, x) - AB(x, z) andA(x, x) - AB (x, x), where

A(x,x)=A(x,x)+[I (x)]2,i.1

and X4 (x) (i = 1, 2, ... , r) are independent linear forms, then the followinginequalities hold :30

A,, 2p+r (p =1, 2, ... , n) . (94)

In exactly the same way the following theorem is proved :

THEOREM 17 : If Al < A2 !5; ... < A. and 11 <_ A2 < ... < !n are the char-acteristic values of the regular pencil of forms A(x, x) -AB(x, x) andA(x, x) - AB (x, x), where the form T3 (x, x) is obtained from B(x, x) byadding r positive squares, then the following inequalities hold:"

1,, 2p (p=1,2, ..., n). (95)

Note. In Theorems 16 and 17 we can claim that for some p we have,respectively Ap < 2, and X. < Ap, provided of course that r 0.$2

§ S. Small Oscillations of a System with n Degrees of Freedom

The results of the two preceding sections have important applications in thetheory of small oscillations of a mechanical system with n degrees of freedom.

1. We consider the free oscillations of a conservative mechanical systemwith n degrees of freedom near a stable position of equilibrium. We shallgive the deviation of the system from the position of equilibrium by meansof independent generalized coordinates q1, q2, ... , q,,. The position ofequilibrium itself corresponds to zero values of these coordinates : q, = 0,q2 = 0, ... , q = 0. Then the kinetic energy of the system is represented asa quadratic form in the generalized velocities q,, 42, ... , 4.: 33

n

T = X be (q1, q2, ... , qA) 449Ei,k-l

30 The second parts of these inequalities hold for p < n - r only.31 The first parts of the inequalities hold for p > r.32 See [17], pp. 71-73.

33 A dot denotes the derivative with respect to time.

Page 338: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 8. SMALL OSCILLATIONS OF SYSTEM WITH n DEGREES OF FREEDOM 327

Expanding the coefficients bik(gl, q2, ... , qn) as power series in ql, q2, ... , qn

bik(g1,g2, ... , g,.)=bik+.,. (i,k=1,2, .... n)

and keeping only the constant terms b4k, since the deviations q1, q2, . , qn

are small, we then have :n

T = E bikMki,ka1

(bik = bk{ ; i, k =1, 2, ..., n).

The kinetic energy is always positive, and is zero only for zero velocities

q1= q2 = ... = qn = 0. Therefore bikgigk is a positive-definite form.i, k..l

The potential energy of the system is a function of the coordinates :P (ql, q2 ,- .. , qn) . Without loss of generality, we can take

P0=P(0,0,...,0)=0.

Then, expanding the potential energy as a power series in q1, q2, ... , qn,we obtain : n

P=,Y aM+aikggk+...

i=1 i,ke1

Since in a position of equilibrium the potential energy always has astationary value, we have

a,i dq Io=0 (i=1, 2, ..., n).

Keeping only the terms of the second order in q1, q2i ... , q, we haven

P =Z aikgigk (aik =aki; i, k =1, 2, ... , n) .i, k-1

Thus, the potential energy P and the kinetic energy T are determined bytwo quadratic forms :

n n

P =,' a.xgigk, T =,Ybikgigk , (96)i,k-1 i,k-I

the second of which is positive definite.We now write down the differential equations of motion in the form

of Lagrange's equations of the second kind :

d 8T eT 8P (i =1, 2, ..., n). (97)at a4{ - aq _ - aQi

34 See, for example, G. K. Buslow (Buslov), Theoreti8che Mechanik, § 191.

Page 339: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

328 X. QUADRATIC AND HERMITIAN FORMS

When we substitute for T and P their expressions from (96), we obtain:

X btkgk + X aikgk = 0 (i =1, 2, ..., n) . (98)k-1 k=1

We introduce the real symmetric matrices

A=llaskII andB=llb,k1l1

and the column matrix q = (q1, q2, . . , qn) and write the system of equations(98) in the following matrix form :

Bq + Aq = o. (98')

We shall seek solutions of (98) in the form of harmonic oscillations

q1= v1 sin (cot + a), q,,= v2 sin (wt + a), ..., qn = v sin (cut + a),

in matrix notation :q = vsin (wt + a). (99)

Here v = (v1, v2, ... , vn) is the constant-amplitude column (constant-amplitude `vector'), co is the frequency, and a is the initial phase of theoscillation.

Substituting the expression (99) for q in (98') and cancellingsin (wt + a), we obtain :

Av = ,1Bv (,1=w2).

But this equation is the same as (49). Therefore the required amplitudevector is a principal vector, and the square of the frequency A = cot is thecorresponding characteristic value of the regular pencil of formsA(x, x) - AB (x, x).

We subject the potential energy to an additional restriction by postu-lating that the function P(q1, q2i ... , q,,) in a position of equilibrium shallhave a strict minimum.35

Then, by a theorem of Dirichlet,36 the position of equilibrium is stable.On the other hand, our assumption means that the quadratic formP = A(q, q) is also positive definite.

By Theorem 8, the regular pencil of forms A(x, x) - AB (x, x) has realcharacteristic values Al, A2, . . . , A. and n corresponding principal character-istic vectors v1, v2, ... , vn (vk = (vlk, V2k, . , V,,k) ; k = 1, 2, ... , n) satisfy-ing the condition

35 I.e., that the value of Po in the position of equilibrium is less than all other valuesof the function in some neighborhood of the position of equilibrium.

36 See G. K. Suslow (Suslov), Theoretische Mechanik, § 210.

Page 340: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 8. SMALL OSCILLATIONS OF SYSTEM WITH n DEGREES OF FREEDOM 329

B (vi, vk) = ' bp,vµivrk = Vik (i, k =1, 2, ..., n). (100)k. v-1

From the fact that A(x, x) is positive definite it follows that all thecharacteristic values of the pencil A(x, x) - AB(x, x) are positive :3'

Ak>0 (k=1,2,...,n).

But then there exist n harmonic oscillations3s

vk sin (cot + ak) ((o%= .k, k =1, 2, ... , n), (101)

whose amplitude vectors vk = (v1k, V2k, . . . , vk) (k =1, 2, ... , n) satisfythe conditions of 'orthonormality' (100).

Since the equation (98') is linear, every oscillation can be obtained by asuperposition of the harmonic oscillations (101) :

n

q PAksin ((okt+ak)vk (102)k-1

where Ak and ak are arbitrary constants. For, whatever the values of theseconstants, the expression (102) is a solution of (98). On the other hand, thearbitrary constants can be made to satisfy the following initial conditions :

q(e-o=qo, 4jc_o=q0

For from (102) we find :

n n

q0=,'Aksinakvk, 4o=£WkAkcosakvk.k-1 k-1

(103)

Since the principal columns v1, v2, . .., vn are always linearly independent.the values Ak sin ak and cvk cos ak (k = 1, 2, .. ., n), and hence the constantsAk and ak (k = 1, 2, ... , n) , are uniquely determined from (103).

The solution (102) of our system of differential equations can be writtenmore conveniently :

n

qr X Ak sin ((ot + ak) va (104)k-1

Note that we could also derive the formulas (102) and (104) startingfrom Theorem 9. For let us consider a non-singular transformation of the

90 This follows, for example, from the representation (63).38 Here the initial phases a. (k = 1, 2, ... , n) are arbitrary constants.

Page 341: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

330 X. QUADRATIC AND HERMITIAN FORMS

variables with the matrix V = II v,k Ili that reduces the two forms A (x, x)

and B(x, x) simultaneously to the canonical form (63). Setting

or, more briefly,

n

qj=Zv8, (i=1, 2, ..., n)k-1

(105)

q=VO (106)

and observing that q = V6, we have :n n

P=A (q, q) ='A Or, T =B(4, _.E 9Ei_1 k-1

(107)

The coordinates 01, 92i . . . , Bn in which the potential and kinetic energieshave a representation as in (107) are called principal coordinates.

We now make use of Lagrange's equations of the second kind (98) andsubstitute the expressions (107) for P and T. We obtain:

9k + Ak9k = 0 (k =1, 2, ..., n). (108)

Since A(q, q) is positive definite, all the numbers A,, A2, ... , 2,, are positiveand can be represented in the form

Ak=wk (wk>0; k=1 ,2 , (109)

From (108) and (109), we find:

Ok = At sin (wkt + ak) (k = 1, 2, ... , n) . (110)

When we substitute these expressions for 9k in (105). we again obtainthe formulas (104) and therefore (102). The values vi p (i, k =1, 2, ... , n)in both methods are the same, because the matrix V = II v,k 11 i in (106) is,by Theorem 9, a principal matrix of the regular pencil of formsA(x, x) - AB (x, x).

2. We also mention a mechanical interpretation of Theorems 14 and 15.We number the frequencies W1, w2, ... , w,, of the given mechanical system

in non-descending order :

0<w1Sco2 ' SWn.

The disposition of the corresponding characteristic values 2k= wk (k = 1, 2,3, ..., n) of the pencil A(x, x) - AB(x, x) is then also determined :

Page 342: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 9. HERMITIAN FORMS 331

We impose h independent finite stationary constraints39 on the given

system. Since the deviations ql, q2, .. , qn are supposed to be small, these

connections can be assumed to be linear in q1, q2, ... , qn :

L,(q) =0, L2(q) =0, ..., Lh(q) =0.

After the constraints are imposed, our system has n - It degrees offreedom. The frequencies of the system,

are connected with the characteristic values AI A°s S 5 A°_h of the

pencil A(x, x) - AB (x, x), subject to the constraints L1, L2, ... , Lh, by therelations AO = (012 (j = 1, 2, .... n - h). Therefore Theorem 14 immediately

implies that0-)j W! S wj+h (j =1, 2,...,n-h).

Thus : When h constraints are imposed, the frequencies of a system cancannot exceed theonly increase, but the value of the new j-th frequency 10;

value of the previous (j + h) -th frequency w*h .In exactly the same way, we can assert on the basis of Theorem 15 that:

With increasing rigidity of the system, i.e., with an increase of the form4.(q, q) for the potential energy (without a change in B(q, q) ), the fre-quencies can only increase; and with increasing inertia of the system, i.e.,with an increase of the form B(4,4) for the kinetic energy (without achange in A (q, q) ), the frequencies can only decrease.

Theorems 16 and 17 lead to an additional sharpening of this proposition.40

§ 9. Hermitian Forms41

1. All the results of §§ 1-7 of this chapter that were established for quad-ratic forms can be extended to hermitian forms.

We recall42 that a hermitian form is an expression

$9 A finite stationary constraint is expressed by an equation f (ql, qs, ... , 0,where f (qi, q2, ... , q,,) is some function of the generalized coordinates.

40 The reader can find an account of the oscillatory properties of elastic oscillationsof a system with n degrees of freedom in [17], Chapter III.

41 In the preceding sections, all the numbers and variables were real. In this section,the numbers are complex and the variables assume complex values.

42 See Chapter IX, § 2.

Page 343: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

332 X. QITADRATIC AND HERMITIAN FORMS

H (x, x) _ I hikxixk (h,k= hki; i, k =1, 2, ..., n) . (111)i,ke1

To the hermitian form (111) there corresponds the following bilinear

hermitian form :

H(x, y) hikxiyk; (112)

moreover,i, k=1

H(y,x) =H(x,y) (113)

and, in particular,H(x,x) =H(x,x) (113')

i.e., the hermitian form H (x, x) assumes real values only.The coefficient matrix H = 1 hik II; of the hermitian form is hermitian,

i.e., H = H.43By means of the matrix H = II hik 11 71 we can represent H (x, y) and, in

particular, H(x, x) in the form of a product of three matrices, a row, asquare, and a column matrix :44

H(x, y) =xTH7, H(x,x) =xTHx. (114)If

m pX =1 cui, y = ' dkvk

k_1(115)

where u', vk are column matrices and cj, dk are complex numbers (i = 1, 2,3, ... , m ; k =1, 2, . ., p), then

m p

H(x, y) £cidkH(ui, vk). (116)i-1 km1

We subject the variables x1, X2 ,- .. , x to the linear transformation

xi = di tiksk (i =1, 2, ... , n) (117)k-i

43 A matrix symbol followed by an asterisk * denotes the matrix that is obtained fromthe given one by transposition and replacement of all the elements by their complexconjugates (H* = gT).

44 Here

x = (x1, X2.... , xn), (x1, 21, ... , xn), t _ yv ... , yn), t = !2, ... , in);

the sign T denotes transposition.

Page 344: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 9. HERMITIAN FORMS 333

or, in matrix notation,

x=Tl; (T= 11 tik11").

After the transformation, H (x, x) assumes the form

(117')

n

H(,i,k- l

where the new coefficient matrix H = II htk 111 is connected with the oldcoefficient matrix H = II ha, II i by the formula

hT = TTHT. (118)

This is immediately clear when, in the second of the formulas (114), x isreplaced by

T we can rewrite (118) as follows :

II = WHW. (119)

From the formula (118) it follows that H and H have the same rankprovided the transformation (117) is non-singular (I T i z 0). The rank ofH is called the rank of the form H(x, x).

The determinant i H I is called the discriminant of H (x, x). From(118) we obtain the formula for the transformation of the discriminant ontransition to new variables :

IHI=IHIITiiTIA hermitian form is called singular if its discriminant is zero. Obviously,

a singular form remains singular under any transformation of the vari-ables (117).

A hermitian form H(x, x) can be represented in infinitely many waysin the form

H(x,x)ajXjX{,{a1

where a{ , 0 (i = 1, 2, ... , r) are real numbers andn

X` =Z,' ackxk (i =1, 2, . . ., r)k-i

(120)

are independent complex linear forms in the variables x1, x2, ... , xq.'S

45 Therefore r < n.

Page 345: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

334 X. QUADRATIC AND HERMITIAN FORMS

We shall call the right-hand side of (120) a sum of linearly independentsquares46 and every term in the sum a positive or a negative square accord-

ing as a{ > 0 or < 0. Just as for quadratic forms, the number r in (120)is equal to the rank of the form H(x, x).

THEOREM 18 (The Law of Inertia for Hermitian Forms) : In the repre-sentation of a hermitian form H(x, x) as a sum of linearly independentsquares,

H(x,x)aXX ,i-1

the number of positive squares and the number of negative squares do notdepend on the choice of the representation.

The proof is completely analogous to the proof of Theorem 1 (p. 297).The difference a between the number n of positive squares and the num-

ber v of negative squares in (120) is called the signature of the bermitianform H(x,x): o, =n-v.

Lagrange's method of reduction of quadratic forms to sums of squarescan also be used for hermitian forms, only the fundamental formulas (15)and (16) on p. 299 must then be replaced by the formulas49

H(x,x)= 1hop

k i

(hkf+ Jjo)

xk

A

i' hkaxkk-1

2

2

+ HI (XP x) , (121)

h(hkf XI;

2)-I- Hp,(x, z). (122)

Let us proceed to establish Jacobi's formula for a hermitian formn

H(z, x) hx{zk of rank r. Here, as in the case of a quadratic form,{,k-1

we assume that

Dk = H (12 ... k) 0 (k =1, 2, ..., r). (123)

This inequality enables us to use Theorem 2 of Chapter II (p. 38) on therepresentation of an arbitrary square matrix in the form of a product ofthree matrices : a lower triangular matrix F, a diagonal matrix D, and anupper triangular matrix L. We apply this theorem to the matrix H = 11 hiR I1;

46 This terminology is connected with the fact that R,%, is the square of the modulusof Z, (%i%i= I Z i ).

41 The formula (121) is applicable when h,, * 0; and (122), when hff; h, = 0,No ¢- 0.

Page 346: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 9. HERMITIAN FORMS 335

and obtainH=F{Di, Dl, ..., D' 0, ..., O)L, (124)

where F = II f 4k II 1 , L - II lik II i , and

k

ffk DkH ... k-1 k)'1k

DkH(1 ... k-1 j) (125)

(j=k,k+1,...,n; k=1,2,...,r),

ja=1,j=0 (i <k; i, k=1, 2, ..., n). (126)

Since H= II h4k 11 7 is a hermitian matrix, it follows from these equa-tions that

i? k; k=1, 2, ..., r; 2, ..., n,1 (127)frk= ltt (i <k; i, k=1, 2, ..., n J

Since all the elements in the last n - r columns of F and the last n - rrows of L can be chosen arbitrarily,48 we choose these elements such that1) the relations (127) hold for all i, k

f4k=14k (i,k=1,2,...,n)and 2) IF{=ILIO. Then

F = L", (128)

and (124) assumes the form

H=L* DI) D2,...,Dr , 0,...,0}L. (129)

Setting(130)

we write (129) as follows :

H=TT {D,, Dl, ..., Dr'1, 0, ..., 0) T (jTj O). (131)

A comparison of this formula with (118) shows that the hermitian formn n .

(132),yDkx xk (Do-1)L

under the transformation of the variables

^" These elements, in fact, drop out of the right-hand side of (124), because the lastn - r diagonal elements of D are zero.

Page 347: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

336 X. QUADRATIC AND HERMITIAN FORMS

ti=

r`,' tikxhXk-1

(i=1, 2, ... , n)

goes over into H(x, x), i.e., that Jacobi's formula holds:

H (x, x) = EJ -XkXk (Do= 1), (133)

where

and

Xk x k + tk. k+lxk+l + + t k x (k =1, 2, ... , r) (134)

_ 1 1 2 ... k-1 7tDkH 1 2...k-1 k (135)

The linear forms X1, X2, ..., X, are independent, since Xk contains thevariable xk which does not occur in the subsequent forms Xk+1i . . . , X.

When we introduce, in place of Xl, X2i ... , Xr, the linearly independentforms

Yk=DkXk (k=1, 2, ..., r) , (136)

we can write Jacobi's formula (133) in the form

r Y_H (x, x) ' (D0=1). (137)

According to Jacobi's formula (137), the number of negative squaresin the representation of H(x, x) is equal to the number of variations of signin the sequence 1, D,, D2, ... , Dr

v=V(1,D1,D2,...,Dr),

so that the signature o of H(x, x) is determined by the formula

o=r-2V(1,DI,D2,...,Dr). (138)

All the remarks about the special cases that may occur, made for quad-ratic forms (§ 3). automatically carry over to hermitian forms.

DEFINITION 5: A hermitian form H(x, x) X hjkxtxk is called posi-i, k-1

tine (negative) semidefinite if for arbitrary values of the variables x1, x2iX3, ... , x,,, not all equal to zero,

H(x,x) >0 (<0).

Page 348: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

§ 9. HERMITIAN FORMSa

337

DEFINITION 6: A hermitian form H(x, x) _ 2Thikxixk is called positivei. k+l

(negative) definite if for arbitrary values of the variables x1, x2, ... , x,,, not

all equal to zero,H(x, x) > 0 (< 0).

A

THEOREM 19: A hermitian form H(x, x) = Y hikx{xk is positive defi-i.k..1

mite if and only if the following inequalities hold:

Dk=H(1 2...k}>0 (k= 1,2,...,n). (139)

a

THEOREM 20: A hermitian form H(x, x) = .2; hikxixk is positive semi-{.k..1

definite if and only if all the principal minors of H = II hik II; are non-negative :

? 0H (i1 22 ... sip)

(i1,i2,...,tip=1,2,...,n;pn).(140)

The proofs of Theorems 19 and 20 are completely analogous to the proofsof Theorems 3 and 4 for quadratic forms.

The conditions for a hermitian form H(x, x) to be negative definite orsemidefinite are obtained by applying (139) and (140) to the form-H (x, x).

From Theorem 5' of Chapter IX (p. 274), we obtain the Theorem on thereduction of a hermitian form to principal axes :

THEOREM 21: Every hermitian form H(x, x) _ 2,' hikxixk can be re-duced by a unitary transformation of the variables

k'1

X= Uj (UU*=E) (141)

to the canonical formtt rr

A t_A (142)

where Al, 22, ... , A. are the characteristic values of the matrix H = li hik i .Theorem 21 follows from the formula

H= U 11 Abu 11 U-1= TT II A{a{t I I T (UT= U-'=T). (143)n ri

Let H(x, x) =X hikx{xk and G(x, x) =Z,' gikxixk be two hermitiani,ilk-1

forms. We shall study the pencil of hermitian forms H (x, x) - ,lG (x, x)

Page 349: F. R. Gantmacher the Theory of Matrices, Vol. 1 1990

338 X. QUADRATIC AND HERMITIAN FORMS

(A is a real parameter). This pencil is called regular if G(x, x) is positivedefinite. By means of the hermitian matrices H = II h;k II; and G = II gik II "Iwe form the equation

H-AG I =0.

This equation is called the characteristic equation of the pencil of her-mitian forms. Its roots are called the characteristic values of the pencil.

If $\lambda_0$ is a characteristic value of the pencil, then there exists a column $z = (z_1, z_2, \dots, z_n) \neq 0$ such that

$Hz = \lambda_0\, Gz.$

We shall call the column $z$ a principal column or principal vector of the pencil $H(x,x) - \lambda G(x,x)$ corresponding to the characteristic value $\lambda_0$.

Then the following theorem holds:

THEOREM 22: The characteristic equation of a regular pencil of hermitian forms $H(x,x) - \lambda G(x,x)$ has $n$ real roots $\lambda_1, \lambda_2, \dots, \lambda_n$. To these roots there correspond $n$ principal vectors $z^1, z^2, \dots, z^n$ satisfying the conditions of 'orthonormality':

$G(z^i, z^k) = \delta_{ik} \qquad (i, k = 1, 2, \dots, n).$

The proof is completely analogous to the proof of Theorem 8. All extremal properties of the characteristic values of a regular pencil of quadratic forms remain valid for hermitian forms.

Theorems 10-17 remain valid if the term 'quadratic form' is replaced throughout by the term 'hermitian form.' The proofs of the theorems are then unchanged.
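For a regular pencil, Theorem 22 corresponds to the generalized hermitian eigenproblem. The sketch below (ours) relies on `scipy.linalg.eigh`, which accepts a positive definite $G$ and returns real roots together with principal vectors normalized exactly as in the 'orthonormality' conditions $G(z^i, z^k) = \delta_{ik}$.

```python
import numpy as np
from scipy.linalg import eigh

H = np.array([[2, 1j], [-1j, 1]])                 # hermitian
G = np.array([[3, 0], [0, 1]], dtype=complex)     # hermitian, positive definite

lam, Z = eigh(H, G)                               # solves H z = lambda G z
print(lam)                                        # n real characteristic values
# Columns of Z are principal vectors, 'orthonormal' with respect to G:
print(np.allclose(Z.conj().T @ G @ Z, np.eye(2)))
```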

§ 10. Hankel Forms

1. Let $s_0, s_1, \dots, s_{2n-2}$ be a sequence of numbers. We form, by means of these numbers, a quadratic form in $n$ variables

$S(x,x) = \sum_{i,k=0}^{n-1} s_{i+k}\, x_i x_k.$   (144)

This is called a Hankel form. The matrix $S = \| s_{i+k} \|_0^{n-1}$ corresponding to this form is called a Hankel matrix. It has the form


$S = \begin{Vmatrix} s_0 & s_1 & s_2 & \cdots & s_{n-1} \\ s_1 & s_2 & s_3 & \cdots & s_n \\ s_2 & s_3 & s_4 & \cdots & s_{n+1} \\ \vdots & \vdots & \vdots & & \vdots \\ s_{n-1} & s_n & s_{n+1} & \cdots & s_{2n-2} \end{Vmatrix}.$

We denote the sequence of successive principal minors of $S$ by $D_1, D_2, \dots, D_n$:

$D_p = | s_{i+k} |_0^{p-1} \qquad (p = 1, 2, \dots, n).$

In this section we shall derive the fundamental results of Frobenius about the rank and signature of real Hankel forms.49
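For later experimentation, a Hankel matrix and its successive minors $D_p$ are easy to produce; the sketch below is ours, with `scipy.linalg.hankel` supplying the indexing $S[i,k] = s_{i+k}$.

```python
import numpy as np
from scipy.linalg import hankel

s = np.array([1.0, 2.0, 0.0, -1.0, 3.0])      # s_0, ..., s_{2n-2} with n = 3
n = (len(s) + 1) // 2
S = hankel(s[:n], s[n - 1:])                   # S[i, k] = s[i + k]

x = np.array([1.0, -1.0, 2.0])
print(x @ S @ x)                               # value of the form S(x, x)
print([np.linalg.det(S[:p, :p]) for p in range(1, n + 1)])   # D_1, ..., D_n
```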

We begin by proving two lemmas.

LEMMA 1: If the first $h$ rows of the Hankel matrix $S = \| s_{i+k} \|_0^{n-1}$ are linearly independent, but the first $h + 1$ rows are linearly dependent, then

$D_h \neq 0.$

Proof. We denote the first $h + 1$ rows of $S$ by $R_1, R_2, \dots, R_h, R_{h+1}$. By assumption, $R_1, R_2, \dots, R_h$ are linearly independent and $R_{h+1}$ is expressed linearly in terms of them:

$R_{h+1} = \sum_{g=1}^{h} \alpha_g\, R_{h-g+1},$

or

$s_q = \sum_{g=1}^{h} \alpha_g\, s_{q-g} \qquad (q = h, h+1, \dots, h+n-1).$   (145)

We write down the matrix formed from the first $h$ rows $R_1, R_2, \dots, R_h$ of $S$:

$\begin{Vmatrix} s_0 & s_1 & s_2 & \cdots & s_{n-1} \\ s_1 & s_2 & s_3 & \cdots & s_n \\ \vdots & \vdots & \vdots & & \vdots \\ s_{h-1} & s_h & s_{h+1} & \cdots & s_{h+n-2} \end{Vmatrix}$   (146)

This matrix is of rank $h$. On the other hand, by (145) every column of the matrix, beginning with the $(h+1)$-th, can be expressed linearly in terms of the $h$ preceding columns and hence in terms of the first $h$ columns. But since the rank of (146) is $h$, these first $h$ columns of (146) must then be linearly independent, i.e.,

$D_h \neq 0.$

This proves the lemma.

49 See [162].


LEMMA 2: If in the matrix $S = \| s_{i+k} \|_0^{n-1}$, for a certain $h$ $(< n)$,

$D_h \neq 0, \quad D_{h+1} = \cdots = D_n = 0$   (147)

and

$t_{ik} = \frac{1}{D_h}\, S\begin{pmatrix} 1 & \cdots & h & h+i+1 \\ 1 & \cdots & h & h+k+1 \end{pmatrix} = \frac{1}{D_h}\begin{vmatrix} s_0 & \cdots & s_{h-1} & s_{h+k} \\ \vdots & & \vdots & \vdots \\ s_{h-1} & \cdots & s_{2h-2} & s_{2h+k-1} \\ s_{h+i} & \cdots & s_{2h+i-1} & s_{2h+i+k} \end{vmatrix} \qquad (i, k = 0, 1, \dots, n-h-1),$   (148)

then the matrix $T = \| t_{ik} \|_0^{n-h-1}$ is also a Hankel matrix and all its elements above the second diagonal are zero, i.e., there exist numbers $t_{n-h-1}, \dots, t_{2n-2h-2}$ such that

$t_{ik} = t_{i+k} \qquad (i, k = 0, 1, \dots, n-h-1;\ t_0 = t_1 = \cdots = t_{n-h-2} = 0).$

Proof. We introduce the matrices

$T_p = \| t_{ik} \|_0^{p-1} \qquad (p = 1, 2, \dots, n-h).$

In this notation, $T = T_{n-h}$. We shall show that every $T_p$ $(p = 1, 2, \dots, n-h)$ is a Hankel matrix and that $t_{ik} = 0$ for $i + k \le p - 2$. The proof is by induction with respect to $p$.

For the matrix $T_1$ our assertion is trivial; for $T_2 = \begin{Vmatrix} t_{00} & t_{01} \\ t_{10} & t_{11} \end{Vmatrix}$ it is obvious, since $t_{01} = t_{10}$ (because $S$ is symmetric) and $t_{00} = \frac{D_{h+1}}{D_h} = 0$.

Let us assume that our assertion is true for the matrices $T_p$ $(p < n-h)$; we shall show that it is also true for $T_{p+1} = \| t_{ik} \|_0^{p}$. From the assumption it follows that there exist numbers $t_{p-1}, t_p, \dots, t_{2p-2}$ such that, with $t_0 = \cdots = t_{p-2} = 0$,

$T_p = \| t_{i+k} \|_0^{p-1}.$

Here

$| T_p | = \pm\, t_{p-1}^{\,p}.$   (149)

On the other hand, using Sylvester's determinant identity (see (28) on page 32), we find:

$| T_p | = \frac{D_{h+p}}{D_h} = 0.$   (150)


Comparing (149) with (150), we obtain

$t_{p-1} = 0.$   (151)

Furthermore, from (148),

$t_{ik} = \frac{1}{D_h}\begin{vmatrix} s_0 & \cdots & s_{h-1} & s_{h+k} \\ \vdots & & \vdots & \vdots \\ s_{h-1} & \cdots & s_{2h-2} & s_{2h+k-1} \\ s_{h+i} & \cdots & s_{2h+i-1} & s_{2h+i+k} \end{vmatrix}.$   (152)

By the preceding lemma, it follows from (147) that the $(h+1)$-th row of the matrix $S = \| s_{i+k} \|_0^{n-1}$ is linearly dependent on the first $h$ rows:

$s_q = \sum_{g=1}^{h} \alpha_g\, s_{q-g} \qquad (q = h, h+1, \dots, h+n-1).$   (153)

Let $i, k \le p$, $p \le i + k \le 2p - 1$. Then one of the numbers $i$ or $k$ is less than $p$. Without loss of generality, we assume that $i < p$. Then, when we expand, by (153), the last column of the determinant on the right-hand side of (152) and use the relations (152) again, we obtain

$t_{ik} = s_{2h+i+k} + \sum_{g=1}^{h} \alpha_g \bigl( t_{i,\,k-g} - s_{2h+i+k-g} \bigr).$   (154)

By the induction hypothesis and (151), since in (154) $i < p$, $k - g < p$, and $i + k - g \le 2p - 2$, we have $t_{i,\,k-g} = t_{i+k-g}$. Therefore, for $i + k < p$ all the $t_{ik} = 0$, and for $p \le i + k \le 2p - 1$ the value of $t_{ik}$, by (154), depends on $i + k$ only.

Thus, $T_{p+1}$ is a Hankel matrix, and all its elements $t_0, t_1, \dots, t_{p-1}$ above the second diagonal are zero.

This proves the lemma.
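A quick numerical check of Lemma 2 (ours; the toy data with $n = 3$, $h = 1$ are arbitrary) builds the ratios (148) directly from bordered determinants:

```python
import numpy as np

s = np.array([1.0, 0.0, 0.0, 0.0, 2.0])   # Hankel data with D_1 != 0, D_2 = D_3 = 0
n, h = 3, 1
S = np.array([[s[i + k] for k in range(n)] for i in range(n)])
D_h = np.linalg.det(S[:h, :h])

def t(i, k):
    """t_ik of (148): minor of S bordering D_h, divided by D_h."""
    rows = list(range(h)) + [h + i]
    cols = list(range(h)) + [h + k]
    return np.linalg.det(S[np.ix_(rows, cols)]) / D_h

T = np.array([[t(i, k) for k in range(n - h)] for i in range(n - h)])
print(T)   # [[0. 0.], [0. 2.]] -- a Hankel matrix, zero above the second diagonal
```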

Using Lemma 2, we shall prove the following theorem:


THEOREM 23: If the Hankel matrix $S = \| s_{i+k} \|_0^{n-1}$ has rank $r$ and if for some $h$ $(< r)$

$D_h \neq 0, \quad D_{h+1} = \cdots = D_r = 0,$

then the principal minor of order $r$ formed from the first $h$ and the last $r - h$ rows and columns of $S$ is not zero:

$D^{(r)} = S\begin{pmatrix} 1 & \cdots & h & n-r+h+1 & n-r+h+2 & \cdots & n \\ 1 & \cdots & h & n-r+h+1 & n-r+h+2 & \cdots & n \end{pmatrix} \neq 0.$

Proof. By the preceding lemma, the matrix

$T = \| t_{ik} \|_0^{n-h-1}, \qquad t_{ik} = \frac{1}{D_h}\, S\begin{pmatrix} 1 & \cdots & h & h+i+1 \\ 1 & \cdots & h & h+k+1 \end{pmatrix} \qquad (i, k = 0, 1, \dots, n-h-1),$

is a Hankel matrix in which all the elements above the second diagonal are zero. Therefore

$| T | = \pm\, t_{n-h-1}^{\,n-h}.$

On the other hand,50 $| T | = \frac{D_n}{D_h} = 0$. Therefore $t_{n-h-1} = 0$, and the matrix $T$ has the form

$T = \begin{Vmatrix} 0 & \cdots & \cdots & 0 \\ \vdots & & 0 & u_{n-h-1} \\ 0 & 0 & \cdot & \vdots \\ 0 & u_{n-h-1} & \cdots & u_1 \end{Vmatrix}$

The rank of $T$ must be $r - h$.51 Therefore, for $r < n - 1$, the elements $u_{r-h+1} = \cdots = u_{n-h-1} = 0$ in the matrix $T$, and

50 By Sylvester's determinant identity (see (28) on p. 32).
51 From Sylvester's identity it follows that all the minors of $T$ whose order exceeds $r - h$ are zero. On the other hand, $S$ contains some non-vanishing minors of order $r$ bordering $D_h$. Hence it follows that the corresponding minor of order $r - h$ of $T$ is different from zero.


$T = \begin{Vmatrix} 0 & \cdots & \cdots & 0 \\ \vdots & & 0 & u_{r-h} \\ 0 & 0 & \cdot & \vdots \\ 0 & u_{r-h} & \cdots & u_1 \end{Vmatrix} \qquad (u_{r-h} \neq 0).$

But then, by Sylvester's identity (see page 32),

$D^{(r)} = D_h\, T\begin{pmatrix} n-r+1 & \cdots & n-h \\ n-r+1 & \cdots & n-h \end{pmatrix} = \pm\, D_h\, u_{r-h}^{\,r-h} \neq 0,$

and this is what we had to prove.

Let us consider a real52 Hankel form $S(x,x) = \sum_{i,k=0}^{n-1} s_{i+k}\, x_i x_k$ of rank $r$. We denote by $\pi$, $\nu$, and $\sigma$, respectively, the number of positive and of negative squares and the signature of the form:

$\pi + \nu = r, \qquad \sigma = \pi - \nu = r - 2\nu.$

By the theorem of Jacobi (p. 303) these values can be determined from the signs of the successive minors

$D_0 = 1, \ D_1, \ D_2, \ \dots, \ D_{r-1}, \ D_r$   (155)

by the formulas

$\pi = P(1, D_1, \dots, D_r), \qquad \nu = V(1, D_1, \dots, D_r),$   (156)

$\sigma = P(1, D_1, \dots, D_r) - V(1, D_1, \dots, D_r) = r - 2\, V(1, D_1, \dots, D_r).$

These formulas become inapplicable when the last term in (155) or any three consecutive terms are zero (see § 3). However, as Frobenius has shown, for Hankel forms there is a rule that enables us to use the formulas (156) in the general case:

THEOREM 24 (Frobenius): For a real Hankel form $S(x,x) = \sum_{i,k=0}^{n-1} s_{i+k}\, x_i x_k$ of rank $r$ the values of $\pi$, $\nu$, and $\sigma$ can be determined by the formulas (156) provided that

52 In the preceding Lemmas 1 and 2 and in Theorem 23, the ground field can be taken as an arbitrary number field, in particular, the field of complex or of real numbers.


1) for

$D_h \neq 0, \quad D_{h+1} = \cdots = D_r = 0 \qquad (h < r)$   (157)

$D_r$ is replaced by $D^{(r)}$, where

$D^{(r)} = S\begin{pmatrix} 1 & \cdots & h & n-r+h+1 & \cdots & n \\ 1 & \cdots & h & n-r+h+1 & \cdots & n \end{pmatrix};$

2) in any group of $p$ consecutive zero determinants

$(D_h \neq 0) \quad D_{h+1} = D_{h+2} = \cdots = D_{h+p} = 0 \quad (D_{h+p+1} \neq 0)$   (158)

a sign is attributed to the zero determinants according to the formula

$\operatorname{sign} D_{h+j} = (-1)^{\frac{j(j-1)}{2}} \operatorname{sign} D_h \qquad (j = 1, 2, \dots, p).$   (159)

The values of $P$, $V$, and $P - V$ corresponding to the group (158) are then:53

$\begin{array}{l|c|c} & p \text{ odd} & p \text{ even} \\ \hline P_{h,p} = P(D_h, D_{h+1}, \dots, D_{h+p+1}) & \frac{p+1}{2} & \frac{p+1+\varepsilon}{2} \\ V_{h,p} = V(D_h, D_{h+1}, \dots, D_{h+p+1}) & \frac{p+1}{2} & \frac{p+1-\varepsilon}{2} \\ P_{h,p} - V_{h,p} & 0 & \varepsilon \end{array} \qquad \Bigl( \varepsilon = (-1)^{\frac{p}{2}} \operatorname{sign} \frac{D_{h+p+1}}{D_h} \Bigr).$   (160)
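Rules (159) and (160) translate into a short procedure. The sketch below (ours; the function name is hypothetical) attributes signs to the zero minors by (159) and then applies $\sigma = r - 2V$ from (156); it presupposes that, in the situation (157), $D_r$ has already been replaced by $D^{(r)}$.

```python
import numpy as np

def hankel_signature(D):
    """D = [1, D_1, ..., D_r], with D_r replaced by D^(r) when rule 1 applies."""
    signs = [np.sign(d) for d in D]
    h = 0
    for m in range(1, len(D)):
        if signs[m] != 0:
            h = m                                  # last non-zero minor seen
        else:
            j = m - h                              # position inside the zero group
            signs[m] = (-1) ** (j * (j - 1) // 2) * signs[h]   # rule (159)
    V = sum(1 for a, b in zip(signs, signs[1:]) if a * b < 0)
    return (len(D) - 1) - 2 * V                    # sigma = r - 2V, as in (156)

# One group of p = 2 zeros: D_1 != 0, D_2 = D_3 = 0, D_4 != 0; sigma = 2.
print(hankel_signature([1, 1, 0, 0, -1]))
```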

Proof. To begin with we consider the case where $D_r \neq 0$. Then the forms $S(x,x) = \sum_{i,k=0}^{n-1} s_{i+k}\, x_i x_k$ and $S_r(x,x) = \sum_{i,k=0}^{r-1} s_{i+k}\, x_i x_k$ have not only the same rank $r$, but also the same signature $\sigma$. For let $S(x,x) = \sum_{i=1}^{r} \varepsilon_i Z_i^2$, where the $Z_i$ are real linear forms and $\varepsilon_i = \pm 1$ $(i = 1, 2, \dots, r)$. We set $x_r = \cdots = x_{n-1} = 0$. Then the forms $S(x,x)$ and $Z_i$ go over, respectively, into $S_r(x,x)$ and $\tilde Z_i$ $(i = 1, 2, \dots, r)$; and $S_r(x,x) = \sum_{i=1}^{r} \varepsilon_i \tilde Z_i^2$, i.e., $S_r(x,x)$ has the same number of positive and negative squares as $S(x,x)$.54

53 The formulas (159) and (160) are also applicable to (157), but we have to set $p = r - h - 1$ and interpret $D_{h+p+1}$ not as $D_r = 0$, but as $D^{(r)} \neq 0$.


Thus the signature of $S_r(x,x)$ is $\sigma$.

We now vary the parameters $s_0, s_1, \dots, s_{2r-2}$ continuously in such a way that for the new parameter values $s_0^*, s_1^*, \dots, s_{2r-2}^*$ all the terms of the sequence55

$1, \ D_1^*, \ D_2^*, \ \dots, \ D_r^* \qquad \bigl( D_q^* = | s_{i+k}^* |_0^{q-1},\ q = 1, 2, \dots, r \bigr)$

are different from zero and that in the process of variation none of the non-zero determinants (155) vanishes.56

Since the rank of $S_r(x,x)$ does not change during the variation, its signature also remains unchanged (see p. 309). Therefore

$\sigma = P(1, D_1^*, \dots, D_r^*) - V(1, D_1^*, \dots, D_r^*).$   (161)

If $D_i \neq 0$ for some $i$, then $\operatorname{sign} D_i^* = \operatorname{sign} D_i$. Therefore the whole problem reduces to determining the variations in sign among those $D_i^*$ that correspond to $D_i = 0$. More accurately, for every group of the form (158) we have to determine

$P(D_h^*, D_{h+1}^*, \dots, D_{h+p+1}^*) - V(D_h^*, D_{h+1}^*, \dots, D_{h+p+1}^*).$

For this purpose we set:

$t_{ik} = \frac{1}{D_h}\begin{vmatrix} s_0 & \cdots & s_{h-1} & s_{h+k} \\ \vdots & & \vdots & \vdots \\ s_{h-1} & \cdots & s_{2h-2} & s_{2h+k-1} \\ s_{h+i} & \cdots & s_{2h+i-1} & s_{2h+i+k} \end{vmatrix} \qquad (i, k = 0, 1, \dots, p).$

By Lemma 2, the matrix $T = \| t_{ik} \|_0^p$ is a Hankel matrix and all its elements above the second diagonal are zero, so that $T$ has the form

54 The linear forms $\tilde Z_1, \tilde Z_2, \dots, \tilde Z_r$ are linearly independent, because the quadratic form $S_r(x,x) = \sum_{i=1}^{r} \varepsilon_i \tilde Z_i^2$ is of rank $r$ $(D_r \neq 0)$.
55 In this section, the asterisk * does not indicate the adjoint matrix.
56 Such a variation of the parameters is always possible, because in the space of the parameters $s_0, s_1, \dots, s_{2r-2}$ an equation of the form $D_q = 0$ determines a certain algebraic hypersurface. If a point lies in some such hypersurfaces, then it can always be approximated by arbitrarily close points that do not lie in these hypersurfaces.


$T = \begin{Vmatrix} 0 & \cdots & 0 & t_p \\ \vdots & \cdot & t_p & t_{p+1} \\ 0 & \cdot & \cdot & \vdots \\ t_p & t_{p+1} & \cdots & t_{2p} \end{Vmatrix}$

We denote the successive principal minors of $T$ by $\bar D_1, \bar D_2, \dots, \bar D_{p+1}$:

$\bar D_q = | t_{ik} |_0^{q-1} \qquad (q = 1, 2, \dots, p+1).$

Together with $T$, we consider the matrix

$T^* = \| t^*_{ik} \|_0^{p},$

where

$t^*_{ik} = \frac{1}{D^*_h}\begin{vmatrix} s^*_0 & \cdots & s^*_{h-1} & s^*_{h+k} \\ \vdots & & \vdots & \vdots \\ s^*_{h-1} & \cdots & s^*_{2h-2} & s^*_{2h+k-1} \\ s^*_{h+i} & \cdots & s^*_{2h+i-1} & s^*_{2h+i+k} \end{vmatrix} \qquad (i, k = 0, 1, \dots, p)$

and the corresponding determinants

$\bar D^*_q = | t^*_{ik} |_0^{q-1} \qquad (q = 1, 2, \dots, p+1).$

By Sylvester's determinant identity,

$D_{h+q} = D_h \bar D_q, \qquad D^*_{h+q} = D^*_h \bar D^*_q \qquad (q = 1, 2, \dots, p+1).$   (162)

Therefore

$P(D^*_h, D^*_{h+1}, \dots, D^*_{h+p+1}) - V(D^*_h, D^*_{h+1}, \dots, D^*_{h+p+1}) = P(1, \bar D^*_1, \dots, \bar D^*_{p+1}) - V(1, \bar D^*_1, \dots, \bar D^*_{p+1}) = \sigma^*,$   (163)

where $\sigma^*$ is the signature of the form

$T^*(x,x) = \sum_{i,k=0}^{p} t^*_{ik}\, x_i x_k.$

Together with $T^*(x,x)$, we consider the forms

$T(x,x) = \sum_{i,k=0}^{p} t_{i+k}\, x_i x_k \quad \text{and} \quad T^{**}(x,x) = t_p\, (x_0 x_p + x_1 x_{p-1} + \cdots + x_p x_0).$


The matrix $T^{**}$ is obtained from $T$ when we replace in the latter all the elements below the second diagonal by zeros. We denote the signatures of $T(x,x)$ and $T^{**}(x,x)$ by $\sigma$ and $\sigma^{**}$. Since $T^*(x,x)$ and $T^{**}(x,x)$ are obtained from $T(x,x)$ by variations of the coefficients during which the rank of the form does not change

$\Bigl( | T^{**} | = \pm\, t_p^{\,p+1} \neq 0, \quad | T | = \frac{D_{h+p+1}}{D_h} \neq 0, \quad | T^* | = \frac{D^*_{h+p+1}}{D^*_h} \neq 0 \Bigr),$

the signatures of $T(x,x)$, $T^*(x,x)$, and $T^{**}(x,x)$ must also be equal:

$\sigma = \sigma^* = \sigma^{**}.$

But

$T^{**}(x,x) = \begin{cases} 2 t_p\, (x_0 x_p + x_1 x_{p-1} + \cdots + x_{\frac{p-1}{2}} x_{\frac{p+1}{2}}) & \text{for odd } p, \\ t_p \bigl[ 2 (x_0 x_p + \cdots + x_{k-1} x_{k+1}) + x_k^2 \bigr], \ k = \tfrac{p}{2}, & \text{for even } p. \end{cases}$   (164)

Since every product of the form $x_\alpha x_\beta$ with $\alpha \neq \beta$ can be replaced by the difference of squares $\left( \frac{x_\alpha + x_\beta}{2} \right)^2 - \left( \frac{x_\alpha - x_\beta}{2} \right)^2$, we can obtain a decomposition of $T^{**}(x,x)$ into independent real squares, and we have

$\sigma^{**} = \begin{cases} 0 & \text{for odd } p, \\ \operatorname{sign} t_p & \text{for even } p. \end{cases}$   (165)

On the other hand, from (162),

$\frac{D_{h+p+1}}{D_h} = | T | = (-1)^{\frac{p(p+1)}{2}}\, t_p^{\,p+1}.$   (166)

From (163), (164), (165), and (166) it follows that

$P(D^*_h, D^*_{h+1}, \dots, D^*_{h+p+1}) - V(D^*_h, D^*_{h+1}, \dots, D^*_{h+p+1}) = \begin{cases} 0 & \text{for odd } p, \\ \varepsilon & \text{for even } p, \end{cases}$   (167)

where

$\varepsilon = (-1)^{\frac{p}{2}} \operatorname{sign} \frac{D_{h+p+1}}{D_h}.$   (168)

Since

$P(D^*_h, D^*_{h+1}, \dots, D^*_{h+p+1}) + V(D^*_h, D^*_{h+1}, \dots, D^*_{h+p+1}) = p + 1,$

the table (160) can be deduced from (167) and (168).

Now let $D_r = 0$. Then for some $h$ $(< r)$

$D_h \neq 0, \quad D_{h+1} = \cdots = D_r = 0.$


In this case, by Theorem 23,

$D^{(r)} = S\begin{pmatrix} 1 & \cdots & h & n-r+h+1 & \cdots & n \\ 1 & \cdots & h & n-r+h+1 & \cdots & n \end{pmatrix} \neq 0.$

The case to be considered reduces to the preceding case by renumbering the variables in the quadratic form $S(x,x) = \sum_{i,k=0}^{n-1} s_{i+k}\, x_i x_k$. We set:

$x'_0 = x_0, \ \dots, \ x'_{h-1} = x_{h-1}; \quad x'_h = x_{n-r+h}, \ \dots, \ x'_{r-1} = x_{n-1}; \quad x'_r = x_h, \ \dots, \ x'_{n-1} = x_{n-r+h-1}.$

Then $S(x,x) = \sum_{i,k=0}^{n-1} s'_{ik}\, x'_i x'_k$.

Starting from the structure of the matrix $T$ considered above and using the relations

$\bar D_j = \frac{D_{h+j}}{D_h} \qquad (j = 1, 2, \dots, n-h)$

obtained from Sylvester's determinant identity, we find that the sequence $1, D'_1, D'_2, \dots, D'_n$ is obtained from $1, D_1, D_2, \dots, D_n$ by replacing the single element $D_r$ by $D^{(r)}$.

We leave it to the reader to verify that the table (160) corresponds to the attribution of signs to the zero determinants given by (159).

This completes the proof of the theorem.

Note. It follows from (166) that for odd $p$ ($p$ is the number of zero determinants in the group (158))

$\operatorname{sign} \frac{D_{h+p+1}}{D_h} = (-1)^{\frac{p+1}{2}}.$

In particular, for $p = 1$ we have $D_h D_{h+2} < 0$. In this case, we can omit $D_{h+1}$ in computing $V(1, D_1, \dots, D_r)$, thus obtaining Gundelfinger's rule. In exactly the same way, we obtain Frobenius' rule (see page 304) from (160) for $p = 2$.


BIBLIOGRAPHY

Items in the Russian language are indicated by *

PART A. Textbooks, Monographs, and Surveys

[1] ACHIESER (Akhieser), N. I., Theory of Approximation. New York: Ungar, 1956. [Translated from the Russian.]
[2] AITKEN, A. C., Determinants and matrices. 9th ed., Edinburgh: Oliver and Boyd, 1956.

[3] BELLMAN, R., Stability Theory of Differential Equations. New York: McGraw-Hill, 1953.

*[4] BERNSTEIN, S. N., Theory of Probability. 4th ed., Moscow: Gostekhizdat, 1946.
[5] BODEWIG, E., Matrix Calculus. 2nd ed., Amsterdam: North Holland, 1959.
[6] CAHEN, G., Éléments du calcul matriciel. Paris: Dunod, 1955.

*[7] CHEBOTAREV, N. G., and MEIMAN, N. N., The problem of Routh-Hurwitz for polynomials and integral functions. Trudy Mat. Inst. Steklov., vol. 26 (1949).

*[8] CHEBYSHEV, P. L., Complete collected works, vol. III. Moscow: Izd. AN SSSR, 1948.
*[9] CHETAEV, N. G., Stability of motion. Moscow: Gostekhizdat, 1946.
[10] COLLATZ, L., Eigenwertaufgaben mit technischen Anwendungen. Leipzig: Akad. Verlagsges., 1949.
[11] Eigenwertprobleme und ihre numerische Behandlung. New York: Chelsea, 1948.
[12] COURANT, R. and HILBERT, D., Methods of Mathematical Physics, vol. I. Trans. and revised from the German original. New York: Interscience, 1953.
*[13] ERUGIN, N. P., The method of Lappo-Danilevskii in the theory of linear differential equations. Leningrad: Leningrad University, 1956.
*[14] FADDEEV, D. K. and SOMINSKII, I. S., Problems in higher algebra. 2nd ed., Moscow, 1949; 5th ed., Moscow: Gostekhizdat, 1954.
[15] FADDEEVA, V. N., Computational methods of linear algebra. New York: Dover Publications, 1959. [Translated from the Russian.]
[16] FRAZER, R. A., DUNCAN, W. J., and COLLAR, A., Elementary Matrices and Some Applications to Dynamics and Differential Equations. Cambridge: Cambridge University Press, 1938.

*[17] GANTMACHER (Gantmakher), F. R. and KREIN, M. G., Oscillation matrices and kernels and small vibrations of dynamical systems. 2nd ed., Moscow: Gostekhizdat, 1950. [A German translation is in preparation.]
[18] GRÖBNER, W., Matrizenrechnung. Munich: Oldenburg, 1956.
[19] HAHN, W., Theorie und Anwendung der direkten Methode von Lyapunov (Ergebnisse der Mathematik, Neue Folge, Heft 22). Berlin: Springer, 1959. [Contains an extensive bibliography.]


[20] INCE, E. L., Ordinary Differential Equations. New York: Dover, 1948.
[21] JUNG, H., Matrizen und Determinanten. Eine Einführung. Leipzig, 1953.
[22] KLEIN, F., Vorlesungen über höhere Geometrie. 3rd ed., New York: Chelsea, 1949.
[23] KOWALEWSKI, G., Einführung in die Determinantentheorie. 3rd ed., New York: Chelsea, 1949.
*[24] KREIN, M. G., Fundamental propositions in the theory of λ-zone stability of a canonical system of linear differential equations with periodic coefficients. Moscow: Moscow Academy, 1955.

*[25] KREIN, M. G. and NAIMARK, M. A., The method of symmetric and hermitian formsin the theory of separation of roots of algebraic equations. Kharkov: GNTI,1936.

*[26] KREIN, M. G. and RUTMAN, M. A., Linear operators leaving a cone in a Banachspace invariant. Uspehi Mat. Nauk, vol. 3 no. 1, (1948).

*[27] KUDRYAVCHEV, L. D., On some mathematical problems in the theory of electricalnetworks. Uspehi Mat. Nauk, vol. 3 no. 4 (1948).

*[28] LAPPO-DANILEVSKII, I. A., Theory of functions of matrices and systems of lineardifferential equations. Moscow, 1934.

[29] Memoires sur la thdorie den systemes des equations differentielles line-aires. 3 vols., Trudy Mat. Inst. Steklov. vols. 6-8 (1934-1936). New York:Chelsea, 1953.

[30] LEFSCHETZ, S., Differential Equations: Geometric Theory. New York: Inter-science, 1957.

[31] LICHNEROWICZ, A., Algèbre et analyse linéaires. 2nd ed., Paris: Masson, 1956.
[32] LYAPUNOV (Liapounoff), A. M., Le Problème général de la stabilité du mouvement (Annals of Mathematics Studies, No. 17). Princeton: Princeton Univ. Press, 1949.

[33] MACDUFFEE, C. C., The Theory of Matrices. New York: Chelsea, 1946.[34] Vectors and matrices. La Salle: Open Court, 1943.

*[35] MALKIN, I. G., The method of Lyapunov and Poincard in the theory of non-linearoscillations. Moscow: Gostekhizdat, 1949.

[36] Theory of stability of motion. Moscow: Gostekhizdat, 1952. [A Germantranslation is in preparation.]

[37] MARDEN, M., The geometry of the zeros of a polynomial in a complex variable(Mathematical Surveys, No. 3). New York: Amer. Math. Society, 1949.

*[38] MARKOV, A. A., Collected works. Moscow, 1948.*[39] MEIMAN, N. N., Some problems in the disposition of roots of polynomials. Uspehi

Mat. Nauk, vol. 4 (1949).[40] MIRSKY, L., An Introduction to Linear Algebra. Oxford: Oxford University

Press, 1955.*[41] NAIMARK, Y. I., Stability of linearized systems. Leningrad : Leningrad Aero-

nautical Engineering Academy, 1949.[42] PARODI, M., Sur quelques proprietds des valeurs caraetdristiques des matrices

carrdes (Memorial des Sciences Matht matiques, vol. 118), Paris: Gauthiers-Villars,1952.

[43] PERLIS, S., Theory of Matrices. Cambridge. (Mass.): Addison-Wesley, 1952.[44] PICKERT, G., Normalformen von Matrizen (Enz. Math. Wiss., Band I, Teil B.Heft 3, Teil I). Leipzig: Teubner, 1953.

*[45] PoTAPOV, V. P., The multiplicative structure of J-inextensible matrix functions.Trudy Moscow Mat. Soc., vol. 4 (1955).


`[46] ROMANOVSKII, V. I., Discrete Markov chains. Moscow: Gostekhizdat, 1948.

[47] ROUTH, E. J., A treatise on the stability of a given state of motion. London:

Macmillan, 1877.[48] The advanced part of a Treatise on the Dynamics of a Rigid Body.

6th ed., London: Macmillan, 1905; repr., New York: Dover, 1959.

[49] SCHLESINGER, L., Vorlesungen über lineare Differentialgleichungen. Berlin, 1908.
[50] Einführung in die Theorie der gewöhnlichen Differentialgleichungen auf funktionentheoretischer Grundlage. Berlin, 1922.
[51] SCHMEIDLER, W., Vorträge über Determinanten und Matrizen mit Anwendungen in Physik und Technik. Berlin: Akademie-Verlag, 1949.

[52] SCHREIER, O. and SPERNER, E., Vorlesungen über Matrizen. Leipzig: Teubner, 1932. [A slightly revised version of this book appears as Chapter V of [53].]

[53] Introduction to Modern Algebra and Matrix Theory. New York: Chelsea,1958.

[54] SCHWERDTFEGER, H., Introduction to Linear Algebra and the Theory of Matrices.Groningen: Noordhoff, 1950.

[55] SHORAT, J. A. and TAMARKIN, J. D., The problem of moments (MathematicalSurveys, No. 1). New York : Amer. Math. Society, 1943.

[56] SMIRNOW, W. I. (Smirnov, V. I.), Lehrgang der höheren Mathematik, Vol. III. Berlin, 1956. [This is a translation of the 13th Russian edition.]

[57] SPECHT, W., Algebraische Gleichungen mit reellen oder komplexen Koeffizienten(Enz. Math. Wiss., Band I, Teil B, Heft 3, Teil II). Stuttgart: Teubner, 1958.

[58] STIELTJES, T. J., Oeuvres Completes. 2 vols., Groningen: Noordhoff.[59] STOLL, R. R., Linear Algebra and Matrix Theory. New York: McGraw-Hill, 1952.[60] THRALL, R. M. and TORNHEIM, L., Vector spaces and matrices. New York:

Wiley, 1957.[61] TURNBULL, H. W., The Theory of Determinants, Matrices and Invariants. Lon-

don: Blackie, 1950.[62] TURNBULL, H. W. and AITxEN, A. C., An Introduction to the Theory of Canonical

Matrices. London : Blackie, 1932.(63] VOLTERRA, V. et HosTINsxy, B., Operations infinitesimales lineaires. Paris:

Gauthiers-Villars, 1938.[64] WEDDERBURN, J. H. M., Lectures on matrices. New York: Amer. Math. Society,

1934.

[65] WEYL, H., Mathematische Analyse des Raumproblems. Berlin, 1923. [A reprintis in preparation: Chelsea, 1960.]

[66] WINTNER, A., Spektraltheorie der unendlichen Matrizen. Leipzig, 1929.
[67] ZURMÜHL, R., Matrizen. Berlin, 1950.

PART B. Papers

[101] AFRIAT, S., Composite matrices, Quart. J. Math., vol. 5, pp. 81-89 (1954).
*[102] AIZERMAN (Aisermann), M. A., On the computation of non-linear functions of several variables in the investigation of the stability of an automatic regulating system, Avtomat. i Telemeh., vol. 8 (1947).

[103] AISERMANN, M. A. and F. R. GANTMACHER, Determination of stability by linearapproximation of a periodic solution of a system of differential equations withdiscontinuous right-hand sides, Quart. J. Mech. Appl. Math. vol. 11, pp. 385-98(1958).


[104] AITKEN, A. C., Studies in practical mathematics. The evaluation, with applica-tions, of a certain triple product matrix. Proc. Roy. Soc. Edinburgh vol. 57,(1936-37).

[105] AMIR-MOÉZ, A. R., Extreme properties of eigenvalues of a hermitian transformation and singular values of the sum and product of linear transformations, Duke Math. J., vol. 23, pp. 463-76 (1956).

'[106] ARTASHENKOV, P. V., Determination of the arbitrariness in the choice of amatrix reducing a system of linear differential equations to a system with con-stant coefficients. Vestnik Leningrad. Univ., Ser. Mat., Phys. i Chim., vol. 2,pp. 17-29 (1953).

107] ARZHANYCH, I. S., Extension of Krylov's method to polynomial matrices, Dokl.Akad. Nauk SSSR, Vol. 81, pp. 749-52 (1951).

*[108] AZBELEV, N. and R. VINOORAD, The process of successive approximations for thecomputation of eigenvalues and eigenvectors, Dokl. Akad. Nauk., vol. 83, pp. 173-74 (1952).

[109] BAKER, H. F., On the integration of linear differential equations, Proc. LondonMath. Soc., vol. 35, pp. 333-78 (1903).

[110] BARANKIN, E. W., Bounds for characteristic roots of a matrix, Bull. Amer. Math.Soc., vol. 51, pp. 767-70 (1945).

[111] BARTSCH, H., Abschätzungen für die kleinste charakteristische Zahl einer positiv-definiten hermiteschen Matrix, Z. Angew. Math. Mech., vol. 34, pp. 72-74 (1954).

[112] BELLMAN, R., Notes on matrix theory, Amer. Math. Monthly, vol. 60, pp. 173-75,(1953); vol. 62, pp. 172-73, 571-72, 647-48 (1955); vol. 64, pp. 189-91 (1957).

[113] BELLMAN, R. and A. HOFFMAN, On a theorem of Ostrowski, Arch. Math., vol. 5,pp. 123-27 (1954).

[114] BENDAT, J. and S. SILVERMAN, Monotone and convex operator functions, Trans.Amer. Math. Soc., vol. 79, pp. 58.71 (1955).

[115] BERGE, C., Sur une propriet6 des matrices doublement stochastiques, C. R. Acad.Sci. Paris, vol. 241, pp. 269-71 (1955).

[116] BIRKHOFF, G., On product integration, J. Math. Phys., vol. 16, pp. 104-32 (1937).[117] BIRKHOFF, G. D., Equivalent singular points of ordinary linear differential

equations, Math. Ann., vol. 74, pp. 134-39 (1913).[118] BoTT, R. and R. DUFFIN, On the algebra of networks, Trans. Amer. Math. Soc.,

vol. 74, pp. 99-109 (1953).[119] BRAUER, A., Limits for the characteristic roots of a matrix, Duke Math. J.,

vol. 13, pp. 387-95 (1946) ; vol. 14, pp. 21-26 (1947) ; vol. 15, pp. 871-77 (1948) ;vol. 19, pp. 73-91, 553-62 (1952) ; vol. 22, pp. 387-95 (1955).

[120] Ober die Lage der charakteristischen Wurzeln einer Matrix, J. ReineAngew. Math., vol. 192, pp. 113-16 (1953).

[121] Bounds for the ratios of the coordinates of the characteristic vectors ofa matrix, Proc. Nat. Acad. Sci. U.S.A., vol. 41, pp. 162-64 (1955).

[122] The theorems of Ledermann and Ostrowski on positive matrices, DukeMath. J., vol. 24, pp. 265-74 (1957).

[123] BRENNER, J., Bounds for determinants, Proc. Nat. Acad. Sci. U.S.A., vol. 40,pp. 452-54 (1954) ; Proc. Amer. Math. Soc., vol. 5, pp. 631-34 (1954) ; vol. 8,pp. 532-34 (1957); C. R. Acad. Sci. Paris, vol. 238, pp. 555-56 (1954).

[124] BRUIJN, N., Inequalities concerning minors and eigenvalues, Nieuw Arch. Wisk.,vol. 4, pp. 18-35 (1956).

[125] BRUIJN, N. and G. SZEKERES, On some exponential and polar representatives ofmatrices, Nieuw Arch. Wisk., vol. 3, pp. 20-32 (1955).


*[126] BULGAKOV, B. V., The splitting of rectangular matrices, Dokl. Akad. Nauk SSSR, vol. 85, pp. 21-24 (1952).
[127] CAYLEY, A., A memoir on the theory of matrices, Phil. Trans. London, vol. 148, pp. 17-37 (1857); Coll. Works, vol. 2, pp. 475-96.
[128] COLLATZ, L., Einschließungssatz für die charakteristischen Zahlen von Matrizen, Math. Z., vol. 48, pp. 221-26 (1942).
[129] Über monotone Systeme linearer Ungleichungen, J. Reine Angew. Math., vol. 194, pp. 193-94 (1955).
[130] CREMER, L., Die Verringerung der Zahl der Stabilitätskriterien bei Voraussetzung positiver Koeffizienten der charakteristischen Gleichung, Z. Angew. Math. Mech., vol. 33, pp. 222-27 (1953).

*[131] DANILEVSKII, A. M., On the numerical solution of the secular equation, Mat. Sb., vol. 2, pp. 169-72 (1937).
[132] DILIBERTO, S., On systems of ordinary differential equations. In: Contributions to the Theory of Non-linear Oscillations, vol. I, edited by S. Lefschetz (Annals of Mathematics Studies, No. 20). Princeton: Princeton Univ. Press (1950), pp. 1-38.

*[1331 DMITRIEV, N. A. and E. B. DYNKIN, On the characteristic roots of stochasticmatrices, Dokl. Akad. Nauk SSSR, vol. 49, pp. 159-62 (1945).

*[133a] Characteristic roots of Stochastic Matrices, Izv. Akad. Nauk, Ser. Fiz-Mat., vol. 10, pp. 167-94 (1946).

[134] DOBSCH, O., Matrixfunktionen beschränkter Schwankung, Math. Z., vol. 43, pp. 353-88 (1937).

*1135] DONSKAYA, L. I., Construction of the solution of a linear system in the neighbor-hood of a regular singularity in special cases, Vestnik Leningrad. Univ., vol. 6(1952).

*[136] On the structure of the solution of a system of linear differential equa-tions in the neighbourhood of a regular singularity, Vestuik Leningrad. Univ.,vol. 8, pp. 55-64 (1954).

*[137] DuBNOV, Y. S., On simultaneous invariants of a system of affinors, Trans. Math.Congress in Moscow 1927, pp. 236-37.

*[138] On doubly symmetric orthogonal matrices, Bull. Ass. Inst. Univ. Moscow, pp. 33-35 (1927).
*[139] On Dirac's matrices, Uč. Zap. Univ. Moscow, vol. 2:2, pp. 43-48 (1934).
*[140] DUBNOV, Y. S. and V. K. IVANOV, On the reduction of the degree of affinor polynomials, Dokl. Akad. Nauk SSSR, vol. 41, pp. 99-102 (1943).
[141] DUNCAN, W., Reciprocation of triply-partitioned matrices, J. Roy. Aero. Soc., vol. 60, pp. 131-32 (1956).
[142] EGERVÁRY, E., On a lemma of Stieltjes on matrices, Acta Sci. Math., vol. 15, pp. 99-103 (1954).
[143] On hypermatrices whose blocks are commutable in pairs and their application in lattice-dynamics, Acta Sci. Math., vol. 15, pp. 211-22 (1954).
[144] EPSTEIN, M. and H. FLANDERS, On the reduction of a matrix to diagonal form, Amer. Math. Monthly, vol. 62, pp. 168-71 (1955).
*[145] ERSHOV, A. P., On a method of inverting matrices, Dokl. Akad. Nauk SSSR, vol. 100, pp. 209-11 (1955).
[146] ERUGIN, N. P., Sur la substitution exposante pour quelques systèmes irréguliers, Mat. Sb., vol. 42, pp. 745-53 (1935).
*[147] Exponential substitutions of an irregular system of linear differential equations, Dokl. Akad. Nauk SSSR, vol. 17, pp. 235-36 (1935).


*[148] On Riemann's problem for a Gaussian system, Uč. Zap. Ped. Inst., vol. 28, pp. 293-304 (1939).
*[149] FADDEEV, D. K., On the transformation of the secular equation of a matrix, Trans. Inst. Eng. Constr., vol. 4, pp. 78-86 (1937).
[150] FAEDO, S., Un nuovo problema di stabilità per le equazioni algebriche a coefficienti reali, Ann. Scuola Norm. Sup. Pisa, vol. 7, pp. 53-63 (1953).
*[151] FAGE, M. K., Generalization of Hadamard's determinant inequality, Dokl. Akad. Nauk SSSR, vol. 54, pp. 765-68 (1946).
*[152] On symmetrizable matrices, Uspehi Mat. Nauk, vol. 6, no. 3, pp. 153-56 (1951).

[153] FAN, K., On a theorem of Weyl concerning eigenvalues of linear transformations,Proc. Nat. Acad. Sci. U.S.A., vol. 35, pp. 652-55 (1949); vol. 36, pp. 31-35 (1950).

[154] Maximum properties and inequalities for the eigenvalues of completelycontinuous operators, Proc. Nat. Acad. Sci. U.S.A., vol. 37, pp. 760-66 (1951).

[155] A comparison theorem for eigenvalues of normal matrices, Pacific J.Math., vol. 5, pp. 911-1s (1955).

[156] Some inequalities concerning positive-definite Hermitian matrices, Proc.Cambridge Philos. Soc., vol. 51, pp. 414-21 (1955).

[157) Topological proofs for certain theorems on matrices with non-negativeelements, Monatsh. Math., vol. 62, pp. 219-37 (1958).

[158] FAN, K. and A. I1OFFMAN, Some metric inequalities in the space of matrices, Proc.Amer.iMath. Soc., vol. 6, pp. 111-16 (1958).

(159] FAN, K. and G. PALE, Imbedding conditions for Hermitian and normal matrices,Canad. J. Math., vol. 9, pp. 298-304 (1957).

[160] FAN, K. and J. TODD, A determinantal inequality, J. London Math. Soc., vol. 30,pp. 58.64 (1955).

[161] FROBENIUS, G., Über lineare Substitutionen und bilineare Formen, J. Reine Angew. Math., vol. 84, pp. 1-63 (1877).
[162] Über das Trägheitsgesetz der quadratischen Formen, S.-B. Deutsch. Akad. Wiss. Berlin, Math.-Nat. Kl., 1894, pp. 241-56, 407-31.
[163] Über die cogredienten Transformationen der bilinearen Formen, S.-B. Deutsch. Akad. Wiss. Berlin, Math.-Nat. Kl., 1896, pp. 7-16.
[164] Über die vertauschbaren Matrizen, S.-B. Deutsch. Akad. Wiss. Berlin, Math.-Nat. Kl., 1896, pp. 601-614.
[165] Über Matrizen aus positiven Elementen, S.-B. Deutsch. Akad. Wiss. Berlin, Math.-Nat. Kl., 1908, pp. 471-76; 1909, pp. 514-18.
[166] Über Matrizen aus nicht negativen Elementen, S.-B. Deutsch. Akad. Wiss. Berlin, Math.-Nat. Kl., 1912, pp. 456-77.

"[167] GANTMACHER, F. R., Geometric theory of elementary divisors after Krull, TrudyOdessa Gos. Univ. Mat., vol. 1, pp. 89-108 (1935).

*[168] On the algebraic, analysis of Krylov's method of transforming the secularequation, Trans. Second Math. Congress, vol. II, pp. 45-48 (1934).

[169] On the classification of real simple Lie groups, Mat. Sb., vol. 5, pp. 217-50(1939).

"[170] GANTMACHER, F. R. and M. G. KREIN, On the structure of an orthogonal matrix,Trans. Ukrain, Acad. Sci. Phys.-Mat. Kiev (Trudy fiz.-mat. otdcla VUAN, Kiev),1929, pp. 1-8.

*[171] Normal operators in a hermitian space, Bull. Phys-Mat. Soc. Univ. Kasan(Izvestiya fiz.-neat. ob-va pri Kazanskonl universitete), IV, vol. 1, ser. 3, pp. 71-84(1929-30).


[172] On a special class of determinants connected with Kellogg's integralkernels, Mat. Sb., vol. 42, pp. 501-8 (1935).

[173) Sur lea matrices oscillatoires et comptetement non-negatives, CompositioMath., vol. 4, pp. 445-76 (1937).

[174] GAUTSCHI, W., Bounds of matrices with regard to an hermitian metric, Compositio Math., vol. 12, pp. 1-16 (1954).
*[175] GELFAND, I. M. and V. B. LIDSKII, On the structure of the domains of stability of linear canonical systems of differential equations with periodic coefficients, Uspehi Mat. Nauk, vol. 10, no. 1, pp. 3-40 (1955).
[176] GERSHGORIN, S. A., Über die Abgrenzung der Eigenwerte einer Matrix, Izv. Akad. Nauk SSSR, Ser. Fiz.-Mat., vol. 6, pp. 749-54 (1931).

[177] GODDARD, L., An extension of a matrix theorem of A. Brauer, Proc. Int. Cong.Math. Amsterdam, 1954, vol. 2, pp. 22-23.

[178] GOHEEN, H. E., On a lemma of Stieltjes on matrices, Amer. Math. Monthly,vol. 56, pp. 328-29 (1949).

[179] GOLUSCHIKOV, A. IF., On the structure of the automorphisms of the complex simpleLie groups, Dokl. Akad. Nauk SSSR, vol. 27, pp. 7-9 (1951).

[180] GRAVE, D. A., Small oscillations and some propositions in algebra, Izv. Akad.Nauk SSSR, Ser. Fiz.-Mat., vol. 2, pp. 563-70 (1929).

[181] GROSSMAN, D. P., On the problem of a numerical solution of systems of simul-taneous linear algebraic equations, Uspehi Mat. Nauk, vol. 5, no. 3, pp. 87-103(1950).

[182] HAHN, W., Eine Bemerkung zur zweiten Methode von Lyapunov, Math. Nachr.,vol. 14, pp. 349-54 (1956).

[183] Ober die Anwendung der Methode von Lyapunov auf Differenzengleich-ungen, Math. Ann., vol. 136, pp. 430-41 (1958).

(184) HAYNSWORTH, E., Bounds for determinants with dominant main diagonal, DukeMath. J., vol. 20, pp. 199-209 (1953).

[185] Note on bounds for certain determinants, Duke Math. J., vol. 24, pp. 313-19 (1957).

[186] HELLMANN, 0., Die Anwendung der Matrizanten bei Eigenwertaufgaben, Z.Angew. Math. Mech., vol. 35, pp. 300-15 (1955).

[187] HERMITE, C., Sur le nombre des racines dune Equation algebrique comprise entredes limites donnees, J. Reine Angew. Math., vol. 52, pp. 39-51 (1856).

[188] HJELMSLER, J., Introduction it la thtorie des suites monotones, Kgl. Danske Vid.Selbak. Forh. 1914, pp. 1-74.

[189] HOFFMAN, A. and 0. TAUSSKY, A characterization of normal matrices, J. Res.Nat. Bur. Standards, vol. 52, pp. 17-19 (1954).

[190] HOFFMAN, A. and H. WIELANDT, The variation of the spectrum of a normalmatrix, Duke Math. J., vol. 20, pp. 37-39 (1953).

[191] HORN, A., On the eigenvalues of a matrix with prescribed singular values, Proc.Amer. Math. Soc., vol. 5, pp. 4-7 (1954).

[192] HOTELLING, H., Some new methods in matrix calculation, Ann. Math. Statist.,vol. 14, pp. 1-34 (1943).

[193) HOUSEHOLDER, A. S., On matrices with non-negative elements, Monatsh. Math.,vol. 62, pp. 238-49 (1958).

[194] HOUSEHOLDER, A. S. and F. L. BAUER, On certain methods for expanding thecharacteristic polynomial, Numer. Math., vol. 1, pp. 29-35 (1959).

[195] Hsu, P. L., On symmetric, orthogonal, and skew-symmetric matrices, Proc.Edinburgh Math. Soc., vol. 10, pp. 37-44 (1953).


[196] On a kind of transformation of matrices, Acta Math. Sinica, vol. 5.pp. 333-47 (1955).

[197] HUA, L.-K., On the theory of automorphic functions of a matrix variable, Amer.J. Math., vol. 66, pp. 470-88; 531-63 (1944).

[198] Geometries of matrices, Trans. Amer. Math. floc., vol. 57, pp. 441-90

(1945).[199] Orthogonal classification of Hermitian matrices, Trans. Amer. Math.

Soc., vol. 59, pp. 508-23 (1946).'[200] Geometries of symmetric matrices over the real field, Dokl. Akad. Nauk

SSSR, vol. 53, pp. 95.98; 195-96 (1946).'[201] Automorphisms of the real symplectic group, Dokl. Akad. Nauk SSSR,

vol. 53, pp. 303-306 (1946).[202] Inequalities involving determinants, Acts. Math. Sinica, vol. 5, pp. 463-70

(1955).'[203] HUA, L.-K. and B. A. RoSENFELD, The geometry of rectangular matrices and their

application to the real projective and non-euclidean geometries, Izv. Higher Ed.SSSR, Matematika, vol. 1, pp. 233-46 (1957).

[204] HURWITZ, A., Über die Bedingungen, unter welchen eine Gleichung nur Wurzeln mit negativen reellen Teilen besitzt, Math. Ann., vol. 46, pp. 273-84 (1895).

[205] INORAHAM, M. H., On the reduction of a matrix to its rational canonical form,Bull. Amer. Math. Soc., vol. 39, pp. 379-82 (1933).

[206] IoNEscu, D., 0 identitate important8 si descompunere a unei forme bilineare intosumd de produse, Gaz. Mat. Ser. Fiz. A. 7, vol. 7, pp. 303-312 (1955).

[207] IsHAK, M., Sur les spectres des matrices, Sbm. P. Dubreil et Ch. Pisot, Fac. Sci.Paris, vol. 9, pp. 1-14 (1955/56).

[208] KAUAN, V. F., On some number systems arising from Lorentz transformations,Izv. Ass. Inst. Moscow Univ. 1927, pp. 3-31.

'[209] KARPELEVICH, F. I., On the eigenvalues of a matrix with non-negative elements,Izv. Akad. Nauk SSSR Ser. Mat., vol. 15, pp. 361-83 (1951).

(210] KHAN, N. A., The characteristic roots of a product of matrices, Quart. J. Math.,vol. 7, pp. 138-43 (1956).

[211] KHLODOVSKII, I. N., On the theory of the general case of Krylov's transforma-tion of the secular equation, Izv. Akad. Nauk, Ser. Fiz: Mat., vol. 7, pp. 1076-1102(1933).

0[212] KoLMoooaov, A. N., Markov chains with countably many possible states, Bull.Univ. Moscow (A), vol. 1:3 (1937).

[213] KOTELYANSKII, D. M., On monotonic matrix functions of order n, Trans. Univ.Odessa, vol. 3, pp. 103-114 (1941).

(214] On the theory of non-negative and oscillatory matrices, Ukrain. Mat. Z.,vol. 2, pp. 94-101 (1950).

[215] On some properties of matrices with positive elements, Mat. Sb., vol. 31,pp. 497-506 (1952).

[216] On a property of matrices of symmetric signs, Uspehi Mat. Nauk, vol. 8,no. 4, pp. 163-67 (1953).

[217] On some sufficient conditions for the spectrum of a matrix to be realand simple, Mat. Sb., vol. 36, pp. 163-68 (1955).

[218] On the influence of Gauss' transformation on the spectra of matrices,Uspehi Mat. Nauk, vol. 9, no. 3, pp. 117-21 (1954).

[219] On the distribution of points on a matrix spectrum, Ukrain. Mat. Z.,vol. 7, pp. 131-33 (1955).


'[220] Estimates for determinants of matrices with dominant main diagonal,Izv. Akad. Nauk SSSR, Ser. Mat., vol. 20, pp. 137-44 (1956).

'[221] KOVALENKO, K. R. and M. G. KREIN, On some investigations of Lyapunov con-cerning differential equations with periodic coefficients, DOkl. Akad. NaukSSSR, vol. 75, pp. 495-99 (1950).

[222] KOWALEWSKI, G., Natürliche Normalformen linearer Transformationen, Leipz. Ber., vol. 69, pp. 325-35 (1917).

'[223) KRASOVSKII, N. N., On the stability after the first approximation, Prikl. Mat.Meh., vol. 19, pp. 516-30 (1955).

'[2241 KRASNOSEL'SKII, M. A. and M. G. KREIN, An iteration process with minimaldeviations, Mat. Sb., vol. 31, pp. 315-34 (1952).

[225] KRAUS, F., Über konvexe Matrixfunktionen, Math. Z., vol. 41, pp. 18-42 (1936).

'[226] KRAVCHUx, M. F., On the general theory of bilinear forms, Izv. Polyt. Inst.Kiev, vol. 19, pp. 17-18 (1924).

'[227] On the theory of permutable matrices, Zap. Akad. Nauk Kiev, Ser. Fiz.-Mat., vol. 1:2, pp. 28-33 (1924).

'[228] On a transformc.tion of quadratic forms, Zap. Akad. Nauk Kiev, Ser.Fiz: Mat., vol. 1:2, pp. f 7-90 (1924).

'[229] On quadratic forms and linear transformations, Zap. Akad. Nauk Kiev,Ser. Fiz.-Mat., vol. 1:3, pp. 1-89 (1924).

'[230] Permutable sets of linear transformations, Zap. Agr. Inst. Kiev, vol. 1,pp. 25-58 (1926).

[231] Über vertauschbare Matrizen, Rend. Circ. Mat. Palermo, vol. 51, pp. 126-30 (1927).

[232] On the structure of permutable groups of matrices, Trans. Second. Mat.Congress 1934, vol. 2, pp. 11-12.

'[233] KRAVOHUK, M. F. and Y. S. GoL'DBAUM, On groups of commuting matrices,Trans. Av. Inst. Kiev, 1929, pp. 73-98; 1936, pp. 12-23.

'[234] On the equivalence of singular pencils of matrices, Trans. Av. Inst. Kiev,1936, pp. 5.27.

'[235] KREIN, M. G., Addendum to the paper 'On the structure of an orthogonal matrix,'Trans. Fiz.-Mat. Class. Akad. Nauk Kiev, 1931, pp. 103-7.

'[236] On the spectrum of a Jacobian form in connection with the theory oftorsion oscillations of drums, Mat. Sb., vol. 40, pp. 455-66 (1933).

(237) On a new class of hermitian forms, Izv. Akad. Nauk SSSR, Ser. Fiz.-Mat.,vol. 9, pp. 1259-75 (1933).

[238] On the nodes of harmonic oscillations of mechanical systems of a specialtype, Mat. Sb., vol. 41, pp. 339-48 (1934).

[239] Sur quelques applications des noyaux de Kellog aux problemes d'oscilla-tions, Proc. Charkov Mat. Soc. (4), vol. 11, pp. 3-19 (1935).

[240] Sur lea vibrations propres des tiges dont 1'une des extremites eat encastreeet l'autre libre, Proc. Charkov. Mat. Soc. (4), vol. 12, pp. 3-11 (1935).

'[241] Generalization of some results of Lyapunov on linear differential equa-tions with periodic coefficients, Dokl. Akad. Nauk SSSR, vol. 73, pp. 445-48(1950).

'[242] On an application of the fixed-point principle in the theory of linear

transformations of spaces with indefinite metric, Uspehi Mat. Nauk, vol. 5, no. 2,pp. 180-90 (1950).

'[243] On an application of an algebraic proposition in the theory of monodromymatrices, Uspehi Mat. Nauk, vol. 6, no. 1, pp. 171-77 (1951).


"[244] On some problems concerning Lyapunov's ideas in the theory of stability,Uspehi Mat. Nauk, vol. 3, no. 3, pp. 166-69 (1948).

[245] On the theory of integral matrix functions of exponential type, Ukrain.

Mat. Z., vol. 3, pp. 164-73 (1951)."[246] On some problems in the theory of oscillations of Sturm systems, Prikl.

Mat. Meh., vol. 16, pp. 555-68 (1952)."[247] KR2IN, M. G. and M. A. NAIMARK (Neumark), On a transformation of the

B6zoutian leading to Sturm's theorem, Proc. Charkov Mat. Soc., (4), vol. 10,pp. 33-40 (1933).

"[248] On the application of the Bdzoutian to problems of the separation ofroots of algebraic equations, Trudy Odessa Gos. Univ. Mat., vol. 1, pp. 51-69(1935).

[249] KRONECKER, L., Algebraische Reduction der Schaaren bilinearer Formen, S.-B. Akad. Berlin 1890, pp. 763-76.
[250] KRULL, W., Theorie und Anwendung der verallgemeinerten Abelschen Gruppen, S.-B. Akad. Heidelberg 1926, p. 1.

'[251] KRYLOV, A. N., On the numerical solution of the equation by which the frequencyof small oscillations is determined in technical problems, Izv. Akad. Nauk SSSRSer. Piz.-Mat., vol. 4, pp. 491-539 (1931).

[252] LAPPO-DANILEVSKII, I. A., algorithmique des problemes reguliers dePoincare et de Riemann, J. Phys. Mat. Soc. Leningrad, vols. 2:1, pp. 94-120;121-54 (1928).

[253] Theorie des matrices satisfaisantes a des systemes des equations differentielles lindaires a coefficients rationnels arbitraires, J. Phys. Mat. Soc. Leningrad, vols. 2:2, pp. 41-80 (1928).

'[254] Fundamental problems in the theory of systems of linear differentialequations with arbitrary rational coefficients, Trans. First Math. Congr., ONTI,1936, pp. 254.62.

[255] LEDERMANN, W., Reduction of singular pencils of matrices, Proc. EdinburghMath. Soc., vol. 4, pp. 92-105 (1935).

[256] Bounds for the greatest latent root of a positive matrix, J. London Math.Soc., vol. 25, pp. 265-68 (1950).

"[257] LIDSXI', V. B., On the characteristic roots of a sum and a product of symmetricmatrices, Dokl. Akad. Nauk SSSR, vol. 75, pp. 769-72 (1950).

"[258] Oscillation theorems for canonical systems of differential equations,Dokl. Akad. Nauk SSSR, vol. 102, pp. 111-17 (1955).

[259] LIiNARD, and CHIPART, Sur la signe de la partie rtelle des racines d'une equationalgdbrique, J. Math. Pures Appl. (6), vol. 10, pp. 291-346 (1914).

"[260] LIPIN, N. V., On regular matrices, Trans. Inst. Eng 8. Transport, vol. 9, p. 105(1934).

'[261] LIVSHITZ, M. S. and V. P. POTAPov, The multiplication theorem for characteristicmatrix functions, Dokl. Akad. Nauk SSSR, vol. 72, pp. 164-73 (1950).

'[262] LOPSHITZ, A. M., Vector solution of a problem on doubly symmetric matrices,Trans. Math. Congress Moscow, 1927, pp. 186-87.

'[263] The characteristic equation of lowest degree for affinors and its appli-cation to the integration of differential equations, Trans. Sem. Vectors and Ten.sors, vols. 2/3 (1935).

"[264] A numerical method of determining the characteristic roots and charac.teristic planes of a linear operator, Trans. Sem. Vectors and Tensors, vol. 7,pp. 233.59 (1947).


[265] An extremal theorem for a hyper-ellipsoid and its application to thesolution of a system of linear algebraic equations, Trans. Sem. Vectors and Ten-sors, vol. 9, pp. 183-97 (1952).

[266] LÖWNER, K., Über monotone Matrixfunktionen, Math. Z., vol. 38, pp. 177-216 (1933); vol. 41, pp. 18-42 (1936).
[267] Some classes of functions defined by difference or differential inequalities, Bull. Amer. Math. Soc., vol. 56, pp. 308-19 (1950).

"[268] LusIN, N. N., On Krylov's method of forming the secular equation, Izv. Akad.Nauk SSSR, Ser. Fiz.-Mat., vol. 7, pp. 903-958 (1931).

"[269] --- On some properties of the displacement factor in Krylov's method, Izv.Akad. Nauk SSSR, Ser. Fiz.-Mat., vol. 8, pp. 596-638; 735-62; 1065-1102 (1932).

'[270] On the matrix theory of differential equations, Avtomat. i Telemeh, vol. 5,pp. 3-66 (1940).

'[271] LYUSTERNIK, L. A., The determination of eigenvalues of functions by an electricscheme, ElectrRestvo, vol. 11, pp. 67-8 (1946).

' [272] On electric models of symmetric matrices, Uspehi Mat. Nauk, vol. 4,no. 2, pp. 198-200 (1949).

'[273] LYUSTERNIK, L. A. and A. M. PROKHOROV, Determination of eigenvalues andfunctions of certain operators by means of an electrical network, Dokl. AkadNauk SSSR, vol. 55, pp. 579-82; Izv. Akad. Nauk SSSR, Ser. Mat., vol. 11,pp. 141-45 (1947).

[274] MARCUS, M., A remark on a norm inequality for square matrices, Proc. Amer.Math. Soc., vol. 6, pp. 117-19 (1955).

[275] An eigenvalue inequality for the product of normal matrices, Amer.Math. Monthly, vol. 63, pp. 173-74 (1956).

[276] A determinantal inequality of H. P. Robertson, II, J. Washington Acad.Sei., vol. 47, pp. 264.66 (1957).

[277] Convex functions of quadratic forms, Duke Math. J., vol. 24, pp. 321-26(1957).

[2781 MARCUS, M. and J. L. MCGREGOR, Extremal properties of Hermitian matrices,Canad. J. Math., vol. 8, pp. 524-31 (1956).

[279] MARCUS, M. and B. N. MOYLS, On the maximum principle of Ky Fan, Canad.J. Math., vol. 9, pp. 313-20 (1957).

[280] Maximum and minimum values for the elementary symmetric functionsof Hermitian forms, J. London Math. Soc., vol. 32, pp. 374-77 (1957).

'[281] MAYANTS, L. S., A method for the exact determination of the roots of secularequations of high degree and a numerical analysis of their dependence on theparameters of the corresponding matrices, Dokl. Akad. Nauk SSSR, vol. 50,pp. 121.24 (1945).

(282] MIRSKY, L., An inequality for positive-definite matrices, Amer. Math. Monthly,vol. 62, pp. 428-30 (1955).

[283] The norm of adjugate and inverse matrices, Arch. Math., vol. 7, pp. 276-77(1956).

[2841 The spread of a matrix, Mathematika, vol. 3, pp. 127-30 (1956).[2851 Inequalities for normal and Hermitian matrices, Duke Math. J., vol. 24,

pp. 591-99 (1957).[286] MITROVIC, D., Conditions graphiques pour que toutes lee ravines dune equation

algebrique soient z parties reelles negatives, C. R. Acad. Sei. Paris, vol. 240,pp. 1177-79 (1955).

[2871 MORGENSTERN, D., Eine Versch irfung der Ostrowskischen Determinanten-abachatzung, Math. Z., vol. 66, pp. 143-46 (1956).


[288] MoTzKIN, T. and O. TAUSSKY, Pairs of matrices with property L., Trans. Amer.Math. Soc., vol. 73, pp. 108-14 (1952) ; vol. 80, pp. 387-401 (1954).

*[289] NEIOAUS (Neuhaus), M. G. and V. B. LIDSKI', On the boundedness of the solu-tions of linear systems of differential equations with periodic coefficients, Dokl.Akad. Nauk SSSR, vol. 77, pp. 183-93 (1951).

[290] NEUMANN, J., Approximative of matrices of high order, Portugal. Math., vol. 3,pp. 1-62 (1942).

*[291] NUDEL'MAN, A. A. and P. A. SHVARTSMAN, On the spectrum of the product ofunitary matrices, Uspehi Mat. Nauk, vol. 13, no. 6, pp. 111-17 (1958).

[292] OKAMOTO, M., On a certain type of matrices with an application to experimentaldesign, Osaka Math. J., vol. 6, pp. 73-82 (1954).

[293] OPPENIIEIM, A., Inequalities connected with definite Hermitian forms, Amer.Math. Monthly, vol. 61, pp. 463-66 (1954).

[294] ORLANDO, L., Sul problema di Hurwitz relativo alle parti reali delle radici di un'equazione algebrica, Math. Ann., vol. 71, pp. 233-45 (1911).
[295] OSTROWSKI, A., Bounds for the greatest latent root of a positive matrix, J. London Math. Soc., vol. 27, pp. 253-56 (1952).

[296] Sur quelques applications des fonctions convexes et concaves au lens de

1. Schur, J. Math. Puree Appl., vol. 31, pp. 253-92 (1952).[297] On nearly triangular matrices, J. Res. Nat. Bur. Standards, vol. 52,

pp. 344-45 (1954).[298] On the spectrum of a one-parametric family of matrices, J. Reine Angew.

Math., vol. 193, pp. 143-60 (1954).[299] Sur les determinants d diagonale dominante, Bul. Soc. Math. Belg., vol. 7,

pp. 46-51 (1955).[300] Note on bounds for some determinants, Duke Math. J., vol. 22, pp. 95-102

(1955).[3011 tlber Normen von Matrizen, Math. Z., vol. 63, pp. 2-18 (1955).[302] Uber die Stetigkeit von charakteristischen Wurzeln in Abhangigkeit von

den Matrizenelementen, Jber. Deutsch. Math. Verein., vol. 60, pp. 40-42 (1957).*[303] PAPKOVICR, P. F., On a method of computing the roots of a characteristic deter-

minant, Prikl. Mat. Meh., vol. 1, pp. 314-18 (1933).[304] PAPULIS, A., Limits on the zeros of a network determinant, Quart. Appl. Math.,

vol. 15, pp. 193-94 (1957).(3051 PARODI, M., Remarques sur la stabilit6, C. R. Acad. Sei. Paris, vol. 228, pp. 51-2;

807-8; 1198-1200 (1949).[306] Sur une propriet6 des racines dune equation qui intervient en mecanique,

C. R. Acad. Sci. Paris, vol. 241, pp. 1019-21 (1955).[3071 Sur la localisation des valeurs caract6ristiques des matrices dans le plan

complexe, C. R. Acad. Sci. Paris, vol. 242, pp. 2617-18 (1956).[308] PEANO, G., Integration par series des equations diff6rentielles lin6aires, Math.

Ann., vol. 32, pp. 450-56 (1888).[309] PENROSE, R., A generalized inverse for matrices, Proc. Cambridge Philos. Soc.,

vol. 51, pp. 406-13 (1955).[310] On best approximate solutions of linear matrix equations, Proc. Cam-

bridge Philos. Soc., vol. 52, pp. 17.19 (1956).[311] PERFECT, H., On matrices with positive elements, Quart. J. Math., vol. 2, pp. 286-

90 (1951).[312] On positive stochastic matrices with real characteristic roots, Proc. Cam-

bridge Philos. Soc., vol. 48, pp. 271-76 (1952).


[313] Methods of constructing certain stochastic matrices, Duke Math. J.,vol. 20, pp. 395-404 (1953); vol. 22, pp. 305-11 (1955).

[314) A lower bound for the diagonal elements of a non-negative matrix, J.London Math. Soc., vol. 31, pp. 491-93 (1956).

[315] PERRON, O., Jacobischer Kettenbruchalgorithmus, Math. Ann., vol. 64, pp. 1-76 (1907).
[316] Über Matrizen, Math. Ann., vol. 64, pp. 248-63 (1907).
[317] Über Stabilität und asymptotisches Verhalten der Lösungen eines Systems endlicher Differenzengleichungen, J. Reine Angew. Math., vol. 161, pp. 41-64 (1929).

[318] PHILLIPS, H. B., Functions of matrices, Amer. J. Math., vol. 41, pp. 266-78 (1919).

[319] PONTRYAGIN, L. S., Hermitian operators in a space with indefinite metric, Izv.Akad. Nauk 88SR, Ser. Mat., vol. 8, pp. 243-80 (1944).

"[320] POTAPOV, V. P., On holomorphic matrix functions bounded in the unit circle,Dokl. Akad. Nauk SSSR, vol. 72, pp. 849-53 (1950).

[321) RASca, G., Zur Theorie and Anwendung der Produktintegrals, J. Reine Angew.Math., vol. 171, pp. 65-119 (19534).

[322] REICHARDT, H., Einfarbe Herleitung der Jordanschen Normalform, Wiss. Z.Humboldt-Univ. Berlin. Math: Nat. Reihe, vol. 6, pp. 445-47 (1953/54).

'[323] RECHTMAN-OL'SHANSKAYA, P. G., On a theorem of Markov, Uspehi Mat. Nauk,vol. 12, no. 3, pp. 181-87 (1957).

[324] RHAM, G. DE, Sur un theoreme de Stieltjes relatif it certain matrices, Acad.Serbe Sci. Publ. Inst. Math., vol. 4, pp. 133-54 (1952).

[325] RICHTEa, H., tfber Matrixfunktionen, Math. Ann., vol. 122, pp. 16-34 (1950).[326] Bemerkung zur Norm der rnversen einer Matrix, Arch. Math., vol. 5,

pp. 447-48 (1954).[327] Zur Abscha"tzung von Matrizennormen, Math. Nachr., vol. 18, pp. 178-87

(1958).[328] ROMANOVSKII, V. I., Un theoreme sur lea zeros des matrices non-negatives, Bull.

Soc. Math. France, vol. 61, pp. 213-19 (1933).[329) Recherches stir lea chatnes de Markoff, Aeta Matb., vol. 66, pp. 147-251

(1935).[330] ROTH, W., On the characteristic polynomial of the product of two matrices, Proe.

Amer. Math. Soc., vol. 5, pp. 1-3 (1954).[331] On the characteristic polynomial of the product of several matrices, Proc.

Amer. Math. Soc., vol. 7, pp. 578-82 (1856).[332] ROY, S., A useful theorem in matrix theory, Proc. Amer. Math. Soc., vol. 5,

pp. 635-38 (1954).[333] SAKHNOVICH, L. A., On the limits of multiplicative integrals, Uspehi Mat. Nauk,

vol. 12 no. 3, pp. 205-11 (1957)."[334] SARYMSAKOV, T. A., On sequences of stochastic matrices, Dokl. Akad. Nauk, vol.

47, pp. 331-33 (1945).[335] SCHNEIDER, H., An inequality for latent roots applied to determinants with domi-

nant principal diagonal, J. London Math. Soc., vol. 28, pp. 8-20 (1953).[336] A pair of matrices with property P, J. Amer. Math. Monthly, vol. 62,

pp. 247-49 (1955).[337] A matrix problem concerning projections, Proc. Edinburgh Math. Soc.,

vol. 10, pp. 129-30 (1956).[338] The elementary divisors, associated with 0, of a singular M-matrix,

Proc. Edinburgh Math. Soc., vol. 10, pp. 108-22 (1956).


[339] SCHOENBERG, I. J., Über variationsvermindernde lineare Transformationen, Math. Z., vol. 32, pp. 321-28 (1930).
[340] Zur Abzählung der reellen Wurzeln algebraischer Gleichungen, Math. Z., vol. 38, p. 546 (1933).

[341] SCHOENBERG, I. J., and A. WHITNEY, A theorem on polygons in n dimensionswith application to variation diminishing linear transformations, CompositioMath., vol. 9, pp. 141-60 (1951).

[342] SCHUR, I., tlber die charakteristischen wurzeln einer linearen substitution miteiner anwendung out die theorie der integraigleichungen, Math. Ann., vol. 66,pp. 488-510 (1909).

[343] SEMENDYAEV, K. A., On the determination of the eigenvalues and invariant mani-folds of matrices by means of iteration, Prikl. Matem. Meh., vol. 3, pp. 193-221(1943).

[344] SEVAST'YANOV, B. A., The theory of branching random processes, Uspehi Mat.Nauk, vol. 6, no. 6, pp. 46.99 (1951).

[345] SHIFFNER, L. M., The development of the integral of a system of differentialequations with regular singularities in series of powers of the elements of thedifferential substitution, Trudy Mat. Inst. Steklov. vol. 9, pp. 235-66 (1935).

[346] On the powers of matrices, Mat. Sb., vol. 42, pp. 385-94 (1935).[347] SHODA, K., t7ber mit einer matrix vertauschbare matrizen, Math. Z., vol. 29,

pp. 696-712 (1929).[348] SHOSTAK, P. Y., On a criterion for the conditional definiteness of quadratic forms

in n linearly independent variables and on a sufficient condition for a conditionalextremum of a function of n variables, Uspehi Mat. Nauk, vol. 8, no. 4, pp. 199-206(1954).

[349] SHREIDER, Y. A., A solution of systems of linear algebraic equations, Dokl. Akad.Nauk, vol. 76, pp. 651-55 (1950).

(350] SHTAERMAN (Steiermann), I. Y., A new method for the solution of certainalgebraic equations which have application to mathematical physics, Z. Mat.,Kiev, vol. 1, pp. 83-89 (1934); vol. 4, pp. 9-20 (1934).

(351] SHTAERMAN (Steiermann), I. Y. and N. I. AKHIESER (Achieser), On the theoryof quadratic forms, Izv. Polyteh., Kiev, vol. 19, pp. 116-23 (1934).

[352] SHuRA-BwRA, M. R., An estimate of error in the numerical computation of matricesof high order, Uspehi Mat. Nauk, vol. 6, no. 4, pp. 121-50 (1951).

[353] SHVARTSMAN (Schwarzmann), A. P., On Green's matrices of self-ad joint differ-ential operators, Proc. Odessa Univ. Matematika, vol. 3, pp. 35-77 (1941).

[354] SIEGEL, C. L., Symplectic Geometry, Amer. J. Math., vol. 65, pp. 1-86 (1943).[355] SKAL'KINA, M. A., On the preservation of asymptotic stability on transition from

differential equations to the corresponding difference equations, Dokl. Akad.Nauk SSSR, vol. 104, pp. 505.8, (1955).

[356] SMOGORZHEVSKII, A. S., Sur lea matrices unitaires du type de ciroulants, J. Mat.Circle Akad. Nauk Kiev, vol. 1, pp. 89-91 (1932).

[356a] SMOGORZHEvsxIf, A. S. and M. F. KRAVCHUK, On orthogonal transformations,Zap. Inst. Ed. Kiev, vol. 2, pp. 151-56 (1927).

[357] STENZEL, H., Ober die Daratellbarkeit einer Matrix als Produkt symmet-rischen Matrizen, Math. Z., vol. 15, pp. 1-25 (1922).

(358] S76Ha, A., Oazillationstheoreme for die Eigenvektoren spesiellen Matrizen, J.Reins Angew. Math., vol. 185, pp. 129-43 (1943).

0[359] SULEIMANOVA, K. R., Stochastic matrices with real characteristic values, Dokl.Akad. Nauk SSSR, vol. 66, pp. 343-45 (1949).



"[360] On the characteristic values of stochastic matrices, U6. Zap. MoscowPed. Inst., Ser. 71, Math., vol. 1, pp. 167-97 (1953).

*[361] SULTANOV, R. M., Some properties of matrices with elements in a non-commutativering, Trudy Mat. Sectora Akad. Nauk Baku, vol. 2, pp. 11-17 (1946).

"[362] SUSHKEVICH, A. K., On matrices of a special type, Uli. Zap. Univ. Charkov,vol. 10, pp. 1-16 (1937).

[363] Sz-NAGY, B., Remark on S. N. Roy's paper `A useful theorem in matrix theory,'Proc. Amer. Math. Soc., vol. 7, p. 1 (1956).

[364] TA Li, Die Stabilitatsfrage bei Differenzengleichungen, Acta Math., vol. 63,pp. 99-141 (1934).

[365] TAUSSKY, 0., Bounds for characteristic roots of matrices, Duke Math. J., vol. 15,pp. 1043-44 (1948).

[366] A determinantal inequality of H. P. Robertson, I, J. Washington Acad.Sci., vol. 47, pp. 263-64 (1957).

[367] Commutativity in finite matrices, Amer. Math. Monthly, vol. 64, pp. 229-35 (1957).

[368] TOEPLITZ, 0., Das algebraische Analogon su einem Sate von Math. Z.,vol. 2, pp. 187-97 (1918).

[369] TUBNBULL, H. W., On the reduction of singular matrix pencils, Proc. EdinburghMath. Soc., vol. 4, pp. 67-76 (1935).

"[370] TUBCHANINOV, A. S., On some applications of matrix calculus to linear differen.tial equations, Ul. Zap. Univ. Odessa, vol. 1, pp. 41-48 (1921).

"[371] VEBZHBITS1 I, B. D., Some problems in the theory of series compounded fromseveral matrices, Mat. Sb., vol. 5, pp. 505-12 (1939).

"[372] VILENKIN, N. Y., On an estimate for the maximal eigenvalue of a matrix, U.Zap. Moscow Ped. Inst., vol. 108, pp. 55-57 (1957).

[373] VIVIES, M., Note sur ice structures unitaires et paraunitaires, C. R. Acad. 8ci.Paris, vol. 240, pp. 1039.41 (1955).

[374] VOLTESBA, V., Sui fondamenti delta teoria defile equationi differenziali lineari,Mem. Soc. Ital. Sci. (3), vol. 6, pp. 1-104 (1887); vol. 12, pp. 3-68 (1902).

[375] WALKaa, A. and J. WESTON, Inclusion theorems for the eigenvalues of a normalmatrix, J. London Math. Soc., vol. 24, pp. 28-31 (1949).

[376] WAYLAND, H., Expansions of determinantal equations into polynomial form,Quart. Appl. Math., vol. 2, pp. 277.306 (1945).

[377] WEIEBSTRASB, K., Zur theorie der bilinearen and quadratischen Formen, Monatsli.Akad. Wiss. Berlin, 1867, pp. 310-38.

[378] WELLSTEIN, J., Ober symmetrische, alternierende and orthogonale Normalformenvon Mat risen, J. Reins Angew. Math., vol. 163, pp. 166-82 (1930).

[379) WEYL, H., Inequalities between the two kinds of eigenvaiues of a linear trans-formation, Proc. Nat. Acad. Sci., vol. 35, pp. 408-11 (1949).

[380] WEYE, E., Zur Theorie der bilinearen Formen, Monatsh. f. Math. and Physik,vol. 1, pp. 163-236 (1890).

[381] WHITNEY, A., A reduction theorem for totally positive matrices, J. Analyse Math.,vol. 2, pp. 88-92 (1952).

[382] WIELANDT, H., Bin Einschliessungssatz fur charakteristische Wurzeln normalerMatrizen, Arch. Math., vol. 1, pp. 348-52 (1948/49).

[383] Die Einachliessung von Eigenwerten normaler Matrizen, Math. Ann.vol. 121, pp. 234-41 (1949).

[384] Unzerlegbare nicht-negative Matrizen, Math. Z., vol. 52, pp. 642-48(1950).



[385] Lineare Scharen von Matrizen mit reellen Eigenwerten, Math. Z., vol. 53, pp. 219-25 (1950).
[386] Pairs of normal matrices with property L, J. Res. Nat. Bur. Standards, vol. 51, pp. 89-90 (1953).
[387] Inclusion theorems for eigenvalues, Nat. Bur. Standards, Appl. Math. Ser., vol. 29, pp. 75-78 (1953).
[388] An extremum property of sums of eigenvalues, Proc. Amer. Math. Soc., vol. 6, pp. 106-110 (1955).
[389] On eigenvalues of sums of normal matrices, Pacific J. Math., vol. 5, pp. 633-38 (1955).
[390] WINTNER, A., On criteria for linear stability, J. Math. Mech., vol. 6, pp. 301-9 (1957).
[391] WONG, Y., An inequality for Minkowski matrices, Proc. Amer. Math. Soc., vol. 4, pp. 137-41 (1953).
[392] On non-negative valued matrices, Proc. Nat. Acad. Sci. U.S.A., vol. 40, pp. 121-24 (1954).
*[393] YAGLOM, I. M., Quadratic and skew-symmetric bilinear forms in a real symplectic space, Trudy Sem. Vect. Tens. Anal. Moscow, vol. 8, pp. 364-81 (1950).
*[394] YAKUBOVICH, V. A., Some criteria for the reducibility of a system of differential equations, Dokl. Akad. Nauk SSSR, vol. 66, pp. 577-80 (1949).
*[395] ZEITLIN (Tseitlin), M. L., Application of the matrix calculus to the synthesis of relay-contact schemes, Dokl. Akad. Nauk SSSR, vol. 86, pp. 525-28 (1952).
[396] ZIMMERMANN (Tsimmerman), G. K., Decomposition of the norm of a matrix into products of norms of its rows, Nauč. Zap. Ped. Inst. Nikolaev, vol. 4, pp. 130-35 (1953).


INDEX


[Numbers in italics refer to Volume Two]

ABSOLUTE CONCEPTS, 184
Addition of congruences, 182
Addition of operators, 57
Adjoint matrix, 82
Adjoint operator, 265
Algebra, 17
Algorithm of Gauss, 23ff.
  generalized, 45
Angle between vectors, 242
Axes, principal, 309
  reduction to, 309
BASIS(ES), 51
  characteristic, 73
  coordinates of vector in, 53
  Jordan, 201
    lower, 202
  orthonormal, 242, 245
Bessel, inequality of, 259
Bézout, generalized theorem of, 81
Binet-Cauchy formula, 9
Birkhoff, G. D., 147
Block, of matrix, 41
  diagonal, isolated, 75
  Jordan, 151
Block multiplication of matrices, 42
Bundle of vectors, 183
Bunyakovskii's inequality, 255
CARTAN, theorem of, 4
Cauchy, formula of Binet-, 9
  system of, 115
Cauchy identity, 10
Cauchy index, 174, 216
Cayley, formulas of, 279
Cayley-Hamilton theorem, 83, 197
Cell, of matrix, 41
Chain, see Jordan, Markov, Sturm
Characteristic basis, 73
Characteristic direction, 71
Characteristic equation, 70, 310, 338
Characteristic matrix, 82
Characteristic polynomial, 71, 82
Characterization of root, minimal, 319
  maximal-minimal, 321, 322
Chebyshev, 173, 240
  polynomials of, 259
Chebyshev-Markov, formula of, 248
  theorem of, 247
Chetaev, 121
Chipart, 173, 221
Coefficients of Fourier, 261
Coefficients of influence, reduced, 111
Column, principal, 338
Column matrix, 2
Columns, Jordan chains of, 165
Components, of matrix, 105
  of operator, hermitian, 268
    skew-symmetric, 281
    symmetric, 281
Compound matrix, 19ff., 20
Computation of powers of matrix, 109
Congruences, 181, 182
Constraint, 320
Convergence, 110, 112
Coordinates, transformation of, 59
  of vector, 53
Coordinate transformation, matrix of, 60
D'ALEMBERT-EULER, theorem of, 286
Danilevskii, 214
Decomposition, of matrix into triangular factors, 33ff.
  polar, of operator, 276, 286; 6
  of space, 248
Defect of vector space, 64
Derivative, multiplicative, 133
Determinant identity of Sylvester, 32, 33
Determinant of square matrix, 1
Diagonal matrix, 3
Dilatation of space, 287
Dimension, of matrix, 1
  of vector space, 51
Direction, characteristic, 71
Discriminant of form, 333



Divisors, elementary, 142, 144, 194
  admissible, 238
  geometrical theory of, 175
  infinite, 27
Dmitriev, 87
Domain of stability, 232
Dynkin, 87
EIGENVALUE, 69
Elements of matrix, 1
Elimination method of Gauss, 23ff.
Equivalence, of matrices, 61, 132, 133
  of pencils, strict, 24
Ergodic theorem for Markov chains, 95
Erugin, theorem of, 122
Euler-D'Alembert, theorem of, 286
FACTOR SPACE, 183
Faddeev, method of, 87
Field, 1
Forces, linear superposition of, 28
Form, bilinear, 294
  Hankel, 338; 205
  hermitian, 244, 331
    bilinear, 332
    canonical form of, 337
    negative definite, 337
    negative semidefinite, 336
    pencil of, see pencil
    positive definite, 337
    positive semidefinite, 336
    rank of, 333
    signature of, 334
    singular, 333
  quadratic, 246, 294
    definite, 305
    discriminant of, 294
    rank of, 296
    real, 294
    reduction of, 299ff.
    reduction to principal axes, 309
    restricted, 306
    semidefinite, 304
    signature of, 296, 298
    singular, 294
Fourier series, 261
Frobenius, 304, 339, 343; 53
  theorem of, 343; 53
Function, entire, 169
  left value of, 81
GANTMACHER, 103
Gauss, algorithm of, 23ff.
  generalized, 45
  elimination method of, 23ff.
Gaussian form of matrix, 39
Golubchikov, 124
Governors, 172, 233
Gram, criterion of, 247
Gramian, 247, 251
Group, 18
  unitary, 268
Gundelfinger, 304
HADAMARD INEQUALITY, 252
  generalized, 254
Hamilton-Cayley theorem, 83, 197
Hankel form, 338; 205
Hankel matrix, 338; 205
Hermite, 172, 202, 210
Hermite-Biehler theorem, 228
Hurwitz, 173, 190, 210
Hurwitz matrix, 190
Hyperlogarithm, 169
IDENTITY OPERATOR, 66
Imprimitivity, index of, 80
Ince, 147
Inertia, law of, 297, 334
Integral, multiplicative, 132, 138
  product, 132
Invariant plane, of operator, 283
JACOBI, formula of, 302, 336
  identity of, 114
  method of, 300
  theorem of, 303
Jacobi matrix, 99
Jordan basis, 201
Jordan block, 151
Jordan chains of columns, 165
Jordan form of matrix, 152, 201, 202
Jordan matrix, 152, 201
KARPELEVICH, 87
Kernel of λ-matrix, 39
Kolmogorov, 83, 87, 92
Kotelyanskii, 103
  lemma of, 71
Krein, 821, 250
Kronecker, 75; 25, 37, 40
Krylov, 203
  transformation of, 206
LAGRANGE, method of, 299
Lagrange interpolation polynomial, 101
Lagrange-Sylvester interpolation polynomial, 97
λ-matrix, 130
  kernel of, 89


Lappo-Danilevskii, 168, 170, 171
Left value, 81
Legendre polynomials, 258
Liénard, 173, 221
Liénard-Chipart stability criterion, 221
Limit of sequence of matrices, 33
Linear (in)dependence of vectors, 51
Linear transformation, 3
Logarithm of matrix, 239
Lyapunov, 173, 185
  criterion of, 120
  equivalence in the sense of, 118
  theorem of, 187
Lyapunov matrix, 117
Lyapunov transformation, 117
MACMILLAN, 115
Mapping, affine, 245
Markov, 173, 240
  theorem of, 242
Markov chain, acyclic, 88
  cyclic, 88
  fully regular, 88
  homogeneous, 83
  period of, 96
  (ir)reducible, 88
  regular, 88
Markov parameters, 233, 234
Matricant, 127
Matrices, addition of, 4
    group property, 18
  annihilating polynomial of, 89
  applications to differential equations, 116ff.
  congruence of, 296
  difference of, 5
  equivalence of, 132, 133
  equivalent, 61ff.
  left-equivalence of, 132, 133
  limit of sequence of, 33
  multiplication on left by H, 14
  product of, 6
  quotient of, 17
  rank of product, 12
  similarity of, 67
  unitary similarity of, 242
  with same real part of spectrum, 122
Matrix, adjoint, 82, 266
    reduced, 90
  blocks of, 41
  canonical form of, 63, 135, 136, 139, 141, 152, 192, 201, 202, 264, 265
  cells of, 41
  characteristic, 82
  characteristic polynomial of, 82
  column, 2
  commuting, 7
  companion, 149
  completely reducible, 81
  complex, 1ff.
    orthogonal, normal form of, 23
    representation of as product, 6
    skew-symmetric, normal form of, 18
    symmetric, normal form of, 11
  components of, 105
  compound, 19ff., 20
  computation of power of, 109
  constituent, 105
  of coordinate transformation, 60
  cyclic form of, 54
  decomposition into triangular factors, 33ff.
  derivative of, 117
  determinant of, 1, 5
  diagonal, 3
    multiplication by, 8
  diagonal form of, 152
  dimension of, 1
  elementary, 132
  elementary divisors of, 142, 144, 194
  elements of, 1
  function of, 95ff.
    defined on spectrum, 96
  fundamental, 73
  Gaussian form of, 39
  Hankel, 338; 205
    projective, 20
  Hurwitz, 190
  idempotent, 226
  infinite, rank of, 239
  integral, 126; 113
    normalized, 114
  invariant polynomials of, 139, 144, 194
  inverse of, 15
    minors of, 19ff.
  irreducible, 50
    (im)primitive, 80
  Jacobi, 99
  Jordan form of, 152, 201, 202
  λ, 130
  and linear operator, 56
  logarithm of, 239
  Lyapunov, 117
  minimal polynomial of, 89
    uniqueness of, 90
  minor of, 2
    principal, 2
  multiplication of, by number, 5
    by matrix, 17



  nilpotent, 226
  non-negative, 50
    totally, 98
  non-singular, 15
  normal, 269
  normal form of, 150, 192, 201, 202
  notation for, 1
  order of, 1
  orthogonal, 263
  oscillatory, 103
  partitioned, 41, 42
  permutable, 7
  permutation of, 50
  polynomial, see polynomial matrix
  polynomials in, permutability of, 13
  positive, 50
    spectra of, 53
    totally, 98
  power of, 12
    computation of, 109
  power series in, 113
  principal minor of, 2
  quasi-triangular, 43
  rank of, 2
  reducible, 50, 51
    normal form of, 75
  representation as product, 264
  root of non-singular, 233
  root of singular, 234ff., 239
  Routh, 191
  row, 2
  of simple structure, 73
  singular, 15
  skew-symmetric, 19
  square, 1
  square root of, 239
  stochastic, 83
    fully regular, 88
    regular, 88
  spur of, 87
  subdiagonal of, 13
  superdiagonal of, 13
  symmetric, 19
  trace of, 87
  transformation of coordinate, 60
  transforming, 35, 60
  transpose of, 19
  triangular, 18, 218; 155
  unit, 12
  unitary, 263, 269
  unitary, representation of as product, 5
  upper quasi-triangular, 43
  upper triangular, 18
Matrix addition, properties of, 4
Matrix equations, 215ff.
  uniqueness of solution, 16
Matrix multiplication, 6, 7
Matrix polynomials, 76
  left quotient of, 78
  multiplication of, 77
Maxwell, 172
Mean, convergence in, of series, 260
Metric, 242
  euclidean, 245
  hermitian, 243, 244
    positive definite, 243
    positive semidefinite, 243
Minimal indices for columns, 38
Minor, 2
  almost principal, 102
  of zero density, 104
Modulus, left, 275
Moments, problem of, 236, 237
Motion, of mechanical system, 125
  of point, 121
  stability of, 125
    asymptotic, 125
NAIMARK, 221, 233, 250
Nilpotency, index of, 226
Norm, left, 275
  of vector, 243
Null vector, 52
Nullity of vector space, 64
Number space, n-dimensional, 52
OPERATIONS, elementary, 134
Operator (linear), 55, 66
  adjoint, 265
  decomposition of, 281
  hermitian, 268
    positive definite, 274
    positive semidefinite, 274
    projective, 20
    spectrum of, 272
  identity, 66
  invariant plane of, 283
  matrix corresponding to, 56
  normal, 268
  positive definite, 280
  positive semidefinite, 280
  normal, 280
  orthogonal, of first kind, 281
    (im)proper, 281
    of second kind, 281
  polar decomposition of, 276, 286
  real, 282
  semidefinite, 274, 280



  of simple structure, 72
  skew-symmetric, 280
  square root of, 275
  symmetric, 280
  transposed, 280
  unitary, 268
    spectrum of, 273
Operators, addition of, 57
  multiplication of, 58
Order of matrix, 1
Orlando, formula of, 196
Orthogonal complement, 266
Orthogonalization, 256
Oscillations, small, of system, 326
PARAMETERS, homogeneous, 26
  Markov, 233, 234
Parseval, equality of, 261
Peano, 127
Pencil of hermitian forms, 338
  characteristic equation of, 338
  characteristic values of, 338
  principal vector of, 338
Pencil(s) of matrices, canonical form of, 37, 39
  congruent, 41
  elementary divisors of, infinite, 27
  rank of, 29
  regular, 25
  singular, 26
  strict equivalence of, 24
Pencil of quadratic forms, 310
  characteristic equation of, 310
  characteristic value of, 310
  principal column of, 310
  principal matrix of, 312
  principal vector of, 310
Period, of Markov chain, 96
Permutation of matrix, 50
Perron, 53
  formula of, 116
Petrovskii, 113
Polynomial(s), annihilating, 176, 177
    minimal, 176
    of square matrix, 89
  of Chebyshev, 259
  characteristic, 71
  interpolation, 97, 101, 103
  invariant, 139, 144, 194
  of Legendre, 258
  matrix, see matrix polynomials
  minimal, 89, 176, 177
  monic, 176
  scalar, 76
  positive pair of, 227
Polynomial matrix, 76, 130
  elementary operations on, 130, 131
  regular, 76
    order of, 76
Power of matrix, 12
Probability, absolute, 93
    limiting, 94
    mean limiting, 96
  transition, 82
    final, 88
    limiting, 88
    mean limiting, 96
Product, inner, of vectors, 243
  scalar, of vectors, 242, 243
  of operators, 58
  of sequences, 6
Pythagoras, theorem of, 244
QUASI-ERGODIC THEOREM, 95
Quasi-triangular matrix, 43
Quotients of matrices, 17
RANK, of infinite matrix, 239
  of matrix, 2
  of pencil, 29
  of vector space, 64
Relative concepts, 184
Right value, 81
Ring, 17
Romanovskii, 83
Root of matrix, 233, 234ff., 239
Rotation of space, 287
Routh, 173, 201
  criterion of, 180
Routh-Hurwitz, criterion of, 194
Routh matrix, 191
Routh scheme, 179
Row matrix, 2
SCHLESINGER, 133
Schur, formulas of, 46
Schwarz, inequality of, 255
Sequence of vectors, 256, 260
Series, convergence of, 260
  fundamental, of solutions, 38
Signature of quadratic form, 296, 298
Similarity of matrices, 67
Singularity, 143
Smirnov, 171
Space, coefficient, 232
  decomposition of, 177, 248
  dilatation of, 287
  euclidean, 242, 245
    extension of, to unitary space, 282
  factor, 183



  rotation of, 287
  unitary, 242, 243
    as extension of euclidean, 282
Spectrum, 96, 272, 273; 53
Spur, 87
Square(s), independent, 297
  positive, 334
Stability, criterion of, 221
  domain of, 232
  of motion, 125
  of solution of linear system, 129
States, essential, 92
  limiting, 92
  non-essential, 92
Stieltjes, theorem of, 232
Stodola, 173
Sturm, theorem of, 175
Sturm chain, 175
  generalized, 176
Subdiagonal, 13
Subspace, characteristic, 71
  coordinate, 51
  cyclic, 185
  generated by vector, 185
  invariant, 178
  vector, 63
Substitution, integral, 143, 169
Suleimanova, 87
Superdiagonal, 13
Sylvester, identity of, 32, 33
  inequality of, 66
Systems of differential equations, application of matrices to, 116ff.
  equivalent, 118
  reducible, 118
  regular, 121, 168
  singularity of, 143
  stability of solution, 129
Systems of vectors, bi-orthogonal, 267
  orthonormal, 245
TRACE, 87
Transformation, linear, 3
  of coordinates, 59
  orthogonal, 242, 263
  unitary, 242, 263
  written as matrix equation, 7
  Lyapunov, 117
Transforming matrix, 35, 60
Transpose, 19, 280
Transposition, 18
UNIT SUM OF SQUARES, 314
Unit sphere, 315
Unit vector, 244
VALUE(S), characteristic, maximal, 53
    extremal properties of, 317
  latent, 69
  left and right, of function, 81
  proper, 69
Vector(s), 51
  angle between, 242
  bundle of, 183
  Jordan chain of, 202
  complex, 282
  congruence of, 181
  extremal, 55
  inner product of, 243
  Jordan chain of, 201
  latent, 69
  length of, 242, 243
  linear dependence of, 51
    test for, 251
    modulo I, 183
  linear independence of, 51
  norm of, 243
  normalized, 244; 66
  null, 52
  orthogonal, 244, 248
  orthogonalization of sequence, 256
  principal, 318, 338
  proper, 69
  projecting, 248
  projection of, orthogonal, 248
  real, 282
  scalar product of, 242, 243
  systems of, bi-orthogonal, 267
    orthonormal, 245
  unit, 244
Vector space, 50ff., 51
  basis of, 51
  defect of, 64
  dimension of, 51
  finite-dimensional, 51
  infinite-dimensional, 51
  nullity of, 64
  rank of, 64
Vector subspace, 63
Volterra, 133, 146, 147
Vyshnegradskii, 172
WEIERSTRASS, 25


ISBN 0-8218-1376-5

9"78082111813768CIIEL/131.H