Matrix Analysis for Scientists and Engineers, by Alan J. Laub
TRANSCRIPT
Matrix Analysis for Scientists & Engineers

Alan J. Laub
University of California
Davis, California

SIAM
Copyright © 2005 by the Society for Industrial and Applied Mathematics.

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.

MATLAB® is a registered trademark of The MathWorks, Inc. For MATLAB product information, please contact The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098 USA, 508-647-7000, Fax: 508-647-7101, [email protected], www.mathworks.com

Mathematica is a registered trademark of Wolfram Research, Inc.

Mathcad is a registered trademark of Mathsoft Engineering & Education, Inc.

Library of Congress Cataloging-in-Publication Data

Laub, Alan J., 1948-
Matrix analysis for scientists and engineers / Alan J. Laub.
p. cm.
Includes bibliographical references and index.
ISBN 0-89871-576-8 (pbk.)
1. Matrices. 2. Mathematical analysis. I. Title.

QA188.L38 2005
512.9'434—dc22
2004059962

About the cover: The original artwork featured on the cover was created by freelance artist Aaron Tallon of Philadelphia, PA. Used by permission.

SIAM is a registered trademark.
To my wife, Beverley
(who captivated me in the UBC math library nearly forty years ago)
Contents

Preface

1 Introduction and Review
1.1 Some Notation and Terminology
1.2 Matrix Arithmetic
1.3 Inner Products and Orthogonality
1.4 Determinants

2 Vector Spaces
2.1 Definitions and Examples
2.2 Subspaces
2.3 Linear Independence
2.4 Sums and Intersections of Subspaces

3 Linear Transformations
3.1 Definition and Examples
3.2 Matrix Representation of Linear Transformations
3.3 Composition of Transformations
3.4 Structure of Linear Transformations
3.5 Four Fundamental Subspaces

4 Introduction to the Moore-Penrose Pseudoinverse
4.1 Definitions and Characterizations
4.2 Examples
4.3 Properties and Applications

5 Introduction to the Singular Value Decomposition
5.1 The Fundamental Theorem
5.2 Some Basic Properties
5.3 Row and Column Compressions

6 Linear Equations
6.1 Vector Linear Equations
6.2 Matrix Linear Equations
6.3 A More General Matrix Linear Equation
6.4 Some Useful and Interesting Inverses
7 Projections, Inner Product Spaces, and Norms
7.1 Projections
7.1.1 The four fundamental orthogonal projections
7.2 Inner Product Spaces
7.3 Vector Norms
7.4 Matrix Norms

8 Linear Least Squares Problems
8.1 The Linear Least Squares Problem
8.2 Geometric Solution
8.3 Linear Regression and Other Linear Least Squares Problems
8.3.1 Example: Linear regression
8.3.2 Other least squares problems
8.4 Least Squares and Singular Value Decomposition
8.5 Least Squares and QR Factorization

9 Eigenvalues and Eigenvectors
9.1 Fundamental Definitions and Properties
9.2 Jordan Canonical Form
9.3 Determination of the JCF
9.3.1 Theoretical computation
9.3.2 On the +1's in JCF blocks
9.4 Geometric Aspects of the JCF
9.5 The Matrix Sign Function

10 Canonical Forms
10.1 Some Basic Canonical Forms
10.2 Definite Matrices
10.3 Equivalence Transformations and Congruence
10.3.1 Block matrices and definiteness
10.4 Rational Canonical Form

11 Linear Differential and Difference Equations
11.1 Differential Equations
11.1.1 Properties of the matrix exponential
11.1.2 Homogeneous linear differential equations
11.1.3 Inhomogeneous linear differential equations
11.1.4 Linear matrix differential equations
11.1.5 Modal decompositions
11.1.6 Computation of the matrix exponential
11.2 Difference Equations
11.2.1 Homogeneous linear difference equations
11.2.2 Inhomogeneous linear difference equations
11.2.3 Computation of matrix powers
11.3 Higher-Order Equations
12 Generalized Eigenvalue Problems
12.1 The Generalized Eigenvalue/Eigenvector Problem
12.2 Canonical Forms
12.3 Application to the Computation of System Zeros
12.4 Symmetric Generalized Eigenvalue Problems
12.5 Simultaneous Diagonalization
12.5.1 Simultaneous diagonalization via SVD
12.6 Higher-Order Eigenvalue Problems
12.6.1 Conversion to first-order form

13 Kronecker Products
13.1 Definition and Examples
13.2 Properties of the Kronecker Product
13.3 Application to Sylvester and Lyapunov Equations

Bibliography

Index
Preface
This book is intended to be used as a text for beginning graduate-level (or even senior-level) students in engineering, the sciences, mathematics, computer science, or computational science who wish to be familiar with enough matrix analysis that they are prepared to use its tools and ideas comfortably in a variety of applications. By matrix analysis I mean linear algebra and matrix theory together with their intrinsic interaction with and application to linear dynamical systems (systems of linear differential or difference equations). The text can be used in a one-quarter or one-semester course to provide a compact overview of much of the important and useful mathematics that, in many cases, students meant to learn thoroughly as undergraduates, but somehow didn't quite manage to do. Certain topics that may have been treated cursorily in undergraduate courses are treated in more depth and more advanced material is introduced. I have tried throughout to emphasize only the more important and "useful" tools, methods, and mathematical structures. Instructors are encouraged to supplement the book with specific application examples from their own particular subject area.

The choice of topics covered in linear algebra and matrix theory is motivated both by applications and by computational utility and relevance. The concept of matrix factorization is emphasized throughout to provide a foundation for a later course in numerical linear algebra. Matrices are stressed more than abstract vector spaces, although Chapters 2 and 3 do cover some geometric (i.e., basis-free or subspace) aspects of many of the fundamental notions. The books by Meyer [18], Noble and Daniel [20], Ortega [21], and Strang [24] are excellent companion texts for this book. Upon completion of a course based on this text, the student is then well-equipped to pursue, either via formal courses or through self-study, follow-on topics on the computational side (at the level of [7], [11], [23], or [25], for example) or on the theoretical side (at the level of [12], [13], or [16], for example).

Prerequisites for using this text are quite modest: essentially just an understanding of calculus and definitely some previous exposure to matrices and linear algebra. Basic concepts such as determinants, singularity of matrices, eigenvalues and eigenvectors, and positive definite matrices should have been covered at least once, even though their recollection may occasionally be "hazy." However, requiring such material as prerequisite permits the early (but "out-of-order" by conventional standards) introduction of topics such as pseudoinverses and the singular value decomposition (SVD). These powerful and versatile tools can then be exploited to provide a unifying foundation upon which to base subsequent topics. Because tools such as the SVD are not generally amenable to "hand computation," this approach necessarily presupposes the availability of appropriate mathematical software on a digital computer. For this, I highly recommend MATLAB® although other software such as
Mathematica® or Mathcad® is also excellent. Since this text is not intended for a course in numerical linear algebra per se, the details of most of the numerical aspects of linear algebra are deferred to such a course.

The presentation of the material in this book is strongly influenced by computational issues for two principal reasons. First, "real-life" problems seldom yield to simple closed-form formulas or solutions. They must generally be solved computationally and it is important to know which types of algorithms can be relied upon and which cannot. Some of the key algorithms of numerical linear algebra, in particular, form the foundation upon which rests virtually all of modern scientific and engineering computation. A second motivation for a computational emphasis is that it provides many of the essential tools for what I call "qualitative mathematics." For example, in an elementary linear algebra course, a set of vectors is either linearly independent or it is not. This is an absolutely fundamental concept. But in most engineering or scientific contexts we want to know more than that. If a set of vectors is linearly independent, how "nearly dependent" are the vectors? If they are linearly dependent, are there "best" linearly independent subsets? These turn out to be much more difficult problems and frequently involve research-level questions when set in the context of the finite-precision, finite-range floating-point arithmetic environment of most modern computing platforms.

Some of the applications of matrix analysis mentioned briefly in this book derive from the modern state-space approach to dynamical systems. State-space methods are now standard in much of modern engineering where, for example, control systems with large numbers of interacting inputs, outputs, and states often give rise to models of very high order that must be analyzed, simulated, and evaluated. The "language" in which such models are conveniently described involves vectors and matrices. It is thus crucial to acquire a working knowledge of the vocabulary and grammar of this language. The tools of matrix analysis are also applied on a daily basis to problems in biology, chemistry, econometrics, physics, statistics, and a wide variety of other fields, and thus the text can serve a rather diverse audience. Mastery of the material in this text should enable the student to read and understand the modern language of matrices used throughout mathematics, science, and engineering.

While prerequisites for this text are modest, and while most material is developed from basic ideas in the book, the student does require a certain amount of what is conventionally referred to as "mathematical maturity." Proofs are given for many theorems. When they are not given explicitly, they are either obvious or easily found in the literature. This is ideal material from which to learn a bit about mathematical proofs and the mathematical maturity and insight gained thereby. It is my firm conviction that such maturity is neither encouraged nor nurtured by relegating the mathematical aspects of applications (for example, linear algebra for elementary state-space theory) to an appendix or introducing it "on-the-fly" when necessary. Rather, one must lay a firm foundation upon which subsequent applications and perspectives can be built in a logical, consistent, and coherent fashion.

I have taught this material for many years, many times at UCSB and twice at UC Davis, and the course has proven to be remarkably successful at enabling students from disparate backgrounds to acquire a quite acceptable level of mathematical maturity and rigor for subsequent graduate studies in a variety of disciplines. Indeed, many students who completed the course, especially the first few times it was offered, remarked afterward that if only they had had this course before they took linear systems, or signal processing,
or estimation theory, etc., they would have been able to concentrate on the new ideas they wanted to learn, rather than having to spend time making up for deficiencies in their background in matrices and linear algebra. My fellow instructors, too, realized that by requiring this course as a prerequisite, they no longer had to provide as much time for "review" and could focus instead on the subject at hand. The concept seems to work.
— AJL, June 2004
Chapter 1
Introduction and Review
1.1 Some Notation and Terminology

We begin with a brief introduction to some standard notation and terminology to be used throughout the text. This is followed by a review of some basic notions in matrix analysis and linear algebra.

The following sets appear frequently throughout subsequent chapters:

1. R^n = the set of n-tuples of real numbers represented as column vectors. Thus, x ∈ R^n means

       x = [x_1, ..., x_n]^T,

   where x_i ∈ R for i ∈ n. Henceforth, the notation n denotes the set {1, ..., n}.

   Note: Vectors are always column vectors. A row vector is denoted by y^T, where y ∈ R^n and the superscript T is the transpose operation. That a vector is always a column vector rather than a row vector is entirely arbitrary, but this convention makes it easy to recognize immediately throughout the text that, e.g., x^T y is a scalar while x y^T is an n × n matrix.

2. C^n = the set of n-tuples of complex numbers represented as column vectors.

3. R^{m×n} = the set of real (or real-valued) m × n matrices.

4. R_r^{m×n} = the set of real m × n matrices of rank r. Thus, R_n^{n×n} denotes the set of real nonsingular n × n matrices.

5. C^{m×n} = the set of complex (or complex-valued) m × n matrices.

6. C_r^{m×n} = the set of complex m × n matrices of rank r.
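The note above distinguishes x^T y (a scalar) from x y^T (an n × n matrix). The text recommends MATLAB for computation; the same shape check is easy in NumPy (a sketch with illustrative vectors, not data from the text):

```python
import numpy as np

# The text's convention: vectors are always column vectors.
x = np.array([[1.0], [2.0], [3.0]])  # x in R^3
y = np.array([[4.0], [5.0], [6.0]])  # y in R^3

inner = x.T @ y  # x^T y: a 1x1 array, i.e., a scalar
outer = x @ y.T  # x y^T: a 3x3 matrix

print(inner.shape, outer.shape)  # (1, 1) (3, 3)
```

Here x^T y = 4 + 10 + 18 = 32, a single number, while x y^T has nine entries.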
We now classify some of the more familiar "shaped" matrices. A matrix A ∈ R^{n×n} (or A ∈ C^{n×n}) is

• diagonal if a_ij = 0 for i ≠ j.

• upper triangular if a_ij = 0 for i > j.

• lower triangular if a_ij = 0 for i < j.

• tridiagonal if a_ij = 0 for |i − j| > 1.

• pentadiagonal if a_ij = 0 for |i − j| > 2.

• upper Hessenberg if a_ij = 0 for i − j > 1.

• lower Hessenberg if a_ij = 0 for j − i > 1.

Each of the above also has a "block" analogue obtained by replacing scalar components in the respective definitions by block submatrices. For example, if A ∈ R^{n×n}, B ∈ R^{n×m}, and C ∈ R^{m×m}, then the (m + n) × (m + n) matrix [A B; 0 C] is block upper triangular.

The transpose of a matrix A is denoted by A^T and is the matrix whose (i, j)th entry is the (j, i)th entry of A, that is, (A^T)_ij = a_ji. Note that if A ∈ R^{m×n}, then A^T ∈ R^{n×m}. If A ∈ C^{m×n}, then its Hermitian transpose (or conjugate transpose) is denoted by A^H (or sometimes A*) and its (i, j)th entry is (A^H)_ij = ā_ji, where the bar indicates complex conjugation; i.e., if z = α + jβ (j = i = √−1), then z̄ = α − jβ. A matrix A is symmetric if A = A^T and Hermitian if A = A^H. We henceforth adopt the convention that, unless otherwise noted, an equation like A = A^T implies that A is real-valued while a statement like A = A^H implies that A is complex-valued.

Remark 1.1. While √−1 is most commonly denoted by i in mathematics texts, j is the more common notation in electrical engineering and system theory. There is some advantage to being conversant with both notations. The notation j is used throughout the text but reminders are placed at strategic locations.

Example 1.2.

1. A = [matrix illegible in transcript] is symmetric (and Hermitian).

2. A = [5 7+j; 7+j 2] is complex-valued symmetric but not Hermitian.

3. A = [5 7+j; 7−j 2] is Hermitian (but not symmetric).

Transposes of block matrices can be defined in an obvious way. For example, it is easy to see that if A_ij are appropriately dimensioned subblocks, then

    [A_11 A_12; A_21 A_22]^T = [A_11^T A_21^T; A_12^T A_22^T].
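The symmetric-versus-Hermitian distinction in items 2 and 3 of Example 1.2 can be verified directly from the definitions A = A^T versus A = A^H; a NumPy sketch:

```python
import numpy as np

def is_symmetric(A):
    # A = A^T (plain transpose, no conjugation)
    return np.array_equal(A, A.T)

def is_hermitian(A):
    # A = A^H (conjugate transpose)
    return np.array_equal(A, A.conj().T)

# Example 1.2, item 2: complex-valued symmetric but not Hermitian.
A2 = np.array([[5, 7 + 1j],
               [7 + 1j, 2]])

# Example 1.2, item 3: Hermitian (but not symmetric).
A3 = np.array([[5, 7 + 1j],
               [7 - 1j, 2]])

print(is_symmetric(A2), is_hermitian(A2))  # True False
print(is_symmetric(A3), is_hermitian(A3))  # False True
```

Note that Python writes the imaginary unit as `1j`, matching the engineering notation j of Remark 1.1.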
1.2 Matrix Arithmetic

It is assumed that the reader is familiar with the fundamental notions of matrix addition, multiplication of a matrix by a scalar, and multiplication of matrices.

A special case of matrix multiplication occurs when the second matrix is a column vector x, i.e., the matrix-vector product Ax. A very important way to view this product is to interpret it as a weighted sum (linear combination) of the columns of A. That is, suppose

    A = [a_1, ..., a_n] ∈ R^{m×n} with a_i ∈ R^m, and x = [x_1, ..., x_n]^T.

Then

    Ax = x_1 a_1 + ... + x_n a_n ∈ R^m.

The importance of this interpretation cannot be overemphasized. As a numerical example, take A = [9 8 7; 6 5 4], x = [3, 2, 1]^T. Then we can quickly calculate dot products of the rows of A with the column x to find Ax = [50, 32]^T, but this matrix-vector product can also be computed via

    Ax = 3 [9; 6] + 2 [8; 5] + 1 [7; 4].

For large arrays of numbers, there can be important computer-architecture-related advantages to preferring the latter calculation method.
For matrix multiplication, suppose A e Rmxn and B = [bi,...,bp] e Rnxp withbi e W1. Then the matrix product A B can be thought of as above, applied p times:
There is also an alternative, but equivalent, formulation of matrix multiplication that appearsfrequently in the text and is presented below as a theorem. Again, its importance cannot beoveremphasized. It is deceptively simple and its full understanding is well rewarded.
Theorem 1.3. Let U = [MI , . . . , un] e Rmxn with ut e Rm and V = [v{,..., vn] e Rpxn
with vt e Rp. Then
If matrices C and D are compatible for multiplication, recall that (CD)T = DTCT
(or (CD}H — DHCH). This gives a dual to the matrix-vector result above. Namely, ifC eRmxn has row vectors cj e Elx", and is premultiplied by a row vector yTe R l xm ,then the product can be written as a weighted linear sum of the rows of C as follows:
3
Theorem 1.3 can then also be generalized to its "row dual." The details are left to the readei
Then
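Both views of the product Ax, and the outer-product formulation of Theorem 1.3, can be spot-checked numerically. The sketch below is in Python with NumPy (an assumption of this illustration; the book does not prescribe software). The matrix A and vector x are the numerical example from the text; U and V are arbitrary matrices chosen here only to exercise the theorem.

```python
import numpy as np

# The text's example: A is 2x3, x is in R^3.
A = np.array([[9, 8, 7],
              [6, 5, 4]])
x = np.array([3, 2, 1])

# Row-oriented view: dot products of the rows of A with x.
row_way = A @ x
print(row_way)  # [50 32]

# Column-oriented view: weighted sum of the columns of A.
col_way = 3 * A[:, 0] + 2 * A[:, 1] + 1 * A[:, 2]
assert np.array_equal(row_way, col_way)

# Theorem 1.3: U V^T equals the sum of outer products u_i v_i^T.
U = np.array([[1., 2.], [3., 4.], [5., 6.]])   # U in R^{3x2}, columns u_i
V = np.array([[7., 8.], [9., 10.]])            # V in R^{2x2}, columns v_i
outer_sum = sum(np.outer(U[:, i], V[:, i]) for i in range(U.shape[1]))
assert np.allclose(outer_sum, U @ V.T)
```

The column-oriented form is the one that vectorizes well on modern hardware, which is the "computer-architecture-related" advantage the text alludes to.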
Chapter 1. Introduction and Review
1.3 Inner Products and Orthogonality

For vectors x, y ∈ R^n, the Euclidean inner product (or inner product, for short) of x and y is given by

    (x, y) := x^T y = sum_{i=1}^n x_i y_i.

Note that the inner product is a scalar.

If x, y ∈ C^n, we define their complex Euclidean inner product (or inner product, for short) by

    (x, y)_c := x^H y = sum_{i=1}^n x̄_i y_i.

Note that (x, y)_c is the complex conjugate of (y, x)_c; i.e., the order in which x and y appear in the complex inner product is important. The more conventional definition of the complex inner product is (x, y)_c = y^H x = sum_{i=1}^n x_i ȳ_i, but throughout the text we prefer the symmetry with the real case.

Example 1.4. Let x = [1; j] and y = [1; 2]. Then

    (x, y)_c = [1; j]^H [1; 2] = [1  −j] [1; 2] = 1 − 2j,

while

    (y, x)_c = [1; 2]^H [1; j] = [1  2] [1; j] = 1 + 2j,

and we see that, indeed, (x, y)_c is the conjugate of (y, x)_c.

Note that x^T x = 0 if and only if x = 0 when x ∈ R^n but that this is not true if x ∈ C^n. What is true in the complex case is that x^H x = 0 if and only if x = 0. To illustrate, consider the nonzero vector x above. Then x^T x = 0 but x^H x = 2.

Two nonzero vectors x, y ∈ R^n are said to be orthogonal if their inner product is zero, i.e., x^T y = 0. Nonzero complex vectors are orthogonal if x^H y = 0. If x and y are orthogonal and x^T x = 1 and y^T y = 1, then we say that x and y are orthonormal. A matrix A ∈ R^{n×n} is an orthogonal matrix if A^T A = A A^T = I, where I is the n × n identity matrix. The notation I_n is sometimes used to denote the identity matrix in R^{n×n} (or C^{n×n}). Similarly, a matrix A ∈ C^{n×n} is said to be unitary if A^H A = A A^H = I. Clearly an orthogonal or unitary matrix has orthonormal rows and orthonormal columns. There is no special name attached to a nonsquare matrix A ∈ R^{m×n} (or ∈ C^{m×n}) with orthonormal rows or columns.
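The calculations of Example 1.4 and the x^T x versus x^H x distinction can be reproduced directly. The sketch below uses Python with NumPy (an assumption of this illustration); the vectors are the ones from the example.

```python
import numpy as np

x = np.array([1, 1j])   # the nonzero complex vector from Example 1.4
y = np.array([1, 2])

# The text's convention: <u, v>_c = u^H v.
ip = lambda u, v: np.conj(u) @ v

print(ip(x, y))   # (1-2j)
print(ip(y, x))   # (1+2j), the complex conjugate of the above
assert ip(x, y) == np.conj(ip(y, x))

# Over C, x^T x = 0 does not imply x = 0, but x^H x = 0 does.
assert x @ x == 0         # x^T x = 1 + j^2 = 0, yet x != 0
assert ip(x, x) == 2      # x^H x = |1|^2 + |j|^2 = 2
```

This is why norms and orthogonality over C are always defined with the Hermitian transpose, never the plain transpose.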
1.4 Determinants

It is assumed that the reader is familiar with the basic theory of determinants. For A ∈ R^{n×n} (or A ∈ C^{n×n}) we use the notation det A for the determinant of A. We list below some of the more useful properties of determinants. Note that this is not a minimal set; i.e., several properties are consequences of one or more of the others.

1. If A has a zero row or if any two rows of A are equal, then det A = 0.

2. If A has a zero column or if any two columns of A are equal, then det A = 0.

3. Interchanging two rows of A changes only the sign of the determinant.

4. Interchanging two columns of A changes only the sign of the determinant.

5. Multiplying a row of A by a scalar α results in a new matrix whose determinant is α det A.

6. Multiplying a column of A by a scalar α results in a new matrix whose determinant is α det A.

7. Multiplying a row of A by a scalar and then adding it to another row does not change the determinant.

8. Multiplying a column of A by a scalar and then adding it to another column does not change the determinant.

9. det A^T = det A (det A^H = conj(det A) if A ∈ C^{n×n}).

10. If A is diagonal, then det A = a_{11} a_{22} ··· a_{nn}, i.e., det A is the product of its diagonal elements.

11. If A is upper triangular, then det A = a_{11} a_{22} ··· a_{nn}.

12. If A is lower triangular, then det A = a_{11} a_{22} ··· a_{nn}.

13. If A is block diagonal (or block upper triangular or block lower triangular), with square diagonal blocks A_{11}, A_{22}, ..., A_{nn} (of possibly different sizes), then det A = det A_{11} det A_{22} ··· det A_{nn}.

14. If A, B ∈ R^{n×n}, then det(AB) = det A det B.

15. If A ∈ R^{n×n} is nonsingular, then det(A^{−1}) = 1/det A.

16. If A ∈ R^{n×n} is nonsingular and D ∈ R^{m×m}, then det [A B; C D] = det A det(D − C A^{−1} B).

    Proof: This follows easily from the block LU factorization

        [A B; C D] = [I 0; C A^{−1} I] [A B; 0 D − C A^{−1} B].

17. If A ∈ R^{n×n} and D ∈ R^{m×m} is nonsingular, then det [A B; C D] = det D det(A − B D^{−1} C).

    Proof: This follows easily from the block UL factorization

        [A B; C D] = [I B D^{−1}; 0 I] [A − B D^{−1} C 0; C D].
Remark 1.5. The factorization of a matrix A into the product of a unit lower triangular matrix L (i.e., lower triangular with all 1's on the diagonal) and an upper triangular matrix U is called an LU factorization; see, for example, [24]. Another such factorization is UL, where U is unit upper triangular and L is lower triangular. The factorizations used above are block analogues of these.

Remark 1.6. The matrix D − C A^{−1} B is called the Schur complement of A in [A B; C D]. Similarly, A − B D^{−1} C is the Schur complement of D in [A B; C D].
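The product rule (property 14) and the Schur-complement determinant formula (property 16) lend themselves to a quick numerical spot-check. The sketch below is in Python with NumPy (an assumption of this illustration), with random matrices; A is shifted to keep it safely nonsingular, as property 16 requires.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)  # shift keeps A nonsingular
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, m))

M = np.block([[A, B], [C, D]])          # the (n+m) x (n+m) block matrix

# Property 16: det [A B; C D] = det(A) det(D - C A^{-1} B).
schur = D - C @ np.linalg.inv(A) @ B    # Schur complement of A (Remark 1.6)
assert np.isclose(np.linalg.det(M),
                  np.linalg.det(A) * np.linalg.det(schur))

# Property 14: det(AB) = det(A) det(B) for square A, B.
B2 = rng.standard_normal((n, n))
assert np.isclose(np.linalg.det(A @ B2),
                  np.linalg.det(A) * np.linalg.det(B2))
```

The first identity is exactly the block LU factorization argument: the unit-triangular factor has determinant 1, so only the block-triangular factor contributes.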
EXERCISES

1. If A ∈ R^{n×n} and α is a scalar, what is det(αA)? What is det(−A)?

2. If A is orthogonal, what is det A? If A is unitary, what is det A?

3. Let x, y ∈ R^n. Show that det(I − x y^T) = 1 − y^T x.

4. Let U_1, U_2, ..., U_k ∈ R^{n×n} be orthogonal matrices. Show that the product U = U_1 U_2 ··· U_k is an orthogonal matrix.

5. Let A ∈ R^{n×n}. The trace of A, denoted Tr A, is defined as the sum of its diagonal elements, i.e., Tr A = sum_{i=1}^n a_{ii}.

   (a) Show that the trace is a linear function; i.e., if A, B ∈ R^{n×n} and α, β ∈ R, then Tr(αA + βB) = α Tr A + β Tr B.

   (b) Show that Tr(AB) = Tr(BA), even though in general AB ≠ BA.

   (c) Let S ∈ R^{n×n} be skew-symmetric, i.e., S^T = −S. Show that Tr S = 0. Then either prove the converse or provide a counterexample.

6. A matrix A ∈ R^{n×n} is said to be idempotent if A^2 = A.

   (a) Show that the matrix A = (1/2) [2 cos^2 θ, sin 2θ; sin 2θ, 2 sin^2 θ] is idempotent for all θ.

   (b) Suppose A ∈ R^{n×n} is idempotent and A ≠ I. Show that A must be singular.
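Numerical experiments are no substitute for the proofs the exercises ask for, but they are a useful sanity check before proving. The sketch below (Python with NumPy, an assumption of this illustration) checks the identities of Exercises 5(b) and 6(a) at one arbitrary angle and one random matrix pair.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# Exercise 5(b): Tr(AB) = Tr(BA), even though AB != BA in general.
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
assert not np.allclose(A @ B, B @ A)     # AB and BA really do differ here

# Exercise 6(a): A(theta) = (1/2)[2 cos^2 t, sin 2t; sin 2t, 2 sin^2 t].
t = 0.7                                  # an arbitrary test angle
At = 0.5 * np.array([[2 * np.cos(t)**2, np.sin(2 * t)],
                     [np.sin(2 * t),    2 * np.sin(t)**2]])
assert np.allclose(At @ At, At)          # idempotent: A^2 = A
assert np.isclose(np.linalg.det(At), 0)  # singular, consistent with 6(b)
```

The matrix in 6(a) is (cos t, sin t)(cos t, sin t)^T, an orthogonal projection onto a line, which is why idempotency holds for every angle.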
Chapter 2
Vector Spaces
In this chapter we give a brief review of some of the basic concepts of vector spaces. The emphasis is on finite-dimensional vector spaces, including spaces formed by special classes of matrices, but some infinite-dimensional examples are also cited. An excellent reference for this and the next chapter is [10], where some of the proofs that are not given here may be found.

2.1 Definitions and Examples

Definition 2.1. A field is a set F together with two operations +, · : F × F → F such that

(A1) α + (β + γ) = (α + β) + γ for all α, β, γ ∈ F.

(A2) there exists an element 0 ∈ F such that α + 0 = α for all α ∈ F.

(A3) for all α ∈ F, there exists an element (−α) ∈ F such that α + (−α) = 0.

(A4) α + β = β + α for all α, β ∈ F.

(M1) α · (β · γ) = (α · β) · γ for all α, β, γ ∈ F.

(M2) there exists an element 1 ∈ F such that α · 1 = α for all α ∈ F.

(M3) for all α ∈ F, α ≠ 0, there exists an element α^{−1} ∈ F such that α · α^{−1} = 1.

(M4) α · β = β · α for all α, β ∈ F.

(D) α · (β + γ) = α · β + α · γ for all α, β, γ ∈ F.

Axioms (A1)-(A3) state that (F, +) is a group and an abelian group if (A4) also holds. Axioms (M1)-(M4) state that (F \ {0}, ·) is an abelian group.

Generally speaking, when no confusion can arise, the multiplication operator "·" is not written explicitly.
Chapter 2. Vector Spaces
Example 2.2.

1. R with ordinary addition and multiplication is a field.

2. C with ordinary complex addition and multiplication is a field.

3. Ra[x] = the field of rational functions in the indeterminate x

       = { (α_0 + α_1 x + ··· + α_p x^p) / (β_0 + β_1 x + ··· + β_q x^q) : α_i, β_i ∈ R; p, q ∈ Z+ },

   where Z+ = {0, 1, 2, ...}, is a field.

4. R_r^{m×n} = {m × n matrices of rank r with real coefficients} is clearly not a field since, for example, (M1) does not hold unless m = n. Moreover, R^{n×n} is not a field either, since (M4) does not hold in general (although the other 8 axioms hold).

Definition 2.3. A vector space over a field F is a set V together with two operations + : V × V → V and · : F × V → V such that

(V1) (V, +) is an abelian group.

(V2) (α · β) · v = α · (β · v) for all α, β ∈ F and for all v ∈ V.

(V3) (α + β) · v = α · v + β · v for all α, β ∈ F and for all v ∈ V.

(V4) α · (v + w) = α · v + α · w for all α ∈ F and for all v, w ∈ V.

(V5) 1 · v = v for all v ∈ V (1 ∈ F).

A vector space is denoted by (V, F) or, when there is no possibility of confusion as to the underlying field, simply by V.

Remark 2.4. Note that + and · in Definition 2.3 are different from the + and · in Definition 2.1 in the sense of operating on different objects in different sets. In practice, this causes no confusion and the · operator is usually not even written explicitly.

Example 2.5.

1. (R^n, R) with addition defined by

       x + y = [x_1 + y_1; ...; x_n + y_n]

   and scalar multiplication defined by

       αx = [αx_1; ...; αx_n]

   is a vector space. Similar definitions hold for (C^n, C).
2. (R^{m×n}, R) is a vector space with addition defined by

       A + B = [a_{11} + b_{11}, ..., a_{1n} + b_{1n}; ...; a_{m1} + b_{m1}, ..., a_{mn} + b_{mn}]

   and scalar multiplication defined by

       γA = [γa_{11}, ..., γa_{1n}; ...; γa_{m1}, ..., γa_{mn}].

3. Let (V, F) be an arbitrary vector space and D be an arbitrary set. Let Φ(D, V) be the set of functions f mapping D to V. Then Φ(D, V) is a vector space with addition defined by

       (f + g)(d) = f(d) + g(d) for all d ∈ D and for all f, g ∈ Φ

   and scalar multiplication defined by

       (αf)(d) = αf(d) for all α ∈ F, for all d ∈ D, and for all f ∈ Φ.

   Special Cases:

   (a) D = [t_0, t_1], (V, F) = (R^n, R), and the functions are piecewise continuous =: (PC[t_0, t_1])^n or continuous =: (C[t_0, t_1])^n.

   (b) D = [t_0, +∞), (V, F) = (R^n, R), etc.

4. Let A ∈ R^{n×n}. Then {x(t) : ẋ(t) = Ax(t)} is a vector space (of dimension n).

2.2 Subspaces

Definition 2.6. Let (V, F) be a vector space and let W ⊆ V, W ≠ ∅. Then (W, F) is a subspace of (V, F) if and only if (W, F) is itself a vector space or, equivalently, if and only if (αw_1 + βw_2) ∈ W for all α, β ∈ F and for all w_1, w_2 ∈ W.

Remark 2.7. The latter characterization of a subspace is often the easiest way to check or prove that something is indeed a subspace (or vector space); i.e., verify that the set in question is closed under addition and scalar multiplication. Note, too, that since 0 ∈ F, this implies that the zero vector must be in any subspace.

Notation: When the underlying field is understood, we write W ⊆ V, and the symbol ⊆, when used with vector spaces, is henceforth understood to mean "is a subspace of." The less restrictive meaning "is a subset of" is specifically flagged as such.
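The closure criterion of Definition 2.6 is directly checkable by machine for particular sets. The sketch below (Python with NumPy, an assumption of this illustration) checks that symmetric matrices are closed under αA_1 + βA_2, while orthogonal matrices are not even closed under addition, so they fail the criterion.

```python
import numpy as np

rng = np.random.default_rng(2)

# Symmetric matrices: closed under a*A1 + b*A2 (Definition 2.6's test).
A1 = rng.standard_normal((3, 3)); A1 = A1 + A1.T   # symmetrize
A2 = rng.standard_normal((3, 3)); A2 = A2 + A2.T
a, b = 2.5, -1.3
W = a * A1 + b * A2
assert np.allclose(W, W.T)        # still symmetric: subspace criterion holds

# Orthogonal matrices: the sum of two orthogonal matrices need not be
# orthogonal, so the set is not a subspace.
Q = np.eye(2)                     # orthogonal
S = Q + Q                         # = 2I, and (2I)^T (2I) = 4I != I
assert not np.allclose(S.T @ S, np.eye(2))
```

One check with fixed α, β is of course only evidence, not proof; the symmetry of αA_1 + βA_2 for all scalars is the one-line proof asked for in the text.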
Example 2.8.

1. Consider (V, F) = (R^{n×n}, R) and let W = {A ∈ R^{n×n} : A is symmetric}. Then W ⊆ V.

   Proof: Suppose A_1, A_2 are symmetric. Then it is easily shown that αA_1 + βA_2 is symmetric for all α, β ∈ R.

2. Let W = {A ∈ R^{n×n} : A is orthogonal}. Then W is not a subspace of R^{n×n}.

3. Consider (V, F) = (R^2, R) and for each v ∈ R^2 of the form v = [v_1; v_2] identify v_1 with the x-coordinate in the plane and v_2 with the y-coordinate. For α, β ∈ R, define

       W_{α,β} = {v : v = [c; αc + β]; c ∈ R}.

   Then W_{α,β} is a subspace of V if and only if β = 0. As an interesting exercise, sketch W_{2,1}, W_{2,0}, W_{1/2,1}, and W_{1/2,0}. Note, too, that the vertical line through the origin (i.e., α = ∞) is also a subspace.

All lines through the origin are subspaces. Shifted subspaces W_{α,β} with β ≠ 0 are called linear varieties.

Henceforth, we drop the explicit dependence of a vector space on an underlying field. Thus, V usually denotes a vector space with the underlying field generally being R unless explicitly stated otherwise.

Definition 2.9. If R and S are vector spaces (or subspaces), then R = S if and only if R ⊆ S and S ⊆ R.

Note: To prove two vector spaces are equal, one usually proves the two inclusions separately: An arbitrary r ∈ R is shown to be an element of S, and then an arbitrary s ∈ S is shown to be an element of R.

2.3 Linear Independence

Let X = {v_1, v_2, ...} be a nonempty collection of vectors v_i in some vector space V.

Definition 2.10. X is a linearly dependent set of vectors if and only if there exist k distinct elements v_1, ..., v_k ∈ X and scalars α_1, ..., α_k not all zero such that

    α_1 v_1 + ··· + α_k v_k = 0.

X is a linearly independent set of vectors if and only if for any collection of k distinct elements v_1, ..., v_k of X and for any scalars α_1, ..., α_k,

    α_1 v_1 + ··· + α_k v_k = 0 implies α_1 = 0, ..., α_k = 0.
Example 2.11.

1. Let V = R^3. Then the first set of three vectors {v_1, v_2, v_3} displayed in the original is a linearly independent set. Why? The second displayed set, however, is a linearly dependent set (since 2v_1 − v_2 + v_3 = 0).

2. Let A ∈ R^{n×n} and B ∈ R^{n×m}. Then consider the rows of e^{tA} B as vectors in C^m[t_0, t_1] (recall that e^{tA} denotes the matrix exponential, which is discussed in more detail in Chapter 11). Independence of these vectors turns out to be equivalent to a concept called controllability, to be studied further in what follows.

Let v_i ∈ R^n, i ∈ {1, ..., k}, and consider the matrix V = [v_1, ..., v_k] ∈ R^{n×k}. The linear dependence of this set of vectors is equivalent to the existence of a nonzero vector a ∈ R^k such that Va = 0. An equivalent condition for linear dependence is that the k × k matrix V^T V is singular. If the set of vectors is independent, and there exists a ∈ R^k such that Va = 0, then a = 0. An equivalent condition for linear independence is that the matrix V^T V is nonsingular.

Definition 2.12. Let X = {v_1, v_2, ...} be a collection of vectors v_i ∈ V. Then the span of X is defined as

    Sp(X) = Sp{v_1, v_2, ...}
          = {v : v = α_1 v_1 + ··· + α_k v_k; α_i ∈ F, v_i ∈ X, k ∈ N},

where N = {1, 2, ...}.

Example 2.13. Let V = R^n and define

    e_1 = [1; 0; ...; 0], e_2 = [0; 1; 0; ...; 0], ..., e_n = [0; ...; 0; 1].

Then Sp{e_1, e_2, ..., e_n} = R^n.

Definition 2.14. A set of vectors X is a basis for V if and only if

1. X is a linearly independent set (of basis vectors), and

2. Sp(X) = V.
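The V^T V test for independence stated above is easy to run numerically. The sketch below uses Python with NumPy (an assumption of this illustration); the two matrices V are small examples chosen here, the second with an obviously dependent column.

```python
import numpy as np

# Columns of V_indep are linearly independent; the Gram matrix V^T V
# (k x k) is then nonsingular, per the text's equivalent condition.
V_indep = np.array([[1., 0.],
                    [0., 1.],
                    [1., 1.]])

# Columns of V_dep are dependent: second column = 2 * first column.
V_dep = np.array([[1., 2.],
                  [2., 4.],
                  [3., 6.]])

def independent(V):
    G = V.T @ V                       # Gram matrix
    return np.linalg.matrix_rank(G) == G.shape[0]   # nonsingular?

assert independent(V_indep)
assert not independent(V_dep)
```

In floating point one tests rank (or a smallest singular value) rather than an exact determinant, since a nearly dependent set gives a Gram matrix that is singular only up to roundoff.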
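Anticipating Example 2.17 below, the components of a vector v with respect to an ordered basis {b_1, b_2} are found by solving Bx = v with B = [b_1, b_2]. A sketch in Python with NumPy (an assumption of this illustration), using the two-vector basis of R^2 that Example 2.17 works with, as reconstructed here:

```python
import numpy as np

# Basis B = [b1, b2] with b1 = [-1, 2]^T, b2 = [1, -1]^T, and v = [1, 2]^T.
B = np.array([[-1.,  1.],
              [ 2., -1.]])
v = np.array([1., 2.])

# Components x of v with respect to this basis: solve B x = v.
x = np.linalg.solve(B, v)
print(x)                      # [3. 4.]

assert np.allclose(B @ x, v)  # v = 3*b1 + 4*b2
```

Using `solve` rather than forming B^{−1} explicitly is the standard numerical practice; the arithmetic is the same inversion written out in Example 2.17.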
Example 2.15. {e_1, ..., e_n} is a basis for R^n (sometimes called the natural basis).

Now let b_1, ..., b_n be a basis (with a specific order associated with the basis vectors) for V. Then for all v ∈ V there exists a unique n-tuple {ξ_1, ..., ξ_n} such that

    v = ξ_1 b_1 + ··· + ξ_n b_n = Bx,

where

    B = [b_1, ..., b_n],  x = [ξ_1; ...; ξ_n].

Definition 2.16. The scalars {ξ_i} are called the components (or sometimes the coordinates) of v with respect to the basis {b_1, ..., b_n} and are unique. We say that the vector x of components represents the vector v with respect to the basis B.

Example 2.17. In R^n,

    v = [v_1; ...; v_n] = v_1 e_1 + v_2 e_2 + ··· + v_n e_n.

We can also determine components of v with respect to another basis. For example, while

    [1; 2] = 1 · e_1 + 2 · e_2,

with respect to the basis

    {[−1; 2], [1; −1]}

we have

    [1; 2] = 3 · [−1; 2] + 4 · [1; −1].

To see this, write

    [1; 2] = x_1 · [−1; 2] + x_2 · [1; −1] = [−1, 1; 2, −1] [x_1; x_2].

Then

    [x_1; x_2] = [−1, 1; 2, −1]^{−1} [1; 2] = [3; 4].

Theorem 2.18. The number of elements in a basis of a vector space is independent of the particular basis considered.

Definition 2.19. If a basis X for a vector space V (≠ 0) has n elements, V is said to be n-dimensional or have dimension n, and we write dim(V) = n or dim V = n. For
consistency, and because the 0 vector is in any vector space, we define dim(0) = 0. A vector space V is finite-dimensional if there exists a basis X with n < +∞ elements; otherwise, V is infinite-dimensional.

Thus, Theorem 2.18 says that dim(V) = the number of elements in a basis.

Example 2.20.

1. dim(R^n) = n.

2. dim(R^{m×n}) = mn.

   Note: Check that a basis for R^{m×n} is given by the mn matrices E_{ij}, i ∈ {1, ..., m}, j ∈ {1, ..., n}, where E_{ij} is a matrix all of whose elements are 0 except for a 1 in the (i, j)th location. The collection of E_{ij} matrices can be called the "natural basis matrices."

3. dim(C[t_0, t_1]) = +∞.

4. dim{A ∈ R^{n×n} : A = A^T} = n(n + 1)/2. (To see why, determine n(n + 1)/2 symmetric basis matrices.)

5. dim{A ∈ R^{n×n} : A is upper (lower) triangular} = n(n + 1)/2.

2.4 Sums and Intersections of Subspaces

Definition 2.21. Let (V, F) be a vector space and let R, S ⊆ V. The sum and intersection of R and S are defined respectively by:

1. R + S = {r + s : r ∈ R, s ∈ S}.

2. R ∩ S = {v : v ∈ R and v ∈ S}.

Theorem 2.22.

1. R + S ⊆ V (in general, R_1 + ··· + R_k =: sum_{i=1}^k R_i ⊆ V, for finite k).

2. R ∩ S ⊆ V (in general, ∩_{α∈A} R_α ⊆ V for an arbitrary index set A).

Remark 2.23. The union of two subspaces, R ∪ S, is not necessarily a subspace.

Definition 2.24. T = R ⊕ S is the direct sum of R and S if

1. R ∩ S = 0, and

2. R + S = T (in general, R_i ∩ (sum_{j≠i} R_j) = 0 and sum_i R_i = T).

The subspaces R and S are said to be complements of each other in T.
consistency, and because the 0 vector is in any vector space, we define dim(0) = 0. A vector space V is finite-dimensional if there exists a basis X with n < +∞ elements; otherwise, V is infinite-dimensional.
Thus, Theorem 2.18 says that dim (V) = the number of elements in a basis.
Example 2.20.
1. dim(R^n) = n.

2. dim(R^{m×n}) = mn.

Note: Check that a basis for R^{m×n} is given by the mn matrices E_ij, i ∈ {1, ..., m}, j ∈ {1, ..., n}, where E_ij is a matrix all of whose elements are 0 except for a 1 in the (i, j)th location. The collection of E_ij matrices can be called the "natural basis matrices."

3. dim(C[t_0, t_1]) = +∞.

4. dim{A ∈ R^{n×n} : A = A^T} = ½n(n + 1). (To see why, determine ½n(n + 1) symmetric basis matrices.)

5. dim{A ∈ R^{n×n} : A is upper (lower) triangular} = ½n(n + 1).
2.4 Sums and Intersections of Subspaces

Definition 2.21. Let (V, F) be a vector space and let R, S ⊆ V. The sum and intersection of R and S are defined respectively by:
1. R + S = {r + s : r ∈ R, s ∈ S}.

2. R ∩ S = {v : v ∈ R and v ∈ S}.
Theorem 2.22.

1. R + S ⊆ V (in general, R_1 + ... + R_k =: Σ_{i=1}^{k} R_i ⊆ V, for finite k).

2. R ∩ S ⊆ V (in general, ∩_{α∈A} R_α ⊆ V for an arbitrary index set A).
Remark 2.23. The union of two subspaces, R ∪ S, is not necessarily a subspace.

Definition 2.24. T = R ⊕ S is the direct sum of R and S if

1. R ∩ S = 0, and

2. R + S = T (in general, R_i ∩ (Σ_{j≠i} R_j) = 0 and Σ_i R_i = T).

The subspaces R and S are said to be complements of each other in T.
14 Chapter 2. Vector Spaces
Remark 2.25. The complement of R (or S) is not unique. For example, consider V = R^2 and let R be any line through the origin. Then any other distinct line through the origin is a complement of R. Among all the complements there is a unique one orthogonal to R. We discuss more about orthogonal complements elsewhere in the text.
Theorem 2.26. Suppose T = R ⊕ S. Then

1. every t ∈ T can be written uniquely in the form t = r + s with r ∈ R and s ∈ S.

2. dim(T) = dim(R) + dim(S).
Proof: To prove the first part, suppose an arbitrary vector t ∈ T can be written in two ways as t = r_1 + s_1 = r_2 + s_2, where r_1, r_2 ∈ R and s_1, s_2 ∈ S. Then r_1 − r_2 = s_2 − s_1. But r_1 − r_2 ∈ R and s_2 − s_1 ∈ S. Since R ∩ S = 0, we must have r_1 = r_2 and s_1 = s_2, from which uniqueness follows.

The statement of the second part is a special case of the next theorem. □
Theorem 2.27. For arbitrary subspaces R, S of a vector space V,
dim(R + S) = dim(R) + dim(S) - dim(R n S).
Example 2.28. Let U be the subspace of upper triangular matrices in R^{n×n} and let L be the subspace of lower triangular matrices in R^{n×n}. Then it may be checked that U + L = R^{n×n}, while U ∩ L is the set of diagonal matrices in R^{n×n}. Using the fact that dim{diagonal matrices} = n, together with Examples 2.20.2 and 2.20.5, one can easily verify the validity of the formula given in Theorem 2.27.
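Theorem 2.27 can also be spot-checked numerically for this example by vectorizing natural-basis matrices for the two subspaces and computing ranks (a NumPy sketch, illustrative only, with n = 3 assumed):

```python
import numpy as np

n = 3

def tri_basis(n, upper=True):
    # Vectorized natural basis matrices E_ij of the upper (j >= i)
    # or lower (j <= i) triangle of an n x n matrix.
    vecs = []
    for i in range(n):
        for j in range(n):
            if (j >= i) if upper else (j <= i):
                E = np.zeros((n, n))
                E[i, j] = 1.0
                vecs.append(E.flatten())
    return np.array(vecs)

U = tri_basis(n, upper=True)    # spans the upper triangular matrices
L = tri_basis(n, upper=False)   # spans the lower triangular matrices

dim_U = np.linalg.matrix_rank(U)                    # n(n+1)/2 = 6
dim_L = np.linalg.matrix_rank(L)                    # n(n+1)/2 = 6
dim_sum = np.linalg.matrix_rank(np.vstack([U, L]))  # dim(U + L) = n^2 = 9

# Theorem 2.27: dim(U ∩ L) = dim U + dim L - dim(U + L) = n (the diagonals)
print(dim_U + dim_L - dim_sum)  # -> 3
```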
Example 2.29. Let (V, F) = (R^{n×n}, R), let R be the set of skew-symmetric matrices in R^{n×n}, and let S be the set of symmetric matrices in R^{n×n}. Then V = R ⊕ S.

Proof: This follows easily from the fact that any A ∈ R^{n×n} can be written in the form

A = ½(A + A^T) + ½(A − A^T).

The first matrix on the right-hand side above is in S while the second is in R.
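The decomposition is immediate to verify numerically (NumPy sketch, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # any square matrix

S_part = 0.5 * (A + A.T)   # symmetric part, lies in S
R_part = 0.5 * (A - A.T)   # skew-symmetric part, lies in R

print(np.allclose(S_part, S_part.T))    # -> True
print(np.allclose(R_part, -R_part.T))   # -> True
print(np.allclose(S_part + R_part, A))  # -> True
```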
EXERCISES
1. Suppose {v_1, ..., v_k} is a linearly dependent set. Then show that one of the vectors must be a linear combination of the others.

2. Let x_1, x_2, ..., x_k ∈ R^n be nonzero mutually orthogonal vectors. Show that {x_1, ..., x_k} must be a linearly independent set.

3. Let v_1, ..., v_n be orthonormal vectors in R^n. Show that Av_1, ..., Av_n are also orthonormal if and only if A ∈ R^{n×n} is orthogonal.

4. Consider the vectors v_1 = [2 1]^T and v_2 = [3 1]^T. Prove that v_1 and v_2 form a basis for R^2. Find the components of the vector v = [4 1]^T with respect to this basis.
Exercises 15
5. Let P denote the set of polynomials of degree less than or equal to two of the form p_0 + p_1 x + p_2 x^2, where p_0, p_1, p_2 ∈ R. Show that P is a vector space over R. Show that the polynomials 1, x, and 2x^2 − 1 are a basis for P. Find the components of the polynomial 2 + 3x + 4x^2 with respect to this basis.
6. Prove Theorem 2.22 (for the case of two subspaces R and S only).
7. Let P_n denote the vector space of polynomials of degree less than or equal to n, and of the form p(x) = p_0 + p_1 x + ... + p_n x^n, where the coefficients p_i are all real. Let P_E denote the subspace of all even polynomials in P_n, i.e., those that satisfy the property p(−x) = p(x). Similarly, let P_O denote the subspace of all odd polynomials, i.e., those satisfying p(−x) = −p(x). Show that P_n = P_E ⊕ P_O.
8. Repeat Example 2.28 using instead the two subspaces T of tridiagonal matrices and U of upper triangular matrices.
Chapter 3
Linear Transformations
3.1 Definition and Examples

We begin with the basic definition of a linear transformation (or linear map, linear function, or linear operator) between two vector spaces.
Definition 3.1. Let (V, F) and (W, F) be vector spaces. Then L : V → W is a linear transformation if and only if

L(αv_1 + βv_2) = αLv_1 + βLv_2 for all α, β ∈ F and for all v_1, v_2 ∈ V.

The vector space V is called the domain of the transformation L while W, the space into which it maps, is called the co-domain.
Example 3.2.
1. Let F = R and take V = W = PC[t_0, +∞). Define L : PC[t_0, +∞) → PC[t_0, +∞) by

v(t) ↦ w(t) = (Lv)(t) = ∫_{t_0}^{t} e^{−(t−τ)} v(τ) dτ.

2. Let F = R and take V = W = R^{m×n}. Fix M ∈ R^{m×m}. Define L : R^{m×n} → R^{m×n} by

X ↦ Y = LX = MX.

3. Let F = R and take V = P^n = {p(x) = a_0 + a_1 x + ... + a_n x^n : a_i ∈ R} and W = P^{n−1}. Define L : V → W by Lp = p′, where ′ denotes differentiation with respect to x.
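The linearity of the differentiation map in item 3, for instance, can be spot-checked on coefficient vectors with NumPy's polynomial utilities (an illustrative sketch, not from the text):

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Two elements of P^3, stored as coefficient vectors [a_0, a_1, a_2, a_3].
p = np.array([1.0, 2.0, 0.0, 3.0])   # 1 + 2x + 3x^3
q = np.array([0.0, 1.0, 4.0, 0.0])   # x + 4x^2
alpha, beta = 2.0, -1.0

# Compare L(alpha*p + beta*q) with alpha*L(p) + beta*L(q), where L p = p'.
lhs = P.polyder(alpha * p + beta * q)
rhs = alpha * P.polyder(p) + beta * P.polyder(q)
print(np.allclose(lhs, rhs))  # -> True
```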
18 Chapter 3. Linear Transformations
3.2 Matrix Representation of Linear Transformations

Linear transformations between vector spaces with specific bases can be represented conveniently in matrix form. Specifically, suppose L : (V, F) → (W, F) is linear and further suppose that {v_i, i = 1, ..., n} and {w_j, j = 1, ..., m} are bases for V and W, respectively. Then the ith column of A = Mat L (the matrix representation of L with respect to the given bases for V and W) is the representation of Lv_i with respect to {w_j, j = 1, ..., m}. In other words,
A = [ a_11 ... a_1n ; ... ; a_m1 ... a_mn ] ∈ R^{m×n}

represents L since

Lv_i = a_1i w_1 + ... + a_mi w_m = W a_i,

where W = [w_1, ..., w_m] and a_i = [a_1i, ..., a_mi]^T is the ith column of A. Note that A = Mat L depends on the particular bases for V and W. This could be reflected by subscripts, say, in the notation, but this is usually not done.
The action of L on an arbitrary vector v ∈ V is uniquely determined (by linearity) by its action on a basis. Thus, if v = ξ_1 v_1 + ... + ξ_n v_n = V x (where v, and hence x, is arbitrary), then

L V x = Lv = ξ_1 Lv_1 + ... + ξ_n Lv_n = ξ_1 W a_1 + ... + ξ_n W a_n = W A x.
Thus, LV = WA since x was arbitrary. When V = R^n, W = R^m and {v_i, i = 1, ..., n}, {w_j, j = 1, ..., m} are the usual (natural) bases, the equation LV = WA becomes simply L = A. We thus commonly identify A as a linear transformation with its matrix representation, i.e., x ↦ y = Ax. Thinking of A both as a matrix and as a linear transformation from R^n to R^m usually causes no confusion. Change of basis then corresponds naturally to appropriate matrix multiplication.
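As a concrete instance of Mat L, take the differentiation operator of Example 3.2.3 with V = P^3 (basis 1, x, x^2, x^3) and W = P^2 (basis 1, x, x^2). The sketch below (NumPy, illustrative only) builds A column by column from the components of Lv_i:

```python
import numpy as np

# Columns are the components of L(1) = 0, L(x) = 1, L(x^2) = 2x, L(x^3) = 3x^2
# with respect to the basis {1, x, x^2} of the co-domain.
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 0.0],
              [0.0, 0.0, 0.0, 3.0]])

p = np.array([5.0, -2.0, 7.0, 1.0])   # p(x) = 5 - 2x + 7x^2 + x^3
dp = A @ p                            # components of p'(x) = -2 + 14x + 3x^2
print(dp)  # -> [-2. 14.  3.]
```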
3.3. Composition of Transformations 19
3.3 Composition of Transformations

Consider three vector spaces U, V, and W and transformations B from U to V and A from V to W. Then we can define a new transformation C as follows:
U --B--> V --A--> W, with C = AB : U → W.

The above diagram illustrates the composition of transformations C = AB. Note that in most texts, the arrows above are reversed as follows:

W <--A-- V <--B-- U, with C = AB : U → W.

However, it might be useful to prefer the former since the transformations A and B appear in the same order in both the diagram and the equation. If dim U = p, dim V = n, and dim W = m, and if we associate matrices with the transformations in the usual way, then composition of transformations corresponds to standard matrix multiplication. That is, we have C = A B, where C is m × p, A is m × n, and B is n × p. The above is sometimes expressed componentwise by the formula

c_ij = Σ_{k=1}^{n} a_ik b_kj.

Two Special Cases:
Inner Product: Let x, y ∈ R^n. Then their inner product is the scalar

x^T y = Σ_{i=1}^{n} x_i y_i.

Outer Product: Let x ∈ R^m, y ∈ R^n. Then their outer product is the m × n matrix x y^T.

Note that any rank-one matrix A ∈ R^{m×n} can be written in the form A = x y^T above (or x y^H if A ∈ C^{m×n}). A rank-one symmetric matrix can be written in the form x x^T (or x x^H).
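Both products are one-liners in NumPy (illustrative sketch, not from the text):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

inner = x @ y            # scalar x^T y = 4 + 10 + 18
outer = np.outer(x, y)   # 3 x 3 matrix x y^T

print(inner)                         # -> 32.0
print(np.linalg.matrix_rank(outer))  # -> 1  (outer products have rank one)
```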
20 Chapter 3. Linear Transformations
3.4 Structure of Linear Transformations

Let A : V → W be a linear transformation.
Definition 3.3. The range of A, denoted R(A), is the set {w ∈ W : w = Av for some v ∈ V}. Equivalently, R(A) = {Av : v ∈ V}. The range of A is also known as the image of A and denoted Im(A).

The nullspace of A, denoted N(A), is the set {v ∈ V : Av = 0}. The nullspace of A is also known as the kernel of A and denoted Ker(A).
Theorem 3.4. Let A : V → W be a linear transformation. Then

1. R(A) ⊆ W.

2. N(A) ⊆ V.
Note that N(A) and R(A) are, in general, subspaces of different spaces.
Theorem 3.5. Let A ∈ R^{m×n}. If A is written in terms of its columns as A = [a_1, ..., a_n], then

R(A) = Sp{a_1, ..., a_n}.
Proof: The proof of this theorem is easy, essentially following immediately from the definition. □
Remark 3.6. Note that in Theorem 3.5 and throughout the text, the same symbol (A) is used to denote both a linear transformation and its matrix representation with respect to the usual (natural) bases. See also the last paragraph of Section 3.2.
Definition 3.7. Let {v_1, ..., v_k} be a set of nonzero vectors v_i ∈ R^n. The set is said to be orthogonal if v_i^T v_j = 0 for i ≠ j and orthonormal if v_i^T v_j = δ_ij, where δ_ij is the Kronecker delta defined by

δ_ij = 1 if i = j, and 0 if i ≠ j.
Example 3.8.
1. {[1, 1]^T, [1, −1]^T} is an orthogonal set.

2. {[1/√2, 1/√2]^T, [1/√2, −1/√2]^T} is an orthonormal set.

3. If {v_1, ..., v_k} with v_i ∈ R^n is an orthogonal set, then {v_1/√(v_1^T v_1), ..., v_k/√(v_k^T v_k)} is an orthonormal set.
3.4. Structure of Linear Transformations 21
Definition 3.9. Let S ⊆ R^n. Then the orthogonal complement of S is defined as the set

S⊥ = {v ∈ R^n : v^T s = 0 for all s ∈ S}.
Example 3.10. Let S = Sp{[3, 5, 7]^T, [−4, 1, 1]^T}. Then it can be shown that

S⊥ = Sp{[−2, −31, 23]^T}.
Working from the definition, the computation involved is simply to find all nontrivial (i.e., nonzero) solutions of the system of equations

3x_1 + 5x_2 + 7x_3 = 0,
−4x_1 + x_2 + x_3 = 0.
Note that there is nothing special about the two vectors in the basis defining S being orthogonal. Any set of vectors will do, including dependent spanning vectors (which would, of course, then give rise to redundant equations).
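The computation in Example 3.10 amounts to finding the null space of the 2 × 3 matrix whose rows are the spanning vectors of S; an SVD does this numerically (NumPy sketch, illustrative only):

```python
import numpy as np

# Rows are the spanning vectors of S from Example 3.10.
M = np.array([[ 3.0, 5.0, 7.0],
              [-4.0, 1.0, 1.0]])

# The right singular vectors for zero singular values span the null space.
_, _, Vt = np.linalg.svd(M)
v = Vt[-1]                 # basis vector for the 1-dimensional S-perp

print(np.allclose(M @ v, 0.0))                             # -> True
print(np.allclose(np.cross(v, [-2.0, -31.0, 23.0]), 0.0))  # -> True (parallel)
```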
Theorem 3.11. Let R, S ⊆ R^n. Then

2. S ⊕ S⊥ = R^n.

3. (S⊥)⊥ = S.

4. R ⊆ S if and only if S⊥ ⊆ R⊥.

5. (R + S)⊥ = R⊥ ∩ S⊥.

6. (R ∩ S)⊥ = R⊥ + S⊥.
Proof: We prove and discuss only item 2 here. The proofs of the other results are left as exercises. Let {v_1, ..., v_k} be an orthonormal basis for S and let x ∈ R^n be an arbitrary vector. Set

x_1 = Σ_{i=1}^{k} (x^T v_i) v_i,
x_2 = x − x_1.
22 Chapter 3. Linear Transformations
Then x_1 ∈ S and, since

x_2^T v_j = x^T v_j − x_1^T v_j = x^T v_j − x^T v_j = 0,
we see that x_2 is orthogonal to v_1, ..., v_k and hence to any linear combination of these vectors. In other words, x_2 is orthogonal to any vector in S. We have thus shown that S + S⊥ = R^n. We also have that S ∩ S⊥ = 0 since the only vector s ∈ S orthogonal to everything in S (i.e., including itself) is 0.
It is also easy to see directly that, when we have such direct sum decompositions, we can write vectors in a unique way with respect to the corresponding subspaces. Suppose, for example, that x = x_1 + x_2 = x'_1 + x'_2, where x_1, x'_1 ∈ S and x_2, x'_2 ∈ S⊥. Then (x'_1 − x_1)^T (x'_2 − x_2) = 0 by definition of S⊥. But then (x'_1 − x_1)^T (x'_1 − x_1) = 0 since x'_2 − x_2 = −(x'_1 − x_1) (which follows by rearranging the equation x_1 + x_2 = x'_1 + x'_2). Thus, x_1 = x'_1 and x_2 = x'_2. □
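The construction in the proof of item 2 is exactly orthogonal projection. A NumPy sketch (illustrative only) with a random 2-dimensional subspace S of R^5:

```python
import numpy as np

rng = np.random.default_rng(1)
# Orthonormal basis {v_1, v_2} for a random 2-dimensional S, via QR.
V, _ = np.linalg.qr(rng.standard_normal((5, 2)))

x = rng.standard_normal(5)
x1 = V @ (V.T @ x)   # x1 = sum_i (x^T v_i) v_i, lies in S
x2 = x - x1          # by construction, orthogonal to every v_i

print(np.allclose(V.T @ x2, 0.0))  # -> True  (x2 lies in S-perp)
print(np.allclose(x1 + x2, x))     # -> True  (so S + S-perp = R^n)
```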
Theorem 3.12. Let A : R^n → R^m. Then

1. N(A)⊥ = R(A^T). (Note: This holds only for finite-dimensional vector spaces.)

2. R(A)⊥ = N(A^T). (Note: This also holds for infinite-dimensional vector spaces.)
Proof: To prove the first part, take an arbitrary x ∈ N(A). Then Ax = 0 and this is equivalent to y^T Ax = 0 for all y. But y^T Ax = (A^T y)^T x. Thus, Ax = 0 if and only if x is orthogonal to all vectors of the form A^T y, i.e., x ∈ R(A^T)⊥. Since x was arbitrary, we have established that N(A)⊥ = R(A^T).

The proof of the second part is similar and is left as an exercise. □
Definition 3.13. Let A : R^n → R^m. Then {v ∈ R^n : Av = 0} is sometimes called the right nullspace of A. Similarly, {w ∈ R^m : w^T A = 0} is called the left nullspace of A. Clearly, the right nullspace is N(A) while the left nullspace is N(A^T).
Theorem 3.12 and part 2 of Theorem 3.11 can be combined to give two very fundamental and useful decompositions of vectors in the domain and co-domain of a linear transformation A. See also Theorem 2.26.
Theorem 3.14 (Decomposition Theorem). Let A : R^n → R^m. Then

1. every vector v in the domain space R^n can be written in a unique way as v = x + y, where x ∈ N(A) and y ∈ N(A)⊥ = R(A^T) (i.e., R^n = N(A) ⊕ R(A^T)).

2. every vector w in the co-domain space R^m can be written in a unique way as w = x + y, where x ∈ R(A) and y ∈ R(A)⊥ = N(A^T) (i.e., R^m = R(A) ⊕ N(A^T)).
This key theorem becomes very easy to remember by carefully studying and understanding Figure 3.1 in the next section.
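Theorem 3.12, and hence the decompositions above, are easy to confirm numerically: null-space vectors of A are orthogonal to every vector of the form A^T w (NumPy sketch, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))   # rank 4 (almost surely), so nullity 2

_, _, Vt = np.linalg.svd(A)
N = Vt[4:].T                      # orthonormal basis for N(A), shape (6, 2)

w = rng.standard_normal(4)
y = A.T @ w                       # an arbitrary element of R(A^T)

print(np.allclose(N.T @ y, 0.0))  # -> True: R(A^T) is orthogonal to N(A)
```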
3.5 Four Fundamental Subspaces

Consider a general matrix A ∈ R_r^{m×n}. When thought of as a linear transformation from R^n to R^m, many properties of A can be developed in terms of the four fundamental subspaces
3.5. Four Fundamental Subspaces 23
[Figure: A maps N(A)⊥ (dimension r) one-to-one onto R(A) (dimension r); N(A) (dimension n − r) is mapped to {0}, and R(A)⊥ (dimension m − r) completes the co-domain.]

Figure 3.1. Four fundamental subspaces.
R(A), R(A)⊥, N(A), and N(A)⊥. Figure 3.1 makes many key properties seem almost obvious and we return to this figure frequently both in the context of linear transformations and in illustrating concepts such as controllability and observability.
Definition 3.15. Let V and W be vector spaces and let A : V → W be a linear transformation.

1. A is onto (also called epic or surjective) if R(A) = W.

2. A is one-to-one or 1-1 (also called monic or injective) if N(A) = 0. Two equivalent characterizations of A being 1-1 that are often easier to verify in practice are the following:

(a) Av_1 = Av_2 ⟹ v_1 = v_2.

(b) v_1 ≠ v_2 ⟹ Av_1 ≠ Av_2.
Definition 3.16. Let A : R^n → R^m. Then rank(A) = dim R(A). This is sometimes called the column rank of A (maximum number of independent columns). The row rank of A is
24 Chapter 3. Linear Transformations
dim R(A^T) (maximum number of independent rows). The dual notion to rank is the nullity of A, sometimes denoted nullity(A) or corank(A), and is defined as dim N(A).
Theorem 3.17. Let A : R^n → R^m. Then dim R(A) = dim N(A)⊥. (Note: Since N(A)⊥ = R(A^T), this theorem is sometimes colloquially stated "row rank of A = column rank of A.")
Proof: Define a linear transformation T : N(A)⊥ → R(A) by

Tv = Av for all v ∈ N(A)⊥.
Clearly T is 1-1 (since N(T) = 0). To see that T is also onto, take any w ∈ R(A). Then by definition there is a vector x ∈ R^n such that Ax = w. Write x = x_1 + x_2, where x_1 ∈ N(A)⊥ and x_2 ∈ N(A). Then Ax_1 = w = Tx_1 since x_1 ∈ N(A)⊥. The last equality shows that T is onto. We thus have that dim R(A) = dim N(A)⊥ since it is easily shown that if {v_1, ..., v_r} is a basis for N(A)⊥, then {Tv_1, ..., Tv_r} is a basis for R(A). Finally, if we apply this and several previous results, the following string of equalities follows easily: "column rank of A" = rank(A) = dim R(A) = dim N(A)⊥ = dim R(A^T) = rank(A^T) = "row rank of A." □
The following corollary is immediate. Like the theorem, it is a statement about equality of dimensions; the subspaces themselves are not necessarily in the same vector space.
Corollary 3.18. Let A : R^n → R^m. Then dim N(A) + dim R(A) = n, where n is the dimension of the domain of A.
Proof: From Theorems 3.11 and 3.17 we see immediately that

n = dim N(A) + dim N(A)⊥
  = dim N(A) + dim R(A). □
For completeness, we include here a few miscellaneous results about ranks of sums and products of matrices.
Theorem 3.19. Let A, B ∈ R^{n×n}. Then

1. 0 ≤ rank(A + B) ≤ rank(A) + rank(B).

2. rank(A) + rank(B) − n ≤ rank(AB) ≤ min{rank(A), rank(B)}.

3. nullity(B) ≤ nullity(AB) ≤ nullity(A) + nullity(B).

4. if B is nonsingular, rank(AB) = rank(BA) = rank(A) and N(BA) = N(A).
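These rank facts can be spot-checked on random matrices (NumPy sketch, illustrative only; a random B is nonsingular with probability one):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))
A[:, -1] = A[:, 0]                # make A singular: rank(A) = n - 1
B = rng.standard_normal((n, n))   # nonsingular with probability one

rank = np.linalg.matrix_rank
assert rank(A + B) <= rank(A) + rank(B)           # part 1
assert rank(A) + rank(B) - n <= rank(A @ B)       # part 2, lower bound
assert rank(A @ B) <= min(rank(A), rank(B))       # part 2, upper bound
assert rank(A @ B) == rank(B @ A) == rank(A)      # part 4
print(rank(A))  # -> 4
```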
Part 4 of Theorem 3.19 suggests looking at the general problem of the four fundamental subspaces of matrix products. The basic results are contained in the following easily proved theorem.
3.5. Four Fundamental Subspaces 25
Theorem 3.20. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{n×p}. Then

1. R(AB) ⊆ R(A).

2. N(AB) ⊇ N(B).

3. R((AB)^T) ⊆ R(B^T).

4. N((AB)^T) ⊇ N(A^T).
The next theorem is closely related to Theorem 3.20 and is also easily proved. It is extremely useful in text that follows, especially when dealing with pseudoinverses and linear least squares problems.
Theorem 3.21. Let A ∈ ℝ^{m×n}. Then

1. R(A) = R(AA^T).

2. R(A^T) = R(A^T A).

3. N(A) = N(A^T A).

4. N(A^T) = N(AA^T).
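On a concrete example these equalities can be checked at the level of dimensions: since R(AA^T) ⊆ R(A) always holds (Theorem 3.20), equal ranks force equality of the subspaces. The sketch below (plain Python; the helpers are ours, not from the text) does this for a rank-1 matrix and also confirms N(A) = N(A^T A) on a null vector.

```python
from fractions import Fraction

def mul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def T(X):
    return [list(col) for col in zip(*X)]

def rank(M):
    """Rank via exact Gauss elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(A[0])):
        piv = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

A = [[1, 2], [2, 4], [3, 6]]             # rank 1: second column = 2 * first
# Parts 1 and 2: equal ranks plus the inclusions of Theorem 3.20 give equality.
assert rank(A) == rank(mul(A, T(A)))     # R(A) = R(A A^T)
assert rank(T(A)) == rank(mul(T(A), A))  # R(A^T) = R(A^T A)
# Part 3: x = [2, -1]^T lies in N(A), hence also in N(A^T A).
x = [[2], [-1]]
assert mul(A, x) == [[0], [0], [0]]
assert mul(mul(T(A), A), x) == [[0], [0]]
```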
We now characterize 1-1 and onto transformations and provide characterizations in terms of rank and invertibility.
Theorem 3.22. Let A : ℝ^n → ℝ^m. Then

1. A is onto if and only if rank(A) = m (A has linearly independent rows or is said to have full row rank; equivalently, AA^T is nonsingular).

2. A is 1-1 if and only if rank(A) = n (A has linearly independent columns or is said to have full column rank; equivalently, A^T A is nonsingular).
Proof: Proof of part 1: If A is onto, dim R(A) = m = rank(A). Conversely, let y ∈ ℝ^m be arbitrary. Let x = A^T (AA^T)^{-1} y ∈ ℝ^n. Then y = Ax, i.e., y ∈ R(A), so A is onto.
Proof of part 2: If A is 1-1, then N(A) = 0, which implies that dim N(A)^⊥ = n = dim R(A^T), and hence dim R(A) = n by Theorem 3.17. Conversely, suppose Ax1 = Ax2. Then A^T A x1 = A^T A x2, which implies x1 = x2 since A^T A is invertible. Thus, A is 1-1. □
Definition 3.23. A : V → W is invertible (or bijective) if and only if it is 1-1 and onto. Note that if A is invertible, then dim V = dim W. Also, A : ℝ^n → ℝ^n is invertible or nonsingular if and only if rank(A) = n.
Note that in the special case when A ∈ ℝ_n^{n×n}, the transformations A, A^T, and A^{-1} are all 1-1 and onto between the two spaces N(A)^⊥ and R(A). The transformations A^T and A^{-1} have the same domain and range but are in general different maps unless A is orthogonal. Similar remarks apply to A and A^{-T}.
26 Chapter 3. Linear Transformations
If a linear transformation is not invertible, it may still be right or left invertible. Definitions of these concepts are followed by a theorem characterizing left and right invertible transformations.
Definition 3.24. Let A : V → W. Then

1. A is said to be right invertible if there exists a right inverse transformation A^{-R} : W → V such that AA^{-R} = I_W, where I_W denotes the identity transformation on W.

2. A is said to be left invertible if there exists a left inverse transformation A^{-L} : W → V such that A^{-L} A = I_V, where I_V denotes the identity transformation on V.
Theorem 3.25. Let A : V → W. Then

1. A is right invertible if and only if it is onto.

2. A is left invertible if and only if it is 1-1.

Moreover, A is invertible if and only if it is both right and left invertible, i.e., both 1-1 and onto, in which case A^{-1} = A^{-R} = A^{-L}.
Note: From Theorem 3.22 we see that if A : ℝ^n → ℝ^m is onto, then a right inverse is given by A^{-R} = A^T (AA^T)^{-1}. Similarly, if A is 1-1, then a left inverse is given by A^{-L} = (A^T A)^{-1} A^T.
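For the small 1×2 and 2×1 examples used in this section, these formulas can be evaluated by hand; the sketch below (plain Python with exact fractions; the variable names are ours) computes both and confirms the defining identities, along with the non-uniqueness of one-sided inverses.

```python
from fractions import Fraction

# A = [1 2] is onto; A A^T = 5 (1x1), so A^{-R} = A^T (A A^T)^{-1} = [1/5; 2/5].
a = [Fraction(1), Fraction(2)]
aat = sum(x * x for x in a)
a_right = [x / aat for x in a]                        # column vector [1/5; 2/5]
assert sum(x * y for x, y in zip(a, a_right)) == 1    # A A^{-R} = I_1

# B = [1; 2] is 1-1; B^T B = 5, so B^{-L} = (B^T B)^{-1} B^T = [1/5 2/5].
b = [Fraction(1), Fraction(2)]
btb = sum(x * x for x in b)
b_left = [x / btb for x in b]                         # row vector [1/5 2/5]
assert sum(x * y for x, y in zip(b_left, b)) == 1     # B^{-L} B = I_1

# One-sided inverses are not unique: [-1; 1] and [3 -1] also work (Example 3.27).
assert 1 * (-1) + 2 * 1 == 1                          # [1 2] [-1; 1] = 1
assert 3 * 1 + (-1) * 2 == 1                          # [3 -1] [1; 2] = 1
```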
Theorem 3.26. Let A : V → V.

1. If there exists a unique right inverse A^{-R} such that AA^{-R} = I, then A is invertible.

2. If there exists a unique left inverse A^{-L} such that A^{-L} A = I, then A is invertible.
Proof: We prove the first part and leave the proof of the second to the reader. Notice the following:
    A(A^{-R} + A^{-R} A - I) = AA^{-R} + AA^{-R} A - A
                             = I + IA - A     since AA^{-R} = I
                             = I.
Thus, (A^{-R} + A^{-R} A - I) must be a right inverse and, therefore, by uniqueness it must be the case that A^{-R} + A^{-R} A - I = A^{-R}. But this implies that A^{-R} A = I, i.e., that A^{-R} is a left inverse. It then follows from Theorem 3.25 that A is invertible. □
Example 3.27.
1. Let A = [1 2] : ℝ² → ℝ¹. Then A is onto. (Proof: Take any α ∈ ℝ¹; then one can always find v ∈ ℝ² such that [1 2][v1; v2] = α.) Obviously A has full row rank (= 1) and A^{-R} = [-1; 1] is a right inverse. Also, it is clear that there are infinitely many right inverses for A. In Chapter 6 we characterize all right inverses of a matrix by characterizing all solutions of the linear matrix equation AR = I.
2. Let A = [1; 2] : ℝ¹ → ℝ². Then A is 1-1. (Proof: The only solution to 0 = Av = [1; 2]v is v = 0, whence N(A) = 0, so A is 1-1.) It is now obvious that A has full column rank (= 1) and A^{-L} = [3 -1] is a left inverse. Again, it is clear that there are infinitely many left inverses for A. In Chapter 6 we characterize all left inverses of a matrix by characterizing all solutions of the linear matrix equation LA = I.
3. The matrix

    A = [1 1
         2 1
         3 1]

(transcription incomplete), when considered as a linear transformation on ℝ³, is neither 1-1 nor onto. We give below bases for its four fundamental subspaces. [The bases did not survive transcription.]
EXERCISES
1. Let A = [· · ·; · · ·] (entries illegible in the transcript) and consider A as a linear transformation mapping ℝ³ to ℝ². Find the matrix representation of A with respect to the given bases of ℝ³ and ℝ² (the bases did not survive transcription).
2. Consider the vector space ℝ^{n×n} over ℝ, let S denote the subspace of symmetric matrices, and let R denote the subspace of skew-symmetric matrices. For matrices X, Y ∈ ℝ^{n×n} define their inner product by ⟨X, Y⟩ = Tr(X^T Y). Show that, with respect to this inner product, R = S^⊥.
3. Consider the differentiation operator L defined in Example 3.2.3. Is L 1-1? Is L onto?
4. Prove Theorem 3.4.
28 Chapter 3. Linear Transformations
5. Prove Theorem 3.11.4.
6. Prove Theorem 3.12.2.
7. Determine bases for the four fundamental subspaces of the matrix

    A = [· · · ·; 2 5 5 3]   (first row illegible in the transcript).
8. Suppose A ∈ ℝ^{m×n} has a left inverse. Show that A^T has a right inverse.
9. Let A = [· ·; · ·] (entries illegible in the transcript). Determine N(A) and R(A). Are they equal? Is this true in general? If this is true in general, prove it; if not, provide a counterexample.
10. Suppose A ∈ ℝ^{9×48}. How many linearly independent solutions can be found to the homogeneous linear system Ax = 0?
11. Modify Figure 3.1 to illustrate the four fundamental subspaces associated with A^T ∈ ℝ^{n×m} thought of as a transformation from ℝ^m to ℝ^n.
Chapter 4
Introduction to the Moore-Penrose Pseudoinverse
In this chapter we give a brief introduction to the Moore-Penrose pseudoinverse, a generalization of the inverse of a matrix. The Moore-Penrose pseudoinverse is defined for any matrix and, as is shown in the following text, brings great notational and conceptual clarity to the study of solutions to arbitrary systems of linear equations and linear least squares problems.
4.1 Definitions and Characterizations

Consider a linear transformation A : X → Y, where X and Y are arbitrary finite-dimensional vector spaces. Define a transformation T : N(A)^⊥ → R(A) by
    Tx = Ax  for all x ∈ N(A)^⊥.
Then, as noted in the proof of Theorem 3.17, T is bijective (1-1 and onto), and hence we can define a unique inverse transformation T^{-1} : R(A) → N(A)^⊥. This transformation can be used to give our first definition of A^+, the Moore-Penrose pseudoinverse of A. Unfortunately, the definition neither provides nor suggests a good computational strategy for determining A^+.
Definition 4.1. With A and T as defined above, define a transformation A^+ : Y → X by

    A^+ y = T^{-1} y1,

where y = y1 + y2 with y1 ∈ R(A) and y2 ∈ R(A)^⊥. Then A^+ is the Moore-Penrose pseudoinverse of A.
Although X and Y were arbitrary vector spaces above, let us henceforth consider the case X = ℝ^n and Y = ℝ^m. We have thus defined A^+ for all A ∈ ℝ_r^{m×n}. A purely algebraic characterization of A^+ is given in the next theorem, which was proved by Penrose in 1955; see [22].
30 Chapter 4. Introduction to the Moore-Penrose Pseudoinverse
Theorem 4.2. Let A ∈ ℝ_r^{m×n}. Then G = A^+ if and only if

(P1) AGA = A.

(P2) GAG = G.

(P3) (AG)^T = AG.

(P4) (GA)^T = GA.
Furthermore, A^+ always exists and is unique.
Note that the inverse of a nonsingular matrix satisfies all four Penrose properties. Also, a right or left inverse satisfies no fewer than three of the four properties. Unfortunately, as with Definition 4.1, neither the statement of Theorem 4.2 nor its proof suggests a computational algorithm. However, the Penrose properties do offer the great virtue of providing a checkable criterion in the following sense. Given a matrix G that is a candidate for being the pseudoinverse of A, one need simply verify the four Penrose conditions (P1)-(P4). If G satisfies all four, then by uniqueness, it must be A^+. Such a verification is often relatively straightforward.
Example 4.3. Consider A = [1; 2]. Verify directly that A^+ = [1/5 2/5] satisfies (P1)-(P4). Note that other left inverses (for example, A^{-L} = [3 -1]) satisfy properties (P1), (P2), and (P4) but not (P3).
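The checkable criterion is easy to mechanize. The sketch below (plain Python with exact fractions; the `penrose` helper is ours, not from the text) confirms the claims of Example 4.3.

```python
from fractions import Fraction

def mul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def T(X):
    return [list(col) for col in zip(*X)]

def penrose(A, G):
    """Return a tuple telling which of (P1)-(P4) the candidate G satisfies."""
    AG, GA = mul(A, G), mul(G, A)
    return (mul(AG, A) == A,   # (P1) AGA = A
            mul(GA, G) == G,   # (P2) GAG = G
            T(AG) == AG,       # (P3) AG symmetric
            T(GA) == GA)       # (P4) GA symmetric

A = [[Fraction(1)], [Fraction(2)]]
G = [[Fraction(1, 5), Fraction(2, 5)]]   # the claimed A+
L = [[Fraction(3), Fraction(-1)]]        # a left inverse that is not A+

assert penrose(A, G) == (True, True, True, True)    # so G = A+ by uniqueness
assert penrose(A, L) == (True, True, False, True)   # only (P3) fails
```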
Still another characterization of A^+ is given in the following theorem, whose proof can be found in [1, p. 19]. While not generally suitable for computer implementation, this characterization can be useful for hand calculation of small examples.
Theorem 4.4. Let A ∈ ℝ_r^{m×n}. Then

    A^+ = lim_{δ→0} (A^T A + δ² I)^{-1} A^T                         (4.1)
        = lim_{δ→0} A^T (AA^T + δ² I)^{-1}.                         (4.2)

4.2 Examples
Each of the following can be derived or verified by using the above definitions or characterizations.
Example 4.5. A^+ = A^T (AA^T)^{-1} if A is onto (independent rows) (A is right invertible).
Example 4.6. A^+ = (A^T A)^{-1} A^T if A is 1-1 (independent columns) (A is left invertible).
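For A = [1; 2] (full column rank), Example 4.6 and Theorem 4.4 can both be exercised numerically; in the sketch below (plain Python; the names are ours) the regularized formula (4.1) approaches A^+ = [1/5 2/5] as δ shrinks.

```python
# A = [1; 2]: A^T A = 5 (a scalar here), so A+ = (A^T A)^{-1} A^T = [1/5 2/5].
ata = 1 * 1 + 2 * 2
a_plus = [1 / ata, 2 / ata]                    # [0.2, 0.4]

# Theorem 4.4: (A^T A + d^2 I)^{-1} A^T -> A+ as d -> 0.
def regularized(d):
    return [1 / (ata + d * d), 2 / (ata + d * d)]

errs = [max(abs(x - y) for x, y in zip(regularized(d), a_plus))
        for d in (1.0, 1e-2, 1e-4)]
assert errs[0] > errs[1] > errs[2]             # the error shrinks with d
assert errs[2] < 1e-9
```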
Example 4.7. For any scalar a,

    a^+ = 1/a  if a ≠ 0,
    a^+ = 0    if a = 0.
Example 4.8. For any vector v ∈ ℝ^n,

    v^+ = v^T / (v^T v)  if v ≠ 0,
    v^+ = 0^T            if v = 0.

Example 4.9. [2 × 2 example; the matrices did not survive transcription.]

Example 4.10. [1 1; 1 1]^+ = [1/4 1/4; 1/4 1/4].

4.3 Properties and Applications

This section presents some miscellaneous useful results on pseudoinverses. Many of these are used in the text that follows.
Theorem 4.11. Let A ∈ ℝ^{m×n} and suppose U ∈ ℝ^{m×m}, V ∈ ℝ^{n×n} are orthogonal (M is orthogonal if M^T = M^{-1}). Then

    (UAV)^+ = V^T A^+ U^T.

Proof: For the proof, simply verify that the expression above does indeed satisfy each of the four Penrose conditions. □
Theorem 4.12. Let S ∈ ℝ^{n×n} be symmetric with U^T S U = D, where U is orthogonal and D is diagonal. Then S^+ = U D^+ U^T, where D^+ is again a diagonal matrix whose diagonal elements are determined according to Example 4.7.
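Theorem 4.12 gives a by-hand route to the pseudoinverse of a symmetric matrix. The sketch below (plain Python; the names are ours) applies it to S = [1 1; 1 1], whose eigendecomposition is known explicitly, and recovers the answer of Example 4.10.

```python
import math

def mul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def T(X):
    return [list(col) for col in zip(*X)]

s = 1 / math.sqrt(2)
S = [[1, 1], [1, 1]]
U = [[s, s], [s, -s]]            # orthonormal eigenvectors of S
D = [[2, 0], [0, 0]]             # eigenvalues 2 and 0: U^T S U = D
D_plus = [[0.5, 0], [0, 0]]      # invert the nonzero diagonal entries (Example 4.7)

# Check the diagonalization, then form S+ = U D+ U^T.
UTSU = mul(T(U), mul(S, U))
S_plus = mul(U, mul(D_plus, T(U)))
for i in range(2):
    for j in range(2):
        assert abs(UTSU[i][j] - D[i][j]) < 1e-12
        assert abs(S_plus[i][j] - 0.25) < 1e-12   # matches Example 4.10
```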
Theorem 4.13. For all A ∈ ℝ^{m×n},

1. A^+ = (A^T A)^+ A^T = A^T (AA^T)^+.

2. (A^T)^+ = (A^+)^T.
Proof: Both results can be proved using the limit characterization of Theorem 4.4. The proof of the first result is not particularly easy and does not even have the virtue of being especially illuminating. The interested reader can consult the proof in [1, p. 27]. The proof of the second result (which can also be proved easily by verifying the four Penrose conditions) is as follows:
    (A^T)^+ = lim_{δ→0} (AA^T + δ² I)^{-1} A
            = lim_{δ→0} [A^T (AA^T + δ² I)^{-1}]^T
            = [lim_{δ→0} A^T (AA^T + δ² I)^{-1}]^T
            = (A^+)^T.  □
32 Chapter 4. Introduction to the Moore-Penrose Pseudoinverse
Note that by combining Theorems 4.12 and 4.13 we can, in theory at least, compute the Moore-Penrose pseudoinverse of any matrix (since AA^T and A^T A are symmetric). This turns out to be a poor approach in finite-precision arithmetic, however (see, e.g., [7], [11], [23]), and better methods are suggested in the text that follows.
Theorem 4.11 is suggestive of a "reverse-order" property for pseudoinverses of products of matrices such as exists for inverses of products. Unfortunately, in general,

    (AB)^+ ≠ B^+ A^+.

As an example, consider A = [0 1] and B = [1; 1]. Then

    (AB)^+ = 1^+ = 1,

while

    B^+ A^+ = [1/2 1/2] [0; 1] = 1/2.

However, necessary and sufficient conditions under which the reverse-order property does hold are known and we quote a couple of moderately useful results for reference.
Theorem 4.14. (AB)^+ = B^+ A^+ if and only if

1. R(BB^T A^T) ⊆ R(A^T)

and

2. R(A^T AB) ⊆ R(B).
Proof: For the proof, see [9]. □
Theorem 4.15. (AB)^+ = B1^+ A1^+, where B1 = A^+ AB and A1 = A B1 B1^+.

Proof: For the proof, see [5]. □
Theorem 4.16. If A ∈ ℝ_r^{n×r}, B ∈ ℝ_r^{r×m}, then (AB)^+ = B^+ A^+.
Proof: Since A ∈ ℝ_r^{n×r}, then A^+ = (A^T A)^{-1} A^T, whence A^+ A = I_r. Similarly, since B ∈ ℝ_r^{r×m}, we have B^+ = B^T (BB^T)^{-1}, whence BB^+ = I_r. The result then follows by taking B1 = B, A1 = A in Theorem 4.15. □
The following theorem gives some additional useful properties of pseudoinverses.
Theorem 4.17. For all A ∈ ℝ^{m×n},

1. (A^+)^+ = A.

2. (A^T A)^+ = A^+ (A^T)^+, (AA^T)^+ = (A^T)^+ A^+.

3. R(A^+) = R(A^T) = R(A^+ A) = R(A^T A).

4. N(A^+) = N(AA^+) = N((AA^T)^+) = N(AA^T) = N(A^T).

5. If A is normal, then A^k A^+ = A^+ A^k and (A^k)^+ = (A^+)^k for all integers k > 0.
Note: Recall that A ∈ ℝ^{n×n} is normal if AA^T = A^T A. For example, if A is symmetric, skew-symmetric, or orthogonal, then it is normal. However, a matrix can be none of the preceding but still be normal, such as

    A = [a b; -b a]

for scalars a, b ∈ ℝ.

The next theorem is fundamental to facilitating a compact and unifying approach to studying the existence of solutions of (matrix) linear equations and linear least squares problems.
Theorem 4.18. Suppose A ∈ ℝ^{n×p}, B ∈ ℝ^{n×m}. Then R(B) ⊆ R(A) if and only if AA^+ B = B.
Proof: Suppose R(B) ⊆ R(A) and take arbitrary x ∈ ℝ^m. Then Bx ∈ R(B) ⊆ R(A), so there exists a vector y ∈ ℝ^p such that Ay = Bx. Then we have

    Bx = Ay = AA^+ Ay = AA^+ Bx,

where one of the Penrose properties is used above. Since x was arbitrary, we have shown that B = AA^+ B.
To prove the converse, assume that AA^+ B = B and take arbitrary y ∈ R(B). Then there exists a vector x ∈ ℝ^m such that Bx = y, whereupon

    y = Bx = AA^+ Bx ∈ R(A).  □
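Theorem 4.18 turns a geometric question (is R(B) contained in R(A)?) into a finite computation. A small sketch (plain Python with exact fractions; the matrices chosen here are our own illustration, not from the text):

```python
from fractions import Fraction

def mul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

F = Fraction
A = [[F(1)], [F(2)]]                     # R(A) = span{[1; 2]}
A_plus = [[F(1, 5), F(2, 5)]]
P = mul(A, A_plus)                       # A A+ projects onto R(A)

B_in = [[F(2)], [F(4)]]                  # [2; 4] = 2 [1; 2] lies in R(A)
B_out = [[F(1)], [F(0)]]                 # [1; 0] does not
assert mul(P, B_in) == B_in              # A A+ B = B, so R(B) is inside R(A)
assert mul(P, B_out) != B_out            # the test correctly rejects B_out
```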
EXERCISES
1. Use Theorem 4.4 to compute the pseudoinverse of [· ·; · ·] (entries illegible in the transcript).

2. If x, y ∈ ℝ^n, show that (xy^T)^+ = (x^T x)^+ (y^T y)^+ y x^T.
3. For A ∈ ℝ^{m×n}, prove that R(A) = R(AA^T) using only definitions and elementary properties of the Moore-Penrose pseudoinverse.
4. For A ∈ ℝ^{m×n}, prove that R(A^+) = R(A^T).
5. For A ∈ ℝ^{p×n} and B ∈ ℝ^{m×n}, show that N(A) ⊆ N(B) if and only if BA^+ A = B.
6. Let A ∈ ℝ^{n×n}, B ∈ ℝ^{n×m}, and D ∈ ℝ^{m×m} and suppose further that D is nonsingular.
(a) Prove or disprove that
[ ~ AB r = [ A+ -A+ABD- i ]. D 0 D- i
(b) Prove or disprove that
[ ~ B r =[ A+ -A+BD- 1
l D 0 D- i
Chapter 5
Introduction to the Singular Value Decomposition
In this chapter we give a brief introduction to the singular value decomposition (SVD). We show that every matrix has an SVD and describe some useful properties and applications of this important matrix factorization. The SVD plays a key conceptual and computational role throughout (numerical) linear algebra and its applications.
5.1 The Fundamental Theorem

Theorem 5.1. Let A ∈ ℝ_r^{m×n}. Then there exist orthogonal matrices U ∈ ℝ^{m×m} and V ∈ ℝ^{n×n} such that
    A = U Σ V^T,                                                    (5.1)

where Σ = [S 0; 0 0], S = diag(σ_1, ..., σ_r) ∈ ℝ^{r×r}, and σ_1 ≥ ··· ≥ σ_r > 0. More specifically, we have

    A = [U1 U2] [S 0; 0 0] [V1^T; V2^T]                             (5.2)
      = U1 S V1^T.                                                  (5.3)
The submatrix sizes are all determined by r (which must be ≤ min{m, n}), i.e., U1 ∈ ℝ^{m×r}, U2 ∈ ℝ^{m×(m-r)}, V1 ∈ ℝ^{n×r}, V2 ∈ ℝ^{n×(n-r)}, and the 0-subblocks in Σ are compatibly dimensioned.
Proof: Since A^T A ≥ 0 (A^T A is symmetric and nonnegative definite; recall, for example, [24, Ch. 6]), its eigenvalues are all real and nonnegative. (Note: The rest of the proof follows analogously if we start with the observation that AA^T ≥ 0 and the details are left to the reader as an exercise.) Denote the set of eigenvalues of A^T A by {σ_i², i = 1, ..., n} with σ_1 ≥ ··· ≥ σ_r > 0 = σ_{r+1} = ··· = σ_n. Let {v_i, i = 1, ..., n} be a set of corresponding orthonormal eigenvectors and let V1 = [v_1, ..., v_r], V2 = [v_{r+1}, ..., v_n]. Letting S = diag(σ_1, ..., σ_r), we can write A^T A V1 = V1 S². Premultiplying by V1^T gives V1^T A^T A V1 = V1^T V1 S² = S², the latter equality following from the orthonormality of the v_i vectors. Pre- and postmultiplying by S^{-1} gives the equation

    S^{-1} V1^T A^T A V1 S^{-1} = I.                                (5.4)
36 Chapter 5. Introduction to the Singular Value Decomposition
Turning now to the eigenvalue equations corresponding to the eigenvalues σ_{r+1}, ..., σ_n, we have that A^T A V2 = V2 · 0 = 0, whence V2^T A^T A V2 = 0. Thus, AV2 = 0. Now define the matrix U1 ∈ ℝ^{m×r} by U1 = A V1 S^{-1}. Then from (5.4) we see that U1^T U1 = I; i.e., the columns of U1 are orthonormal. Choose any matrix U2 ∈ ℝ^{m×(m-r)} such that [U1 U2] is orthogonal. Then
    U^T A V = [U1^T A V1  U1^T A V2]
              [U2^T A V1  U2^T A V2]

            = [U1^T A V1  0]
              [U2^T A V1  0]

since AV2 = 0. Referring to the equation U1 = A V1 S^{-1} defining U1, we see that U1^T A V1 = S and U2^T A V1 = U2^T U1 S = 0. The latter equality follows from the orthogonality of the columns of U1 and U2. Thus, we see that, in fact,

    U^T A V = [S 0; 0 0],

and defining this matrix to be Σ completes the proof. □
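The construction in the proof can be carried out explicitly for a small matrix. A sketch (plain Python; the helper names are ours, not from the text) for A = [1 1; 1 1], where A^T A = [2 2; 2 2] has eigenvalues 4 and 0, so r = 1 and σ_1 = 2:

```python
import math

def mul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def T(X):
    return [list(col) for col in zip(*X)]

s = 1 / math.sqrt(2)
A = [[1, 1], [1, 1]]
V = [[s, s], [s, -s]]        # columns: v1 (eigenvalue 4), v2 (eigenvalue 0)
U = [[s, s], [s, -s]]        # u1 = A v1 / sigma_1; u2 completes an orthogonal basis
Sigma = [[2, 0], [0, 0]]

UTAV = mul(T(U), mul(A, V))
for i in range(2):
    for j in range(2):
        assert abs(UTAV[i][j] - Sigma[i][j]) < 1e-12   # U^T A V = Sigma
```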
Definition 5.2. Let A = U Σ V^T be an SVD of A as in Theorem 5.1.
1. The set {σ_1, ..., σ_r} is called the set of (nonzero) singular values of the matrix A and is denoted Σ(A). From the proof of Theorem 5.1 we see that σ_i(A) = λ_i^{1/2}(A^T A) = λ_i^{1/2}(AA^T). Note that there are also min{m, n} - r zero singular values.
2. The columns of U are called the left singular vectors of A (and are the orthonormal eigenvectors of AA^T).
3. The columns of V are called the right singular vectors of A (and are the orthonormal eigenvectors of A^T A).
Remark 5.3. The analogous complex case in which A ∈ ℂ_r^{m×n} is quite straightforward. The decomposition is A = U Σ V^H, where U and V are unitary and the proof is essentially identical, except for Hermitian transposes replacing transposes.
Remark 5.4. Note that U and V can be interpreted as changes of basis in both the domain and co-domain spaces with respect to which A then has a diagonal matrix representation. Specifically, let C denote A thought of as a linear transformation mapping ℝ^n to ℝ^m. Then rewriting A = U Σ V^T as AV = U Σ we see that Mat C is Σ with respect to the bases {v_1, ..., v_n} for ℝ^n and {u_1, ..., u_m} for ℝ^m (see the discussion in Section 3.2). See also Remark 5.16.
Remark 5.5. The singular value decomposition is not unique. For example, an examination of the proof of Theorem 5.1 reveals that

• any orthonormal basis for $\mathcal{N}(A)$ can be used for $V_2$.

• there may be nonuniqueness associated with the columns of $V_1$ (and hence $U_1$) corresponding to multiple $\sigma_i$'s.
5.1. The Fundamental Theorem 37
• any $U_2$ can be used so long as $[U_1\ U_2]$ is orthogonal.

• columns of U and V can be changed (in tandem) by sign (or multiplier of the form $e^{j\theta}$ in the complex case).

What is unique, however, is the matrix $\Sigma$ and the span of the columns of $U_1$, $U_2$, $V_1$, and $V_2$ (see Theorem 5.11). Note, too, that a "full SVD" (5.2) can always be constructed from a "compact SVD" (5.3).
Remark 5.6. Computing an SVD by working directly with the eigenproblem for $A^TA$ or $AA^T$ is numerically poor in finite-precision arithmetic. Better algorithms exist that work directly on A via a sequence of orthogonal transformations; see, e.g., [7], [11], [25].
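Remark 5.6 can be illustrated numerically. The following NumPy sketch (an editorial illustration, not from the text) builds a matrix with known singular values and compares the direct SVD against square roots of the eigenvalues of $A^TA$; forming $A^TA$ squares the condition number, so small singular values are lost.

```python
import numpy as np

# Construct A = U diag(sigma) V^T with known, widely spread singular values.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((6, 6)))
V, _ = np.linalg.qr(rng.standard_normal((6, 6)))
sigma = np.array([1.0, 1e-2, 1e-4, 1e-6, 1e-8, 1e-10])
A = U @ np.diag(sigma) @ V.T

# Good: work directly on A via orthogonal transformations (what LAPACK does).
s_direct = np.linalg.svd(A, compute_uv=False)

# Poor: eigenvalues of A^T A, then square roots.
eigs = np.linalg.eigvalsh(A.T @ A)[::-1]
s_via_ata = np.sqrt(np.maximum(eigs, 0.0))

# Relative error in the smallest singular value.
err_direct = abs(s_direct[-1] - sigma[-1]) / sigma[-1]
err_ata = abs(s_via_ata[-1] - sigma[-1]) / sigma[-1]
print(err_direct, err_ata)  # the A^T A route is far less accurate
```

Singular values below roughly the square root of machine epsilon relative to $\sigma_1$ are irrecoverable from $A^TA$.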
Example 5.7.
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = UIU^T,$$
where U is an arbitrary 2 × 2 orthogonal matrix, is an SVD.
Example 5.8.
$$A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},$$
where $\theta$ is arbitrary, is an SVD.

Example 5.9.
$$A = \begin{bmatrix} 1 & 1 \\ 2 & 2 \\ 2 & 2 \end{bmatrix} = \begin{bmatrix} \frac{1}{3} & -\frac{2\sqrt{5}}{5} & \frac{2\sqrt{5}}{15} \\ \frac{2}{3} & \frac{\sqrt{5}}{5} & \frac{4\sqrt{5}}{15} \\ \frac{2}{3} & 0 & -\frac{\sqrt{5}}{3} \end{bmatrix}\begin{bmatrix} 3\sqrt{2} & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}$$
is an SVD.
Example 5.10. Let $A \in \mathbb{R}^{n \times n}$ be symmetric and positive definite. Let V be an orthogonal matrix of eigenvectors that diagonalizes A, i.e., $V^TAV = \Lambda > 0$. Then $A = V\Lambda V^T$ is an SVD of A.
A factorization $U\Sigma V^T$ of an m × n matrix A qualifies as an SVD if U and V are orthogonal and $\Sigma$ is an m × n "diagonal" matrix whose diagonal elements in the upper left corner are positive (and ordered). For example, if $A = U\Sigma V^T$ is an SVD of A, then $V\Sigma^TU^T$ is an SVD of $A^T$.
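The observation above about $A^T$ is easy to check numerically. A short NumPy sketch (an editorial illustration, not part of the text):

```python
import numpy as np

# If A = U Sigma V^T is a full SVD of A, then V Sigma^T U^T is an SVD of A^T.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A)   # full SVD: U is 5x5, Vt is 3x3
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)

# V Sigma^T U^T reassembles A^T (up to roundoff); its middle factor Sigma^T
# is 3x5 "diagonal" with the same ordered positive entries.
At = Vt.T @ Sigma.T @ U.T
print(np.allclose(At, A.T))
```

Note that the singular values of $A^T$ coincide with those of A, so the ordering requirement in the definition is automatically satisfied.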
38 Chapter 5. Introduction to the Singular Value Decomposition
5.2 Some Basic Properties

Theorem 5.11. Let $A \in \mathbb{R}^{m \times n}$ have a singular value decomposition $A = U\Sigma V^T$. Using the notation of Theorem 5.1, the following properties hold:
1. rank(A) = r = the number of nonzero singular values of A.
2. Let $U = [u_1, \ldots, u_m]$ and $V = [v_1, \ldots, v_n]$. Then A has the dyadic (or outer product) expansion
$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^T. \tag{5.5}$$

3. The singular vectors satisfy the relations
$$Av_i = \sigma_i u_i, \tag{5.6}$$
$$A^Tu_i = \sigma_i v_i \tag{5.7}$$
for $i \in \underline{r}$.
4. Let $U_1 = [u_1, \ldots, u_r]$, $U_2 = [u_{r+1}, \ldots, u_m]$, $V_1 = [v_1, \ldots, v_r]$, and $V_2 = [v_{r+1}, \ldots, v_n]$. Then

(a) $\mathcal{R}(U_1) = \mathcal{R}(A) = \mathcal{N}(A^T)^{\perp}$.

(b) $\mathcal{R}(U_2) = \mathcal{R}(A)^{\perp} = \mathcal{N}(A^T)$.

(c) $\mathcal{R}(V_1) = \mathcal{N}(A)^{\perp} = \mathcal{R}(A^T)$.

(d) $\mathcal{R}(V_2) = \mathcal{N}(A) = \mathcal{R}(A^T)^{\perp}$.
Remark 5.12. Part 4 of the above theorem provides a numerically superior method for finding (orthonormal) bases for the four fundamental subspaces compared to methods based on, for example, reduction to row or column echelon form. Note that each subspace requires knowledge of the rank r. The relationship to the four fundamental subspaces is summarized nicely in Figure 5.1.
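Part 4 of Theorem 5.11 translates directly into a few lines of NumPy. The sketch below (an editorial illustration; the rank threshold is an assumption for floating point) partitions U and V at the numerical rank and checks the subspace identities:

```python
import numpy as np

# A 5x4 matrix of rank 2, built as a product of thin factors.
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10 * s[0]))        # numerical rank
U1, U2 = U[:, :r], U[:, r:]              # R(U1) = R(A), R(U2) = N(A^T)
V1, V2 = Vt[:r, :].T, Vt[r:, :].T        # R(V1) = R(A^T), R(V2) = N(A)

print(r)                                  # 2
print(np.allclose(A @ V2, 0))             # columns of V2 span the nullspace
print(np.allclose(A.T @ U2, 0))           # columns of U2 span the left nullspace
print(np.allclose(U1 @ U1.T @ A, A))      # projecting onto R(U1) leaves A fixed
```

Each of U1, U2, V1, V2 has orthonormal columns by construction, which is exactly why this beats echelon-form methods numerically.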
Remark 5.13. The elegance of the dyadic decomposition (5.5) as a sum of outer products and the key vector relations (5.6) and (5.7) explain why it is conventional to write the SVD as $A = U\Sigma V^T$ rather than, say, $A = U\Sigma V$.
Theorem 5.14. Let $A \in \mathbb{R}^{m \times n}$ have a singular value decomposition $A = U\Sigma V^T$ as in Theorem 5.1. Then
$$A^+ = V\Sigma^+U^T, \tag{5.8}$$
where
$$\Sigma^+ = \begin{bmatrix} S^{-1} & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{n \times m}$$
5.2. Some Basic Properties 39
Figure 5.1. SVD and the four fundamental subspaces.
with the 0-subblocks appropriately sized. Furthermore, if we let the columns of U and V be as defined in Theorem 5.11, then
$$A^+ = V_1S^{-1}U_1^T \tag{5.9}$$
$$= \sum_{i=1}^{r} \frac{1}{\sigma_i}v_iu_i^T. \tag{5.10}$$
Proof: The proof follows easily by verifying the four Penrose conditions. □
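Theorem 5.14 is the standard way pseudoinverses are computed in practice. The NumPy sketch below (an editorial illustration; the rank threshold is an assumption) assembles $A^+ = V\Sigma^+U^T$ by hand and verifies the Penrose conditions mentioned in the proof:

```python
import numpy as np

# A 4x5 matrix of rank 3.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 5))

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10 * s[0]))
Sigma_plus = np.zeros((5, 4))             # n x m, transposed shape
Sigma_plus[:r, :r] = np.diag(1.0 / s[:r])

A_plus = Vt.T @ Sigma_plus @ U.T          # A+ = V Sigma+ U^T

# The four Penrose conditions:
print(np.allclose(A @ A_plus @ A, A))
print(np.allclose(A_plus @ A @ A_plus, A_plus))
print(np.allclose((A @ A_plus).T, A @ A_plus))
print(np.allclose((A_plus @ A).T, A_plus @ A))
```

NumPy's `np.linalg.pinv` computes the same object internally via the SVD.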
Remark 5.15. Note that none of the expressions above quite qualifies as an SVD of $A^+$ if we insist that the singular values be ordered from largest to smallest. However, a simple reordering accomplishes the task:
$$A^+ = \sum_{i=1}^{r} \frac{1}{\sigma_{r+1-i}}\,v_{r+1-i}u_{r+1-i}^T. \tag{5.11}$$
This can also be written in matrix terms by using the so-called reverse-order identity matrix (or exchange matrix) $P = [e_r, e_{r-1}, \ldots, e_2, e_1]$, which is clearly orthogonal and symmetric.
40 Chapter 5. Introduction to the Singular Value Decomposition
Then
$$A^+ = (V_1P)(PS^{-1}P)(PU_1^T)$$
is the matrix version of (5.11). A "full SVD" can be similarly constructed.
Remark 5.16. Recall the linear transformation T used in the proof of Theorem 3.17 and in Definition 4.1. Since T is determined by its action on a basis, and since $\{v_1, \ldots, v_r\}$ is a basis for $\mathcal{N}(A)^{\perp}$, then T can be defined by $Tv_i = \sigma_iu_i$, $i \in \underline{r}$. Similarly, since $\{u_1, \ldots, u_r\}$ is a basis for $\mathcal{R}(A)$, then $T^{-1}$ can be defined by $T^{-1}u_i = \frac{1}{\sigma_i}v_i$, $i \in \underline{r}$. From Section 3.2, the matrix representation for T with respect to the bases $\{v_1, \ldots, v_r\}$ and $\{u_1, \ldots, u_r\}$ is clearly S, while the matrix representation for the inverse linear transformation $T^{-1}$ with respect to the same bases is $S^{-1}$.
5.3 Row and Column Compressions
Row compression
Let $A \in \mathbb{R}^{m \times n}$ have an SVD given by (5.1). Then
$$U^TA = \Sigma V^T = \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} = \begin{bmatrix} SV_1^T \\ 0 \end{bmatrix} \in \mathbb{R}^{m \times n}.$$
Notice that $\mathcal{N}(A) = \mathcal{N}(U^TA) = \mathcal{N}(SV_1^T)$ and the matrix $SV_1^T \in \mathbb{R}^{r \times n}$ has full row rank. In other words, premultiplication of A by $U^T$ is an orthogonal transformation that "compresses" A by row transformations. Such a row compression can also be accomplished by orthogonal row transformations performed directly on A to reduce it to the form $\begin{bmatrix} R \\ 0 \end{bmatrix}$, where R is upper triangular. Both compressions are analogous to the so-called row-reduced echelon form which, when derived by a Gaussian elimination algorithm implemented in finite-precision arithmetic, is not generally as reliable a procedure.
Column compression
Again, let $A \in \mathbb{R}^{m \times n}$ have an SVD given by (5.1). Then
$$AV = U\Sigma = [U_1\ U_2]\begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix} = [U_1S \quad 0] \in \mathbb{R}^{m \times n}.$$
This time, notice that $\mathcal{R}(A) = \mathcal{R}(AV) = \mathcal{R}(U_1S)$ and the matrix $U_1S \in \mathbb{R}^{m \times r}$ has full column rank. In other words, postmultiplication of A by V is an orthogonal transformation that "compresses" A by column transformations. Such a compression is analogous to the
Exercises 41
so-called column-reduced echelon form, which is not generally a reliable procedure when performed by Gauss transformations in finite-precision arithmetic. For details, see, for example, [7], [11], [23], [25].
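Both compressions can be seen numerically. The NumPy sketch below (an editorial illustration, not from the text) checks that $U^TA$ zeroes the last m − r rows and $AV$ zeroes the last n − r columns:

```python
import numpy as np

# A 5x4 matrix of known rank 2.
rng = np.random.default_rng(4)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

U, s, Vt = np.linalg.svd(A)
r = 2

rowc = U.T @ A        # row compression: equals [S V1^T; 0]
colc = A @ Vt.T       # column compression: equals [U1 S, 0]

print(np.allclose(rowc[r:, :], 0))   # last m - r rows vanish
print(np.allclose(colc[:, r:], 0))   # last n - r columns vanish
```

The nonzero block of `rowc` has full row rank and that of `colc` full column rank, as the text asserts.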
EXERCISES
1. Let $X \in \mathbb{R}^{m \times n}$. If $X^TX = 0$, show that X = 0.
2. Prove Theorem 5.1 starting from the observation that $AA^T \geq 0$.
3. Let $A \in \mathbb{R}^{n \times n}$ be symmetric but indefinite. Determine an SVD of A.
4. Let $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$ be nonzero vectors. Determine an SVD of the matrix $A \in \mathbb{R}^{m \times n}$ defined by $A = xy^T$.
5. Determine SVDs of the matrices
(a) $\begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix}$

(b) [matrix illegible in transcript]

6. Let $A \in \mathbb{R}^{m \times n}$ and suppose $W \in \mathbb{R}^{m \times m}$ and $Y \in \mathbb{R}^{n \times n}$ are orthogonal.
(a) Show that A and WAY have the same singular values (and hence the same rank).
(b) Suppose that W and Y are nonsingular but not necessarily orthogonal. Do A and WAY have the same singular values? Do they have the same rank?
7. Let $A \in \mathbb{R}_n^{n \times n}$. Use the SVD to determine a polar factorization of A, i.e., A = QP where Q is orthogonal and $P = P^T > 0$. Note: this is analogous to the polar form $z = re^{i\theta}$ of a complex scalar z (where $i = j = \sqrt{-1}$).
Chapter 6
Linear Equations
In this chapter we examine existence and uniqueness of solutions of systems of linear equations. General linear systems of the form
$$AX = B;\quad A \in \mathbb{R}^{m \times n},\ B \in \mathbb{R}^{m \times k},\ X \in \mathbb{R}^{n \times k} \tag{6.1}$$
are studied and include, as a special case, the familiar vector system
$$Ax = b;\quad A \in \mathbb{R}^{n \times n},\ b \in \mathbb{R}^n. \tag{6.2}$$
6.1 Vector Linear Equations

We begin with a review of some of the principal results associated with vector linear systems.
Theorem 6.1. Consider the system of linear equations
$$Ax = b;\quad A \in \mathbb{R}^{m \times n},\ b \in \mathbb{R}^m. \tag{6.3}$$
1. There exists a solution to (6.3) if and only if $b \in \mathcal{R}(A)$.

2. There exists a solution to (6.3) for all $b \in \mathbb{R}^m$ if and only if $\mathcal{R}(A) = \mathbb{R}^m$, i.e., A is onto; equivalently, there exists a solution if and only if rank([A, b]) = rank(A), and this is possible only if $m \leq n$ (since $m = \dim \mathcal{R}(A) = \mathrm{rank}(A) \leq \min\{m, n\}$).
3. A solution to (6.3) is unique if and only if $\mathcal{N}(A) = 0$, i.e., A is 1-1.

4. There exists a unique solution to (6.3) for all $b \in \mathbb{R}^m$ if and only if A is nonsingular; equivalently, $A \in \mathbb{R}^{m \times m}$ and A has neither a 0 singular value nor a 0 eigenvalue.

5. There exists at most one solution to (6.3) for all $b \in \mathbb{R}^m$ if and only if the columns of A are linearly independent, i.e., $\mathcal{N}(A) = 0$, and this is possible only if $m \geq n$.
6. There exists a nontrivial solution to the homogeneous system Ax = 0 if and only if rank(A) < n.
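Parts 1 and 2 of Theorem 6.1 are easy to exercise numerically via the rank test rank([A, b]) = rank(A). A NumPy sketch (an editorial illustration; rank tests on floating-point data assume a well-scaled problem):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])          # rank 2, m = 3 > n = 2, so not onto

b_good = A @ np.array([1.0, 1.0])   # lies in R(A) by construction
b_bad = np.array([1.0, 0.0, 0.0])   # does not lie in R(A)

def solvable(A, b):
    # Ax = b has a solution iff rank([A, b]) == rank(A).
    aug = np.column_stack([A, b])
    return np.linalg.matrix_rank(aug) == np.linalg.matrix_rank(A)

print(solvable(A, b_good))   # True
print(solvable(A, b_bad))    # False
```

For robust rank determination on larger problems, the SVD-based approach of Remark 5.12 is preferable to echelon-form elimination.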
44 Chapter 6. Linear Equations
Proof: The proofs are straightforward and can be consulted in standard texts on linear algebra. Note that some parts of the theorem follow directly from others. For example, to prove part 6, note that x = 0 is always a solution to the homogeneous system. Therefore, we must have the case of a nonunique solution, i.e., A is not 1-1, which implies rank(A) < n by part 3. □
6.2 Matrix Linear Equations

In this section we present some of the principal results concerning existence and uniqueness of solutions to the general matrix linear system (6.1). Note that the results of Theorem 6.1 follow from those below for the special case k = 1, while results for (6.2) follow by specializing even further to the case m = n.
Theorem 6.2 (Existence). The matrix linear equation
$$AX = B;\quad A \in \mathbb{R}^{m \times n},\ B \in \mathbb{R}^{m \times k}, \tag{6.4}$$
has a solution if and only if $\mathcal{R}(B) \subseteq \mathcal{R}(A)$; equivalently, a solution exists if and only if $AA^+B = B$.
Proof: The subspace inclusion criterion follows essentially from the definition of the range of a matrix. The matrix criterion is Theorem 4.18. □
Theorem 6.3. Let $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{m \times k}$ and suppose that $AA^+B = B$. Then any matrix of the form
$$X = A^+B + (I - A^+A)Y, \text{ where } Y \in \mathbb{R}^{n \times k} \text{ is arbitrary}, \tag{6.5}$$
is a solution of
$$AX = B. \tag{6.6}$$
Furthermore, all solutions of (6.6) are of this form.
Proof: To verify that (6.5) is a solution, premultiply by A:
$$AX = AA^+B + A(I - A^+A)Y$$
$$= B + (A - AA^+A)Y \quad \text{by hypothesis}$$
$$= B \quad \text{since } AA^+A = A \text{ by the first Penrose condition.}$$
That all solutions are of this form can be seen as follows. Let Z be an arbitrary solution of (6.6), i.e., AZ = B. Then we can write
$$Z = A^+AZ + (I - A^+A)Z$$
$$= A^+B + (I - A^+A)Z$$
and this is clearly of the form (6.5). □
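Theorem 6.3 can be checked numerically: once $AA^+B = B$ holds, every choice of Y in (6.5) yields a solution. A NumPy sketch (an editorial illustration; the fat random A is almost surely of full row rank, which guarantees solvability):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 5))           # fat, full row rank => AA+ = I
B = rng.standard_normal((3, 2))

A_plus = np.linalg.pinv(A)
assert np.allclose(A @ A_plus @ B, B)     # existence condition of Theorem 6.2

for _ in range(3):
    Y = rng.standard_normal((5, 2))       # arbitrary
    X = A_plus @ B + (np.eye(5) - A_plus @ A) @ Y
    print(np.allclose(A @ X, B))          # a solution every time
```

The term $(I - A^+A)Y$ sweeps out $\mathcal{N}(A)$, which is why A annihilates it.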
6.2. Matrix Linear Equations 45
Remark 6.4. When A is square and nonsingular, $A^+ = A^{-1}$ and so $(I - A^+A) = 0$. Thus, there is no "arbitrary" component, leaving only the unique solution $X = A^{-1}B$.
Remark 6.5. It can be shown that the particular solution $X = A^+B$ is the solution of (6.6) that minimizes $\mathrm{Tr}\,X^TX$. ($\mathrm{Tr}(\cdot)$ denotes the trace of a matrix; recall that $\mathrm{Tr}\,X^TX = \sum_{i,j} x_{ij}^2$.)
Theorem 6.6 (Uniqueness). A solution of the matrix linear equation
$$AX = B;\quad A \in \mathbb{R}^{m \times n},\ B \in \mathbb{R}^{m \times k} \tag{6.7}$$
is unique if and only if $A^+A = I$; equivalently, (6.7) has a unique solution if and only if $\mathcal{N}(A) = 0$.
Proof: The first equivalence is immediate from Theorem 6.3. The second follows by noting that $A^+A = I$ can occur only if r = n, where r = rank(A) (recall $r \leq n$). But rank(A) = n if and only if A is 1-1 or $\mathcal{N}(A) = 0$. □
Example 6.7. Suppose $A \in \mathbb{R}^{n \times n}$. Find all solutions of the homogeneous system Ax = 0.

Solution:
$$x = A^+0 + (I - A^+A)y$$
$$= (I - A^+A)y,$$
where $y \in \mathbb{R}^n$ is arbitrary. Hence, there exists a nonzero solution if and only if $A^+A \neq I$. This is equivalent to either rank(A) = r < n or A being singular. Clearly, if there exists a nonzero solution, it is not unique.
Computation: Since y is arbitrary, it is easy to see that all solutions are generated from a basis for $\mathcal{R}(I - A^+A)$. But if A has an SVD given by $A = U\Sigma V^T$, then it is easily checked that $I - A^+A = V_2V_2^T$ and $\mathcal{R}(V_2V_2^T) = \mathcal{R}(V_2) = \mathcal{N}(A)$.
Example 6.8. Characterize all right inverses of a matrix $A \in \mathbb{R}^{m \times n}$; equivalently, find all solutions R of the equation $AR = I_m$. Here, we write $I_m$ to emphasize the m × m identity matrix.

Solution: There exists a right inverse if and only if $\mathcal{R}(I_m) \subseteq \mathcal{R}(A)$ and this is equivalent to $AA^+I_m = I_m$. Clearly, this can occur if and only if rank(A) = r = m (since $r \leq m$) and this is equivalent to A being onto ($A^+$ is then a right inverse). All right inverses of A are then of the form
$$R = A^+I_m + (I_n - A^+A)Y$$
$$= A^+ + (I - A^+A)Y,$$
where $Y \in \mathbb{R}^{n \times m}$ is arbitrary. There is a unique right inverse if and only if $A^+A = I$ ($\mathcal{N}(A) = 0$), in which case A must be invertible and $R = A^{-1}$.
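Example 6.8 is also easy to check in NumPy. The sketch below (an editorial illustration; a fat Gaussian matrix is onto almost surely) builds one member of the family of right inverses and verifies it:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((2, 4))           # rank 2 = m, so A is onto

A_plus = np.linalg.pinv(A)
Y = rng.standard_normal((4, 2))
R = A_plus + (np.eye(4) - A_plus @ A) @ Y # one right inverse per choice of Y

print(np.allclose(A @ R, np.eye(2)))      # R is a right inverse
print(np.allclose(A @ A_plus, np.eye(2))) # A+ itself is a right inverse
```

Since n = 4 > m = 2 here, the right inverse is far from unique: $(I - A^+A)$ is a rank-2 projector, so Y contributes a genuinely arbitrary component.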
Example 6.9. Consider the system of linear first-order difference equations
$$x_{k+1} = Ax_k + Bu_k \tag{6.8}$$
46 Chapter 6. Linear Equations
with $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times m}$ ($n \geq 1$, $m \geq 1$). The vector $x_k$ in linear system theory is known as the state vector at time k while $u_k$ is the input (control) vector. The general solution of (6.8) is given by
$$x_k = A^kx_0 + \sum_{j=0}^{k-1} A^{k-1-j}Bu_j \tag{6.9}$$
$$= A^kx_0 + [B, AB, \ldots, A^{k-1}B]\begin{bmatrix} u_{k-1} \\ u_{k-2} \\ \vdots \\ u_0 \end{bmatrix} \tag{6.10}$$
for $k \geq 1$. We might now ask the question: Given $x_0 = 0$, does there exist an input sequence $\{u_j\}_{j=0}^{k-1}$ such that $x_k$ takes an arbitrary value in $\mathbb{R}^n$? In linear system theory, this is a question of reachability. Since $m \geq 1$, from the fundamental Existence Theorem, Theorem 6.2, we see that (6.8) is reachable if and only if
$$\mathcal{R}([B, AB, \ldots, A^{n-1}B]) = \mathbb{R}^n$$
or, equivalently, if and only if
$$\mathrm{rank}\,[B, AB, \ldots, A^{n-1}B] = n.$$
A related question is the following: Given an arbitrary initial vector $x_0$, does there exist an input sequence $\{u_j\}_{j=0}^{n-1}$ such that $x_n = 0$? In linear system theory, this is called controllability. Again from Theorem 6.2, we see that (6.8) is controllable if and only if
$$\mathcal{R}(A^n) \subseteq \mathcal{R}([B, AB, \ldots, A^{n-1}B]).$$
Clearly, reachability always implies controllability and, if A is nonsingular, controllability and reachability are equivalent. The matrices $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ provide an example of a system that is controllable but not reachable.
The above are standard conditions with analogues for continuous-time models (i.e., linear differential equations). There are many other algebraically equivalent conditions.
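The rank tests above take only a few lines in NumPy. The pair below is an illustrative nilpotent choice consistent with the (partially garbled) example in the transcript; it is controllable (since $A^2 = 0$, the zero input already drives any $x_0$ to $x_2 = 0$) yet not reachable:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[1.0],
              [0.0]])

ctrb = np.hstack([B, A @ B])                   # [B, AB], n = 2
print(np.linalg.matrix_rank(ctrb))             # 1 < 2 => not reachable
print(np.allclose(np.linalg.matrix_power(A, 2), 0))  # A^2 = 0 => controllable
```

With $A^2 = 0$, the controllability condition $\mathcal{R}(A^2) \subseteq \mathcal{R}([B, AB])$ holds trivially, while the reachability rank test fails.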
Example 6.10. We now introduce an output vector Yk to the system (6.8) of Example 6.9 by appending the equation
$$y_k = Cx_k + Du_k \tag{6.11}$$
with $C \in \mathbb{R}^{p \times n}$ and $D \in \mathbb{R}^{p \times m}$ ($p \geq 1$). We can then pose some new questions about the overall system that are dual in the system-theoretic sense to reachability and controllability. The answers are cast in terms that are dual in the linear algebra sense as well. The condition dual to reachability is called observability: When does knowledge of $\{u_j\}_{j=0}^{n-1}$ and $\{y_j\}_{j=0}^{n-1}$ suffice to determine (uniquely) $x_0$? As a dual to controllability, we have the notion of reconstructibility: When does knowledge of $\{u_j\}_{j=0}^{n-1}$ and $\{y_j\}_{j=0}^{n-1}$ suffice to determine (uniquely) $x_n$? The fundamental duality result from linear system theory is the following:
(A, B) is reachable [controllable] if and only if $(A^T, B^T)$ is observable [reconstructible].
6.4 Some Useful and Interesting Inverses 47

To derive a condition for observability, notice that
$$y_k = CA^kx_0 + \sum_{j=0}^{k-1} CA^{k-1-j}Bu_j + Du_k. \tag{6.12}$$
Thus,
$$\begin{bmatrix} y_0 - Du_0 \\ y_1 - CBu_0 - Du_1 \\ \vdots \\ y_{n-1} - \sum_{j=0}^{n-2} CA^{n-2-j}Bu_j - Du_{n-1} \end{bmatrix} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix}x_0. \tag{6.13}$$
Let v denote the (known) vector on the left-hand side of (6.13) and let R denote the matrix on the right-hand side. Then, by definition, $v \in \mathcal{R}(R)$, so a solution exists. By the fundamental Uniqueness Theorem, Theorem 6.6, the solution is then unique if and only if $\mathcal{N}(R) = 0$, or, equivalently, if and only if
$$\mathcal{N}\left(\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix}\right) = 0.$$
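The observability test amounts to stacking $C, CA, \ldots, CA^{n-1}$ and checking full column rank. A NumPy sketch with an illustrative pair (an editorial example, not from the text):

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
C = np.array([[1.0, 0.0]])   # measure only the first state component

n = A.shape[0]
obsv = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

# Full column rank means N(obsv) = 0, i.e., x_0 is uniquely determined.
print(np.linalg.matrix_rank(obsv) == n)
```

Here `obsv` stacks C on top of CA; since the second state feeds the first through the (1,2) entry of A, measuring the first component alone suffices.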
6.3 A More General Matrix Linear Equation

Theorem 6.11. Let $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{m \times q}$, and $C \in \mathbb{R}^{p \times q}$. Then the equation
$$AXC = B \tag{6.14}$$
has a solution if and only if $AA^+BC^+C = B$, in which case the general solution is of the form
$$X = A^+BC^+ + Y - A^+AYCC^+, \tag{6.15}$$
where $Y \in \mathbb{R}^{n \times p}$ is arbitrary.
A compact matrix criterion for uniqueness of solutions to (6.14) requires the notion of the Kronecker product of matrices for its statement. Such a criterion ($CC^+ \otimes A^+A = I$) is stated and proved in Theorem 13.27.
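Theorem 6.11 can be exercised numerically with the solution family $X = A^+BC^+ + Y - A^+AYCC^+$. In the sketch below (an editorial illustration), B is manufactured as $AX_{\text{true}}C$ so that solvability is guaranteed:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((3, 4))
C = rng.standard_normal((5, 6))
X_true = rng.standard_normal((4, 5))
B = A @ X_true @ C                        # guarantees a solution exists

A_plus, C_plus = np.linalg.pinv(A), np.linalg.pinv(C)
assert np.allclose(A @ A_plus @ B @ C_plus @ C, B)   # existence criterion

Y = rng.standard_normal((4, 5))           # arbitrary
X = A_plus @ B @ C_plus + Y - A_plus @ A @ Y @ C @ C_plus
print(np.allclose(A @ X @ C, B))          # X solves AXC = B
```

The verification mirrors Theorem 6.3: $AXC = AA^+BC^+C + AYC - AYC = B$, using $AA^+A = A$ and $CC^+C = C$.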
6.4 Some Useful and Interesting Inverses

In many applications, the coefficient matrices of interest are square and nonsingular. Listed below is a small collection of useful matrix identities, particularly for block matrices, associated with matrix inverses. In these identities, $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{m \times n}$, and $D \in \mathbb{R}^{m \times m}$. Invertibility is assumed for any component or subblock whose inverse is indicated. Verification of each identity is recommended as an exercise for the reader.
48 Chapter 6. Linear Equations
1. $(A + BDC)^{-1} = A^{-1} - A^{-1}B(D^{-1} + CA^{-1}B)^{-1}CA^{-1}$.
This result is known as the Sherman-Morrison-Woodbury formula. It has many applications (and is frequently "rediscovered") including, for example, formulas for the inverse of a sum of matrices such as $(A + D)^{-1}$ or $(A^{-1} + D^{-1})^{-1}$. It also yields very efficient "updating" or "downdating" formulas in expressions such as $(A + xx^T)^{-1}$ (with symmetric $A \in \mathbb{R}^{n \times n}$ and $x \in \mathbb{R}^n$) that arise in optimization theory.
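The Sherman-Morrison-Woodbury formula is easy to confirm numerically. In the NumPy sketch below (an editorial illustration), diagonal shifts and scaled off-blocks keep every inverse that appears well-conditioned:

```python
import numpy as np

rng = np.random.default_rng(8)
n, m = 5, 2
A = rng.standard_normal((n, n)) + 5 * np.eye(n)  # comfortably invertible
B = 0.1 * rng.standard_normal((n, m))
C = 0.1 * rng.standard_normal((m, n))
D = rng.standard_normal((m, m)) + 3 * np.eye(m)

Ai = np.linalg.inv(A)
lhs = np.linalg.inv(A + B @ D @ C)
rhs = Ai - Ai @ B @ np.linalg.inv(np.linalg.inv(D) + C @ Ai @ B) @ C @ Ai
print(np.allclose(lhs, rhs))
```

The computational payoff: the correction term only requires inverting an m × m matrix (m = 2 here) rather than re-inverting the full n × n matrix.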
2. [~ ~ rl
= [ ~
3. [~ !/ rl
= [~ -~ l [~ ~/ rl
= [~ -~ 1 Both of these matrices satisfy the matrix equation X2 = / from which it is obvious that X-I = X. Note that the positions of the / and - / blocks may be exchanged.
4. [A B; 0 D]^{-1} = [A^{-1} -A^{-1}BD^{-1}; 0 D^{-1}].

5. [A 0; C D]^{-1} = [A^{-1} 0; -D^{-1}CA^{-1} D^{-1}].

6. [I+BC B; C I]^{-1} = [I -B; -C I+CB].

7. [A B; C D]^{-1} = [A^{-1} + A^{-1}BECA^{-1} -A^{-1}BE; -ECA^{-1} E], where E = (D - CA^{-1}B)^{-1} (E is the inverse of the Schur complement of A). This result follows easily from the block LU factorization in property 16 of Section 1.4.

8. [A B; C D]^{-1} = [F -FBD^{-1}; -D^{-1}CF D^{-1} + D^{-1}CFBD^{-1}], where F = (A - BD^{-1}C)^{-1}. This result follows easily from the block UL factorization in property 17 of Section 1.4.
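Formula 7 can be checked numerically as well. The NumPy sketch below (editorial; the blocks are arbitrary choices making A and the Schur complement invertible) assembles the claimed inverse block by block and compares it with a direct inversion:

```python
import numpy as np

# Verify item 7: the 2x2 block inverse built from E = (D - C A^{-1} B)^{-1}.
A = np.array([[4.0, 1.0], [0.0, 3.0]])
B = np.array([[1.0], [2.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[5.0]])

Ai = np.linalg.inv(A)
E = np.linalg.inv(D - C @ Ai @ B)          # inverse of the Schur complement of A
top = np.hstack([Ai + Ai @ B @ E @ C @ Ai, -Ai @ B @ E])
bot = np.hstack([-E @ C @ Ai, E])
M = np.block([[A, B], [C, D]])
assert np.allclose(np.vstack([top, bot]), np.linalg.inv(M))
```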
EXERCISES
1. As in Example 6.8, characterize all left inverses of a matrix A ∈ R^{m×n}.

2. Let A ∈ R^{m×n}, B ∈ R^{m×k} and suppose A has an SVD as in Theorem 5.1. Assuming R(B) ⊆ R(A), characterize all solutions of the matrix linear equation

    AX = B

in terms of the SVD of A.
3. Let x, y ∈ R^n and suppose further that x^T y ≠ 1. Show that

    (I - xy^T)^{-1} = I - (1/(x^T y - 1)) xy^T.
4. Let x, y ∈ R^n and suppose further that x^T y ≠ 1. Show that

    [I x; y^T 1]^{-1} = [I + c xy^T -cx; -c y^T c],

where c = 1/(1 - x^T y).
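The rank-one inverse formula of Exercise 3 is easy to confirm numerically. A NumPy sketch (editorial; x and y are arbitrary vectors with x^T y ≠ 1):

```python
import numpy as np

# Check (I - x y^T)^{-1} = I - x y^T / (x^T y - 1) when x^T y != 1.
x = np.array([1.0, 2.0, 0.0])
y = np.array([0.5, 1.0, 3.0])
assert not np.isclose(x @ y, 1.0)          # hypothesis: x^T y != 1 (here x^T y = 2.5)

lhs = np.linalg.inv(np.eye(3) - np.outer(x, y))
rhs = np.eye(3) - np.outer(x, y) / (x @ y - 1.0)
assert np.allclose(lhs, rhs)
```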
5. Let A ∈ R^{n×n} be nonsingular and let A^{-1} have columns c_1, ..., c_n and individual elements γ_{ij}. Assume that γ_{ji} ≠ 0 for some i and j. Show that the matrix B = A - (1/γ_{ji}) e_i e_j^T (i.e., A with 1/γ_{ji} subtracted from its (ij)th element) is singular.
Hint: Show that c_i ∈ N(B).
6. As in Example 6.10, check directly that the condition for reconstructibility takes the form

    N([C; CA; ...; CA^{n-1}]) ⊆ N(A^n).
Chapter 7
Projections, Inner Product Spaces, and Norms
7.1 Projections

Definition 7.1. Let V be a vector space with V = X ⊕ Y. By Theorem 2.26, every v ∈ V has a unique decomposition v = x + y with x ∈ X and y ∈ Y. Define P_{X,Y} : V → X ⊆ V by

    P_{X,Y} v = x for all v ∈ V.

P_{X,Y} is called the (oblique) projection on X along Y.

Figure 7.1 displays the projection of v on both X and Y in the case V = R^2.
Figure 7.1. Oblique projections.
Theorem 7.2. P_{X,Y} is linear and P_{X,Y}^2 = P_{X,Y}.

Theorem 7.3. A linear transformation P is a projection if and only if it is idempotent, i.e., P^2 = P. Also, P is a projection if and only if I - P is a projection. In fact, P_{Y,X} = I - P_{X,Y}.

Proof: Suppose P is a projection, say on X along Y (using the notation of Definition 7.1).
Let v ∈ V be arbitrary. Then Pv = P(x + y) = Px = x. Moreover, P^2 v = PPv = Px = x = Pv. Thus, P^2 = P. Conversely, suppose P^2 = P. Let X = {v ∈ V : Pv = v} and Y = {v ∈ V : Pv = 0}. It is easy to check that X and Y are subspaces. We now prove that V = X ⊕ Y. First note that if v ∈ X, then Pv = v. If v ∈ Y, then Pv = 0. Hence if v ∈ X ∩ Y, then v = 0. Now let v ∈ V be arbitrary. Then v = Pv + (I - P)v. Let x = Pv, y = (I - P)v. Then Px = P^2 v = Pv = x so x ∈ X, while Py = P(I - P)v = Pv - P^2 v = 0 so y ∈ Y. Thus, V = X ⊕ Y and the projection on X along Y is P. Essentially the same argument shows that I - P is the projection on Y along X. □
Definition 7.4. In the special case where Y = X^⊥, P_{X,X^⊥} is called an orthogonal projection and we then use the notation P_X = P_{X,X^⊥}.

Theorem 7.5. P ∈ R^{n×n} is the matrix of an orthogonal projection (onto R(P)) if and only if P^2 = P = P^T.

Proof: Let P be an orthogonal projection (on X, say, along X^⊥) and let x, y ∈ R^n be arbitrary. Note that (I - P)x = (I - P_{X,X^⊥})x = P_{X^⊥,X} x by Theorem 7.3. Thus, (I - P)x ∈ X^⊥. Since Py ∈ X, we have (Py)^T (I - P)x = y^T P^T (I - P)x = 0. Since x and y were arbitrary, we must have P^T (I - P) = 0. Hence P^T = P^T P = P, with the second equality following since P^T P is symmetric. Conversely, suppose P is a symmetric projection matrix and let x be arbitrary. Write x = Px + (I - P)x. Then x^T P^T (I - P)x = x^T P(I - P)x = 0. Thus, since Px ∈ R(P), then (I - P)x ∈ R(P)^⊥ and P must be an orthogonal projection. □
7.1.1 The four fundamental orthogonal projections

Using the notation of Theorems 5.1 and 5.11, let A ∈ R^{m×n} with SVD A = UΣV^T = U_1 S V_1^T. Then

    P_{R(A)}   = AA^+     = U_1 U_1^T = Σ_{i=1}^{r} u_i u_i^T,
    P_{R(A)^⊥} = I - AA^+ = U_2 U_2^T = Σ_{i=r+1}^{m} u_i u_i^T,
    P_{N(A)}   = I - A^+A = V_2 V_2^T = Σ_{i=r+1}^{n} v_i v_i^T,
    P_{N(A)^⊥} = A^+A     = V_1 V_1^T = Σ_{i=1}^{r} v_i v_i^T
are easily checked to be (unique) orthogonal projections onto the respective four fundamental subspaces.
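These identities can be checked mechanically from a computed SVD. The NumPy sketch below (editorial; the matrix A is an arbitrary rank-deficient example) builds all four projections both from the pseudoinverse and from the partitioned singular vectors:

```python
import numpy as np

# The four fundamental orthogonal projections from an SVD of A (rank r).
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0]])
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))                 # numerical rank
U1, U2 = U[:, :r], U[:, r:]
V1, V2 = Vt[:r, :].T, Vt[r:, :].T
Ap = np.linalg.pinv(A)

assert np.allclose(A @ Ap, U1 @ U1.T)              # P onto R(A)
assert np.allclose(np.eye(2) - A @ Ap, U2 @ U2.T)  # P onto R(A) perp
assert np.allclose(np.eye(3) - Ap @ A, V2 @ V2.T)  # P onto N(A)
assert np.allclose(Ap @ A, V1 @ V1.T)              # P onto N(A) perp
```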
Example 7.6. Determine the orthogonal projection of a vector v ∈ R^n on another nonzero vector w ∈ R^n.

Solution: Think of the vector w as an element of the one-dimensional subspace R(w). Then the desired projection is simply

    P_{R(w)} v = ww^+ v
               = (w w^T / w^T w) v    (using Example 4.8)
               = ((w^T v)/(w^T w)) w.

Moreover, the vector z that is orthogonal to w and such that v = Pv + z is given by z = P_{R(w)^⊥} v = (I - P_{R(w)}) v = v - ((w^T v)/(w^T w)) w. See Figure 7.2. A direct calculation shows that z and w are, in fact, orthogonal:

    w^T z = w^T v - ((w^T v)/(w^T w)) w^T w = 0.
Figure 7.2. Orthogonal projection on a "line."
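In code, the projection onto a line is a one-liner. A NumPy sketch of Example 7.6 (editorial; v and w are arbitrary):

```python
import numpy as np

# Project v onto the line spanned by w: Pv = (w^T v / w^T w) w.
v = np.array([2.0, 3.0, 4.0])
w = np.array([1.0, 1.0, 0.0])

Pv = (w @ v) / (w @ w) * w
z = v - Pv
assert np.isclose(w @ z, 0.0)              # z is orthogonal to w
assert np.allclose(Pv + z, v)              # v decomposes as Pv + z
```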
Example 7.7. Recall the proof of Theorem 3.11. There, {v_1, ..., v_k} was an orthonormal basis for a subset S of R^n. An arbitrary vector x ∈ R^n was chosen and a formula for x_1 appeared rather mysteriously. The expression for x_1 is simply the orthogonal projection of x on S. Specifically,

    P_S x = x_1 = Σ_{i=1}^{k} (v_i^T x) v_i.
Example 7.8. Recall the diagram of the four fundamental subspaces. The indicated direct sum decompositions of the domain R^n and co-domain R^m are given easily as follows.

Let x ∈ R^n be an arbitrary vector. Then

    x = P_{N(A)^⊥} x + P_{N(A)} x
      = A^+ A x + (I - A^+ A) x
      = V_1 V_1^T x + V_2 V_2^T x    (recall V V^T = I).
Similarly, let y ∈ R^m be an arbitrary vector. Then

    y = P_{R(A)} y + P_{R(A)^⊥} y
      = A A^+ y + (I - A A^+) y
      = U_1 U_1^T y + U_2 U_2^T y    (recall U U^T = I).
Example 7.9. Let

    A = [1 1 0; 1 1 0].

Then

    A^+ = [1/4 1/4; 1/4 1/4; 0 0],

so that

    A^+ A = [1/2 1/2 0; 1/2 1/2 0; 0 0 0],   I - A^+ A = [1/2 -1/2 0; -1/2 1/2 0; 0 0 1],

and we can decompose the vector [2 3 4]^T uniquely into the sum of a vector in N(A)^⊥ and a vector in N(A), respectively, as follows:

    [2; 3; 4] = A^+ A x + (I - A^+ A) x
              = [5/2; 5/2; 0] + [-1/2; 1/2; 4].
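A NumPy sketch of this decomposition (editorial; the matrix A = [1 1 0; 1 1 0] used here is inferred from the entries 1/4 visible in the printed A^+, and the split of [2 3 4]^T follows from it):

```python
import numpy as np

# Decompose x into its components in N(A)^perp and N(A).
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0]])
Ap = np.linalg.pinv(A)
x = np.array([2.0, 3.0, 4.0])

x1 = Ap @ A @ x                            # component in N(A)^perp
x2 = (np.eye(3) - Ap @ A) @ x              # component in N(A)
assert np.allclose(x1, [2.5, 2.5, 0.0])
assert np.allclose(x2, [-0.5, 0.5, 4.0])
assert np.allclose(A @ x2, 0.0)            # x2 really lies in N(A)
```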
7.2 Inner Product Spaces
Definition 7.10. Let V be a vector space over R. Then (·,·) : V × V → R is a real inner product if

1. (x, x) ≥ 0 for all x ∈ V and (x, x) = 0 if and only if x = 0.

2. (x, y) = (y, x) for all x, y ∈ V.

3. (x, αy_1 + βy_2) = α(x, y_1) + β(x, y_2) for all x, y_1, y_2 ∈ V and for all α, β ∈ R.
Example 7.11. Let V = R^n. Then (x, y) = x^T y is the "usual" Euclidean inner product or dot product.

Example 7.12. Let V = R^n. Then (x, y)_Q = x^T Q y, where Q = Q^T > 0 is an arbitrary n × n positive definite matrix, defines a "weighted" inner product.

Definition 7.13. If A ∈ R^{m×n}, then A^T ∈ R^{n×m} is the unique linear transformation or map such that (x, Ay) = (A^T x, y) for all x ∈ R^m and for all y ∈ R^n.
It is easy to check that, with this more "abstract" definition of transpose, and if the (i,j)th element of A is a_{ij}, then the (i,j)th element of A^T is a_{ji}. It can also be checked that all the usual properties of the transpose hold, such as (AB)^T = B^T A^T. However, the definition above allows us to extend the concept of transpose to the case of weighted inner products in the following way. Suppose A ∈ R^{m×n} and let (·,·)_Q and (·,·)_R, with Q and R positive definite, be weighted inner products on R^m and R^n, respectively. Then we can define the "weighted transpose" A^# as the unique map that satisfies

    (x, Ay)_Q = (A^# x, y)_R for all x ∈ R^m and for all y ∈ R^n.

By Example 7.12 above, we must then have x^T Q A y = x^T (A^#)^T R y for all x, y. Hence we must have QA = (A^#)^T R. Taking transposes (of the usual variety) gives A^T Q = R A^#. Since R is nonsingular, we find

    A^# = R^{-1} A^T Q.
We can also generalize the notion of orthogonality (x^T y = 0) to Q-orthogonality (Q is a positive definite matrix). Two vectors x, y ∈ R^n are Q-orthogonal (or conjugate with respect to Q) if (x, y)_Q = x^T Q y = 0. Q-orthogonality is an important tool used in studying conjugate direction methods in optimization theory.
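The defining property of the weighted transpose is easy to check numerically. A NumPy sketch (editorial; A, Q, and R are arbitrary, with Q and R diagonal positive definite for simplicity):

```python
import numpy as np

# Check that A# = R^{-1} A^T Q satisfies (x, Ay)_Q = (A# x, y)_R,
# where (u, v)_Q = u^T Q v and (u, v)_R = u^T R v.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])            # A : R^3 -> R^2
Q = np.diag([2.0, 3.0])                    # positive definite on R^2
R = np.diag([1.0, 5.0, 4.0])               # positive definite on R^3

Ash = np.linalg.inv(R) @ A.T @ Q           # the weighted transpose A#

x = np.array([1.0, -2.0])
y = np.array([0.5, 2.0, -1.0])
assert np.isclose(x @ Q @ (A @ y), (Ash @ x) @ R @ y)
```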
Definition 7.14. Let V be a vector space over C. Then (·,·) : V × V → C is a complex inner product if

1. (x, x) ≥ 0 for all x ∈ V and (x, x) = 0 if and only if x = 0.

2. (x, y) = \overline{(y, x)} for all x, y ∈ V.

3. (x, αy_1 + βy_2) = α(x, y_1) + β(x, y_2) for all x, y_1, y_2 ∈ V and for all α, β ∈ C.
Remark 7.15. We could use the notation (·,·)_C to denote a complex inner product, but if the vectors involved are complex-valued, the complex inner product is to be understood. Note, too, from part 2 of the definition, that (x, x) must be real for all x.

Remark 7.16. Note from parts 2 and 3 of Definition 7.14 that we have

    (αx_1 + βx_2, y) = \overline{α}(x_1, y) + \overline{β}(x_2, y).
Remark 7.17. The Euclidean inner product of x, y ∈ C^n is given by

    (x, y) = Σ_{i=1}^{n} \overline{x_i} y_i = x^H y.

The conventional definition of the complex Euclidean inner product is (x, y) = y^H x but we use its complex conjugate x^H y here for symmetry with the real case.
Remark 7.18. A weighted inner product can be defined as in the real case by (x, y)_Q = x^H Q y, for arbitrary Q = Q^H > 0. The notion of Q-orthogonality can be similarly generalized to the complex case.
Definition 7.19. A vector space (V, F) endowed with a specific inner product is called an inner product space. If F = C, we call V a complex inner product space. If F = R, we call V a real inner product space.
Example 7.20.

1. Check that V = R^{n×n} with the inner product (A, B) = Tr A^T B is a real inner product space. Note that other choices are possible since by properties of the trace function, Tr A^T B = Tr B^T A = Tr A B^T = Tr B A^T.

2. Check that V = C^{n×n} with the inner product (A, B) = Tr A^H B is a complex inner product space. Again, other choices are possible.
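For concreteness, the trace inner product on R^{n×n} coincides with the elementwise ("Frobenius") inner product, as the following NumPy sketch illustrates (editorial; A and B are arbitrary):

```python
import numpy as np

# The trace inner product (A, B) = Tr(A^T B) on R^{n x n}.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [-1.0, 2.0]])

ip = np.trace(A.T @ B)
assert np.isclose(ip, np.sum(A * B))       # elementwise form of the same number
assert np.isclose(ip, np.trace(B.T @ A))   # symmetry, as claimed above
```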
Definition 7.21. Let V be an inner product space. For v ∈ V, we define the norm (or length) of v by ||v|| = √(v, v). This is called the norm induced by (·,·).

Example 7.22.

1. If V = R^n with the usual inner product, the induced norm is given by ||v|| = (Σ_{i=1}^{n} v_i^2)^{1/2}.

2. If V = C^n with the usual inner product, the induced norm is given by ||v|| = (Σ_{i=1}^{n} |v_i|^2)^{1/2}.
Theorem 7.23. Let P be an orthogonal projection on an inner product space V. Then ||Pv|| ≤ ||v|| for all v ∈ V.

Proof: Since P is an orthogonal projection, P^2 = P = P^#. (Here, the notation P^# denotes the unique linear transformation that satisfies (Pu, v) = (u, P^# v) for all u, v ∈ V. If this seems a little too abstract, consider V = R^n (or C^n), where P^# is simply the usual P^T (or P^H).) Hence (Pv, v) = (P^2 v, v) = (Pv, P^# v) = (Pv, Pv) = ||Pv||^2 ≥ 0. Now I - P is also a projection, so the above result applies and we get

    0 ≤ ((I - P)v, v) = (v, v) - (Pv, v) = ||v||^2 - ||Pv||^2,

from which the theorem follows. □
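A small NumPy demonstration of the theorem (editorial; the projection here is onto the span of an arbitrary vector w, built as in Example 7.6):

```python
import numpy as np

# An orthogonal projection never increases Euclidean length: ||P v|| <= ||v||.
w = np.array([1.0, 2.0, 2.0])
P = np.outer(w, w) / (w @ w)               # orthogonal projection onto span{w}
assert np.allclose(P @ P, P) and np.allclose(P, P.T)  # P^2 = P = P^T

rng = np.random.default_rng(7)
for _ in range(5):
    v = rng.standard_normal(3)
    assert np.linalg.norm(P @ v) <= np.linalg.norm(v) + 1e-12
```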
Definition 7.24. The norm induced on an inner product space by the "usual" inner product is called the natural norm.

In case V = C^n or V = R^n, the natural norm is also called the Euclidean norm. In the next section, other norms on these vector spaces are defined. A converse to the above procedure is also available. That is, given a norm defined by ||x|| = √(x, x), an inner product can be defined via the following.
Theorem 7.25 (Polarization Identity).

1. For x, y ∈ R^n, an inner product is defined by

    (x, y) = x^T y = (||x + y||^2 - ||x - y||^2)/4 = (||x + y||^2 - ||x||^2 - ||y||^2)/2.

2. For x, y ∈ C^n, an inner product is defined by

    (x, y) = x^H y = (||x + y||^2 - ||x - y||^2)/4 + j(||x - jy||^2 - ||x + jy||^2)/4,

where j = i = √(-1).

7.3 Vector Norms
Definition 7.26. Let (V, F) be a vector space. Then ||·|| : V → R is a vector norm if it satisfies the following three properties:

1. ||x|| ≥ 0 for all x ∈ V and ||x|| = 0 if and only if x = 0.

2. ||αx|| = |α| ||x|| for all x ∈ V and for all α ∈ F.

3. ||x + y|| ≤ ||x|| + ||y|| for all x, y ∈ V.
(This is called the triangle inequality, as seen readily from the usual diagram illustrating the sum of two vectors in R^2.)
Remark 7.27. It is convenient in the remainder of this section to state results for complex-valued vectors. The specialization to the real case is obvious.

Definition 7.28. A vector space (V, F) is said to be a normed linear space if and only if there exists a vector norm ||·|| : V → R satisfying the three conditions of Definition 7.26.
Example 7.29.

1. For x ∈ C^n, the Hölder norms, or p-norms, are defined by

    ||x||_p = (Σ_{i=1}^{n} |x_i|^p)^{1/p}, 1 ≤ p ≤ +∞.

Special cases:

(a) ||x||_1 = Σ_{i=1}^{n} |x_i| (the "Manhattan" norm).

(b) ||x||_2 = (Σ_{i=1}^{n} |x_i|^2)^{1/2} = (x^H x)^{1/2} (the Euclidean norm).

(c) ||x||_∞ = max_{1≤i≤n} |x_i| = lim_{p→+∞} ||x||_p.
(The second equality is a theorem that requires proof.)
2. Some weighted p-norms:

(a) ||x||_{1,D} = Σ_{i=1}^{n} d_i |x_i|, where d_i > 0.

(b) ||x||_{2,Q} = (x^H Q x)^{1/2}, where Q = Q^H > 0 (this norm is more commonly denoted ||·||_Q).

3. On the vector space (C[t_0, t_1], R), define the vector norm

    ||f|| = max_{t_0 ≤ t ≤ t_1} |f(t)|.

On the vector space ((C[t_0, t_1])^n, R), define the vector norm

    ||f||_∞ = max_{t_0 ≤ t ≤ t_1} ||f(t)||_∞.
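The finite-dimensional p-norms of Example 7.29 are directly available in NumPy, and the limit ||x||_p → ||x||_∞ is easy to observe. A sketch (editorial; x is an arbitrary vector):

```python
import numpy as np

# Holder p-norms for p = 1, 2, inf, computed directly and via np.linalg.norm.
x = np.array([3.0, -4.0, 0.0])

assert np.isclose(np.linalg.norm(x, 1), np.sum(np.abs(x)))        # Manhattan norm
assert np.isclose(np.linalg.norm(x, 2), np.sqrt(np.sum(x**2)))    # Euclidean norm
assert np.isclose(np.linalg.norm(x, np.inf), np.max(np.abs(x)))   # max norm
# ||x||_p approaches ||x||_inf as p grows:
assert abs(np.linalg.norm(x, 50) - np.linalg.norm(x, np.inf)) < 0.1
```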
Theorem 7.30 (Hölder Inequality). Let x, y ∈ C^n. Then

    |x^H y| ≤ ||x||_p ||y||_q,   1/p + 1/q = 1.

A particular case of the Hölder inequality is of special interest.
Theorem 7.31 (Cauchy-Bunyakovsky-Schwarz Inequality). Let x, y ∈ C^n. Then

    |x^H y| ≤ ||x||_2 ||y||_2,

with equality if and only if x and y are linearly dependent.

Proof: Consider the matrix [x y] ∈ C^{n×2}. Since

    [x y]^H [x y] = [x^H x  x^H y; y^H x  y^H y]

is a nonnegative definite matrix, its determinant must be nonnegative. In other words, 0 ≤ (x^H x)(y^H y) - (x^H y)(y^H x). Since y^H x = \overline{x^H y}, we see immediately that |x^H y| ≤ ||x||_2 ||y||_2. □

Note: This is not the classical algebraic proof of the Cauchy-Bunyakovsky-Schwarz (C-B-S) inequality (see, e.g., [20, p. 217]). However, it is particularly easy to remember.
Remark 7.32. The angle θ between two nonzero vectors x, y ∈ C^n may be defined by cos θ = |x^H y| / (||x||_2 ||y||_2), 0 ≤ θ ≤ π/2. The C-B-S inequality is thus equivalent to the statement |cos θ| ≤ 1.
Remark 7.33. Theorem 7.31 and Remark 7.32 are true for general inner product spaces.
Remark 7.34. The norm ||·||_2 is unitarily invariant, i.e., if U ∈ C^{n×n} is unitary, then ||Ux||_2 = ||x||_2 (Proof: ||Ux||_2^2 = x^H U^H U x = x^H x = ||x||_2^2). However, ||·||_1 and ||·||_∞
are not unitarily invariant. Similar remarks apply to the unitary invariance of norms of real vectors under orthogonal transformation.
Remark 7.35. If x, y ∈ C^n are orthogonal, then we have the Pythagorean Identity

    ||x ± y||_2^2 = ||x||_2^2 + ||y||_2^2,

the proof of which follows easily from ||z||_2^2 = z^H z.
Theorem 7.36. All norms on C^n are equivalent; i.e., there exist constants c_1, c_2 (possibly depending on n) such that

    c_1 ||x||_α ≤ ||x||_β ≤ c_2 ||x||_α for all x ∈ C^n.
Example 7.37. For x ∈ C^n, the following inequalities are all tight bounds; i.e., there exist vectors x for which equality holds:

    ||x||_1 ≤ √n ||x||_2;   ||x||_2 ≤ ||x||_1;    ||x||_∞ ≤ ||x||_1;
    ||x||_1 ≤ n ||x||_∞;    ||x||_2 ≤ √n ||x||_∞;  ||x||_∞ ≤ ||x||_2.
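These bounds, and the vectors attaining them, can be verified numerically. A NumPy sketch (editorial; for n = 4, the all-ones vector attains the bounds involving √n and n, while a standard basis vector attains ||x||_2 = ||x||_1):

```python
import numpy as np

# Tight norm-equivalence bounds of Example 7.37, with extremal vectors.
n = 4
ones = np.ones(n)                          # attains ||x||_1 = sqrt(n) ||x||_2
e1 = np.eye(n)[0]                          # attains ||x||_2 = ||x||_1

assert np.isclose(np.linalg.norm(ones, 1), np.sqrt(n) * np.linalg.norm(ones, 2))
assert np.isclose(np.linalg.norm(e1, 2), np.linalg.norm(e1, 1))
assert np.isclose(np.linalg.norm(ones, 1), n * np.linalg.norm(ones, np.inf))
assert np.isclose(np.linalg.norm(ones, 2), np.sqrt(n) * np.linalg.norm(ones, np.inf))
```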
Finally, we conclude this section with a theorem about convergence of vectors. Convergence of a sequence of vectors to some limit vector can be converted into a statement about convergence of real numbers, i.e., convergence in terms of vector norms.

Theorem 7.38. Let ||·|| be a vector norm and suppose v, v^{(1)}, v^{(2)}, ... ∈ C^n. Then

    lim_{k→+∞} v^{(k)} = v if and only if lim_{k→+∞} ||v^{(k)} - v|| = 0.
7.4 Matrix Norms

In this section we introduce the concept of matrix norm. As with vectors, the motivation for using matrix norms is to have a notion of either the size of or the nearness of matrices. The former notion is useful for perturbation analysis, while the latter is needed to make sense of "convergence" of matrices. Attention is confined to the vector space (R^{m×n}, R) since that is what arises in the majority of applications. Extension to the complex case is straightforward and essentially obvious.

Definition 7.39. ||·|| : R^{m×n} → R is a matrix norm if it satisfies the following three properties:

1. ||A|| ≥ 0 for all A ∈ R^{m×n} and ||A|| = 0 if and only if A = 0.

2. ||αA|| = |α| ||A|| for all A ∈ R^{m×n} and for all α ∈ R.

3. ||A + B|| ≤ ||A|| + ||B|| for all A, B ∈ R^{m×n}.
(As with vectors, this is called the triangle inequality.)
Example 7.40. Let A ∈ R^{m×n}. Then the Frobenius norm (or matrix Euclidean norm) is defined by

    ||A||_F = (Σ_{i=1}^{m} Σ_{j=1}^{n} a_{ij}^2)^{1/2} = (Σ_{k=1}^{r} σ_k^2(A))^{1/2} = (Tr(A^T A))^{1/2} = (Tr(A A^T))^{1/2}

(where r = rank(A)).
Example 7.41. Let A ∈ R^{m×n}. Then the matrix p-norms are defined by

    ||A||_p = max_{x ≠ 0} ||Ax||_p / ||x||_p = max_{||x||_p = 1} ||Ax||_p.
The following three special cases are important because they are "computable." Each is a theorem and requires a proof.

1. The "maximum column sum" norm is

    ||A||_1 = max_{1≤j≤n} Σ_{i=1}^{m} |a_{ij}|.

2. The "maximum row sum" norm is

    ||A||_∞ = max_{1≤i≤m} Σ_{j=1}^{n} |a_{ij}|.

3. The spectral norm is

    ||A||_2 = λ_max^{1/2}(A^T A) = λ_max^{1/2}(A A^T) = σ_1(A).

Note: ||A^+||_2 = 1/σ_r(A), where r = rank(A).
Example 7.42. Let A ∈ R^{m×n}. The Schatten p-norms are defined by

    ||A||_{S,p} = (σ_1^p + ... + σ_r^p)^{1/p}.

Some special cases of Schatten p-norms are equal to norms defined previously. For example, ||·||_{S,2} = ||·||_F and ||·||_{S,∞} = ||·||_2. The norm ||·||_{S,1} is often called the trace norm.
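The "computable" induced norms and the Schatten norms can all be checked against their singular-value characterizations. A NumPy sketch (editorial; A is an arbitrary example):

```python
import numpy as np

# Induced and Schatten matrix norms versus their defining formulas.
A = np.array([[1.0, -2.0], [3.0, 4.0]])
s = np.linalg.svd(A, compute_uv=False)     # singular values, in decreasing order

assert np.isclose(np.linalg.norm(A, 1), np.max(np.sum(np.abs(A), axis=0)))       # max column sum
assert np.isclose(np.linalg.norm(A, np.inf), np.max(np.sum(np.abs(A), axis=1)))  # max row sum
assert np.isclose(np.linalg.norm(A, 2), s[0])                        # spectral norm = sigma_1
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))   # ||.||_F = ||.||_{S,2}
assert np.isclose(np.linalg.norm(A, 'nuc'), np.sum(s))               # trace norm ||.||_{S,1}
```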
Example 7.43. Let A ∈ R^{m×n}. Then "mixed" norms can also be defined by

    ||A||_{p,q} = max_{x ≠ 0} ||Ax||_p / ||x||_q.

Example 7.44. The "matrix analogue of the vector 1-norm," ||A||_s = Σ_{i,j} |a_{ij}|, is a norm.
The concept of a matrix norm alone is not altogether useful since it does not allow us to estimate the size of a matrix product AB in terms of the sizes of A and B individually.
Notice that this difficulty did not arise for vectors, although there are analogues for, e.g., inner products or outer products of vectors. We thus need the following definition.

Definition 7.45. Let A ∈ R^{m×n}, B ∈ R^{n×k}. Then the norms ||·||_α, ||·||_β, and ||·||_γ are mutually consistent if ||AB||_α ≤ ||A||_β ||B||_γ. A matrix norm ||·|| is said to be consistent if ||AB|| ≤ ||A|| ||B|| whenever the matrix product is defined.
Example 7.46.

1. ||·||_F and ||·||_p for all p are consistent matrix norms.

2. The "mixed" norm

    ||A||_{∞,1} = max_{x ≠ 0} ||Ax||_∞ / ||x||_1 = max_{i,j} |a_{ij}|

is a matrix norm but it is not consistent. For example, take A = B = [1 1; 1 1]. Then ||AB||_{∞,1} = 2 while ||A||_{∞,1} ||B||_{∞,1} = 1.
The p-norms are examples of matrix norms that are subordinate to (or induced by) a vector norm, i.e.,

    ||A|| = max_{x ≠ 0} ||Ax|| / ||x|| = max_{||x|| = 1} ||Ax||

(or, more generally, ||A||_{p,q} = max_{x ≠ 0} ||Ax||_p / ||x||_q). For such subordinate norms, also called operator norms, we clearly have ||Ax|| ≤ ||A|| ||x||. Since ||ABx|| ≤ ||A|| ||Bx|| ≤ ||A|| ||B|| ||x||, it follows that all subordinate norms are consistent.

Theorem 7.47. There exists a vector x* such that ||Ax*|| = ||A|| ||x*|| if the matrix norm is subordinate to the vector norm.

Theorem 7.48. If ||·||_m is a consistent matrix norm, there exists a vector norm ||·||_v consistent with it, i.e., ||Ax||_v ≤ ||A||_m ||x||_v.

Not every consistent matrix norm is subordinate to a vector norm. For example, consider ||·||_F. Then ||Ax||_2 ≤ ||A||_F ||x||_2, so ||·||_2 is consistent with ||·||_F, but there does not exist a vector norm ||·|| such that ||A||_F is given by max_{x ≠ 0} ||Ax|| / ||x||.
Useful Results
The following miscellaneous results about matrix norms are collected for future reference.The interested reader is invited to prove each of them as an exercise.
2. For A e R"x", the following inequalities are all tight, i.e., there exist matrices A forwhich equality holds:
7.4. Matrix Norms 61
Notice that this difficulty did not arise for vectors, although there are analogues for, e.g., inner products or outer products of vectors. We thus need the following definition.
Definition 7.45. Let $A \in \mathbb{R}^{m\times n}$, $B \in \mathbb{R}^{n\times k}$. Then the norms $\|\cdot\|_\alpha$, $\|\cdot\|_\beta$, and $\|\cdot\|_\gamma$ are mutually consistent if $\|AB\|_\alpha \le \|A\|_\beta\,\|B\|_\gamma$. A matrix norm $\|\cdot\|$ is said to be consistent if $\|AB\| \le \|A\|\,\|B\|$ whenever the matrix product is defined.
Example 7.46.
1. $\|\cdot\|_F$ and $\|\cdot\|_p$ for all p are consistent matrix norms.
2. The "mixed" norm
$$\|A\|_{\infty,1} = \max_{x\neq 0}\frac{\|Ax\|_\infty}{\|x\|_1} = \max_{i,j}|a_{ij}|$$
is a matrix norm but it is not consistent. For example, take $A = B = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$. Then $\|AB\|_{\infty,1} = 2$ while $\|A\|_{\infty,1}\|B\|_{\infty,1} = 1$.
The p-norms are examples of matrix norms that are subordinate to (or induced by) a vector norm, i.e.,
$$\|A\| = \max_{x\neq 0}\frac{\|Ax\|}{\|x\|} = \max_{\|x\|=1}\|Ax\|$$
(or, more generally, $\|A\|_{p,q} = \max_{x\neq 0}\frac{\|Ax\|_p}{\|x\|_q}$). For such subordinate norms, also called operator norms, we clearly have $\|Ax\| \le \|A\|\,\|x\|$. Since $\|ABx\| \le \|A\|\,\|Bx\| \le \|A\|\,\|B\|\,\|x\|$, it follows that all subordinate norms are consistent.
Theorem 7.47. There exists a vector $x^*$ such that $\|Ax^*\| = \|A\|\,\|x^*\|$ if the matrix norm is subordinate to the vector norm.

Theorem 7.48. If $\|\cdot\|_m$ is a consistent matrix norm, there exists a vector norm $\|\cdot\|_v$ consistent with it, i.e., $\|Ax\|_v \le \|A\|_m\|x\|_v$.

Not every consistent matrix norm is subordinate to a vector norm. For example, consider $\|\cdot\|_F$. Then $\|Ax\|_2 \le \|A\|_F\|x\|_2$, so $\|\cdot\|_2$ is consistent with $\|\cdot\|_F$, but there does not exist a vector norm $\|\cdot\|$ such that $\|A\|_F$ is given by $\max_{x\neq 0}\frac{\|Ax\|}{\|x\|}$.
Useful Results
The following miscellaneous results about matrix norms are collected for future reference. The interested reader is invited to prove each of them as an exercise.
1. $\|I_n\|_p = 1$ for all p, while $\|I_n\|_F = \sqrt{n}$.

2. For $A \in \mathbb{R}^{n\times n}$, the following inequalities are all tight, i.e., there exist matrices A for which equality holds:
$$\begin{array}{llll}
\|A\|_1 \le \sqrt{n}\,\|A\|_2, & \|A\|_2 \le \sqrt{n}\,\|A\|_1, & \|A\|_\infty \le n\,\|A\|_1, & \|A\|_F \le \sqrt{n}\,\|A\|_1, \\
\|A\|_1 \le n\,\|A\|_\infty, & \|A\|_2 \le \sqrt{n}\,\|A\|_\infty, & \|A\|_\infty \le \sqrt{n}\,\|A\|_2, & \|A\|_F \le \sqrt{n}\,\|A\|_2, \\
\|A\|_1 \le \sqrt{n}\,\|A\|_F, & \|A\|_2 \le \|A\|_F, & \|A\|_\infty \le \sqrt{n}\,\|A\|_F, & \|A\|_F \le \sqrt{n}\,\|A\|_\infty.
\end{array}$$
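The inequality table can be spot-checked numerically; here is a NumPy sketch (added for illustration; the random matrices and tolerances are ours, not the book's).

```python
import numpy as np

# Spot-check the norm inequalities on random square matrices; the small
# tolerances guard against floating-point rounding.
rng = np.random.default_rng(0)
for _ in range(100):
    n = int(rng.integers(2, 6))
    A = rng.standard_normal((n, n))
    one = np.linalg.norm(A, 1)        # max column sum
    two = np.linalg.norm(A, 2)        # sigma_1(A)
    inf = np.linalg.norm(A, np.inf)   # max row sum
    fro = np.linalg.norm(A, 'fro')
    rt = np.sqrt(n)
    assert one <= rt * two + 1e-12 and two <= rt * one + 1e-12
    assert inf <= n * one + 1e-12 and one <= n * inf + 1e-12
    assert inf <= rt * two + 1e-12 and two <= rt * inf + 1e-12
    assert two <= fro + 1e-12 and fro <= rt * two + 1e-12
    assert one <= rt * fro + 1e-12 and fro <= rt * one + 1e-12
    assert inf <= rt * fro + 1e-12 and fro <= rt * inf + 1e-12
```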
62 Chapter 7. Projections, Inner Product Spaces, and Norms
3. For $A \in \mathbb{R}^{m\times n}$,
$$\max_{i,j}|a_{ij}| \le \|A\|_2 \le \sqrt{mn}\,\max_{i,j}|a_{ij}|.$$

4. The norms $\|\cdot\|_F$ and $\|\cdot\|_2$ (as well as all the Schatten p-norms, but not necessarily other p-norms) are unitarily invariant; i.e., for all $A \in \mathbb{R}^{m\times n}$ and for all orthogonal matrices $Q \in \mathbb{R}^{m\times m}$ and $Z \in \mathbb{R}^{n\times n}$, $\|QAZ\|_\alpha = \|A\|_\alpha$ for $\alpha = 2$ or $F$.
Convergence
The following theorem uses matrix norms to convert a statement about convergence of a sequence of matrices into a statement about the convergence of an associated sequence of scalars.
Theorem 7.49. Let $\|\cdot\|$ be a matrix norm and suppose $A, A^{(1)}, A^{(2)}, \ldots \in \mathbb{R}^{m\times n}$. Then
$$\lim_{k\to+\infty} A^{(k)} = A \quad\text{if and only if}\quad \lim_{k\to+\infty}\|A^{(k)} - A\| = 0.$$
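A minimal numerical illustration of Theorem 7.49 (a NumPy sketch added here; the matrices $A$ and $E$ are made up): for $A^{(k)} = A + E/k$ the entries converge to those of $A$, and the norm of the difference goes to 0.

```python
import numpy as np

# As k grows, A^(k) = A + E/k converges to A entrywise, and equivalently
# ||A^(k) - A|| -> 0 in any matrix norm (Frobenius used here).
A = np.array([[1.0, 2.0], [0.0, 3.0]])
E = np.array([[0.5, -1.0], [2.0, 0.25]])
errs = [np.linalg.norm((A + E / k) - A, 'fro') for k in (1, 10, 100, 1000)]
assert all(a > b for a, b in zip(errs, errs[1:]))  # strictly decreasing
assert errs[-1] < 1e-2                             # tending to zero
```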
EXERCISES
1. If P is an orthogonal projection, prove that $P^+ = P$.

2. Suppose P and Q are orthogonal projections and $P + Q = I$. Prove that $P - Q$ must be an orthogonal matrix.

3. Prove that $I - A^+A$ is an orthogonal projection. Also, prove directly that $V_2V_2^T$ is an orthogonal projection, where $V_2$ is defined as in Theorem 5.1.

4. Suppose that a matrix $A \in \mathbb{R}^{m\times n}$ has linearly independent columns. Prove that the orthogonal projection onto the space spanned by these column vectors is given by the matrix $P = A(A^TA)^{-1}A^T$.

5. Find the (orthogonal) projection of the vector $[2\ 3\ 4]^T$ onto the subspace of $\mathbb{R}^3$ spanned by the plane $3x - y + 2z = 0$.

6. Prove that $\mathbb{R}^{n\times n}$ with the inner product $(A, B) = \mathrm{Tr}\,A^TB$ is a real inner product space.

7. Show that the matrix norms $\|\cdot\|_2$ and $\|\cdot\|_F$ are unitarily invariant.

8. Definition: Let $A \in \mathbb{R}^{n\times n}$ and denote its set of eigenvalues (not necessarily distinct) by $\{\lambda_1, \ldots, \lambda_n\}$. The spectral radius of A is the scalar
$$\rho(A) = \max_i |\lambda_i|.$$
Exercises 63
Let
A=[~ 0 ~]. 14 12 5
Determine $\|A\|_F$, $\|A\|_1$, $\|A\|_2$, $\|A\|_\infty$, and $\rho(A)$.
9. Let
$$A = \begin{bmatrix} 8 & 1 & 6 \\ 3 & 5 & 7 \\ 4 & 9 & 2 \end{bmatrix}.$$
Determine $\|A\|_F$, $\|A\|_1$, $\|A\|_2$, $\|A\|_\infty$, and $\rho(A)$. (An $n \times n$ matrix, all of whose columns and rows as well as main diagonal and antidiagonal sum to $s = n(n^2+1)/2$, is called a "magic square" matrix. If M is a magic square matrix, it can be proved that $\|M\|_p = s$ for all p.)
10. Let $A = xy^T$, where both $x, y \in \mathbb{R}^n$ are nonzero. Determine $\|A\|_F$, $\|A\|_1$, $\|A\|_2$, and $\|A\|_\infty$ in terms of $\|x\|_\alpha$ and/or $\|y\|_\beta$, where $\alpha$ and $\beta$ take the value 1, 2, or $\infty$ as appropriate.
Chapter 8
Linear Least Squares Problems
8.1 The Linear Least Squares Problem

Problem: Suppose $A \in \mathbb{R}^{m\times n}$ with $m \ge n$ and $b \in \mathbb{R}^m$ is a given vector. The linear least squares problem consists of finding an element of the set
$$\mathcal{X} = \{x \in \mathbb{R}^n : \rho(x) = \|Ax - b\|_2 \text{ is minimized}\}.$$
Solution: The set X has a number of easily verified properties:
1. A vector $x \in \mathcal{X}$ if and only if $A^Tr = 0$, where $r = b - Ax$ is the residual associated with x. The equations $A^Tr = 0$ can be rewritten in the form $A^TAx = A^Tb$ and the latter form is commonly known as the normal equations, i.e., $x \in \mathcal{X}$ if and only if x is a solution of the normal equations. For further details, see Section 8.2.
2. A vector x E X if and only if x is of the form
$$x = A^+b + (I - A^+A)y, \quad\text{where } y \in \mathbb{R}^n \text{ is arbitrary.} \qquad (8.1)$$
To see why this must be so, write the residual r in the form
$$r = (b - P_{\mathcal{R}(A)}b) + (P_{\mathcal{R}(A)}b - Ax).$$
Now, $(P_{\mathcal{R}(A)}b - Ax)$ is clearly in $\mathcal{R}(A)$, while
$$(b - P_{\mathcal{R}(A)}b) = (I - P_{\mathcal{R}(A)})b = P_{\mathcal{R}(A)^\perp}b \in \mathcal{R}(A)^\perp,$$
so these two vectors are orthogonal. Hence,
$$\|r\|_2^2 = \|b - Ax\|_2^2 = \|b - P_{\mathcal{R}(A)}b\|_2^2 + \|P_{\mathcal{R}(A)}b - Ax\|_2^2$$
from the Pythagorean identity (Remark 7.35). Thus, $\|Ax - b\|_2^2$ (and hence $\rho(x) = \|Ax - b\|_2$) assumes its minimum value if and only if
$$Ax = P_{\mathcal{R}(A)}b = AA^+b, \qquad (8.2)$$
66 Chapter 8. Linear Least Squares Problems
and this equation always has a solution since $AA^+b \in \mathcal{R}(A)$. By Theorem 6.3, all solutions of (8.2) are of the form
$$x = A^+AA^+b + (I - A^+A)y = A^+b + (I - A^+A)y,$$
where $y \in \mathbb{R}^n$ is arbitrary. The minimum value of $\rho(x)$ is then clearly equal to
$$\|b - P_{\mathcal{R}(A)}b\|_2 = \|(I - AA^+)b\|_2 \le \|b\|_2,$$
the last inequality following by Theorem 7.23.
3. $\mathcal{X}$ is convex. To see why, consider two arbitrary vectors $x_1 = A^+b + (I - A^+A)y$ and $x_2 = A^+b + (I - A^+A)z$ in $\mathcal{X}$. Let $\theta \in [0, 1]$. Then the convex combination $\theta x_1 + (1 - \theta)x_2 = A^+b + (I - A^+A)(\theta y + (1 - \theta)z)$ is clearly in $\mathcal{X}$.
4. $\mathcal{X}$ has a unique element $x^*$ of minimal 2-norm. In fact, $x^* = A^+b$ is the unique vector that solves this "double minimization" problem, i.e., $x^*$ minimizes the residual $\rho(x)$ and is the vector of minimum 2-norm that does so. This follows immediately from convexity or directly from the fact that all $x \in \mathcal{X}$ are of the form (8.1) and
$$\|x\|_2^2 = \|A^+b\|_2^2 + \|(I - A^+A)y\|_2^2,$$
which follows since the two vectors are orthogonal.

5. There is a unique solution to the least squares problem, i.e., $\mathcal{X} = \{x^*\} = \{A^+b\}$, if and only if $A^+A = I$ or, equivalently, if and only if $\mathrm{rank}(A) = n$.
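These properties of the solution set can be verified numerically; the following NumPy sketch (added for illustration, with a made-up rank-deficient $A$ so that the set has more than one element) checks the normal equations and the minimal-norm property.

```python
import numpy as np

# Every x = A^+ b + (I - A^+ A) y solves the normal equations, and
# x* = A^+ b has the smallest 2-norm among them.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # 5x4, rank <= 3
b = rng.standard_normal(5)
Ap = np.linalg.pinv(A)          # Moore-Penrose pseudoinverse A^+
xstar = Ap @ b                  # minimum-norm least squares solution
for _ in range(5):
    y = rng.standard_normal(4)
    x = xstar + (np.eye(4) - Ap @ A) @ y   # an element of the solution set
    assert np.allclose(A.T @ A @ x, A.T @ b)                   # normal equations
    assert np.linalg.norm(x) >= np.linalg.norm(xstar) - 1e-10  # x* has minimal norm
```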
Just as for the solution of linear equations, we can generalize the linear least squares problem to the matrix case.
Theorem 8.1. Let $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{m\times k}$. The general solution to
$$\min_{X \in \mathbb{R}^{n\times k}} \|AX - B\|_2$$
is of the form
$$X = A^+B + (I - A^+A)Y,$$
where $Y \in \mathbb{R}^{n\times k}$ is arbitrary. The unique solution of minimum 2-norm or F-norm is $X = A^+B$.
Remark 8.2. Notice that solutions of the linear least squares problem look exactly the same as solutions of the linear system $AX = B$. The only difference is that in the case of linear least squares solutions, there is no "existence condition" such as $\mathcal{R}(B) \subseteq \mathcal{R}(A)$. If the existence condition happens to be satisfied, then equality holds and the least squares
8.3 Linear Regression and Other Linear Least Squares Problems 67
residual is 0. Of all solutions that give a residual of 0, the unique solution $X = A^+B$ has minimum 2-norm or F-norm.
Remark 8.3. If we take $B = I_m$ in Theorem 8.1, then $X = A^+$ can be interpreted as saying that the Moore–Penrose pseudoinverse of A is the best (in the matrix 2-norm sense) matrix such that AX approximates the identity.
Remark 8.4. Many other interesting and useful approximation results are available for the matrix 2-norm (and F-norm). One such is the following. Let $A \in \mathbb{R}_r^{m\times n}$ with SVD
$$A = U\Sigma V^T = \sum_{i=1}^r \sigma_i u_i v_i^T.$$
Then a best rank k approximation to A for $1 \le k \le r$, i.e., a solution to
$$\min_{M \in \mathbb{R}_k^{m\times n}} \|A - M\|_2,$$
is given by
$$M_k = \sum_{i=1}^k \sigma_i u_i v_i^T.$$
The special case in which $m = n$ and $k = n - 1$ gives a nearest singular matrix to $A \in \mathbb{R}_n^{n\times n}$.
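The truncated-SVD approximation of Remark 8.4 is easy to form; here is a NumPy sketch (an added illustration with a random $A$, not from the original text).

```python
import numpy as np

# M_k, the sum of the first k rank-one SVD terms, is a best rank-k
# approximation in the 2-norm, and the error equals sigma_{k+1}.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Mk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]           # truncated SVD
assert np.linalg.matrix_rank(Mk) == k
assert np.isclose(np.linalg.norm(A - Mk, 2), s[k])   # error = sigma_{k+1}
```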
8.2 Geometric Solution

Looking at the schematic provided in Figure 8.1, it is apparent that minimizing $\|Ax - b\|_2$ is equivalent to finding the vector $x \in \mathbb{R}^n$ for which $p = Ax$ is closest to b (in the Euclidean norm sense). Clearly, $r = b - Ax$ must be orthogonal to $\mathcal{R}(A)$. Thus, if Ay is an arbitrary vector in $\mathcal{R}(A)$ (i.e., y is arbitrary), we must have
$$0 = (Ay)^T(b - Ax) = y^TA^T(b - Ax) = y^T(A^Tb - A^TAx).$$
Since y is arbitrary, we must have $A^Tb - A^TAx = 0$ or $A^TAx = A^Tb$.

Special case: If A is full (column) rank, then $x = (A^TA)^{-1}A^Tb$.
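This geometric characterization can be checked numerically; the following NumPy sketch (added here, with made-up random data) solves the normal equations for a full-column-rank $A$ and verifies that the residual is orthogonal to the columns of $A$.

```python
import numpy as np

# For full-column-rank A, x = (A^T A)^{-1} A^T b, and r = b - Ax is
# orthogonal to R(A).
rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))           # full column rank (generically)
b = rng.standard_normal(6)
x = np.linalg.solve(A.T @ A, A.T @ b)     # solve the normal equations
r = b - A @ x
assert np.allclose(A.T @ r, 0)            # r orthogonal to the columns of A
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
```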
8.3 Linear Regression and Other Linear Least Squares Problems
8.3.1 Example: Linear regression
Suppose we have m measurements $(t_1, y_1), \ldots, (t_m, y_m)$ for which we hypothesize a linear (affine) relationship
$$y = \alpha t + \beta \qquad (8.3)$$
68 Chapter 8. Linear Least Squares Problems
Figure 8.1. Projection of b on $\mathcal{R}(A)$. (The figure shows b, the residual r, and $p = Ax$, with $Ay \in \mathcal{R}(A)$ arbitrary.)
for certain constants $\alpha$ and $\beta$. One way to solve this problem is to find the line that best fits the data in the least squares sense; i.e., with the model (8.3), we have
$$y_1 = \alpha t_1 + \beta + \delta_1, \quad y_2 = \alpha t_2 + \beta + \delta_2, \quad \ldots, \quad y_m = \alpha t_m + \beta + \delta_m,$$
where $\delta_1, \ldots, \delta_m$ are "errors" and we wish to minimize $\delta_1^2 + \cdots + \delta_m^2$. Geometrically, we are trying to find the best line that minimizes the (sum of squares of the) distances from the given data points. See, for example, Figure 8.2.
Figure 8.2. Simple linear regression.
Note that distances are measured in the vertical sense from the points to the line (as indicated, for example, for the point $(t_1, y_1)$). However, other criteria are possible. For example, one could measure the distances in the horizontal sense, or the perpendicular distance from the points to the line could be used. The latter is called total least squares. Instead of 2-norms, one could also use 1-norms or $\infty$-norms. The latter two are computationally
8.3. Linear Regression and Other Linear Least Squares Problems 69
much more difficult to handle, and thus we present only the more tractable 2-norm case in the text that follows.
The m "error equations" can be written in matrix form as
$$y = Ax + \delta,$$
where
$$y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}, \quad A = \begin{bmatrix} t_1 & 1 \\ \vdots & \vdots \\ t_m & 1 \end{bmatrix}, \quad x = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \quad \delta = \begin{bmatrix} \delta_1 \\ \vdots \\ \delta_m \end{bmatrix}.$$
We then want to solve the problem
$$\min_x \delta^T\delta = \min_x (Ax - y)^T(Ax - y)$$
or, equivalently,
$$\min_x \|\delta\|_2^2 = \min_x \|Ax - y\|_2^2. \qquad (8.4)$$
Solution: $x = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}$ is a solution of the normal equations $A^TAx = A^Ty$ where, for the special form of the matrices above, we have
$$A^TA = \begin{bmatrix} \sum_i t_i^2 & \sum_i t_i \\ \sum_i t_i & m \end{bmatrix}$$
and
$$A^Ty = \begin{bmatrix} \sum_i t_iy_i \\ \sum_i y_i \end{bmatrix}.$$
The solution for the parameters $\alpha$ and $\beta$ can then be written
$$\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = (A^TA)^{-1}A^Ty = \frac{1}{m\sum_i t_i^2 - \left(\sum_i t_i\right)^2}\begin{bmatrix} m\sum_i t_iy_i - \sum_i t_i\sum_i y_i \\ \sum_i t_i^2\sum_i y_i - \sum_i t_i\sum_i t_iy_i \end{bmatrix}.$$
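As a quick numerical check of this regression setup, here is a NumPy sketch (added for illustration; the data points are made up), which forms the normal equations directly and compares against NumPy's own degree-1 fit.

```python
import numpy as np

# Fit y = alpha*t + beta by solving the normal equations A^T A x = A^T y.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
A = np.column_stack([t, np.ones_like(t)])        # rows [t_i, 1]
alpha, beta = np.linalg.solve(A.T @ A, A.T @ y)  # normal equations
assert np.allclose([alpha, beta], np.polyfit(t, y, 1))
```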
8.3.2 Other least squares problems

Suppose the hypothesized model is not the linear equation (8.3) but rather is of the form
$$y = f(t) = c_1\phi_1(t) + \cdots + c_n\phi_n(t). \qquad (8.5)$$
In (8.5) the $\phi_i(t)$ are given (basis) functions and the $c_i$ are constants to be determined to minimize the least squares error. The matrix problem is still (8.4), where we now have
$$A = \begin{bmatrix} \phi_1(t_1) & \cdots & \phi_n(t_1) \\ \vdots & & \vdots \\ \phi_1(t_m) & \cdots & \phi_n(t_m) \end{bmatrix}.$$
An important special case of (8.5) is least squares polynomial approximation, which corresponds to choosing $\phi_i(t) = t^{i-1}$, $i \in \underline{n}$, although this choice can lead to computational
70 Chapter 8. Linear Least Squares Problems
difficulties because of numerical ill conditioning for large n. Numerically better approaches are based on orthogonal polynomials, piecewise polynomial functions, splines, etc.
The key feature in (8.5) is that the coefficients $c_i$ appear linearly. The basis functions $\phi_i$ can be arbitrarily nonlinear. Sometimes a problem in which the $c_i$'s appear nonlinearly can be converted into a linear problem. For example, if the fitting function is of the form $y = f(t) = c_1e^{c_2t}$, then taking logarithms yields the equation $\log y = \log c_1 + c_2t$. Then defining $\tilde{y} = \log y$, $\tilde{c}_1 = \log c_1$, and $\tilde{c}_2 = c_2$ results in a standard linear least squares problem.
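The logarithmic change of variables can be illustrated numerically; the following NumPy sketch (added here, with exact made-up data $y = c_1e^{c_2t}$) recovers $c_1$ and $c_2$ from the linearized problem.

```python
import numpy as np

# Linearize y = c1*exp(c2*t) by taking logs: log y = log c1 + c2*t,
# then solve the resulting linear least squares problem.
c1, c2 = 2.0, -0.5
t = np.linspace(0.0, 3.0, 10)
y = c1 * np.exp(c2 * t)
A = np.column_stack([np.ones_like(t), t])        # model: log y = log c1 + c2*t
logc1, c2_hat = np.linalg.lstsq(A, np.log(y), rcond=None)[0]
assert np.isclose(np.exp(logc1), c1) and np.isclose(c2_hat, c2)
```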
8.4 Least Squares and Singular Value Decomposition

In the numerical linear algebra literature (e.g., [4], [7], [11], [23]), it is shown that solution of linear least squares problems via the normal equations can be a very poor numerical method in finite-precision arithmetic. Since the standard Kalman filter essentially amounts to sequential updating of normal equations, it can be expected to exhibit such poor numerical behavior in practice (and it does). Better numerical methods are based on algorithms that work directly and solely on A itself rather than $A^TA$. Two basic classes of algorithms are based on SVD and QR (orthogonal-upper triangular) factorization, respectively. The former is much more expensive but is generally more reliable and offers considerable theoretical insight.
In this section we investigate solution of the linear least squares problem
$$\min_x \|Ax - b\|_2, \quad A \in \mathbb{R}^{m\times n},\ b \in \mathbb{R}^m, \qquad (8.6)$$
via the SVD. Specifically, we assume that A has an SVD given by $A = U\Sigma V^T = U_1SV_1^T$ as in Theorem 5.1. We now note that
$$\begin{aligned}
\|Ax - b\|_2^2 &= \|U\Sigma V^Tx - b\|_2^2 \\
&= \|\Sigma V^Tx - U^Tb\|_2^2 \quad\text{since } \|\cdot\|_2 \text{ is unitarily invariant} \\
&= \|\Sigma z - c\|_2^2 \quad\text{where } z = V^Tx,\ c = U^Tb \\
&= \left\|\begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} z_1 \\ z_2 \end{bmatrix} - \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}\right\|_2^2 \\
&= \left\|\begin{bmatrix} Sz_1 - c_1 \\ -c_2 \end{bmatrix}\right\|_2^2.
\end{aligned}$$
The last equality follows from the fact that if $v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$, then $\|v\|_2^2 = \|v_1\|_2^2 + \|v_2\|_2^2$ (note that orthogonality is not what is used here; the subvectors can have different lengths). This explains why it is convenient to work above with the square of the norm rather than the norm. As far as the minimization is concerned, the two are equivalent. In fact, the last quantity above is clearly minimized by taking $z_1 = S^{-1}c_1$. The subvector $z_2$ is arbitrary, while the minimum value of $\|Ax - b\|_2^2$ is $\|c_2\|_2^2$.
8.5. Least Squares and QR Factorization 71
Now transform back to the original coordinates:
$$x = Vz = [V_1\ V_2]\begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = V_1z_1 + V_2z_2 = V_1S^{-1}c_1 + V_2z_2 = V_1S^{-1}U_1^Tb + V_2z_2.$$
The last equality follows from
$$c = U^Tb = \begin{bmatrix} U_1^Tb \\ U_2^Tb \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.$$
Note that since $z_2$ is arbitrary, $V_2z_2$ is an arbitrary vector in $\mathcal{R}(V_2) = \mathcal{N}(A)$. Thus, x has been written in the form $x = A^+b + (I - A^+A)y$, where $y \in \mathbb{R}^n$ is arbitrary. This agrees, of course, with (8.1).
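The SVD-based construction can be carried out explicitly; here is a NumPy sketch (added for illustration, with a made-up rank-deficient $A$) building $x = V_1S^{-1}U_1^Tb$ and comparing it against $A^+b$.

```python
import numpy as np

# Build the minimum-norm least squares solution V1 S^{-1} U1^T b from the
# SVD and compare against the pseudoinverse solution A^+ b.
rng = np.random.default_rng(4)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # rank 2
b = rng.standard_normal(5)
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))                     # numerical rank
U1, V1, S = U[:, :r], Vt[:r, :].T, np.diag(s[:r])
x = V1 @ np.linalg.solve(S, U1.T @ b)          # x = V1 S^{-1} U1^T b
assert np.allclose(x, np.linalg.pinv(A) @ b)   # the minimum-norm solution
```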
The minimum value of the least squares residual is
$$\|c_2\|_2 = \|U_2^Tb\|_2,$$
and we clearly have that

minimum least squares residual is 0 $\iff$ b is orthogonal to all vectors in $\mathcal{R}(U_2)$
$\iff$ b is orthogonal to all vectors in $\mathcal{R}(A)^\perp$
$\iff$ $b \in \mathcal{R}(A)$.
Another expression for the minimum residual is $\|(I - AA^+)b\|_2$. This follows easily since
$$\|(I - AA^+)b\|_2^2 = \|U_2U_2^Tb\|_2^2 = b^TU_2U_2^TU_2U_2^Tb = b^TU_2U_2^Tb = \|U_2^Tb\|_2^2.$$
Finally, an important special case of the linear least squares problem is the so-called full-rank problem, i.e., $A \in \mathbb{R}_n^{m\times n}$. In this case the SVD of A is given by $A = U\Sigma V^T = [U_1\ U_2]\begin{bmatrix} S \\ 0 \end{bmatrix}V_1^T$, and there is thus "no $V_2$ part" to the solution.
8.5 Least Squares and QR Factorization

In this section, we again look at the solution of the linear least squares problem (8.6) but this time in terms of the QR factorization. This matrix factorization is much cheaper to compute than an SVD and, with appropriate numerical enhancements, can be quite reliable.

To simplify the exposition, we add the simplifying assumption that A has full column rank, i.e., $A \in \mathbb{R}_n^{m\times n}$. It is then possible, via a sequence of so-called Householder or Givens transformations, to reduce A in the following way. A finite sequence of simple orthogonal row transformations (of Householder or Givens type) can be performed on A to reduce it to triangular form. If we label the product of such orthogonal row transformations as the orthogonal matrix $Q^T \in \mathbb{R}^{m\times m}$, we have
$$Q^TA = \begin{bmatrix} R \\ 0 \end{bmatrix}, \qquad (8.7)$$
72 Chapter 8. Linear Least Squares Problems
where $R \in \mathbb{R}_n^{n\times n}$ is upper triangular. Now write $Q = [Q_1\ Q_2]$, where $Q_1 \in \mathbb{R}^{m\times n}$ and $Q_2 \in \mathbb{R}^{m\times(m-n)}$. Both $Q_1$ and $Q_2$ have orthonormal columns. Multiplying through by Q in (8.7), we see that
$$A = Q\begin{bmatrix} R \\ 0 \end{bmatrix} \qquad (8.8)$$
$$= [Q_1\ Q_2]\begin{bmatrix} R \\ 0 \end{bmatrix}$$
$$= Q_1R. \qquad (8.9)$$
Any of (8.7), (8.8), or (8.9) are variously referred to as QR factorizations of A. Note that (8.9) is essentially what is accomplished by the Gram–Schmidt process, i.e., by writing $AR^{-1} = Q_1$ we see that a "triangular" linear combination (given by the coefficients of $R^{-1}$) of the columns of A yields the orthonormal columns of $Q_1$.

Now note that
$$\|Ax - b\|_2^2 = \|Q^TAx - Q^Tb\|_2^2 \quad\text{since } \|\cdot\|_2 \text{ is unitarily invariant}$$
$$= \left\|\begin{bmatrix} R \\ 0 \end{bmatrix}x - \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}\right\|_2^2, \quad\text{where } \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = Q^Tb.$$
The last quantity above is clearly minimized by taking $x = R^{-1}c_1$ and the minimum residual is $\|c_2\|_2$. Equivalently, we have $x = R^{-1}Q_1^Tb = A^+b$ and the minimum residual is $\|Q_2^Tb\|_2$.
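The QR route can be checked numerically; the following NumPy sketch (added for illustration, with made-up random data) computes $x = R^{-1}Q_1^Tb$ from the thin factorization and compares against a library least squares solve.

```python
import numpy as np

# Solve the full-column-rank least squares problem via the thin QR
# factorization A = Q1 R: x = R^{-1} Q1^T b, residual ||Q2^T b||_2.
rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
Q1, R = np.linalg.qr(A)                      # thin QR: Q1 is 6x3, R is 3x3
x = np.linalg.solve(R, Q1.T @ b)             # x = R^{-1} Q1^T b
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
residual = np.linalg.norm(A @ x - b)
# The residual is the norm of the part of b outside R(A):
assert np.isclose(residual, np.linalg.norm(b - Q1 @ (Q1.T @ b)))
```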
EXERCISES
1. For $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$, and any $y \in \mathbb{R}^n$, check directly that $(I - A^+A)y$ and $A^+b$ are orthogonal vectors.

2. Consider the following set of measurements $(x_i, y_i)$:
$$(1, 2),\ (2, 1),\ (3, 3).$$
(a) Find the best (in the 2-norm sense) line of the form $y = \alpha x + \beta$ that fits this data.
(b) Find the best (in the 2-norm sense) line of the form $x = \alpha y + \beta$ that fits this data.

3. Suppose $q_1$ and $q_2$ are two orthonormal vectors and b is a fixed vector, all in $\mathbb{R}^n$.
(a) Find the optimal linear combination $\alpha q_1 + \beta q_2$ that is closest to b (in the 2-norm sense).
(b) Let r denote the "error vector" $b - \alpha q_1 - \beta q_2$. Show that r is orthogonal to both $q_1$ and $q_2$.
Exercises 73
4. Find all solutions of the linear least squares problem
min II Ax - bll 2 x
when A = [ ~ 5. Consider the problem of finding the minimum 2-norm solution of the linear least
squares problem min II Ax - bl1 2
x
when A = [~ ~ ] and b = [ ! 1 The solution is
(a) Consider a perturbation EI = [~ ~] of A, where 8 is a small positive number. Solve the perturbed version of the above problem,
where AI = A + E I . What happens to IIx* - yII2 as 8 approaches O?
(b) Now consider the perturbation E2 = [~ n of A, where again 8 is a small positive number. Solve the perturbed problem
min II A2z - bib z
where A2 = A + E2 • What happens to IIx* - zll2 as 8 approaches O?
6. Use the four Penrose conditions and the fact that QI has orthonormal columns to verify that if A E ~;:,xn can be factored in the form (8.9), then A+ = R- I Qf.
7. Let A E ~nxn, not necessarily nonsingular, and suppose A = QR, where Q is orthogonal. Prove that A + = R+ QT .
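As a numerical sketch for Exercise 2(a): the best line $y = \alpha x + \beta$ in the 2-norm sense solves the $2 \times 2$ normal equations, which can be done by hand or in a few lines of pure Python:

```python
# Fit y = alpha*x + beta to (1,2), (2,1), (3,3) in the least squares sense by
# solving the 2x2 normal equations [[sum x^2, sum x], [sum x, n]] [alpha, beta]^T
# = [sum x*y, sum y]^T via Cramer's rule.
pts = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
sxx = sum(x * x for x, _ in pts)
sx  = sum(x for x, _ in pts)
sxy = sum(x * y for x, y in pts)
sy  = sum(y for _, y in pts)
n   = float(len(pts))

det = sxx * n - sx * sx
alpha = (sxy * n - sx * sy) / det
beta  = (sxx * sy - sx * sxy) / det
# alpha = 0.5, beta = 1.0: the best-fit line is y = x/2 + 1
```

Part (b) is the same computation with the roles of the x and y data exchanged, and in general gives a different line.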
Chapter 9

Eigenvalues and Eigenvectors

9.1 Fundamental Definitions and Properties

Definition 9.1. A nonzero vector $x \in \mathbb{C}^n$ is a right eigenvector of $A \in \mathbb{C}^{n \times n}$ if there exists a scalar $\lambda \in \mathbb{C}$, called an eigenvalue, such that

$Ax = \lambda x.$   (9.1)

Similarly, a nonzero vector $y \in \mathbb{C}^n$ is a left eigenvector corresponding to an eigenvalue $\mu$ if

$y^H A = \mu y^H.$   (9.2)

By taking Hermitian transposes in (9.1), we see immediately that $x^H$ is a left eigenvector of $A^H$ associated with $\bar{\lambda}$. Note that if x [y] is a right [left] eigenvector of A, then so is ax [ay] for any nonzero scalar $a \in \mathbb{C}$. One often-used scaling for an eigenvector is $a = 1/\|x\|$ so that the scaled eigenvector has norm 1. The 2-norm is the most common norm used for such scaling.

Definition 9.2. The polynomial $\pi(\lambda) = \det(A - \lambda I)$ is called the characteristic polynomial of A. (Note that the characteristic polynomial can also be defined as $\det(\lambda I - A)$. This results in at most a change of sign and, as a matter of convenience, we use both forms throughout the text.)

The following classical theorem can be very useful in hand calculation. It can be proved easily from the Jordan canonical form to be discussed in the text to follow (see, for example, [21]) or directly using elementary properties of inverses and determinants (see, for example, [3]).

Theorem 9.3 (Cayley-Hamilton). For any $A \in \mathbb{C}^{n \times n}$, $\pi(A) = 0$.

Example 9.4. Let $A = \begin{bmatrix} -1 & -2 \\ -2 & -1 \end{bmatrix}$. Then $\pi(\lambda) = \lambda^2 + 2\lambda - 3$. It is an easy exercise to verify that $\pi(A) = A^2 + 2A - 3I = 0$.
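A quick check of the Cayley-Hamilton theorem in pure Python, using a $2 \times 2$ matrix with trace $-2$ and determinant $-3$ (so that $\pi(\lambda) = \lambda^2 + 2\lambda - 3$, as in Example 9.4):

```python
# Cayley-Hamilton check: with pi(lambda) = lambda^2 + 2*lambda - 3, the matrix
# pi(A) = A^2 + 2A - 3I should be identically zero.
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[-1.0, -2.0], [-2.0, -1.0]]   # trace -2, determinant -3, eigenvalues 1 and -3
A2 = matmul(A, A)
I = [[1.0, 0.0], [0.0, 1.0]]
piA = [[A2[i][j] + 2 * A[i][j] - 3 * I[i][j] for j in range(2)] for i in range(2)]
# piA is the 2x2 zero matrix
```

Any other matrix with the same characteristic polynomial would pass the same check, since the theorem holds for every $A \in \mathbb{C}^{n \times n}$.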
It can be proved from elementary properties of determinants that if $A \in \mathbb{C}^{n \times n}$, then $\pi(\lambda)$ is a polynomial of degree n. Thus, the Fundamental Theorem of Algebra says that
$\pi(\lambda)$ has n roots, possibly repeated. These roots, as solutions of the determinant equation

$\pi(\lambda) = \det(A - \lambda I) = 0,$   (9.3)

are the eigenvalues of A and imply the singularity of the matrix $A - \lambda I$, and hence further guarantee the existence of corresponding nonzero eigenvectors.

Definition 9.5. The spectrum of $A \in \mathbb{C}^{n \times n}$ is the set of all eigenvalues of A, i.e., the set of all roots of its characteristic polynomial $\pi(\lambda)$. The spectrum of A is denoted $\Lambda(A)$.

Let the eigenvalues of $A \in \mathbb{C}^{n \times n}$ be denoted $\lambda_1, \ldots, \lambda_n$. Then if we write (9.3) in the form

$\pi(\lambda) = \det(A - \lambda I) = (\lambda_1 - \lambda) \cdots (\lambda_n - \lambda)$   (9.4)

and set $\lambda = 0$ in this identity, we get the interesting fact that $\det(A) = \lambda_1 \cdot \lambda_2 \cdots \lambda_n$ (see also Theorem 9.25).

If $A \in \mathbb{R}^{n \times n}$, then $\pi(\lambda)$ has real coefficients. Hence the roots of $\pi(\lambda)$, i.e., the eigenvalues of A, must occur in complex conjugate pairs.

Example 9.6. Let $\alpha, \beta \in \mathbb{R}$ and let $A = \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix}$. Then $\pi(\lambda) = \lambda^2 - 2\alpha\lambda + \alpha^2 + \beta^2$ and A has eigenvalues $\alpha \pm \beta j$ (where $j = i = \sqrt{-1}$).

If $A \in \mathbb{R}^{n \times n}$, then there is an easily checked relationship between the left and right eigenvectors of A and $A^T$ (take Hermitian transposes of both sides of (9.2)). Specifically, if y is a left eigenvector of A corresponding to $\lambda \in \Lambda(A)$, then y is a right eigenvector of $A^T$ corresponding to $\bar{\lambda} \in \Lambda(A)$. Note, too, that by elementary properties of the determinant, we always have $\Lambda(A) = \Lambda(A^T)$, but that $\Lambda(A) = \Lambda(A^H)$ only if $A \in \mathbb{R}^{n \times n}$.

Definition 9.7. If $\lambda$ is a root of multiplicity m of $\pi(\lambda)$, we say that $\lambda$ is an eigenvalue of A of algebraic multiplicity m. The geometric multiplicity of $\lambda$ is the number of associated independent eigenvectors $= n - \operatorname{rank}(A - \lambda I) = \dim \mathcal{N}(A - \lambda I)$.

If $\lambda \in \Lambda(A)$ has algebraic multiplicity m, then $1 \le \dim \mathcal{N}(A - \lambda I) \le m$. Thus, if we denote the geometric multiplicity of $\lambda$ by g, then we must have $1 \le g \le m$.

Definition 9.8. A matrix $A \in \mathbb{R}^{n \times n}$ is said to be defective if it has an eigenvalue whose geometric multiplicity is not equal to (i.e., less than) its algebraic multiplicity. Equivalently, A is said to be defective if it does not have n linearly independent (right or left) eigenvectors.

From the Cayley-Hamilton Theorem, we know that $\pi(A) = 0$. However, it is possible for A to satisfy a lower-order polynomial. For example, if $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, then A satisfies $(\lambda - 1)^2 = 0$. But it also clearly satisfies the smaller degree polynomial equation $(\lambda - 1) = 0$.

Definition 9.9. The minimal polynomial of $A \in \mathbb{R}^{n \times n}$ is the polynomial $\alpha(\lambda)$ of least degree such that $\alpha(A) = 0$.

It can be shown that $\alpha(\lambda)$ is essentially unique (unique if we force the coefficient of the highest power of $\lambda$ to be +1, say; such a polynomial is said to be monic and we generally write $\alpha(\lambda)$ as a monic polynomial throughout the text). Moreover, it can also be
shown that $\alpha(\lambda)$ divides every nonzero polynomial $\beta(\lambda)$ for which $\beta(A) = 0$. In particular, $\alpha(\lambda)$ divides $\pi(\lambda)$.

There is an algorithm to determine $\alpha(\lambda)$ directly (without knowing eigenvalues and associated eigenvector structure). Unfortunately, this algorithm, called the Bezout algorithm, is numerically unstable.

Example 9.10. The above definitions are illustrated below for a series of matrices, each of which has an eigenvalue 2 of algebraic multiplicity 4, i.e., $\pi(\lambda) = (\lambda - 2)^4$. We denote the geometric multiplicity by g.

$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix}$ has $\alpha(\lambda) = (\lambda - 2)^4$ and $g = 1$.

$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}$ has $\alpha(\lambda) = (\lambda - 2)^3$ and $g = 2$.

$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}$ has $\alpha(\lambda) = (\lambda - 2)^2$ and $g = 3$.

$A = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}$ has $\alpha(\lambda) = (\lambda - 2)$ and $g = 4$.

At this point, one might speculate that g plus the degree of $\alpha$ must always be five. Unfortunately, such is not the case. The matrix

$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix}$

has $\alpha(\lambda) = (\lambda - 2)^2$ and $g = 2$.
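The distinction between $\pi$ and $\alpha$ is easy to verify numerically. A pure-Python sketch for the last matrix above (two $2 \times 2$ Jordan blocks at eigenvalue 2): A satisfies $(\lambda - 2)^2$ but not $(\lambda - 2)$, so $\alpha(\lambda) = (\lambda - 2)^2$ even though $\pi(\lambda) = (\lambda - 2)^4$.

```python
# Minimal polynomial check: N = A - 2I is nonzero, but N^2 = 0.
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[2, 1, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 2, 1],
     [0, 0, 0, 2]]
N = [[A[i][j] - (2 if i == j else 0) for j in range(4)] for i in range(4)]  # N = A - 2I
N2 = matmul(N, N)
satisfies_degree_one = all(N[i][j] == 0 for i in range(4) for j in range(4))    # False
satisfies_degree_two = all(N2[i][j] == 0 for i in range(4) for j in range(4))   # True
```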
Theorem 9.11. Let $A \in \mathbb{C}^{n \times n}$ and let $\lambda_i$ be an eigenvalue of A with corresponding right eigenvector $x_i$. Furthermore, let $y_j$ be a left eigenvector corresponding to any $\lambda_j \in \Lambda(A)$ such that $\lambda_j \neq \lambda_i$. Then $y_j^H x_i = 0$.

Proof: Since $Ax_i = \lambda_i x_i$,

$y_j^H A x_i = \lambda_i y_j^H x_i.$   (9.5)
Similarly, since $y_j^H A = \lambda_j y_j^H$,

$y_j^H A x_i = \lambda_j y_j^H x_i.$   (9.6)

Subtracting (9.6) from (9.5), we find $0 = (\lambda_i - \lambda_j) y_j^H x_i$. Since $\lambda_i - \lambda_j \neq 0$, we must have $y_j^H x_i = 0$. $\Box$

The proof of Theorem 9.11 is very similar to two other fundamental and important results.

Theorem 9.12. Let $A \in \mathbb{C}^{n \times n}$ be Hermitian, i.e., $A = A^H$. Then all eigenvalues of A must be real.

Proof: Suppose $(\lambda, x)$ is an arbitrary eigenvalue/eigenvector pair such that $Ax = \lambda x$. Then

$x^H A x = \lambda x^H x.$   (9.7)

Taking Hermitian transposes in (9.7) yields

$x^H A^H x = \bar{\lambda} x^H x.$

Using the fact that A is Hermitian, we have that $\bar{\lambda} x^H x = \lambda x^H x$. However, since x is an eigenvector, we have $x^H x \neq 0$, from which we conclude $\bar{\lambda} = \lambda$, i.e., $\lambda$ is real. $\Box$

Theorem 9.13. Let $A \in \mathbb{C}^{n \times n}$ be Hermitian and suppose $\lambda$ and $\mu$ are distinct eigenvalues of A with corresponding right eigenvectors x and z, respectively. Then x and z must be orthogonal.

Proof: Premultiply the equation $Ax = \lambda x$ by $z^H$ to get $z^H A x = \lambda z^H x$. Take the Hermitian transpose of this equation and use the facts that A is Hermitian and $\lambda$ is real to get $x^H A z = \lambda x^H z$. Premultiply the equation $Az = \mu z$ by $x^H$ to get $x^H A z = \mu x^H z = \lambda x^H z$. Since $\lambda \neq \mu$, we must have that $x^H z = 0$, i.e., the two vectors must be orthogonal. $\Box$

Let us now return to the general case.

Theorem 9.14. Let $A \in \mathbb{C}^{n \times n}$ have distinct eigenvalues $\lambda_1, \ldots, \lambda_n$ with corresponding right eigenvectors $x_1, \ldots, x_n$. Then $\{x_1, \ldots, x_n\}$ is a linearly independent set. The same result holds for the corresponding left eigenvectors.

Proof: For the proof see, for example, [21, p. 118]. $\Box$

If $A \in \mathbb{C}^{n \times n}$ has distinct eigenvalues, and if $\lambda_i \in \Lambda(A)$, then by Theorem 9.11, $x_i$ is orthogonal to all $y_j$'s for which $j \neq i$. However, it cannot be the case that $y_i^H x_i = 0$ as well, or else $x_i$ would be orthogonal to n linearly independent vectors (by Theorem 9.14) and would thus have to be 0, contradicting the fact that it is an eigenvector. Since $y_i^H x_i \neq 0$ for each i, we can choose the normalization of the $x_i$'s, or the $y_i$'s, or both, so that $y_i^H x_i = 1$ for $i \in \underline{n}$.
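Theorem 9.13 is easy to illustrate numerically. A pure-Python sketch using a small symmetric matrix whose eigenpairs are known in closed form (the matrix is illustrative, not from the text):

```python
# For the symmetric matrix A = [[2, 1], [1, 2]], the eigenpairs are (3, [1, 1])
# and (1, [1, -1]); by Theorem 9.13 the two eigenvectors are orthogonal.
A = [[2, 1], [1, 2]]

def matvec(M, v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in M]

x, lam = [1, 1], 3    # A x = 3 x
z, mu  = [1, -1], 1   # A z = 1 z
assert matvec(A, x) == [lam * xi for xi in x]
assert matvec(A, z) == [mu * zi for zi in z]
inner = sum(xi * zi for xi, zi in zip(x, z))   # x^T z = 0
```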
Theorem 9.15. Let $A \in \mathbb{C}^{n \times n}$ have distinct eigenvalues $\lambda_1, \ldots, \lambda_n$ and let the corresponding right eigenvectors form a matrix $X = [x_1, \ldots, x_n]$. Similarly, let $Y = [y_1, \ldots, y_n]$ be the matrix of corresponding left eigenvectors. Furthermore, suppose that the left and right eigenvectors have been normalized so that $y_i^H x_i = 1$, $i \in \underline{n}$. Finally, let $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n) \in \mathbb{C}^{n \times n}$. Then $Ax_i = \lambda_i x_i$, $i \in \underline{n}$, can be written in matrix form as

$AX = X\Lambda$   (9.8)

while $y_i^H x_j = \delta_{ij}$, $i \in \underline{n}$, $j \in \underline{n}$, is expressed by the equation

$Y^H X = I.$   (9.9)

These matrix equations can be combined to yield the following matrix factorizations:

$X^{-1} A X = \Lambda = Y^H A X$   (9.10)

and

$A = X \Lambda X^{-1} = X \Lambda Y^H = \sum_{i=1}^{n} \lambda_i x_i y_i^H.$   (9.11)

Example 9.16. Let

$A = \begin{bmatrix} 2 & -3 & 1 \\ 5 & -2 & 5 \\ -3 & 1 & -4 \end{bmatrix}.$

Then $\pi(\lambda) = \det(A - \lambda I) = -(\lambda^3 + 4\lambda^2 + 9\lambda + 10) = -(\lambda + 2)(\lambda^2 + 2\lambda + 5)$, from which we find $\Lambda(A) = \{-2, -1 \pm 2j\}$. We can now find the right and left eigenvectors corresponding to these eigenvalues.

For $\lambda_1 = -2$, solve the $3 \times 3$ linear system $(A - (-2)I)x_1 = 0$ to get

$x_1 = \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}.$

Note that one component of $x_1$ can be set arbitrarily, and this then determines the other two (since $\dim \mathcal{N}(A - (-2)I) = 1$). To get the corresponding left eigenvector $y_1$, solve the linear system $y_1^T(A + 2I) = 0$ to get

$y_1 = \begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix}.$

This time we have chosen the arbitrary scale factor for $y_1$ so that $y_1^T x_1 = 1$. For $\lambda_2 = -1 + 2j$, solve the linear system $(A - (-1 + 2j)I)x_2 = 0$ to get

$x_2 = \begin{bmatrix} 3 + j \\ 3 - j \\ -2 \end{bmatrix}.$
Solve the linear system $y_2^H (A - (-1 + 2j)I) = 0$ and normalize $y_2$ so that $y_2^H x_2 = 1$ to get

$y_2 = \begin{bmatrix} (1+j)/4 \\ (1-j)/4 \\ 1/2 \end{bmatrix}.$

For $\lambda_3 = -1 - 2j$, we could proceed to solve linear systems as for $\lambda_2$. However, we can also note that $x_3 = \bar{x}_2$ and $y_3 = \bar{y}_2$. To see this, use the fact that $\lambda_3 = \bar{\lambda}_2$ and simply conjugate the equation $Ax_2 = \lambda_2 x_2$ to get $A\bar{x}_2 = \bar{\lambda}_2 \bar{x}_2$. A similar argument yields the result for left eigenvectors.

Now define the matrix X of right eigenvectors:

$X = \begin{bmatrix} -1 & 3+j & 3-j \\ -1 & 3-j & 3+j \\ 1 & -2 & -2 \end{bmatrix}.$

It is then easy to verify that

$X^{-1} = \begin{bmatrix} 1 & 1 & 3 \\ \frac{1-j}{4} & \frac{1+j}{4} & \frac{1}{2} \\ \frac{1+j}{4} & \frac{1-j}{4} & \frac{1}{2} \end{bmatrix} = Y^H.$

Other results in Theorem 9.15 can also be verified. For example,

$X^{-1} A X = \Lambda = \begin{bmatrix} -2 & 0 & 0 \\ 0 & -1+2j & 0 \\ 0 & 0 & -1-2j \end{bmatrix}.$

Finally, note that we could have solved directly only for $x_1$ and $x_2$ (and $x_3 = \bar{x}_2$). Then, instead of determining the $y_i$'s directly, we could have found them instead by computing $X^{-1}$ and reading off its rows.
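The eigenpairs of Example 9.16 can be checked directly in pure Python (complex arithmetic is built in); the sketch below assumes the matrix $A = [[2,-3,1],[5,-2,5],[-3,1,-4]]$ and the eigenvectors given above:

```python
# Check A x = lambda x for two of the eigenpairs of Example 9.16
# (eigenvalues -2 and -1 + 2j; the third pair is the conjugate of the second).
A = [[2, -3, 1], [5, -2, 5], [-3, 1, -4]]

def matvec(M, v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in M]

x1, lam1 = [-1, -1, 1], -2
x2, lam2 = [3 + 1j, 3 - 1j, -2], -1 + 2j
r1 = [a - lam1 * b for a, b in zip(matvec(A, x1), x1)]   # residual A x1 - lam1 x1
r2 = [a - lam2 * b for a, b in zip(matvec(A, x2), x2)]   # residual A x2 - lam2 x2
ok = all(abs(t) < 1e-12 for t in r1 + r2)
```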
Example 9.17. Let

$A = \begin{bmatrix} -3 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & -3 \end{bmatrix}.$

Then $\pi(\lambda) = \det(A - \lambda I) = -(\lambda^3 + 8\lambda^2 + 19\lambda + 12) = -(\lambda + 1)(\lambda + 3)(\lambda + 4)$, from which we find $\Lambda(A) = \{-1, -3, -4\}$. Proceeding as in the previous example, it is straightforward to compute

$X = \begin{bmatrix} 1 & 1 & 1 \\ 2 & 0 & -1 \\ 1 & -1 & 1 \end{bmatrix}$

and

$X^{-1} = \frac{1}{6}\begin{bmatrix} 1 & 2 & 1 \\ 3 & 0 & -3 \\ 2 & -2 & 2 \end{bmatrix} = Y^H.$
We also have $X^{-1}AX = \Lambda = \operatorname{diag}(-1, -3, -4)$, which is equivalent to the dyadic expansion

$A = \sum_{i=1}^{3} \lambda_i x_i y_i^H = (-1)\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}\begin{bmatrix} \frac{1}{6} & \frac{1}{3} & \frac{1}{6} \end{bmatrix} + (-3)\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}\begin{bmatrix} \frac{1}{2} & 0 & -\frac{1}{2} \end{bmatrix} + (-4)\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}\begin{bmatrix} \frac{1}{3} & -\frac{1}{3} & \frac{1}{3} \end{bmatrix}.$

Theorem 9.18. Eigenvalues (but not eigenvectors) are invariant under a similarity transformation T.

Proof: Suppose $(\lambda, x)$ is an eigenvalue/eigenvector pair such that $Ax = \lambda x$. Then, since T is nonsingular, we have the equivalent statement $(T^{-1}AT)(T^{-1}x) = \lambda(T^{-1}x)$, from which the theorem statement follows. For left eigenvectors we have a similar statement, namely $y^H A = \lambda y^H$ if and only if $(T^H y)^H (T^{-1}AT) = \lambda (T^H y)^H$. $\Box$

Remark 9.19. If f is an analytic function (e.g., f(x) is a polynomial, or $e^x$, or $\sin x$, or, in general, representable by a power series $\sum_{n=0}^{\infty} a_n x^n$), then it is easy to show that the eigenvalues of f(A) (defined as $\sum_{n=0}^{\infty} a_n A^n$) are $f(\lambda)$, but f(A) does not necessarily have all the same eigenvectors (unless, say, A is diagonalizable). For example, $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ has only one right eigenvector corresponding to the eigenvalue 0, but $A^2 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$ has two independent right eigenvectors associated with the eigenvalue 0. What is true is that the eigenvalue/eigenvector pair $(\lambda, x)$ maps to $(f(\lambda), x)$ but not conversely.

The following theorem is useful when solving systems of linear differential equations. Details of how the matrix exponential $e^{tA}$ is used to solve the system $\dot{x} = Ax$ are the subject of Chapter 11.

Theorem 9.20. Let $A \in \mathbb{R}^{n \times n}$ and suppose $X^{-1}AX = \Lambda$, where $\Lambda$ is diagonal. Then

$e^{tA} = X e^{t\Lambda} X^{-1} = \sum_{i=1}^{n} e^{\lambda_i t} x_i y_i^H.$
Proof: Starting from the definition, we have

$e^{tA} = \sum_{k=0}^{\infty} \frac{t^k A^k}{k!} = \sum_{k=0}^{\infty} \frac{t^k (X\Lambda X^{-1})^k}{k!} = X \left( \sum_{k=0}^{\infty} \frac{t^k \Lambda^k}{k!} \right) X^{-1} = X e^{t\Lambda} X^{-1} = \sum_{i=1}^{n} e^{\lambda_i t} x_i y_i^H. \quad \Box$

The following corollary is immediate from the theorem upon setting t = 1.

Corollary 9.21. If $A \in \mathbb{R}^{n \times n}$ is diagonalizable with eigenvalues $\lambda_i$, $i \in \underline{n}$, and right eigenvectors $x_i$, $i \in \underline{n}$, then $e^A$ has eigenvalues $e^{\lambda_i}$, $i \in \underline{n}$, and the same eigenvectors.

There are extensions to Theorem 9.20 and Corollary 9.21 for any function that is analytic on the spectrum of A, i.e., $f(A) = X f(\Lambda) X^{-1} = X \operatorname{diag}(f(\lambda_1), \ldots, f(\lambda_n)) X^{-1}$.

It is desirable, of course, to have a version of Theorem 9.20 and its corollary in which A is not necessarily diagonalizable. It is necessary first to consider the notion of Jordan canonical form, from which such a result is then available and presented later in this chapter.
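Theorem 9.20 can be illustrated with a small diagonalizable matrix. In the pure-Python sketch below (the matrix $A = [[0,1],[1,0]]$ is illustrative, not from the text), A has eigenvalues $\pm 1$ with eigenvectors $[1, 1]^T$ and $[1, -1]^T$, and $X e^{t\Lambda} X^{-1}$ works out in closed form to $[[\cosh t, \sinh t], [\sinh t, \cosh t]]$:

```python
# e^{tA} = X e^{t Lambda} X^{-1} for the diagonalizable A = [[0, 1], [1, 0]].
from math import cosh, sinh, exp

t = 0.7
lam = [1.0, -1.0]                        # eigenvalues of A
X = [[1.0, 1.0], [1.0, -1.0]]            # columns are the eigenvectors
Xinv = [[0.5, 0.5], [0.5, -0.5]]
D = [exp(t * l) for l in lam]            # diagonal of e^{t Lambda}
E = [[sum(X[i][k] * D[k] * Xinv[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
expected = [[cosh(t), sinh(t)], [sinh(t), cosh(t)]]
err = max(abs(E[i][j] - expected[i][j]) for i in range(2) for j in range(2))
```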
9.2 Jordan Canonical Form

Theorem 9.22.

1. Jordan Canonical Form (JCF): For all $A \in \mathbb{C}^{n \times n}$ with eigenvalues $\lambda_1, \ldots, \lambda_n \in \mathbb{C}$ (not necessarily distinct), there exists nonsingular $X \in \mathbb{C}^{n \times n}$ such that

$X^{-1} A X = J = \operatorname{diag}(J_1, \ldots, J_q),$   (9.12)

where each of the Jordan block matrices $J_1, \ldots, J_q$ is of the form

$J_i = \begin{bmatrix} \lambda_i & 1 & 0 & \cdots & 0 \\ 0 & \lambda_i & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \lambda_i & 1 \\ 0 & \cdots & 0 & 0 & \lambda_i \end{bmatrix} \in \mathbb{C}^{k_i \times k_i}$   (9.13)

and $\sum_{i=1}^{q} k_i = n$.
2. Real Jordan Canonical Form: For all $A \in \mathbb{R}^{n \times n}$ with eigenvalues $\lambda_1, \ldots, \lambda_n$ (not necessarily distinct), there exists nonsingular $X \in \mathbb{R}^{n \times n}$ such that

$X^{-1} A X = J = \operatorname{diag}(J_1, \ldots, J_q),$   (9.14)

where each of the Jordan block matrices $J_1, \ldots, J_q$ is of the form

$J_i = \begin{bmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{bmatrix}$

in the case of real eigenvalues $\lambda_i \in \Lambda(A)$, and

$J_i = \begin{bmatrix} M_i & I_2 & & \\ & M_i & \ddots & \\ & & \ddots & I_2 \\ & & & M_i \end{bmatrix},$

where $M_i = \begin{bmatrix} \alpha_i & \beta_i \\ -\beta_i & \alpha_i \end{bmatrix}$ and $I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, in the case of complex conjugate eigenvalues $\alpha_i \pm j\beta_i \in \Lambda(A)$.

Proof: For the proof see, for example, [21, pp. 120-124]. $\Box$

Transformations like $T = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -j \\ -j & 1 \end{bmatrix}$ allow us to go back and forth between a real JCF and its complex counterpart:

$T^{-1} \begin{bmatrix} \alpha + j\beta & 0 \\ 0 & \alpha - j\beta \end{bmatrix} T = \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix} = M.$

For nontrivial Jordan blocks, the situation is only a bit more complicated. With

$T = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -j & 0 & 0 \\ 0 & 0 & 1 & -j \\ -j & 1 & 0 & 0 \\ 0 & 0 & -j & 1 \end{bmatrix},$
it is easily checked that

$T^{-1} \begin{bmatrix} \alpha + j\beta & 1 & 0 & 0 \\ 0 & \alpha + j\beta & 0 & 0 \\ 0 & 0 & \alpha - j\beta & 1 \\ 0 & 0 & 0 & \alpha - j\beta \end{bmatrix} T = \begin{bmatrix} M & I_2 \\ 0 & M \end{bmatrix}.$

Definition 9.23. The characteristic polynomials of the Jordan blocks defined in Theorem 9.22 are called the elementary divisors or invariant factors of A.

Theorem 9.24. The characteristic polynomial of a matrix is the product of its elementary divisors. The minimal polynomial of a matrix is the product of the elementary divisors of highest degree corresponding to distinct eigenvalues.

Theorem 9.25. Let $A \in \mathbb{C}^{n \times n}$ with eigenvalues $\lambda_1, \ldots, \lambda_n$. Then

1. $\det(A) = \prod_{i=1}^{n} \lambda_i$.

2. $\operatorname{Tr}(A) = \sum_{i=1}^{n} \lambda_i$.

Proof:

1. From Theorem 9.22 we have that $A = XJX^{-1}$. Thus, $\det(A) = \det(XJX^{-1}) = \det(J) = \prod_{i=1}^{n} \lambda_i$.

2. Again, from Theorem 9.22 we have that $A = XJX^{-1}$. Thus, $\operatorname{Tr}(A) = \operatorname{Tr}(XJX^{-1}) = \operatorname{Tr}(JX^{-1}X) = \operatorname{Tr}(J) = \sum_{i=1}^{n} \lambda_i$. $\Box$

Example 9.26. Suppose $A \in \mathbb{R}^{7 \times 7}$ is known to have $\pi(\lambda) = (\lambda - 1)^4(\lambda - 2)^3$ and $\alpha(\lambda) = (\lambda - 1)^2(\lambda - 2)^2$. Then A has two possible JCFs (not counting reorderings of the diagonal blocks):

$J^{(1)} = \operatorname{diag}\!\left( \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, [1], [1], \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}, [2] \right)$ and $J^{(2)} = \operatorname{diag}\!\left( \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}, [2] \right).$

Note that $J^{(1)}$ has elementary divisors $(\lambda - 1)^2$, $(\lambda - 1)$, $(\lambda - 1)$, $(\lambda - 2)^2$, and $(\lambda - 2)$, while $J^{(2)}$ has elementary divisors $(\lambda - 1)^2$, $(\lambda - 1)^2$, $(\lambda - 2)^2$, and $(\lambda - 2)$.
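Theorem 9.25 is easy to check numerically. A pure-Python sketch, assuming a $3 \times 3$ matrix with eigenvalues $-1$, $-3$, $-4$ (the tridiagonal matrix used in Example 9.17):

```python
# det(A) equals the product of the eigenvalues and Tr(A) equals their sum.
A = [[-3, 1, 0], [1, -2, 1], [0, 1, -3]]   # eigenvalues -1, -3, -4

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

eigs = [-1, -3, -4]
prod = eigs[0] * eigs[1] * eigs[2]       # -12
tr = A[0][0] + A[1][1] + A[2][2]         # -8
```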
9.3. Determination of the JCF 85
Example 9.27. Knowing rr(A), a(A), and rank(A - Ai l) for distinct Ai is not sufficient to determine the JCF of A uniquely. The matrices
a 0 0 0 0 0 a 0 0 0 0 0 0 a 0 0 0 0 0 a 0 0 0 0 0 0 a 0 0 0 0 0 0 a 0 0 0 0
Al= 0 0 0 a 0 0 A2 = 0 0 0 a 0 0 0 0 0 0 a 0 0 0 0 0 0 a 0 0 0 0 0 0 a 1 0 0 0 0 0 a 0 0 0 0 0 0 0 a 0 0 0 0 0 0 a
both have rr(A) = (A - a)7, a(A) = (A - a)\ and rank(A - al) = 4, i.e., three eigenvectors.
9.3 Determination of the JCF

The first critical item of information in determining the JCF of a matrix A ∈ ℝ^{n×n} is its number of eigenvectors. For each distinct eigenvalue λᵢ, the associated number of linearly independent right (or left) eigenvectors is given by dim N(A − λᵢI) = n − rank(A − λᵢI). The straightforward case is, of course, when λᵢ is simple, i.e., of algebraic multiplicity 1; it then has precisely one eigenvector. The more interesting (and difficult) case occurs when λᵢ is of algebraic multiplicity greater than one. For example, suppose

    [ 3 2 1 ]
A = [ 0 3 0 ] .
    [ 0 0 3 ]

Then

         [ 0 2 1 ]
A − 3I = [ 0 0 0 ]
         [ 0 0 0 ]

has rank 1, so the eigenvalue 3 has two eigenvectors associated with it. If we let [ξ₁ ξ₂ ξ₃]^T denote a solution to the linear system (A − 3I)ξ = 0, we find that 2ξ₂ + ξ₃ = 0. Thus, both

x₁ = [ 1 0 0 ]^T   and   x₂ = [ 0 1 −2 ]^T

are eigenvectors (and are independent). To get a third vector x₃ such that X = [x₁ x₂ x₃] reduces A to JCF, we need the notion of principal vector.
Definition 9.28. Let A ∈ ℂ^{n×n} (or ℝ^{n×n}). Then x is a right principal vector of degree k associated with λ ∈ Λ(A) if and only if (A − λI)^k x = 0 and (A − λI)^{k−1} x ≠ 0.

Remark 9.29.

1. An analogous definition holds for a left principal vector of degree k.
2. The phrase "of grade k" is often used synonymously with "of degree k."
3. Principal vectors are sometimes also called generalized eigenvectors, but the latter term will be assigned a much different meaning in Chapter 12.
4. The case k = 1 corresponds to the "usual" eigenvector.
5. A right (or left) principal vector of degree k is associated with a Jordan block Jᵢ of dimension k or larger.
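As a quick sanity check of Definition 9.28 (a NumPy sketch, not from the text): for a single k × k Jordan block, the kth standard basis vector is a principal vector of degree k.

```python
import numpy as np

# 3 x 3 Jordan block with eigenvalue lam
lam = 2.0
J = np.array([[lam, 1.0, 0.0],
              [0.0, lam, 1.0],
              [0.0, 0.0, lam]])
N = J - lam * np.eye(3)          # J - lam*I; nilpotent of index 3

e3 = np.array([0.0, 0.0, 1.0])   # third standard basis vector
# e3 is a principal vector of degree 3:
deg3_zero = np.linalg.matrix_power(N, 3) @ e3     # (J - lam I)^3 e3 = 0
deg2_nonzero = np.linalg.matrix_power(N, 2) @ e3  # (J - lam I)^2 e3 = e1 != 0

print(deg3_zero)      # [0. 0. 0.]
print(deg2_nonzero)   # [1. 0. 0.]
```

The case k = 1 recovers the usual eigenvector condition, as noted in item 4 above.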
9.3.1 Theoretical computation
To motivate the development of a procedure for determining principal vectors, consider a 2 × 2 Jordan block

[ λ 1 ]
[ 0 λ ] .

Denote by x^{(1)} and x^{(2)} the two columns of a nonsingular matrix X ∈ ℝ^{2×2} that reduces a matrix A to this JCF. Then the equation AX = XJ can be written

A [x^{(1)} x^{(2)}] = [x^{(1)} x^{(2)}] [ λ 1 ]
                                       [ 0 λ ] .

The first column yields the equation Ax^{(1)} = λx^{(1)}, which simply says that x^{(1)} is a right eigenvector. The second column yields the following equation for x^{(2)}, the principal vector of degree 2:
(A − λI)x^{(2)} = x^{(1)}.    (9.17)
If we premultiply (9.17) by (A − λI), we find (A − λI)^2 x^{(2)} = (A − λI)x^{(1)} = 0. Thus, the definition of principal vector is satisfied.
This suggests a "general" procedure. First, determine all eigenvalues of A E lRnxn
(or cnxn ). Then for each distinct A E A(A) perform the following:
1. Solve

(A − λI)x^{(1)} = 0.
This step finds all the eigenvectors (i.e., principal vectors of degree 1) associated with λ. The number of eigenvectors depends on the rank of A − λI. For example, if rank(A − λI) = n − 1, there is only one eigenvector. If the algebraic multiplicity of λ is greater than its geometric multiplicity, principal vectors still need to be computed from succeeding steps.
2. For each independent x^{(1)}, solve

(A − λI)x^{(2)} = x^{(1)}.
The number of linearly independent solutions at this step depends on the rank of (A − λI)^2. If, for example, this rank is n − 2, there are two linearly independent solutions to the homogeneous equation (A − λI)^2 x^{(2)} = 0. One of these solutions is, of course, x^{(1)} (≠ 0), since (A − λI)^2 x^{(1)} = (A − λI)0 = 0. The other solution is the desired principal vector of degree 2. (It may be necessary to take a linear combination of x^{(1)} vectors to get a right-hand side that is in R(A − λI). See, for example, Exercise 7.)
3. For each independent x^{(2)} from step 2, solve

(A − λI)x^{(3)} = x^{(2)}.
4. Continue in this way until the total number of independent eigenvectors and principal vectors is equal to the algebraic multiplicity of λ.
Unfortunately, this natural-looking procedure can fail to find all Jordan vectors. For more extensive treatments, see, for example, [20] and [21]. Determination of eigenvectors and principal vectors is obviously very tedious for anything beyond simple problems (n = 2 or 3, say). Attempts to do such calculations in finite-precision floating-point arithmetic generally prove unreliable. There are significant numerical difficulties inherent in attempting to compute a JCF, and the interested student is strongly urged to consult the classical and very readable [8] to learn why. Notice that high-quality mathematical software such as MATLAB does not offer a jcf command, although a jordan command is available in MATLAB's Symbolic Toolbox.
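The steps above can be sketched numerically (a NumPy sketch using the 3 × 3 matrix from the start of this section; null spaces are computed via the SVD, and this is illustrative only, not a reliable JCF algorithm, for the reasons just discussed):

```python
import numpy as np

A = np.array([[3.0, 2.0, 1.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 3.0]])
lam = 3.0
M = A - lam * np.eye(3)

# Step 1: eigenvectors span the null space of A - lam*I, computed via the SVD.
_, s, Vt = np.linalg.svd(M)
rank = int(np.sum(s > 1e-12))
eigvecs = Vt[rank:].T        # columns span N(A - lam*I); here rank = 1, so 2 eigenvectors

# Step 2: principal vector of degree 2: solve (A - lam*I) x2 = x1 for an
# eigenvector x1 lying in R(A - lam*I).  Here x1 = e1 works.
x1 = np.array([1.0, 0.0, 0.0])
x2, *_ = np.linalg.lstsq(M, x1, rcond=None)

print(rank)                                               # 1, so 3 - 1 = 2 eigenvectors
print(np.allclose(M @ x2, x1))                            # True
print(np.allclose(np.linalg.matrix_power(M, 2) @ x2, 0))  # True: degree-2 condition
```

Note that the least-squares solve only succeeds because the right-hand side e₁ lies in R(A − 3I), exactly the caveat raised in step 2.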
Theorem 9.30. Suppose A ∈ ℂ^{k×k} has an eigenvalue λ of algebraic multiplicity k and suppose further that rank(A − λI) = k − 1. Let X = [x^{(1)}, ..., x^{(k)}], where the chain of vectors x^{(i)} is constructed as above. Then X is nonsingular and X^{-1}AX is a single k × k Jordan block with eigenvalue λ.
Theorem 9.31. {x^{(1)}, ..., x^{(k)}} is a linearly independent set.
Theorem 9.32. Principal vectors associated with different Jordan blocks are linearly independent.
Example 9.33. Let

    [ 1 1 2 ]
A = [ 0 1 3 ] .
    [ 0 0 2 ]

The eigenvalues of A are λ₁ = 1, λ₂ = 1, and λ₃ = 2. First, find the eigenvectors associated with the distinct eigenvalues 1 and 2. (A − 2I)x₃^{(1)} = 0 yields

x₃^{(1)} = [ 5 3 1 ]^T.
(A − 1I)x₁^{(1)} = 0 yields

x₁^{(1)} = [ 1 0 0 ]^T.
To find a principal vector of degree 2 associated with the multiple eigenvalue 1, solve (A − 1I)x₁^{(2)} = x₁^{(1)} to get

x₁^{(2)} = [ 0 1 0 ]^T.
Now let

                                 [ 1 0 5 ]
X = [x₁^{(1)} x₁^{(2)} x₃^{(1)}] = [ 0 1 3 ] .
                                 [ 0 0 1 ]
Then it is easy to check that

         [ 1 0 −5 ]                   [ 1 1 0 ]
X^{-1} = [ 0 1 −3 ]   and   X^{-1}AX = [ 0 1 0 ] .
         [ 0 0  1 ]                   [ 0 0 2 ]
9.3.2 On the +1's in JCF blocks
In this subsection we show that the nonzero superdiagonal elements of a JCF need not be 1's but can be arbitrary, so long as they are nonzero. For the sake of definiteness, we consider below the case of a single Jordan block, but the result clearly holds for any JCF. Suppose A ∈ ℝ^{n×n} and

              [ λ 1         0 ]
              [   λ 1         ]
X^{-1}AX = J = [     ⋱ ⋱       ] .
              [         λ 1   ]
              [ 0           λ ]
Let D = diag(d₁, ..., dₙ) be a nonsingular "scaling" matrix. Then

D^{-1}(X^{-1}AX)D = D^{-1}JD = J̃ =

    [ λ  d₂/d₁                      0  ]
    [     λ     d₃/d₂                  ]
    [            ⋱      ⋱              ]
    [                   λ    dₙ/dₙ₋₁   ]
    [ 0                      λ         ] .
Appropriate choice of the dᵢ's then yields any desired nonzero superdiagonal elements. This result can also be interpreted in terms of the matrix X = [x₁, ..., xₙ] of eigenvectors and principal vectors that reduces A to its JCF. Specifically, J̃ is obtained from A via the similarity transformation XD = [d₁x₁, ..., dₙxₙ].
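For instance (a NumPy sketch with an arbitrarily chosen block and scaling): with D = diag(1, 2, 6), the superdiagonal of D^{-1}JD becomes (d₂/d₁, d₃/d₂) = (2, 3):

```python
import numpy as np

lam = 4.0
J = np.array([[lam, 1.0, 0.0],
              [0.0, lam, 1.0],
              [0.0, 0.0, lam]])
d = np.array([1.0, 2.0, 6.0])
D = np.diag(d)
Jhat = np.linalg.inv(D) @ J @ D   # entry (i,j) scales by d_j / d_i

print(Jhat)
# diagonal still lam; superdiagonal entries d[1]/d[0] = 2 and d[2]/d[1] = 3
```

Conversely, given desired superdiagonal entries one can solve the recurrence dⱼ₊₁ = (desired entry)·dⱼ for the scaling.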
In a similar fashion, the reverse-order identity matrix (or exchange matrix)

                [ 0       0  1 ]
                [ 0       1  0 ]
P = P^T = P^{-1} = [    ⋰         ]        (9.18)
                [ 1  0       0 ]
can be used to put the superdiagonal elements in the subdiagonal instead if that is desired:
       [ λ 1         0 ]       [ λ           0 ]
       [   λ 1         ]       [ 1 λ           ]
P^{-1} [     ⋱ ⋱       ] P  =  [   ⋱  ⋱        ] .
       [         λ 1   ]       [       λ       ]
       [ 0           λ ]       [ 0       1  λ  ]
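A quick check (a NumPy sketch) that conjugation by the exchange matrix moves the superdiagonal 1's to the subdiagonal:

```python
import numpy as np

lam = 4.0
J = np.array([[lam, 1.0, 0.0],
              [0.0, lam, 1.0],
              [0.0, 0.0, lam]])
P = np.fliplr(np.eye(3))   # reverse-order identity; P = P^T = P^{-1}
Jlow = P @ J @ P           # P^{-1} J P, since P is its own inverse

print(Jlow)
# [[4. 0. 0.]
#  [1. 4. 0.]
#  [0. 1. 4.]]
```

Conjugation by P reverses both the row and column order, which is exactly what reflects the superdiagonal into the subdiagonal.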
9.4 Geometric Aspects of the JCF

The matrix X that reduces a matrix A ∈ ℝ^{n×n} (or ℂ^{n×n}) to a JCF provides a change of basis with respect to which the matrix is diagonal or block diagonal. It is thus natural to expect an associated direct sum decomposition of ℝⁿ. Such a decomposition is given in the following theorem.
Theorem 9.34. Suppose A ∈ ℝ^{n×n} has characteristic polynomial

π(λ) = (λ − λ₁)^{n₁} ··· (λ − λₘ)^{nₘ}

and minimal polynomial

α(λ) = (λ − λ₁)^{ν₁} ··· (λ − λₘ)^{νₘ}

with λ₁, ..., λₘ distinct. Then

ℝⁿ = N(A − λ₁I)^{n₁} ⊕ ··· ⊕ N(A − λₘI)^{nₘ}
   = N(A − λ₁I)^{ν₁} ⊕ ··· ⊕ N(A − λₘI)^{νₘ}.
Note that dim N(A − λᵢI)^{νᵢ} = nᵢ.
Definition 9.35. Let V be a vector space over 𝔽 and suppose A : V → V is a linear transformation. A subspace 𝒮 ⊆ V is A-invariant if A𝒮 ⊆ 𝒮, where A𝒮 is defined as the set {As : s ∈ 𝒮}.
If V is taken to be ℝⁿ over ℝ, and S ∈ ℝ^{n×k} is a matrix whose columns s₁, ..., sₖ span a k-dimensional subspace 𝒮, i.e., R(S) = 𝒮, then 𝒮 is A-invariant if and only if there exists M ∈ ℝ^{k×k} such that

AS = SM.    (9.19)
This follows easily by comparing the ith columns of each side of (9.19):

Asᵢ = Smᵢ = m₁ᵢ s₁ + ··· + mₖᵢ sₖ ∈ 𝒮,

where mᵢ denotes the ith column of M.
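To illustrate (9.19) numerically (a NumPy sketch with an arbitrarily chosen A): the span of two eigenvectors is A-invariant, and the matrix M can be recovered by least squares.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 5.0]])
# Columns of S are eigenvectors of A (for eigenvalues 2 and 3), so R(S) is A-invariant.
S = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])
M, *_ = np.linalg.lstsq(S, A @ S, rcond=None)   # solve S M = A S for M

print(np.allclose(A @ S, S @ M))   # True: zero residual, so R(S) is A-invariant
print(M)                           # diag(2, 3): the eigenvalues reappear
```

If R(S) were not A-invariant, the least-squares residual A·S − S·M would be nonzero, so this also serves as a numerical test for invariance.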
Example 9.36. The equation Ax = λx = xλ defining a right eigenvector x of an eigenvalue λ says that x spans an A-invariant subspace (of dimension one).
Example 9.37. Suppose X block diagonalizes A, i.e.,

X^{-1}AX = [ J₁ 0  ]
           [ 0  J₂ ] .

Rewriting in the form

A [X₁ X₂] = [X₁ X₂] [ J₁ 0  ]
                    [ 0  J₂ ] ,

we have that AXᵢ = XᵢJᵢ, i = 1, 2, so the columns of Xᵢ span an A-invariant subspace.
Theorem 9.38. Suppose A ∈ ℝ^{n×n}.

1. Let p(A) = α₀I + α₁A + ··· + α_q A^q be a polynomial in A. Then N(p(A)) and R(p(A)) are A-invariant.

2. 𝒮 is A-invariant if and only if 𝒮^⊥ is A^T-invariant.
Theorem 9.39. If V is a vector space over 𝔽 such that V = N₁ ⊕ ··· ⊕ Nₘ, where each Nᵢ is A-invariant, then a basis for V can be chosen with respect to which A has a block diagonal representation.
The Jordan canonical form is a special case of the above theorem. If A has distinct eigenvalues λᵢ as in Theorem 9.34, we could choose bases for N(A − λᵢI)^{nᵢ} by SVD, for example (note that the power nᵢ could be replaced by νᵢ). We would then get a block diagonal representation for A with full blocks rather than the highly structured Jordan blocks. Other such "canonical" forms are discussed in text that follows.
Suppose X = [X₁, ..., Xₘ] ∈ ℝ^{n×n} is nonsingular and such that X^{-1}AX = diag(J₁, ..., Jₘ), where each Jᵢ = diag(Jᵢ₁, ..., J_{ikᵢ}) and each Jᵢₖ is a Jordan block corresponding to λᵢ ∈ Λ(A). We could also use other block diagonal decompositions (e.g., via SVD), but we restrict our attention here to only the Jordan block case. Note that AXᵢ = XᵢJᵢ, so by (9.19) the columns of Xᵢ (i.e., the eigenvectors and principal vectors associated with λᵢ) span an A-invariant subspace of ℝⁿ.
Finally, we return to the problem of developing a formula for e^{tA} in the case that A is not necessarily diagonalizable. Let Yᵢ ∈ ℂ^{n×nᵢ} be a Jordan basis for N(A^T − λᵢI)^{nᵢ}. Equivalently, partition

X^{-H} = [Y₁, ..., Yₘ]
compatibly. Then
A = XJX^{-1} = XJY^H
  = [X₁, ..., Xₘ] diag(J₁, ..., Jₘ) [Y₁, ..., Yₘ]^H
  = Σ_{i=1}^{m} Xᵢ Jᵢ Yᵢ^H.
In a similar fashion we can compute
e^{tA} = Σ_{i=1}^{m} Xᵢ e^{tJᵢ} Yᵢ^H,
which is a useful formula when used in conjunction with the result
        [ λ 1         0 ]       [ e^{λt}  te^{λt}  (t²/2!)e^{λt}  ···  (t^{k−1}/(k−1)!)e^{λt} ]
        [   λ 1         ]       [ 0       e^{λt}   te^{λt}        ···  (t^{k−2}/(k−2)!)e^{λt} ]
exp ( t [     ⋱ ⋱       ] )  =  [                   ⋱     ⋱                    ⋮              ]
        [         λ 1   ]       [                         e^{λt}       te^{λt}               ]
        [ 0           λ ]       [ 0       ···             0            e^{λt}                ]

for a k × k Jordan block Jᵢ associated with an eigenvalue λ = λᵢ.
9.5 The Matrix Sign Function

In this section we give a very brief introduction to an interesting and useful matrix function called the matrix sign function. It is a generalization of the sign (or signum) of a scalar. A survey of the matrix sign function and some of its applications can be found in [15].
Definition 9.40. Let z ∈ ℂ with Re(z) ≠ 0. Then the sign of z is defined by

sgn(z) = Re(z)/|Re(z)| = { +1 if Re(z) > 0,
                         { −1 if Re(z) < 0.
Definition 9.41. Suppose A ∈ ℂ^{n×n} has no eigenvalues on the imaginary axis, and let

X^{-1}AX = [ N 0 ]
           [ 0 P ]

be a Jordan canonical form for A, with N containing all Jordan blocks corresponding to the eigenvalues of A in the left half-plane and P containing all Jordan blocks corresponding to eigenvalues in the right half-plane. Then the sign of A, denoted sgn(A), is given by
sgn(A) = X [ −I 0 ]
           [ 0  I ] X^{-1},
where the negative and positive identity matrices are of the same dimensions as N and P, respectively.
There are other equivalent definitions of the matrix sign function, but the one given here is especially useful in deriving many of its key properties. The JCF definition of the matrix sign function does not generally lend itself to reliable computation on a finite-word-length digital computer. In fact, its reliable numerical calculation is an interesting topic in its own right.
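One classical scheme, the Newton iteration S ← (S + S^{-1})/2 started from S = A, is among the methods treated in the survey literature; the sketch below (NumPy, illustrative only, with no scaling or convergence test) shows it reproducing the defining properties of sgn(A):

```python
import numpy as np

def sign_newton(A, iters=50):
    """Newton iteration S <- (S + S^{-1})/2 for the matrix sign function.
    A minimal sketch: assumes A has no eigenvalues on the imaginary axis."""
    S = A.astype(float)
    for _ in range(iters):
        S = 0.5 * (S + np.linalg.inv(S))
    return S

A = np.array([[-2.0, 1.0],
              [0.0, 3.0]])   # eigenvalues -2 (left half-plane) and 3 (right)
S = sign_newton(A)

print(np.allclose(S @ S, np.eye(2)))   # True: S^2 = I
print(np.allclose(A @ S, S @ A))       # True: S commutes with A
```

For scalars this is exactly Newton's method on x² = 1, which converges to ±1 according to the sign of Re(x); production implementations add scaling to accelerate convergence.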
We state some of the more useful properties of the matrix sign function as theorems. Their straightforward proofs are left to the exercises.
Theorem 9.42. Suppose A ∈ ℂ^{n×n} has no eigenvalues on the imaginary axis, and let S = sgn(A). Then the following hold:
1. S is diagonalizable with eigenvalues equal to ±1.

2. S² = I.

3. AS = SA.
4. sgn(A^H) = (sgn(A))^H.

5. sgn(T^{-1}AT) = T^{-1}sgn(A)T for all nonsingular T ∈ ℂ^{n×n}.
6. sgn(cA) = sgn(c) sgn(A) for all nonzero real scalars c.
Theorem 9.43. Suppose A ∈ ℂ^{n×n} has no eigenvalues on the imaginary axis, and let S = sgn(A). Then the following hold:
1. R(S − I) is an A-invariant subspace corresponding to the left half-plane eigenvalues of A (the negative invariant subspace).

2. R(S + I) is an A-invariant subspace corresponding to the right half-plane eigenvalues of A (the positive invariant subspace).

3. negA = (I − S)/2 is a projection onto the negative invariant subspace of A.

4. posA = (I + S)/2 is a projection onto the positive invariant subspace of A.
EXERCISES
1. Let A ∈ ℂ^{n×n} have distinct eigenvalues λ₁, ..., λₙ with corresponding right eigenvectors x₁, ..., xₙ and left eigenvectors y₁, ..., yₙ, respectively. Let v ∈ ℂⁿ be an arbitrary vector. Show that v can be expressed (uniquely) as a linear combination
of the right eigenvectors. Find the appropriate expression for v as a linear combination of the left eigenvectors as well.
2. Suppose A ∈ ℂ^{n×n} is skew-Hermitian, i.e., A^H = −A. Prove that all eigenvalues of a skew-Hermitian matrix must be pure imaginary.
3. Suppose A ∈ ℂ^{n×n} is Hermitian. Let λ be an eigenvalue of A with corresponding right eigenvector x. Show that x is also a left eigenvector for λ. Prove the same result if A is skew-Hermitian.
4. Suppose a matrix A ∈ ℝ^{5×5} has eigenvalues {2, 2, 2, 2, 3}. Determine all possible JCFs for A.
5. Determine the eigenvalues, right eigenvectors and right principal vectors if necessary, and (real) JCFs of the following matrices:
(a) [ 2 −1 ]
    [ 1  0 ] .
6. Determine the JCFs of the following matrices:
7. Let
A = [H -1]· 2 2"
Find a nonsingular matrix X such that X^{-1}AX = J, where J is the JCF

    [ 1 1 0 ]
J = [ 0 1 0 ] .
    [ 0 0 1 ]

Hint: Use [−1 1 −1]^T as an eigenvector. The vectors [0 1 −1]^T and [1 0 0]^T are both eigenvectors, but then the equation (A − I)x^{(2)} = x^{(1)} can't be solved.
8. Show that all right eigenvectors of the Jordan block matrix in Theorem 9.30 must be multiples of e₁ ∈ ℝᵏ. Characterize all left eigenvectors.
9. Let A ∈ ℝ^{n×n} be of the form A = xy^T, where x, y ∈ ℝⁿ are nonzero vectors with x^T y = 0. Determine the JCF of A.

10. Let A ∈ ℝ^{n×n} be of the form A = I + xy^T, where x, y ∈ ℝⁿ are nonzero vectors with x^T y = 0. Determine the JCF of A.
11. Suppose a matrix A ∈ ℝ^{16×16} has 16 eigenvalues at 0 and its JCF consists of a single Jordan block of the form specified in Theorem 9.22. Suppose the small number 10^{-16} is added to the (16,1) element of J. What are the eigenvalues of this slightly perturbed matrix?
12. Show that every matrix A ∈ ℝ^{n×n} can be factored in the form A = S₁S₂, where S₁ and S₂ are real symmetric matrices and one of them, say S₁, is nonsingular.
Hint: Suppose A = XJX^{-1} is a reduction of A to JCF and suppose we can construct the "symmetric factorization" of J. Then A = (XS₁X^T)(X^{-T}S₂X^{-1}) would be the required symmetric factorization of A. Thus, it suffices to prove the result for the JCF. The transformation P in (9.18) is useful.
13. Prove that every matrix A ∈ ℝ^{n×n} is similar to its transpose and determine a similarity transformation explicitly.
Hint: Use the factorization in the previous exercise.
14. Consider the block upper triangular matrix

A = [ A₁₁ A₁₂ ]
    [ 0   A₂₂ ] ,

where A ∈ ℝ^{n×n} and A₁₁ ∈ ℝ^{k×k} with 1 ≤ k ≤ n. Suppose A₁₂ ≠ 0 and that we want to block diagonalize A via the similarity transformation

T = [ I X ]
    [ 0 I ] ,

where X ∈ ℝ^{k×(n−k)}, i.e.,

T^{-1}AT = [ A₁₁ 0   ]
           [ 0   A₂₂ ] .

Find a matrix equation that X must satisfy for this to be possible. If n = 2 and k = 1, what can you say further, in terms of A₁₁ and A₂₂, about when the equation for X is solvable?
15. Prove Theorem 9.42.
16. Prove Theorem 9.43.
17. Suppose A ∈ ℂ^{n×n} has all its eigenvalues in the left half-plane. Prove that sgn(A) = −I.
Chapter 10
Canonical Forms
10.1 Some Basic Canonical Forms

Problem: Let V and W be vector spaces and suppose A : V → W is a linear transformation. Find bases in V and W with respect to which Mat A has a "simple form" or "canonical form." In matrix terms, if A ∈ ℝ^{m×n}, find nonsingular P ∈ ℝ^{m×m} and Q ∈ ℝ^{n×n} such that PAQ has a "canonical form." The transformation A ↦ PAQ is called an equivalence; it is called an orthogonal equivalence if P and Q are orthogonal matrices.
Remark 10.1. We can also consider the case A ∈ ℂ^{m×n} and unitary equivalence if P and Q are unitary.
Two special cases are of interest:
1. If W = V and Q = P^{-1}, the transformation A ↦ PAP^{-1} is called a similarity.

2. If W = V and if Q = P^T is orthogonal, the transformation A ↦ PAP^T is called an orthogonal similarity (or unitary similarity in the complex case).
The following results are typical of what can be achieved under a unitary similarity. If A = A^H ∈ ℂ^{n×n} has eigenvalues λ₁, ..., λₙ, then there exists a unitary matrix U such that U^H AU = D, where D = diag(λ₁, ..., λₙ). This is proved in Theorem 10.2. What other matrices are "diagonalizable" under unitary similarity? The answer is given in Theorem 10.9, where it is proved that a general matrix A ∈ ℂ^{n×n} is unitarily similar to a diagonal matrix if and only if it is normal (i.e., AA^H = A^H A). Normal matrices include Hermitian, skew-Hermitian, and unitary matrices (and their "real" counterparts: symmetric, skew-symmetric, and orthogonal, respectively), as well as other matrices that merely satisfy the definition, such as

A = [ a  b ]
    [ −b a ]

for real scalars a and b. If a matrix A is not normal, the most "diagonal" we can get is the JCF described in Chapter 9.
Theorem 10.2. Let A = A^H ∈ ℂ^{n×n} have (real) eigenvalues λ₁, ..., λₙ. Then there exists a unitary matrix X such that X^H AX = D = diag(λ₁, ..., λₙ) (the columns of X are orthonormal eigenvectors for A).
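In floating point, Theorem 10.2 is essentially what a symmetric eigensolver computes (a NumPy sketch for a real symmetric A, where unitary specializes to orthogonal):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])        # real symmetric (hence Hermitian)
evals, X = np.linalg.eigh(A)           # columns of X: orthonormal eigenvectors

print(np.allclose(X.T @ X, np.eye(3)))           # True: X is orthogonal
print(np.allclose(X.T @ A @ X, np.diag(evals)))  # True: X^H A X = D
```

The eigenvalues returned are real, as the theorem asserts for Hermitian A.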
Proof: Let x\ be a right eigenvector corresponding to X\, and normalize it such that xf*x\ =1. Then there exist n — 1 additional vectors x2, . . . , xn such that X = [x\,..., xn] =[x\ X2] is unitary. Now
Then x^U2 = 0 (/ € k) means that xf is orthogonal to each of the n — k columns of U2.But the latter are orthonormal since they are the last n — k rows of the unitary matrix U.Thus, [Xi f/2] is unitary. D
The construction called for in Theorem 10.2 is then a special case of Theorem 10.3for k = 1. We illustrate the construction of the necessary Householder matrix for k — 1.For simplicity, we consider the real case. Let the unit vector x\ be denoted by [£i, . . . , %n]
T.
In (10.1) we have used the fact that Ax\ = k\x\. When combined with the fact thatx"xi = 1, we get A-i remaining in the (l,l)-block. We also get 0 in the (2,l)-block bynoting that x\ is orthogonal to all vectors in X2. In (10.2), we get 0 in the (l,2)-block bynoting that XH AX is Hermitian. The proof is completed easily by induction upon notingthat the (2,2)-block must have eigenvalues X2,..., A.n. D
Given a unit vector x\ e E", the construction of X2 e ]R"X("-1) such that X —[x\ X2] is orthogonal is frequently required. The construction can actually be performedquite easily by means of Householder (or Givens) transformations as in the proof of thefollowing general result.
Theorem 10.3. Let X\ e Cnxk have orthonormal columns and suppose U is a unitary
matrix such that UX\ = \ 0 1, where R € Ckxk is upper triangular. Write UH = [U\ U2]
with Ui € Cnxk. Then [Xi U2] is unitary.
Proof: Let X\ = [x\,..., Xk]. Construct a sequence of Householder matrices (also knownas elementary reflectors) H\,..., Hk in the usual way (see below) such that
where R is upper triangular (and nonsingular since x\, ..., Xk are orthonormal). Let U =Hk...Hv. Then UH = / / , - • • Hk and
10.1. Some Basic Canonical Forms 97
Then the necessary Householder matrix needed for the construction of X_2 is given by U = I − 2uu^+ = I − (2/(u^T u)) u u^T, where u = [ξ_1 ± 1, ξ_2, ..., ξ_n]^T. It can easily be checked that U is symmetric and U^T U = U² = I, so U is orthogonal. To see that U effects the necessary compression of x_1, it is easily verified that u^T u = 2 ± 2ξ_1 and u^T x_1 = 1 ± ξ_1. Thus,

    U x_1 = x_1 − 2 (u^T x_1 / u^T u) u = x_1 − u = [∓1, 0, ..., 0]^T.

Further details on Householder matrices, including the choice of sign and the complex case, can be consulted in standard numerical linear algebra texts such as [7], [11], [23], [25].
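The k = 1 construction just described can be sketched as follows (NumPy assumed; the helper name `complete_orthonormal` is ours, not the book's). The reflector U maps x_1 to ∓e_1, so ∓(its first column) is x_1 and its remaining columns furnish X_2.

```python
import numpy as np

def complete_orthonormal(x1):
    """Given a unit vector x1 in R^n, return X2 (n x (n-1)) such that
    [x1 X2] is orthogonal, via U = I - 2 u u^T / (u^T u), u = x1 +/- e1."""
    n = x1.shape[0]
    u = x1.copy()
    # choose the sign that avoids cancellation in the first component
    s = 1.0 if u[0] >= 0 else -1.0
    u[0] += s
    U = np.eye(n) - 2.0 * np.outer(u, u) / (u @ u)
    # U @ x1 = -s * e1, so U[:, 0] = -s * x1; the remaining columns of the
    # orthogonal matrix U span the orthogonal complement of x1
    return U[:, 1:]

x1 = np.array([3.0, 0.0, 4.0]) / 5.0          # a unit vector
X2 = complete_orthonormal(x1)
X = np.column_stack([x1, X2])
assert np.allclose(X.T @ X, np.eye(3), atol=1e-12)   # X is orthogonal
```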
The real version of Theorem 10.2 is worth stating separately since it is applied frequently in applications.

Theorem 10.4. Let A = A^T ∈ R^{n×n} have eigenvalues λ_1, ..., λ_n. Then there exists an orthogonal matrix X ∈ R^{n×n} (whose columns are orthonormal eigenvectors of A) such that X^T A X = D = diag(λ_1, ..., λ_n).

Note that Theorem 10.4 implies that a symmetric matrix A (with the obvious analogue from Theorem 10.2 for Hermitian matrices) can be written

    A = X D X^T = Σ_{i=1}^{n} λ_i x_i x_i^T,                         (10.3)

which is often called the spectral representation of A. In fact, A in (10.3) is actually a weighted sum of orthogonal projections P_i (onto the one-dimensional eigenspaces corresponding to the λ_i's), i.e.,

    A = Σ_{i=1}^{n} λ_i P_i,

where P_i = P_{R(x_i)} = x_i x_i^+ = x_i x_i^T since x_i^T x_i = 1.

The following pair of theorems form the theoretical foundation of the double-Francis-QR algorithm used to compute matrix eigenvalues in a numerically stable and reliable way.
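The spectral representation (10.3) can be checked directly in a short sketch (NumPy assumed; illustration only): A is rebuilt as a weighted sum of rank-one orthogonal projections.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                      # a random symmetric test matrix

lam, X = np.linalg.eigh(A)             # columns of X: orthonormal eigenvectors

# A as a weighted sum of the rank-one projections P_i = x_i x_i^T
A_rebuilt = sum(lam[i] * np.outer(X[:, i], X[:, i]) for i in range(4))
assert np.allclose(A, A_rebuilt, atol=1e-10)

# each P_i really is a projection: P_i^2 = P_i
P0 = np.outer(X[:, 0], X[:, 0])
assert np.allclose(P0 @ P0, P0, atol=1e-12)
```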
98 Chapter 10. Canonical Forms
Theorem 10.5 (Schur). Let A ∈ C^{n×n}. Then there exists a unitary matrix U such that U^H A U = T, where T is upper triangular.

Proof: The proof of this theorem is essentially the same as that of Theorem 10.2 except that in this case (using the notation U rather than X) the (1,2)-block U_1^H A U_2 is not 0. □

In the case of A ∈ R^{n×n}, it is thus unitarily similar to an upper triangular matrix, but if A has a complex conjugate pair of eigenvalues, then complex arithmetic is clearly needed to place such eigenvalues on the diagonal of T. However, the next theorem shows that every A ∈ R^{n×n} is also orthogonally similar (i.e., real arithmetic) to a quasi-upper-triangular matrix. A quasi-upper-triangular matrix is block upper triangular with 1 × 1 diagonal blocks corresponding to its real eigenvalues and 2 × 2 diagonal blocks corresponding to its complex conjugate pairs of eigenvalues.

Theorem 10.6 (Murnaghan-Wintner). Let A ∈ R^{n×n}. Then there exists an orthogonal matrix U such that U^T A U = S, where S is quasi-upper-triangular.

Definition 10.7. The triangular matrix T in Theorem 10.5 is called a Schur canonical form or Schur form. The quasi-upper-triangular matrix S in Theorem 10.6 is called a real Schur canonical form or real Schur form (RSF). The columns of a unitary [orthogonal] matrix U that reduces a matrix to [real] Schur form are called Schur vectors.
Example 10.8. The matrix

    S = [ -2  5 ]
        [ -2  4 ]

is in RSF (its eigenvalues are 1 ± i). Its real JCF is

    J = [  1  1 ]
        [ -1  1 ].
Note that only the first Schur vector (and then only if the corresponding first eigenvalue is real if U is orthogonal) is an eigenvector. However, what is true, and sufficient for virtually all applications (see, for example, [17]), is that the first k Schur vectors span the same A-invariant subspace as the eigenvectors corresponding to the first k eigenvalues along the diagonal of T (or S).
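Theorems 10.5 and 10.6 can be illustrated numerically (SciPy assumed; illustration only): `scipy.linalg.schur` with `output='real'` computes an RSF, and the 2 × 2 diagonal blocks carry the complex conjugate eigenvalue pairs.

```python
import numpy as np
from scipy.linalg import schur

# A real matrix with a complex-conjugate eigenvalue pair (1 +/- i) and a
# real eigenvalue (3), so a fully triangular REAL factorization is impossible.
A = np.array([[-2.0, 5.0, 1.0],
              [-2.0, 4.0, 2.0],
              [ 0.0, 0.0, 3.0]])

S, U = schur(A, output='real')       # A = U S U^T with U orthogonal
assert np.allclose(U @ S @ U.T, A, atol=1e-10)
assert np.allclose(U.T @ U, np.eye(3), atol=1e-12)
# S is quasi-upper-triangular: everything below the first subdiagonal is zero
assert np.allclose(np.tril(S, -2), 0.0, atol=1e-10)
```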
While every matrix can be reduced to Schur form (or RSF), it is of interest to know when we can go further and reduce a matrix via unitary similarity to diagonal form. The following theorem answers this question.

Theorem 10.9. A matrix A ∈ C^{n×n} is unitarily similar to a diagonal matrix if and only if A is normal (i.e., A^H A = A A^H).

Proof: Suppose U is a unitary matrix such that U^H A U = D, where D is diagonal. Then

    A A^H = U D U^H U D^H U^H = U D D^H U^H = U D^H D U^H = A^H A,

so A is normal.
10.2. Definite Matrices 99
Conversely, suppose A is normal and let U be a unitary matrix such that U^H A U = T, where T is an upper triangular matrix (Theorem 10.5). Then

    T^H T = U^H A^H U U^H A U = U^H A^H A U = U^H A A^H U = T T^H.
It is then a routine exercise to show that T must, in fact, be diagonal. □
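Theorem 10.9 can be illustrated with the normal-but-not-Hermitian matrix [a b; −b a] mentioned earlier. In the sketch below (SciPy assumed; illustration only), the complex Schur form of a normal matrix comes out numerically diagonal.

```python
import numpy as np
from scipy.linalg import schur

# A = [[a, b], [-b, a]] satisfies A A^H = A^H A, yet A != A^H for b != 0.
a, b = 1.0, 2.0
A = np.array([[a, b], [-b, a]], dtype=complex)
assert np.allclose(A @ A.conj().T, A.conj().T @ A)   # A is normal

T, U = schur(A, output='complex')    # A = U T U^H, T upper triangular
# For a normal matrix the Schur form T is in fact diagonal (Theorem 10.9)
assert np.allclose(T, np.diag(np.diag(T)), atol=1e-8)
```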
10.2 Definite Matrices

Definition 10.10. A symmetric matrix A ∈ R^{n×n} is

1. positive definite if and only if x^T A x > 0 for all nonzero x ∈ R^n. We write A > 0.

2. nonnegative definite (or positive semidefinite) if and only if x^T A x ≥ 0 for all nonzero x ∈ R^n. We write A ≥ 0.

3. negative definite if −A is positive definite. We write A < 0.

4. nonpositive definite (or negative semidefinite) if −A is nonnegative definite. We write A ≤ 0.

Also, if A and B are symmetric matrices, we write A > B if and only if A − B > 0 or B − A < 0. Similarly, we write A ≥ B if and only if A − B ≥ 0 or B − A ≤ 0.

Remark 10.11. If A ∈ C^{n×n} is Hermitian, all the above definitions hold except that superscript H's replace T's. Indeed, this is generally true for all results in the remainder of this section that may be stated in the real case for simplicity.
Remark 10.12. If a matrix is neither definite nor semidefinite, it is said to be indefinite.
Theorem 10.13. Let A = A^H ∈ C^{n×n} with eigenvalues λ_1 ≥ λ_2 ≥ ··· ≥ λ_n. Then for all x ∈ C^n,

    λ_n x^H x ≤ x^H A x ≤ λ_1 x^H x.

Proof: Let U be a unitary matrix that diagonalizes A as in Theorem 10.2. Furthermore, let y = U^H x, where x is an arbitrary vector in C^n, and denote the components of y by η_i, i = 1, ..., n. Then

    x^H A x = (U^H x)^H (U^H A U)(U^H x) = y^H D y = Σ_{i=1}^{n} λ_i |η_i|².

But clearly

    Σ_{i=1}^{n} λ_i |η_i|² ≤ λ_1 y^H y = λ_1 x^H x
100 Chapter 10. Canonical Forms
and

    Σ_{i=1}^{n} λ_i |η_i|² ≥ λ_n y^H y = λ_n x^H x,

from which the theorem follows. □
Remark 10.14. The ratio x^H A x / x^H x for A = A^H ∈ C^{n×n} and nonzero x ∈ C^n is called the Rayleigh quotient of x. Theorem 10.13 provides upper (λ_1) and lower (λ_n) bounds for the Rayleigh quotient. If A = A^H ∈ C^{n×n} is positive definite, x^H A x > 0 for all nonzero x ∈ C^n, so 0 < λ_n ≤ ··· ≤ λ_1.
Corollary 10.15. Let A ∈ C^{n×n}. Then ||A||_2 = λ_max^{1/2}(A^H A).

Proof: For all x ∈ C^n we have

    ||A x||_2² = x^H A^H A x ≤ λ_max(A^H A) x^H x = λ_max(A^H A) ||x||_2².

Let x be an eigenvector corresponding to λ_max(A^H A). Then ||A x||_2² / ||x||_2² = λ_max(A^H A), whence

    ||A||_2 = max_{x ≠ 0} ||A x||_2 / ||x||_2 = λ_max^{1/2}(A^H A). □
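Both Theorem 10.13 and Corollary 10.15 are straightforward to check numerically. The sketch below uses NumPy (not part of the text) and a random symmetric matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
H = (A + A.T) / 2                     # a Hermitian (real symmetric) matrix
lam = np.linalg.eigvalsh(H)           # ascending: lam[0] = lambda_n, lam[-1] = lambda_1

# Theorem 10.13: the Rayleigh quotient of any nonzero x lies in [lambda_n, lambda_1]
x = rng.standard_normal(5)
rq = (x @ H @ x) / (x @ x)
assert lam[0] - 1e-12 <= rq <= lam[-1] + 1e-12

# Corollary 10.15: ||A||_2 = sqrt(lambda_max(A^H A)) for ANY (not nec. Hermitian) A
sigma = np.sqrt(np.linalg.eigvalsh(A.T @ A)[-1])
assert np.isclose(np.linalg.norm(A, 2), sigma)
```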
Definition 10.16. A principal submatrix of an n × n matrix A is the (n − k) × (n − k) matrix that remains by deleting k rows and the corresponding k columns. A leading principal submatrix of order n − k is obtained by deleting the last k rows and columns.
Theorem 10.17. A symmetric matrix A ∈ R^{n×n} is positive definite if and only if any of the following three equivalent conditions hold:

1. The determinants of all leading principal submatrices of A are positive.

2. All eigenvalues of A are positive.

3. A can be written in the form M^T M, where M ∈ R^{n×n} is nonsingular.
Theorem 10.18. A symmetric matrix A ∈ R^{n×n} is nonnegative definite if and only if any of the following three equivalent conditions hold:

1. The determinants of all principal submatrices of A are nonnegative.

2. All eigenvalues of A are nonnegative.

3. A can be written in the form M^T M, where M ∈ R^{k×n} and k ≥ rank(A) = rank(M).
Remark 10.19. Note that the determinants of all principal submatrices must be nonnegative in Theorem 10.18.1, not just those of the leading principal submatrices. For example, consider the matrix

    A = [ 0   0 ]
        [ 0  -1 ].

The determinant of the 1 × 1 leading submatrix is 0 and the determinant of the 2 × 2 leading submatrix is also 0 (cf. Theorem 10.17). However, the
10.2. Definite Matrices 101
principal submatrix consisting of the (2,2) element is, in fact, negative and A is nonpositive definite.
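The three characterizations of Theorem 10.17, and the warning of Remark 10.19, can be sketched as follows (NumPy assumed; illustration only). Note that `np.linalg.cholesky` raises an error on a non-positive-definite input, so its success is itself a definiteness test.

```python
import numpy as np

M = np.array([[2.0, 0.0], [1.0, 1.0]])    # any nonsingular M ...
A = M.T @ M                               # ... makes A = M^T M positive definite

# 1. all leading principal minors are positive
minors = [np.linalg.det(A[:k, :k]) for k in range(1, 3)]
assert all(m > 0 for m in minors)
# 2. all eigenvalues are positive
assert np.all(np.linalg.eigvalsh(A) > 0)
# 3. a Cholesky-type factorization exists
L = np.linalg.cholesky(A)
assert np.allclose(L @ L.T, A)

# Caution (Remark 10.19): for SEMIdefiniteness, leading minors do not suffice.
B = np.array([[0.0, 0.0], [0.0, -1.0]])   # leading minors are 0 and 0 ...
assert np.linalg.eigvalsh(B)[0] < 0       # ... yet B is not >= 0
```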
Remark 10.20. The factor M in Theorem 10.18.3 is not unique. For example, if

    A = [ 1  0 ]
        [ 0  0 ],

then M can be

    [ 1  0 ],    [  1/√2  0 ]    [ 1/√3  0 ]
                 [ -1/√2  0 ],   [ 1/√3  0 ],  ... .
                                 [ 1/√3  0 ]
Recall that A ≥ B if the matrix A − B is nonnegative definite. The following theorem is useful in "comparing" symmetric matrices. Its proof is straightforward from basic definitions.
Theorem 10.21. Let A, B ∈ R^{n×n} be symmetric.

1. If A ≥ B and M ∈ R^{n×m}, then M^T A M ≥ M^T B M.

2. If A > B and M ∈ R^{n×m} has full column rank m, then M^T A M > M^T B M.

The following standard theorem is stated without proof (see, for example, [16, p. 181]). It concerns the notion of the "square root" of a matrix. That is, if A ∈ R^{n×n}, we say that S ∈ R^{n×n} is a square root of A if S² = A. In general, matrices (both symmetric and nonsymmetric) have infinitely many square roots. For example, if A = I_2, any matrix S of the form

    [ cos θ   sin θ ]
    [ sin θ  −cos θ ]

is a square root.
Theorem 10.22. Let A ∈ R^{n×n} be nonnegative definite. Then A has a unique nonnegative definite square root S. Moreover, S A = A S and rank S = rank A (and hence S is positive definite if A is positive definite).
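A sketch of Theorem 10.22 (NumPy/SciPy assumed; illustration only): the nonnegative definite square root can be built from the spectral representation of A, and for this class of matrices it agrees with `scipy.linalg.sqrtm`.

```python
import numpy as np
from scipy.linalg import sqrtm

M = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]])
A = M.T @ M                       # nonnegative (here in fact positive) definite

# the unique nonnegative definite square root, via the spectral form of A
lam, X = np.linalg.eigh(A)
S = X @ np.diag(np.sqrt(np.maximum(lam, 0.0))) @ X.T

assert np.allclose(S @ S, A, atol=1e-10)             # S^2 = A
assert np.all(np.linalg.eigvalsh(S) >= -1e-12)       # S >= 0
assert np.allclose(S @ A, A @ S, atol=1e-10)         # S commutes with A
assert np.allclose(S, sqrtm(A), atol=1e-8)           # agrees with scipy
```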
A stronger form of the third characterization in Theorem 10.17 is available and is known as the Cholesky factorization. It is stated and proved below for the more general Hermitian case.

Theorem 10.23. Let A ∈ C^{n×n} be Hermitian and positive definite. Then there exists a unique nonsingular lower triangular matrix L with positive diagonal elements such that A = L L^H.
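Before the proof, the factorization can be illustrated numerically (NumPy assumed; illustration only): `np.linalg.cholesky` returns precisely the lower triangular factor L of Theorem 10.23.

```python
import numpy as np

# a Hermitian positive definite matrix
A = np.array([[4.0, 2.0 + 2.0j],
              [2.0 - 2.0j, 10.0]])

L = np.linalg.cholesky(A)     # lower triangular, positive real diagonal
assert np.allclose(L @ L.conj().T, A, atol=1e-12)
assert np.allclose(L, np.tril(L))                       # L is lower triangular
assert np.all(np.diag(L).real > 0)                      # positive diagonal
assert np.allclose(np.diag(L).imag, 0.0)
```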
Proof: The proof is by induction. The case n = 1 is trivially true. Write the matrix A in the form

    A = [ B     b    ]
        [ b^H   a_nn ].

By our induction hypothesis, assume the result is true for matrices of order n − 1 so that B may be written as B = L_1 L_1^H, where L_1 ∈ C^{(n−1)×(n−1)} is nonsingular and lower triangular
102 Chapter 10. Canonical Forms
with positive diagonal elements. It remains to prove that we can write the n x n matrix Ain the form
10.3 Equivalence Transformations and CongruenceTheorem 10.24. Let A € C™*71. Then there exist matrices P e C™xm and Q e C"n
x" suchthat
Proof: A classical proof can be consulted in, for example, [21, p. 131]. Alternatively,suppose A has an SVD of the form (5.2) in its complex version. Then
Note that the greater freedom afforded by the equivalence transformation of Theorem10.24, as opposed to the more restrictive situation of a similarity transformation, yields afar "simpler" canonical form (10.4). However, numerical procedures for computing suchan equivalence directly via, say, Gaussian or elementary row and column operations, aregenerally unreliable. The numerically preferred equivalence is, of course, the unitary equiv-alence known as the SVD. However, the SVD is relatively expensive to compute and othercanonical forms exist that are intermediate between (10.4) and the SVD; see, for example[7, Ch. 5], [4, Ch. 2]. Two such forms are stated here. They are more stably computablethan (10.4) and more efficiently computable than a full SVD. Many similar results are alsoavailable.
where a is positive. Performing the indicated matrix multiplication and equating the cor-responding submatrices, we see that we must have L\c = b and ann = CHC + a2. Clearlyc is given simply by c = L^b. Substituting in the expression involving a, we finda2 = ann — bHL\H L\lb = ann — bH B~lb (= the Schur complement of B in A). But weknow that
Since det(fi) > 0, we must have ann —bHB lb > 0. Choosing a to be the positive squareroot of «„„ — bHB~lb completes the proof. D
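The SVD-based construction of P and Q in the proof of Theorem 10.24 can be sketched as follows (NumPy assumed; real data for simplicity, so conjugate transposes reduce to transposes).

```python
import numpy as np

# a rank-2 matrix A (3 x 4): row 3 = row 1 + row 2
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 1.0, 1.0]])
r = np.linalg.matrix_rank(A)

U, s, Vh = np.linalg.svd(A)
# P = diag(S^{-1}, I) U^H and Q = V give P A Q = [[I_r, 0], [0, 0]]
D = np.eye(3)
D[:r, :r] = np.diag(1.0 / s[:r])
P = D @ U.T
Q = Vh.T

E = P @ A @ Q
target = np.zeros((3, 4))
target[:r, :r] = np.eye(r)
assert np.allclose(E, target, atol=1e-10)
```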
10.3. Equivalence Transformations and Congruence 103
Theorem 10.25 (Complete Orthogonal Decomposition). Let A ∈ C^{m×n} have rank r. Then there exist unitary matrices U ∈ C^{m×m} and V ∈ C^{n×n} such that

    U^H A V = [ R  0 ]                                               (10.5)
              [ 0  0 ],

where R ∈ C^{r×r} is upper (or lower) triangular with positive diagonal elements.

Proof: For the proof, see [4]. □

Theorem 10.26. Let A ∈ C^{m×n} have rank r. Then there exist a unitary matrix Q ∈ C^{m×m} and a permutation matrix Π ∈ C^{n×n} such that

    Q A Π = [ R  S ]                                                 (10.6)
            [ 0  0 ],

where R ∈ C^{r×r} is upper triangular and S ∈ C^{r×(n−r)} is arbitrary but in general nonzero.

Proof: For the proof, see [4]. □

Remark 10.27. When A has full column rank but is "near" a rank-deficient matrix, various rank-revealing QR decompositions are available that can sometimes detect such phenomena at a cost considerably less than that of a full SVD. Again, see [4] for details.
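Theorem 10.26 is, in effect, the QR factorization with column pivoting. In the sketch below (SciPy assumed; illustration only), `scipy.linalg.qr(..., pivoting=True)` returns the permutation as an index vector rather than as the matrix Π, and the trailing diagonal entries of R reveal rank deficiency.

```python
import numpy as np
from scipy.linalg import qr

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [1.0, 0.0, -1.0]])      # rank 2: col3 = 2*col2 - col1

Q, R, piv = qr(A, pivoting=True)      # A[:, piv] = Q @ R
assert np.allclose(A[:, piv], Q @ R, atol=1e-10)

# estimate the rank from the diagonal of R (a heuristic tolerance, ours)
r = int(np.sum(np.abs(np.diag(R)) > 1e-10 * abs(R[0, 0])))
assert r == np.linalg.matrix_rank(A) == 2
```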
Definition 10.28. Let A ∈ C^{n×n} and let X ∈ C^{n×n} be nonsingular. The transformation A ↦ X^H A X is called a congruence. Note that a congruence is a similarity if and only if X is unitary.

Note that congruence preserves the property of being Hermitian; i.e., if A is Hermitian, then X^H A X is also Hermitian. It is of interest to ask what other properties of a matrix are preserved under congruence. It turns out that the principal property so preserved is the sign of each eigenvalue.

Definition 10.29. Let A = A^H ∈ C^{n×n} and let π, ν, and ζ denote the numbers of positive, negative, and zero eigenvalues, respectively, of A. Then the inertia of A is the triple of numbers In(A) = (π, ν, ζ). The signature of A is given by sig(A) = π − ν.

Example 10.30.

1. In diag(1, 1, −1, 0) = (2, 1, 1).

2. If A = A^H ∈ C^{n×n}, then A > 0 if and only if In(A) = (n, 0, 0).

3. If In(A) = (π, ν, ζ), then rank(A) = π + ν.

Theorem 10.31 (Sylvester's Law of Inertia). Let A = A^H ∈ C^{n×n} and let X ∈ C^{n×n} be nonsingular. Then In(A) = In(X^H A X).

Proof: For the proof, see, for example, [21, p. 134]. □

Theorem 10.31 guarantees that rank and signature of a matrix are preserved under congruence. We then have the following.
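Inertia and Sylvester's law can be sketched as follows; the helper `inertia` and its eigenvalue tolerance are ours, not the book's (NumPy assumed; illustration only).

```python
import numpy as np

def inertia(A, tol=1e-8):
    """(pi, nu, zeta): numbers of positive, negative, zero eigenvalues."""
    lam = np.linalg.eigvalsh(A)
    return (int(np.sum(lam > tol)),
            int(np.sum(lam < -tol)),
            int(np.sum(np.abs(lam) <= tol)))

A = np.diag([3.0, 1.0, -2.0, 0.0])
assert inertia(A) == (2, 1, 1)        # cf. Example 10.30.1

# Sylvester's law: a congruence X^H A X with nonsingular X preserves inertia
rng = np.random.default_rng(2)
X = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # almost surely nonsingular
assert inertia(X.T @ A @ X) == (2, 1, 1)
```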
104 Chapter 10. Canonical Forms
Theorem 10.32. Let A = A^H ∈ C^{n×n} with In(A) = (π, ν, ζ). Then there exists a nonsingular matrix X ∈ C^{n×n} such that X^H A X = diag(1, ..., 1, −1, ..., −1, 0, ..., 0), where the number of 1's is π, the number of −1's is ν, and the number of 0's is ζ.

Proof: Let λ_1, ..., λ_n denote the eigenvalues of A and order them such that the first π are positive, the next ν are negative, and the final ζ are 0. By Theorem 10.2 there exists a unitary matrix U such that U^H A U = diag(λ_1, ..., λ_n). Define the n × n matrix

    W = diag(1/√λ_1, ..., 1/√λ_π, 1/√(−λ_{π+1}), ..., 1/√(−λ_{π+ν}), 1, ..., 1).

Then it is easy to check that X = U W yields the desired result. □

10.3.1 Block matrices and definiteness

Theorem 10.33. Suppose A = A^T and D = D^T. Then

    [ A    B ]
    [ B^T  D ]  > 0

if and only if either A > 0 and D − B^T A^{-1} B > 0, or D > 0 and A − B D^{-1} B^T > 0.

Proof: The proof follows by considering, for example, the congruence

    [ A    B ]      [ I  −A^{-1} B ]^T [ A    B ] [ I  −A^{-1} B ]
    [ B^T  D ]  ↦  [ 0       I    ]   [ B^T  D ] [ 0       I    ].

The details are straightforward and are left to the reader. □

Remark 10.34. Note the symmetric Schur complements of A (or D) in the theorem.
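A sketch of Theorem 10.33 via the Schur complements named in Remark 10.34 (NumPy assumed; the test matrices are arbitrary choices of ours).

```python
import numpy as np

def is_pd(M):
    """positive definiteness of a symmetric matrix via its eigenvalues"""
    return bool(np.all(np.linalg.eigvalsh(M) > 0))

A = np.array([[4.0, 1.0], [1.0, 3.0]])
D = np.array([[5.0, 0.0], [0.0, 2.0]])
B = np.array([[1.0, 0.0], [2.0, 1.0]])

block = np.block([[A, B], [B.T, D]])

# block > 0  <=>  A > 0 and D - B^T A^{-1} B > 0 ...
schur_D = D - B.T @ np.linalg.solve(A, B)
assert is_pd(block) == (is_pd(A) and is_pd(schur_D))
# ... <=>  D > 0 and A - B D^{-1} B^T > 0
schur_A = A - B @ np.linalg.solve(D, B.T)
assert is_pd(block) == (is_pd(D) and is_pd(schur_A))
```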
Theorem 10.35. Suppose A = A^T and D = D^T. Then

    [ A    B ]
    [ B^T  D ]  ≥ 0

if and only if A ≥ 0, A A^+ B = B, and D − B^T A^+ B ≥ 0.

Proof: Consider the congruence with

    [ I  −A^+ B ]
    [ 0     I   ]

and proceed as in the proof of Theorem 10.33. □

10.4 Rational Canonical Form

One final canonical form to be mentioned is the rational canonical form.
10.4. Rational Canonical Form 105
Definition 10.36. A matrix A ∈ R^{n×n} is said to be nonderogatory if its minimal polynomial and characteristic polynomial are the same or, equivalently, if its Jordan canonical form has only one block associated with each distinct eigenvalue.

Suppose A ∈ R^{n×n} is a nonderogatory matrix and suppose its characteristic polynomial is π(λ) = λ^n − (a_0 + a_1 λ + ··· + a_{n−1} λ^{n−1}). Then it can be shown (see [12]) that A is similar to a matrix of the form

    [  0    1    0   ···   0      ]
    [  0    0    1   ···   0      ]
    [  ⋮              ⋱    ⋮      ]                                  (10.7)
    [  0    0    0   ···   1      ]
    [ a_0  a_1  a_2  ···  a_{n−1} ].

Definition 10.37. A matrix A ∈ R^{n×n} of the form (10.7) is called a companion matrix or is said to be in companion form.

Companion matrices also appear in the literature in several equivalent forms. To illustrate, consider the companion matrix

    [  0    1    0    0  ]
    [  0    0    1    0  ]                                           (10.8)
    [  0    0    0    1  ]
    [ a_0  a_1  a_2  a_3 ].

This matrix is a special case of a matrix in lower Hessenberg form. Using the reverse-order identity similarity P given by (9.18), A is easily seen to be similar to the following matrix in upper Hessenberg form:

    [ a_3  a_2  a_1  a_0 ]
    [  1    0    0    0  ]                                           (10.9)
    [  0    1    0    0  ]
    [  0    0    1    0  ].

Moreover, since a matrix is similar to its transpose (see exercise 13 in Chapter 9), the following are also companion matrices similar to the above:

    [ 0  0  0  a_0 ]      [ a_3  1  0  0 ]
    [ 1  0  0  a_1 ]      [ a_2  0  1  0 ]
    [ 0  1  0  a_2 ],     [ a_1  0  0  1 ]
    [ 0  0  1  a_3 ]      [ a_0  0  0  0 ].

Notice that in all cases a companion matrix is nonsingular if and only if a_0 ≠ 0. In fact, the inverse of a nonsingular companion matrix is again in companion form. For example,
10.4. Rational Canonical Form 105
Definition 10.36. A matrix A ∈ R^{n×n} is said to be nonderogatory if its minimal polynomial and characteristic polynomial are the same or, equivalently, if its Jordan canonical form has only one block associated with each distinct eigenvalue.

Suppose A ∈ R^{n×n} is a nonderogatory matrix and suppose its characteristic polynomial is π(λ) = λ^n − (a_0 + a_1 λ + ⋯ + a_{n−1} λ^{n−1}). Then it can be shown (see [12]) that A is similar to a matrix of the form

    [ 0    1    0    ⋯   0       ]
    [ 0    0    1    ⋯   0       ]
    [ ⋮              ⋱   ⋮       ]
    [ 0    0    0    ⋯   1       ]
    [ a_0  a_1  a_2  ⋯   a_{n−1} ].    (10.7)

Definition 10.37. A matrix A ∈ R^{n×n} of the form (10.7) is called a companion matrix or is said to be in companion form.
Companion matrices also appear in the literature in several equivalent forms. To illustrate, consider the companion matrix

    [ 0    1    0    0   ]
    [ 0    0    1    0   ]
    [ 0    0    0    1   ]
    [ a_0  a_1  a_2  a_3 ].    (10.8)

This matrix is a special case of a matrix in lower Hessenberg form. Using the reverse-order identity similarity P given by (9.18), A is easily seen to be similar to the following matrix in upper Hessenberg form:

    [ a_3  a_2  a_1  a_0 ]
    [ 1    0    0    0   ]
    [ 0    1    0    0   ]
    [ 0    0    1    0   ].    (10.9)
Moreover, since a matrix is similar to its transpose (see exercise 13 in Chapter 9), the following are also companion matrices similar to the above:

    [ 0  0  0  a_0 ]        [ a_3  1  0  0 ]
    [ 1  0  0  a_1 ]        [ a_2  0  1  0 ]
    [ 0  1  0  a_2 ],       [ a_1  0  0  1 ].    (10.10)
    [ 0  0  1  a_3 ]        [ a_0  0  0  0 ]
Notice that in all cases a companion matrix is nonsingular if and only if a_0 ≠ 0. In fact, the inverse of a nonsingular companion matrix is again in companion form. For example,

    [ 0    1    0    0   ]−1     [ −a_1/a_0  −a_2/a_0  −a_3/a_0  1/a_0 ]
    [ 0    0    1    0   ]       [ 1         0         0         0     ]
    [ 0    0    0    1   ]    =  [ 0         1         0         0     ]    (10.11)
    [ a_0  a_1  a_2  a_3 ]       [ 0         0         1         0     ]
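The companion form (10.7) and the inverse formula (10.11) are easy to experiment with numerically. A Python/numpy sketch (the coefficients are an arbitrary example of ours):

```python
import numpy as np

# characteristic polynomial: l^4 - (a0 + a1*l + a2*l^2 + a3*l^3)
a = np.array([2.0, -1.0, 3.0, 0.5])        # a0, a1, a2, a3, with a0 != 0
n = len(a)

C = np.zeros((n, n))
C[:-1, 1:] = np.eye(n - 1)                 # 1's on the superdiagonal
C[-1, :] = a                               # coefficients in the last row

p = np.poly(C)                             # monic char poly, highest power first

Cinv = np.linalg.inv(C)                    # first row [-a1/a0, -a2/a0, -a3/a0, 1/a0],
                                           # identity block below, as in (10.11)
```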
106 Chapter 10. Canonical Forms
with a similar result for companion matrices of the form (10.10).

If a companion matrix of the form (10.7) is singular, i.e., if a_0 = 0, then its pseudoinverse can still be computed. Let a ∈ R^{n−1} denote the vector [a_1, a_2, ..., a_{n−1}]^T and let c = 1/(1 + a^T a). Then it is easily verified that
    [ 0  1    0    ⋯  0       ]+     [ 0           ⋯       0   ]
    [ 0  0    1    ⋯  0       ]      [                         ]
    [ ⋮            ⋱  ⋮       ]   =  [ I − c a a^T        c a  ],
    [ 0  0    0    ⋯  1       ]
    [ 0  a_1  a_2  ⋯  a_{n−1} ]

where the zero row on the right is 1 × n and the lower block [I − c a a^T, c a] is (n−1) × n.
Note that I − caa^T = (I + aa^T)^{−1}, and hence the pseudoinverse of a singular companion matrix is not a companion matrix unless a = 0.
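The pseudoinverse formula above can be confirmed against a general-purpose SVD-based pseudoinverse. A Python/numpy sketch (arbitrary example coefficients; a_0 = 0 so the companion matrix is singular):

```python
import numpy as np

a = np.array([1.0, -2.0, 0.5])             # a1, a2, a3; a0 = 0
n = len(a) + 1

C = np.zeros((n, n))
C[:-1, 1:] = np.eye(n - 1)
C[-1, 1:] = a                              # last row [0, a1, ..., a_{n-1}]

c = 1.0 / (1.0 + a @ a)
P = np.zeros((n, n))                       # the claimed pseudoinverse:
P[1:, :-1] = np.eye(n - 1) - c * np.outer(a, a)   # I - c a a^T block
P[1:, -1] = c * a                          # c a in the last column; first row zero
```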
Companion matrices have many other interesting properties, among which, perhaps surprisingly, is the fact that their singular values can be found in closed form; see [14].
Theorem 10.38. Let σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_n be the singular values of the companion matrix A in (10.7). Let α = a_1² + a_2² + ⋯ + a_{n−1}² and γ = 1 + a_0² + α. Then

    σ_1² = (1/2)(γ + √(γ² − 4a_0²)),
    σ_i² = 1   for i = 2, 3, ..., n − 1,
    σ_n² = (1/2)(γ − √(γ² − 4a_0²)).
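Theorem 10.38 is easy to verify directly against an SVD (Python/numpy; the example coefficients are ours):

```python
import numpy as np

a = np.array([0.8, -1.0, 2.0, 0.3])       # a0, ..., a3 for the form (10.7)
n = len(a)

C = np.zeros((n, n))
C[:-1, 1:] = np.eye(n - 1)
C[-1, :] = a

alpha = np.sum(a[1:] ** 2)                # a1^2 + ... + a_{n-1}^2
gamma = 1.0 + a[0] ** 2 + alpha
disc = np.sqrt(gamma ** 2 - 4.0 * a[0] ** 2)

# singular values from the theorem, in descending order
sv_formula = np.array([np.sqrt((gamma + disc) / 2.0),
                       *([1.0] * (n - 2)),
                       np.sqrt((gamma - disc) / 2.0)])
sv_numpy = np.linalg.svd(C, compute_uv=False)
```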
If a_0 ≠ 0, the largest and smallest singular values can also be written in an equivalent form.
Remark 10.39. Explicit formulas for all the associated right and left singular vectors can also be derived easily.
If A ∈ R^{n×n} is derogatory, i.e., has more than one Jordan block associated with at least one eigenvalue, then it is not similar to a companion matrix of the form (10.7). However, it can be shown that a derogatory matrix is similar to a block diagonal matrix, each of whose diagonal blocks is a companion matrix. Such matrices are said to be in rational canonical form (or Frobenius canonical form). For details, see, for example, [12].
Companion matrices appear frequently in the control and signal processing literature but unfortunately they are often very difficult to work with numerically. Algorithms to reduce an arbitrary matrix to companion form are numerically unstable. Moreover, companion matrices are known to possess many undesirable numerical properties. For example, in general and especially as n increases, their eigenstructure is extremely ill conditioned, nonsingular ones are nearly singular, stable ones are nearly unstable, and so forth [14].
Exercises 107
Companion matrices and rational canonical forms are generally to be avoided in floating-point computation.
Remark 10.40. Theorem 10.38 yields some understanding of why difficult numerical behavior might be expected for companion matrices. For example, when solving linear systems of equations of the form (6.2), one measure of numerical sensitivity is κ_p(A) = ‖A‖_p ‖A^{−1}‖_p, the so-called condition number of A with respect to inversion and with respect to the matrix p-norm. If this number is large, say O(10^k), one may lose up to k digits of precision. In the 2-norm, this condition number is the ratio of largest to smallest singular values which, by the theorem, can be determined explicitly as

    κ_2(A) = (γ + √(γ² − 4a_0²)) / (2|a_0|).

It is easy to show that γ/(2|a_0|) ≤ κ_2(A) ≤ γ/|a_0|, and when a_0 is small or γ is large (or both), then κ_2(A) ≈ γ/|a_0|. It is not unusual for γ to be large for large n. Note that explicit formulas for κ_1(A) and κ_∞(A) can also be determined easily by using (10.11).
EXERCISES
1. Show that if a triangular matrix is normal, then it must be diagonal.
2. Prove that if A ∈ R^{n×n} is normal, then N(A) = N(A^T).

3. Let A ∈ C^{n×n} and define ρ(A) = max_{λ ∈ Λ(A)} |λ|. Then ρ(A) is called the spectral radius of A. Show that if A is normal, then ρ(A) = ‖A‖_2. Show that the converse is true if n = 2.

4. Let A ∈ C^{n×n} be normal with eigenvalues λ_1, ..., λ_n and singular values σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_n ≥ 0. Show that σ_i(A) = |λ_i(A)| for i ∈ n.

5. Use the reverse-order identity matrix P introduced in (9.18) and the matrix U in Theorem 10.5 to find a unitary matrix Q that reduces A ∈ C^{n×n} to lower triangular form.
6. Let A = [~ :] ∈ C^{2×2}. Find a unitary matrix U such that
7. If A ∈ R^{n×n} is positive definite, show that A^{−1} must also be positive definite.

8. Suppose A ∈ R^{n×n} is positive definite. Is

    [ A  I      ]
    [ I  A^{−1} ]  ≥ 0?

9. Let R, S ∈ R^{n×n} be symmetric. Show that

    [ R  I ]
    [ I  S ]  > 0

if and only if S > 0 and R > S^{−1}.
108 Chapter 10. Canonical Forms
10. Find the inertia of the following matrices:
    (a) [~ ~],

    (b) [ −2    1+j ]
        [ 1−j   −2  ],

    (d) [ −1    1+j ]
        [ 1−j   −1  ].
Chapter 11
Linear Differential and Difference Equations
11.1 Differential Equations

In this section we study solutions of the linear homogeneous system of differential equations
    ẋ(t) = Ax(t);  x(t_0) = x_0 ∈ R^n    (11.1)

for t ≥ t_0. This is known as an initial-value problem. We restrict our attention in this chapter only to the so-called time-invariant case, where the matrix A ∈ R^{n×n} is constant and does not depend on t. The solution of (11.1) is then known always to exist and be unique. It can be described conveniently in terms of the matrix exponential.
Definition 11.1. For all A ∈ R^{n×n}, the matrix exponential e^A ∈ R^{n×n} is defined by the power series

    e^A = Σ_{k=0}^{+∞} (1/k!) A^k.    (11.2)

The series (11.2) can be shown to converge for all A (it has radius of convergence equal to +∞). The solution of (11.1) involves the matrix

    e^{tA} = Σ_{k=0}^{+∞} (t^k/k!) A^k,    (11.3)

which thus also converges for all A and uniformly in t.
11.1.1 Properties of the matrix exponential
1. e^0 = I.
   Proof: This follows immediately from Definition 11.1 by setting A = 0.

2. For all A ∈ R^{n×n}, (e^A)^T = e^{A^T}.
   Proof: This follows immediately from Definition 11.1 and linearity of the transpose.
109
110 Chapter 11. Linear Differential and Difference Equations
3. For all A ∈ R^{n×n} and for all t, τ ∈ R, e^{(t+τ)A} = e^{tA} e^{τA} = e^{τA} e^{tA}.
   Proof: Note that

       e^{(t+τ)A} = I + (t+τ)A + ((t+τ)²/2!) A² + ⋯

   and

       e^{tA} e^{τA} = ( I + tA + (t²/2!) A² + ⋯ )( I + τA + (τ²/2!) A² + ⋯ ).

   Compare like powers of A in the above two equations and use the binomial theorem on (t+τ)^k.
4. For all A, B ∈ R^{n×n} and for all t ∈ R, e^{t(A+B)} = e^{tA} e^{tB} = e^{tB} e^{tA} if and only if A and B commute, i.e., AB = BA.
   Proof: Note that

       e^{t(A+B)} = I + t(A+B) + (t²/2!)(A+B)² + ⋯

   and

       e^{tA} e^{tB} = ( I + tA + (t²/2!) A² + ⋯ )( I + tB + (t²/2!) B² + ⋯ ),

   while

       e^{tB} e^{tA} = ( I + tB + (t²/2!) B² + ⋯ )( I + tA + (t²/2!) A² + ⋯ ).

   Compare like powers of t in the first equation and the second or third and use the binomial theorem on (A+B)^k and the commutativity of A and B.
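Property 4 is easy to see in action with scipy's `expm` (the matrices below are invented for the demonstration; C is a polynomial in A and hence commutes with it):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])    # AB != BA
C = 2.0 * A + 3.0 * np.eye(2)             # a polynomial in A, so AC == CA

fails = np.allclose(expm(A + B), expm(A) @ expm(B))   # False: A, B do not commute
holds = np.allclose(expm(A + C), expm(A) @ expm(C))   # True: A, C commute
```

`fails` comes out False and `holds` True, illustrating that the product rule for matrix exponentials requires commutativity.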
5. For all A ∈ R^{n×n} and for all t ∈ R, (e^{tA})^{−1} = e^{−tA}.
   Proof: Simply take τ = −t in property 3.
6. Let L denote the Laplace transform and L^{−1} the inverse Laplace transform. Then for all A ∈ R^{n×n} and for all t ∈ R,

   (a) L{e^{tA}} = (sI − A)^{−1}.
   (b) L^{−1}{(sI − A)^{−1}} = e^{tA}.

   Proof: We prove only (a). Part (b) follows similarly.

       L{e^{tA}} = ∫_0^{+∞} e^{−st} e^{tA} dt

                 = ∫_0^{+∞} e^{t(−sI)} e^{tA} dt

                 = ∫_0^{+∞} e^{t(A−sI)} dt    since A and (−sI) commute
11.1. Differential Equations 111
                 = ∫_0^{+∞} Σ_{i=1}^n e^{(λ_i − s)t} x_i y_i^H dt    assuming A is diagonalizable

                 = Σ_{i=1}^n [ ∫_0^{+∞} e^{(λ_i − s)t} dt ] x_i y_i^H

                 = Σ_{i=1}^n (1/(s − λ_i)) x_i y_i^H    assuming Re s > Re λ_i for i ∈ n

                 = (sI − A)^{−1}.
   The matrix (sI − A)^{−1} is called the resolvent of A and is defined for all s not in Λ(A). Notice in the proof that we have assumed, for convenience, that A is diagonalizable. If this is not the case, the scalar dyadic decomposition can be replaced by

       e^{t(A−sI)} = Σ_{i=1}^m X_i e^{t(J_i − sI)} Y_i^H

   using the JCF. All succeeding steps in the proof then follow in a straightforward way.
7. For all A ∈ R^{n×n} and for all t ∈ R, (d/dt)(e^{tA}) = A e^{tA} = e^{tA} A.
   Proof: Since the series (11.3) is uniformly convergent, it can be differentiated term-by-term, from which the result follows immediately. Alternatively, the formal definition

       (d/dt)(e^{tA}) = lim_{Δt→0} (e^{(t+Δt)A} − e^{tA}) / Δt

   can be employed as follows. For any consistent matrix norm,
       ‖ (e^{(t+Δt)A} − e^{tA})/Δt − A e^{tA} ‖
           = ‖ (1/Δt)(e^{tA} e^{ΔtA} − e^{tA}) − A e^{tA} ‖
           = ‖ (1/Δt)(e^{ΔtA} e^{tA} − e^{tA}) − A e^{tA} ‖
           = ‖ (1/Δt)(e^{ΔtA} − I) e^{tA} − A e^{tA} ‖
           = ‖ (1/Δt)(Δt A + ((Δt)²/2!) A² + ⋯) e^{tA} − A e^{tA} ‖
           = ‖ (A e^{tA} + (Δt/2!) A² e^{tA} + ⋯) − A e^{tA} ‖
           = ‖ ((Δt/2!) A² + ((Δt)²/3!) A³ + ⋯) e^{tA} ‖
           ≤ Δt ‖A²‖ ‖e^{tA}‖ ( 1/2! + (Δt/3!)‖A‖ + ((Δt)²/4!)‖A‖² + ⋯ )
           ≤ Δt ‖A²‖ ‖e^{tA}‖ ( 1 + Δt‖A‖ + ((Δt)²/2!)‖A‖² + ⋯ )
           = Δt ‖A²‖ ‖e^{tA}‖ e^{Δt‖A‖}.
112 Chapter 11. Linear Differential and Difference Equations
   For fixed t, the right-hand side above clearly goes to 0 as Δt goes to 0. Thus, the limit exists and equals A e^{tA}. A similar proof yields the limit e^{tA} A, or one can use the fact that A commutes with any polynomial of A of finite degree and hence with e^{tA}.
11.1.2 Homogeneous linear differential equations
Theorem 11.2. Let A ∈ R^{n×n}. The solution of the linear homogeneous initial-value problem

    ẋ(t) = Ax(t);  x(t_0) = x_0 ∈ R^n    (11.4)

for t ≥ t_0 is given by

    x(t) = e^{(t−t_0)A} x_0.    (11.5)

Proof: Differentiate (11.5) and use property 7 of the matrix exponential to get ẋ(t) = A e^{(t−t_0)A} x_0 = Ax(t). Also, x(t_0) = e^{(t_0−t_0)A} x_0 = x_0 so, by the fundamental existence and uniqueness theorem for ordinary differential equations, (11.5) is the solution of (11.4). □
11.1.3 Inhomogeneous linear differential equations
Theorem 11.3. Let A ∈ R^{n×n}, B ∈ R^{n×m}, and let the vector-valued function u be given and, say, continuous. Then the solution of the linear inhomogeneous initial-value problem

    ẋ(t) = Ax(t) + Bu(t);  x(t_0) = x_0 ∈ R^n    (11.6)

for t ≥ t_0 is given by the variation of parameters formula

    x(t) = e^{(t−t_0)A} x_0 + ∫_{t_0}^t e^{(t−s)A} B u(s) ds.    (11.7)

Proof: Differentiate (11.7) and again use property 7 of the matrix exponential. The general formula

    (d/dt) ∫_{p(t)}^{q(t)} f(x, t) dx = ∫_{p(t)}^{q(t)} ∂f(x, t)/∂t dx + f(q(t), t) dq(t)/dt − f(p(t), t) dp(t)/dt

is used to get ẋ(t) = A e^{(t−t_0)A} x_0 + ∫_{t_0}^t A e^{(t−s)A} B u(s) ds + Bu(t) = Ax(t) + Bu(t). Also, x(t_0) = e^{(t_0−t_0)A} x_0 + 0 = x_0 so, by the fundamental existence and uniqueness theorem for ordinary differential equations, (11.7) is the solution of (11.6). □
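The variation of parameters formula can be cross-checked against direct numerical integration of the differential equation. A Python/scipy sketch (the system, input, and tolerances are arbitrary choices of ours):

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp, quad_vec

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
u = lambda t: np.array([np.sin(t)])       # scalar input u(t)
x0 = np.array([1.0, 0.0])
t0, tf = 0.0, 2.0

# formula (11.7): x(t) = e^{(t-t0)A} x0 + int_{t0}^{t} e^{(t-s)A} B u(s) ds
integrand = lambda s: expm((tf - s) * A) @ B @ u(s)
x_formula = expm((tf - t0) * A) @ x0 + quad_vec(integrand, t0, tf)[0]

# direct integration of xdot = Ax + Bu
sol = solve_ivp(lambda t, x: A @ x + B @ u(t), (t0, tf), x0,
                rtol=1e-10, atol=1e-12)
x_ivp = sol.y[:, -1]
```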
Remark 11.4. The proof above simply verifies the variation of parameters formula by direct differentiation. The formula can be derived by means of an integrating factor "trick" as follows. Premultiply the equation ẋ − Ax = Bu by e^{−tA} to get

    (d/dt)(e^{−tA} x(t)) = e^{−tA} B u(t).    (11.8)
11.1. Differential Equations 113
Now integrate (11.8) over the interval [t_0, t]:

    ∫_{t_0}^t (d/ds)(e^{−sA} x(s)) ds = ∫_{t_0}^t e^{−sA} B u(s) ds.

Thus,

    e^{−tA} x(t) − e^{−t_0 A} x(t_0) = ∫_{t_0}^t e^{−sA} B u(s) ds

and hence

    x(t) = e^{(t−t_0)A} x_0 + ∫_{t_0}^t e^{(t−s)A} B u(s) ds.
11.1.4 Linear matrix differential equations
Matrix-valued initial-value problems also occur frequently. The first is an obvious generalization of Theorem 11.2, and the proof is essentially the same.
Theorem 11.5. Let A ∈ R^{n×n}. The solution of the matrix linear homogeneous initial-value problem

    Ẋ(t) = AX(t);  X(t_0) = C ∈ R^{n×n}    (11.9)

for t ≥ t_0 is given by

    X(t) = e^{(t−t_0)A} C.    (11.10)
In the matrix case, we can have coefficient matrices on both the right and left. For convenience, the following theorem is stated with initial time t_0 = 0.
Theorem 11.6. Let A ∈ R^{n×n}, B ∈ R^{m×m}, and C ∈ R^{n×m}. Then the matrix initial-value problem

    Ẋ(t) = AX(t) + X(t)B;  X(0) = C    (11.11)

has the solution X(t) = e^{tA} C e^{tB}.

Proof: Differentiate e^{tA} C e^{tB} with respect to t and use property 7 of the matrix exponential. The fact that X(t) satisfies the initial condition is trivial. □
Corollary 11.7. Let A, C ∈ R^{n×n}. Then the matrix initial-value problem

    Ẋ(t) = AX(t) + X(t)A^T;  X(0) = C    (11.12)

has the solution X(t) = e^{tA} C e^{tA^T}.
When C is symmetric in (11.12), X(t) is symmetric and (11.12) is known as a Lyapunov differential equation. The initial-value problem (11.11) is known as a Sylvester differential equation.
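Theorem 11.6 admits a quick finite-difference sanity check (Python/scipy; the matrices below are an arbitrary example with n = 2, m = 1):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[-0.3]])
C = np.array([[1.0], [2.0]])              # C is n x m

X = lambda t: expm(t * A) @ C @ expm(t * B)   # claimed solution of (11.11)

t, h = 0.7, 1e-6
Xdot_fd = (X(t + h) - X(t - h)) / (2 * h)     # central-difference derivative
Xdot_ode = A @ X(t) + X(t) @ B                # right-hand side of the ODE
```

The two derivatives agree to finite-difference accuracy, and X(0) = C holds exactly.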
114 Chapter 11. Linear Differential and Difference Equations
11.1.5 Modal decompositions
Let A ∈ R^{n×n} and suppose, for convenience, that it is diagonalizable (if A is not diagonalizable, the rest of this subsection is easily generalized by using the JCF and the decomposition A = Σ_{i=1}^m X_i J_i Y_i^H as discussed in Chapter 9). Then the solution x(t) of (11.4) can be written

    x(t) = e^{(t−t_0)A} x_0
         = ( Σ_{i=1}^n e^{λ_i(t−t_0)} x_i y_i^H ) x_0
         = Σ_{i=1}^n ( y_i^H x_0 e^{λ_i(t−t_0)} ) x_i.

The λ_i s are called the modal velocities and the right eigenvectors x_i are called the modal directions. The decomposition above expresses the solution x(t) as a weighted sum of its modal velocities and directions.
This modal decomposition can be expressed in a different-looking but identical form if we write the initial condition x_0 as a weighted sum of the right eigenvectors: x_0 = Σ_{i=1}^n α_i x_i. Then

    x(t) = Σ_{i=1}^n ( α_i e^{λ_i(t−t_0)} ) x_i.

In the last equality we have used the fact that y_i^H x_j = δ_ij.

Similarly, in the inhomogeneous case we can write

    ∫_{t_0}^t e^{(t−s)A} B u(s) ds = Σ_{i=1}^n ( ∫_{t_0}^t e^{λ_i(t−s)} y_i^H B u(s) ds ) x_i.
11.1.6 Computation of the matrix exponential

JCF method

Let A ∈ R^{n×n} and suppose X ∈ R^{n×n} is nonsingular and such that X^{−1} A X = J, where J is a JCF for A. Then

    e^{tA} = e^{t X J X^{−1}}
           = X e^{tJ} X^{−1}
           = { Σ_{i=1}^n e^{λ_i t} x_i y_i^H    if A is diagonalizable,
             { Σ_{i=1}^m X_i e^{t J_i} Y_i^H    in general.
11.1. Differential Equations 115
If A is diagonalizable, it is then easy to compute e^{tA} via the formula e^{tA} = X e^{tJ} X^{−1} since e^{tJ} is simply a diagonal matrix.
In the more general case, the problem clearly reduces simply to the computation of the exponential of a Jordan block. To be specific, let J_i ∈ C^{k×k} be a Jordan block of the form

    J_i = [ λ  1  0  ⋯  0 ]
          [ 0  λ  1  ⋯  0 ]
          [ ⋮        ⋱  ⋮ ]
          [ 0  0  ⋯  λ  1 ]
          [ 0  0  ⋯  0  λ ]  = λI + N.
Clearly λI and N commute. Thus, e^{tJ_i} = e^{tλI} e^{tN} by property 4 of the matrix exponential. The diagonal part is easy: e^{tλI} = diag(e^{λt}, ..., e^{λt}). But e^{tN} is almost as easy since N is nilpotent of degree k.
Definition 11.8. A matrix M ∈ R^{n×n} is nilpotent of degree (or index, or grade) p if M^p = 0, while M^{p−1} ≠ 0.
For the matrix N defined above, it is easy to check that while N has 1's along only its first superdiagonal (and 0's elsewhere), N² has 1's along only its second superdiagonal, and so forth. Finally, N^{k−1} has a 1 in its (1, k) element and has 0's everywhere else, and N^k = 0. Thus, the series expansion of e^{tN} is finite, i.e.,
    e^{tN} = I + tN + (t²/2!) N² + ⋯ + (t^{k−1}/(k−1)!) N^{k−1}.

Thus,

    e^{tJ_i} = [ e^{λt}  t e^{λt}  (t²/2!) e^{λt}  ⋯  (t^{k−1}/(k−1)!) e^{λt} ]
               [ 0       e^{λt}   t e^{λt}        ⋯  (t^{k−2}/(k−2)!) e^{λt} ]
               [ 0       0        e^{λt}          ⋯  ⋮                       ]
               [ ⋮                            ⋱      t e^{λt}                ]
               [ 0       0        ⋯           0      e^{λt}                  ].
In the case when λ is complex, a real version of the above can be worked out.
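The closed form for e^{tJ_i} can be checked against scipy's `expm` for an arbitrary Jordan block (λ, k, and t below are example values of ours):

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

lam, k, t = -0.5, 4, 1.3
J = lam * np.eye(k) + np.diag(np.ones(k - 1), 1)   # Jordan block = lam*I + N

# closed form: entry (i, j) is t^{j-i}/(j-i)! * e^{lam t} on and above the diagonal
E = np.zeros((k, k))
for i in range(k):
    for j in range(i, k):
        E[i, j] = t ** (j - i) / factorial(j - i) * np.exp(lam * t)
```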
116 Chapter 11. Linear Differential and Difference Equations
Example 11.9. Let A = [ −4  4 ]
                      [ −1  0 ]. Then Λ(A) = {−2, −2} and

    e^{tA} = X e^{tJ} X^{−1}

           = [ 2  1 ] exp( t [ −2   1 ] ) [  1  −1 ]
             [ 1  1 ]        [  0  −2 ]   [ −1   2 ]

           = [ 2  1 ] [ e^{−2t}  t e^{−2t} ] [  1  −1 ]
             [ 1  1 ] [ 0        e^{−2t}   ] [ −1   2 ]

           = [ e^{−2t} − 2t e^{−2t}    4t e^{−2t}           ]
             [ −t e^{−2t}              e^{−2t} + 2t e^{−2t} ].
Interpolation method
This method is numerically unstable in finite-precision arithmetic but is quite effective for hand calculation in small-order problems. The method is stated and illustrated for the exponential function but applies equally well to other functions.
Given A ∈ R^{n×n} and f(λ) = e^{tλ}, compute f(A) = e^{tA}, where t is a fixed scalar. Suppose the characteristic polynomial of A can be written as π(λ) = Π_{i=1}^m (λ − λ_i)^{n_i}, where the λ_i s are distinct. Define

    g(λ) = α_0 + α_1 λ + ⋯ + α_{n−1} λ^{n−1},

where α_0, ..., α_{n−1} are n constants that are to be determined. They are, in fact, the unique solution of the n equations:
    g^{(k)}(λ_i) = f^{(k)}(λ_i);  k = 0, 1, ..., n_i − 1,  i ∈ m.
Here, the superscript (k) denotes the kth derivative with respect to λ. With the α_i s then known, the function g is known and f(A) = g(A). The motivation for this method is the Cayley-Hamilton Theorem, Theorem 9.3, which says that all powers of A greater than n − 1 can be expressed as linear combinations of A^k for k = 0, 1, ..., n − 1. Thus, all the terms of order greater than n − 1 in the power series for e^{tA} can be written in terms of these lower-order powers as well. The polynomial g gives the appropriate linear combination.
Example 11.10. Let

    A = [ −1   1   0 ]
        [  0  −1   1 ]
        [  0   0  −1 ]

and f(λ) = e^{tλ}. Then π(λ) = −(λ + 1)³, so m = 1 and n_1 = 3. Let g(λ) = α_0 + α_1 λ + α_2 λ². Then the three equations for the α_i s are given by

    g(−1) = f(−1)    ⟹  α_0 − α_1 + α_2 = e^{−t},
    g′(−1) = f′(−1)  ⟹  α_1 − 2α_2 = t e^{−t},
    g″(−1) = f″(−1)  ⟹  2α_2 = t² e^{−t}.
11.1. Differential Equations 117
Solving for the α_i s, we find

    α_0 = e^{−t} + t e^{−t} + (t²/2) e^{−t},
    α_1 = t e^{−t} + t² e^{−t},
    α_2 = (t²/2) e^{−t}.

Thus,

    f(A) = e^{tA} = g(A) = α_0 I + α_1 A + α_2 A².
Example 11.11. Let A = [ −4  4 ]
                       [ −1  0 ] and f(λ) = e^{tλ}. Then π(λ) = (λ + 2)², so m = 1 and n_1 = 2. Let g(λ) = α_0 + α_1 λ. Then the defining equations for the α_i s are given by

    g(−2) = f(−2)    ⟹  α_0 − 2α_1 = e^{−2t},
    g′(−2) = f′(−2)  ⟹  α_1 = t e^{−2t}.

Solving for the α_i s, we find

    α_0 = e^{−2t} + 2t e^{−2t},
    α_1 = t e^{−2t}.

Thus,

    f(A) = e^{tA} = g(A) = α_0 I + α_1 A

         = (e^{−2t} + 2t e^{−2t}) [ 1  0 ]  +  t e^{−2t} [ −4  4 ]
                                  [ 0  1 ]               [ −1  0 ]

         = [ e^{−2t} − 2t e^{−2t}    4t e^{−2t}           ]
           [ −t e^{−2t}              e^{−2t} + 2t e^{−2t} ].
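The interpolation result can be verified numerically (Python/scipy; we take A = [−4, 4; −1, 0], our reading of the example's garbled matrix, and an arbitrary fixed t):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-4.0, 4.0], [-1.0, 0.0]])  # char poly (lambda + 2)^2
t = 0.9

alpha0 = np.exp(-2 * t) + 2 * t * np.exp(-2 * t)
alpha1 = t * np.exp(-2 * t)
g_of_A = alpha0 * np.eye(2) + alpha1 * A  # g(A) = alpha_0 I + alpha_1 A
```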
Other methods
1. Use e^{tA} = L^{−1}{(sI − A)^{−1}} and techniques for inverse Laplace transforms. This is quite effective for small-order problems, but general nonsymbolic computational techniques are numerically unstable since the problem is theoretically equivalent to knowing precisely a JCF.
2. Use Padé approximation. There is an extensive literature on approximating certain nonlinear functions by rational functions. The matrix analogue yields e^A ≈
118 Chapter 11. Linear Differential and Difference Equations
D^{−1}(A)N(A), where D(A) = δ₀I + δ₁A + ··· + δ_pA^p and N(A) = ν₀I + ν₁A + ··· + ν_qA^q. Explicit formulas are known for the coefficients of the numerator and denominator polynomials of various orders. Unfortunately, a Padé approximation for the exponential is accurate only in a neighborhood of the origin; in the matrix case this means when ‖A‖ is sufficiently small. This can be arranged by scaling A, say, by multiplying it by 1/2^k for sufficiently large k and using the fact that e^A = (e^{(1/2^k)A})^{2^k}. Numerical loss of accuracy can occur in this procedure from the successive squarings.
3. Reduce A to (real) Schur form S via the unitary similarity U and use e^A = U e^S U^H and successive recursions up the superdiagonals of the (quasi) upper triangular matrix e^S.
4. Many methods are outlined in, for example, [19]. Reliable and efficient computation of matrix functions such as e^A and log(A) remains a fertile area for research.
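The scaling-and-squaring idea from method 2 can be sketched in a few lines. In this sketch a short Taylor series stands in for the Padé numerator/denominator pair (a simplification, not the production choice), but the scaling step and the repeated squarings are exactly as described.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_scaling_squaring(A, s=10, terms=12):
    """e^A = (e^{A/2^s})^{2^s}: approximate e^{A/2^s} cheaply, then square s times."""
    n = len(A)
    scale = 2.0 ** s
    B = [[a / scale for a in row] for row in A]   # scaled matrix, small norm
    # Short Taylor series for e^B (a Pade approximant would be used in practice).
    E = [[float(i == j) for j in range(n)] for i in range(n)]
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(1, terms):
        P = mat_mul(P, B)
        P = [[p / k for p in row] for row in P]
        E = [[E[i][j] + P[i][j] for j in range(n)] for i in range(n)]
    for _ in range(s):          # square back up: E <- E^2, s times
        E = mat_mul(E, E)
    return E
```

For the nilpotent test matrix [[0, 1], [0, 0]], whose exponential is exactly [[1, 1], [0, 1]], the sketch reproduces the answer; it also illustrates where the "numerical loss of accuracy from the successive squarings" would enter for less benign matrices.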
11.2 Difference Equations
In this section we outline solutions of discrete-time analogues of the linear differential equations of the previous section. Linear discrete-time systems, modeled by systems of difference equations, exhibit many parallels to the continuous-time differential equation case, and this observation is exploited frequently.
11.2.1 Homogeneous linear difference equations
Theorem 11.12. Let A ∈ ℝ^{n×n}. The solution of the linear homogeneous system of difference equations
x_{k+1} = Ax_k   (11.13)
for k ≥ 0 is given by
x_k = A^k x_0.   (11.14)
Proof: The proof is almost immediate upon substitution of (11.14) into (11.13). □
Remark 11.13. Again, we restrict our attention only to the so-called time-invariant case, where the matrix A in (11.13) is constant and does not depend on k. We could also consider an arbitrary "initial time" ko, but since the system is time-invariant, and since we want to keep the formulas "clean" (i.e., no double subscripts), we have chosen ko = 0 for convenience.
11.2.2 Inhomogeneous linear difference equations
Theorem 11.14. Let A ∈ ℝ^{n×n}, B ∈ ℝ^{n×m} and suppose {u_k}_{k=0}^{+∞} is a given sequence of m-vectors. Then the solution of the inhomogeneous initial-value problem
x_{k+1} = Ax_k + Bu_k   (11.15)
11.2. Difference Equations 119
is given by
x_k = A^k x_0 + Σ_{j=0}^{k−1} A^{k−j−1} B u_j,  k ≥ 0.   (11.16)
Proof: The proof is again almost immediate upon substitution of (11.16) into (11.15). □
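The closed form (11.16) is easy to confirm numerically by running the recursion directly and comparing; the particular A, B, initial condition, and constant input below are arbitrary choices for the check, not data from the text.

```python
# Arbitrary choices for the check: a 2x2 A, a single input with u_k = 1 for all k.
A = [[0.5, 0.25], [0.0, 0.5]]
b = [1.0, 1.0]        # this is B u_k for every k
x0 = [1.0, -1.0]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def pow_vec(M, k, v):
    """M^k v by repeated multiplication."""
    for _ in range(k):
        v = mat_vec(M, v)
    return v

def by_recursion(k):
    """Run x_{j+1} = A x_j + B u_j directly from x_0."""
    x = x0[:]
    for _ in range(k):
        Ax = mat_vec(A, x)
        x = [Ax[i] + b[i] for i in range(2)]
    return x

def by_formula(k):
    """x_k = A^k x_0 + sum_{j=0}^{k-1} A^{k-j-1} B u_j, as in (11.16)."""
    x = pow_vec(A, k, x0[:])
    for j in range(k):
        term = pow_vec(A, k - j - 1, b[:])
        x = [x[i] + term[i] for i in range(2)]
    return x
```

Both routines agree for every k, including the boundary case k = 0, where the sum in (11.16) is empty.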
11.2.3 Computation of matrix powers
It is clear that solution of linear systems of difference equations involves computation of A^k. One solution method, which is numerically unstable but sometimes useful for hand calculation, is to use z-transforms, by analogy with the use of Laplace transforms to compute a matrix exponential. One definition of the z-transform of a sequence {g_k} is
Z({g_k}_{k=0}^{+∞}) = Σ_{k=0}^{+∞} g_k z^{−k}.
Assuming |z| > max_{λ∈Λ(A)} |λ|, the z-transform of the sequence {A^k} is then given by
Z({A^k}) = Σ_{k=0}^{+∞} z^{−k} A^k = I + (1/z)A + (1/z²)A² + ···
         = (I − z^{−1}A)^{−1}
         = z(zI − A)^{−1}.
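The geometric-series identity above can be checked numerically: for a small 2×2 matrix (our choice) with spectral radius well below |z|, the partial sums of Σ z^{−k}A^k converge to z(zI − A)^{−1}.

```python
# Arbitrary choices: A has eigenvalues 0.5 and 0.1, and z = 2 exceeds both in modulus.
A = [[0.4, 0.3], [0.1, 0.2]]
z = 2.0

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Partial sums of  I + (1/z)A + (1/z^2)A^2 + ...
S = [[0.0, 0.0], [0.0, 0.0]]
P = [[1.0, 0.0], [0.0, 1.0]]   # P holds A^k
zk = 1.0                        # z^{-k}
for k in range(200):
    S = [[S[i][j] + zk * P[i][j] for j in range(2)] for i in range(2)]
    P = mat_mul(P, A)
    zk /= z

# Closed form z(zI - A)^{-1} via the explicit 2x2 inverse.
M = [[z - A[0][0], -A[0][1]], [-A[1][0], z - A[1][1]]]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
closed = [[z * M[1][1] / det, -z * M[0][1] / det],
          [-z * M[1][0] / det, z * M[0][0] / det]]
```

After 200 terms the remainder is of order (0.5/2)^{200}, so the two matrices agree to machine precision.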
Methods based on the JCF are sometimes useful, again mostly for small-order problems. Assume that A ∈ ℝ^{n×n} and let X ∈ ℝ_n^{n×n} be such that X^{−1}AX = J, where J is a JCF for A. Then
A^k = (XJX^{−1})^k
    = XJ^kX^{−1}
    = Σ_{i=1}^{n} λᵢ^k xᵢyᵢ^H   if A is diagonalizable,
    = Σ_{i=1}^{m} Xᵢ Jᵢ^k Yᵢ^H   in general.
If A is diagonalizable, it is then easy to compute A^k via the formula A^k = XJ^kX^{−1} since J^k is simply a diagonal matrix.
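The diagonalizable case can be demonstrated directly. The matrix below is manufactured from a chosen eigenvector matrix X and two chosen eigenvalues (all values are illustrative, not from the text), so A^k via X J^k X^{−1} can be compared with repeated multiplication.

```python
# A diagonalizable A built as X D X^{-1}; X has det 1, so Xinv is exact.
X = [[2.0, 1.0], [1.0, 1.0]]
Xinv = [[1.0, -1.0], [-1.0, 2.0]]
lams = (0.5, -0.25)             # chosen eigenvalues

def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

D = [[lams[0], 0.0], [0.0, lams[1]]]
A = mat_mul(mat_mul(X, D), Xinv)

def pow_direct(k):
    """A^k by repeated multiplication."""
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        P = mat_mul(P, A)
    return P

def pow_eig(k):
    """A^k = X J^k X^{-1}; J^k is diagonal, so only scalar powers are needed."""
    Dk = [[lams[0] ** k, 0.0], [0.0, lams[1] ** k]]
    return mat_mul(mat_mul(X, Dk), Xinv)
```

The eigendecomposition route costs two matrix products regardless of k, versus k products for the direct route, which is the practical point of the formula.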
120 Chapter 11. Linear Differential and Difference Equations
In the general case, the problem again reduces to the computation of the power of a Jordan block. To be specific, let Jᵢ ∈ ℂ^{p×p} be a Jordan block of the form
Jᵢ = [ λ  1  0  ···  0 ]
     [ 0  λ  1  ···  0 ]
     [ ⋮         ⋱   ⋮ ]
     [ 0  ···    λ   1 ]
     [ 0  0  ··· 0   λ ].
Writing Jᵢ = λI + N and noting that λI and the nilpotent matrix N commute, it is then straightforward to apply the binomial theorem to (λI + N)^k and verify that
Jᵢ^k = [ λ^k  kλ^{k−1}  C(k,2)λ^{k−2}  ···  C(k,p−1)λ^{k−p+1} ]
       [ 0    λ^k       kλ^{k−1}       ⋱    ⋮                 ]
       [ 0    0         λ^k            ⋱    C(k,2)λ^{k−2}     ]
       [ ⋮                             ⋱    kλ^{k−1}          ]
       [ 0    ···                 0         λ^k               ].
The symbol C(k, q) denotes the usual binomial coefficient k!/(q!(k − q)!) and is to be interpreted as 0 if k < q. In the case when λ is complex, a real version of the above can be worked out.
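The binomial formula for Jᵢ^k can be verified mechanically; the block size, eigenvalue, and power below are illustrative choices.

```python
from math import comb

# Illustrative values: a 4x4 Jordan block with eigenvalue 3, raised to the 9th power.
lam, p, k = 3.0, 4, 9

# J = lam*I + N with N the superdiagonal-of-ones nilpotent matrix.
J = [[lam if i == j else (1.0 if j == i + 1 else 0.0) for j in range(p)]
     for i in range(p)]

def mat_mul(X, Y):
    return [[sum(X[i][r] * Y[r][j] for r in range(p)) for j in range(p)]
            for i in range(p)]

def mat_pow(M, n):
    """M^n by repeated multiplication."""
    P = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(n):
        P = mat_mul(P, M)
    return P

# The binomial formula: (J^k)_{i,j} = C(k, j-i) * lam^{k-(j-i)} for j >= i, else 0.
formula = [[comb(k, j - i) * lam ** (k - (j - i)) if j >= i else 0.0
            for j in range(p)] for i in range(p)]
```

Note that Python's `math.comb(k, q)` already returns 0 when q > k, matching the convention stated above for C(k, q).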
Example 11.15. Let
A = [ −4   4 ]
    [ −1   0 ].
Then
A^k = XJ^kX^{−1} = [ 2  1 ] [ (−2)^k   k(−2)^{k−1} ] [  1  −1 ]
                   [ 1  1 ] [ 0        (−2)^k      ] [ −1   2 ]
    = [ (−2)^{k−1}(−2 − 2k)   k(−2)^{k+1}        ]
      [ −k(−2)^{k−1}          (−2)^{k−1}(2k − 2) ].
Basic analogues of other methods such as those mentioned in Section 11.1.6 can also be derived for the computation of matrix powers, but again no universally "best" method exists. For an erudite discussion of the state of the art, see [11, Ch. 18].
11.3 Higher-Order Equations
It is well known that a higher-order (scalar) linear differential equation can be converted to a first-order linear system. Consider, for example, the initial-value problem
y^{(n)}(t) + a_{n−1}y^{(n−1)}(t) + ··· + a₁ẏ(t) + a₀y(t) = φ(t)   (11.17)
with φ(t) a given function and n initial conditions
y(0) = c₀,  ẏ(0) = c₁,  ...,  y^{(n−1)}(0) = c_{n−1}.   (11.18)
Exercises 121
Here, y^{(m)} denotes the mth derivative of y with respect to t. Define a vector x(t) ∈ ℝⁿ with components x₁(t) = y(t), x₂(t) = ẏ(t), ..., x_n(t) = y^{(n−1)}(t). Then
ẋ₁(t) = x₂(t) = ẏ(t),
ẋ₂(t) = x₃(t) = ÿ(t),
⋮
ẋ_{n−1}(t) = x_n(t) = y^{(n−1)}(t),
ẋ_n(t) = y^{(n)}(t) = −a₀y(t) − a₁ẏ(t) − ··· − a_{n−1}y^{(n−1)}(t) + φ(t)
       = −a₀x₁(t) − a₁x₂(t) − ··· − a_{n−1}x_n(t) + φ(t).
These equations can then be rewritten as the first-order linear system
         [  0    1    0   ···   0        ]        [ 0 ]
         [  0    0    1   ···   0        ]        [ 0 ]
ẋ(t) =   [  ⋮              ⋱    ⋮        ] x(t) + [ ⋮ ] φ(t).   (11.19)
         [  0    0    0   ···   1        ]        [ 0 ]
         [ −a₀  −a₁   ···      −a_{n−1}  ]        [ 1 ]
The initial conditions take the form x(0) = c = [c₀, c₁, ..., c_{n−1}]ᵀ.
Note that det(λI − A) = λⁿ + a_{n−1}λ^{n−1} + ··· + a₁λ + a₀. However, the companion matrix A in (11.19) possesses many nasty numerical properties for even moderately sized n and, as mentioned before, is often well worth avoiding, at least for computational purposes.
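The construction of the companion matrix in (11.19), and the characteristic-polynomial fact just stated, can be sketched as follows; the sample equation y″ + 3y′ + 2y = φ (with roots −1 and −2) is our own illustration.

```python
def companion(a):
    """State matrix of (11.19) for y^(n) + a_{n-1} y^(n-1) + ... + a_0 y = phi,
    given a = [a_0, a_1, ..., a_{n-1}]."""
    n = len(a)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = 1.0        # superdiagonal of ones
    A[n - 1] = [-c for c in a]   # last row: -a_0, -a_1, ..., -a_{n-1}
    return A

# For y'' + 3y' + 2y = phi, det(lam*I - A) = lam^2 + 3*lam + 2,
# so the determinant should vanish at the roots -1 and -2.
A = companion([2.0, 3.0])

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

residuals = []
for lam in (-1.0, -2.0):
    M = [[lam - A[0][0], -A[0][1]], [-A[1][0], lam - A[1][1]]]
    residuals.append(det2(M))
```

The vanishing residuals confirm det(λI − A) = λ² + 3λ + 2 for this instance; the numerical warning in the text concerns what happens to such eigenvalue computations as n grows, not the construction itself.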
A similar procedure holds for the conversion of a higher-order difference equation
with n initial conditions, into a linear first-order difference equation with (vector) initial condition.
EXERCISES
1. Let P ∈ ℝ^{n×n} be a projection. Show that e^P ≈ I + 1.718P.
2. Suppose x, y ∈ ℝⁿ and let A = xyᵀ. Further, let α = xᵀy. Show that e^{tA} = I + g(t, α)xyᵀ, where
g(t, α) = (e^{αt} − 1)/α   if α ≠ 0,
        = t                if α = 0.
3. Let
122 Chapter 11. Linear Differential and Difference Equations
where X ∈ ℝ^{m×n} is arbitrary. Show that
e^A = [ eI   sinh(1) X ]
      [ 0    e^{−1}I   ].
4. Let K denote the skew-symmetric matrix
K = [ 0     I_n ]
    [ −I_n  0   ],
where I_n denotes the n × n identity matrix. A matrix A ∈ ℝ^{2n×2n} is said to be Hamiltonian if K^{−1}AᵀK = −A and to be symplectic if K^{−1}AᵀK = A^{−1}.
(a) Suppose H is Hamiltonian and let λ be an eigenvalue of H. Show that −λ must also be an eigenvalue of H.
(b) Suppose S is symplectic and let λ be an eigenvalue of S. Show that 1/λ must also be an eigenvalue of S.
(c) Suppose that H is Hamiltonian and S is symplectic. Show that S^{−1}HS must be Hamiltonian.
(d) Suppose H is Hamiltonian. Show that e^H must be symplectic.
5. Let α, β ∈ ℝ and
A = [ α   −β ]
    [ β    α ].
Then show that
e^{tA} = [ e^{αt} cos βt   −e^{αt} sin βt ]
         [ e^{αt} sin βt    e^{αt} cos βt ].
6. Find a general expression for
7. Find e^{tA} when A =
8. Let
(a) Solve the differential equation
ẋ = Ax;  x(0) = [ ~ ].
Exercises 123
(b) Solve the differential equation
ẋ = Ax + b;  x(0) = [ ~ ].
9. Consider the initial-value problem
ẋ(t) = Ax(t);  x(0) = x₀
for t ≥ 0. Suppose that A ∈ ℝ^{n×n} is skew-symmetric and let α = ‖x₀‖₂. Show that ‖x(t)‖₂ = α for all t > 0.
10. Consider the n x n matrix initial-value problem
Ẋ(t) = AX(t) − X(t)A;  X(0) = C.
Show that the eigenvalues of the solution X (t) of this problem are the same as those of C for all t.
11. The year is 2004 and there are three large "free trade zones" in the world: Asia (A), Europe (E), and the Americas (R). Suppose certain multinational companies have total assets of $40 trillion of which $20 trillion is in E and $20 trillion is in R. Each year half of the Americas' money stays home, a quarter goes to Europe, and a quarter goes to Asia. For Europe and Asia, half stays home and half goes to the Americas.
(a) Find the matrix M that gives
[ A ]              [ A ]
[ E ]          = M [ E ]
[ R ] year k+1     [ R ] year k
(b) Find the eigenvalues and right eigenvectors of M.
(c) Find the distribution of the companies' assets at year k.
(d) Find the limiting distribution of the $40 trillion as the universe ends, i.e., as k → +∞ (i.e., around the time the Cubs win a World Series).
(Exercise adapted from Problem 5.3.11 in [24].)
12. (a) Find the solution of the initial-value problem
ÿ(t) + 2ẏ(t) + y(t) = 0;  y(0) = 1, ẏ(0) = 0.
(b) Consider the difference equation
z_{k+2} + 2z_{k+1} + z_k = 0.
If z₀ = 1 and z₁ = 2, what is the value of z₁₀₀₀? What is the value of z_k in general?
Chapter 12
Generalized Eigenvalue Problems
12.1 The Generalized Eigenvalue/Eigenvector Problem
In this chapter we consider the generalized eigenvalue problem
Ax = λBx,
where A, B ∈ ℂ^{n×n}. The standard eigenvalue problem considered in Chapter 9 obviously corresponds to the special case that B = I.
Definition 12.1. A nonzero vector x ∈ ℂⁿ is a right generalized eigenvector of the pair (A, B) with A, B ∈ ℂ^{n×n} if there exists a scalar λ ∈ ℂ, called a generalized eigenvalue, such that
Ax = λBx.   (12.1)
Similarly, a nonzero vector y ∈ ℂⁿ is a left generalized eigenvector corresponding to an eigenvalue λ if
yᴴA = λyᴴB.   (12.2)
When the context is such that no confusion can arise, the adjective "generalized" is usually dropped. As with the standard eigenvalue problem, if x [y] is a right [left] eigenvector, then so is αx [αy] for any nonzero scalar α ∈ ℂ.
Definition 12.2. The matrix A − λB is called a matrix pencil (or pencil of the matrices A and B).
As with the standard eigenvalue problem, eigenvalues for the generalized eigenvalue problem occur where the matrix pencil A − λB is singular.
Definition 12.3. The polynomial π(λ) = det(A − λB) is called the characteristic polynomial of the matrix pair (A, B). The roots of π(λ) are the eigenvalues of the associated generalized eigenvalue problem.
Remark 12.4. When A, B ∈ ℝ^{n×n}, the characteristic polynomial is obviously real, and hence nonreal eigenvalues must occur in complex conjugate pairs.
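For 2×2 real pencils the characteristic polynomial can be written out explicitly, which gives a small illustration (ours, not the book's) of how the finite eigenvalues of A − λB behave when B is singular; real eigenvalues are assumed here for simplicity.

```python
def pencil_eigs_2x2(A, B):
    """Finite real roots of det(A - lam*B) = c2*lam^2 + c1*lam + c0
    for a 2x2 real pencil (real-eigenvalue cases only)."""
    c2 = B[0][0] * B[1][1] - B[0][1] * B[1][0]                 # det B
    c1 = -(A[0][0] * B[1][1] + A[1][1] * B[0][0]
           - A[0][1] * B[1][0] - A[1][0] * B[0][1])
    c0 = A[0][0] * A[1][1] - A[0][1] * A[1][0]                 # det A
    if abs(c2) < 1e-14:        # degree drops: an eigenvalue moves "to infinity"
        return [-c0 / c1]
    d = (c1 * c1 - 4.0 * c2 * c0) ** 0.5
    return sorted([(-c1 - d) / (2.0 * c2), (-c1 + d) / (2.0 * c2)])
```

With B = I this reduces to the standard eigenvalue problem and returns n = 2 roots; with a singular B the polynomial degree drops and the finite spectrum shrinks, exactly the phenomenon catalogued in Remark 12.5 below.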
126 Chapter 12. Generalized Eigenvalue Problems
Remark 12.5. If B = I (or in general when B is nonsingular), then π(λ) is a polynomial of degree n, and hence there are n eigenvalues associated with the pencil A − λB. However, when B ≠ I, in particular, when B is singular, there may be 0, k ∈ {1, ..., n}, or infinitely many eigenvalues associated with the pencil A − λB. For example, suppose
A = [ 1  0 ],   B = [ 1  0 ],   (12.3)
    [ 0  α ]        [ 0  β ]
where α and β are scalars. Then the characteristic polynomial is
det(A − λB) = (1 − λ)(α − βλ)
and there are several cases to consider.
Case 1: α ≠ 0, β ≠ 0. There are two eigenvalues, 1 and α/β.
Case 2: α = 0, β ≠ 0. There are two eigenvalues, 1 and 0.
Case 3: α ≠ 0, β = 0. There is only one eigenvalue, 1 (of multiplicity 1).
Case 4: α = 0, β = 0. All λ ∈ ℂ are eigenvalues since det(A − λB) ≡ 0.
Definition 12.6. If det(A − λB) is not identically zero, the pencil A − λB is said to be regular; otherwise, it is said to be singular.
Note that if 𝒩(A) ∩ 𝒩(B) ≠ 0, the associated matrix pencil is singular (as in Case 4 above).
Associated with any matrix pencil A − λB is a reciprocal pencil B − μA and corresponding generalized eigenvalue problem. Clearly the reciprocal pencil has eigenvalues μ = 1/λ. It is instructive to consider the reciprocal pencil associated with the example in Remark 12.5. With A and B as in (12.3), the characteristic polynomial is
det(B − μA) = (1 − μ)(β − αμ)
and there are again four cases to consider.
Case 1: α ≠ 0, β ≠ 0. There are two eigenvalues, 1 and β/α.
Case 2: α = 0, β ≠ 0. There is only one eigenvalue, 1 (of multiplicity 1).
Case 3: α ≠ 0, β = 0. There are two eigenvalues, 1 and 0.
Case 4: α = 0, β = 0. All μ ∈ ℂ are eigenvalues since det(B − μA) ≡ 0.
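The reciprocal relationship μ = 1/λ is easy to see concretely in Case 1; the particular values of α and β below are our own.

```python
# For A = diag(1, alpha), B = diag(1, beta) with alpha, beta nonzero (Case 1),
# A - lam*B has eigenvalues {1, alpha/beta} and the reciprocal pencil
# B - mu*A has eigenvalues {1, beta/alpha}.
alpha, beta = 2.0, 4.0
eigs = sorted([1.0, alpha / beta])     # roots of (1 - lam)(alpha - beta*lam)
recip = sorted([1.0, beta / alpha])    # roots of (1 - mu)(beta - alpha*mu)
reciprocals = sorted(1.0 / e for e in eigs)
```

The spectrum of the reciprocal pencil is elementwise the reciprocal of the original spectrum, and as β → 0 the eigenvalue α/β escapes to infinity while its reciprocal β/α collapses to 0, matching Cases 2 and 3 above.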
At least for the case of regular pencils, it is apparent where the "missing" eigenvalues have gone in Cases 2 and 3. That is to say, there is a second eigenvalue "at infinity" for Case 3 of A − λB, with its reciprocal eigenvalue being 0 in Case 3 of the reciprocal pencil B − μA. A similar reciprocal symmetry holds for Case 2.
While there are applications in system theory and control where singular pencils appear, only the case of regular pencils is considered in the remainder of this chapter. Note that A and/or B may still be singular. If B is singular, the pencil A − λB always has
12.2. Canonical Forms 127
fewer than n eigenvalues. If B is nonsingular, the pencil A − λB always has precisely n eigenvalues, since the generalized eigenvalue problem is then easily seen to be equivalent to the standard eigenvalue problem B^{−1}Ax = λx (or AB^{−1}w = λw). However, this turns out to be a very poor numerical procedure for handling the generalized eigenvalue problem if B is even moderately ill conditioned with respect to inversion. Numerical methods that work directly on A and B are discussed in standard textbooks on numerical linear algebra; see, for example, [7, Sec. 7.7] or [25, Sec. 6.7].
12.2 Canonical Forms
Just as for the standard eigenvalue problem, canonical forms are available for the generalized eigenvalue problem. Since the latter involves a pair of matrices, we now deal with equivalencies rather than similarities, and the first theorem deals with what happens to eigenvalues and eigenvectors under equivalence.
Theorem 12.7. Let A, B, Q, Z ∈ ℂ^{n×n} with Q and Z nonsingular. Then
1. the eigenvalues of the problems A − λB and QAZ − λQBZ are the same (the two problems are said to be equivalent).
2. if x is a right eigenvector of A − λB, then Z^{−1}x is a right eigenvector of QAZ − λQBZ.
3. if y is a left eigenvector of A − λB, then Q^{−H}y is a left eigenvector of QAZ − λQBZ.
Proof:
1. det(QAZ − λQBZ) = det[Q(A − λB)Z] = det Q det Z det(A − λB). Since det Q and det Z are nonzero, the result follows.
2. The result follows by noting that (A − λB)x = 0 if and only if Q(A − λB)Z(Z^{−1}x) = 0.
3. Again, the result follows easily by noting that yᴴ(A − λB) = 0 if and only if (Q^{−H}y)ᴴQ(A − λB)Z = 0. □
The first canonical form is an analogue of Schur's Theorem and forms, in fact, the theoretical foundation for the QZ algorithm, which is the generally preferred method for solving the generalized eigenvalue problem; see, for example, [7, Sec. 7.7] or [25, Sec. 6.7].
Theorem 12.8. Let A, B ∈ ℂ^{n×n}. Then there exist unitary matrices Q, Z ∈ ℂ^{n×n} such that
QAZ = T_α,   QBZ = T_β,
where T_α and T_β are upper triangular.
By Theorem 12.7, the eigenvalues of the pencil A − λB are then the ratios of the diagonal elements of T_α to the corresponding diagonal elements of T_β, with the understanding that a zero diagonal element of T_β corresponds to an infinite generalized eigenvalue.
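The diagonal-ratio characterization can be illustrated by manufacturing a pencil whose triangular form is known in advance; the rotation angles and triangular factors below are arbitrary choices, and orthogonal (rather than general unitary) transformations suffice for this real example.

```python
import math

# Build A = Q*Ta*Z and B = Q*Tb*Z with orthogonal rotations Q, Z,
# so the (generalized Schur) triangular factors Ta, Tb are known.
def rot(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Q, Z = rot(0.7), rot(-1.1)
Ta = [[3.0, 1.0], [0.0, 5.0]]
Tb = [[1.5, -2.0], [0.0, 1.25]]
A = mat_mul(mat_mul(Q, Ta), Z)
B = mat_mul(mat_mul(Q, Tb), Z)

# The generalized eigenvalues are the diagonal ratios 3/1.5 = 2 and 5/1.25 = 4,
# so det(A - lam*B) must vanish at lam = 2 and lam = 4.
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

residuals = [det2([[A[i][j] - lam * B[i][j] for j in range(2)] for i in range(2)])
             for lam in (2.0, 4.0)]
```

Since det(A − λB) = det Q · det(T_α − λT_β) · det Z and the rotations have determinant 1, the determinant factors as (3 − 1.5λ)(5 − 1.25λ), vanishing exactly at the diagonal ratios. In practice the QZ algorithm computes Q, Z, T_α, T_β from A and B, rather than the other way around as in this illustration.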
There is also an analogue of the Murnaghan-Wintner Theorem for real matrices.
128 Chapter 12. Generalized Eigenvalue Problems
Theorem 12.9. Let A, B ∈ ℝ^{n×n}. Then there exist orthogonal matrices Q, Z ∈ ℝ^{n×n} such that
QAZ = S,   QBZ = T,
where T is upper triangular and S is quasi-upper-triangular.
When S has a 2 × 2 diagonal block, the 2 × 2 subpencil formed with the corresponding 2 × 2 diagonal subblock of T has a pair of complex conjugate eigenvalues. Otherwise, real eigenvalues are given as above by the ratios of diagonal elements of S to corresponding elements of T.
There is also an analogue of the Jordan canonical form called the Kronecker canonical form (KCF). A full description of the KCF, including analogues of principal vectors and so forth, is beyond the scope of this book. In this chapter, we present only statements of the basic theorems and some examples. The first theorem pertains only to "square" regular pencils, while the full KCF in all its generality applies also to "rectangular" and singular pencils.
Theorem 12.10. Let A, B ∈ ℂ^{n×n} and suppose the pencil A − λB is regular. Then there exist nonsingular matrices P, Q ∈ ℂ^{n×n} such that
P(A − λB)Q = [ J  0 ] − λ [ I  0 ],
             [ 0  I ]     [ 0  N ]
where J is a Jordan canonical form corresponding to the finite eigenvalues of A − λB and N is a nilpotent matrix of Jordan blocks associated with 0 and corresponding to the infinite eigenvalues of A − λB.
Example 12.11. The matrix pencil
[ 2  1  0  0  0 ]     [ 1  0  0  0  0 ]
[ 0  2  0  0  0 ]     [ 0  1  0  0  0 ]
[ 0  0  1  0  0 ] − λ [ 0  0  0  1  0 ]
[ 0  0  0  1  0 ]     [ 0  0  0  0  1 ]
[ 0  0  0  0  1 ]     [ 0  0  0  0  0 ]
with characteristic polynomial (λ − 2)² has a finite eigenvalue 2 of multiplicity 2 and three infinite eigenvalues.
Theorem 12.12 (Kronecker Canonical Form). Let A, B ∈ ℂ^{m×n}. Then there exist nonsingular matrices P ∈ ℂ^{m×m} and Q ∈ ℂ^{n×n} such that
P(A − λB)Q = diag(L_{l₁}, ..., L_{l_s}, L_{r₁}ᵀ, ..., L_{r_t}ᵀ, J − λI, I − λN),
12.2. Canonical Forms 129
where N is nilpotent, both N and J are in Jordan canonical form, and L_k is the (k + 1) × k bidiagonal pencil
        [ −λ            ]
        [  1  −λ        ]
L_k  =  [      1   ⋱    ]
        [          ⋱ −λ ]
        [             1 ]
The l_i are called the left minimal indices while the r_i are called the right minimal indices. Left or right minimal indices can take the value 0.
Example 12.13. Consider a 13 × 12 block diagonal matrix whose diagonal blocks are

0_{3×2},   [−λ 0; 1 −λ; 0 1],   [−λ 1 0; 0 −λ 1],   [2−λ 1; 0 2−λ],   [1 −λ 0; 0 1 −λ; 0 0 1].
Such a matrix is in KCF. The first block of zeros actually corresponds to L_0, L_0, L_0, L_0^T, L_0^T, where each L_0 has "zero columns" and one row, while each L_0^T has "zero rows" and one column. The second block is L_2 while the third block is L_2^T. The next two blocks correspond to
J = [2 1; 0 2]
while the nilpotent matrix N in this example is
N = [0 1 0; 0 0 1; 0 0 0].
Just as sets of eigenvectors span A-invariant subspaces in the case of the standard eigenproblem (recall Definition 9.35), there is an analogous geometric concept for the generalized eigenproblem.
Definition 12.14. Let A, B ∈ ℝ^{n×n} and suppose the pencil A − λB is regular. Then V is a deflating subspace if
dim(AV + BV) = dimV. (12.4)
Just as in the standard eigenvalue case, there is a matrix characterization of deflating subspace. Specifically, suppose S ∈ ℝ^{n×k} is a matrix whose columns span a k-dimensional subspace 𝒮 of ℝ^n, i.e., R(S) = 𝒮. Then S is a deflating subspace for the pencil A − λB if and only if there exists M ∈ ℝ^{k×k} such that
AS = BSM. (12.5)
130 Chapter 12. Generalized Eigenvalue Problems
If B = I, then (12.4) becomes dim(AV + V) = dim V, which is clearly equivalent to AV ⊆ V. Similarly, (12.5) becomes AS = SM as before. If the pencil is not regular, there is a concept analogous to deflating subspace called a reducing subspace.
12.3 Application to the Computation of System Zeros

Consider the linear system
ẋ = Ax + Bu,
y = Cx + Du
with A ∈ ℝ^{n×n}, B ∈ ℝ^{n×m}, C ∈ ℝ^{p×n}, and D ∈ ℝ^{p×m}. This linear time-invariant state-space model is often used in multivariable control theory, where x (= x(t)) is called the state vector, u is the vector of inputs or controls, and y is the vector of outputs or observables. For details, see, for example, [26].
In general, the (finite) zeros of this system are given by the (finite) complex numbers z, where the "system pencil"
[A − λI  B; C  D]   (12.6)
drops rank. In the special case p = m, these values are the generalized eigenvalues of the (n + m) x (n + m) pencil.
Example 12.15. Let

A = [−4 −3; 2 1],   B = [3; 1],   C = [1 2],   D = 0.

Then the transfer matrix (see [26]) of this system is

g(s) = C(sI − A)^{-1}B + D = (5s + 14)/(s^2 + 3s + 2),

which clearly has a zero at −2.8. Checking the finite eigenvalues of the pencil (12.6), we find the characteristic polynomial to be

det [A − λI  B; C  D] = 5λ + 14,

which has a root at −2.8.
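The computation can be checked numerically; the NumPy sketch below (with the entries of A and B as reconstructed for this example, which is an assumption) finds the zero both from the pencil determinant and from the transfer function.

```python
import numpy as np

# Example 12.15 data (A and B entries reconstructed here; an assumption).
A = np.array([[-4., -3.],
              [ 2.,  1.]])
B = np.array([[3.],
              [1.]])
C = np.array([[1., 2.]])
D = np.array([[0.]])

# System pencil (12.6): M - lambda*E.
M = np.block([[A, B],
              [C, D]])
E = np.zeros((3, 3))
E[:2, :2] = np.eye(2)

# det(M - lambda*E) = 5*lambda + 14 is linear; two samples give its root.
d0 = np.linalg.det(M)
d1 = np.linalg.det(M - E)
zero = -d0 / (d1 - d0)

# The transfer function g(s) = C (sI - A)^{-1} B + D vanishes there.
g = (C @ np.linalg.solve(zero * np.eye(2) - A, B) + D)[0, 0]
```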
The method of finding system zeros via a generalized eigenvalue problem also works well for general multi-input, multi-output systems. Numerically, however, one must be careful first to "deflate out" the infinite zeros (infinite eigenvalues of (12.6)). This is accomplished by computing a certain unitary equivalence on the system pencil that then yields a smaller generalized eigenvalue problem with only finite generalized eigenvalues (the finite zeros).
The connection between system zeros and the corresponding system pencil is nontrivial. However, we offer some insight below into the special case of a single-input,
12.4. Symmetric Generalized Eigenvalue Problems 131
single-output system. Specifically, let B = b ∈ ℝ^n, C = c^T ∈ ℝ^{1×n}, and D = d ∈ ℝ. Furthermore, let g(s) = c^T(sI − A)^{-1}b + d denote the system transfer function (matrix), and assume that g(s) can be written in the form

g(s) = ν(s)/π(s),

where π(s) is the characteristic polynomial of A, and ν(s) and π(s) are relatively prime (i.e., there are no "pole/zero cancellations").
Suppose z ∈ ℂ is such that

[A − zI  b; c^T  d]

is singular. Then there exists a nonzero solution [x; y] to

[A − zI  b; c^T  d] [x; y] = 0,

or

(A − zI)x + by = 0,   (12.7)
c^T x + dy = 0.   (12.8)
Assuming z is not an eigenvalue of A (i.e., no pole/zero cancellations), then from (12.7) we get
x = −(A − zI)^{-1}by.   (12.9)
Substituting this in (12.8), we have
−c^T(A − zI)^{-1}by + dy = 0,
or g(z)y = 0 by the definition of g. Now y ≠ 0 (else x = 0 from (12.9)). Hence g(z) = 0, i.e., z is a zero of g.
12.4 Symmetric Generalized Eigenvalue Problems

A very important special case of the generalized eigenvalue problem
Ax = λBx   (12.10)
for A, B ∈ ℝ^{n×n} arises when A = A^T and B = B^T > 0. For example, the second-order system of differential equations
M ẍ + K x = 0,
where M is a symmetric positive definite "mass matrix" and K is a symmetric "stiffness matrix," is a frequently employed model of structures or vibrating systems and yields a generalized eigenvalue problem of the form (12.10).
Since B is positive definite it is nonsingular. Thus, the problem (12.10) is equivalent to the standard eigenvalue problem B^{-1}Ax = λx. However, B^{-1}A is not necessarily symmetric.
132 Chapter 12. Generalized Eigenvalue Problems
Example 12.16. Let A = [1 3; 3 2] and B = [2 1; 1 1]. Then B^{-1}A = [−2 1; 5 1].

Nevertheless, the eigenvalues of B^{-1}A are always real (and are approximately 2.1926 and −3.1926 in Example 12.16).
Theorem 12.17. Let A, B ∈ ℝ^{n×n} with A = A^T and B = B^T > 0. Then the generalized eigenvalue problem
Ax = λBx
has n real eigenvalues, and the n corresponding right eigenvectors can be chosen to be orthogonal with respect to the inner product ⟨x, y⟩_B = x^T By. Moreover, if A > 0, then the eigenvalues are also all positive.
Proof: Since B > 0, it has a Cholesky factorization B = LL^T, where L is nonsingular (Theorem 10.23). Then the eigenvalue problem

Ax = λBx = λLL^T x
can be rewritten as the equivalent problem

(L^{-1}AL^{-T})L^T x = λL^T x.   (12.11)
Letting C = L^{-1}AL^{-T} and z = L^T x, (12.11) can then be rewritten as

Cz = λz.   (12.12)
Since C = C^T, the eigenproblem (12.12) has n real eigenvalues, with corresponding eigenvectors z_1, …, z_n satisfying

z_i^T z_j = δ_{ij}.
Then x_i = L^{-T} z_i, i = 1, …, n, are eigenvectors of the original generalized eigenvalue problem and satisfy

⟨x_i, x_j⟩_B = x_i^T B x_j = (z_i^T L^{-1})(LL^T)(L^{-T} z_j) = δ_{ij}.
Finally, if A = A^T > 0, then C = C^T > 0, so the eigenvalues are positive. □
Example 12.18. The Cholesky factor for the matrix B in Example 12.16 is
L = [√2 0; 1/√2 1/√2].
Then it is easily checked that
C = L^{-1}AL^{-T} = [0.5 2.5; 2.5 −1.5],
whose eigenvalues are approximately 2.1926 and -3.1926 as expected.
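The reduction used in the proof of Theorem 12.17 is easy to carry out numerically. The NumPy sketch below (with the matrix entries of Examples 12.16 and 12.18 as reconstructed here, an assumption) forms C = L^{-1}AL^{-T} and recovers B-orthogonal eigenvectors.

```python
import numpy as np

# Data of Example 12.16 (entries reconstructed here; an assumption).
A = np.array([[1., 3.],
              [3., 2.]])
B = np.array([[2., 1.],
              [1., 1.]])

L = np.linalg.cholesky(B)        # B = L L^T, L lower triangular
Linv = np.linalg.inv(L)
Cmat = Linv @ A @ Linv.T         # C = L^{-1} A L^{-T}, symmetric

evals, Z = np.linalg.eigh(Cmat)  # real eigenvalues, orthonormal columns z_i
X = np.linalg.solve(L.T, Z)      # x_i = L^{-T} z_i

# X^T B X = I (B-orthonormal) and A X = B X diag(evals).
```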
The material of this section can, of course, be generalized easily to the case where A and B are Hermitian, but since real-valued matrices are commonly used in most applications, we have restricted our attention to that case only.
12.5. Simultaneous Diagonalization 133
12.5 Simultaneous Diagonalization

Recall that many matrices can be diagonalized by a similarity. In particular, normal matrices can be diagonalized by a unitary similarity. It turns out that in some cases a pair of matrices (A, B) can be simultaneously diagonalized by the same matrix. There are many such results and we present only a representative (but important and useful) theorem here. Again, we restrict our attention only to the real case, with the complex case following in a straightforward way.
Theorem 12.19 (Simultaneous Reduction to Diagonal Form). Let A, B ∈ ℝ^{n×n} with A = A^T and B = B^T > 0. Then there exists a nonsingular matrix Q such that

Q^T A Q = D   and   Q^T B Q = I,

where D is diagonal. In fact, the diagonal elements of D are the eigenvalues of B^{-1}A.
Proof: Let B = LL^T be the Cholesky factorization of B and set C = L^{-1}AL^{-T}. Since C is symmetric, there exists an orthogonal matrix P such that P^T C P = D, where D is diagonal. Let Q = L^{-T}P. Then

Q^T A Q = P^T L^{-1} A L^{-T} P = P^T C P = D

and

Q^T B Q = P^T L^{-1}(LL^T)L^{-T} P = P^T P = I.
Finally, since QDQ^{-1} = QQ^T A QQ^{-1} = L^{-T}PP^T L^{-1}A = L^{-T}L^{-1}A = B^{-1}A, we have Λ(D) = Λ(B^{-1}A). □
Note that Q is not in general orthogonal, so it does not preserve eigenvalues of A and B individually. However, it does preserve the eigenvalues of A − λB. This can be seen directly. Let Ã = Q^T A Q and B̃ = Q^T B Q. Then B̃^{-1}Ã = Q^{-1}B^{-1}Q^{-T}Q^T A Q = Q^{-1}B^{-1}AQ.
Theorem 12.19 is very useful for reducing many statements about pairs of symmetric matrices to "the diagonal case." The following is typical.
Theorem 12.20. Let A, B ∈ ℝ^{n×n} be positive definite. Then A ≥ B if and only if B^{-1} ≥ A^{-1}.
Proof: By Theorem 12.19, there exists a nonsingular Q ∈ ℝ^{n×n} such that Q^T A Q = D and Q^T B Q = I, where D is diagonal. Now D > 0 by Theorem 10.31. Also, since A ≥ B, by Theorem 10.21 we have that Q^T A Q ≥ Q^T B Q, i.e., D ≥ I. But then D^{-1} ≤ I (this is trivially true since the two matrices are diagonal). Thus, QD^{-1}Q^T ≤ QQ^T, i.e., A^{-1} ≤ B^{-1}. □
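Theorem 12.20 can be spot-checked numerically; the sketch below builds a random pair with A ≥ B > 0 and verifies that B^{-1} − A^{-1} is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Random positive definite B, then A = B + (positive semidefinite), so A >= B.
G1 = rng.standard_normal((n, n))
B = G1 @ G1.T + n * np.eye(n)
G2 = rng.standard_normal((n, n))
A = B + G2 @ G2.T

# All eigenvalues of B^{-1} - A^{-1} should be nonnegative.
gap = np.linalg.eigvalsh(np.linalg.inv(B) - np.linalg.inv(A))
```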
12.5.1 Simultaneous diagonalization via SVD
There are situations in which forming C = L^{-1}AL^{-T} as in the proof of Theorem 12.19 is numerically problematic, e.g., when L is highly ill conditioned with respect to inversion. In such cases, simultaneous reduction can also be accomplished via an SVD. To illustrate, let
134 Chapter 12. Generalized Eigenvalue Problems
us assume that both A and B are positive definite. Further, let A = L_A L_A^T and B = L_B L_B^T be Cholesky factorizations of A and B, respectively. Compute the SVD

L_B^{-1} L_A = U Σ V^T,   (12.13)

where Σ ∈ ℝ^{n×n} is diagonal. Then the matrix Q = L_B^{-T} U performs the simultaneous diagonalization. To check this, note that
Q^T A Q = U^T L_B^{-1}(L_A L_A^T)L_B^{-T} U = U^T U Σ V^T V Σ U^T U = Σ^2,

while

Q^T B Q = U^T L_B^{-1}(L_B L_B^T)L_B^{-T} U = U^T U = I.
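This SVD-based reduction is straightforward to verify numerically for random positive definite A and B; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
F = rng.standard_normal((n, n)); A = F @ F.T + n * np.eye(n)
G = rng.standard_normal((n, n)); B = G @ G.T + n * np.eye(n)

LA = np.linalg.cholesky(A)                         # A = LA LA^T
LB = np.linalg.cholesky(B)                         # B = LB LB^T
U, s, Vt = np.linalg.svd(np.linalg.solve(LB, LA))  # LB^{-1} LA = U Sigma V^T
Q = np.linalg.solve(LB.T, U)                       # Q = LB^{-T} U

# Q^T A Q = Sigma^2 (diagonal) and Q^T B Q = I.
```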
Remark 12.21. The SVD in (12.13) can be computed without explicitly forming the indicated matrix product or the inverse by using the so-called generalized singular value decomposition (GSVD). Note that

(L_B^{-1}L_A)(L_B^{-1}L_A)^T = L_B^{-1}L_A L_A^T L_B^{-T},

and thus the singular values of L_B^{-1}L_A can be found from the eigenvalue problem

L_B^{-1}L_A L_A^T L_B^{-T} z = λz.   (12.14)
Letting x = L_B^{-T} z we see that (12.14) can be rewritten in the form L_A L_A^T x = λL_B z = λL_B L_B^T L_B^{-T} z, which is thus equivalent to the generalized eigenvalue problem

L_A L_A^T x = λ L_B L_B^T x.   (12.15)
The problem (12.15) is called a generalized singular value problem and algorithms exist to solve it (and hence equivalently (12.13)) via arithmetic operations performed only on L_A and L_B separately, i.e., without forming the products L_A L_A^T or L_B L_B^T explicitly; see, for example, [7, Sec. 8.7.3]. This is analogous to finding the singular values of a matrix M by operations performed directly on M rather than by forming the matrix M^T M and solving the eigenproblem M^T Mx = λx.
Remark 12.22. Various generalizations of the results in Remark 12.21 are possible, for example, when A = A^T ≥ 0. The case when A is symmetric but indefinite is not so straightforward, at least in real arithmetic. For example, A can be written as A = PDP^T, where D is diagonal and P is orthogonal, but in writing A = PD̃D̃^T P^T = PD̃(PD̃)^T with D̃ diagonal, D̃ may have pure imaginary elements.
12.6. Higher-Order Eigenvalue Problems 135
12.6 Higher-Order Eigenvalue Problems

Consider the second-order system of differential equations
M q̈ + C q̇ + K q = 0,   (12.16)
where q(t) ∈ ℝ^n and M, C, K ∈ ℝ^{n×n}. Assume for simplicity that M is nonsingular. Suppose, by analogy with the first-order case, that we try to find a solution of (12.16) of the form q(t) = e^{λt}p, where the n-vector p and scalar λ are to be determined. Substituting in (12.16) we get
λ^2 e^{λt} M p + λ e^{λt} C p + e^{λt} K p = 0

or, since e^{λt} ≠ 0,

(λ^2 M + λC + K)p = 0.
To get a nonzero solution p, we thus seek values of λ for which the matrix λ^2 M + λC + K is singular. Since the determinantal equation

0 = det(λ^2 M + λC + K) = λ^{2n} + ⋯

yields a polynomial of degree 2n, there are 2n eigenvalues for the second-order (or quadratic) eigenvalue problem λ^2 M + λC + K.
A special case of (12.16) arises frequently in applications: M = I, C = 0, and K = K^T. Suppose K has eigenvalues
μ_1 ≥ ⋯ ≥ μ_r ≥ 0 > μ_{r+1} ≥ ⋯ ≥ μ_n.
Let ω_k = |μ_k|^{1/2}. Then the 2n eigenvalues of the second-order eigenvalue problem λ^2 I + K are

± jω_k,  k = 1, …, r,
± ω_k,   k = r + 1, …, n.
If r = n (i.e., K = K^T ≥ 0), then all solutions of q̈ + Kq = 0 are oscillatory.
12.6.1 Conversion to first-order form
Let x_1 = q and x_2 = q̇. Then (12.16) can be written as a first-order system (with block companion matrix)

ẋ = [0 I; −M^{-1}K −M^{-1}C] x,
where x(t) ∈ ℝ^{2n}. If M is singular, or if it is desired to avoid the calculation of M^{-1} because M is too ill conditioned with respect to inversion, the second-order problem (12.16) can still be converted to the first-order generalized linear system
[I 0; 0 M] ẋ = [0 I; −K −C] x.
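Either linearization can be checked numerically: every eigenvalue λ of the block companion matrix makes λ^2 M + λC + K singular. A small NumPy sketch with illustrative coefficient matrices (the specific values are not from the text):

```python
import numpy as np

# Illustrative mass/damping/stiffness matrices (assumed for this sketch).
M = np.array([[2., 0.], [0., 1.]])
C = np.array([[0.1, 0.], [0., 0.2]])
K = np.array([[3., 1.], [1., 2.]])
n = 2

Minv = np.linalg.inv(M)
F = np.block([[np.zeros((n, n)), np.eye(n)],
              [-Minv @ K, -Minv @ C]])
lam = np.linalg.eigvals(F)   # the 2n quadratic eigenvalues

# Each lam makes lam^2 M + lam C + K (nearly) singular.
residuals = [abs(np.linalg.det(l**2 * M + l * C + K)) for l in lam]
```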
136 Chapter 12. Generalized Eigenvalue Problems
Many other first-order realizations are possible. Some can be useful when M, C, and/or K have special symmetry or skew-symmetry properties that can be exploited.
Higher-order analogues of (12.16) involving, say, the kth derivative of q, lead naturally to higher-order eigenvalue problems that can be converted to first-order form using a kn × kn block companion matrix analogue of (11.19). Similar procedures hold for the general kth-order difference equation
which can be converted to various first-order systems of dimension kn.
EXERCISES
1. Suppose A ∈ ℝ^{n×n} and D ∈ ℝ^{m×m} is nonsingular. Show that the finite generalized eigenvalues of the pencil

[A B; C D] − λ[I 0; 0 0]

are the eigenvalues of the matrix A − BD^{-1}C.
2. Let F, G ∈ ℂ^{n×n}. Show that the nonzero eigenvalues of FG and GF are the same.
Hint: An easy "trick proof" is to verify that the matrices

[FG 0; G 0]   and   [0 0; G GF]

are similar via the similarity transformation

[I F; 0 I].
3. Let F ∈ ℂ^{n×m}, G ∈ ℂ^{m×n}. Are the nonzero singular values of FG and GF the same?
4. Suppose A ∈ ℝ^{n×n}, B ∈ ℝ^{n×m}, and C ∈ ℝ^{m×n}. Show that the generalized eigenvalues of the pencils

[A B; C 0] − λ[I 0; 0 0]

and

[A + BF + GC  B; C  0] − λ[I 0; 0 0]

are identical for all F ∈ ℝ^{m×n} and all G ∈ ℝ^{n×m}.
Hint: Consider the equivalence

[I G; 0 I][A − λI B; C 0][I 0; F I].
(A similar result is also true for "nonsquare" pencils. In the parlance of control theory, such results show that zeros are invariant under state feedback or output injection.)
Exercises 137
5. Another family of simultaneous diagonalization problems arises when it is desired that the simultaneous diagonalizing transformation Q operates on matrices A, B ∈ ℝ^{n×n} in such a way that Q^{-1}AQ^{-T} and Q^T BQ are simultaneously diagonal. Such a transformation is called contragredient. Consider the case where both A and B are positive definite with Cholesky factorizations A = L_A L_A^T and B = L_B L_B^T, respectively, and let UΣV^T be an SVD of L_B^T L_A.
(a) Show that Q = L_A V Σ^{-1/2} is a contragredient transformation that reduces both A and B to the same diagonal matrix.

(b) Show that Q^{-1} = Σ^{-1/2} U^T L_B^T.

(c) Show that the eigenvalues of AB are the same as those of Σ^2 and hence are positive.
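The claims in parts (a) through (c) can be checked numerically for random positive definite A and B; a NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
F = rng.standard_normal((n, n)); A = F @ F.T + n * np.eye(n)
G = rng.standard_normal((n, n)); B = G @ G.T + n * np.eye(n)

LA = np.linalg.cholesky(A)
LB = np.linalg.cholesky(B)
U, s, Vt = np.linalg.svd(LB.T @ LA)    # LB^T LA = U Sigma V^T
Q = LA @ Vt.T @ np.diag(s**-0.5)       # Q = LA V Sigma^{-1/2}
Qinv = np.linalg.inv(Q)

# (a) Q^{-1} A Q^{-T} and Q^T B Q are both equal to Sigma.
# (b) Q^{-1} = Sigma^{-1/2} U^T LB^T.
# (c) The eigenvalues of A B are the squared singular values.
```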
Chapter 13
Kronecker Products
13.1 Definition and Examples

Definition 13.1. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{p×q}. Then the Kronecker product (or tensor product) of A and B is defined as the matrix
A ⊗ B = [a_{11}B ⋯ a_{1n}B; ⋮ ⋱ ⋮; a_{m1}B ⋯ a_{mn}B] ∈ ℝ^{mp×nq}.   (13.1)
Obviously, the same definition holds if A and B are complex-valued matrices. We restrict our attention in this chapter primarily to real-valued matrices, pointing out the extension to the complex case only where it is not obvious.
Example 13.2.
1. Let A = [1 2 3; 3 2 1] and B = [2 1; 2 3]. Then

A ⊗ B = [B 2B 3B; 3B 2B B] = [2 1 4 2 6 3; 2 3 4 6 6 9; 6 3 4 2 2 1; 6 9 4 6 2 3].
Note that B ⊗ A ≠ A ⊗ B.
2. For any B ∈ ℝ^{p×q}, I_2 ⊗ B = [B 0; 0 B]. Replacing I_2 by I_n yields a block diagonal matrix with n copies of B along the diagonal.
3. Let B be an arbitrary 2 × 2 matrix. Then

B ⊗ I_2 = [b_{11} 0 b_{12} 0; 0 b_{11} 0 b_{12}; b_{21} 0 b_{22} 0; 0 b_{21} 0 b_{22}].
139
140 Chapter 13. Kronecker Products
The extension to arbitrary B and In is obvious.
4. Let x ∈ ℝ^m, y ∈ ℝ^n. Then

x ⊗ y = [x_1 y^T, …, x_m y^T]^T = [x_1 y_1, …, x_1 y_n, x_2 y_1, …, x_m y_n]^T ∈ ℝ^{mn}.

5. Let x ∈ ℝ^m, y ∈ ℝ^n. Then

x^T ⊗ y = y x^T = y ⊗ x^T ∈ ℝ^{n×m}.
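The computations in Example 13.2 are easy to reproduce with NumPy's kron:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [3, 2, 1]])
B = np.array([[2, 1],
              [2, 3]])

K = np.kron(A, B)                  # item 1: block matrix [B 2B 3B; 3B 2B B]
blocks = np.block([[B, 2 * B, 3 * B],
                   [3 * B, 2 * B, B]])

x = np.array([1., 2.])
y = np.array([3., 4., 5.])
xy = np.kron(x, y)                 # item 4: stacks x_i * y
```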
13.2 Properties of the Kronecker Product

Theorem 13.3. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{r×s}, C ∈ ℝ^{n×p}, and D ∈ ℝ^{s×t}. Then
(A ⊗ B)(C ⊗ D) = AC ⊗ BD  (∈ ℝ^{mr×pt}).   (13.2)

Proof: Simply verify that

(A ⊗ B)(C ⊗ D) = [Σ_{k=1}^n a_{1k}c_{k1} BD ⋯ Σ_{k=1}^n a_{1k}c_{kp} BD; ⋮ ⋱ ⋮; Σ_{k=1}^n a_{mk}c_{k1} BD ⋯ Σ_{k=1}^n a_{mk}c_{kp} BD] = AC ⊗ BD. □
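The mixed-product property (13.2), and the inverse formula of Theorem 13.6 that follows from it, can be verified numerically on random matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3)); B = rng.standard_normal((4, 5))
C = rng.standard_normal((3, 2)); D = rng.standard_normal((5, 3))

left = np.kron(A, B) @ np.kron(C, D)   # (mr x ns)(ns x pt) -> mr x pt
right = np.kron(A @ C, B @ D)

# Square, (generically) nonsingular factors for the inverse property.
E = rng.standard_normal((3, 3))
F = rng.standard_normal((4, 4))
kron_inv = np.linalg.inv(np.kron(E, F))
```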
Proof: For the proof, simply verify using the definitions of transpose and Kronecker product. □
Corollary 13.5. If A ∈ ℝ^{n×n} and B ∈ ℝ^{m×m} are symmetric, then A ⊗ B is symmetric.
Theorem 13.6. If A and B are nonsingular, (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}.
Proof: Using Theorem 13.3, simply note that (A ⊗ B)(A^{-1} ⊗ B^{-1}) = I ⊗ I = I. □
13.2. Properties of the Kronecker Product
Theorem 13.7. If A ∈ ℝ^{n×n} and B ∈ ℝ^{m×m} are normal, then A ⊗ B is normal.
Proof:
(A ⊗ B)^T(A ⊗ B) = (A^T ⊗ B^T)(A ⊗ B)   by Theorem 13.4
= A^T A ⊗ B^T B   by Theorem 13.3
= AA^T ⊗ BB^T   since A and B are normal
= (A ⊗ B)(A ⊗ B)^T   by Theorem 13.3. □
Corollary 13.8. If A ∈ ℝ^{n×n} is orthogonal and B ∈ ℝ^{m×m} is orthogonal, then A ⊗ B is orthogonal.
Example 13.9. Let A = [cos θ sin θ; −sin θ cos θ] and B = [cos φ sin φ; −sin φ cos φ]. Then it is easily seen that A is orthogonal with eigenvalues e^{±jθ} and B is orthogonal with eigenvalues e^{±jφ}. The 4 × 4 matrix A ⊗ B is then also orthogonal with eigenvalues e^{±j(θ+φ)} and e^{±j(θ−φ)}.
Theorem 13.10. Let A ∈ ℝ^{m×n} have a singular value decomposition U_A Σ_A V_A^T and let B ∈ ℝ^{p×q} have a singular value decomposition U_B Σ_B V_B^T. Then

(U_A ⊗ U_B)(Σ_A ⊗ Σ_B)(V_A ⊗ V_B)^T
yields a singular value decomposition of A ⊗ B (after a simple reordering of the diagonal elements of Σ_A ⊗ Σ_B and the corresponding right and left singular vectors).
Corollary 13.11. Let A ∈ ℝ^{m×n} have singular values σ_1 ≥ ⋯ ≥ σ_r > 0 and let B ∈ ℝ^{p×q} have singular values τ_1 ≥ ⋯ ≥ τ_s > 0. Then A ⊗ B (or B ⊗ A) has rs singular values σ_1τ_1 ≥ ⋯ ≥ σ_rτ_s > 0 and

rank(A ⊗ B) = (rank A)(rank B) = rank(B ⊗ A).
Theorem 13.12. Let A E IRn xn have eigenvalues Ai, i E !!, and let B E IRm xm have eigenvalues JL j, j E m. Then the mn eigenvalues of A 0 Bare
Moreover, if x_1, ..., x_p are linearly independent right eigenvectors of A corresponding to λ_1, ..., λ_p (p ≤ n), and z_1, ..., z_q are linearly independent right eigenvectors of B corresponding to μ_1, ..., μ_q (q ≤ m), then x_i ⊗ z_j ∈ ℝ^{mn} are linearly independent right eigenvectors of A ⊗ B corresponding to λ_iμ_j, i ∈ p, j ∈ q.
Proof: The basic idea of the proof is as follows:

(A ⊗ B)(x ⊗ z) = Ax ⊗ Bz
               = λx ⊗ μz
               = λμ(x ⊗ z).  □
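Theorem 13.12 can be checked numerically in a few lines. This is an illustrative NumPy sketch (not from the text); the comparison is order-insensitive since eigenvalue routines return the spectrum in no particular order.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2))

lam = np.linalg.eigvals(A)   # eigenvalues of A
mu = np.linalg.eigvals(B)    # eigenvalues of B

# Theorem 13.12: the mn = 6 eigenvalues of A (x) B are the products lam_i * mu_j
products = (lam[:, None] * mu[None, :]).ravel()
eigs_kron = np.linalg.eigvals(np.kron(A, B))

# every product lam_i * mu_j matches some eigenvalue of the Kronecker product
ok = all(np.isclose(eigs_kron, p).any() for p in products)
```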
If A and B are diagonalizable in Theorem 13.12, we can take p = n and q = m and thus get the complete eigenstructure of A ⊗ B. In general, if A and B have Jordan form
142 Chapter 13. Kronecker Products
decompositions given by P⁻¹AP = J_A and Q⁻¹BQ = J_B, respectively, then we get the following Jordan-like structure:
(P ⊗ Q)⁻¹(A ⊗ B)(P ⊗ Q) = (P⁻¹ ⊗ Q⁻¹)(A ⊗ B)(P ⊗ Q)
                        = (P⁻¹AP) ⊗ (Q⁻¹BQ)
                        = J_A ⊗ J_B.
Note that J_A ⊗ J_B, while upper triangular, is generally not quite in Jordan form and needs further reduction (to an ultimate Jordan form that also depends on whether or not certain eigenvalues are zero or nonzero).
A Schur form for A ⊗ B can be derived similarly. For example, suppose P and Q are unitary matrices that reduce A and B, respectively, to Schur (triangular) form, i.e., P^H AP = T_A and Q^H BQ = T_B (and similarly if P and Q are orthogonal similarities reducing A and B to real Schur form). Then
(P ⊗ Q)^H (A ⊗ B)(P ⊗ Q) = (P^H ⊗ Q^H)(A ⊗ B)(P ⊗ Q)
                         = (P^H AP) ⊗ (Q^H BQ)
                         = T_A ⊗ T_B.
Corollary 13.13. Let A ∈ ℝ^{n×n} and B ∈ ℝ^{m×m}. Then

1. Tr(A ⊗ B) = (Tr A)(Tr B) = Tr(B ⊗ A).

2. det(A ⊗ B) = (det A)^m (det B)^n = det(B ⊗ A).
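Both identities in Corollary 13.13 are cheap to verify numerically. An illustrative NumPy sketch (not from the text); note that the determinant exponents are the dimensions of the *other* factor.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
K = np.kron(A, B)

# Tr(A (x) B) = (Tr A)(Tr B)
tr_ok = np.isclose(np.trace(K), np.trace(A) * np.trace(B))

# det(A (x) B) = (det A)^m (det B)^n
det_ok = np.isclose(np.linalg.det(K),
                    np.linalg.det(A) ** m * np.linalg.det(B) ** n)
```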
Definition 13.14. Let A ∈ ℝ^{n×n} and B ∈ ℝ^{m×m}. Then the Kronecker sum (or tensor sum) of A and B, denoted A ⊕ B, is the mn × mn matrix (I_m ⊗ A) + (B ⊗ I_n). Note that, in general, A ⊕ B ≠ B ⊕ A.
Example 13.15.
1. Let
A = [ 1  2  3 ]        B = [ 2  1 ]
    [ 3  2  1 ]            [ 2  3 ].
    [ 1  1  4 ]
Then
A ⊕ B = (I_2 ⊗ A) + (B ⊗ I_3) =

[ 1  2  3  0  0  0 ]   [ 2  0  0  1  0  0 ]
[ 3  2  1  0  0  0 ]   [ 0  2  0  0  1  0 ]
[ 1  1  4  0  0  0 ] + [ 0  0  2  0  0  1 ]
[ 0  0  0  1  2  3 ]   [ 2  0  0  3  0  0 ]
[ 0  0  0  3  2  1 ]   [ 0  2  0  0  3  0 ]
[ 0  0  0  1  1  4 ]   [ 0  0  2  0  0  3 ].
The reader is invited to compute B ⊕ A = (I_3 ⊗ B) + (A ⊗ I_2) and note the difference with A ⊕ B.
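The Kronecker sum of Definition 13.14 is one line of NumPy. The sketch below is illustrative (not from the text), using matrices like those of Example 13.15; it also confirms that A ⊕ B and B ⊕ A differ, as the example invites the reader to check.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [3., 2., 1.],
              [1., 1., 4.]])
B = np.array([[2., 1.],
              [2., 3.]])
n, m = A.shape[0], B.shape[0]

# Definition 13.14: A (+) B = (I_m (x) A) + (B (x) I_n), an mn x mn matrix
A_sum_B = np.kron(np.eye(m), A) + np.kron(B, np.eye(n))
B_sum_A = np.kron(np.eye(n), B) + np.kron(A, np.eye(m))

same = np.array_equal(A_sum_B, B_sum_A)  # False: in general A (+) B != B (+) A
```

Although the two sums are different matrices, they are similar and therefore share the same spectrum (cf. Theorem 13.16).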
2. Recall the real JCF

    [ M  I          ]
    [    M  I       ]
J = [       ·  ·    ] ∈ ℝ^{2k×2k},
    [          M  I ]
    [             M ]

where

M = [  α  β ]
    [ −β  α ].

Define

      [ 0  1          ]
      [    0  1       ]
E_k = [       ·  ·    ] ∈ ℝ^{k×k}.
      [          0  1 ]
      [             0 ]
Then J can be written in the very compact form J = (I_k ⊗ M) + (E_k ⊗ I_2) = M ⊕ E_k.
Theorem 13.16. Let A ∈ ℝ^{n×n} have eigenvalues λ_i, i ∈ n, and let B ∈ ℝ^{m×m} have eigenvalues μ_j, j ∈ m. Then the Kronecker sum A ⊕ B = (I_m ⊗ A) + (B ⊗ I_n) has mn eigenvalues

λ_1 + μ_1, ..., λ_1 + μ_m, λ_2 + μ_1, ..., λ_2 + μ_m, ..., λ_n + μ_m.
Moreover, if x_1, ..., x_p are linearly independent right eigenvectors of A corresponding to λ_1, ..., λ_p (p ≤ n), and z_1, ..., z_q are linearly independent right eigenvectors of B corresponding to μ_1, ..., μ_q (q ≤ m), then z_j ⊗ x_i ∈ ℝ^{mn} are linearly independent right eigenvectors of A ⊕ B corresponding to λ_i + μ_j, i ∈ p, j ∈ q.
Proof: The basic idea of the proof is as follows:
[(I_m ⊗ A) + (B ⊗ I_n)](z ⊗ x) = (z ⊗ Ax) + (Bz ⊗ x)
                              = (z ⊗ λx) + (μz ⊗ x)
                              = (λ + μ)(z ⊗ x).  □
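As with Theorem 13.12, the eigenvalue statement of Theorem 13.16 is easy to test numerically. An illustrative NumPy sketch (not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))

# Kronecker sum A (+) B = (I_m (x) A) + (B (x) I_n)
ksum = np.kron(np.eye(m), A) + np.kron(B, np.eye(n))

lam = np.linalg.eigvals(A)
mu = np.linalg.eigvals(B)
sums = (lam[:, None] + mu[None, :]).ravel()   # all lambda_i + mu_j

eigs = np.linalg.eigvals(ksum)
# every sum lambda_i + mu_j matches some eigenvalue of the Kronecker sum
ok = all(np.isclose(eigs, s).any() for s in sums)
```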
If A and B are diagonalizable in Theorem 13.16, we can take p = n and q = m and thus get the complete eigenstructure of A ⊕ B. In general, if A and B have Jordan form decompositions given by P⁻¹AP = J_A and Q⁻¹BQ = J_B, respectively, then
[(Q ⊗ I_n)(I_m ⊗ P)]⁻¹ [(I_m ⊗ A) + (B ⊗ I_n)] [(Q ⊗ I_n)(I_m ⊗ P)]
  = [(I_m ⊗ P)⁻¹(Q ⊗ I_n)⁻¹] [(I_m ⊗ A) + (B ⊗ I_n)] [(Q ⊗ I_n)(I_m ⊗ P)]
  = [(I_m ⊗ P⁻¹)(Q⁻¹ ⊗ I_n)] [(I_m ⊗ A) + (B ⊗ I_n)] [(Q ⊗ I_n)(I_m ⊗ P)]
  = (I_m ⊗ J_A) + (J_B ⊗ I_n)
is a Jordan-like structure for A ⊕ B.
A Schur form for A ⊕ B can be derived similarly. Again, suppose P and Q are unitary matrices that reduce A and B, respectively, to Schur (triangular) form, i.e., P^H AP = T_A and Q^H BQ = T_B (and similarly if P and Q are orthogonal similarities reducing A and B to real Schur form). Then

[(Q ⊗ I_n)(I_m ⊗ P)]^H [(I_m ⊗ A) + (B ⊗ I_n)] [(Q ⊗ I_n)(I_m ⊗ P)] = (I_m ⊗ T_A) + (T_B ⊗ I_n),
where [(Q ⊗ I_n)(I_m ⊗ P)] = (Q ⊗ P) is unitary by Theorem 13.3 and Corollary 13.8.
13.3 Application to Sylvester and Lyapunov Equations

In this section we study the linear matrix equation
AX + XB = C,    (13.3)
where A ∈ ℝ^{n×n}, B ∈ ℝ^{m×m}, and C ∈ ℝ^{n×m}. This equation is now often called a Sylvester equation in honor of J.J. Sylvester, who studied general linear matrix equations of the form
Σ_{i=1}^{k} A_i X B_i = C.
A special case of (13.3) is the symmetric equation
AX + XA^T = C    (13.4)
obtained by taking B = A^T. When C is symmetric, the solution X ∈ ℝ^{n×n} is easily shown also to be symmetric, and (13.4) is known as a Lyapunov equation. Lyapunov equations arise naturally in stability theory.
The first important question to ask regarding (13.3) is, When does a solution exist? By writing the matrices in (13.3) in terms of their columns, it is easily seen by equating the ith columns that
Ax_i + Xb_i = c_i = Ax_i + Σ_{j=1}^{m} b_{ji} x_j.
These equations can then be rewritten as the mn x mn linear system
[ A + b_11 I    b_21 I       ···    b_m1 I     ] [ x_1 ]   [ c_1 ]
[ b_12 I        A + b_22 I   ···    b_m2 I     ] [ x_2 ]   [ c_2 ]
[    ⋮              ⋮                  ⋮       ] [  ⋮  ] = [  ⋮  ]    (13.5)
[ b_1m I        b_2m I       ···    A + b_mm I ] [ x_m ]   [ c_m ]
The coefficient matrix in (13.5) clearly can be written as the Kronecker sum (I_m ⊗ A) + (B^T ⊗ I_n). The following definition is very helpful in completing the writing of (13.5) as an "ordinary" linear system.
Definition 13.17. Let c_i ∈ ℝⁿ denote the columns of C ∈ ℝ^{n×m} so that C = [c_1, ..., c_m]. Then vec(C) is defined to be the mn-vector formed by stacking the columns of C on top of one another, i.e.,

vec(C) = [ c_1 ]
         [ c_2 ]
         [  ⋮  ]
         [ c_m ] ∈ ℝ^{mn}.
Using Definition 13.17, the linear system (13.5) can be rewritten in the form
[(I_m ⊗ A) + (B^T ⊗ I_n)] vec(X) = vec(C).    (13.6)
There exists a unique solution to (13.6) if and only if [(I_m ⊗ A) + (B^T ⊗ I_n)] is nonsingular. But [(I_m ⊗ A) + (B^T ⊗ I_n)] is nonsingular if and only if it has no zero eigenvalues. From Theorem 13.16, the eigenvalues of [(I_m ⊗ A) + (B^T ⊗ I_n)] are λ_i + μ_j, where λ_i ∈ Λ(A), i ∈ n, and μ_j ∈ Λ(B), j ∈ m. We thus have the following theorem.
Theorem 13.18. Let A ∈ ℝ^{n×n}, B ∈ ℝ^{m×m}, and C ∈ ℝ^{n×m}. Then the Sylvester equation
AX + XB = C    (13.7)
has a unique solution if and only if A and −B have no eigenvalues in common.
Sylvester equations of the form (13.3) (or symmetric Lyapunov equations of the form (13.4)) are generally not solved using the mn × mn "vec" formulation (13.6). The most commonly preferred numerical algorithm is described in [2]. First A and B are reduced to (real) Schur form. An equivalent linear system is then solved in which the triangular form of the reduced A and B can be exploited to solve successively for the columns of a suitably transformed solution matrix X. Assuming that, say, n ≥ m, this algorithm takes only O(n³) operations rather than the O(n⁶) that would be required by solving (13.6) directly with Gaussian elimination. A further enhancement to this algorithm is available in [6] whereby the larger of A or B is initially reduced only to upper Hessenberg rather than triangular Schur form.
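SciPy's scipy.linalg.solve_sylvester follows the Schur-based approach of [2], so the comparison described above can be sketched directly. An illustrative sketch (not from the text); note that since vec stacks columns, NumPy's reshaping must use order='F' (column-major).

```python
import numpy as np
from scipy.linalg import solve_sylvester  # Schur-form (Bartels-Stewart-type) solver

rng = np.random.default_rng(3)
n, m = 4, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
C = rng.standard_normal((n, m))

# solve AX + XB = C via the Schur-form algorithm
X = solve_sylvester(A, B, C)

# the mn x mn "vec" formulation (13.6): [(I_m (x) A) + (B^T (x) I_n)] vec(X) = vec(C)
K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
x = np.linalg.solve(K, C.ravel(order='F'))    # vec stacks columns
X_vec = x.reshape((n, m), order='F')

ok = np.allclose(X, X_vec) and np.allclose(A @ X + X @ B, C)
```

Both routes give the same X here, but the vec route builds and factors an mn × mn system, which is exactly the O(n⁶) cost the text warns against.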
The next few theorems are classical. They culminate in Theorem 13.24, one of many elegant connections between matrix theory and stability theory for differential equations.
Theorem 13.19. Let A ∈ ℝ^{n×n}, B ∈ ℝ^{m×m}, and C ∈ ℝ^{n×m}. Suppose further that A and B are asymptotically stable (a matrix is asymptotically stable if all its eigenvalues have real parts in the open left half-plane). Then the (unique) solution of the Sylvester equation
AX + XB = C    (13.8)
can be written as

X = − ∫_0^{+∞} e^{tA} C e^{tB} dt.    (13.9)
Proof: Since A and B are stable, λ_i(A) + λ_j(B) ≠ 0 for all i, j, so there exists a unique solution to (13.8) by Theorem 13.18. Now integrate the differential equation Ẋ = AX + XB (with X(0) = C) on [0, +∞):
lim_{t→+∞} X(t) − X(0) = A (∫_0^{+∞} X(t) dt) + (∫_0^{+∞} X(t) dt) B.    (13.10)
Using the results of Section 11.1.6, it can be shown easily that lim_{t→+∞} e^{tA} = lim_{t→+∞} e^{tB} = 0.
Hence, using the solution X(t) = e^{tA} C e^{tB} from Theorem 11.6, we have that lim_{t→+∞} X(t) = 0.
Substituting in (13.10) we have

−C = A (∫_0^{+∞} e^{tA} C e^{tB} dt) + (∫_0^{+∞} e^{tA} C e^{tB} dt) B

and so X = −∫_0^{+∞} e^{tA} C e^{tB} dt satisfies (13.8).  □
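The integral representation (13.9) can be checked against a direct solve by truncating and discretizing the integral. An illustrative sketch (not from the text); the stabilizing shift and the loose tolerance are artifacts of this particular numerical approximation.

```python
import numpy as np
from scipy.linalg import expm, solve_sylvester

rng = np.random.default_rng(4)

def stabilize(M):
    # shift a random matrix so that every eigenvalue has real part <= -1
    return M - (np.abs(np.linalg.eigvals(M).real).max() + 1.0) * np.eye(M.shape[0])

A = stabilize(rng.standard_normal((3, 3)))
B = stabilize(rng.standard_normal((2, 2)))
C = rng.standard_normal((3, 2))

# X = - integral_0^inf e^{tA} C e^{tB} dt, truncated to [0, 40] on a uniform grid
ts = np.linspace(0.0, 40.0, 8001)
vals = np.array([expm(t * A) @ C @ expm(t * B) for t in ts])
dt = ts[1] - ts[0]
X_int = -(vals.sum(axis=0) - 0.5 * (vals[0] + vals[-1])) * dt  # trapezoidal rule

X_exact = solve_sylvester(A, B, C)
ok = np.allclose(X_int, X_exact, atol=1e-3)  # tolerance reflects the discretization
```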
Remark 13.20. An equivalent condition for the existence of a unique solution to AX + XB = C is that

[ A   C ]                    [ A   0 ]
[ 0  −B ]   be similar to    [ 0  −B ]

(via the similarity [ I  −X ; 0  I ]).
Theorem 13.21. Let A, C ∈ ℝ^{n×n}. Then the Lyapunov equation

AX + XA^T = C    (13.11)
has a unique solution if and only if A and −A^T have no eigenvalues in common. If C is symmetric and (13.11) has a unique solution, then that solution is symmetric.
Remark 13.22. If the matrix A ∈ ℝ^{n×n} has eigenvalues λ_1, ..., λ_n, then −A^T has eigenvalues −λ_1, ..., −λ_n. Thus, a sufficient condition that guarantees that A and −A^T have no common eigenvalues is that A be asymptotically stable. Many useful results exist concerning the relationship between stability and Lyapunov equations. Two basic results due to Lyapunov are the following, the first of which follows immediately from Theorem 13.19.
Theorem 13.23. Let A, C ∈ ℝ^{n×n} and suppose further that A is asymptotically stable. Then the (unique) solution of the Lyapunov equation
AX + XA^T = C
can be written as

X = − ∫_0^{+∞} e^{tA} C e^{tA^T} dt.    (13.12)
Theorem 13.24. A matrix A ∈ ℝ^{n×n} is asymptotically stable if and only if there exists a positive definite solution to the Lyapunov equation

AX + XA^T = C,    (13.13)

where C = C^T < 0.
Proof: Suppose A is asymptotically stable. By Theorems 13.21 and 13.23 a solution to (13.13) exists and takes the form (13.12). Now let v be an arbitrary nonzero vector in ℝⁿ. Then

v^T X v = − ∫_0^{+∞} v^T e^{tA} C e^{tA^T} v dt.
Since −C > 0 and e^{tA} is nonsingular for all t, the integrand above is positive. Hence v^T X v > 0 and thus X is positive definite.
Conversely, suppose X = X^T > 0 and let λ ∈ Λ(A) with corresponding left eigenvector y. Then

0 > y^H C y = y^H A X y + y^H X A^T y
            = (λ + λ̄) y^H X y.

Since y^H X y > 0, we must have λ + λ̄ = 2 Re λ < 0. Since λ was arbitrary, A must be asymptotically stable.  □
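Theorem 13.24 suggests a simple numerical stability test: solve (13.13) with, say, C = −I and check that the solution is symmetric positive definite. An illustrative sketch (not from the text), using the vec form of the Lyapunov equation and column-major (order='F') stacking:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
M = rng.standard_normal((n, n))
A = M - (np.abs(np.linalg.eigvals(M).real).max() + 1.0) * np.eye(n)  # stable by construction

C = -np.eye(n)   # C = C^T < 0

# AX + X A^T = C in vec form: [(I (x) A) + (A (x) I)] vec(X) = vec(C)
K = np.kron(np.eye(n), A) + np.kron(A, np.eye(n))
X = np.linalg.solve(K, C.ravel(order='F')).reshape((n, n), order='F')

residual_ok = np.allclose(A @ X + X @ A.T, C)
sym_ok = np.allclose(X, X.T)                              # Theorem 13.21
pos_def = np.linalg.eigvalsh((X + X.T) / 2).min() > 0     # Theorem 13.24
```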
Remark 13.25. The Lyapunov equation AX + XA^T = C can also be written using the vec notation in the equivalent form

[(I ⊗ A) + (A ⊗ I)] vec(X) = vec(C).
A subtle point arises when dealing with the "dual" Lyapunov equation A^T X + XA = C. The equivalent "vec form" of this equation is

[(I ⊗ A^T) + (A^T ⊗ I)] vec(X) = vec(C).
However, the complex-valued equation A^H X + XA = C is equivalent to

[(I ⊗ A^H) + (A^T ⊗ I)] vec(X) = vec(C).
The vec operator has many useful properties, most of which derive from one key result.
Theorem 13.26. For any three matrices A, B, and C for which the matrix product ABC is defined,
vec(ABC) = (C^T ⊗ A) vec(B).
Proof: The proof follows in a fairly straightforward fashion either directly from the definitions or from the fact that vec(xy^T) = y ⊗ x.  □
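Theorem 13.26 is the workhorse identity behind the vec manipulations above, and it is easy to spot-check numerically. An illustrative sketch (not from the text); again, vec means column stacking, hence order='F' in NumPy.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))

def vec(M):
    # stack the columns of M (Definition 13.17)
    return M.ravel(order='F')

lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)   # Theorem 13.26: vec(ABC) = (C^T (x) A) vec(B)
ok = np.allclose(lhs, rhs)
```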
An immediate application is to the derivation of existence and uniqueness conditions for the solution of the simple Sylvester-like equation introduced in Theorem 6.11.
Theorem 13.27. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{p×q}, and C ∈ ℝ^{m×q}. Then the equation
AXB = C    (13.14)
has a solution X ∈ ℝ^{n×p} if and only if AA⁺CB⁺B = C, in which case the general solution is of the form

X = A⁺CB⁺ + Y − A⁺AYBB⁺,    (13.15)

where Y ∈ ℝ^{n×p} is arbitrary. The solution of (13.14) is unique if BB⁺ ⊗ A⁺A = I.
Proof: Write (13.14) as

(B^T ⊗ A) vec(X) = vec(C)    (13.16)
by Theorem 13.26. This "vector equation" has a solution if and only if
(B^T ⊗ A)(B^T ⊗ A)⁺ vec(C) = vec(C).
It is a straightforward exercise to show that (M ⊗ N)⁺ = M⁺ ⊗ N⁺. Thus, (13.16) has a solution if and only if
vec(C) = (B^T ⊗ A)((B⁺)^T ⊗ A⁺) vec(C)
       = [(B⁺B)^T ⊗ AA⁺] vec(C)
       = vec(AA⁺CB⁺B)
and hence if and only if AA⁺CB⁺B = C. The general solution of (13.16) is then given by
vec(X) = (B^T ⊗ A)⁺ vec(C) + [I − (B^T ⊗ A)⁺(B^T ⊗ A)] vec(Y),
where Y is arbitrary. This equation can then be rewritten in the form
vec(X) = ((B⁺)^T ⊗ A⁺) vec(C) + [I − (BB⁺)^T ⊗ A⁺A] vec(Y)
or, using Theorem 13.26,

X = A⁺CB⁺ + Y − A⁺AYBB⁺.

The solution is clearly unique if BB⁺ ⊗ A⁺A = I.  □
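The solvability test and general solution of Theorem 13.27 can be exercised with NumPy's pseudoinverse. An illustrative sketch (not from the text); the rank-deficient factors are chosen so that the Y-dependence of (13.15) is nontrivial.

```python
import numpy as np

rng = np.random.default_rng(7)
# rank-deficient factors: A is 4x3 with rank 2, B is 3x4 with rank 1
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))
B = rng.standard_normal((3, 1)) @ rng.standard_normal((1, 4))
X_true = rng.standard_normal((3, 3))
C = A @ X_true @ B                       # consistent by construction

Ap, Bp = np.linalg.pinv(A), np.linalg.pinv(B)

# existence test of Theorem 13.27: A A+ C B+ B = C
consistent = np.allclose(A @ Ap @ C @ Bp @ B, C)

# general solution (13.15): X = A+ C B+ + Y - A+ A Y B B+, Y arbitrary
Y = rng.standard_normal((3, 3))
X = Ap @ C @ Bp + Y - Ap @ A @ Y @ B @ Bp
solves = np.allclose(A @ X @ B, C)
```

Because A A⁺ A = A and B B⁺ B = B, the Y-dependent terms cancel inside AXB, so every choice of Y yields a solution.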
EXERCISES
1. For any two matrices A and B for which the indicated matrix product is defined, show that (vec(A))^T (vec(B)) = Tr(A^T B). In particular, if B ∈ ℝ^{n×n}, then Tr(B) = vec(I_n)^T vec(B).
2. Prove that for all matrices A and B, (A ® B)+ = A+ ® B+.
3. Show that the equation AX B = C has a solution for all C if A has full row rank and B has full column rank. Also, show that a solution, if it exists, is unique if A has full column rank and B has full row rank. What is the solution in this case?
4. Show that the general linear equation
Σ_{i=1}^{k} A_i X B_i = C
can be written in the form
[B_1^T ⊗ A_1 + ··· + B_k^T ⊗ A_k] vec(X) = vec(C).
5. Let x ∈ ℝᵐ and y ∈ ℝⁿ. Show that x^T ⊗ y = y x^T.

6. Let A ∈ ℝ^{n×n} and B ∈ ℝ^{m×m}.

(a) Show that ‖A ⊗ B‖_2 = ‖A‖_2 ‖B‖_2.

(b) What is ‖A ⊗ B‖_F in terms of the Frobenius norms of A and B? Justify your answer carefully.

(c) What is the spectral radius of A ⊗ B in terms of the spectral radii of A and B? Justify your answer carefully.
7. Let A, B ∈ ℝ^{n×n}.
(a) Show that (I ⊗ A)^k = I ⊗ A^k and (B ⊗ I)^k = B^k ⊗ I for all integers k.

(b) Show that e^{I⊗A} = I ⊗ e^A and e^{B⊗I} = e^B ⊗ I.

(c) Show that the matrices I ⊗ A and B ⊗ I commute.

(d) Show that e^{A⊕B} = e^{(I⊗A)+(B⊗I)} = e^B ⊗ e^A.
(Note: This result would look a little "nicer" had we defined our Kronecker sum the other way around. However, Definition 13.14 is conventional in the literature.)
8. Consider the Lyapunov matrix equation (13.11) with

A = [ 1   0 ]
    [ 0  −1 ]

and C the symmetric matrix

[ 2   0 ]
[ 0  −2 ].

Clearly

X_s = [ 1  0 ]
      [ 0  1 ]

is a symmetric solution of the equation. Verify that

X_ns = [  1  1 ]
       [ −1  1 ]

is also a solution and is nonsymmetric. Explain in light of Theorem 13.21.
9. Block Triangularization: Let

S = [ A  B ]
    [ C  D ],

where A ∈ ℝ^{n×n} and D ∈ ℝ^{m×m}. It is desired to find a similarity transformation of the form

T = [ I  0 ]
    [ X  I ]

such that T⁻¹ST is block upper triangular.
(a) Show that S is similar to
[ A + BX      B     ]
[    0     D − XB   ]
if X satisfies the so-called matrix Riccati equation
C − XA + DX − XBX = 0.
(b) Formulate a similar result for block lower triangularization of S.
10. Block Diagonalization: Let

S = [ A  B ]
    [ 0  D ],

where A ∈ ℝ^{n×n} and D ∈ ℝ^{m×m}. It is desired to find a similarity transformation of the form

T = [ I  Y ]
    [ 0  I ]

such that T⁻¹ST is block diagonal.
(a) Show that S is similar to

[ A  0 ]
[ 0  D ]
if Y satisfies the Sylvester equation
AY - YD = -B.
(b) Formulate a similar result for block diagonalization of

S = [ A  0 ]
    [ C  D ].
Bibliography
[1] Albert, A., Regression and the Moore–Penrose Pseudoinverse, Academic Press, New York, NY, 1972.

[2] Bartels, R.H., and G.W. Stewart, "Algorithm 432. Solution of the Matrix Equation AX + XB = C," Comm. ACM, 15(1972), 820–826.

[3] Bellman, R., Introduction to Matrix Analysis, Second Edition, McGraw-Hill, New York, NY, 1970.

[4] Bjorck, A., Numerical Methods for Least Squares Problems, SIAM, Philadelphia, PA, 1996.

[5] Cline, R.E., "Note on the Generalized Inverse of the Product of Matrices," SIAM Rev., 6(1964), 57–58.

[6] Golub, G.H., S. Nash, and C. Van Loan, "A Hessenberg–Schur Method for the Problem AX + XB = C," IEEE Trans. Autom. Control, AC-24(1979), 909–913.

[7] Golub, G.H., and C.F. Van Loan, Matrix Computations, Third Edition, Johns Hopkins Univ. Press, Baltimore, MD, 1996.

[8] Golub, G.H., and J.H. Wilkinson, "Ill-Conditioned Eigensystems and the Computation of the Jordan Canonical Form," SIAM Rev., 18(1976), 578–619.

[9] Greville, T.N.E., "Note on the Generalized Inverse of a Matrix Product," SIAM Rev., 8(1966), 518–521 [Erratum, SIAM Rev., 9(1967), 249].

[10] Halmos, P.R., Finite-Dimensional Vector Spaces, Second Edition, Van Nostrand, Princeton, NJ, 1958.

[11] Higham, N.J., Accuracy and Stability of Numerical Algorithms, Second Edition, SIAM, Philadelphia, PA, 2002.

[12] Horn, R.A., and C.R. Johnson, Matrix Analysis, Cambridge Univ. Press, Cambridge, UK, 1985.

[13] Horn, R.A., and C.R. Johnson, Topics in Matrix Analysis, Cambridge Univ. Press, Cambridge, UK, 1991.
[14] Kenney, C., and A.J. Laub, "Controllability and Stability Radii for Companion Form Systems," Math. of Control, Signals, and Systems, 1(1988), 361–390.

[15] Kenney, C.S., and A.J. Laub, "The Matrix Sign Function," IEEE Trans. Autom. Control, 40(1995), 1330–1348.

[16] Lancaster, P., and M. Tismenetsky, The Theory of Matrices, Second Edition with Applications, Academic Press, Orlando, FL, 1985.

[17] Laub, A.J., "A Schur Method for Solving Algebraic Riccati Equations," IEEE Trans. Autom. Control, AC-24(1979), 913–921.

[18] Meyer, C.D., Matrix Analysis and Applied Linear Algebra, SIAM, Philadelphia, PA, 2000.

[19] Moler, C.B., and C.F. Van Loan, "Nineteen Dubious Ways to Compute the Exponential of a Matrix," SIAM Rev., 20(1978), 801–836.

[20] Noble, B., and J.W. Daniel, Applied Linear Algebra, Third Edition, Prentice-Hall, Englewood Cliffs, NJ, 1988.

[21] Ortega, J., Matrix Theory. A Second Course, Plenum, New York, NY, 1987.

[22] Penrose, R., "A Generalized Inverse for Matrices," Proc. Cambridge Philos. Soc., 51(1955), 406–413.

[23] Stewart, G.W., Introduction to Matrix Computations, Academic Press, New York, NY, 1973.

[24] Strang, G., Linear Algebra and Its Applications, Third Edition, Harcourt Brace Jovanovich, San Diego, CA, 1988.

[25] Watkins, D.S., Fundamentals of Matrix Computations, Second Edition, Wiley-Interscience, New York, 2002.

[26] Wonham, W.M., Linear Multivariable Control. A Geometric Approach, Third Edition, Springer-Verlag, New York, NY, 1985.
Index
A-invariant subspace, 89
    matrix characterization of, 90
algebraic multiplicity, 76
angle between vectors, 58
basis, 11
    natural, 12
block matrix, 2
    definiteness of, 104
    diagonalization, 150
    inverse of, 48
    LU factorization, 5
    triangularization, 149
Cⁿ, 1
C^{m×n}, 1
C_r^{m×n}, 1
Cauchy–Bunyakovsky–Schwarz Inequality, 58
Cayley–Hamilton Theorem, 75
chain
    of eigenvectors, 87
characteristic polynomial
    of a matrix, 75
    of a matrix pencil, 125
Cholesky factorization, 101
co-domain, 17
column
    rank, 23
    vector, 1
companion matrix
    inverse of, 105
    pseudoinverse of, 106
    singular values of, 106
    singular vectors of, 106
complement
    of a subspace, 13
    orthogonal, 21
congruence, 103
conjugate transpose, 2
contragredient transformation, 137
controllability, 46
defective, 76
degree
    of a principal vector, 85
determinant, 4
    of a block matrix, 5
    properties of, 4–6
dimension, 12
direct sum
    of subspaces, 13
domain, 17
eigenvalue, 75
    invariance under similarity transformation, 81
elementary divisors, 84
equivalence transformation, 95
    orthogonal, 95
    unitary, 95
equivalent generalized eigenvalue problems, 127
equivalent matrix pencils, 127
exchange matrix, 39, 89
exponential of a Jordan block, 91, 115
exponential of a matrix, 81, 109
    computation of, 114–118
    inverse of, 110
    properties of, 109–112
field, 7
four fundamental subspaces, 23
function of a matrix, 81
generalized eigenvalue, 125
generalized real Schur form, 128
Index
A-invariant subspace, 89
    matrix characterization of, 90
algebraic multiplicity, 76
angle between vectors, 58
basis, 11
    natural, 12
block matrix, 2
    definiteness of, 104
    diagonalization, 150
    inverse of, 48
    LU factorization, 5
    triangularization, 149
C^n, 1
C^{m×n}, 1
C^{m×n}_r, 1
Cauchy–Bunyakovsky–Schwarz Inequality, 58
Cayley–Hamilton Theorem, 75
chain
    of eigenvectors, 87
characteristic polynomial
    of a matrix, 75
    of a matrix pencil, 125
Cholesky factorization, 101
co-domain, 17
column
    rank, 23
    vector, 1
companion matrix
    inverse of, 105
    pseudoinverse of, 106
    singular values of, 106
    singular vectors of, 106
complement
    of a subspace, 13
    orthogonal, 21
congruence, 103
conjugate transpose, 2
contragredient transformation, 137
controllability, 46
defective, 76
degree
    of a principal vector, 85
determinant, 4
    of a block matrix, 5
    properties of, 4–6
dimension, 12
direct sum
    of subspaces, 13
domain, 17
eigenvalue, 75
    invariance under similarity transformation, 81
elementary divisors, 84
equivalence transformation, 95
    orthogonal, 95
    unitary, 95
equivalent generalized eigenvalue problems, 127
equivalent matrix pencils, 127
exchange matrix, 39, 89
exponential of a Jordan block, 91, 115
exponential of a matrix, 81, 109
    computation of, 114–118
    inverse of, 110
    properties of, 109–112
field, 7
four fundamental subspaces, 23
function of a matrix, 81
generalized eigenvalue, 125
generalized real Schur form, 128
generalized Schur form, 127
generalized singular value decomposition, 134
geometric multiplicity, 76
Hölder Inequality, 58
Hermitian transpose, 2
higher-order difference equations
    conversion to first-order form, 121
higher-order differential equations
    conversion to first-order form, 120
higher-order eigenvalue problems
    conversion to first-order form, 136
i, 2
idempotent, 6, 51
identity matrix, 4
inertia, 103
initial-value problem, 109
    for higher-order equations, 120
    for homogeneous linear difference equations, 118
    for homogeneous linear differential equations, 112
    for inhomogeneous linear difference equations, 119
    for inhomogeneous linear differential equations, 112
inner product
    complex, 55
    complex Euclidean, 4
    Euclidean, 4, 54
    real, 54
    usual, 54
    weighted, 54
invariant factors, 84
inverses
    of block matrices, 47
j, 2
Jordan block, 82
Jordan canonical form (JCF), 82
Kronecker canonical form (KCF), 129
Kronecker delta, 20
Kronecker product, 139
    determinant of, 142
    eigenvalues of, 141
    eigenvectors of, 141
    products of, 140
    pseudoinverse of, 148
    singular values of, 141
    trace of, 142
    transpose of, 140
Kronecker sum, 142
    eigenvalues of, 143
    eigenvectors of, 143
    exponential of, 149
leading principal submatrix, 100
left eigenvector, 75
left generalized eigenvector, 125
left invertible, 26
left nullspace, 22
left principal vector, 85
linear dependence, 10
linear equations
    characterization of all solutions, 44
    existence of solutions, 44
    uniqueness of solutions, 45
linear independence, 10
linear least squares problem, 65
    general solution of, 66
    geometric solution of, 67
    residual of, 65
    solution via QR factorization, 71
    solution via singular value decomposition, 70
    statement of, 65
    uniqueness of solution, 66
linear regression, 67
linear transformation, 17
    co-domain of, 17
    composition of, 19
    domain of, 17
    invertible, 25
    left invertible, 26
    matrix representation of, 18
    nonsingular, 25
    nullspace of, 20
    range of, 20
    right invertible, 26
LU factorization, 6
    block, 5
Lyapunov differential equation, 113
Lyapunov equation, 144
    and asymptotic stability, 146
    integral form of solution, 146
    symmetry of solution, 146
    uniqueness of solution, 146
matrix
    asymptotically stable, 145
    best rank k approximation to, 67
    companion, 105
    defective, 76
    definite, 99
    derogatory, 106
    diagonal, 2
    exponential, 109
    Hamiltonian, 122
    Hermitian, 2
    Householder, 97
    indefinite, 99
    lower Hessenberg, 2
    lower triangular, 2
    nearest singular matrix to, 67
    nilpotent, 115
    nonderogatory, 105
    normal, 33, 95
    orthogonal, 4
    pentadiagonal, 2
    quasi-upper-triangular, 98
    sign of a, 91
    square root of a, 101
    symmetric, 2
    symplectic, 122
    tridiagonal, 2
    unitary, 4
    upper Hessenberg, 2
    upper triangular, 2
matrix exponential, 81, 91, 109
matrix norm, 59
    1-, 60
    2-, 60
    ∞-, 60
    p-, 60
    consistent, 61
    Frobenius, 60
    induced by a vector norm, 61
    mixed, 60
    mutually consistent, 61
    relations among, 61
    Schatten, 60
    spectral, 60
    subordinate to a vector norm, 61
    unitarily invariant, 62
matrix pencil, 125
    equivalent, 127
    reciprocal, 126
    regular, 126
    singular, 126
matrix sign function, 91
minimal polynomial, 76
monic polynomial, 76
Moore–Penrose pseudoinverse, 29
multiplication
    matrix-matrix, 3
    matrix-vector, 3
Murnaghan–Wintner Theorem, 98
negative definite, 99
negative invariant subspace, 92
nonnegative definite, 99
    criteria for, 100
nonpositive definite, 99
norm
    induced, 56
    natural, 56
normal equations, 65
normed linear space, 57
nullity, 24
nullspace, 20
    left, 22
    right, 22
observability, 46
one-to-one (1–1), 23
    conditions for, 25
onto, 23
    conditions for, 25
orthogonal
    complement, 21
    matrix, 4
    projection, 52
    subspaces, 14
    vectors, 4, 20
orthonormal
    vectors, 4, 20
outer product, 19
    and Kronecker product, 140
    exponential of, 121
    pseudoinverse of, 33
    singular value decomposition of, 41
    various matrix norms of, 63
pencil
    equivalent, 127
    of matrices, 125
    reciprocal, 126
    regular, 126
    singular, 126
Penrose theorem, 30
polar factorization, 41
polarization identity, 57
positive definite, 99
    criteria for, 100
positive invariant subspace, 92
power (kth) of a Jordan block, 120
powers of a matrix
    computation of, 119–120
principal submatrix, 100
projection
    oblique, 51
    on four fundamental subspaces, 52
    orthogonal, 52
pseudoinverse, 29
    four Penrose conditions for, 30
    of a full-column-rank matrix, 30
    of a full-row-rank matrix, 30
    of a matrix product, 32
    of a scalar, 31
    of a vector, 31
    uniqueness, 30
    via singular value decomposition, 38
Pythagorean Identity, 59
Q-orthogonality, 55
QR factorization, 72
R^n, 1
R^{m×n}, 1
R^{m×n}_r, 1
R^{n×n}_n, 1
range, 20
range inclusion
    characterized by pseudoinverses, 33
rank, 23
    column, 23
    row, 23
rank-one matrix, 19
rational canonical form, 104
Rayleigh quotient, 100
reachability, 46
real Schur canonical form, 98
real Schur form, 98
reciprocal matrix pencil, 126
reconstructibility, 46
regular matrix pencil, 126
residual, 65
resolvent, 111
reverse-order identity matrix, 39, 89
right eigenvector, 75
right generalized eigenvector, 125
right invertible, 26
right nullspace, 22
right principal vector, 85
row
    rank, 23
    vector, 1
Schur canonical form, 98
    generalized, 127
Schur complement, 6, 48, 102, 104
Schur Theorem, 98
Schur vectors, 98
second-order eigenvalue problem, 135
    conversion to first-order form, 135
Sherman–Morrison–Woodbury formula, 48
signature, 103
similarity transformation, 95
    and invariance of eigenvalues, 81
    orthogonal, 95
    unitary, 95
simple eigenvalue, 85
simultaneous diagonalization, 133
    via singular value decomposition, 134
singular matrix pencil, 126
singular value decomposition (SVD), 35
    and bases for four fundamental subspaces, 38
    and pseudoinverse, 38
    and rank, 38
    characterization of a matrix factorization as, 37
    dyadic expansion, 38
    examples, 37
    full vs. compact, 37
    fundamental theorem, 35
    nonuniqueness, 36
singular values, 36
singular vectors
    left, 36
    right, 36
span, 11
spectral radius, 62, 107
spectral representation, 97
spectrum, 76
subordinate norm, 61
subspace, 9
    A-invariant, 89
    deflating, 129
    reducing, 130
subspaces
    complements of, 13
    direct sum of, 13
    equality of, 10
    four fundamental, 23
    intersection of, 13
    orthogonal, 14
    sum of, 13
Sylvester differential equation, 113
Sylvester equation, 144
    integral form of solution, 145
    uniqueness of solution, 145
Sylvester's Law of Inertia, 103
symmetric generalized eigenvalue problem, 131
total least squares, 68
trace, 6
transpose, 2
    characterization by inner product, 54
    of a block matrix, 2
triangle inequality
    for matrix norms, 59
    for vector norms, 57
unitarily invariant
    matrix norm, 62
    vector norm, 58
variation of parameters, 112
vec
    of a matrix, 145
    of a matrix product, 147
vector norm, 57
    1-, 57
    2-, 57
    ∞-, 57
    p-, 57
    equivalent, 59
    Euclidean, 57
    Manhattan, 57
    relations among, 59
    unitarily invariant, 58
    weighted, 58
    weighted p-, 58
vector space, 8
    dimension of, 12
vectors, 1
    column, 1
    linearly dependent, 10
    linearly independent, 10
    orthogonal, 4, 20
    orthonormal, 4, 20
    row, 1
    span of a set of, 11
zeros
    of a linear dynamical system, 130