Linear Algebra and Matrices
Notes from Charles A. Rohde
Fall 2003
Contents

1 Linear Vector Spaces
  1.1 Definition and Examples
  1.2 Linear Independence and Bases

2 Linear Transformations
  2.1 Sums and Scalar Products
  2.2 Products

3 Matrices
  3.1 Definitions
  3.2 Sums and Products of Matrices
  3.3 Conjugate Transpose and Transpose Operations
  3.4 Invertibility and Inverses
  3.5 Direct Products and Vecs

4 Matrix Factorization
  4.1 Rank of a Matrix
5 Linear Equations
  5.1 Consistency Results
  5.2 Solutions to Linear Equations
    5.2.1 A is p by p of rank p
    5.2.2 A is n by p of rank p
    5.2.3 A is n by p of rank r where r ≤ p ≤ n
    5.2.4 Generalized Inverses

6 Determinants
  6.1 Definition and Properties
  6.2 Computation

7 Inner Product Spaces
  7.1 Definitions and Elementary Properties
  7.2 Gram Schmidt Process
  7.3 Orthogonal Projections

8 Characteristic Roots and Vectors
  8.1 Definitions
  8.2 Factorizations
  8.3 Quadratic Forms
  8.4 Jordan Canonical Form
  8.5 The Singular Value Decomposition (SVD) of a Matrix
    8.5.1 Use of the SVD in Data Reduction and Reconstruction
Chapter 1
Linear Vector Spaces
1.1 Definition and Examples
Definition 1.1 A linear vector space $V$ is a set of points (called vectors) satisfying the following conditions:

(1) An operation $+$ exists on $V$ which satisfies the following properties:

(a) $x + y = y + x$
(b) $x + (y + z) = (x + y) + z$
(c) A vector $0$ exists in $V$ such that $x + 0 = x$ for every $x \in V$
(d) For every $x \in V$ a vector $-x$ exists in $V$ such that $(-x) + x = 0$

(2) An operation $\cdot$ (scalar multiplication) exists on $V$ which satisfies the following properties:

(a) $\alpha \cdot (x + y) = \alpha \cdot x + \alpha \cdot y$
(b) $\alpha \cdot (\beta \cdot x) = (\alpha\beta) \cdot x$
(c) $(\alpha + \beta) \cdot x = \alpha \cdot x + \beta \cdot x$
(d) $1 \cdot x = x$
where the scalars $\alpha$ and $\beta$ belong to a field $F$ with the identity being given by $1$. For ease of notation we will eliminate the $\cdot$ in scalar multiplication.

In most applications the field $F$ will be the field of real numbers $\mathbb{R}$ or the field of complex numbers $\mathbb{C}$. The $0$ vector will be called the null vector or the origin.
example 1: Let $x$ represent a point in two dimensional space with addition and scalar multiplication defined by

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \end{pmatrix} \quad\text{and}\quad \alpha\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} \alpha x_1 \\ \alpha x_2 \end{pmatrix}$$

The origin and negatives are defined by

$$0 = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad\text{and}\quad -\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -x_1 \\ -x_2 \end{pmatrix}$$
example 2: Let $x$ represent a point in $n$ dimensional space (called Euclidean space and denoted by $\mathbb{R}^n$) with addition and scalar multiplication defined by

$$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{pmatrix} \quad\text{and}\quad \alpha\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \alpha x_1 \\ \alpha x_2 \\ \vdots \\ \alpha x_n \end{pmatrix}$$

The origin and negatives are defined by

$$0 = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \quad\text{and}\quad -\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} -x_1 \\ -x_2 \\ \vdots \\ -x_n \end{pmatrix}$$
example 3: Let $x$ represent a point in $n$ dimensional complex space (called Unitary space and denoted by $\mathbb{C}^n$) with addition and scalar multiplication defined by

$$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{pmatrix} \quad\text{and}\quad \alpha\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \alpha x_1 \\ \vdots \\ \alpha x_n \end{pmatrix}$$

The origin and negatives are defined by

$$0 = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix} \quad\text{and}\quad -\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} -x_1 \\ \vdots \\ -x_n \end{pmatrix}$$
In this case the $x_i$ and $y_i$ can be complex numbers, as can the scalars.

example 4: Let $p$ be an $n$th degree polynomial, i.e.

$$p(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_n x^n$$

where the $\alpha_i$ are complex numbers. Define addition and scalar multiplication by

$$(p_1 + p_2)(x) = \sum_{i=0}^n (\alpha_{1i} + \alpha_{2i})x^i \quad\text{and}\quad (\beta p)(x) = \sum_{i=0}^n \beta\alpha_i x^i$$

The origin is the null function and $-p$ is defined by

$$(-p)(x) = \sum_{i=0}^n -\alpha_i x^i$$

example 5: Let $X_1, X_2, \ldots, X_n$ be random variables defined on a probability space and define $V$ as the collection of all linear combinations of $X_1, X_2, \ldots, X_n$. Here the vectors in $V$ are random variables.
1.2 Linear Independence and Bases
Definition 1.2 A finite set of vectors $\{x_1, x_2, \ldots, x_k\}$ is said to be a linearly independent set if

$$\sum_i \alpha_i x_i = 0 \implies \alpha_i = 0 \text{ for each } i$$

If a set of vectors is not linearly independent it is said to be linearly dependent. If the set of vectors is empty we define $\sum_i \alpha_i x_i = 0$ so that, by convention, the empty set of vectors is a linearly independent set of vectors.
Theorem 1.1 The set $\{x_1, x_2, \ldots, x_k\}$ is linearly dependent if and only if

$$x_t = \sum_{i=1}^{t-1} \alpha_i x_i \quad\text{for some } t \ge 2$$

Proof: Sufficiency is obvious since the equation

$$x_t - \sum_{i=1}^{t-1} \alpha_i x_i = 0$$

does not imply that all of the coefficients of the vectors are equal to 0.

Conversely, if the vectors are linearly dependent then we have $\sum_i \alpha_i x_i = 0$ where at least two of the $\alpha$'s are non-zero (we assume that none of the $x$'s are the zero vector). Thus, with suitable relabelling, we have

$$\sum_{i=1}^{t} \alpha_i x_i = 0$$

where $\alpha_t \neq 0$. Thus

$$x_t = \sum_{i=1}^{t-1}\left(-\frac{\alpha_i}{\alpha_t}\right)x_i = \sum_{i=1}^{t-1} \beta_i x_i$$
Definition 1.3 A linear basis or coordinate system in a vector space $V$ is a set $E$ of linearly independent vectors in $V$ such that each vector in $V$ can be written as a linear combination of the vectors in $E$.

Since the vectors in $E$ are linearly independent the representation as a linear combination is unique. If the number of vectors in $E$ is finite we say that $V$ is finite dimensional.

Definition 1.4 The dimension of a vector space is the number of vectors in any basis of the vector space.
If we have a set of linearly independent vectors $\{x_1, x_2, \ldots, x_k\}$, this set may be extended to form a basis. To see this let $V$ be finite dimensional with basis $\{y_1, y_2, \ldots, y_n\}$. Consider the set

$$\{x_1, x_2, \ldots, x_k, y_1, y_2, \ldots, y_n\}$$

Let $z$ be the first vector in this set which is a linear combination of the preceding ones. Since the $x$'s are linearly independent we must have $z = y_i$ for some $i$. Thus the set

$$\{x_1, x_2, \ldots, x_k, y_1, y_2, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n\}$$

can be used to construct every vector in $V$. If this set of vectors is linearly independent it is a basis which includes the $x$'s. If not, we continue removing $y$'s until the remaining vectors are linearly independent.

It can be proved, using the Axiom of Choice, that every vector space has a basis. In most applications an explicit basis can be written down, so the existence of a basis is not at issue.
Definition 1.5 A non-empty subset $M$ of a vector space $V$ is called a linear manifold or a subspace if $x, y \in M$ implies that every linear combination $\alpha x + \beta y \in M$.

Theorem 1.2 The intersection of any collection of subspaces is a subspace.

Proof: Since $0$ is in each of the subspaces it is in their intersection. Thus the intersection is non-empty. If $x$ and $y$ are in the intersection then $\alpha x + \beta y$ is in each of the subspaces and hence in their intersection. It follows that the intersection is a subspace.

Definition 1.6 If $\{x_i, i \in I\}$ is a set of vectors, the subspace spanned by $\{x_i, i \in I\}$ is defined to be the intersection of all subspaces containing $\{x_i, i \in I\}$.

Theorem 1.3 If $\{x_i, i \in I\}$ is a set of vectors, the subspace spanned by $\{x_i, i \in I\}$, $\mathrm{sp}(\{x_i, i \in I\})$, is the set of all linear combinations of the vectors in $\{x_i, i \in I\}$.

Proof: The set of all linear combinations of vectors in $\{x_i, i \in I\}$ is obviously a subspace which contains $\{x_i, i \in I\}$. Conversely, $\mathrm{sp}(\{x_i, i \in I\})$ contains all linear combinations of vectors in $\{x_i, i \in I\}$ so that the two subspaces are identical.

It follows that an alternative characterization of a basis is that it is a set of linearly independent vectors which spans $V$.
example 1: If $X$ is the empty set then the space spanned by $X$ is the $0$ vector.

example 2: In $\mathbb{R}^3$ let $X$ be the space spanned by $e_1, e_2$ where

$$e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad\text{and}\quad e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$$

Then $\mathrm{sp}\{e_1, e_2\}$ is the set of all vectors of the form

$$x = \begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix}$$

example 3: If $\{X_1, X_2, \ldots, X_n\}$ is a set of random variables then the space spanned by $\{X_1, X_2, \ldots, X_n\}$ is the set of all random variables of the form $\sum_i a_i X_i$. This set is a basis if $P(X_i = X_j) = 0$ for $i \neq j$.
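Editorial aside (not part of the original notes): in $\mathbb{R}^n$ linear independence can be tested numerically by checking whether the rank of the matrix whose columns are the given vectors equals the number of vectors. A minimal NumPy sketch with illustrative data:

    import numpy as np

    # Columns are the vectors e1, e2 of example 2 plus a dependent vector.
    e1 = np.array([1.0, 0.0, 0.0])
    e2 = np.array([0.0, 1.0, 0.0])
    x  = 2.0 * e1 - 3.0 * e2          # a linear combination, hence dependent

    A = np.column_stack([e1, e2])
    B = np.column_stack([e1, e2, x])

    # rank == number of columns  <=>  the columns are linearly independent
    print(np.linalg.matrix_rank(A) == A.shape[1])   # True
    print(np.linalg.matrix_rank(B) == B.shape[1])   # False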
Chapter 2
Linear Transformations
Let $U$ be a $p$ dimensional vector space and let $V$ be an $n$ dimensional vector space.

2.1 Sums and Scalar Products

Definition 2.1 A linear transformation $L$ from $U$ to $V$ is a mapping (function) from $U$ to $V$ such that

$$L(\alpha x + \beta y) = \alpha L(x) + \beta L(y) \quad\text{for every } x, y \in U \text{ and all } \alpha, \beta$$

Definition 2.2 The sum of two linear transformations $L_1$ and $L_2$ is defined by the equation

$$L(x) = L_1(x) + L_2(x) \quad\text{for every } x \in U$$

Similarly $\alpha L$ is defined by $[\alpha L](x) = \alpha L(x)$.

The transformation $O$ defined by $O(x) = 0$ has the properties:

$$L + O = O + L = L \quad ; \quad L + (-L) = (-L) + L = O$$

Also note that $L_1 + L_2 = L_2 + L_1$. Thus linear transformations with addition and scalar multiplication as defined above constitute an additive commutative group.
2.2 Products
If $U$ is $p$ dimensional, $V$ is $m$ dimensional and $W$ is $n$ dimensional and

$$L_1 : U \to V \quad\text{and}\quad L_2 : V \to W$$

we define the product, $L$, of $L_1$ and $L_2$ by

$$L(x) = L_2(L_1(x))$$

Note that the product of linear transformations is not commutative. The following are some properties of linear transformations:

$$LO = OL = O$$
$$L_1(L_2 + L_3) = L_1L_2 + L_1L_3$$
$$L_1(L_2L_3) = (L_1L_2)L_3$$

If $U = V$, $L$ is called a linear transformation on a vector space and we can define the identity transformation $I$ by the equation

$$I(x) = x$$

Then $LI = IL = L$.

We can also define, in this case, $A^2$, and in general $A^n$ for any positive integer $n$. Note that

$$A^nA^m = A^{n+m} \quad\text{and}\quad (A^n)^m = A^{nm}$$

If we define $A^0 = I$ then we can define $p(A)$, where $p$ is a polynomial, by

$$p(A)(x) = \sum_{i=0}^n \alpha_i A^i x$$
Chapter 3
Matrices
3.1 Definitions
If $L$ is a linear transformation from $U$ ($p$ dimensional) to $V$ ($n$ dimensional) and we have bases $\{x_j : j \in J\}$ and $\{y_i : i \in I\}$ respectively for $U$ and $V$, then for each $j \in J$ we have

$$L(x_j) = \sum_{i=1}^n \alpha_{ij} y_i \quad\text{for some choice of } \alpha_{ij}$$

The $n \times p$ array

$$\begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1p} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2p} \\ \vdots & \vdots & & \vdots \\ \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{np} \end{pmatrix}$$

is called the matrix of $L$ relative to the bases $\{x_j : j \in J\}$ and $\{y_i : i \in I\}$. To determine the matrix of $L$ we express the transformed $j$th vector of the basis in the $U$ space in terms of the basis of the $V$ space. The numbers so obtained form the $j$th column of the matrix of $L$. Note that the matrix of $L$ depends on the bases chosen for $U$ and $V$.

In typical applications a specific basis is usually present. Thus in $\mathbb{R}^n$ we usually
choose the basis $\{e_1, e_2, \ldots, e_n\}$ where

$$e_i = \begin{pmatrix} \delta_{1i} \\ \delta_{2i} \\ \vdots \\ \delta_{ni} \end{pmatrix} \quad\text{and}\quad \delta_{ji} = \begin{cases} 1 & j = i \\ 0 & j \neq i \end{cases}$$

In such circumstances we call $L$ the matrix of the linear transformation.
Definition 3.1 A matrix $L$ is an $n$ by $p$ array of scalars and is a carrier of a linear transformation ($L$ carries the basis of $U$ to the basis of $V$).

The scalars making up the matrix are called the elements of the matrix. We will write

$$L = \{\lambda_{ij}\}$$

and call $\lambda_{ij}$ the $i,j$ element of $L$ ($\lambda_{ij}$ is thus the element in the $i$th row and $j$th column of the matrix $L$). The zero or null matrix is the matrix each of whose elements is 0 and corresponds to the zero transformation. A one by $p$ matrix is called a row vector and is written as

$$\ell_i^T = (\lambda_{i1}, \lambda_{i2}, \ldots, \lambda_{ip})$$

Similarly an $n$ by one matrix is called a column vector and is written as

$$\lambda_j = \begin{pmatrix} \lambda_{1j} \\ \lambda_{2j} \\ \vdots \\ \lambda_{nj} \end{pmatrix}$$

With this notation we can write the matrix $L$ as

$$L = \begin{pmatrix} \ell_1^T \\ \ell_2^T \\ \vdots \\ \ell_n^T \end{pmatrix}$$

Alternatively, if $\lambda_j$ denotes the $j$th column of $L$, we may write the matrix $L$ as

$$L = (\lambda_1\ \lambda_2\ \cdots\ \lambda_p)$$
3.2 Sums and Products of Matrices
If $L_1$ and $L_2$ are linear transformations then their sum has matrix $L$ defined by the equation

$$L(x_j) = (L_1 + L_2)(x_j) = L_1(x_j) + L_2(x_j) = \sum_i \alpha^{(1)}_{ij}y_i + \sum_i \alpha^{(2)}_{ij}y_i = \sum_i\left(\alpha^{(1)}_{ij} + \alpha^{(2)}_{ij}\right)y_i$$

Thus we have the following

Definition 3.2 The sum of two matrices $A$ and $B$ is defined as

$$A + B = \{a_{ij} + b_{ij}\}$$

Note that $A$ and $B$ both must be of the same order for the sum to be defined.

Definition 3.3 The multiplication of a matrix by a scalar is defined by the equation

$$\alpha A = \{\alpha a_{ij}\}$$

Matrix addition and scalar multiplication of a matrix have the following properties:

$$A + O = O + A = A$$
$$A + B = B + A$$
$$A + (B + C) = (A + B) + C$$
$$A + (-A) = O$$
If $L_1$ and $L_2$ are linear transformations then their product has matrix $L$ defined by the equation

$$L(x_j) = L_2(L_1(x_j)) = L_2\left(\sum_k \alpha^{(1)}_{kj}y_k\right) = \sum_k \alpha^{(1)}_{kj}L_2(y_k) = \sum_k \alpha^{(1)}_{kj}\sum_i \alpha^{(2)}_{ik}z_i = \sum_i\left(\sum_k \alpha^{(2)}_{ik}\alpha^{(1)}_{kj}\right)z_i$$

Thus we have

Definition 3.4 The product of two matrices $A$ and $B$ is defined by the equation

$$AB = \left\{\sum_k a_{ik}b_{kj}\right\}$$

Thus the $i,j$ element of the product can be found by multiplying the $i$th row of $A$ times the $j$th column of $B$ element by element and summing. Note that the product is defined only if the number of columns of $A$ is equal to the number of rows of $B$. We say that $A$ premultiplies $B$ or that $B$ postmultiplies $A$ in the product $AB$. Matrix multiplication is not commutative.
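Editorial aside (an added illustration, not part of the original notes): the NumPy sketch below computes the product exactly as in Definition 3.4, as $\sum_k a_{ik}b_{kj}$, checks it against the built-in product, and exhibits the failure of commutativity. The helper name matmul_from_definition is ours.

    import numpy as np

    def matmul_from_definition(A, B):
        # AB = {sum_k a_ik b_kj}; requires cols(A) == rows(B)
        n, p = A.shape[0], B.shape[1]
        C = np.zeros((n, p))
        for i in range(n):
            for j in range(p):
                C[i, j] = sum(A[i, k] * B[k, j] for k in range(A.shape[1]))
        return C

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    B = np.array([[0.0, 1.0], [1.0, 0.0]])

    print(np.allclose(matmul_from_definition(A, B), A @ B))  # True
    print(np.allclose(A @ B, B @ A))                         # False: not commutative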
Provided the indicated products exist, matrix multiplication has the following properties:

$$AO = O \quad\text{and}\quad OA = O$$
$$A(B + C) = AB + AC$$
$$(A + B)C = AC + BC$$
$$A(BC) = (AB)C$$

If $n = p$ the matrix $A$ is said to be a square matrix. In this case we can compute $A^n$ for any positive integer $n$, which has the following properties:

$$A^nA^m = A^{n+m} \quad\text{and}\quad (A^n)^m = A^{nm}$$

The identity matrix $I$ can be found using the equation

$$I(x_j) = x_j$$

so that the $j$th column of the identity matrix consists of a one in the $j$th row and zeros elsewhere. The identity matrix has the property that

$$AI = IA = A$$

If we define $A^0 = I$ then if $p$ is a polynomial we may define $p(A)$ by the equation

$$p(A) = \sum_{i=0}^n \alpha_i A^i$$
3.3 Conjugate Transpose and Transpose Operations
Definition 3.5 The conjugate transpose of $A$, $A^*$, is

$$A^* = \{\bar{a}_{ji}\}$$

where $\bar{a}$ is the complex conjugate of $a$. Thus if $A$ is $n$ by $p$, the conjugate transpose $A^*$ is $p$ by $n$ with $i,j$ element equal to the complex conjugate of the $j,i$ element of $A$.

Definition 3.6 The transpose of $A$, $A^T$, is

$$A^T = \{a_{ji}\}$$

Thus if $A$ is $n$ by $p$ the transpose $A^T$ is $p$ by $n$ with $i,j$ element equal to the $j,i$ element of $A$.

If $A$ is a matrix with real elements then $A^* = A^T$.
Definition 3.7 A square matrix is said to be

Hermitian if $A^* = A$

symmetric if $A^T = A$

normal if $AA^* = A^*A$

The following are some properties of the conjugate transpose and transpose operations:

$$(AB)^* = B^*A^* \qquad (AB)^T = B^TA^T$$
$$I^* = I \ ;\ O^* = O^T = O \ ;\ I^T = I$$
$$(A + B)^* = A^* + B^* \qquad (A + B)^T = A^T + B^T$$
$$(\alpha A)^* = \bar{\alpha}A^* \qquad (\alpha A)^T = \alpha A^T$$

Definition 3.8 A square matrix $A$ is said to be

diagonal if $a_{ij} = 0$ for $i \neq j$

upper right triangular if $a_{ij} = 0$ for $i > j$

lower left triangular if $a_{ij} = 0$ for $i < j$

A column vector is an $n$ by one matrix and a row vector is a one by $n$ matrix. If $x$ is a column vector then $x^T$ is the row vector with the same elements.
3.4 Invertibility and Inverses
Lemma 3.1 $A = B$ if and only if $Ax = Bx$ for all $x$.

Proof: If $A = B$ the assertion is obvious. If $Ax = Bx$ for all $x$ then we need only take $x = e_i$ for $i = 1, 2, \ldots, n$, where $e_i$ is the column vector with 1 in the $i$th row and zeros elsewhere. Thus $A$ and $B$ are equal column by column.
If $A$ is $n$ by $p$ and $B$ is $p$ by $m$ and they are written in partitioned form as

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$$

then the product $AB$ satisfies

$$AB = \begin{pmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{pmatrix}$$

if the partitions are conformable.

If a vector is written as $x = \sum_i \xi_i e_i$ where the $e_i$ form a basis for $U$, then the product $Ax$ gives the transformed value of $x$ under the linear transformation $A$ which corresponds to the matrix $A$ via the formula

$$A(x) = \sum_i \xi_i A(e_i)$$
Definition 3.9 A square matrix $A$ is said to be invertible if

(1) $x_1 \neq x_2 \implies Ax_1 \neq Ax_2$

(2) For every vector $y$ there exists at least one $x$ such that $Ax = y$.

If $A$ is invertible and maps $U$ into $V$, both $n$ dimensional, then we define $A^{-1}$, which maps $V$ into $U$, by the equation

$$A^{-1}(y_0) = x_0$$

where $y_0$ is in $V$ and $x_0$ is in $U$ and satisfies $Ax_0 = y_0$ (such an $x_0$ exists by (2) and is unique by (1)). Thus defined, $A^{-1}$ is a legitimate transformation from $V$ to $U$. That $A^{-1}$ is a linear transformation from $V$ to $U$ can be seen by the fact that if $\alpha_1 y_1 + \alpha_2 y_2 \in V$ then there exist unique $x_1$ and $x_2$ in $U$ such that $y_1 = A(x_1)$ and $y_2 = A(x_2)$. Since

$$A(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 A(x_1) + \alpha_2 A(x_2) = \alpha_1 y_1 + \alpha_2 y_2$$

it follows that

$$A^{-1}(\alpha_1 y_1 + \alpha_2 y_2) = \alpha_1 x_1 + \alpha_2 x_2 = \alpha_1 A^{-1}(y_1) + \alpha_2 A^{-1}(y_2)$$

Thus $A^{-1}$ is a linear transformation and hence has a matrix representation. Finding the matrix representation of $A^{-1}$ is not an easy task, however.
Theorem 3.2 If $A$, $B$ and $C$ are matrices such that

$$AB = CA = I$$

then $A$ is invertible and $A^{-1} = B = C$.

Proof: If $Ax_1 = Ax_2$ then $CAx_1 = CAx_2$ so that $x_1 = x_2$ and (1) of definition 3.9 is satisfied. If $y$ is any vector then defining $x = By$ implies $Ax = ABy = y$ so that (2) of definition 3.9 is also satisfied and hence $A$ is invertible. Since $AB = I \implies B = A^{-1}$ and $CA = I \implies C = A^{-1}$, the conclusions follow.
Theorem 3.3 A matrix $A$ is invertible if and only if $Ax = 0$ implies $x = 0$, or equivalently if and only if every $y \in V$ can be written as $y = Ax$.

Proof: If $A$ is invertible then $Ax = 0$ and $A0 = 0$ imply that $x = 0$, since otherwise we have a contradiction to (1) of definition 3.9. If $A$ is invertible then by (2) of definition 3.9 we have that $y = Ax$ for every $y \in V$.

Suppose that $Ax = 0$ implies that $x = 0$. Then if $x_1 \neq x_2$ we have that $A(x_1 - x_2) \neq 0$, so that $Ax_1 \neq Ax_2$, i.e. (1) of definition 3.9 is satisfied. If $x_1, x_2, \ldots, x_n$ is a basis for $U$ then

$$\sum_i \alpha_i A(x_i) = 0 \implies A\left(\sum_i \alpha_i x_i\right) = 0 \implies \sum_i \alpha_i x_i = 0 \implies \alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$$

It follows that $\{Ax_1, Ax_2, \ldots, Ax_n\}$ is a basis for $V$, so that we can write each $y \in V$ as $y = Ax$ since

$$y = \sum_i \alpha_i A(x_i) = A\left(\sum_i \alpha_i x_i\right) = Ax$$

Thus (2) of definition 3.9 is satisfied and $A$ is invertible.

Suppose now that for each $y \in V$ we can write $y = Ax$. Let $\{y_1, y_2, \ldots, y_n\}$ be a basis for $V$ and let $x_i$ be such that $y_i = Ax_i$. Then if $\sum_i \alpha_i x_i = 0$ we have

$$0 = A\left(\sum_i \alpha_i x_i\right) = \sum_i \alpha_i A(x_i) = \sum_i \alpha_i y_i$$

which implies that each of the $\alpha_i$ equal 0. Thus $\{x_1, x_2, \ldots, x_n\}$ is a basis for $U$. It follows that every $x$ can be written as $\sum_i \alpha_i x_i$ and hence $Ax = 0$ implies that $x = 0$, i.e. (1) of definition 3.9 is satisfied. By assumption, (2) of definition 3.9 is satisfied, so that $A$ is invertible.
Theorem 3.4

(1) If $A$ and $B$ are invertible so is $AB$ and $(AB)^{-1} = B^{-1}A^{-1}$

(2) If $A$ is invertible and $\alpha \neq 0$ then $\alpha A$ is invertible and $(\alpha A)^{-1} = \alpha^{-1}A^{-1}$

(3) If $A$ is invertible then $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$

Proof: Since

$$(AB)B^{-1}A^{-1} = AA^{-1} = I \quad\text{and}\quad B^{-1}A^{-1}AB = B^{-1}B = I$$

by Theorem 3.2, $(AB)^{-1}$ exists and equals $B^{-1}A^{-1}$.

Since

$$(\alpha A)\alpha^{-1}A^{-1} = AA^{-1} = I \quad\text{and}\quad \alpha^{-1}A^{-1}(\alpha A) = A^{-1}A = I$$

it again follows from Theorem 3.2 that $(\alpha A)^{-1}$ exists and equals $\alpha^{-1}A^{-1}$.

Since

$$A^{-1}A = AA^{-1} = I$$

it again follows from Theorem 3.2 that $(A^{-1})^{-1}$ exists and equals $A$.

In many problems one guesses the inverse of $A$ and then verifies that it is in fact the inverse. The following theorems help.
Theorem 3.5

(1) Let

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix} \quad\text{where } A \text{ is invertible}$$

Then

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} + A^{-1}BQ^{-1}CA^{-1} & -A^{-1}BQ^{-1} \\ -Q^{-1}CA^{-1} & Q^{-1} \end{pmatrix}$$

where $Q = D - CA^{-1}B$.

(2) Let

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix} \quad\text{where } D \text{ is invertible}$$

Then

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} (A - BD^{-1}C)^{-1} & -(A - BD^{-1}C)^{-1}BD^{-1} \\ -D^{-1}C(A - BD^{-1}C)^{-1} & D^{-1} + D^{-1}C(A - BD^{-1}C)^{-1}BD^{-1} \end{pmatrix}$$

Proof: The product

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}\begin{pmatrix} A^{-1} + A^{-1}BQ^{-1}CA^{-1} & -A^{-1}BQ^{-1} \\ -Q^{-1}CA^{-1} & Q^{-1} \end{pmatrix}$$

is equal to

$$\begin{pmatrix} AA^{-1} + AA^{-1}BQ^{-1}CA^{-1} - BQ^{-1}CA^{-1} & -AA^{-1}BQ^{-1} + BQ^{-1} \\ CA^{-1} + CA^{-1}BQ^{-1}CA^{-1} - DQ^{-1}CA^{-1} & -CA^{-1}BQ^{-1} + DQ^{-1} \end{pmatrix}$$

which simplifies to

$$\begin{pmatrix} I + BQ^{-1}CA^{-1} - BQ^{-1}CA^{-1} & -BQ^{-1} + BQ^{-1} \\ CA^{-1} - (D - CA^{-1}B)Q^{-1}CA^{-1} & (D - CA^{-1}B)Q^{-1} \end{pmatrix}$$

which is seen to be the identity matrix.

The proof for (2) follows similarly.
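Editorial aside (not part of the original notes): the partitioned-inverse formula in (1) is easy to check numerically. A NumPy sketch, assuming the randomly generated blocks $A$ and $Q = D - CA^{-1}B$ are invertible (almost surely true as constructed):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 3)) + 3 * np.eye(3)   # keep the blocks well conditioned
    B = rng.normal(size=(3, 2))
    C = rng.normal(size=(2, 3))
    D = rng.normal(size=(2, 2)) + 3 * np.eye(2)

    M = np.block([[A, B], [C, D]])

    Ai = np.linalg.inv(A)
    Q  = D - C @ Ai @ B                 # the Schur complement Q = D - C A^{-1} B
    Qi = np.linalg.inv(Q)

    Minv = np.block([[Ai + Ai @ B @ Qi @ C @ Ai, -Ai @ B @ Qi],
                     [-Qi @ C @ Ai,              Qi]])

    print(np.allclose(M @ Minv, np.eye(5)))   # True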
Theorem 3.6 Let $A$ be $n$ by $n$, $U$ be $m$ by $n$, $S$ be $m$ by $m$ and $V$ be $m$ by $n$. Then if $A$, $A + U^TSV$ and $S + SVA^{-1}U^TS$ are each invertible we have

$$\left(A + U^TSV\right)^{-1} = A^{-1} - A^{-1}U^TS\left(S + SVA^{-1}U^TS\right)^{-1}SVA^{-1}$$

Proof:

$$\begin{aligned}
\left(A + U^TSV\right)&\left[A^{-1} - A^{-1}U^TS\left(S + SVA^{-1}U^TS\right)^{-1}SVA^{-1}\right] \\
&= I - U^TS\left(S + SVA^{-1}U^TS\right)^{-1}SVA^{-1} + U^TSVA^{-1} - U^TSVA^{-1}U^TS\left(S + SVA^{-1}U^TS\right)^{-1}SVA^{-1} \\
&= I + U^T\left[I - S\left(S + SVA^{-1}U^TS\right)^{-1} - SVA^{-1}U^TS\left(S + SVA^{-1}U^TS\right)^{-1}\right]SVA^{-1} \\
&= I + U^T\left[I - \left(S + SVA^{-1}U^TS\right)\left(S + SVA^{-1}U^TS\right)^{-1}\right]SVA^{-1} \\
&= I
\end{aligned}$$
Corollary 3.7 If $S^{-1}$ exists in the above theorem then

$$\left(A + U^TSV\right)^{-1} = A^{-1} - A^{-1}U^T\left(S^{-1} + VA^{-1}U^T\right)^{-1}VA^{-1}$$

Corollary 3.8 If $S = I$ and $u$ and $v$ are vectors then

$$\left(A + uv^T\right)^{-1} = A^{-1} - \frac{1}{1 + v^TA^{-1}u}A^{-1}uv^TA^{-1}$$

The last corollary is used as a starting point for the development of the theory of recursive estimation in least squares and Kalman filtering in information theory.
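Editorial aside (not part of the original notes): a NumPy sketch verifying Corollary 3.8 (the rank-one update formula) on random data, assuming $1 + v^TA^{-1}u \neq 0$:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(4, 4)) + 4 * np.eye(4)
    u = rng.normal(size=(4, 1))
    v = rng.normal(size=(4, 1))

    Ai = np.linalg.inv(A)
    denom = 1.0 + (v.T @ Ai @ u).item()        # the scalar 1 + v^T A^{-1} u
    update = Ai - (Ai @ u @ v.T @ Ai) / denom  # Corollary 3.8

    print(np.allclose(update, np.linalg.inv(A + u @ v.T)))  # True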
3.5 Direct Products and Vecs
If $A$ is a $p \times q$ matrix and $B$ is an $r \times s$ matrix then their direct product, denoted by $A \otimes B$, is the $pr \times qs$ matrix defined by

$$A \otimes B = (a_{ij}B) = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1q}B \\ a_{21}B & a_{22}B & \cdots & a_{2q}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{p1}B & a_{p2}B & \cdots & a_{pq}B \end{pmatrix}$$
Properties of direct products:

$$0 \otimes A = A \otimes 0 = 0$$
$$(A_1 + A_2) \otimes B = A_1 \otimes B + A_2 \otimes B$$
$$A \otimes (B_1 + B_2) = A \otimes B_1 + A \otimes B_2$$
$$\alpha A \otimes \beta B = \alpha\beta(A \otimes B)$$
$$A_1A_2 \otimes B_1B_2 = (A_1 \otimes B_1)(A_2 \otimes B_2)$$
$$(A \otimes B)^T = A^T \otimes B^T$$
$$\mathrm{rank}(A \otimes B) = \mathrm{rank}(A)\,\mathrm{rank}(B)$$
$$\mathrm{trace}(A \otimes B) = [\mathrm{trace}(A)][\mathrm{trace}(B)]$$
$$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$$
$$\det(A \otimes B) = [\det(A)]^p[\det(B)]^m \quad (A\ m \times m,\ B\ p \times p)$$

If $Y$ is an $n \times p$ matrix:

$\mathrm{vec}_R(Y)$ is the $np \times 1$ matrix with the rows of $Y$ stacked on top of each other.

$\mathrm{vec}_C(Y)$ is the $np \times 1$ matrix with the columns of $Y$ stacked on top of each other.

The $i,j$ element of $Y$ is the $[(i-1)p + j]$ element of $\mathrm{vec}_R(Y)$.

The $i,j$ element of $Y$ is the $[(j-1)n + i]$ element of $\mathrm{vec}_C(Y)$.

$$\mathrm{vec}_C(Y^T) = \mathrm{vec}_R(Y)$$
$$\mathrm{vec}_R(ABC) = (A \otimes C^T)\,\mathrm{vec}_R(B)$$

where $A$ is $n \times q_1$, $B$ is $q_1 \times q_2$ and $C$ is $q_2 \times p$.

The following are some relationships between vecs and direct products:

$$ab^T = b^T \otimes a = a \otimes b^T$$
$$\mathrm{vec}_C(ab^T) = b \otimes a$$
$$\mathrm{vec}_R(ab^T) = a \otimes b$$
$$\mathrm{vec}_C(ABC) = (C^T \otimes A)\,\mathrm{vec}_C(B)$$
$$\mathrm{trace}(A^TB) = [\mathrm{vec}_C(A)]^T[\mathrm{vec}_C(B)]$$
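Editorial aside (not part of the original notes): the identities $\mathrm{vec}_C(ABC) = (C^T \otimes A)\mathrm{vec}_C(B)$ and $\mathrm{trace}(A^TB) = [\mathrm{vec}_C(A)]^T[\mathrm{vec}_C(B)]$ can be checked with NumPy, where flatten(order="F") stacks columns:

    import numpy as np

    def vec_c(Y):
        # stack the columns of Y on top of each other
        return Y.flatten(order="F")

    rng = np.random.default_rng(2)
    A = rng.normal(size=(3, 4))
    B = rng.normal(size=(4, 2))
    C = rng.normal(size=(2, 5))

    lhs = vec_c(A @ B @ C)
    rhs = np.kron(C.T, A) @ vec_c(B)
    print(np.allclose(lhs, rhs))              # True

    B2 = rng.normal(size=(3, 4))
    print(np.isclose(np.trace(A.T @ B2), vec_c(A) @ vec_c(B2)))  # True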
Chapter 4
Matrix Factorization
4.1 Rank of a Matrix
In matrix algebra it is often useful to have the matrices expressed in as simple a form as possible. In particular, if a matrix is diagonal the operations of addition, multiplication and inversion are easy to perform.

Most of these methods are based on considering the rows and columns of a matrix as vectors in a vector space of the appropriate dimension.

Definition 4.1: If $A$ is an $n$ by $p$ matrix then

(1) The row rank of $A$ is the number of linearly independent rows of the matrix considered as vectors in $p$ dimensional space.

(2) The column rank of $A$ is the number of linearly independent columns of the matrix considered as vectors in $n$ dimensional space.

Theorem 4.1 Let $A$ be an $n$ by $p$ matrix. The set of all vectors $y$ such that $y = Ax$ is a vector space with dimension equal to the column rank of $A$; it is called the range space of $A$ and is denoted by $R(A)$.
Proof: $R(A)$ is non-empty since $0 \in R(A)$. If $y_1$ and $y_2$ are in $R(A)$, say $y_1 = Ax_1$ and $y_2 = Ax_2$, then

$$\alpha_1 y_1 + \alpha_2 y_2 = \alpha_1 Ax_1 + \alpha_2 Ax_2 = A(\alpha_1 x_1 + \alpha_2 x_2)$$

It follows that $\alpha_1 y_1 + \alpha_2 y_2 \in R(A)$ and hence $R(A)$ is a vector space. If $\{a_1, a_2, \ldots, a_p\}$ are the columns of $A$ then for every $y \in R(A)$ we can write

$$y = Ax = x_1a_1 + x_2a_2 + \cdots + x_pa_p$$

Thus the columns of $A$ span $R(A)$. It follows that the dimension of $R(A)$ is equal to the column rank of $A$.
Theorem 4.2 Let $A$ be an $n$ by $p$ matrix. The set of all vectors $x$ such that $Ax = 0$ is a vector space of dimension equal to $p$ minus the column rank of $A$. This vector space is called the null space of $A$ and is denoted by $N(A)$.

Proof: Since $0 \in N(A)$ it follows that $N(A)$ is non-empty. If $x_1$ and $x_2$ are in $N(A)$ then

$$A(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 Ax_1 + \alpha_2 Ax_2 = 0$$

so that $N(A)$ is a vector space.

Let $\{x_1, x_2, \ldots, x_k\}$ be a basis for $N(A)$. Then there exist vectors $\{x_{k+1}, x_{k+2}, \ldots, x_p\}$ such that $\{x_1, x_2, \ldots, x_p\}$ is a basis for $p$ dimensional space. It follows that $\{Ax_1, Ax_2, \ldots, Ax_p\}$ spans the range space of $A$ since

$$y = Ax = A\left(\sum_i \alpha_i x_i\right) = \sum_i \alpha_i Ax_i$$

Since $Ax_i = 0$ for $i = 1, 2, \ldots, k$ it follows that $\{Ax_{k+1}, Ax_{k+2}, \ldots, Ax_p\}$ spans the range space of $A$. Suppose now that $\alpha_{k+1}, \alpha_{k+2}, \ldots, \alpha_p$ are such that

$$\alpha_{k+1}Ax_{k+1} + \alpha_{k+2}Ax_{k+2} + \cdots + \alpha_pAx_p = 0$$

Then

$$A(\alpha_{k+1}x_{k+1} + \alpha_{k+2}x_{k+2} + \cdots + \alpha_px_p) = 0$$

It follows that

$$\alpha_{k+1}x_{k+1} + \alpha_{k+2}x_{k+2} + \cdots + \alpha_px_p \in N(A)$$

Since $\{x_1, x_2, \ldots, x_k\}$ is a basis for $N(A)$ it follows that for some $\beta_1, \beta_2, \ldots, \beta_k$ we have

$$\beta_1x_1 + \beta_2x_2 + \cdots + \beta_kx_k = \alpha_{k+1}x_{k+1} + \alpha_{k+2}x_{k+2} + \cdots + \alpha_px_p$$

or

$$\beta_1x_1 + \beta_2x_2 + \cdots + \beta_kx_k - \alpha_{k+1}x_{k+1} - \alpha_{k+2}x_{k+2} - \cdots - \alpha_px_p = 0$$

Since $\{x_1, x_2, \ldots, x_p\}$ is a basis for $p$ dimensional space we have that $\beta_1 = \cdots = \beta_k = \alpha_{k+1} = \cdots = \alpha_p = 0$. It follows that the set

$$\{Ax_{k+1}, Ax_{k+2}, \ldots, Ax_p\}$$

is linearly independent and spans the range space of $A$. The dimension of $R(A)$ is thus $p - k$. But by Theorem 4.1 we also have that the dimension of $R(A)$ is equal to the column rank of $A$. Thus

$$p - k = \text{column rank of } A$$

so that

$$k = p - \text{column rank of } A = \dim(N(A))$$
Definition 4.2 The elementary row (column) operations on a matrix are defined to be

(1) The interchange of two rows (columns).

(2) The multiplication of a row (column) by a non-zero scalar.

(3) The addition of a scalar multiple of one row (column) to another row (column).

Lemma 4.3 Elementary row (column) operations do not change the row (column) rank of a matrix.
Proof: The row rank of a matrix $A$ is the number of linearly independent vectors among the rows of

$$A = \begin{pmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_n^T \end{pmatrix}$$

Clearly interchange of two rows does not change the rank, nor does the multiplication of a row by a non-zero scalar. The addition of $a\,a_j^T$ to $a_i^T$ produces the new matrix

$$\begin{pmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_i^T + a\,a_j^T \\ \vdots \\ a_n^T \end{pmatrix}$$

which has the same row rank as $A$. The statements on the column rank of $A$ are obtained by considering the row vectors of $A^T$ and using the above results.
Lemma 4.4 Elementary row (column) operations on a matrix can be achieved by pre (post) multiplication by non-singular matrices.

Proof: Interchange of the $i$th and $j$th rows can be achieved by pre-multiplication by the matrix

$$E_1 = \begin{pmatrix} e_1^T \\ e_2^T \\ \vdots \\ e_{i-1}^T \\ e_j^T \\ e_{i+1}^T \\ \vdots \\ e_{j-1}^T \\ e_i^T \\ e_{j+1}^T \\ \vdots \\ e_n^T \end{pmatrix}$$

Multiplication by a non-zero scalar and addition of a scalar multiple of one row to another row can be achieved by pre-multiplication by the matrices $E_2$ and $E_3$ where

$$E_2 = \begin{pmatrix} e_1^T \\ e_2^T \\ \vdots \\ a\,e_i^T \\ \vdots \\ e_n^T \end{pmatrix} \quad\text{and}\quad E_3 = \begin{pmatrix} e_1^T \\ e_2^T \\ \vdots \\ a\,e_j^T + e_i^T \\ \vdots \\ e_n^T \end{pmatrix}$$

The statements concerning column operations are obtained by using the above results on $A^T$ and then transposing the final results.
Lemma 4.5 The matrices which produce elementary row (column) operations are non-singular.

Proof: The inverse of $E_1$ is $E_1^T$. The inverse of $E_2$ is the matrix

$$\begin{pmatrix} e_1^T \\ e_2^T \\ \vdots \\ \frac{1}{a}e_i^T \\ \vdots \\ e_n^T \end{pmatrix}$$

Finally the inverse of $E_3$ is the matrix with columns

$$[e_1, e_2, \ldots, e_i, \ldots, e_j - a\,e_i, \ldots, e_n]$$

Again the results for column operations follow from the row results.
Lemma 4.6 The column rank of a matrix is invariant under pre-multiplication by a non-singular matrix. Similarly the row rank of a matrix is invariant under post-multiplication by a non-singular matrix.

Proof: By Theorem 4.2 the dimension of $N(A)$ is equal to $p$ minus the column rank of $A$. Since

$$N(A) = \{x : Ax = 0\}$$

it follows that $x \in N(A)$ if and only if $EAx = 0$ where $E$ is non-singular. Thus the dimension of $N(A)$ is invariant under pre-multiplication by a non-singular matrix and the conclusion follows. The result for post-multiplication follows by considering $A^T$ and transposing.

Corollary 4.7 The column rank of a matrix is invariant under pre-multiplication by an elementary matrix. Similarly the row rank of a matrix is invariant under post-multiplication by an elementary matrix.

By suitable choices of elementary matrices we can thus write

$$PA = E$$

where $P$ is the product of non-singular elementary matrices and hence is non-singular. The matrix $E$ is a row echelon matrix, i.e. a matrix with $r$ non-zero rows in which the first $k$ columns consist of $e_1, e_2, \ldots, e_r$ and $0$'s in some order. The remaining $p - k$ columns have zeros below the first $r$ rows and arbitrary elements in the remaining positions. Since every column of $E$ lies in the span of $e_1, \ldots, e_r$, it thus follows that

$$r = \text{row rank of } A \geq \text{column rank of } A$$

and hence

$$\text{row rank of } A^T \geq \text{column rank of } A^T$$

But we also have that

$$\text{row rank of } A^T = \text{column rank of } A \quad\text{and}\quad \text{row rank of } A = \text{column rank of } A^T$$

It follows that

$$\text{column rank of } A \geq \text{row rank of } A$$

and hence we have:

Theorem 4.8 row rank of $A$ = column rank of $A$
Alternative Proof of Theorem 4.8: Let $A$ be an $n \times p$ matrix, i.e.

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{np} \end{pmatrix}$$

We may write $A$ in terms of its columns or rows, i.e.

$$A = \left(A^c_1, A^c_2, \ldots, A^c_p\right) = \begin{pmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_n^T \end{pmatrix}$$
where the $j$th column vector, $A^c_j$, and $i$th row vector, $a_i^T$, are given by

$$A^c_j = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{nj} \end{pmatrix} \qquad a_i = \begin{pmatrix} a_{i1} \\ a_{i2} \\ \vdots \\ a_{ip} \end{pmatrix}$$

The column vectors of $A$ span a space which is a subspace of $\mathbb{R}^n$ and is called the column space of $A$, denoted by $\mathcal{C}(A)$, i.e.

$$\mathcal{C}(A) = \mathrm{sp}(A^c_1, A^c_2, \ldots, A^c_p)$$

The row vectors of $A$ span a space which is a subspace of $\mathbb{R}^p$ and is called the row space of $A$, denoted by $\mathcal{R}(A)$, i.e.

$$\mathcal{R}(A) = \mathrm{sp}(a_1, a_2, \ldots, a_n)$$

The dimension of $\mathcal{C}(A)$ is called the column rank of $A$ and is denoted by $C = c(A)$. Similarly the dimension of $\mathcal{R}(A)$ is called the row rank of $A$ and is denoted by $R = r(A)$.

Result 1: Let

$$B = \left(b^c_1\ b^c_2\ \ldots\ b^c_C\right)$$

where the columns of $B$ form a basis for the column space of $A$. Then there is a $C \times p$ matrix $L$ such that

$$A = BL$$

It follows that the rows of $A$ are linear combinations of the rows of $L$ and hence

$$\mathcal{R}(A) \subseteq \mathcal{R}(L)$$

Hence

$$R = \dim[\mathcal{R}(A)] \leq \dim[\mathcal{R}(L)] \leq C$$

i.e. the row rank of $A$ is less than or equal to the column rank of $A$.
Result 2: Let

$$R^\ast = \begin{pmatrix} r_1^T \\ r_2^T \\ \vdots \\ r_R^T \end{pmatrix}$$

where the rows of $R^\ast$ form a basis for the row space of $A$. Then there is an $n \times R$ matrix $K$ such that

$$A = KR^\ast$$

It follows that the columns of $A$ are linear combinations of the columns of $K$ and hence

$$\mathcal{C}(A) \subseteq \mathcal{C}(K)$$

Hence

$$C = \dim[\mathcal{C}(A)] \leq \dim[\mathcal{C}(K)] \leq R$$

i.e. the column rank of $A$ is less than or equal to the row rank of $A$.

Hence we have that the row rank and column rank of a matrix are equal. The common value is called the rank of $A$.

Reference: Harville, David A. (1997). Matrix Algebra From a Statistician's Perspective. Springer (pages 36-40).
We thus define the rank of a matrix $A$, $\rho(A)$, to be the number of linearly independent rows or the number of linearly independent columns in the matrix $A$. Note that $\rho(A)$ is unaffected by pre or post multiplication by non-singular matrices. By pre and post multiplying by suitable elementary matrices we can write any matrix as

$$PAQ = B = \begin{pmatrix} I_r & O \\ O & O \end{pmatrix}$$

where $O$ denotes a matrix of zeroes of the appropriate order. Since $P$ and $Q$ are non-singular we have

$$A = P^{-1}BQ^{-1}$$

which is an example of a matrix factorization.
Another factorization of considerable importance is contained in the following Lemma.
Lemma 4.9 If $A$ is an $n$ by $p$ matrix of rank $r$ then there are matrices $B$ ($n$ by $r$) and $C$ ($r$ by $p$) such that

$$A = BC$$

where $B$ and $C$ are both of rank $r$.

Proof: Let the rows of $C$ form a basis for the row space of $A$. Then $C$ is $r$ by $p$ of rank $r$. Since the rows of $C$ form a basis for the row space of $A$ we can write the $i$th row of $A$ as $a_i^T = b_i^TC$ for some choice of $b_i$. Thus $A = BC$.

The fact that the rank of $B$ in Lemma 4.9 is $r$ follows from the following important result.

Theorem 4.10 $\mathrm{rank}(AB) \leq \min\{\mathrm{rank}(A), \mathrm{rank}(B)\}$

Proof: Each row of $AB$ is a linear combination of the rows of $B$. Thus

$$\mathrm{rank}(AB) \leq \mathrm{rank}(B)$$

Similarly each column of $AB$ is a linear combination of the columns of $A$, so that

$$\mathrm{rank}(AB) \leq \mathrm{rank}(A)$$

The conclusion follows.
Theorem 4.11 An $n$ by $n$ matrix is non-singular if and only if it is of rank $n$.

Proof: For any matrix we can write

$$PAQ = \begin{pmatrix} I_r & O \\ O & O \end{pmatrix}$$

where $P$ and $Q$ are non-singular. Since the right hand side of this equation is invertible if and only if $r = n$, the conclusion follows.

Definition 4.3 If $A$ is an $n$ by $p$ matrix then the $p$ by $p$ matrix $A^*A$ ($A^TA$ if $A$ is real) is called the Gram matrix of $A$.

Theorem 4.12 If $A$ is an $n$ by $p$ matrix then

(1) $\mathrm{rank}(A^*A) = \mathrm{rank}(A) = \mathrm{rank}(AA^*)$

(2) $\mathrm{rank}(A^TA) = \mathrm{rank}(A) = \mathrm{rank}(AA^T)$ if $A$ is real.

Proof: Assume that the field of scalars is such that $\sum_i \bar{z}_iz_i = 0$ implies that $z_1 = z_2 = \cdots = z_n = 0$. If $Ax = 0$ then $A^*Ax = 0$. Also if $A^*Ax = 0$ then $x^*A^*Ax = 0$, i.e. $z^*z = 0$ where $z = Ax$, so that $z = 0$, i.e. $Ax = 0$. Hence $A^*Ax = 0$ and $Ax = 0$ are equivalent statements. It follows that

$$N(A^*A) = \{x : A^*Ax = 0\} = \{x : Ax = 0\} = N(A)$$

Thus $\mathrm{rank}(A^*A) = \mathrm{rank}(A)$. The conclusion that $\mathrm{rank}(AA^*) = \mathrm{rank}(A)$ follows by symmetry.
Theorem 4.13 $\mathrm{rank}(A + B) \leq \mathrm{rank}(A) + \mathrm{rank}(B)$

Proof: If $a_1, a_2, \ldots, a_r$ and $b_1, b_2, \ldots, b_s$ denote linearly independent columns of $A$ and $B$ respectively, then every column of $A + B$ lies in the span of

$$\{a_1, a_2, \ldots, a_r, b_1, b_2, \ldots, b_s\}$$

Hence

$$\mathrm{rank}(A + B) \leq r + s = \mathrm{rank}(A) + \mathrm{rank}(B)$$
Definition 4.4

(1) A square matrix $A$ is idempotent if $A^2 = A$.

(2) A square matrix $A$ is nilpotent if $A^m = O$ for some integer $m$ greater than one.

Definition 4.5 The trace of a square matrix is the sum of its diagonal elements, i.e.

$$\mathrm{tr}(A) = \sum_i a_{ii}$$

Theorem 4.14 $\mathrm{tr}(AB) = \mathrm{tr}(BA)$; $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$; $\mathrm{tr}(A^T) = \mathrm{tr}(A)$

Theorem 4.15 If $A$ is idempotent then

$$\mathrm{rank}(A) = \mathrm{tr}(A)$$
Proof: We can write $A = BC$ where $B$ is $n$ by $r$ of rank $r$, $C$ is $r$ by $n$ of rank $r$ and $r$ is the rank of $A$. Thus we have

$$A^2 = BCBC = A = BC$$

Since $B$ has full column rank and $C$ has full row rank, $B^TB$ and $CC^T$ are invertible (Theorem 4.12); multiplying $BCBC = BC$ on the left by $(B^TB)^{-1}B^T$ and on the right by $C^T(CC^T)^{-1}$ gives $CB = I$. Hence

$$\mathrm{tr}(BC) = \mathrm{tr}(CB) = \mathrm{tr}(I_r) = r = \mathrm{rank}(A)$$
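Editorial aside (not part of the original notes): Theorem 4.15 can be illustrated with a projection matrix, which is idempotent. A NumPy sketch with random data:

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(6, 3))                  # full column rank with probability 1
    P = X @ np.linalg.inv(X.T @ X) @ X.T         # projection onto the column space of X

    print(np.allclose(P @ P, P))                 # idempotent
    print(np.linalg.matrix_rank(P), np.trace(P)) # both 3 (trace up to rounding)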
Chapter 5
Linear Equations
5.1 Consistency Results
Consider a set of $n$ linear equations in $p$ unknowns

$$\begin{array}{ccccccccc} a_{11}x_1 &+& a_{12}x_2 &+& \cdots &+& a_{1p}x_p &=& y_1 \\ a_{21}x_1 &+& a_{22}x_2 &+& \cdots &+& a_{2p}x_p &=& y_2 \\ \vdots & & \vdots & & & & \vdots & & \vdots \\ a_{n1}x_1 &+& a_{n2}x_2 &+& \cdots &+& a_{np}x_p &=& y_n \end{array}$$

which may be compactly written in matrix notation as

$$Ax = y$$

where $A$ is the $n$ by $p$ matrix with $i,j$ element equal to $a_{ij}$, $x$ is the $p$ by one column vector with $j$th element equal to $x_j$ and $y$ is the $n$ by one column vector with $i$th element equal to $y_i$.
Often such equations arise without knowledge of whether they are really equations, i.e. does there exist a vector $x$ which satisfies the equations? If such an $x$ exists the equations are said to be consistent; otherwise they are said to be inconsistent.

The equations $Ax = y$ can be discussed from a vector space perspective since $A$ is a carrier of a linear transformation from the $p$ dimensional space spanned by the $x_i$'s to the $n$ dimensional space where $y$ lives. It follows that a solution exists if and only if $y$ is in $R(A)$, the range space of $A$. Thus a solution exists if and only if $y$ is in the column space of $A$, defined as the set of all vectors of the form $\{y : y = Ax \text{ for some } x\}$. If we consider the augmented matrix $[A, y]$ we have the following theorem on consistency.

Theorem 5.1 The equations $Ax = y$ are consistent if and only if

$$\mathrm{rank}([A, y]) = \mathrm{rank}(A)$$

Proof: Obviously

$$\mathrm{rank}([A, y]) \geq \mathrm{rank}(A)$$

If the equations have a solution then there exists $x$ such that $Ax = y$, so that

$$\mathrm{rank}(A) \leq \mathrm{rank}([A, y]) = \mathrm{rank}([A, Ax]) = \mathrm{rank}(A[I, x]) \leq \mathrm{rank}(A)$$

It follows that

$$\mathrm{rank}([A, y]) = \mathrm{rank}(A)$$

if the equations are consistent.

Conversely, if

$$\mathrm{rank}([A, y]) = \mathrm{rank}(A)$$

then $y$ is a linear combination of the columns of $A$, so that a solution exists.
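Editorial aside (not part of the original notes): Theorem 5.1 gives a direct numerical test for consistency. A NumPy sketch with an illustrative rank 2 matrix:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0],
                  [0.0, 1.0]])         # rank 2
    y_good = A @ np.array([1.0, 1.0])   # in the column space of A
    y_bad  = np.array([1.0, 0.0, 0.0])  # not in the column space of A

    def consistent(A, y):
        # Theorem 5.1: solvable  <=>  rank([A, y]) == rank(A)
        return np.linalg.matrix_rank(np.column_stack([A, y])) == np.linalg.matrix_rank(A)

    print(consistent(A, y_good))   # True
    print(consistent(A, y_bad))    # False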
Theorem 5.1 gives conditions under which solutions to the equations $Ax = y$ exist, but we need to know the form of the solution in order to explicitly solve the equations. The equations $Ax = 0$ are called the homogeneous equations. If $x_1$ and $x_2$ are solutions to the homogeneous equations then $\alpha_1x_1 + \alpha_2x_2$ is also a solution to the homogeneous equations. In general, if $\{x_1, x_2, \ldots, x_s\}$ span the null space of $A$ then $x_0 = \sum_{i=1}^s \alpha_ix_i$ is a solution to the homogeneous equations. If $x_p$ is a particular solution to the equations $Ax = y$ then $x_p + x_0$ is also a solution to the equations $Ax = y$. If $x$ is any solution then $x = x_p + (x - x_p)$, so that every solution is of this form. We thus have the following theorem.

Theorem 5.2 A general solution to the consistent equations $Ax = y$ is of the form

$$x = x_p + x_0$$
where $x_p$ is any particular solution to the equations and $x_0$ is any solution to the homogeneous equations $Ax = 0$.

A final existence question is to determine when the equations $Ax = y$ have a unique solution. The solution is unique if and only if the only solution to the homogeneous equations is the $0$ vector, i.e. the null space of $A$ contains only the $0$ vector. If the solution $x$ is unique the homogeneous equations cannot have a non-zero solution $x_0$, since then $x + x_0$ would be another solution. Conversely, if the homogeneous equations have only the $0$ vector as a solution then the solution to $Ax = y$ is unique (if there were two different solutions $x_1$ and $x_2$ then $x_1 - x_2$ would be a non-zero solution to the homogeneous equations).

Since the null space of $A$ contains only the $0$ vector if and only if the columns of $A$ are linearly independent, we have that the solution is unique if and only if the rank of $A$ is equal to $p$, where we assume with no loss of generality that $A$ is $n$ by $p$ with $p \leq n$. We thus have the following theorem.

Theorem 5.3 Let $A$ be $n$ by $p$ with $p \leq n$. Then

(1) The equations $Ax = y$ have the general solution

$$x = x_p + x_0$$

where $Ax_p = y$ and $Ax_0 = 0$, if and only if

$$\mathrm{rank}([A, y]) = \mathrm{rank}(A)$$

(2) The solution is unique ($x_0 = 0$) if and only if

$$\mathrm{rank}(A) = p$$
5.2 Solutions to Linear Equations
5.2.1 A is p by p of rank p

Here there is a unique solution by Theorem 5.3 with explicit form given by $A^{-1}y$. The proof of this result is trivial given the following Lemma.
Lemma 5.4 A $p$ by $p$ matrix is non-singular if and only if $\mathrm{rank}(A) = p$.

Proof: If $A^{-1}$ exists it is the unique matrix satisfying

$$A^{-1}A = AA^{-1} = I$$

Thus $p = \mathrm{rank}(I) = \mathrm{rank}(AA^{-1}) \leq \mathrm{rank}(A) \leq p$, i.e. the rank of $A$ is $p$ if $A$ is non-singular.

Conversely, if $\mathrm{rank}(A) = p$ then there exists a unique solution $x_i$ to the equation $Ax_i = e_i$ for $i = 1, 2, \ldots, p$. If we define $B$ by

$$B = [x_1, x_2, \ldots, x_p]$$

then $AB = I$. Similarly there exists a unique solution $z_i$ to the equation $A^Tz_i = e_i$ for $i = 1, 2, \ldots, p$, so that if we define

$$C^T = [z_1, z_2, \ldots, z_p]$$

then $A^TC^T = I$ or $CA = I$. It follows from Theorem 3.2 that $C = B$, i.e. that $A^{-1}$ exists.
5.2.2 A is n by p of rank p

If $A$ is $n$ by $p$ of rank $p$ where $p \leq n$ then by Theorem 5.3 a unique solution exists (when the equations are consistent) and we need only find its form. Since the rank of $A$ is $p$, the rank of $A^TA$ is $p$ and hence $A^TA$ has an inverse. Thus if we define $x_1 = (A^TA)^{-1}A^Ty$ we have that

$$Ax_1 = A(A^TA)^{-1}A^Ty = A(A^TA)^{-1}A^T(Ax) = Ax = y$$

so that $x_1$ is the unique solution to the equations.
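Editorial aside (not part of the original notes): a NumPy sketch of the solution formula $x_1 = (A^TA)^{-1}A^Ty$ for a consistent full column rank system, with illustrative data:

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.normal(size=(5, 3))             # n by p with rank p (almost surely)
    x_true = np.array([1.0, -2.0, 0.5])
    y = A @ x_true                          # consistent by construction

    x1 = np.linalg.solve(A.T @ A, A.T @ y)  # x1 = (A^T A)^{-1} A^T y
    print(np.allclose(x1, x_true))          # True: the unique solution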
5.2.3 A is n by p of rank r where r ≤ p ≤ n

In this case a solution may not exist, and even if a solution exists it may not be unique.

Suppose that a matrix $A^-$ can be found such that

$$AA^-A = A$$

Such a matrix $A^-$ is called a generalized inverse of $A$.

If the equations $Ax = y$ have a solution then $x_1 = A^-y$ is a solution since

$$Ax_1 = A(A^-y) = AA^-(Ax) = AA^-Ax = Ax = y$$

The general solution is thus obtained by characterizing the solutions to the homogeneous equations, i.e. we need to characterize the vectors in the null space of $A$. If $z$ is an arbitrary vector then $(I - A^-A)z$ is in the null space of $A$. The dimension of the set $U$ of vectors which can be written in this form is equal to the rank of $I - A^-A$. Since

$$(I - A^-A)^2 = I - A^-A - A^-A + A^-AA^-A = I - A^-A$$

the matrix $I - A^-A$ is idempotent, and by Theorem 4.15 we see that

$$\begin{aligned} \mathrm{rank}(I - A^-A) &= \mathrm{tr}(I - A^-A) \\ &= p - \mathrm{tr}(A^-A) \\ &= p - \mathrm{rank}(A^-A) \\ &= p - \mathrm{rank}(A) \\ &= p - r \end{aligned}$$

It follows that $U$ is of dimension $p - r$ and hence $U = N(A)$. Thus we have the following theorem.

Theorem 5.5 If $A$ is $n$ by $p$ of rank $r \leq p \leq n$ then

(1) If a solution exists, the general solution is given by

$$x = A^-y + (I - A^-A)z$$

where $z$ is an arbitrary $p$ by one vector and $A^-$ is a generalized inverse of $A$.

(2) If $x_0 = By$ is a solution to $Ax = y$ for every $y$ for which the equations are consistent, then $B$ is a generalized inverse of $A$.

Proof: Since we have established (1) we prove (2). Let $y = A\alpha$ where $\alpha$ is arbitrary. The equations $Ax = A\alpha$ have a solution by assumption, which is of the form $BA\alpha$. It follows that $ABA\alpha = A\alpha$. Since $\alpha$ is arbitrary we have that $ABA = A$, i.e. $B$ is a generalized inverse of $A$.
5.2.4 Generalized Inverses

We now establish that a generalized inverse of $A$ always exists. Recall that we can find non-singular matrices $P$ and $Q$ such that

$$PAQ = B \quad\text{i.e.}\quad A = P^{-1}BQ^{-1}$$

where $B$ is given by

$$B = \begin{pmatrix} I & O \\ O & O \end{pmatrix}$$

Matrix multiplication shows that $BB^-B = B$ where

$$B^- = \begin{pmatrix} I & U \\ V & W \end{pmatrix}$$

and $U$, $V$ and $W$ are arbitrary. Since

$$A(QB^-P)A = P^{-1}BQ^{-1}QB^-PP^{-1}BQ^{-1} = P^{-1}BB^-BQ^{-1} = P^{-1}BQ^{-1} = A$$

we see that $QB^-P$ is a generalized inverse of $A$, establishing the existence of a generalized inverse for any matrix.

We also note that if

$$PA = \begin{pmatrix} I & C \\ O & O \end{pmatrix}$$

then another generalized inverse of $A$ is $B^-P$ where

$$B^- = \begin{pmatrix} I & O \\ O & O \end{pmatrix}$$

since, writing $E = PA$ so that $A = P^{-1}E$, we have $EB^-E = E$ and hence $A(B^-P)A = P^{-1}EB^-E = P^{-1}E = A$.
Chapter 6
Determinants
6.1 Definition and Properties
Definition 6.1 Let $A$ be a square matrix with columns $\{x_1, x_2, \ldots, x_p\}$. The determinant of $A$ is that function $\det$ of the $p$ columns such that

(1) $\det(x_1, x_2, \ldots, cx_m, \ldots, x_p) = c\,\det(x_1, x_2, \ldots, x_m, \ldots, x_p)$

(2) $\det(x_1, x_2, \ldots, x_m + x_k, \ldots, x_p) = \det(x_1, x_2, \ldots, x_m, \ldots, x_p)$ for $k \neq m$

(3) $\det(e_1, e_2, \ldots, e_p) = 1$ where $e_i$ is the $i$th unit vector.
Properties of Determinants

(1) If $x_i = 0$ then $x_i = 0 \cdot x_i$, so that by (1) of the definition $\det(x_1, x_2, \ldots, x_p) = 0$ if $x_i = 0$.

(2) $\det(x_1, x_2, \ldots, x_m + cx_k, \ldots, x_p) = \det(x_1, x_2, \ldots, x_m, \ldots, x_p)$. If $c = 0$ this result is trivial. If $c \neq 0$ we note that

$$\begin{aligned} \det(x_1, \ldots, x_m + cx_k, \ldots, x_k, \ldots, x_p) &= \frac{1}{c}\det(x_1, \ldots, x_m + cx_k, \ldots, cx_k, \ldots, x_p) \\ &= \frac{1}{c}\det(x_1, \ldots, x_m, \ldots, cx_k, \ldots, x_p) \\ &= \det(x_1, \ldots, x_m, \ldots, x_k, \ldots, x_p) \end{aligned}$$

(3) If two columns of $A$ are identical then $\det A = 0$. Similarly if the columns of $A$ are linearly dependent then the determinant of $A$ is 0.

(4) If $x_k$ and $x_m$ are interchanged then the determinant changes sign, i.e.

$$\det(x_1, \ldots, x_m, \ldots, x_k, \ldots, x_p) = -\det(x_1, \ldots, x_k, \ldots, x_m, \ldots, x_p)$$

To prove this we note that

$$\begin{aligned} \det(\ldots, x_m, \ldots, x_k, \ldots) &= \det(\ldots, x_m + x_k, \ldots, x_k, \ldots) \\ &= \det(\ldots, x_m + x_k, \ldots, x_k - (x_m + x_k), \ldots) \\ &= \det(\ldots, x_m + x_k, \ldots, -x_m, \ldots) \\ &= -\det(\ldots, x_m + x_k, \ldots, x_m, \ldots) \\ &= -\det(\ldots, x_k, \ldots, x_m, \ldots) \end{aligned}$$

Thus the interchange of an even number of columns does not change the sign of the determinant, while the interchange of an odd number of columns does change the sign.

(5) $\det(x_1, x_2, \ldots, x_m + y, \ldots, x_p)$ is equal to

$$\det(x_1, x_2, \ldots, x_m, \ldots, x_p) + \det(x_1, x_2, \ldots, y, \ldots, x_p)$$

If the vectors $x_i$ for $i \neq m$ are dependent the assertion is obvious since both terms are 0. If the $x_i$ are linearly independent for $i \neq m$ and $x_m = \sum_{i \neq m} c_ix_i$, the assertion follows by repeated use of (2) and the fact that the first term of the expression is always zero.

If the $x_i$ are linearly independent then $y = \sum_i d_ix_i$, so that

$$\begin{aligned} \det(x_1, \ldots, x_m + y, \ldots, x_p) &= \det\left(x_1, \ldots, x_m + \sum_i d_ix_i, \ldots, x_p\right) \\ &= \det(x_1, \ldots, (1 + d_m)x_m, \ldots, x_p) \\ &= (1 + d_m)\det(x_1, \ldots, x_m, \ldots, x_p) \\ &= \det(x_1, \ldots, x_p) + \det(x_1, \ldots, d_mx_m, \ldots, x_p) \\ &= \det(x_1, \ldots, x_p) + \det\left(x_1, \ldots, \sum_i d_ix_i, \ldots, x_p\right) \\ &= \det(x_1, \ldots, x_p) + \det(x_1, \ldots, y, \ldots, x_p) \end{aligned}$$
Using the above results we can show that the determinant of a matrix is a well defined function of a matrix and give a method for computing the determinant. Since any column vector of $A$, say $x_m$, can be written as $x_m = \sum_{i=1}^p a_{im}e_i$, we have that

$$\begin{aligned} \det(x_1, x_2, \ldots, x_p) &= \det\left(\sum_{i_1} a_{i_11}e_{i_1}, x_2, \ldots, x_p\right) \\ &= \sum_{i_1} a_{i_11}\det(e_{i_1}, x_2, \ldots, x_p) \\ &= \sum_{i_1} a_{i_11}\sum_{i_2} a_{i_22}\det(e_{i_1}, e_{i_2}, \ldots, x_p) \\ &= \sum_{i_1}\sum_{i_2}\cdots\sum_{i_p} a_{i_11}a_{i_22}\cdots a_{i_pp}\det(e_{i_1}, e_{i_2}, \ldots, e_{i_p}) \end{aligned}$$

Now note that $\det(e_{i_1}, e_{i_2}, \ldots, e_{i_p})$ is 0 if any of the subscripts are the same, $+1$ if $i_1, i_2, \ldots, i_p$ is an even permutation of $1, 2, \ldots, p$ and $-1$ if $i_1, i_2, \ldots, i_p$ is an odd permutation of $1, 2, \ldots, p$. Hence we can evaluate the determinant of a matrix and it is a uniquely defined function satisfying the conditions (1), (2) and (3).
6.2 Computation

The usual method for computing a determinant is to note that if $x_m = \sum_i a_{im}e_i$ then, using linearity in the $m$th column,

$$\det(x_1, x_2, \ldots, x_p) = \sum_i a_{im}\det(x_1, x_2, \ldots, e_i, \ldots, x_p) = \sum_i a_{im}A_{im}$$
where

$$A_{im} = \det(x_1, x_2, \ldots, e_i, \ldots, x_p)$$

(with $e_i$ appearing in the $m$th position) is called the cofactor of $a_{im}$; it is simply the determinant of $A$ when the $m$th column of $A$ is replaced by $e_i$.

We now note that, expanding the remaining columns as before,

$$A_{im} = \sum a_{i_11}\cdots a_{i_pp}\det(e_{i_1}, \ldots, e_i, \ldots, e_{i_p})$$

where the index for column $m$ is fixed at $i$ and the factor for column $m$ is omitted from the product. Hence

$$A_{im} = (-1)^{i+m}\det(M_{im})$$

where $M_{im}$ is the matrix formed by deleting the $i$th row of $A$ and the $m$th column of $A$. Thus the value of the determinant of a $p$ by $p$ matrix can be reduced to the calculation of $p$ determinants of $p-1$ by $p-1$ matrices. The determinant of $M_{im}$ is called the minor of $a_{im}$.
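Editorial aside (not part of the original notes): a short recursive implementation of the cofactor expansion (here along the first column), checked against NumPy's determinant:

    import numpy as np

    def det_by_cofactors(A):
        # expansion along the first column: det A = sum_i a_i1 (-1)^(i+1) det(M_i1)
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for i in range(n):
            minor = np.delete(np.delete(A, i, axis=0), 0, axis=1)
            total += A[i, 0] * (-1) ** i * det_by_cofactors(minor)
        return total

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 4.0],
                  [0.0, 1.0, 1.0]])
    print(det_by_cofactors(A), np.linalg.det(A))  # both -3.0 (up to rounding)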
Chapter 7
Inner Product Spaces
7.1 Definitions and Elementary Properties

Definition 7.1 An inner product on a linear vector space $V$ is a function $(\cdot\,,\cdot)$ mapping $V \times V$ into $\mathbb{C}$ such that

1. $(x, x) \geq 0$ with $(x, x) = 0$ if and only if $x = 0$.

2. $(x, y) = \overline{(y, x)}$

3. $(\alpha x + \beta y, z) = \alpha(x, z) + \beta(y, z)$

A vector space with an inner product is called an inner product space.

Definition 7.2: The norm or length of $x$, $\|x\|$, is defined by

$$\|x\|^2 = (x, x)$$

Definition 7.3: The vectors $x$ and $y$ are said to be orthogonal if $(x, y) = 0$. We write $x \perp y$ if $x$ is orthogonal to $y$. More generally

(1) $x$ is said to be orthogonal to the set of vectors $X$ if

$$(x, y) = 0 \text{ for every } y \in X$$
(2) $X$ and $Y$ are said to be orthogonal if

$$(x, y) = 0 \text{ for every } x \in X \text{ and } y \in Y$$

Definition 7.4: $X$ is an orthonormal set of vectors if for $x, y \in X$

$$(x, y) = \begin{cases} 1 & \text{if } x = y \\ 0 & \text{if } x \neq y \end{cases}$$

$X$ is a complete orthonormal set of vectors if it is not contained in a larger set of orthonormal vectors.
Lemma 7.1 If $X$ is an orthonormal set then its vectors are linearly independent.

Proof: If $\sum_i \alpha_ix_i = 0$ for $x_i \in X$ then

$$0 = \left(\sum_i \alpha_ix_i, x_j\right) = \sum_i \alpha_i(x_i, x_j) = \alpha_j$$

It follows that each of the $\alpha_j$ are equal to 0 and hence that the $x_j$'s are linearly independent.
Lemma 7.2 Let $\{x_1, x_2, \ldots, x_p\}$ be an orthonormal basis for a subspace $X$. Then $x \perp X$ if and only if $x \perp x_i$ for $i = 1, 2, \ldots, p$.

Proof: If $x \perp X$ then by definition $x \perp x_i$ for $i = 1, 2, \ldots, p$.

Conversely, suppose $x \perp x_i$ for $i = 1, 2, \ldots, p$. Then if $y \in X$ we have $y = \sum_i \alpha_ix_i$. Thus

$$(x, y) = \sum_i \bar{\alpha}_i(x, x_i) = 0$$

Thus $x \perp X$.
Theorem 7.3 (Bessel's Inequality) Let $\{x_1, x_2, \ldots, x_p\}$ be an orthonormal set in a linear vector space $V$ with an inner product. Then

1. $\sum_i |\alpha_i|^2 \leq \|x\|^2$ where $x$ is any vector in $V$ and $\alpha_i = (x, x_i)$.

2. The vector $r = x - \sum_i \alpha_ix_i$ is orthogonal to each $x_j$ and hence to the space spanned by $\{x_1, x_2, \ldots, x_p\}$.

Proof: We note that

$$\begin{aligned} \left\|x - \sum_i \alpha_ix_i\right\|^2 &= \left(x - \sum_i \alpha_ix_i,\ x - \sum_i \alpha_ix_i\right) \\ &= (x, x) - \sum_i \alpha_i(x_i, x) - \sum_i \bar{\alpha}_i(x, x_i) + \sum_i \alpha_i\bar{\alpha}_i \\ &= \|x\|^2 - 2\sum_i |\alpha_i|^2 + \sum_i |\alpha_i|^2 \\ &= \|x\|^2 - \sum_i |\alpha_i|^2 \end{aligned}$$

Since $\left\|x - \sum_i \alpha_ix_i\right\|^2 \geq 0$, result (1) follows.

For result (2) we note that

$$\left(x - \sum_i \alpha_ix_i, x_j\right) = (x, x_j) - \sum_i \alpha_i(x_i, x_j) = (x, x_j) - \alpha_j = 0$$
Theorem 7.4 (Cauchy-Schwarz Inequality) If $x$ and $y$ are vectors in an inner product space then

$$|(x, y)| \leq \|x\|\,\|y\|$$

Proof: If $y = 0$ equality holds. If $y \neq 0$ then $\frac{y}{\|y\|}$ is orthonormal and by Bessel's inequality we have

$$\left|\left(x, \frac{y}{\|y\|}\right)\right|^2 \leq \|x\|^2 \quad\text{or}\quad |(x, y)| \leq \|x\|\,\|y\|$$
Theorem 7.5 If $X = \{x_1, x_2, \ldots, x_n\}$ is any finite orthonormal set in a vector space $V$ then the following conditions are equivalent:

(1) $X$ is complete.

(2) $(x, x_i) = 0$ for $i = 1, 2, \ldots, n$ implies that $x = 0$.

(3) The space spanned by $X$ is equal to $V$.

(4) If $x \in V$ then $x = \sum_i (x, x_i)x_i$.

(5) If $x$ and $y$ are in $V$ then

$$(x, y) = \sum_i (x, x_i)(x_i, y)$$

(6) If $x \in V$ then

$$\|x\|^2 = \sum_i |(x_i, x)|^2$$

Result (5) is called Parseval's identity.
Proof: (1) $\implies$ (2): If $X$ is complete and $(x, x_i) = 0$ for each $i$ with $x \neq 0$, then $\frac{x}{\|x\|}$ could be added to the set $X$, which would be a contradiction. Thus $x = 0$.

(2) $\implies$ (3): If some $x$ is not a linear combination of the $x_i$, then $x - \sum_i (x, x_i)x_i$ is not equal to $0$ and is orthogonal to $X$, which contradicts (2). Thus the space spanned by $X$ is equal to $V$.

(3) $\implies$ (4): If $x \in V$ we can write $x = \sum_j \xi_jx_j$. It follows that

$$(x, x_i) = \sum_j \xi_j(x_j, x_i) = \xi_i$$

(4) $\implies$ (5): Since

$$x = \sum_i (x, x_i)x_i \quad\text{and}\quad y = \sum_j (y, x_j)x_j$$

we have

$$(x, y) = \left(\sum_i (x, x_i)x_i, \sum_j (y, x_j)x_j\right) = \sum_i (x, x_i)\overline{(y, x_i)} = \sum_i (x, x_i)(x_i, y)$$

(5) $\implies$ (6): In result (5) simply set $x = y$.

(6) $\implies$ (1): If $x_o$ is orthogonal to each $x_i$ then by (6) we have

$$\|x_o\|^2 = \sum_i |(x_i, x_o)|^2 = 0$$

so that $x_o = 0$; hence no non-zero vector can be added to $X$ and $X$ is complete.
7.2 Gram Schmidt Process

The Gram Schmidt process can be used to construct an orthonormal basis for a finite dimensional vector space. Start with a basis $\{x_1, x_2, \ldots, x_n\}$ for $V$. Form

$$y_1 = \frac{x_1}{\|x_1\|}$$

Next define

$$z_2 = x_2 - (x_2, y_1)y_1$$

Since the $x_i$ are linearly independent, $z_2 \neq 0$, and $z_2$ is orthogonal to $y_1$. Hence

$$y_2 = \frac{z_2}{\|z_2\|}$$

is orthogonal to $y_1$ and has unit norm. If $y_1, y_2, \ldots, y_r$ have been so chosen then we form

$$z_{r+1} = x_{r+1} - \sum_{i=1}^r (x_{r+1}, y_i)y_i$$

Since the $x_i$ are linearly independent, $\|z_{r+1}\| > 0$, and since $z_{r+1} \perp y_i$ for $i = 1, 2, \ldots, r$ it follows that

$$y_{r+1} = \frac{z_{r+1}}{\|z_{r+1}\|}$$

may be added to the set $\{y_1, y_2, \ldots, y_r\}$ to form a new orthonormal set. The process necessarily stops with $y_n$ since there can be at most $n$ elements in a linearly independent set.
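Editorial aside (not part of the original notes): a direct transcription of the Gram Schmidt recursion into NumPy for the real inner product $(x, y) = x^Ty$ (the helper name gram_schmidt is ours; for serious numerical work a QR factorization would normally be preferred):

    import numpy as np

    def gram_schmidt(X):
        # columns of X are the linearly independent x_1, ..., x_n
        Y = []
        for x in X.T:
            z = x - sum((x @ y) * y for y in Y)   # z_{r+1} = x_{r+1} - sum (x_{r+1}, y_i) y_i
            Y.append(z / np.linalg.norm(z))       # y_{r+1} = z_{r+1} / ||z_{r+1}||
        return np.column_stack(Y)

    X = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
    Q = gram_schmidt(X)
    print(np.allclose(Q.T @ Q, np.eye(3)))        # orthonormal columns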
7.3 Orthogonal Projections
Theorem 7.6 (Orthogonal Projection) Let $U$ be a subspace of an inner product space $V$ and let $y$ be a vector in $V$ which is not in $U$. Then there exists a unique vector $y_U \in U$ and a unique vector $e \perp U$ such that

$$y = y_U + e$$

Proof: Let $\{x_1, x_2, \ldots, x_r\}$ be a basis of $U$. Since $y \notin U$ the set $\{x_1, x_2, \ldots, x_r, y\}$ is a linearly independent set. Use the Gram Schmidt process on this set to form a set of orthonormal vectors $\{y_1, y_2, \ldots, y_r, y_{r+1}\}$. Since $y$ is in the space spanned by these vectors we can write

$$y = \sum_{i=1}^{r+1} \alpha_iy_i = \sum_{i=1}^r \alpha_iy_i + \alpha_{r+1}y_{r+1}$$

If we define

$$y_U = \sum_{i=1}^r \alpha_iy_i \quad\text{and}\quad e = \alpha_{r+1}y_{r+1}$$

then $y_U \in U$ and $e \perp U$. To show uniqueness, suppose that we also have

$$y = z_1 + e_1 \quad\text{where } z_1 \in U \text{ and } e_1 \perp U$$

Then we have

$$(z_1 - y_U) + (e_1 - e) = 0$$

so that $(z_1 - y_U, z_1 - y_U) = 0$, and hence $z_1 = y_U$ and $e_1 = e$.
Definition 7.5 The vector $e$ in Theorem 7.6 is called the orthogonal projection from $y$ to $U$ and the vector $y_U$ is called the orthogonal projection of $y$ on $U$.
Theorem 7.7 The projection of $y$ on $U$ has the property that

$$\|y - y_U\| = \min_x\{\|y - x\| : x \in U\}$$

Proof: Let $x \in U$. Then $(y_U - x, e) = 0$. Hence

$$\begin{aligned} \|y - x\|^2 &= (y - x, y - x) \\ &= (y - y_U + y_U - x,\ y - y_U + y_U - x) \\ &= \|y - y_U\|^2 + \|y_U - x\|^2 + (y - y_U, y_U - x) + (y_U - x, y - y_U) \\ &= \|y - y_U\|^2 + \|y_U - x\|^2 \\ &\geq \|y - y_U\|^2 \end{aligned}$$

with the minimum occurring when $x = y_U$. Thus the projection minimizes the distance from $y$ to $U$.
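Editorial aside (not part of the original notes): with $U$ given as the column space of a matrix $X$, the projector $P = X(X^TX)^{-1}X^T$ yields $y_U = Py$, and Theorem 7.7 can be checked numerically:

    import numpy as np

    rng = np.random.default_rng(5)
    X = rng.normal(size=(5, 2))                 # basis of U as columns
    y = rng.normal(size=5)

    P = X @ np.linalg.inv(X.T @ X) @ X.T        # orthogonal projector onto U
    yU = P @ y
    e = y - yU

    print(np.allclose(X.T @ e, 0))              # e is orthogonal to U
    x_other = X @ rng.normal(size=2)            # some other point of U
    print(np.linalg.norm(y - yU) <= np.linalg.norm(y - x_other))  # True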
Chapter 8
Characteristic Roots and Vectors
8.1 Definitions
Definition 8.1: A scalar $\lambda$ is a characteristic root and a non-zero vector $x$ is a characteristic vector of the matrix $A$ if

$$Ax = \lambda x$$

Other names for characteristic roots are proper value, latent root, eigenvalue and secular value, with similar adjectives applying to characteristic vectors.

If $\lambda$ is a characteristic root of $A$ let the set $X_\lambda$ be defined by

$$X_\lambda = \{x : Ax = \lambda x\}$$

The geometric multiplicity of $\lambda$ is defined to be the dimension of the subspace $X_\lambda$. $\lambda$ is said to be a simple characteristic root if its geometric multiplicity is 1.

Definition 8.2: The spectrum of $A$ is the set of all characteristic roots of $A$ and is denoted by $\sigma(A)$.

Since $Ax = \lambda x$ if and only if $[A - \lambda I]x = 0$ we see that $\lambda \in \sigma(A)$ is equivalent to $A - \lambda I$ being singular. Similarly $x$ is a characteristic vector corresponding to $\lambda$
if and only if $x$ is in the null space of $A - \lambda I$. Using the properties of determinants we have the following theorem.

Theorem 8.1 $\lambda \in \sigma(A)$ if and only if $\det(A - \lambda I) = 0$.
If the field of scalars is the set of complex numbers (or any algebraically closed field) then if $A$ is $p$ by $p$ we have

$$\det(A - \lambda I) = \prod_{i=1}^q (\lambda_i - \lambda)^{n_i}$$

where $\sum_{i=1}^q n_i = p$ and $n_i \geq 1$.

The algebraic multiplicity of the characteristic root $\lambda_i$ is the number $n_i$ which appears in the above expression.

The algebraic and geometric multiplicity of a characteristic root need not correspond. To see this let

$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$

Then

$$\det(A - \lambda I) = \det\begin{pmatrix} 1 - \lambda & 1 \\ 0 & 1 - \lambda \end{pmatrix} = (\lambda - 1)^2$$

It follows that both of the characteristic roots of $A$ are equal to 1. Hence the algebraic multiplicity of the characteristic root 1 is equal to 2. However, the equation

$$Ax = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

yields the equations

$$x_1 + x_2 = x_1 \qquad x_2 = x_2$$

so that $x_2 = 0$. Thus the geometric multiplicity of the characteristic root 1 of $A$ is equal to 1.
Lemma 8.2

(1) The spectrum of $TAT^{-1}$ is the same as the spectrum of $A$.

(2) If $\lambda$ is a characteristic root of $A$ then $p(\lambda)$ is a characteristic root of $p(A)$ where $p$ is any polynomial.

Proof: Since $TAT^{-1} - \lambda I = T[A - \lambda I]T^{-1}$ it follows that $[A - \lambda I]x = 0$ if and only if $[TAT^{-1} - \lambda I]Tx = 0$, and the spectra are thus the same.

If $Ax = \lambda x$ then $A^nx = \lambda^nx$ for any integer $n \geq 0$, so that $p(A)x = p(\lambda)x$.
8.2 Factorizations
Theorem 8.3 (Schur's Triangular Factorization) Let $A$ be any $p$ by $p$ matrix. Then there exists a unitary matrix $U$, i.e. $U^*U = I$, such that

$$U^*AU = T$$

where $T$ is an upper right triangular matrix with diagonal elements equal to the characteristic roots of $A$.

Proof: Let $p = 2$ and let $u_1$ be such that

$$Au_1 = \lambda_1u_1 \quad\text{and}\quad u_1^*u_1 = 1$$

Define the matrix $U = [u_1, u_2]$ where $u_2$ is chosen so that $U^*U = I$. Then

$$U^*AU = \begin{pmatrix} u_1^* \\ u_2^* \end{pmatrix}[\lambda_1u_1, Au_2] = \begin{pmatrix} \lambda_1 & u_1^*Au_2 \\ 0 & u_2^*Au_2 \end{pmatrix}$$

Note that $u_2^*Au_2 = \lambda_2$ is the other characteristic root of $A$. Now assume that the result is true for $p = N$ and let $p = N + 1$. Let $Au_1 = \lambda_1u_1$ with $u_1^*u_1 = 1$ and let $\{b_1, b_2, \ldots, b_N\}$ be such that

$$U_1 = [u_1, b_1, b_2, \ldots, b_N]$$
is unitary. Thus

$$U_1^*AU_1 = [u_1, b_1, \ldots, b_N]^*[\lambda_1u_1, Ab_1, \ldots, Ab_N] = \begin{pmatrix} \lambda_1 & u_1^*Ab_1 \ \cdots \ u_1^*Ab_N \\ 0 & B_N \end{pmatrix}$$

Since the characteristic roots of the above matrix are the solutions to the equation

$$(\lambda_1 - \lambda)\det(B_N - \lambda I) = 0$$

it follows that the characteristic roots of $B_N$ are the remaining characteristic roots $\lambda_2, \lambda_3, \ldots, \lambda_{N+1}$ of $A$. By the induction assumption there exists a unitary matrix $L$ such that

$$L^*B_NL$$

is upper right triangular with diagonal elements equal to $\lambda_2, \lambda_3, \ldots, \lambda_{N+1}$. Let

$$U_{N+1} = \begin{pmatrix} 1 & 0^T \\ 0 & L \end{pmatrix} \quad\text{and}\quad U = U_1U_{N+1}$$

Then

$$U^*AU = U_{N+1}^*U_1^*AU_1U_{N+1} = \begin{pmatrix} 1 & 0^T \\ 0 & L^* \end{pmatrix}\begin{pmatrix} \lambda_1 & b^* \\ 0 & B_N \end{pmatrix}\begin{pmatrix} 1 & 0^T \\ 0 & L \end{pmatrix} = \begin{pmatrix} \lambda_1 & b^*L \\ 0 & L^*B_NL \end{pmatrix}$$

which is upper right triangular as claimed, with the diagonal elements equal to the characteristic roots of $A$.
Theorem 8.4 If $A$ is normal ($A^*A = AA^*$) then there exists a unitary matrix $U$ such that

$$U^*AU = D$$

where $D$ is diagonal with elements equal to the characteristic roots of $A$.

Proof: By Theorem 8.3 there is a unitary matrix $U$ such that $U^*AU = T$ where $T$ is upper right triangular. Thus we also have $U^*A^*U = T^*$, where $T^*$ is lower left triangular. It follows that

$$U^*A^*AU = T^*T \quad\text{and}\quad U^*AA^*U = TT^*$$

Since $A$ is normal, $T^*T = TT^*$, and comparing diagonal elements shows that the off diagonal elements of $T$ vanish, i.e. $T$ is diagonal as claimed.
Theorem 8.5 (Cochran's Theorem) If $A$ is normal then

$$A = \sum_{i=1}^n \lambda_iE_i$$

where $E_iE_j = O$ for $i \neq j$, $E_i^2 = E_i$ and the $\lambda_i$ are the characteristic roots of $A$.

Proof: By Theorem 8.4 there is a unitary matrix $U$ such that $U^*AU = D$ is diagonal with diagonal elements equal to the characteristic roots of $A$. Hence

$$A = UDU^* = U\begin{pmatrix} \lambda_1u_1^* \\ \lambda_2u_2^* \\ \vdots \\ \lambda_nu_n^* \end{pmatrix} = \sum_{i=1}^n \lambda_iu_iu_i^*$$

Defining $E_i = u_iu_i^*$ completes the proof.
This representation of $A$ is called the spectral representation of $A$. In most applications $A$ is symmetric.

The spectral representation can be rewritten as

$$A = \sum_{i=1}^n \lambda_iE_i = \sum_{j=1}^q \lambda_jE_j$$

where $\lambda_1 < \lambda_2 < \cdots < \lambda_q$ are the distinct characteristic roots of $A$ (here $E_j$ denotes the sum of the $u_iu_i^*$ over those $i$ whose root equals $\lambda_j$). Now define

$$S_0 = O \quad\text{and}\quad S_i = \sum_{j=1}^i E_j \quad\text{for } i = 1, 2, \ldots, q$$

Then $E_i = S_i - S_{i-1}$ for $i = 1, 2, \ldots, q$ and hence

$$A = \sum_{j=1}^q \lambda_jE_j = \sum_{j=1}^q \lambda_j(S_j - S_{j-1})$$

which may be written as a Stieltjes integral, i.e.

$$A = \int \lambda\,dS(\lambda)$$

which is also called the spectral representation of $A$.
The spectral representation of $A$ has the property that

$$(Ax, y) = \left(\int \lambda\,dS(\lambda)x, y\right) = \sum_{j=1}^q \lambda_j[(S_jx, y) - (S_{j-1}x, y)] = \int \lambda\,d(S(\lambda)x, y)$$

In more advanced treatments, applicable to Hilbert spaces, the property just established is taken as the defining property of $\int \lambda\,d(S(\lambda)x, y)$.
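Editorial aside (not part of the original notes): for a real symmetric (hence normal) matrix the spectral representation can be computed with NumPy's eigh. A sketch checking $A = \sum_i \lambda_iE_i$ and the properties of the $E_i$:

    import numpy as np

    rng = np.random.default_rng(6)
    M = rng.normal(size=(4, 4))
    A = M + M.T                                   # symmetric, hence normal

    lam, U = np.linalg.eigh(A)                    # A U = U diag(lam), U orthogonal
    recon = sum(lam[i] * np.outer(U[:, i], U[:, i]) for i in range(4))
    print(np.allclose(recon, A))                  # A = sum_i lambda_i E_i, E_i = u_i u_i^T

    E0 = np.outer(U[:, 0], U[:, 0])
    E1 = np.outer(U[:, 1], U[:, 1])
    print(np.allclose(E0 @ E0, E0), np.allclose(E0 @ E1, 0))  # idempotent, orthogonal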
8.3 Quadratic Forms
If $A$ is a symmetric matrix with real elements, $x^TAx$ is called a quadratic form. A quadratic form is said to be

positive definite if $x^TAx > 0$ for all $x \neq 0$

non-negative definite if $x^TAx \geq 0$ for all $x$

If $A$ is positive definite then all of its characteristic roots are positive, while if $A$ is non-negative definite then all of its characteristic roots are non-negative. To prove this note that if $x_i$ is a characteristic vector of $A$ corresponding to $\lambda_i$ then $Ax_i = \lambda_ix_i$ and hence $x_i^TAx_i = \lambda_ix_i^Tx_i$.

If $\lambda_i \neq \lambda_j$ then the corresponding characteristic vectors are orthogonal since

$$\lambda_jx_i^Tx_j = x_i^TAx_j = (Ax_i)^Tx_j = \lambda_ix_i^Tx_j$$

which implies that $x_i^Tx_j = 0$, i.e. that $x_i$ and $x_j$ are orthogonal if $\lambda_i \neq \lambda_j$.
Another way of characterizing characteristic roots is to consider the stationary values of

$$\frac{x^TAx}{x^Tx}$$

as $x$ varies over $\mathbb{R}^p$. Since

$$A = \sum_{i=1}^p \lambda_iE_i$$

where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$, we have that

$$Ax = \sum_{i=1}^p \lambda_iE_ix$$

It follows that

$$x^TAx = \sum_{i=1}^p \lambda_ix^TE_ix = \sum_{i=1}^p \lambda_ix^TE_iE_i^Tx = \sum_{i=1}^p \lambda_i\|E_ix\|^2$$

Since $E_i = y_iy_i^T$, where the $y_i$ form an orthonormal basis, we have that $U^TU = I$ where

$$U = [y_1, y_2, \ldots, y_p]$$

It follows that $U^{-1} = U^T$ and hence that $UU^T = I$. Thus

$$I = UU^T = [y_1, y_2, \ldots, y_p]\begin{pmatrix} y_1^T \\ y_2^T \\ \vdots \\ y_p^T \end{pmatrix} = \sum_{i=1}^p y_iy_i^T = \sum_{i=1}^p E_i$$

It follows that

$$x^Tx = \sum_{i=1}^p x^TE_ix = \sum_{i=1}^p \|E_ix\|^2$$

Thus

$$\frac{x^TAx}{x^Tx} = \frac{\sum_{i=1}^p \lambda_i\|E_ix\|^2}{\sum_{i=1}^p \|E_ix\|^2}$$

is a weighted average of the characteristic roots, so that its maximum over $x \neq 0$ is $\lambda_1$ and its minimum is $\lambda_p$.
Theorem 8.8 A necessary and sufficient condition that a symmetric matrix $A$ on a real inner product space satisfies $A = O$ is that $\langle Ax, x\rangle = 0$ for all $x$.

Proof: Necessity is obvious.

To prove sufficiency we note that

$$\langle A(x + y), (x + y)\rangle = \langle Ax, x\rangle + \langle Ax, y\rangle + \langle Ay, x\rangle + \langle Ay, y\rangle$$

It follows that

$$\langle Ax, y\rangle + \langle Ay, x\rangle = \langle A(x + y), (x + y)\rangle - \langle Ax, x\rangle - \langle Ay, y\rangle = 0$$

Since $A$ is symmetric, $\langle Ay, x\rangle = \langle y, Ax\rangle = \langle Ax, y\rangle$, so that $\langle Ax, y\rangle = 0$ for every $y$. Taking $y = Ax$ gives $\|Ax\|^2 = 0$, i.e. $Ax = 0$ for every $x$, and by Lemma 3.1 the result follows.
8.4 Jordan Canonical Form

The most general canonical form for matrices is the Jordan canonical form. This particular canonical form is used in the theory of Markov chains.

The Jordan canonical form is a nearly diagonal canonical form for a matrix. Let $J_{p_i}(\lambda_i)$ be the $p_i \times p_i$ matrix of the form

$$J_{p_i}(\lambda_i) = \lambda_iI_{p_i} + N_{p_i}$$

where

$$N_{p_i} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix}$$

is a nilpotent matrix of order $p_i$, i.e. $N_{p_i}^{p_i} = O$.
Theorem 8.9 (Jordan Canonical Form) Let $\lambda_1, \lambda_2, \ldots, \lambda_q$ be the $q$ distinct characteristic roots of a $p \times p$ matrix $A$. Then there exists a non-singular matrix $V$ such that

$$VAV^{-1} = \mathrm{diag}\left(J_{p_1}(\lambda_1), J_{p_2}(\lambda_2), \ldots, J_{p_q}(\lambda_q)\right)$$

One use of the Jordan canonical form occurs when we want an expression for $A^n$. Using the Jordan canonical form we see that

$$VA^nV^{-1} = \mathrm{diag}\left(J_{p_1}^n, J_{p_2}^n, \ldots, J_{p_q}^n\right)$$

Thus if $n \geq \max_i p_i$ we have

$$J_{p_i}^n(\lambda_i) = (\lambda_iI_{p_i} + N_{p_i})^n = \sum_{r=0}^n \binom{n}{r}\lambda_i^{n-r}N_{p_i}^r$$

by the binomial expansion. Since $N_{p_i}^r = O$ if $r \geq p_i$ we obtain

$$J_{p_i}^n(\lambda_i) = \sum_{r=0}^{p_i-1} \binom{n}{r}\lambda_i^{n-r}N_{p_i}^r$$

Thus for large $n$ we have a relatively simple expression for $A^n$.
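Editorial aside (not part of the original notes): the binomial formula for powers of a Jordan block is easy to verify numerically. A sketch for a single block $J_3(2)$ (the values $p = 3$, $\lambda = 2$, $n = 10$ are illustrative):

    import numpy as np
    from math import comb

    p, lam, n = 3, 2.0, 10
    N = np.diag(np.ones(p - 1), k=1)              # nilpotent: N^p = 0
    J = lam * np.eye(p) + N                       # Jordan block J_p(lambda)

    # J^n = sum_{r=0}^{p-1} C(n, r) lambda^(n-r) N^r
    Jn = sum(comb(n, r) * lam ** (n - r) * np.linalg.matrix_power(N, r)
             for r in range(p))
    print(np.allclose(Jn, np.linalg.matrix_power(J, n)))  # True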
8.5 The Singular Value Decomposition (SVD) of a Matrix

Theorem 8.10 If $A$ is any $n$ by $p$ matrix then there exist matrices $U$, $V$ and $D$ such that

$$A = UDV^T$$

where

$U$ is $n$ by $p$ and $U^TU = I_p$

$V$ is $p$ by $p$ and $V^TV = I_p$

$D = \mathrm{diag}(\delta_1, \delta_2, \ldots, \delta_p)$ and $\delta_1 \geq \delta_2 \geq \cdots \geq \delta_p \geq 0$
Proof: Let $x$ be such that $\|x\| = 1$ and $\|Ax\| = \|A\|$, where $\|A\| = \max_{\|u\|=1}\|Au\|$. That such an $x$ can be chosen follows from the fact that the norm is a continuous function on a closed and bounded set and hence achieves its maximum at a point in the set. Let $y_1 = Ax$ and define

$$y = \frac{y_1}{\|A\|}$$

Then $\|y\| = 1$ and

$$Ax = \|A\|y = \sigma y \quad\text{where } \sigma = \|A\|$$

Let $U_1$ and $V_1$ be such that

$$U = [y, U_1] \quad\text{and}\quad V = [x, V_1] \quad\text{are orthogonal}$$

where $U$ is $n$ by $n$ and $V$ is $p$ by $p$. It then follows that

$$U^TAV = \begin{pmatrix} y^T \\ U_1^T \end{pmatrix}A[x, V_1] = \begin{pmatrix} y^TAx & y^TAV_1 \\ U_1^TAx & U_1^TAV_1 \end{pmatrix}$$

Thus

$$A_1 = U^TAV = \begin{pmatrix} \sigma & w^T \\ 0 & B_1 \end{pmatrix}$$

where $w^T = y^TAV_1$, $B_1 = U_1^TAV_1$ and $y^TAx = \sigma y^Ty = \sigma$ (note $U_1^TAx = \sigma U_1^Ty = 0$).

If we define

$$z = A_1\begin{pmatrix} \sigma \\ w \end{pmatrix} = \begin{pmatrix} \sigma & w^T \\ 0 & B_1 \end{pmatrix}\begin{pmatrix} \sigma \\ w \end{pmatrix} = \begin{pmatrix} \sigma^2 + w^Tw \\ B_1w \end{pmatrix}$$

then

$$\|z\|^2 = (\sigma^2 + w^Tw)^2 + w^TB_1^TB_1w \geq (\sigma^2 + w^Tw)^2$$
Thus

$$\sigma^2 = \|A\|^2 = \|U^TAV\|^2 = \|A_1\|^2 \geq \frac{\|z\|^2}{\sigma^2 + w^Tw} \geq \sigma^2 + w^Tw$$

and it follows that $w = 0$. Thus

$$U^TAV = \begin{pmatrix} \sigma & 0^T \\ 0 & B_1 \end{pmatrix}$$

Induction now establishes the result. The orthogonality properties of $U$ and $V$ then imply that

$$U^TAV = D \iff A = UDV^T$$
8.5.1 Use of the SVD in Data Reduction and Reconstruction

Let $Y$ be a data matrix consisting of $n$ observations on $p$ response variables. That is

$$Y = \begin{pmatrix} y_{11} & y_{12} & \cdots & y_{1p} \\ y_{21} & y_{22} & \cdots & y_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{np} \end{pmatrix}$$

The singular value decomposition of $Y$ allows us to represent $Y$ as

$$Y = U_1D_1V_1^T$$

where $U_1$ is $n$ by $p$, $V_1$ is $p$ by $p$ and $D_1$ is a $p$ by $p$ diagonal matrix with $i$th diagonal element $d_i$. We know that $d_i \geq 0$ and we may assume that

$$d_1 \geq d_2 \geq \cdots \geq d_p$$

Recall that the matrices $U_1$ and $V_1$ have the properties

$$U_1^TU_1 = I_p \quad\text{and}\quad V_1^TV_1 = I_p$$

so that the columns of $U_1$ form an orthonormal basis for the column space of $Y$, i.e. the subspace of $\mathbb{R}^n$ spanned by the columns of $Y$. Similarly the columns of $V_1$ form an orthonormal basis for the row space of $Y$, i.e. the subspace of $\mathbb{R}^p$ spanned by the rows of $Y$.

The rows of $Y$ represent $n$ points in $p$ dimensional space and

$$y_{i1}, y_{i2}, \ldots, y_{ip}$$

represent the coordinates of the $i$th individual's values relative to the axes of the $p$ original variables. Two individuals are close in this space if the distance (norm) between them is small. In this case the two individuals have nearly the same values on each of the variables, i.e. their coordinates with respect to the original variables are nearly equal. We refer to this subspace of $\mathbb{R}^p$ as individual space or observation space.

The columns of $Y$ represent $p$ points in $\mathbb{R}^n$ and represent the observed values of a variable on $n$ individuals. Two points in this space are close if they have similar values over the entire collection of observed individuals, i.e. the two variables are measuring the same thing. We call this space variable space.
The distance between individuals $i$ and $i'$ is given by

$$\|y_i - y_{i'}\| = \left[\sum_{j=1}^p (y_{ij} - y_{i'j})^2\right]^{1/2}$$

For any data matrix $Y$ there is an orthogonal matrix $V_1$ such that if $Z = YV_1$ then the distances between individuals are preserved, i.e.

$$\|z_i - z_{i'}\| = \|V_1^T(y_i - y_{i'})\| = \left[(y_i - y_{i'})^TV_1V_1^T(y_i - y_{i'})\right]^{1/2} = \|y_i - y_{i'}\|$$

Thus the original data is replaced by a new data set in which the coordinates of the individuals are now given by the rows of $Z$.
We now note that

$$U_1 = [u_1, u_2, \cdots, u_p]$$

where $u_j$ is the $j$th column vector of $U_1$, and that

$$V_1 = [v_1, v_2, \cdots, v_p]$$
where $v_j$ is the $j$th column vector of $V_1$. Thus we can write

$$Y = U_1D_1V_1^T = \sum_{j=1}^p d_ju_jv_j^T$$

That is, we may reconstruct $Y$ as a sum of constants (the $d_j$) times matrices which are the outer products of vectors which span variable space (the $u_j$) and individual space (the $v_j$). The interpretation of these vectors is of importance to understanding the nature of the representation. We note that

$$YY^T = (U_1D_1V_1^T)(V_1D_1U_1^T) = U_1D_1^2U_1^T$$

Thus

$$YY^TU_1 = U_1D_1^2$$

which shows that the columns of $U_1$ are the eigenvectors of $YY^T$ with eigenvalues equal to the $d_j^2$.

We also note that

$$Y^TY = (V_1D_1U_1^T)(U_1D_1V_1^T) = V_1D_1^2V_1^T$$

so that

$$Y^TYV_1 = V_1D_1^2$$

Thus the columns of $V_1$ are the eigenvectors of $Y^TY$ with eigenvalues again equal to the $d_j^2$.

The relationship

$$YV_1 = U_1D_1$$

shows that the columns of $V_1$ transform an individual's values on the original variables to values on the new variables $U_1D_1$, with relative importance given by the $d_j$.
If some of the $d_j$ are zero or are approximately 0 then we can approximate $Y$ by the rank $k$ matrix

$$Y \approx \sum_{j=1}^k d_ju_jv_j^T$$

which tells us that the approximating subspace has the same dimension, $k$, in variable space as in individual space.
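Editorial aside (not part of the original notes): the reconstruction and rank $k$ approximation above can be reproduced with NumPy's SVD (the data here are random and purely illustrative):

    import numpy as np

    rng = np.random.default_rng(7)
    Y = rng.normal(size=(10, 4)) @ rng.normal(size=(4, 4))   # a 10 by 4 data matrix

    U1, d, V1t = np.linalg.svd(Y, full_matrices=False)       # Y = U1 diag(d) V1^T
    print(np.allclose(Y, U1 @ np.diag(d) @ V1t))             # exact reconstruction

    # rank-k approximation: keep only the k largest singular values
    k = 2
    Yk = sum(d[j] * np.outer(U1[:, j], V1t[j, :]) for j in range(k))
    print(np.linalg.norm(Y - Yk))   # small whenever d[2], d[3] are small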