Linear Algebra and Matrices
Notes from Charles A. Rohde
Fall 2003
Contents

1 Linear Vector Spaces
  1.1 Definition and Examples
  1.2 Linear Independence and Bases

2 Linear Transformations
  2.1 Sums and Scalar Products
  2.2 Products

3 Matrices
  3.1 Definitions
  3.2 Sums and Products of Matrices
  3.3 Conjugate Transpose and Transpose Operations
  3.4 Invertibility and Inverses
  3.5 Direct Products and Vecs

4 Matrix Factorization
  4.1 Rank of a Matrix
5 Linear Equations
  5.1 Consistency Results
  5.2 Solutions to Linear Equations
    5.2.1 A is p by p of rank p
    5.2.2 A is n by p of rank p
    5.2.3 A is n by p of rank r where r ≤ p ≤ n
    5.2.4 Generalized Inverses

6 Determinants
  6.1 Definition and Properties
  6.2 Computation

7 Inner Product Spaces
  7.1 Definitions and Elementary Properties
  7.2 Gram Schmidt Process
  7.3 Orthogonal Projections

8 Characteristic Roots and Vectors
  8.1 Definitions
  8.2 Factorizations
  8.3 Quadratic Forms
  8.4 Jordan Canonical Form
  8.5 The Singular Value Decomposition (SVD) of a Matrix
    8.5.1 Use of the SVD in Data Reduction and Reconstruction
Chapter 1
Linear Vector Spaces
1.1 Definition and Examples
Definition 1.1 A linear vector space $V$ is a set of points (called vectors) satisfying the following conditions:

(1) An operation $+$ exists on $V$ which satisfies the following properties:

(a) $x + y = y + x$
(b) $x + (y + z) = (x + y) + z$
(c) A vector $0$ exists in $V$ such that $x + 0 = x$ for every $x \in V$
(d) For every $x \in V$ a vector $-x$ exists in $V$ such that $(-x) + x = 0$

(2) An operation $\cdot$ (scalar multiplication) exists on $V$ which satisfies the following properties:

(a) $\alpha \cdot (x + y) = \alpha \cdot x + \alpha \cdot y$
(b) $\alpha \cdot (\beta \cdot x) = (\alpha\beta) \cdot x$
(c) $(\alpha + \beta) \cdot x = \alpha \cdot x + \beta \cdot x$
(d) $1 \cdot x = x$
where the scalars $\alpha$ and $\beta$ belong to a field $F$ with the identity being given by $1$. For ease of notation we will eliminate the $\cdot$ in scalar multiplication.

In most applications the field $F$ will be the field of real numbers $\mathbb{R}$ or the field of complex numbers $\mathbb{C}$. The $0$ vector will be called the null vector or the origin.
example 1: Let $x$ represent a point in two dimensional space with addition and scalar multiplication defined by

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \end{pmatrix} \quad\text{and}\quad \alpha\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} \alpha x_1 \\ \alpha x_2 \end{pmatrix}$$

The origin and negatives are defined by

$$0 = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad\text{and}\quad -\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -x_1 \\ -x_2 \end{pmatrix}$$
example 2: Let $x$ represent a point in $n$ dimensional space (called Euclidean space and denoted by $\mathbb{R}^n$) with addition and scalar multiplication defined by

$$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{pmatrix} \quad\text{and}\quad \alpha\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \alpha x_1 \\ \alpha x_2 \\ \vdots \\ \alpha x_n \end{pmatrix}$$

The origin and negatives are defined by

$$0 = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \quad\text{and}\quad -\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} -x_1 \\ -x_2 \\ \vdots \\ -x_n \end{pmatrix}$$
example 3: Let $x$ represent a point in $n$ dimensional complex space (called Unitary space and denoted by $\mathbb{C}^n$) with addition and scalar multiplication defined by

$$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{pmatrix} \quad\text{and}\quad \alpha\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \alpha x_1 \\ \vdots \\ \alpha x_n \end{pmatrix}$$

The origin and negatives are defined by

$$0 = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix} \quad\text{and}\quad -\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} -x_1 \\ \vdots \\ -x_n \end{pmatrix}$$
In this case the $x_i$ and $y_i$ can be complex numbers, as can the scalars.

example 4: Let $p$ be an $n$th degree polynomial, i.e.

$$p(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_n x^n$$

where the $\alpha_i$ are complex numbers. Define addition and scalar multiplication by

$$(p_1 + p_2)(x) = \sum_{i=0}^n (\alpha_{1i} + \alpha_{2i})x^i \quad\text{and}\quad (\beta p)(x) = \sum_{i=0}^n \beta\alpha_i x^i$$

The origin is the null function and $-p$ is defined by

$$(-p)(x) = \sum_{i=0}^n -\alpha_i x^i$$

example 5: Let $X_1, X_2, \ldots, X_n$ be random variables defined on a probability space and define $V$ as the collection of all linear combinations of $X_1, X_2, \ldots, X_n$. Here the vectors in $V$ are random variables.
1.2 Linear Independence and Bases
Definition 1.2 A finite set of vectors $\{x_1, x_2, \ldots, x_k\}$ is said to be a linearly independent set if

$$\sum_i \alpha_i x_i = 0 \implies \alpha_i = 0 \text{ for each } i$$

If a set of vectors is not linearly independent it is said to be linearly dependent. If the set of vectors is empty we define $\sum_i \alpha_i x_i = 0$ so that, by convention, the empty set of vectors is a linearly independent set of vectors.
Theorem 1.1 The set $\{x_1, x_2, \ldots, x_k\}$ is linearly dependent if and only if

$$x_t = \sum_{i=1}^{t-1} \alpha_i x_i \quad\text{for some } t \ge 2$$

Proof: Sufficiency is obvious since the equation

$$x_t - \sum_{i=1}^{t-1} \alpha_i x_i = 0$$

does not imply that all of the coefficients of the vectors are equal to 0.

Conversely, if the vectors are linearly dependent then we have $\sum_i \alpha_i x_i = 0$ where at least two of the $\alpha$'s are non-zero (we assume that none of the $x$'s are the zero vector). Thus, with suitable relabelling, we have

$$\sum_{i=1}^{t} \alpha_i x_i = 0$$

where $\alpha_t \neq 0$. Thus

$$x_t = \sum_{i=1}^{t-1}\left(-\frac{\alpha_i}{\alpha_t}\right)x_i = \sum_{i=1}^{t-1} \beta_i x_i$$
Definition 1.3 A linear basis or coordinate system in a vector space $V$ is a set $E$ of linearly independent vectors in $V$ such that each vector in $V$ can be written as a linear combination of the vectors in $E$.

Since the vectors in $E$ are linearly independent the representation as a linear combination is unique. If the number of vectors in $E$ is finite we say that $V$ is finite dimensional.

Definition 1.4 The dimension of a vector space is the number of vectors in any basis of the vector space.
If we have a set of linearly independent vectors $\{x_1, x_2, \ldots, x_k\}$, this set may be extended to form a basis. To see this let $V$ be finite dimensional with basis $\{y_1, y_2, \ldots, y_n\}$. Consider the set

$$\{x_1, x_2, \ldots, x_k, y_1, y_2, \ldots, y_n\}$$

Let $z$ be the first vector in this set which is a linear combination of the preceding ones. Since the $x$'s are linearly independent we must have $z = y_i$ for some $i$. Thus the set

$$\{x_1, x_2, \ldots, x_k, y_1, y_2, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n\}$$

can be used to construct every vector in $V$. If this set of vectors is linearly independent it is a basis which includes the $x$'s. If not, we continue removing $y$'s until the remaining vectors are linearly independent.

It can be proved, using the Axiom of Choice, that every vector space has a basis. In most applications an explicit basis can be written down, so the existence of a basis is not at issue.
Definition 1.5 A non-empty subset $M$ of a vector space $V$ is called a linear manifold or a subspace if $x, y \in M$ implies that every linear combination $\alpha x + \beta y \in M$.

Theorem 1.2 The intersection of any collection of subspaces is a subspace.

Proof: Since $0$ is in each of the subspaces it is in their intersection. Thus the intersection is non-empty. If $x$ and $y$ are in the intersection then $\alpha x + \beta y$ is in each of the subspaces and hence in their intersection. It follows that the intersection is a subspace.

Definition 1.6 If $\{x_i, i \in I\}$ is a set of vectors, the subspace spanned by $\{x_i, i \in I\}$ is defined to be the intersection of all subspaces containing $\{x_i, i \in I\}$.

Theorem 1.3 If $\{x_i, i \in I\}$ is a set of vectors, the subspace spanned by $\{x_i, i \in I\}$, $\mathrm{sp}(\{x_i, i \in I\})$, is the set of all linear combinations of the vectors in $\{x_i, i \in I\}$.

Proof: The set of all linear combinations of vectors in $\{x_i, i \in I\}$ is obviously a subspace which contains $\{x_i, i \in I\}$. Conversely, $\mathrm{sp}(\{x_i, i \in I\})$ contains all linear combinations of vectors in $\{x_i, i \in I\}$ so that the two subspaces are identical.

It follows that an alternative characterization of a basis is that it is a set of linearly independent vectors which spans $V$.
example 1: If $X$ is the empty set then the space spanned by $X$ is the $0$ vector.

example 2: In $\mathbb{R}^3$ let $X$ be the space spanned by $e_1, e_2$ where

$$e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad\text{and}\quad e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$$

Then $\mathrm{sp}\{e_1, e_2\}$ is the set of all vectors of the form

$$x = \begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix}$$

example 3: If $\{X_1, X_2, \ldots, X_n\}$ is a set of random variables then the space spanned by $\{X_1, X_2, \ldots, X_n\}$ is the set of all random variables of the form $\sum_i a_i X_i$. This set is a basis if $P(X_i = X_j) = 0$ for $i \neq j$.
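Editorial aside (not part of the original notes): in $\mathbb{R}^n$ linear independence can be tested numerically by checking whether the rank of the matrix whose columns are the given vectors equals the number of vectors. A minimal NumPy sketch with illustrative data:

    import numpy as np

    # Columns are the vectors e1, e2 of example 2 plus a dependent vector.
    e1 = np.array([1.0, 0.0, 0.0])
    e2 = np.array([0.0, 1.0, 0.0])
    x  = 2.0 * e1 - 3.0 * e2          # a linear combination, hence dependent

    A = np.column_stack([e1, e2])
    B = np.column_stack([e1, e2, x])

    # rank == number of columns  <=>  the columns are linearly independent
    print(np.linalg.matrix_rank(A) == A.shape[1])   # True
    print(np.linalg.matrix_rank(B) == B.shape[1])   # False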
Chapter 2
Linear Transformations
Let $U$ be a $p$ dimensional vector space and let $V$ be an $n$ dimensional vector space.

2.1 Sums and Scalar Products

Definition 2.1 A linear transformation $L$ from $U$ to $V$ is a mapping (function) from $U$ to $V$ such that

$$L(\alpha x + \beta y) = \alpha L(x) + \beta L(y) \quad\text{for every } x, y \in U \text{ and all } \alpha, \beta$$

Definition 2.2 The sum of two linear transformations $L_1$ and $L_2$ is defined by the equation

$$L(x) = L_1(x) + L_2(x) \quad\text{for every } x \in U$$

Similarly $\alpha L$ is defined by $[\alpha L](x) = \alpha L(x)$.

The transformation $O$ defined by $O(x) = 0$ has the properties:

$$L + O = O + L = L \quad ; \quad L + (-L) = (-L) + L = O$$

Also note that $L_1 + L_2 = L_2 + L_1$. Thus linear transformations with addition and scalar multiplication as defined above constitute an additive commutative group.
2.2 Products
If $U$ is $p$ dimensional, $V$ is $m$ dimensional and $W$ is $n$ dimensional and

$$L_1 : U \to V \quad\text{and}\quad L_2 : V \to W$$

we define the product, $L$, of $L_1$ and $L_2$ by

$$L(x) = L_2(L_1(x))$$

Note that the product of linear transformations is not commutative. The following are some properties of linear transformations:

$$LO = OL = O$$
$$L_1(L_2 + L_3) = L_1L_2 + L_1L_3$$
$$L_1(L_2L_3) = (L_1L_2)L_3$$

If $U = V$, $L$ is called a linear transformation on a vector space and we can define the identity transformation $I$ by the equation

$$I(x) = x$$

Then $LI = IL = L$.

We can also define, in this case, $A^2$, and in general $A^n$ for any positive integer $n$. Note that

$$A^nA^m = A^{n+m} \quad\text{and}\quad (A^n)^m = A^{nm}$$

If we define $A^0 = I$ then we can define $p(A)$, where $p$ is a polynomial, by

$$p(A)(x) = \sum_{i=0}^n \alpha_i A^i x$$
Chapter 3
Matrices
3.1 Definitions
If $L$ is a linear transformation from $U$ ($p$ dimensional) to $V$ ($n$ dimensional) and we have bases $\{x_j : j \in J\}$ and $\{y_i : i \in I\}$ respectively for $U$ and $V$, then for each $j \in J$ we have

$$L(x_j) = \sum_{i=1}^n \alpha_{ij} y_i \quad\text{for some choice of } \alpha_{ij}$$

The $n \times p$ array

$$\begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1p} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2p} \\ \vdots & \vdots & & \vdots \\ \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{np} \end{pmatrix}$$

is called the matrix of $L$ relative to the bases $\{x_j : j \in J\}$ and $\{y_i : i \in I\}$. To determine the matrix of $L$ we express the transformed $j$th vector of the basis in the $U$ space in terms of the basis of the $V$ space. The numbers so obtained form the $j$th column of the matrix of $L$. Note that the matrix of $L$ depends on the bases chosen for $U$ and $V$.

In typical applications a specific basis is usually present. Thus in $\mathbb{R}^n$ we usually
choose the basis $\{e_1, e_2, \ldots, e_n\}$ where

$$e_i = \begin{pmatrix} \delta_{1i} \\ \delta_{2i} \\ \vdots \\ \delta_{ni} \end{pmatrix} \quad\text{and}\quad \delta_{ji} = \begin{cases} 1 & j = i \\ 0 & j \neq i \end{cases}$$

In such circumstances we call $L$ the matrix of the linear transformation.
Definition 3.1 A matrix $L$ is an $n$ by $p$ array of scalars and is a carrier of a linear transformation ($L$ carries the basis of $U$ to the basis of $V$).

The scalars making up the matrix are called the elements of the matrix. We will write

$$L = \{\lambda_{ij}\}$$

and call $\lambda_{ij}$ the $i,j$ element of $L$ ($\lambda_{ij}$ is thus the element in the $i$th row and $j$th column of the matrix $L$). The zero or null matrix is the matrix each of whose elements is 0 and corresponds to the zero transformation. A one by $p$ matrix is called a row vector and is written as

$$\ell_i^T = (\lambda_{i1}, \lambda_{i2}, \ldots, \lambda_{ip})$$

Similarly an $n$ by one matrix is called a column vector and is written as

$$\lambda_j = \begin{pmatrix} \lambda_{1j} \\ \lambda_{2j} \\ \vdots \\ \lambda_{nj} \end{pmatrix}$$

With this notation we can write the matrix $L$ as

$$L = \begin{pmatrix} \ell_1^T \\ \ell_2^T \\ \vdots \\ \ell_n^T \end{pmatrix}$$

Alternatively, if $\lambda_j$ denotes the $j$th column of $L$, we may write the matrix $L$ as

$$L = (\lambda_1\ \lambda_2\ \cdots\ \lambda_p)$$
3.2 Sums and Products of Matrices
If $L_1$ and $L_2$ are linear transformations then their sum has matrix $L$ defined by the equation

$$L(x_j) = (L_1 + L_2)(x_j) = L_1(x_j) + L_2(x_j) = \sum_i \alpha^{(1)}_{ij}y_i + \sum_i \alpha^{(2)}_{ij}y_i = \sum_i\left(\alpha^{(1)}_{ij} + \alpha^{(2)}_{ij}\right)y_i$$

Thus we have the following

Definition 3.2 The sum of two matrices $A$ and $B$ is defined as

$$A + B = \{a_{ij} + b_{ij}\}$$

Note that $A$ and $B$ both must be of the same order for the sum to be defined.

Definition 3.3 The multiplication of a matrix by a scalar is defined by the equation

$$\alpha A = \{\alpha a_{ij}\}$$

Matrix addition and scalar multiplication of a matrix have the following properties:

$$A + O = O + A = A$$
$$A + B = B + A$$
$$A + (B + C) = (A + B) + C$$
$$A + (-A) = O$$
If $L_1$ and $L_2$ are linear transformations then their product has matrix $L$ defined by the equation

$$L(x_j) = L_2(L_1(x_j)) = L_2\left(\sum_k \alpha^{(1)}_{kj}y_k\right) = \sum_k \alpha^{(1)}_{kj}L_2(y_k) = \sum_k \alpha^{(1)}_{kj}\sum_i \alpha^{(2)}_{ik}z_i = \sum_i\left(\sum_k \alpha^{(2)}_{ik}\alpha^{(1)}_{kj}\right)z_i$$

Thus we have

Definition 3.4 The product of two matrices $A$ and $B$ is defined by the equation

$$AB = \left\{\sum_k a_{ik}b_{kj}\right\}$$

Thus the $i,j$ element of the product can be found by multiplying the $i$th row of $A$ times the $j$th column of $B$ element by element and summing. Note that the product is defined only if the number of columns of $A$ is equal to the number of rows of $B$. We say that $A$ premultiplies $B$ or that $B$ postmultiplies $A$ in the product $AB$. Matrix multiplication is not commutative.
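Editorial aside (an added illustration, not part of the original notes): the NumPy sketch below computes the product exactly as in Definition 3.4, as $\sum_k a_{ik}b_{kj}$, checks it against the built-in product, and exhibits the failure of commutativity. The helper name matmul_from_definition is ours.

    import numpy as np

    def matmul_from_definition(A, B):
        # AB = {sum_k a_ik b_kj}; requires cols(A) == rows(B)
        n, p = A.shape[0], B.shape[1]
        C = np.zeros((n, p))
        for i in range(n):
            for j in range(p):
                C[i, j] = sum(A[i, k] * B[k, j] for k in range(A.shape[1]))
        return C

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    B = np.array([[0.0, 1.0], [1.0, 0.0]])

    print(np.allclose(matmul_from_definition(A, B), A @ B))  # True
    print(np.allclose(A @ B, B @ A))                         # False: not commutative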
Provided the indicated products exist, matrix multiplication has the following properties:

$$AO = O \quad\text{and}\quad OA = O$$
$$A(B + C) = AB + AC$$
$$(A + B)C = AC + BC$$
$$A(BC) = (AB)C$$

If $n = p$ the matrix $A$ is said to be a square matrix. In this case we can compute $A^n$ for any positive integer $n$, which has the following properties:

$$A^nA^m = A^{n+m} \quad\text{and}\quad (A^n)^m = A^{nm}$$

The identity matrix $I$ can be found using the equation

$$I(x_j) = x_j$$

so that the $j$th column of the identity matrix consists of a one in the $j$th row and zeros elsewhere. The identity matrix has the property that

$$AI = IA = A$$

If we define $A^0 = I$ then if $p$ is a polynomial we may define $p(A)$ by the equation

$$p(A) = \sum_{i=0}^n \alpha_i A^i$$
3.3 Conjugate Transpose and Transpose Operations
Definition 3.5 The conjugate transpose of $A$, $A^*$, is

$$A^* = \{\bar{a}_{ji}\}$$

where $\bar{a}$ is the complex conjugate of $a$. Thus if $A$ is $n$ by $p$, the conjugate transpose $A^*$ is $p$ by $n$ with $i,j$ element equal to the complex conjugate of the $j,i$ element of $A$.

Definition 3.6 The transpose of $A$, $A^T$, is

$$A^T = \{a_{ji}\}$$

Thus if $A$ is $n$ by $p$ the transpose $A^T$ is $p$ by $n$ with $i,j$ element equal to the $j,i$ element of $A$.

If $A$ is a matrix with real elements then $A^* = A^T$.
Definition 3.7 A square matrix is said to be

Hermitian if $A^* = A$

symmetric if $A^T = A$

normal if $AA^* = A^*A$

The following are some properties of the conjugate transpose and transpose operations:

$$(AB)^* = B^*A^* \qquad (AB)^T = B^TA^T$$
$$I^* = I \ ;\ O^* = O^T = O \ ;\ I^T = I$$
$$(A + B)^* = A^* + B^* \qquad (A + B)^T = A^T + B^T$$
$$(\alpha A)^* = \bar{\alpha}A^* \qquad (\alpha A)^T = \alpha A^T$$

Definition 3.8 A square matrix $A$ is said to be

diagonal if $a_{ij} = 0$ for $i \neq j$

upper right triangular if $a_{ij} = 0$ for $i > j$

lower left triangular if $a_{ij} = 0$ for $i < j$

A column vector is an $n$ by one matrix and a row vector is a one by $n$ matrix. If $x$ is a column vector then $x^T$ is the row vector with the same elements.
3.4 Invertibility and Inverses
Lemma 3.1 $A = B$ if and only if $Ax = Bx$ for all $x$.

Proof: If $A = B$ the assertion is obvious. If $Ax = Bx$ for all $x$ then we need only take $x = e_i$ for $i = 1, 2, \ldots, n$, where $e_i$ is the column vector with 1 in the $i$th row and zeros elsewhere. Thus $A$ and $B$ are equal column by column.
If $A$ is $n$ by $p$ and $B$ is $p$ by $m$ and they are written in partitioned form as

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$$

then the product $AB$ satisfies

$$AB = \begin{pmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{pmatrix}$$

if the partitions are conformable.

If a vector is written as $x = \sum_i \xi_i e_i$ where the $e_i$ form a basis for $U$, then the product $Ax$ gives the transformed value of $x$ under the linear transformation $A$ which corresponds to the matrix $A$ via the formula

$$A(x) = \sum_i \xi_i A(e_i)$$
Definition 3.9 A square matrix $A$ is said to be invertible if

(1) $x_1 \neq x_2 \implies Ax_1 \neq Ax_2$

(2) For every vector $y$ there exists at least one $x$ such that $Ax = y$.

If $A$ is invertible and maps $U$ into $V$, both $n$ dimensional, then we define $A^{-1}$, which maps $V$ into $U$, by the equation

$$A^{-1}(y_0) = x_0$$

where $y_0$ is in $V$ and $x_0$ is in $U$ and satisfies $Ax_0 = y_0$ (such an $x_0$ exists by (2) and is unique by (1)). Thus defined, $A^{-1}$ is a legitimate transformation from $V$ to $U$. That $A^{-1}$ is a linear transformation from $V$ to $U$ can be seen by the fact that if $\alpha_1 y_1 + \alpha_2 y_2 \in V$ then there exist unique $x_1$ and $x_2$ in $U$ such that $y_1 = A(x_1)$ and $y_2 = A(x_2)$. Since

$$A(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 A(x_1) + \alpha_2 A(x_2) = \alpha_1 y_1 + \alpha_2 y_2$$

it follows that

$$A^{-1}(\alpha_1 y_1 + \alpha_2 y_2) = \alpha_1 x_1 + \alpha_2 x_2 = \alpha_1 A^{-1}(y_1) + \alpha_2 A^{-1}(y_2)$$

Thus $A^{-1}$ is a linear transformation and hence has a matrix representation. Finding the matrix representation of $A^{-1}$ is not an easy task, however.
Theorem 3.2 If $A$, $B$ and $C$ are matrices such that

$$AB = CA = I$$

then $A$ is invertible and $A^{-1} = B = C$.

Proof: If $Ax_1 = Ax_2$ then $CAx_1 = CAx_2$ so that $x_1 = x_2$ and (1) of definition 3.9 is satisfied. If $y$ is any vector then defining $x = By$ implies $Ax = ABy = y$ so that (2) of definition 3.9 is also satisfied and hence $A$ is invertible. Since $AB = I \implies B = A^{-1}$ and $CA = I \implies C = A^{-1}$, the conclusions follow.
Theorem 3.3 A matrix $A$ is invertible if and only if $Ax = 0$ implies $x = 0$, or equivalently if and only if every $y \in V$ can be written as $y = Ax$.

Proof: If $A$ is invertible then $Ax = 0$ and $A0 = 0$ imply that $x = 0$, since otherwise we have a contradiction to (1) of definition 3.9. If $A$ is invertible then by (2) of definition 3.9 we have that $y = Ax$ for every $y \in V$.

Suppose that $Ax = 0$ implies that $x = 0$. Then if $x_1 \neq x_2$ we have that $A(x_1 - x_2) \neq 0$, so that $Ax_1 \neq Ax_2$, i.e. (1) of definition 3.9 is satisfied. If $x_1, x_2, \ldots, x_n$ is a basis for $U$ then

$$\sum_i \alpha_i A(x_i) = 0 \implies A\left(\sum_i \alpha_i x_i\right) = 0 \implies \sum_i \alpha_i x_i = 0 \implies \alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$$

It follows that $\{Ax_1, Ax_2, \ldots, Ax_n\}$ is a basis for $V$, so that we can write each $y \in V$ as $y = Ax$ since

$$y = \sum_i \alpha_i A(x_i) = A\left(\sum_i \alpha_i x_i\right) = Ax$$

Thus (2) of definition 3.9 is satisfied and $A$ is invertible.

Suppose now that for each $y \in V$ we can write $y = Ax$. Let $\{y_1, y_2, \ldots, y_n\}$ be a basis for $V$ and let $x_i$ be such that $y_i = Ax_i$. Then if $\sum_i \alpha_i x_i = 0$ we have

$$0 = A\left(\sum_i \alpha_i x_i\right) = \sum_i \alpha_i A(x_i) = \sum_i \alpha_i y_i$$

which implies that each of the $\alpha_i$ equal 0. Thus $\{x_1, x_2, \ldots, x_n\}$ is a basis for $U$. It follows that every $x$ can be written as $\sum_i \alpha_i x_i$ and hence $Ax = 0$ implies that $x = 0$, i.e. (1) of definition 3.9 is satisfied. By assumption, (2) of definition 3.9 is satisfied, so that $A$ is invertible.
Theorem 3.4

(1) If $A$ and $B$ are invertible so is $AB$ and $(AB)^{-1} = B^{-1}A^{-1}$

(2) If $A$ is invertible and $\alpha \neq 0$ then $\alpha A$ is invertible and $(\alpha A)^{-1} = \alpha^{-1}A^{-1}$

(3) If $A$ is invertible then $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$

Proof: Since

$$(AB)B^{-1}A^{-1} = AA^{-1} = I \quad\text{and}\quad B^{-1}A^{-1}AB = B^{-1}B = I$$

by Theorem 3.2, $(AB)^{-1}$ exists and equals $B^{-1}A^{-1}$.

Since

$$(\alpha A)\alpha^{-1}A^{-1} = AA^{-1} = I \quad\text{and}\quad \alpha^{-1}A^{-1}(\alpha A) = A^{-1}A = I$$

it again follows from Theorem 3.2 that $(\alpha A)^{-1}$ exists and equals $\alpha^{-1}A^{-1}$.

Since

$$A^{-1}A = AA^{-1} = I$$

it again follows from Theorem 3.2 that $(A^{-1})^{-1}$ exists and equals $A$.

In many problems one guesses the inverse of $A$ and then verifies that it is in fact the inverse. The following theorems help.
Theorem 3.5

(1) Let

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix} \quad\text{where } A \text{ is invertible}$$

Then

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} + A^{-1}BQ^{-1}CA^{-1} & -A^{-1}BQ^{-1} \\ -Q^{-1}CA^{-1} & Q^{-1} \end{pmatrix}$$

where $Q = D - CA^{-1}B$.

(2) Let

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix} \quad\text{where } D \text{ is invertible}$$

Then

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} (A - BD^{-1}C)^{-1} & -(A - BD^{-1}C)^{-1}BD^{-1} \\ -D^{-1}C(A - BD^{-1}C)^{-1} & D^{-1} + D^{-1}C(A - BD^{-1}C)^{-1}BD^{-1} \end{pmatrix}$$

Proof: The product

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}\begin{pmatrix} A^{-1} + A^{-1}BQ^{-1}CA^{-1} & -A^{-1}BQ^{-1} \\ -Q^{-1}CA^{-1} & Q^{-1} \end{pmatrix}$$

is equal to

$$\begin{pmatrix} AA^{-1} + AA^{-1}BQ^{-1}CA^{-1} - BQ^{-1}CA^{-1} & -AA^{-1}BQ^{-1} + BQ^{-1} \\ CA^{-1} + CA^{-1}BQ^{-1}CA^{-1} - DQ^{-1}CA^{-1} & -CA^{-1}BQ^{-1} + DQ^{-1} \end{pmatrix}$$

which simplifies to

$$\begin{pmatrix} I + BQ^{-1}CA^{-1} - BQ^{-1}CA^{-1} & -BQ^{-1} + BQ^{-1} \\ CA^{-1} - (D - CA^{-1}B)Q^{-1}CA^{-1} & (D - CA^{-1}B)Q^{-1} \end{pmatrix}$$

which is seen to be the identity matrix.

The proof for (2) follows similarly.
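Editorial aside (not part of the original notes): the partitioned-inverse formula in (1) is easy to check numerically. A NumPy sketch, assuming the randomly generated blocks $A$ and $Q = D - CA^{-1}B$ are invertible (almost surely true as constructed):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 3)) + 3 * np.eye(3)   # keep the blocks well conditioned
    B = rng.normal(size=(3, 2))
    C = rng.normal(size=(2, 3))
    D = rng.normal(size=(2, 2)) + 3 * np.eye(2)

    M = np.block([[A, B], [C, D]])

    Ai = np.linalg.inv(A)
    Q  = D - C @ Ai @ B                 # the Schur complement Q = D - C A^{-1} B
    Qi = np.linalg.inv(Q)

    Minv = np.block([[Ai + Ai @ B @ Qi @ C @ Ai, -Ai @ B @ Qi],
                     [-Qi @ C @ Ai,              Qi]])

    print(np.allclose(M @ Minv, np.eye(5)))   # True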
Theorem 3.6 Let $A$ be $n$ by $n$, $U$ be $m$ by $n$, $S$ be $m$ by $m$ and $V$ be $m$ by $n$. Then if $A$, $A + U^TSV$ and $S + SVA^{-1}U^TS$ are each invertible we have

$$\left(A + U^TSV\right)^{-1} = A^{-1} - A^{-1}U^TS\left(S + SVA^{-1}U^TS\right)^{-1}SVA^{-1}$$

Proof:

$$\begin{aligned}
\left(A + U^TSV\right)&\left[A^{-1} - A^{-1}U^TS\left(S + SVA^{-1}U^TS\right)^{-1}SVA^{-1}\right] \\
&= I - U^TS\left(S + SVA^{-1}U^TS\right)^{-1}SVA^{-1} + U^TSVA^{-1} - U^TSVA^{-1}U^TS\left(S + SVA^{-1}U^TS\right)^{-1}SVA^{-1} \\
&= I + U^T\left[I - S\left(S + SVA^{-1}U^TS\right)^{-1} - SVA^{-1}U^TS\left(S + SVA^{-1}U^TS\right)^{-1}\right]SVA^{-1} \\
&= I + U^T\left[I - \left(S + SVA^{-1}U^TS\right)\left(S + SVA^{-1}U^TS\right)^{-1}\right]SVA^{-1} \\
&= I
\end{aligned}$$
Corollary 3.7 If $S^{-1}$ exists in the above theorem then

$$\left(A + U^TSV\right)^{-1} = A^{-1} - A^{-1}U^T\left(S^{-1} + VA^{-1}U^T\right)^{-1}VA^{-1}$$

Corollary 3.8 If $S = I$ and $u$ and $v$ are vectors then

$$\left(A + uv^T\right)^{-1} = A^{-1} - \frac{1}{1 + v^TA^{-1}u}A^{-1}uv^TA^{-1}$$

The last corollary is used as a starting point for the development of the theory of recursive estimation in least squares and Kalman filtering in information theory.
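Editorial aside (not part of the original notes): a NumPy sketch verifying Corollary 3.8 (the rank-one update formula) on random data, assuming $1 + v^TA^{-1}u \neq 0$:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(4, 4)) + 4 * np.eye(4)
    u = rng.normal(size=(4, 1))
    v = rng.normal(size=(4, 1))

    Ai = np.linalg.inv(A)
    denom = 1.0 + (v.T @ Ai @ u).item()        # the scalar 1 + v^T A^{-1} u
    update = Ai - (Ai @ u @ v.T @ Ai) / denom  # Corollary 3.8

    print(np.allclose(update, np.linalg.inv(A + u @ v.T)))  # True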
3.5 Direct Products and Vecs
If $A$ is a $p \times q$ matrix and $B$ is an $r \times s$ matrix then their direct product, denoted by $A \otimes B$, is the $pr \times qs$ matrix defined by

$$A \otimes B = (a_{ij}B) = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1q}B \\ a_{21}B & a_{22}B & \cdots & a_{2q}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{p1}B & a_{p2}B & \cdots & a_{pq}B \end{pmatrix}$$
Properties of direct products:

$$0 \otimes A = A \otimes 0 = 0$$
$$(A_1 + A_2) \otimes B = A_1 \otimes B + A_2 \otimes B$$
$$A \otimes (B_1 + B_2) = A \otimes B_1 + A \otimes B_2$$
$$\alpha A \otimes \beta B = \alpha\beta(A \otimes B)$$
$$A_1A_2 \otimes B_1B_2 = (A_1 \otimes B_1)(A_2 \otimes B_2)$$
$$(A \otimes B)^T = A^T \otimes B^T$$
$$\mathrm{rank}(A \otimes B) = \mathrm{rank}(A)\,\mathrm{rank}(B)$$
$$\mathrm{trace}(A \otimes B) = [\mathrm{trace}(A)][\mathrm{trace}(B)]$$
$$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$$
$$\det(A \otimes B) = [\det(A)]^p[\det(B)]^m \quad (A\ m \times m,\ B\ p \times p)$$

If $Y$ is an $n \times p$ matrix:

$\mathrm{vec}_R(Y)$ is the $np \times 1$ matrix with the rows of $Y$ stacked on top of each other.

$\mathrm{vec}_C(Y)$ is the $np \times 1$ matrix with the columns of $Y$ stacked on top of each other.

The $i,j$ element of $Y$ is the $[(i-1)p + j]$ element of $\mathrm{vec}_R(Y)$.

The $i,j$ element of $Y$ is the $[(j-1)n + i]$ element of $\mathrm{vec}_C(Y)$.

$$\mathrm{vec}_C(Y^T) = \mathrm{vec}_R(Y)$$
$$\mathrm{vec}_R(ABC) = (A \otimes C^T)\,\mathrm{vec}_R(B)$$

where $A$ is $n \times q_1$, $B$ is $q_1 \times q_2$ and $C$ is $q_2 \times p$.

The following are some relationships between vecs and direct products:

$$ab^T = b^T \otimes a = a \otimes b^T$$
$$\mathrm{vec}_C(ab^T) = b \otimes a$$
$$\mathrm{vec}_R(ab^T) = a \otimes b$$
$$\mathrm{vec}_C(ABC) = (C^T \otimes A)\,\mathrm{vec}_C(B)$$
$$\mathrm{trace}(A^TB) = [\mathrm{vec}_C(A)]^T[\mathrm{vec}_C(B)]$$
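Editorial aside (not part of the original notes): the identities $\mathrm{vec}_C(ABC) = (C^T \otimes A)\mathrm{vec}_C(B)$ and $\mathrm{trace}(A^TB) = [\mathrm{vec}_C(A)]^T[\mathrm{vec}_C(B)]$ can be checked with NumPy, where flatten(order="F") stacks columns:

    import numpy as np

    def vec_c(Y):
        # stack the columns of Y on top of each other
        return Y.flatten(order="F")

    rng = np.random.default_rng(2)
    A = rng.normal(size=(3, 4))
    B = rng.normal(size=(4, 2))
    C = rng.normal(size=(2, 5))

    lhs = vec_c(A @ B @ C)
    rhs = np.kron(C.T, A) @ vec_c(B)
    print(np.allclose(lhs, rhs))              # True

    B2 = rng.normal(size=(3, 4))
    print(np.isclose(np.trace(A.T @ B2), vec_c(A) @ vec_c(B2)))  # True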
Chapter 4
Matrix Factorization
4.1 Rank of a Matrix
In matrix algebra it is often useful to have the matrices expressed in as simple a form as possible. In particular, if a matrix is diagonal the operations of addition, multiplication and inversion are easy to perform.

Most of these methods are based on considering the rows and columns of a matrix as vectors in a vector space of the appropriate dimension.

Definition 4.1: If $A$ is an $n$ by $p$ matrix then

(1) The row rank of $A$ is the number of linearly independent rows of the matrix considered as vectors in $p$ dimensional space.

(2) The column rank of $A$ is the number of linearly independent columns of the matrix considered as vectors in $n$ dimensional space.

Theorem 4.1 Let $A$ be an $n$ by $p$ matrix. The set of all vectors $y$ such that $y = Ax$ is a vector space with dimension equal to the column rank of $A$; it is called the range space of $A$ and is denoted by $R(A)$.
Proof: $R(A)$ is non-empty since $0 \in R(A)$. If $y_1$ and $y_2$ are in $R(A)$, say $y_1 = Ax_1$ and $y_2 = Ax_2$, then

$$\alpha_1 y_1 + \alpha_2 y_2 = \alpha_1 Ax_1 + \alpha_2 Ax_2 = A(\alpha_1 x_1 + \alpha_2 x_2)$$

It follows that $\alpha_1 y_1 + \alpha_2 y_2 \in R(A)$ and hence $R(A)$ is a vector space. If $\{a_1, a_2, \ldots, a_p\}$ are the columns of $A$ then for every $y \in R(A)$ we can write

$$y = Ax = x_1a_1 + x_2a_2 + \cdots + x_pa_p$$

Thus the columns of $A$ span $R(A)$. It follows that the dimension of $R(A)$ is equal to the column rank of $A$.
Theorem 4.2 Let $A$ be an $n$ by $p$ matrix. The set of all vectors $x$ such that $Ax = 0$ is a vector space of dimension equal to $p$ minus the column rank of $A$. This vector space is called the null space of $A$ and is denoted by $N(A)$.

Proof: Since $0 \in N(A)$ it follows that $N(A)$ is non-empty. If $x_1$ and $x_2$ are in $N(A)$ then

$$A(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 Ax_1 + \alpha_2 Ax_2 = 0$$

so that $N(A)$ is a vector space.

Let $\{x_1, x_2, \ldots, x_k\}$ be a basis for $N(A)$. Then there exist vectors $\{x_{k+1}, x_{k+2}, \ldots, x_p\}$ such that $\{x_1, x_2, \ldots, x_p\}$ is a basis for $p$ dimensional space. It follows that $\{Ax_1, Ax_2, \ldots, Ax_p\}$ spans the range space of $A$ since

$$y = Ax = A\left(\sum_i \alpha_i x_i\right) = \sum_i \alpha_i Ax_i$$

Since $Ax_i = 0$ for $i = 1, 2, \ldots, k$ it follows that $\{Ax_{k+1}, Ax_{k+2}, \ldots, Ax_p\}$ spans the range space of $A$. Suppose now that $\alpha_{k+1}, \alpha_{k+2}, \ldots, \alpha_p$ are such that

$$\alpha_{k+1}Ax_{k+1} + \alpha_{k+2}Ax_{k+2} + \cdots + \alpha_pAx_p = 0$$

Then

$$A(\alpha_{k+1}x_{k+1} + \alpha_{k+2}x_{k+2} + \cdots + \alpha_px_p) = 0$$

It follows that

$$\alpha_{k+1}x_{k+1} + \alpha_{k+2}x_{k+2} + \cdots + \alpha_px_p \in N(A)$$

Since $\{x_1, x_2, \ldots, x_k\}$ is a basis for $N(A)$ it follows that for some $\beta_1, \beta_2, \ldots, \beta_k$ we have

$$\beta_1x_1 + \beta_2x_2 + \cdots + \beta_kx_k = \alpha_{k+1}x_{k+1} + \alpha_{k+2}x_{k+2} + \cdots + \alpha_px_p$$

or

$$\beta_1x_1 + \beta_2x_2 + \cdots + \beta_kx_k - \alpha_{k+1}x_{k+1} - \alpha_{k+2}x_{k+2} - \cdots - \alpha_px_p = 0$$

Since $\{x_1, x_2, \ldots, x_p\}$ is a basis for $p$ dimensional space we have that $\beta_1 = \cdots = \beta_k = \alpha_{k+1} = \cdots = \alpha_p = 0$. It follows that the set

$$\{Ax_{k+1}, Ax_{k+2}, \ldots, Ax_p\}$$

is linearly independent and spans the range space of $A$. The dimension of $R(A)$ is thus $p - k$. But by Theorem 4.1 we also have that the dimension of $R(A)$ is equal to the column rank of $A$. Thus

$$p - k = \text{column rank of } A$$

so that

$$k = p - \text{column rank of } A = \dim(N(A))$$
Definition 4.2 The elementary row (column) operations on a matrix are defined to be

(1) The interchange of two rows (columns).

(2) The multiplication of a row (column) by a non-zero scalar.

(3) The addition of a scalar multiple of one row (column) to another row (column).

Lemma 4.3 Elementary row (column) operations do not change the row (column) rank of a matrix.
Proof: The row rank of a matrix $A$ is the number of linearly independent vectors among the rows of

$$A = \begin{pmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_n^T \end{pmatrix}$$

Clearly interchange of two rows does not change the rank, nor does the multiplication of a row by a non-zero scalar. The addition of $a\,a_j^T$ to $a_i^T$ produces the new matrix

$$\begin{pmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_i^T + a\,a_j^T \\ \vdots \\ a_n^T \end{pmatrix}$$

which has the same row rank as $A$. The statements on the column rank of $A$ are obtained by considering the row vectors of $A^T$ and using the above results.
Lemma 4.4 Elementary row (column) operations on a matrix can be achieved by pre (post) multiplication by non-singular matrices.

Proof: Interchange of the $i$th and $j$th rows can be achieved by pre-multiplication by the matrix

$$E_1 = \begin{pmatrix} e_1^T \\ e_2^T \\ \vdots \\ e_{i-1}^T \\ e_j^T \\ e_{i+1}^T \\ \vdots \\ e_{j-1}^T \\ e_i^T \\ e_{j+1}^T \\ \vdots \\ e_n^T \end{pmatrix}$$

Multiplication by a non-zero scalar and addition of a scalar multiple of one row to another row can be achieved by pre-multiplication by the matrices $E_2$ and $E_3$ where

$$E_2 = \begin{pmatrix} e_1^T \\ e_2^T \\ \vdots \\ a\,e_i^T \\ \vdots \\ e_n^T \end{pmatrix} \quad\text{and}\quad E_3 = \begin{pmatrix} e_1^T \\ e_2^T \\ \vdots \\ a\,e_j^T + e_i^T \\ \vdots \\ e_n^T \end{pmatrix}$$

The statements concerning column operations are obtained by using the above results on $A^T$ and then transposing the final results.
Lemma 4.5 The matrices which produce elementary row (column) operations are non-singular.

Proof: The inverse of $E_1$ is $E_1^T$. The inverse of $E_2$ is the matrix

$$\begin{pmatrix} e_1^T \\ e_2^T \\ \vdots \\ \frac{1}{a}e_i^T \\ \vdots \\ e_n^T \end{pmatrix}$$

Finally the inverse of $E_3$ is the matrix with columns

$$[e_1, e_2, \ldots, e_i, \ldots, e_j - a\,e_i, \ldots, e_n]$$

Again the results for column operations follow from the row results.
Lemma 4.6 The column rank of a matrix is invariant under pre-multiplication by a non-singular matrix. Similarly the row rank of a matrix is invariant under post-multiplication by a non-singular matrix.

Proof: By Theorem 4.2 the dimension of $N(A)$ is equal to $p$ minus the column rank of $A$. Since

$$N(A) = \{x : Ax = 0\}$$

it follows that $x \in N(A)$ if and only if $EAx = 0$ where $E$ is non-singular. Thus the dimension of $N(A)$ is invariant under pre-multiplication by a non-singular matrix and the conclusion follows. The result for post-multiplication follows by considering $A^T$ and transposing.

Corollary 4.7 The column rank of a matrix is invariant under pre-multiplication by an elementary matrix. Similarly the row rank of a matrix is invariant under post-multiplication by an elementary matrix.

By suitable choices of elementary matrices we can thus write

$$PA = E$$

where $P$ is the product of non-singular elementary matrices and hence is non-singular. The matrix $E$ is a row echelon matrix, i.e. a matrix with $r$ non-zero rows in which the first $k$ columns consist of $e_1, e_2, \ldots, e_r$ and $0$'s in some order. The remaining $p - k$ columns have zeros below the first $r$ rows and arbitrary elements in the remaining positions. Since every column of $E$ lies in the span of $e_1, \ldots, e_r$, it thus follows that

$$r = \text{row rank of } A \geq \text{column rank of } A$$

and hence

$$\text{row rank of } A^T \geq \text{column rank of } A^T$$

But we also have that

$$\text{row rank of } A^T = \text{column rank of } A \quad\text{and}\quad \text{row rank of } A = \text{column rank of } A^T$$

It follows that

$$\text{column rank of } A \geq \text{row rank of } A$$

and hence we have:

Theorem 4.8 row rank of $A$ = column rank of $A$
Alternative Proof of Theorem 4.8: Let $A$ be an $n \times p$ matrix, i.e.

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{np} \end{pmatrix}$$

We may write $A$ in terms of its columns or rows, i.e.

$$A = \left(A^c_1, A^c_2, \ldots, A^c_p\right) = \begin{pmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_n^T \end{pmatrix}$$
where the $j$th column vector, $A^c_j$, and $i$th row vector, $a_i^T$, are given by

$$A^c_j = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{nj} \end{pmatrix} \qquad a_i = \begin{pmatrix} a_{i1} \\ a_{i2} \\ \vdots \\ a_{ip} \end{pmatrix}$$

The column vectors of $A$ span a space which is a subspace of $\mathbb{R}^n$ and is called the column space of $A$, denoted by $\mathcal{C}(A)$, i.e.

$$\mathcal{C}(A) = \mathrm{sp}(A^c_1, A^c_2, \ldots, A^c_p)$$

The row vectors of $A$ span a space which is a subspace of $\mathbb{R}^p$ and is called the row space of $A$, denoted by $\mathcal{R}(A)$, i.e.

$$\mathcal{R}(A) = \mathrm{sp}(a_1, a_2, \ldots, a_n)$$

The dimension of $\mathcal{C}(A)$ is called the column rank of $A$ and is denoted by $C = c(A)$. Similarly the dimension of $\mathcal{R}(A)$ is called the row rank of $A$ and is denoted by $R = r(A)$.

Result 1: Let

$$B = \left(b^c_1\ b^c_2\ \ldots\ b^c_C\right)$$

where the columns of $B$ form a basis for the column space of $A$. Then there is a $C \times p$ matrix $L$ such that

$$A = BL$$

It follows that the rows of $A$ are linear combinations of the rows of $L$ and hence

$$\mathcal{R}(A) \subseteq \mathcal{R}(L)$$

Hence

$$R = \dim[\mathcal{R}(A)] \leq \dim[\mathcal{R}(L)] \leq C$$

i.e. the row rank of $A$ is less than or equal to the column rank of $A$.
Result 2: Let

$$R^\ast = \begin{pmatrix} r_1^T \\ r_2^T \\ \vdots \\ r_R^T \end{pmatrix}$$

where the rows of $R^\ast$ form a basis for the row space of $A$. Then there is an $n \times R$ matrix $K$ such that

$$A = KR^\ast$$

It follows that the columns of $A$ are linear combinations of the columns of $K$ and hence

$$\mathcal{C}(A) \subseteq \mathcal{C}(K)$$

Hence

$$C = \dim[\mathcal{C}(A)] \leq \dim[\mathcal{C}(K)] \leq R$$

i.e. the column rank of $A$ is less than or equal to the row rank of $A$.

Hence we have that the row rank and column rank of a matrix are equal. The common value is called the rank of $A$.

Reference: Harville, David A. (1997). Matrix Algebra From a Statistician's Perspective. Springer (pages 36-40).
We thus define the rank of a matrix $A$, $\rho(A)$, to be the number of linearly independent rows or the number of linearly independent columns in the matrix $A$. Note that $\rho(A)$ is unaffected by pre or post multiplication by non-singular matrices. By pre and post multiplying by suitable elementary matrices we can write any matrix as

$$PAQ = B = \begin{pmatrix} I_r & O \\ O & O \end{pmatrix}$$

where $O$ denotes a matrix of zeroes of the appropriate order. Since $P$ and $Q$ are non-singular we have

$$A = P^{-1}BQ^{-1}$$

which is an example of a matrix factorization.
Another factorization of considerable importance is contained in the following Lemma.
Lemma 4.9 If $A$ is an $n$ by $p$ matrix of rank $r$ then there are matrices $B$ ($n$ by $r$) and $C$ ($r$ by $p$) such that

$$A = BC$$

where $B$ and $C$ are both of rank $r$.

Proof: Let the rows of $C$ form a basis for the row space of $A$. Then $C$ is $r$ by $p$ of rank $r$. Since the rows of $C$ form a basis for the row space of $A$ we can write the $i$th row of $A$ as $a_i^T = b_i^TC$ for some choice of $b_i$. Thus $A = BC$.

The fact that the rank of $B$ in Lemma 4.9 is $r$ follows from the following important result.

Theorem 4.10 $\mathrm{rank}(AB) \leq \min\{\mathrm{rank}(A), \mathrm{rank}(B)\}$

Proof: Each row of $AB$ is a linear combination of the rows of $B$. Thus

$$\mathrm{rank}(AB) \leq \mathrm{rank}(B)$$

Similarly each column of $AB$ is a linear combination of the columns of $A$, so that

$$\mathrm{rank}(AB) \leq \mathrm{rank}(A)$$

The conclusion follows.
Theorem 4.11 An $n$ by $n$ matrix is non-singular if and only if it is of rank $n$.

Proof: For any matrix we can write

$$PAQ = \begin{pmatrix} I_r & O \\ O & O \end{pmatrix}$$

where $P$ and $Q$ are non-singular. Since the right hand side of this equation is invertible if and only if $r = n$, the conclusion follows.

Definition 4.3 If $A$ is an $n$ by $p$ matrix then the $p$ by $p$ matrix $A^*A$ ($A^TA$ if $A$ is real) is called the Gram matrix of $A$.

Theorem 4.12 If $A$ is an $n$ by $p$ matrix then

(1) $\mathrm{rank}(A^*A) = \mathrm{rank}(A) = \mathrm{rank}(AA^*)$

(2) $\mathrm{rank}(A^TA) = \mathrm{rank}(A) = \mathrm{rank}(AA^T)$ if $A$ is real.

Proof: Assume that the field of scalars is such that $\sum_i \bar{z}_iz_i = 0$ implies that $z_1 = z_2 = \cdots = z_n = 0$. If $Ax = 0$ then $A^*Ax = 0$. Also if $A^*Ax = 0$ then $x^*A^*Ax = 0$, i.e. $z^*z = 0$ where $z = Ax$, so that $z = 0$, i.e. $Ax = 0$. Hence $A^*Ax = 0$ and $Ax = 0$ are equivalent statements. It follows that

$$N(A^*A) = \{x : A^*Ax = 0\} = \{x : Ax = 0\} = N(A)$$

Thus $\mathrm{rank}(A^*A) = \mathrm{rank}(A)$. The conclusion that $\mathrm{rank}(AA^*) = \mathrm{rank}(A)$ follows by symmetry.
Theorem 4.13 $\mathrm{rank}(A + B) \leq \mathrm{rank}(A) + \mathrm{rank}(B)$

Proof: If $a_1, a_2, \ldots, a_r$ and $b_1, b_2, \ldots, b_s$ denote linearly independent columns of $A$ and $B$ respectively, then every column of $A + B$ lies in the span of

$$\{a_1, a_2, \ldots, a_r, b_1, b_2, \ldots, b_s\}$$

Hence

$$\mathrm{rank}(A + B) \leq r + s = \mathrm{rank}(A) + \mathrm{rank}(B)$$
Definition 4.4

(1) A square matrix $A$ is idempotent if $A^2 = A$.

(2) A square matrix $A$ is nilpotent if $A^m = O$ for some integer $m$ greater than one.

Definition 4.5 The trace of a square matrix is the sum of its diagonal elements, i.e.

$$\mathrm{tr}(A) = \sum_i a_{ii}$$

Theorem 4.14 $\mathrm{tr}(AB) = \mathrm{tr}(BA)$; $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$; $\mathrm{tr}(A^T) = \mathrm{tr}(A)$

Theorem 4.15 If $A$ is idempotent then

$$\mathrm{rank}(A) = \mathrm{tr}(A)$$
Proof: We can write $A = BC$ where $B$ is $n$ by $r$ of rank $r$, $C$ is $r$ by $n$ of rank $r$ and $r$ is the rank of $A$. Thus we have

$$A^2 = BCBC = A = BC$$

Since $B$ has full column rank and $C$ has full row rank, $B^TB$ and $CC^T$ are invertible (Theorem 4.12); multiplying $BCBC = BC$ on the left by $(B^TB)^{-1}B^T$ and on the right by $C^T(CC^T)^{-1}$ gives $CB = I$. Hence

$$\mathrm{tr}(BC) = \mathrm{tr}(CB) = \mathrm{tr}(I_r) = r = \mathrm{rank}(A)$$
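Editorial aside (not part of the original notes): Theorem 4.15 can be illustrated with a projection matrix, which is idempotent. A NumPy sketch with random data:

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(6, 3))                  # full column rank with probability 1
    P = X @ np.linalg.inv(X.T @ X) @ X.T         # projection onto the column space of X

    print(np.allclose(P @ P, P))                 # idempotent
    print(np.linalg.matrix_rank(P), np.trace(P)) # both 3 (trace up to rounding)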
Chapter 5
Linear Equations
5.1 Consistency Results
Consider a set of $n$ linear equations in $p$ unknowns

$$\begin{array}{ccccccccc} a_{11}x_1 &+& a_{12}x_2 &+& \cdots &+& a_{1p}x_p &=& y_1 \\ a_{21}x_1 &+& a_{22}x_2 &+& \cdots &+& a_{2p}x_p &=& y_2 \\ \vdots & & \vdots & & & & \vdots & & \vdots \\ a_{n1}x_1 &+& a_{n2}x_2 &+& \cdots &+& a_{np}x_p &=& y_n \end{array}$$

which may be compactly written in matrix notation as

$$Ax = y$$

where $A$ is the $n$ by $p$ matrix with $i,j$ element equal to $a_{ij}$, $x$ is the $p$ by one column vector with $j$th element equal to $x_j$ and $y$ is the $n$ by one column vector with $i$th element equal to $y_i$.
Often such equations arise without knowledge of whether they are really equations, i.e. does there exist a vector $x$ which satisfies the equations? If such an $x$ exists the equations are said to be consistent; otherwise they are said to be inconsistent.

The equations $Ax = y$ can be discussed from a vector space perspective since $A$ is a carrier of a linear transformation from the $p$ dimensional space spanned by the $x_i$'s to the $n$ dimensional space where $y$ lives. It follows that a solution exists if and only if $y$ is in $R(A)$, the range space of $A$. Thus a solution exists if and only if $y$ is in the column space of $A$, defined as the set of all vectors of the form $\{y : y = Ax \text{ for some } x\}$. If we consider the augmented matrix $[A, y]$ we have the following theorem on consistency.

Theorem 5.1 The equations $Ax = y$ are consistent if and only if

$$\mathrm{rank}([A, y]) = \mathrm{rank}(A)$$

Proof: Obviously

$$\mathrm{rank}([A, y]) \geq \mathrm{rank}(A)$$

If the equations have a solution then there exists $x$ such that $Ax = y$, so that

$$\mathrm{rank}(A) \leq \mathrm{rank}([A, y]) = \mathrm{rank}([A, Ax]) = \mathrm{rank}(A[I, x]) \leq \mathrm{rank}(A)$$

It follows that

$$\mathrm{rank}([A, y]) = \mathrm{rank}(A)$$

if the equations are consistent.

Conversely, if

$$\mathrm{rank}([A, y]) = \mathrm{rank}(A)$$

then $y$ is a linear combination of the columns of $A$, so that a solution exists.
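Editorial aside (not part of the original notes): Theorem 5.1 gives a direct numerical test for consistency. A NumPy sketch with an illustrative rank 2 matrix:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0],
                  [0.0, 1.0]])         # rank 2
    y_good = A @ np.array([1.0, 1.0])   # in the column space of A
    y_bad  = np.array([1.0, 0.0, 0.0])  # not in the column space of A

    def consistent(A, y):
        # Theorem 5.1: solvable  <=>  rank([A, y]) == rank(A)
        return np.linalg.matrix_rank(np.column_stack([A, y])) == np.linalg.matrix_rank(A)

    print(consistent(A, y_good))   # True
    print(consistent(A, y_bad))    # False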
Theorem 5.1 gives conditions under which solutions to the equations $Ax = y$ exist, but we need to know the form of the solution in order to explicitly solve the equations. The equations $Ax = 0$ are called the homogeneous equations. If $x_1$ and $x_2$ are solutions to the homogeneous equations then $\alpha_1x_1 + \alpha_2x_2$ is also a solution to the homogeneous equations. In general, if $\{x_1, x_2, \ldots, x_s\}$ span the null space of $A$ then $x_0 = \sum_{i=1}^s \alpha_ix_i$ is a solution to the homogeneous equations. If $x_p$ is a particular solution to the equations $Ax = y$ then $x_p + x_0$ is also a solution to the equations $Ax = y$. If $x$ is any solution then $x = x_p + (x - x_p)$, so that every solution is of this form. We thus have the following theorem.

Theorem 5.2 A general solution to the consistent equations $Ax = y$ is of the form

$$x = x_p + x_0$$
where $x_p$ is any particular solution to the equations and $x_0$ is any solution to the homogeneous equations $Ax = 0$.

A final existence question is to determine when the equations $Ax = y$ have a unique solution. The solution is unique if and only if the only solution to the homogeneous equations is the $0$ vector, i.e. the null space of $A$ contains only the $0$ vector. If the solution $x$ is unique the homogeneous equations cannot have a non-zero solution $x_0$, since then $x + x_0$ would be another solution. Conversely, if the homogeneous equations have only the $0$ vector as a solution then the solution to $Ax = y$ is unique (if there were two different solutions $x_1$ and $x_2$ then $x_1 - x_2$ would be a non-zero solution to the homogeneous equations).

Since the null space of $A$ contains only the $0$ vector if and only if the columns of $A$ are linearly independent, we have that the solution is unique if and only if the rank of $A$ is equal to $p$, where we assume with no loss of generality that $A$ is $n$ by $p$ with $p \leq n$. We thus have the following theorem.

Theorem 5.3 Let $A$ be $n$ by $p$ with $p \leq n$. Then

(1) The equations $Ax = y$ have the general solution

$$x = x_p + x_0$$

where $Ax_p = y$ and $Ax_0 = 0$, if and only if

$$\mathrm{rank}([A, y]) = \mathrm{rank}(A)$$

(2) The solution is unique ($x_0 = 0$) if and only if

$$\mathrm{rank}(A) = p$$
5.2 Solutions to Linear Equations
5.2.1 A is p by p of rank p

Here there is a unique solution by Theorem 5.3 with explicit form given by $A^{-1}y$. The proof of this result is trivial given the following Lemma.
Lemma 5.4 A $p$ by $p$ matrix is non-singular if and only if $\mathrm{rank}(A) = p$.

Proof: If $A^{-1}$ exists it is the unique matrix satisfying

$$A^{-1}A = AA^{-1} = I$$

Thus $p = \mathrm{rank}(I) = \mathrm{rank}(AA^{-1}) \leq \mathrm{rank}(A) \leq p$, i.e. the rank of $A$ is $p$ if $A$ is non-singular.

Conversely, if $\mathrm{rank}(A) = p$ then there exists a unique solution $x_i$ to the equation $Ax_i = e_i$ for $i = 1, 2, \ldots, p$. If we define $B$ by

$$B = [x_1, x_2, \ldots, x_p]$$

then $AB = I$. Similarly there exists a unique solution $z_i$ to the equation $A^Tz_i = e_i$ for $i = 1, 2, \ldots, p$, so that if we define

$$C^T = [z_1, z_2, \ldots, z_p]$$

then $A^TC^T = I$ or $CA = I$. It follows from Theorem 3.2 that $C = B$, i.e. that $A^{-1}$ exists.
5.2.2 A is n by p of rank p

If $A$ is $n$ by $p$ of rank $p$ where $p \leq n$ then by Theorem 5.3 a unique solution exists (when the equations are consistent) and we need only find its form. Since the rank of $A$ is $p$, the rank of $A^TA$ is $p$ and hence $A^TA$ has an inverse. Thus if we define $x_1 = (A^TA)^{-1}A^Ty$ we have that

$$Ax_1 = A(A^TA)^{-1}A^Ty = A(A^TA)^{-1}A^T(Ax) = Ax = y$$

so that $x_1$ is the unique solution to the equations.
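Editorial aside (not part of the original notes): a NumPy sketch of the solution formula $x_1 = (A^TA)^{-1}A^Ty$ for a consistent full column rank system, with illustrative data:

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.normal(size=(5, 3))             # n by p with rank p (almost surely)
    x_true = np.array([1.0, -2.0, 0.5])
    y = A @ x_true                          # consistent by construction

    x1 = np.linalg.solve(A.T @ A, A.T @ y)  # x1 = (A^T A)^{-1} A^T y
    print(np.allclose(x1, x_true))          # True: the unique solution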
5.2.3 A is n by p of rank r where r ≤ p ≤ n

In this case a solution may not exist, and even if a solution exists it may not be unique.

Suppose that a matrix $A^-$ can be found such that

$$AA^-A = A$$

Such a matrix $A^-$ is called a generalized inverse of $A$.

If the equations $Ax = y$ have a solution then $x_1 = A^-y$ is a solution since

$$Ax_1 = A(A^-y) = AA^-(Ax) = AA^-Ax = Ax = y$$

The general solution is thus obtained by characterizing the solutions to the homogeneous equations, i.e. we need to characterize the vectors in the null space of $A$. If $z$ is an arbitrary vector then $(I - A^-A)z$ is in the null space of $A$. The dimension of the set $U$ of vectors which can be written in this form is equal to the rank of $I - A^-A$. Since

$$(I - A^-A)^2 = I - A^-A - A^-A + A^-AA^-A = I - A^-A$$

the matrix $I - A^-A$ is idempotent, and by Theorem 4.15 we see that

$$\begin{aligned} \mathrm{rank}(I - A^-A) &= \mathrm{tr}(I - A^-A) \\ &= p - \mathrm{tr}(A^-A) \\ &= p - \mathrm{rank}(A^-A) \\ &= p - \mathrm{rank}(A) \\ &= p - r \end{aligned}$$

It follows that $U$ is of dimension $p - r$ and hence $U = N(A)$. Thus we have the following theorem.

Theorem 5.5 If $A$ is $n$ by $p$ of rank $r \leq p \leq n$ then

(1) If a solution exists, the general solution is given by

$$x = A^-y + (I - A^-A)z$$

where $z$ is an arbitrary $p$ by one vector and $A^-$ is a generalized inverse of $A$.

(2) If $x_0 = By$ is a solution to $Ax = y$ for every $y$ for which the equations are consistent, then $B$ is a generalized inverse of $A$.

Proof: Since we have established (1) we prove (2). Let $y = A\alpha$ where $\alpha$ is arbitrary. The equations $Ax = A\alpha$ have a solution by assumption, which is of the form $BA\alpha$. It follows that $ABA\alpha = A\alpha$. Since $\alpha$ is arbitrary we have that $ABA = A$, i.e. $B$ is a generalized inverse of $A$.
5.2.4 Generalized Inverses

We now establish that a generalized inverse of $A$ always exists. Recall that we can find non-singular matrices $P$ and $Q$ such that

$$PAQ = B \quad\text{i.e.}\quad A = P^{-1}BQ^{-1}$$

where $B$ is given by

$$B = \begin{pmatrix} I & O \\ O & O \end{pmatrix}$$

Matrix multiplication shows that $BB^-B = B$ where

$$B^- = \begin{pmatrix} I & U \\ V & W \end{pmatrix}$$

and $U$, $V$ and $W$ are arbitrary. Since

$$A(QB^-P)A = P^{-1}BQ^{-1}QB^-PP^{-1}BQ^{-1} = P^{-1}BB^-BQ^{-1} = P^{-1}BQ^{-1} = A$$

we see that $QB^-P$ is a generalized inverse of $A$, establishing the existence of a generalized inverse for any matrix.

We also note that if

$$PA = \begin{pmatrix} I & C \\ O & O \end{pmatrix}$$

then another generalized inverse of $A$ is $B^-P$ where

$$B^- = \begin{pmatrix} I & O \\ O & O \end{pmatrix}$$

since, writing $E = PA$ so that $A = P^{-1}E$, we have $EB^-E = E$ and hence $A(B^-P)A = P^{-1}EB^-E = P^{-1}E = A$.
Chapter 6
Determinants
6.1 Definition and Properties
Definition 6.1 Let $A$ be a square matrix with columns $\{x_1, x_2, \ldots, x_p\}$. The determinant of $A$ is that function $\det$ of the $p$ columns such that

(1) $\det(x_1, x_2, \ldots, cx_m, \ldots, x_p) = c\,\det(x_1, x_2, \ldots, x_m, \ldots, x_p)$

(2) $\det(x_1, x_2, \ldots, x_m + x_k, \ldots, x_p) = \det(x_1, x_2, \ldots, x_m, \ldots, x_p)$ for $k \neq m$

(3) $\det(e_1, e_2, \ldots, e_p) = 1$ where $e_i$ is the $i$th unit vector.
Properties of Determinants

(1) If $x_i = 0$ then $x_i = 0 \cdot x_i$, so that by (1) of the definition $\det(x_1, x_2, \ldots, x_p) = 0$ if $x_i = 0$.

(2) $\det(x_1, x_2, \ldots, x_m + cx_k, \ldots, x_p) = \det(x_1, x_2, \ldots, x_m, \ldots, x_p)$. If $c = 0$ this result is trivial. If $c \neq 0$ we note that

$$\begin{aligned} \det(x_1, \ldots, x_m + cx_k, \ldots, x_k, \ldots, x_p) &= \frac{1}{c}\det(x_1, \ldots, x_m + cx_k, \ldots, cx_k, \ldots, x_p) \\ &= \frac{1}{c}\det(x_1, \ldots, x_m, \ldots, cx_k, \ldots, x_p) \\ &= \det(x_1, \ldots, x_m, \ldots, x_k, \ldots, x_p) \end{aligned}$$

(3) If two columns of $A$ are identical then $\det A = 0$. Similarly if the columns of $A$ are linearly dependent then the determinant of $A$ is 0.

(4) If $x_k$ and $x_m$ are interchanged then the determinant changes sign, i.e.

$$\det(x_1, \ldots, x_m, \ldots, x_k, \ldots, x_p) = -\det(x_1, \ldots, x_k, \ldots, x_m, \ldots, x_p)$$

To prove this we note that

$$\begin{aligned} \det(\ldots, x_m, \ldots, x_k, \ldots) &= \det(\ldots, x_m + x_k, \ldots, x_k, \ldots) \\ &= \det(\ldots, x_m + x_k, \ldots, x_k - (x_m + x_k), \ldots) \\ &= \det(\ldots, x_m + x_k, \ldots, -x_m, \ldots) \\ &= -\det(\ldots, x_m + x_k, \ldots, x_m, \ldots) \\ &= -\det(\ldots, x_k, \ldots, x_m, \ldots) \end{aligned}$$

Thus the interchange of an even number of columns does not change the sign of the determinant, while the interchange of an odd number of columns does change the sign.

(5) $\det(x_1, x_2, \ldots, x_m + y, \ldots, x_p)$ is equal to

$$\det(x_1, x_2, \ldots, x_m, \ldots, x_p) + \det(x_1, x_2, \ldots, y, \ldots, x_p)$$

If the vectors $x_i$ for $i \neq m$ are dependent the assertion is obvious since both terms are 0. If the $x_i$ are linearly independent for $i \neq m$ and $x_m = \sum_{i \neq m} c_ix_i$, the assertion follows by repeated use of (2) and the fact that the first term of the expression is always zero.

If the $x_i$ are linearly independent then $y = \sum_i d_ix_i$, so that

$$\begin{aligned} \det(x_1, \ldots, x_m + y, \ldots, x_p) &= \det\left(x_1, \ldots, x_m + \sum_i d_ix_i, \ldots, x_p\right) \\ &= \det(x_1, \ldots, (1 + d_m)x_m, \ldots, x_p) \\ &= (1 + d_m)\det(x_1, \ldots, x_m, \ldots, x_p) \\ &= \det(x_1, \ldots, x_p) + \det(x_1, \ldots, d_mx_m, \ldots, x_p) \\ &= \det(x_1, \ldots, x_p) + \det\left(x_1, \ldots, \sum_i d_ix_i, \ldots, x_p\right) \\ &= \det(x_1, \ldots, x_p) + \det(x_1, \ldots, y, \ldots, x_p) \end{aligned}$$
Using the above results we can show that the determinant of a matrix is a well defined function of a matrix and give a method for computing the determinant. Since any column vector of $A$, say $x_m$, can be written as $x_m = \sum_{i=1}^p a_{im}e_i$, we have that

$$\begin{aligned} \det(x_1, x_2, \ldots, x_p) &= \det\left(\sum_{i_1} a_{i_11}e_{i_1}, x_2, \ldots, x_p\right) \\ &= \sum_{i_1} a_{i_11}\det(e_{i_1}, x_2, \ldots, x_p) \\ &= \sum_{i_1} a_{i_11}\sum_{i_2} a_{i_22}\det(e_{i_1}, e_{i_2}, \ldots, x_p) \\ &= \sum_{i_1}\sum_{i_2}\cdots\sum_{i_p} a_{i_11}a_{i_22}\cdots a_{i_pp}\det(e_{i_1}, e_{i_2}, \ldots, e_{i_p}) \end{aligned}$$

Now note that $\det(e_{i_1}, e_{i_2}, \ldots, e_{i_p})$ is 0 if any of the subscripts are the same, $+1$ if $i_1, i_2, \ldots, i_p$ is an even permutation of $1, 2, \ldots, p$ and $-1$ if $i_1, i_2, \ldots, i_p$ is an odd permutation of $1, 2, \ldots, p$. Hence we can evaluate the determinant of a matrix and it is a uniquely defined function satisfying the conditions (1), (2) and (3).
6.2 Computation

The usual method for computing a determinant is to note that if $x_m = \sum_i a_{im}e_i$ then, using linearity in the $m$th column,

$$\det(x_1, x_2, \ldots, x_p) = \sum_i a_{im}\det(x_1, x_2, \ldots, e_i, \ldots, x_p) = \sum_i a_{im}A_{im}$$
where

$$A_{im} = \det(x_1, x_2, \ldots, e_i, \ldots, x_p)$$

(with $e_i$ appearing in the $m$th position) is called the cofactor of $a_{im}$; it is simply the determinant of $A$ when the $m$th column of $A$ is replaced by $e_i$.

We now note that, expanding the remaining columns as before,

$$A_{im} = \sum a_{i_11}\cdots a_{i_pp}\det(e_{i_1}, \ldots, e_i, \ldots, e_{i_p})$$

where the index for column $m$ is fixed at $i$ and the factor for column $m$ is omitted from the product. Hence

$$A_{im} = (-1)^{i+m}\det(M_{im})$$

where $M_{im}$ is the matrix formed by deleting the $i$th row of $A$ and the $m$th column of $A$. Thus the value of the determinant of a $p$ by $p$ matrix can be reduced to the calculation of $p$ determinants of $p-1$ by $p-1$ matrices. The determinant of $M_{im}$ is called the minor of $a_{im}$.
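Editorial aside (not part of the original notes): a short recursive implementation of the cofactor expansion (here along the first column), checked against NumPy's determinant:

    import numpy as np

    def det_by_cofactors(A):
        # expansion along the first column: det A = sum_i a_i1 (-1)^(i+1) det(M_i1)
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for i in range(n):
            minor = np.delete(np.delete(A, i, axis=0), 0, axis=1)
            total += A[i, 0] * (-1) ** i * det_by_cofactors(minor)
        return total

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 4.0],
                  [0.0, 1.0, 1.0]])
    print(det_by_cofactors(A), np.linalg.det(A))  # both -3.0 (up to rounding)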
Chapter 7
Inner Product Spaces
7.1 Definitions and Elementary Properties

Definition 7.1 An inner product on a linear vector space $V$ is a function $(\cdot\,,\cdot)$ mapping $V \times V$ into $\mathbb{C}$ such that

1. $(x, x) \geq 0$ with $(x, x) = 0$ if and only if $x = 0$.

2. $(x, y) = \overline{(y, x)}$

3. $(\alpha x + \beta y, z) = \alpha(x, z) + \beta(y, z)$

A vector space with an inner product is called an inner product space.

Definition 7.2: The norm or length of $x$, $\|x\|$, is defined by

$$\|x\|^2 = (x, x)$$

Definition 7.3: The vectors $x$ and $y$ are said to be orthogonal if $(x, y) = 0$. We write $x \perp y$ if $x$ is orthogonal to $y$. More generally

(1) $x$ is said to be orthogonal to the set of vectors $X$ if

$$(x, y) = 0 \text{ for every } y \in X$$
(2) $X$ and $Y$ are said to be orthogonal if

$$(x, y) = 0 \text{ for every } x \in X \text{ and } y \in Y$$

Definition 7.4: $X$ is an orthonormal set of vectors if for $x, y \in X$

$$(x, y) = \begin{cases} 1 & \text{if } x = y \\ 0 & \text{if } x \neq y \end{cases}$$

$X$ is a complete orthonormal set of vectors if it is not contained in a larger set of orthonormal vectors.
Lemma 7.1 If $X$ is an orthonormal set then its vectors are linearly independent.

Proof: If $\sum_i \alpha_ix_i = 0$ for $x_i \in X$ then

$$0 = \left(\sum_i \alpha_ix_i, x_j\right) = \sum_i \alpha_i(x_i, x_j) = \alpha_j$$

It follows that each of the $\alpha_j$ are equal to 0 and hence that the $x_j$'s are linearly independent.
Lemma 7.2 Let $\{x_1, x_2, \ldots, x_p\}$ be an orthonormal basis for a subspace $X$. Then $x \perp X$ if and only if $x \perp x_i$ for $i = 1, 2, \ldots, p$.

Proof: If $x \perp X$ then by definition $x \perp x_i$ for $i = 1, 2, \ldots, p$.

Conversely, suppose $x \perp x_i$ for $i = 1, 2, \ldots, p$. Then if $y \in X$ we have $y = \sum_i \alpha_ix_i$. Thus

$$(x, y) = \sum_i \bar{\alpha}_i(x, x_i) = 0$$

Thus $x \perp X$.
Theorem 7.3 (Bessel's Inequality) Let $\{x_1, x_2, \ldots, x_p\}$ be an orthonormal set in a linear vector space $V$ with an inner product. Then

1. $\sum_i |\alpha_i|^2 \leq \|x\|^2$ where $x$ is any vector in $V$ and $\alpha_i = (x, x_i)$.

2. The vector $r = x - \sum_i \alpha_ix_i$ is orthogonal to each $x_j$ and hence to the space spanned by $\{x_1, x_2, \ldots, x_p\}$.

Proof: We note that

$$\begin{aligned} \left\|x - \sum_i \alpha_ix_i\right\|^2 &= \left(x - \sum_i \alpha_ix_i,\ x - \sum_i \alpha_ix_i\right) \\ &= (x, x) - \sum_i \alpha_i(x_i, x) - \sum_i \bar{\alpha}_i(x, x_i) + \sum_i \alpha_i\bar{\alpha}_i \\ &= \|x\|^2 - 2\sum_i |\alpha_i|^2 + \sum_i |\alpha_i|^2 \\ &= \|x\|^2 - \sum_i |\alpha_i|^2 \end{aligned}$$

Since $\left\|x - \sum_i \alpha_ix_i\right\|^2 \geq 0$, result (1) follows.

For result (2) we note that

$$\left(x - \sum_i \alpha_ix_i, x_j\right) = (x, x_j) - \sum_i \alpha_i(x_i, x_j) = (x, x_j) - \alpha_j = 0$$
Theorem 7.4 (Cauchy-Schwarz Inequality) If $x$ and $y$ are vectors in an inner product space then

$$|(x, y)| \leq \|x\|\,\|y\|$$

Proof: If $y = 0$ equality holds. If $y \neq 0$ then $\frac{y}{\|y\|}$ is orthonormal and by Bessel's inequality we have

$$\left|\left(x, \frac{y}{\|y\|}\right)\right|^2 \leq \|x\|^2 \quad\text{or}\quad |(x, y)| \leq \|x\|\,\|y\|$$
Theorem 7.5 If $X = \{x_1, x_2, \ldots, x_n\}$ is any finite orthonormal set in a vector space $V$ then the following conditions are equivalent:

(1) $X$ is complete.

(2) $(x, x_i) = 0$ for $i = 1, 2, \ldots, n$ implies that $x = 0$.

(3) The space spanned by $X$ is equal to $V$.

(4) If $x \in V$ then $x = \sum_i (x, x_i)x_i$.

(5) If $x$ and $y$ are in $V$ then

$$(x, y) = \sum_i (x, x_i)(x_i, y)$$

(6) If $x \in V$ then

$$\|x\|^2 = \sum_i |(x_i, x)|^2$$

Result (5) is called Parseval's identity.
Proof: (1) $\implies$ (2): If $X$ is complete and $(x, x_i) = 0$ for each $i$ with $x \neq 0$, then $\frac{x}{\|x\|}$ could be added to the set $X$, which would be a contradiction. Thus $x = 0$.

(2) $\implies$ (3): If some $x$ is not a linear combination of the $x_i$, then $x - \sum_i (x, x_i)x_i$ is not equal to $0$ and is orthogonal to $X$, which contradicts (2). Thus the space spanned by $X$ is equal to $V$.

(3) $\implies$ (4): If $x \in V$ we can write $x = \sum_j \xi_jx_j$. It follows that

$$(x, x_i) = \sum_j \xi_j(x_j, x_i) = \xi_i$$

(4) $\implies$ (5): Since

$$x = \sum_i (x, x_i)x_i \quad\text{and}\quad y = \sum_j (y, x_j)x_j$$

we have

$$(x, y) = \left(\sum_i (x, x_i)x_i, \sum_j (y, x_j)x_j\right) = \sum_i (x, x_i)\overline{(y, x_i)} = \sum_i (x, x_i)(x_i, y)$$

(5) $\implies$ (6): In result (5) simply set $x = y$.

(6) $\implies$ (1): If $x_o$ is orthogonal to each $x_i$ then by (6) we have

$$\|x_o\|^2 = \sum_i |(x_i, x_o)|^2 = 0$$

so that $x_o = 0$; hence no non-zero vector can be added to $X$ and $X$ is complete.
7.2 Gram Schmidt Process

The Gram Schmidt process can be used to construct an orthonormal basis for a finite dimensional vector space. Start with a basis $\{x_1, x_2, \ldots, x_n\}$ for $V$. Form

$$y_1 = \frac{x_1}{\|x_1\|}$$

Next define

$$z_2 = x_2 - (x_2, y_1)y_1$$

Since the $x_i$ are linearly independent, $z_2 \neq 0$, and $z_2$ is orthogonal to $y_1$. Hence

$$y_2 = \frac{z_2}{\|z_2\|}$$

is orthogonal to $y_1$ and has unit norm. If $y_1, y_2, \ldots, y_r$ have been so chosen then we form

$$z_{r+1} = x_{r+1} - \sum_{i=1}^r (x_{r+1}, y_i)y_i$$

Since the $x_i$ are linearly independent, $\|z_{r+1}\| > 0$, and since $z_{r+1} \perp y_i$ for $i = 1, 2, \ldots, r$ it follows that

$$y_{r+1} = \frac{z_{r+1}}{\|z_{r+1}\|}$$

may be added to the set $\{y_1, y_2, \ldots, y_r\}$ to form a new orthonormal set. The process necessarily stops with $y_n$ since there can be at most $n$ elements in a linearly independent set.
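Editorial aside (not part of the original notes): a direct transcription of the Gram Schmidt recursion into NumPy for the real inner product $(x, y) = x^Ty$ (the helper name gram_schmidt is ours; for serious numerical work a QR factorization would normally be preferred):

    import numpy as np

    def gram_schmidt(X):
        # columns of X are the linearly independent x_1, ..., x_n
        Y = []
        for x in X.T:
            z = x - sum((x @ y) * y for y in Y)   # z_{r+1} = x_{r+1} - sum (x_{r+1}, y_i) y_i
            Y.append(z / np.linalg.norm(z))       # y_{r+1} = z_{r+1} / ||z_{r+1}||
        return np.column_stack(Y)

    X = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
    Q = gram_schmidt(X)
    print(np.allclose(Q.T @ Q, np.eye(3)))        # orthonormal columns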
7.3 Orthogonal Projections
Theorem 7.6 (Orthogonal Projection) Let $U$ be a subspace of an inner product space $V$ and let $y$ be a vector in $V$ which is not in $U$. Then there exists a unique vector $y_U \in U$ and a unique vector $e \perp U$ such that

$$y = y_U + e$$

Proof: Let $\{x_1, x_2, \ldots, x_r\}$ be a basis of $U$. Since $y \notin U$ the set $\{x_1, x_2, \ldots, x_r, y\}$ is a linearly independent set. Use the Gram Schmidt process on this set to form a set of orthonormal vectors $\{y_1, y_2, \ldots, y_r, y_{r+1}\}$. Since $y$ is in the space spanned by these vectors we can write

$$y = \sum_{i=1}^{r+1} \alpha_iy_i = \sum_{i=1}^r \alpha_iy_i + \alpha_{r+1}y_{r+1}$$

If we define

$$y_U = \sum_{i=1}^r \alpha_iy_i \quad\text{and}\quad e = \alpha_{r+1}y_{r+1}$$

then $y_U \in U$ and $e \perp U$. To show uniqueness, suppose that we also have

$$y = z_1 + e_1 \quad\text{where } z_1 \in U \text{ and } e_1 \perp U$$

Then we have

$$(z_1 - y_U) + (e_1 - e) = 0$$

so that $(z_1 - y_U, z_1 - y_U) = 0$, and hence $z_1 = y_U$ and $e_1 = e$.
Definition 7.5 The vector $e$ in Theorem 7.6 is called the orthogonal projection from $y$ to $U$ and the vector $y_U$ is called the orthogonal projection of $y$ on $U$.
Theorem 7.7 The projection of $y$ on $U$ has the property that

$$\|y - y_U\| = \min_x\{\|y - x\| : x \in U\}$$

Proof: Let $x \in U$. Then $(y_U - x, e) = 0$. Hence

$$\begin{aligned} \|y - x\|^2 &= (y - x, y - x) \\ &= (y - y_U + y_U - x,\ y - y_U + y_U - x) \\ &= \|y - y_U\|^2 + \|y_U - x\|^2 + (y - y_U, y_U - x) + (y_U - x, y - y_U) \\ &= \|y - y_U\|^2 + \|y_U - x\|^2 \\ &\geq \|y - y_U\|^2 \end{aligned}$$

with the minimum occurring when $x = y_U$. Thus the projection minimizes the distance from $y$ to $U$.
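Editorial aside (not part of the original notes): with $U$ given as the column space of a matrix $X$, the projector $P = X(X^TX)^{-1}X^T$ yields $y_U = Py$, and Theorem 7.7 can be checked numerically:

    import numpy as np

    rng = np.random.default_rng(5)
    X = rng.normal(size=(5, 2))                 # basis of U as columns
    y = rng.normal(size=5)

    P = X @ np.linalg.inv(X.T @ X) @ X.T        # orthogonal projector onto U
    yU = P @ y
    e = y - yU

    print(np.allclose(X.T @ e, 0))              # e is orthogonal to U
    x_other = X @ rng.normal(size=2)            # some other point of U
    print(np.linalg.norm(y - yU) <= np.linalg.norm(y - x_other))  # True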
Chapter 8
Characteristic Roots and Vectors
8.1 Definitions
Definition 8.1: A scalar $\lambda$ is a characteristic root and a non-zero vector $x$ is a characteristic vector of the matrix $A$ if

$$Ax = \lambda x$$

Other names for characteristic roots are proper value, latent root, eigenvalue and secular value, with similar adjectives applying to characteristic vectors.

If $\lambda$ is a characteristic root of $A$ let the set $X_\lambda$ be defined by

$$X_\lambda = \{x : Ax = \lambda x\}$$

The geometric multiplicity of $\lambda$ is defined to be the dimension of the subspace $X_\lambda$. $\lambda$ is said to be a simple characteristic root if its geometric multiplicity is 1.

Definition 8.2: The spectrum of $A$ is the set of all characteristic roots of $A$ and is denoted by $\sigma(A)$.

Since $Ax = \lambda x$ if and only if $[A - \lambda I]x = 0$ we see that $\lambda \in \sigma(A)$ is equivalent to $A - \lambda I$ being singular. Similarly $x$ is a characteristic vector corresponding to $\lambda$
if and only if $x$ is in the null space of $A - \lambda I$. Using the properties of determinants we have the following theorem.

Theorem 8.1 $\lambda \in \sigma(A)$ if and only if $\det(A - \lambda I) = 0$.
If the field of scalars is the set of complex numbers (or any algebraically closed field) then if $A$ is $p$ by $p$ we have

$$\det(A - \lambda I) = \prod_{i=1}^q (\lambda_i - \lambda)^{n_i}$$

where $\sum_{i=1}^q n_i = p$ and $n_i \geq 1$.

The algebraic multiplicity of the characteristic root $\lambda_i$ is the number $n_i$ which appears in the above expression.

The algebraic and geometric multiplicity of a characteristic root need not correspond. To see this let

$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$

Then

$$\det(A - \lambda I) = \det\begin{pmatrix} 1 - \lambda & 1 \\ 0 & 1 - \lambda \end{pmatrix} = (\lambda - 1)^2$$

It follows that both of the characteristic roots of $A$ are equal to 1. Hence the algebraic multiplicity of the characteristic root 1 is equal to 2. However, the equation

$$Ax = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

yields the equations

$$x_1 + x_2 = x_1 \qquad x_2 = x_2$$

so that $x_2 = 0$. Thus the geometric multiplicity of the characteristic root 1 of $A$ is equal to 1.
Lemma 8.2

(1) The spectrum of $TAT^{-1}$ is the same as the spectrum of $A$.

(2) If $\lambda$ is a characteristic root of $A$ then $p(\lambda)$ is a characteristic root of $p(A)$ where $p$ is any polynomial.

Proof: Since $TAT^{-1} - \lambda I = T[A - \lambda I]T^{-1}$ it follows that $[A - \lambda I]x = 0$ if and only if $[TAT^{-1} - \lambda I]Tx = 0$, and the spectra are thus the same.

If $Ax = \lambda x$ then $A^nx = \lambda^nx$ for any integer $n \geq 0$, so that $p(A)x = p(\lambda)x$.
8.2 Factorizations
Theorem 8.3 (Schur's Triangular Factorization) Let $A$ be any $p$ by $p$ matrix. Then there exists a unitary matrix $U$, i.e. $U^*U = I$, such that

$$U^*AU = T$$

where $T$ is an upper right triangular matrix with diagonal elements equal to the characteristic roots of $A$.

Proof: Let $p = 2$ and let $u_1$ be such that

$$Au_1 = \lambda_1u_1 \quad\text{and}\quad u_1^*u_1 = 1$$

Define the matrix $U = [u_1, u_2]$ where $u_2$ is chosen so that $U^*U = I$. Then

$$U^*AU = \begin{pmatrix} u_1^* \\ u_2^* \end{pmatrix}[\lambda_1u_1, Au_2] = \begin{pmatrix} \lambda_1 & u_1^*Au_2 \\ 0 & u_2^*Au_2 \end{pmatrix}$$

Note that $u_2^*Au_2 = \lambda_2$ is the other characteristic root of $A$. Now assume that the result is true for $p = N$ and let $p = N + 1$. Let $Au_1 = \lambda_1u_1$ with $u_1^*u_1 = 1$ and let $\{b_1, b_2, \ldots, b_N\}$ be such that

$$U_1 = [u_1, b_1, b_2, \ldots, b_N]$$
is unitary. Thus

$$U_1^*AU_1 = [u_1, b_1, \ldots, b_N]^*[\lambda_1u_1, Ab_1, \ldots, Ab_N] = \begin{pmatrix} \lambda_1 & u_1^*Ab_1 \ \cdots \ u_1^*Ab_N \\ 0 & B_N \end{pmatrix}$$

Since the characteristic roots of the above matrix are the solutions to the equation

$$(\lambda_1 - \lambda)\det(B_N - \lambda I) = 0$$

it follows that the characteristic roots of $B_N$ are the remaining characteristic roots $\lambda_2, \lambda_3, \ldots, \lambda_{N+1}$ of $A$. By the induction assumption there exists a unitary matrix $L$ such that

$$L^*B_NL$$

is upper right triangular with diagonal elements equal to $\lambda_2, \lambda_3, \ldots, \lambda_{N+1}$. Let

$$U_{N+1} = \begin{pmatrix} 1 & 0^T \\ 0 & L \end{pmatrix} \quad\text{and}\quad U = U_1U_{N+1}$$

Then

$$U^*AU = U_{N+1}^*U_1^*AU_1U_{N+1} = \begin{pmatrix} 1 & 0^T \\ 0 & L^* \end{pmatrix}\begin{pmatrix} \lambda_1 & b^* \\ 0 & B_N \end{pmatrix}\begin{pmatrix} 1 & 0^T \\ 0 & L \end{pmatrix} = \begin{pmatrix} \lambda_1 & b^*L \\ 0 & L^*B_NL \end{pmatrix}$$

which is upper right triangular as claimed, with the diagonal elements equal to the characteristic roots of $A$.
Theorem 8.4 If $A$ is normal ($A^*A = AA^*$) then there exists a unitary matrix $U$ such that

$$U^*AU = D$$

where $D$ is diagonal with elements equal to the characteristic roots of $A$.

Proof: By Theorem 8.3 there is a unitary matrix $U$ such that $U^*AU = T$ where $T$ is upper right triangular. Thus we also have $U^*A^*U = T^*$, where $T^*$ is lower left triangular. It follows that

$$U^*A^*AU = T^*T \quad\text{and}\quad U^*AA^*U = TT^*$$

Since $A$ is normal, $T^*T = TT^*$, and comparing diagonal elements shows that the off diagonal elements of $T$ vanish, i.e. $T$ is diagonal as claimed.
Theorem 8.5 (Cochran's Theorem) If $A$ is normal then

$$A = \sum_{i=1}^n \lambda_iE_i$$

where $E_iE_j = O$ for $i \neq j$, $E_i^2 = E_i$ and the $\lambda_i$ are the characteristic roots of $A$.

Proof: By Theorem 8.4 there is a unitary matrix $U$ such that $U^*AU = D$ is diagonal with diagonal elements equal to the characteristic roots of $A$. Hence

$$A = UDU^* = U\begin{pmatrix} \lambda_1u_1^* \\ \lambda_2u_2^* \\ \vdots \\ \lambda_nu_n^* \end{pmatrix} = \sum_{i=1}^n \lambda_iu_iu_i^*$$

Defining $E_i = u_iu_i^*$ completes the proof.
This representation of $A$ is called the spectral representation of $A$. In most applications $A$ is symmetric.

The spectral representation can be rewritten as

$$A = \sum_{i=1}^n \lambda_iE_i = \sum_{j=1}^q \lambda_jE_j$$

where $\lambda_1 < \lambda_2 < \cdots < \lambda_q$ are the distinct characteristic roots of $A$ (here $E_j$ denotes the sum of the $u_iu_i^*$ over those $i$ whose root equals $\lambda_j$). Now define

$$S_0 = O \quad\text{and}\quad S_i = \sum_{j=1}^i E_j \quad\text{for } i = 1, 2, \ldots, q$$

Then $E_i = S_i - S_{i-1}$ for $i = 1, 2, \ldots, q$ and hence

$$A = \sum_{j=1}^q \lambda_jE_j = \sum_{j=1}^q \lambda_j(S_j - S_{j-1})$$

which may be written as a Stieltjes integral, i.e.

$$A = \int \lambda\,dS(\lambda)$$

which is also called the spectral representation of $A$.
The spectral representation of $A$ has the property that

$$(Ax, y) = \left(\int \lambda\,dS(\lambda)x, y\right) = \sum_{j=1}^q \lambda_j[(S_jx, y) - (S_{j-1}x, y)] = \int \lambda\,d(S(\lambda)x, y)$$

In more advanced treatments, applicable to Hilbert spaces, the property just established is taken as the defining property of $\int \lambda\,d(S(\lambda)x, y)$.
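Editorial aside (not part of the original notes): for a real symmetric (hence normal) matrix the spectral representation can be computed with NumPy's eigh. A sketch checking $A = \sum_i \lambda_iE_i$ and the properties of the $E_i$:

    import numpy as np

    rng = np.random.default_rng(6)
    M = rng.normal(size=(4, 4))
    A = M + M.T                                   # symmetric, hence normal

    lam, U = np.linalg.eigh(A)                    # A U = U diag(lam), U orthogonal
    recon = sum(lam[i] * np.outer(U[:, i], U[:, i]) for i in range(4))
    print(np.allclose(recon, A))                  # A = sum_i lambda_i E_i, E_i = u_i u_i^T

    E0 = np.outer(U[:, 0], U[:, 0])
    E1 = np.outer(U[:, 1], U[:, 1])
    print(np.allclose(E0 @ E0, E0), np.allclose(E0 @ E1, 0))  # idempotent, orthogonal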
8.3 Quadratic Forms
If $A$ is a symmetric matrix with real elements, $x^TAx$ is called a quadratic form. A quadratic form is said to be

positive definite if $x^TAx > 0$ for all $x \neq 0$

non-negative definite if $x^TAx \geq 0$ for all $x$

If $A$ is positive definite then all of its characteristic roots are positive, while if $A$ is non-negative definite then all of its characteristic roots are non-negative. To prove this note that if $x_i$ is a characteristic vector of $A$ corresponding to $\lambda_i$ then $Ax_i = \lambda_ix_i$ and hence $x_i^TAx_i = \lambda_ix_i^Tx_i$.

If $\lambda_i \neq \lambda_j$ then the corresponding characteristic vectors are orthogonal since

$$\lambda_jx_i^Tx_j = x_i^TAx_j = (Ax_i)^Tx_j = \lambda_ix_i^Tx_j$$

which implies that $x_i^Tx_j = 0$, i.e. that $x_i$ and $x_j$ are orthogonal if $\lambda_i \neq \lambda_j$.
Another way of characterizing characteristic roots is to consider the stationary values of

$$\frac{x^TAx}{x^Tx}$$

as $x$ varies over $\mathbb{R}^p$. Since

$$A = \sum_{i=1}^p \lambda_iE_i$$

where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$, we have that

$$Ax = \sum_{i=1}^p \lambda_iE_ix$$

It follows that

$$x^TAx = \sum_{i=1}^p \lambda_ix^TE_ix = \sum_{i=1}^p \lambda_ix^TE_iE_i^Tx = \sum_{i=1}^p \lambda_i\|E_ix\|^2$$

Since $E_i = y_iy_i^T$, where the $y_i$ form an orthonormal basis, we have that $U^TU = I$ where

$$U = [y_1, y_2, \ldots, y_p]$$

It follows that $U^{-1} = U^T$ and hence that $UU^T = I$. Thus

$$I = UU^T = [y_1, y_2, \ldots, y_p]\begin{pmatrix} y_1^T \\ y_2^T \\ \vdots \\ y_p^T \end{pmatrix} = \sum_{i=1}^p y_iy_i^T = \sum_{i=1}^p E_i$$

It follows that

$$x^Tx = \sum_{i=1}^p x^TE_ix = \sum_{i=1}^p \|E_ix\|^2$$

Thus

$$\frac{x^TAx}{x^Tx} = \frac{\sum_{i=1}^p \lambda_i\|E_ix\|^2}{\sum_{i=1}^p \|E_ix\|^2}$$

is a weighted average of the characteristic roots, so that its maximum over $x \neq 0$ is $\lambda_1$ and its minimum is $\lambda_p$.
Theorem 8.8 A necessary and sufficient condition that a symmetric matrix $A$ on a real inner product space satisfies $A = O$ is that $\langle Ax, x\rangle = 0$ for all $x$.

Proof: Necessity is obvious.

To prove sufficiency we note that

$$\langle A(x + y), (x + y)\rangle = \langle Ax, x\rangle + \langle Ax, y\rangle + \langle Ay, x\rangle + \langle Ay, y\rangle$$

It follows that

$$\langle Ax, y\rangle + \langle Ay, x\rangle = \langle A(x + y), (x + y)\rangle - \langle Ax, x\rangle - \langle Ay, y\rangle = 0$$

Since $A$ is symmetric, $\langle Ay, x\rangle = \langle y, Ax\rangle = \langle Ax, y\rangle$, so that $\langle Ax, y\rangle = 0$ for every $y$. Taking $y = Ax$ gives $\|Ax\|^2 = 0$, i.e. $Ax = 0$ for every $x$, and by Lemma 3.1 the result follows.
8.4 Jordan Canonical Form

The most general canonical form for matrices is the Jordan canonical form. This particular canonical form is used in the theory of Markov chains.

The Jordan canonical form is a nearly diagonal canonical form for a matrix. Let $J_{p_i}(\lambda_i)$ be the $p_i \times p_i$ matrix of the form

$$J_{p_i}(\lambda_i) = \lambda_iI_{p_i} + N_{p_i}$$

where

$$N_{p_i} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix}$$

is a nilpotent matrix of order $p_i$, i.e. $N_{p_i}^{p_i} = O$.
Theorem 8.9 (Jordan Canonical Form) Let $\lambda_1, \lambda_2, \ldots, \lambda_q$ be the $q$ distinct characteristic roots of a $p \times p$ matrix $A$. Then there exists a non-singular matrix $V$ such that

$$VAV^{-1} = \mathrm{diag}\left(J_{p_1}(\lambda_1), J_{p_2}(\lambda_2), \ldots, J_{p_q}(\lambda_q)\right)$$

One use of the Jordan canonical form occurs when we want an expression for $A^n$. Using the Jordan canonical form we see that

$$VA^nV^{-1} = \mathrm{diag}\left(J_{p_1}^n, J_{p_2}^n, \ldots, J_{p_q}^n\right)$$

Thus if $n \geq \max_i p_i$ we have

$$J_{p_i}^n(\lambda_i) = (\lambda_iI_{p_i} + N_{p_i})^n = \sum_{r=0}^n \binom{n}{r}\lambda_i^{n-r}N_{p_i}^r$$

by the binomial expansion. Since $N_{p_i}^r = O$ if $r \geq p_i$ we obtain

$$J_{p_i}^n(\lambda_i) = \sum_{r=0}^{p_i-1} \binom{n}{r}\lambda_i^{n-r}N_{p_i}^r$$

Thus for large $n$ we have a relatively simple expression for $A^n$.
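Editorial aside (not part of the original notes): the binomial formula for powers of a Jordan block is easy to verify numerically. A sketch for a single block $J_3(2)$ (the values $p = 3$, $\lambda = 2$, $n = 10$ are illustrative):

    import numpy as np
    from math import comb

    p, lam, n = 3, 2.0, 10
    N = np.diag(np.ones(p - 1), k=1)              # nilpotent: N^p = 0
    J = lam * np.eye(p) + N                       # Jordan block J_p(lambda)

    # J^n = sum_{r=0}^{p-1} C(n, r) lambda^(n-r) N^r
    Jn = sum(comb(n, r) * lam ** (n - r) * np.linalg.matrix_power(N, r)
             for r in range(p))
    print(np.allclose(Jn, np.linalg.matrix_power(J, n)))  # True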
8.5 The Singular Value Decomposition (SVD) of a Matrix

Theorem 8.10 If $A$ is any $n$ by $p$ matrix then there exist matrices $U$, $V$ and $D$ such that

$$A = UDV^T$$

where

$U$ is $n$ by $p$ and $U^TU = I_p$

$V$ is $p$ by $p$ and $V^TV = I_p$

$D = \mathrm{diag}(\delta_1, \delta_2, \ldots, \delta_p)$ and $\delta_1 \geq \delta_2 \geq \cdots \geq \delta_p \geq 0$
Proof: Let $x$ be such that $\|x\| = 1$ and $\|Ax\| = \|A\|$, where $\|A\| = \max_{\|u\|=1}\|Au\|$. That such an $x$ can be chosen follows from the fact that the norm is a continuous function on a closed and bounded set and hence achieves its maximum at a point in the set. Let $y_1 = Ax$ and define

$$y = \frac{y_1}{\|A\|}$$

Then $\|y\| = 1$ and

$$Ax = \|A\|y = \sigma y \quad\text{where } \sigma = \|A\|$$

Let $U_1$ and $V_1$ be such that

$$U = [y, U_1] \quad\text{and}\quad V = [x, V_1] \quad\text{are orthogonal}$$

where $U$ is $n$ by $n$ and $V$ is $p$ by $p$. It then follows that

$$U^TAV = \begin{pmatrix} y^T \\ U_1^T \end{pmatrix}A[x, V_1] = \begin{pmatrix} y^TAx & y^TAV_1 \\ U_1^TAx & U_1^TAV_1 \end{pmatrix}$$

Thus

$$A_1 = U^TAV = \begin{pmatrix} \sigma & w^T \\ 0 & B_1 \end{pmatrix}$$

where $w^T = y^TAV_1$, $B_1 = U_1^TAV_1$ and $y^TAx = \sigma y^Ty = \sigma$ (note $U_1^TAx = \sigma U_1^Ty = 0$).

If we define

$$z = A_1\begin{pmatrix} \sigma \\ w \end{pmatrix} = \begin{pmatrix} \sigma & w^T \\ 0 & B_1 \end{pmatrix}\begin{pmatrix} \sigma \\ w \end{pmatrix} = \begin{pmatrix} \sigma^2 + w^Tw \\ B_1w \end{pmatrix}$$

then

$$\|z\|^2 = (\sigma^2 + w^Tw)^2 + w^TB_1^TB_1w \geq (\sigma^2 + w^Tw)^2$$
Thus

$$\sigma^2 = \|A\|^2 = \|U^TAV\|^2 = \|A_1\|^2 \geq \frac{\|z\|^2}{\sigma^2 + w^Tw} \geq \sigma^2 + w^Tw$$

and it follows that $w = 0$. Thus

$$U^TAV = \begin{pmatrix} \sigma & 0^T \\ 0 & B_1 \end{pmatrix}$$

Induction now establishes the result. The orthogonality properties of $U$ and $V$ then imply that

$$U^TAV = D \iff A = UDV^T$$
8.5.1 Use of the SVD in Data Reduction and Reconstruction

Let $Y$ be a data matrix consisting of $n$ observations on $p$ response variables. That is

$$Y = \begin{pmatrix} y_{11} & y_{12} & \cdots & y_{1p} \\ y_{21} & y_{22} & \cdots & y_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{np} \end{pmatrix}$$

The singular value decomposition of $Y$ allows us to represent $Y$ as

$$Y = U_1D_1V_1^T$$

where $U_1$ is $n$ by $p$, $V_1$ is $p$ by $p$ and $D_1$ is a $p$ by $p$ diagonal matrix with $i$th diagonal element $d_i$. We know that $d_i \geq 0$ and we may assume that

$$d_1 \geq d_2 \geq \cdots \geq d_p$$

Recall that the matrices $U_1$ and $V_1$ have the properties

$$U_1^TU_1 = I_p \quad\text{and}\quad V_1^TV_1 = I_p$$

so that the columns of $U_1$ form an orthonormal basis for the column space of $Y$, i.e. the subspace of $\mathbb{R}^n$ spanned by the columns of $Y$. Similarly the columns of $V_1$ form an orthonormal basis for the row space of $Y$, i.e. the subspace of $\mathbb{R}^p$ spanned by the rows of $Y$.

The rows of $Y$ represent $n$ points in $p$ dimensional space and

$$y_{i1}, y_{i2}, \ldots, y_{ip}$$

represent the coordinates of the $i$th individual's values relative to the axes of the $p$ original variables. Two individuals are close in this space if the distance (norm) between them is small. In this case the two individuals have nearly the same values on each of the variables, i.e. their coordinates with respect to the original variables are nearly equal. We refer to this subspace of $\mathbb{R}^p$ as individual space or observation space.

The columns of $Y$ represent $p$ points in $\mathbb{R}^n$ and represent the observed values of a variable on $n$ individuals. Two points in this space are close if they have similar values over the entire collection of observed individuals, i.e. the two variables are measuring the same thing. We call this space variable space.
The distance between individuals $i$ and $i'$ is given by

$$\|y_i - y_{i'}\| = \left[\sum_{j=1}^p (y_{ij} - y_{i'j})^2\right]^{1/2}$$

For any data matrix $Y$ there is an orthogonal matrix $V_1$ such that if $Z = YV_1$ then the distances between individuals are preserved, i.e.

$$\|z_i - z_{i'}\| = \|V_1^T(y_i - y_{i'})\| = \left[(y_i - y_{i'})^TV_1V_1^T(y_i - y_{i'})\right]^{1/2} = \|y_i - y_{i'}\|$$

Thus the original data is replaced by a new data set in which the coordinates of the individuals are now given by the rows of $Z$.
We now note that

$$U_1 = [u_1, u_2, \cdots, u_p]$$

where $u_j$ is the $j$th column vector of $U_1$, and that

$$V_1 = [v_1, v_2, \cdots, v_p]$$
where $v_j$ is the $j$th column vector of $V_1$. Thus we can write

$$Y = U_1D_1V_1^T = \sum_{j=1}^p d_ju_jv_j^T$$

That is, we may reconstruct $Y$ as a sum of constants (the $d_j$) times matrices which are the outer products of vectors which span variable space (the $u_j$) and individual space (the $v_j$). The interpretation of these vectors is of importance to understanding the nature of the representation. We note that

$$YY^T = (U_1D_1V_1^T)(V_1D_1U_1^T) = U_1D_1^2U_1^T$$

Thus

$$YY^TU_1 = U_1D_1^2$$

which shows that the columns of $U_1$ are the eigenvectors of $YY^T$ with eigenvalues equal to the $d_j^2$.

We also note that

$$Y^TY = (V_1D_1U_1^T)(U_1D_1V_1^T) = V_1D_1^2V_1^T$$

so that

$$Y^TYV_1 = V_1D_1^2$$

Thus the columns of $V_1$ are the eigenvectors of $Y^TY$ with eigenvalues again equal to the $d_j^2$.

The relationship

$$YV_1 = U_1D_1$$

shows that the columns of $V_1$ transform an individual's values on the original variables to values on the new variables $U_1D_1$, with relative importance given by the $d_j$.
If some of the $d_j$ are zero or are approximately 0 then we can approximate $Y$ by the rank $k$ matrix

$$Y \approx \sum_{j=1}^k d_ju_jv_j^T$$

which tells us that the approximating subspace has the same dimension, $k$, in variable space as in individual space.
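Editorial aside (not part of the original notes): the reconstruction and rank $k$ approximation above can be reproduced with NumPy's SVD (the data here are random and purely illustrative):

    import numpy as np

    rng = np.random.default_rng(7)
    Y = rng.normal(size=(10, 4)) @ rng.normal(size=(4, 4))   # a 10 by 4 data matrix

    U1, d, V1t = np.linalg.svd(Y, full_matrices=False)       # Y = U1 diag(d) V1^T
    print(np.allclose(Y, U1 @ np.diag(d) @ V1t))             # exact reconstruction

    # rank-k approximation: keep only the k largest singular values
    k = 2
    Yk = sum(d[j] * np.outer(U1[:, j], V1t[j, :]) for j in range(k))
    print(np.linalg.norm(Y - Yk))   # small whenever d[2], d[3] are small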