
    Linear Algebra and Matrices

    Notes from Charles A. Rohde

    Fall 2003


Contents

1 Linear Vector Spaces 1
1.1 Definition and Examples 1
1.2 Linear Independence and Bases 3

2 Linear Transformations 7
2.1 Sums and Scalar Products 7
2.2 Products 8

3 Matrices 9
3.1 Definitions 9
3.2 Sums and Products of Matrices 11
3.3 Conjugate Transpose and Transpose Operations 13
3.4 Invertibility and Inverses 14
3.5 Direct Products and Vecs 19

4 Matrix Factorization 22
4.1 Rank of a Matrix 22

5 Linear Equations 34
5.1 Consistency Results 34
5.2 Solutions to Linear Equations 36
5.2.1 A is p by p of rank p 36
5.2.2 A is n by p of rank p 37
5.2.3 A is n by p of rank r where r ≤ p ≤ n 37
5.2.4 Generalized Inverses 39

6 Determinants 40
6.1 Definition and Properties 40
6.2 Computation 42

7 Inner Product Spaces 44
7.1 Definitions and Elementary Properties 44
7.2 Gram Schmidt Process 48
7.3 Orthogonal Projections 49

8 Characteristic Roots and Vectors 51
8.1 Definitions 51
8.2 Factorizations 53
8.3 Quadratic Forms 57
8.4 Jordan Canonical Form 60
8.5 The Singular Value Decomposition (SVD) of a Matrix 61
8.5.1 Use of the SVD in Data Reduction and Reconstruction 63

    Chapter 1

    Linear Vector Spaces

    1.1 Definition and Examples

Definition 1.1 A linear vector space V is a set of points (called vectors) satisfying the following conditions:

(1) An operation + exists on V which satisfies the following properties:

(a) x + y = y + x
(b) x + (y + z) = (x + y) + z
(c) A vector 0 exists in V such that x + 0 = x for every x ∈ V
(d) For every x ∈ V a vector −x exists in V such that (−x) + x = 0

(2) An operation ∘ (scalar multiplication) exists on V which satisfies the following properties:

(a) α ∘ (x + y) = α ∘ x + α ∘ y
(b) α ∘ (β ∘ x) = (αβ) ∘ x
(c) (α + β) ∘ x = α ∘ x + β ∘ x
(d) 1 ∘ x = x

where the scalars α and β belong to a field F with the identity being given by 1. For ease of notation we will eliminate the ∘ in scalar multiplication.

In most applications the field F will be the field of real numbers R or the field of complex numbers C. The 0 vector will be called the null vector or the origin.

example 1: Let x represent a point in two dimensional space with addition and scalar multiplication defined by

(x1, x2)^T + (y1, y2)^T = (x1 + y1, x2 + y2)^T and α (x1, x2)^T = (αx1, αx2)^T

The origin and negatives are defined by

0 = (0, 0)^T and −(x1, x2)^T = (−x1, −x2)^T

example 2: Let x represent a point in n dimensional space (called Euclidean space and denoted by R^n) with addition and scalar multiplication defined by

(x1, x2, . . . , xn)^T + (y1, y2, . . . , yn)^T = (x1 + y1, x2 + y2, . . . , xn + yn)^T

and

α (x1, x2, . . . , xn)^T = (αx1, αx2, . . . , αxn)^T

The origin and negatives are defined by

0 = (0, 0, . . . , 0)^T and −(x1, x2, . . . , xn)^T = (−x1, −x2, . . . , −xn)^T

example 3: Let x represent a point in n dimensional complex space (called Unitary space and denoted by C^n) with addition and scalar multiplication defined by

(x1, x2, . . . , xn)^T + (y1, y2, . . . , yn)^T = (x1 + y1, x2 + y2, . . . , xn + yn)^T

and

α (x1, x2, . . . , xn)^T = (αx1, αx2, . . . , αxn)^T

The origin and negatives are defined by

0 = (0, 0, . . . , 0)^T and −(x1, x2, . . . , xn)^T = (−x1, −x2, . . . , −xn)^T

In this case the xi and yi can be complex numbers, as can the scalars.

example 4: Let p be a polynomial of degree at most n, i.e.

p(x) = α0 + α1 x + · · · + αn x^n

where the αi are complex numbers. Define addition and scalar multiplication by

(p1 + p2)(x) = Σ_{i=0}^n (α_{1i} + α_{2i}) x^i and (βp)(x) = Σ_{i=0}^n β αi x^i

The origin is the null function and −p is defined by

(−p)(x) = Σ_{i=0}^n (−αi) x^i

example 5: Let X1, X2, . . . , Xn be random variables defined on a probability space and define V as the collection of all linear combinations of X1, X2, . . . , Xn. Here the vectors in V are random variables.

    1.2 Linear Independence and Bases

Definition 1.2 A finite set of vectors {x1, x2, . . . , xk} is said to be a linearly independent set if

Σ_i αi xi = 0 ⟹ αi = 0 for each i

If a set of vectors is not linearly independent it is said to be linearly dependent. If the set of vectors is empty we define Σ_i αi xi = 0 so that, by convention, the empty set of vectors is a linearly independent set of vectors.
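Numerically, a finite set of vectors in R^n is linearly independent exactly when the matrix having them as columns has rank equal to the number of vectors. A minimal NumPy sketch (the vectors below are arbitrary illustrative choices, not from the notes):

    import numpy as np

    # Columns of X are the vectors x1, x2, x3 in R^3.
    X = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 0.0]])

    # Independent exactly when the only solution of X a = 0 is a = 0,
    # i.e. when rank(X) equals the number of vectors.
    k = X.shape[1]
    print(np.linalg.matrix_rank(X) == k)   # False here: x3 = x1 + x2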


Theorem 1.1 The set {x1, x2, . . . , xk} is linearly dependent if and only if

xt = Σ_{i=1}^{t−1} αi xi for some t ≥ 2

Proof: Sufficiency is obvious since the equation

xt − Σ_{i=1}^{t−1} αi xi = 0

does not imply that all of the coefficients of the vectors are equal to 0.

Conversely, if the vectors are linearly dependent then we have Σ_i αi xi = 0 where at least two of the α's are non-zero (we assume that none of the x's are the zero vector). Thus, with suitable relabelling, we have

Σ_{i=1}^{t} αi xi = 0

where αt ≠ 0. Thus

xt = Σ_{i=1}^{t−1} (−αi/αt) xi = Σ_{i=1}^{t−1} βi xi

Definition 1.3 A linear basis or coordinate system in a vector space V is a set E of linearly independent vectors in V such that each vector in V can be written as a linear combination of the vectors in E.

Since the vectors in E are linearly independent the representation as a linear combination is unique. If the number of vectors in E is finite we say that V is finite dimensional.

Definition 1.4 The dimension of a vector space is the number of vectors in any basis of the vector space.


If we have a set of linearly independent vectors {x1, x2, . . . , xk}, this set may be extended to form a basis. To see this let V be finite dimensional with basis {y1, y2, . . . , yn}. Consider the set

{x1, x2, . . . , xk, y1, y2, . . . , yn}

Let z be the first vector in this set which is a linear combination of the preceding ones. Since the x's are linearly independent we must have z = yi for some i. Thus the set

{x1, x2, . . . , xk, y1, y2, . . . , y_{i−1}, y_{i+1}, . . . , yn}

can be used to construct every vector in V. If this set of vectors is linearly independent it is a basis which includes the x's. If not we continue removing y's until the remaining vectors are linearly independent.

It can be proved, using the Axiom of Choice, that every vector space has a basis. In most applications an explicit basis can be written down and the existence of a basis is not in question.

Definition 1.5 A non-empty subset M of a vector space V is called a linear manifold or a subspace if x, y ∈ M implies that every linear combination αx + βy ∈ M.

Theorem 1.2 The intersection of any collection of subspaces is a subspace.

Proof: Since 0 is in each of the subspaces it is in their intersection. Thus the intersection is non-empty. If x and y are in the intersection then αx + βy is in each of the subspaces and hence in their intersection. It follows that the intersection is a subspace.

Definition 1.6 If {xi, i ∈ I} is a set of vectors the subspace spanned by {xi, i ∈ I} is defined to be the intersection of all subspaces containing {xi, i ∈ I}.

Theorem 1.3 If {xi, i ∈ I} is a set of vectors the subspace spanned by {xi, i ∈ I}, sp({xi, i ∈ I}), is the set of all linear combinations of the vectors in {xi, i ∈ I}.

Proof: The set of all linear combinations of vectors in {xi, i ∈ I} is a subspace which contains {xi, i ∈ I}, so it contains sp({xi, i ∈ I}). Conversely, sp({xi, i ∈ I}) is a subspace containing {xi, i ∈ I} and therefore contains all linear combinations of vectors in {xi, i ∈ I}. Thus the two subspaces are identical.

It follows that an alternative characterization of a basis is that it is a set of linearly independent vectors which spans V.


example 1: If X is the empty set then the space spanned by X consists of the 0 vector alone.

example 2: In R^3 let X be the set {e1, e2} where

e1 = (1, 0, 0)^T and e2 = (0, 1, 0)^T

Then sp{e1, e2} is the set of all vectors of the form

x = (x1, x2, 0)^T

example 3: If {X1, X2, . . . , Xn} is a set of random variables then the space spanned by {X1, X2, . . . , Xn} is the set of all random variables of the form Σ_i ai Xi. The set {X1, X2, . . . , Xn} is a basis if P(Xi = Xj) = 0 for i ≠ j.


    Chapter 2

    Linear Transformations

Let U be a p dimensional vector space and let V be an n dimensional vector space.

2.1 Sums and Scalar Products

Definition 2.1 A linear transformation L from U to V is a mapping (function) from U to V such that

L(αx + βy) = αL(x) + βL(y) for every x, y ∈ U and all α, β

Definition 2.2 The sum of two linear transformations L1 and L2 is defined by the equation

L(x) = L1(x) + L2(x) for every x ∈ U

Similarly αL is defined by [αL](x) = αL(x).

The transformation O defined by O(x) = 0 has the properties:

L + O = O + L = L ; L + (−L) = (−L) + L = O

Also note that L1 + L2 = L2 + L1. Thus linear transformations with addition and scalar multiplication as defined above constitute an additive commutative group.


    2.2 Products

If U is p dimensional, V is m dimensional and W is n dimensional and

L1 : U → V and L2 : V → W

we define the product, L, of L1 and L2 by

L(x) = L2(L1(x))

Note that the product of linear transformations is not commutative. The following are some properties of linear transformations:

LO = OL = O
L1(L2 + L3) = L1L2 + L1L3
L1(L2L3) = (L1L2)L3

If U = V, L is called a linear transformation on a vector space and we can define the identity transformation I by the equation

I(x) = x

Then LI = IL = L.

We can also define, in this case, A^2, and in general A^n for any positive integer n. Note that

A^n A^m = A^{n+m} and (A^n)^m = A^{nm}

If we define A^0 = I then we can define p(A), where p is a polynomial, by

p(A)(x) = Σ_{i=0}^n αi A^i x
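A polynomial in a transformation can be formed by accumulating powers of its matrix. A short NumPy sketch (the matrix and coefficients are illustrative only):

    import numpy as np

    def poly_of_matrix(coeffs, A):
        # Evaluate p(A) = sum_i coeffs[i] * A**i, with A**0 = I.
        p = np.zeros_like(A, dtype=float)
        power = np.eye(A.shape[0])      # current power of A, starting at I
        for c in coeffs:
            p += c * power
            power = power @ A
        return p

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])
    # p(t) = 1 + 2t + t^2, so p(A) = I + 2A + A^2
    print(poly_of_matrix([1.0, 2.0, 1.0], A))
    print(np.eye(2) + 2 * A + A @ A)    # the same matrix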


    Chapter 3

    Matrices

    3.1 Definitions

If L is a linear transformation from U (p dimensional) to V (n dimensional) and we have bases {xj : j ∈ J} and {yi : i ∈ I} respectively for U and V, then for each j ∈ J we have

L(xj) = Σ_{i=1}^n λ_ij yi for some choice of λ_ij

The n × p array

[ λ_11  λ_12  · · ·  λ_1p ]
[ λ_21  λ_22  · · ·  λ_2p ]
[  ...   ...          ... ]
[ λ_n1  λ_n2  · · ·  λ_np ]

is called the matrix of L relative to the bases {xj : j ∈ J} and {yi : i ∈ I}. To determine the matrix of L we express the transformed jth vector of the basis in the U space in terms of the basis of the V space. The numbers so obtained form the jth column of the matrix of L. Note that the matrix of L depends on the bases chosen for U and V.

In typical applications a specific basis is usually present. Thus in R^n we usually choose the basis {e1, e2, . . . , en} where

ei = (δ_1i, δ_2i, . . . , δ_ni)^T with δ_ji = 1 if j = i and δ_ji = 0 if j ≠ i

In such circumstances we call L the matrix of the linear transformation.

Definition 3.1 A matrix L is an n by p array of scalars and is a carrier of a linear transformation (L carries the basis of U to the basis of V).

The scalars making up the matrix are called the elements of the matrix. We will write

L = {λ_ij}

and call λ_ij the i, j element of L (λ_ij is thus the element in the ith row and jth column of the matrix L). The zero or null matrix is the matrix each of whose elements is 0 and corresponds to the zero transformation. A one by p matrix is called a row vector; the ith row of L is written as

λ_i^T = (λ_i1, λ_i2, . . . , λ_ip)

Similarly an n by one matrix is called a column vector; the jth column of L is written as

λ_j = (λ_1j, λ_2j, . . . , λ_nj)^T

With this notation we can write the matrix L as the stack of its rows

L = [ λ_1^T ]
    [ λ_2^T ]
    [  ...  ]
    [ λ_n^T ]

Alternatively, if λ_j denotes the jth column of L, we may write the matrix L as

L = [ λ_1  λ_2  · · ·  λ_p ]
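Relative to the standard bases, the columns of the matrix are exactly the images of the basis vectors. A small sketch (the matrix below is an arbitrary illustrative choice):

    import numpy as np

    # A linear transformation from R^3 to R^2 given directly by its matrix L.
    L = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 3.0]])

    # The jth column of L is L applied to the standard basis vector e_j.
    for j in range(3):
        e_j = np.zeros(3)
        e_j[j] = 1.0
        print(L @ e_j, L[:, j])   # the two vectors agree for every j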


    3.2 Sums and Products of Matrices

If L1 and L2 are linear transformations then their sum has matrix L defined by the equation

L(xj) = (L1 + L2)(xj)
      = L1(xj) + L2(xj)
      = Σ_i λ^(1)_ij yi + Σ_i λ^(2)_ij yi
      = Σ_i (λ^(1)_ij + λ^(2)_ij) yi

Thus we have the following

Definition 3.2 The sum of two matrices A and B is defined as

A + B = {a_ij + b_ij}

Note that A and B both must be of the same order for the sum to be defined.

Definition 3.3 The multiplication of a matrix by a scalar is defined by the equation

αA = {α a_ij}

Matrix addition and scalar multiplication of a matrix have the following properties:

A + O = O + A = A
A + B = B + A
A + (B + C) = (A + B) + C
A + (−A) = O

If L1 and L2 are linear transformations then their product has matrix L defined by the equation

L(xj) = L2(L1(xj))


      = L2(Σ_k λ^(1)_kj yk)
      = Σ_k λ^(1)_kj L2(yk)
      = Σ_k λ^(1)_kj Σ_i λ^(2)_ik zi
      = Σ_i (Σ_k λ^(2)_ik λ^(1)_kj) zi

Thus we have

Definition 3.4 The product of two matrices A and B is defined by the equation

AB = {Σ_k a_ik b_kj}

Thus the i, j element of the product can be found by taking the ith row of A times the jth column of B element by element and summing. Note that the product is defined only if the number of columns of A is equal to the number of rows of B. We say that A pre multiplies B, or that B post multiplies A, in the product AB. Matrix multiplication is not commutative.

Provided the indicated products exist, matrix multiplication has the following properties:

AO = O and OA = O
A(B + C) = AB + AC
(A + B)C = AC + BC
A(BC) = (AB)C

If n = p the matrix A is said to be a square matrix. In this case we can compute A^n for any positive integer n, which has the following properties:

A^n A^m = A^{n+m}


(A^n)^m = A^{nm}

The identity matrix I can be found using the equation

I(xj) = xj

so that the jth column of the identity matrix consists of a one in the jth row and zeros elsewhere. The identity matrix has the property that

AI = IA = A

If we define A^0 = I then if p is a polynomial we may define p(A) by the equation

p(A) = Σ_{i=0}^n αi A^i

    3.3 Conjugate Transpose and Transpose Operations

Definition 3.5 The conjugate transpose of A, written A*, is

A* = {ā_ji}

where ā is the complex conjugate of a. Thus if A is n by p, the conjugate transpose A* is p by n with i, j element equal to the complex conjugate of the j, i element of A.

Definition 3.6 The transpose of A, written A^T, is

A^T = {a_ji}

Thus if A is n by p the transpose A^T is p by n with i, j element equal to the j, i element of A.

If A is a matrix with real elements then A* = A^T.

Definition 3.7 A square matrix is said to be

Hermitian if A* = A
symmetric if A^T = A
normal if A*A = AA*

The following are some properties of the conjugate transpose and transpose operations:

(AB)* = B*A* and (AB)^T = B^T A^T
I* = I ; O* = O ; I^T = I ; O^T = O
(A + B)* = A* + B* and (A + B)^T = A^T + B^T
(αA)* = ᾱA* and (αA)^T = αA^T

Definition 3.8 A square matrix A is said to be

diagonal if a_ij = 0 for i ≠ j
upper right triangular if a_ij = 0 for i > j
lower left triangular if a_ij = 0 for i < j

A column vector is an n by one matrix and a row vector is a one by n matrix. If x is a column vector then x^T is the row vector with the same elements.

    3.4 Invertibility and Inverses

Lemma 3.1 A = B if and only if Ax = Bx for all x.

Proof: If A = B the assertion is obvious. If Ax = Bx for all x then we need only take x = ei for i = 1, 2, . . . , n, where ei is the column vector with 1 in the ith row and zeros elsewhere. Thus A and B are equal column by column.

If A (n by p) and B (p by m) are written in partitioned form as

A = [ A11  A12 ]  and  B = [ B11  B12 ]
    [ A21  A22 ]           [ B21  B22 ]

then the product AB satisfies

AB = [ A11B11 + A12B21   A11B12 + A12B22 ]
     [ A21B11 + A22B21   A21B12 + A22B22 ]

if the partitions are conformable.

If a vector is written as x = Σ_i ξi ei where the ei form a basis for V then the product Ax gives the transformed value of x under the linear transformation A which corresponds to the matrix A via the formula

A(x) = Σ_i ξi A(ei)

Definition 3.9 A square matrix A is said to be invertible if

(1) x1 ≠ x2 ⟹ Ax1 ≠ Ax2

(2) For every vector y there exists at least one x such that Ax = y.

If A is invertible and maps U into V, both n dimensional, then we define A^{-1}, which maps V into U, by the equation

A^{-1}(y0) = x0

where y0 is in V and x0 is in U and satisfies Ax0 = y0 (such an x0 exists by (2) and is unique by (1)). Thus defined, A^{-1} is a legitimate transformation from V to U. That A^{-1} is a linear transformation from V to U can be seen by the fact that if αy1 + βy2 ∈ V then there exist unique x1 and x2 in U such that y1 = A(x1) and y2 = A(x2). Since

A(αx1 + βx2) = αA(x1) + βA(x2) = αy1 + βy2

it follows that

A^{-1}(αy1 + βy2) = αx1 + βx2 = αA^{-1}(y1) + βA^{-1}(y2)


Thus A^{-1} is a linear transformation and hence has a matrix representation. Finding the matrix representation of A^{-1} is not an easy task however.

Theorem 3.2 If A, B and C are matrices such that

AB = CA = I

then A is invertible and A^{-1} = B = C.

Proof: If Ax1 = Ax2 then CAx1 = CAx2 so that x1 = x2 and (1) of definition 3.9 is satisfied. If y is any vector then defining x = By implies Ax = ABy = y so that (2) of definition 3.9 is also satisfied and hence A is invertible. Since AB = I implies B = A^{-1} and CA = I implies C = A^{-1}, the conclusions follow.

Theorem 3.3 A matrix A is invertible if and only if Ax = 0 implies x = 0, or equivalently if and only if every y ∈ V can be written as y = Ax.

Proof: If A is invertible then Ax = 0 and A0 = 0 imply that x = 0, since otherwise we have a contradiction of (1). If A is invertible then by (2) of definition 3.9 we have that y = Ax for every y ∈ V.

Suppose that Ax = 0 implies that x = 0. Then if x1 ≠ x2 we have that A(x1 − x2) ≠ 0, so that Ax1 ≠ Ax2, i.e. (1) of definition 3.9 is satisfied. If x1, x2, . . . , xn is a basis for U then

Σ_i αi A(xi) = 0 ⟹ A(Σ_i αi xi) = 0 ⟹ Σ_i αi xi = 0 ⟹ α1 = α2 = · · · = αn = 0

It follows that {Ax1, Ax2, . . . , Axn} is a basis for V so that we can write each y ∈ V as y = Ax since

y = Σ_i αi A(xi) = A(Σ_i αi xi) = Ax

Thus (2) of definition 3.9 is satisfied and A is invertible.

Suppose now that for each y ∈ V we can write y = Ax. Let {y1, y2, . . . , yn} be a basis for V and let xi be such that yi = Axi. Then if Σ_i αi xi = 0 we have

0 = A(Σ_i αi xi) = Σ_i αi A(xi) = Σ_i αi yi

which implies that each of the αi equal 0. Thus {x1, x2, . . . , xn} is a basis for U. It follows that every x can be written as Σ_i αi xi, and if Ax = 0 then Σ_i αi yi = 0, so that all αi = 0 and x = 0, i.e. (1) of definition 3.9 is satisfied. By assumption, (2) of definition 3.9 is satisfied, so that A is invertible.

Theorem 3.4

(1) If A and B are invertible so is AB and (AB)^{-1} = B^{-1}A^{-1}

(2) If A is invertible and α ≠ 0 then αA is invertible and (αA)^{-1} = α^{-1}A^{-1}

(3) If A is invertible then A^{-1} is invertible and (A^{-1})^{-1} = A

Proof: Since

(AB)B^{-1}A^{-1} = AA^{-1} = I and B^{-1}A^{-1}AB = B^{-1}B = I

by Theorem 3.2, (AB)^{-1} exists and equals B^{-1}A^{-1}.

Since

(αA)α^{-1}A^{-1} = AA^{-1} = I and α^{-1}A^{-1}(αA) = A^{-1}A = I

it again follows from Theorem 3.2 that (αA)^{-1} exists and equals α^{-1}A^{-1}.

Since

A^{-1}A = AA^{-1} = I

it again follows from Theorem 3.2 that (A^{-1})^{-1} exists and equals A.

In many problems one guesses the inverse of A and then verifies that it is in fact the inverse. The following theorems help.

Theorem 3.5

(1) Let

[ A  B ]
[ C  D ]

where A is invertible. Then

[ A  B ]^{-1}  =  [ A^{-1} + A^{-1}BQ^{-1}CA^{-1}   −A^{-1}BQ^{-1} ]
[ C  D ]          [ −Q^{-1}CA^{-1}                  Q^{-1}         ]

where Q = D − CA^{-1}B.

(2) Let

[ A  B ]
[ C  D ]

where D is invertible. Then

[ A  B ]^{-1}  =  [ (A − BD^{-1}C)^{-1}              −(A − BD^{-1}C)^{-1}BD^{-1}            ]
[ C  D ]          [ −D^{-1}C(A − BD^{-1}C)^{-1}      D^{-1} + D^{-1}C(A − BD^{-1}C)^{-1}BD^{-1} ]

Proof: The product

[ A  B ] [ A^{-1} + A^{-1}BQ^{-1}CA^{-1}   −A^{-1}BQ^{-1} ]
[ C  D ] [ −Q^{-1}CA^{-1}                  Q^{-1}         ]

is equal to

[ AA^{-1} + AA^{-1}BQ^{-1}CA^{-1} − BQ^{-1}CA^{-1}     −AA^{-1}BQ^{-1} + BQ^{-1} ]
[ CA^{-1} + CA^{-1}BQ^{-1}CA^{-1} − DQ^{-1}CA^{-1}     −CA^{-1}BQ^{-1} + DQ^{-1} ]

which simplifies to

[ I + BQ^{-1}CA^{-1} − BQ^{-1}CA^{-1}            −BQ^{-1} + BQ^{-1}      ]
[ CA^{-1} − (D − CA^{-1}B)Q^{-1}CA^{-1}          (D − CA^{-1}B)Q^{-1}    ]

which is seen to be the identity matrix.

The proof for (2) follows similarly.
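Part (1) can be checked numerically on an arbitrary partitioned matrix. A sketch in NumPy (the blocks below are random illustrative data):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 3)) + 3 * np.eye(3)   # keep A well conditioned
    B = rng.normal(size=(3, 2))
    C = rng.normal(size=(2, 3))
    D = rng.normal(size=(2, 2)) + 3 * np.eye(2)

    M = np.block([[A, B], [C, D]])
    Ainv = np.linalg.inv(A)
    Qinv = np.linalg.inv(D - C @ Ainv @ B)        # inverse of Q = D - C A^{-1} B

    Minv = np.block([
        [Ainv + Ainv @ B @ Qinv @ C @ Ainv, -Ainv @ B @ Qinv],
        [-Qinv @ C @ Ainv,                   Qinv]])

    print(np.allclose(Minv, np.linalg.inv(M)))    # True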

Theorem 3.6 Let A be n by n, U be m by n, S be m by m and V be m by n. Then if A, A + U^T SV and S + SVA^{-1}U^T S are each invertible we have

(A + U^T SV)^{-1} = A^{-1} − A^{-1}U^T S (S + SVA^{-1}U^T S)^{-1} SVA^{-1}

Proof:

(A + U^T SV)[A^{-1} − A^{-1}U^T S (S + SVA^{-1}U^T S)^{-1} SVA^{-1}]
  = I + U^T SVA^{-1} − [U^T S + U^T SVA^{-1}U^T S](S + SVA^{-1}U^T S)^{-1} SVA^{-1}
  = I + U^T SVA^{-1} − U^T [S + SVA^{-1}U^T S](S + SVA^{-1}U^T S)^{-1} SVA^{-1}
  = I + U^T SVA^{-1} − U^T SVA^{-1}
  = I

Corollary 3.7 If S^{-1} exists in the above theorem then

(A + U^T SV)^{-1} = A^{-1} − A^{-1}U^T (S^{-1} + VA^{-1}U^T)^{-1} VA^{-1}

Corollary 3.8 If S = I and u and v are vectors then

(A + uv^T)^{-1} = A^{-1} − (1 + v^T A^{-1}u)^{-1} A^{-1}uv^T A^{-1}

The last corollary is used as a starting point for the development of a theory of recursive estimation in least squares and Kalman filtering in information theory.
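Corollary 3.8 (the rank one update) is easy to verify numerically. A sketch with arbitrary illustrative data:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(4, 4)) + 4 * np.eye(4)
    u = rng.normal(size=(4, 1))
    v = rng.normal(size=(4, 1))

    Ainv = np.linalg.inv(A)
    # Corollary 3.8: inverse of A + u v^T from the inverse of A
    update = Ainv - (Ainv @ u @ v.T @ Ainv) / (1.0 + (v.T @ Ainv @ u).item())

    print(np.allclose(update, np.linalg.inv(A + u @ v.T)))   # True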

    3.5 Direct Products and Vecs

If A is a p × q matrix and B is an r × s matrix then their direct product, denoted by A ⊗ B, is the pr × qs matrix defined by

A ⊗ B = (a_ij B) = [ a_11 B  a_12 B  . . .  a_1q B ]
                   [ a_21 B  a_22 B  . . .  a_2q B ]
                   [   ...     ...   . . .    ...  ]
                   [ a_p1 B  a_p2 B  . . .  a_pq B ]

Properties of direct products:

0 ⊗ A = A ⊗ 0 = 0
(A1 + A2) ⊗ B = A1 ⊗ B + A2 ⊗ B
A ⊗ (B1 + B2) = A ⊗ B1 + A ⊗ B2
A* ⊗ B* = (A ⊗ B)*
A1A2 ⊗ B1B2 = (A1 ⊗ B1)(A2 ⊗ B2)
(A ⊗ B)^T = A^T ⊗ B^T
rank(A ⊗ B) = rank(A) rank(B)
trace(A ⊗ B) = [trace(A)][trace(B)]
(A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
det(A ⊗ B) = [det(A)]^p [det(B)]^m (for A m by m and B p by p)

If Y is an n × p matrix:

vecR(Y) is the np × 1 vector with the rows of Y stacked on top of each other.
vecC(Y) is the np × 1 vector with the columns of Y stacked on top of each other.
The i, j element of Y is the ((i − 1)p + j)th element of vecR(Y).
The i, j element of Y is the ((j − 1)n + i)th element of vecC(Y).
vecC(Y^T) = vecR(Y)
vecR(ABC) = (A ⊗ C^T) vecR(B)

where A is n × q1, B is q1 × q2 and C is q2 × p.

The following are some relationships between vecs and direct products:

ab^T = b^T ⊗ a = a ⊗ b^T
vecC(ab^T) = b ⊗ a
vecR(ab^T) = a ⊗ b
vecC(ABC) = (C^T ⊗ A) vecC(B)
trace(A^T B) = [vecC(A)]^T [vecC(B)]
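NumPy's kron and reshape make these identities easy to check. A sketch (all matrices are arbitrary illustrative data; vec_c and vec_r are hypothetical helper names for the column- and row-stacking operators):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.normal(size=(2, 3))
    B = rng.normal(size=(3, 4))
    C = rng.normal(size=(4, 2))

    vec_c = lambda Y: Y.reshape(-1, order='F')   # stack columns
    vec_r = lambda Y: Y.reshape(-1, order='C')   # stack rows

    # vecC(ABC) = (C^T kron A) vecC(B) and vecR(ABC) = (A kron C^T) vecR(B)
    print(np.allclose(vec_c(A @ B @ C), np.kron(C.T, A) @ vec_c(B)))   # True
    print(np.allclose(vec_r(A @ B @ C), np.kron(A, C.T) @ vec_r(B)))   # True

    # trace(A^T D) = vecC(A)^T vecC(D) for matrices of the same order
    D = rng.normal(size=(2, 3))
    print(np.allclose(np.trace(A.T @ D), vec_c(A) @ vec_c(D)))         # True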


    Chapter 4

    Matrix Factorization

    4.1 Rank of a Matrix

In matrix algebra it is often useful to have the matrices expressed in as simple a form as possible. In particular, if a matrix is diagonal the operations of addition, multiplication and inversion are easy to perform.

Most of these methods are based on considering the rows and columns of a matrix as vectors in a vector space of the appropriate dimension.

Definition: If A is an n by p matrix then

(1) The row rank of A is the number of linearly independent rows of the matrix considered as vectors in p dimensional space.

(2) The column rank of A is the number of linearly independent columns of the matrix considered as vectors in n dimensional space.

Theorem 4.1 Let A be an n by p matrix. The set of all vectors y such that y = Ax is a vector space with dimension equal to the column rank of A and is called the range space of A and is denoted by R(A).


Proof: R(A) is non empty since 0 ∈ R(A). If y1 and y2 are in R(A), with y1 = Ax1 and y2 = Ax2, then

y = α1 y1 + α2 y2 = α1 Ax1 + α2 Ax2 = A(α1 x1 + α2 x2)

It follows that y ∈ R(A) and hence R(A) is a vector space. If {a1, a2, . . . , ap} are the columns of A then for every y ∈ R(A) we can write

y = Ax = x1 a1 + x2 a2 + · · · + xp ap

Thus the columns of A span R(A). It follows that the dimension of R(A) is equal to the column rank of A.

Theorem 4.2 Let A be an n by p matrix. The set of all vectors x such that Ax = 0 is a vector space of dimension equal to p − column rank of A. This vector space is called the null space of A and is denoted by N(A).

Proof: Since 0 ∈ N(A) it follows that N(A) is non empty. If x1 and x2 are in N(A) then

A(α1 x1 + α2 x2) = α1 Ax1 + α2 Ax2 = 0

so that N(A) is a vector space.

Let {x1, x2, . . . , xk} be a basis for N(A). Then there exist vectors {x_{k+1}, x_{k+2}, . . . , xp} such that {x1, x2, . . . , xp} is a basis for p dimensional space. It follows that {Ax1, Ax2, . . . , Axp} spans the range space of A since

y = Ax = A(Σ_i αi xi) = Σ_i αi Axi

Since Axi = 0 for i = 1, 2, . . . , k it follows that {Ax_{k+1}, Ax_{k+2}, . . . , Axp} spans the range space of A. Suppose now that α_{k+1}, α_{k+2}, . . . , αp are such that

α_{k+1}Ax_{k+1} + α_{k+2}Ax_{k+2} + · · · + αp Axp = 0

Then

A(α_{k+1}x_{k+1} + α_{k+2}x_{k+2} + · · · + αp xp) = 0

It follows that

α_{k+1}x_{k+1} + α_{k+2}x_{k+2} + · · · + αp xp ∈ N(A)

Since {x1, x2, . . . , xk} is a basis for N(A) it follows that for some α1, α2, . . . , αk we have

α1 x1 + α2 x2 + · · · + αk xk = α_{k+1}x_{k+1} + α_{k+2}x_{k+2} + · · · + αp xp

or

α1 x1 + α2 x2 + · · · + αk xk − α_{k+1}x_{k+1} − α_{k+2}x_{k+2} − · · · − αp xp = 0

Since {x1, x2, . . . , xp} is a basis for p dimensional space we have that α1 = α2 = · · · = αp = 0. It follows that the set

{Ax_{k+1}, Ax_{k+2}, . . . , Axp}

is linearly independent and spans the range space of A. The dimension of R(A) is thus p − k. But by Theorem 4.1 we also have that the dimension of R(A) is equal to the column rank of A. Thus

p − k = column rank of A

so that

k = p − column rank of A = dim(N(A))

Definition 4.2 The elementary row (column) operations on a matrix are defined to be

(1) The interchange of two rows (columns).

(2) The multiplication of a row (column) by a non zero scalar.

(3) The addition of a scalar multiple of one row (column) to another row (column).

Lemma 4.3 Elementary row (column) operations do not change the row (column) rank of a matrix.


Proof: The row rank of a matrix A is the number of linearly independent vectors in the set of rows of

A = [ a1^T ]
    [ a2^T ]
    [  ... ]
    [ an^T ]

Clearly interchange of two rows does not change the rank, nor does the multiplication of a row by a non zero scalar. The addition of a·aj^T to ai^T produces the new matrix

[ a1^T          ]
[  ...          ]
[ ai^T + a·aj^T ]
[  ...          ]
[ an^T          ]

which has the same rank as A. The statements on the column rank of A are obtained by considering the row vectors of A^T and using the above results.

Lemma 4.4 Elementary row (column) operations on a matrix can be achieved by pre (post) multiplication by non-singular matrices.

Proof: Interchange of the ith and jth rows can be achieved by pre-multiplication by the matrix E1 whose rows are

e1^T, e2^T, . . . , e_{i−1}^T, ej^T, e_{i+1}^T, . . . , e_{j−1}^T, ei^T, e_{j+1}^T, . . . , en^T

Multiplication of the ith row by a non zero scalar a, and addition of a times the jth row to the ith row, can be achieved by pre multiplication by the matrices E2 and E3 whose rows are

E2: e1^T, e2^T, . . . , a·ei^T, . . . , en^T and E3: e1^T, e2^T, . . . , a·ej^T + ei^T, . . . , en^T

where the displayed row appears in the ith position in each case.

The statements concerning column operations are obtained by using the above results on A^T and then transposing the final results.

    Lemma 4.5 The matrices which produce elementary row (column) operations are nonsingular.


Proof: The inverse of E1 is E1^T. The inverse of E2 is the matrix obtained from E2 by replacing a·ei^T with (1/a)·ei^T. Finally the inverse of E3 is the matrix whose columns are

[e1, e2, . . . , ei, . . . , ej − a·ei, . . . , en]

Again the results for column operations follow from the row results.

Lemma 4.6 The column rank of a matrix is invariant under pre multiplication by a non singular matrix. Similarly the row rank of a matrix is invariant under post multiplication by a non singular matrix.

Proof: By Theorem 4.2 the dimension of N(A) is equal to p − column rank of A. Since

N(A) = {x : Ax = 0}

it follows that x ∈ N(A) if and only if EAx = 0 where E is non singular. Thus the dimension of N(A) is invariant under pre multiplication by a non singular matrix and the conclusion follows. The result for post multiplication follows by considering A^T and transposing.

Corollary 4.7 The column rank of a matrix is invariant under pre multiplication by an elementary matrix. Similarly the row rank of a matrix is invariant under post multiplication by an elementary matrix.

By suitable choices of elementary matrices we can thus write

PA = E

where P is the product of non singular elementary matrices and hence is non singular. The matrix E is a row echelon matrix, i.e. a matrix with r non zero rows in which r of the columns consist of e1, e2, . . . , er and 0s in some order. The remaining p − r columns have zeros below the first r rows and arbitrary elements in the remaining positions. It thus follows that

r = row rank of A ≥ column rank of A

and, applying the same argument to A^T,

row rank of A^T ≥ column rank of A^T

But we also have that

row rank of A^T = column rank of A and row rank of A = column rank of A^T

It follows that

column rank of A ≥ row rank of A

and hence we have:

Theorem 4.8 row rank of A = column rank of A

Alternative Proof of Theorem 4.8: Let A be an n × p matrix, i.e.

A = [ a11  a12  · · ·  a1p ]
    [ a21  a22  · · ·  a2p ]
    [  ...  ...          ...]
    [ an1  an2  · · ·  anp ]

We may write A in terms of its columns or rows, i.e.

A = [ A^c_1, A^c_2, . . . , A^c_p ]  =  [ a1^T ]
                                        [ a2^T ]
                                        [  ... ]
                                        [ an^T ]

where the jth column vector, A^c_j, and ith row vector, ai^T, are given by

A^c_j = (a_1j, a_2j, . . . , a_nj)^T and ai = (a_i1, a_i2, . . . , a_ip)^T

The column vectors of A span a space which is a subspace of R^n and is called the column space of A and denoted by C(A), i.e.

C(A) = sp(A^c_1, A^c_2, . . . , A^c_p)

The row vectors of A span a space which is a subspace of R^p and is called the row space of A and denoted by R(A), i.e.

R(A) = sp(a1, a2, . . . , an)

The dimension of C(A) is called the column rank of A and is denoted by C = c(A). Similarly the dimension of R(A) is called the row rank of A and is denoted by R = r(A).

Result 1: Let B = [b^c_1, b^c_2, . . . , b^c_C] where the columns of B form a basis for the column space of A. Then there is a C × p matrix L such that

A = BL

It follows that the rows of A are linear combinations of the rows of L and hence

R(A) ⊆ R(L)

Hence

R = dim[R(A)] ≤ dim[R(L)] ≤ C

i.e. the row rank of A is less than or equal to the column rank of A.

Result 2: Let

R = [ r1^T ]
    [ r2^T ]
    [  ... ]
    [ rR^T ]

where the rows of R form a basis for the row space of A. Then there is an n × R matrix K such that

A = KR

It follows that the columns of A are linear combinations of the columns of K and hence

C(A) ⊆ C(K)

Hence

C = dim[C(A)] ≤ dim[C(K)] ≤ R

i.e. the column rank of A is less than or equal to the row rank of A.

Hence we have that the row rank and column rank of a matrix are equal. The common value is called the rank of A.

Reference: Harville, David A. (1997). Matrix Algebra From a Statistician's Perspective. Springer (pages 36-40).

We thus define the rank of a matrix A, ρ(A), to be the number of linearly independent rows or the number of linearly independent columns in the matrix A. Note that ρ(A) is unaffected by pre or post multiplication by non singular matrices. By pre and post multiplying by suitable elementary matrices we can write any matrix as

PAQ = B = [ I_r  O ]
          [ O    O ]

where O denotes a matrix of zeroes of the appropriate order. Since P and Q are non singular we have

A = P^{-1}BQ^{-1}

which is an example of a matrix factorization.

    Another factorization of considerable importance is contained in the following Lemma.


Lemma 4.9 If A is an n by p matrix of rank r then there are matrices B (n by r) and C (r by p) such that

A = BC

where B and C are both of rank r.

Proof: Let the rows of C form a basis for the row space of A. Then C is r by p of rank r. Since the rows of C form a basis for the row space of A we can write the ith row of A as ai^T = bi^T C for some choice of bi. Thus A = BC, where B is the n by r matrix with rows bi^T.
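A rank factorization can be produced numerically in several ways; one convenient route is through the singular value decomposition (discussed in Section 8.5), used here purely as a computational device. A sketch with an illustrative matrix:

    import numpy as np

    rng = np.random.default_rng(3)
    # A 5 x 4 matrix of rank 2, built deliberately as a product.
    A = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 4))
    r = np.linalg.matrix_rank(A)

    # One concrete rank factorization A = B C from the SVD A = U diag(s) V^T.
    U, s, Vt = np.linalg.svd(A)
    B = U[:, :r] * s[:r]          # n by r, rank r
    C = Vt[:r, :]                 # r by p, rank r

    print(r, np.allclose(A, B @ C))   # 2 True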

The fact that the rank of B in Lemma 4.9 is r follows from the following important result.

Theorem 4.10 rank(AB) ≤ min{rank(A), rank(B)}

Proof: Each row of AB is a linear combination of the rows of B. Thus

rank(AB) ≤ rank(B)

Similarly each column of AB is a linear combination of the columns of A so that

rank(AB) ≤ rank(A)

The conclusion follows.

Theorem 4.11 An n by n matrix is non singular if and only if it is of rank n.

Proof: For any matrix we can write

PAQ = [ I_r  O ]
      [ O    O ]

where P and Q are non singular. Since the right hand side of this equation is invertible if and only if r = n, the conclusion follows.

Definition 4.3 If A is an n by p matrix then the p by p matrix A*A (A^T A if A is real) is called the Gram matrix of A.

Theorem 4.12 If A is an n by p matrix then

(1) rank(A*A) = rank(A) = rank(AA*)

(2) rank(A^T A) = rank(A) = rank(AA^T) if A is real.

Proof: Assume that the field of scalars is such that Σ_i z̄i zi = 0 implies that z1 = z2 = · · · = zn = 0. If Ax = 0 then A*Ax = 0. Also if A*Ax = 0 then x*A*Ax = 0, i.e. z*z = 0 where z = Ax, so that z = 0, i.e. Ax = 0. Hence A*Ax = 0 and Ax = 0 are equivalent statements. It follows that

N(A*A) = {x : A*Ax = 0} = {x : Ax = 0} = N(A)

Thus rank(A*A) = rank(A). The conclusion that rank(AA*) = rank(A) follows by applying the same argument to A*.

Theorem 4.13 rank(A + B) ≤ rank(A) + rank(B)

Proof: If a1, a2, . . . , ar and b1, b2, . . . , bs denote linearly independent columns of A and B respectively then the vectors

{a1, a2, . . . , ar, b1, b2, . . . , bs}

span the column space of A + B. Hence

rank(A + B) ≤ r + s = rank(A) + rank(B)

Definition 4.4

(1) A square matrix A is idempotent if A^2 = A.

(2) A square matrix A is nilpotent if A^m = O for some integer m greater than one.

Definition 4.5 The trace of a square matrix is the sum of its diagonal elements, i.e.

tr(A) = Σ_i a_ii

Theorem 4.14 tr(AB) = tr(BA); tr(A + B) = tr(A) + tr(B); tr(A^T) = tr(A)

Theorem 4.15 If A is idempotent then

rank(A) = tr(A)

Proof: We can write A = BC where B is n by r of rank r, C is r by n of rank r and r is the rank of A. Thus we have

A^2 = BCBC = A = BC

Hence

B*BCBCC* = B*BCC*

Since B and C are each of rank r, the Gram matrices B*B and CC* are non singular, and multiplying on the left by (B*B)^{-1} and on the right by (CC*)^{-1} gives CB = I. Hence

tr(A) = tr(BC) = tr(CB) = tr(I_r) = r = rank(A)
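Projection matrices of the form X(X^T X)^{-1}X^T are a standard statistical example of idempotent matrices, and the rank-equals-trace property is easy to check numerically. A sketch (X is arbitrary illustrative data of full column rank):

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(6, 3))

    P = X @ np.linalg.inv(X.T @ X) @ X.T
    print(np.allclose(P @ P, P))                            # True: idempotent
    print(np.linalg.matrix_rank(P), round(np.trace(P)))     # 3 3: rank equals trace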


    Chapter 5

    Linear Equations

    5.1 Consistency Results

Consider a set of n linear equations in p unknowns

a11 x1 + a12 x2 + · · · + a1p xp = y1
a21 x1 + a22 x2 + · · · + a2p xp = y2
  ...
an1 x1 + an2 x2 + · · · + anp xp = yn

which may be compactly written in matrix notation as

Ax = y

where A is the n by p matrix with i, j element equal to a_ij, x is the p by one column vector with jth element equal to xj and y is the n by one column vector with ith element equal to yi.

Often such equations arise without knowledge of whether they are really equations, i.e. does there exist a vector x which satisfies the equations? If such an x exists the equations are said to be consistent, otherwise they are said to be inconsistent.

The equations Ax = y can be discussed from a vector space perspective since A is a carrier of a linear transformation from the p dimensional space spanned by the xi's to the n dimensional space where y lives. It follows that a solution exists if and only if y is in R(A), the range space of A. Thus a solution exists if and only if y is in the column space of A defined as the set of all vectors of the form {y : y = Ax for some x}. If we consider the augmented matrix [A, y] we have the following theorem on consistency.

Theorem 5.1 The equations Ax = y are consistent if and only if

rank([A, y]) = rank(A)

Proof: Obviously

rank([A, y]) ≥ rank(A)

If the equations have a solution then there exists x such that Ax = y so that

rank(A) ≤ rank([A, y]) = rank([A, Ax]) = rank(A[I, x]) ≤ rank(A)

It follows that

rank([A, y]) = rank(A)

if the equations are consistent.

Conversely, if

rank([A, y]) = rank(A)

then y is a linear combination of the columns of A so that a solution exists.

Theorem 5.1 gives conditions under which solutions to the equations Ax = y exist, but we need to know the form of the solution in order to explicitly solve the equations. The equations Ax = 0 are called the homogeneous equations. If x1 and x2 are solutions to the homogeneous equations then α1 x1 + α2 x2 is also a solution to the homogeneous equations. In general, if {x1, x2, . . . , xs} span the null space of A then x0 = Σ_{i=1}^s αi xi is a solution to the homogeneous equations. If xp is a particular solution to the equations Ax = y then xp + x0 is also a solution to the equations Ax = y. If x is any solution then x = xp + (x − xp) so that every solution is of this form. We thus have the following theorem.

Theorem 5.2 A general solution to the consistent equations Ax = y is of the form

x = xp + x0


where xp is any particular solution to the equations and x0 is any solution to the homogeneous equations Ax = 0.

A final existence question is to determine when the equations Ax = y have a unique solution. The solution is unique if and only if the only solution to the homogeneous equations is the 0 vector, i.e. the null space of A contains only the 0 vector. If the solution x is unique the homogeneous equations cannot have a non zero solution x0, since then x + x0 would be another solution. Conversely, if the homogeneous equations have only the 0 vector as a solution then the solution to Ax = y is unique (if there were two different solutions x1 and x2 then x1 − x2 would be a non zero solution to the homogeneous equations).

Since the null space of A contains only the 0 vector if and only if the columns of A are linearly independent, we have that the solution is unique if and only if the rank of A is equal to p, where we assume with no loss of generality that A is n by p with p ≤ n. We thus have the following theorem.

Theorem 5.3 Let A be n by p with p ≤ n. Then

(1) The equations Ax = y have the general solution

x = xp + x0

where Axp = y and Ax0 = 0, if and only if

rank([A, y]) = rank(A)

(2) The solution is unique (x0 = 0) if and only if

rank(A) = p
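The consistency condition of Theorem 5.1 translates directly into a rank comparison. A NumPy sketch (the system below is illustrative; the helper name consistent is hypothetical):

    import numpy as np

    def consistent(A, y):
        # Theorem 5.1: Ax = y has a solution iff rank([A, y]) = rank(A).
        return np.linalg.matrix_rank(np.column_stack([A, y])) == np.linalg.matrix_rank(A)

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0],
                  [3.0, 6.0]])                       # rank 1
    print(consistent(A, np.array([1.0, 2.0, 3.0])))  # True: y is in the column space
    print(consistent(A, np.array([1.0, 2.0, 4.0])))  # False: the system is inconsistent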

    5.2 Solutions to Linear Equations

5.2.1 A is p by p of rank p

Here there is a unique solution by Theorem 5.3 with explicit form given by A^{-1}y. The proof of this result is trivial given the following Lemma.


Lemma 5.4 A p by p matrix A is non singular if and only if rank(A) = p.

Proof: If A^{-1} exists it is the unique matrix satisfying

A^{-1}A = AA^{-1} = I

Thus p = rank(A^{-1}A) ≤ rank(A) ≤ p, i.e. the rank of A is p if A is non singular.

Conversely, if rank(A) = p then there exists a unique solution xi to the equation Axi = ei for i = 1, 2, . . . , p. If we define B by

B = [x1, x2, . . . , xp]

then AB = I. Similarly there exists a unique solution zi to the equation A^T zi = ei for i = 1, 2, . . . , p, so that if we define

C^T = [z1, z2, . . . , zp]

then A^T C^T = I, or CA = I. It follows that C = B, i.e. that A^{-1} exists.

5.2.2 A is n by p of rank p

If A is n by p of rank p where p ≤ n then by Theorem 5.3 a unique solution exists and we need only find its form. Since the rank of A is p the rank of A^T A is p and hence A^T A has an inverse. Thus if we define x1 = (A^T A)^{-1}A^T y we have that

Ax1 = A(A^T A)^{-1}A^T y = A(A^T A)^{-1}A^T (Ax) = Ax = y

so that x1 is the unique solution to the equations.

5.2.3 A is n by p of rank r where r ≤ p ≤ n

In this case a solution may not exist and even if a solution exists it may not be unique.

Suppose that a matrix A^- can be found such that

AA^-A = A

Such a matrix A^- is called a generalized inverse of A.

If the equations Ax = y have a solution then x1 = A^-y is a solution since

Ax1 = A(A^-y) = AA^-(Ax) = AA^-Ax = Ax = y

The general solution is thus obtained by characterizing the solutions to the homogeneous equations, i.e. we need to characterize the vectors in the null space of A. If z is an arbitrary vector then (I − A^-A)z is in the null space of A. The dimension of the set U of vectors which can be written in this form is equal to the rank of I − A^-A. Since

(I − A^-A)^2 = I − A^-A − A^-A + A^-AA^-A = I − A^-A

we see that

rank(I − A^-A) = tr(I − A^-A)
             = p − tr(A^-A)
             = p − rank(A^-A)
             = p − rank(A)
             = p − r

It follows that U is of dimension p − r and hence U = N(A). Thus we have the following theorem.

Theorem 5.5 If A is n by p of rank r ≤ p ≤ n then

(1) If a solution exists the general solution is given by

x = A^-y + (I − A^-A)z

where z is an arbitrary p by one vector and A^- is a generalized inverse of A.

(2) If x0 = By is a solution to Ax = y for every y for which the equations are consistent then B is a generalized inverse of A.

Proof: Since we have established (1) we prove (2). Let y = Aω where ω is arbitrary. The equations

Ax = Aω

have a solution by assumption which is of the form BAω. It follows that ABAω = Aω. Since ω is arbitrary we have that ABA = A, i.e. B is a generalized inverse of A.
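The Moore–Penrose inverse computed by NumPy's pinv is one particular generalized inverse, so it can be used to illustrate Theorem 5.5. A sketch with an illustrative rank-deficient system:

    import numpy as np

    rng = np.random.default_rng(5)
    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])       # n = 2, p = 3, rank 1
    y = A @ rng.normal(size=3)             # guarantees consistency

    Aminus = np.linalg.pinv(A)
    print(np.allclose(A @ Aminus @ A, A))  # True: A A^- A = A

    # General solution x = A^- y + (I - A^- A) z for arbitrary z
    z = rng.normal(size=3)
    x = Aminus @ y + (np.eye(3) - Aminus @ A) @ z
    print(np.allclose(A @ x, y))           # True: every such x solves the system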


    5.2.4 Generalized Inverses

We now establish that a generalized inverse of A always exists. Recall that we can find non singular matrices P and Q such that

PAQ = B, i.e. A = P^{-1}BQ^{-1}

where B is given by

B = [ I_r  O ]
    [ O    O ]

Matrix multiplication shows that BB^-B = B where

B^- = [ I_r  U ]
      [ V    W ]

and U, V and W are arbitrary (of conformable orders). Since

A(QB^-P)A = P^{-1}BQ^{-1} Q B^- P P^{-1}BQ^{-1} = P^{-1}BB^-BQ^{-1} = P^{-1}BQ^{-1} = A

we see that QB^-P is a generalized inverse of A, establishing the existence of a generalized inverse for any matrix.

We also note that if

PA = [ I_r  C ]
     [ O    O ]

then another generalized inverse of A is B^-P where

B^- = [ I_r  O ]
      [ O    O ]


    Chapter 6

    Determinants

    6.1 Definition and Properties

Definition 6.1 Let A be a square matrix with columns {x1, x2, . . . , xp}. The determinant of A is that function det of the p columns, taking values in R, such that

(1) det(x1, x2, . . . , c·xm, . . . , xp) = c det(x1, x2, . . . , xm, . . . , xp)

(2) det(x1, x2, . . . , xm + xk, . . . , xp) = det(x1, x2, . . . , xm, . . . , xp) for k ≠ m

(3) det(e1, e2, . . . , ep) = 1 where ei is the ith unit vector.

Properties of Determinants

(1) If xi = 0 then xi = 0·xi so that, by (1) of the definition, det(x1, x2, . . . , xp) = 0 if xi = 0.

(2) det(x1, x2, . . . , xm + c·xk, . . . , xp) = det(x1, x2, . . . , xm, . . . , xp). If c = 0 this result is trivial. If c ≠ 0 we note that

det(x1, . . . , xm + c·xk, . . . , xk, . . . , xp) = (1/c) det(x1, . . . , xm + c·xk, . . . , c·xk, . . . , xp)
                                                 = (1/c) det(x1, . . . , xm, . . . , c·xk, . . . , xp)
                                                 = det(x1, . . . , xm, . . . , xk, . . . , xp)

(3) If two columns of A are identical then det(A) = 0. Similarly if the columns of A are linearly dependent then the determinant of A is 0.

(4) If xk and xm are interchanged then the determinant changes sign, i.e.

det(x1, . . . , xm, . . . , xk, . . . , xp) = −det(x1, . . . , xk, . . . , xm, . . . , xp)

To prove this we note that

det(x1, . . . , xm, . . . , xk, . . . , xp) = det(x1, . . . , xm + xk, . . . , xk, . . . , xp)
  = det(x1, . . . , xm + xk, . . . , xk − (xm + xk), . . . , xp)
  = det(x1, . . . , xm + xk, . . . , −xm, . . . , xp)
  = −det(x1, . . . , xm + xk, . . . , xm, . . . , xp)
  = −det(x1, . . . , xk, . . . , xm, . . . , xp)

Thus the interchange of an even number of pairs of columns does not change the sign of the determinant while the interchange of an odd number of pairs does change the sign.

(5) det(x1, x2, . . . , xm + y, . . . , xp) is equal to

det(x1, x2, . . . , xm, . . . , xp) + det(x1, x2, . . . , y, . . . , xp)

If the vectors xi for i ≠ m are dependent the assertion is obvious since both terms are 0. If the xi are linearly independent for i ≠ m and xm = Σ_{i≠m} ci xi, the assertion follows by repeated use of (2) and the fact that the first term of the expression is always zero.

If all of the xi are linearly independent then y = Σ_i di xi so that

det(x1, . . . , xm + y, . . . , xp) = det(x1, . . . , xm + Σ_i di xi, . . . , xp)
  = det(x1, . . . , (1 + dm)xm, . . . , xp)
  = (1 + dm) det(x1, . . . , xm, . . . , xp)
  = det(x1, . . . , xp) + det(x1, . . . , dm xm, . . . , xp)
  = det(x1, . . . , xp) + det(x1, . . . , Σ_i di xi, . . . , xp)
  = det(x1, . . . , xp) + det(x1, . . . , y, . . . , xp)

Using the above results we can show that the determinant of a matrix is a well defined function of a matrix and give a method for computing the determinant. Since any column vector of A, say xm, can be written as

xm = Σ_{i=1}^p a_im ei

we have that

det(x1, x2, . . . , xp) = det(Σ_{i1} a_{i1 1} e_{i1}, x2, . . . , xp)
  = Σ_{i1} a_{i1 1} det(e_{i1}, x2, . . . , xp)
  = Σ_{i1} Σ_{i2} a_{i1 1} a_{i2 2} det(e_{i1}, e_{i2}, . . . , xp)
  = Σ_{i1} Σ_{i2} · · · Σ_{ip} a_{i1 1} a_{i2 2} · · · a_{ip p} det(e_{i1}, e_{i2}, . . . , e_{ip})

Now note that det(e_{i1}, e_{i2}, . . . , e_{ip}) is 0 if any of the subscripts are the same, +1 if i1, i2, . . . , ip is an even permutation of 1, 2, . . . , p and −1 if i1, i2, . . . , ip is an odd permutation of 1, 2, . . . , p. Hence we can evaluate the determinant of a matrix, and it is a uniquely defined function satisfying the conditions (1), (2) and (3).

    6.2 Computation

The usual method for computing a determinant is to note that if xm = Σ_i a_im ei then

det(x1, x2, . . . , xp) = Σ_i a_im det(x1, x2, . . . , ei, . . . , xp) = Σ_i a_im A_im

where

A_im = det(x1, x2, . . . , ei, . . . , xp)

is called the cofactor of a_im and is simply the determinant of A when the mth column of A is replaced by ei.

Expanding A_im over the remaining columns as in the permutation formula above, and counting the sign of the permutation required to bring ei to the mth position, shows that

A_im = (−1)^{i+m} det(M_im)

where M_im is the matrix formed by deleting the ith row of A and the mth column of A. Thus the value of the determinant of a p by p matrix can be reduced to the calculation of p determinants of p − 1 by p − 1 matrices. The determinant of M_im is called the minor of a_im.
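A direct implementation of cofactor expansion is short, though its cost grows like p!, so in practice determinants are computed by elimination. A sketch expanding along the first column (the matrix is illustrative):

    import numpy as np

    def det_by_cofactors(A):
        # Expand along the first column: sum_i a_i1 * (-1)^(i+1) * det(M_i1).
        p = A.shape[0]
        if p == 1:
            return A[0, 0]
        total = 0.0
        for i in range(p):
            minor = np.delete(np.delete(A, i, axis=0), 0, axis=1)  # drop row i, column 1
            total += A[i, 0] * (-1) ** i * det_by_cofactors(minor)
        return total

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 4.0],
                  [0.0, 5.0, 6.0]])
    print(det_by_cofactors(A), np.linalg.det(A))   # the two values agree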


    Chapter 7

    Inner Product Spaces

7.1 Definitions and Elementary Properties

Definition 7.1 An inner product on a linear vector space V is a function ( , ) mapping V × V into C such that

1. (x, x) ≥ 0 with (x, x) = 0 if and only if x = 0.

2. (x, y) is the complex conjugate of (y, x).

3. (αx + βy, z) = α(x, z) + β(y, z)

A vector space with an inner product is called an inner product space.

Definition 7.2: The norm or length of x, ||x||, is defined by

||x||^2 = (x, x)

Definition 7.3: The vectors x and y are said to be orthogonal if (x, y) = 0. We write x ⊥ y if x is orthogonal to y. More generally

(1) x is said to be orthogonal to the set of vectors X if

(x, y) = 0 for every y ∈ X

(2) X and Y are said to be orthogonal if

(x, y) = 0 for every x ∈ X and y ∈ Y

Definition 7.4: X is an orthonormal set of vectors if, for x, y ∈ X,

(x, y) = 1 if x = y and (x, y) = 0 if x ≠ y

X is a complete orthonormal set of vectors if it is not contained in a larger set of orthonormal vectors.

Lemma 7.1 If X is an orthonormal set then its vectors are linearly independent.

Proof: If Σ_i αi xi = 0 for xi ∈ X then

0 = (Σ_i αi xi, xj) = Σ_i αi (xi, xj) = αj

It follows that each of the αj are equal to 0 and hence that the xj's are linearly independent.

Lemma 7.2 Let {x1, x2, . . . , xp} be an orthonormal basis for X. Then x ⊥ X if and only if x ⊥ xi for i = 1, 2, . . . , p.

Proof: If x ⊥ X then by definition x ⊥ xi for i = 1, 2, . . . , p. Conversely, suppose x ⊥ xi for i = 1, 2, . . . , p. Then if y ∈ X we have y = Σ_i αi xi. Thus

(x, y) = Σ_i ᾱi (x, xi) = 0

Thus x ⊥ X.

Theorem 7.3 (Bessel's Inequality) Let {x1, x2, . . . , xp} be an orthonormal set in a linear vector space V with an inner product. Then

1. Σ_i |αi|^2 ≤ ||x||^2 where x is any vector in V and αi = (x, xi).

2. The vector r = x − Σ_i αi xi is orthogonal to each xj and hence to the space spanned by {x1, x2, . . . , xp}.


Proof: We note that

||x − Σ_i αi xi||^2 = (x − Σ_i αi xi, x − Σ_i αi xi)
  = (x, x) − Σ_i αi (xi, x) − (x, Σ_i αi xi) + (Σ_i αi xi, Σ_i αi xi)
  = ||x||^2 − Σ_i αi ᾱi − Σ_i ᾱi (x, xi) + Σ_i αi ᾱi (xi, xi)
  = ||x||^2 − 2 Σ_i |αi|^2 + Σ_i |αi|^2
  = ||x||^2 − Σ_i |αi|^2

Since ||x − Σ_i αi xi||^2 ≥ 0, result (1) follows.

For result (2) we note that

(x − Σ_i αi xi, xj) = (x, xj) − Σ_i αi (xi, xj) = (x, xj) − (x, xj) = 0

Theorem 7.4 (Cauchy–Schwarz Inequality) If x and y are vectors in an inner product space then

|(x, y)| ≤ ||x|| ||y||

Proof: If y = 0 equality holds. If y ≠ 0 then y/||y|| is orthonormal and by Bessel's inequality we have

|(x, y/||y||)|^2 ≤ ||x||^2 or |(x, y)| ≤ ||x|| ||y||

Theorem 7.5 If X = {x1, x2, . . . , xn} is any finite orthonormal set in a vector space V then the following conditions are equivalent:

(1) X is complete.

(2) (x, xi) = 0 for i = 1, 2, . . . , n implies that x = 0.

(3) The space spanned by X is equal to V.

(4) If x ∈ V then x = Σ_i (x, xi) xi.

(5) If x and y are in V then

(x, y) = Σ_i (x, xi)(xi, y)

(6) If x ∈ V then

||x||^2 = Σ_i |(xi, x)|^2

Result (5) is called Parseval's identity.

Proof:

(1) ⟹ (2): If X is complete and (x, xi) = 0 for each i with x ≠ 0, then x/||x|| could be added to the set X, which would be a contradiction. Thus x = 0.

(2) ⟹ (3): If some x is not a linear combination of the xi then x − Σ_i (x, xi)xi is not equal to 0 and is orthogonal to X, which is a contradiction. Thus the space spanned by X is equal to V.

(3) ⟹ (4): If x ∈ V we can write x = Σ_j αj xj. It follows that

(x, xi) = Σ_j αj (xj, xi) = αi

(4) ⟹ (5): Since

x = Σ_i (x, xi) xi and y = Σ_j (y, xj) xj

we have

(x, y) = (Σ_i (x, xi) xi, Σ_j (y, xj) xj) = Σ_i (x, xi) (y, xi)‾ = Σ_i (x, xi)(xi, y)

(5) ⟹ (6): In result (5) simply set x = y.

(6) ⟹ (1): If xo is orthogonal to each xi then by (6) we have

||xo||^2 = Σ_i |(xi, xo)|^2 = 0

so that xo = 0 and X is complete.

    7.2 Gram Schmidt Process

The Gram Schmidt process can be used to construct an orthonormal basis for a finite dimensional vector space. Start with a basis {x1, x2, . . . , xn} for V. Form

y1 = x1 / ||x1||

Next define

z2 = x2 − (x2, y1) y1

Since the xi are linearly independent, z2 ≠ 0 and z2 is orthogonal to y1. Hence y2 = z2/||z2|| is orthogonal to y1 and has unit norm. If y1, y2, . . . , yr have been so chosen then we form

z_{r+1} = x_{r+1} − Σ_{i=1}^r (x_{r+1}, yi) yi

Since the xi are linearly independent, ||z_{r+1}|| > 0, and since z_{r+1} ⊥ yi for i = 1, 2, . . . , r it follows that

y_{r+1} = z_{r+1} / ||z_{r+1}||

may be added to the set {y1, y2, . . . , yr} to form a new orthonormal set. The process necessarily stops with yn since there can be at most n elements in a linearly independent set.
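The process translates almost line for line into code. A sketch for the real inner product (x, y) = x^T y (the input matrix is arbitrary illustrative data and gram_schmidt is a hypothetical helper name):

    import numpy as np

    def gram_schmidt(X):
        # Orthonormalize the linearly independent columns of X, as in Section 7.2.
        Y = []
        for x in X.T:
            z = x - sum(np.dot(x, y) * y for y in Y)   # subtract projections on earlier y's
            Y.append(z / np.linalg.norm(z))
        return np.column_stack(Y)

    rng = np.random.default_rng(6)
    X = rng.normal(size=(5, 3))
    Q = gram_schmidt(X)
    print(np.allclose(Q.T @ Q, np.eye(3)))   # True: the columns are orthonormal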


    7.3 Orthogonal Projections

Theorem 7.6 (Orthogonal Projection) Let U be a subspace of an inner product space V and let y be a vector in V which is not in U. Then there exists a unique vector yU ∈ U and a unique vector e ∈ V such that

y = yU + e and e ⊥ U

Proof: Let {x1, x2, . . . , xr} be a basis of U. Since y ∉ U the set {x1, x2, . . . , xr, y} is a linearly independent set. Use the Gram Schmidt process on this set to form a set of orthonormal vectors {y1, y2, . . . , yr, y_{r+1}}. Since y is in the space spanned by these vectors we can write

y = Σ_{i=1}^{r+1} αi yi = Σ_{i=1}^{r} αi yi + α_{r+1} y_{r+1}

If we define

yU = Σ_{i=1}^{r} αi yi and e = α_{r+1} y_{r+1}

then yU ∈ U and e ⊥ U. To show uniqueness, suppose that we also have

y = z1 + e1 where z1 ∈ U and e1 ⊥ U

Then we have (z1 − yU) + (e1 − e) = 0, and since z1 − yU ∈ U while e1 − e ⊥ U it follows that (z1 − yU, z1 − yU) = 0, and hence z1 = yU and e1 = e.

Definition 7.4 The vector e in Theorem 7.6 is called the orthogonal projection from y to U and the vector yU is called the orthogonal projection of y on U.

Theorem 7.7 The projection of y on U has the property that

||y − yU|| = min{||y − x|| : x ∈ U}

Proof: Let x ∈ U. Then (x − yU, e) = 0. Hence

||y − x||^2 = (y − x, y − x)
           = (y − yU + yU − x, y − yU + yU − x)
           = ||y − yU||^2 + ||yU − x||^2 + (y − yU, yU − x) + (yU − x, y − yU)
           = ||y − yU||^2 + ||yU − x||^2
           ≥ ||y − yU||^2

with the minimum occurring when x = yU. Thus the projection minimizes the distance from y to U.


    Chapter 8

    Characteristic Roots and Vectors

    8.1 Definitions

Definition 8.1: A scalar λ is a characteristic root and a non-zero vector x is a characteristic vector of the matrix A if

Ax = λx

Other names for characteristic roots are proper value, latent root, eigenvalue and secular value, with similar adjectives applying to characteristic vectors.

If λ is a characteristic root of A let the set Xλ be defined by

Xλ = {x : Ax = λx}

The geometric multiplicity of λ is defined to be the dimension of Xλ. λ is said to be a simple characteristic root if its geometric multiplicity is 1.

Definition 8.2: The spectrum of A is the set of all characteristic roots of A and is denoted by σ(A).

Since Ax = λx if and only if [A − λI]x = 0, we see that λ ∈ σ(A) if and only if A − λI is singular. Similarly x is a characteristic vector corresponding to λ if and only if x is in the null space of A − λI. Using the properties of determinants we have the following Theorem.

Theorem 8.1 λ ∈ σ(A) if and only if det(A − λI) = 0.

If the field of scalars is the set of complex numbers (or any algebraically closed field) then if A is p by p we have

det(A − λI) = Π_{i=1}^q (λi − λ)^{ni}

where Σ_{i=1}^q ni = p and ni ≥ 1.

The algebraic multiplicity of the characteristic root λi is the number ni which appears in the above expression.

The algebraic and geometric multiplicity of a characteristic root need not correspond. To see this let

A = [ 1  1 ]
    [ 0  1 ]

Then

det(A − λI) = det [ 1−λ  1   ]  = (λ − 1)^2
                  [ 0    1−λ ]

It follows that both of the characteristic roots of A are equal to 1. Hence the algebraic multiplicity of the characteristic root 1 is equal to 2. However, the equation

Ax = [ 1  1 ] [ x1 ]  =  [ x1 ]
     [ 0  1 ] [ x2 ]     [ x2 ]

yields the equations

x1 + x2 = x1
x2 = x2

so that x2 = 0. Thus the geometric multiplicity of the characteristic root of A is equal to 1.
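The same calculation can be done numerically: the eigenvalue routine reports the roots with algebraic multiplicity, while the geometric multiplicity is the dimension of the null space of A − λI. A sketch for the matrix above:

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [0.0, 1.0]])

    print(np.linalg.eigvals(A))                         # [1. 1.]: algebraic multiplicity 2

    # Geometric multiplicity = dim N(A - 1*I) = 2 - rank(A - I)
    print(2 - np.linalg.matrix_rank(A - np.eye(2)))     # 1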

Lemma 8.2

(1) The spectrum of TAT^{-1} is the same as the spectrum of A.

(2) If λ is a characteristic root of A then p(λ) is a characteristic root of p(A) where p is any polynomial.

Proof: Since TAT^{-1} − λI = T[A − λI]T^{-1} it follows that [A − λI]x = 0 if and only if [TAT^{-1} − λI]Tx = 0, and the spectra are thus the same.

If Ax = λx then A^n x = λ^n x for any integer n ≥ 0, so that p(A)x = p(λ)x.

    8.2 Factorizations

Theorem 8.3 (Schur's Triangular Factorization) Let A be any p by p matrix. Then there exists a unitary matrix U, i.e. U*U = I, such that

    U*AU = T

where T is an upper right triangular matrix with diagonal elements equal to the characteristic roots of A.

Proof: Let p = 2 and let u_1 be such that

    Au_1 = λ_1 u_1   and   u_1*u_1 = 1

Define the matrix U = [u_1, u_2] where u_2 is chosen so that U*U = I. Then

    U*AU = [u_1, u_2]*A[u_1, u_2]
         = [ u_1* ] [λ_1 u_1, Au_2]
           [ u_2* ]
         = [ λ_1   u_1*Au_2 ]
           [  0    u_2*Au_2 ]

Note that u_2*Au_2 = λ_2 is the other characteristic root of A. Now assume that the result is true for p = N and let p = N + 1. Let Au_1 = λ_1 u_1 with u_1*u_1 = 1 and let {b_1, b_2, . . . , b_N} be such that

    U_1 = [u_1, b_1, b_2, . . . , b_N]


is unitary. Thus

    U_1*AU_1 = [u_1, b_1, b_2, . . . , b_N]*A[u_1, b_1, b_2, . . . , b_N]
             = [u_1, b_1, b_2, . . . , b_N]*[λ_1 u_1, Ab_1, Ab_2, . . . , Ab_N]
             = [ λ_1   b*  ]
               [  0    B_N ]

where b* = (u_1*Ab_1, . . . , u_1*Ab_N). Since the characteristic roots of the above matrix are the solutions to the equation

    (λ_1 − λ) det(B_N − λI) = 0

it follows that the characteristic roots of B_N are the remaining characteristic roots λ_2, λ_3, . . . , λ_{N+1} of A. By the induction assumption there exists a unitary matrix L such that

    L*B_N L

is upper right triangular with diagonal elements equal to λ_2, λ_3, . . . , λ_{N+1}. Let

    U_{N+1} = [ 1   0ᵀ ]   and   U = U_1 U_{N+1}
              [ 0   L  ]

Then

    U*AU = U_{N+1}* U_1*AU_1 U_{N+1}
         = U_{N+1}* [ λ_1   b*  ] U_{N+1}
                    [  0    B_N ]
         = [ 1   0ᵀ ] [ λ_1   b*  ] [ 1   0ᵀ ]
           [ 0   L* ] [  0    B_N ] [ 0   L  ]
         = [ λ_1   b*L     ]
           [  0    L*B_N L ]

which is upper right triangular as claimed, with the diagonal elements equal to the characteristic roots of A.
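A numerical illustration of Schur's factorization (not part of the original notes; it assumes Python with NumPy and SciPy, and the matrix is randomly generated): scipy.linalg.schur returns a unitary U and a triangular T with A = UTU* and the characteristic roots of A on the diagonal of T.

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(1)
    A = rng.normal(size=(4, 4))

    # output='complex' guarantees a triangular T even when A has complex roots
    T, U = schur(A, output='complex')

    print(np.allclose(U.conj().T @ U, np.eye(4)))    # U is unitary: U*U = I
    print(np.allclose(U @ T @ U.conj().T, A))        # A = U T U*
    print(np.allclose(np.triu(T), T))                # T is upper right triangular
    # diagonal of T holds the characteristic roots of A (up to ordering)
    print(np.sort_complex(np.diag(T)))
    print(np.sort_complex(np.linalg.eigvals(A)))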

Theorem 8.4 If A is normal (A*A = AA*) then there exists a unitary matrix U such that

    U*AU = D

where D is diagonal with elements equal to the characteristic roots of A.

Proof: By Theorem 8.3 there is a unitary matrix U such that U*AU = T where T is upper right triangular. Thus we also have U*A*U = T*, where T* is lower left triangular. It follows that

    U*AA*U = TT*   and   U*A*AU = T*T

Since A is normal, TT* = T*T; comparing the diagonal elements of TT* and T*T successively shows that the off-diagonal elements of T vanish, i.e. T is diagonal as claimed.

Theorem 8.5 (Cochran's Theorem) If A is normal then

    A = Σ_{i=1}^{n} λ_i E_i

where E_i E_j = O for i ≠ j, E_i² = E_i, and the λ_i are the characteristic roots of A.

Proof: By Theorem 8.4 there is a unitary matrix U such that U*AU = D is diagonal with diagonal elements equal to the characteristic roots of A. Hence

    A = UDU*
      = U [ λ_1 u_1* ]
          [ λ_2 u_2* ]
          [    ...   ]
          [ λ_n u_n* ]
      = Σ_{i=1}^{n} λ_i u_i u_i*

Defining E_i = u_i u_i* completes the proof.

This representation of A is called the spectral representation of A. In most applications A is symmetric.
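The spectral representation is easy to verify numerically. The sketch below (not part of the original notes; it assumes Python with NumPy and uses a randomly generated real symmetric matrix) builds the projections E_i = u_i u_i^T and checks the properties stated in Theorem 8.5.

    import numpy as np

    rng = np.random.default_rng(2)
    B = rng.normal(size=(4, 4))
    A = B + B.T                          # real symmetric, hence normal

    lam, U = np.linalg.eigh(A)           # U^T A U = diag(lam)
    E = [np.outer(U[:, i], U[:, i]) for i in range(4)]

    print(np.allclose(sum(l * Ei for l, Ei in zip(lam, E)), A))  # A = sum lam_i E_i
    print(np.allclose(E[0] @ E[0], E[0]))                        # E_i^2 = E_i
    print(np.allclose(E[0] @ E[1], 0))                           # E_i E_j = O, i != j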

The spectral representation can be rewritten as

    A = Σ_{i=1}^{n} λ_i E_i = Σ_{j=1}^{q} λ_j E_j

where λ_1 < λ_2 < · · · < λ_q are the distinct characteristic roots of A (each E_j now being the sum of the projections associated with the root λ_j). Now define

    S_0 = O   and   S_i = Σ_{j=1}^{i} E_j   for i = 1, 2, . . . , q

Then E_i = S_i − S_{i−1} for i = 1, 2, . . . , q and hence

    A = Σ_{j=1}^{q} λ_j E_j
      = Σ_{j=1}^{q} λ_j (S_j − S_{j−1})
      = Σ_{j=1}^{q} λ_j ΔS_j

which may be written as a Stieltjes integral, i.e.

    A = ∫ λ dS(λ)

which is also called the spectral representation of A.

The spectral representation of A has the property that

    (Ax, y) = (∫ λ dS(λ) x, y)
            = (Σ_{j=1}^{q} λ_j ΔS_j x, y)
            = Σ_{j=1}^{q} λ_j (ΔS_j x, y)
            = Σ_{j=1}^{q} λ_j [(S_j x, y) − (S_{j−1} x, y)]
            = ∫ λ d(S(λ)x, y)

In more advanced treatments applicable to Hilbert spaces the property just established is taken as the defining property of ∫ λ d(S(λ)x, y).


    8.3 Quadratic Forms

If A is a symmetric matrix with real elements, x^T A x is called a quadratic form. A quadratic form is said to be

    positive definite if x^T A x > 0 for all x ≠ 0

    non-negative definite if x^T A x ≥ 0 for all x

If A is positive definite then all of its characteristic roots are positive, while if A is non-negative definite then all of its characteristic roots are non-negative. To prove this note that if x_i is a characteristic vector of A corresponding to λ_i then Ax_i = λ_i x_i and hence x_i^T A x_i = λ_i x_i^T x_i.
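A small numerical illustration of these definitions (not part of the original notes; it assumes Python with NumPy, and the matrices are made up for illustration): the signs of the characteristic roots determine the definiteness of the quadratic form.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])          # x^T A x = 2*x1^2 + 2*x1*x2 + 2*x2^2
    print(np.all(np.linalg.eigvalsh(A) > 0))     # True: A is positive definite

    B = np.array([[1.0, 1.0],
                  [1.0, 1.0]])          # x^T B x = (x1 + x2)^2 >= 0
    print(np.all(np.linalg.eigvalsh(B) >= 0))    # True: B is non-negative definite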

If λ_i ≠ λ_j then the corresponding characteristic vectors are orthogonal since

    λ_j x_i^T x_j = x_i^T A x_j = (A x_i)^T x_j = λ_i x_i^T x_j

which implies that x_i^T x_j = 0, i.e. that x_i and x_j are orthogonal if λ_i ≠ λ_j.

Another way of characterizing characteristic roots is to consider the stationary values of

    x^T A x / x^T x

as x varies over R^p. Since

    A = Σ_{i=1}^{p} λ_i E_i

where λ_1 ≥ λ_2 ≥ · · · ≥ λ_p, we have that

    Ax = Σ_{i=1}^{p} λ_i E_i x

It follows that

    x^T A x = Σ_{i=1}^{p} λ_i x^T E_i x


            = Σ_{i=1}^{p} λ_i x^T E_i E_i^T x
            = Σ_{i=1}^{p} λ_i ||E_i x||²

Since E_i = y_i y_i^T, where the y_i form an orthonormal basis, we have that U^T U = I where

    U = [y_1, y_2, . . . , y_p]

It follows that U^{-1} = U^T and hence that UU^T = I. Thus

    I = UU^T
      = [y_1, y_2, . . . , y_p] [y_1, y_2, . . . , y_p]^T
      = Σ_{i=1}^{p} y_i y_i^T
      = Σ_{i=1}^{p} E_i

It follows that

    x^T x = Σ_{i=1}^{p} x^T E_i x
          = Σ_{i=1}^{p} x^T E_i E_i^T x
          = Σ_{i=1}^{p} ||E_i x||²

Thus

    x^T A x / x^T x = Σ_{i=1}^{p} λ_i ||E_i x||² / Σ_{i=1}^{p} ||E_i x||²
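The ratio above therefore lies between λ_p and λ_1 for every x ≠ 0, with the extremes attained at the corresponding characteristic vectors. A quick numerical check (not part of the original notes; it assumes Python with NumPy and a randomly generated symmetric matrix):

    import numpy as np

    rng = np.random.default_rng(3)
    B = rng.normal(size=(5, 5))
    A = (B + B.T) / 2                   # symmetric matrix

    lam, V = np.linalg.eigh(A)          # eigenvalues in increasing order

    def rayleigh(x):
        return (x @ A @ x) / (x @ x)

    x = rng.normal(size=5)
    print(lam[0] <= rayleigh(x) <= lam[-1])          # True for any nonzero x
    print(np.isclose(rayleigh(V[:, -1]), lam[-1]))   # maximum at top eigenvector
    print(np.isclose(rayleigh(V[:, 0]), lam[0]))     # minimum at bottom eigenvector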


Theorem 8.8 A necessary and sufficient condition that a symmetric matrix A on a real inner product space satisfies A = 0 is that ⟨Ax, x⟩ = 0 for all x.

Proof: Necessity is obvious.

To prove sufficiency we note that

    ⟨A(x + y), x + y⟩ = ⟨Ax, x⟩ + ⟨Ax, y⟩ + ⟨Ay, x⟩ + ⟨Ay, y⟩

It follows that

    ⟨Ax, y⟩ + ⟨Ay, x⟩ = ⟨A(x + y), x + y⟩ − ⟨Ax, x⟩ − ⟨Ay, y⟩ = 0

Since A is symmetric and the space is real, ⟨Ay, x⟩ = ⟨Ax, y⟩, so that ⟨Ax, y⟩ = 0 for every y. Taking y = Ax gives Ax = 0 for every x, and hence A = 0.

    8.4 Jordan Canonical Form

The most general canonical form for matrices is the Jordan Canonical Form. This particular canonical form is used in the theory of Markov chains.

The Jordan canonical form is a nearly diagonal canonical form for a matrix. Let J_{p_i}(λ_i) be the p_i × p_i matrix of the form

    J_{p_i}(λ_i) = λ_i I_{p_i} + N_{p_i}

where

    N_{p_i} = [ 0  1  0  · · ·  0  0 ]
              [ 0  0  1  · · ·  0  0 ]
              [ .  .  .   . .   .  . ]
              [ 0  0  0  · · ·  0  1 ]
              [ 0  0  0  · · ·  0  0 ]

is a nilpotent matrix of order p_i, i.e. N_{p_i}^{p_i} = 0.


Theorem 8.9 (Jordan Canonical Form). Let λ_1, λ_2, . . . , λ_q be the q distinct characteristic roots of a p × p matrix A. Then there exists a non-singular matrix V such that

    VAV⁻¹ = diag (J_{p_1}(λ_1), J_{p_2}(λ_2), . . . , J_{p_q}(λ_q))

One use of the Jordan canonical form occurs when we want an expression for Aⁿ. Using the Jordan canonical form we see that

    VAⁿV⁻¹ = diag (J_{p_1}ⁿ, J_{p_2}ⁿ, . . . , J_{p_q}ⁿ)

Thus if n ≥ max_i p_i we have

    J_{p_i}ⁿ(λ_i) = (λ_i I_{p_i} + N_{p_i})ⁿ
                  = Σ_{r=0}^{n} (n choose r) λ_i^{n−r} N_{p_i}^r

by the binomial expansion. Since N_{p_i}^r = 0 if r ≥ p_i we obtain

    J_{p_i}ⁿ(λ_i) = Σ_{r=0}^{p_i−1} (n choose r) λ_i^{n−r} N_{p_i}^r

Thus for large n we have a relatively simple expression for Aⁿ.
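The binomial formula for powers of a Jordan block can be checked directly. The sketch below (not part of the original notes; it assumes Python with NumPy, and the block size, root and power are made up for illustration) compares the formula with a direct matrix power.

    import numpy as np
    from math import comb

    lam, p, n = 0.5, 3, 10
    N = np.diag(np.ones(p - 1), k=1)            # nilpotent part: N^p = 0
    J = lam * np.eye(p) + N                     # the Jordan block J_p(lam)

    # J^n = sum_{r=0}^{p-1} C(n, r) * lam^(n-r) * N^r
    Jn = sum(comb(n, r) * lam**(n - r) * np.linalg.matrix_power(N, r)
             for r in range(p))

    print(np.allclose(Jn, np.linalg.matrix_power(J, n)))   # True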

8.5 The Singular Value Decomposition (SVD) of a Matrix

Theorem 8.10 If A is any n by p matrix then there exist matrices U, V and D such that

    A = UDV^T

where

    U is n by p and U^T U = I_p

    V is p by p and V^T V = I_p

    D = diag (d_1, d_2, . . . , d_p)   with   d_1 ≥ d_2 ≥ · · · ≥ d_p ≥ 0

Proof: Let x be such that ||x|| = 1 and ||Ax|| = ||A||. That such an x can be chosen follows from the fact that the norm is a continuous function on a closed and bounded subset of R^p and hence achieves its maximum at a point in the set. Let y_1 = Ax and define

    y = y_1 / ||A||

Then ||y|| = 1 and

    Ax = ||A|| y = σ y   where σ = ||A||

Let U_1 and V_1 be such that

    U = [y, U_1]   and   V = [x, V_1]   are orthogonal

where U is n by p and V is p by p. It then follows that

    U^T A V = [ y^T   ] A [x, V_1] = [ y^T A x      y^T A V_1   ]
              [ U_1^T ]              [ U_1^T A x    U_1^T A V_1 ]

Thus

    A_1 = U^T A V = [ σ   w^T ]
                    [ 0   B_1 ]

where w^T = y^T A V_1, B_1 = U_1^T A V_1 and y^T A x = σ y^T y = σ.

If we define

    z = A_1 [ σ ] = [ σ   w^T ] [ σ ] = [ σ² + w^T w ]
            [ w ]   [ 0   B_1 ] [ w ]   [   B_1 w    ]

then

    ||z||² = [σ² + w^T w,  w^T B_1^T] [ σ² + w^T w ]
                                      [   B_1 w    ]
           = (σ² + w^T w)² + w^T B_1^T B_1 w
           ≥ (σ² + w^T w)²

Since ||A_1||² ≥ ||z||² / ||(σ, w^T)^T||² = ||z||² / (σ² + w^T w), we obtain

    σ² = ||A||² = ||U^T A V||² = ||A_1||² ≥ σ² + w^T w

and it follows that w = 0. Thus

    U^T A V = [ σ   0^T ]
              [ 0   B_1 ]

Induction now establishes the result. The orthogonality properties of U and V then imply that

    U^T A V = D,   i.e.   A = UDV^T
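Numerically, the factorization of Theorem 8.10 is available directly. The sketch below (not part of the original notes; it assumes Python with NumPy and a randomly generated matrix) checks the stated properties of U, V and D.

    import numpy as np

    rng = np.random.default_rng(4)
    n, p = 6, 3
    A = rng.normal(size=(n, p))

    U, d, Vt = np.linalg.svd(A, full_matrices=False)   # U: n x p, d: p, Vt: p x p
    D = np.diag(d)

    print(np.allclose(U @ D @ Vt, A))            # A = U D V^T
    print(np.allclose(U.T @ U, np.eye(p)))       # U^T U = I_p
    print(np.allclose(Vt @ Vt.T, np.eye(p)))     # V^T V = I_p
    print(np.all(np.diff(d) <= 0) and np.all(d >= 0))   # d_1 >= ... >= d_p >= 0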

    8.5.1 Use of the SVD in Data Reduction and Reconstruction

Let Y be a data matrix consisting of n observations on p response variables. That is

    Y = [ y_11  y_12  · · ·  y_1p ]
        [ y_21  y_22  · · ·  y_2p ]
        [  .     .    . . .   .   ]
        [ y_n1  y_n2  · · ·  y_np ]

The singular value decomposition of Y allows us to represent Y as

    Y = U_1 D_1 V_1^T

where U_1 is n by p, V_1 is p by p and D_1 is a p by p diagonal matrix with ith diagonal element d_i. We know that d_i ≥ 0 and we may assume that

    d_1 ≥ d_2 ≥ · · · ≥ d_p

Recall that the matrices U_1 and V_1 have the properties

    U_1^T U_1 = I_p   and   V_1^T V_1 = I_p

so that the columns of U_1 form an orthonormal basis for the column space of Y, i.e. the subspace of R^n spanned by the columns of Y. Similarly the columns of V_1 form an


orthonormal basis for the row space of Y, i.e. the subspace of R^p spanned by the rows of Y.

The rows of Y represent n points in p-dimensional space and

    y_i1, y_i2, . . . , y_ip

represent the coordinates of the ith individual's values relative to the axes of the p original variables. Two individuals are close in this space if the distance (norm) between them is small. In this case the two individuals have nearly the same values on each of the variables, i.e. their coordinates with respect to the original variables are nearly equal. We refer to this subspace of R^p as individual space or observation space.

The columns of Y represent p points in R^n and represent the observed values of a variable on n individuals. Two points in this space are close if they have similar values over the entire collection of observed individuals, i.e. the two variables are measuring the same thing. We call this space variable space.

The distance between individuals i and i′ is given by

    ||y_i − y_i′|| = [ Σ_{j=1}^{p} (y_ij − y_i′j)² ]^{1/2}

Since V_1 is a p by p orthogonal matrix, if Z = YV_1 then the distances between individuals are preserved, i.e.

    ||z_i − z_i′|| = ||y_i V_1 − y_i′ V_1|| = [(y_i − y_i′) V_1 V_1^T (y_i − y_i′)^T]^{1/2} = ||y_i − y_i′||

Thus the original data is replaced by a new data set in which the coordinate axes of the individuals are now the columns of Z.
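A quick numerical check of this distance-preserving property (not part of the original notes; it assumes Python with NumPy and a randomly generated data matrix):

    import numpy as np

    rng = np.random.default_rng(6)
    Y = rng.normal(size=(10, 4))

    _, _, V1t = np.linalg.svd(Y, full_matrices=False)
    Z = Y @ V1t.T                       # V_1 = V1t.T is p x p orthogonal

    i, k = 2, 7
    print(np.isclose(np.linalg.norm(Z[i] - Z[k]),
                     np.linalg.norm(Y[i] - Y[k])))   # True: distances preserved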

We now note that

    U_1 = [u_1, u_2, . . . , u_p]

where u_j is the jth column vector of U_1, and that

    V_1 = [v_1, v_2, . . . , v_p]


where v_j is the jth column vector of V_1. Thus we can write

    Y = U_1 D_1 V_1^T = Σ_{j=1}^{p} d_j u_j v_j^T

That is, we may reconstruct Y as a sum of constants (the d_j) times matrices which are the outer products of vectors which span variable space (the u_j) and individual space (the v_j). The interpretation of these vectors is of importance to understanding the nature of the representation. We note that

    YY^T = (U_1 D_1 V_1^T)(V_1 D_1 U_1^T) = U_1 D_1² U_1^T

Thus

    YY^T U_1 = U_1 D_1²

which shows that the columns of U_1 are the eigenvectors of YY^T with eigenvalues equal to the d_j².

We also note that

    Y^T Y = (V_1 D_1 U_1^T)(U_1 D_1 V_1^T) = V_1 D_1² V_1^T

so that

    Y^T Y V_1 = V_1 D_1²

Thus the columns of V_1 are the eigenvectors of Y^T Y with eigenvalues again equal to the d_j².

The relationship

    YV_1 = U_1 D_1

shows that the columns of V_1 transform an individual's values on the original variables to values on the new variables U_1, with relative importance given by the d_j.

If some of the d_j are zero or are approximately 0 then we can approximate Y as

    Y ≈ Σ_{j=1}^{k} d_j u_j v_j^T   (k = the number of non-negligible d_j)

    which tells us that an approximating subspace has the same dimension in variable space
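A minimal sketch of this reduction (not part of the original notes; it assumes Python with NumPy; the data matrix and the cutoff k are made up for illustration): keep only the leading k terms of the expansion and compare with Y.

    import numpy as np

    rng = np.random.default_rng(5)
    n, p, k = 20, 5, 2
    # a data matrix that is (approximately) of rank 2 plus a little noise
    Y = (rng.normal(size=(n, k)) @ rng.normal(size=(k, p))
         + 0.01 * rng.normal(size=(n, p)))

    U1, d, V1t = np.linalg.svd(Y, full_matrices=False)
    print(d)                                   # d_3, d_4, d_5 are nearly zero

    # Y ~ sum_{j<k} d_j u_j v_j^T
    Y_k = sum(d[j] * np.outer(U1[:, j], V1t[j, :]) for j in range(k))
    print(np.linalg.norm(Y - Y_k) / np.linalg.norm(Y))   # small relative error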