7/31/2019 Notes 4 Supp Diffeq

Additional Topics for Chapter 4: Linear Algebra and Differential Equations¹
Matrix Factorization
Review of Elementary Matrices
Definition 1: An elementary matrix is an $n \times n$ matrix that can be obtained by performing a single elementary row operation on the identity matrix $I_n$. (Note that the identity matrix itself is an elementary matrix because we could multiply any row of $I_n$ by the scalar 1.)
Recall that the elementary row operations are:
1. Swap two rows
2. Multiply a row by a nonzero constant
3. Add a multiple of one row to another row
Example 2 (Row swap): Multiplying matrix $A$ by the elementary matrix $E_1$, in which rows 1 and 2 of $I_3$ are swapped, produces a matrix in which rows 1 and 2 of $A$ have also been swapped:

$$\begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} 1&4&7\\ 2&5&6\\ 3&1&2 \end{bmatrix} = \begin{bmatrix} 2&5&6\\ 1&4&7\\ 3&1&2 \end{bmatrix}$$

Example 3 (Multiplication of a row by a scalar): Multiplying matrix $A$ by the elementary matrix $E_2$, in which the second row of $I_3$ has been multiplied by $\tfrac13$, produces a new matrix in which the second row of $A$ has been multiplied by $\tfrac13$:

$$\begin{bmatrix} 1&0&0\\ 0&\tfrac13&0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} 1&4&7\\ 2&5&6\\ 3&1&2 \end{bmatrix} = \begin{bmatrix} 1&4&7\\ \tfrac23&\tfrac53&2\\ 3&1&2 \end{bmatrix}$$
Example 4 (Adding a multiple of one row to another): Multiplying matrix $A$ by the elementary matrix $E_3$, in which two times the first row has been subtracted from the second row of $I_3$, produces a new matrix in which two times the first row of $A$ has been subtracted from the second row of $A$:

$$\begin{bmatrix} 1&0&0\\ -2&1&0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} 1&4&7\\ 2&5&6\\ 3&1&2 \end{bmatrix} = \begin{bmatrix} 1&4&7\\ 0&-3&-8\\ 3&1&2 \end{bmatrix}$$

This leads us to the following theorems, the second of which is a direct result of the fact that elementary row operations are reversible.
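The three examples above are easy to check numerically. Here is a minimal NumPy sketch, using the matrix $A$ and the elementary matrices $E_1$, $E_2$, $E_3$ from Examples 2 through 4:

```python
import numpy as np

A = np.array([[1, 4, 7],
              [2, 5, 6],
              [3, 1, 2]], dtype=float)

# E1: rows 1 and 2 of I3 swapped
E1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
# E2: second row of I3 multiplied by 1/3
E2 = np.array([[1, 0, 0], [0, 1/3, 0], [0, 0, 1]])
# E3: two times row 1 subtracted from row 2 of I3
E3 = np.array([[1, 0, 0], [-2, 1, 0], [0, 0, 1]], dtype=float)

print(E1 @ A)  # rows 1 and 2 of A swapped
print(E2 @ A)  # row 2 of A scaled by 1/3
print(E3 @ A)  # row 2 of A minus 2*(row 1 of A)
```

Left-multiplying by each $E_i$ reproduces exactly the row operation used to build that $E_i$ from $I_3$.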
Theorem 5: If an elementary row operation is performed on a matrix $A$, the resulting matrix can also be obtained by multiplying $A$ (on the left) by the corresponding elementary matrix $E$.

Theorem 6: If $E$ is an elementary matrix, then $E^{-1}$ exists and is also an elementary matrix.
As confirmation of the previous theorem, note that the elementary matrices $E_1$, $E_2$, and $E_3$ from above have inverses

$$E_1^{-1} = \begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{bmatrix}, \quad E_2^{-1} = \begin{bmatrix} 1&0&0\\ 0&3&0\\ 0&0&1 \end{bmatrix}, \quad\text{and}\quad E_3^{-1} = \begin{bmatrix} 1&0&0\\ 2&1&0\\ 0&0&1 \end{bmatrix}$$

because

$$E_1E_1^{-1} = E_1^{-1}E_1 = \begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{bmatrix} = \begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{bmatrix},$$

¹ Material from Falvo, David C. and Larson, Ron. Elementary Linear Algebra, 6th ed. Brooks/Cole, 2010.
while

$$E_2E_2^{-1} = E_2^{-1}E_2 = \begin{bmatrix} 1&0&0\\ 0&\tfrac13&0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} 1&0&0\\ 0&3&0\\ 0&0&1 \end{bmatrix} = \begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{bmatrix},$$

and

$$E_3E_3^{-1} = E_3^{-1}E_3 = \begin{bmatrix} 1&0&0\\ -2&1&0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} 1&0&0\\ 2&1&0\\ 0&0&1 \end{bmatrix} = \begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{bmatrix}.$$
Theorem 7: Two matrices $A$ and $B$ are row equivalent if there exists a finite number of elementary matrices $E_1, E_2, \ldots, E_k$ such that $B = E_kE_{k-1}\cdots E_2E_1A$. (In other words, $A$ and $B$ are row equivalent if we can get from $A$ to $B$ via a finite number of elementary row operations.)

Following is an example of elementary matrices in use to reduce a $2 \times 2$ matrix to reduced row-echelon form (i.e., $I_2$ in this case):
Example 8: Start with $A = \begin{bmatrix} 5&18\\ 1&4 \end{bmatrix}$. Each step below lists the current matrix, the elementary row operation applied to it, the corresponding elementary matrix, and its inverse:

$\begin{bmatrix} 5&18\\ 1&4 \end{bmatrix}$ : swap $R_1$ and $R_2$; $E_1 = \begin{bmatrix} 0&1\\ 1&0 \end{bmatrix}$, $E_1^{-1} = \begin{bmatrix} 0&1\\ 1&0 \end{bmatrix}$

$\begin{bmatrix} 1&4\\ 5&18 \end{bmatrix}$ : add $-5R_1$ to $R_2$; $E_2 = \begin{bmatrix} 1&0\\ -5&1 \end{bmatrix}$, $E_2^{-1} = \begin{bmatrix} 1&0\\ 5&1 \end{bmatrix}$

$\begin{bmatrix} 1&4\\ 0&-2 \end{bmatrix}$ : multiply $R_2$ by $-\tfrac12$; $E_3 = \begin{bmatrix} 1&0\\ 0&-\tfrac12 \end{bmatrix}$, $E_3^{-1} = \begin{bmatrix} 1&0\\ 0&-2 \end{bmatrix}$

$\begin{bmatrix} 1&4\\ 0&1 \end{bmatrix}$ : add $-4R_2$ to $R_1$; $E_4 = \begin{bmatrix} 1&-4\\ 0&1 \end{bmatrix}$, $E_4^{-1} = \begin{bmatrix} 1&4\\ 0&1 \end{bmatrix}$

$\begin{bmatrix} 1&0\\ 0&1 \end{bmatrix} = I_2$
Then $E_4E_3E_2E_1A = I$. Since each of the $E_i$ are invertible, we also see that

$$E_1^{-1}E_2^{-1}E_3^{-1}E_4^{-1}E_4E_3E_2E_1A = E_1^{-1}E_2^{-1}E_3^{-1}E_4^{-1}I$$
$$A = E_1^{-1}E_2^{-1}E_3^{-1}E_4^{-1}.$$

In other words,

$$A = \begin{bmatrix} 0&1\\ 1&0 \end{bmatrix}\begin{bmatrix} 1&0\\ 5&1 \end{bmatrix}\begin{bmatrix} 1&0\\ 0&-2 \end{bmatrix}\begin{bmatrix} 1&4\\ 0&1 \end{bmatrix},$$

or, $A$ is the product of the inverses of the elementary matrices that were used to reduce $A$ to $I$.
The LU-Factorization (without row interchanges)

There are a number of "matrix factorizations" in frequent use. Perhaps the most basic of these is what is known as the "LU-factorization." To motivate its development, let us consider an example:

Example 9: Start with $A = \begin{bmatrix} 2&1\\ 8&7 \end{bmatrix}$. We can accomplish row-echelon form with only one row operation. Here is that row operation and its associated elementary matrix:

$\begin{bmatrix} 2&1\\ 8&7 \end{bmatrix}$ : add $-4R_1$ to $R_2$; $E_1 = \begin{bmatrix} 1&0\\ -4&1 \end{bmatrix}$, $E_1^{-1} = \begin{bmatrix} 1&0\\ 4&1 \end{bmatrix}$

$\begin{bmatrix} 2&1\\ 0&3 \end{bmatrix}$
The above example shows that $E_1A = U$, so the relation $A = LU$ implies that $L$ must actually be $E_1^{-1}$, or

$$A = \begin{bmatrix} 2&1\\ 8&7 \end{bmatrix} = \begin{bmatrix} 1&0\\ 4&1 \end{bmatrix}\begin{bmatrix} 2&1\\ 0&3 \end{bmatrix} = LU.$$
What is the significance of this factorization? First of all, we use the letters $L$ and $U$ for a reason. Note that $L$ is lower triangular (any nonzero elements are on or below the diagonal) and $U$ is upper triangular (any nonzero elements are on or above the diagonal). Additionally, the diagonal elements of the $L$ matrix are 1s. Once we have an LU-factorization of a matrix, we can generate an algorithm to easily solve numerous systems involving that same coefficient matrix. The practical significance of this is that it is even more efficient than Gaussian elimination when we need to reuse a coefficient matrix with varying right-hand sides (i.e., what we've been calling the $b$ vector).² Before we proceed, we need to mention an important "lemma" (a lemma is a sort of warm-up to a theorem):
Lemma 10: If $L$ and $\hat{L}$ are lower triangular matrices of the same size, so is their product $L\hat{L}$. Furthermore, if both of the matrices have ones on their diagonals, then so does their product. If $U$ and $\hat{U}$ are upper triangular matrices of the same size, so is their product $U\hat{U}$.
Let us illustrate with another example, this time taking note of the result of the above lemma.
Example 11: Find an LU-factorization of the matrix $A = \begin{bmatrix} 2&1&1\\ 4&5&2\\ -2&2&0 \end{bmatrix}$.

Here is the procedure (Gaussian elimination) and its associated elementary matrices:

$\begin{bmatrix} 2&1&1\\ 4&5&2\\ -2&2&0 \end{bmatrix}$ : add $-2R_1$ to $R_2$; $E_1 = \begin{bmatrix} 1&0&0\\ -2&1&0\\ 0&0&1 \end{bmatrix}$, $E_1^{-1} = \begin{bmatrix} 1&0&0\\ 2&1&0\\ 0&0&1 \end{bmatrix}$

$\begin{bmatrix} 2&1&1\\ 0&3&0\\ -2&2&0 \end{bmatrix}$ : add $R_1$ to $R_3$; $E_2 = \begin{bmatrix} 1&0&0\\ 0&1&0\\ 1&0&1 \end{bmatrix}$, $E_2^{-1} = \begin{bmatrix} 1&0&0\\ 0&1&0\\ -1&0&1 \end{bmatrix}$

$\begin{bmatrix} 2&1&1\\ 0&3&0\\ 0&3&1 \end{bmatrix}$ : add $-R_2$ to $R_3$; $E_3 = \begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&-1&1 \end{bmatrix}$, $E_3^{-1} = \begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&1&1 \end{bmatrix}$

$\begin{bmatrix} 2&1&1\\ 0&3&0\\ 0&0&1 \end{bmatrix} = U$
Just as in the earlier $2 \times 2$ example, we have

$$E_3E_2E_1A = U,$$

so

$$E_1^{-1}E_2^{-1}E_3^{-1}E_3E_2E_1A = A = E_1^{-1}E_2^{-1}E_3^{-1}U.$$
² For $n \times n$ systems, LU-factorization requires $(4n^3 - 3n^2 - n)/6$ arithmetic operations for the factorization itself (which only has to be done once and can then be reused). Then each solution of the two resulting triangular systems (more on this later) can be carried out in $2n^2 - n$ operations per system. On the other hand, Gaussian elimination uses $(4n^3 + 9n^2 - 7n)/6$ arithmetic operations to arrive at a solution, and it requires this many operations for each system.
But note that each of the $E_i^{-1}$ is lower triangular with ones on its diagonal. According to the previous lemma, their product will also have this form. Indeed,

$$E_1^{-1}E_2^{-1}E_3^{-1} = \begin{bmatrix} 1&0&0\\ 2&1&0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} 1&0&0\\ 0&1&0\\ -1&0&1 \end{bmatrix}\begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&1&1 \end{bmatrix} = \begin{bmatrix} 1&0&0\\ 2&1&0\\ -1&1&1 \end{bmatrix}, \tag{1}$$

and we realize that $E_1^{-1}E_2^{-1}E_3^{-1} = L$, and that $A = LU$, as desired. In other words, $A$ can be "factored" into

$$A = \begin{bmatrix} 2&1&1\\ 4&5&2\\ -2&2&0 \end{bmatrix} = \begin{bmatrix} 1&0&0\\ 2&1&0\\ -1&1&1 \end{bmatrix}\begin{bmatrix} 2&1&1\\ 0&3&0\\ 0&0&1 \end{bmatrix} = LU, \tag{2}$$

which again is a product of a lower and an upper triangular matrix. Note too that the result of the multiplication in (1) is a matrix whose diagonal elements are ones and whose other elements are the individual elements of the elementary matrices "condensed" into one matrix. We can look directly at $L$ (at least in this case) and see exactly what row operations were performed to get from $A$ to $U$.
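The elimination in Example 11 can be sketched in code. The `lu_nopivot` helper below is a hypothetical name, not a library routine: it is a Doolittle-style elimination without row interchanges, matching the procedure above. (Library routines such as `scipy.linalg.lu` normally perform row interchanges, so their factors can differ from the hand computation by a permutation.)

```python
import numpy as np

def lu_nopivot(A):
    """Doolittle LU without row interchanges: returns unit lower L and upper U."""
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]    # multiplier recorded in L
            U[i, :] -= L[i, k] * U[k, :]   # row operation: R_i <- R_i - m * R_k
    return L, U

A = np.array([[2, 1, 1],
              [4, 5, 2],
              [-2, 2, 0]], dtype=float)
L, U = lu_nopivot(A)
print(L)  # matches the L of Eq. (2): multipliers 2, -1, 1 below the unit diagonal
print(U)  # matches U of Eq. (2)
```

The multipliers stored in $L$ are exactly the "condensed" row-operation entries described after Eq. (1).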
Using A = LU to Solve Systems

So how do we use this factorization to solve a system $Ax = b$? We can use a simple two-stage process:

1. Solve the lower triangular system $Ly = b$ for the vector $y$ by forward substitution.

2. Solve the resulting upper triangular system $Ux = y$ for $x$ by back substitution.

The above two-stage process works because if $Ux = y$ and $Ly = b$, then $Ax = LUx = Ly = b$.
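The two stages translate directly into code. `forward_sub` and `back_sub` below are hypothetical helper names, and the right-hand side $b = (1, 2, 2)$ is used purely for illustration, with $L$ and $U$ taken from the factorization in (2):

```python
import numpy as np

def forward_sub(L, b):
    """Solve Ly = b for unit-lower-triangular L, top row first."""
    y = np.zeros_like(b, dtype=float)
    for i in range(len(b)):
        y[i] = b[i] - L[i, :i] @ y[:i]
    return y

def back_sub(U, y):
    """Solve Ux = y for upper-triangular U, bottom row first."""
    n = len(y)
    x = np.zeros_like(y, dtype=float)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

L = np.array([[1, 0, 0], [2, 1, 0], [-1, 1, 1]], dtype=float)
U = np.array([[2, 1, 1], [0, 3, 0], [0, 0, 1]], dtype=float)
b = np.array([1, 2, 2], dtype=float)   # assumed right-hand side for illustration

y = forward_sub(L, b)   # stage 1: Ly = b
x = back_sub(U, y)      # stage 2: Ux = y
print(x)
```

Each triangular solve costs only $O(n^2)$ operations, which is why the factorization pays off when many right-hand sides share the same coefficient matrix.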
As an example, consider the LU-factorization we found in (2) above, namely

$$\begin{bmatrix} 2&1&1\\ 4&5&2\\ -2&2&0 \end{bmatrix} = \begin{bmatrix} 1&0&0\\ 2&1&0\\ -1&1&1 \end{bmatrix}\begin{bmatrix} 2&1&1\\ 0&3&0\\ 0&0&1 \end{bmatrix}.$$

Suppose we seek to find the solution to the system

$$\begin{bmatrix} 2&1&1\\ 4&5&2\\ -2&2&0 \end{bmatrix}\begin{bmatrix} x\\ y\\ z \end{bmatrix} = \begin{bmatrix} 1\\ 2\\ 2 \end{bmatrix}.$$

We first solve the lower triangular system

$$\begin{bmatrix} 1&0&0\\ 2&1&0\\ -1&1&1 \end{bmatrix}\begin{bmatrix} a\\ b\\ c \end{bmatrix} = \begin{bmatrix} 1\\ 2\\ 2 \end{bmatrix},$$
$$\langle u, v\rangle = c_1u_1v_1 + c_2u_2v_2 + \cdots + c_nu_nv_n$$

with positive scalars $c_i$ (called weights) are all inner products on $\mathbb{R}^n$. Note the condition $c_i > 0$: if any of the $c_i$ are zero or negative, the product is no longer an inner product.
Example 17: Consider the real-valued, continuous functions in the vector space $C[a, b]$ (the space of all continuous functions on the interval $[a, b]$). Then $\langle f, g\rangle = \int_a^b f(x)\,g(x)\,dx$ is an inner product on $C[a, b]$:

(1) $\langle f, g\rangle = \int_a^b f(x)\,g(x)\,dx = \int_a^b g(x)\,f(x)\,dx = \langle g, f\rangle$

(2) $\langle f, g + h\rangle = \int_a^b f(x)\,[g(x) + h(x)]\,dx = \int_a^b f(x)\,g(x)\,dx + \int_a^b f(x)\,h(x)\,dx = \langle f, g\rangle + \langle f, h\rangle$

(3) $c\,\langle f, g\rangle = c\int_a^b f(x)\,g(x)\,dx = \int_a^b (cf(x))\,g(x)\,dx = \langle cf, g\rangle$

(4) $\langle f, f\rangle = \int_a^b f(x)\,f(x)\,dx \ge 0$ because $(f(x))^2 \ge 0$ for all $x$. Additionally, $\langle f, f\rangle = 0$ if and only if $f(x) = 0$ or $a = b$.
Orthogonal Projections

Review of Dot Products and Orthogonality

Recall the following:

- Two vectors are said to be orthogonal if their dot product is zero, namely $u \cdot v = 0$ or $u^Tv = 0$, where $u$ and $v$ are column vectors. By definition, the zero vector is orthogonal to all other vectors.

- The angle $\theta$ between two vectors is given by the relation $u \cdot v = \|u\|\,\|v\|\cos\theta$, or $\cos\theta = \dfrac{u \cdot v}{\|u\|\,\|v\|}$.

- The length or norm of a vector is given by $\|v\|^2 = v \cdot v$.

- The distance between two points (or vectors) is given by $d(u, v) = \|u - v\| = \|v - u\|$.

- A set of vectors is said to be mutually orthogonal if every pair of vectors in the set is orthogonal. Additionally, if all of the vectors are unit vectors (i.e., have length of one), the set is said to be orthonormal.

- An orthogonal set of nonzero vectors is linearly independent.

- A basis that is an orthogonal set is called an orthogonal basis. If the vectors in the basis are all of length one, the basis is called an orthonormal basis. (All of the familiar "standard" bases are orthonormal, e.g. $\{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$.)
Orthogonal and Orthonormal Bases

Why make a big deal out of orthogonal and orthonormal bases? It turns out that the orthonormal bases of a vector space are quite useful because there is a simple formula for writing any vector in the vector space as a linear combination of those orthonormal basis vectors. We do not have to start over and solve a system of equations just to determine the coefficients of the given vector relative to the basis every single time. Here is the derivation of that formula.

Suppose we have an orthonormal basis $\{u_1, \ldots, u_n\}$ for a vector space $V$. If $v$ is a vector in $V$, there must exist scalars $c_1, \ldots, c_n$ such that

$$v = c_1u_1 + c_2u_2 + \cdots + c_nu_n. \tag{3}$$

We seek a formula to determine each of the $c_i$s. Start with the $i$th basis vector, namely $u_i$. If we take the dot product of $u_i$ with both sides of (3), we have

$$v \cdot u_i = (c_1u_1 + c_2u_2 + \cdots + c_nu_n) \cdot u_i,$$
and using the properties of dot products, this leads to

$$v \cdot u_i = (c_1u_1 + c_2u_2 + \cdots + c_nu_n) \cdot u_i = c_1u_1 \cdot u_i + c_2u_2 \cdot u_i + \cdots + c_nu_n \cdot u_i.$$

Now, since each of the basis vectors are mutually orthogonal, we must have $u_i \cdot u_j = 0$ for any two distinct vectors in the set $\{u_1, \ldots, u_n\}$ (i.e., $u_i \cdot u_j = 0$ unless $i = j$). Therefore,

$$v \cdot u_i = 0 + 0 + \cdots + c_iu_i \cdot u_i + \cdots + 0 + 0.$$

Since the basis vectors are orthonormal, we know their lengths are all one, so $u_i \cdot u_i = \|u_i\|^2 = 1$, and

$$v \cdot u_i = c_i(u_i \cdot u_i) = c_i.$$

We have therefore found a formula for the $i$th coefficient $c_i$. As $i$ ranges from 1 to $n$, we find that $c_1 = v \cdot u_1$, $c_2 = v \cdot u_2$, ..., $c_n = v \cdot u_n$. Consequently, we have proven the following theorem.

Theorem 18: If $\{u_1, \ldots, u_n\}$ is an orthonormal basis for a vector space $V$, any vector $v$ in $V$ can be written as a linear combination of these basis vectors as follows:

$$v = c_1u_1 + c_2u_2 + \cdots + c_nu_n = (v \cdot u_1)u_1 + (v \cdot u_2)u_2 + \cdots + (v \cdot u_n)u_n.$$
Example 19: The vectors $u_1 = (0, 1, 0)$, $u_2 = \left(\tfrac35, 0, \tfrac45\right)$, and $u_3 = \left(-\tfrac45, 0, \tfrac35\right)$ form an orthonormal basis $B$ for $\mathbb{R}^3$. Express the vector $v = (-2, 3, 1)$ as a linear combination of these basis vectors.

Solution 20: Take the three required dot products:

$$v \cdot u_1 = (-2, 3, 1) \cdot (0, 1, 0) = 3$$
$$v \cdot u_2 = (-2, 3, 1) \cdot \left(\tfrac35, 0, \tfrac45\right) = -\tfrac25$$
$$v \cdot u_3 = (-2, 3, 1) \cdot \left(-\tfrac45, 0, \tfrac35\right) = \tfrac{11}5$$

These scalars represent the "coordinates of $v$ relative to the basis $B$," and

$$v = 3\,(0, 1, 0) - \tfrac25\left(\tfrac35, 0, \tfrac45\right) + \tfrac{11}5\left(-\tfrac45, 0, \tfrac35\right).$$

(Multiply it out to confirm this!)
Furthermore, note that taking dot products in this manner, with the first vector the same each time, is equivalent to the following matrix multiplications:

$$\begin{bmatrix} -2&3&1 \end{bmatrix}\begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix} = 3, \quad \begin{bmatrix} -2&3&1 \end{bmatrix}\begin{bmatrix} \tfrac35\\ 0\\ \tfrac45 \end{bmatrix} = -\tfrac25, \quad\text{and}\quad \begin{bmatrix} -2&3&1 \end{bmatrix}\begin{bmatrix} -\tfrac45\\ 0\\ \tfrac35 \end{bmatrix} = \tfrac{11}5,$$

and we can combine all of them into a single matrix multiplication:

$$\begin{bmatrix} -2&3&1 \end{bmatrix}\begin{bmatrix} 0&\tfrac35&-\tfrac45\\ 1&0&0\\ 0&\tfrac45&\tfrac35 \end{bmatrix} = \begin{bmatrix} 3&-\tfrac25&\tfrac{11}5 \end{bmatrix},$$

yielding the desired coefficients of $u_1$, $u_2$, and $u_3$, respectively. (Compare this to the technique we had to use to find the coordinates of a vector relative to a nonstandard basis.)
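Theorem 18 and the matrix form above are easy to replay numerically, using the data of Example 19 (the basis vectors are stored one per row, so one matrix-vector product computes all three dot products at once):

```python
import numpy as np

v = np.array([-2, 3, 1], dtype=float)
# Orthonormal basis from Example 19, one vector per row
B = np.array([[0, 1, 0],
              [3/5, 0, 4/5],
              [-4/5, 0, 3/5]])

coords = B @ v      # c_i = v . u_i, all three dot products at once
print(coords)       # [3, -2/5, 11/5]
print(coords @ B)   # reassembles v from its coordinates: c1*u1 + c2*u2 + c3*u3
```

No linear system needs to be solved: the coordinates relative to an orthonormal basis are just dot products.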
Distance and Projections
We quite often need to determine the distance between a point $b$ and a line in the direction of vector $a$, as shown in the figure below. Or, we might want to determine "how much" of the force vector $b$ is pointing in the direction of $a$. (We have probably all done this with respect to the coordinate axes in the former case or horizontal and vertical vector components in the latter.) Regardless of the question, the approach is the same. We need to determine the projection of $b$ onto $a$, denoted by $\operatorname{proj}_a b$ and represented by $p$ in the figure.

[Figure: the vectors $a$ and $b$ drawn from the origin $O$, the projection $p$ of $b$ onto $a$, and the error vector $e = b - p$ perpendicular to $a$.]

It might help to think of $\operatorname{proj}_a b$ as what $b$ would look like if you were "above" it and looking directly down at $a$, with a line of sight perpendicular to $a$.

We will now derive the formula for $p$. Note that $p$ must be some scalar multiple of vector $a$ because it is in the same direction (or the opposite direction if the angle is obtuse). Therefore, $p = ca$, and we need to solve for $c$. Of course, the point on the vector $a$ that is closest to $b$ is the point at the foot of the perpendicular dropped from $b$ onto $a$. In other words, the line from $b$ to the closest point $p$ on $a$ is perpendicular to $a$. Note that in terms of vector subtraction, the side opposite the vertex $O$ (denoted $e$ in the figure) represents the vector subtraction $e = b - p$, or, because $p = ca$, $e = b - ca$. Since vector $e$ is perpendicular to $a$, we must have
$$a \cdot e = 0, \quad\text{or}\quad a \cdot (b - ca) = 0, \quad\text{or}\quad a \cdot b - c\,a \cdot a = 0,$$

which in turn leads to the solution

$$c = \frac{a \cdot b}{a \cdot a}.$$

Therefore, the projection $p$ of vector $b$ onto $a$ is given by

$$p = \operatorname{proj}_a b = ca = \frac{a \cdot b}{a \cdot a}\,a. \tag{4}$$

If we rewrite the dot products in (4) in the equivalent form $a \cdot b = a^Tb$ and $a \cdot a = a^Ta$, we have

$$\operatorname{proj}_a b = \frac{a^Tb}{a^Ta}\,a.$$

Realizing that this is a scalar $\frac{a^Tb}{a^Ta}$ multiplied by the vector $a$ and rearranging, we have³

$$\operatorname{proj}_a b = \frac{a\,a^Tb}{a^Ta} = \frac{aa^T}{a^Ta}\,b.$$

Note that the quantity $\frac{aa^T}{a^Ta}$ actually represents a matrix called the projection matrix $P$. (It is a matrix because $aa^T$ is a column times a row, say an $n \times 1$ times a $1 \times n$, so the product is an $n \times n$ matrix, while $a^Ta$ is the familiar dot product of $a$ with itself.) Thus we conclude that the projection of $b$ onto $a$ can be found by multiplying the projection matrix $P = \frac{aa^T}{a^Ta}$ by the vector $b$:

$$p = Pb.$$

³ The $1 \times 1$ "matrix" (i.e., scalar) $a^Ta$ is called an "inner product," while the $n \times n$ matrix $aa^T$ is called an "outer product."
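Formula (4) and the projection matrix can be sketched in a few lines; `projection_matrix` is a hypothetical helper name, and the data anticipates the example that follows:

```python
import numpy as np

def projection_matrix(a):
    """P = (a a^T) / (a^T a): projects any vector onto the line through a."""
    a = np.asarray(a, dtype=float)
    return np.outer(a, a) / (a @ a)   # outer product over inner product

a = np.array([1, 1, 1], dtype=float)
P = projection_matrix(a)   # every entry equals 1/3
b = np.array([2, 1, 5], dtype=float)
p = P @ b
print(p)                   # (8/3, 8/3, 8/3)
print(P @ p)               # projecting a second time changes nothing: PP = P
```

The last line illustrates the defining property of a projection matrix: applying it twice is the same as applying it once.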
Example 21: The matrix that projects any vector onto the line through the point $a = (1, 1, 1)$ is given by

$$P = \frac{aa^T}{a^Ta} = \frac13\begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}\begin{bmatrix} 1&1&1 \end{bmatrix} = \begin{bmatrix} \tfrac13&\tfrac13&\tfrac13\\ \tfrac13&\tfrac13&\tfrac13\\ \tfrac13&\tfrac13&\tfrac13 \end{bmatrix}.$$

For example, to determine the projection of $(2, 1, 5)$ onto the line through $(1, 1, 1)$, we would simply calculate

$$\begin{bmatrix} \tfrac13&\tfrac13&\tfrac13\\ \tfrac13&\tfrac13&\tfrac13\\ \tfrac13&\tfrac13&\tfrac13 \end{bmatrix}\begin{bmatrix} 2\\ 1\\ 5 \end{bmatrix} = \begin{bmatrix} \tfrac83\\ \tfrac83\\ \tfrac83 \end{bmatrix}.$$

Note again the ease with which projections can be found if the vector $a$ has unit length. The dot product $a \cdot a$ would be 1, and the resulting formulas would become

$$\operatorname{proj}_a b = (a \cdot b)\,a \quad\text{and}\quad P = aa^T.$$
Example 22: Determine the projection of the vector $v = (6, 7)$ onto the vector $u = (1, 4)$.

Method 1: Using the formula $\operatorname{proj}_a b = \frac{a \cdot b}{a \cdot a}\,a$, we have

$$\operatorname{proj}_u v = \tfrac{34}{17}(1, 4) = (2, 8).$$

Method 2: Using the projection matrix $P = \frac{aa^T}{a^Ta}$, we find

$$P = \frac1{17}\begin{bmatrix} 1\\ 4 \end{bmatrix}\begin{bmatrix} 1&4 \end{bmatrix} = \begin{bmatrix} \tfrac1{17}&\tfrac4{17}\\ \tfrac4{17}&\tfrac{16}{17} \end{bmatrix}.$$

Then

$$\operatorname{proj}_u v = Pv = \begin{bmatrix} \tfrac1{17}&\tfrac4{17}\\ \tfrac4{17}&\tfrac{16}{17} \end{bmatrix}\begin{bmatrix} 6\\ 7 \end{bmatrix} = \begin{bmatrix} 2\\ 8 \end{bmatrix},$$

both of which agree with the figure shown below.
both of which appear to agree with the gure shown below.
(2,8)
(1,4)
(6,7)
(2,8)
(1,4)
(6,7)
u
p
O
v
Gram-Schmidt Orthonormalization

Recall that, in $\mathbb{R}^2$, the projection of a vector $v$ onto a nonzero vector $u$ is given by

$$\operatorname{proj}_u v = \frac{u \cdot v}{u \cdot u}\,u.$$

If the vector $u$ is of unit length, this projection becomes

$$\operatorname{proj}_u v = \frac{u \cdot v}{u \cdot u}\,u = (u \cdot v)\,u. \tag{5}$$

Now suppose we have a basis $\{w_1, \ldots, w_n\}$ for some vector space $V$ and we wish to use this basis to construct an orthogonal (or orthonormal) basis $\{v_1, \ldots, v_n\}$ for $V$. Start by choosing

$$v_1 = w_1$$

(where $v_1 \neq 0$ because $w_1$ was a member of the original basis). We then require that the second vector be orthogonal to the first, or $v_1 \cdot v_2 = 0$. We've seen previously that at least one way to obtain an orthogonal vector is to consider the perpendicular dropped from $v$ onto $u$ in the projection $\operatorname{proj}_u v$:
[Figure: the vector $v$, its projection $\operatorname{proj}_u(v)$ onto $u$, and the perpendicular $v - \operatorname{proj}_u(v)$, drawn from the origin $O$.]
So let's take the next vector, $v_2$, to be the perpendicular dropped from $w_2$ onto $v_1$, i.e.

$$v_2 = w_2 - \operatorname{proj}_{v_1} w_2. \tag{6}$$

As confirmation of this choice, note that this will satisfy the orthogonality requirement because

$$v_1 \cdot v_2 = v_1 \cdot \left(w_2 - \operatorname{proj}_{v_1} w_2\right) = v_1 \cdot w_2 - \frac{v_1 \cdot w_2}{v_1 \cdot v_1}\,v_1 \cdot v_1 = v_1 \cdot w_2 - v_1 \cdot w_2 = 0.$$

Because $v_1 = w_1$ and $w_2$ are members of the original basis, we know they are linearly independent, and therefore $v_1$ and $v_2$ are also linearly independent; thus $v_2 = w_2 - \frac{v_1 \cdot w_2}{v_1 \cdot v_1}\,v_1 \neq 0$.

Now we need the third basis vector to be perpendicular to the first two. Note from Eq. (6) that in order to construct a new orthogonal basis vector (i.e., $v_2$), we took the next given basis vector (i.e., $w_2$) and removed the component of $w_2$ that pointed in the direction of $v_1$, our already settled basis vector. If we continue in this manner, to find $v_3$ we subtract the components of $w_3$ in the directions of $v_1$ and $v_2$ to obtain a vector that is perpendicular to both $v_1$ and $v_2$; then to find $v_4$ we subtract the components of $w_4$ in the directions of $v_1$, $v_2$, and $v_3$, and so on. In other words, we will take

$$v_3 = w_3 - \operatorname{proj}_{v_1} w_3 - \operatorname{proj}_{v_2} w_3,$$

and then

$$v_4 = w_4 - \operatorname{proj}_{v_1} w_4 - \operatorname{proj}_{v_2} w_4 - \operatorname{proj}_{v_3} w_4,$$

and so on. This leads to the following generalization:
Theorem 23 (Gram-Schmidt Orthogonalization): Let $W = \{w_1, \ldots, w_n\}$ be a basis for a vector space $V$. To create a set of orthogonal basis vectors $B = \{v_1, \ldots, v_n\}$ from $W$, construct the $v_i$ as follows:

$$v_1 = w_1$$
$$v_2 = w_2 - \operatorname{proj}_{v_1} w_2$$
$$v_3 = w_3 - \operatorname{proj}_{v_1} w_3 - \operatorname{proj}_{v_2} w_3$$
$$\vdots$$
$$v_n = w_n - \operatorname{proj}_{v_1} w_n - \operatorname{proj}_{v_2} w_n - \cdots - \operatorname{proj}_{v_{n-1}} w_n$$

To create an orthonormal basis, normalize each of the vectors $v_i$.

If we normalize the vectors as we go through the process, all of the dot products, as we are reminded in (5), are easier to calculate. However, the normalization usually introduces many square roots into the calculation, which may be cumbersome to work with.
Here are some examples of this process.
Example 24: Apply the Gram-Schmidt process to the following basis for $\mathbb{R}^2$: $B = \{(1, 1), (0, 1)\}$.

Solution: Choose $v_1 = (1, 1)$. Then remove the component of $w_2 = (0, 1)$ that points in the direction of $v_1$:

$$v_2 = w_2 - \operatorname{proj}_{v_1} w_2 = (0, 1) - \frac{(1, 1) \cdot (0, 1)}{(1, 1) \cdot (1, 1)}\,(1, 1) = (0, 1) - \left(\tfrac12, \tfrac12\right) = \left(-\tfrac12, \tfrac12\right).$$

Therefore an orthogonal basis for $\mathbb{R}^2$ based on the two vectors $(1, 1)$ and $(0, 1)$ would be $(1, 1)$ and $\left(-\tfrac12, \tfrac12\right)$. If we desire an orthonormal basis, divide each vector by its respective length, namely $\|v_1\| = \sqrt2$ and $\|v_2\| = \tfrac1{\sqrt2}$, so the basis would be $\left(\tfrac{\sqrt2}2, \tfrac{\sqrt2}2\right)$ and $\left(-\tfrac{\sqrt2}2, \tfrac{\sqrt2}2\right)$.

Note: Had we chosen $v_1 = (0, 1)$, we would have found

$$v_2 = (1, 1) - \frac{(0, 1) \cdot (1, 1)}{(0, 1) \cdot (0, 1)}\,(0, 1) = (1, 0),$$

which we should have been able to guess in the first place, since $(1, 0)$ and $(0, 1)$ make up the standard basis for $\mathbb{R}^2$!
Example 25: Apply the Gram-Schmidt process to the following basis for a three-dimensional subspace of $\mathbb{R}^4$: $B = \{(1, 2, 0, 3), (4, 0, 5, 8), (8, 1, 5, 6)\}$.

Solution: Choose $v_1 = (1, 2, 0, 3)$. Then remove the component of $w_2 = (4, 0, 5, 8)$ that points in the direction of $v_1$:

$$v_2 = w_2 - \operatorname{proj}_{v_1} w_2 = (4, 0, 5, 8) - \frac{(1, 2, 0, 3) \cdot (4, 0, 5, 8)}{(1, 2, 0, 3) \cdot (1, 2, 0, 3)}\,(1, 2, 0, 3) = (4, 0, 5, 8) - (2, 4, 0, 6) = (2, -4, 5, 2).$$
Now remove the components of $w_3 = (8, 1, 5, 6)$ that point in the directions of $v_1$ and $v_2$:

$$v_3 = w_3 - \operatorname{proj}_{v_1} w_3 - \operatorname{proj}_{v_2} w_3 = (8, 1, 5, 6) - \frac{(1, 2, 0, 3) \cdot (8, 1, 5, 6)}{(1, 2, 0, 3) \cdot (1, 2, 0, 3)}\,(1, 2, 0, 3) - \frac{(2, -4, 5, 2) \cdot (8, 1, 5, 6)}{(2, -4, 5, 2) \cdot (2, -4, 5, 2)}\,(2, -4, 5, 2)$$
$$= (8, 1, 5, 6) - (2, 4, 0, 6) - (2, -4, 5, 2) = (4, 1, 0, -2).$$

We conclude that the set $\{(1, 2, 0, 3), (2, -4, 5, 2), (4, 1, 0, -2)\}$ constitutes an orthogonal basis for this particular subspace. We get an orthonormal basis by dividing each vector by its length:

$$\|(1, 2, 0, 3)\| = \sqrt{14}, \quad \|(2, -4, 5, 2)\| = 7, \quad \|(4, 1, 0, -2)\| = \sqrt{21},$$

so the orthonormal basis is given by

$$\left(\tfrac1{\sqrt{14}}, \tfrac2{\sqrt{14}}, 0, \tfrac3{\sqrt{14}}\right), \quad \left(\tfrac27, -\tfrac47, \tfrac57, \tfrac27\right), \quad \left(\tfrac4{\sqrt{21}}, \tfrac1{\sqrt{21}}, 0, -\tfrac2{\sqrt{21}}\right).$$
Projection and Distances on Subspaces; QR-Factorization

Quick Review

We now know how to project one vector onto another vector, namely via any of the following formulas:

$$\operatorname{proj}_u v = \frac{u \cdot v}{u \cdot u}\,u \quad\text{or}\quad \operatorname{proj}_u v = \frac{u^Tv}{u^Tu}\,u \quad\text{or}\quad \operatorname{proj}_u v = \frac{uu^T}{u^Tu}\,v.$$

We also know how to write any vector $w$ in a vector space $V$ in terms of its orthonormal basis vectors $\{u_1, \ldots, u_n\}$:

$$w = (w \cdot u_1)u_1 + (w \cdot u_2)u_2 + \cdots + (w \cdot u_n)u_n.$$

Finally, we've devised a way to generate an orthonormal basis $\{v_1, \ldots, v_n\}$ from another basis $\{w_1, \ldots, w_n\}$ via the Gram-Schmidt process:

$$v_1 = w_1$$
$$v_2 = w_2 - \operatorname{proj}_{v_1} w_2$$
$$v_3 = w_3 - \operatorname{proj}_{v_1} w_3 - \operatorname{proj}_{v_2} w_3$$
$$\vdots$$
$$v_n = w_n - \operatorname{proj}_{v_1} w_n - \operatorname{proj}_{v_2} w_n - \cdots - \operatorname{proj}_{v_{n-1}} w_n.$$
Projection onto a Subspace

The projection of a vector $v$ onto a subspace tells us "how much" of the given vector $v$ lies in that particular subspace. Put another way (and rather non-rigorously), the projection of $v$ onto the subspace tells us "how many" of each of the subspace's orthonormal basis vectors we would need to represent $v$. We have met this quantity before, and you should recognize the right-hand side of the following.

Definition 26: Consider the subspace $W$ of $\mathbb{R}^n$ and let $\{u_1, \ldots, u_k\}$ be an orthonormal basis for $W$. If $v$ is a vector in $\mathbb{R}^n$, the projection of the vector $v$ onto the subspace $W$, denoted $\operatorname{proj}_W v$, is defined as

$$\operatorname{proj}_W v = (v \cdot u_1)u_1 + (v \cdot u_2)u_2 + \cdots + (v \cdot u_k)u_k.$$
This is the exact same formula we encountered when writing a vector in terms of orthonormal basis vectors of a particular subspace! In addition, it would make sense (and we accept without proof) that every vector in $\mathbb{R}^n$ can be "decomposed" into a vector $w$ within a vector space $W$ and a vector $w^\perp$ orthogonal to $W$. In symbols,

$$v = w + w^\perp, \quad\text{where } w \text{ is in } W \text{ and } w^\perp \text{ is in } W^\perp.$$

It should come as no surprise, especially if one considers the two-dimensional case, that

$$w = \operatorname{proj}_W v,$$

and because $v = w + w^\perp$, we must have

$$w^\perp = v - \operatorname{proj}_W v.$$
Example 27: Suppose we have the vector $v = (3, 2, 6)$ in $\mathbb{R}^3$, and we wish to decompose $v$ into the sum of a vector that lies in the subspace $W$ consisting of all vectors of the form $(a, b, b)$ and a vector orthogonal to that subspace.

Solution: The vectors $(1, 0, 0)$ and $(0, 1, 1)$ span all of $W$ and are orthogonal (hence linearly independent), and therefore form a basis for $W$. Normalizing, we find orthonormal basis vectors

$$u_1 = (1, 0, 0) \quad\text{and}\quad u_2 = \left(0, \tfrac1{\sqrt2}, \tfrac1{\sqrt2}\right).$$

Then

$$w = \operatorname{proj}_W v = (v \cdot u_1)u_1 + (v \cdot u_2)u_2 = \left((3, 2, 6) \cdot (1, 0, 0)\right)(1, 0, 0) + \left((3, 2, 6) \cdot \left(0, \tfrac1{\sqrt2}, \tfrac1{\sqrt2}\right)\right)\left(0, \tfrac1{\sqrt2}, \tfrac1{\sqrt2}\right)$$
$$= (3, 0, 0) + (0, 4, 4) = (3, 4, 4).$$

Now,

$$w^\perp = v - \operatorname{proj}_W v = (3, 2, 6) - (3, 4, 4) = (0, -2, 2).$$

We can then conclude that $(3, 4, 4)$ is a vector in $W$ while $(0, -2, 2)$ is a vector that is orthogonal to $W$.
Distance from a Point to a Subspace

Again, it would seem reasonable to extend the concept of "distance between points" to "distance from a point to a line" to "distance from a point to a subspace" by realizing that the latter is simply the distance of the point from its projection in the subspace. In symbols,

$$d(x, W) = \left\|x - \operatorname{proj}_W x\right\|.$$

Example 28: Determine the distance of the point $x = (4, -1, 7)$ from the subspace $W$ discussed in the previous example.

Solution: We have already found an orthonormal basis for $W$, namely

$$u_1 = (1, 0, 0) \quad\text{and}\quad u_2 = \left(0, \tfrac1{\sqrt2}, \tfrac1{\sqrt2}\right).$$
Then

$$\operatorname{proj}_W x = (x \cdot u_1)u_1 + (x \cdot u_2)u_2 = \left((4, -1, 7) \cdot (1, 0, 0)\right)(1, 0, 0) + \left((4, -1, 7) \cdot \left(0, \tfrac1{\sqrt2}, \tfrac1{\sqrt2}\right)\right)\left(0, \tfrac1{\sqrt2}, \tfrac1{\sqrt2}\right)$$
$$= (4, 0, 0) + (0, 3, 3) = (4, 3, 3).$$

The distance of the point $x$ from the subspace $W$ is then

$$\left\|x - \operatorname{proj}_W x\right\| = \|(4, -1, 7) - (4, 3, 3)\| = \|(0, -4, 4)\| = \sqrt{32}.$$
Orthogonal Matrices

Definition 29: An orthogonal matrix is a square matrix with orthonormal columns. Denoting this matrix $Q$, it is easy to determine that $Q^TQ = I$, and therefore $Q^T = Q^{-1}$. In other words, the transpose of an orthogonal matrix is its inverse.⁴

Example 30: Consider the rotation matrix $Q = \begin{bmatrix} \cos\theta&-\sin\theta\\ \sin\theta&\cos\theta \end{bmatrix}$. Then $Q^T = \begin{bmatrix} \cos\theta&\sin\theta\\ -\sin\theta&\cos\theta \end{bmatrix}$, and it is easy to verify that $Q^TQ = I$. This type of matrix is called an isometry because it represents a length-preserving transformation. We can calculate the length of $(1, 2)^T$ to be $\sqrt5$. Then

$$\begin{bmatrix} \cos\theta&-\sin\theta\\ \sin\theta&\cos\theta \end{bmatrix}\begin{bmatrix} 1\\ 2 \end{bmatrix} = \begin{bmatrix} \cos\theta - 2\sin\theta\\ 2\cos\theta + \sin\theta \end{bmatrix},$$

which still has a length of $\sqrt5$.

Example 31: All permutation matrices are orthogonal; hence we confirm that the inverse of a permutation matrix is actually its transpose.
Another important property of orthogonal matrices is that multiplication by $Q$ preserves lengths, inner products, and angles (i.e., lengths, inner products, and angles that existed before multiplication by $Q$ will be the same after multiplication by $Q$). For instance, lengths are preserved (i.e., $\|Qx\|^2 = \|x\|^2$) because $(Qx)^T(Qx) = x^TQ^TQx = x^Tx$, and inner products are preserved because $(Qx)^T(Qy) = x^TQ^TQy = x^Ty$. Therefore, the following statements are equivalent if they are about an $n \times n$ matrix $Q$:

1. $Q$ is orthogonal.

2. $\|Qx\| = \|x\|$ for every $x$ in $\mathbb{R}^n$.

3. $Qx \cdot Qy = x \cdot y$ for every $x$ and $y$ in $\mathbb{R}^n$.

Note that the discussion earlier regarding the expression of a vector $v$ as a linear combination of a subspace's orthonormal basis vectors can be reinterpreted here if we consider again the system $Ax = b$. This time, however, we will consider $Qx = b$, where the columns of $Q$ are the orthonormal basis vectors. Then writing $b$ as a linear combination of the basis vectors $\{q_1, \ldots, q_n\}$ simply equates to solving the system

$$x_1q_1 + x_2q_2 + \cdots + x_nq_n = b, \quad\text{or}\quad Qx = b.$$

The solution to this system is $x = Q^{-1}b$, and since $Q^{-1} = Q^T$, this becomes
$$x = Q^Tb = \begin{bmatrix} q_1^T\\ \vdots\\ q_n^T \end{bmatrix}\begin{bmatrix} \vdots\\ b\\ \vdots \end{bmatrix} = \begin{bmatrix} q_1^Tb\\ \vdots\\ q_n^Tb \end{bmatrix}, \tag{7}$$

⁴ The $Q^TQ = I$ relation still works even if $Q$ is not square (a rectangular matrix with orthonormal columns). If $Q$ is an $m \times n$ matrix, $Q^T$ would be an $n \times m$ matrix, and their product $Q^TQ$ would be the $n \times n$ identity matrix.
where the components of $x$ are the dot products of the orthonormal basis vectors with $b$, as we would expect.

Note: When we projected a vector $b$ onto a line, we ended up with the expression $\frac{a^Tb}{a^Ta}$. Note here that $a$ is actually $q_i$, and because of the unit lengths, the denominator is 1. What Eq. (7) then shows is that every vector $b$ is the sum of its one-dimensional projections onto the lines spanned by each of the orthonormal vectors $q_i$.

Note: Furthermore, because $Q^T = Q^{-1}$ we have $QQ^T = I$ (in addition to $Q^TQ = I$). This leads to the somewhat remarkable conclusion that the rows of a square matrix are orthonormal whenever the columns are!
QR-Factorization

In the Gram-Schmidt process, we start with independent vectors in $\mathbb{R}^m$, namely $\{a_1, \ldots, a_n\}$, and end with orthonormal vectors $\{q_1, \ldots, q_n\}$ (again in $\mathbb{R}^m$). If we make these vectors the columns of matrices $A$ and $Q$, respectively, we have two $m \times n$ matrices. Is there a third matrix that connects these two?

Recall that we can easily write vectors in a space as linear combinations of the vectors in any orthonormal basis of that space. Since the $q_i$ constitute an orthonormal basis, we have

$$a_1 = (q_1^Ta_1)q_1 + (q_2^Ta_1)q_2 + \cdots + (q_n^Ta_1)q_n$$
$$a_2 = (q_1^Ta_2)q_1 + (q_2^Ta_2)q_2 + \cdots + (q_n^Ta_2)q_n$$
$$a_3 = (q_1^Ta_3)q_1 + (q_2^Ta_3)q_2 + \cdots + (q_n^Ta_3)q_n$$
$$\vdots$$
$$a_n = (q_1^Ta_n)q_1 + (q_2^Ta_n)q_2 + \cdots + (q_n^Ta_n)q_n.$$
However, because of the manner in which the Gram-Schmidt process is performed, we know that the vector $a_1$ is orthogonal to the vectors $q_2, q_3, q_4, \ldots$, the vector $a_2$ is orthogonal to the vectors $q_3, q_4, q_5, \ldots$, the vector $a_3$ is orthogonal to the vectors $q_4, q_5, q_6, \ldots$, and so on. Therefore, all of the dot products $q_j^Ta_i$ with $j > i$ will equal zero, yielding the following:

$$a_1 = (q_1^Ta_1)q_1$$
$$a_2 = (q_1^Ta_2)q_1 + (q_2^Ta_2)q_2$$
$$a_3 = (q_1^Ta_3)q_1 + (q_2^Ta_3)q_2 + (q_3^Ta_3)q_3$$
$$\vdots$$
$$a_n = (q_1^Ta_n)q_1 + (q_2^Ta_n)q_2 + \cdots + (q_n^Ta_n)q_n.$$

Of course, this corresponds exactly to the following system:
$$A = \underbrace{\begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}}_{m \times n} = \underbrace{\begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix}}_{m \times n} \underbrace{\begin{bmatrix} q_1^Ta_1 & q_1^Ta_2 & q_1^Ta_3 & \cdots & q_1^Ta_n\\ 0 & q_2^Ta_2 & q_2^Ta_3 & \cdots & q_2^Ta_n\\ 0 & 0 & q_3^Ta_3 & \cdots & q_3^Ta_n\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & q_n^Ta_n \end{bmatrix}}_{n \times n} = QR,$$

and we have arrived at the QR-factorization of matrix $A$, in which $Q$ has orthonormal columns and $R$ is upper triangular (because of how Gram-Schmidt is performed: we start with vector $a_1$, which falls on the same line as $q_1$; then vectors $a_1$ and $a_2$ are in the same plane as $q_1$ and $q_2$, and so on). Thus matrix $R$ is the matrix that connects $Q$ back to $A$, and we have the following theorem:
Theorem 32: Let $A$ be an $m \times n$ matrix with linearly independent columns. Then $A$ can be factored as $A = QR$, where $Q$ is an $m \times n$ matrix with orthonormal columns and $R$ is an invertible upper triangular matrix.
Example 33: Find a QR factorization of

$$A = \begin{bmatrix} 1&2&2\\ -1&1&2\\ -1&0&1\\ 1&1&2 \end{bmatrix}.$$

Solution: It is easy to determine that the columns of $A$ are linearly independent, so they form a basis for the subspace spanned by those columns (i.e., the column space of $A$). Start the Gram-Schmidt process by setting $v_1 = a_1$:

$$v_1 = \begin{bmatrix} 1\\ -1\\ -1\\ 1 \end{bmatrix}.$$

Then,

$$v_2 = \begin{bmatrix} 2\\ 1\\ 0\\ 1 \end{bmatrix} - \frac{v_1 \cdot a_2}{v_1 \cdot v_1}\begin{bmatrix} 1\\ -1\\ -1\\ 1 \end{bmatrix} = \begin{bmatrix} 2\\ 1\\ 0\\ 1 \end{bmatrix} - \frac24\begin{bmatrix} 1\\ -1\\ -1\\ 1 \end{bmatrix} = \begin{bmatrix} \tfrac32\\ \tfrac32\\ \tfrac12\\ \tfrac12 \end{bmatrix}.$$

Note: Since we will be normalizing later, we can "rescale" $v_2$ without changing any orthogonality relationships to make future calculations easier. So we'll replace $v_2$ with $v_2' = (3, 3, 1, 1)$. Finally,

$$v_3 = \begin{bmatrix} 2\\ 2\\ 1\\ 2 \end{bmatrix} - \frac{v_1 \cdot a_3}{v_1 \cdot v_1}\begin{bmatrix} 1\\ -1\\ -1\\ 1 \end{bmatrix} - \frac{v_2' \cdot a_3}{v_2' \cdot v_2'}\begin{bmatrix} 3\\ 3\\ 1\\ 1 \end{bmatrix} = \begin{bmatrix} 2\\ 2\\ 1\\ 2 \end{bmatrix} - \frac14\begin{bmatrix} 1\\ -1\\ -1\\ 1 \end{bmatrix} - \frac{15}{20}\begin{bmatrix} 3\\ 3\\ 1\\ 1 \end{bmatrix} = \begin{bmatrix} -\tfrac12\\ 0\\ \tfrac12\\ 1 \end{bmatrix}.$$

We can again rescale $v_3$ to obtain $v_3' = \begin{bmatrix} -1\\ 0\\ 1\\ 2 \end{bmatrix}$. We now have an orthogonal basis $\{v_1, v_2', v_3'\}$ for the column space $W$. Now, to obtain an orthonormal basis, normalize each vector (the details are left to you):

$$\{q_1, q_2, q_3\} = \left\{\begin{bmatrix} 1/2\\ -1/2\\ -1/2\\ 1/2 \end{bmatrix}, \begin{bmatrix} 3\sqrt5/10\\ 3\sqrt5/10\\ \sqrt5/10\\ \sqrt5/10 \end{bmatrix}, \begin{bmatrix} -\sqrt6/6\\ 0\\ \sqrt6/6\\ \sqrt6/3 \end{bmatrix}\right\}.$$

Now, to obtain a QR factorization for $A$, we have

$$Q = \begin{bmatrix} 1/2 & 3\sqrt5/10 & -\sqrt6/6\\ -1/2 & 3\sqrt5/10 & 0\\ -1/2 & \sqrt5/10 & \sqrt6/6\\ 1/2 & \sqrt5/10 & \sqrt6/3 \end{bmatrix}.$$

Because $Q$ has orthonormal columns, we know that $Q^TQ = I$. Therefore, if $A = QR$,

$$Q^TA = Q^TQR = IR = R.$$
So to find $R$, just calculate $Q^TA$:

$$Q^TA = \begin{bmatrix} 1/2 & -1/2 & -1/2 & 1/2\\ 3\sqrt5/10 & 3\sqrt5/10 & \sqrt5/10 & \sqrt5/10\\ -\sqrt6/6 & 0 & \sqrt6/6 & \sqrt6/3 \end{bmatrix}\begin{bmatrix} 1&2&2\\ -1&1&2\\ -1&0&1\\ 1&1&2 \end{bmatrix} = \begin{bmatrix} 2 & 1 & \tfrac12\\ 0 & \sqrt5 & \tfrac32\sqrt5\\ 0 & 0 & \tfrac12\sqrt6 \end{bmatrix} = R.$$

Note that the diagonal of $R$ contains the lengths of the vectors $v_1$, $v_2$, and $v_3$.
Using the QR Factorization to Solve Systems

Note that the system $Ax = b$ becomes $QRx = b$, and hence

$$Rx = Q^Tb \tag{8}$$

(because $Q^{-1} = Q^T$). Because $R$ is upper triangular, the equation in (8) can be solved easily via back substitution. For example, given the system $Ax = (0, 4, 5)$ and the fact that the $A = QR$ factorization yields

$$\begin{bmatrix} 1&-1&-2\\ 1&0&2\\ 1&2&3 \end{bmatrix} = \begin{bmatrix} \tfrac1{\sqrt3} & -\tfrac4{\sqrt{42}} & -\tfrac2{\sqrt{14}}\\ \tfrac1{\sqrt3} & -\tfrac1{\sqrt{42}} & \tfrac3{\sqrt{14}}\\ \tfrac1{\sqrt3} & \tfrac5{\sqrt{42}} & -\tfrac1{\sqrt{14}} \end{bmatrix}\begin{bmatrix} \sqrt3 & \tfrac1{\sqrt3} & \sqrt3\\ 0 & \tfrac{\sqrt{14}}{\sqrt3} & \tfrac{\sqrt{21}}{\sqrt2}\\ 0 & 0 & \tfrac{\sqrt7}{\sqrt2} \end{bmatrix},$$

we find
$$Q^Tb = \begin{bmatrix} \tfrac13\sqrt3 & \tfrac13\sqrt3 & \tfrac13\sqrt3\\ -\tfrac2{21}\sqrt{42} & -\tfrac1{42}\sqrt{42} & \tfrac5{42}\sqrt{42}\\ -\tfrac17\sqrt{14} & \tfrac3{14}\sqrt{14} & -\tfrac1{14}\sqrt{14} \end{bmatrix}\begin{bmatrix} 0\\ 4\\ 5 \end{bmatrix} = \begin{bmatrix} 3\sqrt3\\ \tfrac12\sqrt{42}\\ \tfrac12\sqrt{14} \end{bmatrix}.$$
Then solve

$$Rx = \begin{bmatrix} \sqrt3 & \tfrac1{\sqrt3} & \sqrt3\\ 0 & \tfrac{\sqrt{14}}{\sqrt3} & \tfrac{\sqrt{21}}{\sqrt2}\\ 0 & 0 & \tfrac{\sqrt7}{\sqrt2} \end{bmatrix}\begin{bmatrix} x\\ y\\ z \end{bmatrix} = \begin{bmatrix} 3\sqrt3\\ \tfrac12\sqrt{42}\\ \tfrac12\sqrt{14} \end{bmatrix}$$

by back substitution to obtain

$$x = \begin{bmatrix} 2\\ 0\\ 1 \end{bmatrix}.$$
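The same solve works in code, with the matrix and right-hand side taken from the example above; `back_sub` is a hypothetical helper name:

```python
import numpy as np

def back_sub(R, c):
    """Solve Rx = c for upper-triangular R by back substitution."""
    n = len(c)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - R[i, i+1:] @ x[i+1:]) / R[i, i]
    return x

A = np.array([[1, -1, -2],
              [1, 0, 2],
              [1, 2, 3]], dtype=float)
b = np.array([0, 4, 5], dtype=float)

Q, R = np.linalg.qr(A)
x = back_sub(R, Q.T @ b)   # Rx = Q^T b, Eq. (8)
print(x)                   # (2, 0, 1)
```

Any sign convention the library chooses for $Q$ and $R$ cancels out in $Rx = Q^Tb$, so the solution agrees with the hand computation.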
Least Squares and the QR Factorization

Review of Least Squares and the Normal Equations

This topic builds off of what we did in Computer Lab #10. In the lab, we learned:

- In a least-squares situation, in order to minimize all of the errors (specifically, the sum of the squared distances between the "best-fit" line and the actual data points), we needed to determine the vector in $Ax$ that was closest to the vector $b$.

- This is the same as determining the projection of $b$ onto a subspace, and that subspace was actually the column space of $A$.

- Typically, in a least squares setting, we have many more data points than variables, so if $A$ is $m \times n$, then $m > n$, and we most likely do not have an exact solution (i.e., rarely will all the points follow the mathematical model exactly).
- In terms of matrix subspaces, the vector b will most likely be outside the column space of A.
- However, the point p in the subspace that is closest to b would be in the column space of A, so it can be written as p = Ax̂, where x̂ represents the "best estimate" vector for the "almost" solution vector x.
- Since p is the projection of b onto the column space, the error vector we wish to minimize, i.e., e = b − Ax̂, will be orthogonal to that space.
- However, if a vector is orthogonal to the column space of the matrix A, it is also orthogonal to the row space of the transpose A^T, and any vector orthogonal to the row space of a matrix is in the null space of that matrix.
- Therefore, because e is orthogonal to the column space of A, we can conclude that it is in the null space of A^T. This is what finally allowed us to make the following important connection:
$$
\begin{aligned}
A^T (b - A\hat{x}) &= 0 \\
A^T b - A^T A \hat{x} &= 0 \\
A^T A \hat{x} &= A^T b, \qquad (9)
\end{aligned}
$$

the last line of which describes what are called the normal equations.
Finally, the matrix A^T A is invertible exactly when the columns of A are linearly independent.⁵ Then the best estimate x̂, which gives us the coefficients in the mathematical model (or "line" of best fit),⁶ can be found as

$$
\hat{x} = \left(A^T A\right)^{-1} A^T b.
$$

Example 34 Find a least squares solution to the inconsistent system Ax = b, where
$$
A = \begin{bmatrix} 1 & 5 \\ 2 & -2 \\ -1 & 1 \end{bmatrix}
\quad \text{and} \quad
b = \begin{bmatrix} 3 \\ 2 \\ 5 \end{bmatrix}.
$$
Solution: Compute

$$
A^T A =
\begin{bmatrix} 1 & 2 & -1 \\ 5 & -2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 5 \\ 2 & -2 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} 6 & 0 \\ 0 & 30 \end{bmatrix}
$$

and

$$
A^T b =
\begin{bmatrix} 1 & 2 & -1 \\ 5 & -2 & 1 \end{bmatrix}
\begin{bmatrix} 3 \\ 2 \\ 5 \end{bmatrix}
= \begin{bmatrix} 2 \\ 16 \end{bmatrix}.
$$
Then the normal equations are

$$
A^T A \hat{x} = A^T b
\quad\Longrightarrow\quad
\begin{bmatrix} 6 & 0 \\ 0 & 30 \end{bmatrix} \hat{x} = \begin{bmatrix} 2 \\ 16 \end{bmatrix},
$$

from which it is easy to see that x̂ = (1/3, 8/15)^T.
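The normal-equations computation can be verified with a short Python sketch, assuming NumPy is available (the signs of some entries of A were lost in this copy of the notes; the values below use one consistent reconstruction):

```python
import numpy as np

A = np.array([[ 1.0,  5.0],
              [ 2.0, -2.0],
              [-1.0,  1.0]])
b = np.array([3.0, 2.0, 5.0])

AtA = A.T @ A                      # here diag(6, 30): the columns of A are orthogonal
Atb = A.T @ b                      # here (2, 16)
xhat = np.linalg.solve(AtA, Atb)   # solve the normal equations A^T A xhat = A^T b

print(xhat)  # approximately [1/3, 8/15]
```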
Example 35 Find the least squares approximating line for the data points (1, 2), (2, 2), and (3, 4).

Solution: We want the line y = a + bx that best fits these three points. The appropriate system would be

$$
\begin{aligned}
a + b(1) &= 2 \\
a + b(2) &= 2 \\
a + b(3) &= 4
\end{aligned}
$$
⁵ Be careful here - because A might be rectangular, we are actually dealing with what is called a "left inverse," and the relation (A^T A)^{-1} = A^{-1}(A^T)^{-1} does not hold as it does with square matrices.

⁶ I use quotes here because we are not limited to linear models with this technique.
which can be reformed into Ax = b as

$$
\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}
\begin{bmatrix} a \\ b \end{bmatrix}
= \begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix}.
$$
Again, compute

$$
A^T A =
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}
= \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}
$$

and

$$
A^T b =
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{bmatrix}
\begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix}
= \begin{bmatrix} 8 \\ 18 \end{bmatrix}.
$$
Solving

$$
\begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix} \hat{x} = \begin{bmatrix} 8 \\ 18 \end{bmatrix}
$$

leads to the solution x̂ = (2/3, 1)^T, so the equation for the line of best fit would be y = 2/3 + x, shown in the plot below along with the three data points:
[Plot: the best-fit line y = 2/3 + x together with the data points (1, 2), (2, 2), and (3, 4).]
While we're at it, we can also calculate the actual least squares error. If x̂ represents the least squares solution of Ax = b, then Ax̂ is the vector in the column space of A that is closest to b. The actual distance from b to Ax̂ is simply the length of the component of b perpendicular to the column space of A. In symbols,

$$
\|e\| = \|b - A\hat{x}\|.
$$

Now,

$$
e = b - A\hat{x} =
\begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix}
-
\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}
\begin{bmatrix} \tfrac{2}{3} \\ 1 \end{bmatrix}
=
\begin{bmatrix} \tfrac{1}{3} \\ -\tfrac{2}{3} \\ \tfrac{1}{3} \end{bmatrix},
$$

and the length of e is then

$$
\|e\| = \sqrt{\left(\tfrac{1}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2} = \sqrt{\tfrac{2}{3}} \approx 0.816.
$$
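Both the fitted coefficients and the least squares error for this example can be checked with a minimal Python sketch, assuming NumPy is available:

```python
import numpy as np

# Data points (1,2), (2,2), (3,4); model y = a + b*x
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 2.0, 4.0])

A = np.column_stack([np.ones_like(xs), xs])   # design matrix [1, x_i]
xhat = np.linalg.solve(A.T @ A, A.T @ ys)     # normal equations

e = ys - A @ xhat                             # error (residual) vector
print(xhat)                 # approximately [2/3, 1], i.e. y = 2/3 + x
print(np.linalg.norm(e))    # approximately sqrt(2/3) ≈ 0.816
```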
Least Squares and the QR Factorization
One major advantage of orthogonalization is that it greatly simplifies the least squares problem Ax = b. The normal equations from (9) are still

$$
A^T A \hat{x} = A^T b,
$$
but with the QR factorization, A^T A becomes

$$
A^T A = (QR)^T (QR) = R^T Q^T Q R = R^T R
$$

(because Q^T Q = I). Then the equations in (9) become

$$
A^T A \hat{x} = A^T b
\quad\Longrightarrow\quad
R^T R \hat{x} = R^T Q^T b,
$$

or, since R^T is invertible,

$$
R \hat{x} = Q^T b. \qquad (10)
$$

Although this may not look like much of an improvement, it most certainly is, particularly because R is upper triangular. Therefore, the solution to (10) can be found via back substitution. We still need to use Gram-Schmidt to produce Q and R, but the payoff is that the equations in (10) are less prone to numerical inaccuracies such as round-off error.
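The "factor, then back-substitute" recipe can be sketched as a small Python routine, assuming NumPy is available; `lstsq_via_qr` is an illustrative name chosen here, and NumPy's QR stands in for the hand Gram-Schmidt:

```python
import numpy as np

def lstsq_via_qr(A, b):
    """Least squares via R xhat = Q^T b, solved by back substitution.

    Illustrative sketch: uses NumPy's reduced QR in place of hand Gram-Schmidt.
    """
    Q, R = np.linalg.qr(A)            # Q has orthonormal columns, R is upper triangular
    y = Q.T @ b
    n = R.shape[0]
    xhat = np.zeros(n)
    for i in range(n - 1, -1, -1):    # back substitution, bottom row up
        xhat[i] = (y[i] - R[i, i + 1:] @ xhat[i + 1:]) / R[i, i]
    return xhat

# The line-fitting example: points (1,2), (2,2), (3,4)
x = lstsq_via_qr(np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]),
                 np.array([2.0, 2.0, 4.0]))
print(x)  # approximately [2/3, 1]
```

Note that the routine never forms A^T A, which is exactly the numerical advantage mentioned above.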
Example 36 Consider the previous example in which we found the line of best fit for the points (1, 2), (2, 2), and (3, 4). If we instead find the QR factorization, we have

$$
A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}
=
\begin{bmatrix}
\tfrac{1}{3}\sqrt{3} & -\tfrac{1}{2}\sqrt{2} \\
\tfrac{1}{3}\sqrt{3} & 0 \\
\tfrac{1}{3}\sqrt{3} & \tfrac{1}{2}\sqrt{2}
\end{bmatrix}
\begin{bmatrix} \sqrt{3} & 2\sqrt{3} \\ 0 & \sqrt{2} \end{bmatrix}
= QR.
$$

Then Rx̂ = Q^T b becomes

$$
\begin{bmatrix} \sqrt{3} & 2\sqrt{3} \\ 0 & \sqrt{2} \end{bmatrix} \hat{x}
=
\begin{bmatrix}
\tfrac{1}{3}\sqrt{3} & \tfrac{1}{3}\sqrt{3} & \tfrac{1}{3}\sqrt{3} \\
-\tfrac{1}{2}\sqrt{2} & 0 & \tfrac{1}{2}\sqrt{2}
\end{bmatrix}
\begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix}
=
\begin{bmatrix} \tfrac{8}{3}\sqrt{3} \\ \sqrt{2} \end{bmatrix}.
$$

Hence √2·b = √2, so b = 1, and then √3·a + 2√3(1) = (8/3)√3, so a = 2/3, as we found earlier.
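The Gram-Schmidt process that produces this Q and R can be sketched in Python, assuming NumPy is available; `gram_schmidt_qr` is an illustrative name, not a library routine:

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt QR: Q has orthonormal columns, R is upper triangular."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # r_ij = q_i . a_j
            v -= R[i, j] * Q[:, i]        # strip off the q_i component
        R[j, j] = np.linalg.norm(v)       # diagonal entry = length of v_j
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Q, R = gram_schmidt_qr(A)
print(R)  # approximately [[sqrt(3), 2 sqrt(3)], [0, sqrt(2)]]
```

(In practice, the "modified" Gram-Schmidt variant or Householder reflections are preferred for numerical stability, but the classical version matches the hand computation in these notes.)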
An Aside: Least Squares and Calculus
Consider the simple system

$$
\begin{aligned}
a_1 x &= b_1 \\
a_2 x &= b_2 \\
a_3 x &= b_3
\end{aligned}
$$

This is solvable only if b1, b2, and b3 are in the ratio a1 : a2 : a3. In practice, this would rarely be the case if the above equations came from "real" data. So, instead of trying to solve the unsolvable, we proceed by choosing an x that minimizes the average error E in the equations. A convenient error measurement to use is the "sum of squares," namely

$$
E^2 = (a_1 x - b_1)^2 + (a_2 x - b_2)^2 + (a_3 x - b_3)^2.
$$

If there were an exact solution, E would be 0. If there is not an exact solution, we can find the minimum error by setting the derivative of E² equal to 0,
$$
\frac{dE^2}{dx} = 2\left[(a_1 x - b_1)a_1 + (a_2 x - b_2)a_2 + (a_3 x - b_3)a_3\right] = 0,
$$

and then solving for x:

$$
\begin{aligned}
0 &= 2\left((a_1 x - b_1)a_1 + (a_2 x - b_2)a_2 + (a_3 x - b_3)a_3\right) \\
&= 2x a_1^2 + 2x a_2^2 + 2x a_3^2 - 2a_1 b_1 - 2a_2 b_2 - 2a_3 b_3 \\
\Longrightarrow\ x &= \frac{a_1 b_1 + a_2 b_2 + a_3 b_3}{a_1^2 + a_2^2 + a_3^2} = \frac{a^T b}{a^T a}.
\end{aligned}
$$
This result, which you should recognize as the coefficient in the projection calculations, gives us the least-squares solution to a problem ax = b in one variable x.
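As a quick numerical illustration of x = a^T b / a^T a (the vectors a and b below are hypothetical data, not from the notes), assuming NumPy is available:

```python
import numpy as np

# Hypothetical data for the one-variable system a_i x = b_i
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 4.0])

x = (a @ b) / (a @ a)   # x = a^T b / a^T a; here 18/14 = 9/7

# Sanity check: E^2 at x is no larger than at nearby points
E2 = lambda t: float(np.sum((a * t - b) ** 2))
assert E2(x) <= E2(x + 0.01) and E2(x) <= E2(x - 0.01)
print(x)
```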